My first post here but I've spent the last few weeks basically living on S.O., looking for answers and solutions. Unfortunately I have not found an answer to my current problem on here, or anywhere else, so I am hoping one of you lovely people can help me.
I am trying to batch process Autodesk Maya files from within Windows, replacing the file paths of their references so they all point to one single directory. At present it just throws back a "" after I try to execute the code.
Here is my code so far - please pick holes in it as much as you want, I need to get better!
### Reference Example
# file -rdi 2 -ns "platform3_MDL_MASTER" -rfn "Toyota_2000GT_Spider:Toyota2000GT_Spider_RN" -typ "mayaAscii" "C:/svn/TEST/previs/maya_references/Toyota_2000GT_Spider.ma";
import os
# Defining the reference paths - before and after
projectPath = "C:/projects/TEST/previs"
originalPath = "C:/projects/TEST/previs/maya_references/"
newPath = "R:/projects/FRAT/production/maya_references/"
# Makes a list of all previs files in the given directory.
previsFiles = [os.path.join(d, x)
               for d, dirs, files in os.walk(projectPath)
               for x in files if x.endswith("_PREVIS.ma")]
previsSuffix = '.ma";'
newLines = []
# Loops through each previs file found...
# for each line that contains the previsSuffix...
# and splits the filepath into a directory and a filename
# and then replaces that originalPath with the newPath.
for scene in previsFiles:
    with open(scene, "r") as fp:
        previsReadlines = fp.readlines()
        for line in previsReadlines:
            if previsSuffix in line:
                # Splits the directory from the file name.
                lines = os.path.split(line)
                newLines = line.replace(lines[0], newPath)
            else:
                break
    with open(scene, 'w') as fw:
        previsWritelines = fw.writelines()
You'll have to adjust this to work with your script, but I hope it gives you the general idea of how to replace one reference path with another.
The original code had 2 main issues:
1) You weren't actually changing the contents. Assigning to newLines doesn't re-assign previsReadlines, so the changes were never recorded.
2) Nothing was being passed to writelines. writelines needs the list of lines you intend to write as a parameter.
# Adjust these paths to existing maya scene files.
scnPath = "/path/to/file/to/open.ma"
oldPath = "/path/to/old/file.ma"
newPath = "/path/to/new/file.ma"
with open(scnPath, "r") as fp:
    # Get all file contents in a list.
    fileLines = fp.readlines()

# Use enumerate to keep track of what index we're at.
for i, line in enumerate(fileLines):
    # Check if the line has the old path in it.
    if oldPath in line:
        # Replace with the new path and assign the change.
        # Before you were assigning this to a new variable that goes nowhere.
        # Instead it needs to re-assign the line from the list we first read from.
        fileLines[i] = line.replace(oldPath, newPath)

# Completely replace the file with our changes.
with open(scnPath, 'w') as fw:
    # You must pass the contents in here to write it.
    fw.writelines(fileLines)
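Putting the two fixes back into your batch loop, the whole script would look something like this (an untested sketch, reusing the paths and the previsFiles list from your original code):

import os

projectPath = "C:/projects/TEST/previs"
originalPath = "C:/projects/TEST/previs/maya_references/"
newPath = "R:/projects/FRAT/production/maya_references/"

previsFiles = [os.path.join(d, x)
               for d, dirs, files in os.walk(projectPath)
               for x in files if x.endswith("_PREVIS.ma")]

for scene in previsFiles:
    with open(scene, "r") as fp:
        fileLines = fp.readlines()
    # Re-assign each matching line back into the list.
    for i, line in enumerate(fileLines):
        if originalPath in line:
            fileLines[i] = line.replace(originalPath, newPath)
    # Overwrite the scene file with the modified contents.
    with open(scene, "w") as fw:
        fw.writelines(fileLines)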
I have a folder containing hundreds of files (scan_zmat_x.txt) where x is an incremental int [1,2,3...]. I need to open the file, find the last instance of a line, let's call it "gumballs" for now. Then I need to put everything in a new file. So far I've tried using .sh scripts but I only have access to a Windows machine currently. So py is a good alternative. I'm really stuck and could use some guidance.
I appreciate it. Cheers.
#!/bin/tcsh
Efile=opt/e.txt
Logs=opt/infiles/scan_zmat_$i.log
for i in Logs do
    grep -winr "gumballs" scan_zmat_$i.log | tail -n 1 > $Efile
done
If you're willing to wait, just find each line containing the value you want in each file, keeping the last match each time:

import glob

search_string = "gumballs"

with open("results.txt", "w") as fh_results:
    for name_file in glob.iglob("scan_zmat_*.txt"):
        discovered_line = None  # clear the match for each file
        with open(name_file) as fh:
            for line in fh:
                if search_string in line:  # update on each match
                    discovered_line = line
        if discovered_line is not None:
            fh_results.write(discovered_line)  # includes newline chars
        # else:  # optional message
        #     print(f"WARNING: no lines in '{name_file}' matched '{search_string}'")
Caveats
- Both this and your original search may come back in inode order (or the Windows/NTFS equivalent), which is normally the order the files were written in, but not necessarily.
- If you want to be certain they're sorted, use glob.glob and sort the result however you prefer instead of using glob.iglob directly (which yields the filenames in whatever order the filesystem provides); see the sketch after this list.
- If the files are large, it could be more efficient to seek backwards in blocks (repeatedly .seek()ing).
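For instance, a minimal sketch of a sorted variant (assuming the numeric suffix in the scan_zmat_x.txt names should decide the order, since plain string sorting would put scan_zmat_10 before scan_zmat_2):

import glob
import re

def numeric_suffix(name):
    # Pull the integer out of e.g. "scan_zmat_12.txt"; fall back to 0.
    match = re.search(r"scan_zmat_(\d+)", name)
    return int(match.group(1)) if match else 0

for name_file in sorted(glob.glob("scan_zmat_*.txt"), key=numeric_suffix):
    print(name_file)  # process in numeric order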
If you don't want to wait, here's a version that seeks backwards instead (not tested!):
#!/usr/bin/env python3
from os.path import join as joinpath
from os import listdir # like 'ls' in bash
# Path to files you want to read
DIRPATH = "/home/some/path/"
OUTFILE = "out.txt" # will be created in cwd
collected = []  # order depends on how the OS lists the files

def main():
    for fname in listdir(DIRPATH):
        # Check with 'os.path.isfile' if you want
        if fname.startswith("scan_zmat_") and fname.endswith(".txt"):
            with open(joinpath(DIRPATH, fname), "rb") as fd:
                # Jump to the last byte of the file
                fd.seek(-1, 2)
                # every read() advances the pointer, so step back two
                # bytes each time; this skips any trailing newlines
                while fd.read(1) == b"\n":
                    fd.seek(-2, 1)
                end = fd.tell() + 1
                # now walk backwards to the newline before the last line
                # (assumes each file is at least two lines long)
                fd.seek(-1, 1)
                while fd.read(1) != b"\n":
                    fd.seek(-2, 1)
                collected.append(fd.read(end - fd.tell()))
    with open(OUTFILE, "wb") as fd:
        fd.writelines(collected)

if __name__ == "__main__":
    main()
For every input file processed (see code below) I am trying to use "os.path.basename" to write to a new output file - I know I am missing something obvious...?
import os
import glob
import gzip
dbpath = '/home/university/Desktop/test'
for infile in glob.glob(os.path.join(dbpath, 'G[D|E]/????/*.gz')):
    print("current file is: " + infile)

    # ** the problem lines **
    outfile = os.path.basename('/home/university/Desktop/test/G[D|E]/????/??????.xaa.fastq.gz').rsplit('.xaa.fastq.gz')[0]
    file = open(outfile, 'w+')
    # ** end problem lines **

    gzsuppl = Chem.ForwardSDMolSupplier(gzip.open(infile))
    for m in gzsuppl:
        if m is None: continue
        ...etc
    file.close()
print(count)
It is not clear to me how to capture element [0] (i.e. everything upstream of .xaa.fastq.gz) and use it as the basename for the new output file?
Unfortunately it simply writes the new output file as "??????" rather than the actual sequence of 6 letters.
Thanks for any help given.
This seems like it will get everything upstream of the .xaa.fastq.gz in the paths returned from glob() in your sample code:
import os
filepath = '/home/university/Desktop/test/GD /AAML/DEAAML.xaa.fastq.gz'
filepath = os.path.normpath(filepath) # Changes path separators for Windows.
# This section was adapted from answer https://stackoverflow.com/a/3167684/355230
folders = []
while 1:
    filepath, folder = os.path.split(filepath)
    if folder:
        folders.append(folder)
    else:
        if filepath:
            folders.append(filepath)
        break
folders.reverse()

if len(folders) > 1:
    # The last element of folders should contain the original filename.
    filename_prefix = os.path.basename(folders[-1]).split('.')[0]
    outfile = os.path.join(*(folders[:-1] + [filename_prefix + '.rest_of_filename']))
    print(outfile)  # -> \home\university\Desktop\test\GD \AAML\DEAAML.rest_of_filename
Of course what ends-up in outfile isn't the final path plus filename since I don't know what the remainder of the filename will be and just put a placeholder in (the '.rest_of_filename').
I'm not familiar with the kind of input data you're working with, but here's what I can tell you:
The "something obvious" you're missing is that outfile has no connection to infile. Your outfile line uses the ?????? rather than the actual filename because that's what you ask for. It's glob.glob that turns it into a list of matches.
Here's how I'd write that aspect of the outfile line:
outfile = infile.rsplit('.xaa.fastq.gz', 1)[0]
(The , 1 ensures that it'll never split more than once, no matter how crazy a filename gets. It's just a good habit to get into when using split or rsplit like this.)
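For instance, a quick illustration with a made-up filename: only the final occurrence gets split off, no matter what comes before it:

print('odd.xaa.fastq.gz.xaa.fastq.gz'.rsplit('.xaa.fastq.gz', 1)[0])
# -> odd.xaa.fastq.gz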
You're setting yourself up for a bug, because the glob pattern can match *.gz files which don't end in .xaa.fastq.gz, which would mean that a random .gz file which happens to wind up in the folder listing would cause outfile to have the same path as infile and you'd end up writing to the input file.
There are three solutions to this problem which apply to your use case:
1. Use *.xaa.fastq.gz instead of *.gz in your glob. I don't recommend this because it's easy for a typo to sneak in and make them different again, which would silently reintroduce the bug.
2. Write your output to a different folder than you took your input from:
outfile = os.path.join(outpath, os.path.relpath(infile, dbpath))
outparent = os.path.dirname(outfile)
if not os.path.exists(outparent):
    os.makedirs(outparent)
3. Add an assert outfile != infile line so the program will die with a meaningful error message in the "this should never actually happen" case, rather than silently doing incorrect things.
The indentation of what you posted could be wrong, but it looks like you're opening a bunch of files, then only closing the last one. My advice is to use this instead, so it's impossible to get that wrong:
with open(outfile, 'w+') as file:
    # put things which use `file` here
The name file shadows a Python built-in, and the variable names you chose are unhelpful. I'd rename infile to inpath, outfile to outpath, and file to outfile. That way, you can tell whether each one is a path (i.e. a string) or a Python file object just from the variable name, and there's no risk of accessing file before you (re)define it and getting a very confusing error message.
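Putting that advice together, here's a rough sketch (untested, reusing dbpath and the glob pattern from your code) of how the loop might look with the clearer names and the safety assert:

import glob
import os

dbpath = '/home/university/Desktop/test'

for inpath in glob.glob(os.path.join(dbpath, 'G[D|E]/????/*.gz')):
    outpath = inpath.rsplit('.xaa.fastq.gz', 1)[0]
    # Dies loudly instead of silently clobbering the input (solution 3 above).
    assert outpath != inpath
    with open(outpath, 'w+') as outfile:
        # put things which use `outfile` here
        pass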
I am iterating directories and files inside of them while I modify in place each file. I am looking to have the new modified file being read right after.
Here is my code with descriptive comments:
# go through each directory based on their ids
for id in id_list:
    id_dir = os.path.join(ouput_dir, id)
    os.chdir(id_dir)
    # go through all files (with a specific extension)
    for filename in glob('*' + ext):
        # modify the file by replacing all new-line characters with an empty space
        with fileinput.FileInput(filename, inplace=True) as f:
            for line in f:
                print(line.replace('\n', ' '), end='')
            # here I would like to read the NEW modified file
            with open(filename) as newf:
                content = newf.read()
As it stands, the newf is not the new modified one, but instead the original f. I think I understand why that is, however I found it difficult to overcome that issue.
I can always do 2 separate iterations (go through each directory, modify every file with the given extension, and then repeat the whole walk to read each one), but I was hoping for a more efficient way around it. Perhaps it's possible to restart the inner for loop after the modification has taken place and then do the read (so at least the outer for loop isn't repeated).
Any ideas/designs on how to achieve the above in a clean and efficient way?
For me it works with this code:
#!/usr/bin/env python3
import os
from glob import glob
import fileinput
id_list=['1']
ouput_dir='.'
ext = '.txt'
# go through each directory based on their ids
for id in id_list:
    id_dir = os.path.join(ouput_dir, id)
    os.chdir(id_dir)
    # go through all files (with a specific extension)
    for filename in glob('*' + ext):
        # modify the file by replacing all new-line characters with an empty space
        for line in fileinput.FileInput(filename, inplace=True):
            print(line.replace('\n', ' '), end="")
        # here I would like to read the NEW modified file
        with open(filename) as newf:
            content = newf.read()
        print(content)
notice how I iterate over the lines! Exhausting the FileInput object in a plain for loop closes it, so the in-place replacement is finalized before the file is opened again for reading.
I am not saying that the way you are going about doing this is incorrect but I feel that you are overcomplicating it. Here is my super simple solution.
import glob

for filename in glob.glob('*' + ext):
    # instead of trying to modify in place, read the data in and strip the line endings
    with open(filename, 'rb') as f:
        f_in = [x.rstrip() for x in f.readlines()]
    # we then write the data stream back out
    with open(filename, 'wb') as f_out:
        # extra modification to the data can go here; I just remove the \r and \n and write back out
        for i in f_in:
            f_out.write(i)
    # now there is no need to read the data back in because we already have a static reference to it.
I'm looking for some help with my code, which is right below:
for file in file_name:
    if os.path.isfile(file):
        for line_number, line in enumerate(fileinput.input(file, inplace=1)):
            print file
            os.system("pause")
            if line_number == 1:
                line = line.replace('Object', '#Object')
            sys.stdout.write(line)
I wanted to modify some previously extracted files in order to plot them with matplotlib. To do so, I remove some lines and comment out some others.
My problem is the following:
Using for line_number, line in enumerate(fileinput.input(file, inplace=1)): gives me only 4 out of the 5 previously extracted files (even though the file_name list contains 5 references!)
Using for line_number, line in enumerate(file): gives me all 5 previously extracted files, BUT I don't know how to make modifications to the same file without creating another one...
Did you have an idea on this issue? Is this a normal issue?
There are a number of things that might help you.
Firstly, file_name appears to be a list of file names. It might be better named file_names, and then you could use file_name for each individual entry. You have verified that it does hold 5 entries.
The enumerate() function helps when iterating over a list of items, providing both an index and the item on each loop. This saves you having to keep a separate counter variable, e.g.

for index, item in enumerate(["item1", "item2", "item3"]):
    print index, item
would print:
0 item1
1 item2
2 item3
This is not really required here, though, as you have chosen to use the fileinput library, which is designed to take a list of files and iterate over all of the lines in all of the files in one single loop. As such you need to tweak your approach a bit. Assuming your list of files is called file_names, you would write something as follows:
# Keep only actual files in the file list
file_names = [file_name for file_name in file_names if os.path.isfile(file_name)]

# Iterate all lines in all files
for line in fileinput.input(file_names, inplace=1):
    if fileinput.filelineno() == 1:
        line = line.replace('Object', '#Object')
    sys.stdout.write(line)
The main point here is that it is better to pre-filter any non-filenames before passing the list to fileinput. I will leave it up to you to fix the output.
fileinput provides a number of functions to help you figure out which file or line number is currently being processed.
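For instance, a quick sketch of those helpers (file names made up):

import fileinput

for line in fileinput.input(['file_1.txt', 'file_2.txt']):
    # filename() is the file currently being read, filelineno() the line
    # number within it, and lineno() the cumulative count across all files.
    if fileinput.isfirstline():
        print fileinput.filename(), 'starts at cumulative line', fileinput.lineno()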
Assuming you're still having trouble, my typical approach is to open a file read-only, read its contents into a variable, close the file, build an edited copy of the contents, re-open the file for writing (wiping out the original), and finally write the edited contents.
I like this approach since I can simply change the file_name that gets written out if I want to test my edits without wiping out the original file.
Also, I recommend naming containers using plural nouns, like @Martin Evans suggests.
import os
file_names = ['file_1.txt', 'file_2.txt', 'file_3.txt', 'file_4.txt', 'file_5.txt']
file_names = [x for x in file_names if os.path.isfile(x)]  # see @Martin's answer again

for file_name in file_names:
    # Open read-only and put the contents into a list of line strings
    with open(file_name, 'r') as f_in:
        lines = f_in.read().splitlines()

    # Put the lines you want to write out in out_lines
    out_lines = []
    for index_no, line in enumerate(lines):
        if index_no == 1:
            out_lines.append(line.replace('Object', '#Object'))
        elif ...
        else:
            out_lines.append(line)

    # Uncomment to write to a different file name while testing edits
    # with open(file_name + '.out', 'w') as f_out:
    #     f_out.write('\n'.join(out_lines))

    # Write out the file, clobbering the original
    with open(file_name, 'w') as f_out:
        f_out.write('\n'.join(out_lines))
Downside with this approach is that each file needs to be small enough to fit into memory twice (lines + out_lines).
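If memory is a concern, one alternative is to stream line by line through a temporary file and swap it into place afterwards. Here's a rough sketch (untested) of that idea, reusing the file_names list from above:

import os
import tempfile

for file_name in file_names:
    out_dir = os.path.dirname(file_name) or '.'
    with open(file_name, 'r') as f_in:
        # Write the edited lines to a temp file in the same directory so
        # the final rename stays on one filesystem.
        with tempfile.NamedTemporaryFile('w', dir=out_dir, delete=False) as f_out:
            for index_no, line in enumerate(f_in):
                if index_no == 1:
                    line = line.replace('Object', '#Object')
                f_out.write(line)
    os.remove(file_name)              # needed on Windows before the rename
    os.rename(f_out.name, file_name)  # swap the edited copy into place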
Best of luck!
I'm not a programmer; I'm a pilot who has done just a little bit of scripting in a past life, so I'm completely non-current at this. I have searched the forum and found somewhat similar problems that, with more expertise and time I might be able to adapt to my problem, but I hope I can get closer by asking my own question. I hope my problem is unique enough that those considering answering do not feel their time is wasted, considering my disadvantage. Anyway here is my problem:
Some of my crew members periodically have a need to rename a few hundred to more than 1,000 small csv files based on a specific convention applied to their contents. Not all of the files are used in a given project, but any subset of them could be used, so automation makes a lot of sense here. Currently this is done manually as needed. I can easily move all these files into a single directory for processing, since all their file names are unique as received.
Here are representative excerpts from two example csv files, preceded by their respective file names (As I receive them):
A_13LSAT_2014-04-23_1431.csv:
1,KDAL CURLO RW13L SAT 20140414_0644,SID,N/A,DDI
2,*,RW13L(AER),SAT
3,RW13L(AER),+325123.36,-0965121.20,RW31R(DER),+325031.35,-0965020.95
4,1,1.2,+325123.36,-0965121.20,0.0,+325031.35,-0965020.95,2.0
3,RW31R(DER),+325031.35,-0965020.95,GH13L,+324947.23,-0964929.84
4,1,2.4,+325031.35,-0965020.95,0.0,+324947.23,-0964929.84,2.0
5,TTT,0,0
5,CVE,0,0
A_RROSEE_2014-04-03_1419.csv:
1,KDFW SEEVR STAR RRONY SEEVR 20140403_1340,STAR,N/A,DDI
2,*,RRONY,SEEVR
3,RRONY,+333455.16,-0952530.56,ROWZE,+333233.02,-0954016.52
4,1,12.6,+333455.16,-0952530.56,0.0,+333233.02,-0954016.52,2.0
5,EIC,0,1
5,SLR,0,0
I know these files are not code, but I entered them indented in this post so they would display properly.
The files must be renamed due to the 8.3 limitation of the platform they are used on.
The convention is:
• On the first line, the first two characters of the second word of the second "cell" (which are the 6th and 7th characters of that cell), and,
• on line 2, the first three characters of the third cell, and
• the first three characters of the fourth cell.
The contents and format of the files must remain unaltered. In theory this convention yields unique names for every file so duplication of file names should not be a problem.
The files above would be copied and renamed respectively to:
CURW1SAT.csv
SERROSEE.csv
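In code terms, the convention applied to the first example works out like this (a tiny sketch, with the cell values copied from the sample above):

line1_cell2 = "KDAL CURLO RW13L SAT 20140414_0644"  # 2nd cell of line 1
line2_cell3 = "RW13L(AER)"                          # 3rd cell of line 2
line2_cell4 = "SAT"                                 # 4th cell of line 2
print line1_cell2.split()[1][:2] + line2_cell3[:3] + line2_cell4[:3] + '.csv'
# -> CURW1SAT.csv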
That's it. Just a script that will scan a directory full of these csv files and create renamed copies in the same directory according to the convention I just described, based on their contents. I'm attempting to use ActiveState Python 2.7.7.
Thanks in advance for any consideration.
It's not what you'd call pretty, but neither am I; and it works (and it's simple)
import os
import glob

# grab every csv in the current folder
for filename in glob.glob("*.csv"):
    with open(filename, "r") as f:
        csv_file = f.readlines()
    # build the new name from pieces of the first two lines
    out = csv_file[0].split(",")[1].split(" ")[1][:2]
    out += csv_file[1].split(",")[2][:3]
    out += csv_file[1].split(",")[3][:3]
    # rename after the with block so the file is closed first
    os.rename(filename, out + ".csv")
just drop this in the folder with all the csv's to be renamed and run it
That is indeed not too complicated. Python has out of the box everything you need.
I don't think it's a good idea to rename the files: in case of an error (e.g. a name collision) renaming would make the process dangerous, while copying to another folder is safer.
The code could look like this:
import csv
import os
import os.path
import sys
import shutil
def Process(input_directory, output_directory, filename):
    """This method reads the file named 'filename' in input_directory and copies
    it to output_directory, renaming it."""
    # Read the file and extract the first 2 lines.
    with open(os.path.join(input_directory, filename), 'r') as csv_file:
        reader = csv.reader(csv_file, delimiter=',')
        line1 = reader.next()
        line2 = reader.next()
    line1_second_cell = line1[1]
    # split() separates words by spaces into a list, [1] takes the second.
    second_word = line1_second_cell.split()[1]
    line2_third_cell = line2[2]
    line2_fourth_cell = line2[3]
    # [:2] takes the first two characters from a string.
    new_filename = second_word[:2] + line2_third_cell[:3] + line2_fourth_cell[:3]
    new_filename += '.csv'
    print 'copying', filename, 'to', new_filename
    shutil.copyfile(
        os.path.join(input_directory, filename),
        os.path.join(output_directory, new_filename))

# sys.argv is the list of arguments passed on the command line.
if len(sys.argv) == 3:
    input_directory = sys.argv[1]
    output_directory = sys.argv[2]
    # os.listdir gives the names of all entries in the directory
    # (files and subdirectories, but not full paths).
    for filename in os.listdir(input_directory):
        if filename.endswith(".csv"):
            Process(input_directory, output_directory, filename)
else:
    print "Usage:", sys.argv[0], "source_directory target_directory"
On windows you can run it in a command line (cmd.exe):
C:\where_your_python_is\python.exe C:\where_your_script_is\renamer.py C:\input C:\output
On Linux it would be a little simpler, as the python binary is on the path:
python /where_your_script_is/renamer.py /input /output
Put this in a script, and when you run it, give it the directory name as an argument on the command line:
import csv
import sys
import os
def rename_csv_file(filename):
    global directory
    with open(filename, 'r') as csv_file:
        newfilename = str()
        rownum = 0
        filereader = csv.reader(csv_file, delimiter=',')
        for row in filereader:
            if rownum == 0:
                newfilename = row[1].split()[1][:2]
            elif rownum == 1:
                newfilename += row[2][:3]
                newfilename += row[3][:3]
                break
            rownum += 1
    newfilename += '.csv'
    newfullpath = os.path.join(directory, newfilename)
    os.rename(filename, newfullpath)

if len(sys.argv) < 2:
    print "Usage: {} directory_name".format(sys.argv[0])
    sys.exit()

directory = sys.argv[1]
csvfiles = [os.path.join(directory, f) for f in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, f)) and f.endswith('.csv')]
for f in csvfiles:
    rename_csv_file(f)
This assumes that every csv in your directory needs to be renamed. The code could be more condensed, but I tried to spell it out a bit so you could see what was going on.
import os
import csv
import shutil

# change this to the directory where your csvs are stored
dirname = r'C:\yourdirectory'
os.chdir(dirname)

for item in os.listdir(dirname):  # look through directory contents
    if item.endswith('.csv'):
        f = open(item)
        r = csv.reader(f)
        line1 = r.next()  # get the first line of the csv
        line2 = r.next()  # get the second line of the csv
        f.close()
        name1 = line1[1].split()[1][:2]  # first part: 2 chars of the second word
        name2 = line2[2][:3]  # second part
        name3 = line2[3][:3]  # third part
        newname = name1 + name2 + name3 + '.csv'
        shutil.copy2(os.path.join(dirname, item), newname)  # copied csv with newname