I need over 2000 dummy (txt) files for testing a recycle bin function. I've created the txt dummy files with the following code:
list = range(0, 2000)
vulling = list

with open("path/file.txt", "w") as f:
    for s in vulling:
        f.write(str(s) + "\n")

List = open("path/file.txt")
List2 = (s.strip() + ' dummy' for s in List)
for item in List2:
    open('path/%s.txt' % (item,), 'w')
But, since I can't upload empty files, I need to add content to those files. The content can be the same for all those files. For example: add a string "Spam" to every file. What would be the best solution for this?
The easiest thing to do is just to create the files with the content in them that you want to begin with:
import os.path

def create_test_files(target_dir, content, n=2000, template="file_%s.txt"):
    for i in xrange(n):
        path = os.path.join(target_dir, template % i)
        with open(path, 'w') as fh:
            fh.write(content)
        yield path

for file_name in create_test_files("/tmp/example", 'Spam'):
    print file_name
This is choosing file names for you, so if you need specific ones, you'll have to change it.
This is really quite fast. The other approach (create then copy) will result in having to read the original file 2000 times. Seeing as we already know the content we want in there, we can save that time.
Note: This solution uses a generator, so unless you force it to iterate (e.g. by putting it in a loop or a tuple), it won't generate any files.
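For instance, a minimal way to force the generator to run to completion without printing anything (reusing the create_test_files function above) is to build a list from it:

paths = list(create_test_files("/tmp/example", 'Spam'))  # creates all 2000 files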
My first post here but I've spent the last few weeks basically living on S.O., looking for answers and solutions. Unfortunately I have not found an answer to my current problem on here, or anywhere else, so I am hoping one of you lovely people can help me.
I am trying to batch process Autodesk Maya files from within Windows, replacing the file paths for references so they all point to one single directory. At present, it just throws back a "" after I try to execute the code.
Here is my code so far - please pick holes in it as much as you want, I need to get better!
### Reference Example
# file -rdi 2 -ns "platform3_MDL_MASTER" -rfn
# "Toyota_2000GT_Spider:Toyota2000GT_Spider_RN"
# -typ "mayaAscii"
# "C:/svn/TEST/previs/maya_references/Toyota_2000GT_Spider.ma";
import os

# Defining the reference paths - before and after
projectPath = "C:/projects/TEST/previs"
originalPath = "C:/projects/TEST/previs/maya_references/"
newPath = "R:/projects/FRAT/production/maya_references/"

# Makes a list of all previs files in the given directory.
previsFiles = [os.path.join(d, x)
               for d, dirs, files in os.walk(projectPath)
               for x in files if x.endswith("_PREVIS.ma")]

previsSuffix = '.ma";'
newLines = []

# Loops through each previs file found...
# for each line that contains the previsSuffix...
# and splits the filepath into a directory and a filename
# and then replaces that originalPath with the newPath.
for scene in previsFiles:
    with open(scene, "r") as fp:
        previsReadlines = fp.readlines()
        for line in previsReadlines:
            if previsSuffix in line:
                # Splits the directory from the file name.
                lines = os.path.split(line)
                newLines = line.replace(lines[0], newPath)
            else:
                break
    with open(scene, 'w') as fw:
        previsWritelines = fw.writelines()
You'll have to adjust this to work with your script, but I hope it gives you the general idea of how to replace a reference path with another.
The original code had 2 main issues:
1) You weren't actually changing the contents. Assigning to newLines doesn't re-assign anything in previsReadlines, so the changes weren't being recorded.
2) Nothing was being passed to write with. writelines needs a parameter: the contents you intend to write.
# Adjust these paths to existing maya scene files.
scnPath = "/path/to/file/to/open.ma"
oldPath = "/path/to/old/file.ma"
newPath = "/path/to/new/file.ma"

with open(scnPath, "r") as fp:
    # Get all file contents in a list.
    fileLines = fp.readlines()

# Use enumerate to keep track of what index we're at.
for i, line in enumerate(fileLines):
    # Check if the line has the old path in it.
    if oldPath in line:
        # Replace with the new path and assign the change.
        # Before you were assigning this to a new variable that goes nowhere.
        # Instead it needs to re-assign the line in the list we first read from.
        fileLines[i] = line.replace(oldPath, newPath)

# Completely replace the file with our changes.
with open(scnPath, 'w') as fw:
    # You must pass the contents in here to write it.
    fw.writelines(fileLines)
For every input file processed (see code below) I am trying to use "os.path.basename" to write to a new output file - I know I am missing something obvious...?
import os
import glob
import gzip

dbpath = '/home/university/Desktop/test'

for infile in glob.glob(os.path.join(dbpath, 'G[D|E]/????/*.gz')):
    print("current file is: " + infile)

    # The two lines in question:
    outfile = os.path.basename('/home/university/Desktop/test/G[D|E]/????/??????.xaa.fastq.gz').rsplit('.xaa.fastq.gz')[0]
    file = open(outfile, 'w+')

    gzsuppl = Chem.ForwardSDMolSupplier(gzip.open(infile))
    for m in gzsuppl:
        if m is None: continue
        ...etc
    file.close()
    print(count)
It is not clear to me how to capture the variable [0] (i.e. everything upstream of .xaa.fastq.gz) and use it as the basename for the new output file.
Unfortunately it simply writes the new output file as "??????" rather than the actual sequence of 6 letters.
Thanks for any help given.
This seems like it will get everything upstream of the .xaa.fastq.gz in the paths returned from glob() in your sample code:
import os

filepath = '/home/university/Desktop/test/GD /AAML/DEAAML.xaa.fastq.gz'
filepath = os.path.normpath(filepath)  # Changes path separators for Windows.

# This section was adapted from answer https://stackoverflow.com/a/3167684/355230
folders = []
while 1:
    filepath, folder = os.path.split(filepath)
    if folder:
        folders.append(folder)
    else:
        if filepath:
            folders.append(filepath)
        break
folders.reverse()

if len(folders) > 1:
    # The last element of folders should contain the original filename.
    filename_prefix = os.path.basename(folders[-1]).split('.')[0]
    outfile = os.path.join(*(folders[:-1] + [filename_prefix + '.rest_of_filename']))
    print(outfile)  # -> \home\university\Desktop\test\GD \AAML\DEAAML.rest_of_filename
Of course what ends up in outfile isn't the final path plus filename, since I don't know what the remainder of the filename will be and just put a placeholder in (the '.rest_of_filename').
I'm not familiar with the kind of input data you're working with, but here's what I can tell you:
The "something obvious" you're missing is that outfile has no connection to infile. Your outfile line produces ?????? rather than the actual filename because that's the literal string you passed in; it's glob.glob that turns a wildcard pattern into a list of real matches.
Here's how I'd write that aspect of the outfile line:
outfile = infile.rsplit('.xaa.fastq.gz', 1)[0]
(The , 1 ensures that it'll never split more than once, no matter how crazy a filename gets. It's just a good habit to get into when using split or rsplit like this.)
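A quick illustration of that maxsplit argument (the filename is hypothetical, built from the question's pattern):

name = '/home/university/Desktop/test/GD/AAML/DEAAML.xaa.fastq.gz'
prefix = name.rsplit('.xaa.fastq.gz', 1)[0]
# prefix is now '/home/university/Desktop/test/GD/AAML/DEAAML'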
You're also setting yourself up for a bug: the glob pattern can match *.gz files which don't end in .xaa.fastq.gz. A stray .gz file that happens to wind up in the folder listing would then leave outfile with the same path as infile, and you'd end up writing to the input file.
There are three solutions to this problem which apply to your use case:
Use *.xaa.fastq.gz instead of *.gz in your glob. I don't recommend this because it's easy for a typo to sneak in and make them different again, which would silently reintroduce the bug.
Write your output to a different folder than you took your input from, e.g.:
outfile = os.path.join(outpath, os.path.relpath(infile, dbpath))
outparent = os.path.dirname(outfile)
if not os.path.exists(outparent):
    os.makedirs(outparent)
Add an assert outfile != infile line so the program will die with a meaningful error message in the "this should never actually happen" case, rather than silently doing incorrect things.
The indentation of what you posted could be wrong, but it looks like you're opening a bunch of files, then only closing the last one. My advice is to use this instead, so it's impossible to get that wrong:
with open(outfile, 'w+') as file:
    # put things which use `file` here
The name file shadows a Python built-in, and the variable names you chose are unhelpful. I'd rename infile to inpath, outfile to outpath, and file to outfile. That way, you can tell whether each one is a path (i.e. a string) or a Python file object just from the variable name, and there's no risk of accessing file before you (re)define it and getting a very confusing error message.
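Putting those suggestions together might look something like the sketch below. This is illustrative, not a drop-in fix: the paths and the .xaa.fastq.gz suffix come from the question, but the gzip processing is left as a placeholder.

import glob
import gzip
import os

dbpath = '/home/university/Desktop/test'

for inpath in glob.glob(os.path.join(dbpath, 'G[D|E]/????/*.xaa.fastq.gz')):
    # Derive the output path from the input path, not from a fresh pattern.
    outpath = inpath.rsplit('.xaa.fastq.gz', 1)[0]
    assert outpath != inpath  # the "this should never happen" guard
    with open(outpath, 'w+') as outfile:
        with gzip.open(inpath) as gz:
            # process `gz` here (e.g. the ForwardSDMolSupplier loop)
            # and write results to `outfile`
            pass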
I am iterating over directories and the files inside them, modifying each file in place. I would like to read the newly modified file right after.
Here is my code with descriptive comments:
# go through each directory based on their ids
for id in id_list:
    id_dir = os.path.join(ouput_dir, id)
    os.chdir(id_dir)
    # go through all files (with a specific extension)
    for filename in glob('*' + ext):
        # modify the file by replacing all new-line characters with an empty space
        with fileinput.FileInput(filename, inplace=True) as f:
            for line in f:
                print(line.replace('\n', ' '), end='')
            # here I would like to read the NEW modified file
            with open(filename) as newf:
                content = newf.read()
As it stands, newf is not the new modified file but the original one. I think I understand why that is; however, I have found it difficult to overcome the issue.
I can always do 2 separate iterations (go through each directory based on their ids, modify all files with the specific extension, then repeat the whole iteration to read each one of them), but I was hoping there is a more efficient way around it. Perhaps the read could take place right after the modification, so at least the outer for loop isn't repeated.
Any ideas/designs on how to achieve the above in a clean and efficient way?
For me it works with this code:
#!/usr/bin/env python3
import os
from glob import glob
import fileinput

id_list = ['1']
ouput_dir = '.'
ext = '.txt'

# go through each directory based on their ids
for id in id_list:
    id_dir = os.path.join(ouput_dir, id)
    os.chdir(id_dir)
    # go through all files (with a specific extension)
    for filename in glob('*' + ext):
        # modify the file by replacing all new-line characters with an empty space
        for line in fileinput.FileInput(filename, inplace=True):
            print(line.replace('\n', ' '), end="")
        # here I would like to read the NEW modified file
        with open(filename) as newf:
            content = newf.read()
        print(content)
notice how I iterate over the lines!
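If you prefer to keep an explicit handle as in the original code, a variation on the same idea is to close the FileInput object before re-reading (FileInput objects have a close() method), which guarantees the in-place rewrite is finished:

f = fileinput.FileInput(filename, inplace=True)
for line in f:
    print(line.replace('\n', ' '), end='')
f.close()  # make sure the rewritten file is in place before re-reading
with open(filename) as newf:
    content = newf.read()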
I am not saying that the way you are going about doing this is incorrect but I feel that you are overcomplicating it. Here is my super simple solution.
from glob import glob

for filename in glob('*' + ext):
    # instead of trying to modify in place, we read the data in and strip the line endings
    f_in = [x.rstrip() for x in open(filename, 'rb').readlines()]
    with open(filename, 'wb') as f_out:  # we then write the data stream back out
        # extra modification to the data can go here; I just remove the \r and \n and write back out
        for i in f_in:
            f_out.write(i)
    # now there is no need to read the data back in, because we already have a static reference to it in f_in
So I am starting from scratch on a program that I haven't really seen replicated anywhere else. I'll describe exactly what I want it to do:
I have a list of strings that looks like this:
12482-2958
02274+2482
23381-3857
..........
I want to take each of these strings and search through a few dozen files (all named wds000.dat, wds005.dat, wds010.dat, etc) for matches. If one of them finds a match, I want to write that string to a new file, so in the end I have a list of strings that had matches.
If I need to be more clear about something, please let me know. Any help on where to start with this would be much appreciated. Thanks guys and gals!
Something like this should work
import os

#### your array ####
myarray = {"12482-2958", "02274+2482", "23381-3857"}

path = os.path.expanduser("path/to/myfile")
newpath = os.path.expanduser("path/to/myResultsFile")
filename = 'matches.data'
newf = open(os.path.join(newpath, filename), "w+")

#### Loops through every element in the above array ####
for element in myarray:
    elementstring = ''.join(element)
    #### opens the path where all of your .dat files are ####
    files = os.listdir(path)
    for f in files:
        if f.strip().endswith(".dat"):
            openfile = open(os.path.join(path, f), 'r')
            #### loops through every line in the file comparing the strings ####
            for line in openfile:
                if elementstring in line:
                    newf.write(line)
            openfile.close()

newf.close()
Define a function that gets a path and a string and checks for a match. You can use open(), find(), close().
Then just create all the paths in a for loop; for every path, check all the strings with the function and print to a file if needed.
Not explained much... need more explanation? See the sketch below.
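A minimal sketch of that recipe (data_dir, strings and matches.txt are all illustrative names, not from the question):

import os

def file_contains(path, needle):
    # open(), find() and close() via the with-statement
    with open(path) as f:
        return f.read().find(needle) != -1

data_dir = 'path/to/dat/files'  # folder holding the wds*.dat files
strings = ['12482-2958', '02274+2482', '23381-3857']

with open('matches.txt', 'w') as out:
    for s in strings:
        for name in os.listdir(data_dir):
            if name.endswith('.dat') and file_contains(os.path.join(data_dir, name), s):
                out.write(s + '\n')
                break  # one match is enough for this string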
Not so pythonic, and it probably has something to straighten out, but this is pretty much the logic to follow:
from glob import glob

strings = ['12482-2958', ...]  # your strings
output = []

for file in glob('wds*.dat'):
    with open(file, 'r') as f:
        for line in f.readlines():
            for subs in strings:
                if subs in line:
                    output.append(line)

print(output)
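To get the file of matches the question asks for, the collected lines could then be written out (a small follow-up sketch; matches.txt is a made-up name):

with open('matches.txt', 'w') as out:
    out.writelines(output)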
I'm looking for some help with my code, which is right below:
for file in file_name:
    if os.path.isfile(file):
        for line_number, line in enumerate(fileinput.input(file, inplace=1)):
            print file
            os.system("pause")
            if line_number == 1:
                line = line.replace('Object', '#Object')
            sys.stdout.write(line)
I wanted to modify some previously extracted files in order to plot them with matplotlib. To do so, I remove some lines and comment out some others.
My problem is the following :
Using for line_number, line in enumerate(fileinput.input(file, inplace=1)): gives me only 4 out of 5 previously extracted files (even though the file_name list contains 5 references!)
Using for line_number, line in enumerate(file): gives me all 5 previously extracted files, BUT I don't know how to make modifications to the same file without creating another one...
Do you have any ideas about this issue? Is this normal behaviour?
There are a number of things that might help you.
Firstly file_name appears to be a list of file names. It might be better named file_names and then you could use file_name for each one. You have verified that this does hold 5 entries.
The enumerate() function helps when iterating over a list of items by providing both an index and the item on each loop. This saves you having to use a separate counter variable, e.g.
for index, item in enumerate(["item1", "item2", "item3"]):
    print index, item
would print:
0 item1
1 item2
2 item3
This is not really required here, as you have chosen to use the fileinput library, which is designed to take a list of files and iterate over all of the lines in all of the files in one single loop. As such you need to tweak your approach a bit. Assuming your list of files is called file_names, you can write something as follows:
# Keep only files in the file list
file_names = [file_name for file_name in file_names if os.path.isfile(file_name)]

# Iterate all lines in all files
for line in fileinput.input(file_names, inplace=1):
    if fileinput.filelineno() == 1:
        line = line.replace('Object', '#Object')
    sys.stdout.write(line)
The main point here is that it is better to pre-filter any non-filenames before passing the list to fileinput. I will leave it up to you to fix the output.
fileinput provides a number of functions to help you figure out which file or line number is currently being processed.
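For instance (a small sketch reusing the file_names list above; all of these helpers are part of the standard fileinput module):

import fileinput
import sys

for line in fileinput.input(file_names, inplace=1):
    # fileinput.filename()    -> path of the file currently being read
    # fileinput.filelineno()  -> line number within the current file
    # fileinput.lineno()      -> cumulative line number across all files
    if fileinput.isfirstline():  # equivalent to filelineno() == 1
        line = line.replace('Object', '#Object')
    sys.stdout.write(line)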
Assuming you're still having trouble, my typical approach is to open a file read-only, read its contents into a variable, close the file, build the edited contents in a second variable, open the file for writing (wiping out the original file), and finally write the edited contents.
I like this approach since I can simply change the file_name that gets written out if I want to test my edits without wiping out the original file.
Also, I recommend naming containers using plural nouns, like @Martin Evans suggests.
import os

file_names = ['file_1.txt', 'file_2.txt', 'file_3.txt', 'file_4.txt', 'file_5.txt']
file_names = [x for x in file_names if os.path.isfile(x)]  # see @Martin's answer again

for file_name in file_names:
    # Open read-only and put contents into a list of line strings
    with open(file_name, 'r') as f_in:
        lines = f_in.read().splitlines()

    # Put the lines you want to write out in out_lines
    out_lines = []
    for index_no, line in enumerate(lines):
        if index_no == 1:
            out_lines.append(line.replace('Object', '#Object'))
        elif ...
        else:
            out_lines.append(line)

    # Uncomment to write to a different file name for edits testing
    # with open(file_name + '.out', 'w') as f_out:
    #     f_out.write('\n'.join(out_lines))

    # Write out the file, clobbering the original
    with open(file_name, 'w') as f_out:
        f_out.write('\n'.join(out_lines))
The downside of this approach is that each file needs to be small enough to fit into memory twice (lines + out_lines).
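If that memory cost ever matters, one alternative (a sketch assuming Python 3.3+ for os.replace; tempfile is in the standard library) is to stream line by line through a temporary file and swap it in at the end:

import os
import tempfile

def edit_file_streaming(file_name):
    # Only one line is held in memory at a time.
    dir_name = os.path.dirname(file_name) or '.'
    with open(file_name, 'r') as f_in, \
         tempfile.NamedTemporaryFile('w', dir=dir_name, delete=False) as f_out:
        for index_no, line in enumerate(f_in):
            if index_no == 1:
                line = line.replace('Object', '#Object')
            f_out.write(line)
        tmp_name = f_out.name
    os.replace(tmp_name, file_name)  # atomically clobber the original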
Best of luck!