How to open a list of files in Python

I'm reading a data file (text) and generating a number of reports, each of which is written to a different output file (also text). I'm opening them the long way:
fP = open('file1','w')
invP = open('inventory','w')
orderP = open('orders','w')
... and so on, with a corresponding group of close() lines at the end.
If I could open them with a for loop, using a list of fP names and file names, I could guarantee closing the same files.
I tried using a dictionary of fp:filename, but that [obviously] didn't work, because either the fP variable is undefined, or a string 'fP' isn't a good file object name.
Since these are output files, I probably don't need to check for open errors - if I can't open one or more, I can't go on anyway.
Is there any way to open a group of files (not more than 10 or so) from a list of names, in a loop?

Good news! Python 3.3 brings in a standard safe way to do this:
contextlib.ExitStack
From the docs:
Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed.
(...)
Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested with statements had been used with the registered set of callbacks.
Here's an example of how to use it:
from contextlib import ExitStack

with ExitStack() as stack:
    files = [
        stack.enter_context(open(filename))
        for filename in filenames
    ]
    # ... use files ...
When the code leaves the with statement, all files that have already been opened will be closed.
This way you also know that if two files get opened and then the third file fails to open, the two already-opened files will be closed correctly. And if an exception is raised anywhere inside the with block, you'll see correct cleanup.
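Applied to the question's report files, that might look like the following sketch (the names and writes here are placeholders, not the asker's actual reports):
from contextlib import ExitStack

# hypothetical report names, mirroring the question's file1/inventory/orders
output_names = ['file1', 'inventory', 'orders']

with ExitStack() as stack:
    # map each name to an open file object; ExitStack closes them all on exit
    outputs = {name: stack.enter_context(open(name, 'w'))
               for name in output_names}
    outputs['inventory'].write('10 widgets\n')
    outputs['orders'].write('order #1\n')
# all three files are closed here, even if an exception was raised above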

Yes, you can use a dict comprehension:
filenames = ['file1.txt', 'file2.txt', 'file3.txt']  # ...and so on
filedata = {filename: open(filename, 'w') for filename in filenames}
Now all of the open file objects are saved in filedata, keyed by the name of the file.
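For example, you could then write to any report through the mapping (a minimal sketch; the content is made up):
filedata['file1.txt'].write('first report\n')
filedata['file2.txt'].write('second report\n')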
To close them:
for file in filedata.values():
    file.close()

Since you say there are many data files, instead of entering the filenames into a list manually, you can gather them into a list like this:
from os import listdir
from os.path import isfile, join

path = '/home/cam/Desktop'
files_in_dir = [f for f in listdir(path) if isfile(join(path, f))]
Now you can
for filename in files_in_dir:
    with open(join(path, filename), 'w') as f:
        ...  # do something with f; join the path, since listdir gives bare names

Use the with keyword to guarantee that opened files (and other similar resources, known as "context managers") are closed:
with open(file_path, 'w') as output_file:
    output_file.write('whatever')
Upon exiting the with block, the file will be properly closed -- even if an exception occurs.
You could easily loop over a list of paths to the desired files:
files = ['fp1.txt', 'inventory', 'orders']
for file in files:
    with open(file, 'w') as current_file:
        current_file.do_some_stuff()

You can open as many files as you want and keep them in a list to close them later:
fds = [open(path, 'w') for path in paths]
# ... do stuff with files
# close files
for fd in fds:
    fd.close()
Or you could use a dictionary for better readability:
# map each file name to a file descriptor
files = {path: open(path, 'w') for path in paths}
# use file descriptors through the mapping
files['inventory'].write("Something")
# close files
for path in files:
    files[path].close()
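If you also want the files to be closed even when a write raises an exception, one option is a try/finally around the work (a sketch; note that the ExitStack approach in the first answer additionally covers the case where one of the open() calls itself fails partway through):
files = {path: open(path, 'w') for path in paths}
try:
    files['inventory'].write("Something")
    # ... more writes ...
finally:
    # runs even if a write above raises, so no file object is leaked
    for f in files.values():
        f.close()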

Both answers above are good if you know or can define the list of files ahead of time. But in case you want a more generic solution, you can build that list just in time: use your OS to create the empty files on disk (this is done in different ways depending on your OS), then build the list of filenames interactively, this way:
import os
working_folder = input("Enter full path for the working folder/directory: ")
os.chdir(working_folder)
filenames_list = os.listdir()
#you can filter too, if you need so:
#filenames_list = [filename for filename in os.listdir() if '.txt' in filename]
#then you can do as Reut Sharabani and A.J. suggest above and make a list of file descriptors
file_descriptors = [open(filename, 'w') for filename in filenames_list]
#or a dictionary as Reut Sharabani suggests (I liked that one Reut :)
#do whatever you need to do with all those files already opened for writing
#then close the files
for fd in file_descriptors:
    fd.close()
It is OK to use "with", as some suggest, if you work with only one file at a time (from start to finish); but if you want to work with all the files at the same time, a list or dictionary of open file objects is better.

Related

Creating one csv file containing data from many csv files

I have a folder full of csv files that contain results for different participants in an experiment. I'm trying to create one large csv file containing all the csv files in the directory. I'm using listdir() to create a list of all the files but I haven't been able to open the individual files in this list. I think I need to loop over each file but I haven't been able to figure out how.
This is the code I've come up with so far. I get the following error: FileNotFoundError: [Errno 2] No such file or directory: 'results_262.csv' because it appears that the loop only reads one file in the files variable even though there should be many.
from os import listdir
path = "results"
files = listdir(path)
print(files)
results = open("results.csv", "w")
data = []
for file in files:
    participant = open(f"{file}", "r")
    data.append(participant.readlines())
Would anyone be able to help?
Your issue is that listdir() returns the filenames without their path information, so you need to append the filename to the path.
import os
...
for file in files:
    participant = open(os.path.join(path, file), "r")
    ...
Other details:
f"{file}" is the same thing as just file. I.e., open(file, "r") is equivalent to open(f"{file}", "r") but more efficient: there is no need to use string interpolation when you already have the value you want in a variable.
And, you're not closing your files. You should add participant.close() in your loop, or, even better, use a context manager:
for file in files:
    with open(os.path.join(path, file), "r") as participant:
        ...
The with puts your file handle in a context manager, and the file gets closed automatically when you leave the scope of the context manager.
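Putting those pieces together, a minimal sketch of the whole merge might look like this (it assumes every file in the results folder is a CSV you want to include and simply concatenates lines; if each file carries its own header row, you would also need to skip it for all but the first file):
import os

path = "results"

with open("results.csv", "w") as results:
    for file in os.listdir(path):
        with open(os.path.join(path, file), "r") as participant:
            results.writelines(participant.readlines())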

Remove auto-generated __MACOSX folder from inside a zip file in Python

I have zip files uploaded by clients through a web server that sometimes contain pesky __MACOSX directories inside that gum things up. How can I remove these?
I thought of using ZipFile, but this answer says that isn't possible and gives this suggestion:
Read out the rest of the archive and write it to a new zip file.
How can I do this with ZipFile? Another Python based alternative like shutil or something similar would also be fine.
The examples below are designed to determine whether a '__MACOSX' entry is contained within a zip file. If this pest exists, a new zip archive is created and all the files that are not __MACOSX files are written to it. This code can be extended to cover .DS_Store files too. Please let me know if you need to delete the old zip file and replace it with the new, clean one.
Hopefully, these answers help you solve your issue.
Example One
from zipfile import ZipFile

original_zip = ZipFile('original.zip', 'r')
new_zip = ZipFile('new_archive.zip', 'w')
for item in original_zip.infolist():
    buffer = original_zip.read(item.filename)
    if not item.filename.startswith('__MACOSX/'):
        new_zip.writestr(item, buffer)
new_zip.close()
original_zip.close()
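ZipFile objects are also context managers, so Example One can be written with a with statement that closes both archives automatically (a sketch of the same logic):
from zipfile import ZipFile

with ZipFile('original.zip', 'r') as original_zip, \
        ZipFile('new_archive.zip', 'w') as new_zip:
    for item in original_zip.infolist():
        buffer = original_zip.read(item.filename)
        if not item.filename.startswith('__MACOSX/'):
            new_zip.writestr(item, buffer)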
Example Two
import os
from zipfile import ZipFile

def check_archive_for_bad_filename(file):
    zip_file = ZipFile(file, 'r')
    for filename in zip_file.namelist():
        print(filename)
        if filename.startswith('__MACOSX/'):
            return True

def remove_bad_filename_from_archive(original_file, temporary_file):
    zip_file = ZipFile(original_file, 'r')
    for item in zip_file.namelist():
        buffer = zip_file.read(item)
        if not item.startswith('__MACOSX/'):
            if not os.path.exists(temporary_file):
                new_zip = ZipFile(temporary_file, 'w')
                new_zip.writestr(item, buffer)
                new_zip.close()
            else:
                append_zip = ZipFile(temporary_file, 'a')
                append_zip.writestr(item, buffer)
                append_zip.close()
    zip_file.close()
archive_filename = 'old.zip'
temp_filename = 'new.zip'

results = check_archive_for_bad_filename(archive_filename)
if results:
    print('Removing MACOSX file from archive.')
    remove_bad_filename_from_archive(archive_filename, temp_filename)
else:
    print('No MACOSX file in archive.')
The idea would be to use ZipFile to extract the contents into some defined folder, then remove the __MACOSX entry (os.rmdir, os.remove), and then compress it again.
Depending on whether you have the zip command on your OS, you might be able to skip the re-compressing part. You could also drive that command from Python using os.system or the subprocess module.
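A minimal sketch of that extract/clean/re-zip idea, using shutil.rmtree rather than os.rmdir since __MACOSX is usually non-empty (the file and folder names are placeholders):
import os
import shutil
from zipfile import ZipFile

extract_dir = 'unzipped'

with ZipFile('original.zip', 'r') as zf:
    zf.extractall(extract_dir)

# remove the offending directory if present (rmtree handles non-empty dirs)
macosx_dir = os.path.join(extract_dir, '__MACOSX')
if os.path.isdir(macosx_dir):
    shutil.rmtree(macosx_dir)

# re-compress the cleaned tree; make_archive appends '.zip' itself
shutil.make_archive('cleaned', 'zip', extract_dir)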

Python Delete Files in Directory from list in Text file

I've searched through many answers on deleting multiple files based on certain parameters (e.g. all txt files). Unfortunately, I haven't seen anything where one has a longish list of files saved to a .txt (or .csv) file and wants to use that list to delete files from the working directory.
I have my current working directory set to where the .txt file is (text file with list of files for deletion, one on each row) as well as the ~4000 .xlsx files. Of the xlsx files, there are ~3000 I want to delete (listed in the .txt file).
This is what I have done so far:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
list = open('DeleteFiles.txt')
for f in list:
    os.remove(f)
This gives me the error:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'Test1.xlsx\n'
I feel like I'm missing something simple. Any help would be greatly appreciated!
Thanks
1. Strip the ending '\n' from each line read from the text file;
2. Make an absolute path by joining path with the file name;
3. Do not overwrite Python types (i.e., in your case, list);
4. Close the text file, or use with open('DeleteFiles.txt') as flist.
EDIT: Actually, upon looking at your code, due to os.chdir(path), the second point may not be necessary.
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
flist = open('DeleteFiles.txt')
for f in flist:
    fname = f.rstrip()  # or, depending on the situation: f.rstrip('\n')
    # or, if you get rid of os.chdir(path) above:
    # fname = os.path.join(path, f.rstrip())
    if os.path.isfile(fname):  # this makes the code more robust
        os.remove(fname)

# also, don't forget to close the text file:
flist.close()
As Henry Yik pointed out in the comments, you need to pass the full path to the os.remove function. Also, the open function just returns a file object; you need to read the lines from the file. And don't forget to close the file. A solution would be:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
# added the argument "r" to indicate read-only mode
list_file = open('DeleteFiles.txt', "r")
# renamed the variable list to _list so it does not shadow
# the built-in type list
_list = list_file.read().splitlines()
list_file.close()

for f in _list:
    os.remove(os.path.join(path, f))
A further improvement would be to use a with block, which "automagically" closes the file for us, and a list comprehension instead of the loop:
with open('DeleteFiles.txt', "r") as list_file:
    _list = list_file.read().splitlines()

[os.remove(os.path.join(path, f)) for f in _list]
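If you prefer to avoid a list comprehension used purely for its side effects, a plain loop inside the same with block works too (a sketch, reusing the path variable from above):
with open('DeleteFiles.txt', "r") as list_file:
    for line in list_file:
        os.remove(os.path.join(path, line.rstrip()))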

Iterating over text files in a subdirectory

How do I iterate over text files only within a directory? What I have thus far is:
import glob

for file in glob.glob('*'):
    f = open(file)
    text = f.read()
    f.close()
This works; however, I have to store my .py file in the same directory (folder) to get it to run, and as a result the iteration includes the .py file itself. Ideally, what I want to ask for is either:
"Look in this subdirectory/folder, and iterate over all the files in there"
OR...
"Look through all files in this directory and iterate over those with .txt extension"
I'm sure I'm asking for something fairly straightforward, but I do not know how to proceed. It's probably worth highlighting that I got to the glob module through trial and error, so if this is the wrong way to go about this particular task, feel free to correct me! Thanks.
The glob.glob function actually takes a globbing pattern as its parameter.
For instance, "*.txt" will match the files whose names end with .txt.
Here is how you can use it:
for file in glob.glob("*.txt"):
    f = open(file)
    text = f.read()
    f.close()
If however you want to exclude some specific files, say .py files, this is not directly supported by globbing's syntax, as explained here.
In that case, you'll need to get those files, and manually exclude them:
pythonFiles = glob.glob("*.py")
otherFiles = [f for f in glob.glob("*") if f not in pythonFiles]
glob.glob() uses the same wildcard pattern matching as your standard unix-like shell. The pattern can be used to filter on extensions of course:
# this will list all ".py" files in the current directory
>>> glob.glob("*.py")
['__init__.py', 'manage.py', 'fabfile.py', 'fixmig.py']
but it can also be used to explore a given path, relative:
>>> glob.glob("../*")
['../etc', '../docs', '../setup.sh', '../tools', '../project', '../bin', '../pylint.html', '../sql']
or absolute:
>>> glob.glob("/home/bruno/Bureau/mailgun/*")
['/home/bruno/Bureau/mailgun/Domains_ Verify - Mailgun.html', '/home/bruno/Bureau/mailgun/Domains_ Verify - Mailgun_files']
And you can of course do both at once:
>>> glob.glob("/home/bruno/Bureau/*.pdf")
['/home/bruno/Bureau/marvin.pdf', '/home/bruno/Bureau/24-pages.pdf', '/home/bruno/Bureau/alice-in-wonderland.pdf']
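Combining a subdirectory with the .txt filter, plus a with statement for safe closing, might look like this (the folder name is a placeholder):
import glob
import os

for path in glob.glob(os.path.join("my_subdirectory", "*.txt")):
    with open(path) as f:
        text = f.read()
        # ... process text ...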
The solution is very simple.
for file in glob.glob('*'):
    if not file.endswith('.txt'):
        continue
    f = open(file)
    text = f.read()
    f.close()

Taking data from files which are in a folder

How do I get the data from multiple txt files that are placed in a specific folder? I started with this and could not fix it. It gives an error like No such file or directory: '.idea' (??)
(Let's say I have an A folder and in that, there are x.txt, y.txt, z.txt and so on. I am trying to get and print the information from all the files x,y,z)
import os

def find_get(folder):
    for file in os.listdir(folder):
        f = open(file, 'r')
        for data in open(file, 'r'):
            print data

find_get('filex')
Thanks.
If you just want to print each line:
import glob
import os
def find_get(path):
    for f in glob.glob(os.path.join(path, "*.txt")):
        with open(f) as data:  # glob already returns the joined path
            for line in data:
                print(line)
glob will find only your .txt files in the specified path.
Your error comes from not joining the path to the filename: unless the file is in the same directory you run the code from, Python will not be able to find it without the full path. Another issue is that you seem to have a directory .idea, which would also give you an error when you try to open it as a file. This also presumes you actually have permission to read the files in the directory.
If your files were larger I would avoid reading all into memory and/or storing the full content.
First of all make sure you add the folder name to the file name, so you can find the file relative to where the script is executed.
To do so you want to use os.path.join, which, as its name suggests, joins paths. So, using a generator:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield f.read()

# this consumes the generator into a list
files_data = list(find_get('filex'))
See what we got in the list that consumed the generator:
print files_data
It may be more convenient to produce tuples which can be used to construct a dict:
def find_get(folder):
    for filename in os.listdir(folder):
        relative_file_path = os.path.join(folder, filename)
        with open(relative_file_path) as f:
            # read() gives the entire data from the file
            yield (relative_file_path, f.read())

# this consumes the generator into a dict
files_data = dict(find_get('filex'))
You will now have a mapping from each file's path to its content.
Also, take a look at the answer by @Padraic Cunningham. He brought up the glob module, which is well suited to this case.
The error you're facing is simple: listdir returns filenames, not full pathnames. To turn them into pathnames you can access from your current working directory, you have to join them to the directory path:
for filename in os.listdir(directory):
    pathname = os.path.join(directory, filename)
    with open(pathname) as f:
        ...  # do stuff
So, in your case, there's a file named .idea in the folder directory, but you're trying to open a file named .idea in the current working directory, and there is no such file.
There are at least four other potential problems with your code that you also need to think about and possibly fix after this one:
You don't handle errors. There are many very common reasons you may not be able to open and read a file: it may be a directory, you may not have read access, it may be exclusively locked, it may have been moved since your listdir, etc. And those aren't logic errors in your code or user errors in specifying the wrong directory; they're part of the normal flow of events, so your code should handle them, not just die. Which means you need a try statement (see the sketch after this list).
You don't do anything with the files but print out every line. Basically, this is like running cat folder/* from the shell. Is that what you want? If not, you have to figure out what you want and write the corresponding code.
You open the same file twice in a row, without closing in between. At best this is wasteful, at worst it will mean your code doesn't run on any system where opens are exclusive by default. (Are there such systems? Unless you know the answer to that is "no", you should assume there are.)
You don't close your files. Sure, the garbage collector will get to them eventually--and if you're using CPython and know how it works, you can even prove the maximum number of open file handles that your code can accumulate is fixed and pretty small. But why rely on that? Just use a with statement, or call close.
However, none of those problems are related to your current error. So, while you have to fix them too, don't expect fixing one of them to make the first problem go away.
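A sketch folding the first and fourth points above into the loop (Python 3 assumed; the error handling is deliberately minimal):
import os

def find_get(folder):
    for filename in os.listdir(folder):
        pathname = os.path.join(folder, filename)
        try:
            with open(pathname) as f:
                for line in f:
                    print(line, end='')
        except OSError as e:
            # e.g. a directory, a permission error, or a file moved
            # between listdir() and open(); report it and keep going
            print('skipped {}: {}'.format(pathname, e))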
Full variant:
import os

def find_get(path):
    files = {}
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            with open(os.path.join(path, file), "r") as data:
                files[file] = data.read()
    return files

print(find_get("filex"))
Output:
{'1.txt': 'dsad', '2.txt': 'fsdfs'}
After that you could generate one file from that content, etc.
Key things:
os.listdir returns a list of names without the full path, so you need to join the initial path with each found item before operating on it;
a dict is ideal for collecting the results here :)
os.listdir returns both files and folders, so you need to check whether a list item is really a file.
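The "generate one file from that content" step mentioned above could be as simple as this (a sketch; the output filename is made up):
contents = find_get("filex")
with open("combined.txt", "w") as out:
    for name in sorted(contents):
        out.write(contents[name])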
You should check that the file is actually a file and not a folder, since you can't open folders for reading. Also, you can't just open the bare filename, since the file is under a folder; you should build the correct path with os.path.join. Check below:
import os

def find_get(folder):
    for file in os.listdir(folder):
        if not os.path.isfile(os.path.join(folder, file)):
            continue  # skip other directories
        f = open(os.path.join(folder, file), 'r')
        for line in f:
            print line
