Python - Opening successive files without physically opening every one

If I want to read a number of files in Python 3.2, say 30-40, and keep the file references in a list
(all the files are in a common folder),
is there any way I can open all the files to their respective file handles in the list, without having to individually open each one via the open() function?

This is simple: just use a list comprehension based on your list of file paths. Or, if you only need to access them one at a time, use a generator expression to avoid having all forty files open at once (provided you close each file when you're done with it).
list_of_filenames = ['/foo/bar', '/baz', '/tmp/foo']
open_files = [open(f) for f in list_of_filenames]
If you want handles on all the files in a certain directory, use the os.listdir function:
import os
open_files = [open(f) for f in os.listdir(some_path)]
I've assumed a simple, flat directory here, but note that os.listdir returns the names of all entries in the given directory, whether they are "real" files or directories. So if you have directories within the directory you're opening, you'll want to filter the results using os.path.isfile:
import os
open_files = [open(f) for f in os.listdir(some_path) if os.path.isfile(f)]
Also, os.listdir only returns the bare filename, rather than the whole path, so if the current working directory is not some_path, you'll want to make absolute paths using os.path.join.
import os
open_files = [open(os.path.join(some_path, f))
              for f in os.listdir(some_path)
              if os.path.isfile(os.path.join(some_path, f))]
With a generator expression:
import os
all_files = (open(os.path.join(some_path, f))
             for f in os.listdir(some_path))  # note () instead of []
for f in all_files:
    pass  # do something with the open file here
In all cases, make sure you close the files when you're done with them. If you can upgrade to Python 3.3 or higher, I recommend you use a contextlib.ExitStack for one more level of convenience.
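A minimal sketch of that ExitStack approach, reusing list_of_filenames from above:
from contextlib import ExitStack

with ExitStack() as stack:
    # enter_context() registers each file with the stack, so every file
    # is closed when the with-block exits, even if an exception is raised
    open_files = [stack.enter_context(open(f)) for f in list_of_filenames]
    # ... work with open_files here ...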

The os library (and listdir in particular) should provide you with the basic tools you need:
import os
print("\n".join(os.listdir())) # returns all of the files (& directories) in the current directory
Obviously you'll want to call open with them, but this gives you the files in an iterable form (which I think is the crux of the issue you're facing). At this point you can just do a for loop and open them all (or some of them).
Quick caveat: Jon Clements pointed out in the comments of Henry Keiter's answer that you should watch out for directories, which will show up in os.listdir along with files.
Additionally, this is a good time to write in some filtering statements to make sure you only try to open the right kinds of files. You might be thinking you'll only ever have .txt files in a directory now, but someday your operating system (or users) will have a clever idea to put something else in there, and that could throw a wrench in your code.
Fortunately, a quick filter can do that, and you can do it a couple of ways (I'm just going to show a regex filter):
import os, re

scripts = re.compile(r".*\.py$")  # raw string, so \. isn't treated as an invalid escape
files = [open(x, 'r') for x in os.listdir() if os.path.isfile(x) and scripts.match(x)]
files = map(lambda x: x.read(), files)
print("\n".join(files))
Note that I'm not checking things like whether I have permission to access the file, so if I can see the file in the directory but don't have permission to read it, I'll hit an exception.
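If you want to guard against that, here's a sketch of one option (PermissionError exists on Python 3.3+; on older versions you'd catch IOError and inspect errno):
import os, re

scripts = re.compile(r".*\.py$")
files = []
for x in os.listdir():
    if not (os.path.isfile(x) and scripts.match(x)):
        continue
    try:
        files.append(open(x, 'r'))
    except PermissionError:
        # visible in the listing, but not readable by us: skip it
        pass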

Related

How to check whether file name contains a specific character with Pathlib

Since it looks like Pathlib is the future, I'm trying to refactor some of my code to switch from my previous use of os to Pathlib. I'm stuck with the following problem. Since I work on a Mac, sometimes the folders contain hidden files preceded by a period (.DS_Store, or names from deleted files preceded by ._). That gets me into a lot of problems when I loop through files in a directory that have a certain extension. To avoid this problem using os.walk, I do the following:
for root, dirs, files in os.walk(DIR_NAME):
    # iterate all files
    for file in files:
        if file.endswith(ext):
            if file.startswith("."):
                continue
            # do something with the file
I know we have the .stem and .suffix attributes to manipulate file names with Pathlib, but I don't see how they can help with this problem. The .startswith seems more intuitive, but alas it does not seem to be available in Pathlib. So, the question is, how would one go about doing this in Pathlib?
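For what it's worth, Path.name is an ordinary string, so str.startswith does work on it. A minimal sketch of the loop above rewritten with Pathlib (rglob, available since Python 3.5, recurses like os.walk; DIR_NAME and ext are the same variables as in the question):
from pathlib import Path

for file in Path(DIR_NAME).rglob("*" + ext):
    # .name is a plain str, so startswith is available after all
    if file.name.startswith("."):
        continue
    # do something with the file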

How to save os.listdir output as list

I am trying to save all entries of os.listdir("./oldcsv") separately in a list but I don't know how to manipulate the output before it is processed.
What I am trying to do is generate a list containing the absolute pathnames of all *.csv files in a folder, which can later be used to easily manipulate those files' contents. I don't want to put lots of hardcoded pathnames in the script, as it is annoying and hard to read.
import os
for file in os.listdir("./oldcsv"):
    if file.endswith(".csv"):
        print(os.path.join("./oldcsv", file))
Normally I would use a loop with .append but in this case I cannot do so, since os.listdir just seems to create a "blob" of content. Probably there is an easy solution out there, but my brain won't think of it.
There's a glob module in the standard library that can solve your problem with a single function call:
import glob
csv_files = glob.glob("./*.csv") # get all .csv files from the working dir
assert isinstance(csv_files, list)
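Since the question asked for the absolute pathnames of the files in ./oldcsv specifically, a small variation (a sketch) handles that:
import glob
import os

# point glob at the folder and make each match absolute
csv_files = [os.path.abspath(p) for p in glob.glob("./oldcsv/*.csv")]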

Iterating over .wav files in subdirectories of parent directory

Cheers everybody,
I need help with something in Python 3.6 exactly. So I have a structure of data like this:
|main directory
| |subdirectories
| | |.wav files
I'm currently working from the directory where the main directory is placed, so I don't need to specify paths before that. So firstly I want to iterate over my main directory and find all subdirectories. Then in each of them I want to find the .wav files; when done processing them, I want to go to the next subdirectory, and so on until all of them are opened and all .wav files are processed. What exactly I want to do with those .wav files is input them into my program and process them so I can convert them to numpy arrays, and then convert that numpy array into some other object (working with TensorFlow to be exact, and I want to convert to a TF object). I wrote out the whole process in case anybody has any quick advice on doing that too, so why not.
I tried doing it with for loops like:
for subdirectorys in open(data_path, "r"):
    for files in subdirectorys:
        pass  # doing some processing stuff with the file
The problem is that it always raises error 13, Permission denied, pointing at the data_path I gave it, but when I go to the properties there everything seems okay and all the permissions are fine.
I tried some other ways, like os.open, or I replaced the for loop with:
with open(data_path, "r") as data:
and it always raises the permission denied error.
os.walk works in some way, but it's not what I need, and when I tried to modify it, it didn't give errors but it also didn't do anything.
Just to say, I'm not any pro programmer in Python, so I may be missing an obvious thing, but ehh, I'm here to ask and learn. I also saw a lot of similar questions, but they mainly focus on .txt files and not specifically on my case, so I need to ask it here.
Anyway, thanks for the help in advance.
Edit: If you want an example for glob (more sane), here it is:
from pathlib import Path

# The pattern "**" means all subdirectories recursively,
# with "*.wav" meaning all files with any name ending in ".wav".
for file in Path(data_path).glob("**/*.wav"):
    if not file.is_file():  # skip directories that happen to match
        continue
    with open(file, "rb") as f:  # "rb": .wav files are binary, and we only want to read them
        pass  # do stuff
For more info see Path.glob() in the documentation. Glob patterns are a useful thing to know.
Previous answer:
Try using either glob or os.walk(). Here is an example for os.walk().
from os import walk, path

# Recursively walk the directory data_path
for root, _, files in walk(data_path):
    # files is a list of filenames in the current root, so iterate over them
    for file in files:
        # Skip the file if it is not *.wav
        if not file.endswith(".wav"):
            continue
        # os.path.join() builds the full path for the file
        file = path.join(root, file)
        # Do what you need with the file
        # You can also use a context manager to open the files, like this
        with open(file, "rb") as f:  # "rb" opens for binary reading; use "w" if writing text
            pass  # Do stuff
Note that you may be confused about what open() does. It opens a file for reading, writing, or appending. Directories are not files and therefore cannot be opened, which is why your open(data_path, "r") attempt fails.
I suggest that you Google for documentation and do more reading about the functions used. The documentation will help more than I can.
import glob
import os
main = '/main_wavs'
wavs = [w for w in glob.glob(os.path.join(main, '*/*.wav')) if os.path.isfile(w)]
In terms of permissions on a path A/B/C, every component must be accessible: you need read permission on the file C itself, and execute (search) permission on the directories A and B; listing a directory's contents additionally requires read permission on it.
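A quick way to test this from Python is os.access; a sketch, reusing the /main_wavs layout from above (the .wav path is hypothetical):
import os

# a directory needs read + execute permission for its contents to be listed
print(os.access('/main_wavs', os.R_OK | os.X_OK))
# a file needs read permission to be opened for reading
print(os.access('/main_wavs/sub/a.wav', os.R_OK))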

Using os.system() in a specific directory only

I have a directory containing multiple files with similar names, and subdirectories named after those files, so that files with like names are located in the corresponding subdirectory. I'm trying to concatenate all the .sdf files in a given subdirectory into a single .sdf file.
import os
from os import system, chdir
for ele in os.listdir(Path):
    if ele.endswith('.sdf'):
        chdir(Path + '/' + ele[0:5])
        system('cat' + ' ' + '*.sdf' + '>' + ele[0:5] + '.sdf')
However when I run this, the concatenated file includes every .sdf file from the original directory rather than just the .sdf files from the desired one. How do I alter my script to concatenate the files in the subdirectory only?
This is a very clumsy way of doing it. Using chdir is not recommended, and neither is system (the subprocess module is preferred nowadays, and shelling out to cat is overkill anyway).
Let me propose a pure Python implementation using glob.glob to filter the .sdf files, reading each file one by one and writing to the big file opened before the loop:
import glob, os

big_sdf_file = "all_data.sdf"  # I'll let you compute the name/directory you want
with open(big_sdf_file, "wb") as fw:
    for sdf_file in glob.glob(os.path.join(Path, "*.sdf")):
        with open(sdf_file, "rb") as fr:
            fw.write(fr.read())
I left big_sdf_file not computed; I would not recommend putting it in the same directory as the input files, since running the script twice would result in taking the output as input as well.
Note that the drawback of this approach is that each file is read fully into memory, which can cause problems for big files. In that case, replace
fw.write(fr.read())
by:
shutil.copyfileobj(fr,fw)
(importing shutil is necessary in that case). That copies the data in chunks instead of reading the whole file into memory at once.
I'll add that this is probably not the full solution you're expecting, since there seems to be something about scanning the subdirectories of Path to create one big .sdf file per subdirectory, but since the provided code doesn't use any system command or chdir, it should be easy to adapt to your needs.
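If you do want one big .sdf file per subdirectory, here is a sketch of that adaptation (assuming Path holds the parent directory, as in your script):
import glob
import os
import shutil

for sub in os.listdir(Path):
    sub_dir = os.path.join(Path, sub)
    if not os.path.isdir(sub_dir):
        continue
    # write the concatenated file next to the subdirectory, not inside it,
    # so a second run doesn't pick up its own output as input
    big_sdf_file = os.path.join(Path, sub + '.sdf')
    with open(big_sdf_file, "wb") as fw:
        for sdf_file in glob.glob(os.path.join(sub_dir, "*.sdf")):
            with open(sdf_file, "rb") as fr:
                shutil.copyfileobj(fr, fw)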

Recreate input folder tree for output of some analyses

I am new to Python and, although I have been reading and enjoying it so far, have ∂ experience, where ∂ → 0.
I have a folder tree, and each folder at the bottom of the tree's branches contains many files. For me, this whole tree is the input.
I would like to perform several steps of analysis (I believe these are irrelevant to this question), the results of which I would like returned in a tree identical to that of the input, called output.
I have two ideas:
Read through each folder recursively using os.walk() and perform the analysis on each file, and
Use a function such as shutil.copytree() and perform the analysis somewhere along the way. So actually, I do not want to COPY the tree at all, but rather replicate its structure with new files. I thought this might be a kind of 'hack', as I do actually want to use each input file to create the output file, so instead of a copy command I need an analyse command. The rest should remain unchanged, as far as my imagination allows me to understand.
I have little experience with option 1 and zero experience with option 2.
For smaller trees up until now I have been hard-coding the paths, which has become too time-consuming at this point.
I have also seen more mundane ways, such as using glob to first find all the files I would like and work on them, but I don't know how this might help find a shortcut in recreating the input tree for my output.
My attempt at option 1 looks like this:
import os
for root, dirs, files in os.walk('/Volumes/Mac OS Drive/Data/input/'):
    # I have no actual need to print these, it just helps me see what is happening
    print root, "\n"
    print dirs, "\n"
    # This is my actual work going on
    [analysis_function(name) for name in files]
However, I fear this is going to be very slow. I would also like to do some kind of filtering on the files - for example, the .DS_Store files created in Mac trees are included in the results of the above. I would attempt to use the fnmatch module to filter only the files I want.
I have seen in the copytree function that it is possible to ignore files according to a pattern, which would be helpful; however, I do not understand from the documentation where I could plug in my analysis function to run on each file.
You can use both options: you could provide a custom copy_function that performs analysis instead of the default shutil.copy2 to shutil.copytree() (it is more of a hack), or you could use os.walk() to have finer control over the process.
You don't need to create the parent directories manually either way: copytree() creates them for you, and os.makedirs() can create the destination directories if you use os.walk():
#!/usr/bin/env python2
import fnmatch
import itertools
import os

ignore_dir = lambda d: d in ('.git', '.svn', '.hg')
src_dir = '/Volumes/Mac OS Drive/Data/input/'  # source directory
dst_dir = '/path/to/destination/'  # destination directory

for root, dirs, files in os.walk(src_dir):
    for input_file in fnmatch.filter(files, "*.input"):  # for each input file
        output_file = os.path.splitext(input_file)[0] + '.output'
        output_dir = os.path.join(dst_dir, root[len(src_dir):])
        if not os.path.isdir(output_dir):
            os.makedirs(output_dir)  # create destination directories
        analyze(os.path.join(root, input_file),  # perform analysis
                os.path.join(output_dir, output_file))
    # don't visit ignored subtrees
    dirs[:] = itertools.ifilterfalse(ignore_dir, dirs)
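For completeness, here is a sketch of the copytree() variant mentioned at the start of this answer. The copy_function argument was added to shutil.copytree() in Python 3.2, so unlike the script above this one is Python 3 only; analyze() is the same per-file analysis function assumed above:
import shutil

def analyze_instead_of_copy(src, dst):
    # copytree() calls this for every file it would normally copy;
    # we run the analysis and write its result to dst instead
    analyze(src, dst)

shutil.copytree(src_dir, dst_dir,
                copy_function=analyze_instead_of_copy,
                ignore=shutil.ignore_patterns('.DS_Store', '.git', '.svn', '.hg'))
Note that this variant keeps the input filenames (there is no .input to .output renaming), and copytree() requires that dst_dir not exist yet, which is part of why it is more of a hack than the os.walk() version.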
