How to do computations through directory and subfolders - python

I have one main directory with 9 subfolders, and each subfolder contains 1000 files. I need a for loop that reads the main directory and then every subfolder, but the subfolder names are not similar and have no numbering, so I got stuck. I have seen "Iterate through folders, then subfolders and print filenames with path to text file", but I could not work out how to get started.
My effort is below:
import os
for root, dirs, files in os.walk(r'\Desktop\output\new our scenario\test'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
            ##Doing Whatever I want
            pass
But it's not correct and does not work.

Do you know glob? That might be a solution to your problem.
You can get a list of all files in subdirectories by using wildcard path names.
Here is an example that loops through .txt files, but you do not have to restrict it to one file type. Note, however, that if the pattern ends in a bare * instead of something like *.*, it will also list directories.
import glob
file_list = glob.glob('known_dir/*/*.txt')
for file in file_list:
    with open(file, "r") as auto:
        ##Doing Whatever you want
        pass
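The pattern 'known_dir/*/*.txt' only matches files exactly one folder below known_dir. Below is a minimal sketch, assuming Python 3.5+ (for recursive=True), that matches files at any depth and drops directories explicitly. The helper name iter_files is hypothetical, and glob.escape is used so that a top folder whose name contains characters like [ is not treated as a pattern.

```python
import glob
import os


def iter_files(top, pattern="*"):
    """Yield paths of regular files at any depth under `top` matching `pattern`."""
    # '**' with recursive=True matches `top` itself and every folder below it
    search = os.path.join(glob.escape(top), "**", pattern)
    for path in glob.glob(search, recursive=True):
        if os.path.isfile(path):  # a bare '*' would also match directory names
            yield path
```

For the question above, this would be used as `for file in iter_files(r'\Desktop\output\new our scenario\test'):`, no matter what the 9 subfolders are called.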

Related

Create a zip with only .pdf and .xml files from one directory

I would love to know how I can zip only the PDF (and XML) files from the main directory, without including the subfolders.
I've tried changing the code several times, without any success in achieving what I want.
import os
import zipfile
fantasy_zip = zipfile.ZipFile('/home/rob/Desktop/projects/zenjobv2/archivetest.zip', 'w')
for folder, subfolders, files in os.walk('/home/rob/Desktop/projects/zenjobv2/'):
    for file in files:
        if file.endswith('.pdf'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
        elif file.endswith('.xml'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder,file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type = zipfile.ZIP_DEFLATED)
fantasy_zip.close()
I expect that a zip is created only with the .pdfs and .xml files from the zenjobv2 folder/directory without including any other folders/subfolders.
You are looping through the entire directory tree with os.walk(). It sounds like you want to look only at the files directly inside a given directory. For that, consider os.scandir(), which returns an iterator of os.DirEntry objects, one for each file and subdirectory directly inside that directory. You will just have to filter out the entries that are directories:
import os
root = "/home/rob/Desktop/projects/zenjobv2"
for entry in os.scandir(root):
    if entry.is_dir():
        continue  # Just in case there are strangely-named directories
    if entry.path.endswith(".pdf") or entry.path.endswith(".xml"):
        # Process the file at entry.path as you see fit
        pass
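The answer leaves the processing step open. As a minimal sketch of how os.scandir and zipfile could be combined for the original goal, assuming Python 3.6+ (for the os.scandir context manager): the helper name zip_top_level is hypothetical, and the extension check is made case-insensitive, which is an assumption beyond the question.

```python
import os
import zipfile


def zip_top_level(src_dir, zip_path, extensions=(".pdf", ".xml")):
    """Zip the files directly inside `src_dir` whose names end with one of
    `extensions`; subfolders are never descended into."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        with os.scandir(src_dir) as entries:
            for entry in entries:
                # is_file() is False for subfolders, so they are skipped
                if entry.is_file() and entry.name.lower().endswith(extensions):
                    # arcname=entry.name stores the file at the archive root
                    zf.write(entry.path, arcname=entry.name)
```

For example, `zip_top_level("/home/rob/Desktop/projects/zenjobv2", "/home/rob/Desktop/projects/zenjobv2/archivetest.zip")`. The archive itself ends in .zip, so it is not picked up even when it is written into the same folder.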

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
import os
import pandas as pd
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)
for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only the file names to the lists, not their paths. You can use something like this to add the full paths instead:
inc_files2.append(os.path.join(root, f))
You have to prepend the path of the directory that os.walk is currently visiting (root) to each file name.
Append the entire pathname, not just the bare filename, to inc_files2.
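You can use os.path.abspath() to get the full path of a file, but note that os.path.abspath(f) on its own resolves f against the current working directory, not against the folder os.walk is visiting. Join the name with root first.
You can make use of this by making the following changes to your code:
for root, dirs, files in os.walk(directory):
    for f in files:
        f_abs = os.path.abspath(os.path.join(root, f))
        if f.startswith('Incidence'):
            inc_files2.append(f_abs)
        elif f.startswith('Population Count'):
            pop_files2.append(f_abs)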
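A minimal sketch that packages the os.path.join(root, f) fix into a reusable helper; the name find_by_prefix is hypothetical. It returns complete paths, which can be passed straight to pd.read_csv.

```python
import os


def find_by_prefix(directory, prefix):
    """Return full paths of files at any depth under `directory` whose
    names start with `prefix`, sorted for a stable order."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in files:
            if name.startswith(prefix):
                # join with root (the folder being visited), not the CWD
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

With pandas, `inc_frames2 = [pd.read_csv(p) for p in find_by_prefix(directory, "Incidence")]` builds the list of data frames directly, which also removes the redundant outer for loop around map in the question.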

Unable to use getsize method with os.walk() returned files

I am trying to write a small program that looks through a directory. Because I want to find all the files in the subdirectories recursively, I use os.walk().
Here is my code:
import os
import os.path
filesList = []
path = r"C:\Users\Robin\Documents"
for (root, dirs, files) in os.walk(path):
    for file in files:
        filesList += file
Then I try to apply the os.path.getsize() method to the elements of filesList, but it doesn't work.
Indeed, I realize that this code fills the list filesList with individual characters. I don't know what to do; I have tried several other things, such as:
for (root, dirs, files) in os.walk(path):
    filesList += [file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, which isn't even visible when looking in the directory.
Can someone explain how to obtain files I can actually work with (that is to say, get their size, hash them, or modify them) using os.walk?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for (root, dirs, files) in os.walk(path):
    for file in files:
        filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
If you want to get a list of files including path use:
for (root, dirs, files) in os.walk(path):
    fullpaths = [os.path.join(root, fil) for fil in files]
    filesList += fullpaths
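To show the full paths in use for the operations the question lists (size and hashing), here is a minimal sketch. It assumes SHA-256 is an acceptable hash, since the question does not name one, and the helper name file_report is hypothetical.

```python
import hashlib
import os


def file_report(top):
    """Walk `top` and return (path, size_in_bytes, sha256_hex) for every file."""
    report = []
    for root, dirs, files in os.walk(top):
        for name in files:
            path = os.path.join(root, name)  # full path, usable by getsize/open
            size = os.path.getsize(path)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                # read in 64 KiB chunks so large files are not loaded at once
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            report.append((path, size, digest.hexdigest()))
    return report
```

For example, `for path, size, sha in file_report(path): print(size, sha, path)` lists every file under the starting folder.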

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this, but I am always told that the file I want doesn't exist, which is not the case:
import os
import xy
rootdir = r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print(gain)
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!
os.walk yields bare file and directory names, relative to the root directory it yields alongside them. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
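To trim it so that it ignores every folder except those named folderX<number>, you could do something like the following. When os.walk runs top-down (the default), you can modify the dirs list in place to prevent os.walk from descending into the removed directories. Assign to the slice dirs[:] rather than calling dirs.remove() inside a loop over dirs, because removing items from a list while iterating over it skips elements.
import re
for root, dirs, files in os.walk(rootdir):
    # keep only folderX<digits>; slice assignment edits the list os.walk uses
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)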
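Since the file name is fixed and, assuming from the question, the folderX0 ... folderX99 folders sit directly inside rootdir, a full os.walk is not strictly required. A minimal sketch using os.scandir on rootdir only: collect_results is a hypothetical name, and process_file stands in for the question's xy.method.

```python
import os
import re

# Matches the whole folder name: "folderX" followed by one or more digits
FOLDER_PATTERN = re.compile(r"folderX[0-9]+")


def collect_results(rootdir, filename, process_file):
    """Call process_file on rootdir/<folderXn>/<filename> for each matching
    direct subfolder and return {folder_name: result}."""
    results = {}
    with os.scandir(rootdir) as entries:
        for entry in entries:
            # only direct subfolders whose entire name matches the pattern
            if entry.is_dir() and FOLDER_PATTERN.fullmatch(entry.name):
                path = os.path.join(entry.path, filename)
                if os.path.isfile(path):  # skip folders that lack the file
                    results[entry.name] = process_file(path)
    return results
```

Under these assumptions the call would be `collect_results(rootdir, 'fileX', xy.method)`; every other folder in rootdir is ignored.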

os.walk on non C drive directory

I know there are several posts that touch on this, but I haven't found one that works for me yet. I need to create a list of files with an .mxd extension by searching an entire mapped directory. I used this code and it works:
import os
file_list = []
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))
However, it only works on the C drive. I need to be able to search for these files on a mapped drive of my choosing. Here's the script I'm using, but it doesn't work. I know there is an mxd file in the subdirectory of this drive, but it isn't being reported in the file list. In fact, the file list is totally empty, and it shouldn't be.
import os
path = r"U:/TEST/"
filenamelist = []
for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
Does anyone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?
os.walk(path) yields a 3-tuple which is commonly unpacked as root, dirs, files. So instead of
for files in os.walk(path):
    ...
use
for root, dirs, files in os.walk(path):
    for filename in files:
        if filename.endswith(".mxd"):
            filenamelist.append(filename)
Does anyone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?
Yes. Compare and contrast your two loops:
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))

for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
You're trying to do the exact same thing, but not using even remotely the same code.
In the first one, you loop over the walk, storing each tuple in (paths, dirs, files), then loop over files.
In the second one, you loop over the walk, storing each tuple in files, don't loop over anything, and then just use some variable named file left over from some earlier code.
And then, even if that part worked, you end up appending files (which, remember, is a tuple of three lists) rather than file—or, probably better, os.path.join(paths, file)—to the list.
Just make the second one look like the first. Or, better, put it in a function and call it twice, instead of copying and pasting it.
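Following that suggestion, a minimal sketch of the working loop wrapped in a function. The name find_files is hypothetical, and the case-insensitive comparison is an added assumption (Windows file names are case-insensitive, so .MXD would otherwise be missed).

```python
import os


def find_files(top, extension):
    """Return full paths of files at any depth under `top` ending with `extension`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):  # each item is a 3-tuple
        for filename in filenames:
            if filename.lower().endswith(extension.lower()):
                found.append(os.path.join(dirpath, filename))
    return found
```

Both searches then become calls to the same code, e.g. `file_list = find_files(folder, ".mxd")` and `filenamelist = find_files(r"U:/TEST/", ".mxd")`.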
Here's the final script that worked for crawling on a drive root other than C:
import os
file_list = []
path = 'U:\\'  # a raw string cannot end in a single backslash, so escape it
for (dirpath, subdirs, files) in os.walk(path):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(dirpath, file))
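A related pitfall behind a "totally empty" result: by default os.walk ignores errors raised while listing a directory, so a mistyped or unmapped drive path simply yields nothing. A minimal sketch that surfaces such errors through os.walk's documented onerror parameter; the wrapper name walk_strict is hypothetical.

```python
import os


def walk_strict(top):
    """Like os.walk, but raise instead of silently skipping directories
    (including `top` itself) that cannot be listed."""
    def raise_error(err):
        raise err  # err is the OSError that os.walk caught

    yield from os.walk(top, onerror=raise_error)
```

Replacing `os.walk(path)` with `walk_strict(path)` in the script above makes an unreachable U: drive fail loudly with an OSError instead of producing an empty list.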
