Checking and displaying files with os.walk()

The intended purpose of the program is to walk through the operating system's directories starting from path, collecting every file and passing it to the check() function. The loop itself seems to work: if the check() call is replaced with a simple print(file), every file is printed. So where am I going wrong? Should I store all the files in a list first and then read from that list to perform my actions?
for paths, subdirs, files in os.walk(path, topdown=True):
    for file in files:
        check(file)

You probably need the full file path, not just the file name:
for paths, subdirs, files in os.walk(path, topdown=True):
    for file in files:
        check(os.path.join(paths, file))
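A minimal sketch of why the joined path matters. The check() here is a hypothetical stand-in for the question's function, assumed to open the file it is given; the demo builds a small temporary tree so it does not depend on the current working directory.

```python
import os
import tempfile


def check(path):
    """Hypothetical stand-in for the question's check(): open and read the file."""
    with open(path) as fh:
        return fh.read()


def walk_and_check(top):
    """Call check() on every file under top, passing full paths built from dirpath."""
    results = {}
    for dirpath, subdirs, files in os.walk(top, topdown=True):
        for name in files:
            # name is relative to dirpath, so join them before opening.
            full_path = os.path.join(dirpath, name)
            results[full_path] = check(full_path)
    return results


# Demo: one file two levels below the starting directory.
with tempfile.TemporaryDirectory() as top:
    nested = os.path.join(top, "sub", "deeper")
    os.makedirs(nested)
    with open(os.path.join(nested, "data.txt"), "w") as fh:
        fh.write("hello")
    results = walk_and_check(top)
    print(results)
```

Passing the bare name instead would only work if a file with that name happened to exist in the current working directory.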

Related

FileNotFoundError when attempting to move files into a temp directory

I am attempting to move files inside of a subdirectory into a temp folder. Here is my code:
with tempfile.TemporaryDirectory() as tempDirectory:
    for root, dirs, files in os.walk(fileDestination, topdown=True):
        for file in files:
            shutil.move(file, tempDirectory)
In my debugger I can see that the file variable holds the files I want to move, but nothing moves: I get a FileNotFoundError that references the file I want to move, and my file explorer shows that the file did not move.

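Figured it out on my own. Even though the file variable held the file name, it did not hold the entire path to the file. The code below works:
with tempfile.TemporaryDirectory() as tempDirectory:
    for subdir, dirs, files in os.walk(fileDestination, topdown=True):
        for file in files:
            # Join with subdir (the folder os.walk is currently in) rather than
            # fileDestination, so files in nested subfolders resolve correctly too.
            fileName = os.path.join(subdir, file)
            print(fileName)
            shutil.move(fileName, tempDirectory)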
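A minimal sketch of the same idea as a reusable function (move_tree_files is a hypothetical name). It collects the full paths first and moves afterwards, so the tree is not modified while os.walk is still reading it.

```python
import os
import shutil
import tempfile


def move_tree_files(src, dst):
    """Move every file under src (at any depth) into the directory dst."""
    # Collect full paths first so the tree is not changed while os.walk reads it.
    to_move = []
    for dirpath, dirnames, filenames in os.walk(src):
        for name in filenames:
            to_move.append(os.path.join(dirpath, name))  # full path, not the bare name
    moved = []
    for path in to_move:
        # shutil.move raises shutil.Error if dst already contains a file with this name.
        moved.append(shutil.move(path, dst))
    return moved


# Demo on a throwaway tree standing in for fileDestination.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as tempDirectory:
    os.makedirs(os.path.join(src, "a", "b"))
    for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "low.txt")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write(rel)
    moved = move_tree_files(src, tempDirectory)
    names = sorted(os.listdir(tempDirectory))
    print(names)  # ['low.txt', 'mid.txt', 'top.txt']
```

Keep in mind that tempfile.TemporaryDirectory() deletes the directory and everything in it when the with block exits, so files moved there are gone afterwards; move them into a regular directory if they need to be kept.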

Create a zip with only .pdf and .xml files from one directory

I would love to know how I can zip only the PDFs in the main directory, without including the subfolders.
I've tried changing the code several times, without any success.
import zipfile
fantasy_zip = zipfile.ZipFile('/home/rob/Desktop/projects/zenjobv2/archivetest.zip', 'w')
for folder, subfolders, files in os.walk('/home/rob/Desktop/projects/zenjobv2/'):
    for file in files:
        if file.endswith('.pdf'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder, file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type=zipfile.ZIP_DEFLATED)
        elif file.endswith('.xml'):
            fantasy_zip.write(os.path.join(folder, file), os.path.relpath(os.path.join(folder, file), '/home/rob/Desktop/projects/zenjobv2/'), compress_type=zipfile.ZIP_DEFLATED)
fantasy_zip.close()
I expect that a zip is created only with the .pdfs and .xml files from the zenjobv2 folder/directory without including any other folders/subfolders.

You are looping through the entire directory tree with os.walk(). It sounds like you only want to look at the files in a single directory. For that, consider os.scandir(), which returns an iterator of entries for all files and subdirectories in a given directory. You just have to filter out the entries that are directories:
import os

root = "/home/rob/Desktop/projects/zenjobv2"
for entry in os.scandir(root):
    if entry.is_dir():
        continue  # Just in case there are strangely-named directories, e.g. "old.pdf"
    if entry.path.endswith(".pdf") or entry.path.endswith(".xml"):
        # Process the file at entry.path as you see fit
        print(entry.path)
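Putting the os.scandir() filter together with zipfile, a minimal sketch could look like this (zip_top_level is a hypothetical helper name). Each file is stored under its bare name, so the archive has no folder prefix.

```python
import os
import tempfile
import zipfile


def zip_top_level(root, zip_path, extensions=(".pdf", ".xml")):
    """Zip only the files directly inside root whose names end with one of extensions."""
    added = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        with os.scandir(root) as entries:
            for entry in entries:
                # is_file() is False for directories, so subfolders are never entered.
                if entry.is_file() and entry.name.endswith(extensions):
                    zf.write(entry.path, arcname=entry.name)  # no folder prefix in the zip
                    added.append(entry.name)
    return sorted(added)


# Demo: two matching files, one non-matching file, and a PDF in a subfolder.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.pdf", "b.xml", "c.txt"):
        open(os.path.join(root, name), "w").close()
    os.mkdir(os.path.join(root, "sub"))
    open(os.path.join(root, "sub", "d.pdf"), "w").close()
    zip_path = os.path.join(root, "archivetest.zip")
    added = zip_top_level(root, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        names = sorted(zf.namelist())
    print(names)  # ['a.pdf', 'b.xml']
```

If you would rather keep os.walk, next(os.walk(root)) returns only the first (root, dirs, files) tuple, i.e. the top-level directory.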

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
    for f in files:
        if f.startswith('Incidence'):
            inc_files2.append(f)
        elif f.startswith('Population Count'):
            pop_files2.append(f)
for file in inc_files2:
    inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
    pop_frames2 = map(pd.read_csv, pop_files2)

You are adding only the file names to the lists, not their paths. You can use something like this to add the full paths instead:
inc_files2.append(os.path.join(root, f))
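A minimal sketch of the whole collection step with full paths, grouped by prefix. group_csvs_by_prefix is a hypothetical helper, and it assumes the files of interest end in .csv.

```python
import os
import tempfile
from collections import defaultdict


def group_csvs_by_prefix(directory, prefixes=("Incidence", "Population Count")):
    """Map each prefix to the full paths of .csv files under directory that start with it."""
    groups = defaultdict(list)
    for root, dirs, files in os.walk(directory):
        for f in files:
            if not f.endswith(".csv"):  # assumption: only CSV files are wanted
                continue
            for prefix in prefixes:
                if f.startswith(prefix):
                    groups[prefix].append(os.path.join(root, f))  # full path
                    break
    return groups


# Demo: a small nested tree standing in for the real directory.
with tempfile.TemporaryDirectory() as directory:
    os.makedirs(os.path.join(directory, "x", "y"))
    for rel in ("Incidence_2020.csv",
                os.path.join("x", "Incidence_2021.csv"),
                os.path.join("x", "y", "Population Count 1.csv"),
                os.path.join("x", "notes.txt")):
        open(os.path.join(directory, rel), "w").close()
    groups = group_csvs_by_prefix(directory)
    inc_names = sorted(os.path.basename(p) for p in groups["Incidence"])
    pop_names = [os.path.basename(p) for p in groups["Population Count"]]
    all_exist = all(os.path.isfile(p) for paths in groups.values() for p in paths)
    print(inc_names, pop_names, all_exist)
```

With full paths in hand, the frames can be read eagerly, e.g. inc_frames2 = [pd.read_csv(p) for p in groups["Incidence"]]. In Python 3, map() is lazy, so map(pd.read_csv, ...) reads nothing until it is iterated, and the for loops wrapped around it in the question are not needed.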

You have to prepend the path of the directory os.walk is currently in (root) to each file name.

Append the entire pathname, not just the bare filename, to inc_files2.

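You can use os.path.abspath() to get the full path of a file, but join the name with root first: os.path.abspath(f) on its own resolves the bare name against the current working directory, not against the folder os.walk is currently in.
You can make use of this by making the following changes to your code.
for root, dirs, files in os.walk(directory):
    for f in files:
        f_abs = os.path.abspath(os.path.join(root, f))
        if f.startswith('Incidence'):
            inc_files2.append(f_abs)
        elif f.startswith('Population Count'):
            pop_files2.append(f_abs)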
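A short illustration of what os.path.abspath() does with a bare name: it only prefixes the current working directory. The root value below is a hypothetical dirpath, as os.walk might yield it.

```python
import os

name = "Incidence_2020.csv"          # a bare name, as os.walk yields it
root = os.path.join("data", "2020")  # hypothetical dirpath from os.walk

# abspath() knows nothing about root; it simply prefixes the current working directory.
wrong = os.path.abspath(name)
right = os.path.abspath(os.path.join(root, name))

print(wrong)
print(right)
```

Only the second form points into the subfolder where os.walk actually found the file.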

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this but I always get told that the file I want doesn't exist which is not the case:
import os
import xy
rootdir = r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print(gain)
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!

os.walk yields file and directory names relative to the directory (root) it yields alongside them. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
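To trim it to ignore any folders except those named folderX<number>, you could do something like the following. When running os.walk top-down (the default), you can remove items from the dirs list in place to prevent os.walk from descending into those directories. Assign to dirs[:] rather than calling dirs.remove() inside a loop over dirs, because removing items from a list while iterating over it skips elements.
import os
import re

for root, dirs, files in os.walk(rootdir):
    # Prune in place so os.walk does not descend into non-matching folders.
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        if file == 'fileX':  # only the one file the question asks about
            gain = xy.method(os.path.join(root, file))
            print(gain)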
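Since the numbered folders sit directly inside rootdir, a full tree walk may not be needed at all. A minimal sketch of an alternative using os.scandir() on rootdir only (find_target_files is a hypothetical helper; each returned path would then be passed to the question's xy.method):

```python
import os
import re
import tempfile

FOLDER_RE = re.compile(r"folderX[0-9]+")  # folderX0 ... folderX99, per the question


def find_target_files(rootdir, target="fileX"):
    """Return full paths of target inside each folderX<N> directly under rootdir."""
    found = []
    with os.scandir(rootdir) as entries:
        for entry in entries:
            if entry.is_dir() and FOLDER_RE.fullmatch(entry.name):
                candidate = os.path.join(entry.path, target)
                if os.path.isfile(candidate):
                    found.append(candidate)
    # Sort numerically by the folder suffix so folderX2 comes before folderX10.
    found.sort(key=lambda p: int(os.path.basename(os.path.dirname(p))[len("folderX"):]))
    return found


# Demo: an unrelated folder and a numbered folder without fileX are both skipped.
with tempfile.TemporaryDirectory() as rootdir:
    for folder in ("folderX10", "folderX2", "other", "folderX3"):
        os.mkdir(os.path.join(rootdir, folder))
    for folder in ("folderX10", "folderX2", "other"):
        open(os.path.join(rootdir, folder, "fileX"), "w").close()
    paths = find_target_files(rootdir)
    folders = [os.path.basename(os.path.dirname(p)) for p in paths]
    print(folders)  # ['folderX2', 'folderX10']
```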

os.walk on non C drive directory

I know there are several posts that touch on this, but I haven't found one that works for me yet. I need to create a list of files with an .mxd extension by searching an entire mapped directory. I used this code and it works:
import os
file_list = []
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))
However, it only works on the C drive. I need to be able to search for these files on a mapped drive of my choosing. Here's the script I'm using, but it doesn't work. I know there is an mxd file in the subdirectory of this drive, but it isn't being reported in the file list. In fact, the file list is totally empty, and it shouldn't be.
import os
path = r"U:/TEST/"
filenamelist = []
for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
Does anyone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?

os.walk(path) yields a 3-tuple, which is commonly unpacked as root, dirs, files. So instead of
for files in os.walk(path):
    ...
use
for root, dirs, files in os.walk(path):
    for filename in files:
        if filename.endswith(".mxd"):
            filenamelist.append(os.path.join(root, filename))  # full path, not just the name

Does anyone see anything wrong in my second block of code that would prevent it from iterating through the subdirectories at the given path and reporting back files with an .mxd extension?
Yes. Compare and contrast your two loops:
for (paths, dirs, files) in os.walk(folder):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(paths, file))

for files in os.walk(path):
    if file.endswith(".mxd"):
        filenamelist.append(files)
You're trying to do the exact same thing, but not using even remotely the same code.
In the first one, you loop over the walk, storing each tuple in (paths, dirs, files), then loop over files.
In the second one, you loop over the walk, storing each tuple in files, don't loop over anything, and then just use some variable named file left over from some earlier code.
And then, even if that part worked, you end up appending files (which, remember, is a tuple of three lists) rather than file—or, probably better, os.path.join(paths, file)—to the list.
Just make the second one look like the first. Or, better, put it in a function and call it twice, instead of copying and pasting it.
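A minimal sketch of that suggestion: one function (find_files_with_ext is a hypothetical name) called once per starting directory. The demo uses throwaway directories; on the asker's machine it would be called with folder and with r"U:/TEST/" instead.

```python
import os
import tempfile


def find_files_with_ext(top, ext):
    """Return full paths of files under top whose names end with ext (case-sensitive)."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(ext):
                matches.append(os.path.join(dirpath, name))
    return matches


# Demo: the same function, called twice on two different trees.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    os.makedirs(os.path.join(b, "sub"))
    open(os.path.join(a, "map1.mxd"), "w").close()
    open(os.path.join(b, "sub", "map2.mxd"), "w").close()
    open(os.path.join(b, "sub", "readme.txt"), "w").close()
    file_list = find_files_with_ext(a, ".mxd")
    filenamelist = find_files_with_ext(b, ".mxd")
    found = [os.path.basename(p) for p in file_list + filenamelist]
    print(found)  # ['map1.mxd', 'map2.mxd']
```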

Here's the final script that worked for crawling on a drive root other than C:
import os
file_list = []
path = 'U:\\'  # the drive root U:\ (a raw string cannot end in a single backslash)
for (dirpath, subdirs, files) in os.walk(path):
    for file in files:
        if file.endswith(".mxd"):
            file_list.append(os.path.join(dirpath, file))
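str.endswith(".mxd") is case-sensitive, so a file named MAP.MXD would be missed. If that can happen, one option is a pathlib-based variant that compares the lower-cased suffix. This is a sketch of an alternative, not what the asker used; find_mxd is a hypothetical helper, and the drive path would be passed as top.

```python
import tempfile
from pathlib import Path


def find_mxd(top):
    """Return Path objects for every file under top with an .mxd extension, in any case."""
    return [p for p in Path(top).rglob("*") if p.is_file() and p.suffix.lower() == ".mxd"]


# Demo: one lower-case and one upper-case match, plus a file that should be ignored.
with tempfile.TemporaryDirectory() as top:
    (Path(top) / "sub").mkdir()
    for rel in ("a.mxd", "sub/B.MXD", "sub/c.txt"):
        (Path(top) / rel).touch()
    names = {p.name for p in find_mxd(top)}
    print(sorted(names))  # ['B.MXD', 'a.mxd']
```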
