Following a nested directory structure until the end - python

I have a some directories that contain some other directories which, at the lowest level, contain bunch of csv files such as (folder) a -> b -> c -> (csv files). There is usually only one folder at each level. When I process a directory how can I follow this structure until the end to get the csv files ? I was thinking maybe a recursive solution but I think there may be better ways to do this. I am using python. Hope I was clear.

The os package has a walk function that will do exactly what you need:
for current_path, directory, files in walk("/some/path"):
# current_path is the full path of the directory we are currently in
# directory is the name of the directory
# files is a list of file names in this directory
You can use os.path's to derive the full path to each file (if you need it).
Alternately, you might find the glob module to be of more use to you:
for csv_file in glob(/some/path/*/*.csv"):
# csv_file is the full path to the csv file.

Related

Python move files from current path to specific folder named like or similar to the file being moved

My Folder Structure looks like this:
Folder 95000
Folder 95002
Folder 95009
AR_95000.pdf
AR_95002.pdf
AR_95009.pdf
BS_95000.pdf
BS_95002.pdf
BS_95009.pdf
[Note folder is named 95000, 95002 etc, the actual word 'folder' is not present in folder name, its just mentioned here for representation purpose.]
My goal is, move files "AR_95000.pdf" and "BS_95000.pdf" to folder named - "95000",
then "AR_95002.pdf" and "BS_95002.pdf" to folder named - "95002"and so on.
The PDFs are reports generated by system and thus I can not control the naming of the same.
Thanks!
Using pathlib this task becomes super easy:
from pathlib import Path
root = Path("/path/to/your/root/dir")
for file in root.glob("*.pdf"):
folder_name = file.stem.rpartition("_")[-1]
file.rename(root / folder_name / file.name)
As you can see, one main advantage of pathlib over os/shutil (in this case) is the interface Path objects provide directly to os-like functions. This way the actual copying (rename()) is done directly as an instance method.
References:
Path.glob
Path.stem
str.rpartition
Path.rename
path concatenation

-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil

I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching each client's name. I'm not sure if I should be using Glob, or Shutil, or a combination of both. My working theory is that Glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and then use Shutil to physically move the files?
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another.. quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: #Byberi - Something as such?
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.path.isfile()
This would print all the files and directories
for file in dirs:
print file
dest_dir = "C:/Users/Kenny/Documents/Clients"
for file in glob.glob(r'C:/*'):
print(file)
shutil.copy(file, dest_dir)
I've consulted the following threads already, but I cannot seem to find how to match and move the files.
Select files in directory and move them based on text list of filenames
https://docs.python.org/3/library/glob.html
Python move files from directories that match given criteria to new directory
https://www.guru99.com/python-copy-file.html
https://docs.python.org/3/howto/regex.html
https://code.tutsplus.com/tutorials/file-and-directory-operations-using-python--cms-25817

Python - Navigating through Subdirectories that Meet Naming Criteria

I am using Python 3.5 to analyze data contained in csv files. These files are contained in a "figs" directory, which is contained in a case directory, which is contained in an overall data directory, e.g.:
/strm1/serino/DATA/06052009/figs
Or more generally:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs
The directory I am starting in is '/strm1/serino/DATA/,' and each subdirectory is the month, day, and year of a case I am working with. Each subdirectory contains another subdirectory named 'figs,' and that is the location of each case's csv file. To be exact:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs/case_date_in_MMDDYYYY.csv
So, I would like to start in my DATA directory and go through its subdirectories to find those that have the MMDDYYYY naming. However, some of the case directories may be named with a state abbreviation at the end, like: '06052009_TX.' Therefore, instead of matching the MMDDYYYY naming exactly, it could be something as simple as verifying that the directory name contains any number 1 through 9.
Once I am in the first subdirectory (the case directory) I would like to move into the 'figs' subdirectory. Once there, I want to access the csv file with the same naming convention as the first subdirectory (the case directory). I will fill existing arrays with the data contained in each csv file.
Basically, my question concerns navigating through multiple subdirectories that match a certain naming convention and ultimately accessing the data file at the "end." I was naively playing around with glob, fnmatch, os.listdir, and os.walk, but I could not get anything close enough to working that I feel would be helpful to include. I am not very familiar with those modules. What I can include is what I am going for:
for dirs in data_dir that contain a number:
go into this directory
go into 'figs' directory
read data from the csv file whose name matches its case directory name (or whose name format matches the case directory name format)
I have come across related questions, but I have not been able to apply their answers in the way that I would like, especially with nested directories. I really appreciate the help, and let me know if I need to clarify anything.
The following should get you going. It uses the datetime.strptime() function to attempt to convert each folder name into a valid datetime object. If the conversion fails, then you know that the folder name is not in the correct format and can be skipped. It then attempts to parse any CSV file found in the corresponding fig folder:
from datetime import datetime
import glob
import csv
import os
dirpath, dirnames, filenames = next(os.walk('/strm1/serino/DATA'))
for dirname in dirnames:
if len(dirname) >= 8:
try:
dt = datetime.strptime(dirname[:8], '%m%d%Y')
print(dt, dirname)
csv_folder = os.path.join(dirpath, dirname)
for csv_file in glob.glob(os.path.join(csv_folder, 'figs', '*.csv')):
with open(csv_file, newline='') as f_input:
csv_input = csv.reader(f_input)
for row in csv_input:
print(row)
except ValueError as e:
pass
You listed several problems above. Which one are you stuck on? It seems like you already know how to navigate the file storage system using os.path. You may not know of the function os.path.join() which allows you to manually specify a file path relative to a file as such:
os.path.abspath(os.path.join(os.path.dirname(__file__), '../..', 'Data/TrailShelters/'))
To break down the above:
os.path.dirname(__file__) returns the path of the current file. '../..' means: go up two levels in the folder hierarchy. And Data/TrailShelters/ is the directory I wish to navigate to.
How does this apply to your particular case? Well, you will need to make some adaptations but you can store the os.path of the parent directory in a variable. Then you can essentially use a while sub_dir is not null loop to iterate through subdirectories. For every subdirectory you will want to examine its os.path and extract the particular part of the path you are interested in. Then you can simply use something like: if 'TN' in subdirectory_name to determine if it is a subdirectory you are interested in. If so; then update the saved os.path of the parent directory by appending the path to the subdirectory. Does that make any sense?

Search a directory, including all subdirectories that may or may not exist, for a file in Python.

I want to search a directory, including all subdirectories that may or may not exist, for a file in Python.
I see lots of examples where the directory we are peeking into is known, such as:
os.path.exists(/dir1/myfile.pdf)
...but what if the file I want is located in some arbitrary subdirectory that I don't already know exists or not? For example, the above snippet could never find a file here:
/dir1/dir2/dir3/.../dir20/myfile.pdf
and could clearly never generalize without explicitly running that line 20 times, once for each directory.
I suppose I'm looking for a recursive search, where I don't know the exact structure of the filesystem (if I said that right).
As suggested by #idjaw, try os.walk() like so:
import os
import os.path
for (dir,subdirs,files) in os.walk('/dir1'):
# Don't go into the CVS subdir!
if 'CVS' in subdirs:
subdirs.remove('CVS')
if 'myfile.pdf' in files:
print("Found:", os.path.join(dir, 'myfile.pdf'))
Here is code do find a file (in my case "wsgi.py") below the pwd
import os
for root, dirs, files in os.walk('.'):
if "wsgi.py" in files:
print root
./jg18/blog/blog
./goat/superlists/superlists
./jcg_blog/jcg_blog
./joelgoldstick.com.16/blog/blog
./blankdj19/blank/blank
./cp/cpblog/cpblog
./baseball/baseball_stats/baseball_stats
./zipcodes/zipcodes/zipcodes
./django.1.6.tut/mysite/mysite
./bits/bits/bits
If the file exists only in one dir, it will list one directory

python zipfile basename

I have some homework that I am trying to complete. I don't want the answer. I'm just having trouble in starting. The work I have tried is not working at all... Can someone please just provide a push in the right direction. I am trying to learn but after trying and trying I need some help.
I know I can you os.path.basename() to get the basename and then add it to the file name but I can't get it together.
Here is the assignment
In this project, write a function that takes a directory path and creates an archive of the directory only. For example, if the same path were used as in the example ("c:\\xxxx\\Archives\\archive_me"), the zipfile would contain archive_me\\groucho, archive_me\\harpo and archive_me\\chico.
The base directory (archive_me in the example above) is the final element of the input, and all paths recorded in the zipfile should start with the base directory.
If the directory contains sub-directories, the sub-directory names and any files in the sub-directories should not be included. (Hint: You can use isfile() to determine if a filename represents a regular file and not a directory.)
Thanks again any direction would be great.
It would help to know what you tried yourself, so I'm only giving a few pointers to methods in the standard libraries:
os.listdir to get the a list of files and folders under a given directory (beware, it returns only the file/folder name, not the full path!)
os.path.isfile as mentioned in the assignment to check if a given path represents a file or a folder
os.path.isdir, the opposite of os.path.isfile (thanks inspectorG4adget)
os.path.join to join a filename with the basedir without having to worry about slashes and delimiters
ZipFile for handling, well, zip files
zipFile.write to write the files found to the zip
I'm not sure you'll need all of those, but it doesn't hurt knowing they exist.

Categories