How to check whether file name contains a specific character with Pathlib - python

Since it looks like Pathlib is the future, I'm trying to refactor some of my code to move from my previous use of os to Pathlib. I'm stuck on the following problem. Since I work on a Mac, folders sometimes contain hidden files preceded by a period (.DS_Store, or names of deleted files preceded by ._). That causes a lot of problems when I loop through files in a directory that have a certain extension. To avoid this problem using os.walk, I do the following:
for root, dirs, files in os.walk(DIR_NAME):
    # iterate all files
    for file in files:
        if file.endswith(ext):
            if file.startswith("."):
                continue
            # do something with the file
I know we have the .stem and .suffix attributes to manipulate file names with Pathlib, but I don't see how they help with this problem. .startswith seems more intuitive, but alas it does not seem to be available on Path objects. So, the question is, how would one go about doing this in Pathlib?
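One answer-style sketch: Path.name (and Path.stem) are ordinary strings, so str.startswith works on them directly. The docs/ paths and .txt extension below are made up for illustration:

```python
from pathlib import Path

# Hypothetical paths standing in for a real directory scan
paths = [Path("docs/.DS_Store"), Path("docs/report.txt"), Path("docs/._old.txt")]
ext = ".txt"

# Path.name is a plain string, so str.startswith works on it
names = [p.name for p in paths if p.suffix == ext and not p.name.startswith(".")]
print(names)  # ['report.txt']
```

In a real refactor the paths would come from something like Path(DIR_NAME).rglob('*' + ext) instead of a hand-built list.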

Related

-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil

I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on matching each client's name. I'm not sure if I should be using Glob, or Shutil, or a combination of both. My working theory is that Glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and that I can then use Shutil to physically move the files.
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another; quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: @Byberi - Something as such?
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.listdir(path)

# This would print all the files and directories
for file in dirs:
    print(file)

dest_dir = "C:/Users/Kenny/Documents/Clients"
for file in glob.glob(r'C:/*'):
    print(file)
    shutil.copy(file, dest_dir)
I've consulted the following threads already, but I cannot seem to find how to match and move the files.
Select files in directory and move them based on text list of filenames
https://docs.python.org/3/library/glob.html
Python move files from directories that match given criteria to new directory
https://www.guru99.com/python-copy-file.html
https://docs.python.org/3/howto/regex.html
https://code.tutsplus.com/tutorials/file-and-directory-operations-using-python--cms-25817
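In case it helps, here is a hedged sketch of the match-and-move idea. The folder locations and the regex that pulls "lastName, firstName" out of a "lastName, firstName - [contentVariable]" filename are assumptions based on the naming described above:

```python
import re
import shutil
from pathlib import Path

# Assumed locations; adjust to the real Scans and Clients folders
scans = Path("C:/Users/Kenny/Documents/Scans")
clients = Path("C:/Users/Kenny/Documents/Clients")

# Assumes filenames look like "lastName, firstName - [contentVariable].pdf"
pattern = re.compile(r"^(?P<name>[^-]+?)\s*-\s*")

for pdf in scans.glob("*.pdf"):
    m = pattern.match(pdf.stem)
    if not m:
        continue  # skip files that don't follow the naming convention
    target = clients / m.group("name")  # e.g. a "Smith, John" sub-folder
    if target.is_dir():
        shutil.move(str(pdf), str(target / pdf.name))
```

shutil.move also handles moves across drives, which matters here since the Clients folder lives on the :/J drive.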

Python - Navigating through Subdirectories that Meet Naming Criteria

I am using Python 3.5 to analyze data contained in csv files. These files are contained in a "figs" directory, which is contained in a case directory, which is contained in an overall data directory, e.g.:
/strm1/serino/DATA/06052009/figs
Or more generally:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs
The directory I am starting in is '/strm1/serino/DATA/,' and each subdirectory is the month, day, and year of a case I am working with. Each subdirectory contains another subdirectory named 'figs,' and that is the location of each case's csv file. To be exact:
/strm1/serino/DATA/case_date_in_MMDDYYYY/figs/case_date_in_MMDDYYYY.csv
So, I would like to start in my DATA directory and go through its subdirectories to find those that have the MMDDYYYY naming. However, some of the case directories may be named with a state abbreviation at the end, like: '06052009_TX.' Therefore, instead of matching the MMDDYYYY naming exactly, it could be something as simple as verifying that the directory name contains any number 1 through 9.
Once I am in the first subdirectory (the case directory) I would like to move into the 'figs' subdirectory. Once there, I want to access the csv file with the same naming convention as the first subdirectory (the case directory). I will fill existing arrays with the data contained in each csv file.
Basically, my question concerns navigating through multiple subdirectories that match a certain naming convention and ultimately accessing the data file at the "end." I was naively playing around with glob, fnmatch, os.listdir, and os.walk, but I could not get anything close enough to working that I feel would be helpful to include. I am not very familiar with those modules. What I can include is what I am going for:
for dirs in data_dir that contain a number:
    go into this directory
    go into 'figs' directory
    read data from the csv file whose name matches its case directory name
    (or whose name format matches the case directory name format)
I have come across related questions, but I have not been able to apply their answers in the way that I would like, especially with nested directories. I really appreciate the help, and let me know if I need to clarify anything.
The following should get you going. It uses the datetime.strptime() function to attempt to convert each folder name into a valid datetime object. If the conversion fails, then you know that the folder name is not in the correct format and can be skipped. It then attempts to parse any CSV file found in the corresponding figs folder:
from datetime import datetime
import glob
import csv
import os

dirpath, dirnames, filenames = next(os.walk('/strm1/serino/DATA'))

for dirname in dirnames:
    if len(dirname) >= 8:
        try:
            dt = datetime.strptime(dirname[:8], '%m%d%Y')
            print(dt, dirname)
            csv_folder = os.path.join(dirpath, dirname)
            for csv_file in glob.glob(os.path.join(csv_folder, 'figs', '*.csv')):
                with open(csv_file, newline='') as f_input:
                    csv_input = csv.reader(f_input)
                    for row in csv_input:
                        print(row)
        except ValueError:
            pass
You listed several problems above. Which one are you stuck on? It seems like you already know how to navigate the file storage system using os.path. You may not know of the function os.path.join() which allows you to manually specify a file path relative to a file as such:
os.path.abspath(os.path.join(os.path.dirname(__file__), '../..', 'Data/TrailShelters/'))
To break down the above:
os.path.dirname(__file__) returns the path of the current file. '../..' means: go up two levels in the folder hierarchy. And Data/TrailShelters/ is the directory I wish to navigate to.
How does this apply to your particular case? Well, you will need to make some adaptations but you can store the os.path of the parent directory in a variable. Then you can essentially use a while sub_dir is not null loop to iterate through subdirectories. For every subdirectory you will want to examine its os.path and extract the particular part of the path you are interested in. Then you can simply use something like: if 'TN' in subdirectory_name to determine if it is a subdirectory you are interested in. If so; then update the saved os.path of the parent directory by appending the path to the subdirectory. Does that make any sense?

Search a directory, including all subdirectories that may or may not exist, for a file in Python.

I want to search a directory, including all subdirectories that may or may not exist, for a file in Python.
I see lots of examples where the directory we are peeking into is known, such as:
os.path.exists('/dir1/myfile.pdf')
...but what if the file I want is located in some arbitrary subdirectory that I don't already know exists or not? For example, the above snippet could never find a file here:
/dir1/dir2/dir3/.../dir20/myfile.pdf
and could clearly never generalize without explicitly running that line 20 times, once for each directory.
I suppose I'm looking for a recursive search, where I don't know the exact structure of the filesystem (if I said that right).
As suggested by @idjaw, try os.walk() like so:
import os
import os.path

for (dir, subdirs, files) in os.walk('/dir1'):
    # Don't go into the CVS subdir!
    if 'CVS' in subdirs:
        subdirs.remove('CVS')
    if 'myfile.pdf' in files:
        print("Found:", os.path.join(dir, 'myfile.pdf'))
Here is code to find a file (in my case "wsgi.py") below the pwd:
import os

for root, dirs, files in os.walk('.'):
    if "wsgi.py" in files:
        print(root)
./jg18/blog/blog
./goat/superlists/superlists
./jcg_blog/jcg_blog
./joelgoldstick.com.16/blog/blog
./blankdj19/blank/blank
./cp/cpblog/cpblog
./baseball/baseball_stats/baseball_stats
./zipcodes/zipcodes/zipcodes
./django.1.6.tut/mysite/mysite
./bits/bits/bits
If the file exists only in one dir, it will list one directory

Python - Opening successive Files without physically opening every one

If I am to read a number of files in Python 3.2, say 30-40, and I want to keep the file references in a list (all the files are in a common folder), is there any way I can open all the files to their respective file handles in the list, without having to individually open every file via the open() function?
This is simple: just use a list comprehension based on your list of file paths. Or, if you only need to access them one at a time, use a generator expression to avoid keeping all forty files open at once.
list_of_filenames = ['/foo/bar', '/baz', '/tmp/foo']
open_files = [open(f) for f in list_of_filenames]
If you want handles on all the files in a certain directory, use the os.listdir function:
import os
open_files = [open(f) for f in os.listdir(some_path)]
I've assumed a simple, flat directory here, but note that os.listdir returns the names of all entries in the given directory, whether they are "real" files or directories. So if you have directories within the directory you're opening, you'll want to filter the results using os.path.isfile:
import os
open_files = [open(f) for f in os.listdir(some_path) if os.path.isfile(f)]
Also, os.listdir only returns the bare filename, rather than the whole path, so if the current working directory is not some_path, you'll want to make absolute paths using os.path.join.
import os
open_files = [open(os.path.join(some_path, f)) for f in os.listdir(some_path)
              if os.path.isfile(os.path.join(some_path, f))]  # isfile needs the full path too
With a generator expression:
import os
all_files = (open(f) for f in os.listdir(some_path)) # note () instead of []
for f in all_files:
    pass  # do something with the open file here.
In all cases, make sure you close the files when you're done with them. If you can upgrade to Python 3.3 or higher, I recommend you use an ExitStack for one more level of convenience.
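A small sketch of the ExitStack idea; the demo writes two throwaway files first so the block is self-contained:

```python
import os
import tempfile
from contextlib import ExitStack

# Create a couple of throwaway files to demonstrate with
tmp = tempfile.mkdtemp()
filenames = [os.path.join(tmp, n) for n in ("a.txt", "b.txt")]
for name in filenames:
    with open(name, "w") as f:
        f.write("demo\n")

# Every file registered with enter_context stays open inside the
# with-block, and ExitStack closes all of them on exit
with ExitStack() as stack:
    handles = [stack.enter_context(open(name)) for name in filenames]
    print([h.read() for h in handles])

print(all(h.closed for h in handles))  # True
```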
The os library (and listdir in particular) should provide you with the basic tools you need:
import os
print("\n".join(os.listdir())) # returns all of the files (& directories) in the current directory
Obviously you'll want to call open with them, but this gives you the files in an iterable form (which I think is the crux of the issue you're facing). At this point you can just do a for loop and open them all (or some of them).
Quick caveat: Jon Clements pointed out in the comments of Henry Keiter's answer that you should watch out for directories, which will show up in os.listdir along with files.
Additionally, this is a good time to write in some filtering statements to make sure you only try to open the right kinds of files. You might be thinking you'll only ever have .txt files in a directory now, but someday your operating system (or users) will have a clever idea to put something else in there, and that could throw a wrench in your code.
Fortunately, a quick filter can do that, and you can do it a couple of ways (I'm just going to show a regex filter):
import os, re

scripts = re.compile(r".*\.py$")
files = [open(x, 'r') for x in os.listdir() if os.path.isfile(x) and scripts.match(x)]
files = map(lambda x: x.read(), files)
print("\n".join(files))
Note that I'm not checking things like whether I have permission to access the file, so if I have the ability to see the file in the directory but not permission to read it then I'll hit an exception.

python zipfile basename

I have some homework that I am trying to complete. I don't want the answer. I'm just having trouble in starting. The work I have tried is not working at all... Can someone please just provide a push in the right direction. I am trying to learn but after trying and trying I need some help.
I know I can use os.path.basename() to get the basename and then add it to the file name, but I can't put it together.
Here is the assignment
In this project, write a function that takes a directory path and creates an archive of the directory only. For example, if the same path were used as in the example ("c:\\xxxx\\Archives\\archive_me"), the zipfile would contain archive_me\\groucho, archive_me\\harpo and archive_me\\chico.
The base directory (archive_me in the example above) is the final element of the input, and all paths recorded in the zipfile should start with the base directory.
If the directory contains sub-directories, the sub-directory names and any files in the sub-directories should not be included. (Hint: You can use isfile() to determine if a filename represents a regular file and not a directory.)
Thanks again any direction would be great.
It would help to know what you tried yourself, so I'm only giving a few pointers to methods in the standard library:
os.listdir to get a list of files and folders under a given directory (beware, it returns only the file/folder name, not the full path!)
os.path.isfile as mentioned in the assignment to check if a given path represents a file or a folder
os.path.isdir, the opposite of os.path.isfile (thanks inspectorG4adget)
os.path.join to join a filename with the base directory without having to worry about slashes and delimiters
ZipFile for handling, well, zip files
ZipFile.write to write the files found to the zip
I'm not sure you'll need all of those, but it doesn't hurt knowing they exist.
