Python Match Portion of File Name - python

I am trying to match file names within a folder using python so that I can run a secondary process on the files that match. My file names are such that they begin differently but match strings at some point as below:
3322_VEGETATION_AREA_2009_09
3322_VEGETATION_LINE_2009_09
4522_VEGETATION_POINT_2009_09
4422_VEGETATION_AREA_2009_09
8722_VEGETATION_LINE_2009_09
2522_VEGETATION_POINT_2009_09
4222_VEGETATION_AREA_2009_09
3522_VEGETATION_LINE_2009_09
3622_VEGETATION_POINT_2009_09
Would regex be the right approach to matching those files after the first underscore or am I overthinking this?

import glob
files = glob.glob("*VEGETATION*")
should do the trick. It finds all files in the current directory that contain "VEGETATION" somewhere in the filename.
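As a slightly fuller sketch of the same idea, the helpers below look in an arbitrary folder and then group the hits by everything after the first underscore, which is the part the question says the names share. The function names `find_matching` and `group_after_first_underscore` are hypothetical and chosen only for illustration.

```python
import glob
import os
from collections import defaultdict


def find_matching(folder, token="VEGETATION"):
    """Return the sorted base names in `folder` that contain `token`."""
    # glob.escape protects any wildcard characters in the folder path itself.
    pattern = os.path.join(glob.escape(folder), f"*{token}*")
    return sorted(os.path.basename(path) for path in glob.glob(pattern))


def group_after_first_underscore(names):
    """Group names by the text that follows their first underscore."""
    groups = defaultdict(list)
    for name in names:
        _prefix, _sep, rest = name.partition("_")
        groups[rest].append(name)
    return dict(groups)
```

For names as regular as these, plain string methods are probably enough and a regex is not needed.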

Related

How to check whether file name contains a specific character with Pathlib

Since it looks like Pathlib is the future, I'm trying to refactor some of my code to switch from my previous use of os to Pathlib. I'm stuck with the following problem. Since I work on a Mac, sometimes the folders contain hidden files preceded by a period (.DS_Store, or names from deleted files preceded by ._). That causes a lot of problems when I loop through files in a directory that have a certain extension. To avoid this problem using os.walk, I do the following:
for root, dirs, files in os.walk(DIR_NAME):
    # iterate all files
    for file in files:
        if file.endswith(ext):
            if file.startswith("."):
                continue
            # do something with the file
I know we have the .stem and .suffix method to manipulate file names with Pathlib, but I don't see how they can help with this problem. The .startswith seems more intuitive but alas it does not seem to be available in Pathlib. So, the question is, how would one go about doing this in Pathlib?
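One possible approach, offered as a minimal sketch rather than a definitive answer: a `Path` object exposes its final component as the plain string `path.name`, so the usual `str.startswith` check still applies. The function name `visible_files` is hypothetical.

```python
from pathlib import Path


def visible_files(directory, ext):
    """Yield files under `directory` whose names end in `ext`, skipping dot-files."""
    for path in Path(directory).rglob(f"*{ext}"):
        # path.name is an ordinary str, so string methods such as
        # startswith() work on it directly.
        if path.is_file() and not path.name.startswith("."):
            yield path
```

The explicit `startswith(".")` test is kept so the result does not depend on how the glob pattern treats dot-files. Like the os.walk loop above, this skips hidden files only, not files inside hidden directories.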

How to best locate a directory with a prefix input by the user?

I have a directory that creates a new subfolder each day, each subfolder's name always starts with the date it was created (i.e. MMDDYY). I need to prompt the user for the date of the file they need (something they'd already have) and search for a subfolder that has a matching prefix in the name. The rest of the folder name can be ignored.
If a folder with the correct prefix is found there will be a similar prompt to locate files in the folder that have a name leading with a 5 digit number that the user would also have. Those files just need copied to a new location. I'm just getting stuck on how to locate a subfolder when I only have the prefix to the folder name and same with the file inside that folder once it's found.
For example, I'm looking for a file that generated on 1/10/2019, the file name starts with 42333. The full folder name would be something like 01102019CHA71H2HBMNN. There would be two files that are found, one with a full file name that might be 42333aaabc.xrf and the other would be 42333aaabc with no file extension. These file names could exist in multiple other folders but usually I need them for specific dates.
If I understood correctly, you need an algorithm that takes a prefix (a string) as input.
In Python you can make "membership" tests with strings, for example:
>>> string = "A long string"
>>> "long" in string
True
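Your algorithm would work with something like:
if prefix in name:  # prefix and name (a directory or file name) are both strings
    ...  # do something
Note that "in" matches the prefix anywhere in the name; name.startswith(prefix) is the stricter test if it must appear at the start.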
But if your question is how to list the files inside a directory, you can do this with either of two libraries:
os
subprocess (by calling "ls" on Linux or "dir" on Windows)
You could also use the re library, which handles regular expressions. It's a bit more complex but far more flexible.
Good source for debugging RegEx: https://regexr.com/
For learning RegEx in Python: https://www.w3schools.com/python/python_regex.asp
Best wishes, pal
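To make the prefix idea concrete, here is a minimal sketch that uses pathlib instead of os or subprocess to list a folder and keep only the entries whose names start with a given prefix. The function name `find_by_prefix` and its `want_dirs` parameter are hypothetical.

```python
from pathlib import Path


def find_by_prefix(parent, prefix, want_dirs):
    """Return sorted entries directly inside `parent` whose names start with `prefix`.

    With want_dirs=True only directories are returned; with
    want_dirs=False only non-directories (regular files) are returned.
    """
    matches = []
    for entry in Path(parent).iterdir():
        if entry.name.startswith(prefix) and entry.is_dir() == want_dirs:
            matches.append(entry)
    return sorted(matches)
```

The same function could be called twice: once with the date prefix to find the subfolder, then on that subfolder with the 5-digit prefix to find the files, which could then be copied with something like `shutil.copy2`.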

-Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil

I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on matching each client's name. I'm not sure if I should be using Glob, or Shutil, or a combination of both. My working theory is that Glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and that I would then use Shutil to physically move the files?
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another.. quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: #Byberi - Something as such?
path = "C:/Users/Kenny/Documents/Scans"
dirs = os.path.isfile()
# This would print all the files and directories
for file in dirs:
    print file
dest_dir = "C:/Users/Kenny/Documents/Clients"
for file in glob.glob(r'C:/*'):
    print(file)
    shutil.copy(file, dest_dir)
I've consulted the following threads already, but I cannot seem to find how to match and move the files.
Select files in directory and move them based on text list of filenames
https://docs.python.org/3/library/glob.html
Python move files from directories that match given criteria to new directory
https://www.guru99.com/python-copy-file.html
https://docs.python.org/3/howto/regex.html
https://code.tutsplus.com/tutorials/file-and-directory-operations-using-python--cms-25817
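One way the matching could work, as a rough sketch under stated assumptions: every scan is named 'lastName, firstName - [contentVariable].pdf', the client folder is named exactly 'lastName, firstName', and the separator is " - ". The function name `move_scans` and its parameters are hypothetical; glob finds the PDFs and shutil moves them, so the two modules are used together.

```python
import shutil
from pathlib import Path


def move_scans(scans_dir, clients_dir, separator=" - "):
    """Move each PDF into the client folder named by the text before `separator`.

    Returns two lists of file names: those moved and those left in place
    because no separator or no matching client folder was found.
    """
    moved, unmatched = [], []
    # sorted() builds the full list first, so moving files does not
    # disturb the iteration.
    for pdf in sorted(Path(scans_dir).glob("*.pdf")):
        client_name, found, _content = pdf.stem.partition(separator)
        target_dir = Path(clients_dir) / client_name
        if found and target_dir.is_dir():
            shutil.move(str(pdf), str(target_dir / pdf.name))
            moved.append(pdf.name)
        else:
            unmatched.append(pdf.name)
    return moved, unmatched
```

Returning the unmatched names makes it easy to review the files whose client folder is missing or misspelled. A regex is not required here because the client name is simply the text before the separator.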

Extracting all file names in python

I have an application that converts from one photo format to another by entering the following in cmd.exe: "AppConverter.exe" "file.tiff" "file.jpeg"
But since I don't want to type this every time I want a photo converted, I would like a script that converts all files in the folder. So far I have this:
def start(self):
    for root, dirs, files in os.walk("C:\\Users\\x\\Desktop\\converter"):
        for file in files:
            if file.endswith(".tiff"):
                subprocess.run(['AppConverter.exe', '.tiff', '.jpeg'])
So how do I get the names of all the files and pass them to subprocess? I am thinking of taking the basename (no extension) of every file and appending .tiff and .jpeg to it, but I'm at a loss on how to do it.
I think the fastest way would be to use the glob module for expressions:
import glob
import subprocess
for file in glob.glob("*.tiff"):
    subprocess.run(['AppConverter.exe', file, file[:-5] + '.jpeg'])
    # file will be like 'test.tiff'
    # file[:-5] will be 'test' (we remove the last 5 characters, i.e. '.tiff')
    # we add '.jpeg' to our extension-less string
All of this information is in the post I linked in the comments on your original question.
You could try looking into os.path.splitext(). That allows you to split the file name into a tuple containing the basename and extension. That might help...
https://docs.python.org/3/library/os.path.html
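A minimal sketch of the os.path.splitext() suggestion: it splits a path into everything before the last extension and the extension itself, so the output name can be built without counting characters. The helper name `build_convert_command` is hypothetical, and the command-line shape is taken from the question.

```python
import os


def build_convert_command(tiff_path, converter="AppConverter.exe"):
    """Return the argument list that converts `tiff_path` to a .jpeg beside it."""
    # splitext("photo.tiff") gives ("photo", ".tiff")
    root, _ext = os.path.splitext(tiff_path)
    return [converter, tiff_path, root + ".jpeg"]
```

Each list could then be passed to `subprocess.run(..., check=True)` inside the loop over matching files.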

python open file matching pattern excluding substring

I need to open some files inside a folder in python
Say, I have the following files in the folder:
text_pbs.fna
text_pdom_fo_oo.fna
text_pdom_fo_oo_aa.fna
text_pdom_fo_oo.ali
text_pdom_ba_ar.fna
text_pdom_ba_ar_aa.fna
text_pdom_ba_ar.ali
text_pdom_ba_az.fna
text_pdom_ba_az_aa.fna
text_pdom_ba_az.ali
I want to open:
text_pdom_fo_oo.fna
text_pdom_ba_ar.fna
text_pdom_ba_az.fna
only.
I tried with glob:
glob.glob('*_pdom_*[^aa].fna')
But it doesn't work.
Could someone point out the problem in the above pattern? Is there any other workaround for this?
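glob does not treat ^ as negation inside a character class; it must be replaced by !. Note that [!aa] is a character class equivalent to [!a]: it matches any single character other than 'a', so it excludes names whose last character before .fna is 'a' rather than the substring 'aa'. That is enough for the files listed here. You should try this code: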
import glob
glob.glob('*_pdom_*[!aa].fna')
gives the result:
['text_pdom_fo_oo.fna','text_pdom_ba_ar.fna','text_pdom_ba_az.fna']
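If the intent is to exclude the `_aa` suffix specifically rather than any name ending in 'a', one workaround is to match broadly and then filter in Python. This is a sketch; the function name `select_fna` is hypothetical, and it works on a list of names (for example from `os.listdir`) so the exclusion rule stays explicit.

```python
import fnmatch


def select_fna(names, pattern="*_pdom_*.fna", excluded_suffix="_aa.fna"):
    """Return the names matching `pattern` that do not end with `excluded_suffix`."""
    candidates = fnmatch.filter(names, pattern)
    return [name for name in candidates if not name.endswith(excluded_suffix)]
```

Unlike the character-class pattern, this would still keep a name such as `text_pdom_ba_ka.fna`.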
