Regular expressions in Python for matching files in a folder - python

I want to match all the files in a folder using regular expressions for some reason:
I used this:
re.compile(r'\.*$')
But this is also matching hidden files and temp files.
Is there a better option?

This assumes that you want to do something with these file names. As someone mentioned in the comments, you should use glob. Since I'm not sure what you mean by 'temp' files, this is the simplest approach. It returns no hidden files, because the * wildcard does not match names that start with a dot. files is a list of file paths relative to your current working directory.
import os, glob
files = [f for f in glob.glob('./*') if os.path.isfile(f)]
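If the 'temp' files also need to be skipped, one possible sketch is below. It assumes hidden files are those whose names start with a dot and that temp files end in ~ or .tmp. Both conventions, and the function name visible_files, are assumptions made for illustration; the question does not define them.

```python
import os

def visible_files(folder, temp_suffixes=('~', '.tmp')):
    """Return sorted names of regular files in folder, skipping
    hidden files and files that look like temp files."""
    names = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip directories and other non-files
            if entry.name.startswith('.'):
                continue  # hidden by Unix convention
            if entry.name.endswith(temp_suffixes):
                continue  # assumed temp-file suffixes
            names.append(entry.name)
    return sorted(names)

if __name__ == '__main__':
    print(visible_files('.'))
```

Adjust temp_suffixes to whatever your editor or tools actually produce.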

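Try re.compile(r'\w+\.*\w*') to match alphanumeric file names with an optional dot extension.
\w+ matches one or more word characters [a-zA-Z0-9_] (the base name)
\.* matches zero or more '.' characters
\w* matches zero or more word characters (the file extension)
Names that start with a dot (hidden files) do not match, because \w+ needs a leading word character. Use the pattern with fullmatch() if the whole name has to fit; match() only checks the start of the name.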
Kodos is an excellent Python regular expression developer/debugger.
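A quick way to see what this pattern accepts is to run it over a few names. The names below are made up for illustration. The sketch contrasts fullmatch(), which requires the entire name to fit, with match(), which only requires a fit at the start.

```python
import re

# Pattern from the answer above, unchanged.
pattern = re.compile(r'\w+\.*\w*')

# Hypothetical file names, chosen only for illustration.
names = ['report.txt', 'README', '.bashrc', 'notes.txt~', 'my file.csv']

# fullmatch(): the whole name must fit the pattern.
full = [n for n in names if pattern.fullmatch(n)]
# match(): only the beginning of the name must fit.
prefix = [n for n in names if pattern.match(n)]

print(full)    # ['report.txt', 'README']
print(prefix)  # ['report.txt', 'README', 'notes.txt~', 'my file.csv']
```

Names with several dots or with hyphens, such as archive.tar.gz, would also fail fullmatch() with this pattern.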

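To get the names of all files directly inside a directory (hidden ones included), take the first result of os.walk:
import os
files = next(os.walk('/foo/bar'))[2]  # stops after the top level instead of walking the whole tree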

Related

Python: Provide a string, open folder that contains the string in the folder name, then open/read a file in that folder

I have a highly branched folder structure as shown below. I want to match a barcode, which is nested between other descriptors and open/read a file of interest in that barcode's folder. The contents of each XXbarcodeXX folder are basically the same.
I have tried os.walk(), glob.glob(), and os.listdir() in combination with fnmatch, but none yielded the correct folder, and glob.glob() just returned an empty list, which I think means it didn't find anything.
The closest attempt, shown below, I did not let finish because it seemed to go top-down through every folder rather than only checking the folder names in the second level. This took far too long because some folders in the third and fourth levels contain hundreds of files and folders.
import re
path='C:\\my\\path'
barcode='barcode2'
for dirname, dirs, files in os.walk(path):
    for folds in dirs:
        if re.match('*'+barcode+'*', folds):
            f = open(os.path.join(dirname+folds)+'FileOfInterest.txt', 'w')
The * at the start of the regex you pass to re.match raises an error (nothing to repeat at position 0), because * is a quantifier (zero or more times) with no preceding token. Try replacing the regex with '..' + barcode + '..'. Because re.match anchors at the start of the string, this matches folder names in which the barcode is preceded by exactly two characters and followed by at least two more (any characters except line terminators). If the number of surrounding characters varies, '.*' + re.escape(barcode) is more forgiving. In os.path.join, pass the directory names and the file name as separate arguments so that the OS-specific separator is inserted for you. Also note that opening the file with 'w' truncates it; use 'r' to read.
import os
import re
path='dirStructure'
barcode='barcode2'
for dirname, dirs, files in os.walk(path):
    for folds in dirs:
        if re.match('..' + barcode + '..', folds):
            # the with block closes the file even if reading fails
            with open(os.path.join(dirname, folds, 'FileOfInterest.txt'), 'r') as f:
                print(f.readlines())
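If the barcode folders always sit exactly two levels below the starting directory (an assumption based on the question's description), walking the whole tree can be avoided. The sketch below lists only those two levels with os.scandir and uses a plain substring test instead of a regular expression. find_barcode_dirs is a hypothetical helper name.

```python
import os

def find_barcode_dirs(root, barcode):
    """Return paths of second-level directories whose name contains barcode.

    Assumes the layout root/<group>/<XXbarcodeXX>/...; deeper levels
    are never listed, so large subfolders cost nothing.
    """
    hits = []
    with os.scandir(root) as first_level:
        for top in first_level:
            if not top.is_dir():
                continue
            with os.scandir(top.path) as second_level:
                for sub in second_level:
                    # Plain substring test; no regular expression needed.
                    if sub.is_dir() and barcode in sub.name:
                        hits.append(sub.path)
    return sorted(hits)

if __name__ == '__main__':
    for folder in find_barcode_dirs('.', 'barcode2'):
        print(os.path.join(folder, 'FileOfInterest.txt'))
```

A substring test would also accept barcode20 when searching for barcode2, so a stricter check may be needed if one barcode can be a prefix of another.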

Python Match Portion of File Name

I am trying to match file names within a folder using python so that I can run a secondary process on the files that match. My file names are such that they begin differently but match strings at some point as below:
3322_VEGETATION_AREA_2009_09
3322_VEGETATION_LINE_2009_09
4522_VEGETATION_POINT_2009_09
4422_VEGETATION_AREA_2009_09
8722_VEGETATION_LINE_2009_09
2522_VEGETATION_POINT_2009_09
4222_VEGETATION_AREA_2009_09
3522_VEGETATION_LINE_2009_09
3622_VEGETATION_POINT_2009_09
Would regex be the right approach to matching those files after the first underscore or am I overthinking this?
import glob
files = glob.glob("*VEGETATION*")
This should do the trick: it finds all entries in the current directory whose names contain "VEGETATION" anywhere.
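If the goal is to treat files that agree after the first underscore as one group, plain string splitting may be enough. This sketch takes a few of the names from the question and groups them by everything after the first underscore; no regular expression is involved.

```python
from collections import defaultdict

# A few of the file names listed in the question.
names = [
    '3322_VEGETATION_AREA_2009_09',
    '3322_VEGETATION_LINE_2009_09',
    '4422_VEGETATION_AREA_2009_09',
    '8722_VEGETATION_LINE_2009_09',
]

groups = defaultdict(list)
for name in names:
    # partition() splits at the first underscore only.
    prefix, _, rest = name.partition('_')
    groups[rest].append(name)

for rest, members in groups.items():
    print(rest, members)
```

Each key is the shared tail of the name, and each value lists the files that carry it.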

Delete all files with partial filename python

I have files in my present working directory that I would like to delete. They all have a filename that starts with the string 'words' (for example, files words_1.csv and words_2.csv). I want to match all files in the current directory that start with 'words' and delete them. What would the search pattern be?
I found this from here, but it doesn't quite answer the question.
import os, re
def purge(dir, pattern):
    for f in os.listdir(dir):
        if re.search(pattern, f):
            os.remove(os.path.join(dir, f))
A plain string test is enough to check the prefix:
t = 'words_1.csv'
print(t.startswith('words'))  # True
If you do want a regular expression, the pattern could be r'^words.*\.csv$', but I suggest you read the Python re documentation.
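If I'm understanding your question correctly, you have this function and you are asking how it may be used. You should be able to call simply:
purge('/path/to/your/dir', '^words')
This removes every file whose name starts with the string "words". The ^ anchor matters: purge uses re.search, which finds a match anywhere in the name, so an unanchored 'words.*' would also delete a file such as passwords.csv.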
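pattern is a regular expression pattern. In your case, it's simply anything beginning with "words" and ending with ".csv", so you can use
pattern = r"^words.*\.csv$"
Note that "words*.csv" is shell glob syntax, not a regular expression; as a regex it would not match words_1.csv.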
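Before deleting anything, it may help to check which names a candidate pattern selects. The sketch below is a dry run over made-up file names and compares an anchored regex, an unanchored one, and a glob-style string used as a regex. It uses re.search, as purge does, and removes nothing.

```python
import re

# Hypothetical directory listing, for illustration only.
names = ['words_1.csv', 'words_2.csv', 'passwords.csv', 'words.txt']

anchored = re.compile(r'^words.*\.csv$')   # starts with "words", ends with ".csv"
loose = re.compile(r'words.*')             # "words" anywhere in the name
glob_style = re.compile(r'words*.csv')     # glob syntax misread as a regex

def matching(pattern):
    """Names that purge() would delete for this pattern (re.search semantics)."""
    return [n for n in names if pattern.search(n)]

print(matching(anchored))    # ['words_1.csv', 'words_2.csv']
print(matching(loose))       # all four names
print(matching(glob_style))  # ['passwords.csv']
```

Only the anchored pattern selects exactly the intended files; the glob-style string misses them and hits an unrelated file instead.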

Python: Renaming first 5 files of a folder

I don't know if this can be done or not, but is there a way that I can rename only the first 5 files in a folder? I know that I can use os.listdir() or os.walk() to walk through the entire folder, but I only need to rename the first 5 files. I am able to use a regex to match the files, but the problem is that there are other files that match the same regex. Does anyone have any suggestions?
The file name takes the form of "Test Run 1 4-29-2016 2 07 56 PM".
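You can limit the result from listdir. Note that os.listdir returns names in arbitrary order, so sort the list if "first 5" should mean the first 5 by name:
sorted(os.listdir(os.curdir))[:5]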
glob.glob lets you filter files using wildcards. From the documentation:
glob.glob(pathname)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched.
glob.glob('*.gif')[:5]
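Putting the pieces together, a possible sketch of renaming only the first few matches is below. It assumes "first" means first in sorted name order, and the helper name rename_first, the prefix argument, and the renaming scheme are all hypothetical choices for illustration.

```python
import os
import re

def rename_first(folder, pattern, count=5, prefix='renamed_'):
    """Rename the first `count` matching files (sorted by name).

    Returns the list of new file names. Files beyond `count` are
    left untouched even if they match the pattern.
    """
    regex = re.compile(pattern)
    matches = sorted(
        n for n in os.listdir(folder)
        if regex.search(n) and os.path.isfile(os.path.join(folder, n))
    )
    renamed = []
    for name in matches[:count]:
        new_name = prefix + name
        os.rename(os.path.join(folder, name), os.path.join(folder, new_name))
        renamed.append(new_name)
    return renamed
```

If "first" should instead mean oldest, sorting by os.path.getmtime would be the place to change it.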

Python regular expression to match a file name. Using os.walk() to get a file name

I'm using os.walk() to get file names. What I need to do is create a list of the file names that match the following patterns:
'*' matches all files.
'h*' matches all files beginning with h.
'*h' matches all files ending with h.
'*h*' matches all files that have h in them.
'[h-w]*' matches any one character in the set, including set negation [^h-w]
I'm new to regular expressions and I'm having trouble creating an if statement for this. Could someone explain how to do it (maybe with code examples)? Thanks.
I tried fnmatch, and it's working perfectly, a Big Thanks to Charles Duffy.
Here is my code:
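import fnmatch, os
matches = []  # named matches rather than list, which would shadow the built-in type
for dp, dn, filenames in os.walk(path):
    for ff in filenames:
        if fnmatch.fnmatch(ff, 'My patterns here'):
            matches.append(os.path.join(dp, ff))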
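To see how fnmatch treats each of the patterns from the question, here is a small sketch over made-up file names. It uses fnmatch.fnmatchcase so that the result does not depend on the operating system's case rules. Note that fnmatch writes set negation as [!h-w], not [^h-w].

```python
import fnmatch

# Hypothetical file names, for illustration only.
names = ['hello.py', 'bash', 'graph.h', 'zebra.txt', 'world']

patterns = ['*', 'h*', '*h', '*h*', '[h-w]*', '[!h-w]*']

# Map each pattern to the names it matches (case-sensitive).
results = {
    p: [n for n in names if fnmatch.fnmatchcase(n, p)]
    for p in patterns
}

for p in patterns:
    print(p, results[p])
# *        -> all five names
# h*       -> ['hello.py']
# *h       -> ['bash', 'graph.h']
# *h*      -> ['hello.py', 'bash', 'graph.h']
# [h-w]*   -> ['hello.py', 'world']
# [!h-w]*  -> ['bash', 'graph.h', 'zebra.txt']
```

The bracket patterns constrain only the first character here, because the trailing * accepts whatever follows.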
