Python file searching trouble - python

I’m new to programming and could do with a little help. I’m trying to make a program that will search for files in a specified directory by extension (multiple extensions) and then only return specific results which have my list of keywords in the filename.
I have the following:
import os
from fnmatch import fnmatch
root = 'c:\users'
pattern = "*.css"
for path, subdirs, files in os.walk(root):
    for name in files:
        if fnmatch(name, pattern):
            print os.path.join(name)
This will bring back all files with a single extension, in this case .css files, but I need it to do more such as image and text file extensions. I would also like it to only return files that have specific keywords in the file name. Can anyone point me in the right direction??
Thanks

Perhaps you could use glob:
from glob import glob
for filename in glob('*.css'):
    print(filename)
If you have multiple extensions, you can concatenate the lists returned by glob():
exts = ['css', 'txt']
matches = []
for ext in exts:
    matches += glob('*.' + ext)
for filename in matches:
    print(filename)

Ok, so if you are searching for filetype and keywords, here would be some easy code to start with:
import os
import re
root = r'c:\users'
pattern = re.compile(r"((keyword1)|(keyword2))\.((txt)|(jpg))")
for path, subdirs, files in os.walk(root):
    for name in files:
        if re.match(pattern, name):
            print(os.path.join(path, name))
This code matches file names that start with keyword1 or keyword2 immediately followed by .txt or .jpg.
If you want more user-friendly code, in which extensions or keywords are easy to add, you could use lists like so:
import os
import re
# configuration information
root = r'C:\Users'
keywords = ['one', 'two']
extensions = ['jpg', 'txt']
use_wildcard = True # Enables you to catch the keyword anywhere in the filename
# end of configuration
keyword_pattern = ''
first = True
for k in keywords:
    if use_wildcard:
        k = '.*' + k + '.*'
    if first:
        keyword_pattern += '(' + k + ')'
        first = False
    else:
        keyword_pattern += '|(' + k + ')'
extension_pattern = ''
first = True
for ext in extensions:
    if first:
        extension_pattern += '(' + ext + ')'
        first = False
    else:
        extension_pattern += '|(' + ext + ')'
pattern_regex = r"({0})\.({1})".format(keyword_pattern, extension_pattern)
print("Searching for: " + pattern_regex)
pattern = re.compile(pattern_regex)
for path, subdirs, files in os.walk(root):
    for name in files:
        if re.match(pattern, name):
            print(os.path.join(path, name))
I'm using regular expressions because they are very powerful and are useful in a lot of cases. They may seem complex at first sight but once you understand them, it's difficult not to use them :)
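For comparison, the same filter can be sketched without regular expressions, since str.endswith accepts a tuple of suffixes. The function name find_files and its parameters are made up for this illustration, and the keyword check is assumed to be case-insensitive and applied to the name without its extension:

```python
import os

def find_files(root, extensions, keywords):
    """Yield paths under root whose name ends with one of the extensions
    and whose stem contains at least one keyword (case-insensitive)."""
    # str.endswith accepts a tuple, so several extensions need only one call
    suffixes = tuple('.' + ext.lower().lstrip('.') for ext in extensions)
    lowered_keywords = [k.lower() for k in keywords]
    for path, subdirs, files in os.walk(root):
        for name in files:
            lowered = name.lower()
            # Check keywords against the stem so "txt" cannot match the extension
            stem = os.path.splitext(lowered)[0]
            if lowered.endswith(suffixes) and any(k in stem for k in lowered_keywords):
                yield os.path.join(path, name)
```

It could be called as `for p in find_files(r'c:\users', ['css', 'txt', 'jpg'], ['one', 'two']): print(p)`.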

Related

How to remove all numbers from a file name or a string - Python3

I have a Python script that will walk through all of the directories within the Test Folder(in this case) and will remove all numbers at the beginning of each of the file names. So my question is how would I modify my script in order to remove numbers from the whole file name? Not just the beginning or the end of it.
Thanks,
Alex
import os
for root, dirs, files in os.walk("Test Folder", topdown=True):
    for name in files:
        if (name.startswith("01") or name.startswith("02") or name.startswith("03") or name.startswith("04") or name.startswith("04") or name.startswith("05") or name.startswith("06") or name.startswith("07") or name.startswith("08") or name.startswith("09") or name[0].isdigit()):
            old_filepath = (os.path.join(root, name))
            _, new_filename = name.split(" ", maxsplit=1)
            new_filepath = (os.path.join(root, new_filename))
            os.rename(old_filepath, new_filepath)
Use a regular expression, particularly re.sub:
>>> import re
>>> filename = '12name34with56numbers78in9it.txt'
>>> re.sub(r'\d', '', filename)
'namewithnumbersinit.txt'
This replaces everything that matches the \d pattern, i.e. every digit, with '', i.e. nothing.
If you want to protect the extension, it gets messier. You have to split the extension from the string, replace the digits in the first part, then join the extension back on. os.path.splitext can help you with that:
>>> import os
>>> filename = '12name34with56numbers78in9it.mp3'
>>> name, ext = os.path.splitext(filename)
>>> re.sub(r'\d+', '', name) + ext
'namewithnumbersinit.mp3'
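To tie this back to the os.walk loop in the question, here is a minimal sketch that only plans the renames instead of performing them. The helper names strip_digits and plan_renames are invented for this example, and name collisions (two files ending up with the same new name) are deliberately not handled:

```python
import os
import re

def strip_digits(filename):
    """Remove every digit from the stem, leaving the extension untouched."""
    stem, ext = os.path.splitext(filename)
    return re.sub(r'\d+', '', stem) + ext

def plan_renames(top):
    """Yield (old_path, new_path) pairs without renaming anything."""
    for root, dirs, files in os.walk(top):
        for name in files:
            new_name = strip_digits(name)
            ext = os.path.splitext(name)[1]
            # Skip unchanged names and names whose stem was all digits
            # (e.g. "123.txt" would otherwise become ".txt")
            if new_name == name or new_name == ext:
                continue
            yield os.path.join(root, name), os.path.join(root, new_name)
```

After checking the pairs, each one can be passed to os.rename(old_path, new_path).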
You can do this:
filename = "this2has8numbers323in5it"
filename = "".join(char for char in filename if not char.isdigit())
No imports necessary.
import os
def rename_files(directory):
    # Get the file names from the directory
    for file_name in os.listdir(directory):
        # str.translate with a deletion table removes all digits (Python 3)
        new_name = file_name.translate(str.maketrans('', '', '0123456789'))
        if new_name != file_name:
            os.rename(os.path.join(directory, file_name),
                      os.path.join(directory, new_name))
rename_files("path/to/your/directory")  # replace with your directory path

Renaming of files for a given pattern

I need help.
There is a folder "C:\TEMP" that contains files named like "IN_ + 7123456789.amr".
I need to rename these files according to a given pattern:
"IN_ NAME _ DATE-CREATE _ Phone number.amr"
Correspondingly, if a file is called "OUT_ + 7123456789.amr", the result should have the format "OUT_ NAME_DATE-CREATE_Phone number.amr".
The question is how to check the file name before calling os.rename and choose the template depending on the file name.
import os
path = "C:/TEMP"
for i, filename in enumerate(os.listdir(path)):
    os.chdir(path)
    os.rename(filename, 'name'+str(i) +'.txt')
    i = i+1
Sorry, but the examples in your question are not consistent, and I still don't understand what your C:\TEMP contains.
Well, assuming it would look like:
>>> os.listdir(path)
['IN_ + 7123456789.amr', 'OUT_ + 7123456789.amr']
The example:
import datetime
import re
import os
path = "C:/TEMP"
os.chdir(path)
for filename in os.listdir(path):
    match = re.match(r'(IN|OUT)_ \+ (\d+)\.amr', filename)
    if match:
        file_date = datetime.datetime.fromtimestamp(os.stat(filename).st_mtime)
        destination = '%s_%s_%s_Phone number.amr' % (
            match.group(1), # either IN or OUT
            match.group(2),
            file_date.strftime('%Y%m%d%H%M%S'), # adjust the format at your convenience
        )
        os.rename(filename, destination)
Will produce:
IN_7123456789_20150721094227_Phone number.amr
OUT_7123456789_20150721094227_Phone number.amr
Other files won't match the re.match pattern and will be ignored.
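Since the question asks how to check the name before calling os.rename, one option is to keep the name-building step in a function of its own so it can be tried on sample names first. This is only a sketch: build_name and FILENAME_RE are names invented here, and the output format simply copies the one used in the answer above.

```python
import datetime
import re

# Hypothetical helper: computes the new name without touching the file system
FILENAME_RE = re.compile(r'(IN|OUT)_ \+ (\d+)\.amr$')

def build_name(filename, modified):
    """Return the new name for filename, or None if it does not match."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return None
    direction, phone = match.groups()  # direction is either IN or OUT
    stamp = modified.strftime('%Y%m%d%H%M%S')
    return '%s_%s_%s_Phone number.amr' % (direction, phone, stamp)
```

A caller would pass the file's modification time, skip any file for which the function returns None, and only then call os.rename.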

Renaming multiple images with .rename and .endswith

I've been trying to get this to work, but I feel like I'm missing something. There is a large collection of images in a folder, and I need to rename just part of each filename. For example, I'm trying to rename "RJ_200", "RJ_600", and "RJ_601" all to the same "RJ_500", while keeping the rest of the filename intact.
Image01.Food.RJ_200.jpg
Image02.Food.RJ_200.jpg
Image03.Basket.RJ_600.jpg
Image04.Basket.RJ_600.jpg
Image05.Cup.RJ_601.jpg
Image06.Cup.RJ_602.jpg
This is what I have so far, but it keeps just giving me the "else" instead of actually renaming any of them:
import os
import fnmatch
import sys
user_profile = os.environ['USERPROFILE']
dir = user_profile + "\Desktop" + "\Working"
print (os.listdir(dir))
for images in dir:
    if images.endswith("RJ_***.jpg"):
        os.rename("RJ_***.jpg", "RJ_500.jpg")
    else:
        print ("Arg!")
The Python string method endswith does not do pattern-matching with *, so you're looking for filenames which explicitly include the asterisk character and not finding any.
Try using regular expressions to match your filenames and then building your target filename explicitly:
import os
import re
patt = r'RJ_\d\d\d'
user_profile = os.environ['USERPROFILE']
path = os.path.join(user_profile, "Desktop", "Working")
image_files = os.listdir(path)
for filename in image_files:
    flds = filename.split('.')
    try:
        frag = flds[2]
    except IndexError:
        # Not enough dot-separated fields, so this is not one of our images
        continue
    if re.match(patt, frag):
        from_name = os.path.join(path, filename)
        to_name = '.'.join([flds[0], flds[1], 'RJ_500', 'jpg'])
        os.rename(from_name, os.path.join(path, to_name))
Note that you need to do your matching with the file's basename and join on the rest of the path later.
You don't need to use .endswith. You can split the image file name up using .split and check the results. Since there are several suffix strings involved, I've put them all into a set for fast membership testing.
import os
suffixes = {"RJ_200", "RJ_600", "RJ_601"}
new_suffix = "RJ_500"
user_profile = os.environ["USERPROFILE"]
dir = os.path.join(user_profile, "Desktop", "Working")
for image_name in os.listdir(dir):
    pieces = image_name.split(".")
    # Only handle names of the form name.category.suffix.extension
    if len(pieces) == 4 and pieces[2] in suffixes:
        from_path = os.path.join(dir, image_name)
        new_name = ".".join([pieces[0], pieces[1], new_suffix, pieces[3]])
        to_path = os.path.join(dir, new_name)
        print("renaming {} to {}".format(from_path, to_path))
        os.rename(from_path, to_path)
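As a variation on the two answers above, the new name can also be computed with a single re.sub, which avoids indexing into the split pieces. This sketch assumes names shaped like the examples in the question; with_new_suffix and SUFFIX_RE are names made up here. Unlike the set-based version, it rewrites any three-digit RJ_ field, including RJ_602.

```python
import re

# Assumption taken from the question's examples: the field is "RJ_" plus
# exactly three digits, delimited by dots on both sides.
SUFFIX_RE = re.compile(r'\.RJ_\d{3}\.')

def with_new_suffix(filename, new_suffix='RJ_500'):
    """Return filename with its RJ_### field replaced, or unchanged if absent."""
    return SUFFIX_RE.sub('.' + new_suffix + '.', filename, count=1)
```

The result can be compared with the original name, and os.rename called only when the two differ.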

Python - Loop through list within regex

Right, I'm relatively new to Python, which you will likely see in my code, but is there any way to iterate through a list within a regex?
Basically, I'm looping through each filename within a folder, getting a code (2-6 digits) from the filename, and I want to compare it with a list of codes in a text file, which have a name attached, in the format "1234_Name" (without the quotation marks). If the code exists in both lists, I want to print out the list entry, i.e. 1234_Name. Currently my code only seems to look at the first entry in the text file's list, and I'm not sure how to make it look through them all to find matches.
import os, re
sitesfile = open('C:/Users/me/My Documents/WORK_PYTHON/Renaming/testnames.txt', 'r')
filefolder = r'C:/Users/me/My Documents/WORK_PYTHON/Renaming/files/'
sites = sitesfile.read()
site_split = re.split('\n', sites)
old = []
newname = []
for site in site_split:
    newname.append(site)
for root, dirs, filenames in os.walk(filefolder):
    for filename in filenames:
        fullpath = os.path.join(root, filename)
        filename_split = os.path.splitext(fullpath)
        filename_zero, fileext = filename_split
        filename_zs = re.split("/", filename_zero)
        filenm = re.search(r"[\w]+", str(filename_zs[-1:]))#get only filename, not path
        filenmgrp = filenm.group()
        pacode = re.search('\d\d+', filenmgrp)
        if pacode:
            pacodegrp = pacode.group()
            match = re.match(pacodegrp, site)
            if match:
                print site
Hope this makes sense - thanks a lot in advance!
So, use this code instead:
import os
import re
def locate(pattern=r'\d+[_]', root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root)):
        for filename in files:
            match = re.search(pattern, filename)
            if match:
                yield match.group(), os.path.join(path, filename)
...this yields the matched text and the full path of every file in a folder whose name matches the given regex pattern.
with open('list_file.txt', 'r') as f:
    entries = {line.split('_')[0]: line.strip() for line in f if line.strip()}
print_out = []
for matched, path in locate(<your code regex>, <your directory>):
    code = matched.rstrip('_')  # the default pattern includes the trailing underscore
    if code in entries: print_out.append(entries[code])
print(print_out)
...find the valid codes in your list_file first, then compare them with the codes that come back from your given regex.
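The underlying idea, looking each extracted code up in a dictionary rather than looping over the list inside the regex, can be sketched on plain lists so it is easy to try out. The function names are invented for this example, and the pattern \d{2,6} encodes the question's "2-6 digits" (a longer run of digits would match on its first six):

```python
import re

def load_sites(lines):
    """Map each code to its full entry, e.g. '1234' -> '1234_Name'."""
    sites = {}
    for line in lines:
        entry = line.strip()
        if entry:
            sites[entry.split('_', 1)[0]] = entry
    return sites

def match_sites(filenames, sites):
    """Return the entries whose code appears as a 2-6 digit run in a file name."""
    found = []
    for filename in filenames:
        code = re.search(r'\d{2,6}', filename)
        if code and code.group() in sites:
            found.append(sites[code.group()])
    return found
```

Here load_sites would be given the lines of the text file, and match_sites the file names collected from os.walk.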

Filtering os.walk() dirs and files

I'm looking for a way to include/exclude files patterns and exclude directories from a os.walk() call.
Here's what I'm doing so far:
import fnmatch
import os
includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']
def _filter(paths):
    for path in paths:
        if os.path.isdir(path) and not path in excludes:
            yield path
        for pattern in (includes + excludes):
            if not os.path.isdir(path) and fnmatch.fnmatch(path, pattern):
                yield path
for root, dirs, files in os.walk('/home/paulo-freitas'):
    dirs[:] = _filter(map(lambda d: os.path.join(root, d), dirs))
    files[:] = _filter(map(lambda f: os.path.join(root, f), files))
    for filename in files:
        filename = os.path.join(root, filename)
        print(filename)
Is there a better way to do this? How?
This solution uses fnmatch.translate to convert glob patterns to regular expressions (it assumes that includes applies only to files):
import fnmatch
import os
import os.path
import re
includes = ['*.doc', '*.odt'] # for files only
excludes = ['/home/paulo-freitas/Documents'] # for dirs and files
# transform glob patterns to regular expressions
includes = r'|'.join([fnmatch.translate(x) for x in includes])
excludes = r'|'.join([fnmatch.translate(x) for x in excludes]) or r'$.'
for root, dirs, files in os.walk('/home/paulo-freitas'):
    # exclude dirs
    dirs[:] = [os.path.join(root, d) for d in dirs]
    dirs[:] = [d for d in dirs if not re.match(excludes, d)]
    # exclude/include files
    files = [os.path.join(root, f) for f in files]
    files = [f for f in files if not re.match(excludes, f)]
    files = [f for f in files if re.match(includes, f)]
    for fname in files:
        print(fname)
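To see what the joined pattern does in isolation, here is a small sketch. The helper is_included is made up for this illustration, and the exact text returned by fnmatch.translate differs between Python versions, so the example only relies on how the compiled pattern behaves:

```python
import fnmatch
import re

includes = ['*.doc', '*.odt']

# Each translated glob is a complete regular expression anchored at the end;
# joining them with "|" accepts a path that matches any one of the globs.
combined = re.compile('|'.join(fnmatch.translate(p) for p in includes))

# The fallback r'$.' used for an empty excludes list can never match:
# it demands a character after the end of the string.
never = re.compile(r'$.')

def is_included(path):
    """True if path matches at least one of the include globs."""
    return combined.match(path) is not None
```

Because `*` is translated to a pattern that also matches path separators, the check works on full paths as well as bare file names.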
From docs.python.org:
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
When topdown is True, the caller can modify the dirnames list in-place … this can be used to prune the search …
for root, dirs, files in os.walk('/home/paulo-freitas', topdown=True):
    # excludes can be done with fnmatch.filter and complementary set,
    # but it's more annoying to read.
    dirs[:] = [d for d in dirs if d not in excludes]
    for pat in includes:
        for f in fnmatch.filter(files, pat):
            print os.path.join(root, f)
I should point out that the above code assumes excludes is a pattern, not a full path. You would need to adjust the list comprehension to filter if os.path.join(root, d) not in excludes to match the OP case.
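The adjustment described above, comparing the joined path against a list of full paths, might look like the following sketch. The generator name walk_filtered and its parameters are assumptions made for this example:

```python
import fnmatch
import os

def walk_filtered(top, includes, exclude_dirs):
    """Yield file paths under top that match any glob in includes,
    never descending into a directory listed in exclude_dirs (full paths)."""
    excluded = {os.path.normpath(d) for d in exclude_dirs}
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment mutates the list that os.walk holds, which is
        # what prunes the walk; rebinding "dirs" would have no effect.
        dirs[:] = [d for d in dirs
                   if os.path.normpath(os.path.join(root, d)) not in excluded]
        for name in files:
            if any(fnmatch.fnmatch(name, pat) for pat in includes):
                yield os.path.join(root, name)
```

For the case in the question it would be called as `walk_filtered('/home/paulo-freitas', ['*.doc', '*.odt'], ['/home/paulo-freitas/Documents'])`.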
why fnmatch?
import os
excludes=....
for ROOT,DIR,FILES in os.walk("/path"):
    for file in FILES:
        if file.endswith(('doc','odt')):
            print file
    for directory in DIR:
        if not directory in excludes :
            print directory
not exhaustively tested
dirtools is perfect for your use-case:
from dirtools import Dir
print(Dir('.', exclude_file='.gitignore').files())
Here is one way to do that
import fnmatch
import os
excludes = ['/home/paulo-freitas/Documents']
matches = []
for path, dirs, files in os.walk(os.getcwd()):
    # Skip any directory that lies under an excluded path
    if any(eachpath in path for eachpath in excludes):
        continue
    for filename in files:
        if fnmatch.fnmatch(filename, '*.doc') or fnmatch.fnmatch(filename, '*.odt'):
            matches.append(os.path.abspath(os.path.join(path, filename)))
print(matches)
import os
includes = ['*.doc', '*.odt']
excludes = ['/home/paulo-freitas/Documents']
def file_search(path, exe):
    for x,y,z in os.walk(path):
        for a in z:
            # Compare against the extension part of the glob, e.g. ".doc"
            if a.endswith(exe.lstrip('*')):
                print(os.path.join(x,a))
for x in includes:
    file_search(excludes[0],x)
This is an example of excluding directories and files with os.walk():
import os
import shutil
ignoreDirPatterns=[".git"]
ignoreFilePatterns=[".php"]
def copyTree(src, dest, onerror=None):
    src = os.path.abspath(src)
    src_prefix = len(src) + len(os.path.sep)
    for root, dirs, files in os.walk(src, onerror=onerror):
        for pattern in ignoreDirPatterns:
            if pattern in root:
                break
        else:
            # Runs only if the loop above finished without break,
            # i.e. no directory pattern matched
            for file in files:
                for pattern in ignoreFilePatterns:
                    if pattern in file:
                        break
                else:
                    # Runs only if no file pattern matched
                    dirpath = os.path.join(dest, root[src_prefix:])
                    try:
                        os.makedirs(dirpath,exist_ok=True)
                    except OSError as e:
                        if onerror is not None:
                            onerror(e)
                    filepath=os.path.join(root,file)
                    shutil.copy(filepath,dirpath)
                continue  # move on to the next file
        continue  # move on to the next directory
Requires Python >= 3.2 because of exist_ok in os.makedirs.
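The for/else construct used twice above can be easy to misread, so here is a minimal sketch of its behaviour in isolation. The function matches_any is invented for this illustration:

```python
def matches_any(text, patterns):
    """Return True if any pattern occurs in text, written with for/else."""
    for pattern in patterns:
        if pattern in text:
            break          # a pattern matched: the else branch is skipped
    else:
        return False       # the loop ran to completion: nothing matched
    return True
```

The same test can be written as `any(pattern in text for pattern in patterns)`, which some readers may find clearer.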
None of the methods above worked for me, so this is what I came up with; it expands on my original answer to another question.
What worked for me was:
if (not (str(root) + '/').startswith(tuple(exclude_foldr)))
which builds the path as a string and skips it if it starts with any of my listed folders.
This gave me exactly the result I was looking for.
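The check above relies on str.startswith accepting a tuple of prefixes. A small variation, sketched here with the invented name is_excluded, appends a trailing "/" to both sides so that an excluded folder does not also exclude a sibling whose name merely begins the same way. It assumes POSIX-style paths:

```python
def is_excluded(root, excluded_folders):
    """True if root is one of the excluded folders or lies below one."""
    # The trailing "/" keeps "/docs/Random" from also excluding "/docs/Random2"
    prefixes = tuple(folder.rstrip('/') + '/' for folder in excluded_folders)
    return (root.rstrip('/') + '/').startswith(prefixes)
```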
My goal was to keep my Mac organized.
I can search any folder by path, locate and move specific file types, and ignore subfolders, and I prompt the user up front about whether they want to move the files.
NOTE: The prompt appears only once per run, NOT once per file.
The prompt defaults to NO when you hit Enter at [y/N]; in that case the script just lists the files that would be moved.
This is only a snippet from my GitHub; please visit it for the full script.
HINT: Read the script below; I added a comment on each line explaining what I did.
#!/usr/bin/env python3
# =============================================================================
# Created On : MAC OSX High Sierra 10.13.6 (17G65)
# Created On : Python 3.7.0
# Created By : Jeromie Kirchoff
# =============================================================================
"""THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
# =============================================================================
from os import walk
from os import path
from shutil import move
import getpass
import click
mac_username = getpass.getuser()
includes_file_extensn = ([".jpg", ".gif", ".png", ".jpeg", ])
search_dir = path.dirname('/Users/' + mac_username + '/Documents/')
target_foldr = path.dirname('/Users/' + mac_username + '/Pictures/Archive/')
exclude_foldr = set([target_foldr,
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/GitHub/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Random/'),
                     path.dirname('/Users/' + mac_username +
                                  '/Documents/Stupid_Folder/'),
                     ])
if click.confirm("Would you like to move files?",
                 default=False):
    question_moving = True
else:
    question_moving = False
def organize_files():
    """THE MODULE HAS BEEN BUILD FOR KEEPING YOUR FILES ORGANIZED."""
    # topdown=True required for filtering.
    # "Root" had all info i needed to filter folders not dir...
    for root, dir, files in walk(search_dir, topdown=True):
        for file in files:
            # creating a directory to str and excluding folders that start with
            if (not (str(root) + '/').startswith(tuple(exclude_foldr))):
                # showcase only the file types looking for
                if (file.endswith(tuple(includes_file_extensn))):
                    # using path.normpath as i found an issue with double //
                    # in file paths.
                    filetomove = path.normpath(str(root) + '/' +
                                               str(file))
                    # forward slash required for both to split
                    movingfileto = path.normpath(str(target_foldr) + '/' +
                                                 str(file))
                    # Answering "NO" this only prints the files "TO BE Moved"
                    print('Files To Move: ' + str(filetomove))
                    # This is using the prompt you answered at the beginning
                    if question_moving is True:
                        print('Moving File: ' + str(filetomove) +
                              "\n To:" + str(movingfileto))
                        # This is the command that moves the file
                        move(filetomove, movingfileto)
                        pass
                    # The rest is ignoring explicitly and continuing
                    else:
                        pass
                    pass
                else:
                    pass
            else:
                pass
if __name__ == '__main__':
    organize_files()
Example of running my script from terminal:
$ python3 organize_files.py
Exclude list: {'/Users/jkirchoff/Pictures/Archive', '/Users/jkirchoff/Documents/Stupid_Folder', '/Users/jkirchoff/Documents/Random', '/Users/jkirchoff/Documents/GitHub'}
Files found will be moved to this folder:/Users/jkirchoff/Pictures/Archive
Would you like to move files?
No? This will just list the files.
Yes? This will Move your files to the target folder.
[y/N]:
Example of listing files:
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
Files To Move: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
...etc
Example of moving files:
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/1.custom-award-768x512.jpg
To: /Users/jkirchoff/Pictures/Archive/1.custom-award-768x512.jpg
Moving File: /Users/jkirchoff/Documents/Archive/JayWork/10351458_318162838331056_9023492155204267542_n.jpg
To: /Users/jkirchoff/Pictures/Archive/10351458_318162838331056_9023492155204267542_n.jpg
...
