If "Part[Digits].rar" Exists In Filename - python

So I'm looking to develop a script to return a duple with filenames and extensions so I can alter the filenames (capitalize, remove dots, dashes, etc.) in the name, then just add the extension and be good to go (at some point I'm hoping to throw this into something with a GUI).
I'm not sure what the best way to do this is, but I'd like to have it so it takes the extension (the last .ext) unless the filename ends in "part[any number of digits, any number of zeros from 1-any number].rar" then it takes the last two parts as the "extension".
I'm not set on this specific methodology. However, I do need it to integrate into the current script (integrate into the same script as I'll be using for everything else) and I know it needs to end with me having a duple name, ext pair.
My Current Code:
import os, shutil, re
def rename_file (original_filename):
name, extension = os.path.splitext(original_filename)
name = re.sub(r"\'", r"", name) # etc...more of these...
new_filename = name + extension
try:
# moves files or directories (recursively)
shutil.move(original_filename, new_filename)
except shutil.Error:
print ("Couldn't rename file %(original_filename)s!" % locals())
[rename_file(f) for f in os.listdir('.') if not f.startswith('.')]
How do I make this put "part[digits]" with the .rar as part of the "ext" instead of as part of the name?

You can do it in a regular expression instead of using splitext:
m = re.search(r'(.*?)((part\[\d+\])?\.rar)', original_filename)
name, ext = m.groups()[:2]
So, for example:
>>> m = re.search(r'(.*?)((part\[\d+\])?\.rar)', 'name_part[23].rar')
>>> m.groups()[:2]
('name_', 'part[23].rar')
Or
>>> m = re.search(r'(.*?)((part\[\d+\])?\.rar)', 'name_no_parts.rar')
>>> m.groups()[:2]
('name_no_parts', '.rar')
This assumes that the extension is actually .rar. It's easy to tweak the regex if this is not the case though.

Related

Seperation of files from one folder to another based on their filenames

My filenames have pattern like 29_11_2019_17_05_17_1050_R__2.png and 29_11_2019_17_05_17_1550_2
I want to write a function which separates these files and puts them in different folders.
Please find my code below but its not working.
Can you help me with this?
def sort_photos(folder, dir_name):
for filename in os.listdir(folder):
wavelengths = ["1550_R_", "1550_", "1050_R_", "1200_"]
for x in wavelengths:
if x == "1550_R_":
if re.match(r'.*x.*', filename):
filesrc = os.path.join(folder, filename)
shutil.copy(filesrc, dir_name)
print("Filename has 'x' in it, do something")
print("file 1550_R_ copied")
# cv2.imwrite(dir_name,filename)
else:
print("filename doesn't have '1550_R_' in it (so maybe it only has 'N' in it)")
In order to construct a RegEx using a variable, you can use string-interpolation.
In Python3.6+, you can use f-strings to accomplish this. In this case, the condition for your second if statement could be:
if re.match(fr'.*{x}.*', filename) is not None:
instead of:
if re.match(r'.*x.*', filename) is not None:
Which would only ever match filenames with an 'x' in them. I think is the immediate (though not necessarily only) problem in your example code.
Footnote
Earlier versions of Python do string interpolation differently, the oldest (AFAIK) is %-formatting, e.g:
if re.match(r".*%s.*" % x, filename) is not None:
Read here for more detail.
I am not very cleared about which problem you encounter.
However, there are two suggestions:
To detect character x in file name, you can just use:
if('x' in filename):
...
If you only intended to move the files, a file check should be added:
if os.path.isfile(name):
...
I didn't have much time so I've edited your function which acts very close to what you wanted. It essentially reads file names, and copies them to separate directories but in directories named by wavelengths instead. Though currently it cannot differentiate between '1550_' and '1550_R_' since '1550_R_' includes '1550_' and I didn't have much time. You can create a conditional statement for it by a few lines and there you go. (If you do not do that it will create two directories '1550_' and '1550_R_' but it will copy files that are eligible for either to both of the folders.)
One final note that as I said that I didn't have much time I've made it all simpler that the destination folders are created just where your files are located. You can add it easily if you want by a few lines too.
import cv2
import os
import re
import shutil
def sort_photos(folder):
wavelengths = ["1550_R_", "1550_", "1050_R_", "1200_"]
for filename in os.listdir(folder):
for x,idx in zip(wavelengths, range(len(wavelengths))):
if (x in filename):
filesrc = os.path.join(folder, filename)
path = './'+x+'/'
if not os.path.exists(path):
os.mkdir(path)
shutil.copy(filesrc, path+filename)
# print("Filename has 'x' in it, do something")
# cv2.imwrite(dir_name,filename)
# else:
# print("filename doesn't have 'A' in it (so maybe it only has 'N' in it)")
########## USAGE: sort_photos(folder), for example, go to the folder where all the files are located:
sort_photos('./')

removing all numbers from all files in a folder

I'm trying to create a python script that will go a specific folder and remove all the numbers from the file name.
This is the code
def rename_file():
print"List of Files:"
print(os.getcwd())
os.chdir("/home/n3m0/Desktop/Pictures")
for fn in os.listdir(os.getcwd()):
print("file w/ numbers -" +fn)
print("File w/o numbers - "+fn.translate(None, "0123456789"))
os.rename(fn, fn.translate(None, "0123456789"))
os.chdir("/home/n3m0/Desktop/Pictures")
rename_files()
What I'm trying to do is remove all the numbers so that I'm able to read the file name
For example I want:
B45608aco4897n Pan44ca68ke90s1.jpg to say Bacon Pancakes.jpg
When I run the script it changeS all of the names in the terminal but when I go to the folder only one file name has been changed and I have to run the script multiple times. I'm using python 2.7.
I'm not 100% on this as I am just on my phone at the moment, but try this:
from string import digits
def rename_files():
os.chdir("/whatever/directory/you/want/here")
for fn in os.listdir(os.getcwd()):
os.rename(fn, fn.translate(None, digits))
rename_files()
Your indentation is a little messed up, and that's part of what's causing you problems. You also don't necessarily need to change the working directory - we can simply just keep track of the folder we're looking at and use os.path.join to reconstruct the file path, like so:
import os
from string import digits
def renamefiles(folder_path):
for input_file in os.listdir(folder_path):
print 'Original file name: {}'.format(input_file)
if any(str(x) in input_file for x in digits):
new_name = input_file.translate(None, digits)
print 'Renaming: {} to {}'.format(input_file, new_name)
os.rename(os.path.join(folder_path, input_file), os.path.join(folder_path, new_name))
rename_files('/home/n3m0/Desktop/Pictures')
This produces a method that you can re-use - we loop through all the items in the folder, printing the original names as we go. We then check if there are any digits in the filename, and if they are we rename the file.
Note, however, that this method is not particularly safe - what if the file name consists entirely of numbers and an extension? What if there are two files named identically apart from numbers (e.g. asongtoruin0.jpg and asongtoruin1.jpg)? This method would only retain the last file it found, overwriting the first. Look into the functions available in os to try to work out how to solve this, particularly os.path.isfile.
EDIT: had some time to spare, here's a little fix to catch the error for renaming to an already-existing file name:
def renamefiles(folder_path):
for input_file in os.listdir(folder_path):
print 'Original file name: {}'.format(input_file)
if any(str(x) in input_file for x in digits):
new_name = input_file.translate(None, digits)
# if removing numbers conflicts with an existing file, try adding a number to the end of the file name.
i = 1
while os.path.isfile(os.path.join(folder_path, new_name)):
split = os.path.splitext(new_name)
new_name = '{0} ({1}){2}'.format(split[0], i, split[1])
print 'Renaming: {} to {}'.format(input_file, new_name)
os.rename(os.path.join(folder_path, input_file), os.path.join(folder_path, new_name))
rename_files('/home/n3m0/Desktop/Pictures')

Python: rename all files in a folder using numbers that file contain

I want to write a little script for managing a bunch of files I got. Those files have complex and different name but they all contain a number somewhere in their name. I want to take that number, place it in front of the file name so they can be listed logically in my filesystem.
I got a list of all those files using os.listdir but I'm struggling to find a way to locate the numbers in those files. I've checked regular expression but I'm unsure if it's the right way to do this!
example:
import os
files = os.litdir(c:\\folder)
files
['xyz3.txt' , '2xyz.txt', 'x1yz.txt']`
So basically, what I ultimately want is:
1xyz.txt
2xyz.txt
3xyz.txt
where I am stuck so far is to find those numbers (1,2,3) in the list files
This (untested) snippet should show the regexp approach. The search method of compiled patterns is used to look for the number. If found, the number is moved to the front of the file name.
import os, re
NUM_RE = re.compile(r'\d+')
for name in os.listdir('.'):
match = NUM_RE.search(name)
if match is None or match.start() == 0:
continue # no number or number already at start
newname = match.group(0) + name[:match.start()] + name[match.end():]
print 'renaming', name, 'to', newname
#os.rename(name, newname)
If this code is used in production and not as homework assignment, a useful improvement would be to parse match.group(0) as an integer and format it to include a number of leading zeros. That way foo2.txt would become 02foo.txt and get sorted before 12bar.txt. Implementing this is left as an exercise to the reader.
Assuming that the numbers in your file names are integers (untested code):
def rename(dirpath, filename):
inds = [i for i,char in filename if char in '1234567890']
ints = filename[min(inds):max(inds)+1]
newname = ints + filename[:min(inds)] + filename[max(inds)+1:]
os.rename(os.path.join(dirpath, filename), os.path.join(dirpath, newname))
def renameFilesInDir(dirpath):
""" Apply your renaming scheme to all files in the directory specified by dirpath """
dirpath, dirnames, filenames = os.walk(dirpath):
for filename in filenames:
rename(dirpath, filename)
for dirname in dirnames:
renameFilesInDir(os.path.join(dirpath, dirname))
Hope this helps

Delete multiple files matching a pattern

I have made an online gallery using Python and Django. I've just started to add editing functionality, starting with a rotation. I use sorl.thumbnail to auto-generate thumbnails on demand.
When I edit the original file, I need to clean up all the thumbnails so new ones are generated. There are three or four of them per image (I have different ones for different occasions).
I could hard-code in the file-varients... But that's messy and if I change the way I do things, I'll need to revisit the code.
Ideally I'd like to do a regex-delete. In regex terms, all my originals are named like so:
^(?P<photo_id>\d+)\.jpg$
So I want to delete:
^(?P<photo_id>\d+)[^\d].*jpg$
(Where I replace photo_id with the ID I want to clean.)
Using the glob module:
import glob, os
for f in glob.glob("P*.jpg"):
os.remove(f)
Alternatively, using pathlib:
from pathlib import Path
for p in Path(".").glob("P*.jpg"):
p.unlink()
Try something like this:
import os, re
def purge(dir, pattern):
for f in os.listdir(dir):
if re.search(pattern, f):
os.remove(os.path.join(dir, f))
Then you would pass the directory containing the files and the pattern you wish to match.
If you need recursion into several subdirectories, you can use this method:
import os, re, os.path
pattern = "^(?P<photo_id>\d+)[^\d].*jpg$"
mypath = "Photos"
for root, dirs, files in os.walk(mypath):
for file in filter(lambda x: re.match(pattern, x), files):
os.remove(os.path.join(root, file))
You can safely remove subdirectories on the fly from dirs, which contains the list of the subdirectories to visit at each node.
Note that if you are in a directory, you can also get files corresponding to a simple pattern expression with glob.glob(pattern). In this case you would have to substract the set of files to keep from the whole set, so the code above is more efficient.
How about this?
import glob, os, multiprocessing
p = multiprocessing.Pool(4)
p.map(os.remove, glob.glob("P*.jpg"))
Mind you this does not do recursion and uses wildcards (not regex).
UPDATE
In Python 3 the map() function will return an iterator, not a list. This is useful since you will probably want to do some kind processing on the items anyway, and an iterator will always be more memory-efficient to that end.
If however, a list is what you really need, just do this:
...
list(p.map(os.remove, glob.glob("P*.jpg")))
I agree it's not the most functional way, but it's concise and does the job.
It's not clear to me that you actually want to do any named-group matching -- in the use you describe, the photoid is an input to the deletion function, and named groups' purpose is "output", i.e., extracting certain substrings from the matched string (and accessing them by name in the match object). So, I would recommend a simpler approach:
import re
import os
def delete_thumbnails(photoid, photodirroot):
matcher = re.compile(r'^%s\d+\D.*jpg$' % photoid)
numdeleted = 0
for rootdir, subdirs, filenames in os.walk(photodirroot):
for name in filenames:
if not matcher.match(name):
continue
path = os.path.join(rootdir, name)
os.remove(path)
numdeleted += 1
return "Deleted %d thumbnails for %r" % (numdeleted, photoid)
You can pass the photoid as a normal string, or as a RE pattern piece if you need to remove several matchable IDs at once (e.g., r'abc[def] to remove abcd, abce, and abcf in a single call) -- that's the reason I'm inserting it literally in the RE pattern, rather than inserting the string re.escape(photoid) as would be normal practice. Certain parts such as counting the number of deletions and returning an informative message at the end are obviously frills which you should remove if they give you no added value in your use case.
Others, such as the "if not ... // continue" pattern, are highly recommended practice in Python (flat is better than nested: bailing out to the next leg of the loop as soon as you determine there is nothing to do on this one is better than nesting the actions to be done within an if), although of course other arrangements of the code would work too.
My recomendation:
def purge(dir, pattern, inclusive=True):
regexObj = re.compile(pattern)
for root, dirs, files in os.walk(dir, topdown=False):
for name in files:
path = os.path.join(root, name)
if bool(regexObj.search(path)) == bool(inclusive):
os.remove(path)
for name in dirs:
path = os.path.join(root, name)
if len(os.listdir(path)) == 0:
os.rmdir(path)
This will recursively remove every file that matches the pattern by default, and every file that doesn't if inclusive is true. It will then remove any empty folders from the directory tree.
import os, sys, glob, re
def main():
mypath = "<Path to Root Folder to work within>"
for root, dirs, files in os.walk(mypath):
for file in files:
p = os.path.join(root, file)
if os.path.isfile(p):
if p[-4:] == ".jpg": #Or any pattern you want
os.remove(p)
I find Popen(["rm " + file_name + "*.ext"], shell=True, stdout=PIPE).communicate() to be a much simpler solution to this problem. Although this is prone to injection attacks, I don't see any issues if your program is using this internally.
def recursive_purge(dir, pattern):
for f in os.listdir(dir):
if os.path.isdir(os.path.join(dir, f)):
recursive_purge(os.path.join(dir, f), pattern)
elif re.search(pattern, os.path.join(dir, f)):
os.remove(os.path.join(dir, f))

Is this the best way to get unique version of filename w/ Python?

Still 'diving in' to Python, and want to make sure I'm not overlooking something. I wrote a script that extracts files from several zip files, and saves the extracted files together in one directory. To prevent duplicate filenames from being over-written, I wrote this little function - and I'm just wondering if there is a better way to do this?
Thanks!
def unique_filename(file_name):
counter = 1
file_name_parts = os.path.splitext(file_name) # returns ('/path/file', '.ext')
while os.path.isfile(file_name):
file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
counter += 1
return file_name
I really do require the files to be in a single directory, and numbering duplicates is definitely acceptable in my case, so I'm not looking for a more robust method (tho' I suppose any pointers are welcome), but just to make sure that what this accomplishes is getting done the right way.
One issue is that there is a race condition in your above code, since there is a gap between testing for existance, and creating the file. There may be security implications to this (think about someone maliciously inserting a symlink to a sensitive file which they wouldn't be able to overwrite, but your program running with a higher privilege could) Attacks like these are why things like os.tempnam() are deprecated.
To get around it, the best approach is to actually try create the file in such a way that you'll get an exception if it fails, and on success, return the actually opened file object. This can be done with the lower level os.open functions, by passing both the os.O_CREAT and os.O_EXCL flags. Once opened, return the actual file (and optionally filename) you create. Eg, here's your code modified to use this approach (returning a (file, filename) tuple):
def unique_file(file_name):
counter = 1
file_name_parts = os.path.splitext(file_name) # returns ('/path/file', '.ext')
while 1:
try:
fd = os.open(file_name, os.O_CREAT | os.O_EXCL | os.O_RDRW)
return os.fdopen(fd), file_name
except OSError:
pass
file_name = file_name_parts[0] + '_' + str(counter) + file_name_parts[1]
counter += 1
[Edit] Actually, a better way, which will handle the above issues for you, is probably to use the tempfile module, though you may lose some control over the naming. Here's an example of using it (keeping a similar interface):
def unique_file(file_name):
dirname, filename = os.path.split(file_name)
prefix, suffix = os.path.splitext(filename)
fd, filename = tempfile.mkstemp(suffix, prefix+"_", dirname)
return os.fdopen(fd), filename
>>> f, filename=unique_file('/home/some_dir/foo.txt')
>>> print filename
/home/some_dir/foo_z8f_2Z.txt
The only downside with this approach is that you will always get a filename with some random characters in it, as there's no attempt to create an unmodified file (/home/some_dir/foo.txt) first.
You may also want to look at tempfile.TemporaryFile and NamedTemporaryFile, which will do the above and also automatically delete from disk when closed.
Yes, this is a good strategy for readable but unique filenames.
One important change: You should replace os.path.isfile with os.path.lexists! As it is written right now, if there is a directory named /foo/bar.baz, your program will try to overwrite that with the new file (which won't work)... since isfile only checks for files and not directories. lexists checks for directories, symlinks, etc... basically if there's any reason that filename could not be created.
EDIT: #Brian gave a better answer, which is more secure and robust in terms of race conditions.
Two small changes...
base_name, ext = os.path.splitext(file_name)
You get two results with distinct meaning, give them distinct names.
file_name = "%s_%d%s" % (base_name, str(counter), ext)
It isn't faster or significantly shorter. But, when you want to change your file name pattern, the pattern is on one place, and slightly easier to work with.
If you want readable names this looks like a good solution.
There are routines to return unique file names for eg. temp files but they produce long random looking names.
if you don't care about readability, uuid.uuid4() is your friend.
import uuid
def unique_filename(prefix=None, suffix=None):
fn = []
if prefix: fn.extend([prefix, '-'])
fn.append(str(uuid.uuid4()))
if suffix: fn.extend(['.', suffix.lstrip('.')])
return ''.join(fn)
How about
def ensure_unique_filename(orig_file_path):
from time import time
import os
if os.path.lexists(orig_file_path):
name, ext = os.path.splitext(orig_file_path)
orig_file_path = name + str(time()).replace('.', '') + ext
return orig_file_path
time() returns current time in milliseconds. combined with original filename, it's fairly unique even in complex multithreaded cases.

Categories