Python : Rename files - python

I read out of the XML-files their category and I rename and save them with the year.
So, file "XYZ.xml" is now "News_2014.xml".
The Problem is that there are several XML-files with the category "News" from 2014. With my code, I delete all other files and I can save only 1 file.
What can I do in order that every file is saved? For example, if there are 2 files with the category "News" and the Year 2014, there file-names should be: "News_2014_01.xml" and "News_2014_02.xml".
Since there are other categories, I can not simply implement an increasing integer, i.e. another file with the category "History" should still have the Name "History_2014_01.xml" (and not ...03.xml).
Actually, I have the following code:
for text, key in enumerate(d):
#print key, d[key]
name = d[key][(d[key].find("__")+2):(d[key].rfind("__"))]
year = d[key][(d[key].find("*")+1):(d[key].rfind("*"))]
cat = d[key][(d[key].rfind("*")+1):]
os.rename(name, cat+"_"+year+'.xml')

Once you have figured out the “correct” name for the file, e.g. News_2014.xml, you could make a loop that checks whether the file exists and adds an incrementing suffix to it while that’s the case:
import os
fileName = 'News_2014.xml'
baseName, extension = os.path.splitext(fileName)
suffix = 0
while os.path.exists(os.path.join(directory, fileName)):
suffix += 1
fileName = '{}_{:02}.{}'.format(baseName, suffix, extension)
print(fileName)
os.rename(originalName, fileName)
You can put that into a function, so it’s easier to use:
def getIncrementedFileName (fileName):
baseName, extension = os.path.splitext(fileName)
suffix = 0
while os.path.exists(os.path.join(directory, fileName)):
suffix += 1
fileName = '{}_{:02}.{}'.format(baseName, suffix, extension)
return fileName
And then use that in your code:
for key, item in d.items():
name = item[item.find("__")+2:item.rfind("__")]
year = item[item.find("*")+1:item.rfind("*")]
cat = item[item.rfind("*")+1:]
fileName = getIncrementedFileName(cat + '_' + year + '.xml')
os.rename(name, fileName)

[EDIT] #poke solution is much more elegant, let alone he posted it earlier
You can check if target filename already exists, and if it does, modify filename.
The easiest solution for me would be to always start with adding 'counter' to file name, so you start with News_2014_000.xml (maybe better be prepared for more than 100 files?).
Later you loop until you find filename, that does not exist:
def versioned_filename(candidate):
target = candidate
while os.path.exists(target):
fname, ext = target.rsplit('.', 1)
head, tail = fname.rsplit('_', 1)
count = int(tail)
#:03d formats as 3-digit with leading zero
target = "{}_{:03d}.{}".format(head, count+1, ext)
return target
So, if you want to save as 'News_2014_###.xml' file you can create name as usual, but call os.rename(sourcename, versioned_filename(targetname)).
If you want more efficient solution, you can parse output of glob.glob() to find highest count, you will save on multiple calling to os.path.exists, but it makes sense only if you expect hundreds or thousands of files.

You could use a dictionary to keep track of the count. That way, there is no need to modify file names after you've renamed them. The downside is that every filename will have a number in it, even if the max number for that category ends up being 1.
cat_count = {}
for text, key in enumerate(d):
name = d[key][(d[key].find("__")+2):(d[key].rfind("__"))]
year = d[key][(d[key].find("*")+1):(d[key].rfind("*"))]
cat = d[key][(d[key].rfind("*")+1):]
cat_count[cat] = cat_count[cat] + 1 if cat in cat_count else 1
os.rename(name, "%s_%s_%02d.xml" % (cat, year, cat_count[cat]))

Related

How to give a function access to a variable in another functions's inner for loop?

The program renames the files from American MM-DD-YYYY date format to European DD-MM-YYYY date format. I need somehow to pass the value of fileName in search_files function to the rename_file function so I can change the name of the file. Any idea how can I do that?
I think it may be possible to associate every fileName with it's new formatted name and to pass them as a dictionary. I didn't try this yet, but is there an easier way to do that?
def rename_file(europeanName):
# Get the full, absolute file paths.
currentPath = os.path.abspath('.')
fileName = os.path.join(currentPath, fileName)
europeanName = os.path.join(currentPath, europeanName)
# Rename the files.
shutil.move(fileName, europeanName)
def form_new_date(beforePart, monthPart, dayPart, yearPart, afterPart):
# Form the European-style filename.
europeanName = beforePart + dayPart + '-' + monthPart + '-' + yearPart + afterPart
rename_file(europeanName)
def breakdown_old_date(matches):
for match in matches:
# Get the different parts of the filename.
beforePart = match.group(1)
monthPart = match.group(2)
dayPart = match.group(4)
yearPart = match.group(6)
afterPart = match.group(8)
form_new_date(beforePart, monthPart, dayPart, yearPart, afterPart)
def search_files(dataPattern):
matches = []
# Loop over the files in the working directory.
for fileName in os.listdir('.'):
matchObj = dataPattern.search(fileName)
# Skip files without a date.
if not matchObj:
continue
else:
matches.append(matchObj)
breakdown_old_date(matches)
def form_regex():
# Create a regex that can identify the text pattern of American-style dates.
dataPattern = re.compile(r"""
^(.*?) # all text before the date
((0|1)?\d)- # one or two digits for the month
((0|1|2|3)?\d)- # one or two digits for the day
((19|20)\d\d) # four digits for the year
(.*?)$ # all text after the date
""", re.VERBOSE)
search_files(dataPattern)
if __name__ == "__main__":
form_regex()
Make matches a list of tuples, and for each file that matches:
matches.append((matchObj, fileName))
Then extract it out in breakdown_old_date using
fileName = match[1]
(don't forget to change your match.group calls to match[0].group), and pass it as a parameter to form_new_date, then as a parameter to rename_file.
Also, move the call to form_new_date (in breakdown_old_date) into the for loop, so it executes for each file you want to move.
(Alternatively, instead of making matches a list of tuples, you could make it a dictionary.)

How to increment output filename in Python

I have a script that works, but when I run it a second time it doesn't because it keeps saving the output filename the same. I'm very new to Python and programming in general, so dumb you answers down...and then dumb them down some more. :)
arcpy.gp.Spline_sa("Observation_RegionalClip_Clip", "observatio", "C:/Users/moshell/Documents/ArcGIS/Default.gdb/Spline_shp16", "514.404", "REGULARIZED", "0.1", "12")
Where Spline_shp16 is the output filename, I would like it to save as Spline_shp17 the next time I run the script, and then Spline_shp18 the time after that, etc.
If you want to use numbers in the file names, you can check what files with similar names already exist in that directory, take the largest one, and increment it by one. Then pass this new number as a variable in the string for the filename.
For example:
import glob
import re
# get the numeric suffixes of the appropriate files
file_suffixes = []
for file in glob.glob("./Spline_shp*"):
regex_match = re.match(".*Spline_shp(\d+)", file)
if regex_match:
file_suffix = regex_match.groups()[0]
file_suffix_int = int(file_suffix)
file_suffixes.append(file_suffix_int)
new_suffix = max(file_suffixes) + 1 # get max and increment by one
new_file = f"C:/Users/moshell/Documents/ArcGIS/Default.gdb/Spline_shp{new_suffix}" # format new file name
arcpy.gp.Spline_sa(
"Observation_RegionalClip_Clip",
"observatio",
new_file,
"514.404",
"REGULARIZED",
"0.1",
"12",
)
Alternatively, if you are just interested in creating unique filenames so that nothing gets overwritten, you can append a timestamp to the end of the filename. So you would have files with names like "Spline_shp-1551375142," for example:
import time
timestamp = str(time.time())
filename = "C:/Users/moshell/Documents/ArcGIS/Default.gdb/Spline_shp-" + timestamp
arcpy.gp.Spline_sa(
"Observation_RegionalClip_Clip",
"observatio",
filename,
"514.404",
"REGULARIZED",
"0.1",
"12",
)

Script for separating series of photos

I need to visually separate photos (JPEGs) in a folder by placing black placeholder pictures between series with identical file names (only last two digits of the file names are different). The folder is typically containing single (stand alone) photos, named something like 03-12345-randomfilename.jpg and series named 03-12345-file01.jpg, 03-12345-file02.jpg, ..03, ..04, etc.
The singles should be left alone, but I need to place a black picture before and after all series.
I have the following Python script (originally written by someone else) that is intermittently failing for no apparent reason. It usually works, but sometimes it will overwrite files in the middle of a series, or more typically, it will fail to place a black picture after the last photo in a series. I've spent hours trying to figure out what's going on, but I'm stuck.
Any suggestions most appreciated.
def blackJPG(directory):
# iterate over every file name in the directory
blackJPG = '/Users/username/black.jpg'
filelist = {}
for file_name in os.listdir(directory):
filename, file_extension = os.path.splitext(file_name)
stringmatch = re.compile(r'(\d{2})(.*?)(\d+)(.*?)(([A-Za-z]+))(.*?)(\d+)')
m = stringmatch.search(file_name)
#Create search table
if m:
sequence = int(m.group(8))
filename_without_sequence = "{0}{1}{2}{3}{4}{5}".format(m.group(1),m.group(2),m.group(3),m.group(4),m.group(6),m.group(7))
filelist.update({filename_without_sequence: (sequence)})
for key, value in filelist.iteritems():
if value > 1:
newJPG = "{0}/{1}00.jpg".format(directory, key)
if value >= 10:
lastJPG = "{0}/{1}{2}.jpg".format(directory, key, value+1)
else:
lastJPG = "{0}/{1}0{2}.jpg".format(directory, key, value+1)
#Create first blackJPG
shutil.copyfile(blackJPG, newJPG)
#Create last blackJPG
shutil.copyfile(blackJPG, lastJPG)
return "Done"
If the variation is always the last 2 characters, then you can grab the part that doesn't change (the prefix) count the number of prefixes and create a file for those with more than one file:
def add_black_jpg(directory):
series_count = {}
for file in os.listdir(directory):
name, ext = os.path.splitext(file)
prefix = name[:-2]
count = series_count.get(prefix, 0)
series_count[prefix] = count + 1
for prefix, count in series_count.items():
if count > 1:
shutil.copyfile(black_jpg_location, f"{prefix}00.jpg")

Use of regular expression to exclude characters in file rename,Python?

I am trying to rename files so that they contain an ID followed by a -(int). The files generally come to me in this way but sometimes they come as 1234567-1(crop to bottom).jpg.
I have been trying to use the following code but my regular expression doesn't seem to be having any effect. The reason for the walk is because we have to handles large directory trees with many images.
def fix_length():
for root, dirs, files in os.walk(path):
for fn in files:
path2 = os.path.join(root, fn)
filename_zero, extension = os.path.splitext(fn)
re.sub("[^0-9][-]", "", filename_zero)
os.rename(path2, filename_zero + extension)
fix_length()
I have inserted print statements for filename_zero before and after the re.sub line and I am getting the same result (i.e. 1234567-1(crop to bottom) not what I wanted)
This raises an exception as the rename is trying to create a file that already exists.
I thought perhaps adding the [-] in the regex was the issue but removing it and running again I would then expect 12345671.jpg but this doesn't work either. My regex is failing me or I have failed the regex.
Any insight would be greatly appreciated.
As a follow up, I have taken all the wonderful help and settled on a solution to my specific problem.
path = 'C:\Archive'
errors = 'C:\Test\errors'
num_files = []
def best_sol():
num_files = []
for root, dirs, files in os.walk(path):
for fn in files:
filename_zero, extension = os.path.splitext(fn)
path2 = os.path.join(root, fn)
ID = re.match('^\d{1,10}', fn).group()
if len(ID) <= 7:
if ID not in num_files:
num_files = []
num_files.append(ID)
suffix = str(len(num_files))
os.rename(path2, os.path.join(root, ID + '-' + suffix + extension))
else:
num_files.append(ID)
suffix = str(len(num_files))
os.rename(path2, os.path.join( root, ID + '-' + suffix +extension))
else:
shutil.copy(path2, errors)
os.remove(path2)
This code creates an ID based upon (up to) the first 10 numeric characters in the filename. I then use lists that store the instances of this ID and use the, length of the list append a suffix. The first file will have a -1, second a -2 etc...
I am only interested (or they should only be) in ID's with a length of 7 but allow to read up to 10 to allow for human error in labelling. All files with ID longer than 7 are moved to a folder where we can investigate.
Thanks for pointing me in the right direction.
re.sub() returns the altered string, but you ignore the return value.
You want to re-assign the result to filename_zero:
filename_zero = re.sub("[^\d-]", "", filename_zero)
I've corrected your regular expression as well; this removes anything that is not a digit or a dash from the base filename:
>>> re.sub(r'[^\d-]', '', '1234567-1(crop to bottom)')
'1234567-1'
Remember, strings are immutable, you cannot alter them in-place.
If all you want is the leading digits, plus optional dash-digit suffix, select the characters to be kept, rather than removing what you don't want:
filename_zero = re.match(r'^\d+(?:-\d)?', filename_zero).group()
new_filename = re.sub(r'^([0-9]+)-([0-9]+)', r'\g1-\g2', filename_zero)
Try using this regular expression instead, I hope this is how regular expressions work in Python, I don't use it often. You also appear to have forgotten to assign the value returned by the re.sub call to the filename_zero variable.

batch renaming 100K files with python

I have a folder with over 100,000 files, all numbered with the same stub, but without leading zeros, and the numbers aren't always contiguous (usually they are, but there are gaps) e.g:
file-21.png,
file-22.png,
file-640.png,
file-641.png,
file-642.png,
file-645.png,
file-2130.png,
file-2131.png,
file-3012.png,
etc.
I would like to batch process this to create padded, contiguous files. e.g:
file-000000.png,
file-000001.png,
file-000002.png,
file-000003.png,
When I parse the folder with for filename in os.listdir('.'): the files don't come up in the order I'd like to them to. Understandably they come up
file-1,
file-1x,
file-1xx,
file-1xxx,
etc. then
file-2,
file-2x,
file-2xx,
etc. How can I get it to go through in the order of the numeric value? I am a complete python noob, but looking at the docs i'm guessing I could use map to create a new list filtering out only the numerical part, and then sort that list, then iterate that? With over 100K files this could be heavy. Any tips welcome!
import re
thenum = re.compile('^file-(\d+)\.png$')
def bynumber(fn):
mo = thenum.match(fn)
if mo: return int(mo.group(1))
allnames = os.listdir('.')
allnames.sort(key=bynumber)
Now you have the files in the order you want them and can loop
for i, fn in enumerate(allnames):
...
using the progressive number i (which will be 0, 1, 2, ...) padded as you wish in the destination-name.
There are three steps. The first is getting all the filenames. The second is converting the filenames. The third is renaming them.
If all the files are in the same folder, then glob should work.
import glob
filenames = glob.glob("/path/to/folder/*.txt")
Next, you want to change the name of the file. You can print with padding to do this.
>>> filename = "file-338.txt"
>>> import os
>>> fnpart = os.path.splitext(filename)[0]
>>> fnpart
'file-338'
>>> _, num = fnpart.split("-")
>>> num.rjust(5, "0")
'00338'
>>> newname = "file-%s.txt" % num.rjust(5, "0")
>>> newname
'file-00338.txt'
Now, you need to rename them all. os.rename does just that.
os.rename(filename, newname)
To put it together:
for filename in glob.glob("/path/to/folder/*.txt"): # loop through each file
newname = make_new_filename(filename) # create a function that does step 2, above
os.rename(filename, newname)
Thank you all for your suggestions, I will try them all to learn the different approaches. The solution I went for is based on using a natural sort on my filelist, and then iterating that to rename. This was one of the suggested answers but for some reason it has disappeared now so I cannot mark it as accepted!
import os
files = os.listdir('.')
natsort(files)
index = 0
for filename in files:
os.rename(filename, str(index).zfill(7)+'.png')
index += 1
where natsort is defined in http://code.activestate.com/recipes/285264-natural-string-sorting/
Why don't you do it in a two step process. Parse all the files and rename with padded numbers and then run another script that takes those files, which are sorted correctly now, and renames them so they're contiguous?
1) Take the number in the filename.
2) Left-pad it with zeros
3) Save name.
def renamer():
for iname in os.listdir('.'):
first, second = iname.replace(" ", "").split("-")
number, ext = second.split('.')
first, number, ext = first.strip(), number.strip(), ext.strip()
number = '0'*(6-len(number)) + number # pad the number to be 7 digits long
oname = first + "-" + number + '.' + ext
os.rename(iname, oname)
print "Done"
Hope this helps
The simplest method is given below. You can also modify for recursive search this script.
use os module.
get filenames
os.rename
import os
class Renamer:
def __init__(self, pattern, extension):
self.ext = extension
self.pat = pattern
return
def rename(self):
p, e = (self.pat, self.ext)
number = 0
for x in os.listdir(os.getcwd()):
if str(x).endswith(f".{e}") == True:
os.rename(x, f'{p}_{number}.{e}')
number+=1
return
if __name__ == "__main__":
pattern = "myfile"
extension = "txt"
r = Renamer(pattern=pattern, extension=extension)
r.rename()

Categories