Regex and renaming file error - python

I was writing a little program that finds all files with a given prefix, let's say 'spam' for this example, in a folder, locates gaps in the numbering, and renames subsequent files to fill the gap. Below is the portion of the program that locates a file using a regex and renames it:
prefix = 'spam'
newNumber = 005
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\1%s\4' % newNumber, 'spam006.txt')
What I was expecting from above was spam005.txt, but instead I got #5.txt
I figured out I could use r'%s%s\4' % (prefix, newNumber) instead and then it does work as intended, but I'd like to understand why this error is happening. Does it have something to do with the %s used during re.compile()?

There are two problems here:
Your newNumber needs to be a string if you want it to be 005, as the leading zeros are dropped when it is interpreted as an integer. (In Python 2, 005 is an octal literal equal to 5; in Python 3 it is a syntax error.)
Your next problem is indeed in your substitution. By using string formatting you effectively create the replacement template \15\4 (note the 5, which was your newNumber). When Python sees this, it tries to reference capturing group 15 rather than group 1 followed by a literal 5. You can write the reference as \g<1> to get your desired behavior: \g<1>5\4
So your code needs to be changed to this:
import re
prefix = 'spam'
newNumber = '005'
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\g<1>%s\4' % newNumber, 'spam006.txt')
More information about the \g<n> behavior can be found at the end of the re.sub documentation.
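To see the difference between the two templates side by side, here is a minimal sketch. The names pattern, ambiguous and explicit are made up for this illustration, and the pattern is simplified to three groups, so the extension is group 3 here rather than group 4.

```python
import re

prefix = "spam"
new_number = "005"

# Simplified pattern: (prefix)(digits)(extension). re.escape guards against
# prefixes that contain regex metacharacters.
pattern = re.compile(r"(%s)(\d+)(\.[a-zA-Z0-9]+)" % re.escape(prefix))

# Plain formatting glues the digits onto the backreference: r"\1005\3"
ambiguous = r"\1%s\3" % new_number

# \g<1> ends the group reference explicitly: r"\g<1>005\3"
explicit = r"\g<1>%s\3" % new_number

new_file_name = pattern.sub(explicit, "spam006.txt")
print(new_file_name)  # spam005.txt
```

Only the explicit template is passed to sub() here; the ambiguous one is built just to show what the formatted string looks like.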

Related

Python script is not listing a file in the directory?

I have a slight question.
I have the following function:
def getCommands():
    for file in os.listdir(com_dir):
        if file.endswith(com_ext):
            z = string.strip(file, '.gcom')
            print z
and in the directory ( Defined by com_dir ) there are three files.
a.gcom
b.gcom
c.gcom
when running getCommands()
The following is outputted:
a
b
Files a and b are shown; however, c is not. All files are in the directory and all use the same file extension, .gcom, which is also the value of the com_ext variable.
Does anyone have any hints as to why file c is not being shown?
Side note: there seems to be a blank line in the output where c should be; however, I'm not sure whether this is part of the issue at hand or just an accidental space placed elsewhere in the script.
strip removes all the given characters from both ends of your string, whatever order they occur in. If your string is c.gcom, then strip('.gcom') removes all the characters ., g, c, o and m from the ends of your string, leaving an empty string, which is why a blank line is printed where c should be. It doesn't stop stripping until it hits a character that is not ., g, c, o or m (or removes everything).
If you have a string ending in .gcom, and you just want to remove that ending, you can use:
z = file[:-5]
or, using your com_ext variable
com_ext = '.gcom'
...
if file.endswith(com_ext):
    z = file[:-len(com_ext)]
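The difference can be checked without touching the file system. This is only a sketch in Python 3 syntax; the list names stands in for the os.listdir() result, and os.path.splitext is shown as a further standard-library option that the answer above does not mention.

```python
import os

com_ext = ".gcom"
names = ["a.gcom", "b.gcom", "c.gcom"]  # stand-in for os.listdir(com_dir)

# strip() treats its argument as a set of characters, not as a suffix
stripped = [name.strip(com_ext) for name in names]

# Slicing off len(com_ext) characters removes exactly the suffix
sliced = [name[:-len(com_ext)] for name in names if name.endswith(com_ext)]

# os.path.splitext splits at the last dot, whatever the extension is
stems = [os.path.splitext(name)[0] for name in names]

print(stripped)  # ['a', 'b', '']
print(sliced)    # ['a', 'b', 'c']
print(stems)     # ['a', 'b', 'c']
```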
In Python 3, pathlib makes this simpler:
from pathlib import Path
def getCommands(com_dir, com_ext): # com_ext = "gcom"
    for f in Path(com_dir).glob("*." + com_ext):
        print(f.stem)
But if you really have to use Python 2:
import os
def getCommands(com_dir, com_ext): # com_ext = "gcom"
    for file in os.listdir(com_dir):
        s = file.split('.' + com_ext)
        if len(s) > 1:
            print("{}".format(s[0]))

Renaming Multiple Files at Once in a Directory

I am attempting to take a file name such as 'OP 40 856101.txt' from a directory, remove the .txt, set each single word to a specific variable, then reorder the filename based on a required order such as '856101 OP 040'. Below is my code:
import os
dir = 'C:/Users/brian/Documents/Moeller'
orig = os.listdir(dir) #original names of the files in the folder
for orig_name in orig: #This loop splits each file name into a list of strings containing each word
    f = os.path.splitext(orig_name)[0]
    sep = f.split() #Separation is done by a space
    for t in sep: #Loops across each list of strings into an if statement that saves each part to a specific variable
        #print(t)
        if t.isalpha() and len(t) == 3:
            wc = t
        elif len(t) > 3 and len(t) < 6:
            wc = t
        elif t == 'OP':
            op = t
        elif len(t) >= 4:
            pnum = t
        else:
            opnum = t
            if len(opnum) == 2:
                opnum = '0' + opnum
    new_nam = '%s %s %s %s' % (pnum,op,opnum, wc) #This is the variable that contains the text for the new name
    print("The orig filename is %r, the new filename is %r" % (orig_name, new_nam))
    os.rename(orig_name, new_nam)
However I am getting an error with my last for loop where I attempt to rename each file in the directory.
FileNotFoundError: [WinError 2] The system cannot find the file specified: '150 856101 OP CLEAN.txt' -> '856101 OP 150 CLEAN'
The code runs perfectly until the os.rename() command, if I print out the variable new_nam, it prints out the correct naming order for all of the files in the directory. Seems like it cannot find the original file though to replace the filename to the string in new_nam. I assume it is a directory issue, however I am newer to python and can't seem to figure where to edit my code. Any tips or advice would be greatly appreciated!
Try this (just changed the last line):
os.rename(os.path.join(dir,orig_name), os.path.join(dir,new_nam))
You need to give os.rename the full path of the file. A bare filename is resolved relative to the current working directory, which is not necessarily the folder you listed.
Incidentally, it's better not to use dir as a variable name, because it shadows the built-in function of the same name.
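A small self-contained sketch of the same idea, run against a temporary folder so that it does not depend on the current working directory. The helper name rename_in_dir and the file names are assumptions made up for this example.

```python
import os
import tempfile

def rename_in_dir(folder, old_name, new_name):
    """Rename a file inside folder, building both paths explicitly."""
    src = os.path.join(folder, old_name)
    dst = os.path.join(folder, new_name)
    os.rename(src, dst)
    return dst

with tempfile.TemporaryDirectory() as folder:
    # Create an empty file to rename
    open(os.path.join(folder, "OP 40 856101.txt"), "w").close()
    rename_in_dir(folder, "OP 40 856101.txt", "856101 OP 040.txt")
    remaining = os.listdir(folder)

print(remaining)  # ['856101 OP 040.txt']
```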

removing a string of four characters from the front and thirteen characters from the end of a filename

I have seen the basic Python code for a filename replacement in a directory but they are always for known strings, but how would you remove random characters of a certain length?
Would this work?
newFileName = file.replace([-5:], "")
I am trying to remove the last five characters from the filename without removing the extension.
Here is an update:
I am trying to do this:
DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml
to
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
which removes DMC- from the front and _014-00_EN-US from the end.
I need to add this to a code that will fix a directory of files.
This problem (if I understand it correctly) breaks into clear steps: remove the extension, remove X characters from the beginning and the end, and then add the extension back to get the final answer.
import os
oldFileName = 'xxxx-filename-xxxxx.XML'
# remove n chars in beginning, m chars at end
n = 5
m = 6
name, ext = os.path.splitext(oldFileName)
# splice away the chars, and add the extension
newFileName = '{}{}'.format(name[0:-m][n:], ext)
# newFileName == 'filename.XML'
So in your case, you would use n=4 and m=13.
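Wrapped in a function and applied to the filename from the question, that might look like the sketch below. The function name trim_name is an assumption for this example, not something from the original code.

```python
import os

def trim_name(filename, n, m):
    """Drop n characters from the front and m from the end of the stem."""
    name, ext = os.path.splitext(filename)
    # len(name) - m instead of -m so that m == 0 also works
    return name[n:len(name) - m] + ext

old = "DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml"
new = trim_name(old, 4, 13)
print(new)  # CIWS15-AAAA-A00-00-0000-00A-018A-D.xml
```

To fix a whole directory, the same function could be called for each name returned by os.listdir(folder), passing os.path.join(folder, name) and os.path.join(folder, new_name) to os.rename.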
If you didn't know the lengths, but you knew you wanted to drop everything up to and including the first dash, and likewise everything from the first underscore onward (which means the part of the filename you keep cannot contain underscores), this would also work:
import os
oldFileName = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
name, ext = os.path.splitext(oldFileName)
newFileName = '{}{}'.format(name[name.index('-')+1:name.index('_')], ext)
# newFileName == 'CIWS15-AAAA-A00-00-0000-00A-018A-D.xml'
And even if the pattern is something else, but there is a pattern, you can code to match it, like I have here.
It's not nice, but I hope this works for you.
If you know the files that you want to rename all have the same length, you can try:
>>> file = 'DMC-CIWS15-AAAA-A00-00-0000-00A-018A-D_014-00_EN-US.xml'
>>> ext = file[51:]
>>> newFile = file[4:38] + ext
When you print newFile you now have:
>>> print(newFile)
CIWS15-AAAA-A00-00-0000-00A-018A-D.xml

Python - Concatenate a variable into string format

I'm trying to retrieve the number from a file, and determine the padding of it, so I can apply it to the new file name, but with an added number. I'm basically trying to do a file saver sequencer.
Ex.:
fileName_0026
0026 = 4 digits
add 1 to the current number and keep the same amount of digit
The result should be 0027 and on.
What I'm trying to do is retrieve the padding number from the file and use the '%04d'%27 string formatting. I've tried everything I know (my knowledge is very limited), but nothing works. I've looked everywhere to no avail.
What I'm trying to do is something like this:
O=fileName_0026
P=Retrieved padding from original file (4)
CN=retrieve current file number (26)
NN=add 1 to current file number (27)
'%0 P d' % NN
Result=fileName_0027
I hope this is clear enough, I'm having a hard time trying to articulate this.
Thanks in advance for any help.
Cheers!
There's a few things going on here, so here's my approach and a few comments.
import os
def get_next_filename(existing_filename):
    prefix = existing_filename.split("_")[0] # get string prior to _
    tail = existing_filename.split("_")[-1] # everything after the last _
    int_string = tail.split(".")[0] # pull out the number as a string so we can extract an integer value as well as the number of characters
    extension = tail.split(".")[-1] if "." in tail else None # extension, if there is one
    int_new = int(int_string) + 1 # integer value of new filename
    digits = len(int_string) # number of characters/padding in name
    formatter = "%0"+str(digits)+"d" # format string that pads the number with leading zeros to that width
    int_string_new = formatter % (int_new,) # applies that format
    new_filename = prefix+"_"+int_string_new # put it all together
    if extension: # add the extension if present in original name
        new_filename += "."+extension
    return new_filename
# since we only want to do this when the file already exists, check if it exists and execute function if so
our_filename = 'file_0026.txt'
while os.path.isfile(our_filename):
    our_filename = get_next_filename(our_filename) # loop until a unique filename found
Here are some hints to achieve that, although it's unclear exactly what you want to achieve.
import re
old_name = "fileName_0026.txt"
name = re.split(r"[_.]", old_name) #Output:: ['fileName', '0026', 'txt']
n = str(int(name[1])+1) #'27'
s = n.zfill(len(name[1])) #'0027'
new_name = "_".join([name[0], s]) + "." + name[2] #"fileName_0027.txt"
fh = open(new_name, "w") #Create the new file
fh.close()
Use the rjust method of str:
O = 'fileName_0026'
number = O.split('_')[-1] # '0026'
P = len(number) # padding retrieved from the original file (4)
CN = int(number) # current file number (26)
NN = CN + 1 # current file number plus 1 (27)
new_padding = str(NN).rjust(P, '0') # '0027'
Result = 'fileName_' + new_padding # 'fileName_0027'
import re
m = re.search(r".*_(0*)(\d*)", "fileName_00023")
print(m.groups())
print("fileName_{0:04d}".format(int(m.groups()[1])+1))
{0:04d} means pad out to four digits wide with leading zeros.
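The width does not have to be hard-coded to 4; it can be taken from the original name, which is what the question's '%0 P d' was reaching for. A minimal sketch, assuming the name ends in an underscore followed by digits (the function name next_name is made up for this example):

```python
import re

def next_name(name):
    """Increment the trailing number of name, keeping its zero padding."""
    match = re.search(r"^(.*_)(\d+)$", name)
    if match is None:
        raise ValueError("no trailing number in %r" % name)
    head, digits = match.groups()
    width = len(digits)        # P: padding taken from the original name
    number = int(digits) + 1   # NN: current number plus one
    # '%0*d' reads the width from the argument tuple;
    # '{:0{w}d}'.format(number, w=width) is the str.format equivalent
    return head + "%0*d" % (width, number)

print(next_name("fileName_0026"))  # fileName_0027
```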
As you can see, there are a few ways to do this that are quite similar. One thing the other answers haven't mentioned is how leading zeros are handled. int() with its default base 10 accepts them (int('0026') is 26), but a zero-prefixed literal in source code, or a call such as int(s, 0), is treated as octal in Python 2, so the code below strips leading zeros explicitly to be safe.
edit
I just realised that my previous code crashes if the file number is zero!
Here's a better version that also copes with a missing file number and names with multiple or no underscores.
#! /usr/bin/env python
def increment_filename(s):
    parts = s.split('_')
    #Handle names without a number after the final underscore
    if not parts[-1].isdigit():
        parts.append('0')
    tail = parts[-1]
    try:
        n = int(tail.lstrip('0'))
    except ValueError:
        #Tail was all zeroes
        n = 0
    parts[-1] = str(n + 1).zfill(len(tail))
    return '_'.join(parts)
def main():
    for s in (
        'fileName_0026',
        'data_042',
        'myfile_7',
        'tricky_99',
        'myfile_0',
        'bad_file',
        'worse_file_',
        '_lead_ing_under_score',
        'nounderscore',
    ):
        print("'%s' -> '%s'" % (s, increment_filename(s)))
if __name__ == "__main__":
    main()
output
'fileName_0026' -> 'fileName_0027'
'data_042' -> 'data_043'
'myfile_7' -> 'myfile_8'
'tricky_99' -> 'tricky_100'
'myfile_0' -> 'myfile_1'
'bad_file' -> 'bad_file_1'
'worse_file_' -> 'worse_file__1'
'_lead_ing_under_score' -> '_lead_ing_under_score_1'
'nounderscore' -> 'nounderscore_1'
Some additional refinements possible:
An optional arg to specify the number to add to the current file number,
An optional arg to specify the minimum width of the file number string,
Improved handling of names with weird number / position of underscores.
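The first two refinements could be sketched roughly as follows. This is not the answerer's code: the function name bump_filename and the parameter names step and min_width are assumptions for this illustration, and the behavior for names without a numeric suffix simply mirrors the output shown above.

```python
def bump_filename(name, step=1, min_width=0):
    """Add step to the number after the last underscore, zero-padded."""
    head, sep, tail = name.rpartition("_")
    if not sep or not tail.isdigit():
        # No numeric suffix: treat the whole name as the head, count from 0
        head, tail = name, "0"
    # Keep the existing padding unless a larger minimum width is requested
    width = max(len(tail), min_width)
    return "%s_%s" % (head, str(int(tail) + step).zfill(width))

print(bump_filename("fileName_0026"))                   # fileName_0027
print(bump_filename("myfile_7", step=5, min_width=3))   # myfile_012
print(bump_filename("nounderscore"))                    # nounderscore_1
```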

Python - Removing unknown 10 character string

I'm using a modified version of Eric Bidelman's (HTML5Rocks) cachebust.py script for CSS/JS.
Instead of appending timestamp like
.css?2012-07-30
I modified variable to -
cachebust = ''.join(random.choice(string.ascii_uppercase + string.digits) for x in range(10))
so it becomes (for example)
.css?6SKD39SFJ3
His original version didn't seem to remove the date either, so I'm not really sure how that works as cache control, but I figured that if I could auto-strip those 10 characters, it would work: first target any js files (for new files), then, if js? is present (cache control already in place), strip the existing cache-control string.
asset = re.search('\.(js")><\/script>', line)
if asset is not None:
existing = re.search('\.(js?"', line)
if existing is not None:
line[i] = line.replace('.js?'STRING????'"', '.js"')
lines[i] = line.replace('.js"></script>', '.js?%s"></script>' % cachebust)
Any thoughts on what that STRING???? should be, or whether this method wouldn't work? I'm new to Python, so I'm just experimenting here...
You could replace the 3 lines:
existing = re.search('\.(js?"', line)
if existing is not None:
line[i] = line.replace('.js?'STRING????'"', '.js"')
with:
line = re.sub(r'\.js\?[-0-9]{10}">',r'.js?">', line)
Output:
>>> re.sub(r'\.js\?[-0-9]{10}">',r'.js?">', '<script type="blah" src="url/to/path.js?2012-07-02">')
'<script type="blah" src="url/to/path.js?">'
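I have used the regexp [-0-9]{10}, which matches exactly 10 characters, each of them a digit or a dash. If the string can consist of any 10 characters, use .{10} instead. Note that re.sub returns the new string rather than modifying line in place.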
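For the random token from the question (uppercase letters and digits), the character class would need to be [A-Z0-9]. A rough sketch of stripping an old token and appending a fresh one in a single step; the names TOKEN, make_token and rebust are made up for this example, and the HTML line is only a sample.

```python
import random
import re
import string

# Matches .js?XXXXXXXXXX" where X is an uppercase letter or a digit
TOKEN = re.compile(r'\.js\?[A-Z0-9]{10}"')

def make_token():
    """Build a random 10-character token, as in the question."""
    alphabet = string.ascii_uppercase + string.digits
    return ''.join(random.choice(alphabet) for _ in range(10))

def rebust(line, token):
    """Remove an existing token, if any, then append the new one."""
    line = TOKEN.sub('.js"', line)
    return line.replace('.js"></script>', '.js?%s"></script>' % token)

old_line = '<script src="a.js?6SKD39SFJ3"></script>'
new_line = rebust(old_line, "ABCDEFGHIJ")
print(new_line)  # <script src="a.js?ABCDEFGHIJ"></script>
```

In the real script the second argument would be make_token(); a fixed token is used here only so that the printed result is predictable.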
