I'm trying to create a Python script that will go to a specific folder and remove all the numbers from the file names.
This is the code:
import os
def rename_files():
    print "List of Files:"
    print(os.getcwd())
    os.chdir("/home/n3m0/Desktop/Pictures")
    for fn in os.listdir(os.getcwd()):
        print("file w/ numbers - " + fn)
        print("File w/o numbers - " + fn.translate(None, "0123456789"))
    os.rename(fn, fn.translate(None, "0123456789"))
    os.chdir("/home/n3m0/Desktop/Pictures")
rename_files()
What I'm trying to do is remove all the numbers so that I'm able to read the file name
For example I want:
B45608aco4897n Pan44ca68ke90s1.jpg to say Bacon Pancakes.jpg
When I run the script, it changes all of the names in the terminal, but when I go to the folder only one file name has been changed, and I have to run the script multiple times. I'm using Python 2.7.
I'm not 100% on this as I am just on my phone at the moment, but try this:
import os
from string import digits
def rename_files():
    os.chdir("/whatever/directory/you/want/here")
    for fn in os.listdir(os.getcwd()):
        # Python 2: translate(None, digits) deletes every digit character
        os.rename(fn, fn.translate(None, digits))
rename_files()
Your indentation is a little messed up, and that's part of what's causing you problems. You also don't necessarily need to change the working directory - we can simply just keep track of the folder we're looking at and use os.path.join to reconstruct the file path, like so:
import os
from string import digits
def rename_files(folder_path):
    for input_file in os.listdir(folder_path):
        print 'Original file name: {}'.format(input_file)
        # only rename files whose names actually contain a digit
        if any(str(x) in input_file for x in digits):
            new_name = input_file.translate(None, digits)
            print 'Renaming: {} to {}'.format(input_file, new_name)
            os.rename(os.path.join(folder_path, input_file), os.path.join(folder_path, new_name))
rename_files('/home/n3m0/Desktop/Pictures')
This produces a function that you can re-use: we loop through all the items in the folder, printing the original names as we go. We then check whether there are any digits in the filename, and if there are, we rename the file.
Note, however, that this method is not particularly safe - what if the file name consists entirely of numbers and an extension? What if there are two files named identically apart from numbers (e.g. asongtoruin0.jpg and asongtoruin1.jpg)? This method would only retain the last file it found, overwriting the first. Look into the functions available in os to try to work out how to solve this, particularly os.path.isfile.
EDIT: had some time to spare, here's a little fix to catch the error for renaming to an already-existing file name:
def rename_files(folder_path):
    for input_file in os.listdir(folder_path):
        print 'Original file name: {}'.format(input_file)
        if any(str(x) in input_file for x in digits):
            new_name = input_file.translate(None, digits)
            # if removing numbers conflicts with an existing file, add " (i)" before
            # the extension, incrementing i until the name is free.
            base, ext = os.path.splitext(new_name)
            i = 1
            while os.path.isfile(os.path.join(folder_path, new_name)):
                new_name = '{0} ({1}){2}'.format(base, i, ext)
                i += 1
            print 'Renaming: {} to {}'.format(input_file, new_name)
            os.rename(os.path.join(folder_path, input_file), os.path.join(folder_path, new_name))
rename_files('/home/n3m0/Desktop/Pictures')
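The answers above target Python 2.7, where str.translate(None, digits) deletes characters. On Python 3 that two-argument form no longer exists, so here is a minimal sketch of the same idea, assuming Python 3. The helper names strip_digits and plan_renames are hypothetical (they do not appear in the answers above), and plan_renames only computes the new names, so the result can be checked before anything on disk is renamed.

```python
import os
from string import digits

# Translation table that deletes every ASCII digit (Python 3 idiom).
_DELETE_DIGITS = str.maketrans('', '', digits)


def strip_digits(name):
    """Return name with all ASCII digits removed."""
    return name.translate(_DELETE_DIGITS)


def plan_renames(names):
    """Map each old name to a digit-free new name without touching the disk.

    Names that would collide get " (1)", " (2)", ... before the extension.
    """
    taken = set(names)
    plan = {}
    for old in sorted(names):
        new = strip_digits(old)
        if new == old:
            continue  # nothing to strip
        base, ext = os.path.splitext(new)
        i = 1
        while new in taken:
            new = '{} ({}){}'.format(base, i, ext)
            i += 1
        taken.add(new)
        plan[old] = new
    return plan
```

The returned dictionary could then be fed to os.rename pair by pair, for example with names taken from os.listdir(folder_path).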
Related
My filenames follow patterns like 29_11_2019_17_05_17_1050_R__2.png and 29_11_2019_17_05_17_1550_2
I want to write a function that separates these files and puts them in different folders.
Please find my code below, but it's not working.
Can you help me with this?
def sort_photos(folder, dir_name):
    for filename in os.listdir(folder):
        wavelengths = ["1550_R_", "1550_", "1050_R_", "1200_"]
        for x in wavelengths:
            if x == "1550_R_":
                if re.match(r'.*x.*', filename):
                    filesrc = os.path.join(folder, filename)
                    shutil.copy(filesrc, dir_name)
                    print("Filename has 'x' in it, do something")
                    print("file 1550_R_ copied")
                    # cv2.imwrite(dir_name,filename)
                else:
                    print("filename doesn't have '1550_R_' in it (so maybe it only has 'N' in it)")
In order to construct a RegEx using a variable, you can use string-interpolation.
In Python 3.6+, you can use f-strings to accomplish this. In this case, the condition for your second if statement could be:
if re.match(fr'.*{x}.*', filename) is not None:
instead of:
if re.match(r'.*x.*', filename) is not None:
which would only ever match filenames with a literal 'x' in them. I think this is the immediate (though not necessarily the only) problem in your example code.
Footnote
Earlier versions of Python do string interpolation differently, the oldest (AFAIK) is %-formatting, e.g:
if re.match(r".*%s.*" % x, filename) is not None:
Read here for more detail.
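If the interpolated substring might contain characters that are special in a regular expression, it can be escaped first. The following is a small sketch, assuming Python 3; contains_token is a hypothetical helper name. It uses re.search, which scans the whole string, so the leading and trailing .* are not needed.

```python
import re


def contains_token(filename, token):
    """Return True if token occurs literally anywhere in filename."""
    # re.escape neutralises characters such as '.' or '+' inside token.
    return re.search(re.escape(token), filename) is not None


print(contains_token("29_11_2019_17_05_17_1050_R__2.png", "1050_R_"))  # True
print(contains_token("29_11_2019_17_05_17_1050_R__2.png", "1550_"))    # False
```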
I am not entirely clear about which problem you are encountering.
However, here are two suggestions:
To detect the character x in a file name, you can just use:
if 'x' in filename:
    ...
If you only intend to move files, a file check should be added:
if os.path.isfile(name):
    ...
I didn't have much time, so I've edited your function to do something very close to what you wanted. It reads the file names and copies each file into a directory named after the wavelength it matches. It currently cannot differentiate between '1550_' and '1550_R_', since '1550_R_' contains '1550_': as written, it creates both directories and copies any file matching '1550_R_' into both of them. A conditional statement of a few lines would fix that.
One final note: to keep things simple, the destination folders are created in the same place as your files. Adding a separate destination directory would also take only a few lines.
import os
import shutil

def sort_photos(folder):
    wavelengths = ["1550_R_", "1550_", "1050_R_", "1200_"]
    for filename in os.listdir(folder):
        filesrc = os.path.join(folder, filename)
        if not os.path.isfile(filesrc):
            continue  # skip directories, e.g. the destination folders on a later run
        for x in wavelengths:
            if x in filename:
                # the destination directory is named after the wavelength
                # and created next to the source files
                path = os.path.join(folder, x)
                if not os.path.exists(path):
                    os.mkdir(path)
                shutil.copy(filesrc, os.path.join(path, filename))

# USAGE: sort_photos(folder); for example, from the folder where all the files are located:
sort_photos('./')
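One way to keep '1550_' from also matching '1550_R_' files is to test the longer patterns first and stop at the first hit. Here is a sketch under that assumption; classify is a hypothetical helper, and the wavelength list is the one from the code above.

```python
WAVELENGTHS = ["1550_R_", "1550_", "1050_R_", "1200_"]


def classify(filename, patterns=WAVELENGTHS):
    """Return the longest pattern contained in filename, or None."""
    # Longest first, so "1550_R_" wins over its prefix "1550_".
    for pattern in sorted(patterns, key=len, reverse=True):
        if pattern in filename:
            return pattern
    return None


print(classify("29_11_2019_17_05_17_1550_R__2.png"))  # 1550_R_
print(classify("29_11_2019_17_05_17_1550_2.png"))     # 1550_
print(classify("29_11_2019_17_05_17_1050_R__2.png"))  # 1050_R_
```

The returned pattern could be used as the destination directory name inside sort_photos, so that each file is copied exactly once.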
I've searched through many answers on deleting multiple files based on certain parameters (e.g. all txt files). Unfortunately, I haven't seen anything where one has a longish list of files saved to a .txt (or .csv) file and wants to use that list to delete files from the working directory.
I have my current working directory set to where the .txt file is (text file with list of files for deletion, one on each row) as well as the ~4000 .xlsx files. Of the xlsx files, there are ~3000 I want to delete (listed in the .txt file).
This is what I have done so far:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
list = open('DeleteFiles.txt')
for f in list:
    os.remove(f)
This gives me the error:
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'Test1.xlsx\n'
I feel like I'm missing something simple. Any help would be greatly appreciated!
Thanks
1. Strip the trailing '\n' from each line read from the text file.
2. Make an absolute path by joining path with the file name.
3. Do not shadow Python built-in types (in your case, list).
4. Close the text file, or use with open('DeleteFiles.txt') as flist.
EDIT: Actually, looking at your code again, because of os.chdir(path) the second point may not be necessary.
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
flist = open('DeleteFiles.txt')
for f in flist:
    fname = f.rstrip() # or depending on situation: f.rstrip('\n')
    # or, if you get rid of os.chdir(path) above,
    # fname = os.path.join(path, f.rstrip())
    if os.path.isfile(fname): # this makes the code more robust
        os.remove(fname)
# also, don't forget to close the text file:
flist.close()
As Henry Yik pointed out in the comments, os.remove needs a path that resolves to the file, so it is safest to pass the full path. Also, the open function only returns a file object; you need to read the lines from it (without their trailing newlines). And don't forget to close the file. A solution would be:
import os
path = "c:\\Users\\SFMe\\Desktop\\DeleteFolder"
os.chdir(path)
# the argument "r" opens the file for reading only
list_file = open('DeleteFiles.txt', "r")
# the variable is named _list rather than list so that it does not
# shadow the built-in type list
_list = list_file.read().splitlines()
list_file.close()
for f in _list:
    os.remove(os.path.join(path,f))
A further improvement would be to use a with block, which "automagically" closes the file for us, and a list comprehension instead of the loop:
with open('DeleteFiles.txt', "r") as list_file:
    _list = list_file.read().splitlines()
[os.remove(os.path.join(path,f)) for f in _list]
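The points raised in the answers above (strip the newline, join with the folder, check the file exists, close the list file) can be combined into one function. This is a sketch, assuming Python 3 and that the list file lives in the same folder as the files to delete; delete_listed is a hypothetical name, and returning the names that were not found is an addition for illustration.

```python
import os


def delete_listed(folder, list_name):
    """Delete the files named in folder/list_name; return names not found."""
    missing = []
    # The with block closes the list file even if a removal fails.
    with open(os.path.join(folder, list_name)) as handle:
        for line in handle:
            name = line.strip()  # drops the trailing '\n'
            if not name:
                continue  # skip blank lines
            target = os.path.join(folder, name)
            if os.path.isfile(target):
                os.remove(target)
            else:
                missing.append(name)
    return missing
```

With the paths from the question, a call might look like delete_listed("c:\\Users\\SFMe\\Desktop\\DeleteFolder", "DeleteFiles.txt").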
I created a script to see all the files in a folder and print the full path of each file.
The script is working and prints the output in the Command Prompt (Windows)
import os
root = 'C:\Users\marco\Desktop\Folder'
for path, subdirs, files in os.walk(root):
    for name in files:
        print os.path.join(path, name)
I now want to save the output in a txt file, so I have edited the code to assign os.path.join(path, name) to a variable values, but when I write the output the script gives me an error:
import os
root = 'C:\Users\marco\Desktop\Folder'
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
file = open('sample.txt', 'w')
file.write(values)
file.close()
Error below
file.write(values)
NameError: name 'values' is not defined
Try this:
file = open('sample.txt', 'w')
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
        file.write(values+'\n')
file.close()
Note that file is a built-in name in Python 2 (which is shadowed here), so I suggest that you replace it with fileDesc or similar.
The problem is that values is only assigned inside the inner for loop. Python loops do not create a new scope, so the NameError means the loop body never ran (for example, because no files were found under root), which leaves values undefined when it is written. Assigning an initial value before the iteration starts, such as values=None or better yet values='', avoids the error. Even then, you would not get the output file you want: values is overwritten on every iteration, so after the loop it holds only the path of the last file encountered, and only that path would be written to sample.txt.
Another bad practice you seem to be following is using \ instead of \\ inside strings. This might come back to bite you later (if it hasn't already). A \ followed by certain characters denotes an escape sequence, and \\ is the escape sequence for a literal backslash.
So here's a redesigned working sample code:
import os
root = 'C:\\Users\\marco\\Desktop\\Folder'
values=''
for path, subdirs, files in os.walk(root):
    for name in files:
        values = values + os.path.join(path, name) + '\n'
samplef = open('sample.txt', 'w')
samplef.write(values)
samplef.close()
In case you aren't familiar, '\n' is the escape sequence for a new line. Reading your output file would be quite tedious if all your file paths were written on the same line.
PS: I wrote the code with strings as that's what I would prefer in this scenario, but you may try lists or something else. Just be sure that you define the variable beforehand so that it always exists when you use it.
import os
root = r'C:\Users\marco\Desktop\Folder'
file = open('sample.txt', 'w')
for path, subdirs, files in os.walk(root):
    for name in files:
        values = os.path.join(path, name)
        # write each path on its own line
        file.write(values + '\n')
file.close()
values is only assigned inside the inner loop, so in the original code it is undefined at the point of the write whenever the loop body never runs, and otherwise holds just the last path. Writing inside the loop, as above, will work.
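For reference, the same approach can be written so that the output file is closed automatically. This is a sketch, assuming Python 3; write_file_list is a hypothetical name, and returning the number of paths written is an addition for illustration.

```python
import os


def write_file_list(root, out_path):
    """Write the full path of every file under root to out_path, one per line."""
    count = 0
    # The with block closes the output file even if os.walk raises.
    with open(out_path, 'w') as out:
        for path, subdirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(path, name) + '\n')
                count += 1
    return count
```

Note that if out_path lies inside root, the output file itself may appear in the listing, since it already exists by the time the walk reaches its directory.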
I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]  # 'foo.pdf'
if os.path.exists(filename):
    name, ext = os.path.splitext(filename)
    filename = name + '_1' + ext
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
    num = int(curName.split('.')[0].split('_')[1])
    curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to run through all the files every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers
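The alternative just described (collect the numbers already in use, then take the highest plus one) might look like the sketch below, assuming Python 3. next_numbered_name is a hypothetical helper; the stem "foo" and extension ".pdf" are the example values from the question, and existing is assumed to be a list of names such as the result of os.listdir.

```python
import re


def next_numbered_name(existing, stem="foo", ext=".pdf"):
    """Return stem_N + ext, where N is one more than the highest number in use."""
    pattern = re.compile(r'^{}_(\d+){}$'.format(re.escape(stem), re.escape(ext)))
    # Keep only names of the form stem_<digits>ext and extract the digits.
    used = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    return "{}_{}{}".format(stem, max(used) + 1 if used else 0, ext)


print(next_numbered_name(["foo_0.pdf", "foo_1.pdf", "foo_4.pdf", "bar_9.pdf"]))  # foo_5.pdf
print(next_numbered_name([]))  # foo_0.pdf
```

As noted above, gaps in the numbering (foo_2.pdf and foo_3.pdf in this example) are not filled.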
Why not just use the tempfile module:
fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete = False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from @wnnmaw.
For a global archive, I would take a different approach...
Create a directory for each filename
Store the file in the directory as "1" + extension
Write the current "number" to the directory as "_files.txt"
Additional files are written as 2, 3, 4, etc. and increment the value in _files.txt
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files make it annoying for a user; thousands of files can affect OS performance). A bucketing system might turn a filename into a hexdigest, then drop the file into "/%s/%s/%s" % (hex[0:3], hex[3:6], filename). The hexdigest is used to give you a more even distribution of characters.
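A rough sketch of such a bucketing scheme, assuming Python 3: the answer does not say which hash to use, so SHA-256 from hashlib is an assumption here, and bucket_path and the root argument are hypothetical names.

```python
import hashlib
import os


def bucket_path(root, filename):
    """Return root/<hex[0:3]>/<hex[3:6]>/filename for a tiered layout."""
    # Hash choice is an assumption; any hex digest spreads names evenly.
    hex_digest = hashlib.sha256(filename.encode('utf-8')).hexdigest()
    return os.path.join(root, hex_digest[0:3], hex_digest[3:6], filename)
```

The intermediate directories would still have to be created before saving, for example with os.makedirs(os.path.dirname(path), exist_ok=True).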
import os
def uniquify(path, sep=''):
    path = os.path.normpath(path)
    num = 0
    newpath = path
    dirname, basename = os.path.split(path)
    filename, ext = os.path.splitext(basename)
    while os.path.exists(newpath):
        newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
                               .format(f=filename, s=sep, n=num, e=ext))
        num += 1
    return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call uniquify many thousands of times with the same path, each subsequent call may get a bit slower, since the while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not exist at the time os.path.exists is called, but may exist at the time you use the value returned by uniquify. Use tempfile.NamedTemporaryFile to avoid this problem. You won't get incremental numbering, but you will get files with unique names, guaranteed not to already exist. You could use the prefix parameter to specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
    path = os.path.normpath(path)
    if os.path.exists(path):
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
                                           dir=dirname, mode=mode)
    else:
        return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.
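If incremental numbering matters, a different technique from the tempfile approach above is to let the operating system do the existence check by opening the file in exclusive-creation mode. This is a sketch, assuming Python 3.3 or later, where open accepts mode 'x'; open_unique is a hypothetical name.

```python
import os


def open_unique(path, sep='_'):
    """Create and open a new file, adding sep + number on name clashes.

    Mode 'x' makes open() fail with FileExistsError if the file already
    exists, so checking and creating happen in a single step.
    """
    root, ext = os.path.splitext(path)
    candidate = path
    num = 0
    while True:
        try:
            return open(candidate, 'xb')
        except FileExistsError:
            # Name taken: try foo_1.pdf, foo_2.pdf, ...
            num += 1
            candidate = '{}{}{}{}'.format(root, sep, num, ext)
```

As with the tempfile version, the open file object is returned rather than just a name; the name that was finally chosen is available as its name attribute.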
I want to write a little script for managing a bunch of files I have. Those files have complex and different names, but they all contain a number somewhere in their name. I want to take that number and place it at the front of the file name so the files are listed logically in my filesystem.
I got a list of all those files using os.listdir, but I'm struggling to find a way to locate the numbers in the names. I've looked at regular expressions, but I'm unsure if that's the right way to do this!
example:
import os
files = os.listdir('c:\\folder')
files
['xyz3.txt', '2xyz.txt', 'x1yz.txt']
So basically, what I ultimately want is:
1xyz.txt
2xyz.txt
3xyz.txt
Where I am stuck so far is finding those numbers (1, 2, 3) in the list files.
This (untested) snippet should show the regexp approach. The search method of compiled patterns is used to look for the number. If found, the number is moved to the front of the file name.
import os, re
NUM_RE = re.compile(r'\d+')
for name in os.listdir('.'):
    match = NUM_RE.search(name)
    if match is None or match.start() == 0:
        continue # no number or number already at start
    newname = match.group(0) + name[:match.start()] + name[match.end():]
    print 'renaming', name, 'to', newname
    #os.rename(name, newname)
If this code is used in production and not as a homework assignment, a useful improvement would be to parse match.group(0) as an integer and format it to include a number of leading zeros. That way foo2.txt would become 02foo.txt and get sorted before 12bar.txt. Implementing this is left as an exercise to the reader.
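One possible take on that exercise, as a sketch assuming Python 3: number_first is a hypothetical helper that only computes the new name, and the default width of 2 is an assumption that would need to grow if the numbers exceed 99. Unlike the loop above, it also pads a number that is already at the front.

```python
import re

NUM_RE = re.compile(r'\d+')


def number_first(name, width=2):
    """Move the first number in name to the front, zero-padded to width."""
    match = NUM_RE.search(name)
    if match is None:
        return name  # no number to move
    number = int(match.group(0))
    rest = name[:match.start()] + name[match.end():]
    return '{:0{width}d}'.format(number, width=width) + rest


print(number_first('foo2.txt'))   # 02foo.txt
print(number_first('12bar.txt'))  # 12bar.txt
print(number_first('x1yz.txt'))   # 01xyz.txt
```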
Assuming that the numbers in your file names are integers that form a single run of digits (untested code):
import os

def rename(dirpath, filename):
    # positions of every digit character in the file name
    inds = [i for i, char in enumerate(filename) if char in '1234567890']
    if not inds:
        return  # no digits, nothing to move
    ints = filename[min(inds):max(inds)+1]
    newname = ints + filename[:min(inds)] + filename[max(inds)+1:]
    os.rename(os.path.join(dirpath, filename), os.path.join(dirpath, newname))

def renameFilesInDir(dirpath):
    """ Apply your renaming scheme to all files in the directory specified by dirpath """
    # os.walk already descends into subdirectories, so no explicit recursion is needed
    for root, dirnames, filenames in os.walk(dirpath):
        for filename in filenames:
            rename(root, filename)
Hope this helps