Python code for renaming files in directory not working - python

I'm looking to rename my files from time to time in numerical order, for example 1.png, 2.png, 3.png, etc.
I wrote this code in an attempt to do so. For now it simply prints what the files would be named, to make sure it is right:
import os
os.chdir('/Users/hasso/Pictures/Digital Art/saved images for vid/1')
for f in os.listdir():
    f_name=1
    f_ext= '.png'
    print('{}{}'.format(f_name, f_ext))
How would I go about solving this?

You keep on getting 1.png suggested as the new name because you always set f_name = 1 inside the loop. Initialize it with 1 before the loop, and then increment it as you are renaming each file instead.
A few additional points:
You don't need os.chdir: although os.listdir defaults to . (the current directory), you can also pass the target path to it directly.
When dealing with user home directories, it's nice if you don't have to hardcode it. os.path.expanduser retrieves this value for you.
When iterating over lists that you possibly want to change, it's best to make a separate list of only the items that you want to change. So, rather than looping over all files and changing some of them, make it easier by first gathering all items that you want to change. In your case, make a list of only .png files and then loop over this list.
(Rather advanced) On Windows, os.rename raises an error if you try to rename to an already existing name; on Unix it can silently replace the existing file instead. What I do below is check if the next name to be used is already in the list, and if it is, increase the f_name number.
import os
yourPath = os.path.expanduser('~')+'/Pictures/Digital Art/saved images for vid/1'
# collect only the .png files first
filelist = []
for f in os.listdir(yourPath):
    if f.lower().endswith('.png'):
        filelist.append(f)
f_name = 1
for f in filelist:
    # skip numbers whose name is already used by a file in the list
    while True:
        next_name = str(f_name)+'.png'
        if next_name not in filelist:
            break
        f_name += 1
    print('Renaming {} to {}'.format(yourPath+'/'+f, next_name))
    # uncomment to rename for real; the target needs the directory too
    # os.rename(yourPath+'/'+f, yourPath+'/'+next_name)
    f_name += 1

I'm not sure why you need to use os.chdir() to change directories, when you can just pass the path straight to os.listdir(). To rename files, you can use os.rename(). You also need to increment the counter for the file names, since your current code resets f_name to 1 on each iteration. Keep the counter outside the loop and increment it within the loop. This is where enumerate() is useful, since it supplies the index for you.
Basic version:
from os import listdir
from os import rename
from os.path import join
path = "path_to_images"
for i, f in enumerate(listdir(path), start=1):
    rename(join(path, f), join(path, str(i) + '.png'))
You can get the full paths using os.path.join(), since os.listdir() returns only the file names, not their full paths. The above code is also not very robust: it renames every file in the directory, whatever its type, and doesn't handle target .png names that already exist.
Advanced version:
from os import listdir
from os import rename
from os.path import join
from os.path import exists
path = "path_to_images"
extension = '.png'
fname = 1
for f in listdir(path):
    if f.endswith(extension):
        while exists(join(path, str(fname) + extension)):
            fname += 1
        rename(join(path, f), join(path, str(fname) + extension))
        fname += 1
Which uses os.path.exists() to check if the file already exists.
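The points above (filter by extension, keep the counter outside the loop, skip names that are already taken) can be combined into a dry-run-first sketch. The function names plan_renames and apply_renames are made up for illustration, and leaving files that already have a purely numeric name untouched is an assumption about what you want, not something stated in the question.

```python
import os


def plan_renames(directory, extension=".png"):
    """Return (old_name, new_name) pairs without touching the disk."""
    taken = set(os.listdir(directory))
    # Assumption: files already called 1.png, 2.png, ... are left as they are.
    candidates = sorted(
        name for name in taken
        if name.lower().endswith(extension)
        and not os.path.splitext(name)[0].isdigit()
    )
    plan = []
    counter = 1
    for name in candidates:
        # Skip numbers whose target name is already present or already planned.
        while str(counter) + extension in taken:
            counter += 1
        new_name = str(counter) + extension
        plan.append((name, new_name))
        taken.add(new_name)
        counter += 1
    return plan


def apply_renames(directory, plan):
    """Carry out a plan produced by plan_renames."""
    for old_name, new_name in plan:
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
```

Printing the plan first and calling apply_renames only once it looks right mirrors the print-before-rename approach used in the question.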

Related

Removing all duplicate images with different filenames from a directory

I am trying to iterate through a folder and delete any file that is a duplicate image (but different name). After running this script all files get deleted except for one. There are at least a dozen unique ones out of about 5,000. Any help understanding why this is happening would be appreciated.
import os
import cv2
directory = r'C:\Users\Grid\scratch'
for filename in os.listdir(directory):
    a=directory+'\\'+filename
    n=(cv2.imread(a))
    q=0
    for filename in os.listdir(directory):
        b=directory+'\\'+filename
        m=(cv2.imread(b))
        comparison = n == m
        equal_arrays = comparison.all()
        if equal_arrays==True and q==1:
            os.remove(b)
        q=1
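There are a few issues with your code, and it's surprising that it runs at all without throwing an exception: comparison is a NumPy array only when both images load and have the same shape, so whenever cv2.imread returns None for both files (it returns None for anything it cannot read), comparison is a plain bool and calling comparison.all() shouldn't work.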
A few pointers: You only need to get the directory contents once. It also would be much simpler to collect md5 or sha1 hashes of the files while iterating the directory and then remove duplicates along the way.
for example:
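import hashlib
import os
directory = r'C:\Users\Grid\scratch'  # the directory from the question
hashes = set()
for filename in os.listdir(directory):
    path = os.path.join(directory, filename)
    # hash the raw bytes; identical files give identical digests
    with open(path, 'rb') as fh:
        digest = hashlib.sha1(fh.read()).digest()
    if digest not in hashes:
        hashes.add(digest)
    else:
        os.remove(path)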
You can use a more secure hash if you would like but the chances of encountering a collision are astronomically low.
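For large files, or if you want to review the duplicates before deleting anything, the same hashing idea can be sketched as below. The names file_digest and find_duplicates are hypothetical, SHA-256 and the 64 KiB chunk size are arbitrary choices, and note that this finds byte-identical files, not images that merely look the same.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Hash a file in chunks so it never has to fit in memory at once."""
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.digest()


def find_duplicates(directory):
    """Return paths whose content matches an earlier file (in sorted order)."""
    seen = set()
    duplicates = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # ignore subdirectories
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen.add(digest)
    return duplicates
```

Once the returned list looks right, you could pass each path to os.remove.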

A way to create files and directories without overwriting

You know how when you download something and the downloads folder contains a file with the same name, instead of overwriting it or throwing an error, the file ends up with a number appended to the end? For example, if I want to download my_file.txt, but it already exists in the target folder, the new file will be named my_file(2).txt. And if I try again, it will be my_file(3).txt.
I was wondering if there is a way in Python 3.x to check that and get a unique name (not necessarily create the file or directory). I'm currently implementing it doing this:
import os
def new_name(name, newseparator='_'):
    # name can be either a file or directory name
    base, extension = os.path.splitext(name)
    i = 2
    while os.path.exists(name):
        name = base + newseparator + str(i) + extension
        i += 1
    return name
In the example above, running new_name('my_file.txt') would return my_file_2.txt if my_file.txt already exists in the cwd. name can also contain a full or relative path; it will work as well.
I would use pathlib and do something along these lines:
from pathlib import Path
def new_fn(fn, sep='_'):
    p = Path(fn)
    if p.exists():
        if not p.is_file():
            raise TypeError
        np = p.resolve(strict=True)
        parent = np.parent
        extens = ''.join(np.suffixes)  # handle multiple ext such as .tar.gz
        base = np.name[:len(np.name) - len(extens)]  # strip only the trailing extension(s)
        i = 2
        nf = parent / (base + sep + str(i) + extens)
        while nf.exists():
            i += 1
            nf = parent / (base + sep + str(i) + extens)
        return nf
    else:
        # name is free: return it as an absolute path
        return p.parent.resolve(strict=True) / p.name
This only handles files as written but the same approach would work with directories (which you added later.) I will leave that as a project for the reader.
Another way of getting a new name would be using the built-in tempfile module:
from pathlib import Path
from tempfile import NamedTemporaryFile
def new_path(path: Path, new_separator='_'):
    prefix = str(path.stem) + new_separator
    dir = path.parent
    suffix = ''.join(path.suffixes)
    with NamedTemporaryFile(prefix=prefix, suffix=suffix, delete=False, dir=dir) as f:
        return f.name
If you execute this function from within Downloads directory, you will get something like:
>>> new_path(Path('my_file.txt'))
'/home/krassowski/Downloads/my_file_90_lv301.txt'
where the 90_lv301 part was generated internally by the Python's tempfile module.
Note: with the delete=False argument, the function will create (and leave undeleted) an empty file with the new name. If you do not want an empty file created that way, just remove delete=False. Keeping it, however, prevents anyone else from creating a new file with that name before your next operation (though they could still overwrite it).
Simply put, having delete=False prevents concurrency issues if you (or the end-user) were to run your program twice at the same time.
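If you want to keep the incremental my_file_2.txt style of numbering and still avoid the concurrency issue, one option (a different technique from the tempfile answer, shown here only as a sketch) is to open the file in exclusive-creation mode "x", which raises FileExistsError instead of overwriting. The function name create_unique is made up for illustration.

```python
import os


def create_unique(path, sep="_"):
    """Create and return an open file whose name did not exist before."""
    base, ext = os.path.splitext(path)
    candidate = path
    i = 2
    while True:
        try:
            # Mode "x" creates the file, or fails if the name is already taken.
            return open(candidate, "x")
        except FileExistsError:
            candidate = "{}{}{}{}".format(base, sep, i, ext)
            i += 1
```

The chosen name is available as the .name attribute of the returned file object. As with delete=False, an empty file is left on disk to reserve the name.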

Count the number of folders in a directory and subdirectories

I've got a script that will accurately tell me how many files are in a directory and the subdirectories within. However, I'm also looking to identify how many folders there are within the same directory and its subdirectories...
My current script:
import os, getpass
from os.path import join, getsize
user = 'Copy of ' + getpass.getuser()
path = "C://Documents and Settings//" + user + "./"
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
file_counter = sum([len(files) for r, d, files in os.walk(path)])
print ' [*] ' + str(file_counter) + ' Files were found and ' + str(folder_counter) + ' folders'
This code gives me the print out of: [*] 147 Files were found and 147 folders.
Meaning that the folder_counter isn't counting the right elements. How can I correct this so the folder_counter is correct?
Python 2.7 solution
For a single directory you can also do:
import os
print len(os.walk('dir_name').next()[1])
which only reads the top level instead of walking the whole tree, and returns the number of directories directly inside the 'dir_name' directory.
Python 3.x solution
Since many people just want an easy and fast solution, without actually understanding the solution, I've edited my answer to include the exact working code for Python 3.x.
In Python 3.x, generators no longer have a .next() method; use the built-in next() function instead. Thus, the above snippet becomes:
import os
print(len(next(os.walk('dir_name'))[1]))
where dir_name is the directory whose subdirectories you want to count.
I think you want something like:
import os
files = folders = 0
for _, dirnames, filenames in os.walk(path):
  # ^ this idiom means "we won't be using this value"
    files += len(filenames)
    folders += len(dirnames)
print "{:,} files, {:,} folders".format(files, folders)
Note that this only iterates over os.walk once, which will make it much quicker on paths containing lots of files and directories. Running it on my Python directory gives me:
30,183 files, 2,074 folders
which exactly matches what the Windows folder properties view tells me.
Note that your current code calculates the same number twice because the only change is renaming one of the returned values from the call to os.walk:
folder_counter = sum([len(folder) for r, d, folder in os.walk(path)])
#                         ^ here            ^ and here
file_counter = sum([len(files) for r, d, files in os.walk(path)])
#                       ^ vs. here       ^ and here
Despite that name change, you're counting the same value (i.e. in both it's the third of the three returned values that you're using)! Python functions do not know what names (if any at all; you could do print list(os.walk(path)), for example) the values they return will be assigned to, and their behaviour certainly won't change because of it. Per the documentation, os.walk returns a three-tuple (dirpath, dirnames, filenames), and the names you use for that, e.g. whether:
for foo, bar, baz in os.walk(...):
or:
for all_three in os.walk(..):
won't change that.
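For Python 3, the single-pass count can be wrapped in a small function. This is only a sketch; the name count_entries is made up, and the position of each value in the tuple is what matters, as explained above.

```python
import os


def count_entries(path):
    """Return (file_count, folder_count) for path and everything below it."""
    files = folders = 0
    # os.walk yields (dirpath, dirnames, filenames); dirpath is not needed here.
    for _dirpath, dirnames, filenames in os.walk(path):
        files += len(filenames)
        folders += len(dirnames)
    return files, folders
```

You could then print the result with print("{:,} files, {:,} folders".format(*count_entries(path))).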
If interested only in the number of folders in /input/dir (and not in the subdirectories):
import os
folder_count = 0  # type: int
input_path = "/path/to/your/input/dir"  # type: str
for folders in os.listdir(input_path):  # loop over all entries
    if os.path.isdir(os.path.join(input_path, folders)):  # if it's a directory
        folder_count += 1  # increment counter
print("There are {} folders".format(folder_count))
>>> import os
>>> len(list(os.walk('folder_name')))
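According to the os.walk documentation, the first element of each yielded tuple, dirpath, takes the value of every directory in the tree in turn, so this count includes folder_name itself; subtract 1 to count only the folders beneath it.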

Changing name of file until it is unique

I have a script that downloads files (pdfs, docs, etc) from a predetermined list of web pages. I want to edit my script to alter the names of files with a trailing _x if the file name already exists, since it's possible files from different pages will share the same filename but contain different contents, and urlretrieve() appears to automatically overwrite existing files.
So far, I have:
import os
urlfile = 'https://www.foo.com/foo/foo/foo.pdf'
filename = urlfile.split('/')[-1]
# filename is now 'foo.pdf'
if os.path.exists(filename):
    filename = filename.split('.')[0] + '_' + str(1) + '.pdf'
That works fine for one occurrence, but it looks like after one foo_1.pdf it will start saving as foo_1_1.pdf, and so on. I would like to save the files as foo_1.pdf, foo_2.pdf, and so on.
Can anybody point me in the right direction on how I can ensure that file names are stored in the correct fashion as the script runs?
Thanks.
So what you want is something like this:
curName = "foo_0.pdf"
while os.path.exists(curName):
    num = int(curName.split('.')[0].split('_')[1])
    curName = "foo_{}.pdf".format(str(num+1))
Here's the general scheme:
Assume you start from the first file name (foo_0.pdf)
Check if that name is taken
If it is, iterate the name by 1
Continue looping until you find a name that isn't taken
One alternative: Generate a list of file numbers that are in use, and update it as needed. If it's sorted you can say name = "foo_{}.pdf".format(flist[-1]+1). This has the advantage that you don't have to check name after name on disk every time (as the above solution does). However, you need to keep the list of numbers in memory. Additionally, this will not fill any gaps in the numbers.
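A rough sketch of that alternative, assuming the files follow a stem_N.ext pattern such as foo_3.pdf: scan the directory once, collect the numbers in use, and take the largest plus one. The function name next_numbered_name and its parameters are made up for illustration.

```python
import os
import re


def next_numbered_name(directory, stem="foo", ext=".pdf"):
    """Return the next stem_N.ext name after the highest N already in use."""
    pattern = re.compile(
        r"^{}_(\d+){}$".format(re.escape(stem), re.escape(ext))
    )
    # Numbers already used by files that match the pattern exactly.
    used = [
        int(match.group(1))
        for match in map(pattern.match, os.listdir(directory))
        if match
    ]
    # Gaps are not filled: the next number is always max + 1 (or 1 if none).
    return "{}_{}{}".format(stem, max(used, default=0) + 1, ext)
```

As noted above, this does not fill gaps, and it re-reads the directory on each call unless you cache the list of numbers yourself.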
Why not just use the tempfile module:
import tempfile
fileobj = tempfile.NamedTemporaryFile(suffix='.pdf', prefix='', delete=False)
Now your filename will be available in fileobj.name and you can manipulate to your heart's content. As an added benefit, this is cross-platform.
Since you're dealing with multiple pages, this seems more like a "global archive" than a per-page archive. For a per-page archive, I would go with the answer from @wnnmaw.
For a global archive, I would take a different approach...
Create a directory for each filename
Store the file in the directory as "1" + extension
write the current "number" to the directory as "_files.txt"
additional files are written as 2,3,4,etc and increment the value in _files.txt
The benefits of this:
The directory is the original filename. If you keep turning "Example-1.pdf" into "Example-2.pdf" you run into a possibility where you download a real "Example-2.pdf", and can't associate it to the original filename.
You can grab the number of like-named files either by reading _files.txt or counting the number of files in the directory.
Personally, I'd also suggest storing the files in a tiered bucketing system, so that you don't have too many files/directories in any one directory (hundreds of files makes it annoying as a user, thousands of files can affect OS performance). A bucketing system might turn a filename into a hexdigest, then drop the file into "/%s/%s/%s" % (hex[0:3], hex[3:6], filename). The hexdigest is used to give you a more even distribution of characters.
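A minimal sketch of such a bucketing scheme follows. The function name bucket_path and the root argument are made up for illustration, and MD5 is just one convenient way to get a hexdigest here; any hash with evenly spread hex output would do.

```python
import hashlib
import os


def bucket_path(root, filename):
    """Map a filename to root/abc/def/filename using its hex digest."""
    hex_digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # The first two 3-character slices of the digest name the two bucket levels.
    return os.path.join(root, hex_digest[0:3], hex_digest[3:6], filename)
```

The same filename always lands in the same bucket, so you can recompute the path later instead of storing it. Create the directories with os.makedirs(os.path.dirname(path), exist_ok=True) before writing.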
import os
def uniquify(path, sep=''):
    path = os.path.normpath(path)
    num = 0
    newpath = path
    dirname, basename = os.path.split(path)
    filename, ext = os.path.splitext(basename)
    while os.path.exists(newpath):
        newpath = os.path.join(dirname, '{f}{s}{n:d}{e}'
                               .format(f=filename, s=sep, n=num, e=ext))
        num += 1
    return newpath
filename = uniquify('foo.pdf', sep='_')
Possible problems with this include:
If you call uniquify many thousands of times with the same path, each subsequent call may get a bit slower, since the while-loop starts checking from num=0 each time.
uniquify is vulnerable to race conditions whereby a file may not exist at the time os.path.exists is called, but may exist at the time you use the value returned by uniquify. Use tempfile.NamedTemporaryFile to avoid this problem. You won't get incremental numbering, but you will get files with unique names, guaranteed not to already exist. You could use the prefix parameter to specify the original name of the file. For example,
import tempfile
import os
def uniquify(path, sep='_', mode='w'):
    path = os.path.normpath(path)
    if os.path.exists(path):
        dirname, basename = os.path.split(path)
        filename, ext = os.path.splitext(basename)
        return tempfile.NamedTemporaryFile(prefix=filename+sep, suffix=ext, delete=False,
                                           dir=dirname, mode=mode)
    else:
        return open(path, mode)
Which could be used like this:
In [141]: f = uniquify('/tmp/foo.pdf')
In [142]: f.name
Out[142]: '/tmp/foo_34cvy1.pdf'
Note that to prevent a race-condition, the opened filehandle -- not merely the name of the file -- is returned.

Python automated file names

I want to automate the file name used when saving a spreadsheet using xlwt. Say there is a sub directory named Data in the folder the python program is running. I want the program to count the number of files in that folder (# = n). Then the filename must end in (n+1). If there are 0 files in the folder, the filename must be Trial_1.xls. This file must be saved in that sub directory.
I know the following:
import xlwt, os, os.path
n = len([name for name in os.listdir('.') if os.path.isfile(name)])
counts the number of files in the same folder.
a = n + 1
filename = "Trial_" + str(a) + ".xls"
book.save(filename)
This will save the properly named file into the same folder.
My question is how do I extend this to a subdirectory? Thanks.
In os.listdir('.'), the . refers to the current working directory, which is normally the directory the script was launched from. Change the . to point to the subdirectory you are interested in.
You should give it the full path name from the root of your file system; otherwise it will be relative to the directory from where the script is executed. This might not be what you want; especially if you need to refer to the sub directory from another program.
You also need to provide the full path to the filename variable; which would include the sub directory.
To make life easier, just set the full path to a variable and refer to it when needed.
TARGET_DIR = '/home/me/projects/data/'
n = sum(1 for f in os.listdir(TARGET_DIR) if os.path.isfile(os.path.join(TARGET_DIR, f)))
new_name = "{}Trial_{}.xls".format(TARGET_DIR,n+1)
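Put together as a function, the idea might look like the sketch below. The name next_trial_path is made up, and creating the directory when it is missing is an assumption on my part rather than something the question asked for.

```python
import os


def next_trial_path(target_dir, prefix="Trial_", ext=".xls"):
    """Return target_dir/Trial_<n+1>.xls, where n is the current file count."""
    # Assumption: create the subdirectory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    n = sum(
        1 for name in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, name))
    )
    return os.path.join(target_dir, "{}{}{}".format(prefix, n + 1, ext))
```

With xlwt you would then call book.save(next_trial_path('Data')), where book is the workbook from the question. Note that counting files only gives a fresh name as long as nothing else is deleted from or added to the folder.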
You actually want glob:
from glob import glob
DIR = 'some/where/'
existing_files = glob(DIR + '*.xls')
filename = DIR + 'stuff--%d--stuff.xls' % (len(existing_files) + 1)
Since you said Burhan Khalid's answer "Works perfectly!" you should accept it.
I just wanted to point out a different way to compute the number. The way you are doing it works, but if we imagine you were counting grains of sand or something, building the whole list would use way too much memory. Here is a more direct way to get the count:
n = sum(1 for name in os.listdir('.') if os.path.isfile(name))
For every qualifying name, we get a 1, and all these 1's get fed into sum() and you get your count.
Note that this code uses a "generator expression" instead of a list comprehension. Instead of building a list, taking its length, and then discarding the list, the above code just makes an iterator that sum() iterates to compute the count.
It's a bit sleazy, but there is a shortcut we can use: sum() will accept boolean values, and will treat True as a 1, and False as a 0. We can sum these.
# sum will treat Boolean True as a 1, False as a 0
n = sum(os.path.isfile(name) for name in os.listdir('.'))
This is sufficiently tricky that I probably would not use this without putting a comment. But I believe this is the fastest, most efficient way to count things in Python.
