How to rename files in reverse order in Python?

I have a scanner that creates a folder of images named like this:
A1.jpg A2.jpg A3.jpg...A24.jpg -> B1.jpg B2.jpg B3.jpg...B24.jpg
There are 16 rows and 24 images per letter row i.e A1 to P24, 384 images total.
I would like to rename them by reversing the order. The first file should take the name of the last and vice versa. Consider first to be A1 (which is also the first created during scanning)
The closest example I can find is in shell but that is not really what I want:
for i in {1..50}; do
    mv "$i.txt" "renamed/$(( 50 - $i + 1 )).txt"
done
Perhaps I need to save the filenames into a list (natsort maybe?) then use those names somehow?
I also thought I could use the image creation time as the scanner always creates the files in the same order with the same names. In saying that, any solutions may not be so useful for others with the same challenge.
What is a sensible approach to this problem?

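I don't know if this is the optimal way to do it, but here is one approach:
import os
import re
folder_name = "test"
new_folder_name = folder_name + "_new"
# os.listdir returns names in arbitrary order, so sort them naturally (A2 before A10)
file_names = sorted(os.listdir(folder_name),
                    key=lambda n: [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", n)])
file_names_new = file_names[::-1]
print(file_names)
print(file_names_new)
os.mkdir(new_folder_name)
# move each file into the new folder under its mirrored name
for name, new_name in zip(file_names, file_names_new):
    os.rename(os.path.join(folder_name, name), os.path.join(new_folder_name, new_name))
# the old folder is empty now, so replace it with the new one
os.rmdir(folder_name)
os.rename(new_folder_name, folder_name)
This assumes that your files are saved in the directory "test".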

I would store the original list. Then rename all files in the same order (e.g. 1.jpg, 2.jpg etc.). Then I'd rename all of those files into the reverse of the original list.
In that way you will not encounter duplicate file names during the renaming.
You can make use of the pathlib functions rename and iterdir for this. I think it's straightforward how to put that together.
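A minimal sketch of that two-pass idea, assuming the folder contains only the files to rename and that no existing file already ends with the temporary suffix. The names natural_key, reverse_names and tmp_suffix are illustrative, not part of pathlib:

```python
import re
from pathlib import Path


def natural_key(path: Path):
    """Sort key that orders 'A2.jpg' before 'A10.jpg'."""
    parts = re.split(r"(\d+)", path.name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def reverse_names(folder: Path, tmp_suffix: str = ".renaming") -> None:
    """Give the first file the last file's name and vice versa, in two passes."""
    files = sorted((p for p in folder.iterdir() if p.is_file()), key=natural_key)
    final_names = [p.name for p in reversed(files)]
    # Pass 1: move every file to a temporary name so no target name is occupied.
    staged = []
    for p in files:
        tmp = p.with_name(p.name + tmp_suffix)
        p.rename(tmp)
        staged.append(tmp)
    # Pass 2: move each staged file to its final, reversed name.
    for tmp, name in zip(staged, final_names):
        tmp.rename(folder / name)
```

Calling reverse_names(Path("scanner_files")) on a folder like the one in the question would then give the content of A1.jpg the name P24.jpg, and so on. The temporary suffix plays the role of the intermediate 1.jpg, 2.jpg names described above.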

A solution based on the shutil package (the os package sometimes has permission problems) that works "in place", so it does not waste memory if the folder is huge:
import wizzi_utils as wu
import os
def reverse_names(dir_path: str, temp_file_suffix: str = '.temp_unique_suffix') -> None:
    """
    "in place" solution:
    go over the list from both directions and swap names
    swap needs a temp variable so move first file to target name with 'temp_file_suffix'
    """
    files_full_paths = wu.find_files_in_folder(dir_path=dir_path, file_suffix='', ack=True, tabs=0)
    files_num = len(files_full_paths)
    for i in range(files_num):  # works for even and odd files_num
        j = files_num - i - 1
        if i >= j:  # crossed the middle - done
            break
        file_a, file_b = files_full_paths[i], files_full_paths[j]
        print('replacing {}(idx in dir {}) with {}(idx in dir {}):'.format(
            os.path.basename(file_a), i, os.path.basename(file_b), j))
        temp_file_name = '{}{}'.format(file_b, temp_file_suffix)
        wu.move_file(file_src=file_a, file_dst=temp_file_name, ack=True, tabs=1)
        wu.move_file(file_src=file_b, file_dst=file_a, ack=True, tabs=1)
        wu.move_file(file_src=temp_file_name, file_dst=file_b, ack=True, tabs=1)
    return
def main():
    reverse_names(dir_path='./scanner_files', temp_file_suffix='.temp_unique_suffix')
    return
if __name__ == '__main__':
    main()
found 6 files that ends with in folder "D:\workspace\2021wizzi_utils\temp\StackOverFlow\scanner_files":
['A1.jpg', 'A2.jpg', 'A3.jpg', 'B1.jpg', 'B2.jpg', 'B3.jpg']
replacing A1.jpg(idx in dir 0) with B3.jpg(idx in dir 5):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A1.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A1.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg(0B)
replacing A2.jpg(idx in dir 1) with B2.jpg(idx in dir 4):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A2.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A2.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg(0B)
replacing A3.jpg(idx in dir 2) with B1.jpg(idx in dir 3):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A3.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A3.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg(0B)

Related

Is there a better way to do this? Counting Files, and directories via for loop vs map

Folks,
I'm trying to optimize this to help speed up the process...
What I am doing is creating a dictionary of scandir entries...
e.g.
fs_data = {}
for item in Path(fqpn).iterdir():
    # snipped out a bunch of normalization code
    fs_data[item.name.title().strip()] = item
{'file1': <file1 scandisk data>, etc}
and then later using a function to gather the count of files, and directories in the data.
Now I suspect that the new code, using map, could be optimized to be faster than the old code. I suspect the cost comes from having to run the comprehension twice, once for files and once for directories.
But I can't think of a way to optimize it to only have to run once.
Can anyone suggest a way to sum the files, and directories at the same time in the new version? (I could fall back to the old code, if necessary)
But I might be over optimizing at this point?
Any feedback would be welcome.
def new_fs_counts(fs_entries) -> (int, int):
    """
    Quickly count the files vs directories in a list of scandir entries
    Used primarily by sync_database_disk to count a path's files & directories
    Parameters
    ----------
    fs_entries (list) - list of scandir entries
    Returns
    -------
    tuple - (# of files, # of dirs)
    """
    def counter(fs_entry):
        return (fs_entry.is_file(), not fs_entry.is_file())
    mapdata = list(map(counter, fs_entries.values()))
    files = sum(files for files, _ in mapdata)
    dirs = sum(dirs for _, dirs in mapdata)
    return (files, dirs)
vs
def old_fs_counts(fs_entries) -> (int, int):
    """
    Quickly count the files vs directories in a list of scandir entries
    Used primarily by sync_database_disk to count a path's files & directories
    Parameters
    ----------
    fs_entries (list) - list of scandir entries
    Returns
    -------
    tuple - (# of files, # of dirs)
    """
    files = 0
    dirs = 0
    for fs_item in fs_entries:
        is_file = fs_entries[fs_item].is_file()
        files += is_file
        dirs += not is_file
    return (files, dirs)
map is fast here if you map the is_file function directly:
files = sum(map(os.DirEntry.is_file, fs_entries.values()))
dirs = len(fs_entries) - files
(Something with filter might be even faster, at least if most entries aren't files. Or filter with is_dir if that works for you and most entries aren't directories. Or itertools.filterfalse with is_file. Or using itertools.compress. Also, counting True with list.count or operator.countOf instead of summing bools might be faster. But all of these ideas take more code (and some also memory). I'd prefer my above way.)
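As a rough sketch of the approach above, here is the direct map call wrapped in a function, next to the operator.countOf variant mentioned in the parenthetical. The function names fs_counts and fs_counts_countof are made up for illustration, and "dirs" means "everything that is not a file", as in the question:

```python
import operator
import os


def fs_counts(fs_entries: dict) -> tuple:
    """Return (files, dirs) for a dict of name -> os.DirEntry in one pass."""
    # is_file() returns a bool, and summing bools counts the True values
    files = sum(map(os.DirEntry.is_file, fs_entries.values()))
    return files, len(fs_entries) - files


def fs_counts_countof(fs_entries: dict) -> tuple:
    """Same result, counting True values with operator.countOf instead of sum."""
    files = operator.countOf(map(os.DirEntry.is_file, fs_entries.values()), True)
    return files, len(fs_entries) - files
```

Both versions call is_file exactly once per entry and derive the directory count by subtraction, so no second pass over the data is needed. Whether countOf actually beats sum would have to be measured on your own data.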
Okay, map is definitely not the right answer here.
This morning I got up and created a test using timeit...
and it was a bit of a splash of reality to the face.
Without optimizations, new vs old, the new map code was roughly 2x the time.
New : 0.023185124970041215
old : 0.011841499945148826
I really ended up falling for a bit of click bait, and thought that rewriting with MAP would gain some better efficiency.
For the sake of completeness.
from timeit import timeit
import os
new = '''
def counter(fs_entry):
    files = fs_entry.is_file()
    return (files, not files)
mapdata = list(map(counter, fs_entries.values()))
files = sum(files for files, _ in mapdata)
dirs = sum(dirs for _, dirs in mapdata)
#dirs = len(fs_entries)-files
'''
#dirs = sum(dirs for _, dirs in mapdata)
old = '''
files = 0
dirs = 0
for fs_item in fs_entries:
    is_file = fs_entries[fs_item].is_file()
    files += is_file
    dirs += not is_file
'''
fs_location = '/Volumes/4TB_Drive/gallery/albums/collection1'
fs_data = {}
for item in os.scandir(fs_location):
    fs_data[item.name] = item
print("New : ", timeit(stmt=new, number=1000, globals={'fs_entries':fs_data}))
print("old : ", timeit(stmt=old, number=1000, globals={'fs_entries':fs_data}))
And while I was able to close the gap with some optimizations (thank you Lee for your suggestion):
New : 0.10864979098550975
old : 0.08246175001841038
It is clear that the for loop solution is easier to read, faster, and just simpler.
The speed difference between new and old doesn't seem to be caused by map specifically.
The duplicate sum statement added .021, and the biggest slowdown came from the second fs_entry.is_file call, which added .06x to the timings...

How to duplicate file.jpg 1000 times at once in python?

I want to duplicate file 1000 times at once so I tried this code to create a file duplicator, but it only duplicates 1 file in the directory.
import shutil
src = r'D:\src\file.jpg'
dst = r'D:\dst\file.jpg'
for _ in range(5):
    shutil.copy(src, dst)
IIUC, you want to duplicate the file n times, where n is 5 in your code.
Copying to the same dst path on every iteration just overwrites one file, so each copy needs its own name:
import os
import shutil
src = r'D:\src\file.jpg'
# split the path into 'D:\src\file' and '.jpg' so the counter goes before the extension
root, ext = os.path.splitext(src)
# Change the 5 to the number of copies you want
for i in range(5):
    shutil.copy(src, root + str(i) + ext)
I changed the code so that each copy gets a different file name with an increasing counter.
EX:
file0.jpg
file1.jpg
file2.jpg
...
Edit: Thanks Timus for reminding me, I changed i to str(i) since i is an integer originally and you can't add it to the end of a string.
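If the copies should land in a separate destination folder, as the D:\dst path in the question suggests, a small helper along these lines might do. It is only a sketch: the function name duplicate and its parameters are hypothetical, and it assumes overwriting existing copies with the same names is acceptable:

```python
import shutil
from pathlib import Path


def duplicate(src: Path, dst_dir: Path, copies: int) -> list:
    """Copy src into dst_dir `copies` times as file0.jpg, file1.jpg, ..."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for i in range(copies):
        # stem is 'file', suffix is '.jpg', so the counter goes before the extension
        target = dst_dir / f"{src.stem}{i}{src.suffix}"
        shutil.copy(src, target)
        created.append(target)
    return created
```

With the paths from the question, duplicate(Path(r'D:\src\file.jpg'), Path(r'D:\dst'), 1000) would be the call for 1000 copies.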

How to extract jpg EXIF metadata from a folder in chronological order

I am currently writing a script to extract EXIF GPS data from a folder of jpg images. I am using os.scandir to extract the entries from the folder but from my understanding os.scandir opens the files in an arbitrary way. I need the images to be opened in chronological order by filename. Below is my current code which works as intended however it does not open the images in the correct order. The files within my image folder are named chronologically like so: "IMG_0097, IMG_0098" etc.
#!/usr/bin/python
import os, exif, folium
def convert_lat(coordinates, ref):
    latCoords = coordinates[0] + coordinates[1] / 60 + coordinates[2] / 3600
    if ref == 'W' or ref == 'S':
        latCoords = -latCoords
    return latCoords
coordList=[]
map = folium.Map(location=[51.50197125069916, -0.14000860301423912], zoom_start = 16)
from exif import Image
with os.scandir('gps/') as entries:
    try:
        for entry in entries:
            img_path = 'gps/'+entry.name
            with open (img_path, 'rb') as src:
                img = Image(src)
                if img.has_exif:
                    latCoords = (convert_lat(img.gps_latitude, img.gps_latitude_ref))
                    longCoords = (convert_lat(img.gps_longitude, img.gps_longitude_ref))
                    coord = [latCoords, longCoords]
                    coordList.append(coord)
                    folium.Marker(coord, popup=str(coord)).add_to(map)
                    folium.PolyLine(coordList, color =" red", weight=2.5, opacity=1).add_to(map)
                    print(img_path)
                    print(coord)
                else:
                    print (src.name,'has no EXIF information')
    except:
        print(img_path)
        print("error occured")
map.save(outfile='/home/jamesdean/Desktop/Python scripts/map.html')
print ("Map generated successfully")
I would say ditch os.scandir and take advantage of more modern features the standard library has to offer:
from pathlib import Path
from operator import attrgetter
# assuming there is a folder named "gps" in the current working directory...
for path in sorted(Path("gps").glob("*.jpg"), key=attrgetter("stem")):
    print(path) # do something with the current path
The from operator import attrgetter and key=attrgetter("stem") are a bit redundant, but I'm just being explicit about what attribute I would like to use for determining the sorted order. In this case, the "stem" attribute of a path refers to just the name of the file as a string. For example, if the current path has a filename (including extension) of "IMG_0097.jpg", then path.stem would be "IMG_0097". Like I said, the stem is a string, so your paths will be sorted in lexicographical order. You don't need to do any conversion to integers, because your filenames already include leading zeroes, so lexicographical ordering should work just fine.
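To illustrate why the leading zeroes matter, here is a small sketch comparing plain string sorting on padded and unpadded names. The unpadded names and the helper trailing_number are made up for the example; they are not from the question:

```python
import re

padded = ["IMG_0100", "IMG_0097", "IMG_0098"]
unpadded = ["IMG_100", "IMG_97", "IMG_98"]


def trailing_number(stem: str) -> int:
    """Return the number at the end of a stem, or -1 if there is none."""
    match = re.search(r"(\d+)$", stem)
    return int(match.group(1)) if match else -1


print(sorted(padded))                         # ['IMG_0097', 'IMG_0098', 'IMG_0100']
print(sorted(unpadded))                       # ['IMG_100', 'IMG_97', 'IMG_98']
print(sorted(unpadded, key=trailing_number))  # ['IMG_97', 'IMG_98', 'IMG_100']
```

With zero-padded names the string order and the numeric order agree, so no key function is needed. Without padding, '1' sorts before '9', and a numeric key like the one above becomes necessary.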
You can sort a list using the built-in sorted function. Paul made an interesting point: simply sorting the file names without any extra arguments works just as well:
a = ["IMG_0097.jpg", "IMG_0085.jpg", "IMG_0043.jpg", "IMG_0098.jpg", "IMG_0099.jpg", "IMG_0100.jpg"]
sorted_list = sorted(a)
print(sorted_list)
Output:
['IMG_0043.jpg', 'IMG_0085.jpg', 'IMG_0097.jpg', 'IMG_0098.jpg', 'IMG_0099.jpg', 'IMG_0100.jpg']
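In your case the entries are os.DirEntry objects, which cannot be compared directly, so sort them by name:
for entry in sorted(entries, key=lambda e: e.name):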

Looping over different python dictionaries - wrong results?

I am using Python 2.7, btw..
Let's say I have a couple directories that I want to create dictionaries for. The files in each of the directories are named YYYYMMDD.hhmmss and are all different, and the size of each directory is different:
path1 = '/path/to/folders/to/make/dictionaries'
dir1 = os.listdir(path1)
I also have another static directory that will have some files to compare
gpath1 = '/path/to/static/files'
gdir1 = os.listdir(gpath1)
dir1_file_list = [datetime.strptime(g, '%Y%m%d.%H%M%S') for g in gdir1]
So I have a static directory of files in gdir1, and I now want to loop through each directory in dir1 and create a unique dictionary. This is the code:
for i in range(0,len(dir1)):
    path2 = path1 + "/" + dir1[i]
    dir2 = os.listdir(path2)
    dir2_file_list = [datetime.strptime(r, '%Y%m%d.%H%M%S') for r in dir2]
    # Define a dictionary, and initialize comparisons
    dict_gr = []
    dict_gr = dict()
    for dir1_file in dir1_file_list:
        dict_gr[str(dir1_file)] = []
        # Look for instances within the last 5 minutes
        for dir2_file in dir2_file_list:
            if 0 <= (dir1_file - dir2_file).total_seconds() <= 300:
                dict_gr[str(dir1_file)].append(str(dir2_file))
    # Sort the dictionaries
    for key, value in sorted(dict_gr.iteritems()):
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))
The issue is that path2 and dir2 both properly go to the different folders and grab the necessary filenames, and creating dict_gr will all work well. However, when I go to the part of the script where I sort the dictionaries, the 2nd directory that has been looped over will contain the contents of the first directory. The 3rd looped dictionary will contain the contents of the 1st and 2nd, etc. In other words, they are not matching uniquely with each directory.
Any thoughts?
I overlooked that I was appending to dir2_lib and dir1_lib; these lists needed to be initialized, otherwise each directory's results pile on top of the previous ones.
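The effect can be reproduced with a small sketch, written in Python 3 and with made-up data. It assumes the fix was to create the list anew for each directory; collect_shared and collect_fresh are illustrative names:

```python
def collect_shared(groups: dict) -> dict:
    """Buggy pattern: one list created before the loop is reused for every group."""
    collected = []
    results = {}
    for name, items in groups.items():
        collected.extend(items)  # keeps growing across iterations
        results[name] = list(collected)
    return results


def collect_fresh(groups: dict) -> dict:
    """Fixed pattern: the list is initialized inside the loop, once per group."""
    results = {}
    for name, items in groups.items():
        collected = []
        collected.extend(items)
        results[name] = collected
    return results


groups = {"dir_a": ["file1"], "dir_b": ["file2"]}
print(collect_shared(groups))  # {'dir_a': ['file1'], 'dir_b': ['file1', 'file2']}
print(collect_fresh(groups))   # {'dir_a': ['file1'], 'dir_b': ['file2']}
```

The shared version shows the same symptom as in the question: the second group contains the contents of the first.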

sort images based on a cluster correspondances list

I have the following working code to sort images according to a cluster list which is a list of tuples: (image_id, cluster_id).
One image can only be in one and only one cluster (there is never the same image in two clusters for example).
I wonder if there is a way to shorten the "for+for+if+if" loops at the end of the code as yet, for each file name, I must check in every pairs in the cluster list, which makes it a little redundant.
import os
import re
import shutil
srcdir = '/home/username/pictures/' #
if not os.path.isdir(srcdir):
    print("Error, %s is not a valid directory!" % srcdir)
    return None
pts_cls # is the list of pairs (image_id, cluster_id)
filelist = [(srcdir+fn) for fn in os.listdir(srcdir) if
            re.search(r'\.jpg$', fn, re.IGNORECASE)]
filelist.sort(key=lambda var:[int(x) if x.isdigit() else
                              x for x in re.findall(r'[^0-9]|[0-9]+', var)])
for f in filelist:
    fbname = os.path.splitext(os.path.basename(f))[0]
    for e,cls in enumerate(pts_cls): # for each (img_id, clst_id) pair
        if str(cls[0])==fbname: # check if image_id corresponds to file basename on disk)
            if cls[1]==-1: # if cluster_id is -1 (->noise)
                outdir = srcdir+'cluster_'+'Noise'+'/'
            else:
                outdir = srcdir+'cluster_'+str(cls[1])+'/'
    if not os.path.isdir(outdir):
        os.makedirs(outdir)
    dstf = outdir+os.path.basename(f)
    if os.path.isfile(dstf)==False:
        shutil.copy2(f,dstf)
Of course, as I am pretty new to Python, any other well explained improvements are welcome!
I think you're complicating this far more than needed. Since your image names are unique (there can only be one image_id), you can safely convert pts_cls into a dict and have fast lookups on the spot instead of looping through the list of pairs each and every time. You are also using regex where it's not needed, and you're packing your paths only to unpack them later.
Also, your code would break if it happens that an image from your source directory is not in the pts_cls as its outdir would never be set (or worse, its outdir would be the one from the previous loop).
I'd streamline it like:
import os
import shutil
src_dir = "/home/username/pictures/"
if not os.path.isdir(src_dir):
    print("Error, %s is not a valid directory!" % src_dir)
    exit(1)  # return is expected only from functions
pts_cls = []  # is the list of pairs (image_id, cluster_id), load from wherever...
# convert your pts_cls into a dict - since there cannot be any images in multiple clusters
# base image name is perfectly ok to use as a key for blazingly fast lookups later
cluster_map = dict(pts_cls)
# get only `.jpg` files; store base name and file name, no need for a full path at this time
files = [(fn[:-4], fn) for fn in os.listdir(src_dir) if fn.lower()[-4:] == ".jpg"]
# no need for sorting based on your code
for name, file_name in files:  # loop through all files
    if name in cluster_map:  # proceed with the file only if in pts_cls
        cls = cluster_map[name]  # get our cluster value
        # get our `cluster_<cluster_id>` or `cluster_Noise` (if cluster == -1) target path
        target_dir = os.path.join(src_dir, "cluster_" + str(cls if cls != -1 else "Noise"))
        target_file = os.path.join(target_dir, file_name)  # get the final target path
        if not os.path.exists(target_file):  # if the target file doesn't exists
            if not os.path.isdir(target_dir):  # make sure our target path exists
                os.makedirs(target_dir, exist_ok=True)  # create a full path if it doesn't
            shutil.copy(os.path.join(src_dir, file_name), target_file)  # copy
UPDATE - If you have multiple 'special' folders for certain cluster IDs (like Noise is for -1) you can create a map like cluster_targets = {-1: "Noise"} where the keys are your cluster IDs and their values are, obviously, the special names. Then you can replace the target_dir generation with: target_dir = os.path.join(src_dir, "cluster_" + str(cluster_targets.get(cls,cls)))
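As a small sketch of that lookup, the helper target_dir_for is a hypothetical name introduced here for illustration, and cluster_targets holds only the Noise mapping from the text:

```python
import os

# special folder names for certain cluster IDs; any other ID keeps its own value
cluster_targets = {-1: "Noise"}


def target_dir_for(src_dir: str, cls) -> str:
    """Build the cluster_<name> directory path for a cluster ID."""
    return os.path.join(src_dir, "cluster_" + str(cluster_targets.get(cls, cls)))


print(os.path.basename(target_dir_for("pictures", -1)))  # cluster_Noise
print(os.path.basename(target_dir_for("pictures", 3)))   # cluster_3
```

The dict.get call falls back to the cluster ID itself when no special name is registered, so adding more special folders only means adding entries to cluster_targets.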
UPDATE #2 - Since your image_id values appear to be integers while filenames are strings, I'd suggest you to just build your cluster_map dict by converting your image_id parts to strings. That way you'd be comparing likes to likes without the danger of type mismatch:
cluster_map = {str(k): v for k, v in pts_cls}
If you're sure that none of the *.jpg files in your src_dir will have a non-integer in their name you can instead convert the filename into an integer to begin with in the files list generation - just replace fn[:-4] with int(fn[:-4]). But I wouldn't advise that as, again, you never know how your files might be named.
