I am using Python 2.7, by the way.
Let's say I have a couple of directories that I want to create dictionaries for. The files in each directory are named YYYYMMDD.hhmmss and are all different, and the size of each directory differs:
path1 = '/path/to/folders/to/make/dictionaries'
dir1 = os.listdir(path1)
I also have another static directory that will have some files to compare:
gpath1 = '/path/to/static/files'
gdir1 = os.listdir(gpath1)
dir1_file_list = [datetime.strptime(g, '%Y%m%d.%H%M%S') for g in gdir1]
So I have a static directory of files in gdir1, and I now want to loop through each directory in dir1 and create a unique dictionary. This is the code:
for i in range(0, len(dir1)):
    path2 = path1 + "/" + dir1[i]
    dir2 = os.listdir(path2)
    dir2_file_list = [datetime.strptime(r, '%Y%m%d.%H%M%S') for r in dir2]

    # Define a dictionary, and initialize comparisons
    dict_gr = dict()
    for dir1_file in dir1_file_list:
        dict_gr[str(dir1_file)] = []
        # Look for instances within the last 5 minutes
        for dir2_file in dir2_file_list:
            if 0 <= (dir1_file - dir2_file).total_seconds() <= 300:
                dict_gr[str(dir1_file)].append(str(dir2_file))

    # Sort the dictionaries
    for key, value in sorted(dict_gr.iteritems()):
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))
The issue: path2 and dir2 both correctly point into the different folders and pick up the right filenames, and building dict_gr works fine. However, when I get to the part of the script where I sort the dictionaries, the 2nd directory that has been looped over contains the contents of the 1st. The 3rd looped dictionary contains the contents of the 1st and 2nd, and so on. In other words, the results are not unique per directory.
Any thoughts?
I had overlooked appending to dir2_lib and dir1_lib; these needed to be initialized inside the loop as well.
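For reference, a minimal, self-contained sketch of that fix (the datetimes below are hypothetical stand-ins for the parsed filenames): the per-directory lists are created inside the outer loop, so no directory inherits the previous one's results. It uses Python 3's dict.items(); under 2.7 that would be iteritems().

```python
from datetime import datetime

dir1_file_list = [datetime(2021, 1, 1, 12, 5, 0)]   # stand-in for the parsed gdir1 names
per_dir_results = []                                # one (keys, values) pair per scanned directory

for dir2_file_list in ([datetime(2021, 1, 1, 12, 1, 0)],
                       [datetime(2021, 1, 1, 12, 4, 0)]):
    dir2_lib = []        # re-initialized every iteration, so nothing carries over
    dir1_lib = []
    dict_gr = dict()
    for dir1_file in dir1_file_list:
        dict_gr[str(dir1_file)] = []
        # Look for instances within the last 5 minutes
        for dir2_file in dir2_file_list:
            if 0 <= (dir1_file - dir2_file).total_seconds() <= 300:
                dict_gr[str(dir1_file)].append(str(dir2_file))
    for key, value in sorted(dict_gr.items()):   # .iteritems() under Python 2.7
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))
    per_dir_results.append((dir2_lib, dir1_lib))
```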
I have a scanner that creates a folder of images named like this:
A1.jpg A2.jpg A3.jpg...A24.jpg -> B1.jpg B2.jpg B3.jpg...B24.jpg
There are 16 rows and 24 images per letter row, i.e. A1 to P24, 384 images total.
I would like to rename them by reversing the order: the first file should take the name of the last and vice versa. Consider "first" to be A1 (which is also the first created during scanning).
The closest example I can find is in shell but that is not really what I want:
for i in {1..50}; do
    mv "$i.txt" "renamed/$(( 50 - $i + 1 )).txt"
done
Perhaps I need to save the filenames into a list (natsort maybe?) then use those names somehow?
I also thought I could use the image creation time, as the scanner always creates the files in the same order with the same names. That said, such a solution may not be as useful for others with the same challenge.
What is a sensible approach to this problem?
I don't know if this is the optimal way of doing it, but here it is:
import os

folder_name = "test"
new_folder_name = folder_name + "_new"
file_names = sorted(os.listdir(folder_name))  # listdir order is arbitrary, so sort first
file_names_new = file_names[::-1]
print(file_names)
print(file_names_new)
os.mkdir(new_folder_name)
for name, new_name in zip(file_names, file_names_new):
    os.rename(folder_name + "/" + name, new_folder_name + "/" + new_name)
os.rmdir(folder_name)
os.rename(new_folder_name, folder_name)
This assumes that you have files saved in the directory "test"
I would store the original list. Then rename all files in the same order (e.g. 1.jpg, 2.jpg etc.). Then I'd rename all of those files into the reverse of the original list.
In that way you will not encounter duplicate file names during the renaming.
You can make use of the pathlib functions rename and iterdir for this. I think it's straightforward how to put that together.
A solution based on the shutil package (the os package sometimes has permission problems), done "in place" so as not to waste memory if the folder is huge:
import wizzi_utils as wu
import os

def reverse_names(dir_path: str, temp_file_suffix: str = '.temp_unique_suffix') -> None:
    """
    "in place" solution:
    go over the list from both directions and swap names.
    A swap needs a temp variable, so first move file_a to the target name plus 'temp_file_suffix'.
    """
    files_full_paths = wu.find_files_in_folder(dir_path=dir_path, file_suffix='', ack=True, tabs=0)
    files_num = len(files_full_paths)
    for i in range(files_num):  # works for even and odd files_num
        j = files_num - i - 1
        if i >= j:  # crossed the middle - done
            break
        file_a, file_b = files_full_paths[i], files_full_paths[j]
        print('replacing {}(idx in dir {}) with {}(idx in dir {}):'.format(
            os.path.basename(file_a), i, os.path.basename(file_b), j))
        temp_file_name = '{}{}'.format(file_b, temp_file_suffix)
        wu.move_file(file_src=file_a, file_dst=temp_file_name, ack=True, tabs=1)
        wu.move_file(file_src=file_b, file_dst=file_a, ack=True, tabs=1)
        wu.move_file(file_src=temp_file_name, file_dst=file_b, ack=True, tabs=1)
    return

def main():
    reverse_names(dir_path='./scanner_files', temp_file_suffix='.temp_unique_suffix')
    return

if __name__ == '__main__':
    main()
found 6 files that ends with in folder "D:\workspace\2021wizzi_utils\temp\StackOverFlow\scanner_files":
['A1.jpg', 'A2.jpg', 'A3.jpg', 'B1.jpg', 'B2.jpg', 'B3.jpg']
replacing A1.jpg(idx in dir 0) with B3.jpg(idx in dir 5):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A1.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A1.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B3.jpg(0B)
replacing A2.jpg(idx in dir 1) with B2.jpg(idx in dir 4):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A2.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A2.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B2.jpg(0B)
replacing A3.jpg(idx in dir 2) with B1.jpg(idx in dir 3):
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A3.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg.temp_unique_suffix(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/A3.jpg(0B)
D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg.temp_unique_suffix Moved to D:/workspace/2021wizzi_utils/temp/StackOverFlow/scanner_files/B1.jpg(0B)
My file structure looks like this:
- Outer folder
- Inner folder 1
- Files...
- Inner folder 2
- Files...
- …
I'm trying to count the total number of files in the whole of Outer folder. os.walk doesn't return any files when I pass it the Outer folder, and as I've only got two layers I've written it manually:
total = 0
folders = [name for name in os.listdir(Outer_folder)
           if os.path.isdir(os.path.join(Outer_folder, name))]
for folder in folders:
    contents = os.listdir(os.path.join(Outer_folder, folder))
    total += len(contents)
print(total)
Is there a better way to do this? And can I find the number of files in an arbitrarily nested set of folders? I can't see any examples of deeply nested folders on Stack Overflow.
By 'better', I mean some kind of built in function, rather than manually writing something to iterate - e.g. an os.walk that walks the whole tree.
Use pathlib:
"Return total number of files in directory and subdirectories" shows how to get just the total number.
pathlib is part of the standard library, and should be used instead of os because it treats paths as objects with methods, not as strings to be sliced. See also: Python 3's pathlib Module: Taming the File System.
Use a condition to select only files:
[x.parent for x in f if x.is_file()]
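For the total file count alone, that same generator can feed sum() directly; a tiny sketch with a throwaway tree (the folder and file names are placeholders):

```python
from pathlib import Path
import tempfile

root = Path(tempfile.mkdtemp())             # stand-in for the Outer folder
(root / "inner1").mkdir()
(root / "inner2").mkdir()
(root / "inner1" / "a.txt").touch()
(root / "inner2" / "b.txt").touch()
(root / "inner2" / "c.txt").touch()

# rglob('*') yields files and directories; is_file() keeps only the files
total = sum(1 for x in root.rglob('*') if x.is_file())
print(total)  # 3
```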
File and subdirectory count in each directory:
from pathlib import Path
import numpy as np
p = Path.cwd() # if you're running in the current dir
# p = Path('path to dir') # otherwise, specify a path
# creates a generator of all the files matching the pattern
f = p.rglob('*')
# optionally, use list(...) to unpack the generator
# f = list(p.rglob('*'))
# counts them
paths, counts = np.unique([x.parent for x in f], return_counts=True)
path_counts = list(zip(paths, counts))
Output:
List of tuples with path and count
[(WindowsPath('E:/PythonProjects/stack_overflow'), 8),
(WindowsPath('E:/PythonProjects/stack_overflow/.ipynb_checkpoints'), 7),
(WindowsPath('E:/PythonProjects/stack_overflow/complete_solutions/data'), 6),
(WindowsPath('E:/PythonProjects/stack_overflow/csv_files'), 3),
(WindowsPath('E:/PythonProjects/stack_overflow/csv_files/.ipynb_checkpoints'), 1),
(WindowsPath('E:/PythonProjects/stack_overflow/data'), 5)]
f = list(p.rglob('*')) unpacks the generator and produces a list of all the files.
One-liner:
Use Path.cwd().rglob('*') or Path('some path').rglob('*')
path_counts = list(zip(*np.unique([x.parent for x in Path.cwd().rglob('*')], return_counts=True)))
I suggest using recursion, as in the function below:
def get_folder_count(path):
    folders = os.listdir(path)
    folders = list(filter(lambda a: os.path.isdir(os.path.join(path, a)), folders))
    count = len(folders)
    for i in range(count):
        count += get_folder_count(os.path.join(path, folders[i]))
    return count
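If the goal is the total number of files rather than folders, os.walk already walks the whole tree, so a single sum over it does the job; the directory and file names below are placeholders:

```python
import os
import tempfile

outer = tempfile.mkdtemp()                  # stand-in for Outer_folder
os.makedirs(os.path.join(outer, "inner1"))
os.makedirs(os.path.join(outer, "inner2", "deeper"))
for rel in ("inner1/a.txt", "inner2/b.txt", "inner2/deeper/c.txt"):
    open(os.path.join(outer, *rel.split("/")), "w").close()

# os.walk visits every directory; summing len(files) counts files at any depth
total = sum(len(files) for _root, _dirs, files in os.walk(outer))
print(total)  # 3
```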
Consider the following scenario of traversing a directory structure:
"Build the complete dir tree with files, but if the files in a single dir are similar in name, list only a single entity."
Example tree (let's assume the entries are not sorted):
- rootDir
  - dirA
      fileA_01
      fileA_03
      fileA_05
      fileA_06
      fileA_04
      fileA_02
      fileA_...
      fileAB
      fileAC
  - dirB
      fileBA
      fileBB
      fileBC
Expected output:
- rootDir
  - dirA
      fileA_01 - fileA_06 ...
      fileAB
      fileAC
  - dirB
      fileBA
      fileBB
      fileBC
So I already have a simple def findSimilarNames that, for fileA_01 (or any fileA_*), returns the list [fileA_01 ... fileA_06].
Now, inside os.walk, I loop over the files, and every file is checked against similar filenames. E.g. for fileA_03 I get the rest of them, [fileA_01 - fileA_06], and I want to modify the list I iterate over so that the items from findSimilarNames are skipped, without another loop or ifs inside.
I've searched here and people suggest avoiding modification of the list you iterate over, but doing so would let me avoid iterating over every file.
Pseudo code:
for root, dirs, files in os.walk(path):
    for file in files:
        similarList = findSimilarNames(file)
        # OVERWRITE ITERATION LIST SOMEHOW
        files = set(files) - set(similarList)
        # DEAL WITH ELEMENT
What I'm trying to avoid is below - checking each file because maybe it's already found by findSimilarNames.
for root, dirs, files in os.walk(path):
    filteredbysimilar = files[:]
    for file in files:
        similar = findSimilarNames(file)
        filteredbysimilar = list(set(filteredbysimilar) - set(similar))
    # --
    for filteredFile in filteredbysimilar:
        # DEAL WITH ELEMENT
        # OVERWRITE ITERATION LIST SOMEHOW
        pass
You can get this effect by using a while-loop style iteration. Since you want to do set subtraction to remove the similar groups anyway, the natural approach is to start with a set of all the filenames, and repeatedly remove groups until nothing is left. Thus:
unprocessed = set(files)
while unprocessed:
    f = unprocessed.pop()   # removes and returns an arbitrary element
    group = findSimilarNames(f)
    unprocessed -= group    # it is not an error that `f` has already been removed
    doSomethingWith(group)  # i.e., "DEAL WITH ELEMENT" :)
How about building up a list of files that aren't similar?
unsimilar = set()
for f in files:
    if len(findSimilarNames(f).intersection(unsimilar)) == 0:
        unsimilar.add(f)
This assumes findSimilarNames yields a set.
Hey guys, I'm a rookie in Python and need some help.
My problem is that I have a folder full of text files (with lists in them), where two files belong to each other and need to be read and compared.
Folder with many files: File1_in.xlo, File1_out.xlo, File2_in.xlo, File2_out.xlo, ...
--> so File1_in.xlo and File1_out.xlo belong together and need to be compared.
I can already append the lists of the in-files (or out-files) and then compare them, but since there are many files, the lists become really long (thousands and thousands of entries), so the idea is to compare the files, or rather the lists, pairwise.
My first try looks like:
import os
for filename in sorted(os.listdir('path')):
    if filename.endswith('in.xlo'):
        with open(os.path.join('path', filename)) as inn:
            lines = inn.readlines()
            for x in lines:
                temperatureIn = x.split()[4]
    if filename.endswith('out.xlo'):
        with open(os.path.join('path', filename)) as outt:
            lines = outt.readlines()
            for x in lines:
                temperatureOut = x.split()[4]  # 5th column (index 4) in the list
So the problem is, as you can see, that the temperatureIn values are always overwritten before I can compare them with the temperatureOut values. I think/hope there must be a way to open both files at once and compare the list entries.
I hope you can understand my problem and someone can help me.
Thanks
Use zip to access the in-files and out-files in pairs:
files = sorted(os.listdir('path'))
in_files = [fname for fname in files if fname.endswith('in.xlo')]
out_files = [fname for fname in files if fname.endswith('out.xlo')]

for in_file, out_file in zip(in_files, out_files):
    with open(os.path.join('path', in_file)) as inn, open(os.path.join('path', out_file)) as outt:
        pass  # Do whatever you want
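As a sketch of what that loop body could look like for the temperatures (the sample files and the 5th-column layout are assumptions based on the question):

```python
import os
import tempfile

path = tempfile.mkdtemp()                   # stand-in for 'path'
with open(os.path.join(path, "File1_in.xlo"), "w") as fh:
    fh.write("a b c d 20.5\n")
with open(os.path.join(path, "File1_out.xlo"), "w") as fh:
    fh.write("a b c d 21.0\n")

pairs = []
with open(os.path.join(path, "File1_in.xlo")) as inn, \
        open(os.path.join(path, "File1_out.xlo")) as outt:
    for line_in, line_out in zip(inn, outt):     # walk both files line by line
        t_in = float(line_in.split()[4])         # 5th column of the in-file
        t_out = float(line_out.split()[4])       # 5th column of the out-file
        pairs.append((t_in, t_out))
print(pairs)  # [(20.5, 21.0)]
```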
Add them to a list created just before your for loop:
temps_in = []
for x in lines:
    temperatureIn = x.split()[4]
    temps_in.append(temperatureIn)
Do the same thing for the temperatures out, then compare your two lists.
I have the following working code to sort images according to a cluster list which is a list of tuples: (image_id, cluster_id).
One image can only be in one and only one cluster (there is never the same image in two clusters for example).
I wonder if there is a way to shorten the for+for+if+if loops at the end of the code, as currently, for each file name, I must check every pair in the cluster list, which is a little redundant.
import os
import re
import shutil

srcdir = '/home/username/pictures/'
if not os.path.isdir(srcdir):
    print("Error, %s is not a valid directory!" % srcdir)
    return None

pts_cls  # is the list of pairs (image_id, cluster_id)

filelist = [(srcdir + fn) for fn in os.listdir(srcdir) if
            re.search(r'\.jpg$', fn, re.IGNORECASE)]
filelist.sort(key=lambda var: [int(x) if x.isdigit() else x
                               for x in re.findall(r'[^0-9]|[0-9]+', var)])

for f in filelist:
    fbname = os.path.splitext(os.path.basename(f))[0]
    for e, cls in enumerate(pts_cls):  # for each (img_id, clst_id) pair
        if str(cls[0]) == fbname:  # check if image_id corresponds to the file basename on disk
            if cls[1] == -1:  # if cluster_id is -1 (-> noise)
                outdir = srcdir + 'cluster_' + 'Noise' + '/'
            else:
                outdir = srcdir + 'cluster_' + str(cls[1]) + '/'
            if not os.path.isdir(outdir):
                os.makedirs(outdir)
            dstf = outdir + os.path.basename(f)
            if not os.path.isfile(dstf):
                shutil.copy2(f, dstf)
Of course, as I am pretty new to Python, any other well explained improvements are welcome!
I think you're complicating this far more than needed. Since your image names are unique (there can be only one image_id), you can safely convert pts_cls into a dict and have fast lookups on the spot instead of looping through the list of pairs every time. You are also using regex where it's not needed, and you're packing your paths only to unpack them later.
Also, your code would break if an image from your source directory is not in pts_cls, as its outdir would never be set (or worse, its outdir would be the one from the previous loop iteration).
I'd streamline it like:
import os
import shutil

src_dir = "/home/username/pictures/"

if not os.path.isdir(src_dir):
    print("Error, %s is not a valid directory!" % src_dir)
    exit(1)  # return is expected only from functions

pts_cls = []  # the list of pairs (image_id, cluster_id), load from wherever...

# convert your pts_cls into a dict - since there cannot be any images in multiple clusters,
# the base image name is perfectly ok to use as a key for blazingly fast lookups later
cluster_map = dict(pts_cls)

# get only `.jpg` files; store base name and file name, no need for a full path at this time
files = [(fn[:-4], fn) for fn in os.listdir(src_dir) if fn.lower()[-4:] == ".jpg"]

# no need for sorting based on your code
for name, file_name in files:  # loop through all files
    if name in cluster_map:  # proceed with the file only if in pts_cls
        cls = cluster_map[name]  # get our cluster value
        # get our `cluster_<cluster_id>` or `cluster_Noise` (if cluster == -1) target path
        target_dir = os.path.join(src_dir, "cluster_" + str(cls if cls != -1 else "Noise"))
        target_file = os.path.join(target_dir, file_name)  # get the final target path
        if not os.path.exists(target_file):  # if the target file doesn't exist
            if not os.path.isdir(target_dir):  # make sure our target path exists
                os.makedirs(target_dir, exist_ok=True)  # create the full path if it doesn't
            shutil.copy(os.path.join(src_dir, file_name), target_file)  # copy
UPDATE - If you have multiple 'special' folders for certain cluster IDs (like Noise is for -1), you can create a map like cluster_targets = {-1: "Noise"}, where the keys are your cluster IDs and the values are the special names. Then you can replace the target_dir generation with: target_dir = os.path.join(src_dir, "cluster_" + str(cluster_targets.get(cls, cls)))
UPDATE #2 - Since your image_id values appear to be integers while the filenames are strings, I'd suggest you build your cluster_map dict by converting the image_id parts to strings. That way you compare like with like, without the danger of a type mismatch:
cluster_map = {str(k): v for k, v in pts_cls}
If you're sure that none of the *.jpg files in your src_dir will have a non-integer name, you can instead convert the filename to an integer to begin with in the files list generation - just replace fn[:-4] with int(fn[:-4]). But I wouldn't advise that as, again, you never know how your files might be named.