I have the following code, and it never stops. It never moves on to the next condition once it has finished exporting the file. What am I doing wrong?
I am working on Python 3.x and Windows 10.
for maindir, subdirs, shpfiles in os.walk(by_numSegments):  # check in the whole folder
    if "poly1000numSeg" in maindir:  # check only in the input folder (segment_img)
        if "compactness_1" in maindir:
            for s, ishp in enumerate(shpfiles):
                input_list = list(filter(lambda mpoly: mpoly.endswith('.shp'), os.listdir(maindir)))
                # start with the first polygon; the remaining polygons are added in the loop below
                auto_inter = gpd.GeoDataFrame.from_file(os.path.join(maindir, input_list[0]))
                # add the rest of the polygons one by one
                for i in range(len(input_list)-1):
                    mp = gpd.GeoDataFrame.from_file(os.path.join(maindir, input_list[i+1]))
                    auto_inter = gpd.overlay(auto_inter, mp, how='intersection')
                # export
                auto_inter.to_file(os.path.join(src, "compactness_1/numSeg1000_c1.shp"))
        if "compactness10" in maindir:
            for s, ishp in enumerate(shpfiles):
                input_list = list(filter(lambda mpoly: mpoly.endswith('.shp'), os.listdir(maindir)))
                # start with the first polygon; the remaining polygons are added in the loop below
                auto_inter = gpd.GeoDataFrame.from_file(os.path.join(maindir, input_list[0]))
                # add the rest of the polygons one by one
                for i in range(len(input_list)-1):
                    mp = gpd.GeoDataFrame.from_file(os.path.join(maindir, input_list[i+1]))
                    auto_inter = gpd.overlay(auto_inter, mp, how='intersection')
                # export
                auto_inter.to_file(os.path.join(src, "compactness10/numSeg1000_c10.shp"))
I suspect src is inside the folder you are iterating over: you are adding files to the tree while iterating over its file list.
for maindir, subdirs, shpfiles in os.walk(by_numSegments):  # check in the whole folder
    if "poly1000numSeg" in maindir:  # check only in the input folder (segment_img)
        if "compactness_1" in maindir:
            for s, ishp in enumerate(shpfiles):
                input_list = list(filter(lambda mpoly: mpoly.endswith('.shp'), os.listdir(maindir)))  # get file list
                ......
                for i in range(len(input_list)-1):  # loop through list
                    .......
                auto_inter.to_file(os.path.join(src, "compactness_1/numSeg1000_c1.shp"))  # create new file
Try setting input_list before the loop:
for maindir, subdirs, shpfiles in os.walk(by_numSegments):  # check in the whole folder
    if "poly1000numSeg" in maindir:  # check only in the input folder (segment_img)
        if "compactness_1" in maindir:
            input_list = list(filter(lambda mpoly: mpoly.endswith('.shp'), os.listdir(maindir)))  # get file list
            for s, ishp in enumerate(shpfiles):
                ......
                for i in range(len(input_list)-1):  # loop through list
                    .......
                auto_inter.to_file(os.path.join(src, "compactness_1/numSeg1000_c1.shp"))  # create new file
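The effect is easy to reproduce with a small stdlib-only demo (the temporary directory and file names here are made up for illustration): a file written into a subdirectory that os.walk has not reached yet shows up later in the same walk, which is why exporting into the walked tree keeps feeding new iterations.

```python
import os
import tempfile

# build a tiny tree: root/a/x.shp exists, root/b is empty
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a"))
os.makedirs(os.path.join(root, "b"))
open(os.path.join(root, "a", "x.shp"), "w").close()

seen = []
for maindir, subdirs, files in os.walk(root):
    seen.extend(files)
    # simulate the export: write into a subdirectory the walk has not visited yet
    out = os.path.join(root, "b", "out.shp")
    if not os.path.exists(out):
        open(out, "w").close()

# "out.shp" was created mid-walk, yet it still shows up in `seen`
```

So besides computing input_list once, it is safest to write the exported files to a folder outside the tree passed to os.walk.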
I have nested for loops that make my operation incredibly slow, and I would like to know if there is another way to do this.
The operation goes through files in seven different directories and checks whether a file with the same name exists in every directory before opening each file and displaying them.
My code is:
original_images = os.listdir(original_folder)
ground_truth_images = os.listdir(ground_truth_folder)
randomforest_images = os.listdir(randomforest)
ilastik_images = os.listdir(ilastik)
kmeans_images = os.listdir(kmeans)
logreg_multi_images = os.listdir(logreg_multi)
random_forest_multi_images = os.listdir(randomforest_multi)
for x in original_images:
    for y in ground_truth_images:
        for z in randomforest_images:
            for i in ilastik_images:
                for j in kmeans_images:
                    for t in logreg_multi_images:
                        for w in random_forest_multi_images:
                            if x == y == z == i == j == w == t:
                                # *** rest of code operation ***
If the condition is that the same file must be present in all seven directories to run the rest of the code operation, then it's not necessary to search for the same file in all directories. As soon as the file is not in one of the directories, you can forget about it and move to the next file. So you can build a for loop looping through the files in the first directory and then build a chain of nested if statements: If the file exists in the next directory, you move forward to the directory after that and search there. If it doesn't, you move back to the first directory and pick the next file in it.
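The early-exit idea above can be sketched as a small helper (the function name and signature are mine, not from the original post): take the names from the first directory and keep only those that exist in every other directory, stopping at the first directory that lacks the file.

```python
import os

def common_files(dirs):
    # hypothetical helper: return names from the first directory that
    # exist in every other directory in `dirs`
    first, *rest = dirs
    result = []
    for name in os.listdir(first):
        # all() short-circuits: the first directory missing the file
        # stops the check, so we never search the remaining directories
        if all(os.path.exists(os.path.join(d, name)) for d in rest):
            result.append(name)
    return result
```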
Convert all of them to sets and take the intersection, so that only the names present in every directory remain:

original_images = os.listdir(original_folder)
ground_truth_images = os.listdir(ground_truth_folder)
randomforest_images = os.listdir(randomforest)
ilastik_images = os.listdir(ilastik)
kmeans_images = os.listdir(kmeans)
logreg_multi_images = os.listdir(logreg_multi)

# start from one directory's contents and intersect with all the others
files = set(os.listdir(randomforest_multi))
for folder in [original_images, ground_truth_images, randomforest_images, ilastik_images, kmeans_images, logreg_multi_images]:
    files.intersection_update(folder)

# `files` now holds only the names common to all seven directories
for file in files:
    # rest of code
The reason this works is that you are only interested in the intersection of all the sets, so you only need to compute it once and iterate over the result.
You should check x == y before entering the nested loop, then y == z, and so on. Right now you are running every inner loop far too often.
There is also another approach:
You can create a set of image names for each directory and intersect the sets, so that the only elements that remain are the names present everywhere. If you are sure that the files are the same, you can skip that step.
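That intersection step can be sketched as follows (the helper name is mine, for illustration only):

```python
import os

def images_in_all(folders):
    # one set of names per folder; intersecting them leaves only the
    # names present in every folder
    listings = [set(os.listdir(f)) for f in folders]
    return set.intersection(*listings)
```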
If x is in all of the other lists, you can create your paths on the fly:
import os
import pathlib

original_images = os.listdir(original_folder)
ground_truth_images = pathlib.Path(ground_truth_folder)  # this is a folder
randomforest_images = pathlib.Path(randomforest)
# ... one pathlib.Path per remaining folder

for x in original_images:
    y = ground_truth_images / x
    i = randomforest_images / x
    # and so on for all your folders
    # check that all the files exist; skip to the next x if any is missing
    if not all(f.exists() for f in [y, i]):  # add j, t, w here as well
        continue  # go to the next x
    # REST OF YOUR CODE USING x, y, i, ...
    # y, i, ... are now pathlib objects; get a path string with str(y), str(i), etc.
I'm trying to make a list of files in a given directory using recursion. I can build the list, but the order is not correct: I need the files at the top level of the directory to come first, followed by the files in subdirectories in lexicographical order.
Here is the code I have to do what I've discussed above.
import os

important = []

def search_directory(folder):
    hold = os.listdir(folder)
    for i in hold:
        test = os.path.join(folder, i)
        if os.path.isfile(test):
            if test not in important:
                important.append(test)
        else:
            search_directory(test)
    return important
Seems like you need BFS traversal of your directory tree.
import collections
import os

def extract_tree(root):
    q = collections.deque()
    q.append(root)
    tree = []
    while q:
        root = q.popleft()
        contents = sorted(os.listdir(root))
        for f in contents:
            path = os.path.join(root, f)
            if os.path.isfile(path):
                tree.append(path)
            else:
                q.append(path)
    return tree
Consider the following scenario when traversing a directory structure:
"Build the complete directory tree with files, but if the files in a single directory have similar names, list only a single entry."
An example tree (let's assume the entries are not sorted):
- rootDir
    - dirA
        fileA_01
        fileA_03
        fileA_05
        fileA_06
        fileA_04
        fileA_02
        fileA_...
        fileAB
        fileAC
    - dirB
        fileBA
        fileBB
        fileBC
Expected output:
- rootDir
    - dirA
        fileA_01 - fileA_06 ...
        fileAB
        fileAC
    - dirB
        fileBA
        fileBB
        fileBC
I already have a simple findSimilarNames function that, for fileA_01 (or any fileA_*), returns the list [fileA_01 ... fileA_06].
Now, inside os.walk, I loop over the files, and every file is checked against similar filenames. For fileA_03, for example, I get the rest of the group [fileA_01 - fileA_06], and I want to modify the list I iterate over so that the items returned by findSimilarNames are skipped, without another loop or ifs inside.
I searched here and people suggest avoiding modifying the list you are iterating over, but modifying it is exactly what would let me skip iterating over every file.
Pseudo code:
for root, dirs, files in os.walk(path):
    for file in files:
        similarList = findSimilarNames(file)
        # OVERWRITE ITERATION LIST SOMEHOW
        files = (set(files) - set(similarList))
        # DEAL WITH ELEMENT
What I'm trying to avoid is the version below, which checks each file even when it may already have been found by findSimilarNames:
for root, dirs, files in os.walk(path):
    filteredbysimilar = files[:]
    for file in files:
        similar = findSimilarNames(file)
        filteredbysimilar = list(set(filteredbysimilar) - set(similar))
    # --
    for filteredFile in filteredbysimilar:
        # DEAL WITH ELEMENT
        # OVERWRITE ITERATION LIST SOMEHOW
You can get this effect by using a while-loop style iteration. Since you want to do set subtraction to remove the similar groups anyway, the natural approach is to start with a set of all the filenames, and repeatedly remove groups until nothing is left. Thus:
unprocessed = set(files)
while unprocessed:
    f = unprocessed.pop()  # removes and returns an arbitrary element
    group = findSimilarNames(f)
    unprocessed -= group  # it is not an error that `f` has already been removed
    doSomethingWith(group)  # i.e., "DEAL WITH ELEMENT" :)
How about building up a list of files that aren't similar?
unsimilar = set()
for f in files:
    if len(findSimilarNames(f).intersection(unsimilar)) == 0:
        unsimilar.add(f)
This assumes findSimilarNames yields a set.
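A tiny self-contained demo of that loop, with a stub findSimilarNames (an assumption for illustration: names sharing the prefix before "_" count as similar):

```python
files = ["fileA_01", "fileA_02", "fileB_01", "fileC_01"]

def findSimilarNames(name):
    # stub: everything with the same prefix before "_" is "similar"
    prefix = name.split("_")[0]
    return {n for n in files if n.split("_")[0] == prefix}

unsimilar = set()
for f in files:
    # keep f only if nothing from its similarity group was kept already
    if len(findSimilarNames(f).intersection(unsimilar)) == 0:
        unsimilar.add(f)

# exactly one representative per similarity group survives
```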
I am using Python 2.7, by the way.
Let's say I have a couple of directories that I want to create dictionaries for. The files in each directory are named YYYYMMDD.hhmmss, are all different, and the size of each directory is different:
path1 = "/path/to/folders/to/make/dictionaries"
dir1 = os.listdir(path1)
I also have another static directory that will have some files to compare
gpath1 = "/path/to/static/files"
gdir1 = os.listdir(gpath1)
dir1_file_list = [datetime.strptime(g, '%Y%m%d.%H%M%S') for g in gdir1]
So I have a static directory of files in gdir1, and I now want to loop through each directory in dir1 and create a unique dictionary. This is the code:
for i in range(0, len(dir1)):
    path2 = path1 + "/" + dir1[i]
    dir2 = os.listdir(path2)
    dir2_file_list = [datetime.strptime(r, '%Y%m%d.%H%M%S') for r in dir2]

    # Define a dictionary, and initialize comparisons
    dict_gr = []
    dict_gr = dict()
    for dir1_file in dir1_file_list:
        dict_gr[str(dir1_file)] = []
        # Look for instances within the last 5 minutes
        for dir2_file in dir2_file_list:
            if 0 <= (dir1_file - dir2_file).total_seconds() <= 300:
                dict_gr[str(dir1_file)].append(str(dir2_file))

    # Sort the dictionaries
    for key, value in sorted(dict_gr.iteritems()):
        dir2_lib.append(key)
        dir1_lib.append(sorted(value))
The issue is that path2 and dir2 both properly go to the different folders and grab the necessary filenames, and creating dict_gr will all work well. However, when I go to the part of the script where I sort the dictionaries, the 2nd directory that has been looped over will contain the contents of the first directory. The 3rd looped dictionary will contain the contents of the 1st and 2nd, etc. In other words, they are not matching uniquely with each directory.
Any thoughts?
I had overlooked the appends to dir2_lib and dir1_lib; these needed to be re-initialized inside the loop.
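A generic sketch of that pattern (the names here are placeholders, not the question's variables): re-initialize the accumulators inside the outer loop, otherwise every iteration keeps appending to the previous directory's results.

```python
data = {"dirA": ["a"], "dirB": ["b"]}

results = {}
for name, items in data.items():
    keys = []  # without these two lines inside the loop,
    vals = []  # keys/vals would accumulate across iterations
    for item in items:
        keys.append(item)
        vals.append(item.upper())
    results[name] = (keys, vals)
```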
I have the following working code that sorts images according to a cluster list, which is a list of tuples (image_id, cluster_id).
An image can only be in one cluster (the same image never appears in two clusters, for example).
I wonder if there is a way to shorten the for+for+if+if loops at the end of the code, since for each file name I must check every pair in the cluster list, which makes it a little redundant.
import os
import re
import shutil

srcdir = '/home/username/pictures/'
if not os.path.isdir(srcdir):
    print("Error, %s is not a valid directory!" % srcdir)
    return None

pts_cls  # is the list of pairs (image_id, cluster_id)

filelist = [(srcdir + fn) for fn in os.listdir(srcdir) if
            re.search(r'\.jpg$', fn, re.IGNORECASE)]
filelist.sort(key=lambda var: [int(x) if x.isdigit() else
                               x for x in re.findall(r'[^0-9]|[0-9]+', var)])

for f in filelist:
    fbname = os.path.splitext(os.path.basename(f))[0]
    for e, cls in enumerate(pts_cls):  # for each (img_id, clst_id) pair
        if str(cls[0]) == fbname:  # check if image_id corresponds to the file basename on disk
            if cls[1] == -1:  # if cluster_id is -1 (-> noise)
                outdir = srcdir + 'cluster_' + 'Noise' + '/'
            else:
                outdir = srcdir + 'cluster_' + str(cls[1]) + '/'
            if not os.path.isdir(outdir):
                os.makedirs(outdir)
            dstf = outdir + os.path.basename(f)
            if os.path.isfile(dstf) == False:
                shutil.copy2(f, dstf)
Of course, as I am pretty new to Python, any other well explained improvements are welcome!
I think you're complicating this far more than needed. Since your image names are unique (there can only be one image_id), you can safely convert pts_cls into a dict and get fast lookups on the spot instead of looping through the list of pairs each and every time. You are also using a regex where it's not needed, and you're packing your paths only to unpack them later.
Also, your code would break if it happens that an image from your source directory is not in the pts_cls as its outdir would never be set (or worse, its outdir would be the one from the previous loop).
I'd streamline it like:
import os
import shutil

src_dir = "/home/username/pictures/"
if not os.path.isdir(src_dir):
    print("Error, %s is not a valid directory!" % src_dir)
    exit(1)  # return is expected only from functions

pts_cls = []  # the list of pairs (image_id, cluster_id), load from wherever...

# convert pts_cls into a dict - since no image can be in multiple clusters,
# the base image name is perfectly fine as a key for blazingly fast lookups later
cluster_map = dict(pts_cls)

# get only `.jpg` files; store base name and file name, no need for a full path at this time
files = [(fn[:-4], fn) for fn in os.listdir(src_dir) if fn.lower()[-4:] == ".jpg"]

# no need for sorting based on your code

for name, file_name in files:  # loop through all files
    if name in cluster_map:  # proceed with the file only if it is in pts_cls
        cls = cluster_map[name]  # get our cluster value
        # get our `cluster_<cluster_id>` or `cluster_Noise` (if cluster == -1) target path
        target_dir = os.path.join(src_dir, "cluster_" + str(cls if cls != -1 else "Noise"))
        target_file = os.path.join(target_dir, file_name)  # get the final target path
        if not os.path.exists(target_file):  # if the target file doesn't exist
            if not os.path.isdir(target_dir):  # make sure our target path exists
                os.makedirs(target_dir, exist_ok=True)  # create the full path if it doesn't
            shutil.copy(os.path.join(src_dir, file_name), target_file)  # copy
UPDATE - If you have multiple 'special' folders for certain cluster IDs (like Noise is for -1), you can create a map such as cluster_targets = {-1: "Noise"}, where the keys are your cluster IDs and the values are, obviously, the special names. Then you can replace the target_dir generation with: target_dir = os.path.join(src_dir, "cluster_" + str(cluster_targets.get(cls, cls)))
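As a quick sketch of that map-based variant (the helper function is mine, for illustration):

```python
# hypothetical mapping of 'special' cluster IDs to folder names
cluster_targets = {-1: "Noise"}

def folder_for(cls):
    # dict.get falls back to the numeric ID itself when no special name exists
    return "cluster_" + str(cluster_targets.get(cls, cls))
```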
UPDATE #2 - Since your image_id values appear to be integers while the filenames are strings, I'd suggest building your cluster_map dict by converting the image_id parts to strings. That way you'd be comparing like with like, without the danger of a type mismatch:
cluster_map = {str(k): v for k, v in pts_cls}
If you're sure that none of the *.jpg files in your src_dir will have a non-integer in their name you can instead convert the filename into an integer to begin with in the files list generation - just replace fn[:-4] with int(fn[:-4]). But I wouldn't advise that as, again, you never know how your files might be named.