I want to write a Python script that randomly keeps only some of the images in multiple folders.
I am new to Python and have been trying to find a solution, but I have not found a good starting point yet.
I would appreciate it if anyone could help me. Thank you.
This might help you. It first retrieves the list of all directories, then removes random files from each until only n remain. Note: path_to_all_images_folder has to be defined beforehand.
import os
import random

def keep_n_dir(directory, n):
    files = os.listdir(directory)  # retrieve the list of file names
    if len(files) > n:  # if you already have n or fewer files, do nothing
        diff = len(files) - n
        files_to_delete = random.sample(files, k=diff)  # randomly sample the files to delete
        for file in files_to_delete:
            os.remove(os.path.join(directory, file))  # delete the extra files

directories = os.listdir(path_to_all_images_folder)
directories = [os.path.join(path_to_all_images_folder, folder) for folder in directories]
for directory in directories:
    if os.path.isdir(directory):
        keep_n_dir(directory, n)
Attention! This code deletes the other files from the directory; it keeps only n of them.
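If you want to try the deletion logic safely first, here is a self-contained sketch that builds a throwaway folder of dummy files with tempfile and trims it down to n; the folder and file names are made up for the demo:

```python
import os
import random
import tempfile

def keep_n_dir(directory, n):
    """Randomly delete files until at most n remain in 'directory'."""
    files = os.listdir(directory)
    if len(files) > n:
        for name in random.sample(files, k=len(files) - n):
            os.remove(os.path.join(directory, name))

# build a throwaway folder with 10 dummy "images"
demo_dir = tempfile.mkdtemp()
for i in range(10):
    open(os.path.join(demo_dir, 'img_{:02d}.png'.format(i)), 'w').close()

keep_n_dir(demo_dir, 3)
print(len(os.listdir(demo_dir)))  # 3
```

Once that behaves as expected, you can point the loop from the answer at your real image folders.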
I have a folder containing subfolders and I'd like to check if these subfolders contain files or not.
Output should look like
C:...\2001 contains files
C:...\2002 contains no files
and so on.
So far I have only found advice on how to delete empty folders, but I want to keep them, since they might be filled later.
(I started coding 2 months ago but the people at my company think I'm some kind of IT wizard so I'd be very thankful if someone could help.)
This would work:
from os import listdir

path = "C:.../"
subfolders = listdir(path)
folders_with_files = []
for subfolder in subfolders:
    if listdir(path + subfolder) != []:
        folders_with_files.append(subfolder)
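If you also want the "contains files / contains no files" output from the question, a small sketch like the following could produce it; it demos on a throwaway tree built with tempfile instead of a real drive path, and the helper name is made up:

```python
import os
import tempfile

def report(path):
    """Return one 'contains files' / 'contains no files' line per subfolder."""
    lines = []
    for subfolder in sorted(os.listdir(path)):
        full = os.path.join(path, subfolder)
        if not os.path.isdir(full):
            continue
        status = 'contains files' if os.listdir(full) else 'contains no files'
        lines.append('{} {}'.format(full, status))
    return lines

# demo on a throwaway tree: 2001 has a file, 2002 is empty
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, '2001'))
os.mkdir(os.path.join(root, '2002'))
open(os.path.join(root, '2001', 'a.txt'), 'w').close()

for line in report(root):
    print(line)
```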
As a beginner in Python I need your help, since I do not yet know enough to write such a script. To give you an overall idea: I have a folder Folder_1 that contains 50,000 different frames from a video, in .png:
Folder_1:
    picture_00001
    picture_00002
    picture_00003
    picture_00004
    ...
    picture_50000
Since Windows Explorer does not run well with this huge number of pictures, I need to move them into different folders, to lower my RAM consumption and let me work on one batch at a time without loading all 50,000 pictures.
Therefore my objective is a script that simply moves the first 500 files to a folder sub_folder1, then the next 500 to sub_folder2, and so on. The folders need to be created by the script as well:
Folder_1
    sub_folder1
        picture_00001
        picture_00002
        ...
        picture_00500
    sub_folder2
        picture_00501
        picture_00502
        ...
        picture_01000
I started working with for i in range(500) but I have no clue what to write next.
Hoping this is clear enough; otherwise let me know and I will do my best to be even more precise.
Thank you in advance for your help.
One possible solution is:
First, find out which .png file names are in the directory. You can achieve this by using os.listdir(<dir>) to get a list of filenames, then iterating over it and selecting just the matching files with fnmatch.
Then set the increment (10 in this example, 500 in yours), iterate over range(0, len(files), increment), create each folder only if it doesn't exist yet, and move the files in chunks.
Solution:
from fnmatch import fnmatch
import os
import shutil

def get_filenames(root_dir):
    pattern = '*.png'
    files = []
    for file in os.listdir(root_dir):
        if fnmatch(file, pattern):
            files.append(file)
    return files

def distribute_files():
    root_dir = r"C:\frames"
    files = get_filenames(root_dir)
    increment = 10
    for i in range(0, len(files), increment):
        subfolder = "files_{}_{}".format(i + 1, i + increment)
        new_dir = os.path.join(root_dir, subfolder)
        if not os.path.exists(new_dir):
            os.makedirs(new_dir)
        for file in files[i:i + increment]:
            file_path = os.path.join(root_dir, file)
            shutil.move(file_path, new_dir)

if __name__ == "__main__":
    distribute_files()
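One detail worth noting: os.listdir() returns names in arbitrary order, so if the frames should land in their folders sequentially (picture_00001 through picture_00500 in sub_folder1, and so on), sort the list before chunking. A minimal sketch of just the grouping step, on plain strings:

```python
def chunk(names, size):
    """Split a sorted list of names into consecutive groups of 'size'."""
    names = sorted(names)
    return [names[i:i + size] for i in range(0, len(names), size)]

frames = ['picture_00003', 'picture_00001', 'picture_00004', 'picture_00002']
print(chunk(frames, 2))
# [['picture_00001', 'picture_00002'], ['picture_00003', 'picture_00004']]
```

In the solution above, this amounts to writing files = sorted(get_filenames(root_dir)).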
Hope it helps.
Regards
I have a folder with a large number of files (mask_folder). The filenames in this folder are built as follows:
asdgaw-1454_mask.tif
lkafmns-8972_mask.tif
sdnfksdfk-1880_mask.tif
etc.
In another folder (test_folder), I have a smaller number of files with filenames written almost the same, but without the addition of _mask. Like:
asdgaw-1454.tif
lkafmns-8972.tif
etc.
What I need is code that finds the files in mask_folder whose filenames start the same as the files in test_folder, and then copies those files from mask_folder to test_folder.
In that way the test_folder contains paired files as follows:
asdgaw-1454_mask.tif
asdgaw-1454.tif
lkafmns-8972_mask.tif
lkafmns-8972.tif
etc.
This is what I tried, it runs without any errors but nothing happens:
import shutil
import os

mask_folder = "//Mask/"
test_folder = "//Test/"
n = 8

list_of_files_mask = []
list_of_files_test = []
for file in os.listdir(mask_folder):
    if not file.startswith('.'):
        list_of_files_mask.append(file)
        start_mask = file[0:n]
        print(start_mask)
for file in os.listdir(test_folder):
    if not file.startswith('.'):
        list_of_files_test.append(file)
        start_test = file[0:n]
        print(start_test)
for file in start_test:
    if start_mask == start_test:
        shutil.copy2(file, test_folder)
I have searched for a while but have not found a solution to the above-mentioned problem. So, any help is really appreciated.
First, you want to get only the files, not the folders as well, so you should probably use os.walk() instead of listdir() to make the solution more robust. Read more about it in this question.
Then, I suggest loading the filenames of the test folder into memory (since they are the smaller set) and NOT loading all the other files into memory as well, but instead copying each match right away.
import os
import shutil

test_dir_path = ''
mask_dir_path = ''

# load file names from the test folder into a list
test_file_list = []
for _, _, file_names in os.walk(test_dir_path):
    # 'file_names' is a list of strings
    test_file_list.extend(file_names)
    # exit after this directory, do not check child directories
    break

# check the mask folder for matches
for _, _, file_names in os.walk(mask_dir_path):
    for name_1 in file_names:
        # we just remove a part of the filename to get exact matches
        name_2 = name_1.replace('_mask', '')
        # we check if 'name_2' is in the file name list of the test folder
        if name_2 in test_file_list:
            print('we copy {} because {} was found'.format(name_1, name_2))
            shutil.copy2(
                os.path.join(mask_dir_path, name_1),
                test_dir_path)
    # exit after this directory, do not check child directories
    break
Does this solve your problem?
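The matching itself can also be isolated in a small helper, which makes the logic easy to check before touching the file system; the function name below is made up, and the file names are the ones from the question:

```python
def masks_to_copy(mask_names, test_names):
    """Return the mask files whose name minus '_mask' appears in test_names."""
    test_set = set(test_names)
    return [m for m in mask_names if m.replace('_mask', '') in test_set]

masks = ['asdgaw-1454_mask.tif', 'lkafmns-8972_mask.tif', 'sdnfksdfk-1880_mask.tif']
tests = ['asdgaw-1454.tif', 'lkafmns-8972.tif']
print(masks_to_copy(masks, tests))
# ['asdgaw-1454_mask.tif', 'lkafmns-8972_mask.tif']
```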
I have an image dataset archived in a tree structure, where the names of the different levels of folders are the labels of these images, for example:
/Label1
    /Label2
        /image1
        /image2
    /Label3
        /image3
/Label4
    /image4
    /image5
How could I count the number of images in each end folder? In the above example, I want to know how many images there are in /Label1/Label2, /Label1/Label3 and /Label4.
I checked out the function os.walk(), but it seems there is no easy way to count the number of files in each individual end folder.
Could anyone help me, thank you in advance.
You can do this with os.walk():
import os

c = 0
print(os.getcwd())
for root, directories, filenames in os.walk('C:/Windows'):
    for files in filenames:
        c = c + 1
print(c)
output:
125765
If you have multiple file formats in the sub-directories, you can check the extension (e.g. jpeg, png) and only then increment the counter.
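For instance, the extension check might look like this; count_images and the demo tree are just illustrations, not part of the answer above:

```python
import os
import tempfile

def count_images(root, extensions=('.png', '.jpeg', '.jpg')):
    """Count files under 'root' whose name ends with one of 'extensions'."""
    c = 0
    for _, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                c += 1
    return c

# demo tree: two images and one text file
root = tempfile.mkdtemp()
open(os.path.join(root, 'a.png'), 'w').close()
open(os.path.join(root, 'b.JPG'), 'w').close()
open(os.path.join(root, 'notes.txt'), 'w').close()
print(count_images(root))  # 2
```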
I checked out the function os.walk(), but it seems that there is no easy way to count number of files in each individual end folder.
Sure there is. It's just len(files).
I'm not 100% sure what you meant by "end directory", but if you want to skip directories that aren't leaves (that is, directories that have subdirectories), that's just as easy: if not dirs.
Finally, I don't know whether you wanted to print the counts out, store them in a dict with the paths as keys, total them up, or what, but those are all pretty trivial, so here's some code that does all three as examples:
import os

total = 0
counts = {}
for root, dirs, files in os.walk(path):
    if not dirs:
        print(f'{root}: {len(files)}')
        counts[root] = len(files)
        total += len(files)
I'm getting lost... I'll post what I tried, but it didn't work.
So I have to get through 3 folders.
Let's say there is the main folder (label it Main) and 100 subfolders of the main folder (labeled 1-100), but I need to get inside those subfolders (labeled A, B, C, D, E, etc.; for what I need it won't go past D) and read the files inside the subfolders A, B, C, D, which are .txt files.
So
Main ---> 1-100 ---> A, B, C, D for each of 1-100 ---> read .txt files
import os
import glob

os.chdir("/Main")
for folders in glob.glob("20*"):
    print(folders)
I tried this, with extra code to get into the subfolders:
for folder in folders:
    glob.glob("tracking_data*")
    print(folder)
This didn't give me what I needed. The first snippet works fine; the second was supposed to give me a list of tracking-data folders.
I know what I need to do, and am probably overcomplicating it with my lack of programming experience. Thank you.
I'm not sure I understand exactly what you wish to do, but assuming you want to iterate over folders nested two levels deep and search for .txt files, this code should do it:
if __name__ == '__main__':
    import os

    rootdir = '/'
    txt_files = []
    # iterate through the folders 1-100 (top level of rootdir only)
    _, dirs_1_100, _ = next(os.walk(rootdir))
    for dir_1_100 in dirs_1_100:
        path_1_100 = os.path.join(rootdir, dir_1_100)
        # iterate through the folders A-E
        _, dirs_A_E, _ = next(os.walk(path_1_100))
        for dir_A_E in dirs_A_E:
            path_A_E = os.path.join(path_1_100, dir_A_E)
            # collect the .txt files in this folder
            _, _, files = next(os.walk(path_A_E))
            for requested_file in files:
                if not requested_file.endswith('.txt'):
                    continue
                txt_files.append(os.path.join(path_A_E, requested_file))
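For a fixed three-level layout like Main/1-100/A-D, glob can also express the whole traversal in a single pattern; here is a sketch on a tiny throwaway tree that mimics that structure (the folder and file names are invented for the demo):

```python
import glob
import os
import tempfile

# build a tiny stand-in for Main/<1-100>/<A-D>/*.txt
main = tempfile.mkdtemp()
os.makedirs(os.path.join(main, '1', 'A'))
os.makedirs(os.path.join(main, '2', 'B'))
open(os.path.join(main, '1', 'A', 'tracking_data_1.txt'), 'w').close()
open(os.path.join(main, '2', 'B', 'tracking_data_2.txt'), 'w').close()

# one pattern covers sub-folder, sub-sub-folder and file name
txt_files = sorted(glob.glob(os.path.join(main, '*', '*', '*.txt')))
for path in txt_files:
    print(os.path.relpath(path, main))
```

The same idea applied to the question would be a pattern like os.path.join("/Main", "*", "*", "*.txt"), without any explicit nesting.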