How to move every 500 files into different folders - python

As a beginner in Python, I need your help, since I don't yet know how to write such a script. To give you an overall idea: I have a folder Folder_1 that contains 50,000 different frames from a video as .png files:
Folder_1 :
picture_00001
picture_00002
picture_00003
picture_00004
...
picture_50000
Since Windows Explorer does not cope well with this huge number of pictures, I need to move them into separate folders to lower my RAM consumption and let me work on one batch at a time instead of all 50,000 pictures.
My objective is therefore a script that moves the first 500 files into a folder sub_folder1, the next 500 into sub_folder2, and so on. The folders need to be created by the script as well:
Folder_1
sub_folder1
picture_00001
picture_00002
...
picture_00500
sub_folder2
picture_00501
picture_00502
...
picture_01000
I started working with for i in range(500) but I have no clue what to write next.
Hoping this is clear enough; otherwise let me know and I will do my best to be more precise.
Thank you in advance for your help.

One possible solution is:
First you find out which .png file names are in the directory. You can achieve this by using os.listdir(<dir>) to return a list of filenames, then iterating over it and keeping only the matching files with fnmatch.
Then you set the increment (10 in this example, 500 in yours), iterate over range(0, len(files), increment), create each folder only if it doesn't exist yet, and move the files in chunks.
Solution:
from fnmatch import fnmatch
import os
import shutil

def get_filenames(root_dir):
    pattern = '*.png'
    files = []
    for file in os.listdir(root_dir):
        if fnmatch(file, pattern):
            files.append(file)
    return files

def distribute_files():
    root_dir = r"C:\frames"
    files = get_filenames(root_dir)
    increment = 10
    for i in range(0, len(files), increment):
        subfolder = "files_{}_{}".format(i + 1, i + increment)
        new_dir = os.path.join(root_dir, subfolder)
        if not os.path.exists(new_dir):
            os.makedirs(new_dir)
        for file in files[i:i + increment]:
            file_path = os.path.join(root_dir, file)
            shutil.move(file_path, new_dir)

if __name__ == "__main__":
    distribute_files()
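Note that os.listdir() returns names in arbitrary order, so the chunks above are not guaranteed to hold consecutive frames. With zero-padded names like picture_00001 a plain lexicographic sort is enough; a minimal tweak to get_filenames() along these lines should keep each batch contiguous:

def get_filenames(root_dir):
    pattern = '*.png'
    # sorted() guarantees picture_00001, picture_00002, ... order,
    # which os.listdir() alone does not
    return sorted(f for f in os.listdir(root_dir) if fnmatch(f, pattern))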
Hope it helps.
Regards

Related

Remove images in multiple folders (Python)

I want to write a Python script that randomly keeps only some images in each of several folders.
I am new to Python and have been trying to find a solution, but I could not find a good starting point yet.
I would appreciate it if anyone could help me. Thank you.
This might help you. It first retrieves the list of all directories, and afterwards removes random files so that only n remain in each. Note: path_to_all_images_folder and n have to be declared beforehand.
import os
import random

def keep_n_dir(directory, n):
    files = os.listdir(directory)  # retrieve the list of file names
    if len(files) > n:  # if you already have n files or fewer, do nothing
        diff = len(files) - n
        files_to_delete = random.sample(files, k=diff)  # randomly sample files to delete
        for file in files_to_delete:
            os.remove(os.path.join(directory, file))  # delete the surplus files

directories = os.listdir(path_to_all_images_folder)
directories = [os.path.join(path_to_all_images_folder, folder) for folder in directories]
for directory in directories:
    if os.path.isdir(directory):
        keep_n_dir(directory, n)
ATTENTION! This code removes the other files from each directory. It keeps only n of them.
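To run the snippet, the two names it assumes must be defined first; the values below are placeholders, not from the question:

path_to_all_images_folder = '/data/images'  # hypothetical root folder
n = 100  # keep at most 100 images per sub-folder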

How to find and copy almost identical filenames from one folder to another using python?

I have a folder with a large number of files (mask_folder). The filenames in this folder are built as follows:
asdgaw-1454_mask.tif
lkafmns-8972_mask.tif
sdnfksdfk-1880_mask.tif
etc.
In another folder (test_folder), I have a smaller number of files with filenames written almost the same, but without the addition of _mask. Like:
asdgaw-1454.tif
lkafmns-8972.tif
etc.
What I need is code that finds the files in mask_folder whose names start the same way as the files in test_folder, and then copies those files from mask_folder to test_folder.
In that way the test_folder contains paired files as follows:
asdgaw-1454_mask.tif
asdgaw-1454.tif
lkafmns-8972_mask.tif
lkafmns-8972.tif
etc.
This is what I tried; it runs without any errors, but nothing happens:
import shutil
import os

mask_folder = "//Mask/"
test_folder = "//Test/"
n = 8

list_of_files_mask = []
list_of_files_test = []

for file in os.listdir(mask_folder):
    if not file.startswith('.'):
        list_of_files_mask.append(file)
        start_mask = file[0:n]
        print(start_mask)

for file in os.listdir(test_folder):
    if not file.startswith('.'):
        list_of_files_test.append(file)
        start_test = file[0:n]
        print(start_test)

for file in start_test:
    if start_mask == start_test:
        shutil.copy2(file, test_folder)
I have searched for a while but not found a solution to the above problem, so any help is really appreciated.
First, you want to get only the files, not the folders as well, so you should probably use os.walk() instead of os.listdir() to make the solution more robust. Read more about it in this question.
Then, I suggest loading the filenames of the test folder into memory (since they are the smaller part) and NOT loading all the other files into memory as well, but instead copying them right away.
import os
import shutil

test_dir_path = ''
mask_dir_path = ''

# load file names from the test folder into a list
test_file_list = []
for _, _, file_names in os.walk(test_dir_path):
    # 'file_names' is a list of strings
    test_file_list.extend(file_names)
    # exit after this directory, do not check child directories
    break

# check the mask folder for matches
for _, _, file_names in os.walk(mask_dir_path):
    for name_1 in file_names:
        # we just remove a part of the filename to get exact matches
        name_2 = name_1.replace('_mask', '')
        # we check if 'name_2' is in the file name list of the test folder
        if name_2 in test_file_list:
            print('we copy {} because {} was found'.format(name_1, name_2))
            shutil.copy2(
                os.path.join(mask_dir_path, name_1),
                test_dir_path)
    # exit after this directory, do not check child directories
    break
Does this solve your problem?

Renaming files in the order they are titled python

I'm using a Python script that others have helped me with to rename all the .jpg or .png files in a directory to whatever I want, in order.
So if I have 20 .png files in a directory, I want to rename them in order from 1-20.
The script I have DOES this and I'm happy with it. However, it was just pointed out that the files renamed with this script end up out of order.
As an example, when I mean to rename 1.png to testImage1.png, I'm really renaming testImage10.png to testImage1.png. I tested this by creating 5 text files with the same content, except for files 1-3, where I put different content to keep track of what is what after renaming. Sure enough, everything got mixed up.
import os
import sys

source = sys.argv[1]
files = os.listdir(source)
name = sys.argv[2]

def rename():
    i = 1
    for file in files:
        os.rename(os.path.join(source, file), os.path.join(source, name + str(i) + '.png'))
        i += 1

rename()
I took the time to try to use my (limited) Python knowledge to create a series of if/elif statements to sift through the files and rename them correctly in order.
def roundTwo():
    print('Beginning of the end')
    i = 1
    for root, dirs, files in os.walk(source):
        for file in files:
            print('Test')
            if source == 'newFile0.txt' or 'newFile0.png':
                os.rename(os.path.join(source, file), os.path.join(source, name + str(i) + '.txt'))
                print('Test1')
                i += 1
            elif source == 'newFile1.txt' or 'newFile1.png':
                os.rename(os.path.join(source, file), os.path.join(source, name + str(i) + '.txt'))
                print('Test2')
                i += 1

roundTwo()
I did a fair amount of searching, including re and fnmatch, but nothing comes close to what I'm looking to do. Perhaps I'm using the wrong search terms? Any insight helps!
If your problem is the ordering of 1 and 10, you can use natural sorting. Sort your files variable as follows:
from natsort import natsorted, ns
natsorted(files, alg=ns.IGNORECASE)
Example:
>>> x = ['a/b/c21.txt', 'a/b/c1.txt', 'a/b/c10.txt', 'a/b/c11.txt', 'a/b/c2.txt']
>>> sorted(x)
['a/b/c1.txt', 'a/b/c10.txt', 'a/b/c11.txt', 'a/b/c2.txt', 'a/b/c21.txt']
>>> natsorted(x, alg=ns.IGNORECASE)
['a/b/c1.txt', 'a/b/c2.txt', 'a/b/c10.txt', 'a/b/c11.txt', 'a/b/c21.txt']
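If you would rather not add the third-party natsort dependency, a natural-sort key can be built with the standard library's re module. This is a minimal sketch, not part of natsort itself:

import re

def natural_key(s):
    # split 'c10.txt' into ['c', 10, '.txt'] so the numeric parts
    # compare as integers instead of character by character
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', s)]

files = sorted(files, key=natural_key)

sorted(x, key=natural_key) gives the same ordering as the natsorted() example above for these file names.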
If all the files share a common base name, you can modify your first function to extract the number assigned to each image:
baseName = 'testImage'

def rename():
    for file in files:
        number = file[len(baseName):file.find('.png')]
        os.rename(os.path.join(source, file), os.path.join(source, name + number + '.png'))
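A related tip rather than part of either answer: renaming with zero-padded numbers in the first place sidesteps the 1-vs-10 problem, because zero-padded names also sort correctly lexicographically. The change is one line in the original rename() loop:

# '7' becomes '00007', so even a plain sorted(os.listdir(...)) keeps numeric order
os.rename(os.path.join(source, file),
          os.path.join(source, name + str(i).zfill(5) + '.png'))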
Hope it helps

Python folder recursion

I'm getting lost... I'll post what I tried, but it didn't work.
So I have to get through 3 levels of folders.
Let's say there is the main folder (call it Main) with 100 sub-folders (labeled 1-100), and I need to get inside those sub-folders (labeled A, B, C, D, E, etc.; I won't need to go past D) and read the .txt files inside the sub-folders A, B, C, D.
So:
Main ---> 1-100 ---> A, B, C, D for each of 1-100 ---> read the .txt files
import os
import glob

os.chdir("/Main")
for folders in glob.glob("20*"):
    print(folders)
I tried adding this extra code to get into the sub-folders:
for folder in folders:
    glob.glob("tracking_data*")
    print(folder)
That didn't give me what I needed. The first snippet works fine; the second was supposed to give me a list of the tracking-data folders.
I know what I need to do, and am probably overcomplicating it with my lack of programming experience. Thank you.
I'm not sure I understand what it is you wish to do, but assuming you want to iterate over folders nested two levels deep and search for .txt files, this code should do it:
if __name__ == '__main__':
    import os

    rootdir = '/'
    txt_files = []
    for _, dirs_1_100, _ in os.walk(rootdir):  # iterate through 1-100
        for dir_1_100 in dirs_1_100:
            path_1_100 = os.path.join(rootdir, dir_1_100)
            for _, dirs_A_E, _ in os.walk(path_1_100):  # iterate through A-E
                for dir_A_E in dirs_A_E:
                    path_A_E = os.path.join(path_1_100, dir_A_E)
                    for _, _, files in os.walk(path_A_E):
                        for requested_file in files:
                            if not requested_file.endswith('.txt'):
                                continue
                            # store the full path so the file can be opened later
                            txt_files.append(os.path.join(path_A_E, requested_file))
                break  # only descend one level below each numbered folder
        break  # only descend one level below the root
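Because the depth is fixed and known (Main, then a numbered folder, then a letter folder, then the .txt files), a glob pattern yields the same list with far less nesting. A sketch, assuming the layout described in the question:

import glob
import os

# one '*' per directory level: /Main/<1-100>/<A-D>/<file>.txt
txt_files = glob.glob(os.path.join('/Main', '*', '*', '*.txt'))
for path in txt_files:
    print(path)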

Python File System Reader Performance

I need to scan a file system for a list of files, and log those who don't exist. Currently I have an input file with a list of the 13 million files which need to be investigated. This script needs to be run from a remote location, as I do not have access/cannot run scripts directly on the storage server.
My current approach works, but is relatively slow. I'm still fairly new to Python, so I'm looking for tips on speeding things up.
import sys, os
from pz import padZero  # prepends 0's to a string until it reaches the desired length

output = open('./out.txt', 'w')
input = open('./in.txt', 'r')

rootPath = '\\\\server\\share'  # UNC path to storage

for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8)  # extracts/formats the file name
    dir = padZero(str(ifid)[:-3], 5)   # extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath)  # don't actually need the size; better approach?
    except:
        output.write(ifid + '\n')
Thanks.
import collections
import glob
import os

dirs = collections.defaultdict(set)
for file_path in input:
    file_path = file_path.strip().rjust(8, "0")
    dir, name = file_path[:-3], file_path + ".tif"
    dirs[dir].add(name)

for dir, names in dirs.iteritems():
    # list the .tif files actually present in that directory on the server
    # (rootPath as defined in the question), then set-subtract them from
    # the names we expect to find there
    present = set(os.path.basename(p)
                  for p in glob.glob(os.path.join(rootPath, dir, "*.tif")))
    for missing_file in names - present:
        print missing_file
Explanation
First read the input file into a dictionary of directory: filename. Then for each directory, list all the TIFF files in that directory on the server, and (set) subtract this from the collection of filenames you should have. Print anything that's left.
EDIT: Fixed silly things. Too late at night when I wrote this!
That padZero and string-concatenation stuff looks to me like it would take a good percentage of the time.
What you want is for it to spend all its time reading the directory, and very little else.
Do you have to do it in Python? I've done similar stuff in C and C++. Java should be pretty good too.
You're going to be I/O bound, especially on a network, so any changes you can make to your script will result in very minimal speedups, but off the top of my head:
import os

input, output = open("in.txt"), open("out.txt", "w")
root = r'\\server\share'

for fid in input:
    fid = fid.strip().rjust(8, "0")
    dir = fid[:-3]  # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")
I don't really expect that to be any faster, but it is arguably easier to read.
Other approaches may be faster. For example, if you expect to touch most of the files, you could just pull a complete recursive directory listing from the server, convert it to a Python set(), and check for membership in that rather than hitting the server for many small requests. I will leave the code as an exercise...
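Since that code is left as an exercise, here is one minimal sketch of the idea; walking the mounted share with os.walk() and the in.txt format are assumptions carried over from the question:

import os

root = r'\\server\share'

# one bulk pass over the share: collect every .tif basename into a set
present = set()
for _, _, names in os.walk(root):
    present.update(n for n in names if n.endswith('.tif'))

# membership tests now happen in memory, with no per-file network round-trip
with open('in.txt') as infile, open('out.txt', 'w') as outfile:
    for line in infile:
        fid = line.strip().rjust(8, '0')
        if fid + '.tif' not in present:
            outfile.write(fid + '\n')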
I would probably use a shell command to get the full listing of files in all directories and subdirectories in one hit. Hopefully this will minimise the number of requests you need to make to the server.
You can get a listing of the remote server's files by doing something like:
Linux: mount the shared drive as /shared/directory/ and then do ls -R /shared/directory > ~/remote_file_list.txt
Windows: Use Map Network Drive to mount the shared drive as drive letter X:, then do dir /S X:\shared_directory > C:\remote_file_list.txt
Use the same method to create a listing of your local folder's contents as local_file_list.txt. Your Python script will then reduce to an exercise in text processing.
Note: I did actually have to do this at work.
