Python folder recursion

I'm getting lost... I'll post what I tried, but it didn't work.
So I have to get through 3 levels of folders.
Let's say there is the main folder (call it Main) with 100 subfolders (labeled 1-100), and I need to get inside those subfolders (labeled A, B, C, D, E, etc.; I won't go past D) and read the .txt files inside subfolders A, B, C and D.
So:
Main ---> 1-100 ---> A, B, C, D for each of 1-100 ---> read .txt files
import os
import glob

os.chdir("/Main")
for folders in glob.glob("20*"):
    print(folders)
I tried this with an extra piece of code to get into the subfolders:
for folder in folders:
    glob.glob("tracking_data*")
    print(folder)
That didn't give me what I needed. The first snippet works fine; the second was supposed to give me a list of tracking-data folders.
I know what I need to do, and am probably overcomplicating it with my lack of previous programming. Thank you

I'm not sure I understand exactly what you wish to do, but assuming you want to iterate over folders nested two levels deep and collect the .txt files, this code should do it:
if __name__ == '__main__':
    import os

    rootdir = '/'
    txt_files = []
    for _, dirs_1_100, _ in os.walk(rootdir):  # iterate through 1-100
        for dir_1_100 in dirs_1_100:
            path_1_100 = os.path.join(rootdir, dir_1_100)
            for _, dirs_A_E, _ in os.walk(path_1_100):  # iterate through A-E
                for dir_A_E in dirs_A_E:
                    path_A_E = os.path.join(path_1_100, dir_A_E)
                    for _, _, files in os.walk(path_A_E):
                        for requested_file in files:
                            if requested_file.endswith('.txt'):
                                txt_files.append(os.path.join(path_A_E, requested_file))
                        break  # only the files directly inside A-E
                break  # only the immediate subfolders of each numbered folder
        break  # only the immediate subfolders of rootdir
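Since the nesting depth is fixed (Main, then numbered folders, then letter folders), a single glob pattern can also collect the .txt files. A minimal self-contained sketch, run against a throwaway directory with made-up folder and file names:

```python
import glob
import os
import tempfile

# Build a tiny stand-in for the Main -> 1-100 -> A-D layout (names are made up)
main = tempfile.mkdtemp()
for number in ("1", "2"):
    for letter in ("A", "B"):
        subdir = os.path.join(main, number, letter)
        os.makedirs(subdir)
        with open(os.path.join(subdir, "tracking_data.txt"), "w") as f:
            f.write("sample\n")

# One pattern covers the fixed three-level depth: Main/*/*/*.txt
txt_files = sorted(glob.glob(os.path.join(main, "*", "*", "*.txt")))
print(len(txt_files))  # 2 numbered folders x 2 letter folders = 4 files
```

The pattern only works because the depth is known in advance; for an arbitrary depth, os.walk() remains the right tool.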


Read first file in Python

I'm trying to make a script that iterates through the folders in a directory and saves the very first file in each folder to a variable. I'm trying to do this because I have folders with hundreds of .exr files, but I only need the first file in each folder. I can get it to print out all the files in each folder, but that's more info than I need. Is there an easy way to do this using something like os.walk? This is what I'm working with so far:
import os

def main():
    dirName = r"F:\FOLDERNAME"
    # Get the list of all files in the directory tree at the given path
    listOfFiles = getListOfFiles(dirName)
    # Print the files
    for elem in listOfFiles:
        print(elem)

if __name__ == '__main__':
    main()
Thanks yall!
If you already:
    can get it to print out all the files in each folder
and you only want:
    the very first one
then all you need is something like:
    first_file = list_of_files[0]
Looks like you're still learning the very basics of coding. I suggest looking up lessons/tutorials on lists, list indexing, and iterables.
If you use os.walk(), it yields the list of files in each directory, and you can print the first one. Note that the order of the names is arbitrary, so sort files first if you want the alphabetically first file.
for root, dirs, files in os.walk(dirName):
    if files:
        print(os.path.join(root, files[0]))
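Building on that, here is a self-contained sketch (with made-up folder and file names) that records the alphabetically first file of each folder, sorting because os.walk() returns names in arbitrary order:

```python
import os
import tempfile

# Build a small demo tree (folder and file names are made up)
dirName = tempfile.mkdtemp()
for folder, names in [("shotA", ["b.exr", "a.exr"]), ("shotB", ["z.exr"])]:
    os.makedirs(os.path.join(dirName, folder))
    for name in names:
        open(os.path.join(dirName, folder, name), "w").close()

# Keep only the alphabetically first file of each folder
first_files = {}
for root, dirs, files in os.walk(dirName):
    if files:
        first_files[os.path.basename(root)] = sorted(files)[0]

print(sorted(first_files.items()))  # [('shotA', 'a.exr'), ('shotB', 'z.exr')]
```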

Remove images in multiple folders (Python)

I want to write a python script to randomly keep only some images in multiple folders.
I am new to python, and I am trying to find the solution. However, I could not find a good one to start with yet.
I would appreciate it if anyone could help me. Thank you.
This might help you. It first retrieves the list of all directories, then removes random files from each so that only n files remain. Note: path_to_all_images_folder and n have to be declared beforehand.
import os
import random

def keep_n_dir(directory, n):
    files = os.listdir(directory)  # retrieve the list of file names
    if len(files) > n:  # if you already have n or fewer files, do nothing
        diff = len(files) - n
        files_to_delete = random.sample(files, k=diff)  # randomly sample the files to delete
        for file in files_to_delete:
            os.remove(os.path.join(directory, file))  # delete the extra files

directories = os.listdir(path_to_all_images_folder)
directories = [os.path.join(path_to_all_images_folder, folder) for folder in directories]
for directory in directories:
    if os.path.isdir(directory):
        keep_n_dir(directory, n)
ATTENTION! This code removes all the other files from each directory; it keeps only n of them.
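Because the deletion is destructive, it is worth exercising the function on a throwaway folder before pointing it at real data. A self-contained demo (dummy file names are made up):

```python
import os
import random
import tempfile

# Re-stating keep_n_dir so the demo is self-contained
def keep_n_dir(directory, n):
    files = os.listdir(directory)
    if len(files) > n:
        files_to_delete = random.sample(files, k=len(files) - n)
        for file in files_to_delete:
            os.remove(os.path.join(directory, file))

# Throwaway folder with 10 dummy images
folder = tempfile.mkdtemp()
for i in range(10):
    open(os.path.join(folder, "img_{:02d}.png".format(i)), "w").close()

keep_n_dir(folder, 4)
print(len(os.listdir(folder)))  # 4
```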

How to move every 500 files into different folders

As a beginner in Python I need your help, since I do not know enough to create such a script. To give you an overall idea, I have a folder Folder_1 that contains 50,000 different frames from a video in .png:
Folder_1:
    picture_00001
    picture_00002
    picture_00003
    picture_00004
    ...
    picture_50000
Since Windows Explorer does not cope well with this huge number of pictures, I need to move them into different folders to lower my RAM consumption and let me work on one batch without loading all 50,000 pictures.
Therefore my objective is a script that simply moves the first 500 files to a folder sub_folder1, then the next 500 to sub_folder2, and so on. The folders need to be created by the script as well:
Folder_1
    sub_folder1
        picture_00001
        picture_00002
        ...
        picture_00500
    sub_folder2
        picture_00501
        picture_00502
        ...
        picture_01000
I started working on it with for i in range(500), but I have no clue what to write next.
Hoping this is clear enough; otherwise let me know and I will do my best to be even more precise.
Thank you in advance for your help.
One possible solution is:
First you find out which .png file names are in the directory. You can achieve this by using os.listdir(<dir>) to return a list of filenames, then iterating over it and selecting just the matching files with fnmatch.
Then you set the increment (10 in this example, 500 in yours), iterate over range(0, len(files), increment), create each folder only if it doesn't exist yet, and move the files in chunks.
Solution:
from fnmatch import fnmatch
import os
import shutil

def get_filenames(root_dir):
    pattern = '*.png'
    files = []
    for file in os.listdir(root_dir):
        if fnmatch(file, pattern):
            files.append(file)
    # os.listdir returns names in arbitrary order; sort so that
    # picture_00001 really ends up in the first chunk
    return sorted(files)

def distribute_files():
    root_dir = r"C:\frames"
    files = get_filenames(root_dir)
    increment = 10
    for i in range(0, len(files), increment):
        subfolder = "files_{}_{}".format(i + 1, i + increment)
        new_dir = os.path.join(root_dir, subfolder)
        if not os.path.exists(new_dir):
            os.makedirs(new_dir)
        for file in files[i:i + increment]:
            file_path = os.path.join(root_dir, file)
            shutil.move(file_path, new_dir)

if __name__ == "__main__":
    distribute_files()
Hope it helps.
Regards
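The same chunking idea can be verified end to end on a throwaway folder. A self-contained sketch with 25 dummy frames and a smaller increment (the real case would use 50,000 files and increment = 500):

```python
import os
import shutil
import tempfile

# Throwaway folder with 25 dummy frames named like in the question
root_dir = tempfile.mkdtemp()
for i in range(1, 26):
    open(os.path.join(root_dir, "picture_{:05d}.png".format(i)), "w").close()

# Collect and sort the frame names, then move them in chunks of 10
files = sorted(f for f in os.listdir(root_dir) if f.endswith(".png"))
increment = 10
for i in range(0, len(files), increment):
    new_dir = os.path.join(root_dir, "sub_folder{}".format(i // increment + 1))
    os.makedirs(new_dir, exist_ok=True)
    for file in files[i:i + increment]:
        shutil.move(os.path.join(root_dir, file), new_dir)

print([len(os.listdir(os.path.join(root_dir, d)))
       for d in ("sub_folder1", "sub_folder2", "sub_folder3")])  # [10, 10, 5]
```

The last folder simply receives whatever remains, so no special casing is needed for a count that is not a multiple of the increment.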

How to find and copy almost identical filenames from one folder to another using python?

I have a folder with a large number of files (mask_folder). The filenames in this folder are built as follows:
asdgaw-1454_mask.tif
lkafmns-8972_mask.tif
sdnfksdfk-1880_mask.tif
etc.
In another folder (test_folder), I have a smaller number of files with filenames written almost the same, but without the addition of _mask. Like:
asdgaw-1454.tif
lkafmns-8972.tif
etc.
What I need is code to find the files in mask_folder whose filenames start identically to files in test_folder, and then copy those files from mask_folder to test_folder.
In that way the test_folder contains paired files as follows:
asdgaw-1454_mask.tif
asdgaw-1454.tif
lkafmns-8972_mask.tif
lkafmns-8972.tif
etc.
This is what I tried, it runs without any errors but nothing happens:
import shutil
import os

mask_folder = "//Mask/"
test_folder = "//Test/"
n = 8
list_of_files_mask = []
list_of_files_test = []

for file in os.listdir(mask_folder):
    if not file.startswith('.'):
        list_of_files_mask.append(file)
        start_mask = file[0:n]
        print(start_mask)

for file in os.listdir(test_folder):
    if not file.startswith('.'):
        list_of_files_test.append(file)
        start_test = file[0:n]
        print(start_test)

for file in start_test:
    if start_mask == start_test:
        shutil.copy2(file, test_folder)
I have searched for a while but haven't found a solution to the above problem, so any help is really appreciated.
First, you want to get only the files, not the folders as well, so you should probably use os.walk() instead of listdir() to make the solution more robust. Read more about it in this question.
Then, I suggest loading the filenames of the test folder into memory (since they are the smaller part) and then NOT loading all the other files into memory as well, but instead copying them right away.
import os
import shutil

test_dir_path = ''
mask_dir_path = ''

# load file names from the test folder into a list
test_file_list = []
for _, _, file_names in os.walk(test_dir_path):
    # 'file_names' is a list of strings
    test_file_list.extend(file_names)
    # exit after this directory, do not check child directories
    break

# check the mask folder for matches
for _, _, file_names in os.walk(mask_dir_path):
    for name_1 in file_names:
        # we just remove a part of the filename to get exact matches
        name_2 = name_1.replace('_mask', '')
        # we check if 'name_2' is in the file name list of the test folder
        if name_2 in test_file_list:
            print('we copy {} because {} was found'.format(name_1, name_2))
            shutil.copy2(
                os.path.join(mask_dir_path, name_1),
                test_dir_path)
    # exit after this directory, do not check child directories
    break
Does this solve your problem?
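For a quick end-to-end check of the matching idea, the same logic can be run against throwaway folders seeded with the filenames from the question. A compact self-contained sketch:

```python
import os
import shutil
import tempfile

# Throwaway stand-ins for the real folders (file names echo the question)
mask_dir_path = tempfile.mkdtemp()
test_dir_path = tempfile.mkdtemp()
for name in ("asdgaw-1454_mask.tif", "lkafmns-8972_mask.tif",
             "sdnfksdfk-1880_mask.tif"):
    open(os.path.join(mask_dir_path, name), "w").close()
for name in ("asdgaw-1454.tif", "lkafmns-8972.tif"):
    open(os.path.join(test_dir_path, name), "w").close()

# Copy a mask file whenever its name minus '_mask' exists in the test folder
test_file_list = os.listdir(test_dir_path)
for name_1 in os.listdir(mask_dir_path):
    if name_1.replace('_mask', '') in test_file_list:
        shutil.copy2(os.path.join(mask_dir_path, name_1), test_dir_path)

print(sorted(os.listdir(test_dir_path)))
```

The unmatched sdnfksdfk-1880_mask.tif stays behind, and the test folder ends up with the two paired sets of files.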

Going through all folders in Python

I want to go through all folders inside a directory:
directory\
    folderA\
        a.cpp
    folderB\
        b.cpp
    folderC\
        c.cpp
    folderD\
        d.cpp
The name of the folders are all known.
Specifically, I am trying to count the number of lines of code in each of the a.cpp, b.cpp, c.cpp and d.cpp source files. So: go inside folderA and read a.cpp, count its lines, then go back to directory, go inside folderB, read b.cpp, count its lines, etc.
This is what I have up until now,
dir = directory_path
for folder_name in folder_list():
    dir = os.path.join(dir, folder_name)
    with open(dir) as file:
        source = file.read()
        c = source.count_lines()
but I am new to Python and have no idea if my approach is appropriate and how to proceed. Any example code shown will be appreciated!
Also, does the with open handles the file opening/closing as it should for all those reads or more handling is required?
I would do it like this:
import glob
import os

path = 'C:/Users/me/Desktop/'  # the path where all the folders are located
list_of_folders = ['test1', 'test2']  # the folders you need
names = {}  # initialize a dict

for each_folder in list_of_folders:
    full_path = os.path.join(path, each_folder)  # join the path
    os.chdir(full_path)  # change directory to the desired path
    for each_file in glob.glob('*.cpp'):  # pick up every .cpp file in this folder
        with open(each_file) as f:  # opens a file - no need to close it explicitly
            names[each_file] = sum(1 for line in f if line.strip())
    print(names)
Output:
{'file1.cpp': 2, 'file3.cpp': 2, 'file2.cpp': 2}
{'file1.cpp': 2, 'file3.cpp': 2, 'file2.cpp': 2}
Regarding the with question, you don't need to close the file or make any other checks. You should be safe as it is now.
You may, however, want to check that full_path exists, since a folder from list_of_folders could have been deleted by mistake.
You can do this with os.path.isdir, which returns True if the directory exists:
    os.path.isdir(full_path)
PS: I used Python 3.
Use os.walk() to traverse all subdirectories and files of a given path, opening each file and applying your logic. A single for loop walks the whole tree, simplifying your code greatly.
https://docs.python.org/3/library/os.html#os.walk
As manglano said, os.walk().
You can generate a list of folders:
    [src for src, _, _ in os.walk(sourcedir)]
You can generate a list of file paths:
    [os.path.join(src, file) for src, dirs, files in os.walk(sourcedir) for file in files]
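Putting the pieces together for the original line-counting task, here is a self-contained sketch that walks a throwaway stand-in for the directory/folderA-D layout and counts the non-blank lines of every .cpp file (folder names and file contents are made up):

```python
import os
import tempfile

# Throwaway stand-in for the directory/folderA..D layout in the question
directory = tempfile.mkdtemp()
for folder, source in [("folderA", "int main() {\n}\n"),
                       ("folderB", "// b\nint x;\n\nint y;\n")]:
    os.makedirs(os.path.join(directory, folder))
    with open(os.path.join(directory, folder, folder[-1] + ".cpp"), "w") as f:
        f.write(source)

# Count non-blank lines of every .cpp file found anywhere under 'directory'
counts = {}
for root, dirs, files in os.walk(directory):
    for name in files:
        if name.endswith(".cpp"):
            with open(os.path.join(root, name)) as f:
                counts[name] = sum(1 for line in f if line.strip())

print(sorted(counts.items()))  # [('A.cpp', 2), ('B.cpp', 3)]
```

Because os.walk() recurses on its own, no explicit list of folder names is needed, which also makes the script robust to folders being added or removed later.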
