Apply script to each folder in a drive - python

I am working on a data cleanup in a network drive. The drive has 1000+ folders, and those folders have several subfolders. The script that I got from G4G (seen below) prompts me to select a folder. I can click on one of my 1000+ folders, and the data is cleaned up properly (duplicates are deleted). However, I'd like to loop the command through the whole drive to avoid clicking on folders for hours. I cannot select the drive as my folder because duplicate file names between the first folders in the drive should not be considered duplicates.
EDIT:
I'll give an example to clarify.
Z:/Folder1 and Z:/Folder2 both have several files named "text.txt," immediately inside of the folders and within the subdirectories of the folders. Folder1 and Folder2, amongst all "text.txt" files immediately inside and within its subdirectories, should each be left with one "text.txt." If the current script is applied to Folder1 and Folder2 individually, then the desired result of one "text.txt" file existing in Folder1 and one existing in Folder2 is accomplished. If the script is applied to the Z drive, then between Folder1 and Folder2, there would only be one "text.txt," and one of the folders would be without a file named "text.txt."
How can I apply this script to each first folder in the drive without having to manually click on each folder?
from tkinter.filedialog import askdirectory
# Importing required libraries.
from tkinter import Tk
import os
import hashlib
from pathlib import Path
# We don't want the GUI window of
# tkinter to be appearing on our screen
Tk().withdraw()
# Dialog box for selecting a folder.
file_path = askdirectory(title="Select a folder")
# Listing out all the files
# inside our root folder.
list_of_files = os.walk(file_path)
# In order to detect the duplicate
# files we are going to define an empty dictionary.
unique_files = dict()
for root, folders, files in list_of_files:
    # Running a for loop on all the files
    for file in files:
        # Finding complete file path
        file_path = Path(os.path.join(root, file))
        # Converting all the content of
        # our file into md5 hash.
        Hash_file = hashlib.md5(open(file_path, 'rb').read()).hexdigest()
        # If file hash has already #
        # been added we'll simply delete that file
        if Hash_file not in unique_files:
            unique_files[Hash_file] = file_path
        else:
            if file.endswith((".txt",".bmp")):
                os.remove(file_path)
                print(f"{file_path} has been deleted")

Maybe you could run it on the whole drive and use an if/else to skip the first item that os.walk() yields, which is the drive root itself:
list_of_files = os.walk("your drive")
for root, folders, files in list_of_files:
    if root != "your drive":
        for file in files:
            # ... code ...
This way you can also skip other (sub)folders.
Or you can use next() to skip the first element from os.walk(), because os.walk() does not return a list of all elements but a generator.
list_of_files = os.walk("your drive")
next(list_of_files) # skip first item
for root, folders, files in list_of_files:
    for file in files:
        # ... code ...
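Note that skipping the root only ignores the files that sit directly in the drive root; a single unique_files dictionary would still be shared by every folder below it. As a rough sketch of how the walk could report which first-level folder each file belongs to, the generator below skips the root with next() and derives the top folder from the relative path. The name walk_by_top_folder and the use of os.path.relpath are assumptions of this sketch, not part of the original script.

```python
import os


def walk_by_top_folder(drive):
    """Yield (top_folder, file_path) for every file below the first-level folders of `drive`."""
    walker = os.walk(drive)
    next(walker, None)  # skip the first item, i.e. the drive root itself
    for root, folders, files in walker:
        # First path component relative to the drive, e.g. "Folder1"
        top_folder = os.path.relpath(root, drive).split(os.sep)[0]
        for file in files:
            yield top_folder, os.path.join(root, file)
```

A caller could then keep one dictionary of hashes per top_folder value, so that identical files in Folder1 and Folder2 are never compared with each other.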

Related

How to run a script on each folder in a drive?

I am working on a data cleanup in a network drive. The drive has 1000+ folders, and those folders have several subfolders. The script that I got from G4G (seen below) prompts me to select a folder. I can click on one of my 1000+ folders, and the data is cleaned up properly (duplicates are deleted). However, I'd like to loop the command through the whole drive to avoid clicking on folders for hours. I cannot select the drive as my folder because duplicate file names between the first folders in the drive should not be considered duplicates.
Example:
Z:/Folder1 and Z:/Folder2 both have several files named text.txt, immediately inside of the folders and within the subdirectories of the folders. Folder1 and Folder2, among all text.txt files immediately inside and within its subdirectories, should each be left with one text.txt. If the current script is applied to Folder1 and Folder2 individually, then the desired result of one text.txt file existing in Folder1 and one existing in Folder2 is accomplished. If the script is applied to the Z: drive, then between Folder1 and Folder2, there would only be one text.txt, and one of the folders would be without a file named text.txt.
How can I apply this script to each first folder in the drive without having to manually click on each folder?
from tkinter.filedialog import askdirectory
# Importing required libraries.
from tkinter import Tk
import os
import hashlib
from pathlib import Path
# We don't want the GUI window of
# tkinter to be appearing on our screen
Tk().withdraw()
# Dialog box for selecting a folder.
file_path = askdirectory(title="Select a folder")
# Listing out all the files
# inside our root folder.
list_of_files = os.walk(file_path)
# In order to detect the duplicate
# files we are going to define an empty dictionary.
unique_files = dict()
for root, folders, files in list_of_files:
    # Running a for loop on all the files
    for file in files:
        # Finding complete file path
        file_path = Path(os.path.join(root, file))
        # Converting all the content of
        # our file into md5 hash.
        Hash_file = hashlib.md5(open(file_path, 'rb').read()).hexdigest()
        # If file hash has already #
        # been added we'll simply delete that file
        if Hash_file not in unique_files:
            unique_files[Hash_file] = file_path
        else:
            if file.endswith((".txt",".bmp")):
                os.remove(file_path)
                print(f"{file_path} has been deleted")
You can change the script as shown below.
It collects the absolute path of every directory in the current working directory and feeds them one by one to the cleanup, so run it from the root of the drive.
# Importing required libraries.
import os
import hashlib
from pathlib import Path
# No dialog box is needed any more: collect the absolute paths
# of all directories in the current working directory.
folder_paths = [os.path.abspath(i) for i in os.listdir() if os.path.isdir(i)]
for folder_path in folder_paths:
    # Listing out all the files
    # inside this top-level folder.
    list_of_files = os.walk(folder_path)
    # A fresh dictionary for every top-level folder, so that
    # duplicates are only detected within that folder.
    unique_files = dict()
    for root, folders, files in list_of_files:
        # Running a for loop on all the files
        for file in files:
            # Finding complete file path
            file_path = Path(os.path.join(root, file))
            # Converting all the content of
            # our file into md5 hash.
            with open(file_path, 'rb') as f:
                Hash_file = hashlib.md5(f.read()).hexdigest()
            # If file hash has already
            # been added we'll simply delete that file
            if Hash_file not in unique_files:
                unique_files[Hash_file] = file_path
            else:
                if file.endswith((".txt",".bmp")):
                    os.remove(file_path)
                    print(f"{file_path} has been deleted")
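Since the script deletes files on a network drive, it may be worth trying it without deleting anything first. The following is only a sketch of the same per-folder idea split into functions: file_md5, find_duplicates, clean_drive and the dry_run flag are names made up for this example, and the chunked read is an assumption meant to avoid loading large files into memory at once.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Hash a file in chunks instead of reading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(folder, extensions=(".txt", ".bmp")):
    """Return paths under `folder` whose content repeats an earlier file."""
    seen = {}
    duplicates = []
    for root, _, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            digest = file_md5(path)
            if digest not in seen:
                seen[digest] = path
            elif name.endswith(extensions):
                duplicates.append(path)
    return duplicates


def clean_drive(drive, dry_run=True):
    """Deduplicate each first-level folder of `drive` independently."""
    removed = []
    with os.scandir(drive) as entries:
        top_folders = sorted(e.path for e in entries if e.is_dir())
    for folder in top_folders:
        for path in find_duplicates(folder):
            if not dry_run:
                os.remove(path)
            removed.append(path)
    return removed
```

Calling clean_drive("Z:/") would only list the files that would be deleted; clean_drive("Z:/", dry_run=False) would actually remove them.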

How to run python script for every folder in directory and skip any empty folders?

I am trying to figure out how to generate an Excel workbook for each subfolder in my directory while skipping the folders that are empty. My directory structure is below.
It would start with Folder A and execute my lines of code to create an Excel file from Folder A's contents, then move to Folder B and create a separate Excel file from Folder B's contents, then move to Folder C and skip it since it's empty, and so on.
How do I loop through each folder in this manner and keep going when a folder is empty?
I would greatly appreciate the help!
myscript.py
folderA
- report1.xlsx
- report2.xlsx
folderB
- report1.xlsx
- report2.xlsx
folderC
** EMPTY **
folderD
- report1.xlsx
- report2.xlsx
Something like this maybe?
from pathlib import Path
from itertools import groupby
def by_folder(path):
    return path.parent
for folder, files in groupby(Path("path/to/root/dir").rglob("*.xlsx"), key=by_folder):
    print(f"Gonna merge these files from {folder}: ")
    for file in files:
        print(f"{file.name}")
    print()
We recursively search for .xlsx files in the root directory and group them by their immediate parent folder. An empty folder contains nothing that matches the glob pattern, so it is skipped automatically.
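One caveat: itertools.groupby only merges consecutive items that share a key, so the grouping relies on rglob() returning the files of one folder next to each other. A dictionary-based sketch that does not depend on the order is shown here; the function name workbooks_by_folder is hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def workbooks_by_folder(root_dir):
    """Map each folder that contains .xlsx files to a sorted list of those files."""
    groups = defaultdict(list)
    for path in Path(root_dir).rglob("*.xlsx"):
        groups[path.parent].append(path)
    # Folders without any .xlsx file never become a key, so they are skipped.
    return {folder: sorted(files) for folder, files in groups.items()}
```

Iterating over workbooks_by_folder("path/to/root/dir").items() then gives one (folder, files) pair per non-empty folder.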
You can use os.listdir() to list everything inside a folder. The only drawback is that it returns files as well as folders, so trying to list the contents of an entry that is actually a file raises an error.
The following code loops through all subfolders, skips files, and prints the name of every folder that is not empty.
import os
for folder in os.listdir("."):
    try:
        # os.listdir() raises an OSError when the entry is a file
        if len(os.listdir("./" + folder)) > 0:
            print(folder)
    except OSError:
        pass
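The same check can be written without relying on an exception, for example with os.scandir(), which reports whether each entry is a directory. This is a minimal sketch; the function name non_empty_subfolders is made up for the example.

```python
import os


def non_empty_subfolders(parent="."):
    """Return the names of the direct subfolders of `parent` that contain at least one entry."""
    result = []
    with os.scandir(parent) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip plain files
            with os.scandir(entry.path) as children:
                # next() with a default returns None when the folder is empty
                if next(children, None) is not None:
                    result.append(entry.name)
    return sorted(result)
```

Each returned name can then be passed to the code that builds the workbook for that folder.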

loop over subfolder inside a folder in python

I am trying to move files from one folder to another. I have folders named a to z. Inside each folder (a-z) I have several subfolders. I can move the files from the subfolders of one folder (a-z) to my target folder, but I want to do it for all of a-z at once.
folder structure : a--ab
--ac
b--bc
--bd
.. till z
import glob
import os
import shutil
path = "E:\\download\\images\\a\\*"
move_path = "E:\\download\\images\\final\\"
files = glob.glob(path,recursive = True)
for file in files:
    subfile= os.listdir(file)
    for sub in subfile:
        subpath = file + "\\" + sub
        shutil.move(subpath,move_path +"\\" + sub)
Copy this tiny script into E:\download\images and run it from there. This way, Path() resolves relative to that directory.
The images variable holds a generator that yields every path matching the glob, which means every file in any subfolder, at any depth, that has a 3-letter extension and whose first-level folder name is a single character.
Renaming a file to a path under final/ moves it out of its subfolder and into final/.
Keep in mind that the glob picks up every file or folder name with a 3-letter extension. You'll need additional checks if you have other files or folders that match this pattern.
from pathlib import Path
images = Path().glob('?/**/*.???')
for img in images:
    img.rename('final/' + img.name)
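If the extension length is not a reliable filter, the folders a to z can also be visited explicitly. The sketch below is one possible variant, not the method from the answer above: collect_files and final_name are hypothetical names, and skipping files whose name already exists in the target folder is an assumption made to avoid overwriting anything.

```python
import shutil
import string
from pathlib import Path


def collect_files(images_dir, final_name="final"):
    """Move every file found below the folders a..z of `images_dir` into `final_name`."""
    images_dir = Path(images_dir)
    final_dir = images_dir / final_name
    final_dir.mkdir(exist_ok=True)
    moved = []
    for letter in string.ascii_lowercase:
        letter_dir = images_dir / letter
        if not letter_dir.is_dir():
            continue  # this letter has no folder
        # sorted() materialises the listing before anything is moved
        for path in sorted(letter_dir.rglob("*")):
            if path.is_file():
                target = final_dir / path.name
                if target.exists():
                    continue  # assumption: never overwrite an existing file
                shutil.move(str(path), str(target))
                moved.append(target)
    return moved
```

For the layout in the question, this would be called as collect_files(r"E:\download\images").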

Search a folder and sub folders for files starting with criteria

I have a folder "c:\test" that contains many subfolders and files (.xml, .wav). I need to search the test folder and all of its subfolders for files whose names start with the number 4 and are 7 characters long, and copy these files to another folder called 'c:\test.copy' using Python. Any other files need to be ignored.
So far I can copy the files starting with a 4, but not the structure, to the new folder using the following:
from glob import glob
import os, shutil
root_src_dir = r'C:/test' #Path of the source directory
root_dst_dir = 'c:/test.copy' #Path to the destination directory
for file in glob('c:/test/**/4*.*'):
    shutil.copy(file, root_dst_dir)
Any help would be most welcome.
You can use os.walk:
import os
import shutil
root_src_dir = r'C:/test' #Path of the source directory
root_dst_dir = 'c:/test.copy' #Path to the destination directory
for root, _, files in os.walk(root_src_dir):
    for file in files:
        if file.startswith("4") and len(file) == 7:
            shutil.copy(os.path.join(root, file), root_dst_dir)
If, by 7 characters, you mean 7 characters without the file extension, then replace len(file) == 7 with len(os.path.splitext(file)[0]) == 7.
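If the subfolder layout should be kept in the destination as well, the relative path of each folder can be recreated before copying. This is a sketch under two assumptions: that "7 characters" refers to the name without the extension, and that the destination lies outside the source tree. The names matches and copy_matching are made up for the example.

```python
import os
import shutil


def matches(name):
    """True for names that start with "4" and have a 7-character stem."""
    stem = os.path.splitext(name)[0]
    return name.startswith("4") and len(stem) == 7


def copy_matching(src_root, dst_root):
    """Copy matching files and recreate their relative folder structure."""
    copied = []
    for root, _, files in os.walk(src_root):
        for name in files:
            if not matches(name):
                continue
            rel_dir = os.path.relpath(root, src_root)
            target_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
            os.makedirs(target_dir, exist_ok=True)
            copied.append(shutil.copy(os.path.join(root, name), target_dir))
    return sorted(copied)
```

With the paths from the question, the call would be copy_matching(r'C:/test', 'c:/test.copy').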
This can be done using the os and shutil modules:
import os
import shutil
Firstly, we need to establish the source and destination paths. source should be the directory you are copying and destination should be the directory you want to copy into.
source = r"/root/path/to/source"
destination = r"/root/path/to/destination"
Next, we have to check if the destination path exists because shutil.copytree() will raise a FileExistsError if the destination path already exists. If it does already exist, we can remove the tree and duplicate it again. You can think of this block of code as simply refreshing the duplicate directory:
if os.path.exists(destination):
    shutil.rmtree(destination)
shutil.copytree(source, destination)
Then, we can use os.walk to recursively navigate the entire directory, including subdirectories:
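# topdown=False visits subdirectories before their parents, so a parent
# that only contained now-empty subdirectories can be removed as well.
for path, _, files in os.walk(destination, topdown=False):
    for file in files:
        # Remove every file that does not start with "4" AND have a 7-character stem
        if not (file.startswith("4") and len(os.path.splitext(file)[0]) == 7):
            os.remove(os.path.join(path, file))
    if not os.listdir(path):
        os.rmdir(path)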
We then can loop through the files in each directory and check if the file does not meet your condition (starts with "4" and has a length of 7). If it does not meet the condition, we simply remove it from the directory using os.remove.
The final if-statement checks if the directory is now empty. If the directory is empty after removing the files, we simply delete that directory using os.rmdir.
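As an alternative to copying everything and deleting afterwards, shutil.copytree() accepts an ignore callable that receives a directory and the names inside it and returns the names to skip. The sketch below uses that parameter; ignore_non_matching and copy_filtered are hypothetical names, and unlike the cleanup above this variant still creates subdirectories that end up empty.

```python
import os
import shutil


def ignore_non_matching(directory, names):
    """Return the names that copytree should skip: every file that does not match."""
    ignored = []
    for name in names:
        full = os.path.join(directory, name)
        if os.path.isdir(full):
            continue  # always descend into subdirectories
        stem = os.path.splitext(name)[0]
        if not (name.startswith("4") and len(stem) == 7):
            ignored.append(name)
    return ignored


def copy_filtered(source, destination):
    """Refresh `destination` with only the matching files from `source`."""
    if os.path.exists(destination):
        shutil.rmtree(destination)
    return shutil.copytree(source, destination, ignore=ignore_non_matching)
```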

Python: Unzip selected files in directory tree

I have the following directory structure: in the parent directory there are several folders, let's say A, B, C and D, and within each folder there are many zip files named as shown, with the letter of the parent folder included in the name along with other info:
-parent--A-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
--B-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
--C-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
--D-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
I need to unzip only selected zips in this tree and place them in the same directory with the same name without the .zip extension.
Output:
-parent--A-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
--B-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
--C-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
--D-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
My effort:
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
    zipfile=os.path.basename(path) #save the zipfile path
    zip_ref=zipfile.ZipFile(path, 'r')
    zip_ref=extractall(zipfile.replace(r'.zip', '')) #unzip to a folder without the .zip extension
The problem is that I don't know how to save the A, B, C, D etc. so that I can include them in the path where the files will be unzipped. As a result, the unzipped folders are created in the parent directory. Any ideas?
The code that you have seems to be working fine; you just need to make sure that you are not overwriting names (zipfile is both the module and your variable) and that you are using the correct ones. The following code works perfectly for me:
import os
import zipfile
import glob
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
    zf = os.path.basename(path) #save the zipfile path
    zip_ref = zipfile.ZipFile(path, 'r')
    zip_ref.extractall(path.replace(r'.zip', '')) #unzip to a folder without the .zip extension
Instead of trying to do it in a single statement, it would be much easier and more readable to first get the list of all folders and then get the list of files inside each folder. Example -
import glob
import os.path
for folder in glob.glob("./*"):
    # Name of the parent folder, e.g. "A" for "./A"
    letter = os.path.basename(folder)
    #Using *.zip to only get zip files
    for path in glob.glob(os.path.join(folder, "*.zip")):
        filename = os.path.split(path)[1]
        if letter in filename:
            #Do your logic
            pass
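Putting both answers together, a pathlib-based sketch might look like the following. The function name unzip_in_place is made up, and the rule "the parent folder's name must occur in the zip file's name" is an assumption taken from the naming scheme in the question.

```python
import zipfile
from pathlib import Path


def unzip_in_place(parent_dir):
    """Extract each selected zip next to itself, into a folder named after the zip."""
    extracted = []
    for zip_path in sorted(Path(parent_dir).glob("*/*.zip")):
        # Only handle zips whose name contains the name of their parent folder
        if zip_path.parent.name not in zip_path.stem:
            continue
        target = zip_path.with_suffix("")  # same path without the .zip extension
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(target)
        extracted.append(target)
    return extracted
```

Because the target is derived from the full path of the zip file, the extracted folder ends up inside A, B, C or D rather than in the parent directory.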
