Sorting through many files and unzipping them at once - python

I have a program that currently unzips files in a specific folder. However, I have many files in t that need to be sorted through. In my trades folder there are many coins and each coin has many files. I can get the files in each coin but cannot go through the folders initially for each coin. I need to be able to go through all those files without manually changing the directory to each coin.
This is how the files are set up
import os
import zipfile
dir_name = 'E:/binance-public-data/python/data/spot/monthly/trades'
extension = ".zip"
for item in os.listdir(dir_name):
if item.endswith(extension):
file_name = os.path.abspath(item)
zip_ref = zipfile.ZipFile(file_name)
zip_ref.extractall(dir_name)
zip_ref.close()
os.remove(file_name)

os.listdir gives you the file name in the directory. You would have to rebuild its path from your current directory before accessing the file. os.path.abspath(item) created an absolute path from your current working directory, not the directory holding your zip file.
You can use glob instead. It will filter out non-zip file extensions and will keep the relative path to the found file so you don't have to fiddle with the path. The pathlib module makes path handling a bit easier than glob.glob and os.path, so I'll use it here.
import zipfile
from pathlib import Path
dir_name = Path('E:/binance-public-data/python/data/spot/monthly/trades')
extension = ".zip"
for file_name in dir_name.glob("*/*" + extension):
with zipfile.ZipFile(file_name) as zip_ref:
zip_ref.extractall(file_name.parent)
file_name.unlink()

Related

Python script to move specific filetypes from the all directories to one folder

I'm trying to write a python script to move all music files from my whole pc to one spcific folder.
They are scattered everywhere and I want to get them all in one place, so I don't want to copy but completely move them.
I was already able to make a list of all the files with this script:
import os
targetfiles = []
extensions = (".mp3", ".wav", ".flac")
for root, dirs, files in os.walk('/'):
for file in files:
if file.endswith(extensions):
targetfiles.append(os.path.join(root, file))
print(targetfiles)
This prints out a nice list of all the files but I'm stuck to now move them.
I did many diffent tries with different code and this was one of them:
import os
import shutil
targetfiles = []
extensions = (".mp3", ".wav", ".flac")
for root, dirs, files in os.walk('/'):
for file in files:
if file.endswith(extensions):
targetfiles.append(os.path.join(root, file))
new_path = 'C:/Users/Nicolaas/Music/All' + file
shutil.move(targetfiles, new_path)
But everything I try gives me an error:
TypeError: rename: src should be string, bytes or os.PathLike, not list
I think I've met my limit gathering this all as I'm only starting at Python but I would be very grateful if anyone could point me in the right direction!
You are trying to move a list of files to a new location, but the shutil.move function expects a single file as the first argument. To move all the files in the targetfiles list to the new location, you have to use a loop to move each file individually.
for file in targetfiles:
shutil.move(file, new_path)
Also if needed add a trailing slash to the new path 'C:/Users/Nicolaas/Music/All/'
On a sidenote are you sure that moving all files with those extentions is a good idea? I would suggest copying them or having a backup.
Edit:
You can use an if statement to exclude certain folders from being searched.
for root, dirs, files in os.walk('/'):
if any(folder in root for folder in excluded_folders):
continue
for file in files:
if file.endswith(extensions):
targetfiles.append(os.path.join(root, file))
Where excluded_folder is a list of the unwanted folders like: excluded_folders = ['Program Files', 'Windows']
I would suggest using glob for matching:
import glob
def match(extension, root_dir):
return glob.glob(f'**\\*.{extension}', root_dir=root_dir, recursive=True)
root_dirs = ['C:\\Path\\to\\Albums', 'C:\\Path\\to\\dir\\with\\music\\files']
excluded_folders = ['Bieber', 'Eminem']
extensions = ("mp3", "wav", "flac")
targetfiles = [f'{root_dir}\\{file_name}' for root_dir in root_dirs for extension in extensions for file_name in match(extension, root_dir) if not any(excluded_folder in file_name for excluded_folder in excluded_folders)]
Then you can move these files to new_path

Zip each folder (directory) recursively

I am trying to zip each folder on its own in Python. However, the first folder is being zipped and includes all folders within it. Could someone please explain what is going on? Should I not be using shutil for this?
#%% Set path variable
path = r"G:\Folder"
os.chdir(path)
os.getcwd()
#%% Zip all folders
def retrieve_file_paths(dirName):
# File paths variable
filePaths = []
# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
for filename in directories:
# Createthe full filepath by using os module
filePath = os.path.join(root, filename)
filePaths.append(filePath)
# return all paths
return filePaths
filepaths = retrieve_file_paths(path)
#%% Print folders and start zipping individually
for x in filepaths:
print(x)
shutil.make_archive(x, 'zip', path)
shutil.make_archive will make an archive of all files and subfolders - since this is what most people want. If you need more choice of what files are included, you must use zipfile directly.
You can do this right within the walk loop (that is what it's for).
import os
import zipfile
dirName = 'C:\...'
# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
zf = zipfile.ZipFile(os.path.join(root, "thisdir.zip"), "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
for name in files:
if name == 'thisdir.zip': continue
filePath = os.path.join(root, name)
zf.write(filePath, arcname=name)
zf.close()
This will create a file "thisdir.zip" in each subdirectory, containing only the files within this directory.
(edit: tested & corrected code example)
Following Torben's answer to my question, I modified the code to zip each directory recursively. I realised what had happened was that I was not specifying sub directories. Code below:
#Set path variable
path = r"insert root directory here"
os.chdir(path)
# Declare the functionto return all file paths in selected directory
def retrieve_file_paths(dirName):
for root, dirs, files in os.walk(dirName):
for dir in dirs:
zf = zipfile.ZipFile(os.path.join(root+dir, root+dir+'.zip'), "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
files = os.listdir(root+dir)
print(files)
filePaths.append(files)
for f in files:
filepath = root + dir +'/'+ f
zf.write(filepath, arcname=f)
zf.close()
retrieve_file_paths(path)
it's a relativly simple answer once you got a look onto the Docs.
You can see the following under shutil.make_archive:
Note This function is not thread-safe.
The way threading in computing works on a high level basis:
On a machine there are cores, which can process data. (e.g. AMD Ryzen 5 5800x with 8cores)
Within a process, there are threads (e.g. 16 Threads on the Ryzen 5800X).
However, in multiprocessing there is no data shared between the processes.
In multithreading within one process you can access data from the same variable.
Because this function is not thread-safe, you will share the variable "x" and access the same item. Which means there can only be one output.
Have a look into multithreading and works with locks in order to sequelize threads.
Cheers

Search a folder and sub folders for files starting with criteria

I have a folder "c:\test" , the folder "test" contains many sub folders and files (.xml, .wav). I need to search all folders for files in the test folder and all sub-folders, starting with the number 4 and being 7 characters long in it and copy these files to another folder called 'c:\test.copy' using python. any other files need to be ignored.
So far i can copy the files starting with a 4 but not structure to the new folder using the following,
from glob import glob
import os, shutil
root_src_dir = r'C:/test' #Path of the source directory
root_dst_dir = 'c:/test.copy' #Path to the destination directory
for file in glob('c:/test/**/4*.*'):
shutil.copy(file, root_dst_dir)
any help would be most welcome
You can use os.walk:
import os
import shutil
root_src_dir = r'C:/test' #Path of the source directory
root_dst_dir = 'c:/test.copy' #Path to the destination directory
for root, _, files in os.walk(root_src_dir):
for file in files:
if file.startswith("4") and len(file) == 7:
shutil.copy(os.path.join(root, file), root_dst_dir)
If, by 7 characters, you mean 7 characters without the file extension, then replace len(file) == 7 with len(os.path.splitext(file)[0]) == 7.
This can be done using the os and shutil modules:
import os
import shutil
Firstly, we need to establish the source and destination paths. source should the be the directory you are copying and destination should be the directory you want to copy into.
source = r"/root/path/to/source"
destination = r"/root/path/to/destination"
Next, we have to check if the destination path exists because shutil.copytree() will raise a FileExistsError if the destination path already exists. If it does already exist, we can remove the tree and duplicate it again. You can think of this block of code as simply refreshing the duplicate directory:
if os.path.exists(destination):
shutil.rmtree(destination)
shutil.copytree(source, destination)
Then, we can use os.walk to recursively navigate the entire directory, including subdirectories:
for path, _, files in os.walk(destination):
for file in files:
if not file.startswith("4") and len(os.path.splitext(file)[0]) != 7:
os.remove(os.path.join(path, file))
if not os.listdir(path):
os.rmdir(path)
We then can loop through the files in each directory and check if the file does not meet your condition (starts with "4" and has a length of 7). If it does not meet the condition, we simply remove it from the directory using os.remove.
The final if-statement checks if the directory is now empty. If the directory is empty after removing the files, we simply delete that directory using os.rmdir.

Python: Unzip selected files in directory tree

I have the following directory, in the parent dir there are several folders lets say ABCD and within each folder many zips with names as displayed and the letter of the parent folder included in the name along with other info:
-parent--A-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
-xxxAxxxx_timestamp.zip
--B-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
-xxxBxxxx_timestamp.zip
--C-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
-xxxCxxxx_timestamp.zip
--D-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
-xxxDxxxx_timestamp.zip
I need to unzip only selected zips in this tree and place them in the same directory with the same name without the .zip extension.
Output:
-parent--A-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
-xxxAxxxx_timestamp
--B-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
-xxxBxxxx_timestamp
--C-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
-xxxCxxxx_timestamp
--D-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
-xxxDxxxx_timestamp
My effort:
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
zipfile=os.path.basename(path) #save the zipfile path
zip_ref=zipfile.ZipFile(path, 'r')
zip_ref=extractall(zipfile.replace(r'.zip', '')) #unzip to a folder without the .zip extension
The problem is that i dont know how to save the A,B,C,D etc to include them in the path where the files will be unzipped. Thus, the unzipped folders are created in the parent directory. Any ideas?
The code that you have seems to be working fine, you just to make sure that you are not overriding variable names and using the correct ones. The following code works perfectly for me
import os
import zipfile
import glob
for path in glob.glob('./*/xxx*xxxx*'): ##walk the dir tree and find the files of interest
zf = os.path.basename(path) #save the zipfile path
zip_ref = zipfile.ZipFile(path, 'r')
zip_ref.extractall(path.replace(r'.zip', '')) #unzip to a folder without the .zip extension
Instead of trying to do it in a single statement , it would be much easier and more readable to do it by first getting list of all folders and then get list of files inside each folder. Example -
import os.path
for folder in glob.glob("./*"):
#Using *.zip to only get zip files
for path in glob.glob(os.path.join(".",folder,"*.zip")):
filename = os.path.split(path)[1]
if folder in filename:
#Do your logic

Find files with extension in directory and subdirectories?

I want to make a Python script to quickly organize my files on my desktop into folders based on extension. Basically, how could I use a loop to take a file, do something to it, move on to the next file, and so on?
Everything you need is probably contained in the os library, more specifically in the os.path bit of it and the shutil one.
To explore a directory tree you can use os.walk and to move files around you can use shutil.move.
EDIT: a small script I hacked together to get you going:
import os
import shutil as sh
from collections import defaultdict
DESKTOP = '/home/mac/Desktop'
#This dictionary will contain: <extension>: <list_of_files> mappings
register = defaultdict(list)
#Populate the register
for dir_, dirs, fnames in os.walk('/home/mac/Desktop'):
for fname in fnames:
register[fname.rsplit('.', 1)[1]].append(os.path.join(dir_, fname))
#Iterate over the register, creating the directory and moving the files
#with that extension in it.
for dirname, files in register.iteritems():
dirname = os.path.join(DESKTOP, dirname)
if not os.path.exists(dirname):
os.makedirs(dirname)
for file_ in files:
sh.move(file_, dirname)
I would recommend os.walk from the module os and list of the filenames

Categories