Zip each folder (directory) recursively - python

I am trying to zip each folder on its own in Python. However, the first folder is being zipped and includes all folders within it. Could someone please explain what is going on? Should I not be using shutil for this?
import os
import shutil

#%% Set path variable
path = r"G:\Folder"
os.chdir(path)
os.getcwd()

#%% Zip all folders
def retrieve_file_paths(dirName):
    # File paths variable
    filePaths = []
    # Read all directory, subdirectory and file lists
    for root, directories, files in os.walk(dirName):
        for filename in directories:
            # Create the full file path by using the os module
            filePath = os.path.join(root, filename)
            filePaths.append(filePath)
    # Return all paths
    return filePaths

filepaths = retrieve_file_paths(path)

#%% Print folders and start zipping individually
for x in filepaths:
    print(x)
    shutil.make_archive(x, 'zip', path)

shutil.make_archive will make an archive of all files and subfolders - since this is what most people want. If you need more choice of what files are included, you must use zipfile directly.
You can do this right within the walk loop (that is what it's for).
import os
import zipfile

dirName = 'C:\...'

# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
    zf = zipfile.ZipFile(os.path.join(root, "thisdir.zip"), "w",
                         compression=zipfile.ZIP_DEFLATED, compresslevel=9)
    for name in files:
        if name == 'thisdir.zip':
            continue
        filePath = os.path.join(root, name)
        zf.write(filePath, arcname=name)
    zf.close()
This will create a file "thisdir.zip" in each subdirectory, containing only the files within this directory.
(edit: tested & corrected code example)

Following Torben's answer to my question, I modified the code to zip each directory recursively. I realised that the problem was that I was not specifying the subdirectories. Code below:
import os
import zipfile

# Set path variable
path = r"insert root directory here"
os.chdir(path)

# Declare the function to zip every directory found under the selected directory
def retrieve_file_paths(dirName):
    for root, dirs, files in os.walk(dirName):
        for dir in dirs:
            dirPath = os.path.join(root, dir)
            # Create <dir>.zip inside the directory being zipped
            zf = zipfile.ZipFile(os.path.join(dirPath, dir + '.zip'), "w",
                                 compression=zipfile.ZIP_DEFLATED, compresslevel=9)
            files = os.listdir(dirPath)
            print(files)
            for f in files:
                if f.endswith('.zip'):
                    continue  # don't add the archive to itself
                zf.write(os.path.join(dirPath, f), arcname=f)
            zf.close()

retrieve_file_paths(path)
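For comparison, zipping each top-level folder on its own can also be done with shutil.make_archive by passing base_dir, which restricts the archive to a single subdirectory. A minimal sketch, assuming the same placeholder root as above:

import os
import shutil

path = r"insert root directory here"

# Zip each immediate subdirectory of `path` into <subdir>.zip next to it
for entry in os.listdir(path):
    if os.path.isdir(os.path.join(path, entry)):
        # base_dir limits the archive to just this one subdirectory
        shutil.make_archive(os.path.join(path, entry), 'zip',
                            root_dir=path, base_dir=entry)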

It's a relatively simple answer once you take a look at the docs.
You can see the following under shutil.make_archive:
Note: This function is not thread-safe.
At a high level, threading works like this:
A machine has cores, which process data (e.g. an AMD Ryzen 7 5800X with 8 cores).
Within a process there are threads (e.g. 16 threads on the Ryzen 5800X).
However, in multiprocessing no data is shared between the processes.
In multithreading within one process, you can access data through the same variables.
Because this function is not thread-safe, the threads share the variable "x" and access the same item, which means there can only be one output.
Have a look into multithreading and work with locks in order to serialize threads.
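For illustration only, a hedged sketch of what serialising the archive calls with a lock could look like if the folders were being zipped from a thread pool (the root path and folder list here are assumptions, not taken from the question):

import os
import shutil
import threading
from concurrent.futures import ThreadPoolExecutor

path = r"G:\Folder"        # assumed root directory
lock = threading.Lock()    # one lock shared by all worker threads

def zip_folder(folder):
    # shutil.make_archive is documented as not thread-safe, so only one
    # thread at a time is allowed to call it
    with lock:
        shutil.make_archive(os.path.join(path, folder), 'zip',
                            root_dir=path, base_dir=folder)

folders = [d for d in os.listdir(path) if os.path.isdir(os.path.join(path, d))]
with ThreadPoolExecutor() as pool:
    list(pool.map(zip_folder, folders))   # consume the iterator so errors surface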
Cheers

Related

Python script to move specific filetypes from all directories to one folder

I'm trying to write a python script to move all music files from my whole PC to one specific folder.
They are scattered everywhere and I want to get them all in one place, so I don't want to copy but completely move them.
I was already able to make a list of all the files with this script:
import os

targetfiles = []
extensions = (".mp3", ".wav", ".flac")

for root, dirs, files in os.walk('/'):
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))

print(targetfiles)
This prints out a nice list of all the files, but I'm stuck on how to move them now.
I made many different attempts with different code, and this was one of them:
import os
import shutil

targetfiles = []
extensions = (".mp3", ".wav", ".flac")

for root, dirs, files in os.walk('/'):
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))

new_path = 'C:/Users/Nicolaas/Music/All' + file
shutil.move(targetfiles, new_path)
But everything I try gives me an error:
TypeError: rename: src should be string, bytes or os.PathLike, not list
I think I've met my limit gathering this all as I'm only starting at Python but I would be very grateful if anyone could point me in the right direction!
You are trying to move a list of files to a new location, but the shutil.move function expects a single file as the first argument. To move all the files in the targetfiles list to the new location, you have to use a loop to move each file individually.
for file in targetfiles:
    shutil.move(file, new_path)
Also, if needed, add a trailing slash to the new path: 'C:/Users/Nicolaas/Music/All/'.
On a side note, are you sure that moving all files with those extensions is a good idea? I would suggest copying them or keeping a backup.
Edit:
You can use an if statement to exclude certain folders from being searched.
for root, dirs, files in os.walk('/'):
    if any(folder in root for folder in excluded_folders):
        continue
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))
Where excluded_folders is a list of the unwanted folders, like: excluded_folders = ['Program Files', 'Windows']
I would suggest using glob for matching:
import glob

def match(extension, root_dir):
    return glob.glob(f'**\\*.{extension}', root_dir=root_dir, recursive=True)

root_dirs = ['C:\\Path\\to\\Albums', 'C:\\Path\\to\\dir\\with\\music\\files']
excluded_folders = ['Bieber', 'Eminem']
extensions = ("mp3", "wav", "flac")

targetfiles = [f'{root_dir}\\{file_name}'
               for root_dir in root_dirs
               for extension in extensions
               for file_name in match(extension, root_dir)
               if not any(excluded_folder in file_name for excluded_folder in excluded_folders)]
Then you can move these files to new_path
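A minimal sketch of that last step, assuming the destination folder from the earlier answer ('C:/Users/Nicolaas/Music/All') and that it should be created if it does not already exist:

import os
import shutil

new_path = 'C:/Users/Nicolaas/Music/All'
os.makedirs(new_path, exist_ok=True)   # create the destination if needed

for file in targetfiles:
    # shutil.move expects a single source path, so move each file individually
    shutil.move(file, os.path.join(new_path, os.path.basename(file)))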

Iterate Through all Folders in a Drive - A Legacy Storage Option Migration to Cloud

I have a folder structure similar to the following:
This structure is used to store images.
New images are appended to the deepest available directory.
A directory can hold a maximum of 100 images.
Examples:
The first 100 images added will have the path:
X:\Images\DB\0\0\0\0\0\0\image_name.jpg
A random image may have the path:
X:\Images\DB\0\2\1\4\2\7\image_name.jpg
The last 100 images added will have the path:
X:\Images\DB\0\9\9\9\9\9\image_name.jpg
N.B. An image is only ever stored at the deepest possible directory.
X:\Images\DB\0\x\x\x\x\x\IMAGES_HERE
E.G. There are no images stored in: X:\Images\DB\0\1\2\3
N.B. The deepest folder path to an image only exists if an image is stored there. Example:
X:\Images\DB\0\9\9\9\9\9
... may not exist (and it doesn't in my case).
What I want to achieve is, beginning at the root directory, navigate through every possible path to the images and run a command.
I'm aware the running time for this will be on the order of hours, if not days. It's a legacy storage option, with the command migrating images to the cloud.
I have already managed to code some functions to allow me to travel to the current deepest directory and execute a command, but visiting all possible paths adds a complexity I'm struggling with - also I'm new to Python.
Here is the code:
import os
import subprocess

# File generator
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file

# Latest deepest directory
def get_deepest_dir(dir):
    current_dir = dir
    next_dir = os.listdir(current_dir)[-1]
    if len(list(files(current_dir))) == 0:
        next_dir = os.path.join(current_dir, next_dir)
        return get_deepest_dir(next_dir)
    else:
        return current_dir

# Perform command
def sync():
    dir = get_deepest_dir(root_dir)
    command = "<command_here>"
    subprocess.Popen(command, shell=True)
I used the following to search for csv / pdf files. I've left an example of what I wrote to search through all folders.
os.listdir -
The os.listdir() method in Python is used to get the list of all files and directories in the specified directory.
os.walk -
The os.walk() method in Python is used to generate the file names in a directory tree by walking the tree either top-down or bottom-up.
# Import Python modules
import os, time
import pandas as pd

## Search folder
##src_path = "/Users/folder1/test/"
src_path = "/Users/folder1/"
path = src_path

files = os.listdir(path)
for f in files:
    if f.endswith('.csv'):
        print(f)

for root, directories, files in os.walk(path, topdown=False):
    for name in files:
        if name.endswith('.csv'):
            print(os.path.join(root, name))
    ## for name in directories:
    ##     print(os.path.join(root, name))

for root, directories, files in os.walk(path):
    for name in files:
        if name.endswith('.pdf'):
            print(os.path.join(root, name))
    ## for name in directories:
    ##     print(os.path.join(root, name))
Thanks to #NeoTheNerd above for the solution.
The adapted code which worked for me is here.
import os

def all_dirs(path):
    for root, directories, files in os.walk(path, topdown=False):
        if sum(c.isdigit() for c in root) == 6:
            print("Migrating Images From {}".format(root))

all_dirs("X:\\Images\\DB\\0")
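If the goal is to run the migration command in each of those deepest directories rather than just print them, a hedged sketch could look like this (the command string is a placeholder, as in the question):

import os
import subprocess

def migrate_all_dirs(path):
    for root, directories, files in os.walk(path, topdown=False):
        # six digits in the path means we are at the deepest image level
        if sum(c.isdigit() for c in root) == 6:
            print("Migrating Images From {}".format(root))
            # placeholder command, e.g. a cloud-upload CLI invocation
            subprocess.run("<command_here>", shell=True, cwd=root)

migrate_all_dirs("X:\\Images\\DB\\0")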

Can't get absolute path in Python

I've tried to use os.path.abspath(file) as well as Path.absolute(file) to get the paths of .png files I'm working on, which are on a separate drive from the project folder that the code is in. The result from the following script is "Project Folder for the code/filename.png", whereas what I need is the path to the folder that the .png is in:
for root, dirs, files in os.walk(newpath):
    for file in files:
        if not file.startswith("."):
            if file.endswith(".png"):
                number, scansize, letter = file.split("-")
                filepath = os.path.abspath(file)
                # replace weird backslash effects
                correctedpath = filepath.replace(os.sep, "/")
                newentry = [number, file, correctedpath]
                textures.append(newentry)
I've read other answers on here that seem to suggest that the project file for the code can't be in the same directory as the folder that is being worked on. But that isn't the case here. Can someone kindly point out what I'm not getting? I need the absolute path because the purpose of the program will be to write the paths for the files into text files.
You could use pathlib.Path.rglob here to recursively get all the pngs:
As a list comprehension:
from pathlib import Path
search_dir = "/path/to/search/dir"
# This creates a list of tuples with `number` and the resolved path
paths = [(p.name.split("-")[0], p.resolve()) for p in Path(search_dir).rglob("*.png")]
Alternatively, you can process them in a loop:
paths = []
for p in Path(search_dir).rglob("*.png"):
    number, scansize, letter = p.name.split("-")
    # more processing ...
    paths.append([number, p.resolve()])
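Since the question mentions that the paths will eventually be written into text files, a small follow-up sketch of that step (the output file name here is an assumption):

# Write one "number <tab> path" line per texture to a text file
with open("texture_paths.txt", "w", encoding="utf-8") as out:
    for number, resolved in paths:
        out.write(f"{number}\t{resolved}\n")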
I just recently wrote something like what you're looking for.
This code relies on the assumption that your files are at the end of the path;
it's not suitable for finding a directory or anything like that,
and there's no need for a nested loop.
import os

DIR = "your/full/path/to/directory/containing/desired/files"

def get_file_path(name, template):
    """
    :param template: file's template (txt, html...)
    :return: The path to the given file.
    :rtype: str
    """
    substring = f'{name}.{template}'
    for path in os.listdir(DIR):
        full_path = os.path.join(DIR, path)
        if full_path.endswith(substring):
            return full_path
The result from
for root, dirs, files in os.walk(newpath):
is that files just contains the filenames without a directory path. Using just the filenames means that Python resolves them relative to the current working directory (your project folder). In your case the files are in newpath. You can use os.path.join to add the directory path to the found filenames.
filepath = os.path.join(newpath, file)
In case you want to find the png files in subdirectories the easiest way is to use glob:
import glob

newpath = r'D:\Images'
file_paths = glob.glob(newpath + "/**/*.png", recursive=True)
for file_path in file_paths:
    print(file_path)

Python os.walk Include only specific folders

I am writing a Python script that takes user input in the form of a date, e.g. 20180829, which will be a subdirectory name. It then uses the os.walk function to walk through a specific directory, and once it reaches the directory that is passed in, it jumps inside, looks at all the directories within it, and creates a directory structure in a different location.
My directory structure will look something like this:
|dir1
|-----|dir2|
|-----------|dir3
|-----------|20180829
|-----------|20180828
|-----------|20180827
|-----------|20180826
So dir3 will have a number of subfolders, which will all be in the format of a date. I need to be able to copy the directory structure of just the directory that is passed in at the start, e.g. 20180829, and skip the rest of the directories.
I have been looking online for a way to do this, but all I can find is ways of excluding directories from the os.walk function, like in the thread below:
Filtering os.walk() dirs and files
I also found a thread that allows me to print out the directory paths that I want but will not let me create the directories I want:
Python 3.5 OS.Walk for selected folders and include their subfolders.
The following is the code I have. It prints out the correct directory structure, but it creates the entire directory structure in the new location, which I don't want it to do.
includes = '20180828'
inputpath = Desktop
outputpath = Documents

for startFilePath, dirnames, filenames in os.walk(inputpath, topdown=True):
    endFilePath = os.path.join(outputpath, startFilePath)
    if not os.path.isdir(endFilePath):
        os.mkdir(endFilePath)
    for filename in filenames:
        if (includes in startFilePath):
            print(includes, "+++", startFilePath)
            break
I am not sure if I understand what you need, but I think you are overcomplicating a few things. If the code below doesn't help you, let me know and we will think about other approaches.
I ran this to create an example like yours.
# Setup example project structure
import os
import sys

PLATFORM = 'windows' if sys.platform.startswith('win') else 'linux'
DESKTOP_DIR = \
    os.path.join(os.path.join(os.path.expanduser('~')), 'Desktop') \
    if PLATFORM == 'linux' \
    else os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop')

example_dirs = ['20180829', '20180828', '20180827', '20180826']
for _dir in example_dirs:
    path = os.path.join(DESKTOP_DIR, 'dir_from', 'dir_1', 'dir_2', 'dir_3', _dir)
    os.makedirs(path, exist_ok=True)
And here's what you need.
# Do what you want to do
dir_from = os.path.join(DESKTOP_DIR, 'dir_from')
dir_to = os.path.join(DESKTOP_DIR, 'dir_to')
target = '20180828'

for root, dirs, files in os.walk(dir_from, topdown=True):
    for _dir in dirs:
        if _dir == target:
            path = os.path.join(root, _dir).replace(dir_from, dir_to)
            os.makedirs(path, exist_ok=True)
            continue
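If you also need the files inside the matched date directory rather than just the recreated structure, a hedged variation using shutil.copytree (dirs_exist_ok requires Python 3.8+) could look like this:

import shutil

for root, dirs, files in os.walk(dir_from, topdown=True):
    for _dir in dirs:
        if _dir == target:
            src = os.path.join(root, _dir)
            dst = src.replace(dir_from, dir_to)
            # copy the matched directory and everything inside it
            shutil.copytree(src, dst, dirs_exist_ok=True)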

How to run a script for all files in a folder/directory

I am new to Python. I have successfully written a script to search for something within a file using:
open(r"C:\file.txt") and the re.search function, and it all works fine.
Is there a way to run the search over all files within a folder? Currently, I have to manually change the file name in my script to open(r"C:\file.txt"), open(r"C:\file1.txt"), open(r"C:\file2.txt"), etc.
Thanks.
You can use os.walk to check all the files, as follows:
import os

for root, _, files in os.walk(path):
    for filename in files:
        with open(os.path.join(root, filename), 'r') as f:
            pass  # your code goes here
Explanation:
os.walk returns tuples of (root path, dir names, file names) for each folder, so you can iterate through the filenames and open each file by using os.path.join(root, filename), which joins the root path with the file name.
Since you're a beginner, I'll give you a simple solution and walk through it.
Import the os module, and use the os.listdir function to create a list of everything in the directory. Then, iterate through the files using a for loop.
Example:
# Importing the os module
import os

# Give the directory you wish to iterate through
my_dir = <your directory - i.e. "C:\Users\bleh\Desktop\files">

# Using os.listdir to create a list of all of the files in dir
dir_list = os.listdir(my_dir)

# Use the for loop to iterate through the list you just created, and open the files
for f in dir_list:
    # Whatever you want to do to all of the files
If you need help on the concepts, refer to the following:
for loops in Python 3: http://www.python-course.eu/python3_for_loop.php
os module library (this has some cool stuff in it): https://docs.python.org/2/library/os.html
Good luck!
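Tying this back to the original question, a small hedged sketch of running re.search over every file in the folder might look like this (the directory and pattern are assumptions):

import os
import re

my_dir = r"C:\some\folder"      # assumed directory
pattern = re.compile(r"TODO")   # assumed search pattern

for f in os.listdir(my_dir):
    full_path = os.path.join(my_dir, f)
    if os.path.isfile(full_path):
        with open(full_path, "r", errors="ignore") as fh:
            if pattern.search(fh.read()):
                print(f"Match found in {full_path}")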
You can use the os.listdir(path) function:
import os

path = '/Users/ricardomartinez/repos/Salary-API'

# List all files in a given PATH
file_list = os.listdir(path)

# If you want to filter by file type
file_list = [file for file in os.listdir(path) if os.path.splitext(file)[1] == '.py']

# In both cases you can iterate over the list and apply the operations
# that you have
for file in file_list:
    print(file)
    # Operations that you want to do over files
