Python os.walk: include only specific folders

I am writing a Python script that takes user input in the form of a date, e.g. 20180829, which will be a subdirectory name. It then uses the os.walk function to walk through a specific directory, and once it reaches the directory that was passed in, it goes inside, looks at all the directories within it and creates the same directory structure in a different location.
My directory structure will look something like this:
|dir1
|-----|dir2
|-----------|dir3
|-----------------|20180829
|-----------------|20180828
|-----------------|20180827
|-----------------|20180826
So dir3 will have a number of subfolders, all named in date format. I need to copy the directory structure of only the directory passed in at the start, e.g. 20180829, and skip the rest of the directories.
I have been looking online for a way to do this, but all I can find are ways of excluding directories from os.walk, like in the thread below:
Filtering os.walk() dirs and files
I also found a thread that lets me print out the directory paths I want, but not create the directories I want:
Python 3.5 OS.Walk for selected folders and include their subfolders.
The following is the code I have. It prints out the correct directory structure, but it creates the entire directory structure in the new location, which I don't want.
import os
includes = '20180828'
inputpath = 'Desktop'
outputpath = 'Documents'
for startFilePath, dirnames, filenames in os.walk(inputpath, topdown=True):
    endFilePath = os.path.join(outputpath, startFilePath)
    if not os.path.isdir(endFilePath):
        os.mkdir(endFilePath)
    for filename in filenames:
        if (includes in startFilePath):
            print(includes, "+++", startFilePath)
            break

I am not sure if I understand what you need, but I think you overcomplicate a few things. If the code below doesn't help you, let me know and we will think about other approaches.
I ran this to create an example structure like yours.
# setup example project structure
import os
import sys
PLATFORM = 'windows' if sys.platform.startswith('win') else 'linux'
DESKTOP_DIR = (
    os.path.join(os.path.expanduser('~'), 'Desktop')
    if PLATFORM == 'linux'
    else os.path.join(os.environ['USERPROFILE'], 'Desktop')
)
example_dirs = ['20180829', '20180828', '20180827', '20180826']
for _dir in example_dirs:
    path = os.path.join(DESKTOP_DIR, 'dir_from', 'dir_1', 'dir_2', 'dir_3', _dir)
    os.makedirs(path, exist_ok=True)
And here's what you need.
# do what you want to do
dir_from = os.path.join(DESKTOP_DIR, 'dir_from')
dir_to = os.path.join(DESKTOP_DIR, 'dir_to')
target = '20180828'
for root, dirs, files in os.walk(dir_from, topdown=True):
    for _dir in dirs:
        if _dir == target:
            path = os.path.join(root, _dir).replace(dir_from, dir_to)
            os.makedirs(path, exist_ok=True)
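The loop above recreates the path down to the matching date folder itself, but not the folders inside it. If those sub-folders should be mirrored as well, one possible sketch (assuming only directories, not files, are wanted; mirror_target_dirs is a hypothetical helper name) is to walk just the matching folder once it is found, and to build destination paths with os.path.relpath instead of string replacement:

```python
import os


def mirror_target_dirs(dir_from, dir_to, target):
    """Recreate under dir_to every folder named `target` found in dir_from,
    including all of its sub-folders. Files are not copied."""
    created = []
    for root, dirs, _files in os.walk(dir_from, topdown=True):
        if target not in dirs:
            continue
        src = os.path.join(root, target)
        # Walk only the matching folder and mirror each directory inside it.
        for sub_root, _sub_dirs, _sub_files in os.walk(src):
            dest = os.path.join(dir_to, os.path.relpath(sub_root, dir_from))
            os.makedirs(dest, exist_ok=True)
            created.append(dest)
        # Prune the target so the outer walk does not descend into it again.
        dirs.remove(target)
    return created
```

With the names from the answer above, this would be called as mirror_target_dirs(dir_from, dir_to, '20180828'). Sibling date folders are still visited by the outer walk, but nothing is created for them.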


Zip each folder (directory) recursively

I am trying to zip each folder on its own in Python. However, the first folder is being zipped and includes all folders within it. Could someone please explain what is going on? Should I not be using shutil for this?
import os
import shutil

#%% Set path variable
path = r"G:\Folder"
os.chdir(path)
os.getcwd()
#%% Zip all folders
def retrieve_file_paths(dirName):
    # File paths variable
    filePaths = []
    # Read all directory, subdirectories and file lists
    for root, directories, files in os.walk(dirName):
        for filename in directories:
            # Create the full filepath by using os module
            filePath = os.path.join(root, filename)
            filePaths.append(filePath)
    # return all paths
    return filePaths
filepaths = retrieve_file_paths(path)
#%% Print folders and start zipping individually
for x in filepaths:
    print(x)
    shutil.make_archive(x, 'zip', path)
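shutil.make_archive archives all files and subfolders under its root_dir argument, since this is what most people want. In your code root_dir is always path, so every archive contains the whole folder tree. If you need more control over which files are included, you must use zipfile directly.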
You can do this right within the walk loop (that is what it's for).
import os
import zipfile
dirName = r'C:\...'
# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
    zf = zipfile.ZipFile(os.path.join(root, "thisdir.zip"), "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
    for name in files:
        if name == 'thisdir.zip': continue
        filePath = os.path.join(root, name)
        zf.write(filePath, arcname=name)
    zf.close()
This will create a file "thisdir.zip" in each subdirectory, containing only the files within this directory.
(edit: tested & corrected code example)
Following Torben's answer to my question, I modified the code to zip each directory recursively. I realised that the problem was that I was not specifying subdirectories. Code below:
import os
import zipfile

# Set path variable
path = r"insert root directory here"
os.chdir(path)
# Zip the direct contents of every directory below dirName into <dir>.zip,
# placed next to that directory
def retrieve_file_paths(dirName):
    for root, dirs, files in os.walk(dirName):
        for d in dirs:
            # os.path.join adds the separator that root + d was missing
            dir_path = os.path.join(root, d)
            zf = zipfile.ZipFile(dir_path + '.zip', "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
            files = os.listdir(dir_path)
            print(files)
            for f in files:
                filepath = os.path.join(dir_path, f)
                zf.write(filepath, arcname=f)
            zf.close()
retrieve_file_paths(path)
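If the goal is instead one archive per folder that contains that folder's whole subtree, the original shutil.make_archive call can work once root_dir points at each folder rather than at the shared parent path. A minimal sketch, assuming only the immediate sub-folders of the parent should be archived (zip_each_folder is a hypothetical name):

```python
import os
import shutil


def zip_each_folder(parent):
    """Create <folder>.zip next to each immediate sub-folder of `parent`.
    Each archive holds only that folder's own files and sub-folders."""
    # List the folders first so archives created below are not picked up.
    names = sorted(
        name for name in os.listdir(parent)
        if os.path.isdir(os.path.join(parent, name))
    )
    archives = []
    for name in names:
        folder = os.path.join(parent, name)
        # root_dir=folder makes member paths relative to the folder itself,
        # instead of archiving everything below `parent` each time.
        archives.append(shutil.make_archive(folder, "zip", root_dir=folder))
    return archives
```

For the question's layout this would be zip_each_folder(r"G:\Folder"). Each .zip is written next to its folder, not inside it, so no archive ends up containing another.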
It's a relatively simple answer once you look at the docs.
You can see the following under shutil.make_archive:
Note: This function is not thread-safe.
Here is how threading works at a high level:
A machine has cores, which process data (e.g. an AMD Ryzen 7 5800X with 8 cores).
Each core can run hardware threads (e.g. 16 threads on the Ryzen 7 5800X), and a single process can run several threads.
However, in multiprocessing, processes do not share data with each other by default.
In multithreading, threads within one process can access data from the same variable.
Because this function is not thread-safe, threads would share the variable "x" and access the same item, which means there can only be one output.
Have a look into multithreading and work with locks in order to serialize threads.
Cheers

Finding Subdirectories in Python

I want to find subdirectories in Python for a personal project, with a catch. I imagine I'd use something like os.walk(), but every instance I can find involving it uses a predefined string with the location of the folder to look at. For example, this code
import os
rootdir = 'path/to/dir'
for rootdir, dirs, files in os.walk(rootdir):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))
involves setting a defined rootdir. I do not want this. Instead, I want it to look in the folder the script is located in. If I run code.py in c:/users/me/, it should search all subdirectories of that location. If I move the script to another folder, it should search the subdirectories of that folder. Hope this makes sense.
Scripts can see their own filename in the __file__ attribute. You can use that to find the script's directory and make that the basis of the search.
import os
root = os.path.split(os.path.realpath(__file__))[0]
print(root)
for rootdir, dirs, files in os.walk(root):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))
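The same idea can be sketched with pathlib: Path(__file__).resolve().parent is the script's folder, and rglob("*") filtered with is_dir() yields every subdirectory below it. The helper name list_subdirs is hypothetical; it takes the root as a parameter so it can also be pointed at other folders.

```python
from pathlib import Path


def list_subdirs(root):
    """Return every directory below `root` (recursively), sorted."""
    return sorted(p for p in Path(root).rglob("*") if p.is_dir())
```

Inside a script, for sub in list_subdirs(Path(__file__).resolve().parent): print(sub) would print the subdirectories of wherever the script lives.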

Can't get absolute path in Python

I've tried to use os.path.abspath(file) as well as Path.absolute(file) to get the paths of .png files I'm working on that are on a separate drive from the project folder that the code is in. The result from the following script is "Project Folder for the code/filename.png", whereas obviously what I need is the path to the folder that the .png is in:
for root, dirs, files in os.walk(newpath):
    for file in files:
        if not file.startswith("."):
            if file.endswith(".png"):
                number, scansize, letter = file.split("-")
                filepath = os.path.abspath(file)
                # replace weird backslash effects
                correctedpath = filepath.replace(os.sep, "/")
                newentry = [number, file, correctedpath]
                textures.append(newentry)
I've read other answers on here that seem to suggest that the project file for the code can't be in the same directory as the folder that is being worked on. But that isn't the case here. Can someone kindly point out what I'm not getting? I need the absolute path because the purpose of the program will be to write the paths for the files into text files.
You could use pathlib.Path.rglob here to recursively get all the pngs:
As a list comprehension:
from pathlib import Path
search_dir = "/path/to/search/dir"
# This creates a list of tuples with `number` and the resolved path
paths = [(p.name.split("-")[0], p.resolve()) for p in Path(search_dir).rglob("*.png")]
Alternatively, you can process them in a loop:
paths = []
for p in Path(search_dir).rglob("*.png"):
    number, scansize, letter = p.name.split("-")
    # more processing ...
    paths.append([number, p.resolve()])
I recently wrote something like what you're looking for.
This code assumes that your files are at the end of the path (it matches on the file name and extension).
It's not suitable for finding a directory or anything like that.
There's no need for a nested loop.
import os
DIR = "your/full/path/to/directory/containing/desired/files"
def get_file_path(name, template):
    """
    :param name: the file's name, without extension.
    :param template: the file's extension (txt, html...).
    :return: The path to the given file, or None if it is not found.
    :rtype: str
    """
    substring = f'{name}.{template}'
    # os.listdir only looks directly inside DIR, not in its subfolders.
    for path in os.listdir(DIR):
        full_path = os.path.join(DIR, path)
        if full_path.endswith(substring):
            return full_path
In the loop
for root, dirs, files in os.walk(newpath):
files contains only the bare filenames, without any directory path. When you pass a bare filename to os.path.abspath, Python resolves it against the current working directory, which here is your project folder. The files actually live in root, the folder os.walk is currently visiting (newpath itself or one of its subfolders), so use os.path.join to add that directory to each filename:
filepath = os.path.join(root, file)
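The question's loop, joining each filename with root (the folder os.walk is currently visiting) so that files in subfolders also get the right path, might look like the sketch below. It assumes, as the question's split does, that every name has the form <number>-<scansize>-<letter>.png; collect_png_paths is a hypothetical name, and the forward-slash replacement mirrors the question's own correctedpath step.

```python
import os


def collect_png_paths(newpath):
    """Return [number, filename, absolute_path] for each visible .png under newpath."""
    textures = []
    for root, dirs, files in os.walk(newpath):
        for file in files:
            # Skip hidden files and anything that is not a .png
            if file.startswith(".") or not file.endswith(".png"):
                continue
            number, scansize, letter = file.split("-")
            # Join with root, not the current working directory
            filepath = os.path.abspath(os.path.join(root, file))
            textures.append([number, file, filepath.replace(os.sep, "/")])
    return textures
```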
In case you want to find the png files in subdirectories the easiest way is to use glob:
import glob
newpath = r'D:\Images'
file_paths = glob.glob(newpath + "/**/*.png", recursive=True)
for file_path in file_paths:
    print(file_path)

Python - Print all the directories except one

I have a Python script that prints all the directories under a main directory. What I want is to print all the directories except the old one (which I include in an exclude list).
For that I am using the following script:
include = 'C://Data//'
exclude = ['C:/Data//00_Old']
for root, dirs, files in os.walk(include, topdown=False):
    dirs[:] = [d for d in dirs if d not in exclude]
    for name in dirs:
        directory = os.path.join(root, name)
        print(directory)
Problem is: it is printing all the directories, even the excluded one. What am I doing wrong?
To simplify it even further, you can do:
from pathlib import Path
# I'm assuming this is where all your sub-folders are that you want to filter.
include = 'C:/Data/'
# You don't need the parent 'C:/Data/' because you are looping through the parent folder.
exclude = ['00_Old']
root_folder = Path(include)
for folder in root_folder.iterdir():
    # Compare the folder's name (a str) with the exclude list, not the Path object.
    if folder.is_dir() and folder.name not in exclude:
        print(folder)  # do work
It is better to use the pathlib module for file-system-related requirements. I would suggest trying something like this:
from pathlib import Path
root = Path('C:/Data/')
old = root / '00_Old'
# Recursively get every entry below root, then keep the directories outside 00_Old
entries = list(root.glob('**/*'))
print([x for x in entries if x.is_dir() and x != old and old not in x.parents])
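If you would rather keep os.walk, the original loop needs two changes, sketched below. Pruning dirs in place only takes effect with topdown=True (in bottom-up mode the subdirectories are listed before their parent, so the pruning comes too late), and the entries in dirs are bare folder names, so the exclude list must contain names such as '00_Old' rather than full paths. list_dirs_except is a hypothetical helper; note that it skips a matching name at any depth.

```python
import os


def list_dirs_except(include, exclude_names):
    """Return every directory below `include`, skipping any directory whose
    name is in exclude_names together with everything beneath it."""
    found = []
    # topdown=True is required for the in-place pruning below to take effect.
    for root, dirs, _files in os.walk(include, topdown=True):
        dirs[:] = [d for d in dirs if d not in exclude_names]
        for name in dirs:
            found.append(os.path.join(root, name))
    return found
```

For the question's layout: for directory in list_dirs_except('C:/Data/', {'00_Old'}): print(directory).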

Directory is not being recognized in Python

I'm uploading a zipped folder that contains a folder of text files, but it's not detecting that the folder that is zipped up is a directory. I think it might have something to do with requiring an absolute path in the os.path.isdir call, but can't seem to figure out how to implement that.
zipped = zipfile.ZipFile(request.FILES['content'])
for libitem in zipped.namelist():
    if libitem.startswith('__MACOSX/'):
        continue
    # If it's a directory, open it
    if os.path.isdir(libitem):
        print("You have hit a directory in the zip folder -- we must open it before continuing")
        for item in os.listdir(libitem):
The file you've uploaded is a single zip file which is simply a container for other files and directories. All of the Python os.path functions operate on files on your local file system which means you must first extract the contents of your zip before you can use os.path or os.listdir.
Unfortunately, in Python 2 the ZipFile object offers no direct way to tell whether an entry is a file or a directory; the usual hint is that directory entries' names end with '/'.
A rewrite of your code which does an extract first may look something like this:
import os
import shutil
import tempfile
import zipfile
# Create a temporary directory into which we can extract zip contents.
tmpdir = tempfile.mkdtemp()
try:
    with zipfile.ZipFile(request.FILES['content']) as zipped:
        zipped.extractall(tmpdir)
    # Walk through the extracted directory structure doing what you
    # want with each file.
    for (dirpath, dirnames, filenames) in os.walk(tmpdir):
        # Look into subdirectories?
        for dirname in dirnames:
            full_dir_path = os.path.join(dirpath, dirname)
            # Do stuff in this directory
        for filename in filenames:
            full_file_path = os.path.join(dirpath, filename)
            # Do stuff with this file.
finally:
    # Clean up the temporary directory recursively.
    shutil.rmtree(tmpdir)
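On Python 3.6 and later, directory entries can also be told apart without extracting anything, using ZipFile.infolist() and ZipInfo.is_dir(). A sketch, assuming the upload can be passed to zipfile.ZipFile as a file-like object, as the question's own code already does; split_zip_entries is a hypothetical name. Keep in mind that some archives contain no explicit directory entries at all, in which case folders only show up as prefixes of file names.

```python
import zipfile


def split_zip_entries(zip_source):
    """Return (directories, files): the entry names in a zip archive,
    split by ZipInfo.is_dir(), without extracting anything to disk."""
    dirs, files = [], []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            # Skip macOS metadata entries, as the question's code does.
            if info.filename.startswith("__MACOSX/"):
                continue
            if info.is_dir():
                dirs.append(info.filename)
            else:
                files.append(info.filename)
    return dirs, files
```

In the question's view this would be dirs, files = split_zip_entries(request.FILES['content']). To read a file's contents afterwards, zf.read(name) or zf.open(name) works on the archive directly.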
Usually, to make scripts handle relative paths etc. correctly, you'd want to use os.path.
It seems to me that you're reading item names from the ZipFile without having actually unzipped it, so why would you expect the files/dirs to exist on disk?
Usually I'd print os.getcwd() to find out where I am, and also use os.path.join to join with the root of the data directory. Whether that is the same as the directory containing the script, I can't tell; you can get the script's directory with something like scriptdir = os.path.dirname(os.path.abspath(__file__)).
I'd expect you would have to do something like:
libitempath = os.path.join(scriptdir, libitem)
if os.path.isdir(libitempath):
    ...
But I'm guessing at what you're doing as it's a little unclear for me.
