Python walker that does NOT go into specific folders - python

this is my first post, so be gentle. ;)
PROBLEM: I would like to be able to use os.walk as a directory walker, but not go into certain folders. Ex:
Tree:
\Proj1_0
    \Load001
        \lib
        \src
\Proj2_0
    \Load001
        \lib
        \src
    \Load002
        \lib
        \src
I want to show the projects and loads, but not the sub-directories under loads. I can do that using the following code.
import os
for root, subFolders, files in os.walk('.'):
    # root does NOT contain 'Load'
    if root.find('Load') == -1:
        print("\nPROJECT: " + root + "\n")
        for folder in subFolders:
            print(" " + folder)
However, the list is a big list, so I tried using del, but could not get it to work right. The same goes for filtering the list, such as (which I got from another post here):
def my_walk(top_dir, ignore):
    for dirpath, dirnames, filenames in os.walk(top_dir):
        dirnames[:] = [
            dn for dn in dirnames
            if os.path.join(dirpath, dn) not in ignore]
        yield dirpath, dirnames, filenames
list(my_walk('.', 'Load'))
But I could not get the return to work properly, either. I am new to Python and appreciate any help. Thanks!

Try:
dirnames[:] = [
    dn for dn in dirnames
    if ignore not in os.path.join(dirpath, dn)]
You want to keep directories where os.path.join(dirpath, dn) does not contain the string ignore.
By the way, you are right to use dirnames[:] on the left-hand side of the assignment. To prune the directories visited by os.walk, you have to modify the same list dirnames.
dirnames[:] = ... modifies the same list in-place.
dirnames = ... would redirect the name dirnames to a different value.
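Putting the corrected condition back into the generator from the question, a minimal sketch could look like this. The demo tree and the helper name build_demo_tree are made up for illustration. Note that the substring test runs against the whole joined path, so if top_dir itself contained the ignore string, everything would be pruned.

```python
import os
import tempfile


def my_walk(top_dir, ignore):
    """Walk top_dir, skipping any directory whose joined path contains `ignore`."""
    for dirpath, dirnames, filenames in os.walk(top_dir):
        # Slice assignment edits the very list os.walk holds, so pruning takes effect.
        dirnames[:] = [
            dn for dn in dirnames
            if ignore not in os.path.join(dirpath, dn)]
        yield dirpath, dirnames, filenames


def build_demo_tree(base):
    """Hypothetical helper: create a small tree shaped like the one in the question."""
    for sub in ("Proj1_0/Load001/lib", "Proj1_0/Load001/src", "Proj2_0/Load002/lib"):
        os.makedirs(os.path.join(base, *sub.split("/")))


with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    visited = sorted(
        os.path.relpath(dirpath, base) for dirpath, _, _ in my_walk(base, "Load"))
    print(visited)
```

With this filter the Load folders are removed from dirnames before they are yielded, so they are neither listed nor entered.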

You can try the following:
for x in os.walk('.', topdown=True):
    dirpath, dirnames, dirfiles = x
    print(dirpath, dirnames)
    dirnames[:] = filter(lambda x : not x.startswith('Load'), dirnames)
According to help(os.walk), when topdown is True you can modify dirnames in place in order to restrict the search.
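To see why topdown matters, here is a small sketch that applies the same pruning in both modes. The function name walked_dirs and the temporary tree are only for illustration. With topdown=False the subdirectories have already been visited by the time their parent is yielded, so editing dirnames changes nothing.

```python
import os
import tempfile


def walked_dirs(top, topdown):
    """Return relative paths visited while removing 'Load*' names from dirnames."""
    seen = []
    for dirpath, dirnames, _ in os.walk(top, topdown=topdown):
        seen.append(os.path.relpath(dirpath, top))
        dirnames[:] = [d for d in dirnames if not d.startswith("Load")]
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "Proj1_0", "Load001", "lib"))
    pruned = walked_dirs(top, topdown=True)     # Load001 is never entered
    unpruned = walked_dirs(top, topdown=False)  # pruning comes too late
    print(pruned)
    print(unpruned)
```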

BTW, this is what I ended up with...
import os
path = '.'
path = os.path.normpath(path)
res = []
for root, dirs, files in os.walk(path, topdown=True):
    depth = root[len(path) + len(os.path.sep):].count(os.path.sep)
    if depth == 2:
        # We're currently two directories in, so all subdirs have depth 3
        res += [os.path.join(root, d) for d in dirs]
        dirs[:] = []  # Don't recurse any deeper
print(res)
I know this is an old post, but thought I should update it with my answer, in case anyone else finds it useful.

Related

Checking if pairs from zip are correct?

I need your advice on this problem.
I have collected what I need in these two lists: simpl2, astik, with this code:
simpl2 = []
astik = []
for path, subdirs, files in os.walk(rootfolder):
    for name in files:
        if 'sim2.shp' == name:
            simpl2.append(os.path.join(path, name))
        elif 'ASTIK.shp' == name:
            astik.append(os.path.join(path, name))
The code above searches in a rootfolder that contains the folders v1, v2, v3, v4.
So using this:
for i, j in zip(simpl2, astik):
    print(i, j)
gives this:
CONTENT
C:\Users\user\Desktop\pl\v1\exported\sim2.shp C:\Users\user\Desktop\pl\v1\ASTIK\ASTIK.shp
C:\Users\user\Desktop\pl\v2\exported\sim2.shp C:\Users\user\Desktop\pl\v4\ASTIK\ASTIK.shp
Question
How can I ensure that each pair comes from the same folder (like the first row, where both come from v1), and that files which don't match (like the second row, where one is from v2 and the other from v4) are not paired at all?
This matters because the pairs will be used later and they have to be correct. I already have code with an exception for files that don't have a pair, so the only problem is how to fix the pairing described above.
Explanation
The rootfolder is:
C:\Users\user\Desktop\pl
Inside pl there are the folders v1, v2, v3 and v4. Each of these folders has some files that are the same in all 4 folders. The only difference is that some will be empty. I just want to check that the pairs created in the lists come from the same v folder.
Ok, seeing your update maybe you are interested in something more like this:
import os
simpl2 = []
astik = []
rootfolder = r'C:\Users\user\Desktop\pl'
subfolders = [os.path.join(rootfolder, i) for i in ['v1','v2','v3','v4']]
for folder in subfolders:
    temp = {name: os.path.join(path, name)
            for path, subdirs, files in os.walk(folder)
            for name in files
            if name in ['sim2.shp', 'ASTIK.shp']}
    if len(temp) == 2:
        simpl2.append(temp['sim2.shp'])
        astik.append(temp['ASTIK.shp'])
OLD CODE
But... if this is your end goal you could also just store the paths. If both files are in the path then you know the path contains both files. You can then easily build the endpaths with os.path.join() when needed.
paths = []
for path, subdirs, files in os.walk(rootfolder):
    if ('sim2.shp' in files) and ('ASTIK.shp' in files):
        paths.append(path)
Or a more compact format:
lookfor = ['sim2.shp','ASTIK.shp']
paths = [p for p,s,f in os.walk(rootfolder) if all(i in f for i in lookfor)]
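If the two files live in different subfolders of the same v folder (as in the output shown in the question), another option is to key the result by the top-level folder. This is only a sketch: find_pairs and the demo tree are hypothetical names, and it assumes each file name occurs at most once per v folder.

```python
import os
import tempfile


def find_pairs(rootfolder, names=("sim2.shp", "ASTIK.shp")):
    """Map each top-level subfolder to its (sim2, ASTIK) paths, or None if incomplete."""
    pairs = {}
    for top in sorted(os.listdir(rootfolder)):
        top_path = os.path.join(rootfolder, top)
        if not os.path.isdir(top_path):
            continue
        found = {}
        for path, _subdirs, files in os.walk(top_path):
            for name in files:
                if name in names:
                    found[name] = os.path.join(path, name)
        complete = len(found) == len(names)
        pairs[top] = tuple(found[n] for n in names) if complete else None
    return pairs


# Hypothetical demo tree: v1 holds both files, v2 holds only one of them.
with tempfile.TemporaryDirectory() as root:
    for rel in ("v1/exported/sim2.shp", "v1/ASTIK/ASTIK.shp", "v2/exported/sim2.shp"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()
    pairs = find_pairs(root)
    print(sorted(pairs))
```

Folders that map to None are the ones without a complete pair, so they can be routed to the existing exception handling.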

How do you count subdirectories in a folder?

I figured out how to count directories in a folder, but not sure how I could edit my code to recursively count subdirectories. Any help would be appreciated.
This is my code so far.
def nestingLevel(path):
    count = 0
    for item in os.listdir(path):
        if item[0] != '.':
            n = os.path.join(path,item)
            if os.path.isdir(n):
                count += 1 + nestingLevel(n)
    return count
I think you may want to use os.walk:
import os
def fcount(path):
    count1 = 0
    for root, dirs, files in os.walk(path):
        count1 += len(dirs)
    return count1
path = "/home/"
print(fcount(path))
You can use a glob here - the ** pattern indicates a recursive glob. The trailing slash matches on directories, excluding other types of files.
from pathlib import Path
def recursive_subdir_count(path):
    dirs = Path(path).glob('**/')
    result = sum(1 for dir in dirs)
    result -= 1  # discount `path` itself
    return result
Using / works on Windows, macOS, and Linux, so don't worry about putting os.sep instead.
Beware of a weird edge case: shell globs typically exclude hidden directories, i.e. those which begin with a ., but pathlib includes those (it's a feature, not a bug: see issue26096). If you care about discounting hidden directories, filter them out in the expression when calling sum. Or, use the older module glob which excludes them by default.
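As a rough sketch of the filtering mentioned above: the version below skips any directory that sits at or below a dot-directory. It swaps the '**/' pattern for '**/*' plus an is_dir() check, so that the root itself never shows up in the results; the function name visible_subdir_count is made up for this example.

```python
import tempfile
from pathlib import Path


def visible_subdir_count(path):
    """Count subdirectories below path, ignoring anything inside a dot-directory."""
    root = Path(path)
    count = 0
    for entry in root.glob("**/*"):
        if not entry.is_dir():
            continue
        # Skip the directory if any component below root starts with "."
        if any(part.startswith(".") for part in entry.relative_to(root).parts):
            continue
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    for rel in ("a/b", ".git/objects", "a/.cache"):
        Path(tmp, rel).mkdir(parents=True)
    visible = visible_subdir_count(tmp)  # counts a and a/b only
    print(visible)
```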
If you want to count them all without the root, this will do it:
len([i for i, j, k in os.walk('.')])-1

os.listdir analog for a zipped directory

My goal is to list all files contained in a certain sub-directory inside a zip-archive.
os.listdir(target_dir) raises a FileNotFoundError, and zfile.namelist() just lists all the files in all directories.
Any ideas?
Try the following:
files = list(filter(lambda f: f.startswith("subdir"), zfile.namelist()))
print(files)
Explanation: filter filters the list supplied by zfile.namelist() on a lambda that is checking whether the filename starts with "subdir".
In Python 3 the filter function does not return a list but rather a lazy filter object (an iterator), and thus we need to convert it to a list.
You could also use the following line which does the same but uses list comprehension:
files = [f for f in zfile.namelist() if f.startswith("subdir")]
Edit: As pointed out by advance512: "The problem with this solution is that it will also return files in subdirectories inside the subdirectory you're checking.":
files = [f for f in zfile.namelist() if f.startswith("subdir") and f.count("/") == 1]
This will not return any files in sub-sub directories.
You can use the supplied zip_listdir function, which is a bit quick-n-dirty but should always work in Unix clones.
class MockZipFile(object):
    fake_file_names = [
        "string.pyc",               # Top level name
        "test/__init__.pyc",        # Package directory
        "test/test_support.pyc",    # Module test.test_support
        "test/bogus/__init__.pyc",  # Subpackage directory
        "test/bogus/myfile.pyc"     # Submodule test.bogus.myfile
    ]
    def namelist(self):
        return self.fake_file_names
def zip_listdir(zip_file, target_dir):
    file_names = zip_file.namelist()
    # Normalise target_dir to a prefix ending in "/" ("" means the archive root)
    if not target_dir.endswith("/"):
        target_dir += "/"
    if target_dir == "/":
        target_dir = ""
    # Keep names under the prefix that have no further "/" after it
    result = [file_name
              for file_name in file_names
              if file_name.startswith(target_dir) and
              "/" not in file_name[len(target_dir):]]
    return result
mockZipfile = MockZipFile()
print(zip_listdir(zip_file=mockZipfile, target_dir="test"))
print(zip_listdir(zip_file=mockZipfile, target_dir="test/bogus"))
print(zip_listdir(zip_file=mockZipfile, target_dir="test/"))
print(zip_listdir(zip_file=mockZipfile, target_dir="/"))
print(zip_listdir(zip_file=mockZipfile, target_dir=""))
print(zip_listdir(zip_file=mockZipfile, target_dir="/asd"))
Please note I created a MockZipFile class, and am using it as the input for the zip_listdir function, but a proper zipfile object should work exactly the same.
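For completeness, here is a sketch of the same idea run against a real zipfile.ZipFile built in memory. The function body repeats the logic of zip_listdir so the block is self-contained; the member names are made up. It assumes the archive has no explicit directory entries (names ending in "/"), which some zip tools do add.

```python
import io
import zipfile


def zip_listdir(zip_file, target_dir):
    """Return archive members that sit directly inside target_dir."""
    if not target_dir.endswith("/"):
        target_dir += "/"
    if target_dir == "/":
        target_dir = ""  # the archive root has no prefix
    return [
        name for name in zip_file.namelist()
        if name.startswith(target_dir) and "/" not in name[len(target_dir):]
    ]


# Build a small archive in memory instead of touching the disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for name in ("string.pyc", "test/__init__.pyc",
                 "test/test_support.pyc", "test/bogus/myfile.pyc"):
        archive.writestr(name, b"")

with zipfile.ZipFile(buffer) as archive:
    top_level = zip_listdir(archive, "")
    in_test = zip_listdir(archive, "test")
    print(top_level)
    print(in_test)
```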

get all folders (os.walk) that are older than x days, delete

How can I concisely express "get all folders older than x days"?
I have a method getOldDirs(dirPath, olderThanDays). It must walk through a given root folder and return a list of folders that are older than, say, 7 days.
I call the above function from another function, cleanOldFolders(). cleanOldFolders() will delete those folders, similar to "rm -Rf".
This is the code that I have. How can I modify the loops concisely?
def cleanOldFolders(self):
    """
    Clean oldFolders
    """
    pathString = self.folderRoot + '/' + self.configMode + '/' + self.appId
    oldDirList = self.getOldDirs(pathString, 7)
    # Notify user that the following folders are deleted
    # remove all old dirs, perhaps using shutil.rmtree for each folder in oldDirList (like rm -Rf)
    return
Get old dirs:
def getOldDirs(self, dirPath, olderThanDays):
    """
    get all subfolders under dirPath older than olderThanDays
    """
    # What is the concise way of expressing Get me list of all dir/subdirs from "dirPath" that are older than "olderThanDays"
    # I know I have to use os.walk,
    # I want a concise loop like this - but should recurse using os.walk
    a = [os.path.join(dirPath, myfile) for myfile in os.listdir(dirPath)
         if (os.path.isdir(os.path.join(dirPath, myfile)) and
             (self.isOlder(os.path.join(dirPath, myfile), olderThanDays))
             )]
    # for root, dirs, files in os.walk(dirPath):
    #     for name in dirs:
    #         print os.path.join(root, name)
    return a
One of the nice things about os.walk() is that it does the recursing for you. For its usage in your application it's important to specify the optional keyword argument topdown as False because its default is True and os.rmdir() won't delete non-empty directories.
This means your code will need to delete all the files and subdirectories in each subdirectory it encounters before removing the subdirectory itself. To facilitate doing that, the directory list getOldDirs() returns should be in the order that the subdirectories need to be deleted in.
It's also important to note that in the following, the directory's age is calculated in fractional, not whole, days. Seconds count, so a directory that is, say, 6 days, 23 hours, 59 minutes and 59 seconds old won't get put on the list to be deleted, even though it is only about a second away from being old enough.
import os
import time
def getOldDirs(self, dirPath, olderThanDays):
    """
    generate all subfolders under dirPath older than olderThanDays
    """
    olderThanDays *= 86400  # convert days to seconds
    present = time.time()
    for root, dirs, files in os.walk(dirPath, topdown=False):
        for name in dirs:
            subDirPath = os.path.join(root, name)
            if (present - os.path.getmtime(subDirPath)) > olderThanDays:
                yield subDirPath
This should be a starting point.
import os
from time import time as _time
SEVEN_DAYS = 60*60*24*7
def get_old_dirs(dir_path, older_than=SEVEN_DAYS):
    time_now = _time()
    for path, folders, files in os.walk(dir_path):
        for folder in folders:
            folder_path = os.path.join(path, folder)
            if (time_now - os.path.getmtime(folder_path)) > older_than:
                yield folder_path
list_of_folders = list(get_old_dirs("/some/path"))
Also, if you don't want to walk into folders that are older than older_than (because you're going to delete them), you can prune the search tree by removing folder names from the folders list:
def get_old_dirs(dir_path, older_than=SEVEN_DAYS):
    time_now = _time()
    for path, folders, files in os.walk(dir_path):
        # iterate over a copy, because folders is modified inside the loop
        for folder in folders[:]:
            folder_path = os.path.join(path, folder)
            if (time_now - os.path.getmtime(folder_path)) > older_than:
                yield folder_path
                folders.remove(folder)
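To connect this back to the cleanOldFolders() part of the question, a minimal sketch of the deletion step might look like the following. clean_old_folders is a hypothetical name, and shutil.rmtree deletes recursively without asking, so try it on a scratch directory first. The demo backdates a folder with os.utime purely to have something old to delete.

```python
import os
import shutil
import tempfile
import time

SEVEN_DAYS = 60 * 60 * 24 * 7


def get_old_dirs(dir_path, older_than=SEVEN_DAYS):
    """Yield directories older than older_than seconds, without descending into them."""
    time_now = time.time()
    for path, folders, _files in os.walk(dir_path):
        for folder in folders[:]:  # iterate over a copy while pruning
            folder_path = os.path.join(path, folder)
            if (time_now - os.path.getmtime(folder_path)) > older_than:
                folders.remove(folder)
                yield folder_path


def clean_old_folders(dir_path, older_than=SEVEN_DAYS):
    """Delete old directories recursively and return the paths that were removed."""
    removed = []
    for folder_path in get_old_dirs(dir_path, older_than):
        shutil.rmtree(folder_path)
        removed.append(folder_path)
    return removed


with tempfile.TemporaryDirectory() as root:
    old_dir = os.path.join(root, "old")
    os.makedirs(os.path.join(old_dir, "inner"))
    os.makedirs(os.path.join(root, "new"))
    # Backdate the modification time of "old" to eight days ago.
    eight_days_ago = time.time() - 8 * 24 * 60 * 60
    os.utime(old_dir, (eight_days_ago, eight_days_ago))
    removed = [os.path.basename(p) for p in clean_old_folders(root)]
    remaining = sorted(os.listdir(root))
    print(removed, remaining)
```

Because old folders are pruned from the walk, rmtree can remove each one whole and there is no need for a bottom-up traversal.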
This uses os.walk and gets you the list of directories older than 7 days
import os
from datetime import date
old_dirs = []
today = date.today()
# start_path is the root directory to scan; it must be defined beforehand
for root, dirs, files in os.walk(start_path):
    for name in dirs:
        dir_path = os.path.join(root, name)
        filedate = date.fromtimestamp(os.path.getmtime(dir_path))
        if (today - filedate).days > 7:
            old_dirs.append(dir_path)

How to traverse through the files in a directory?

I have a directory logfiles. I want to process each file inside this directory using a Python script.
for file in directory:
    # do something
How do I do this?
With os.listdir() or os.walk(), depending on whether you want to do it recursively.
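A small sketch of both options, assuming a directory laid out like the logfiles folder in the question. The function names files_in and files_under, and the demo files, are made up for illustration.

```python
import os
import tempfile


def files_in(directory):
    """Non-recursive: names of regular files directly inside directory."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name)))


def files_under(directory):
    """Recursive: every file below directory, as paths relative to it."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), directory))
    return sorted(found)


with tempfile.TemporaryDirectory() as logfiles:
    os.makedirs(os.path.join(logfiles, "archive"))
    for rel in ("a.log", os.path.join("archive", "b.log")):
        open(os.path.join(logfiles, rel), "w").close()
    direct = files_in(logfiles)         # only a.log
    everything = files_under(logfiles)  # a.log and archive/b.log
    print(direct)
    print(everything)
```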
In Python 2, you can try something like:
import os.path
def print_it(x, dir_name, files):
    print dir_name
    print files
os.path.walk(your_dir, print_it, 0)
Note: the 3rd argument of os.path.walk is whatever you want. You'll get it as the 1st arg of the callback.
In Python 3 os.path.walk has been removed; use os.walk instead. Instead of taking a callback, you just pass it a directory and it yields (dirpath, dirnames, filenames) triples. So a rough equivalent of the above becomes
import os
for dirpath, dirnames, filenames in os.walk(your_dir):
    print(dirpath)
    print(dirnames)
    print(filenames)
You can list every file from a directory recursively like this.
from os import listdir
from os.path import isfile, join, isdir
def getAllFilesRecursive(root):
    files = [join(root, f) for f in listdir(root) if isfile(join(root, f))]
    dirs = [d for d in listdir(root) if isdir(join(root, d))]
    for d in dirs:
        # The recursive call already returns paths that start with root,
        # so they are added as they are (joining root again would double it
        # for relative roots)
        files.extend(getAllFilesRecursive(join(root, d)))
    return files
import os
# location of directory you want to scan
loc = '/home/sahil/Documents'
# module-level dictionary used to store all results
k1 = {}
# scan recursively goes through all the directories in loc and returns a dictionary
def scan(element, loc):
    for name in element:
        temp = loc + '/' + name
        try:
            second_list = os.listdir(temp)
        except OSError:
            # not a directory (or not readable): skip it
            continue
        print(".....")
        print("Directory %s " % temp)
        print(" ")
        print(second_list)
        k1[temp] = second_list
        scan(second_list, temp)
    return k1  # return the dictionary
# initial steps
try:
    initial_list = os.listdir(loc)
    print(initial_list)
except OSError:
    print("error")
    initial_list = []
k = scan(initial_list, loc)
print(" ...................................................................................")
print(k)
I wrote this code as a directory scanner for a playlist feature in my audio player. It recursively scans all the subdirectories present in the directory.
You could try glob:
import glob
for file in glob.glob('log-*-*.txt'):
    # Etc.
    pass
But glob is not recursive by default (since Python 3.5 it accepts recursive=True together with a ** pattern), so if your logs are in folders inside of that directory and you are on an older version, you'd be better off looking at what Ignacio Vazquez-Abrams posted.
If you need to check for multiple file types, use
glob.glob("*.jpg") + glob.glob("*.png")
Glob doesn't care about the ordering of the files in the list. If you need files sorted by filename, use
sorted(glob.glob("*.jpg"))
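On Python 3.5 and later, glob.glob takes a recursive=True flag, and a ** path component then matches any number of nested directories (including none). Here is a short sketch comparing the flat and the recursive pattern; the file names in the temporary tree are made up.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "2020", "jan"))
    for rel in ("log-1-a.txt",
                os.path.join("2020", "log-2-b.txt"),
                os.path.join("2020", "jan", "log-3-c.txt")):
        open(os.path.join(root, rel), "w").close()

    # Flat pattern: only files directly inside root.
    flat = glob.glob(os.path.join(root, "log-*-*.txt"))
    # Recursive pattern: "**" also descends into subdirectories.
    deep = glob.glob(os.path.join(root, "**", "log-*-*.txt"), recursive=True)

    flat_names = sorted(os.path.basename(p) for p in flat)
    deep_names = sorted(os.path.basename(p) for p in deep)
    print(flat_names)
    print(deep_names)
```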
import os
rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
    # Remove the first entry in the list of sub-directories
    # if there are any sub-directories present
    if len(subdirList) > 0:
        del subdirList[0]
Here's my version of the recursive file walker, based on the answer of Matheus Araujo. It can take optional exclusion list arguments, which is very helpful when dealing with tree copies where some directories / files / file extensions aren't wanted.
import os
def get_files_recursive(root, d_exclude_list=[], f_exclude_list=[], ext_exclude_list=[], primary_root=None):
    """
    Walk a path to recursively find files
    Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
    :param root: path to explore
    :param d_exclude_list: list of root relative directories paths to exclude
    :param f_exclude_list: list of filenames without paths to exclude
    :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
    :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
    :return: list of files found in path
    """
    # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
    d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]
    files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
             and f not in f_exclude_list and os.path.splitext(f)[1] not in ext_exclude_list]
    dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
    for d in dirs:
        p_root = os.path.join(primary_root, d) if primary_root is not None else d
        if p_root not in d_exclude_list:
            files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list, primary_root=p_root)
            # Paths from the recursive call already start with root, so add them unchanged
            files.extend(files_in_d)
    return files
This is an update of my last version that accepts glob style wildcards in exclude lists.
The function basically walks into every subdirectory of the given path and returns the list of all files from those directories, as relative paths.
Function works like Matheus' answer, and may use optional exclude lists.
Eg:
files = get_files_recursive('/some/path')
files = get_files_recursive('/some/path', f_exclude_list=['.cache', '*.bak'])
files = get_files_recursive('C:\\Users', d_exclude_list=['AppData', 'Temp'])
files = get_files_recursive('/some/path', ext_exclude_list=['.log', '.db'])
Hope this helps someone like the initial answer of this thread helped me :)
import os
from fnmatch import fnmatch
def glob_path_match(path, pattern_list):
    """
    Checks if path is in a list of glob style wildcard paths
    :param path: path of file / directory
    :param pattern_list: list of wildcard patterns to check for
    :return: Boolean
    """
    return any(fnmatch(path, pattern) for pattern in pattern_list)
def get_files_recursive(root, d_exclude_list=None, f_exclude_list=None, ext_exclude_list=None, primary_root=None):
    """
    Walk a path to recursively find files
    Modified version of https://stackoverflow.com/a/24771959/2635443 that includes exclusion lists
    and accepts glob style wildcards on files and directories
    :param root: path to explore
    :param d_exclude_list: list of root relative directories paths to exclude
    :param f_exclude_list: list of filenames without paths to exclude
    :param ext_exclude_list: list of file extensions to exclude, ex: ['.log', '.bak']
    :param primary_root: Only used for internal recursive exclusion lookup, don't pass an argument here
    :return: list of files found in path
    """
    if d_exclude_list is not None:
        # Make sure we use a valid os separator for exclusion lists, this is done recursively :(
        d_exclude_list = [os.path.normpath(d) for d in d_exclude_list]
    else:
        d_exclude_list = []
    if f_exclude_list is None:
        f_exclude_list = []
    if ext_exclude_list is None:
        ext_exclude_list = []
    files = [os.path.join(root, f) for f in os.listdir(root) if os.path.isfile(os.path.join(root, f))
             and not glob_path_match(f, f_exclude_list) and os.path.splitext(f)[1] not in ext_exclude_list]
    dirs = [d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))]
    for d in dirs:
        p_root = os.path.join(primary_root, d) if primary_root is not None else d
        if not glob_path_match(p_root, d_exclude_list):
            files_in_d = get_files_recursive(os.path.join(root, d), d_exclude_list, f_exclude_list, ext_exclude_list,
                                             primary_root=p_root)
            # Paths from the recursive call already start with root, so add them unchanged
            files.extend(files_in_d)
    return files
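To get a feel for how the wildcard matching behaves, here is a tiny sketch that calls the helper on its own. The sample names and patterns are made up. One thing worth knowing is that in fnmatch a * also matches path separators, so a directory pattern like build* matches build/tmp as well.

```python
from fnmatch import fnmatch


def glob_path_match(path, pattern_list):
    """Return True if path matches any glob-style pattern in pattern_list."""
    return any(fnmatch(path, pattern) for pattern in pattern_list)


patterns = ["*.bak", ".cache"]
bak_excluded = glob_path_match("notes.bak", patterns)        # matches "*.bak"
txt_excluded = glob_path_match("notes.txt", patterns)        # matches nothing
nested_excluded = glob_path_match("build/tmp", ["build*"])   # "*" spans the "/"
empty_list = glob_path_match("anything", [])                 # no patterns, no match
print(bak_excluded, txt_excluded, nested_excluded, empty_list)
```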
