I want to walk a few directories once, and just grab the info for one dir. Currently I use:
i = 0
for root, dirs, files in os.walk(home_path):
    if i >= 1:
        return 1
    i += 1
    for this_dir in dirs:
        pass  # do stuff
This is horribly tedious, of course. When I want to walk the subdirectory under it, I repeat the same five lines with j, and so on.
What is the shortest way to grab all dirs and files directly underneath a single directory in Python?
You can empty the dirs list and os.walk() won't recurse:
import os

for root, dirs, files in os.walk(home_path):
    for name in dirs:
        pass  # do something with each directory
    dirs[:] = []  # clear the list in place so os.walk() does not recurse
Note the dirs[:] = slice assignment; we are replacing the elements in dirs (and not the list referred to by dirs) so that os.walk() will not process deleted directories.
This only works if you leave the topdown keyword argument set to True. From the documentation of os.walk():
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
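If only the top level is needed, another option (a variation, not part of the answer above) is to take just the first tuple that os.walk() yields and never resume the generator. The sketch below assumes a hypothetical helper name, top_level_entries, and falls back to empty lists when the path cannot be read.

```python
import os


def top_level_entries(path):
    """Return (dirs, files) found directly inside path, without recursing."""
    # os.walk() yields the top directory first, so a single next() is enough.
    # The default covers a missing or unreadable path, for which os.walk()
    # yields nothing at all.
    _root, dirs, files = next(os.walk(path), (path, [], []))
    return dirs, files
```

Because the generator is never advanced a second time, no subdirectory is ever scanned.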
Alternatively, use os.listdir() and filter the names out into directories and files yourself:
import os

dirs = []
files = []
for name in os.listdir(home_path):
    path = os.path.join(home_path, name)
    if os.path.isdir(path):
        dirs.append(name)
    else:
        files.append(name)
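On Python 3.6 or later, the same split can be sketched with os.scandir(), which returns entry objects that already know whether they are directories. The function name split_entries is an assumption made for illustration, and the sorting is only there to make the result deterministic.

```python
import os


def split_entries(path):
    """Return (dirs, files) for the entries directly inside path."""
    dirs, files = [], []
    with os.scandir(path) as entries:
        for entry in entries:
            # is_dir() follows symlinks by default, like os.path.isdir().
            (dirs if entry.is_dir() else files).append(entry.name)
    return sorted(dirs), sorted(files)
```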
I need to stop os.walk from going down further if the path contains both "release" and "arm-linux". I have a bunch of these at different levels of directories, so I can't simply dictate the level. So far I have the following, and it unnecessarily dives past directories in 'arm-linux'.
def main(argv):
    for root, dirs, files in os.walk("."):
        path = root.split(os.sep)
        if "release" and "arm-linux" in path:
            print(os.path.abspath(root))
            getSharedLib(argv)
[update] This is my solution
def main(argv):
    for root, dirs, files in os.walk("."):
        path = root.split(os.sep)
        if "release" in path and "arm-linux" in path:
            print(os.path.abspath(root))
            getSharedLib(argv)
            del dirs[:]
From the documentation
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames;
Note that topdown is True by default.
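A minimal sketch of that pruning, written as a reusable function. The name find_matching_dirs and the required parameter are hypothetical; the function only collects the matching directories and leaves out the getSharedLib() call from the question.

```python
import os


def find_matching_dirs(top, required=("release", "arm-linux")):
    """Return directories whose path contains every name in `required`.

    os.walk() is stopped from descending below each match.
    """
    matches = []
    for root, dirs, files in os.walk(top):
        parts = root.split(os.sep)
        if all(name in parts for name in required):
            matches.append(root)
            del dirs[:]  # empty the list in place: nothing below is visited
    return matches
```

Each name has to be tested separately: an expression such as "release" and "arm-linux" in path only checks the second name, because a non-empty string is always truthy.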
Edit
To delete all the elements of dirs, you will need something like del dirs[:]. That will delete all the elements of the list object that is referred to as dirs in your code, but is referred to by another name in the os.walk code.
Just using del dirs will stop dirs in your code from referring to the list, but won't do anything to the os.walk reference. Similarly dirs = [] will replace what dirs in your code refers to, but won't affect os.walk code.
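The difference can be checked with a small experiment. This is only a sketch; visited_dirs, clear_in_place and rebind_only are hypothetical helper names, not part of os.

```python
import os


def visited_dirs(top, prune):
    """Walk top, call prune(dirs) at every level, return the directories visited."""
    visited = []
    for root, dirs, files in os.walk(top):
        visited.append(os.path.relpath(root, top))
        prune(dirs)
    return visited


def clear_in_place(dirs):
    del dirs[:]  # empties the very list object that os.walk() holds


def rebind_only(dirs):
    dirs = []  # rebinds the local name only; os.walk()'s list is untouched
```

For a tree containing a/b, visited_dirs(top, clear_in_place) visits only the top directory, while visited_dirs(top, rebind_only) still visits a and a/b.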
I need to list all files with the containing directory path inside a folder. I tried to use os.walk, which obviously would be the perfect solution.
However, it also lists hidden folders and files. I'd like my application not to list any hidden folders or files. Is there any flag you can use to make it not yield any hidden files?
Cross-platform support is not really important to me; it's OK if it only works on Linux (the .* pattern).
No, there is no option to os.walk() that'll skip those. You'll need to do so yourself (which is easy enough):
for root, dirs, files in os.walk(path):
    files = [f for f in files if not f.startswith('.')]
    dirs[:] = [d for d in dirs if not d.startswith('.')]
    # use files and dirs
Note the dirs[:] = slice assignment; os.walk recursively traverses the subdirectories listed in dirs. By replacing the elements of dirs with those that satisfy a criterion (e.g., directories whose names don't begin with .), os.walk() will not visit directories that fail to meet it.
This only works if you leave the topdown keyword argument set to True. From the documentation of os.walk():
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
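Wrapped in a generator, the filtering above might look like the following sketch. The name iter_visible_files is an assumption, and "hidden" here only means a leading dot, as on Linux.

```python
import os


def iter_visible_files(top):
    """Yield paths of non-hidden files under top, skipping hidden directories."""
    for root, dirs, files in os.walk(top):
        # In-place filter: os.walk() will not descend into removed names.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if not name.startswith("."):
                yield os.path.join(root, name)
```

Files inside a hidden directory such as .git are never reported, because the directory itself is pruned before os.walk() reaches it.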
I realize it wasn't asked in the question, but I had a similar problem where I wanted to exclude both hidden files and files beginning with __, specifically __pycache__ directories. I landed on this question because I was trying to figure out why my list comprehension was not doing what I expected. I was not modifying the list in place with dirnames[:].
I created a tuple of prefixes I wanted to exclude and modified dirnames in place like so:
exclude_prefixes = ('__', '.')  # exclusion prefixes
for dirpath, dirnames, filenames in os.walk(node):
    # exclude all dirs starting with exclude_prefixes
    dirnames[:] = [dirname
                   for dirname in dirnames
                   if not dirname.startswith(exclude_prefixes)]
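This relies on str.startswith() accepting a tuple of prefixes (a list raises TypeError). A quick check on made-up directory names, which are purely illustrative:

```python
exclude_prefixes = ("__", ".")
names = ["__pycache__", ".git", "src", "tests", "_private"]

# A name is dropped if it starts with any prefix in the tuple.
kept = [n for n in names if not n.startswith(exclude_prefixes)]
print(kept)  # ['src', 'tests', '_private']
```

Note that _private survives, since a single underscore does not match the '__' prefix.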
My use-case was similar to that of OP, except I wanted to return a count of the total number of sub-directories inside a certain folder. In my case I wanted to omit any sub-directories named .git (as well as any folders that may be nested inside these .git folders).
In Python 3.6.7, I found that the accepted answer's approach didn't work -- it counted all .git folders and their sub-folders. Here's what did work for me:
num_local_subdir = 0
for root, dirs, files in os.walk(local_folder_path):
    if '.git' in dirs:
        dirs.remove('.git')
    num_local_subdir += len(dirs)
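The same count can be sketched as a function that accepts several names to skip. count_subdirs and its skip parameter are hypothetical names; the slice assignment is used here instead of remove(), but both modify the list in place.

```python
import os


def count_subdirs(top, skip=(".git",)):
    """Count sub-directories under top, ignoring names in skip and anything below them."""
    total = 0
    for root, dirs, files in os.walk(top):
        dirs[:] = [d for d in dirs if d not in skip]  # prune before counting
        total += len(dirs)
    return total
```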
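Another approach uses the any and map functions to skip the current iteration whenever dirs contains a hidden folder. Note that continue only skips the loop body for the parent directory; it does not remove the hidden folder from dirs, so os.walk() still descends into it.
for root, dirs, files in os.walk(path):
    if any(map(lambda p: p[0] == '.', dirs)):
        continue
    # process root, dirs and files here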
I have written an image carving script to assist with my work. The tool carves images by a specified extension and compares them to a hash database.
The tool is used to search across mounted drives, some of which have operating systems installed.
The problem I am having is that when a drive is mounted with an OS, the tool searches the 'All Users' directory, and so includes images from my local disk.
I can't figure out how to skip the 'All Users' directory and just stick to the mounted drive.
My section for os.walk is as follows:
for path, subdirs, files in os.walk(root):
    for name in files:
        if re.match(pattern, name.lower()):
            appendfile.write(os.path.join(path, name))
            appendfile.write('\n')
            log(name)
            i = i + 1
Any help is much appreciated
Assuming All Users is the name of the directory, you can remove the directory from your subdirs list, so that os.walk() does not iterate over it.
Example -
for path, subdirs, files in os.walk(root):
    if 'All Users' in subdirs:
        subdirs.remove('All Users')
    for name in files:
        if re.match(pattern, name.lower()):
            appendfile.write(os.path.join(path, name))
            appendfile.write('\n')
            log(name)
            i = i + 1
If you only want to skip All Users inside a particular parent directory, you can add a check for that parent to the if condition above.
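One way to restrict the pruning to a particular parent is to compare the name of the directory currently being visited. This is a sketch under that assumption; walk_skipping, parent_name and child_name are hypothetical names.

```python
import os


def walk_skipping(top, parent_name, child_name):
    """Like os.walk(top), but prune child_name where the current directory is named parent_name."""
    for root, dirs, files in os.walk(top):
        if os.path.basename(root) == parent_name and child_name in dirs:
            dirs.remove(child_name)  # in-place removal, done before os.walk() resumes
        yield root, dirs, files
```

A directory called child_name under any other parent is still walked as usual.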
From os.walk documentation -
os.walk(top, topdown=True, onerror=None, followlinks=False)
Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False is ineffective, because in bottom-up mode the directories in dirnames are generated before dirpath itself is generated.
topdown is True by default.
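The effect of topdown on pruning can be observed with a small sketch. walk_pruned is a hypothetical helper that removes one directory name at every level and records which directories were visited.

```python
import os


def walk_pruned(top, skip, topdown):
    """Try to prune `skip` at every level and return the directories visited."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        if skip in dirs:
            dirs.remove(skip)
        visited.append(os.path.relpath(root, top))
    return visited
```

With topdown=True the skipped directory and everything below it are absent from the result. With topdown=False they are still visited, because the subdirectories have already been walked by the time their parent is yielded.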
If you have more than one directory to exclude, you can use a slice assignment to remove all of them from the list of subdirectories at once:
excl_dirs = {'All Users', 'some other dir'}
for path, dirnames, files in os.walk(root):
    dirnames[:] = [d for d in dirnames if d not in excl_dirs]
    ...
As the documentation states:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; ..
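Put together with the file matching from the question, the exclusion set might be used as in this sketch. find_files, its parameters, and the example pattern are assumptions; logging and writing to a file are left out.

```python
import os
import re


def find_files(top, pattern, excluded_dirs=frozenset()):
    """Return paths under top whose file name matches pattern, skipping excluded_dirs."""
    regex = re.compile(pattern, re.IGNORECASE)
    matches = []
    for path, dirnames, files in os.walk(top):
        # Slice assignment keeps os.walk() out of every excluded directory.
        dirnames[:] = [d for d in dirnames if d not in excluded_dirs]
        matches.extend(os.path.join(path, name) for name in files if regex.match(name))
    return matches
```

For example, find_files(root, r".*\.(jpg|png)$", {"All Users"}) would collect image files while never entering any directory named All Users.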
On a Mac running Python 2.7, when I walk through directories using os.walk, my script descends into 'apps', i.e. appname.app, since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to ignore those types of 'directories'.
So this is my current solution:
for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff
As you can see, the second for loop will run for every iteration of subdirs, which is unnecessary since the first pass removes everything I want to remove anyways.
There must be a more efficient way to do this. Any ideas?
You can do something like this (assuming you want to ignore directories containing '.'):
subdirs[:] = [d for d in subdirs if '.' not in d]
The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
Note that your original code is incorrect because it modifies the list while iterating over it, which causes the loop to skip the element that follows each removed one.
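The skipping is easy to reproduce on a plain list, without os.walk at all. The directory names below are made up for illustration.

```python
subdirs = ["a.app", "b.app", "docs"]
for subdir in subdirs:
    if "." in subdir:
        subdirs.remove(subdir)  # shifts later items left; the loop index moves on
print(subdirs)  # ['b.app', 'docs']: 'b.app' was never examined

fixed = ["a.app", "b.app", "docs"]
fixed[:] = [d for d in fixed if "." not in d]  # build the result, then replace in place
print(fixed)  # ['docs']
```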
Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
I am a bit confused about your goal. Are you trying to remove a directory subtree and encountering errors, or are you trying to walk a tree and list simple file names (excluding directory names)?
I think all that is required is to remove the unwanted directories before iterating over the rest:
for root, subdirs, files in os.walk(directory, True):
    # drop names containing '.' in place, then loop over what remains
    subdirs[:] = [d for d in subdirs if '.' not in d]
    for subdir in subdirs:
        pass  # do more stuff