Efficiently removing subdirectories in dirnames from os.walk - python

On a Mac with Python 2.7, when I walk through directories using os.walk, my script descends into 'apps' (i.e. appname.app), since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to just ignore those types of 'directories'.
So this is my current solution:
for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff
As you can see, the second for loop will run for every iteration of subdirs, which is unnecessary since the first pass removes everything I want to remove anyways.
There must be a more efficient way to do this. Any ideas?

You can do something like this (assuming you want to ignore directories containing '.'):
subdirs[:] = [d for d in subdirs if '.' not in d]
The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
Note that your original code is incorrect because you modify the list while iterating over it. Python does not raise an error for this, but the loop skips the element that follows each one you remove.
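To make both points concrete, here is a minimal sketch. The helper names (buggy_filter, walk_without_dotted, build_demo_tree) and the directory names are made up for illustration; the tree is created in a temporary directory so nothing real is touched.

```python
import os
import tempfile


def buggy_filter(names):
    """Mimic the question's loop: remove items while iterating over the same list."""
    names = list(names)
    for name in names:
        if '.' in name:
            names.remove(name)  # shifts later items left, so the next one is skipped
    return names


def walk_without_dotted(top):
    """Return the directories visited under top, skipping names that contain '.'."""
    visited = []
    for root, subdirs, files in os.walk(top):  # topdown=True is the default
        # Slice assignment edits the very list os.walk will recurse into.
        subdirs[:] = [d for d in subdirs if '.' not in d]
        visited.append(os.path.relpath(root, top))
    return sorted(visited)


def build_demo_tree():
    """Create a hypothetical tree: docs/, Foo.app/Contents/ and Bar.app/."""
    top = tempfile.mkdtemp()
    for rel in ('docs', os.path.join('Foo.app', 'Contents'), 'Bar.app'):
        os.makedirs(os.path.join(top, rel))
    return top


if __name__ == '__main__':
    print(buggy_filter(['a.app', 'b.app', 'c']))   # 'b.app' survives the loop
    print(walk_without_dotted(build_demo_tree()))  # only '.' and 'docs' are visited
```

The buggy loop leaves 'b.app' behind because removing 'a.app' moves it into the slot the iterator has already passed, while the slice assignment filters every name in one step.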

Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
I am a bit confused about your goal: are you trying to remove a directory subtree and encountering errors, or are you trying to walk a tree and just list simple file names (excluding directory names)?

I think all that is required is to remove the directory before iterating over it:
for root, subdirs, files in os.walk(directory, True):
    if '.' in subdirs:
        subdirs.remove('.')
    for subdir in subdirs:
        pass  #do more stuff

Related

Rename a directory with a filename inside with Python

I am trying to rename several directories with the name of the first file inside them.
I am trying to:
List the files inside a folder.
Identify the directories.
For each directory, access it, grab the name of the first file inside and rename the directory with such name.
This is what I got so far but it is not working. I know the code is wrong but before fixing the code I would like to know if the logic is right. Can anyone help please?
import os
for (root, dirs, files) in os.walk('.'):
    print(f'Found directory: {dirpath}')
    dirlist = []
    for d_idx, d in enumerate(dirlist):
        print(d)
    filelist = []
    for f_idex, f in enumerate(filelist):
        files.append(f)[1]
        print(f)
    os.rename(d, f)
Thank you!
There are a few problems in your code:
You are renaming directories as you iterate them with os.walk. This is not a good idea, os.walk gives you a generator, meaning it creates elements as you iterate them, so renaming things within the loop will confuse it.
Both for d_idx, d in enumerate(dirlist): and for f_idex, f in enumerate(filelist): iterate over variables that are declared to be empty lists in the line before, so those loops don't do anything. Also, within the second one, files.append(f) would append f to the list files, but the [1] at the end means "get the second element (remember Python indexing is 0-based) of the value returned by the append method". append does not return anything (it modifies the list in place rather than returning a new list), so that would fail (and you are not using the value read by [1] anyway, so it would not do anything).
In os.rename(d, f), first, since the loops before do not ever run, d and f will not have a value, but also, assuming both d and f came from dirs and files, they would be given as paths relative to their parents, not to your current directory (.), so the renaming would fail.
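A short sketch of the last two points, in case it helps to see them run. The file and directory names are invented for the demonstration, and the tree lives in a temporary directory.

```python
import os
import tempfile

files = ['a.txt']
result = files.append('b.txt')  # append mutates the list in place...
print(result)                   # ...and returns None

try:
    files.append('c.txt')[1]    # the append still happens, then indexing None fails
except TypeError as exc:
    print('TypeError:', exc)

# Names in dirs are bare names relative to root; join them with root
# to get a path that is usable from the current working directory.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'outer', 'inner'))
full_paths = []
for root, dirs, _ in os.walk(top):
    for d in dirs:
        full_paths.append(os.path.join(root, d))
print(full_paths)
```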
This code should work as you want:
import os
# List of paths to rename
renames = []
# Walk current dir
for (root, dirs, files) in os.walk('.'):
    # Skip this dir (cannot rename current directory)
    if root == '.':
        continue
    # Skip directories with no files (files[0] would raise IndexError)
    if not files:
        continue
    # Add renaming to list
    renames.append((root, files[0]))
# Iterate renaming list in reverse order so deepest dirs are renamed first
for root, new_name in reversed(renames):
    # Make new full dir name (relative to current directory)
    new_full_name = os.path.join(os.path.dirname(root), new_name)
    # Rename
    os.rename(root, new_full_name)

os.walk but with directories on top?

I have some simple code to print out the structure of a directory.
My example directory ABC contains subdirectory A containing A.txt, a subdirectory Z containing Z.txt, and a file info.txt. In real use, this will be big collection of many files and nested directories.
import os
topdir = 'ABC/'
for dirpath, dirnames, files in os.walk(topdir):
    print(os.path.join(dirpath))
    for name in files:
        print(os.path.join(dirpath, name))
The output is:
ABC/
ABC/info.txt
ABC/A
ABC/A/A.txt
ABC/Z
ABC/Z/Z.txt
How can I make it so directories are processed/printed on the top?
I want the output to replicate what I see in Windows Explorer, which displays directories first, and files after.
The output I want:
ABC/
ABC/A
ABC/A/A.txt
ABC/Z
ABC/Z/Z.txt
ABC/info.txt
Without storing all the files in a list and sorting that list in one way or the other, you could make a recursive function and first recurse to the next level of the directory structure before printing the files on the current level:
def print_dirs(directories):
    try:
        dirpath, dirnames, files = next(directories)
        print(dirpath)                   # print current path; no need for join here
        for _ in dirnames:               # once for each sub-directory...
            print_dirs(directories)      # ... recursively call generator
        for name in files:               # now, print files in current directory
            print(os.path.join(dirpath, name))
    except StopIteration:
        pass
print_dirs(os.walk(topdir))
The same could also be done with a stack, but I think this way it's a little bit clearer. And yes, this will also hold some file lists in memory (on the call stack), but not all the files at once: only as many lists as there are levels of nested directories.
Edit: This had a problem of printing any next directory on the generator, even if that's not a sub-directory but a sibling (or "uncle" or whatever). The for _ in dirnames loop should fix that, making the recursive call once for each of the subdirectories, if any. The directory itself does not have to be passed as a parameter as it will be gotten from the generator.
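For comparison, here is a sketch of a different technique that does not rely on os.walk at all: it recurses with os.scandir and lists each directory's subdirectories before its files. This is an alternative, not the answer's method; the function name list_dirs_first is made up, sorting by path is my own choice to make the order deterministic, and using os.scandir as a context manager assumes Python 3.6 or later.

```python
import os


def list_dirs_first(top):
    """Return paths under top, each directory's subdirectories before its files."""
    lines = [top]
    subdirs, files = [], []
    with os.scandir(top) as entries:
        for entry in entries:
            # Symlinked directories are treated as plain entries, not followed.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                files.append(entry.path)
    for path in sorted(subdirs):   # recurse into subdirectories first...
        lines.extend(list_dirs_first(path))
    lines.extend(sorted(files))    # ...then list this directory's own files
    return lines


if __name__ == '__main__':
    for line in list_dirs_first('.'):
        print(line)
```

Like the generator version, this only keeps the entries of the directories on the current recursion path in memory, plus the result list that is being built.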

How should I use os.walk() without walking in certain subdirectories? [duplicate]

I need to list all files with the containing directory path inside a folder. I tried to use os.walk, which obviously would be the perfect solution.
However, it also lists hidden folders and files. I'd like my application not to list any hidden folders or files. Is there any flag you can use to make it not yield any hidden files?
Cross-platform is not really important to me, it's ok if it only works for linux (.* pattern)
No, there is no option to os.walk() that'll skip those. You'll need to do so yourself (which is easy enough):
for root, dirs, files in os.walk(path):
    files = [f for f in files if not f[0] == '.']
    dirs[:] = [d for d in dirs if not d[0] == '.']
    # use files and dirs
Note the dirs[:] = slice assignment; os.walk recursively traverses the subdirectories listed in dirs. By replacing the elements of dirs with those that satisfy a criterion (e.g., directories whose names don't begin with .), os.walk() will not visit directories that fail to meet it.
This only works if you keep the topdown keyword argument set to True (the default). From the documentation of os.walk():
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
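A small sketch of the difference, assuming a throwaway tree with one hidden and one visible directory (the names .hidden, inner and visible, and the helper visited_dirs, are hypothetical):

```python
import os
import tempfile


def visited_dirs(top, topdown):
    """Walk top while trying to prune hidden directories; return what was visited."""
    visited = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        # With topdown=False the subdirectories were already walked by the
        # time this list is yielded, so editing it changes nothing.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        visited.append(os.path.relpath(root, top))
    return sorted(visited)


top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, '.hidden', 'inner'))
os.makedirs(os.path.join(top, 'visible'))

pruned = visited_dirs(top, topdown=True)     # hidden subtree is skipped
unpruned = visited_dirs(top, topdown=False)  # hidden subtree is still visited
print(pruned)
print(unpruned)
```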
I realize it wasn't asked in the question, but I had a similar problem where I wanted to exclude both hidden files and files beginning with __, specifically __pycache__ directories. I landed on this question because I was trying to figure out why my list comprehension was not doing what I expected. I was not modifying the list in place with dirnames[:].
I created a list of prefixes I wanted to exclude and modified the dirnames in place like so:
exclude_prefixes = ('__', '.')  # exclusion prefixes
for dirpath, dirnames, filenames in os.walk(node):
    # exclude all dirs starting with exclude_prefixes
    dirnames[:] = [dirname
                   for dirname in dirnames
                   if not dirname.startswith(exclude_prefixes)]
My use-case was similar to that of OP, except I wanted to return a count of the total number of sub-directories inside a certain folder. In my case I wanted to omit any sub-directories named .git (as well as any folders that may be nested inside these .git folders).
In Python 3.6.7, I found that the accepted answer's approach didn't work for me: it counted all .git folders and their sub-folders. Here's what did work for me:
num_local_subdir = 0
for root, dirs, files in os.walk(local_folder_path):
    if '.git' in dirs:
        dirs.remove('.git')
    num_local_subdir += (len(dirs))
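Another approach uses the any and map functions. Note that this skips processing of any directory that contains a hidden subdirectory; it does not stop os.walk from descending into the hidden folders themselves.
for root, dirs, files in os.walk(path):
    if any(map(lambda p: p[0] == '.', dirs)):
        continue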

Navigating specific dirs in filter with os.walk

I am aware that I can remove dirs from os.walk using something along the lines of
for root, dirs, files in os.walk('/path/to/dir'):
    ignore = ['dir1', 'dir2']
    dirs[:] = [d for d in dirs if d not in ignore]
I want to do the opposite of this, so only keep the dirs in the list. I've tried a few variations but to no avail. Any pointers would be appreciated.
The dirs I am interested in are 2 levels down, so I have taken on the comments and created global variables for the sub levels and am using the following code.
Expected Functionality
for root, dirs, files in os.walk(global_subdir):
    keep = ['dir1', 'dir2']
    dirs[:] = [d for d in dirs if d in keep]
    for filename in files:
        print(os.path.join(root, filename))
As said in the comments of a deleted answer -
As mentioned already, this doesn't work. The dirs in keep are 2 levels below root. I'm guessing this is causing the problem.
The issue is that the directory one level above your required directory is never traversed, since it's not in your keep list; hence the program never reaches your required directories.
The best way to solve this would be to start os.walk at the directory that is just one level above your required directory.
But this may not be possible, for example if the directories one level above the required ones are not known before traversing, or if the required directories have different parents. In that case, what you really want is just to avoid looping through the files of directories that are not in the keep list.
A solution would be to traverse all directories, but loop through the files only when root is in the keep list (or set for better performance). Example -
keep = set(['required directory1','required directory2'])
for root, dirs, files in os.walk(global_subdir):
    if root in keep:
        for filename in files:
            print(os.path.join(root, filename))
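One thing to watch for with the snippet above: root is the full path that os.walk yields, so root in keep only matches when keep holds exactly those path strings. If you only know the bare directory names, a possible variation is to compare os.path.basename(root) instead. This is a sketch under that assumption; the function name files_in_kept_dirs is hypothetical, and it would also match a directory of the same name anywhere else in the tree.

```python
import os


def files_in_kept_dirs(top, keep_names):
    """Return files that sit directly inside directories named in keep_names."""
    keep_names = set(keep_names)
    found = []
    for root, dirs, files in os.walk(top):
        # Every directory is still traversed; only the file loop is filtered.
        if os.path.basename(root) in keep_names:
            for filename in files:
                found.append(os.path.join(root, filename))
    return sorted(found)


if __name__ == '__main__':
    for path in files_in_kept_dirs('.', ['dir1', 'dir2']):
        print(path)
```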
