I am working on a backup script in Python and would like it to be able to ignore folders. I therefore have a list of folders to be ignored, e.g. ['Folder 1', 'Folder3']. I am using os.walk and am trying to get it to skip any folder that is in the ignored list or that has an ignored folder as a parent directory. Has anyone done this before? The examples I've seen don't seem to work and often end up creating an empty folder.
From the docs:
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
So, iterate through your list and remove entries that match.
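A minimal sketch of that pruning, assuming the ignored names from the question. The helper name walk_without is hypothetical and only serves the illustration:

```python
import os

def walk_without(base, ignored):
    """Yield os.walk triples while skipping folders whose name is in ignored."""
    ignored = set(ignored)
    for dirpath, dirnames, filenames in os.walk(base):  # topdown=True is the default
        # Slice assignment edits the very list os.walk holds, so the removed
        # folders (and everything below them) are never visited.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        yield dirpath, dirnames, filenames
```

Because an ignored folder is never entered, none of its subfolders are reported either, which should also avoid creating empty copies of them in the backup.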
After the following statement
folders = [path+'/'+dir for (path,dirs,files) in os.walk(base)
           for dir in dirs
           if dir not in ['Folder 1', 'Folder3', ...]]
the variable folders should contain the folders you are interested in.
Edit1: ... + '/' + ... works only on Unix-like operating systems. os.path.join does the same job platform-independently.
Edit2: If you want to exclude all subdirectories of the excluded directories as well, you can try the following (note that f in path is a substring test, so it also skips any path whose text merely contains one of the names):
exclusions = ['Folder 1', 'Folder3', ...]
folders = [path+'/'+dir for (path,dirs,files) in os.walk(base)
           if not any([f in path for f in exclusions])
           for dir in dirs
           if dir not in exclusions
          ]
I have a recursive directory tree. Both subdirectory and file names contain illegal characters. I have a function that cleans up the names, for example by replacing a space with an underscore. There must be an easier way, but I couldn't find a way to rename both folders and files, so I want to rename the folders first.
for path, subdirs, files in os.walk(root):
    for name in subdirs:
        new_name=clean_names(name)
        name=os.path.join(path,name)
        new_name=os.path.join(path,new_name)
        os.chdir(path)
        os.rename(name,new_name)
When I check the real folder and its contents, I see that only the first subfolder name is corrected. I can see the reason: os.chdir(path) changes the cwd, and it is not changed back before the for loop moves on to the second path. I thought I could change the cwd back after the os.rename, but I am sure there is a more elegant way to do this. If I remove the os.chdir line, I get a FileNotFoundError.
I see that renaming subdirectories has been asked about before, but those questions deal with the command line.
You should use os.walk(root, topdown=False) instead; otherwise once the top folder gets renamed, os.walk won't have access to the subfolders because it can no longer find their parent folders.
Excerpt from the documentation:
If optional argument topdown is True or not specified, the triple for
a directory is generated before the triples for any of its
subdirectories (directories are generated top-down). If topdown is
False, the triple for a directory is generated after the triples for
all of its subdirectories (directories are generated bottom-up). No
matter the value of topdown, the list of subdirectories is retrieved
before the tuples for the directory and its subdirectories are
generated.
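Note that you do not need to call os.chdir at all, because the paths passed to os.rename already include the directory path yielded by os.walk (they are absolute whenever root is absolute). Calling os.chdir is in fact what breaks a relative root, since the later joined paths are then resolved against the wrong directory.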
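A sketch of the bottom-up version, under a few assumptions: clean_name below is a hypothetical stand-in for the question's clean_names function, rename_tree is an illustrative name, and name collisions after cleaning are not handled.

```python
import os

def clean_name(name):
    # Hypothetical stand-in for the question's clean_names():
    # here it only replaces spaces with underscores.
    return name.replace(" ", "_")

def rename_tree(root):
    """Rename files and folders below root, deepest entries first."""
    # topdown=False yields a directory only after all of its subdirectories,
    # so a folder is renamed after everything inside it has been handled.
    for path, subdirs, files in os.walk(root, topdown=False):
        for name in files + subdirs:
            new_name = clean_name(name)
            if new_name != name:
                os.rename(os.path.join(path, name),
                          os.path.join(path, new_name))
```

This handles files and folders in one pass and never changes the working directory. The root folder itself is left untouched.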
I've been porting (very simply) a Python script from Windows to Linux (directory changes mostly), and I want to add a few new features to it.
The script is used to update mods on a game server. All mods are located in ShooterGame/Content/Mods/. Some mods are included by default (TheCenter and 11111111) - every other mod is located in the same folder as the default ones, but the names consist of random numbers.
I've been trying to exclude the 2 default directories and then build a list of contents of the ShooterGame/Content/Mods/ folder, but I've failed to do so.
This is the code that I've tried to use to exclude just the TheCenter folder:
def build_list_of_mods(self):
    """
    Build a list of all installed mods by grabbing all directory names from the mod folder
    :return:
    """
    exclude = ["TheCenter"]
    if not os.path.isdir(os.path.join(self.working_dir, "ShooterGame/Content/Mods/")):
        return
    for curdir, dirs, files in os.walk(os.path.join(self.working_dir, "ShooterGame/Content/Mods/")):
        for d in dirs:
            dirs[:] = [d for d in dirs if d not in exclude]
            self.installed_mods.append(d)
        break
It doesn't work, sadly. Have I missed something or just done everything wrong?
Try passing topdown=True to os.walk() explicitly (it is already the default, so this only makes the intent clear):
for curdir, dirs, files in os.walk(os.path.join(self.working_dir, "ShooterGame/Content/Mods/"), topdown=True):
I can't test it, but the dirs[:] assignment should probably sit outside the inner for loop, so that the list is filtered before you iterate over it. As the documentation says:
When topdown is true, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk will only recurse into the subdirectories whose names remain in dirnames;
I'm assuming you want self.installed_mods to contain the values of dirs without the values of exclude.
You could simply call dirs.remove() for each value in exclude (checking first that it is present, since remove() raises ValueError otherwise) and then append the contents of dirs to self.installed_mods.
Or, more briefly: self.installed_mods.extend(d for d in dirs if d not in exclude).
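Since only the immediate subdirectories of the mods folder are needed, a recursive walk may not be necessary at all. The sketch below swaps in os.scandir for os.walk. The function name list_installed_mods and its parameters are assumptions for illustration, and the two default names are taken from the question.

```python
import os

def list_installed_mods(mods_dir, exclude=("TheCenter", "11111111")):
    """Return the names of the immediate subdirectories of mods_dir, minus exclude."""
    if not os.path.isdir(mods_dir):
        return []
    with os.scandir(mods_dir) as entries:
        # Keep directories only; plain files in the mods folder are ignored.
        return sorted(entry.name for entry in entries
                      if entry.is_dir() and entry.name not in exclude)
```

Inside the class from the question this could be called as list_installed_mods(os.path.join(self.working_dir, "ShooterGame/Content/Mods/")).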
I need to list all files with the containing directory path inside a folder. I tried to use os.walk, which obviously would be the perfect solution.
However, it also lists hidden folders and files. I'd like my application not to list any hidden folders or files. Is there any flag you can use to make it not yield any hidden files?
Cross-platform is not really important to me, it's ok if it only works for linux (.* pattern)
No, there is no option to os.walk() that'll skip those. You'll need to do so yourself (which is easy enough):
for root, dirs, files in os.walk(path):
    files = [f for f in files if not f[0] == '.']
    dirs[:] = [d for d in dirs if not d[0] == '.']
    # use files and dirs
Note the dirs[:] = slice assignment; os.walk recursively traverses the subdirectories listed in dirs. By replacing the elements of dirs with only those that satisfy a criterion (e.g., directories whose names don't begin with .), you keep os.walk() from visiting directories that fail to meet it.
This only works if the topdown keyword argument is left at True, which is the default. From the documentation of os.walk():
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again.
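Wrapped up as a generator, the same idea might look like the sketch below. The name visible_files is hypothetical, and "hidden" is taken to mean a name starting with a dot, as in the question.

```python
import os

def visible_files(top):
    """Yield paths of non-hidden files, never descending into hidden folders."""
    for root, dirs, files in os.walk(top):
        # In-place filtering stops os.walk from entering hidden directories.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for name in files:
            if not name.startswith('.'):
                yield os.path.join(root, name)
```

Only names below top are checked, so the walk still works when top itself sits inside a hidden directory.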
I realize it wasn't asked in the question, but I had a similar problem where I wanted to exclude both hidden files and files beginning with __, specifically __pycache__ directories. I landed on this question because I was trying to figure out why my list comprehension was not doing what I expected. I was not modifying the list in place with dirnames[:].
I created a list of prefixes I wanted to exclude and modified the dirnames in place like so:
exclude_prefixes = ('__', '.')  # exclusion prefixes
for dirpath, dirnames, filenames in os.walk(node):
    # exclude all dirs starting with exclude_prefixes
    dirnames[:] = [dirname
                   for dirname in dirnames
                   if not dirname.startswith(exclude_prefixes)]
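The difference between rebinding the name and modifying the list in place can be seen without touching the filesystem. In this small sketch the two function names are made up for illustration, and shared plays the role of the list that os.walk holds on to:

```python
def prune_by_rebinding(dirnames):
    # Creates a new list and binds the local name to it;
    # the caller's list is left unchanged.
    dirnames = [d for d in dirnames if not d.startswith('.')]

def prune_in_place(dirnames):
    # Slice assignment replaces the contents of the caller's list.
    dirnames[:] = [d for d in dirnames if not d.startswith('.')]

shared = ['.git', 'src']
prune_by_rebinding(shared)
print(shared)  # ['.git', 'src']
prune_in_place(shared)
print(shared)  # ['src']
```

os.walk keeps its own reference to the list it yielded, so only the in-place form changes what it recurses into.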
My use-case was similar to that of OP, except I wanted to return a count of the total number of sub-directories inside a certain folder. In my case I wanted to omit any sub-directories named .git (as well as any folders that may be nested inside these .git folders).
In Python 3.6.7, I found that the accepted answer's approach didn't work for me; it counted all .git folders and their sub-folders. Here's what did work:
num_local_subdir = 0
for root, dirs, files in os.walk(local_folder_path):
    if '.git' in dirs:
        dirs.remove('.git')
    num_local_subdir += (len(dirs))
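Another approach uses the any and map functions. Be aware that it does not prune the walk: it skips processing of any directory that contains a hidden subdirectory, while os.walk still descends into the hidden folders themselves.
for root, dirs, files in os.walk(path):
    if any(map(lambda p: p[0] == '.', dirs)):
        continue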
I'm using Python to parse a WordPress site downloaded via wget. All the HTML files are nested inside a complicated folder structure (thanks to WordPress and its long URLs), like site_dump/2010/03/11/post-title/index.html.
However, within the post-title directory there are other directories for the feed and for Google News-esque number-based indexes:
site_dump/2010/03/11/post-title/index.html # I want this
site_dump/2010/03/11/post-title/feed/index.html # Not these
site_dump/2010/03/11/post-title/115232/site.com/2010/03/11/post-title/index.html
I only want to access the index.html files that are at the 5th nested level (site_dump/2010/03/11/post-title/index.html), and not beyond. Right now I split the root variable by a slash (/) in the os.walk loop and only deal with the file if it is inside 5 levels of folders:
import os
for root, dirs, files in os.walk('site_dump'):
    nested_levels = root.split('/')
    if len(nested_levels) == 5:
        print(nested_levels)  # Eventually do stuff with the file here
However, this seems kind of inefficient, since os.walk is still traversing those really deep folders. Is there a way to limit how deep os.walk goes when traversing a directory tree?
You can modify dirs in place to prevent further traversal into the directory structure.
for root, dirs, files in os.walk('site_dump'):
    nested_levels = root.split('/')
    if len(nested_levels) == 5:
        del dirs[:]
        # Eventually do stuff with the file here
del dirs[:] will remove the contents of the list, rather than replace dirs with a reference to a new list. When doing this it is important to modify the list in-place.
From the docs, with topdown referring to an optional parameter for os.walk that you omitted and defaults to True:
When topdown is True, the caller can modify the dirnames list in-place
(perhaps using del or slice assignment), and walk() will only recurse
into the subdirectories whose names remain in dirnames; this can be
used to prune the search, impose a specific order of visiting, or even
to inform walk() about directories the caller creates or renames
before it resumes walk() again. Modifying dirnames when topdown is
False is ineffective, because in bottom-up mode the directories in
dirnames are generated before dirpath itself is generated.
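A more general sketch of a depth limit follows. It measures depth relative to the starting folder instead of splitting on '/', so it should not depend on the path separator or on how the starting path is written. The name walk_to_depth and its max_depth parameter are assumptions for illustration.

```python
import os

def walk_to_depth(top, max_depth):
    """Like os.walk(top), but descend at most max_depth levels below top."""
    for root, dirs, files in os.walk(top):
        rel = os.path.relpath(root, top)
        # top itself is depth 0, its children are depth 1, and so on.
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            del dirs[:]  # emptied in place, so os.walk goes no deeper
        yield root, dirs, files
```

For the layout in the question, walk_to_depth('site_dump', 4) would stop at the post-title folders. Directories at the maximum depth are yielded with an empty dirs list.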