For loop skipping over certain folders - python

I am trying to loop through sub folders in a directory, but my for loop is skipping over some folders for a reason that is not clear to me, despite having the same name structure and properties as the other folders in the directory. I should add that the directory is a mapped network drive, but I'm not sure why that would matter. I've tried both os.walk and listdir to pick up the names of the folders:
import os

root = r"mydir"
for directory, subdirectory, files in os.walk(root):
    print(directory)

def get_immediate_subdirectories(a_dir):
    return [name for name in os.listdir(a_dir)
            if os.path.isdir(os.path.join(a_dir, name))]
Is there a way to force the code to pick up everything in my main directory other than what I'm doing here?
Expected Output:
['11111-Apples',
'11112-Bananas',
'11113-Grapes',
'11114-Oranges',
'11115-Pears']
Actual Output:
['11112-Bananas',
'11113-Grapes',
'11115-Pears']
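One thing that may be worth checking, as a diagnostic sketch rather than a confirmed cause: os.path.isdir returns False instead of raising when the underlying os.stat call fails (for example on a permission or network error), so a folder can drop out of the list comprehension silently. The helper below, with the hypothetical name classify_entries, reports which entries fail os.stat and why. It does not help if the folders are missing from os.listdir itself.

```python
import os


def classify_entries(a_dir):
    """Split the entries of a_dir into directories, other entries, and stat failures.

    Hypothetical diagnostic helper: it only makes visible the errors that
    os.path.isdir would otherwise swallow.
    """
    dirs, others, errors = [], [], {}
    for name in os.listdir(a_dir):
        full = os.path.join(a_dir, name)
        try:
            os.stat(full)  # raises OSError where os.path.isdir would return False
        except OSError as exc:
            errors[name] = exc
            continue
        (dirs if os.path.isdir(full) else others).append(name)
    return sorted(dirs), sorted(others), errors
```

Calling classify_entries(root) and printing the third return value shows any folder names that could not be inspected, together with the exception. For os.walk, passing a callback as the onerror argument surfaces listing errors that are ignored by default.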

Related

Finding Subdirectories in Python

I want to find subdirectories in Python for a personal project, with a catch. I imagine I'd use something like os.walk(), but every instance I can find involving it uses a predefined string with the location of the folder to look at. For example, this code
import os

rootdir = 'path/to/dir'
for rootdir, dirs, files in os.walk(rootdir):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))
involves setting a defined rootdir. I do not want this. Instead, I want to search from the folder the script itself is located in. If I run code.py from c:/users/me/, it should search all subdirectories of that location. If I move the script to another folder, it should search the subdirectories of that folder. Hope this makes sense.
Scripts can see their own filename in the __file__ attribute. You can use that to find the script's directory and make that the basis of the search.
import os

root = os.path.split(os.path.realpath(__file__))[0]
print(root)
for rootdir, dirs, files in os.walk(root):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))

os.walk but with directories on top?

I have some simple code to print out the structure of a directory.
My example directory ABC contains a subdirectory A containing A.txt, a subdirectory Z containing Z.txt, and a file info.txt. In real use, this will be a big collection of many files and nested directories.
import os

topdir = 'ABC/'
for dirpath, dirnames, files in os.walk(topdir):
    print(os.path.join(dirpath))
    for name in files:
        print(os.path.join(dirpath, name))
The output is:
ABC/
ABC/info.txt
ABC/A
ABC/A/A.txt
ABC/Z
ABC/Z/Z.txt
How can I make it so directories are processed/printed on the top?
I want the output to replicate what I see in Windows Explorer, which displays directories first, and files after.
The output I want:
ABC/
ABC/A
ABC/A/A.txt
ABC/Z
ABC/Z/Z.txt
ABC/info.txt
Without storing all the files in a list and sorting that list in one way or the other, you could make a recursive function and first recurse to the next level of the directory structure before printing the files on the current level:
def print_dirs(directories):
    try:
        dirpath, dirnames, files = next(directories)
        print(dirpath)  # print current path; no need for join here
        for _ in dirnames:  # once for each sub-directory...
            print_dirs(directories)  # ... recursively call generator
        for name in files:  # now, print files in current directory
            print(os.path.join(dirpath, name))
    except StopIteration:
        pass

print_dirs(os.walk(topdir))
The same could also be done with a stack, but I think this way is a little clearer. And yes, this will also keep some directories in a list or on a stack, but not all the files: only as many entries as there are levels of nested directories.
Edit: This had a problem of printing any next directory on the generator, even if that's not a sub-directory but a sibling (or "uncle" or whatever). The for _ in dirnames loop should fix that, making the recursive call once for each of the subdirectories, if any. The directory itself does not have to be passed as a parameter as it will be gotten from the generator.
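For comparison, here is a minimal sketch of a different technique that does not rely on os.walk at all: a plain recursive generator over os.scandir that yields each directory's sub-directories before its files. The name dirs_first is hypothetical, and sorting entries by name is an assumption made to get a stable order. Symbolic links to directories are treated as plain entries and not followed.

```python
import os


def dirs_first(top):
    """Yield paths under top, listing each directory's sub-directories before its files."""
    yield top
    # Read one directory at a time; sorting by name is an assumption for stable output.
    with os.scandir(top) as it:
        entries = sorted(it, key=lambda e: e.name)
    # First descend into every sub-directory...
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            yield from dirs_first(entry.path)
    # ...then list the remaining (non-directory) entries of this level.
    for entry in entries:
        if not entry.is_dir(follow_symlinks=False):
            yield entry.path
```

Usage would be `for p in dirs_first(topdir): print(p)`. Like the recursive version above, it only holds the entries of one directory per nesting level in memory.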

How to rename sub directory and file names recursively in script python3?

I have a nested directory tree. Both subdirectory and file names contain illegal characters. I have a function that cleans up the names, for example by replacing a space with an underscore. There must be an easier way, but I couldn't find a way to rename both folders and files, so I want to rename the folders first.
for path, subdirs, files in os.walk(root):
    for name in subdirs:
        new_name = clean_names(name)
        name = os.path.join(path, name)
        new_name = os.path.join(path, new_name)
        os.chdir(path)
        os.rename(name, new_name)
When I check the real folder and its contents, I see that only the first subfolder name is corrected. I think the reason is that os.chdir(path) changes the cwd and it is not changed back before the for loop moves on to the second path. I thought I could change the cwd back after os.rename, but I am sure there is a more elegant way to do this. If I remove the os.chdir line, it raises FileNotFoundError.
I see that renaming subdirectories has been asked about before, but they are in command line.
You should use os.walk(root, topdown=False) instead; otherwise once the top folder gets renamed, os.walk won't have access to the subfolders because it can no longer find their parent folders.
Excerpt from the documentation:
If optional argument topdown is True or not specified, the triple for
a directory is generated before the triples for any of its
subdirectories (directories are generated top-down). If topdown is
False, the triple for a directory is generated after the triples for
all of its subdirectories (directories are generated bottom-up). No
matter the value of topdown, the list of subdirectories is retrieved
before the tuples for the directory and its subdirectories are
generated.
Note that you do not need to call os.chdir at all, because the paths passed to os.rename are already joined with path (and are therefore absolute whenever root is an absolute path).
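A minimal sketch of the bottom-up approach, assuming Python 3. The function clean_name is a hypothetical stand-in for the asker's clean_names and only replaces spaces with underscores; rename_tree is likewise an illustrative name. Because topdown=False yields a directory only after everything below it, each entry is renamed while its parent path is still valid, so files and folders can be handled in one pass.

```python
import os


def clean_name(name):
    """Hypothetical stand-in for the asker's clean_names(): spaces become underscores."""
    return name.replace(" ", "_")


def rename_tree(root):
    """Rename every file and sub-directory under root, deepest entries first."""
    for path, subdirs, files in os.walk(root, topdown=False):
        # By the time `path` is yielded, everything below its sub-directories
        # has already been processed, so renaming them here is safe.
        for name in files + subdirs:
            new_name = clean_name(name)
            if new_name != name:
                os.rename(os.path.join(path, name),
                          os.path.join(path, new_name))
```

The sketch does not check whether the cleaned name already exists in the same folder; in that case os.rename may overwrite or fail depending on the platform.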

python3 - filter os.walk subdirectories and retrieve file name+paths

I need help getting a list of file names and locations within specific sub-directories. My directory is structured as follows:
C:\folder
-->\2014
----->\14-0023
-------->\(folders with files inside)
----->\CLOSED
-------->\14-0055!
----------->\(folders with files inside)
-->\2015
----->\15-0025
-------->\(folders with files inside)
----->\CLOSED
-------->\15-0017!
----------->\(folders with files inside)
I would like to get a list of files and their paths ONLY if they are within CLOSED.
I have tried writing multiple scripts and searching questions on SO, but have not been able to come up with something that retrieves the list I want. While there seem to be questions related to my trouble, such as Filtering os.walk() dirs and files, they don't quite have the same requirements as I do, and I've thus far failed to adapt code I've found on SO for my purpose.
For example, here's some sample code from another SO thread I found that I tried to adapt for my purpose.
l = []
include_prefixes = ['CLOSED']
for dir, dirs, files in os.walk(path2, topdown=True):
    dirs[:] = [d for d in dirs if d in include_prefixes]
    for file in files:
        l.append(os.path.join(dir, file))
^the above got me an empty list...
After a few more failures, I thought to just get a list of the correct folder paths and make another script to iterate within THOSE folders.
l = []
regex = re.compile(r'\d{2}-\d{4}', re.IGNORECASE)
for root, subFolders, files in os.walk(path2):
    try:
        subFolders.remove(regex)
    except ValueError:
        pass
    for subFolder in subFolders:
        l.append(os.path.join(root, subFolder))
^Yet I still failed and just got all the file paths in the directory. No matter what I do, I can't seem to force os.walk to (a) remove specific subdirs from its list of subdirs and then (b) make it loop through those subdirs to get the file names and paths I need.
What should I fix in my example code? Or is there entirely different code that I should consider?
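Some observations on the two attempts, followed by one possible approach. In the first snippet, the pruning step runs at the top level too, where the sub-directories are 2014 and 2015; neither is in include_prefixes, so os.walk never descends and the list stays empty. In the second, subFolders holds strings, so subFolders.remove(regex) always raises ValueError and nothing is removed. The sketch below avoids pruning and instead keeps files whose path, relative to the top folder, contains a component named CLOSED. The function name files_under_closed and the marker parameter are hypothetical.

```python
import os


def files_under_closed(top, marker="CLOSED"):
    """Return paths of all files located somewhere below a directory named `marker`."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Path components of the current directory, relative to the starting folder.
        rel_parts = os.path.relpath(dirpath, top).split(os.sep)
        if marker in rel_parts:
            found.extend(os.path.join(dirpath, name) for name in filenames)
    return found
```

For the layout in the question, files_under_closed(r"C:\folder") would collect the files under 2014\CLOSED and 2015\CLOSED and skip 14-0023 and 15-0025. It still walks the whole tree, which is simple but not the fastest option for very large directories.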

Python: Copy directories, not files, from multiple locations, overwriting if same name

I have a main folder netbooks_nbo which contains more dated folders. I want to get the last seven folders (by last modified date) and copy them to somewhere on the C:\ drive. Here's my current code:
import os
import distutils.dir_util

def get_immediate_subdirectories(dir):
    return [os.path.join(dir, name) for name in os.listdir(dir)
            if os.path.isdir(os.path.join(dir, name))]

def main():
    path = "\\\\Network_Drive\\netbooks_nbo"
    all_dirs = get_immediate_subdirectories(path)
    all_dirs.sort(key=lambda x: os.path.getmtime(x))
    all_dirs = all_dirs[-7:]  # the seven most recently modified folders
    for i in all_dirs:
        for n in get_immediate_subdirectories(i):
            distutils.dir_util.copy_tree(n, "C:\\AllFiles")
            print("copied " + n)
The problem is that dir_util.copy_tree copies all the files, rather than the actual directories. I want to preserve the directory structure. I tried using shutil.copytree(src, dst), but it just raises an error because C:\AllFiles already exists after one iteration of the for loop. And shutil.copy(src, dst) doesn't work because of some bizarre permission error.
Any ideas?
If the directory trees are not too large, you could pack each directory tree into an archive file and then unpack each of the archive files to your destination.
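A rough sketch of the archive idea, assuming Python 3 (shutil.unpack_archive is not available in Python 2) and a zip archive written to a temporary folder. The function name copy_dir_via_archive is hypothetical. Archiving relative to the parent folder keeps the directory's own name inside the archive, so it reappears under the destination, and unpacking into an existing destination overwrites files with the same name.

```python
import os
import shutil
import tempfile


def copy_dir_via_archive(src_dir, dest_root):
    """Copy src_dir, including its own top-level folder name, into dest_root."""
    src_dir = os.path.abspath(src_dir)
    parent, name = os.path.split(src_dir)
    with tempfile.TemporaryDirectory() as tmp:
        # Pack the tree relative to its parent so the archive contains "name/...".
        archive = shutil.make_archive(
            os.path.join(tmp, name), "zip", root_dir=parent, base_dir=name
        )
        # Unpack into the destination; dest_root may already exist.
        shutil.unpack_archive(archive, dest_root)
    return os.path.join(dest_root, name)
```

In the loop from the question, this would be called as copy_dir_via_archive(n, "C:\\AllFiles") in place of copy_tree.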
