Detect last subdir os.walk() - python

I need to copy similar config files into the deepest directory of each subtree, and the subtrees have random depths. I'm using os.walk() to get dirName and subdirList, but I can't work out how to make sure I copy only into the last subdirectory. Example:
dir/sd1/sd2
dir/sd3/sd4/sd5
dir/sd6/sd7/sd8/sd9/sd10
There are hundreds of subdirectories, and the directory names are fairly random. I use them to change a few lines in a config file (I use the fileinput library for that without problems, to replace a few lines in a template). How can I keep only the full paths down to the end of each branch, and copy only into sd2, sd5 and sd10? I tried the topdown option too, but did not succeed.

It seems to be quite easy with os.walk() API:
import os
for root, dirs, files in os.walk('.'):
    if not dirs:
        print(root, "is a directory without subdirectories")
        # do whatever you need to do with your files here
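Building on that, here is a minimal sketch that copies one template file into every leaf directory. It assumes a single template whose file name is reused in each leaf. `copy_to_leaf_dirs` and its parameters are hypothetical names, and any per-directory edits (for example with fileinput) would follow the copy.

```python
import os
import shutil


def copy_to_leaf_dirs(top, template_path):
    """Copy template_path into every directory under top that has no subdirectories.

    Returns the list of destination file paths that were written.
    """
    written = []
    for root, dirs, files in os.walk(top):
        if not dirs:  # os.walk found no subdirectories here, so root is a leaf
            dest = os.path.join(root, os.path.basename(template_path))
            shutil.copy(template_path, dest)
            written.append(dest)
    return written
```

For the example tree, `copy_to_leaf_dirs('dir', 'config_template.ini')` (a hypothetical template name) should write only into sd2, sd5 and sd10. Keep the template outside the tree. If it sits in a leaf directory, shutil.copy raises SameFileError when it reaches that leaf.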

Related

Finding Subdirectories in Python

I want to find subdirectories in Python for a personal project, with a catch. I imagine I'd use something like os.walk(), but every instance I can find involving it uses a predefined string with the location of the folder to look at. For example, this code
import os
rootdir = 'path/to/dir'
for rootdir, dirs, files in os.walk(rootdir):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))
involves setting a predefined rootdir. I do not want this. Instead, I want to just look in the folder the code is being run from. If I run code.py in c:/users/me/, it should search all subdirectories of that location. If I move the code to another folder, it should search the subdirectories of that folder. Hope this makes sense.
Scripts can see their own filename in the __file__ attribute. You can use that to find the script's directory and make that the basis of the search.
import os
root = os.path.split(os.path.realpath(__file__))[0]
print(root)
for rootdir, dirs, files in os.walk(root):
    for subdir in dirs:
        print(os.path.join(rootdir, subdir))
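"The folder the code is being run from" can also mean the current working directory, which is not necessarily the folder that holds the script. Here is a small sketch, with `list_subdirs` as a hypothetical helper name, that defaults to the working directory and still accepts an explicit base.

```python
import os


def list_subdirs(base=None):
    """Return every subdirectory below base (default: the current working directory)."""
    if base is None:
        base = os.getcwd()  # where the script was launched from, not where it lives
    found = []
    for root, dirs, files in os.walk(base):
        for subdir in dirs:
            found.append(os.path.join(root, subdir))
    return found


# To search next to the script instead, pass its folder explicitly:
# list_subdirs(os.path.dirname(os.path.realpath(__file__)))
```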

How to search the entire HDD for all pdf files?

As the title suggests, I would like to get Python 3.5 to search my root ('C:\') for pdf files and then move those files to a specific folder.
This task can easily be split into two:
1. Search my root for files with the pdf extension.
2. Move those to a specific folder.
Now, I know how to search for a specific file name, but not for multiple files that have a specific extension.
import os
print('Welcome to the Walker Module.')
print('find(name, path) or find_all(name, path)')
def find(name, path):
    for root, dirs, files in os.walk(path):
        print('Searching for files...')
        if name in files:
            return os.path.join(root, name)
def find_all(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        print('Searching for files...')
        if name in files:
            result.append(os.path.join(root, name))
    return result
This little program finds either the first location or all locations of a specific file.
However, I can't modify it to search for pdf files because I lack knowledge of Python and programming in general.
I would love some insight into where to go from here.
To sum it up:
Search the root for all pdf files.
Move those files into a specific location. Let's say 'G:\Books'.
Thanks in advance.
Your find_all function is very close to the final result.
When you loop through the files, you can check their extension with os.path.splitext, and if they are .pdf files you can move them with shutil.move.
Here's an example that walks the tree of a source directory, checks the extension of every file and, in case of match, moves the files to a destination directory:
import os
import shutil
def move_all_ext(extension, source_root, dest_dir):
    # Recursively walk source_root
    for (dirpath, dirnames, filenames) in os.walk(source_root):
        # Loop through the files in current dirpath
        for filename in filenames:
            # Check file extension
            if os.path.splitext(filename)[-1] == extension:
                # Move file
                shutil.move(os.path.join(dirpath, filename), os.path.join(dest_dir, filename))
# Move all pdf files from C:\ to G:\Books
move_all_ext(".pdf", "C:\\", "G:\\Books")
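The comparison above is case-sensitive, so a file named "Report.PDF" would be missed. Files that share a name would also collide in the destination folder. The variant below is a sketch that assumes you want case-insensitive matching and prefer skipping duplicates to overwriting them. `move_all_ext_safe` is a hypothetical name, and it also avoids walking into the destination folder if that folder lives inside source_root.

```python
import os
import shutil


def move_all_ext_safe(extension, source_root, dest_dir):
    """Move files whose extension matches (case-insensitively) into dest_dir.

    Files whose name already exists in dest_dir are skipped, not overwritten.
    Returns (moved, skipped) lists of source paths.
    """
    extension = extension.lower()
    dest_abs = os.path.abspath(dest_dir)
    moved, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(source_root):
        # Don't descend into the destination if it lives under source_root.
        dirnames[:] = [d for d in dirnames
                       if os.path.abspath(os.path.join(dirpath, d)) != dest_abs]
        for filename in filenames:
            if os.path.splitext(filename)[1].lower() != extension:
                continue
            src = os.path.join(dirpath, filename)
            dest = os.path.join(dest_dir, filename)
            if os.path.exists(dest):
                skipped.append(src)  # a file with this name was already moved
            else:
                shutil.move(src, dest)
                moved.append(src)
    return moved, skipped
```

Returning the skipped paths lets you decide afterwards how to rename or merge the duplicates, rather than losing them silently.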
You can use glob from python 3.5 onwards. It supports a recursive search.
If recursive is true, the pattern “**” will match any files and zero or more directories and subdirectories. If the pattern is followed by an os.sep, only directories and subdirectories match.
Therefore you can use it like
import glob
from os import path
import shutil
def searchandmove(wild, srcpath, destpath):
    search = path.join(srcpath,'**', wild)
    for fpath in glob.iglob(search, recursive=True):
        print(fpath)
        dest = path.join(destpath, path.basename(fpath))
        shutil.move(fpath, dest)
searchandmove('*.pdf', 'C:\\', 'G:\\Books')
This needs a minimum of string wrangling. Large searches, such as one from the root of a filesystem, can take a while, but I'm sure any approach would have this issue.
Tested only on linux, but should work fine on windows. Whatever you pass as destpath must already exist.
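If you prefer pathlib (Python 3.4+), Path.rglob does the same recursive match. The sketch below also creates the destination if it is missing. `search_and_move_pathlib` is a hypothetical name. Note that rglob matching is case-sensitive on POSIX systems and case-insensitive on Windows.

```python
from pathlib import Path
import shutil


def search_and_move_pathlib(pattern, srcpath, destpath):
    """Recursively move files matching pattern (e.g. '*.pdf') from srcpath into destpath."""
    dest = Path(destpath)
    dest.mkdir(parents=True, exist_ok=True)  # create the destination if it is missing
    # Collect the matches first so moving files doesn't disturb the ongoing search.
    for fpath in list(Path(srcpath).rglob(pattern)):
        if fpath.is_file():
            shutil.move(str(fpath), str(dest / fpath.name))
```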

Unable to use getsize method with os.walk() returned files

I am trying to make a small program that looks through a directory. Because I want to find all the files in the subdirectories recursively, I use os.walk().
Here is my code:
import os
import os.path
filesList=[]
path = "C:\\Users\Robin\Documents"
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList+=file
Then I try to apply the os.path.getsize() method to the elements of filesList, but it doesn't work.
Indeed, I realize that this code fills the list filesList with characters. I don't know what to do. I have tried several other things, such as:
for(root,dirs,files) in os.walk(path):
    filesList+=[file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, and it isn't even visible when I look in the directory.
Can someone explain to me how to obtain files I can work with (that is, get their size, hash them, or modify them) using os.walk?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
If you want to get a list of files including path use:
for(root, dirs, files) in os.walk(path):
    fullpaths = [os.path.join(root, fil) for fil in files]
    filesList+=fullpaths
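With full paths in hand, os.path.getsize works as expected. Here is a short sketch that maps each file to its size. `file_sizes` is a hypothetical helper. A broken symbolic link would still make getsize raise OSError, so guard that call if your tree may contain such links.

```python
import os


def file_sizes(path):
    """Map each file's full path under `path` to its size in bytes."""
    sizes = {}
    for root, dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)  # full path, usable by getsize/open
            sizes[full] = os.path.getsize(full)
    return sizes
```

The total size of the tree is then sum(file_sizes(path).values()).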

How do you get the absolute path of a file in Python?

I have read quite a few links on the site saying to use "os.path.abspath(#filename)". This method isn't exactly working for me. I am writing a program that will be able to search a given directory for files with certain extensions, save the name and absolute path as keys and values (respectively) into a dictionary, and then use the absolute path to open the files and make the edits that are required. The problem I am having is that when I use os.path.abspath() it isn't returning the full path.
Let's say my program is on the desktop. I have a file stored at "C:\Users\Travis\Desktop\Test1\Test1A\test.c". My program can easily locate this file, but when I use os.path.abspath() it returns "C:\Users\Travis\Desktop\test.c" which is the absolute path of where my source code is stored, but not the file I was searching for.
My exact code is:
import os
Files={}#Dictionary that will hold file names and absolute paths
root=os.getcwd()#Finds starting point
for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):#Look for files that end in .c
            Files[file]=os.path.abspath(file)
Any tips or advice as to why it may be doing this and how I can fix it? Thanks in advance!
os.path.abspath() makes a relative path absolute relative to the current working directory, not to the file's original location. A path is just a string; Python has no way of knowing where the filename came from.
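A quick way to see the difference is to put both calls side by side for every walked file. This is an illustrative sketch, and `compare_paths` is a hypothetical name. Only the os.path.join version points at the folder where the file actually lives.

```python
import os


def compare_paths(top):
    """For each file under top, pair os.path.abspath(name) with os.path.join(root, name)."""
    pairs = []
    for root, dirs, files in os.walk(top):
        for name in files:
            cwd_based = os.path.abspath(name)      # name glued onto os.getcwd()
            walk_based = os.path.join(root, name)  # name glued onto the folder being walked
            pairs.append((cwd_based, walk_based))
    return pairs
```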
You need to supply the directory yourself. When you use os.walk, each iteration gives you the directory being listed (root in your code), the list of subdirectories (just their names) and a list of filenames (again, just their names). Use root together with the filename to make an absolute path:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            Files[file] = os.path.join(root, os.path.abspath(file))
Note that your code only records the one path for each unique filename; if you have foo/bar/baz.c and foo/spam/baz.c, it depends on the order the OS listed the bar and spam subdirectories which one of the two paths wins.
You may want to collect paths into a list instead:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            full_path = os.path.join(root, os.path.abspath(file))
            Files.setdefault(file, []).append(full_path)
Per the docs for os.path.join,
If any component is an absolute path, all previous components (on Windows, including the previous drive letter, if there was one) are thrown away
So, for example, if the second argument is an absolute path, the first path, '/a/b/c', is discarded.
In [14]: os.path.join('/a/b/c', '/d/e/f')
Out[14]: '/d/e/f'
Therefore,
os.path.join(root, os.path.abspath(file))
will discard root no matter what it is, and return os.path.abspath(file) which will tack file on to the current working directory, which will not necessarily be the same as root.
Instead, to form the absolute path to the file:
fullpath = os.path.abspath(os.path.join(root, file))
Actually, I believe the os.path.abspath is unnecessary, since I believe root will always be absolute, but my reasoning for that depends on the source code for os.walk not just the documented (guaranteed) behavior of os.walk. So to be absolutely sure (pun intended), use os.path.abspath.
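On that last point, the os.walk documentation says that os.path.join(dirpath, name) gives a full path that begins with top. So if top is absolute (and os.getcwd() always returns an absolute path), every root should be absolute too. The sketch below lets you check this for a given tree. `walk_roots` is a hypothetical name.

```python
import os


def walk_roots(top):
    """Collect every root value os.walk yields for top."""
    return [root for root, dirs, files in os.walk(top)]


# Example check: all(os.path.isabs(r) for r in walk_roots(os.getcwd()))
```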
import os
samefiles = {}
root = os.getcwd()
for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):
            fullpath = os.path.join(root, file)
            samefiles.setdefault(file, []).append(fullpath)
print(samefiles)
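Glob is useful in these cases. For the .c files directly in the current directory, you can do:
import glob
import os
files = {f:os.path.join(os.getcwd(), f) for f in glob.glob("*.c")}
to get the same result. Note that this pattern does not descend into subdirectories the way os.walk does.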
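To make the glob approach reach the subdirectories as well, you can use the recursive "**" pattern (Python 3.5+). This is a sketch, and `c_files_recursive` is a hypothetical name. As in the question, a later duplicate file name overwrites an earlier one in the dict. glob also skips names that start with a dot.

```python
import glob
import os


def c_files_recursive(top):
    """Map .c filenames under top (at any depth) to their full paths, via recursive glob."""
    # glob.escape protects top in case it contains wildcard characters like '[' or '*'.
    pattern = os.path.join(glob.escape(top), "**", "*.c")
    return {os.path.basename(p): p for p in glob.glob(pattern, recursive=True)}
```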

Efficiently removing subdirectories in dirnames from os.walk

On a Mac with Python 2.7, when walking through directories with os.walk, my script goes into 'apps' (i.e. appname.app), since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to just ignore those kinds of 'directories'.
So this is my current solution:
for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff
As you can see, the second for loop runs over every entry of subdirs, which seems unnecessary since the first pass removes everything I want to remove anyway.
There must be a more efficient way to do this. Any ideas?
You can do something like this (assuming you want to ignore directories containing '.'):
subdirs[:] = [d for d in subdirs if '.' not in d]
The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
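Wrapped up as a reusable walker, the pruning looks like the sketch below. It assumes app bundles are the folders whose names end in ".app" (a narrower test than the question's '.' anywhere in the name). `walk_skipping_apps` is a hypothetical name. Pruning only works with the default topdown=True, because the directory list is consulted after each step is yielded.

```python
import os


def walk_skipping_apps(top):
    """Yield (root, dirs, files) like os.walk, but never descend into '*.app' folders."""
    for root, dirs, files in os.walk(top):  # topdown=True (the default) is required
        # Slice assignment edits the very list os.walk uses to choose where to go next.
        dirs[:] = [d for d in dirs if not d.endswith(".app")]
        yield root, dirs, files
```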
Note that your original code is incorrect because it modifies the list while iterating over it. Python does not forbid this, but the loop then skips the element that follows each removed one.
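Here is a tiny illustration of that skipping, using made-up directory names in place of a real os.walk result:

```python
subdirs = ["a.app", "b.app", "docs"]
for subdir in subdirs:
    if "." in subdir:
        subdirs.remove(subdir)  # shifts later items left, so the loop jumps over "b.app"
print(subdirs)  # ['b.app', 'docs']
```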
Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
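Before pointing a bottom-up walk at anything destructive, it can help to look at the visiting order on its own. This sketch only records the roots (`bottom_up_order` is a hypothetical name). With topdown=False, each directory appears after all of its subdirectories, which is why os.rmdir can succeed on the way up. If the goal really is to delete a whole tree, the standard library's shutil.rmtree(top) does it in one call.

```python
import os


def bottom_up_order(top):
    """Return the roots os.walk visits with topdown=False (children before parents)."""
    return [root for root, dirs, files in os.walk(top, topdown=False)]
```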
I am a bit confused about your goal: are you trying to remove a directory subtree and running into errors, or are you trying to walk a tree and list only plain file names (excluding directory names)?
I think all that is required is to remove the directory before iterating over it:
for root, subdirs, files in os.walk(directory, True):
    if '.' in subdirs:
        subdirs.remove('.')
    for subdir in subdirs:
        #do more stuff
        pass
