I have read quite a few links on the site saying to use "os.path.abspath(#filename)". This method isn't exactly working for me. I am writing a program that will be able to search a given directory for files with certain extensions, save the name and absolute path as keys and values (respectively) into a dictionary, and then use the absolute path to open the files and make the edits that are required. The problem I am having is that when I use os.path.abspath() it isn't returning the full path.
Let's say my program is on the desktop. I have a file stored at "C:\Users\Travis\Desktop\Test1\Test1A\test.c". My program can easily locate this file, but when I use os.path.abspath() it returns "C:\Users\Travis\Desktop\test.c" which is the absolute path of where my source code is stored, but not the file I was searching for.
My exact code is:
import os
Files={}#Dictionary that will hold file names and absolute paths
root=os.getcwd()#Finds starting point
for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):#Look for files that end in .c
            Files[file]=os.path.abspath(file)
Any tips or advice as to why it may be doing this and how I can fix it? Thanks in advance!
os.path.abspath() makes a relative path absolute by resolving it against the current working directory, not against the file's original location. A path is just a string; Python has no way of knowing where the filename came from.
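To see this concretely, here is a small sketch. The name test.c is just the example from the question and does not need to exist, and abspath_by_hand is a hypothetical helper written only for illustration. For a bare filename, os.path.abspath amounts to prefixing the current working directory:

```python
import os

def abspath_by_hand(name):
    # Roughly what os.path.abspath does for a relative name: prefix the
    # current working directory and normalize the result.
    return os.path.normpath(os.path.join(os.getcwd(), name))

name = "test.c"  # the file does not need to exist
print(os.path.abspath(name))
print(abspath_by_hand(name) == os.path.abspath(name))
```

Whatever directory the name originally came from plays no part in the result.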
You need to supply the directory yourself. When you use os.walk, each iteration yields the directory being visited (root in your code), the list of its subdirectories (just their names) and a list of filenames (again, just their names). Use root together with the filename to make an absolute path:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            # root is the directory that actually contains file
            Files[file] = os.path.join(root, file)
Note that your code only records the one path for each unique filename; if you have foo/bar/baz.c and foo/spam/baz.c, it depends on the order the OS listed the bar and spam subdirectories which one of the two paths wins.
You may want to collect paths into a list instead:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            # Join with root; cwd is absolute, so root is too
            full_path = os.path.join(root, file)
            Files.setdefault(file, []).append(full_path)
Per the docs for os.path.join,
If any component is an absolute path, all previous components (on
Windows, including the previous drive letter, if there was one) are
thrown away
So, for example, if the second argument is an absolute path, the first path, '/a/b/c' is discarded.
In [14]: os.path.join('/a/b/c', '/d/e/f')
Out[14]: '/d/e/f'
Therefore, an expression such as
os.path.join(root, os.path.abspath(file))
will discard root no matter what it is, and return os.path.abspath(file) which will tack file on to the current working directory, which will not necessarily be the same as root.
Instead, to form the absolute path to the file:
fullpath = os.path.abspath(os.path.join(root, file))
Actually, I believe the os.path.abspath is unnecessary, since I believe root will always be absolute, but my reasoning for that depends on the source code for os.walk not just the documented (guaranteed) behavior of os.walk. So to be absolutely sure (pun intended), use os.path.abspath.
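As a rough sketch of that advice, the function below (c_files_by_name is a hypothetical name, not from the question) wraps the join in os.path.abspath so the result is absolute even when the walk starts from a relative path such as ".":

```python
import os

def c_files_by_name(top):
    """Map each .c filename under `top` to a list of absolute paths."""
    found = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".c"):
                # root may be relative if `top` was, so normalize here
                full = os.path.abspath(os.path.join(root, name))
                found.setdefault(name, []).append(full)
    return found
```

Calling c_files_by_name(".") from the desktop in the question's setup should then record the path under Test1\Test1A rather than a path directly on the desktop.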
import os
samefiles = {}
root = os.getcwd()
for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):
            fullpath = os.path.join(root, file)
            samefiles.setdefault(file, []).append(fullpath)
print(samefiles)
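Glob is useful in these cases. For files directly in the current directory (glob.glob("*.c") does not descend into subdirectories), you can do:
import glob, os
files = {f:os.path.join(os.getcwd(), f) for f in glob.glob("*.c")}
to get the same result.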
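If subdirectories matter, as they do in the question, glob can also search recursively. This is only a sketch: c_files_recursive is a made-up helper name, and like the dictionary in the question it keeps just one path per filename, so duplicates overwrite each other.

```python
import glob
import os

def c_files_recursive(top):
    # "**" with recursive=True matches `top` itself and all subdirectories
    pattern = os.path.join(top, "**", "*.c")
    return {os.path.basename(p): os.path.abspath(p)
            for p in glob.glob(pattern, recursive=True)}
```

The recursive flag requires Python 3.5 or later.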
I've tried to use os.path.abspath(file) as well as Path.absolute(file) to get the paths of .png files I'm working on that are on a separate drive from the project folder that the code is in. The result from the following script is "Project Folder for the code/filename.png", whereas obviously what I need is the path to the folder that the .png is in;
for root, dirs, files in os.walk(newpath):
    for file in files:
        if not file.startswith("."):
            if file.endswith(".png"):
                number, scansize, letter = file.split("-")
                filepath = os.path.abspath(file)
                # replace weird backslash effects
                correctedpath = filepath.replace(os.sep, "/")
                newentry = [number, file, correctedpath]
                textures.append(newentry)
I've read other answers on here that seem to suggest that the project file for the code can't be in the same directory as the folder that is being worked on. But that isn't the case here. Can someone kindly point out what I'm not getting? I need the absolute path because the purpose of the program will be to write the paths for the files into text files.
You could use pathlib.Path.rglob here to recursively get all the pngs:
As a list comprehension:
from pathlib import Path
search_dir = "/path/to/search/dir"
# This creates a list of tuples with `number` and the resolved path
paths = [(p.name.split("-")[0], p.resolve()) for p in Path(search_dir).rglob("*.png")]
Alternatively, you can process them in a loop:
paths = []
for p in Path(search_dir).rglob("*.png"):
    number, scansize, letter = p.name.split("-")
    # more processing ...
    paths.append([number, p.resolve()])
I just recently wrote something like what you're looking for.
This code relies on the assumption that your files are at the end of the path.
It's not suitable for finding a directory or anything like that.
There's no need for a nested loop.
import os
DIR = "your/full/path/to/directory/containing/desired/files"
def get_file_path(name, template):
    """
    :param name: file's name without the extension
    :param template: file's extension (txt, html...)
    :return: The path to the given file, or None if it is not in DIR.
    :rtype: str
    """
    substring = f'{name}.{template}'
    for path in os.listdir(DIR):
        full_path = os.path.join(DIR, path)
        if full_path.endswith(substring):
            return full_path
The result from
for root, dirs, files in os.walk(newpath):
is that files just contains the filenames without a directory path. Using just filenames means that Python by default uses the current working directory (here, your project folder) as the directory for those filenames. The files are actually in root, the directory os.walk is currently visiting (newpath itself on the first iteration). You can use os.path.join to add that directory path to the found filenames.
filepath = os.path.join(root, file)
In case you want to find the png files in subdirectories the easiest way is to use glob:
import glob
newpath = r'D:\Images'
file_paths = glob.glob(newpath + "/**/*.png", recursive=True)
for file_path in file_paths:
    print(file_path)
I am very new to python.
I need to iterate through the subdirectories of a given directory and return all files containing a certain string.
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith((".sql")):
            if 'gen_dts' in open(name).read():
                print name
This was the closest I got.
The error I get is
Traceback (most recent call last):
File "<pyshell#77>", line 4, in <module>
if 'gen_dts' in open(name).read():
IOError: [Errno 2] No such file or directory: 'dq_offer_desc_bad_pkey_vw.sql'
The 'dq_offer_desc_bad_pkey_vw.sql' file does not contain 'gen_dts' in it.
I appreciate the help in advance.
You're getting that error because you're trying to open name, which is just the file's name, not its full relative path. What you need to do is open(os.path.join(root, name), 'r') (I added the mode since it's good practice).
for root, dirs, files in os.walk(path):
    for name in files:
        if name.endswith('.sql'):
            # Open via the joined path, not the bare name
            filepath = os.path.join(root, name)
            if 'gen_dts' in open(filepath, 'r').read():
                print(filepath)
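If you need this more than once, the same idea can be wrapped in a generator. This is a Python 3 sketch, not a definitive version: files_containing is a hypothetical name, and errors="ignore" is an assumption made so that files with undecodable bytes do not abort the search.

```python
import os

def files_containing(top, suffix, needle):
    """Yield paths of files under `top` ending in `suffix` whose text contains `needle`."""
    for root, dirs, files in os.walk(top):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            # The with block closes each file as soon as it has been read
            with open(path, "r", errors="ignore") as f:
                if needle in f.read():
                    yield path
```

For the case in the question, you would loop over files_containing(path, '.sql', 'gen_dts') and print each result.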
os.walk() returns a generator that gives you tuples like (root, dirs, files), where root is the current directory, and dirs and files are the names of the directories and files, respectively, that are in the root directory. Note that they are the names, not the paths; or to be precise, they're the path of that directory/file relative to the current root directory, which is another way of saying the same thing. Another way to think of it is that the directories and files in dirs and files will never have slashes in them.
One final point: the root directory paths always begin with the path that you pass to os.walk(), whether it was relative to your current working directory or not. So, for os.walk('three'), the root in the first tuple will be 'three' (for os.walk('three/'), it'll be 'three/'). For os.walk('../two/three'), it'll be '../two/three'. For os.walk('/one/two/three/'), it'll be '/one/two/three/'; the root in the second tuple might be '/one/two/three/four'.
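A quick way to check this on your own tree is to collect just the root values. The helper name walk_roots is made up for this sketch:

```python
import os

def walk_roots(top):
    # Keep only the first element of each (root, dirs, files) tuple
    return [root for root, dirs, files in os.walk(top)]
```

For a directory three that contains a single subdirectory four, walk_roots('three') would give 'three' followed by os.path.join('three', 'four'); passing an absolute path gives absolute roots instead.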
The files are just the file names. You need to add the path to them before opening them. Use os.path.join.
I am trying to make a small program that looks through a directory (as I want to find recursively all the files in the sub directories I use os.walk()).
Here is my code:
import os
import os.path
filesList=[]
path = "C:\\Users\Robin\Documents"
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList+=file
Then I try to use the os.path.getsize() method on elements of filesList, but it doesn't work.
Indeed, I realize that this code fills the list filesList with characters. I don't know what to do. I have tried several other things, such as:
for(root,dirs,files) in os.walk(path):
    filesList+=[file for file in os.listdir(root) if os.path.isfile(file)]
This does give me files, but only one, which isn't even visible when looking in the directory.
Can someone explain how to obtain files that we can work with (that is to say, get their size, hash them, or modify them...) using os.walk?
I am new to Python, and I don't really understand how to use os.walk().
The issue I suspect you're running into is that file contains only the filename itself, not any directories you have to navigate through from your starting folder. You should use os.path.join to combine the file name with the folder it is in, which is the root value yielded by os.walk:
for(root,dirs,files) in os.walk(path):
    for file in files:
        filesList.append(os.path.join(root, file))
Now all the filenames in filesList will be acceptable to os.path.getsize and other functions (like open).
I also fixed a secondary issue, which is that your use of += to extend a list wouldn't work the way you intended. You'd need to wrap the new file path in a list for that to work. Using append is more appropriate for adding a single value to the end of a list.
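To illustrate the += point with a made-up filename: extending a list with a string adds its characters one by one, which is why the original list filled up with single characters.

```python
files_list = []
files_list += "a.txt"          # a string is iterated character by character
print(files_list)              # ['a', '.', 't', 'x', 't']

files_list = []
files_list += ["a.txt"]        # a one-element list adds the whole name
files_list.append("b.txt")     # append adds a single item
print(files_list)              # ['a.txt', 'b.txt']
```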
If you want to get a list of files including path use:
for(root, dirs, files) in os.walk(path):
    fullpaths = [os.path.join(root, fil) for fil in files]
    filesList+=fullpaths
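With full paths in hand, functions such as os.path.getsize work as expected. A minimal sketch, where file_sizes is a hypothetical helper and a broken symbolic link in the tree would raise OSError:

```python
import os

def file_sizes(top):
    """Return {full_path: size_in_bytes} for every file under `top`."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        for name in files:
            full = os.path.join(root, name)
            sizes[full] = os.path.getsize(full)
    return sizes
```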
What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this but I always get told that the file I want doesn't exist which is not the case:
import os
import xy
rootdir =r'path'
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(fileX)
        print gain
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!
os.walk returns file and directory names relative to the root directory that it gives. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
To trim it to ignore any folders but those named folderX, you could do something like the following. When doing os.walk top down (the default), you can delete items from the dirs list to prevent os.walk from looking in those directories.
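import re
for root, dirs, files in os.walk(rootdir):
    # Prune in place so os.walk skips folders not named folderX<number>;
    # calling dirs.remove() while looping over dirs would skip entries.
    dirs[:] = [d for d in dirs if re.match(r'folderX[0-9]+$', d)]
    for file in files:
        gain = xy.method(os.path.join(root, file))
        print(gain)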
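One caveat when pruning dirs: removing items from a list while looping over that same list skips the element that slides into the freed slot. The folder names below are invented purely to show the effect, and to contrast it with slice assignment, which filters the list in place without skipping anything:

```python
import re

dirs = ["other", "misc", "folderX0", "tmp", "folderX1"]

# Removing while iterating: "misc" slides into index 0 after "other" is
# removed and is never examined, so it survives.
buggy = list(dirs)
for d in buggy:
    if not re.match(r"folderX[0-9]+$", d):
        buggy.remove(d)
print(buggy)   # ['misc', 'folderX0', 'folderX1']

# Slice assignment replaces the contents of the same list object.
fixed = list(dirs)
fixed[:] = [d for d in fixed if re.match(r"folderX[0-9]+$", d)]
print(fixed)   # ['folderX0', 'folderX1']
```

Inside an os.walk loop the in-place form matters, because os.walk only notices changes made to the very list object it handed you.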
I have this script, which I have no doubt is flawed:
import fnmatch, os, sys
def findit (rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print (folder)
    for filename in fnmatch.filter(files,pattern):
        with open(filename) as f:
            s = f.read()
            f.close()
            if find in s :
                print(filename)
findit(sys.argv[1], sys.argv[2], sys.argv[3])
when I run it I get Errno2, no such file or directory. BUT the file exists. For instance if I execute it by going: findit.py c:\python "folder" *.py it will work just fine, listing all the *.py files which contain the word "folder". BUT if I go findit.py c:\php\projects1 "include" *.php
as an example I get [Errno2] no such file or directory: 'About.php' (for example). But About.php exists. I don't understand what it's doing, or what I'm doing wrong.
If you look at any of the examples for os.walk, you'll see that they all do os.path.join(root, name). You need to do that too.
Why? Quoting from the docs:
filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
If you just use the filename as a path, it's going to look for a file of the same name in the current working directory. If there's no such file, you'll get a FileNotFoundError. If there is such a file, you'll open and read the wrong file. Only if you happen to be looking inside the current working directory will it work.
There's also another major problem in your code: os.walk walks a directory tree recursively, finding all files in the given top directory, or any subdirectory of top, or any subdirectory of… and so on, yielding once for each directory. But you're not doing anything useful with that (except printing out the folders). Instead, you wait until it finishes, and then use the files from whichever directory it happened to reach last.
If you just want to get a flat listing of the files directly in a directory, use os.listdir, not os.walk. (Or maybe use glob.glob instead of explicitly listing everything then filtering with fnmatch.)
On the other hand, if you want to walk the tree, you have to move your second for loop inside the first one.
You've also got a minor problem: You call f.close() inside a with open(…) as f:, which leads to f being closed twice. This is guaranteed to be completely harmless (at least in 2.5+, including 3.x), but it's still a bad idea.
Putting it together, here's a working version of your code:
def findit (rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print (folder)
        for filename in fnmatch.filter(files,pattern):
            pathname = os.path.join(folder, filename)
            with open(pathname) as f:
                s = f.read()
            if find in s:
                print(pathname)
You are using a relative filename. But your current directory does not contain the file. And you don't want to search there anyway. Use os.path.join(folder, filename) to build a path that includes the file's directory (it is absolute whenever the rootdir you passed in is absolute).