Loop through folders in Python and for files containing strings - python

I am very new to python.
I need to iterate through the subdirectories of a given directory and return all files containing a certain string.
for root, dirs, files in os.walk(path):
for name in files:
if name.endswith((".sql")):
if 'gen_dts' in open(name).read():
print name
This was the closest I got.
The syntax error I get is
Traceback (most recent call last):
File "<pyshell#77>", line 4, in <module>
if 'gen_dts' in open(name).read():
IOError: [Errno 2] No such file or directory: 'dq_offer_desc_bad_pkey_vw.sql'
The 'dq_offer_desc_bad_pkey_vw.sql' file does not contain 'gen_dts' in it.
I appreciate the help in advance.

You're getting that error because you're trying to open name, which is just the file's name, not it's full relative path. What you need to do is open(os.path.join(root, name), 'r') (I added the mode since it's good practice).
for root, dirs, files in os.walk(path):
for name in files:
if name.endswith('.sql'):
filepath = os.path.join(root, name)
if 'gen_dts' in open(filepath, 'r').read():
print filepath
os.walk() returns a generator that gives you tuples like (root, dirs, files), where root is the current directory, and dirs and files are the names of the directories and files, respectively, that are in the root directory. Note that they are the names, not the paths; or to be precise, they're the path of that directory/file relative to the current root directory, which is another way of saying the same thing. Another way to think of it is that the directories and files in dirs and files will never have slashes in them.
One final point; the root directory paths always begin with the path that you pass to os.walk(), whether it was relative to your current working directory or not. So, for os.walk('three'), the root in the first tuple will be 'three' (for os.walk('three/'), it'll be 'three/'). For os.walk('../two/three'), it'll be '../two/three'. For os.walk('/one/two/three/'), it'll be '/one/two/three/'; the second one might be '/one/two/three/four'.

The files are just the file names. You need to add the path to the before opening them. Use os.path.join.

Related

Reading all files that start with a certain string in a directory

Say I have a directory.
In this directory there are single files as well as folders.
Some of those folders could also have subfolders, etc.
What I am trying to do is find all of the files in this directory that start with "Incidences" and read each csv into a pandas data frame.
I am able to loop through all the files and get the names, but cannot read them into data frames.
I am getting the error that "___.csv" does not exist, as it might not be directly in the directory, but rather in a folder in another folder in that directory.
I have been trying the attached code.
inc_files2 = []
pop_files2 = []
for root, dirs, files in os.walk(directory):
for f in files:
if f.startswith('Incidence'):
inc_files2.append(f)
elif f.startswith('Population Count'):
pop_files2.append(f)
for file in inc_files2:
inc_frames2 = map(pd.read_csv, inc_files2)
for file in pop_files2:
pop_frames2 = map(pd.read_csv, pop_files2)
You are adding only file name to the lists, not their path. You can use something like this to add paths instead:
inc_files2.append(os.path.join(root, f))
You have to add the path from the root directory where you are
Append the entire pathname, not just the bare filename, to inc_files2.
You can use os.path.abspath(f) to read the full path of a file.
You can make use of this by making the following changes to your code.
for root, dirs, files in os.walk(directory):
for f in files:
f_abs = os.path.abspath(f)
if f.startswith('Incidence'):
inc_files2.append(f_abs)
elif f.startswith('Population Count'):
pop_files2.append(f_abs)

Unable to resolve path with os.walk

I have a bit of code that searches for files in a network share that match a certain keyword. When a match is found, I would like to copy the found files to a different location on the network. The error I'm getting is as follows:
Traceback (most recent call last):
File "C:/Users/user.name/PycharmProjects/SearchDirectory/Sub-Search.py", line 15, in <module>
shutil.copy(path+name, dest)
File "C:\Python27\lib\shutil.py", line 119, in copy
copyfile(src, dst)
File "C:\Python27\lib\shutil.py", line 82, in copyfile
with open(src, 'rb') as fsrc:
IOError: [Errno 2] No such file or directory: '//server/otheruser$/Document (user).docx'
I believe it's because I'm trying to copy the found file without specifying its direct path, since some of the files are found in subfolders. If so, how can I store the direct path to a file when it matches the keyword? Here is the code I have so far:
import os
import shutil
dest = '//dbserver/user.name$/Reports/User'
path = '//dbserver/User$/'
keyword = 'report'
print 'Starting'
for root, dirs, files in os.walk(path):
for name in files:
if keyword in name.lower():
shutil.copy(path+name, dest)
print name
print 'Done'
PS. The user folder being accessed is hidden, hence the $.
Looking at the docs for os.walk, your error is most likely that you are not including the full path. To avoid having to worry about things like trailing slashes and OS/specific path separators, you should also consider using os.path.join.
Replace path+name with os.path.join(root, name). The root element is the path of the subdirectory under path actually containing name, which you are currently omitting from your full path.
You should also replace dest with os.path.join(dest, os.path.relpath(root, path)) if you wish to preserve the directory structure in the destination. os.path.relpath subtracts the path prefix of path from root, allowing you to create the same relative path under dest. If the correct subfolders do not exist, you may want to call os.mkdir or better yet os.makedirs on them as you go:
for root, dirs, files in os.walk(path):
out = os.path.join(dest, os.path.relpath(root, path))
#os.makedirs(out) # You may end up with empty folders if you put this line here
for name in files:
if keyword in name.lower():
os.makedirs(out) # This guarantees that only folders with at least one file get created
shutil.copy(os.path.join(root, name), out)
Finally, look into shutil.copytree, wich does something very similar to what you want. The only disadvantage is that it does not offer the fine level of control for things like filtering that os.walk does (which you are using).

iterating through folders and from each use one specific file in a method python

What I want to do is iterate through folders in a directory and in each folder find a file 'fileX' which I want to give to a method which itself needs the file name as a parameter to open it and get a specific value from it. So 'method' will extract some value from 'fileX' (the file name is the same in every folder).
My code looks something like this but I always get told that the file I want doesn't exist which is not the case:
import os
import xy
rootdir =r'path'
for root, dirs, files in os.walk(rootdir):
for file in files:
gain = xy.method(fileX)
print gain
Also my folders I am iterating through are named like 'folderX0', 'folderX1',..., 'folderX99', meaning they all have the same name with increasing ending numbers. It would be nice if I could tell the program to ignore every other folder which might be in 'path'.
Thanks for the help!
os.walk returns file and directory names relative to the root directory that it gives. You can combine them with os.path.join:
for root, dirs, files in os.walk(rootdir):
for file in files:
gain = xy.method(os.path.join(root, file))
print gain
See the documentation for os.walk for details:
To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
To trim it to ignore any folders but those named folderX, you could do something like the following. When doing os.walk top down (the default), you can delete items from the dirs list to prevent os.walk from looking in those directories.
for root, dirs, files in os.walk(rootdir):
for dir in dirs:
if not re.match(r'folderX[0-9]+$', dir):
dirs.remove(dir)
for file in files:
gain = xy.method(os.path.join(root, file))
print gain

How do you get the absolute path of a file in Python?

I have read quite a few links on the site saying to use "os.path.abspath(#filename)". This method isn't exactly working for me. I am writing a program that will be able to search a given directory for files with certain extensions, save the name and absolute path as keys and values (respectively) into a dictionary, and then use the absolute path to open the files and make the edits that are required. The problem I am having is that when I use os.path.abspath() it isn't returning the full path.
Let's say my program is on the desktop. I have a file stored at "C:\Users\Travis\Desktop\Test1\Test1A\test.c". My program can easily locate this file, but when I use os.path.abspath() it returns "C:\Users\Travis\Desktop\test.c" which is the absolute path of where my source code is stored, but not the file I was searching for.
My exact code is:
import os
Files={}#Dictionary that will hold file names and absolute paths
root=os.getcwd()#Finds starting point
for root, dirs, files in os.walk(root):
for file in files:
if file.endswith('.c'):#Look for files that end in .c
Files[file]=os.path.abspath(file)
Any tips or advice as to why it may be doing this and how I can fix it? Thanks in advance!
os.path.abspath() makes a relative path absolute relative to the current working directory, not to the file's original location. A path is just a string, Python has no way of knowing where the filename came from.
You need to supply the directory yourself. When you use os.walk, each iteration lists the directory being listed (root in your code), the list of subdirectories (just their names) and a list of filenames (again, just their names). Use root together with the filename to make an absolute path:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
for file in files:
if file.endswith('.c'):
Files[file] = os.path.join(root, os.path.abspath(file))
Note that your code only records the one path for each unique filename; if you have foo/bar/baz.c and foo/spam/baz.c, it depends on the order the OS listed the bar and spam subdirectories which one of the two paths wins.
You may want to collect paths into a list instead:
Files={}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
for file in files:
if file.endswith('.c'):
full_path = os.path.join(root, os.path.abspath(file))
Files.setdefault(file, []).append(full_path)
Per the docs for os.path.join,
If any component is an absolute path, all previous components (on
Windows, including the previous drive letter, if there was one) are
thrown away
So, for example, if the second argument is an absolute path, the first path, '/a/b/c' is discarded.
In [14]: os.path.join('/a/b/c', '/d/e/f')
Out[14]: '/d/e/f'
Therefore,
os.path.join(root, os.path.abspath(file))
will discard root no matter what it is, and return os.path.abspath(file) which will tack file on to the current working directory, which will not necessarily be the same as root.
Instead, to form the absolute path to the file:
fullpath = os.path.abspath(os.path.join(root, file))
Actually, I believe the os.path.abspath is unnecessary, since I believe root will always be absolute, but my reasoning for that depends on the source code for os.walk not just the documented (guaranteed) behavior of os.walk. So to be absolutely sure (pun intended), use os.path.abspath.
import os
samefiles = {}
root = os.getcwd()
for root, dirs, files in os.walk(root):
for file in files:
if file.endswith('.c'):
fullpath = os.path.join(root, file)
samefiles.setdefault(file, []).append(fullpath)
print(samefiles)
Glob is useful in these cases, you can do:
files = {f:os.path.join(os.getcwd(), f) for f in glob.glob("*.c")}
to get the same result

Python script errors out

I have this script, which I have no doubt is flawed:
import fnmatch, os, sys
def findit (rootdir, find, pattern):
for folder, dirs, files in os.walk(rootdir):
print (folder)
for filename in fnmatch.filter(files,pattern):
with open(filename) as f:
s = f.read()
f.close()
if find in s :
print(filename)
findit(sys.argv[1], sys.argv[2], sys.argv[3])
when I run it I get Errno2, no such file or directory. BUT the file exists. For instance if I execute it by going: findit.py c:\python "folder" *.py it will work just fine, listing all the *.py files which contain the word "folder". BUT if I go findit.py c:\php\projects1 "include" *.php
as an example I get [Errno2] no such file or directory: 'About.php' (for example). But About.php exists. I don't understand what it's doing, or what I'm doing wrong.
If you look at any of the examples for os.walk, you'll see that they all do os.path.join(root, name). You need to do that too.
Why? Quoting from the docs:
filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
If you just use the filename as a path, it's going to look for a file of the same name in the current working directory. If there's no such file, you'll get a FileNotFoundError. If there is such a file, you'll open and read the wrong file. Only if you happen to be looking inside the current working directory will it work.
There's also another major problem in your code: os.walk walks a directory tree recursively, finding all files in the given top directory, or any subdirectory of top, or any subdirectory of… and so on, yielding once for each directory. But you're not doing anything useful with that (except printing out the folders). Instead, you wait until it finishes, and then use the files from whichever directory it happened to reach last.
If you just want to get a flat listing of the files directly in a directory, use os.listdir, not os.walk. (Or maybe use glob.glob instead of explicitly listing everything then filtering with fnmatch.)
On the other hand, if you want to walk the tree, you have to move your second for loop inside the first one.
You've also got a minor problem: You call f.close() inside a with open(…) as f:, which leads to f being closed twice. This is guaranteed to be completely harmless (at least in 2.5+, including 3.x), but it's still a bad idea.
Putting it together, here's a working version of your code:
def findit (rootdir, find, pattern):
for folder, dirs, files in os.walk(rootdir):
print (folder)
for filename in fnmatch.filter(files,pattern):
pathname = os.path.join(folder, filename)
with open(pathname) as f:
s = f.read()
if find in s:
print(pathname)
You are using a relative filename. But your current directory does not contain the file. And you don't want to search there anyway. Use os.path.join(folder, filename) to make an absolute path.

Categories