Iterating through directories in Python [duplicate] - python

I need to iterate through the subdirectories of a given directory and search for files. If I get a file I have to open it and change the content and replace it with my own lines.
I tried this:
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        f.close()
        f = open(file, 'w')
        for line in lines:
            newline = "No you are not"
            f.write(newline)
        f.close()
but I am getting an error. What am I doing wrong?

The actual walk through the directories works as you have coded it. If you replace the contents of the inner loop with a simple print statement you can see that each file is found:
import os
rootdir = 'C:/Users/sid/Desktop/test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        print(os.path.join(subdir, file))
If you still get errors when running the above, please provide the error message.
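The error in the question most likely comes from open(file, ...): os.walk yields bare file names, so each one has to be joined with subdir before it can be opened. Below is a minimal sketch of the corrected loop, assuming every file under the root is a text file that may be overwritten. The function name replace_contents, the trailing "\n" on the replacement line, and the throwaway demo directory are illustrative choices, not part of the original question.

```python
import os
import tempfile


def replace_contents(rootdir, newline="No you are not\n"):
    """Overwrite every line of every file under rootdir with newline."""
    changed = []
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            # os.walk yields bare names; join with subdir to get a usable path
            path = os.path.join(subdir, name)
            with open(path, "r") as f:
                lines = f.readlines()
            with open(path, "w") as f:
                for _ in lines:
                    f.write(newline)
            changed.append(path)
    return changed


# Demo on a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "sub"))
    demo_file = os.path.join(demo_root, "sub", "a.txt")
    with open(demo_file, "w") as f:
        f.write("one\ntwo\n")
    replace_contents(demo_root)
    with open(demo_file) as f:
        demo_result = f.read()
print(demo_result)
```

To run it on real data, call replace_contents with your own root directory, preferably on a copy first, since the files are rewritten in place.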

Another way of returning all files in subdirectories is to use the pathlib module, introduced in Python 3.4, which provides an object-oriented approach to handling filesystem paths (pathlib is also available on Python 2.7 via the pathlib2 module on PyPI):
from pathlib import Path
rootdir = Path('C:/Users/sid/Desktop/test')
# Return a list of regular files only, not directories
file_list = [f for f in rootdir.glob('**/*') if f.is_file()]
# For absolute paths instead of paths relative to the current dir
file_list = [f for f in rootdir.resolve().glob('**/*') if f.is_file()]
Since Python 3.5, the glob module also supports recursive file finding:
import os
from glob import iglob
rootdir_glob = 'C:/Users/sid/Desktop/test/**/*' # Note the added asterisks
# This will return absolute paths
file_list = [f for f in iglob(rootdir_glob, recursive=True) if os.path.isfile(f)]
The file_list from either of the above approaches can be iterated over without the need for a nested loop:
for f in file_list:
    print(f)  # Replace with desired operations

From Python 3.5 onward, you can use the ** pattern with glob.iglob(pattern, recursive=True), which seems to be the most Pythonic solution:
import glob, os
for filename in glob.iglob('/pardadox-music/**', recursive=True):
    if os.path.isfile(filename):  # filter dirs
        print(filename)
Output:
/pardadox-music/modules/her1.mod
/pardadox-music/modules/her2.mod
...
Notes:
glob.iglob
glob.iglob(pathname, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.
If recursive is True, the pattern '**' will match any files and zero or more directories and subdirectories.
If the directory contains files starting with . they won't be matched by default. For example, consider a directory containing card.gif and .card.gif:
>>> import glob
>>> glob.glob('*.gif')
['card.gif']
>>> glob.glob('.c*')
['.card.gif']
With pathlib you can also use Path.rglob(pattern), which is the same as calling Path.glob() with **/ added in front of the given relative pattern.
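As a rough illustration of rglob, here is a small sketch using pathlib. The helper name list_files and the demo file names are made up for the example; is_file() is used to drop directories, which rglob also returns.

```python
import tempfile
from pathlib import Path


def list_files(rootdir, pattern="*"):
    """Return all regular files below rootdir whose name matches pattern."""
    return sorted(p for p in Path(rootdir).rglob(pattern) if p.is_file())


# Demo on a temporary tree
with tempfile.TemporaryDirectory() as demo_root:
    root = Path(demo_root)
    (root / "modules").mkdir()
    (root / "modules" / "her1.mod").write_text("x")
    (root / "readme.txt").write_text("y")
    found = [p.relative_to(root).as_posix() for p in list_files(root)]
    mods = [p.name for p in list_files(root, "*.mod")]
print(found, mods)
```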

Related

How to run script for all files in a folder/directory

I am new to Python. I have successfully written a script to search for something within a file using:
open(r"C:\file.txt") and the re.search function, and it all works fine.
Is there a way to run the search on all files within a folder? Currently, I have to manually change the file name in my script: open(r"C:\file.txt"), open(r"C:\file1.txt"), open(r"C:\file2.txt"), etc.
Thanks.
You can use os.walk to check all the files, as follows:
import os
for root, _, files in os.walk(path):
    for filename in files:
        with open(os.path.join(root, filename), 'r') as f:
            pass  # your code goes here
Explanation:
os.walk yields a tuple of (root path, dir names, file names) for each directory in the tree, so you can iterate through the file names and open each file by using os.path.join(root, filename), which joins the root path with the file name.
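Putting the walk together with the re.search call from the question could look roughly like this. It is only a sketch: the name search_tree, the pattern, and the demo file are invented for illustration, and errors="ignore" assumes the files are mostly text.

```python
import os
import re
import tempfile


def search_tree(rootdir, pattern):
    """Yield (path, line_number, line) for every line that matches pattern."""
    regex = re.compile(pattern)
    for root, _, files in os.walk(rootdir):
        for filename in files:
            path = os.path.join(root, filename)
            # errors="ignore" drops bytes that cannot be decoded
            # (assumption: the files are mostly text)
            with open(path, "r", errors="ignore") as f:
                for number, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield path, number, line.rstrip("\n")


# Demo on a temporary folder
with tempfile.TemporaryDirectory() as demo_root:
    with open(os.path.join(demo_root, "file1.txt"), "w") as f:
        f.write("nothing here\nerror 42 found\n")
    hits = [(os.path.basename(p), n, text)
            for p, n, text in search_tree(demo_root, r"error \d+")]
print(hits)
```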
Since you're a beginner, I'll give you a simple solution and walk through it.
Import the os module, and use the os.listdir function to create a list of everything in the directory. Then, iterate through the files using a for loop.
Example:
# Importing the os module
import os
# Give the directory you wish to iterate through
my_dir = r"C:\Users\bleh\Desktop\files"  # replace with your directory
# Using os.listdir to create a list of everything in the directory
dir_list = os.listdir(my_dir)
# Use the for loop to iterate through the list you just created, and open the files
for f in dir_list:
    pass  # Whatever you want to do to all of the files
If you need help on the concepts, refer to the following:
for loops in Python 3: http://www.python-course.eu/python3_for_loop.php
os function Library (this has some cool stuff in it): https://docs.python.org/2/library/os.html
Good luck!
You can use the os.listdir(path) function:
import os
path = '/Users/ricardomartinez/repos/Salary-API'
# List for all files in a given PATH
file_list = os.listdir(path)
# If you want to filter by file type
file_list = [file for file in os.listdir(path) if os.path.splitext(file)[1] == '.py']
# In both cases you can iterate over the list and apply the operations
# that you need
for file in file_list:
    print(file)
    # Operations that you want to do over files

List only files in a directory?

Is there a way to list the files (not directories) in a directory with Python? I know I could use os.listdir and a loop of os.path.isfile()s, but if there's something simpler (like a function os.path.listfilesindir or something), it would probably be better.
This is a simple generator expression:
files = (file for file in os.listdir(path)
         if os.path.isfile(os.path.join(path, file)))
for file in files:  # You could shorten this to one line, but it runs on a bit.
    ...
Or you could make a generator function if it suited you better:
def files(path):
    for file in os.listdir(path):
        if os.path.isfile(os.path.join(path, file)):
            yield file
Then simply:
for file in files(path):
    ...
files = next(os.walk('..'))[2]
Using pathlib on Windows as follows:
files = (x for x in Path("your_path") if x.is_file())
generates an error:
TypeError: 'WindowsPath' object is not iterable
You should use Path.iterdir() instead:
filePath = Path("your_path")
if filePath.is_dir():
    files = list(x for x in filePath.iterdir() if x.is_file())
Since Python 3.5 you can use glob with the recursive pattern "**". Note that glob will give you all files and directories, so you need to keep only the entries that are files:
import glob
import os
files = glob.glob(os.path.join(in_path, "**/*"), recursive=True)
files = [f for f in files if os.path.isfile(f)]
For the special case of working with files in the current directory, you could do it as a simple one-liner list comprehension:
[f for f in os.listdir(os.curdir) if os.path.isfile(f)]
Otherwise, in the more general case, directory paths and filenames have to be joined:
dirpath = os.path.expanduser('~/path_to_dir_of_interest')  # os.listdir does not expand ~ itself
files = [f for f in os.listdir(dirpath) if os.path.isfile(os.path.join(dirpath, f))]
You could try pathlib, which has a lot of other useful stuff too.
Pathlib is an object-oriented library for interacting with filesystem paths. To get the files in the current directory, one can do:
from pathlib import Path
files = (x for x in Path(".").iterdir() if x.is_file())
for file in files:
    print(str(file), "is a file!")
This is, in my opinion, more Pythonic than using os.path.
See also: PEP 428.
Using pathlib, the shortest way to list only files is:
[x for x in Path("your_path").iterdir() if x.is_file()]
with depth support if need be.
If you use Python 3, you could use pathlib.
But you have to know what the is_file() filter keeps:
from pathlib import Path
# p is the directory path (a Path object)
# files is a list of files in the form of Path objects
files = [x for x in p.iterdir() if x.is_file()]
iterdir() yields every entry in the directory, including empty files; is_file() only drops entries that are not regular files (directories, broken symlinks, sockets and so on).
If you want everything that is not a directory instead, an alternative is:
from pathlib import Path
# p is the directory path (a Path object)
# listing all of the directory's contents
contents = list(p.glob("*"))
# if an element in contents isn't a folder, treat it as a file
# is_dir() also works for empty folders
files = [x for x in contents if not x.is_dir()]

List files ONLY in the current directory

In Python, I want to list only the files in the current directory. I do not want files listed from any subdirectory or parent.
There do seem to be similar solutions out there, but they don't seem to work for me. Here's my code snippet:
import os
for subdir, dirs, files in os.walk('./'):
    for file in files:
        # do some stuff
        print(file)
Let's suppose I have 2 files, holygrail.py and Tim inside my current directory. I have a folder as well and it contains two files - let's call them Arthur and Lancelot - inside it. When I run the script, this is what I get:
holygrail.py
Tim
Arthur
Lancelot
I am happy with holygrail.py and Tim. But the two files, Arthur and Lancelot, I do not want listed.
Just use os.listdir and os.path.isfile instead of os.walk.
Example:
import os
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    pass  # do something
But be careful when applying this to another directory, like
files = [f for f in os.listdir(somedir) if os.path.isfile(f)]
which would not work because f is not a full path; os.path.isfile would resolve it relative to the current directory, not somedir.
Therefore, to filter on another directory, use os.path.isfile(os.path.join(somedir, f)).
(Thanks Causality for the hint)
You can use os.listdir for this purpose. If you only want files and not directories, you can filter the results using os.path.isfile.
example:
files = os.listdir(os.curdir) #files and directories
or
files = filter(os.path.isfile, os.listdir( os.curdir ) ) # files only
files = [ f for f in os.listdir( os.curdir ) if os.path.isfile(f) ] #list comprehension version.
import os
destdir = '/var/tmp/testdir'
files = [ f for f in os.listdir(destdir) if os.path.isfile(os.path.join(destdir,f)) ]
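You can use os.scandir(), which was added to the standard library in Python 3.5.
import os
for entry in os.scandir('.'):
    if entry.is_file():
        print(entry.name)
This is usually faster than os.listdir() followed by separate os.path.isfile() calls, because each entry can cache its file type. Since Python 3.5, os.walk() is itself implemented with os.scandir().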
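A small sketch of the os.scandir() approach wrapped in a function, using the file names from the question. The helper name files_in is made up, and the with form of os.scandir assumes Python 3.6 or later.

```python
import os
import tempfile


def files_in(directory):
    """Return the names of regular files directly inside directory."""
    # The context manager form of os.scandir needs Python 3.6 or later
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())


# Demo: two files at the top level, one more inside a subfolder
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "folder"))
    for name in ("holygrail.py", "Tim"):
        open(os.path.join(demo_root, name), "w").close()
    open(os.path.join(demo_root, "folder", "Arthur"), "w").close()
    names = files_in(demo_root)
print(names)
```

Only the top-level files are returned; Arthur, which lives in the subfolder, is not listed.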
You can use the pathlib module.
from pathlib import Path
x = Path('./')
print(list(filter(lambda y:y.is_file(), x.iterdir())))
This can be done with os.walk() (tested with Python 3.5.2):
import os
for root, dirs, files in os.walk('.', topdown=True):
    dirs.clear()  # with topdown=True, this prevents walk from going into subdirectories
    for file in files:
        # do some stuff
        print(file)
Remove the dirs.clear() line and the files in subfolders are included again.
Update with references:
The os.walk documentation describes the (dirpath, dirnames, filenames) triple it yields and the effect of topdown.
list.clear() is documented as removing all items from a list.
So by clearing the relevant list from os.walk, you can adapt its result to your needs.
import os
for subdir, dirs, files in os.walk('./'):
    for file in files:
        # do some stuff
        print(file)
You can improve this code with del dirs[:], like the following:
import os
for subdir, dirs, files in os.walk('./'):
    del dirs[:]
    for file in files:
        # do some stuff
        print(file)
Or, even better, you could point os.walk at the current working directory:
import os
cwd = os.getcwd()
for subdir, dirs, files in os.walk(cwd, topdown=True):
    del dirs[:]  # remove the subdirectories
    for file in files:
        # do some stuff
        print(file)
instead of os.walk, just use os.listdir
To list files in a specific folder excluding files in its sub-folders with os.walk use:
_, _, file_list = next(os.walk(data_folder))
Following up on Pygirl and Flimm, use of pathlib, (really helpful reference, btw) their solution included the full path in the result, so here is a solution that outputs just the file names:
from pathlib import Path
p = Path(destination_dir) # destination_dir = './' in original post
files = [x.name for x in p.iterdir() if x.is_file()]
print(files)

How to filter files (with known type) from os.walk?

I have a list from os.walk, but I want to exclude some directories and files. I know how to do it with directories:
for root, dirs, files in os.walk('C:/My_files/test'):
    if "Update" in dirs:
        dirs.remove("Update")
But how can I do it with files whose type I know? This doesn't work:
if "*.dat" in files:
    files.remove("*.dat")
files = [ fi for fi in files if not fi.endswith(".dat") ]
Exclude multiple extensions.
files = [ file for file in files if not file.endswith( ('.dat','.tar') ) ]
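For context, the directory pruning from the question and the extension filter above can be combined in one walk. This is a minimal sketch under stated assumptions: walk_filtered and its parameter names are hypothetical, and the demo tree is created only for illustration.

```python
import os
import tempfile


def walk_filtered(rootdir, skip_dirs=("Update",), skip_exts=(".dat",)):
    """Yield file paths under rootdir, skipping some directories and extensions."""
    for root, dirs, files in os.walk(rootdir):
        # Assigning to dirs[:] edits the list in place, which is what makes
        # os.walk (topdown by default) skip those directories
        dirs[:] = [d for d in dirs if d not in skip_dirs]
        for name in files:
            if not name.endswith(tuple(skip_exts)):
                yield os.path.join(root, name)


# Demo on a temporary tree
with tempfile.TemporaryDirectory() as demo_root:
    os.mkdir(os.path.join(demo_root, "Update"))
    for rel in ("keep.txt", "drop.dat", os.path.join("Update", "hidden.txt")):
        open(os.path.join(demo_root, rel), "w").close()
    kept = sorted(os.path.relpath(p, demo_root) for p in walk_filtered(demo_root))
print(kept)
```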
And in one more way, because I just wrote this, and then stumbled upon this question:
files = filter(lambda file: not file.endswith('.txt'), files)
Note that in Python 3 filter returns an iterator, not a list, and the list comprehension is "preferred".
A concise way of writing it, if you do this a lot:
import os
def exclude_ext(ext):
    def compare(fn):
        return os.path.splitext(fn)[1] != ext
    return compare
files = filter(exclude_ext(".dat"), files)
Of course, exclude_ext goes in your appropriate utility package.
files = [file for file in files if os.path.splitext(file)[1] != '.dat']
Should be exactly what you need:
if thisFile.endswith(".txt"):
Try this:
import os
skippingWalk = lambda targetDirectory, excludedExtentions: (
    (root, dirs, [F for F in files if os.path.splitext(F)[1] not in excludedExtentions])
    for (root, dirs, files) in os.walk(targetDirectory)
)
for line in skippingWalk("C:/My_files/test", [".dat"]):
    print(line)
This is a lambda function that returns a generator expression. You pass it a path and some extensions, and it invokes os.walk with the path, filters out the files whose extensions are in the list of unwanted extensions using a list comprehension, and yields the resulting (root, dirs, files) tuples.
(edit: removed the .upper() statement because there might be an actual difference between extensions of different case - if you want this to be case insensitive, add .upper() after os.path.splitext(F)[1] and pass extensions in in capital letters.)
The easiest way to filter files with a known type with os.walk() is to pass it the path and check each file name's extension with an if statement.
for base, dirs, files in os.walk(path):
    for name in files:
        if name.endswith('.type'):
            # Here you will go through all the files with the particular extension '.type'
            print(os.path.join(base, name))
Another solution would be to use the functions from the fnmatch module:
import fnmatch
def MatchesExtensions(name, extensions=("*.dat", "*.txt", "*.whatever")):
    for pattern in extensions:
        if fnmatch.fnmatch(name, pattern):
            return True
    return False
On Windows this avoids the hassle with upper/lower case extensions, because fnmatch.fnmatch normalizes case with os.path.normcase there, so you don't need to convert case to match *.JPEG, *.jpeg, *.JPeg, *.Jpeg. On POSIX systems fnmatch.fnmatch is case-sensitive, so lower-case the name first if you need the same behaviour.
All of the above answers work. I just wanted to add something for anyone whose files come from heterogeneous sources, e.g. images downloaded in archives from the Internet. In this case, because Unix-like systems are case-sensitive, you may end up with extensions like '.PNG' and '.png'. These will be treated as different strings by the endswith method, i.e. '.PNG'.endswith('png') will return False. To avoid this problem, call lower() on the file name before comparing.
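A short sketch of the lower() idea, assuming the extensions of interest are known up front. The helper name has_extension and the sample file names are made up for the example.

```python
import os


def has_extension(filename, extensions):
    """Case-insensitive check of filename's extension against extensions."""
    wanted = {ext.lower() for ext in extensions}
    return os.path.splitext(filename)[1].lower() in wanted


names = ["a.PNG", "b.png", "c.Jpeg", "notes.txt", "png"]
images = [n for n in names if has_extension(n, [".png", ".jpeg"])]
print(images)
```

Using os.path.splitext rather than endswith also avoids matching a file that is merely named png with no extension at all.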
Here is how to find all files in a directory ending with a specific extension:
import os
path = os.path.expanduser('C:\\Users\\A')
for filename in [item for item in os.listdir(path) if item.endswith(".ipynb")]:
    print(filename)
In these two ways I can select the files by the file type:
from os import listdir
from os.path import isfile, join
source_path = './data'
excelfiles = [f for f in listdir(source_path) if f.endswith('.xlsx') and isfile(join(source_path, f))]
from os import walk
excelfiles2 = []
for (dirpath, dirnames, filenames) in walk(source_path):
    excelfiles2.extend(filename for filename in filenames if filename.endswith('.xlsx'))
    break  # only look at the top-level directory

Deleting files by type in Python on Windows

I know how to delete single files, however I am lost in my implementation of how to delete all files in a directory of one type.
Say the directory is \myfolder
I want to delete all files that are .config files, and leave the other files untouched.
How would I do this?
Thanks Kindly
Use the glob module:
import os
from glob import glob
for f in glob('myfolder/*.config'):
    os.unlink(f)
I would do something like the following:
import os
folder = "myfolder"
for f in os.listdir(folder):
    path = os.path.join(folder, f)  # listdir returns bare names, so join them with the folder
    if not os.path.isdir(path) and ".config" in f:
        os.remove(path)
It lists the entries in a directory and, if an entry is not a directory and its name has ".config" anywhere in it, deletes it. You'll either need to run this from the directory that contains myfolder, or give it the full path to the directory. If you need to do this recursively, I would use the os.walk function.
Here ya go:
import os
# Return all files in dir, and all its subdirectories, ending in pattern
def gen_files(dir, pattern):
    for dirname, subdirs, files in os.walk(dir):
        for f in files:
            if f.endswith(pattern):
                yield os.path.join(dirname, f)
# Remove all files in the current dir matching *.config
for f in gen_files('.', '.config'):
    os.remove(f)
Note also that gen_files can be easily rewritten to accept a tuple of patterns, since str.endswith accepts a tuple
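One possible way to do that rewrite is sketched below. The parameter name suffixes and the demo files are illustrative, and the demo deletes files only inside a temporary directory.

```python
import os
import tempfile


def gen_files(directory, suffixes):
    """Yield paths under directory whose name ends with one of suffixes."""
    # str.endswith accepts a tuple, so normalise a single string to a tuple
    if isinstance(suffixes, str):
        suffixes = (suffixes,)
    suffixes = tuple(suffixes)
    for dirname, subdirs, files in os.walk(directory):
        for name in files:
            if name.endswith(suffixes):
                yield os.path.join(dirname, name)


# Demo on a temporary directory
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("app.config", "old.bak", "keep.txt"):
        open(os.path.join(demo_root, name), "w").close()
    # list() first so files are not deleted while the walk is still running
    for path in list(gen_files(demo_root, (".config", ".bak"))):
        os.remove(path)
    remaining = sorted(os.listdir(demo_root))
print(remaining)
```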
