How to recursively go through all subdirectories and read files? - python

I have a root-ish directory containing multiple subdirectories, all of which contain a file named data.txt. I would like to write a script that takes the "root" directory, walks through all of the subdirectories, reads every "data.txt" it finds, and writes content from each data.txt file to an output file.
Here's a snippet of my code:
import os
import sys
rootdir = sys.argv[1]
with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        for file in files:
            if (file == 'data.txt'):
                #print file
                with open(file,'r') as fin:
                    for lines in fin:
                        dosomething()
I've tested the dosomething() part and confirmed that it works when I run it on a single file. I've also confirmed that if I print the file name instead (the commented-out line), the script prints 'data.txt'.
Right now if I run it Python gives me this error:
File "recursive.py", line 11, in <module>
with open(file,'r') as fin:
IOError: [Errno 2] No such file or directory: 'data.txt'
I'm not sure why it can't find it -- after all, it prints out data.txt if I uncomment the 'print file' line. What am I doing incorrectly?

You need to open the file by its full path. Your file variable is just a bare filename without a directory path, so open() looks for it in the current working directory. The root variable holds the directory that contains it:
with open('output.txt','w') as fout:
    for root, subFolders, files in os.walk(rootdir):
        if 'data.txt' in files:
            with open(os.path.join(root, 'data.txt'), 'r') as fin:
                for lines in fin:
                    dosomething()
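
A minimal sketch of the same idea wrapped in a function, so it can be tried on a throwaway directory tree. The name collect_data_files and the choice to copy each line unchanged are assumptions for illustration; the inner write stands in for whatever dosomething() does.

```python
import os


def collect_data_files(rootdir, out_path, target="data.txt"):
    """Write the lines of every `target` file under rootdir to out_path.

    Hypothetical helper: copying lines verbatim stands in for dosomething().
    Returns the list of files that were read.
    """
    found = []
    with open(out_path, "w") as fout:
        for root, sub_folders, files in os.walk(rootdir):
            if target in files:
                # os.walk yields bare names; join with root to get a usable path.
                full_path = os.path.join(root, target)
                found.append(full_path)
                with open(full_path, "r") as fin:
                    for line in fin:
                        fout.write(line)
    return found
```

Calling collect_data_files(sys.argv[1], 'output.txt') would mirror the script above, provided output.txt is not itself a data.txt file inside the tree being walked.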

[os.path.join(dirpath, filename)
 for dirpath, dirnames, filenames in os.walk(rootdir)
 for filename in filenames]
A functional approach to collecting the tree looks shorter, cleaner and more Pythonic.
You can wrap os.path.join(dirpath, filename) in any function to process each file as you find it, or save the list of paths for further processing.
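
As a rough sketch of that wrapping idea, the comprehension can be turned into a generator and fed to a processing function. Both iter_file_paths and file_sizes are hypothetical names, and measuring file size is only a placeholder for real processing.

```python
import os


def iter_file_paths(rootdir):
    """Yield the full path of every file below rootdir (hypothetical helper)."""
    for dirpath, dirnames, filenames in os.walk(rootdir):
        for filename in filenames:
            yield os.path.join(dirpath, filename)


def file_sizes(rootdir):
    """Example of processing each path: map full path -> size in bytes."""
    return {path: os.path.getsize(path) for path in iter_file_paths(rootdir)}
```

A generator avoids building the whole list up front, which may matter for very large trees; wrap it in list() if the paths are needed more than once.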

Related

I'm trying to search multiple directories with hundreds of different text files with random text in them, but I'm struggling

I have hundreds of little text files in multiple folders. Each text file contains loads of random letters and symbols, and I have been tasked with finding certain information such as "HSBC", "91274163" and others. I am very new to coding and am struggling quite a lot. I do not have long left to complete this, so if anyone can help I'd appreciate it.
import os
FILENAMES=[]
for root, dirs, files in os.walk(r"****MY PATH****"):
    for filename in files:
        if filename.endswith(".txt"):
            FILENAMES.append(filename)
            print(filename)
print('\n')
This is the first part of my code, which displays all the text files and then exits.
for FILENAME in FILENAMES:
    print(FILENAME," contains the following function:\n")
    f1=open(FILENAME,'r')
    for line in f1:
        if ("HSBC") in line:
            print(line)
        else:
            pass
    print('\n')
    f1.close()
As soon as I add this part of the code I get:
    f1=open(FILENAME,'r')
       ^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'File-06Ijg.txt'
I have tried many other scripts, but I run into various encoding errors and the like. At least with this script I can display all the text files, so I'm trying to figure this one out.
Make this change in your first loop, so that the list stores the full path of each file rather than the bare filename:
import os
FILENAMES=[]
for root, dirs, files in os.walk(r"****MY PATH****"):
    for filename in files:
        if filename.endswith(".txt"):
            # this line will get the complete path
            file_path = os.path.join(root,filename)
            FILENAMES.append(file_path)
            print(file_path)
print('\n')
I'm really sorry, should've tested before answering. Shouldn't be os.path.join(dirs, filename) but os.path.join(root, filename) instead.
Try this:
import os
FILENAMES=[]
for root, dirs, files in os.walk(r"****MY PATH****"):
    for filename in files:
        if filename.endswith(".txt"):
            FILENAMES.append(os.path.join(root, filename))
            print(filename)
print('\n')
for FILENAME in FILENAMES:
    print(FILENAME," contains the following function:\n")
    with open(FILENAME,'r', encoding="utf-8") as f1:
        for line in f1:
            if "HSBC" in line:
                print(line)
    print('\n')
Edit: Typo, os.join -> os.path.join
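
Since the question mentions several search terms and encoding errors, here is a hedged sketch that extends the approach above. The function name find_terms is hypothetical, and passing errors="replace" to open() is an assumption about how to handle undecodable bytes: it substitutes a replacement character instead of raising UnicodeDecodeError.

```python
import os


def find_terms(rootdir, terms, suffix=".txt"):
    """Return (path, line_number, line) for every line containing any term."""
    hits = []
    for root, dirs, files in os.walk(rootdir):
        for filename in files:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(root, filename)
            # errors="replace" is an assumption: undecodable bytes become
            # U+FFFD rather than stopping the scan with an exception.
            with open(path, "r", encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if any(term in line for term in terms):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

For example, find_terms(r"****MY PATH****", ["HSBC", "91274163"]) would return one tuple per matching line, which can then be printed or written to a report.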

Python script to loop for files in subdirectories in order to change text and extension

I'm trying to loop through files in multiple subdirectories in order to :
1- Add some text inside the files (ending with .ext)
2- Change the extension of each file from .ext to .ext2
The script works fine when I have only one subdir in the main directory, but when I try to run the script on multiple subdirs it says:
line 8, in
with open(name, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: "here the name of the subdir"
import os
directory = 'C:\\Users\\folder\\subfolders'
for dir, subdirs, files in os.walk(directory):
    for name in files:
        if name.endswith((".ext")):
            with open(name, "r") as f:
                XMLContent = f.readlines()
                XMLContent.insert(6, '<XMLFormat>\n')
                XMLContent.insert(40, '\n</XMLFormat>')
            with open(name, "w") as f:
                XMLContent = "".join(XMLContent)
                f.write(XMLContent)
            os.rename(os.path.join(dir, name), os.path.join(dir, name[:name.index('.ext')] +".ext1"))
Above is a screenshot of the sub dirs I have in the folder (1.Modified).
I've also created a new folder called all and put in it three folders and for each folder, I've created 2 files of .ext type.
So, I was able to write inside each file of them and change its name as well.
import os
for root, dirs, files in os.walk("/Users/ghaith/Desktop/test/all"):
    for file in files:
        if file.endswith('.ext'):
            path = root + '/' + file
            with open(path, "r") as f:
                content = f.readlines()
                content.insert(1, '<XMLFormat>\n')
                content.insert(3, '\n</XMLFormat>')
            with open(path, "w") as f:
                content = "".join(content)
                f.write(content)
            os.rename(path, path+'2')
Output:
< XMLFormat >
< /XMLFormat >
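You need to include the directory when you open the file. Inside the os.walk loop, that is the directory currently being walked (dir in your code), not the top-level directory. The same applies to the open(name, "w") call:
with open(os.path.join(dir, name), "r") as f: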
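Alternatively, if all the files sit directly in a single directory, you can loop over it with os.listdir(). Note that os.listdir() does not descend into subdirectories:
for item in os.listdir(directory):
    if item.endswith(".ext"):
        with open(os.path.join(directory, item), "r") as r:
            content = r.readlines()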
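
Putting the pieces of this thread together, a minimal sketch might look like the following. The function name wrap_and_rename is hypothetical, and wrapping the whole file in the tags (rather than inserting at fixed line positions as the question does) is a simplifying assumption.

```python
import os


def wrap_and_rename(directory, old_ext=".ext", new_ext=".ext2"):
    """Wrap each old_ext file below directory in <XMLFormat> tags, then rename it.

    Assumption: the tags go around the entire content, not at fixed line offsets.
    Returns the list of new paths.
    """
    renamed = []
    for dirpath, subdirs, files in os.walk(directory):
        for name in files:
            stem, ext = os.path.splitext(name)
            if ext != old_ext:
                continue
            # Always build the path from the directory currently being walked.
            path = os.path.join(dirpath, name)
            with open(path, "r") as f:
                content = f.read()
            with open(path, "w") as f:
                f.write("<XMLFormat>\n" + content + "\n</XMLFormat>")
            new_path = os.path.join(dirpath, stem + new_ext)
            os.rename(path, new_path)
            renamed.append(new_path)
    return renamed
```

Using os.path.splitext for the extension check avoids matching names that merely contain ".ext" somewhere in the middle.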

Python - loop through subfolders and files in a directory without ignoring the subfolder

I have read all the Stack Exchange help posts on looping through subfolders, as well as the os documentation, but I am still stuck. I am trying to loop over files in subfolders, open each file, extract the first number in the first line, copy the file to a different subfolder (with the same name but in the output directory), and rename the copy with the number as a suffix.
import os
import re
outputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers"
inputpath = "C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/FRUS_Wisconsin"
suffix=".txt"
for root, dirs, files in os.walk(inputpath):
    for file in files:
        file_path = os.path.join(root, file)
        foldername=os.path.split(os.path.dirname(file_path))[1]
        filebname=os.path.splitext(file)[0]
        filename=filebname + "_"
        f=open(os.path.join(root,file),'r')
        data=f.readlines()
        if data is None:
            f.close()
        else:
            with open(os.path.join(root,file),'r') as f:
                for line in f:
                    s=re.search(r'\d+',line)
                    if s:
                        pagenum=(s.group())
                        break
            with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        f1.write(line)
I expect the result to be copies of the files in the input directory placed in the corresponding subfolder in the output directory, renamed with a suffix, such as "005_2", where 005 is the original file name, and 2 is the number the python code extracted from it.
The error I get seems to indicate that I am not looping through the files correctly. I know the code for extracting the first number and renaming the file works, because I tested it on a single file. But using os.walk to loop through multiple subfolders is not working, and I can't figure out what I am doing wrong. Here is the error:
File "<ipython-input-1-614e2851f16a>", line 23, in <module>
with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
IOError: [Errno 2] No such file or directory: 'C:/Users/Heather/Dropbox/T_Files/Raw_FRUS_Data/Wisconsin_Copies_With_PageNumbers\\FRUS_Wisconsin\\.dropbox_1473986809.txt'
Well, this isn't elegant, but it worked:
import os
import re
from glob import glob
folderlist=glob("C:\\...FRUS_Wisconsin*\\")
outputpath = "C:\\..\\Wisconsin_Copies_With_PageNumbers"
suffix=".txt"  # same suffix as in the question
for folder in folderlist:
    foldername = str(folder.split('\\')[7])
    for root, dirs, files in os.walk(folder):
        for file in files:
            filebname=os.path.splitext(file)[0]
            filename=filebname + "_"
            if not filename.startswith('._'):
                with open(os.path.join(root,file),'r') as f:
                    for line in f:
                        s=re.search(r'\d+',line)
                        if s:
                            pagenum=(s.group())
                            break
                with open(os.path.join(outputpath, foldername,filename+pagenum+suffix), 'w') as f1:
                    with open(os.path.join(root,file),'r') as f:
                        for line in f:
                            f1.write(line)
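
Reading the traceback in the question, one plausible cause (an assumption, not something the thread confirms) is that os.walk also visited a hidden .dropbox file in the top-level input folder, and that the matching output subfolder did not exist yet. The sketch below, assuming Python 3, skips hidden files and creates each output folder before copying. The names first_number and copy_with_page_numbers are hypothetical.

```python
import os
import re
import shutil


def first_number(path):
    """Return the first run of digits found in the file, or None if there is none."""
    with open(path, "r", errors="replace") as f:
        for line in f:
            match = re.search(r"\d+", line)
            if match:
                return match.group()
    return None


def copy_with_page_numbers(inputpath, outputpath, suffix=".txt"):
    """Copy each file to a mirrored folder under outputpath as <stem>_<number><suffix>."""
    copied = []
    for root, dirs, files in os.walk(inputpath):
        for name in files:
            if name.startswith("."):
                continue  # skip hidden files such as .dropbox
            source = os.path.join(root, name)
            pagenum = first_number(source)
            if pagenum is None:
                continue  # nothing to use as a suffix
            # Mirror the folder layout relative to inputpath.
            relative_dir = os.path.relpath(root, inputpath)
            target_dir = os.path.normpath(os.path.join(outputpath, relative_dir))
            os.makedirs(target_dir, exist_ok=True)
            stem = os.path.splitext(name)[0]
            target = os.path.join(target_dir, stem + "_" + pagenum + suffix)
            shutil.copyfile(source, target)
            copied.append(target)
    return copied
```

shutil.copyfile replaces the manual line-by-line copy, and files without any number are skipped instead of reusing a pagenum left over from a previous file.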

Zipfile not writing data

I have tried to zip log files that I have created, but nothing is written!
Code:
def del_files():
    '''Adapted from answer # http://stackoverflow.com/questions/7217196/python-delete-old-files by jterrace'''
    dir_to_search = os.path.join(os.getcwd(),'log')
    for dirpath, dirnames, filenames in os.walk(dir_to_search):
        for file in filenames:
            curpath = os.path.join(dirpath, file)
            log(curpath)
            if curpath != path:
                log("Archiving old log files...")
                with zipfile.ZipFile("log_old.zip", "w") as ZipFile:
                    ZipFile.write(curpath)
                    ZipFile.close()
                    log("archived")
For one thing, you are overwriting the output zip file on each iteration:
with zipfile.ZipFile("log_old.zip", "w") as ZipFile:
Mode "w" means create a new file, or truncate an existing one. You probably mean to append to the zip file, in which case you can open it with mode "a". Alternatively, you could open the zip file once, outside the outer for loop.
Your code should result in log_old.zip containing a single file - the last one found by os.walk().
Opening the archive outside of the main loop is better since the file will only be opened once, and it will be closed automatically because of the context manager (with):
with zipfile.ZipFile("log_old.zip", "w") as zf:
    dir_to_search = os.path.join(os.getcwd(), 'log')
    for dirpath, dirnames, filenames in os.walk(dir_to_search):
        for file in filenames:
            curpath = os.path.join(dirpath, file)
            zf.write(curpath)
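
A slightly fuller sketch of the open-once approach, written as a function so it can be tested in isolation. The name archive_logs and the keep parameter (standing in for the current log file that the question excludes via path) are assumptions, as is storing entries relative to the log folder via arcname.

```python
import os
import zipfile


def archive_logs(log_dir, archive_path, keep=None):
    """Zip every file below log_dir except `keep`; return the archived names.

    archive_path should live outside log_dir so the archive does not add itself.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(log_dir):
            for name in filenames:
                curpath = os.path.join(dirpath, name)
                if keep is not None and os.path.abspath(curpath) == os.path.abspath(keep):
                    continue  # leave the active log file alone
                # arcname stores a path relative to log_dir instead of the full path.
                zf.write(curpath, arcname=os.path.relpath(curpath, log_dir))
        return zf.namelist()
```

Because the archive is opened once in mode "w", every file found by the walk ends up in the same zip, and the with block closes it automatically.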

Python 2.5.2: trying to open files recursively

The script below should open all the files inside the folder 'pruebaba' recursively but I get this error:
Traceback (most recent call last):
  File "/home/tirengarfio/Desktop/prueba.py", line 8, in <module>
    f = open(file,'r')
IOError: [Errno 21] Is a directory
This is the hierarchy:
pruebaba
    folder1
        folder11
            test1.php
        folder12
            test1.php
            test2.php
    folder2
        test1.php
The script:
import re,fileinput,os
path="/home/tirengarfio/Desktop/pruebaba"
os.chdir(path)
for file in os.listdir("."):
    f = open(file,'r')
    data = f.read()
    data = re.sub(r'(\s*function\s+.*\s*{\s*)',
                  r'\1echo "The function starts here."',
                  data)
    f.close()
    f = open(file, 'w')
    f.write(data)
    f.close()
Any idea?
Use os.walk. It recursively walks the directory and its subdirectories, and already gives you separate lists for files and directories.
from __future__ import with_statement  # __future__ imports must precede all other statements
import re
import os
PATH = "/home/tirengarfio/Desktop/pruebaba"
for path, dirs, files in os.walk(PATH):
    for filename in files:
        fullpath = os.path.join(path, filename)
        with open(fullpath, 'r') as f:
            data = re.sub(r'(\s*function\s+.*\s*{\s*)',
                          r'\1echo "The function starts here."',
                          f.read())
        with open(fullpath, 'w') as f:
            f.write(data)
You're trying to open everything you see. One thing you tried to open was a directory; you need to check if an entry is a file or is a directory, and make a decision from there. (Was the error IOError: [Errno 21] Is a directory not descriptive enough?)
If it is a directory, then you'll want to make a recursive call to your function to walk over the files in that directory as well.
Alternatively, you might be interested in the os.walk function to take care of the recursive-ness for you.
os.listdir lists both files and directories. You should check whether what you're trying to open really is a file, using os.path.isfile.
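
For comparison with os.walk, the manual approach described in these answers (check each entry, recurse into directories) could be sketched as follows. The function name list_files_recursive is hypothetical, and the sketch assumes the tree contains no symbolic-link cycles.

```python
import os


def list_files_recursive(path):
    """Return every regular file below path using os.listdir and recursion."""
    result = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # A directory cannot be opened like a file; descend into it instead.
            result.extend(list_files_recursive(full))
        elif os.path.isfile(full):
            result.append(full)
    return result
```

Each returned path can then be opened safely, since directories have already been filtered out.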
