How to read all files in a directory using python - python

I have a directory with a lot of text files. I want to read each text file in the directory and perform a search operation on it. I take the directory name as a command-line argument. The error I'm getting is IsADirectoryError. Is there any way to make this work without any other module?
This is my code:
a = sys.argv
files = a[1:-1]
for i in files:
    print(i)
    f = open(i,'rb')
    for line in f:
        try:
            for word in line.split():
                '''Rest of code here'''

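Try this code:
import os

def read_files_from_dir(dirname):
    for _file in os.listdir(dirname):
        # Join the directory and the file name; opening dirname itself raises IsADirectoryError.
        with open(os.path.join(dirname, _file), "r") as fp:
            print(fp.read())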
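
The IsADirectoryError in the question appears when the directory path itself (or a subdirectory) is handed to open(). As a minimal sketch using only the standard library, the function below lists the directory, skips anything that is not a regular file, and searches each file line by line. The name search_dir, the needle parameter, and the UTF-8 decoding with errors="replace" are assumptions for illustration, not part of the original question or answer.

```python
import os


def search_dir(dirname, needle):
    """Return (file name, line number, line) for every line containing needle."""
    hits = []
    for name in sorted(os.listdir(dirname)):
        path = os.path.join(dirname, name)
        # open() on a directory raises IsADirectoryError, so skip non-files.
        if not os.path.isfile(path):
            continue
        # Assumption: files are UTF-8 text; undecodable bytes are replaced
        # rather than aborting the scan.
        with open(path, "r", encoding="utf-8", errors="replace") as fp:
            for lineno, line in enumerate(fp, start=1):
                if needle in line:
                    hits.append((name, lineno, line.rstrip("\n")))
    return hits
```

With the directory taken from the command line, this could be called as search_dir(sys.argv[1], "some word").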

Related

How open file based on extension?

I want to open any .txt file in the same directory.
In Ruby I can do
File.open("*.txt").each do |line|
  puts line
end
In Python I can't do this; it gives an error:
file = open("*.txt","r")
print(file.read())
file.close()
It gives an "invalid argument" error.
So is there any way around it?
You can use the glob module directly for this:
import glob
for file in glob.glob('*.txt'):
    with open(file, 'r') as f:
        print(f.read())
Use os.listdir to list all files in the current directory.
import os
all_files = os.listdir()
Then, filter the ones which have the extension you are looking for and open each one of them in a loop.
for filename in all_files:
    if filename.lower().endswith('.txt'):
        with open(filename, 'rt') as f:
            print(f.read())
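
The same filtering can also be sketched with pathlib, collecting the contents instead of printing them. The function name read_text_files, its parameters, and the assumption that the files are UTF-8 text are hypothetical choices for this example.

```python
from pathlib import Path


def read_text_files(directory=".", suffix=".txt"):
    """Map each matching file name in directory to its text content."""
    contents = {}
    for path in Path(directory).iterdir():
        # Compare the extension case-insensitively, like the
        # filename.lower().endswith('.txt') check above.
        if path.is_file() and path.suffix.lower() == suffix:
            contents[path.name] = path.read_text(encoding="utf-8")
    return contents
```

Calling read_text_files() with no arguments reads the .txt files in the current directory.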

opening and reading all the files in a directory in python - python beginner

I'd like to read the contents of every file in a folder/directory and then print them at the end (I eventually want to pick out bits and pieces from the individual files and put them in a separate document)
So far I have this code
import os
path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join('results/'+ i), 'r')
allLines = file.readlines()
print(allLines)
At the end I don't get any errors, but it only prints the contents of the last file in my folder as a series of strings. I want to make sure it's reading every file so I can then access the data I want from each one. I've looked online and I can't find where I'm going wrong. Is there any way of making sure the loop iterates over all my files and reads all of them?
I also get the same result when I use
file = open(os.path.join('results/',i), 'r')
in the 5th line.
Please help, I'm so lost.
Thanks!!
Separate the different steps of what you want to do.
Use generators wherever possible, especially if there are a lot of files or large files.
Imports
from pathlib import Path
import sys
Deciding which files to process:
source_dir = Path('results/')
files = source_dir.iterdir()
[Optional] Filter files
For example, if you only need files with extension .ext
files = source_dir.glob('*.ext')
Process files
def process_files(files):
    for file in files:
        with file.open('r') as file_handle:
            for line in file_handle:
                # do your thing
                yield line
Save the lines you want to keep
def save_lines(lines, output_file=sys.stdout):
    for line in lines:
        output_file.write(line)
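
To show how these generator stages might be chained, here is a small sketch. process_files and save_lines are restated so the block runs on its own; keep_matching and its keyword parameter are hypothetical additions standing in for "do your thing".

```python
import sys


def process_files(files):
    # Restated from the answer above: yield every line of every file.
    for file in files:
        with file.open("r") as file_handle:
            yield from file_handle


def keep_matching(lines, keyword):
    # Hypothetical filter stage: pass through only lines containing keyword.
    return (line for line in lines if keyword in line)


def save_lines(lines, output_file=sys.stdout):
    # Write the surviving lines to any file-like object.
    for line in lines:
        output_file.write(line)
```

A call such as save_lines(keep_matching(process_files(Path('results/').glob('*.ext')), 'keyword')) then reads lazily: only one line is held in memory at a time, however many files there are.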
You forgot to indent this line: allLines = file.readlines()
You could also try this:
import os
allLines = []
path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    # Open each file in turn; the with block closes it once it has been read.
    with open(os.path.join(path, i), 'r') as file:
        allLines.append(file.read())
print(allLines)
You forgot to indent this line: allLines.append(file.read()).
Because it was outside the loop, it only ran once, after the for loop had finished, so it only appended the contents of the last file that the file variable still referred to. Also, you should not use readlines() in this way. Just use read() instead:
import os
allLines = []
path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join(path, i), 'r')
    allLines.append(file.read())
    file.close()
print(allLines)
This also creates a file containing the contents of all the files you wanted to print.
import os
rootdir = 'C:\\Users\\you\\folder\\'  # your folder
f = open('final_file.txt', 'a')
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        # os.walk yields bare file names, so build the full path first.
        full_name = os.path.join(root, filename)
        with open(full_name) as source:
            data = source.read()
        f.write(data + "\n")
f.close()
This is a similar case, with more features: Copying selected lines from files in different directories to another file

Copying all files of a directory to one text file in python

My intention is to copy the text of all my C# (and later aspx) files to one final text file, but it doesn't work.
For some reason, the "yo.txt" file is not created.
I know that the iteration over the files works, but I can't write the data into the .txt file.
The variable 'data' eventually does contain all text from the files . . .
Could it be connected to the fact that there are some non-ASCII characters in the text of the C# files?
Here is my code:
import os
import sys
src_path = sys.argv[1]
os.chdir(src_path)
data = ""
for file in os.listdir('.'):
    if os.path.isfile(file):
        if file.split('.')[-1]=="cs" and (len(file.split('.'))==2 or len(file.split('.'))==3):
            print "Copying", file
            with open(file, "r") as f:
                data += f.read()
print data
with open("yo.txt", "w") as f:
    f.write(data)
If someone has an idea, that would be great :)
Thanks
You have to ensure that the directory where the file is created has sufficient write permissions. If not, run
chmod -R 777 .
to make the directory writable.
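import os
# inputdir and outputfile are assumed to be defined by the caller.
filelist = []
for r, d, f in os.walk(inputdir):
    for file in f:
        # os.path.join picks the right separator on every platform.
        filelist.append(os.path.join(r, file))
with open(outputfile, 'w') as outfile:
    for f in filelist:
        with open(f) as infile:
            for line in infile:
                outfile.write(line)
        outfile.write('\n \n')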
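
The question also asks whether non-ASCII characters could be involved. One way to rule decoding problems out is to open every file with an explicit encoding. The sketch below assumes the sources are UTF-8 and replaces undecodable bytes instead of raising an error; the function name concat_sources and its parameters are hypothetical.

```python
import os


def concat_sources(src_dir, out_path, suffix=".cs", encoding="utf-8"):
    """Write every file under src_dir whose name ends with suffix into out_path."""
    out_abs = os.path.abspath(out_path)
    copied = []
    with open(out_abs, "w", encoding=encoding) as outfile:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # visit subdirectories in a predictable order
            for name in sorted(files):
                path = os.path.abspath(os.path.join(root, name))
                # Never read the output file back into itself.
                if path == out_abs or not name.endswith(suffix):
                    continue
                # Assumption: errors="replace" swaps undecodable bytes for
                # U+FFFD instead of raising UnicodeDecodeError.
                with open(path, "r", encoding=encoding, errors="replace") as infile:
                    outfile.write(infile.read())
                    outfile.write("\n")
                copied.append(path)
    return copied
```

Returning the list of copied paths makes it easy to confirm that the loop really visited every file.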

reading multiple files and writing multiple output files

If I have multiple files in a directory C, how can I write code that automatically reads ALL the files, processes each one, and then writes an output file for each input file?
For example, in directory C I have the following files:
aba
cbr
wos
grebedit
scor
Hint: these files have no obvious extension.
The program should read these files one by one, process them, then write the output into the directory:
aba.out
cbr.out
wos.out
grebedit.out
scor.out
Allow me to direct you to the tutorial. Once you're comfortable with file IO in general, here's a basic workflow for you to expand on.
def do_something(lines):
    output = []
    for line in lines:
        # Do whatever you need to do.
        newline = line.upper()
        output.append(newline)
    # Lines from readlines() keep their trailing newline, so join without a separator.
    return ''.join(output)
listfiles = ['aba', 'cbr', 'wos', 'grebedit', 'scor']
for f in listfiles:
    try:
        # The with block closes both files, even if processing fails.
        with open(f, 'r') as infile, open(f + '.out', 'w') as outfile:
            processed = do_something(infile.readlines())
            outfile.write(processed)
    except OSError as err:
        # Do some error handling here
        print('Error!', err)
If you need to build your list from all the files in a certain directory, use the os module.
import os
listfiles = os.listdir(r'C:\test')
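
Note that os.listdir returns bare file names, so they need to be joined with the directory before opening unless the script runs from inside it. A minimal sketch of the whole workflow with pathlib follows; the function name write_outputs, the default transform, and the out_suffix parameter are hypothetical choices for illustration.

```python
from pathlib import Path


def write_outputs(directory, transform=str.upper, out_suffix=".out"):
    """Write <name>.out next to every regular file in directory."""
    written = []
    # sorted() takes a snapshot of the listing, so the .out files created
    # below are not picked up as new inputs during the loop.
    for src in sorted(p for p in Path(directory).iterdir() if p.is_file()):
        if src.name.endswith(out_suffix):
            continue  # skip outputs left over from an earlier run
        dst = src.with_name(src.name + out_suffix)
        with src.open("r") as infile, dst.open("w") as outfile:
            for line in infile:
                outfile.write(transform(line))
        written.append(dst.name)
    return written
```

For the directory in the question, write_outputs(r'C:\test') would produce aba.out, cbr.out, and so on, each holding the transformed lines of its input.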

Reading/Writing Text from Multiple Files to Master File

In the code below I'm trying to open a series of text files and copy their contents into a single file. I'm getting an error on the "os.write(out_file, line)" in which it asks me for an integer. I haven't defined what "line" is, so is that the problem? Do I need to specify somehow that "line" is a text string from the in_file? Also, I open the out_file through each iteration of the for-loop. Is that bad? Should I open it once at the beginning? Thanks!
import os
import os.path
import shutil
# This is supposed to read through all the text files in a folder and
# copy the text inside to a master file.
# This defines the master file and gets the source directory
# for reading/writing the files in that directory to the master file.
src_dir = r'D:\Term Search'
out_file = r'D:\master.txt'
files = [(path, f) for path,_,file_list in os.walk(src_dir) for f in file_list]
# This for-loop should open each of the files in the source directory, write
# their content to the master file, and finally close the in_file.
for path, f_name in files:
    open(out_file, 'a+')
    in_file = open('%s/%s' % (path, f_name), 'r')
    for line in in_file:
        os.write(out_file, line)
    close(file_name)
    close(out_file)
print 'Finished'
You're doing it wrong:
You did:
open(out_file, 'a+')
but that doesn't save the reference as a variable, so you have no way to access the file object you just created. What you need to do:
out_file_handle = open(out_file, 'a+')
...
out_file_handle.write(line)
...
out_file_handle.close()
Or, more pythonically:
out_filename = r"D:\master.txt"
...
with open(out_filename, 'a+') as outfile:
    for filepath in files:
        with open(os.path.join(*filepath)) as infile:
            outfile.write(infile.read())
print("finished")
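
On the other parts of the question: os.write expects an integer file descriptor as its first argument, which is why passing the file name produced a complaint about an integer, and opening the output once before the loop is enough. As a sketch under those assumptions, the function below opens the master file a single time and streams each input into it with shutil.copyfileobj. The name merge_into is hypothetical, and unlike the 'a+' mode above, the "wb" mode here overwrites any existing master file.

```python
import os
import shutil


def merge_into(src_dir, out_path):
    """Copy the bytes of every file under src_dir into out_path; return the file count."""
    out_abs = os.path.abspath(out_path)
    count = 0
    with open(out_abs, "wb") as outfile:  # opened once, not once per input file
        for path, dirs, file_list in os.walk(src_dir):
            dirs.sort()  # visit subdirectories in a predictable order
            for f_name in sorted(file_list):
                in_path = os.path.abspath(os.path.join(path, f_name))
                if in_path == out_abs:
                    continue  # do not copy the master file into itself
                with open(in_path, "rb") as infile:
                    # Streams in chunks, so large files are never fully in memory.
                    shutil.copyfileobj(infile, outfile)
                count += 1
    return count
```

Copying in binary mode sidesteps text decoding entirely, so the inputs are reproduced byte for byte.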
