I'm trying to open each file in a directory and print its contents, so I have code like this:
import os, sys

def printFiles(dir):
    os.chdir(dir)
    for f in os.listdir(dir):
        myFile = open(f, 'r')
        lines = myFile.read()
        print lines
        myFile.close()

printFiles(sys.argv[1])
The program runs, but the problem is that it only prints the contents of one file, probably the last file it read. Does this have something to do with the open() function?
Edit: added the last line, which takes in sys.argv. That's the whole code, and it still only prints the last file.
There is a problem with how you handle the directory and file paths.
Option 1 - chdir:
def printFiles(dir):
    os.chdir(dir)
    for f in os.listdir('.'):
        myFile = open(f, 'r')
        # ...
Option 2 - computing full path:
def printFiles(dir):
    # no chdir here
    for f in os.listdir(dir):
        myFile = open(os.path.join(dir, f), 'r')
        # ...
But you are combining both options - that's wrong.
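For instance, a quick sketch of why the combination breaks, assuming a hypothetical relative path such as 'mydir' is passed in:
import os

os.chdir('mydir')           # the working directory is now ./mydir
print(os.listdir('mydir'))  # now resolves to ./mydir/mydir: the wrong folder,
                            # or a FileNotFoundError if no such nested folder exists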
This is why I prefer pathlib.Path - it's much simpler:
from pathlib import Path
def printFiles(dir):
    dir = Path(dir)
    for f in dir.iterdir():
        myFile = f.open()
        # ...
The code itself certainly should print the contents of every file.
However, if you supply a relative path rather than an absolute path, it will not work as expected.
For example, imagine you have the following folder structure:
./a
./a/x.txt
./a/y.txt
./a/a
./a/a/x.txt
If you now run
printFiles('a')
you will only get the contents of x.txt, because os.listdir will be executed from within a, and will list the contents of the internal a/a folder, which only has x.txt.
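A sketch of a version that sidesteps this by resolving the directory to an absolute path up front and skipping sub-directories (the print_files name is just for illustration; assumes Python 3 and plain text files):

import os

def print_files(directory):
    directory = os.path.abspath(directory)  # resolve once, before anything changes the cwd
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):             # skip nested folders such as ./a/a
            with open(path, 'r') as handle:
                print(handle.read())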
I have a directory which contains 4 .csv files at the moment.
I am able to retrieve their names using the os library in the code given below:
import os

fileNames = os.listdir(path)
for f in fileNames:
    print(f)
Now I want to pass the file names one by one to my open() call and do the related processing.
How do I pass the file name in this command:
file = open(r'C:\Users\hu170f\Documents\TEST1\<filename to be passed>')
import os

dir_path = r'D:\text_dirs\\'
fileNames = os.listdir(dir_path)
for fn in fileNames:
    f = open(dir_path + fn, "r")
    print(f.read())
import os
path = r"C:\..."
filenames = os.listdir(path)
You can use the map function like a for loop: it takes a function (the first parameter) and applies it to each element of a list (the second parameter), one after another. The lambda is such a function: it calls open and returns an _io.TextIOWrapper, so files ends up as a list of _io.TextIOWrapper objects.
files = list(map(lambda filename: open(os.path.join(path, filename)), filenames))
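Note that this leaves every file open. A short follow-up sketch that reads and then closes them (assuming they are small text files):

# Read and close each of the opened file objects.
for file_handle in files:
    print(file_handle.read())
    file_handle.close()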
Let's say there are two folders: X, and Y inside X. I've set my working directory to folder X inside the Atom IDE. Now, if I want to use folder Y in my code, how do I do it?
For example, while writing the code below I'm inside folder X:
import glob2
import datetime

filenames = glob2.glob('*.txt')
# How do I list files of folder Y only???

with open(datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + ".txt", 'w') as file:
    # How do I create the file inside folder Y only?
    for item in filenames:
        with open(item, "r") as f:
            content = f.read()
            file.write(content)
            file.write("\n")
You can use the code below to switch between directories:
import os

Path = 'path to y'
currentDir = os.getcwd()
os.chdir(Path)
# do your job here
# now come back to the previous directory
os.chdir(currentDir)
Let's take your directory structure:
x/
    some_script.py
    y/
Now what you're looking for is to create a file inside y from code in some_script.py.
This is how you do it:
fh = open('y/a.txt', 'w')
fh.write("Yayy")
fh.close()
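Putting both parts of the question together, here is a sketch that lists only Y's .txt files and creates the timestamped output file inside Y as well, assuming the working directory is X and Y sits directly inside it (the standard-library glob behaves like glob2 for this pattern):

import glob
import datetime

# List only the .txt files that live inside folder Y.
filenames = glob.glob('Y/*.txt')

# Create the timestamped output file inside folder Y too.
out_name = 'Y/' + datetime.datetime.now().strftime("%Y-%m-%d-%H-%M-%S-%f") + ".txt"
with open(out_name, 'w') as out_file:
    for item in filenames:
        with open(item, "r") as f:
            out_file.write(f.read())
            out_file.write("\n")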
I'd like to read the contents of every file in a folder/directory and then print them at the end (I eventually want to pick out bits and pieces from the individual files and put them in a separate document)
So far I have this code
import os

path = 'results/'
fileList = os.listdir(path)
for i in fileList:
    file = open(os.path.join('results/' + i), 'r')
allLines = file.readlines()
print(allLines)
At the end I don't get any errors, but it only prints the contents of the last file in my folder as a series of strings, and I want to make sure it's reading every file so I can then access the data I want from each file. I've looked online and I can't find where I'm going wrong. Is there any way of making sure the loop is iterating over all my files and reading all of them?
Also, I get the same result when I use
file = open(os.path.join('results/',i), 'r')
on the 5th line.
Please help I'm so lost
Thanks!!
Separate the different functions of the thing you want to do.
Use generators wherever possible, especially if there are a lot of files or large files.
Imports
from pathlib import Path
import sys
Deciding which files to process:
source_dir = Path('results/')
files = source_dir.iterdir()
[Optional] Filter files
For example, if you only need files with extension .ext
files = source_dir.glob('*.ext')
Process files
def process_files(files):
    for file in files:
        with file.open('r') as file_handle:
            for line in file_handle:
                # do your thing
                yield line
Save the lines you want to keep
def save_lines(lines, output_file=sys.stdout):
    for line in lines:
        output_file.write(line)
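Wiring the pieces together might look like this, a sketch that simply streams every kept line to stdout:

source_dir = Path('results/')
files = source_dir.iterdir()
lines = process_files(files)
save_lines(lines)  # defaults to sys.stdout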
You forgot the indentation on this line: allLines = file.readlines().
Maybe you can try this:
import os

allLines = []
path = 'results/'
fileList = os.listdir(path)
for name in fileList:
    file = open(os.path.join(path, name), 'r')
    allLines.append(file.read())
print(allLines)
You forgot to indent this line: allLines.append(file.read()).
Because it was outside the loop, it ran only once, after the for loop had finished, so it appended only the contents of the last file that the file variable still referred to. Also, you should not use readlines() in this way; just use read() instead:
import os

allLines = []
path = 'results/'
fileList = os.listdir(path)
for name in fileList:
    file = open(os.path.join(path, name), 'r')
    allLines.append(file.read())
print(allLines)
This also creates a file containing all the files you wanted to print.
rootdir= your folder, like 'C:\\Users\\you\\folder\\'
import os

f = open('final_file.txt', 'a')
for root, dirs, files in os.walk(rootdir):
    for filename in files:
        full_name = os.path.join(root, filename)
        data = open(full_name).read()
        f.write(data + "\n")
f.close()
This is a similar case, with more features: Copying selected lines from files in different directories to another file
I am having a hard time looping through files in a directory that is different from the directory where the script lives. Ideally, I also want my script to go through all files that start with sasa. There are a couple of files in the folder, such as sasa.1, sasa.2, etc., as well as other files such as doc1.pdf and doc2.pdf.
I use Python 2.7 with Windows PowerShell.
Locations of Everything
1) Python Script Location ex: C:Users\user\python_project
2) Main_Directory ex: C:Users\user\Desktop\Data
3) Current_Working_Directory ex: C:Users\user\python_project
Main directory contains 100 folders (folder A, B, C, D etc..)
Each of these folders contains many files including the sasa files of interest.
Attempts at running script
For 1 file the following works:
Script is run the following way: python script1.py
file_path = 'C:Users\user\Desktop\Data\A\sasa.1'

def writing_function(file_path):
    with open(file_path) as file_object:
        lines = file_object.readlines()
        for line in lines:
            print(lines)

writing_function(file_path)
However, the following does not work
Script is run the following way: python script1.py A sasa.1
import os
import sys
from os.path import join
dr = sys.argv[1]
file_name = sys.argv[2]
file_path = 'C:Users\user\Desktop\Data'
new_file_path = os.path.join(file_path, dr)
new_file_path2 = os.path.join(new_file_path, file_name)
def writing_function(paths):
    with open(paths) as file_object:
        lines = file_object.readlines()
        for line in lines:
            print(line)

writing_function(new_file_path2)
I get the following error:
with open(paths) as file_object:
IOError: [Errno 2] No such file or directory:
'C:Users\\user\\Desktop\\A\\sasa.1'
Please note right now I am just working on one file, I want to be able to loop through all of the sasa files in the folder.
It can be something along the lines of:
import os
from os.path import join

def function_exec(file):
    # code to execute on each file
    pass

for root, dirs, files in os.walk('path/to/your/files'):  # from your argv[1]
    for f in files:
        filename = join(root, f)
        function_exec(filename)
Avoid using dir as a variable name: it shadows the built-in dir() function. Try print(dir(os)).
dir_ = sys.argv[1]  # is preferable
No one mentioned glob so far, so:
https://docs.python.org/3/library/glob.html
I think you can solve your problem using its ** magic:
If recursive is true, the pattern “**” will match any files and zero
or more directories and subdirectories. If the pattern is followed by
an os.sep, only directories and subdirectories match.
Also note that you can change the working directory using
os.chdir(path)
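For example, a sketch using a recursive glob pattern to pick up every sasa.* file under the Data folder from the question (requires Python 3.5+; on Python 2.7 the glob2 package provides the same ** behaviour). The path is the one given in the question, with the backslash after the drive letter restored:

import glob

pattern = r'C:\Users\user\Desktop\Data\**\sasa.*'
for file_path in glob.glob(pattern, recursive=True):
    with open(file_path) as file_object:
        for line in file_object:
            print(line)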
I want to write a program for this: in a folder I have n files; first read one file and perform some operation, then store the result in a separate file. Then read the 2nd file, perform the operation again, and save the result in a new 2nd file. Do the same procedure for all n files. The program should read the files one by one and store the result of each file separately. Please give examples of how I can do it.
I think what you're missing is how to retrieve all the files in that directory.
To do so, use the glob module.
Here is an example which will duplicate all the files with extension *.txt to files with extension *.out
import glob

list_of_files = glob.glob('./*.txt')  # create the list of files
for file_name in list_of_files:
    FI = open(file_name, 'r')
    FO = open(file_name.replace('txt', 'out'), 'w')
    for line in FI:
        FO.write(line)
    FI.close()
    FO.close()
import sys

# argv is your commandline arguments, argv[0] is your program name, so skip it
for n in sys.argv[1:]:
    print(n)  # print out the filename we are currently processing
    input = open(n, "r")
    output = open(n + ".out", "w")
    # do some processing
    input.close()
    output.close()
Then call it like:
./foo.py bar.txt baz.txt
You may find the fileinput module useful. It is designed for exactly this problem.
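For example, a minimal sketch with fileinput (Python 3 syntax), which reads the files named on the command line as one continuous stream:

import fileinput

for line in fileinput.input():
    if fileinput.isfirstline():
        print("--- %s ---" % fileinput.filename())  # announce each new file
    # do some processing
    print(line, end="")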
I've recently learned of os.walk(), and it may help you here.
It allows you to walk down a directory tree structure.
import os

OUTPUT_DIR = 'C:\\RESULTS'
for path, dirs, files in os.walk('.'):
    for file in files:
        read_f = open(os.path.join(path, file), 'r')
        write_f = open(os.path.join(OUTPUT_DIR, file), 'w')
        # Do stuff
Combined answer incorporating directory or specific list of filenames arguments:
import sys
import os.path
import glob

def processFile(filename):
    fileHandle = open(filename, "r")
    for line in fileHandle:
        # do some processing
        pass
    fileHandle.close()

def outputResults(filename):
    output_filemask = "out"
    fileHandle = open("%s.%s" % (filename, output_filemask), "w")
    # do some processing
    fileHandle.write('processed\n')
    fileHandle.close()

def processFiles(args):
    input_filemask = "log"
    directory = args[1]
    if os.path.isdir(directory):
        print "processing a directory"
        list_of_files = glob.glob('%s/*.%s' % (directory, input_filemask))
    else:
        print "processing a list of files"
        list_of_files = sys.argv[1:]
    for file_name in list_of_files:
        print file_name
        processFile(file_name)
        outputResults(file_name)

if __name__ == '__main__':
    if (len(sys.argv) > 1):
        processFiles(sys.argv)
    else:
        print 'usage message'
from pylab import *
import csv
import os
import glob
import re

# For each .csv file: collect the second column (index 1), take the mean of
# rows 3000-7999, then print it and write it to one.txt.
x = []
y = []
f = open("one.txt", 'w')
for infile in glob.glob('*.csv'):
    csv23 = csv2rec(infile, delimiter=',')
    for line in csv23:
        x.append(line[1])
    for i in range(3000, 8000):
        y.append(x[i])
    print infile, "\t", mean(y)
    print >>f, infile, "\t\t", mean(y)
    del y[:len(y)]
    del x[:len(x)]
I know I saw this double with open() somewhere but couldn't remember where, so I built a small example in case someone needs it.
""" A module to clean code(js, py, json or whatever) files saved as .txt files to
be used in HTML code blocks. """
from os import listdir
from os.path import abspath, dirname, splitext
from re import sub, MULTILINE
def cleanForHTML():
""" This function will search a directory text files to be edited. """
## define some regex for our search and replace. We are looking for <, > and &
## To replaced with &ls;, > and &. We might want to replace proper whitespace
## chars to as well? (r'\t', ' ') and (f'\n', '<br>')
search_ = ((r'(<)', '<'), (r'(>)', '>'), (r'(&)', '&'))
## Read and loop our file location. Our location is the same one that our python file is in.
for loc in listdir(abspath(dirname(__file__))):
## Here we split our filename into it's parts ('fileName', '.txt')
name = splitext(loc)
if name[1] == '.txt':
## we found our .txt file so we can start file operations.
with open(loc, 'r') as file_1, open(f'{name[0]}(fixed){name[1]}', 'w') as file_2:
## read our first file
retFile = file_1.read()
## find and replace some text.
for find_ in search_:
retFile = sub(find_[0], find_[1], retFile, 0, MULTILINE)
## finally we can write to our newly created text file.
file_2.write(retFile)
This also works for reading multiple files. My files are named federalist_1.txt, federalist_2.txt, and so on, up to federalist_84.txt (84 files in total), and I'm reading each file as f.
for number in range(1, 85):
    with open(f'federalist_{number}.txt', 'r') as f:
        text = f.read()  # use the contents here