How would I read and write from multiple files in a single directory? (Python)

I am writing Python code and would like some insight into how to approach this issue.
I am trying to read, in order, multiple log files (Textfile.log, Textfile.log.2, ...). From them, I want to write specific values to a .csv file.
Each log file contains X/Y values like the ones below:
Textfile.log:
X/Y = 5
X/Y = 6
Textfile.log.2:
X/Y = 7
X/Y = 8
Desired output in the CSV file:
5
6
7
8
Here is the code I've come up with so far:
def readfile():
    import os
    i = 0
    for file in os.listdir("\mydir"):
        if file.endswith(".log"):
            return file
def main ():
    import re
    list = []
    list = readfile()
    for line in readfile():
        x = re.search(r'(?<=X/Y = )\d+', line)
        if x:
            list.append(x.group())
        else:
            break
    f = csv.write(open(output, "wb"))
    while 1:
        if (i>len(list-1)):
            break
        else:
            f.writerow(list(i))
            i += 1
if __name__ == '__main__':
    main()
I'm confused about how to make it read the .log file and then the .log.2 file.
Is it possible to have it automatically read all the files in one directory without typing their names in individually?
Update: I'm using Windows 7 and Python 2.7.

The simplest way to read files sequentially is to build a list and then loop over it. Something like:
for fname in list_of_files:
    with open(fname, 'r') as f:
        # Do all the stuff you do to each file
        pass
This way, whatever you do to read each file is repeated for every file in list_of_files. Since lists are ordered, the files are processed in the order of the list.
Borrowing from @The2ndSon's answer, you can pick up the files with os.listdir(dir). This lists all files and directories within dir in arbitrary order. From this you can pull out and order all of your log files like this:
import os
allFiles = os.listdir(some_dir)
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
logFiles.sort(key = lambda x: x.split('.')[-1])
logFiles.insert(0, logFiles.pop())
The above code works with files named like "somename.log", "somename.log.2" and so on. You can then plug logFiles in as list_of_files (joining each name with some_dir via os.path.join if some_dir is not the current directory). Note that the last line is only necessary if the first file is "somename.log" instead of "somename.log.1". If the first file has a number on the end, just skip the last step.
Line-by-line explanation:
allFiles = os.listdir(some_dir)
This line returns the names of all files and directories within some_dir as a list.
logFiles = [fname for fname in allFiles if "log" in fname.split('.')]
A list comprehension gathers all of the files that have log as one of their period-delimited parts. "something.log.somethingelse" will be included; "log_something.somethingelse" will not.
logFiles.sort(key = lambda x: x.split('.')[-1])
Sort the list of log files in place by the last extension. x.split('.')[-1] splits the file name into a list of period-delimited values and takes the last entry. If the name is "name.log.5", it is sorted as "5". If the name is "name.log", it is sorted as "log". Because these are compared as strings, "10" sorts before "2", so this works for up to nine numbered files.
logFiles.insert(0, logFiles.pop())
Move the last entry of the list to the front. This is necessary because digits sort before letters, so the sort puts "name.log" last and "name.log.1" first. Moving just that one entry keeps the numbered files in their sorted order.
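The ordering steps above compare the suffixes as strings, so "somename.log.10" lands before "somename.log.2". A minimal sketch of an alternative, assuming the same somename.log / somename.log.N naming, is a sort key that puts the plain .log file first and compares numeric suffixes as integers. log_sort_key and ordered_log_files are hypothetical helper names:

```python
import os


def log_sort_key(fname):
    """Sort key for 'somename.log', 'somename.log.2', 'somename.log.10', ...

    The plain 'somename.log' maps to 0 so it sorts first; numeric suffixes
    are compared as integers, so '.log.10' sorts after '.log.2'.
    """
    last = fname.split('.')[-1]
    return int(last) if last.isdigit() else 0


def ordered_log_files(some_dir):
    """Return the names of the log files in some_dir in rotation order."""
    log_files = [fname for fname in os.listdir(some_dir) if "log" in fname.split('.')]
    return sorted(log_files, key=log_sort_key)
```

For example, sorting ["app.log.10", "app.log.2", "app.log", "app.log.1"] with key=log_sort_key gives ['app.log', 'app.log.1', 'app.log.2', 'app.log.10'], with no swap step needed.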

If you change the naming scheme for your log files, you can easily return a list of the files that have the ".log" extension. For example, if you rename the files to Textfile1.log and Textfile2.log, you can update readfile() to be:
import os
def readfile():
    my_list = []
    # sorted() gives a predictable order; os.listdir() returns names in arbitrary order
    for file in sorted(os.listdir(".")):
        if file.endswith(".log"):
            my_list.append(file)
    return my_list
print(readfile()) will then show ['Textfile1.log', 'Textfile2.log']. Avoid using list as a variable name, because it shadows Python's built-in list type.
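Putting the pieces together for the original question, here is a minimal sketch. It assumes Python 3, files named like Textfile.log and Textfile.log.2 in one directory, and lines of the form "X/Y = <integer>". collect_xy_values, write_values_csv and log_sort_key are hypothetical names, not taken from any answer here. On Python 2.7, open the CSV with mode "wb" and drop newline=''.

```python
import csv
import os
import re

# Matches lines such as "X/Y = 5" and captures the number.
XY_PATTERN = re.compile(r'X/Y = (\d+)')


def log_sort_key(fname):
    # 'Textfile.log' -> 0, 'Textfile.log.2' -> 2, so the plain .log file is read first
    last = fname.split('.')[-1]
    return int(last) if last.isdigit() else 0


def collect_xy_values(log_dir):
    """Return every X/Y value from the .log / .log.N files in log_dir, in file order."""
    names = [f for f in os.listdir(log_dir) if 'log' in f.split('.')]
    values = []
    for name in sorted(names, key=log_sort_key):
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                match = XY_PATTERN.search(line)
                if match:
                    values.append(match.group(1))
    return values


def write_values_csv(values, csv_path):
    """Write one value per row, matching the desired output in the question."""
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for value in values:
            writer.writerow([value])
```

Calling write_values_csv(collect_xy_values("mydir"), "output.csv"), with "mydir" standing in for your log directory, writes the 5, 6, 7, 8 column from the question without listing any file by hand.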

Related

Extracting a differentiating numerical value from multiple files - PowerShell/Python

I have multiple text files containing different text.
They all contain a single appearance of the same 2 lines I am interested in:
================================================================
Result: XX/100
I am trying to write a script to collect all those XX values (numerical values between 0 and 100), and paste them in a CSV file with the text file name in column A and the numerical value in column B.
I have considered using Python or PowerShell for this purpose.
How can I identify the line where "Result" appears directly under the "===.." line, take its content up to '\n', and then strip "Result: " and "/100" from it?
"Result" and other numerical values could appear elsewhere in the files, but never in the quoted format and below "=====", like the line I'm interested in.
Thank you!
Edit: I have written this poor, naive attempt to collect the numerical values.
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
for filename in os.listdir(dir_path):
    if filename.endswith(".txt"):
        with open(filename,"r") as f:
            lineFound=False
            for index, line in enumerate(f):
                if lineFound:
                    line=line.replace("Result: ", "")
                    line=line.replace("/100","")
                    line.strip()
                    grade=line
                    lineFound=False
                    print(grade, end='')
                    continue
                if index>3:
                    if "================================================================" in line:
                        lineFound=True
I'd still be happy to learn if there's a simple way to do this with PowerShell, to be honest.
For the output, I used csv.writer to append the results to a file one by one.
So there are three steps involved here. First, get a list of files. There are a ton of answers for that one on Stack Overflow, but this one is stupidly complete.
Once you have the list of files, you can load the files one by one and do a simple str.split() to get the value you want.
Finally, write the results into a CSV file. Since the CSV file is a simple one, you don't need to use the csv library for this.
See the code example below. Note that I copied the function for generating the list of files from my personal GitHub repo; I reuse that one a lot.
import os
from typing import List, Optional, Union
def get_files_from_path(path: str = ".", ext: Optional[Union[str, List[str]]] = None) -> List[str]:
    """Find files in path and return them as a list.
    Gets all files in folders and subfolders
    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296
    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension(s).
            Defaults to None.
    Returns:
        list: list of file paths
    """
    if isinstance(ext, str):
        ext = [ext]
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = os.path.join(subdir, fname)
            # keep every file if no extension was given, otherwise only matching ones
            # (any() also stops a file from being added twice if it matches two extensions)
            if ext is None or any(fname.lower().endswith(e.lower()) for e in ext):
                result.append(filepath)
    return result
filelist = get_files_from_path("path/to/files/", ext=".txt")
split1 = "================================================================\nResult: "
split2 = "/100"
with open("output.csv", "w") as outfile:
    outfile.write('filename,value\n')
    for filename in filelist:
        with open(filename) as infile:
            parts = infile.read().split(split1)
        if len(parts) < 2:
            continue  # skip files without the "=====" / "Result:" marker
        value = parts[1].split(split2)[0]
        print(value)
        outfile.write(f'"{filename}",{value}\n')
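Writing the CSV by hand works as long as no filename contains a comma or a double quote. If that cannot be ruled out, the standard csv module handles the quoting. A small sketch with hypothetical rows, using an in-memory buffer in place of the output file:

```python
import csv
import io

# Hypothetical rows: the second filename contains both a comma and a double quote.
rows = [("a.txt", 85), ('b, "final".txt', 92)]

buffer = io.StringIO()  # stands in for open("output.csv", "w", newline="")
writer = csv.writer(buffer)
writer.writerow(["filename", "value"])
writer.writerows(rows)

# Reading the text back shows the awkward filename survived intact.
parsed = list(csv.reader(io.StringIO(buffer.getvalue())))
print(parsed)
```

csv.writer wraps the second filename in quotes and doubles the embedded quotes, which a hand-built f-string would not do.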
You could try this.
In this example, the filename written to the CSV is the file's full (absolute) path. You may want just the base filename (os.path.basename).
It uses the same, albeit seemingly unnecessary, mechanism as the question for deriving the source directory. It would be unusual to keep your Python script in the same directory as your data.
import os
import glob
equals = '=' * 64
dir_path = os.path.dirname(os.path.realpath(__file__))
outfile = os.path.join(dir_path, 'foo.csv')
with open(outfile, 'w') as csv:
    print('A,B', file=csv)
    for file in glob.glob(os.path.join(dir_path, '*.txt')):
        prev = None
        with open(file) as indata:
            for line in indata:
                t = line.split()
                # prev is None on the first line, so check it before calling startswith
                if len(t) == 2 and t[0] == 'Result:' and prev is not None and prev.startswith(equals):
                    v = t[1].split('/')
                    if len(v) == 2 and v[1] == '100':
                        print(f'{file},{v[0]}', file=csv)
                        break
                prev = line
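Another option is a regular expression that only matches a "Result: NN/100" line directly below a line made up of '=' characters. A minimal sketch, assuming Python 3 and that the rule line contains only '=' (plus optional trailing spaces); extract_result and collect_results are hypothetical names. It writes the base filename, as suggested above, rather than the full path:

```python
import csv
import glob
import os
import re

# A line consisting only of '=' characters, followed by "Result: NN/100" on the next line.
RESULT_PATTERN = re.compile(r'^=+[ \t]*\r?\nResult: (\d+)/100', re.MULTILINE)


def extract_result(text):
    """Return NN from the 'Result: NN/100' line under the '=' rule, or None."""
    match = RESULT_PATTERN.search(text)
    return int(match.group(1)) if match else None


def collect_results(data_dir, out_csv):
    """Write 'base filename,value' rows for every .txt file in data_dir that has a result."""
    with open(out_csv, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['A', 'B'])
        for path in sorted(glob.glob(os.path.join(data_dir, '*.txt'))):
            with open(path) as f:
                value = extract_result(f.read())
            if value is not None:
                writer.writerow([os.path.basename(path), value])
```

Files without a matching pair of lines are skipped rather than raising an IndexError, and a "Result:" line elsewhere in a file is ignored because the pattern requires the '=' line right above it.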

How can I iterate over a list of .txt files using numpy?

I'm trying to iterate over a list of .txt files in Python. I would like to load each file individually, create an array, find the maximum value in a certain column of each array, and append it to an empty list. Each file has three columns and no headers or anything apart from numbers.
My problem is starting the iteration. I get error messages such as "No such file or directory", followed by the name of the first .txt file in my list.
I used os.listdir() to display each file in the directory that I'm working with. I assigned this to the variable filenamelist, which I'm trying to iterate over.
Here is one of my attempts to iterate:
for f in filenamelist:
    x, y, z = np.array(f)
    currentlist.append(max(z))
I expect it to make an array of each file, find the maximum value of the third column (which I have assigned to z) and then append that to an empty list, then move onto the next file.
Edit: Here is the code that I have written so far:
import os
import numpy as np
from glob import glob
path = 'C://Users//chand//06072019'
filenamelist = os.listdir(path)
currentlist = []
for f in filenamelist:
    file_array = np.fromfile(f, sep=",")
    z_column = file_array[:,2]
    max_z = z_column.max()
    currentlist.append(max_z)
Edit 2: Here is a snippet of one file that I'm trying to extract a value from:
0, 0.996, 0.031719
5.00E-08, 0.996, 0.018125
0.0000001, 0.996, 0.028125
1.50E-07, 0.996, 0.024063
0.0000002, 0.996, 0.023906
2.50E-07, 0.996, 0.02375
0.0000003, 0.996, 0.026406
Each column is of length 1000. I'm trying to extract the maximum value of the third column and append it to an empty list.
The main issue is that np.array(filename) does not load the file for you; it just wraps the filename string in an array. Depending on the format of your file, something like np.loadtxt() will do the trick (see the docs).
Edit: As others have mentioned, there is another issue with your implementation. os.listdir() returns a list of file names, but you need file paths. You could use os.path.join() to get the path that you need.
Below is an example of how you might do what you want, but it really depends on the file format. In this example I'm assuming a CSV (comma separated) file.
Example input file:
1,2,3
4,5,6
Example code:
import os
import numpy as np
path = 'C://Users//chand//06072019'
filenames = os.listdir(path)
currentlist = []
for f in filenames:
    # get the full path of the filename
    filepath = os.path.join(path, f)
    # load the file
    file_array = np.loadtxt(filepath, delimiter=',')
    # get the whole third column
    z_column = file_array[:,2]
    # get the max of that column
    max_z = z_column.max()
    # add the max to our list
    currentlist.append(max_z)
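A variant of the same idea, assuming the files are comma-separated and end in .txt: glob.glob with a joined pattern returns full paths directly, which avoids the "No such file or directory" error, and sorting makes the order of the results predictable. column_max_per_file is a hypothetical helper name; ndmin=2 is a real np.loadtxt parameter that keeps a one-row file two-dimensional so [:, column] still works.

```python
import glob
import os

import numpy as np


def column_max_per_file(directory, column=2, pattern='*.txt'):
    """Return [(file name, max of `column`)] for each matching file, sorted by name."""
    results = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        # path is already a full path, so loadtxt finds the file from any working directory
        data = np.loadtxt(path, delimiter=',', ndmin=2)
        results.append((os.path.basename(path), data[:, column].max()))
    return results
```

Keeping the file name next to each maximum makes it easy to see which file produced which value; [m for _, m in column_max_per_file(path)] gives the plain list the question asks for.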

Python: Appending file outputs from different directories into one overall list

I have n directories (labeled 0 to n-1), each containing a file (all the files have the same name), from which I want to grab certain lines. I then want to append these grabbed lines together, in order (from 0 to n-1), into a list.
This is my set-up:
for i in range(0, nfolders):
    folder = "%02d" % i
    os.system("cd " + folder)
    myFile = open("myOutputFile", "r")
    lines = myFile.readlines()
    firstLine = float(lines[0])
    #I then write a loop to store the next 5 lines in a list using append and call this list nextLines
My question is, is there an easy way to append firstLine from all the directories into one list (that my function returns), as well as append nextLines from all the directories into one list (again, that my function returns)?
I know there is the extend function; would I loop over that here (say I have nfolders = 300, which makes it hard to add things together manually)?
Thanks!
You've got a couple of problems to deal with. os.system("cd ...") changes the working directory of the subshell it invokes (which then immediately exits), not the directory of this running script. Use os.chdir for that, or, far better, just add the folder to the file name and open that path.
You don't need to read the entire file to get its first line; .readline() or the next() function does that for you. Finally, just append to a list.
my_list = []
for i in range(0, nfolders):
    filename = "%02d/myOutputFile" % i
    with open(filename) as myFile:
        firstLine = float(next(myFile))
    my_list.append(firstLine)
UPDATE
Suppose you want 4 + i lines from each file. You could tighten this up with
my_list = []
for i in range(0, nfolders):
    filename = "%02d/myOutputFile" % i
    with open(filename) as myFile:
        my_list += (next(myFile) for _ in range(4+i))
Note that we only use range to count iterations and don't care about its value, so we use the variable _ as a quick visual cue that the value is not needed.
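For the original question, which wanted both the first line and the next five lines from every folder, a minimal sketch could look like the following. It assumes the folders are named 00, 01, ... inside base_dir and that each first line is a number; collect_from_folders is a hypothetical name. It also answers the extend question: extend adds each line individually, whereas append would add one nested list per folder.

```python
import os
from itertools import islice


def collect_from_folders(base_dir, nfolders, filename="myOutputFile", n_next=5):
    """Gather data from base_dir/00/filename, base_dir/01/filename, ... in folder order.

    Returns (first_values, next_lines): first_values holds float(first line) of
    each file; next_lines holds the following n_next lines of every file, concatenated.
    """
    first_values = []
    next_lines = []
    for i in range(nfolders):
        path = os.path.join(base_dir, "%02d" % i, filename)
        with open(path) as f:
            first_values.append(float(next(f)))
            # islice reads only the next n_next lines; extend flattens them into one list
            next_lines.extend(line.rstrip("\n") for line in islice(f, n_next))
    return first_values, next_lines
```

Because the path is built with os.path.join, the working directory never changes, so there is no need for os.chdir or os.system.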

Searching multiple text files for two strings?

I have a folder with many text files (EPA10.txt, EPA55.txt, EPA120.txt, ..., EPA150.txt). I have 2 strings to search for in each file, and the result of the search is written to a text file, result.txt. So far I have it working for a single file. Here is the working code:
if 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in open('C:\\Temp\\lamip\\EPA150.txt').read():
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('Current MW in node is EPA150')
else:
    with open("C:\\Temp\\lamip\\result.txt", "w") as f:
        f.write('NOT EPA150')
Now I want this to be repeated for all the text files in the folder. Please help.
Given that you have a number of files named from EPA1.txt to EPA150.txt but don't know all the names, you can put them all together in one folder and use os.listdir() to get a list of the filenames, e.g. listdir("C:/Temp/lamip").
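Also, your if statement is wrong. 'string1' and 'string2' in text is evaluated as 'string1' and ('string2' in text); a non-empty string is always true, so only the second string is actually checked. Test each string separately instead:
text = file.read()
if "string1" in text and "string2" in text:
    ...  # handle a match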
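A quick demonstration of why the question's condition is wrong, using the question's own search strings on a hypothetical file text that contains only the second one:

```python
text = "this file only mentions LZY_201_186_R5U01"

# Parsed as: 'LZY_201_335_R10A01' and ('LZY_201_186_R5U01' in text).
# A non-empty string is truthy, so the result is just the second membership test.
buggy = 'LZY_201_335_R10A01' and 'LZY_201_186_R5U01' in text

# Each string needs its own `in` test; all() scales to any number of strings.
needles = ['LZY_201_335_R10A01', 'LZY_201_186_R5U01']
correct = all(needle in text for needle in needles)

print(buggy, correct)  # True False
```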
Here's the code:
from os import listdir
with open("C:/Temp/lamip/result.txt", "w") as f:
    for filename in listdir("C:/Temp/lamip"):
        # skip non-.txt files and result.txt itself, which lives in the same folder
        if not filename.endswith(".txt") or filename == "result.txt":
            continue
        with open('C:/Temp/lamip/' + filename) as currentFile:
            text = currentFile.read()
        if ('LZY_201_335_R10A01' in text) and ('LZY_201_186_R5U01' in text):
            f.write('Current MW in node is ' + filename[:-4] + '\n')
        else:
            f.write('NOT ' + filename[:-4] + '\n')
PS: You can use / instead of \\ in your paths; Windows accepts forward slashes as path separators, so you don't need to escape backslashes.
Modularise! Modularise!
Well, not in the sense of writing distinct Python modules, but isolate the different tasks at hand:
1. Find the files you wish to search.
2. Read each file and locate the text.
3. Write the result into a separate file.
Each of these tasks can be solved independently. For example, to list the files you have os.listdir, whose result you might want to filter.
For step 2, it does not matter whether you have 1 or 1,000 files to search. The routine is the same: you merely iterate over each file found in step 1. This suggests implementing step 2 as a function that takes the filename (and possibly the search strings) as arguments and returns True or False.
Step 3 is the combination of each element from step 1 and the result of step 2.
The result:
import os
files = [fn for fn in os.listdir('C:/Temp/lamip') if fn.endswith('.txt')]
# perhaps filter `files`
def does_fn_contain_string(filename):
    with open('C:/Temp/lamip/' + filename) as blargh:
        content = blargh.read()
    # use `or` instead of `and` if either string alone should count as a match
    return 'string1' in content and 'string2' in content
with open('results.txt', 'w') as output:
    for fn in files:
        if does_fn_contain_string(fn):
            output.write('Current MW in node is {0}\n'.format(fn[:-4]))
        else:
            output.write('NOT {0}\n'.format(fn[:-4]))
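Taking the advice in step 2 one step further, the search strings can be passed in as arguments, and all() or any() chooses whether every string or just one must be present. A minimal sketch; file_contains and write_report are hypothetical names, and the report should be written outside the searched folder so it is not picked up as an input:

```python
import os


def file_contains(path, needles, require_all=True):
    """Step 2: True if the file contains all (or, with require_all=False, any) of needles."""
    with open(path) as f:
        content = f.read()
    check = all if require_all else any
    return check(needle in content for needle in needles)


def write_report(folder, needles, report_path):
    """Steps 1 and 3: one line per .txt file in folder, using file_contains for step 2."""
    names = sorted(fn for fn in os.listdir(folder) if fn.endswith('.txt'))
    with open(report_path, 'w') as out:
        for fn in names:
            stem = os.path.splitext(fn)[0]
            if file_contains(os.path.join(folder, fn), needles):
                out.write('Current MW in node is {0}\n'.format(stem))
            else:
                out.write('NOT {0}\n'.format(stem))
```

With needles = ['LZY_201_335_R10A01', 'LZY_201_186_R5U01'], write_report("C:/Temp/lamip", needles, "results.txt") would cover the question, assuming the script runs from a directory other than C:/Temp/lamip.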
You can do this by creating a for loop that runs through all your .txt files in the current working directory.
import os
with open("result.txt", "w") as resultfile:
    # every .txt file in the current directory except the output file itself
    txt_files = [txt for txt in os.listdir(os.getcwd())
                 if txt.endswith(".txt") and txt != "result.txt"]
    for result in txt_files:
        with open(result) as f:
            text = f.read()
        if 'LZY_201_335_R10A01' in text and 'LZY_201_186_R5U01' in text:
            resultfile.write('Current MW in node is {0}\n'.format(result[:-4]))
        else:
            resultfile.write('NOT {0}\n'.format(result[:-4]))

Extract number from file name in python

I have a directory where I have many data files, but the data file names have arbitrary numbers. For example
data_T_1e-05.d
data_T_7.2434.d
data_T_0.001.d
and so on. Because of the decimals in the file names they are not sorted according to the value of the numbers. What I want to do is the following:
I want to open every file, extract the number from the file name, put it in a array and do some manipulations using the data. Example:
a = np.loadtxt("data_T_1e-05.d", unpack=True)
res[i][0] = 1e-05
res[i][1] = np.sum(a)
I want to do this for every file by running a loop. I think it could be done by creating an array containing all the file names (using import os) and then doing something with it.
How can it be done?
If your files all start with the same prefix and end with the same suffix, simply slice and pass to float():
number = float(filename[7:-2])
This removes the first 7 characters (i.e. data_T_) and the last 2 (.d).
This works fine for your example filenames:
>>> for example in ('data_T_1e-05.d', 'data_T_7.2434.d', 'data_T_0.001.d'):
...     print(float(example[7:-2]))
...
1e-05
7.2434
0.001
import os
# create the list containing all files from the current dir
filelistall = os.listdir(os.getcwd())
# create the list containing only data files.
# I assume that data file names end with ".d"
filelist = [x for x in filelistall if x.endswith('.d')]
for filename in filelist:
    number = float(filename[7:-2])
    with open(filename, "r") as f:
        # any other code dealing with the file
        pass
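The question also asks for the files to be ordered by the value in the name. A minimal sketch, assuming the data_T_<number>.d naming: a regular expression pulls out the number (so the prefix length need not be hard-coded), and sorting (value, filename) tuples orders the files numerically. data_files_by_value is a hypothetical name:

```python
import os
import re

# Assumed naming scheme from the question: data_T_<number>.d
NAME_PATTERN = re.compile(r'data_T_(.+)\.d')


def data_files_by_value(directory='.'):
    """Return [(value, filename)] for matching files, sorted by the numeric value."""
    found = []
    for name in os.listdir(directory):
        match = NAME_PATTERN.fullmatch(name)
        if match:
            try:
                found.append((float(match.group(1)), name))
            except ValueError:
                pass  # the part between prefix and suffix is not a number
    return sorted(found)
```

Each returned pair can then feed the question's loop: load the file with np.loadtxt(name, unpack=True) (joined with the directory if it is not the current one), store the value in res[i][0] and the sum of the data in res[i][1].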
