How to save an edited txt file into a new txt file? - python

I am trying to save the output from x .txt files into a single .txt file.
The output file should look like the output you can see in the picture below.
What this program does is read a couple of .txt files with tons of data, which I filter using regex.
My source code:
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                print(values)
Thank you for your time! :)

Then you just need to open a file and write the values to it. Try this; you might need to adjust the formatting (I cannot test it since I don't have your text files). I am assuming the output you have in values is correct. Keep in mind that the output file is opened for appending, so if you run this more than once you will get duplicates.
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

outF = open("myOutFile.txt", "a")
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                # write() expects a string, so join the list of fields first
                outF.write('\t'.join(values) + '\n')
                print(values)
outF.close()
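If each run should replace the previous output instead of appending to it (avoiding those duplicates), a minimal variation, assuming the same folder and output name as above, is to open the output file in write mode:

import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

# "w" truncates the file on open, so re-running the script
# replaces the old output instead of appending duplicates
with open("myOutFile.txt", "w") as outF:
    for filename in glob.glob(os.path.join(folder_path, '*.txt')):
        with open(filename) as lines:
            for line in lines:
                match = values_re.search(line)
                if match:
                    # drop the empty field before the leading tab
                    values = match.group(0).split('\t')[1:]
                    outF.write('\t'.join(values) + '\n')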

Related

Extracting a differentiating numerical value from multiple files - PowerShell/Python

I have multiple text files containing different text.
They all contain a single occurrence of the same two lines I am interested in:
================================================================
Result: XX/100
I am trying to write a script to collect all those XX values (numerical values between 0 and 100), and paste them in a CSV file with the text file name in column A and the numerical value in column B.
I have considered using Python or PowerShell for this purpose.
How can I identify the line where "Result" appears directly below the "===..." string, collect its content up to the newline, and then strip away "Result: " and "/100"?
"Result" and other numerical values can appear elsewhere in the files, but never in the quoted format directly below "=====" like the line I'm interested in.
Thank you!
Edit: I have written this poor naive attempt to collect the numerical values.
import os

dir_path = os.path.dirname(os.path.realpath(__file__))
for filename in os.listdir(dir_path):
    if filename.endswith(".txt"):
        with open(filename, "r") as f:
            lineFound = False
            for index, line in enumerate(f):
                if lineFound:
                    line = line.replace("Result: ", "")
                    line = line.replace("/100", "")
                    line = line.strip()  # strip() returns a new string
                    grade = line
                    lineFound = False
                    print(grade, end='')
                    continue
                if index > 3:
                    if "================================================================" in line:
                        lineFound = True
I'd still be happy to learn if there's a simple way to do this with PowerShell tbh
For the output, I used csv writer to append the results to a file one by one.
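For reference, a minimal sketch of that csv.writer approach, assuming a hypothetical results.csv and one (filename, value) row at a time:

import csv

# newline='' stops the csv module from inserting blank lines on Windows;
# "a" appends, so earlier rows are preserved
with open("results.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["example.txt", 87])  # hypothetical row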
There are a few steps involved here. First, get a list of files; there are a ton of answers for that one on Stack Overflow, but this one is stupidly complete.
Once you have the list of files, you can load them one by one and use a couple of str.split() calls to pull out the value you want.
Finally, write the results into a CSV file. Since the CSV file is a simple one, you don't need to use the csv library for this.
See the code example below. Note that I copied/pasted the function for generating the list of files from my personal GitHub repo. I reuse that one a lot.
import os

def get_files_from_path(path: str = ".", ext: str or list = None) -> list:
    """Find files in path and return them as a list.

    Gets all files in folders and subfolders.
    See the answer on the link below for a ridiculously
    complete answer for this:
    https://stackoverflow.com/a/41447012/9267296

    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.

    Returns:
        list: list of file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = f"{subdir}{os.sep}{fname}"
            if ext is None:
                result.append(filepath)
            elif isinstance(ext, str) and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif isinstance(ext, list):
                for item in ext:
                    if fname.lower().endswith(item.lower()):
                        result.append(filepath)
    return result

filelist = get_files_from_path("path/to/files/", ext=".txt")

split1 = "================================================================\nResult: "
split2 = "/100"

with open("output.csv", "w") as outfile:
    outfile.write('filename, value\n')
    for filename in filelist:
        with open(filename) as infile:
            value = infile.read().split(split1)[1].split(split2)[0]
            print(value)
            outfile.write(f'"{filename}", {value}\n')
You could try this.
In this example the filename written to the CSV will be its full (absolute) path. You may just want the base filename.
Uses the same, albeit seemingly unnecessary, mechanism for deriving the source directory. It would be unusual to have your Python script in the same directory as your data.
import os
import glob

equals = '=' * 64
dir_path = os.path.dirname(os.path.realpath(__file__))
outfile = os.path.join(dir_path, 'foo.csv')

with open(outfile, 'w') as csv:
    print('A,B', file=csv)
    for file in glob.glob(os.path.join(dir_path, '*.txt')):
        prev = ''  # empty string so startswith() is safe on the first line
        with open(file) as indata:
            for line in indata:
                t = line.split()
                if len(t) == 2 and t[0] == 'Result:' and prev.startswith(equals):
                    v = t[1].split('/')
                    if len(v) == 2 and v[1] == '100':
                        print(f'{file},{v[0]}', file=csv)
                    break
                prev = line
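If you only want the base filename in column A rather than the full path, os.path.basename gives you that; a tiny sketch with a hypothetical path:

import os

file = "/some/dir/data.txt"  # hypothetical absolute path as produced by glob
print(os.path.basename(file))  # prints: data.txt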

How to extract the data from a text file if the txt file does not have columns?

I want to save only the WIMP mass from the text file below into another txt file for plotting purposes. I have a lot of other .txt files to read Wimp_Mass from. I tried to use np.loadtxt but cannot, since there are strings in the file. Can you suggest code to extract Wimp_Mass and append the value to a .txt file without deleting the old values?
Omegah^2: 2.1971043635736895E-003
x_f: 25.000000000000000
Wimp_Mass: 100.87269924860568 GeV
sigmav(xf): 5.5536288606920853E-008
sigmaN_SI_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SI_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
Nevents: 0
smearing: 0
%:a1a1_emep 24.174602466963883
%:a1a1_ddx 0.70401899013937730
%:a1a1_uux 10.607701601533348
%:a1a1_ssx 0.70401807105204617
%:a1a1_ccx 10.606374255125269
%:a1a1_bbx 0.70432127586224602
%:a1a1_ttx 0.0000000000000000
%:a1a1_mummup 24.174596692050287
%:a1a1_tamtap 24.172981870222447
%:a1a1_vevex 1.3837949256836950
%:a1a1_vmvmx 1.3837949256836950
%:a1a1_vtvtx 1.3837949256836950
You can use regex for this; please find the code below:
import re

def get_wimp_mass():
    # read content of file
    with open("txt.file", "r") as f:
        # read all lines from file into a list
        data_list = f.readlines()
    # convert data from list to string
    data = "\n".join(data_list)
    # use regex to fetch the data
    wimp_search = re.search(r'Wimp_Mass:\s+([0-9\.]+)', data, re.M | re.I)
    if wimp_search:
        return wimp_search.group(1)
    else:
        return "Wimp mass not found"

if __name__ == '__main__':
    wimp_mass = get_wimp_mass()
    print(wimp_mass)
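The question also asks for the value to be appended to another .txt file without deleting the old values; a minimal sketch reusing get_wimp_mass() from above, with masses.txt as an assumed output name:

# "a" opens for appending, so previously saved masses are kept
with open("masses.txt", "a") as out:
    out.write(get_wimp_mass() + "\n")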
You can use basic regex if you want to extract the value; the code would go something like this:
import re

# txt holds the contents of the file as a single string
ans = re.findall(r"Wimp_Mass:[\ ]+([\d\.]+)", txt)
ans
>> ['100.87269924860568']
If you wanted more general code to extract everything, it could be
re.findall(r"([a-zA-Z0-9_\^:\%]+)[\ ]+([\d\.]+)", txt)
(the underscore in the first character class lets keys like Wimp_Mass match in full). You might have to add in some edge cases, though.
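To see what that more general pattern yields, here is a short sketch that builds a dict from the (key, value) pairs; txt is assumed to hold the first lines of the sample file above:

import re

# first lines of the sample file, as an assumed stand-in for the real data
txt = """Omegah^2: 2.1971043635736895E-003
x_f: 25.000000000000000
Wimp_Mass: 100.87269924860568 GeV"""

pairs = re.findall(r"([a-zA-Z0-9_\^:\%]+)[\ ]+([\d\.]+)", txt)
data = dict(pairs)            # keys keep their trailing colon
print(data["Wimp_Mass:"])     # prints: 100.87269924860568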
Here is a simple solution:
with open('new_file.txt', 'w') as f_out:  # not needed if the file already exists
    pass

filepaths = ['file1.txt', 'file2.txt']  # take all files

with open('new_file.txt', 'a') as f_out:
    for file in filepaths:
        with open(file, 'r') as f:
            for line in f.readlines():
                if line.startswith('Wimp_Mass'):
                    _, mass, _ = line.split()
                    f_out.write(mass)  # writing current mass
                    f_out.write('\n')  # writing newline
It first creates a new, empty text file (this step is not needed if the file already exists). Then you need to enter all the file paths (or just the names, if they are in the same directory). new_file.txt is opened in append mode, and then for each file the mass is found and added to new_file.txt.

Number of occurrences of a string across .txt files

I am new to Python. I am looking for the number of occurrences of a text string across the text files in a defined folder, i.e. the total count of this particular string.
def errors():
    errors = 0
    file = open("\\d:\\myfolder\\*.txt", "r")
    data = file.read()
    errors = data.count("errors")
    return errors

print("Errors:", errors)
Your code doesn't make any sense, but if I understand what you want to do, then here's some pseudo-code to get you going:
from glob import glob

text_file_paths = glob(r"d:\myfolder\*.txt")
error_counting = 0
for file_path in text_file_paths:
    with open(file_path, 'r') as f:
        all_file_lines = f.readlines()
        error_counting += sum([line.count('errors') for line in all_file_lines])

print(error_counting)
Does that help?
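If you don't need per-line counting, a shorter sketch reads each file in one go; the folder path here is the same assumed one as above:

from glob import glob

total = 0
for file_path in glob(r"d:\myfolder\*.txt"):
    with open(file_path) as f:
        # str.count tallies non-overlapping occurrences in the whole text
        total += f.read().count("errors")

print("Errors:", total)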

Using the same code for multiple text files and generating multiple text files as output using Python

I have more than 30 text files. I need to do some processing on each text file and save them again in text files with different names.
Example-1: precise_case_words.txt ---- processing ---- precise_case_sentences.txt
Example-2: random_case_words.txt ---- processing ---- random_case_sentences.txt
I need to do this for all the text files.
My present code:
new_list = []
with open('precise_case_words.txt') as inputfile:
    for line in inputfile:
        new_list.append(line)

final = open('precise_case_sentences.txt', 'w+')
for item in new_list:
    final.write("%s\n" % item)
I am manually copying and pasting this code every time and changing the names by hand. Please suggest a solution to avoid this manual job using Python.
Suppose you have all your *_case_words.txt files in the present directory:
import glob

in_file = glob.glob('*_case_words.txt')
prefix = [i.split('_')[0] for i in in_file]

for i, ifile in enumerate(in_file):
    data = []
    with open(ifile, 'r') as f:
        for line in f:
            data.append(line)
    with open(prefix[i] + '_case_sentences.txt', 'w') as f:
        # write() takes a string, so use writelines() for a list of lines
        f.writelines(data)
This should give you an idea about how to handle it:
def rename(name, suffix):
    """Rename a file with one . in it by splitting and inserting suffix before the ."""
    a, b = name.split('.')
    return ''.join([a, suffix, '.', b])  # recombine parts including suffix in it

def processFn(name):
    """Open file 'name', process it, save it under another name."""
    # scramble data by sorting each line and writing anew to the renamed file
    with open(name, "r") as r, open(rename(name, "_mang"), "w") as w:
        for line in r:
            scrambled = ''.join(sorted(line.strip("\n"))) + "\n"
            w.write(scrambled)

# list of filenames, see link below for how to get them with os.listdir()
names = ['fn1.txt', 'fn2.txt', 'fn3.txt']

# create demo data
for name in names:
    with open(name, "w") as w:
        for i in range(12):
            w.write("someword" + str(i) + "\n")

# process files
for name in names:
    processFn(name)
For file listings: see How do I list all files of a directory?
I chose to read/write line by line; you can also read a file in fully, process it, and output it again in one block to your liking (see the sketch after the example output below).
fn1.txt:
someword0
someword1
someword2
someword3
someword4
someword5
someword6
someword7
someword8
someword9
someword10
someword11
into fn1_mang.txt:
0demoorsw
1demoorsw
2demoorsw
3demoorsw
4demoorsw
5demoorsw
6demoorsw
7demoorsw
8demoorsw
9demoorsw
01demoorsw
11demoorsw
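For the whole-file variant mentioned above, a minimal sketch reusing rename() from the code above, with the same scrambling step:

def process_whole_file(name):
    # read everything at once, transform, then write the result in one block
    with open(name) as r:
        lines = r.read().splitlines()
    scrambled = [''.join(sorted(line)) for line in lines]
    with open(rename(name, "_mang"), "w") as w:
        w.write("\n".join(scrambled) + "\n")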

Writing the data in a text file while converting it to csv

I am very new to Python. I have a .txt file and want to convert it to a .csv file in the format I was told, but I could not manage to accomplish it; a hand would be useful. I am going to explain it with screenshots.
I have a txt file with the name bip.txt, and the data inside it is like this
I want to convert it to csv like this csv file
So far, all I could do is write out all the data from the text file with this code:
import glob

read_files = glob.glob("C:/Users/Emrehana1/Desktop/bip.txt")
with open("C:/Users/Emrehana1/Desktop/Test_Result_Report.csv", "w") as outfile:
    for f in read_files:
        with open(f, "r") as infile:
            outfile.write(infile.read())
So is there a solution to convert it to a csv file in the format I desire? I hope I have explained it clearly.
There's no need to use the glob module if you only have one file and you already know its name. You can just open it. It would have been helpful to quote your data as text, since as an image someone wanting to help you can't just copy and paste your input data.
For each entry in the input file you will have to read multiple lines to collect together the information you need to create an entry in the output file.
One way is to loop over the lines of input until you find one that begins with "test:", then get the next line in the file using next() to create the entry:
The following code will produce the split you need; creating the csv file can be done with the standard library csv module, and is left as an exercise. I used a different file name, as you can see.
with open("/tmp/blip.txt") as f:
for line in f:
if line.startswith("test:"):
test_name = line.strip().split(None, 1)[1]
result = next(f)
if not result.startswith("outcome:"):
raise ValueError("Test name not followed by outcome for test "+test_name)
outcome = result.strip().split(None, 1)[1]
print test_name, outcome
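The csv step left as an exercise could look like this minimal sketch, collecting the (test, outcome) pairs and writing them with the standard library csv module; the output name Test_Result_Report.csv is taken from the question:

import csv

rows = [("test", "outcome")]  # header row
with open("/tmp/blip.txt") as f:
    for line in f:
        if line.startswith("test:"):
            test_name = line.strip().split(None, 1)[1]
            outcome = next(f).strip().split(None, 1)[1]
            rows.append((test_name, outcome))

# newline='' keeps the csv module from doubling line endings on Windows
with open("Test_Result_Report.csv", "w", newline="") as out:
    csv.writer(out).writerows(rows)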
You do not use the glob function to open a file; it searches for file names matching a pattern. You could open the file bip.txt, read each line, and collect the values into a list; then, when all of the values have been found, join them with commas and newlines and write them to a csv file, like this:
# set the csv column headers
values = [["test", "outcome"]]
current_row = []

with open("bip.txt", "r") as f:
    for line in f:
        # when a blank line is found, append the row
        if line == "\n" and current_row != []:
            values.append(current_row)
            current_row = []
        if ":" in line:
            # get the value after the colon
            value = line[line.index(":") + 1:].strip()
            current_row.append(value)

# append the final row to the list
values.append(current_row)

# join the columns with a comma and the rows with a new line
csv_result = ""
for row in values:
    csv_result += ",".join(row) + "\n"

# output the csv data to a file
with open("Test_Result_Report.csv", "w") as f:
    f.write(csv_result)
