Python regex from txt file - python

I have a text file, that has data.
PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER=
I want to extract data from that file, where nothing is after the equal sign.
So in my new text file, I want to get
CMD_REINIT
CMD_OLIVIER
How do I do this?
My code is like that righr now.
import os, os.path
DIR_DAT = "dat"
DIR_OUTPUT = "output"
print("Psst go check in the ouptut folder ;)")
for roots, dir, files in os.walk(DIR_DAT):
for filename in files:
filename_output = "/" + os.path.splitext(filename)[0]
with open(DIR_DAT + "/" + filename) as infile, open(DIR_OUTPUT + "/bonjour.txt", "w") as outfile:
for line in infile:
if not line.strip().split("=")[-1]:
outfile.write(line)
I want to collect all data in a single file. It doesn't work. Can anyone help me ?
The third step, it do crawl that new file, and only keep single values. As four files are appended into a single one. Some data might be there four, three, two times.
And I need to keep in a new file, that I will call output.txt. Only the lines that are in common in all the files.

You can use regex:
import re
data = """PAS_BEGIN_3600000
CMD_VERS=2
CMD_TRNS=O
CMD_REINIT=
CMD_OLIVIER="""
found = re.findall(r"^\s+(.*)=\s*$",data,re.M)
print( found )
Output:
['CMD_REINIT', 'CMD_OLIVIER']
The expression looks for
^\s+ line start + whitespaces
(.*)= anything before a = which is caputred as group
\s*$ followed by optional whitespaces and line end
using the re.M (multiline) flag.
Read your files text like so:
with open("yourfile.txt","r") as f:
data = f.read()
Write your new file like so:
with open("newfile.txt","w") as f:
f.write(''.join("\n",found))
You can use http://www.regex101.com to evaluate test-text vs regex-patterns, make sure to swith to its python mode.

I suggest you the following short solution using comprehension:
with open('file.txt', 'r') as f, open('newfile.txt', 'w') as newf:
for x in (line.strip()[:-1] for line in f if line.strip().endswith("=")):
newf.write(f'{x}\n')

Try this pattern: \w+(?==$).
Demo

Using a simple iteration.
Ex:
with open(filename) as infile, open(filename2, "w") as outfile:
for line in infile: #Iterate Each line
if not line.strip().split("=")[-1]: #Check for second Val
print(line.strip().strip("="))
outfile.write(line) #Write to new file
Output:
CMD_REINIT
CMD_OLIVIER

Related

How to extract the data from text file if the txt file do not have column?

I want to save the wimp mass only from the text file below into another txt file for plotting purposes. I have a lot of other .txt files to read WIMP_Mass from. I try to use np.loadtxt but cannot since there is string there. Can you guys suggest the code to extract Wimp_Mass, and the output value to be appended to .txt file without deleting the old values.
Omegah^2: 2.1971043635736895E-003
x_f: 25.000000000000000
Wimp_Mass: 100.87269924860568 GeV
sigmav(xf): 5.5536288606920853E-008
sigmaN_SI_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SI_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_p: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
sigmaN_SD_n: -1.0000000000000000 GeV^-2: -389000000.00000000 pb
Nevents: 0
smearing: 0
%:a1a1_emep 24.174602466963883
%:a1a1_ddx 0.70401899013937730
%:a1a1_uux 10.607701601533348
%:a1a1_ssx 0.70401807105204617
%:a1a1_ccx 10.606374255125269
%:a1a1_bbx 0.70432127586224602
%:a1a1_ttx 0.0000000000000000
%:a1a1_mummup 24.174596692050287
%:a1a1_tamtap 24.172981870222447
%:a1a1_vevex 1.3837949256836950
%:a1a1_vmvmx 1.3837949256836950
%:a1a1_vtvtx 1.3837949256836950
You can use regex for this, please find below code:
import re
def get_wimp_mass():
# read content of file
with open("txt.file", "r") as f:
# read all lines from file into a list
data_list = f.readlines()
# convert data from list to string
data = "\n".join(data_list)
# use regex to fetch the data
wimp_search = re.search(r'Wimp_Mass:\s+([0-9\.]+)', data, re.M | re.I)
if wimp_search:
return wimp_search.group(1)
else:
return "Wimp mass not found"
if __name__ == '__main__':
wimp_mass = get_wimp_mass()
print(wimp_mass)
You can use basic regex if you want to extract value the code would go something like this:
import re
ans=re.findall("Wimp_Mass:[\ ]+([\d\.]+)",txt)
ans
>>['100.87269924860568']
If you wanted a more general code to extract everything, it could be
re.findall("([a-zA-Z0-9\^:\%]+)[\ ]+([\d\.]+)",txt)
you might have to add in some edge cases, though
Here is a simple solution:
with open('new_file.txt', 'w') as f_out: # not needed if file already exists
pass
filepaths = ['file1.txt', 'file2.txt'] # take all files
with open('new_file.txt', 'a') as f_out:
for file in filepaths:
with open(file, 'r') as f:
for line in f.readlines():
if line.startswith('Wimp_Mass'):
_, mass, _ = line.split()
f_out.write(mass) # writing current mass
f_out.write('\n') # writing newline
It first creates a new text file (remove if file exists). Then you need to enter all file paths (or just names if in same directory). The new_file.txt is opened in append mode and then for each file, the mass is found and added to the new_file.

Overwriting lines in text file [duplicate]

How can I insert a string at the beginning of each line in a text file, I have the following code:
f = open('./ampo.txt', 'r+')
with open('./ampo.txt') as infile:
for line in infile:
f.insert(0, 'EDF ')
f.close
I get the following error:
'file' object has no attribute 'insert'
Python comes with batteries included:
import fileinput
import sys
for line in fileinput.input(['./ampo.txt'], inplace=True):
sys.stdout.write('EDF {l}'.format(l=line))
Unlike the solutions already posted, this also preserves file permissions.
You can't modify a file inplace like that. Files do not support insertion. You have to read it all in and then write it all out again.
You can do this line by line if you wish. But in that case you need to write to a temporary file and then replace the original. So, for small enough files, it is just simpler to do it in one go like this:
with open('./ampo.txt', 'r') as f:
lines = f.readlines()
lines = ['EDF '+line for line in lines]
with open('./ampo.txt', 'w') as f:
f.writelines(lines)
Here's a solution where you write to a temporary file and move it into place. You might prefer this version if the file you are rewriting is very large, since it avoids keeping the contents of the file in memory, as versions that involve .read() or .readlines() will. In addition, if there is any error in reading or writing, your original file will be safe:
from shutil import move
from tempfile import NamedTemporaryFile
filename = './ampo.txt'
tmp = NamedTemporaryFile(delete=False)
with open(filename) as finput:
with open(tmp.name, 'w') as ftmp:
for line in finput:
ftmp.write('EDF '+line)
move(tmp.name, filename)
For a file not too big:
with open('./ampo.txt', 'rb+') as f:
x = f.read()
f.seek(0,0)
f.writelines(('EDF ', x.replace('\n','\nEDF ')))
f.truncate()
Note that , IN THEORY, in THIS case (the content is augmented), the f.truncate() may be not really necessary. Because the with statement is supposed to close the file correctly, that is to say, writing an EOF (end of file ) at the end before closing.
That's what I observed on examples.
But I am prudent: I think it's better to put this instruction anyway. For when the content diminishes, the with statement doesn't write an EOF to close correctly the file less far than the preceding initial EOF, hence trailing initial characters remains in the file.
So if the with statement doens't write EOF when the content diminishes, why would it write it when the content augments ?
For a big file, to avoid to put all the content of the file in RAM at once:
import os
def addsomething(filepath, ss):
if filepath.rfind('.') > filepath.rfind(os.sep):
a,_,c = filepath.rpartition('.')
tempi = a + 'temp.' + c
else:
tempi = filepath + 'temp'
with open(filepath, 'rb') as f, open(tempi,'wb') as g:
g.writelines(ss + line for line in f)
os.remove(filepath)
os.rename(tempi,filepath)
addsomething('./ampo.txt','WZE')
f = open('./ampo.txt', 'r')
lines = map(lambda l : 'EDF ' + l, f.readlines())
f.close()
f = open('./ampo.txt', 'w')
map(lambda l : f.write(l), lines)
f.close()

Python - replace the startswith character

I want to replace the first character in each line from the text file.
2 1.510932 0.442072 0.978141 0.872182
5 1.510932 0.442077 0.978141 0.872181
Above is my text file.
import sys
import glob
import os.path
list_of_files = glob.glob('/path/txt/23.txt')
for file_name in list_of_files:
f= open(file_name, 'r')
lst = []
for line in f:
f = open(file_name , 'w')
if line.startswith("2 "):
line = line.replace("2 ","7")
f.write(line)
f.close()
What i want:-
If the number starting with 2, i want to change that into 7. The problem is that, In the same line multiple 7 is there. If i change startswith character and save everything was changing
Thanks
The proper solution is (pseudo code):
open sourcefile for reading as input
open temporaryfile for writing as output
for each line in input:
fix the line
write it to output
close input
close output
replace sourcefile with temporaryfile
We use a temporary file and write along to avoid potential memory errors.
I leave it up to you to translate this to Python (hint: that's quite straightforward).
This is one approach.
Ex:
for file_name in list_of_files:
data = []
with open(file_name) as infile:
for line in infile:
if line.startswith("2 "): #Check line
line = " ".join(['7'] + line.split()[1:]) #Update line
data.append(line)
with open(file_name, "w") as outfile: #Write back to file
for line in data:
outfile.write(line+"\n")

Python read file by line and write into a different file

SO basically what I am trying to do is that I am trying to make it so I can read a file line by line, and then have a certain text added after the text displayed
For Ex.
Code:
file = open("testlist.txt",'w')
file2 = open("testerlist.txt",'r+')
//This gives me a syntax error obviously.
file.write1("" + file + "" + file2 + "")
Textlist
In my testlist.txt it lists as:
os
Testerlist
In my testerlist.txt it lists as:
010101
I am trying to copy one text from one file and read another file and add it to the beginning of a new file for ex.[accounts.txt].
My End Result
For my end result I am trying to have it be like:
os010101
(btw I have all the correct code, its just that I am using this as an example so if I am missing any values its just because I was to lazy to add it.)
You can use file.read() to read the contents of a file. Then just concatenate the data from two files and write to the output file:
with open("testlist.txt") as f1, open("testerlist.txt") as f2, \
open("accounts.txt", "w") as f3:
f3.write(f1.read().strip() + f2.read().strip())
Note that 'mode' is not required when opening files for reading.
If you need to write the lines in particular order, you could use file.readlines() to read the lines into a list and file.writelines() to write multiple lines to the output file, e.g.:
with open("testlist.txt") as f1, open("testerlist.txt") as f2, \
open("accounts.txt", "w") as f3:
f1_lines = f1.readlines()
f3.write(f1_lines[0].strip())
f3.write(f2.read().strip())
f3.writelines(f1_lines[1:])
Try with something like this:
with open('testlist.txt', 'r') as f:
input1 = f.read()
with open('testerlist.txt', 'r') as f:
input2 = f.read()
output = input1+input2
with open("accounts.txt", "a") as myfile:
myfile.write(output)

Data extraction and manipulation in jython

For a given file
For ex : 11 ,345 , sdfsfsfs , 1232
i need to such above records from a file , read 11 to delimiter and strip the white space and store in the another file , similarly 345 to delimiter strip the while space and store in the file. This way i need to do for multiple rows.
so finally in the other file the data should look like
11,345,sdfsfsfs,1232
Please suggest me the way. Thanks for your help.
Open the input file (1) and the output file (2) for reading and writing respectively.
file1 = open('file1', 'r')
file2 = open('file2', 'w')
Iterate over the input file, getting each line. Split the line on a comma. Then re-join the line using a comma, but first stripping the whitespace (using a list comprehension).
for line in file1:
fields = line.split(',')
line = ",".join([s.strip() for s in fields])
file2.write(line + "\n")
Finally, close the input and output files.
file1.close()
file2.close()
I'm not sure of jython's capabilities when it comes to generators, so that is why I used a list comprehension. Please feel free to edit this (someone who knows jython better).
One approach you could take would be to remove all whitespace using the string.translate function.
import string
#makes a 256 character string that will be used in the translate function
trans_table = string.maketrans('', '')
file1 = open('file1', 'r')
file2 = open('file2', 'w')
for line in file1:
file2.write(line.translate(trans_table, string.whitespace) + '\n')
#Or the above could be written as:
# line = line.translate(trans_table, string.whitespace)
# file2.write(line + '\n')
file1.close()
file2.close()

Categories