reading file data mixing strings and numbers in python

reading file data mixing strings and numbers in python - python

I would like to read different files in one directory with the following structure:
# Mj = 1.60 ff = 7580.6 gg = 0.8325
I would like to read the numbers from each file and associate every one to a vector.
If we assume I have 3 files, I will have 3 components for vector Mj, ...
How can I do it in Python?
Thanks for your help.

I'd use a regular expression to take the line apart:
import re
lineRE = re.compile(r'''
\#\s*
Mj\s*=\s*(?P<Mj>[-+0-9eE.]+)\s*
ff\s*=\s*(?P<ff>[-+0-9eE.]+)\s*
gg\s*=\s*(?P<gg>[-+0-9eE.]+)
''', re.VERBOSE)
for filename in filenames:
for line in file(filename, 'r'):
m = lineRE.match(line)
if not m:
continue
Mj = m.group('Mj')
ff = m.group('ff')
gg = m.group('gg')
# Put them in whatever lists you want here.

Here's a pyparsing solution that might be easier to manage than a regex solution:
text = "# Mj = 1.60 ff = 7580.6 gg = 0.8325 "
from pyparsing import Word, nums, Literal
# subexpression for a real number, including conversion to float
realnum = Word(nums+"-+.E").setParseAction(lambda t:float(t[0]))
# overall expression for the full line of data
linepatt = (Literal("#") + "Mj" + "=" + realnum("Mj") +
"ff" + "=" + realnum("ff") +
"gg" + "=" + realnum("gg"))
# use '==' to test for matching line pattern
if text == linepatt:
res = linepatt.parseString(text)
# dump the matched tokens and all named results
print res.dump()
# access the Mj data field
print res.Mj
# use results names with string interpolation to print data fields
print "%(Mj)f %(ff)f %(gg)f" % res
Prints:
['#', 'Mj', '=', 1.6000000000000001, 'ff', '=', 7580.6000000000004, 'gg', '=', 0.83250000000000002]
- Mj: 1.6
- ff: 7580.6
- gg: 0.8325
1.6
1.600000 7580.600000 0.832500

Related

searching word following a giving pattern

I want get the word 'MASTER_INACTIVE' in the string:
'p_esco_link->state = MASTER_INACTIVE; /*M-t10*/'
by searching reg-expression 'p_esco_link->state =' to find the following word.
I have to replace date accessing to API functions. I try some reg-expression in python 3.6, but it does not work.
pattern = '(?<=\bp_esco_link->state =\W)\w+'
if __name__ == "__main__":
syslogger.info(sys.argv)
if version_info.major != 3:
raise Exception('Olny work on Python 3.x')
with open(cFile, encoding='utf-8') as file_obj:
lineNum = 0
for line in file_obj:
print(len(line))
re_obj = re.compile(pattern)
result = re.search(pattern, line)
lineNum += 1
#print(result)
if result:
print(str(lineNum) + ' ' +str(result.span()) + ' ' + result.group())
excepted Python re module can find the position of 'MASTER_INACTIVE' and put it into result.group().
error message is that Python re module find nothing.

Your pattern is working fine,
Just change the bellow line in your code,
pattern = r'(?<=\bp_esco_link->state =\W)\w+' # add r prefix
Check this sample work, I added line as your string.
import re
pattern = r'(?<=\bp_esco_link->state =\W)\w+'
line = 'p_esco_link->state = MASTER_INACTIVE; /*M-t10*/'
re_obj = re.compile(pattern)
result = re.search(pattern, line)
print(result.span()) # (21, 36)
print(result.group()) # 'MASTER_INACTIVE'
Check below question to get more understand about 'r' prefix,
Python regex - r prefix
What exactly do “u” and “r” string flags do, and what are raw string literals?
What does preceding a string literal with “r” mean? [duplicate]

how to manipulate SREC file

I have an S19 file looking something like below:
S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8
I want to separate the first two characters and also the next two characters, and so on... I want it to look like below (last two characters are also to be separated for each line):
S0, 03, 0000, FC
S3, 0D, 0003C000, 0F00000000000000, 20
S3, FD, 00000000, 782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, ED, 000000F8, 3D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, 15, 00000400, FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF, 7D
S3, FD, 00000410, 10B5DFF828000468012147F22C10C4F20300016047F22010C4F20300, 00
S7, 05, 00008EB4, B8
How can I do this in Python?
I have something like this:
#!/usr/bin/python
import string,os,sys,re,fileinput
print "hi"
inputfile = "k60.S19"
outputfile = "k60_out.S19"
# open the source file and read it
fh = file(inputfile, 'r')
subject = fh.read()
fh.close()
# create the pattern object. Note the "r". In case you're unfamiliar with Python
# this is to set the string as raw so we don't have to escape our escape characters
pattern2 = re.compile(r'S3')
pattern3 = re.compile(r'S7')
pattern1 = re.compile(r'S0')
# do the replace
result1 = pattern1.sub("S0, ", subject)
result2 = pattern2.sub("S3, ", subject)
result3 = pattern3.sub("S7, ", subject)
# write the file
f_out = file(outputfile, 'w')
f_out.write(result1)
f_out.write(result2)
f_out.write(result3)
f_out.close()
#EoF
but it is not working as I like!! Can someone help me with how to come up with proper regular expression use for this?

try package bincopy, maybe you need it.
bincopy - Interpret strings as packed binary data
Mangling of various file formats that conveys binary information (Motorola S-Record, Intel HEX and binary files).
import bincopy
f = bincopy.BinFile()
f.add_srec_file("path/to/your/s19/flie.s19")
f.as_binary() # print s19 as binary
or you can easily use open() for a file:
with open("path/to/your/s19/flie.s19") as s19:
for line in s19:
type = line[0:2]
count = line[2:4]
adress = line[4:12]
data = line[12:-2]
crc = line[-2:]
print type + ", "+ count + ", " + adress + ", " + data + ", " + crc + "\n"
hope it helps.
Motorola S-record file format

You can do it using a callback function as replacement with re.sub:
#!/usr/bin/python
import re
data = r'''S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8'''
pattern = re.compile(r'^(..)(..)((?:.{4}){1,2})(.*)(?=..)', re.M)
def repl(m):
repstr = ''
for g in m.groups():
if (g):
repstr += g + ', '
return repstr
print re.sub(pattern, repl, data)
However, as Mark Setchell notices it, there is probably a nice way to do it with slicing.

I know you are thinking Python and regexes, but this was made for awk and the following will maybe help you work out the way to do it using slicing:
awk '{r=length($0);print substr($0,1,2),substr($0,3,2),substr($0,5,8),substr($0,13,r-14),substr($0,r-1)}' OFS=, k60.s19
That says "get the length of the line in variable r, then print the first two characters, the next two characters, the next 8 characters and so on... and use a comma as the field separator".
EDITED
Here are a few more hints to get you started...
if you want to avoid printing line 1, you can do
awk 'FNR==1{next} ...rest of awk script above ... '
If you want to only process lines longer than 40 characters, you can do
awk 'length($0)>40 {print}' yourfile
If you only want to process lines where the second field is "xx", you can do
awk '$2 ~ "xx" {print}' yourfile

Unicode to original character in Python

When I use for example,
unicode_string = u"Austro\u002dHungarian_gulden"
unicode_string.encode("ascii", "ignore")
Then it will give this output:'Austro-Hungarian_gulden'
But I am using a txt file which contains set of data as below:
Austria\u002dHungary Austro\u002dHungarian_gulden
Cocos_\u0028Keeling\u0029_Islands Australian_dollar
El_Salvador Col\u00f3n_\u0028currency\u0029
Faroe_Islands Faroese_kr\u00f3na
Georgia_\u0028country\u0029 Georgian_lari
And I have to process this data using regular expressions in Python, so I have created a script as below, but it does not work for replacing Unicode values with appropiate characters in the string.
Likewise
'\u002d' has appropriate character '-'
'\u0028' has appropriate character '('
'\u0029' has appropriate character ')'
Script for processing text file:
import re
import collections
def extract():
filename = raw_input("Enter file Name:")
in_file = file(filename,"r")
out_file = file("Attribute.txt","w+")
for line in in_file:
values = line.split("\t")
if values[1]:
str1 = ""
for list in values[1]:
list = re.sub("[^\Da-z0-9A-Z()]","",list)
list = list.replace('_',' ')
out_file.write(list)
str1 += list
out_file.write(" ")
if values[2]:
str2 = ""
for list in values[2]:
list = re.sub("[^\Da-z0-9A-Z\n]"," ",list)
list = list.replace('"','')
list = list.replace('_',' ')
out_file.write(list)
str2 += list
s1 = str1.lstrip()
s1 = str1.rstrip()
s2 = str2.lstrip()
s2 = str2.rstrip()
print s1+s2
Expected output for the given data is:
Austria-Hungary Austro-Hungarian gulden
Cocos (Keeling) Islands Australian dollar
El Salvador Coln (currency)
FaroeIslands Faroese krna
Georgia (country) Georgian lari
How can I do it?

Convert the input into Unicode using decode("unicode_escape"), then encode() the output to an encoding of your choice.
>>> r"Austro\u002dHungarian_gulden".decode("unicode_escape")
u'Austro-Hungarian_gulden'

Replace single quotes with double quotes in python, for use with insert into database

Was wondering whether anyone has a clever solution for fixing bad
insert statements in Python, exported by a not so clever program. It didn't add
two single quotes for strings with a single quote in the string. To
make it a bit easier all the values being inserted are strings.
So it has:
INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');
instead of:
INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');
Obviously there are multiple lines of this and I don't want to replace
the enclosing single quotes as well.
Was thinking of using split and join, but I'm not sure how to easily update the split values while looping in a loop. Sorry I'm a noob. Something like the below, where I'm not sure how to do #update bit
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
bits = line.split("','")
for bit in bits:
if bit.find("'") > -1:
#update bit
line_out = "','".join(bits)
sys.stdout.write(line_out)
line = fileIN.readline()
Thanks

Based on katrielalex's suggestion, how about this:
>>> import re
>>> s = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
>>> def repl(m):
if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
return m.group(0)
return m.group(1) + "''" + m.group(2)
>>> re.sub("(.)'(.)", repl, s)
"INSERT INTO addresses VALUES ('1','1','CUCKOO''S NEST','CUCKOO''S NEST STREET');"
and if you're into negative lookbehind assertions, this is the headache inducing pure regex version:
re.sub("((?<![(,])'(?![,)]))", "''", s)

while line:
# Restrain line2 to inside parentheses
line1, rest = line.split('(')
line2, line3 = rest.split(')')
# A bit more cleaner
new_bits = []
for bit in line2.split(','):
# Remove border ' characters
bit = bit[1:-1]
# Duplicate the ones inside
if "'" in bit:
bit = bit.replace("'", "''")
# Re-add border '
new_bits.append("'" + bit + "'")
sys.stdout.write(line1 + '(' + ','.join(new_bits + ')' + line3)
line = fileIN.readline()

Warning: This depends way too much on the formatting of the SQL statement. However, if your input is only ever going to have the format "statements (params) end" then this will work every time.
import sys
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
#split out the parameters (between the ()'s)
start, temp = line.split("(")
params, end = temp.split(")")
#replace the "'"s in the parameters (without the start and end quote)
newParams = "','".join([x.replace("'", "''") for x in params[1:-1].split("','")])
#join the statement back together
line_out = start + "('" + newParams + "')" + end
#next line
sys.stdout.write(line_out)
line = fileIN.readline()
Explanation:
Split the string into 3 parts: The query start, the parameters, and the end.
The generator takes the parameters (without the starting/ending 's), splits it on ',', and, for every element in the list the split generates (the individual data entries), replaces the 's with ''s.
The last line then joins the query start, the new params (with the parenthesis and quotes that were removed previously), and the end of the statement.

Another answer:
a = "INSERT INTO addresses VALUES ('1','1','CUCKOO'S NEST','CUCKOO'S NEST STREET');"
open_par = a.find("(")
close_par = a.find(")")
b = a[open_par+1:close_par]
c = b.split(",")
d = map(lambda x: '"' + x.strip().strip("'") + '"',c)
result = a[:open_par+1] + ",".join(d) + a[close_par:]

Went with:
import sys
import re
def repl(m):
if m.group(1) in ('(', ',') or m.group(2) in (',', ')'):
return m.group(0)
return m.group(1) + "''" + m.group(2)
fileIN = open('a.sql', "r")
line = fileIN.readline()
while line:
line_out = re.sub("(.)'(.)", repl, line)
sys.stdout.write(line_out)
# Next line.
line = fileIN.readline()

python & pyparsing newb: how to open a file

Paul McGuire, the author of pyparsing, was kind enough to help a lot with a problem I'm trying to solve. We're on 1st down with a yard to goal, but I can't even punt it across the goal line. Confucius said if he gave a student 1/4 of the solution, and he did not return with the other 3/4s, then he would not teach that student again. So it is after almost a week of frustation and with great anxiety that I ask this...
How do I open an input file for pyparsing and print the output to another file?
Here is what I've got so far, but it's really all his work
from pyparsing import *
datafile = open( 'test.txt' )
# Backaus Nuer Form
num = Word(nums)
accessionDate = Combine(num + "/" + num + "/" + num)("accDate")
accessionNumber = Combine("S" + num + "-" + num)("accNum")
patMedicalRecordNum = Combine(num + "/" + num + "-" + num + "-" + num)("patientNum")
gleason = Group("GLEASON" + Optional("SCORE:") + num("left") + "+" + num("right") + "=" + num("total"))
patientData = Group(accessionDate + accessionNumber + patMedicalRecordNum)
partMatch = patientData("patientData") | gleason("gleason")
lastPatientData = None
# PARSE ACTIONS
def patientRecord( datafile ):
for match in partMatch.searchString(datafile):
if match.patientData:
lastPatientData = match
elif match.gleason:
if lastPatientData is None:
print "bad!"
continue
print "{0.accDate}: {0.accNum} {0.patientNum} Gleason({1.left}+{1.right}={1.total})".format(
lastPatientData.patientData, match.gleason
)
patientData.setParseAction(lastPatientData)
# MAIN PROGRAM
if __name__=="__main__":
patientRecord()

It looks like you need to call datafile.read() in order to read the contents of the file. Right now you are trying to call searchString on the file object itself, not the text in the file. You should really look at the Python tutorial (particularly this section) to get up to speed on how to read files, etc.

It seems like you need some help putting it together. The advice of #BrenBarn is spot-on, work with problem of simple complexity before you put it all together. I can help by giving you a minimal example of what you are trying to do, with a much simpler grammar. You can use this as a template to learn how to read/write a file in python. Consider the input text file data.txt:
cat 3
dog 5
foo 7
Let's parse this file and output the results. To have some fun, let's mulpitply the second column by 2:
from pyparsing import *
# Read the input data
filename = "data.txt"
FIN = open(filename)
TEXT = FIN.read()
# Define a simple grammar for the text, multiply the first col by 2
digits = Word(nums)
digits.setParseAction(lambda x:int(x[0]) * 2)
blocks = Group(Word(alphas) + digits)
grammar = OneOrMore(blocks)
# Parse the results
result = grammar.parseString( TEXT )
# This gives a list of lists
# [['cat', 6], ['dog', 10], ['foo', 14]]
# Open up a new file for the output
filename2 = "data2.txt"
FOUT = open(filename2,'w')
# Walk through the results and write to the file
for item in result:
print item
FOUT.write("%s %i\n" % (item[0],item[1]))
FOUT.close()
This gives in data2.txt:
cat 6
dog 10
foo 14
Break each piece down until you understand it. From here, you can slowly adapt this minimal example to your more complex problem above. It's OK to read the file in (as long as it is relatively small) since Paul himself notes:
parseFile is really just a simple shortcut around parseString, pretty
much the equivalent of expr.parseString(open(filename).read()).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

reading file data mixing strings and numbers in python - python

Related

searching word following a giving pattern

how to manipulate SREC file

Unicode to original character in Python

Replace single quotes with double quotes in python, for use with insert into database

python & pyparsing newb: how to open a file

Categories

Resources