I am trying to read data from an HTML table. I read it periodically, and the table's length changes every time, so I don't know it in advance. However, the table always has the same format, so I try to recognize a pattern and read the data based on its position.
The html is of the form:
<head>
<title>Some webside</title>
</head>
<body
<tr><td> There are some information coming here</td></tr>
<tbody><table>
<tr><td>First</td><td>London</td><td>24</td><td>3</td><td>19:00</td><td align="center"></td></tr>
<tr bgcolor="#cccccc"><td>Second</td><td>NewYork</td><td>24</td><td>4</td><td>20:13</td><td align="center"></td></tr>
<tr><td>Some surprise</td><td>Swindon</td><td>25</td><td>5</td><td>20:29</td><td align="center"></td></tr>
<tr bgcolor="#cccccc"><td>Third</td><td>Swindon</td><td>24</td><td>6</td><td>20:45</td><td align="center"></td></tr>
</tbody></table>
<tr><td> There are some information coming here</td></tr>
</body>
I convert the HTML to a string and scan through it to read the data, but I want to read it only once. My code is:
def ReadTable(m):
    refList = []
    firstId = 1
    nextId = 2
    k = 1
    helper = 1
    while firstId != nextId:
        row = []
        helper = m.find('<td><a href="d?k=', helper) + 17
        end_helper = m.find('">', helper)
        rowId = m[helper : end_helper]
        if k == 1: # to check if looped again
            firstId = rowId
        else:
            nextId = rowId
        row.append(rowId)
        helper = end_helper + 2
        end_helper = m.find('</a></td><td>', helper)
        rowPlace = m[helper : end_helper]
        row.append(rowPlace)
        helper = m.find('</a></td><td>', end_helper) + 13
        end_helper = m.find('</td><td>', helper)
        rowCity = m[helper : end_helper]
        row.append(rowCity)
        helper = end_helper + 9
        end_helper = m.find('</td><td>', helper)
        rowDay = m[helper : end_helper]
        row.append(rowDay)
        helper = end_helper + 9
        end_helper = m.find('</td><td>', helper)
        rowNumber = m[helper : end_helper]
        row.append(rowNumber)
        helper = end_helper + 9
        end_helper = m.find('</td>', helper)
        rowTime = m[helper : end_helper]
        row.append(rowTime)
        refList.append(row)
        k +=1
    return refList
if __name__ == '__main__':
    filePath = '/home/m/workspace/Tests/mainP.html'
    fileRead = open(filePath)
    myString = fileRead.read()
    print myString
    refList = ReadTable(myString)
    print 'Final List = %s' % refList
I expect the result to be a list containing 4 lists, like this:
Final List = [['101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13'], ['201', 'Some surprise', 'Swindon', '25', '5', '20:29'], ['202', 'Third', 'Swindon', '24', '6', '20:45']]
I expected that after the first pass the string would be read again from the start, firstId would be found again, and my while loop would terminate. Instead I get an infinite loop, and my list starts to look like this:
Final List = [['101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13'], ['201', 'Some surprise', 'Swindon', '25', '5', '20:29'], ['202', 'Third', 'Swindon', '24', '6', '20:45'], ['me webside</title>\n</head>\n<body \n<tr><td> There are some information coming here</td></tr>\n<tbody><table>\n<tr><td><a href="d?k=101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13']...
I don't understand why helper starts to behave this way, and I can't figure out how a program like this should be written. Can you suggest a good, efficient way to write it, or a way to fix my loop?
I would suggest you invest some time in looking at lxml. It lets you look at all of the tables in an HTML file and work with the sub-elements that make up each table (such as rows and cells).
lxml is not hard to work with, and it lets you feed in a string with
html.fromstring(somestring)
Further, a lot of lxml questions have been asked and answered here on SO, so it is not too hard to find good examples to work from.
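To illustrate the row-and-cell approach without depending on lxml being installed, here is a minimal sketch that uses the standard library's html.parser instead. It is a stand-in for the idea, not lxml's API. The class name TableRows and the sample string are made up for illustration, and Python 3 is assumed.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample shaped like the table in the question.
sample = (
    '<table>'
    '<tr><td>First</td><td>London</td><td>24</td><td>3</td>'
    '<td>19:00</td><td align="center"></td></tr>'
    '<tr bgcolor="#cccccc"><td>Second</td><td>NewYork</td><td>24</td>'
    '<td>4</td><td>20:13</td><td align="center"></td></tr>'
    '</table>'
)

parser = TableRows()
parser.feed(sample)
parser.close()
rows = parser.rows
for row in rows:
    print(row)
```

Because the parser reacts to tags rather than to character offsets, the number of rows does not need to be known in advance, and no sentinel is needed to detect the end of the table.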
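You aren't checking the return value of find, and it returns -1 when it doesn't find a match. Adding 17 to -1 gives 16, so the next slice starts at index 16 of the string, which is why the first repeated ID begins with 'me webside</title>' and the rows then repeat forever.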
http://docs.python.org/2/library/string.html#string.find
Return -1 on failure
I updated this section of the code, and it now returns what you expect. The first and last lines below match lines in your code, so you can see where the replacement goes.
        row = []
        helper = m.find('<td><a href="d?k=', helper)
        if helper == -1:
            break
        helper += 17
        end_helper = m.find('">', helper)
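The same check can be shown in isolation. The sketch below is not the original ReadTable; it only collects the IDs, using a hypothetical helper find_all_ids and a made-up sample string, to show a find loop that stops cleanly when there is nothing left to match.

```python
def find_all_ids(text, marker='<td><a href="d?k=', closer='">'):
    """Return every ID that appears between marker and closer."""
    ids = []
    pos = 0
    while True:
        pos = text.find(marker, pos)
        if pos == -1:          # no further match: stop instead of wrapping around
            break
        pos += len(marker)     # only advance after a successful find
        end = text.find(closer, pos)
        if end == -1:          # opening marker without a closing one
            break
        ids.append(text[pos:end])
        pos = end + len(closer)
    return ids


# Hypothetical sample in the shape the question's code searches for.
sample = (
    '<tr><td><a href="d?k=101">First</a></td><td>London</td></tr>'
    '<tr><td><a href="d?k=102">Second</a></td><td>NewYork</td></tr>'
)
ids = find_all_ids(sample)
print(ids)

# What went wrong in the original loop: a failed find plus the offset.
wrapped = 'abc'.find('x') + 17
print(wrapped)
```

Using len(marker) instead of the literal 17 also keeps the offset correct if the marker string ever changes.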
Here's my code:
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
ListStrUser = []
for ListStrUser in UserList:
    ListStrUser = GetNum(UserList)
def GetNum(anyList):
    for i in range(1,len(anyList)):
        anyList[i] = re.sub (r'\D',"", str(anyList[i]))
    return anyList
print(ListStrUser)
########
expected result :
[['person1', '25','70','170'],[ 'person2','21','54','164']]
You were not far off, Asif. I cannot add much to Ethan's answer, which is why I'm confused that it was downvoted. If you want a function that handles all the work without needing another for loop, the function below will do just that:
import re
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
def get_num(any_list):
    # includes the for loop to iterate through the list of lists
    list_str_user = []
    for inner_list in any_list:
        temp_list = [inner_list[0]]
        for i in range(1,len(inner_list)):
            temp_list.append(re.sub(r'\D', '', str(inner_list[i])))
        list_str_user.append(temp_list)
    return list_str_user
print(get_num(UserList))
Output:
[['person1', '25', '70', '170'], ['person2', '21', '54', '164']]
So no need for the for loop outside the function.
import re
def GetNum(anyList):
    for i in range(1, len(anyList)):
        # \D removes every non-digit character, whatever the suffix length
        anyList[i] = re.sub(r'\D',"",str(anyList[i]))
    return anyList
userList = [['person1','25yo','70kg','170cm'],['person2','21yo','54kg','164cm']]
for ListStrUser in userList:
    ListStrUser = GetNum(ListStrUser)
print("Output : ", userList)
output: [['person1', '25', '70', '170'], ['person2', '21', '54', '164']]
From @Guy's comment:
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
import re
def GetNum(anyList):
    for i in range(1,len(anyList)):
        anyList[i] = re.sub (r'\D',"", str(anyList[i]))
    return anyList
ListStrUser = []
for ListStr in UserList:
    ListStrUser.append(GetNum(ListStr))
print(ListStrUser)
gives
[['person1', '25', '70', '170'], ['person2', '21', '54', '164']]
Try the following code:
import re
user_list = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
list_str_user = []
def get_num(any_list):
    updated_list = [any_list[0]]
    for i in range(1,len(any_list)):
        updated_list.append(re.sub(r'\D',"", str(any_list[i])))
    return updated_list
for user in user_list:
    list_str_user.append(get_num(user))
print(list_str_user)
Notice that I also updated the names of your variables and functions, and the spacing between functions, to comply with PEP 8. Keep this in mind when writing Python code.
Also, functions must be defined before the code that calls them runs; otherwise Python raises a NameError.
I also created the updated_list variable in get_num; avoiding mutation of a function's parameters is rarely a bad idea.
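The point about not mutating parameters can be seen side by side. In this small sketch the function names strip_in_place and strip_copy are made up for illustration: the first behaves like the original GetNum, the second like get_num above.

```python
import re


def strip_in_place(values):
    """Like the original GetNum: overwrites the list that was passed in."""
    for i in range(1, len(values)):
        values[i] = re.sub(r'\D', '', str(values[i]))
    return values


def strip_copy(values):
    """Like get_num: builds a new list and leaves the argument untouched."""
    return [values[0]] + [re.sub(r'\D', '', str(v)) for v in values[1:]]


original = ['person1', '25yo', '70kg', '170cm']

copied = strip_copy(original)
snapshot = list(original)            # state of the input after strip_copy
print(copied, snapshot)

mutated = strip_in_place(original)   # returns the very same list object
print(mutated is original, original)
```

With the in-place version the caller's data silently changes, which is why the one-line loop in the other answers works even though the loop variable is simply reassigned.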
What am I doing wrong here? I get an error when I run this: it says it can't convert a string to float. I want to store the values from the C1 column onward in variables such as c1, as floats, for calculations.
import csv
file = open('Items.csv')
reader = csv.reader(file, delimiter=',')
items = dict()
headersRead = False
headers = []
for row in reader:
    if headersRead == False:
        for i in range(len(row)):
            items[row[i]] = []
            print(row[i])
        headers = row
        print(headers)
        headersRead = True
    else:
        for i in range(len(row)):
            items[headers[i]].append(row[i])
for key in items:
    c1 = float(items[key][0])
    c2 = float(items[key][1])
    c3 = float(items[key][2])
    constant = float(items[key][3])
file.close()
This is the csv file I am working with.
Item,C1,C2,C3,Constant
Guitar Hero,-0.1111,0,-0.2,10
iPhone 7,-0.1,-0.2,-0.33333,3
iPhone SE,-0.889,-0.23,-0.5,2
Star Wars,-0.0778,-0.373333333,-0.5,4
Markers,-0.667,-0.488333333,-0.65,3
Avengers,-0.556,-0.603333333,-0.756667,5
Elf on the Shelf,-0.04,-0.718333333,-0.863334,1
Pool Cue,-0.334,0,0,9
Tire Repair Kit,-0.223,-0.948333333,-0.076668,6
Silly Putty,-0.112,-0.063333333,-0.183335,1
Nike,-0.123,-0.178333333,0,5
This is the dictionary (items) that your last for loop iterates over:
{'Item': ['Guitar Hero', 'iPhone 7', 'iPhone SE', 'Star Wars', 'Markers', 'Avengers', 'Elf on the Shelf', 'Pool Cue', 'Tire Repair Kit', 'Silly Putty', 'Nike'], 'C1': ['-0.1111', '-0.1', '-0.889', '-0.0778', '-0.667', '-0.556', '-0.04', '-0.334', '-0.223', '-0.112', '-0.123'], 'C2': ['0', '-0.2', '-0.23', '-0.373333333', '-0.488333333', '-0.603333333', '-0.718333333', '0', '-0.948333333', '-0.063333333', '-0.178333333'], 'C3': ['-0.2', '-0.33333', '-0.5', '-0.5', '-0.65', '-0.756667', '-0.863334', '0', '-0.076668', '-0.183335', '0'], 'Constant': ['10', '3', '2', '4', '3', '5', '1', '9', '6', '1', '']}
The first key in the dictionary, "Item", has a list of non-numeric strings as its value, so float() fails on it. I have therefore added an if/continue statement that skips to the next iteration of the loop when the key is "Item".
import csv
file = open('Items.csv')
reader = csv.reader(file, delimiter=',')
items = dict()
headersRead = False
headers = []
for row in reader:
    if headersRead == False:
        for i in range(len(row)):
            items[row[i]] = []
            print(row[i])
        headers = row
        print(headers)
        headersRead = True
    else:
        for i in range(len(row)):
            items[headers[i]].append(row[i])
for key in items:
    if key == 'Item':
        continue
    c1 = float(items[key][0])
    c2 = float(items[key][1])
    c3 = float(items[key][2])
    constant = float(items[key][3])
file.close()
Unfortunately, I couldn't add images for explanation as this is my first answer on Stack overflow.
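If the goal is one set of c1, c2, c3 and constant values per item rather than per column, a row-oriented read may be closer to what is wanted. The following is only a sketch under that assumption: it uses csv.DictReader on an in-memory excerpt of the data (io.StringIO stands in for the real Items.csv), and to_float is a hypothetical helper that returns None for blank cells such as the empty Constant at the end of the dictionary printed above.

```python
import csv
import io

# Excerpt of the data; the last row has a blank Constant on purpose.
sample = """Item,C1,C2,C3,Constant
Guitar Hero,-0.1111,0,-0.2,10
iPhone 7,-0.1,-0.2,-0.33333,3
Nike,-0.123,-0.178333333,0,
"""


def to_float(text, default=None):
    """Convert text to float; return default for blank or non-numeric cells."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return default


rows = []
for record in csv.DictReader(io.StringIO(sample)):
    # DictReader uses the header line as keys, so no manual header handling.
    rows.append({
        "item": record["Item"],
        "c1": to_float(record["C1"]),
        "c2": to_float(record["C2"]),
        "c3": to_float(record["C3"]),
        "constant": to_float(record["Constant"]),
    })

for row in rows:
    print(row)
```

With a real file, replacing io.StringIO(sample) by a file opened with open('Items.csv', newline='') inside a with block should behave the same way.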
This is the data:
C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK
And I would like to get this result
[C:/data/my_file.txt.c, 10, 0x21, name1, name2, 0x10, 1, OK]
[C:/data/my_file2.txt.c, 110, 0x1, name2, name5, 0x12, 1, NOT_OK]
[./data/my_file3.txt.c, 110, 0x1, name2, name5, 0x12, 10, OK]
I know how to do that with plain code, string splitting and the like, but I am looking for a nice solution using pyparsing. My problem is the :/ in the file path.
Additional question: I use some code to strip comments and other stuff from the records, so the raw data looks like this:
text = """C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
// comment
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK
----
ok
"""
Right now I strip the "//", "ok", and "----" lines before parsing.
So now I have a follow-up to the first question:
Until now I extracted the lines above from a data file, and that works great: I read the file line by line and parse each line. But I have now found out that parseFile can parse a whole file, so I think I could strip some of my code and use parseFile instead. The files I would like to parse have an additional footer.
C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK: info message
-----------------------
3 Files 2 OK 1 NOT_OK
NOT_OK
Is it possible to change the parser to get 2 parse results?
Result1:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
['C:/data/my_file2.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK']]
Ignore the blank line
Ignore this line => -----------------------
Result 2:
[['3', 'Files', '2', 'OK', '1', 'NOT_OK'],
['NOT_OK']]
So I changed the code to this:
# define an expression for your file reference
one_thing = Combine(
oneOf(list(alphas)) + ':/' +
Word(alphanums + '_-./'))
# define a catchall expression for everything else (words of non-whitespace characters,
# excluding ':')
another_thing = Word(printables + " ", excludeChars=':')
# define an expression of the two; be sure to list the file reference first
thing = one_thing | another_thing
# now use plain old pyparsing delimitedList, with ':' delimiter
list_of_things = delimitedList(thing, delim=':')
list_of_other_things = Word(printables).setName('a')
# run it and see...
parse_ret = OneOrMore(Group(list_of_things | list_of_other_things)).parseFile("data.file")
parse_ret.pprint()
And I get this result:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
['C:/data/my_file2.txt.c','110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK', 'info message'],
['-----------------------'],
['3 Files 2 OK 1 NOT_OK'],
['NOT_OK']]
So I can work with this, but is it possible to split the result into two named results? I searched the docs but didn't find anything that works.
See embedded comments for pyparsing description:
from pyparsing import *
text = """C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
// blah-de blah blah blah
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK"""
# define an expression for your file reference
one_thing = Combine(
oneOf(list(alphas.upper())) + ':/' +
Word(alphanums + '_-./'))
# define a catchall expression for everything else (words of non-whitespace characters,
# excluding ':')
another_thing = Word(printables, excludeChars=':')
# define an expression of the two; be sure to list the file reference first
thing = one_thing | another_thing
# now use plain old pyparsing delimitedList, with ':' delimiter
list_of_things = delimitedList(thing, delim=':')
parser = OneOrMore(Group(list_of_things))
# ignore comments starting with double slash
parser.ignore(dblSlashComment)
# run it and see...
parser.parseString(text).pprint()
prints:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
['C:/data/my_file2.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK']]
I didn't find a solution with delimitedList and parseFile, but I found a solution that is okay for me.
from pyparsing import *
data = """
C: / data / my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C: / data / my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./ data / my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK: info message
-----------------------
3 Files 2 OK 1 NOT_OK
NOT_OK
"""
if __name__ == '__main__':
    # define an expression for your file reference
    entry_one = Combine(
        oneOf(list(alphas)) + ':/' +
        Word(alphanums + '_-./'))
    entry_two = Word(printables + ' ', excludeChars=':')
    entry = entry_one | entry_two
    delimiter = Literal(':').suppress()
    tc_result_line = Group(
        entry.setResultsName('file_name') + delimiter +
        entry.setResultsName('line_nr') + delimiter +
        entry.setResultsName('num_one') + delimiter +
        entry.setResultsName('name_one') + delimiter +
        entry.setResultsName('name_two') + delimiter +
        entry.setResultsName('num_two') + delimiter +
        entry.setResultsName('status') +
        Optional(delimiter + entry.setResultsName('msg'))).setResultsName("info_line")
    EOL = LineEnd().suppress()
    SOL = LineStart().suppress()
    blank_line = SOL + EOL
    tc_summary_line = Group(
        Word(nums).setResultsName("num_of_lines") + "Files" +
        Word(nums).setResultsName("num_of_ok") + "OK" +
        Word(nums).setResultsName("num_of_not_ok") + "NOT_OK").setResultsName("info_summary")
    tc_end_line = Or(Literal("NOT_OK"), Literal('Ok')).setResultsName("info_result")
    # run it and see...
    pp1 = tc_result_line | Optional(tc_summary_line | tc_end_line)
    pp1.ignore(blank_line | OneOrMore("-"))
    result = list()
    for l in data.split('\n'):
        result.append((pp1.parseString(l)).asDict())
    # delete empty results
    result = filter(None, result)
    for r in result:
        print(r)
    pass
Result:
{'info_line': {'file_name': 'C', 'num_one': '10', 'msg': '1', 'name_one': '0x21', 'line_nr': '/ data / my_file.txt.c', 'status': '0x10', 'num_two': 'name2', 'name_two': 'name1'}}
{'info_line': {'file_name': 'C', 'num_one': '110', 'msg': '1', 'name_one': '0x1', 'line_nr': '/ data / my_file2.txt.c', 'status': '0x12', 'num_two': 'name5', 'name_two': 'name2'}}
{'info_line': {'file_name': './ data / my_file3.txt.c', 'num_one': '0x1', 'msg': 'OK', 'name_one': 'name2', 'line_nr': '110', 'status': '10', 'num_two': '0x12', 'name_two': 'name5'}}
{'info_summary': {'num_of_lines': '3', 'num_of_ok': '2', 'num_of_not_ok': '1'}}
{'info_result': ['NOT_OK']}
Using re:
import re
myList = ["C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK", "C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK", "./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK"]
for i in myList:
    # turn every ':' into ',' and then restore the ':' of the drive prefix
    newTxt = re.sub(r':', ",", i)
    newTxt = re.sub(r',/', ":/", newTxt)
    print(newTxt)
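A variation on the regex idea, offered as a sketch rather than as the pyparsing solution the question asks for: splitting on every colon that is not directly followed by a slash yields the fields as a list in one step. It assumes, as in the sample data, that ":/" occurs only in a drive prefix such as "C:/".

```python
import re

lines = [
    "C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK",
    "C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK",
    "./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK",
]

# ':(?!/)' matches a colon only when the next character is not '/',
# so the drive prefix 'C:/' stays attached to the path.
fields = [re.split(r':(?!/)', line) for line in lines]

for row in fields:
    print(row)
```

The negative lookahead does not consume the character after the colon, so no field loses its first character.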
I'm trying to print the values in a list that are nearest to a time given by the user. In practice, the user gives me a time and I want to check whether it is in the schedule; if not, I want to increment the minutes until I reach a value in the list, and then print the times within a thirty-minute slot. Here is my code. Can you help me by showing what does not work? Thanks
def print_specific_time():
    f = open("Bus 6 Lugano Stazione.txt")
    lines = f.readlines()
    d = defaultdict(list)
    start = lines.index("Monday\n")
    stop = lines.index("Saturday\n")
    time = "07.35"
    hour = time[0] + time[1]
    minutes = time[3:]
    for line in lines[start:stop]:
        line = line.strip(",")
        line = line.replace("\n","")
        line = line.replace(" ","")
        line = line.split("|")
        key = line[0]
        if len(line) == 2:
            d[key] += [line[1]]
            if minutes not in d[hour]:
                minutes = int(minutes) + 1
                minutes = str(minutes)
                print(minutes)
            if minutes in d[hour]:
                print(minutes)
                print(hour,d[hour])
            else:
                if minutes == '59':
                    hour = int(hour)
                    hour = hour + 1
                    hour = "0" + str(hour)
                    minutes = "00"
    d = dict(d)
    for key in d.keys():
        if key == hour:
            print(key,d[key])
The file contains the schedule, which I put into the dictionary. Here is the output I am working with:
{'06': ['11', '26', '41', '56'], '12': ['06', '36'], '11': ['06', '36'],
'07': ['11', '26', '41', '56'], '16': ['11', '26', '41', '56'], '14': ['06', '36'],
'17': ['11', '26', '41', '56'], '20': ['05', '35'], '15': ['06', '36', '56'],
'09': ['06', '36'], '21': ['05', '35'], '22': ['05', '35'], '23': ['05', '35'],
'19': ['11', '40'], '08': ['11', '26', '41'], '13': ['06', '36'], '10': ['06', '36'],
'18': ['11', '26', '41', '56']}
Let me explain it better. If, for example, the user enters a time like 07.35, my program should print 07.41 and 07.56 and nothing else (because there is no other time after 07.56 within the 30-minute slot). Can you help me check what's wrong? Thanks
EDIT
OK. Now I'm able to print the times for the 07 o'clock slot (which is partly right), but I don't understand why Python prints them only when the variable minutes is equal to '56' and not '41', which is also in the list.
I think you are overcomplicating it a bit. Increasing the minutes one by one is not necessary. Here is my code to find the nearest time in your dictionary:
def find(time):
    start_hour, start_minute = map(int, time.split('.'))
    for h in range(start_hour, 24):
        hour = "{0:02d}".format(h)
        if hour in data.keys():
            line = data[hour]
            if h == start_hour:
                line = list(filter(lambda m: int(m) >= start_minute, line))
            if len(line) > 0:
                return '%s.%s' % (hour, line[0])
Here you can see it working: fiddle.
Edit: I modified the code to python3. (And the fiddle too.)
Edit2: I also made a version, that lists every time in the next half hour. Fiddle.
Here is the complete version, with the code divided into two functions. It was written with the help of @zord:
from collections import defaultdict
def printed():
    f = open("Bus 6 Lugano Stazione.txt")
    lines = f.readlines()
    d = defaultdict(list)
    start = lines.index("Monday\n")
    stop = lines.index("Saturday\n")
    time = "07.35"
    hour = time[0] + time[1]
    minutes = time[3:]
    for line in lines[start:stop]:
        line = line.strip(",")
        line = line.replace("\n","")
        line = line.replace(" ","")
        line = line.split("|")
        key = line[0]
        if len(line) == 2:
            d[key] += [line[1]]
    d = dict(d)
    return d
def find(time):
    data = printed()
    # convert every departure to minutes since midnight
    data2 = [int(h) * 60 + int(m) for h in data.keys() for m in data[h]]
    start_hour, start_minute = map(int, time.split('.'))
    start = start_hour * 60 + start_minute
    end = start + 30
    after = list(filter(lambda x: start <= x <= end, data2))
    return list(map(lambda x: '%02d.%02d' % (x // 60, x % 60), after))
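To try the minutes-since-midnight idea without the schedule file, here is a self-contained sketch. The function name times_within, its window parameter and the two-hour excerpt of the schedule are assumptions made for illustration; unlike the version above, it also sorts the times so the result does not depend on dictionary order.

```python
def times_within(schedule, time, window=30):
    """Return departures from `time` up to `window` minutes later, as 'HH.MM'."""
    start_hour, start_minute = map(int, time.split('.'))
    start = start_hour * 60 + start_minute
    end = start + window
    # Flatten {'07': ['11', ...]} into sorted minutes since midnight.
    all_minutes = sorted(
        int(hour) * 60 + int(minute)
        for hour, minutes in schedule.items()
        for minute in minutes
    )
    return ['%02d.%02d' % divmod(t, 60) for t in all_minutes if start <= t <= end]


# Excerpt of the dictionary shown in the question.
schedule = {
    '07': ['11', '26', '41', '56'],
    '08': ['11', '26', '41'],
}

upcoming = times_within(schedule, '07.35')
print(upcoming)

across_the_hour = times_within(schedule, '07.50')
print(across_the_hour)
```

Working in plain minutes means a window that crosses the hour boundary needs no special handling.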
Here's my code:
from pyparsing import *
survey ='''
BREAK_L,PN1000,LA55.16469813,LN18.15054629
PN1,LA54.16469813,LN17.15054629,EL22.222
BREAK_L,PN2000,LA55.16507249,LN18.15125566
PN6,LA54.16506873,LN17.15115798,EL33.333
PN7,LA54.16507249,LN17.15125566,EL44.444
BREAK_L,PN3000,LA55.16507249,LN18.15125566
PN10,LA54.16507522,LN17.15198405,EL55.555
PN11,LA54.16506566,LN17.15139220,EL44.44
PN12,LA54.16517275,LN17.15100652,EL11.111
'''
digits = "0123456789"
number = Word(nums+'.').setParseAction(lambda t: float(t[0]))
num = Word(digits)
text = Word(alphas)
pt_id = Suppress('PN') + Combine(Optional(text) + num + Optional(text) + Optional(num))
separator = Suppress(',')
latitude = Suppress('LA') + number
longitude = Suppress('LN') + number
gps_line = pt_id + separator + latitude + separator + longitude
break_line = (Suppress('BREAK_L,')
+ pt_id
+ separator
+ latitude
+ separator
+ longitude)
result1 = gps_line.scanString(survey)
result2 = break_line.scanString(survey)
for item in result1:
    print item
With the example above, I would like to find a way to get output
that combines each gps_line with its break_line, something like this in pseudocode:
for every gps_line in result1:
    print gps_line + preceding break_line
If my question is unclear or does not fit the description, feel free to edit it.
EDIT #2
What I am trying to achieve is this output:
['1', 54.16469813, 17.15054629, 22.222, 'BP1000', 55.16469813, 18.15054629]
['6', 54.16506873, 17.15115798, 33.333, 'BP2000', 55.16507249, 18.15125566]
['7', 54.16507249, 17.15125566, 44.444, 'BP2000', 55.16507249, 18.15125566]
['10', 54.16507522, 17.15198405, 55.555, 'BP3000', 55.16507249, 18.15125566]
['11', 54.16506566, 17.1513922, 44.44, 'BP3000', 55.16507249, 18.15125566]
['12', 54.16517275, 17.15100652, 11.111, 'BP3000', 55.16507249, 18.15125566]
Second attempt:
from decimal import Decimal
from operator import itemgetter
survey ='''
BREAK_L,PN1000,LA55.16469813,LN18.15054629
PN1,LA54.16469813,LN17.15054629,EL22.222
BREAK_L,PN2000,LA55.16507249,LN18.15125566
PN6,LA54.16506873,LN17.15115798,EL33.333
PN7,LA54.16507249,LN17.15125566,EL44.444
BREAK_L,PN3000,LA55.16507249,LN18.15125566
PN10,LA54.16507522,LN17.15198405,EL55.555
PN11,LA54.16506566,LN17.15139220,EL44.44
PN12,LA54.16517275,LN17.15100652,EL11.111
'''
def parse_line(line):
    brk = False
    kv = {}
    for part in line.split(','):
        if part == 'BREAK_L':
            brk = True
        else:
            k = part[:2]
            v = part[2:]
            kv[k] = v
    return (brk,kv)
def parse_survey(survey):
    ig1 = itemgetter('PN','LA','LN','EL')
    ig2 = itemgetter('PN','LA','LN')
    brk_data = None
    for line in survey.strip().splitlines():
        brk, data = parse_line(line)
        if brk:
            brk_data = data
            continue
        else:
            yield ig1(data) + ig2(brk_data)
for r in parse_survey(survey):
    print(r)
Yields:
('1', '54.16469813', '17.15054629', '22.222', '1000', '55.16469813', '18.15054629')
('6', '54.16506873', '17.15115798', '33.333', '2000', '55.16507249', '18.15125566')
('7', '54.16507249', '17.15125566', '44.444', '2000', '55.16507249', '18.15125566')
('10', '54.16507522', '17.15198405', '55.555', '3000', '55.16507249', '18.15125566')
('11', '54.16506566', '17.15139220', '44.44', '3000', '55.16507249', '18.15125566')
('12', '54.16517275', '17.15100652', '11.111', '3000', '55.16507249', '18.15125566')
This is really not much different to my previous attempt. I'd already paired the data for you. I assume you'll be able to change 1000 into BP1000 yourself.
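For completeness, here is one way the remaining conversion might look, as a sketch rather than the only option: coordinates become floats and the break-line ID gets the BP prefix, matching the layout requested under EDIT #2. The function name pair_with_breaks is made up, the survey string is shortened, and the code assumes every line uses two-letter field prefixes as in the sample.

```python
def pair_with_breaks(survey):
    """Pair each GPS line with the most recent BREAK_L line before it."""
    rows = []
    current_break = None
    for line in survey.strip().splitlines():
        parts = line.split(',')
        is_break = parts[0] == 'BREAK_L'
        if is_break:
            parts = parts[1:]
        # 'LA54.16' -> {'LA': '54.16'}: two-letter key, rest is the value.
        fields = {part[:2]: part[2:] for part in parts}
        if is_break:
            current_break = ['BP' + fields['PN'],
                             float(fields['LA']), float(fields['LN'])]
        elif current_break is not None:
            rows.append([fields['PN'], float(fields['LA']),
                         float(fields['LN']), float(fields['EL'])]
                        + current_break)
    return rows


# Shortened version of the survey data from the question.
survey = '''
BREAK_L,PN1000,LA55.16469813,LN18.15054629
PN1,LA54.16469813,LN17.15054629,EL22.222
BREAK_L,PN2000,LA55.16507249,LN18.15125566
PN6,LA54.16506873,LN17.15115798,EL33.333
PN7,LA54.16507249,LN17.15125566,EL44.444
'''

paired = pair_with_breaks(survey)
for row in paired:
    print(row)
```

GPS lines that appear before any BREAK_L line are skipped here; whether that is the right behavior depends on the data, so treat it as an assumption.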