So I have a generated text file that I'd like to parse into a couple of lists of dates. I had it figured out when there was one date per 'group', but I realized I may have to deal with multiple date values per group.
My .txt file looks like this:
DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27
Ideally I would be able to parse this out into 5 lists that contain the dates for each group. I am stumped.
Just loop over each line, check for the key that marks a new group, strip newlines, and store each date.
DATE_GROUP_SEPARATOR = 'DateGroup'

sorted_data = {}
with open('test.txt') as file:
    last_group = None
    for line in file.readlines():
        line = line.replace('\n', '')
        if DATE_GROUP_SEPARATOR in line:
            sorted_data[line] = []
            last_group = line
        else:
            sorted_data[last_group].append(line)

for date_group, dates in sorted_data.items():
    print(f"{date_group}: {dates}")
Here is an example that you could build off of. Every time it reads a string rather than a number, it makes a new list and puts all the dates for that group in it.
# read file
with open("test.txt") as f:
    lineList = f.readlines()

# make a new list to hold the groups
lists = []

# loop through and check for numbers and strings
y = -1
for x in range(len(lineList)):
    # check if the line starts with a digit
    if not lineList[x][0].isdigit():
        # if it is a group name, start a new list with the name first
        lists.append([lineList[x]])
        y += 1
    else:
        # if it is a date, append it to the current list
        lists[y].append(lineList[x])

# print the lists
for x in lists:
    print(x)
Start by reading in your whole text file. Then you can count the number of occurrences of "DateGroup", which seems to be the constant part of your date group separators. You can then parse your file by going through all the data that is between any two "DateGroup" identifiers, or between one "DateGroup" identifier and the end of the file. Try to understand the following piece of code and build your application on top of it:
with open("dates.txt") as file:
    text = file.read()

amountGroups = text.count("DateGroup")

groups = []
index = 0
for i in range(amountGroups):
    groups.append([])
    index = text.find("DateGroup", index)
    index = text.find("\n", index) + 1
    indexEnd = text.find("DateGroup", index)
    if indexEnd == -1:
        indexEnd = len(text)
    while index < indexEnd:
        indexNewline = text.find("\n", index)
        if indexNewline == -1:  # last line without a trailing newline
            indexNewline = len(text)
        groups[i].append(text[index:indexNewline])
        index = indexNewline + 1

print(groups)
This first section just shows how to treat a string of data as if it came from a file. That helps if you don't want to generate the actual file from the OP but want to keep the data visible in the editor.
from io import StringIO  # allows treating some lines in the editor as if they came from a file
dat = StringIO("""DateGroup1
dat=StringIO("""DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27""")
lines = [l.strip() for l in dat.readlines()]
print(lines)
output:
['DateGroup1', '20191129', '20191127', '20191126', 'DateGroup2', '20191129', '20191127', '20191126', 'DateGroup3', '2019-12-02', 'DateGroup4', '2019-11-27', 'DateGroup5', '2019-11-27']
Now one possible way to generate your desired list of lists, while ensuring that both possible date formats are covered:
from datetime import datetime

a = []  # dates of the current group
b = []  # list of lists
for i, line in enumerate(lines):
    try:  # try the first date format
        do = datetime.strptime(line, '%Y%m%d')
        a.append(datetime.strftime(do, '%Y-%m-%d'))
    except ValueError:
        try:  # try the second date format
            do = datetime.strptime(line, '%Y-%m-%d')
            a.append(datetime.strftime(do, '%Y-%m-%d'))
        except ValueError:  # neither format, so a new group starts
            if a:  # append the finished group to the list of lists
                b.append(a)
            a = []
    if i == len(lines) - 1:  # after the last line, append the last group
        b.append(a)
b
output:
[['2019-11-29', '2019-11-27', '2019-11-26'],
 ['2019-11-29', '2019-11-27', '2019-11-26'],
 ['2019-12-02'],
 ['2019-11-27'],
 ['2019-11-27']]
TTP can help to parse this text as well; here is a sample template together with the code to run it:
from ttp import ttp
data_to_parse = """
DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27
"""
ttp_template = """
<group name="date_groups.date_group{{ id }}">
DateGroup{{ id }}
{{ dates | to_list | joinmatches() }}
</group>
"""
parser = ttp(data=data_to_parse, template=ttp_template)
parser.parse()
print(parser.result(format="json")[0])
The above code produces this output:
[
    {
        "date_groups": {
            "date_group1": {
                "dates": [
                    "20191129",
                    "20191127",
                    "20191126"
                ]
            },
            "date_group2": {
                "dates": [
                    "20191129",
                    "20191127",
                    "20191126"
                ]
            },
            "date_group3": {
                "dates": [
                    "2019-12-02"
                ]
            },
            "date_group4": {
                "dates": [
                    "2019-11-27"
                ]
            },
            "date_group5": {
                "dates": [
                    "2019-11-27"
                ]
            }
        }
    }
]
This is my attempt to parse that text data. I deliberately chose parsec.py, a Haskell Parsec-like parser combinator library, because it works more transparently than regular expressions, so it is easier to debug and test.
A second reason is the much greater flexibility in the output data format.
import re
from parsec import *

spaces = regex(r'\s*', re.MULTILINE)

@generate
def getHeader():
    s1 = yield string("DateGroup")
    s2 = ''.join((yield many1(digit())))
    return (s1 + s2)

@generate
def getDataLine():
    s1 = yield digit()
    s2 = ''.join((yield many1(none_of("\r\n"))))
    yield spaces
    return (s1 + s2)

@generate
def getChunk():
    yield spaces
    header = yield getHeader
    yield spaces
    dataList = yield many1(getDataLine)
    return (header, dataList)

@generate
def getData():
    yield spaces
    parsedData = yield many1(getChunk)
    yield eof()
    return parsedData
inputText = """DateGroup1
20191129
20191127
20191126
DateGroup2
20191129
20191127
20191126
DateGroup3
2019-12-02
DateGroup4
2019-11-27
DateGroup5
2019-11-27"""
result = getData.parse(inputText)
for p in result:
    print(p)
Output:
('DateGroup1', ['20191129', '20191127', '20191126'])
('DateGroup2', ['20191129', '20191127', '20191126'])
('DateGroup3', ['2019-12-02'])
('DateGroup4', ['2019-11-27'])
('DateGroup5', ['2019-11-27'])
I am processing a text file, reading it line by line, splitting each line, and inserting the fields into a database.
Each line goes like:
3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018 12:00:00 AM::11/19
The problem is that it also splits the date-time, and as a result it populates the wrong information in the database, as in the image below.
My code goes like:
with open(filename, encoding="utf-8") as f:
    counter = 0
    for line in f:
        data = line.split(':')
        id = str(counter)
        Phonenumber = data[0].strip()
        profileID = data[1].strip()
        firstname = data[2].strip()
        secondname = data[3].strip()
        gender = data[4].strip()
        LocationWhereLive = data[5].strip()
        LocationWhereFrom = data[6].strip()
        RelationshipStatus = data[7].strip()
        whereWork = data[8].strip()
        AccountCreationDate = data[9].strip()
        Email = data[10].strip()
        Birthdate = data[11].strip()
        mycursor = mydb.cursor()
        sql = mycursor.execute("insert into dataleads values ('"+id+"','"+Phonenumber+"','"+profileID+"','"+firstname+"','"+secondname+"','"+gender+"','"+LocationWhereLive+"','"+LocationWhereFrom+"','"+RelationshipStatus+"','"+whereWork+"','"+AccountCreationDate+"','"+Email+"','"+Birthdate+"')")
        mycursor.execute(sql)
        mydb.commit()
        counter += 1
As an alternative to splitting by spaces, you can also leverage the maxsplit argument of the split and rsplit methods:
def make_list(s):
    before = s.split(":", maxsplit=9)           # splits up to the date
    after = before[-1].rsplit(":", maxsplit=2)  # splits the last part up to the date (from the right)
    return [*before[:-1], *after]               # creates a list with both parts

s = "3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018 12:00:00 AM::11/19"
make_list(s)
Out:
['3530000000000',
'100000431506294',
'Jean',
'Camargo',
'male',
'',
'',
'',
'Kefron',
'6/4/2018 12:00:00 AM',
'',
'11/19']
As mentioned in the comments, you can split on whitespace:
s = "3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018 12:00:00 AM::11/19"
split_s = s.split() # default split is any whitespace character
print(split_s[0]) # will print "3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018"
print(split_s[1]) # will print "12:00:00"
print(split_s[2]) # will print "AM::11/19"
To deal with the original file, you can split it in a loop using knowledge of the number of fields, rather than relying on how many separator characters there are:
collection = []
_line = line  # keep a backup of the line to compare and count blocks
for field_index in range(12):
    if field_index < 8:  # get the first 8 fields (or some set)
        prefix, _line = _line.split(":", 1)  # only split once!
        collection.append(prefix)
        continue
    if field_index == 9:  # match date field in _line via regex
        if _line.startswith("::"):  # test if the field was omitted
            _line = _line[1:]  # truncate the first character
            continue
        r"^\d+/..."  # TODO regex for field
        continue
    ...
This can be tuned or adapted to handle any field which can be
absent
also contain the separator characters in it
However, if you can instead take a moment to educate the author of this file about why it is problematic (and do so nicely), they may rewrite the file to be better for you, or provide you with the input files that it was generated from.
Specifically, the tool could either
use a separator unavailable in the resulting data (such as | or ##SEPARATOR##)
escape the fields or swap their separators to another character before writing (.replace(":", "-"))
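A sketch of that writer-side idea (the field values here are made up for illustration; the real export tool's code isn't shown in the thread):

```python
# Hypothetical fields as the writing tool would see them before joining
fields = ["Kefron", "6/4/2018 12:00:00 AM", "", "11/19"]

# Option 1: join with a separator that cannot occur in the data
line_safe = "|".join(fields)
print(line_safe)      # Kefron|6/4/2018 12:00:00 AM||11/19

# Option 2: keep ":" but neutralise it inside each field before writing
line_escaped = ":".join(f.replace(":", "-") for f in fields)
print(line_escaped)   # Kefron:6/4/2018 12-00-00 AM::11/19
```

Either way, the reader can then split on the separator with no ambiguity.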
An alternative solution is to match the field in the line first and transform it, allowing you to deal with the field on its own (perhaps transforming it back later via a regex or .replace()):
line = re.sub(r"(\d\d?):(\d\d):(\d\d) (AM|PM)", r"\1-\2-\3-\4", line)
# now split out line on :
>>> line = "3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018 12:00:00 AM::11/19"
>>> re.sub(r"(\d\d?):(\d\d):(\d\d) (AM|PM)", r"\1-\2-\3-\4", line).split(":")
['3530000000000', '100000431506294', 'Jean', 'Camargo', 'male', '', '', '', 'Kefron', '6/4/2018 12-00-00-AM', '', '11/19']
The structure is the same; you only join the split date parts back together again:
counter = 0
line = "3530000000000:100000431506294:Jean:Camargo:male::::Kefron:6/4/2018 12:00:00 AM::11/19"
data = line.split(':')
id = str(counter)
Phonenumber = data[0].strip()
profileID = data[1].strip()
firstname = data[2].strip()
secondname = data[3].strip()
gender = data[4].strip()
LocationWhereLive = data[5].strip()
LocationWhereFrom = data[6].strip()
RelationshipStatus = data[7].strip()
whereWork = data[8].strip()
AccountCreationDate = data[9].strip() + ':' + data[10].strip() + ":" + data[11].strip()
Email = data[12].strip()
Birthdate = data[13].strip()
I am trying to create a program which selects specific information from a bulk paste, extracts the relevant information, and then pastes said information into lines.
Here is some example data;
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA
I would like to have output where the relevant data for each track is grouped together on a single line, with semicolons joining identical information and " - " separating the rest.
LyrcistA - ComposerA - ArrangerA; ArrangerB
LyrcistA; LyrcistC - ComposerA - ArrangerA
I have not gotten very far despite my best efforts
while True:
    YodobashiData = input("")
    SplitData = YodobashiData.splitlines()
which returns the following:
['1. Track1 03:01']
['VOC:PersonA ']
['LYR:LyrcistA']
['COM:ComposerA']
['ARR:ArrangerA']
['ARR:ArrangerB']
[]
['2. Track2 04:18']
['VOC:PersonB']
['VOC:PersonC']
['LYR:LyrcistA']
['LYR:LyrcistC']
['COM:ComposerA']
['ARR:ArrangerA']
Whilst I now have all the data in separate lists, I have no idea how to identify and extract the information I need from the lists I do not.
Also, it seems I need to have the while loop or else it will only return the first list and nothing else.
Here's a script that doesn't use regular expressions.
It assumes that header lines, and only the header lines, will always start with a digit, and that the overall structure of header line then credit lines is consistent. Empty lines are ignored.
Extraction and formatting of the track data are handled separately, so it's easier to change formats, or use the extracted data in other ways.
import collections
import unicodedata
data_from_question = """\
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA
"""
def prepare_data(data):
    # The "colons" in the credit lines are actually
    # "full width colons". Replace them (and other such characters)
    # with their normal width equivalents.
    # If full normalisation is undesirable then we could return
    # data.replace('\N{FULLWIDTH COLON}', ':')
    return unicodedata.normalize('NFKC', data)

def is_new_track(line):
    return line[0].isdigit()

def parse_track_header(line):
    id_, title, duration = line.split()
    return {'id': id_.rstrip('.'), 'title': title, 'duration': duration}

def get_credit(line):
    credit, _, name = line.partition(':')
    return credit.strip(), name.strip()

def format_track_heading(track):
    return 'id: {id} title: {title} length: {duration}'.format(**track)

def format_credits(track):
    order = ['ARR', 'COM', 'LYR', 'VOC']
    parts = ['; '.join(track[k]) for k in order]
    return ' - '.join(parts)

def get_data():
    # The data is expected to be a multiline string.
    return data_from_question

def parse_data(data):
    track = None
    for line in filter(None, data.splitlines()):
        if is_new_track(line):
            if track:
                yield track
            track = collections.defaultdict(list)
            header_data = parse_track_header(line)
            track.update(header_data)
        else:
            role, name = get_credit(line)
            track[role].append(name)
    yield track

def report(tracks):
    for track in tracks:
        print(format_track_heading(track))
        print(format_credits(track))
        print()

def main():
    data = get_data()
    prepared_data = prepare_data(data)
    tracks = parse_data(prepared_data)
    report(tracks)

main()
Output:
id: 1 title: Track1 length: 03:01
ArrangerA; ArrangerB - ComposerA - LyrcistA - PersonA
id: 2 title: Track2 length: 04:18
ArrangerA - ComposerA - LyrcistA; LyrcistC - PersonB; PersonC
Here's another take on an answer to your question:
data = """
1. Track1 03:01
VOC:PersonA
LYR:LyrcistA
COM:ComposerA
ARR:ArrangerA
ARR:ArrangerB
2. Track2 04:18
VOC:PersonB
VOC:PersonC
LYR:LyrcistA
LYR:LyrcistC
COM:ComposerA
ARR:ArrangerA"""
import re
import collections
# Regular expression to pull apart the headline of each entry
headlinePattern = re.compile(r"(\d+)\.\s+(.*?)\s+(\d\d:\d\d)")
def main():
    # break the data into lines
    lines = data.strip().split("\n")
    # while we have more lines...
    while lines:
        # The next line should be a title line
        line = lines.pop(0)
        m = headlinePattern.match(line)
        if not m:
            raise Exception("Unexpected data format")
        id = m.group(1)
        title = m.group(2)
        length = m.group(3)
        people = collections.defaultdict(list)
        # Now read person lines until we hit a blank line or the end of the list
        while lines:
            line = lines.pop(0)
            if not line:
                break
            # Break the line into label and name
            label, name = re.split(r"\W+", line, 1)
            # Add this entry to a map of lists, where the map's keys are the labels
            # and the map's values are all the people who had that label
            people[label].append(name)
        # Now we have everything for one entry in the data. Print everything we got.
        print("id:", id, "title:", title, "length:", length)
        print(" - ".join(["; ".join(person) for person in people.values()]))
        # go on to the next entry...

main()
Result:
id: 1 title: Track1 length: 03:01
PersonA - LyrcistA - ComposerA - ArrangerA; ArrangerB
id: 2 title: Track2 length: 04:18
PersonB; PersonC - LyrcistA; LyrcistC - ComposerA - ArrangerA
You can just comment out the line that prints the headline info if you really just want the line with all of the people on it. Replace the built-in data with data = input("") if you want to read the data from a user prompt.
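One note on reading the data in: input() only ever returns a single line, which is why the question needed a loop. If the data is piped or pasted into the program, reading the whole stream at once is simpler. A minimal sketch using only the stdlib (the helper name read_block is made up here):

```python
import sys

def read_block(stream):
    # Read the whole stream once, then drop blank lines and stray whitespace
    return [line.strip() for line in stream.read().splitlines() if line.strip()]

# e.g. pipe or paste the track listing in, then:
# lines = read_block(sys.stdin)
```

The same helper works on any file-like object, including io.StringIO for testing.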
Assuming your data is in the format you specified in a file called tracks.txt, the following code should work:
import re

with open('tracks.txt') as fp:
    tracklines = fp.read().splitlines()

def split_tracks(lines):
    track = []
    all_tracks = []
    while True:
        try:
            if lines[0] != '':
                track.append(lines.pop(0))
            else:
                all_tracks.append(track)
                track = []
                lines.pop(0)
        except IndexError:  # ran out of lines
            all_tracks.append(track)
            return all_tracks

def gather_attrs(tracks):
    track_attrs = []
    for track in tracks:
        attrs = {}
        for line in track:
            match = re.match('([A-Z]{3}):', line)
            if match:
                attr = line[:3]
                val = line[4:].strip()
                try:
                    attrs[attr].append(val)
                except KeyError:
                    attrs[attr] = [val]
        track_attrs.append(attrs)
    return track_attrs

if __name__ == '__main__':
    tracks = split_tracks(tracklines)
    attrs = gather_attrs(tracks)
    for track in attrs:
        semicolons = map(lambda va: '; '.join(va), track.values())
        hyphens = ' - '.join(semicolons)
        print(hyphens)
The only thing you may have to change is the colon characters in your data: some of them are ASCII colons (:) and others are full-width Unicode colons (：), which will break the regex.
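If mixed colon characters are the concern, one way to sidestep the problem (a sketch using the stdlib unicodedata module, as an earlier answer also does) is to normalise the text before matching, since NFKC folds the full-width colon into an ASCII one:

```python
import unicodedata

line = "VOC\uff1aPersonA"                    # U+FF1A FULLWIDTH COLON
norm = unicodedata.normalize("NFKC", line)   # NFKC folds it to an ASCII ":"
print(norm.split(":"))                       # ['VOC', 'PersonA']
```

After normalisation, a plain ':' in the regex or in split() matches every line.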
import re

list_ = data_.split('\n')  # here data_ is your data
regObj = re.compile(rf'[A-Za-z]+(:|{chr(65306)})[A-Za-z]+')
l = []
pre = ''
for i in list_:
    if regObj.findall(i):
        if i[:3] != 'VOC':
            if pre == i[:3]:
                l.append('; ')
            else:
                l.append(' - ')
            l.append(i[4:].strip())
        else:
            l.append(' => ')
        pre = i[:3]

track_list = list(map(lambda item: item.strip(' - '), filter(lambda item: item, ''.join(l).split(' => '))))
print(track_list)
Output (the list of results you want):
['LyrcistA - ComposerA - ArrangerA; ArrangerB', 'LyrcistA; LyrcistC - ComposerA - ArrangerA']
I am trying to fill a table in Word with Python and DocxTemplate, and I have some issues doing it properly. I want to use 2 dictionaries to fill the data in 1 table, shown in the figure below.
Table to fill
The 2 dictionaries are filled in a loop, and I write the template document at the end.
The input document used to create my dictionaries is a DB extraction written in SQL.
My main issue is filling the table with the data from the 2 different dictionaries.
In the code below I give the 2 dictionaries with example values in them.
# -*- coding: utf8 -*-
from docxtpl import DocxTemplate

if __name__ == "__main__":
    document = DocxTemplate("template.docx")
    DicoOccuTable = {'`num_carnet_adresses`': '`annuaire_telephonique`\n`carnet_adresses`\n`carnet_adresses_complement`',
                     '`num_eleve`': '`CFA_apprentissage_ctrl_coherence`\n`CFA_apprentissage_ctrl_examen`'}
    DicoChamp = {'`num_carnet_adresses`': 72, '`num_eleve`': 66}
    template_values = {}

    template_values["keys"] = [[{"name": cle, "occu": val} for cle, val in DicoChamp.items()],
                               [{"table": vals} for cles, vals in DicoOccuTable.items()]]

    document.render(template_values)
    document.save('output/' + nomTable.replace('`', '') + '.docx')
As a result, the two rows of the table are created, but nothing is written within...
I would like to add that I have only been working with Python for 1 week, so I feel that I don't manage the different objects properly here.
If you have any suggestions to help me, I would appreciate them!
I put here the loop that creates the dictionaries; it may help you understand why I coded it wrong :)
for c in ChampList:
    with open("db_reference.sql", "r") as f:
        listTable = []
        line = f.readlines()
        for l in line:
            if 'CREATE TABLE' in l:
                begin = True
                linecreateTable = l
                x = linecreateTable.split()
                nomTable = x[2]
            elif c in l and begin == True:
                listTable.append(nomTable)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in l:
                begin = False
    nbreOccu = len(listTable)
    Tables = "\n".join(listTable)
    DicoChamp.update({c: nbreOccu})
    DicoOccuTable.update({c: Tables})
    # DicoChamp = {c:nbreOccu}

template_values = {}
Thank You very much !
Finally I found a solution to this problem. Here it is.
Instead of using 2 dictionaries, I created 1 dictionary with this structure:
Dico = { Champ : [Occu , Tables] }
The full code for creating the table is detailed below :
from docxtpl import DocxTemplate

document = DocxTemplate("template.docx")
template_values = {}
Context = {}

for c in ChampList:
    listTable = []
    nbreOccu = 0
    OccuTables = []
    with open("db_reference.sql", "r") as g:
        listTable = []
        ligne = g.readlines()
        for li in ligne:
            if 'CREATE TABLE' in li:
                begin = True
                linecreateTable2 = li
                y = linecreateTable2.split()
                nomTable2 = y[2]
            elif c in li and begin == True:
                listTable.append(nomTable2)
            elif ') ENGINE=MyISAM DEFAULT CHARSET=latin1;' in li:
                begin = False
            elif '/*!40101 SET COLLATION_CONNECTION=#OLD_COLLATION_CONNECTION */;' in li:
                nbreOccu = len(listTable)
                inter = "\n".join(listTable)
                OccuTables.append(nbreOccu)
                OccuTables.append(inter)
                ChampNumPropre = c.replace('`', '')
                Context.update({ChampNumPropre: OccuTables})
            else:
                continue

template_values["keys"] = [{"label": cle, "cols": val} for cle, val in Context.items()]

document.render(template_values)
document.save('output/' + nomTable.replace('`', '') + '.docx')
And I used a table with the following structure :
I hope you will find your answers here and good luck !
I'm trying to parse blocks of text in Python 2.7 using itertools.groupby.
The data has the following structure:
BEGIN IONS
TITLE=cmpd01_scan=23
RTINSECONDS=14.605
PEPMASS=694.299987792969 505975.375
CHARGE=2+
615.839727 1760.3752441406
628.788226 2857.6264648438
922.4323436 2458.0959472656
940.4432533 9105.5
END IONS
BEGIN IONS
TITLE=cmpd01_scan=24
RTINSECONDS=25.737
PEPMASS=694.299987792969 505975.375
CHARGE=2+
575.7636234 1891.1656494141
590.3553938 2133.4477539063
615.8339562 2433.4252929688
615.9032114 1784.0628662109
END IONS
I need to extract information from the lines beginning with "TITLE=", "PEPMASS=", and "CHARGE=".
The code I'm using is as follows:
import itertools
import re

data_file = 'Test.mgf'

def isa_group_separator(line):
    return line == 'END IONS\n'

regex_scan = re.compile(r'TITLE=')
regex_precmass = re.compile(r'PEPMASS=')
regex_charge = re.compile(r'CHARGE=')

with open(data_file) as f:
    for (key, group) in itertools.groupby(f, isa_group_separator):
        #print(key, list(group))
        if not key:
            precmass_match = filter(regex_precmass.search, group)
            print precmass_match
            scan_match = filter(regex_scan.search, group)
            print scan_match
            charge_match = filter(regex_charge.search, group)
            print charge_match
However, the output only picks up the "PEPMASS=" line; and if the 'scan_match' assignment is done before 'precmass_match', only the "TITLE=" line is printed:
> ['PEPMASS=694.299987792969 505975.375\n'] [] []
> ['PEPMASS=694.299987792969 505975.375\n'] [] []
can someone point out what I'm doing wrong here?
The reason for this is that group is an iterator, and it can only be consumed once.
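The same exhaustion is easy to demonstrate with any plain iterator; the second pass over it finds nothing:

```python
nums = iter([1, 2, 3])
first_pass = list(filter(lambda n: n > 1, nums))   # consumes the iterator
second_pass = list(filter(lambda n: n > 1, nums))  # nothing left to read
print(first_pass)   # [2, 3]
print(second_pass)  # []
```

This is exactly what happens to the groupby group: the first filter() drains it, so the later filters see an empty iterator.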
Please find the modified script that does the job.
import itertools
import re

data_file = 'Test.mgf'

def isa_group_separator(line):
    return line == 'END IONS\n'

regex_scan = re.compile(r'TITLE=')
regex_precmass = re.compile(r'PEPMASS=')
regex_charge = re.compile(r'CHARGE=')

with open(data_file) as f:
    for (key, group) in itertools.groupby(f, isa_group_separator):
        if not key:
            g = list(group)
            precmass_match = filter(regex_precmass.search, g)
            print precmass_match
            scan_match = filter(regex_scan.search, g)
            print scan_match
            charge_match = filter(regex_charge.search, g)
            print charge_match
I might try to parse it this way (without using groupby):
import re
file = """\
BEGIN IONS
TITLE=cmpd01_scan=23
RTINSECONDS=14.605
PEPMASS=694.299987792969 505975.375
CHARGE=2+
615.839727 1760.3752441406
628.788226 2857.6264648438
922.4323436 2458.0959472656
940.4432533 9105.5
END IONS
BEGIN IONS
TITLE=cmpd01_scan=24
RTINSECONDS=25.737
PEPMASS=694.299987792969 505975.375
CHARGE=2+
575.7636234 1891.1656494141
590.3553938 2133.4477539063
615.8339562 2433.4252929688
615.9032114 1784.0628662109
END IONS""".splitlines()
pat = re.compile(r'(TITLE|PEPMASS|CHARGE)=(.+)')

data = []
for line in file:
    m = pat.match(line)
    if m is not None:
        if m.group(1) == 'TITLE':
            data.append([])
        data[-1].append(m.group(2))

print(data)
Prints:
[['cmpd01_scan=23', '694.299987792969 505975.375', '2+'], ['cmpd01_scan=24', '694.299987792969 505975.375', '2+']]
How can I do the following in Python?
I have a command output that looks like this:
Datexxxx
Clientxxx
Timexxx
Datexxxx
Client2xxx
Timexxx
Datexxxx
Client3xxx
Timexxx
And I want to work this into a dict like:
Client:(date,time), Client2:(date,time) ...
After reading the data into a string subject, you could do this:
import re

d = {}
for match in re.finditer(
        r"""(?mx)
        ^Date(.*)\r?\n
        Client\d*(.*)\r?\n
        Time(.*)""",
        subject):
    d[match.group(2)] = (match.group(1), match.group(3))
How about something like:
rows = {}
thisrow = []
for line in output.split('\n'):
    if line[:4].lower() == 'date':
        thisrow.append(line)
    elif line[:6].lower() == 'client':
        thisrow.append(line)
    elif line[:4].lower() == 'time':
        thisrow.append(line)
    elif line.strip() == '':
        rows[thisrow[1]] = (thisrow[0], thisrow[2])
        thisrow = []
print rows
Assumes a trailing newline, no spaces before lines, etc.
What about using a dict with tuples?
Create a dictionary and add the entries:
d = {}
d['Client'] = ('date1','time1')
d['Client2'] = ('date2','time2')
Accessing the entries:
d['Client']
>>> ('date1','time1')
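If every record is exactly three lines (Date, Client, Time) with no blank lines in between, another sketch is to walk the lines in chunks of three; this assumes that fixed layout, which the question's sample suggests but doesn't guarantee:

```python
output = """Datexxxx
Clientxxx
Timexxx
Datexxxx
Client2xxx
Timexxx
Datexxxx
Client3xxx
Timexxx"""

records = {}
lines = [l for l in output.splitlines() if l.strip()]
# zip the same iterator with itself three times to walk the lines in threes
for date, client, time in zip(*[iter(lines)] * 3):
    records[client] = (date, time)

print(records)
```

Each triple of lines becomes one dict entry keyed by the client line, with the date and time lines as the tuple value.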