Using Regular Expressions to Parse Based on a Unique Character Sequence - Python

I'm hoping to get some Python assistance with parsing out a column in a DataFrame that has a unique character sequence.
Each record can have a variable number of parameter name/value pairings.
The only way to determine where each name/value pairing ends is to look for an equals sign and then find the most immediate preceding comma. This gets a little tricky because some of the values contain commas, so naively splitting on commas won't always yield clean results.
Example below:
Input strings:
NAME=TEST,TEST ID=1234,ENTRY DESCR=Verify,ENTRY CLASS=CCD,TRACE NO=124313523,12414,ENTRY DATE=210506
DESCRIPTION=TEST,TEST,TEST
End result (each name/value pairing split into its own string, String1 through String6):
String1: NAME=TEST
String2: TEST ID=1234
String3: ENTRY DESCR=Verify
String4: ENTRY CLASS=CCD
String5: TRACE NO=124313523,12414
String6: ENTRY DATE=210506
and for the second input string:
String1: DESCRIPTION=TEST,TEST,TEST
Thanks in advance for your help!

This can certainly be done with regexes, but for a quick and dirty parser I would do it manually.
First a test suite:
import pytest

@pytest.mark.parametrize(
    "encoded,parsed",
    [
        ("X=Y", {"X": "Y"}),
        ("DESCRIPTION=TEST,TEST,TEST", {"DESCRIPTION": "TEST,TEST,TEST"}),
        (
            "NAME=TEST,TEST ID=1234,ENTRY DESCR=Verify,ENTRY CLASS=CCD,TRACE NO=124313523,12414,ENTRY DATE=210506",
            {
                "NAME": "TEST",
                "TEST ID": "1234",
                "ENTRY DESCR": "Verify",
                "ENTRY CLASS": "CCD",
                "TRACE NO": "124313523,12414",
                "ENTRY DATE": "210506",
            },
        ),
    ],
)
def test_parser(encoded, parsed):
    assert parser(encoded) == parsed
You'll need to pip install pytest if you don't already have it.
Then a parser:
def parser(encoded: str) -> dict[str, str]:
    parsed = {}
    val = []
    # Walk the "="-separated chunks from right to left; each middle chunk is
    # "<rest of previous value>,<next key>".
    for token in reversed(encoded.split("=")):
        if val:
            *vals, token = token.split(",")
            parsed[token] = ",".join(val)
            val = vals
        else:
            val = token.split(",")
    return parsed
This is not a 'proper' parser (i.e. the traditional tokenise/lex/parse pipeline), but it handles this format. It works as follows:
- step backwards through all the something=val pairs
- split the val at commas (this is strictly pointless here, but see below)
- split something at the last comma (using a * expression to collect all the other components)
- add a new entry to the parsed dict, joining the val back up again with commas
Note that this would work just as well with val = [token]. But you probably don't want a parser which returns a format which in turn needs parsing. You probably want it to turn ,-separated values into a list of appropriate types. Currently you have three types: strs, ints and a date. Thus ",".join(val) could profitably be replaced with [convert(x) for x in val]. convert might look something like this:
from datetime import date, datetime
from typing import Union

def convert(x: str) -> Union[date, int, str]:
    # Try the most specific interpretation first, falling back to str.
    for candidate in (
        lambda x: datetime.strptime(x, "%y%m%d").date(),
        lambda x: int(x),
        lambda x: x,
    ):
        try:
            return candidate(x)
        except ValueError:
            pass
This would then be used by doing something like this in the parser:
converted = [convert(x) for x in val]
if len(converted) == 1:
    converted = converted[0]
parsed[token] = converted
However, this conversion function has a problem: it falsely identifies some plain numbers as dates. How exactly to fix this depends on the input data. Perhaps the date-parsing function can stay context-agnostic and just check for a six-digit input before parsing (or manually split the str and pass the parts to datetime.date). Perhaps the decision needs to be made in the parser, based on whether the word "DATE" is in the key.
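For example, a minimal sketch of the key-aware variant, assuming dates only ever appear under keys containing the word "DATE" and are always six digits in %y%m%d form:
from datetime import date, datetime
from typing import Union

def convert(key: str, x: str) -> Union[date, int, str]:
    # Assumption: only "DATE"-ish keys hold dates, always six digits.
    if "DATE" in key and len(x) == 6:
        try:
            return datetime.strptime(x, "%y%m%d").date()
        except ValueError:
            pass
    try:
        return int(x)
    except ValueError:
        return x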
If you really want to use regexs, have a look at negative lookaheads.

You can do this:
def parse(s: str) -> dict:
    # Split by "=" and then by ","
    raw = [x.split(",") for x in s.split("=")]
    # Keys are the last element of each row, besides the last row
    keys = [k[-1] for k in raw[:-1]]
    # Values are all the elements before the last, shifted by one row
    values = [",".join(k[:-1]) for k in raw[1:-1]] + [",".join(raw[-1])]
    return dict(zip(keys, values))
If we try:
s1 = "NAME=TEST,TEST ID=1234,ENTRY DESCR=Verify,ENTRY CLASS=CCD,TRACE NO=124313523,12414,ENTRY DATE=210506"
s2 = "DESCRIPTION=TEST,TEST,TEST"
print(parse(s1))
print(parse(s2))
We get:
{'NAME': 'TEST', 'TEST ID': '1234', 'ENTRY DESCR': 'Verify', 'ENTRY CLASS': 'CCD', 'TRACE NO': '124313523,12414', 'ENTRY DATE': '210506'}
{'DESCRIPTION': 'TEST,TEST,TEST'}

Thanks for the suggestions, everyone! I wound up figuring out a way using regex and did it in a couple of lines of code.
import re

s1 = "NAME=TEST,TEST ID=1234,ENTRY DESCR=Verify,ENTRY CLASS=CCD,TRACE NO=124313523,12414,ENTRY DATE=210506"
regex = re.compile(',(?=[^,]+=)')
regex.split(s1)
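The lookahead splits only on a comma that is followed by a run of non-comma characters and an equals sign, i.e. a comma that introduces the next key, so commas inside values are left alone. For s1 this yields:
['NAME=TEST', 'TEST ID=1234', 'ENTRY DESCR=Verify', 'ENTRY CLASS=CCD', 'TRACE NO=124313523,12414', 'ENTRY DATE=210506']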

Related

Fastest way to tokenize signal?

I need to find the fastest way to tokenize a signal. The signal is of the form:
identifier:value identifier:value identifier:value ...
identifier only consists of alphanumerics and underscores, and is separated from the previous value by a space. A value may contain alphanumerics, various braces/brackets and spaces.
e.g.
signal_id:debug_word12_ind data:{ } virtual_interface_index:0x0000 module_id:0x0001 module_sub_id:0x0016 timestamp:0xcc557366 debug_words:[0x0006 0x0006 0x0000 0x0000 0x0000 0x0000 0xcc55 0x70a9 0x4c55 0x7364 0x0000 0x0000] sequence_number:0x0174
The best I've come up with is below. Ideally I'd like to halve the time it takes. I've tried various things with regexes but they're no better. Any suggestions?
# Convert data to dictionary. Expect data to be something like
# parameter_1:a b c d parameter_2:false parameter_3:0xabcd parameter_4:-56
# Split at colons. The first part will be just a parameter name, the last just a value;
# everything in between will be <value><space><next parameter name>.
parts1 = data.split(":")
parts2 = []
for part in parts1:
    # Copy first and last 'as is'
    if part in (parts1[0], parts1[-1]):
        parts2.append(part)
    # Split everything in between at the last space (don't expect parameter names to contain spaces)
    else:
        parts2.extend(part.rsplit(' ', 1))
# Expect to now have [parameter name, value, parameter name, value, ...]. Convert to a dict
self.data_dict = {}
for i in range(0, len(parts2), 2):
    self.data_dict[parts2[i]] = parts2[i + 1]
I have optimized your solution a little:
1. Removed the check from the loop.
2. Changed the dictionary creation code: pairs are built from a single list.
parts1 = data.split(":")
parts2 = []
parts2.append(parts1.pop(0))
for part in parts1[0:-1]:
    parts2.extend(part.rsplit(' ', 1))
parts2.append(parts1.pop())
data_dict = {k: v for k, v in zip(parts2[::2], parts2[1::2])}
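For what it's worth, a single-pass regex that exploits the same structure (a value runs until the next " identifier:" or the end of the string) may also be worth benchmarking, though the OP reports regex attempts were no faster. A sketch, with made-up names:
import re

# One pass: "name:" then a lazy value that stops just before the next " name:".
PAIR_RE = re.compile(r'(\w+):(.*?)(?=\s\w+:|$)', re.DOTALL)

def tokenize_signal(data):
    return {m.group(1): m.group(2) for m in PAIR_RE.finditer(data)}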

Update last character of string with a value

I have two strings:
input = "12.34.45.362"
output = "2"
I want to be able to replace the 362 in input by 2 from output.
Thus the final result should be 12.34.45.2. I am unsure on how to do it. Any help is appreciated.
You can use a simple regex for this:
import re
input_ = "12.34.45.362"
output = "2"
input_ = re.sub(r"\.\d+$", f".{output}", input_)
print(input_)
Output:
12.34.45.2
Notice that I also changed input to input_, so we're not shadowing the built-in input() function.
You can also use a simpler, but slightly less robust, pattern which doesn't take the period into account at all and just replaces the digits at the end of the string:
import re
input_ = "12.34.45.362"
output = "2"
input_ = re.sub(r"\d+$", output, input_)
print(input_)
Output:
12.34.45.2
Just in case you want to do this for any string of form X.Y.Z.W where X, Y, Z, and W may be of non-constant length:
new_result = ".".join(your_input.split(".")[:-1]) + "." + output
s.join will join a collection into a single string, using the string s between each element. s.split will turn a string into a list, splitting at each occurrence of the given character ".". Slicing the list (l[:-1]) gives you all but the last element, and finally string concatenation (if you are sure output is a str) gives you your result.
Breaking it down step-by-step:
your_input = "12.34.45.362"
your_input.split(".") # == ["12", "34", "45", "362"]
your_input.split(".")[:-1] # == ["12", "34", "45"]
".".join(your_input.split(".")[:-1]) # == "12.34.45"
".".join(your_input.split(".")[:-1]) + "." + output # == "12.34.45.2"
If you are just trying to split at the last ".", do a right split, take everything before it, and use string formatting:
i = "12.34.45.362"
r = "{}.2".format(i.rsplit(".", 1)[0])
Output:
'12.34.45.2'

Regex Python find everything between four characters

I have a string that holds data. And I want everything in between ({ and })
"({Simple Data})"
Should return "Simple Data"
Or, with a regex:
import re

s = '({Simple Data})'
print(re.search(r'\({([^})]+)', s).group(1))
Output:
Simple Data
You could try the following:
^\({(.*)}\)$
Group 1 will contain Simple Data.
See an example on regexr.
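In Python, that might look like this (a quick sketch):
import re

m = re.match(r"^\({(.*)}\)$", "({Simple Data})")
if m:
    print(m.group(1))  # Simple Data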
If the brackets are always positioned at the beginning and the end of the string, then you can do this:
l = "({Simple Data})"
print(l[2:-2])
Which results in:
"Simple Data"
In Python you can access single characters via the [] operator. With slicing you can access the sequence of characters starting at the third one (index 2) up to, but not including, the second-to-last (index -2).
You could try this regex (?s)\(\{(.*?)\}\)
which simply captures the contents between the delimiters.
Beware though, this doesn't account for nesting.
If nesting is a concern, the best you can do with the standard Python re engine
is to get the innermost nest only, using a tempered pattern like this:
\(\{((?:(?!\(\{|\}\)).)*)\}\)
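A quick sketch of that pattern in action (the sample string here is made up):
import re

sample = "({ outer ({inner (x)}) more })"
pat = re.compile(r"\(\{((?:(?!\(\{|\}\)).)*)\}\)", re.DOTALL)
print(pat.findall(sample))  # ['inner (x)']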
Here is a tokenizer designed with nested data in mind, which the OP may want to look at:
import collections
import re

Token = collections.namedtuple('Token', ['typ', 'value', 'line', 'column'])

def tokenize(code):
    token_specification = [
        ('DATA', r'[ \t]*[\w]+[\w \t]*'),
        ('SKIP', r'[ \t\f\v]+'),
        ('NEWLINE', r'\n|\r\n'),
        ('BOUND_L', r'\(\{'),
        ('BOUND_R', r'\}\)'),
        ('MISMATCH', r'.'),
    ]
    tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification)
    line_num = 1
    line_start = 0
    for mo in re.finditer(tok_regex, code):
        kind = mo.lastgroup
        value = mo.group(kind)
        if kind == 'NEWLINE':
            line_start = mo.end()
            line_num += 1
        elif kind == 'SKIP':
            pass
        else:
            column = mo.start() - line_start
            yield Token(kind, value, line_num, column)

statements = '''
({Simple Data})
({
Parent Data Prefix
({Nested Data (Yes)})
Parent Data Suffix
})
'''

queue = collections.deque()
for token in tokenize(statements):
    if token.typ == 'DATA' or token.typ == 'MISMATCH':
        queue.append(token.value)
    elif token.typ == 'BOUND_L' or token.typ == 'BOUND_R':
        print(''.join(queue))
        queue.clear()
Output of this code should be:
Simple Data
Parent Data Prefix
Nested Data (Yes)
Parent Data Suffix

Pyparsing: Parsing semi-JSON nested plaintext data to a list

I have a bunch of nested data in a format that loosely resembles JSON:
company="My Company"
phone="555-5555"
people=
{
    person=
    {
        name="Bob"
        location="Seattle"
        settings=
        {
            size=1
            color="red"
        }
    }
    person=
    {
        name="Joe"
        location="Seattle"
        settings=
        {
            size=2
            color="blue"
        }
    }
}
places=
{
    ...
}
There are many different parameters with varying levels of depth--this is just a very small subset.
It also might be worth noting that when a new sub-array is created, there is always an equals sign, followed by a line break, followed by the opening bracket (as seen above).
Is there any simple looping or recursion technique for converting this data to a system-friendly data format such as arrays or JSON? I want to avoid hard-coding the names of properties. I am looking for something that will work in Python, Java, or PHP. Pseudo-code is fine, too.
I appreciate any help.
EDIT: I discovered the Pyparsing library for Python and it looks like it could be a big help. I can't find any examples for how to use Pyparsing to parse nested structures of unknown depth. Can anyone shed light on Pyparsing in terms of the data I described above?
EDIT 2: Okay, here is a working solution in Pyparsing:
from pyparsing import (Dict, Forward, Group, OneOrMore, Regex, Suppress,
                       Word, ZeroOrMore, alphanums, alphas, quotedString,
                       removeQuotes)

def parse_file(fileName):
    #get the input text file
    file = open(fileName, "r")
    inputText = file.read()
    #define the elements of our data pattern
    name = Word(alphas, alphanums+"_")
    EQ, LBRACE, RBRACE = map(Suppress, "={}")
    value = Forward()  #this tells pyparsing that values can be recursive
    entry = Group(name + EQ + value)  #this is the basic name-value pair
    #define data types that might be in the values
    real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
    integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
    quotedString.setParseAction(removeQuotes)
    #declare the overall structure of a nested data element
    struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)  #we will turn the output into a Dictionary
    #declare the types that might be contained in our data value - string, real, int, or the struct we declared
    value << (quotedString | struct | real | integer)
    #parse our input text and return it as a Dictionary
    result = Dict(OneOrMore(entry)).parseString(inputText)
    return result.dump()
This works, but when I try to write the results to a file with json.dump(result), the contents of the file are wrapped in double quotes. Also, there are \n characters between many of the data pairs. I tried suppressing them in the code above with LineEnd().suppress(), but I must not be using it correctly.
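A likely cause, guessing from the code above: parse_file returns result.dump(), which is a formatted display string, so json.dump() ends up serializing that one big string, quotes and newlines included. Serializing an actual dict sidesteps this:
import json

print(json.dumps(result.asDict()))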
Parsing an arbitrarily nested structure can be done with pyparsing by defining a placeholder to hold the nested part, using the Forward class. In this case, you are just parsing simple name-value pairs, where the value could itself be a nested structure containing name-value pairs.
name :: word of alphanumeric characters
entry :: name '=' value
struct :: '{' entry* '}'
value :: real | integer | quotedstring | struct
This translates to pyparsing almost verbatim. To define value, which can recursively contain values, we first create a Forward() placeholder, which can be used as part of the definition of entry. Then once we have defined all the possible types of values, we use the '<<' operator to insert this definition into the value expression:
EQ,LBRACE,RBRACE = map(Suppress,"={}")
name = Word(alphas, alphanums+"_")
value = Forward()
entry = Group(name + EQ + value)
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
quotedString.setParseAction(removeQuotes)
struct = Group(LBRACE + ZeroOrMore(entry) + RBRACE)
value << (quotedString | struct | real | integer)
The parse actions on real and integer will convert these elements from strings to float or ints at parse time, so that the values can be used as their actual types immediately after parsing (no need to post-process to do string-to-other-type conversion).
Your sample is a collection of one or more entries, so we use that to parse the total input:
result = OneOrMore(entry).parseString(sample)
We can access the parsed data as a nested list, but it is not so pretty to display. This code uses pprint to pretty-print a formatted nested list:
from pprint import pprint
pprint(result.asList())
Giving:
[['company', 'My Company'],
['phone', '555-5555'],
['people',
[['person',
[['name', 'Bob'],
['location', 'Seattle'],
['settings', [['size', 1], ['color', 'red']]]]],
['person',
[['name', 'Joe'],
['location', 'Seattle'],
['settings', [['size', 2], ['color', 'blue']]]]]]]]
Notice that all the strings are just strings with no enclosing quotation marks, and the ints are actual ints.
We can do just a little better than this, by recognizing that the entry format actually defines a name-value pair suitable for accessing like a Python dict. Our parser can do this with just a few minor changes:
Change the struct definition to:
struct = Dict(LBRACE + ZeroOrMore(entry) + RBRACE)
and the overall parser to:
result = Dict(OneOrMore(entry)).parseString(sample)
The Dict class treats the parsed contents as a name followed by a value, which can be done recursively. With these changes, we can now access the data in result like elements in a dict:
print result['phone']
or like attributes in an object:
print result.company
Use the dump() method to view the contents of a structure or substructure:
for person in result.people:
print person.dump()
print
prints:
['person', ['name', 'Bob'], ['location', 'Seattle'], ['settings', ['size', 1], ['color', 'red']]]
- location: Seattle
- name: Bob
- settings: [['size', 1], ['color', 'red']]
- color: red
- size: 1
['person', ['name', 'Joe'], ['location', 'Seattle'], ['settings', ['size', 2], ['color', 'blue']]]
- location: Seattle
- name: Joe
- settings: [['size', 2], ['color', 'blue']]
- color: blue
- size: 2
There is no "simple" way, but there are harder and not-so-hard ways. If you don't want to hardcode things, then at some point you're going to have to parse it as a structured format. That would involve parsing each line one-by-one, tokenizing it appropriately (for example, separating the key from the value correctly), and then determining how you want to deal with the line.
You may need to store your data in an intermediary format such as a (parse) tree in order to account for the arbitrary nesting relationships (represented by indents and braces), and then after you have finished parsing the data, take your resulting tree and then go through it again to get your arrays or JSON.
There are libraries available, such as ANTLR, that handle some of the manual work of figuring out how to write the parser.
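For illustration, a minimal sketch of the line-by-line, brace-driven approach described above; the function name is made up, and duplicate keys (the two person= blocks) simply overwrite each other here:
def parse_nested(lines):
    # "key=" on its own line opens a block at the next "{"; "}" closes it;
    # "key=value" lines land in the current dict. Values stay strings.
    root = {}
    stack = [root]
    pending_key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "{":
            child = {}
            stack[-1][pending_key] = child  # NB: a repeated key overwrites
            stack.append(child)
        elif line == "}":
            stack.pop()
        elif line.endswith("="):
            pending_key = line[:-1]
        else:
            key, _, value = line.partition("=")
            stack[-1][key] = value.strip('"')
    return root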
Take a look at this code:
import re

still_not_valid_json = re.sub(r'(\w+)=', r'"\1":', pseudo_json)  #1
this_one_is_tricky = re.compile(r'("|\d)\n(?!\s+})', re.M)
that_one_is_tricky_too = re.compile(r'(})\n(?=\s+\")', re.M)
nearly_valid_json = this_one_is_tricky.sub(r'\1,\n', still_not_valid_json)  #2
nearly_valid_json = that_one_is_tricky_too.sub(r'\1,\n', nearly_valid_json)  #3
valid_json = '{' + nearly_valid_json + '}'  #4
You can convert your pseudo_json into parseable JSON via some substitutions:
1. Replace '=' with ':'
2. Add missing commas between a simple value (like "2" or "Joe") and the next field
3. Add missing commas between the closing brace of a complex value and the next field
4. Wrap the whole thing in braces
Still, there are issues. In your example the 'people' dictionary contains two identical keys, 'person', so after parsing only one key remains in the dictionary. This is what I've got after parsing:
{u'phone': u'555-5555', u'company': u'My Company', u'people': {u'person': {u'settings': {u'color': u'blue', u'size': 2}, u'name': u'Joe', u'location': u'Seattle'}}}
If only you could replace the second occurrence of 'person=' with 'person1=' and so on...
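A hedged sketch of that idea (the helper name is made up, and it numbers repeated keys globally rather than per scope, which may be too crude for deeply nested data):
import re
from collections import defaultdict

counts = defaultdict(int)

def number_key(m):
    # Leave the first occurrence alone; turn later ones into "person2=", etc.
    key = m.group(1)
    counts[key] += 1
    return m.group(0) if counts[key] == 1 else "{}{}=".format(key, counts[key])

numbered = re.sub(r'(\w+)=', number_key, pseudo_json)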
Replace the '=' with ':', then just read it in as JSON, adding in the trailing commas.
Okay, I came up with a final solution that actually transforms this data into a JSON-friendly dict as I originally wanted. It first uses Pyparsing to convert the data into a series of nested lists and then loops through the list and transforms it into JSON. This allows me to overcome the issue where Pyparsing's toDict() method was not able to handle the case where the same object has two properties of the same name. To determine whether a list is a plain list or a property/value pair, the prependPropertyToken method adds the string __property__ in front of property names when Pyparsing detects them.
import json
from pyparsing import (CaselessKeyword, Dict, Forward, Group, OneOrMore,
                       Regex, Suppress, Word, alphanums, dictOf,
                       quotedString, removeQuotes, replaceWith, restOfLine)

#(methods of the parser class; imports added for completeness)
def parse_file(self, fileName):
    #get the input text file
    file = open(fileName, "r")
    inputText = file.read()
    #define data types that might be in the values
    real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda x: float(x[0]))
    integer = Regex(r"[+-]?\d+").setParseAction(lambda x: int(x[0]))
    yes = CaselessKeyword("yes").setParseAction(replaceWith(True))
    no = CaselessKeyword("no").setParseAction(replaceWith(False))
    quotedString.setParseAction(removeQuotes)
    unquotedString = Word(alphanums+"_-?\"")
    comment = Suppress("#") + Suppress(restOfLine)
    EQ, LBRACE, RBRACE = map(Suppress, "={}")
    data = (real | integer | yes | no | quotedString | unquotedString)
    #define structures
    value = Forward()
    object = Forward()
    dataList = Group(OneOrMore(data))
    simpleArray = (LBRACE + dataList + RBRACE)
    propertyName = Word(alphanums+"_-.").setParseAction(self.prependPropertyToken)
    property = dictOf(propertyName + EQ, value)
    properties = Dict(property)
    object << (LBRACE + properties + RBRACE)
    value << (data | object | simpleArray)
    dataset = properties.ignore(comment)
    #parse it
    result = dataset.parseString(inputText)
    #turn it into a JSON-like object
    dict = self.convert_to_dict(result.asList())
    return json.dumps(dict)

def convert_to_dict(self, inputList):
    dict = {}
    for item in inputList:
        #determine the key and value to be inserted into the dict
        dictval = None
        key = None
        if isinstance(item, list):
            try:
                key = item[0].replace("__property__", "")
                if isinstance(item[1], list):
                    try:
                        if item[1][0].startswith("__property__"):
                            dictval = self.convert_to_dict(item)
                        else:
                            dictval = item[1]
                    except AttributeError:
                        dictval = item[1]
                else:
                    dictval = item[1]
            except IndexError:
                dictval = None
        #determine whether to insert the value into the key or to merge the value with existing values at this key
        if key:
            if key in dict:
                if isinstance(dict[key], list):
                    dict[key].append(dictval)
                else:
                    old = dict[key]
                    new = [old]
                    new.append(dictval)
                    dict[key] = new
            else:
                dict[key] = dictval
    return dict

def prependPropertyToken(self, t):
    return "__property__" + t[0]

parsing repeated lines of string based on initial characters [closed]

I am working on lists and strings in Python. I have the following lines of string:
ID abcd
AC efg
RF hij
ID klmno
AC p
RF q
I want the output as :
abcd, efg, hij
klmno, p, q
This output is based on the first two characters in each line. How can I achieve it in an efficient way?
I'm looking to output the second part of the line for every entry between the ID tags.
I'm having a little trouble parsing the question, but according to my best guess, this should do what you're looking for:
all_data = " ".join(line.strip() for line in file).split("ID")
return [", ".join(item.split(" ")[1::2]) for item in all_data if item]
Basically what you're doing here is first joining together all of your data (removing the newlines), then splitting on your keyphrase of "ID".
After that, if I'm correctly interpreting the question, you're looking to get the second value of each pair. These pairs are space-delimited (as is everything in that item due to the " ".join in the first line), so we just step through that list grabbing every other item, starting from the second.
In general slices have a little more syntactic sugar than is usually used; the full syntax is [start:end:step], so [1::2] returns every other item starting from index 1.
You could use the following, which takes order into account so that transposing the dict's values makes more sense...
from collections import OrderedDict

items = OrderedDict()
with open('/home/jon/sample_data.txt') as fin:
    lines = (line.strip().partition(' ')[::2] for line in fin)
    for key, value in lines:
        items.setdefault(key[0], []).append(value)

res = [', '.join(el) for el in zip(*items.values())]
# ['abcd, efg, hij', 'klmno, p, q']
Use a defaultdict:
from collections import defaultdict

result = defaultdict(list)
for line in lines:
    split_line = line.split(' ')
    result[split_line[0]].append(split_line[1])
This will give you a dictionary, result, that stores all the values sharing a key in one list. To get all the strings that were in a line that started with e.g. ID:
print result['ID']
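To get the requested comma-joined rows from there, you could transpose the lists (a sketch, assuming every record carries all three tags in order):
for row in zip(result['ID'], result['AC'], result['RF']):
    print(', '.join(row))
# abcd, efg, hij
# klmno, p, q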
Based on your answers in comments, this should work (if I understand what you're looking for):
data = None
for line in lines:
    fields = line.split(None, 1)
    if fields[0] == "ID":
        # New set of data
        if data is not None:
            # Output last set of data.
            print ", ".join(data)
        data = []
    data.append(fields[1])
if data is not None:
    # Output final data set
    print ", ".join(data)
It's pretty straightforward: you're just collecting the second field in each line into data until you see the start of the next data set, at which point you output the previous data set.
I think using itertools.groupby is best for this kind of parsing (do something until next token X):
import itertools

class GroupbyHelper(object):
    def __init__(self):
        self.state = None

    def __call__(self, row):
        if self.state is None:
            self.state = True
        elif row[0] == 'ID':
            self.state = not self.state
        return self.state

# assuming you read data from 'stream'
for _, data in itertools.groupby((line.split() for line in stream), GroupbyHelper()):
    print ','.join(c[1] for c in data)
output:
$ python groupby.py
abcd,efg,hij
klmno,p,q
It looks like you would like to subgroup your data whenever 'ID' is present as your key. A groupby solution could work wonders here, if you know how to group your data. Here is one such implementation that might work for you:
>>> data = [e.split() for e in data.splitlines()]
>>> def new_key(key):
...     toggle = [0, 1]
...     def helper(e):
...         if e[0] == key:
...             toggle[:] = toggle[::-1]
...         return toggle[0]
...     return helper
>>> from itertools import groupby
>>> for k, v in groupby(data, key=new_key('ID')):
...     for e in v:
...         print e[-1],
...     print
abcd efg hij
klmno p q
If lines is equal to
['ID abcd', 'AC efg', 'RF hij']
then
[line.split()[1] for line in lines]
Edit: Added everything below after downvotes.
I am not sure why this was downvoted. I thought that code was the simplest way to get started with the information provided at the time. Perhaps this is a better explanation of what I thought/think the data was/is.
if input is a list of strings in repeated sequence, called alllines;
alllines = [ #a list of repeated lines of string based on initial characters
'ID abcd',
'AC efg',
'RF hij',
'ID klmno',
'AC p',
'RF q'
]
then the code is:
[[line.split()[1] for line in lines] for lines in
 [[alllines.pop(0) for i in range(3)] for o in range(len(alllines) / 3)]]
This basically says: for every three strings in the whole list, pop them into a sublist, then keep the split()[1] part of each.
and the output is:
[['abcd', 'efg', 'hij'], ['klmno', 'p', 'q']]
Edit 8-6-13: Here is an even better one, without pop():
zip(*[iter([line.split()[1] for line in alllines])]*3)
with a slightly different output:
[('abcd', 'efg', 'hij'), ('klmno', 'p', 'q')]
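The trick, for anyone puzzled by it: [iter(...)]*3 repeats the same iterator object three times, so zip pulls three consecutive items for each output tuple. A quick sketch (on Python 3, wrap the zip in list() to see the result):
seconds = ['abcd', 'efg', 'hij', 'klmno', 'p', 'q']
it = iter(seconds)
print(list(zip(it, it, it)))
# [('abcd', 'efg', 'hij'), ('klmno', 'p', 'q')]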
