pyparsing recursion of values list (ibm rhapsody) - python

I'm building a parser for the IBM Rhapsody sbs file format, but unfortunately the recursive part doesn't work as expected. The rule pp.Word(pp.printables + " ") is probably the problem, since it also matches ; and {}. However, ; can also be part of the values.
import pyparsing as pp
import pprint
TEST = r"""{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
"""
def flat(loc, toks):
    if len(toks[0]) == 1:
        return toks[0][0]

assignment = pp.Suppress("-") + pp.Word(pp.alphanums + "_") + pp.Suppress("=")
value = pp.OneOrMore(
    pp.Group(assignment + (
        pp.Group(pp.OneOrMore(
            pp.QuotedString('"', escChar="\\", multiline=True) +
            pp.Suppress(";"))).setParseAction(flat) |
        pp.Word(pp.alphas) + pp.Suppress(";") |
        pp.Word(pp.printables + " ")
    ))
)

expr = pp.Forward()
expr = pp.Suppress("{") + pp.Word(pp.alphas) + (
    value | (assignment + expr) | expr
) + pp.Suppress("}")
expr = expr.ignore(pp.pythonStyleComment)

print TEST
pprint.pprint(expr.parseString(TEST).asList())
Output:
% python prase.py
{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
['foo',
 ['key', 'bla'],
 ['value', '1243; 1233; 1235;'],
 ['_hans', 'hammer\n time'],
 ['HaMer', '765; 786; 890;'],
 ['value', '\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n '],
 ['_mText', '12.11.2015::13:20:0;'],
 ['value', ['war', 'fist']],
 ['_obacht', 'fish,car,button'],
 ['_id', 'gibml c0d8-4535-898f-968362779e07;'],
 ['bam', '{ boing'],
 ['key', 'bla']]

Wow, that is one messy model format! I think this will get you close. I started out by trying to characterize what a valid value expression could be. I saw that each grouping could contain ';'-terminated attribute definitions, or '{}'-enclosed nested objects. Each object contained a leading identifier giving the object type.
The difficult issue was the very general token, which I named 'value_word': pretty much any grouping of characters, as long as it is not a '-', '{', or '}'. The negative lookaheads in the definition of 'value_word' take care of this. A key point is that I avoided including ' ' as a valid character in a 'value_word', instead letting pyparsing do its default whitespace skipping, so that one or more 'value_word's can make up an 'attr_value'.
The final kicker (not found in your test case, but in the example you linked to) was this line for an attribute 'assignment':
- m_pParent = ;
So the attr_value had to allow for an empty string also.
from pyparsing import *
LBRACE,RBRACE,SEMI,EQ,DASH = map(Suppress,"{};=-")
ident = Word(alphas + '_', alphanums+'_').setName("ident")
guid = Group('GUID' + Combine(Word(hexnums)+('-'+Word(hexnums))*4))
qs = QuotedString('"', escChar="\\", multiline=True)
character_literal = Combine("'" + oneOf(list(printables+' ')) + "'")
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=';').setName("value_word")
value_atom = guid | qs | character_literal | value_word
object_ = Forward()
attr_value = OneOrMore(object_) | Optional(delimitedList(Group(value_atom+OneOrMore(value_atom))|value_atom, ';')) + SEMI
attr_value.setName("attr_value")
attr_defn = Group(DASH + ident("name") + EQ + Group(attr_value)("value"))
attr_defn.setName("attr_defn")
object_ <<= Group(
    LBRACE + ident("type") +
    Group(ZeroOrMore(attr_defn | object_))("attributes") +
    RBRACE
)
object_.parseString(TEST).pprint()
For your test string it gives:
[['foo',
  [['key', ['bla']],
   ['value', ['1243', '1233', '1235']],
   ['_hans', ['hammer\n time']],
   ['HaMer', ['765', '786', '890']],
   ['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']],
   ['_mText', ['12.11.2015::13:20:0']],
   ['value', ['war', 'fist']],
   ['_obacht', ['fish,car,button']],
   ['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]],
   ['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]]]]
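The empty-value case mentioned above ('- m_pParent = ;') is not in the test string, but a quick spot check (my own sketch, reusing the attr_defn expression defined above) suggests the Optional handles it:

# spot-check the empty attribute value case described above (sketch)
print(attr_defn.parseString("- m_pParent = ;").asList())
# expected to give something like [['m_pParent', []]]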
I added results names that might help in processing these structures. Using object_.parseString(TEST).dump() gives this output:
[['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
[0]:
  ['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
  - attributes: [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer...
    [0]:
      ['key', ['bla']]
      - name: key
      - value: ['bla']
    [1]:
      ['value', ['1243', '1233', '1235']]
      - name: value
      - value: ['1243', '1233', '1235']
    [2]:
      ['_hans', ['hammer\n time']]
      - name: _hans
      - value: ['hammer\n time']
    [3]:
      ['HaMer', ['765', '786', '890']]
      - name: HaMer
      - value: ['765', '786', '890']
    [4]:
      ['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']]
      - name: value
      - value: ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']
    [5]:
      ['_mText', ['12.11.2015::13:20:0']]
      - name: _mText
      - value: ['12.11.2015::13:20:0']
    [6]:
      ['value', ['war', 'fist']]
      - name: value
      - value: ['war', 'fist']
    [7]:
      ['_obacht', ['fish,car,button']]
      - name: _obacht
      - value: ['fish,car,button']
    [8]:
      ['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]]
      - name: _id
      - value: [['gibml', 'c0d8-4535-898f-968362779e07']]
        [0]:
          ['gibml', 'c0d8-4535-898f-968362779e07']
    [9]:
      ['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]
      - name: bam
      - value: [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]
        [0]:
          ['boing', [['key', ['bla']]]]
          - attributes: [['key', ['bla']]]
            [0]:
              ['key', ['bla']]
              - name: key
              - value: ['bla']
          - type: boing
        [1]:
          ['boing', [['key', ['bla']]]]
          - attributes: [['key', ['bla']]]
            [0]:
              ['key', ['bla']]
              - name: key
              - value: ['bla']
          - type: boing
  - type: foo
It also successfully parses the linked example, once the leading version line is removed.
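As a rough illustration (my own sketch, not part of the original answer) of how the 'type', 'attributes', 'name' and 'value' results names might be used to walk the parsed structure:

def walk(obj, indent=0):
    # each parsed object group carries a "type" name and an "attributes" sub-list
    print("  " * indent + "object type: " + obj.type)
    for attr in obj.attributes:
        if "name" in attr:
            # a '- name = value;' attribute definition
            print("  " * (indent + 1) + attr.name + " = " + str(attr.value.asList()))
        else:
            # a nested object appearing directly in the attribute list
            walk(attr, indent + 1)

walk(object_.parseString(TEST)[0])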

Related

How can I leave whitespaces in nestedExpr pyparsing

I have wiki text like this:
data = """
{{hello}}
{{hello world}}
{{hello much { }}
{{a {{b}}}}
{{a
td {
}
{{inner}}
}}
"""
and I want to extract the macros inside it. A macro is text enclosed between {{ and }}, so I tried using nestedExpr:
from pyparsing import *
import pprint

def getMacroCandidates(txt):
    candidates = []

    def nestedExpr(opener="(", closer=")", content=None, ignoreExpr=quotedString.copy()):
        if opener == closer:
            raise ValueError("opening and closing strings cannot be the same")
        if content is None:
            if isinstance(opener,str) and isinstance(closer,str):
                if ignoreExpr is not None:
                    content = (Combine(OneOrMore(~ignoreExpr +
                               ~Literal(opener) + ~Literal(closer) +
                               CharsNotIn(ParserElement.DEFAULT_WHITE_CHARS,exact=1))
                               ).setParseAction(lambda t:t[0]))
        ret = Forward()
        ret <<= Group( opener + ZeroOrMore( ignoreExpr | ret | content ) + closer )
        ret.setName('nested %s%s expression' % (opener,closer))
        return ret

    # use {}'s for nested lists
    macro = nestedExpr("{{", "}}")
    # print(( (nestedItems+stringEnd).parseString(data).asList() ))
    for toks, preloc, nextloc in macro.scanString(data):
        print(toks)
    return candidates
data = """
{{hello}}
{{hello world}}
{{hello much { }}
{{a {{b}}}}
{{a
td {
}
{{inner}}
}}
"""
getMacroCandidates(data)
This gives me the tokens, but with the spaces removed:
[['{{', 'hello', '}}']]
[['{{', 'hello', 'world', '}}']]
[['{{', 'hello', 'much', '{', '}}']]
[['{{', 'a', ['{{', 'b', '}}'], '}}']]
[['{{', 'a', 'td', '{', '}', ['{{', 'inner', '}}'], '}}']]
You can just use replace:
data = """
{{hello}}
{{hello world}}
{{hello much { }}
{{a {{b}}}}
{{a
td {
}
{{inner}}
}}
"""
import shlex
data1= data.replace("{{",'"')
data2 = data1.replace("}}",'"')
data3= data2.replace("}"," ")
data4= data3.replace("{"," ")
data5= ' '.join(data4.split())
print(shlex.split(data5.replace("\n"," ")))
Output: this returns all the tokens with the braces and whitespace removed, and the extra blank lines dropped as well:
['hello', 'hello world', 'hello much ', 'a b', 'a td inner ']
PS: This can be combined into a single expression; multiple expressions are used here for readability.
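The single-expression form that the PS refers to could look roughly like this (my own condensation of the steps above, untested):

import shlex

tokens = shlex.split(
    ' '.join(data.replace("{{", '"').replace("}}", '"')
                 .replace("}", " ").replace("{", " ")
                 .split()))
print(tokens)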

parse string into list based on input list

I would like to write a function in Python 3 to parse a string based on the elements of an input list. The following function works, but is there a better way to do it?
def func(oStr, s_s):
    res = []
    if not oStr:
        return s_s
    elif '' in s_s:
        return [oStr]
    else:
        for x in s_s:
            st = oStr.find(x)
            end = st + len(x)
            res.append(oStr[st:end])
            oStr = oStr.replace(x, '')
        if oStr:
            res.append(oStr)
        return res
case 1
o_str = 'ABCNew York - Address'
s_str = ['ABC']
return ['ABC', 'New York - Address']
case 2
o_str = 'New York Friend Add | NumberABCNewYork Name | FirstName Last Name | time : Jan-31-2017'
s_str = ['New York Friend Add | Number', 'ABC', 'NewYork Name | FirstName Last Name | time: Jan-31-2017']
return ['New York Friend Add | Number', 'ABC', 'NewYork Name | FirstName Last Name | time: Jan-31-2017']
case 3
o_str = '-'
s_str = ['']
return ['-']
case 4
o_str = '1'
s_str = ['']
return ['1']
case 5
o_str = '1234Family-Name'
s_str = ['1234']
return ['1234', 'Family-Name']
case 6
o_str = ''
s_str = ['12345667', 'name']
return ['12345667', 'name']
To use a string like an array, you would just program it in the same way. For example
myStr="Hello, World!"
myString.insert(len(myString),"""Your character here""")
For your purposes .append() would work exactly the same way. Hope I helped.
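For comparison, here is a rough sketch of an alternative (my own, based on str.partition rather than find/replace); it is meant to cover the cases listed above but should be checked against them:

def split_on_terms(o_str, s_s):
    # hypothetical alternative to func(), same intent
    if not o_str:
        return list(s_s)      # empty input string: return the search terms (case 6)
    if '' in s_s:
        return [o_str]        # empty search term: return the string whole (cases 3 and 4)
    parts = []
    for term in s_s:
        before, found, o_str = o_str.partition(term)
        if before:
            parts.append(before)
        if found:
            parts.append(found)
    if o_str:
        parts.append(o_str)
    return parts

print(split_on_terms('ABCNew York - Address', ['ABC']))  # ['ABC', 'New York - Address']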

How to get results from pyparsing Forward object?

Let us assume we have the following string
string = """
object obj1{
attr1 value1;
object obj2 {
attr2 value2;
}
}
object obj3{
attr3 value3;
attr4 value4;
}
"""
There is a nested object, and we use Forward to parse this.
from pyparsing import *

word = Word(alphanums)
attribute = word.setResultsName("name")
value = word.setResultsName("value")
object_grammar = Forward()
attributes = attribute + value + Suppress(";") + LineEnd().suppress()
object_type = Suppress("object ") + word.setResultsName("object_type") + Suppress('{') + LineEnd().suppress()
object_grammar <<= object_type + \
    OneOrMore(attributes | object_grammar) + Suppress("}") | Suppress("};")

for i, (obj, _, _) in enumerate(object_grammar.scanString(string)):
    print('\n')
    print('Enumerating over object {}'.format(i))
    print('\n')
    print('This is the object type {}'.format(obj.object_type))
    print(obj.asXML())
    print(obj.asDict())
    print(obj.asList())
    print(obj)
    print(obj.dump())
These are the results. The obj.asXML() output contains all the information, but since it has been flattened, the order of the information is essential for parsing the result. Is this the best way to do it? I must be missing something. I would like a solution that works for both nested and non-nested objects, i.e. for obj1, obj2 and obj3.
Also, setResultsName('object_type') doesn't return the object_type of the parent object. The output of the program above is shown below. Any suggestions?
Enumerating over object 0
This is the object type obj2
<ITEM>
<object_type>obj1</object_type>
<name>attr1</name>
<value>value1</value>
<object_type>obj2</object_type>
<name>attr2</name>
<value>value2</value>
</ITEM>
{'object_type': 'obj2', 'name': 'attr2', 'value': 'value2'}
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
- name: attr2
- object_type: obj2
- value: value2
Enumerating over object 1
This is the object type obj3
<ITEM>
<object_type>obj3</object_type>
<name>attr3</name>
<value>value3</value>
<name>attr4</name>
<value>value4</value>
</ITEM>
{'object_type': 'obj3', 'name': 'attr4', 'value': 'value4'}
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
- name: attr4
- object_type: obj3
- value: value4
While you have successfully processed your input string, let me suggest some refinements to your grammar.
When defining a recursive grammar, one usually wants to maintain some structure in the output results. In your case, the logical piece to structure is the content of each object, which is surrounded by opening and closing braces. Conceptually:
object_content = '{' + ZeroOrMore(attribute_defn | object_defn) + '}'
Then the supporting expressions are just (still conceptually):
attribute_defn = identifier + attribute_value + ';'
object_defn = 'object' + identifier + object_content
The actual Python/pyparsing for this looks like:
LBRACE, RBRACE, SEMI = map(Suppress, "{};")
word = Word(alphas, alphanums)
attribute = word
# expand to include other values if desired, such as ints, reals, strings, etc.
attribute_value = word
attributeDefn = Group(attribute("name") + attribute_value("value") + SEMI)

OBJECT = Keyword("object")
object_header = OBJECT + word("object_name")

object_grammar = Forward()
object_body = Group(LBRACE
                    + ZeroOrMore(object_grammar | attributeDefn)
                    + RBRACE)
object_grammar <<= Group(object_header + object_body("object_content"))
Group does two things for us: it structures the results into sub-objects; and it keeps the results names at one level from stepping on those at a different level (so no need for listAllMatches).
Now instead of scanString, you can just process your input using OneOrMore:
print(OneOrMore(object_grammar).parseString(string).dump())
Giving:
[['object', 'obj1', [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]], ['object', 'obj3', [['attr3', 'value3'], ['attr4', 'value4']]]]
[0]:
  ['object', 'obj1', [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]]
  - object_content: [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]
    [0]:
      ['attr1', 'value1']
      - name: attr1
      - value: value1
    [1]:
      ['object', 'obj2', [['attr2', 'value2']]]
      - object_content: [['attr2', 'value2']]
        [0]:
          ['attr2', 'value2']
          - name: attr2
          - value: value2
      - object_name: obj2
  - object_name: obj1
[1]:
  ['object', 'obj3', [['attr3', 'value3'], ['attr4', 'value4']]]
  - object_content: [['attr3', 'value3'], ['attr4', 'value4']]
    [0]:
      ['attr3', 'value3']
      - name: attr3
      - value: value3
    [1]:
      ['attr4', 'value4']
      - name: attr4
      - value: value4
  - object_name: obj3
I started to just make simple changes to your code, but there was a fatal flaw in your original. Your parser separated the left and right braces into two separate expressions - while this "works", it defeats the ability to define the group structure of the results.
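To show what the grouping buys you, here is a rough sketch (my own illustration, not from the original answer) of reading the fields back out by results name, reusing the grammar and the string from above:

# sketch: pull data back out of the grouped results by name
results = OneOrMore(object_grammar).parseString(string)

def show(obj, indent=0):
    pad = "  " * indent
    print(pad + "object: " + obj.object_name)
    for item in obj.object_content:
        if "object_name" in item:
            # a nested object definition
            show(item, indent + 1)
        else:
            # an attribute definition with "name" and "value" results names
            print(pad + "  " + item.name + " = " + item.value)

for obj in results:
    show(obj)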
I was able to work around this by using listAllMatches=True in the setResultsName calls. This gave me an asXML() result with enough structure to retrieve information from. It still relies on the order of the XML and requires using zip to get the name and value of an attribute together. I'll leave this question open to see if I get a better way of doing this.

Parse a colon separated string with pyparsing

This is the data:
C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK
And I would like to get this result
[C:/data/my_file.txt.c, 10, 0x21, name1, name2, 0x10, 1, OK]
[C:/data/my_file2.txt.c, 110, 0x1, name2, name5, 0x12, 1, NOT_OK]
[./data/my_file3.txt.c, 110, 0x1, name2, name5, 0x12, 10, OK]
I know how to do that with some custom code or string splitting, but I am looking for a nice solution using pyparsing. My problem is the ':/' in the file path.
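To illustrate the problem (a minimal sketch of my own, not from the original post), splitting naively on ':' also splits the drive prefix off the path:

from pyparsing import Word, printables, delimitedList

line = "C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK"
naive = delimitedList(Word(printables, excludeChars=':'), delim=':')
print(naive.parseString(line).asList())
# -> ['C', '/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK']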
Additional question: I use some code to strip comments and other stuff from the records, so the raw data looks like this:
text = """C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
// comment
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK
----
ok
"""
And I strip the "//", "ok", and "---" lines before parsing right now.
So now I have a follow-up question to the first one: until now I have extracted the lines above from a data file and parsed them line by line, which works great. But I found out it is possible to use parseFile to parse a whole file, so I think I could drop some of my code and use parseFile instead. The files I would like to parse have an additional footer:
C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK: info message
-----------------------
3 Files 2 OK 1 NOT_OK
NOT_OK
Is it possible to change the parser to get 2 parse results?
Result1:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
['C:/data/my_file2.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK']]
Ignore the blank line
Ignore this line => -----------------------
Result 2:
[['3', 'Files', '2', 'OK', '1', 'NOT_OK'],
 ['NOT_OK']]
So I changed the code for that:
# define an expression for your file reference
one_thing = Combine(
    oneOf(list(alphas)) + ':/' +
    Word(alphanums + '_-./'))

# define a catchall expression for everything else (words of non-whitespace characters,
# excluding ':')
another_thing = Word(printables + " ", excludeChars=':')

# define an expression of the two; be sure to list the file reference first
thing = one_thing | another_thing

# now use plain old pyparsing delimitedList, with ':' delimiter
list_of_things = delimitedList(thing, delim=':')

list_of_other_things = Word(printables).setName('a')

# run it and see...
parse_ret = OneOrMore(Group(list_of_things | list_of_other_things)).parseFile("data.file")
parse_ret.pprint()
And I get this result:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
 ['C:/data/my_file2.txt.c','110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
 ['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK', 'info message'],
 ['-----------------------'],
 ['3 Files 2 OK 1 NOT_OK'],
 ['NOT_OK']]
So I can go with this, but is it possible to split the result into two named results? I searched the docs but I didn't find anything that works.
See embedded comments for pyparsing description:
from pyparsing import *

text = """C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
// blah-de blah blah blah
./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK"""

# define an expression for your file reference
one_thing = Combine(
    oneOf(list(alphas.upper())) + ':/' +
    Word(alphanums + '_-./'))

# define a catchall expression for everything else (words of non-whitespace characters,
# excluding ':')
another_thing = Word(printables, excludeChars=':')

# define an expression of the two; be sure to list the file reference first
thing = one_thing | another_thing

# now use plain old pyparsing delimitedList, with ':' delimiter
list_of_things = delimitedList(thing, delim=':')

parser = OneOrMore(Group(list_of_things))

# ignore comments starting with double slash
parser.ignore(dblSlashComment)

# run it and see...
parser.parseString(text).pprint()
prints:
[['C:/data/my_file.txt.c', '10', '0x21', 'name1', 'name2', '0x10', '1', 'OK'],
 ['C:/data/my_file2.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '1', 'NOT_OK'],
 ['./data/my_file3.txt.c', '110', '0x1', 'name2', 'name5', '0x12', '10', 'OK']]
So I didn't find a solution with delimitedList and parseFile, but I found a solution that is okay for me.
from pyparsing import *

data = """
C: / data / my_file.txt.c:10:0x21:name1:name2:0x10:1:OK
C: / data / my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK
./ data / my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK: info message
-----------------------
3 Files 2 OK 1 NOT_OK
NOT_OK
"""

if __name__ == '__main__':
    # define an expression for your file reference
    entry_one = Combine(
        oneOf(list(alphas)) + ':/' +
        Word(alphanums + '_-./'))
    entry_two = Word(printables + ' ', excludeChars=':')
    entry = entry_one | entry_two
    delimiter = Literal(':').suppress()
    tc_result_line = Group(
        entry.setResultsName('file_name') + delimiter +
        entry.setResultsName('line_nr') + delimiter +
        entry.setResultsName('num_one') + delimiter +
        entry.setResultsName('name_one') + delimiter +
        entry.setResultsName('name_two') + delimiter +
        entry.setResultsName('num_two') + delimiter +
        entry.setResultsName('status') +
        Optional(delimiter + entry.setResultsName('msg'))
    ).setResultsName("info_line")

    EOL = LineEnd().suppress()
    SOL = LineStart().suppress()
    blank_line = SOL + EOL

    tc_summary_line = Group(
        Word(nums).setResultsName("num_of_lines") + "Files" +
        Word(nums).setResultsName("num_of_ok") + "OK" +
        Word(nums).setResultsName("num_of_not_ok") + "NOT_OK"
    ).setResultsName("info_summary")
    tc_end_line = Or(Literal("NOT_OK"), Literal('Ok')).setResultsName("info_result")

    # run it and see...
    pp1 = tc_result_line | Optional(tc_summary_line | tc_end_line)
    pp1.ignore(blank_line | OneOrMore("-"))

    result = list()
    for l in data.split('\n'):
        result.append((pp1.parseString(l)).asDict())

    # delete empty results
    result = filter(None, result)
    for r in result:
        print(r)
Result:
{'info_line': {'file_name': 'C', 'num_one': '10', 'msg': '1', 'name_one': '0x21', 'line_nr': '/ data / my_file.txt.c', 'status': '0x10', 'num_two': 'name2', 'name_two': 'name1'}}
{'info_line': {'file_name': 'C', 'num_one': '110', 'msg': '1', 'name_one': '0x1', 'line_nr': '/ data / my_file2.txt.c', 'status': '0x12', 'num_two': 'name5', 'name_two': 'name2'}}
{'info_line': {'file_name': './ data / my_file3.txt.c', 'num_one': '0x1', 'msg': 'OK', 'name_one': 'name2', 'line_nr': '110', 'status': '10', 'num_two': '0x12', 'name_two': 'name5'}}
{'info_summary': {'num_of_lines': '3', 'num_of_ok': '2', 'num_of_not_ok': '1'}}
{'info_result': ['NOT_OK']}
Using re:
import re

myList = ["C:/data/my_file.txt.c:10:0x21:name1:name2:0x10:1:OK",
          "C:/data/my_file2.txt.c:110:0x1:name2:name5:0x12:1:NOT_OK",
          "./data/my_file3.txt.c:110:0x1:name2:name5:0x12:10:OK"]

for i in myList:
    newTxt = re.sub(r':', ",", i)
    newTxt = re.sub(r',/', ":/", newTxt)
    print newTxt

When using setResultsName with listAllMatches true, some matched items are nested

Based on this grammar:
from pyparsing import *
g = quotedString.setParseAction( removeQuotes )
eg = Suppress('-') + quotedString.setParseAction( removeQuotes )
choice = Or( [ g.setResultsName("out", listAllMatches=True),
               eg.setResultsName("in", listAllMatches=True) ] )
grammar = ZeroOrMore( choice ) + Suppress(restOfLine)

a = grammar.parseString( ' "ali" -"baba" "holy cow" -"smoking beaute" ' )
print a.dump()
I have discovered that tokens that satisfy the eg nonterminal are always wrapped in an extra list. The only difference from g is the leading Suppress('-').
['ali', 'baba', 'holy cow', 'smoking beaute']
- in: [['baba'], ['smoking beaute']]
- out: ['ali', 'holy cow']
How do I make them behave the same? I want to achieve the result below:
['ali', 'baba', 'holy cow', 'smoking beaute']
- in: ['baba', 'smoking beaute']
- out: ['ali', 'holy cow']
It's been a while since I've looked at this issue - the problem is that And expressions always return their tokens as lists, even if they contain only a single value.
Here is an ungrouper that can clear this up for you; I'll include this in the next pyparsing release:
ungroup = lambda expr : TokenConverter(expr).setParseAction(lambda t:t[0])
eg = ungroup(Suppress('-') + quotedString.setParseAction( removeQuotes ))
With your test code, I now get these results:
['ali', 'baba', 'holy cow', 'smoking beaute']
- in: ['baba', 'smoking beaute']
- out: ['ali', 'holy cow']
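For reference, later pyparsing releases ship this helper as pyparsing.ungroup, so the same fix can be written without defining the lambda (a sketch, assuming a 2.x-or-later version):

from pyparsing import Suppress, quotedString, removeQuotes, ungroup

eg = ungroup(Suppress('-') + quotedString.copy().setParseAction(removeQuotes))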
