How to get pyparser to work in a particular form

Sorry for the poor title; I could not think of anything better.
I am trying to implement a DSL with pyparsing that has the following requirements:
1. Variables: all of them begin with v_
2. Unary operators: +, -
3. Binary operators: +, -, *, /, %
4. Constant numbers
5. Functions, which behave like normal functions when they take just one variable
6. Functions need to distribute over binary operations, like this: foo(v_1+v_2) = foo(v_1) + foo(v_2), foo(bar(10*v_6)) = foo(bar(10))*foo(bar(v_6)). This should hold for any binary operation.
I have 1-5 working.
This is the code I have so far:
from pyparsing import *
exprstack = []
#~ #traceParseAction
def pushFirst(tokens):
    exprstack.insert(0,tokens[0])
# define grammar
point = Literal( '.' )
plusorminus = Literal( '+' ) | Literal( '-' )
number = Word( nums )
integer = Combine( Optional( plusorminus ) + number )
floatnumber = Combine( integer +
                       Optional( point + Optional( number ) ) +
                       Optional( integer )
                     )
ident = Combine("v_" + Word(nums))
plus = Literal( "+" )
minus = Literal( "-" )
mult = Literal( "*" )
div = Literal( "/" )
cent = Literal( "%" )
lpar = Literal( "(" ).suppress()
rpar = Literal( ")" ).suppress()
addop = plus | minus
multop = mult | div | cent
expop = Literal( "^" )
band = Literal( "#" )
# define expr as Forward, since we will reference it in atom
expr = Forward()
fn = Word( alphas )
atom = ( ( floatnumber | integer | ident | ( fn + lpar + expr + rpar ) ).setParseAction(pushFirst) |
         ( lpar + expr.suppress() + rpar ))
factor = Forward()
factor << atom + ( ( band + factor ).setParseAction( pushFirst ) | ZeroOrMore( ( expop + factor ).setParseAction( pushFirst ) ) )
term = factor + ZeroOrMore( ( multop + factor ).setParseAction( pushFirst ) )
expr << term + ZeroOrMore( ( addop + term ).setParseAction( pushFirst ) )
print(expr)
bnf = expr
pattern = bnf + StringEnd()
def test(s):
    del exprstack[:]
    bnf.parseString(s,parseAll=True)
    print(exprstack)
test("avg(+10)")
test("v_1+8")
test("avg(v_1+10)+10")
Here is what I want.
My functions are of this type:
foo(v_1+v_2) = foo(v_1) + foo(v_2)
The same behaviour is expected for every other binary operation as well. I have no idea how to make the parser do this automatically.

Break out the function call as a separate sub expression:
function_call = fn + lpar + expr + rpar
Then add a parse action to function_call that pops the operators and operands of the argument expression from exprstack, then pushes them back onto the stack:
if an operand, push operand then function
if an operator, push the operator
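The two rules above can be tried out on a plain token list, without pyparsing. This is only a minimal sketch under stated assumptions: the tokens are in postfix order (operands before their operator), whereas the question's exprstack is built with insert(0, ...) and is therefore reversed; unary signs are already folded into the operand tokens; and the names distribute_postfix, BINARY_OPS and the functions parameter are made up for illustration.

```python
# Hypothetical helper, not part of pyparsing: rewrites the postfix tokens of
# a function argument so that the function is applied to every operand.
BINARY_OPS = {"+", "-", "*", "/", "%"}

def distribute_postfix(fname, postfix, functions=()):
    """Return postfix tokens with `fname` pushed after each operand.

    `functions` names inner functions that have already been distributed,
    so that `fname` is pushed after them rather than between an operand
    and its inner function.
    """
    out = []
    for i, tok in enumerate(postfix):
        out.append(tok)
        if tok in BINARY_OPS:
            continue  # an operator: push it back unchanged
        nxt = postfix[i + 1] if i + 1 < len(postfix) else None
        if nxt not in functions:
            out.append(fname)  # an operand (plus inner calls): push the function
    return out

# foo(v_1 + v_2)  becomes  foo(v_1) + foo(v_2)
print(distribute_postfix("foo", ["v_1", "v_2", "+"]))
# foo(bar(10) * bar(v_6))  becomes  foo(bar(10)) * foo(bar(v_6))
print(distribute_postfix("foo", ["10", "bar", "v_6", "bar", "*"], functions={"bar"}))
```

In a real parse action the same loop would run over the slice of exprstack that belongs to the function argument, which this sketch does not try to locate.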
Since you are only doing binary operations, you might be better off doing a simple approach first:
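from pyparsing import *
# LPAR, RPAR and number were not defined in the original snippet;
# these definitions mirror lpar, rpar and number from the question
LPAR, RPAR = map(Suppress, "()")
number = Word(nums)
identifier = Word(alphas+'_', alphanums+'_')
expr = Forward()
function_call = Group(identifier + LPAR + Group(expr) + RPAR)
unop = oneOf("+ -")
binop = oneOf("+ - * / %")
operand = Group(Optional(unop) + (function_call | number | identifier))
binexpr = operand + binop + operand
expr << (binexpr | operand)
bnf = expr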
This gives you a simpler structure to work with, without having to mess with exprstack.
def test(s):
    exprtokens = bnf.parseString(s,parseAll=True)
    print exprtokens
test("10")
test("10+20")
test("avg(10)")
test("avg(+10)")
test("column_1+8")
test("avg(column_1+10)+10")
Gives:
[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[['avg', [['column_1'], '+', ['10']]]], '+', ['10']]
You want to expand fn(a op b) to fn(a) op fn(b), but fn(a) should be left alone, so you need to test on the length of the parsed expression argument:
def distribute_function(tokens):
    # unpack function name and arguments
    fname, args = tokens[0]
    # if args contains an expression, expand it
    if len(args) > 1:
        ret = ParseResults([])
        for i,a in enumerate(args):
            if i % 2 == 0:
                # even args are operands to be wrapped in the function
                ret += ParseResults([ParseResults([fname,ParseResults([a])])])
            else:
                # odd args are operators, just add them to the results
                ret += ParseResults([a])
        return ParseResults([ret])
function_call.setParseAction(distribute_function)
Now your calls to test will look like:
[['10']]
[['10'], '+', ['20']]
[[['avg', [['10']]]]]
[[['avg', [['+', '10']]]]]
[['column_1'], '+', ['8']]
[[[['avg', [['column_1']]], '+', ['avg', [['10']]]]], '+', ['10']]
This should even work recursively with a call like fna(fnb(3+2)+fnc(4+9)).
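The even/odd rule used in distribute_function can also be checked on plain nested lists shaped like the parse results shown above. This is only a sketch: distribute_call is a hypothetical helper that works on ordinary Python lists, not on ParseResults, and it handles a single call rather than a whole expression.

```python
def distribute_call(call):
    """Expand [fname, [a, op, b, ...]] into [[fname, [a]], op, [fname, [b]], ...].

    A call with a single argument is returned unchanged.
    """
    fname, args = call
    if len(args) <= 1:
        return call
    out = []
    for i, item in enumerate(args):
        if i % 2 == 0:
            out.append([fname, [item]])  # even positions are operands
        else:
            out.append(item)             # odd positions are operators
    return out

print(distribute_call(["avg", [["column_1"], "+", ["10"]]]))
print(distribute_call(["avg", [["10"]]]))
```

The first call yields the same inner structure as the avg(column_1+10) line in the output above; the second shows that a one-argument call is left alone.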

Related

(pyparsing) delimitedList does not work as expected

I have a stringized array that I am receiving from an external system. It is stripped of quotes and separated by a comma and a space.
I'm trying to use pyparsing, but I'm only getting the first element of the array. How do I specify that a word must end in an alphanumeric character?
import pyparsing as pp
value = '[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]'
LBR, RBR = map(pp.Suppress, "[]")
qs = pp.Word(pp.alphas, pp.alphanums + pp.srange("[_=,]"))
qsList = LBR + pp.delimitedList(qs, delim=', ') + RBR
print(value)
print(qsList.parseString(value).asList())
[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]
# pyparsing.exceptions.ParseException: Expected ']', found 'ldfjk' (at char 30), (line:1, col:31)
BR

Thanks for the mental support :D
LBR, RBR = map(pp.Suppress, "[]")
element = pp.Word(pp.alphanums) + pp.Literal('=') + pp.Word(pp.alphanums + pp.srange("[_]"))
qs = pp.Combine(element + pp.OneOrMore(pp.Optional(pp.Literal(',')) + element))
qsList = LBR + pp.delimitedList(qs, delim= ', ') + RBR
print(value)
print(qsList.parseString(value).asList())
#[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]
#['AaAa=Aaa_xx_12,Bxfm=djfn_13', 'ldfjk=ddd,ttt=ddfs_ddfj_99']
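A likely reason the first attempt failed (my reading of the error position, not something stated in the thread): the Word in the question allows ',' as a body character, so it also swallows the comma in front of the space, and the ', ' delimiter is then no longer there to match. As a cross-check of the expected result, the same split can be done with plain string methods. This sketch assumes the elements are always separated by exactly a comma followed by one space; split_groups is a made-up helper name.

```python
value = '[AaAa=Aaa_xx_12,Bxfm=djfn_13, ldfjk=ddd,ttt=ddfs_ddfj_99]'

def split_groups(text):
    """Strip the surrounding brackets and split on comma + space only."""
    inner = text.strip()
    if inner.startswith('[') and inner.endswith(']'):
        inner = inner[1:-1]
    # a bare comma (no following space) stays inside its element
    return [part.strip() for part in inner.split(', ')]

groups = split_groups(value)
print(groups)
```

This gives the same two elements as the pyparsing version above, which can be handy as a test oracle while tuning the grammar.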

Parse C-like declarations using pyparsing

I would like to parse declarations using pyparsing in a C-like source (GLSL code) such that I get a list of (type, name, value).
For example:
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
I would like to obtain something like:
[ ('int', 'a[3]', ''),
('int', 'b', '1'),
('int', 'c', '2.0'),
('float', 'd', 'f(z[2], 2) + 3*g(4,a)'),
('float', 'e', ''),
('Point', 'f', '{1,2}') ]
I've played with Forward() and operatorPrecedence() to try to parse the rhs expression but I suspect it is not necessary in my case.
So far I have:
IDENTIFIER = Regex('[a-zA-Z_][a-zA-Z_0-9]*')
INTEGER = Regex('([+-]?(([1-9][0-9]*)|0+))')
EQUAL = Literal("=").suppress()
SEMI = Literal(";").suppress()
SIZE = INTEGER | IDENTIFIER
VARNAME = IDENTIFIER
TYPENAME = IDENTIFIER
VARIABLE = Group(VARNAME.setResultsName("name")
+ Optional(EQUAL + Regex("[^,;]*").setResultsName("value")))
VARIABLES = delimitedList(VARIABLE.setResultsName("variable",listAllMatches=True))
DECLARATION = (TYPENAME.setResultsName("type")
+ VARIABLES.setResultsName("variables", listAllMatches=True) + SEMI)
code = """
float a=1, b=3+f(2), c;
float d=1.0, e;
float f = z(3,4);
"""
for (token, start, end) in DECLARATION.scanString(code):
    for variable in token.variable:
        print token.type, variable.name, variable.value
but the last declaration (f = z(3,4)) is not parsed, because the value regex stops at the comma inside the parentheses.

There is a C struct parser on the pyparsing wiki that might give you a good start.

This seems to work.
IDENTIFIER = Word(alphas+"_", alphas+nums+"_" )
INT_DECIMAL = Regex('([+-]?(([1-9][0-9]*)|0+))')
INT_OCTAL = Regex('(0[0-7]*)')
INT_HEXADECIMAL = Regex('(0[xX][0-9a-fA-F]*)')
INTEGER = INT_HEXADECIMAL | INT_OCTAL | INT_DECIMAL
FLOAT = Regex(r'[+-]?(((\d+\.\d*)|(\d*\.\d+))([eE][-+]?\d+)?)|(\d*[eE][+-]?\d+)')
LPAREN, RPAREN = Literal("(").suppress(), Literal(")").suppress()
LBRACK, RBRACK = Literal("[").suppress(), Literal("]").suppress()
LBRACE, RBRACE = Literal("{").suppress(), Literal("}").suppress()
SEMICOLON, COMMA = Literal(";").suppress(), Literal(",").suppress()
EQUAL = Literal("=").suppress()
SIZE = INTEGER | IDENTIFIER
VARNAME = IDENTIFIER
TYPENAME = IDENTIFIER
OPERATOR = oneOf("+ - * / [ ] . & ^ ! { }")
PART = nestedExpr() | nestedExpr('{','}') | IDENTIFIER | INTEGER | FLOAT | OPERATOR
EXPR = delimitedList(PART, delim=Empty()).setParseAction(keepOriginalText)
VARIABLE = (VARNAME("name") + Optional(LBRACK + SIZE + RBRACK)("size")
+ Optional(EQUAL + EXPR)("value"))
VARIABLES = delimitedList(VARIABLE.setResultsName("variables",listAllMatches=True))
DECLARATION = (TYPENAME("type") + VARIABLES + SEMICOLON)
code = """
int a[3];
int b=1, c=2.0;
float d = f(z[2], 2) + 3*g(4,a), e;
Point f = {1,2};
"""
for (token, start, end) in DECLARATION.scanString(code):
    vtype = token.type
    for variable in token.variables:
        name = variable.name
        size = variable.size
        value = variable.value
        s = "%s / %s" % (vtype,name)
        if size: s += ' [%s]' % size[0]
        if value: s += ' / %s' % value[0]
        s += ";"
        print s
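The key idea in the grammar above is that commas inside (), [] or {} must not end a declarator. The same idea can be sketched without pyparsing by tracking bracket depth. This is an illustration only, assuming one declaration per statement, a single-word type name, and no strings or comments in the source; split_top_level and parse_declaration are hypothetical helper names.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested in (), [] or {}."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts

def parse_declaration(stmt):
    """Turn 'type a=1, b;' into a list of (type, name, value) tuples."""
    stmt = stmt.strip().rstrip(";")
    type_name, rest = stmt.split(None, 1)
    result = []
    for declarator in split_top_level(rest):
        name, _, value = declarator.partition("=")
        result.append((type_name, name.strip(), value.strip()))
    return result

for line in ["int a[3];", "int b=1, c=2.0;",
             "float d = f(z[2], 2) + 3*g(4,a), e;", "Point f = {1,2};"]:
    print(parse_declaration(line))
```

The tuples have the (type, name, value) shape asked for in the question, with an empty string where no initializer is given.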

Change from Combine(Literal('#') + 'spec') to Keyword('#spec') removes whitespace

Why does using Combine(...) preserve the whitespace, whereas Keyword(...) removes the whitespace?
I need to preserve the whitespace after the matched token.
The test is as follows:
from pyparsing import *
def parse(string, refpattern):
    print refpattern.searchString(string)
    pattern = StringStart() \
        + SkipTo(refpattern)('previous') \
        + refpattern('ref') \
        + SkipTo(StringEnd())('rest')
    print pattern.parseString(string)
string = "With #ref to_something"
identifier = Combine(Word(alphas + '_', alphanums + '_') + Optional('.' + Word(alphas)))
pattern_without_space = (CaselessKeyword('#ref') | CaselessKeyword(r'\ref')).setParseAction(lambda s, l, t: ['ref']) \
+ White().suppress() + identifier
pattern_with_space = Combine((Literal('#') | Literal('\\')).suppress() + 'ref') + White().suppress() + identifier
parse(string, pattern_without_space)
parse(string, pattern_with_space)
will output:
[['ref', 'to_something']]
['With', 'ref', 'to_something', '']
[['ref', 'to_something']]
['With ', 'ref', 'to_something', '']
# ^ space i need is preserved here

The problem happens when using alternation (the | operator) with CaselessKeyword. See these examples:
from pyparsing import *
theString = 'This is #Foo Bar'
identifier = Combine(Word(alphas + '_', alphanums + '_') + Optional('.' + Word(alphas)))
def testParser(p):
    q = StringStart() + SkipTo(p)("previous") + p("body") + SkipTo(StringEnd())("rest")
    return q.parseString(theString)
def test7():
    p0 = (CaselessKeyword('#Foo') | Literal('#qwe')) + White().suppress() + identifier
    p1 = (CaselessKeyword('#Foo') | CaselessKeyword('#qwe')) + White().suppress() + identifier
    p2 = (Literal('#qwe') | CaselessKeyword('#Foo')) + White().suppress() + identifier
    p3 = (CaselessKeyword('#Foo')) + White().suppress() + identifier
    p4 = Combine((Literal('#') | Literal('\\')).suppress() + 'Foo') + White().suppress() + identifier
    print "p0:", testParser(p0)
    print "p1:", testParser(p1)
    print "p2:", testParser(p2)
    print "p3:", testParser(p3)
    print "p4:", testParser(p4)
test7()
The output is:
p0: ['This is', '#Foo', 'Bar', '']
p1: ['This is', '#Foo', 'Bar', '']
p2: ['This is', '#Foo', 'Bar', '']
p3: ['This is ', '#Foo', 'Bar', '']
p4: ['This is ', 'Foo', 'Bar', '']
Perhaps this is a bug?
Update: This is how you could define your own parser to match either #Foo or \Foo as a keyword:
from pyparsing import *
import string
class FooKeyWord(Token):
    alphas = string.ascii_lowercase + string.ascii_uppercase
    nums = "0123456789"
    alphanums = alphas + nums
    def __init__(self):
        super(FooKeyWord,self).__init__()
        self.identChars = alphanums+"_$"
        self.name = "#Foo"
    def parseImpl(self, instring, loc, doActions = True):
        if (instring[loc] in ['#', '\\'] and
            instring.startswith('Foo', loc+1) and
            (loc+4 >= len(instring) or instring[loc+4] not in self.identChars) and
            (loc == 0 or instring[loc-1].upper() not in self.identChars)):
            return loc+4, instring[loc] + 'Foo'
        raise ParseException(instring, loc, self.errmsg, self)
def test8():
    p = FooKeyWord() + White().suppress() + identifier
    q = StringStart() + SkipTo(p)("previous") + p("body") + SkipTo(StringEnd())("rest")
    print "with #Foo:", q.parseString("This is #Foo Bar")
    print "with \\Foo:", q.parseString("This is \\Foo Bar")
And the output:
with #Foo: ['This is ', '#Foo', 'Bar', '']
with \Foo: ['This is ', '\\Foo', 'Bar', '']
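The boundary test inside parseImpl can be tried in isolation as a plain function, which makes it easier to see which inputs count as a keyword match. This is a sketch, not pyparsing code: match_foo_keyword and IDENT_CHARS are hypothetical names, and a bounds check on loc is added that the class above does not have.

```python
import string

# Characters that may not touch the keyword on either side
IDENT_CHARS = string.ascii_letters + string.digits + "_$"

def match_foo_keyword(text, loc):
    """Return (end, token) if '#Foo' or '\\Foo' stands alone at loc, else None."""
    if loc >= len(text) or text[loc] not in "#\\":
        return None
    if not text.startswith("Foo", loc + 1):
        return None
    end = loc + 4
    if end < len(text) and text[end] in IDENT_CHARS:
        return None  # keyword runs into a longer identifier, e.g. '#Foobar'
    if loc > 0 and text[loc - 1] in IDENT_CHARS:
        return None  # keyword is glued to a preceding identifier
    return end, text[loc:end]

print(match_foo_keyword("This is #Foo Bar", 8))
print(match_foo_keyword("#Foobar", 0))
```

Because the function never skips whitespace itself, the text before the keyword is left untouched, which is the behaviour the question asks for.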

SimpleParse not showing the result tree

I am working with Google Protocol Buffers (protobuf), where I am trying to parse a .proto file using SimpleParse in Python.
I am using the EBNF format with SimpleParse. It reports success, but there is nothing in the result tree, and I am not sure what is going wrong. Any help would be appreciated.
Following is the grammar file:
proto ::= ( message / extend / enum / import / package / option / ';' )*
import ::= 'import' , strLit , ';'
package ::= 'package' , ident , ( '.' , ident )* , ';'
option ::= 'option' , optionBody , ';'
optionBody ::= ident , ( '.' , ident )* , '=' , constant
message ::= 'message' , ident , messageBody
extend ::= 'extend' , userType , '{' , ( field / group / ';' )* , '}'
enum ::= 'enum' , ident , '{' , ( option / enumField / ';' )* , '}'
enumField ::= ident , '=' , intLit , ';'
service ::= 'service' , ident , '{' , ( option / rpc / ';' )* , '}'
rpc ::= 'rpc' , ident , '(' , userType , ')' , 'returns' , '(' , userType , ')' , ';'
messageBody ::= '{' , ( field / enum / message / extend / extensions / group / option / ':' )* , '}'
group ::= label , 'group' , camelIdent , '=' , intLit , messageBody
field ::= label , type , ident , '=' , intLit , ( '[' , fieldOption , ( ',' , fieldOption )* , ']' )? , ';'
fieldOption ::= optionBody / 'default' , '=' , constant
extensions ::= 'extensions' , extension , ( ',' , extension )* , ';'
extension ::= intLit , ( 'to' , ( intLit / 'max' ) )?
label ::= 'required' / 'optional' / 'repeated'
type ::= 'double' / 'float' / 'int32' / 'int64' / 'uint32' / 'uint64' / 'sint32' / 'sint64' / 'fixed32' / 'fixed64' / 'sfixed32' / 'sfixed64' / 'bool' / 'string' / 'bytes' / userType
userType ::= '.'? , ident , ( '.' , ident )*
constant ::= ident / intLit / floatLit / strLit / boolLit
ident ::= [A-Za-z_],[A-Za-z0-9_]*
camelIdent ::= [A-Z],[\w_]*
intLit ::= decInt / hexInt / octInt
decInt ::= [1-9],[\d]*
hexInt ::= [0],[xX],[A-Fa-f0-9]+
octInt ::= [0],[0-7]+
floatLit ::= [\d]+ , [\.\d+]?
boolLit ::= 'true' / 'false'
strLit ::= quote ,( hexEscape / octEscape / charEscape / [^\0\n] )* , quote
quote ::= ['']
hexEscape ::= [\\],[Xx],[A-Fa-f0-9]
octEscape ::= [\\0]? ,[0-7]
charEscape ::= [\\],[abfnrtv\\\?'']
And this is the python code that I am experimenting with:
from simpleparse.parser import Parser
from pprint import pprint
protoGrammar = ""
protoInput = ""
protoGrammarRoot = "proto"
with open ("proto_grammar.ebnf", "r") as grammarFile:
    protoGrammar=grammarFile.read()
with open("sample.proto", "r") as protoFile:
    protoInput = protoFile.read().replace('\n', '')
parser = Parser(protoGrammar,protoGrammarRoot)
success, resultTree, newCharacter = parser.parse(protoInput)
pprint(protoInput)
pprint(success)
pprint(resultTree)
pprint(newCharacter)
and this is the proto file that I am trying to parse:
message AmbiguousMsg {
optional string mypack_ambiguous_msg = 1;
optional string mypack_ambiguous_msg1 = 1;
}
I get the output as
1
[]
0
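One possible reading of this output, offered as an assumption rather than something confirmed in the thread: the root rule proto is a zero-or-more repetition, so it can succeed without consuming anything, and the grammar has no rule that consumes the spaces between tokens, so the very first message fails at the space after the keyword. A success flag of 1, an empty tree and a next position of 0 would fit that. The sketch below uses the standard re module, not SimpleParse, purely to show the effect; STRICT, LENIENT and consumed are made-up names.

```python
import re

# Zero-or-more "message" blocks with no whitespace allowed between tokens
STRICT = re.compile(r"(?:message[A-Za-z_]\w*\{.*?\})*")
# The same repetition, but with whitespace consumed explicitly
LENIENT = re.compile(r"(?:\s*message\s+[A-Za-z_]\w*\s*\{.*?\}\s*)*", re.S)

def consumed(pattern, text):
    """Return how many characters the pattern matched from the start."""
    match = pattern.match(text)  # always succeeds: zero repetitions is valid
    return match.end()

text = "message AmbiguousMsg { optional string a = 1; }"
print(consumed(STRICT, text), len(text))
print(consumed(LENIENT, text), len(text))
```

A practical check along the same lines is to compare the third value returned by parser.parse with len(protoInput): if it is smaller, the parse stopped early even though it reported success.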

I am new to Python but I came up with this, although I am not entirely sure of your output format. Hopefully this will point you in the right direction. Feel free to modify the code below to suit your requirements.
#!/usr/bin/python
# (c) 2015 enthusiasticgeek for StackOverflow. Use the code in anyway you want but leave credits intact. Also use this code at your own risk. I do not take any responsibility for your usage - blame games and trolls will strictly *NOT* be tolerated.
import re
#data_types=['string','bool','enum','int32','uint32','int64','uint64','sint32','sint64','bytes','string','fixed32','sfixed32','float','fixed64','sfixed64','double']
#function # 1
#Generate list of units in the brackets
#================ tokens based on braces ====================
def find_balanced_braces(args):
    parts = []
    for arg in args:
        if '{' not in arg:
            continue
        chars = []
        n = 0
        for c in arg:
            if c == '{':
                if n > 0:
                    chars.append(c)
                n += 1
            elif c == '}':
                n -= 1
                if n > 0:
                    chars.append(c)
                elif n == 0:
                    parts.append(''.join(chars).lstrip().rstrip())
                    chars = []
            elif n > 0:
                chars.append(c)
    return parts
#function # 2
#================ Retrieve Nested Levels ====================
def find_nested_levels(test, count_level):
    count_level=count_level+1
    level = find_balanced_braces(test)
    if not bool(level):
        return count_level-1
    else:
        return find_nested_levels(level,count_level)
#function # 3
#================ Process Nested Levels ====================
def process_nested_levels(test, count_level):
    count_level=count_level+1
    level = find_balanced_braces(test)
    print "===== Level = " + str(count_level) + " ====="
    for i in range(len(level)):
        #print level[i] + "\n"
        exclusive_level_messages = ''.join(level[i]).split("message")[0]
        exclusive_level_messages_tokenized = ''.join(exclusive_level_messages).split(";")
        #print exclusive_level_messages + "\n"
        for j in range(len(exclusive_level_messages_tokenized)):
            pattern = exclusive_level_messages_tokenized[j].lstrip()
            print pattern
            #match = "\message \s*(.*?)\s*\{"+pattern
            #match_result = re.findall(match, level[i])
            #print match_result
    print "===== End Level ====="
    if not bool(level):
        return count_level-1
    else:
        return process_nested_levels(level,count_level)
#============================================================
#=================================================================================
test_string=("message a{ optional string level-i1.l1.1 = 1 [default = \"/\"]; "
"message b{ required bool level-i1.l2.1 = 1; required fixed32 level-i1.l2.1 = 2; "
"message c{ required string level-i1.l3.1 = 1; } "
"} "
"} "
"message d{ required uint64 level-i2.l1.1 = 1; required double level-i2.l1.2 = 2; "
"message e{ optional double level-i2.l2.1 = 1; "
"message f{ optional fixed64 level-i2.l3.1 = 1; required fixed32 level-i2.l3.2 = 2; "
"message g{ required bool level-i2.l4.1 = 2; } "
"} "
"} "
"} "
"message h{ required uint64 level-i3.l1.1 = 1; required double level-i3.l1.2 = 2; }")
#Right now I do not see point in replacing \n with blank space
with open ("fileproto.proto", "r") as myfile:
    data=myfile.read().replace('\n', '\n')
print data
count_level=0
#replace 'data' in the following line with 'test_string' for tests
nested_levels=process_nested_levels([data],count_level)
print "Total count levels depth = " + str(nested_levels)
print "========================\n"
My output looks as follows
// This defines protocol for a simple server that lists files.
//
// See also the nanopb-specific options in fileproto.options.
message ListFilesRequest {
optional string path = 1 [default = "/"];
}
message FileInfo {
required uint64 inode = 1;
required string name = 2;
}
message ListFilesResponse {
optional bool path_error = 1 [default = false];
repeated FileInfo file = 2;
}
===== Level = 1 =====
optional string path = 1 [default = "/"]
required uint64 inode = 1
required string name = 2
optional bool path_error = 1 [default = false]
repeated FileInfo file = 2
===== End Level =====
===== Level = 2 =====
===== End Level =====
Total count levels depth = 1
========================
NOTE: After print pattern you can tokenize further if necessary by taking pattern as an input. I have left one regex example commented out.

pyparsing: grammar for list of Dictionaries (erlang)

I'm trying to build a grammar to parse an Erlang tagged tuple list and map it to a Dict in pyparsing. I'm having problems when I have a list of Dicts. The grammar works if the Dict has just one element, but when I add a second I can't work out how to get it to parse.
Current (simplified) grammar code (I removed the bits of the language not necessary in this case):
#!/usr/bin/env python2.7
from pyparsing import *
# Erlang config file definition:
erlangAtom = Word( alphas + '_')
erlangString = dblQuotedString.setParseAction( removeQuotes )
erlangValue = Forward()
erlangList = Forward()
erlangElements = delimitedList( erlangValue )
erlangCSList = Suppress('[') + erlangElements + Suppress(']')
erlangList <<= Group( erlangCSList )
erlangTaggedTuple = Group( Suppress('{') + erlangAtom + Suppress(',') +
erlangValue + Suppress('}') )
erlangDict = Dict( Suppress('[') + delimitedList( erlangTaggedTuple ) +
Suppress(']') )
erlangValue <<= ( erlangAtom | erlangString |
erlangTaggedTuple |
erlangDict | erlangList )
if __name__ == "__main__":
    working = """
    [{foo,"bar"}, {baz, "bar2"}]
    """
    broken = """
    [
        [{foo,"bar"}, {baz, "bar2"}],
        [{foo,"bob"}, {baz, "fez"}]
    ]
    """
    w = erlangValue.parseString(working)
    print w.dump()
    b = erlangValue.parseString(broken)
    print "b[0]:", b[0].dump()
    print "b[1]:", b[1].dump()
This gives:
[['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[0]: [['foo', 'bar'], ['baz', 'bar2'], ['foo', 'bob'], ['baz', 'fez']]
- baz: fez
- foo: bob
b[1]:
Traceback (most recent call last):
File "./erl_testcase.py", line 39, in <module>
print "b[1]:", b[1].dump()
File "/Library/Python/2.7/site-packages/pyparsing.py", line 317, in __getitem__
return self.__toklist[i]
IndexError: list index out of range
i.e. working works, but broken doesn't parse as two lists.
Any ideas?
Edit: Tweaked testcase to be more explicit about expected output.

Ok, so I have never worked with pyparsing before, so excuse me if my solution does not make sense. Here we go:
As far as I understand what you need is three main structures. The most common mistake you made was grouping delimitedLists. They are already grouped, so you have an issue of double grouping. Here are my definitions:
for {a,"b"}:
erlangTaggedTuple = Dict(Group(Suppress('{') + erlangAtom + Suppress(',') + erlangValue + Suppress('}') ))
for [{a,"b"}, {c,"d"}]:
erlangDict = Suppress('[') + delimitedList( erlangTaggedTuple ) + Suppress(']')
for the rest:
erlangList <<= Suppress('[') + delimitedList( Group(erlangDict|erlangList) ) + Suppress(']')
So my fix for your code is:
#!/usr/bin/env python2.7
from pyparsing import *
# Erlang config file definition:
erlangAtom = Word( alphas + '_')
erlangString = dblQuotedString.setParseAction( removeQuotes )
erlangValue = Forward()
erlangList = Forward()
erlangTaggedTuple = Dict(Group(Suppress('{') + erlangAtom + Suppress(',') +
erlangValue + Suppress('}') ))
erlangDict = Suppress('[') + delimitedList( erlangTaggedTuple ) + Suppress(']')
erlangList <<= Suppress('[') + delimitedList( Group(erlangDict|erlangList) ) + Suppress(']')
erlangValue <<= ( erlangAtom | erlangString |
erlangTaggedTuple |
erlangDict| erlangList )
if __name__ == "__main__":
    working = """
    [{foo,"bar"}, {baz, "bar2"}]
    """
    broken = """
    [
        [{foo,"bar"}, {baz, "bar2"}],
        [{foo,"bob"}, {baz, "fez"}]
    ]
    """
    w = erlangValue.parseString(working)
    print w.dump()
    b = erlangValue.parseString(broken)
    print "b[0]:", b[0].dump()
    print "b[1]:", b[1].dump()
Which gives the output:
[['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[0]: [['foo', 'bar'], ['baz', 'bar2']]
- baz: bar2
- foo: bar
b[1]: [['foo', 'bob'], ['baz', 'fez']]
- baz: fez
- foo: bob
Hope that helps, cheers!

I can't understand why it's not working, because your code looks very much like the JSON example, which handles nested lists just fine.
But the problem seems to happen at this line
erlangElements = delimitedList( erlangValue )
where if the erlangValues are lists, they get appended instead of cons'd. You can kludge around this with
erlangElements = delimitedList( Group(erlangValue) )
which adds an extra layer of list around the top-most element, but keeps your sub-lists from merging.
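The difference between "appended" and "cons'd" here is the same as the difference between list.extend and list.append in plain Python. The sketch below is only an analogy on ordinary lists and says nothing about pyparsing internals; rows, merge_flat and keep_grouped are made-up names.

```python
rows = [
    [("foo", "bar"), ("baz", "bar2")],
    [("foo", "bob"), ("baz", "fez")],
]

def merge_flat(sublists):
    """Ungrouped behaviour: every pair lands in one flat list."""
    merged = []
    for sub in sublists:
        merged.extend(sub)
    return merged

def keep_grouped(sublists):
    """Grouped behaviour: each sub-list stays a separate element."""
    grouped = []
    for sub in sublists:
        grouped.append(list(sub))
    return grouped

flat = merge_flat(rows)
grouped = keep_grouped(rows)
print(len(flat), dict(flat))                    # later keys overwrite earlier ones
print(len(grouped), [dict(g) for g in grouped])
```

The flat version ends up with four pairs in a single element and the keys of the second dict winning, which mirrors the b[0] dump in the question; the grouped version keeps two separate dicts.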
