Any python module for customized BNF parser?

Any python module for customized BNF parser? - python

friends.
I have a 'make'-like style file needed to be parsed. The grammar is something like:
samtools=/path/to/samtools
picard=/path/to/picard
task1:
des: description
path: /path/to/task1
para: [$global.samtools,
$args.input,
$path
]
task2: task1
Where $global contains the variables defined in a global scope. $path is a 'local' variable. $args contains the key/pair values passed in by users.
I would like to parse this file by some python libraries. Better to return some parse tree. If there are some errors, better to report them. I found this one: CodeTalker and yeanpypa. Can they be used in this case? Any other recommendations?

I had to guess what your makefile structure allows based on your example, but this should get you close:
from pyparsing import *
# elements of the makefile are delimited by line, so we must
# define skippable whitespace to include just spaces and tabs
ParserElement.setDefaultWhitespaceChars(' \t')
NL = LineEnd().suppress()
EQ,COLON,LBRACK,RBRACK = map(Suppress, "=:[]")
identifier = Word(alphas+'_', alphanums)
symbol_assignment = Group(identifier("name") + EQ + empty +
restOfLine("value"))("symbol_assignment")
symbol_ref = Word("$",alphanums+"_.")
def only_column_one(s,l,t):
if col(l,s) != 1:
raise ParseException(s,l,"not in column 1")
# task identifiers have to start in column 1
task_identifier = identifier.copy().setParseAction(only_column_one)
task_description = "des:" + empty + restOfLine("des")
task_path = "path:" + empty + restOfLine("path")
task_para_body = delimitedList(symbol_ref)
task_para = "para:" + LBRACK + task_para_body("para") + RBRACK
task_para.ignore(NL)
task_definition = Group(task_identifier("target") + COLON +
Optional(delimitedList(identifier))("deps") + NL +
(
Optional(task_description + NL) &
Optional(task_path + NL) &
Optional(task_para + NL)
)
)("task_definition")
makefile_parser = ZeroOrMore(
symbol_assignment |
task_definition |
NL
)
if __name__ == "__main__":
test = """\
samtools=/path/to/samtools
picard=/path/to/picard
task1:
des: description
path: /path/to/task1
para: [$global.samtools,
$args.input,
$path
]
task2: task1
"""
# dump out what we parsed, including results names
for element in makefile_parser.parseString(test):
print element.getName()
print element.dump()
print
Prints:
symbol_assignment
['samtools', '/path/to/samtools']
- name: samtools
- value: /path/to/samtools
symbol_assignment
['picard', '/path/to/picard']
- name: picard
- value: /path/to/picard
task_definition
['task1', 'des:', 'description ', 'path:', '/path/to/task1 ', 'para:',
'$global.samtools', '$args.input', '$path']
- des: description
- para: ['$global.samtools', '$args.input', '$path']
- path: /path/to/task1
- target: task1
task_definition
['task2', 'task1']
- deps: ['task1']
- target: task2
The dump() output shows you what names you can use to get at the fields within the parsed elements, or to distinguish what kind of element you have. dump() is a handy, generic tool to output whatever pyparsing has parsed. Here is some code that is more specific to your particular parser, showing how to use the field names as either dotted object references (element.target, element.deps, element.name, etc.) or dict-style references (element[key]):
for element in makefile_parser.parseString(test):
if element.getName() == 'task_definition':
print "TASK:", element.target,
if element.deps:
print "DEPS:(" + ','.join(element.deps) + ")"
else:
print
for key in ('des', 'path', 'para'):
if key in element:
print " ", key.upper()+":", element[key]
elif element.getName() == 'symbol_assignment':
print "SYM:", element.name, "->", element.value
prints:
SYM: samtools -> /path/to/samtools
SYM: picard -> /path/to/picard
TASK: task1
DES: description
PATH: /path/to/task1
PARA: ['$global.samtools', '$args.input', '$path']
TASK: task2 DEPS:(task1)

I've used pyparsing in the past and been immensely pleased with it (q.v., the pyparsing project site).

Related

How to parse and search a list of inter-dependent strings?

Not sure how to word the title but I have this list:
Sink Input #1535
Driver: protocol-native.c
Owner Module: 10
Client: 21932
Sink: 0
Sample Specification: s16le 2ch 44100Hz
Channel Map: front-left,front-right
Format: pcm, format.sample_format = "\"s16le\"" format.rate = "44100" format.channels = "2" format.channel_map = "\"front-left,front-right\""
Corked: no
Mute: no
Volume: front-left: 32768 / 50% / -18.06 dB, front-right: 32768 / 50% / -18.06 dB
balance 0.00
Buffer Latency: 0 usec
Sink Latency: 23084 usec
Resample method: n/a
Properties:
media.name = "Simple DirectMedia Layer"
application.name = "ffplay"
with a whole bunch of other stuff following.
First I need to match on Input Sink# and record the following digits until end of line. Then I have to search on application.name = and record the program name that follows in quotes. Then the search has to repeat for multiple sinks and program names. Later I plan to return all Input Sink numbers for a given application name.
Current method uses brute force and high system resources. Is there a better method than this:
def sink_list(prog,func):
''' Return list of Firefox or ffplay input sinks indices
'''
indices = []
result = os.popen('pactl list short sink-inputs') \
.read().strip().splitlines()
# TODO: We could be doing one os.popen and grabbing all sinks at once
if len(result) == 0:
print('sink_list() found no input sinks at all.' \
' Called by: '+func)
return indices
for line in result:
sink = line.split('\t')[0]
app = os.popen('pactl list sink-inputs | grep "Sink Input #' + \
sink + '" -A20 | grep application.name').read()
# print("Searching for:",prog," in:",app," using input sink#:",sink)
if prog in app:
indices.append(sink)
# print('indices',prog,':',indices)
if len(indices) == 0:
print("sink_list() found no input sink for: '" + prog + \
"' called by: "+func)
return indices
# print("Found Input Sinks:", indices)
return indices
Reply to comments
Input was requested:
''' Get old PID's and Input Sinks before ffplay '''
old_pid = pid_list( "ffplay", "play_start()" )
old_sink = sink_list( "ffplay", "play_start()" )
self.have_ffplay_input_sink = False # Each ffplay can have diff #
# Launch ffplay in the background. CANNOT query result, it stops bkgrnd
os.popen('ffplay -autoexit ' + '"' + self.current_song_path + '"' \
+ ' -nodisp 2>' + TMP_CURR_SONG + ' &')
''' Get New PID's and Input Sinks for ffplay '''
# Give time for `ffplay` to create pulseaudio sink.
root.after(100) # THIS IS UGLY, root.after is machine dependent!!!
if not self.top2_is_active: return # Play window closed?
new_pid = pid_list("ffplay", "play_start()")
new_sink = sink_list("ffplay", "play_start()")
self.top2_ffplay_pid = list_diff(new_pid, old_pid, "play_start()")
self.top2_ffplay_sink = list_diff(new_sink, old_sink, "play_start()")

I'll answer my own question in case it helps others.
This is the function I wrote which returns original requirements plus current volume:
def sink_master():
all_lines = []
all_lines = os.popen('pactl list sink-inputs').read().splitlines()
all_sinks = []
in_sink = False
in_volume = False
for line in all_lines:
if in_sink is False and "Sink Input #" in line:
this_sink = line.split('#')[1]
in_sink = True
continue
if in_sink is True and in_volume is False and "Volume:" in line:
this_volume = line.split('/')[1]
this_volume = this_volume.replace(' ','')
this_volume = this_volume.replace('%','')
in_volume = True
continue
if in_sink is True and in_volume is True and "tion.name =" in line:
this_name = line.split('=')[1]
this_name = this_name.replace(' ','')
this_name = this_name.replace('"','')
in_sink = False
in_volume = False
all_sinks.append(tuple((this_sink,this_volume,this_name)))
continue
print(all_sinks)
return all_sinks
When you run it it returns a list of tuples:
[('1828', '100', 'Firefox'), ('1891', '50', 'ffplay'), ('1907', '100', 'ffplay')]
Each tuple contains:
Input Sink # used by pulseaudio (respected by ffplay)
Current volume (with spaces and % stripped)
Application name (with double quotes " stripped)

Parsing dbus monitor output messages

I am trying to parse the dbus monitor output messages. It has most of the messages as multi-line entries(including parameters). I need to parse and concatenate individual log messages to a single line entry.
The dbus-monitor output messages appear as below,
method call time=462.117843 sender=:1.62 -> destination=org.freedesktop.filehandler serial=122 path=/org/freedesktop/filehandler/routing; interface=org.freedesktop.filehandler.routing; member=start
int16 29877
uint16 0
method return time=462.117844 sender=org.freedesktop.filehandler -> destination=:1.62 serial=2210 reply_serial=122
int16 29877
uint16 0
method call time=462.117845 sender=:1.62 -> destination=org.freedesktop.filehandler serial=123 path=/org/freedesktop/filehandler/routing; interface=org.freedesktop.filehandler.routing; member=comment
string "starting .."
string "routing"
method return time=462.117846 sender=:1.19 -> destination=:1.62 serial=2212 reply_serial=123
int12 -23145
signal time=463.11223 sender=:1.64 -> destination=(null destination) serial=124 path=/org/freedesktop/fileserver; interface=org.freedesktop.DBus.Properties; member=PropertiesChanged
string "com.freedesktop.Systemserver"
array[
dict entry(
string "SystemTime"
variant struct{
byte 12
byte 9
byte 0
}
)
]
array [
]
This is the regex I tried to group the dbus messages(Parameter not grouped),
\b(signal|method call|method return)\b time=([\d,.]*) sender=([\w,.,:,(,), ]*) -> destination=([\w,.,:,(,), ]*) serial=([(,),\w]*) (?:path=([\w,\/]*); interface=([\w,.]*); member=([\w,_,-]*))?(?:reply_serial=([\d]*))?
I expect the output in the below format,
C [sender,serial] path interface+member (parameter1, parameter2, ...)
R [destination,reply_serial] interface+member (parameter1, parameter2, ...)
S [sender, serial] path interface+member (parameter1, parameter2, ...)
A sample output for the above dbus-monitor messages is shown below,
C [:1.62,122] /org/freedesktop/filehandler/routing org.freedesktop.filehandler.routing.start (29877,0)
R [:1.62,122] org.freedesktop.filehandler.routing.start (29877,0)
C [:1.62,123] /org/freedesktop/filehandler/routing org.freedesktop.filehandler.routing.comment ("starting", "routing")
R [:1.62,123] org.freedesktop.filehandler.routing.comment (-23145)
S [:1.64, 124] /org/freedesktop/fileserver org.freedesktop.DBus.Properties.PropertiesChanged ("com.freedesktop.Systemserver"[("SystemTime",{12,9,0})][])
How can the above expected result be achieved when the entries are usually multi-line? Also, the SIGNALS has multiple encapsulations making it difficult to access the parameters. Can someone help with the parsing of these dbus messages to the expected format?

Can you suggest how the code can be rewritten to process line by line?
Here I rearranged it accordingly:
import re
import sys
regex = r'\b(signal|method call|method return)\b time=([\d,.]*) sender=([\w,.,:,(,), ]*) -> destination=([\w,.,:,(,), ]*) serial=([(,),\w]*) (?:path=([\w,\/]*); interface=([\w,.]*); member=([\w,_,-]*))?(?:reply_serial=([\d]*))?'
remember = dict()
sep = None
for line in open('dbusl.in'):
m = re.match(regex, line)
if m:
if sep is not None: print ")" # end the previous parameter group
m = list(m.groups()) # each match is 9 capturing groups
if m[0] == 'method call':
print "C [{2},{4}] {5} {6}.{7}".format(*m),
remember[m[4]] = m[6:8] # store interface+member for return
if m[0] == 'method return':
m[6:8] = remember.pop(m[8]) # recall stored interface+member
print "R [{3},{8}] {6}.{7}".format(*m),
if m[0] == 'signal':
print "S [{2}, {4}] {5} {6}.{7}".format(*m),
sep = "("
else:
p = line.rstrip() # now handle parameters
if p[-1] in "[](){}": # with "encapsulations":
p = p[-1] # delete spaces, "array", "dict ..."
p = re.sub('^\s*\w*\s*', '', p) # delete spaces and data type
if p[-1] in "])}":
sep = '' # no separator before closing
print sep+p,
sys.stdout.softspace=0
if p[-1] in "[](){}": sep = ''
else: sep = ', ' # separator after data item
print ")" # end the previous parameter group
Note that I also changed m[6:8] = remember[m[8]] to m[6:8] = remember.pop(m[8]) in order to free the memory of no longer needed interface+member data.

If you absolutely have to use dbus-monitor, it’s probably best to use its PCAP output mode by passing the --pcap option to it. That outputs in a well-documented structured format which can be read by libpcap.

As you already have a usable regex, you can build on it by using it with re.split to get the needed message parts. Note that this yields a separate string for each capture group plus one string with the parameters, for each message entry. This example assumes that all the messages are in the string messages:
import re
import sys
regex = r'\b(signal|method call|method return)\b time=([\d,.]*) sender=([\w,.,:,(,), ]*) -> destination=([\w,.,:,(,), ]*) serial=([(,),\w]*) (?:path=([\w,\/]*); interface=([\w,.]*); member=([\w,_,-]*))?(?:reply_serial=([\d]*))?'
m = re.split(regex, messages)
m = m[1:] # discard empty? text before first match
remember = dict()
while m: # each match group is 9 capturing groups + 1 parameter group
if m[0] == 'method call':
print "C [{2},{4}] {5} {6}.{7}".format(*m),
remember[m[4]] = m[6:8] # store interface+member for return
if m[0] == 'method return':
m[6:8] = remember[m[8]] # recall stored interface+member
print "R [{3},{8}] {6}.{7}".format(*m),
if m[0] == 'signal':
print "S [{2}, {4}] {5} {6}.{7}".format(*m),
# now handle parameters
sep = "("
for p in m[9].split('\n')[1:-1]: # except empty string at start and end
if p[-1] in "[](){}": # with "encapsulations":
p = p[-1] # delete spaces, "array", "dict ..."
p = re.sub('^\s*\w*\s*', '', p) # delete spaces and data type
if p[-1] in "])}":
sep = '' # no separator before closing
print sep+p,
sys.stdout.softspace=0
if p[-1] in "[](){}": sep = ''
else: sep = ', ' # separator after data item
print ")"
m = m[10:] # delete the processed match group of 10
The output with your sample data is:
C [:1.62,122] /org/freedesktop/filehandler/routing org.freedesktop.filehandler.routing.start (29877, 0)
R [:1.62,122] org.freedesktop.filehandler.routing.start (29877, 0)
C [:1.62,123] /org/freedesktop/filehandler/routing org.freedesktop.filehandler.routing.comment ("starting ..", "routing")
R [:1.62,123] org.freedesktop.filehandler.routing.comment (-23145)
S [:1.64, 124] /org/freedesktop/fileserver org.freedesktop.DBus.Properties.PropertiesChanged ("com.freedesktop.Systemserver", [("SystemTime", {12, 9, 0})][])

The python operation database error

I use python operation postgresql database, the implementation of sql, it removed the quotation marks, resulting in inquiries failed, how to avoid?
def build_sql(self,table_name,keys,condition):
print(condition)
# condition = {
# "os":["Linux","Windows"],
# "client_type":["ordinary"],
# "client_status":'1',
# "offset":"1",
# "limit":"8"
# }
sql_header = "SELECT %s FROM %s" % (keys,table_name)
sql_condition = []
sql_range = []
sql_sort = []
sql_orederby = []
for key in condition:
if isinstance(condition[key],list):
sql_condition.append(key+" in ("+",".join(condition[key])+")")
elif key == 'limit' or key == 'offset':
sql_range.append(key + " " + condition[key])
else:
sql_condition.append(key + " = " + condition[key])
print(sql_condition)
print(sql_range)
sql_condition = [str(i) for i in sql_condition]
if not sql_condition == []:
sql_condition = " where " + " and ".join(sql_condition) + " "
sql = sql_header + sql_condition + " ".join(sql_range)
return sql
Error：
MySQL Error Code : column "winxp" does not exist
LINE 1: ...T * FROM ksc_client_info where base_client_os in (WinXP) and...

Mind you I do not have much Python experience, but basically you don't have single quotes in that sequence, so you either need to add those before passing it to function or for example during join(), like that:
sql_condition.append(key+" in ("+"'{0}'".format("','".join(condition[key]))+")")
You can see other solutions in those questions:
Join a list of strings in python and wrap each string in quotation marks
Add quotes to every list elements

Get correct brace grouping from string

I have files with incorrect JSON that I want to start fixing by getting it into properly grouped chunks.
The brace grouping {{ {} {} } } {{}} {{{}}} should already be correct
How can I grab all the top-level braces, correctly grouped, as separate strings?

If you don't want to install any extra modules simple function will do:
def top_level(s):
depth = 0
start = -1
for i, c in enumerate(s):
if c == '{':
if depth == 0:
start = i
depth += 1
elif c == '}' and depth:
depth -= 1
if depth == 0:
yield s[start:i+1]
print(list(top_level('{{ {} {} } } {{}} {{{}}}')))
Output:
['{{ {} {} } }', '{{}}', '{{{}}}']
It will skip invalid braces but could be easily modified to report an error when they are spotted.

Using the regex module:
In [1]: import regex
In [2]: braces = regex.compile(r"\{(?:[^{}]++|(?R))*\}")
In [3]: braces.findall("{{ {} {} } } {{}} {{{}}}")
Out[3]: ['{{ {} {} } }', '{{}}', '{{{}}}']

pyparsing can be really helpful here. It will handle pathological cases where you have braces inside strings, etc. It might be a little tricky to do all of this work yourself, but fortunately, somebody (the author of the library) has already done the hard stuff for us.... I'll reproduce the code here to prevent link-rot:
# jsonParser.py
#
# Implementation of a simple JSON parser, returning a hierarchical
# ParseResults object support both list- and dict-style data access.
#
# Copyright 2006, by Paul McGuire
#
# Updated 8 Jan 2007 - fixed dict grouping bug, and made elements and
# members optional in array and object collections
#
json_bnf = """
object
{ members }
{}
members
string : value
members , string : value
array
[ elements ]
[]
elements
value
elements , value
value
string
number
object
array
true
false
null
"""
from pyparsing import *
TRUE = Keyword("true").setParseAction( replaceWith(True) )
FALSE = Keyword("false").setParseAction( replaceWith(False) )
NULL = Keyword("null").setParseAction( replaceWith(None) )
jsonString = dblQuotedString.setParseAction( removeQuotes )
jsonNumber = Combine( Optional('-') + ( '0' | Word('123456789',nums) ) +
Optional( '.' + Word(nums) ) +
Optional( Word('eE',exact=1) + Word(nums+'+-',nums) ) )
jsonObject = Forward()
jsonValue = Forward()
jsonElements = delimitedList( jsonValue )
jsonArray = Group(Suppress('[') + Optional(jsonElements) + Suppress(']') )
jsonValue << ( jsonString | jsonNumber | Group(jsonObject) | jsonArray | TRUE | FALSE | NULL )
memberDef = Group( jsonString + Suppress(':') + jsonValue )
jsonMembers = delimitedList( memberDef )
jsonObject << Dict( Suppress('{') + Optional(jsonMembers) + Suppress('}') )
jsonComment = cppStyleComment
jsonObject.ignore( jsonComment )
def convertNumbers(s,l,toks):
n = toks[0]
try:
return int(n)
except ValueError, ve:
return float(n)
jsonNumber.setParseAction( convertNumbers )
Phew! That's a lot ... Now how do we use it? The general strategy here will be to scan the string for matches and then slice those matches out of the original string. Each scan result is a tuple of the form (lex-tokens, start_index, stop_index). For our use, we don't care about the lex-tokens, just the start and stop. We could do: string[result[1], result[2]] and it would work. We can also do string[slice(*result[1:])] -- Take your pick.
results = jsonObject.scanString(testdata)
for result in results:
print '*' * 80
print testdata[slice(*result[1:])]

Cannot parse correctly this file with pyparsing

I am trying to parse a file using the amazing python library pyparsing but I am having a lot of problems...
The file I am trying to parse is something like:
sectionOne:
list:
- XXitem
- XXanotherItem
key1: value1
product: milk
release: now
subSection:
skey : sval
slist:
- XXitem
mods:
- XXone
- XXtwo
version: last
sectionTwo:
base: base-0.1
config: config-7.0-7
As you can see is an indented configuration file, and this is more or less how I have tried to define the grammar
The file can have one or more sections
Each section is formed by a section name and a section content.
Each section have an indented content
Each section content can have one or more pairs of key/value or a subsection.
Each value can be just a single word or a list of items.
A list of items is a group of one or more items.
Each item is an HYPHEN + a name starting with 'XX'
I have tried to create this grammar using pyparsing but with no success.
import pprint
import pyparsing
NEWLINE = pyparsing.LineEnd().suppress()
VALID_CHARACTERS = pyparsing.srange("[a-zA-Z0-9_\-\.]")
COLON = pyparsing.Suppress(pyparsing.Literal(":"))
HYPHEN = pyparsing.Suppress(pyparsing.Literal("-"))
XX = pyparsing.Literal("XX")
list_item = HYPHEN + pyparsing.Combine(XX + pyparsing.Word(VALID_CHARACTERS))
list_of_items = pyparsing.Group(pyparsing.OneOrMore(list_item))
key = pyparsing.Word(VALID_CHARACTERS) + COLON
pair_value = pyparsing.Word(VALID_CHARACTERS) + NEWLINE
value = (pair_value | list_of_items)
pair = pyparsing.Group(key + value)
indentStack = [1]
section = pyparsing.Forward()
section_name = pyparsing.Word(VALID_CHARACTERS) + COLON
section_value = pyparsing.OneOrMore(pair | section)
section_content = pyparsing.indentedBlock(section_value, indentStack, True)
section << pyparsing.Group(section_name + section_content)
parser = pyparsing.OneOrMore(section)
def main():
try:
with open('simple.info', 'r') as content_file:
content = content_file.read()
print "content:\n", content
print "\n"
result = parser.parseString(content)
print "result1:\n", result
print "len", len(result)
pprint.pprint(result.asList())
except pyparsing.ParseException, err:
print err.line
print " " * (err.column - 1) + "^"
print err
except pyparsing.ParseFatalException, err:
print err.line
print " " * (err.column - 1) + "^"
print err
if __name__ == '__main__':
main()
This is the result :
result1:
[['sectionOne', [[['list', ['XXitem', 'XXanotherItem']], ['key1', 'value1'], ['product', 'milk'], ['release', 'now'], ['subSection', [[['skey', 'sval'], ['slist', ['XXitem']], ['mods', ['XXone', 'XXtwo']], ['version', 'last']]]]]]], ['sectionTwo', [[['base', 'base-0.1'], ['config', 'config-7.0-7']]]]]
len 2
[
['sectionOne',
[[
['list', ['XXitem', 'XXanotherItem']],
['key1', 'value1'],
['product', 'milk'],
['release', 'now'],
['subSection',
[[
['skey', 'sval'],
['slist', ['XXitem']],
['mods', ['XXone', 'XXtwo']],
['version', 'last']
]]
]
]]
],
['sectionTwo',
[[
['base', 'base-0.1'],
['config', 'config-7.0-7']
]]
]
]
As you can see I have two main problems:
1.- Each section content is nested twice into a list
2.- the key "version" is parsed inside the "subSection" when it belongs to the "sectionOne"
My real target is to be able to get a structure of python nested dictionaries with the keys and values to easily extract the info for each field, but the pyparsing.Dict is something obscure to me.
Could anyone please help me ?
Thanks in advance
( sorry for the long post )

You really are pretty close - congrats, indented parsers are not the easiest to write with pyparsing.
Look at the commented changes. Those marked with 'A' are changes to fix your two stated problems. Those marked with 'B' add Dict constructs so that you can access the parsed data as a nested structure using the names in the config.
The biggest culprit is that indentedBlock does some extra Group'ing for you, which gets in the way of Dict's name-value associations. Using ungroup to peel that away lets Dict see the underlying pairs.
Best of luck with pyparsing!
import pprint
import pyparsing
NEWLINE = pyparsing.LineEnd().suppress()
VALID_CHARACTERS = pyparsing.srange("[a-zA-Z0-9_\-\.]")
COLON = pyparsing.Suppress(pyparsing.Literal(":"))
HYPHEN = pyparsing.Suppress(pyparsing.Literal("-"))
XX = pyparsing.Literal("XX")
list_item = HYPHEN + pyparsing.Combine(XX + pyparsing.Word(VALID_CHARACTERS))
list_of_items = pyparsing.Group(pyparsing.OneOrMore(list_item))
key = pyparsing.Word(VALID_CHARACTERS) + COLON
pair_value = pyparsing.Word(VALID_CHARACTERS) + NEWLINE
value = (pair_value | list_of_items)
#~ A: pair = pyparsing.Group(key + value)
pair = (key + value)
indentStack = [1]
section = pyparsing.Forward()
section_name = pyparsing.Word(VALID_CHARACTERS) + COLON
#~ A: section_value = pyparsing.OneOrMore(pair | section)
section_value = (pair | section)
#~ B: section_content = pyparsing.indentedBlock(section_value, indentStack, True)
section_content = pyparsing.Dict(pyparsing.ungroup(pyparsing.indentedBlock(section_value, indentStack, True)))
#~ A: section << Group(section_name + section_content)
section << (section_name + section_content)
#~ B: parser = pyparsing.OneOrMore(section)
parser = pyparsing.Dict(pyparsing.OneOrMore(pyparsing.Group(section)))
Now instead of pprint(result.asList()) you can write:
print (result.dump())
to show the Dict hierarchy:
[['sectionOne', ['list', ['XXitem', 'XXanotherItem']], ... etc. ...
- sectionOne: [['list', ['XXitem', 'XXanotherItem']], ... etc. ...
- key1: value1
- list: ['XXitem', 'XXanotherItem']
- mods: ['XXone', 'XXtwo']
- product: milk
- release: now
- subSection: [['skey', 'sval'], ['slist', ['XXitem']]]
- skey: sval
- slist: ['XXitem']
- version: last
- sectionTwo: [['base', 'base-0.1'], ['config', 'config-7.0-7']]
- base: base-0.1
- config: config-7.0-7
allowing you to write statements like:
print (result.sectionTwo.base)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Any python module for customized BNF parser? - python

I've used pyparsing in the past and been immensely pleased with it (q.v., the pyparsing project site).

Related

How to parse and search a list of inter-dependent strings?

Parsing dbus monitor output messages

The python operation database error

Get correct brace grouping from string

Cannot parse correctly this file with pyparsing

Categories

Resources