Let us assume we have the following string
string = """
object obj1{
attr1 value1;
object obj2 {
attr2 value2;
}
}
object obj3{
attr3 value3;
attr4 value4;
}
"""
There is a nested object, and we use Forward to parse this.
from pyparsing import *
word = Word(alphanums)
attribute = word.setResultsName("name")
value = word.setResultsName("value")
object_grammar = Forward()
attributes = attribute + value + Suppress(";") + LineEnd().suppress()
object_type = Suppress("object ") + word.setResultsName("object_type") + Suppress('{') + LineEnd().suppress()
object_grammar <<= object_type+\
OneOrMore(attributes|object_grammar) + Suppress("}") | Suppress("};")
for i, (obj, _, _) in enumerate(object_grammar.scanString(string)):
    print('\n')
    print('Enumerating over object {}'.format(i))
    print('\n')
    print('This is the object type {}'.format(obj.object_type))
    print(obj.asXML())
    print(obj.asDict())
    print(obj.asList())
    print(obj)
    print(obj.dump())
These are the results. The obj.asXML() function contains all the information, however since it has been flattened, the order of the information is essential to parsing the result. Is this the best way to do it? I must be missing something. I would like a solution that works for both nested and not nested objects, i.e. for obj1, obj2 and obj3.
Also, setResultsName('object_type') doesn't return the object_type for the parent object. The output of the program above is shown below. Any suggestions?
Enumerating over object 0
This is the object type obj2
<ITEM>
<object_type>obj1</object_type>
<name>attr1</name>
<value>value1</value>
<object_type>obj2</object_type>
<name>attr2</name>
<value>value2</value>
</ITEM>
{'object_type': 'obj2', 'name': 'attr2', 'value': 'value2'}
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
['obj1', 'attr1', 'value1', 'obj2', 'attr2', 'value2']
- name: attr2
- object_type: obj2
- value: value2
Enumerating over object 1
This is the object type obj3
<ITEM>
<object_type>obj3</object_type>
<name>attr3</name>
<value>value3</value>
<name>attr4</name>
<value>value4</value>
</ITEM>
{'object_type': 'obj3', 'name': 'attr4', 'value': 'value4'}
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
['obj3', 'attr3', 'value3', 'attr4', 'value4']
- name: attr4
- object_type: obj3
- value: value4
While you have successfully processed your input string, let me suggest some refinements to your grammar.
When defining a recursive grammar, one usually wants to maintain some structure in the output results. In your case, the logical piece to structure is the content of each object, which is surrounded by opening and closing braces. Conceptually:
object_content = '{' + ZeroOrMore(attribute_defn | object_defn) + '}'
Then the supporting expressions are just (still conceptually):
attribute_defn = identifier + attribute_value + ';'
object_defn = 'object' + identifier + object_content
The actual Python/pyparsing for this looks like:
from pyparsing import *

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
word = Word(alphas, alphanums)
attribute = word
# expand to include other values if desired, such as ints, reals, strings, etc.
attribute_value = word
attributeDefn = Group(attribute("name") + attribute_value("value") + SEMI)
OBJECT = Keyword("object")
object_header = OBJECT + word("object_name")
object_grammar = Forward()
object_body = Group(LBRACE
                    + ZeroOrMore(object_grammar | attributeDefn)
                    + RBRACE)
object_grammar <<= Group(object_header + object_body("object_content"))
Group does two things for us: it structures the results into sub-objects; and it keeps the results names at one level from stepping on those at a different level (so no need for listAllMatches).
Now instead of scanString, you can just process your input using OneOrMore:
print(OneOrMore(object_grammar).parseString(string).dump())
Giving:
[['object', 'obj1', [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]], ['object', 'obj3', [['attr3', 'value3'], ['attr4', 'value4']]]]
[0]:
['object', 'obj1', [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]]
- object_content: [['attr1', 'value1'], ['object', 'obj2', [['attr2', 'value2']]]]
[0]:
['attr1', 'value1']
- name: attr1
- value: value1
[1]:
['object', 'obj2', [['attr2', 'value2']]]
- object_content: [['attr2', 'value2']]
[0]:
['attr2', 'value2']
- name: attr2
- value: value2
- object_name: obj2
- object_name: obj1
[1]:
['object', 'obj3', [['attr3', 'value3'], ['attr4', 'value4']]]
- object_content: [['attr3', 'value3'], ['attr4', 'value4']]
[0]:
['attr3', 'value3']
- name: attr3
- value: value3
[1]:
['attr4', 'value4']
- name: attr4
- value: value4
- object_name: obj3
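To make use of that grouped structure, a small recursive helper can walk the results by name instead of by position. This is only a sketch: to_dict and the "attributes"/"children" keys are hypothetical names chosen for illustration, and the grammar is repeated so the snippet runs on its own.

```python
from pyparsing import (Forward, Group, Keyword, OneOrMore, Suppress, Word,
                       ZeroOrMore, alphanums, alphas)

LBRACE, RBRACE, SEMI = map(Suppress, "{};")
word = Word(alphas, alphanums)
attribute_defn = Group(word("name") + word("value") + SEMI)
object_grammar = Forward()
object_body = Group(LBRACE + ZeroOrMore(object_grammar | attribute_defn) + RBRACE)
object_grammar <<= Group(Keyword("object") + word("object_name")
                         + object_body("object_content"))


def to_dict(obj):
    """Hypothetical helper: convert one parsed object group into a plain dict."""
    attrs, children = {}, []
    for item in obj["object_content"]:
        if "object_name" in item:
            # a nested object: recurse into it
            children.append(to_dict(item))
        else:
            # an attribute group carrying 'name' and 'value'
            attrs[item["name"]] = item["value"]
    return {"name": obj["object_name"], "attributes": attrs, "children": children}


text = "object obj1{ attr1 value1; object obj2 { attr2 value2; } }"
tree = [to_dict(o) for o in OneOrMore(object_grammar).parseString(text)]
print(tree)
```

Because every object and attribute sits in its own Group, the helper never has to rely on token order or zip names with values.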
I started to just make simple changes to your code, but there was a fatal flaw in your original. Your parser separated the left and right braces into two separate expressions - while this "works", it defeats the ability to define the group structure of the results.
I was able to work around this by using listAllMatches=True in the setResultsName function. This gave me an asXML() result with a structure that I could retrieve information from. It still relies on the order of the XML and requires using zip to get the name and value of an attribute together. I'll leave this question open to see if I get a better way of doing this.
We use the https://ltb-project.org/documentation/openldap-noopsrch.html overlay on OpenLDAP.
It gives you the number of entries in each catalog without having to browse them all.
The example shows how to pass the control to ldapsearch with -e '!1.3.6.1.4.1.4203.666.5.18':
ldapsearch -x -H 'ldap://localhost:389' -D 'cn=Manager,dc=my-domain,dc=com' \
    -w secret -b 'dc=my-domain,dc=com' \
    '(objectClass=*)' -e '!1.3.6.1.4.1.4203.666.5.18'
I use ldap3 with Python 3: https://ldap3.readthedocs.io/en/latest/searches.html
Any tips/examples on how to implement this?
Thanks to @EricLavault's answer I managed to fix this:
from ldap3.protocol.controls import build_control

c.search(base, filter, scope, controls=[
    build_control(oid='1.3.6.1.4.1.4203.666.5.18',
                  criticality=True,
                  value=None)
])
c.result then holds a controls dict:
{
    'result': 0, 'description': 'success', 'dn': '',
    'message': '', 'referrals': None, 'type': 'searchResDone',
    'controls': {
        '1.3.6.1.4.1.4203.666.5.18': {
            'description': '',
            'criticality': False,
            'value': b'0\x0b\x02\x01\x00\x02\x03\x01\xf0\xac\x02\x01\x00'
        }
    }
}
The format of value is described here:
https://ltb-project.org/documentation/openldap-noopsrch.html#usage
>>> v = b'0\x0b\x02\x01\x00\x02\x03\x01\xf0\xac\x02\x01\x00'
>>> vh = hex(int.from_bytes(v,'big'))
>>> vhl = [f"0x{vh[i:i+2]}" for i in range(2, len(vh), 2)]
>>> vhl
['0x30', '0x0b', '0x02', '0x01', '0x00', '0x02', '0x03', '0x01', '0xf0', '0xac', '0x02', '0x01', '0x00']
# the org count length is the 7th byte in vhl, counting from the most significant byte (its position may differ if the response contains any kind of error)
>>> orglen = int(vhl[6], 16)
>>> orgcount = vhl[7:7+orglen]
>>> orgcount
['0x01', '0xf0', '0xac']
>>> c = '0x'
# merge orgcount hex
>>> for o in orgcount:
...     c += f"{int(o, 16):02x}"
...
>>> c = int(c, 16) # convert back to dec
>>> c
127148
I checked this by counting the objects returned for the same base, scope and filter; that took 27 sec, while this took 0.24 sec.
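The hex-string slicing above can also be done directly on the bytes. The following is a minimal sketch, assuming the control value is a BER SEQUENCE of INTEGERs with short-form lengths (which is what the sample value looks like); decode_integer_sequence is a hypothetical helper name, and only the second integer is identified here as the entry count, as in the session above.

```python
def decode_integer_sequence(value):
    """Decode a BER SEQUENCE of INTEGERs (short-form lengths only) into a list of ints."""
    if len(value) < 2 or value[0] != 0x30:
        raise ValueError("expected a BER SEQUENCE (tag 0x30)")
    seq_len = value[1]
    if seq_len & 0x80:
        raise ValueError("long-form lengths are not handled in this sketch")
    body = value[2:2 + seq_len]
    numbers = []
    pos = 0
    while pos < len(body):
        tag, length = body[pos], body[pos + 1]
        if tag != 0x02:
            raise ValueError("expected a BER INTEGER (tag 0x02)")
        # INTEGER contents are big-endian two's complement
        numbers.append(int.from_bytes(body[pos + 2:pos + 2 + length], "big", signed=True))
        pos += 2 + length
    return numbers


v = b'0\x0b\x02\x01\x00\x02\x03\x01\xf0\xac\x02\x01\x00'
fields = decode_integer_sequence(v)
print(fields)  # [0, 127148, 0]
```

Reading each length byte instead of assuming a fixed offset means the count is still found if an earlier field takes more than one byte.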
I have defined two dictionaries, dict1 and dict2. I want the user to tell me via input which dictionary to access (of course they have to know the exact names), so that they get a value from that dictionary. The following doesn't work; I get a
TypeError "string indices must be integers":
dict1 = {'size': 38.24, 'rate': 465}
dict2 = {'size': 32.9, 'rate': 459}
name = input('Which dictionary to access?: ')
ret = name['size']
print ('Size of ' + name + ' is ' + str(ret))
globals() returns a dict containing all global variables already defined:
>>> globals()
{'dict1': {'rate': 465, 'size': 38.24}, 'dict2': {'rate': 459, 'size': 32.9}, '__builtins__': <module '__builtin__' (built-in)>, '__file__': 'C:/Users/xxx/.PyCharm2018.3/config/scratches/scratch.py', '__package__': None, '__name__': '__main__', '__doc__': None}
So you should be able to retrieve the right variable using globals()[name]. But keep in mind this is a terrible way to do it: variable names aren't meant to be dynamic. You should use a global dict to perform this kind of processing:
dicts = {
    "dict1": {'size': 38.24, 'rate': 465},
    "dict2": {'size': 32.9, 'rate': 459},
}
name = input('Which dictionary to access?: ')
# look up the chosen dictionary first, then the 'size' entry inside it
ret = dicts[name]['size']
print('Size of ' + name + ' is ' + str(ret))
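Since the name comes from user input, it may not match any key. A minimal sketch of a lookup that tolerates unknown names follows; lookup is a hypothetical helper, and returning None for a miss is just one possible choice.

```python
dicts = {
    "dict1": {"size": 38.24, "rate": 465},
    "dict2": {"size": 32.9, "rate": 459},
}


def lookup(name, key="size"):
    """Hypothetical helper: return dicts[name][key], or None if the name is unknown."""
    entry = dicts.get(name)
    if entry is None:
        return None
    return entry.get(key)


print(lookup("dict1"))
print(lookup("dict3"))
```

The caller can then print an error message instead of letting a KeyError propagate.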
dict1 = {'size': 38.24, 'rate': 465}
dict2 = {'size': 32.9, 'rate': 459}
name = input('Which dictionary to access?: ')
if name == 'dict1':
    ret = dict1['size']
elif name == 'dict2':
    ret = dict2['size']
print('Size of ' + name + ' is ' + str(ret))
or
input_to_dict_mapping = {'dict1':dict1,'dict2':dict2}
ret = input_to_dict_mapping[name]['size']
or, following Antwane's response:
input_to_dict_mapping = globals()
ret = input_to_dict_mapping[name]['size']
The problem is that name is a string, so you cannot index it with a key the way you would a dict.
I'm building a parser for the IBM Rhapsody sbs file format. Unfortunately, the recursion part doesn't work as expected. The rule pp.Word(pp.printables + " ") is probably the problem, as it also matches ; and {}. But at least ; can also be part of the values.
import pyparsing as pp
import pprint
TEST = r"""{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
"""
def flat(loc, toks):
    if len(toks[0]) == 1:
        return toks[0][0]

assignment = pp.Suppress("-") + pp.Word(pp.alphanums + "_") + pp.Suppress("=")
value = pp.OneOrMore(
    pp.Group(assignment + (
        pp.Group(pp.OneOrMore(
            pp.QuotedString('"', escChar="\\", multiline=True) +
            pp.Suppress(";"))).setParseAction(flat) |
        pp.Word(pp.alphas) + pp.Suppress(";") |
        pp.Word(pp.printables + " ")
    ))
)
expr = pp.Forward()
expr = pp.Suppress("{") + pp.Word(pp.alphas) + (
    value | (assignment + expr) | expr
) + pp.Suppress("}")
expr = expr.ignore(pp.pythonStyleComment)
print(TEST)
pprint.pprint(expr.parseString(TEST).asList())
Output:
% python prase.py
{ foo
- key = bla;
- value = 1243; 1233; 1235;
- _hans = "hammer
time";
- HaMer = 765; 786; 890;
- value = "
#pragma LINK_INFO DERIVATIVE \"mc9s12xs256\"
";
- _mText = 12.11.2015::13:20:0;
- value = "war"; "fist";
- _obacht = "fish,car,button";
- _id = gibml c0d8-4535-898f-968362779e07;
- bam = { boing
- key = bla;
}
{ boing
- key = bla;
}
}
['foo',
['key', 'bla'],
['value', '1243; 1233; 1235;'],
['_hans', 'hammer\n time'],
['HaMer', '765; 786; 890;'],
['value', '\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n '],
['_mText', '12.11.2015::13:20:0;'],
['value', ['war', 'fist']],
['_obacht', 'fish,car,button'],
['_id', 'gibml c0d8-4535-898f-968362779e07;'],
['bam', '{ boing'],
['key', 'bla']]
Wow, that is one messy model format! I think this will get you close. I started out by trying to characterize what a valid value expression could be. I saw that each grouping could contain ';'-terminated attribute definitions, or '{}'-enclosed nested objects. Each object contained a leading identifier giving the object type.
The difficult issue was the very general token which I named 'value_word', which is pretty much any grouping of characters, as long as it is not a '-', '{', or '}'. The negative lookaheads in the definition of 'value_word' take care of this. I think a key issue here is that I was able to not include ' ' as a valid character in a 'value_word', but instead let pyparsing do its default whitespace-skipping with a potential for having one or more 'value_word's make up an 'attr_value'.
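To illustrate the negative-lookahead idea in isolation, here is a minimal sketch. It reuses the value_word definition from this answer; the sample inputs are made up for the demonstration.

```python
from pyparsing import OneOrMore, ParseException, Suppress, Word, printables

DASH, LBRACE, RBRACE, SEMI = map(Suppress, "-{};")

# any run of printable characters except ';', as long as it does not
# start with '-', '{' or '}'
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=';')

# whitespace is skipped between words, so a value may consist of several words
attr_value = OneOrMore(value_word) + SEMI
result = attr_value.parseString("gibml c0d8-4535;").asList()
print(result)  # ['gibml', 'c0d8-4535']

# a token that starts with '-' is rejected by the lookahead
try:
    value_word.parseString("- key")
    rejected = False
except ParseException:
    rejected = True
print(rejected)  # True
```

Note that the lookaheads only guard the first character of each word; a '-' in the middle of a word, as in the id above, is still accepted.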
The final kicker (not found in your test case, but in the example you linked to) was this line for an attribute 'assignment':
- m_pParent = ;
So the attr_value had to allow for an empty string also.
from pyparsing import *
LBRACE,RBRACE,SEMI,EQ,DASH = map(Suppress,"{};=-")
ident = Word(alphas + '_', alphanums+'_').setName("ident")
guid = Group('GUID' + Combine(Word(hexnums)+('-'+Word(hexnums))*4))
qs = QuotedString('"', escChar="\\", multiline=True)
character_literal = Combine("'" + oneOf(list(printables+' ')) + "'")
value_word = ~DASH + ~LBRACE + ~RBRACE + Word(printables, excludeChars=';').setName("value_word")
value_atom = guid | qs | character_literal | value_word
object_ = Forward()
attr_value = OneOrMore(object_) | Optional(delimitedList(Group(value_atom+OneOrMore(value_atom))|value_atom, ';')) + SEMI
attr_value.setName("attr_value")
attr_defn = Group(DASH + ident("name") + EQ + Group(attr_value)("value"))
attr_defn.setName("attr_defn")
object_ <<= Group(
    LBRACE + ident("type") +
    Group(ZeroOrMore(attr_defn | object_))("attributes") +
    RBRACE
)
object_.parseString(TEST).pprint()
For your test string it gives:
[['foo',
[['key', ['bla']],
['value', ['1243', '1233', '1235']],
['_hans', ['hammer\n time']],
['HaMer', ['765', '786', '890']],
['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']],
['_mText', ['12.11.2015::13:20:0']],
['value', ['war', 'fist']],
['_obacht', ['fish,car,button']],
['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]],
['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]]]]
I added results names that might help in processing these structures. Using object_.parseString(TEST).dump() gives this output:
[['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
[0]:
['foo', [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer\n time']], ...
- attributes: [['key', ['bla']], ['value', ['1243', '1233', '1235']], ['_hans', ['hammer...
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
[1]:
['value', ['1243', '1233', '1235']]
- name: value
- value: ['1243', '1233', '1235']
[2]:
['_hans', ['hammer\n time']]
- name: _hans
- value: ['hammer\n time']
[3]:
['HaMer', ['765', '786', '890']]
- name: HaMer
- value: ['765', '786', '890']
[4]:
['value', ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']]
- name: value
- value: ['\n #pragma LINK_INFO DERIVATIVE "mc9s12xs256"\n ']
[5]:
['_mText', ['12.11.2015::13:20:0']]
- name: _mText
- value: ['12.11.2015::13:20:0']
[6]:
['value', ['war', 'fist']]
- name: value
- value: ['war', 'fist']
[7]:
['_obacht', ['fish,car,button']]
- name: _obacht
- value: ['fish,car,button']
[8]:
['_id', [['gibml', 'c0d8-4535-898f-968362779e07']]]
- name: _id
- value: [['gibml', 'c0d8-4535-898f-968362779e07']]
[0]:
['gibml', 'c0d8-4535-898f-968362779e07']
[9]:
['bam', [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]]
- name: bam
- value: [['boing', [['key', ['bla']]]], ['boing', [['key', ['bla']]]]]
[0]:
['boing', [['key', ['bla']]]]
- attributes: [['key', ['bla']]]
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
- type: boing
[1]:
['boing', [['key', ['bla']]]]
- attributes: [['key', ['bla']]]
[0]:
['key', ['bla']]
- name: key
- value: ['bla']
- type: boing
- type: foo
It also successfully parses the linked example, once the leading version line is removed.
I'm using pprint to nicely print a dict and it's working fine. Now I have switched to using an OrderedDict from the collections module. Unfortunately, the pprint routine does not seem to recognize that such objects are more or less dicts as well, and falls back to printing them as one long line.
>>> d = { i:'*'*i for i in range(8) }
>>> pprint.pprint(d)
{0: '',
 1: '*',
 2: '**',
 3: '***',
 4: '****',
 5: '*****',
 6: '******',
 7: '*******'}
>>> pprint.pprint(collections.OrderedDict(d))
OrderedDict([(0, ''), (1, '*'), (2, '**'), (3, '***'), (4, '****'), (5, '*****'), (6, '******'), (7, '*******')])
Any way to get a nicer representation of OrderedDicts as well? Maybe even if they are nested inside a normal dict or list?
I found a relatively simple solution for this, but it includes the risk of making the output for your ordered dictionary appear exactly as if it were a regular dict object.
The original solution for using a context manager to prevent pprint from sorting dictionary keys comes from this answer.
import contextlib
import pprint
from collections import OrderedDict

@contextlib.contextmanager
def pprint_OrderedDict():
    pp_orig = pprint._sorted
    od_orig = OrderedDict.__repr__
    try:
        pprint._sorted = lambda x: x
        OrderedDict.__repr__ = dict.__repr__
        yield
    finally:
        pprint._sorted = pp_orig
        OrderedDict.__repr__ = od_orig
(You could also just patch the OrderedDict.__repr__ method with dict.__repr__, but please don't.)
Example:
>>> foo = [('Roger', 'Owner'), ('Diane', 'Manager'), ('Bob', 'Manager'),
... ('Ian', 'Associate'), ('Bill', 'Associate'), ('Melinda', 'Associate')]
>>> d = OrderedDict(foo)
>>> pprint.pprint(d)
OrderedDict([('Roger', 'Owner'), ('Diane', 'Manager'), ('Bob', 'Manager'), ('Ian', 'Associate'), ('Bill', 'Associate'), ('Melinda', 'Associate')])
>>> pprint.pprint(dict(d))
{'Bill': 'Associate',
 'Bob': 'Manager',
 'Diane': 'Manager',
 'Ian': 'Associate',
 'Melinda': 'Associate',
 'Roger': 'Owner'}
>>> with pprint_OrderedDict():
...     pprint.pprint(d)
...
{'Roger': 'Owner',
 'Diane': 'Manager',
 'Bob': 'Manager',
 'Ian': 'Associate',
 'Bill': 'Associate',
 'Melinda': 'Associate'}
Try this on:
d = collections.OrderedDict({ i:'*'*i for i in range(8) })
EDIT
pprint.pprint(list(d.items()))
If you are specifically targeting CPython* 3.6 or later, then you can just use regular dictionaries instead of OrderedDict. You'll miss out on a few methods exclusive to OrderedDict, and this is not (yet) guaranteed to be portable to other Python implementations,** but it is probably the simplest way to accomplish what you are trying to do.
* CPython is the reference implementation of Python which may be downloaded from python.org.
** CPython stole this idea from PyPy, so you can probably depend on it working there too.
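If regular dicts are an option, a possible way to keep insertion order in the printed output is the sort_dicts parameter of pprint. This is a sketch that assumes Python 3.8 or later (where sort_dicts was added); the sample data and the width value are arbitrary.

```python
import pprint
from collections import OrderedDict

d = OrderedDict([('Roger', 'Owner'), ('Diane', 'Manager'), ('Bob', 'Manager')])

# Convert to a plain dict and ask pprint not to sort the keys.
text = pprint.pformat(dict(d), sort_dicts=False, width=30)
print(text)
```

With a narrow width the dict is wrapped one entry per line, in insertion order rather than alphabetical order.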
I realize this is sort of necroposting, but I thought I'd post what I use. Its main virtue is that its output can be read back into Python, thus allowing one, for instance, to shuttle between representations (which I use, for instance, on JSON files). Of course it breaks pprint encapsulation by ripping some code off its inner _format function.
#!/bin/env python
from __future__ import print_function

import pprint
from collections import OrderedDict
import json
import sys


class MyPP(pprint.PrettyPrinter):
    def _format(self, object, stream, indent, allowance, context, level):
        # anything that is not an OrderedDict goes through the stock formatter
        if not isinstance(object, OrderedDict):
            return pprint.PrettyPrinter._format(self, object, stream, indent, allowance, context, level)
        level = level + 1
        objid = id(object)
        if objid in context:
            stream.write(pprint._recursion(object))
            self._recursive = True
            self._readable = False
            return
        write = stream.write
        _len = len
        rep = self._repr(object, context, level - 1)
        typ = type(object)
        sepLines = _len(rep) > (self._width - 1 - indent - allowance)

        if self._depth and level > self._depth:
            write(rep)
            return

        write('OrderedDict([\n%s' % (' ' * (indent + 1),))
        if self._indent_per_level > 1:
            write((self._indent_per_level - 1) * ' ')
        length = _len(object)
        if length:
            context[objid] = 1
            indent = indent + self._indent_per_level
            # list() so that indexing and slicing also work on Python 3
            items = list(object.items())
            key, ent = items[0]
            rep = self._repr(key, context, level)
            write('( ')
            write(rep)
            write(', ')
            self._format(ent, stream, indent + _len(rep) + 2,
                         allowance + 1, context, level)
            write(' )')
            if length > 1:
                for key, ent in items[1:]:
                    rep = self._repr(key, context, level)
                    if sepLines:
                        write(',\n%s( %s , ' % (' ' * indent, rep))
                    else:
                        write(', ( %s , ' % rep)
                    self._format(ent, stream, indent + _len(rep) + 2,
                                 allowance + 1, context, level)
                    write(' )')

            indent = indent - self._indent_per_level
            del context[objid]
        write('])')
        return


pp = MyPP(indent=1)
handle = open(sys.argv[1], "r")
values = json.loads(handle.read(), object_pairs_hook=OrderedDict)
pp.pprint(values)
class MyOwnClass:
    # list that contains the queries
    queries = []
    # a template dict
    template_query = {}
    template_query['name'] = 'mat'
    template_query['age'] = '12'

obj = MyOwnClass()
query = obj.template_query
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = obj.template_query
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
print(obj.queries)
It gives me
[{'age': '19', 'name': 'dj'}, {'age': '19', 'name': 'dj'}]
while I expect to have
[{'age': '23' , 'name': 'sam'}, {'age': '19', 'name': 'dj'}]
I thought I would use a template for this list because I'm going to use it very often and some default values do not need to be changed.
Why does template_query itself change when I do this? I'm new to Python and I'm getting pretty confused.
This is because you are pointing to the same dictionary each time and overwriting its keys:
# query = obj.template_query - dont need this
query = {}
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = {} #obj.template_query-dont need this
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
This should demonstrate your problem:
>>> q = {'a':1}
>>> lst = []
>>> lst.append(q)
>>> q['a']=2
>>> lst
[{'a': 2}]
>>> lst.append(q)
>>> lst
[{'a': 2}, {'a': 2}]
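The same effect, and one way around it, can be seen by comparing plain assignment with a shallow copy. This is a small illustrative sketch; the variable names are made up, and copy() is only a shallow copy, so nested values would still be shared.

```python
template = {'name': 'mat', 'age': '12'}

alias = template          # a second name for the same dict
clone = template.copy()   # a new dict with the same top-level items

alias['name'] = 'sam'     # also changes template
clone['name'] = 'dj'      # leaves template untouched

print(alias is template)  # True
print(template)           # {'name': 'sam', 'age': '12'}
print(clone)              # {'name': 'dj', 'age': '12'}
```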
You could implement your class differently:
class MyOwnClass:
    # a template dict
    @property
    def template_query(self):
        return {'name': 'default', 'age': -1}
This will make obj.template_query return a new dict each time.
This is because query and query2 both refer to the same object: obj.template_query, in this case.
Better to make a template factory:
def template_query(**kwargs):
    template = {'name': 'some default value',
                'age': 'another default value',
                'car': 'generic car name'}
    template.update(**kwargs)
    return template
That creates a new dictionary every time it's called. So you can do:
>>> my_query = template_query(name="sam")
>>> my_query
{'name': 'sam', 'age': 'another default value', 'car': 'generic car name'}
You're assigning the same dict to query2, not a copy of it. Instead, you might want to create the needed dict with a template_query() method that constructs a new dict each time:
class MyOwnClass:
    # a template dict
    def template_query(self):
        d = {}
        d['name'] = 'mat'
        d['age'] = '12'
        d['car'] = 'ferrari'
        return d