Python: how to parse this dict recursively?

I have a flat dict of entities. Each entity can have a parent. I'd like to recursively build each entity, taking its parent's values into account.
Logic:
Each entity inherits defaults from its parent (e.g. is_mammal)
Each entity can overwrite the defaults of its parent (e.g. age)
Each entity can add new attributes (e.g. hobby)
I'm struggling to get it done. Help is appreciated, thanks!
entities = {
    'human': {
        'is_mammal': True,
        'age': None,
    },
    'man': {
        'parent': 'human',
        'gender': 'male',
    },
    'john': {
        'parent': 'man',
        'age': 20,
        'hobby': 'football',
    }
}
def get_character(key):
    # ... recursive magic with entities ...
    return entity
john = get_character('john')
print(john)
Expected output:
{
    'is_mammal': True,    # inherited from human
    'gender': 'male',     # inherited from man
    'parent': 'man',
    'age': 20,            # overwritten
    'hobby': 'football',  # added
}

def get_character(entities, key):
    try:
        entity = get_character(entities, entities[key]['parent'])
    except KeyError:
        entity = {}
    entity.update(entities[key])
    return entity
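A non-recursive variant of the same idea, offered as an alternative sketch rather than something from the answers: walk the parent chain upward, then let collections.ChainMap resolve each key from the nearest entity first. It assumes every parent key names an existing entity; the cycle check is an addition.

```python
from collections import ChainMap

entities = {
    'human': {'is_mammal': True, 'age': None},
    'man': {'parent': 'human', 'gender': 'male'},
    'john': {'parent': 'man', 'age': 20, 'hobby': 'football'},
}

def get_character(entities, key):
    chain = []      # entity dicts, ordered from `key` up to the root
    seen = set()    # guards against parent cycles (an added assumption)
    while key is not None:
        if key in seen:
            raise ValueError(f'cycle detected at {key!r}')
        seen.add(key)
        entity = entities[key]
        chain.append(entity)
        key = entity.get('parent')
    # ChainMap searches its mappings in order, so the child's values
    # take precedence over those of its ancestors.
    return dict(ChainMap(*chain))

print(get_character(entities, 'john'))
```

Because it loops instead of recursing, deep hierarchies are not limited by Python's recursion limit, and only one dict is built at the end.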

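This solution uses recursion and passes the accumulated dict down as a default argument. Python evaluates a mutable default (here the dictionary {}) only once, so it is shared among all calls, which is a well-known source of surprises. Here it is harmless, because d.get(key) | entity (Python 3.9+) builds a new dict at each step rather than mutating the default; since the right-hand operand wins on duplicate keys, a child's values override its parent's.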
def get_character(d, key, entity={}):
    if d.get(key) is None:
        return entity
    return get_character(d, d.get(key).get('parent'), d.get(key) | entity)
get_character(entities, 'john')
{'is_mammal': True,
'age': 20,
'parent': 'man',
'gender': 'male',
'hobby': 'football'}
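A quick way to check the point about the shared default: after calling the function, its __defaults__ tuple still holds an empty dict, because | returned fresh dicts instead of mutating it. For contrast, collect (a hypothetical helper) mutates its default list, which is the classic form of the quirk.

```python
entities = {
    'human': {'is_mammal': True, 'age': None},
    'man': {'parent': 'human', 'gender': 'male'},
    'john': {'parent': 'man', 'age': 20, 'hobby': 'football'},
}

def get_character(d, key, entity={}):
    # The answer's function: `a | b` (Python 3.9+) returns a new dict,
    # with b's values winning on duplicate keys.
    if d.get(key) is None:
        return entity
    return get_character(d, d.get(key).get('parent'), d.get(key) | entity)

john = get_character(entities, 'john')
print(get_character.__defaults__)   # ({},)  the default was never mutated

# Caveat: for an unknown key the shared default object itself is returned,
# so mutating that result would leak into later calls.
print(get_character(entities, 'nobody') is get_character.__defaults__[0])  # True

def collect(item, bucket=[]):
    # Hypothetical helper: the default list is created once, when `def`
    # runs, and every call without `bucket` appends to that same list.
    bucket.append(item)
    return bucket

collect(1)
print(collect(2))   # [1, 2]
```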

Related

Does return end a for loop? (Python/Django)

Does return in Python end a for loop?
I'm asking because I wrote code that searches a list for a person: if it finds that person, it returns their age; otherwise it tells me "Person not found".
If the person is in the first position of the array, it finds them, but if they are in the second position, it doesn't. Why?
from django.http import HttpResponse
def lerDoBanco(name):
    lista_names = [
        {'name': 'Joaquim', 'age': 20},
        {'name': 'João', 'age': 25},
        {'name': 'Ana', 'age': 27}
    ]
    for person in lista_names:
        if person['name'] == name:
            return person
        else:
            return {'name': 'Person not found', 'age': 0}
def fname(request, name):
    result = lerDoBanco(name)
    if result['age'] > 0:
        return HttpResponse('The person was found, has ' + str(result['age']) + ' years old')
    else:
        return HttpResponse('Person not found')
I ask because if I comment out the else statement, it finds every person in the array correctly.
When a return statement executes, it terminates the current function and returns a value. So if you use return inside your loop, it ends both the loop and the function.
return stops further execution and returns the given value (None by default). Change the indentation to make your code work:
def lerDoBanco(name):
    lista_names = [
        {'name': 'Joaquim', 'age': 20},
        {'name': 'João', 'age': 25},
        {'name': 'Ana', 'age': 27}
    ]
    for person in lista_names:
        if person['name'] == name:
            return person
    return {'name': 'Person not found', 'age': 0}
It iterates over all the values looking for the person; only if no one matches is the default value {'name': 'Person not found', 'age': 0} returned.
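The same lookup can be written without an explicit loop. A sketch using the built-in next() with a default value, which returns the first matching person or the fallback dict once the generator is exhausted (moving lista_names to module level is a restructuring for brevity):

```python
lista_names = [
    {'name': 'Joaquim', 'age': 20},
    {'name': 'João', 'age': 25},
    {'name': 'Ana', 'age': 27},
]

def lerDoBanco(name):
    # next() pulls the first person whose name matches; if the generator
    # is exhausted without a match, it returns the fallback dict instead.
    matches = (person for person in lista_names if person['name'] == name)
    return next(matches, {'name': 'Person not found', 'age': 0})

print(lerDoBanco('João'))
print(lerDoBanco('Maria'))
```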
From the moment you return you move out of the function, so out of the for loop as well.
What you can do instead is return the 'Person not found' value at the end of the function:
def lerDoBanco(name):
    lista_names = [
        {'name': 'Joaquim', 'age': 20},
        {'name': 'João', 'age': 25},
        {'name': 'Ana', 'age': 27}
    ]
    for person in lista_names:
        if person['name'] == name:
            return person
    return {'name': 'Person not found', 'age': 0}
Note that if the data is stored in a database, it is better to filter with a queryset on the database side. Databases are optimized to search efficiently. You can also specify db_index=True on a field so that the database builds an index allowing much faster retrieval.
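In plain Python, the analogous trick for repeated lookups is to build a dict keyed by name once, so each lookup is a hash lookup rather than a scan of the list. A minimal sketch, assuming names are unique (build_index is a hypothetical helper name):

```python
lista_names = [
    {'name': 'Joaquim', 'age': 20},
    {'name': 'João', 'age': 25},
    {'name': 'Ana', 'age': 27},
]

def build_index(people):
    # Hypothetical helper: one pass over the list, mapping name -> record.
    # If two records share a name, the later one wins.
    return {person['name']: person for person in people}

index = build_index(lista_names)

def lerDoBanco(name):
    # Average O(1) dict lookup instead of scanning the whole list.
    return index.get(name, {'name': 'Person not found', 'age': 0})

print(lerDoBanco('Ana'))
```

The index only pays off when it is built once and reused for many lookups; for a single lookup, the loop is just as fast.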
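A loop ends when one of the following happens:
the loop condition evaluates to false (while) or the iterable is exhausted (for)
break is executed
an exception is raised and not caught inside the loop body
a return statement is executed
These rules are not specific to Python; most imperative languages behave the same way.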
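A small sketch showing each of these exits in Python (the function names are illustrative):

```python
def halvings(n):
    steps = 0
    while n > 1:          # ends when the condition evaluates to false
        n //= 2
        steps += 1
    return steps

def count_until_zero(numbers):
    count = 0
    for n in numbers:     # would also end when the iterable is exhausted
        if n == 0:
            break         # leaves the loop; the return below still runs
        count += 1
    return count

def first_negative(numbers):
    for n in numbers:
        if n < 0:
            return n      # leaves the loop and the function at once
    return None           # reached only if no negative number was found

def parse_all(strings):
    results = []
    try:
        for s in strings:
            results.append(int(s))  # int('x') raises ValueError, ending the loop
    except ValueError:
        pass                        # the loop is already over at this point
    return results

print(halvings(8), count_until_zero([1, 2, 0, 4]),
      first_negative([3, -1, -5]), parse_all(['1', '2', 'x', '4']))
```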

Convert XML to List of Dictionaries in Python

I'm very new to Python, so please bear with me.
When I tried to convert the XML content into a list of dictionaries, I got output, but not what I expected, even after a lot of experimenting.
XML Content:
<project>
<panelists>
<panelist panelist_login="pradeep">
<login/>
<firstname/>
<lastname/>
<gender/>
<age>0</age>
</panelist>
<panelist panelist_login="kumar">
<login>kumar</login>
<firstname>kumar</firstname>
<lastname>Pradeep</lastname>
<gender/>
<age>24</age>
</panelist>
</panelists>
</project>
Code I have used:
import xml.etree.ElementTree as ET
tree = ET.parse('xml_file.xml')  # parse the XML file
root = tree.getroot()
Panelist_list = []
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = {}
    panelist_login = item.attrib
    Panelist_list.append(panelist_login)
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
Output:
[{
'panelist_login': 'pradeep'
}, {
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar'
}, {
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
and I'm expecting the output below:
[{
'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'
}, {
'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'
}]
I have referred to many Stack Overflow questions on ElementTree, but they didn't help me.
Any help or suggestion is appreciated.
Your code is appending the dict panelist_login with the tag attributes to the list, in this line: Panelist_list.append(panelist_login) separately from the Panelist dict. So for every <panelist> tag the code appends 2 dicts: one dict of tag attributes and one dict of subtags. Inside the loop you have 2 append() calls, which means 2 items in the list for each time through the loop.
But you actually want a single dict for each <panelist> tag, and you want the tag attribute to appear inside the Panelist dict as if it were a subtag also.
So have a single dict, and update the Panelist dict with the tag attributes instead of keeping the tag attributes in a separate dict.
for item in root.findall('./panelists/panelist'):  # find all panelist nodes
    Panelist = {}  # dictionary to store the content of each panelist
    panelist_login = item.attrib
    Panelist.update(panelist_login)  # make panelist_login the first key of the dict
    for child in item:
        Panelist[child.tag] = child.text
    Panelist_list.append(Panelist)
print(Panelist_list)
I get this output, which I think is what you had in mind:
[
{'panelist_login': 'pradeep',
'login': None,
'firstname': None,
'lastname': None,
'gender': None,
'age': '0'},
{'panelist_login': 'kumar',
'login': 'kumar',
'firstname': 'kumar',
'lastname': 'Pradeep',
'gender': None,
'age': '24'}
]
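The same merge can also be written as one expression per panelist with dict unpacking. This sketch parses the question's XML from a string with ET.fromstring so it runs without a file; attribute keys come first, and a child tag with the same name as an attribute would overwrite it:

```python
import xml.etree.ElementTree as ET

xml_text = """<project>
  <panelists>
    <panelist panelist_login="pradeep">
      <login/><firstname/><lastname/><gender/><age>0</age>
    </panelist>
    <panelist panelist_login="kumar">
      <login>kumar</login><firstname>kumar</firstname>
      <lastname>Pradeep</lastname><gender/><age>24</age>
    </panelist>
  </panelists>
</project>"""

root = ET.fromstring(xml_text)  # returns the <project> element

panelist_list = [
    # Tag attributes first, then one key per child element; an empty
    # element such as <login/> has text None.
    {**item.attrib, **{child.tag: child.text for child in item}}
    for item in root.findall('./panelists/panelist')
]
print(panelist_list)
```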

How to sort with logic

With four dictionaries grandpa, dad, son_1 and son_2:
grandpa = {'name': 'grandpa', 'parents': []}
dad = {'name': 'dad', 'parents': ['grandpa']}
son_1 = {'name': 'son_1', 'parents': ['dad']}
son_2 = {'name': 'son_2', 'parents': ['dad']}
relatives = [son_1, grandpa, dad, son_2]
I want to write a function that sorts all these relatives in a "reverse" order.
So children lists would be used instead of parents lists. The oldest, grandpa, would be at the top level of the result dictionary, and dad would be below it, with his children list storing son_1 and son_2:
def sortRelatives(relatives):
    # returns a resulted dictionary:
    # logic
    ...
result = sortRelatives(relatives)
print(result)
Which would print:
result = {'name': 'grandpa',
'children': [
{'name': 'dad',
'children': [{'name': 'son_1', 'children': [] },
{'name': 'son_2', 'children': [] }] }
]
}
How can I make the sortRelatives function perform such a sort?
What I think is a viable yet simple solution is to first build a children dictionary that will map person names to their children. Then we can use that new data structure to build the output:
from collections import defaultdict
def children(relatives):
    children = defaultdict(list)
    for person in relatives:
        for parent in person['parents']:
            children[parent].append(person)
    return children
Another tool we can use is a function that would find the root of our genealogy:
def genealogy_root(relatives):
    for person in relatives:
        if not person['parents']:
            return person
    raise TypeError("This doesn't look like a valid genealogy.")
That will help us locate the person who has no parent and will therefore be the root of our genealogy tree. Now that we have all the necessary tools, we just need to build the output:
def build_genealogy(relatives):
    relatives_children = children(relatives)
    def sub_genealogy(current_person):
        name = current_person['name']
        return dict(
            name=name,
            children=[sub_genealogy(child) for child in relatives_children[name]]
        )
    root = genealogy_root(relatives)
    return sub_genealogy(root)
result = build_genealogy(relatives)
print(result)
Which outputs:
{
'name': 'grandpa', 'children': [
{'name': 'dad', 'children': [
{'name': 'son_1', 'children': []},
{'name': 'son_2', 'children': []}
]}
]
}
Note that, as I said in the comments, this only works because there are no duplicate names. If several people share the same name, you will need a better input data structure. For example, you may want something like:
grandpa = {'name': 'grandpa', 'parents': []}
dad = {'name': 'dad', 'parents': [grandpa]}
son_1 = {'name': 'son_1', 'parents': [dad]}
son_2 = {'name': 'son_2', 'parents': [dad]}
relatives = [grandpa, dad, son_1, son_2]
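A sketch of how the answer's approach might be adapted to that input: since parents are now the dict objects themselves, the children map can be keyed on id() instead of on names, so two people may share a name. This is an illustrative adaptation rather than code from the answer; build_genealogy_by_identity is a hypothetical name, and it raises ValueError unless there is exactly one root.

```python
from collections import defaultdict

grandpa = {'name': 'grandpa', 'parents': []}
dad = {'name': 'dad', 'parents': [grandpa]}
son_1 = {'name': 'son_1', 'parents': [dad]}
son_2 = {'name': 'son_2', 'parents': [dad]}
relatives = [grandpa, dad, son_1, son_2]

def build_genealogy_by_identity(relatives):
    # Map id(parent dict) -> list of child dicts. The ids stay valid
    # because every person object is alive while this function runs.
    children = defaultdict(list)
    for person in relatives:
        for parent in person['parents']:
            children[id(parent)].append(person)

    def sub_genealogy(person):
        return {
            'name': person['name'],
            'children': [sub_genealogy(child) for child in children[id(person)]],
        }

    roots = [p for p in relatives if not p['parents']]
    if len(roots) != 1:
        raise ValueError("This doesn't look like a valid genealogy.")
    return sub_genealogy(roots[0])

print(build_genealogy_by_identity(relatives))
```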

Mongoengine query to get only filtered embedded documents

I have these models:
class Sub(EmbeddedDocument):
    name = StringField()
class Main(Document):
    subs = ListField(EmbeddedDocumentField(Sub))
When I use this query, it returns all of the Main data, but I only need the subs whose name is 'foo'.
query: Main.objects(__raw__={'subs': {'$elemMatch': {'name': 'foo'}}})
For example with this data:
{
subs: [
{'name': 'one'},
{'name': 'two'},
{'name': 'foo'},
{'name': 'bar'},
{'name': 'foo'}
]
}
The result must be:
{
subs: [
{'name': 'foo'},
{'name': 'foo'}
]
}
Note that in the MongoDB client, that query returns these values.
If you are allowed to change your data model then try this:
class Main(Document):
    subs = ListField(StringField())
Main.objects.filter(subs__ne="foo")
I propose this approach assuming that the embedded document only has one field in which case it is redundant.
MongoEngine provides the .aggregate(*pipeline, **kwargs) method, which runs an aggregation pipeline.
MongoDB 3.2 or newer
match = {"$match": {"subs.name": "foo"}}
project = {'$project': {'subs': {'$filter': {'as': 'sub',
'cond': {'$eq': ['$$sub.name', 'foo']},
'input': '$subs'}}}}
pipeline = [match, project]
Main.objects.aggregate(*pipeline)
MongoDB version <= 3.0
redact = {'$redact': {'$cond': [{'$or': [{'$eq': ['$name', 'foo']}, {'$not': '$name'}]},
                                '$$DESCEND',
                                '$$PRUNE']}}
pipeline = [match, redact]
Main.objects.aggregate(*pipeline)
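For small documents, the same result can also be obtained client-side after fetching the matching documents. A plain-Python sketch of what the $filter stage computes, applied to the example document from the question (no MongoEngine calls involved; filter_subs is a hypothetical name):

```python
doc = {'subs': [{'name': 'one'}, {'name': 'two'}, {'name': 'foo'},
                {'name': 'bar'}, {'name': 'foo'}]}

def filter_subs(document, name):
    # Plain-Python counterpart of the $filter stage above:
    # keep only the embedded entries whose 'name' equals `name`.
    return {'subs': [sub for sub in document['subs'] if sub.get('name') == name]}

print(filter_subs(doc, 'foo'))
```

The trade-off is that the whole subs list is transferred from the server, whereas the aggregation pipeline trims it before sending.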

Build nested tree-like dict from an array of dicts with children

I have an array of dicts retrieved from a web API. Each dict has name, description, parent, and children keys. The children key has an array of dicts as its value. For the sake of clarity, here is a dummy example:
[
{'name': 'top_parent', 'description': None, 'parent': None,
'children': [{'name': 'child_one'},
{'name': 'child_two'}]},
{'name': 'child_one', 'description': None, 'parent': 'top_parent',
'children': []},
{'name': 'child_two', 'description': None, 'parent': 'top_parent',
'children': [{'name': 'grand_child'}]},
{'name': 'grand_child', 'description': None, 'parent': 'child_two',
'children': []}
]
Every item is in the array. An item could be the top-most parent, and thus not exist in any of the children arrays. An item could be both a child and a parent. Or an item could only be a child (have no children of its own).
So, in a tree structure, you'd have something like this:
top_parent
child_one
child_two
grand_child
In this contrived and simplified example top_parent is a parent but not a child; child_one is a child but not a parent; child_two is a parent and a child; and grand_child is a child but not a parent. This covers every possible state.
What I want is to iterate over the array of dicts one time and generate a nested dict that properly represents the tree structure (or, if one pass is impossible, in the most efficient way possible). So, in this example, I would get a dict that looks like this:
{
'top_parent': {
'child_one': {},
'child_two': {
'grand_child': {}
}
}
}
Strictly speaking, items without children do not have to be anything other than keys with empty dicts as values, but having them map to something else (such as None) would be preferable.
Fourth edit, showing three versions, cleaned up a bit. First version works top-down and returns None, as you requested, but essentially loops through the top level array 3 times. The next version only loops through it once, but returns empty dicts instead of None.
The final version works bottom up and is very clean. It can return empty dicts with a single loop, or None with additional looping:
from collections import defaultdict
my_array = [
{'name': 'top_parent', 'description': None, 'parent': None,
'children': [{'name': 'child_one'},
{'name': 'child_two'}]},
{'name': 'child_one', 'description': None, 'parent': 'top_parent',
'children': []},
{'name': 'child_two', 'description': None, 'parent': 'top_parent',
'children': [{'name': 'grand_child'}]},
{'name': 'grand_child', 'description': None, 'parent': 'child_two',
'children': []}
]
def build_nest_None(my_array):
    childmap = [(d['name'], set(x['name'] for x in d['children']) or None)
                for d in my_array]
    all_dicts = dict((name, kids and {}) for (name, kids) in childmap)
    results = all_dicts.copy()
    for (name, kids) in ((x, y) for x, y in childmap if y is not None):
        all_dicts[name].update((kid, results.pop(kid)) for kid in kids)
    return results
def build_nest_empty(my_array):
    all_children = set()
    all_dicts = defaultdict(dict)
    for d in my_array:
        children = set(x['name'] for x in d['children'])
        all_dicts[d['name']].update((x, all_dicts[x]) for x in children)
        all_children.update(children)
    top_name, = set(all_dicts) - all_children
    return {top_name: all_dicts[top_name]}
def build_bottom_up(my_array, use_None=False):
    all_dicts = defaultdict(dict)
    for d in my_array:
        name = d['name']
        all_dicts[d['parent']][name] = all_dicts[name]
    if use_None:
        for d in all_dicts.values():
            for x, y in d.items():
                if not y:
                    d[x] = None
    return all_dicts[None]
print(build_nest_None(my_array))
print(build_nest_empty(my_array))
print(build_bottom_up(my_array, True))
print(build_bottom_up(my_array))
Results in:
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
{'top_parent': {'child_one': None, 'child_two': {'grand_child': None}}}
{'top_parent': {'child_one': {}, 'child_two': {'grand_child': {}}}}
You can keep a lazy mapping from names to nodes and then rebuild the hierarchy by processing just the parent link (I'm assuming the data is consistent, i.e. A is marked as the parent of B if and only if B is listed among the children of A).
nmap = {}
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    try:
        # Was this node built before?
        me = nmap[name]
    except KeyError:
        # No... create it now
        if n["children"]:
            nmap[name] = me = {}
        else:
            me = None
    if parent:
        try:
            nmap[parent][name] = me
        except KeyError:
            # My parent will follow later
            nmap[parent] = {name: me}
    else:
        root = me
The children property of the input is used only to decide whether the element should be stored as None in its parent (because it has no children) or as a dictionary (because it will have children at the end of the rebuild process). Storing nodes without children as empty dictionaries would simplify the code a bit by removing this special case.
Using collections.defaultdict, the creation of new nodes can be simplified further:
import collections
nmap = collections.defaultdict(dict)
for n in nodes:
    name = n["name"]
    parent = n["parent"]
    me = nmap[name]
    if parent:
        nmap[parent][name] = me
    else:
        root = me
This algorithm is O(N), assuming constant-time dictionary access: it makes only one pass over the input and requires O(N) space for the name->node map (the space requirement is O(Nc) for the original version that stores childless nodes as None, where Nc is the number of nodes with children).
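A self-contained version of the defaultdict loop, wrapped in a hypothetical build_tree function that returns the root under its own name (so several roots would simply become several top-level keys). It keeps the single pass and the empty-dict leaves; the sample nodes are trimmed to the name and parent keys, which are the only ones this version reads.

```python
from collections import defaultdict

def build_tree(nodes):
    # Every name gets exactly one dict, created on first mention, whether
    # that mention is the node itself or a child naming it as parent.
    nmap = defaultdict(dict)
    tree = {}
    for n in nodes:
        name, parent = n['name'], n['parent']
        me = nmap[name]
        if parent:
            nmap[parent][name] = me
        else:
            tree[name] = me   # roots are kept under their own names
    return tree

# Deliberately out of order: grand_child appears before its parent.
nodes = [
    {'name': 'grand_child', 'parent': 'child_two'},
    {'name': 'top_parent', 'parent': None},
    {'name': 'child_one', 'parent': 'top_parent'},
    {'name': 'child_two', 'parent': 'top_parent'},
]
print(build_tree(nodes))
```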
My stab at it:
persons = [
    {'name': 'top_parent', 'description': None, 'parent': None,
     'children': [{'name': 'child_one'},
                  {'name': 'child_two'}]},
    {'name': 'grand_child', 'description': None, 'parent': 'child_two',
     'children': []},
    {'name': 'child_two', 'description': None, 'parent': 'top_parent',
     'children': [{'name': 'grand_child'}]},
    {'name': 'child_one', 'description': None, 'parent': 'top_parent',
     'children': []},
]
def findParent(name, parent, tree, found=False):
    if tree == {}:
        return False
    if parent in tree:
        tree[parent][name] = {}
        return True
    else:
        for p in tree:
            found = findParent(name, parent, tree[p], False) or found
        return found
tree = {}
outOfOrder = []
for person in persons:
    if person['parent'] is None:
        tree[person['name']] = {}
    else:
        if not findParent(person['name'], person['parent'], tree):
            outOfOrder.append(person)
for person in outOfOrder:
    if not findParent(person['name'], person['parent'], tree):
        print('parent of ' + person['name'] + ' not found')
print(tree)
results in:
{'top_parent': {'child_two': {'grand_child': {}}, 'child_one': {}}}
It also picks up any children whose parent has not been added yet, and then reconciles this at the end.
