Python Recursive Tree - python

I'm new to programming and I want to define a function that finds an element in a non-binary tree and keeps track of all the parents of that element in a list.
The tree is coded as a tuple, where index 0 is the parent and index 1 is a list of its children. The list contains a tuple for each child, composed the same way as before (index 0 is the parent, and index 1 is the list of children).
Example:
tree_data = (
    'Alan', [
        (
            'Bob', [
                ('Chris', []),
                (
                    'Debbie', [
                        ('Cindy', [])
                    ]
                )
            ]
        ),
        (
            'Eric', [
                ('Dan', []),
                (
                    'Fanny', [
                        ('George', [])
                    ]
                )
            ]
        ),
        ('Hannah', [])
    ]
)
Looking for 'George' would return the following: ['George', 'Fanny', 'Eric', 'Alan']
So far I have the following. I have managed to append only the element and its direct parent, but nothing further up.
Also, if I add a return statement to the function, the result comes back as None. I would appreciate a little help.
lst = []
def list_parentals(tree, element):
    if tree[0] == element:
        lst.append(element)
    else:
        for child in tree[1]:
            list_parentals(child, element)
            if child[0] == element:
                lst.append(tree[0])

Instead of externally keeping a list you can build it as you go.
def list_parentals(tree, element, parents):
    this, children = tree
    new_parents = [this] + parents
    if this == element:
        return new_parents
    else:
        for child in children:
            x = list_parentals(child, element, new_parents)
            # If x is not None, return it
            if x:
                return x
list_parentals(tree_data, 'George', [])
# ['George', 'Fanny', 'Eric', 'Alan']

For "my recursive function returns None": that is a very classic problem. You'll find hundreds of detailed answers about it on SO by searching for those words.
In short, a recursive function must always return something, not just when it has found something. Typically, in an example like this one, you return a result when you find the matching leaf. But if you don't want that result to be lost, you must also return something from the call that made the recursive call that found the leaf, and from the call that made that call, and so on up the chain.
Keep in mind that what you get as the final result is the return value of the first call to list_parentals. If that first call doesn't return anything, it doesn't matter that some recursive subcalls did.
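To see that effect in isolation, here is a minimal sketch. The names find_lossy and find_ok and the small tree are made up for illustration; they are not from the question. Both functions locate the element, but only the second passes the result back up through every level of the recursion.

```python
# Hypothetical small tree in the same (name, [children]) format as the question.
small_tree = ('Alan', [('Bob', []), ('Eric', [('George', [])])])

def find_lossy(tree, element):
    """Finds the element but drops the result of the recursive call."""
    if tree[0] == element:
        return True
    for child in tree[1]:
        find_lossy(child, element)  # the return value is thrown away here
    # Falling off the end of the function returns None implicitly.

def find_ok(tree, element):
    """Propagates the result of the recursive call back to the caller."""
    if tree[0] == element:
        return True
    for child in tree[1]:
        if find_ok(child, element):
            return True  # pass the success back up one level
    return False

print(find_lossy(small_tree, 'George'))  # None
print(find_ok(small_tree, 'George'))     # True
```

The only difference between the two is whether the result of the recursive call is returned by the caller.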
Secondly, the way to build such a path is precisely to take advantage of the recursion. Trying to fill a list in a global variable when you find a matching leaf is not easy, and you may end up computing the path with an iterative algorithm, when the whole point of recursion is to do that for you.
Here is a working recursive method (I tried to keep it as close to your own code as possible):
tree=('Alan', [('Bob', [('Chris', []), ('Debbie', [('Cindy', [])])]), ('Eric', [('Dan', []), ('Fanny', [('George', [])])]), ('Hannah', [])])
def list_parentals(tree, element):
    if tree[0] == element:
        return [element]
    else:
        for child in tree[1]:
            sub=list_parentals(child, element)
            if sub:
                return [tree[0]] + sub
    return []
r=list_parentals(tree, 'George')
print(f"{r=}")
The logic is that list_parentals(tree, 'George') returns the path in tree to the leaf 'George', or [] (or None, or False, it doesn't matter) if no such leaf is found.
So (just think of it as a mathematical expression, not as code, for now), list_parentals(('Alan', [('Bob', [('Chris', []), ('Debbie', [('Cindy', [])])]), ('Eric', [('Dan', []), ('Fanny', [('George', [])])]), ('Hannah', [])]), 'George') is just Alan plus list_parentals(('Eric', [('Dan', []), ('Fanny', [('George', [])])]), 'George').
list_parentals(('Eric', [('Dan', []), ('Fanny', [('George', [])])]), 'George') is just Eric plus list_parentals(('Fanny', [('George', [])]), 'George').
list_parentals(('Fanny', [('George', [])]), 'George') is just Fanny plus list_parentals(('George', []), 'George').
list_parentals(('George', []), 'George') is just George.
To keep this consistent, all returns must be lists.
Hence my code.
list_parentals(tree, leafName) is [leafName] if the root of the tree matches leafName.
Otherwise,
list_parentals(tree, leafName) is [tree[0]] + list_parentals(child, leafName) if one of the children contains leafName, that is, if list_parentals(child, leafName) is not empty.

Related

How to use recursion to get all children when each parent has an array of children?

I have a dictionary d of parent:list(children) as so:
d = \
{'accounts': ['nodate'],
'retfile': ['bb', 'aa'],
'snaps': [],
'raw': ['pview', 'status'],
'nodate': ['tainlog', 'retfile'],
'mview': ['status', 'nodate'],
'pview': [],
'retlog': [],
'aa': ['payfile'],
'remfile': ['retlog'],
'tainlog': ['remfile'],
'payfile': [],
'bb': ['payfile'],
'balance': [],
'status': ['accounts', 'snaps', 'nodate'],
'charges': ['mview', 'raw', 'balance']}
For each parent key I want to get all descendants, so I was thinking of recursion. So far I have this:
def get_all_descendants(t, l):
    l.extend(list(d[t]))
    if not d[t]:
        return set(l)
    for t1 in d[t]:
        return get_all_descendants(t1, l)
but this is not good, as the for loop never really iterates over all the elements in the list; rather, the recursion starts over for the first element.
For instance consider 'charges', my function is returning 11 elements:
{'accounts',
'balance',
'mview',
'nodate',
'raw',
'remfile',
'retfile',
'retlog',
'snaps',
'status',
'tainlog'}
But I want 15 elements (all the keys in d except 'charges'); for instance, 'pview' (which is the child of 'raw') is missing.
Any ideas?
You can do a depth-first search. Just keep track of which items you've seen in a set. Then yield each one you haven't seen and yield from the result of the recursive call:
def get_all_descendants(t, d, seen=None):
    if seen is None:
        seen = set([t])
    for item in d[t]:
        if item not in seen:
            seen.add(item)
            yield item
            yield from get_all_descendants(item, d, seen)
list(get_all_descendants('charges', d))
This will give you this list:
['mview',
'status',
'accounts',
'nodate',
'tainlog',
'remfile',
'retlog',
'retfile',
'bb',
'payfile',
'aa',
'snaps',
'raw',
'pview',
'balance']
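Since the question asks for the descendants of every key, the generator above can be applied to each key in turn. This is a minimal sketch on a small made-up mapping (small_d is hypothetical, chosen so the result is easy to check by hand); the generator itself is repeated so the snippet runs on its own.

```python
def get_all_descendants(t, d, seen=None):
    """Yield every node reachable from t in d, each one exactly once."""
    if seen is None:
        seen = {t}  # the starting node is never reported as its own descendant
    for item in d[t]:
        if item not in seen:
            seen.add(item)
            yield item
            yield from get_all_descendants(item, d, seen)

# Hypothetical mapping: 'd' is reachable from 'a' through both 'b' and 'c'.
small_d = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}

# One fresh traversal per key, collected into a set.
all_descendants = {key: set(get_all_descendants(key, small_d)) for key in small_d}
print(all_descendants)
```

Each key gets its own traversal with its own seen set, so results for one key do not leak into another.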
I think @Mark's answer can be improved. Creating an inner loop function means we do not check whether seen has been initialized on each call. Moving the if outside of the for loop means you can skip many unnecessary checks on child nodes when the parent node has already been seen. The loop also uses a closure, so d and seen do not need to be passed to every recursive call -
def dfs(t, d):
    seen = set() # unconditional
    def loop(t): # single parameter
        if t not in seen: # if outside of loop
            seen.add(t)
            yield t
            for item in d[t]:
                yield from loop(item)
    return drop(loop(t), 1) # exclude starting node from result
This depends on a generic drop helper which skips the first n items of an iterator -
def drop(it, n):
    for _ in range(n):
        next(it, None)
    return it
The output is identical -
for node in dfs("charges", d):
    print(node)
mview
status
accounts
nodate
tainlog
remfile
retlog
retfile
bb
payfile
aa
snaps
raw
pview
balance

Sort a list of dictionaries by multiple keys/values, where the order of the values should be specific

I have a list of dictionaries and want the items to be sorted by specific property values.
The list:
[
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'name': 'gama', 'status': 'par'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'aeta', 'status': 'unknown'},
    {'pname': 'boc', 'status': 'run'}
]
I know I can do:
newlist = sorted(init_list, key=lambda k: (k['name'], k['status']))
but there are a few more conditions:
If the key name is not present in a dict, use the value of the pname key as the name.
The status order should be ['out', 'in', 'par', 'run'].
If the status value is not in that list, ignore it (see unknown).
The result should be:
[
    {'name': 'aeta', 'status': 'unknown'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'pname': 'boc', 'status': 'run'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'gama', 'status': 'par'}
]
Use
from itertools import count
# Use count() instead of range(4) so that we
# don't need to worry about the length of the status list.
newlist = sorted(init_list,
                 key=lambda k: (k.get('name', k.get('pname')),
                                dict(zip(['out', 'in', 'par', 'run'], count())
                                     ).get(k['status'], -1)
                                )
                 )
If k['name'] doesn't exist, fall back to k['pname'] (or None if that doesn't exist either). Likewise, if there is no known integer for the given status, default to -1.
I deliberately put this all in one logical line to demonstrate that at this point, you may want to just define the key function using a def statement.
def list_order(k):
    name_to_use = k.get('name')
    if name_to_use is None:
        name_to_use = k['pname']  # Here, just assume pname is available
    # Explicit definition; you might still write
    # status_orders = dict(zip(['out', ...], count())),
    # or better yet a dict comprehension like
    # { status: rank for rank, status in enumerate(['out', ...]) }
    status_orders = {
        'out': 0,
        'in': 1,
        'par': 2,
        'run': 3
    }
    status_to_use = status_orders.get(k['status'], -1)
    return name_to_use, status_to_use
newlist = sorted(init_list, key=list_order)
The first condition is simple: default the first value of the ordering tuple to pname, i.e.
lambda k: (k.get('name', k.get('pname')), k['status'])
For the second and third rules I would define an order dict for the statuses
status_order = {key: i for i, key in enumerate(['out', 'in', 'par', 'run'])}
and then use it in the key function, with a default of -1 so that unknown statuses still compare as integers in Python 3
lambda k: (k.get('name', k.get('pname')), status_order.get(k['status'], -1))
I haven't tested it, so it might need some tweaking.
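Putting the pieces from the answers together, the following is a minimal end-to-end sketch. It assumes the data from the question written as valid dict literals; the names STATUS_RANK and sort_key are made up for this example.

```python
# Rank of each known status; anything else falls back to -1 and sorts first.
STATUS_RANK = {status: rank
               for rank, status in enumerate(['out', 'in', 'par', 'run'])}

def sort_key(item):
    """Sort by name (falling back to pname), then by status rank."""
    name = item.get('name', item.get('pname'))
    return name, STATUS_RANK.get(item['status'], -1)

init_list = [
    {'name': 'alpha', 'status': 'run'},
    {'name': 'alpha', 'status': 'in'},
    {'name': 'alpha-32', 'status': 'in'},
    {'name': 'beta', 'status': 'out'},
    {'name': 'gama', 'status': 'par'},
    {'name': 'gama', 'status': 'in'},
    {'name': 'aeta', 'status': 'run'},
    {'name': 'aeta', 'status': 'unknown'},
    {'pname': 'boc', 'status': 'run'},
]

newlist = sorted(init_list, key=sort_key)
for item in newlist:
    print(item)
```

With this key, the 'aeta' entry with the unknown status sorts ahead of the 'aeta' entry with status 'run', which matches the expected result in the question.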

sorting a reduce query with multiple terms and Q filters

I am trying to make a search function that queries on multiple attributes of a model. To make matters a bit tougher, I want to be able to do it with multiple terms inside a list comprehension and then sort the results so that the more accurate hits come first.
For example, if the search terms were ['green', 'shoe'] and I had an object named 'green shoe', I would want that to be the first item in my result, followed by 'black shoe' or 'green pants'.
Here is what I have so far that extracts the search terms from the query param and then runs the Q queries.
def get_queryset(self):
    search_terms = self.request.GET.getlist('search', None)
    terms = []
    x = [terms.extend(term.lower().replace('/', '').split(" "))
         for term in search_terms]
    # x is useless, but it is just better to look at.
    results = reduce(operator.or_,
                     (Item.objects.filter(Q(name__icontains=term) |
                                          Q(description__icontains=term) |
                                          Q(option__name__icontains=term))
                      for term in terms))
    return results
This would return ['black shoe', 'green pants', 'green shoe'] which is out of order, but it is all of the matching results.
I realize I could make it not split the search term up into multiple terms and would only get one result but then I wouldn't be getting other things that are similar either.
Thanks for looking
Edit 1
So after the first answer I started to play around with it. Now this produces the result I want, but I feel like it may be just terrible due to adding the query set to a list. Let me know what you think:
def get_queryset(self):
    search_terms = self.request.GET.getlist('search', None)
    if not search_terms or '' in search_terms or ' ' in search_terms:
        return []
    terms = [term.lower().replace('/', '').split(" ") for term in search_terms][0]
    results = reduce(operator.or_,
                     (Item.objects.filter
                      (Q(name__icontains=term) | Q(description__icontains=term) | Q(option__name__icontains=term))
                      for term in terms))
    # creating a list so I can index later
    # Couldn't find an easy way to index on a generator/queryset
    results = list(results)
    # Using enumerate so I can get the index, storing index at end of list for future reference
    # Concats the item name and the item description into one list, using that for the items weight in the result
    results_split = [t.name.lower().split() + t.description.lower().split() + list((x,)) for x, t in enumerate(results)]
    query_with_weights = [(x, len(search_terms[0].split()) - search_terms[0].split().index(x)) for x in terms]
    get_weight = lambda x: ([weight for y, weight in query_with_weights if y==x] or [0])[0]
    sorted_results = sorted([(l, sum([(get_weight(m)) for m in l])) for l in results_split], key=lambda lst: lst[1], reverse=True)
    # Building the final list based off the sorted list and the index of the items.
    final_sorted = [results[result[0][-1]] for result in sorted_results]
    print results_split
    print query_with_weights
    print final_sorted
    return final_sorted
A query of [red, shoes, pants] would print out this:
# Combined name and description of each item
[[u'red', u'shoe', u'sweet', u'red', u'shoes', u'bro', 0], [u'blue', u'shoe', u'sweet', u'blue', u'shoes', u'bro', 1], [u'red', u'pants', u'sweet', u'red', u'pants', u'bro', 2], [u'blue', u'pants', u'sweet', u'blue', u'pants', u'bro', 3], [u'red', u'swim', u'trunks', u'sweet', u'red', u'trunks', u'bro', 4]]
# Weighted query
[(u'red', 3), (u'shoes', 2), (u'pants', 1)]
# Final list of sorted items from queryset
[<Item: Red Shoe>, <Item: Red Pants>, <Item: Red Swim Trunks>, <Item: Blue Shoe>, <Item: Blue Pants>]
This is not exactly a QuerySet problem.
This needs a separate algo that decides the ordering of the result set that you create. I would write a new algo that decides the ordering - possibly a whole array of algos because your results would depend on the category of the query itself.
For now, I can think of adding a weight to every result in the result set that measures how close it is to the query, based on some parameters.
In your case, your parameters would be as follows:
How many words matched?
The words that appear first should get the highest priority
Any query that matches fully should have the highest priority as well
The words on the far end of the query should have lowest priority
Anyway, that is an idea to begin with; I am sure you will end up with something much more complex.
So here's the code to create the ordering:
query = 'green shoe'
query_with_weights = [(x, len(query.split()) - query.split().index(x)) for x in query.split()]
results = ['black pants', 'green pants', 'green shoe']
results_split = [res.split() for res in results]
get_weight = lambda x: ([weight for y, weight in query_with_weights if y==x] or [0])[0]
sorted_results = sorted([ (l, sum([( get_weight(m)) for m in l])) for l in results_split], key = lambda lst: lst[1], reverse=True)
print('sorted_results={}'.format(sorted_results))
Once you try this, you will get the following results:
sorted_results=[(['green', 'shoe'], 3), (['green', 'pants'], 2),
(['black', 'pants'], 0)]
I hope this explains the point. However, this algo will only work for simple text. You might have to change your algo based on electrical items, for example, if your website depends on it. Sometimes you may have to look into properties of the object itself. This should be a good starter.
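The same weighting idea can be wrapped in a small reusable function. This is only a sketch in Python 3 under the same assumptions as the snippet above (plain whitespace-separated text, earlier query words weigh more); the name rank_by_query is made up for this example.

```python
def rank_by_query(query, names):
    """Sort names so that those matching earlier query words come first."""
    words = query.lower().split()
    # Earlier words get larger weights; for a repeated word the first
    # occurrence wins, which mirrors the use of list.index() above.
    weights = {}
    for position, word in enumerate(words):
        weights.setdefault(word, len(words) - position)

    def score(name):
        # Sum the weights of all query words that appear in the name.
        return sum(weights.get(word, 0) for word in name.lower().split())

    return sorted(names, key=score, reverse=True)

print(rank_by_query('green shoe', ['black pants', 'green pants', 'green shoe']))
# ['green shoe', 'green pants', 'black pants']
```

Using a dict for the weights avoids rescanning the weight list for every word, but the scoring rule itself is unchanged.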

how to make list comprehension using while in loop

I have such loop:
ex = [{u'white': []},
{u'yellow': [u'9241.jpg', []]},
{u'red': [u'241.jpg', []]},
{u'blue': [u'59241.jpg', []]}]
for i in ex:
    while not len(i.values()[0]):
        break
    else:
        print i
        break
I always need to return the first dict whose value has a length greater than 0,
but I want to do it with a list comprehension.
A list comprehension would produce a whole list, while you want just one item.
Use a generator expression instead, and have the next() function iterate to the first value:
next((i for i in ex if i.values()[0]), None)
I've given next() a default to return as well; if there is no matching dictionary, None is returned instead.
Demo:
>>> ex = [{u'white': []},
... {u'yellow': [u'9241.jpg', []]},
... {u'red': [u'241.jpg', []]},
... {u'blue': [u'59241.jpg', []]}]
>>> next((i for i in ex if i.values()[0]), None)
{u'yellow': [u'9241.jpg', []]}
You should, however, rethink your data structure. Dictionaries with just one key-value pair suggest to me you wanted a different type instead; tuples perhaps:
ex = [
(u'white', []),
(u'yellow', [u'9241.jpg', []]),
(u'red', [u'241.jpg', []]),
(u'blue', [u'59241.jpg', []]),
]
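The snippets above are Python 2, where dict.values() returns a list. In Python 3 it returns a view that cannot be indexed, so here is a sketch of the same next() idea for Python 3, first on the tuple layout and then on the original one-key dicts. The variable names first, ex_dicts and first_dict are made up for this example.

```python
ex = [
    ('white', []),
    ('yellow', ['9241.jpg', []]),
    ('red', ['241.jpg', []]),
    ('blue', ['59241.jpg', []]),
]

# First (name, files) pair whose file list is non-empty, or None if there is none.
first = next(((name, files) for name, files in ex if files), None)
print(first)  # ('yellow', ['9241.jpg', []])

# Same idea with the one-key dicts from the question. any() is true when
# the single value is non-empty, and it avoids indexing the values view.
ex_dicts = [{'white': []}, {'yellow': ['9241.jpg', []]}, {'red': ['241.jpg', []]}]
first_dict = next((d for d in ex_dicts if any(d.values())), None)
print(first_dict)  # {'yellow': ['9241.jpg', []]}
```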

Python: Convert a nested list of tuples of lists into a dictionary?

I have a nested list of tuples of lists (of tuples, etc.) that looks like this:
[('person',
  [(('surname', u'Doe', True),),
   (('name', u'John', False),),
   ('contact',
    [(('email', u'john#doe.me', True),),
     (('phone', u'+0123456789', False),),
     (('fax', u'+0987654321', False),)]),
   ('connection',
    [(('company', u'ibcn', True),),
     ('contact',
      [(('email', u'mail#ibcn.com', True),),
       (('address', u'main street 0', False),),
       (('city', u'pythonville', False),),
       (('fax', u'+0987654321', False),)])])])]
There is no way of knowing either the number of (double) tuples within a list or how deep the nesting goes.
I want to convert it to a nested dictionary (of dictionaries), eliminating the boolean values, like this:
{'person': {'name': 'John', 'surname': 'Doe',
    'contact': {'phone': '+0123456789', 'email': 'john#doe.me','fax': '+0987654321',
    'connection': {'company name': 'ibcn', 'contact':{'phone': '+2345678901',
        'email': 'mail#ibcn.com', 'address': 'main street 0',
        'city': 'pythonville', 'fax': '+0987654321'
}}}}}
All I have, so far, is a recursive method that can print the nested structure in a per-line fashion:
def tuple2dict(_tuple):
    for item in _tuple:
        if type(item) == StringType or type(item) == UnicodeType:
            print item
        elif type(item) == BooleanType:
            pass
        else:
            tuple2dict(item)
but, I'm not sure I'm on the right track...
EDIT:
I've edited the original structure, since it was missing a comma.
You are on the right track. The recursive approach will work. As far as I can tell from your sample data, each tuple starts with a string item containing the key. After that you have either another tuple or list as the value, or a string value followed by a boolean True or False.
EDIT:
The trick with recursion is that you have to know when to stop. Basically, in your case it appears that the deepest structures are the nested 3-tuples, which match names to values.
Hacking away a bit. I shamefully admit this is the ugliest code in the world.
def tuple2dict(data):
    d = {}
    for item in data:
        if len(item) == 1 and isinstance(item, tuple):
            # remove the nested structure, you may need a loop here
            item = item[0]
            key = item[0]
            value = item[1]
            d[key] = value
            continue
        key = item[0]
        value = item[1]
        if hasattr(value, '__getitem__'):
            value = tuple2dict(value)
        d[key] = value
    return d
not beautiful ... but it works... basically
def t2d(t):
    if isinstance(t,basestring):return t
    length = len(t)
    if length == 1:
        return t2d(t[0])
    if length == 2:
        t1,t2 = t2d(t[0]),t2d(t[1])
        print "T:",t1,t2
        if isinstance(t1,dict) and len(t1) == 1:
            t2['name'] = t1.values()[0]
            t1 = t1.keys()[0]
        return dict([[t1,t2]])
    if length == 3 and isinstance(t[2],bool):
        return t2d(t[:2])
    L1 =[t2d(tp) for tp in t]
    L2 = [lx.items() for lx in L1]
    L3 = dict( [i[0] for i in L2])
    return L3
I should mention it works specifically with the dict you posted... (it seems like company wasn't quite set up right, so I hacked it (see t2['name']...))
You are definitely on the right track. A recursive function is the way to go.
A dictionary can be built from an iterable giving tuples of length 2.
Now, to get rid of the boolean, you can use slicing. Just slice off everything except the first two elements.
>>> (1, 2, 3)[:2]
(1, 2)
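The two points above combine directly: slicing each (key, value, flag) triple down to its first two elements gives pairs that dict() accepts. This is a small sketch with a made-up list of leaves in the same shape as the question's innermost tuples.

```python
# Hypothetical leaves: (key, value, boolean flag)
leaves = [('surname', 'Doe', True), ('name', 'John', False)]

# leaf[:2] drops the boolean, leaving (key, value) pairs for dict().
person = dict(leaf[:2] for leaf in leaves)
print(person)  # {'surname': 'Doe', 'name': 'John'}
```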
This is my final solution. Not very elegant, I must admit.
It also takes care of multiple entries for a key by concatenating it with the existing one.
def tuple2dict(_obj):
    _dict = {}
    for item in _obj:
        if isinstance(item, tuple) or isinstance(item, list):
            if isinstance(item[0], basestring):
                _dict[item[0]] = tuple2dict(item[1])
            else:
                if isinstance(item[0], tuple):
                    # if the key already exists, then concatenate the old
                    # value with the new one, adding a space in between.
                    _key = item[0][0]
                    if _key in _dict:
                        _dict[_key] = " ".join([_dict[_key], item[0][1]])
                    else:
                        _dict[_key] = item[0][1]
    return _dict
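For readers on Python 3 (where basestring no longer exists), here is a compact sketch of the same recursion. It assumes only the two shapes visible in the sample data: a leaf is a 1-tuple wrapping a (key, value, flag) triple, and a branch is a (key, list) pair. The name nested_to_dict and the shortened sample are made up for this example. Unlike the solution above, a repeated key simply overwrites the earlier value instead of being concatenated.

```python
def nested_to_dict(items):
    """Convert a list of leaves and branches into a nested dict.

    Assumed shapes, taken from the sample data in the question:
      leaf:   ((key, value, flag),)  a 1-tuple wrapping a 3-tuple
      branch: (key, [items...])      a name followed by a list of children
    """
    result = {}
    for item in items:
        if len(item) == 1:
            key, value = item[0][:2]  # slice off the boolean flag
            result[key] = value
        else:
            key, children = item
            result[key] = nested_to_dict(children)  # recurse into the branch
    return result

# Shortened version of the structure from the question.
data = [('person',
         [(('surname', 'Doe', True),),
          (('name', 'John', False),),
          ('contact',
           [(('email', 'john#doe.me', True),),
            (('phone', '+0123456789', False),)])])]

print(nested_to_dict(data))
```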
