How to properly use recursion and side effects in Python - python

In a tree structure, I'm trying to find all leafs of a branch. Here is what I wrote:
def leafs_of_branch(node,heads=[]):
    if len(node.children()) == 0:
        heads.append(str(node))
    else:
        for des in node.children():
            leafs_of_branch(des)
    return heads
leafs_of_branch(node)
I don't know why, but it feels wrong to me. It works, but I want to know whether there is a better way to use recursion without creating the heads parameter.

This
def leafs_of_branch(node,heads=[]):
is always a bad idea. Better would be
def leafs_of_branch(node,heads=None):
    heads = heads or []
as otherwise every call that omits heads shares the same list. In your specific case it might be OK, but sooner or later you will run into problems.
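One caveat with the `heads or []` idiom is that `or` replaces any falsy argument, including an empty list the caller passed in on purpose. The sketch below contrasts it with an explicit `is None` test; the function names `collect_or` and `collect_none` are made up for illustration.

```python
def collect_or(value, dest=None):
    # `or` swaps in a new list for ANY falsy argument,
    # including an empty list owned by the caller
    dest = dest or []
    dest.append(value)
    return dest


def collect_none(value, dest=None):
    # only a missing argument gets a fresh list
    if dest is None:
        dest = []
    dest.append(value)
    return dest


mine = []
collect_or(1, mine)       # appends to a new list; `mine` stays empty
shared = []
collect_none(1, shared)   # appends to the caller's list
```

If the caller never passes a list in, both variants behave the same; the difference only shows up when an empty list is handed over to be filled.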
I recommend:
def leafs_of_branch(node):
    leafs = []
    for des in node.children():
        leafs.extend(leafs_of_branch(des))
    if len(leafs)==0:
        leafs.append(str(node))
    return leafs
leafs_of_branch(node)
Instead of testing if len(node.children()) == 0 first, I check len(leafs) after descending into all (possibly zero) children. Thus I call node.children() only once per node.

I believe this should work:
def leafs_of_branch(node):
    if len(node.children()) == 0:
        return [str(node)]
    else:
        x = []
        for des in node.children():
            x += leafs_of_branch(des) #x.extend(leafs_of_branch(des)) would work too :-)
        return x
It's not very pretty and could probably be condensed a bit more, but I was trying to keep the form of your original code as much as possible to make it obvious what was going on.
Your original version won't actually work if you call it more than once because as you append to the heads list, that list will actually be saved between calls.

As far as recursion goes, you are doing it right IMO; you are missing the heads parameter on the recursive call, though. The reason it works anyway is what other people said: default values are created once, when the function is defined, and reused between calls.
If you want to avoid recursion altogether, in this case you can use either a queue or a stack and a loop:
def leafs_of_branch(node):
    traverse = [node]
    leafs = []
    while traverse:
        node = traverse.pop()
        children = node.children()
        if children:
            traverse.extend(children)
        else:
            leafs.append(str(node))
    return leafs
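The list used above acts as a stack, so the last child is visited first and leaves come out right to left. For comparison, here is a minimal sketch of the queue variant using collections.deque, which visits nodes level by level. The Node class is a hypothetical stand-in for the question's tree type, assumed only to expose children() and str().

```python
from collections import deque


class Node:
    """Hypothetical stand-in for the tree node type used in the question."""

    def __init__(self, name, kids=()):
        self.name = name
        self._kids = list(kids)

    def children(self):
        return self._kids

    def __str__(self):
        return self.name


def leafs_of_branch(node):
    queue = deque([node])
    leafs = []
    while queue:
        current = queue.popleft()      # FIFO: oldest node first
        children = current.children()
        if children:
            queue.extend(children)
        else:
            leafs.append(str(current))
    return leafs


tree = Node("root", [Node("a"), Node("b", [Node("c"), Node("d")])])
leafs = leafs_of_branch(tree)          # ['a', 'c', 'd']
```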

You may also define an iterator recursively, this way:
def leafs_of_branch(node):
    if len(node.children()) == 0:
        yield str(node)
    else:
        for des in node.children():
            for leaf in leafs_of_branch(des):
                yield leaf
leafs = list(leafs_of_branch(node))
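On Python 3.3 and later, the inner for loop of such a recursive generator can be written with yield from. A minimal sketch, again assuming a hypothetical Node class that only provides children() and str():

```python
class Node:
    """Hypothetical stand-in for the tree node type used in the question."""

    def __init__(self, name, kids=()):
        self.name = name
        self._kids = list(kids)

    def children(self):
        return self._kids

    def __str__(self):
        return self.name


def leafs_of_branch(node):
    children = node.children()
    if not children:
        yield str(node)
    for child in children:
        # delegate to the sub-generator instead of looping over it by hand
        yield from leafs_of_branch(child)


tree = Node("root", [Node("a"), Node("b", [Node("c"), Node("d")])])
leafs = list(leafs_of_branch(tree))    # ['a', 'c', 'd']
```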

First of all, refrain from using mutable objects (lists, dicts, etc.) as default values, since a default value is created once, when the function is defined, and reused between function calls:
def bad_func(val, dest=[]):
    dest.append(val)
    print(dest)
>>> bad_func(1)
[1]
>>> bad_func(2)
[1, 2] # surprise!
So subsequent calls will do something completely unexpected.
As for the recursion question, I'd re-write it like this:
from itertools import chain
def leafs_of_branch(node):
    children = node.children()
    if not children: # better than len(children) == 0
        return (node, )
    all_leafs = (leafs_of_branch(child) for child in children)
    return chain(*all_leafs)
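Note that chain(*all_leafs) has to exhaust the generator expression in order to unpack it, so every sub-call runs before the first leaf is produced. A possible variant, sketched here with a hypothetical Node class, uses chain.from_iterable to keep the traversal lazy:

```python
from itertools import chain


class Node:
    """Hypothetical stand-in for the tree node type used in the question."""

    def __init__(self, name, kids=()):
        self.name = name
        self._kids = list(kids)

    def children(self):
        return self._kids

    def __str__(self):
        return self.name


def leafs_of_branch(node):
    children = node.children()
    if not children:
        return iter((node,))
    # from_iterable pulls one child iterator at a time, on demand
    return chain.from_iterable(leafs_of_branch(child) for child in children)


tree = Node("root", [Node("a"), Node("b", [Node("c"), Node("d")])])
names = [str(leaf) for leaf in leafs_of_branch(tree)]   # ['a', 'c', 'd']
```

As in the version above, this yields the node objects themselves rather than their string form, so the caller decides how to render them.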

Related

Python: How can I implement yield in my recursion?

How can I implement yield from in my recursion? I am trying to understand how to implement it but failing:
import pandas as pd
# some data
init_parent = [1020253]
df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})
# look for parent child
def recur1(df, parents, parentChild=None, step=0):
    if len(parents) != 0:
        yield parents, parentChild
    else:
        parents = df.loc[df['parent'].isin(parents)][['id', 'parent']]
        parentChild = parents['parent'].to_numpy()
        parents = parents['id'].to_numpy()
        yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1)
# exec / only printing results atm
out = recur1(df, init_parent, step=0)
[x for x in out]
I'd say your biggest issue here is that recur1 isn't always guaranteed to return a generator. For example, suppose your stack calls into the else branch three times before calling into the if branch. In this case, the top three frames would be returning a generator received from the lower frame, but the lowest frame would be returning from this:
yield parents, parentChild
So, then, there is a really simple way you can fix this code to ensure that yield from works. Simply transform your return from a tuple to a generator-compatible type by enclosing it in a list:
yield [(parents, parentChild)]
Then, when you call yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1) you'll always be working with something for which yield from makes sense.
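To see the yield / yield from pattern in isolation, here is a small sketch that walks parent-to-child generations using a plain dict instead of a DataFrame. The function name descend and the children_of mapping are made up for illustration and are not part of the question's code.

```python
def descend(children_of, parents, step=0):
    """Yield (step, parents) for each generation until none are left."""
    if not parents:
        return                      # base case: the generator simply ends
    yield step, list(parents)       # report the current generation
    next_gen = [c for p in parents for c in children_of.get(p, [])]
    # hand the remaining generations over to the recursive generator
    yield from descend(children_of, next_gen, step + 1)


children_of = {1020253: [1101941, 1101945]}
levels = list(descend(children_of, [1020253]))
# [(0, [1020253]), (1, [1101941, 1101945])]
```

Each frame yields its own result first and then delegates the rest, so the caller sees a single flat stream of results regardless of the recursion depth.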

Recursion puzzle with key in the box

I am currently trying to understand recursion with a made-up example. Imagine you have a briefcase that can be opened with a key. The key is in a big box, which can contain other smaller boxes, and the key might be in any of them.
In my example boxes are lists. The recursion appears when we find a smaller box: we search it for the key. The problem is that my function can find the key if it is actually in the box, but it can't go back if there is nothing like 'key'.
Unfortunately, I could not work out how to go back if there is no key in the smaller box. Can you help me solve this puzzle? By the way, have a nice day! Here is the code (the big box is arranged so that the key can be found and returned):
box = ['socks', 'papers', ['jewelry', 'flashlight', 'key'], 'dishes', 'souvernirs', 'posters']
def look_for_key(box):
    for item in box:
        if isinstance(item, list) == True:
            look_for_key(item)
        elif item == 'key':
            print('found the key')
            key = item
            return key
print(look_for_key(box))
Iteration
The solution closest to yours that is still readable, as far as I could find, is:
def look_for_key(box):
    for item in box:
        if item == 'key':
            return item
        elif isinstance(item, list) and look_for_key(item) is not None:
            return look_for_key(item)
        else:
            pass
box = [['sock','papers'],['jewelry','key']]
look_for_key(box)
# ==> 'key'
I don't like it because its deduction condition includes a recursive call, which is hard to interpret. Assigning look_for_key(item) to a variable and checking it for not None afterwards does not improve interpretability; it is just as difficult to follow. An equivalent but more interpretable solution is:
def look_for_key(box):
    def inner(item, remained):
        if item == [] and remained == []:
            return None
        elif isinstance(item, list) and item != []:
            return inner(item[0], [item[1:], remained])
        elif item == [] or item != 'key':
            return inner(remained[0], remained[1:])
        elif item == 'key':
            return item
    return inner(box[0], box[1:])
box = [['sock','papers'],['jewelry','key']]
look_for_key(box)
# ==> 'key'
It explicitly splits the tree into branches (see below what this means) with return inner(item[0], [item[1:], remained]) and return inner(remained[0], remained[1:]), instead of implicitly reusing the recursive call inside a condition, as in if look_for_key(item) is not None: return look_for_key(item). With that line of code it is hard to draw a diagram and see in which direction the recursion goes.
The 2nd solution also makes it easier to infer the complexity using a tree diagram, since you see the branches explicitly, for example remained[0] vs. remained[1:].
As inner is simply an iteration written in a functional way, and a for loop is syntactic sugar for iteration, both solutions should have similar complexity in principle.
Since you do not just want a solution but also a better understanding of recursion, I would try the following approach.
Mapping over Trees (Map-Reduce)
This is a typical textbook tree-recursion question. What you want is to traverse a hierarchical data structure called a tree. A typical solution is mapping a function over the tree:
from functools import reduce
def look_for_key(tree):
    def look_inner(sub_tree):
        if isinstance(sub_tree, list):
            return look_for_key(sub_tree)
        elif sub_tree == 'key':
            return [sub_tree]
        else:
            return []
    return reduce(lambda left_branch, right_branch: look_inner(left_branch) + look_inner(right_branch), tree, [])
box = ['socks', 'papers', ['jewelry', 'flashlight', 'key'], 'dishes', 'souvernirs', 'posters']
look_for_key(box)
# ==> ['key']
To make it explicit I use tree, sub_tree, left_branch, right_branch as variable names instead of box, inner_box and so on as in your example. Notice how the function look_for_key is mapped over each left_branch and right_branch of the sub_trees in the tree. The result is then summarized using reduce (a classic map-reduce procedure).
To make this clearer, you can omit the reduce part and keep only the map part:
def look_for_key(tree):
    def look_inner(sub_tree):
        if isinstance(sub_tree, list):
            return look_for_key(sub_tree)
        elif sub_tree == 'key':
            return sub_tree
        else:
            return None
    return list(map(look_inner, tree))
look_for_key(box)
# ==> [None, None, [None, None, 'key'], None, None, None]
This does not generate your intended result format, but it helps to understand how the recursion works. map just adds a layer of abstraction for recursively looking for keys in sub-trees, which is equivalent to the syntactic sugar of a for loop in Python. That is not important. The essential thing is decomposing the tree properly (deduction) and setting up a proper base condition to return the result.
Native Tree Recursion
If it is still not clear enough, you can get rid of all abstractions and syntactic sugar and just build a native recursion from scratch:
def look_for_key(box):
    if box == []:
        return []
    elif not isinstance(box, list) and box == 'key':
        print('found the key')
        return [box]
    elif not isinstance(box, list) and box != 'key':
        return []
    else:
        return look_for_key(box[0]) + look_for_key(box[1:])
look_for_key(box)
# ==> found the key
# ==> ['key']
Here all three fundamental elements of recursion:
base cases
deduction
recursive calls
are explicitly displayed. You can also see clearly from this example that there is no miracle in getting out of an inner box (or sub-tree). To look into every possible corner inside the box (or tree), you just repeatedly split it into two parts in every smaller box (or sub-tree). Then you properly combine your results at each level (so-called fold, reduce, or accumulate), here using +, and the recursive calls take care of returning to the top level.
Both the native recursion and map-reduce approaches are able to find multiple keys, because they traverse the whole tree and accumulate all matches:
box = ['a','key','c', ['e', ['f','key']]]
look_for_key(box)
# ==> found the key
# ==> found the key
# ==> ['key', 'key']
Recursion Visualization
Finally, to fully understand what is going on with the tree recursion, you could plot the recursive depth and visualize how the calls are moving to deeper levels and then returned.
import functools
import matplotlib.pyplot as plt
# ignore the error of unhashable data type
def ignore_unhashable(func):
    uncached = func.__wrapped__
    attributes = functools.WRAPPER_ASSIGNMENTS + ('cache_info', 'cache_clear')
    @functools.wraps(func, assigned=attributes)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except TypeError as error:
            if 'unhashable type' in str(error):
                return uncached(*args, **kwargs)
            raise
    wrapper.__uncached__ = uncached
    return wrapper
# rewrite the native recursion and cache the recursive calls
@ignore_unhashable
@functools.lru_cache(None)
def look_for_key(box):
    global depth, depths
    depth += 1
    depths.append(depth)
    result = ([] if box == [] else
              [box] if not isinstance(box, list) and box == 'key' else
              [] if not isinstance(box, list) and box != 'key' else
              look_for_key(box[0]) + look_for_key(box[1:]))
    depth -= 1
    return result
# function to plot recursion depth
def plot_depths(f, *args, show=slice(None), **kwargs):
    """Plot the call depths for a cached recursive function"""
    global depth, depths
    depth, depths = 0, []
    f.cache_clear()
    f(*args, **kwargs)
    plt.figure(figsize=(12, 6))
    plt.xlabel('Recursive Call Number'); plt.ylabel('Recursion Depth')
    X, Y = range(1, len(depths) + 1), depths
    plt.plot(X[show], Y[show], '.-')
    plt.grid(True); plt.gca().invert_yaxis()
box = ['socks', 'papers', ['jewelry', 'flashlight', 'key'], 'dishes', 'souvernirs']
plot_depths(look_for_key, box)
Whenever the function is called recursively, the curve goes to a deeper level (the downward slash). When the tree/sub-tree is split into left and right branches, two calls happen at the same level: the horizontal line connecting two dots (the two calls look_for_key(box[0]) + look_for_key(box[1:])). When it has traversed a complete sub-tree (or branch) and reaches the last leaf in that sub-tree (a base condition, where a value or [] is returned), it starts to go back to upper levels (the valley in the curve). If you have multiple nested lists there will be multiple valleys. Eventually it traverses the whole tree and returns the results.
You can play with boxes (or trees) of different nest structures to understand better how it works. Hopefully these provide you enough information and a more comprehensive understanding of tree-recursion.
Integrating the above comments:
box = ['socks', 'papers', ['jewelry', 'flashlight', 'key'], 'dishes', 'souvernirs', 'posters']
def look_for_key(box):
    for item in box:
        if isinstance(item, list) == True:
            in_box = look_for_key(item)
            if in_box is not None:
                return in_box
        elif item == 'key':
            print('found the key')
            return item
    # not found
    return None
print(look_for_key(box))
which prints:
found the key
key
If the key is deleted from the box, executing the code prints:
None
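Another way to get "return the first match, otherwise None" is to separate the traversal from the stopping rule: a generator walks the nested lists lazily, and next() with a default takes the first hit. This is only a sketch of that idea; the helper name iter_keys and its target parameter are made up for illustration.

```python
def iter_keys(box, target='key'):
    """Lazily yield every occurrence of `target` in a nested list."""
    for item in box:
        if isinstance(item, list):
            yield from iter_keys(item, target)   # search the smaller box
        elif item == target:
            yield item


def look_for_key(box):
    # next() stops the walk at the first match;
    # the default None is returned when nothing is found
    return next(iter_keys(box), None)


box = ['socks', 'papers', ['jewelry', 'flashlight', 'key'], 'dishes']
found = look_for_key(box)                        # 'key'
missing = look_for_key(['socks', ['papers']])    # None
```

Because the generator is lazy, items after the first match are never inspected, and the same iter_keys can be wrapped in list() when all matches are wanted.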

Recursive function only returns results of one branch of nested dictionary

I have a branching nested dictionary to visualize species taxonomy data. I'm trying to write a function that gives me all the branches at a particular level.
I've tried iterative and recursive functions, but I have only gotten close using a recursive function.
However, depending on where I put return/print statements, my function either returns None (but prints the correct information), or returns only one branch of the data.
Using the second option, the output is perfect until the dataset branches.
tree = {"k-b":
            {"p-a":
                 {"c-a":{"o-a":{}, "o-b":{}},
                  "c-b":{"o-a":{}}},
             "p-b":
                 {"c-a":{"o-a":{},"o-b":{}}}}}
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        for i in tree.keys():
            return branches(tree[i], level-1)
print(branches(tree, 2))
For level 2, I expect [['c-a', 'c-b'], ['c-a']] (it doesn't have to be an array of arrays, and I don't care if it has dict_keys() or anything else around it)
I actually get dict_keys(['c-a', 'c-b']), which excludes the second branch
Alternatively, if I remove the 'return' before recursively calling branches, and uncomment the print statement, it prints:
dict_keys(['c-a', 'c-b'])
dict_keys(['c-a'])
Which is exactly the output I want, but the function returns None so I can't store that information for future applications
Your code always returns the first item in the loop, so your algorithm ends prematurely and doesn't explore all the necessary branches. You could yield the results to create a generator function (among other approaches):
tree = {"k-b":
            {"p-a":
                 {"c-a":{"o-a":{}, "o-b":{}},
                  "c-b":{"o-a":{}}},
             "p-b":
                 {"c-a":{"o-a":{},"o-b":{}}}}}
def branches(tree, level):
    if level == 0:
        yield list(tree.keys())
    elif level > 0:
        for v in tree.values():
            yield from branches(v, level - 1)
for i in range(4):
    print(f"level {i}:", list(branches(tree, i)))
Output:
level 0: [['k-b']]
level 1: [['p-a', 'p-b']]
level 2: [['c-a', 'c-b'], ['c-a']]
level 3: [['o-a', 'o-b'], ['o-a'], ['o-a', 'o-b']]
The line elif level > 0: is an optimization to avoid walking deeper into the tree than necessary.
Also, for i in tree.keys(), then tree[i] to access the value could be clearer as for v in tree.values().
You might want to return a list of all items at that level:
tree = {"k-b":
            {"p-a":
                 {"c-a":{"o-a":{}, "o-b":{}},
                  "c-b":{"o-a":{}}},
             "p-b":
                 {"c-a":{"o-a":{},"o-b":{}}}}}
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        return [branches(tree[i], level-1) for i in tree.keys()]
print(branches(tree, 2))
Output:
[[dict_keys(['c-a', 'c-b']), dict_keys(['c-a'])]]
It sounds like you want to return a list of all branches. One way to do this is with a list comprehension:
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        return [branches(tree[i], level-1) for i in tree.keys()]
Note that this will return a deeply nested list. Flattening is left as an exercise for the reader.
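One possible way to avoid the deep nesting, sketched here as an assumption about what the asker wants, is to have every call return a flat list of key lists and merge the children's results with extend:

```python
def branches(tree, level):
    """Return a flat list holding one list of keys per sub-tree at `level`."""
    if level == 0:
        return [list(tree.keys())]
    result = []
    for subtree in tree.values():
        # extend (not append) keeps the result one level deep
        result.extend(branches(subtree, level - 1))
    return result


tree = {"k-b":
            {"p-a":
                 {"c-a": {"o-a": {}, "o-b": {}},
                  "c-b": {"o-a": {}}},
             "p-b":
                 {"c-a": {"o-a": {}, "o-b": {}}}}}

level2 = branches(tree, 2)   # [['c-a', 'c-b'], ['c-a']]
```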

Python recursion - how to exit early

I've been playing with BST (binary search tree) and I'm wondering how to do an early exit. Following is the code I've written to find kth smallest. It recursively calls the child node's find_smallest_at_k, stack is just a list passed into the function to add all the elements in inorder. Currently this solution walks all the nodes inorder and then I have to select the kth item from "stack" outside this function.
def find_smallest_at_k(self, k, stack, i):
    if self is None:
        return i
    if (self.left is not None):
        i = self.left.find_smallest_at_k(k, stack, i)
    print(stack, i)
    stack.insert(i, self.data)
    i += 1
    if i == k:
        print(stack[k - 1])
        print("Returning")
    if (self.right is not None):
        i = self.right.find_smallest_at_k(k, stack, i)
    return i
It's called like this,
our_stack = []
self.root.find_smallest_at_k(k, our_stack, 0)
return our_stack[k-1]
I'm not sure if it's possible to exit early from that function. If my k is say 1, I don't really have to walk all the nodes then find the first element. It also doesn't feel right to pass list from outside function - feels like passing pointers to a function in C. Could anyone suggest better alternatives than what I've done so far?
Passing a list as an argument: Passing the list as an argument can be good practice if you make your function tail-recursive. Otherwise it's pointless. With a BST, where there are two potential recursive calls to make, that's a bit of a tall ask.
Otherwise you can just return the list. I don't see the need for the variable i. Anyway, if you absolutely need to return multiple values, you can always use a tuple, like return i, stack on one side and i, stack = root.find_smallest_at_k(k) on the other.
Fast-forwarding: For the fast-forwarding, note that the nodes to the right of a BST parent node are always bigger than the parent. Thus if you descend the tree always through the right children, you'll end up with a growing sequence of values. The first k values of that sequence are necessarily its smallest, so it's pointless to go right k times or more in a sequence.
Even if you go left at times in the middle of your descent, it's pointless to go right more than k times. The BST property ensures that if you go right, ALL subsequent numbers below in the hierarchy will be greater than the parent. Thus going right k times or more is useless.
Code: Here is some quickly made pseudo-Python code. It's not tested.
def findKSmallest( self, k, rightSteps=0 ):
    if rightSteps >= k: # We went right k times or more
        return []
    leftSmallest = self.left.findKSmallest( k, rightSteps ) if self.left is not None else []
    rightSmallest = self.right.findKSmallest( k, rightSteps + 1 ) if self.right is not None else []
    mySmallest = sorted( leftSmallest + [self.data] + rightSmallest )
    return mySmallest[:k]
EDIT The other version, following my comment.
def findKSmallest( self, k ):
    if k == 0:
        return []
    leftSmallest = self.left.findKSmallest( k ) if self.left is not None else []
    rightSmallest = self.right.findKSmallest( k - 1 ) if self.right is not None else []
    mySmallest = sorted( leftSmallest + [self.data] + rightSmallest )
    return mySmallest[:k]
Note that if k == 1, this is indeed the search for the smallest element. Any move to the right will immediately return [], which contributes nothing.
As Lærne said, you have to take care to turn your function into a tail-recursive one; then you may be interested in using a continuation-passing style. That way your function can call either itself or the "escape" function. I wrote a module called tco for optimizing tail calls; see https://github.com/baruchel/tco
Hope it can help.
Here is another approach: it doesn't exit recursion early, instead it prevents additional function calls if not needed, which is essentially what you're trying to achieve.
class Node:
    def __init__(self, v):
        self.v = v
        self.left = None
        self.right = None
def find_smallest_at_k(root, k):
    res = [None]
    count = [k]
    def helper(root):
        if root is None:
            return
        helper(root.left)
        count[0] -= 1
        if count[0] == 0:
            print("found it!")
            res[0] = root
            return
        if count[0] > 0:
            print("visiting right")
            helper(root.right)
    helper(root)
    return res[0].v
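A lazy in-order generator is another way to stop early without a shared counter: the consumer simply stops asking for values once it has the k-th one. The sketch below assumes a Node with v, left and right attributes like the class above (the constructor arguments are added here for brevity) and a 1-based k; the names in_order and kth_smallest are made up for illustration.

```python
from itertools import islice


class Node:
    def __init__(self, v, left=None, right=None):
        self.v = v
        self.left = left
        self.right = right


def in_order(node):
    """Lazily yield the values of a BST in ascending order."""
    if node is None:
        return
    yield from in_order(node.left)
    yield node.v
    yield from in_order(node.right)


def kth_smallest(root, k):
    # islice skips the first k - 1 values; next() takes the k-th one,
    # or returns None when the tree has fewer than k nodes
    return next(islice(in_order(root), k - 1, None), None)


root = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
first = kth_smallest(root, 1)    # 2
fourth = kth_smallest(root, 4)   # 5
```

Since generators only run when a value is requested, nodes that come after the k-th value in sorted order are never visited.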
If you want to exit as early as possible, you can call exit(0). Note that this terminates the whole program, not just the recursion.

recursive sorting in python

I am trying to run a sorting function recursively in Python. I have an empty list that starts everything, but every time I try to print the list I get an empty list. Here is my code. Any help would be greatly appreciated.
def parse(list):
    newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list)
The value of newParse is not preserved between invocations of the function; you're setting it equal to [] (well, you're creating a new variable with the value []).
Since the only time you return is
newParse = []
if len(list) == 0:
    return newParse
you will always be returning [] because that is the value of newParse at that time.
Because you are doing this recursively, you are calling the function anew, without keeping the function's own state. Take a moment to consider the implications of this on your code.
Instead of initialising newParse = [], add an optional parameter newParse defaulting to a bogus value, and set newParse = [] if you receive that bogus value for newParse. If you defaulted it to [] directly, you would actually get the same list on every call (the contents of that one list object are mutated). And pass newParse through in your tail call.
You also seem to have the problem that your definition and the supposedly recursive call refer to different functions.
def sort(list, newParse = None):
    if newParse is None:
        newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list, newParse)
Here is what I think you are trying to do:
def recursive_sort(a_list):
    def helper_function(list_to_be_sorted, list_already_sorted):
        new = []
        if len(list_to_be_sorted) == 0:
            return list_already_sorted
        else:
            x = min(list_to_be_sorted)
            list_to_be_sorted.remove(x)
            new.append(x)
            return helper_function(list_to_be_sorted, list_already_sorted + new)
    return helper_function(a_list, [])
You shouldn't name variables list, as that is a builtin.
Also, if you are trying to implement a recursive sort function, you might want to look at quicksort, which is a very common (and fast) recursive sorting algorithm. What you have tried to implement is a recursive version of selection sort, which is much slower.
Also, if you actually need a sorting function, rather than just wanting to implement a recursive one, you should use the list method sort, or the built-in function sorted, which accepts any iterable. Both will be a lot faster than anything you could write in Python.
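For reference, a recursive quicksort can be sketched in a few lines. This is a simple, non-in-place teaching version that takes the first element as the pivot (an arbitrary choice made here), not a tuned implementation, and the built-in sorted remains the practical option.

```python
def quicksort(items):
    """Return a new sorted list; the input list is left untouched."""
    if len(items) <= 1:
        return list(items)              # base case: nothing left to split
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # sort each side recursively and put the pivot between them
    return quicksort(smaller) + [pivot] + quicksort(larger)


data = [5, 2, 9, 1, 5, 6]
result = quicksort(data)                # [1, 2, 5, 5, 6, 9]
```

Unlike the selection-sort versions above, this one does not call remove on its argument, so the caller's list keeps its contents.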
