Python: How can I implement yield in my recursion?

How can I implement yield from in my recursion? I am trying to understand how to implement it but failing:
import pandas as pd
# some data
init_parent = [1020253]
df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})
# look for parent child
def recur1(df, parents, parentChild=None, step=0):
    if len(parents) != 0:
        yield parents, parentChild
    else:
        parents = df.loc[df['parent'].isin(parents)][['id', 'parent']]
        parentChild = parents['parent'].to_numpy()
        parents = parents['id'].to_numpy()
        yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1)
# exec / only printing results atm
out = recur1(df, init_parent, step=0)
[x for x in out]

I'd say your biggest issue here is that recur1 isn't always guaranteed to return a generator. For example, suppose your stack calls into the else branch three times before calling into the if branch. In this case, the top three frames would be returning a generator received from the lower frame, but the lowest frame would be returned from this:
yield parents, parentChild
So there is a really simple way you can fix this code to ensure that yield from works: transform your return value from a tuple to a generator-compatible type by enclosing it in a list:
yield [(parents, parentChild)]
Then, when you call yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1), you'll always be working with something for which yield from makes sense.
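For comparison, note that any function whose body contains yield is a generator function, so every call to recur1 returns a generator object no matter which branch runs. Below is a minimal sketch of a recursion that actually descends one generation per call and stops when a generation is empty. To keep it self-contained, it assumes the DataFrame is replaced by a plain list of (parent, id) pairs; that substitution is an illustration, not the asker's data layout.

```python
def recur1(edges, parents, parent_child=None, step=0):
    """Yield (parents, parent_child) for each generation of the tree.

    edges is a list of (parent, id) pairs standing in for the DataFrame's
    'parent' and 'id' columns (an assumption made for this sketch).
    """
    if len(parents) == 0:
        return  # base case: no more descendants, the generator just stops
    yield parents, parent_child
    wanted = set(parents)
    # rows whose parent is in the current generation, like df['parent'].isin(parents)
    rows = [(p, c) for p, c in edges if p in wanted]
    # delegate to the generator for the next generation
    yield from recur1(edges,
                      parents=[c for _, c in rows],
                      parent_child=[p for p, _ in rows],
                      step=step + 1)


edges = [(1020253, 1101941), (1020253, 1101945)]
out = list(recur1(edges, [1020253]))
print(out)  # [([1020253], None), ([1101941, 1101945], [1020253, 1020253])]
```

With the real DataFrame, the filtering line would be the df.loc[df['parent'].isin(parents)] selection from the question; the key changes are the empty-generation base case and recursing after yielding the current level.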

Related

Recursive function only returns results of one branch of nested dictionary

I have a branching nested dictionary to visualize species taxonomy data. I'm trying to write a function that gives me all the branches at a particular level.
I've tried iterative and recursive functions, but I have only gotten close using a recursive function.
However, depending on where I put return/print statements, my function either returns None (but prints the correct information), or returns only one branch of the data.
Using the second option, the output is perfect until the dataset branches.
tree = {"k-b":
            {"p-a":
                 {"c-a": {"o-a": {}, "o-b": {}},
                  "c-b": {"o-a": {}}},
             "p-b":
                 {"c-a": {"o-a": {}, "o-b": {}}}}}
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        for i in tree.keys():
            return branches(tree[i], level-1)
print(branches(tree, 2))
For level 2, I expect [['c-a', 'c-b'], ['c-a']] (it doesn't have to be an array of arrays, and I don't care if it has dict_keys() or anything else around it).
I actually get dict_keys(['c-a', 'c-b']), which excludes the second branch.
Alternatively, if I remove the 'return' before recursively calling branches, and uncomment the print statement, it prints:
dict_keys(['c-a', 'c-b'])
dict_keys(['c-a'])
That is exactly the output I want, but the function returns None, so I can't store that information for later use.
Your code always returns during the first iteration of the loop, so your algorithm ends prematurely and never explores the other branches. You could yield the results to create a generator function (among other approaches):
tree = {"k-b":
            {"p-a":
                 {"c-a": {"o-a": {}, "o-b": {}},
                  "c-b": {"o-a": {}}},
             "p-b":
                 {"c-a": {"o-a": {}, "o-b": {}}}}}
def branches(tree, level):
    if level == 0:
        yield list(tree.keys())
    elif level > 0:
        for v in tree.values():
            yield from branches(v, level - 1)
for i in range(4):
    print(f"level {i}:", list(branches(tree, i)))
Output:
level 0: [['k-b']]
level 1: [['p-a', 'p-b']]
level 2: [['c-a', 'c-b'], ['c-a']]
level 3: [['o-a', 'o-b'], ['o-a'], ['o-a', 'o-b']]
The line elif level > 0: is an optimization to avoid walking deeper into the tree than necessary.
Also, iterating with for i in tree.keys() and then looking up the value with tree[i] is clearer written as for v in tree.values().
You might want to return a list of all items at that level:
tree = {"k-b":
            {"p-a":
                 {"c-a": {"o-a": {}, "o-b": {}},
                  "c-b": {"o-a": {}}},
             "p-b":
                 {"c-a": {"o-a": {}, "o-b": {}}}}}
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        return [branches(tree[i], level-1) for i in tree.keys()]
print(branches(tree, 2))
Output:
[[dict_keys(['c-a', 'c-b']), dict_keys(['c-a'])]]
It sounds like you want to return a list of all branches. One way to do this is with a list comprehension:
def branches(tree, level):
    if level == 0:
        #print(tree.keys())
        return tree.keys()
    else:
        return [branches(tree[i], level-1) for i in tree.keys()]
Note that this will return a deeply nested list. Flattening is left as an exercise for the reader.
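Rather than flattening afterwards, the nesting can be avoided in the first place: wrap the base case in a list, and extend (rather than append) with the result of each recursive call. A minimal sketch of that variation, using the tree from the question; it returns the [['c-a', 'c-b'], ['c-a']] shape the asker expected for level 2.

```python
tree = {"k-b":
            {"p-a":
                 {"c-a": {"o-a": {}, "o-b": {}},
                  "c-b": {"o-a": {}}},
             "p-b":
                 {"c-a": {"o-a": {}, "o-b": {}}}}}


def branches(tree, level):
    """Return a flat list containing one list of keys per subtree at `level`."""
    if level == 0:
        return [list(tree.keys())]  # wrapped so callers can always extend
    result = []
    for subtree in tree.values():
        # extend splices the child's list in, so no extra nesting builds up
        result.extend(branches(subtree, level - 1))
    return result


print(branches(tree, 2))  # [['c-a', 'c-b'], ['c-a']]
```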

Python recursion - how to exit early

I've been playing with a BST (binary search tree) and I'm wondering how to exit early. Following is the code I've written to find the kth smallest element. It recursively calls the child node's find_smallest_at_k; stack is just a list passed into the function to collect all the elements in order. Currently this solution walks all the nodes in order, and then I have to select the kth item from "stack" outside this function.
def find_smallest_at_k(self, k, stack, i):
    if self is None:
        return i
    if (self.left is not None):
        i = self.left.find_smallest_at_k(k, stack, i)
        print(stack, i)
    stack.insert(i, self.data)
    i += 1
    if i == k:
        print(stack[k - 1])
        print("Returning")
    if (self.right is not None):
        i = self.right.find_smallest_at_k(k, stack, i)
    return i
It's called like this,
our_stack = []
self.root.find_smallest_at_k(k, our_stack, 0)
return our_stack[k-1]
I'm not sure if it's possible to exit early from that function. If my k is, say, 1, I don't really have to walk all the nodes and then find the first element. It also doesn't feel right to pass a list in from outside the function; it feels like passing pointers to a function in C. Could anyone suggest better alternatives than what I've done so far?
Passing a list as an argument: passing the list as an argument can be good practice if you make your function tail-recursive. Otherwise it's pointless. With a BST, where there are two potential recursive calls to make, that's a bit of a tall order.
Otherwise, you can just return the list. I don't see the need for the variable i. Anyway, if you absolutely need to return multiple values, you can always use a tuple: return i, stack, and unpack it with i, stack = root.find_smallest_at_k(k).
Fast-forwarding: for the fast-forwarding, note that the right subtree of a BST node always holds values bigger than the node itself. So if you always descend through right children, you get an increasing sequence of values, and every node reached after k right moves is larger than the k nodes you moved right from. Such a node can't be among the k smallest, so it's pointless to go right k times or more.
This holds even if you go left at times in the middle of your descent. The BST property ensures that if you go right, ALL nodes below in the hierarchy will be greater than the parent. Thus going right k times or more is useless.
Code: here is some quickly written pseudo-Python code. It's not tested.
def findKSmallest(self, k, rightSteps=0):
    if rightSteps >= k:  # we went right k times or more
        return []
    leftSmallest = self.left.findKSmallest(k, rightSteps) if self.left is not None else []
    rightSmallest = self.right.findKSmallest(k, rightSteps + 1) if self.right is not None else []
    mySmallest = sorted(leftSmallest + [self.data] + rightSmallest)
    return mySmallest[:k]
EDIT: The other version, following my comment.
def findKSmallest(self, k):
    if k == 0:
        return []
    leftSmallest = self.left.findKSmallest(k) if self.left is not None else []
    rightSmallest = self.right.findKSmallest(k - 1) if self.right is not None else []
    mySmallest = sorted(leftSmallest + [self.data] + rightSmallest)
    return mySmallest[:k]
Note that if k == 1, this is indeed a search for the smallest element: any move to the right immediately returns [], which contributes nothing.
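To try the second version, here is a small runnable harness. The Node class, its constructor, and the sample tree are assumptions made for illustration; the method body follows the second version above. Since the returned list is sorted, the kth smallest value is its last element.

```python
class Node:
    """Hypothetical BST node with data, left and right attributes."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right

    def findKSmallest(self, k):
        if k == 0:
            return []
        left = self.left.findKSmallest(k) if self.left is not None else []
        # everything in the right subtree is larger than self.data,
        # so at most k - 1 of it can be among the k smallest
        right = self.right.findKSmallest(k - 1) if self.right is not None else []
        return sorted(left + [self.data] + right)[:k]


#        5
#      /   \
#     3     8
#    / \   / \
#   2   4 7   9
root = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
print(root.findKSmallest(3))      # [2, 3, 4]
print(root.findKSmallest(4)[-1])  # 5, the 4th smallest
```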
As Lærne said, you have to take care to turn your function into a tail-recursive one; then you may be interested in continuation-passing style. That way, your function could call either itself or an "escape" function. I wrote a module called tco for optimizing tail calls; see https://github.com/baruchel/tco
Hope it helps.
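In plain Python, without the tco module, a common way to get an "escape" out of a deep recursion is to raise an exception: the handler at the top catches it and returns the value it carries, and every pending recursive call is unwound at once. The sketch below assumes a hypothetical Node class with data, left and right attributes and a hypothetical FoundIt exception; it illustrates the idea and is not the tco module's API.

```python
class Node:
    """Hypothetical BST node: data, left and right, as in the question."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


class FoundIt(Exception):
    """Carries the answer out of the recursion (hypothetical helper)."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def find_smallest_at_k(root, k):
    count = 0

    def walk(node):
        nonlocal count
        if node is None:
            return
        walk(node.left)
        count += 1
        if count == k:
            raise FoundIt(node.data)  # propagates up through all walk() frames
        walk(node.right)

    try:
        walk(root)
    except FoundIt as found:
        return found.value
    return None  # the tree has fewer than k nodes


root = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
print(find_smallest_at_k(root, 4))  # 5
```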
Here is another approach: it doesn't exit the recursion early; instead, it avoids additional function calls when they aren't needed, which is essentially what you're trying to achieve.
class Node:
    def __init__(self, v):
        self.v = v
        self.left = None
        self.right = None
def find_smallest_at_k(root, k):
    res = [None]
    count = [k]
    def helper(root):
        if root is None:
            return
        helper(root.left)
        count[0] -= 1
        if count[0] == 0:
            print("found it!")
            res[0] = root
            return
        if count[0] > 0:
            print("visiting right")
            helper(root.right)
    helper(root)
    return res[0].v
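If you want to exit as early as possible, you can call exit(0).
Note, though, that this terminates the whole program rather than just the recursion, so the result has to be printed before exiting; it cannot be returned to the caller.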
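Because generators are lazy, another way to stop early is to write the in-order traversal as a recursive generator and take only the first k values from it: once the kth value has been produced, the generator stays suspended and the rest of the tree is never walked. A sketch, again assuming a hypothetical Node class with data, left and right attributes:

```python
from itertools import islice


class Node:
    """Hypothetical BST node with data, left and right attributes."""

    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def inorder(node):
    """Lazily yield the tree's values in sorted (in-order) order."""
    if node is None:
        return
    yield from inorder(node.left)
    yield node.data
    yield from inorder(node.right)


def find_smallest_at_k(root, k):
    if k < 1:
        return None
    # islice skips the first k - 1 values and next() takes the kth;
    # nothing after it is ever generated. None if the tree is too small.
    return next(islice(inorder(root), k - 1, None), None)


root = Node(5, Node(3, Node(2), Node(4)), Node(8, Node(7), Node(9)))
print(find_smallest_at_k(root, 1))  # 2
```

This also removes the need to pass a list in from outside, since the caller simply consumes as many values as it needs.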

recursive sorting in python

I am trying to run a sorting function recursively in Python. I have an empty list that starts everything, but every time I try to print the list I get an empty list. Here is my code. Any help would be greatly appreciated.
def parse(list):
    newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list)
The value of newParse is not preserved between invocations of the function; you're setting it equal to [] (well, you're creating a new variable with the value []).
Since the only time you return is
newParse = []
if len(list) == 0:
    return newParse
you will always be returning [] because that is the value of newParse at that time.
Because you are doing this recursively, you are calling the function anew, without keeping the function's own state. Take a moment to consider the implications of this on your code.
Instead of initialising newParse = [] on every call, add an optional parameter newParse defaulting to a sentinel value such as None, and set newParse = [] only when you receive that sentinel. (If you used [] itself as the default, you would actually get the same list object every time, and its contents would keep being mutated across calls.) Then pass newParse through in your tail call.
You also seem to have the problem that your definition and the supposedly recursive call refer to different functions (parse vs. sort).
def sort(list, newParse = None):
    if newParse is None:
        newParse = []
    if len(list) == 0:
        return newParse
    else:
        x = min(list)
        list.remove(x)
        newParse.append(x)
        return sort(list, newParse)
Here is what I think you are trying to do:
def recursive_sort(a_list):
    def helper_function(list_to_be_sorted, list_already_sorted):
        new = []
        if len(list_to_be_sorted) == 0:
            return list_already_sorted
        else:
            x = min(list_to_be_sorted)
            list_to_be_sorted.remove(x)
            new.append(x)
            return helper_function(list_to_be_sorted, list_already_sorted + new)
    return helper_function(a_list, [])
You shouldn't name variables list, as that is a builtin.
Also, if you are trying to implement a recursive sort function, you might want to look at quicksort, which is a very common (and fast) recursive sorting algorithm. What you have tried to implement is a recursive version of selection sort, which is much slower.
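For comparison, here is a minimal recursive quicksort sketch. It is not in place: it builds new lists, trading memory for readability. It also uses the first element as the pivot, a simple choice that degrades to quadratic time (and deep recursion) on already sorted input.

```python
def quicksort(items):
    """Return a new sorted list using recursive quicksort (input is not modified)."""
    if len(items) <= 1:
        return list(items)  # base case: zero or one element is already sorted
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    # sort each partition recursively and put the pivot between them
    return quicksort(smaller) + [pivot] + quicksort(larger)


print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```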
Also, if you actually need a sorting function, rather than just wanting to implement a recursive one, you should use the list method sort() or the built-in function sorted(), which accepts any iterable; both will be a lot faster than anything you could write in pure Python.

Finding a node in a tree

I am having trouble finding a node in a tree with an arbitrary branching factor. Each Node carries data and has zero or more children. The search method is inside the Node class and checks to see if that Node carries the data, then checks all of that Node's children. I keep ending up with infinite loops in my recursive method; any help?
def find(self, x):
    _level = [self]
    _nextlevel = []
    if _level == []:
        return None
    else:
        for node in _level:
            if node.data is x:
                return node
            _nextlevel += node.children
        _level = _nextlevel
        return self.find(x) + _level
The find method is in the Node class and checks whether data x is in the node the method is called on, then checks all of that node's children. I keep getting an infinite loop and am really stuck at this point; any insight would be appreciated.
There are a few issues with this code. First, note that at the top of the method you have _level = [self]. That means the if _level == [] check right after it will always be false.
The second issue is that your for loop goes over everything in _level, but, as noted above, that will always be [self] because of that first assignment.
The third issue is the return statement. You have return self.find(x) + _level. That gets evaluated in two parts: first call self.find(x), then concatenate what that returns with the contents of _level. But calling self.find(x) calls the same method with the same arguments, which in turn hits the same return self.find(x) + _level line, which calls the same method again, and so on until Python gives up with a RecursionError.
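A minimal sketch of what the method seems to be aiming for, a level-by-level (breadth-first) search, assuming a Node class with data and children attributes as described in the question (the constructor is a hypothetical one for illustration). The current level is passed down as an argument so each recursive call makes progress, and [self] is used only on the first call. It also compares with == rather than is, since is tests object identity and can miss equal values such as equal strings built at runtime.

```python
class Node:
    """Hypothetical node type with data and a list of children."""

    def __init__(self, data, children=None):
        self.data = data
        self.children = list(children or [])

    def find(self, x, _level=None):
        # _level holds the nodes of the level currently being searched
        if _level is None:
            _level = [self]  # first call: start at this node
        if not _level:
            return None  # no levels left: x is not in the tree
        next_level = []
        for node in _level:
            if node.data == x:
                return node
            next_level.extend(node.children)
        return self.find(x, next_level)  # recurse on the next level only


leaf = Node(3)
root = Node(1, [Node(2, [leaf]), Node(4)])
print(root.find(3) is leaf)  # True
```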
A simple pattern for recursive searches is to use a generator. That makes it easy to pass up the answers to calling code without managing the state of the recursion yourself.
class Example(object):
    def __init__(self, datum, *children):
        self.Children = list(children) # < assumed to be of the same or duck-similar class
        self.Datum = datum
    def GetChildren(self):
        for item in self.Children:
            for subitem in item.GetChildren():
                yield subitem
            yield item
    def FindInChildren(self, query): # where query is an expression that is true for desired data
        for item in self.GetChildren():
            if query(item):
                yield item
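On Python 3.3 and later, the inner loop in GetChildren can be written with yield from, which delegates to the child's generator. A sketch of the same class with a small usage example; the tree built at the bottom is made up for illustration. Note that FindInChildren only searches descendants, not the node it is called on.

```python
class Example(object):
    def __init__(self, datum, *children):
        self.Children = list(children)
        self.Datum = datum

    def GetChildren(self):
        # yield every descendant: each child's own descendants first, then the child
        for item in self.Children:
            yield from item.GetChildren()
            yield item

    def FindInChildren(self, query):
        # query is any callable that returns True for the wanted nodes
        for item in self.GetChildren():
            if query(item):
                yield item


root = Example(1, Example(2, Example(4)), Example(3))
matches = [n.Datum for n in root.FindInChildren(lambda n: n.Datum % 2 == 0)]
print(matches)  # [4, 2]
```

Because the search is a generator, next(root.FindInChildren(query), None) returns the first match and stops walking the tree at that point.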

How to use properly recursion and side effects in python

In a tree structure, I'm trying to find all leaves of a branch. Here is what I wrote:
def leafs_of_branch(node,heads=[]):
    if len(node.children()) == 0:
        heads.append(str(node))
    else:
        for des in node.children():
            leafs_of_branch(des)
    return heads
leafs_of_branch(node)
I don't know why, but it feels wrong to me. It works, but I want to know if there is a better way to use recursion without the heads parameter.
This
def leafs_of_branch(node,heads=[]):
is always a bad idea. Better would be
def leafs_of_branch(node,heads=None):
    heads = heads or []
as otherwise you always use the same list for leafs_of_branch. In your specific case it might be OK, but sooner or later you will run into problems.
I recommend:
def leafs_of_branch(node):
    leafs = []
    for des in node.children():
        leafs.extend(leafs_of_branch(des))
    if len(leafs)==0:
        leafs.append(str(node))
    return leafs
leafs_of_branch(node)
Instead of checking if len(node.children()) == 0, I check len(leafs) after descending into all (possibly zero) children. Thus I call node.children() only once.
I believe this should work:
def leafs_of_branch(node):
    if len(node.children()) == 0:
        return [str(node)]
    else:
        x = []
        for des in node.children():
            x += leafs_of_branch(des) #x.extend(leafs_of_branch(des)) would work too :-)
        return x
It's not very pretty and could probably be condensed a bit more, but I was trying to keep the form of your original code as much as possible to make it obvious what was going on.
Your original version won't actually work if you call it more than once: the default heads list is created only once, so whatever you append to it is kept between calls.
As far as recursion goes, you are doing it right IMO; you are missing the heads parameter on the recursive call, though. The reason it works anyway is what other people said: default values are created once, when the function is defined, and reused between calls.
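A sketch of the original accumulator version with that fix applied: heads defaults to None, a fresh list is created on the outermost call, and the same list is passed explicitly into every recursive call. The TreeNode class with a children() method is a hypothetical stand-in for the question's node type.

```python
class TreeNode:
    """Hypothetical node type: a name and a children() method, as in the question."""

    def __init__(self, name, *kids):
        self.name = name
        self.kids = list(kids)

    def children(self):
        return self.kids

    def __str__(self):
        return self.name


def leafs_of_branch(node, heads=None):
    if heads is None:
        heads = []  # a new list for each top-level call
    if len(node.children()) == 0:
        heads.append(str(node))
    else:
        for des in node.children():
            leafs_of_branch(des, heads)  # pass the accumulator down explicitly
    return heads


tree = TreeNode("root", TreeNode("a", TreeNode("a1"), TreeNode("a2")), TreeNode("b"))
print(leafs_of_branch(tree))  # ['a1', 'a2', 'b']
print(leafs_of_branch(tree))  # ['a1', 'a2', 'b'] again, nothing left over
```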
If you want to avoid recursion altogether, in this case you can use either a queue or a stack and a loop:
def leafs_of_branch(node):
    traverse = [node]
    leafs = []
    while traverse:
        node = traverse.pop()
        children = node.children()
        if children:
            traverse.extend(children)
        else:
            leafs.append(str(node))
    return leafs
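The loop above uses a plain list as a stack (pop() takes the last item), so leaves come out in a depth-first, right-to-left order. Swapping in collections.deque and popleft() gives the queue (breadth-first) variant. A sketch, with a hypothetical node class exposing a children() method as in the question:

```python
from collections import deque


class TreeNode:
    """Hypothetical node type: a name and a children() method."""

    def __init__(self, name, *kids):
        self.name = name
        self.kids = list(kids)

    def children(self):
        return self.kids

    def __str__(self):
        return self.name


def leafs_of_branch(node):
    traverse = deque([node])
    leafs = []
    while traverse:
        node = traverse.popleft()  # FIFO: oldest node first, i.e. breadth-first
        children = node.children()
        if children:
            traverse.extend(children)
        else:
            leafs.append(str(node))
    return leafs


tree = TreeNode("root", TreeNode("a", TreeNode("a1"), TreeNode("a2")), TreeNode("b"))
print(leafs_of_branch(tree))  # ['b', 'a1', 'a2']
```

Shallower leaves come out first here; with the list-as-stack version the same tree gives ['b', 'a2', 'a1'].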
You can also define a recursive generator this way:
def leafs_of_branch(node):
    if len(node.children()) == 0:
        yield str(node)
    else:
        for des in node.children():
            for leaf in leafs_of_branch(des):
                yield leaf
leafs = list(leafs_of_branch(node))
First of all, refrain from using mutable objects (lists, dicts, etc.) as default values, since default values are created once, when the function is defined, and reused between calls:
def bad_func(val, dest=[]):
    dest.append(val)
    print(dest)
>>> bad_func(1)
[1]
>>> bad_func(2)
[1, 2] # surprise!
So subsequent calls will do something completely unexpected.
As for the recursion question, I'd re-write it like this:
from itertools import chain
def leafs_of_branch(node):
    children = node.children()
    if not children: # better than len(children) == 0
        return (node, )
    all_leafs = (leafs_of_branch(child) for child in children)
    return chain(*all_leafs)
