How to use yield in BinarySearchTree? - python

I am following the BinarySearchTree code in the book Data Structure and Algorithms.
Would you like to read the full code in this link?
And I am not clear how this method works
def __iter__(self):
    if self.left != None:
        for elem in self.left:
            yield elem
        yield self.val
    if self.right != None:
        for elem in self.right:
            yield elem
Is the elem variable an instance of the Node class or is it a float number (from the inputs)? In the debugger it appears to be both. I guess the value changes because of the line yield elem, but I do not understand it.
What are the differences between yield elem and yield self.val? How many generator objects are there in this situation?
In addition, could you share some experience in debugging generator functions? I am confused by yield when debugging.

1. elem is a float, not a Node instance. self.left and self.right, on the other hand, are always Node instances: the example usage inserts float values into the binary tree with tree.insert(float(x)), and BinarySearchTree.insert() ultimately calls BinarySearchTree.Node(val), where val is float(x) in this case.
As mentioned by don't talk just code in the comments, I originally got this wrong because I assumed that iterating over self.left would produce a list of Node elements. That is not correct. Iterating over self.left works by calling self.left.__iter__(). I break this __iter__() function down into 3 cases, much like a recursive function. (Each call runs the __iter__() method of a different Node instance, but it is the same function calling itself through the for loops, so it behaves recursively.)
First, the Node has no left or right children. This is straightforward: the iter will just yield self.val, which is a float.
Second, the Node has left children. In this case, the for loop will traverse down all the left children in an almost recursive fashion until it reaches a Node that has no left children. Then we are back at the first case.
Third, the Node has right children. In this case, after the node's own self.val is yielded, the iterator continues to the first right node and repeats.
There is only one generator function, namely Node.__iter__(), but every call to it creates a new generator object, so iterating over the tree creates one generator object per node. The function uses multiple yield statements to produce values at different points. Both yield elem and yield self.val produce floats: yield elem passes along a value that came from a child's generator, while yield self.val produces the current Node's own value.
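To check the points above by experiment, here is a minimal sketch. The Node class below is a hypothetical stand-in for BinarySearchTree.Node from the book, and the calls counter is added only for illustration.

```python
class Node:
    """Hypothetical stand-in for BinarySearchTree.Node (not the book's code)."""
    calls = 0  # how many generator objects have started running

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    def __iter__(self):
        Node.calls += 1  # runs once per generator object, on its first next()
        if self.left is not None:
            for elem in self.left:
                yield elem
        yield self.val
        if self.right is not None:
            for elem in self.right:
                yield elem


root = Node(2.2, Node(1.1), Node(3.3))
values = list(root)
print(values)                          # [1.1, 2.2, 3.3]
print(all(isinstance(v, float) for v in values))  # True
print(Node.calls)                      # 3, one generator object per node
```

Every value that reaches the outer loop is a float, and the counter shows that three separate generator objects were involved for a three-node tree.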
I do not have specific tips for debugging yield statements in particular. In general I use IPython for interactive work when building code and use its built-in %debug magic operator. You might also find rubber duck debugging useful.
Using IPython you can run the following in a cell to debug interactively.
In [37]: %%debug
    ...: for x in tree.root:
    ...:     print(x)
    ...:
NOTE: Enter 'c' at the ipdb> prompt to continue execution.
You can then use the s command at the debugger prompt, ipdb> , to step through the code, stepping into function calls.
ipdb> s
--Call--
> <ipython-input-1-c4e297595467>(30)__iter__()
     28         # of the nodes of the tree yielding all the values. In this way, we get
     29         # the values in ascending order.
---> 30         def __iter__(self):
     31             if self.left != None:
     32                 for elem in self.left:
While debugging, you can evaluate expressions by preceding them with an exclamation point, !.
ipdb> !self
BinarySearchTree.Node(5.5,BinarySearchTree.Node(4.4,BinarySearchTree.Node(3.3,BinarySearchTree.Node(2.2,BinarySearchTree.Node(1.1,None,None),None),None),None),None)

First, there is an indentation issue in the code you shared: yield self.val should not be in the if block:
def __iter__(self):
    if self.left != None:
        for elem in self.left:
            yield elem
    yield self.val  # Unconditional. This is also the base case
    if self.right != None:
        for elem in self.right:
            yield elem
To understand this code, first start imagining a tree with just one node. Let's for a moment ignore the BinarySearchTree class and say we have direct access to the Node class. We can create a node and then iterate it:
node = Node(1)
for value in node:
    print(value)
This loop will call the __iter__ method, which in this case will not execute any of the if blocks, as it has no children, and only execute yield self.val. And that is what value in the above loop will get as value, and which gets printed.
Now extend this little exercise with 2 more nodes:
node = Node(1,
    Node(0),
    Node(2)
)
for value in node:
    print(value)
Here we have created this tree, and node refers to its root:
    1   <-- node
   / \
  0   2
When the for..in loop calls __iter__ now, it first enters the first if block, where we get a form of recursion. The for statement there executes __iter__ again, but this time on the left child of node, i.e. the node with value 0. That is a case we already know: this node has no children, and we know from the first example above that this results in one iteration in which the loop variable is the value of that node, i.e. 0, and that value is yielded. That means the main program gets an iteration with value equal to 0, which gets printed.
So elem is numeric. It would have been better to call it value or val to avoid any confusion.
After that if block has executed we get to yield self.val. self is here node, and so we yield 1. That means the main program gets to execute a second iteration, this time with value equal to 1.
Finally the second if block is executed, and now the right child of node is the subject of a recursive __iter__ call. It is the same principle as with the left child. This yields value 2, and the main program prints 2.
We could again extend the tree with more nodes, but the principle is the same: by recursive calls of __iter__ all the values of the tree are yielded.
yield from
There is a syntax that allows simplification of the code, and also it is more common practice to use the is operator when comparing with None:
def __iter__(self):
    if self.left is not None:
        yield from self.left
    yield self.val
    if self.right is not None:
        yield from self.right
This results in the same behavior. yield from will yield all values that come from the iterable. And since node instances are iterable as they have the __iter__ method, this works as intended.
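As a runnable illustration of the yield from version, here is a small sketch. The Node constructor and the insert helper are assumptions made for this example and are not taken from the book's BinarySearchTree class.

```python
class Node:
    """Assumed minimal node: a value plus optional left and right children."""

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    def __iter__(self):
        if self.left is not None:
            yield from self.left   # delegate to the left subtree's generator
        yield self.val
        if self.right is not None:
            yield from self.right  # delegate to the right subtree's generator


def insert(node, val):
    """Hypothetical helper: insert val and return the (possibly new) subtree root."""
    if node is None:
        return Node(val)
    if val < node.val:
        node.left = insert(node.left, val)
    else:
        node.right = insert(node.right, val)
    return node


root = None
for x in [5.5, 2.2, 8.8, 1.1, 3.3]:
    root = insert(root, x)

result = list(root)
print(result)  # [1.1, 2.2, 3.3, 5.5, 8.8]
```

Because the traversal is in-order, the values come out in ascending order regardless of the insertion order.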

Related

Python: How can I implement yield in my recursion?

How can I implement yield from in my recursion? I am trying to understand how to implement it but failing:
import pandas as pd
# some data
init_parent = [1020253]
df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})
# look for parent child
def recur1(df, parents, parentChild=None, step=0):
    if len(parents) != 0:
        yield parents, parentChild
    else:
        parents = df.loc[df['parent'].isin(parents)][['id', 'parent']]
        parentChild = parents['parent'].to_numpy()
        parents = parents['id'].to_numpy()
        yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1)
# exec / only printing results atm
out = recur1(df, init_parent, step=0)
[x for x in out]
Because recur1 contains yield, calling it always returns a generator, so yield from recur1(...) is valid on its own. What matters is what the innermost frame produces. Suppose your stack calls into the else branch three times before calling into the if branch. In this case, the top three frames simply pass along whatever the lower frame yields, and the lowest frame yields a single tuple from this:
    yield parents, parentChild
If you want each item to arrive as a list of pairs rather than a bare tuple, enclose it in a list:
    yield [(parents, parentChild)]
Note also that the condition looks inverted: with a non-empty parents, as in your example, the if branch runs immediately, so the generator yields once and never reaches yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1).
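For comparison, here is a minimal sketch of a recursive generator that yields at every level and then recurses. It assumes the intent was to walk from the initial parents down through each generation of children, and it uses a plain list of hypothetical (parent, id) pairs instead of the DataFrame so that it runs without pandas. The name walk_levels and the sample rows are made up for this example.

```python
# Hypothetical data: (parent, id) pairs standing in for the DataFrame rows.
ROWS = [(1, 2), (1, 3), (2, 4), (4, 5)]


def walk_levels(rows, parents, step=0):
    """Yield (step, parents) for the current level, then recurse into the children."""
    if not parents:  # base case: nothing left at this level, so stop
        return
    yield step, parents
    children = [child for parent, child in rows if parent in parents]
    yield from walk_levels(rows, children, step + 1)


levels = list(walk_levels(ROWS, [1]))
print(levels)  # [(0, [1]), (1, [2, 3]), (2, [4]), (3, [5])]
```

The base case returns without yielding, and every other call yields its own level before delegating to the next one with yield from.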

Build binary search tree using dictionary in Python

I'm trying to build a BST (binary search tree) with dict in python. I do not understand why my code is not adding nodes to the BST. I saw a similar post here:
How to implement a binary search tree in Python?
which looks the same as my code except declaring a node class, but I would like to know why my dict implementation fails (and hopefully improve my understanding of parameter passing with recursion in python).
keys = [10,9,2,5,3,7,101,18]
start = {'key': keys[-1], 'val': 1, 'left': None, 'right': None}
def binarySearch(root, node):
    # compare keys and insert node into right place
    if not root:
        root = node
    elif node['key'] < root['key']:
        binarySearch(root['left'], node)
    else:
        binarySearch(root['right'], node)
# Now let's test our function and build a BST
while keys:
    key = keys.pop()
    node = {'key': key, 'val': 1, 'left': None, 'right': None}
    binarySearch(start, node)
print(start) # unchanged, hence my confusion. Thx for your time!
===========================================
Edit: here is the code that would make it work!
def binarySearch(root, node):
    # compare keys and insert node into right place
    if not root:
        root = node
    elif node['key'] < root['key']:
        if not root['left']: root['left'] = node
        else: binarySearch(root['left'], node)
    else:
        if not root['right']: root['right'] = node
        else: binarySearch(root['right'], node)
Here is what I think is happening under the hood (why one version is able to add to the BST but the other is not):
In the original version, we reach a recursive call where root is None, but then root = node makes the local name root point to node, which has absolutely no connection with start, i.e. the BST itself. Then the local variables are discarded and no changes are made.
In the modified version, we avoid this because we add the node with e.g. root['left'] = node. Here root still points to a dict inside the original BST, so we are modifying a key-value pair in the original BST instead of having root point to something totally outside the BST.
Let's run through your code as though we were the Python interpreter.
Let's start at the first call: binarySearch(start, node)
Here start is the dict defined at the top of your script and node is another dict (which curiously has the same value).
Let's jump inside the call. We find ourselves at: if not root: where root refers to start above, which is truthy, so this if fails.
Next we find ourselves at: elif node['key'] < root['key']: which in this case is not True.
Next we pass into the else: and we are at: binarySearch(root['right'], node).
Just before we jump into the first recursive call, let's review what the parameters to the call are: root['right'] from start has the value None and node is still the same dict which we want to insert somewhere. So, on to the recursive call.
Again we find ourselves at: if not root:
However, this time root refers to the first parameter of the first recursive call, and we can see from the above review of the parameters that root refers to None.
Now None is considered falsy, so this time the if succeeds and we are on to the next line.
Now we are at root = node.
This is an assignment in Python. It means that the variable root stops referring to None and starts referring to whatever node currently refers to, which is the dict that was created in the while loop. So root (which is just a parameter, but which you can think of as a local variable now) refers to a dict.
Now we are at the end of the first recursive call and this function ends. Whenever a function ends, all its local variables are destroyed. That is, root and node are destroyed: just these variables, not what they refer to.
Now we return to just after the first call site, i.e. just after binarySearch(root['right'], node).
We can see here that the arguments root['right'] and node still refer to whatever they were referring to before; in particular, root['right'] is still None. This is why your start is unchanged, and why your function should assign to root['left'] or root['right'] directly instead of recursing when that child is None.
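The walkthrough above can be condensed into a short sketch. It first shows that rebinding a parameter has no effect on the caller, and then shows one possible alternative (an assumption of this example, not the asker's final code): have the function return the subtree root and let the caller store it. The helper names make_node, rebind, insert and in_order are hypothetical.

```python
def make_node(key):
    """Hypothetical helper that builds a node dict in the question's format."""
    return {'key': key, 'val': 1, 'left': None, 'right': None}


def rebind(root, node):
    root = node  # only rebinds the local name; the caller's dict is untouched


slot = {'left': None}
rebind(slot['left'], make_node(1))
print(slot['left'])  # None


def insert(root, node):
    """Insert node below root and return the root of the resulting subtree."""
    if root is None:
        return node
    side = 'left' if node['key'] < root['key'] else 'right'
    root[side] = insert(root[side], node)  # store the result back in the parent
    return root


def in_order(root):
    """Yield the keys in ascending order."""
    if root is not None:
        yield from in_order(root['left'])
        yield root['key']
        yield from in_order(root['right'])


start = None
for key in [10, 9, 2, 5, 3, 7, 101, 18]:
    start = insert(start, make_node(key))

keys_sorted = list(in_order(start))
print(keys_sorted)  # [2, 3, 5, 7, 9, 10, 18, 101]
```

Returning the root avoids the special-casing of empty children, at the cost of reassigning a child link on every step of the descent.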
# Created by The Misunderstood Genius
def add_root(e, key):
    '''
    e is node's name
    key is the node's key search
    '''
    bst = dict()
    bst[e] = {'key': key, 'P': None, 'L': None, 'R': None}
    return bst
def root(tree):
    # the root is the first node without a parent
    for k, v in tree.items():
        if v['P'] is None:
            return k
def insert(tree, node, key):
    tree[node] = {'key': key, 'P': None, 'L': None, 'R': None}
    y = None
    x = root(tree)
    node_key = tree[node]['key']
    while x is not None:
        y = x
        node_root = tree[x]['key']  # compare against the node currently being visited
        if node_key < node_root:
            x = tree[x]['L']
        else:
            x = tree[x]['R']
    tree[node]['P'] = y
    if y is not None and node_key < tree[y]['key']:
        tree[y]['L'] = node
    else:
        tree[y]['R'] = node
    return tree
def print_all(tree):
    for k, v in tree.items():
        print(k, v)
    print()
'''
Give a root node and key search target
Returns the name of the node with associated key
Else None
'''
def tree_search(tree, root, target):
    if root is None:
        print(" key with node associate not found")
        return root
    if tree[root]['key'] == target:
        return root
    if target < tree[root]['key']:
        return tree_search(tree, tree[root]['L'], target)
    else:
        return tree_search(tree, tree[root]['R'], target)
def tree_iterative_search(tree, root, target):
    while root is not None and tree[root]['key'] != target:
        if target < tree[root]['key']:
            root = tree[root]['L']
        else:
            root = tree[root]['R']
    return root
def minimum(tree, root):
    while tree[root]['L'] is not None:
        root = tree[root]['L']
    return tree[root]['key']
bst = add_root('R', 20)
bst = insert(bst, 'M', 10)
bst = insert(bst, 'B', 8)
bst = insert(bst, 'C', 24)
bst = insert(bst, 'D', 22)
bst = insert(bst, 'E', 25)
bst = insert(bst, 'G', 25)
print_all(bst)
print(tree_search(bst, 'R', 25))
x = tree_iterative_search(bst, 'R', 25)
print(x)
#print(minimum(bst,'R'))

trying to understand recursive run of generators

I'm trying to figure out how to plot the run of this code onto a recursion tree, because I'm not quite sure how it operates, even when I'm debugging.
What is each yield doing, and why do I need them both?
I've tried drawing a tree that connects each run to the next one recursively, but I don't know what follows yield head.data where the head is 'c'.
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def get_reverse_iterator(head):
    if head.next:
        for datum in get_reverse_iterator(head.next):
            yield datum
    yield head.data
lst = Node('a', Node('b', Node('c')))
for x in get_reverse_iterator(lst):
    print(x)
the result should be:
c
b
a
To understand how it works you need to understand the basic idea of recursion. Let's suppose that we are not dealing with generators; we just wish to print all the nodes of a list in reverse given the head node. We call the function print_reverse, passing the node as the argument. If the node's next field is empty, we just print the node's data value. But if next is not empty, it points to a node that must be printed before the current node is printed. So we recursively call print_reverse again to first print that node. When print_reverse returns, we can print the current node. Of course, when we call print_reverse recursively to print the next node, it may discover that there is yet another node after it which must be printed first, and we will be calling print_reverse recursively yet again. So we have:
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def print_reverse(head):
    if head.next:
        print_reverse(head.next)
    print(head.data)
lst = Node('a', Node('b', Node('c')))
print_reverse(lst)
The above code must be understood before the generator problem can be understood. Instead of creating a function print_reverse that prints the node's data field, we wish instead to create a generator function that yields the value. So, it makes sense to rename the function and to replace the print function with a yield statement and the recursive call with a yield from statement:
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def get_reverse_iterator(head):
    if head.next:
        #print_reverse(head.next)
        yield from get_reverse_iterator(head.next)
    #print(head.data)
    yield head.data
lst = Node('a', Node('b', Node('c')))
Now we can use the generator as in:
for x in get_reverse_iterator(lst):
    print(x)
or:
l = [x for x in get_reverse_iterator(lst)]
But an alternative to using recursion that avoids creating multiple generator objects, would be:
def get_reverse_iterator(head):
    stack = []
    while head.next:
        stack.append(head)
        head = head.next
    yield head.data
    while len(stack):
        head = stack.pop()
        yield head.data
Whenever you iterate over the generator (e.g. for x in get_reverse_iterator(lst)), Python starts executing the function body line by line. Whenever it hits a yield, it stops cold and hands that value back. When it gets asked for a next() value in the next iteration of the for loop, it continues executing right after that yield.
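The pause-and-resume behaviour described above can be observed directly by driving a generator with next(). The count_up_to function below is a made-up example used only to show where execution pauses.

```python
def count_up_to(n):
    """Hypothetical generator that reports when it starts and resumes."""
    print("started")
    for i in range(1, n + 1):
        yield i
        print("resumed after", i)


gen = count_up_to(2)  # nothing in the body has run yet
first = next(gen)     # prints "started", then pauses at the first yield
second = next(gen)    # prints "resumed after 1", then pauses at the second yield
rest = list(gen)      # prints "resumed after 2", then the generator finishes
print(first, second, rest)
```

Calling the generator function only creates the generator object; the body runs in slices, one slice per next() call.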
This looks like a fairly straightforward linked-list-traversal idiom. To see what gets yielded, suppose each element of the list contains data that is itself a list (or some other iterable value, like a string):
list[0].data = [1, 2, 3, 4]
list[1].data = [5, 6, 7, 8]
...
list[9].data = [37, 38, 39, 40]
What the code does here is yield those sub-lists from the back of the main list to the front of the main list. Each sub-list is yielded whole, because the for loop iterates over the generator returned by the recursive call, not over the data itself. The output should look something like this:
[37, 38, 39, 40] [33, 34, 35, 36] ... [5, 6, 7, 8] [1, 2, 3, 4]
which becomes evident when you look at how the code executes. I'll rewrite it in words:
func get_reverse_iterator(head) {
    if head isn't the last element of the list, then
        call this function on the next element of the list (head.next)
        for every element in the return value of that,
            yield that element
    yield this element's data
}
The 'base case' is the last element of the list, which doesn't have a .next. So its data is yielded to the for loop of the second-to-last element. The second-to-last element re-yields that value unchanged, and then yields its own data to the for loop of the third-to-last element. The third-to-last element re-yields both values in turn, and so on, until finally you get to the first element of the list. Every single yield statement thus far has passed one value up the chain, one level at a time, and so the inner for loop of the first element has re-yielded nine values so far. Finally, all the later elements in the list are done passing values through, and so the first element gets to the last statement of the function and yields its own data.
Nothing ever iterates over the data values themselves, so each one gets printed as the list it was in the first place.
In your case, each node's data is a single-character string, and it's the same thing on a smaller scale:
get_reverse_iterator() is called on the root node of lst
The root node (I'll call it NodeA) has a .next
get_reverse_iterator() is called on the next node, which I'll call NodeB
NodeB has a .next
get_reverse_iterator() is called on the next node, which I'll call NodeC
NodeC does not have a .next
get_reverse_iterator(NodeC) skips the for loop and yields NodeC.data, which is 'c'
get_reverse_iterator(NodeB) catches 'c' inside the for loop and yields it
get_reverse_iterator(NodeA) catches 'c' inside the for loop and yields it
'c' gets assigned to x, and it gets printed.
The next iteration of the outer loop happens, and execution returns to get_reverse_iterator(NodeB)
The for loop ends, because get_reverse_iterator(NodeC) has stopped yielding things
get_reverse_iterator(NodeB) ends the for loop, exits the if block, and finally yields NodeB.data, which is 'b'
get_reverse_iterator(NodeA) catches 'b' inside the for loop and yields it
'b' gets assigned to x, and it gets printed.
The next iteration of the outer loop happens, and execution returns to get_reverse_iterator(NodeA)
The for loop ends, because get_reverse_iterator(NodeB) has stopped yielding things
get_reverse_iterator(NodeA) ends the for loop, exits the if block, and finally yields NodeA.data, which is 'a'
'a' gets assigned to x, and it gets printed
The outer for loop finishes, as get_reverse_iterator(NodeA) has stopped yielding things.
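The sequence of steps above can be reproduced with a small instrumented sketch. The log parameter is an addition made for this example (it is not part of the question's code); it records each step so the order can be inspected afterwards.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next


def get_reverse_iterator(head, log):
    """Same logic as in the question, plus a hypothetical log list for tracing."""
    log.append("enter " + head.data)
    if head.next:
        for datum in get_reverse_iterator(head.next, log):
            log.append(head.data + " passes along " + datum)
            yield datum
    log.append(head.data + " yields own data")
    yield head.data


log = []
lst = Node('a', Node('b', Node('c')))
result = list(get_reverse_iterator(lst, log))
print(result)  # ['c', 'b', 'a']
for line in log:
    print(line)
```

The log shows that all three generators are entered before anything is yielded, and that each value is handed up one level at a time until it reaches the outer loop.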

Python recursion - how to exit early

I've been playing with BST (binary search tree) and I'm wondering how to do an early exit. Following is the code I've written to find kth smallest. It recursively calls the child node's find_smallest_at_k, stack is just a list passed into the function to add all the elements in inorder. Currently this solution walks all the nodes inorder and then I have to select the kth item from "stack" outside this function.
def find_smallest_at_k(self, k, stack, i):
    if self is None:
        return i
    if (self.left is not None):
        i = self.left.find_smallest_at_k(k, stack, i)
    print(stack, i)
    stack.insert(i, self.data)
    i += 1
    if i == k:
        print(stack[k - 1])
        print("Returning")
    if (self.right is not None):
        i = self.right.find_smallest_at_k(k, stack, i)
    return i
It's called like this,
our_stack = []
self.root.find_smallest_at_k(k, our_stack, 0)
return our_stack[k-1]
I'm not sure if it's possible to exit early from that function. If my k is say 1, I don't really have to walk all the nodes then find the first element. It also doesn't feel right to pass list from outside function - feels like passing pointers to a function in C. Could anyone suggest better alternatives than what I've done so far?
Passing list as arguments: Passing the list as an argument can be good practice if you make your function tail-recursive. Otherwise it's pointless. With a BST, where there are two potential recursive calls to make, that's a bit of a tall ask.
Otherwise you can just return the list. I don't see the need for the variable i. Anyway, if you absolutely need to return multiple values, you can always use a tuple, like this: return i, stack, and unpack it like this: i, stack = root.find_smallest_at_k(k).
Fast-forwarding: For the fast-forwarding, note that the right child of a BST node is always bigger than its parent. Thus if you descend the tree always through the right children, you end up with an increasing sequence of values. A node reached after going right k times has at least k smaller values above it, so it cannot be among the k smallest, and it's pointless to go right k times or more.
Even if you go left at times in the middle of your descent, it's pointless to go right k times or more in total. The BST property ensures that if you go right, ALL subsequent values below in the hierarchy will be greater than the parent. Thus going right k times or more is useless.
Code: Here is some quickly written pseudo-Python. It's not tested.
def findKSmallest( self, k, rightSteps=0 ):
    if rightSteps >= k: # We went right k times or more
        return []
    leftSmallest = self.left.findKSmallest( k, rightSteps ) if self.left != None else []
    rightSmallest = self.right.findKSmallest( k, rightSteps + 1 ) if self.right != None else []
    mySmallest = sorted( leftSmallest + [self.data] + rightSmallest )
    return mySmallest[:k]
EDIT The other version, following my comment.
def findKSmallest( self, k ):
    if k == 0:
        return []
    leftSmallest = self.left.findKSmallest( k ) if self.left != None else []
    rightSmallest = self.right.findKSmallest( k - 1 ) if self.right != None else []
    mySmallest = sorted( leftSmallest + [self.data] + rightSmallest )
    return mySmallest[:k]
Note that if k==1, this is indeed the search for the smallest element. Any move to the right immediately returns [], which contributes nothing.
As Lærne said, you have to take care to turn your function into a tail-recursive one; then you may be interested in using a continuation-passing style. That way your function can call either itself or the "escape" function. I wrote a module called tco for optimizing tail calls; see https://github.com/baruchel/tco
Hope it can help.
Here is another approach: it doesn't exit recursion early, instead it prevents additional function calls if not needed, which is essentially what you're trying to achieve.
class Node:
    def __init__(self, v):
        self.v = v
        self.left = None
        self.right = None
def find_smallest_at_k(root, k):
    res = [None]
    count = [k]
    def helper(root):
        if root is None:
            return
        helper(root.left)
        count[0] -= 1
        if count[0] == 0:
            print("found it!")
            res[0] = root
            return
        if count[0] > 0:
            # only descend to the right while the k-th node is still ahead
            print("visiting right")
            helper(root.right)
    helper(root)
    return res[0].v
If you want to exit as early as possible, you can use exit(0).
Be aware that this terminates the whole program, not just the recursion.
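Another way to stop early, sketched here as an assumption rather than as any of the answers above, is to make the in-order traversal a generator and take only the first k values with itertools.islice. The Node class, the visited list and the function names are hypothetical and exist only for this example.

```python
from itertools import islice

visited = []  # records which nodes were actually produced, for demonstration


class Node:
    def __init__(self, data, left=None, right=None):
        self.data = data
        self.left = left
        self.right = right


def in_order(node):
    """Lazily yield the values of the tree in ascending order."""
    if node is not None:
        yield from in_order(node.left)
        visited.append(node.data)
        yield node.data
        yield from in_order(node.right)


def kth_smallest(root, k):
    """Return the k-th smallest value (1-based), or None if there are fewer than k."""
    return next(islice(in_order(root), k - 1, None), None)


root = Node(5, Node(3, Node(2), Node(4)), Node(8))
first = kth_smallest(root, 1)
print(first, visited)  # 2 [2]
```

Since the generator is only advanced k times, the traversal stops as soon as the k-th value has been produced, and no list has to be passed in from outside.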

Finding a node in a tree

I am having trouble finding a node in a tree with an arbitrary branching factor. Each Node carries data and has zero or more children. The search method is inside the Node class; it checks whether that Node carries the data and then checks all of that Node's children. I keep ending up with infinite loops in my recursive method. Any help?
def find(self, x):
    _level = [self]
    _nextlevel = []
    if _level == []:
        return None
    else:
        for node in _level:
            if node.data is x:
                return node
            _nextlevel += node.children
        _level = _nextlevel
        return self.find(x) + _level
The find method is in the Node class and checks if data x is in the node the method is called from, then checks all of that node's children. I keep getting an infinite loop and am really stuck at this point; any insight would be appreciated.
There are a few issues with this code. First, note that the method starts with _level = [self]. That means the if _level == [] check that follows will always be false.
The 2nd issue is that your for loop goes over everything in _level, but, as noted above, that will always be [self].
The 3rd issue is the return statement. You have return self.find(x) + _level. That gets evaluated in 2 parts. First, call self.find(x), then concatenate what that returns with the contents of _level. But when you call self.find(x), that calls the same method with the same arguments, which in turn hits the same return self.find(x) + _level line, which calls the same method again, and on and on forever.
A simple pattern for recursive searches is to use a generator. That makes it easy to pass up the answers to calling code without managing the state of the recursion yourself.
class Example(object):
    def __init__(self, datum, *children):
        self.Children = list(children) # < assumed to be of the same or duck-similar class
        self.Datum = datum
    def GetChildren(self):
        for item in self.Children:
            for subitem in item.GetChildren():
                yield subitem
            yield item
    def FindInChildren(self, query): # where query is an expression that is true for desired data
        for item in self.GetChildren():
            if query(item):
                yield item
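Applied to the question's own attribute names (data and children), the same generator pattern might look like the sketch below. The walk method is a hypothetical addition, and the comparison uses == instead of the question's is, on the assumption that equality rather than identity is what was intended.

```python
class Node:
    def __init__(self, data, *children):
        self.data = data
        self.children = list(children)

    def walk(self):
        """Yield this node, then every descendant, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def find(self, x):
        """Return the first node whose data equals x, or None if there is none."""
        return next((node for node in self.walk() if node.data == x), None)


tree = Node('a', Node('b', Node('d')), Node('c'))
found = tree.find('d')
print(found.data)      # d
print(tree.find('z'))  # None
```

Each recursive call works on a different node, so the recursion always moves down the tree and terminates at the leaves.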
