Trying to understand the recursive run of generators - Python

I'm trying to figure out how to plot the run of this code onto a recursion tree, because I'm not quite sure how it operates, even when I'm debugging.
What is each yield doing, and why do I need them both?
I've tried creating a sort of tree that connects each run to the next one recursively, but I don't know what follows yield head.data when the head is 'c'.
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def get_reverse_iterator(head):
    if head.next:
        for datum in get_reverse_iterator(head.next):
            yield datum
    yield head.data
lst = Node('a', Node('b', Node('c')))
for x in get_reverse_iterator(lst):
    print(x)
the result should be:
c
b
a

To understand how it works you need to understand the basic idea of recursion. Let's suppose that we are not dealing with generators; we just wish to print all the nodes of a list in reverse, given the head node. We call the function print_reverse, passing the node as the argument. If the node's next field is empty, we just print the node's data value. But if next is not empty, it points to a node that must be printed before the current node is printed. So we recursively call print_reverse to print that node first. When print_reverse returns, we can print the current node. Of course, when we call print_reverse recursively to print the next node, it may discover that there is yet another node after it, which must be printed first, so we call print_reverse recursively yet again. So we have:
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def print_reverse(head):
    if head.next:
        print_reverse(head.next)
    print(head.data)
lst = Node('a', Node('b', Node('c')))
print_reverse(lst)
The above code must be understood before the generator problem can be understood. Instead of creating a function print_reverse that prints the node's data field, we wish instead to create a generator function that yields the value. So, it makes sense to rename the function and to replace the print function with a yield statement and the recursive call with a yield from statement:
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next
def get_reverse_iterator(head):
    if head.next:
        #print_reverse(head.next)
        yield from get_reverse_iterator(head.next)
    #print(head.data)
    yield head.data
lst = Node('a', Node('b', Node('c')))
Now we can use the generator as in:
for x in get_reverse_iterator(lst):
    print(x)
or:
l = [x for x in get_reverse_iterator(lst)]
But an alternative to recursion, which avoids creating one generator object per node, would be:
def get_reverse_iterator(head):
    stack = []
    while head.next:
        stack.append(head)
        head = head.next
    yield head.data
    while stack:
        head = stack.pop()
        yield head.data
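A quick check of the stack-based version, assuming the Node class from above. Because it uses a single generator object, the length of the list adds no nesting depth, so it can reverse a list much longer than the default recursion limit (typically 1000). build_list is a hypothetical helper added only to build a long test list:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def get_reverse_iterator(head):
    # Walk forward once, remembering every node except the last one.
    stack = []
    while head.next:
        stack.append(head)
        head = head.next
    yield head.data             # the last node comes out first
    while stack:
        yield stack.pop().data  # then the remembered nodes, most recent first

def build_list(values):
    # Hypothetical helper: build a linked list from a Python sequence.
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head

long_list = build_list(range(10_000))
result = list(get_reverse_iterator(long_list))
print(result[:3], result[-3:])  # [9999, 9998, 9997] [2, 1, 0]
```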

Whenever you iterate over a generator (e.g. for x in get_reverse_iterator(lst)), Python executes the function body line by line, but only once the first value is requested: calling get_reverse_iterator(lst) by itself just creates the generator object. Whenever execution hits a yield, it pauses and hands that value to the caller. When the for loop asks for the next value in its next iteration, execution resumes right after that yield.
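A small self-contained illustration of that pause-and-resume behavior. countdown and the events list are made-up names, used only to record when each part of the body runs:

```python
events = []

def countdown(n):
    events.append("body started")
    while n > 0:
        events.append(f"yielding {n}")
        yield n  # execution pauses here until the next value is requested
        events.append(f"resumed after {n}")
        n -= 1
    events.append("body finished")

gen = countdown(2)
print(events)     # []: calling countdown() only creates the generator
print(next(gen))  # 2
print(events)     # ['body started', 'yielding 2']
print(next(gen))  # 1
print(list(gen))  # []: resumes, runs to the end, produces no more values
print(events)
```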
This looks like a fairly straightforward linked-list-traversal idiom. To see what it does, imagine that each element of the list contains data that is itself a list (or some other iterable value, like a string):
list[0].data = [1, 2, 3, 4]
list[1].data = [5, 6, 7, 8]
...
list[9].data = [37, 38, 39, 40]
So what the code is doing here is yielding those sub-lists from the back of the main list to the front. Each sub-list is yielded as a single value (nothing in the function iterates over the data itself), so printing each x gives output like this:
[37, 38, 39, 40] [33, 34, 35, 36] ... [5, 6, 7, 8] [1, 2, 3, 4]
which becomes evident when you look at how the code executes. I'll rewrite it in words:
func get_reverse_iterator(head):
    if head isn't the last element of the list, then
        call this function on the next element of the list (head.next)
        for every value that call yields,
            yield that value
    yield this element's data
The 'base case' is the last element of the list, which doesn't have a .next. Its generator yields its data (the whole sub-list) as one value to the second-to-last element. The second-to-last element's inner for loop receives that value and yields it unchanged; once the inner generator is exhausted, it yields its own data. The third-to-last element passes both of those values up in turn, and so on, until you get to the first element of the list. Every yield passes one value up the chain, so by the time the first element's inner for loop finishes, it has passed 9 values through (one per later element). Finally, the first element reaches the last statement of the function and yields its own data.
Nothing in this chain iterates over the data values themselves, so every sub-list reaches the caller exactly as it was stored, as a list.
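A runnable version of that thought experiment, assuming the Node class and the for-loop get_reverse_iterator from the question. It shows what actually reaches the caller when the data values are lists:

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def get_reverse_iterator(head):
    if head.next:
        for datum in get_reverse_iterator(head.next):
            yield datum  # pass each value from deeper calls up unchanged
    yield head.data      # then this node's own data, as one value

# Ten nodes whose data are [1, 2, 3, 4], [5, 6, 7, 8], ..., [37, 38, 39, 40].
head = None
for start in range(37, 0, -4):
    head = Node(list(range(start, start + 4)), head)

for x in get_reverse_iterator(head):
    print(x)
# [37, 38, 39, 40]
# [33, 34, 35, 36]
# ...
# [1, 2, 3, 4]
```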
In your case, the data values are single-character strings, but it's the same thing on a smaller scale:
get_reverse_iterator() is called on the root node of lst
The root node (I'll call it NodeA) has a .next
get_reverse_iterator() is called on the next node, which I'll call NodeB
NodeB has a .next
get_reverse_iterator() is called on the next node, which I'll call NodeC
NodeC does not have a .next
get_reverse_iterator(NodeC) skips the for loop and yields NodeC.data, which is 'c'
get_reverse_iterator(NodeB) catches 'c' inside the for loop and yields it
get_reverse_iterator(NodeA) catches 'c' inside the for loop and yields it
'c' gets assigned to x, and it gets printed.
The next iteration of the outer loop happens, and execution resumes in get_reverse_iterator(NodeA), which asks get_reverse_iterator(NodeB) for its next value, which in turn resumes get_reverse_iterator(NodeC)
get_reverse_iterator(NodeC) has nothing left to run after its yield, so it finishes and stops yielding things
get_reverse_iterator(NodeB) ends the for loop, exits the if block, and finally yields NodeB.data, which is 'b'
get_reverse_iterator(NodeA) catches 'b' inside the for loop and yields it
'b' gets assigned to x, and it gets printed.
The next iteration of the outer loop happens, and execution resumes in get_reverse_iterator(NodeA), which resumes get_reverse_iterator(NodeB)
The for loop in get_reverse_iterator(NodeA) ends, because get_reverse_iterator(NodeB) has finished and stopped yielding things
get_reverse_iterator(NodeA) ends the for loop, exits the if block, and finally yields NodeA.data, which is 'a'
'a' gets assigned to x, and it gets printed
The outer for loop finishes, as get_reverse_iterator(NodeA) has stopped yielding things.
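To see the recursion tree the question asks about, here is the same generator with two extra, purely illustrative parameters: depth, used to indent each call's messages by its nesting level, and log, which defaults to print. The indentation in the output is the tree: each level of indentation is one nested generator, and the order of the lines matches the walkthrough above.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next

def get_reverse_iterator(head, depth=0, log=print):
    # depth and log exist only for tracing; the logic is unchanged.
    indent = "    " * depth
    log(f"{indent}enter {head.data!r}")
    if head.next:
        for datum in get_reverse_iterator(head.next, depth + 1, log):
            log(f"{indent}{head.data!r} passes {datum!r} up")
            yield datum
    log(f"{indent}{head.data!r} yields its own data")
    yield head.data
    log(f"{indent}exit {head.data!r}")  # runs only when resumed after the last yield

lst = Node('a', Node('b', Node('c')))
for x in get_reverse_iterator(lst):
    print("caller got", x)

# Output:
# enter 'a'
#     enter 'b'
#         enter 'c'
#         'c' yields its own data
#     'b' passes 'c' up
# 'a' passes 'c' up
# caller got c
#         exit 'c'
#     'b' yields its own data
# 'a' passes 'b' up
# caller got b
#     exit 'b'
# 'a' yields its own data
# caller got a
# exit 'a'
```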

Related

Python: How can I implement yield in my recursion?

How can I implement yield from in my recursion? I am trying to understand how to implement it but failing:
import pandas as pd
# some data
init_parent = [1020253]
df = pd.DataFrame({'parent': [1020253, 1020253],
                   'id': [1101941, 1101945]})
# look for parent child
def recur1(df, parents, parentChild=None, step=0):
    if len(parents) != 0:
        yield parents, parentChild
    else:
        parents = df.loc[df['parent'].isin(parents)][['id', 'parent']]
        parentChild = parents['parent'].to_numpy()
        parents = parents['id'].to_numpy()
        yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1)
# exec / only printing results atm
out = recur1(df, init_parent, step=0)
[x for x in out]
I'd say your biggest issue here is that recur1 isn't always guaranteed to return a generator. For example, suppose your stack calls into the else branch three times before calling into the if branch. In this case, the top three frames would be returning a generator received from the lower frame, but the lowest frame would be returned from this:
yield parents, parentChild
So, then, there is a really simple way you can fix this code to ensure that yield from works. Simply transform your return from a tuple to a generator-compatible type by enclosing it in a list:
yield [(parents, parentChild)]
Then, when you call yield from recur1(df=df, parents=parents, parentChild=parentChild, step=step+1), you'll always be working with something for which yield from makes sense.
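For comparison, a minimal sketch of a recursive generator that walks a parent/child hierarchy one level at a time with yield from. It assumes the goal is to yield every level until no more children are found, and it swaps the DataFrame for a plain list of (parent, id) pairs so it runs without pandas. walk_levels is a hypothetical name, and the third pair (1101941 -> 1200000) is made up so the recursion has more than one level to descend. The key points are that the stopping test (no parents left) comes first and simply returns, and that every level yields before delegating to the next one:

```python
# Simplified stand-in for the DataFrame: (parent, id) pairs.
ROWS = [(1020253, 1101941), (1020253, 1101945), (1101941, 1200000)]

def walk_levels(rows, parents, parent_child=None, step=0):
    # Base case: nothing left to expand, so stop without yielding.
    if not parents:
        return
    # Report the current level first...
    yield step, parents, parent_child
    # ...then find the children of every current parent...
    matches = [(p, i) for p, i in rows if p in parents]
    children = [i for _, i in matches]
    child_parents = [p for p, _ in matches]
    # ...and delegate to the next level; yield from re-yields everything it produces.
    yield from walk_levels(rows, children, child_parents, step + 1)

for level in walk_levels(ROWS, [1020253]):
    print(level)
# (0, [1020253], None)
# (1, [1101941, 1101945], [1020253, 1020253])
# (2, [1200000], [1101941])
```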

How to use yield in BinarySearchTree?

I am following the BinarySearchTree code in the book Data Structure and Algorithms.
You can read the full code at this link.
I am not clear on how this method works:
def __iter__(self):
    if self.left != None:
        for elem in self.left:
            yield elem
        yield self.val
    if self.right != None:
        for elem in self.right:
            yield elem
Is the elem variable an instance of the Node class, or is it a float (from the inputs)? In the debugger it appears to be both. I guess this value changes because of the line yield elem, but I do not understand it.
What are the differences between yield elem and yield self.val? How many generator objects are there in this situation?
In addition, could you share some experience with debugging generator functions? I am confused by yield when debugging.
1. elem is a Node instance. From the for loops, we know that elem is always either self.left or self.right. You can see in the example usage that float values are inserted into the binary tree with tree.insert(float(x)) and the BinarySearchTree.insert() method ultimately calls BinarySearchTree.Node(val) where val is float(x) in this case. Therefore self.left and self.right are always Node instances.
As mentioned by "don't talk just code" in the comments, elem is a float. I did not see this before because I assumed that iterating over self.left would produce a list of Node elements. However, this is not correct. In fact, iterating over self.left works by calling self.left.__iter__(). I break down this __iter__() function into 3 cases, as you would for a recursive function. (It is recursive: the same __iter__() method is called on different instances of the Node class, much like a recursive function called with different arguments.)
First, the Node has no left or right children. This is straightforward: __iter__() will just yield self.val, which is a float.
Second, the Node has left children. In this case, the for loop will traverse down all the left children recursively until it reaches a Node that has no left children. Then we are back at the first case.
Third, the Node has right children. In this case, after the node's own self.val is yielded, the iterator continues to the right child and repeats the process.
There is only one generator function, namely Node.__iter__(), but every call to it creates a separate generator object: iterating over a tree creates one generator object per node, and at any moment one is active for each level on the path from the root to the current node. It uses multiple yield statements to produce values in different situations: yield elem passes up a value that a child's generator produced, while yield self.val produces the current Node's own value.
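To make the generator count concrete, here is a minimal sketch assuming a stripped-down Node (val, left, right) rather than the book's full BinarySearchTree class. The created counter is added purely for illustration; it is incremented when each generator body starts running:

```python
class Node:
    created = 0  # counts how many generator bodies have started running

    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    def __iter__(self):
        Node.created += 1  # runs once per generator object, on its first next()
        if self.left is not None:
            for elem in self.left:  # iterating a child calls the child's __iter__
                yield elem          # elem is a value (a float), not a Node
        yield self.val
        if self.right is not None:
            for elem in self.right:
                yield elem

root = Node(3.3, Node(2.2, Node(1.1)), Node(5.5, Node(4.4)))
print(list(root))                # [1.1, 2.2, 3.3, 4.4, 5.5]
print(Node.created)              # 5: one generator object per node
print(iter(root) is iter(root))  # False: every call creates a new generator
```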
I do not have specific tips for debugging yield statements in particular. In general, I use IPython for interactive work when building code, along with its built-in %debug magic command. You might also find rubber duck debugging useful.
Using IPython you can run the following in a cell to debug interactively.
In [37]: %%debug
    ...: for x in tree.root:
    ...:     print(x)
    ...:
NOTE: Enter 'c' at the ipdb> prompt to continue execution.
You can then use the s command at the debugger prompt, ipdb>, to step through the code, stepping into function calls.
ipdb> s
--Call--
> <ipython-input-1-c4e297595467>(30)__iter__()
28 # of the nodes of the tree yielding all the values. In this way, we get
29 # the values in ascending order.
---> 30 def __iter__(self):
31 if self.left != None:
32 for elem in self.left:
While debugging, you can evaluate expressions by preceding them with an exclamation point, !.
ipdb> !self
BinarySearchTree.Node(5.5,BinarySearchTree.Node(4.4,BinarySearchTree.Node(3.3,BinarySearchTree.Node(2.2,BinarySearchTree.Node(1.1,None,None),None),None),None),None)
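A complementary, non-interactive aid that the answer does not mention: the standard-library function inspect.getgeneratorstate reports whether a generator has not started yet, is paused at a yield, or has finished, and a paused generator's gi_frame.f_lineno shows which line it is waiting on. numbers is a throwaway example function:

```python
import inspect

def numbers():
    yield 1
    yield 2

gen = numbers()
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: the body has not started
print(next(gen))                       # 1
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at a yield
print(gen.gi_frame.f_lineno)           # source line of the yield it is paused at
gen.close()
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED
```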
First, there is an indentation issue in the code you shared: yield self.val should not be in the if block:
def __iter__(self):
    if self.left != None:
        for elem in self.left:
            yield elem
    yield self.val  # Unconditional. This is also the base case
    if self.right != None:
        for elem in self.right:
            yield elem
To understand this code, first start imagining a tree with just one node. Let's for a moment ignore the BinarySearchTree class and say we have direct access to the Node class. We can create a node and then iterate it:
node = Node(1)
for value in node:
    print(value)
This loop will call the __iter__ method, which in this case will not execute either of the if blocks, as the node has no children, and will only execute yield self.val. That yielded value is what the loop variable value receives, and it gets printed.
Now extend this little exercise with 2 more nodes:
node = Node(1,
            Node(0),
            Node(2)
           )
for value in node:
    print(value)
Here we have created this tree, and node refers to its root
    1   <-- node
   / \
  0   2
When the for..in loop now calls __iter__, it first enters the first if block, where we get a form of recursion. The for statement there executes __iter__ again, but this time on the left child of node, i.e. the node with value 0. That is a case we already know: this node has no children, and from the first example above we know that this results in one iteration where the loop variable is the value of that node, i.e. 0, and that value is yielded. That means the main program gets an iteration with value equal to 0, which gets printed.
So elem is numeric. It would have been better to call it value or val, to avoid any confusion.
After that if block has executed we get to yield self.val. self is here node, and so we yield 1. That means the main program gets to execute a second iteration, this time with value equal to 1.
Finally the second if block is executed, and now the right child of node is the subject of a recursive __iter__ call. It is the same principle as with the left child. This yields value 2, and the main program prints 2.
We could again extend the tree with more nodes, but the principle is the same: by recursive calls of __iter__ all the values of the tree are yielded.
yield from
There is a syntax that allows simplification of the code, and also it is more common practice to use the is operator when comparing with None:
def __iter__(self):
    if self.left is not None:
        yield from self.left
    yield self.val
    if self.right is not None:
        yield from self.right
This results in the same behavior. yield from will yield all values that come from the iterable. And since node instances are iterable as they have the __iter__ method, this works as intended.
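A self-contained sketch of the yield from version in use. It assumes a minimal Node with a hypothetical insert helper standing in for the book's BinarySearchTree.insert(); the point is that an in-order walk with yield from produces the stored floats in ascending order:

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

    def insert(self, val):
        # Hypothetical helper: smaller values go left, others go right.
        if val < self.val:
            if self.left is None:
                self.left = Node(val)
            else:
                self.left.insert(val)
        else:
            if self.right is None:
                self.right = Node(val)
            else:
                self.right.insert(val)

    def __iter__(self):
        if self.left is not None:
            yield from self.left   # every value from the left subtree
        yield self.val             # then this node's own value
        if self.right is not None:
            yield from self.right  # then every value from the right subtree

root = Node(5.5)
for x in [2.2, 8.8, 1.1, 3.3, 9.9]:
    root.insert(x)
print(list(root))  # [1.1, 2.2, 3.3, 5.5, 8.8, 9.9]
```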

Infinite for loop in Python - is the list updated or not?

First of all, I am aware that I should use a while loop to create an infinite one. Nevertheless, while playing with making an infinite for loop in Python, I ran into something I do not fully understand.
My idea for an "infinite" for loop was the following:
l = [1]
for el in l:
    print(len(l))
    l.append(1)
and this in fact created an infinite loop, as 1 was appended to the list l at every iteration. I did not want my list to grow longer and longer until I eventually ran out of memory, so I did this:
l = [1]
for el in l:
    print(len(l))
    l.pop(0)
    l.append(1)
and I got just one iteration. Can someone explain why? Is it because l still refers to the same object and its length is always 1?
The for statement calls iter() on your list to get an "iterator", an object that produces a stream of data one element at a time. So you are not iterating over the iterable (your list in this case) directly, but over this iterator, which keeps track of its position in the list by index.
You can create this iterator directly using iter() to better understand what is happening in your for loop.
items = ['a']
it = iter(items)
print(it.__reduce__())
# (<built-in function iter>, (['a'],), 0)
The built-in __reduce__() method returns state information about the iterator. Note that the state information returned immediately after the iterator is created includes the iterable, ['a'], and an index value of 0. The index value indicates the position of the next element that will be returned by the iterator.
To simulate the first iteration of the loop, we can use next() which returns the next element from an iterator.
next(it)
print(it.__reduce__())
# (<built-in function iter>, (['a'],), 1)
Now we see that the index value has changed to 1, because the iterator already returned the first element in the list, and on the next iteration it will attempt to return the second element. If you remove the first element from the list and then append another element to it, the following is the resulting state of the iterator.
items.pop(0)
items.append('b')
print(it.__reduce__())
# (<built-in function iter>, (['b'],), 1)
You can see that the first element was removed and the new element was appended (as expected). However, the iterator still retained an index value of 1 as the position of the next element to be returned from iteration. If we attempt another iteration, a StopIteration exception will be raised because there is no element at index 1 in the iterable being used by our iterator container.
next(it)
# Traceback (most recent call last):
# File "main.py", line 16, in <module>
# next(it)
# StopIteration
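The same index rule explains both of the question's loops. A small sketch; the runs counters and the break are added only so the first loop can be measured and the second one stops:

```python
l = [1]
runs = 0
for el in l:
    runs += 1
    l.pop(0)     # the list is now []
    l.append(1)  # and [1] again: still one item long
print(runs)      # 1: the iterator's next index is 1, past the end of a 1-item list

l = [1]
runs_growing = 0
for el in l:
    runs_growing += 1
    if runs_growing == 5:
        break    # added for the demo; without it this loop never ends
    l.append(1)  # the list grows one step ahead of the index
print(runs_growing, len(l))  # 5 5
```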
If you are really interested in creating an infinite for loop, using a generator would be a better way to deal with your memory concerns (although as you note in your question, there aren't too many good reasons not to use while for this sort of thing). See my answer to a related question for an example of an infinite for loop.
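A minimal sketch of a generator-based "infinite" for loop (forever is a hypothetical name; the answer's linked example is not reproduced here). The generator produces each value on demand and stores nothing, so memory use stays constant; itertools.repeat is the standard-library equivalent:

```python
import itertools

def forever(value):
    # A generator that never finishes and keeps no growing list.
    while True:
        yield value

for i, item in enumerate(forever(1)):
    if i == 3:
        break  # stop the demo; a real infinite loop would omit this
    print(item)

# The standard library already provides the same thing:
print(list(itertools.islice(itertools.repeat(1), 3)))  # [1, 1, 1]
```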
If you use an iterator as suggested above, it becomes much easier to implement.
The class below implements the __iter__ and __next__ methods needed for an iterator class:
class WhileLoop:
    def __init__(self, value):
        # Assign the value
        self.value = value
    def __iter__(self):
        # The iterator will be the class itself
        return self
    def __next__(self):
        # The next element in the loop is the value itself
        return self.value
You can then call and loop on this iterator as follows.
loop = WhileLoop(1)
for item in loop:
    print(item)
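As written, WhileLoop.__next__ never raises StopIteration, which is what makes the loop infinite. A sketch of a bounded variant, with a hypothetical limit parameter, shows the other half of the iterator protocol: raising StopIteration is how __next__ tells a for loop to stop.

```python
class WhileLoop:
    def __init__(self, value, limit=None):
        self.value = value
        self.limit = limit  # hypothetical: None means "loop forever"
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.limit is not None and self.count >= self.limit:
            raise StopIteration  # this is how a for loop learns it should stop
        self.count += 1
        return self.value

print(list(WhileLoop(1, limit=3)))  # [1, 1, 1]
```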

Why is the head getting affected when I call my reverse function, unlike in the case at the bottom? I get a NoneType error

Here's the code. The head and the dummy node in isPalindrome get changed after I call the reverse function. Any reason why this happens? I have a case at the bottom doing the same thing where head doesn't get changed.
class ListNode(object):
    def __init__(self,x):
        self.val=x
        self.next=None
def reverse(head,l):
    prev=None
    while head:
        head.next,prev,head=prev,head,head.next
        l+=1
    return (prev,l)
def isPalindrome(head):
    dummy=head
    rev,l=reverse(head,0)
    print (rev.val,dummy.next.val)
a=ListNode(1)
a.next=ListNode(2)
isPalindrome(a)
I did something similar here to test out my logic, and this is how I imagined my code would behave: when I call reverse, I can still access the head and the b variable assigned to it.
class ListNode(object):
    def __init__(self,x):
        self.val=x
        self.next=None
a=ListNode(1)
a.next=ListNode(2)
b=a
def reverse(head,l):
    prev=None
    while head:
        head.next,prev,head=prev,head,head.next
        l+=1
    return (prev,l)
rev,l=reverse(a,0)
print(b.val,rev.val)
The issue with your code is that your reverse function is destructive. That is, it creates a reversed list by modifying the list you pass to it. Afterwards, the original list doesn't exist in the same form any more.
This is problematic for a palindrome test, since you want to compare the list to its own reverse. If the original doesn't exist any more, the reversed version isn't very useful.
While there are various ways you could avoid your specific error, to solve the larger problem of getting a reversed list while still having a copy of the original, I think there is an easy fix. You can change the logic of reverse to create a copy of the list in reversed order, rather than modifying the original:
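def reverse_copy(head, l):
    new_list = None
    while head:
        temp = ListNode(head.val)             # copy the current node's value
        temp.next, new_list = new_list, temp  # push the copy onto the front of the new list
        head = head.next                      # advance through the original list
        l += 1
    return new_list, l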
Now you can have isPalindrome do rev, l = reverse_copy(head, 0), and head will still refer to the original list!
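A sketch of a complete palindrome check built on reverse_copy, assuming the ListNode class from the question and a reverse_copy that advances head on each pass. is_palindrome and from_values are hypothetical names; the check walks the original list and the reversed copy side by side:

```python
class ListNode(object):
    def __init__(self, x):
        self.val = x
        self.next = None

def reverse_copy(head, l):
    # Build a new reversed list; the original nodes are left untouched.
    new_list = None
    while head:
        temp = ListNode(head.val)
        temp.next, new_list = new_list, temp
        head = head.next
        l += 1
    return new_list, l

def is_palindrome(head):
    rev, _ = reverse_copy(head, 0)
    while head:  # both lists have the same length
        if head.val != rev.val:
            return False
        head, rev = head.next, rev.next
    return True

def from_values(*values):
    # Hypothetical helper for building test lists.
    head = None
    for v in reversed(values):
        node = ListNode(v)
        node.next = head
        head = node
    return head

print(is_palindrome(from_values(1, 2, 1)))  # True
print(is_palindrome(from_values(1, 2)))     # False
```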
In the first version you store the original head to dummy and then reverse the list. This of course causes dummy to be the last node in the reversed list. Then you try to access value of next element from dummy: dummy.next.val This obviously fails because dummy.next is None.
On the second example you're not accessing the links at all, only the values. When you reverse the list you only change the links between the nodes but not the values so you can't detect the change.
Update: To better illustrate what's happening here, I've added a new function that displays the list before and after the reversal:
def to_str(head):
    l = []
    while head:
        l.append(str(head.val))
        l.append('->')
        head = head.next
    l.append('None')
    return ' '.join(l)
def isPalindrome(head):
    dummy=head
    print('Before (dummy): {}'.format(to_str(dummy)))
    rev,l=reverse(head,0)
    print('After (dummy): {}'.format(to_str(dummy)))
    print('After (rev): {}'.format(to_str(rev)))
    print (rev.val,dummy.next.val)
If you replace the original isPalindrome with above you get following output:
Before (dummy): 1 -> 2 -> None
After (dummy): 1 -> None
After (rev): 2 -> 1 -> None
Traceback (most recent call last):
File "test.py", line 34, in <module>
isPalindrome(a)
File "test.py", line 29, in isPalindrome
print (rev.val,dummy.next.val)
AttributeError: 'NoneType' object has no attribute 'val'

Finding a node in a tree

I am having trouble finding a node in a tree with an arbitrary branching factor. Each Node carries data and has zero or more children. The search method is inside the Node class and checks whether that Node carries the data, then checks all of that Node's children. I keep ending up with infinite loops in my recursive method. Any help?
def find(self, x):
    _level = [self]
    _nextlevel = []
    if _level == []:
        return None
    else:
        for node in _level:
            if node.data is x:
                return node
            _nextlevel += node.children
        _level = _nextlevel
        return self.find(x) + _level
The find method is in the Node class and checks whether data x is in the node the method is called from, then checks all of that node's children. I keep getting an infinite loop and am really stuck at this point; any insight would be appreciated.
There are a few issues with this code. First, note that the method starts with _level = [self]. That means the if _level == [] check right after it will always be false.
The 2nd issue is that your for loop goes over everything in _level, but, as noted above, that will always be [self] because of that first assignment.
The 3rd issue is the return statement. You have return self.find(x) + _level. That gets evaluated in 2 parts. First, call self.find(x), then concatenate what that returns with the contents of _level. But, when you call self.find(x) that will call the same method with the same arguments and that, in turn, will then hit the same return self.find(x) + _level line, which will call the same method again, and on and on forever.
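A minimal sketch of a working search, assuming each node exposes data and a children list as in the question (the Node class here is a hypothetical stand-in). It keeps the level-by-level idea but swaps in an iterative breadth-first search with collections.deque, so there is no recursive call that can repeat forever. It also compares with == rather than is, since is tests object identity, not equality:

```python
from collections import deque

class Node:
    def __init__(self, data, children=None):
        self.data = data
        self.children = children or []

    def find(self, x):
        # The queue replaces _level/_nextlevel: nodes are visited level by level.
        queue = deque([self])
        while queue:
            node = queue.popleft()
            if node.data == x:  # == compares values; `is` compares identity
                return node
            queue.extend(node.children)
        return None  # every node was checked without a match

tree = Node('root', [Node('a', [Node('c')]), Node('b')])
print(tree.find('c').data)  # c
print(tree.find('z'))       # None
```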
A simple pattern for recursive searches is to use a generator. That makes it easy to pass up the answers to calling code without managing the state of the recursion yourself.
class Example(object):
    def __init__(self, datum, *children):
        self.Children = list(children)  # < assumed to be of the same or duck-similar class
        self.Datum = datum
    def GetChildren(self):
        for item in self.Children:
            for subitem in item.GetChildren():
                yield subitem
            yield item
    def FindInChildren(self, query):  # where query is a callable that returns True for the desired items
        for item in self.GetChildren():
            if query(item):
                yield item
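A short usage sketch for the Example class above (the tree values are made up). Note that GetChildren yields descendants only, not the node it is called on, in post-order (each child's descendants before the child itself), and that calling next() on FindInChildren stops at the first match without searching the rest of the tree:

```python
class Example(object):
    def __init__(self, datum, *children):
        self.Children = list(children)
        self.Datum = datum

    def GetChildren(self):
        # Post-order: each child's descendants are yielded before the child itself.
        for item in self.Children:
            for subitem in item.GetChildren():
                yield subitem
            yield item

    def FindInChildren(self, query):
        # query is any callable that returns True for the nodes we want.
        for item in self.GetChildren():
            if query(item):
                yield item

tree = Example(1, Example(2, Example(4)), Example(3, Example(6)))
print([n.Datum for n in tree.GetChildren()])                               # [4, 2, 6, 3]
print([n.Datum for n in tree.FindInChildren(lambda n: n.Datum % 2 == 0)])  # [4, 2, 6]
print(next(tree.FindInChildren(lambda n: n.Datum == 6)).Datum)             # 6
```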
