Implementing a depth-first tree iterator in Python

I'm trying to implement an iterator class for not-necessarily-binary trees in Python. After the iterator is constructed with a tree's root node, its next() function can be called repeatedly to traverse the tree in depth-first order, finally returning None when there are no nodes left.
Here is the basic Node class for a tree:
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []
        self.visited = False
    def __str__(self):
        return self.title
As you can see above, I introduced a visited property to the nodes for my first approach, since I didn't see a way around it. With that extra measure of state, the Iterator class looks like this:
class Iterator(object):
    def __init__(self, root):
        self.stack = []
        self.current = root
    def next(self):
        if self.current is None:
            return None
        self.stack.append(self.current)
        self.current.visited = True
        # Root case
        if len(self.stack) == 1:
            return self.current
        while self.stack:
            self.current = self.stack[-1]
            for child in self.current.children:
                if not child.visited:
                    self.current = child
                    return child
            self.stack.pop()
This is all well and good, but I want to get rid of the need for the visited property, without resorting to recursion or any other alterations to the Node class.
All the state I need should be taken care of in the iterator, but I'm at a loss about how that can be done. Keeping a visited list for the whole tree is non-scalable and out of the question, so there must be a clever way to use the stack.
What especially confuses me is this--since the next() function, of course, returns, how can I remember where I've been without marking anything or using excess storage? Intuitively, I think of looping over children, but that logic is broken/forgotten when the next() function returns!
UPDATE - Here is a small test:
tree = Node(
    'A', [
        Node('B', [
            Node('C', [
                Node('D')
            ]),
            Node('E'),
        ]),
        Node('F'),
        Node('G'),
    ])
iter = Iterator(tree)
out = object()
while out:
    out = iter.next()
    print out

If you really must avoid recursion, this iterator works:
from collections import deque
def node_depth_first_iter(node):
    stack = deque([node])
    while stack:
        # Pop out the first element in the stack
        node = stack.popleft()
        yield node
        # Push children onto the front of the stack.
        # Note that with deque.extendleft, the first one in is the last
        # one out, so we need to push them in reverse order.
        stack.extendleft(reversed(node.children))
With that said, I think that you're thinking about this too hard. A good-ole' (recursive) generator also does the trick:
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []
    def __str__(self):
        return self.title
    def __iter__(self):
        yield self
        for child in self.children:
            for node in child:
                yield node
Both of these pass your tests:
expected = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
# Test recursive generator using Node.__iter__
assert [str(n) for n in tree] == expected
# test non-recursive Iterator
assert [str(n) for n in node_depth_first_iter(tree)] == expected
And you can easily make Node.__iter__ use the non-recursive form if you prefer:
    def __iter__(self):
        return node_depth_first_iter(self)
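If you want a class with the next() interface from the question rather than a generator, one option is to keep a stack of child iterators: each entry remembers how far you got through one node's children, so nothing has to be marked on the nodes. This is only a sketch under assumptions. The class name DepthFirstIterator is made up for illustration, and it assumes no child is ever None (None is used as the "exhausted" marker).

```python
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

    def __str__(self):
        return self.title


class DepthFirstIterator(object):
    """Hypothetical class: pre-order traversal, all state lives in the iterator."""

    def __init__(self, root):
        self._root = root
        self._started = False
        # One child iterator per node on the path from the root to the
        # most recently returned node, so memory grows with tree height.
        self._stack = []

    def next(self):
        if not self._started:
            self._started = True
            if self._root is None:
                return None
            self._stack.append(iter(self._root.children))
            return self._root
        while self._stack:
            # Resume the deepest unfinished loop over children.
            child = next(self._stack[-1], None)
            if child is None:
                self._stack.pop()  # this node's children are exhausted
            else:
                self._stack.append(iter(child.children))
                return child
        return None  # no nodes left


tree = Node('A', [
    Node('B', [Node('C', [Node('D')]), Node('E')]),
    Node('F'),
    Node('G'),
])
it = DepthFirstIterator(tree)
titles = []
node = it.next()
while node is not None:
    titles.append(str(node))
    node = it.next()
```

The stack never holds more than one iterator per level of the tree, which is the "subset of the tree at a time" the question asks for.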

That could still potentially hold every label, though. I want the
iterator to keep only a subset of the tree at a time.
But you already are holding everything. Remember that an object is essentially a dictionary with an entry for each attribute. Having self.visited = False in the __init__ of Node means you are storing a redundant "visited" key and False value for every single Node object no matter what. A set, at least, also has the potential of not holding every single node ID. Try this:
class Iterator(object):
    def __init__(self, root):
        self.visited_ids = set()
        ...
    def next(self):
        ...
        #self.current.visited = True
        self.visited_ids.add(id(self.current))
        ...
        #if not child.visited:
        if id(child) not in self.visited_ids:
Looking up the ID in the set should be just as fast as accessing a node's attribute. The only way this can be more wasteful than your solution is the overhead of the set object itself (not its elements), which is only a concern if you have multiple concurrent iterators (which you obviously don't, otherwise the node visited attribute couldn't be useful to you).
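For reference, here is one way the pieces above might fit together as a complete class. Treat it as a guess at the intent, not the answerer's exact code: the name SetTrackingIterator is invented, and the final self.current = None is an addition of mine so that calls after exhaustion keep returning None.

```python
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

    def __str__(self):
        return self.title


class SetTrackingIterator(object):
    """Hypothetical name: the question's iterator with visited state moved into a set."""

    def __init__(self, root):
        self.stack = []
        self.current = root
        self.visited_ids = set()

    def next(self):
        if self.current is None:
            return None
        self.stack.append(self.current)
        self.visited_ids.add(id(self.current))
        # Root case
        if len(self.stack) == 1:
            return self.current
        while self.stack:
            self.current = self.stack[-1]
            for child in self.current.children:
                if id(child) not in self.visited_ids:
                    self.current = child
                    return child
            self.stack.pop()
        # Assumption: mark the traversal as finished so later calls return None.
        self.current = None
        return None


tree = Node('A', [
    Node('B', [Node('C', [Node('D')]), Node('E')]),
    Node('F'),
    Node('G'),
])
it = SetTrackingIterator(tree)
titles = []
node = it.next()
while node is not None:
    titles.append(str(node))
    node = it.next()
```

Note that id() values are only unique among objects that are alive at the same time, which holds here because the tree keeps every node alive during the traversal.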

Related

Append (self) with stacks in Python classes

I am doing/learning data structures and algorithms and I came across the following code:
class BinaryTree:
    def __init__(self, root_data):
        self.data = root_data
        self.left_child = None
        self.right_child = None
    def inorder_iterative(self):
        inorder_list = []
        return inorder_list
    def get_right_child(self):
        return self.right_child
    def get_left_child(self):
        return self.left_child
    def set_root_val(self, obj):
        self.data = obj
    def get_root_val(self):
        return self.data
    def preorder_iterative(self):
        pre_ordered_list = [] #use this as result
        stack = []
        stack.append(self)
        while len(stack)>0:
            node = stack.pop() #should return a value
            pre_ordered_list.append(node.get_root_val())
            if node.right_child:
                stack.append(node.get_right_child())
            if node.left_child:
                stack.append(node.get_left_child())
        return pre_ordered_list
bn = BinaryTree(1)
bn.left_child = BinaryTree(2)
bn.right_child = BinaryTree(3)
print (bn.preorder_iterative())
I am very lost about the stack.append(self) part. I am not sure what the point of this line is, and I don't fully understand the concept of .append(self). I have learnt that self represents the instance of the class.

The purpose of the stack is to simulate recursion.
The initial value placed on the stack is the tree itself (in the form of its root node). Its value is retrieved, and then each subtree is placed on the stack, right child first so that the left child ends up on top. On the next iteration of the loop, the left child is removed, processed, and replaced with its children, if any. The loop continues as long as there is anything on the stack to process. Once everything on the left side of the tree has been processed, you'll finally start on the right child (placed on the stack way back at the beginning of the loop).
Compare to a recursive version:
def preorder_recursive(self):
    result = [self.get_root_val()]
    if self.left_child:
        result.extend(self.left_child.preorder_recursive())
    if self.right_child:
        result.extend(self.right_child.preorder_recursive())
    return result
Each recursive call essentially puts self on a stack, allowing the left child (and its descendants) to be processed before eventually returning to the root and moving to its right child.
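The same stack idea can fill in the empty inorder_iterative stub from the question. The following is a sketch of one common way to do it, not something the original code contained: walk left while pushing nodes, then pop, record the value, and step right.

```python
class BinaryTree:
    def __init__(self, root_data):
        self.data = root_data
        self.left_child = None
        self.right_child = None

    def inorder_iterative(self):
        inorder_list = []
        stack = []
        node = self
        while stack or node is not None:
            # Descend as far left as possible, remembering the path.
            while node is not None:
                stack.append(node)
                node = node.left_child
            # The top of the stack has no unprocessed left subtree.
            node = stack.pop()
            inorder_list.append(node.data)
            # Continue with the right subtree (may be None).
            node = node.right_child
        return inorder_list


bn = BinaryTree(1)
bn.left_child = BinaryTree(2)
bn.right_child = BinaryTree(3)
result = bn.inorder_iterative()
```

Here the stack plays the role of the pending recursive calls: every node on it is waiting for its left subtree to finish.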

Sum of all nodes of a Binary Tree

I'm trying to write a program to calculate the sum of all nodes (including the root) in a Binary Tree (not a Binary Search Tree) represented by a list of lists. I conceptually understand that approaching this recursively is the best way to do it but just cannot figure out the code. So far, my code is:
class BinaryTree:
    def __init__(self,rootObj, leftChild = None, rightChild = None):
        self.key = rootObj
        self.leftChild = None
        self.rightChild = None
        self.node=[rootObj, leftChild, rightChild]
    def getrightChild(self):
        return self.rightChild
    def getleftChild(self):
        return self.leftChild
    def setRootObj(self,obj):
        self.key = obj
    def getRootObj(self):
        return self.key
    def sumTree(BinaryTree):
        if BinaryTree is None: return 0
        return sumTree(BinaryTree.leftChild) \
            + sumTree(BinaryTree.rightChild)\
            + BinaryTree.rootObj
print(sumTree([8,[],[]]))
print(sumTree([9, [6, [ ], [ ]], [65, [ ], [ ]]]))

Be careful,
self.key = rootObj
self.leftChild = None
self.rightChild = None
are instance attributes, so you can't access them through the class directly. You have to create an instance, like
obj = BinaryTree(...)
and then call the method
obj.sumTree(...)
As for your sum algorithm, the easiest way to calculate the sum your way would be something like this:
class BinaryTree:
    @classmethod
    def calc_sum(cls, list_tree):
        print(list_tree)
        if list_tree:
            left_node_value = BinaryTree.calc_sum(list_tree[1])
            right_node_value = BinaryTree.calc_sum(list_tree[2])
            return list_tree[0] + left_node_value + right_node_value
        return 0
value = BinaryTree.calc_sum([9, [6, [ ], [ ]], [65, [ ], [ ]]])
print(value)
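If the tree only ever exists as nested lists of the form [value, left, right], the class is optional. Below is a rough sketch of a plain function that uses an explicit stack instead of recursion. The name sum_list_tree is made up here, and it assumes every non-empty subtree has exactly three elements.

```python
def sum_list_tree(list_tree):
    """Sum a tree given as [value, left, right]; an empty list is an empty subtree."""
    total = 0
    stack = [list_tree]
    while stack:
        subtree = stack.pop()
        if not subtree:
            continue  # empty subtree contributes nothing
        value, left, right = subtree
        total += value
        # Defer both children; the order does not matter for a sum.
        stack.append(left)
        stack.append(right)
    return total


small = sum_list_tree([8, [], []])
large = sum_list_tree([9, [6, [], []], [65, [], []]])
```

Because it does not recurse, this version is not limited by Python's recursion depth on very deep trees.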

You don't need all the getters. You can simply use plain attribute access, e.g. tree_a.left_child. Secondly, you didn't create a BinaryTree out of your children, meaning that it doesn't make sense to run sum_tree on them. Read through the following code, and make sure that you understand what's going on.
I'm pretty sure that what you actually want is this:
class BinaryTree:
    def __init__(self, root, left_child=None, right_child=None):
        self.root = root
        self.left_child = None if not left_child else BinaryTree(*left_child)
        self.right_child = None if not right_child else BinaryTree(*right_child)
        self.node = [root, left_child, right_child]
    def set_root(self, root):
        self.root = root
    def sum_tree(self):
        tree_sum = 0
        if self.left_child:
            tree_sum += self.left_child.sum_tree()
        if self.right_child:
            tree_sum += self.right_child.sum_tree()
        return tree_sum + self.root
tree_a = BinaryTree(8)
tree_b = BinaryTree(9, [6, [], []], [65, [], []])
print(tree_a.sum_tree())
# 8
print(tree_b.sum_tree())
# 80
print(tree_b.left_child.node)
# [6, [], []]

Well, from what I read from this code, your recursive algorithm is correct.
However, there are many syntax mistakes as well as other, semantic mistakes in it that make it impossible to run correctly.
Here is what I see:
You created a BinaryTree class, but you never created an instance of it.
sumTree([...]) tries to calculate that sum of a list, which will not work, because you want it to do it for a BinaryTree object. You need to parse that list and create an instance of BinaryTree first. (Like tree = BinaryTree(*write your list here*) maybe. But you need to make your __init__() method allow that passing of the list, of course. See next point.)
Your __init__() method takes BinaryTree objects as parameters, so there is no parsing of your lists.
Within the __init__() method, you set both children to None, so no node will ever have child nodes.
When calling the sumTree() method, you need to specify the context: it needs to be BinaryTree.sumTree(..). You still need to create the BinaryTree instance that is passed to the sumTree method, though.
Within the sumTree() method, you try to access the rootObj member - which does not exist, because you called it key.
Besides the errors, I'd like to point out some "code smells", if you like.
You should rename the parameter of the sumTree() method to something different from the class name.
In Python, there is no need for getter methods. You can access the members directly. If you still wish to define more complex get/set behaviour, you should have a look at Python properties.
The member node is never used.
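To illustrate the point about properties: the sketch below keeps plain attribute syntax (tree.key) while still running code on assignment. The numeric check is only an assumed example of "more complex" behaviour; nothing in the question requires it.

```python
class BinaryTree:
    def __init__(self, key):
        self.key = key  # goes through the setter defined below

    @property
    def key(self):
        return self._key

    @key.setter
    def key(self, value):
        # Assumed rule for illustration: keys must be numbers so they can be summed.
        if not isinstance(value, (int, float)):
            raise TypeError("key must be a number")
        self._key = value


tree = BinaryTree(8)
tree.key = 9          # looks like a plain attribute, runs the setter
current = tree.key    # runs the getter
```

Callers never have to change when a plain attribute is later turned into a property, which is why getters and setters are usually not written up front in Python.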

Trigger method when child is added to tree

Suppose I have the following piece of Python code to create a forest containing a bunch of trees.
NEXT_INDEX = 0
class Node:
    """ A node of a tree """
    def __init__(self):
        # NEXT_INDEX is a module-level counter, so it has to be declared global
        global NEXT_INDEX
        # Each node gets a unique id
        self._index = NEXT_INDEX
        NEXT_INDEX += 1
        # any node may have an arbitrary number of children
        self._children = list()
        self._parent = None
    def add_child(self, node):
        node._parent = self
        self._children.append(node)
    def __str__(self):
        return "node {}".format(self._index)
class Forest:
    """ A bunch of trees """
    def __init__(self):
        # contains the root nodes of a whole bunch of trees
        self._trees = list()
    def add_node(self, node):
        # the new node will be the root node for a new tree in self._trees
        self._trees.append(node)
    def find_node(self, idx):
        """
        Search all trees in self._trees for a node with index = idx
        and return that node.
        """
        # Implementation not relevant here
        pass
    def on_add_child(self, child):
        # should be executed each time add_child is called on a node with the
        # new child as a parameter
        print("on_add_child with child = {}".format(child))
I would like to execute a method, "on_add_child", each time a child is added to any node in any of the trees stored in Forest._trees.
Important: The print statement has to be in the Forest class. In the real code Forest maintains a search index of nodes and whenever a new child node is added, the new node has to be added to the search index. Adding a reference to Forest to Node (so that Node.add_child could call Forest.on_add_child) is unfortunately not an option either, because it would introduce a circular dependency between Node and Forest.
Example: Say I executed the following code
forest = Forest()
node_0 = Node()
node_1 = Node()
node_2 = Node()
node_3 = Node()
node_4 = Node()
# We add the first node to the forest: It will become the root of the first tree
forest.add_node(node_0)
# Add node_1 as a child to node_0; This should execute on_add_child(node_1) and
# print "on_add_child with child = node 1"
forest.find_node(0).add_child(node_1)
# Should print "on_add_child with child = node 2"
# => on_add_child is also triggered when we add a child to a non-root node
forest.find_node(1).add_child(node_2)
# Create a second tree
forest.add_node(node_3)
# Should print "on_add_child with child = node 4"
forest.find_node(3).add_child(node_4)
How can this be accomplished? I am aware of Python properties and I have found several related questions about how to use properties together with Python lists (e.g. Python property on a list, Python decorating property setter with list, python: how to have a property and with a setter function that detects all changes that happen to the value), but in my case it is not just a list but also a tree structure, and I couldn't get this combination to work.

If you need an action in the Forest class to be triggered, you can call it from add_child as well. So if vertices need to be updated, just update them every time add_child is called. You will also need to keep track of which forest the node is in, by passing the forest into the Node constructor and storing it as self._forest.
    def add_child(self, node):
        node._parent = self
        self._children.append(node)
        self._forest.on_add_child(node)
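Since the question rules out a Forest reference inside Node, a looser variant is to give each node a plain callable and let the Forest register its own method as that callable. What follows is a sketch under assumptions, not a drop-in solution: the listener attribute, the _search_index dictionary and the dictionary-based find_node are all invented here, and itertools.count stands in for the global counter.

```python
import itertools


class Node:
    """A node of a tree; it knows nothing about Forest, only about a callable."""
    _ids = itertools.count()  # assumed replacement for the NEXT_INDEX global

    def __init__(self):
        self._index = next(Node._ids)
        self._children = []
        self._parent = None
        self.listener = None  # hypothetical hook: called with each new child

    def add_child(self, node):
        node._parent = self
        node.listener = self.listener  # children report to the same listener
        self._children.append(node)
        if self.listener is not None:
            self.listener(node)

    def __str__(self):
        return "node {}".format(self._index)


class Forest:
    """A bunch of trees plus an assumed dict-based search index."""

    def __init__(self):
        self._trees = []
        self._search_index = {}

    def add_node(self, node):
        node.listener = self.on_add_child
        self._trees.append(node)
        self._search_index[node._index] = node

    def find_node(self, idx):
        return self._search_index[idx]

    def on_add_child(self, child):
        self._search_index[child._index] = child
        print("on_add_child with child = {}".format(child))


forest = Forest()
root, child, grandchild = Node(), Node(), Node()
forest.add_node(root)
forest.find_node(root._index).add_child(child)
forest.find_node(child._index).add_child(grandchild)
```

One limitation to be aware of: a child that already has descendants when it is attached will not have those descendants indexed or re-wired, so this only covers trees that are built top-down as in the question's example.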

My recursive function (populates a tree structure) is adding to the root node during every loop/call

I have an algorithm to populate a tree like structure (class: Scan_instance_tree), but unfortunately, during each call, it is incorrectly adding to the root node's children, as well as to the new child nodes created further down in the tree.
As a clue, I saw another thread...
Persistent objects in recursive python functions
...where this problem was mentioned briefly, and it was suggested that the parameters passed had to be mutable. Is that the answer, and how would I do this, in this example???
Here is my current code:
class Field_node(object):
    field_phenotype_id = -1
    field_name = ''
    field_parent_id = -1
    child_nodes = []
class Scan_instance_tree(object):
    root_node = None
    def __init__(self, a_db):
        self.root_node = Field_node()
        scan_field_values = self.create_scan_field_values(a_db) # This just creates a temporary user-friendly version of a database table
        self.build_tree(scan_field_values)
    def build_tree(self, a_scan_field_values):
        self.root_node.field_name = 'ROOT'
        self.add_child_nodes(a_scan_field_values, self.root_node)
    def add_child_nodes(self, a_scan_field_values, a_parent_node):
        i = 0
        while i < len(a_scan_field_values):
            if a_scan_field_values[i]['field_parent_dependancy'] == a_parent_node.field_phenotype_id:
                #highest_level_children.append(a_scan_field_values.pop(a_scan_field_values.index(scan_field)))
                child_node = Field_node()
                child_node.field_phenotype_id = a_scan_field_values[i]['field_phenotype_id']
                child_node.field_name = a_scan_field_values[i]['field_name']
                child_node.field_parent_dependancy = a_scan_field_values[i]['field_parent_dependancy']
                a_parent_node.child_nodes.append(child_node)
                a_scan_field_values.remove(a_scan_field_values[i])
                # RECURSION: get the child nodes
                self.add_child_nodes(a_scan_field_values, child_node)
            else:
                i = i+1
If I remove the recursive call to self.add_child_nodes(...), the root's children are added correctly, i.e. they consist only of those nodes where field_parent_dependancy is -1.
If I allow the recursive call, the root's children contain all the nodes, regardless of the field_parent_dependancy value.
Best regards
Ann

When you define your Field_node class, the line
child_nodes = []
is actually instantiating a single list as a class attribute, rather than an instance attribute, that will be shared by all instances of the class.
What you should do instead is create instance attributes in __init__, e.g.:
class Field_node(object):
    def __init__(self):
        self.field_phenotype_id = -1
        self.field_name = ''
        self.field_parent_id = -1
        self.child_nodes = []
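A tiny experiment makes the difference visible. The two class names below are made up purely for the demonstration.

```python
class SharedList(object):
    items = []  # class attribute: one list object for the whole class


class OwnList(object):
    def __init__(self):
        self.items = []  # instance attribute: a fresh list per object


a, b = SharedList(), SharedList()
a.items.append(1)      # mutates the single shared list
shared_seen_by_b = list(b.items)

c, d = OwnList(), OwnList()
c.items.append(1)      # only c's list changes
own_seen_by_d = list(d.items)
```

After this runs, b sees the element appended through a, while d's list is still empty. That is exactly why every Field_node in the question ended up with the same children as the root.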

Python For...loop iteration

Alright,
I have this program to parse code in Newick format, which extracts both a name and a distance for use in a phylogenetic tree diagram.
My problem is in this branch of code: as the program runs through the newickNode function, it assigns the name and distance to the 'node' variable, then returns it to be printed via the Node class, but it only seems to print the first node, 'A', and skips the other 3.
Is there any way to finish the for loop in newickNode so that it reads the other 3 nodes and prints them along with the first?
class Node:
    def __init__(self, name, distance, parent=None):
        self.name = name
        self.distance = distance
        self.children = []
        self.parent = parent
    def displayNode(self):
        print "Name:",self.name,",Distance:",self.distance,",Children:",self.children,",Parent:",self.parent
def newickNode(newickString, parent=None):
    String = newickString[1:-1].split(',')
    for x in String:
        splitString = x.split(':')
        nodeName = splitString[0]
        nodeDistance = float(splitString[1])
        node = Node(nodeName, nodeDistance, parent)
        return node
Node1 = newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)')
Node1.displayNode()
Thanks!

You could make it a generator:
def newickNode(newickString, parent=None):
    String = newickString[1:-1].split(',')
    for x in String:
        splitString = x.split(':')
        nodeName = splitString[0]
        nodeDistance = float(splitString[1])
        node = Node(nodeName, nodeDistance, parent)
        yield node
for node in newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)'):
    node.displayNode()
The generator will return one node at a time and pause within the function, and then resume when you want the next one.
Or just save them up and return them:
def newickNode(newickString, parent=None):
    String = newickString[1:-1].split(',')
    nodes = []
    for x in String:
        splitString = x.split(':')
        nodeName = splitString[0]
        nodeDistance = float(splitString[1])
        node = Node(nodeName, nodeDistance, parent)
        nodes.append(node)
    return nodes

Your newickNode() function should accumulate a list of nodes and return that, rather than returning the first node created. If you're going to do that, why have a loop to begin with?
def newickNodes(newickString, parent=None):
    nodes = []
    for node in newickString[1:-1].split(','):
        nodeName, nodeDistance = node.split(':')
        # convert the distance to a number, as the original function did
        nodes.append(Node(nodeName, float(nodeDistance), parent))
    return nodes
Alternatively, you could write it as a generator that yields the nodes one at a time. This would allow you to easily iterate over them or convert them to a list depending on your needs.
def newickNodes(newickString, parent=None):
    for node in newickString[1:-1].split(','):
        nodeName, nodeDistance = node.split(':')
        # convert the distance to a number, as the original function did
        yield Node(nodeName, float(nodeDistance), parent)
Also, from an object-oriented design POV, this should probably be a class method on your Node class named parseNewickString() or similar.
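As a rough idea of what that class method could look like (a sketch only, using the parseNewickString name suggested above and assuming the flat, non-nested input from the question):

```python
class Node:
    def __init__(self, name, distance, parent=None):
        self.name = name
        self.distance = distance
        self.children = []
        self.parent = parent

    @classmethod
    def parseNewickString(cls, newickString, parent=None):
        """Build one node per 'name:distance' entry of a flat Newick string."""
        nodes = []
        # Strip the surrounding parentheses, then split into entries.
        for entry in newickString[1:-1].split(','):
            name, distance = entry.split(':')
            nodes.append(cls(name, float(distance), parent))
        return nodes


nodes = Node.parseNewickString('(A:0.1,B:0.2,C:0.3,D:0.4)')
names = [n.name for n in nodes]
distances = [n.distance for n in nodes]
```

Using cls instead of Node inside the method means a subclass of Node would get instances of itself back.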

Alternatively, your newickNode() function could immediately call node.displayNode() on the new node each time through the loop.

To keep this more flexible, I would use pyparsing to process the Newick text and networkx for the graph functionality; I recommend installing both with pip. It's also nice that someone has already written a parser with node and tree creation (it looks like it lacks some features, but it'll work for your case):
http://code.google.com/p/phylopy/source/browse/trunk/src/phylopy/newick.py?r=66

The first time through your for: loop, you return a node, which stops the function executing.
If you want to return a list of nodes, create the list at the top of the function, append to it each time through the loop, and return the list when you're done.
It may make more sense to move the loop outside of the newickNode function, and have that function only return a single node as its name suggests.
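That last suggestion might look roughly like this. It is a sketch in which newickNode takes a single 'name:distance' entry (an assumed change to its parameter) and the caller does the splitting and looping.

```python
class Node:
    def __init__(self, name, distance, parent=None):
        self.name = name
        self.distance = distance
        self.children = []
        self.parent = parent


def newickNode(entry, parent=None):
    """Parse a single 'name:distance' entry into one Node."""
    name, distance = entry.split(':')
    return Node(name, float(distance), parent)


newickString = '(A:0.1,B:0.2,C:0.3,D:0.4)'
# The loop now lives outside the function, so every entry gets processed.
nodes = [newickNode(entry) for entry in newickString[1:-1].split(',')]
```

Each call still returns exactly one node, which matches the function's name, and the caller decides whether to collect, print, or stream the results.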
