Python For...loop iteration - python

Alright,
I have this program to sparse code in Newick Format, which extracts both a name, and a distance for use in a phylogenetic tree diagram.
What my problem is, in this branch of code, as the program reads through the newickNode function, it assigns the name and distance to the 'node' variable, then returns it back into the 'Node' class to be printed, but it seems to only print the first node 'A', and skips the other 3.
Is there anyway to finish the for loop in newickNode to read the other 3 nodes and print them accordingly with the first?
class Node:
def __init__(self, name, distance, parent=None):
self.name = name
self.distance = distance
self.children = []
self.parent = parent
def displayNode(self):
print "Name:",self.name,",Distance:",self.distance,",Children:",self.children,",Parent:",self.parent
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
return node
Node1 = newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)')
Node1.displayNode()
Thanks!

You could make it a generator:
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
yield node
for node in newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)'):
node.displayNode()
The generator will return one node at a time and pause within the function, and then resume when you want the next one.
Or just save them up and return them
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
nodes = []
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
nodes.append(node)
return nodes

Your newickNode() function should accumulate a list of nodes and return that, rather than returning the first node created. If you're going to do that, why have a loop to begin with?
def newickNodes(newickString, parent=None):
nodes = []
for node in newickString[1:-1].split(','):
nodeName, nodeDistance = node.split(':')
nodes.append(Node(nodeName, nodeDistance, parent))
return nodes
Alternatively, you could write it as a generator that yields the nodes one at a time. This would allow you to easily iterate over them or convert them to a list depending on your needs.
def newickNodes(newickString, parent=None):
for node in newickString[1:-1].split(','):
nodeName, nodeDistance = node.split(':')
yield Node(nodeName, nodeDistance, parent)
Also, from a object-oriented design POV, this should probably be a class method on your Node class named parseNewickString() or similar.

Alternatively, your newickNode() function could immediately call node.displayNode() on the new node each time through the loop.

To keep this more flexible - I would use pyparsing to process the Newick text and networkx so I had all the graph functionality I could desire - recommend to easy_install/pip those modules. It's also nice that someone has written a parser with node and tree creation already (although it looks like it lacks some features, it'll work for your case):
http://code.google.com/p/phylopy/source/browse/trunk/src/phylopy/newick.py?r=66

The first time through your for: loop, you return a node, which stops the function executing.
If you want to return a list of nodes, create the list at the top of the function, append to it each time through the loop, and return the list when you're done.
It may make more sense to move the loop outside of the newickNode function, and have that function only return a single node as its name suggests.

Related

Append (self) with stacks in Python classes

I am doing/learning data structures and algorithms and I came across the following code:
class BinaryTree:
def __init__(self, root_data):
self.data = root_data
self.left_child = None
self.right_child = None
def inorder_iterative(self):
inorder_list = []
return inorder_list
def get_right_child(self):
return self.right_child
def get_left_child(self):
return self.left_child
def set_root_val(self, obj):
self.data = obj
def get_root_val(self):
return self.data
def preorder_iterative(self):
pre_ordered_list = [] #use this as result
stack = []
stack.append(self)
while len(stack)>0:
node = stack.pop() #should return a value
pre_ordered_list.append(node.get_root_val())
if node.right_child:
stack.append(node.get_right_child())
if node.left_child:
stack.append(node.get_left_child())
return pre_ordered_list
bn = BinaryTree(1)
bn.left_child = BinaryTree(2)
bn.right_child = BinaryTree(3)
print (bn.preorder_iterative())
I am very lost about stack.append(self) part. I am not sure what is the point of having this line and I don't fully understand the concept of .append(self). I have learnt that self represents the instance of the class.
The purpose of the stack is to simulate recursion.
The initial value placed on the stack is the tree itself (in the form of its root node). Its value is retrieved, and then each subtree is placed on the stack. On the next iteration of the loop, the left child is removed, processed, and replaced with its children, if any. The loop continues as long as there is anything on the stack to process. Once everything on the left side of the tree as been processed, you'll finally start on the right child (placed the stack way at the beginning of the loop).
Compare to a recursive version:
def preorder_recursive(self):
result = [self.get_root_val()]
if node.left_child:
result.extend(node.left_child.preorder_recursive())
if node.right_child:
result.extend(node.right_child.preorder_recursive())
return result
Each recursive call essentially puts self on a stack, allowing the left child (and its descendants) to be processed before eventually returning to the root and moving to its right child.

Trigger method when child is added to tree

Suppose i got the following piece of python code to create a forest containing a bunch of trees.
NEXT_INDEX = 0
class Node:
""" A node of a tree """
def __init__(self):
# Each node gets a unique id
self._index = NEXT_INDEX
NEXT_INDEX += 1
# any node may have an arbitrary number of children
self._children = list()
self._parent = None
def add_child(self, node):
node._parent = self
self._children.append(node)
def __str__(self):
return "node {}".format(self._index)
class Forest:
""" A bunch of trees """
def __init__(self):
# contains the root nodes of a whole bunch of trees
self._trees = list()
def add_node(self, node):
# the new node will be the root node for a new tree in self._trees
self._trees.append(node)
def find_node(self, idx):
"""
Search all trees in self._trees for a node with index = idx
and return that node.
"""
# Implementation not relevant here
pass
def on_add_child(child):
# should be executed each time add_child is called on a node with the
# new child as a parameter
print("on_add_child with child = {}".format(child))
I would like to execute a method, "on_add_child", each time a child is added to any node in any of the trees stored in Forest._trees.
Important: The print statement has to be in the Forest class. In the real code Forest maintains a search index of nodes and whenever a new child node is added, the new node has to be added to the search index. Adding a reference to Forest to Node (so that Node.add_child could call Forest.on_add_child) is unfortunately not an option either, because it would introduce a circular dependency between Node and Forest.
Example: Say i executed the following code
forest = Forest()
node_0 = Node()
node_1 = Node()
node_2 = Node()
node_3 = Node()
node_4 = Node()
# We add the first node to the forest: It will become the root of the first tree
forest.add_node(node_0)
# Add node_1 as a child to node_0; This should execute on_add_child(node_1) and
# print "on_add_child with child = node 1"
forest.find_node(0).add_child(node_1)
# Should print "on_add_child with child = node 2"
# => on_add_child is also triggered when we add a child to a non-root node
forest.find_node(1).add_child(node_2)
# Create a second tree
forest.add_node(node_3)
# Should print "on_add_child with child = node 4"
forest.find_node(3).add_child(node_4)
How can this be accomplished? I am aware of python properties and i have found several related questions about how to use properties together with python lists (eg. Python property on a list, Python decorating property setter with list, python: how to have a property and with a setter function that detects all changes that happen to the value), but in my case it is not just a list, but also a tree structure and i couldn't get this combination to work.
If you need action in the forest class to be initiated you can call this in add_child as well. So if vertices need to be updated just update them every time add_child is called. You will need to also keep track of which forest the node is in by passing that into the default constructor.
def add_child(self, node):
node._parent = self
self._children.append(node)
self._forest.on_add_child(node)

Implementing a depth-first tree iterator in Python

I'm trying to implement an iterator class for not-necessarily-binary trees in Python. After the iterator is constructed with a tree's root node, its next() function can be called repeatedly to traverse the tree in depth-first order (e.g., this order), finally returning None when there are no nodes left.
Here is the basic Node class for a tree:
class Node(object):
def __init__(self, title, children=None):
self.title = title
self.children = children or []
self.visited = False
def __str__(self):
return self.title
As you can see above, I introduced a visited property to the nodes for my first approach, since I didn't see a way around it. With that extra measure of state, the Iterator class looks like this:
class Iterator(object):
def __init__(self, root):
self.stack = []
self.current = root
def next(self):
if self.current is None:
return None
self.stack.append(self.current)
self.current.visited = True
# Root case
if len(self.stack) == 1:
return self.current
while self.stack:
self.current = self.stack[-1]
for child in self.current.children:
if not child.visited:
self.current = child
return child
self.stack.pop()
This is all well and good, but I want to get rid of the need for the visited property, without resorting to recursion or any other alterations to the Node class.
All the state I need should be taken care of in the iterator, but I'm at a loss about how that can be done. Keeping a visited list for the whole tree is non-scalable and out of the question, so there must be a clever way to use the stack.
What especially confuses me is this--since the next() function, of course, returns, how can I remember where I've been without marking anything or using excess storage? Intuitively, I think of looping over children, but that logic is broken/forgotten when the next() function returns!
UPDATE - Here is a small test:
tree = Node(
'A', [
Node('B', [
Node('C', [
Node('D')
]),
Node('E'),
]),
Node('F'),
Node('G'),
])
iter = Iterator(tree)
out = object()
while out:
out = iter.next()
print out
If you really must avoid recursion, this iterator works:
from collections import deque
def node_depth_first_iter(node):
stack = deque([node])
while stack:
# Pop out the first element in the stack
node = stack.popleft()
yield node
# push children onto the front of the stack.
# Note that with a deque.extendleft, the first on in is the last
# one out, so we need to push them in reverse order.
stack.extendleft(reversed(node.children))
With that said, I think that you're thinking about this too hard. A good-ole' (recursive) generator also does the trick:
class Node(object):
def __init__(self, title, children=None):
self.title = title
self.children = children or []
def __str__(self):
return self.title
def __iter__(self):
yield self
for child in self.children:
for node in child:
yield node
both of these pass your tests:
expected = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
# Test recursive generator using Node.__iter__
assert [str(n) for n in tree] == expected
# test non-recursive Iterator
assert [str(n) for n in node_depth_first_iter(tree)] == expected
and you can easily make Node.__iter__ use the non-recursive form if you prefer:
def __iter__(self):
return node_depth_first_iter(self)
That could still potentially hold every label, though. I want the
iterator to keep only a subset of the tree at a time.
But you already are holding everything. Remember that an object is essentially a dictionary with an entry for each attribute. Having self.visited = False in the __init__ of Node means you are storing a redundant "visited" key and False value for every single Node object no matter what. A set, at least, also has the potential of not holding every single node ID. Try this:
class Iterator(object):
def __init__(self, root):
self.visited_ids = set()
...
def next(self):
...
#self.current.visited = True
self.visited_ids.add(id(self.current))
...
#if not child.visited:
if id(child) not in self.visited_ids:
Looking up the ID in the set should be just as fast as accessing a node's attribute. The only way this can be more wasteful than your solution is the overhead of the set object itself (not its elements), which is only a concern if you have multiple concurrent iterators (which you obviously don't, otherwise the node visited attribute couldn't be useful to you).

Constructing a phylogentic tree

I have a list of list of lists like this
matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']],
[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']],
[['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]
And I want to construct a phylogenetic tree from them. I wrote a node class like so (based partially on this code):
class Node(object):
"""Generic n-ary tree node object
Children are additive; no provision for deleting them."""
def __init__(self, parent, category=None, name=None):
self.parent = parent
self.category = category
self.name = name
self.childList = []
if parent is None:
self.birthOrder = 0
else:
self.birthOrder = len(parent.childList)
parent.childList.append(self)
def fullPath(self):
"""Returns a list of children from root to self"""
result = []
parent = self.parent
kid = self
while parent:
result.insert(0, kid)
parent, kid = parent.parent, parent
return result
def ID(self):
return '{0}|{1}'.format(self.category, self.name)
And then I try to construct my tree like this:
node = None
for match in matches:
for branch in match:
category, name = branch
node = Node(node, category, name)
print [n.ID() for n in node.fullPath()]
This works for the first match, but when I start with the second match it is appended at the end of the tree instead of starting again at the top. How would I do that? I tried some variations on searching for the ID, but I can't get it to work.
I would highly recommend using a phylogenetics library like Dendropy.
The 'standard way of writing phylogenetic trees is with the Newick format (parenthetical statements like ((A,B),C)). If you use Dendropy, reading that tree would be as simple as
>>> import dendropy
>>> tree1 = dendropy.Tree.get_from_string("((A,B),(C,D))", schema="newick")
or to read from a stream
>>> tree1 = dendropy.Tree(stream=open("mle.tre"), schema="newick")
The creator of the library maintains a nice tutorial too.
The issue is that node is always the bottommost node in the tree, and you are always appending to that node. You need to store the root node. Since ['rootrank', 'Root'] appears at the beginning of each of the lists, I'd recommend pulling that out and using it as the root. So you can do something like:
rootnode = Node(None, 'rootrank', 'Root')
for match in matches:
node = rootnode
for branch in match:
category, name = branch
node = Node(node, category, name)
print [n.ID() for n in node.fullPath()]
This will make the matches list more readable, and gives the expected output.
Do yourself a favor and don't reinvent the wheel. Python-graph (a.k.a. pygraph) does all that you ask here and most of the things that you'll ask next.

My recursive function (populates a tree structure) is adding to the root node during every loop/call

I have an algorithm to populate a tree like structure (class: Scan_instance_tree), but unfortunately, during each call, it is incorrectly adding to the root node's children, as well as to the new child nodes created further down in the tree.
As a clue, I saw another thread...
Persistent objects in recursive python functions
...where this problem was mentioned briefly, and it was suggested that the parameters passed had to be mutable. Is that the answer, and how would I do this, in this example???
Here is my current code:
class Field_node(object):
field_phenotype_id = -1
field_name = ''
field_parent_id = -1
child_nodes = []
class Scan_instance_tree(object):
root_node = None
def __init__(self, a_db):
self.root_node = Field_node()
scan_field_values = self.create_scan_field_values(a_db) # This just creates a temporary user-friendly version of a database table
self.build_tree(scan_field_values)
def build_tree(self, a_scan_field_values):
self.root_node.field_name = 'ROOT'
self.add_child_nodes(a_scan_field_values, self.root_node)
def add_child_nodes(self, a_scan_field_values, a_parent_node):
i = 0
while i < len(a_scan_field_values):
if a_scan_field_values[i]['field_parent_dependancy'] == a_parent_node.field_phenotype_id:
#highest_level_children.append(a_scan_field_values.pop(a_scan_field_values.index(scan_field)))
child_node = Field_node()
child_node.field_phenotype_id = a_scan_field_values[i]['field_phenotype_id']
child_node.field_name = a_scan_field_values[i]['field_name']
child_node.field_parent_dependancy = a_scan_field_values[i]['field_parent_dependancy']
a_parent_node.child_nodes.append(child_node)
a_scan_field_values.remove(a_scan_field_values[i])
# RECURSION: get the child nodes
self.add_child_nodes(a_scan_field_values, child_node)
else:
i = i+1
If I remove the recursive call to self.add_child_nodes(...), the root's children are added correctly, ie they only consist of those nodes where the field_parent_dependancy = -1
If I allow the recursive call, the root's children contain all the nodes, regardless of the field_parent_dependancy value.
Best regards
Ann
When you define your Field_node class, the line
child_nodes = []
is actually instantiating a single list as a class attribute, rather than an instance attribute, that will be shared by all instances of the class.
What you should do instead is create instance attributes in __init__, e.g.:
class Field_node(object):
def __init__(self):
self.field_phenotype_id = -1
self.field_name = ''
self.field_parent_id = -1
self.child_nodes = []

Categories