Related
I'm trying to implement a Graph/Node data structure to store some connections between synsets in WordNet.
import nltk
from nltk.corpus import wordnet as wn
# a dict of [wn.synset, Node]
syns_node_dict = []
# gets a node from a wn.synset
def nodeFromSyn(syn):
for row in syns_node_dict:
if row[0] == syn:
return row[1]
return False
def display_dict():
for row in syns_node_dict:
curr_nde = row[1]
display([curr_nde._value, curr_nde._parents, curr_nde._children])
# Graph Node data struct
class Node:
# value is a wn.synset
# parents, children are both lists of wn.synset
def __init__(self, value, parents=[], children=[]):
curr = nodeFromSyn(value)
# if a Node for value already exists, merge attributes
if curr:
for p in parents:
if p not in curr._parents : curr._parents.append(p)
for c in children:
if c not in curr._children : curr._children.append(c)
self = curr
# if a Node for value does not exist, create new Node for value
else:
syns_node_dict.append([value, self])
self._value = value
self._parents = parents
self._children = children
# Create a Node for each of self's parents if it does not already exist
# and add self as a child
for parent in self._parents:
parent_node = nodeFromSyn(parent)
if parent_node:
if value not in parent_node._children:
parent_node._children.append(value)
else:
parent_node = Node(parent, children=[value])
# Create a Node for each of self's children if it does not already exist
# and add self as a parent
for child in self._children:
child_node = nodeFromSyn(child)
if child_node:
if value not in child_node._parents:
child_node._parents.append(value)
else:
child_node = Node(child, parents=[value])
However, I'm seeing some strange behavior when I attempt to run my code:
Node(wn.synset('condition.n.01'), [], children=[wn.synset('difficulty.n.03')])
display_dict()
>[Synset('condition.n.01'), [], [Synset('difficulty.n.03')]]
>[Synset('difficulty.n.03'), [Synset('condition.n.01')], []]
Node(wn.synset('state.n.02'), children=[wn.synset('condition.n.01')])
display_dict()
>[Synset('condition.n.01'), [Synset('state.n.02')], [Synset('difficulty.n.03')]]
>[Synset('difficulty.n.03'), [Synset('condition.n.01')], []]
>[Synset('state.n.02'), [], [Synset('condition.n.01')]]
When I run the following lines of code I expect the last line to be
[Synset('attribute.n.02'), [], [Synset('state.n.02')]].
Why does the Node for wn.synset('attribute.n.02') contain itself among its parents?
Node(wn.synset('attribute.n.02'), children=[wn.synset('state.n.02')])
display_dict()
>[Synset('condition.n.01'), [Synset('state.n.02')], [Synset('difficulty.n.03')]]
>[Synset('difficulty.n.03'), [Synset('condition.n.01')], []]
>[Synset('state.n.02'), [Synset('attribute.n.02')], [Synset('condition.n.01')]]
>[Synset('attribute.n.02'), [Synset('attribute.n.02')], [Synset('state.n.02')]]
I isolated the issue to be within the block below but I can't figure out what's causing this behavior.
# Create a Node for each of self's children if it does not already exist
# and add self as a parent
for child in self._children:
child_node = nodeFromSyn(child)
if child_node:
if value not in child_node._parents:
child_node._parents.append(value)
else:
child_node = Node(child, parents=[value])
The reason this is happening is because default parameters are evaluated once when the function is defined, not each time it is called. Because of this, all the nodes you create with the parents (or children) parameter blank are actually sharing the same list for their parent list. For the first two nodes this isn't a problem because you specify an empty list for the first node, and the first node creates a list for the second, but all the nodes after that are sharing the same list (which becomes more obvious if you keep creating more nodes).
A way to solve this problem outlined here is to make the default parameters None, then check if the parameter is none and create the empty list if so:
# modify the init function so that the default parameters are None
def __init__(self, value, parents=None, children=None):
# if they are left as None, set them to an empty list
if parents is None:
parents = []
if children is None:
children = []
# The rest of your code here...
I'm trying to implement an iterator class for not-necessarily-binary trees in Python. After the iterator is constructed with a tree's root node, its next() function can be called repeatedly to traverse the tree in depth-first order (e.g., this order), finally returning None when there are no nodes left.
Here is the basic Node class for a tree:
class Node(object):
def __init__(self, title, children=None):
self.title = title
self.children = children or []
self.visited = False
def __str__(self):
return self.title
As you can see above, I introduced a visited property to the nodes for my first approach, since I didn't see a way around it. With that extra measure of state, the Iterator class looks like this:
class Iterator(object):
def __init__(self, root):
self.stack = []
self.current = root
def next(self):
if self.current is None:
return None
self.stack.append(self.current)
self.current.visited = True
# Root case
if len(self.stack) == 1:
return self.current
while self.stack:
self.current = self.stack[-1]
for child in self.current.children:
if not child.visited:
self.current = child
return child
self.stack.pop()
This is all well and good, but I want to get rid of the need for the visited property, without resorting to recursion or any other alterations to the Node class.
All the state I need should be taken care of in the iterator, but I'm at a loss about how that can be done. Keeping a visited list for the whole tree is non-scalable and out of the question, so there must be a clever way to use the stack.
What especially confuses me is this--since the next() function, of course, returns, how can I remember where I've been without marking anything or using excess storage? Intuitively, I think of looping over children, but that logic is broken/forgotten when the next() function returns!
UPDATE - Here is a small test:
tree = Node(
'A', [
Node('B', [
Node('C', [
Node('D')
]),
Node('E'),
]),
Node('F'),
Node('G'),
])
iter = Iterator(tree)
out = object()
while out:
out = iter.next()
print out
If you really must avoid recursion, this iterator works:
from collections import deque
def node_depth_first_iter(node):
stack = deque([node])
while stack:
# Pop out the first element in the stack
node = stack.popleft()
yield node
# push children onto the front of the stack.
# Note that with a deque.extendleft, the first on in is the last
# one out, so we need to push them in reverse order.
stack.extendleft(reversed(node.children))
With that said, I think that you're thinking about this too hard. A good-ole' (recursive) generator also does the trick:
class Node(object):
def __init__(self, title, children=None):
self.title = title
self.children = children or []
def __str__(self):
return self.title
def __iter__(self):
yield self
for child in self.children:
for node in child:
yield node
both of these pass your tests:
expected = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
# Test recursive generator using Node.__iter__
assert [str(n) for n in tree] == expected
# test non-recursive Iterator
assert [str(n) for n in node_depth_first_iter(tree)] == expected
and you can easily make Node.__iter__ use the non-recursive form if you prefer:
def __iter__(self):
return node_depth_first_iter(self)
That could still potentially hold every label, though. I want the
iterator to keep only a subset of the tree at a time.
But you already are holding everything. Remember that an object is essentially a dictionary with an entry for each attribute. Having self.visited = False in the __init__ of Node means you are storing a redundant "visited" key and False value for every single Node object no matter what. A set, at least, also has the potential of not holding every single node ID. Try this:
class Iterator(object):
def __init__(self, root):
self.visited_ids = set()
...
def next(self):
...
#self.current.visited = True
self.visited_ids.add(id(self.current))
...
#if not child.visited:
if id(child) not in self.visited_ids:
Looking up the ID in the set should be just as fast as accessing a node's attribute. The only way this can be more wasteful than your solution is the overhead of the set object itself (not its elements), which is only a concern if you have multiple concurrent iterators (which you obviously don't, otherwise the node visited attribute couldn't be useful to you).
I've been trying to create a simple linked list implementation in Python as a code exercise and, although I have most of the stuff working (inserting, removing, pretty print, swapping the content of two nodes), I've been stuck on swapping two nodes for a few days.
I've looked around on the internet and most people seem to recommend deleting/inserting the nodes or swapping the data. Both are very fine and functional options but I wanted to challenge myself and see if I could swap the nodes the "correct" way.
Ideally I would like to have a generic function that can handle all edge cases (moving to the begin, end and swapping random nodes). This has proven to be way more challenging than I expected.
I've experimented a bit with pen and paper and search around and I found the following discussion and example implementation:
Discussion about swapping in C
Example implementation in C
The issue I run into is that my node1.next and my node2.prev are swapped and that my node2.next refers to itself and not to the next node in the list.
The comment on the page of the example implementation specifically addresses this problem and mentions that it should not happen with his implementation.
I can't seem to figure out what I've done wrong. I guess I could "cheat" and force them to take the correct values at the end but that gives a lot of problems when the node is the first/last.
__author__ = 'laurens'
from django.core.exceptions import ObjectDoesNotExist
import logging
import copy
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
class DoublyLinkedList(object):
"""This class implements a basic doubly linked list in Django, it depends
on a Django model with the following field:
id : PK
prev: integer previous node
data_field: Foreign key
next: integer next node
the prev and next fields don't need to be self-referencing
When instantiating the class you have to link this class to a Django model
and specify a data field. The data field can link to a foreign key
or contain data
"""
def __init__(self, doubly_linked_list_model, data_field):
self.doubly_linked_list_model = doubly_linked_list_model
self.data_field = data_field
def get_node_from_node_id(self, node_id=None):
"""This function returns the node associated with a certain node_id"""
if node_id is None:
node = None
else:
try:
node = self.doubly_linked_list_model.get(id=node_id)
except ObjectDoesNotExist:
node = None
return node
#staticmethod
def _update_node(node, prev=None, next=None):
node.prev = prev
node.next = next
logger.debug('updating node: %s', node.id)
logger.debug('node.prev = %s', node.prev)
logger.debug('node.next = %s', node.next)
try:
node.save()
except Exception as e: #Todo: specify this
logger.debug('Error saving node: %s', node.id)
def move_node(self, node1=None, node2=None):
"""
This function swaps the position of node1 with the position of node2
"""
#swapping two nodes!
logger.debug('Swapping two random nodes!: %s, %s', node1.id, node2.id)
# Swapping next nodes
logger.debug('Swapping next node')
tmp = copy.deepcopy(node1.next)
self._update_node(node=node1,
prev=node1.prev,
next=node2.next)
#Todo: Check if tmp changes or is stored as a copy
self._update_node(node=node2,
prev=node2.prev,
next=tmp)
if node1.next is not None:
logger.debug('Connect the next node to node 1')
node_next = self.get_node_from_node_id(node1.next)
self._update_node(node=node_next,
prev=node1.id,
next=node_next.next)
if node2.next is not None:
logger.debug('Connect the next node to node 2')
node_next = self.get_node_from_node_id(node2.next)
self._update_node(node=node_next,
prev=node2.id,
next=node_next.next)
logger.debug('Swap prev nodes')
tmp = copy.deepcopy(node1.prev)
self._update_node(node=node1,
prev=node2.prev,
next=node1.next)
self._update_node(node=node2,
prev=tmp,
next=node2.next)
# Connect the node before node1 to node 1
if node1.prev is not None:
logger.debug('Connect the prev to node 1')
node_prev = self.get_node_from_node_id(node1.prev)
self._update_node(node=node_prev,
prev=node_prev.prev,
next=node1.id)
# Connect the node before node2 to node 2
if node2.prev is not None:
logger.debug('Connect the prev to node 2')
node_prev = self.get_node_from_node_id(node2.prev)
self._update_node(node=node_prev,
prev=node_prev.prev,
next=node2.id)
The _update_node function does nothing more than taking my input and committing it to the database; it can handle None values.
get_node_from_node_id takes an integer as input and returns the node object associated with it. I use it so that I don't have to work with self-referencing foreign keys (is that the correct term?) in the database, for now I would like to continue working this way. Once I have this working I'll move on to fixing it in the database in the correct way.
tip: I get answers much more quickly when I provide a minimal, complete, verifiable example (MCVE), also known as a short, self-contained, compilable example (SSCCE).
Your example code fails to demonstrate the problem, making it impossible for us to help you.
Please make it easy for us to help you.
I ran your example code, but I didn't see any output.
The issue I run into is that ... that my node2.next refers to itself and
not to the next node in the list.
Why is node2.next referring to node2 a problem?
As far as I can tell, the part of the code you gave us so far works fine.
Some of the most difficult debugging sessions I've ever had ended only when I realized that everything was actually working correctly, that the "bug" I thought I was hunting didn't actually exist.
Yes, there is an intermediate step halfway through the algorithm where a node refers to itself, which may seem obviously wrong.
But then the second half of the algorithm makes further changes.
By the time the algorithm finishes,
the "next" chain correctly runs all the way from the beginning to the end, and
the "forward" chain correctly runs in the reverse order as the "next" chain, from the end to the beginning, and
the order of those chains is similar to the original order, except that node1 and node2 have swapped logical position (but are still in the same physical position).
Every node points to 2 other nodes (or to the NULL node), never to itself.
Isn't that what you wanted?
When I ran the following test script, I get lots of output --
but I don't see any problems in the output.
(I used a simple array with integer indexes rather than true pointers or references to make the code shorter and easier to debug compared to a SQL database).
Would you mind pointing out which particular line of output is not what you expected, and spelling out what you expected that line of output to actually say?
#!/usr/bin/env python
# https://stackoverflow.com/questions/24610889/trying-to-write-a-swap-position-function-for-a-doubly-linked-list
# Is this the shortest possible program that exhibits the bug?
# Before running this probram, you may need to install
# sudo apt-get install python-django
# 2015-03-12: David added some test scaffolding
# 2014-07-07: Laurens posted to StackOverflow
__author__ = 'laurens'
from django.core.exceptions import ObjectDoesNotExist
import logging
import copy
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)
class DoublyLinkedList(object):
"""This class implements a basic doubly linked list in Django, it depends
on a Django model with the following field:
id : PK
prev: integer previous node
data_field: Foreign key
next: integer next node
the prev and next fields don't need to be self-referencing
When instantiating the class you have to link this class to a Django model
and specify a data field. The data field can link to a foreign key
or contain data
"""
def __init__(self, doubly_linked_list_model, data_field):
self.doubly_linked_list_model = doubly_linked_list_model
self.data_field = data_field
def get_node_from_node_id(self, node_id=None):
"""This function returns the node associated with a certain node_id"""
if node_id is None:
node = None
else:
try:
node = self.doubly_linked_list_model.get(id=node_id)
except ObjectDoesNotExist:
node = None
return node
#staticmethod
def _update_node(node, prev=None, next=None):
node.prev = prev
node.next = next
logger.debug('updating node: %s', node.id)
logger.debug('node.prev = %s', node.prev)
logger.debug('node.next = %s', node.next)
try:
node.save()
except Exception as e: #Todo: specify this
logger.debug('Error saving node: %s', node.id)
def move_node(self, node1=None, node2=None):
"""
This function swaps the position of node1 with the position of node2
"""
#swapping two nodes!
logger.debug('Swapping two random nodes!: %s, %s', node1.id, node2.id)
# Swapping next nodes
logger.debug('Swapping next node')
tmp = copy.deepcopy(node1.next)
self._update_node(node=node1,
prev=node1.prev,
next=node2.next)
#Todo: Check if tmp changes or is stored as a copy
self._update_node(node=node2,
prev=node2.prev,
next=tmp)
if node1.next is not None:
logger.debug('Connect the next node to node 1')
node_next = self.get_node_from_node_id(node1.next)
self._update_node(node=node_next,
prev=node1.id,
next=node_next.next)
if node2.next is not None:
logger.debug('Connect the next node to node 2')
node_next = self.get_node_from_node_id(node2.next)
self._update_node(node=node_next,
prev=node2.id,
next=node_next.next)
logger.debug('Swap prev nodes')
tmp = copy.deepcopy(node1.prev)
self._update_node(node=node1,
prev=node2.prev,
next=node1.next)
self._update_node(node=node2,
prev=tmp,
next=node2.next)
# Connect the node before node1 to node 1
if node1.prev is not None:
logger.debug('Connect the prev to node 1')
node_prev = self.get_node_from_node_id(node1.prev)
self._update_node(node=node_prev,
prev=node_prev.prev,
next=node1.id)
# Connect the node before node2 to node 2
if node2.prev is not None:
logger.debug('Connect the prev to node 2')
node_prev = self.get_node_from_node_id(node2.prev)
self._update_node(node=node_prev,
prev=node_prev.prev,
next=node2.id)
global_test_array = []
obfuscation = 0xaa
class trivial_test_node_class(object):
def __init__(self, prev, id, next, data):
print "initializing test class."
# self.stuff = [ ["first", 0, 1], ["second", 0, 1] ]
self.id = id
self.prev = prev
self.next = next
self.data = data
def something(self):
print "something"
def save(self):
id = self.id
global_test_array[id] = self
def __repr__(self):
# print self.prev, self.id, self.next, self.data
the_string = "%s(%r)\n" % (self.__class__, self.__dict__)
return the_string
class trivial_test_list_model_class(object):
def __init__(self):
print "initializing test class."
#self.stuff = [ ["first", 0, 1], ["second", 0, 1] ]
self.stuff = [ 0 for i in xrange(0,10) ]
data = 'a'
for i in xrange(1,10):
self.stuff[i] = trivial_test_node_class(i-1,i,i+1,data);
data += 'r' # talk like a pirate day
self.stuff[-1].next = 0 # use 0 as NULL id.
global global_test_array
global_test_array = self.stuff
def something(self):
print "something"
def get(self,id):
return self.stuff[id]
def __repr__(self):
# for i in xrange(1,10):
# print self.stuff[i]
the_string = "%s(%r)" % (self.__class__, self.__dict__)
return the_string
if __name__ == '__main__':
# test code that only gets run when this file is run directly,
# not when this file is imported from some other python script.
print "Hello, world."
trivial_test_model = trivial_test_list_model_class()
print trivial_test_model
testll = DoublyLinkedList( trivial_test_model, "data" )
left_node = trivial_test_model.get(3)
right_node = trivial_test_model.get(4)
testll.move_node( left_node, right_node )
print trivial_test_model
# recommended by http://wiki.python.org/moin/vim
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4 :
I have an algorithm to populate a tree like structure (class: Scan_instance_tree), but unfortunately, during each call, it is incorrectly adding to the root node's children, as well as to the new child nodes created further down in the tree.
As a clue, I saw another thread...
Persistent objects in recursive python functions
...where this problem was mentioned briefly, and it was suggested that the parameters passed had to be mutable. Is that the answer, and how would I do this, in this example???
Here is my current code:
class Field_node(object):
field_phenotype_id = -1
field_name = ''
field_parent_id = -1
child_nodes = []
class Scan_instance_tree(object):
root_node = None
def __init__(self, a_db):
self.root_node = Field_node()
scan_field_values = self.create_scan_field_values(a_db) # This just creates a temporary user-friendly version of a database table
self.build_tree(scan_field_values)
def build_tree(self, a_scan_field_values):
self.root_node.field_name = 'ROOT'
self.add_child_nodes(a_scan_field_values, self.root_node)
def add_child_nodes(self, a_scan_field_values, a_parent_node):
i = 0
while i < len(a_scan_field_values):
if a_scan_field_values[i]['field_parent_dependancy'] == a_parent_node.field_phenotype_id:
#highest_level_children.append(a_scan_field_values.pop(a_scan_field_values.index(scan_field)))
child_node = Field_node()
child_node.field_phenotype_id = a_scan_field_values[i]['field_phenotype_id']
child_node.field_name = a_scan_field_values[i]['field_name']
child_node.field_parent_dependancy = a_scan_field_values[i]['field_parent_dependancy']
a_parent_node.child_nodes.append(child_node)
a_scan_field_values.remove(a_scan_field_values[i])
# RECURSION: get the child nodes
self.add_child_nodes(a_scan_field_values, child_node)
else:
i = i+1
If I remove the recursive call to self.add_child_nodes(...), the root's children are added correctly, ie they only consist of those nodes where the field_parent_dependancy = -1
If I allow the recursive call, the root's children contain all the nodes, regardless of the field_parent_dependancy value.
Best regards
Ann
When you define your Field_node class, the line
child_nodes = []
is actually instantiating a single list as a class attribute, rather than an instance attribute, that will be shared by all instances of the class.
What you should do instead is create instance attributes in __init__, e.g.:
class Field_node(object):
def __init__(self):
self.field_phenotype_id = -1
self.field_name = ''
self.field_parent_id = -1
self.child_nodes = []
Alright,
I have this program to sparse code in Newick Format, which extracts both a name, and a distance for use in a phylogenetic tree diagram.
What my problem is, in this branch of code, as the program reads through the newickNode function, it assigns the name and distance to the 'node' variable, then returns it back into the 'Node' class to be printed, but it seems to only print the first node 'A', and skips the other 3.
Is there anyway to finish the for loop in newickNode to read the other 3 nodes and print them accordingly with the first?
class Node:
def __init__(self, name, distance, parent=None):
self.name = name
self.distance = distance
self.children = []
self.parent = parent
def displayNode(self):
print "Name:",self.name,",Distance:",self.distance,",Children:",self.children,",Parent:",self.parent
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
return node
Node1 = newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)')
Node1.displayNode()
Thanks!
You could make it a generator:
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
yield node
for node in newickNode('(A:0.1,B:0.2,C:0.3,D:0.4)'):
node.displayNode()
The generator will return one node at a time and pause within the function, and then resume when you want the next one.
Or just save them up and return them
def newickNode(newickString, parent=None):
String = newickString[1:-1].split(',')
nodes = []
for x in String:
splitString = x.split(':')
nodeName = splitString[0]
nodeDistance = float(splitString[1])
node = Node(nodeName, nodeDistance, parent)
nodes.append(node)
return nodes
Your newickNode() function should accumulate a list of nodes and return that, rather than returning the first node created. If you're going to do that, why have a loop to begin with?
def newickNodes(newickString, parent=None):
nodes = []
for node in newickString[1:-1].split(','):
nodeName, nodeDistance = node.split(':')
nodes.append(Node(nodeName, nodeDistance, parent))
return nodes
Alternatively, you could write it as a generator that yields the nodes one at a time. This would allow you to easily iterate over them or convert them to a list depending on your needs.
def newickNodes(newickString, parent=None):
for node in newickString[1:-1].split(','):
nodeName, nodeDistance = node.split(':')
yield Node(nodeName, nodeDistance, parent)
Also, from a object-oriented design POV, this should probably be a class method on your Node class named parseNewickString() or similar.
Alternatively, your newickNode() function could immediately call node.displayNode() on the new node each time through the loop.
To keep this more flexible - I would use pyparsing to process the Newick text and networkx so I had all the graph functionality I could desire - recommend to easy_install/pip those modules. It's also nice that someone has written a parser with node and tree creation already (although it looks like it lacks some features, it'll work for your case):
http://code.google.com/p/phylopy/source/browse/trunk/src/phylopy/newick.py?r=66
The first time through your for: loop, you return a node, which stops the function executing.
If you want to return a list of nodes, create the list at the top of the function, append to it each time through the loop, and return the list when you're done.
It may make more sense to move the loop outside of the newickNode function, and have that function only return a single node as its name suggests.