loop/recursion to handle hierarchical structures in Python

I was wondering if there's a more efficient way of looping through a set of objects that might have dependencies (a child-parent relationship) between them, e.g.:
dependencies = [('child1', 'parent1'), ('child2', 'parent2'), ...]
obj_stack = [{'id': 'child1'}, {'id': 'child2'}, {'id': 'parent1'}, ...]
new_stack = [{'id': 'child1'}, {'id': 'child3'}]
check_and_add_to_new_stack(obj)  # function that decides if an obj should be added to the new stack
I want to check each object in obj_stack and add it to new_stack under one of two conditions:
it has no parent and check_and_add_to_new_stack(obj) adds it (let's assume it does so correctly)
it has a parent that has been added, in which case add it directly (no need to use the above function)
This means that I need to check whether an element has a parent, and if so check and add its parent first. If the parent gets added, I come back to that element. I am getting stuck on the recursive loop.
Here is the pseudo-code:
def check_and_add_to_new_stack(obj, stack):
    if passed_checks(obj):
        return add_to_new_stack(obj, stack)
    return stack

def myFunction(obj_stack, new_stack, dependencies):
    for obj in obj_stack:
        if obj is not in new_stack:
            if obj has parent in dependencies:
                myFunction([parent], new_stack, dependencies)
            else:  # here the original obj should be thrown back into the function
                new_stack = check_and_add_to_new_stack(obj, new_stack)
    return new_stack
Edit: adding the result that I am expecting and more details:
Let's assume that
passed_checks(parent1) = False
passed_checks(parent2) = True
passed_checks(child1) = True
passed_checks(child2) = False
The expected result is:
myFunction(obj_stack, new_stack, dependencies)
> [{'id':'child1'},{'id':'child3'},{'id':'child2'},{'id':'parent2'}]
Even though passed_checks(child2) = False it has a parent2 for which passed_checks = True, so both get added to the resulting set. child1 was already in new_stack. parent1 did not get added because passed_checks = False.

I think you might be looking for something like this.
walk_ancestry yields the node ID of a given node and of all of its ancestors, based on the parents dict (which is basically dict(dependencies) from your original code).
check is your check function; I just copied your condition there.
The all_passing set comprehension iterates over all of our known object names (obj_stack in your original code) and uses the built-in any function to see if any of the nodes in that node's ancestry pass the check() test. If so, it's considered passing.
This could be made faster by caching and memoization, but for small enough graphs, I'm pretty sure this works out alright.
def walk_ancestry(parents, node):
    while True:
        yield node
        node = parents.get(node)
        if not node:
            break


def check(node):
    return node in {"parent2", "child1"}


parents = {
    "child1": "parent1",
    "child2": "parent2",
    "parent2": "parent3",
}
all_objects = {
    "parent1",
    "parent2",
    "parent3",
    "child1",
    "child2",
}
all_passing = {
    node
    for node in all_objects
    if any(check(n) for n in walk_ancestry(parents, node))
}
print(all_passing)
The output is (a set, so the order of the elements may vary):
{'parent2', 'child2', 'child1'}
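The caching mentioned above could be sketched as follows. This is one possible approach, not the answer's own code: passes is a hypothetical helper name, functools.lru_cache does the memoization, and the dependency graph is assumed to be acyclic and shallow enough for Python's recursion limit.

```python
from functools import lru_cache

parents = {"child1": "parent1", "child2": "parent2", "parent2": "parent3"}
all_objects = {"parent1", "parent2", "parent3", "child1", "child2"}


def check(node):
    return node in {"parent2", "child1"}


@lru_cache(maxsize=None)
def passes(node):
    """True if node itself or any of its ancestors passes check().

    Hypothetical helper: each node is evaluated once and the result is
    cached, so shared ancestors are not walked again. Assumes no cycles.
    """
    if check(node):
        return True
    parent = parents.get(node)
    return parent is not None and passes(parent)


all_passing = {node for node in all_objects if passes(node)}
print(sorted(all_passing))  # ['child1', 'child2', 'parent2']
```

Because passes reads the module-level parents dict, the cache has to be cleared with passes.cache_clear() if the dependencies change.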

Related

Implementing a depth-first tree iterator in Python

I'm trying to implement an iterator class for not-necessarily-binary trees in Python. After the iterator is constructed with a tree's root node, its next() function can be called repeatedly to traverse the tree in depth-first order (e.g., this order), finally returning None when there are no nodes left.
Here is the basic Node class for a tree:
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []
        self.visited = False

    def __str__(self):
        return self.title
As you can see above, I introduced a visited property to the nodes for my first approach, since I didn't see a way around it. With that extra measure of state, the Iterator class looks like this:
class Iterator(object):
    def __init__(self, root):
        self.stack = []
        self.current = root

    def next(self):
        if self.current is None:
            return None

        self.stack.append(self.current)
        self.current.visited = True

        # Root case
        if len(self.stack) == 1:
            return self.current

        while self.stack:
            self.current = self.stack[-1]
            for child in self.current.children:
                if not child.visited:
                    self.current = child
                    return child

            self.stack.pop()
This is all well and good, but I want to get rid of the need for the visited property, without resorting to recursion or any other alterations to the Node class.
All the state I need should be taken care of in the iterator, but I'm at a loss about how that can be done. Keeping a visited list for the whole tree is non-scalable and out of the question, so there must be a clever way to use the stack.
What especially confuses me is this--since the next() function, of course, returns, how can I remember where I've been without marking anything or using excess storage? Intuitively, I think of looping over children, but that logic is broken/forgotten when the next() function returns!
UPDATE - Here is a small test:
tree = Node(
    'A', [
        Node('B', [
            Node('C', [
                Node('D')
            ]),
            Node('E'),
        ]),
        Node('F'),
        Node('G'),
    ])

iter = Iterator(tree)

out = object()
while out:
    out = iter.next()
    print(out)

If you really must avoid recursion, this iterator works:
from collections import deque

def node_depth_first_iter(node):
    stack = deque([node])
    while stack:
        # Pop the first element off the stack
        node = stack.popleft()
        yield node
        # Push children onto the front of the stack.
        # Note that with deque.extendleft, the first one in is the last
        # one out, so we need to push them in reverse order.
        stack.extendleft(reversed(node.children))
With that said, I think that you're thinking about this too hard. A good-ole' (recursive) generator also does the trick:
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []

    def __str__(self):
        return self.title

    def __iter__(self):
        yield self
        for child in self.children:
            for node in child:
                yield node
Both of these pass your tests:
expected = ['A', 'B', 'C', 'D', 'E', 'F', 'G']
# Test recursive generator using Node.__iter__
assert [str(n) for n in tree] == expected
# test non-recursive Iterator
assert [str(n) for n in node_depth_first_iter(tree)] == expected
and you can easily make Node.__iter__ use the non-recursive form if you prefer:
    def __iter__(self):
        return node_depth_first_iter(self)
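If an explicit iterator class is preferred over a generator, the position in the tree can be remembered by keeping a stack of child iterators instead of marking nodes. The following is only a sketch under that assumption: DepthFirstIterator is a hypothetical name, and unlike the Iterator in the question it raises StopIteration at the end rather than returning None.

```python
class Node(object):
    def __init__(self, title, children=None):
        self.title = title
        self.children = children or []


class DepthFirstIterator(object):
    """Hypothetical class-based variant: no visited flags on the nodes.

    The stack holds one iterator per level of the current path, so its
    size is bounded by the depth of the tree, not by the node count.
    """

    def __init__(self, root):
        self.stack = [iter([root])]

    def __iter__(self):
        return self

    def __next__(self):
        while self.stack:
            try:
                node = next(self.stack[-1])
            except StopIteration:
                # This level is exhausted; resume in the parent level.
                self.stack.pop()
                continue
            # Descend into the children before visiting later siblings.
            self.stack.append(iter(node.children))
            return node
        raise StopIteration

    next = __next__  # Python 2 spelling of the same method


tree = Node('A', [
    Node('B', [Node('C', [Node('D')]), Node('E')]),
    Node('F'),
    Node('G'),
])
print([n.title for n in DepthFirstIterator(tree)])
# ['A', 'B', 'C', 'D', 'E', 'F', 'G']
```

Each child iterator remembers which sibling comes next, which is the state that would otherwise be lost when next() returns.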

That could still potentially hold every label, though. I want the iterator to keep only a subset of the tree at a time.
But you already are holding everything. Remember that an object is essentially a dictionary with an entry for each attribute. Having self.visited = False in the __init__ of Node means you are storing a redundant "visited" key and False value for every single Node object no matter what. A set, at least, also has the potential of not holding every single node ID. Try this:
class Iterator(object):
    def __init__(self, root):
        self.visited_ids = set()
        ...

    def next(self):
        ...
        #self.current.visited = True
        self.visited_ids.add(id(self.current))
        ...
                #if not child.visited:
                if id(child) not in self.visited_ids:
Looking up the ID in the set should be just as fast as accessing a node's attribute. The only way this can be more wasteful than your solution is the overhead of the set object itself (not its elements), which is only a concern if you have multiple concurrent iterators (which you obviously don't, otherwise the node visited attribute couldn't be useful to you).

create different list of dicts from master list of dict

I have a list of dictionaries as follows:
listDict = [{'name': 'A',
             'fun': 'funA',
             'childs': [{'name': 'B',
                         'fun': 'funB',
                         'childs': [{'name': 'D',
                                     'fun': 'funD'}]},
                        {'name': 'C',
                         'fun': 'funC',
                         'childs': [{'name': 'E',
                                     'fun': 'funE'},
                                    {'name': 'F',
                                     'fun': 'funF'},
                                    {'name': 'G',
                                     'fun': 'funG',
                                     'childs': [{'name': 'H',
                                                 'fun': 'funH'}]}]}]},
            {'name': 'Z',
             'fun': 'funZ'}]
I want to create three lists of dicts from this:
1. With no child and no parent:
lod1 = [{'name': 'Z',
         'fun': 'funZ'}]
2. With no child but having a parent, with the parent as key:
lod2 = [{'B': [{'name': 'D',
                'fun': 'funD'}]},
        {'C': [{'name': 'E',
                'fun': 'funE'},
               {'name': 'F',
                'fun': 'funF'}]},
        {'G': [{'name': 'H',
                'fun': 'funH'}]}]
3. With only parent-child pairs as a flat list, with the parent as key:
lod3 = [{'A': [{'name': 'B',
                'fun': 'funB'},
               {'name': 'C',
                'fun': 'funC'}]},
        {'C': [{'name': 'G',
                'fun': 'funG'}]}]
Is there a way to do this, with or without recursion? The purpose of this division is that I am trying to create a flat class structure, where all nodes in category 1 (no child and no parent) are added as functions of the final class. All nodes with no child but with a parent (category 2) are added as functions of the respective parent class. The remaining parent-child nodes (category 3) will be created as classes, with the children holding an instance of the parent.

This is a task for which the Visitor Pattern is appropriate. You have a tree-like structure and you wish to traverse it, accumulating three different sets of information.
To implement this well, you should separate the traversal of the structure from the data collection. That way you only need to define the different forms of data collection rather than reimplementing the traversal each time. So let's start with that.
The visitor will take a dictionary, consider it, and then visit all of the dictionaries in the childs list (you may wish to rename this to children).
from abc import ABCMeta, abstractmethod

class DictionaryVisitor(object):
    # Python 2 spelling; on Python 3 write `class DictionaryVisitor(metaclass=ABCMeta)`
    __metaclass__ = ABCMeta

    @abstractmethod
    def visit(self, node, parents, result):
        """ This inspects the current node and
            accumulates any data into result """
        pass # Implement this in the subclasses

    def accept(self, node, parents, result):
        """ This tests if the node should be traversed. This is an efficiency
            improvement to prevent traversing lots of nodes that you have no
            interest in """
        return True

    def traverse(self, node, parents, result):
        """ This traverses the dictionary, visiting each node in turn """
        if not self.accept(node, parents, result):
            return
        self.visit(node, parents, result)
        if 'childs' in node:
            for child in node['childs']:
                self.traverse(child, parents + [node], result)

    def start(self, dict_list): # bad method name
        """ This just handles the parents and result argument of traverse """
        # Assuming that result is always a list is not normally appropriate
        result = []
        for node in dict_list:
            self.traverse(node, [], result)
        return result
You can then implement the different required outputs as subclasses of this abstract base class:
class ParentlessChildlessVisitor(DictionaryVisitor):
    def visit(self, node, parents, result):
        """ Collect the nodes that have no parents or children """
        # parent filtering is performed in accept
        if 'childs' not in node:
            result.append(node)

    def accept(self, nodes, parents, result):
        """ Reject all nodes with parents """
        return not parents
Then you can call it:
visitor = ParentlessChildlessVisitor()
results = visitor.start(listDict)
print(results)
# prints [{'fun': 'funZ', 'name': 'Z'}] (key order may differ by Python version)
The next one:
from collections import defaultdict

class ChildlessChildVisitor(DictionaryVisitor):
    def visit(self, node, parents, result):
        """ Collect the nodes that have parents but no children """
        if parents and 'childs' not in node:
            # slightly odd data structure here, a list of dicts where the only
            # dict key is unique. It would be better to be a plain dict, which
            # is what is done here:
            result[parents[-1]['name']].append(node)

    def start(self, dict_list):
        """ This just handles the parents and result argument of traverse """
        # Here it is much better to have a dict as the result.
        # This is an example of why wrapping all this logic in the start method
        # is not normally appropriate.
        result = defaultdict(list)
        for node in dict_list:
            self.traverse(node, [], result)
        return result

visitor = ChildlessChildVisitor()
results = visitor.start(listDict)
print(dict(results))
# prints (key order may differ by Python version)
# {'C': [{'fun': 'funE', 'name': 'E'}, {'fun': 'funF', 'name': 'F'}], 'B': [{'fun': 'funD', 'name': 'D'}], 'G': [{'fun': 'funH', 'name': 'H'}]}
It is not entirely clear to me what you want to collect with the last example so you will have to handle that one yourself.
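One possible reading of the third category, judging only from the lod3 example in the question, is "nodes that have both a parent and children, stripped of their childs key and grouped under the parent's name". The sketch below implements that reading as a plain recursive function rather than a visitor subclass; collect_inner_nodes is a hypothetical name, and the result is a dict rather than the list of single-key dicts shown in the question.

```python
from collections import defaultdict

listDict = [
    {'name': 'A', 'fun': 'funA', 'childs': [
        {'name': 'B', 'fun': 'funB', 'childs': [
            {'name': 'D', 'fun': 'funD'}]},
        {'name': 'C', 'fun': 'funC', 'childs': [
            {'name': 'E', 'fun': 'funE'},
            {'name': 'F', 'fun': 'funF'},
            {'name': 'G', 'fun': 'funG', 'childs': [
                {'name': 'H', 'fun': 'funH'}]}]}]},
    {'name': 'Z', 'fun': 'funZ'},
]


def collect_inner_nodes(nodes, parent_name=None, result=None):
    """Group nodes that have a parent AND children under the parent's name.

    Assumed interpretation of category 3; the copied node omits 'childs'.
    """
    if result is None:
        result = defaultdict(list)
    for node in nodes:
        children = node.get('childs', [])
        if parent_name is not None and children:
            flat = {k: v for k, v in node.items() if k != 'childs'}
            result[parent_name].append(flat)
        collect_inner_nodes(children, node['name'], result)
    return result


lod3 = dict(collect_inner_nodes(listDict))
print(lod3)
```

With the sample data this groups B and C under 'A' and G under 'C', which matches the lod3 shown in the question apart from the outer container type.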

My recursive function (populates a tree structure) is adding to the root node during every loop/call

I have an algorithm to populate a tree like structure (class: Scan_instance_tree), but unfortunately, during each call, it is incorrectly adding to the root node's children, as well as to the new child nodes created further down in the tree.
As a clue, I saw another thread...
Persistent objects in recursive python functions
...where this problem was mentioned briefly, and it was suggested that the parameters passed had to be mutable. Is that the answer, and how would I do this in this example?
Here is my current code:
class Field_node(object):
    field_phenotype_id = -1
    field_name = ''
    field_parent_id = -1
    child_nodes = []

class Scan_instance_tree(object):
    root_node = None

    def __init__(self, a_db):
        self.root_node = Field_node()
        scan_field_values = self.create_scan_field_values(a_db) # This just creates a temporary user-friendly version of a database table
        self.build_tree(scan_field_values)

    def build_tree(self, a_scan_field_values):
        self.root_node.field_name = 'ROOT'
        self.add_child_nodes(a_scan_field_values, self.root_node)

    def add_child_nodes(self, a_scan_field_values, a_parent_node):
        i = 0
        while i < len(a_scan_field_values):
            if a_scan_field_values[i]['field_parent_dependancy'] == a_parent_node.field_phenotype_id:
                #highest_level_children.append(a_scan_field_values.pop(a_scan_field_values.index(scan_field)))
                child_node = Field_node()
                child_node.field_phenotype_id = a_scan_field_values[i]['field_phenotype_id']
                child_node.field_name = a_scan_field_values[i]['field_name']
                child_node.field_parent_dependancy = a_scan_field_values[i]['field_parent_dependancy']
                a_parent_node.child_nodes.append(child_node)
                a_scan_field_values.remove(a_scan_field_values[i])
                # RECURSION: get the child nodes
                self.add_child_nodes(a_scan_field_values, child_node)
            else:
                i = i+1
If I remove the recursive call to self.add_child_nodes(...), the root's children are added correctly, i.e. they consist only of those nodes where field_parent_dependancy = -1.
If I allow the recursive call, the root's children contain all the nodes, regardless of the field_parent_dependancy value.
Best regards
Ann

When you define your Field_node class, the line
    child_nodes = []
is actually instantiating a single list as a class attribute, rather than an instance attribute, that will be shared by all instances of the class.
What you should do instead is create instance attributes in __init__, e.g.:
class Field_node(object):
    def __init__(self):
        self.field_phenotype_id = -1
        self.field_name = ''
        self.field_parent_id = -1
        self.child_nodes = []
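The difference can be seen in isolation with two small classes. SharedNode and OwnNode are hypothetical names used only for this illustration:

```python
class SharedNode(object):
    child_nodes = []              # class attribute: one list for all instances


class OwnNode(object):
    def __init__(self):
        self.child_nodes = []     # instance attribute: a new list per instance


a, b = SharedNode(), SharedNode()
a.child_nodes.append('x')
print(b.child_nodes)                    # ['x'], b sees a's child
print(a.child_nodes is b.child_nodes)   # True, it is the same list object

c, d = OwnNode(), OwnNode()
c.child_nodes.append('x')
print(d.child_nodes)                    # [], d has its own list
```

This is why every child appended anywhere in the tree also showed up under the root: all nodes were appending to the same list.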

Wxpython: TreeCtrl: Iteration over a tree

I am using the following method to iterate over all the nodes of a wxpython treectrl.
def get_desired_parent(self, name, selectednode = None):
    if selectednode == None:
        selectednode = self.treeCtrl.RootItem
    # First perform the action on the first object separately
    childcount = self.treeCtrl.GetChildrenCount(selectednode, False)
    if childcount == 0:
        return None

    (item,cookie) = self.treeCtrl.GetFirstChild(selectednode)
    if self.treeCtrl.GetItemText(item) == name:
        return item

    while childcount > 1:
        childcount = childcount - 1
        # Then iterate over the rest of objects
        (item,cookie) = self.treeCtrl.GetNextChild(item,cookie)
        if self.treeCtrl.GetItemText(item) == name:
            return item
    return None
This problem of excess code becomes even more apparent when I iterate through the structure recursively.
Is there another way of performing the same actions in a more compact manner, to make my code more concise and Pythonic?

You could define a helper function inside this one (visible in its namespace only) that checks whether an item matches the condition. If it does, return the item; if it doesn't, continue.
Alternatively, you could check your condition just after the while line. That way the item variable is set to the first child before the loop and evaluated like any other.
Still another way (or a mix of the two):
(child, cookie) = self.GetFirstChild(item)
while child.IsOk():
    do_something(child)
    (child, cookie) = self.GetNextChild(item, cookie)

Here is a full example that traverses the tree depth first. The function was bound to the right mouse button.
def OnRightDown(self, event):
    def showChildren(item, cookie):
        # recursively walks down the tree, depth first
        if item.IsOk():
            child, cookie = self.tree.GetFirstChild(item)
            while child.IsOk():
                print(self.tree.GetItemText(child)) #show child label name
                showChildren(child, cookie)
                # GetNextChild takes the parent item, not the current child
                child, cookie = self.tree.GetNextChild(item, cookie)

    pt = event.GetPosition()
    item, flags = self.tree.HitTest(pt)
    if item:
        print(self.tree.GetItemText(item)) #show parent label name
        showChildren(item,0) #iterate depth first into the tree

The best way to make your code readable here is to make it short and highly functional.
If you need to visit all the tree items depth first, here is a single quick function that does so. Hand it a function to call on each item and the item to start from (usually self.root). It is also quite reusable, since you might be doing this a lot.
def depth_first_tree(self, funct, item):
    (child, cookie) = self.tree.GetFirstChild(item)
    while child.IsOk():
        self.depth_first_tree(funct, child)
        funct(child)
        (child, cookie) = self.tree.GetNextChild(item, cookie)

Finding certain child in wxTreeCtrl and updating TreeCtrl in wxPython

How can I check if a certain root in a wx.TreeCtrl object has a certain child or not?
I am writing manual functions to update the TreeCtrl every time a child is added by the user. Is there a way to automate this?

You might want to consider storing the data in some other easily-searchable structure, and using the TreeCtrl just to display it. Otherwise, you can iterate over the children of a TreeCtrl root item like this:
def item_exists(tree, match, root):
    item, cookie = tree.GetFirstChild(root)
    while item.IsOk():
        if tree.GetItemText(item) == match:
            return True
        #if tree.ItemHasChildren(item):
        #    if item_exists(tree, match, item):
        #        return True
        item, cookie = tree.GetNextChild(root, cookie)
    return False

result = item_exists(tree, 'some text', tree.GetRootItem())
Uncommenting the commented lines will make it a recursive search.
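The first suggestion above, keeping the data in a separate, easily searchable structure and using the TreeCtrl only for display, might look roughly like this. TreeModel and its method names are hypothetical, labels are assumed to be unique strings, and the wx calls are left out so the sketch runs on its own:

```python
from collections import defaultdict


class TreeModel(object):
    """Hypothetical lookup structure kept alongside a wx.TreeCtrl."""

    def __init__(self):
        # parent label -> set of child labels
        self._children = defaultdict(set)

    def add_child(self, parent, child):
        # A real application would also append the item to the TreeCtrl here.
        self._children[parent].add(child)

    def has_child(self, parent, child):
        # .get avoids creating empty entries for unknown parents
        return child in self._children.get(parent, ())


model = TreeModel()
model.add_child('root', 'fruits')
model.add_child('fruits', 'apple')
print(model.has_child('root', 'fruits'))   # True
print(model.has_child('root', 'apple'))    # False, apple is a grandchild
```

Routing every addition through one method like add_child is also one way to avoid hand-written update functions scattered through the code.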

A nicer way to handle recursive tree traversal is to wrap it in a generator object, which you can then re-use to perform any operation you like on your tree nodes:
def walk_branches(tree, root):
    """ a generator that recursively yields child nodes of a wx.TreeCtrl """
    item, cookie = tree.GetFirstChild(root)
    while item.IsOk():
        yield item
        if tree.ItemHasChildren(item):
            # re-yield the descendants; a bare recursive call would discard them
            for child in walk_branches(tree, item):
                yield child
        item, cookie = tree.GetNextChild(root, cookie)

for node in walk_branches(my_tree, my_root):
    pass  # do stuff

For searching by text without recursion:
def GetItemByText(self, search_text, tree_ctrl_instance):
    retval = None
    root_list = [tree_ctrl_instance.GetRootItem()]
    for root_child in root_list:
        item, cookie = tree_ctrl_instance.GetFirstChild(root_child)
        while item.IsOk():
            if tree_ctrl_instance.GetItemText(item) == search_text:
                retval = item
                break
            if tree_ctrl_instance.ItemHasChildren(item):
                root_list.append(item)
            item, cookie = tree_ctrl_instance.GetNextChild(root_child, cookie)
    return retval

We Keep Coding

