How do I build a tree dynamically in Python

A beginner Python/programming question... I'd like to build a tree structure in Python, preferably based on dictionaries. I found code that does this neatly:
import collections
Tree = lambda: collections.defaultdict(Tree)
root = Tree()
It can easily be populated like this:
root['toplevel']['secondlevel']['thirdlevel'] = 1
root['toplevel']['anotherLevel'] = 2
...etc.
I'd like to populate the levels/leaves dynamically so that I can add as many levels as needed, and where the leaves can be at any level. How do I do that?
Any help is greatly appreciated.

You can do it with a small utility function, like this:
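from functools import reduce  # needed in Python 3, where reduce is no longer a builtin

def add_element(root, path, data):
    # walk down to the parent of the last key, then assign the value there
    reduce(lambda x, y: x[y], path[:-1], root)[path[-1]] = data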
You can use it like this:
import collections
tree = lambda: collections.defaultdict(tree)
root = tree()
add_element(root, ['toplevel', 'secondlevel', 'thirdlevel'], 1)
add_element(root, ['toplevel', 'anotherlevel'], 2)
print(root)
Output:
defaultdict(<function <lambda> at 0x7f1145eac7d0>,
    {'toplevel': defaultdict(<function <lambda> at 0x7f1145eac7d0>,
        {'secondlevel': defaultdict(<function <lambda> at 0x7f1145eac7d0>,
            {'thirdlevel': 1}),
         'anotherlevel': 2
        })
    })
If you want to implement this recursively, take the first element of the path, get the corresponding child object from the current root, and pass that child along with the rest of the path to the next call.
def add_element(root, path, data):
    if len(path) == 1:
        # last key: store the data here
        root[path[0]] = data
    else:
        # descend one level and recurse on the remaining path
        add_element(root[path[0]], path[1:], data)
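For reference, here is a self-contained sketch that combines both versions in Python 3. It also adds a small helper, to_dict (a hypothetical name, not part of either answer), that converts the nested defaultdicts to plain dicts so the result prints readably. The tree factory is written with def rather than an assigned lambda, and behaves the same way.

```python
import collections
from functools import reduce  # reduce is not a builtin in Python 3


def tree():
    """Return an auto-vivifying tree: reading a missing key creates a subtree."""
    return collections.defaultdict(tree)


def add_element(root, path, data):
    """Iterative version: walk to the parent of the last key, then assign."""
    reduce(lambda node, key: node[key], path[:-1], root)[path[-1]] = data


def add_element_recursive(root, path, data):
    """Recursive version: consume one key per call."""
    if len(path) == 1:
        root[path[0]] = data
    else:
        add_element_recursive(root[path[0]], path[1:], data)


def to_dict(node):
    """Convert nested defaultdicts into plain dicts for readable printing."""
    if isinstance(node, collections.defaultdict):
        return {key: to_dict(child) for key, child in node.items()}
    return node


root = tree()
paths = [
    (['toplevel', 'secondlevel', 'thirdlevel'], 1),
    (['toplevel', 'anotherlevel'], 2),
    (['other'], 3),  # leaves can sit at any level
]
for path, value in paths:
    add_element(root, path, value)

print(to_dict(root))
# {'toplevel': {'secondlevel': {'thirdlevel': 1}, 'anotherlevel': 2}, 'other': 3}
```

Both functions produce the same structure. The recursive version is limited by Python's recursion limit, but that limit only matters for paths that are hundreds of keys long.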

Aah! This was a problem for me when I started coding as well, but the best of us come across this early.
Note: this is for when your tree goes N levels deep, where N is anywhere from 0 to infinity, i.e. you don't know how deep it can go. It may only have a first level, or it may go down to a 20th level.
Your problem is a general programming problem: reading in a tree that could be any number of levels deep. The solution to that is recursion.
Whenever you read in a tree structure, you have to:
1 - build up an object
2 - check whether the object has children
2a - if the object has children, do steps 1 and 2 for each child.
Here's a code template in Python for doing this:
def buildTree(treeObject):
    currObject = Hierarchy()
    currObject.name = treeObject.getName()
    currObject.age = treeObject.getAge()
    #as well as any other calculations and values you have to set for that object
    for child in treeObject.children:
        currChild = buildTree(child)  # recurse: build the whole subtree first
        currObject.addChild(currChild)
    #end loop
    return currObject
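The template above relies on a Hierarchy class and on source objects with getName()/children, and neither is shown. Below is a minimal runnable sketch that assumes the input is a nested dict ({name: {child_name: {...}}}). Hierarchy, build_tree and depth are hypothetical names chosen for illustration. The comments map each line to the numbered steps above.

```python
class Hierarchy:
    """Hypothetical node class: a name plus a list of child nodes."""

    def __init__(self, name):
        self.name = name
        self.children = []

    def addChild(self, child):
        self.children.append(child)


def build_tree(name, subtree):
    """Build a Hierarchy from a nested dict such as {child: {grandchild: {}}}."""
    node = Hierarchy(name)                                  # 1 - build up an object
    for child_name, child_subtree in subtree.items():       # 2 - check for children
        node.addChild(build_tree(child_name, child_subtree))  # 2a - repeat for each child
    return node


def depth(node):
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(child) for child in node.children), default=0)


source = {'a': {'a1': {}, 'a2': {'a2x': {}}}, 'b': {}}
root = build_tree('root', source)
print([child.name for child in root.children])  # ['a', 'b']
print(depth(root))  # 4
```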

This
root['toplevel']['secondlevel']['thirdlevel'] = 1
can also be done like this:
node = root
for key in ('toplevel', 'secondlevel'):
    node = node[key]  # step one level down
node['thirdlevel'] = 1
I hope that gives you an idea.
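The same loop can be wrapped in a function that takes any path. The sketch below uses a different technique from the defaultdict approach: plain dicts with dict.setdefault. setdefault creates an intermediate dict only when the key is missing. The name set_path is hypothetical.

```python
def set_path(root, path, value):
    """Walk (and create, if missing) intermediate dicts for all keys but the last."""
    node = root
    for key in path[:-1]:
        node = node.setdefault(key, {})  # reuse the existing dict or insert a new one
    node[path[-1]] = value


root = {}
set_path(root, ('toplevel', 'secondlevel', 'thirdlevel'), 1)
set_path(root, ('toplevel', 'anotherlevel'), 2)
print(root)
# {'toplevel': {'secondlevel': {'thirdlevel': 1}, 'anotherlevel': 2}}
```

One design difference: plain dicts do not auto-create entries when read. A mistyped key therefore raises KeyError instead of silently adding an empty branch, and the result prints without the defaultdict noise.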

Related

Python: Use a For Loop to Create a Dictionary of XML Nested Tags for Undetermined Level of Nesting

I'm parsing many thousands of XML files, each with generally more than 200 tags that can vary quite a lot. To compare them, I want to be able to gather their structures, and then compare each tier of nested tags between the various files.
Nested dictionaries seem to be the way to go. This code is cumbersome, but it works for each level of nesting I find and use:
import xml.etree.ElementTree as ET
strip_ns = lambda xx: str(xx).split('}', 1)[1]
tree = ET.parse('xmlpath.xml')
root = tree.getroot()
tierdict = {}
for tier1 in root:
    tier1var = strip_ns(tier1.tag)
    tierdict[tier1var] = {}
    for tier2 in tier1:
        tier2var = strip_ns(tier2.tag)
        tierdict[tier1var][tier2var] = {}
        for tier3 in tier2:
            tier3var = strip_ns(tier3.tag)
            tierdict[tier1var][tier2var][tier3var] = {}
            for tier4 in tier3:
                tier4var = strip_ns(tier4.tag)
                tierdict[tier1var][tier2var][tier3var][tier4var] = {}
However:
1) Is there a more efficient way to code this as is, especially when it comes to identifying each "tier"/level of nesting?
2) More importantly, is there a way to do this kind of looping for nesting of unknown depth? Some of these files might go 10 or 20 nestings deep, and I have no way to check them by hand.
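Unknown depth is the classic case for recursion: one function handles a single level and calls itself for each child. Below is a minimal sketch using only xml.etree.ElementTree. The inline sample document and its namespace urn:example are made up for illustration, and collect is a hypothetical name. Its strip_ns uses [-1] instead of [1], so tags without a namespace pass through unchanged instead of raising IndexError.

```python
import xml.etree.ElementTree as ET


def strip_ns(tag):
    """Drop a leading '{namespace}' from a tag; tags without one pass through."""
    return tag.split('}', 1)[-1]


def collect(element, tiers):
    """Add the tag structure below element to the nested dict tiers, to any depth."""
    for child in element:
        # setdefault reuses the existing sub-dict when a sibling tag name repeats,
        # so the structures of same-named siblings are merged, not overwritten
        subtree = tiers.setdefault(strip_ns(child.tag), {})
        collect(child, subtree)  # recurse one level deeper
    return tiers


# Inline sample document; for a real file use ET.parse('xmlpath.xml').getroot()
sample = """<root xmlns="urn:example">
  <a><b><c/></b></a>
  <a><b><d><e/></d></b></a>
  <f/>
</root>"""
root = ET.fromstring(sample)
result = collect(root, {})
print(result)  # {'a': {'b': {'c': {}, 'd': {'e': {}}}}, 'f': {}}
```

Note the difference from the loop in the question. That loop assigns {} each time a tag name repeats, which discards whatever an earlier sibling of the same name contributed. Python's default recursion limit of 1000 is far above 10 or 20 levels of nesting.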

Modify a binary tree in python

I have a binary tree with 7 elements which currently looks like this:
      1
   5     2
  7 6   4 3
I am trying to traverse it in postorder and relabel the elements as I go, so that it looks like this:
      7
   3     6
  1 2   4 5
using the following function, which is part of my Tree class:
def relable(self, h):
    if self.root is not None:
        self._relable(self.root, h)

def _relable(self, node, h):
    if node is not None:
        self._relable(node.l, h-2)
        self._relable(node.r, h-1)
        node = Node(h)
The rest of my Tree class is more or less the same as the one here.
I populated the tree by adding the numbers 1-7 in a loop.
However, when I call tree.relable(7), and then print the tree, the tree is the same.
I'm guessing this has something to do with how Python passes arguments (I'm a C++ programmer) but I don't know how to fix this.
The entirety of my code can be found here.
node = Node(h) just rebinds the local name node to a new object; it has no effect on the node object that was passed to the function. You need to modify the existing node itself, e.g. node.v = h.
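There is a second issue besides the rebinding. The h-2/h-1 arithmetic does not produce the target labels either: with h=7, the root's left child would get 5 instead of 3. The desired labelling is simply the postorder visit order (first node visited gets 1, the root gets 7), so a counter shared across the recursion is enough. Below is a minimal sketch. It assumes, as the question and answer suggest, that nodes store their value in v and their children in l and r. This Node class is a stand-in, not the question's actual class.

```python
class Node:
    """Stand-in for the question's Node class (attribute names assumed)."""

    def __init__(self, v, l=None, r=None):
        self.v, self.l, self.r = v, l, r


def relabel_postorder(root):
    """Relabel nodes 1, 2, 3, ... in postorder, mutating them in place."""
    count = 0

    def visit(node):
        nonlocal count  # one counter shared by every recursive call
        if node is None:
            return
        visit(node.l)
        visit(node.r)
        count += 1
        node.v = count  # mutate the object; rebinding `node` would do nothing

    visit(root)


def levels(node):
    """Values level by level (breadth-first), for checking the result."""
    out, frontier = [], [node]
    while frontier:
        out.append([n.v for n in frontier])
        frontier = [c for n in frontier for c in (n.l, n.r) if c is not None]
    return out


root = Node(1, Node(5, Node(7), Node(6)), Node(2, Node(4), Node(3)))
relabel_postorder(root)
print(levels(root))  # [[7], [3, 6], [1, 2, 4, 5]]
```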

python function not recursing properly (adding nodes to graph)

I'm having a rare honest-to-goodness computer science problem (as opposed to the usual how-do-I-make-this-language-I-don't-write-often-enough-do-what-I-want problem), and really feeling my lack of a CS degree for a change.
This is a bit messy, because I'm using several dicts of lists, but the basic concept is this: a Twitter-scraping function that adds retweets of a given tweet to a graph, node-by-node, building outwards from the original author (with follower relationships as edges).
for t in RTs_list:
    g = nx.DiGraph()
    followers_list = collections.defaultdict(list)
    level = collections.defaultdict(list)
    hoppers = collections.defaultdict(list)
    retweets = []
    retweeters = []
    try:
        u = api.get_status(t)
        original_tweet = u.retweeted_status.id_str
        print original_tweet
        ot = api.get_status(original_tweet)
        node_adder(ot.user.id, 1)
        # Can't paginate -- can only get about ~20 RTs max. Need to work on small data here.
        retweets = api.retweets(original_tweet)
        for r in retweets:
            retweeters.append(r.user.id)
        followers_list["0"] = api.followers_ids(ot.user.id)[0]
        print len(retweets),"total retweets"
        level["1"] = ot.user.id
        g.node[ot.user.id]['crossover'] = 1
        if g.node[ot.user.id]["followers_count"]<4000:
            bum_node_adder(followers_list["0"],level["1"], 2)
        for r in retweets:
            rt_iterator(r,retweets,0,followers_list,hoppers,level)
    except:
        print ""

def rt_iterator(r,retweets,q,followers_list,hoppers,level):
    q = q+1
    if r.user.id in followers_list[str(q-1)]:
        hoppers[str(q)].append(r.user.id)
        node_adder(r.user.id,q+1)
        g.add_edge(level[str(q)], r.user.id)
        try:
            followers_list[str(q)] = api.followers_ids(r.user.id)[0]
            level[str(q+1)] = r.user.id
            if g.node[r.user.id]["followers_count"]<4000:
                bum_node_adder(followers_list[str(q)],level[str(q+1)],q+2)
            crossover = pull_crossover(followers_list[str(q)],followers_list[str(q-1)])
            if q<10:
                for r in retweets:
                    rt_iterator(r,retweets,q,followers_list,hoppers,level)
        except:
            print ""
There are some other function calls in there, but they're not related to the problem. The main issue is how Q counts when going from, e.g., a 2-hop node to a 3-hop node. I need it to build out to the maximum depth (10) for every branch from the center, whereas right now I believe it only builds out to the maximum depth for the first branch it tries. Hope that makes sense. If not, typing it up here has helped me; I think I'm just missing a loop in there somewhere, but it's hard for me to see.
Also, ignore that various dicts refer to Q+1 or Q-1; that's an artifact of how I implemented this before I refactored it to be recursive.
Thanks!
I'm not totally sure what you mean by "the center", but I think you want something like this:
def rt_iterator(depth, other-args):
    # store whatever info you need from this point in the tree
    if depth >= MAX_DEPTH:
        return
    # look at the nodes you want to expand from here
    for each node, in the order you want them expanded:
        rt_iterator(depth+1, other-args)
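Here is a runnable version of that pseudocode. It assumes, hypothetically, that the follower graph is available as a plain dict mapping each node to the nodes to expand from it; expand and the sample graph are illustrative names. The key point is that each recursive call receives depth + 1 instead of incrementing a shared counter. Every sibling branch therefore starts from the same depth and can reach the limit on its own.

```python
def expand(graph, node, depth, edges, max_depth):
    """Depth-limited DFS: record each edge out of node, then recurse into the child."""
    if depth >= max_depth:
        return
    for child in graph.get(node, []):
        edges.append((node, child))
        # Pass depth + 1 down; the caller's depth is unchanged, so the
        # next sibling in this loop gets the same remaining depth budget.
        expand(graph, child, depth + 1, edges, max_depth)


# Hypothetical graph: who passes the retweet on to whom.
graph = {
    'author': ['a', 'b'],
    'a': ['a1'], 'a1': ['a2'], 'a2': ['a3'],
    'b': ['b1'], 'b1': ['b2'], 'b2': ['b3'],
}
edges = []
expand(graph, 'author', 0, edges, max_depth=3)
print(edges)
# [('author', 'a'), ('a', 'a1'), ('a1', 'a2'), ('author', 'b'), ('b', 'b1'), ('b1', 'b2')]
```

Both branches reach three hops. If the graph has cycles, the depth limit still guarantees termination, although a node may then be visited more than once.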
I think I've fixed it. This way, Q isn't incremented when it shouldn't be.
def rt_iterator(r,retweets,q,depth,followers_list,hoppers,level):
    def node_iterator (r,retweets,q,depth,followers_list,hoppers,level):
        for r in retweets:
            if r.user.id in followers_list[str(q-1)]:
                hoppers[str(q)].append(r.user.id)
                node_adder(r.user.id,q+1)
                g.add_edge(level[str(q)], r.user.id)
                try:
                    level[str(q+1)] = r.user.id
                    if g.node[r.user.id]["followers_count"]<4000:
                        followers_list[str(q)] = api.followers_ids(r.user.id)[0]
                        bum_node_adder(followers_list[str(q)],level[str(q+1)],q+2)
                        crossover = pull_crossover(followers_list[str(q)],followers_list[str(q-1)])
                    if q<10:
                        node_iterator(r,retweets,q+1,depth,followers_list,hoppers,level)
                except:
                    print ""
    depth = depth+1
    q = depth
    if q<10:
        rt_iterator(r,retweets,q,depth,followers_list,hoppers,level)

Finding Successors of Successors in a Directed Graph in NetworkX

I'm working on some code for a directed graph in NetworkX, and have hit a block that's likely the result of my questionable programming experience. What I'm trying to do is the following:
I have a directed graph G, with two "parent nodes" at the top, from which all other nodes flow. When graphing this network, I'd like to draw every node that is a descendant of "Parent 1" in one color, and all the other nodes in another color. That means I need a list of Parent 1's successors.
Right now, I can get the first layer of them easily using:
descend= G.successors(parent1)
The problem is that this only gives me the first generation of successors. Ideally, I want the successors of successors, the successors of the successors of the successors, and so on, to arbitrary depth, because it would be extremely useful to be able to run the analysis and make the graph without having to know exactly how many generations there are.
Any idea how to approach this?
You don't need a list of descendants; you just want to color them. For that, you only have to pick an algorithm that traverses the graph and use it to color the edges.
For example, you can do
import networkx as nx
from networkx.algorithms.traversal.depth_first_search import dfs_edges
G = nx.DiGraph( ... )
for edge in dfs_edges(G, parent1):
    color(edge)  # color() stands for your own edge-coloring code
See https://networkx.github.io/documentation/stable/reference/algorithms/generated/networkx.algorithms.traversal.depth_first_search.dfs_edges.html?highlight=traversal
If you want all the successor nodes without going through edges, another way is:
import networkx as nx
G = nx.DiGraph( ... )
successors = nx.nodes(nx.dfs_tree(G, your_node))  # note: also includes your_node itself
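I noticed that if you instead call:
successors = list(nx.dfs_successors(G, your_node))
the nodes of the bottom level are not included. That is because dfs_successors returns a dict mapping each node to the list of its successors in the DFS tree, and list() collects only the keys; leaf nodes have no successors, so they never appear as keys.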
Well, the successors of successors are just the successors of the descendants, right?
# First successors
descend = list(G.successors(parent1))  # list() because NetworkX 2+ returns an iterator
# 2nd level successors
def allDescendants(d1):
    d2 = []
    for d in d1:
        d2 += G.successors(d)
    return d2
descend2 = allDescendants(descend)
To get level 3 descendants, call allDescendants(descend2), and so on.
Edit:
Issue 1:
allDescend = descend + descend2 gives you the first two levels combined; do the same for further levels of descendants.
Issue 2: If your graph has cycles, you first need to modify the code to check whether you've already visited a descendant, e.g.:
def allDescendants(d1, exclude):
    d2 = []
    for d in d1:
        # keep only successors that have not been seen yet
        d2 += filter(lambda s: s not in exclude, G.successors(d))
    return d2
This way, you pass allDescend as the second argument to the function above, so that nodes you've already seen are not included in future descendants. You keep doing this until allDescendants() returns an empty list, at which point you know you've explored the entire reachable graph, and you stop.
Since this is starting to look like homework, I'll let you figure out how to piece all this together on your own. ;)
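Put together, the level-by-level approach with a visited set amounts to a breadth-first search. To keep it self-contained, the sketch below uses a plain dict of successor lists instead of a NetworkX graph. That is an assumption: for a DiGraph G you could build such a dict with {n: list(G.successors(n)) for n in G}. all_descendants and the sample graph are hypothetical names. The result should be the same set of nodes that nx.descendants(G, parent1) returns.

```python
def all_descendants(successors, parent):
    """Return every node reachable from parent, expanding one generation at a time.

    successors maps each node to a list of its direct successors.
    """
    seen = {parent}      # start with parent so a cycle back to it is ignored
    level = [parent]
    while level:         # stop once a generation contributes no new nodes
        next_level = []
        for node in level:
            for s in successors.get(node, []):
                if s not in seen:
                    seen.add(s)
                    next_level.append(s)
        level = next_level
    seen.discard(parent)  # parent is not its own descendant
    return seen


# Hypothetical graph with a cycle: a -> c -> a
graph = {'p1': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': ['a', 'd'], 'p2': ['e']}
print(sorted(all_descendants(graph, 'p1')))  # ['a', 'b', 'c', 'd']
```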
So that the answer is somewhat cleaner and easier to find for future folks who stumble upon it, here's the code I ended up using:
import sys
import matplotlib.pyplot as plt
from networkx import (DiGraph, dfs_successors, draw_networkx_edges,
                      draw_networkx_labels, draw_networkx_nodes)
from networkx.drawing.nx_agraph import graphviz_layout  # requires pygraphviz

G = DiGraph()  # Creates an empty directed graph G
with open(sys.argv[1]) as infile:
    for edge in infile:
        edge1, edge2 = edge.split()  # Splits data on the space
        node1 = int(edge1)  # Creates integer versions of the node names
        node2 = int(edge2)
        G.add_edge(node1, node2)  # Adds an edge between two nodes
parent1 = int(sys.argv[2])
parent2 = int(sys.argv[3])
data_successors = dfs_successors(G, parent1)
successor_list = data_successors.values()
allsuccessors = [item for sublist in successor_list for item in sublist]
pos = graphviz_layout(G, prog='dot')
plt.figure(dpi=300)
draw_networkx_nodes(G, pos, node_color="LightCoral")
draw_networkx_nodes(G, pos, nodelist=allsuccessors, node_color="SkyBlue")
draw_networkx_edges(G, pos, arrows=False)
# The original also passed labels=labels, which is not defined in this snippet;
# without it, nodes are labeled with their names.
draw_networkx_labels(G, pos, font_size=6, font_family='sans-serif')
I believe NetworkX has changed since Jochen Ritzel's answer a few years ago.
The following now works; only the import statement changes.
import networkx
from networkx import dfs_edges
G = networkx.DiGraph( ... )
for edge in dfs_edges(G, parent1):
    color(edge)
One-liner:
descendants = sum(nx.dfs_successors(G, parent).values(), [])
NetworkX also has a built-in function that returns the set of all nodes reachable from parent:
nx.descendants(G, parent)
More details: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.descendants.html

depth-first algorithm in python does not work

I have a project that I decided to do in Python. In brief: I have a list of lists. Each of them also contains lists, sometimes with one element, sometimes more. It looks like this:
rules = [
    [[1], [2], [3, 4, 5], [4], [5], [7]],
    [[1], [8], [3, 7, 8], [3], [45], [12]],
    [[31], [12], [43, 24, 57], [47], [2], [43]],
]
The point is to compare values from a NumPy array to the values in these rules (the elements of the rules table). We compare some [x][y] point to the first element of a rule (e.g. 1 in the first rule), then, if that is true, the value [x-1][y] from the array with the second element of the rule, and so on. The first five comparisons must all be true to change the value of the [x][y] point. I've written something like this (the main function is SimulateLoop; the order is switched because simulate2 was written after the second function):
def simulate2(self, i, j, w, rule):
    data = Data(rule)
    if w.world[i][j] in data.c:
        if w.world[i-1][j] in data.n:
            if w.world[i][j+1] in data.e:
                if w.world[i+1][j] in data.s:
                    if w.world[i][j-1] in data.w:
                        w.world[i][j] = data.cc[0]
                    else: return
                else: return
            else: return
        else: return
    else: return

def SimulateLoop(self, w):
    for z in range(w.steps):
        for i in range(2, w.x-1):
            for j in range(2, w.y-1):
                for rule in w.rules:
                    self.simulate2(i, j, w, rule)
Data class:
class Data:
    def __init__(self, rule):
        self.c = rule[0]
        self.n = rule[1]
        self.e = rule[2]
        self.s = rule[3]
        self.w = rule[4]
        self.cc = rule[5]
The NumPy array is an attribute of a World object. rules is a list as described above, parsed by a function obtained from another program (GPL license).
To be honest, it seems to work, but it does not. I've tried other approaches, without luck. It runs and the interpreter doesn't report any errors, but the values in the array change incorrectly. The rules are correct, because they were provided by the program from which I obtained the parser (GPL license).
In case it helps: this is Perrier's loop, a modified Langton's loop (artificial life).
I'd be very thankful for any help!
I am not familiar with Perrier's loop, but if you coded something like the famous Game of Life, you may have made a simple mistake: storing the next generation in the same array, thus corrupting it while you are still reading from it.
Normally you store the next generation in a temporary array and copy/swap it in after the sweep, like in this sketch:
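import numpy as np

def do_step_in_game_life(world):
    next_gen = np.zeros_like(world)  # <<< Tmp array here
    Nx, Ny = world.shape
    for i in range(1, Nx-1):
        for j in range(1, Ny-1):
            # live neighbours: sum of the 3x3 block minus the cell itself
            neighbours = np.sum(world[i-1:i+2, j-1:j+2]) - world[i, j]
            if neighbours == 3 or (neighbours == 2 and world[i, j] == 1):
                next_gen[i, j] = 1  # birth, or survival of a live cell
            # every other cell stays 0 (dies or stays empty)
    world[:, :] = next_gen[:, :]  # <<< Saving computed next generation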
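Applied to the rule format from the question ([c, n, e, s, w, cc] per rule), the same double-buffering idea might look like the sketch below. Every comparison reads from the old grid, and matches are written only to a copy. It is a sketch under a few assumptions. It uses plain lists of lists so it runs without NumPy; for a NumPy array, world.copy() would replace the list copy. step is a hypothetical name. It also assumes at most one rule matches a given cell, so it stops at the first match. The sketch iterates over every interior cell (index 1 to n-2), while the question's range(2, ...) starts at 2, which may or may not be intended.

```python
def step(world, rules):
    """Return the next generation; `world` itself is never modified."""
    rows, cols = len(world), len(world[0])
    new_world = [row[:] for row in world]  # cells with no matching rule keep their value
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            for c, n, e, s, w, cc in rules:
                # All five comparisons read the OLD grid, never new_world.
                if (world[i][j] in c and world[i - 1][j] in n
                        and world[i][j + 1] in e and world[i + 1][j] in s
                        and world[i][j - 1] in w):
                    new_world[i][j] = cc[0]
                    break  # assumes at most one rule matches a cell
    return new_world


# Two neighbouring cells whose rules depend on each other's OLD value.
world = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
rules = [
    [[1], [0], [1], [0], [0], [2]],  # live cell with a live east neighbour -> 2
    [[1], [0], [0], [0], [1], [3]],  # live cell with a live west neighbour -> 3
]
print(step(world, rules)[1])  # [0, 2, 3, 0]
```

Updating in place instead would turn the left cell into 2 before the right cell is checked. The right cell would then no longer see a 1 to its west and would stay 1, which is exactly the kind of "values change wrong" symptom described in the question.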
