I was wondering how to best implement a tree data structure to be able to enumerate paths of all levels. Let me explain it with the following example:
A
/ \
B C
| /\
D E F
I want to be able to generate the following:
A
B
C
D
E
F
A-B
A-C
B-D
C-E
C-F
A-B-D
A-C-E
A-C-F
As of now, I am executing a depth-first-search for different depths on a data structure built using a dictionary and recording unique nodes that are seen but I was wondering if there is a better way to do this kind of a traversal. Any suggestions?
Whenever you find a problem on trees, just use recursion :D
def paths(tree):
#Helper function
#receives a tree and
#returns all paths that have this node as root and all other paths
if tree is the empty tree:
return ([], [])
else: #tree is a node
root = tree.value
rooted_paths = [[root]]
unrooted_paths = []
for subtree in tree.children:
(useable, unueseable) = paths(subtree)
for path in useable:
unrooted_paths.append(path)
rooted_paths.append([root]+path)
for path in unuseable:
unrooted_paths.append(path)
return (rooted_paths, unrooted_paths)
def the_function_you_use_in_the_end(tree):
a,b = paths(tree)
return a+b
Just one more way:
Every path without repetitions in tree is uniquely described by its start and finish.
So one of ways to enumerate paths is to enumerate every possible pair of vertices. For each pair it's relatively easy to find path (find common ancestor and go through it).
Find a path to each node of the tree using depth first search, then call enumerate-paths(Path p), where p is the path from the root to the node. Let's assume that a path p is an array of nodes p[0] [1] .. p[n] where p[0] is the root and p[n] is the current node.
enumerate-paths(p) {
for i = 0 .. n
output p[n - i] .. p[n] as a path.
}
Each of these paths is different, and each of them is different from the results returned from any other node of the tree since no other paths end in p[n]. Clearly it is complete, since any path is from a node to some node between it and the root. It is also optimal, since it finds and outputs each path exactly once.
The order will be slightly different from yours, but you could always create an array of list of paths where A[x] is a List of the paths of length x. Then you could output the paths in order of their length, although this would take O(n) storage.
Related
Problem link: 2096. Step-By-Step Directions From a Binary Tree Node to Another.
I was solving a problem that asks us to output the shortest step-by-step path between a start node and a destination node in a binary tree, where traveling to left and right children are denoted by ‘L’ and ‘R’, and traveling to a parent node is denoted by ‘U’. We are given the reference to the root node, and the number of nodes n can be as large as 100,000.
My solution is as follows:
Run a BFS to find both start and destination nodes, while generating the paths along the way using arrays. Once we find both nodes, return the path arrays.
Now that we have paths to both start and destination nodes, find the lowest common ancestor of the two nodes by throwing out any initial common letters (e.g. if startPath = [‘L’, ‘R’, ‘L’] and destPath = [‘L’, ‘L’, ‘R’], the lowest common ancestor is root.left, and the remaining startPath from this lowest common ancestor is [‘R’, ‘L’].)
Finally, to get the path between startNode and destNode, we can convert all remaining startPath letters to ‘U’s, and then add on the remaining destPath.
The relevant code pertaining to the question is as follows:
def bfs_path2(root, target1, target2):
q = deque([(root, [])])
while q and not (found1 and found2):
node, path = q.popleft()
if node.left:
q.append((node.left, path + ['L']))
if node.right:
q.append((node.right, path + ['R']))
return path1, path2
s_path, t_path = bfs_path2(root, startValue, destValue)
# i, j legal indices of s_path and t_path
return "".join(['U']*len(s_path[i:]) + t_path[j:])
The runtime of this against a large test case was >10s. However, if I change the implementation of the BFS queue elements to strings, it runs in ~3s:
def bfs_path2(root, target1, target2):
q = deque([(root, '')])
while q and not (found1 and found2):
node, path = q.popleft()
if node.left:
q.append((node.left, path + 'L'))
if node.right:
q.append((node.right, path + 'R'))
return path1, path2
s_path, t_path = bfs_path2(root, startValue, destValue)
# i, j legal indices of s_path and t_path
return 'U'*len(s_path[i:]) + t_path[j:]
There are many links on StackOverflow showing that string concatenation is far slower than the join() method in Python, so I’m confused as to why the first code runtime is much slower than the second code runtime. Am I missing something here?
I have a question about the following code, but i guess applies to different functions.
This function computes the maximum path and its length for a DAG, given the Graph, source node, and end node.
To keep track of already computed distances across recursions I use "max_distances_and_paths" variable, and update it on each recursion.
Is it better to keep it as a function parameter (inputed and outputed across recursions) or
use a global variable and initialize it outside the function?
How can avoid to have this parameter returned when calling the function externally (i.e it
has to be outputed across recursions but I dont care about its value, externally)?
a better way than doing: LongestPath(G, source, end)[0:2] ??
Thanks
# for a DAG computes maximum distance and maximum path nodes sequence (ordered in reverse).
# Recursively computes the paths and distances to edges which are adjacent to the end node
# and selects the maximum one
# It will return a single maximum path (and its distance) even if there are different paths
# with same max distance
# Input {Node 1: adj nodes directed to Node 1 ... Node N: adj nodes directed to Node N}
# Example: {'g': ['r'], 'k': ['g', 'r']})
def LongestPath(G, source, end, max_distances_and_paths=None):
if max_distances_and_paths is None:
max_distances_and_paths = {}
max_path = [end]
distances_list = []
paths_list = []
# return max_distance and max_path from source to current "end" if already computed (i.e.
# present in the dictionary tracking maximum distances and correspondent distances)
if end in max_distances_and_paths:
return max_distances_and_paths[end][0], max_distances_and_paths[end][1], max_distances_and_paths
# base case, when end node equals source node
if source == end:
max_distance = 0
return max_distance, max_path, max_distances_and_paths
# if there are no adjacent nodes directed to end node (and is not the source node, previous case)
# means path is disconnected
if len(G[end]) == 0:
return 0, [0], {"": []}
# for each adjacent node pointing to end node compute recursively its max distance to source node
# and add one to get the distance to end node. Recursively add nodes included in the path
for t in G[end]:
sub_distance, sub_path, max_distances_and_paths = LongestPath(G, source, t, max_distances_and_paths)
paths_list += [[end] + sub_path]
distances_list += [1 + sub_distance]
# compute max distance
max_distance = max(distances_list)
# access the same index where max_distance is, in the list of paths, to retrieve the path
# correspondent to the max distance
index = [i for i, x in enumerate(distances_list) if x == max_distance][0]
max_path = paths_list[index]
# update the dictionary tracking maximum distances and correspondent paths from source
# node to current end node.
max_distances_and_paths.update({end: [max_distance, max_path]})
# return computed max distance, correspondent path, and tracker
return max_distance, max_path, max_distances_and_paths
Global variables are generally avoided due to several reasons (see Why are global variables evil?). I would recommend sending the parameter in this case. However, you could define a larger function housing your recursive function. Here's a quick example I wrote for a factorial code:
def a(m):
def b(m):
if m<1:return 1
return m*b(m-1)
n = b(m)
m=m+2
return n,m
print(a(6))
This will give: (720, 8). This proves that even if you used the same variable name in your recursive function, the one you passed in to the larger function will not change. In your case, you want to just return n as per my example. I only returned an edited m value to show that even though both a and b functions have m as their input, Python separates them.
In general I would say avoid the usage of global variables. This is because is makes you code harder to read and often more difficult to debug if you codebase gets a bit more complex. So it is good practice.
I would use a helper function to initialise your recursion.
def longest_path_helper(G, source, end, max_distances_and_paths=None):
max_distance, max_path, max_distances_and_paths = LongestPath(
G, source, end, max_distances_and_paths
)
return max_distance, max_path, max_distances_and_paths
On a side note, in Python it is convention to write functions without capital letters and separated with underscores and Capicalized without underscores are used for classes. So it would be more Pythonic to use def longest_path():
Suppose I have a directed graph G in Network X such that:
G has multiple trees in it
Every node N in G has exactly 1 or 0
parent's.
For a particular node N1, I want to find the root node of the tree it resides in (its ancestor that has a degree of 0). Is there an easy way to do this in network x?
I looked at:
Getting the root (head) of a DiGraph in networkx (Python)
But there are multiple root nodes in my graph. Just only one root node that happens to be in the same tree as N1.
edit Nov 2017 note that this was written before networkx 2.0 was released. There is a migration guide for updating 1.x code into 2.0 code (and in particular making it compatible for both)
Here's a simple recursive algorithm. It assumes there is at most a single parent. If something doesn't have a parent, it's the root. Otherwise, it returns the root of its parent.
def find_root(G,node):
if G.predecessors(node): #True if there is a predecessor, False otherwise
root = find_root(G,G.predecessors(node)[0])
else:
root = node
return root
If the graph is a directed acyclic graph, this will still find a root, though it might not be the only root, or even the only root ancestor of a given node.
I took the liberty of updating #Joel's script. His original post did not work for me.
def find_root(G,child):
parent = list(G.predecessors(child))
if len(parent) == 0:
print(f"found root: {child}")
return child
else:
return find_root(G, parent[0])
Here's a test:
G = nx.DiGraph(data = [('glu', 'skin'), ('glu', 'bmi'), ('glu', 'bp'), ('glu', 'age'), ('npreg', 'glu')])
test = find_root(G, "age")
age
glu
npreg
found root: npreg
Networkx - 2.5.1
The root/leaf node can be found using the edges.
for node_id in graph.nodes:
if len(graph.in_edges(node_id)) == 0:
print("root node")
if len(graph.out_edges(node_id)) == 0:
print("leaf node")
In case of multiple roots, we can do something like this:
def find_multiple_roots(G, nodes):
list_roots = []
for node in nodes:
predecessors = list(G.predecessors(node))
if len(predecessors)>0:
for predecessor in predecessors:
list_roots.extend(find_root(G, [predecessor]))
else:
list_roots.append(node)
return list_roots
Usage:
# node need to be passed as a list
find_multiple_roots(G, [node])
Warning: This recursive function can explode pretty quick (the number of recursive functions called could be exponentially proportional to the number of nodes exist between the current node and the root), so use it with care.
I am trying to simulate a random traversal through a directed networkx graph. The pseudo code is as follows
Create graph G with nodes holding the value true or false.
// true -> visited, false -> not visited
pick random node N from G
save N.successors as templist
while true
nooptions = false
pick random node N from templist
while N from templist has been visited
remove N from templist
pick random node N from templist
if templist is empty
nooptions = true
break
if nooptions = true
break
save N.successors as templist
Is there are a more efficient way of marking a path as traveled other than
creating a temporary list and removing the elements if they are marked as visited?
EDIT
The goal of the algorithm is to pick a node at random in the graph. Pick a random successor/child of that node. If it is unvisited, go there and mark it as visited. Repeat until there are either no successors/children or there are no unvisited successors/children
Depending on the size of your graph, you could use the built-in all_pairs_shortest_path function. Your function would then be basically:
G = nx.DiGraph()
<add some stuff to G>
# Get a random path from the graph
all_paths = nx.all_pairs_shortest_path(G)
# Choose a random source
source = random.choice(all_paths.keys())
# Choose a random target that source can access
target = random.choice(all_paths[source].keys())
# Random path is at
random_path = all_paths[source][target]
There doesn't appear to be a way to just generate the random paths starting at source that I saw, but the python code is accessible, and adding that feature would be straightforward I think.
Two other possibilities, which might be faster but a little more complicated/manual, would be to use bfs_successors, which does a breadth-first search, and should only include any target node once in the list. Not 100% sure on the format, so it might not be convenient.
You could also generate bfs_tree, which generates a subgraph with no cycles to all nodes that it can reach. That might actually be simpler, and probably shorter?
# Get random source from G.node
source = random.choice(G.node)
min_tree = nx.bfs_tree(G, source)
# Accessible nodes are any node in this list, except I need to remove source.
all_accessible = min_tree.node.keys()
all_accessible.remove(source)
target = random.choice(all_accessible.node.keys())
random_path = nx.shortest_path(G, source, target)
I have a (likely) simple graph traversal question. I'm a graph newbie using networkx as my graph data structures. My graphs always look like this:
0
1 8
2 3 9 10
4 5 6 7 11 12 13 14
I need to return the path from the root node to a given node (eg., path(0, 11) should return [0, 8, 9, 11]).
I have a solution that works by passing along a list which grows and shrinks to keep track of what the path looks like as you traverse the tree, ultimately getting returned when the target node is found:
def VisitNode(self, node, target, path):
path.append(node)
# Base case. If we found the target, then notify the stack that we're done.
if node == target:
return True
else:
# If we're at a leaf and it isn't the target, then pop the leaf off
# our path (backtrack) and notify the stack that we're still looking
if len(self.neighbors(node)) == 0:
path.pop()
return False
else:
# Sniff down the next available neighboring node
for i in self.neighbors_iter(node):
# If this next node is the target, then return the path
# we've constructed so far
if self.VisitNode(i, target, path):
return path
# If we've gotten this far without finding the target,
# then this whole branch is a dud. Backtrack
path.pop()
I feel in my bones that there is no need for passing around this "path" list... I should be able to keep track of that information using the call stack, but I can't figure out how... Could someone enlighten me on how you would solve this problem recursively using the stack to keep track of the path?
You could avoid passing around the path by returning None on failure, and a partial path on success. In this way, you do not keep some sort of 'breadcrumb trail' from the root to the current node, but you only construct a path from the target back to the root if you find it. Untested code:
def VisitNode(self, node, target):
# Base case. If we found the target, return target in a list
if node == target:
return [node]
# If we're at a leaf and it isn't the target, return None
if len(self.neighbors(node)) == 0:
return None
# recursively iterate over children
for i in self.neighbors_iter(node):
tail = self.VisitNode(i, target)
if tail: # is not None
return [node] + tail # prepend node to path back from target
return None #none of the children contains target
I don't know the graph library you are using, but I assume that even leafs contain a neighbours_iter method, which obviously shouldn't yield any children for a leaf. In this case, you can leave out the explicit check for a leaf, making it a bit shorter:
def VisitNode(self, node, target):
# Base case. If we found the target, return target in a list
if node == target:
return [node]
# recursively iterate over children
for i in self.neighbors_iter(node):
tail = self.VisitNode(i, target)
if tail: # is not None
return [node] + tail # prepend node to path back from target
return None # leaf node or none of the child contains target
I also removed some of the else statements, since inside the true-part of the if you are returning from the function. This is common refactering pattern (which some old-school people don't like). This removes some unnecessary indentation.
You can avoid your path argument at all having path initialized in the method's body. If method returns before finding a full path, it may return an empty list.
But your question is also about using a stack instead of a list in Depth-First-search implementation, right? You get a flavor here: http://en.literateprograms.org/Depth-first_search_%28Python%29.
In a nutshell, you
def depthFirstSearch(start, isGoal, result):
###ensure we're not stuck in a cycle
result.append(start)
###check if we've found the goal
###expand each child node in order, returning if we find the goal
# No path was found
result.pop()
return False
with
###<<expand each child node in order, returning if we find the goal>>=
for v in start.successors:
if depthFirstSearch(v, isGoal, result):
return True
and
###<<check if we've found the goal>>=
if isGoal(start):
return True
Use networkx directly:
all_simple_paths(G, source, target, cutoff=None)