Python networkx - find heaviest path in DAG between 2 nodes - python

I'm surprised that Python's networkx does not seem to support finding the heaviest path between two specific nodes.
I have a very big graph (a DAG) with ~70K nodes, where each edge has a weight attribute (weight >= 0).
I want to create a function that takes a source and a target and returns the heaviest path between these two nodes.
I have tried using all_simple_paths together with a get_weight function that takes a path and returns its total weight, as suggested in some solutions.
However, all_simple_paths never finishes on this graph. The graph definitely has no cycles (I ran the networkx find_cycle function); this approach only worked for very small graphs.
All the solutions I found here and elsewhere return the heaviest path in the whole graph (start to end). networkx has a function for that on DAGs (dag_longest_path), but it's not what I need.
Is there any networkx function or Python graph library I can use to get the heaviest path between 2 nodes?
Or any pointers on how to achieve this?
Thanks in advance!

It's just a matter of summing the weights (or any other numeric edge attribute) of the edges in each path while iterating over all_simple_paths, and taking the maximum at the end:
import networkx as nx
import random

G = nx.complete_graph(10)
# add a random weight between 0 and 1 to each edge
for src, target, _ in G.edges.data():
    G[src][target]['weight'] = round(random.random(), 2)

def aggregate_weights(G, path):
    """
    Calculate the sum of the edge weights along a path.
    """
    # walk over consecutive node pairs (u, v) of the path
    return sum(G[u][v]['weight'] for u, v in zip(path, path[1:]))

def find_heaviest_path(G, source, target):
    """
    Return the total weight of the heaviest path between source and target nodes.
    """
    return max(aggregate_weights(G, path) for path in nx.all_simple_paths(G, source, target))
Note: this enumerates every simple path between the two nodes, so the running time can grow exponentially with the size of the graph.
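For a DAG of the size mentioned in the question, enumerating paths is unlikely to finish, but acyclicity allows a dynamic-programming pass over a topological order instead. What follows is a minimal sketch of that idea, not a networkx built-in: heaviest_path is a hypothetical helper name, and it assumes the graph is acyclic and that every edge has a numeric weight attribute. It is written against a plain mapping interface (node -> successor -> attribute dict), which a networkx DiGraph also exposes through G[node], so it should accept either.

```python
def heaviest_path(adj, source, target, weight="weight"):
    """Return (total_weight, path) for the heaviest source->target path
    in a DAG, or None if target cannot be reached from source.

    adj maps node -> {successor: edge attribute dict}; every node,
    including sinks, must be a key (assumption of this sketch).
    """
    # Iterative DFS from source. In a DAG the resulting post-order lists
    # every node after all of its successors.
    order = []
    visited = {source}
    stack = [(source, iter(adj[source]))]
    while stack:
        node, successors = stack[-1]
        for nxt in successors:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, iter(adj[nxt])))
                break
        else:
            order.append(node)
            stack.pop()

    # best[n] = (weight of the heaviest n->target path, next node on it)
    best = {target: (0, None)}
    for node in order:
        if node == target:
            continue
        for nxt, data in adj[node].items():
            if nxt in best:
                candidate = data[weight] + best[nxt][0]
                if node not in best or candidate > best[node][0]:
                    best[node] = (candidate, nxt)

    if source not in best:
        return None
    path = [source]
    while path[-1] != target:
        path.append(best[path[-1]][1])
    return best[source][0], path


# Toy DAG with made-up weights
example = {
    "a": {"b": {"weight": 1}, "c": {"weight": 5}},
    "b": {"d": {"weight": 10}},
    "c": {"d": {"weight": 1}},
    "d": {"e": {"weight": 2}},
    "e": {},
}
print(heaviest_path(example, "a", "d"))  # (11, ['a', 'b', 'd'])
```

Each reachable node and edge is handled a constant number of times, and nothing is recursive, so deep graphs should not hit the recursion limit. With a networkx DiGraph the call would be heaviest_path(G, source, target).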

Related

Python - Networkx: Graph of Neighbor Nodes with certain weight

The following problem is using Python 3.9 and Networkx 2.5
I need to output a subgraph of G that only contains edges between nodes in a list and directly neighboring nodes with edge weights less than 100. Currently I am using the following code, but am only able to pull the edge weight. I need to get both the node name and the edge weight.
list_neighbors=G.neighbors('Rochester, NY')
for i in list_neighbors:
    if G.edges[('Rochester, NY',i)]['weight']<100:
        print (G.edges[('Rochester, NY',i)])
OUTPUT:
{'weight': 88}
How do I get the output to include the node names as well (the input node and its neighbor that meets the weight criterion)?
I would like the input of the function to be ('Rochester, NY', 'Seattle, WA'), and the output to be the neighbor cities of each within 100 miles.
Follow-up questions are discouraged in favour of new, separate questions, but since you are new here:
import networkx as nx

# version 1 : outputs an iterator; more elegant, potentially more performant (unlikely in this case though)
def get_neighbors_below_threshold(graph, node, threshold=100, attribute='weight'):
    for neighbor in graph.neighbors(node):
        if graph.edges[(node, neighbor)][attribute] < threshold:
            yield neighbor

# version 2 : outputs a list; maybe easier to understand
def get_neighbors_below_threshold(graph, node, threshold=100, attribute='weight'):
    output = []
    for neighbor in graph.neighbors(node):
        if graph.edges[(node, neighbor)][attribute] < threshold:
            output.append(neighbor)
    return output

n1 = get_neighbors_below_threshold(G, 'Rochester, NY')
n2 = get_neighbors_below_threshold(G, 'Seattle, WA')
combined_neighbors = set(n1) | set(n2)
common_neighbors = set(n1) & set(n2)
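Since the question asks for the node names together with the edge weight, here is a small sketch that yields (node, neighbor, weight) triples for several input nodes at once. edges_below_threshold is a hypothetical name, the neighbor towns and distances are invented for illustration, and the function only relies on the mapping interface (node -> neighbor -> attribute dict) that a networkx graph also provides via G[node].

```python
def edges_below_threshold(adj, nodes, threshold=100, attribute="weight"):
    """Yield (node, neighbor, value) for every edge leaving one of `nodes`
    whose attribute value is below the threshold."""
    for node in nodes:
        for neighbor, data in adj[node].items():
            if data[attribute] < threshold:
                yield node, neighbor, data[attribute]


# Invented toy data; with networkx you would pass the graph G itself.
roads = {
    "Rochester, NY": {"Town A": {"weight": 88}, "Town B": {"weight": 120}},
    "Seattle, WA": {"Town C": {"weight": 35}},
}
for edge in edges_below_threshold(roads, ["Rochester, NY", "Seattle, WA"]):
    print(edge)
# ('Rochester, NY', 'Town A', 88)
# ('Seattle, WA', 'Town C', 35)
```

The triples can be fed straight into add_weighted_edges_from on a new graph if a subgraph is what you need in the end.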

What is best: Global Variable or Parameter in this python function?

I have a question about the following code, but I guess it applies to other functions as well.
This function computes the maximum path and its length for a DAG, given the graph, a source node, and an end node.
To keep track of already computed distances across recursive calls I use the "max_distances_and_paths" variable and update it on each call.
Is it better to keep it as a function parameter (passed in and returned on every recursive call) or to use a global variable and initialize it outside the function?
How can I avoid having this parameter returned when calling the function externally (i.e. it has to be passed along between recursive calls, but I don't care about its value externally)?
Is there a better way than doing: LongestPath(G, source, end)[0:2] ?
Thanks
# for a DAG computes maximum distance and maximum path nodes sequence (ordered in reverse).
# Recursively computes the paths and distances to edges which are adjacent to the end node
# and selects the maximum one
# It will return a single maximum path (and its distance) even if there are different paths
# with same max distance
# Input {Node 1: adj nodes directed to Node 1 ... Node N: adj nodes directed to Node N}
# Example: {'g': ['r'], 'k': ['g', 'r']}
def LongestPath(G, source, end, max_distances_and_paths=None):
    if max_distances_and_paths is None:
        max_distances_and_paths = {}
    max_path = [end]
    distances_list = []
    paths_list = []
    # return max_distance and max_path from source to current "end" if already computed (i.e.
    # present in the dictionary tracking maximum distances and correspondent distances)
    if end in max_distances_and_paths:
        return max_distances_and_paths[end][0], max_distances_and_paths[end][1], max_distances_and_paths
    # base case, when end node equals source node
    if source == end:
        max_distance = 0
        return max_distance, max_path, max_distances_and_paths
    # if there are no adjacent nodes directed to end node (and is not the source node, previous case)
    # means path is disconnected
    if len(G[end]) == 0:
        return 0, [0], {"": []}
    # for each adjacent node pointing to end node compute recursively its max distance to source node
    # and add one to get the distance to end node. Recursively add nodes included in the path
    for t in G[end]:
        sub_distance, sub_path, max_distances_and_paths = LongestPath(G, source, t, max_distances_and_paths)
        paths_list += [[end] + sub_path]
        distances_list += [1 + sub_distance]
    # compute max distance
    max_distance = max(distances_list)
    # access the same index where max_distance is, in the list of paths, to retrieve the path
    # correspondent to the max distance
    index = [i for i, x in enumerate(distances_list) if x == max_distance][0]
    max_path = paths_list[index]
    # update the dictionary tracking maximum distances and correspondent paths from source
    # node to current end node.
    max_distances_and_paths.update({end: [max_distance, max_path]})
    # return computed max distance, correspondent path, and tracker
    return max_distance, max_path, max_distances_and_paths
Global variables are generally avoided for several reasons (see "Why are global variables evil?"). I would recommend passing the parameter in this case. However, you could define a larger function that houses your recursive function. Here's a quick example I wrote using a factorial:
def a(m):
    def b(m):
        if m < 1:
            return 1
        return m * b(m - 1)
    n = b(m)
    m = m + 2
    return n, m
print(a(6))
This will give: (720, 8). This shows that even if you use the same variable name in your recursive function, the one you passed in to the outer function will not change. In your case, you would just return n as in my example. I only returned an edited m value to show that even though both the a and b functions take m as their input, Python keeps them separate.
In general I would say avoid using global variables, because they make your code harder to read and often more difficult to debug once your codebase gets a bit more complex. Avoiding them is good practice.
I would use a helper function to initialise your recursion.
def longest_path_helper(G, source, end):
    # start the recursion with a fresh cache and drop the cache from the result
    max_distance, max_path, _ = LongestPath(G, source, end)
    return max_distance, max_path
On a side note, the Python convention is to write function names in lowercase with words separated by underscores; capitalized names without underscores are used for classes. So it would be more Pythonic to use def longest_path():
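To make the nested-function idea concrete for the longest-path case, here is a rough sketch in which the cache lives in the enclosing function, so callers never see it. It assumes the same input format as the question (each node mapped to the nodes pointing at it) and lists the path end-first, as the question's code does. Returning None for an unreachable end node is a choice made for this sketch and differs from the original code.

```python
def longest_path(G, source, end):
    """Return (distance, path) for the longest source->end path in a DAG,
    or None if end cannot be reached from source.

    G maps each node to the list of nodes with an edge pointing to it,
    e.g. {'g': ['r'], 'k': ['g', 'r']}.
    """
    memo = {}  # private cache: node -> (distance, path) or None

    def visit(node):
        if node == source:
            return 0, [node]
        if node not in memo:
            best = None
            for pred in G.get(node, []):
                sub = visit(pred)
                if sub is not None and (best is None or sub[0] + 1 > best[0]):
                    best = (sub[0] + 1, [node] + sub[1])
            memo[node] = best
        return memo[node]

    return visit(end)


print(longest_path({'g': ['r'], 'k': ['g', 'r']}, 'r', 'k'))  # (2, ['k', 'g', 'r'])
```

The cache is created fresh on every outer call, which also avoids stale entries when the function is called again with a different source.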

Use NetworkX to find cycles in MultiDiGraph imported from shapefile

I am writing a QGIS plugin which will use the NetworkX library to manipulate and analyze stream networks. My data comes from shapefiles representing stream networks.
(arrows represent direction of stream flow)
Within this stream network are braids which are important features I need to retain. I am categorizing braid features into "simple" (two edges that share two nodes) and "complex" (more than two edges, with more than two nodes).
Simple braid example
Complex braid example
Normally, I would just use the NetworkX built-in function read_shp to import the shapefile as a DiGraph. As is evident in the examples, the "simple" braid will be considered a parallel edge in a NetworkX DiGraph, because those two edges (which share the same to and from nodes) would be collapsed into a single edge. In order to preserve these multiple edges, we wrote a function that imports a shapefile as a MultiDiGraph. Simple braids (i.e. parallel edges) are preserved by using unique keys in the edge objects (this is embedded in a class):
def _shp_to_nx(self, in_network_lyr, simplify=True, geom_attrs=True):
    """
    This is a re-purposed version of read_shp from the NetworkX library.
    :param shapelayer:
    :param simplify:
    :param geom_attrs:
    :return:
    """
    self.G = nx.MultiDiGraph()
    for f in in_network_lyr.getFeatures():
        flddata = f.attributes()
        fields = [str(fi.name()) for fi in f.fields()]
        geo = f.geometry()
        # We don't care about M or Z
        geo.geometry().dropMValue()
        geo.geometry().dropZValue()
        attributes = dict(zip(fields, flddata))
        # Add a new _FID_ field
        fid = int(f.id())
        attributes[self.id_field] = fid
        attributes['_calc_len_'] = geo.length()
        # Note: Using layer level geometry type
        if geo.wkbType() in (QgsWKBTypes.LineString, QgsWKBTypes.MultiLineString):
            for edge in self.edges_from_line(geo, attributes, simplify, geom_attrs):
                e1, e2, attr = edge
                self.features[fid] = attr
                self.G.add_edge(tuple(e1), tuple(e2), key=attr[self.id_field], attr_dict=attr)
            self.cols = self.features[self.features.keys()[0]].keys()
        else:
            raise ImportError("GeometryType {} not supported. For now we only support LineString types.".
                              format(QgsWKBTypes.displayString(int(geo.wkbType()))))
I have already written a function to find the "simple" braid features (I just iterate through the MultiDiGraphs nodes, and find edges with more than one key). But I also need to find the "complex" braids. Normally, in a Graph, I could use the cycle_basis to find all of the "complex" braids (i.e. cycles), however, the cycle_basis method only works on un-directed Graphs, not directional graphs. But I'd rather not convert my MultiDiGraph into an un-directed Graph, as there can be unexpected results associated with that conversion (not to mention losing my edge key values).
How could I go about finding cycles which are made up of more than one edge, in a relatively time-efficient way? The stream networks I'm really working with can be quite large and complex, representing large watersheds.
Thanks!
So I came up with a solution, for finding both "simple" and "complex" braids.
def get_complex_braids(self, G, attrb_field, attrb_name):
    """
    Create graph with the braid edges attributed
    :param attrb_field: name of the attribute field
    :return braid_G: graph with new attribute
    """
    if nx.is_directed(G):
        UG = nx.Graph(G)
        braid_G = nx.MultiDiGraph()
        for edge in G.edges(data=True, keys=True):
            is_edge = self.get_edge_in_cycle(edge, UG)
            if is_edge == True:
                braid_G.add_edge(*edge)
        self.update_attribute(braid_G, attrb_field, attrb_name)
        return braid_G
    else:
        print "ERROR: Graph is not directed."
        braid_complex_G = nx.null_graph()
        return braid_complex_G

def get_simple_braids(self, G, attrb_field, attrb_name):
    """
    Create graph with the simple braid edges attributed
    :param attrb_field: name of the attribute field
    :return braid_G: graph with new attribute
    """
    braid_simple_G = nx.MultiDiGraph()
    parallel_edges = []
    for e in G.edges_iter():
        keys = G.get_edge_data(*e).keys()
        if keys not in parallel_edges:
            if len(keys) == 2:
                for k in keys:
                    data = G.get_edge_data(*e, key=k)
                    braid_simple_G.add_edge(e[0], e[1], key=k, attr_dict=data)
            parallel_edges.append(keys)
    self.update_attribute(braid_simple_G, attrb_field, attrb_name)
    return braid_simple_G
This is not a definite answer, but longer than maximum allowed characters for a comment, so I post it here anyway.
To find simple braids, you can use built-in methods G.selfloop_edges and G.nodes_with_selfloops.
I haven't heard about cycle_basis for directed graphs, can you provide a reference (e.g. scientific work)? NetworkX has simple_cycles(G) which works on directed Graphs, but it is also not useful in this case, because water does not visit any node twice (or?).
I am afraid that the only way is to describe the topology precisely and then search the graph for matching occurrences. Let me clarify my point with an example. The following function should be able to identify instances of complex braids similar to your example:
import networkx as nx

def Complex_braid(G):
    res = []
    # find all nodes with out_degree greater than one:
    candidates = [n for n in G.nodes() if len(list(G.successors(n))) > 1]
    for n in candidates:
        # check every direct successor of the candidate:
        for s in G.successors(n):
            # all simple paths from n to s, computed once
            paths = list(nx.all_simple_paths(G, n, s))
            if len(paths) > 1:
                # more than one route from n to s: keep the longest one
                all_nodes = sorted(paths, key=len)[-1]
                res.append(all_nodes)
    return res

G = nx.MultiDiGraph()
G.add_edges_from([(0,1), (1,2), (2,3), (4,5), (1,5), (5,2)])
Complex_braid(G)
# out: [[1, 5, 2]]
The real problem, however, is that complex braids can come in many different topological configurations, so it doesn't really make sense to define all of them, unless you can describe them with one (or a few) patterns or find a condition that signifies the presence of a complex braid.

Python - Networkx search predecessor nodes - Maximum depth exceeded

I'm working on a project using the Networkx library (for graph management) in Python, and I have been having trouble implementing what I need.
I have a collection of directed graphs, holding special objects as nodes and weights associated with the edges. I need to go through each graph from the output nodes to the input nodes, and for each node I have to take the weights from its predecessors and an operation calculated by each predecessor node to build the operation for my output node. The problem is that the operations of the predecessors may depend on their own predecessors, and so on, so I'm wondering how I can solve this.
So far I have tried the following. Let's say I have a list of my output nodes and I can go through the predecessors using the methods of the Networkx library:
# graph is the object containing my directed graph
for node in outputNodes:
    activate_predecessors(node, graph)
# ...and a function to activate the predecessors ..
def activate_predecessors(node, graph):
    ws = []   # a list for the weights
    res = []  # a list for the responses from the predecessors
    for pred in graph.predecessors(node):
        # get the weights
        ws.append(graph[pred][node]['weight'])
        # the response of a predecessor depends on its own predecessors,
        # so I call this function on the current predecessor recursively
        activate_predecessors(pred, graph)
        res.append(pred.getResp())
    # once I have the two lists (weights and responses) the node should calculate a reduce operation
    # after turning those lists into numpy arrays
    node.response = np.sum(np.array(ws) * np.array(res))
This code seems to work... I tried it on some random graphs many times, but on many occasions it fails with "maximum recursion depth exceeded", so I need to rewrite it in a more stable (and possibly iterative) way in order to avoid hitting the recursion limit. But I'm running out of ideas for how to handle this.
The library has some search algorithms (depth-first search), but I don't know how they could help me solve this.
I also tried putting flags on the nodes to know whether they had already been activated, but I keep getting the same error.
Edit: I forgot, the input nodes have a defined response value, so they don't need to do calculations.
Your code may recurse infinitely if there is a cycle between two nodes. For example:
import networkx as nx
G = nx.DiGraph()
G.add_edges_from([(1,2), (2,1)])
def activate_nodes(g, node):
    for pred in g.predecessors(node):
        activate_nodes(g, pred)
activate_nodes(G, 1)
RuntimeError: maximum recursion depth exceeded
If any of your graphs may contain cycles, you should either mark each node as visited or change the edges so that the graph has no cycles.
Assuming you do not have cycles in your graphs, here is an example of how to implement the algorithm iteratively:
import networkx as nx
G = nx.DiGraph()
G.add_nodes_from([1,2,3])
G.add_edges_from([(2, 1), (3, 1), (2, 3)])
G.nodes[1]['weight'] = 1
G.nodes[2]['weight'] = 2
G.nodes[3]['weight'] = 3
def activate_node(g, start_node):
    stack = [start_node]
    ws = []
    while stack:
        node = stack.pop()
        # materialize the predecessors so they can be iterated more than once
        preds = list(g.predecessors(node))
        stack += preds
        print('%s -> %s' % (node, preds))
        for pred in preds:
            ws.append(g.nodes[pred]['weight'])
    print('weights: %r' % ws)
    return sum(ws)
print('total sum %d' % activate_node(G, 1))
This code prints:
1 -> [2, 3]
3 -> [2]
2 -> []
2 -> []
weights: [2, 3, 2]
total sum 7
Note
You can reverse the direction of the directed graph using DiGraph.reverse().
If you need to use DFS or something similar, you can reverse the graph so that the predecessors of a node become its directly connected neighbours. With that, algorithms like DFS might be easier to use.
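The example above only sums node weights, so here is a rough sketch that is closer to what the question describes: each node's response is the weighted sum of its predecessors' responses, computed with an explicit stack instead of recursion and cached so that every node is evaluated once. The names compute_responses, pred and input_values are hypothetical, the graph is assumed to be acyclic, and predecessors are given as a plain mapping (node -> {predecessor: edge weight}) rather than a networkx object.

```python
def compute_responses(pred, input_values, outputs):
    """Return a dict node -> response for every node needed by `outputs`.

    pred:         node -> {predecessor: edge weight}
    input_values: node -> fixed response of the input nodes
    Assumes the graph has no cycles (a cycle would never terminate).
    """
    response = dict(input_values)  # input nodes already have a response
    for out in outputs:
        stack = [out]
        while stack:
            node = stack[-1]
            if node in response:
                stack.pop()
                continue
            node_preds = pred.get(node, {})
            pending = [p for p in node_preds if p not in response]
            if pending:
                # resolve the predecessors first, then come back to this node
                stack.extend(pending)
            else:
                response[node] = sum(w * response[p] for p, w in node_preds.items())
                stack.pop()
    return response


pred = {
    "out": {"h1": 0.5, "h2": 2.0},
    "h1": {"x": 1.0, "y": 1.0},
    "h2": {"x": 3.0},
}
print(compute_responses(pred, {"x": 1.0, "y": 2.0}, ["out"])["out"])  # 7.5
```

Because the response dict doubles as the "already activated" flag, shared predecessors are not recomputed, and the depth of the graph is limited only by memory, not by the recursion limit.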

More efficient way of running a random traversal of a directed graph with Networkx

I am trying to simulate a random traversal through a directed networkx graph. The pseudo code is as follows
Create graph G with nodes holding the value true or false.
// true -> visited, false -> not visited
pick random node N from G
save N.successors as templist
while true
    nooptions = false
    pick random node N from templist
    while N from templist has been visited
        remove N from templist
        pick random node N from templist
        if templist is empty
            nooptions = true
            break
    if nooptions = true
        break
    save N.successors as templist
Is there a more efficient way of marking a path as traveled other than creating a temporary list and removing the elements if they are marked as visited?
EDIT
The goal of the algorithm is to pick a node at random in the graph. Pick a random successor/child of that node. If it is unvisited, go there and mark it as visited. Repeat until there are either no successors/children or there are no unvisited successors/children
Depending on the size of your graph, you could use the built-in all_pairs_shortest_path function. Your function would then basically be:
import random
import networkx as nx

G = nx.DiGraph()
# ... add some nodes and edges to G ...
# Get a random path from the graph
all_paths = dict(nx.all_pairs_shortest_path(G))
# Choose a random source
source = random.choice(list(all_paths))
# Choose a random target that source can access
target = random.choice(list(all_paths[source]))
# Random path is at
random_path = all_paths[source][target]
I didn't see a way to generate only the random paths starting at source, but the Python code is accessible, and I think adding that feature would be straightforward.
Two other possibilities, which might be faster but a little more complicated/manual: the first is bfs_successors, which does a breadth-first search and should include each target node only once in its output. I'm not 100% sure of the format, so it might not be convenient.
You could also generate bfs_tree, which produces a subgraph with no cycles covering all the nodes it can reach. That might actually be simpler, and probably shorter:
# Get random source from the nodes of G
source = random.choice(list(G.nodes))
min_tree = nx.bfs_tree(G, source)
# Accessible nodes are any node in this list, except I need to remove source.
all_accessible = list(min_tree.nodes)
all_accessible.remove(source)
# note: all_accessible is empty if source has no successors
target = random.choice(all_accessible)
random_path = nx.shortest_path(G, source, target)
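For the walk described in the question itself (move to a random unvisited successor until none is left), a set of visited nodes avoids the pick-and-remove loop on a temporary list. This is only a sketch under some assumptions: random_walk is a hypothetical name, the start node counts as visited, and succ is any mapping from node to its successors, which covers a plain dict as well as a networkx DiGraph (where G[node] iterates over the successors).

```python
import random


def random_walk(succ, start=None, rng=random):
    """Walk from `start` (or a random node) to random unvisited successors
    until the current node has no unvisited successor. Returns the path."""
    current = rng.choice(list(succ)) if start is None else start
    visited = {current}
    path = [current]
    while True:
        # set membership is O(1), so filtering costs one pass over the successors
        options = [n for n in succ[current] if n not in visited]
        if not options:
            return path
        current = rng.choice(options)
        visited.add(current)
        path.append(current)


# Toy graph: every node must appear as a key, even if it has no successors.
toy = {1: [2, 3], 2: [3, 4], 3: [1], 4: []}
print(random_walk(toy, start=1))  # e.g. [1, 2, 4] or [1, 3], depending on the random choices
```

Filtering the successors once per step means each choice is uniform over the unvisited successors, and no retry loop is needed.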
