Standard Deviation of shortest path lengths in networkx - python

networkx.average_shortest_path_length(G) gives the average of shortest paths between all pairs of nodes in a graph G. I want the standard deviation of all these shortest path lengths. Is there an inbuilt method in the networkx package?
I am aware of nx.all_pairs_shortest_path_length(G), which gives a dictionary of all the shortest path lengths. I was hoping that networkx has some inbuilt method instead, since it already has a method to calculate the average.

The current version of the software (2.4rc1 at the time of writing) does not have such a method.
You can check the list of methods available within this context here: https://networkx.github.io/documentation/latest/reference/algorithms/shortest_paths.html#module-networkx.algorithms.shortest_paths.unweighted
Shortest path lengths can be computed with many different algorithms, each with its own drawbacks, so a built-in method like this would not really fit what NetworkX aims to do. Depending on how you want to calculate the shortest paths, you should implement your own function, which can then work around those limitations for the specific graph you are working with.
You can easily calculate it from the dictionary NetworkX is already providing.
import numpy as np
import networkx as nx
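# all_pairs_shortest_path_length yields (source, {target: length}) pairs, so flatten the lengths first
# (u != v skips each node's zero-length path to itself, as average_shortest_path_length does)
lengths = [d for u, dists in nx.all_pairs_shortest_path_length(G) for v, d in dists.items() if u != v]
np.std(lengths)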
More about numpy.std here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.std.html
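To see exactly what is being aggregated, here is a minimal sketch that uses only the standard library. It assumes the input has the shape that nx.all_pairs_shortest_path_length(G) yields, that is, (source, {target: length}) pairs. The helper name path_length_stats and the hand-written distances for a 4-node path graph are made up for illustration; they are not part of NetworkX.

```python
from statistics import mean, pstdev


def path_length_stats(all_pairs):
    """Return (mean, population std) of shortest path lengths over ordered pairs u != v.

    all_pairs: iterable of (source, {target: length}) tuples, the shape assumed
    for the output of nx.all_pairs_shortest_path_length(G).
    """
    lengths = [d for u, dists in all_pairs for v, d in dists.items() if u != v]
    return mean(lengths), pstdev(lengths)


# Hand-written distances for the path graph 0-1-2-3 (a stand-in for the NetworkX output).
example = [
    (0, {0: 0, 1: 1, 2: 2, 3: 3}),
    (1, {1: 0, 0: 1, 2: 1, 3: 2}),
    (2, {2: 0, 1: 1, 3: 1, 0: 2}),
    (3, {3: 0, 2: 1, 1: 2, 0: 3}),
]
avg, std = path_length_stats(example)
print(avg, std)
```

pstdev is the population standard deviation, which is also what np.std computes by default (ddof=0). With a real graph you would pass nx.all_pairs_shortest_path_length(G) instead of the hand-written list.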

Related

Is there a more efficient way to calculate the shortest path problem than networkx in python?

I am calculating the shortest path from one source to one goal on a weighted graph with networkx and single_source_dijkstra.
However, I run into memory problems.
Is there a more efficient way to calculate this? An alternative to Networkx? See my code:
cost, shortestpath = nx.single_source_dijkstra(graph, startpointcoords, secondptcoords,cutoff=10000000)
The bidirectional Dijkstra algorithm (nx.bidirectional_dijkstra in NetworkX) should produce a significant improvement.
A good analogy would be in 3D: place one balloon at point x and expand it till it reaches point y. The amount of air you put in is proportional to the cube of the distance between them. Now put a balloon at each point and inflate both until they touch. The combined volume of air is only 1/4 of the original. In higher dimensions (which is a closer analogy to most networks), there is even more reduction.
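The arithmetic behind the analogy is easy to check. This is a minimal sketch, assuming the explored region is a ball whose volume scales as radius**dim and that the two searches meet halfway; the function name is made up for illustration.

```python
def two_sided_volume_ratio(dim):
    """Combined volume of two balls of radius d/2, relative to one ball of radius d.

    Volume scales as radius**dim, so each half-radius ball has (1/2)**dim of the
    original volume, and there are two of them.
    """
    return 2 * 0.5 ** dim


for dim in (2, 3, 6):
    print(dim, two_sided_volume_ratio(dim))
```

For dim=3 this gives the 1/4 from the balloon picture, and the ratio keeps halving with every extra dimension.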
Apparently the A* algorithm in networkx (nx.astar_path) is far more efficient. Afterwards I calculate the length of the resulting path with the Dijkstra call I posted.
Perhaps try another algorithm? Your graph may have many vertices but few edges, in which case you could use Bellman-Ford, available in NetworkX as bellman_ford_path().
Another solution would be to use another Python package; for example, the answers to this question list several possible libraries.
The last solution would be to implement your own algorithm, perhaps Gabow's algorithm, but it would have to be very efficient, for example by using numpy with numba.

Efficient query in Cypher (as in Neo4J) to find union of shortest paths

I'm looking for an efficient Cypher query to find the subgraph defined on a subset of nodes from a graph, defined as the union of shortest paths between nodes in that subset.
Currently, what I have is:
import networkx as nx
from neo4j import GraphDatabase

gdb = GraphDatabase.driver(address)
with gdb.session() as session:
    query = "MATCH path=shortestPath((n1)-[*1..{0}]->(n2)) ".format(max_path_length)
    query += "where n1.Name in {0} and n2.Name in {0} and n1.Name<>n2.Name ".format(list(nodes))
    query += "return rels(path)"
    paths = list(session.run(query).data())
# Union of the relationships on every returned shortest path
edges = set().union(*(set(path['rels(path)']) for path in paths))
graph = nx.Graph(list(edges))
Now, this code works - but is inefficient, calculating the shortest path between any two nodes separately, while many of the paths intersect (as one can expect in a realistic use-case of a "sub-graph").
I don't know an efficient algorithm to calculate this; is there one - implemented in Cypher?
Related questions:
Extract subgraph from Neo4j graph with Cypher shows how to save a bit of syntax by defining the list of nodes with a "WITH" statement, but it does not claim to improve performance (maybe that is not possible).
Extract subgraph in neo4j deals with the subgraph around a single source; if it applies to my case, I don't quite understand how (I will accept an answer that explains how the answers there help my case).
EDITED: I'd also accept a solution that returns the (or a) minimal subgraph that contains the given nodes; I didn't ask about it because it looked like a harder question, and I didn't have a code sample for it. If I'm wrong and it's actually easier, I'd accept that.
Have you looked at the all pairs shortest path and minimum weight spanning tree algorithms available from the Neo4j graph algorithms library? All pairs shortest path returns the distance for each pair of nodes, not the path, so it might not meet your need.
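This is not Cypher, but if the graph (or the relevant part of it) fits in memory on the client, the repeated work can be avoided by running one breadth-first search per node in the subset instead of one search per pair. A minimal sketch, assuming an undirected, unweighted graph held as a plain adjacency dict; the function name and the toy graph are made up for illustration. Like shortestPath, it keeps just one shortest path per pair.

```python
from collections import deque


def union_of_shortest_paths(adj, terminals):
    """Return the edges (as frozensets) on one shortest path between every pair of terminals.

    adj: {node: iterable of neighbours}, assumed undirected and unweighted.
    """
    terminals = set(terminals)
    edges = set()
    for source in terminals:
        # One BFS from this terminal gives shortest paths to all other terminals at once.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        # Walk back from each reachable terminal to the source, collecting edges.
        for t in terminals:
            if t == source or t not in parent:
                continue
            node = t
            while parent[node] is not None:
                edges.add(frozenset((node, parent[node])))
                node = parent[node]
    return edges


# Toy graph: a path a-b-c-d with an extra leaf e attached to b.
adj = {"a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"], "d": ["c"], "e": ["b"]}
subgraph_edges = union_of_shortest_paths(adj, ["a", "d", "e"])
print(sorted(sorted(e) for e in subgraph_edges))
```

The cost is one traversal per subset node rather than one per pair, which matters once the subset has more than a handful of nodes.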

Graph matching algorithms

I've been searching for graph matching algorithms written in Python but I haven't been able to find much.
I'm currently trying to match two different graphs that derive from two distinct sets of character sequences. I know that there is an underlying connection between the two graphs, more precisely a one-to-one mapping between the nodes. But the graphs don't have the same labels, so I need graph matching algorithms that return node mappings just by comparing topology and/or attributes. By testing, I hope to maximize correct matches.
I've been using Blondel and Heymans from the graphsim package and intend to also use Tacsim from the same package.
I would like to test other options, probably more standard, like maximum subgraph isomorphism or finding subgraphs with very good matchings between the two graphs. Graph edit distance might also help if it manages to give a matching.
The problem is that I can't find anything implemented, even in Networkx that I'm using. Does anyone know of any Python implementations? Would be a plus if those options used Networkx.
I found this implementation of Graph Edit Distance algorithms which uses NetworkX in Python.
https://github.com/Jacobe2169/GMatch4py
"GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects. GMatch4py algorithms were implemented with Cython to enhance performance."

Computing many shortest paths in graph

I have a large (weighted, directed) graph (>100,000 nodes) and I want to compute a large number of random shortest paths in that graph. So I want to randomly select two nodes (let's say k times) and compute the shortest path. One way to do this is using either the networkx or the igraph module and doing a for loop as in
pairs = np.random.choice(np.arange(0, len(graph.nodes)), [k, 2])
for pair in pairs:
    graph.get_shortest_paths(pair[0], pair[1], weights='weight')
This works, but it takes a long time, especially compared to computing all paths for a particular source node. Essentially, in every iteration the process loads the graph again and starts from scratch. So is there a way to benefit from loading the graph structure into memory once, rather than redoing this in each iteration, without computing all shortest paths (which would take too long, given that there would be n*(n-1) paths)?
Phrased differently, can I compute a random subset of all shortest paths in an efficient way?
AFAIK, the operations are independent of each other, so running them in parallel could work (pseudocode):
import dask
import numpy as np

@dask.delayed
def short_path(graph, pair):
    # One independent shortest-path query per pair
    return graph.get_shortest_paths(pair[0], pair[1], weights='weight')

pairs = np.random.choice(np.arange(0, len(graph.nodes)), [k, 2])
results = dask.compute(*[short_path(graph, pair) for pair in pairs])

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will only be two "end" points - which will have an odd-number of attached nodes. These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks like almost what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
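For anyone who wants to see how Hierholzer's algorithm works rather than call a library, here is a minimal sketch for an undirected graph given as an edge list. The function name hierholzer_path is made up for illustration and this is not the NetworkX implementation; it returns None when no Eulerian path exists (more than two odd-degree nodes, or a disconnected graph).

```python
from collections import defaultdict


def hierholzer_path(edges):
    """Return a list of nodes that traverses every undirected edge exactly once, or None."""
    if not edges:
        return []
    # adj[u] holds (neighbour, edge index) so each edge can be marked used from both ends.
    adj = defaultdict(list)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    odd = [n for n in adj if len(adj[n]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None
    # With two odd-degree nodes the path must start at one of them.
    start = odd[0] if odd else edges[0][0]
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        u = stack[-1]
        # Discard edges already traversed from the other endpoint.
        while adj[u] and used[adj[u][-1][1]]:
            adj[u].pop()
        if adj[u]:
            v, i = adj[u].pop()
            used[i] = True
            stack.append(v)
        else:
            # Dead end: this node is final in the path (built in reverse).
            path.append(stack.pop())
    if len(path) != len(edges) + 1:
        return None  # some edges were unreachable, so the graph is disconnected
    return path[::-1]


# The "more complicated case" from the question: a tail 1-2 attached to the loop 2-3-4-2.
print(hierholzer_path([(1, 2), (2, 3), (3, 4), (4, 2)]))  # [1, 2, 4, 3, 2]
```

Every edge is pushed and popped a constant number of times, which is where the O(E) bound comes from.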
