K-terminal graph connectivity check (using Networkx) - python

I have a large undirected graph with m specific source nodes and n specific terminal nodes, and I want to check whether every source is connected to every terminal. The answer is a binary scalar: 1 if all sources are connected to all terminals, and 0 if there exists a source and a terminal that are not connected.
For one-source one-terminal connectivity, I can use Networkx to check the connection (which is based on the DFS algorithm):
has_path(G, source, target)
The simplest way to check m-source, n-terminal connectivity is to use m+n-1 independent DFS runs (using the function above). However, this is probably not the most efficient way of doing the task, and it would be slow if we want to do this repetitively (say, for millions of graphs). What is the most efficient algorithm? What is the minimum number of required DFS runs? I am using Python, and I prefer to use Networkx to perform the connectivity check. Thanks!

I believe the best way to do this for an undirected graph is to use networkx's node_connected_component command to find all the nodes in the same component as one of your source nodes. Then check if all of the target and source nodes are also in that component.
node_connected_component returns a list in NetworkX 1.11 and a set in 2.0. Probably the cleanest way to do the test is to check whether the set of sources and targets, intersected with that component's node set, equals the set of sources and targets (i.e. all of them lie inside the component).
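A minimal sketch of that test, assuming sources and targets are lists of node labels present in G (the helper name all_connected is just for illustration):

import networkx as nx

def all_connected(G, sources, targets):
    # One component lookup (a single traversal) is enough: in an undirected
    # graph, every source reaches every target iff all sources and targets
    # lie in the connected component of any one source.
    component = set(nx.node_connected_component(G, sources[0]))  # set() also covers 1.11, which returns a list
    endpoints = set(sources) | set(targets)
    return 1 if endpoints <= component else 0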

Related

How to find the set of all coincident edges crossing into a subset of a graph only using Python and Gurobi API

I'm trying to work on a network optimization problem, where I need to create all distinct subgraphs of a provided graph (the graph is complete and undirected, with symmetric edge weights). In order to ensure connectivity, a distinct set of edges needs to be mapped to each of these individual subgraphs: the edges coincident to it. A coincident edge is an edge with one terminal node in the mapped subgraph and the other terminal node in the complement (of the subgraph).
I'm facing implementation issues in Python, as I'm not able to enumerate all such coincident edges for every distinct subgraph. I need the mapped set of these edges to build constraints in the Gurobi solver with Python. I'm clueless about this task, as I'm relatively new to Python and Gurobi.
If the NetworkX module has a built-in function for this task, please point me to it and to a possible implementation.
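If I read the question correctly, these coincident edges are what NetworkX calls the edge boundary of a node set, so nx.edge_boundary may already do what is needed. A small sketch with a made-up complete graph and subgraph node set:

import networkx as nx

G = nx.complete_graph(6)   # hypothetical complete undirected graph
subset = {0, 1, 2}         # node set of one candidate subgraph

# edge_boundary yields every edge with one endpoint in `subset`
# and the other endpoint in its complement.
coincident = list(nx.edge_boundary(G, subset))
print(coincident)          # e.g. [(0, 3), (0, 4), (0, 5), (1, 3), ...]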

(Python graph-tool) graph-tool search using OpenMP? Can finding all paths between a source and target vertex be made parallel?

I am currently using graph_tool.topology.all_paths to find all paths between two vertices of a specific length, but the documentation doesn't explicitly say what algorithm is used. I assume it is either breadth-first search (BFS) or Dijkstra's algorithm, as with shortest_path or all_shortest_paths?
My graph is unweighted and directed, so is there a way to make my all_paths searches parallel and use more cores? I know OpenMP is turned on, via openmp_enabled(), and that it is set to use all 8 cores that I have.
I have seen that some algorithms, such as DFS, cannot be made parallel, but I don't understand why searching through my graph to find all paths up to a certain length is not being done using multiple cores, especially when the Graph-tool performance comparison page has benchmarks for shortest path using multiple cores.
Running graph_tool.topology.all_paths(g, source, target, cutoff=4) using a basic function:
import graph_tool.topology

def find_paths_of_length(graph, path_length, start_vertex, end_vertex):
    savedpath = []
    # all_paths yields each path up to the cutoff length as an array of vertex indices
    for path in graph_tool.topology.all_paths(graph, start_vertex, end_vertex, cutoff=path_length):
        savedpath.append(path)
    return savedpath
only uses 1 core. Is there any way that this can be done in parallel? My network contains on the order of 50 million vertices and 200 million edges, and the algorithm is O(V+E) according to the documentation.
Thanks in advance
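As far as I can tell, all_paths itself offers no parallel mode. If the real workload is many independent source/target queries rather than one call, one workaround is to spread the queries over worker processes; a rough sketch, assuming a Linux fork start method (so workers share the already-loaded graph) and made-up file and vertex values:

from concurrent.futures import ProcessPoolExecutor
from graph_tool import load_graph
from graph_tool.topology import all_paths

g = load_graph("network.gt")   # hypothetical graph file

def count_short_paths(pair):
    source, target = pair
    # Each all_paths call is still single-threaded; the speed-up comes from
    # running several independent queries at the same time.
    return sum(1 for _ in all_paths(g, source, target, cutoff=4))

if __name__ == "__main__":
    queries = [(0, 10), (0, 42), (7, 99)]   # hypothetical vertex pairs
    with ProcessPoolExecutor(max_workers=8) as pool:
        counts = list(pool.map(count_short_paths, queries))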

Check if path exists that contains a set of N nodes

Given a graph g and a set of N nodes my_nodes = [n1, n2, n3, ...], how can I check if there's a path that contains all N nodes?
Checking among all_simple_paths for paths that contain all nodes in my_nodes becomes computationally cumbersome as the graph grows.
The search above can be limited to paths between pairs of nodes from my_nodes. This reduces the complexity only slightly, and it also requires a lot of Python looping, which is quite slow.
Is there a faster solution to the problem?
You may try a greedy algorithm here: start the path search from all of the nodes you need to find, and explore your graph step by step. I can't provide a real sample, but the pseudo-code would be something like this:
Start n path stubs, one from each of your n nodes to find
Extend each of these path stubs by all the neighbors which weren't checked before
If two path stubs intersect, merge them into a new one, which contains more of your needed nodes than before
If after merging the stubs you have one which covers all needed nodes, you're done
If there are still some additional nodes to add to the path, continue with the second step again
If there are no unvisited nodes left in the graph, the path doesn't exist
This algorithm has complexity O(E + N), because you visit the edges and nodes in a non-recursive fashion.
However, in the case of a directed graph the "merge" will be a bit more complicated; it can still be done, but in that case the worst-case scenario may take a lot of time.
Update:
Since you say the graph is directed, the above approach wouldn't work well. In this case you may simplify your task like this:
Find the strongly connected components of the graph (you could implement this yourself, e.g. with Kosaraju's algorithm; the complexity is O(E + N)), or use a NetworkX method for this if you want an out-of-the-box solution.
Create the condensation of the graph based on the step 1 information, keeping track of which component can be reached from which. Again, there is a NetworkX method for this.
Now you can easily tell which nodes from your set are in the same component; a path containing all of those definitely exists.
After that, all you need to check is the connectivity between the different components containing your nodes. For example, you can take a topological sort of the condensation and do the check in linear time again.
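A rough NetworkX sketch of steps 1-2 and the final check, assuming my_nodes is the collection of nodes the path must contain and g is the directed graph (nodes sharing a component automatically map to the same component id):

import networkx as nx

def nodes_chainable(g, my_nodes):
    # condensation computes the strongly connected components and the DAG
    # between them; its 'mapping' attribute sends each node to its component id.
    cond = nx.condensation(g)
    mapping = cond.graph["mapping"]
    comp_ids = {mapping[n] for n in my_nodes}

    # Order the involved components along a topological order of the DAG.
    order = {c: i for i, c in enumerate(nx.topological_sort(cond))}
    chain = sorted(comp_ids, key=order.get)

    # Each component must be able to reach the next one in the chain,
    # otherwise no single path can pass through all of my_nodes.
    return all(nx.has_path(cond, a, b) for a, b in zip(chain, chain[1:]))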

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses each of them only once.
This means there will only be two "end" points, which will have an odd number of attached edges. These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks like almost exactly what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
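A minimal usage sketch, assuming a recent NetworkX (2.4 or later, if I recall correctly) and using the 1-2-4-3-2-5 example above:

import networkx as nx

G = nx.Graph([(1, 2), (2, 4), (4, 3), (3, 2), (2, 5)])  # the last example graph

if nx.has_eulerian_path(G):
    # eulerian_path yields the edges of one valid traversal, in order.
    print(list(nx.eulerian_path(G)))
    # e.g. [(1, 2), (2, 3), (3, 4), (4, 2), (2, 5)]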

What's the right algorithm for finding isolated subsets

A picture is worth a thousand words, so:
My input is the matrix on the left, and what I need to find is the sets of nodes that are at most one step away from each other (not diagonally). A node that is more than one up/down/left/right step away would be in a separate set.
So, my plan was to run a BFS from every node I find, return the set it traversed, remove that set from the original one, and iterate this process until I'm done. But then I had the wild idea of looking for graph analysis tools, and I found NetworkX. Is there an easy way (algorithm?) to achieve this without manually writing a BFS and traversing the whole matrix?
Thanks
What you are trying to do is search for "connected components", and NetworkX has a method that does exactly that, as can be seen in the first example on this documentation page and as others have already pointed out in the comments.
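A minimal sketch of that call, assuming the matrix has already been turned into a graph G with an edge between each pair of horizontally or vertically adjacent nodes:

import networkx as nx

# Hypothetical toy graph standing in for the grid-derived graph.
G = nx.Graph([((0, 0), (0, 1)), ((0, 1), (1, 1)), ((3, 3), (3, 4))])

# connected_components yields one set of nodes per isolated group.
groups = [set(c) for c in nx.connected_components(G)]
print(groups)   # two groups: {(0, 0), (0, 1), (1, 1)} and {(3, 3), (3, 4)}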
Reading your question, it seems that your nodes are on a discrete grid, and the notion of "connected" that you describe is the same one used for the pixels of an image.
Connected-components algorithms are available both for graphs and for images.
If performance is important in your case, I would suggest going for the image version of connected components.
This comes from the fact that images (grids of pixels) are a specific class of graphs, so the connected-components algorithms that deal with grids of nodes are built knowing the topology of the graph itself (i.e. the graph is planar and the maximum vertex degree is four). A general algorithm has to be able to work on arbitrary graphs (i.e. they may be non-planar, with multiple edges between some nodes), so it has to do more work because it can't assume much about the properties of the input graph.
Since connected components can be found in graphs in linear time, I am not saying the image version would be orders of magnitude faster; there will only be a constant factor between the two.
For this reason you should also take into account which data structure holds your input data and how much time will be spent creating the input structures required by each version of the algorithm.
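For the image-style version, scipy's ndimage.label works directly on the matrix and uses 4-connectivity (no diagonals) by default, which matches the question; a small sketch with a made-up binary matrix:

import numpy as np
from scipy import ndimage

matrix = np.array([[1, 1, 0, 0],
                   [0, 1, 0, 1],
                   [0, 0, 0, 1]])

# label assigns a distinct integer to each 4-connected group of nonzero cells.
labels, num_groups = ndimage.label(matrix)
print(num_groups)   # 2
print(labels)
# [[1 1 0 0]
#  [0 1 0 2]
#  [0 0 0 2]]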
