I need to find all the possible node cuts in an s-t graph.
It is a directed graph, but finding the cuts for the undirected version would be enough (I can filter them afterwards).
NetworkX provides functions such as:
all_node_cuts
but it is not implemented for directed graphs (no problem) and it does not consider s-t graphs, so the solution is not useful in my case (at least for the cut sets in which the function includes the source s).
I tried to implement the combinatorial approach (checking all possible combinations of nodes) and it works, but it is extremely inefficient: for graphs with more than 10 nodes the execution time is already too large.
I tried to check whether it is possible to modify NetworkX functions like minimum_st_node_cut, but I did not succeed, and I don't know if it is possible to list all the cuts that way.
I could also use any other library (or even another programming language, if needed) if it provides a useful tool for this.
igraph provides an algorithm, but I cannot tell whether it is computationally efficient. For my instance of 93 nodes it had not found a cut by the time I stopped it, after a few hours. Nevertheless, it works for smaller instances:
import numpy as np
from igraph import Graph

# Create adjacency matrix
adjacency = np.array([[0, 0, 0],
                      [0, 0, 1],
                      [1, 0, 0]])
iG = Graph.Adjacency((adjacency > 0).tolist())
cuts = iG.all_st_cuts(1, 2)
I would like to generate multiple Erdos-Renyi graphs with random edge weights. However, my code runs quite slowly, since there are two nested loops. I was wondering if someone could help me improve it.
import networkx as nx
import random

# Suppose I generate 1000 different random graphs
for _ in range(1000):
    # Let's say I will have 100 nodes and the connection probability is 0.4
    G = nx.fast_gnp_random_graph(100, 0.4)
    # Then, I assign random edge weights.
    for (u, v) in G.edges():
        G.edges[u, v]['weight'] = random.randint(15, 5000)
When I run a similar code block in R using igraph, it is super fast regardless of the size of the network. What are some alternative ways to accomplish the same task without the slow execution time?
This benchmark shows the performance of many graph libraries (from different languages). It confirms NetworkX is very slow. The graph-tool Python package seems to be a significantly faster alternative to NetworkX. Please note that the performance of a given package depends on what you want to achieve, because the performance of a graph algorithm depends heavily on the chosen internal representation.
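If you want to stay in Python, the python-igraph package wraps the same C core that the R package uses, so a sketch along these lines (assuming python-igraph is installed) should be much faster than the pure-Python loop:

import random
from igraph import Graph

# Generate 1000 random G(n, p) graphs; the generation happens in C,
# which avoids the slow per-edge Python work.
for _ in range(1000):
    g = Graph.Erdos_Renyi(n=100, p=0.4)
    # Assign all edge weights in one bulk attribute assignment
    g.es["weight"] = [random.randint(15, 5000) for _ in range(g.ecount())]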
I have a graph network, the Email-Eu network, which is available here.
The dataset is a graph of around 1005 nodes, with the edges that form this giant graph. It also has the ground-truth labels for the nodes and their corresponding communities (departments): each node belongs to one of 42 departments.
I want to run a community detection algorithm on the graph to find the corresponding department for each node. My main objective is to find the nodes in the largest community.
So first I need to find the 42 departments (communities), then find the nodes in the biggest one of them.
I started with the Girvan-Newman algorithm. The beauty of Girvan-Newman is that it is easy to implement: each iteration I find the edge with the highest betweenness and remove it, until I reach the 42 departments (communities) I want.
I am struggling to find other community detection algorithms that give me the option of specifying how many communities/partitions to break my graph into.
Is there any community detection function/technique that lets me specify how many communities I need to uncover from my graph? Any ideas are very much appreciated.
I am using Python and NetworkX.
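For reference, here is the Girvan-Newman loop I described, written with NetworkX's built-in generator (a minimal sketch on a stand-in random graph; the real code would load the Email-Eu edge list instead):

import networkx as nx
from networkx.algorithms.community import girvan_newman

# Stand-in graph instead of the Email-Eu data
G = nx.gnp_random_graph(100, 0.05, seed=0)

k = 42  # desired number of communities
# girvan_newman yields successively finer partitions, one split at a time
for communities in girvan_newman(G):
    if len(communities) >= k:
        break

largest = max(communities, key=len)
print(len(communities), len(largest))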
A (very) partial answer (and solution) to your question is to use the Fluid Communities algorithm, implemented in NetworkX as asyn_fluidc.
Note that it works on connected, undirected, unweighted graphs, so if your graph has n connected components, you should run it n times. In fact this could be a significant issue, as you would need some preliminary knowledge of each component to choose the corresponding k.
Anyway, it is worth a try.
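A minimal sketch of the call, on a stand-in random graph restricted to its largest connected component (as the algorithm requires):

import networkx as nx
from networkx.algorithms.community import asyn_fluidc

# Stand-in graph; asyn_fluidc needs a connected, undirected, unweighted graph
G = nx.gnp_random_graph(1005, 0.01, seed=42)
H = G.subgraph(max(nx.connected_components(G), key=len)).copy()

# k fixes the number of communities up front (42 departments here)
communities = list(asyn_fluidc(H, k=42, seed=1))
largest = max(communities, key=len)
print(len(communities), len(largest))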
You may want to try pysbm. It is based on networkx and implements different variants of stochastic block models and inference methods.
If you consider switching from networkx to a different Python-based graph package, you may want to look at graph-tool, where you could use the stochastic block model for the clustering task. Another noteworthy package is igraph; you may want to look at How to cluster a graph using python igraph.
The approaches directly available in networkx are rather old-fashioned. If you aim for state-of-the-art clustering methods, you may consider spectral clustering or Infomap. The selection depends on your intended usage of the inferred communities. The task of inferring ground truth from a network falls (approximately) under the No-Free-Lunch theorem, i.e. (roughly) no algorithm exists that returns "better" communities than every other algorithm when the results are averaged over all possible inputs.
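As a concrete example of the spectral route, scikit-learn's SpectralClustering lets you fix the number of clusters directly; a minimal sketch on a small stand-in graph (assuming scikit-learn is installed):

import networkx as nx
from sklearn.cluster import SpectralClustering

# Stand-in graph; in practice use the Email-Eu adjacency matrix
G = nx.karate_club_graph()
A = nx.to_numpy_array(G)

# n_clusters is the number of communities you want (42 in your case)
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(A)  # labels[i] is the community of node i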
I am not entirely sure of my answer, but maybe you can try this. Are you aware of label propagation? The main idea is that you have some nodes in the graph which are labelled, i.e. they belong to a community, and you want to give labels to the other, unlabelled nodes. LPA will spread these labels across the graph and give you a list of nodes and the communities they belong to. These communities will be the same ones that your labelled set of nodes belongs to.
So I think you can control the number of communities extracted from the graph by controlling the number of communities you initialise at the beginning. But it is also possible that, after LPA converges, some of the communities you initialised vanish from the graph due to the graph structure and the randomness of the algorithm. There are, however, many variants of LPA where you can control this randomness. I believe this page of sklearn talks about it.
You can read about LPA here and also here.
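To make the idea concrete, here is a minimal hand-rolled sketch of seeded label propagation on a NetworkX graph; the seeded_lpa helper is hypothetical, not a library function:

import random
import networkx as nx

def seeded_lpa(G, seeds, max_iter=100, rng_seed=0):
    # seeds maps a few nodes to fixed community labels; labels then spread
    # until every reachable node adopts the majority label of its neighbours
    rng = random.Random(rng_seed)
    labels = dict(seeds)
    for _ in range(max_iter):
        changed = False
        nodes = list(G.nodes())
        rng.shuffle(nodes)  # random visiting order, as in standard LPA
        for n in nodes:
            if n in seeds:  # seed labels stay fixed
                continue
            neighbour_labels = [labels[m] for m in G[n] if m in labels]
            if not neighbour_labels:
                continue
            best = max(set(neighbour_labels), key=neighbour_labels.count)
            if labels.get(n) != best:
                labels[n] = best
                changed = True
        if not changed:
            break
    return labels

# One seed per desired community controls how many communities come out
G = nx.karate_club_graph()
print(seeded_lpa(G, seeds={0: "A", 33: "B"}))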
I have networks (large & small) which I need to prune until only isolates are left, and I would like to maximize the number of isolates. Is there an algorithm to do this, even if the network is more complex (e.g. at the bottom there may be connected nodes as well)?
Consider this simple example:
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8),
                  (1, 2+9), (2+9, 3+9), (2+9, 4+9), (3+9, 5+9), (3+9, 6+9),
                  (4+9, 7+9), (4+9, 8+9)])
Here, the optimal solution is deleting nodes 1, 3, 4, 12 and 13 (i.e. 1, 3, 4, 3+9 and 4+9).
This sounds like you want to calculate the Maximum Independent Set (which is NP-hard for general graphs). (Wikipedia also mentions the name vertex packing).
Networkx has an approximation algorithm.
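A minimal sketch of that approximation on the example from the question (converted to undirected, since the routine expects an undirected graph):

import networkx as nx
from networkx.algorithms import approximation as approx

G = nx.DiGraph()
G.add_edges_from([(1, 2), (2, 3), (2, 4), (3, 5), (3, 6), (4, 7), (4, 8),
                  (1, 11), (11, 12), (11, 13), (12, 14), (12, 15),
                  (13, 16), (13, 17)])

# The surviving isolates form an independent set; delete the complement
independent = approx.maximum_independent_set(G.to_undirected())
to_delete = set(G) - independent
print(sorted(to_delete))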
Maybe an approximation is not enough for you (and I think there is no prebuilt alternative).
According to the wikipedia-link above, there is an exact algorithm available in igraph.
Also keep in mind that this approximation algorithm is for general graphs, so it may well be that there are better approaches for trees.
Edit: (Without checking) This SO-answer presents an algorithm for trees (and the question is a possible duplicate).
I'm trying to get a path on a graph which covers all edges and traverses each of them only once.
This means there will only be two "end" points, which will have odd degree (an odd number of attached edges). These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks almost like what is needed, but the networkx implementation only works for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
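A minimal sketch on the second example from the question, assuming a NetworkX version recent enough to ship eulerian_path (it appeared around 2.4):

import networkx as nx

# The "1-2-3-4-2" example: nodes 1 and 2 have odd degree, so an open
# Eulerian path exists between them
G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 2)])
if nx.has_eulerian_path(G):
    path = list(nx.eulerian_path(G))
    print(path)  # e.g. [(1, 2), (2, 3), (3, 4), (4, 2)]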
A picture is worth a thousand words, so:
My input is the matrix on the left, and what I need to find is the sets of nodes that are at most one step away from each other (not diagonally). A node that is more than one up/down/left/right step away would be in a separate set.
So my plan was to run a BFS from every node I find, return the set it traversed, remove that set from the original one, and iterate until done. But then I had the wild idea of looking for graph analysis tools, and I found NetworkX. Is there an easy way (an algorithm?) to achieve this without manually writing a BFS and traversing the whole matrix?
Thanks
What you are trying to do is search for "connected components", and NetworkX itself has a method for doing exactly that, as can be seen in the first example on this documentation page, and as others have already pointed out in the comments.
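A minimal sketch of that route with a hypothetical 0/1 matrix, building a 4-connectivity graph and asking NetworkX for the connected components:

import numpy as np
import networkx as nx

mat = np.array([[1, 1, 0],
                [0, 1, 0],
                [0, 0, 1]])

# One node per nonzero cell; edges between up/down/left/right neighbours
G = nx.Graph()
rows, cols = mat.shape
for r in range(rows):
    for c in range(cols):
        if mat[r, c]:
            G.add_node((r, c))
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols and mat[rr, cc]:
                    G.add_edge((r, c), (rr, cc))

print(list(nx.connected_components(G)))
# [{(0, 0), (0, 1), (1, 1)}, {(2, 2)}]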
Reading your question, it seems that your nodes are on a discrete grid, and the concept of connectivity you describe is the same as that used for the pixels of an image.
Connected components algorithms are available both for graphs and for images.
If performance is important in your case, I would suggest you go for the image version of connected components.
This comes from the fact that images (grids of pixels) are a specific class of graphs, so connected components algorithms dealing with grids of nodes are built knowing the topology of the graph itself (i.e. the graph is planar and the maximum vertex degree is four). A general algorithm has to be able to work on arbitrary graphs (which may be non-planar, with multiple edges between some node pairs), so it has to do more work because it can't assume much about the properties of the input graph.
Since connected components can be found in linear time on graphs, I am not saying the image version would be orders of magnitude faster; there will only be a constant factor between the two.
For this reason you should also take into account which data structure holds your input data, and how much time will be spent building the input structures each version of the algorithm requires.
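For comparison, the image version is essentially a one-liner with SciPy (assuming scipy is available); ndimage.label uses 4-connectivity by default:

import numpy as np
from scipy import ndimage

mat = np.array([[1, 1, 0],
                [0, 1, 0],
                [0, 0, 1]])

# labels assigns a component id to every nonzero cell; num is the count
labels, num = ndimage.label(mat)
print(num)     # 2
print(labels)  # [[1 1 0], [0 1 0], [0 0 2]]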