Find overlapping modularity in two graphs - iGraph in Python - python

I have two related graphs created in iGraph, A and G. I find community in structure in G using either infomap or label_propagation methods (because they are two that allow for weighted, directional links). From this, I can see the modularity of this community for the G graph. However, I need to see what modularity this will provide for the A graph. How can I do this?

Did you try using the modularity function?
im <- infomap.community(graph=G)
qG <- modularity(im)
memb <- membership(im)
qA <- modularity(x=A, membership=memb, weights=E(A)$weight)
cat("qG=",qG," vs. qA=",qA,"\n",sep="")
Note: tested with igraph v0.7, I don't have a more recent version right now. The parameter/function names might slightly differ.

So I figured it out. What you need to do is find a community structure, either pre-defined or using one of the methods provided for community detection, such as infomap or label_propagation. This gives you a vertex clustering, which you can use to place on another graph and from that use .q to find the modularity.

Related

Community Detection Algorithms using NetworkX

I have a network that is a graph network and it is the Email-Eu network that is available in here.
This dataset has the actual dataset, which is a graph of around 1005 nodes with the edges that form this giant graph. It also has the ground truth labels for the nodes and its corresponding communities (department). Each one of these nodes belongs to one of each 42 departments.
I want to run a community detection algorithm on the graph to find to the corresponding department for each node. My main objective is to find the nodes in the largest community.
So, first I need to find the first 42 departments (Communities), then find the nodes in the biggest one of them.
I started with Girvan-Newman Algorithm to find the communities. The beauty of Girvan-Newman is that it is easy to implement since every time I need to find the edge with the highest betweenness and remove it till I find the 42 departments(Communities) I want.
I am struggling to find other Community Detection Algorithms that give me the option of specifying how many communities/partitions I need to break down my graph into.
Is there any Community Detection Function/Technique that I can use, which gives me the option of specifying how many communities do I need to uncover from my graph? Any ideas are very much appreciated.
I am using Python and NetworkX.
A (very) partial answer (and solution) to your question is to use Fluid Communities algorithm implemented by Networkx as asyn_fluidc.
Note that it works on connected, undirected, unweighted graphs, so if your graph has n connected components, you should run it n times. In fact this could be a significant issue as you should have some sort of preliminary knowledge of each component to choose the corresponding k.
Anyway, it is worth a try.
You may want to try pysbm. It is based on networkx and implements different variants of stochastic block models and inference methods.
If you consider to switch from networkxto a different python based graph package you may want to consider graph-tool, where you would be able to use the stochastic block model for the clustering task. Another noteworthy package is igraph, may want to look at How to cluster a graph using python igraph.
The approaches directly available in networkx are rather old fashioned. If you aim for state of the art clustering methods, you may consider spectral clustering or Infomap. The selection depends on your desired usage of the inferred communities. The task of inferring ground truth from a network, falls under (approximate) the No-Free-Lunch theorem, i.e. (roughly) no algorithm exists, such that it returns "better" communities than any other algorithm, if we average the results over all possibilities.
I am not entirely sure of my answer but maybe you can try this. Are you aware of label propagation ? The main idea is that you have some nodes in graph which are labelled i.e. they belong to a community and you want to give labels to other unlabelled nodes in your graph. LPA will spread these labels across the graph and give you a list of nodes and the communities they belong to. These communities will be the same as the ones that your labelled set of nodes belong to.
So I think you can control the number of communities you want to extract from the graph by controlling the number of communities you initialise in the beginning. But I think it is also possible that after LPA converges some of the communities you initialised vanish from the graph due the graph structure and also randomness of the algorithm. But there are many variants of LPA where you can control this randomness. I believe this page of sklearn talks about it.
You can read about LPA here and also here

Graph matching algorithms

I've been searching for graph matching algorithms written in Python but I haven't been able to find much.
I'm currently trying to match two different graphs that derive from two distinct sets of character sequences. I know that there is an underlying connection between the two graphs, more precisely a one-to-one mapping between the nodes. But the graphs don't have the same labels and as such I need graph matching algorithms that return nodes mappings just by comparing topology and/or attributes. By testing, I hope to maximize correct matches.
I've been using Blondel and Heymans from the graphsim package and intend to also use Tacsim from the same package.
I would like to test other options, probably more standard, like maximum subgraph isomorphism or finding subgraphs with very good matchings between the two graphs. Graph edit distance might also help if it manages to give a matching.
The problem is that I can't find anything implemented, even in Networkx that I'm using. Does anyone know of any Python implementations? Would be a plus if those options used Networkx.
I found this implementation of Graph Edit Distance algorithms which uses NetworkX in Python.
https://github.com/Jacobe2169/GMatch4py
"GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects. GMatch4py algorithms were implemented with Cython to enhance performance."

Python Library to create and visualize HyperGraph

Is there any libraries similar to igraph where I can create a hypergraph. I am working with hypergraph now and wanted to use some HyperGraph libraries to work on.
MGtoolkit and its paper
pygraph
halp
PyMETIS
SageMath's implementation, 1, 2. SageMath is not a python library but more like a python distribution (ships python 2.7 currently) which lots of interesting libraries are pre-installed.
I hope we see NetworkX and igraph support also soon.
There's also HyperNetX which is able to represent and visualise hypergraphs.
It seems very accessible. They have a number of nice tutorials on their GitHub page.
However, when working with it I identified some issues:
Performance: The library struggles with graphs that have several thousand nodes. I recommend igraph instead, although it does not have explicit support for hypergraphs. It does offer functionality for bipartite graphs, though. I believe if no hyperedge is fully contained in another, you can work with a bipartite graph that is isomorphic to your given hypergraph?
I encountered an issue in which the ordering of nodes would not be deterministic, i.e. if you constructed the same graph several times and iterated over the nodes, they would be given to you in different orders. This can probably be worked around.

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will only be two "end" points - which will have an odd-number of attached nodes. These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks almost what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for Eulerian Path that visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity, if you need more efficient algorithm check Hierholzer's algorithm which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().

Algorithm of community_edge_betweenness() in python-igraph implementation

I had to shift from using community_fastgreedy() to community_edge_betweenness() due to the inability of community_fastgreedy() to handle directed graphs (directed unweighted graph).
My understanding is that community_fastgreedy() is bottom-up approach while community_edge_betweenness() is top-down and both work on the principle of finding communities that maximize modularity, one by merging communities and the other by removing edges.
In the original paper by M.Girvan and M.E.J.Newman "Community structure in social and biological networks", there is no mention of it being able to handle directed graph. This is being used for community_edge_betweenness().
I referred here and Link documentation to get more information on the algorithm for directed networks.
My questions are -
Is my understanding of, community_fastgreedy() and community_edge_betweenness() implementation in python-igraph depend on maximizing modularity, correct.
Can you please point me to the documentation of how community_edge_betweenness is implemented to handle directed network in python-igraph or to a newer version of the paper by Girvan and Newman.
Since i am new to community detection any pointers are useful.
I am aware of better methods (Louvain, Infomap) but still need to use CNM or GN for comparision purposes.
Thanks.
community_edge_betweenness() does not try to maximize modularity. Modularity is only used as a rule of thumb to decide where to "cut" the dendrogram generated by the algorithm if the user insists on a "flat" community structure instead of a flat dendrogram.
community_edge_betweenness() "handles" directed graphs simply by looking for directed paths instead of undirected ones when it calculates the edge betweenness scores for the edges (which are then used in turn to decide which edge to remove at a particular step). As far as I know, no research has been made on whether this approach is scientifically sound and correct or not.
The reason why most community detection methods (especially the ones that are maximizing modularity) do not cater for directed graphs is because the concept of a "community" is not well-defined for directed graphs - most of the algorithms look for parts in the graph that are "denser than expected by chance", but this vague definition does not say anything about how the directions of edges should be used. Also, there are multiple (conflicting) extensions of the modularity score for directed graphs.
As far as I know, the only method in igraph that has a "formal" treatment to the problem of communities in directed networks is the InfoMap algorithm. InfoMap defines communities based on minimal encodings of random walks within graphs so it is able to take the edge directions into account accurately - roughly speaking, communities found by the InfoMap algorithm are groups of nodes for which a random walker has a small probability of "escaping" from the group. (The InfoMap homepage has a nice visual explanation). So, if you really need to find communities in a directed graph, I would suggest using the InfoMap method.

Categories