Does anybody know any module in Python that computes the best bipartite matching?
I have tried the following two:
munkres
hungarian
However, in my case I have to deal with a non-complete graph (i.e., there might not be an edge between two nodes), so a node that has no edge cannot be matched at all. Neither of the two packages above seems able to deal with this.
Any advice?
Set the cost to infinity, or to a very large value, for every edge that does not exist. You can then tell from the result whether an invalid edge was used.
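As a minimal sketch of that idea, the snippet below fills missing edges with a large sentinel cost and then drops any pair that ended up on such an edge. The brute-force search over permutations is only a stand-in for a real Hungarian-algorithm solver so that the example needs nothing beyond the standard library; the names `MISSING` and `best_assignment` and the cost values are made up for illustration. The sentinel is assumed to be larger than the sum of all real costs, so that the solver avoids missing edges whenever it can.

```python
from itertools import permutations

MISSING = 10**9  # sentinel cost for "no edge"; assumed far above any real cost


def best_assignment(cost):
    """Brute-force minimum-cost assignment on a square cost matrix.

    Returns (pairs, dropped): the valid (row, column) matches, and the rows
    whose assigned column turned out to be a missing edge.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda cols: sum(cost[r][c] for r, c in enumerate(cols)),
    )
    pairs = [(r, c) for r, c in enumerate(best) if cost[r][c] < MISSING]
    dropped = [r for r, c in enumerate(best) if cost[r][c] >= MISSING]
    return pairs, dropped


# Hypothetical 3x3 example: column 1 has no edge to any row.
cost = [
    [4, MISSING, 2],
    [MISSING, MISSING, 3],
    [1, MISSING, MISSING],
]
pairs, dropped = best_assignment(cost)
print(pairs)    # valid matches only
print(dropped)  # rows left unmatched because only a missing edge was available
```

The same post-filtering step applies unchanged to the output of munkres or any other assignment solver; only the brute-force search would be swapped out.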
Related
I have a rather simple problem to define but I did not find a simple answer so far.
I have two graphs (i.e., sets of vertices and edges) which are identical. The vertices of each graph are labelled independently. Look at the example below:
How can the computer detect, without prior knowledge of it, that 1 is identical to 9, 2 to 10 and so on?
Note that in the case of symmetry, there may be several possible one-to-one pairings which give complete equivalence, but finding just one of them is sufficient for me.
This is in the context of a Python implementation. Does someone have a pointer towards a simple algorithm publicly available on the Internet? The problem sounds simple, but I lack the mathematical knowledge to come up with a solution myself or to find the proper keywords to search for.
EDIT: Note that I also have atom types (i.e., labels) for each graph, as well as the full distance matrix for the two graphs to align. However, the positions may be similar but not exactly equal.
This is known as the graph isomorphism problem, and it is probably very hard, although the exact details of how hard are still a subject of research.
(But things look better if your graphs are planar.)
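For small graphs, a plain backtracking search is often enough in practice, even though its worst case is exponential. The sketch below is one such search, not a reference algorithm: it assumes undirected simple graphs given as adjacency dicts, uses the vertex labels (atom types) and degrees to prune candidates, and returns the first complete pairing it finds. The function name and the sample data are hypothetical. In NetworkX, `networkx.algorithms.isomorphism.GraphMatcher` provides a VF2-based matcher for the same task.

```python
def find_isomorphism(adj_a, adj_b, label_a, label_b):
    """Return a dict mapping nodes of A to nodes of B, or None if none exists.

    adj_*:   dict node -> set of neighbours (undirected, simple graphs)
    label_*: dict node -> label (e.g. atom type)
    """
    if len(adj_a) != len(adj_b):
        return None
    # Place high-degree nodes first: they usually have fewer candidates.
    order = sorted(adj_a, key=lambda n: -len(adj_a[n]))

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        a = order[len(mapping)]
        for b in adj_b:
            if b in used:
                continue
            if label_a[a] != label_b[b] or len(adj_a[a]) != len(adj_b[b]):
                continue
            # Every already-paired node must agree on adjacency with a and b.
            if all((m in adj_a[a]) == (mapping[m] in adj_b[b]) for m in mapping):
                mapping[a] = b
                used.add(b)
                result = extend(mapping, used)
                if result is not None:
                    return result
                del mapping[a]
                used.discard(b)
        return None

    return extend({}, set())


# Hypothetical example: the same 4-atom chain with two independent labellings.
adj_a = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
label_a = {1: "N", 2: "C", 3: "C", 4: "O"}
adj_b = {12: {11}, 10: {9, 11}, 9: {10}, 11: {10, 12}}
label_b = {9: "N", 10: "C", 11: "C", 12: "O"}

mapping = find_isomorphism(adj_a, adj_b, label_a, label_b)
print(mapping)
```

When the graphs are symmetric, several pairings are valid and this sketch simply returns whichever one the search reaches first.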
So, after searching for a bit, I think I have found a solution that works most of the time at moderate computational cost. It is a kind of genetic algorithm which uses a bit of randomness, but it seems practical enough for my purposes. I have not hit any aberrant configuration with my samples so far, even though it is theoretically possible.
Here is how I proceeded:
Determine the complete set of 2-paths, 3-paths and 4-paths
Determine vertex types using both atom type and surrounding topology, creating an "identity card" for each vertex
Do the following ten times:
Start with a random candidate set of pairings complying with the allowed vertex types
Evaluate how many of the 2-paths, 3-paths and 4-paths correspond between the two graphs under this pairing, scoring one point for each corresponding vertex (also using the atom type as an additional descriptor)
Evaluate all other shortlisted candidates for a given vertex by permuting the pairings for this candidate with its other positions in the same way
Sort the scores in descending order
For each score, check if the configuration is among the excluded configurations, and if it is not, take it as the new configuration and put it into the excluded configurations.
If the score is perfect (i.e., all of the 2-paths, 3-paths and 4-paths correspond), then stop the loop and calculate the sum of absolute differences between the distance matrices of the two graphs under the selected pairing; otherwise go back to 4.
Stop this process after it has been done 10 times
Check the difference between distance matrices and take the pairings associated with the minimal sum of absolute differences between the distance matrices.
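The final selection step above can be sketched as follows. This is only an illustration of the scoring, not the original code: it assumes the distance matrices are indexable as `dist[i][j]`, that each candidate pairing is a dict from vertices of the first graph to vertices of the second, and the names `pairing_cost` and `best_pairing` and the sample numbers are made up.

```python
def pairing_cost(dist_a, dist_b, pairing):
    """Sum of |dA[i][j] - dB[p(i)][p(j)]| over all unordered vertex pairs."""
    nodes = list(pairing)
    total = 0.0
    for idx, i in enumerate(nodes):
        for j in nodes[idx + 1:]:
            total += abs(dist_a[i][j] - dist_b[pairing[i]][pairing[j]])
    return total


def best_pairing(dist_a, dist_b, candidates):
    """Pick the candidate pairing with the smallest distance-matrix mismatch."""
    return min(candidates, key=lambda p: pairing_cost(dist_a, dist_b, p))


# Hypothetical data: graph B is graph A relabelled, with slightly noisy distances.
dist_a = [[0.0, 1.0, 2.0],
          [1.0, 0.0, 1.5],
          [2.0, 1.5, 0.0]]
dist_b = [[0.0, 2.0, 1.4],
          [2.0, 0.0, 1.1],
          [1.4, 1.1, 0.0]]
candidates = [{0: 0, 1: 1, 2: 2}, {0: 1, 1: 2, 2: 0}]

best = best_pairing(dist_a, dist_b, candidates)
best_cost = pairing_cost(dist_a, dist_b, best)
print(best, best_cost)
```

Because the positions are only approximately equal, the cost is compared between candidates rather than tested against zero.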
I am trying to create two graphs with the identical networks between them removed.
Graph 1:
Graph 2:
To demonstrate what I want to do:
In red, I highlight a network in both graphs that has several similar nodes (e.g. B Arg 511, B Asp 513 ...). However, graph 2 has more nodes connected and different edges that connect them.
In green, I highlight identical networks.
Essentially, I'd like to keep the different networks and exclude identical ones.
I tried using networkx.algorithms.operators.binary.difference() but it doesn't work since they don't have identical node sets.
A potential solution might be to make sets of sets of the node networks for each graph and take the difference. Then try to graph that instead?
Any help is appreciated.
It looks like you want a list of all the largest subgraphs that belong to both parent graphs. Assuming you have no prior information about what kind of subgraph you are looking for, I see no better way to solve it than to try all possible subgraphs. That works in theory but is infeasible in practice because of combinatorial explosion. The problem becomes much harder if you are looking for isomorphic subgraphs, i.e. those which have the same structure but are made of possibly different sets of vertices and/or edges.
Otherwise you have to extend your question with whatever else you know about the subgraphs you are looking for; it may or may not help to find a better way to solve the problem.
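If the node labels are shared between the two graphs, as the residue names in the question suggest, the "sets of sets" idea from the question avoids the general search entirely. The sketch below makes that assumption explicit: it splits each graph into connected components, represents each component as a frozenset of its edges, and takes ordinary set differences. Only "B Arg 511" and "B Asp 513" come from the question; the other labels and the function name are invented for the example.

```python
from collections import defaultdict


def edge_components(edges):
    """Split an undirected edge list into connected components.

    Each component is a frozenset of edges and each edge a frozenset of its
    two endpoint labels, so components compare with plain set operations.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first walk to collect every node reachable from `start`.
        stack, nodes = [start], set()
        while stack:
            node = stack.pop()
            if node in nodes:
                continue
            nodes.add(node)
            stack.extend(adj[node] - nodes)
        seen |= nodes
        result.add(frozenset(frozenset((u, v)) for u in nodes for v in adj[u]))
    return result


graph_1 = [("B Arg 511", "B Asp 513"), ("A Lys 10", "A Glu 20")]
graph_2 = [("B Arg 511", "B Asp 513"), ("B Asp 513", "B Tyr 520"),
           ("A Lys 10", "A Glu 20")]

comps_1, comps_2 = edge_components(graph_1), edge_components(graph_2)
only_in_1 = comps_1 - comps_2   # networks to keep from graph 1
only_in_2 = comps_2 - comps_1   # networks to keep from graph 2
identical = comps_1 & comps_2   # networks present in both, to be excluded
```

Each remaining component can be turned back into a graph from its edge set. A network that shares nodes but differs in even one edge is kept on both sides, which matches the red example in the question.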
I've been searching for graph matching algorithms written in Python but I haven't been able to find much.
I'm currently trying to match two different graphs that derive from two distinct sets of character sequences. I know that there is an underlying connection between the two graphs, more precisely a one-to-one mapping between the nodes. But the graphs don't have the same labels, so I need graph matching algorithms that return node mappings just by comparing topology and/or attributes. By testing, I hope to maximize the number of correct matches.
I've been using Blondel and Heymans from the graphsim package and intend to also use Tacsim from the same package.
I would like to test other options, probably more standard, like maximum subgraph isomorphism or finding subgraphs with very good matchings between the two graphs. Graph edit distance might also help if it manages to give a matching.
The problem is that I can't find anything implemented, even in Networkx that I'm using. Does anyone know of any Python implementations? Would be a plus if those options used Networkx.
I found this implementation of Graph Edit Distance algorithms which uses NetworkX in Python.
https://github.com/Jacobe2169/GMatch4py
"GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects. GMatch4py algorithms were implemented with Cython to enhance performance."
I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will only be two "end" points - which will have an odd-number of attached nodes. These end points would either have one connecting edge, or be part of a loop and have 3 connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks like almost what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, check Hierholzer's algorithm, which is O(E) instead.
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
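For reference, here is a minimal standard-library sketch of Hierholzer's algorithm for undirected graphs, in case a NetworkX version with `eulerian_path()` is not available. It is an illustration rather than the NetworkX implementation; the function name `hierholzer_path` and the edge-list input format are assumptions.

```python
from collections import defaultdict


def hierholzer_path(edges):
    """Return a node list that uses every undirected edge exactly once.

    Raises ValueError when the graph has no Eulerian path.
    """
    if not edges:
        return []
    adj = defaultdict(list)
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
        adj[v].append((u, idx))
    odd = [n for n in adj if len(adj[n]) % 2]
    if len(odd) not in (0, 2):
        raise ValueError("an Eulerian path needs 0 or 2 odd-degree nodes")
    start = odd[0] if odd else next(iter(adj))  # begin at an end point if any
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        node = stack[-1]
        # Discard edges that were already walked from their other endpoint.
        while adj[node] and used[adj[node][-1][1]]:
            adj[node].pop()
        if adj[node]:
            nxt, idx = adj[node].pop()
            used[idx] = True
            stack.append(nxt)
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    if len(path) != len(edges) + 1:
        raise ValueError("graph is not connected")
    return path[::-1]


simple = hierholzer_path([(1, 2), (2, 3), (3, 4), (4, 5)])
looped = hierholzer_path([(1, 2), (2, 3), (3, 4), (4, 2)])
print(simple)
print(looped)
```

The two calls correspond to the first two cases in the question. Which of the equally valid traversals comes back for the looped case depends on the order of the edge list.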
Given two graphs (A and B), I am trying to determine if there exists a subgraph of B that matches A given some threshold based on the difference in edge weights. That is, if I take the sum of the difference between each pair of associated edges, it will be below a specified threshold. The vertex labels are not consistent between A and B, so I am just relying on the edge weights.
A will be somewhat small (e.g. max 10) and B will be larger (e.g. max 200).
I believe one of these two packages may help:
The Graph Matching Toolbox in MATLAB "implements spectral graph matching with affine constraint (SMAC), optionally with kronecker bistochastic normalization". It states on the webpage that it "handles graphs of different sizes (subgraph matching)"
http://www.timotheecour.com/software/graph_matching/graph_matching.html
The algorithm used in the Graph Matching Toolbox in MATLAB is based on the algorithm described in the paper by Timothee Cour, Praveen Srinivasan, and Jianbo Shi titled Balanced Graph Matching. The paper was published in NIPS 2006.
In addition, there is a second toolkit called Graph Matching Toolkit (GMT) that seems like it might have support for error-tolerant subgraph matching, as it does support error-tolerant graph matching. Rather than using a spectral method, it has various methods of computing edit distance, and then it is my impression that it finds the best matching by giving the argmax of the minimum edit distance. If it doesn't explicitly support subgraph matching and you don't care about efficiency, you might just search all subgraphs of B and use GMT to try to find matches of those subgraphs in A. Or maybe you could just search a subset of the subgraphs of B.
http://www.fhnw.ch/wirtschaft/iwi/gmt
Unfortunately neither of these appears to be in Python, and they don't seem to support networkx's graph format either. But I believe you may be able to find a converter that will change the representation of the networkx graphs to something usable by these toolkits. Then you can run the toolkits and output your desired subgraph matchings.
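Since A is small, a direct backtracking search in Python may also be worth trying before converting formats. The sketch below is not taken from either toolkit: it assumes undirected graphs given as (u, v, weight) triples, requires every edge of A to land on an edge of B, and prunes a branch as soon as the accumulated absolute weight difference reaches the threshold. It returns the first match found, not the best one, and its worst case is still exponential. All names and numbers are hypothetical.

```python
def build(edges):
    """Turn (u, v, weight) triples into a symmetric dict-of-dicts."""
    w = {}
    for u, v, weight in edges:
        w.setdefault(u, {})[v] = weight
        w.setdefault(v, {})[u] = weight
    return w


def match_subgraph(edges_a, edges_b, threshold):
    """Return (mapping, cost) with cost < threshold, or None if no match."""
    wa, wb = build(edges_a), build(edges_b)
    order = sorted(wa, key=lambda n: -len(wa[n]))  # constrained nodes first

    def extend(mapping, used, cost):
        if len(mapping) == len(order):
            return dict(mapping), cost
        a = order[len(mapping)]
        for b in wb:
            if b in used:
                continue
            extra, ok = 0.0, True
            for nbr, weight in wa[a].items():
                if nbr in mapping:
                    target = wb[b].get(mapping[nbr])
                    if target is None:  # edge of A has no counterpart in B
                        ok = False
                        break
                    extra += abs(weight - target)
            # Differences are non-negative, so the cost can only grow.
            if not ok or cost + extra >= threshold:
                continue
            mapping[a] = b
            used.add(b)
            found = extend(mapping, used, cost + extra)
            if found is not None:
                return found
            del mapping[a]
            used.discard(b)
        return None

    return extend({}, set(), 0.0)


graph_a = [("x", "y", 1.0), ("y", "z", 2.0), ("x", "z", 3.0)]
graph_b = [(1, 2, 1.1), (2, 3, 2.0), (1, 3, 2.9), (3, 4, 5.0), (4, 5, 1.0)]
result = match_subgraph(graph_a, graph_b, threshold=0.5)
print(result)
```

Ordering A's nodes by degree is just a heuristic to trigger pruning early; with B at around 200 nodes, how well this scales depends heavily on how tight the threshold is.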