find k-partition of a graph that satisfies the condition - python

Given the adjacency matrix of a graph, I want to divide the nodes into k parts such that [the sum of edge weights inside each part] minus [the sum of edge weights between the k parts] is maximized.
Here is my attempt:
I generated all partitions of the nodes and kept those consisting of exactly k subsets. For each one I summed the edges inside each part over all pairs of nodes, did the same for the edges between every pair of parts, and took the partition with the maximum difference.
The problem is that my approach is not efficient in terms of time and memory. So I am looking for a faster method.
I know that I shouldn't store all the partitions in memory and should instead do the calculations while generating each one, but I don't know how to implement this.
Any help is appreciated.
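One useful observation: for a fixed graph the total edge weight T is constant, and [intra] - [inter] = 2*[intra] - T, so maximizing the sum of edges inside the parts is enough. Below is a minimal sketch along those lines (my own illustration, not a known-efficient algorithm), assuming an undirected graph given as a symmetric adjacency matrix A indexed by node number. It enumerates the partitions into exactly k non-empty blocks lazily and scores each one as it is produced, so no partition list is ever stored:

import itertools

def partitions_into_k(nodes, k):
    # Lazily yield all partitions of `nodes` into exactly k non-empty blocks.
    n = len(nodes)
    if k < 1 or n < k:
        return
    if k == 1:
        yield [list(nodes)]
        return
    first, rest = nodes[0], nodes[1:]
    # Case 1: `first` joins one of the k blocks of a partition of `rest`.
    for part in partitions_into_k(rest, k):
        for i in range(len(part)):
            yield part[:i] + [part[i] + [first]] + part[i + 1:]
    # Case 2: `first` is a block on its own; `rest` fills the other k-1 blocks.
    for part in partitions_into_k(rest, k - 1):
        yield [[first]] + part

def intra_weight(A, part):
    # Sum of edge weights inside the blocks, over all pairs of nodes.
    return sum(A[i][j]
               for block in part
               for i, j in itertools.combinations(block, 2))

def best_k_partition(A, k):
    nodes = list(range(len(A)))
    return max(partitions_into_k(nodes, k), key=lambda p: intra_weight(A, p))

Note this only fixes the memory problem, not the running time: the number of partitions still grows like the Stirling numbers of the second kind. Since maximizing intra-weight is the same as minimizing the weight of the cut between the parts, it may also be worth looking up the minimum k-cut problem, which has specialized algorithms for small fixed k.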

Related

Can a genetic algorithm optimize my NP-complete problem?

I have an array that stores a large collection of elements.
Any two elements can be compared in some function with the result being true or false.
The problem is to find the largest, or at least a relatively large, subgroup in which every element is in a true relationship with every other element of that subgroup.
Finding the largest subgroup in an array of size N means checking on the order of 2^N subsets, so exhaustive search is out.
Randomly adding successive matching elements works, but the resulting subgroups are too small.
Can this problem be significantly optimised using a genetic algorithm and thus find much larger subgroups in a reasonable time?
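A GA can do noticeably better than single greedy passes if each candidate is "repaired" into a valid subgroup before scoring. The sketch below is a minimal illustration under assumptions: compatible() is an arbitrary stand-in for your boolean comparison function, and the population size, generation count, and mutation rate are placeholders to tune:

import random

def compatible(a, b):
    # Placeholder for the question's boolean comparison function.
    return (a ^ b) % 3 != 0

def repair(candidate):
    # Greedily drop elements until every pair is in a true relationship.
    group = []
    for x in candidate:
        if all(compatible(x, y) for y in group):
            group.append(x)
    return group

def ga_largest_subgroup(items, pop_size=50, generations=200, mutation=0.02):
    n = len(items)
    # An individual is a set of indices into `items`.
    pop = [set(random.sample(range(n), random.randint(1, n)))
           for _ in range(pop_size)]
    best = []
    for _ in range(generations):
        scored = []
        for ind in pop:
            group = repair([items[i] for i in sorted(ind)])
            scored.append((len(group), ind))
            if len(group) > len(best):
                best = group
        # Selection: keep the better half as parents.
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = [ind for _, ind in scored[: pop_size // 2]]
        # Crossover (union of two parents) plus per-index membership flips.
        pop = []
        while len(pop) < pop_size:
            a, b = random.sample(parents, 2)
            child = a | b
            for i in range(n):
                if random.random() < mutation:
                    child ^= {i}
            pop.append(child or {random.randrange(n)})
    return best

Whether this beats repeated greedy restarts depends entirely on the structure of your comparison function; the repair step is what keeps every evaluated candidate valid.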

Closest neighbor to any point (other than itself) in a 2D plane

Given a 2D plane and N points (n1=(x1,y1), n2=(x2,y2), ..., nN=(xN,yN)), what is a fast (O(n) or better) algorithm that finds the closest neighbor of every point (e.g. n1's closest neighbor is n3, n2's closest neighbor is n4)? I was thinking of storing the result in a dictionary where the keys are the points and the values are their nearest neighbors.
There seem to be quite a lot of similar questions on SO, but I couldn't find any code in Python or answers that are not in other languages.
A simple solution that can do better than O(n) per query on average is to store the points in a k-d tree. Building the tree has worst-case complexity O(n log n); after that, each search is O(log n) on average and O(n) in the worst case (usually for points far away from all the other data).
You might also be interested in LSH (locality-sensitive hashing). It is an approximation approach, meaning you won't always get the exact answer, but for high-dimensional data it provides an important speedup, with complexity closely tied to the parameters chosen.
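For reference, the k-d tree route is only a few lines with SciPy (a sketch; note k=2 in the query, because the nearest hit for each query point is the point itself):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1000, 2)        # N points in the plane
tree = cKDTree(points)                  # build: O(n log n)
dists, idxs = tree.query(points, k=2)   # query every point at once
# Column 0 is the point itself, column 1 its nearest other point.
nearest = {tuple(p): tuple(points[j]) for p, j in zip(points, idxs[:, 1])}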
For a given point P, a simple solution:
Calculate P's distance to all other points; this takes O(n) time.
Save them in a list of tuples (or a similar data structure) of (point, distance from P).
One walk over the list yields the top K closest points, again in O(n) time.
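Applied to every point, the scan above gives the whole neighbor dictionary in O(n^2) overall. A minimal sketch, assuming the points are (x, y) tuples and using math.dist (Python 3.8+):

import math

def nearest_neighbors(points):
    result = {}
    for p in points:
        # O(n) scan of P's distance to every other point.
        result[p] = min((q for q in points if q != p),
                        key=lambda q: math.dist(p, q))
    return result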
A second solution costs more in space complexity (O(n^2)) and in maintenance over time, but makes finding the K nearest neighbors much faster:
Keep a dictionary from each point to an ordered list (or similar data structure) of the other points sorted by distance; building this table once costs O(n^2 * log(n)).
Finding the K closest points is then O(k): one dictionary access plus copying k elements from the ordered list.
The overhead in time complexity is in adding a new point or removing an old one while keeping the data structure valid: O(n*log(n)) for both.
Which solution to choose depends on what you want to optimize.
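A sketch of this second structure (the names are mine): the one-time build dominates at O(n^2 log n), after which every K-nearest query is just a slice:

import math

def build_table(points):
    # For each point, every other point sorted by distance: O(n^2 log n) once.
    return {p: sorted((q for q in points if q != p),
                      key=lambda q: math.dist(p, q))
            for p in points}

def k_nearest(table, p, k):
    # One dictionary access plus copying k elements: O(k).
    return table[p][:k]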

What's the fastest way of finding the longest chain of strings with common first/last characters?

This one seems pretty straightforward but I'm having trouble implementing it in Python.
Say you have an array of strings, all of length 2: ["ab","bc","cd","za","ef","fg"]
There are two chains here: "zabcd" and "efg"
What's the most efficient way of finding the longest one? (in this case it would be "zabcd").
Chains can be cyclical too... for instance ["ab","bc","ca"] (in this particular case the length of the chain would be 3).
This is clearly a graph problem, with the characters being the vertices and the pairs being unweighted directed edges.
Without allowing cycles in the solution, this is the longest path problem, which is NP-hard, so "efficient" is probably out of the window, even if cycles are allowed (to get rid of cycles in the solution, split each vertex in two, one for incoming edges and one for outgoing edges, with an edge in between). According to Wikipedia, no good approximation scheme is known.
If the graph is acyclic, then you can do it in linear time, as the Wikipedia article mentions:
"A longest path between two given vertices s and t in a weighted graph G is the same thing as a shortest path in a graph −G derived from G by changing every weight to its negation. Therefore, if shortest paths can be found in −G, then longest paths can also be found in G.[4] For most graphs, this transformation is not useful because it creates cycles of negative length in −G. But if G is a directed acyclic graph, then no negative cycles can be created, and a longest path in G can be found in linear time by applying a linear time algorithm for shortest paths in −G, which is also a directed acyclic graph.[4] For instance, for each vertex v in a given DAG, the length of the longest path ending at v may be obtained by the following steps:
Find a topological ordering of the given DAG. For each vertex v of the DAG, in the topological ordering, compute the length of the longest path ending at v by looking at its incoming neighbors and adding one to the maximum length recorded for those neighbors. If v has no incoming neighbors, set the length of the longest path ending at v to zero. In either case, record this number so that later steps of the algorithm can access it. Once this has been done, the longest path in the whole DAG may be obtained by starting at the vertex v with the largest recorded value, then repeatedly stepping backwards to its incoming neighbor with the largest recorded value, and reversing the sequence of vertices found in this way."
There are other special cases where there are efficient algorithms available, notably trees, but since you allow cycles that probably doesn't apply to you.
I haven't given you an algorithm here, but this should point your research in the right direction. The problem itself may look straightforward, but an efficient solution is NOT.
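Here is a minimal sketch of the quoted procedure applied to the original question, assuming the pair graph is acyclic (a 2-character string unpacks directly into its two letters; handling cycles would need the vertex-splitting trick mentioned above):

from collections import defaultdict, deque

def longest_chain(pairs):
    # Build the directed graph: "ab" is an edge a -> b.
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for a, b in pairs:
        succ[a].append(b)
        indeg[b] += 1
        nodes.update((a, b))
    # Topological order via Kahn's algorithm.
    order = []
    queue = deque(v for v in nodes if indeg[v] == 0)
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in succ[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Length of the longest path ending at each vertex, with predecessors.
    dist = {v: 0 for v in nodes}
    pred = {v: None for v in nodes}
    for v in order:
        for w in succ[v]:
            if dist[v] + 1 > dist[w]:
                dist[w] = dist[v] + 1
                pred[w] = v
    # Walk back from the vertex with the largest recorded value.
    end = max(nodes, key=dist.get)
    path = []
    while end is not None:
        path.append(end)
        end = pred[end]
    return ''.join(reversed(path))

print(longest_chain(["ab", "bc", "cd", "za", "ef", "fg"]))   # -> zabcd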

Check if path exists that contains a set of N nodes

Given a graph g and a set of N nodes my_nodes = [n1, n2, n3, ...], how can I check if there's a path that contains all N nodes?
Checking among all_simple_paths for paths that contain all nodes in my_nodes becomes computationally cumbersome as the graph grows.
The search can be limited to paths between pairs of nodes from my_nodes, but this reduces the complexity only slightly, and it requires a lot of Python-level looping, which is quite slow.
Is there a faster solution to the problem?
You could try a greedy algorithm here: start the path search from all of the nodes you need to find, and explore the graph step by step. I can't provide a real sample, but the pseudo-code would be something like this:
Start n path stubs, one from each of your n nodes to find.
Extend each path stub to all of its neighbors that haven't been visited before.
If two path stubs intersect, merge them into a new stub, which contains more of your needed nodes than either did before.
If after merging the stubs you have one that covers all the needed nodes, you're done.
If there are still nodes left to add to the path, continue with the second step.
If there are no unvisited nodes left in the graph, the path doesn't exist.
This algorithm has complexity O(E + N), because you visit the edges and nodes in a non-recursive fashion.
However, in the case of a directed graph the "merge" is a bit more complicated, yet still doable; in the worst case it may take a lot of time.
Update:
As you say the graph is directed, the above approach wouldn't work well. In that case you can simplify the task like this:
Find the strongly connected components of the graph (e.g., with Kosaraju's algorithm, which I suggest implementing yourself). The complexity is O(E + N). You can also use a NetworkX method for this if you want an out-of-the-box solution.
Create the condensation of the graph based on the information from step 1, saving which component can be reached from which. Again, there is a NetworkX method for this.
Now you can easily tell which nodes from your set are in the same component, so a path containing all of them definitely exists there.
After that, all that is left to check is the connectivity between the different components holding your nodes. For example, you can take a topological sort of the condensation and do the check in linear time again.
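A sketch of these steps with NetworkX: the condensation's documented 'members' node attribute gives the node-to-component mapping, and the final check is the topological-sort one, a necessary condition answered in linear time on the condensation:

import networkx as nx

def covering_path_possible(G, my_nodes):
    C = nx.condensation(G)      # DAG of the strongly connected components
    comp_of = {u: c for c, data in C.nodes(data=True) for u in data['members']}
    needed = {comp_of[n] for n in my_nodes}
    # All needed components must lie on a single chain of the DAG.
    order = [c for c in nx.topological_sort(C) if c in needed]
    return all(nx.has_path(C, a, b) for a, b in zip(order, order[1:]))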

Find cliques of length k in a graph

I'm working with graphs of ~200 nodes and ~3500 edges. I need to find all cliques of this graph. Using networkx's enumerate_all_cliques() works fine with smaller graphs of up to 100 nodes, but runs out of memory for bigger ones.
"This algorithm however, hopefully, does not run out of memory
since it only keeps candidate sublists in memory and
continuously removes exhausted sublists."source code for enumerate_all_cliques()
Is there maybe a way to return a generator of all cliques of length k, instead of all cliques, in order to save memory?
It seems that your priority is to save memory rather than to get all cliques. In that case networkx.find_cliques(G) is a satisfactory solution: you get only the maximal cliques (complete subgraphs that cannot be extended by another node) instead of all cliques.
I compared the number of cliques returned by both functions:

import networkx as nx

G = nx.erdos_renyi_graph(300, 0.08)
print('All:', len(list(nx.enumerate_all_cliques(G))))
print('Maximal', len(list(nx.find_cliques(G))))
All: 6087
Maximal 2522
And as the number of edges in the graph increases, the difference between the two counts gets wider.
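If you specifically want a generator of cliques of exactly length k, one workaround (a sketch, not an existing NetworkX API) is to expand each maximal clique into its k-subsets and deduplicate; every k-clique is contained in some maximal clique, and only the set of already-seen k-cliques is kept in memory:

from itertools import combinations
import networkx as nx

def cliques_of_size_k(G, k):
    seen = set()
    for maximal in nx.find_cliques(G):   # generator of maximal cliques
        if len(maximal) < k:
            continue
        for combo in combinations(sorted(maximal), k):
            if combo not in seen:
                seen.add(combo)
                yield combo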
