How to represent graphs/trees in Python and how to detect cycles?

I want to implement Kruskal's algorithm in Python. How should I represent the tree/graph, and what approach should I follow to detect cycles?

The simplest way of representing it (in my opinion) is a dict of lists (an adjacency list):
graph = {}
graph[node_id] = [other_node_id for other_node_id in neighbors(node_id)]
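Kruskal's algorithm needs edge weights, so one option is to store (neighbour, weight) pairs in the lists. Below is a minimal sketch that assumes an undirected graph given as a list of (u, v, weight) edges. The input format and the `build_graph` name are illustrative, not from the answer:

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency list: node -> list of (neighbour, weight)."""
    graph = defaultdict(list)
    for u, v, weight in edges:
        graph[u].append((v, weight))
        graph[v].append((u, weight))  # undirected: store the edge at both ends
    return dict(graph)


edges = [("a", "b", 4), ("b", "c", 1), ("a", "c", 2)]
graph = build_graph(edges)
# graph["a"] == [("b", 4), ("c", 2)]
```

Kruskal's algorithm itself only ever walks the edges in order of weight, so keeping the flat `edges` list around as well is often the more convenient input for it.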
A simple way of finding cycles is a breadth-first (BFS) or depth-first (DFS) search:
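def df(node, parent=None):
    if visited(node):
        return  # found a cycle here, do something with it
    visit(node)
    for node_id in graph[node]:
        if node_id != parent:  # undirected graph: don't walk straight back along the same edge
            df(node_id, node)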
Disclaimer: this is only a sketch; neighbors(), visited() and visit() are just dummies showing what the algorithm should look like.
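Here is one way to fill in the dummies with a plain `set`. It is a runnable sketch for an undirected graph stored as the dict of lists above (node -> list of neighbours). It assumes a simple graph with no self-loops or parallel edges, and it uses an explicit stack instead of recursion so that deep graphs don't hit Python's recursion limit. The `has_cycle` name is illustrative:

```python
def has_cycle(graph):
    """Return True if the undirected graph (node -> list of neighbours) has a cycle.

    Assumes a simple graph: no self-loops and no parallel edges.
    """
    visited = set()
    for root in graph:  # loop over roots so every connected component is checked
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, None)]  # (node, the node we reached it from)
        while stack:
            node, parent = stack.pop()
            for neighbour in graph[node]:
                if neighbour == parent:
                    continue  # the edge we arrived on, not a cycle
                if neighbour in visited:
                    return True  # already reached by another route: cycle
                visited.add(neighbour)
                stack.append((neighbour, node))
    return False
```

Inside Kruskal's algorithm, the "would this edge close a cycle?" question is more commonly answered with a union-find (disjoint-set) structure than with a fresh search for every edge.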

Python Graph API is a good place to start.
For example, NetworkX has a Kruskal's algorithm implementation to find the minimum spanning tree.
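Here is a minimal sketch of that route with a small made-up graph. `algorithm="kruskal"` is already the default for `nx.minimum_spanning_tree`, but it is spelled out here:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 4), ("b", "c", 1), ("a", "c", 2), ("c", "d", 5)])

# Kruskal's algorithm via NetworkX; returns a new Graph holding the tree edges
mst = nx.minimum_spanning_tree(G, weight="weight", algorithm="kruskal")
print(mst.edges(data="weight"))
```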
If you want to re-invent the wheel and do it yourself, that is also possible.
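If you do write it yourself, the usual cycle check in Kruskal's algorithm is a disjoint-set (union-find) structure rather than a graph search. An edge closes a cycle exactly when both of its endpoints already belong to the same set. This is a minimal sketch that assumes edges are given as (weight, u, v) tuples, a layout chosen here for convenience:

```python
def kruskal(nodes, edges):
    """Minimum spanning forest via Kruskal's algorithm with union-find.

    nodes: iterable of hashable node ids
    edges: iterable of (weight, u, v) tuples
    Returns the chosen (weight, u, v) edges in the order they were added.
    """
    parent = {n: n for n in nodes}
    rank = {n: 0 for n in nodes}

    def find(x):
        # follow parent links to the set's root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False  # same set already: this edge would close a cycle
        if rank[ra] < rank[rb]:  # union by rank keeps the trees shallow
            ra, rb = rb, ra
        parent[rb] = ra
        if rank[ra] == rank[rb]:
            rank[ra] += 1
        return True

    tree = []
    # sort on weight only, so node ids never need to be comparable
    for weight, u, v in sorted(edges, key=lambda e: e[0]):
        if union(u, v):
            tree.append((weight, u, v))
    return tree
```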

Related

Is there a more efficient way to calculate the shortest path problem than networkx in python?

I am calculating the shortest path from one source to one goal on a weighted graph with networkx and single_source_dijkstra.
However, I run into memory problems.
Is there a more efficient way to calculate this? An alternative to Networkx? See my code:
cost, shortestpath = nx.single_source_dijkstra(graph, startpointcoords, secondptcoords, cutoff=10000000)
The bidirectional Dijkstra algorithm should give a significant improvement. Here is the documentation.
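Switching is close to a one-line change, because `nx.bidirectional_dijkstra` also takes a source, a target and a `weight` attribute name and returns a (length, path) pair. Here is a small sketch with coordinate-tuple nodes like in the question. The graph itself is made up:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ((0, 0), (1, 0), 1.0),
    ((1, 0), (1, 1), 1.0),
    ((0, 0), (1, 1), 3.0),
])

# searches forward from the source and backward from the target until they meet
cost, path = nx.bidirectional_dijkstra(G, (0, 0), (1, 1), weight="weight")
# cost == 2.0, path == [(0, 0), (1, 0), (1, 1)]
```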
A good analogy would be in 3D: place one balloon at point x and expand it till it reaches point y. The amount of air you put in is proportional to the cube of the distance between them. Now put a balloon at each point and inflate both until they touch. The combined volume of air is only 1/4 of the original. In higher dimensions (which is a closer analogy to most networks), there is even more reduction.
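Here is the arithmetic behind the analogy. Write d for the distance between x and y, and use the fact that the volume of a ball of radius r grows as r^n in n dimensions:

$$V_1 = \tfrac{4}{3}\pi d^3, \qquad V_2 = 2 \cdot \tfrac{4}{3}\pi \left(\tfrac{d}{2}\right)^3 = \tfrac{1}{4} V_1$$

$$\text{in } n \text{ dimensions: } \frac{V_2}{V_1} = \frac{2\,(d/2)^n}{d^n} = 2^{1-n}$$

So for n = 3 the two balloons need a quarter of the air, and each extra dimension halves the ratio again.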
Apparently NetworkX's A* algorithm is much more efficient here. Afterwards, I calculate the length of the resulting path with the Dijkstra call I posted.
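A sketch of the A* route, assuming the nodes are coordinate tuples and that no edge weight is shorter than the straight-line distance between its endpoints. That assumption is what makes the Euclidean heuristic admissible. `nx.astar_path_length` returns the cost directly, so a second Dijkstra run isn't strictly needed. The graph and the `euclidean` helper are illustrative:

```python
import math

import networkx as nx


def euclidean(a, b):
    # straight-line distance between coordinate-tuple nodes; admissible only if
    # every edge weight is at least the distance between its endpoints
    return math.dist(a, b)


G = nx.Graph()
G.add_weighted_edges_from([
    ((0, 0), (1, 0), 1.0),
    ((1, 0), (1, 1), 1.0),
    ((0, 0), (1, 1), 3.0),
])

path = nx.astar_path(G, (0, 0), (1, 1), heuristic=euclidean, weight="weight")
length = nx.astar_path_length(G, (0, 0), (1, 1), heuristic=euclidean, weight="weight")
```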
Perhaps try another algorithm? Your graph may have many vertices but few edges, in which case you could use Bellman-Ford (bellman_ford_path() in NetworkX).
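Calling it looks the same as the Dijkstra functions. Bellman-Ford runs in O(V·E) time, while heap-based Dijkstra runs in O((V + E) log V). Its main advantage is tolerating negative edge weights, so it is worth benchmarking on your own graph before switching. Here is a sketch with a made-up graph:

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ((0, 0), (1, 0), 1.0),
    ((1, 0), (1, 1), 1.0),
    ((0, 0), (1, 1), 3.0),
])

path = nx.bellman_ford_path(G, (0, 0), (1, 1), weight="weight")
length = nx.bellman_ford_path_length(G, (0, 0), (1, 1), weight="weight")
```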
Another solution would be to use a different Python package; for example, the answers to this question list several possible libraries.
The last option would be to implement your own algorithm, perhaps Gabow's algorithm, but you would have to make it very efficient, for example by using NumPy with Numba.

Can sklearn.tree.export_graphviz be reused for one's own data structures?

I've been looking for a good way in Python to draw an abstract syntax tree to PNG. A combination of networkx and matplotlib seems to be able to do the job well enough to get by.
But I just noticed that https://scikit-learn.org/stable/modules/generated/sklearn.tree.export_graphviz.html does much better! This applies when using sklearn to generate a random forest; it is a function specific to the resulting decision trees.
Is there a way to supply an arbitrary tree to the above function, or to some version of the code behind it, to obtain the high-quality rendering?
You could use plain Graphviz. There are examples of how to draw your own data structures.
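One plain-Graphviz route is to do what export_graphviz does and produce DOT source text for your own tree. You can then render it with Graphviz's `dot` tool, e.g. `dot -Tpng tree.dot -o tree.png`. The function below is a hypothetical sketch that uses only the standard library's `ast` module to turn a Python AST into DOT. Each box is labelled with the node's class name:

```python
import ast
import itertools


def ast_to_dot(tree: ast.AST) -> str:
    """Return Graphviz DOT source drawing `tree` top-down, one box per AST node."""
    ids = itertools.count()
    lines = ["digraph AST {", "    node [shape=box];"]

    def visit(node: ast.AST) -> str:
        node_id = f"n{next(ids)}"  # fresh DOT id for every node visited
        lines.append(f'    {node_id} [label="{type(node).__name__}"];')
        for child in ast.iter_child_nodes(node):
            child_id = visit(child)
            lines.append(f"    {node_id} -> {child_id};")
        return node_id

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


dot_source = ast_to_dot(ast.parse("x = 1 + 2"))
```

The same pattern works for any tree structure of your own, as long as you can get each node's label and list its children. Write `dot_source` to a `.dot` file and pass it to `dot`.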

Graph matching algorithms

I've been searching for graph matching algorithms written in Python but I haven't been able to find much.
I'm currently trying to match two different graphs that derive from two distinct sets of character sequences. I know that there is an underlying connection between the two graphs, more precisely a one-to-one mapping between the nodes. But the graphs don't have the same labels, so I need graph matching algorithms that return node mappings just by comparing topology and/or attributes. By testing, I hope to maximize correct matches.
I've been using Blondel and Heymans from the graphsim package and intend to also use Tacsim from the same package.
I would like to test other options, probably more standard, like maximum subgraph isomorphism or finding subgraphs with very good matchings between the two graphs. Graph edit distance might also help if it manages to give a matching.
The problem is that I can't find anything implemented, even in Networkx that I'm using. Does anyone know of any Python implementations? Would be a plus if those options used Networkx.
I found this implementation of Graph Edit Distance algorithms which uses NetworkX in Python.
https://github.com/Jacobe2169/GMatch4py
"GMatch4py is a library dedicated to graph matching. Graph structure are stored in NetworkX graph objects. GMatch4py algorithms were implemented with Cython to enhance performance."
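NetworkX itself also provides graph edit distance. `nx.optimize_edit_paths` yields successively cheaper edit paths, and each node edit path is a list of (u, v) pairs from which a node mapping can be read off. Here is a minimal sketch; the helper name is hypothetical. Exact graph edit distance is exponential in the worst case, so this is only practical for small graphs:

```python
import networkx as nx


def best_edit_mapping(g1, g2):
    """Return (mapping, cost) for the cheapest edit path NetworkX finds.

    mapping sends nodes of g1 to nodes of g2; nodes that the edit path
    deletes or inserts (paired with None) are left out.
    """
    best = None
    # each yielded edit path is cheaper than the previous one; keep the last
    for node_path, _edge_path, cost in nx.optimize_edit_paths(g1, g2):
        best = (node_path, cost)
    node_path, cost = best
    mapping = {u: v for u, v in node_path if u is not None and v is not None}
    return mapping, cost


# two paths with unrelated labels: a-b-c and x-y-z
g1 = nx.Graph([("a", "b"), ("b", "c")])
g2 = nx.Graph([("x", "y"), ("y", "z")])
mapping, cost = best_edit_mapping(g1, g2)
```

With the default costs, substituting one node for another is free, so for two graphs with the same topology the best edit path has cost 0. The mapping then comes purely from the structure; here the middle node "b" is matched to "y".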

Finding the Path of all Edges on a Graph

I'm trying to get the path on a graph which covers all edges, and traverses them only once.
This means there will only be two "end" points, each with an odd number of connecting edges. These end points would either have one connecting edge, or be part of a loop and have three connections.
So in the simple case below I need to traverse the nodes in this order 1-2-3-4-5 (or 5-4-3-2-1):
In the more complicated case below the path would be 1-2-3-4-2 (or 1-2-4-3-2):
Below is also a valid graph, with 2 end-points: 1-2-4-3-2-5
I've tried to find the name of an algorithm to solve this, and thought it was the "Chinese Postman Problem", but implementing this based on code at https://github.com/rkistner/chinese-postman/blob/master/postman.py didn't provide the results I expected.
The Eulerian path looks almost like what is needed, but the networkx implementation will only work for closed (looped) networks.
I also looked at a Hamiltonian Path - and tried the networkx algorithm - but the graph types were not supported.
Ideally I'd like to use Python and networkx to implement this, and there may be a simple solution that is already part of the library, but I can't seem to find it.
You're looking for an Eulerian path, which visits every edge exactly once. You can use Fleury's algorithm to generate the path. Fleury's algorithm has O(E^2) time complexity; if you need a more efficient algorithm, look at Hierholzer's algorithm, which is O(E).
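Here is a minimal sketch of Hierholzer's algorithm for an undirected graph given as a list of (u, v) edges. The input format and the function name are assumptions. If exactly two vertices have odd degree, the path must start at one of them; if none do, any vertex works and the path is a closed circuit. Each edge is removed from the adjacency lists at most twice, which keeps the running time linear in the number of edges:

```python
from collections import defaultdict


def eulerian_path(edges):
    """Hierholzer's algorithm for an undirected graph given as (u, v) pairs.

    Returns the vertex sequence of a path that uses every edge exactly once,
    or None if no such path exists.
    """
    if not edges:
        return []
    adj = defaultdict(list)  # vertex -> list of (neighbour, edge index)
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    odd = [v for v in adj if len(adj[v]) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # an Eulerian path needs 0 or 2 odd-degree vertices
    start = odd[0] if odd else next(iter(adj))
    used = [False] * len(edges)
    stack, path = [start], []
    while stack:
        v = stack[-1]
        # drop edges at v that were already walked from their other end
        while adj[v] and used[adj[v][-1][1]]:
            adj[v].pop()
        if adj[v]:
            w, i = adj[v].pop()
            used[i] = True
            stack.append(w)
        else:
            path.append(stack.pop())  # no unused edges left here: emit vertex
    if len(path) != len(edges) + 1:
        return None  # some edges were unreachable: the graph is disconnected
    return path[::-1]
```

On the question's second example, `eulerian_path([(1, 2), (2, 3), (3, 4), (4, 2)])` should return `[1, 2, 4, 3, 2]`.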
There is also an unmerged pull request for the networkx library that implements this. The source is easy to use.
(For networkx 1.11 the .edge has to be replaced with .edge_iter).
This is known as the Eulerian Path of a graph. It has now been added to NetworkX as eulerian_path().
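Here is a short sketch of using it. `nx.eulerian_path` yields the path as a sequence of edges, so the hypothetical helper below converts those edges into a node sequence. It returns None when `nx.has_eulerian_path` says no path exists:

```python
import networkx as nx


def eulerian_node_sequence(G):
    """Return the nodes of an Eulerian path of G, or None if there is none."""
    if not nx.has_eulerian_path(G):
        return None
    edges = list(nx.eulerian_path(G))  # consecutive (u, v) edges along the path
    return [edges[0][0]] + [v for _, v in edges]


G = nx.Graph([(1, 2), (2, 3), (3, 4), (4, 2)])
sequence = eulerian_node_sequence(G)  # starts and ends at the odd-degree nodes 1 and 2
```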

Voronoi Tessellation in Python

Node Assignment Problem
The problem I want to solve is to tessellate the given map using the blue nodes (source nodes) as input points. Once I am able to do this, I would like to see how many black nodes (demand nodes) fall within each cell and assign them to the blue node associated with that cell.
I would like to know if there is an easier way of doing this without using Fortune's algorithm. I came across a function in Mahotas called Mahotas.segmentation.gvoronoi(image), but I am not sure whether it will solve my problem.
Please also suggest whether there is a better way of doing this segmentation (other than Voronoi tessellation). I am not sure whether clustering algorithms would be a good choice. I am a programming newbie.
Here is an alternative approach to using Voronoi tessellation:
Build a k-d tree over the source nodes. Then for every demand node, use the k-d tree to find the nearest source node and increment a counter associated with that nearby source node.
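This works because a demand node lies in a source node's Voronoi cell exactly when that source is its nearest source node. A nearest-neighbour query therefore gives the same assignment without ever building the tessellation. The sketch below swaps in SciPy's `scipy.spatial.KDTree` for the package linked below, and the coordinates are made up:

```python
import numpy as np
from scipy.spatial import KDTree

sources = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.0]])  # blue nodes
demands = np.array([[1.0, 1.0], [9.0, 1.0], [5.0, 7.0], [2.0, 0.5]])  # black nodes

tree = KDTree(sources)
_, nearest = tree.query(demands)  # index of the closest source for each demand
counts = np.bincount(nearest, minlength=len(sources))  # demands per source/cell
```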
The implementation of a k-d tree found at http://code.google.com/p/python-kdtree/ should be useful.
I've just been looking for the same thing and found this:
https://github.com/Softbass/py_geo_voronoi
There aren't many points in your diagram. That suggests you can, for each demand node, just iterate through all the source nodes and find the nearest one.
Perhaps this:
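def distance(a, b):
    # squared Euclidean distance; enough for deciding which point is nearest
    return sum((xa - xb) ** 2 for (xa, xb) in zip(a, b))

def clusters(sources, demands):
    # sources must be hashable (e.g. (x, y) tuples) since they become dict keys
    result = {source: [] for source in sources}
    for demand in demands:
        nearest = min(sources, key=lambda s: distance(s, demand))
        result[nearest].append(demand)
    return result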
This code will give you a dictionary, mapping source nodes to a list of all demand nodes which are closer to that source node than any other.
This isn't particularly efficient, but it's very simple!
I think the spatial-index answer by https://stackoverflow.com/users/1062447/wye-bee (a k-d tree, for example) is the easiest solution to your problem.
Additionally, you did also ask is there an easier alternative to Fortune's algorithm and for that particular question I refer you to: Easiest algorithm of Voronoi diagram to implement?
You did not say why you wanted to avoid Fortune's algorithm. I assume you meant that you just didn't want to implement it yourself, but it has already been implemented in a script by Bill Simons and Carston Farmer, so computing the Voronoi diagram shouldn't be difficult.
Building on their script, I made it even easier to use and uploaded it to PyPI under the name Pytess. You could use the pytess.voronoi() function with the blue points as input; it returns the original points with their computed Voronoi polygons. You would then assign each black point through point-in-polygon testing, which you could base on http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html.
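The point-in-polygon step can be done with the classic ray-casting (even-odd) test. This sketch assumes each cell is available as a list of (x, y) vertices keyed by its source point. I have not checked the exact structure pytess.voronoi() returns, so adapt the input to it. Both function names are hypothetical:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count crossings of a ray going right from (x, y).

    polygon is a list of (x, y) vertices; an odd crossing count means inside.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # only edges straddling the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_cells(demands, cells):
    """Map each source to the demand points that fall inside its cell polygon."""
    result = {source: [] for source in cells}
    for point in demands:
        for source, polygon in cells.items():
            if point_in_polygon(point[0], point[1], polygon):
                result[source].append(point)
                break  # cells only share boundaries, so stop at the first hit
    return result
```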
Run this code in Mathematica. It's spectacular! (Yes, I know it is not Python, but ...)
pts3 = RandomReal[1, {50, 3}];
ListDensityPlot[pts3,
InterpolationOrder -> 0, ColorFunction -> "SouthwestColors", Mesh -> All]
