Voronoi Tessellation in Python

Node Assignment Problem
The problem I want to solve is to tessellate the given map with the blue nodes (source nodes) as the input points. Once I am able to do this, I would like to see how many black nodes (demand nodes) fall within each cell and assign them to the blue node associated with that cell.
I would like to know if there is an easier way of doing this without using Fortune's algorithm. I came across the function mahotas.segmentation.gvoronoi(image) in the Mahotas library, but I am not sure if it will solve my problem.
Please also suggest a better way of doing this segmentation (other than Voronoi tessellation) if there is one. I am not sure if clustering algorithms would be a good choice. I am a programming newbie.

Here is an alternative approach to using Voronoi tessellation:
Build a k-d tree over the source nodes. Then, for every demand node, use the k-d tree to find the nearest source node and increment a counter associated with that source node.
The implementation of a k-d tree found at http://code.google.com/p/python-kdtree/ should be useful.
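If you'd rather not pull in that library, here is a minimal sketch of the same idea using SciPy's cKDTree (the coordinates below are made up for illustration):

import numpy as np
from scipy.spatial import cKDTree

sources = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 7.0]])  # blue (source) nodes, hypothetical
demands = np.array([[1.0, 1.0], [4.5, 0.5], [2.2, 6.0]])  # black (demand) nodes, hypothetical

tree = cKDTree(sources)
_, nearest = tree.query(demands)  # index of the nearest source node for each demand node
counts = np.bincount(nearest, minlength=len(sources))
print(counts)  # number of demand nodes assigned to each source node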

I've just been looking for the same thing and found this:
https://github.com/Softbass/py_geo_voronoi

There are not many points in your diagram. That suggests you can, for each demand node, simply iterate through all the source nodes and find the nearest one.
Perhaps this:
def distance(a, b):
    # Squared Euclidean distance; minimizing the square also minimizes
    # the true distance, so the square root can be skipped.
    return sum((xa - xb) ** 2 for (xa, xb) in zip(a, b))

def clusters(sources, demands):
    # Map each source node to the demand nodes nearest to it.
    result = dict((source, []) for source in sources)
    for demand in demands:
        nearest = min(sources, key=lambda s: distance(s, demand))
        result[nearest].append(demand)
    return result
This code will give you a dictionary, mapping source nodes to a list of all demand nodes which are closer to that source node than any other.
This isn't particularly efficient, but it's very simple!
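For example, with hypothetical coordinate tuples (the sources must be hashable, which tuples are):

sources = [(0, 0), (5, 5)]
demands = [(1, 1), (4, 6), (2, 2)]
print(clusters(sources, demands))
# {(0, 0): [(1, 1), (2, 2)], (5, 5): [(4, 6)]}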

I think the spatial-index answer by https://stackoverflow.com/users/1062447/wye-bee (a k-d tree, for example) is the easiest solution to your problem.
Additionally, you also asked whether there is an easier alternative to Fortune's algorithm; for that particular question I refer you to: Easiest algorithm of Voronoi diagram to implement?

You did not say why you wanted to avoid Fortune's algorithm. I assume you meant that you just didn't want to implement it yourself; it has already been implemented in a script by Bill Simons and Carston Farmer, so computing the Voronoi diagram shouldn't be difficult.
Building on their script I made it even easier to use and uploaded it to PyPI under the name Pytess. You could use the pytess.voronoi() function with the blue points as input; it returns the original points along with their computed Voronoi polygons. Then you would have to assign each black point through point-in-polygon testing, which you could base on http://geospatialpython.com/2011/08/point-in-polygon-2-on-line.html.
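To sketch that route end to end (assuming, per the Pytess documentation, that pytess.voronoi() returns (point, polygon) pairs; the coordinates and the ray-casting helper below are illustrative, not part of Pytess):

import pytess

def point_in_polygon(pt, poly):
    # Standard ray-casting test: count boundary crossings of a ray from pt.
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

blue = [(10, 10), (50, 60), (90, 20)]   # source nodes, hypothetical
black = [(20, 15), (55, 55), (80, 25)]  # demand nodes, hypothetical

assignment = {}
for source, polygon in pytess.voronoi(blue):
    if source is not None:  # skip the bounding-box buffer corners
        assignment[source] = [p for p in black if point_in_polygon(p, polygon)]
print(assignment)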

Run this code in Mathematica. It's spectacular! (Yes, I know it is not Python, but ...)
pts3 = RandomReal[1, {50, 3}];
ListDensityPlot[pts3,
InterpolationOrder -> 0, ColorFunction -> "SouthwestColors", Mesh -> All]
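A rough Python counterpart of that picture, sketched with SciPy and Matplotlib (in 2-D rather than 3-D):

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

pts = np.random.rand(50, 2)  # 50 random points in the unit square
voronoi_plot_2d(Voronoi(pts))
plt.show()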

Related

Is there a more efficient way to solve the shortest path problem than networkx in Python?

I am calculating the shortest path from one source to one goal on a weighted graph with networkx and single_source_dijkstra.
However, I run into memory problems.
Is there a more efficient way to calculate this? An alternative to NetworkX? See my code:
cost, shortestpath = nx.single_source_dijkstra(graph, startpointcoords, secondptcoords, cutoff=10000000)
The bidirectional Dijkstra algorithm should produce a significant improvement; see bidirectional_dijkstra in the NetworkX documentation.
A good analogy would be in 3D: place one balloon at point x and expand it till it reaches point y. The amount of air you put in is proportional to the cube of the distance between them. Now put a balloon at each point and inflate both until they touch. The combined volume of air is only 1/4 of the original. In higher dimensions (which is a closer analogy to most networks), there is even more reduction.
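In code it is nearly a drop-in replacement; a sketch reusing the variable names from the question:

import networkx as nx

# Same (cost, path) return convention as single_source_dijkstra with a target.
cost, shortestpath = nx.bidirectional_dijkstra(graph, startpointcoords, secondptcoords)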
Apparently the A* algorithm in NetworkX is far more efficient. Afterwards I calculate the length of the resulting path with the Dijkstra call I posted.
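A sketch of that A* route, assuming the node names are coordinate tuples (as startpointcoords suggests) and that the edge attribute is called "weight":

import math
import networkx as nx

def euclidean(u, v):
    # Admissible heuristic only if edge weights are at least the
    # straight-line distance between their endpoints (an assumption).
    return math.dist(u, v)

path = nx.astar_path(graph, startpointcoords, secondptcoords, heuristic=euclidean)
cost = sum(graph[u][v]["weight"] for u, v in zip(path, path[1:]))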
Perhaps try using another algorithm? Your graph may have many vertices but few edges, in which case you could use Bellman-Ford (bellman_ford_path() in NetworkX).
Another solution would be to use another Python package; the answers to this question list several possible libraries.
The last solution would be to implement your own algorithm, perhaps Gabow's algorithm, but you would have to be very efficient, for example by using NumPy with Numba.

Efficient method for counting number of data points inside sphere of fixed radius centered on each data point

I have a database with many data points, each with an x, y, z coordinate. I want to count the number of points that are within a certain distance of neighboring points. Some points will have a pair within a radius R; others will not. I simply want to count the number of pairs within some distance. I could easily write an algorithm to do this, but it would not be efficient enough, since it would iterate over every single data point.
This seems like something that must already exist in astropy, scipy, etc., but I cannot seem to find what I am looking for. Is there anything out there that accomplishes this?
As mentioned by @Davis Herring in the comments, an efficient option is a k-d tree.
The k-d tree is an algorithm that avoids the brute-force approach and allows for efficient distance computations* (see bottom of answer for background).
There are several Python implementations of this, one of which is through SciPy:
scipy.spatial.cKDTree (faster, since it is implemented in C/Cython)
scipy.spatial.KDTree (pure Python)
You can use this by first constructing a k-d tree for your xyz data:
import numpy as np #for later code
from scipy.spatial import cKDTree
kdtree = cKDTree(xyzData)
Then, you must query the k-d tree with each point to compute the distance between that point and its nearest neighbor. The output of this query is the distance NN_dist between the point and its nearest neighbor and the index NN_idx of that neighbor. To compute this for all of your points we need a for loop, but given the k-d tree algorithm, this is much faster than a brute-force computation:
NN_dists = np.zeros(numPoints)  # pre-allocate an array to store the distances
for i in range(numPoints):
    point = xyzData[i]
    NN_dist, NN_idx = kdtree.query(point, k=[1])
    # Note: 'k' specifies the kth neighbor distance to compute,
    # so set k=[2] if you end up finding the point as its own "neighbor":
    if NN_dist == 0:
        NN_dist, NN_idx = kdtree.query(point, k=[2])
    NN_dists[i] = NN_dist
(see k-d tree query for more details).
Then, to find the distances that are below some threshold, you could use the built-in utility of NumPy arrays when using comparison operators (like <):
distanceThres = 10
goodIdx = NN_dists < distanceThres
goodPoints = xyzData[goodIdx]
This will give you the boolean mask goodIdx and the points goodPoints that are within your specified distance threshold distanceThres (though you may have to adapt this code to the shape/format of your xyz coordinate data).
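Since the question asks for the number of pairs within a radius, note that the same tree can answer that directly; a sketch using SciPy's query_pairs:

# Set of index pairs (i, j), i < j, whose points lie within distanceThres:
pairCount = len(kdtree.query_pairs(r=distanceThres))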
*A light background on k-d trees (glossing over fine details -- see references for more): the k-d tree method partitions a dataset in such a way that avoids computing the distance between every single point (i.e., the brute force method). It does this by dividing the dataset into binary space partitions to construct a k-d tree. These partitions are such that a distance computation (e.g., a nearest-neighbor search) can ignore datapoints that are in distant partitions. Additionally, this same k-d tree is reused for each point.
There are a lot of resources online about k-d trees in general. I found these references most helpful when I was learning about this algorithm: Stanford k-d trees or Princeton k-d trees.
Let me know if you have questions -- I had this exact problem myself during an astronomy project, so I may be able to help more.
I don't have direct experience with it, but scipy.spatial.distance.pdist may be what you're looking for.
This link may be helpful as well. It gives an example of how to solve the problem as I understand it.
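A sketch of the pdist route, reusing the xyzData array from the answer above (R is a hypothetical radius):

import numpy as np
from scipy.spatial.distance import pdist

R = 10.0
# pdist returns the condensed vector of all pairwise distances, so counting
# pairs closer than R is a one-liner, at O(n^2) time and memory though.
pairCount = np.count_nonzero(pdist(xyzData) < R)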

Algorithm designing

Which one is more suitable for designing an algorithm that produces all the paths between two vertices in a directed graph?
Backtracking
Divide and conquer
Greedy approach
Dynamic programming
I was thinking of backtracking, because of its connection to BFS and DFS, but I am not sure. Thank you.
Note that there can be an exponential number of paths in your output.
Indeed, in a directed graph of n vertices having an edge i -> j for every pair i < j, there are 2^(n-2) paths from 1 to n: each vertex except the endpoints can be either present in the path or omitted.
So, if we really want to output all paths (and not, e.g., build a clever lazy structure to list them one by one later), no advanced technique can achieve polynomial complexity here.
The simplest way to find all the simple paths is to recursively construct a path, adding the current path to the answer once we arrive at the end vertex.
To improve on that, we can use backtracking.
Indeed, for each vertex we can first compute, in polynomial time, whether the final vertex is reachable from it, and later use only the vertices for which the answer was positive, as in the sketch below.
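A minimal backtracking sketch of that recursion (the adjacency-dict representation and names are illustrative):

def all_paths(graph, start, end):
    # graph: dict mapping each vertex to a list of its successors.
    # For speed, first prune vertices from which 'end' is unreachable,
    # as described above; that check is omitted here for brevity.
    paths, stack, seen = [], [start], {start}

    def backtrack(u):
        if u == end:
            paths.append(list(stack))
            return
        for v in graph.get(u, []):
            if v not in seen:  # keep paths simple
                seen.add(v)
                stack.append(v)
                backtrack(v)
                stack.pop()    # undo the choice (the backtracking step)
                seen.discard(v)

    backtrack(start)
    return paths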

Python Implementation of OPTICS (Clustering) Algorithm

I'm looking for a decent implementation of the OPTICS algorithm in Python. I will use it to form density-based clusters of points ((x,y) pairs).
I'm looking for something that takes in (x,y) pairs and outputs a list of clusters, where each cluster in the list contains a list of (x, y) pairs belonging to that cluster.
I'm not aware of a complete and exact python implementation of OPTICS. The links posted here seem just rough approximations of the OPTICS idea. They also do not use an index for acceleration, so they will run in O(n^2) or more likely even O(n^3).
OPTICS has a number of tricky things besides the obvious idea. In particular, the thresholding is proposed to be done with relative thresholds ("xi") instead of absolute thresholds as posted here (at which point the result will be approximately that of DBSCAN!).
The original OPTICS paper contains a suggested approach to converting the algorithm's output into actual clusters:
http://www.dbs.informatik.uni-muenchen.de/Publikationen/Papers/OPTICS.pdf
The OPTICS implementation in Weka is essentially unmaintained and just as incomplete. It doesn't actually produce clusters, it only computes the cluster order. For this it makes a duplicate of the database - it isn't really Weka code.
There seems to be a rather extensive implementation available in ELKI in Java by the group that published OPTICS in the first place. You might want to test any other implementation against this "official" version.
EDIT: the following is known to not be a complete implementation of OPTICS.
I did a quick search and found the following (Optics). I can't vouch for its quality, however the algorithm seems pretty simple, so you should be able to validate/adapt it quickly.
Here is a quick example of how to build clusters on the output of the optics algorithm:
def cluster(order, distance, points, threshold):
    ''' Given the output of the optics algorithm,
    compute the clusters:

    @param order     The order of the points
    @param distance  The relative distances of the points
    @param points    The actual points
    @param threshold The threshold value to cluster on
    @returns A list of cluster groups
    '''
    clusters = [[]]
    points = sorted(zip(order, distance, points))
    splits = ((v > threshold, p) for i, v, p in points)
    for iscluster, point in splits:
        if iscluster:
            clusters[-1].append(point)
        elif len(clusters[-1]) > 0:
            clusters.append([])
    return clusters

rd, cd, order = optics(points, 4)
print(cluster(order, rd, points, 38.0))
While not technically OPTICS there is an HDBSCAN* implementation for python available at https://github.com/lmcinnes/hdbscan . This is equivalent to OPTICS with an infinite maximal epsilon, and a different cluster extraction method. Since the implementation provides access to the generated cluster hierarchy you can extract clusters from that via more traditional OPTICS methods as well if you would prefer.
Note that despite not limiting the epsilon parameter this implementation still achieves O(n log(n)) performance using kd-tree and ball-tree based minimal spanning tree algorithms, and can handle quite large datasets.
There now exists the library pyclustering that contains, amongst others, a Python and a C++ implementation of OPTICS.
It is now implemented in the development version (scikit-learn v0.21.dev0) of sklearn (a clustering and machine learning module for Python).
Here is the link:
https://scikit-learn.org/dev/modules/generated/sklearn.cluster.OPTICS.html
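A minimal sketch against that API (the data is made up; see the linked page for the full parameter list):

import numpy as np
from sklearn.cluster import OPTICS

X = np.array([[1.0, 2.0], [2.0, 2.0], [8.0, 8.0], [8.0, 9.0], [25.0, 80.0]])
labels = OPTICS(min_samples=2).fit_predict(X)
print(labels)  # one cluster label per (x, y) pair; -1 marks noise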
See "Density-based clustering approaches" on
http://www.chemometria.us.edu.pl/index.php?goto=downloads
You could also look at a space-filling curve (SFC) or a spatial index. An SFC reduces the 2-D problem to a 1-D one. Have a look at Nick's hilbert curve quadtree spatial index blog, or download my implementation of an SFC at phpclasses.org (hilbert-curve).

How to represent graphs/trees in Python, and how to detect cycles?

I want to implement Kruskal's algorithm in Python. How can I go about representing the tree/graph, and what approach should I follow to detect cycles?
The simplest way of representing it (in my opinion) is by using a dict of lists:
graph = {}
graph[node_id] = [other_node_id for other_node_id in neighbors(node_id)]
A simple way of finding cycles is by using a breadth-first or depth-first search:
def df(node):
    if visited(node):
        # Found a cycle here; handle it, then stop descending.
        return
    visit(node)
    for node_id in graph[node]:
        df(node_id)
Disclaimer: this is just a sketch; neighbors(), visited() and visit() are dummies to show what the algorithm should look like.
Python Graph API is a good place to start.
For example, NetworkX has a Kruskal's algorithm implementation to find the minimum spanning tree.
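For example (a sketch with made-up weighted edges):

import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([(1, 2, 4.0), (2, 3, 1.0), (1, 3, 2.0)])
T = nx.minimum_spanning_tree(G, algorithm="kruskal")
print(sorted(T.edges(data="weight")))  # the edges Kruskal keeps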
If you want to re-invent the wheel and do it yourself, that is also possible.
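If you do, the usual way to detect cycles in Kruskal's algorithm is a union-find (disjoint-set) structure rather than a graph search: an edge (u, v) closes a cycle exactly when u and v are already in the same set. A sketch:

def kruskal(nodes, edges):
    # edges: iterable of (weight, u, v) tuples.
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:         # u and v are in different trees: no cycle
            parent[ru] = rv  # union the two trees
            mst.append((u, v, w))
    return mst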
