Simulation of random points on a multivariate convex hull with scipy - python

I have a dataset with 10,000 rows and 10 columns. The first goal of my study is to compute the convex hull of this data. The "scipy" package does this easily, and I can get the vertices and the parameters of the facet hyperplanes, of the form b0 + b1.x1 + b2.x2 + ... + b10.x10 = 0, where (b0, b1, ..., b10) are the parameters of one facet of the convex hull (I can also tell which vertices lie on it).
from scipy.spatial import ConvexHull, convex_hull_plot_2d
import numpy as np
fit_hull = ConvexHull(data)
V = fit_hull.vertices
parameters = fit_hull.equations
My question is: knowing all of this, how can I simulate uniformly distributed random points on the convex hull?
It is difficult because, although it is quite simple to simulate random points on a hyperplane, here each facet is a piece of a hyperplane bounded by the facet's vertices (for example, with 3 variables a facet needs three points, so it would be a triangle).
Thank you so much
Have a nice day (from France)

Make a Delaunay tessellation of your convex hull. In 2D these are triangles, in 3D these are tetrahedra, and you can get their area/volume.
Pick a triangle/tetrahedron at random, with probabilities given by the normalized areas/volumes.
Pick a point uniformly in this triangle/tetrahedron.
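The three steps above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name sample_in_hull is hypothetical, and the use of Dirichlet(1, ..., 1) barycentric weights to draw a uniform point inside a simplex is a standard technique that the answer itself does not spell out. Note that it samples inside the hull (its volume), as the answer describes, and that a Delaunay tessellation may become very expensive in high dimension.

```python
import numpy as np
from math import factorial
from scipy.spatial import Delaunay

def sample_in_hull(points, n_samples, rng=None):
    """Draw points uniformly inside the convex hull of `points` (hypothetical helper)."""
    rng = np.random.default_rng(rng)
    points = np.asarray(points, dtype=float)
    dim = points.shape[1]

    # Step 1: tessellate the hull into simplices and compute their volumes.
    tri = Delaunay(points)
    simplices = points[tri.simplices]                  # (n_simplices, dim + 1, dim)
    edges = simplices[:, 1:, :] - simplices[:, :1, :]  # (n_simplices, dim, dim)
    volumes = np.abs(np.linalg.det(edges)) / factorial(dim)

    # Step 2: pick simplices with probability proportional to volume.
    chosen = rng.choice(len(simplices), size=n_samples, p=volumes / volumes.sum())

    # Step 3: uniform point in each chosen simplex via Dirichlet(1, ..., 1)
    # barycentric weights (assumption: not stated in the answer).
    weights = rng.dirichlet(np.ones(dim + 1), size=n_samples)
    return np.einsum("ij,ijk->ik", weights, simplices[chosen])

data = np.random.default_rng(0).random((200, 3))
samples = sample_in_hull(data, 500, rng=1)
print(samples.shape)
```

Every returned row is a convex combination of the vertices of one simplex, so it always lies inside the hull.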

Related

Clustering on evenly spaced grid points

I have a 50 by 50 grid of evenly spaced (x,y) points. Each of these points has a third scalar value. This can be visualized using a contour plot, which I have added. I am interested in the regions indicated by the red circles. These regions of low "Z-values" are what I want to extract from this data.
2D contour plot of 50 x 50 evenly spaced grid points:
I want to do this by using clustering (machine learning), which can be lightning quick when applied correctly. The problem is, however, that the points are evenly spaced together and therefore the density of the entire dataset is equal everywhere.
I have tried using a DBSCAN algorithm with a custom distance metric which takes into account the Z values of each point. I have defined the distance between two points as follows:
import numpy as np

def custom_distance(point1, point2):
    # Euclidean distance in the (x, y) plane, scaled by the mean Z value.
    average_Z = (point1[2] + point2[2]) / 2
    distance = np.sqrt(np.square(point1[0] - point2[0]) + np.square(point1[1] - point2[1]))
    distance = distance * average_Z
    return distance
This essentially computes the Euclidean distance between two points and multiplies it by the average of their two Z values. In the picture below I have tested this distance function in a DBSCAN algorithm. Each point in this 50 by 50 grid has a Z value of 1, except for four clusters that I placed at random, whose points each have a Z value of 10. The algorithm is able to find the clusters in the data based on their Z value, as can be seen below.
DBSCAN clustering result using scalar value distance determination:
Positive about the results I tried to apply it to my actual data, only to be disappointed by the results. Since the x and y values of my data are very large, I have simply scaled them to be 0 to 49. The z values I have left untouched. The results of the clustering can be seen in the image below:
Clustering result on original data:
This does not come close to what I want and what I was expecting. For some reason the clusters that are found are of rectangular shape and the light regions of low Z values that I am interested in are not extracted with this approach.
Is there any way I can make the DBSCAN algorithm work in this way? I suspect it is currently not working because of the differences in scale between the x, y and z values. I am also open to tips or recommendations on other approaches to defining and finding the lighter regions in the data.

Random point from a multidimensional ball in Python [duplicate]

I've looked around and all solutions for generating uniform random points in/on the unit ball are designed for 2 or 3 dimensions.
What is a (tractable) way to generate uniform random points inside a ball in arbitrary dimension? Particularly, not just on the surface of the ball.
To preface, generating random points in the cube and throwing out the points with norm greater than 1 is not feasible in high dimension: the ratio of the volume of the unit ball to the volume of its enclosing cube [-1, 1]^d goes to 0 as d grows. Even in 10 dimensions only about 0.25% of random points in that cube are also inside the unit ball.
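The acceptance rate of that rejection scheme can be checked with the closed-form volume of a d-dimensional unit ball, pi^(d/2) / Gamma(d/2 + 1). The short sketch below assumes the enclosing cube is [-1, 1]^d; the function name ball_to_cube_ratio is hypothetical.

```python
from math import gamma, pi

def ball_to_cube_ratio(d):
    """Volume of the unit d-ball divided by the volume of the cube [-1, 1]^d."""
    ball_volume = pi ** (d / 2) / gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume

# Fraction of uniformly drawn cube points that rejection sampling would keep.
for d in (2, 3, 10, 20):
    print(d, ball_to_cube_ratio(d))
```

For d = 2 this is pi/4, and for d = 10 it is roughly 0.0025, matching the 0.25% figure quoted above.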
The best way to generate uniformly distributed random points in a d-dimension ball appears to be by thinking of polar coordinates (directions instead of locations). Code is provided below.
Pick a random direction, i.e. a point on the surface of the unit ball, with uniform distribution.
Pick a random radius where the likelihood of a radius corresponds to the surface area of a ball with that radius in d dimensions.
This selection process will (1) make all directions equally likely, and (2) make all points on the surface of balls within the unit ball equally likely. This will generate our desired uniformly random distribution over the entire interior of the ball.
Picking a random direction (on the unit ball)
In order to achieve (1) we can randomly generate a vector from d independent draws of a Gaussian distribution and normalize it to unit length. This works because a Gaussian distribution has a probability density function (PDF) with x^2 in an exponent. That implies that the joint distribution (for independent random variables this is the product of their PDFs) will have (x_1^2 + x_2^2 + ... + x_d^2) in the exponent. Notice that this resembles the definition of a sphere in d dimensions, meaning the joint distribution of d independent samples from a Gaussian distribution is invariant to rotation (the normalized vectors are uniform over a sphere).
Here is what 200 random points generated in 2D look like.
Picking a random radius (with appropriate probability)
In order to achieve (2) we can generate a radius by using the inverse of a cumulative distribution function (CDF) whose density corresponds to the surface area of a ball in d dimensions with radius r. That surface area is proportional to r^(d-1), so the probability that the radius is at most r (the enclosed volume fraction) is proportional to r^d, and we can use r^d over the range [0,1] as a CDF. A random sample is then generated by mapping uniform random numbers in the range [0,1] through the inverse, u^(1/d).
Here is a visual of the CDF x^2 (for two dimensions); randomly generated numbers in [0,1] get mapped to the corresponding x coordinate on this curve (e.g. .1 ➞ .316).
Code for the above
Finally, here is some Python code (assumes you have NumPy installed) that computes all of the above.
# Generate "num_points" random points in "dimension" that have uniform
# probability over the unit ball scaled by "radius" (length of points
# are in range [0, "radius"]).
def random_ball(num_points, dimension, radius=1):
    from numpy import random, linalg
    # First generate random directions by normalizing the length of a
    # vector of random-normal values (these distribute evenly on ball).
    random_directions = random.normal(size=(dimension,num_points))
    random_directions /= linalg.norm(random_directions, axis=0)
    # Second generate a random radius with probability proportional to
    # the surface area of a ball with a given radius.
    random_radii = random.random(num_points) ** (1/dimension)
    # Return the list of random (direction & length) points.
    return radius * (random_directions * random_radii).T
For posterity, here is a visual of 5000 random points generated with the above code.
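One way to sanity-check the uniformity claim is to compare the fraction of samples falling inside a smaller concentric ball with the volume ratio, which is r^d for radius r. The sketch below is self-contained and re-implements the same two-step sampler under the hypothetical name sample_ball, using NumPy's Generator API. It is an empirical check under those assumptions, not a proof.

```python
import numpy as np

def sample_ball(num_points, dimension, rng):
    """Uniform samples in the unit ball: Gaussian direction times u**(1/d) radius."""
    directions = rng.normal(size=(num_points, dimension))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    radii = rng.random(num_points) ** (1.0 / dimension)
    return directions * radii[:, None]

rng = np.random.default_rng(0)
dimension = 3
points = sample_ball(200_000, dimension, rng)
norms = np.linalg.norm(points, axis=1)

# For a uniform distribution, the share of points with norm <= 0.5
# should be close to the volume ratio 0.5 ** dimension.
inner_fraction = np.mean(norms <= 0.5)
print(inner_fraction, 0.5 ** dimension)
```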

Finding two most far away points in plot with many points in Python

I need to find the two points that are farthest from each other.
As the screenshots show, I have an array containing two other arrays, one for the X and one for the Y coordinates. What's the best way to determine the longest line through the data? In other words, I need to select the two points in the plot that are farthest apart. Below are some screenshots to help explain the problem.
You can avoid computing all pairwise distances by observing that the two points which are furthest apart will occur as vertices in the convex hull. You can then compute pairwise distances between fewer points.
For example, with 100,000 points distributed uniformly in a unit square, there are only 22 points in the convex hull in my instance.
import numpy as np
from scipy import spatial
# test points
pts = np.random.rand(100_000, 2)
# two points which are furthest apart will occur as vertices of the convex hull
candidates = pts[spatial.ConvexHull(pts).vertices]
# get distances between each pair of candidate points
dist_mat = spatial.distance_matrix(candidates, candidates)
# get indices of candidates that are furthest apart
i, j = np.unravel_index(dist_mat.argmax(), dist_mat.shape)
print(candidates[i], candidates[j])
# e.g. [ 1.11251218e-03 5.49583204e-05] [ 0.99989971 0.99924638]
If your data is 2-dimensional, you can compute the convex hull in O(N*log(N)) time where N is the number of points. By concentration of measure, this method deteriorates in performance for many common distributions as the number of dimensions grows.

How to get volume under a surface given by irregular data points using python?

Given a set of data (x,y,z) with z>0 (around 10^4 points), I want to find the volume under the surface it describes using python.
All these data points are randomly generated and unsorted, but from the 3D plot the surface is smooth and vanishing. See the plotted data here.
So far, I have checked the following related topics
Using Convex Hull: Among other errors, this one: QhullError: QH6114 qhull precision error: initial simplex is not convex.
Trapezoidal: Although it runs, I don't trust the result, since even for a generated half-sphere shell the volume was wrong.
Other Convex Hull: Same errors as before.
I also thought I could make a grid, interpolate, and then compute the double integral
import numpy as np
import scipy.interpolate
# Given independent numpy.ndarray objects x2, y2, and z2
# Set up a regular grid of interpolation points
x1i = np.linspace(x2.min(), x2.max(), 1000)
y1i = np.linspace(y2.min(), y2.max(), 1000)
x1i, y1i = np.meshgrid(x1i, y1i)
# Interpolate z onto the grid (NaN outside the convex hull of the data)
z1i = scipy.interpolate.griddata((x2, y2), z2, (x1i, y1i), method='linear')
But then I don't know how to implement dblquad given the griddata
integrate.dblquad(z1i, min(x2),max(x2),lambda x: min(y2), lambda x: max(y2), epsabs=1.49e-08, epsrel=1.49e-08)
doesn't work.
How can I do it?
I suggest using the Monte Carlo method for this kind of application. Since you know the data points, Monte Carlo is easier to apply than double integrals.
One caveat is that this only gives an approximate volume under the surface. Increasing the number of random points increases the accuracy.
I would suggest implementing this in Google Colab, as it requires substantial computing power.
For more info: https://www.youtube.com/watch?v=6dprNJVaGPA&t=2s
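The answer does not give code, so the following is only one possible reading of it, a minimal sketch under stated assumptions: random (x, y) locations are drawn in the bounding box of the data, the surface height there is estimated with scipy's LinearNDInterpolator, and locations outside the convex hull of the data are assumed to contribute zero height. The function name monte_carlo_volume and the synthetic test surface z = x + y are hypothetical.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

def monte_carlo_volume(x, y, z, n_samples=200_000, rng=None):
    """Estimate the volume under scattered (x, y, z) data by Monte Carlo."""
    rng = np.random.default_rng(rng)
    # Piecewise-linear surface through the data; 0 outside the data's hull
    # (assumption: the surface vanishes there).
    surface = LinearNDInterpolator(np.column_stack([x, y]), z, fill_value=0.0)
    x_min, x_max = np.min(x), np.max(x)
    y_min, y_max = np.min(y), np.max(y)
    xs = rng.uniform(x_min, x_max, n_samples)
    ys = rng.uniform(y_min, y_max, n_samples)
    heights = surface(xs, ys)
    # Volume = bounding-box area times the mean sampled height.
    box_area = (x_max - x_min) * (y_max - y_min)
    return box_area * heights.mean()

# Synthetic check: z = x + y over the unit square has exact volume 1.
data_rng = np.random.default_rng(0)
x = np.concatenate([[0.0, 1.0, 0.0, 1.0], data_rng.random(2000)])
y = np.concatenate([[0.0, 0.0, 1.0, 1.0], data_rng.random(2000)])
z = x + y
estimate = monte_carlo_volume(x, y, z, rng=1)
print(estimate)  # should be close to 1
```

The error of the estimate shrinks roughly with the square root of n_samples, which is the accuracy trade-off the answer mentions.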
This is an old post but maybe someone is still interested.
The volume can be computed by summing up the volumes of all tetrahedra of the Delaunay triangulation. Each one is efficiently computed using the determinant of the three edge vectors from one corner, which is the signed (could be negative!) volume of the resulting parallelepiped. It turns out the desired tetrahedral volume is just 1/6 of that.
(see https://www.youtube.com/watch?v=xgGdrTH6WGw)
This gives the volume of the convex hull. I had a more complex problem where I had a cubic volume of points which I transformed into another space in which the outer surface was not a convex hull. However, since I had both sets of points, I compute the convex hull in the original cube, then use those tetrahedra to compute the volume. (Note this assumes the transformation between GT and GP doesn't distort the points too much so as to change their order.) If you just want the convex hull volume, GT and GP are the same in the code below.
import numpy as np
from scipy.spatial import Delaunay

def getGamutVol(GT, GP):
    # Calculates volume of a set of points. ASSUMES SAME SPATIAL ORDER IN SPACE IN GT and GP
    # GT [N x 3] Points from which to get triangulation
    # GP [N x 3] Points on which to calc volume
    tri = Delaunay(GT).simplices
    p0 = GP[tri[:,0],:]
    p1 = GP[tri[:,1],:]
    p2 = GP[tri[:,2],:]
    p3 = GP[tri[:,3],:]
    # Form edge vectors from the first corner of each tetrahedron
    v1 = p1-p0
    v2 = p2-p0
    v3 = p3-p0
    # Calc triple products (signed volume of parallelepiped)
    dets = np.linalg.det(np.dstack([v1, v2, v3]))
    vol = np.sum(np.abs(dets))/6
    return vol
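For the plain convex-hull case (GT and GP identical), the result can be cross-checked against the volume attribute of scipy.spatial.ConvexHull. The sketch below is self-contained and uses a compact re-implementation under the hypothetical name delaunay_volume, with random test points as an assumption.

```python
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def delaunay_volume(points):
    """Sum of tetrahedron volumes of the Delaunay triangulation of 3D points."""
    tri = Delaunay(points).simplices
    corners = points[tri]                          # (n_tetra, 4, 3)
    edges = corners[:, 1:, :] - corners[:, :1, :]  # three edge vectors per tetrahedron
    return np.abs(np.linalg.det(edges)).sum() / 6.0

rng = np.random.default_rng(0)
points = rng.random((500, 3))

vol_delaunay = delaunay_volume(points)
vol_hull = ConvexHull(points).volume
print(vol_delaunay, vol_hull)
```

The two numbers should agree up to floating-point error, since the Delaunay tetrahedra tile the convex hull.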

Find density of points from their scatter plot in python

Can I go through equal boxed areas in the scatter plot, so I can calculate how many points there are on average in each box?
Or, is there a specific function in python to calculate this?
I don't want a colored density plot, but a number that represents the density of these points in the scatter plot.
Here is for example a plot of the eigenvalues of a random matrix:
How would I find their density?
import numpy as np
from scipy import linalg as la
e = la.eigvals(my_matrix)
hist, xedges, yedges = np.histogram2d(e.real, e.imag, bins=40)
So in this case, 'hist' would be a 40x40 array (since bins=40). Its elements are the number of eigenvalues for each bin.
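To turn those counts into a single density number, one option is to divide by the bin area and average. The sketch below is self-contained and assumes a random Gaussian matrix as a stand-in for my_matrix. The names bin_area, mean_count_per_box and mean_density are illustrative, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
my_matrix = rng.normal(size=(n, n))   # stand-in for the asker's matrix
e = np.linalg.eigvals(my_matrix)

# Counts of eigenvalues per box on a 40 x 40 grid over the complex plane.
hist, xedges, yedges = np.histogram2d(e.real, e.imag, bins=40)

# All boxes have the same area because the bin edges are evenly spaced.
bin_area = (xedges[1] - xedges[0]) * (yedges[1] - yedges[0])

mean_count_per_box = hist.mean()      # average number of points per box
mean_density = (hist / bin_area).mean()  # average points per unit area
print(mean_count_per_box, mean_density)
```

Note that the average includes empty boxes outside the point cloud, so restricting it to non-empty boxes may be closer to what is wanted for a disc-shaped spectrum.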
Thanks to @jepio and @plonser for the comments.
