Find density of points from their scatter plot in Python

Can I divide the scatter plot into equal-sized boxes and calculate how many points there are, on average, in each box?
Or is there a specific function in Python to calculate this?
I don't want a colored density plot, but a single number that represents the density of the points in the scatter plot.
Here, for example, is a plot of the eigenvalues of a random matrix:
How would I find their density?

import numpy as np
from scipy import linalg as la

e = la.eigvals(my_matrix)
# 2D histogram over the complex plane: how many eigenvalues fall into each bin
# (density=False keeps raw counts; it replaces the removed `normed` keyword)
hist, xedges, yedges = np.histogram2d(e.real, e.imag, bins=40, density=False)
So in this case, 'hist' would be a 40x40 array (since bins=40). Its elements are the number of eigenvalues for each bin.
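If you want an actual density (eigenvalues per unit area of the complex plane) rather than raw counts, one option is to divide each count by the area of its bin; the names below continue the snippet above:
# Area of every bin, so counts can be converted to points per unit area
bin_areas = np.outer(np.diff(xedges), np.diff(yedges))
density = hist / bin_areas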
Thanks to @jepio and @plonser for the comments.

Related

Simulation of random points on a multivariate convex hull with scipy

I have data with 10,000 rows and 10 columns. The first goal of my study is to compute the convex hull of this data. The scipy package can do this easily, and I can get the vertices and the parameters of the different hyperplanes, such as: b0 + b1.x1 + b2.x2 + ... + b10.x10 = 0, where (b0, b1, ..., b10) are the parameters of one facet of the convex hull (and I can know which vertices lie on it).
import numpy as np
from scipy.spatial import ConvexHull, convex_hull_plot_2d

fit_hull = ConvexHull(data)
V = fit_hull.vertices              # indices of the input points that form the hull
parameters = fit_hull.equations    # hyperplane coefficients [normal, offset] of each facet
My question is: how can I uniformly simulate random points on the convex hull, knowing all of this?
It is difficult because it is quite simple to simulate random points on a hyperplane, but here the hyperplane is bounded by the vertices of the facet (for example, with 3 variables a facet is defined by three points, so it would be a triangle).
Thank you so much.
Have a nice day (from France).
Make a Delaunay tessellation of your convex hull. In 2D these are triangles, in 3D these are tetrahedra, and you can get their area/volume.
Pick a triangle/tetrahedron at random, with probabilities given by the normalized areas/volumes.
Pick a point uniformly in this triangle/tetrahedron.
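A minimal sketch of that recipe, assuming you want points uniform in the hull's interior (the helper name sample_in_hull and the use of Dirichlet-distributed barycentric coordinates for the last step are my own choices):
import math
import numpy as np
from scipy.spatial import ConvexHull, Delaunay

def sample_in_hull(points, n_samples, rng=None):
    """Draw n_samples points uniformly from the convex hull of `points`."""
    rng = np.random.default_rng() if rng is None else rng
    d = points.shape[1]
    hull_pts = points[ConvexHull(points).vertices]
    # Tessellate the hull vertices into simplices (triangles in 2D, tetrahedra in 3D, ...)
    simplices = hull_pts[Delaunay(hull_pts).simplices]          # (n_simplices, d+1, d)
    # Volume of each simplex: |det(v1 - v0, ..., vd - v0)| / d!
    edges = simplices[:, 1:, :] - simplices[:, :1, :]
    volumes = np.abs(np.linalg.det(edges)) / math.factorial(d)
    # Pick a simplex at random, weighted by its normalized volume
    idx = rng.choice(len(volumes), size=n_samples, p=volumes / volumes.sum())
    # Uniform point inside each chosen simplex via Dirichlet(1, ..., 1) barycentric weights
    bary = rng.dirichlet(np.ones(d + 1), size=n_samples)        # (n_samples, d+1)
    return np.einsum('ij,ijk->ik', bary, simplices[idx])

# Example: 500 points uniform in the hull of 1,000 random 2D points
samples = sample_in_hull(np.random.default_rng(0).random((1000, 2)), 500)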

How do you tell when the points on a 3D plot lie on a plane or on a curve?

I have a 3D scatter plot of some data generated using matplotlib's Axes3D. I need to decide whether the data lies on a plane or on a curved surface, and I am trying to understand the visual differences that would indicate one or the other. My guess is that if the points span a wide range of z values then they lie on a curve, because if they lay on a plane the points would be spread only over a flat surface. Even if my guess is correct, I am only right by virtue of eliminating the only other possibility, so how would I tell specifically whether the data lies on a curve?
If the plane is tilted, you will also find a wide range of z values.
Assuming you have your 3D points in an n x 3 array, you can find the plane that best fits them like this:
import numpy as np

centroid = np.mean(points, axis=0)
# SVD of the centred points: singular values give the spread along each principal direction
_, singular_values, vh = np.linalg.svd(points - centroid, full_matrices=False)
normal = vh[2]                     # direction of least spread = normal of the best-fit plane
dispersion = singular_values[2]    # spread of the points along that normal
The plane that best approximates the scattered points is defined by a point (centroid) and its normal vector.
Then, based on the dispersion along the normal axis, you can decide whether it is low enough (the points lie on a plane) or too high (they don't).
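For example, one rough, data-dependent rule of thumb, continuing from the snippet above (the 0.05 threshold is an arbitrary choice of mine, not from the original answer), is to compare that dispersion with the largest in-plane spread:
flatness = dispersion / singular_values[0]   # 0 for points lying exactly on a plane
looks_planar = flatness < 0.05               # arbitrary threshold; tune for your data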

Alternative Voronoi diagram with specified minimum number of samples in each cell

I'm using the scipy.spatial.Voronoi function in Python to bin up my data, so that cell sizes decrease as the 2D point density increases.
However, the function places cell boundaries around each point such that each cell contains exactly one point (which is the point of a Voronoi diagram anyway).
Is there a way to specify a minimum number of samples per cell, so that you still get smaller cells in over-dense regions but no cell contains only one point?
Link to the function in question: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.spatial.Voronoi.html
Example code:
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

vor = Voronoi(xy.T)
fig = voronoi_plot_2d(vor, show_vertices=False, line_colors='orange',
                      line_width=2, line_alpha=0.6, point_size=2)
Note: xy can be any stacked x and y coordinate array of points here.
Perhaps there's an alternative algorithm entirely for this task? Any help will be greatly appreciated!

Bin counting with hex-like bins in 2D

I use numpy's histogram2d to count how many (training) data points lie in each bin. For a new point (x, y), I can then query how many points are in the same bin as (x, y).
Is there something similar for "hex" bins, like in matplotlib's hexbin plots, where I can fill the bins and then later query how many points are in each bin?
You can get the bin data, but it's not as simple as doing the same operation on a rectangular grid, because hex bins do not lend themselves to straightforward two-dimensional indexing. The function hexbin() returns a PolyCollection whose bin locations are accessible through get_offsets() and whose bin values are accessible through get_array(). So:
import matplotlib.pyplot as plt

hb = plt.hexbin(...)           # pass your x and y data here
bin_xy = hb.get_offsets()      # (n_bins, 2) array of hexagon centres
bin_counts = hb.get_array()    # count (or aggregated value) per hexagon
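To handle the "how many points are in the same bin as a new point (x, y)" part, a simple approximate option is to look up the nearest hexagon centre; count_at below is a hypothetical helper of mine, not a matplotlib function:
import numpy as np

def count_at(hb, x, y):
    """Approximate bin lookup: count of the hexagon whose centre is nearest to (x, y)."""
    centres = hb.get_offsets()
    counts = hb.get_array()
    nearest = np.argmin(np.hypot(centres[:, 0] - x, centres[:, 1] - y))
    return counts[nearest]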

How to plot coarse-grained average of a set of data points?

I have a set of discrete 2-dimensional data points, each with a measured value associated with it. I would like to get a scatter plot with the points colored by their measured values. But the data points are so dense that points with different colors overlap, which is not good for visualization. So I am wondering whether I could color each point based on a coarse-grained average of the measured values of the points near it. Does anyone know how to implement this in Python?
Thanks!
I got it done using sklearn.neighbors.RadiusNeighborsRegressor(); the idea is to take the average of the values of the neighbors within a specific radius. Suppose the coordinates of the data points are in the list temp_coors and the values associated with these points are in coloring; then coloring can be coarse-grained in the following way:
from sklearn.neighbors import RadiusNeighborsRegressor

# Average the values of all neighbours within `smoothing_radius` of each point
r_neigh = RadiusNeighborsRegressor(radius=smoothing_radius, weights='uniform')
r_neigh.fit(temp_coors, coloring)
coloring = r_neigh.predict(temp_coors)
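As a follow-up usage example (assuming temp_coors is an (n, 2) array-like of point coordinates, as in the question), you can then scatter the points coloured by the smoothed values:
import numpy as np
import matplotlib.pyplot as plt

coords = np.asarray(temp_coors)
plt.scatter(coords[:, 0], coords[:, 1], c=coloring, s=5, cmap='viridis')
plt.colorbar(label='coarse-grained value')
plt.show()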
