Efficient algorithm to find number density of points in 3D space - python

I have position data for particles in 3D space. The particles are at random positions in a 3D box, and I am trying to find the position of maximum number density. Is there a simple algorithm that does this efficiently (I have a few million particles)? I have tried an idea similar to the centre of mass of the system (code below). This gives me the centre of mass. Is there a similar approach for finding the position of maximum number density?
I was thinking of dividing the box into smaller cubes and counting the number of particles within each cube, but that would take very long for many particles.
import numpy as np
X_data = np.random.random(100000) # x coordinates
Y_data = np.random.random(100000) # y-coordinates
Z_data = np.random.random(100000) # z-coordinates
#Assume all points are weighted equally
com_x = np.mean(X_data)
com_y = np.mean(Y_data)
com_z = np.mean(Z_data)
#Now have the centre of mass position
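The binning idea does not have to be slow if NumPy does the counting: `np.histogramdd` bins all points in one vectorised pass. The following is only a minimal sketch under assumptions; the function name `densest_cell`, the 20 bins per axis and the synthetic clump are illustrative choices, and the result depends on the chosen bin size.

```python
import numpy as np

def densest_cell(x, y, z, bins=20):
    """Return the centre of the most populated grid cell and its number density."""
    points = np.column_stack((x, y, z))
    counts, edges = np.histogramdd(points, bins=bins)
    # Index (ix, iy, iz) of the cell holding the most particles
    idx = np.unravel_index(np.argmax(counts), counts.shape)
    centre = np.array([0.5 * (edges[d][i] + edges[d][i + 1])
                       for d, i in enumerate(idx)])
    cell_volume = np.prod([edges[d][1] - edges[d][0] for d in range(3)])
    return centre, counts[idx] / cell_volume

rng = np.random.default_rng(0)
# Assumed test data: uniform background plus a tight clump near (0.72, 0.23, 0.52)
background = rng.random((100_000, 3))
clump = rng.normal(loc=(0.72, 0.23, 0.52), scale=0.01, size=(5_000, 3))
pts = np.vstack((background, clump))

centre, density = densest_cell(pts[:, 0], pts[:, 1], pts[:, 2])
print(centre, density)
```

The returned position is only accurate to about one cell width, so a finer grid (or a second pass around the winning cell) would be needed for a sharper estimate.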

Related

Calculate the area enclosed by a 2D array of unordered points in python

I am trying to calculate the area of a shape enclosed by a large set of unordered points in python. I have a 2D array of points which I can plot as a scatterplot like this.
There are several ways to calculate the area enclosed by points, but these all assume ordered points, such as here and here. This method calculates the area of unordered points, but it doesn't appear to work for complex shapes, as seen here. How would I calculate this area from unordered points in python?
Sample data looks like this:
[[225.93459 -27.25677 ]
[226.98128 -32.001945]
[223.3623 -34.119724]
[225.84741 -34.416553]]
From pen and paper one can see that this shape contains an area of ~12 (unitless) but putting these coordinates into one of the algorithms linked to previously returns an area of ~0.78.
Let's first note that the phrase 'unordered points' in the question (How would I calculate this area from unordered points in python?) usually means, in the context of area calculation, that the points lie on a contour enclosing the area to be calculated.
But the data sample provided in the question is not a set of contour points; it is just a cloud of points which, when visualized as a scatterplot, forms a visually perceivable area.
This is why the algorithms linked in the question for calculating areas from 'unordered points' don't apply to what the question is actually about.
In other words, the actual title of the question I will answer below is:
Calculate the visually perceivable area that a cloud of (x,y) points forms when visualized as a scatterplot
One of the possible options is mentioned in a comment to the question:
Honestly, you might consider taking THAT graph as a bitmap, and counting the number of non-white pixels in it. That is probably as close as you can get. – Tim Roberts
Given an image that exactly covers all the non-white pixels (without any margin), you can calculate the area TA of the image rectangle, in the units of the underlying (x,y) data, from the list of points P ( P = [(x1,y1), (x2,y2), ...] ) as follows:
X = [x for x,y in P]
Y = [y for x,y in P]
TA = (max(X)-min(X))*(max(Y)-min(Y))
Assuming N_white is the number of white pixels in an image with N pixels, the actual area A covered by non-white pixels, expressed in the units used in the list of points P, will be:
A = TA*(N-N_white)/N
Another approach, using only the list of points P with (x,y) coordinates (without creating an image), consists of the following steps:
decide which area Ap a single point covers and calculate half the side length h2 of a square with this area around that point ( h2 = 0.5*sqrt(Ap) )
create a list R of rectangles around all points in the list P: R = [(x-h2, y+h2, x+h2, y-h2) for x,y in P]
use the code linked from the Stack Overflow question "Area of Union Of Rectangles using Segment Trees" to calculate the total area covered by the rectangles in the list R.
The above approach has the advantage over the graphical one that, by choosing the area covered by a point, you directly control the precision/resolution/granularity of the area calculation.
Given a 2D array of points, the area covered by the points can be calculated from the return value of the same hist2d() function (matplotlib.pyplot.hist2d()) that is used to show the scatterplot.
The 'trick' is to set the cmin parameter of the function to 1 ( cmin=1 ) and then count the numpy.nan values in the array returned by the function, relating that count to the total number of array values.
In other words, everything needed to calculate the area is already available when creating the scatterplot: the return value of the histogram function contains all the information the simple area formula requires.
Below is the code of a ready-to-use function for the area calculation, along with a demonstration of its usage:
def area_of_points(points, grid_size = [1000, 1000]):
    """
    Returns the area covered by N 2D-points provided in a 'points' array
        points = [ (x1,y1), (x2,y2), ... , (xN, yN) ]
    'grid_size' gives the number of grid cells in x and y direction the
    'points' bounding box is divided into for calculation of the area.
    Larger 'grid_size' values mean smaller grid cells, higher precision
    of the area calculation and longer runtime.
    area_of_points() requires installed matplotlib module. """
    import matplotlib.pyplot as plt
    import numpy as np
    pts_x = [x for x,y in points]
    pts_y = [y for x,y in points]
    pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
    h2D,_,_,_ = plt.hist2d( pts_x, pts_y, bins = grid_size, cmin=1)
    numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
    numberOfAll2Dbins = h2D.shape[0]*h2D.shape[1]
    areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
    pts_pts_area = areaFactor * pts_bb_area
    print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {pts_pts_area:8.4f}')
    plt.show()
    return pts_pts_area
#:def area_of_points(points, grid_size = [1000, 1000])
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
print(area_of_points(pts))
# ^-- prints: Areas: b-box = 114.5797, points = 7.8001
# ^-- prints: 7.800126875291629
The above code creates the following scatterplot:
Notice that the printed output Areas: b-box = 114.5797, points = 7.8001 and the area value 7.800126875291629 returned by the function give the area in the units in which the x,y coordinates of the points are specified.
Instead of using a function, you can apply the same know-how directly and play around with the parameters of the scatterplot, calculating the area of what is visible in it.
Below is code that changes the displayed scatterplot using the same underlying point data:
import numpy as np
np.random.seed(12345)
x = np.random.normal(size=100000)
y = x + np.random.normal(size=100000)
pts = [[xi,yi] for xi,yi in zip(x,y)]
pts_values_example = \
[[0.53005, 2.79209],
[0.73751, 0.18978],
... ,
[-0.6633, -2.0404],
[1.51470, 0.86644]]
# ---
pts_x = [x for x,y in pts]
pts_y = [y for x,y in pts]
pts_bb_area = (max(pts_x)-min(pts_x))*(max(pts_y)-min(pts_y))
# ---
import matplotlib.pyplot as plt
bins = [320, 300] # resolution of the grid (for the scatter plot)
# ^-- resolution of precision for the calculation of area
pltRetVal = plt.hist2d( pts_x, pts_y, bins = bins, cmin=1, cmax=15 )
plt.colorbar() # display the colorbar (for a 2d density histogram)
plt.show()
# ---
h2D, xedges1D, yedges1D, h2DhistogramObject = pltRetVal
numberOfWhiteBins = np.count_nonzero(np.isnan(h2D))
numberOfAll2Dbins = (len(xedges1D)-1)*(len(yedges1D)-1)
areaFactor = 1.0 - numberOfWhiteBins/numberOfAll2Dbins
area = areaFactor * pts_bb_area
print(f'Areas: b-box = {pts_bb_area:8.4f}, points = {area:8.4f}')
# prints "Areas: b-box = 114.5797, points = 20.7174"
creating the following scatterplot:
Notice that the calculated area is now larger, because the smaller values used for the grid resolution (fewer, larger bins) result in more of the area being colored.
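The same occupied-bin count can also be done without drawing a figure. The sketch below assumes NumPy's `np.histogram2d` in place of `plt.hist2d`: it counts the bins that hold at least one point and multiplies by the area of one bin. The function name `occupied_area` and the 10x10 grid in the usage lines are illustrative choices, not part of the answer above.

```python
import numpy as np

def occupied_area(points, grid_size=(100, 100)):
    """Area of the bounding box covered by grid cells containing >= 1 point."""
    pts = np.asarray(points, dtype=float)
    counts, xedges, yedges = np.histogram2d(pts[:, 0], pts[:, 1],
                                            bins=list(grid_size))
    # All cells have the same size because the bins are equally spaced
    cell_area = (xedges[1] - xedges[0]) * (yedges[1] - yedges[0])
    return np.count_nonzero(counts) * cell_area

rng = np.random.default_rng(12345)
square = rng.random((100_000, 2))  # assumed test data filling the unit square
area = occupied_area(square, grid_size=(10, 10))
print(area)  # close to 1 for a densely filled unit square
```

As with the plotted version, the result depends on the grid: a finer grid leaves more empty cells between sparse points and therefore reports a smaller area.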

Random point from a multidimensional ball in Python [duplicate]

I've looked around and all solutions for generating uniform random points in/on the unit ball are designed for 2 or 3 dimensions.
What is a (tractable) way to generate uniform random points inside a ball in arbitrary dimension? Particularly, not just on the surface of the ball.
To preface, generating random points in the cube and throwing out the points with norm greater than 1 is not feasible in high dimension. The ratio of the volume of the unit ball to the volume of the cube that encloses it goes to 0 as the dimension grows. Even in 10 dimensions only about 0.25% of random points in the cube [-1,1]^10 are also inside the unit ball.
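The acceptance rate of rejection sampling can be checked with the closed-form volume of the unit d-ball, pi^(d/2) / Gamma(d/2 + 1), divided by the volume 2^d of the cube [-1,1]^d. This is a small sketch; the function name `ball_to_cube_ratio` is an illustrative choice.

```python
import math

def ball_to_cube_ratio(d):
    """Volume of the unit d-ball divided by the volume of the cube [-1, 1]^d."""
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube = 2.0 ** d
    return ball / cube

# Fraction of cube samples that rejection sampling would keep
for d in (2, 3, 10, 20):
    print(d, ball_to_cube_ratio(d))
```

For d = 2 this gives pi/4, and for d = 10 it gives roughly 0.0025, matching the 0.25% quoted above.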
The best way to generate uniformly distributed random points in a d-dimension ball appears to be by thinking of polar coordinates (directions instead of locations). Code is provided below.
Pick a random point on the surface of the unit ball with uniform distribution.
Pick a random radius where the likelihood of a radius corresponds to the surface area of a ball with that radius in d dimensions.
This selection process will (1) make all directions equally likely, and (2) make all points on the surface of balls within the unit ball equally likely. This will generate our desired uniformly random distribution over the entire interior of the ball.
Picking a random direction (on the unit ball)
In order to achieve (1) we can generate a vector from d independent draws of a Gaussian distribution and normalize it to unit length. This works because a Gaussian distribution has a probability density function (PDF) with x^2 in an exponent. That implies that the joint distribution (for independent random variables this is the product of their PDFs) will have (x_1^2 + x_2^2 + ... + x_d^2) in the exponent. Notice that this resembles the definition of a sphere in d dimensions, meaning the joint distribution of d independent samples from a Gaussian distribution is invariant to rotation (the vectors are uniform over a sphere).
Here is what 200 random points generated in 2D look like.
Picking a random radius (with appropriate probability)
In order to achieve (2) we can generate a radius by using the inverse of a cumulative distribution function (CDF) that corresponds to the surface area of a ball in d dimensions with radius r. The surface area of a d-dimensional ball of radius r is proportional to r^(d-1), so the probability of falling within radius r (the enclosed volume) is proportional to r^d, and we can use r^d over the range [0,1] as a CDF. Now a random sample is generated by mapping uniform random numbers in the range [0,1] through the inverse, x ➞ x^(1/d).
Here is a visual of the CDF x^2 (for two dimensions); randomly generated numbers in [0,1] get mapped to the corresponding x coordinate on this curve (e.g. .1 ➞ .316).
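The inverse-CDF step can be checked numerically on its own. In this sketch (the helper name `sample_radii` and the choice d = 3 are assumptions for illustration), radii are drawn as u^(1/d) and the fraction falling inside radius 0.5 is compared with 0.5^d, the share of the unit ball's volume within that radius.

```python
import numpy as np

def sample_radii(num_points, dimension, rng):
    # Inverse-CDF sampling: if U is uniform on [0, 1], U**(1/d) has CDF r**d
    return rng.random(num_points) ** (1.0 / dimension)

rng = np.random.default_rng(0)
d = 3
r = sample_radii(200_000, d, rng)

# Empirical fraction inside radius 0.5 versus the volume fraction 0.5**d
fraction_inside = np.mean(r <= 0.5)
print(fraction_inside, 0.5 ** d)
```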
Code for the above
Finally, here is some Python code (assumes you have NumPy installed) that computes all of the above.
# Generate "num_points" random points in "dimension" that have uniform
# probability over the unit ball scaled by "radius" (length of points
# are in range [0, "radius"]).
def random_ball(num_points, dimension, radius=1):
    from numpy import random, linalg
    # First generate random directions by normalizing the length of a
    # vector of random-normal values (these distribute evenly on ball).
    random_directions = random.normal(size=(dimension,num_points))
    random_directions /= linalg.norm(random_directions, axis=0)
    # Second generate a random radius with probability proportional to
    # the surface area of a ball with a given radius.
    random_radii = random.random(num_points) ** (1/dimension)
    # Return the list of random (direction & length) points.
    return radius * (random_directions * random_radii).T
For posterity, here is a visual of 5000 random points generated with the above code.

k-space vector for N-body simulation box DFTs

I'm trying to write a particle mesh N-body simulation. In such a simulation the potential field is found by solving Poisson's equation using Fourier transforms. I have been following a presentation by Andrey Kravtsov (http://astro.uchicago.edu/~andrey/talks/PM/pm.pdf), but slide 15 has me confused. So far, I have assigned densities to a 3d grid from particle positions, and Fourier transformed the density grid. The next step is to calculate Green's function in Fourier space and multiply it with the Fourier transformed density grid, and afterwards applying an inverse Fourier transform to real space to obtain the potential grid. Through trial and error I traced the part that wasn't working correctly to the potential calculation, and specifically the k-space vector.
So, to calculate Green's function in Fourier space I need the Fourier axes, usually called k-space vectors k_x, k_y, k_z. According to the slide these should be 2*pi*(k,l,m)/N_g for components k,l,m, where N_g is the number of grid cells. So far I've tried these components running from 0,+1,+2,...,N_g, from -N_particle/2, ..., +N_particle/2, and several other variations. The only thing that has produced reasonable results (I can see a cluster in a density slice projected on the same potential field slice) has been using numpy.fft.fftfreq in Python for specific values of the resolution/sample spacing. However, any resolution I chose (such as L/N_g, N_p/N_g, 2pi/N_g, etc.) did not scale properly with box size L, number of grid cells or number of particles, and no longer worked for e.g. a larger number of grid cells.
My question is:
How do I define my k-space vectors (i.e. the Fourier axes in reciprocal space) for a simulation with, along one direction, box size L, number of grid cells N_g and number of particles N_p?
I should add that the particle positions and velocities are all in code units as defined in the first few slides.
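For reference, here is how the expression 2*pi*(k,l,m)/N_g can be written with `np.fft.fftfreq`, whose output is ordered the same way as the output of `np.fft.fftn`. This is a sketch under assumptions, not a statement of which convention the slides intend: the helper name `k_axis` and its `box_size` argument are illustrative, and the grid-unit form (sample spacing 1) is simply the literal reading of the formula. The particle number does not appear because the transform is taken over the grid.

```python
import numpy as np

def k_axis(n_grid, box_size=None):
    """Angular wavenumbers along one axis of a periodic mesh with n_grid cells.

    With box_size=None the result is in grid units, 2*pi*k/n_grid.
    Otherwise the cell size box_size/n_grid is used, giving 2*pi*k/box_size.
    """
    spacing = 1.0 if box_size is None else box_size / n_grid
    # fftfreq returns k/(n*spacing) with k = 0, 1, ..., n/2-1, -n/2, ..., -1
    return 2.0 * np.pi * np.fft.fftfreq(n_grid, d=spacing)

n_g = 8
k1d = k_axis(n_g)
# indexing="ij" keeps the axis order (first, second, third) of the 3D array
kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
print(k1d * n_g / (2 * np.pi))  # the integer mode numbers k
```

Note that `np.meshgrid` defaults to `indexing="xy"`, which swaps the first two axes; passing `indexing="ij"` avoids that when the arrays are multiplied with an `fftn` result.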
Minimum working example:
#!/usr/bin/env python3
import numpy as np
import matplotlib.pyplot as plt
M = 30 #Number of particles in 1 direction
Mn = 90 #Number of grid cells in 1 direction
Lx = 10 #grid physical size
u = np.random.random(M*M*M)
v = np.random.random(M*M*M)
w = np.random.random(M*M*M)
#Have purposefully taken smaller cube, to show potential works
planex = M*u
planey = M*v
planez = M*w
#Create a new grid
grid = np.zeros([Mn,Mn,Mn], dtype='cfloat')
#cell center coordinates
x_c = np.floor(planex).astype(int)%Mn
y_c = np.floor(planey).astype(int)%Mn
z_c = np.floor(planez).astype(int)%Mn
#in terms of the average density of the universe, doesnt matter for the
#example
mass = 1.
#Update the grid
grid[z_c,y_c,x_c] += mass
fig = plt.figure()
ax = fig.add_subplot(111)
plt.imshow(grid[:,:,2].real)
plt.show()
#FFT the grid
grid = np.fft.fftn(grid)
#The resolution and the k-space vectors are the parts I am unsure about
resolution = np.pi*2/(M/Mn)
resolution = Lx/Mn
#Define the k-space vectors
k_x = np.fft.fftfreq(Mn, resolution)
k_y = np.fft.fftfreq(Mn, resolution)
k_z = np.fft.fftfreq(Mn, resolution)
kz, ky, kx = np.meshgrid(k_z, k_y, k_x)
Omega_0 = 0.27
a = 0.3
#Calculate Greens function
k_squared = np.sin(kz/2)**2 + np.sin(ky/2)**2 + np.sin(kx/2)**2
Greens = -3*Omega_0/8/a*np.divide(1, k_squared, where=k_squared!=0)
#Multiply the grids in Fourier space
grid = Greens*grid
#IFFT to real space
potentials = np.fft.ifftn(grid)
fig1 = plt.figure()
ax1 = fig1.add_subplot(111)
plt.imshow(potentials[:,:,0].real)
plt.show()
A large value for the resolution makes the velocities explode, while a small value gives very small velocities. So what is the right resolution?
This is my first time asking on Stack overflow, please let me know if I'm doing something wrong.
Best, R.

More Efficient Way of Calculating Distances between Numpy Arrays?

What I have: pre-defined circles at different locations within a 3D box, and particles (with ids and locations in x-y-z coordinate).
What I want to do: find out all the particles within certain radius of each circle, and record their IDs.
What I have been doing: I'm using the distance.cdist() function to compute the euclidean distances between the x-axis positions of all particles and each center of the circles. This is done by looping over all the centers and storing the distances between all particles and each center in different columns.
My code is:
import numpy as np
from scipy.spatial import distance
p_h_dx = np.empty((len(p),len(dsel))) # empty array, one column per center
for i in range(len(dsel)): # loop over all centers
    # use a separate name so the 'distance' module is not overwritten
    dist_x = distance.cdist(np.column_stack((p[:,1],p_zeros)),np.column_stack((dsel[:,2][i],0)),'euclidean')
    p_h_dx[:,i] = dist_x.reshape((len(p),))
Then I repeat it for y-axis and z-axis. In the end, I calculate the distances in the following way:
###############################################
# p_h_dx: array storing the distances between the x-axis positions of all
# particles and the centers of the circles;
# p_h_dy: the same for the y-axis, and p_h_dz for the z-axis
###############################################
p_h_d = np.sqrt(np.power(p_h_dx,2) + np.power(p_h_dy,2) + np.power(p_h_dz,2))
I have more than 100 million particles and ~30 thousand circles, so with my current method, it takes ~1 week to achieve my goal, even with mpi. I'm wondering if there's a way to more efficiently do the job.
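One simplification is to compute the squared 3D distance to each center in a single pass instead of building three per-axis distance arrays, and to compare against the squared radius so no square root is needed. The sketch below is illustrative only: the function name `ids_within_radius` and the tiny arrays are assumptions, not the asker's data layout.

```python
import numpy as np

def ids_within_radius(ids, positions, centres, radius):
    """For each centre, return the ids of particles no farther than `radius`."""
    r2 = radius ** 2
    found = []
    for c in centres:
        # Squared 3D distance of every particle to this centre in one pass
        d2 = np.sum((positions - c) ** 2, axis=1)
        found.append(ids[d2 <= r2])
    return found

# Assumed toy data: four particles and two centres
ids = np.array([10, 11, 12, 13])
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [5.0, 5.0, 5.0],
                      [5.0, 5.0, 6.0]])
centres = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.5]])

result = ids_within_radius(ids, positions, centres, radius=1.0)
print(result)
```

The cost is still proportional to particles times centers. For data of this size a spatial index, for example `scipy.spatial.cKDTree` with its `query_ball_point` method, is a common next step, since it only visits particles near each center.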

Count the numbers of neighbors in an array with 3D coordinates

I have the following problem. I have an array of 3D coordinates like this:
arr = np.array([[21.000,48.000,28.000],[27.000,48.000,31.000],[21.000, 47.000,27.000],[22.000, 21.000, 97.000],[22.000, 20.000, 97.000],[22.000, 20.000, 95.000]])
This is only a small extract of the coordinates, because there are millions. They all lie in a 3D box. Additionally, I have the minimum and maximum coordinates of the 3D box in which the coordinates lie:
edgeMaxima = [(0,0,0),(40,59,110)]
Now I want to calculate the number of neighbors each coordinate has within a specific radius and store it in a second array. So let's assume I take the radius 5; I would have 5^3-1 neighbors for most of my coordinates. The edge coordinates would of course have fewer neighbors. Is there a good way to calculate the number of neighbors?
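To pin down what is being counted, here is a brute-force sketch that works on the small extract above. It assumes that "within a radius" means Euclidean distance; the figure 5^3-1 suggests a cubic neighborhood may be meant instead, which would need a different test. The function name `count_neighbours` is an illustrative choice.

```python
import numpy as np

def count_neighbours(coords, radius):
    """Number of other points within `radius` (Euclidean) of each point."""
    # Pairwise coordinate differences, shape (N, N, 3)
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    within = d2 <= radius ** 2
    # Subtract 1 so that a point does not count itself
    return within.sum(axis=1) - 1

arr = np.array([[21.0, 48.0, 28.0], [27.0, 48.0, 31.0], [21.0, 47.0, 27.0],
                [22.0, 21.0, 97.0], [22.0, 20.0, 97.0], [22.0, 20.0, 95.0]])
neighbour_counts = count_neighbours(arr, radius=5)
print(neighbour_counts)
```

This builds an N x N distance table, so it is only usable for small inputs. With millions of coordinates the pairs would have to be found with a spatial index or a grid of cells rather than by comparing every point with every other point.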
