How can I vectorize this numpy rotation code? - python

The Goal:
I would like to vectorize (or otherwise speed up) this code. It rotates a 3d numpy model around its center point (let x,y,z denote the dimensions; then we want to rotate around the z-axis). The np model is binary voxels that are either "on" or "off"
I bet some basic matrix operation could do it, like take a layer and apply the rotation matrix to each element. The only issue with that is decimals; where should I have the new value land since cos(pi / 6) == sqrt(3) / 2?
The Code:
def rotate_model(m, theta):
'''
theta in degrees
'''
n =np.zeros(m.shape)
for i,layer in enumerate(m):
rotated = rotate(layer,theta)
n[i] = rotated
return n
where rotate() is:
def rotate(arr, theta):
'''
Rotates theta clockwise
rotated.shape == arr.shape, unlike scipy.ndimage.rotate(), which inflates size and also does some strange mixing
'''
if theta == int(theta):
theta *= pi / 180
theta = -theta
# theta=-theta b/c clockwise. Otherwise would default to counterclockwise
rotated =np.zeros(arr.shape)
#print rotated.shape[0], rotated.shape[1]
y_mid = arr.shape[0]//2
x_mid = arr.shape[1]//2
val = 0
for x_new in range(rotated.shape[1]):
for y_new in range(rotated.shape[0]):
x_centered = x_new - x_mid
y_centered = y_new - y_mid
x = x_centered*cos(theta) - y_centered*sin(theta)
y = x_centered*sin(theta) + y_centered*cos(theta)
x += x_mid
y += y_mid
x = int(round(x)); y = int(round(y)) # cast so range() picks it up
# lossy rotation
if x in range(arr.shape[1]) and y in range(arr.shape[0]):
val = arr[y,x]
rotated[y_new,x_new] = val
#print val
#print x,y
return rotated

You have a couple of problems in your code. First, if you want to fit the original image onto a rotated grid then you need a larger grid (usually). Alternatively, imagine a regular grid but the shape of your object - a rectangle - is rotated, thus becoming a "rhomb". It is obvious if you want to fit the entire rhomb - you need a larger output grid (array). On the other hand, you say in the code "rotated.shape == arr.shape, unlike scipy.ndimage.rotate(), which inflates size". If that is the case, maybe you do not want to fit the entire object? So, maybe it is OK to do this: rotated=np.zeros(arr.shape). But in general, yeah, one has to have a larger grid in order to fit the entire input image after it is rotated.
Another issue is angle conversion that you are doing:
if theta == int(theta):
theta *= pi / 180
theta = -theta
Why??? What will happen when I want to rotate the image by 1 radian? Or 2 radians? Am I forbidden to use integer number of radians? I think you are trying to do too much in this function and therefore it will be very confusing to do use it. Just require the caller to convert angles to radians. Or, you can do it inside this function if input theta is always in degrees. Or, you can add another parameter called, e.g., units and caller could set it to radians or degrees. Don't try to guess it based on "integer-ness" of input!
Now, let's rewrite your code a little bit:
rotated = np.zeros_like(arr) # instead of np.zero(arr.shape)
y_mid = arr.shape[0] // 2
x_mid = arr.shape[1] // 2
# val = 0 <- this is unnecessary
# pre-compute cos(theta) and sin(theta):
cs = cos(theta)
sn = sin(theta)
for x_new in range(rotated.shape[1]):
for y_new in range(rotated.shape[0]):
x = int(round((x_new - x_mid) * cs - (y_new - y_mid) * sn + x_mid)
y = int(round((x_new - x_mid) * sn - (y_new - y_mid) * cs + y_mid)
# just use comparisons, don't search through many values!
if 0 <= x < arr.shape[1] and 0 <= y < arr.shape[0]:
rotated[y_new, x_new] = arr[y, x]
So, now I can see (more easily) that for each pixel from the output array is mapped to a location in the input array. Yes, you can vectorize this.
import numpy as np
def rotate(arr, theta, unit='rad'):
# deal with theta units:
if unit.startswith('deg'):
theta = np.deg2rad(theta)
# for convenience, store array size:
ny, nx = arr.shape
# generate arrays of indices and flatten them:
y_new, x_new = np.indices(arr.shape)
x_new = x_new.ravel()
y_new = y_new.ravel()
# compute center of the array:
x0 = nx // 2
y0 = ny // 2
# compute old coordinates
xc = x_new - x0
yc = y_new - y0
x = np.round(np.cos(theta) * xc - np.sin(theta) * yc + x0).astype(np.int)
y = np.round(np.sin(theta) * xc - np.cos(theta) * yc + y0).astype(np.int)
# main idea to deal with indices is to create a mask:
mask = (x >= 0) & (x < nx) & (y >= 0) & (y < ny)
# ... and then select only those coordinates (both in
# input and "new" coordinates) that satisfy the above condition:
x = x[mask]
y = y[mask]
x_new = x_new[mask]
y_new = y_new[mask]
# map input values to output pixels *only* for selected "good" pixels:
rotated = np.zeros_like(arr)
rotated[y_new, x_new] = arr[y, x]
return rotated

Here is some code for anyone also doing 3d modeling. It solved my specific use-case pretty well. Still figuring out how to rotate in the proper plane. Hope it's helpful to you as well:
def rotate_model(m, theta):
'''
Redefines the prev 'rotate_model()' method
theta has to be in degrees
'''
rotated = scipy.ndimage.rotate(m, theta, axes=(1,2))
# have tried (1,0), (2,0), and now (1,2)
# ^ z is "up" and "2"
# scipy.ndimage.rotate() shrinks the model
# TODO: regrow it back
x_r = rotated.shape[1]
y_r = rotated.shape[0]
x_m = m.shape[1]
y_m = m.shape[0]
x_diff = abs(x_r - x_m)
y_diff = abs(y_r - y_m)
if x_diff%2==0 and y_diff%2==0:
return rotated[
x_diff//2 : x_r-x_diff//2,
y_diff//2 : y_r-y_diff//2,
:
]
elif x_diff%2==0 and y_diff%2==1:
# if this shift ends up turning the model to shit in a few iterations,
# change the following lines to include a flag that alternates cutting off the top and bottom bits of the array
return rotated[
x_diff//2 : x_r-x_diff//2,
y_diff//2+1 : y_r-y_diff//2,
:
]
elif x_diff%2==1 and y_diff%2==0:
return rotated[
x_diff//2+1 : x_r-x_diff//2,
y_diff//2 : y_r-y_diff//2,
:
]
else:
# x_diff%2==1 and y_diff%2==1:
return rotated[
x_diff//2+1 : x_r-x_diff//2,
y_diff//2+1 : y_r-y_diff//2,
:
]

Related

Normalise all vectors on the z axis

I have written code that calculates the angle between two vectors. However the way in which is does this is to start with two vectors, rotate each according to some euler angles calculated in a separate program, then calculate the angle between the vectors.
Up until now I have been working with a use case that means both starting vectors are (0,0,1) that makes life super easy. I could just take one set of euler angles away from the other and then calculate the angle between 0,0,1 and the vector that had been rotated by the difference. It meant I could plot nice distribution plots and vector diagrams because everything was normalised to 0,0,1. (I have 1000s of these vectors for the record).
No I am trying to write in a function that would allow for a use case where the two starting vectors are not on 0,0,1. I figured the easiest way to do this would be to calculate direction of the vector relative to 0,0,1 and after calculating the position of the vector just rotate by the precalculated offsets. (this might be a stupid way to do it, if it is please tell me).
MY current code works for a case where a vector is 0,1,0 but then breaks down if i start entering random numbers.
import numpy as np
import math
def RotationMatrix(axis, rotang):
"""
This uses Euler-Rodrigues formula.
"""
#Input taken in degrees, here we change it to radians
theta = rotang * 0.0174532925
axis = np.asarray(axis)
#Ensure axis is a unit vector
axis = axis/math.sqrt(np.dot(axis, axis))
#calclating a, b, c and d according to euler-rodrigues forumla requirments
a = math.cos(theta/2)
b, c, d = axis*math.sin(theta/2)
a2, b2, c2, d2 = a*a, b*b, c*c, d*d
bc, ad, ac, ab, bd, cd = b*c, a*d, a*c, a*b, b*d, c*d
#Return the rotation matrix
return np.array([[a2+b2-c2-d2, 2*(bc-ad), 2*(bd+ac)],
[2*(bc+ad), a2+c2-b2-d2, 2*(cd-ab)],
[2*(bd-ac), 2*(cd+ab), a2+d2-b2-c2]])
def ApplyRotationMatrix(vector, rotationmatrix):
"""
This function take the output from the RotationMatrix function and
uses that to apply the rotation to an input vector
"""
a1 = (vector[0] * rotationmatrix[0, 0]) + (vector[1] * rotationmatrix[0, 1]) + (vector[2] * rotationmatrix[0, 2])
b1 = (vector[0] * rotationmatrix[1, 0]) + (vector[1] * rotationmatrix[1, 1]) + (vector[2] * rotationmatrix[1, 2])
c1 = (vector[0] * rotationmatrix[2, 0]) + (vector[1] * rotationmatrix[2, 1]) + (vector[2] * rotationmatrix[2, 2])
return np.array((a1, b1, c1)
'''
Functions for Calculating the angles of 3D vectors relative to one another
'''
def CalculateAngleBetweenVector(vector, vector2):
"""
Does what it says on the tin, outputs an angle in degrees between two input vectors.
"""
dp = np.dot(vector, vector2)
maga = math.sqrt((vector[0] ** 2) + (vector[1] ** 2) + (vector[2] ** 2))
magb = math.sqrt((vector2[0] ** 2) + (vector2[1] ** 2) + (vector2[2] ** 2))
magc = maga * magb
dpmag = dp / magc
#These if statements deal with rounding errors of floating point operations
if dpmag > 1:
error = dpmag - 1
print('error = {}, do not worry if this number is very small'.format(error))
dpmag = 1
elif dpmag < -1:
error = 1 + dpmag
print('error = {}, do not worry if this number is very small'.format(error))
dpmag = -1
angleindeg = ((math.acos(dpmag)) * 180) / math.pi
return angleindeg
def CalculateAngleAroundZ(Vector):
X,Y,Z = Vector[0], Vector[1], Vector[2]
AngleAroundZ = math.atan2(Y, X)
AngleAroundZdeg = (AngleAroundZ*180)/math.pi
return AngleAroundZdeg
def CalculateAngleAroundX(Vector):
X,Y,Z = Vector[0], Vector[1], Vector[2]
AngleAroundZ = math.atan2(Y, Z)
AngleAroundZdeg = (AngleAroundZ*180)/math.pi
return AngleAroundZdeg
def CalculateAngleAroundY(Vector):
X,Y,Z = Vector[0], Vector[1], Vector[2]
AngleAroundZ = math.atan2(X, Z)
AngleAroundZdeg = (AngleAroundZ*180)/math.pi
return AngleAroundZdeg
V1 = (0,0,1)
V2 = (3,5,4)
Xoffset = (CalculateAngleAroundX(V2))
Yoffset = (CalculateAngleAroundY(V2))
Zoffset = (CalculateAngleAroundZ(V2))
XRM = RotationMatrix((1,0,0), (Xoffset * 1))
YRM = RotationMatrix((0,1,0), (Yoffset * 1))
ZRM = RotationMatrix((0,0,1), (Zoffset * 1))
V2 = V2 / np.linalg.norm(V2)
V2X = ApplyRotationMatrix(V2, XRM)
V2XY = ApplyRotationMatrix(V2X, YRM)
V2XYZ = ApplyRotationMatrix(V2XY, ZRM)
print(V2XYZ)
print(CalculateAngleBetweenVector(V1, V2XYZ))
Any advice to fix this problem will be much appreciated.
I'm not sure to fully understand what you need but if it is to compute the angle between two vectors in space you can use the formula:
where a.b is the scalar product and theta is the angle between vectors.
thus your function CalculateAngleBetweenVector becomes:
def CalculateAngleBetweenVector(vector, vector2):
return math.acos(np.dot(vector,vector2)/(np.linalg.norm(vector)* np.linalg.norm(vector2))) * 180 /math.pi
You can also simplify your ApplyRotationMatrix function:
def ApplyRotationMatrix(vector, rotationmatrix):
"""
This function take the output from the RotationMatrix function and
uses that to apply the rotation to an input vector
"""
return rotationmatrix # vector
the # symbol is the matrix product
Hope this will help you. Feel free to precise your request if this is not helpfull.
Im an idiot I just needed to do the cross product and the dot product and rotate by the dot product *-1 around the cross product.

Calculating mean value of a 2D array as a function of distance from the center in Python

I'm trying to calculate the mean value of a quantity(in the form of a 2D array) as a function of its distance from the center of a 2D grid. I understand that the idea is that I identify all the array elements that are at a distance R from the center, and then add them up and divide by the number of elements. However, I'm having trouble actually identifying an algorithm to go about doing this.
I have attached a working example of the code to generate the 2d array below. The code is for calculating some quantities that are resultant from gravitational lensing, so the way the array is made is irrelevant to this problem, but I have attached the entire code so that you could create the output array for testing.
import numpy as np
import multiprocessing
import matplotlib.pyplot as plt
n = 100 # grid size
c = 3e8
G = 6.67e-11
M_sun = 1.989e30
pc = 3.086e16 # parsec
Dds = 625e6*pc
Ds = 1726e6*pc #z=2
Dd = 1651e6*pc #z=1
FOV_arcsec = 0.0001
FOV_arcmin = FOV_arcsec/60.
pix2rad = ((FOV_arcmin/60.)/float(n))*np.pi/180.
rad2pix = 1./pix2rad
Renorm = (4*G*M_sun/c**2)*(Dds/(Dd*Ds))
#stretch = [10, 2]
# To create a random distribution of points
def randdist(PDF, x, n):
#Create a distribution following PDF(x). PDF and x
#must be of the same length. n is the number of samples
fp = np.random.rand(n,)
CDF = np.cumsum(PDF)
return np.interp(fp, CDF, x)
def get_alpha(args):
zeta_list_part, M_list_part, X, Y = args
alpha_x = 0
alpha_y = 0
for key in range(len(M_list_part)):
z_m_z_x = (X - zeta_list_part[key][0])*pix2rad
z_m_z_y = (Y - zeta_list_part[key][1])*pix2rad
alpha_x += M_list_part[key] * z_m_z_x / (z_m_z_x**2 + z_m_z_y**2)
alpha_y += M_list_part[key] * z_m_z_y / (z_m_z_x**2 + z_m_z_y**2)
return (alpha_x, alpha_y)
if __name__ == '__main__':
# number of processes, scale accordingly
num_processes = 1 # Number of CPUs to be used
pool = multiprocessing.Pool(processes=num_processes)
num = 100 # The number of points/microlenses
r = np.linspace(-n, n, n)
PDF = np.abs(1/r)
PDF = PDF/np.sum(PDF) # PDF should be normalized
R = randdist(PDF, r, num)
Theta = 2*np.pi*np.random.rand(num,)
x1= [R[k]*np.cos(Theta[k])*1 for k in range(num)]
y1 = [R[k]*np.sin(Theta[k])*1 for k in range(num)]
# Uniform distribution
#R = np.random.uniform(-n,n,num)
#x1= np.random.uniform(-n,n,num)
#y1 = np.random.uniform(-n,n,num)
zeta_list = np.column_stack((np.array(x1), np.array(y1))) # List of coordinates for the microlenses
x = np.linspace(-n,n,n)
y = np.linspace(-n,n,n)
X, Y = np.meshgrid(x,y)
M_list = np.array([0.1 for i in range(num)])
# split zeta_list, M_list, X, and Y
zeta_list_split = np.array_split(zeta_list, num_processes, axis=0)
M_list_split = np.array_split(M_list, num_processes)
X_list = [X for e in range(num_processes)]
Y_list = [Y for e in range(num_processes)]
alpha_list = pool.map(
get_alpha, zip(zeta_list_split, M_list_split, X_list, Y_list))
alpha_x = 0
alpha_y = 0
for e in alpha_list:
alpha_x += e[0]
alpha_y += e[1]
alpha_x_y = 0
alpha_x_x = 0
alpha_y_y = 0
alpha_y_x = 0
alpha_x_y, alpha_x_x = np.gradient(alpha_x*rad2pix*Renorm,edge_order=2)
alpha_y_y, alpha_y_x = np.gradient(alpha_y*rad2pix*Renorm,edge_order=2)
det_A = 1 - alpha_y_y - alpha_x_x + (alpha_x_x)*(alpha_y_y) - (alpha_x_y)*(alpha_y_x)
abs = np.absolute(det_A)
I = abs**(-1.)
O = np.log10(I+1)
plt.contourf(X,Y,O,100)
The array of interest is O, and I have attached a plot of how it should look like. It can be different based on the random distribution of points.
What I'm trying to do is to plot the mean values of O as a function of radius from the center of the grid. In the end, I want to be able to plot the average O as a function of distance from center in a 2d line graph. So I suppose the first step is to define circles of radius R, based on X and Y.
def circle(x,y):
r = np.sqrt(x**2 + y**2)
return r
Now I just have to figure out a way to find all the values of O, that have the same indices as equivalent values of R. Kinda confused on this part and would appreciate any help.
You can find the geometric coordinates of a circle with center (0,0) and radius R as such:
phi = np.linspace(0, 1, 50)
x = R*np.cos(2*np.pi*phi)
y = R*np.sin(2*np.pi*phi)
these values however will not fall on the regular pixel grid but in between.
In order to use them as sampling points you can either round the values and use them as indexes or interpolate the values from the near pixels.
Attention: The pixel indexes and the x, y are not the same. In your example (0,0) is at the picture location (50,50).

Rotating 1D numpy array of radial intensities into 2D array of spacial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wasnts me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. Its actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
for x in range(iRes):
for y in range(iRes):
if (x**2 + y**2) < (i * iRes)**2:
if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
radJ[x+iRes,-y+iRes] = (n-m) / a
radJ[-x+iRes,y+iRes] = (n-m) / a
radJ[-x+iRes,-y+iRes] = (n-m) / a
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
for k, y in enumerate(Y[:, 0]):
current_radius = np.sqrt(x ** 2 + y ** 2)
profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take in your array and calculate an symmetric array ri_data_r that has the same length as your data array and contains the distance between the actual data and the middle of the array. The code is doing this automatically.
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of desired order (here we use order=0 Nearest-neighbor). This method is slower but more precise and needs less memory then the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
# Credit goes to skimage.transform.radon
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
center = vector.size // 2
square = np.zeros((vector.size, vector.size))
square[center,:] = vector
rad_angle = np.deg2rad(deg_angle)
cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
[-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
[0, 0, 1]])
# Approx. 80% of time is spent in this function
return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
global_min, global_max = 0, 0
for i, deg_angle in enumerate(deg_angles):
tilt = rotate_vector(vectors[i], deg_angle)
global_min = tilt.min() if global_min > tilt.min() else global_min
global_max = tilt.max() if global_max < tilt.max() else global_max
matrix += tilt
matrix = np.clip(matrix, global_min, global_max)
return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
square = np.ones([vector.size, vector.size]) * np.nan
radius = vector.size // 2
r_values = np.linspace(-radius, radius, vector.size)
rad_angle = np.deg2rad(deg_angle)
ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_x = np.clip(ind_x, 0, vector.size-1)
ind_y = np.clip(ind_y, 0, vector.size-1)
square[ind_y, ind_x] = vector
return square
def place_vectors(vectors, deg_angles):
matrices = []
for deg_angle, vector in zip(deg_angles, vectors):
matrices.append(rotate_vector(vector, deg_angle))
matrix = np.nanmean(np.array(matrices), axis=0)
return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])

Make a total vector for three vectors in python

We have a class that have three functions called(Bdisk, Bhalo,and BX).
all of these functions accept arrays (e.g. shape (1000))not matrices (e.g. shape (2,1000)).
I want to get the total of all these functions( total= Bdisk + Bhalo+BX), total these all functions give the magnetic field in all three components (B_r, B_phi, B_z) for thousand coordinate points (r, phi, z).
the code is here:
import numpy as np
import logging
import warnings
import gmf
signum = lambda x: (x < 0.) * -1. + (x >= 0) * 1.
pi = np.pi
#Class with analytical functions that describe the GMF according to the model of JF12
class GMF(object):
def __init__(self): # self:is automatically set to reference the newly created object that needs to be initialized
self.Rsun = -8.5 # position of the sun along the x axis in kpc
############################################################################
# Disk Parameters
############################################################################
self.bring, self.bring_unc = 0.1,0.1 # floats, field strength in ring at 3 kpc < r < 5 kpc
self.hdisk, self.hdisk_unc = 0.4, 0.03 # float, disk/halo transition height
self.wdisk, self.wdisk_unc = 0.27,0.08 # floats, transition width
self.b = np.array([0.1,3.,-0.9,-0.8,-2.0,-4.2,0.,2.7]) # (8,1)-dim np.arrays, field strength of spiral arms at 5 kpc
self.b_unc = np.array([1.8,0.6,0.8,0.3,0.1,0.5,1.8,1.8]) # uncertainty
self.rx = np.array([5.1,6.3,7.1,8.3,9.8,11.4,12.7,15.5])# (8,1)-dim np.array,dividing lines of spiral lines coordinates of neg. x-axes that intersect with arm
self.idisk = 11.5 * pi/180. # float, spiral arms pitch angle
#############################################################################
# Halo Parameters
#############################################################################
self.Bn, self.Bn_unc = 1.4,0.1 # floats, field strength northern halo
self.Bs, self.Bs_unc = -1.1,0.1 # floats, field strength southern halo
self.rn, self.rn_unc = 9.22,0.08 # floats, transition radius south, lower limit
self.rs, self.rs_unc = 16.7,0. # transition radius south, lower limit
self.whalo, self.whalo_unc = 0.2,0.12 # floats, transition width
self.z0, self.z0_unc = 5.3, 1.6 # floats, vertical scale height
##############################################################################
# Out of plaxe or "X" component Parameters
##############################################################################
self.BX0, self.BX_unc = 4.6,0.3 # floats, field strength at origin
self.ThetaX0, self.ThetaX0_unc = 49. * pi/180., pi/180. # elev. angle at z = 0, r > rXc
self.rXc, self.rXc_unc = 4.8, 0.2 # floats, radius where thetaX = thetaX0
self.rX, self.rX_unc = 2.9, 0.1 # floats, exponential scale length
# striated field
self.gamma, self.gamma_unc = 2.92,0.14 # striation and/or rel. elec. number dens. rescaling
return
##################################################################################
##################################################################################
# Transition function given by logistic function eq.5
##################################################################################
def L(self,z,h,w):
if np.isscalar(z):
z = np.array([z]) # scalar or numpy array with positions (height above disk, z; distance from center, r)
ones = np.ones(z.shape[0])
return 1./(ones + np.exp(-2. *(np.abs(z)- h)/w))
####################################################################################
# return distance from center for angle phi of logarithmic spiral
# r(phi) = rx * exp(b * phi) as np.array
####################################################################################
def r_log_spiral(self,phi):
if np.isscalar(phi): #Returns True if the type of num is a scalar type.
phi = np.array([phi])
ones = np.ones(phi.shape[0])
# self.rx.shape = 8
# phi.shape = p
# then result is given as (8,p)-dim array, each row stands for one rx
# vstack : Take a sequence of arrays and stack them vertically to make a single array
# tensordot(a, b, axes=2):Compute tensor dot product along specified axes for arrays >=1D.
result = np.tensordot(self.rx , np.exp((phi - 3.*pi*ones) / np.tan(pi/2. - self.idisk)),axes = 0)
result = np.vstack((result, np.tensordot(self.rx , np.exp((phi - pi*ones) / np.tan(pi/2. - self.idisk)),axes = 0) ))
result = np.vstack((result, np.tensordot(self.rx , np.exp((phi + pi*ones) / np.tan(pi/2. - self.idisk)),axes = 0) ))
return np.vstack((result, np.tensordot(self.rx , np.exp((phi + 3.*pi*ones) / np.tan(pi/2. - self.idisk)),axes = 0) ))
#############################################################################################
# Disk component in galactocentric cylindrical coordinates (r,phi,z)
#############################################################################################
def Bdisk(self,r,phi,z):
# Bdisk is purely azimuthal (toroidal) with the field strength b_ring
"""
r: N-dim np.array, distance from origin in GC cylindrical coordinates, is in kpc
z: N-dim np.array, height in kpc in GC cylindrical coordinates
phi:N-dim np.array, polar angle in GC cylindircal coordinates, in radian
Bdisk: (3,N)-dim np.array with (r,phi,z) components of disk field for each coordinate tuple
|Bdisk|: N-dim np.array, absolute value of Bdisk for each coordinate tuple
"""
if (not r.shape[0] == phi.shape[0]) and (not z.shape[0] == phi.shape[0]):
warnings.warn("List do not have equal shape! returning -1", RuntimeWarning)
return -1
# Return a new array of given shape and type, filled with zeros.
Bdisk = np.zeros((3,r.shape[0])) # Bdisk vector in r, phi, z
ones = np.ones(r.shape[0])
r_center = (r >= 3.) & (r < 5.1)
r_disk = (r >= 5.1) & (r <= 20.)
Bdisk[1,r_center] = self.bring
# Determine in which arm we are
# this is done for each coordinate individually
if np.sum(r_disk):
rls = self.r_log_spiral(phi[r_disk])
rls = np.abs(rls - r[r_disk])
arms = np.argmin(rls, axis = 0) % 8
# The magnetic spiral defined at r=5 kpc and fulls off as 1/r ,the field direction is given by:
Bdisk[0,r_disk] = np.sin(self.idisk)* self.b[arms] * (5. / r[r_disk])
Bdisk[1,r_disk] = np.cos(self.idisk)* self.b[arms] * (5. / r[r_disk])
Bdisk *= (ones - self.L(z,self.hdisk,self.wdisk)) # multiplied by L
return Bdisk, np.sqrt(np.sum(Bdisk**2.,axis = 0)) # the Bdisk, the normalization
# axis=0 : sum over index 0(row)
# axis=1 : sum over index 1(columns)
##############################################################################################
# Halo component
###############################################################################################
def Bhalo(self,r,z):
# Bhalo is purely azimuthal (toroidal), i.e. has only a phi component
if (not r.shape[0] == z.shape[0]):
warnings.warn("List do not have equal shape! returning -1", RuntimeWarning)
return -1
Bhalo = np.zeros((3,r.shape[0])) # Bhalo vector in r, phi, z rows: r, phi and z component
ones = np.ones(r.shape[0])
m = ( z != 0. )
# SEE equation 6.
Bhalo[1,m] = np.exp(-np.abs(z[m])/self.z0) * self.L(z[m], self.hdisk, self.wdisk) * \
( self.Bn * (ones[m] - self.L(r[m], self.rn, self.whalo)) * (z[m] > 0.) \
+ self.Bs * (ones[m] - self.L(r[m], self.rs, self.whalo)) * (z[m] < 0.) )
return Bhalo , np.sqrt(np.sum(Bhalo**2.,axis = 0))
##############################################################################################
# BX component (OUT OF THE PLANE)
###############################################################################################
def BX(self,r,z):
#BX is purely ASS and poloidal, i.e. phi component = 0
if (not r.shape[0] == z.shape[0]):
warnings.warn("List do not have equal shape! returning -1", RuntimeWarning)
return -1
BX= np.zeros((3,r.shape[0])) # BX vector in r, phi, z rows: r, phi and z component
m = np.sqrt(r**2. + z**2.) >= 1.
bx = lambda r_p: self.BX0 * np.exp(-r_p / self.rX) # eq.7
thetaX = lambda r,z,r_p: np.arctan(np.abs(z)/(r - r_p)) # eq.10
r_p = r[m] *self.rXc/(self.rXc + np.abs(z[m] ) / np.tan(self.ThetaX0)) # eq 9
m_r_b = r_p > self.rXc # region with constant elevation angle
m_r_l = r_p <= self.rXc # region with varying elevation angle
theta = np.zeros(z[m].shape[0])
b = np.zeros(z[m].shape[0])
r_p0 = (r[m])[m_r_b] - np.abs( (z[m])[m_r_b] ) / np.tan(self.ThetaX0) # eq.8
b[m_r_b] = bx(r_p0) * r_p0/ (r[m])[m_r_b] # the field strength in the constant elevation angle (b_x(r_p)r_p/r)
theta[m_r_b] = self.ThetaX0 * np.ones(theta.shape[0])[m_r_b]
b[m_r_l] = bx(r_p[m_r_l]) * (r_p[m_r_l]/(r[m])[m_r_l] )**2. # the field strength with varying elevation angle (b_x(r_p)(r_p/r)**2)
theta[m_r_l] = thetaX((r[m])[m_r_l] ,(z[m])[m_r_l] ,r_p[m_r_l])
mz = (z[m] == 0.)
theta[mz] = np.pi/2.
BX[0,m] = b * (np.cos(theta) * (z[m] >= 0) + np.cos(pi*np.ones(theta.shape[0]) - theta) * (z[m] < 0))
BX[2,m] = b * (np.sin(theta) * (z[m] >= 0) + np.sin(pi*np.ones(theta.shape[0]) - theta) * (z[m] < 0))
return BX, np.sqrt(np.sum(BX**2.,axis=0))
then, I create three arrays, one for r, one for phi, one for z. Each of these arrays has (e.g: thousand elements). like this:
import gmf
gmfm = gmf.GMF()
x = np.linspace(-20.,20.,100)
y = np.linspace(-20.,20.,100)
z = np.linspace(-1.,1.,x.shape[0])
xx,yy = np.meshgrid(x,y)
rr = np.sqrt(xx**2. + yy**2.)
theta = np.arctan2(yy,xx)
for i,r in enumerate(rr[:]):
Bdisk, Babs_d = gmfm.Bdisk(r,theta[i],z)
Bhalo, Babs_h = gmfm.Bhalo(r,z)
BX, Babs_x = gmfm.BX(r,z)
Btotal = Bdisk + Bhalo + BX
but I am getting when I make the addition of the three functions Btotal= Bdisk + Bhalo+BX) in 2d matrix with 3 rows and 100 columns.
My question is how can I add these three functions together to get Btotal in shape (n,) e.g( shape(100,)
because as I said in the beginning the three functions accept accept arrays (e.g. shape (1000) )then when we adding the three functions together we have to get the total also in the same shape (shape (n,)?
I do not know how can I do it, could you please tell me how can I make it.
thank you for your cooperation.
You need to correct the indention, for example in the def Bdisk method.
More importantly in
for i,r in enumerate(rr[:]):
Bdisk, Babs_d = gmfm.Bdisk(r,theta[i],z)
Bhalo, Babs_h = gmfm.Bhalo(r,z)
BX, Babs_x = gmfm.BX(r,z)
Btotal = Bdisk + Bhalo + BX
are you doing this addition for each iteration, or once at the end of the loop? You aren't accumulating any values over iterations. You are just throwing away the old ones, leaving you with the final iteration.
As for adding the array - it appears that all your arrays are initialed like:
Bdisk = np.zeros((3,r.shape[0]))
If that's what the method returns, then
Bdisk + Bhalo + BX
will just sum the corresponding elements of each array, resulting in a Btotal with the same shape. If you don't not like the shape of Btotal then change how Bdisk is calculated, because it has the same shape.

Efficient method of calculating density of irregularly spaced points

I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points, and then calculate the hot-spot value as a function of distance. This produced a better looking image, however it is too slow to use in my application. Since it's O(n), it works ok with 100 points, but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in an KDTree, and use the nearest 5 points to calculate the hot-spot value. This algorithm is O(1), so much faster with large dataset. It's still not fast enough, it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second time.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import matplotlib
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi
def grid_density_kdtree(xl, yl, xi, yi, dfactor):
zz = np.empty([len(xi),len(yi)], dtype=np.uint8)
zipped = zip(xl, yl)
kdtree = KDTree(zipped)
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
retvalset = kdtree.query((xc,yc), k=5)
for dist in retvalset[0]:
density = density + math.exp(-dfactor * pow(dist, 2)) / 5
zz[yci][xci] = min(density, 1.0) * 255
return zz
def grid_density(xl, yl, xi, yi):
ximin, ximax = min(xi), max(xi)
yimin, yimax = min(yi), max(yi)
xxi,yyi = np.meshgrid(xi,yi)
#zz = np.empty_like(xxi)
zz = np.empty([len(xi),len(yi)])
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
for i in range(0,len(xl)):
xd = math.fabs(xl[i] - xc)
yd = math.fabs(yl[i] - yc)
if xd < 1 and yd < 1:
dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
density = density + math.exp(-5.0 * pow(dist, 2))
zz[yci][xci] = density
return zz
def boxsum(img, w, h, r):
st = [0] * (w+1) * (h+1)
for x in xrange(w):
st[x+1] = st[x] + img[x]
for y in xrange(h):
st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
for x in xrange(w):
st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
for y in xrange(h):
y0 = max(0, y - r)
y1 = min(h, y + r + 1)
for x in xrange(w):
x0 = max(0, x - r)
x1 = min(w, x + r + 1)
img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 15
border = r * 2
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = [0] * (imgw * imgh)
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy * imgw + ix] += 1
for p in xrange(4):
boxsum(img, imgw, imgh, r)
a = np.array(img).reshape(imgh,imgw)
b = a[border:(border+h),border:(border+w)]
return b
def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 20
border = r
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = np.zeros((imgh,imgw))
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy][ix] += 1
return ndi.gaussian_filter(img, (r,r)) ## gaussian convolution
def generate_graph():
n = 1000
# data points range
data_ymin = -2.
data_ymax = 2.
data_xmin = -2.
data_xmax = 2.
# view area range
view_ymin = -.5
view_ymax = .5
view_xmin = -.5
view_xmax = .5
# generate data
xl = np.random.uniform(data_xmin, data_xmax, n)
yl = np.random.uniform(data_ymin, data_ymax, n)
zl = np.random.uniform(0, 1, n)
# get visible data points
xlvis = []
ylvis = []
for i in range(0,len(xl)):
if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
xlvis.append(xl[i])
ylvis.append(yl[i])
fig = plt.figure()
# plot histogram
plt1 = fig.add_subplot(221)
plt1.set_axis_off()
t0 = time.clock()
zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax],[view_xmin, view_xmax]], normed=True)
plt.title('numpy.histogram2d - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# plot density calculated with kdtree
plt2 = fig.add_subplot(222)
plt2.set_axis_off()
xi = np.linspace(view_xmin, view_xmax, 256)
yi = np.linspace(view_ymin, view_ymax, 256)
t0 = time.clock()
zd = grid_density_kdtree(xl, yl, xi, yi, 70)
plt.title('function of 5 nearest using kdtree\n'+str(time.clock()-t0)+"sec")
cmap=cm.jet
A = (cmap(zd/256.0)*255).astype(np.uint8)
#A[:,:,3] = zd
plt.imshow(A , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# gaussian filter
plt3 = fig.add_subplot(223)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('ndi.gaussian_filter - '+str(time.clock()-t0)+"sec")
plt.imshow(zd , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# boxsum smoothing
plt3 = fig.add_subplot(224)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('boxsum smoothing - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
if __name__=='__main__':
generate_graph()
plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi
data = np.random.rand(30000,2) ## create random dataset
inds = (data * 255).astype('uint') ## convert to indices
img = np.zeros((256,256)) ## blank image
for i in xrange(data.shape[0]): ## draw pixels
img[inds[i,0], inds[i,1]] += 1
img = ndi.gaussian_filter(img, (10,10))
A very simple implementation that could be done (with C) in realtime and that only takes fractions of a second in pure python is to just compute the result in screen space.
The algorithm is
Allocate the final matrix (e.g. 256x256) with all zeros
For each point in the dataset increment the corresponding cell
Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
Scale result and output
The computation of the box sum can be made very fast and independent on N by using a sum table. Every computation just requires two scan of the matrix... total complexity is O(S + WHP) where S is the number of points; W, H are width and height of output and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
st = [0] * (w+1) * (h+1)
for x in xrange(w):
st[x+1] = st[x] + img[x]
for y in xrange(h):
st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
for x in xrange(w):
st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
for y in xrange(h):
y0 = max(0, y - r)
y1 = min(h, y + r + 1)
for x in xrange(w):
x0 = max(0, x - r)
x1 = min(w, x + r + 1)
img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def saveGraph(w, h, data):
X = [x for x, y in data]
Y = [y for x, y in data]
x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
img = [0] * (w * h)
for x, y in data:
ix = int((x - x0) * kx)
iy = int((y - y0) * ky)
img[iy * w + ix] += 1
for p in xrange(4):
boxsum(img, w, h, 2)
mx = max(img)
k = 255.0 / mx
out = open("result.pgm", "wb")
out.write("P5\n%i %i 255\n" % (w, h))
out.write("".join(map(chr, [int(v*k) for v in img])))
out.close()
import random
data = [(random.random(), random.random())
for i in xrange(30000)]
saveGraph(256, 256, data)
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
used sqrt(average of squared values) instead of sum for the averaging pass
color-coded the results
stretching the result to always use the full color scale
drawn antialiased black dots where the data points are
made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform some ab convolution (number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (well, ab >> N^2 in a sparse system.)
Looking at the code above, you seem to have a loop in grid_density() which runs over the number of bins in y inside a loop of the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot on different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
you can read about it and implement it yourself from his book, if you like.
I have written this code myself, in fact
I have written a python-wrapped C implementation of this in 2D, which is not really ready for production (it is still single threaded, etc) but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (native python list) of objects with an x and y property and my C module will callback to a python function of your choice to run an interaction function for matched pairs of elements. it's designed for running contact force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet, because the other bits of the library aren't ready yet, I'll have to give you a zip of my current source but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a c compiler to make it work, and it's only been tested and working under *nix environemnts, if you're on windows you'll need the mingw c compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw less errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations, they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that he KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a gaussian shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations which should greatly speed up your calculations.
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.

Categories