Stop pyplot.contour from drawing a contour along a discontinuity - python

I have a 2d map of a coordinate transform. The data at each point is the azimuthal angle in the original coordinate system, which goes from 0 to 360. I'm trying to use pyplot.contour to plot lines of constant angle, e.g. 45 degrees. The contour appears along the 45 degree line between the two poles, but there's an additional part to the contour that connects the two poles along the 0/360 discontinuity. This makes a very jagged, ugly line as it basically just traces the pixels with a number close to 0 on one side and another close to 360 on the other.
Examples:
Here is an image using the full colour map:
You can see the discontinuity along the blue/red curve on the left side. One side is 360 degrees, the other is 0 degrees. When plotting contours, I get:
Note that all contours connect the two poles, but even though I have NOT plotted the 0 degree contour, all the other contours follow along the 0 degree discontinuity (because pyplot thinks if it's 0 on one side and 360 on the other, there must be all other angles in between).
Code to produce this data:
import numpy as np
import matplotlib.pyplot as plt

jgal = np.array(
    [
        [-0.054875539726, -0.873437108010, -0.483834985808],
        [0.494109453312, -0.444829589425, 0.746982251810],
        [-0.867666135858, -0.198076386122, 0.455983795705],
    ]
)

def s2v3(rra, rdec, r):
    pos0 = r * np.cos(rra) * np.cos(rdec)
    pos1 = r * np.sin(rra) * np.cos(rdec)
    pos2 = r * np.sin(rdec)
    return np.array([pos0, pos1, pos2])

def v2s3(pos):
    x = pos[0]
    y = pos[1]
    z = pos[2]
    if np.isscalar(x):
        x, y, z = np.array([x]), np.array([y]), np.array([z])
    rra = np.arctan2(y, x)
    low = np.where(rra < 0.0)
    high = np.where(rra > 2.0 * np.pi)
    if len(low[0]):
        rra[low] = rra[low] + (2.0 * np.pi)
    if len(high[0]):
        rra[high] = rra[high] - (2.0 * np.pi)
    rxy = np.sqrt(x ** 2 + y ** 2)
    rdec = np.arctan2(z, rxy)
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    if x.size == 1:
        rra = rra[0]
        rdec = rdec[0]
        r = r[0]
    return rra, rdec, r

def gal2fk5(gl, gb):
    rgl = np.deg2rad(gl)
    rgb = np.deg2rad(gb)
    r = 1.0
    pos = s2v3(rgl, rgb, r)
    pos1 = np.dot(pos.transpose(), jgal).transpose()
    rra, rdec, r = v2s3(pos1)
    dra = np.rad2deg(rra)
    ddec = np.rad2deg(rdec)
    return dra, ddec

def make_coords(resolution=50):
    width = 9
    height = 6
    px = width * resolution
    py = height * resolution
    coords = np.zeros((px, py, 4))
    for ix in range(0, px):
        for iy in range(0, py):
            l = 360.0 / px * ix - 180.0
            b = 180.0 / py * iy - 90.0
            dra, ddec = gal2fk5(l, b)
            coords[ix, iy, 0] = dra
            coords[ix, iy, 1] = ddec
            coords[ix, iy, 2] = l
            coords[ix, iy, 3] = b
    return coords

coords = make_coords()
# now do one of these
# plt.imshow(coords[:, :, 0], origin='lower')  # colour plot
plt.contour(
    coords[:, :, 0], levels=[45, 90, 135, 180, 225, 270, 315]
)  # contour plot with jagged ugliness
plt.show()
How can I either:
1. stop pyplot.contour from drawing a contour along the discontinuity, or
2. make pyplot.contour recognize that the 0/360 discontinuity in angle is not a real discontinuity at all?
I can just increase the resolution of the underlying data, but before I get a nice smooth line it starts to take a very long time and a lot of memory to plot.
I will also want to plot a contour along 0 degrees, but if I can figure out how to hide the discontinuity I can just shift it to somewhere else not near a contour. Or, if I can make #2 happen, it won't be an issue.

This is definitely still a hack, but you can get nice smooth contours with a two-fold approach:
1. Plot contours of the absolute value of the phase (which ranges from -180˚ to 180˚) so that there is no discontinuity.
2. Plot two sets of contours in a finite region so that numerical defects close to the tops and bottoms of the extrema do not creep in.
Here is the complete code to append to your example:
Z = np.exp(1j * np.pi * coords[:, :, 0] / 180.0)
Z *= np.exp(0.25j * np.pi / 2.0)  # shift to get the same contours as in your example
X = np.arange(300)
Y = np.arange(450)
N = 2
levels = 90 * (0.5 + (np.arange(N) + 0.5) / N)
c1 = plt.contour(X, Y, abs(np.angle(Z) * 180 / np.pi), levels=levels)
c2 = plt.contour(X, Y, abs(np.angle(Z * np.exp(0.5j * np.pi)) * 180 / np.pi), levels=levels)
One can generalize this code to get smooth contours for any "periodic" function. What is left to be done is to generate a new set of contours with the correct values so that colormaps, labels, etc. are applied correctly. However, there does not seem to be a simple way of doing this with matplotlib: the relevant QuadContourSet class does everything, and I do not see a simple way of constructing an appropriate contour object from the contours c1 and c2.
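As a partial workaround for the labels, clabel accepts a dict mapping each contour level to a label string, so you can at least attach the true angle text to the shifted contours. A sketch (the true_angle helper is hypothetical; you would have to write it to invert the phase shifts applied above):

def true_angle(level, shift_deg):
    # hypothetical helper: map a plotted |phase| level back to the original angle
    return (level + shift_deg) % 360.0

plt.clabel(c1, fmt={lev: "%.0f" % true_angle(lev, 0.0) for lev in c1.levels})
plt.clabel(c2, fmt={lev: "%.0f" % true_angle(lev, 90.0) for lev in c2.levels})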

I was interested in the exact same problem. One solution is to NaN out the data along the branch cut (see here); another is to use the max_jump argument in matplotx's contour().
I molded the solution into a Python package, cplot.
import cplot
import numpy as np

def f(z):
    return np.exp(1 / z)

cplot.show(f, (-1.0, +1.0, 400), (-1.0, +1.0, 400))
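For reference, the NaN idea amounts to blanking the pixels where the angle wraps, so the contour lines are simply broken there instead of tracing the seam. A sketch against the coords array from the question (the jump threshold of 180 degrees is an assumption, and only one axis is checked for brevity):

angle = coords[:, :, 0].copy()
# a jump of more than 180 degrees between neighbouring pixels marks the branch cut
jump = np.abs(np.diff(angle, axis=0)) > 180.0
angle[:-1][jump] = np.nan
plt.contour(angle, levels=[45, 90, 135, 180, 225, 270, 315])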


Draw a circle in a numpy array given index and radius without external libraries

I need to draw a circle in a 2D numpy array given [i,j] as indexes of the array, and r as the radius of the circle. Each time a condition is met at index [i,j], a circle should be drawn with that as the center point, increasing all values inside the circle by +1. I want to avoid the for-loops at the end where I draw the circle (where I use p,q to index) because I have to draw possibly millions of circles. Is there a way without for loops? I also don't want to import another library for just a single task.
Here is my current implementation:
for i in range(array_shape[0]):
    for j in range(array_shape[1]):
        if (condition):  # Draw circle if condition is fulfilled
            # Create a square of pixels with side lengths equal to radius of circle
            x_square_min = i - r
            x_square_max = i + r + 1
            y_square_min = j - r
            y_square_max = j + r + 1
            # Clamp this square to the edges of the array so circles near edges don't wrap around
            if x_square_min < 0:
                x_square_min = 0
            if y_square_min < 0:
                y_square_min = 0
            if x_square_max > array_shape[0]:
                x_square_max = array_shape[0]
            if y_square_max > array_shape[1]:
                y_square_max = array_shape[1]
            # Now loop over the box and draw circle inside of it
            for p in range(x_square_min, x_square_max):
                for q in range(y_square_min, y_square_max):
                    if (p - i) ** 2 + (q - j) ** 2 <= r ** 2:
                        # Incrementing because of the possibility of overlapping circles
                        new_array[p, q] += 1
If you're using the same radius for every single circle, you can simplify things significantly by only calculating the circle coordinates once and then adding the center coordinates to the circle points when needed. Here's the code:
# The main array of values is called array.
shape = array.shape
row_indices = np.arange(0, shape[0], 1)
col_indices = np.arange(0, shape[1], 1)

# Returns xy coordinates for a circle with a given radius, centered at (0, 0).
def points_in_circle(radius):
    a = np.arange(radius + 1)
    for x, y in zip(*np.where(a[:, np.newaxis] ** 2 + a ** 2 <= radius ** 2)):
        yield from set(((x, y), (x, -y), (-x, y), (-x, -y),))

# Set the radius value before running the code.
radius = RADIUS
circle_r = np.array(list(points_in_circle(radius)))

# Note that I'm using x as the row number and y as the column number.
# Center of circle is at (x_center, y_center). shape_0 and shape_1 refer to the main array
# so we can get rid of coordinates outside the bounds of array.
def add_center_to_circle(circle_points, x_center, y_center, shape_0, shape_1):
    circle = np.copy(circle_points)
    circle[:, 0] += x_center
    circle[:, 1] += y_center
    # Get rid of rows where coordinates are below 0 (can't be indexed)
    bad_rows = np.array(np.where(circle < 0)).T[:, 0]
    circle = np.delete(circle, bad_rows, axis=0)
    # Get rid of rows that are outside the upper bounds of the array.
    circle = circle[circle[:, 0] < shape_0, :]
    circle = circle[circle[:, 1] < shape_1, :]
    return circle

for x in row_indices:
    for y in col_indices:
        # You need to set CONDITION before running the code.
        if CONDITION:
            # Because circle_r is the same for all circles, it doesn't need to be
            # recalculated; just add x and y to circle_r each time CONDITION is met.
            circle_coords = add_center_to_circle(circle_r, x, y, shape[0], shape[1])
            array[tuple(circle_coords.T)] += 1
When I set radius = 10, array = np.random.rand(1200).reshape(40, 30) and replaced if CONDITION with if (x == 20 and y == 20) or (x == 25 and y == 20), I got this, which seems to be what you want:
Let me know if you have any questions.
Adding each circle can be vectorized. This solution iterates over the coordinates where the condition is met. On a 2-core colab instance ~60k circles with radius 30 can be added per second.
import numpy as np

np.random.seed(42)
arr = np.random.rand(400, 300)
r = 30
xx, yy = np.mgrid[-r:r + 1, -r:r + 1]
circle = xx ** 2 + yy ** 2 <= r ** 2
condition = np.where(arr > .999)  # np.where(arr > .5) to benchmark 60k circles
for x, y in zip(*condition):
    # valid indices of the array
    i = slice(max(x - r, 0), min(x + r + 1, arr.shape[0]))
    j = slice(max(y - r, 0), min(y + r + 1, arr.shape[1]))
    # visible slice of the circle
    ci = slice(abs(min(x - r, 0)), circle.shape[0] - abs(min(arr.shape[0] - (x + r + 1), 0)))
    cj = slice(abs(min(y - r, 0)), circle.shape[1] - abs(min(arr.shape[1] - (y + r + 1), 0)))
    arr[i, j] += circle[ci, cj]
Visualizing the resulting array arr:
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
plt.imshow(arr)
plt.show()
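If memory allows, even the loop over the centers can be removed with np.add.at, which accumulates correctly at repeated indices. A sketch of that variant (edge handling is done here by masking out-of-bounds points instead of slicing):

# (N, 2) centers and (K, 2) in-circle offsets, broadcast to all N*K points
centers = np.stack(condition, axis=1)
offsets = np.argwhere(circle) - r
pts = (centers[:, None, :] + offsets[None, :, :]).reshape(-1, 2)
# keep only points inside the array, then scatter-add in one call
ok = ((pts >= 0) & (pts < np.array(arr.shape))).all(axis=1)
np.add.at(arr, tuple(pts[ok].T), 1)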

How to use mgrid to interpolate between a rectangle and a circle

I am trying to create a 3D surface that has a 1/4 rectangle for the exterior and a 1/4 circle for the interior. I previously had help creating the 3D surface with an ellipse as the exterior, but I cannot get this to work for a rectangle. I have done the math by hand, which makes sense, but my code does not. I would greatly appreciate any help with this.
import numpy as np
import pyvista as pv
# parameters for the waveguide
# diameter of the inner circle
waveguide_throat = 30
# axes of the outer ellipse
ellipse_x = 250
ellipse_y = 170
# shape parameters for the z profile
depth_factor = 4
angle_factor = 40
# number of grid points in radial and angular direction
array_length = 100
phase_plug = 0
phase_plug_dia = 20
plug_offset = 5
dome_dia = 28
# theta is angle where x and y intersect
theta = np.arctan(ellipse_x / ellipse_y)
# chi is for x direction and lhi is for y direction
chi = np.linspace(0, theta, 100)
lhi = np.linspace(theta, np.pi/2, 100)
# mgrid to create structured grid
r, phi = np.mgrid[0:1:array_length*1j, 0:np.pi/2:array_length*1j]
# Rectangle exterior, circle interior
x = (ellipse_y * np.tan(chi)) * r + ((waveguide_throat / 2 * (1 - r)) * np.cos(phi))
y = (ellipse_x / np.tan(lhi)) * r + ((waveguide_throat / 2 * (1 - r)) * np.sin(phi))
# compute z profile
angle_factor = angle_factor / 10000
z = (ellipse_x / 2 * r / angle_factor) ** (1 / depth_factor)
plotter = pv.Plotter()
waveguide_mesh = pv.StructuredGrid(x, y, z)
plotter.add_mesh(waveguide_mesh)
plotter.show()
The linear interpolation you're trying to use is a general tool that should work (with one small caveat), so the first issue is with your rectangular edge.
Here's a sanity check which plots your interior and exterior lines:
# debugging: plot interior and exterior
exterior_points = np.array([
    ellipse_y * np.tan(chi),
    ellipse_x / np.tan(lhi),
    np.zeros_like(chi)
]).T
phi_aux = np.linspace(0, np.pi/2, array_length)
interior_points = np.array([
    waveguide_throat / 2 * np.cos(phi_aux),
    waveguide_throat / 2 * np.sin(phi_aux),
    np.zeros_like(phi_aux)
]).T
plotter = pv.Plotter()
plotter.add_mesh(pv.wrap(exterior_points))
plotter.add_mesh(pv.wrap(interior_points))
plotter.show()
The bottom left is your interior circle, looks good. The top right is what's supposed to be a rectangle, but isn't.
To see why your original surface looks the way it does, we have to note one more thing (this is the small caveat I mentioned): the orientation of your curves is also the opposite. This implies that you interpolate the "top" (in the screenshot) point of your interior curve with the "bottom" point of the exterior curve. This explains the weird fan shape.
So you need to fix the exterior curve, and make sure the orientation of the two edges is the same. Note that you can just create the two 1d arrays for the two edges, and then interpolate them. You don't have to come up with a symbolic formula that you plug into the interpolation step. If you have 1d arrays of the same shape x_interior, y_interior, x_exterior, y_exterior then you can then do x_exterior * r + x_interior * (1 - r) and the same for y. This means removing the mgrid call, only using an array r of shape (n, 1), and making use of array broadcasting to do the interpolation. This means doing r = np.linspace(0, 1, array_length)[:, None].
So the question is how to define your rectangle. You need to have the same number of points on the rectangular curve as you have on the circle (I would strongly recommend using the array_length parameter everywhere to ensure this!). Since you want to span the whole rectangle, I believe you have to choose an array index (i.e. a certain angle in the circular arc) which will map to the corner of the rectangle. Then it's a simple matter of varying only y for the points until that index, and x for the rest (or vice versa).
Here's what I mean: you know that the rectangle's corner is at angle theta in your code (although I think you have x and y mixed up if we assume the conventional relationship between "x", "y" and the tangent of the angle). Since theta goes from 0 to pi/2, and your phi values also go from 0 to pi/2, you should choose index (array_length * (2*theta/np.pi)).round().astype(int) - 1 (or something similar) that will map to the rectangle's corner. If you have a square, this gives you theta = pi/4, and consequently (array_length / 2).round().astype(int) - 1. For array_length = 3 this is index (2 - 1) == 1, which is the middle index for 3-length arrays. (The more points you have along the edge, the less it will matter if you commit an off-by-one error here.)
The only remaining complication then is that we have to explicitly broadcast the 1d z array to the common shape. And we can use the same math you used to get a rectangular edge that is equidistant in angles.
Your code fixed with this suggestion (note that I've added 1 to the corner index because I'm using it as a right-exclusive range index):
import numpy as np
import pyvista as pv
# parameters for the waveguide
# diameter of the inner circle
waveguide_throat = 30
# axes of the outer ellipse
ellipse_x = 250
ellipse_y = 170
# shape parameters for the z profile
depth_factor = 4
angle_factor = 40
# number of grid points in radial and angular direction
array_length = 100
# quarter circle interior line
phi = np.linspace(0, np.pi/2, array_length)
x_interior = waveguide_throat / 2 * np.cos(phi)
y_interior = waveguide_throat / 2 * np.sin(phi)
# theta is angle where x and y intersect
theta = np.arctan2(ellipse_y, ellipse_x)
# find array index which maps to the corner of the rectangle
corner_index = (array_length * (2*theta/np.pi)).round().astype(int)
# construct rectangular coordinates manually
x_exterior = np.zeros_like(x_interior)
y_exterior = x_exterior.copy()
phi_aux = np.linspace(0, theta, corner_index)
x_exterior[:corner_index] = ellipse_x
y_exterior[:corner_index] = ellipse_x * np.tan(phi_aux)
phi_aux = np.linspace(np.pi/2, theta, array_length - corner_index, endpoint=False)[::-1] # mind the reverse!
x_exterior[corner_index:] = ellipse_y / np.tan(phi_aux)
y_exterior[corner_index:] = ellipse_y
# interpolate between two curves
r = np.linspace(0, 1, array_length)[:, None] # shape (array_length, 1) for broadcasting
x = x_exterior * r + x_interior * (1 - r)
y = y_exterior * r + y_interior * (1 - r)
# debugging: plot interior and exterior
exterior_points = np.array([
    x_exterior,
    y_exterior,
    np.zeros_like(x_exterior),
]).T
interior_points = np.array([
    x_interior,
    y_interior,
    np.zeros_like(x_interior),
]).T
plotter = pv.Plotter()
plotter.add_mesh(pv.wrap(exterior_points))
plotter.add_mesh(pv.wrap(interior_points))
plotter.show()
# compute z profile
angle_factor = angle_factor / 10000
z = (ellipse_x / 2 * r / angle_factor) ** (1 / depth_factor)
# explicitly broadcast to the shape of x and y
z = np.broadcast_to(z, x.shape)
plotter = pv.Plotter()
waveguide_mesh = pv.StructuredGrid(x, y, z)
plotter.add_mesh(waveguide_mesh, style='wireframe')
plotter.show()
The curves look reasonable:
As does the interpolated surface:

Efficiently apply function to spheric neighbourhood in numpy array

I have a 3D numpy array of float values in Python. I need to retrieve all the elements in a sphere of radius r starting from a center point P(x, y, z). Then, I want to apply to the sphere points a function that updates their values and needs the distance to the center point to do this. I do these steps a lot of times and for large radius values, so I would like to have a solution that is as efficient as possible.
My current solution checks only the points in the bounding box of the sphere, as indicated here: Using a QuadTree to get all points within a bounding circle.
A sketch of the code looks like this:
# P(x, y, z): center of the sphere
for k1 in range(x - r, x + r + 1):
    for k2 in range(y - r, y + r + 1):
        for k3 in range(z - r, z + r + 1):
            # Sphere center - current point distance
            dist = np.sum((np.array([k1, k2, k3]) - np.array([x, y, z])) ** 2)
            if (dist <= r * r):
                # computeUpdatedValue(distance, radius): function that computes
                # the new value of the matrix at the current point
                newValue = computeUpdatedValue(dist, r)
                # Update the matrix
                mat[k1, k2, k3] = newValue
However, I thought that applying a mask to retrieve the points and then updating them based on distance in a vectorized manner would be more efficient. I have seen how to apply a circular kernel (How to apply a disc shaped mask to a numpy array?), but I do not know how to efficiently apply a function (one depending on the indices) to each of the mask's elements.
EDIT: If your array is very big compared to the region you are updating, the solution below will take much more memory than necessary. You can apply the same idea but only to the region where the sphere may fall:
def updateSphereBetter(mat, center, radius):
    # Find beginning and end of region of interest
    center = np.asarray(center)
    start = np.minimum(np.maximum(center - radius, 0), mat.shape)
    end = np.minimum(np.maximum(center + radius + 1, 0), mat.shape)
    # Slice region of interest
    mat_sub = mat[tuple(slice(s, e) for s, e in zip(start, end))]
    # Center coordinates relative to the region of interest
    center_rel = center - start
    # Same as before but with mat_sub and center_rel
    ind = np.indices(mat_sub.shape)
    ind = np.moveaxis(ind, 0, -1)
    dist_squared = np.sum(np.square(ind - center_rel), axis=-1)
    mask = dist_squared <= radius * radius
    mat_sub[mask] = computeUpdatedValue(dist_squared[mask], radius)
Note that since mat_sub is a view of mat, updating it updates the original array, so this produces the same result as before, but with less resources.
Here is a little proof of concept. I defined computeUpdatedValue so that it shows the distance from the center, and then plotted a few "sections" of an example:
import numpy as np
import matplotlib.pyplot as plt

def updateSphere(mat, center, radius):
    # Make array of all index coordinates
    ind = np.indices(mat.shape)
    # Compute the squared distances to each point
    ind = np.moveaxis(ind, 0, -1)
    dist_squared = np.sum(np.square(ind - center), axis=-1)
    # Make a mask for squared distances within squared radius
    mask = dist_squared <= radius * radius
    # Update masked values
    mat[mask] = computeUpdatedValue(dist_squared[mask], radius)

def computeUpdatedValue(dist_squared, radius):
    # 1 at the center of the sphere and 0 at the surface
    return np.clip(1 - np.sqrt(dist_squared) / radius, 0, 1)

mat = np.zeros((100, 60, 80))
updateSphere(mat, [50, 20, 40], 20)

plt.subplot(131)
plt.imshow(mat[:, :, 30], vmin=0, vmax=1)
plt.subplot(132)
plt.imshow(mat[:, :, 40], vmin=0, vmax=1)
plt.subplot(133)
plt.imshow(mat[:, :, 55], vmin=0, vmax=1)
Output:

Generate profiles through a 2D array at an angle without altering pixels

I'd like to plot two profiles through the highest intensity point in a 2D numpy array, which is an image of a blob (i.e. a line through the semi-major axis, and another line through the semi-minor axis). The blob is rotated at an angle theta counterclockwise from the standard x-axis and is asymmetric.
It is a 600x600 array with a max intensity of 1 (at only one pixel) that is located right at the center at (300, 300). The angle rotation from the x-axis (which then gives the location of the semi-major axis when rotated by that angle) is theta = 89.54 degrees. I do not want to use scipy.ndimage.rotate because it uses spline interpolation, and I do not want to change any of my pixel values. But I suppose a nearest-neighbor interpolation method would be okay.
I tried generating lines corresponding to the major and minor axes across the image, but the result was not right at all (the peak was far less than 1), so maybe I did something wrong. The code for this is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import ndimage

def profiles_at_angle(image, axis, theta):
    theta = np.deg2rad(theta)
    if axis == 'major':
        x_0, y_0 = 0, 300 - 300 * np.tan(theta)
        x_1, y_1 = 599, 300 + 300 * np.tan(theta)
    elif axis == 'minor':
        x_0, y_0 = 300 - 300 * np.tan(theta), 599
        x_1, y_1 = 300 + 300 * np.tan(theta), -599
    num = 600
    x, y = np.linspace(x_0, x_1, num), np.linspace(y_0, y_1, num)
    z = ndimage.map_coordinates(image, np.vstack((x, y)))
    fig, axes = plt.subplots(nrows=2)
    axes[0].imshow(image, cmap='gray')
    axes[0].axis('image')
    axes[1].plot(z)
    plt.xlim(250, 350)
    plt.show()

profiles_at_angle(image, 'major', theta)
Did I do something obviously wrong in my code above? Or how else can I accomplish this? Thank you.
Edit: Here are some example images. Sorry for the bad quality; my browser crashed every time I tried uploading them anywhere so I had to take photos of the screen.
Figure 1: This is the result of my code above, which is clearly wrong since the peak should be at 1. I'm not sure what I did wrong though.
Figure 2: I made this plot below by just taking the profiles through the standard x and y axes, ignoring any rotation (this only looks good coincidentally, because the real angle of rotation is so close to 90 degrees, so I was able to just switch the labels and get this). I want my result to look something like this, but taking the correct rotation angle into account.
Edit: It could be useful to run tests on this method using data very much like my own (it's a 2D Gaussian with nearly the same parameters):
image = np.random.random((600, 600))

def generate(data_set):
    xvec = np.arange(0, np.shape(data_set)[1], 1)
    yvec = np.arange(0, np.shape(data_set)[0], 1)
    X, Y = np.meshgrid(xvec, yvec)
    return X, Y

def gaussian_func(xy, x0, y0, sigma_x, sigma_y, amp, theta, offset):
    x, y = xy
    a = (np.cos(theta)) ** 2 / (2 * sigma_x ** 2) + (np.sin(theta)) ** 2 / (2 * sigma_y ** 2)
    b = -np.sin(2 * theta) / (4 * sigma_x ** 2) + np.sin(2 * theta) / (4 * sigma_y ** 2)
    c = (np.sin(theta)) ** 2 / (2 * sigma_x ** 2) + (np.cos(theta)) ** 2 / (2 * sigma_y ** 2)
    inner = a * (x - x0) ** 2
    inner += 2 * b * (x - x0) * (y - y0)
    inner += c * (y - y0) ** 2
    return (offset + amp * np.exp(-inner)).ravel()

xx, yy = generate(image)
image = gaussian_func((xx.ravel(), yy.ravel()), 300, 300, 5, 4, 1, 1.56, 0)
image = np.reshape(image, (600, 600))
This should do it for you. You just did not properly compute your lines.
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt

theta = 65
peak = np.argwhere(image == 1)[0]
x = np.linspace(peak[0] - 100, peak[0] + 100, 1000)
y = lambda x: (x - peak[1]) * np.tan(np.deg2rad(theta)) + peak[0]
y_maj = np.linspace(y(peak[1] - 100), y(peak[1] + 100), 1000)
y = lambda x: -(x - peak[1]) / np.tan(np.deg2rad(theta)) + peak[0]
y_min = np.linspace(y(peak[1] - 100), y(peak[1] + 100), 1000)
del y
z_min = scipy.ndimage.map_coordinates(image, np.vstack((x, y_min)))
z_maj = scipy.ndimage.map_coordinates(image, np.vstack((x, y_maj)))
fig, axes = plt.subplots(nrows=2)
axes[0].imshow(image)
axes[0].plot(x, y_maj)
axes[0].plot(x, y_min)
axes[0].axis('image')
axes[1].plot(z_min)
axes[1].plot(z_maj)
plt.show()
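One note: map_coordinates defaults to cubic spline interpolation (order=3), which the question explicitly wanted to avoid. Passing order=0 switches it to nearest-neighbour sampling, so the profile values come straight from existing pixels:

z_min = scipy.ndimage.map_coordinates(image, np.vstack((x, y_min)), order=0)
z_maj = scipy.ndimage.map_coordinates(image, np.vstack((x, y_maj)), order=0)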

Efficient method of calculating density of irregularly spaced points

I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through the numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding.)
My second attempt was to iterate over all the data points and calculate the hot-spot value as a function of distance. This produced a better-looking image, but it is too slow to use in my application. Since it's O(n) per pixel, it works OK with 100 points but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in a KDTree and use the 5 nearest points to calculate the hot-spot value. This algorithm is O(1) per pixel, so much faster with a large dataset. It's still not fast enough: it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total; at 3x zoom around 60 points are visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502, and the gaussian filter suggested by Luke (improved to handle edges better and to allow zooming in) is below:
import matplotlib
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi

def grid_density_kdtree(xl, yl, xi, yi, dfactor):
    zz = np.empty([len(xi), len(yi)], dtype=np.uint8)
    zipped = zip(xl, yl)
    kdtree = KDTree(zipped)
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            retvalset = kdtree.query((xc, yc), k=5)
            for dist in retvalset[0]:
                density = density + math.exp(-dfactor * pow(dist, 2)) / 5
            zz[yci][xci] = min(density, 1.0) * 255
    return zz

def grid_density(xl, yl, xi, yi):
    ximin, ximax = min(xi), max(xi)
    yimin, yimax = min(yi), max(yi)
    xxi, yyi = np.meshgrid(xi, yi)
    #zz = np.empty_like(xxi)
    zz = np.empty([len(xi), len(yi)])
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            for i in range(0, len(xl)):
                xd = math.fabs(xl[i] - xc)
                yd = math.fabs(yl[i] - yc)
                if xd < 1 and yd < 1:
                    dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
                    density = density + math.exp(-5.0 * pow(dist, 2))
            zz[yci][xci] = density
    return zz

def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in xrange(w):
        st[x + 1] = st[x] + img[x]
    for y in xrange(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in xrange(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in xrange(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in xrange(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]

def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 15
    border = r * 2
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = [0] * (imgw * imgh)
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy * imgw + ix] += 1
    for p in xrange(4):
        boxsum(img, imgw, imgh, r)
    a = np.array(img).reshape(imgh, imgw)
    b = a[border:(border + h), border:(border + w)]
    return b

def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 20
    border = r
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = np.zeros((imgh, imgw))
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy][ix] += 1
    return ndi.gaussian_filter(img, (r, r))  ## gaussian convolution

def generate_graph():
    n = 1000
    # data points range
    data_ymin = -2.
    data_ymax = 2.
    data_xmin = -2.
    data_xmax = 2.
    # view area range
    view_ymin = -.5
    view_ymax = .5
    view_xmin = -.5
    view_xmax = .5
    # generate data
    xl = np.random.uniform(data_xmin, data_xmax, n)
    yl = np.random.uniform(data_ymin, data_ymax, n)
    zl = np.random.uniform(0, 1, n)
    # get visible data points
    xlvis = []
    ylvis = []
    for i in range(0, len(xl)):
        if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
            xlvis.append(xl[i])
            ylvis.append(yl[i])
    fig = plt.figure()

    # plot histogram
    plt1 = fig.add_subplot(221)
    plt1.set_axis_off()
    t0 = time.clock()
    zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax], [view_xmin, view_xmax]], normed=True)
    plt.title('numpy.histogram2d - ' + str(time.clock() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # plot density calculated with kdtree
    plt2 = fig.add_subplot(222)
    plt2.set_axis_off()
    xi = np.linspace(view_xmin, view_xmax, 256)
    yi = np.linspace(view_ymin, view_ymax, 256)
    t0 = time.clock()
    zd = grid_density_kdtree(xl, yl, xi, yi, 70)
    plt.title('function of 5 nearest using kdtree\n' + str(time.clock() - t0) + "sec")
    cmap = cm.jet
    A = (cmap(zd / 256.0) * 255).astype(np.uint8)
    #A[:,:,3] = zd
    plt.imshow(A, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # gaussian filter
    plt3 = fig.add_subplot(223)
    plt3.set_axis_off()
    t0 = time.clock()
    zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('ndi.gaussian_filter - ' + str(time.clock() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # boxsum smoothing
    plt3 = fig.add_subplot(224)
    plt3.set_axis_off()
    t0 = time.clock()
    zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('boxsum smoothing - ' + str(time.clock() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

if __name__ == '__main__':
    generate_graph()
    plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi

data = np.random.rand(30000, 2)      ## create random dataset
inds = (data * 255).astype('uint')   ## convert to indices
img = np.zeros((256, 256))           ## blank image
for i in xrange(data.shape[0]):      ## draw pixels
    img[inds[i, 0], inds[i, 1]] += 1
img = ndi.gaussian_filter(img, (10, 10))
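As a side note, the pixel-drawing loop above can itself be vectorized with np.add.at, which accumulates correctly at repeated indices; a sketch using the same names:

img = np.zeros((256, 256))
np.add.at(img, (inds[:, 0], inds[:, 1]), 1)
img = ndi.gaussian_filter(img, (10, 10))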
A very simple implementation that could be done (with C) in realtime and that only takes fractions of a second in pure python is to just compute the result in screen space.
The algorithm is:
1. Allocate the final matrix (e.g. 256x256) with all zeros.
2. For each point in the dataset, increment the corresponding cell.
3. Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
4. Scale the result and output.
The computation of the box sum can be made very fast and independent of N by using a sum table. Each pass just requires two scans of the matrix... the total complexity is O(S + WHP), where S is the number of points, W and H are the width and height of the output, and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in xrange(w):
        st[x + 1] = st[x] + img[x]
    for y in xrange(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in xrange(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in xrange(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in xrange(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]

def saveGraph(w, h, data):
    X = [x for x, y in data]
    Y = [y for x, y in data]
    x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    img = [0] * (w * h)
    for x, y in data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        img[iy * w + ix] += 1
    for p in xrange(4):
        boxsum(img, w, h, 2)
    mx = max(img)
    k = 255.0 / mx
    out = open("result.pgm", "wb")
    out.write("P5\n%i %i 255\n" % (w, h))
    out.write("".join(map(chr, [int(v * k) for v in img])))
    out.close()

import random
data = [(random.random(), random.random())
        for i in xrange(30000)]
saveGraph(256, 256, data)
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program, with just a few cosmetic changes:
- used sqrt(average of squared values) instead of sum for the averaging pass
- color-coded the results
- stretched the result to always use the full color scale
- drew antialiased black dots where the data points are
- made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and it can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is the bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform an a*b convolution (where a and b are the number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (ab >> N^2 in a sparse system).
Looking at the code above, you seem to have a loop in grid_density() that runs over the number of bins in y inside a loop over the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot on different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
You can read about it and implement it yourself from his book, if you like. In fact, I have written a python-wrapped C implementation of this in 2D, which is not really ready for production (it is still single-threaded, etc.), but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (a native python list) of objects with an x and y property, and my C module will call back to a python function of your choice to run an interaction function for matched pairs of elements. It's designed for running contact-force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet (the other bits of the library aren't ready), I'll have to give you a zip of my current source, but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a C compiler to make it work, and it's only been tested and working under *nix environments; if you're on Windows you'll need the MinGW C compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw fewer errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations; they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple of standard deviations.
If you find that the KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
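A minimal sketch of that idea with scipy.spatial.cKDTree (the function name, grid layout, and the 3-sigma cutoff are assumptions, not code from the answer):

import numpy as np
from scipy.spatial import cKDTree

def gaussian_density(points, grid_xy, sigma):
    # Sum one Gaussian per nearby data point; the ball query keeps the
    # common case close to linear time.
    tree = cKDTree(points)
    out = np.zeros(len(grid_xy))
    for i, p in enumerate(grid_xy):
        idx = tree.query_ball_point(p, 3.0 * sigma)
        if idx:
            d2 = np.sum((points[idx] - p) ** 2, axis=1)
            out[i] = np.sum(np.exp(-d2 / (2.0 * sigma ** 2)))
    return out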
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a gaussian-shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater than that of the other approaches, but you can take advantage of numpy's efficient array operations, which should greatly speed up your calculations.
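A sketch of the separable filtering this describes (the kernel half-width and sigma are assumptions):

import numpy as np
import scipy.ndimage as ndi

def separable_gaussian(img, sigma=10.0, halfwidth=40):
    # Build a 1-D Gaussian kernel and run it along each axis in turn:
    # two O(P*M^2) passes instead of one O(P^2*M^2) 2-D convolution.
    x = np.arange(-halfwidth, halfwidth + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    tmp = ndi.convolve1d(img, k, axis=0)
    return ndi.convolve1d(tmp, k, axis=1)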
Just a note: the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the defaults to give you the representation you want. Having said that, many of the other solutions are impressive too.
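For illustration, a finer histogram plus a light blur already looks much better than the defaults. A sketch reusing the names from the question's code (the bin count of 256 and the blur width are assumptions):

zd, xe, ye = np.histogram2d(yl, xl, bins=256,
                            range=[[view_ymin, view_ymax], [view_xmin, view_xmax]])
zd = ndi.gaussian_filter(zd, 8)
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])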
