computing weight based on distance from centre of ellipse - python

I was using a circular image mask before and I was calculating a weight based on the distance from the centre of the circle as follows:
import numpy as np

def create_mask(image, radius, center=(0, 0)):
    r, c, d = image.shape
    x, y = np.ogrid[:r, :c]
    distance = np.sqrt((x - center[0])**2 + (y - center[1])**2)
    m = distance < radius
    distance[m] = 1.0 - distance[m] / radius
    array = np.zeros((r, c))
    array[m] = distance[m]
    return array
This was basically setting the highest weight at the centre, with the weight dropping linearly towards the edges.
Now, I want to do something similar with an ellipse. The ellipse can have very different radii along the two dimensions, and I would like the weight to drop linearly with distance here as well. However, regardless of the long or the short radius, I would like the weights to decay at the same rate towards the edges. I am guessing I need to scale by both radii to achieve this, but I was unable to figure it out.

I'm not sure about linear weights, but you can achieve a continuous array of weights from 1 to 0 using the following (and I'm sure there's a more efficient way to do this):

import numpy as np

ellipse = lambda x0, y0, r_x, r_y: lambda x, y: ((x - x0) / r_x)**2 + ((y - y0) / r_y)**2

def gen_ellipse(el, lower, upper, step):
    coords = np.arange(lower, upper, step)
    x, y = np.meshgrid(coords, coords)
    mask = el(x, y)
    mask[mask > 1] = 1  # clamp so that points outside the ellipse end up with weight 0
    return 1 - mask
For example:
> %pylab
> el = ellipse(0.0, 0.0, 0.3, 0.8)
> mask = gen_ellipse(el, -1.0, 1.0, 0.0025)
> imshow(mask, cmap=get_cmap('Greys'))
Where black is 1 and white is 0.
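Note that the decay above is quadratic in the normalized distance. For a drop that is linear and reaches 0 at the boundary at the same rate along both axes, you can take the square root of the normalized elliptical form; a minimal sketch (the function name is mine):

import numpy as np

def linear_ellipse_mask(rows, cols, r_row, r_col, center):
    # Normalized elliptical distance: 0 at the centre, 1 on the boundary.
    y, x = np.ogrid[:rows, :cols]
    d = np.sqrt(((y - center[0]) / r_row)**2 + ((x - center[1]) / r_col)**2)
    # Weight falls linearly from 1 at the centre to 0 at the ellipse edge,
    # and stays 0 outside, however different the two radii are.
    return np.clip(1.0 - d, 0.0, 1.0)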


Draw a circle in a numpy array given index and radius without external libraries

I need to draw a circle in a 2D numpy array given [i,j] as indexes of the array, and r as the radius of the circle. Each time a condition is met at index [i,j], a circle should be drawn with that as the center point, increasing all values inside the circle by +1. I want to avoid the for-loops at the end where I draw the circle (where I use p,q to index) because I have to draw possibly millions of circles. Is there a way without for loops? I also don't want to import another library for just a single task.
Here is my current implementation:
for i in range(array_shape[0]):
    for j in range(array_shape[1]):
        if (condition): # Draw circle if condition is fulfilled
            # Create a square of pixels with side lengths equal to radius of circle
            x_square_min = i - r
            x_square_max = i + r + 1
            y_square_min = j - r
            y_square_max = j + r + 1
            # Clamp this square to the edges of the array so circles near edges don't wrap around
            if x_square_min < 0:
                x_square_min = 0
            if y_square_min < 0:
                y_square_min = 0
            if x_square_max > array_shape[0]:
                x_square_max = array_shape[0]
            if y_square_max > array_shape[1]:
                y_square_max = array_shape[1]
            # Now loop over the box and draw the circle inside of it
            for p in range(x_square_min, x_square_max):
                for q in range(y_square_min, y_square_max):
                    if (p - i) ** 2 + (q - j) ** 2 <= r ** 2:
                        new_array[p, q] += 1  # increment so that overlapping circles accumulate
If you're using the same radius for every single circle, you can simplify things significantly by only calculating the circle coordinates once and then adding the center coordinates to the circle points when needed. Here's the code:
import numpy as np

# The main array of values is called array.
shape = array.shape
row_indices = np.arange(0, shape[0], 1)
col_indices = np.arange(0, shape[1], 1)

# Returns xy coordinates for a circle with a given radius, centered at (0, 0).
def points_in_circle(radius):
    a = np.arange(radius + 1)
    for x, y in zip(*np.where(a[:, np.newaxis]**2 + a**2 <= radius**2)):
        yield from set(((x, y), (x, -y), (-x, y), (-x, -y)))

# Set the radius value before running the code.
radius = RADIUS
circle_r = np.array(list(points_in_circle(radius)))

# Note that I'm using x as the row number and y as the column number.
# The center of the circle is at (x_center, y_center). shape_0 and shape_1 refer
# to the main array, so we can get rid of coordinates outside its bounds.
def add_center_to_circle(circle_points, x_center, y_center, shape_0, shape_1):
    circle = np.copy(circle_points)
    circle[:, 0] += x_center
    circle[:, 1] += y_center
    # Get rid of rows where coordinates are below 0 (can't be indexed).
    bad_rows = np.array(np.where(circle < 0)).T[:, 0]
    circle = np.delete(circle, bad_rows, axis=0)
    # Get rid of rows that are outside the upper bounds of the array.
    circle = circle[circle[:, 0] < shape_0, :]
    circle = circle[circle[:, 1] < shape_1, :]
    return circle

for x in row_indices:
    for y in col_indices:
        # You need to set CONDITION before running the code.
        if CONDITION:
            # Because circle_r is the same for all circles, it doesn't need to be
            # recalculated each time. All you need to do is add x and y to circle_r
            # whenever CONDITION is met.
            circle_coords = add_center_to_circle(circle_r, x, y, shape[0], shape[1])
            array[tuple(circle_coords.T)] += 1
When I set radius = 10, array = np.random.rand(1200).reshape(40, 30) and replaced if CONDITION with if (x == 20 and y == 20) or (x == 25 and y == 20), I got this, which seems to be what you want:
Let me know if you have any questions.
Adding each circle can be vectorized. This solution iterates over the coordinates where the condition is met. On a 2-core colab instance ~60k circles with radius 30 can be added per second.
import numpy as np

np.random.seed(42)
arr = np.random.rand(400, 300)
r = 30
xx, yy = np.mgrid[-r:r+1, -r:r+1]
circle = xx**2 + yy**2 <= r**2
condition = np.where(arr > .999)  # np.where(arr > .5) to benchmark 60k circles
for x, y in zip(*condition):
    # valid indices of the array
    i = slice(max(x - r, 0), min(x + r + 1, arr.shape[0]))
    j = slice(max(y - r, 0), min(y + r + 1, arr.shape[1]))
    # visible slice of the circle (clipped where the circle hangs over an edge)
    ci = slice(abs(min(x - r, 0)), circle.shape[0] - abs(min(arr.shape[0] - (x + r + 1), 0)))
    cj = slice(abs(min(y - r, 0)), circle.shape[1] - abs(min(arr.shape[1] - (y + r + 1), 0)))
    arr[i, j] += circle[ci, cj]
Visualizing the resulting array arr:
import matplotlib.pyplot as plt
plt.figure(figsize=(8,8))
plt.imshow(arr)
plt.show()

Efficiently apply function to spheric neighbourhood in numpy array

I have a 3D numpy array of float values in Python. I need to retrieve all the elements in a sphere of radius r starting from a center point P(x, y, z). Then, I want to apply to the sphere points a function that updates their values and needs the distance to the center point to do this. I do these steps a lot of times and for large radius values, so I would like to have a solution that is as efficient as possible.
My current solution checks only the points in the bounding box of the sphere, as indicated here: Using a QuadTree to get all points within a bounding circle.
A sketch of the code looks like this:
# P(x, y, z): center of the sphere
for k1 in range(x - r, x + r + 1):
    for k2 in range(y - r, y + r + 1):
        for k3 in range(z - r, z + r + 1):
            # Squared distance from the sphere center to the current point
            dist = np.sum((np.array([k1, k2, k3]) - np.array([x, y, z])) ** 2)
            if dist <= r * r:
                # computeUpdatedValue(distance, radius): computes the new value
                # of the matrix at the current point
                newValue = computeUpdatedValue(dist, r)
                # Update the matrix
                mat[k1, k2, k3] = newValue
However, I thought that applying a mask to retrieve the points and then updating them based on distance in a vectorized manner would be more efficient. I have seen how to apply a circular kernel (How to apply a disc shaped mask to a numpy array?), but I do not know how to efficiently apply a function that depends on the indices to each of the mask's elements.
EDIT: If your array is very big compared to the region you are updating, the solution below will take much more memory than necessary. You can apply the same idea but only to the region where the sphere may fall:
def updateSphereBetter(mat, center, radius):
    # Find the beginning and end of the region of interest
    center = np.asarray(center)
    start = np.minimum(np.maximum(center - radius, 0), mat.shape)
    end = np.minimum(np.maximum(center + radius + 1, 0), mat.shape)
    # Slice the region of interest
    mat_sub = mat[tuple(slice(s, e) for s, e in zip(start, end))]
    # Center coordinates relative to the region of interest
    center_rel = center - start
    # Same as before but with mat_sub and center_rel
    ind = np.indices(mat_sub.shape)
    ind = np.moveaxis(ind, 0, -1)
    dist_squared = np.sum(np.square(ind - center_rel), axis=-1)
    mask = dist_squared <= radius * radius
    mat_sub[mask] = computeUpdatedValue(dist_squared[mask], radius)
Note that since mat_sub is a view of mat, updating it updates the original array, so this produces the same result as before, but with less resources.
Here is a little proof of concept. I defined computeUpdatedValue so that it shows the distance from the center, and then plotted a few "sections" of an example:
import numpy as np
import matplotlib.pyplot as plt

def updateSphere(mat, center, radius):
    # Make an array of all index coordinates
    ind = np.indices(mat.shape)
    ind = np.moveaxis(ind, 0, -1)
    # Compute the squared distance to each point
    dist_squared = np.sum(np.square(ind - center), axis=-1)
    # Make a mask for squared distances within the squared radius
    mask = dist_squared <= radius * radius
    # Update the masked values
    mat[mask] = computeUpdatedValue(dist_squared[mask], radius)

def computeUpdatedValue(dist_squared, radius):
    # 1 at the center of the sphere and 0 at the surface
    return np.clip(1 - np.sqrt(dist_squared) / radius, 0, 1)

mat = np.zeros((100, 60, 80))
updateSphere(mat, [50, 20, 40], 20)
plt.subplot(131)
plt.imshow(mat[:, :, 30], vmin=0, vmax=1)
plt.subplot(132)
plt.imshow(mat[:, :, 40], vmin=0, vmax=1)
plt.subplot(133)
plt.imshow(mat[:, :, 55], vmin=0, vmax=1)
plt.show()
Output:

Creating an image mask with a linear gradient

I am creating a circular mask in python as follows:
import numpy as np

def make_mask(image, radius, center=(0, 0)):
    r, c, d = image.shape
    y, x = np.ogrid[-center[0]:r - center[0], -center[1]:c - center[1]]
    mask = x*x + y*y <= radius*radius
    array = np.zeros((r, c))
    array[mask] = 1
    return array
This returns a mask of shape (r, c). What I would like is a weighted mask where the weight is 1 at the center of the image (given by the center parameter) and decreases linearly towards the edge of the image. So this should be an additional weight between 0 and 1 (0 not included) along the line from the center. I was thinking this should be something like:
distance = (center[0] - x)**2 + (center[1] - y)**2
# weigh it inversely to distance from center
mask = (x*x + y*y) * 1.0/distance
However, this will result in divide by 0 and the mask would not be between 0 and 1 either.
First, if you want the weight to be linear, you need to take the square root of what you have for distance (i.e. what you're calling "distance" isn't the distance from the center but the square of that, so you should rename it to something like R_squared). So:

R_squared = (center[0] - x)**2 + (center[1] - y)**2  # what you had as "distance"
r = np.sqrt(R_squared)

Then, since r starts at 0 where you want the weight to be 1, flip and scale it: say you want the weight to be 0 at a distance L from the center; then your equation is:
weight = 1 - r/L
Here this will be 1 where r==0 and 0 where r==L.
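Putting it together with the mask from the question, a minimal sketch (taking L to be the radius parameter, and clipping the weight to 0 beyond it):

import numpy as np

def make_weighted_mask(image, radius, center=(0, 0)):
    r, c = image.shape[:2]
    y, x = np.ogrid[-center[0]:r - center[0], -center[1]:c - center[1]]
    dist = np.sqrt(x*x + y*y)  # actual distance, not its square
    # 1 at the centre, falling linearly to 0 at radius, 0 beyond it
    return np.clip(1.0 - dist / radius, 0.0, 1.0)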

Stop pyplot.contour from drawing a contour along a discontinuity

I have a 2D map of a coordinate transform. The data at each point is the azimuthal angle in the original coordinate system, which goes from 0 to 360. I'm trying to use pyplot.contour to plot lines of constant angle, e.g. 45 degrees. The contour appears along the 45 degree line between the two poles, but there's an additional part of the contour that connects the two poles along the 0/360 discontinuity. This makes a very jagged, ugly line, as it basically just traces the pixels with a number close to 0 on one side and close to 360 on the other.
Examples:
Here is an image using full colour map:
You can see the discontinuity along the blue/red curve on the left side. One side is 360 degrees, the other is 0 degrees. When plotting contours, I get:
Note that all contours connect the two poles, but even though I have NOT plotted the 0 degree contour, all the other contours follow along the 0 degree discontinuity (because pyplot thinks if it's 0 on one side and 360 on the other, there must be all other angles in between).
Code to produce this data:
import numpy as np
import matplotlib.pyplot as plt

jgal = np.array(
    [
        [-0.054875539726, -0.873437108010, -0.483834985808],
        [0.494109453312, -0.444829589425, 0.746982251810],
        [-0.867666135858, -0.198076386122, 0.455983795705],
    ]
)

def s2v3(rra, rdec, r):
    pos0 = r * np.cos(rra) * np.cos(rdec)
    pos1 = r * np.sin(rra) * np.cos(rdec)
    pos2 = r * np.sin(rdec)
    return np.array([pos0, pos1, pos2])

def v2s3(pos):
    x = pos[0]
    y = pos[1]
    z = pos[2]
    if np.isscalar(x):
        x, y, z = np.array([x]), np.array([y]), np.array([z])
    rra = np.arctan2(y, x)
    low = np.where(rra < 0.0)
    high = np.where(rra > 2.0 * np.pi)
    if len(low[0]):
        rra[low] = rra[low] + (2.0 * np.pi)
    if len(high[0]):
        rra[high] = rra[high] - (2.0 * np.pi)
    rxy = np.sqrt(x ** 2 + y ** 2)
    rdec = np.arctan2(z, rxy)
    r = np.sqrt(x ** 2 + y ** 2 + z ** 2)
    if x.size == 1:
        rra = rra[0]
        rdec = rdec[0]
        r = r[0]
    return rra, rdec, r

def gal2fk5(gl, gb):
    rgl = np.deg2rad(gl)
    rgb = np.deg2rad(gb)
    r = 1.0
    pos = s2v3(rgl, rgb, r)
    pos1 = np.dot(pos.transpose(), jgal).transpose()
    rra, rdec, r = v2s3(pos1)
    dra = np.rad2deg(rra)
    ddec = np.rad2deg(rdec)
    return dra, ddec

def make_coords(resolution=50):
    width = 9
    height = 6
    px = width * resolution
    py = height * resolution
    coords = np.zeros((px, py, 4))
    for ix in range(0, px):
        for iy in range(0, py):
            l = 360.0 / px * ix - 180.0
            b = 180.0 / py * iy - 90.0
            dra, ddec = gal2fk5(l, b)
            coords[ix, iy, 0] = dra
            coords[ix, iy, 1] = ddec
            coords[ix, iy, 2] = l
            coords[ix, iy, 3] = b
    return coords

coords = make_coords()
# now do one of these
# plt.imshow(coords[:,:,0], origin='lower')  # color plot
plt.contour(
    coords[:, :, 0], levels=[45, 90, 135, 180, 225, 270, 315]
)  # contour plot with jagged ugliness
plt.show()
How can I either:
1. stop pyplot.contour from drawing a contour along the discontinuity, or
2. make pyplot.contour recognize that the 0/360 discontinuity in angle is not a real discontinuity at all?
I can just increase the resolution of the underlying data, but before I get a nice smooth line it starts to take a very long time and a lot of memory to plot.
I will also want to plot a contour along 0 degrees, but if I can figure out how to hide the discontinuity I can just shift it to somewhere else not near a contour. Or, if I can make #2 happen, it won't be an issue.
This is definitely still a hack, but you can get nice smooth contours with a two-fold approach:
1. Plot contours of the absolute value of the phase (going from -180˚ to 180˚) so that there is no discontinuity.
2. Plot two sets of contours in a finite region so that numerical defects close to the tops and bottoms of the extrema do not creep in.
Here is the complete code to append to your example:
Z = np.exp(1j*np.pi*coords[:,:,0]/180.0)
Z *= np.exp(0.25j*np.pi/2.0) # Shift to get same contours as in your example
X = np.arange(300)
Y = np.arange(450)
N = 2
levels = 90*(0.5 + (np.arange(N) + 0.5)/N)
c1 = plt.contour(X, Y, abs(np.angle(Z)*180/np.pi), levels=levels)
c2 = plt.contour(X, Y, abs(np.angle(Z*np.exp(0.5j*np.pi))*180/np.pi), levels=levels)
One can generalize this code to get smooth contours for any "periodic" function. What is left to be done is to generate a new set of contours with the correct values so that colormaps apply correctly, labels will be applied correctly etc. However, there does not seem to be a simple way of doing this with matplotlib: the relevant QuadContourSet class does everything and I do not see a simple way of constructing an appropriate contour object from the contours c1 and c2.
I was interested in the exact same problem. One solution is to NaN out the contours along the branch cut; see here; another is to use the max_jump argument in matplotx's contour().
I molded the solution into a Python package, cplot.
import cplot
import numpy as np

def f(z):
    return np.exp(1 / z)

cplot.show(f, (-1.0, +1.0, 400), (-1.0, +1.0, 400))
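For completeness, a minimal sketch of the NaN approach mentioned above, assuming the angle grid coords[:, :, 0] from the question: blank out the pixels on either side of any jump larger than 180 degrees before contouring.

import numpy as np
import matplotlib.pyplot as plt

z = coords[:, :, 0].copy()
# Mask both sides of any discontinuity (jump > 180 degrees between neighbours)
jump_rows = np.abs(np.diff(z, axis=0)) > 180
jump_cols = np.abs(np.diff(z, axis=1)) > 180
z[:-1, :][jump_rows] = np.nan
z[1:, :][jump_rows] = np.nan
z[:, :-1][jump_cols] = np.nan
z[:, 1:][jump_cols] = np.nan
plt.contour(z, levels=[45, 90, 135, 180, 225, 270, 315])
plt.show()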

Efficient method of calculating density of irregularly spaced points

I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points and calculate the hot-spot value as a function of distance. This produced a better-looking image; however, it is too slow to use in my application. Since it's O(n), it works ok with 100 points, but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in a KDTree and use the nearest 5 points to calculate the hot-spot value. This algorithm is O(1), so much faster with a large dataset. It's still not fast enough: it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi

def grid_density_kdtree(xl, yl, xi, yi, dfactor):
    zz = np.empty([len(xi), len(yi)], dtype=np.uint8)
    kdtree = KDTree(list(zip(xl, yl)))
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            retvalset = kdtree.query((xc, yc), k=5)
            for dist in retvalset[0]:
                density = density + math.exp(-dfactor * pow(dist, 2)) / 5
            zz[yci][xci] = min(density, 1.0) * 255
    return zz
def grid_density(xl, yl, xi, yi):
    zz = np.empty([len(xi), len(yi)])
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            for i in range(0, len(xl)):
                xd = math.fabs(xl[i] - xc)
                yd = math.fabs(yl[i] - yc)
                if xd < 1 and yd < 1:
                    dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
                    density = density + math.exp(-5.0 * pow(dist, 2))
            zz[yci][xci] = density
    return zz
def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in range(w):
        st[x + 1] = st[x] + img[x]
    for y in range(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in range(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]
def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 15
    border = r * 2
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = [0] * (imgw * imgh)
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy * imgw + ix] += 1
    for p in range(4):
        boxsum(img, imgw, imgh, r)
    a = np.array(img).reshape(imgh, imgw)
    b = a[border:(border + h), border:(border + w)]
    return b
def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 20
    border = r
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = np.zeros((imgh, imgw))
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy][ix] += 1
    return ndi.gaussian_filter(img, (r, r))  # gaussian convolution
def generate_graph():
    n = 1000
    # data points range
    data_ymin = -2.
    data_ymax = 2.
    data_xmin = -2.
    data_xmax = 2.
    # view area range
    view_ymin = -.5
    view_ymax = .5
    view_xmin = -.5
    view_xmax = .5
    # generate data
    xl = np.random.uniform(data_xmin, data_xmax, n)
    yl = np.random.uniform(data_ymin, data_ymax, n)
    zl = np.random.uniform(0, 1, n)
    # get visible data points
    xlvis = []
    ylvis = []
    for i in range(0, len(xl)):
        if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
            xlvis.append(xl[i])
            ylvis.append(yl[i])
    fig = plt.figure()

    # plot histogram
    plt1 = fig.add_subplot(221)
    plt1.set_axis_off()
    t0 = time.perf_counter()
    zd, xe, ye = np.histogram2d(yl, xl, bins=10,
                                range=[[view_ymin, view_ymax], [view_xmin, view_xmax]],
                                density=True)
    plt.title('numpy.histogram2d - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # plot density calculated with kdtree
    plt2 = fig.add_subplot(222)
    plt2.set_axis_off()
    xi = np.linspace(view_xmin, view_xmax, 256)
    yi = np.linspace(view_ymin, view_ymax, 256)
    t0 = time.perf_counter()
    zd = grid_density_kdtree(xl, yl, xi, yi, 70)
    plt.title('function of 5 nearest using kdtree\n' + str(time.perf_counter() - t0) + "sec")
    cmap = cm.jet
    A = (cmap(zd / 256.0) * 255).astype(np.uint8)
    plt.imshow(A, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # gaussian filter
    plt3 = fig.add_subplot(223)
    plt3.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax,
                                      256, 256, list(zip(xl, yl)))
    plt.title('ndi.gaussian_filter - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

    # boxsum smoothing
    plt4 = fig.add_subplot(224)
    plt4.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax,
                             256, 256, list(zip(xl, yl)))
    plt.title('boxsum smoothing - ' + str(time.perf_counter() - t0) + "sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

if __name__ == '__main__':
    generate_graph()
    plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi

data = np.random.rand(30000, 2)     # create random dataset
inds = (data * 255).astype('uint')  # convert to indices
img = np.zeros((256, 256))          # blank image
for i in range(data.shape[0]):      # draw pixels
    img[inds[i, 0], inds[i, 1]] += 1
img = ndi.gaussian_filter(img, (10, 10))
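The per-point drawing loop can also be vectorized with np.add.at, which (unlike plain fancy-index +=) accumulates repeated indices correctly; a small variation of the above:

import numpy as np
import scipy.ndimage as ndi

data = np.random.rand(30000, 2)
inds = (data * 255).astype('uint')
img = np.zeros((256, 256))
np.add.at(img, (inds[:, 0], inds[:, 1]), 1)  # counts duplicate indices properly
img = ndi.gaussian_filter(img, (10, 10))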
A very simple implementation that could be done in real time (in C) and that takes only fractions of a second in pure python is to just compute the result in screen space.
The algorithm is:
1. Allocate the final matrix (e.g. 256x256) with all zeros.
2. For each point in the dataset, increment the corresponding cell.
3. Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
4. Scale the result and output.
The computation of the box sum can be made very fast and independent of N by using a sum table. Every pass just requires two scans of the matrix... total complexity is O(S + WHP), where S is the number of points, W and H are the width and height of the output, and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image, the computation takes 0.5 sec, including the linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
    st = [0] * (w + 1) * (h + 1)
    for x in range(w):
        st[x + 1] = st[x] + img[x]
    for y in range(h):
        st[(y + 1) * (w + 1)] = st[y * (w + 1)] + img[y * w]
        for x in range(w):
            st[(y + 1) * (w + 1) + (x + 1)] = st[(y + 1) * (w + 1) + x] + st[y * (w + 1) + (x + 1)] - st[y * (w + 1) + x] + img[y * w + x]
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y * w + x] = st[y0 * (w + 1) + x0] + st[y1 * (w + 1) + x1] - st[y1 * (w + 1) + x0] - st[y0 * (w + 1) + x1]

def saveGraph(w, h, data):
    X = [x for x, y in data]
    Y = [y for x, y in data]
    x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    img = [0] * (w * h)
    for x, y in data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        img[iy * w + ix] += 1
    for p in range(4):
        boxsum(img, w, h, 2)
    mx = max(img)
    k = 255.0 / mx
    with open("result.pgm", "wb") as out:
        out.write(b"P5\n%i %i 255\n" % (w, h))
        out.write(bytes(int(v * k) for v in img))

import random
data = [(random.random(), random.random()) for i in range(30000)]
saveGraph(256, 256, data)
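If you want to stay in numpy rather than pure python, the same sum-table trick maps directly onto cumsum; a sketch of that translation (mine, not part of the timing above):

import numpy as np

def boxsum_np(img, r):
    # img: 2D array; returns the sum over a (2r+1)x(2r+1) box around each
    # cell, clamped at the edges, via a zero-padded summed-area table.
    h, w = img.shape
    st = np.zeros((h + 1, w + 1))
    st[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    y0 = np.clip(np.arange(h) - r, 0, h)
    y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w)
    x1 = np.clip(np.arange(w) + r + 1, 0, w)
    return (st[y1[:, None], x1[None, :]] - st[y0[:, None], x1[None, :]]
            - st[y1[:, None], x0[None, :]] + st[y0[:, None], x0[None, :]])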
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
- used sqrt(average of squared values) instead of sum for the averaging pass
- color-coded the results
- stretched the result to always use the full color scale
- drew antialiased black dots where the data points are
- made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform an a*b convolution (where a and b are the number of bins in each direction), which is close to N^2 for any dense system, and even bigger for a sparse one (a*b >> N^2 in a sparse system).
Looking at the code above, you seem to have a loop in grid_density() which runs over the number of bins in y inside a loop over the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot against different numbers of elements to check, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at Queen Mary, University of London, which runs in linear time and RAM.
You can read about it and implement it yourself from his book, if you like. In fact, I have written this code myself: a python-wrapped C implementation in 2D, which is not really ready for production (it is still single-threaded, etc.), but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (a native python list) of objects with an x and y property, and my C module will call back to a python function of your choice to run an interaction function over matched pairs of elements. It's designed for running contact-force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet (because the other bits of the library aren't ready), I'll have to give you a zip of my current source, but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a C compiler to make it work, and it's only been tested and working under *nix environments; if you're on Windows you'll need the mingw C compiler for Cython to work at all.
Once Cython is installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw fewer errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations; they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that the KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
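A sketch of the Gaussian-sum idea with scipy's cKDTree (the function and parameter names here are mine):

import numpy as np
from scipy.spatial import cKDTree

def density_grid(points, grid_x, grid_y, sigma):
    # Sum a Gaussian centred on each data point, evaluating only the
    # points within 3 sigma of each pixel via a ball query.
    tree = cKDTree(points)
    xx, yy = np.meshgrid(grid_x, grid_y)
    pixels = np.column_stack([xx.ravel(), yy.ravel()])
    out = np.zeros(len(pixels))
    neighbour_lists = tree.query_ball_point(pixels, 3 * sigma)
    for i, (px, nbrs) in enumerate(zip(pixels, neighbour_lists)):
        if nbrs:
            d2 = np.sum((points[nbrs] - px)**2, axis=1)
            out[i] = np.exp(-d2 / (2 * sigma**2)).sum()
    return out.reshape(len(grid_y), len(grid_x))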
You can do this with a separable 2D convolution (scipy.ndimage.convolve1d) of your original image with a gaussian shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations, which should greatly speed up your calculations.
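A sketch of the separable filtering (sigma and the 3-sigma truncation are illustrative choices):

import numpy as np
from scipy.ndimage import convolve1d

img = np.zeros((256, 256))  # binned point counts, as in the other answers
img[128, 128] = 1.0
sigma = 10.0
t = np.arange(-3 * int(sigma), 3 * int(sigma) + 1)
kernel = np.exp(-t**2 / (2 * sigma**2))
kernel /= kernel.sum()
# Two 1D passes instead of one full 2D convolution
smoothed = convolve1d(convolve1d(img, kernel, axis=0), kernel, axis=1)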
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.
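For example, a finer binning already looks quite different (the bin count here is just for illustration):

import numpy as np
import matplotlib.pyplot as plt

x, y = np.random.rand(2, 30000)
H, xedges, yedges = np.histogram2d(x, y, bins=64)
plt.imshow(H.T, origin='lower', interpolation='bilinear')
plt.show()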
