The purpose of my experiment is to fit a ring Gaussian model to image data and recover the parameters of an elliptical or ring-shaped Gaussian object in an image.
I tried Astropy to build a simple ring Gaussian model as a first trial. Unfortunately, the fitted model is completely different from my artificial data.
from astropy.modeling import Fittable2DModel, Parameter, fitting
import matplotlib.pyplot as plt
import numpy as np
class ringGaussian(Fittable2DModel):
    background = Parameter(default=5.)
    amplitude = Parameter(default=1500.)
    x0 = Parameter(default=15.)
    y0 = Parameter(default=15.)
    radius = Parameter(default=2.)
    width = Parameter(default=1.)

    @staticmethod
    def evaluate(x, y, background, amplitude, x0, y0, radius, width):
        z = background + amplitude * np.exp( -((np.sqrt((x-x0)**2. + (y-y0)**2.) - radius) / width)**2 )
        return z
Then I made some artificial data with known parameters to test the fitting of the ring Gaussian class.
back = 10 # background
amp = 2000 # highest value of the Gaussian
x0 = 10 # x coordinate of center
y0 = 10 # y coordinate of center
radius = 3
width = 1
xx, yy = np.mgrid[:30, :30]
z = back + amp * np.exp( -((np.sqrt((xx-x0)**2. + (yy-y0)**2.) - radius) / width)**2 )
I plotted xx, yy and z using contourf:
fig = plt.subplots(figsize=(7,7))
plt.contourf(xx,yy,z)
plt.show()
This is what I got:
[image: filled contour plot of the artificial data]
Then I tried to fit the z using my fittable class:
p_init = ringGaussian() #bounds={"x0":[0., 20.], "y0":[0., 20.]}
fit_p = fitting.LevMarLSQFitter()
p = fit_p(p_init, xx, yy, z)
# These are the parameters I got:
<ringGaussian(background=133.0085329497139, amplitude=-155.53652181827655, x0=25.573499373946227, y0=25.25813520725603, radius=8.184302497405568, width=-7.273935403490675)>
I plotted the model:
fig = plt.subplots(figsize=(7,7))
plt.contourf(xx,yy,p(xx,yy))
plt.show()
This is what I got:
[image: filled contour plot of the fitted model]
Originally, I also tried to include the derivative in my class:
    @staticmethod
    def fit_deriv(x, y, background, amplitude, x0, y0, radius, width):
        g = (np.sqrt((x-x0)**2. + (y-y0)**2.) - radius) / width
        z = amplitude * np.exp( -g**2 )
        dg_dx0 = - (x-x0)/np.sqrt((x-x0)**2. + (y-y0)**2.)
        dg_dy0 = - (y-y0)/np.sqrt((x-x0)**2. + (y-y0)**2.)
        dg_dr0 = - 1/width
        dg_dw0 = g * -1/width
        dz_dB = 1.
        dz_dA = z / amplitude
        dz_dx0 = -2 * z * g**3 * dg_dx0
        dz_dy0 = -2 * z * g**3 * dg_dy0
        dz_dr0 = -2 * z * g**3 * dg_dr0
        dz_dw0 = -2 * z * g**3 * dg_dw0
        return [dz_dB, dz_dA, dz_dx0, dz_dy0, dz_dr0, dz_dw0]
But it returned "ValueError: setting an array element with a sequence."
I'm quite desperate now. Can anyone suggest possible solutions, or alternative ways to implement the ring Gaussian fit in Python?
Many many thanks~~~
Your implementation is OK (I did not try or check your fit_deriv implementation).
The issue is just that your fit doesn't converge because the initial model parameters are too far from the true values, so the optimiser fails. When I run your code, I get this warning:
WARNING: The fit may be unsuccessful; check fit_info['message'] for more information. [astropy.modeling.fitting]
If you change the initial model parameters so that your model roughly matches the data, the fit succeeds:
p_init = ringGaussian(x0=11, y0=11)
To check how your model compares with the data, you can use imshow to display data and model images (or also e.g. residual images):
plt.imshow(z)
plt.imshow(p_init(xx,yy))
plt.imshow(p(xx,yy))
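On the fit_deriv from the question, which the answer above does not check: the ValueError is probably raised because dz_dB = 1. is a scalar while the other entries are arrays, so the returned list is ragged. The chain rule also gives a factor of g rather than g**3, and dg_dx0 and dg_dy0 are missing a division by width. The following is a NumPy-only sketch of the derivatives, not tested against Astropy itself; the function names ring_gaussian and ring_gaussian_deriv and the handling of the centre pixel are my own assumptions.

```python
import numpy as np


def ring_gaussian(x, y, background, amplitude, x0, y0, radius, width):
    """Same model as ringGaussian.evaluate in the question."""
    rho = np.sqrt((x - x0) ** 2 + (y - y0) ** 2)
    g = (rho - radius) / width
    return background + amplitude * np.exp(-g ** 2)


def ring_gaussian_deriv(x, y, background, amplitude, x0, y0, radius, width):
    """Partial derivatives in parameter order; every entry has the shape of x."""
    rho = np.sqrt((x - x0) ** 2 + (y - y0) ** 2)
    g = (rho - radius) / width
    e = np.exp(-g ** 2)
    # Chain rule: d/dg [A * exp(-g^2)] = -2 * A * g * exp(-g^2)
    dz_dg = -2.0 * amplitude * g * e
    # rho is zero exactly at the centre, where the gradient is undefined.
    # The numerator is zero there too, so substitute 1 to avoid 0/0 (assumption).
    safe_rho = np.where(rho > 0, rho, 1.0)
    dg_dx0 = -(x - x0) / (safe_rho * width)
    dg_dy0 = -(y - y0) / (safe_rho * width)
    dg_dr = -np.ones_like(e) / width
    dg_dw = -g / width
    return [
        np.ones_like(e),   # dz/dbackground as an array, not the scalar 1.
        e,                 # dz/damplitude
        dz_dg * dg_dx0,
        dz_dg * dg_dy0,
        dz_dg * dg_dr,
        dz_dg * dg_dw,
    ]
```

A quick way to gain confidence in a hand-written derivative like this is to compare each entry with a central finite difference of ring_gaussian before handing it to the fitter.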
I have some data which looks like this:
I've drawn an ellipse around some of the data using EllipseModel from skimage.measure.
I was able to fit the ellipse by providing the package with B0_M data and the corresponding q^2 between 5200 and 5350, and then I was able to extract some parameters from the fit, to plot the ellipse myself, as follows:
X1Y1 = np.column_stack((X1,Y1))
ell = EllipseModel()
ell.estimate(X1Y1)
xc, yc, a, b, theta = ell.params
where X1 is the full B_0 data and X2 is the full q^2 dataset. It returned the following values for the ellipse parameters:
a = 0.399894
b = 37.826
xc = 5272
yc = 9.27
theta = 1.573
Unfortunately this fit was not perfect, so I scaled some of the parameters, or added some small numbers etc, essentially to tinker to get the fit shown in the figure. Here is how I plotted the ellipse fit:
xc, yc, a, b, theta = ell.params
t = np.linspace(0, 2*np.pi, 100)
dt = 0.01*theta
ell_x = xc + 2*a*np.cos(theta+dt)*np.cos(t) - 1.8*b*np.sin(theta+dt)*np.sin(t)
ell_y = yc + 0.47*a*np.sin(theta+dt)*np.cos(t) + 0.47*b*np.cos(theta+dt)*np.sin(t)+0.26
plt.scatter(X,Y, marker = '.', alpha = 0.05, color = 'navy', s =0.2)
plt.scatter(xc, yc+0.26, color='red', s=10)
plt.plot(ell_x, ell_y, color = 'red')
plt.xlim(5150,5400)
plt.ylim(7,12)
plt.xlabel('B0_M')
plt.ylabel('$q^2$')
plt.title('jpsi')
Now I'd like to remove all of the points, from X1 and Y1, that are inside the ellipse
How can I do this? I wanted to use a simple mathematical argument, basically the equation of an ellipse, but it is more complicated since I have it in parametric form, and it's also not the tidiest thing since I have scaled different variables by different amounts, as I said before.
Is there some way to simply say, "delete points in X, Y if they are inside the ellipse with coordinates ell_x and ell_y"?
Many thanks
I think you can use the equation for an ellipse to construct a mask that isolates the points outside your model.
The trick is to transform your X1 and Y1 into the ellipse's coordinate system by shifting and rotating them using xc, yc, and theta before applying the ellipse equation.
dx = X1 - xc
dy = Y1 - yc
x2 = dx * np.cos(theta) + dy * np.sin(theta)
y2 = -dx * np.sin(theta) + dy * np.cos(theta)
mask = np.square(x2 / a) + np.square(y2 / b) > 1
X1_outside = X1[mask]
Y1_outside = Y1[mask]
Note: I expected that skimage.measure.EllipseModel would have some method that makes this easier, but I can't find one after a quick read through the docs. If anyone knows how to do this using EllipseModel, that would be much more readable.
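The masking steps above can be wrapped in a small helper and checked on a few hand-picked points. This is only a sketch: the function name outside_ellipse and the numbers below are illustrative (roughly the scale of the fit in the question, not the real data).

```python
import numpy as np


def outside_ellipse(x, y, xc, yc, a, b, theta):
    """Boolean mask that is True for points strictly outside the ellipse."""
    dx = x - xc
    dy = y - yc
    # Rotate the shifted points into the ellipse's own axes
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 > 1


# Illustrative parameters: with theta = pi/2 the long axis b lies along x
xc, yc, a, b, theta = 5272.0, 9.27, 0.4, 37.8, np.pi / 2
x = np.array([5272.0, 5300.0, 5320.0, 5272.0])
y = np.array([9.27, 9.30, 9.30, 10.00])

mask = outside_ellipse(x, y, xc, yc, a, b, theta)
x_outside, y_outside = x[mask], y[mask]
```

If the plotted ellipse was drawn with rescaled axes or a shifted centre, as in the question, the same rescaled values would have to be passed here, otherwise the mask will not match the curve on the plot.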
I am plotting a vector field using the matplotlib function quiver() and it works. But I would like to emphasize the cowlick in the following plot:
I am not sure how to go about it, but increasing the density of arrows in the center could possibly do the trick. To do so, I would like to resort to some option within np.meshgrid() that would allow me to get more tightly packed x,y coordinate points in the center. A linear, quadratic or other specification does not seem to be built in. I am not sure if sparse can be modified to this end.
The code:
lim = 10
step = 0.22 * lim  # grid spacing (renamed from `int`, which shadows the builtin)
x,y = np.meshgrid(np.arange(-lim, lim, step), np.arange(-lim, lim, step))
u = 3 * np.cos(np.arctan2(y,x)) - np.sqrt(x**2+y**2) * np.sin(np.arctan2(y,x))
v = 3 * np.sin(np.arctan2(y,x)) + np.sqrt(x**2+y**2) * np.cos(np.arctan2(y,x))
color = x**2 + y**2
plt.rcParams["image.cmap"] = "Greys_r"
mult = 1
plt.figure(figsize=(mult*lim, mult*lim))
plt.quiver(x,y,u,v,color, linewidths=.006, lw=.1)
plt.show()
Closing the loop on this: thanks to the accepted answer, I was finally able to strike a balance between the density of the mesh, as I learned to do from @flwr, and keeping the "cowlick" structure of the vector field conspicuous (avoiding the radial structure around the origin as much as possible):
You can construct the points wherever you want to evaluate your field, and quiver will be happy with them. The code below uses polar coordinates and stretches the radial coordinate non-linearly.
import numpy as np
import matplotlib.pyplot as plt
lim = 10
N = 10
theta = np.linspace(0.1, 2*np.pi, N*2)
stretcher_factor = 2
r = np.linspace(0.3, lim**(1/stretcher_factor), N)**stretcher_factor
R, THETA = np.meshgrid(r, theta)
x = R * np.cos(THETA)
y = R * np.sin(THETA)
# x,y = np.meshgrid(x, y)
r = x**2 + y**2
u = 3 * np.cos(THETA) - np.sqrt(r) * np.sin(THETA)
v = 3 * np.sin(THETA) + np.sqrt(r) * np.cos(THETA)
plt.rcParams["image.cmap"] = "Greys_r"
mult = 1
plt.figure(figsize=(mult*lim, mult*lim))
plt.quiver(x,y,u,v,r, linewidths=.006, lw=.1)
Edit: fixed a bug where meshgrid was applied twice.
np.meshgrid just makes a grid of the vectors you provide.
What you could do is contract this regular grid in the center to have more points in the center (best visible with more points), e.g. like so:
# contract in the center
a = 0.5 # how far to contract
b = 0.8 # how strongly to contract
c = 1 - b*np.exp(-((x/lim)**2 + (y/lim)**2)/a**2)
x, y = c*x, c*y
plt.plot(x,y,'.k')
plt.show()
Alternatively you can use x, y coordinates that do not depend on a grid at all:
x = np.random.randn(500)
y = np.random.randn(500)
plt.plot(x,y,'.k')
plt.show()
If you'd prefer a slightly more regular pattern, you could look into Poisson disk sampling with adaptive distances or something similar. The key point is that quiver accepts ANY set of coordinates; they do not have to lie on a regular grid.
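To make that last point concrete, here is a small sketch that evaluates the field from the question at scattered points whose radii are concentrated near the origin. The helper name cowlick_field and the choice of squaring a uniform random number to bunch the radii are my own assumptions, not something from the posts above.

```python
import numpy as np


def cowlick_field(x, y):
    """Vector field from the question, evaluated at arbitrary points."""
    theta = np.arctan2(y, x)
    r = np.hypot(x, y)
    u = 3 * np.cos(theta) - r * np.sin(theta)
    v = 3 * np.sin(theta) + r * np.cos(theta)
    return u, v


lim = 10
rng = np.random.default_rng(0)
# Squaring a uniform sample pushes most radii towards 0, so the centre is denser
r = lim * rng.random(500) ** 2
theta = 2 * np.pi * rng.random(500)
x, y = r * np.cos(theta), r * np.sin(theta)
u, v = cowlick_field(x, y)
```

The resulting 1D arrays can be passed straight to plt.quiver(x, y, u, v, np.hypot(x, y)); no meshgrid is involved.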
I am trying to fit a 2D Gaussian to an image to find the location of the brightest point in it. My code looks like this:
import numpy as np
import astropy.io.fits as fits
import os
from astropy.stats import mad_std
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
from lmfit.models import GaussianModel
from astropy.modeling import models, fitting
def gaussian(xycoor, x0, y0, sigma, amp):
    '''This function is the Gaussian function'''
    x, y = xycoor # x and y taken from fit function. Starts at 0, increases by 1, goes to length of axis
    A = 1 / (2*sigma**2)
    eq = amp*np.exp(-A*((x-x0)**2 + (y-y0)**2)) #Gaussian
    return eq

def fit(image):
    med = np.median(image)
    image = image-med
    image = image[0,0,:,:]
    max_index = np.where(image >= np.max(image))
    x0 = max_index[1] #Middle of X axis
    y0 = max_index[0] #Middle of Y axis
    x = np.arange(0, image.shape[1], 1) #Starts at 0, increases by 1, goes to length of axis
    y = np.arange(0, image.shape[0], 1) #Starts at 0, increases by 1, goes to length of axis
    xx, yy = np.meshgrid(x, y) #creates a grid to plot the function over
    sigma = np.std(image) #The standard dev given in the Gaussian
    amp = np.max(image) #amplitude
    guess = [x0, y0, sigma, amp] #The initial guess for the gaussian fitting
    low = [0,0,0,0] #start of data array
    #Upper Bounds x0: length of x axis, y0: length of y axis, st dev: max value in image, amplitude: 2x the max value
    upper = [image.shape[0], image.shape[1], np.max(image), np.max(image)*2]
    bounds = [low, upper]
    params, pcov = curve_fit(gaussian, (xx.ravel(), yy.ravel()), image.ravel(),p0 = guess, bounds = bounds) #optimal fit. Not sure what pcov is.
    return params

def plotting(image, params):
    fig, ax = plt.subplots()
    ax.imshow(image)
    ax.scatter(params[0], params[1],s = 10, c = 'red', marker = 'x')
    circle = Circle((params[0], params[1]), params[2], facecolor = 'none', edgecolor = 'red', linewidth = 1)
    ax.add_patch(circle)
    plt.show()
data = fits.getdata('AzTECC100.fits') #read in file
med = np.median(data)
data = data - med
data = data[0,0,:,:]
parameters = fit(data)
#generates a gaussian based on the parameters given
plotting(data, parameters)
The image is plotted and the code gives no errors, but the fit isn't working. It just puts an x wherever the initial x0 and y0 are. The pixel values in my image are very small: the max value is 0.0007 and the std dev is 0.0001, while x and y are a few orders of magnitude larger. So I believe my problem is that, because of this, my eq goes to zero everywhere and curve_fit fails. I'm wondering if there's a better way to construct my Gaussian so that it plots correctly?
I do not have access to your image. Instead I have generated some test "image" as follows:
y, x = np.indices((51,51))
x -= 25
y -= 25
data = 3 * np.exp(-0.7 * ((x+2)**2 + (y-1)**2))
Also, I have modified your plotting code to increase the radius of the circle by a factor of 10:
circle = Circle((params[0], params[1]), 10 * params[2], ...)
and I commented out two more lines:
# image = image[0,0,:,:]
# data = data[0,0,:,:]
The result that I get is shown in the attached image and it looks reasonable to me:
Could it be that the issue is in how you access the data from the FITS file (e.g., image = image[0,0,:,:])? Is the data a 4D array? Why do you have 4 indices?
I also saw that you have asked a similar question here: Astropy.model 2DGaussian issue in which you tried to use just astropy.modeling. I will look into that question.
NOTE: you can replace code such as
max_index = np.where(image >= np.max(image))
x0 = max_index[1] #Middle of X axis
y0 = max_index[0] #Middle of Y axis
with
y0, x0 = np.unravel_index(np.argmax(image), image.shape)
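One more thing that may matter in the question's fit(): sigma is guessed as np.std(image) and bounded above by np.max(image). Both are pixel values (about 0.0001 and 0.0007 there), but sigma in the gaussian function is a width in pixels, so the bounds force a Gaussian far narrower than one pixel. A possible alternative is to estimate the initial guess from image moments, which come out in pixel units. The sketch below assumes a 2D, background-subtracted image with a single dominant source; the function name moment_guess is hypothetical, and on noisy real data the moments can be biased, so it is only meant as a starting point for curve_fit.

```python
import numpy as np


def moment_guess(image):
    """Return (x0, y0, sigma, amp) from image moments; positions in pixels."""
    # Ignore negative residuals left over from background subtraction
    img = np.clip(image, 0, None)
    total = img.sum()
    yy, xx = np.indices(img.shape)
    # First moments: intensity-weighted centroid
    x0 = (xx * img).sum() / total
    y0 = (yy * img).sum() / total
    # Second moments: intensity-weighted variance along each axis
    var_x = ((xx - x0) ** 2 * img).sum() / total
    var_y = ((yy - y0) ** 2 * img).sum() / total
    sigma = np.sqrt(0.5 * (var_x + var_y))
    return x0, y0, sigma, img.max()


# Synthetic image like the one in the answer above, scaled to tiny pixel values
y, x = np.indices((51, 51))
data = 7e-4 * np.exp(-0.7 * ((x - 27) ** 2 + (y - 24) ** 2))
x0, y0, sigma, amp = moment_guess(data)
```

With a guess like this, the upper bound on sigma would also be something in pixel units, for example the image size, rather than the maximum pixel value.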
I am trying to reproduce the example of the Gabor transform given in its Wikipedia entry, and I do not know whether this is a bug or I am missing something. The example calculates the Gabor transform of a sinusoidal signal:
I create the (unsorted) frequency axis with fftfreq, then use meshgrid to create 2D axes and plot with pcolormesh. Here is the relevant piece of the code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridsp
dt = 0.05
x = np.arange(-50.0,50.0,dt)
y = np.sin(2.0 * np.pi * x)
Nx = len(x)
w = np.fft.fftfreq(Nx,dt)
sigma = 1.0 / 3.0
neg = np.where (x <= 0.0)
pos = np.where (x > 0.0)
T,W = np.meshgrid(x,w)
func = np.zeros(Nx)
tmp = np.zeros(Nx,dtype='complex64')
gabor = np.zeros((Nx,Nx))
func[neg] = np.sin(2.0 * np.pi * x[neg])
func[pos] = np.sin(4.0 * np.pi * x[pos])
for it in range(Nx):
    tmp[:] = np.fft.fft(func[:] * np.exp( - ( x[it] - x[:] ) * ( x[it] - x[:] ) / 2.0 / sigma / sigma ) )
    gabor[:,it] = np.real(np.conj(tmp) * tmp)
fig = plt.figure(figsize=(20,10),facecolor='white')
gs = gridsp.GridSpec(2, 1)
ax1 = plt.subplot(gs[0,0])
ax1.plot(x,func,'r',linewidth=2)
ax1.axis('tight')
ax1.set_xticks(np.arange(min(x),max(x),1.) )
ax1.set_xlabel('time',fontsize=20)
ax1.set_ylabel(r'$\sin{time}$',fontsize=20)
ax1.set_xlim([-6.0,6.0])
ax2 = plt.subplot(gs[1,0])
surf1 = ax2.pcolormesh(T,W,gabor,shading='gouraud')
ax2.axis('tight')
ax2.set_xticks(np.arange(min(x),max(x),2.) )
ax2.set_yticks(np.arange(min(w),max(w),2.) )
ax2.set_xlabel('time',fontsize=20)
ax2.set_ylabel('frequency',fontsize=20)
ax2.set_xlim([-6.0,6.0])
ax2.set_ylim([-4.0,4.0])
gs.tight_layout(fig)
plt.show()
Here is the figure I get,
It seems that the upper part of the plot is reduced to zero. If I try it using fftshift when I create the transform and the axis,
for it in range(Nx):
    tmp[:] = np.fft.fftshift(np.fft.fft(func[:] * np.exp( - ( x[it] - x[:] ) * ( x[it] - x[:] ) / 2.0 / sigma / sigma ) ) )
    gabor[:,it] = np.real(np.conj(tmp) * tmp)
T,W = np.meshgrid(x,np.fft.fftshift(w))
Then I get this figure:
It seems that the pcolormesh routine cannot flip the array upside down, as is usually done in 1D plots. Does anybody know exactly why it is doing this?
Thanks,
Alex
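The problem lies in W, or actually in w. np.fft.fftfreq returns the frequencies in FFT order, so when w is plotted it rises from 0 to the highest positive frequency, then jumps to the most negative frequency and rises back towards 0.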
Thus pcolormesh receives non-monotonic Y coordinates and gets confused. If you look at the description of pcolor or pcolormesh it is clear they cannot do anything reasonable with non-monotonic data.
So, your gabor is fine:
ax.imshow(gabor)
as you can see:
There are several ways to fix this. One is to feed both W and gabor to fftshift, so that the frequencies roll back into monotonic order. Or, if you want the figure as above (negative frequencies on top), add the sampling frequency 1/dt to all negative values of W so that they continue past the highest positive frequency.
It might also be cleaner to supply pcolormesh with x and w instead of T and W.
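A tiny example of the ordering issue and the fftshift fix, using 8 samples so the arrays are easy to read. The array named spectrum is just a stand-in for gabor, with one row per frequency.

```python
import numpy as np

n, dt = 8, 0.125
w = np.fft.fftfreq(n, dt)   # FFT order: 0, 1, 2, 3, -4, -3, -2, -1

# Stand-in for gabor: one row per frequency in w, three time columns
spectrum = np.arange(n * 3).reshape(n, 3)

# Shift the frequency axis and the matching data axis together
w_sorted = np.fft.fftshift(w)
spectrum_sorted = np.fft.fftshift(spectrum, axes=0)

print(w)          # non-monotonic: jumps from 3 to -4
print(w_sorted)   # monotonic: -4 ... 3
```

The important detail is axes=0: only the frequency axis of the 2D array should be shifted, not the time axis.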
If you want performance, you might be better off with imshow (it can be used when the data is equispaced in both dimensions). The only slight problem is the calculation of the extents (which may actually be slightly off even in the question). The extents give the outer limits of the highest, lowest, leftmost and rightmost pixels, whereas the coordinate vectors only give the centers of the pixels.
We need to know the following:
number of points in X direction (num_x)
number of points in Y direction (num_y)
value of the first and last x sample (x0, x1)
value of the first and last y sample (y0, y1)
After that we can use imshow to show the data with correct scaling:
dx = 1. * (x1 - x0) / (num_x-1)
dy = 1. * (y1 - y0) / (num_y-1)
ax.imshow(img, extent=[x0 - dx/2, x1 + dx/2, y0 - dy/2, y1 + dy/2], origin='lower', interpolation='nearest')
So, applied to the question's data:
gabor_shifted = np.fft.fftshift(gabor, axes=0)
w_shifted = np.fft.fftshift(w)
x0 = x[0]
x1 = x[-1]
w0 = w_shifted[0]
w1 = w_shifted[-1]
dx = 1.*(x1-x0) / (len(x) - 1)
dw = 1.*(w1-w0) / (len(w) - 1)
ax2.imshow(gabor_shifted, extent=[x0-dx/2, x1+dx/2, w0-dw/2, w1+dw/2], interpolation='nearest', origin='lower', aspect='auto')
ax2.grid(True, color='w')
ax2.set_ylim(-4, 4)
which gives:
I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points, and then calculate the hot-spot value as a function of distance. This produced a better looking image, however it is too slow to use in my application. Since it's O(n) per pixel, it works OK with 100 points, but blows up when I use my actual dataset of 30000 points.
My final attempt was to store the data in a KDTree, and use the nearest 5 points to calculate the hot-spot value. Each lookup costs roughly O(log n) rather than O(n), so it is much faster with a large dataset. It's still not fast enough: it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import matplotlib
import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi
def grid_density_kdtree(xl, yl, xi, yi, dfactor):
    zz = np.empty([len(xi),len(yi)], dtype=np.uint8)
    zipped = list(zip(xl, yl))  # KDTree needs a sequence, not a zip iterator
    kdtree = KDTree(zipped)
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            retvalset = kdtree.query((xc,yc), k=5)
            for dist in retvalset[0]:
                density = density + math.exp(-dfactor * pow(dist, 2)) / 5
            zz[yci][xci] = min(density, 1.0) * 255
    return zz
def grid_density(xl, yl, xi, yi):
    ximin, ximax = min(xi), max(xi)
    yimin, yimax = min(yi), max(yi)
    xxi,yyi = np.meshgrid(xi,yi)
    #zz = np.empty_like(xxi)
    zz = np.empty([len(xi),len(yi)])
    for xci in range(0, len(xi)):
        xc = xi[xci]
        for yci in range(0, len(yi)):
            yc = yi[yci]
            density = 0.
            for i in range(0,len(xl)):
                xd = math.fabs(xl[i] - xc)
                yd = math.fabs(yl[i] - yc)
                if xd < 1 and yd < 1:
                    dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
                    density = density + math.exp(-5.0 * pow(dist, 2))
            zz[yci][xci] = density
    return zz
def boxsum(img, w, h, r):
    # st is the sum table, stored flat with (w+1) entries per row
    st = [0] * (w+1) * (h+1)
    for x in range(w):
        st[x+1] = st[x] + img[x]
    for y in range(h):
        st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
        for x in range(w):
            st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
    # replace each pixel with the sum over the box of radius r around it
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 15
    border = r * 2
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = [0] * (imgw * imgh)
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy * imgw + ix] += 1
    for p in range(4):
        boxsum(img, imgw, imgh, r)
    a = np.array(img).reshape(imgh,imgw)
    b = a[border:(border+h),border:(border+w)]
    return b
def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    r = 20
    border = r
    imgw = (w + 2 * border)
    imgh = (h + 2 * border)
    img = np.zeros((imgh,imgw))
    for x, y in data:
        ix = int((x - x0) * kx) + border
        iy = int((y - y0) * ky) + border
        if 0 <= ix < imgw and 0 <= iy < imgh:
            img[iy][ix] += 1
    return ndi.gaussian_filter(img, (r,r)) ## gaussian convolution
def generate_graph():
    n = 1000
    # data points range
    data_ymin = -2.
    data_ymax = 2.
    data_xmin = -2.
    data_xmax = 2.
    # view area range
    view_ymin = -.5
    view_ymax = .5
    view_xmin = -.5
    view_xmax = .5
    # generate data
    xl = np.random.uniform(data_xmin, data_xmax, n)
    yl = np.random.uniform(data_ymin, data_ymax, n)
    zl = np.random.uniform(0, 1, n)
    # get visible data points
    xlvis = []
    ylvis = []
    for i in range(0,len(xl)):
        if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
            xlvis.append(xl[i])
            ylvis.append(yl[i])
    fig = plt.figure()
    # plot histogram
    plt1 = fig.add_subplot(221)
    plt1.set_axis_off()
    t0 = time.perf_counter()  # time.clock() was removed in Python 3.8
    # density=True replaces the removed normed=True argument
    zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax],[view_xmin, view_xmax]], density=True)
    plt.title('numpy.histogram2d - '+str(time.perf_counter()-t0)+"sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)
    # plot density calculated with kdtree
    plt2 = fig.add_subplot(222)
    plt2.set_axis_off()
    xi = np.linspace(view_xmin, view_xmax, 256)
    yi = np.linspace(view_ymin, view_ymax, 256)
    t0 = time.perf_counter()
    zd = grid_density_kdtree(xl, yl, xi, yi, 70)
    plt.title('function of 5 nearest using kdtree\n'+str(time.perf_counter()-t0)+"sec")
    cmap=cm.jet
    A = (cmap(zd/256.0)*255).astype(np.uint8)
    #A[:,:,3] = zd
    plt.imshow(A , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)
    # gaussian filter
    plt3 = fig.add_subplot(223)
    plt3.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('ndi.gaussian_filter - '+str(time.perf_counter()-t0)+"sec")
    plt.imshow(zd , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)
    # boxsum smoothing
    plt4 = fig.add_subplot(224)
    plt4.set_axis_off()
    t0 = time.perf_counter()
    zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
    plt.title('boxsum smoothing - '+str(time.perf_counter()-t0)+"sec")
    plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
    plt.scatter(xlvis, ylvis)

if __name__=='__main__':
    generate_graph()
    plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi
data = np.random.rand(30000,2) ## create random dataset
inds = (data * 255).astype('uint') ## convert to indices
img = np.zeros((256,256)) ## blank image
for i in range(data.shape[0]): ## draw pixels
    img[inds[i,0], inds[i,1]] += 1
img = ndi.gaussian_filter(img, (10,10))
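The per-point Python loop can also be replaced by a single NumPy call. This is a sketch of the pixel-counting step only (the Gaussian smoothing would follow as above); note that plain fancy-index assignment, img[rows, cols] += 1, would count a repeated pixel only once, which is why np.add.at is used.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((30000, 2))          # random dataset in [0, 1)
inds = (data * 255).astype(int)        # convert to pixel indices

img = np.zeros((256, 256))
# Unbuffered in-place add: repeated indices accumulate correctly
np.add.at(img, (inds[:, 0], inds[:, 1]), 1)
```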
A very simple implementation that could be done (with C) in realtime and that only takes fractions of a second in pure python is to just compute the result in screen space.
The algorithm is
Allocate the final matrix (e.g. 256x256) with all zeros
For each point in the dataset increment the corresponding cell
Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
Scale result and output
The computation of the box sum can be made very fast and independent of N by using a sum table. Every computation just requires two scans of the matrix, so the total complexity is O(S + WHP), where S is the number of points, W and H are the width and height of the output, and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
    # st is the sum table, stored flat with (w+1) entries per row
    st = [0] * (w+1) * (h+1)
    for x in range(w):
        st[x+1] = st[x] + img[x]
    for y in range(h):
        st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
        for x in range(w):
            st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
    # replace each pixel with the sum over the box of radius r around it
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def saveGraph(w, h, data):
    X = [x for x, y in data]
    Y = [y for x, y in data]
    x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    img = [0] * (w * h)
    for x, y in data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        img[iy * w + ix] += 1
    for p in range(4):
        boxsum(img, w, h, 2)
    mx = max(img)
    k = 255.0 / mx
    # binary PGM: the header and the pixel data must be written as bytes
    with open("result.pgm", "wb") as out:
        out.write(b"P5\n%d %d 255\n" % (w, h))
        out.write(bytes(int(v*k) for v in img))
import random
data = [(random.random(), random.random())
        for i in range(30000)]
saveGraph(256, 256, data)
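For comparison, the sum-table idea can be sketched with NumPy cumulative sums. This is not a line-by-line port of boxsum above: it uses the textbook zero-padded table, so values near the borders may differ from the pure Python version. The function name boxsum_np is my own.

```python
import numpy as np


def boxsum_np(img, r):
    """Sum of img over a (2r+1) x (2r+1) box around each pixel, clipped at edges."""
    h, w = img.shape
    # Sum table with a leading row and column of zeros:
    # st[y, x] holds the sum of img[:y, :x]
    st = np.zeros((h + 1, w + 1), dtype=float)
    st[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Box limits for every row and column, clipped to the image
    y0 = np.clip(np.arange(h) - r, 0, h)[:, None]
    y1 = np.clip(np.arange(h) + r + 1, 0, h)[:, None]
    x0 = np.clip(np.arange(w) - r, 0, w)[None, :]
    x1 = np.clip(np.arange(w) + r + 1, 0, w)[None, :]
    # Four table lookups per pixel, independent of r
    return st[y1, x1] - st[y0, x1] - st[y1, x0] + st[y0, x0]
```

Applying it a few times in a row, as in the algorithm above, approximates a Gaussian blur while the cost per pass stays independent of the box radius.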
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
used sqrt(average of squared values) instead of sum for the averaging pass
color-coded the results
stretched the result to always use the full color scale
drawn antialiased black dots where the data points are
made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform some ab convolution (number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (well, ab >> N^2 in a sparse system.)
Looking at the code above, you seem to have a loop in grid_density() which runs over the number of bins in y inside a loop of the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot on different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
You can read about it and implement it yourself from his book, if you like.
In fact, I have written a Python-wrapped C implementation of this in 2D myself. It is not really ready for production (it is still single threaded, etc.) but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (native Python list) of objects with an x and y property, and my C module will call back to a Python function of your choice to run an interaction function for matched pairs of elements. It's designed for running contact force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet, because the other bits of the library aren't ready yet, I'll have to give you a zip of my current source but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a C compiler to make it work, and it has only been tested under *nix environments; if you're on Windows you'll need the MinGW C compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw fewer errors; look in the file at archive-root/pynet/d2/__init__.py to see the class implementations, they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that the KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a gaussian shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations which should greatly speed up your calculations.
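To illustrate what "separable" means here: a 2D Gaussian blur can be done as a 1D convolution along the rows followed by a 1D convolution along the columns. The sketch below uses np.convolve as a NumPy-only stand-in for scipy.ndimage.convolve1d; the function names are hypothetical, edges are treated as zero, and it assumes the image is at least as large as the kernel in both directions.

```python
import numpy as np


def gaussian_kernel1d(sigma, radius):
    """Normalised 1D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()


def separable_blur(img, sigma, radius):
    """Blur img with a 2D Gaussian by two passes of 1D convolution."""
    k = gaussian_kernel1d(sigma, radius)
    # Pass 1: convolve every row; pass 2: convolve every column of the result
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")
```

Blurring a single bright pixel this way reproduces the outer product of the 1D kernel with itself, which is exactly the 2D Gaussian kernel, at a cost of 2P multiplications per pixel instead of P^2.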
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.