I hope you can help! I need to show the trajectory of a particle that is under gravity of g = -9.81 m/s2 and time step of dt = 0.05 sec, where the position and velocity of the particle are:
x_1 = x_0 + v_x1 * dt
y_1 = y_0 + v_y1 * dt
v_x1 = v_x0
v_y1 = v_y0 + g * dt
This is what I should achieve:
This is what I've done so far:
import numpy as np
import matplotlib.pyplot as plt
plt.figure(1, figsize=(12,12))
ax = plt.subplot(111, aspect='equal')
ax.set_ylim(0,50)
ax.set_title('Boom --- Weeee! --- Ooof')
r = np.array([0.,0.,15.,30.])
g = -9.81
dt = 0.05
y = 0
x = 0
while y > 0:
plt.plot(x_1,y_2,':', ms=2)
x_1 = v_x1 * dt
y_1 = v_y1 * dt
v_x1 = v_x0
v_y1 = v_y0 + g * dt
This doesn't produce an image, only the plt.figure stated at the beginning. I've tried to integrate the r vector into the loop, but I can't figure out how.
Thank you.
Here's a modified version of your code that I believe gives you the result you desire (you may want to choose different initial velocity values):
import matplotlib.pyplot as plt
# Set up our plot surface
plt.figure(1, figsize=(12,12))
ax = plt.subplot()
# ax = plt.subplot(111, aspect='equal')
# ax.set_ylim(0,50)
ax.set_title('Boom --- Weeee! --- Ooof')
# Initial conditions
g = -9.81
dt = 0.05
y = 0
x = 0
v_x = 5
v_y = 5
# Create our lists that will store our data points
points_x = []
points_y = []
while True:
    # Add the current position of our projectile to our data
    points_x.append(x)
    points_y.append(y)
    # Move our projectile along
    x += v_x * dt
    y += v_y * dt
    # If our projectile falls below the X axis (y < 0), end the simulation
    if y < 0:
        break
    # Update our y velocity per gravity
    v_y = v_y + g * dt
# Plot our data
ax.plot(points_x, points_y)
# Show the plot on the screen
plt.show()
I'm sorry if I could have made fewer changes. Here are the substantive ones I can think of:
You weren't using the r value you computed, so I got rid of it, along with the numpy import that was then no longer needed.
I took out the calls you made to explicitly size your plot. You're better off letting the plotting library decide the bounds of the plot for you.
I don't know if there's another way to do it, but I've always supplied data to the plotting library as arrays of points rather than by providing the points one at a time. So here, I collect up all of the x and y coordinates into two lists while running the simulation, and then add those arrays to the plot at the end to plot the data.
The x_0 vs x_1, etc., got confusing for me. I didn't see any reason to keep track of multiple position and velocity values, so I reduced the code down to using just one set of positions and velocities, x, y, v_x and v_y.
See the comments for more info
Result:
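For readers who want to sanity-check the simulated trajectory, the same update loop can be wrapped in a function and compared against the closed-form range of a projectile launched from the ground, 2 * v_x * v_y / |g|. This is only a sketch: the name simulate and its return format are illustrative choices, not part of the code above.

```python
def simulate(v_x, v_y, g=-9.81, dt=0.05):
    """Step a projectile from (0, 0) until it would drop below y = 0.

    Hypothetical helper: same update order as the loop above
    (move with the current velocity, then apply gravity).
    """
    x, y = 0.0, 0.0
    points_x, points_y = [], []
    while True:
        # Record the current position before moving
        points_x.append(x)
        points_y.append(y)
        x += v_x * dt
        y += v_y * dt
        # Stop before storing a point below the ground
        if y < 0:
            break
        v_y += g * dt
    return points_x, points_y


xs, ys = simulate(5.0, 5.0)
analytic_range = 2 * 5.0 * 5.0 / 9.81  # closed-form range on flat ground
```

The last stored x should sit within a few time steps' worth of travel of analytic_range; a smaller dt should tighten the agreement.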
I am trying to create a plot that looks like the picture.
Wave Particle Motions under Wave
This is not homework; I'm trying to do this for experience.
I have the following parameters:
Plot the water particle motions under Trough (Lowest point on wave elevation profile) at water depths
from 0 to 100 meters in increments of 10 m below mean water line.
The wave profile varying over space is η(x) = A cos(kx) at time = 0. Plot this wave profile first for one wave.
η(x) = A*cos(kx) #at time = 0
Next compute vertical and horizontal particle displacements for different water depths of 0 to 100m
XDisp = -A * e**(k*z) * np.sin(-w*t)
YDisp = -A * e**(k*z) * np.cos(-w*t) # when x=0
You could use any x.
Motion magnitudes don't change. Here z is the depth below mean water level. All other parameters are as defined in the earlier problems above.
Do not forget to shift the horizontal particle displacement to under the trough, and to shift the vertical particle displacement 'z' below the water line.
Here is my code, but I'm doing something wrong. The plot looks like the example, but my circles are not right. I think it has to do with the x and y displacements.
import numpy as np
import matplotlib.pyplot as plt
A = 1 # Wave amplitude in meters
T = 10 # Time Period in secs
n_w = 1 # Number of waves
wavelength = 156 # Wavelength in meters
# Wave Number
k = (2 * np.pi) / wavelength
# Wave angular frequency
w = (2 * np.pi) / T
def XDisp(z,t):
    return -A * np.e**(k * z) * np.sin(-w * t)
def YDisp(z,t):
    return -A * np.e**(k * z) * np.cos(-w * t)
def wave_elevation(x):
    return A * np.cos(k * x)
t_list = np.array([0,0.25,0.5,0.75,1.0])*T
z = [0,-10,-20,-30,-40,-50,-60,-70,-80,-90,-100]
A_d = []
x_plot2 = []
for i in z:
    A_d.append(A * np.e**(k * i))
    x_plot2.append(wavelength/2)
x_plot = np.linspace(0,wavelength)
Y_plot = []
for i in x_plot:
    Y_plot.append(wave_elevation(i))
plt.plot(x_plot,Y_plot,'.-r')
plt.scatter(x_plot2,z,s= A_d, facecolors = 'none',edgecolors = 'b',marker='o',linewidth=2)
plt.xlabel('X (m)')
plt.ylabel("\u03B7 & Water Depth")
plt.title('Wave Particle Motions Under Wave')
plt.legend()
plt.grid()
plt.show()
I'm afraid that, with the information provided, I don't follow the science part of the question. But if your problem is the marker size, you can pass an array of sizes as the third argument of plt.scatter. I think this code may help; I changed your code a little to make it simpler.
import numpy as np
import matplotlib.pyplot as plt
A = 1 # Wave amplitude in meters
T = 10 # Time Period in secs
n_w = 1 # Number of waves
wavelength = 156 # Wavelength in meters
k = (2 * np.pi) / wavelength # Wave Number
w = (2 * np.pi) / T # Wave angular frequency
def wave_elevation(x):
    return A * np.cos(k * x)
A_d = [] # marker size
x2 = [] # for particle place on x axis which is wavelength/2
y2 = [] # for particle place on y axis
for i in range(0,101,10): # depths 0 to 100 m inclusive, as the question asks
    x2.append(wavelength/2)
    y2.append(-i)
    A_d.append(15 * np.exp(-k * i)) # here I change A to 15
x = []
y = []
for i in range(0,wavelength):
    x.append(i)
    y.append(wave_elevation(i))
plt.plot(x,y,'red')
plt.scatter(x2,y2,A_d)
plt.ylim(-100, 10)
plt.xlabel('X (m)')
plt.ylabel("\u03B7 & Water Depth")
plt.title('Wave Particle Motions Under Wave')
plt.grid()
plt.show()
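One caveat with the marker-size approach: the size argument of plt.scatter is given in points squared, not in data units, so the markers will not scale with the axes. If true-to-scale orbits are wanted, one option is to compute each orbit's path from the displacement formulas in the question and draw it as a line. The sketch below assumes the trough sits at x = wavelength/2 and uses a hypothetical helper name, particle_orbit.

```python
import numpy as np

A = 1.0              # wave amplitude in meters
T = 10.0             # period in seconds
wavelength = 156.0   # wavelength in meters
k = 2 * np.pi / wavelength   # wave number
w = 2 * np.pi / T            # angular frequency


def particle_orbit(z, x_center, n_points=100):
    """Hypothetical helper: path of one particle over a full period.

    z is the (negative) depth of the orbit center, x_center its
    horizontal position. Uses the XDisp/YDisp formulas from the question.
    """
    t = np.linspace(0.0, T, n_points)
    radius = A * np.exp(k * z)          # orbit radius decays with depth
    x = x_center - radius * np.sin(-w * t)
    y = z - radius * np.cos(-w * t)
    return x, y


orbit_x, orbit_y = particle_orbit(-10.0, wavelength / 2)
```

Each orbit can then be drawn with plt.plot(orbit_x, orbit_y) inside a loop over the depths. With an equal aspect ratio the paths appear as circles whose radius shrinks with depth.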
So what I'm doing is creating sine waves with normally distributed amplitudes and frequencies - within given ranges. Eg 5V with 2-10Hz. So my attempt at this is to get my function with the given amplitude and frequency and then run it till the first turning point. From there I calculate the next function and add the y value of the previous functions turning point (as a shift) so it starts from that point. My problem is for some of the function changes I get straight lines rather than curves. If someone could tell me where I'm going wrong I'd appreciate it. Just to note, I use 8ms increments for each value to be plotted.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
import serial
newlist = np.zeros(1)
timesnew = np.zeros(1)
volts = []
def main(amp, lowerFreq, upperFreq, time, incr):
#Creates graph and saves it in newlist and timesnew
amt = np.int(time / incr)
list = []
timels = [] # np.zeros(amt+amt)
curtime = 0
loweramp = -amp
mu, sigma = 0, 1
ybefore = 0
rand = stats.truncnorm((loweramp - mu) / sigma, (amp - mu) / sigma, loc=mu, scale=sigma)
freqr = stats.truncnorm((lowerFreq - mu) / sigma, (upperFreq - mu) / sigma, loc=mu, scale=sigma)
i = 0
while i < amt:
# get amp
thisAmp = rand.rvs()
angleFreq = 2 * np.pi * freqr.rvs()
xtp = np.arccos(0) / angleFreq #x value of turning point
yval = thisAmp * np.sin(angleFreq * xtp)
# check that yvalue(voltage) is okay to be used - is within +-amp range
while not loweramp <= yval + ybefore <= amp:
thisAmp = rand.rvs()
angleFreq = 2 * np.pi * freqr.rvs()
xtp = np.arccos(0) / angleFreq
yval = thisAmp * np.sin(angleFreq * xtp)
# now add values to list
t = 0
while t <= xtp:
ynow = thisAmp * np.sin(angleFreq * t) + ybefore
# print ynow
list.append(ynow)
curtime += incr
timels.append(curtime)
t += incr
i += 1
print i
ybefore = ynow
newlist = np.asarray(list)
timesnew = np.asarray(timels)
#a = np.column_stack((timesnew, newlist))
np.savetxt("C://foo.csv", a, delimiter=";", fmt='%.10f')
addvolts()
plt.plot(timels,list)
plt.show()
if __name__ == "__main__":
main(5, 1, 2, 25, 0.00008)
EDIT:
Basically, here is the problem: after the turning point the function does not seem to be sinusoidal (the line seems to be linear), and I can't understand why, or at least how to get the functions to end up more "curvy" and less "sharp" at the turning points.
I'm thinking maybe the function changes shouldn't be too different from the previous function but then I would lose the randomness. I'd like it to "look better" but I'm not sure how to achieve that unless I ran the frequencies in order. I'm trying to emulate a "whitenoise file" that was given to me as part of a job that I applied for - the whitenoise would be sent to a digital to analog converter and be used to test equipment. Obviously I didn't get the position BUT for knowledge purposes I want to complete this.
Here is the graph of the whitenoise file I was given - 700 mins long:
From the last pic the difference between mine and the given can be seen, I think I'm going to attempt to run each function for an entire period rather than a single turning point.
True white noise is completely random, so trying to emulate white noise using some kind of function already is contradictory.
If the file you have is really supposed to be white noise, then it has already undergone some kind of filtering. You can of course do the same in your program: create some truly random numbers and use a filter function to obtain a "smoothing" effect.
For example, you can convolve the random noise with a Hann window. This is shown below.
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
y = np.random.rand(1600)
win = scipy.signal.windows.hann(15)  # window functions live in scipy.signal.windows in current SciPy
filtered = scipy.signal.convolve(y, win, mode='same') / sum(win)
fig, (ax, ax2) = plt.subplots(nrows=2, sharex=True, sharey=True)
ax.plot(y, linestyle="-", marker=".", lw=0.3, markersize=1, color="r", alpha=0.5)
ax.set_title("random noise")
ax2.plot(y, linestyle="", marker=".", color="r", markersize=1)
ax2.plot(filtered)
ax2.set_title("filtered")
plt.show()
You might want to zoom in to see the effect better, or use different parameters for the filter window.
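The same idea can be sketched with NumPy alone, which also makes the smoothing effect easy to quantify. Here np.hanning and np.convolve stand in for the SciPy calls above, the seed is fixed only for repeatability, and roughness (the standard deviation of successive differences) is just an illustrative measure chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
noise = rng.random(1600)            # uniform samples in [0, 1)

win = np.hanning(15)                # 15-point Hann window
filtered = np.convolve(noise, win, mode="same") / win.sum()


def roughness(signal):
    """Illustrative measure: spread of the sample-to-sample jumps."""
    return np.std(np.diff(signal))


raw_roughness = roughness(noise)
smooth_roughness = roughness(filtered)
```

A wider window should lower the roughness of the filtered signal further, at the cost of removing more of the high-frequency content.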
I'm using streamplot in order to plot stress trajectories around an open circle. I do not want the stress trajectories to be analyzed inside the radius of the circle for two reasons: (1) The stresses will not propagate through the air as they would through the medium surrounding the hole, and (2) The math doesn't allow for it. I have been messing around with the idea of a mask but I haven't been able to get it to work. There might be a better way. Does anyone know how I can plot these trajectories without them plotting inside the radius of the hole? I effectively need some sort of command to tell the streamplot to stop whenever it gets to the outer radius of the hole, but then also know where to pick back up again. The first bit of code below is just the math used to derive the directions of the stress trajectories. I included this for reference. Following this I plot the trajectories.
import numpy as np
import matplotlib.pyplot as plt
from pylab import *
def stress_trajectory_cartesian(X,Y,chi,F,a):
    # r is the radius out from the center of the hole at which we want to know the stress
    # Theta is the angle from reference at which we want to know the stress
    # a is the radius of the hole
    r = np.sqrt(np.power(X,2)+np.power(Y,2))*1.0
    c = (1.0*a)/(1.0*r)
    theta = np.arctan2(Y,X)
    A = 0.5*(1 - c**2. + (1 - 4*c**2. + 3*c**4.)*np.cos(2*theta))
    B = 0.5*(1 - c**2. - (1 - 4*c**2. + 3*c**4.)*np.cos(2*theta))
    C = 0.5*(1 + c**2. - (1 + 3*c**4.)*np.cos(2*theta))
    D = 0.5*(1 + c**2. + (1+ 3*c**4.)*np.cos(2*theta))
    E = 0.5*((1 + 2*c**2. - 3*c**4.)*np.sin(2*theta))
    tau_r = 1.0*F*c**2. + (A-1.0*chi*B) # Radial stress
    tau_theta = -1.*F*c**2. + (C - 1.0*chi*D) # Tangential stress
    tau_r_theta = (-1 - 1.0*chi)*E # Shear stress
    tau_xx = .5*tau_r*(np.cos(2*theta)+1) -1.0*tau_r_theta*np.sin(2*theta) + .5*(1-np.cos(2*theta))*tau_theta
    tau_xy = .5*np.sin(2*theta)*(tau_r - tau_theta) + 1.0*tau_r_theta*np.cos(2*theta)
    tau_yy = .5*(1-np.cos(2*theta))*tau_r + 1.0*tau_r_theta*np.sin(2*theta) + .5*(np.cos(2*theta)+1)*tau_theta
    tan_2B = (2.*tau_xy)/(1.0*tau_xx - 1.0*tau_yy)
    beta1 = .5*np.arctan(tan_2B)
    beta2 = .5*np.arctan(tan_2B) + np.pi/2.
    return beta1, beta2
# Functions to plot beta as a vector field in the Cartesian plane
def stress_beta1_cartesian(X,Y,chi,F,a):
    return stress_trajectory_cartesian(X,Y,chi,F,a)[0]
def stress_beta2_cartesian(X,Y,chi,F,a):
    return stress_trajectory_cartesian(X,Y,chi,F,a)[1]
#Used to return the directions of the betas
def to_unit_vector_x(angle):
    return np.cos(angle)
def to_unit_vector_y(angle):
    return np.sin(angle)
The code below plots the stress trajectories:
# Note that R_min is taken as the radius of the hole here
# Using R_min for a in these functions under the assumption that we don't want to analyze stresses across the hole
def plot_stresses_cartesian(F,chi,R_min):
    Y_grid, X_grid = np.mgrid[-5:5:100j, -5:5:100j]
    R_grid = np.sqrt(X_grid**2. + Y_grid**2.)
    cart_betas1 = stress_beta1_cartesian(X_grid,Y_grid,chi,F,R_min)
    beta_X1s = to_unit_vector_x(cart_betas1)
    beta_Y1s = to_unit_vector_y(cart_betas1)
    beta_X1s[R_grid<1] = np.nan
    beta_Y1s[R_grid<1] = np.nan
    cart_betas2 = stress_beta2_cartesian(X_grid,Y_grid,chi,F,R_min)
    beta_X2s = to_unit_vector_x(cart_betas2)
    beta_Y2s = to_unit_vector_y(cart_betas2)
    beta_X2s[R_grid<1] = np.nan
    beta_Y2s[R_grid<1] = np.nan
    fig = plt.figure(figsize=(5,5))
    #streamplot
    ax=fig.add_subplot(111)
    ax.set_title('Stress Trajectories')
    plt.streamplot(X_grid, Y_grid, beta_X1s, beta_Y1s, minlength=0.9, arrowstyle='-', density=2.5, color='b')
    plt.streamplot(X_grid, Y_grid, beta_X2s, beta_Y2s, minlength=0.9, arrowstyle='-', density=2.5, color='r')
    plt.axis("image")
    plt.xlabel(r'$\chi = $'+str(round(chi,1)) + ', ' + r'$F = $'+ str(round(F,1)))
    plt.ylim(-5,5)
    plt.xlim(-5,5)
    plt.show()
plot_stresses_cartesian(0,1,1)
I think that you just need to have NaN values for the region that you do not want to consider. I generated a simple example below.
import numpy as np
import matplotlib.pyplot as plt
Y, X = np.mgrid[-5:5:100j, -5:5:100j]
R = np.sqrt(X**2 + Y**2)
U = -1 - X**2 + Y
V = 1 + X - Y**2
U[R<1] = np.nan
V[R<1] = np.nan
plt.streamplot(X, Y, U, V, density=2.5, arrowstyle='-')
plt.axis("image")
plt.savefig("stream.png", dpi=300)
plt.show()
With plot
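The question's code hard-codes the mask as R_grid<1 even though the hole radius is passed in as R_min, so the mask would stop matching the hole for any other radius. A small helper that takes the radius as a parameter avoids that. This is a sketch; mask_hole is a hypothetical name, and the vector field is the toy one from the example above.

```python
import numpy as np


def mask_hole(X, Y, U, V, radius):
    """Return copies of U and V with NaN inside the circle of the given radius."""
    inside = np.hypot(X, Y) < radius
    U_masked = np.where(inside, np.nan, U)
    V_masked = np.where(inside, np.nan, V)
    return U_masked, V_masked


# Same toy field as in the example above
Y, X = np.mgrid[-5:5:100j, -5:5:100j]
U = -1 - X**2 + Y
V = 1 + X - Y**2
U_m, V_m = mask_hole(X, Y, U, V, radius=1.0)
```

U_m and V_m can be passed to plt.streamplot in place of U and V. The original arrays are left untouched, which helps if the same field is plotted with several hole radii.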
I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points, and then calculate the hot-spot value as a function of distance. This produced a better looking image, however it is too slow to use in my application. Since it's O(n), it works ok with 100 points, but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in a KDTree and use the nearest 5 points to calculate the hot-spot value. This algorithm is O(1), so it is much faster with a large dataset. It's still not fast enough: it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import matplotlib
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi
def grid_density_kdtree(xl, yl, xi, yi, dfactor):
zz = np.empty([len(xi),len(yi)], dtype=np.uint8)
zipped = zip(xl, yl)
kdtree = KDTree(zipped)
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
retvalset = kdtree.query((xc,yc), k=5)
for dist in retvalset[0]:
density = density + math.exp(-dfactor * pow(dist, 2)) / 5
zz[yci][xci] = min(density, 1.0) * 255
return zz
def grid_density(xl, yl, xi, yi):
ximin, ximax = min(xi), max(xi)
yimin, yimax = min(yi), max(yi)
xxi,yyi = np.meshgrid(xi,yi)
#zz = np.empty_like(xxi)
zz = np.empty([len(xi),len(yi)])
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
for i in range(0,len(xl)):
xd = math.fabs(xl[i] - xc)
yd = math.fabs(yl[i] - yc)
if xd < 1 and yd < 1:
dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
density = density + math.exp(-5.0 * pow(dist, 2))
zz[yci][xci] = density
return zz
def boxsum(img, w, h, r):
st = [0] * (w+1) * (h+1)
for x in xrange(w):
st[x+1] = st[x] + img[x]
for y in xrange(h):
st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
for x in xrange(w):
st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
for y in xrange(h):
y0 = max(0, y - r)
y1 = min(h, y + r + 1)
for x in xrange(w):
x0 = max(0, x - r)
x1 = min(w, x + r + 1)
img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 15
border = r * 2
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = [0] * (imgw * imgh)
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy * imgw + ix] += 1
for p in xrange(4):
boxsum(img, imgw, imgh, r)
a = np.array(img).reshape(imgh,imgw)
b = a[border:(border+h),border:(border+w)]
return b
def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 20
border = r
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = np.zeros((imgh,imgw))
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy][ix] += 1
return ndi.gaussian_filter(img, (r,r)) ## gaussian convolution
def generate_graph():
n = 1000
# data points range
data_ymin = -2.
data_ymax = 2.
data_xmin = -2.
data_xmax = 2.
# view area range
view_ymin = -.5
view_ymax = .5
view_xmin = -.5
view_xmax = .5
# generate data
xl = np.random.uniform(data_xmin, data_xmax, n)
yl = np.random.uniform(data_ymin, data_ymax, n)
zl = np.random.uniform(0, 1, n)
# get visible data points
xlvis = []
ylvis = []
for i in range(0,len(xl)):
if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
xlvis.append(xl[i])
ylvis.append(yl[i])
fig = plt.figure()
# plot histogram
plt1 = fig.add_subplot(221)
plt1.set_axis_off()
t0 = time.clock()
zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax],[view_xmin, view_xmax]], normed=True)
plt.title('numpy.histogram2d - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# plot density calculated with kdtree
plt2 = fig.add_subplot(222)
plt2.set_axis_off()
xi = np.linspace(view_xmin, view_xmax, 256)
yi = np.linspace(view_ymin, view_ymax, 256)
t0 = time.clock()
zd = grid_density_kdtree(xl, yl, xi, yi, 70)
plt.title('function of 5 nearest using kdtree\n'+str(time.clock()-t0)+"sec")
cmap=cm.jet
A = (cmap(zd/256.0)*255).astype(np.uint8)
#A[:,:,3] = zd
plt.imshow(A , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# gaussian filter
plt3 = fig.add_subplot(223)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('ndi.gaussian_filter - '+str(time.clock()-t0)+"sec")
plt.imshow(zd , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# boxsum smoothing
plt3 = fig.add_subplot(224)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('boxsum smoothing - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
if __name__=='__main__':
generate_graph()
plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi
data = np.random.rand(30000,2) ## create random dataset
inds = (data * 255).astype('uint') ## convert to indices
img = np.zeros((256,256)) ## blank image
for i in range(data.shape[0]): ## draw pixels
    img[inds[i,0], inds[i,1]] += 1
img = ndi.gaussian_filter(img, (10,10))
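The pixel-drawing loop can probably be vectorized as well. A plain fancy-indexed `img[rows, cols] += 1` would count repeated indices only once, so the sketch below uses np.add.at, which accumulates duplicates. The seeded generator and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, for repeatability only
data = rng.random((30000, 2))            # random dataset in [0, 1)
inds = (data * 255).astype(int)          # convert to pixel indices

img = np.zeros((256, 256))               # blank image
# Unbuffered add: points that share a pixel are all counted
np.add.at(img, (inds[:, 0], inds[:, 1]), 1)
```

The smoothing step from the answer, ndi.gaussian_filter(img, (10,10)), can then be applied to img unchanged.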
A very simple implementation that could be done (with C) in realtime and that only takes fractions of a second in pure python is to just compute the result in screen space.
The algorithm is
Allocate the final matrix (e.g. 256x256) with all zeros
For each point in the dataset increment the corresponding cell
Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
Scale result and output
The computation of the box sum can be made very fast and independent of N by using a sum table. Every computation requires just two scans of the matrix; the total complexity is O(S + WHP), where S is the number of points, W and H are the width and height of the output, and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
    # st is a summed-area table with one extra row and column of zeros
    st = [0] * (w+1) * (h+1)
    for x in range(w):
        st[x+1] = st[x] + img[x]
    for y in range(h):
        st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
        for x in range(w):
            st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
    # Replace each cell with the sum over the box centered on it
    for y in range(h):
        y0 = max(0, y - r)
        y1 = min(h, y + r + 1)
        for x in range(w):
            x0 = max(0, x - r)
            x1 = min(w, x + r + 1)
            img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def saveGraph(w, h, data):
    X = [x for x, y in data]
    Y = [y for x, y in data]
    x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
    kx = (w - 1) / (x1 - x0)
    ky = (h - 1) / (y1 - y0)
    img = [0] * (w * h)
    for x, y in data:
        ix = int((x - x0) * kx)
        iy = int((y - y0) * ky)
        img[iy * w + ix] += 1
    for p in range(4):
        boxsum(img, w, h, 2)
    mx = max(img)
    k = 255.0 / mx
    # Binary PGM: ASCII header followed by one byte per pixel
    out = open("result.pgm", "wb")
    out.write(("P5\n%i %i 255\n" % (w, h)).encode("ascii"))
    out.write(bytes(int(v*k) for v in img))
    out.close()
import random
data = [(random.random(), random.random())
        for i in range(30000)]
saveGraph(256, 256, data)
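The sum-table step can also be written with NumPy, which may be easier to read than the flat-list indexing. The sketch below is an assumption-laden variant, not the code used for the timings above: box_sum is a hypothetical name, it works on a 2D array, and it returns a new array instead of modifying the image in place.

```python
import numpy as np


def box_sum(img, r):
    """Sum of img over a (2r+1) x (2r+1) box around each cell, clipped at the edges."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Summed-area table: st[i, j] holds the sum of img[:i, :j]
    st = np.zeros((h + 1, w + 1))
    st[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    # Box corners for every row and column, clipped to the image
    y0 = np.clip(np.arange(h) - r, 0, h)[:, None]
    y1 = np.clip(np.arange(h) + r + 1, 0, h)[:, None]
    x0 = np.clip(np.arange(w) - r, 0, w)[None, :]
    x1 = np.clip(np.arange(w) + r + 1, 0, w)[None, :]
    return st[y1, x1] - st[y0, x1] - st[y1, x0] + st[y0, x0]


demo = np.arange(30.0).reshape(5, 6)
smoothed = box_sum(demo, 1)
```

As in the pure Python version, the cost per pass does not depend on r, because each output cell is built from four table lookups.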
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
used sqrt(average of squared values) instead of sum for the averaging pass
color-coded the results
stretching the result to always use the full color scale
drawn antialiased black dots where the data points are
made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform some ab convolution (number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (well, ab >> N^2 in a sparse system.)
Looking at the code above, you seem to have a loop in grid_density() which runs over the number of bins in y inside a loop of the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot on different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
You can read about it and implement it yourself from his book, if you like.
I have written this code myself, in fact
I have written a python-wrapped C implementation of this in 2D, which is not really ready for production (it is still single threaded, etc) but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (native python list) of objects with an x and y property and my C module will callback to a python function of your choice to run an interaction function for matched pairs of elements. it's designed for running contact force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet, because the other bits of the library aren't ready yet, I'll have to give you a zip of my current source but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a C compiler to make it work, and it has only been tested under *nix environments; if you're on Windows you'll need the MinGW C compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw less errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations, they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that the KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
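A minimal sketch of that suggestion, assuming SciPy's cKDTree and a cutoff of three standard deviations: gaussian_density, sigma, and cutoff are illustrative names and values, and the Gaussians are left unnormalized.

```python
import numpy as np
from scipy.spatial import cKDTree


def gaussian_density(points, queries, sigma, cutoff=3.0):
    """Sum unnormalized Gaussians centered on points, evaluated at queries.

    Only points within cutoff * sigma of a query are considered.
    """
    tree = cKDTree(points)
    neighbours = tree.query_ball_point(queries, r=cutoff * sigma)
    density = np.zeros(len(queries))
    for i, idx in enumerate(neighbours):
        if idx:  # skip queries with no nearby points
            d2 = np.sum((points[idx] - queries[i]) ** 2, axis=1)
            density[i] = np.exp(-d2 / (2.0 * sigma ** 2)).sum()
    return density


rng = np.random.default_rng(0)
points = rng.random((500, 2))     # data points
queries = rng.random((50, 2))     # pixel centers to evaluate
sigma = 0.05
density = gaussian_density(points, queries, sigma)
```

For a full bitmap, queries would be the grid of pixel centers. The Python loop over queries is the likely bottleneck, which is where the every-fifth-pixel idea would come in.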
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a gaussian shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations which should greatly speed up your calculations.
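To illustrate the separable idea, here is a sketch that filters the rows and then the columns with the same 1D Gaussian kernel. It deliberately uses np.convolve instead of scipy.ndimage.convolve1d to stay NumPy-only, so edge handling is zero padding rather than SciPy's default. The names gaussian_kernel and separable_blur are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma, radius):
    """Normalized 1D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def separable_blur(img, kernel):
    """Convolve along rows, then along columns (zero padding at the edges)."""
    rows = np.apply_along_axis(np.convolve, 1, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")


rng = np.random.default_rng(0)
image = rng.random((12, 12))
kernel = gaussian_kernel(sigma=1.0, radius=2)
blurred = separable_blur(image, kernel)
```

Two 1D passes with a P-tap kernel touch each pixel about 2P times, compared with P^2 for a direct 2D convolution, which is where the O(PM^2) figure comes from.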
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.