Smooth a bumpy circle - python

I am detecting edges of round objects and am obtaining "bumpy" irregular edges. Is there away to smooth the edges so that I have a more uniform shape?
For example, in the code below I generate a "bumpy" circle (left). Is there a smoothing or moving average kind of function I could use to obtain or approximate the "smooth" circle (right). Preferably with some sort of parameter I can control as my actual images arn't perfectly circular.
import numpy as np
import matplotlib.pyplot as plt
fig, (bumpy, smooth) = plt.subplots(ncols=2, figsize=(14, 7))
an = np.linspace(0, 2 * np.pi, 100)
bumpy.plot(3 * np.cos(an) + np.random.normal(0,.03,100), 3 * np.sin(an) + np.random.normal(0,.03,100))
smooth.plot(3 * np.cos(an), 3 * np.sin(an))

You can do this in frequency domain. Take the (x,y) coordinates of the points of your curve and construct the signal as signal = x + yj, then take the Fourier transform of this signal. Filter out the high frequency components, then take the inverse Fourier transform and you'll get a smooth curve. You can control the smoothness by adjusting the cutoff frequency.
Here's an example:
import numpy as np
from matplotlib import pyplot as plt
r = 3
theta = np.linspace(0, 2 * np.pi, 100)
noise_level = 2
# construct the signal
x = r * np.cos(theta) + noise_level * np.random.normal(0,.03,100)
y = r * np.sin(theta) + noise_level * np.random.normal(0,.03,100)
signal = x + 1j*y
# FFT and frequencies
fft = np.fft.fft(signal)
freq = np.fft.fftfreq(signal.shape[-1])
# filter
cutoff = 0.1
fft[np.abs(freq) > cutoff] = 0
# IFFT
signal_filt = np.fft.ifft(fft)
plt.figure()
plt.subplot(121)
plt.axis('equal')
plt.plot(x, y, label='Noisy')
plt.subplot(122)
plt.axis('equal')
plt.plot(signal_filt.real, signal_filt.imag, label='Smooth')

If the shapes in question approximate ellipses, and you’d like to force them to be actual ellipses, you can easily fit an ellipse by computing the moments of inertia of the set of points, and figure the parameters of the ellipse from those. This Q&A shows how to accomplish that using MATLAB, the code is easy to translate to Python:
import numpy as np
import matplotlib.pyplot as plt
# Some test data (from Question, but with offset added):
an = np.linspace(0, 2 * np.pi, 100)
x = 3 * np.cos(an) + np.random.normal(0,.03,100) + 3.8
y = 3 * np.sin(an) + np.random.normal(0,.03,100) + 5.4
plt.plot(x, y)
# Approximate the ellipse:
L, V = np.linalg.eig(np.cov(x, y))
r = np.sqrt(2*L) # radius
cos_phi = V[0, 0]
sin_phi = V[1, 0] # two components of minor axis direction
m = np.array([np.mean(x), np.mean(y)]) # centroid
# Draw the ellipse:
x_approx = r[0] * np.cos(an) * cos_phi - r[1] * np.sin(an) * sin_phi + m[0];
y_approx = r[0] * np.cos(an) * sin_phi + r[1] * np.sin(an) * cos_phi + m[1];
plt.plot(x_approx, y_approx, 'r')
plt.show()
The centroid calculation above works because the points are uniformly distributed around the ellipse. If this is not the case, a slightly more complex centroid computation is needed.

I recommend to use FIR filter. The idea behind is to use weighted average of points arround filtered point... You just do something like
p(i) = 0.5*p(i) + 0.25*p(i-1) + 0.25*p(i+1)
where p(i) is i-th point. Its a good idea to remember original p(i) as it is used for next iteration (to avoid shifting). You can do each axis separately. You can use any weights but their sum should be 1.0. You can use any number of neighbors not just 2 (but in such case you need to remember more points). Symmetric weights will reduce shifting. You can apply FIR multiple times ...
Here small 2D C++ example:
//---------------------------------------------------------------------------
const int n=50; // points
float pnt[n][2]; // points x,y ...
//---------------------------------------------------------------------------
void pnt_init()
{
int i;
float a,da=2.0*M_PI/float(n),r;
Randomize();
for (a=0.0,i=0;i<n;i++,a+=da)
{
r=0.75+(0.2*Random());
pnt[i][0]=r*cos(a);
pnt[i][1]=r*sin(a);
}
}
//---------------------------------------------------------------------------
void pnt_smooth()
{
int i,j;
float p0[2],*p1,*p2,tmp[2],aa0[2],aa1[2],bb0[2],bb1[2];
// bb = original BBOX
for (j=0;j<2;j++) { bb0[j]=pnt[0][j]; bb1[j]=pnt[0][j]; }
for (i=0;i<n;i++)
for (p1=pnt[i],j=0;j<2;j++)
{
if (bb0[j]>p1[j]) bb0[j]=p1[j];
if (bb1[j]<p1[j]) bb1[j]=p1[j];
}
// FIR filter
for (j=0;j<2;j++) p0[j]=pnt[n-1][j]; // remember p[i-1]
p1=pnt[0]; p2=pnt[1]; // pointers to p[i],p[i+1]
for (i=0;i<n;i++,p1=p2,p2=pnt[(i+1)%n])
{
for (j=0;j<2;j++)
{
tmp[j]=p1[j]; // store original p[i]
p1[j]=(0.1*p0[j]) + (0.8*p1[j]) + (0.1*p2[j]); // p[i] = FIR(p0,p1,p2)
p0[j]=tmp[j]; // remeber original p1 as p[i-1] for next iteration
}
}
// aa = new BBOX
for (j=0;j<2;j++) { aa0[j]=pnt[0][j]; aa1[j]=pnt[0][j]; }
for (i=0;i<n;i++)
for (p1=pnt[i],j=0;j<2;j++)
{
if (aa0[j]>p1[j]) aa0[j]=p1[j];
if (aa1[j]<p1[j]) aa1[j]=p1[j];
}
// compute scale transform aa -> bb
for (j=0;j<2;j++) tmp[j]=(bb1[j]-bb0[j])/(aa1[j]-aa0[j]); // scale
// convert aa -> bb
for (i=0;i<n;i++)
for (p1=pnt[i],j=0;j<2;j++)
p1[j]=bb0[0]+((p1[j]-aa0[j])*tmp[j]);
}
//---------------------------------------------------------------------------
I added also checking BBOX before and after smoothing so the shape does not change size and position. In some cases centroid is better than BBOX for the position correction.
Here preview of multiple application of FIR filter:

Related

Change in coordinate density for np.meshgrid() in matplotlib

I am plotting a vector field using the numpy function quiver() and it works. But I would like to emphasize the cowlick in the following plot:
I am not sure how to go about it, but increasing the density of arrows in the center could possibly do the trick. To do so, I would like to resort to some option within np.meshgrid() that would allow me to get more tightly packed x,y coordinate points in the center. A linear, quadratic or other specification does not seem to be built in. I am not sure if sparse can be modified to this end.
The code:
lim = 10
int = 0.22 *lim
x,y = np.meshgrid(np.arange(-lim, lim, int), np.arange(-lim, lim, int))
u = 3 * np.cos(np.arctan2(y,x)) - np.sqrt(x**2+y**2) * np.sin(np.arctan2(y,x))
v = 3 * np.sin(np.arctan2(y,x)) + np.sqrt(x**2+y**2) * np.cos(np.arctan2(y,x))
color = x**2 + y**2
plt.rcParams["image.cmap"] = "Greys_r"
mult = 1
plt.figure(figsize=(mult*lim, mult*lim))
plt.quiver(x,y,u,v,color, linewidths=.006, lw=.1)
plt.show()
Closing the loop on this, thanks to the accepted answer I was able to finally strike a balance between the density of the mesh as I learned from to do from #flwr and keeping the "cowlick" structure of the vector field conspicuous (avoiding the radial structure around the origin as much as possible):
You can construct the points whereever you want to calculate your field on and quivers will be happy about it. The code below uses polar coordinates and stretches the radial coordinate non-linearly.
import numpy as np
import matplotlib.pyplot as plt
lim = 10
N = 10
theta = np.linspace(0.1, 2*np.pi, N*2)
stretcher_factor = 2
r = np.linspace(0.3, lim**(1/stretcher_factor), N)**stretcher_factor
R, THETA = np.meshgrid(r, theta)
x = R * np.cos(THETA)
y = R * np.sin(THETA)
# x,y = np.meshgrid(x, y)
r = x**2 + y**2
u = 3 * np.cos(THETA) - np.sqrt(r) * np.sin(THETA)
v = 3 * np.sin(THETA) + np.sqrt(r) * np.cos(THETA)
plt.rcParams["image.cmap"] = "Greys_r"
mult = 1
plt.figure(figsize=(mult*lim, mult*lim))
plt.quiver(x,y,u,v,r, linewidths=.006, lw=.1)
Edit: Bug taking meshgrid twice
np.meshgrid just makes a grid of the vectors you provide.
What you could do is contract this regular grid in the center to have more points in the center (best visible with more points), e.g. like so:
# contract in the center
a = 0.5 # how far to contract
b = 0.8 # how strongly to contract
c = 1 - b*np.exp(-((x/lim)**2 + (y/lim)**2)/a**2)
x, y = c*x, c*y
plt.plot(x,y,'.k')
plt.show()
Alternatively you can x,y cooridnates that are not dependent on a grid at all:
x = np.random.randn(500)
y = np.random.randn(500)
plt.plot(x,y,'.k')
plt.show()
But I think you'd prefer a slightly more regular patterns you could look into poisson disk sampling with adaptive distances or something like that, but the key point here is that for using quiver, you can use ANY set of coordinates, they do not have to be in a regular grid.

Simple DFT Coefficients => Amplitude/Frequencies => Plot

Im trying on DFT and FFT in Python with numpy and pyplot.
My Sample Vector is
x = np.array([1,2,4,3]
The DFT coefficients for that vector are
K = [10+0j, -3+1j, 0+0j, -3-1j]
so basically we have 10, -3+i, 0 and -3-1i as DFT coefficients.
My problem now is to get a combination of sin and cos to fit all 4 points.
Let's assume we have a sample Rate of 1hz.
This is my code :
from matplotlib import pyplot as plt
import numpy as np
x = np.array([1,2,4,3])
fft = np.fft.fft(x)
space = np.linspace(0,4,50)
values = np.array([1,2,3,4])
cos0 = fft[0].real * np.cos(0 * space)
cos1 = fft[1].real * np.cos(1/4 * np.pi * space)
sin1 = fft[1].imag * np.sin(1/4 * np.pi * space)
res = cos0 + cos1 + sin1
plt.scatter(values, x, label="original")
plt.plot(space, cos0, label="cos0")
plt.plot(space, cos1, label="cos1")
plt.plot(space, sin1, label="sin1")
plt.plot(space, res, label="combined")
plt.legend()
As result i get the plot:
(source: heeser-it.de)
Why isnt the final curve hitting any point?
I would appreciate your help. Thanks!
EDIT:
N = 1000
dataPoints = np.linspace(0, np.pi, N)
function = np.sin(dataPoints)
fft = np.fft.fft(function)
F = np.zeros((N,))
for i in range(0, N):
F[i] = (2 * np.pi * i) / N
F_sin = np.zeros((N,N))
F_cos = np.zeros((N,N))
res = 0
for i in range(0, N):
F_sin[i] = fft[i].imag / 500 * np.sin(dataPoints * F[i])
F_cos[i] = fft[i].real / 500* np.cos(dataPoints * F[i])
res = res + F_sin[i] + F_cos[i]
plt.plot(dataPoints, function)
plt.plot(dataPoints, res)
my plot looks like:
(source: heeser-it.de)
where do i fail?
Your testing vector x looks bit like a sawtooth because it rises linearly and then starts to decrease but with that few datapoints it's hard to tell what signal it is. This has an infinite FFT series, which means it has lot of higher harmonic frequency components in it. So to describe it with DTF coefficients and get close to original points, you would have to use
higher sample rate, to get information about higher frequencies (you should learn about nyquist theorem)
more data points (samples), so you can extract more precise information about frequencies in your signal) This means you have to have more items in your array 'x'.
Also you could try to fit some simpler signal. What about you try to fit a sine signal for start? Generate 1000 data points of low frequency sine (1Hz or one cycle per 1000 samples) and then run DTF on it to check if your code works.
There are a few mistakes:
The xs you assigned to the original values are off by one
The frequency you assigned to fft[1] is incorrect
The coefficients are incorrectly scaled
This one works:
from matplotlib import pyplot as plt
import numpy as np
x = np.array([1,2,4,3])
fft = np.fft.fft(x)
space = np.linspace(0,4,50)
values = np.array([0,1,2,3])
cos0 = fft[0].real * np.cos(0 * space)/4
cos1 = fft[1].real * np.cos(1/2 * np.pi * space)/2
sin1 = -fft[1].imag * np.sin(1/2 * np.pi * space)/2
res = cos0 + cos1 + sin1
plt.scatter(values, x, label="original")
plt.plot(space, cos0, label="cos0")
plt.plot(space, cos1, label="cos1")
plt.plot(space, sin1, label="sin1")
plt.plot(space, res, label="combined")
plt.legend()
plt.show()

Two dimensional FFT using python results in slightly shifted frequency

I know there have been several questions about using the Fast Fourier Transform (FFT) method in python, but unfortunately none of them could help me with my problem:
I want to use python to calculate the Fast Fourier Transform of a given two dimensional signal f, i.e. f(x,y). Pythons documentation helps a lot, solving a few issues, which the FFT brings with it, but i still end up with a slightly shifted frequency compared to the frequency i expect it to show. Here is my python code:
from scipy.fftpack import fft, fftfreq, fftshift
import matplotlib.pyplot as plt
import numpy as np
import math
fq = 3.0 # frequency of signal to be sampled
N = 100.0 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 2.0 * np.pi, N) # creating equally spaced vector from 0 to 2pi, with spacing 2pi/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
ft = np.fft.fft2(fnc) # calculating the fft coefficients
dx = x[1] - x[0] # spacing in x (and also y) direction (real space)
sampleFrequency = 2.0 * np.pi / dx
nyquisitFrequency = sampleFrequency / 2.0
freq_x = np.fft.fftfreq(ft.shape[0], d = dx) # return the DFT sample frequencies
freq_y = np.fft.fftfreq(ft.shape[1], d = dx)
freq_x = np.fft.fftshift(freq_x) # order sample frequencies, such that 0-th frequency is at center of spectrum
freq_y = np.fft.fftshift(freq_y)
half = len(ft) / 2 + 1 # calculate half of spectrum length, in order to only show positive frequencies
plt.imshow(
2 * abs(ft[:half,:half]) / half,
aspect = 'auto',
extent = (0, freq_x.max(), 0, freq_y.max()),
origin = 'lower',
interpolation = 'nearest',
)
plt.grid()
plt.colorbar()
plt.show()
And what i get out of this when running it, is:
Now you see that the frequency in x direction is not exactly at fq = 3, but slightly shifted to the left. Why is this?
I would assume that is has to do with the fact, that FFT is an algorithm using symmetry arguments and
half = len(ft) / 2 + 1
is used to show the frequencies at the proper place. But I don't quite understand what the exact problem is and how to fix it.
Edit: I have also tried using a higher sampling frequency (N = 10000.0), which did not solve the issue, but instead shifted the frequency slightly too far to the right. So i am pretty sure that the problem is not the sampling frequency.
Note: I'm aware of the fact, that the leakage effect leads to unphysical amplitudes here, but in this post I am primarily interested in the correct frequencies.
I found a number of issues
you use 2 * np.pi twice, you should choose one of either linspace or the arg to sine as radians if you want a nice integer number of cycles
additionally np.linspace defaults to endpoint=True, giving you an extra point for 101 instead of 100
fq = 3.0 # frequency of signal to be sampled
N = 100 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 1, N, endpoint=False) # creating equally spaced vector from 0 to 2pi, with spacing 2pi/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
you can check these issues:
len(x)
Out[228]: 100
plt.plot(fnc[0])
fixing the linspace endpoint now means you have an even number of fft bins so you drop the + 1 in the half calc
matshow() appears to have better defaults, your extent = (0, freq_x.max(), 0, freq_y.max()), in imshow appears to fubar the fft bin numbering
from scipy.fftpack import fft, fftfreq, fftshift
import matplotlib.pyplot as plt
import numpy as np
import math
fq = 3.0 # frequency of signal to be sampled
N = 100 # Number of sample points within interval, on which signal is considered
x = np.linspace(0, 1, N, endpoint=False) # creating equally spaced vector from 0 to 2pi, with spacing 2pi/N
y = x
xx, yy = np.meshgrid(x, y) # create 2D meshgrid
fnc = np.sin(2 * np.pi * fq * xx) # create a signal, which is simply a sine function with frequency fq = 3.0, modulating the x(!) direction
plt.plot(fnc[0])
ft = np.fft.fft2(fnc) # calculating the fft coefficients
#dx = x[1] - x[0] # spacing in x (and also y) direction (real space)
#sampleFrequency = 2.0 * np.pi / dx
#nyquisitFrequency = sampleFrequency / 2.0
#
#freq_x = np.fft.fftfreq(ft.shape[0], d=dx) # return the DFT sample frequencies
#freq_y = np.fft.fftfreq(ft.shape[1], d=dx)
#
#freq_x = np.fft.fftshift(freq_x) # order sample frequencies, such that 0-th frequency is at center of spectrum
#freq_y = np.fft.fftshift(freq_y)
half = len(ft) // 2 # calculate half of spectrum length, in order to only show positive frequencies
plt.matshow(
2 * abs(ft[:half, :half]) / half,
aspect='auto',
origin='lower'
)
plt.grid()
plt.colorbar()
plt.show()
zoomed the plot:

Rotating 1D numpy array of radial intensities into 2D array of spacial intensities

I have a numpy array filled with intensity readings at different radii in a uniform circle (for context, this is a 1D radiative transfer project for protostellar formation models: while much better models exist, my supervisor wasnts me to have the experience of producing one so I understand how others work).
I want to take that 1d array, and "rotate" it through a circle, forming a 2D array of intensities that could then be shown with imshow (or, with a bit of work, aplpy). The final array needs to be 2d, and the projection needs to be Cartesian, not polar.
I can do it with nested for loops, and I can do it with lookup tables, but I have a feeling there must be a neat way of doing it in numpy or something.
Any ideas?
EDIT:
I have had to go back and recreate my (frankly horrible) mess of for loops and if statements that I had before. If I really tried, I could probably get rid of one of the loops and one of the if statements by condensing things down. However, the aim is not to make it work with for loops, but see if there is a built in way to rotate the array.
impB is an array that differs slightly from what I stated it was before. Its actually just a list of radii where particles are detected. I then bin those into radius bins to get the intensity (or frequency if you prefer) in each radius. R is the scale factor for my radius as I run the model in a dimensionless way. iRes is a resolution scale factor, essentially how often I want to sample my radial bins. Everything else should be clear.
radJ = np.ndarray(shape=(2*iRes, 2*iRes)) # Create array of 2xRadius square
for i in range(iRes):
n = len(impB[np.where(impB[:] < ((i+1.) * (R / iRes)))]) # Count number of things within this radius +1
m = len(impB[np.where(impB[:] <= ((i) * (R / iRes)))]) # Count number of things in this radius
a = (((i + 1) * (R / iRes))**2 - ((i) * (R / iRes))**2) * math.pi # A normalisation factor based on area.....dont ask
for x in range(iRes):
for y in range(iRes):
if (x**2 + y**2) < (i * iRes)**2:
if (x**2 + y**2) >= (i * iRes)**2: # Checks for radius, and puts in cartesian space
radJ[x+iRes,y+iRes] = (n-m) / a # Put in actual intensity bins
radJ[x+iRes,-y+iRes] = (n-m) / a
radJ[-x+iRes,y+iRes] = (n-m) / a
radJ[-x+iRes,-y+iRes] = (n-m) / a
Nested loops are a simple approach for that. With ri_data_r and y containing your radius values (difference to the middle pixel) and the array for rotation, respectively, I would suggest:
from scipy import interpolate
import numpy as np
y = np.random.rand(100)
ri_data_r = np.linspace(-len(y)/2,len(y)/2,len(y))
interpol_index = interpolate.interp1d(ri_data_r, y)
xv = np.arange(-1, 1, 0.01) # adjust your matrix values here
X, Y = np.meshgrid(xv, xv)
profilegrid = np.ones(X.shape, float)
for i, x in enumerate(X[0, :]):
for k, y in enumerate(Y[:, 0]):
current_radius = np.sqrt(x ** 2 + y ** 2)
profilegrid[i, k] = interpol_index(current_radius)
print(profilegrid)
This will give you exactly what you are looking for. You just have to take in your array and calculate an symmetric array ri_data_r that has the same length as your data array and contains the distance between the actual data and the middle of the array. The code is doing this automatically.
I stumbled upon this question in a different context and I hope I understood it right. Here are two other ways of doing this. The first uses skimage.transform.warp with interpolation of desired order (here we use order=0 Nearest-neighbor). This method is slower but more precise and needs less memory then the second method.
The second one does not use interpolation, therefore is faster but also less precise and needs way more memory because it stores each 2D array containing one tilt until the end, where they are averaged with np.nanmean().
The difference between both solutions stemmed from the problem of handling the center of the final image where the tilts overlap the most, i.e. the first one would just add values with each tilt ending up out of the original range. This was "solved" by clipping the matrix in each step to a global_min and global_max (consult the code). The second one solves it by taking the mean of the tilts where they overlap, which forces us to use the np.nan.
Please, read the Example of usage and Sanity check sections in order to understand the plot titles.
Solution 1:
import numpy as np
from skimage.transform import warp
def rotate_vector(vector, deg_angle):
# Credit goes to skimage.transform.radon
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
center = vector.size // 2
square = np.zeros((vector.size, vector.size))
square[center,:] = vector
rad_angle = np.deg2rad(deg_angle)
cos_a, sin_a = np.cos(rad_angle), np.sin(rad_angle)
R = np.array([[cos_a, sin_a, -center * (cos_a + sin_a - 1)],
[-sin_a, cos_a, -center * (cos_a - sin_a - 1)],
[0, 0, 1]])
# Approx. 80% of time is spent in this function
return warp(square, R, clip=False, output_shape=((vector.size, vector.size)))
def place_vectors(vectors, deg_angles):
matrix = np.zeros((vectors.shape[-1], vectors.shape[-1]))
global_min, global_max = 0, 0
for i, deg_angle in enumerate(deg_angles):
tilt = rotate_vector(vectors[i], deg_angle)
global_min = tilt.min() if global_min > tilt.min() else global_min
global_max = tilt.max() if global_max < tilt.max() else global_max
matrix += tilt
matrix = np.clip(matrix, global_min, global_max)
return matrix
Solution 2:
Credit for the idea goes to my colleague Michael Scherbela.
import numpy as np
def rotate_vector(vector, deg_angle):
assert vector.ndim == 1, 'Pass only 1D vectors, e.g. use array.ravel()'
square = np.ones([vector.size, vector.size]) * np.nan
radius = vector.size // 2
r_values = np.linspace(-radius, radius, vector.size)
rad_angle = np.deg2rad(deg_angle)
ind_x = np.round(np.cos(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_y = np.round(np.sin(rad_angle) * r_values + vector.size/2).astype(np.int)
ind_x = np.clip(ind_x, 0, vector.size-1)
ind_y = np.clip(ind_y, 0, vector.size-1)
square[ind_y, ind_x] = vector
return square
def place_vectors(vectors, deg_angles):
matrices = []
for deg_angle, vector in zip(deg_angles, vectors):
matrices.append(rotate_vector(vector, deg_angle))
matrix = np.nanmean(np.array(matrices), axis=0)
return np.nan_to_num(matrix, copy=False, nan=0.0)
Example of usage:
r = 100 # Radius of the circle, i.e. half the length of the vector
n = int(np.pi * r / 8) # Number of vectors, e.g. number of tilts in tomography
v = np.ones(2*r) # One vector, e.g. one tilt in tomography
V = np.array([v]*n) # All vectors, e.g. a sinogram in tomography
# Rotate 1D vector to a specific angle (output is 2D)
angle = 45
rotated = rotate_vector(v, angle)
# Rotate each row of a 2D array according to its angle (output is 2D)
angles = np.linspace(-90, 90, num=n, endpoint=False)
inplace = place_vectors(V, angles)
Sanity check:
These are just simple checks which by no means cover all possible edge cases. Depending on your use case you might want to extend the checks and adjust the method.
# I. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then sum(inplace) should be approx. equal to (n * (2πr - n)) / π
# which is an area that should be covered by the tilts
desired_area = (n * (2 * np.pi * r - n)) / np.pi
covered_area = np.sum(inplace)
covered_frac = covered_area / desired_area
print(f'This method covered {covered_frac * 100:.2f}% '
'of the area which should be covered in total.')
# II. Sanity check
# Assuming n <= πr and v = np.ones(2r)
# Then a circle M with radius m <= r should be the largest circle which
# is fully covered by the vectors. I.e. its mean should be no less than 1.
# If n = πr then m = r.
# m = n / π
m = int(n / np.pi)
# Code for circular mask not included
mask = create_circular_mask(2*r, 2*r, center=None, radius=m)
m_area = np.mean(inplace[mask])
print(f'Full radius r={r}, radius m={m}, mean(M)={m_area:.4f}.')
Code for plotting:
import matplotlib.pyplot as plt
plt.figure(figsize=(16, 8))
plt.subplot(121)
rotated = np.nan_to_num(rotated) # not necessary in case of the first method
plt.title(
f'Output of rotate_vector(), angle={angle}°\n'
f'Sum is {np.sum(rotated):.2f} and should be {np.sum(v):.2f}')
plt.imshow(rotated, cmap=plt.cm.Greys_r)
plt.subplot(122)
plt.title(
f'Output of place_vectors(), r={r}, n={n}\n'
f'Covered {covered_frac * 100:.2f}% of the area which should be covered.\n'
f'Mean of the circle M is {m_area:.4f} and should be 1.0.')
plt.imshow(inplace)
circle=plt.Circle((r, r), m, color='r', fill=False)
plt.gcf().gca().add_artist(circle)
plt.gcf().gca().legend([circle], [f'Circle M (m={m})'])

Efficient method of calculating density of irregularly spaced points

I am attempting to generate map overlay images that would assist in identifying hot-spots, that is areas on the map that have high density of data points. None of the approaches that I've tried are fast enough for my needs.
Note: I forgot to mention that the algorithm should work well under both low and high zoom scenarios (or low and high data point density).
I looked through numpy, pyplot and scipy libraries, and the closest I could find was numpy.histogram2d. As you can see in the image below, the histogram2d output is rather crude. (Each image includes points overlaying the heatmap for better understanding)
My second attempt was to iterate over all the data points, and then calculate the hot-spot value as a function of distance. This produced a better looking image, however it is too slow to use in my application. Since it's O(n), it works ok with 100 points, but blows out when I use my actual dataset of 30000 points.
My final attempt was to store the data in an KDTree, and use the nearest 5 points to calculate the hot-spot value. This algorithm is O(1), so much faster with large dataset. It's still not fast enough, it takes about 20 seconds to generate a 256x256 bitmap, and I would like this to happen in around 1 second time.
Edit
The boxsum smoothing solution provided by 6502 works well at all zoom levels and is much faster than my original methods.
The gaussian filter solution suggested by Luke and Neil G is the fastest.
You can see all four approaches below, using 1000 data points in total, at 3x zoom there are around 60 points visible.
Complete code that generates my original 3 attempts, the boxsum smoothing solution provided by 6502 and gaussian filter suggested by Luke (improved to handle edges better and allow zooming in) is here:
import matplotlib
import numpy as np
from matplotlib.mlab import griddata
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import math
from scipy.spatial import KDTree
import time
import scipy.ndimage as ndi
def grid_density_kdtree(xl, yl, xi, yi, dfactor):
zz = np.empty([len(xi),len(yi)], dtype=np.uint8)
zipped = zip(xl, yl)
kdtree = KDTree(zipped)
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
retvalset = kdtree.query((xc,yc), k=5)
for dist in retvalset[0]:
density = density + math.exp(-dfactor * pow(dist, 2)) / 5
zz[yci][xci] = min(density, 1.0) * 255
return zz
def grid_density(xl, yl, xi, yi):
ximin, ximax = min(xi), max(xi)
yimin, yimax = min(yi), max(yi)
xxi,yyi = np.meshgrid(xi,yi)
#zz = np.empty_like(xxi)
zz = np.empty([len(xi),len(yi)])
for xci in range(0, len(xi)):
xc = xi[xci]
for yci in range(0, len(yi)):
yc = yi[yci]
density = 0.
for i in range(0,len(xl)):
xd = math.fabs(xl[i] - xc)
yd = math.fabs(yl[i] - yc)
if xd < 1 and yd < 1:
dist = math.sqrt(math.pow(xd, 2) + math.pow(yd, 2))
density = density + math.exp(-5.0 * pow(dist, 2))
zz[yci][xci] = density
return zz
def boxsum(img, w, h, r):
st = [0] * (w+1) * (h+1)
for x in xrange(w):
st[x+1] = st[x] + img[x]
for y in xrange(h):
st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
for x in xrange(w):
st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
for y in xrange(h):
y0 = max(0, y - r)
y1 = min(h, y + r + 1)
for x in xrange(w):
x0 = max(0, x - r)
x1 = min(w, x + r + 1)
img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def grid_density_boxsum(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 15
border = r * 2
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = [0] * (imgw * imgh)
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy * imgw + ix] += 1
for p in xrange(4):
boxsum(img, imgw, imgh, r)
a = np.array(img).reshape(imgh,imgw)
b = a[border:(border+h),border:(border+w)]
return b
def grid_density_gaussian_filter(x0, y0, x1, y1, w, h, data):
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
r = 20
border = r
imgw = (w + 2 * border)
imgh = (h + 2 * border)
img = np.zeros((imgh,imgw))
for x, y in data:
ix = int((x - x0) * kx) + border
iy = int((y - y0) * ky) + border
if 0 <= ix < imgw and 0 <= iy < imgh:
img[iy][ix] += 1
return ndi.gaussian_filter(img, (r,r)) ## gaussian convolution
def generate_graph():
n = 1000
# data points range
data_ymin = -2.
data_ymax = 2.
data_xmin = -2.
data_xmax = 2.
# view area range
view_ymin = -.5
view_ymax = .5
view_xmin = -.5
view_xmax = .5
# generate data
xl = np.random.uniform(data_xmin, data_xmax, n)
yl = np.random.uniform(data_ymin, data_ymax, n)
zl = np.random.uniform(0, 1, n)
# get visible data points
xlvis = []
ylvis = []
for i in range(0,len(xl)):
if view_xmin < xl[i] < view_xmax and view_ymin < yl[i] < view_ymax:
xlvis.append(xl[i])
ylvis.append(yl[i])
fig = plt.figure()
# plot histogram
plt1 = fig.add_subplot(221)
plt1.set_axis_off()
t0 = time.clock()
zd, xe, ye = np.histogram2d(yl, xl, bins=10, range=[[view_ymin, view_ymax],[view_xmin, view_xmax]], normed=True)
plt.title('numpy.histogram2d - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# plot density calculated with kdtree
plt2 = fig.add_subplot(222)
plt2.set_axis_off()
xi = np.linspace(view_xmin, view_xmax, 256)
yi = np.linspace(view_ymin, view_ymax, 256)
t0 = time.clock()
zd = grid_density_kdtree(xl, yl, xi, yi, 70)
plt.title('function of 5 nearest using kdtree\n'+str(time.clock()-t0)+"sec")
cmap=cm.jet
A = (cmap(zd/256.0)*255).astype(np.uint8)
#A[:,:,3] = zd
plt.imshow(A , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# gaussian filter
plt3 = fig.add_subplot(223)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_gaussian_filter(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('ndi.gaussian_filter - '+str(time.clock()-t0)+"sec")
plt.imshow(zd , origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
# boxsum smoothing
plt3 = fig.add_subplot(224)
plt3.set_axis_off()
t0 = time.clock()
zd = grid_density_boxsum(view_xmin, view_ymin, view_xmax, view_ymax, 256, 256, zip(xl, yl))
plt.title('boxsum smoothing - '+str(time.clock()-t0)+"sec")
plt.imshow(zd, origin='lower', extent=[view_xmin, view_xmax, view_ymin, view_ymax])
plt.scatter(xlvis, ylvis)
if __name__=='__main__':
generate_graph()
plt.show()
This approach is along the lines of some previous answers: increment a pixel for each spot, then smooth the image with a gaussian filter. A 256x256 image runs in about 350ms on my 6-year-old laptop.
import numpy as np
import scipy.ndimage as ndi
data = np.random.rand(30000,2) ## create random dataset
inds = (data * 255).astype('uint') ## convert to indices
img = np.zeros((256,256)) ## blank image
for i in xrange(data.shape[0]): ## draw pixels
img[inds[i,0], inds[i,1]] += 1
img = ndi.gaussian_filter(img, (10,10))
A very simple implementation that could be done (with C) in realtime and that only takes fractions of a second in pure python is to just compute the result in screen space.
The algorithm is
Allocate the final matrix (e.g. 256x256) with all zeros
For each point in the dataset increment the corresponding cell
Replace each cell in the matrix with the sum of the values of the matrix in an NxN box centered on the cell. Repeat this step a few times.
Scale result and output
The computation of the box sum can be made very fast and independent on N by using a sum table. Every computation just requires two scan of the matrix... total complexity is O(S + WHP) where S is the number of points; W, H are width and height of output and P is the number of smoothing passes.
Below is the code for a pure python implementation (also very un-optimized); with 30000 points and a 256x256 output grayscale image the computation is 0.5sec including linear scaling to 0..255 and saving of a .pgm file (N = 5, 4 passes).
def boxsum(img, w, h, r):
st = [0] * (w+1) * (h+1)
for x in xrange(w):
st[x+1] = st[x] + img[x]
for y in xrange(h):
st[(y+1)*(w+1)] = st[y*(w+1)] + img[y*w]
for x in xrange(w):
st[(y+1)*(w+1)+(x+1)] = st[(y+1)*(w+1)+x] + st[y*(w+1)+(x+1)] - st[y*(w+1)+x] + img[y*w+x]
for y in xrange(h):
y0 = max(0, y - r)
y1 = min(h, y + r + 1)
for x in xrange(w):
x0 = max(0, x - r)
x1 = min(w, x + r + 1)
img[y*w+x] = st[y0*(w+1)+x0] + st[y1*(w+1)+x1] - st[y1*(w+1)+x0] - st[y0*(w+1)+x1]
def saveGraph(w, h, data):
X = [x for x, y in data]
Y = [y for x, y in data]
x0, y0, x1, y1 = min(X), min(Y), max(X), max(Y)
kx = (w - 1) / (x1 - x0)
ky = (h - 1) / (y1 - y0)
img = [0] * (w * h)
for x, y in data:
ix = int((x - x0) * kx)
iy = int((y - y0) * ky)
img[iy * w + ix] += 1
for p in xrange(4):
boxsum(img, w, h, 2)
mx = max(img)
k = 255.0 / mx
out = open("result.pgm", "wb")
out.write("P5\n%i %i 255\n" % (w, h))
out.write("".join(map(chr, [int(v*k) for v in img])))
out.close()
import random
data = [(random.random(), random.random())
for i in xrange(30000)]
saveGraph(256, 256, data)
Edit
Of course the very definition of density in your case depends on a resolution radius, or is the density just +inf when you hit a point and zero when you don't?
The following is an animation built with the above program with just a few cosmetic changes:
used sqrt(average of squared values) instead of sum for the averaging pass
color-coded the results
stretching the result to always use the full color scale
drawn antialiased black dots where the data points are
made an animation by incrementing the radius from 2 to 40
The total computing time of the 39 frames of the following animation with this cosmetic version is 5.4 seconds with PyPy and 26 seconds with standard Python.
Histograms
The histogram way is not the fastest, and can't tell the difference between an arbitrarily small separation of points and 2 * sqrt(2) * b (where b is bin width).
Even if you construct the x bins and y bins separately (O(N)), you still have to perform some ab convolution (number of bins each way), which is close to N^2 for any dense system, and even bigger for a sparse one (well, ab >> N^2 in a sparse system.)
Looking at the code above, you seem to have a loop in grid_density() which runs over the number of bins in y inside a loop of the number of bins in x, which is why you're getting O(N^2) performance (although if you are already order N, which you should plot on different numbers of elements to see, then you're just going to have to run less code per cycle).
If you want an actual distance function then you need to start looking at contact detection algorithms.
Contact Detection
Naive contact detection algorithms come in at O(N^2) in either RAM or CPU time, but there is an algorithm, rightly or wrongly attributed to Munjiza at St. Mary's college London, which runs in linear time and RAM.
you can read about it and implement it yourself from his book, if you like.
I have written this code myself, in fact
I have written a python-wrapped C implementation of this in 2D, which is not really ready for production (it is still single threaded, etc) but it will run in as close to O(N) as your dataset will allow. You set the "element size", which acts as a bin size (the code will call interactions on everything within b of another point, and sometimes between b and 2 * sqrt(2) * b), give it an array (native python list) of objects with an x and y property and my C module will callback to a python function of your choice to run an interaction function for matched pairs of elements. it's designed for running contact force DEM simulations, but it will work fine on this problem too.
As I haven't released it yet, because the other bits of the library aren't ready yet, I'll have to give you a zip of my current source but the contact detection part is solid. The code is LGPL'd.
You'll need Cython and a c compiler to make it work, and it's only been tested and working under *nix environemnts, if you're on windows you'll need the mingw c compiler for Cython to work at all.
Once Cython's installed, building/installing pynet should be a case of running setup.py.
The function you are interested in is pynet.d2.run_contact_detection(py_elements, py_interaction_function, py_simulation_parameters) (and you should check out the classes Element and SimulationParameters at the same level if you want it to throw less errors - look in the file at archive-root/pynet/d2/__init__.py to see the class implementations, they're trivial data holders with useful constructors.)
(I will update this answer with a public mercurial repo when the code is ready for more general release...)
Your solution is okay, but one clear problem is that you're getting dark regions despite there being a point right in the middle of them.
I would instead center an n-dimensional Gaussian on each point and evaluate the sum over each point you want to display. To reduce it to linear time in the common case, use query_ball_point to consider only points within a couple standard deviations.
If you find that he KDTree is really slow, why not call query_ball_point once every five pixels with a slightly larger threshold? It doesn't hurt too much to evaluate a few too many Gaussians.
You can do this with a 2D, separable convolution (scipy.ndimage.convolve1d) of your original image with a gaussian shaped kernel. With an image size of MxM and a filter size of P, the complexity is O(PM^2) using separable filtering. The "Big-Oh" complexity is no doubt greater, but you can take advantage of numpy's efficient array operations which should greatly speed up your calculations.
Just a note, the histogram2d function should work fine for this. Did you play around with different bin sizes? Your initial histogram2d plot seems to just use the default bin sizes... but there's no reason to expect the default sizes to give you the representation you want. Having said that, many of the other solutions are impressive too.

Categories