Efficient 2D cross correlation in Python? - python

I have two arrays of size (n, m, m) (n number of images of size (m,m)). I want to perform a cross correlation between each corresponding n of the two arrays.
Example: n=1 -> corr2d([m,m]<sub>1</sub>,[m,m]<sub>2</sub>)
My current approach relies on a for loop in Python:
for i in range(len(X)):
    X_co = X[i,0,:,:]/(np.max(X[i,0,:,:]))
    X_x = X[i,1,:,:]/(np.max(X[i,1,:,:]))
    autocorr[i,0,:,:]=correlate2d(X_co, X_x, mode='same', boundary='fill', fillvalue=0)
Obviously this is very slow when the input contains many images, and it becomes a substantial part of the total run time if (m,m) << n.
The obvious optimization is to skip the loop and feed everything directly to the compiled correlation function. Currently I'm using scipy's correlate2d.
I've looked around but haven't found any function that allows correlation along some axis or multiple inputs.
Any tips on how to make scipy's correlate2d work or alternatives?

I decided to implement it via the FFT instead.
import numpy as np

def fft_xcorr2D(x):
    # Transform over axes (-2,-1) (default in the fft2 function)
    # Zero-pad because the FFT computes a circular (cyclic) correlation
    x = np.fft.fft2(np.pad(x,([0,0],[0,0],[0,34],[0,34]),mode='constant'))
    # Conjugate for correlation, not convolution (convolution theorem)
    x[:,1,:,:] = np.conj(x[:,1,:,:])
    # Multiply elementwise over the 2nd axis (2 image bands for me),
    # invert over axes (-2,-1) (default in the ifft2 function),
    # then fftshift over the rows and columns of each image
    corr = np.fft.fftshift(np.fft.ifft2(np.prod(x,axis=1)),axes=(-2,-1))
    # Return after removing padding
    return np.abs(corr)[:,3:-2,3:-2]
Call via:
ts=fft_xcorr2D(X)
If anybody wants to use it:
My input is a 4D array: (N, 2, #Rows, #Cols)
E.g. (500, 2, 30, 30): 500 images, 2 bands (polarizations, for example), of 30x30 pixels
If your input is different, adjust the padding to your liking
Check that your input order is the same as mine; otherwise change the axes arguments in fft2, ifft2, np.prod and fftshift. I use fftshift to get the maximum value in the middle (otherwise it ends up in the corners), so be wary of that if that's not what you want.
Why is it the maximum value? Technically, it doesn't have to be, but for my purpose it is. fftshift is used to get a correlation that looks like you're used to. Otherwise, the quadrants are turned "inside out". If you wonder what I mean, remove fftshift (just the fftshift part, not its arguments), call the function as before, and plot it.
Afterwards, it should be ready to use.
According to an old post, x.prod(axis=1) may be faster than np.prod(x,axis=1), but I saw no improvement when I tried it.
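For anyone who wants to check the FFT approach against the direct definition, here is a minimal sketch of the same idea. It assumes two separate real arrays of shape (n, rows, cols) rather than the (N, 2, #Rows, #Cols) layout above, and the function name batch_xcorr2d_full is made up for illustration. It pads each image to 2*rows-1 by 2*cols-1 so the circular correlation cannot wrap around, and returns the full set of lags instead of a cropped window.

```python
import numpy as np

def batch_xcorr2d_full(a, b):
    """Full 2D cross-correlation of a[i] with b[i] for every i, via the FFT.

    a, b: real arrays of shape (n, rows, cols).
    Returns shape (n, 2*rows - 1, 2*cols - 1); zero lag is at
    index (rows - 1, cols - 1) of each output image.
    """
    n, rows, cols = a.shape
    shape = (2 * rows - 1, 2 * cols - 1)  # enough zero-padding to avoid wrap-around
    fa = np.fft.rfft2(a, s=shape, axes=(-2, -1))
    fb = np.fft.rfft2(b, s=shape, axes=(-2, -1))
    # Conjugating one spectrum gives correlation instead of convolution
    corr = np.fft.irfft2(fa * np.conj(fb), s=shape, axes=(-2, -1))
    # The circular result has zero lag at index 0; roll so lags run
    # from -(rows - 1) to rows - 1 (and likewise for columns)
    return np.roll(corr, shift=(rows - 1, cols - 1), axis=(-2, -1))
```

Cropping the central (rows, cols) window of the result gives a 'same'-style output; for even image sizes the exact centering convention is a choice, so compare against your reference implementation before relying on it.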

Related

Python - How to resample a 2D shape?

I am writing a Python script for some geometrical data manipulation (calculating motion trajectories for a multi-drive industrial machine). Generally, the idea is that there is a given shape (let's say an ellipse, but in the general case it can be any convex shape, defined by a series of 2D points), which is rotated, and its uppermost tangent point must be followed. I don't have a problem with the latter part, but I need a little hint with the 2D shape preparation.
Let's say that the ellipse was defined with too few points, for example 25. (As I said, ultimately this can be any shape, for example a rounded hexagon.) To maintain the necessary precision I need far more points (let's say 1000), preferably equally distributed over the whole shape or with a higher density of points near corners, sharp curves, etc.
I have a few ideas in my head. I guess that the DFT (FFT) would be a good starting point for this resampling. While analyzing scipy.signal.resample() I found that there are far more functions in the scipy.signal package which sound promising to me...
What I'm asking for is a suggestion which way I should follow, what tool I should try for this job, which may be the most suitable. Maybe there is a tool meant exactly for what I'm looking for or maybe I'm overthinking this and one of the implementations of FFT like resample() will work just fine (of course, after some adjustments at the starting and ending point of the shape to make sure it's closing without issues)?
Scipy.signal sounds promising, however, as far as I understand, it is meant to work with time series data, not geometrical data - I guess this may cause some problems as my data isn't a function (in a mathematical understanding).
Thanks and best regards!
As far as I understood, what you want is to get an interpolated version of your original data.
The DFT (or FFT) will not achieve this purpose, since it performs a Fourier transform (which is not what you want).
In theory, to interpolate your data you need to define a function that calculates the values at the new data points.
So, let's say your data contains 5 points, each storing a single number (1D, to simplify), and you want a new array with about twice as many points, filled with the linear interpolation of your original data.
Using numpy.interp:
import numpy as np
original_data = [2, 0, 3, 5, 1] # define your data in 1D
new_data_resolution = 0.5 # define new sampling distance (i.e., your x-axis resolution)
interp_data = np.interp(
    x = np.arange(0, 5-1+new_data_resolution, new_data_resolution), # new sampling points (new axis)
    xp = range(len(original_data)), # original sampling points: 0, 1, 2, 3, 4
    fp = original_data
)
# now interp_data contains (5-1) / 0.5 + 1 = 9 points
After this, you will have data of length (5-1) / new_data_resolution + 1 (which is greater than 5, since new_data_resolution < 1), whose values are (in this case) a linear interpolation of your original data.
Once you have understood this example, you can dive into the scipy.interpolate module to get a better understanding of the interpolation functions (my example uses a linear function to fill in the missing points).
Applying this to n-dimensional arrays is straightforward: iterate over each dimension of your data.
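For the closed 2D shape in the question, one way to apply the same idea is to parametrize the outline by cumulative chord length and interpolate x and y separately. The following is only a sketch under a few assumptions: the points are ordered along the outline, consecutive points are distinct, the first point is not repeated at the end, and linear interpolation is accurate enough. The function name resample_closed_curve is made up for illustration.

```python
import numpy as np

def resample_closed_curve(points, n_out):
    """Resample a closed polygonal outline at n_out points equally spaced
    along its perimeter, using linear interpolation (np.interp)."""
    pts = np.asarray(points, dtype=float)
    closed = np.vstack([pts, pts[:1]])            # repeat first point to close the shape
    seg = np.hypot(*np.diff(closed, axis=0).T)    # length of each segment
    s = np.concatenate([[0.0], np.cumsum(seg)])   # cumulative arc length at each vertex
    # endpoint=False so the start point is not duplicated at the end
    s_new = np.linspace(0.0, s[-1], n_out, endpoint=False)
    x = np.interp(s_new, s, closed[:, 0])
    y = np.interp(s_new, s, closed[:, 1])
    return np.column_stack([x, y])
```

Because the interpolation is linear, the new points lie on the straight segments between the original ones; a smoother result would need a spline from scipy.interpolate instead.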

nD "cube" from ranges

I have a mixed integer problem. I need to minimize a function, which is a weighted least squares regression, the weights being dependent on the regression (iteratively reweighted least squares). 7 parameters define my piecewise regression. I need to find a local minimum around a first guess.
I tried to write the problem in gekko, but I somehow find it very difficult to implement. After many tries, I stopped at "negative DOF".
Anyway, I decided to brute force the problem. It works, but it's slow. I build a cube (itertools) around my working point in 7D and calculate the weighted square errors at each of the 3^7 points. I have boundaries for each dimension, and sometimes my working point is on one of the faces of my 7D domain. Technically, I have 2^p * 3^(7-p) points. I now have a list of all the values, find the minimum, move my working point there and restart building a cube, excluding all the points that I have already calculated in the previous loop steps.
Now I want to accelerate it by calculating the gradient at my working point and move faster (skip a step or two in my loop). np.gradient will require a 7d array in order to perform correctly.
Given a point, and 7 ranges around that point, how to make a 7D array in an efficient way? How to make an image of this array with the values of my function?
Please don't say 7 for loops.
Regardless of whether your function is vectorized, you can use an approach with np.indices like this:
base_grid = np.indices(7 * (3,), sparse=False) - 1
This produces an array of all the combinations of -1, 0, 1 that you need. np.meshgrid does something similar, but the arrays will be separated into a tuple, which is inconvenient.
At each iteration, you modify the grid with your step (scale) and offset:
current_grid = base_grid * scale + offset
If your function is vectorized, you can call it directly: the grid is a stack of seven 3x3x3x3x3x3x3 arrays. If it accepts seven inputs, just use star expansion.
If your function is not vectorized, you can still step along the corresponding elements in a single loop, not seven loops, using np.nditer:
with np.nditer([current_grid, None],
               op_axes=[list(range(1, current_grid.ndim)), None]) as it:
    for x, y in it:
        y[:] = f(*x)
    j = it.operands[1]
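As an alternative to the nditer loop, the grid built with np.indices can be flattened so that each row is one point, evaluated in a single loop, and reshaped back into the cube. This is a minimal sketch, not a tuned implementation: evaluate_on_cube and the three-parameter f below are hypothetical stand-ins (the real problem has 7 parameters), and boundary handling is left out.

```python
import numpy as np

def evaluate_on_cube(f, center, step):
    """Evaluate a non-vectorized f on the 3**ndim grid center + {-1, 0, 1} * step."""
    center = np.asarray(center, dtype=float)
    step = np.asarray(step, dtype=float)
    ndim = center.size
    offsets = np.indices(ndim * (3,)) - 1        # shape (ndim, 3, ..., 3), values -1/0/1
    col = (ndim,) + ndim * (1,)                  # broadcast one step/offset per parameter
    grid = offsets * step.reshape(col) + center.reshape(col)
    points = grid.reshape(ndim, -1).T            # one row per grid point
    values = np.array([f(*p) for p in points]).reshape(ndim * (3,))
    return grid, values

# Hypothetical 3-parameter example
f = lambda a, b, c: a + 2 * b + 3 * c
grid, values = evaluate_on_cube(f, center=[1.0, 2.0, 3.0], step=[0.5, 0.25, 1.0])
# One gradient array per parameter; the working point is the centre element [1, 1, 1]
grads = np.gradient(values, 0.5, 0.25, 1.0)
```

The element at index (1, 1, ..., 1) of values is the function at the working point, and the same index into each array returned by np.gradient gives a central-difference estimate of the gradient there.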

Laplacian of Gaussian Edge Detector Being Affected by Change of Mask Size

For a class, I've written a Laplacian of Gaussian edge detector that works in the following way.
1. Make a Laplacian of Gaussian mask given the variance of the Gaussian and the size of the mask
2. Convolve it with the image
3. Find the zero crossings in a really shoddy manner; these are the edges of the image
If you so desire, the code for this program can be viewed here, but the most important part is where I create my Gaussian mask which depends on two functions that I've reproduced here for your convenience:
import math
import numpy as np

# Function for calculating the Laplacian of the Gaussian at a given point and with a given sigma
def l_o_g(x, y, sigma):
    # Formatted this way for readability
    nom = ( (y**2)+(x**2)-2*(sigma**2) )
    denom = ( (2*math.pi*(sigma**6) ))
    expo = math.exp( -((x**2)+(y**2))/(2*(sigma**2)) )
    return nom*expo/denom

# Create the Laplacian of the Gaussian, given a sigma
# Note the recommended size is 7 according to this website http://homepages.inf.ed.ac.uk/rbf/HIPR2/log.htm
# Experimentally, I've found 6 to be much more reliable for images with clear edges and 4 to be better for images with a lot of little edges
def create_log(sigma, size = 7):
    w = int(math.ceil(float(size)*float(sigma)))
    # If the dimension is an even number, make it odd
    if(w%2 == 0):
        print("even number detected, incrementing")
        w = w + 1
    # Now make the mask
    l_o_g_mask = []
    w_range = w // 2
    print("Going from " + str(-w_range) + " to " + str(w_range))
    # Both loops cover the inclusive range -w_range..w_range
    for i in range(-w_range, w_range + 1):
        for j in range(-w_range, w_range + 1):
            l_o_g_mask.append(l_o_g(i,j,sigma))
    l_o_g_mask = np.array(l_o_g_mask)
    l_o_g_mask = l_o_g_mask.reshape(w,w)
    return l_o_g_mask
All in all, it works relatively well, even if it is extremely slow because I don't know how to leverage Numpy. However, whenever I change the size of the Gaussian mask, the thickness of the edges I detect changes drastically.
Here is the image run with a size of mask equivalent to 4 times the given variance of the Gaussian:
Here is the same image run with a size of mask equivalent to 6 times the variance:
I'm kind of baffled, because the only thing the size parameter should change is the accuracy of the approximation of the Laplacian of Gaussian mask before I begin to convolve it with the image. So I ran a test where I wanted to visualize how my mask looked given different size parameters.
Here it is with a size of 4:
Here it is with a size of 6:
The shape of the function seems to be the same as far as I can tell from the zero crossings (they happen to be spaced around four pixels apart) and their peaks. Is there a better way to check?
Any suggestions as to why this issue might be occurring or how to investigate further are appreciated.
It turns out your concept of the effect of increasing the mask size is wrong. Increasing the size doesn't actually improve the quality of the approximation or the resolution of the function. To explain, instead of using a complicated 2D function like the Laplacian of the Gaussian, let's take things back down to one dimension and pretend we are approximating the function f(x) = x^2.
Now your code for calculating the function would look like this:
def derp(sigma, size):
    w = int(math.ceil(float(size)*float(sigma)))
    # If the dimension is an even number, make it odd
    if(w%2 == 0):
        print("even number detected, incrementing")
        w = w + 1
    # Now make the mask
    x_mask = []
    w_range = w // 2
    print("Going from " + str(-w_range) + " to " + str(w_range))
    for i in range(-w_range, w_range + 1):
        x_mask.append(i**2)  # f(x) = x^2 sampled at integer x
    return x_mask
If you were to increase the "size" of this function, you wouldn't be increasing the resolution; you would actually be increasing the range of values of x that you sample. For example, for a size of 3 you're evaluating -1, 0, 1, and for a size of 5 you're evaluating -2, -1, 0, 1, 2. Notice that this doesn't change the spacing between the samples. This is what you're actually seeing when you talk about the zero crossings occurring the same number of pixels apart.
Consequently, when convolving with this really silly mask, you would get really different results. But what if we went back to the Laplacian of the Gaussian?
Well, the nice property of the Laplacian of the Gaussian is that the farther you go from the center, the closer its values get to zero. So unlike with our silly x^2 function, the results should stop changing once the mask is large enough.
Now, I think the reason you didn't see this with your test cases is that they were too limited in size: your program is too slow for you to really see the difference between size=15 and size=20. If you were to actually run those cases, I think you would see that the image doesn't change that much.
This still doesn't answer what you should be doing. For that, we're going to have to look to the professionals, namely the implementation of gaussian_filter in Scipy (source here).
When you look at their source code, the first thing you'll notice is that when creating their mask they're basically doing the same thing as you. They always use an integer step size, and they scale the size of the mask by its standard deviation.
As to why they are doing it that way, I can't answer, since I don't have that much in-depth knowledge of image processing or Scipy. However, this may make for a good new question to ask on SO.
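To make the point about mask size concrete, and to show one way of leveraging Numpy, here is a sketch of a vectorized version of the mask construction. It uses the same formula and the same odd-width rule as the question's create_log; the name log_mask is made up for illustration. With an integer step, a larger size only adds rows and columns around the border, so the centre of the larger mask is identical to the smaller one.

```python
import math
import numpy as np

def log_mask(sigma, size=7):
    """Laplacian of Gaussian mask sampled at integer pixel offsets."""
    w = int(math.ceil(size * sigma))
    if w % 2 == 0:          # force an odd width so the mask has a centre pixel
        w += 1
    r = w // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]   # integer offsets from the centre
    r2 = x**2 + y**2
    return (r2 - 2 * sigma**2) * np.exp(-r2 / (2 * sigma**2)) / (2 * np.pi * sigma**6)

small = log_mask(1.0, size=4)   # 5x5 mask
big = log_mask(1.0, size=6)     # 7x7 mask
# The inner 5x5 block of the larger mask equals the smaller mask
same_centre = np.allclose(big[1:-1, 1:-1], small)
```

So any difference in the detected edges comes from the extra border values of the larger mask, not from a finer sampling of the function.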

Python scipy.fftpack.rfft frequency bin mapping

I'm trying to get the correct FFT bin index for a given frequency. The audio is sampled at 44.1 kHz and the FFT size is 1024. The signal is real (captured from PyAudio, decoded through numpy.fromstring, windowed by scipy.signal.hann). I then perform the FFT through scipy.fftpack.rfft and compute the magnitude of the result in decibels: magnitude = 20 * scipy.log10(abs(rfft(audio_sample)))
Based on this, and this, I originally had my mapping from the FFT bin index, k, to any frequency, F, as:
F = k*Fs/N for k = 0 ... N/2-1 where Fs is the sampling rate, and N is the FFT size, in this case, 1024. And the reverse as:
k = F*N/Fs for F = 0Hz ... Fs/2-Fs/N
However, I then realized that rfft's result is not symmetric like fft's, and that it is returned in an array of size N. I now have some questions regarding the mapping and the function. The documentation unfortunately did not give me much information, as I'm a novice in this area.
My questions:
1. To me, the result of rfft on an audio sample can be used directly from the first bin to the last bin, as no symmetry occurs in the output. Is that correct?
2. Given the lack of symmetry from the above, the frequency resolution appears to have increased. Is this interpretation correct?
3. Because of using rfft, my mapping function from bin index k to frequency F is now F = k*Fs/(2N) for k = 0 ... N-1. Is this correct?
4. Conversely, the reverse mapping function from frequency F to bin index k now becomes k = 2*F*N/Fs for F = 0Hz ... Fs/2-(Fs/2/N). Is this correct?
My general confusion arises from how rfft is related to fft, and how the mapping can be done correctly while using rfft. I believe my mapping is offset by a small amount, and that is crucial in my application. Please point out the mistake or advise on the matter if possible, thank you very much.
First to clear up a few things for you:
A quick reference to the fftpack documentation reveals that rfft only gives you an output vector from 0..512 (in your case). The reason for this is exactly because of the symmetry present when calculating the discrete Fourier transform of a real-valued input:
y[k] = y*[N-k] (see Wikipedia page on DFTs). Therefore, the rfft function only calculates and stores N/2+1 values since you can calculate the other half by just taking the complex conjugates (should you really want it for plotting (say)). The fft function makes no assumption on the input values (they can have both a real and imaginary part) and therefore no symmetry can be assumed in the output and it gives you a full output vector with N values. Admittedly, most applications use a real input, so people tend to assume the symmetry is always there. Note that the Fast Fourier Transform (FFT) is an (efficient) algorithm to calculate the Discrete Fourier Transform (DFT) and the rfft function also uses the FFT to do the calculation.
In light of the above, your indices for accessing the output vector are out of bounds, i.e. > 512. Why and how you can do this depends on your code. You should clearly distinguish between the 'logical N' (that you use to map the bin frequencies, define the DFT etc.) and the 'computational N' (the actual number of values in your output vector); then all your problems should disappear.
To concretely answer your questions:
1. No. There is symmetry and you need to use this to calculate the last bins (but they give you no extra information).
2. No. The only way to increase resolution of a DFT is to increase your sample length.
3. No, but almost. F = k*Fs/N for k = 0..N/2
4. For an output vector with N bins you get frequencies from 0 to (N-1)/N*Fs. Using the rfft you will have an output vector with N/2+1 bins. You do the maths, but I get 0..Fs/2
Hope things are clearer now.
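The mapping F = k*Fs/N can be checked numerically. The sketch below is an illustration only, and it assumes numpy.fft.rfft, which returns N/2+1 complex bins; scipy.fftpack.rfft holds the same information but packs the real and imaginary parts into a real array of length N, so its indices do not map to frequencies in the same way. The helper names bin_to_freq and freq_to_bin are made up.

```python
import numpy as np

FS = 44100   # sampling rate in Hz
N = 1024     # FFT size (the 'logical N')

def bin_to_freq(k, fs=FS, n=N):
    """Centre frequency in Hz of bin k, valid for k = 0..n/2."""
    return k * fs / n

def freq_to_bin(f, fs=FS, n=N):
    """Nearest bin index for frequency f in Hz, valid for f = 0..fs/2."""
    return int(round(f * n / fs))

# A test tone placed exactly on bin 100
t = np.arange(N) / FS
tone = np.sin(2 * np.pi * bin_to_freq(100) * t)

spectrum = np.fft.rfft(tone)            # N/2 + 1 = 513 complex values
freqs = np.fft.rfftfreq(N, d=1 / FS)    # frequency of each bin, 0 .. FS/2
peak = int(np.argmax(np.abs(spectrum)))
```

Here peak comes out as 100, and freqs[peak] agrees with bin_to_freq(100), which is the 'logical N' mapping applied to an output vector of 'computational' length N/2+1.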

How do I perform a convolution in python with a variable-width Gaussian?

I need to perform a convolution using a Gaussian, however the width of the Gaussian needs to change. I'm not doing traditional signal processing but instead I need to take my perfect Probability Density Function (PDF) and "smear" it, based on the resolution of my equipment.
For instance, suppose my PDF starts out as a spike/delta-function. I'll model this as a very narrow Gaussian. After being run through my equipment, it will be smeared out according to some Gaussian resolution. I can calculate this using the scipy.signal convolution functions.
import numpy as np
import matplotlib.pylab as plt
import scipy.signal as signal
import scipy.stats as stats
# Create the initial function. I model a spike
# as an arbitrarily narrow Gaussian
mu = 1.0 # Centroid
sig=0.001 # Width
original_pdf = stats.norm(mu,sig)
x = np.linspace(0.0,2.0,1000)
y = original_pdf.pdf(x)
plt.plot(x,y,label='original')
# Create the "smearing" function to convolve with the
# original function.
# I use a Gaussian, centered at 0.0 (no bias) and
# width of 0.5
mu_conv = 0.0 # Centroid
sigma_conv = 0.5 # Width
convolving_term = stats.norm(mu_conv,sigma_conv)
xconv = np.linspace(-5,5,1000)
yconv = convolving_term.pdf(xconv)
convolved_pdf = signal.convolve(y/y.sum(),yconv,mode='same')
plt.plot(x,convolved_pdf,label='convolved')
plt.ylim(0,1.2*max(convolved_pdf))
plt.legend()
plt.show()
This all works with no problem. But now suppose my original PDF is not a spike, but some broader function, for example a Gaussian with sigma=1.0. And now suppose my resolution actually varies over x: at x=0.5, the smearing function is a Gaussian with sigma_conv=0.5, but at x=1.5, the smearing function is a Gaussian with sigma_conv=1.5. And suppose I know the functional form of the x-dependence of my smearing Gaussian. Naively, I thought I would change the line above to
convolving_term = stats.norm(mu_conv,lambda x: 0.2*x + 0.1)
But that doesn't work, because the norm function expects a value for the width, not a function. In some sense, I need my convolving function to be a 2D array, where I have a different smearing Gaussian for each point in my original PDF, which remains a 1D array.
So is there a way to do this with functions already defined in Python? I have some code to do this that I wrote myself....but I want to make sure I've not just re-invented the wheel.
Thanks in advance!
Matt
Question, in brief:
How do you convolve with a non-stationary kernel, for example a Gaussian that changes width for different locations in the data, and does Python have an existing tool for this?
Answer, sort-of:
It's difficult to prove a negative, but I do not think that a function to perform a convolution with a non-stationary kernel exists in scipy or numpy. Anyway, as you describe it, it can't really be vectorized well, so you may as well do a loop or write some custom C code.
One trick that might work for you is, instead of changing the kernel size with position, to stretch the data with the inverse scale (i.e., at places where you'd want the Gaussian width to be 0.5 times the base width, stretch the data to 2x). This way, you can do a single warping operation on the data, a standard convolution with a fixed-width Gaussian, and then unwarp the data to the original scale.
The advantages of this approach are that it's very easy to write, and is completely vectorized, and therefore probably fairly fast to run.
Warping the data (using, say, an interpolation method) will cause some loss of accuracy, but if you choose things so that the data is always expanded and not reduced in your initial warping operation, the losses should be minimal.
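For comparison with the warping trick, the "2D array" idea from the question can also be written out directly: build a matrix with one Gaussian column per point of the original PDF and apply it with a matrix product. This is a minimal sketch rather than an existing library routine. It assumes a uniform x grid, a strictly positive width function, and that each source point x_j is smeared with the width evaluated at x_j; the name smear_variable_width is made up. Memory grows with the square of the number of grid points.

```python
import numpy as np

def smear_variable_width(x, pdf, sigma_of_x):
    """Smear pdf(x) with a Gaussian whose width depends on the source position."""
    x = np.asarray(x, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    dx = x[1] - x[0]                      # assumes a uniform grid
    sigma = sigma_of_x(x)                 # width at each source point x_j
    diff = x[:, None] - x[None, :]        # diff[i, j] = x_i - x_j
    # Column j is a normalized Gaussian centred on x_j with width sigma[j]
    kernel = np.exp(-0.5 * (diff / sigma[None, :]) ** 2) / (
        sigma[None, :] * np.sqrt(2.0 * np.pi)
    )
    # Sum the contributions of all source points at each output point
    return kernel @ pdf * dx

# Example with the width function from the question, on x >= 0
x = np.linspace(0.0, 10.0, 1001)
pdf = np.exp(-0.5 * (x - 5.0) ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian, sigma = 1
smeared = smear_variable_width(x, pdf, lambda t: 0.2 * t + 0.1)
```

With a constant width this reduces to an ordinary convolution, which gives a simple way to check the result against scipy.signal.convolve or the analytic answer.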
