signal.correlate 'same'/'full' meaning? - python

I am wondering what the mode arguments in signal.correlate (or numpy.correlate) mean.
import numpy as np
import matplotlib.pyplot as plt
from scipy import signal
def crossCorrelator(sig1, sig2):
    correlate = signal.correlate(sig1, sig2, mode='same')
    return correlate
flux0 = np.array([0.02006948, 0.01358697, -0.06196026, -0.03842506, -0.09023056, -0.05464169, -0.02530553, -0.01937054, -0.01237411, 0.03472263, 0.17865012, 0.27441767, 0.23532932, 0.16358341, 0.08743969, 0.12166425, 0.10287468, 0.13430794, 0.08262321, 0.0515434, 0.04657624, 0.09017276, 0.09131331, 0.04696824, -0.03901519, -0.01413654, 0.05448175, 0.1236946, 0.09968044, -0.001584, -0.06094561, -0.02998289, -0.00113092, 0.04336605, 0.01105071, 0.0527657, 0.03825847, 0.02309524])
flux1 = np.array([-0.02946104, -0.02590192, -0.02274955, 0.00485888, -0.0149776, 0.01757462, 0.02820086, 0.0379213, 0.03580811, 0.06507382, 0.09995243, 0.12814133, 0.16109725, 0.12371425, 0.08273643, 0.09433014, 0.05137761, 0.04057405, -0.08171598, -0.06541216, 0.00126869, 0.09223577, 0.06811737, 0.0795967, 0.08689563, 0.0928949, 0.09971169, 0.05413958, 0.05410236, 0.00120439, 0.02454734, 0.06450544, 0.01508899, -0.06100537, -0.10038889, -0.00651572, 0.01095773, 0.05517478])
correlation = crossCorrelator(flux0, flux1)
f, axarr = plt.subplots(2)
When I use mode 'same', the correlation array has the same length as the flux arrays; with 'full' it is roughly twice as long. If the axis of flux0 and flux1 is time, what does the axis of the correlation array represent?
I am really looking more for a mathematical explanation; the answers I have found so far were more technical in nature.

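Given two sequences (a[0], .., a[A-1]) and (b[0], .., b[B-1]) of lengths A and B, respectively, the convolution is calculated as
c[n] = sum_m a[m] * b[n-m]
Correlation is the same operation with the second sequence reversed (and complex-conjugated), so the index ranges and output lengths below apply to correlate as well as to convolve.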
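To make the formula concrete, here is a small sketch that evaluates c[n] = sum_m a[m] * b[n-m] directly for every n from 0 to A+B-2 and checks it against np.convolve. It also checks that np.correlate(a, b, 'full') equals the same sum with b reversed (the inputs are real, so no conjugation is needed). The helper name full_convolution and the sample values are illustrative only.

```python
import numpy as np

def full_convolution(a, b):
    """Directly evaluate c[n] = sum_m a[m] * b[n - m] for n = 0 .. A+B-2 (illustrative helper)."""
    A, B = len(a), len(b)
    c = np.zeros(A + B - 1)
    for n in range(A + B - 1):
        for m in range(A):
            if 0 <= n - m < B:  # only terms where b's index is inside the array contribute
                c[n] += a[m] * b[n - m]
    return c

a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([0.5, -1.0, 2.0])

conv_direct = full_convolution(a, b)
conv_numpy = np.convolve(a, b, mode='full')
# Correlation of real sequences = convolution with the second sequence reversed
corr_numpy = np.correlate(a, b, mode='full')
corr_direct = full_convolution(a, b[::-1])
print(conv_direct, conv_numpy)
print(corr_direct, corr_numpy)
```

Both pairs agree element by element, and both outputs have A+B-1 = 6 elements.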
If mode=="full" then the convolution is calculated for n ranging from 0 to A+B-2, so the return array has A+B-1 elements.
If mode=="same" then scipy.signal.correlate computes the convolution for n ranging from (B-1)/2 to A-1+(B-1)/2, where integer division is assumed. The return array has A elements. numpy.correlate behaves the same way only if A>=B; if A is less than B it switches the two arrays (and the returned array has B elements).
If mode=="valid" then the convolution is calculated for n ranging from min(A,B)-1 to max(A,B)-1, and therefore has max(A,B)-min(A,B)+1 elements.
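Applied to the question, where flux0 and flux1 both have 38 samples: 'full' returns 2*38 - 1 = 75 values (hence roughly "double"), 'same' returns 38 and 'valid' returns 1. Each element of the 'full' correlation is the overlap sum for one relative shift, so its axis is a lag axis measured in samples (time steps), running from -(B-1) to A-1, and 'same' keeps the central A lags. The sketch below checks this with two hypothetical impulse signals, assuming scipy.signal.correlate's convention full[i] = sum_n a[n + k] * b[n] with lag k = i - (B - 1).

```python
import numpy as np
from scipy import signal

A = B = 38  # flux0 and flux1 in the question both have 38 samples

# Hypothetical test signals: one impulse each, the one in b three samples later than in a
a = np.zeros(A)
a[10] = 1.0
b = np.zeros(B)
b[13] = 1.0

full = signal.correlate(a, b, mode='full')
same = signal.correlate(a, b, mode='same')
valid = signal.correlate(a, b, mode='valid')
print(len(full), len(same), len(valid))  # 75 38 1

# Lag (in samples) of each element of the 'full' output: -(B-1) ... A-1
lags = np.arange(-(B - 1), A)
print(lags[np.argmax(full)])  # -3: the impulse in b comes 3 samples later than in a

# 'same' is the central part of 'full', starting at index (B-1)//2
start = (B - 1) // 2
print(np.allclose(same, full[start:start + A]))
```

To convert lags to time, multiply them by the sampling interval of the flux series.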


Is there a way to create a vector field from a list of vectors in numpy?

I have some code where I record a list of vectors over a trajectory. For each of these I have the N-dimensional point from which it originates, and the vector from that point.
I have all of these ordered in an array from start to finish, and I would like to place these N-dimensional vectors at the points in space where they originate, so that I can calculate the divergence.
Example origin points:
[[-0.03194652 -0.02481244 0.02337171 -0.04088087]
[-0.03244277 0.16996671 0.02255409 -0.32609914]
[-0.02904343 0.3647604 0.01603211 -0.61158502]
[-0.02174823 0.16941809 0.00380041 -0.31389597]
[-0.01835986 -0.02575779 -0.00247751 -0.02001695]
[-0.01887502 0.1693996 -0.00287785 -0.31348053]
[-0.01548703 -0.02568124 -0.00914746 -0.02170657]
[-0.01600065 -0.22067082 -0.00958159 0.26807625]
[-0.02041407 -0.02541345 -0.00422007 -0.02761331]
[-0.02092234 0.16976877 -0.00477233 -0.32162472]]
Example vectors at each point:
[[-0.00049625 0.19477914 -0.00081762 -0.28521826]
[ 0.00339933 0.19479369 -0.00652198 -0.28548588]
[ 0.00729521 -0.19534231 -0.0122317 0.29768905]
[ 0.00338836 -0.19517588 -0.00627792 0.29387903]
[-0.00051516 0.19515739 -0.00040034 -0.29346358]
[ 0.00338799 -0.19508084 -0.00626961 0.29177396]
[-0.00051362 -0.19498958 -0.00043413 0.28978282]
[-0.00441342 0.19525737 0.00536152 -0.29568956]
[-0.00050827 0.19518221 -0.00055227 -0.29401141]
[ 0.00339538 -0.19505367 -0.00643249 0.29117411]]
I attempted to use np.meshgrid over the initial points for each dimension, but I had to use sparse=True to save memory.
I am a bit stuck here, can anyone help?
For my specific example, I have a particle x in 4-dimensional space.
I have a list of trajectories of this particle moving through the N-dimensional space, each with a fixed length of 10 time steps. Each entry in a trajectory has the form (t_start, t_end), the states at the start and end of that time step.
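import numpy as np
def collect_vectors(trajectories, timesteps):  # hypothetical name; the original snippet had no def line
    initial_array = []
    vector_array = []
    for trajectory in trajectories:  # iterate over the trajectories themselves, not range(...)
        for t in range(timesteps):
            initial_state = np.asarray(trajectory[t][0])  # state at t_start
            end_state = np.asarray(trajectory[t][1])      # state at t_end
            initial_array.append(initial_state)
            vector_array.append(end_state - initial_state)
    return np.array(initial_array), np.array(vector_array)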
I then want to plot these vectors at the points they began from to generate a "sample" vector field from what I can observe, and then use this function:
import numpy as np
def divergence(f, h):
    # f[i]: i-th vector component sampled on a regular grid; h[i]: grid spacing along axis i
    num_dims = len(f)
    return np.add.reduce([np.gradient(f[i], h[i], axis=i) for i in range(num_dims)])
I use it to calculate the divergence over the sampled trajectories.
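The divergence function assumes that each component f[i] is already sampled on a regular, dense grid whose axis i corresponds to coordinate i with spacing h[i]; it cannot be applied directly to scattered (origin, vector) pairs. A quick way to check it is a synthetic field with a known divergence. The sketch below uses a hypothetical 2-D field F(x, y) = (x**2, 3*y), whose exact divergence is 2*x + 3, on a grid built with np.meshgrid(..., indexing='ij') so that axis 0 runs along x and axis 1 along y.

```python
import numpy as np

def divergence(f, h):
    # Sum over i of d f[i] / d x_i; f[i] must be sampled on a regular grid with spacing h[i]
    num_dims = len(f)
    return np.add.reduce([np.gradient(f[i], h[i], axis=i) for i in range(num_dims)])

# Hypothetical test field F(x, y) = (x**2, 3*y) with exact divergence 2*x + 3
x = np.linspace(-1.0, 1.0, 21)
y = np.linspace(-2.0, 2.0, 41)
h = [x[1] - x[0], y[1] - y[0]]
X, Y = np.meshgrid(x, y, indexing='ij')  # dense grid; 'ij' keeps axis i aligned with coordinate i
F = [X**2, 3 * Y]

div = divergence(F, h)
exact = 2 * X + 3
# np.gradient uses central differences inside the grid, which are exact for these
# polynomials; the one-sided differences on the x-edges are only first-order accurate
print(np.abs(div[1:-1] - exact[1:-1]).max())
```

With a sparse meshgrid (sparse=True) the components would not all have the full grid shape, so they would need to be broadcast to a common shape before being stacked by np.add.reduce.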

Resampling 2-d array using Fourier transform method

I have a question on the resampling 2-d array.
Sometimes geoscience data need to be transformed to a different grid size. If the coarsening ratio is the same for both axes, the task is simple: np.reshape (followed by a sum over the block axes) turns a 100x100 array into a 50x50 one without losing data. The code is:
import numpy as np
## create the original data
XSIZE, YSIZE = 100, 100
xc1, xc2, yc1, yc2 = 100, 110, 35, 45
lon, lat = np.linspace(xc1, xc2, XSIZE), np.linspace(yc1, yc2, YSIZE)
pop = np.random.uniform(low=1000, high=50000, size=(XSIZE*YSIZE,)).reshape(YSIZE, XSIZE)
## reshape
shape = np.array(pop.shape, dtype=float)
coarseness = 2 # the new shape is 50 x 50
new_shape = coarseness * np.ceil(shape/coarseness).astype(int)
zp_pop = np.zeros(new_shape)
zp_pop[:int(shape[0]), :int(shape[1])] = pop
temp = zp_pop.reshape((new_shape[0] // coarseness, coarseness,
new_shape[1] // coarseness, coarseness))
coarse_pop = np.sum(temp, axis=(1,3))
print (pop.sum())
print (coarse_pop.sum())
However, when the coarsening factor differs between the axes, this method cannot be used, so I tried another approach. Here is an example in which I tried to use an FFT to generate a 60x80 array as output:
from scipy import fftpack
pop_fft = fftpack.fft2(pop,shape = (60,80))
pop_res = fftpack.ifft2(pop_fft).real
The data loss was significant, so I am posting the issue here. Maybe the resampling function I used is not correct, or there is a better approach for this situation. Any advice or comments are highly appreciated!
When you build the coarse array yourself, you sum over adjacent entries instead of computing the average or interpolating.
This way, the sums over all elements of the coarse and the original array are identical: str((coarse_pop.sum()-pop.sum())/(0.5*(pop.sum()+coarse_pop.sum()))) gives '-1.1638426077573779e-16', only a tiny numerical error.
If you compare the means of the original array and the fftpack-resampled coarse array instead, they match up.
Alternatively, you can correct for the number of elements yourself.
I don't know the details of your problem, but the fftpack way of downsampling the array makes more sense to me. If it's not what you want, you can apply the prefactor to the original array, like pop_fft = fftpack.fft2(pop*100*100/(60*80),shape = (60,80))
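One caveat worth checking, since it bears on the "data loss": according to the scipy.fftpack documentation, the shape argument of fft2 truncates or zero-pads the input array before it is transformed, so ifft2(fft2(pop, shape=(60, 80))) returns the top-left 60x80 block of pop rather than a resampled version of the whole field. If Fourier-method resampling of the full array is what is wanted, one option (swapped in here, not part of the answer above) is scipy.signal.resample applied to each axis in turn. It preserves the mean, and rescaling by the ratio of element counts preserves the total instead. The random pop below is a stand-in for the question's data.

```python
import numpy as np
from scipy import fftpack, signal

rng = np.random.default_rng(0)
pop = rng.uniform(low=1000, high=50000, size=(100, 100))  # stand-in for the question's data

# fft2's `shape` truncates (or zero-pads) the INPUT before transforming,
# so transforming and inverting just returns the top-left 60 x 80 block
pop_res = fftpack.ifft2(fftpack.fft2(pop, shape=(60, 80))).real
print(np.allclose(pop_res, pop[:60, :80]))

# Fourier-method resampling of the whole array: resample one axis at a time
pop_rs = signal.resample(signal.resample(pop, 60, axis=0), 80, axis=1)
print(pop_rs.shape)
print(pop.mean(), pop_rs.mean())  # the mean (DC component) is preserved

# Rescale if the total (e.g. a population count) should be preserved instead
pop_rs_total = pop_rs * pop.size / pop_rs.size
print(pop.sum(), pop_rs_total.sum())
```

Fourier resampling assumes the field is periodic, so values near the array edges may show ringing; for count data, area-weighted regridding may be a better fit.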

Matrix left division of stacked arrays using numpy

I'm working on a program to solve the Bloch (or, more precisely, the Bloch-McConnell) equations in Python. The equation to solve is
M(t) = exp(A t) (M0 + A+ b) - A+ b
where A is an NxN matrix, A+ its pseudoinverse, and M0 and b are vectors of size N.
The special thing is that I want to solve the equation for several offsets (and thus several matrices A) at the same time. The new dimensions are:
A: MxNxN
b: Nx1
M0: MxNx1
The conventional version of the program (using a loop over the 1st dimension of size M) works fine, but I'm stuck at one point in the 'parallel version'.
At the moment my code looks like this:
import numpy as np
def bmcsim(A, b, M0, timestep):
    ex = myexpm(A*timestep)  # returns stacked array of size MxNxN
    M = np.zeros_like(M0)
    for n in range(ex.shape[0]):
        A_tmp = A[n,:,:]
        A_b = np.linalg.lstsq(A_tmp, b, rcond=None)[0]
        M[n,:,:] = np.abs(np.real(np.dot(ex[n,:,:], M0[n,:,:] + A_b) - A_b))
    return M
and I would like to get rid of that for n in range(ex.shape[0]) loop. Unfortunately, np.linalg.lstsq doesn't work for stacked arrays, does it? In myexpm I used np.apply_along_axis for another problem:
def myexpm(A):
    vals, vects = np.linalg.eig(A)
    tmp = np.einsum('ijk,ikl->ijl', vects, np.apply_along_axis(np.diag, -1, np.exp(vals)))
    return np.einsum('ijk,ikl->ijl', tmp, np.linalg.inv(vects))
However, that only works for 1-D input data. Is there something similar I can use with np.linalg.lstsq? I guess the np.dot in bmcsim will be replaced with np.einsum as in myexpm, or are there better ways?
Thanks for your help!
I just realized that I can replace np.linalg.lstsq(A, b) with np.linalg.solve(A.T @ A, A.T @ b) (the normal equations), and managed to get rid of the loop this way:
def bmcsim2(A, b, M0, timestep):
    ex = myexpm(A*timestep)
    b_stack = np.repeat(b[np.newaxis, :, :], A.shape[0], axis=0)  # M copies of b, shape MxNx1
    # 'ikj,ikl->ijl' sums over the row index k, i.e. computes A[i].T @ X[i] for every i
    tmp_left = np.einsum('ikj,ikl->ijl', A, A)
    tmp_right = np.einsum('ikj,ikl->ijl', A, b_stack)
    A_b_stack = np.linalg.solve(tmp_left, tmp_right)
    return np.abs(np.real(np.einsum('ijk,ikl->ijl', ex, M0 + A_b_stack) - A_b_stack))
This is about 3 times faster, but still a bit complicated. I hope there is a better (shorter/easier) way, that's maybe even faster?!
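For comparison, a loop-free sketch under two assumptions: every matrix in the stack is diagonalizable (which myexpm already relies on), and the minimum-norm least-squares solution returned by np.linalg.lstsq is what is wanted, which equals pinv(A) @ b. np.linalg.pinv, np.linalg.solve and the @ operator all accept stacks of matrices and broadcast over the leading axis, so neither einsum nor np.repeat is needed. myexpm below is an equivalent broadcasting rewrite of the question's version; bmcsim_loop, bmcsim_batched and the random test matrices are hypothetical.

```python
import numpy as np

def myexpm(A):
    # exp(A) = V diag(exp(w)) V^-1 for each diagonalizable matrix in the stack;
    # multiplying V by exp(w) along its last axis scales column j by exp(w_j)
    vals, vects = np.linalg.eig(A)
    return (vects * np.exp(vals)[..., np.newaxis, :]) @ np.linalg.inv(vects)

def bmcsim_loop(A, b, M0, timestep):
    # Reference version with the per-offset loop from the question
    ex = myexpm(A * timestep)
    M = np.zeros_like(M0)
    for n in range(A.shape[0]):
        A_b = np.linalg.lstsq(A[n], b, rcond=None)[0]
        M[n] = np.abs(np.real(ex[n] @ (M0[n] + A_b) - A_b))
    return M

def bmcsim_batched(A, b, M0, timestep):
    ex = myexpm(A * timestep)        # (M, N, N)
    A_b = np.linalg.pinv(A) @ b      # (M, N, N) @ (N, 1) broadcasts to (M, N, 1)
    return np.abs(np.real(ex @ (M0 + A_b) - A_b))

# Hypothetical test data: 5 offsets, 3x3 matrices
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3, 3)) - 3 * np.eye(3)
b = rng.standard_normal((3, 1))
M0 = rng.standard_normal((5, 3, 1))
print(np.allclose(bmcsim_loop(A, b, M0, 0.1), bmcsim_batched(A, b, M0, 0.1)))
```

If every A is known to be invertible, np.linalg.solve(A, b) can replace the pseudoinverse and is usually cheaper.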

How to find Mahalanobis distance between two 1D arrays in Python?

I have two 1D arrays, and I need to find out the Mahalanobis distance between them.
Both arrays (array_1 and array_2, 128 values each) are listed further below.
I found that SciPy already implements this function (scipy.spatial.distance.mahalanobis). However, I am confused about what the value of its VI argument (the inverse covariance matrix) should be. I tried the following:
import numpy as np
from scipy.spatial.distance import mahalanobis
V = np.cov(np.array([array_1, array_2]))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))
But I get the following error:
line 1043, in mahalanobis
m = np.dot(np.dot(delta, VI), delta)
ValueError: shapes (128,) and (2,2) not aligned: 128 (dim 0) != 2 (dim 0)
array_1 = [-0.10577646642923355, 0.09617947787046432, 0.029290344566106796, 0.02092641592025757, -0.021434104070067406, -0.13410840928554535, 0.028282659128308296, -0.12082239985466003, 0.21936850249767303, -0.06512433290481567, 0.16812698543071747, -0.03302834928035736, -0.18088334798812866, -0.04598559811711311, -0.014739632606506348, 0.06391328573226929, -0.15650317072868347, -0.13678401708602905, 0.01166679710149765, -0.13967938721179962, 0.14632365107536316, 0.025218486785888672, 0.046839646995067596, 0.09690812975168228, -0.13414686918258667, -0.2883925437927246, -0.1435326784849167, -0.17896348237991333, 0.10746842622756958, -0.09142691642045975, 0.04860316216945648, 0.031577128916978836, -0.17280976474285126, -0.059613555669784546, -0.05718057602643967, 0.0401446670293808, 0.026440180838108063, -0.017025159671902657, 0.22091664373874664, 0.024703698232769966, -0.15607595443725586, -0.0018572667613625526, -0.037675946950912476, 0.3210170865058899, 0.10884962230920792, 0.030370134860277176, 0.056784629821777344, -0.030112050473690033, 0.023124486207962036, -0.1449904441833496, 0.08885903656482697, 0.17527811229228973, 0.08804896473884583, 0.038310401141643524, -0.01704210229218006, -0.17355971038341522, -0.018237406387925148, 0.030551932752132416, -0.23085585236549377, 0.13475817441940308, 0.16338199377059937, -0.06968289613723755, -0.04330683499574661, 0.04434924200177193, 0.22637797892093658, 0.07463733851909637, -0.15070196986198425, -0.07500549405813217, 0.10863590240478516, -0.22288714349269867, 0.0010778247378766537, 0.057608842849731445, -0.12828609347343445, -0.17236559092998505, -0.23064571619033813, 0.09910193085670471, 0.46647992730140686, 0.0634111613035202, -0.13985536992549896, 0.052741192281246185, -0.1558966338634491, 0.022585246711969376, 0.10514408349990845, 0.11794176697731018, -0.06241249293088913, 0.06389056891202927, -0.14145469665527344, 0.060088545083999634, 0.09667345881462097, -0.004665130749344826, -0.07927791774272919, 
0.21978208422660828, -0.0016187895089387894, 0.04876316711306572, 0.03137822449207306, 0.08962501585483551, -0.09108036011457443, -0.01795950159430504, -0.04094596579670906, 0.03533276170492172, 0.01394269522279501, -0.08244197070598602, -0.05095399543642998, 0.04305890575051308, -0.1195211187005043, 0.16731074452400208, 0.03894471749663353, -0.0222858227789402, -0.07944411784410477, 0.0614166259765625, -0.1481470763683319, -0.09113290905952454, 0.14758692681789398, -0.24051085114479065, 0.164126917719841, 0.1753545105457306, -0.003193420823663473, 0.20875433087348938, 0.03357946127653122, 0.1259773075580597, -0.00022807717323303223, -0.039092566817998886, -0.13582147657871246, -0.01937306858599186, 0.015938198193907738, 0.00787206832319498, 0.05792934447526932, 0.03294186294078827]
array_2 = [-0.1966051608324051, 0.0940953716635704, -0.0031937970779836178, -0.03691547363996506, -0.07240629941225052, -0.07114037871360779, -0.07133384048938751, -0.1283963918685913, 0.15377545356750488, -0.091400146484375, 0.10803385823965073, -0.09235749393701553, -0.1866973638534546, -0.021168243139982224, -0.09094691276550293, 0.07300164550542831, -0.20971564948558807, -0.1847742646932602, -0.009817334823310375, -0.05971141159534454, 0.09904412180185318, 0.0278592761605978, -0.012554554268717766, 0.09818517416715622, -0.1747943013906479, -0.31632938981056213, -0.0864541232585907, -0.13249783217906952, 0.002135572023689747, -0.04935726895928383, 0.010047778487205505, 0.04549024999141693, -0.26334646344184875, -0.05263081565499306, -0.013573898002505302, 0.2042253464460373, 0.06646320968866348, 0.08540669083595276, 0.12267164140939713, -0.018634958192706108, -0.19135263562202454, 0.01208433136343956, 0.09216200560331345, 0.2779296934604645, 0.1531585156917572, 0.10681629925966263, -0.021275708451867104, -0.059720948338508606, 0.06610126793384552, -0.21058350801467896, 0.005440462380647659, 0.18833838403224945, 0.08883830159902573, 0.025969548150897026, 0.0337764173746109, -0.1585341989994049, 0.02370697632431984, 0.10416869819164276, -0.19022507965564728, 0.11423652619123459, 0.09144753962755203, -0.08765758574008942, -0.0032832929864525795, -0.0051014479249715805, 0.19875964522361755, 0.07349056005477905, -0.1031823456287384, -0.10447365045547485, 0.11358538269996643, -0.24666038155555725, -0.05960353836417198, 0.07124857604503632, -0.039664581418037415, -0.20122921466827393, -0.31481748819351196, -0.006801256909966469, 0.41940364241600037, 0.1236235573887825, -0.12495145946741104, 0.12580059468746185, -0.02020396664738655, -0.03004150651395321, 0.11967054009437561, 0.09008713811635971, -0.07470540702342987, 0.09324200451374054, -0.13763070106506348, 0.07720538973808289, 0.19568027555942535, 0.036567769944667816, 0.030284458771348, 0.14119629561901093, 
-0.03820852190256119, 0.06232285499572754, 0.036639824509620667, 0.07704029232263565, -0.12276224792003632, -0.0035170004703104496, -0.13103705644607544, 0.027697769924998283, -0.01527332328259945, -0.04027168080210686, -0.03659897670149803, 0.03330300375819206, -0.12293602526187897, 0.09043421596288681, -0.019673841074109077, -0.07563626766204834, -0.13991905748844147, 0.014788001775741577, -0.07630413770675659, 0.00017269013915210962, 0.16345393657684326, -0.25710681080818176, 0.19869503378868103, 0.19393865764141083, -0.07422225922346115, 0.19553625583648682, 0.09189949929714203, 0.051557887345552444, -0.0008843056857585907, -0.006250975653529167, -0.1680600494146347, -0.10320111364126205, 0.03232177346944809, -0.08931156992912292, 0.11964476853609085, 0.00814182311296463]
The covariance matrix of the above arrays turns out to be singular, so I am unable to invert it. Why does it end up being singular?
EDIT 2: Solution
Since the covariance matrix here is singular, I had to pseudo-invert it using np.linalg.pinv(V).
From the numpy.cov docs, the first argument should be an array m such that:
Each row of m represents a variable, and each column a single observation of all those variables.
So to fix your code just take the transpose (with .T) of your array before you call cov:
V = np.cov(np.array([array_1, array_2]).T)
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))
I just tested this out on some random data, and I can confirm it works.
Also, calculating covariance from just two observations is a bad idea, and not likely to be very accurate. If your data is coming from an image, you should use the entire image img (or at least the entire region of interest) when calculating the covariance matrix, then use that matrix to find the Mahalanobis distance between the two vectors of interest:
V = np.cov(np.array(img))
IV = np.linalg.inv(V)
print(mahalanobis(array_1, array_2, IV))
You may or may not need to replace img with img.T, depending on how you generated array_1 and array_2 in the first place.
If you're getting singular covariance matrices, what you have is a math problem, not a code problem. The question "why is my covariance matrix singular?" is common enough that it has already been asked and answered. Broadly, it happens when your observations are too few or linearly dependent: a sample covariance matrix estimated from k observations has rank at most k - 1, so with only two observations of 128 variables it has rank at most 1 and is always singular.
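The rank argument can be checked numerically. The sketch below uses random stand-ins for array_1 and array_2 (not the question's data) and shows that np.cov of two 128-dimensional observations has rank 1, whereas a covariance estimated from many observations (a hypothetical set of 1000 samples) is invertible and can be passed to scipy.spatial.distance.mahalanobis as VI. The result matches sqrt(delta @ VI @ delta) computed by hand.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
n_vars = 128  # same length as array_1 and array_2

# Two observations of 128 variables (random stand-ins for array_1 and array_2)
two_obs = rng.standard_normal((2, n_vars))
V_two = np.cov(two_obs.T)              # 128 x 128 covariance estimated from 2 observations
rank_two = np.linalg.matrix_rank(V_two)
print(rank_two)                        # 1: at most (number of observations - 1)

# Hypothetical larger data set: 1000 observations, one per row
many_obs = rng.standard_normal((1000, n_vars))
V = np.cov(many_obs, rowvar=False)     # rowvar=False: each column is a variable
VI = np.linalg.inv(V)

u, v = two_obs
d = mahalanobis(u, v, VI)
delta = u - v
d_manual = np.sqrt(delta @ VI @ delta)  # the quadratic form the distance is based on
print(d, d_manual)
```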

numpy.fft.irfft: Why is len(a) necessary?

The documentation for numpy.fft.irfft, the inverse discrete Fourier transform for real input, states
This function computes the inverse of the one-dimensional n-point
discrete Fourier Transform of real input computed by rfft. In other
words, irfft(rfft(a), len(a)) == a to within numerical accuracy. (See
Notes below for why len(a) is necessary here.)
However, the Notes section does not seem to indicate why it would be necessary to specify len(a) in this case. Indeed, everything seems to work correctly even when omitting the length:
a = numpy.random.rand(20)
a
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356])
numpy.fft.irfft(numpy.fft.rfft(a))
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356])
Can I omit len(a) in my call to numpy.fft.irfft?
As indicated in the comments, omitting the length works if the length is even, but not if it is odd:
a = numpy.random.rand(21)
a
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356,
# 0.85289978])
numpy.fft.irfft(numpy.fft.rfft(a))
# array([0.24111601, 0.90078174, 0.37803686, 0.86982605, 0.38581891,
# 0.29202917, 0.72002065, 0.59446031, 0.23485829, 0.55698438,
# 0.42253411, 0.26457788, 0.49961714, 1.06138356, 0.45849842,
# 0.22863701, 0.68431715, 0.73579194, 0.14511054, 0.82140976])
The documentation for the return values of numpy.fft.rfft and numpy.fft.irfft explains why this happens, although the reference to the “Notes” section for numpy.fft.irfft is still misleading:
numpy.fft.rfft(a, n=None, axis=-1, norm=None)
out : complex ndarray
The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified. If
n is even, the length of the transformed axis is (n/2)+1. If n is odd, the length is (n+1)/2.
numpy.fft.irfft(a, n=None, axis=-1, norm=None)
out : ndarray
The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. The length of the transformed axis is n, or, if n is not given, 2*(m-1) where m is the length of the transformed axis of the input. To get an odd number of output points, n must be specified.
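Together, the two documented length rules explain why len(a) is needed: rfft maps an input of length n to n//2 + 1 complex values, so inputs of length 20 and 21 both produce 11 values, and irfft, which by default returns 2*(m-1) points, cannot tell which length the original had. The sketch below checks this with random input.

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (20, 21):
    a = rng.random(n)
    spectrum = np.fft.rfft(a)                    # n//2 + 1 values: 11 for both n = 20 and n = 21
    default = np.fft.irfft(spectrum)             # length 2*(m-1) = 20, where m = len(spectrum)
    exact = np.fft.irfft(spectrum, n=len(a))     # length n, recovers a
    print(n, len(spectrum), len(default), np.allclose(exact, a))
# prints "20 11 20 True", then "21 11 20 True"
```

Passing n=len(a) therefore costs nothing for even lengths and is required for odd ones.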
