numpy.fft.irfft: Why is len(a) necessary? - python

The documentation for numpy.fft.irfft, the inverse discrete Fourier transform for real input, states
This function computes the inverse of the one-dimensional n-point
discrete Fourier Transform of real input computed by rfft. In other
words, irfft(rfft(a), len(a)) == a to within numerical accuracy. (See
Notes below for why len(a) is necessary here.)
However, the Notes section does not seem to indicate why it would be necessary to specify len(a) in this case. Indeed, everything seems to work correctly even when omitting the length:
numpy.random.seed(123456)
a = numpy.random.rand(20)
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356])
numpy.fft.irfft(numpy.fft.rfft(a))
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356])
Can I omit len(a) in my call to numpy.fft.irfft?

As indicated in the comments, omitting the length works if the length is even, but not if it is odd:
numpy.random.seed(123456)
a = numpy.random.rand(21)
# array([0.12696983, 0.96671784, 0.26047601, 0.89723652, 0.37674972,
# 0.33622174, 0.45137647, 0.84025508, 0.12310214, 0.5430262 ,
# 0.37301223, 0.44799682, 0.12944068, 0.85987871, 0.82038836,
# 0.35205354, 0.2288873 , 0.77678375, 0.59478359, 0.13755356,
# 0.85289978])
numpy.fft.irfft(numpy.fft.rfft(a))
# array([0.24111601, 0.90078174, 0.37803686, 0.86982605, 0.38581891,
# 0.29202917, 0.72002065, 0.59446031, 0.23485829, 0.55698438,
# 0.42253411, 0.26457788, 0.49961714, 1.06138356, 0.45849842,
# 0.22863701, 0.68431715, 0.73579194, 0.14511054, 0.82140976])
The documentation for the return values of numpy.fft.rfft and numpy.fft.irfft explains why this happens, although the reference to the “Notes” section for numpy.fft.irfft is still misleading:
numpy.fft.rfft(a, n=None, axis=-1, norm=None)
Returns:
out : complex ndarray
The truncated or zero-padded input, transformed along the axis
indicated by axis, or the last one if axis is not specified. If
n is even, the length of the transformed axis is (n/2)+1. If n is odd, the length is (n+1)/2.
numpy.fft.irfft(a, n=None, axis=-1, norm=None)
Returns:
out : ndarray
The truncated or zero-padded input, transformed along the axis indicated by axis, or the last one if axis is not specified. The length of the transformed axis is n, or, if n is not given, 2*(m-1) where m is the length of the transformed axis of the input. To get an odd number of output points, n must be specified.
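A minimal sketch of the even/odd ambiguity described above, assuming only NumPy (the variable names are illustrative). Real signals of length 20 and 21 both produce 11 frequency bins, so irfft cannot infer an odd length on its own and falls back to 2*(m-1):

```python
import numpy as np

rng = np.random.default_rng(0)
even = rng.random(20)
odd = rng.random(21)

# Both spectra have the same number of bins: 20//2 + 1 == (21 + 1)//2 == 11
spec_even = np.fft.rfft(even)
spec_odd = np.fft.rfft(odd)

# Without n, irfft assumes an even output length of 2 * (11 - 1) == 20
default_len = np.fft.irfft(spec_odd).shape[0]

# Passing n explicitly recovers the odd-length signal
roundtrip_odd = np.fft.irfft(spec_odd, n=len(odd))

print(spec_even.shape, spec_odd.shape, default_len)
print(np.allclose(roundtrip_odd, odd))
```

Passing n=len(a) every time is the safe habit, since it is correct for both even and odd lengths.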


shapely interpolate in three dimensions returns Point Z but invalid results

I am trying to use interpolate along a three dimensional line. However, any changes in the Z axis are not taken into account by .interpolate.
LineString([(0, 0, 0), (0, 0, 1), (0, 0, 2)]).interpolate(1, normalized=True).wkt
'POINT Z (0 0 0)'
vs
LineString([(0, 0, 0), (0, 1, 0), (0, 2, 0)]).interpolate(1, normalized=True).wkt
'POINT Z (0 2 0)'
I read the documentation, and either it is silent on 3D lines or the restriction is documented at a higher level than the interpolate documentation.
Is this a bug? I can't believe I'm the first person to try this.
Assuming that there is no direct way to accomplish this, any suggestions for doing my own interpolation?
That does indeed seem like a bug from shapely. I looked into the source code a little bit and I'm willing to bet it's an upstream issue with PyGEOS.
Anyways, here's a little implementation I put together:
import numpy as np
import shapely
import geopandas as gpd # Only necessary for the examples, not the actual function
def my_interpolate(input_line, input_dist, normalized=False):
    '''
    Function that interpolates the coordinates of a shapely LineString.
    Note: If you use this function on a MultiLineString geometry, it will
    "flatten" the geometry and consider all the points in it to be
    consecutively connected. For example, consider the following shape:
        MultiLineString(((0,0),(0,2)),((0,4),(0,6)))
    In this case, this function will assume that there is no gap between
    (0,2) and (0,4). Instead, the function will assume that these points
    are all connected. Explicitly, the MultiLineString above will be
    interpreted instead as the following shape:
        LineString((0,0),(0,2),(0,4),(0,6))
    Parameters
    ----------
    input_line : shapely.geometry.LineString or shapely.geometry.MultiLineString
        (Multi)LineString whose coordinates you want to interpolate
    input_dist : float
        Distance used to calculate the interpolation point
    normalized : boolean
        Flag that indicates whether the `input_dist` argument should be
        interpreted as an absolute distance or as a fraction of the
        geometry's total length.
        When this flag is set to "False", the `input_dist` argument is assumed
        to be an actual absolute distance from the starting point of the
        geometry. When this flag is set to "True", the `input_dist` argument
        is assumed to represent the relative distance with respect to the
        geometry's full distance.
        The default is False.
    Returns
    -------
    shapely.geometry.Point
        The shapely geometry of the interpolated Point.
    '''
    # Making sure the entry value is a LineString or MultiLineString
    # (geom_type is used instead of the deprecated .type attribute)
    geom_type = input_line.geom_type.lower()
    if geom_type not in ('linestring', 'multilinestring'):
        return None
    # Extracting the coordinates from the geometry
    if geom_type.startswith('multi'):
        # In case it's a multilinestring, this step "flattens" the points
        coords = [item for this_geom in input_line.geoms
                  for item in this_geom.coords]
    else:
        coords = [tuple(coord) for coord in input_line.coords]
    # Transforming the list of coordinates into a numpy array for
    # ease of manipulation
    coords = np.array(coords)
    # Calculating the distances between consecutive points (all axes, including Z)
    dists = ((coords[:-1] - coords[1:])**2).sum(axis=1)**0.5
    # Calculating the cumulative distances
    dists_cum = np.append(0, dists.cumsum())
    # Finding the total distance
    dist_total = dists_cum[-1]
    # Finding appropriate use of the `input_dist` value
    if not normalized:
        input_dist_abs = input_dist
        input_dist_rel = input_dist / dist_total
    else:
        input_dist_abs = input_dist * dist_total
        input_dist_rel = input_dist
    # Taking care of some edge cases
    if ((input_dist_rel < 0) or
            (input_dist_rel > 1) or
            (input_dist_abs < 0) or
            (input_dist_abs > dist_total)):
        return None
    elif (input_dist_rel == 0) or (input_dist_abs == 0):
        return shapely.geometry.Point(coords[0])
    elif (input_dist_rel == 1) or (input_dist_abs == dist_total):
        return shapely.geometry.Point(coords[-1])
    # Finding which point is immediately before and after the input distance
    pt_before_idx = np.arange(dists_cum.shape[0])[(dists_cum <= input_dist_abs)].max()
    pt_after_idx = np.arange(dists_cum.shape[0])[(dists_cum >= input_dist_abs)].min()
    pt_before = coords[pt_before_idx]
    pt_after = coords[pt_after_idx]
    seg_full_dist = dists[pt_before_idx]
    dist_left = input_dist_abs - dists_cum[pt_before_idx]
    # Calculating the interpolated coordinates
    interpolated_coords = ((dist_left / seg_full_dist) * (pt_after - pt_before)) + pt_before
    # Creating a shapely geometry
    interpolated_point = shapely.geometry.Point(interpolated_coords)
    return interpolated_point
The function above can be used on Shapely (Multi)LineStrings. Here's an example of it being applied to a simple LineString.
input_line = shapely.geometry.LineString([(0, 0, 0),
(1, 2, 3),
(4, 5, 6)])
interpolated_point = my_interpolate(input_line, 2.5, normalized=False)
print(interpolated_point.wkt)
> POINT Z (0.6681531047810609 1.336306209562122 2.004459314343183)
And here's an example of using the apply method to perform the interpolation on a whole GeoDataFrame of LineStrings:
line_df = gpd.GeoDataFrame({'id':[1,
2,
3],
'geometry':[input_line,
input_line,
input_line],
'interpolate_dist':[0.5,
2.5,
6.5],
'interpolate_dist_normalized':[True,
False,
False]})
interpolated_points = line_df.apply(
lambda row: my_interpolate(input_line=row['geometry'],
input_dist=row['interpolate_dist'],
normalized=row['interpolate_dist_normalized']),
axis=1)
print(interpolated_points.apply(lambda point: point.wkt))
> 0 POINT Z (1.419876550265357 2.419876550265356 3...
> 1 POINT Z (0.6681531047810609 1.336306209562122 ...
> 2 POINT Z (2.592529850263281 3.592529850263281 4...
> dtype: object
Important notes
Corner cases and error handling
Please note that the function I developed doesn't do error handling very well. In many cases, it just silently returns a None object. Depending on your use case, you might want to adjust that behavior.
MultiLineStrings
The function above can be used on MultiLineStrings, but it makes some simplifications and assumptions. If you use this function on a MultiLineString geometry, it will "flatten" the geometry and consider all the points in it to be consecutively connected. For example, consider the following shape:
MultiLineString(((0,0),(0,2)),((0,4),(0,6)))
In this case, the function will assume that there is no gap between (0,2) and (0,4). Instead, the function will assume that these points are all connected. Explicitly, the MultiLineString above will be interpreted instead as the following shape:
LineString((0,0),(0,2),(0,4),(0,6))
Someone asked me, "Can you interpolate along each axis instead of doing all three together?" I think the answer is yes, and here is the approach I used.
# Upsample to 1S intervals rather than our desired interval because resample throws
# out rows that do not fall on the desired interval, including the rows we want to keep.
int_df = df.resample('1S', origin='start').asfreq()
# For each axis, interpolate to fill in NAN values.
int_df['Latitude'] = int_df['Latitude'].interpolate(method='polynomial', order=order)
int_df['Longitude'] = int_df['Longitude'].interpolate(method='polynomial', order=order)
int_df['AGL'] = int_df['AGL'].interpolate(method='polynomial', order=order)
# Now downsample to our desired frequency
int_df = int_df.resample('5S', origin='start').asfreq()
I initially resampled at 5S intervals but that caused any existing points that were not on the interval boundaries to get dropped in favor of new ones that were on the interval boundaries. For my use case this is important. If you want regular intervals then you don't need to upsample then down sample.
After that, just interpolate each of the three axes.
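The upsample, interpolate, downsample steps can be sketched on a tiny made-up track. This is only an illustration under assumptions: the sample values and timestamps are invented, and method="time" is swapped in for the polynomial interpolation used above so that the sketch does not depend on SciPy.

```python
import numpy as np
import pandas as pd

# Hypothetical irregular track: samples at 0 s, 7 s and 10 s
idx = pd.to_datetime(["2021-01-01 00:00:00",
                      "2021-01-01 00:00:07",
                      "2021-01-01 00:00:10"])
df = pd.DataFrame({"Latitude": [0.0, 7.0, 10.0],
                   "Longitude": [0.0, 14.0, 20.0],
                   "AGL": [100.0, 170.0, 200.0]}, index=idx)

# 1) Upsample to 1-second rows; new rows are NaN, original rows are kept
fine = df.resample(pd.Timedelta(seconds=1), origin="start").asfreq()

# 2) Interpolate each axis independently to fill the NaN rows
for col in ["Latitude", "Longitude", "AGL"]:
    fine[col] = fine[col].interpolate(method="time")

# 3) Downsample to the desired 5-second spacing
coarse = fine.resample(pd.Timedelta(seconds=5), origin="start").asfreq()
print(coarse)
```

Each column is treated as its own 1D series over time, which is what makes the per-axis approach independent of any geometry library.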
To answer the question of why the shapely manipulation functions are not operating on 3D / Z:
From the Shapely docs (written when version 1.8.x was current):
A third z coordinate value may be used when constructing instances,
but has no effect on geometric analysis. All operations are performed
in the x-y plane.
I also need Z for my purposes, so I was searching for this information to see whether using geopandas (which uses shapely) was an option, rather than osgeo.ogr.
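A small NumPy-only sketch of why the vertical line from the question collapses to its start point, assuming (as the quoted docs state) that lengths are measured in the x-y plane. The helper path_length is hypothetical and not part of Shapely:

```python
import numpy as np

def path_length(coords, dims):
    """Sum of segment lengths using only the first `dims` coordinate axes."""
    segments = np.diff(np.asarray(coords, dtype=float)[:, :dims], axis=0)
    return float(np.sqrt((segments ** 2).sum(axis=1)).sum())

vertical = [(0, 0, 0), (0, 0, 1), (0, 0, 2)]

planar_length = path_length(vertical, dims=2)   # what an x-y-only analysis sees
spatial_length = path_length(vertical, dims=3)  # true 3D length

print(planar_length, spatial_length)
```

With a planar length of zero there is no distance to travel along, so any interpolation in the x-y plane can only return the first vertex.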

Applying a Numba guvectorize function over the time dimension of a 3D array with Xarray's apply_ufunc

I have some problems getting this to work properly and I'm also open to other suggestions as I'm not 100% sure if I'm going the right way with this.
Here is some simple dummy data:
import numpy as np
import pandas as pd
import xarray as xr
from numba import guvectorize
times = pd.date_range(start='2012-01-01',freq='1W',periods=25)
x = np.array([range(0,20)]).squeeze()
y = np.array([range(20,40)]).squeeze()
data = np.random.randint(3, size=(25,20,20))
ds = xr.DataArray(data, dims=['time', 'y', 'x'], coords = {'time': times, 'y': y, 'x': x})
For each x,y-coordinate, I want to return the longest sequence of 1s or 2s over time. So my input array is 3D (time, x, y) and my output 2D (x, y). The code in 'seq_gufunc' is inspired by this thread.
My actual dataset is much larger (with land-use classes instead of 1s, 2s, etc.) and this is only a small part of a bigger workflow, where I'm also using dask for parallel processing. So in the end this should run fast and efficiently, which is why I ended up trying to figure out how to get numba's @guvectorize and Xarray's apply_ufunc to work together:
@guvectorize(
    "(int64[:], int64[:])",
    "(n) -> (n)", target='parallel', nopython=True
)
def seq_gufunc(x, out):
    f_arr = np.array([False])
    bool_stack = np.hstack((f_arr, (x == 1) | (x == 2), f_arr))
    # Get start, stop index pairs for sequences
    idx_pairs = np.where(np.diff(bool_stack))[0].reshape(-1, 2)
    # Get length of longest sequence
    longest_seq = np.max(np.diff(idx_pairs))
    out[:] = longest_seq

## Input for dim would be: 'time'
def apply_seq_gufunc(data, dim):
    return xr.apply_ufunc(seq_gufunc,
                          data,
                          input_core_dims=[[dim]],
                          exclude_dims=set((dim,)),
                          dask="allowed")
There are probably some very obvious mistakes that hopefully someone can point out. I have a hard time understanding what actually goes on in the background and how I should set up the layout-string of @guvectorize and the parameters of apply_ufunc so that it does what I want.
EDIT2:
This is the working solution. See @OriolAbril's answer for more information about the parameters of apply_ufunc and guvectorize. It was also necessary to add the if...else clause to handle the case where no values match and so avoid the ValueError that would otherwise be raised.
@guvectorize(
    "(int64[:], int64[:])",
    "(n) -> ()", nopython=True
)
def seq_gufunc(x, out):
    f_arr = np.array([False])
    bool_stack = np.hstack((f_arr, (x == 1) | (x == 2), f_arr))
    if np.sum(bool_stack) == 0:
        longest_seq = 0
    else:
        # Get start, stop index pairs for sequences
        idx_pairs = np.where(np.diff(bool_stack))[0].reshape(-1, 2)
        # Get length of longest sequence
        longest_seq = np.max(np.diff(idx_pairs))
    out[:] = longest_seq

def apply_seq_gufunc(data, dim):
    return xr.apply_ufunc(seq_gufunc,
                          data,
                          input_core_dims=[[dim]],
                          dask="parallelized",
                          output_dtypes=['uint8']
                          )
I'd point you to How to apply a xarray u_function over NetCDF and return a 2D-array (multiple new variables) to the DataSet. The immediate goal is not the same, but the detailed description and examples should clarify the issue.
In particular, you are right to use time as input_core_dims (to make sure it is moved to the last dimension), and it is correctly formatted as a list of lists. However, you do not need exclude_dims but rather output_core_dims=[["time"]].
The output has the same shape as the input, however, as explained in the link above, apply_ufunc expects it will have same shape as broadcasted dims. output_core_dims is needed to get apply_ufunc to expect output with dims y, x, time.
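The run-length logic inside seq_gufunc can be checked on its own with plain NumPy before wiring it into numba and apply_ufunc. The sketch below is an assumption-laden stand-in, not the poster's code: longest_run is a hypothetical helper, and np.apply_along_axis is used only to show the (time, y, x) to (y, x) reduction on a small array.

```python
import numpy as np

def longest_run(x, targets=(1, 2)):
    """Length of the longest consecutive run of values contained in `targets`."""
    mask = np.isin(x, targets)
    if not mask.any():
        return 0  # no matching values: avoids reducing over an empty array
    # Pad with False so every run has both a start edge and a stop edge
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    starts, stops = edges[::2], edges[1::2]
    return int((stops - starts).max())

rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(25, 4, 5))           # (time, y, x)
result = np.apply_along_axis(longest_run, 0, data)   # reduces time -> (y, x)

print(longest_run(np.array([0, 1, 2, 1, 0, 2, 2])))
print(result.shape)
```

Because the time axis is consumed and a single number comes back per pixel, the matching gufunc layout is "(n) -> ()", as in the working solution above.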

Shift interpolation does not give expected behaviour

When using scipy.ndimage.interpolation.shift to shift a numpy data array along one axis with periodic boundary treatment (mode = 'wrap'), I get an unexpected behavior. The routine tries to force the first pixel (index 0) to be identical to the last one (index N-1) instead of the "last plus one (index N)".
Minimal example:
# module import
import numpy as np
from scipy.ndimage.interpolation import shift
import matplotlib.pyplot as plt
# print scipy.__version__
# 0.18.1
a = range(10)
plt.figure(figsize=(16,12))
for i, shift_pix in enumerate(range(10)):
    # shift the data via spline interpolation
    b = shift(a, shift=shift_pix, mode='wrap')
    # plotting the data
    plt.subplot(5,2,i+1)
    plt.plot(a, marker='o', label='data')
    plt.plot(np.roll(a, shift_pix), marker='o', label='data, roll')
    plt.plot(b, marker='o',label='shifted data')
    if i == 0:
        plt.legend(loc=4,fontsize=12)
    plt.ylim(-1,10)
    ax = plt.gca()
    ax.text(0.10,0.80,'shift %d pix' % i, transform=ax.transAxes)
Blue line: data before the shift
Green line: expected shift behavior
Red line: actual shift output of scipy.ndimage.interpolation.shift
Is there some error in how I call the function or how I understand its behavior with mode = 'wrap'? The current results are in contrast to the mode parameter description from the related scipy tutorial page and from another StackOverflow post. Is there an off-by-one-error in the code?
Scipy version used is 0.18.1, distributed in anaconda-2.2.0
It seems that the behaviour you have observed is intentional.
The cause of the problem lies in the C function map_coordinate which translates the coordinates after shift to ones before shift:
map_coordinate(double in, npy_intp len, int mode)
The function is used as the subroutine in NI_ZoomShift that does the actual shift. Its interesting part looks like this:
Example. Let's see how the output for output = shift(np.arange(10), shift=4, mode='wrap') (from the question) is computed.
NI_ZoomShift computes the edge values output[0] and output[9] in a special way, so let's take a look at the computation of output[1] (a bit simplified):
# input = [0,1,2,3,4,5,6,7,8,9]
# output = [ ,?, , , , , , , , ] '?' == computed position
# shift = 4
output_index = 1
in = output_index - shift # -3
sz = 10 - 1 # 9
in += sz * ((-in / sz) + 1) # integer division: 3 / 9 == 0
# += 9 * (( 0) + 1) == 9
# in == 6
return input[in] # 6
It is clear that sz = len - 1 is responsible for the behaviour you have observed. It was changed from sz = len in a suggestively named commit dating back to 2007: Fix off-by-on errors in ndimage boundary routines. Update tests.
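The effect of sz = len - 1 versus sz = len can be reproduced in a few lines. This is a simplified sketch of only the negative-coordinate branch walked through above; wrap_negative is a hypothetical helper, not a SciPy function.

```python
import numpy as np

def wrap_negative(index, period):
    """Fold a negative coordinate back into range using integer division."""
    return index + period * ((-index) // period + 1)

n = 10
shift = 4
output_index = 1

# Period len - 1, as in the C code discussed above
src_observed = wrap_negative(output_index - shift, n - 1)
# Period len, which is what np.roll-style wrapping corresponds to
src_expected = wrap_negative(output_index - shift, n)

rolled = np.roll(np.arange(n), shift)
print(src_observed, src_expected, rolled[output_index])
```

With a period of len the mapped source index agrees with np.roll, while a period of len - 1 lands one sample earlier, which matches the discrepancy plotted in the question.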
I don't know why such a change was introduced. One possible explanation that comes to mind is as follows:
Function 'shift' uses splines for interpolation.
A knot vector of a uniform spline on the interval [0, k] is simply [0,1,2,...,k]. When we say that the spline should wrap, it is natural to require equal values at knots 0 and k, so that many copies of the spline can be glued together, forming a periodic function:
0--1--2--3-...-k 0--1--2--3-...-k 0--1-- ...
0--1--2--3-...-k 0--1--2--3-...-k ...
Maybe shift just treats its input as a list of values for spline's knots?
It is worth noting that this behavior appears to be a bug, as noted in this SciPy issue:
https://github.com/scipy/scipy/issues/2640
The issue appears to affect every extrapolation mode in scipy.ndimage other than mode='mirror'.

signal.correlate 'same'/'full' meaning?

I am wondering what the mode arguments in signal.correlate (or numpy.correlate) mean?
def crossCorrelator(sig1, sig2):
    correlate = signal.correlate(sig1,sig2,mode='same')
    return(correlate)
flux0 = [ 0.02006948 0.01358697 -0.06196026 -0.03842506 -0.09023056 -0.05464169 -0.02530553 -0.01937054 -0.01237411 0.03472263 0.17865012 0.27441767 0.23532932 0.16358341 0.08743969 0.12166425 0.10287468 0.13430794 0.08262321 0.0515434 0.04657624 0.09017276 0.09131331 0.04696824 -0.03901519 -0.01413654 0.05448175 0.1236946 0.09968044 -0.001584 -0.06094561 -0.02998289 -0.00113092 0.04336605 0.01105071 0.0527657 0.03825847 0.02309524]
flux1 = [-0.02946104 -0.02590192 -0.02274955 0.00485888 -0.0149776 0.01757462 0.02820086 0.0379213 0.03580811 0.06507382 0.09995243 0.12814133 0.16109725 0.12371425 0.08273643 0.09433014 0.05137761 0.04057405 -0.08171598 -0.06541216 0.00126869 0.09223577 0.06811737 0.0795967 0.08689563 0.0928949 0.09971169 0.05413958 0.05410236 0.00120439 0.02454734 0.06450544 0.01508899 -0.06100537 -0.10038889 -0.00651572 0.01095773 0.05517478]
correlation = crossCorrelator(flux0,flux1)
f, axarr = plt.subplots(2)
axarr[0].plot(np.arange(len(flux0)),flux0)
axarr[0].plot(np.arange(len(flux1)),flux1)
axarr[1].plot(np.arange(len(correlation)),correlation)
plt.show()
When I use mode 'same', the correlation array has the same length as the fluxes, and for 'full' it has double that? If len(flux0) and len(flux1) have the dimension of time, what dimension would len(correlation) have?
I am really more looking for a mathematical explanation, the answers I have found so far were more of technical nature...
Given two sequences (a[0], .., a[A-1]) and (b[0], .., b[B-1]) of lengths A and B, respectively, the convolution is calculated as
c[n] = sum_m a[m] * b[n-m]
If mode=="full" then the convolution is calculated for n ranging from 0 to A+B-2, so the return array has A+B-1 elements.
If mode=="same" then scipy.signal.correlate computes the convolution for n ranging from (B-1)/2 to A-1+(B-1)/2, where integer division is assumed. The return array has A elements. numpy.correlate behaves the same way only if A>=B; if A is less than B it switches the two arrays (and the returned array has B elements).
If mode=="valid" then the convolution is calculated for n ranging from min(A,B)-1 to max(A,B)-1, and therefore has max(A,B)-min(A,B)+1 elements.
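The three output ranges can be checked numerically. The sketch below uses numpy.convolve, since the formula above is written as a convolution, with made-up inputs of length A = 7 and B = 3 (A >= B and B odd, so the centring of 'same' is unambiguous):

```python
import numpy as np

a = np.arange(1.0, 8.0)           # A = 7
b = np.array([1.0, 0.0, -1.0])    # B = 3
A, B = len(a), len(b)

full = np.convolve(a, b, mode="full")    # n = 0 .. A+B-2
same = np.convolve(a, b, mode="same")    # n = (B-1)//2 .. A-1+(B-1)//2
valid = np.convolve(a, b, mode="valid")  # n = B-1 .. A-1 (here A >= B)

start = (B - 1) // 2
print(len(full), len(same), len(valid))
print(np.allclose(same, full[start:start + A]))
print(np.allclose(valid, full[B - 1:A]))
```

In other words, 'same' and 'valid' are just slices of the 'full' result, so the index of each output element is a lag (an offset between the two sequences) rather than a position in either input.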

Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?

I'm trying to use some Time Series Analysis in Python, using Numpy.
I have two somewhat medium-sized series, with 20k values each and I want to check the sliding correlation.
The corrcoef gives me as output a Matrix of auto-correlation/correlation coefficients. Nothing useful by itself in my case, as one of the series contains a lag.
The correlate function (in mode="full") returns a 40k elements list that DO look like the kind of result I'm aiming for (the peak value is as far from the center of the list as the Lag would indicate), but the values are all weird - up to 500, when I was expecting something from -1 to 1.
I can't just divide it all by the max value; I know the max correlation isn't 1.
How could I normalize the "cross-correlation" (correlation in "full" mode) so the return values would be the correlation at each lag step instead of those very large, strange values?
You are looking for normalized cross-correlation. This option isn't available yet in Numpy, but a patch is waiting for review that does just what you want. It shouldn't be too hard to apply it I would think. Most of the patch is just doc string stuff. The only lines of code that it adds are
if normalize:
    a = (a - mean(a)) / (std(a) * len(a))
    v = (v - mean(v)) / std(v)
where a and v are the inputted numpy arrays of which you are finding the cross-correlation. It shouldn't be hard to either add them into your own distribution of Numpy or just make a copy of the correlate function and add the lines there. I would do the latter personally if I chose to go this route.
Another, quite possibly better, alternative is to just do the normalization to the input vectors before you send it to correlate. It's up to you which way you would like to do it.
By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation except for dividing by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the standard deviation of the sample vs. sample standard deviation and really won't make much of a difference in my opinion.
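The "normalize the inputs first" alternative can be sketched as follows. The wrapper name normalized_xcorr and the random test signal are assumptions for illustration; the two normalization lines are the ones quoted from the patch above.

```python
import numpy as np

def normalized_xcorr(a, v):
    """Full-mode cross-correlation of standardized inputs (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    v = np.asarray(v, dtype=float)
    # Same scaling as the quoted patch lines
    a = (a - a.mean()) / (a.std() * len(a))
    v = (v - v.mean()) / v.std()
    return np.correlate(a, v, mode="full")

rng = np.random.default_rng(0)
x = rng.standard_normal(200)

auto = normalized_xcorr(x, x)
zero_lag = len(x) - 1  # index of lag 0 in "full" mode for equal-length inputs
print(auto[zero_lag], np.abs(auto).max())
```

For a signal correlated with itself the zero-lag value comes out as 1, and all other lags stay within [-1, 1], which is the scale the question was asking for.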
According to these slides, I would suggest doing it this way:
def cross_correlation(a1, a2):
    lags = range(-len(a1)+1, len(a2))
    cs = []
    for lag in lags:
        idx_lower_a1 = max(lag, 0)
        idx_lower_a2 = max(-lag, 0)
        idx_upper_a1 = min(len(a1), len(a1)+lag)
        idx_upper_a2 = min(len(a2), len(a2)-lag)
        b1 = a1[idx_lower_a1:idx_upper_a1]
        b2 = a2[idx_lower_a2:idx_upper_a2]
        c = np.correlate(b1, b2)[0]
        c = c / np.sqrt((b1**2).sum() * (b2**2).sum())
        cs.append(c)
    return cs
For full mode, would it make sense to compute corrcoef directly on the lagged signal/feature? Code:
from dataclasses import dataclass
from typing import Any, Optional, Sequence
import numpy as np

ArrayLike = Any

@dataclass
class XCorr:
    cross_correlation: np.ndarray
    lags: np.ndarray

def cross_correlation(
    signal: ArrayLike, feature: ArrayLike, lags: Optional[Sequence[int]] = None
) -> XCorr:
    """
    Computes normalized cross correlation between the `signal` and the `feature`.
    Current implementation assumes the `feature` can't be longer than the `signal`.
    You can optionally provide specific lags, if not provided `signal` is padded
    with the length of the `feature` - 1, and the `feature` is slid/padded (creating lags)
    with 0 padding to match the length of the new signal. Pearson product-moment
    correlation coefficients is computed for each lag.
    See: https://en.wikipedia.org/wiki/Cross-correlation
    :param signal: observed signal
    :param feature: feature you are looking for
    :param lags: optional lags, if not provided equals to (-len(feature), len(signal))
    """
    signal_ar = np.asarray(signal)
    feature_ar = np.asarray(feature)
    if np.count_nonzero(feature_ar) == 0:
        raise ValueError("Unsupported - feature contains only zeros")
    assert (
        signal_ar.ndim == feature_ar.ndim == 1
    ), "Unsupported - only 1d signal/feature supported"
    assert len(feature_ar) <= len(
        signal
    ), "Unsupported - signal should be at least as long as the feature"
    padding_sz = len(feature_ar) - 1
    padded_signal = np.pad(
        signal_ar, (padding_sz, padding_sz), "constant", constant_values=0
    )
    lags = lags if lags is not None else range(-padding_sz, len(signal_ar), 1)
    if np.max(lags) >= len(signal_ar):
        raise ValueError("max positive lag must be shorter than the signal")
    if np.min(lags) <= -len(feature_ar):
        raise ValueError("max negative lag can't be longer than the feature")
    assert np.max(lags) < len(signal_ar), ""
    lagged_patterns = np.asarray(
        [
            np.pad(
                feature_ar,
                (padding_sz + lag, len(signal_ar) - lag - 1),
                "constant",
                constant_values=0,
            )
            for lag in lags
        ]
    )
    return XCorr(
        cross_correlation=np.corrcoef(padded_signal, lagged_patterns)[0, 1:],
        lags=np.asarray(lags),
    )
Example:
signal = [0, 0, 1, 0.5, 1, 0, 0, 1]
feature = [1, 0, 0, 1]
xcorr = cross_correlation(signal, feature)
assert xcorr.lags[xcorr.cross_correlation.argmax()] == 4
