Shift interpolation does not give expected behaviour - python

When using scipy.ndimage.interpolation.shift to shift a numpy data array along one axis with periodic boundary treatment (mode = 'wrap'), I get an unexpected behavior. The routine tries to force the first pixel (index 0) to be identical to the last one (index N-1) instead of the "last plus one (index N)".
Minimal example:
# module import
import numpy as np
from scipy.ndimage.interpolation import shift
import matplotlib.pyplot as plt
# print scipy.__version__
# 0.18.1
a = range(10)
plt.figure(figsize=(16,12))
for i, shift_pix in enumerate(range(10)):
    # shift the data via spline interpolation
    b = shift(a, shift=shift_pix, mode='wrap')
    # plotting the data
    plt.subplot(5,2,i+1)
    plt.plot(a, marker='o', label='data')
    plt.plot(np.roll(a, shift_pix), marker='o', label='data, roll')
    plt.plot(b, marker='o', label='shifted data')
    if i == 0:
        plt.legend(loc=4, fontsize=12)
    plt.ylim(-1, 10)
    ax = plt.gca()
    ax.text(0.10, 0.80, 'shift %d pix' % i, transform=ax.transAxes)
Blue line: data before the shift
Green line: expected shift behavior
Red line: actual shift output of scipy.ndimage.interpolation.shift
Is there some error in how I call the function, or in how I understand its behavior with mode = 'wrap'? The current results contradict the mode parameter description on the related scipy tutorial page and in another StackOverflow post. Is there an off-by-one error in the code?
The scipy version used is 0.18.1, distributed in anaconda-2.2.0.

It seems that the behaviour you have observed is intentional.
The cause of the problem lies in the C function map_coordinate, which translates coordinates after the shift to coordinates before the shift:
map_coordinate(double in, npy_intp len, int mode)
The function is used as a subroutine in NI_ZoomShift, which does the actual shift. The interesting part is how it handles mode='wrap'.
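A rough Python paraphrase of that branch is sketched below; the function name and the explicit int() conversions are my own approximation of the C logic, not the literal source:
def map_coordinate_wrap(coord, length):
    # Sketch of how map_coordinate() remaps an out-of-range coordinate
    # when mode='wrap': it wraps with period length - 1, not length.
    sz = length - 1
    if coord < 0:
        coord += sz * (int(-coord / sz) + 1)   # integer division, as in the C code
    elif coord > length - 1:
        coord -= sz * int(coord / sz)
    return coord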
Example. Let's see how the output of output = shift(np.arange(10), shift=4, mode='wrap') (from the question) is computed.
NI_ZoomShift computes the edge values output[0] and output[9] in a special way, so let's take a look at the computation of output[1] (a bit simplified):
# input = [0,1,2,3,4,5,6,7,8,9]
# output = [ ,?, , , , , , , , ] '?' == computed position
# shift = 4
output_index = 1
in = output_index - shift # -3
sz = 10 - 1 # 9
in += sz * (int(-in / sz) + 1)
#   += 9 * (int(3 / 9) + 1) == 9 * (0 + 1) == 9
# in == 6
return input[in] # 6
It is clear that sz = len - 1 is responsible for the behaviour you have observed. It was changed from sz = len in a suggestively named commit dating back to 2007: Fix off-by-on errors in ndimage boundary routines. Update tests.
I don't know why such change was introduced. One of the possible explanations that come to my mind is as follows:
Function 'shift' uses splines for interpolation.
The knot vector of a uniform spline on the interval [0, k] is simply [0, 1, 2, ..., k]. When we say that the spline should wrap, it is natural to require equality of the values at knots 0 and k, so that many copies of the spline can be glued together, forming a periodic function:
0--1--2--...--k
              0--1--2--...--k
                            0--1--2-- ...
Maybe shift just treats its input as a list of values for the spline's knots?

It is worth noting that this behavior appears to be a bug, as noted in this SciPy issue:
https://github.com/scipy/scipy/issues/2640
The issue appears to affect every extrapolation mode in scipy.ndimage other than mode='mirror'.
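As a possible workaround (a sketch; the version number and exact output are assumptions worth verifying): newer SciPy releases, 1.6 and later as far as I know, added a separate mode='grid-wrap' that treats the input as periodic with period N rather than N-1, which should match the np.roll-style behaviour expected in the question.
import numpy as np
from scipy.ndimage import shift

a = np.arange(10, dtype=float)
print(shift(a, 4, mode='grid-wrap'))  # period N; expected to agree with np.roll(a, 4)
print(np.roll(a, 4))
print(shift(a, 4, mode='wrap'))       # period N - 1; the behaviour analysed above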

Related

How can I cut a piece away from a plot and set the point I need to zero?

In my work I have the task of reading in a CSV file and doing calculations with it. The CSV file consists of 9 different columns and about 150 lines with different values acquired from sensors. First the horizontal acceleration was determined, from which the distance was derived by double integration. This is shown in the lower of the two plots in the picture. The upper plot represents the so-called force data: the orange graph is the plot of the 9th column of the CSV file and the blue graph is the plot of the 7th column.
As you can see, I have drawn two vertical lines in the lower plot of the picture. These lines mark the x-value which, in the upper plot, is the global minimum of the orange function and its intersection with the blue function. Now I want to do the following, but I need some help: First, I want the intersection point between the first vertical line and the graph to be at (0,0), i.e. the function has to be moved down. How do I achieve this? Furthermore, the piece of the function before this first intersection point (shown in purple) should be omitted, so that the function really only starts at this point. How can I do this?
In the following picture I try to demonstrate how I would like to do that:
If you need my code, here you can see it:
import numpy as np
import matplotlib.pyplot as plt
import math as m
import loaddataa as ld
import scipy.integrate as inte
from scipy.signal import find_peaks
import pandas as pd
import os
# Loading of the values
print(os.path.realpath(__file__))
a,b = os.path.split(os.path.realpath(__file__))
print(os.chdir(a))
print(os.chdir('..'))
print(os.chdir('..'))
path=os.getcwd()
path=path+"\\Data\\1 Fabienne\\Test1\\left foot\\50cm"
print(path)
dataListStride = ld.loadData(path)
indexStrideData = 0
strideData = dataListStride[indexStrideData]
#%%Calculation of the horizontal acceleration
def horizontal(yAngle, yAcceleration, xAcceleration):
    a = ((m.cos(m.radians(yAngle)))*yAcceleration)-((m.sin(m.radians(yAngle)))*xAcceleration)
    return a
resultsHorizontal = list()
for i in range (len(strideData)):
    strideData_yAngle = strideData.to_numpy()[i, 2]
    strideData_xAcceleration = strideData.to_numpy()[i, 4]
    strideData_yAcceleration = strideData.to_numpy()[i, 5]
    resultsHorizontal.append(horizontal(strideData_yAngle, strideData_yAcceleration, strideData_xAcceleration))
resultsHorizontal.insert(0, 0)
#plt.plot(x_values, resultsHorizontal)
#%%
#x-axis "convert" into time: 100 Hertz makes 0.01 seconds
scale_factor = 0.01
x_values = np.arange(len(resultsHorizontal)) * scale_factor
#Calculation of the global high and low points
heel_one=pd.Series(strideData.iloc[:,7])
plt.scatter(heel_one.idxmax()*scale_factor,heel_one.max(), color='red')
plt.scatter(heel_one.idxmin()*scale_factor,heel_one.min(), color='blue')
heel_two=pd.Series(strideData.iloc[:,9])
plt.scatter(heel_two.idxmax()*scale_factor,heel_two.max(), color='orange')
plt.scatter(heel_two.idxmin()*scale_factor,heel_two.min(), color='green')#!
#Plot of force data
plt.plot(x_values[:-1],strideData.iloc[:,7]) #force heel
plt.plot(x_values[:-1],strideData.iloc[:,9]) #force toe
# while - loop to calculate the point of intersection with the blue function
i = heel_one.idxmax()
while strideData.iloc[i,7] > strideData.iloc[i,9]:
    i = i-1
# Length calculation between global minimum orange function and intersection with blue function
laenge=(i-heel_two.idxmin())*scale_factor
print(laenge)
#%% Integration of horizontal acceleration
velocity = inte.cumtrapz(resultsHorizontal,x_values)
plt.plot(x_values[:-1], velocity)
#%% Integration of the velocity
s = inte.cumtrapz(velocity, x_values[:-1])
plt.plot(x_values[:-2],s)
I hope it's clear what I want to do. Thanks for helping me!
I didn't dig all the way through your code, but the following tricks may be useful.
Say you have x and y values:
x = np.linspace(0,3,100)
y = x**2
Now, you only want the values corresponding to, say, .5 < x < 1.5. First, create a boolean mask for the arrays as follows:
mask = np.logical_and(.5 < x, x < 1.5)
(If this seems magical, then run x < 1.5 in your interpreter and observe the results).
Then use this mask to select your desired x and y values:
x_masked = x[mask]
y_masked = y[mask]
Then, you can translate all these values so that the first x,y pair is at the origin:
x_translated = x_masked - x_masked[0]
y_translated = y_masked - y_masked[0]
Is this the type of thing you were looking for?
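A self-contained sketch of the same idea, cutting a curve at a chosen x-position and moving that point to the origin; the curve and the cut position x0 below are made up for illustration, not taken from your data:
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 3, 200)
y = np.sin(3 * x) + x         # stand-in for the force curve
x0 = 1.2                      # x-position of the (hypothetical) intersection

mask = x >= x0                # drop everything before x0
x_cut = x[mask] - x0          # move the cut point to x = 0
y_cut = y[mask] - y[mask][0]  # move the curve down so it starts at y = 0

plt.plot(x, y, label='original')
plt.plot(x_cut, y_cut, label='cut and shifted')
plt.legend()
plt.show()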

Planck's Law, Frequency figures

I want to plot the frequency version of Planck's law. I first tried to do this independently:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
%matplotlib inline
# Planck's Law
# Constants
h = 6.62607015*(10**-34) # J*s
c = 299792458 # m / s
k = 1.38064852*(10**-23) # J/K
T = 20 # K
frequency_range = np.linspace(10**-19,10**19,1000000)
def plancks_law(nu):
    a = (2*h*nu**3) / (c**2)
    e_term = np.exp(h*nu/(k*T))
    brightness = a / (e_term - 1)
    return brightness
plt.plot(frequency_range,plancks_law(frequency_range))
plt.gca().set_xlim([1*10**-16 ,1*10**16 ])
plt.gca().invert_xaxis()
This did not work; I have an issue with scaling somehow. My next idea was to attempt to use this person's code from this question: Plancks Formula for Blackbody spectrum
import matplotlib.pyplot as plt
import numpy as np
h = 6.626e-34
c = 3.0e+8
k = 1.38e-23
def planck_f(freq, T):
    a = 2.0*h*(freq**3)
    b = h*freq/(k*T)
    intensity = a/( (c**2 * (np.exp(b) - 1.0) ))
    return intensity
# generate x-axis in increments from 1nm to 3 micrometer in 1 nm increments
# starting at 1 nm to avoid wav = 0, which would result in division by zero.
wavelengths = np.arange(1e-9, 3e-6, 1e-9)
frequencies = np.arange(3e14, 3e17, 1e14, dtype=np.float64)
intensity4000 = planck_f(frequencies, 4000.)
plt.gca().invert_xaxis()
This didn't work, because I got a divide-by-zero error. Except that I don't see where there is a division by zero: the denominator shouldn't ever be zero, since the exponential term shouldn't ever be equal to one. I chose the frequencies to be the conversions of the wavelength values from the example code.
Can anyone help fix the problem or explain how I can get Planck's law for frequency instead of wavelength?
You cannot safely handle such large numbers; even for comparatively "small" values of b = h*freq/(k*T) your float64 will overflow, e.g. np.exp(709.)=8.218407461554972e+307 is ok, but np.exp(710.)=inf. You'll have to adjust your units (exponents) accordingly to avoid this!
Note that this is also the case in the other question you linked to: if you insert print( np.exp(b)[:10] ) within the definition of planck(), you can examine the first ten evaluated terms and you'll see the overflow in the first few occurrences. In any case, simply use the answer posted in the other question, but convert the x-axis in plt.plot(wavelengths, intensity) to frequency (I hope you know how to get from one to the other) :-)
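As an illustration of the units issue, here is a sketch of plotting the frequency form of Planck's law directly, keeping h*nu/(k*T) in a range float64 can handle by matching the frequency range to the temperature; the temperature and frequency range below are assumptions chosen for the example:
import numpy as np
import matplotlib.pyplot as plt

h = 6.62607015e-34   # J*s
c = 299792458.0      # m/s
k = 1.38064852e-23   # J/K
T = 4000.0           # K (assumed for this example)

def plancks_law_nu(nu, T):
    # 2*h*nu**3/c**2 / (exp(h*nu/(k*T)) - 1); expm1 avoids the "-1" cancellation
    # for small exponents.
    a = 2.0 * h * nu**3 / c**2
    return a / np.expm1(h * nu / (k * T))

# For T = 4000 K the spectrum peaks near 2.4e14 Hz (Wien's law for frequency),
# so going up to ~2e15 Hz keeps the exponent far below the float64 overflow limit.
nu = np.linspace(1e12, 2e15, 100000)
plt.plot(nu, plancks_law_nu(nu, T))
plt.xlabel("frequency [Hz]")
plt.ylabel("spectral radiance [W sr^-1 m^-2 Hz^-1]")
plt.show()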

Gaussian Sum with python

I found this code:
import numpy as np
import matplotlib.pyplot as plt
# We create 1000 realizations with t_max steps each
n_stories = 1000
t_max = 500
t = np.arange(t_max)
# Steps can be -1 or 1 (note that randint excludes the upper limit)
steps = 2 * np.random.randint(0, 1 + 1, (n_stories, t_max)) - 1
# The time evolution of the position is obtained by successively
# summing up individual steps. This is done for each of the
# realizations, i.e. along axis 1.
positions = np.cumsum(steps, axis=1)
# Determine the time evolution of the mean square distance.
sq_distance = positions**2
mean_sq_distance = np.mean(sq_distance, axis=0)
# Plot the distance d from the origin as a function of time and
# compare with the theoretically expected result where d(t)
# grows as a square root of time t.
plt.figure(figsize=(10, 7))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.tight_layout()
plt.show()
Instead of steps of just -1 or 1, I would like the steps to follow a standard normal distribution ... when I insert np.random.normal(0,1,1000) instead of np.random.randint(...) it is not working.
I am really new to Python btw.
Many thanks in advance and Kind regards
You are passing a single number as the third parameter of np.random.normal, therefore you get a 1d array instead of a 2d one; see the documentation. Try this:
steps = np.random.normal(0, 1, (n_stories, t_max))
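Dropped into the script from the question it would look roughly like this (a sketch; since the steps have unit variance, the mean square distance still grows linearly in t, so the sqrt(t) reference curve remains meaningful):
import numpy as np
import matplotlib.pyplot as plt

n_stories = 1000
t_max = 500
t = np.arange(t_max)

# Standard-normal steps instead of -1/+1 steps
steps = np.random.normal(0, 1, (n_stories, t_max))
positions = np.cumsum(steps, axis=1)
mean_sq_distance = np.mean(positions**2, axis=0)

plt.figure(figsize=(10, 7))
plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
plt.xlabel(r"$t$")
plt.tight_layout()
plt.show()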

Python - Iter through identified component features

I am facing a huge problem. Using the Python libraries NumPy and SciPy, I identified several features in a large array. For this purpose, I created a 3x3 neighbor structure and used it for a connected component analysis --> see docs.
struct = scipy.ndimage.generate_binary_structure(2,2)
labeled_array, num_features = ndimage.label(array,struct)
My problem now is that I want to iterate through all identified features in a loop. Does anyone have an idea how to address the individual features in the resulting NumPy array?
Here's an example of handling features identified by ndimage.label. Whether this helps you or not depends on what you want to do with the features.
import numpy as np
import scipy.ndimage as ndi
import matplotlib.pyplot as plt
# Make a small array for the demonstration.
# The ndimage.label() function treats 0 as the "background".
a = np.zeros((16, 16), dtype=int)
a[:6, :8] = 1
a[9:, :5] = 1
a[8:, 13:] = 2
a[5:13, 6:12] = 3
struct = ndi.generate_binary_structure(2, 2)
lbl, n = ndi.label(a, struct)
# Plot the original array.
plt.figure(figsize=(11, 4))
plt.subplot(1, n + 1, 1)
plt.imshow(a, interpolation='nearest')
plt.title("Original")
plt.axis('off')
# Plot the isolated features found by label().
for i in range(1, n + 1):
    # Make an array of zeros the same shape as `a`.
    feature = np.zeros_like(a, dtype=int)
    # Set the elements that are part of feature i to 1.
    # Feature i consists of elements in `lbl` where the value is i.
    # This statement uses numpy's "fancy indexing" to set the corresponding
    # elements of `feature` to 1.
    feature[lbl == i] = 1
    # Make an image plot of the feature.
    plt.subplot(1, n + 1, i + 1)
    plt.imshow(feature, interpolation='nearest', cmap=plt.cm.copper)
    plt.title("Feature {:d}".format(i))
    plt.axis('off')
plt.show()
Here's the image generated by the script:
Just a quick note on an alternative way to solve the above-mentioned problem. Instead of using NumPy "fancy indexing" one could also use the ndimage find_objects function.
Example:
# find_objects returns a list of slices for the labeled array; the slices
# represent the position of the features in the labeled area.
s = ndi.find_objects(lbl, max_label=0)
# Then you can simply output the patches
for i in range(n):
    print(a[s[i]])
I will leave the question open because I couldn't solve an additional problem that arose. I want to get the size of the features (already solved, quite easy via ndi.sum()) as well as the number of non-labeled cells in the direct vicinity of each feature (i.e. counting the number of zeros around the feature).
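For the follow-up part, one possible approach (a sketch using the arrays from the answer above; the helper name zeros_around_feature is my own) is to dilate the mask of a feature with the same 3x3 structure and count the background cells in the resulting one-cell ring:
import numpy as np
import scipy.ndimage as ndi

def zeros_around_feature(lbl, i, struct):
    # Cells belonging to feature i
    feature_mask = (lbl == i)
    # One-cell-wide ring around the feature, using the same neighbourhood
    ring = ndi.binary_dilation(feature_mask, structure=struct) & ~feature_mask
    # Count the ring cells that are background (label 0)
    return int(np.count_nonzero(ring & (lbl == 0)))

# usage with lbl, n and struct as defined in the answer above:
# for i in range(1, n + 1):
#     size = np.count_nonzero(lbl == i)
#     print(i, size, zeros_around_feature(lbl, i, struct))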

Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?

I'm trying to use some Time Series Analysis in Python, using Numpy.
I have two somewhat medium-sized series, with 20k values each and I want to check the sliding correlation.
The corrcoef gives me as output a Matrix of auto-correlation/correlation coefficients. Nothing useful by itself in my case, as one of the series contains a lag.
The correlate function (in mode="full") returns a 40k-element list that DOES look like the kind of result I'm aiming for (the peak value is as far from the center of the list as the lag would indicate), but the values are all weird: up to 500, when I was expecting something from -1 to 1.
I can't just divide it all by the max value; I know the max correlation isn't 1.
How could I normalize the "cross-correlation" (correlation in "full" mode) so the return values would be the correlation on each lag step instead of those very large, strange values?
You are looking for normalized cross-correlation. This option isn't available yet in Numpy, but a patch is waiting for review that does just what you want. It shouldn't be too hard to apply it I would think. Most of the patch is just doc string stuff. The only lines of code that it adds are
if normalize:
    a = (a - mean(a)) / (std(a) * len(a))
    v = (v - mean(v)) / std(v)
where a and v are the inputted numpy arrays of which you are finding the cross-correlation. It shouldn't be hard to either add them into your own distribution of Numpy or just make a copy of the correlate function and add the lines there. I would do the latter personally if I chose to go this route.
Another, quite possibly better, alternative is to just do the normalization to the input vectors before you send it to correlate. It's up to you which way you would like to do it.
By the way, this does appear to be the correct normalization as per the Wikipedia page on cross-correlation except for dividing by len(a) rather than (len(a)-1). I feel that the discrepancy is akin to the standard deviation of the sample vs. sample standard deviation and really won't make much of a difference in my opinion.
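A sketch of the "normalize the inputs first" variant described above; the function name is mine and the delayed-copy test signal is only for illustration:
import numpy as np

def normalized_xcorr_full(a, v):
    # Same normalization as in the patch quoted above, applied up front
    a = (a - np.mean(a)) / (np.std(a) * len(a))
    v = (v - np.mean(v)) / np.std(v)
    return np.correlate(a, v, mode='full')

# Quick check: a copy of a random signal delayed by 100 samples
rng = np.random.default_rng(0)
x = rng.normal(size=20000)
y = np.concatenate([np.zeros(100), x[:-100]])
c = normalized_xcorr_full(x, y)
peak = c.argmax()
print(abs(peak - (len(x) - 1)))  # distance of the peak from the zero-lag index: 100
print(c[peak])                   # close to 1, slightly below because of the zero padding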
According to these slides, I would suggest doing it this way:
def cross_correlation(a1, a2):
    lags = range(-len(a1)+1, len(a2))
    cs = []
    for lag in lags:
        idx_lower_a1 = max(lag, 0)
        idx_lower_a2 = max(-lag, 0)
        idx_upper_a1 = min(len(a1), len(a1)+lag)
        idx_upper_a2 = min(len(a2), len(a2)-lag)
        b1 = a1[idx_lower_a1:idx_upper_a1]
        b2 = a2[idx_lower_a2:idx_upper_a2]
        c = np.correlate(b1, b2)[0]
        c = c / np.sqrt((b1**2).sum() * (b2**2).sum())
        cs.append(c)
    return cs
For full mode, would it make sense to compute corrcoef directly on the lagged signal/feature? Code:
from dataclasses import dataclass
from typing import Any, Optional, Sequence
import numpy as np
ArrayLike = Any
@dataclass
class XCorr:
    cross_correlation: np.ndarray
    lags: np.ndarray

def cross_correlation(
    signal: ArrayLike, feature: ArrayLike, lags: Optional[Sequence[int]] = None
) -> XCorr:
    """
    Computes normalized cross correlation between the `signal` and the `feature`.
    Current implementation assumes the `feature` can't be longer than the `signal`.
    You can optionally provide specific lags, if not provided `signal` is padded
    with the length of the `feature` - 1, and the `feature` is slid/padded (creating lags)
    with 0 padding to match the length of the new signal. Pearson product-moment
    correlation coefficients is computed for each lag.
    See: https://en.wikipedia.org/wiki/Cross-correlation
    :param signal: observed signal
    :param feature: feature you are looking for
    :param lags: optional lags, if not provided equals to (-len(feature), len(signal))
    """
    signal_ar = np.asarray(signal)
    feature_ar = np.asarray(feature)
    if np.count_nonzero(feature_ar) == 0:
        raise ValueError("Unsupported - feature contains only zeros")
    assert (
        signal_ar.ndim == feature_ar.ndim == 1
    ), "Unsupported - only 1d signal/feature supported"
    assert len(feature_ar) <= len(
        signal
    ), "Unsupported - signal should be at least as long as the feature"
    padding_sz = len(feature_ar) - 1
    padded_signal = np.pad(
        signal_ar, (padding_sz, padding_sz), "constant", constant_values=0
    )
    lags = lags if lags is not None else range(-padding_sz, len(signal_ar), 1)
    if np.max(lags) >= len(signal_ar):
        raise ValueError("max positive lag must be shorter than the signal")
    if np.min(lags) <= -len(feature_ar):
        raise ValueError("max negative lag can't be longer than the feature")
    assert np.max(lags) < len(signal_ar), ""
    lagged_patterns = np.asarray(
        [
            np.pad(
                feature_ar,
                (padding_sz + lag, len(signal_ar) - lag - 1),
                "constant",
                constant_values=0,
            )
            for lag in lags
        ]
    )
    return XCorr(
        cross_correlation=np.corrcoef(padded_signal, lagged_patterns)[0, 1:],
        lags=np.asarray(lags),
    )
Example:
signal = [0, 0, 1, 0.5, 1, 0, 0, 1]
feature = [1, 0, 0, 1]
xcorr = cross_correlation(signal, feature)
assert xcorr.lags[xcorr.cross_correlation.argmax()] == 4
