Reduce oscillations in spline interpolation - python

I'm trying to interpolate a set of points using the UnivariateSpline function, but I'm getting the usual large oscillations near the ends of the data set. Do you know any way to solve this?
My code looks like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline

x = pd.read_csv('thrustlaw.txt')
x1 = x['Time(sec)']
y1 = x['Thrust(N)']

def splines(x1, y1):
    si = UnivariateSpline(x1, y1, s=0, k=3)
    xs = np.linspace(0, x1[len(x1) - 1], 10000)
    ys = si(xs)
    plt.plot(x1, y1, 'go')
    plt.plot(xs, ys)
    plt.ylabel("Thrust[N]")
    plt.xlabel("Time[sec]")
    plt.title("Thrust curve (splines)")
    plt.grid()
    plt.show()

splines(x1, y1)
Result:

Fitting high-degree polynomials to noisy data tends to do this. An interpolation method that doesn't have this problem is the (unique) piecewise cubic polynomial that, for each pair of successive points i, i+1:
goes through x_i, y_i
goes through x_{i+1}, y_{i+1}
at x_i, has slope (y_{i+1} - y_{i-1}) / (x_{i+1} - x_{i-1})
at x_{i+1}, has slope (y_{i+2} - y_i) / (x_{i+2} - x_i)
So the tangent at each point is parallel to the straight line segment from the previous point to the next. This forces the derivative to be "somewhat similar" to the original data, so it doesn't oscillate wildly.
If I'm not mistaken, this is a Catmull-Rom spline, a particular case of a cubic Hermite spline. Maybe this question will help you implement it in scipy, or to find another interpolation method to your liking.
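If you want to try that idea in scipy directly, here is a minimal sketch (with made-up data) using CubicHermiteSpline and central-difference tangents. Note that np.gradient uses one-sided differences at the endpoints and a second-order formula on non-uniform grids, which departs slightly from the textbook Catmull-Rom tangents:

import numpy as np
from scipy.interpolate import CubicHermiteSpline

# Hypothetical stand-in data; substitute your own time/thrust arrays.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.8, 0.9, 0.1, -0.8, -1.0])

# Catmull-Rom-style tangents: the slope at each interior point parallels
# the chord from the previous point to the next.
dydx = np.gradient(y, x)

spline = CubicHermiteSpline(x, y, dydx)
xs = np.linspace(x[0], x[-1], 1000)
ys = spline(xs)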

Related

savgol_filter from scipy.signal library, get the resulting polynomial function?

savgol_filter gives me the filtered series, but I want to get the underlying polynomial function: the function of the red line in the picture below.
With it I could extrapolate a point beyond the given x range, or find the slope of the function at the two extreme data points.
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
x = np.linspace(0,2*np.pi,100)
y = np.sin(x) + np.random.random(100) * 0.2
yhat = savgol_filter(y, 51, 3) # window size 51, polynomial order 3
plt.plot(x,y)
plt.plot(x,yhat, color='red')
plt.show()
**Edit**
Since the filter uses least-squares regression to fit the data in a small window to a polynomial of a given degree, you can probably only extrapolate from the ends. The fitted curve is effectively a piecewise function of these local fits, and no single window's polynomial is a good representation of the data as a whole. What you could do is take the end windows of your data and fit them with the same polynomial degree as the Savitzky-Golay filter (using np.polyfit). It likely will not be accurate very far beyond the window though.
You can also use scipy.signal.savgol_coeffs() to get the coefficients of the filter. I think you take the dot product of the coefficient array with a window of your data to get the value at a point, and you can include a derivative argument to get the slope at the ends of your data.
https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_coeffs.html#scipy.signal.savgol_coeffs
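Here's a sketch of that idea; I believe the pos and use='dot' arguments behave as commented below, but treat the details as an assumption and check the linked docs:

import numpy as np
from scipy.signal import savgol_coeffs

window, order = 51, 3
dx = x[1] - x[0]  # uniform sample spacing from the question's data

# Coefficients that evaluate the first derivative of the local fit at the
# last position of the window; use='dot' orders them for a direct dot
# product with the data rather than a convolution.
c = savgol_coeffs(window, order, deriv=1, delta=dx, pos=window - 1, use='dot')
slope_at_end = c.dot(y[-window:])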

Discontinuity in numerical derivative of an interpolating cubic spline

I am trying to calculate and plot the numerical derivative (dy/dx) from two lists, x and y. I am using scipy.interpolate.UnivariateSpline and scipy.interpolate.UnivariateSpline.derivative to compute the slope. The plot of y vs x seems to be C1 continuous, and I was expecting the slope dy/dx to be smooth as well when plotted against x. So what is causing the little bump in the plot here? Also, any suggestion on how I can massage the code to make it C1 continuous?
import numpy as np
from matplotlib import pyplot as plt
from scipy.interpolate import UnivariateSpline
x=[20.14141131550861, 20.29161104293003, 20.458574567775457, 20.653802880772922, 20.910446090013004, 21.404599384233677, 21.427939384233678, 21.451279384233676, 21.474619384233677, 21.497959384233678, 21.52129938423368, 21.52130038423368, 21.54463938423368, 21.56797938423368, 21.59131938423368, 21.61465938423368, 21.63799938423368, 22.132152678454354, 22.388795887694435, 22.5840242006919]
y=[-1.6629252348586834, -1.7625046339166028, -1.875358801338162, -2.01040013818419, -2.193327440415778, -2.5538174545988306, -2.571799827167608, -2.5896274995868005, -2.607298426787476, -2.624811539182082, -2.642165776735291, -2.642165776735291, -2.659360089028171, -2.6763934353217587, -2.693264784620056, -2.7099731157324367, -2.7265165368570314, -3.0965791078676754, -3.290845721407758, -3.440799238587583]
spl1 = UnivariateSpline(x,y,s=0)
dydx = spl1.derivative(n=1)
T = dydx(x)
plt.plot(x,y,'-x')
plt.plot(x,T,'-')
plt.show()
The given data points look like they define a nice C1-smooth curve, but they do not. Plotting the slopes (difference of y over difference of x) shows this:
plt.plot(np.diff(y)/np.diff(x))
There are duplicate values of y in the array that look like they don't belong, as well as some near-duplicate (but not identical) values of x.
The easiest way to fix the spline is to allow a tiny bit of smoothing:
spl1 = UnivariateSpline(x, y, s=1e-5)
makes the derivative what you expected:
Removing the "bad apple" also helps, though not as much.
spl1 = UnivariateSpline(x[:10] + x[11:], y[:10] + y[11:], s=0)
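If you would rather drop such points programmatically, a small sketch (the 1e-3 threshold is an assumption; tune it to your data's spacing):

import numpy as np

xa, ya = np.asarray(x), np.asarray(y)
# Keep the first point, then only points that advance x by more than the threshold.
keep = np.concatenate(([True], np.diff(xa) > 1e-3))
spl1 = UnivariateSpline(xa[keep], ya[keep], s=0)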

scipy -- how to integrate a linearly interpolated function?

I have a function which is an interpolation of a relatively large set of data. I use linear interpolation (interp1d), so there are a lot of non-smooth sharp points, like this. The quad function from scipy gives warnings because of the sharp points. How can I do the integration without the warnings?
Thank you!
Thanks for all the answers. Here I summarize the solutions in case others run into the same problem:
Just like what @Stelios did, use points to avoid warnings and to get a more accurate result.
In practice the number of points is usually larger than the default limit (limit=50) of quad, so I choose quad(f_interp, a, b, limit=2*p.shape[0], points=p) to avoid all those warnings.
If a and b are not the start or end point of the data set x, the points p can be chosen by p = x[(x >= a) & (x <= b)].
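Putting those pieces together, a minimal self-contained sketch (the bounds a and b here are made up for illustration):

import numpy as np
from scipy.integrate import quad

x = np.arange(0, 10)
y = np.random.rand(10)
f_interp = lambda xx: np.interp(xx, x, y)

a, b = 2.5, 8.5  # hypothetical bounds inside the data range
p = x[(x >= a) & (x <= b)]  # interior breakpoints for quad
result, error = quad(f_interp, a, b, limit=max(50, 2 * p.shape[0]), points=p)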
quad accepts an optional argument, called points. According to the documentation:
points : (sequence of floats,ints), optional
A sequence of break points in the bounded integration interval where
local difficulties of the integrand may occur (e.g., singularities,
discontinuities). The sequence does not have to be sorted.
In your case, the "difficult" points are exactly the x-coordinates of the data points. Here is an example:
import numpy as np
from scipy.integrate import quad
np.random.seed(123)
# generate random data set
x = np.arange(0,10)
y = np.random.rand(10)
# construct a linear interpolation function of the data set
f_interp = lambda xx: np.interp(xx, x, y)
Here is a plot of the data points and f_interp:
Now calling quad as
quad(f_interp,0,9)
returns a series of warnings along with
(4.89770017785734, 1.3762838395159349e-05)
If you provide the points argument, i.e.,
quad(f_interp,0,9, points = x)
it issues no warnings and the result is
(4.8977001778573435, 5.437539505167948e-14)
which also implies a much greater accuracy of the result compared to the previous call.
Instead of interp1d, you could use scipy.interpolate.InterpolatedUnivariateSpline. That interpolator has the method integral(a, b) that computes the definite integral.
Here's an example:
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
import matplotlib.pyplot as plt
# Create some test data.
x = np.linspace(0, np.pi, 21)
np.random.seed(12345)
y = np.sin(1.5*x) + np.random.laplace(scale=0.35, size=len(x))**3
# Create the interpolator. Use k=1 for linear interpolation.
finterp = InterpolatedUnivariateSpline(x, y, k=1)
# Create a finer mesh of points on which to compute the integral.
xx = np.linspace(x[0], x[-1], 5*len(x))
# Use the interpolator to compute the integral from 0 to t for each
# t in xx.
qq = [finterp.integral(0, t) for t in xx]
# Plot stuff
p = plt.plot(x, y, '.', label='data')
plt.plot(x, y, '-', color=p[0].get_color(), label='linear interpolation')
plt.plot(xx, qq, label='integral of linear interpolation')
plt.grid()
plt.legend(framealpha=1, shadow=True)
plt.show()
The plot:
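As a side note, if you only need the total over the data range, a single call such as finterp.integral(x[0], x[-1]) does it, with no quadrature (and hence no warnings) involved.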

Fitting a Gaussian to a set of x,y data

Firstly, this is an assignment I've been set, so I'm only after pointers, and I am restricted to using the following libraries: NumPy, SciPy and Matplotlib.
We have been given a txt file which includes x and y data for a resonance experiment and have to fit both a Gaussian and a Lorentzian. I'm working on the Gaussian fit at the minute and have tried following the code laid out in a previous question as a basis for my own code. (Gaussian fit for Python)
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

energy, intensity = np.loadtxt('resonance_data.txt', unpack=True)

# Initial guesses for the fit
mean = 30.7
sigma = 10
intensity0 = 45

def gaus(energy, intensity0, energy0, sigma):
    return intensity0 * np.exp(-(energy - energy0)**2 / (sigma**2))

popt, pcov = curve_fit(gaus, energy, intensity, p0=[intensity0, mean, sigma])

plt.plot(energy, intensity, 'o')
plt.xlabel('Energy/eV')
plt.ylabel('Intensity')
plt.title('Plot of Intensity against Energy')
plt.plot(energy, gaus(energy, *popt))
plt.show()
Which returns the following graph
If I keep the expressions for mean and sigma as in the linked question, the fitted curve is a horizontal line, so I'm guessing the problem lies in the curve fit not converging or something.
Looks like your data skews heavily to the left, so why a Gaussian? Why not Boltzmann, log-normal, or something else?
Many of these are already implemented in scipy.stats. See scipy.stats.cauchy for the Lorentzian and scipy.stats.norm for the Gaussian. An example:
import scipy.stats as ss

# Generate a random sample of 100 elements with expected mean=0, std=5
A = ss.norm.rvs(0, 5, size=100)
# Fit both the mean and std
ss.norm.fit_loc_scale(A)
# (-0.13053732553697531, 5.163322485150271)  # your numbers will vary
And I think you don't need the intensity0 parameter: it is just going to be 1/(sigma*sqrt(2*pi)), because the density function has to integrate to 1.
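A sketch of that normalized form (note it uses the standard 2*sigma**2 in the exponent, unlike the question's parameterization, and it only makes sense if your data behaves like a probability density; for raw intensities, keep a free amplitude):

import numpy as np
from scipy.optimize import curve_fit

def gaus_norm(energy, energy0, sigma):
    # Amplitude fixed by normalization to 1/(sigma*sqrt(2*pi))
    return np.exp(-(energy - energy0)**2 / (2*sigma**2)) / (sigma * np.sqrt(2*np.pi))

# energy, intensity as loaded in the question; p0 guesses from above
popt, pcov = curve_fit(gaus_norm, energy, intensity, p0=[30.7, 10])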

Spline representation with scipy.interpolate: Poor interpolation for low-amplitude, rapidly oscillating functions

I need to (numerically) calculate the first and second derivatives of a function, and I have attempted to use both splrep and UnivariateSpline to create splines for the purpose of interpolating the function to take the derivatives.
However, there seems to be an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which oscillate (rapidly).
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
from scipy import interpolate
from numpy import linspace
from math import sin, pi

k = linspace(0, 6.*pi, num=10000)  # interval (0, 6*pi) in 10'000 steps
y = []
A = 1.e0  # Amplitude of sine function
for i in range(len(k)):
    y.append(A*sin(k[i]))
tck = interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M = tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Nevermind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating the spline at the original input points while using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try s=0 instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate

x = np.linspace(0, 6.*np.pi, num=100)  # interval (0, 6*pi) in 100 steps
A = 1.e-4  # Amplitude of sine function
y = A*np.sin(x)

fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its values depend on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
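A small sketch of that scale-aware choice, applied to the low-amplitude case from the question (the 0.05 prefactor is one choice from the discussion above, not a universal constant):

import numpy as np
from scipy import interpolate

x = np.linspace(0, 6*np.pi, 100)
A = 1e-4  # small amplitude, like the problematic case above
y = A*np.sin(x)

# Smoothing factor proportional to (number of points) * variance, so the
# amount of smoothing no longer depends on the data's amplitude.
s = 0.05 * len(y) * np.var(y)
spl = interpolate.UnivariateSpline(x, y, s=s)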
