Smoothing spline with continuous 4th and 5th order derivatives in Python?

I have some noisy time series that represent input parameters as a function of time for a set of ODEs. These noisy time series need to be interpolated and the derivatives of that interpolation function need to be continuous up to the same number of orders as my ODE solver. Otherwise an Nth order ODE solver will try to take extremely small adaptive time steps to deal with the discontinuous jumps in the Nth order derivative. So for example, for the standard RK45 solver in scipy.integrate.solve_ivp, all derivatives of the interpolation function up to at least 5th order must be continuous, I think.
What is the best way to create a smoothing spline in Python such that its first 5 derivatives will be continuous? I am having a strangely difficult time accomplishing this simple task. Below is a minimal working example showing that even when using a 5th-order smoothing UnivariateSpline from scipy, the 4th-order derivative shows discontinuities/oscillations and the 5th-order derivative just explodes (~1e6 magnitude). This is despite first Gaussian-smoothing my noisy time series to help the interpolation function.
I suspect this is at least one reason why the scipy RK45 solver is taking such a long time to solve my system of ODEs -- it's taking an unnecessarily large number of small timesteps (though the fact that I'm using the default solve_ivp tolerances could also be playing a role).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import UnivariateSpline
x = np.sort(np.random.uniform(0.1,10,500))
dist = lognorm(0.5,loc=1.0)
y_noisy = dist.pdf(x) + np.random.uniform(0.0,1,500)
y_smooth = gaussian_filter1d(y_noisy, 10, mode='nearest')  # pre-smooth the noisy series
spl = UnivariateSpline(x, y_smooth, k=5, s=2)              # 5th-order smoothing spline
xx = np.linspace(0.1, 10, 100000)
yy = spl(xx)
# successive finite-difference approximations of the derivatives
d1 = np.diff(yy) / np.diff(xx)
d2 = np.diff(d1) / np.diff(xx[1:])
d3 = np.diff(d2) / np.diff(xx[1:-1])
d4 = np.diff(d3) / np.diff(xx[1:-2])
d5 = np.diff(d4) / np.diff(xx[1:-3])
fig, axes = plt.subplots(nrows=6,ncols=1,figsize=(8,8))
fig.subplots_adjust(hspace=0.7)
axes[0].plot(x,y_noisy,'k-',lw=2,alpha=0.1)
axes[0].plot(x,y_smooth,'y-',lw=2)
axes[0].plot(xx, yy)
axes[0].set_title('5th-order smoothing UnivariateSpline')
axes[1].plot(xx[1:], d1)
axes[1].set_title('first derivative')
axes[2].plot(xx[1:-1], d2)
axes[2].set_title('second derivative')
axes[3].plot(xx[2:-1], d3)
axes[3].set_title('third derivative')
axes[4].plot(xx[3:-1], d4)
axes[4].set_title('fourth derivative')
axes[5].plot(xx[4:-1], d5)
axes[5].set_title('fifth derivative')

I don't think this is a valid way to check whether a function is continuous at the fifth derivative. I think the timestep is so small that the plots of the fourth and fifth derivatives are getting overwhelmed by floating point errors. The fourth and fifth differences are so tiny that small errors have a big effect.
To test this, I replaced the spline that you fit with np.sin(), which I know has continuous derivatives.
Sure enough, it says the fourth derivative is jumping around a whole lot. But that can't be - the fourth derivative of sine is just sine.
If I decrease the number of timesteps to 1000, it correctly finds the fourth and fifth derivative. Applying the same technique to the original function, I get this:
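Here is a minimal sketch of that check (a reconstruction rather than my exact code): repeated finite differences of np.sin on only 1000 points stay smooth up to the fifth derivative, whereas the same differencing on 100000 points is swamped by floating point error.
import numpy as np
import matplotlib.pyplot as plt
xx = np.linspace(0.1, 10, 1000)   # far fewer points than the 100000 used above
yy = np.sin(xx)
fig, axes = plt.subplots(nrows=6, ncols=1, figsize=(8, 8))
fig.subplots_adjust(hspace=0.7)
axes[0].plot(xx, yy)
axes[0].set_title('np.sin(x)')
d = yy
xs = xx
for n in range(1, 6):
    d = np.diff(d) / np.diff(xs)   # successive finite differences
    xs = xs[1:]                    # trim the abscissa to keep lengths matched
    axes[n].plot(xs, d)
    axes[n].set_title('derivative %d (finite differences)' % n)
plt.show()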

Related

Building a Sympy function to differentiate and integrate from a Scipy interpolation

During a project I've been working on, I collected experimental temperature data as a function of time from a metal casting at 4 locations in the system. This temperature profile is complex: a period of initial cooling, then an enormous, sudden rise in temperature as the alloy arrives, followed by a final period of cooling.
To understand the temperature environment between the measurement locations, I'm trying to use Python to solve the heat equation, which requires a combination of symbolic derivatives and integrals (for which I've used Sympy) and numerical calculations (for which I've used lambdify and numpy).
The issue comes when I want to use the collected temperature data as a boundary condition in the calculation. I've used Scipy to interpolate between the data points to obtain a complete temperature data set for all times (and to obtain a new spline representing the derivative of the original spline), but I cannot get this interpolation into a format that Sympy will understand for the derivatives and integrals in the calculation.
Any advice or suggestions?
########################################################################################################
The code I've used for the opening step of defining the interpolation is detailed below. I appreciate it would be more efficiently written as a matrix, but I find it easier to see it all when it's written out longhand (I generally simplify later if needed):
Note: The time (the x-axis parameter in the interpolation) is a strictly increasing parameter
import numpy as np #import the relevant packages/items
import pandas as pd #needed for pd.read_csv below
from scipy import signal
import sympy as sp
from scipy.interpolate import UnivariateSpline
x=sp.Symbol("x", real=True, positive=True) #define the sympy symbols
L=sp.Symbol("L", real=True, positive=True)
filename='.../Temperature Data for Code.csv'
data = np.array(pd.read_csv(filename,skiprows=1, header=None)) #reading in the datasets from file
Time_exp=np.array(data[:,0]) #assign the time (x-axis)
T_Alloy_centre_orig=np.array(data[:,1]) +273 #assign 4 temperature (y-axis) and convert to K
T_Alloy_edge_orig=np.array(data[:,2]) +273
T_Inner_orig=np.array(data[:,9]) +273
T_Outer_orig=np.array(data[:,11]) +273
T_Alloy_centre=T_Alloy_centre_orig #create copy of the original data before manipulation
T_Alloy_edge=T_Alloy_edge_orig
T_Inner=T_Inner_orig
T_Outer=T_Outer_orig
T_Alloy_centre=signal.savgol_filter(T_Alloy_centre,3,1) #basic filter to smooth experimental noise
T_Alloy_edge=signal.savgol_filter(T_Alloy_edge,3,1)
T_Inner=signal.savgol_filter(T_Inner,3,1)
T_Outer=signal.savgol_filter(T_Outer,3,1)
T_Alloy_centre_xt = UnivariateSpline(Time_exp, T_Alloy_centre,k=3) #perform spline interpolation
T_Alloy_edge_xt = UnivariateSpline(Time_exp, T_Alloy_edge,k=3)
T_Inner_xt = UnivariateSpline(Time_exp, T_Inner,k=3)
T_Outer_xt = UnivariateSpline(Time_exp, T_Outer,k=3)
diff_T_Alloy_centre_xt = T_Alloy_centre_xt.derivative(n=1) #new spline for derivative of previous
diff_T_Alloy_edge_xt = T_Alloy_edge_xt.derivative(n=1)
diff_T_Inner_xt = T_Inner_xt.derivative(n=1)
diff_T_Outer_xt = T_Outer_xt.derivative(n=1)
#############################################################################
Here is where the speculation begins. I've tried several things to convert the resulting interpolation into something that can be used by Sympy, but unsuccessfully.
First, I tried a sympy.implemented_function approach and just using a lambda, although this just gave a string as far as I can tell:
T_Alloy_centre_f = implemented_function('T_Alloy_centre_f', lambda t: T_Alloy_centre_xt)
T_Alloy_centre_f = lambda t: T_Alloy_centre_xt
Secondly, I tried using the interpolation functions available within Sympy (interpolating_spline), although this ran for 15 minutes without producing a result for even one of the 4 measurements. It would be useful if this worked, as it is already within Sympy, but the calculation time is extreme, possibly because the data is not smooth, featuring a sudden, massive increase in temperature on arrival of the molten alloy.
T_Alloy_centre_xt = sp.interpolating_spline(3, x, Time_exp, T_Alloy_centre)
Finally, I pulled the spline coefficients and knots out of the interpolation with the aim of building the function manually before converting, but I could not come up with a convenient way of getting the piecewise function. Nor did the previous implemented_function approach seem to work here either.
spline_coeffs = T_Alloy_centre_xt.get_coeffs()
spline_knots = T_Alloy_centre_xt.get_knots()
I'm not sure how to proceed from here. I need something from this interpolation that can be passed through sp.diff and sp.integrate.
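For illustration, here is a rough sketch of what I imagine the manual piecewise route could look like (not working project code: the Time_exp / T_Alloy_centre arrays below are hypothetical stand-ins, it assumes scipy's PPoly.from_spline for the per-interval polynomial coefficients, and it introduces a new Sympy symbol t for time):
import numpy as np
import sympy as sp
from scipy.interpolate import splrep, PPoly
t = sp.Symbol("t", real=True, nonnegative=True)   # time symbol (not in the code above)
# hypothetical stand-ins for Time_exp / T_Alloy_centre
Time_exp = np.linspace(0.0, 100.0, 40)
T_Alloy_centre = 300.0 + 900.0*np.exp(-((Time_exp - 30.0)/10.0)**2)
tck = splrep(Time_exp, T_Alloy_centre, k=3, s=0)
pp = PPoly.from_spline(tck)
pieces = []
for i in range(pp.c.shape[1]):
    if pp.x[i] == pp.x[i + 1]:        # skip zero-width intervals at the boundary knots
        continue
    # local power-basis polynomial on [x[i], x[i+1])
    poly = sum(pp.c[m, i] * (t - pp.x[i])**(pp.c.shape[0] - 1 - m)
               for m in range(pp.c.shape[0]))
    pieces.append((sp.expand(poly), sp.And(t >= pp.x[i], t < pp.x[i + 1])))
pieces.append((0, True))              # fallback outside the data range
T_Alloy_centre_sym = sp.Piecewise(*pieces)
# this symbolic expression can now be passed through sp.diff and sp.integrate
dT_dt = sp.diff(T_Alloy_centre_sym, t)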
###########################################################################
#if I can get past the above conversion, the next step in the code is evaluating the derivative at a specified value and performing an integral as below:
F_A=-1*sp.diff(T_Inner_f, t) - x/L*(sp.diff(T_Outer_f,t)-sp.diff(T_Inner_f,t))
f_n_A=(2/L)*sp.integrate(F_A*sp.sin(lamda*x),(x,0,L))
Any assistance would be hugely appreciated.

Problem solving a second-order differential equation with odeint with a driving force

I need your help because I want to code the movement of a tower under a sinusoidal excitation. The problem is that when I plot the result, there is a kind of sinusoidal noise which looks abnormal and I don't know where it comes from... I was expecting a smoother curve, as is normally the case for a driven damped harmonic oscillator.
Below is the equation of the movement:
ddx1 + (f1/m1)*dx1 + (k1/m1)*x1 = omega^2*Em*sin(omega*t)
with the initial conditions: x0 = 0 m and v0=dx0=0 m/s
here is my code:
from math import *
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
#params
m1=264000000. # kg
f1 = 5000000. # kg/s
k1=225000000. # N/m
#initial displacement of the tower:
x0 = 0. # m
dx0 = 0. # m/s
N=1000000
duration=200
time = np.linspace(0, duration, N)
# Creating the excitation
#sinusoidal excitation
def entry(Em, f, t):
    omega = 2*np.pi*f
    return -omega**2*Em*np.sin(omega*t)
# Equation: ddx1 + (f1/m1)*dx1 + (k1/m1)*x1 = omega^2*Em*sin(omega*t)
# Solving
def dX(X, t):
    # X = [x1, dx1]
    A = np.array([[   0  ,     1  ],
                  [-k1/m1, -f1/m1]])
    B = np.array([0, -entry(1, 50, t)])
    return np.dot(A, X) + B
result = odeint(dX,[x0,dx0],time)
plt.plot(time, result[:, 0])
plt.show()
And here are some pictures:
a first picture
and here when I zoom-in
Could you please tell me what is wrong with my code?
Thank you in advance for your help!
[EDIT] I tried the code with smaller frequencies and it was more like what I expected. What I hadn't thought of, as pointed out by JustLearning, is that the difference between the natural frequency and the driving frequency is very important, and therefore it is in fact normal to have these micro-oscillations. Concerning the values of the parameters, they are indeed very important because they are those of the Taipei tower. But as what appears each time is a ratio of all these quantities, I think (but I could be wrong) that Python has no trouble doing the calculations.
I am really new to this so thank you for answering so quickly and helping me.
Assessing what's wrong with your ODE purely based on your plots is probably not wise. In order to check whether your code makes sense when run, you should probably go for convergence tests: pick a known analytic solution and check that the L2 norm of |numerical - analytic| decreases as expected as you make the timestep smaller.
That said, looking only at the oscillation on top of the oscillation in your plots, what you see is nothing more than a superposition of the unforced damped oscillation and the forcing term. The reason why this superposed oscillation looks so microscopic is that its frequency is roughly FIFTY times larger than the natural+damped frequency of the oscillator. If you change the 50 in entry to something smaller, say 1, you will find that the natural+damped oscillation and the forcing superpose with roughly similar frequencies. Try 0.1 for an even more comparable superposition of all the oscillations in play.
By the way, given your very large parameters, you really may want to do a convergence test and, if not successful, try an ODE solver that can handle stiff ODEs -- something the odeint default solver can't manage most of the time!
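As a rough illustration of the convergence-test idea (my own sketch with an assumed toy problem, not your tower parameters): take the undamped oscillator x'' + x = 0 with x(0)=1, x'(0)=0, whose analytic solution is cos(t); since odeint steps adaptively, tightening rtol/atol plays the role of shrinking the timestep.
import numpy as np
from scipy.integrate import odeint
def rhs(X, t):
    x, v = X
    return [v, -x]            # x'' + x = 0
t = np.linspace(0.0, 20.0, 2001)
for tol in (1e-4, 1e-6, 1e-8, 1e-10):
    sol = odeint(rhs, [1.0, 0.0], t, rtol=tol, atol=tol)
    err = np.sqrt(np.trapz((sol[:, 0] - np.cos(t))**2, t))  # discrete L2 error
    print("tol=%.0e  L2 error=%.3e" % (tol, err))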

How do I numerically integrate a function that's a product of a Lorentzian and a cosine in Python?

I am new to stackoverflow and also quite new to Python. So, I hope to ask my question in an appropriate manner.
I am running Python code similar to this minimal example, with an example function that is a product of a Lorentzian and a cosine that I want to integrate numerically:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
#minimal example:
omega_loc = 15
gamma = 5
def Lorentzian(w):
    #print(w)
    return (w**3)/((w/omega_loc) + 1)**2*(gamma/2)/((w-omega_loc)**2+(gamma/2)**2)
def intRe(t):
    return quad(lambda w: w**(-2)*Lorentzian(w)*(1-np.cos(w*t)), 0, np.inf, limit=10000)[0]
plt.figure(1)
plot_range = np.linspace(0,100,1000)
plt.plot(plot_range, [intRe(t) for t in plot_range])
Independent of the upper limit of the integration, I never get the code to run to completion and give me a result.
When I enable the #print(w) line, it seems like the code just keeps probing the integrand at different values of w in what looks like an infinite loop. The console also reports a roundoff error.
Is there a different way to do numerical integration in Python that is better suited to this kind of function than quad, or did I make a more fundamental error?
Observations
Close to zero, (1 - cos(w*t)) / w**2 tends to 0/0. We can use the Taylor expansion t**2 * (1/2 - (w*t)**2/24).
Going to infinity, the Lorentzian is a constant and the cosine term will cause the integrand to oscillate indefinitely; the integral can be approximated by multiplying that term by a slowly decreasing damping factor.
You are using a linearly spaced scale with many points. It is easier to visualize with t on a log scale.
The plot looks like this before damping the cosine term
I introduced two parameters to tune the attenuation of the oscillations
def cosinus_term(w, t, damping=1e4*omega_loc):
    # Taylor expansion near w = 0; exponentially damped cosine otherwise
    return np.where(abs(w*t) < 1e-6, t**2*(0.5 - (w*t)**2/24.0), (1-np.exp(-abs(w/damping))*np.cos(w*t))/w**2)
def intRe(t, damping=1e4*omega_loc):
    return quad(lambda w: cosinus_term(w, t, damping)*Lorentzian(w), 0, np.inf, limit=10000)[0]
Plotting with the following code
plt.figure(1)
plot_range = np.logspace(-3,3,100)
plt.plot(plot_range, [intRe(t, 1e2*omega_loc) for t in plot_range])
plt.plot(plot_range, [intRe(t, 1e3*omega_loc) for t in plot_range])
plt.xscale('log')
It runs in less than 3 minutes here, and the two results are close to each other, especially at large t, suggesting that the damping doesn't affect the result too much.

How can I find the break frequencies/3 dB points from bandpass filter frequency sweep data in Python?

The data that I have is stored in a 2D list where one column represents a frequency and the other column is its corresponding dB value. I would like to programmatically identify the frequencies of the 3 dB points on either end of the passband. I have a few ideas on how to do this, but they each have drawbacks.
Find maximum point then the average of points in the passband then find points about 3dB lower
Use the sympy library to perform numerical differentiation and identify the critical points/inflection points
use a histogram/bin function to find the amplitude of the passband.
drawbacks
sensitive to spikes, not quite sure how to do this
I don't understand the math involved and the data is noisy, which could lead to a lot of false positives
correlating the amplitude values with list index values could be tricky
Can you think of better ideas and/or ways to implement what I have described?
Assuming that you've loaded multiple readings of the PSD from the signal analyzer, try averaging them before attempting to find the band edges. If the signal isn't changing too dramatically, the averaging process might smooth away the peaks, valleys, and noise within the passband, making it easier to find the edges. This is what many spectrum analyzers do to produce a smoother PSD.
In case that wasn't clear, assume that each reading gives you 128 tuples of frequency and power and that you capture 100 of these buffers of data. Now average the 100 samples from bin 0, then the samples from bins 1, 2, ..., 127. Now try to locate the passband on this data. It should be easier than on any single buffer. Note I used 100 as an example; if your data is very noisy it may require more, and if there isn't much noise, fewer.
Be careful when doing the averaging. Your data is in dB. To add the samples together in order to find an average, you must first convert the dB data back to linear power, do the adds, do the divide to find the average, and then convert the averaged power back into dB.
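A short sketch of that dB averaging (with placeholder data; readings_db here is a hypothetical array standing in for your 100 captured buffers of 128 dB readings):
import numpy as np
readings_db = 10.0 * np.log10(np.random.rand(100, 128) + 1.0)  # placeholder PSD buffers in dB
linear = 10.0 ** (readings_db / 10.0)   # dB -> linear power
avg_linear = linear.mean(axis=0)        # average each frequency bin across buffers
avg_db = 10.0 * np.log10(avg_linear)    # averaged power back to dB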
OK, it seems this has to be solved by data analysis. I would propose these steps:
Preprocess your data if you suspect it to be too noisy. I'd suggest either a moving-average filter (np.convolve(data, np.ones(n)/n, "same")) or, better, a Savitzky-Golay filter (scipy.signal.savgol_filter(data, n, polyorder=3)), because you will be interested in extrema of the data, which would be unnecessarily distorted by the moving-average filter. You might also want to get rid of artifacts like 60 Hz noise at this stage.
If the signal you are interested in lives in a narrow band, the spectrum will be a single pronounced peak. In that case you could just fit a curve to your data; a Gaussian would be appropriate in that case.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
freq, pow = read_in_your_data_here()
freq, pow = np.asarray(freq), np.asarray(pow)
def gauss(x, a, mu, sig):
    return a*np.exp(-(x-mu)**2/(2.*sig**2))
(a, mu, sig), _ = curve_fit(gauss, freq, pow)
fitted_curve = gauss(freq, a, mu, sig)
plt.plot(freq, pow)
plt.plot(freq, fitted_curve)
plt.vlines(mu, min(pow)-2, max(pow)+2)
plt.show()
center_idx = np.absolute(freq-mu).argmin()
pow_center = pow[center_idx]
pow_3db = pow_center - 3.
def interv_from_binvec(data):
    # the first/last switches of the boolean mask mark the band edges
    indicator = np.convolve(data, [-1, 1], "same")
    return indicator.argmin(), indicator.argmax()
passband_idx = interv_from_binvec(pow > pow_3db)
passband = freq[passband_idx[0]], freq[passband_idx[1]]
This is more an example than a solution, and it relies heavily on the assumption that you are searching for and finding a high-SNR signal with a narrow band. It could be extended to handle more than one signal by use of a mixture model.
You can use scipy's UnivariateSpline and leastsq methods:
Create a spline of y-(np.max(y)-3)
Find the roots of it.
Calculate the difference between the two roots.
import numpy as np
from scipy.interpolate import UnivariateSpline
from scipy.optimize import leastsq
x = df["Wavelength / nm"]
y = df["Power / dBm"]
#create spline
spline = UnivariateSpline(x, y-(np.max(y)-3), s=0)
# find the roots
r1, r2 = spline.roots()
# calculate the difference
threedB_bandwidth = abs(r2-r1)

Spline representation with scipy.interpolate: Poor interpolation for low-amplitude, rapidly oscillating functions

I need to (numerically) calculate the first and second derivatives of a function, for which I've attempted to use both splrep and UnivariateSpline to create splines that interpolate the function so I can take the derivatives.
However, it seems that there's an inherent problem in the spline representation itself for functions whose magnitude is of order 10^-1 or lower and which are (rapidly) oscillating.
As an example, consider the following code to create a spline representation of the sine function over the interval (0,6*pi) (so the function oscillates three times only):
import scipy
from scipy import interpolate
import numpy
from numpy import linspace
import math
from math import sin, pi
k = linspace(0, 6.*pi, num=10000) #interval (0,6*pi) in 10'000 steps
y=[]
A = 1.e0 # Amplitude of sine function
for i in range(len(k)):
    y.append(A*sin(k[i]))
tck =interpolate.UnivariateSpline(x, y, w=None, bbox=[None, None], k=5, s=2)
M=tck(k)
Below are the results for M for A = 1.e0 and A = 1.e-2
http://i.imgur.com/uEIxq.png Amplitude = 1
http://i.imgur.com/zFfK0.png Amplitude = 1/100
Clearly the interpolated function created by the splines is totally incorrect! The second graph does not even oscillate at the correct frequency.
Does anyone have any insight into this problem? Or know of another way to create splines within numpy/scipy?
Cheers,
Rory
I'm guessing that your problem is due to aliasing.
What is x in your example?
If the x values that you're interpolating at are less closely spaced than your original points, you'll inherently lose frequency information. This is completely independent from any type of interpolation. It's inherent in downsampling.
Never mind the above bit about aliasing. It doesn't apply in this case (though I still have no idea what x is in your example...).
I just realized that you're evaluating your points at the original input points when you're using a non-zero smoothing factor (s).
By definition, smoothing won't fit the data exactly. Try using s=0 instead.
As a quick example:
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
x = np.linspace(0, 6.*np.pi, num=100) #interval (0,6*pi) in 100 steps
A = 1.e-4 # Amplitude of sine function
y = A*np.sin(x)
fig, axes = plt.subplots(nrows=2)
for ax, s, title in zip(axes, [2, 0], ['With', 'Without']):
    yinterp = interpolate.UnivariateSpline(x, y, s=s)(x)
    ax.plot(x, yinterp, label='Interpolated')
    ax.plot(x, y, 'bo', label='Original')
    ax.legend()
    ax.set_title(title + ' Smoothing')
plt.show()
The reason that you're only clearly seeing the effects of smoothing with a low amplitude is due to the way the smoothing factor is defined. See the documentation for scipy.interpolate.UnivariateSpline for more details.
Even with a higher amplitude, the interpolated data won't match the original data if you use smoothing.
For example, if we just change the amplitude (A) to 1.0 in the code example above, we'll still see the effects of smoothing...
The problem is in choosing suitable values for the s parameter. Its value depends on the scaling of the data.
Reading the documentation carefully, one can deduce that the parameter should be chosen around s = len(y) * np.var(y), i.e. # of data points * variance. Taking for example s = 0.05 * len(y) * np.var(y) gives a smoothing spline that does not depend on the scaling of the data or the number of data points.
EDIT: sensible values for s depend of course also on the noise level in the data. The docs seem to recommend choosing s in the range (m - sqrt(2*m)) * std**2 <= s <= (m + sqrt(2*m)) * std**2 where std is the standard deviation associated with the "noise" you want to smooth over.
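As a small sketch of that rule of thumb (my own illustration, assuming the noise standard deviation is known or can be estimated):
import numpy as np
from scipy.interpolate import UnivariateSpline
x = np.linspace(0, 6*np.pi, 1000)
noise_std = 0.01                                   # assumed known noise level
y = 1e-2*np.sin(x) + np.random.normal(0.0, noise_std, x.size)
m = len(y)
s_lo = (m - np.sqrt(2*m)) * noise_std**2           # recommended range from the docs
s_hi = (m + np.sqrt(2*m)) * noise_std**2
spl = UnivariateSpline(x, y, k=3, s=m*noise_std**2)  # pick a value inside that range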
