Building a Sympy function to differentiate and integrate from Scipy interpolation

During a project I've been working on, I collected experimental temperature data as a function of time from a metal casting at 4 locations in the system. The temperature profile is complex: a period of initial cooling, followed by an enormous, sudden rise in temperature as the alloy arrives, followed by a final period of cooling.
To understand the temperature environment between the measurement locations, I'm trying to use Python to solve the heat equation, which requires a combination of symbolic derivatives and integrals (for which I've used Sympy) and numerical calculations (for which I've used lambdify and numpy).
The issue comes when I want to use the collected temperature data as boundary conditions in the calculation. I've used Scipy to interpolate between the data points to obtain a complete temperature data set for all times (and to obtain a new spline representing the derivative of the original spline), but I cannot get this interpolation into a format that Sympy will understand for the derivatives and integrals in the calculation.
Any advice or suggestions?
########################################################################################################
The code I've used for the opening step of defining the interpolation is detailed below. I appreciate it would be more efficiently written as a matrix, but I find it easier to see it all when it's written out longhand (I generally simplify later if needed):
Note: The time (the x-axis parameter in the interpolation) is a strictly increasing parameter
import numpy as np #import the relevant packages/items
import pandas as pd #needed for pd.read_csv below
import sympy as sp
from scipy import signal
from scipy.interpolate import UnivariateSpline
x=sp.Symbol("x", real=True, positive=True) #define the sympy symbols
L=sp.Symbol("L", real=True, positive=True)
filename='.../Temperature Data for Code.csv'
data = np.array(pd.read_csv(filename,skiprows=1, header=None)) #reading in the datasets from file
Time_exp=np.array(data[:,0]) #assign the time (x-axis)
T_Alloy_centre_orig=np.array(data[:,1]) +273 #assign 4 temperature (y-axis) and convert to K
T_Alloy_edge_orig=np.array(data[:,2]) +273
T_Inner_orig=np.array(data[:,9]) +273
T_Outer_orig=np.array(data[:,11]) +273
T_Alloy_centre=T_Alloy_centre_orig #create copy of the original data before manipulation
T_Alloy_edge=T_Alloy_edge_orig
T_Inner=T_Inner_orig
T_Outer=T_Outer_orig
T_Alloy_centre=signal.savgol_filter(T_Alloy_centre,3,1) #basic filter to smooth experimental noise
T_Alloy_edge=signal.savgol_filter(T_Alloy_edge,3,1)
T_Inner=signal.savgol_filter(T_Inner,3,1)
T_Outer=signal.savgol_filter(T_Outer,3,1)
T_Alloy_centre_xt = UnivariateSpline(Time_exp, T_Alloy_centre,k=3) #perform spline interpolation
T_Alloy_edge_xt = UnivariateSpline(Time_exp, T_Alloy_edge,k=3)
T_Inner_xt = UnivariateSpline(Time_exp, T_Inner,k=3)
T_Outer_xt = UnivariateSpline(Time_exp, T_Outer,k=3)
diff_T_Alloy_centre_xt = T_Alloy_centre_xt.derivative(n=1) #new spline for derivative of previous
diff_T_Alloy_edge_xt = T_Alloy_edge_xt.derivative(n=1)
diff_T_Inner_xt = T_Inner_xt.derivative(n=1)
diff_T_Outer_xt = T_Outer_xt.derivative(n=1)
#############################################################################
Here is where the speculation begins. I've tried several ways of converting the resulting interpolation into something that can be used by Sympy, but without success.
First, I tried a sympy implemented_function approach and a plain lambda, although this just gave a string as far as I can tell:
T_Alloy_centre_f = implemented_function('T_Alloy_centre_f', lambda t: T_Alloy_centre_xt)
T_Alloy_centre_f = lambda t: T_Alloy_centre_xt
Secondly, I tried using the interpolation functions available within Sympy (interpolating_spline) although this was running for 15 minutes without achieving a result for only one of the 4 measurements. It would be useful if this worked as it is already within Sympy, although the calculation time is extreme. Possibly as the data is not smooth, featuring a sudden, massive increase in temperature on arrival of the molten alloy.
T_Alloy_centre_xt = sp.interpolating_spline(3, x, Time_exp, T_Alloy_centre)
Finally, I pulled the spline coefficients and knots out of the interpolation with the aim of building the function manually before converting, but I could not come up with a convenient way of getting the piecewise function. Nor did the previous implemented_function approach seem to work here.
spline_coeffs = T_Alloy_centre_xt.get_coeffs()
spline_knots = T_Alloy_centre_xt.get_knots()
I'm not sure how to proceed from here. I need something from this interpolation that can be passed through sp.diff and sp.integrate
###########################################################################
#if I can get past the above conversion, the next step in the code is evaluating the derivative at a specified value and performing an integral as below:
F_A=-1*sp.diff(T_Inner_f, t) - x/L*(sp.diff(T_Outer_f,t)-sp.diff(T_Inner_f,t))
f_n_A=(2/L)*sp.integrate(F_A*sp.sin(lamda*x),(x,0,L))
Any assistance would be hugely appreciated.
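One possible route, sketched here under stated assumptions rather than as a tested solution for the full data set: SciPy can convert a B-spline into explicit per-interval polynomials with PPoly.from_spline, and those polynomials can be assembled into a sympy Piecewise, which sp.diff and sp.integrate both accept. The sketch uses splrep (which exposes the knots and coefficients as a tck tuple) in place of UnivariateSpline, a few made-up data points in place of the CSV file, and a hypothetical helper named ppoly_to_sympy. With hundreds of knots the symbolic expression becomes large, so thinning the data or raising the smoothing factor s first may be worthwhile.

```python
import numpy as np
import sympy as sp
from scipy.interpolate import PPoly, splev, splint, splrep

t = sp.Symbol("t", real=True)

# Made-up stand-in for one measured temperature trace (time in s, T in K).
time_pts = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
temp_pts = np.array([300.0, 295.0, 291.0, 650.0, 900.0, 860.0, 820.0, 790.0])

# splrep returns the knots, coefficients and degree as a (t, c, k) tuple.
tck = splrep(time_pts, temp_pts, k=3, s=0)

# PPoly stores one local polynomial per knot interval.
pp = PPoly.from_spline(tck)


def ppoly_to_sympy(pp, sym):
    """Hypothetical helper: turn a scipy PPoly into a sympy Piecewise."""
    degree = pp.c.shape[0] - 1
    pieces = []
    for i in range(len(pp.x) - 1):
        left, right = float(pp.x[i]), float(pp.x[i + 1])
        if right <= left:
            continue  # repeated boundary knots give zero-width intervals
        # On [left, right] the spline is sum_j c[j, i] * (sym - left)**(degree - j).
        poly = sum(
            sp.Float(float(pp.c[j, i])) * (sym - left) ** (degree - j)
            for j in range(degree + 1)
        )
        pieces.append((poly, sym < right))
    # Let the last polynomial also cover values beyond the final breakpoint;
    # the first one likewise extrapolates below the first breakpoint.
    pieces[-1] = (pieces[-1][0], True)
    return sp.Piecewise(*pieces)


T_f = ppoly_to_sympy(pp, t)
dT_f = sp.diff(T_f, t)                 # symbolic derivative
area = sp.integrate(T_f, (t, 0, 7))    # symbolic definite integral

# Compare against SciPy's own derivative and integral of the same spline.
print(float(dT_f.subs(t, 2.5)), float(splev(2.5, tck, der=1)))
print(float(area), float(splint(0, 7, tck)))
```

The resulting T_f can be combined with other Sympy expressions and, for the numerical stage, passed through sp.lambdify in the usual way.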

Related

Solving log-transformed ODE system without overflow error

I have a system of ODEs where my state variables and independent variable span many orders of magnitude (initial values are around 0 at t=0 and are expected to become about 10¹⁰ by t=10¹⁷). I also want to ensure that my state variables remain positive.
According to this Stack Overflow post, one way to enforce positivity is to log-transform the ODEs to solve for the evolution of the logarithm of a variable instead of the variable itself. However when I try this with my ODEs, I get an overflow error probably because of the huge dynamic range / orders of magnitude of my state variables and time variable. Am I doing something wrong or is log-transform just not applicable in my case?
Here is a minimal working example that is successfully solved by scipy.integrate.solve_ivp:
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import solve_ivp
# initialize times at which we are given certain input quantities/parameters
# this is seconds corresponding to the age of the universe in billions of years
times = np.linspace(0.1,10,500) * 3.15e16
# assume we are given the amount of new mass flowing into the system in units of g/sec
# for this toy example we will assume a log-normal distribution and then interpolate it for our integrator function
mdot_grow_array = np.random.lognormal(mean=0,sigma=1,size=len(times))*1.989e33 / 3.15e7
interp_grow = interp1d(times,mdot_grow_array,kind='cubic')
# assume there is also a conversion efficiency for some fraction of mass to be converted to another form
# for this example we'll assume the fractions are drawn from a uniform random distribution and again interpolate
mdot_convert_array = np.random.uniform(0,0.1,len(times)) / 3.15e16 # fraction of M1 per second converted to M2
interp_convert = interp1d(times,mdot_convert_array,kind='cubic')
# set up our integrator function
def integrator(t,y):
    print('Working on t=',t/3.15e16) # to check status of integration in billions of years
    # unpack state variables
    M1, M2 = y
    # get the interpolated value of new mass flowing in at this time
    mdot_grow_now = interp_grow(t)
    mdot_convert_now = interp_convert(t)
    # assume some fraction of the mass gets converted to another form
    mdot_convert = mdot_convert_now * M1
    # return the derivatives
    M1dot = mdot_grow_now - mdot_convert
    M2dot = mdot_convert
    return M1dot, M2dot
# set up initial conditions and run solve_ivp for the whole time range
# should start with M1=M2=0 initially but then solve_ivp does not work at all, so just use [1,1] instead
initial_conditions = [1.0,1.0]
# note how the integrator gets stuck at very small timesteps early on
sol = solve_ivp(integrator,(times[0],times[-1]),initial_conditions,dense_output=True,method='RK23')
And here is the same example but now log-transformed following the Stack Overflow post referenced above (since dlogx/dt = 1/x * dx/dt, we simply replace the LHS with x*dlogx/dt and divide both sides by x to isolate dlogx/dt on the LHS; and we make sure to use np.exp() on the state variables – now logx instead of x – within the integrator function):
import numpy as np
from scipy.interpolate import interp1d
from scipy.integrate import solve_ivp
# initialize times at which we are given certain input quantities/parameters
# this is seconds corresponding to the age of the universe in billions of years
times = np.linspace(0.1,10,500) * 3.15e16
# assume we are given the amount of new mass flowing into the system in units of g/sec
# for this toy example we will assume a log-normal distribution and then interpolate it for our integrator function
mdot_grow_array = np.random.lognormal(mean=0,sigma=1,size=len(times))*1.989e33 / 3.15e7
interp_grow = interp1d(times,mdot_grow_array,kind='cubic')
# assume there is also a conversion efficiency for some fraction of mass to be converted to another form
# for this example we'll assume the fractions are drawn from a uniform random distribution and again interpolate
mdot_convert_array = np.random.uniform(0,0.1,len(times)) / 3.15e16 # fraction of M1 per second converted to M2
interp_convert = interp1d(times,mdot_convert_array,kind='cubic')
# set up our integrator function
def integrator(t,logy):
    print('Working on t=',t/3.15e16) # to check status of integration in billions of years
    # unpack state variables
    M1, M2 = np.exp(logy)
    # get the interpolated value of new mass flowing in at this time
    mdot_grow_now = interp_grow(t)
    mdot_convert_now = interp_convert(t)
    # assume some fraction of the mass gets converted to another form
    mdot_convert = mdot_convert_now * M1
    # return the derivatives
    M1dot = (mdot_grow_now - mdot_convert) / M1
    M2dot = (mdot_convert) / M2
    return M1dot, M2dot
# set up initial conditions and run solve_ivp for the whole time range
# should start with M1=M2=0 initially but then solve_ivp does not work at all, so just use [1,1] instead
initial_conditions = [1.0,1.0]
# note how the integrator gets stuck at very small timesteps early on
sol = solve_ivp(integrator,(times[0],times[-1]),initial_conditions,dense_output=True,method='RK23')

[…] is log-transform just not applicable in my case?
I don’t know where your transform went wrong, but it will certainly not achieve what you think it does. Log-transforming as a means to avoid negative values makes sense and works if and only if the following two conditions hold:
1. If the value of a dynamical variable approaches zero (from above), its derivative also approaches zero (from above) in your model.
2. Due to numerical noise, your derivative may turn negative though it actually isn’t.
Conversely, it is not necessary or doesn’t work in the following cases:
If Condition 1 fails because your derivative never approaches zero in your model, but is strictly positive, you have no problem to begin with, as your derivative should not become negative in any reasonable implementation of your model. (You might make it happen by implementing some spectacular numerical annihilation, but that’s quite a difficult feat to achieve and not what I would consider a reasonable implementation.)
If Condition 1 fails because your derivative becomes truly negative in your model, logarithms won’t save you, because the dynamics wants to push the derivative below zero and the logarithms cannot represent this. You usually get an overflow error due to the logarithms becoming extremely negative or the adaptive integration fails.
Even if Condition 1 applies, Condition 2 can usually be handled by avoiding numerical annihilations and similar when implementing your model.
Unless I am mistaken, your model falls into the first category. If M1 goes to zero, mdot_convert goes towards zero and thus M1dot = mdot_grow_now - mdot_convert is strictly positive, because mdot_grow_now is. M2dot is strictly positive anyway. Thus, you gain nothing from log-transforming. In fact, in the vast majority of cases, your dynamical variables will quickly increase.
With all that being said, some things you might want to look into are:
Normalising your variables to be in the order of magnitude of 1.
Stochastic differential equations.
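To illustrate the normalisation suggestion, here is a minimal sketch. It assumes constant inflow and conversion rates in place of the interpolated inputs from the question, so that a closed-form solution is available for comparison; the scale choices T_SCALE and M_SCALE are illustrative, not prescribed by the question. Time is measured in units of T_SCALE and mass in units of M_SCALE, which brings every coefficient and state variable to order one.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Assumed scales (illustrative choices, not taken from the question).
T_SCALE = 3.15e16                  # seconds in a billion years
GROW_RATE = 1.989e33 / 3.15e7      # g/s, constant inflow for simplicity
CONVERT_RATE = 0.05 / 3.15e16      # 1/s, constant conversion fraction
M_SCALE = GROW_RATE * T_SCALE      # mass accreted over one time unit

# Dimensionless coefficients: both are of order one or smaller.
G = GROW_RATE * T_SCALE / M_SCALE  # equals 1 by construction
C = CONVERT_RATE * T_SCALE         # about 0.05


def rhs(tau, m):
    """Right-hand side in scaled variables tau = t / T_SCALE, m = M / M_SCALE."""
    m1, m2 = m
    convert = C * m1
    return [G - convert, convert]


tau0, tau1 = 0.1, 10.0
sol = solve_ivp(rhs, (tau0, tau1), [0.0, 0.0], method="RK45", rtol=1e-9, atol=1e-12)

m1_end, m2_end = sol.y[:, -1]
# Closed-form solution of the constant-coefficient toy problem, for comparison.
m1_exact = (G / C) * (1.0 - np.exp(-C * (tau1 - tau0)))
m2_exact = G * (tau1 - tau0) - m1_exact

# Convert back to physical units only when reporting.
print(m1_end * M_SCALE, "g in M1;", m2_end * M_SCALE, "g in M2")
```

With time-dependent inputs, the same scaling would be applied to the interpolated rates before they enter the right-hand side.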

Smoothing spline with continuous 4th and 5th order derivatives in Python?

I have some noisy time series that represent input parameters as a function of time for a set of ODEs. These noisy time series need to be interpolated and the derivatives of that interpolation function need to be continuous up to the same number of orders as my ODE solver. Otherwise an Nth order ODE solver will try to take extremely small adaptive time steps to deal with the discontinuous jumps in the Nth order derivative. So for example, for the standard RK45 solver in scipy.integrate.solve_ivp, all derivatives of the interpolation function up to at least 5th order must be continuous, I think.
What is the best way to create a smoothing spline in Python such that its first 5 derivatives will be continuous? I am having a strangely difficult time accomplishing this simple task. Below is a minimal working example showing that even when using a 5th order smoothing UnivariateSpline from scipy, the 4th order derivative shows discontinuities/oscillations and the 5th order derivative just explodes (~1e6 magnitude). This is despite me first gaussian-smoothing my noisy time series to help the interpolation function.
I suspect this is at least one reason why the scipy RK45 solver is taking such a long time to solve my system of ODEs: it's taking an unnecessarily large number of small timesteps (though I think the fact that I'm using the default solve_ivp tolerances could also be playing a role).
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import lognorm
from scipy.ndimage import gaussian_filter1d
from scipy.interpolate import UnivariateSpline
x = np.sort(np.random.uniform(0.1,10,500))
dist = lognorm(0.5,loc=1.0)
y_noisy = dist.pdf(x) + np.random.uniform(0.0,1,500)
y_smooth = gaussian_filter1d(y_noisy,10,mode='nearest')
spl = UnivariateSpline(x,y_smooth,k=5,s=2)
xx = np.linspace(0.1,10,100000)
yy = spl(xx)
d1 = np.diff(yy) / np.diff(xx)
d2 = np.diff(d1) / np.diff(xx[1:])
d3 = np.diff(d2) / np.diff(xx[1:-1])
d4 = np.diff(d3) / np.diff(xx[1:-2])
d5 = np.diff(d4) / np.diff(xx[1:-3])
fig, axes = plt.subplots(nrows=6,ncols=1,figsize=(8,8))
fig.subplots_adjust(hspace=0.7)
axes[0].plot(x,y_noisy,'k-',lw=2,alpha=0.1)
axes[0].plot(x,y_smooth,'y-',lw=2)
axes[0].plot(xx, yy)
axes[0].set_title('5th-order smoothing UnivariateSpline')
axes[1].plot(xx[1:], d1)
axes[1].set_title('first derivative')
axes[2].plot(xx[1:-1], d2)
axes[2].set_title('second derivative')
axes[3].plot(xx[2:-1], d3)
axes[3].set_title('third derivative')
axes[4].plot(xx[3:-1], d4)
axes[4].set_title('fourth derivative')
axes[5].plot(xx[4:-1], d5)
axes[5].set_title('fifth derivative')

I don't think this is a valid way to check whether a function is continuous at the fifth derivative. I think the timestep is so small that the plots of the fourth and fifth derivatives are getting overwhelmed by floating point errors. The fourth and fifth differences are so tiny that small errors have a big effect.
To test this, I replaced the spline that you fit with np.sin(), which I know has continuous derivatives.
Sure enough, it says the fourth derivative is jumping around a whole lot. But that can't be - the fourth derivative of sine is just sine.
If I decrease the number of timesteps to 1000, it correctly finds the fourth and fifth derivative. Applying the same technique to the original function, I get this: [plot not included]
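The sine test described above can be reproduced with a short sketch. The helper name fourth_derivative_error is made up for illustration, and a uniform grid is assumed, so a single step h replaces the np.diff(xx) denominators of the question.

```python
import numpy as np


def fourth_derivative_error(n_points):
    """Max error of a four-times repeated forward difference of sin on [0.1, 10]."""
    xx = np.linspace(0.1, 10, n_points)
    h = xx[1] - xx[0]
    d = np.sin(xx)
    for _ in range(4):
        d = np.diff(d) / h
    # Four forward differences are centred two grid points to the right,
    # and the fourth derivative of sin is sin itself.
    exact = np.sin(xx[2:-2])
    return np.max(np.abs(d - exact))


err_fine = fourth_derivative_error(100_000)
err_coarse = fourth_derivative_error(1_000)
print(f"100000 points: max error {err_fine:.3g}")
print(f"  1000 points: max error {err_coarse:.3g}")
```

Finite differences can be avoided altogether by evaluating spl.derivative(n=4) on the fitted spline. Note that for a degree-5 spline the fourth derivative is piecewise linear and the fifth is piecewise constant, so the fifth derivative jumps at the knots regardless of how it is computed.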

How do I numerically integrate a function that's a product of a Lorentzian and a cosine in Python?

I am new to stackoverflow and also quite new to Python. So, I hope to ask my question in an appropriate manner.
I am running Python code similar to this minimal example, with an example function that is a product of a Lorentzian and a cosine that I want to integrate numerically:
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import quad
#minimal example:
omega_loc = 15
gamma = 5
def Lorentzian(w):
    #print(w)
    return (w**3)/((w/omega_loc) + 1)**2*(gamma/2)/((w-omega_loc)**2+(gamma/2)**2)
def intRe(t):
    return quad(lambda w: w**(-2)*Lorentzian(w)*(1-np.cos(w*t)),0,np.inf,limit=10000)[0]
plt.figure(1)
plot_range = np.linspace(0,100,1000)
plt.plot(plot_range, [intRe(t) for t in plot_range])
Regardless of the upper limit of the integration, I never get the code to finish and give me a result.
When I enable the #print(w) line, it seems like the code just keeps probing the integrand at seemingly random values of w in an infinite loop (?). The console also reports a roundoff error.
Is there a different way of doing numerical integration in Python that is better suited to this kind of function than quad, or did I make a more fundamental error?

Observations
Close to zero, (1 - cos(w*t)) / w**2 tends to 0/0. We can take the Taylor expansion t**2*(1/2 - (w*t)**2/24).
Going to infinity, the Lorentzian is a constant and the cosine term will cause the output to oscillate indefinitely; the integral can be approximated by multiplying that term by a slowly decreasing term.
You are using a linearly spaced scale with many points. It is easier to visualize with w in log scale.
The plot looks like this before damping the cosine term
I introduced two parameters to tune the attenuation of the oscillations
def cosinus_term(w, t, damping=1e4*omega_loc):
    # Taylor expansion near w*t = 0, damped cosine elsewhere
    return np.where(abs(w*t) < 1e-6, t**2*(0.5 - (w*t)**2/24.0), (1-np.exp(-abs(w/damping))*np.cos(w*t))/w**2)
def intRe(t, damping=1e4*omega_loc):
    # pass damping through; otherwise cosinus_term always uses its default
    return quad(lambda w: cosinus_term(w, t, damping)*Lorentzian(w),0,np.inf,limit=10000)[0]
Plotting with the following code
plt.figure(1)
plot_range = np.logspace(-3,3,100)
plt.plot(plot_range, [intRe(t, 1e2*omega_loc) for t in plot_range])
plt.plot(plot_range, [intRe(t, 1e3*omega_loc) for t in plot_range])
plt.xscale('log')
It runs in less than 3 minutes here, and the two results are close to each other, especially for large w, suggesting that the damping doesn't affect the result too much.
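A different technique from the damping used above, offered as a sketch rather than a verified replacement: scipy.integrate.quad accepts weight="cos" with wvar, which hands the oscillatory factor to a QUADPACK routine designed for Fourier-type integrals on a semi-infinite range. The integrand is split as g(w) - g(w)*cos(w*t), where g(w) is Lorentzian(w)/w**2 with the powers of w cancelled by hand, so there is no 0/0 at w = 0. The names g and intRe_split are made up for this illustration.

```python
import numpy as np
from scipy.integrate import quad

omega_loc = 15
gamma = 5


def g(w):
    """Lorentzian(w) / w**2 from the question, with the powers of w cancelled."""
    return (w / ((w / omega_loc) + 1) ** 2
            * (gamma / 2) / ((w - omega_loc) ** 2 + (gamma / 2) ** 2))


# The non-oscillatory part does not depend on t, so it is computed once.
plain_part = quad(g, 0, np.inf, limit=1000)[0]


def intRe_split(t):
    """Integral of g(w) * (1 - cos(w*t)) over w from 0 to infinity."""
    if t == 0:
        return 0.0  # 1 - cos(0) vanishes identically
    # weight="cos" makes quad integrate g(w) * cos(wvar * w) with a
    # routine that handles the oscillation itself.
    cos_part = quad(g, 0, np.inf, weight="cos", wvar=t, limit=1000)[0]
    return plain_part - cos_part


for t in (0.01, 0.1, 1.0, 10.0):
    print(t, intRe_split(t))
```

Because g decays like 1/w**3 for large w, both pieces converge on their own, which is what makes the split legitimate here.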

Conversion from Matlab to Python of Cross Power Spectral Density function

I am trying to convert a MATLAB program to Python. I am having problems setting up the cross power spectral density function and obtaining results that match MATLAB's.
The function used in the MATLAB code is written as follows:
[Pxy,f] = cpsd(x,y,M,round(M/2),M,fs);
In the documentation available in my code, I read that: M = 128 (number of FFT points) and fs = 25.0 (sampling frequency [Hz]). x and y are row-vectors 1x751 of acceleration data.
The function used has six arguments, so I assume that [pxy,f] = cpsd(x,y,window,noverlap,f,fs) is the function the programmer intended to call from the MATLAB library, as it is the only one in the documentation with six arguments (see here).
This function returns the cross power spectral density estimates at the frequencies specified in f.
(It bugs me that f is not defined as a frequency but the variable M is passed there and it's the number of FFT points, but let's assume this was not a mistake).
Now, I would like to use scipy.signal.csd to convert this function, but there are two problems:
The window is defined in MATLAB as an integer, but the csd from scipy allows windows only as tuples, strings, or array_like objects;
In the csd from scipy there is no argument that allows returning the cross power spectral density estimates at specific frequencies.
For number 1 I defined a window as follows:
window = hamming(M, sym=False)
I chose the Hamming window because it is the default when a window is passed as an integer to cpsd in MATLAB ("If window is an integer, then cpsd divides x and y into segments of length window and windows each segment with a Hamming window of that length."), and I didn't make it symmetric because I am doing spectral analysis, so a periodic window makes sense.
For number 2 I have no solution.
This is the function I set up in my python code:
import numpy as np
from scipy import signal
from scipy.signal.windows import hamming
M = 128
fs = 25.0
window = hamming(M, sym=False)
noverlap = np.round(M/2)
f, fxx = signal.csd(x,y,fs=fs,window=window,noverlap=noverlap)
The results are not matching in terms of Pxy (cross power spectral density), but they are perfect in terms of frequencies.
These are the first elements in the matlab results:
1.3590e-05
3.4354e-05
4.5282e-05
6.2549e-05
5.7965e-05
4.9697e-05
5.5413e-05
While this is what I get from Python:
2.04688576e-06
3.37540142e-05
4.51821900e-05
6.19997501e-05
5.78926181e-05
5.00058106e-05
5.53681683e-05
I tried using the simple cross spectral density function in matplotlib (documented here) as follows:
fxx, f = mlab.csd(x,y,NFFT=M,Fs=fs,noverlap=noverlap)
And I obtain more matching results, but still not perfect.
1.18939608e-05
3.45206717e-05
4.56902859e-05
6.45083475e-05
5.73594952e-05
5.01539145e-05
5.34534933e-05
The objective is not to get rid of a possible numerical error in the conversion, but to operate the cross power spectral density with a matching input
Can anybody help?
Thanks a lot in advance!!!
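One thing that may be worth checking, offered as a hedged sketch and not a confirmed explanation: scipy.signal.csd removes the mean of every segment by default (detrend='constant'), whereas MATLAB's cpsd, as far as I know, does not detrend. That would mostly affect the lowest-frequency bins, which is where the listings above differ most. MATLAB also documents a cpsd(x,y,window,noverlap,nfft,fs) form, so the integer fifth argument is probably the FFT length, which maps onto scipy's nfft. The data below are synthetic stand-ins for the acceleration records.

```python
import numpy as np
from scipy import signal
from scipy.signal.windows import hamming

M = 128      # segment length and number of FFT points
fs = 25.0    # sampling frequency [Hz]

# Synthetic stand-ins for the 751-sample acceleration records.
rng = np.random.default_rng(0)
x = 1.0 + 0.1 * rng.standard_normal(751)
y = 1.0 + 0.1 * rng.standard_normal(751)

window = hamming(M, sym=False)

# SciPy default: the mean of every segment is removed before the FFT.
f, Pxy_default = signal.csd(x, y, fs=fs, window=window, noverlap=M // 2, nfft=M)

# Same call with detrending switched off.
f, Pxy_raw = signal.csd(x, y, fs=fs, window=window, noverlap=M // 2, nfft=M,
                        detrend=False)

print(f[:3])
print(abs(Pxy_default[:3]))
print(abs(Pxy_raw[:3]))
```

If the detrend=False output still differs from MATLAB, the remaining candidates would be the scaling convention and which of the two signals is conjugated.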

How can I find the break frequencies/3dB points from a bandpass filter frequency sweep data in python?

The data that I have is stored in a 2D list where one column holds a frequency and the other column its corresponding level in dB. I would like to programmatically identify the frequencies of the 3 dB points on either end of the passband. I have three ideas on how to do this, but they all have drawbacks.
1. Find the maximum point, then the average of the points in the passband, then find the points about 3 dB lower.
2. Use the sympy library to perform numerical differentiation and identify the critical points/inflection points.
3. Use a histogram/bin function to find the amplitude of the passband.
Drawbacks:
1. Sensitive to spikes, and I'm not quite sure how to do this.
2. I don't understand the math involved, and the data is noisy, which could lead to a lot of false positives.
3. Correlating the amplitude values with list index values could be tricky.
Can you think of better ideas and/or ways to implement what I have described?

Assuming that you've loaded multiple readings of the PSD from the signal analyzer, try averaging them before attempting to find the bandedges. If the signal isn't changing too dramatically, the averaging process might smooth away any peaks and valleys and noise within the passband, making it easier to find the edges. This is what many spectrum analyzers can do to make for a smoother PSD.
In case that wasn't clear, assume that each reading gives you 128 tuples of the frequency and power and that you capture 100 of these buffers of data. Now average the 100 samples from bin 0, then samples from 1, 2, ..., 128. Now try and locate the bandpass on this data. It should be easier than on any single buffer. Note I used 100 as an example. If your data is very noisy, it may require more. If there isn't much noise, fewer.
Be careful when doing the averaging. Your data is in dB. To add the samples together in order to find an average, you must first convert the dB data back to decimal, do the adds, do the divide to find the average, and then convert the averaged power back into dB.
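The dB averaging procedure described above might look like the following sketch. It assumes the readings are power levels (factor 10 in the conversion; amplitude ratios would use 20) and that they are stacked as one row per captured buffer; the function name average_db and the sample values are made up.

```python
import numpy as np


def average_db(readings_db):
    """Average several power readings given in dB, bin by bin.

    readings_db has shape (n_readings, n_bins).
    """
    # dB -> linear power, average across readings, then back to dB.
    linear = 10.0 ** (np.asarray(readings_db, dtype=float) / 10.0)
    return 10.0 * np.log10(linear.mean(axis=0))


# Two made-up readings of a three-bin spectrum, in dB.
readings = [[0.0, -20.0, -3.0],
            [10.0, -20.0, -3.0]]
avg = average_db(readings)
print(avg)
```

Averaging the first bin directly in dB would give 5 dB, while the linear average of 1 and 10 corresponds to a noticeably higher level, which is why the conversion matters.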

Ok it seems this has to be solved by data analysis. I would propose these steps:
Preprocess your data if you suspect it to be too noisy. I'd suggest either a moving-average filter (np.convolve(data, np.ones(n)/n, "same")) or, better, a Savitzky-Golay filter (scipy.signal.savgol_filter(data, n, polyorder=3)), because you will be interested in the extrema of the data, which would be unnecessarily distorted by the moving-average filter. You might also want to get rid of artifacts like 60 Hz noise at this stage.
If the signal you are interested in lives in a narrow band, the spectrum will be a single pronounced peak. In that case you could just fit a curve to your data, a gaussian would be appropriate in that case.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# placeholder: replace with your own loader returning frequency and power in dB
freq, power = read_in_your_data_here()
freq, power = np.asarray(freq), np.asarray(power)
def gauss(x, a, mu, sig):
    # amplitude times a Gaussian bell (the original had a** instead of a*)
    return a*np.exp(-(x-mu)**2/(2.*sig**2))
(a, mu, sig), _ = curve_fit(gauss, freq, power)
fitted_curve = gauss(freq, a, mu, sig)
plt.plot(freq, power)
plt.plot(freq, fitted_curve)
plt.vlines(mu, min(power)-2, max(power)+2)
plt.show()
# power at the fitted centre frequency and the level 3 dB below it
center_idx = np.absolute(freq-mu).argmin()
pow_center = power[center_idx]
pow_3db = pow_center - 3.
def interv_from_binvec(data):
    # differencing a 0/1 vector marks where it switches on (-1) and off (+1)
    indicator = np.convolve(np.asarray(data, dtype=int), [-1,1], "same")
    return indicator.argmin(), indicator.argmax()
passband_idx = interv_from_binvec(power > pow_3db)
passband = freq[passband_idx[0]], freq[passband_idx[1]]
This is more an example than a solution, and it relies heavily on the assumption that you are searching for and finding a high-SNR signal with a narrow band. It could be extended to handle more than one signal by use of a mixture model.

You can use scipy's UnivariateSpline and its roots() method:
Create a spline of y-(np.max(y)-3)
Find the roots of it.
Calculate the difference between the two roots.
import numpy as np
from scipy.interpolate import UnivariateSpline
# df is assumed to be a pandas DataFrame holding the sweep, sorted by wavelength
x = df["Wavelength / nm"]
y = df["Power / dBm"]
#create spline
spline = UnivariateSpline(x, y-(np.max(y)-3), s=0)
# find the roots
r1, r2 = spline.roots()
# calculate the difference
threedB_bandwidth = abs(r2-r1)
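A self-contained check of the spline-roots approach, using a synthetic passband in place of the df columns above. The parabola in dB is an assumption chosen so that the true 3 dB width (5 nm) is known in advance.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic passband: 0 dB at the centre, exactly -3 dB at centre +/- half_width.
centre, half_width = 1550.0, 2.5
x = np.linspace(1540.0, 1560.0, 201)
y = -3.0 * ((x - centre) / half_width) ** 2

# Shift the curve so that the -3 dB level sits at zero, then interpolate (s=0).
spline = UnivariateSpline(x, y - (np.max(y) - 3), s=0)

roots = spline.roots()          # roots() is only available for cubic splines
r1, r2 = roots[0], roots[-1]    # outermost crossings, in case noise adds more
threedB_bandwidth = abs(r2 - r1)
print(r1, r2, threedB_bandwidth)
```

Taking the first and last root avoids the unpacking error that r1, r2 = spline.roots() raises when noisy data crosses the -3 dB level more than twice.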
