scipy.optimize.curve_fit returns inf pcov - python

I use curve_fit to fit a very simple line, as in the code below:
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * x + b

x = [6.6000000000000005, 7.599]
y = [123.9835274456227, 144.9319749893788]
popt, pcov = curve_fit(func, x, y, method='dogbox', p0=[20, -15])
print(popt)  # get [ 20.96941696 -14.4146245 ]
print(pcov)  # get [[inf inf], [inf inf]]
But the pcov result is inf. How can I get correct pcov values?

This is expected: you are fitting two parameters to exactly two points, so the fit is exact and there is no residual left over from which to estimate a parameter error. With zero degrees of freedom the covariance of the parameters cannot be estimated, and curve_fit signals this by filling pcov with inf.
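As a quick check (a sketch with a made-up third observation, not from the original question), one extra data point gives curve_fit a residual to work with, and pcov becomes finite:

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * x + b

# the two original points plus one hypothetical extra observation
x = [6.6000000000000005, 7.599, 8.5]
y = [123.9835274456227, 144.9319749893788, 163.0]

popt, pcov = curve_fit(func, x, y, p0=[20, -15])
print(popt)                    # fitted slope and intercept
print(pcov)                    # now a finite 2x2 covariance matrix
print(np.sqrt(np.diag(pcov)))  # 1-sigma errors on a and b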

Related

How to determine unknown parameters of a differential equation based on the best fit to a data set in Python?

I am trying to fit different differential equations to a given data set with python. For this reason, I use the scipy package, specifically the solve_ivp function.
This works fine for me as long as I have a rough estimate of the parameters (b = 0.005) included in the differential equations, e.g.:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np

def f(x, y, b):
    dydx = [-b[0] * y[0]]
    return dydx

xspan = np.linspace(1, 500, 25)
yinit = [5]
b = [0.005]

sol = solve_ivp(lambda x, y: f(x, y, b),
                [xspan[0], xspan[-1]], yinit, t_eval=xspan)

print(sol)
print("\n")
print(sol.t)
print(sol.y)
plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined "automatically", based on the best fit of the solved differential equation to a given data set (x and y). Is there a way this can be done, for example by combining this example with scipy's curve_fit function, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work; the pieces are easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now have to define a function curve(x, *p) that transforms any list of points into a list of values according to the only parameter b.
def curve(x, b):
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b])
    return res.y[0]
Add optional arguments for error tolerances as necessary.
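For instance (a sketch; the tolerance values here are arbitrary), solve_ivp accepts rtol and atol keyword arguments, and tightening them from the defaults (1e-3 and 1e-6) can matter because the fit can only be as accurate as the ODE solution it is built on:

def curve(x, b):
    # tighter tolerances than the solve_ivp defaults
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b],
                    rtol=1e-8, atol=1e-10)
    return res.y[0]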
To make this more realistic, also make the initial point a parameter. Then it becomes more obvious where a list is expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero slower, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

xmin, xmax = 1, 500

def f(t, y, b):
    dydt = -b * y
    return dydt

def curve(t, b, y0):
    sol = solve_ivp(lambda t, y: f(t, y, b),
                    [xmin, xmax], [y0], t_eval=t)
    return sol.y[0]

xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata) + 0.02*np.random.randn(*xdata.shape)

y0 = 5
b = 0.005
p0 = [b, y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve
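A minimal plotting sketch (the original answer showed the resulting figure):

import matplotlib.pyplot as plt

plt.plot(xdata, ydata, "b.", label="noisy test data")
plt.plot(xdata, curve(xdata, *popt), "r-", label="fitted curve")
plt.legend()
plt.show()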

Fit differential equation with scipy

How can I fit the differential equation from the following scipy tutorial?
Scipy Differential Equation Tutorial
In the end I want to fit some data points that follow a set of two differential equations with six parameters in total, but I'd like to start with an easy example. So far I have tried the functions scipy.optimize.curve_fit and scipy.optimize.leastsq, but I did not get anywhere.
So this is how far I came:
import numpy as np
import scipy.optimize as scopt
import scipy.integrate as scint

def pend(y, t, b, c):
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

def test_pend(y, t, b, c):
    theta, omega = y
    dydt = [omega, -b*omega - c*np.sin(theta)]
    return dydt

b = 0.25
c = 5.0
y0 = [np.pi - 0.1, 0.0]
guess = [0.5, 4]
t = np.linspace(0, 1, 11)

sol = scint.odeint(pend, y0, t, args=(b, c))
popt, pcov = scopt.curve_fit(test_pend, guess, t, sol)
with the following error message:
ValueError: too many values to unpack (expected 2)
I'm sorry, as this is presumably a pretty simple question, but I can't get it to work. Thanks in advance.
You need to provide a function f(t, b, c) that, given an argument or a list of arguments in t, returns the value of the function at those argument(s). This requires some work, either by determining the type of t or by using a construct that works either way:
def f(t, b, c):
    tspan = np.hstack([[0], np.hstack([t])])
    return scint.odeint(pend, y0, tspan, args=(b, c))[1:, 0]

popt, pcov = scopt.curve_fit(f, t, sol[:, 0], p0=guess)
which returns popt = array([ 0.25, 5. ]).
This can be extended to fit even more parameters, for example the initial state as well. Note that p0 then needs one starting value per parameter:
def f(t, a0, a1, b, c):
    tspan = np.hstack([[0], np.hstack([t])])
    return scint.odeint(pend, [a0, a1], tspan, args=(b, c))[1:, 0]

# rough guesses for the initial state, followed by the original guess for b and c
popt, pcov = scopt.curve_fit(f, t, sol[:, 0], p0=[3.0, 0.0] + guess)
which results in popt = [ 3.04159267e+00, -2.38543640e-07, 2.49993362e-01, 4.99998795e+00].
Another possibility is to explicitly compute the norm of the differences to the target solution and apply minimization to the scalar function so defined.
def f(param):
    b, c = param
    t_sol = scint.odeint(pend, y0, t, args=(b, c))
    return np.linalg.norm(t_sol[:, 0] - sol[:, 0])

res = scopt.minimize(f, np.array(guess))
which returns res:
fun: 1.572327981969186e-08
hess_inv: array([[ 0.00031325, 0.00033478],
[ 0.00033478, 0.00035841]])
jac: array([ 0.06129361, -0.04859557])
message: 'Desired error not necessarily achieved due to precision loss.'
nfev: 518
nit: 27
njev: 127
status: 2
success: False
x: array([ 0.24999905, 4.99999884])
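As an aside (a sketch, not part of the original answer): the objective returns a norm, which is not differentiable at a perfect fit, and that is what triggers the precision-loss warning here. Minimizing the squared norm instead, or switching to a derivative-free method, usually converges more cleanly:

def f2(param):
    b, c = param
    t_sol = scint.odeint(pend, y0, t, args=(b, c))
    # squared norm is smooth at the minimum
    return np.sum((t_sol[:, 0] - sol[:, 0])**2)

# Nelder-Mead needs no gradients
res = scopt.minimize(f2, np.array(guess), method='Nelder-Mead')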

How to force scipy.optimize.curve_fit to fix the first point?

I use this code to smooth data by fitting an exponential with scipy.optimize.curve_fit:
def smooth_data_v1(x_orig, y_orig):
    def func(x, a, b, c):
        return a*np.exp(-b*x) + c
    # Scale data
    y = y_orig / 10000.0
    x = 500.0 * x_orig
    popt, pcov = curve_fit(func, x, y, p0=(1, 0.01, 1))
    y_smooth = func(x, *popt)  # Calculate smoothed values for the same points
    # Undo scaling
    y_final = y_smooth * 10000.0
    return y_final
However, I want the estimated exponential curve to go through the first point.
Bad case:
Good case:
I have tried to remove the last parameter using the first point (x0, y0):
def smooth_data_v2(x_orig, y_orig):
    x0 = x_orig[0]
    y0 = y_orig[0]
    def func(x, a, b):
        return a*np.exp(-b*x) + y0 - a*np.exp(-b*x0)
    # Scale data
    y = y_orig / 10000.0
    x = 500.0 * x_orig
    popt, pcov = curve_fit(func, x, y, p0=(1, 0.01))
    y_smooth = func(x, *popt)  # Calculate smoothed values for the same points
    # Undo scaling
    y_final = y_smooth * 10000.0
    return y_final
However, something goes wrong and the a parameter comes out really large:
popt [ 4.45028144e+05 2.74698863e+01]
Any ideas?
Update:
Example of data
x_orig [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14.]
y_orig [ 445057. 447635. 450213. 425089. 391746. 350725. 285433. 269027.
243835. 230587. 216757. 202927. 189097. 175267. 161437.]
Scipy's curve_fit allows passing the parameter sigma, which is designed to hold the standard deviations used to weight the fit. But this array can be filled with arbitrary data:
import numpy as np
from scipy.optimize import curve_fit

def smooth_data_v1(x_arr, y_arr):
    def func(x, a, b, c):
        return a*np.exp(-b*x) + c
    # create the weighting array
    y_weight = np.empty(len(y_arr))
    # high pseudo-sd values, meaning less weight in the fit
    y_weight.fill(10)
    # low values for point 0 and the last points, meaning more weight during the fit
    y_weight[0] = y_weight[-5:] = 0.1
    popt, pcov = curve_fit(func, x_arr, y_arr, p0=(y_arr[0], 1, 1),
                           sigma=y_weight, absolute_sigma=True)
    print("a, b, c:", *popt)
    y_smooth = func(x_arr, *popt)
    return y_smooth

x_orig = np.asarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14])
y_orig = np.asarray([445057, 447635, 450213, 425089, 391746, 350725, 285433, 269027,
                     243835, 230587, 216757, 202927, 189097, 175267, 161437])
print(smooth_data_v1(x_orig, y_orig))
As you can see, the first and the last points are now close to the original values, but this clamping of values sometimes comes at a price for the rest of the data points.
You have probably also noticed that I removed your rescaling part. Imho, one shouldn't do this before curve-fitting procedures; it is usually better to use raw data. Additionally, your data are not really well represented by an exponential function, hence the tiny b value.
How about changing the definition of the function you try to fit?
def func(x, b, c):
    return (y_arr[0] - c)*np.exp(-b*(x - x_arr[0])) + c
This function always fits the first point perfectly by definition, but otherwise can do anything your current function does. (The parameter a is no longer needed, since the amplitude is now fixed by the first data point.)
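A minimal usage sketch under the same assumptions (x_arr and y_arr are the data arrays; the p0 values are arbitrary starting guesses):

import numpy as np
from scipy.optimize import curve_fit

def smooth_data_v3(x_arr, y_arr):
    def func(x, b, c):
        # pinned to the first data point by construction
        return (y_arr[0] - c)*np.exp(-b*(x - x_arr[0])) + c
    # c starts near the tail value, b near a slow decay
    popt, pcov = curve_fit(func, x_arr, y_arr, p0=(0.1, y_arr[-1]))
    return func(x_arr, *popt)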

Using scipy.optimize.curve_fit with weights

According to the documentation, the argument sigma can be used to set the weights of the data points in the fit. These "describe" 1-sigma errors when the argument absolute_sigma=True.
I have some data with artificial normally-distributed noise which varies:
import numpy as np
from scipy.optimize import curve_fit

n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3

def f(x, x0, A, alpha):
    return A * np.exp(-((x - x0)/alpha)**2)

noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise
If I want to fit the noisy y to f using curve_fit, what should I set sigma to? The documentation isn't very specific here, but I would usually use 1/noise_sigma**2 as the weight:
p0 = 10, 4, 2
popt, pcov = curve_fit(f, x, y, p0)
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)
It doesn't seem to improve the fit much, though.
Is this option only used to better interpret the fit uncertainties through the covariance matrix? What is the difference between these two telling me?
In [249]: pcov
Out[249]:
array([[ 1.10205238e-02, -3.91494024e-08, 8.81822412e-08],
[ -3.91494024e-08, 1.52660426e-02, -1.05907265e-02],
[ 8.81822412e-08, -1.05907265e-02, 2.20414887e-02]])
In [250]: pcov2
Out[250]:
array([[ 0.26584674, -0.01836064, -0.17867193],
[-0.01836064, 0.27833 , -0.1459469 ],
[-0.17867193, -0.1459469 , 0.38659059]])
At least with scipy version 1.1.0, the parameter sigma should be equal to the error on each data point. Specifically, the documentation says:
A 1-d sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).
In your case that would be:
curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)
I looked through the source code and verified that when you specify sigma this way it minimizes ((f-data)/sigma)**2.
As a side note, this is in general what you want to be minimizing when you know the errors. The likelihood of observing the points data given a model f is
L(data | x0, A, alpha) = product over i of Gaus(data_i, mean=f(x_i, x0, A, alpha), sigma=sigma_i)
which, if you take the negative log, becomes (up to additive constants and an overall factor of 1/2, neither of which depends on the parameters or affects the minimizer)
-log(L) = sum over i of (f(x_i, x0, A, alpha) - data_i)**2 / sigma_i**2
which is just the chi-square.
I wrote a test program to verify that curve_fit was indeed returning the correct values with the sigma specified correctly:
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit, fmin

np.random.seed(0)

def make_chi2(x, data, sigma):
    def chi2(args):
        x0, A, alpha = args
        return np.sum(((f(x, x0, A, alpha) - data)/sigma)**2)
    return chi2

n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3

def f(x, x0, A, alpha):
    return A * np.exp(-((x - x0)/alpha)**2)

noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise

p0 = 10, 4, 2

# curve_fit without sigma (sigma is implicitly equal to one)
popt, pcov = curve_fit(f, x, y, p0)
# curve_fit with the wrong sigma specified
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)
# curve_fit with the correct sigma
popt3, pcov3 = curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)

chi2 = make_chi2(x, y, noise_sigma)

# double checking that we get the correct answer
xopt = fmin(chi2, p0, xtol=1e-10, ftol=1e-10)

print("popt = %s, chi2 = %.2f" % (popt, chi2(popt)))
print("popt2 = %s, chi2 = %.2f" % (popt2, chi2(popt2)))
print("popt3 = %s, chi2 = %.2f" % (popt3, chi2(popt3)))
print("xopt = %s, chi2 = %.2f" % (xopt, chi2(xopt)))
which outputs:
popt = [ 11.93617403 3.30528488 2.86314641], chi2 = 200.66
popt2 = [ 11.94169083 3.30372955 2.86207253], chi2 = 200.64
popt3 = [ 11.93128545 3.333727 2.81403324], chi2 = 200.44
xopt = [ 11.93128603 3.33373094 2.81402741], chi2 = 200.44
As you can see the chi2 is indeed minimized correctly when you specify sigma=sigma as an argument to curve_fit.
As to why the improvement isn't "better", I'm not really sure. My only guess is that, without specifying a sigma value, you implicitly assume the errors are equal, and over the part of the data where the fit matters (the peak) the errors are "approximately" equal.
To answer your second question: no, the sigma option is not only used to change the output of the covariance matrix; it actually changes what is being minimized.
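A short follow-up sketch (not from the original answer) showing how the covariance matrix is then typically read off; with absolute_sigma=True the diagonal of pcov contains the variances of the fitted parameters:

perr = np.sqrt(np.diag(pcov3))  # 1-sigma uncertainties on x0, A, alpha
for name, val, err in zip(("x0", "A", "alpha"), popt3, perr):
    print("%s = %.3f +/- %.3f" % (name, val, err))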

scipy.optimize.curve_fit unable to fit shifted skewed gaussian curve

I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.
The code below is derived from the curve_fit documentation. The code provided is an arbitrary set of data for test purposes but displays the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp

#def func(x, a, b, c):
#    return a*np.exp(-b*x) + c

def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c

x = np.linspace(0, 100, 100)
y = func(x, 10, 30, 0, 0, 1)
yn = y + 0.001*np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)  # p0=(9,35,0,9,1)
y_fit = func(x, *popt)

plt.plot(x, yn)
plt.plot(x, y_fit)
The issue seems to pop up when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even ones identical to those of my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it no longer fits the data.
Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn), 0, 0, 1))
Changing this single line in your code, with sigma=10 and mu=50, produces a good fit (plot omitted here).
You can call curve_fit many times with random initial guesses and choose the parameters with the minimum error:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp

def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c

x = np.linspace(0, 100, 100)
y = func(x, 10, 30, 0, 0, 1)
yn = y + 0.001*np.random.normal(size=len(x))

results = []
for i in range(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except RuntimeError:
        continue  # this random start did not converge; try another
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break

err, popt = min(results, key=lambda r: r[0])
y_fit = func(x, *popt)

plt.plot(x, yn)
plt.plot(x, y_fit)
print(len(results))
