scipy.optimize.curve_fit unable to fit shifted skewed gaussian curve - python

I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.
The code below is derived from the curve_fit documentation. It uses an arbitrary test data set, but it shows the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
#def func(x, a, b, c):
# return a*np.exp(-b*x) + c
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))
y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])
The issue seems to appear when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even ones identical to those of my original function, but that does not solve the problem. For mu=10, curve_fit works perfectly, but for mu>=30 it no longer fits the data.

Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), x[np.argmax(yn)], 0, 0, 1))
Changing only this line in your code, with sigma=10 and mu=50, makes the fit follow the data.
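When p0 is omitted, curve_fit starts every parameter at 1. The initial peak then sits at mu=1 with width 1 and barely overlaps data that peak at mu=30 or beyond. The derivative of the residual with respect to mu is essentially zero there, which is consistent with the straight-line fits described in the question. The sketch below builds a data-driven start. It assumes a single peak on a flat baseline, takes the centre and width from the weighted moments of the baseline-subtracted data, and takes the area from its sum. The helper moment_guess is hypothetical, and the seed and noise level are for illustration only.

```python
import numpy as np
import scipy.special as sp
from scipy.optimize import curve_fit


def func(x, sigmag, mu, alpha, c, a):
    """Skewed Gaussian with area a and constant offset c (same model as the question)."""
    normpdf = np.exp(-((x - mu) ** 2) / (2 * sigmag ** 2)) / (sigmag * np.sqrt(2 * np.pi))
    normcdf = 0.5 * (1 + sp.erf(alpha * (x - mu) / (sigmag * np.sqrt(2))))
    return 2 * a * normpdf * normcdf + c


def moment_guess(x, y):
    """Hypothetical helper: starting values from the moments of the peak."""
    baseline = np.median(y)                  # assumes many samples lie off the peak
    w = np.clip(y - baseline, 0, None)       # keep only the part above the baseline
    mu0 = np.sum(w * x) / np.sum(w)          # centre of mass of the peak
    sigma0 = np.sqrt(np.sum(w * (x - mu0) ** 2) / np.sum(w))
    area0 = np.sum(w) * (x[1] - x[0])        # `a` is the area under the peak
    return (sigma0, mu0, 0.0, baseline, area0)  # start with no skew (alpha = 0)


rng = np.random.default_rng(0)
x = np.linspace(0, 100, 100)
y = func(x, 10, 50, 0, 0, 1)                 # sigma=10, mu=50 as in the answer
yn = y + 0.001 * rng.normal(size=len(x))

p0 = moment_guess(x, yn)
popt, pcov = curve_fit(func, x, yn, p0=p0)
rms = np.sqrt(np.mean((func(x, *popt) - yn) ** 2))
print("start:", p0)
print("fit:", popt, "residual RMS:", rms)
```

At alpha = 0, a small change in mu and a small change in alpha alter the curve in almost the same way. For that reason, mu and alpha individually can be poorly determined even when the fitted curve matches the data closely. Checking the residual, as the sketch does, is a more robust test than checking single parameters.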

You can call curve_fit many times with random initial guesses and keep the parameters with the smallest error.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
results = []
for i in range(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except RuntimeError:  # this start did not converge within maxfev
        continue
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:  # good enough: stop early
        break
err, popt = min(results, key=lambda r: r[0])
y_fit= func(x, *popt)
print(len(results))


python does not solve an ode system properly

I have an ODE which I need to solve and plot. This is my code so far:
import numpy as np
import math
from scipy.integrate import odeint
import matplotlib.pyplot as plt
import math
from scipy import integrate
# Constants
#α = 1,β = 12,μ = 0.338,Vb = 0.01,ω = 0.5,
alpha = 1#p1
beta = 12#p2
mu = 0.338#p3
omega = 0.5#p5
gamma = 0.25#p6
tmax = 1000
t = np.arange(0.0, tmax, 0.001)
# Initial conditions vector
# The model differential equations.
def deriv(y,t):
    X, I, Z, H, E, O = y
    dXdt = I
    dIdt = -(alpha)*(X)-(beta)*(X**3)-(mu)*I+gamma*(1/(1-X)**2)-gamma*(1/(1+X)**2)+Acoef*(1/(1-X)**2)*(np.sin(omega*t))
    dZdt = H
    dHdt =-(alpha)*(Z)-(beta)*(Z**3)-(mu)*I+gamma*(1/(1-Z)**2)-gamma*(1/(1+Z)**2)+Acoef*(1/(1-Z)**2)*(np.sin(omega*t))
    return dXdt, dIdt, dZdt, dHdt, dEdt,dOdt
# Initial conditions vector
y0 = X0,I0,Z0,H0,E0,O0
# Integrate the SIR equations over the time grid, t.
ret = odeint(deriv, y0, t)
X, I, Z, H, E, O = ret.T
# Plot the data on three separate curves for S(t), I(t) and R(t)
plt.xlim([-10, 10])
plt.plot(ret[:,4], ret[:,5])
Unfortunately, it produces the following warning:
File "C:\Users\user\AppData\Local\Programs\Python\Python37\lib\site-packages\scipy\integrate\", line 247
warnings.warn(warning_msg, ODEintWarning)
ODEintWarning: Excess work done on this call (perhaps wrong Dfun type). Run with full_output = 1 to get quantitative information.
And the result is not as expected. Unfortunately, I cannot post the plot here.
Does anyone know how I can fix it, please?
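That warning means LSODA, the solver behind odeint, used up its allowance of internal steps (mxstep, 500 per output interval by default) before it reached the next time point. The sketch below shows how to inspect this with full_output=1 and how to raise the limit. It uses a reduced two-state version of the question's X equation. The forcing amplitude Acoef = 0.01, the initial state and the time grid are made-up values, because the question does not define them.

```python
import numpy as np
from scipy.integrate import odeint

# Constants from the question
alpha, beta, mu, omega, gamma = 1.0, 12.0, 0.338, 0.5, 0.25
Acoef = 0.01  # hypothetical forcing amplitude: not given in the question


def deriv(y, t):
    """Reduced system with only the question's X equation: X' = I, I' = ..."""
    X, I = y
    dXdt = I
    dIdt = (-alpha * X - beta * X**3 - mu * I
            + gamma / (1 - X)**2 - gamma / (1 + X)**2
            + Acoef / (1 - X)**2 * np.sin(omega * t))
    return [dXdt, dIdt]


t = np.linspace(0.0, 200.0, 20001)  # hypothetical time grid
y0 = [0.1, 0.0]                     # hypothetical initial state (X0, I0)

# full_output=1 adds a dict of solver diagnostics to the result;
# mxstep raises the cap on internal steps per output interval.
sol, info = odeint(deriv, y0, t, full_output=1, mxstep=5000)
print(info["message"])
print("internal steps taken:", info["nst"][-1])
print("max |X|:", np.abs(sol[:, 0]).max())
```

If the warning persists with a larger mxstep, the cause is more likely the right-hand side than the step limit. Here 1/(1-X)**2 diverges as X approaches 1. Also, in the deriv function as posted, dEdt and dOdt are never defined, and Acoef and the initial values are missing from the code shown.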

How to determine unknown parameters of a differential equation based on the best fit to a data set in Python?

I am trying to fit different differential equations to a given data set with Python. For this, I use SciPy, specifically the solve_ivp function.
This works fine as long as I have a rough estimate of the parameters in the differential equations (here b = 0.005), e.g.:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np
def f(x, y, b):
    dydx= [-b[0] * y[0]]
    return dydx
xspan= np.linspace(1, 500, 25)
yinit= [5]
b= [0.005]
sol= solve_ivp(lambda x, y: f(x, y, b),
               [xspan[0], xspan[-1]], yinit, t_eval= xspan)
plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined automatically from the best fit of the solved differential equation to a given data set (x and y). Can this be done, for example by combining this example with SciPy's curve_fit function, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work, and the pieces are easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now define a function curve(x, *p) that maps any list of points to a list of values according to the single parameter b. Here odefun is the right-hand side of the ODE, which solve_ivp calls with b as an extra argument:
def curve(x, b):
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=(b,))
    return res.y[0]
Add optional arguments for error tolerances as necessary.
To make this more realistic, make the initial point a parameter as well. That also makes it more obvious where a list is expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero slower, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit
xmin,xmax = 1,500
def f(t, y, b):
    dydt= -b * y
    return dydt
def curve(t, b, y0):
    sol= solve_ivp(lambda t, y: f(t, y, b),
                   [xmin, xmax], [y0], t_eval= t)
    return sol.y[0]
xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata)+0.02*np.random.randn(*xdata.shape)
y0 = 5
b= 0.005
p0 = [b,y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve
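The sketch below is a self-contained version of the same fit that also prepares a smooth curve for plotting against the data. Three additions are assumptions rather than requirements:

- tighter solver tolerances (rtol, atol), so the finite-difference Jacobian that curve_fit builds is not swamped by integration error;
- a relative finite-difference step (diff_step, passed through to scipy.optimize.least_squares);
- bounds that keep b and y0 non-negative.

The seed and the names x_fine and y_fine are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

xmin, xmax = 1, 500


def curve(t, b, y0):
    """Solve y' = -b*y with y(xmin) = y0 and return y at the points t."""
    sol = solve_ivp(lambda s, y: -b * y, [xmin, xmax], [y0],
                    t_eval=t, rtol=1e-8, atol=1e-10)
    return sol.y[0]


rng = np.random.default_rng(1)
xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02 * xdata) + 0.02 * rng.standard_normal(xdata.shape)

# Bounds switch curve_fit to the 'trf' method (scipy.optimize.least_squares),
# which also accepts diff_step, the relative finite-difference step.
popt, pcov = curve_fit(curve, xdata, ydata, p0=[0.005, 5],
                       bounds=([0, 0], [1, 10]), diff_step=1e-6)
b, y0 = popt
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties
print(f"b = {b:.4f} +/- {perr[0]:.4f}, y0 = {y0:.3f} +/- {perr[1]:.3f}")

# Smooth fitted curve for plotting
x_fine = np.linspace(xmin, xmax, 300)
y_fine = curve(x_fine, b, y0)
```

To draw it with matplotlib.pyplot imported as plt, something like plt.plot(xdata, ydata, "o", x_fine, y_fine, "-") shows data and fit together.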

Pass tuple as input argument for scipy.optimize.curve_fit

I have the following code:
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] + x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
It raises TypeError: func() takes exactly 2 arguments (3 given). That seems fair, since curve_fit unpacks (0, 0) into two scalar arguments. So I tried this:
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=((0, 0),))
This time it raised: ValueError: object too deep for desired array
If I leave p0 at its default (not specifying it):
popt, pcov = curve_fit(func, np.arange(10), np.arange(10))
It raises IndexError: invalid index to scalar variable. Evidently, it passes only a scalar as p.
I can make def func(x, p1, p2): return p1 + p2 + x to get it working, but with more complicated situations the code is going to look verbose and messy. I'd really love it if there's a cleaner solution to this problem.
Not sure if this is cleaner, but at least it is now easier to add more parameters to the fitting function. Maybe someone can build an even better solution from this.
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] * x
def func2(*args):
    return func(args[0],args[1:])
popt, pcov = curve_fit(func2, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
EDIT: This works for me
import numpy as np
from scipy.optimize import curve_fit
def func(x, *p): return p[0] + p[1] * x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
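A function written as func(x, *p) has no named parameters to count, so curve_fit takes the number of parameters from the length of p0. The minimal sketch below builds a general polynomial model on that idea. numpy.polynomial.polynomial.polyval evaluates the sum with coefficients in increasing order. The name poly and the noise-free test data are illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import curve_fit


def poly(x, *p):
    """p[0] + p[1]*x + p[2]*x**2 + ...; len(p0) sets the number of terms."""
    return P.polyval(x, p)


x = np.linspace(0, 2, 20)
y = 1.0 - 2.0 * x + 0.5 * x**2   # exact quadratic, no noise

popt2, _ = curve_fit(poly, x, y, p0=(0.0, 0.0, 0.0))        # 3 parameters
popt3, _ = curve_fit(poly, x, y, p0=(0.0, 0.0, 0.0, 0.0))   # 4 parameters
print(popt2)
print(popt3)
```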
With curve_fit, the number of fit parameters has to be known. It comes from p0 if you pass it, and otherwise from the named arguments of the fit function. Doing something like:
def f(x, *p):
    return sum( [p[i]*x**i for i in range(len(p))] )
would be great, since it would be a general nth-order polynomial fitting function. Unfortunately, without p0 it raises the following in my SciPy 0.12.0:
ValueError: Unable to determine number of fit parameters.
So you should do:
def f_1(x, p0, p1):
    return p0 + p1*x
def f_2(x, p0, p1, p2):
    return p0 + p1*x + p2*x**2
and so forth...
Then you can call using the p0 argument:
curve_fit(f_1, xdata, ydata, p0=(0,0))
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted should take each parameter as a separate scalar argument (not a packed *p0).
Remember that the result of the fit depends on the initialization parameters.
Working example
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def func(x, a0, a1):
    return a0 + a1 * x
x, y = np.arange(10), np.arange(10) + np.random.randn(10)/10
popt, pcov = curve_fit(func, x, y, p0=(1, 1))
# Plot the results
plt.title('Fit parameters:\n a0=%.2e a1=%.2e' % (popt[0], popt[1]))
# Data
plt.plot(x, y, 'rx')
# Fitted function
x_fine = np.linspace(x[0], x[-1], 100)
plt.plot(x_fine, func(x_fine, popt[0], popt[1]), 'b-')
You can define functions that return other functions (see the question "Passing additional arguments using scipy.optimize.curve_fit?").
Working example :
import numpy as np
import random
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
import math
def funToFit(x):
    return 0.5+2*x-3*x*x+0.2*x*x*x+0.1*x*x*x*x
xx=[random.uniform(1,5) for i in range(30)]
yy=[funToFit(xx[i])+random.uniform(-1,1) for i in range(len(xx))]
a=[1.]*5  # initial guess: one coefficient per power x**0 .. x**4 (assumed)
def make_func(numarg):
    def func(x,*a):
        # polynomial with numarg coefficients: a[0] + a[1]*x + ...
        ng=numarg
        v=0
        for i in range(ng):
            v+=a[i]*np.power(x,i)
        return v
    return func
leastsq, covar = curve_fit(make_func(len(a)),xx,yy,tuple(a))
print(leastsq)
def fFited(x):
    # evaluate the fitted polynomial
    v=0
    for i in range(len(leastsq)):
        v+=leastsq[i]*np.power(x,i)
    return v
This is an old thread now, but I also just ran into this issue. I built on Emile Maras' solution and expanded the function so that it returns either the nth-order polynomial fitting function for curve_fit or the y values based on the fit results. This makes plotting and residual calculations easier. Here is an example that fits data to progressively higher-order polynomials and plots the results and residuals.
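import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def funToFit(x):
    return 0.5 + 2*x -3*x**2 + 0.2*x**3 + 0.1*x**4
def polyfun(order=0,x=np.arange(0),P=np.arange(0)):
    # with x and P: evaluate the polynomial with coefficients P at x
    if P.size>0 and x.size>0:
        y = 0
        for i in range(P.size):
            y += P[i]*x**i
        return y
    # with only order: return a fitting function for curve_fit
    elif order>0:
        def fitfun(x,*a):
            y = 0
            for i in range(order+1):
                y += a[i]*x**i
            return y
        return fitfun
    raise Exception("Either order or x and P must be provided")
# test data in the style of the example above (assumed)
x = np.linspace(1, 5, 30)
y = funToFit(x) + np.random.uniform(-1, 1, x.size)
for i in range(4):
    order = i+1
    [fit,covar] = curve_fit(polyfun(order=order),x,y,p0=(1,)*(order+1))
    yfit = polyfun(x=x,P=fit)
    res = yfit-y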
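A polynomial is linear in its coefficients, so the curve_fit result for each order should coincide with ordinary linear least squares. The sketch below runs that cross-check and also reports the residual RMS per order. It uses numpy.polynomial.polynomial.polyfit, which returns coefficients in the same increasing order as polyfun. The seeded data mimic the example above and are an assumption, and fitfun here is a vectorized stand-in for the one polyfun returns.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import curve_fit


def funToFit(x):
    return 0.5 + 2*x - 3*x**2 + 0.2*x**3 + 0.1*x**4


def fitfun(x, *a):
    """Vectorized a[0] + a[1]*x + a[2]*x**2 + ..., like polyfun's fitfun."""
    return P.polyval(x, a)


rng = np.random.default_rng(2)
x = np.sort(rng.uniform(1, 5, 30))
y = funToFit(x) + rng.uniform(-1, 1, x.size)

summary = {}
for order in range(1, 5):
    fit, covar = curve_fit(fitfun, x, y, p0=(1.0,) * (order + 1))
    lin = P.polyfit(x, y, order)               # direct linear least squares
    yfit = P.polyval(x, fit)
    summary[order] = (
        np.max(np.abs(yfit - P.polyval(x, lin))),  # curve_fit vs polyfit
        np.sqrt(np.mean((yfit - y) ** 2)),         # residual RMS
    )

for order, (diff, rms) in summary.items():
    print(f"order {order}: max diff {diff:.2e}, residual RMS {rms:.3f}")
```

The comparison uses fitted values rather than coefficients, because the coefficients of a high-order monomial basis on [1, 5] are poorly conditioned.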

Fitting data to system of ODEs using Python via Scipy & Numpy

I am having some trouble translating my MATLAB code into Python with SciPy and NumPy. I am stuck on how to find optimal parameter values (k0 and k1) for my system of ODEs so that it fits my ten observed data points. I currently have an initial guess for k0 and k1. In MATLAB, I can use a function called fminsearch, which takes the system of ODEs, the observed data points, and the initial values of the system of ODEs. It then calculates a new pair of parameters k0 and k1 that fits the observed data. I have included my code below. Could you help me implement something like fminsearch to find the optimal values of k0 and k1 for my data? I want to add whatever code does this to my files.
I have three .py files: ode.py, lsq.py, and a main script.
# ode.py
def f(y, t, k):
    return (-k[0]*y[0],
            k[0]*y[0]-k[1]*y[1],
            k[1]*y[1])
# lsq.py
import pylab as py
import numpy as np
from scipy import integrate
from scipy import optimize
import ode
def lsq(teta,y0,data):
    #INPUT teta, the unknowns k0,k1
    # data, observed
    # y0 initial values needed by the ODE
    #OUTPUT lsq value
    t = np.linspace(0,9,10)
    y_obs = data #data points
    k = [0,0]
    k[0] = teta[0]
    k[1] = teta[1]
    #call the ODE solver to get the states:
    r = integrate.odeint(ode.f,y0,t,args=(k,))
    #the ODE system in ode.py
    #at each row (time point), y_cal has
    #the values of the components [A,B,C]
    y_cal = r[:,1] #separate the measured B
    #compute the expression to be minimized:
    return sum((y_obs-y_cal)**2)
# main script
import pylab as py
import numpy as np
from scipy import integrate
from scipy import optimize
import lsq
if __name__ == '__main__':
    teta = [0.2,0.3] #guess for parameter values k0 and k1
    y0 = [1,0,0] #initial conditions for system
    y = [0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309] #observed data points
    data = y
    resid = lsq.lsq(teta,y0,data)
    print(resid)
For this kind of fitting task you could use the package lmfit. With it, the fitted curve reproduces the data very well.
For now, I fixed the initial concentrations. You could also let them vary if you like (just remove vary=False in the code below). The parameters you obtain are:
x10: 5 (fixed)
x20: 0 (fixed)
x30: 0 (fixed)
k0: 0.12183301 +/- 0.005909 (4.85%) (init= 0.2)
k1: 0.77583946 +/- 0.026639 (3.43%) (init= 0.3)
[[Correlations]] (unreported correlations are < 0.100)
C(k0, k1) = 0.809
The code that reproduces the plot looks like this (some explanation can be found in the inline comments):
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from lmfit import minimize, Parameters, Parameter, report_fit
def f(y, t, paras):
    """
    Your system of differential equations
    """
    x1 = y[0]
    x2 = y[1]
    x3 = y[2]
    try:
        k0 = paras['k0'].value
        k1 = paras['k1'].value
    except (KeyError, TypeError):
        k0, k1 = paras
    # the model equations
    f0 = -k0 * x1
    f1 = k0 * x1 - k1 * x2
    f2 = k1 * x2
    return [f0, f1, f2]
def g(t, x0, paras):
    """
    Solution to the ODE x'(t) = f(t,x,k) with initial condition x(0) = x0
    """
    x = odeint(f, x0, t, args=(paras,))
    return x
def residual(paras, t, data):
    """
    compute the residual between actual data and fitted data
    """
    x0 = paras['x10'].value, paras['x20'].value, paras['x30'].value
    model = g(t, x0, paras)
    # you only have data for one of your variables
    x2_model = model[:, 1]
    return (x2_model - data).ravel()
# initial conditions
x10 = 5.
x20 = 0
x30 = 0
y0 = [x10, x20, x30]
# measured data
t_measured = np.linspace(0, 9, 10)
x2_measured = np.array([0.000, 0.416, 0.489, 0.595, 0.506, 0.493, 0.458, 0.394, 0.335, 0.309])
plt.scatter(t_measured, x2_measured, marker='o', color='b', label='measured data', s=75)
# set parameters including bounds; you can also fix parameters (use vary=False)
params = Parameters()
params.add('x10', value=x10, vary=False)
params.add('x20', value=x20, vary=False)
params.add('x30', value=x30, vary=False)
params.add('k0', value=0.2, min=0.0001, max=2.)
params.add('k1', value=0.3, min=0.0001, max=2.)
# fit model
result = minimize(residual, params, args=(t_measured, x2_measured), method='leastsq') # leastsq nelder
# check results of the fit
data_fitted = g(np.linspace(0., 9., 100), y0, result.params)
# plot fitted data
plt.plot(np.linspace(0., 9., 100), data_fitted[:, 1], '-', linewidth=2, color='red', label='fitted data')
plt.xlim([0, max(t_measured)])
plt.ylim([0, 1.1 * max(data_fitted[:, 1])])
# display fitted statistics
report_fit(result)
plt.legend()
plt.show()
If you have data for additional variables, you can simply update the function residual.
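For example, if x3 were measured as well, one option is to return the misfits of both variables as one concatenated vector. The sketch below swaps in scipy.optimize.least_squares for lmfit so that it runs without extra packages. It fits noise-free synthetic data generated with k0 = 0.12 and k1 = 0.78, values chosen only for illustration.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import least_squares


def f(y, t, k0, k1):
    x1, x2, x3 = y
    return [-k0 * x1, k0 * x1 - k1 * x2, k1 * x2]


def solve(k, t, y0):
    # tight tolerances so finite-difference derivatives are not dominated by solver error
    return odeint(f, y0, t, args=tuple(k), rtol=1e-10, atol=1e-12)


def residual(k, t, x2_data, x3_data, y0):
    model = solve(k, t, y0)
    # one block per measured variable, concatenated into a single vector
    return np.concatenate([model[:, 1] - x2_data, model[:, 2] - x3_data])


y0 = [5.0, 0.0, 0.0]
t = np.linspace(0, 9, 10)
synthetic = solve([0.12, 0.78], t, y0)   # noise-free synthetic "measurements"
x2_data, x3_data = synthetic[:, 1], synthetic[:, 2]

fit = least_squares(residual, x0=[0.2, 0.3], bounds=(1e-4, 2.0),
                    diff_step=1e-6, args=(t, x2_data, x3_data, y0))
print(fit.x)
```

When the variables differ strongly in scale or noise level, dividing each block by its measurement uncertainty before concatenating keeps one variable from dominating the fit.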
The following worked for me:
import pylab as pp
import numpy as np
from scipy import integrate, interpolate
from scipy import optimize
##initialize the data
x_data = np.linspace(0,9,10)
y_data = np.array([0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309])
def f(y, t, k):
    """define the ODE system in terms of
    dependent variable y,
    independent variable t, and
    optional parameters, in this case a single array k """
    return (-k[0]*y[0],
            k[0]*y[0]-k[1]*y[1],
            k[1]*y[1])
def my_ls_func(x,teta):
    """definition of function for LS fit
    x gives evaluation points,
    teta is an array of parameters to be varied for fit"""
    # create an alias to f which passes the optional params
    f2 = lambda y,t: f(y, t, teta)
    # calculate ode solution, return values for each entry of "x"
    r = integrate.odeint(f2,y0,x)
    #in this case, we only need one of the dependent variable values
    return r[:,1]
def f_resid(p):
    """ function to pass to optimize.leastsq
    The routine will square and sum the values returned by
    this function"""
    return y_data-my_ls_func(x_data,p)
#solve the system - the solution is in variable c
guess = [0.2,0.3] #initial guess for params
y0 = [1,0,0] #initial conditions for ODEs
(c,kvg) = optimize.leastsq(f_resid, guess) #get params
print("parameter values are ",c)
# fit ODE results to interpolating spline just for fun
xeval=np.linspace(min(x_data), max(x_data),30)
gls = interpolate.UnivariateSpline(xeval, my_ls_func(xeval,c), k=3, s=0)
#pick a few more points for a very smooth curve, then plot
# data and curve fit
xeval=np.linspace(min(x_data), max(x_data),200)
#Plot of the data as red dots and fit as blue line
pp.plot(x_data, y_data,'.r',xeval,gls(xeval),'-b')
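Look at the scipy.optimize module. MATLAB's fminsearch uses the Nelder-Mead simplex algorithm, and scipy.optimize.minimize offers the same algorithm through method='Nelder-Mead' (scipy.optimize.fmin is the older direct equivalent). Without a method argument, minimize picks a gradient-based method such as BFGS, so the simplex method has to be requested explicitly.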
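The sketch below follows the fminsearch-style route for the question's model. It assumes the chain A -> B -> C with only B measured and the question's initial conditions y0 = [1, 0, 0]. It minimizes the sum of squared errors, as lsq does, with method='Nelder-Mead'. The function name sse is illustrative.

```python
import numpy as np
from scipy.integrate import odeint
from scipy.optimize import minimize


def f(y, t, k):
    # A -> B -> C with rate constants k[0] and k[1]
    return (-k[0] * y[0], k[0] * y[0] - k[1] * y[1], k[1] * y[1])


t = np.linspace(0, 9, 10)
y_obs = np.array([0.000, 0.416, 0.489, 0.595, 0.506,
                  0.493, 0.458, 0.394, 0.335, 0.309])
y0 = [1, 0, 0]


def sse(teta):
    """Sum of squared errors between observed and simulated B (like lsq)."""
    r = odeint(f, y0, t, args=(teta,))
    return np.sum((y_obs - r[:, 1]) ** 2)


teta0 = [0.2, 0.3]
res = minimize(sse, teta0, method="Nelder-Mead")  # simplex, as in fminsearch
print(res.x, res.fun)
```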
# cleaned up a bit to get my head around it - thanks for sharing
import pylab as pp
import numpy as np
from scipy import integrate, optimize
class Parameterize_ODE():
    def __init__(self):
        self.X = np.linspace(0,9,10)
        self.y = np.array([0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309])
        self.y0 = [1,0,0] # initial conditions ODEs
    def ode(self, y, X, p):
        return (-p[0]*y[0],
                p[0]*y[0]-p[1]*y[1],
                p[1]*y[1])
    def model(self, X, p):
        return integrate.odeint(self.ode, self.y0, X, args=(p,))
    def f_resid(self, p):
        return self.y - self.model(self.X, p)[:,1]
    def optim(self, p_guess):
        return optimize.leastsq(self.f_resid, p_guess) # fit params
po = Parameterize_ODE(); p_guess = [0.2, 0.3]
c, kvg = po.optim(p_guess)
# --- show ---
print("parameter values are ", c, kvg)
x = np.linspace(min(po.X), max(po.X), 2000)
pp.plot(po.X, po.y,'.r',x, po.model(x, c)[:,1],'-b')
pp.xlabel('X',{"fontsize":16}); pp.ylabel("y",{"fontsize":16}); pp.legend(('data','fit'),loc=0);

SciPy LeastSq Goodness of Fit Estimator

I have a data surface that I'm fitting using SciPy's leastsq function.
I would like to have some estimate of the quality of the fit after leastsq returns. I'd expected that this would be included as a return from the function, but, if so, it doesn't seem to be clearly documented.
Is there such a return or, barring that, some function I can pass my data and the returned parameter values and fit function to that will give me an estimate of fit quality (R^2 or some such)?
If you call leastsq like this:
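import scipy.optimize as optimize
def residuals(a,x,y):
    return y-f(x,a)
# f is your model and p_guess your initial parameter guess
p,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,p_guess,args=(x,y),full_output=True)
then, using the usual definition of R^2 (one minus the ratio of the residual sum of squares to the total sum of squares),
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
What is infodict['fvec'], you ask? It's the array of residuals: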
In [48]: optimize.leastsq?
infodict -- a dictionary of optional outputs with the keys:
'fvec' : the function evaluated at the output
For example:
import scipy.optimize as optimize
import numpy as np
import collections
import matplotlib.pyplot as plt
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y
def residuals(p,x,y):
    return y - sigmoid(p,x)
Param=collections.namedtuple('Param','x0 y0 c k')
p_guess=Param(x0=600,y0=200,c=100,k=0.01)  # initial guess
p,cov,infodict,mesg,ier = optimize.leastsq(
    residuals,p_guess,args=(x,y),full_output=True)
p=Param(*p)
xp = np.linspace(100, 1600, 1500)
print('''\
x0 = {p.x0}
y0 = {p.y0}
c = {p.c}
k = {p.k}
'''.format(p=p))
pxp=sigmoid(p,xp)
# You could compute the residuals this way:
resid=residuals(p,x,y)
print(resid)
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
# But you don't have to compute `resid` -- `infodict['fvec']` already
# contains the info.
print(infodict['fvec'])
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
print(rsquared)
# 0.996768131959
plt.plot(x, y, '.', xp, pxp, '-')
plt.show()
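The same recipe can be packaged as a small reusable function. This is a sketch that assumes the model is called as model(p, x), like sigmoid above, and the helper r_squared is hypothetical. For a straight line with an intercept, this R^2 equals the squared Pearson correlation of x and y, which makes a convenient sanity check.

```python
import numpy as np
from scipy import optimize


def r_squared(model, p_guess, x, y):
    """Fit model(p, x) to y with leastsq and return (p, R^2)."""
    def residuals(p, x, y):
        return y - model(p, x)

    p, cov, infodict, mesg, ier = optimize.leastsq(
        residuals, p_guess, args=(x, y), full_output=True)
    ss_err = np.sum(infodict["fvec"] ** 2)   # fvec = residuals at the optimum
    ss_tot = np.sum((y - y.mean()) ** 2)
    return p, 1 - ss_err / ss_tot


def line(p, x):
    return p[0] + p[1] * x


rng = np.random.default_rng(3)
x = np.linspace(0, 10, 50)
y = 1.5 + 0.8 * x + rng.normal(scale=0.5, size=x.size)

p, r2 = r_squared(line, [0.0, 1.0], x, y)
print(p, r2, np.corrcoef(x, y)[0, 1] ** 2)
```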
