I have the following code:
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] + x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
It raises TypeError: func() takes exactly 2 arguments (3 given). Well, that sounds fair: curve_fit unpacks the (0, 0) into two scalar inputs. So I tried this:
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=((0, 0),))
Again, it said: ValueError: object too deep for desired array
If I left it as default (not specifying p0):
popt, pcov = curve_fit(func, np.arange(10), np.arange(10))
It will raise IndexError: invalid index to scalar variable. Obviously, it only gave the function a scalar for p.
I can make def func(x, p1, p2): return p1 + p2 + x to get it working, but with more complicated situations the code is going to look verbose and messy. I'd really love it if there's a cleaner solution to this problem.
Thanks!
Not sure if this is cleaner, but at least it is easier now to add more parameters to the fitting function. Maybe one could even make an even better solution out of this.
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] * x
def func2(*args):
    # args[0] is x; the remaining positional arguments are the fit parameters
    return func(args[0], args[1:])
popt, pcov = curve_fit(func2, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
EDIT: This works for me
import numpy as np
from scipy.optimize import curve_fit
def func(x, *p): return p[0] + p[1] * x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
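The *p trick can also be wrapped once, so that the model itself keeps its list-style signature. The sketch below assumes a hypothetical helper, as_curve_fit_model, which is not part of SciPy; it only forwards the unpacked scalars as one array. Because the wrapper has a variadic signature, p0 has to be supplied so that curve_fit knows how many parameters to fit.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, p):
    # model written against a single parameter sequence p
    return p[0] + p[1] * x

def as_curve_fit_model(f):
    # hypothetical helper (not a SciPy function): adapts f(x, p) to f(x, *p)
    def wrapped(x, *params):
        return f(x, np.asarray(params))
    return wrapped

x = np.arange(10, dtype=float)
y = 1.0 + 2.0 * x  # synthetic, noise-free data

popt, pcov = curve_fit(as_curve_fit_model(func), x, y, p0=(0.0, 0.0))
print(popt)
```

The length of p0 is what fixes the number of parameters here, so adding a parameter to the model only requires a longer p0.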
Problem
When using curve_fit you must explicitly specify the number of fit parameters. Doing something like:
def f(x, *p):
    return sum([p[i]*x**i for i in range(len(p))])
would be great, since it would be a general nth-order polynomial fitting function, but unfortunately, in my SciPy 0.12.0, it raises:
ValueError: Unable to determine number of fit parameters.
Solution
So you should do:
def f_1(x, p0, p1):
    return p0 + p1*x
def f_2(x, p0, p1, p2):
    return p0 + p1*x + p2*x**2
and so forth...
Then you can call using the p0 argument:
curve_fit(f_1, xdata, ydata, p0=(0,0))
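For what it's worth, a variadic polynomial like f(x, *p) can still be fitted as long as p0 is passed, because curve_fit can then take the parameter count from the length of p0; as far as I can tell, the ValueError above is only raised when p0 is left out. A minimal sketch, assuming synthetic noise-free data and using numpy.polynomial.polynomial.polyval to evaluate the polynomial (the name poly and the chosen order are just for illustration):

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import curve_fit

def poly(x, *coeffs):
    # coeffs are in ascending order: coeffs[i] multiplies x**i
    return P.polyval(x, coeffs)

order = 2
x = np.linspace(-1.0, 1.0, 20)
y = 0.5 + 2.0 * x - 3.0 * x**2  # synthetic data from a known quadratic

# the length of p0 (order + 1) tells curve_fit how many parameters to fit
popt, pcov = curve_fit(poly, x, y, p0=np.ones(order + 1))
print(popt)
```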
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted should take only scalars (not: *p0).
Remember that the result of the fit depends on the initialization parameters.
Working example
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def func(x, a0, a1):
    return a0 + a1 * x
x, y = np.arange(10), np.arange(10) + np.random.randn(10)/10
popt, pcov = curve_fit(func, x, y, p0=(1, 1))
# Plot the results
plt.title('Fit parameters:\n a0=%.2e a1=%.2e' % (popt[0], popt[1]))
# Data
plt.plot(x, y, 'rx')
# Fitted function
x_fine = np.linspace(x[0], x[-1], 100)
plt.plot(x_fine, func(x_fine, popt[0], popt[1]), 'b-')
plt.savefig('Linear_fit.png')
plt.show()
You can define functions that return other functions (see Passing additional arguments using scipy.optimize.curve_fit? )
Working example :
import numpy as np
import random
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
import math
def funToFit(x):
    return 0.5+2*x-3*x*x+0.2*x*x*x+0.1*x*x*x*x
xx=[random.uniform(1,5) for i in range(30)]
yy=[funToFit(xx[i])+random.uniform(-1,1) for i in range(len(xx))]
a=np.zeros(5)
def make_func(numarg):
    def func(x,*a):
        ng=numarg
        v=0
        for i in range(ng):
            v+=a[i]*np.power(x,i)
        return v
    return func
leastsq, covar = curve_fit(make_func(len(a)),xx,yy,tuple(a))
print(leastsq)
def fFited(x):
    v=0
    for i in range(len(leastsq)):
        v+=leastsq[i]*np.power(x,i)
    return v
xfine=np.linspace(1,5,200)
plt.plot(xx,yy,".")
plt.plot(xfine,fFited(xfine))
plt.show()
This is an old thread now, but I also just ran into this issue. Building on Emile Maras' solution, I expanded the function to return either the nth-order polynomial fitting function for curve_fit or the y values based on the fit results. This facilitates plotting and residual calculations. Here is an example that fits data to progressively higher-order polynomials and plots the results and residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def funToFit(x):
    return 0.5 + 2*x -3*x**2 + 0.2*x**3 + 0.1*x**4
x=np.arange(30)
y=funToFit(x)+np.random.normal(0,5000,x.size)
def polyfun(order=0,x=np.arange(0),P=np.arange(0)):
    if P.size>0 and x.size>0:
        y=0
        for i in range(P.size):
            y+=P[i]*np.power(x,i)
        return y
    elif order>0:
        def fitfun(x,*a):
            y=0
            for i in range(order+1):
                y+=a[i]*np.power(x,i)
            return y
        return fitfun
    else:
        raise Exception("Either order or x and P must be provided")
plt.figure("fits")
plt.plot(x,y,color="black")
for i in range(4):
    order = i+1
    [fit,covar] = curve_fit(polyfun(order=order),x,y,p0=(1,)*(order+1))
    yfit = polyfun(x=x,P=fit)
    res = yfit-y
    plt.figure("fits")
    plt.plot(x,yfit)
    plt.figure("res")
    plt.plot(x,res)
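Since a polynomial is linear in its coefficients, the curve_fit results can be cross-checked against a direct linear least-squares fit. The sketch below is not part of the answer above: it swaps in numpy.polynomial.polynomial.polyfit as the reference and assumes a small synthetic data set with a fixed random seed. The helper make_poly is a stripped-down, hypothetical stand-in for the closure approach.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from scipy.optimize import curve_fit

def make_poly(order):
    # returns a variadic model with order + 1 coefficients (ascending powers)
    def fitfun(x, *a):
        return sum(a[i] * np.power(x, i) for i in range(order + 1))
    return fitfun

rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 30)
y = 0.5 + 2 * x - 3 * x**2 + rng.normal(0.0, 0.1, x.size)

order = 2
coef_cf, _ = curve_fit(make_poly(order), x, y, p0=(1,) * (order + 1))
coef_ls = P.polyfit(x, y, order)  # linear least squares, no initial guess needed
res = make_poly(order)(x, *coef_cf) - y  # residuals of the curve_fit result
print(coef_cf, coef_ls)
```

If the two coefficient sets disagree noticeably, that usually points at a poor starting guess or a badly scaled problem rather than at the model.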
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b):
    return a * np.exp(-b * x)
xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func, xdata, ydata)
It is quite easy to fit an arbitrary Gaussian in Python with something like the above method. However, I would like to prepare a function that allows the user to select an arbitrary number of Gaussians and still attempt to find a best fit.
I'm trying to figure out how to modify the function func so that I can pass it an additional parameter n=2 for instance and it would return a function that would try and fit 2 Gaussians, akin to:
import numpy as np
from scipy.optimize import curve_fit
def func2(x, a, b, c, d):
    return (a * np.exp(-b * x) + c * np.exp(-d * x))
xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func2, xdata, ydata)
I would like to avoid hard-coding these additional cases, so that we could instead pass a single function like func(...,n=2) and get the same result as above. I'm having trouble finding an elegant solution to this. My guess is that the best solution will be something using a lambda function.
You can define the function to take a variable number of arguments using def func(x, *args). The *args variable then contains something like (a1, b1, a2, b2, a3, ...). It would be possible to loop over those and sum the Gaussians but I'm showing a vectorized solution instead.
Since curve_fit can no longer determine the number of parameters from this function, you provide an initial guess whose length determines the number of Gaussians you want to fit. Each Gaussian requires two parameters, so [1, 1]*n produces a parameter vector of the correct length.
from scipy.optimize import curve_fit
import numpy as np
def func(x, *args):
    x = x.reshape(-1, 1)
    a = np.array(args[0::2]).reshape(1, -1)
    b = np.array(args[1::2]).reshape(1, -1)
    return np.sum(a * np.exp(-b * x), axis=1)
n = 3
xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func, xdata, ydata, p0=[1, 1] * n)
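Note that a * exp(-b * x) is an exponential decay rather than a Gaussian. If actual Gaussian peaks are wanted, the same *args pattern works with three parameters per peak. The following is a sketch under that assumption; the parameter layout (amplitude, center, width) and the name multi_gauss are my own choices, and the data are synthetic and noise-free.

```python
import numpy as np
from scipy.optimize import curve_fit

def multi_gauss(x, *args):
    # args = (amp1, mu1, sigma1, amp2, mu2, sigma2, ...)
    params = np.asarray(args).reshape(-1, 3)
    amp, mu, sigma = params[:, 0], params[:, 1], params[:, 2]
    x = np.asarray(x).reshape(-1, 1)  # column vector, broadcasts against the peaks
    return np.sum(amp * np.exp(-(x - mu)**2 / (2 * sigma**2)), axis=1)

true_params = [1.0, 3.0, 0.5, 0.7, 7.0, 0.8]  # two peaks
xdata = np.linspace(0, 10, 200)
ydata = multi_gauss(xdata, *true_params)

# the length of p0 (3 values per peak) sets the number of Gaussians
p0 = [0.8, 2.5, 0.6, 0.5, 7.5, 1.0]
popt, pcov = curve_fit(multi_gauss, xdata, ydata, p0=p0)
print(popt)
```

Unlike the [1, 1] * n guess, the starting centers here are placed near the peaks; with several Gaussians the fit is usually sensitive to that.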
I am trying to fit different differential equations to a given data set with Python. For this, I use the scipy package, specifically the solve_ivp function.
This works fine for me, as long as I have a rough estimate of the parameters (b= 0.005) included in the differential equations, e.g:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np
def f(x, y, b):
    dydx= [-b[0] * y[0]]
    return dydx
xspan= np.linspace(1, 500, 25)
yinit= [5]
b= [0.005]
sol= solve_ivp(lambda x, y: f(x, y, b),
[xspan[0], xspan[-1]], yinit, t_eval= xspan)
print(sol)
print("\n")
print(sol.t)
print(sol.y)
plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined "automatically" from the best fit of the solved differential equation to a given data set (x and y). Is there a way to do this, for example by combining this example with the curve_fit function of scipy, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work, and it should be easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now have to define a function curve(x,*p) that transforms any list of points into a list of values according to the only parameter b.
def curve(x,b):
    # odefun(t, y, b) is assumed to be the ODE right-hand side, taking b as a scalar
    res = solve_ivp(odefun, [1,500], [5], t_eval=x, args = [b])
    return res.y[0]
Add optional arguments for error tolerances as necessary.
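As a sketch of what passing tolerances might look like: solve_ivp accepts rtol and atol keyword arguments, and tightening them reduces the integration error that the fit would otherwise absorb. The function odefun below is a hypothetical right-hand side with a scalar b, the tolerance values are arbitrary, and the data are synthetic and noise-free.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

def odefun(t, y, b):
    # hypothetical right-hand side: dy/dt = -b * y
    return -b * y

def curve(t, b):
    # integrate from t=1 with y(1)=5 and sample the solution at the data points
    sol = solve_ivp(odefun, [1, 500], [5], t_eval=t, args=(b,),
                    rtol=1e-8, atol=1e-10)
    return sol.y[0]

xdata = np.linspace(1, 500, 25)
ydata = 5 * np.exp(-0.005 * (xdata - 1))  # exact solution for b = 0.005

popt, pcov = curve_fit(curve, xdata, ydata, p0=[0.01])
print(popt[0])
```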
To make this more realistic, also make the initial point a parameter. That also makes it more obvious where a list is expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero slower, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit
xmin,xmax = 1,500
def f(t, y, b):
    dydt= -b * y
    return dydt
def curve(t, b, y0):
    sol= solve_ivp(lambda t, y: f(t, y, b),
                   [xmin, xmax], [y0], t_eval= t)
    return sol.y[0]
xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata)+0.02*np.random.randn(*xdata.shape)
y0 = 5
b= 0.005
p0 = [b,y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve
I'm trying to fit and plot a Gaussian curve to some given data. This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Generate data
mu, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(mu, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
def gauss (x, y):
    p = [x0, y0, sigma]
    return p[0] * np.exp(-(x-p[1])**2 / (2 * p[2]**2))
p0 = [1., 1., 1.]
fit = curve_fit(gauss, x, y, p0=p0)
plt.plot(gauss(x, y))
plt.show()
When I run the code I get this error:
TypeError: gauss() takes exactly 2 arguments (4 given)
I don't understand where I have given my function 4 arguments. I'm also not convinced I'm using the curve function correctly, but I'm not sure exactly what I'm doing wrong. Any help would be appreciated.
Edit
Here's the Traceback:
Traceback (most recent call last):
File "F:\Numerical methods\rw893 final assignment.py", line 21, in <module>
fitE, fitI = curve_fit(gauss, x, y, p0=p0)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 515, in curve_fit
res = leastsq(func, p0, args=args, full_output=1, **kw)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 354, in leastsq
shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 17, in _check_func
res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 427, in _general_function
return function(xdata, *params) - ydata
TypeError: gauss() takes exactly 2 arguments (4 given)
First, check the SciPy documentation at docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.optimize.curve_fit.html:
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted should take only scalars (not: *p0).
Remember that the initialization parameters x0, y0, sigma are handed to the function gauss during the call of curve_fit.
You pass them as the initialization p0 = [x0, y0, sigma].
The function gauss returns the value y = y0 * np.exp(-((x - x0) / sigma)**2).
Therefore the input values need to be x, x0, y0, sigma.
The first parameter x is the data you know, together with the result of the function, y. The latter three parameters will be fitted; you hand them over as initialization parameters.
Working example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Create data:
x0, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(x0, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
# Function to be fitted
def gauss(x, x0, y0, sigma):
    p = [x0, y0, sigma]
    return p[1]* np.exp(-((x-p[0])/p[2])**2)
# Initialization parameters
p0 = [1., 1., 1.]
# Fit the data with the function
fit, tmp = curve_fit(gauss, x, y, p0=p0)
# Plot the results
plt.title('Fit parameters:\n x0=%.2e y0=%.2e sigma=%.2e' % (fit[0], fit[1], fit[2]))
# Data
plt.plot(x, y, 'r--')
# Fitted function
x_fine = np.linspace(xe[0], xe[-1], 100)
plt.plot(x_fine, gauss(x_fine, fit[0], fit[1], fit[2]), 'b-')
plt.savefig('Gaussian_fit.png')
plt.show()
Probably your callback is called in curve_fit with a different number of parameters.
Have a look at the documentation where it says:
The model function, f(x, ...). It must take the independent variable
as the first argument and the parameters to fit as separate remaining
arguments.
To make sure this works out you might want to take *args after the first argument and have a look at what you get.
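One way to take that look, sketched here with synthetic data: give the model a variadic signature, record how many values arrive after x, and supply p0 so that curve_fit knows the parameter count. The names probe and calls are made up for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

calls = []  # number of fit parameters received on each call

def probe(x, *args):
    calls.append(len(args))
    amp, x0, sigma = args  # the parameters arrive as separate scalars
    return amp * np.exp(-(x - x0)**2 / (2 * sigma**2))

x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * np.exp(-(x - 0.1)**2 / (2 * 0.3**2))  # synthetic Gaussian data

popt, pcov = curve_fit(probe, x, y, p0=[1.0, 0.0, 0.5])
print(set(calls))
```

With three entries in p0, every call receives x plus three further arguments, which is where the "4 given" in the error message comes from.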
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c, d, x0):
    # note: b does not appear in the model, so the fit cannot constrain it
    return a*np.exp(-(x-x0)**2/(2*d**2)) + c
x = np.linspace(0,4,50)
y = func(x, 2.5, 1.3, 0.5, 1.0, 2.0)
yn = y + 0.2*np.random.normal(size=len(x))
p = [1,1,1,1,1]
popt, pcov = curve_fit(func, x, yn, p0=p)
plt.plot(x,func(x,popt[0],popt[1],popt[2],popt[3],popt[4]))
plt.plot(x,yn,'r+')
plt.show()
This should help. This can also be extended to a 3D Gaussian; then the input array 'x' should be a k-dimensional array for the (x,y) values and 'yn' should be the z-values.
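A sketch of that extension for a two-dimensional Gaussian surface, assuming synthetic noise-free data: curve_fit accepts xdata as an array of shape (k, M), so the x and y grid coordinates are stacked into a (2, M) array and the z-values are flattened to length M. The function name gauss2d and its parameterization are assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xy, amp, x0, y0, sx, sy, offset):
    # xy has shape (2, M): first row x coordinates, second row y coordinates
    x, y = xy
    return amp * np.exp(-((x - x0)**2 / (2 * sx**2)
                          + (y - y0)**2 / (2 * sy**2))) + offset

gx, gy = np.meshgrid(np.linspace(0, 4, 30), np.linspace(0, 4, 30))
xy = np.vstack((gx.ravel(), gy.ravel()))  # shape (2, 900)

true_params = (2.5, 2.0, 1.5, 0.6, 0.9, 0.5)
z = gauss2d(xy, *true_params)  # flattened z-values, shape (900,)

popt, pcov = curve_fit(gauss2d, xy, z, p0=(2.0, 1.8, 1.7, 0.8, 0.8, 0.3))
print(popt)
```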
I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.
The code below is derived from the curve_fit documentation. The code provided is an arbitrary set of data for test purposes but displays the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
#def func(x, a, b, c):
# return a*np.exp(-b*x) + c
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))
y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])
plt.plot(x,yn)
plt.plot(x,y_fit)
The issue seems to pop up when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even those identical to my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it no longer fits the data.
Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn) ,0,0,1))
Changing this single line in your code with sigma=10 and mu=50 produces
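A slightly more systematic way to get such starting values is to estimate the peak position and width from the data itself. Note that np.argmax(yn) is an index rather than an x value; it roughly matches the peak position here only because x runs from 0 to 100 in steps of about one. The sketch below assumes a single peak sitting on a roughly flat baseline; moment_guess is a hypothetical helper, not a SciPy function, and the test curve is a plain normal density with sigma=10 and mu=30 as in the question.

```python
import numpy as np

def moment_guess(x, y):
    # treat the baseline-subtracted curve as weights and take its first two moments
    w = y - np.min(y)
    w = w / np.sum(w)
    mu0 = np.sum(w * x)                         # weighted mean -> peak position
    sigma0 = np.sqrt(np.sum(w * (x - mu0)**2))  # weighted std -> peak width
    return sigma0, mu0

x = np.linspace(0, 100, 100)
y = np.exp(-(x - 30.0)**2 / (2 * 10.0**2)) / (10.0 * np.sqrt(2 * np.pi))

sigma0, mu0 = moment_guess(x, y)
# same parameter order as func(x, sigmag, mu, alpha, c, a) in the question
p0 = (sigma0, mu0, 0, 0, 1)
print(p0)
```

The tuple p0 could then be passed to curve_fit in place of the hand-written guess.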
You can call curve_fit many times with random initial guesses, and choose the parameters with the minimum error.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
def func(x, sigmag, mu, alpha, c,a):
    #normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
results = []
for i in range(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except RuntimeError:
        # this starting point did not converge; try the next one
        continue
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break
err, popt = min(results, key=lambda x:x[0])
y_fit= func(x, *popt)
plt.plot(x,yn)
plt.plot(x,y_fit)
print(len(results))
I have a data surface that I'm fitting using SciPy's leastsq function.
I would like to have some estimate of the quality of the fit after leastsq returns. I'd expected that this would be included as a return from the function, but, if so, it doesn't seem to be clearly documented.
Is there such a return or, barring that, some function I can pass my data and the returned parameter values and fit function to that will give me an estimate of fit quality (R^2 or some such)?
Thanks!
If you call leastsq like this:
import scipy.optimize as optimize
p,cov,infodict,mesg,ier = optimize.leastsq(
residuals,a_guess,args=(x,y),full_output=True)
where
def residuals(a,x,y):
    return y-f(x,a)
then, using the definition of R^2 given here,
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
What is infodict['fvec'] you ask? It's the array of residuals:
In [48]: optimize.leastsq?
...
infodict -- a dictionary of optional outputs with the keys:
'fvec' : the function evaluated at the output
For example:
import scipy.optimize as optimize
import numpy as np
import collections
import matplotlib.pyplot as plt
x = np.array([821,576,473,377,326])
y = np.array([255,235,208,166,157])
def sigmoid(p,x):
    x0,y0,c,k=p
    y = c / (1 + np.exp(-k*(x-x0))) + y0
    return y
def residuals(p,x,y):
    return y - sigmoid(p,x)
Param=collections.namedtuple('Param','x0 y0 c k')
p_guess=Param(x0=600,y0=200,c=100,k=0.01)
p,cov,infodict,mesg,ier = optimize.leastsq(
residuals,p_guess,args=(x,y),full_output=True)
p=Param(*p)
xp = np.linspace(100, 1600, 1500)
print('''\
x0 = {p.x0}
y0 = {p.y0}
c = {p.c}
k = {p.k}
'''.format(p=p))
pxp=sigmoid(p,xp)
# You could compute the residuals this way:
resid=residuals(p,x,y)
print(resid)
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
# But you don't have to compute `resid` -- `infodict['fvec']` already
# contains the info.
print(infodict['fvec'])
# [ 0.76205302 -2.010142 2.60265297 -3.02849144 1.6739274 ]
ss_err=(infodict['fvec']**2).sum()
ss_tot=((y-y.mean())**2).sum()
rsquared=1-(ss_err/ss_tot)
print(rsquared)
# 0.996768131959
plt.plot(x, y, '.', xp, pxp, '-')
plt.xlim(100,1000)
plt.ylim(130,270)
plt.xlabel('x')
plt.ylabel('y',rotation='horizontal')
plt.grid(True)
plt.show()
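The same R^2 calculation also works when the fit comes from curve_fit, which does not return infodict by default: evaluate the model at the fitted parameters and form the residuals by hand. A minimal sketch, assuming a straight-line model and synthetic data with a fixed seed; r_squared and line are names made up here.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, a, b):
    return a + b * x

def r_squared(y, y_fit):
    # 1 - (residual sum of squares) / (total sum of squares)
    ss_err = np.sum((y - y_fit)**2)
    ss_tot = np.sum((y - np.mean(y))**2)
    return 1.0 - ss_err / ss_tot

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, x.size)

popt, pcov = curve_fit(line, x, y)
rsq = r_squared(y, line(x, *popt))
print(rsq)
```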