Fitting Gaussian curve to data in python

I'm trying to fit and plot a Gaussian curve to some given data. This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Generate data
mu, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(mu, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
def gauss(x, y):
    p = [x0, y0, sigma]
    return p[0] * np.exp(-(x-p[1])**2 / (2 * p[2]**2))
p0 = [1., 1., 1.]
fit = curve_fit(gauss, x, y, p0=p0)
plt.plot(gauss(x, y))
plt.show()
When I run the code I get this error:
TypeError: gauss() takes exactly 2 arguments (4 given)
I don't understand where I have given my function 4 arguments. I'm also not convinced I'm using the curve function correctly, but I'm not sure exactly what I'm doing wrong. Any help would be appreciated.
Edit
Here's the Traceback:
Traceback (most recent call last):
  File "F:\Numerical methods\rw893 final assignment.py", line 21, in <module>
    fitE, fitI = curve_fit(gauss, x, y, p0=p0)
  File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 515, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 354, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 17, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "F:\Portable Python 2.7.5.1\App\lib\site-packages\scipy\optimize\minpack.py", line 427, in _general_function
    return function(xdata, *params) - ydata
TypeError: gauss() takes exactly 2 arguments (4 given)

First, check the scipy documentation for curve_fit (docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.optimize.curve_fit.html):
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted must take each fit parameter as a separate scalar argument (not a single list such as p0).
Remember that curve_fit hands the initialization parameters x0, y0, sigma over to the function gauss one by one during the fit.
You set the initialization with p0 = [x0, y0, sigma].
The function gauss returns the value y = y0 * np.exp(-((x - x0) / sigma)**2).
Therefore its input arguments need to be x, x0, y0, sigma.
The first parameter x is the data you know, together with the result of the function, y. The latter three parameters will be fitted; you hand them over as initialization values.
Working example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# Create data:
x0, sigma = 0, 0.1
y, xe = np.histogram(np.random.normal(x0, sigma, 1000))
x = .5 * (xe[:-1] + xe[1:])
# Function to be fitted
def gauss(x, x0, y0, sigma):
    p = [x0, y0, sigma]
    return p[1] * np.exp(-((x-p[0])/p[2])**2)
# Initialization parameters
p0 = [1., 1., 1.]
# Fit the data with the function
fit, tmp = curve_fit(gauss, x, y, p0=p0)
# Plot the results
plt.title('Fit parameters:\n x0=%.2e y0=%.2e sigma=%.2e' % (fit[0], fit[1], fit[2]))
# Data
plt.plot(x, y, 'r--')
# Fitted function
x_fine = np.linspace(xe[0], xe[-1], 100)
plt.plot(x_fine, gauss(x_fine, fit[0], fit[1], fit[2]), 'b-')
plt.savefig('Gaussian_fit.png')
plt.show()

Probably your model function is being called inside curve_fit with a different number of parameters than it accepts.
Have a look at the documentation where it says:
The model function, f(x, ...). It must take the independent variable
as the first argument and the parameters to fit as separate remaining
arguments.
To make sure this works out, you might want to accept *args after the first argument and have a look at what you get, as in the sketch below.
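A minimal sketch of that debugging trick (gauss_debug is a hypothetical name, not from the question):
import numpy as np
def gauss_debug(x, *args):
    # print the parameters curve_fit passes in, to see how many there are
    print("received parameters:", args)
    x0, y0, sigma = args  # unpack once the count is confirmed
    return y0 * np.exp(-((x - x0) / sigma)**2)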

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c, d, x0):
    return a*np.exp(-(x-x0)**2/(2*d**2)) + c
x = np.linspace(0,4,50)
y = func(x, 2.5, 1.3, 0.5, 1.0, 2.0)
yn = y + 0.2*np.random.normal(size=len(x))
p = [1,1,1,1,1]
popt, pcov = curve_fit(func, x, yn, p0=p)
plt.plot(x,func(x,popt[0],popt[1],popt[2],popt[3],popt[4]))
plt.plot(x,yn,'r+')
plt.show()
This should help. The approach can also be extended to a 3-D Gaussian (a Gaussian surface over the (x, y) plane): the input array x then becomes a k-dimensional array holding the (x, y) coordinates, and yn holds the z-values. See the sketch below.
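A hedged sketch of that extension (the function gauss2d and all values here are made up for illustration): pack the (x, y) coordinates into one array of shape (2, N) and fit the flattened z-values.
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(xy, a, x0, y0, sigma, c):
    x, y = xy  # xy has shape (2, N)
    return a * np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) + c

# Synthetic noisy surface
xg, yg = np.meshgrid(np.linspace(-3, 3, 40), np.linspace(-3, 3, 40))
xy = np.vstack([xg.ravel(), yg.ravel()])
zn = gauss2d(xy, 2.0, 0.5, -0.3, 0.8, 0.1) + 0.05*np.random.normal(size=xy.shape[1])
popt, pcov = curve_fit(gauss2d, xy, zn, p0=[1, 0, 0, 1, 0])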

Related

How to determine unknown parameters of a differential equation based on the best fit to a data set in Python?

I am trying to fit different differential equations to a given data set with Python. For this I use the scipy package, specifically the solve_ivp function.
This works fine for me as long as I have a rough estimate of the parameters (b = 0.005) included in the differential equations, e.g.:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np
def f(x, y, b):
    dydx = [-b[0] * y[0]]
    return dydx

xspan = np.linspace(1, 500, 25)
yinit = [5]
b = [0.005]

sol = solve_ivp(lambda x, y: f(x, y, b),
                [xspan[0], xspan[-1]], yinit, t_eval=xspan)
print(sol)
print("\n")
print(sol.t)
print(sol.y)
plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined "automatically" based on the best fit of the solved differential equation to a given data set (x and y). Is there a way this can be done, for example by combining this example with the curve_fit function of scipy, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work; the pieces are easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now have to define a function curve(x, b) that transforms any list of points into a list of values according to the single parameter b:
def curve(x, b):
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b])
    return res.y[0]
Add optional arguments for error tolerances as necessary.
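A sketch of that (rtol and atol are standard solve_ivp options; the defaults are 1e-3 and 1e-6):
def curve(x, b):
    # tighter integration tolerances keep the solver error well below the data noise
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=[b],
                    rtol=1e-8, atol=1e-10)
    return res.y[0]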
To make this more realistic, also make the initial point a parameter. Then it becomes more obvious where a list is expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero less fast, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

xmin, xmax = 1, 500

def f(t, y, b):
    dydt = -b * y
    return dydt

def curve(t, b, y0):
    sol = solve_ivp(lambda t, y: f(t, y, b),
                    [xmin, xmax], [y0], t_eval=t)
    return sol.y[0]

xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata) + 0.02*np.random.randn(*xdata.shape)

y0 = 5
b = 0.005
p0 = [b, y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve (the original answer showed the resulting figure here).
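A minimal sketch of that plot, reusing the names from the example above:
import matplotlib.pyplot as plt

plt.plot(xdata, ydata, 'r+', label='noisy data')
plt.plot(xdata, curve(xdata, b, y0), 'b-', label='fitted curve')
plt.legend()
plt.show()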

Error non-linear-regression python curve-fit

Hello guys, I want to do non-linear regression in Python with curve_fit.
this is my code:
#fit a fourth degree polynomial to the economic data
from numpy import arange
from scipy.optimize import curve_fit
from matplotlib import pyplot
import math
x = [17.47,20.71,21.08,18.08,17.12,14.16,14.06,12.44,11.86,11.19,10.65]
y = [5,35,65,95,125,155,185,215,245,275,305]
# define the true objective function
def objective(x, a, b, c, d, e):
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c, d, e = popt
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
this is my error :
Traceback (most recent call last):
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 16, in <module>
    popt, _ = curve_fit(objective, x, y)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 784, in curve_fit
    res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 410, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 24, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 484, in func_wrapped
    return func(xdata, *params) - ydata
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 13, in objective
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))
TypeError: only size-1 arrays can be converted to Python scalars
thanks before
This is a known limitation of the math library: its functions only accept scalars, so calling them on an array fails. Simply use numpy instead and your problem should be fixed, since numpy functions support both scalars and arrays.
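A quick illustration of the difference:
import math
import numpy as np

arr = np.array([1.0, 2.0, 3.0])
print(np.log(arr))   # works elementwise: [0.         0.69314718 1.09861229]
# math.log(arr)      # raises TypeError: only size-1 arrays can be converted to Python scalars
With np.log in place of math.log, the corrected code runs: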
#fit a fourth degree polynomial to the economic data
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
x = [17.47,20.71,21.08,18.08,17.12,14.16,14.06,12.44,11.86,11.19,10.65]
y = [5,35,65,95,125,155,185,215,245,275,305]
# define the true objective function
def objective(x, a, b, c, d, e):
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(np.log(305))-np.log(x))+((e)*(np.log(305)-(np.log(x))**2))
popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c, d, e = popt
# plot input vs output
plt.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = np.arange(np.min(x), np.max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e)
# create a line plot for the mapping function
plt.plot(x_line, y_line, '--', color='red')
plt.show()

Using scipy.optimize.curve_fit with weights

According to the documentation, the argument sigma can be used to set the weights of the data points in the fit. These "describe" 1-sigma errors when the argument absolute_sigma=True.
I have some data with artificial normally-distributed noise which varies:
n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3
def f(x, x0, A, alpha):
    return A * np.exp(-((x-x0)/alpha)**2)
noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise
If I want to fit the noisy y to f using curve_fit, what should I set sigma to? The documentation isn't very specific here, but I would usually use 1/noise_sigma**2 as the weight:
p0 = 10, 4, 2
popt, pcov = curve_fit(f, x, y, p0)
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)
It doesn't seem to improve the fit much, though.
Is this option only used to better interpret the fit uncertainties through the covariance matrix? What is the difference between these two telling me?
In [249]: pcov
Out[249]:
array([[  1.10205238e-02,  -3.91494024e-08,   8.81822412e-08],
       [ -3.91494024e-08,   1.52660426e-02,  -1.05907265e-02],
       [  8.81822412e-08,  -1.05907265e-02,   2.20414887e-02]])
In [250]: pcov2
Out[250]:
array([[ 0.26584674, -0.01836064, -0.17867193],
       [-0.01836064,  0.27833   , -0.1459469 ],
       [-0.17867193, -0.1459469 ,  0.38659059]])
At least with scipy version 1.1.0, the parameter sigma should be equal to the error on each data point. Specifically, the documentation says:
A 1-d sigma should contain values of standard deviations of errors in ydata. In this case, the optimized function is chisq = sum((r / sigma) ** 2).
In your case that would be:
curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)
I looked through the source code and verified that when you specify sigma this way it minimizes ((f-data)/sigma)**2.
As a side note, this is in general what you want to be minimizing when you know the errors. The likelihood of observing points data given a model f is given by:
L(data|x0,A,alpha) = product over i Gaus(data_i, mean=f(x_i,x0,A,alpha), sigma=sigma_i)
which if you take the negative log becomes (up to constant factors that don't depend on the parameters):
-log(L) = sum over i (f(x_i,x0,A,alpha)-data_i)**2/(sigma_i**2)
which is just the chisquare.
I wrote a test program to verify that curve_fit was indeed returning the correct values with the sigma specified correctly:
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit, fmin
np.random.seed(0)
def make_chi2(x, data, sigma):
    def chi2(args):
        x0, A, alpha = args
        return np.sum(((f(x, x0, A, alpha) - data)/sigma)**2)
    return chi2
n = 200
x = np.linspace(1, 20, n)
x0, A, alpha = 12, 3, 3
def f(x, x0, A, alpha):
    return A * np.exp(-((x-x0)/alpha)**2)
noise_sigma = x/20
noise = np.random.randn(n) * noise_sigma
yexact = f(x, x0, A, alpha)
y = yexact + noise
p0 = 10, 4, 2
# curve_fit without parameters (sigma is implicitly equal to one)
popt, pcov = curve_fit(f, x, y, p0)
# curve_fit with wrong sigma specified
popt2, pcov2 = curve_fit(f, x, y, p0, sigma=1/noise_sigma**2, absolute_sigma=True)
# curve_fit with correct sigma
popt3, pcov3 = curve_fit(f, x, y, p0, sigma=noise_sigma, absolute_sigma=True)
chi2 = make_chi2(x,y,noise_sigma)
# double checking that we get the correct answer
xopt = fmin(chi2,p0,xtol=1e-10,ftol=1e-10)
print("popt = %s, chi2 = %.2f" % (popt,chi2(popt)))
print("popt2 = %s, chi2 = %.2f" % (popt2, chi2(popt2)))
print("popt3 = %s, chi2 = %.2f" % (popt3, chi2(popt3)))
print("xopt = %s, chi2 = %.2f" % (xopt, chi2(xopt)))
which outputs:
popt = [ 11.93617403 3.30528488 2.86314641], chi2 = 200.66
popt2 = [ 11.94169083 3.30372955 2.86207253], chi2 = 200.64
popt3 = [ 11.93128545 3.333727 2.81403324], chi2 = 200.44
xopt = [ 11.93128603 3.33373094 2.81402741], chi2 = 200.44
As you can see the chi2 is indeed minimized correctly when you specify sigma=sigma as an argument to curve_fit.
As to why the improvement isn't "better", I'm not really sure. My only guess is that without specifying a sigma value you implicitly assume the errors are all equal, and over the part of the data where the fit matters (the peak), the errors are approximately equal anyway.
To answer your second question, no the sigma option is not only used to change the output of the covariance matrix, it actually changes what is being minimized.
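As a small follow-up sketch: the 1-sigma uncertainties on the fitted parameters are the square roots of the diagonal of the returned covariance matrix (pcov3 here refers to the fit with the correct sigma from the example above):
import numpy as np

perr = np.sqrt(np.diag(pcov3))  # 1-sigma uncertainties on x0, A, alpha
print(perr)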

Pass tuple as input argument for scipy.optimize.curve_fit

I have the following code:
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] + x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
It will raise TypeError: func() takes exactly 2 arguments (3 given). Well, that sounds fair - curve_fit unpacks the (0, 0) into two scalar inputs. So I tried this:
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=((0, 0),))
Again, it said: ValueError: object too deep for desired array
If I left it as default (not specifying p0):
popt, pcov = curve_fit(func, np.arange(10), np.arange(10))
It will raise IndexError: invalid index to scalar variable. Obviously, it only gave the function a scalar for p.
I can make def func(x, p1, p2): return p1 + p2 + x to get it working, but in more complicated situations the code is going to look verbose and messy. I'd really love it if there were a cleaner solution to this problem.
Thanks!
Not sure if this is cleaner, but at least it is now easier to add more parameters to the fitting function. Maybe one could build an even better solution out of this.
import numpy as np
from scipy.optimize import curve_fit
def func(x, p): return p[0] + p[1] * x
def func2(*args):
    return func(args[0], args[1:])

popt, pcov = curve_fit(func2, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
EDIT: This works for me
import numpy as np
from scipy.optimize import curve_fit
def func(x, *p): return p[0] + p[1] * x
popt, pcov = curve_fit(func, np.arange(10), np.arange(10), p0=(0, 0))
print(popt, pcov)
Problem
When using curve_fit you must explicitly say the number of fit parameters. Doing something like:
def f(x, *p):
    return sum([p[i]*x**i for i in range(len(p))])
would be great, since it would be a general nth-order polynomial fitting function, but unfortunately, in my SciPy 0.12.0, it raises:
ValueError: Unable to determine number of fit parameters.
Solution
So you should do:
def f_1(x, p0, p1):
    return p0 + p1*x

def f_2(x, p0, p1, p2):
    return p0 + p1*x + p2*x**2
and so forth...
Then you can call using the p0 argument:
curve_fit(f_1, xdata, ydata, p0=(0,0))
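A hedged side note: in newer SciPy versions the varargs form does work, provided p0 fixes the number of parameters (consistent with the EDIT in the answer above), e.g.:
import numpy as np
from scipy.optimize import curve_fit

def f(x, *p):
    # general polynomial; the parameter count is inferred from len(p0)
    return sum(p[i] * x**i for i in range(len(p)))

xdata = np.arange(10, dtype=float)
ydata = 1.0 + 2.0 * xdata
popt, pcov = curve_fit(f, xdata, ydata, p0=(0.0, 0.0))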
scipy.optimize.curve_fit
scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, **kw)
Use non-linear least squares to fit a function, f, to data.
Assumes ydata = f(xdata, *params) + eps
Explaining the idea
The function to be fitted must take each fit parameter as a separate scalar argument (not a single list such as p0).
Remember that the result of the fit depends on the initialization parameters.
Working example
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def func(x, a0, a1):
    return a0 + a1 * x
x, y = np.arange(10), np.arange(10) + np.random.randn(10)/10
popt, pcov = curve_fit(func, x, y, p0=(1, 1))
# Plot the results
plt.title('Fit parameters:\n a0=%.2e a1=%.2e' % (popt[0], popt[1]))
# Data
plt.plot(x, y, 'rx')
# Fitted function
x_fine = np.linspace(x[0], x[-1], 100)
plt.plot(x_fine, func(x_fine, popt[0], popt[1]), 'b-')
plt.savefig('Linear_fit.png')
plt.show()
You can define functions that return other functions (see "Passing additional arguments using scipy.optimize.curve_fit?").
Working example:
import numpy as np
import random
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt

def funToFit(x):
    return 0.5 + 2*x - 3*x*x + 0.2*x*x*x + 0.1*x*x*x*x

xx = [random.uniform(1, 5) for i in range(30)]
yy = [funToFit(xx[i]) + random.uniform(-1, 1) for i in range(len(xx))]
a = np.zeros(5)

def make_func(numarg):
    def func(x, *a):
        v = 0
        for i in range(numarg):
            v += a[i]*np.power(x, i)
        return v
    return func

leastsq, covar = curve_fit(make_func(len(a)), xx, yy, p0=tuple(a))
print(leastsq)

def fFited(x):
    v = 0
    for i in range(len(leastsq)):
        v += leastsq[i]*np.power(x, i)
    return v

xfine = np.linspace(1, 5, 200)
plt.plot(xx, yy, ".")
plt.plot(xfine, fFited(xfine))
plt.show()
This is an old thread now, but I also just ran into this issue. Building on Emile Maras' solution, but expanding the function to return either the nth-order polynomial fitting function for curve_fit, or the y-values based on fit results. This facilitates plotting and residual calculations. Here is an example that fits data to progressively higher-order polynomials and plots the results and residuals.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def funToFit(x):
return 0.5 + 2*x -3*x**2 + 0.2*x**3 + 0.1*x**4
x=np.arange(30)
y=funToFit(x)+np.random.normal(0,5000,x.size)
def polyfun(order=0, x=np.arange(0), P=np.arange(0)):
    if P.size > 0 and x.size > 0:
        y = 0
        for i in range(P.size):
            y += P[i]*np.power(x, i)
        return y
    elif order > 0:
        def fitfun(x, *a):
            y = 0
            for i in range(order+1):
                y += a[i]*np.power(x, i)
            return y
        return fitfun
    else:
        raise Exception("Either order or x and P must be provided")
plt.figure("fits")
plt.plot(x,y,color="black")
for i in range(4):
    order = i+1
    fit, covar = curve_fit(polyfun(order=order), x, y, p0=(1,)*(order+1))
    yfit = polyfun(x=x, P=fit)
    res = yfit - y
    plt.figure("fits")
    plt.plot(x, yfit)
    plt.figure("res")
    plt.plot(x, res)
plt.show()

scipy.optimize.curve_fit unable to fit shifted skewed gaussian curve

I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fitting is quite poor, often giving me close to or exactly a straight line.
The code below is derived from the curve_fit documentation. The code provided is an arbitrary set of data for test purposes but displays the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
#def func(x, a, b, c):
#    return a*np.exp(-b*x) + c

def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))
y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])
plt.plot(x,yn)
plt.plot(x,y_fit)
The issue seems to pop up when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even ones identical to those of my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it no longer fits the data.
Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn), 0, 0, 1))
(np.argmax(yn) works as a guess for mu here only because x runs from 0 to 100 over 100 points, so the index of the maximum approximates its x-position; in general x[np.argmax(yn)] is the safer guess.)
Changing this single line in the code, with sigma=10 and mu=50 in the test data, produces a good fit (the original answer showed the resulting figure here).
You can call curve_fit many times with random initial guess, and choose the parameters with minimum error.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
results = []
for i in range(50):
    p = np.random.randn(5)*10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except Exception:
        continue  # this random start failed to converge, try another
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break
err, popt = min(results, key=lambda t: t[0])
y_fit = func(x, *popt)
plt.plot(x, yn)
plt.plot(x, y_fit)
print(len(results))
