Gaussian fit in Python - parameters estimation - python

I want to fit an array of data (in the program called "data", of size "n") with a Gaussian function and I want to get the estimations for the parameters of the curve, namely the mean and the sigma. Is the following code, which I found on the Web, a fast way to do that? If so, how can I actually get the estimated values of the parameters?
import pylab as plb
from scipy.optimize import curve_fit
from scipy import asarray as ar,exp
x = ar(range(n))
y = data
n = len(x) #the number of data
mean = sum(x*y)/n #note this correction
sigma = sum(y*(x-mean)**2)/n #note this correction
def gaus(x,a,x0,sigma,c):
return a*exp(-(x-x0)**2/(sigma**2))+c
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma,0.0])
print popt
print pcov
plt.plot(x,y,'b+:',label='data')
plt.plot(x,gaus(x,*popt),'ro:',label='fit')
plt.legend()
plt.title('Fig. 3 - Fit')
plt.xlabel('q')
plt.ylabel('data')
plt.show()

To answer your first question, "Is the following code, which I found on the Web, a fast way to do that?"
The code that you have is in fact the right way to proceed with fitting your data, when you believe is Gaussian and know the fitting function (except change the return function to
a*exp(-(x-x0)**2/(sigma**2)).
I believe for a Gaussian function you don't need the constant c parameter.
A common use of least-squares minimization is curve fitting, where one has a parametrized model function meant to explain some phenomena and wants to adjust the numerical values for the model to most closely match some data. With scipy, such problems are commonly solved with scipy.optimize.curve_fit.
To answer your second question, "If so, how can I actually get the estimated values of the parameters?"
You can go to the link provided for scipy.optimize.curve_fit and find that the best fit parameters reside in your popt variable. In your example, popt will contain the mean and sigma of your data. In addition to the best fit parameters, pcov will contain the covariance matrix, which will have the errors of your mean and sigma. To obtain 1sigma standard deviations, you can simply use np.sqrt(pcov) and obtain the same.

Related

Issues fitting gaussian to scatter plot

I'm having a lot of trouble fitting this data, particularly getting the fit parameters to match the expected parameters.
from scipy.optimize import curve_fit
import numpy as np
def gaussian_model(x, a, b, c, d): # add constant d
return a*np.exp(-(x-b)**2/(2*c**2))+d
x = np.linspace(0, 20, 100)
mu, cov = curve_fit(gaussian_model, xdata, ydata)
fit_A = mu[0]
fit_B = mu[1]
fit_C = mu[2]
fit_D = mu[3]
fit_y = gaussian_model(xdata, fit_A, fit_B, fit_C, fit_D)
print(mu)
plt.plot(x, fit_y)
plt.scatter(xdata, ydata)
plt.show()
Here's the plot
When I printed the parameters, I got values of -17 for amplitude, 2.6 for mean, -2.5 for standard deviation, and 110 for the base. This is very far off from what I would expect from the scatter plot. Any ideas why?
Also, I'm pretty new to coding, so any advice is helpful! Thanks everyone :)
Edit: figured out what was wrong! Just needed to add some guesses.
This is not an answer as expected.
This is an alternative method of fitting gaussian.
The process is not iteratif and doesn't requier initial "guessed" values of the parameters to start as in the usual methods.
The result is :
The method of calculus is shown below :
The general principle is explained with examples in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . This is a linear regression wrt an integral equation which solution is the gaussian function.
If one want more accurate and/or more specific result according to some specified criteria of fitting, one have to use a software with non-linear regression process. Then one can use the above result as initial values of parameters for a more robust iterative process.

Exponential curve fitting of pandas data in python

I'm trying to fit an exponential curve to some data represented by a pandas dataframe. The data looks like this:
The code I've used for curve fitting:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
t = df['time'].values
ym = df['value'].values
def func(t, c0, c1, c2, c3):
return c0 + c1*t - c2*np.exp(-c3*t)
p0 = [6e6, 0.01, 100, 0.01]
c, cov = curve_fit(func, t, ym, p0)
print(c) # Output: [-5.46019366e+06 3.19567938e+03 1.00000000e+08 1.00000000e+06]
yp = func(t, c[0], c[1], c[2], c[3])
plt.figure()
plt.plot(t/60, ym)
plt.plot(t/60, yp)
However, the fitted curve always seem to be linear like this:
I have tried different methods I've found online and always get the same linear result. My dataframe look like this, were Cycle_id corresponds to "time", and peak correspond to "value":
Any suggestion on how to fit this data is much appreciated, since I can't seem to find any errors in my code upon reviewing it, thus not getting any further..
Sorry for the poor answer. I have not enough knowledge about Python in practical use. Moreover it is not possible to get sufficiently correct data from a picture. A scanning provided data which was used in the below calculus but the results are probably not accurate.
I guess that the difficulty that you faced comes from the method of calculus which is iterative starting from "guessed" values of the parameters.
If we use a non-iterative method which doesn't need initial guessed values the calculus is generally more robust. Such a method is explain in this paper : https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
They are a lot of numerical examples in the paper but unfortunately your function is not treated in full details. It is not difficult to adapt the method to this case : See below. The result is :
Possibly you can use the above values of p,a,b,c as initial values of parameters in a more classical method.
FOR INFOMATION :
The method of non-iterative regression uses a convenient integral equation which transforms the non-linear regression to a linear regression. In the present case the integral equation is :

Problem while fitting a set of data points to arbitrary curve with Scipy

I have a set of data points which, according to the model I want to implement, could be modelled with a certain curve (in this case, a product between an exponential and a complementary error function).
For fitting these data into such a curve, I tried:
import numpy as np
from scipy.optimize import curve_fit
from scipy import special
x_fit = np.linspace(0,1,1000)
def fitted_function(x_fit, c, d, S):
return c*np.exp(((S*d/2)**2)-x_fit*d)*special.erfc(S*d/2-x_fit/S)
FitParameters, FitCovariance = curve_fit(fitted_function, x_data, y_data, maxfev = 100000)
It does not give me any particular error, but the result of the fitting is evidently wrong. I strongly suspect that it has to do with the the part x_fit/S, where the fitting parameter S appears as a denominator.
For example, I encounter the same problem while fitting a simple exponential: if I define the fitting curve with
return a*np.exp(-x_fit/b)
with a, b fitting parameters; since the fitting parameter b appears as a denominator, I find the same problem (i.e. the resulting fitted curve is a horizontal line for some reason).
For the case of a simple exponential I can simple bypass this by doing
return a*np.exp(-b*x_fit)
so that b is not a denominator anymore and the fitted curve is really an exponential curve. For my current case, instead, I cannot do this since S appears ad a numerator and a denominator in different part of the expression.
Any ideas? Thank you in advance!

Reducing difference between two graphs by optimizing more than one variable in MATLAB/Python?

Suppose 'h' is a function of x,y,z and t and it gives us a graph line (t,h) (simulated). At the same time we also have observed graph (observed values of h against t). How can I reduce the difference between observed (t,h) and simulated (t,h) graph by optimizing values of x,y and z? I want to change the simulated graph so that it imitates closer and closer to the observed graph in MATLAB/Python. In literature I have read that people have done same thing by Lavenberg-marquardt algorithm but don't know how to do it?
You are actually trying to fit the parameters x,y,z of the parametrized function h(x,y,z;t).
MATLAB
You're right that in MATLAB you should either use lsqcurvefit of the Optimization toolbox, or fit of the Curve Fitting Toolbox (I prefer the latter).
Looking at the documentation of lsqcurvefit:
x = lsqcurvefit(fun,x0,xdata,ydata);
It says in the documentation that you have a model F(x,xdata) with coefficients x and sample points xdata, and a set of measured values ydata. The function returns the least-squares parameter set x, with which your function is closest to the measured values.
Fitting algorithms usually need starting points, some implementations can choose randomly, in case of lsqcurvefit this is what x0 is for. If you have
h = #(x,y,z,t) ... %// actual function here
t_meas = ... %// actual measured times here
h_meas = ... %// actual measured data here
then in the conventions of lsqcurvefit,
fun <--> #(params,t) h(params(1),params(2),params(3),t)
x0 <--> starting guess for [x,y,z]: [x0,y0,z0]
xdata <--> t_meas
ydata <--> h_meas
Your function h(x,y,z,t) should be vectorized in t, such that for vector input in t the return value is the same size as t. Then the call to lsqcurvefit will give you the optimal set of parameters:
x = lsqcurvefit(#(params,t) h(params(1),params(2),params(3),t),[x0,y0,z0],t_meas,h_meas);
h_fit = h(x(1),x(2),x(3),t_meas); %// best guess from curve fitting
Python
In python, you'd have to use the scipy.optimize module, and something like scipy.optimize.curve_fit in particular. With the above conventions you need something along the lines of this:
import scipy.optimize as opt
popt,pcov = opt.curve_fit(lambda t,x,y,z: h(x,y,z,t), t_meas, y_meas, p0=[x0,y0,z0])
Note that the p0 starting array is optional, but all parameters will be set to 1 if it's missing. The result you need is the popt array, containing the optimal values for [x,y,z]:
x,y,z = popt
h_fit = h(x,y,z,t_meas)

Python: Fit error function (erf) or similar to data

I need to fit a special distribution of data with any available function. The distribution does not really follow a specific theoretical prediction, so I just want to fit any given function without great meaning. I attached an image with a sample distribution and a fifth order polynomial fit to show that this simple approach does not really work.
I know the distribution closely resembles an error function, but I did not manage to fit such a function with scipy...
I hope anyone has either a way to fit an error function to such a distribution, or maybe can suggest a different type of function I could fit to describe this distribution.
You can fit any function you want:
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func, xdata, ydata)
In case you want some function similar to erf, you can use for example:
def func(z,a,b):
return a*scipy.special.erf(z)+b
This will find the parameters a,b.
Further fitting parameters might be helpful:
def func(x, a, b, z, f):
return a * scipy.special.erf((x - z)*f) + b
To prevent a runtime error, the number of iterations (default = 800) can be adapted with maxfev:
popt, pcov = curve_fit(func, x, y, maxfev=8000)
(See Scipy curvefit RuntimeError:Optimal parameters not found: Number of calls to function has reached maxfev = 1000)

Categories