How to get intercept, slope and fit line using scipy ODR - Python

I am new to Python. I want to perform orthogonal distance regression using scipy ODR with the code below. I do not know how to extract the slope and intercept from the output, or on what logic we choose the values for beta0 in myodr = odr.ODR(mydata, linear, beta0=[0., 1.]).
import scipy.odr as odr

# x, y and their uncertainties xerr, yerr are the data arrays
def f(B, x):
    return B[0]*x + B[1]   # B[0] is the slope, B[1] is the intercept

linear = odr.Model(f)
mydata = odr.Data(x, y, wd=1./xerr, we=1./yerr)
myodr = odr.ODR(mydata, linear, beta0=[0., 1.])  # how to put beta0 values here?
myoutput = myodr.run()
myoutput.pprint()

The fitted parameters are stored in the beta attribute of the Output instance returned by the run() method.
You used the generic Model, so the meaning of the parameters is determined by how you use them in f. In your case, you can recover the slope and intercept with
slope, intercept = myoutput.beta
ODR fitting involves solving a nonlinear problem with numerical methods that require a starting "guess" of the solution. The better the guess, the more likely that the numerical method will converge to a solution. Determining a good initial guess for an arbitrary model can be difficult. Your model is linear, so there are several methods that might work fine. You could, for example, use the result of a standard least squares fit. Or you could use the equation of the line through two data points, say the points associated with the smallest and largest x values.
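For instance, an ordinary least-squares line (obtained here with numpy.polyfit, which is just one possible choice and not part of the original snippet) gives a sensible beta0 for the f defined above:

import numpy as np

# x and y are the same arrays passed to odr.Data above
slope0, intercept0 = np.polyfit(x, y, 1)   # ordinary least-squares straight line
beta0 = [slope0, intercept0]               # matches f: B[0]*x + B[1]

myodr = odr.ODR(mydata, linear, beta0=beta0)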
Because your model is linear, you might prefer to use the predefined model unilinear. Then you don't have to define f, and you don't have to provide beta0, because the unilinear model includes its own method for generating an initial guess. The meaning of the parameters in the unilinear model is the same as in your f function, so you can continue to use slope, intercept = myoutput.beta to retrieve the results.
Something like this should work:
mydata = odr.Data(x, y, wd=1./xerr, we=1./yerr)
myodr = odr.ODR(mydata, model=odr.unilinear)
myoutput = myodr.run()
slope, intercept = myoutput.beta
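If you also want uncertainties on the fitted parameters, the same Output instance exposes their standard errors as sd_beta, in the same order as beta:

slope_err, intercept_err = myoutput.sd_beta   # standard errors of slope and intercept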

Related

How to find length of parameter input of function?

I'm trying to use Scipy's ODR to fit various different curves to data. These curves have to be given as an ODR Model, which is defined by a function. This function has two arguments: p and x. p is a list of the parameters that will be optimised. Example:
from scipy.odr import ODR, Model, RealData

def f(p, x):
    m, c = p
    return m*x + c

model = Model(f)
data = RealData(xdata, ydata)
odr_setup = ODR(data, model, beta0=[0, 0], partol=0.001)
odr_result = odr_setup.run()
My problem: the length of p depends on the function I am fitting. If the length of beta0 and p are not the same, it gives a ValueError. To get round this, I have constructed some nested try-except statements to get the matching length. Is there a more elegant way of achieving the same thing?
Ideally I would like something like ODR(data, model, ***beta0=[0]*n_params***, partol=0.001). How would I find n_params? Or is there a better way?
Thanks!

Problem while fitting a set of data points to arbitrary curve with Scipy

I have a set of data points which, according to the model I want to implement, could be modelled with a certain curve (in this case, a product of an exponential and a complementary error function).
To fit these data to such a curve, I tried:
import numpy as np
from scipy.optimize import curve_fit
from scipy import special

x_fit = np.linspace(0, 1, 1000)

def fitted_function(x_fit, c, d, S):
    return c*np.exp(((S*d/2)**2) - x_fit*d)*special.erfc(S*d/2 - x_fit/S)

FitParameters, FitCovariance = curve_fit(fitted_function, x_data, y_data, maxfev=100000)
It does not give me any particular error, but the result of the fitting is evidently wrong. I strongly suspect that it has to do with the part x_fit/S, where the fitting parameter S appears as a denominator.
For example, I encounter the same problem while fitting a simple exponential: if I define the fitting curve with
return a*np.exp(-x_fit/b)
with a, b as fitting parameters. Since the fitting parameter b appears in a denominator, I find the same problem (i.e. the resulting fitted curve is a horizontal line for some reason).
For the case of a simple exponential I can simply bypass this by doing
return a*np.exp(-b*x_fit)
so that b is not a denominator anymore and the fitted curve is really an exponential curve. For my current case, instead, I cannot do this since S appears as a numerator and a denominator in different parts of the expression.
Any ideas? Thank you in advance!

How do I improve a Gaussian/Normal fit in Python 3.X by using a running median?

I have an array of 100x100 data points, where I'm trying to perform a Gaussian fit to each column of 100 values in the array. I then want the Gaussian parameters found by fitting the first column to be used as the starting point (initial parameters) for the fit of the next column. Let's say I start with the initial parameters of 1000, 0, and 1, and the fit finds values of 800, 3, and 1.5. I then want the fitter to use these three parameters as initial values for the next column.
My code is:
x = np.linspace(-50, 50, 100)
Gauss_Model = models.Gaussian1D(amplitude=1000., mean=0, stddev=1.)
Fitting_Model = fitting.LevMarLSQFitter()

Fit_Data = []
for i in range(0, Data_Array.shape[0]):
    Fit_Data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
Right now it uses the same initial values for every fit. Does anyone know how to perform such a running median/mean for a Gaussian fitting method? Would really appreciate any help or being pointed in the right direction, thanks!
I'm not familiar with the specific library you are using, but if you can get your fitted parameters out with something like fit_data[-1].amplitude or fit_data[-1].mean, then you could modify your loop to use something like:
for i in range(0, data_array.shape[0]):
    if fit_data:  # true if not an empty list
        Gauss_Model = models.Gaussian1D(amplitude=fit_data[-1].amplitude,
                                        mean=fit_data[-1].mean,
                                        stddev=fit_data[-1].stddev)
    fit_data.append(Fitting_Model(Gauss_Model, x, Data_Array[:, i]))
basically checking whether you have already fit a model, and if you have, use the most recent fitted amplitude, mean, and standard deviation as the starting point for your next Gauss_Model.
A thought: this might speed up your fitting, but it shouldn't result in a "better" fit to the 100 data points in each fit operation; each resulting model is presumably already the best fit to the data it was presented. If you want to estimate the error in the parameters of your model, you can use the fact that, for two independent normal distributions A ~ N(m_a, v_a) and B ~ N(m_b, v_b), the sum A + B is distributed as N(m_a + m_b, v_a + v_b). The average of your n fitted means is therefore distributed as N(sum(means)/n, sum(variances)/n^2). Basically you can say that your true mean is centred at the mean of your fitted means, with a standard error of sqrt(sum(variances))/n (roughly stddev/sqrt(n) if the individual standard deviations are all similar).
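As a quick illustration of that arithmetic (a sketch with made-up numbers, assuming you have collected the fitted means and standard deviations into arrays):

import numpy as np

# placeholder values standing in for the per-column fit results
means = np.array([0.1, -0.3, 0.2])
stddevs = np.array([1.4, 1.6, 1.5])

n = len(means)
combined_mean = means.sum() / n                   # sum(means)/n
standard_error = np.sqrt((stddevs**2).sum()) / n  # sqrt(sum(variances))/n
print(combined_mean, standard_error)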
I also cannot tell what library you are using, and how to do this probably depends on how that library stores the fitted values. I can say that for lmfit (https://lmfit.github.io/lmfit-py/) we struggled with this sort of usage and arrived at a design that makes what you are trying to do pretty easy. With lmfit, you might compose this problem as:
import numpy as np
from lmfit.models import GaussianModel

x = np.linspace(-50, 50, 100)
# get Data_Array from somewhere....

# create a model for a Gaussian
Gauss_Model = GaussianModel()

# make a set of parameters, setting initial values
params = Gauss_Model.make_params(amplitude=1000, center=0, sigma=1.0)

Fit_Results = []
for i in range(Data_Array.shape[1]):
    result = Gauss_Model.fit(Data_Array[:, i], params, x=x)
    Fit_Results.append(result)
    # update `params` with the current best-fit params for the next column
    params = result.params
Note that this works because lmfit is careful that Model.fit() will not alter the input parameters, and will put the resulting best-fit parameters for each fit in result.params.
And, if you decide you do want to have all columns use the original initial values, just comment out that last params = result.params.
Lmfit has a lot more bells and whistles, but I hope that helps you do what you need.
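If it helps, the per-column best-fit values can then be read back out of the stored results; a small sketch, using lmfit's default parameter names for GaussianModel (amplitude, center, sigma):

# collect the fitted centres and widths from the stored ModelResult objects
centers = [res.params['center'].value for res in Fit_Results]
sigmas = [res.params['sigma'].value for res in Fit_Results]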

Gaussian fit in Python - parameters estimation

I want to fit an array of data (in the program called "data", of size "n") with a Gaussian function and I want to get the estimations for the parameters of the curve, namely the mean and the sigma. Is the following code, which I found on the Web, a fast way to do that? If so, how can I actually get the estimated values of the parameters?
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.asarray(range(n))
y = data
n = len(x)                       # the number of data points
mean = sum(x*y)/n                # note this correction
sigma = sum(y*(x-mean)**2)/n     # note this correction

def gaus(x, a, x0, sigma, c):
    return a*np.exp(-(x-x0)**2/(sigma**2)) + c

popt, pcov = curve_fit(gaus, x, y, p0=[1, mean, sigma, 0.0])
print(popt)
print(pcov)

plt.plot(x, y, 'b+:', label='data')
plt.plot(x, gaus(x, *popt), 'ro:', label='fit')
plt.legend()
plt.title('Fig. 3 - Fit')
plt.xlabel('q')
plt.ylabel('data')
plt.show()
To answer your first question, "Is the following code, which I found on the Web, a fast way to do that?"
The code that you have is in fact the right way to proceed with fitting your data, when you believe it is Gaussian and know the fitting function, except that the return function should be
a*exp(-(x-x0)**2/(sigma**2))
since I believe that for a Gaussian function you don't need the constant c parameter.
A common use of least-squares minimization is curve fitting, where one has a parametrized model function meant to explain some phenomena and wants to adjust the numerical values for the model to most closely match some data. With scipy, such problems are commonly solved with scipy.optimize.curve_fit.
To answer your second question, "If so, how can I actually get the estimated values of the parameters?"
You can go to the link provided for scipy.optimize.curve_fit and find that the best-fit parameters reside in your popt variable. In your example, popt will contain the mean and sigma of your data. In addition to the best-fit parameters, pcov will contain the covariance matrix, which holds the uncertainties of your mean and sigma. To obtain the 1-sigma standard deviations of the parameters, take the square root of the diagonal of the covariance matrix, i.e. np.sqrt(np.diag(pcov)).
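Concretely, a minimal sketch of pulling those numbers out of the fit above (the parameter order matches gaus(x, a, x0, sigma, c)):

import numpy as np

a_fit, x0_fit, sigma_fit, c_fit = popt   # best-fit parameters
perr = np.sqrt(np.diag(pcov))            # 1-sigma uncertainties, same order as popt
print("mean  = %g +/- %g" % (x0_fit, perr[1]))
print("sigma = %g +/- %g" % (sigma_fit, perr[2]))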

Fitting a log-log data using scipy.optmize.curve_fit

I have two variables x and y which I am trying to fit using curve_fit from scipy.optimize.
The equation that fits the data is a simple power law of the form y = a(x^b). The fit seems good for the data when I set the x and y axes to log scale, i.e. ax.set_xscale('log') and ax.set_yscale('log').
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fitfunc(x, p1, p2):
    y = p1*(x**p2)
    return y

popt_1, pcov_1 = curve_fit(fitfunc, x, y, p0=(1.0, 1.0))
p1_1 = popt_1[0]
p1_2 = popt_1[1]

# ngal_mstar_1 is defined elsewhere in the original code
residuals1 = (ngal_mstar_1) - fitfunc(x, p1_1, p1_2)
xi_sq_1 = sum(residuals1**2)           # the chi-square value
curve_y_1 = fitfunc(x, p1_1, p1_2)     # this is the fit line seen in the graph

fig = plt.figure(figsize=(14, 12))
ax1 = fig.add_subplot(111)
ax1.scatter(x, y, c='r')
ax1.plot(y, curve_y_1, 'y.', linewidth=1)
ax1.legend(loc='best', shadow=True, scatterpoints=1)
ax1.set_xscale('log')   # scale is set to log
ax1.set_yscale('log')   # scale is set to log
plt.show()
When I use true log-log values for x and y, the power-law fit becomes y = 10^(a + b*log(x)), i.e. raising 10 to the power of the right-hand side since it is log base 10. Now both my x and y values are log(x) and log(y).
The fit for the above does not seem to be good. Here is the code I have used.
def fitfunc(x, p1, p2):
    y = 10**(p1 + (p2*x))
    return y

popt_1, pcov_1 = curve_fit(fitfunc, np.log10(x), np.log10(y), p0=(1.0, 1.0))
p1_1 = popt_1[0]
p1_2 = popt_1[1]

residuals1 = (y) - fitfunc((x), p1_1, p1_2)
xi_sq_1 = sum(residuals1**2)
curve_y_1 = fitfunc(np.log10(x), p1_1, p1_2)   # the fit line uses log(x) here itself

fig = plt.figure(figsize=(14, 12))
ax1 = fig.add_subplot(111)
ax1.scatter(np.log10(x), np.log10(y), c='r')
ax1.plot(np.log10(y), curve_y_1, 'y.', linewidth=1)
plt.show()
The only difference between the two plots is the fitting equation, and for the second plot the values have been logged independently. Am I doing something wrong here? I want a log(x) vs log(y) plot and the corresponding fit parameters (slope and intercept).
Your transformation of the power-law model to log-log is wrong, i.e. your second fit actually fits a different model. Take your original model y = a*(x^b) and apply the logarithm to both sides; you get log(y) = log(a) + b*log(x). Thus, your model in log scale should simply read y' = a' + b*x', where the primes indicate variables in log scale. The model is now a linear function, a well-known result: all power laws become linear functions in log-log space.
That said, you can still expect some small differences in the two versions of your fit, since curve_fit will optimise the least-squares problem. Therefore, in log scale, the fit will minimise the relative error between the fit and the data, while in linear scale, the fit will minimise the absolute error. Thus, in order to decide which way is actually the better domain for your fit, you will have to estimate the error in your data. The data you show certainly does not have a constant uncertainty in log-scale, so on linear scale your fit might be more faithful. If details about the error in each data-point are known, then you could consider using the sigma parameter. If that one is used properly, there should not be much difference in the two approaches. In that case, I would prefer the log-scale fitting, as the model is simpler and therefore likely to be more numerically stable.
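For reference, a minimal sketch of the linearised log-log fit described above (assuming x and y are positive NumPy arrays; the names here are illustrative, not from the original post):

import numpy as np
from scipy.optimize import curve_fit

def fitfunc_log(logx, a_prime, b):
    # linear model in log-log space: log10(y) = a' + b*log10(x)
    return a_prime + b*logx

popt, pcov = curve_fit(fitfunc_log, np.log10(x), np.log10(y), p0=(0.0, 1.0))
a_prime, b = popt
a = 10**a_prime   # back-transform the intercept, recovering y = a * x**b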
