Python curvefit inside a for loop - python

I have a simple linear regression model whose coefficients I find using SciPy's curve_fit as follows:
import numpy as np
from scipy.optimize import curve_fit
def line(x,m,c): #linear fit function in order to get the slope
    return m*x + c
x = np.array([2005.38,2005.46,2017.39])
y = np.array([631137.78, 631137.88, 631138.12])
popt, pcov = curve_fit(line,x,y)
slope = popt[0]
intercept = popt[1]
perr = np.sqrt(np.diag(pcov))
slope_err = perr[0]
intercept_err = perr[1]
I then ran a Monte Carlo simulation based on a prior and generated about 1000 similar y arrays, as shown below. My x array should stay the same, so the same x array is used for all MC-generated y arrays:
y = [np.array([631137.97960858, 631137.97958298, 631137.97544918]),
     np.array([631138.00349615, 631138.00462398, 631138.18676081]),
     np.array([631137.83121579, 631137.83457397, 631138.37689362]),
     np.array([631138.03276579, 631138.03322997, 631138.10819225]),
     np.array([631137.79168171, 631137.79288829, 631137.98774176])]
Now I would like to perform the same calculation and obtain the coefficients as shown above. However, when I put the fit in a for loop, it does not calculate the coefficients properly.
nsims = 1000
y = []
slope_mc = []
int_mc = []
for i in range(nsims):
    m = models[i]
    y.append(m[:,0])
    popt, pcov = curve_fit(line,x,m[:,0])
    slope = popt[0]
    intercept = popt[1]
    slope_mc.append(slope)
    int_mc.append(intercept)
I received a warning stating
OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
I have looked at similar solutions like this one, but they did not solve my problem. Also, is there an easier/faster way that avoids the for loop? I appreciate any help.
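One possible loop-free approach, sketched here under assumptions: since the model is a straight line, np.polyfit can fit every simulated y array in a single call when the arrays are stacked as columns. The array `Y` below is a hypothetical stand-in built from the sample values shown in the question, and centering x is an optional step that may help with numerical conditioning when x is around 2005 and y is around 631138. This replaces curve_fit with an ordinary least-squares polynomial fit, which should give the same slope and intercept for an unweighted line.

```python
import numpy as np

x = np.array([2005.38, 2005.46, 2017.39])

# Hypothetical stand-in for the Monte Carlo output: one row per simulated y array
Y = np.array([
    [631137.97960858, 631137.97958298, 631137.97544918],
    [631138.00349615, 631138.00462398, 631138.18676081],
    [631137.83121579, 631137.83457397, 631138.37689362],
    [631138.03276579, 631138.03322997, 631138.10819225],
    [631137.79168171, 631137.79288829, 631137.98774176],
])

# Center x so the slope and intercept columns of the design matrix are less correlated
x_mean = x.mean()
x_centered = x - x_mean

# polyfit accepts a 2-D y of shape (n_points, n_datasets) and fits each column
slopes, centered_intercepts = np.polyfit(x_centered, Y.T, 1)

# Convert the intercept back to the uncentered form y = m*x + c
intercepts = centered_intercepts - slopes * x_mean
```

Here `slopes` and `intercepts` each hold one value per simulated data set, so their mean and standard deviation can be taken directly with `slopes.mean()` and `slopes.std()`.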

Related

Monte-Carlo Fitting on python data

I have written a Monte-Carlo simulation to fit 49 data points with asymmetric error bars. Since the errors are asymmetric on both axes, I cannot simply use scipy.optimize.curve_fit. This is my basic approach:
Generate a list of 1000 random numbers from within the confidence level (error range) using a triangular probability distribution with maximum probability at each data point. Now I have a list of dimensions [49*1000].
Convert this list from [49*1000] to [1000*49]. I did this so that I have a data set of 1000 samples of 49 points which are within the error range.
Use scipy's curve_fit to fit these 1000 samples separately and find the free parameter in the function y=m*x+c (c is the free parameter; I already know the slope m).
Find the mean squared error of each of these 1000 samples using sklearn.metrics.mean_squared_error.
Find the index with the least mean squared error and use the popt value (c parameter) at this index to plot the fit.
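The steps above can be sketched compactly with NumPy alone. This is only an illustration under assumptions: the data arrays (`xdata`, `ydata`, and the error bounds) are hypothetical stand-ins, and the per-sample curve_fit call is replaced by the closed-form least-squares intercept for a fixed slope, c = mean(y - m*x), which is what an unweighted fit of y = m*x + c with known m reduces to.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1.65          # known slope, as in the question
trials = 1000
n_points = 49

# Hypothetical stand-in data: most probable values and asymmetric error bounds
xdata = np.linspace(11.0, 13.0, n_points)
ydata = m * xdata - 10.0
xerr_low = np.full(n_points, 0.1)
xerr_up = np.full(n_points, 0.2)
yerr_low = np.full(n_points, 0.3)
yerr_up = np.full(n_points, 0.2)

# Steps 1 and 2: draw all samples at once, already shaped (trials, n_points)
xs = rng.triangular(xdata - xerr_low, xdata, xdata + xerr_up, size=(trials, n_points))
ys = rng.triangular(ydata - yerr_low, ydata, ydata + yerr_up, size=(trials, n_points))

# Step 3: least-squares intercept for a fixed slope, one value per sample
c = np.mean(ys - m * xs, axis=1)

# Step 4: mean squared error of each sample against its own fitted line
mse = np.mean((ys - (m * xs + c[:, None])) ** 2, axis=1)

# Step 5: index of the sample with the smallest MSE
best = np.argmin(mse)
print("minimum MSE =", mse[best], "at index =", best, "with c =", c[best])
```

Drawing the samples directly with shape (trials, n_points) avoids the separate transpose step.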
Working Code: This is my code:
trials=int(1e4)
#xdata: most probable x value
#ydata: most probable y value
#xerr6low: lower bound on x error
#xerr6up: upper bound on x error
#yerr6low: lower bound on y error
#yerr6up: upper bound on y error
#generating random number using triangular distribution weighted with highest probability at xdata/ydata:
xarray1=[0]*len(obs6xr)
yarray1=[0]*len(obs6xr)
for i in range(0,len(obs6xr)):
    xarray1[i]=np.random.triangular(xdata[i]-xerr6low[i],xdata[i],xdata[i]+xerr6up[i],trials)
    yarray1[i]=np.random.triangular(ydata[i]-yerr6low[i],ydata[i],ydata[i]+yerr6up[i],trials)
xarray=[list(x) for x in zip(*xarray1)]
yarray=[list(x) for x in zip(*yarray1)]
def func(x, c):
    #return (np.log10(a)+(b*(x-12))+np.log10(10**(8+c)))
    #return ((x/1e12)**a)*(b)*(10**(8+c))
    m = 1.65
    mx = [element * m for element in x]
    y = [j+c for j in mx]
    return y
#Fit for the parameter c of the function func:
popt=np.zeros(trials)
pcov=np.zeros(trials)
print(len(xarray),len(yarray),len(popt))
for i in tqdm(range (trials),desc='Optimizing'):
    popt[i], pcov[i] = curve_fit(func, xarray[i], np.array(yarray[i]))
MSE=np.zeros(trials)
for i in tqdm(range (trials),desc='Calculating MSE'):
    MSE[i]=mean_squared_error(np.array(yarray[i]),func(np.array(xarray[i]),popt[i]))
minimum_MSE=np.amin(MSE)
index_MSE=np.where(MSE == np.amin(MSE))
print('minimum MSE = ',minimum_MSE,'at index = ', index_MSE)
print(popt[index_MSE])
def MbhthShimasakupropc(x,c):
    Mbhth = (10**(c))*(x**1.65)
    return Mbhth
plt.figure(figsize=(5,5))
x=np.logspace(11,14,trials)
plt.loglog(obs6x,MbhthShimasakupropc(obs6x,popt[index_MSE]))
plt.xscale('log')
plt.yscale('log')
m=np.logspace(np.log10(4e10),np.log10(3e14),1000)
plt.errorbar(obs0x, obs0y, xerr=asymmetric_errorx0, yerr=asymmetric_errory0, fmt='o',color='black', markersize='2.5', ecolor='black',capsize=2, elinewidth=1)
plt.errorbar(obs6x, obs6y, xerr=asymmetric_errorx6, yerr=asymmetric_errory6, fmt='o',color='red', markersize='2.5', ecolor='red',capsize=2, elinewidth=1)
plt.loglog(m,MbhthShimasaku(m,0),color='black',linestyle=':',label='Shimasaku-Ferrarese z=0')
plt.xlim(4e10,3e14)
plt.ylim(1e6,1e11)
plt.legend(['Monte-Carlo Fitting','local relation','z=0','z~6'])
plt.show()
Result:
As we can see, this code works perfectly fine. But if I change the function to take two parameters (a and b), func(x,a,b), and apply the same code with some small tweaks, the code fails miserably.
Not Working Code:
trials=int(1e3)
#generating random number:
xarray1=[0]*len(obs6xr)
yarray1=[0]*len(obs6xr)
for i in range(0,len(obs6xr)):
    xarray1[i]=np.random.triangular(xdata[i]-xerr6low[i],xdata[i],xdata[i]+xerr6up[i],trials)
    yarray1[i]=np.random.triangular(ydata[i]-yerr6low[i],ydata[i],ydata[i]+yerr6up[i],trials)
xarray=[list(x) for x in zip(*xarray1)]
yarray=[list(x) for x in zip(*yarray1)]
def func(x, a, b):
    y=a*x+b
    return y
#Fit for the parameters a, b of the function func:
popt=[0]*(trials)
pcov=[0]*(trials)
print(len(xarray),len(yarray),len(popt))
for i in tqdm(range (trials)):
    popt[i], pcov[i] = curve_fit(func, np.array(xarray[i]), np.array(yarray[i]))
MSE=np.zeros(trials)
for i in tqdm(range (trials)):
    MSE[i]=mean_squared_error(np.array(yarray[i]),func(np.array(xarray[i]),np.array(popt[i])[0],np.array(popt[i])[1]))
minimum_MSE=np.amin(MSE)
index_MSE=np.where(MSE == np.amin(MSE))
print('minimum MSE = ',minimum_MSE,'at index = ', index_MSE)
def MbhthShimasakupropmc(x,m,c):
    Mbhth = (10**(c))*(x**m)
    return Mbhth
plt.figure(figsize=(5,5))
x=np.logspace(11,14,trials)
plt.loglog(obs6x,MbhthShimasakupropmc(obs6x,*popt[[index_MSE][0][0][0]]))
plt.xscale('log')
plt.yscale('log')
m=np.logspace(np.log10(4e10),np.log10(3e14),1000)
plt.errorbar(obs0x, obs0y, xerr=asymmetric_errorx0, yerr=asymmetric_errory0, fmt='o',color='black', markersize='2.5', ecolor='black',capsize=2, elinewidth=1)
plt.errorbar(obs6x, obs6y, xerr=asymmetric_errorx6, yerr=asymmetric_errory6, fmt='o',color='red', markersize='2.5', ecolor='red',capsize=2, elinewidth=1)
plt.loglog(m,MbhthShimasaku(m,0),color='black',linestyle=':',label='Shimasaku-Ferrarese z=0')
plt.legend(['Monte-Carlo Fitting','local relation','z=0','z~6'])
plt.show()
Result:
I don't know what I am doing wrong. Can someone help me debug the issue?

How to fit this data in python and scipy?

I have some function which behaves as shown below, i.e. some tapered/decaying oscillations.
I want to fit the data using scipy's curve_fit. I have previously asked a question related to fitting functions with scipy, which was well answered here, and highlighted the importance of the initial guess for the values of the fitting parameters.
However, I am struggling to fit this data in a way which captures both the oscillations and the decay. My approach is as follows:
from scipy.optimize import curve_fit
import numpy as np
def Fit(x,y):
    #Define the function fit
    func = ansatz
    #Define the initial guess of parameters
    mag = (y.max() + y.min()) / 2
    y_shifted = y - mag
    omega_guess = np.pi * np.sum(y_shifted[:-1] * y_shifted[1:] < 0) / (x.max() - x.min())
    lam = np.log(2) / 1e7 #Rough guess based on approximate half life
    p0 = (mag,mag, omega_guess,mag,lam)
    #Do the fit
    popt, pcov = curve_fit(func, x,y,p0=p0)
    # return
    return func(x, *popt)
def ansatz(x,A,B,omega,offset,lam):
    osc = A*np.sin(omega*x) + B*np.cos(omega*x)
    linear = offset
    decay = np.exp(-x*lam)
    return decay*osc + linear
data = np.load('example.npy')
x = data[:,0]
y = data[:,1]
yFit = Fit(x,y)
This approach captures the decay, but not the oscillations. What is erroneous with my approach? Guesses for fit parameters? Function ansatz? Code implementation?
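One thing that might be worth trying, offered as a sketch rather than a diagnosis: the p0 above uses the midpoint `mag` as the starting amplitude for A and B, which can be close to zero when the signal oscillates around zero, and the zero-crossing count can be thrown off by noise. The example below instead takes the frequency guess from the strongest FFT peak and the amplitude guess from half the peak-to-peak range. The helper `fft_omega_guess` and the synthetic data are assumptions for illustration (the contents of example.npy are not known here), and the FFT step assumes uniformly spaced x.

```python
import numpy as np
from scipy.optimize import curve_fit

def ansatz(x, A, B, omega, offset, lam):
    osc = A * np.sin(omega * x) + B * np.cos(omega * x)
    return np.exp(-x * lam) * osc + offset

def fft_omega_guess(x, y):
    """Angular frequency of the strongest non-DC FFT peak (assumes uniform x spacing)."""
    dx = x[1] - x[0]
    spectrum = np.abs(np.fft.rfft(y - y.mean()))
    freqs = np.fft.rfftfreq(len(x), d=dx)
    peak = np.argmax(spectrum[1:]) + 1  # skip the zero-frequency bin
    return 2 * np.pi * freqs[peak]

# Synthetic decaying oscillation standing in for the real data
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 2000)
y = ansatz(x, 1.0, 0.5, 12.0, 0.3, 0.2) + rng.normal(0.0, 0.02, x.size)

# Initial guesses: amplitude from the peak-to-peak range, offset from the mean,
# decay rate from the inverse of the x span
amp_guess = (y.max() - y.min()) / 2
p0 = (amp_guess, amp_guess, fft_omega_guess(x, y), y.mean(), 1.0 / (x.max() - x.min()))

popt, pcov = curve_fit(ansatz, x, y, p0=p0)
print("fitted omega:", popt[2], "fitted decay rate:", popt[4])
```

If x spans something like 1e7 units, as the half-life guess in the question suggests, rescaling x to order one before fitting may also help the optimizer.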

How do I specify error of Y-variable when fitting with lmfit?

I'm almost new to Python and I'm trying to fit data from college using lmfit. The Y variable has a variable error of 3%. How do I add that error to the fitting process? I am changing from scipy's curve fit and in scipy it was really easy to do so, just creating an array with the error values and specifying the error when fitting by adding the text "sigma = [yourarray]"
This is my current code:
from lmfit import Minimizer, Parameters, report_fit
import numpy as np
import matplotlib.pyplot as plt
w1, V1, phi1, scal1 = np.loadtxt("./FiltroPasaBajo_1.txt", delimiter = "\t", unpack = True)
t = w1
eV= V1*0.03 + 0.01
def funcion(parametros, x, y):
    R = parametros['R'].value
    C = parametros['C'].value
    # ** binds tighter than /, so "**1/2" divides by 2; **0.5 gives the square root
    modelo = 4/((1+(x**2)*(R**2)*(C**2))**0.5)
    return modelo - y
parametros = Parameters()
parametros.add('R', value = 1000, min= 900, max = 1100)
parametros.add('C', value = 1E-6, min = 1E-7, max = 1E-5)
fit = Minimizer(funcion, parametros, fcn_args=(t,V1))
resultado = fit.minimize()
final = V1 + resultado.residual
report_fit(resultado)
try:
    plt.plot(t, V1, 'k+')
    plt.plot(t, final, 'r')
    plt.show()
except ImportError:
    pass
V1 are the values I measured, and eV would be the array of errors. t is the x coordinate.
Thank you for your time
The minimize() function minimizes an array in the least-square sense, adjusting the variable parameters in order to minimize (resid**2).sum() for the resid array returned by your objective function. It really does not know anything about the uncertainties in your data or even about your data. To use the uncertainties in your fit, you need to pass in your array eV just as you pass in t and V1 and then use that in your calculation of the array to be minimized.
One typically wants to minimize Sum[ (data-model)^2/epsilon^2 ], where epsilon is the uncertainty in the data (your eV), so the residual array should be altered from data-model to (data-model)/epsilon. For your fit, you would want
def funcion(parametros, x, y, eps):
    R = parametros['R'].value
    C = parametros['C'].value
    # ** binds tighter than /, so "**1/2" divides by 2; **0.5 gives the square root
    modelo = 4/((1+(x**2)*(R**2)*(C**2))**0.5)
    return (modelo - y)/eps
and then use this with
fit = Minimizer(funcion, parametros, fcn_args=(t, V1, eV))
resultado = fit.minimize()
...
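To illustrate why dividing the residual by the uncertainty is equivalent to what scipy's curve_fit does with sigma, here is a small self-contained check that uses only NumPy and SciPy. Everything in it is an assumption for illustration: the data are synthetic, the amplitude 4 and the 3% + 0.01 error model are taken from the question, and R and C are merged into one hypothetical parameter `rc`, because only the product R*C appears in the model and the two cannot be determined separately from this formula alone.

```python
import numpy as np
from scipy.optimize import curve_fit, least_squares

def gain(w, rc):
    # Low-pass gain with a fixed amplitude of 4; rc is the product R*C
    return 4.0 / np.sqrt(1.0 + (w * rc) ** 2)

# Synthetic measurements with a 3% + 0.01 uncertainty
rng = np.random.default_rng(1)
w = np.logspace(1, 4, 30)
true_rc = 1e-3
eps = 0.03 * gain(w, true_rc) + 0.01
v = gain(w, true_rc) + rng.normal(0.0, eps)

def weighted_residual(params):
    # (model - data) / uncertainty, the array whose squares are summed
    return (gain(w, params[0]) - v) / eps

# Minimize the weighted residual directly
res = least_squares(weighted_residual, x0=[5e-4], method="lm")

# Let curve_fit apply the same weighting through sigma
popt, pcov = curve_fit(gain, w, v, p0=[5e-4], sigma=eps, absolute_sigma=True)

print("least_squares rc:", res.x[0])
print("curve_fit rc:    ", popt[0], "+/-", np.sqrt(pcov[0, 0]))
```

Both calls should land on essentially the same value of `rc`, which is the behavior the weighted objective function passed to lmfit reproduces.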
If you use the lmfit.Model interface (designed for curve-fitting), then you could pass in weights array that multiplies data -model, and so would be 1.0 / eV to represent weighting for uncertainties (as above with minimize). Using the lmfit.Model interface and providing uncertainties would then look like this:
from lmfit import Model
# model function, to model the data
def func(t, r, c):
    # **0.5 for the square root; "**1/2" would divide by 2 instead
    return 4/((1+(t**2)*(r**2)*(c**2))**0.5)
model = Model(func)
parametros = model.make_params(r=1000, c=1.e-6)
parametros['r'].set(min=900, max=1100)
parametros['c'].set(min=1.e-7, max=1.e-5)
resultado = model.fit(V1, parametros, t=t, weights=1.0/eV)
print(resultado.fit_report())
plt.errorbar(t, V1, yerr=eV, fmt='k+', label='data')
plt.plot(t, resultado.best_fit, 'r', label='fit')
plt.legend()
plt.show()
hope that helps....
I think you cannot provide sigma in fit.minimize() directly.
However I see that fit.minimize() uses scipy's leastsq method (by default) which is the same method used by scipy's curve_fit.
If you look into scipy's curve_fit source, it does the following with sigma (for the 1-d case):
transform = 1.0 / sigma
jac = _wrap_jac(jac, xdata, transform)
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
Since fit.minimize() allows you to pass kwargs (Dfun) for leastsq, you can pass the jac the way it is done in scipy curve_fit.

Find the good fit measurement for Non Linear model (Gaussian, Power, Exponential and Log-Logistic)

I'm very new to statistics. Is there a way to find a goodness-of-fit score for non-linear functions (Gaussian, power, log-logistic)? I tried to use the curve_fit function in scipy to fit my data to the model. I know that for linear regression we can find R-squared. However, I have no idea how to get such a measure when my data are fitted with these functions. From some research, I found that some people use a pseudo R-squared and some use chi-square. Here is my code for fitting a Gaussian and trying to measure the fit.
# X,Y -> both are in array
x = ar(df['r'].values)
y = ar(df['D'].values)
# Parameters for Gaussian
n = len(x)
mean = sum(x*yn)/n
sigma = math.sqrt(sum(y*(x-mean)**2)/n)
def gaussian(x, a, x0, sigma):
    return (a/(sigma*math.sqrt(2 * 3.14159265)))*exp(-(x-x0)**2/(2*sigma**2))
# Curve fit from Scipy
popt, pcov = curve_fit(gaussian, x, yn, p0=[1, mean, sigma])
# First option :Finding best fit from Chi-square
p1 = popt[0]
p2 = popt[1]
p3 = popt[2]
chisqr = sum((yn - gaussian(x,p1,p2,p3))**2/sigma**2)
dof = len(yn) - 2
GOF = 1 - chi2.cdf(chisqr,dof)
#Second option : R^2 = SS_reg/SS_tot
y_fit = [gaussian(xi, *popt) for xi in x]
ybar = np.sum(yn)/len(y)
ssreg = np.sum([ (yihat - ybar)**2 for yihat in y_fit])
sstot = np.sum([ (yi - ybar)**2 for yi in yn])
results = ssreg / sstot
Another question:
Are my models correct?
Gaussian:
(a/(sigma*math.sqrt(2 * 3.14159265)))*exp(-(x-x0)**2/(2*sigma**2))
Power:
x*exp(-b)
Exponential:
exp(-b*x)
Logarithm:
exp(-x) / (1+exp(-x))**2
Are all of these correct? I got a low GOF score even though the curve from curve_fit looks like a good fit on the graph. Thank you so much for your time. I really appreciate it.
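As a rough sketch of the two measures mentioned in the question, the example below computes R-squared as 1 - SS_res/SS_tot and a chi-square p-value for a Gaussian fitted with curve_fit. The data are synthetic, and `y_err` is a hypothetical known measurement uncertainty: the chi-square statistic divides the residuals by the uncertainty of the data, not by the fitted Gaussian width, and the degrees of freedom subtract the number of fitted parameters (three here). Note that R-squared for non-linear models is only a descriptive indicator.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

def gaussian(x, a, x0, sigma):
    return a / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# Synthetic data with a hypothetical, known measurement uncertainty
rng = np.random.default_rng(2)
x = np.linspace(-5.0, 5.0, 60)
y_err = 0.02
y = gaussian(x, 2.0, 0.5, 1.2) + rng.normal(0.0, y_err, x.size)

popt, pcov = curve_fit(gaussian, x, y, p0=[1.0, 0.0, 1.0])
residuals = y - gaussian(x, *popt)

# R-squared from the residual and total sums of squares
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# Chi-square: residuals scaled by the data uncertainty
chisq = np.sum((residuals / y_err) ** 2)
dof = x.size - len(popt)
reduced_chisq = chisq / dof
p_value = chi2.sf(chisq, dof)  # same as 1 - chi2.cdf(chisq, dof)

print("R^2 =", r_squared)
print("reduced chi-square =", reduced_chisq, "p-value =", p_value)
```

A reduced chi-square near 1 suggests the scatter around the curve is consistent with the stated uncertainty; values far above 1 point to a poor model or underestimated errors.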

Nonlinear e^(-x) regression using scipy, python, numpy

The code below is giving me a flat line for the line of best fit rather than a nice curve along the model of e^(-x) that would fit the data. Can anyone show me how to fix the code below so that it fits my data?
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
def _eNegX_(p,x):
    x0,y0,c,k=p
    y = (c * np.exp(-k*(x-x0))) + y0
    return y
def _eNegX_residuals(p,x,y):
    return y - _eNegX_(p,x)
def Get_eNegX_Coefficients(x,y):
    print('x is: ',x)
    print('y is: ',y)
    # Calculate p_guess for the vectors x,y. Note that p_guess is the
    # starting estimate for the minimization.
    p_guess=(np.median(x),np.min(y),np.max(y),.01)
    # Calls the leastsq() function, which calls the residuals function with an initial
    # guess for the parameters and with the x and y vectors. Note that the residuals
    # function also calls the _eNegX_ function. This will return the parameters p that
    # minimize the least squares error of the _eNegX_ function with respect to the original
    # x and y coordinate vectors that are sent to it.
    p, cov, infodict, mesg, ier = scipy.optimize.leastsq(
        _eNegX_residuals,p_guess,args=(x,y),full_output=1)
    # Define the optimal values for each element of p that were returned by the leastsq() function.
    x0,y0,c,k=p
    print('''Reference data:\
x0 = {x0}
y0 = {y0}
c = {c}
k = {k}
'''.format(x0=x0,y0=y0,c=c,k=k))
    print('x.min() is: ',x.min())
    print('x.max() is: ',x.max())
    # Create a numpy array of x-values (linspace needs an integer number of points)
    numPoints = int(np.floor((x.max()-x.min())*100))
    xp = np.linspace(x.min(), x.max(), numPoints)
    print('numPoints is: ',numPoints)
    print('xp is: ',xp)
    print('p is: ',p)
    pxp=_eNegX_(p,xp)
    print('pxp is: ',pxp)
    # Plot the results
    plt.plot(x, y, '>', xp, pxp, 'g-')
    plt.xlabel('BPM%Rest')
    plt.ylabel('LVET/BPM',rotation='vertical')
    plt.xlim(0,3)
    plt.ylim(0,4)
    plt.grid(True)
    plt.show()
    return p
# Declare raw data for use in creating regression equation
x = np.array([1,1.425,1.736,2.178,2.518],dtype='float')
y = np.array([3.489,2.256,1.640,1.043,0.853],dtype='float')
p=Get_eNegX_Coefficients(x,y)
It looks like it's a problem with your initial guesses; something like (1, 1, 1, 1) works fine:
You have
p_guess=(np.median(x),np.min(y),np.max(y),.01)
for the function
def _eNegX_(p,x):
x0,y0,c,k=p
y = (c * np.exp(-k*(x-x0))) + y0
return y
So that's test_data_max * e^(-.01 * (x - test_data_median)) + test_data_min
I don't know much about the art of choosing good starting parameters, but I can say a few things. leastsq is finding a local minimum here - the key in choosing these values is to find the right mountain to climb, not to try to cut down on the work that the minimization algorithm has to do. Your initial guess looks like this (green):
(1.736, 0.85299999999999998, 3.4889999999999999, 0.01)
which results in your flat line (blue):
(-59.20295956, 1.8562 , 1.03477144, 0.69483784)
Greater gains were made in adjusting the height of the line than in increasing the k value. If you know you're fitting to this kind of data, use a larger k. If you don't know, I guess you could try to find a decent k value by sampling your data, or working back from the slope between an average of the first half and the second half, but I wouldn't know how to go about that.
Edit: You could also start with several guesses, run the minimization several times, and take the line with the lowest residuals.
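The multi-start idea from the edit could look roughly like the sketch below. The list of starting points is a hypothetical choice: the first two entries are the guesses discussed above, and the third is an arbitrary extra. Each start is run through leastsq, and the parameters with the smallest sum of squared residuals are kept.

```python
import numpy as np
from scipy.optimize import leastsq

x = np.array([1, 1.425, 1.736, 2.178, 2.518], dtype=float)
y = np.array([3.489, 2.256, 1.640, 1.043, 0.853], dtype=float)

def model(p, x):
    x0, y0, c, k = p
    return c * np.exp(-k * (x - x0)) + y0

def residuals(p, x, y):
    return y - model(p, x)

# Hypothetical set of starting points to try
starts = [
    (np.median(x), np.min(y), np.max(y), 0.01),  # the original guess
    (1.0, 1.0, 1.0, 1.0),                        # the guess suggested above
    (0.0, 0.0, 5.0, 2.0),                        # an arbitrary extra start
]

fits = []
for p0 in starts:
    p, ier = leastsq(residuals, p0, args=(x, y))
    ssr = np.sum(residuals(p, x, y) ** 2)
    fits.append((ssr, p))

# Keep the run with the lowest sum of squared residuals
best_ssr, best_p = min(fits, key=lambda fit: fit[0])
print("best sum of squared residuals:", best_ssr)
print("best parameters (x0, y0, c, k):", best_p)
```

Since c and x0 only enter through the combined factor c*exp(k*x0), different starts can return different (x0, c) pairs that describe the same curve, so comparing runs by their residuals is more meaningful than comparing the raw parameter values.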
