Python curve_fit: choice of bounds and initial condition affects the result - python

I have a data set that is described by two free parameters, which I want to determine using scipy.optimize.curve_fit. The model is defined as follows:
def func(x, a, b):
    return a*x*np.sqrt(1-b*x)
and the fitting call as
popt, pcov = opt.curve_fit(f=func, xdata=x_data, ydata=y_data, p0=init_guess,
                           bounds=([a_min, b_min], [a_max, b_max]))
The resulting values of a and b depend quite strongly on my choice of init_guess (the initial guess) and also on the choice of bounds.
Is there a way to solve this?

The authors of the Python scipy module have included the Differential Evolution genetic algorithm in scipy's optimization code as scipy.optimize.differential_evolution. This function can be used to stochastically estimate initial parameter values for non-linear regression.
Here is example code from RamanSpectroscopyFit, which uses scipy's genetic algorithm for initial parameter estimation for fitting Raman spectroscopy data:
import numpy as np
import pickle # for loading pickled test data
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
# Double Lorentzian peak function
# bounds on parameters are set in generate_Initial_Parameters() below
def double_Lorentz(x, a, b, A, w, x_0, A1, w1, x_01):
    return a*x + b + (2*A/np.pi)*(w/(4*(x-x_0)**2 + w**2)) + (2*A1/np.pi)*(w1/(4*(x-x_01)**2 + w1**2))
# function for genetic algorithm to minimize (sum of squared error)
# bounds on parameters are set in generate_Initial_Parameters() below
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    return np.sum((yData - double_Lorentz(xData, *parameterTuple)) ** 2)
def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([-1.0, 1.0]) # parameter bounds for a
    parameterBounds.append([maxY/-2.0, maxY/2.0]) # parameter bounds for b
    parameterBounds.append([0.0, maxY*100.0]) # parameter bounds for A
    parameterBounds.append([0.0, maxY/2.0]) # parameter bounds for w
    parameterBounds.append([minX, maxX]) # parameter bounds for x_0
    parameterBounds.append([0.0, maxY*100.0]) # parameter bounds for A1
    parameterBounds.append([0.0, maxY/2.0]) # parameter bounds for w1
    parameterBounds.append([minX, maxX]) # parameter bounds for x_01

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# load the pickled test data from original Raman spectroscopy
data = pickle.load(open('data.pkl', 'rb'))
xData = data[0]
yData = data[1]
# generate initial parameter values
initialParameters = generate_Initial_Parameters()
# curve fit the test data
fittedParameters, niepewnosci = curve_fit(double_Lorentz, xData, yData, initialParameters)
# create values for display of fitted peak function
a, b, A, w, x_0, A1, w1, x_01 = fittedParameters
y_fit = double_Lorentz(xData, a, b, A, w, x_0, A1, w1, x_01)
plt.plot(xData, yData) # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
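Applied to the two-parameter model from the question above, a minimal sketch of the same idea could look like this (the synthetic data, the "true" parameter values used to generate it, and the search ranges are all invented for illustration; substitute your own x_data, y_data, and bounds):
import numpy as np
from scipy.optimize import curve_fit, differential_evolution

def func(x, a, b):
    return a * x * np.sqrt(1 - b * x)

# synthetic data for illustration only -- replace with your own x_data / y_data
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 5.0, 50)
y_data = func(x_data, 2.0, 0.1) + rng.normal(0.0, 0.05, x_data.size)

def sum_of_squared_error(params):
    # scalar objective for the genetic algorithm; NaNs (sqrt of a negative
    # argument) are penalized so the search steers away from them
    residuals = y_data - func(x_data, *params)
    return np.inf if np.any(np.isnan(residuals)) else np.sum(residuals ** 2)

# illustrative search ranges for a and b -- choose bounds suited to your data
parameter_bounds = [(0.0, 10.0), (0.0, 0.2)]
de_result = differential_evolution(sum_of_squared_error, parameter_bounds, seed=3)

# use the genetic-algorithm estimate as the starting point for curve_fit
popt, pcov = curve_fit(func, x_data, y_data, p0=de_result.x)
print(popt)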

Related

How do I specify error of Y-variable when fitting with lmfit?

I'm fairly new to Python and I'm trying to fit data from a college lab using lmfit. The Y variable has an error of 3%. How do I add that error to the fitting process? I am switching from scipy's curve_fit, where it was really easy: you just create an array with the error values and pass it when fitting via sigma=your_array.
This is my current code:
import numpy as np
from lmfit import Minimizer, Parameters, report_fit
import matplotlib.pyplot as plt

w1, V1, phi1, scal1 = np.loadtxt("./FiltroPasaBajo_1.txt", delimiter="\t", unpack=True)
t = w1
eV = V1*0.03 + 0.01

def funcion(parametros, x, y):
    R = parametros['R'].value
    C = parametros['C'].value
    modelo = 4/((1+(x**2)*(R**2)*(C**2))**0.5)
    return modelo - y

parametros = Parameters()
parametros.add('R', value=1000, min=900, max=1100)
parametros.add('C', value=1E-6, min=1E-7, max=1E-5)

fit = Minimizer(funcion, parametros, fcn_args=(t, V1))
resultado = fit.minimize()
final = V1 + resultado.residual
report_fit(resultado)

try:
    plt.plot(t, V1, 'k+')
    plt.plot(t, final, 'r')
    plt.show()
except ImportError:
    pass
V1 are the values I measured, and eV would be the array of errors. t is the x coordinate.
Thank you for your time
The minimize() function minimizes an array in the least-square sense, adjusting the variable parameters in order to minimize (resid**2).sum() for the resid array returned by your objective function. It really does not know anything about the uncertainties in your data or even about your data. To use the uncertainties in your fit, you need to pass in your array eV just as you pass in t and V1 and then use that in your calculation of the array to be minimized.
One typically wants to minimize Sum[ (data-model)^2/epsilon^2 ], where epsilon is the uncertainty in the data (your eV), so the residual array should be altered from data-model to (data-model)/epsilon. For your fit, you would want
def funcion(parametros, x, y, eps):
    R = parametros['R'].value
    C = parametros['C'].value
    modelo = 4/((1+(x**2)*(R**2)*(C**2))**0.5)
    return (modelo - y)/eps
and then use this with
fit = Minimizer(funcion, parametros, fcn_args=(t, V1, eV))
resultado = fit.minimize()
...
If you use the lmfit.Model interface (designed for curve fitting), then you could pass in a weights array that multiplies data - model, which here would be 1.0/eV to represent weighting by the uncertainties (as above with minimize). Using the lmfit.Model interface and providing uncertainties would then look like this:
from lmfit import Model
import matplotlib.pyplot as plt

# model function, to model the data
def func(t, r, c):
    return 4/((1+(t**2)*(r**2)*(c**2))**0.5)

model = Model(func)

parametros = model.make_params(r=1000, c=1.e-6)
parametros['r'].set(min=900, max=1100)
parametros['c'].set(min=1.e-7, max=1.e-5)

resultado = model.fit(V1, parametros, t=t, weights=1.0/eV)
print(resultado.fit_report())

plt.errorbar(t, V1, yerr=eV, fmt='k+', label='data')
plt.plot(t, resultado.best_fit, 'r', label='fit')
plt.legend()
plt.show()
hope that helps....
I think you cannot provide sigma in fit.minimize() directly.
However, I see that fit.minimize() uses scipy's leastsq method by default, which is the same method used by scipy's curve_fit.
If you look into scipy's curve_fit source, it does the following with sigma (for the 1-D case):
transform = 1.0 / sigma
jac = _wrap_jac(jac, xdata, transform)
res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
Since fit.minimize() allows you to pass keyword arguments (such as Dfun) through to leastsq, you can pass the Jacobian the same way it is done in scipy's curve_fit.

How to perform a joint fit of several curves (in Python)?

Suppose I'm fitting some data points with a simple linear regression. Now I'd like to perform several joint linear regressions for several sets of data points. More specifically, I want one parameter to be shared among all fits, for instance the y-intercept.
After searching Google for some time, I could neither find any Python (SciPy) routine that does this, nor any general literature on how one would accomplish it.
Ideally, I want to perform those joint fits not only for simple linear regressions, but also for more general fit functions (for instance, power-law fits with a joint exponent).
The lmfit module allows you to do this, as mentioned in their FAQ:
from lmfit import minimize, Parameters, fit_report
import numpy as np
# residual function to minimize
def fit_function(params, x=None, dat1=None, dat2=None):
    model1 = params['offset'] + x * params['slope1']
    model2 = params['offset'] + x * params['slope2']
    resid1 = dat1 - model1
    resid2 = dat2 - model2
    return np.concatenate((resid1, resid2))
# setup fit parameters
params = Parameters()
params.add('slope1', value=1)
params.add('slope2', value=-1)
params.add('offset', value=0.5)
# generate sample data
x = np.arange(0, 10)
slope1, slope2, offset = 1.1, -0.9, 0.2
y1 = slope1 * x + offset
y2 = slope2 * x + offset
# fit
out = minimize(fit_function, params, kws={"x": x, "dat1": y1, "dat2": y2})
print(fit_report(out))
# [[Fit Statistics]]
# # fitting method = leastsq
# # function evals = 9
# # data points = 20
# # variables = 3
# chi-square = 1.4945e-31
# reduced chi-square = 8.7913e-33
# Akaike info crit = -1473.48128
# Bayesian info crit = -1470.49408
# [[Variables]]
# slope1: 1.10000000 +/- 8.2888e-18 (0.00%) (init = 1)
# slope2: -0.90000000 +/- 8.2888e-18 (0.00%) (init = -1)
# offset: 0.20000000 +/- 3.8968e-17 (0.00%) (init = 0.5)
# [[Correlations]] (unreported correlations are < 0.100)
# C(slope1, offset) = -0.742
# C(slope2, offset) = -0.742
# C(slope1, slope2) = 0.551
I think this graphing code example does what you want, fitting two data sets with a single shared parameter. Note that if the data sets are of unequal length, that can effectively weight the fit toward the data set with more individual points (one way to compensate is sketched after the example). This example explicitly sets the initial parameter values to 1.0, the curve_fit() defaults, and does not use scipy's genetic algorithm to help find initial parameter estimates.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
y1 = np.array([ 16.00, 18.42, 20.84, 23.26])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)
x1 = np.array([5.0, 6.1, 7.2, 8.3])
x2 = np.array([15.0, 16.1, 17.2, 18.3, 19.4])
comboX = np.append(x1, x2)
if len(y1) != len(x1):
    raise Exception('Unequal x1 and y1 data length')
if len(y2) != len(x2):
    raise Exception('Unequal x2 and y2 data length')

def function1(data, a, b, c): # not all parameters are used here, c is shared
    return a * data + c

def function2(data, a, b, c): # not all parameters are used here, c is shared
    return b * data + c

def combinedFunction(comboData, a, b, c):
    # single data reference passed in, extract separate data
    extract1 = comboData[:len(x1)] # first data
    extract2 = comboData[len(x1):] # second data

    result1 = function1(extract1, a, b, c)
    result2 = function2(extract2, a, b, c)

    return np.append(result1, result2)
# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])
# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(combinedFunction, comboX, comboY, initialParameters)
# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = function1(x1, a, b, c) # first data set, first equation
y_fit_2 = function2(x2, a, b, c) # second data set, second equation
plt.plot(comboX, comboY, 'D') # plot the raw data
plt.plot(x1, y_fit_1) # plot the equation using the fitted parameters
plt.plot(x2, y_fit_2) # plot the equation using the fitted parameters
plt.show()
print('a, b, c:', fittedParameters)
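As mentioned above, when the two data sets have different lengths the larger one dominates the combined least-squares objective. One way to compensate, sketched here under the assumption that the arrays and functions from the example above are still in scope, is to pass a per-point sigma to curve_fit so each data set carries roughly equal total weight (the sqrt(n) scaling is one illustrative choice):
# give each point a sigma of sqrt(n) for its own data set, so each set
# contributes its mean squared residual rather than its summed residual
sigma1 = np.full(len(y1), np.sqrt(len(y1)))
sigma2 = np.full(len(y2), np.sqrt(len(y2)))
comboSigma = np.append(sigma1, sigma2)

balancedParameters, balancedPcov = curve_fit(
    combinedFunction, comboX, comboY, initialParameters, sigma=comboSigma)
print('balanced a, b, c:', balancedParameters)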

Non linear regression using curve_fit

I was trying to fit my data to the function that is written below, but when using curve_fit the results don't match the data at all.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
nu=[0.00,0.03,0.01,-0.02,0.00,-0.06]
data=np.loadtxt('impedancia.txt')
use=np.transpose(data)
Z=use[0]
omega=use[1]
def func(x, a, b, c):
    return a/(x**2) + b + c*x**2

popt, poc = curve_fit(func, omega, Z)

plt.plot(omega, Z, 'bo', markersize=3.5)
plt.plot(omega, func(omega, *popt))
I was wondering if anyone could help me with this.
Here is my code, with the scipy.optimize.differential_evolution module used to estimate initial parameters for the non-linear solver. Note that this code uses a variation on the Lorentzian peak equation similar to yours; the two lines marked "select peak function here" let you switch between the equations. The peak equation in your code does not appear to fit the narrow peak of the data as well as the recommended peak equation currently selected.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
# bounds on parameters are set in generate_Initial_Parameters() below
def func_original(x, a, b, c):
    return a/(x**2) + b + c*x**2

# bounds on parameters are set in generate_Initial_Parameters() below
def func_recommended(x, a, b, c):
    return a / (b + (x-c)**2)

# select peak function here
#func = func_original
func = func_recommended

# function for genetic algorithm to minimize (sum of squared error)
# bounds on parameters are set in generate_Initial_Parameters() below
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    return np.sum((yData - func(xData, *parameterTuple)) ** 2)

def generate_Initial_Parameters():
    # data min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    minSearch = min([minX, minY])
    maxSearch = max([maxX, maxY])

    parameterBounds = []
    parameterBounds.append([minSearch, maxSearch]) # parameter bounds for a
    parameterBounds.append([minSearch, maxSearch]) # parameter bounds for b
    parameterBounds.append([minSearch, maxSearch]) # parameter bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# load data from text file
data=np.loadtxt('impedancia.txt')
use=np.transpose(data)
yData=use[0]
xData=use[1]
# generate initial parameter values
initialParameters = generate_Initial_Parameters()
# curve fit the data
fittedParameters, niepewnosci = curve_fit(func, xData, yData, initialParameters)
# create values for display of fitted peak function
a, b, c = fittedParameters
y_fit = func(xData, a, b, c)
plt.plot(xData, yData, 'D') # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
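If you want a number rather than a visual impression, a short sketch like the one below (reusing xData, yData, generate_Initial_Parameters(), and the two candidate functions defined above) fits each peak equation in turn and compares the root-mean-square error:
# compare the two candidate peak equations by their fit RMSE
for candidate in (func_original, func_recommended):
    func = candidate  # sumOfSquaredError() above reads the module-level name 'func'
    try:
        params, _ = curve_fit(candidate, xData, yData, generate_Initial_Parameters())
    except RuntimeError as err:
        print(candidate.__name__, 'did not converge:', err)
        continue
    rmse = np.sqrt(np.mean((yData - candidate(xData, *params)) ** 2))
    print(candidate.__name__, 'RMSE:', rmse)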

Fitting data to system of ODEs using Python via Scipy & Numpy

I am having some trouble translating my MATLAB code into Python via SciPy and NumPy. I am stuck on how to find optimal parameter values (k0 and k1) for my system of ODEs so that it fits my ten observed data points. I currently have an initial guess for k0 and k1. In MATLAB, I can use something called fminsearch, a function that takes the system of ODEs, the observed data points, and the initial values of the system of ODEs, and then calculates a new pair of parameters k0 and k1 that fit the observed data. I have included my code in the hope that you can help me implement some kind of fminsearch to find the optimal parameter values k0 and k1 that fit my data. I want to add whatever code is needed for this to my lsqtest.py file.
I have three .py files - ode.py, lsq.py, and lsqtest.py
ode.py:
def f(y, t, k):
    return (-k[0]*y[0],
            k[0]*y[0]-k[1]*y[1],
            k[1]*y[1])
lsq.py:
import pylab as py
import numpy as np
from scipy import integrate
from scipy import optimize
import ode
def lsq(teta, y0, data):
    #INPUT  teta, the unknowns k0,k1
    #       data, observed
    #       y0, initial values needed by the ODE
    #OUTPUT lsq value

    t = np.linspace(0, 9, 10)
    y_obs = data #data points
    k = [0, 0]
    k[0] = teta[0]
    k[1] = teta[1]

    #call the ODE solver to get the states:
    r = integrate.odeint(ode.f, y0, t, args=(k,))
    #the ODE system in ode.py
    #at each row (time point), y_cal has
    #the values of the components [A,B,C]
    y_cal = r[:,1] #separate the measured B

    #compute the expression to be minimized:
    return sum((y_obs-y_cal)**2)
lsqtest.py:
import pylab as py
import numpy as np
from scipy import integrate
from scipy import optimize
import lsq
if __name__ == '__main__':
    teta = [0.2, 0.3] #guess for parameter values k0 and k1
    y0 = [1, 0, 0]    #initial conditions for system
    y = [0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309] #observed data points
    data = y
    resid = lsq.lsq(teta, y0, data)
    print(resid)
For this kind of fitting task you could use the package lmfit. The resulting fit reproduces the data very well (the plot is not shown here).
For now, I fixed the initial concentrations; you could also set them as variables if you like (just remove the vary=False in the code below). The parameters you obtain are:
[[Variables]]
x10: 5 (fixed)
x20: 0 (fixed)
x30: 0 (fixed)
k0: 0.12183301 +/- 0.005909 (4.85%) (init= 0.2)
k1: 0.77583946 +/- 0.026639 (3.43%) (init= 0.3)
[[Correlations]] (unreported correlations are < 0.100)
C(k0, k1) = 0.809
The code that reproduces the plot looks like this (some explanation can be found in the inline comments):
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
from lmfit import minimize, Parameters, Parameter, report_fit

def f(y, t, paras):
    """
    Your system of differential equations
    """
    x1 = y[0]
    x2 = y[1]
    x3 = y[2]

    try:
        k0 = paras['k0'].value
        k1 = paras['k1'].value
    except KeyError:
        k0, k1 = paras

    # the model equations
    f0 = -k0 * x1
    f1 = k0 * x1 - k1 * x2
    f2 = k1 * x2
    return [f0, f1, f2]

def g(t, x0, paras):
    """
    Solution to the ODE x'(t) = f(t,x,k) with initial condition x(0) = x0
    """
    x = odeint(f, x0, t, args=(paras,))
    return x

def residual(paras, t, data):
    """
    compute the residual between actual data and fitted data
    """
    x0 = paras['x10'].value, paras['x20'].value, paras['x30'].value
    model = g(t, x0, paras)

    # you only have data for one of your variables
    x2_model = model[:, 1]
    return (x2_model - data).ravel()
# initial conditions
x10 = 5.
x20 = 0
x30 = 0
y0 = [x10, x20, x30]
# measured data
t_measured = np.linspace(0, 9, 10)
x2_measured = np.array([0.000, 0.416, 0.489, 0.595, 0.506, 0.493, 0.458, 0.394, 0.335, 0.309])
plt.figure()
plt.scatter(t_measured, x2_measured, marker='o', color='b', label='measured data', s=75)
# set parameters including bounds; you can also fix parameters (use vary=False)
params = Parameters()
params.add('x10', value=x10, vary=False)
params.add('x20', value=x20, vary=False)
params.add('x30', value=x30, vary=False)
params.add('k0', value=0.2, min=0.0001, max=2.)
params.add('k1', value=0.3, min=0.0001, max=2.)
# fit model
result = minimize(residual, params, args=(t_measured, x2_measured), method='leastsq') # leastsq nelder
# check results of the fit
data_fitted = g(np.linspace(0., 9., 100), y0, result.params)
# plot fitted data
plt.plot(np.linspace(0., 9., 100), data_fitted[:, 1], '-', linewidth=2, color='red', label='fitted data')
plt.legend()
plt.xlim([0, max(t_measured)])
plt.ylim([0, 1.1 * max(data_fitted[:, 1])])
# display fitted statistics
report_fit(result)
plt.show()
If you have data for additional variables, you can simply update the function residual.
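For example, if you also had measurements for x1, a hedged sketch of that extension (the array name x1_measured is hypothetical; everything else reuses the objects defined above) could concatenate one residual block per observed variable:
def residual_two_observed(paras, t, data_x1, data_x2):
    """Residual when both x1 and x2 have been measured (illustrative extension)."""
    x0 = paras['x10'].value, paras['x20'].value, paras['x30'].value
    model = g(t, x0, paras)
    resid_x1 = model[:, 0] - data_x1   # column 0 of the ODE solution holds x1
    resid_x2 = model[:, 1] - data_x2   # column 1 holds x2
    return np.concatenate((resid_x1, resid_x2)).ravel()

# usage (x1_measured is a hypothetical second data array):
# result = minimize(residual_two_observed, params,
#                   args=(t_measured, x1_measured, x2_measured), method='leastsq')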
The following worked for me:
import pylab as pp
import numpy as np
from scipy import integrate, interpolate
from scipy import optimize
##initialize the data
x_data = np.linspace(0,9,10)
y_data = np.array([0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309])
def f(y, t, k):
    """define the ODE system in terms of
       dependent variable y,
       independent variable t, and
       optional parameters, in this case a single variable k"""
    return (-k[0]*y[0],
            k[0]*y[0]-k[1]*y[1],
            k[1]*y[1])

def my_ls_func(x, teta):
    """definition of function for LS fit
       x gives evaluation points,
       teta is an array of parameters to be varied for fit"""
    # create an alias to f which passes the optional params
    f2 = lambda y, t: f(y, t, teta)
    # calculate ode solution, return values for each entry of "x"
    r = integrate.odeint(f2, y0, x)
    # in this case, we only need one of the dependent variable values
    return r[:,1]

def f_resid(p):
    """function to pass to optimize.leastsq
       The routine will square and sum the values returned by
       this function"""
    return y_data - my_ls_func(x_data, p)

#solve the system - the solution is in variable c
guess = [0.2, 0.3] #initial guess for params
y0 = [1, 0, 0]     #initial conditions for ODEs
(c, kvg) = optimize.leastsq(f_resid, guess) #get params
print("parameter values are ", c)
# fit ODE results to interpolating spline just for fun
xeval=np.linspace(min(x_data), max(x_data),30)
gls = interpolate.UnivariateSpline(xeval, my_ls_func(xeval,c), k=3, s=0)
#pick a few more points for a very smooth curve, then plot
# data and curve fit
xeval=np.linspace(min(x_data), max(x_data),200)
#Plot of the data as red dots and fit as blue line
pp.plot(x_data, y_data,'.r',xeval,gls(xeval),'-b')
pp.xlabel('xlabel',{"fontsize":16})
pp.ylabel("ylabel",{"fontsize":16})
pp.legend(('data','fit'),loc=0)
pp.show()
Look at the scipy.optimize module. The minimize function looks fairly similar to fminsearch, and I believe that both basically use a simplex algorithm for optimization.
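As a minimal sketch of that fminsearch analogue, using the same data and kinetic model as above but minimizing a scalar sum of squared residuals (which is what Nelder-Mead expects, in contrast to the residual vector that leastsq takes), scipy.optimize.minimize could be used like this:
import numpy as np
from scipy import integrate, optimize

x_data = np.linspace(0, 9, 10)
y_data = np.array([0.000, 0.416, 0.489, 0.595, 0.506, 0.493, 0.458, 0.394, 0.335, 0.309])
y0 = [1, 0, 0]  # initial conditions for the ODE system

def f(y, t, k):
    # same three-component kinetic model as above
    return (-k[0] * y[0],
            k[0] * y[0] - k[1] * y[1],
            k[1] * y[1])

def sum_sq(teta):
    # scalar objective: sum of squared residuals on the measured component B
    r = integrate.odeint(f, y0, x_data, args=(teta,))
    return np.sum((y_data - r[:, 1]) ** 2)

res = optimize.minimize(sum_sq, x0=[0.2, 0.3], method='Nelder-Mead')
print(res.x)  # fitted k0, k1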
# cleaned up a bit to get my head around it - thanks for sharing
import pylab as pp
import numpy as np
from scipy import integrate, optimize
class Parameterize_ODE():
    def __init__(self):
        self.X = np.linspace(0, 9, 10)
        self.y = np.array([0.000,0.416,0.489,0.595,0.506,0.493,0.458,0.394,0.335,0.309])
        self.y0 = [1, 0, 0] # initial conditions for the ODEs

    def ode(self, y, X, p):
        return (-p[0]*y[0],
                p[0]*y[0]-p[1]*y[1],
                p[1]*y[1])

    def model(self, X, p):
        return integrate.odeint(self.ode, self.y0, X, args=(p,))

    def f_resid(self, p):
        return self.y - self.model(self.X, p)[:,1]

    def optim(self, p_guess):
        return optimize.leastsq(self.f_resid, p_guess) # fit params

po = Parameterize_ODE()
p_guess = [0.2, 0.3]
c, kvg = po.optim(p_guess)

# --- show ---
print("parameter values are ", c, kvg)
x = np.linspace(min(po.X), max(po.X), 2000)
pp.plot(po.X, po.y, '.r', x, po.model(x, c)[:,1], '-b')
pp.xlabel('X', {"fontsize": 16})
pp.ylabel("y", {"fontsize": 16})
pp.legend(('data', 'fit'), loc=0)
pp.show()

scipy.optimize_curvefit gives bad results

I'm trying to fit a material model (Carreau law). Generally the data looks very good, but it's impossible (for me at least) to get the model and parameters right with curve_fit. I tried setting reasonable starting values, etc.
import numpy as np
import matplotlib.pyplot as plt
import scipy as sp
import scipy.optimize  # needed so that sp.optimize.curve_fit below is available
## Y-DATA
eta = np.array([7128.67, 6814, 6490, 6135.67, 5951.67,
5753.67, 5350, 4929.33, 4499.33,4068.67, 3641.33,
3225.33, 2827.33, 2451, 2104.67, 1788, 1503, 1251.33,
1032.33, 434.199, 271.707, 134.532, 75.7034, 40.9144, 21.7112, 14.9206, 9.29772])
##X-DATA
gamma = np.array([0.1, 0.1426, 0.2034, 0.29, 0.4135, 0.5897, 0.8409, 1.199,
1.71, 2.438, 3.477, 4.959, 7.071, 10.08, 14.38, 20.5,
29.24, 41.7, 59.46, 135.438, 279.707, 772.93,
1709.91, 3734.32, 8082.32, 12665.8, 22353.3])
carreaulaw = lambda x, eta_0, lam, a, n: eta_0 / (1 + (lam * x)**a)**((n-1)/a)
popt, pcov = sp.optimize.curve_fit(carreaulaw, gamma, eta, p0=[8000, 3000, 0.8, 0.1])
print(popt)
x = np.linspace(gamma.min(), gamma.max(), 500)
fig = plt.figure()
diagram = fig.add_axes([0.1, 0.1, 0.8, 0.8])
diagram.set_xlabel(r"$log\ \dot{\gamma}_{true}\ (s^{-1})$", fontsize = 12)
diagram.set_ylabel(r"$log\ \eta_{true}\ (Pa*s)$",fontsize = 12)
#diagram.set_xscale("log")
#diagram.set_yscale("log")
diagram.plot(gamma, eta, "r*")
diagram.plot(x, carreaulaw(x, popt[0], popt[1], popt[2], popt[3]), "g-")
I constantly keep getting the warning RuntimeWarning: invalid value encountered in power. I have already tried a lot of variations and am pretty stuck right now.
If I don't give any starting values, I get:
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 1000.
(Image of the data on a log-log scale omitted.)
I really don't know where I'm going wrong! The data looks pretty good, so I shouldn't be running into the maxfev limit.
Here is a graphical fitter using your data and equation. This example code uses scipy's Differential Evolution genetic algorithm to determine initial parameter estimates for curve_fit(). This scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. It is much easier to find ranges for the parameters than individual values, and here I experimented with different bounds until the fit visually looked OK to me. You should check the bounds I used and see if they appear reasonable.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([7128.67, 6814, 6490, 6135.67, 5951.67,
5753.67, 5350, 4929.33, 4499.33,4068.67, 3641.33,
3225.33, 2827.33, 2451, 2104.67, 1788, 1503, 1251.33,
1032.33, 434.199, 271.707, 134.532, 75.7034, 40.9144, 21.7112, 14.9206, 9.29772])
yData = numpy.array([0.1, 0.1426, 0.2034, 0.29, 0.4135, 0.5897, 0.8409, 1.199,
1.71, 2.438, 3.477, 4.959, 7.071, 10.08, 14.38, 20.5,
29.24, 41.7, 59.46, 135.438, 279.707, 772.93,
1709.91, 3734.32, 8082.32, 12665.8, 22353.3])
def carreaulaw(x, eta_0, lam, n, a):
    return eta_0 * (1.0+(lam*x)**a)**((n-1.0)/a)

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = carreaulaw(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([0.0, 50.0]) # search bounds for eta_0
    parameterBounds.append([0.0, 1.0]) # search bounds for lam
    parameterBounds.append([-1.0, 0.0]) # search bounds for n
    parameterBounds.append([-200.0, 0.0]) # search bounds for a

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution polishes its result with a local minimizer (L-BFGS-B) within the parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(carreaulaw, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = carreaulaw(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = carreaulaw(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
All you need to do is pass bounds to curve_fit. When no bounds are defined, the optimizer can wander into non-real operations, such as (in your case) raising a negative number to a non-integer power.
Bounds are simply defined as a pair of lists/tuples with the lower and upper bounds:
bounds = [(-np.inf, 0, 0, 0), [np.inf, np.inf, 1, 1]] #upper np.inf or lower -np.inf means no bound
popt, pcov = curve_fit(carreaulaw, gamma, eta, p0=[8000, 3000, 0.8, 0.1], bounds=bounds)