scipy.optimize.curve_fit gives bad results - python

I'm trying to fit a material model (Carreau-Law). Generally the data looks very good, but it's impossible (for me at least) to get the right model-data and parameters with curve_fit. I tried setting reasonable starting values etc.
import numpy as np
import scipy as sp
import scipy.optimize
import matplotlib.pyplot as plt
## Y-DATA
eta = np.array([7128.67, 6814, 6490, 6135.67, 5951.67,
5753.67, 5350, 4929.33, 4499.33,4068.67, 3641.33,
3225.33, 2827.33, 2451, 2104.67, 1788, 1503, 1251.33,
1032.33, 434.199, 271.707, 134.532, 75.7034, 40.9144, 21.7112, 14.9206, 9.29772])
##X-DATA
gamma = np.array([0.1, 0.1426, 0.2034, 0.29, 0.4135, 0.5897, 0.8409, 1.199,
1.71, 2.438, 3.477, 4.959, 7.071, 10.08, 14.38, 20.5,
29.24, 41.7, 59.46, 135.438, 279.707, 772.93,
1709.91, 3734.32, 8082.32, 12665.8, 22353.3])
carreaulaw = lambda x, eta_0, lam, a, n: eta_0 / (1 + (lam * x)**a)**((n-1)/a)
popt, pcov = sp.optimize.curve_fit(carreaulaw, gamma, eta, p0=[8000, 3000, 0.8, 0.1])
print(popt)
x = np.linspace(gamma.min(), gamma.max(), 500)
fig = plt.figure()
diagram = fig.add_axes([0.1, 0.1, 0.8, 0.8])
diagram.set_xlabel(r"$log\ \. \gamma_{true}\ (s^{-1})$", fontsize = 12)
diagram.set_ylabel(r"$log\ \eta_{true}\ (Pa*s)$",fontsize = 12)
#diagram.set_xscale("log")
#diagram.set_yscale("log")
diagram.plot(gamma, eta, "r*")
diagram.plot(x, carreaulaw(x, popt[0], popt[1], popt[2], popt[3]), "g-")
I constantly keep getting the error: RuntimeWarning: invalid value encountered in power. I tried a lot of variations already and am pretty stuck right now.
If I don't give any starting values, I get:
RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 1000.
Here is the image of the data on a log-log scale:
I really don't know where I go wrong! The data looks pretty good, that's why I never should run out of maxfev.

Here is a graphical fitter using your data and equation. This example code uses scipy's Differential Evolution genetic algorithm to determine initial parameter estimates for curve_fit(). This scipy module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. It is much easier to find ranges for the parameters than individual values, and here I experimented with different bounds until the fit visually looked OK to me. You should check the bounds I used and see if they appear reasonable.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([7128.67, 6814, 6490, 6135.67, 5951.67,
5753.67, 5350, 4929.33, 4499.33,4068.67, 3641.33,
3225.33, 2827.33, 2451, 2104.67, 1788, 1503, 1251.33,
1032.33, 434.199, 271.707, 134.532, 75.7034, 40.9144, 21.7112, 14.9206, 9.29772])
yData = numpy.array([0.1, 0.1426, 0.2034, 0.29, 0.4135, 0.5897, 0.8409, 1.199,
1.71, 2.438, 3.477, 4.959, 7.071, 10.08, 14.38, 20.5,
29.24, 41.7, 59.46, 135.438, 279.707, 772.93,
1709.91, 3734.32, 8082.32, 12665.8, 22353.3])
def carreaulaw(x, eta_0, lam, n, a):
    return eta_0 * (1.0+(lam*x)**a)**((n-1.0)/a)
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = carreaulaw(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([0.0, 50.0]) # search bounds for eta_0
    parameterBounds.append([0.0, 1.0]) # search bounds for lam
    parameterBounds.append([-1.0, 0.0]) # search bounds for n
    parameterBounds.append([-200.0, 0.0]) # search bounds for a
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution finishes by polishing its best result with L-BFGS-B inside the search bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(carreaulaw, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = carreaulaw(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')
    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = carreaulaw(xModel, *fittedParameters)
    # now the model as a line plot
    axes.plot(xModel, yModel)
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label
    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

All you need to do is pass bounds to curve_fit. When no bounds are defined, the optimizer can try parameter values that lead to non-real operations, such as (in your case) raising a negative number to a non-integer power.
Bounds are simply defined as a list of two lists/tuples holding the lower and upper bounds:
bounds = [(-np.inf, 0, 0, 0), [np.inf, np.inf, 1, 1]] #upper np.inf or lower -np.inf means no bound
popt, pcov = curve_fit(carreaulaw, gamma, eta, p0=[8000, 3000, 0.8, 0.1], bounds=bounds)
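To make the failure mode concrete, here is a minimal sketch. It assumes the multiplicative Carreau-Yasuda form used in the first answer, and it uses synthetic, noise-free data generated from made-up "true" parameters rather than the data from the question. The names carreau, true_params, lower and upper are only illustrative. It shows that a negative lam produces NaN, and that a bounded fit keeps every parameter inside the allowed range.

```python
import warnings

import numpy as np
from scipy.optimize import curve_fit


def carreau(x, eta_0, lam, a, n):
    """Carreau-Yasuda viscosity model (assumed form, see the text above)."""
    return eta_0 * (1.0 + (lam * x) ** a) ** ((n - 1.0) / a)


# A negative lam makes (lam * x) negative; a negative float raised to a
# non-integer power is NaN in NumPy, which is what triggers the warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    bad_value = carreau(np.array([1.0]), 8000.0, -0.5, 0.8, 0.1)
print("negative lam gives:", bad_value)

# Synthetic data from assumed parameters (not the data from the question).
true_params = (8000.0, 0.5, 0.8, 0.3)
gamma = np.logspace(-1, 4, 30)
eta = carreau(gamma, *true_params)

# Lower and upper bounds, in the same order as the model arguments.
lower = [0.0, 0.0, 0.05, 0.0]
upper = [np.inf, np.inf, 5.0, 1.0]
popt, pcov = curve_fit(carreau, gamma, eta, p0=[7000.0, 0.6, 0.9, 0.35],
                       bounds=(lower, upper))
print("fitted parameters:", popt)
```

The initial guess has to lie inside the bounds, otherwise curve_fit rejects it before the fit starts.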

Related

lmfit not exploring parameter space

I'm trying to use lmfit to find the best fit parameters of a function for some random data using the Model and Parameters classes. However, it doesn't seem to be exploring the parameter space very much. It does ~10 function evaluations and then returns a terrible fit.
Here is the code:
import numpy as np
from lmfit.model import Model
from lmfit.parameter import Parameters
import matplotlib.pyplot as plt
def dip(x, loc, wid, dep):
    """Make a line with a dip in it"""
    # Array of ones
    y = np.ones_like(x)
    # Define start and end points of dip
    start = np.abs(x - (loc - (wid/2.))).argmin()
    end = np.abs(x - (loc + (wid/2.))).argmin()
    # Set depth of the dip
    y[start:end] *= dep
    return y
def fitter(x, loc, wid, dep, scatter=0.001, sigma=3):
    """Find the parameters of the dip function in random data"""
    # Make the lmfit model
    model = Model(dip)
    # Make random data and print input values
    rand_loc = abs(np.random.normal(loc, scale=0.02))
    rand_wid = abs(np.random.normal(wid, scale=0.03))
    rand_dep = abs(np.random.normal(dep, scale=0.005))
    print('rand_loc: {}\nrand_wid: {}\nrand_dep: {}\n'.format(rand_loc, rand_wid, rand_dep))
    data = dip(x, rand_loc, rand_wid, rand_dep) + np.random.normal(0, scatter, x.size)
    # Make parameter ranges
    params = Parameters()
    params.add('loc', value=loc, min=x.min(), max=x.max())
    params.add('wid', value=wid, min=0, max=x.max()-x.min())
    params.add('dep', value=dep, min=scatter*10, max=0.8)
    # Fit the data (positional arguments must come before keyword arguments)
    result = model.fit(data, params, x=x)
    print(result.fit_report())
    # Plot it
    plt.plot(x, data, 'bo')
    plt.plot(x, result.init_fit, 'k--', label='initial fit')
    plt.plot(x, result.best_fit, 'r-', label='best fit')
    plt.legend(loc='best')
    plt.show()
And then I run:
fitter(np.linspace(55707.97, 55708.1, 100), loc=55708.02, wid=0.04, dep=0.98)
Which returns (for example, since it's randomized data):
rand_loc: 55707.99659784677
rand_wid: 0.02015076619874132
rand_dep: 0.9849809461153651
[[Model]]
Model(dip)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 9
# data points = 100
# variables = 3
chi-square = 0.00336780
reduced chi-square = 3.4720e-05
Akaike info crit = -1023.86668
Bayesian info crit = -1016.05117
## Warning: uncertainties could not be estimated:
loc: at initial value
wid: at initial value
[[Variables]]
loc: 55708.0200 (init = 55708.02)
wid: 0.04000000 (init = 0.04)
dep: 0.99754082 (init = 0.98)
Any idea why it executes so few function evaluations returning a bad fit? Any assistance with this would be greatly appreciated!
This is a similar question to fitting step function with variation in the step location with scipy optimize curve_fit. See https://stackoverflow.com/a/59504874/5179748.
Basically, the solvers in scipy.optimize/lmfit assume that parameters are continuous -- not discrete -- variables. They make small changes to the parameters to see what change that makes in the result. A small change in your loc and wid parameters will have no effect on the result, as argmin() will always return an integer value.
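The effect described above can be checked directly. The sketch below reuses the dip function from the question on an evenly spaced grid and nudges each parameter by a relative step of about 1e-8. That step size is an assumption, chosen to resemble a typical finite-difference step. It then compares the model output before and after each nudge.

```python
import numpy as np


def dip(x, loc, wid, dep):
    """Same construction as in the question: a flat line with a dip."""
    y = np.ones_like(x)
    start = np.abs(x - (loc - wid / 2.0)).argmin()
    end = np.abs(x - (loc + wid / 2.0)).argmin()
    y[start:end] *= dep
    return y


x = np.linspace(0.0, 1.0, 201)
base = dip(x, 0.3, 0.09, 0.98)

# Perturb one parameter at a time by a tiny relative step (assumed size).
step = 1e-8
shifted_loc = dip(x, 0.3 * (1.0 + step), 0.09, 0.98)
shifted_wid = dip(x, 0.3, 0.09 * (1.0 + step), 0.98)
shifted_dep = dip(x, 0.3, 0.09, 0.98 * (1.0 + step))

print("loc changes the model:", not np.array_equal(base, shifted_loc))
print("wid changes the model:", not np.array_equal(base, shifted_wid))
print("dep changes the model:", not np.array_equal(base, shifted_dep))
```

Only dep changes the output, which matches the fit report in the question: loc and wid stay at their initial values while dep moves.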
You might find that using a Rectangle Model with a finite width (see https://lmfit.github.io/lmfit-py/builtin_models.html#rectanglemodel) will be helpful. I changed your example a bit, but it should be enough to get you started:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import RectangleModel, ConstantModel
def dip(x, loc, wid, dep):
    """Make a line with a dip in it"""
    # Array of ones
    y = np.ones_like(x)
    # Define start and end points of dip
    start = np.abs(x - (loc - (wid/2.))).argmin()
    end = np.abs(x - (loc + (wid/2.))).argmin()
    # Set depth of the dip
    y[start:end] *= dep
    return y
x = np.linspace(0, 1, 201)
data = dip(x, 0.3, 0.09, 0.98) + np.random.normal(0, 0.001, x.size)
model = RectangleModel() + ConstantModel()
params = model.make_params(c=1.0, amplitude=-0.01, center1=.100, center2=0.7, sigma1=0.15)
params['sigma2'].expr = 'sigma1' # force left and right widths to be the same size
params['c'].vary = False # force offset = 1.0 : value away from "dip"
result = model.fit(data, params, x=x)
print(result.fit_report())
plt.plot(x, data, 'bo')
plt.plot(x, result.init_fit, 'k--', label='initial fit')
plt.plot(x, result.best_fit, 'r-', label='best fit')
plt.legend(loc='best')
plt.show()

scipy curve_fit do not converge even if I iteratively change initial guess

I have some experimentally acquired points.
These points should follow a theoretical function of this type:
f(x) = A * (1 - e^{-x/B})
I tried to use the curve_fit function from scipy.optimize to find the parameters A and B that best fit the exponential.
I have to perform this fit on almost 100 different samples.
Moreover, I know from experience that 0.5 < A < 2.0 and 7.0 < B < 9.0.
My problem is that curve_fit fails to converge to the optimal values of A and B.
This is the code I wrote. First I import the packages I need, then I define the exponential function, and then I define a fit function where I impose some constraints on the value of A. I did this because otherwise in some cases (e.g. 10% of the time) curve_fit returned unrealistic values for A, for example A = 10^5 or even greater. If A is greater than 2, I call curve_fit again with a different initial guess.
from scipy.optimize import curve_fit
import pandas as pd
import numpy as np
initial_guess = [8, 1]
def exponential(x, a, b):
    return a*(1 - np.exp(-(x)/b))
def fit(x, y, i):
    best_vals, covar = curve_fit(lambda t, a, b: exponential(t, a, b), x, y, p0=i)
    if best_vals[1]<0.5 or best_vals[1]>2:
        i2 = np.array([1, 0.8, 1])
        while best_vals[1]<0.5 or best_vals[1]>2:
            i2 = i2 + [0.5, 0.1, 0.5]
            best_vals, covar = curve_fit(lambda t, a, b: exponential(t, a, b), x, y, p0=i2)
            print(best_vals)
    variance = np.sqrt(np.diag(covar))
    i2= i
    B = best_vals[0]
    A = best_vals[1]
    return variance, A, B
df = pd.read_csv('data.csv')
v, a, b = fit(df['x'], df['y'], initial_guess)
Unfortunately, with this code I am sometimes unable to converge to a value of A between 0.5 and 2.0.
Can anyone suggest another way to perform this fit that takes my constraints into account?
Maybe there is a better way to write the fit function, or to use the constraints I have and then change the initial guess.
Thanks to anyone who can help me
Andrea
Here is an example graphical fitter using scipy's Differential Evolution genetic algorithm to determine initial parameter estimates for curve_fit(). The scipy implementation uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. In this example I have used your equation with an added offset so that it works with my test data. I also made the genetic algorithm search bounds on A and B slightly larger than the ones you provided, as a "margin of error" on the search bounds.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([19.1647, 18.0189, 16.9550, 15.7683, 14.7044, 13.6269, 12.6040, 11.4309, 10.2987, 9.23465, 8.18440, 7.89789, 7.62498, 7.36571, 7.01106, 6.71094, 6.46548, 6.27436, 6.16543, 6.05569, 5.91904, 5.78247, 5.53661, 4.85425, 4.29468, 3.74888, 3.16206, 2.58882, 1.93371, 1.52426, 1.14211, 0.719035, 0.377708, 0.0226971, -0.223181, -0.537231, -0.878491, -1.27484, -1.45266, -1.57583, -1.61717])
yData = numpy.array([0.644557, 0.641059, 0.637555, 0.634059, 0.634135, 0.631825, 0.631899, 0.627209, 0.622516, 0.617818, 0.616103, 0.613736, 0.610175, 0.606613, 0.605445, 0.603676, 0.604887, 0.600127, 0.604909, 0.588207, 0.581056, 0.576292, 0.566761, 0.555472, 0.545367, 0.538842, 0.529336, 0.518635, 0.506747, 0.499018, 0.491885, 0.484754, 0.475230, 0.464514, 0.454387, 0.444861, 0.437128, 0.415076, 0.401363, 0.390034, 0.378698])
# exponential equation + offset
def func(x, a, b, offset):
    return a*(1.0 - numpy.exp(-(x)/b)) + offset
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
    minY = min(yData)
    maxY = max(yData)
    parameterBounds = []
    parameterBounds.append([0.0, 5.0]) # search bounds for a
    parameterBounds.append([5.0, 15.0]) # search bounds for b
    parameterBounds.append([minY, maxY]) # search bounds for offset
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution finishes by polishing its best result with L-BFGS-B inside the search bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')
    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)
    # now the model as a line plot
    axes.plot(xModel, yModel)
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label
    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
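Since the question already states ranges for A and B, another option is to hand those ranges to curve_fit directly through its bounds argument instead of retrying with new initial guesses. The following is only a sketch under assumptions: data.csv is not available here, so it fits synthetic data generated with A = 1.2 and B = 8.0, and the names lower, upper and p0 are illustrative. One thing worth checking in the question's code is that initial_guess = [8, 1] is passed to a function whose arguments are (a, b), so the 8 becomes the starting value for a. If the guess was meant in (B, A) order, that alone might explain some of the failed fits.

```python
import numpy as np
from scipy.optimize import curve_fit


def exponential(x, a, b):
    """f(x) = A * (1 - exp(-x / B)), with a = A and b = B."""
    return a * (1.0 - np.exp(-x / b))


# Synthetic stand-in for one sample (assumed values, not the real data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 40.0, 60)
y = exponential(x, 1.2, 8.0) + rng.normal(0.0, 0.01, x.size)

# Bounds and initial guess are given in the order (A, B), matching the
# argument order of exponential().
lower = [0.5, 7.0]
upper = [2.0, 9.0]
p0 = [1.0, 8.0]

popt, pcov = curve_fit(exponential, x, y, p0=p0, bounds=(lower, upper))
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainties of A and B
print("A = %.3f +/- %.3f" % (popt[0], perr[0]))
print("B = %.3f +/- %.3f" % (popt[1], perr[1]))
```

If a fitted value ends up sitting exactly on one of the bounds, that usually deserves a second look at the data or the model rather than another retry.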

scipy curve_fit raises "OptimizeWarning: Covariance of the parameters could not be estimated"

I am trying to fit this function to some data:
But when I use my code
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def f(x, start, end):
    res = np.empty_like(x)
    res[x < start] =-1
    res[x > end] = 1
    linear = np.all([[start <= x], [x <= end]], axis=0)[0]
    res[linear] = np.linspace(-1., 1., num=np.sum(linear))
    return res
if __name__ == '__main__':
    xdata = np.linspace(0., 1000., 1000)
    ydata = -np.ones(1000)
    ydata[500:1000] = 1.
    ydata = ydata + np.random.normal(0., 0.25, len(ydata))
    popt, pcov = curve_fit(f, xdata, ydata, p0=[495., 505.])
    print(popt, pcov)
    plt.figure()
    plt.plot(xdata, f(xdata, *popt), 'r-', label='fit')
    plt.plot(xdata, ydata, 'b-', label='data')
    plt.show()
I get the error
OptimizeWarning: Covariance of the parameters could not be estimated
In this example start and end should be closer to 500, but they don't change at all from my initial guess.
The warning (not error) of
OptimizeWarning: Covariance of the parameters could not be estimated
means that the fit could not determine the uncertainties (variance) of the fitting parameters.
The main problem is that your model function f treats the parameters start and end as discrete values -- they are used as integer locations for the change in functional form. scipy's curve_fit (and all other optimization routines in scipy.optimize) assume that parameters are continuous variables, not discrete.
The fitting procedure will try to take small steps (typically a relative change near the square root of machine precision, roughly 1e-8) in the parameters to get a numerical derivative of the residual with respect to the variables (the Jacobian). With values used as discrete variables, these derivatives will be zero and the fitting procedure will not know how to change the values to improve the fit.
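For comparison, here is a sketch of a model whose parameters are genuinely continuous, fitted with plain curve_fit. It is not the function from the question: the hypothetical ramp function below is parameterised by a center and a width and built with np.clip, so every point on the ramp responds to a tiny change in either parameter. The data are synthetic (true center 500, true width 20), and the bounds are an assumption added to keep the width positive.

```python
import numpy as np
from scipy.optimize import curve_fit


def ramp(x, center, width):
    """Rise linearly from -1 to 1 over `width`, centred on `center`.

    Unlike an index-based mask, every point on the ramp changes smoothly
    when center or width change, so finite differences are non-zero.
    """
    return np.clip(2.0 * (x - center) / width, -1.0, 1.0)


# Synthetic data with a finite-width step (assumed true values: 500, 20).
rng = np.random.default_rng(0)
xdata = np.linspace(0.0, 1000.0, 1000)
ydata = ramp(xdata, 500.0, 20.0) + rng.normal(0.0, 0.1, xdata.size)

# Keep width positive so the division in ramp() stays well defined.
popt, pcov = curve_fit(ramp, xdata, ydata, p0=[495.0, 30.0],
                       bounds=([0.0, 1.0], [1000.0, 200.0]))
print("center, width:", popt)
```

This only works when the step in the data has a finite width covering several points. For an ideal one-sample jump, the width is poorly determined whichever library is used.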
It looks like you're trying to fit a step function to some data. Allow me to recommend trying lmfit (https://lmfit.github.io/lmfit-py) which provides a higher-level interface to curve fitting, and has many built-in models. For example, it includes a StepModel that should be able to model your data.
For a slight modification of your data (so that it has a finite step), the following script with lmfit can fit such data:
#!/usr/bin/python
import numpy as np
from lmfit.models import StepModel, LinearModel
import matplotlib.pyplot as plt
np.random.seed(0)
xdata = np.linspace(0., 1000., 1000)
ydata = -np.ones(1000)
ydata[500:1000] = 1.
# note that a linear step is added here:
ydata[490:510] = -1 + np.arange(20)/10.0
ydata = ydata + np.random.normal(size=len(xdata), scale=0.1)
# model data as Step + Line
step_mod = StepModel(form='linear', prefix='step_')
line_mod = LinearModel(prefix='line_')
model = step_mod + line_mod
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
line_slope=0,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
# fit data to this model with these parameters
out = model.fit(ydata, pars, x=xdata)
# print results
print(out.fit_report())
# plot data and best-fit
plt.plot(xdata, ydata, 'b')
plt.plot(xdata, out.best_fit, 'r-')
plt.show()
which prints out a report of
[[Model]]
(Model(step, prefix='step_', form='linear') + Model(linear, prefix='line_'))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 49
# data points = 1000
# variables = 5
chi-square = 9.72660131
reduced chi-square = 0.00977548
Akaike info crit = -4622.89074
Bayesian info crit = -4598.35197
[[Variables]]
step_sigma: 20.6227793 +/- 0.77214167 (3.74%) (init = 2)
step_center: 490.167878 +/- 0.44804412 (0.09%) (init = 500)
step_amplitude: 1.98946656 +/- 0.01304854 (0.66%) (init = 0.996283)
line_intercept: -1.00628058 +/- 0.00706005 (0.70%) (init = -1.277259)
line_slope: 1.3947e-05 +/- 2.2340e-05 (160.18%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
C(step_amplitude, line_slope) = -0.875
C(step_sigma, step_center) = -0.863
C(line_intercept, line_slope) = -0.774
C(step_amplitude, line_intercept) = 0.461
C(step_sigma, step_amplitude) = 0.170
C(step_sigma, line_slope) = -0.147
C(step_center, step_amplitude) = -0.146
C(step_center, line_slope) = 0.127
and produces a plot of
Lmfit has lots of extra features. For example, if you want to set bounds on some of the parameter values or fix some from varying, you can do the following:
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
line_slope=0,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
# now set max and min values for step amplitude
pars['step_amplitude'].min = 0
pars['step_amplitude'].max = 100
# fix the intercept of the line to be -1.0
pars['line_intercept'].value = -1.0
pars['line_intercept'].vary = False
# then run fit with these parameters
out = model.fit(ydata, pars, x=xdata)
If you know the model should be Step+Constant and that the constant should be fixed, you could also modify the model to be
from lmfit.models import ConstantModel
# model data as Step + Constant
step_mod = StepModel(form='linear', prefix='step_')
const_mod = ConstantModel(prefix='const_')
model = step_mod + const_mod
pars = model.make_params(const_c=-1,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
pars['const_c'].vary = False

Python curve_fit choice of bounds and initial condition affect the result

I have a data set that is described by two free parameters, which I want to determine using scipy.optimize.curve_fit. The model is defined as follows
def func(x, a, b):
    return a*x*np.sqrt(1-b*x)
And the fitting part as
popt, pcov = opt.curve_fit(f=func, xdata=x_data, ydata=y_data, p0=init_guess,
                           bounds=([a_min, b_min], [a_max, b_max]))
The solutions for a and b depend quite strongly on my choice of init_guess, i.e. the initial guess, and also on the choice of the bounds.
Is there a way to solve this?
The authors of the Python scipy module have included the Differential Evolution genetic algorithm in scipy's optimization code as the function scipy.optimize.differential_evolution. This function can be used to stochastically find initial parameter values for non-linear regression.
Here is example code from RamanSpectroscopyFit, which uses scipy's genetic algorithm for initial parameter estimation for fitting Raman spectroscopy data:
import numpy as np
import pickle # for loading pickled test data
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
# Double Lorentzian peak function
# bounds on parameters are set in generate_Initial_Parameters() below
def double_Lorentz(x, a, b, A, w, x_0, A1, w1, x_01):
    return a*x+b+(2*A/np.pi)*(w/(4*(x-x_0)**2 + w**2))+(2*A1/np.pi)*(w1/(4*(x-x_01)**2 + w1**2))
# function for genetic algorithm to minimize (sum of squared error)
# bounds on parameters are set in generate_Initial_Parameters() below
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    return np.sum((yData - double_Lorentz(xData, *parameterTuple)) ** 2)
def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)
    parameterBounds = []
    parameterBounds.append([-1.0, 1.0]) # parameter bounds for a
    parameterBounds.append([maxY/-2.0, maxY/2.0]) # parameter bounds for b
    parameterBounds.append([0.0, maxY*100.0]) # parameter bounds for A
    parameterBounds.append([0.0, maxY/2.0]) # parameter bounds for w
    parameterBounds.append([minX, maxX]) # parameter bounds for x_0
    parameterBounds.append([0.0, maxY*100.0]) # parameter bounds for A1
    parameterBounds.append([0.0, maxY/2.0]) # parameter bounds for w1
    parameterBounds.append([minX, maxX]) # parameter bounds for x_01
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# load the pickled test data from original Raman spectroscopy
data = pickle.load(open('data.pkl', 'rb'))
xData = data[0]
yData = data[1]
# generate initial parameter values
initialParameters = generate_Initial_Parameters()
# curve fit the test data
fittedParameters, niepewnosci = curve_fit(double_Lorentz, xData, yData, initialParameters)
# create values for display of fitted peak function
a, b, A, w, x_0, A1, w1, x_01 = fittedParameters
y_fit = double_Lorentz(xData, a, b, A, w, x_0, A1, w1, x_01)
plt.plot(xData, yData) # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
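The same idea can be applied to the two-parameter model from the question. The sketch below rests on several assumptions: the asker's data are not shown, so it generates synthetic data with a = 2 and b = 0.5; the search ranges are invented for this data; and the helper name sum_of_squared_error is illustrative. It first runs differential_evolution inside the bounds to obtain a starting point that does not depend on a hand-picked guess, then refines that point with curve_fit using the same bounds.

```python
import warnings

import numpy as np
from scipy.optimize import curve_fit, differential_evolution


def func(x, a, b):
    """Model from the question."""
    return a * x * np.sqrt(1.0 - b * x)


# Synthetic data (assumed true values a=2, b=0.5; not the asker's data).
rng = np.random.default_rng(0)
x_data = np.linspace(0.0, 1.0, 50)
y_data = func(x_data, 2.0, 0.5) + rng.normal(0.0, 0.01, x_data.size)

# Search ranges; b <= 1 / max(x) keeps the square root real.
bounds = [(0.0, 10.0), (0.0, 1.0)]


def sum_of_squared_error(params):
    """Objective for the global search."""
    return np.sum((y_data - func(x_data, *params)) ** 2)


# Global, bounded search for a starting point; seed fixed for repeatability.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = differential_evolution(sum_of_squared_error, bounds, seed=3)

# Local refinement, started from the global result and kept in the same box.
lower, upper = zip(*bounds)
popt, pcov = curve_fit(func, x_data, y_data, p0=result.x,
                       bounds=(lower, upper))
print("start from differential_evolution:", result.x)
print("refined by curve_fit:", popt)
```

If the result still depends strongly on the bounds, for example because a fitted parameter sits on one of them, the data may simply not constrain that parameter well.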

SciPy Curve Fit Fails Power Law

So, I'm trying to fit a set of data with a power law of the following kind:
def f(x,N,a): # Power law fit
    if a >0:
        return N*x**(-a)
    else:
        return 10.**300
par,cov = scipy.optimize.curve_fit(f,data,time,array([10**(-7),1.2]))
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've put in manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative, which leads to a fit with the wrong sign slope.
I can't figure out how to get a believable, let alone reliable, fit out of this routine, but I can't find any other good Python curve fitting routines. Do I need to write my own least-squares algorithm or is there something I'm doing wrong here?
UPDATE
In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see the curve_fit documentation). Please find this solution below after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.
Original post
As suggested by @Warren Weckesser, you could use lmfit to get this task done, which allows you to assign bounds to your parameters and avoids this 'ugly' if-clause.
Since you do not provide any data, I created some which are shown here:
They follow the law f(x) = 10.5 * x ** (-0.08)
I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
lN: 2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
a: -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
So a is pretty close to the original value and np.exp(2.35450302) gives 10.53 which is also very close to the original value.
The plot then looks as follows; as you can see the fit describes the data very well:
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, Parameter, report_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()
# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()
def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data # that's what you want to minimize
# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100) # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001) # min, max define parameter bounds
# do fit, here with the leastsq method
result = minimize(decay, params, args=(lx, ly))
# write error report (the fitted values live in result.params)
report_fit(result.params)
# x values for the fitted curve (start above 0, where x ** a diverges for negative a)
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.params['lN'].value) * xnew ** (result.params['a'].value), 'r')
plt.show()
EDIT
Assuming that you have scipy 0.17 or later installed, you can also do the following using curve_fit. I show it for your original definition of the power law (red line in the plot below) as well as for the logarithmic data (black line in the plot below). The data is generated in the same way as above. The plot then looks as follows:
As you can see, the data is described very well. If you print popt and popt_log, you obtain array([ 10.47463426, 0.07914812]) and array([ 2.35158653, -0.08045776]), respectively (note: for the latter you will have to take the exponential of the first argument - np.exp(popt_log[0]) = 10.502, which is close to the original value).
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a):
    return N * x ** (-a)
def f_log(x, lN, a):
    return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()
