I am trying to use LMFIT to fit a power law model of the form y ~ a (x-x0)^b + d. I used the built-in models, which do not include the parameter x0:
Data:
x = [57924.223, 57925.339, 57928.226, 57929.22 , 57930.222, 57931.323, 57933.205,
57935.302, 57939.28 , 57951.282]
y = [14455.95775513, 13838.7702847 , 11857.5599917 , 11418.98888834, 11017.30612092,
10905.00155524, 10392.55775922, 10193.91608535,9887.8610764 , 8775.83459273]
err = [459.56414237, 465.27518505, 448.25224285, 476.64621165, 457.05994986,
458.37532126, 469.89966451, 473.68349925, 455.91446878, 507.48473313]
from lmfit.models import PowerLawModel, ConstantModel
power = PowerLawModel()
offset = ConstantModel()
model = power + offset
pars = offset.guess(y, x = x)
pars += power.guess(y, x= x)
result = model.fit(y, pars, x=x, weights=1/err)
print(result.fit_report())
This brings up an error because my data starts at about x = 57000. I was initially offsetting my x-axis by x - 57923.24 for all x values, which gave me an okay fit. I would like to know how I can implement an x-axis offset in the model.
I was looking into expression models...
from lmfit.models import ExpressionModel
mod = ExpressionModel('a*(x-x0)**(b)+d')
mod.guess(y, x=x)
But with this I get an error that guess() is not implemented. If I create my own parameters, I get the following error instead:
ValueError: The model function generated NaN values and the fit aborted! Please check
your model function and/or set boundaries on parameters where applicable. In cases
like this, using "nan_policy='omit'" will probably not work.
Any help would be appreciated :)
It turns out that (x-x0)**b is a particularly tricky case to fit. Power-law and exponential decays typically require very good initial values for the parameters, or data spanning a few decades of decay.
In addition, because x**b is complex when x < 0 and b is non-integer, this can mess up the fitting algorithms, which deal strictly with float64 real values. You also have to be careful to set bounds on x0 so that x - x0 is always positive, and/or allow the values to go complex and then force them back to real.
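As a quick illustration (not part of the original code), NumPy returns nan for a negative float base raised to a non-integer power, but gives a complex result if the input is already complex:
import numpy as np
print(np.power(-1.5, -2.5))        # nan (with a RuntimeWarning) for real float input
print(np.power(-1.5 + 0j, -2.5))   # a finite complex number instead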
For your data, getting initial values is somewhat tricky, and with a limited data range (specifically not having a huge drop in intensity), it is hard to be confident that there is a single unique solution - note the high uncertainty in the parameters below. Still, I think this will be close to what you are trying to do:
import numpy as np
from lmfit import Model
from matplotlib import pyplot as plt
# Note: use numpy arrays, not lists!
x = np.array([57924.223, 57925.339, 57928.226, 57929.22 , 57930.222,
57931.323, 57933.205, 57935.302, 57939.28 , 57951.282])
y = np.array([14455.95775513, 13838.7702847 , 11857.5599917 ,
11418.98888834, 11017.30612092, 10905.00155524,
10392.55775922, 10193.91608535,9887.8610764 , 8775.83459273])
err = np.array([459.56414237, 465.27518505, 448.25224285, 476.64621165,
457.05994986, 458.37532126, 469.89966451, 473.68349925,
455.91446878, 507.48473313])
# define a function rather than use ExpressionModel (easier to debug)
def powerlaw(x, x0, amplitude, exponent, offset):
    return (offset + amplitude * (x-x0)**exponent)
pmodel = Model(powerlaw)
# make parameters with good initial values: challenging!
params = pmodel.make_params(x0=57800, amplitude=1.5e9, exponent=-2.5, offset=7500)
# set bounds on `x0` to prevent (x-x0)**b going complex
params['x0'].min = 57000
params['x0'].max = x.min()-2
# set bounds on other parameters too
params['amplitude'].min = 1.e7
params['offset'].min = 0
params['offset'].max = 50000
params['exponent'].max = 0
params['exponent'].min = -6
# run fit
result = pmodel.fit(y, params, x=x)
print(result.fit_report())
plt.errorbar(x, y, err, marker='o', linewidth=2, label='data')
plt.plot(x, result.init_fit, label='initial guess')
plt.plot(x, result.best_fit, label='best fit')
plt.legend()
plt.show()
Alternatively, you could force x to be complex (say, multiply by (1.0+0j)) and then have your model function do return (offset + amplitude * (x-x0)**exponent).real. But I think you would still need carefully selected bounds on the parameters, and those depend strongly on the actual data you are fitting.
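A minimal sketch of that complex-valued alternative (same parameter names as the powerlaw function above, just a hypothetical variant) could be:
import numpy as np

def powerlaw_complexsafe(x, x0, amplitude, exponent, offset):
    # promote x to complex so (x - x0)**exponent is defined even when x < x0,
    # then return only the real part so the fit sees float64 values
    xc = np.asarray(x, dtype=complex)
    return (offset + amplitude * (xc - x0)**exponent).real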
Anyway, the script above will print a report of
[[Model]]
Model(powerlaw)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 304
# data points = 10
# variables = 4
chi-square = 247807.309
reduced chi-square = 41301.2182
Akaike info crit = 109.178217
Bayesian info crit = 110.388557
[[Variables]]
x0: 57907.5658 +/- 7.81258329 (0.01%) (init = 57800)
amplitude: 10000061.9 +/- 45477987.8 (454.78%) (init = 1.5e+09)
exponent: -2.63343429 +/- 1.19163855 (45.25%) (init = -2.5)
offset: 8487.50872 +/- 375.212255 (4.42%) (init = 7500)
[[Correlations]] (unreported correlations are < 0.100)
C(amplitude, exponent) = -0.997
C(x0, amplitude) = -0.982
C(x0, exponent) = 0.965
C(exponent, offset) = -0.713
C(amplitude, offset) = 0.662
C(x0, offset) = -0.528
(note the huge uncertainty on amplitude!) and generate a plot of the data, the initial guess, and the best fit.
What are the values I am printing with sigma_ab, and how can I calculate the 95% confidence interval?
for g in all:
    c0 = 5
    c2 = 0.2
    c3 = 0.7
    start = g['y'].iloc[0]
    p0 = np.array([c0, c2, c3]),  # Construct initial guess array
    popt, pcov = curve_fit(
        model, g['x'], g['y'],
        absolute_sigma=True, maxfev=100000
    )
    sigma_ab = np.sqrt(np.diagonal(pcov))
    n = g.name
    print(n + ' Estimated parameters: \n', popt)
    print(n + ' Approximated errors: \n', sigma_ab)
These are the estimated parameters
[0.24803625 0.06072472 0.46449578]
This is sigma_ab but I don't know exactly what it is. I would like to calculate the upper and lower limit of the mean with 95% confidence interval.
[1.32778766 0.64261562 1.47915215]
Your sigma_ab (sqrt of the diagonal elements of the covariance) will be the 1-sigma (68.3%) uncertainties. If the distribution of your uncertainties is strictly Gaussian (often a good but not perfect assumption, so maybe "a decent starting estimate"), then the 2-sigma (95.5%) uncertainties will be twice those values.
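For example, using the popt and sigma_ab arrays from your snippet, a rough Gaussian 95% confidence interval for each parameter would be:
# ~95% confidence limits, assuming symmetric Gaussian uncertainties
# (1.96*sigma for 95.0%; use 2*sigma_ab for the 95.45% "2-sigma" level)
lower = popt - 1.96 * sigma_ab
upper = popt + 1.96 * sigma_ab
print(lower, upper)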
If you want a more detailed measure (and one that doesn't assume symmetric uncertainties), you might find lmfit and its Model class helpful. By default (and when possible) it will report 1-sigma uncertainties from the covariance, which is fast, and usually pretty good. It can also explicitly bracket and find 1-, 2-, 3-sigma uncertainties, positive and negative separately.
You didn't give a very complete example, so it's a little hard to tell what your model function is doing. If you have a model function like:
def modelfunc(x, amp, cen, sigma):
    return amp * np.exp(-(x-cen)*(x-cen)/sigma**2)
you could use
import numpy as np
import lmfit
def modelfunc(x, amp, cen, sigma):
    return amp * np.exp(-(x-cen)*(x-cen)/sigma**2)
x = np.linspace(-10.0, 10.0, 201)
y = modelfunc(x, 3.0, 0.5, 1.1) + np.random.normal(scale=0.1, size=len(x))
model = lmfit.Model(modelfunc)
params = model.make_params(amp=5., cen=0.2, sigma=1)
result = model.fit(y, params, x=x)
print(result.fit_report())
# now calculate explicit 1-, 2, and 3-sigma uncertainties:
ci = result.conf_interval(sigmas=[1,2,3])
lmfit.printfuncs.report_ci(ci)
which will print out
[[Model]]
Model(modelfunc)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 21
# data points = 201
# variables = 3
chi-square = 1.93360112
reduced chi-square = 0.00976566
Akaike info crit = -927.428077
Bayesian info crit = -917.518162
[[Variables]]
amp: 2.97351225 +/- 0.03245896 (1.09%) (init = 5)
cen: 0.48792611 +/- 0.00988753 (2.03%) (init = 0.2)
sigma: 1.10931408 +/- 0.01398308 (1.26%) (init = 1)
[[Correlations]] (unreported correlations are < 0.100)
C(amp, sigma) = -0.577
99.73% 95.45% 68.27% _BEST_ 68.27% 95.45% 99.73%
amp : -0.09790 -0.06496 -0.03243 2.97351 +0.03255 +0.06543 +0.09901
cen : -0.03007 -0.01991 -0.00992 0.48793 +0.00991 +0.01990 +0.03004
sigma: -0.04151 -0.02766 -0.01387 1.10931 +0.01404 +0.02834 +0.04309
which gives explicitly calculated uncertainties and shows that, for this case, the fast estimates of the 1-sigma uncertainties are very good, and the 2-sigma values are pretty close to 2x the 1-sigma values. You shouldn't really trust uncertainties past the second significant digit anyway.
Finally, in your example you build a p0 array but never actually pass it to curve_fit, so the fit silently starts from curve_fit's default initial values of 1.0, which illustrates a very serious design flaw in curve_fit.
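For illustration, keeping the rest of your loop unchanged, the corrected lines would look something like:
p0 = np.array([c0, c2, c3])   # no trailing comma, so this stays an array
popt, pcov = curve_fit(
    model, g['x'], g['y'], p0=p0,
    absolute_sigma=True, maxfev=100000
)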
I tried the following to find a sine regression but I am not able to draw a sine curve. What am I doing wrong here?
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.optimize import curve_fit
def sinfunc(x, a, b, c, d):
    return a * np.sin(b * (x - np.radians(c))) + d
year=np.arange(0,24,2)
population=np.array([10.2,11.1,12,11.7,10.6,10,10.6,11.7,12,11.1,10.2,10.2])
popt, pcov = curve_fit(sinfunc, year, population, p0=None)
x_data = np.linspace(0, 25, num=100)
plt.scatter(year,population,label='Population')
plt.plot(x_data, sinfunc(x_data, *popt), 'r-',label='Fitted function')
plt.title("Year vs Population")
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.show()
The TI-nspire shows y=sin(0.58x-1)+11
Update
If I use p0=[1,0.4,1,5] it works well. But shouldn't it be automatic?
The thing you are doing "wrong" is passing p0=None to curve_fit().
All fitting methods really, really require initial values. Unfortunately, scipy.optimize.curve_fit() has the completely unjustifiable option of allowing you to not set initial values and silently (not even a warning!!) making the absurd guess that all values have initial values of 1.0. It turns out that for your problem these impossible-to-justify-and-broken-by-design initial values are so bad that the fit fails to find a good answer. This is not uncommon. curve_fit is lying to you that p0=None is acceptable, and you are believing that lie.
The solution is to recognize that the offset is obviously around 11 and use p0=[1.0, 0.5, 0.5, 11.0].
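With your sinfunc and data unchanged, that is simply:
popt, pcov = curve_fit(sinfunc, year, population, p0=[1.0, 0.5, 0.5, 11.0])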
You might consider using lmfit (https://lmfit.github.io/lmfit-py/) for this problem (disclaimer: I am a lead author). lmfit has a Model class for curve fitting with several features that might be useful here (not that curve_fit cannot solve this problem -- it can). With lmfit, your fit might look like:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def sinfunc(x, a, b, c, d):
    return a * np.sin(b*(x - c)) + d
year=np.arange(0,24,2)
population=np.array([10.2,11.1,12,11.7,10.6,10,
10.6,11.7,12,11.1,10.2,10.2])
# build model from your model function
model = Model(sinfunc)
# create parameters (with initial values!). Note that parameters
# are named from the argument names of your model function
params = model.make_params(a=1, b=0.5, c=0.5, d=11.0)
# you can set min/max for any parameter to put bounds on the values
params['a'].min = 0
params['c'].min = -np.pi
params['c'].max = np.pi
# do the fit to your data with those parameters
result = model.fit(population, params, x=year)
# print out report of fit statistics and parameter values+uncertainties
print(result.fit_report())
# plot data and fit result
plt.scatter(year,population,label='Population')
plt.plot(year, result.best_fit, 'r-',label='Fitted function')
plt.title("Year vs Population")
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.show()
This will print out a report of
[[Model]]
Model(sinfunc)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 26
# data points = 12
# variables = 4
chi-square = 0.00761349
reduced chi-square = 9.5169e-04
Akaike info crit = -80.3528861
Bayesian info crit = -78.4132595
[[Variables]]
a: 1.00465520 +/- 0.01247767 (1.24%) (init = 1)
b: 0.57528444 +/- 0.00198556 (0.35%) (init = 0.5)
c: 1.80990367 +/- 0.03823815 (2.11%) (init = 0.5)
d: 11.0250780 +/- 0.00925246 (0.08%) (init = 11)
[[Correlations]] (unreported correlations are < 0.100)
C(b, c) = 0.812
C(b, d) = 0.245
C(c, d) = 0.234
and produce a plot of the data and the fitted curve.
But, again: the problem is that you were suckered into believing that p0=None is a reasonable use of curve_fit().
I'm trying to use lmfit to find the best fit parameters of a function for some random data using the Model and Parameters classes. However, it doesn't seem to be exploring the parameter space very much. It does ~10 function evaluations and then returns a terrible fit.
Here is the code:
import numpy as np
from lmfit.model import Model
from lmfit.parameter import Parameters
import matplotlib.pyplot as plt
def dip(x, loc, wid, dep):
    """Make a line with a dip in it"""
    # Array of ones
    y = np.ones_like(x)
    # Define start and end points of dip
    start = np.abs(x - (loc - (wid/2.))).argmin()
    end = np.abs(x - (loc + (wid/2.))).argmin()
    # Set depth of the dip
    y[start:end] *= dep
    return y
def fitter(x, loc, wid, dep, scatter=0.001, sigma=3):
    """Find the parameters of the dip function in random data"""
    # Make the lmfit model
    model = Model(dip)
    # Make random data and print input values
    rand_loc = abs(np.random.normal(loc, scale=0.02))
    rand_wid = abs(np.random.normal(wid, scale=0.03))
    rand_dep = abs(np.random.normal(dep, scale=0.005))
    print('rand_loc: {}\nrand_wid: {}\nrand_dep: {}\n'.format(rand_loc, rand_wid, rand_dep))
    data = dip(x, rand_loc, rand_wid, rand_dep) + np.random.normal(0, scatter, x.size)
    # Make parameter ranges
    params = Parameters()
    params.add('loc', value=loc, min=x.min(), max=x.max())
    params.add('wid', value=wid, min=0, max=x.max()-x.min())
    params.add('dep', value=dep, min=scatter*10, max=0.8)
    # Fit the data
    result = model.fit(data, params, x=x)
    print(result.fit_report())
    # Plot it
    plt.plot(x, data, 'bo')
    plt.plot(x, result.init_fit, 'k--', label='initial fit')
    plt.plot(x, result.best_fit, 'r-', label='best fit')
    plt.legend(loc='best')
    plt.show()
And then I run:
fitter(np.linspace(55707.97, 55708.1, 100), loc=55708.02, wid=0.04, dep=0.98)
Which returns (for example, since it's randomized data):
rand_loc: 55707.99659784677
rand_wid: 0.02015076619874132
rand_dep: 0.9849809461153651
[[Model]]
Model(dip)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 9
# data points = 100
# variables = 3
chi-square = 0.00336780
reduced chi-square = 3.4720e-05
Akaike info crit = -1023.86668
Bayesian info crit = -1016.05117
## Warning: uncertainties could not be estimated:
loc: at initial value
wid: at initial value
[[Variables]]
loc: 55708.0200 (init = 55708.02)
wid: 0.04000000 (init = 0.04)
dep: 0.99754082 (init = 0.98)
Any idea why it executes so few function evaluations returning a bad fit? Any assistance with this would be greatly appreciated!
This is a similar question to "Fitting a step function with variation in the step location with scipy optimize curve_fit"; see https://stackoverflow.com/a/59504874/5179748.
Basically, the solvers in scipy.optimize/lmfit assume that parameters are continuous -- not discrete -- variables. They make small changes to the parameters to see what change that makes in the result. A small change in your loc and wid parameters will have no effect on the result, as argmin() will always return an integer value.
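You can verify this with your own dip() function: a solver-sized step in loc leaves the output, and therefore the numerical derivative, completely unchanged:
import numpy as np
x = np.linspace(55707.97, 55708.1, 100)
y1 = dip(x, 55708.02, 0.04, 0.98)
y2 = dip(x, 55708.02 + 1e-7, 0.04, 0.98)   # tiny step in `loc`, like the solver takes
print(np.array_equal(y1, y2))              # True: zero derivative, so the fit gives up early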
You might find that using a Rectangle Model with a finite width (see https://lmfit.github.io/lmfit-py/builtin_models.html#rectanglemodel) will be helpful. I changed your example a bit, but it should be enough to get you started:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import RectangleModel, ConstantModel
def dip(x, loc, wid, dep):
    """Make a line with a dip in it"""
    # Array of ones
    y = np.ones_like(x)
    # Define start and end points of dip
    start = np.abs(x - (loc - (wid/2.))).argmin()
    end = np.abs(x - (loc + (wid/2.))).argmin()
    # Set depth of the dip
    y[start:end] *= dep
    return y
x = np.linspace(0, 1, 201)
data = dip(x, 0.3, 0.09, 0.98) + np.random.normal(0, 0.001, x.size)
model = RectangleModel() + ConstantModel()
params = model.make_params(c=1.0, amplitude=-0.01, center1=.100, center2=0.7, sigma1=0.15)
params['sigma2'].expr = 'sigma1' # force left and right widths to be the same size
params['c'].vary = False # force offset = 1.0 : value away from "dip"
result = model.fit(data, params, x=x)
print(result.fit_report())
plt.plot(x, data, 'bo')
plt.plot(x, result.init_fit, 'k--', label='initial fit')
plt.plot(x, result.best_fit, 'r-', label='best fit')
plt.legend(loc='best')
plt.show()
I am trying to fit a step-like function (defined as f in the code below) to some data. But when I use my code
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def f(x, start, end):
    res = np.empty_like(x)
    res[x < start] = -1
    res[x > end] = 1
    linear = np.all([[start <= x], [x <= end]], axis=0)[0]
    res[linear] = np.linspace(-1., 1., num=np.sum(linear))
    return res

if __name__ == '__main__':
    xdata = np.linspace(0., 1000., 1000)
    ydata = -np.ones(1000)
    ydata[500:1000] = 1.
    ydata = ydata + np.random.normal(0., 0.25, len(ydata))
    popt, pcov = curve_fit(f, xdata, ydata, p0=[495., 505.])
    print(popt, pcov)
    plt.figure()
    plt.plot(xdata, f(xdata, *popt), 'r-', label='fit')
    plt.plot(xdata, ydata, 'b-', label='data')
    plt.show()
I get the error
OptimizeWarning: Covariance of the parameters could not be estimated
In this example start and end should be closer to 500, but they don't change at all from my initial guess.
The warning (not error) of
OptimizeWarning: Covariance of the parameters could not be estimated
means that the fit could not determine the uncertainties (variance) of the fitting parameters.
The main problem is that your model function f treats the parameters start and end as discrete values -- they are used as integer locations for the change in functional form. scipy's curve_fit (and all other optimization routines in scipy.optimize) assume that parameters are continuous variables, not discrete.
The fitting procedure will try to take small steps (typically around machine precision) in the parameters to get a numerical derivative of the residual with respect to the variables (the Jacobian). With values used as discrete variables, these derivatives will be zero and the fitting procedure will not know how to change the values to improve the fit.
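If you wanted to stay with curve_fit, one option (a sketch, not from your original code) is to rewrite f so that start and end affect the result continuously, for example as a clipped linear ramp:
import numpy as np

def f_continuous(x, start, end):
    # ramps linearly from -1 (for x <= start) to +1 (for x >= end);
    # small changes in start/end now change the output smoothly,
    # so curve_fit can compute useful derivatives
    return 2.0 * np.clip((x - start) / (end - start), 0.0, 1.0) - 1.0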
It looks like you're trying to fit a step function to some data. Allow me to recommend trying lmfit (https://lmfit.github.io/lmfit-py) which provides a higher-level interface to curve fitting, and has many built-in models. For example, it includes a StepModel that should be able to model your data.
For a slight modification of your data (so that it has a finite step), the following script with lmfit can fit such data:
#!/usr/bin/python
import numpy as np
from lmfit.models import StepModel, LinearModel
import matplotlib.pyplot as plt
np.random.seed(0)
xdata = np.linspace(0., 1000., 1000)
ydata = -np.ones(1000)
ydata[500:1000] = 1.
# note that a linear step is added here:
ydata[490:510] = -1 + np.arange(20)/10.0
ydata = ydata + np.random.normal(size=len(xdata), scale=0.1)
# model data as Step + Line
step_mod = StepModel(form='linear', prefix='step_')
line_mod = LinearModel(prefix='line_')
model = step_mod + line_mod
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
line_slope=0,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
# fit data to this model with these parameters
out = model.fit(ydata, pars, x=xdata)
# print results
print(out.fit_report())
# plot data and best-fit
plt.plot(xdata, ydata, 'b')
plt.plot(xdata, out.best_fit, 'r-')
plt.show()
which prints out a report of
[[Model]]
(Model(step, prefix='step_', form='linear') + Model(linear, prefix='line_'))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 49
# data points = 1000
# variables = 5
chi-square = 9.72660131
reduced chi-square = 0.00977548
Akaike info crit = -4622.89074
Bayesian info crit = -4598.35197
[[Variables]]
step_sigma: 20.6227793 +/- 0.77214167 (3.74%) (init = 2)
step_center: 490.167878 +/- 0.44804412 (0.09%) (init = 500)
step_amplitude: 1.98946656 +/- 0.01304854 (0.66%) (init = 0.996283)
line_intercept: -1.00628058 +/- 0.00706005 (0.70%) (init = -1.277259)
line_slope: 1.3947e-05 +/- 2.2340e-05 (160.18%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
C(step_amplitude, line_slope) = -0.875
C(step_sigma, step_center) = -0.863
C(line_intercept, line_slope) = -0.774
C(step_amplitude, line_intercept) = 0.461
C(step_sigma, step_amplitude) = 0.170
C(step_sigma, line_slope) = -0.147
C(step_center, step_amplitude) = -0.146
C(step_center, line_slope) = 0.127
and produces a plot of the data and the best fit.
Lmfit has lots of extra features. For example, if you want to set bounds on some of the parameter values or fix some from varying, you can do the following:
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
line_slope=0,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
# now set max and min values for step_amplitude
pars['step_amplitude'].min = 0
pars['step_amplitude'].max = 100
# fix the intercept of the line to be -1.0
pars['line_intercept'].value = -1.0
pars['line_intercept'].vary = False
# then run fit with these parameters
out = model.fit(ydata, pars, x=xdata)
If you know the model should be Step+Constant and that the constant should be fixed, you could also modify the model to be
from lmfit.models import ConstantModel
# model data as Step + Constant
step_mod = StepModel(form='linear', prefix='step_')
const_mod = ConstantModel(prefix='const_')
model = step_mod + const_mod
pars = model.make_params(const_c=-1,
step_center=xdata.mean(),
step_amplitude=ydata.std(),
step_sigma=2.0)
pars['const_c'].vary = False
So, I'm trying to fit a set of data with a power law of the following kind:
def f(x, N, a):  # Power law fit
    if a > 0:
        return N*x**(-a)
    else:
        return 10.**300
par,cov = scipy.optimize.curve_fit(f,data,time,array([10**(-7),1.2]))
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've put in manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative a, which leads to a fit with a slope of the wrong sign.
I can't figure out how to get a believable, let alone reliable, fit out of this routine, and I can't find any other good Python curve-fitting routines. Do I need to write my own least-squares algorithm, or is there something I'm doing wrong here?
UPDATE
In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see the documentation). Please find this solution below, after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.
Original post
As suggested by @Warren Weckesser, you could use lmfit to get this task done, which allows you to assign bounds to your parameters and avoids this 'ugly' if-clause.
Since you do not provide any data, I created some; they follow the law f(x) = 10.5 * x ** (-0.08).
I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
lN: 2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
a: -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
So a is pretty close to the original value and np.exp(2.35450302) gives 10.53 which is also very close to the original value.
The resulting plot shows that the fit describes the data very well.
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, Parameter, report_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()
# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()
def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data  # that's what you want to minimize
# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100) # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001) # min, max define parameter bounds
# do fit, here with leastsq model
result = minimize(decay, params, args=(lx, ly))
# write error report
report_fit(result)
# plot data
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.params['lN'].value) * xnew ** (result.params['a'].value), 'r')
plt.show()
EDIT
Assuming that you have scipy 0.17 or newer installed, you can also do the following using curve_fit. I show it for your original definition of the power law (red line in the plot below) as well as for the logarithmic data (black line in the plot below). The data is generated in the same way as above. The plot then looks as follows:
As you can see, the data is described very well. If you print popt and popt_log, you obtain array([ 10.47463426, 0.07914812]) and array([ 2.35158653, -0.08045776]), respectively (note: for the latter one you will have to take the exponential of the first argument; np.exp(popt_log[0]) = 10.502, which is close to the original value).
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a):
    return N * x ** (-a)

def f_log(x, lN, a):
    return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()