Curve fitting with a conditional equation - Python

Problem
I have created a curve fitting exercise (see functional code below), but I would like to add to the functionality.
I need to be able to define the following condition: slope at min(xdata) = 0.
(in words: I want the fitted curve to start out with horizontal gradient)
What I have tried
I have spent quite a bit of time researching scipy.optimize.curve_fit and evaluating other options (the lmfit package, and scipy functions such as scipy.optimize.fmin_slsqp and scipy.optimize.minimize). lmfit only lets me set a static condition on the parameters, such as p1 = 2 * p2 + 3. It does not let me address min(xdata) dynamically, and I cannot use the derivative in the constraint.
Scipy only lets me minimize a function (find an optimal x when the parameters p are already known), or restrict the parameters to a specific range. I was not able to define a second function that constrains the parameters during the curve fit.
I need to be able to pass the condition directly to the curve fitting algorithm, rather than addressing the problem by bringing the condition into the cubic_fit() equation (it seems possible to eliminate e.g. p3 and define it as a combination of the other parameters and min(xdata)). My actual fitting function is much more complex, and I need to run this script iteratively on a batch of data with varying min(xdata); I cannot manually alter the fitting function each time...
I am grateful for any suggestions, maybe there are other packages out there that allow for a more complex definition of the curve fitting problem?
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
import scipy.optimize
# generate dummy data - on which I will run a curve fit below
def cubic_fit_with_noise(x, p1, p2, p3, p4):
    return p1 + p2*x + p3*x**2 + p4*x**3 + np.random.rand()
xdata = [x * 0.1 for x in range(0, 100)]
ydata = np.array( [cubic_fit_with_noise (x, 2, 0.4, -.2,0.02) for x in xdata] )
# now, run the curve-fit
# set up the fitting function:
def cubic_fit(x, p1, p2, p3, p4):
    return p1 + p2*x + p3*x**2 + p4*x**3
# define starting point:
s1 = 2.5
s2 = 0.2
s3 = -.2
s4 = 0.02
# scipy curve fitting:
popt, pcov = scipy.optimize.curve_fit(cubic_fit, xdata, ydata, p0=(s1,s2,s3,s4))
y_modelled = np.array([cubic_fit(x, popt[0], popt[1], popt[2], popt[3]) for x in xdata])
print(popt) # prints out the 4 parameters p1,p2,p3,p4 defined in curve-fitting
plt.plot(xdata, ydata, 'bo')
plt.plot(xdata, y_modelled, 'r-')
plt.show()
The above code runs with Python 3 (adjust the print statement if you are on Python 2).
As an addition, I want to bring in the derivative:
def cubic_fit_derivative(x, p1, p2, p3, p4):
    return p2 + 2.0 * p3 * x + 3 * p4 * x**2
and the constraint that cubic_fit_derivative(min(xdata), p1,p2,p3,p4) = 0.

Your condition that the derivative of your polynomial is 0 at xmin can be expressed as a simple constraint, and it means that the parameters p2, p3, and p4 are not actually independent. The derivative condition is
p2 + 2*p3*xmin + 3*p4*xmin**2 = 0
where xmin is the minimum value of xdata. Furthermore, since xmin will be known prior to the fit (if not necessarily when your script is written), you can use it to constrain one of the three parameters. Because xmin may be zero (in fact, it is in your case), the constraint should be
p2 = - 2*p3*xmin - 3*p4*xmin**2
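For completeness, the same substitution can also be applied with plain scipy.optimize.curve_fit by computing p2 inside the model function. Here is a minimal sketch, reusing xdata and ydata from the question; cubic_constrained is a hypothetical helper, not part of the original code:
import scipy.optimize

def cubic_constrained(x, p1, p3, p4, xmin):
    # p2 is eliminated via the constraint: the derivative is zero at xmin
    p2 = -2*p3*xmin - 3*p4*xmin**2
    return p1 + p2*x + p3*x**2 + p4*x**3

xmin = min(xdata)  # known before the fit
popt, pcov = scipy.optimize.curve_fit(
    lambda x, p1, p3, p4: cubic_constrained(x, p1, p3, p4, xmin),
    xdata, ydata, p0=(2.5, -0.2, 0.02))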
Using lmfit, the original, unconstrained fit would look like this (I cleaned it up a bit):
import numpy as np
from lmfit import Model
import matplotlib.pylab as plt
# the model function:
def cubic_poly(x, p1, p2, p3, p4):
    return p1 + p2*x + p3*x**2 + p4*x**3
xdata = np.arange(100) * 0.1
ydata = cubic_poly(xdata, 2, 0.4, -.2, 0.02)
ydata = ydata + np.random.normal(size=len(xdata), scale=0.05)
# make Model, create parameters, run fit, print results
model = Model(cubic_poly)
params = model.make_params(p1=2.5, p2=0.2, p3=-0.0, p4=0.0)
result = model.fit(ydata, params, x=xdata)
print(result.fit_report())
plt.plot(xdata, ydata, 'bo')
plt.plot(xdata, result.best_fit, 'r-')
plt.show()
which prints:
[[Model]]
    Model(cubic_poly)
[[Fit Statistics]]
    # function evals = 13
    # data points = 100
    # variables = 4
    chi-square = 0.218
    reduced chi-square = 0.002
    Akaike info crit = -604.767
    Bayesian info crit = -594.347
[[Variables]]
    p1: 2.00924432 +/- 0.018375 (0.91%) (init= 2.5)
    p2: 0.39427207 +/- 0.016155 (4.10%) (init= 0.2)
    p3: -0.19902928 +/- 0.003802 (1.91%) (init=-0)
    p4: 0.01993319 +/- 0.000252 (1.27%) (init= 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(p3, p4) = -0.986
    C(p2, p3) = -0.967
    C(p2, p4) = 0.914
    C(p1, p2) = -0.857
    C(p1, p3) = 0.732
    C(p1, p4) = -0.646
and produces a plot of the data with the best fit.
Now, to add your constraint condition, we add xmin as a fixed parameter and constrain p2 as above. Replace the parameter setup and fit with:
params = model.make_params(p1=2.5, p2=0.2, p3=-0.0, p4=0.0)
# add an extra parameter for `xmin`
params.add('xmin', min(xdata), vary=False)
# constrain p2 so that the derivative is 0 at xmin
params['p2'].expr = '-2*p3*xmin - 3*p4*xmin**2'
result = model.fit(ydata, params, x=xdata)
print(result.fit_report())
plt.plot(xdata, ydata, 'bo')
plt.plot(xdata, result.best_fit, 'r-')
plt.show()
which now prints
[[Model]]
    Model(cubic_poly)
[[Fit Statistics]]
    # function evals = 10
    # data points = 100
    # variables = 3
    chi-square = 1.329
    reduced chi-square = 0.014
    Akaike info crit = -426.056
    Bayesian info crit = -418.241
[[Variables]]
    p1: 2.39001759 +/- 0.023239 (0.97%) (init= 2.5)
    p2: 0 +/- 0 (nan%) == '-2*p3*xmin - 3*p4*xmin**2'
    p3: -0.10858258 +/- 0.002372 (2.19%) (init=-0)
    p4: 0.01424411 +/- 0.000251 (1.76%) (init= 0)
    xmin: 0 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
    C(p3, p4) = -0.986
    C(p1, p3) = -0.742
    C(p1, p4) = 0.658
and a plot of the constrained fit.
If xmin had not been zero (say, xdata = np.linspace(-10, 10, 101)), the value and uncertainty of p2 would not be zero.
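A minimal sketch of that non-zero-xmin case, reusing cubic_poly and model from above (the exact numbers depend on the random noise):
xdata = np.linspace(-10, 10, 101)
ydata = cubic_poly(xdata, 2, 0.4, -0.2, 0.02)
ydata = ydata + np.random.normal(size=len(xdata), scale=0.05)

params = model.make_params(p1=2.5, p2=0.2, p3=-0.0, p4=0.0)
params.add('xmin', value=min(xdata), vary=False)
params['p2'].expr = '-2*p3*xmin - 3*p4*xmin**2'

result = model.fit(ydata, params, x=xdata)
print(result.params['p2'].value, result.params['p2'].stderr)  # now non-zero, with a propagated uncertainty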

As mentioned in my comment, you just have to fit the right function. I forgot the constant, though, so the function would be a*(x-xmin)**2*(x-xn)+c.
As curve_fit does not take additional parameters the way e.g. leastsq does, the only trick is to pass xmin. I do that with a global variable (maybe not the nicest way, but it works; comments on how to do it better are welcome, and a closure-based alternative is sketched after the output below).
In the end, you just need to add the following lines to your code:
def cubic_zero(x, a, xn, const):
    global xmin
    return a*(x-xmin)**2*(x-xn) + const
and
xmin=xdata[0]
popt2, pcov2 = scipy.optimize.curve_fit(cubic_zero, xdata, ydata)
y_modelled2 = np.array([cubic_zero(x, *popt2) for x in xdata])
print(popt2)
plt.plot(xdata, y_modelled2, color='#ee9900',linestyle="--")
providing
>>>[ 0.01429367 7.63190327 2.92604132]
and the corresponding plot.
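As a side note on avoiding the global variable, a closure can bake xmin into the model function. A minimal sketch, reusing scipy.optimize, xdata, and ydata from the question:
def make_cubic_zero(xmin):
    # returns a model function with xmin fixed, so no global is needed
    def cubic_zero(x, a, xn, const):
        return a*(x - xmin)**2*(x - xn) + const
    return cubic_zero

popt2, pcov2 = scipy.optimize.curve_fit(make_cubic_zero(xdata[0]), xdata, ydata)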

This solution uses scipy.optimize.leastsq. With a self-made residuals function, there is actually no need to pass xmin as an additional parameter to the fit. The fit function is the same as in the other post and therefore needs no constraints. It looks like this:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
def cubic_fit_with_noise(x, p1, p2, p3, p4):
    return p1 + p2*x + p3*x**2 + p4*x**3 + .2*(1 - 2*np.random.rand())
def cubic_zero(x, a, xn, const, xmin):
    return a*(x - xmin)**2*(x - xn) + const
def residuals(params, dataX, dataY):
    a, xn, const = params
    xmin = dataX[0]
    dist = np.fromiter((y - cubic_zero(x, a, xn, const, xmin) for x, y in zip(dataX, dataY)), float)
    return dist
xdata = np.linspace(.5,10.5,100)
ydata = np.fromiter((cubic_fit_with_noise(x, 2, 0.4, -.2, 0.02) for x in xdata), float)
# scipy curve fitting with leastsq:
initialGuess=[.3,.3,.3]
popt2, pcov2, info2, msg2, ier2 = leastsq(residuals,initialGuess, args=(xdata, ydata), full_output=True)
fullparams=np.append(popt2,xdata[0])
y_modelled2 = np.array([cubic_zero(x, *fullparams) for x in xdata])
print(popt2)
print(pcov2)
print(np.array([-popt2[0]*xdata[0]**2*popt2[1]+popt2[2], popt2[0]*(xdata[0]**2+2*xdata[0]*popt2[1]), -popt2[0]*(2*xdata[0]+popt2[1]), popt2[0]]))
plt.plot(xdata, ydata, 'bo')
plt.plot(xdata, y_modelled2, 'r-')
plt.show()
and provides:
>>>[ 0.01710749 7.69369653 2.38986378]
>>>[[ 4.33308441e-06 5.61402017e-04 2.71819763e-04]
[ 5.61402017e-04 1.10367937e-01 5.67852980e-02]
[ 2.71819763e-04 5.67852980e-02 3.94127702e-02]]
>>>[ 2.35695882 0.13589672 -0.14872733 0.01710749]
Image upload does not work at the moment, for whatever reason, but the result is the same as in the other post.

Related

Fitting tanh curves with Python

I need to fit a tanh curve like this one:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def f(x, a1=0.00010, a2=0.00013, a3=0.00013, teta1=1, teta2=0.00555, teta3=0.00555, phi1=-50, phi2=600, phi3=-900,
      a=0.000000019, b=0):
    formule = a1 * np.tanh(teta1 * (x + phi1)) + a2 * np.tanh(teta2 * (x + phi2)) + a3 * np.tanh(
        teta3 * (x + phi3)) + a * x + b
    return formule
# generate points used to plot
x_plot = np.linspace(-10000, 10000, 1000)
gmodel = Model(f)
result = gmodel.fit(f(x_plot), x=x_plot, a1=1,a2=1,a3=1,teta1=1,teta2=1,teta3=1,phi1=0,phi2=0,phi3=0)
plt.plot(x_plot, f(x_plot), 'bo')
plt.plot(x_plot, result.best_fit, 'r-')
plt.show()
I tried to do something like that, but I got this result:
Is there another way to fit this curve? I don't know what I'm doing wrong.
Basically your fit is fine (although not very nice from a coding point of view). As always, non-linear fits rely strongly on the initial parameters; yours are just chosen badly. You could either think about how to determine them manually or use a pre-made tool like differential_evolution from scipy.optimize. I do not use that package myself, but you can find an example here on SE.
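For reference, here is a minimal sketch of that idea: use differential_evolution to search within bounds for starting values, then hand them to curve_fit. The reduced single-tanh-plus-line model, the synthetic data, and the bounds below are illustrative assumptions, not values from the question:
import numpy as np
from scipy.optimize import differential_evolution, curve_fit

# reduced model for illustration: a single tanh plus a line
def reduced_model(x, a1, teta1, phi1, a, b):
    return a1 * np.tanh(teta1 * (x + phi1)) + a * x + b

def sum_sq_error(p, x, y):
    # objective for the global search: sum of squared residuals
    return np.sum((y - reduced_model(x, *p)) ** 2)

# synthetic data standing in for the measurements
rng = np.random.default_rng(0)
x_data = np.linspace(-10000, 10000, 500)
y_data = reduced_model(x_data, 2e-4, 0.005, 600, 2e-8, 0) + rng.normal(0, 1e-5, x_data.size)

# generous, illustrative bounds for (a1, teta1, phi1, a, b)
bounds = [(0, 1e-3), (0, 0.1), (-5000, 5000), (-1e-7, 1e-7), (-1e-3, 1e-3)]
start = differential_evolution(sum_sq_error, bounds, args=(x_data, y_data), seed=3)

# hand the global-search result to curve_fit as the initial guess
popt, pcov = curve_fit(reduced_model, x_data, y_data, p0=start.x)
print(popt)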
I agree with the answers from mikuszefski and F. Win but would like to add another point.
Your model includes a line + 3 tanh functions. It's not entirely clear that the data support that many different tanh functions. If so (and echoing mikuszefski), you will need to tell the fit that these are not identical. Your example starts them off as identical, which will make it very difficult for the fit to find a good solution. Either way, it would probably be helpful to be able to easily test whether there really are 1, 2, 3, or more tanh functions.
You may also want to give not only initial values for your parameters, but also realistic boundaries on them so that the tanh functions are clearly separated and don't wander too far off from where they should be.
To clean up your code and to better allow you to change the number of tanh functions used and place boundary constraints, I would suggest building individual models and adding them together, as with:
from lmfit import Model
def f_tanh(x, eta=1, phi=0):
    "tanh function"
    return np.tanh(eta * (x + phi))
def f_line(x, slope=0, intercept=0):
    "line function"
    return slope*x + intercept
# create model as line + 2 tanh functions
gmodel = Model(f_line) + Model(f_tanh, prefix='t1_') + Model(f_tanh, prefix='t2_')
Now you can easily create parameters, with
params = gmodel.make_params(slope=0.003, intercept=0.001,
                            t1_eta=0.021, t1_phi=-2000,
                            t2_eta=0.013, t2_phi=600)
With the fit parameters defined, you can place bounds with:
params['t1_eta'].min = 0
params['t2_eta'].min = 0
params['t1_phi'].min = -3000
params['t1_phi'].max = -1000
params['t2_phi'].min = 0
params['t2_phi'].max = 1000
I think all of these will help you better explore the data and the fits to it. Putting this all together, you might have:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def f_tanh(x, eta=1, phi=0):
    "tanh function"
    return np.tanh(eta * (x + phi))
def f_line(x, slope=0, intercept=0):
    "line function"
    return slope*x + intercept
# line + 2 tanh functions
gmodel = Model(f_line) + Model(f_tanh, prefix='t1_') + Model(f_tanh, prefix='t2_')
# generate "data"
x = np.linspace(-10000, 10000, 1000)
y = gmodel.eval(x=x, slope=0.0001,
                t1_eta=0.010, t1_phi=-2100,
                t2_eta=0.004, t2_phi=740)
y = y + np.random.normal(size=len(x), scale=0.02)
# make parameters with initial values
params = gmodel.make_params(slope=0.003, intercept=0.001,
                            t1_eta=0.021, t1_phi=-2000,
                            t2_eta=0.013, t2_phi=600)
# place realistic but generous constraints to keep tanhs separate
params['t1_eta'].min = 0
params['t2_eta'].min = 0
params['t1_phi'].min = -3000
params['t1_phi'].max = -1000
params['t2_phi'].min = 0
params['t2_phi'].max = 1000
result = gmodel.fit(y, params, x=x)
print(result.fit_report())
plt.plot(x, y, 'bo')
plt.plot(x, result.best_fit, 'r-')
plt.show()
This will give a good fit and plot and find the expected values, within the noise level. Hope that helps get you pointed in the right direction.
Your function is a bit confusing, and you do not really have function values: you are basically fitting your function to itself. Ideally you would replace f(x_plot) in curve_fit() with real experimental data.
A good way to fit a function is to use scipy.optimize.curve_fit:
from scipy.optimize import curve_fit
popt, pcov = curve_fit(f, x_plot, f(x_plot), p0=[0.00010, 0.00013, 0.00013, 1, 0.00555, .00555, -50, 600, -900,
                                                 0.000000019, 0])
plt.plot(x_plot, f(x_plot, *popt))
The resulting fit looks like this
with real data:
test_X = np.array(
[-9.77073e+03, -9.29706e+03, -8.82339e+03, -8.34979e+03, -7.87614e+03, -7.40242e+03, -6.92874e+03, -6.45506e+03,
-5.98143e+03, -5.50771e+03, -5.03404e+03, -4.56012e+03, -4.08674e+03, -3.61304e+03, -3.13937e+03, -2.66578e+03,
-2.19210e+03, -1.71845e+03, -1.24478e+03, -9.78925e+02, -9.29077e+02, -8.79059e+02, -8.29082e+02, -7.79092e+02,
-7.29080e+02, -6.79084e+02, -6.29061e+02, -5.79078e+02, -5.29103e+02, -4.79089e+02, -4.29094e+02, -3.79071e+02,
-3.29074e+02, -2.79062e+02, -2.29079e+02, -1.92907e+02, -1.72931e+02, -1.52930e+02, -1.32937e+02, -1.12946e+02,
-9.29511e+01, -7.29438e+01, -5.29292e+01, -3.29304e+01, -1.29330e+01, 7.04455e+00, 2.70676e+01, 4.70634e+01,
6.70526e+01, 8.70340e+01, 1.07056e+02, 1.27037e+02, 1.47045e+02, 1.67033e+02, 1.87039e+02, 2.20765e+02,
2.70680e+02, 3.20699e+02, 3.70693e+02, 4.20692e+02, 4.70696e+02, 5.20704e+02, 5.70685e+02, 6.20710e+02,
6.70682e+02, 7.20705e+02, 7.70707e+02, 8.20704e+02, 8.70713e+02, 9.20691e+02, 9.70700e+02, 1.23926e+03,
1.73932e+03, 2.23932e+03, 2.73926e+03, 3.23924e+03, 3.73926e+03, 4.23952e+03, 4.73926e+03, 5.23930e+03,
5.71508e+03, 6.21417e+03, 6.71413e+03, 7.21412e+03, 7.71410e+03, 8.21405e+03, 8.71402e+03, 9.21423e+03])
test_Y = np.array(
[-3.17679e-04, -3.27541e-04, -3.51184e-04, -3.60672e-04, -3.75965e-04, -3.86888e-04, -4.03222e-04, -4.23262e-04,
-4.38526e-04, -4.51187e-04, -4.61081e-04, -4.67121e-04, -4.96690e-04, -4.94811e-04, -5.10110e-04, -5.18985e-04,
-5.11754e-04, -4.90964e-04, -4.36904e-04, -3.93638e-04, -3.83336e-04, -3.71110e-04, -3.57207e-04, -3.39643e-04,
-3.24155e-04, -2.97296e-04, -2.74653e-04, -2.43700e-04, -1.95574e-04, -1.60716e-04, -1.43363e-04, -1.33610e-04,
-1.30734e-04, -1.26332e-04, -1.26063e-04, -1.24228e-04, -1.23424e-04, -1.20276e-04, -1.16886e-04, -1.21865e-04,
-1.16605e-04, -1.14148e-04, -1.14728e-04, -1.14660e-04, -1.16927e-04, -1.10380e-04, -1.09836e-04, 4.24232e-05,
8.66095e-05, 8.43905e-05, 9.09867e-05, 8.95580e-05, 9.02585e-05, 8.87033e-05, 8.86536e-05, 8.92236e-05,
9.24438e-05, 9.27929e-05, 9.24961e-05, 9.72166e-05, 1.00432e-04, 1.05457e-04, 1.11278e-04, 1.14716e-04,
1.25818e-04, 1.40721e-04, 1.62968e-04, 1.91776e-04, 2.28125e-04, 2.57918e-04, 2.88941e-04, 3.85003e-04,
4.91916e-04, 5.32483e-04, 5.50929e-04, 5.45350e-04, 5.38903e-04, 5.27765e-04, 5.15592e-04, 4.95717e-04,
4.81722e-04, 4.69538e-04, 4.58643e-04, 4.41407e-04, 4.29820e-04, 4.07784e-04, 3.92236e-04, 3.81761e-04])
I tried this:
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
def function(x, a1, a2, a3, teta1, teta2, teta3, phi1, phi2, phi3, a, b):
    import numpy as np
    formule = a1 * np.tanh(teta1 * (x + phi1)) + a2 * np.tanh(teta2 * (x + phi2)) + a3 * np.tanh(teta3 * (x + phi3)) + a * x + b
    return formule
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = function(test_X, *parameterTuple)
    return numpy.sum((test_Y - val) ** 2.0)
def generate_Initial_Parameters():
    parameterBounds = []
    parameterBounds.append([1.4e-04, 1.4e-04])
    parameterBounds.append([2.00e-04, 2.0e-04])
    parameterBounds.append([2.5e-04, 2.5e-04])
    parameterBounds.append([0, 2.0e+01])
    parameterBounds.append([0, 4.0e-03])
    parameterBounds.append([0, 4.0e-03])
    parameterBounds.append([-8.e+01, 0])
    parameterBounds.append([0, 9.0e+02])
    parameterBounds.append([-2.1e+03, 0])
    parameterBounds.append([-3.4e-08, -2.4e-08])
    parameterBounds.append([-2.2e-05*2, 4.2e-05])
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds)
    return result.x
# generate initial parameter values
geneticParameters = generate_Initial_Parameters()
# curve fit the test data
fittedParameters, pcov = curve_fit(function, test_X, test_Y, geneticParameters)
print('Parameters', fittedParameters)
modelPredictions = function(test_X, *fittedParameters)
absError = modelPredictions - test_Y
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(test_Y))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
ytry = ftry(test_X)  # note: ftry is not defined in this snippet
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(test_X, test_Y, 'D')
    # create data for the fitted equation plot
    yModel = function(test_X, *fittedParameters)
    # now the model as a line plot
    axes.plot(test_X, yModel)
    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label
    axes.plot(test_X, ytry)
    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
R-squared: 0.9978, not perfect but not so bad

Python - curve fitting of more complex function

I wish to find the equation of the curve of best fit of the following graph:
which has an equation combining an exponential decay with a sinusoid (a decaying cosine).
I've attempted to find examples of curve fitting with numpy here and here, but they all show how to fit only an exponential or only a sinusoid, whereas I'd like to fit a curve combining the two.
How would I do this?
Here's one approach you might find useful. This uses lmfit (http://lmfit.github.io/lmfit-py/), which provides a high-level approach to curve fitting:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def decay_cosine(t, amp, beta, omega, phi):
    """model data as decaying cosine wave"""
    return amp * np.exp(-beta*t) * np.cos(omega*t + phi)
# create fake data to be fitted
t = np.linspace(0, 5, 101)
y = decay_cosine(t, 1.4, 0.9, 7.2, 0.23) + np.random.normal(size=len(t), scale=0.05)
# build model from decay_cosine
mod = Model(decay_cosine)
# create parameters, giving initial values
params = mod.make_params(amp=2.0, beta=0.5, omega=5, phi=0)
# you can place bounds on parameters:
params['phi'].max = np.pi/2
params['phi'].min = -np.pi/2
params['amp'].min = 0
# fit data to model
result = mod.fit(y, params, t=t)
# print out fit results
print(result.fit_report())
# plot data with best fit
plt.plot(t, y, 'bo', label='data')
plt.plot(t, result.best_fit, 'r')
plt.show()
This will print out a report like this:
[[Model]]
    Model(decay_cosine)
[[Fit Statistics]]
    # fitting method = leastsq
    # function evals = 46
    # data points = 101
    # variables = 4
    chi-square = 0.25540159
    reduced chi-square = 0.00263301
    Akaike info crit = -595.983903
    Bayesian info crit = -585.523421
[[Variables]]
    amp: 1.38812335 +/- 0.03034640 (2.19%) (init = 2)
    beta: 0.90760648 +/- 0.02820705 (3.11%) (init = 0.5)
    omega: 7.16579292 +/- 0.02891827 (0.40%) (init = 5)
    phi: 0.26249321 +/- 0.02225816 (8.48%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(omega, phi) = -0.713
    C(amp, beta) = 0.695
    C(amp, phi) = 0.253
    C(amp, omega) = -0.183
    C(beta, phi) = 0.178
    C(beta, omega) = -0.128
and produce a plot like this:
Here is a quite simple example using curve_fit and leastsq from scipy.optimize.
1. Setting parameter values, model and experimental data.
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
np.random.seed(0) # choosing seed for reproducibility
# ==== theoretical parameter values ====
x0 = 1
beta = .5
omega = 2*np.pi
phi = 0
params = x0, beta, omega, phi
# ==== model ====
def decay_cosine(t, x0, beta, omega, phi):
    x = x0 * np.exp(-beta*t) * np.cos(omega*t + phi)
    return x
# ==== generating experimental data ====
t_data = np.linspace(0, 5, num=80)
noise = .05 * np.random.randn(t_data.size)
x_data = decay_cosine(t_data, *params) + noise
2. Fitting.
# ==== fitting using curve_fit ====
params_cf, _ = scipy.optimize.curve_fit(decay_cosine, t_data, x_data)
# ==== fitting using leastsq ====
def residuals(args, t, x):
    return x - decay_cosine(t, *args)
x0 = np.ones(len(params)) # initializing all params at one
params_lsq, _ = scipy.optimize.leastsq(residuals, x0, args=(t_data, x_data))
print(params_cf)
print(params_lsq)
array([ 1.04938794, 0.53877389, 6.30375113, -0.01850761])
array([ 1.04938796, 0.53877389, 6.30375103, -0.01850744])
3. Plotting.
plt.plot(t_data, x_data, '.', label='exp data')
plt.plot(t_data, decay_cosine(t_data, *params_cf), label='curve_fit')
plt.plot(t_data, decay_cosine(t_data, *params_lsq), '--', label='leastsq')
plt.legend()
plt.grid(True)
plt.show()

scipy curve_fit raises "OptimizeWarning: Covariance of the parameters could not be estimated"

I am trying to fit this function (a ramp from -1 to 1 between start and end) to some data:
But when I use my code
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def f(x, start, end):
    res = np.empty_like(x)
    res[x < start] = -1
    res[x > end] = 1
    linear = np.all([[start <= x], [x <= end]], axis=0)[0]
    res[linear] = np.linspace(-1., 1., num=np.sum(linear))
    return res
if __name__ == '__main__':
    xdata = np.linspace(0., 1000., 1000)
    ydata = -np.ones(1000)
    ydata[500:1000] = 1.
    ydata = ydata + np.random.normal(0., 0.25, len(ydata))
    popt, pcov = curve_fit(f, xdata, ydata, p0=[495., 505.])
    print(popt, pcov)
    plt.figure()
    plt.plot(xdata, f(xdata, *popt), 'r-', label='fit')
    plt.plot(xdata, ydata, 'b-', label='data')
    plt.show()
I get the error
OptimizeWarning: Covariance of the parameters could not be estimated
Output:
In this example start and end should be closer to 500, but they don't change at all from my initial guess.
The warning (not error) of
OptimizeWarning: Covariance of the parameters could not be estimated
means that the fit could not determine the uncertainties (variance) of the fitting parameters.
The main problem is that your model function f treats the parameters start and end as discrete values -- they are used as integer locations for the change in functional form. scipy's curve_fit (and all other optimization routines in scipy.optimize) assume that parameters are continuous variables, not discrete.
The fitting procedure will try to take small steps (typically around machine precision) in the parameters to get a numerical derivative of the residual with respect to the variables (the Jacobian). With values used as discrete variables, these derivatives will be zero and the fitting procedure will not know how to change the values to improve the fit.
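As an illustration of that point, here is a minimal sketch of a reparameterized version of the question's model in which start and end enter continuously (via np.interp), so the numerical Jacobian is no longer zero. This is only an aside, separate from the lmfit approach recommended next:
import numpy as np
from scipy.optimize import curve_fit

def f_smooth(x, start, end):
    # -1 below start, +1 above end, linear ramp in between;
    # the output now changes continuously as start and end move (assumes start < end)
    return np.interp(x, [start, end], [-1.0, 1.0])

xdata = np.linspace(0., 1000., 1000)
ydata = -np.ones(1000)
ydata[500:1000] = 1.
ydata = ydata + np.random.normal(0., 0.25, len(ydata))

popt, pcov = curve_fit(f_smooth, xdata, ydata, p0=[495., 505.])
print(popt)  # start and end are now actually adjusted by the fit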
It looks like you're trying to fit a step function to some data. Allow me to recommend trying lmfit (https://lmfit.github.io/lmfit-py) which provides a higher-level interface to curve fitting, and has many built-in models. For example, it includes a StepModel that should be able to model your data.
For a slight modification of your data (so that it has a finite step), the following script with lmfit can fit such data:
#!/usr/bin/python
import numpy as np
from lmfit.models import StepModel, LinearModel
import matplotlib.pyplot as plt
np.random.seed(0)
xdata = np.linspace(0., 1000., 1000)
ydata = -np.ones(1000)
ydata[500:1000] = 1.
# note that a linear step is added here:
ydata[490:510] = -1 + np.arange(20)/10.0
ydata = ydata + np.random.normal(size=len(xdata), scale=0.1)
# model data as Step + Line
step_mod = StepModel(form='linear', prefix='step_')
line_mod = LinearModel(prefix='line_')
model = step_mod + line_mod
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
                         line_slope=0,
                         step_center=xdata.mean(),
                         step_amplitude=ydata.std(),
                         step_sigma=2.0)
# fit data to this model with these parameters
out = model.fit(ydata, pars, x=xdata)
# print results
print(out.fit_report())
# plot data and best-fit
plt.plot(xdata, ydata, 'b')
plt.plot(xdata, out.best_fit, 'r-')
plt.show()
which prints out a report of
[[Model]]
    (Model(step, prefix='step_', form='linear') + Model(linear, prefix='line_'))
[[Fit Statistics]]
    # fitting method = leastsq
    # function evals = 49
    # data points = 1000
    # variables = 5
    chi-square = 9.72660131
    reduced chi-square = 0.00977548
    Akaike info crit = -4622.89074
    Bayesian info crit = -4598.35197
[[Variables]]
    step_sigma: 20.6227793 +/- 0.77214167 (3.74%) (init = 2)
    step_center: 490.167878 +/- 0.44804412 (0.09%) (init = 500)
    step_amplitude: 1.98946656 +/- 0.01304854 (0.66%) (init = 0.996283)
    line_intercept: -1.00628058 +/- 0.00706005 (0.70%) (init = -1.277259)
    line_slope: 1.3947e-05 +/- 2.2340e-05 (160.18%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(step_amplitude, line_slope) = -0.875
    C(step_sigma, step_center) = -0.863
    C(line_intercept, line_slope) = -0.774
    C(step_amplitude, line_intercept) = 0.461
    C(step_sigma, step_amplitude) = 0.170
    C(step_sigma, line_slope) = -0.147
    C(step_center, step_amplitude) = -0.146
    C(step_center, line_slope) = 0.127
and produces a plot of the data and best fit.
Lmfit has lots of extra features. For example, if you want to set bounds on some of the parameter values or fix some from varying, you can do the following:
# make named parameters, giving initial values:
pars = model.make_params(line_intercept=ydata.min(),
                         line_slope=0,
                         step_center=xdata.mean(),
                         step_amplitude=ydata.std(),
                         step_sigma=2.0)
# now set max and min values for step amplitude:
pars['step_amplitude'].min = 0
pars['step_amplitude'].max = 100
# fix the intercept of the line to be -1.0
pars['line_intercept'].value = -1.0
pars['line_intercept'].vary = False
# then run fit with these parameters
out = model.fit(ydata, pars, x=xdata)
If you know the model should be Step+Constant and that the constant should be fixed, you could also modify the model to be
from lmfit.models import ConstantModel
# model data as Step + Constant
step_mod = StepModel(form='linear', prefix='step_')
const_mod = ConstantModel(prefix='const_')
model = step_mod + const_mod
pars = model.make_params(const_c=-1,
                         step_center=xdata.mean(),
                         step_amplitude=ydata.std(),
                         step_sigma=2.0)
pars['const_c'].vary = False

SciPy Curve Fit Fails Power Law

So, I'm trying to fit a set of data with a power law of the following kind:
def f(x, N, a):  # Power law fit
    if a > 0:
        return N*x**(-a)
    else:
        return 10.**300
par,cov = scipy.optimize.curve_fit(f,data,time,array([10**(-7),1.2]))
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've put in manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative a, which leads to a fit with the wrong sign of slope.
I can't figure out how to get a believable, let alone reliable, fit out of this routine, but I can't find any other good Python curve fitting routines. Do I need to write my own least-squares algorithm or is there something I'm doing wrong here?
UPDATE
In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see the documentation). Please find this solution below, after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.
Original post
As suggested by @Warren Weckesser, you could use lmfit to get this task done, which allows you to assign bounds to your parameters and avoids this 'ugly' if-clause.
Since you do not provide any data, I created some which are shown here:
They follow the law f(x) = 10.5 * x ** (-0.08)
I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
    lN: 2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
    a: -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
So a is pretty close to the original value and np.exp(2.35450302) gives 10.53 which is also very close to the original value.
The plot then looks as follows; as you can see the fit describes the data very well:
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, Parameter, report_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()
# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()
def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data  # that's what you want to minimize
# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100) # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001) # min, max define parameter bounds
# do fit, here with leastsq model
result = minimize(decay, params, args=(lx, ly))
# write error report
report_fit(params)
# plot data
xnew = np.linspace(0., 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.values['lN']) * xnew ** (result.values['a']), 'r')
plt.show()
EDIT
Assuming that you have scipy 0.17 installed, you can also do the following using curve_fit. I show it for your original definition of the power law (red line in the plot below) as well as for the logarithmic data (black line in the plot below). The data is generated in the same way as above. The plot then looks as follows:
As you can see, the data is described very well. If you print popt and popt_log, you obtain array([ 10.47463426, 0.07914812]) and array([ 2.35158653, -0.08045776]), respectively (note: for the latter one you will have to take the exponential of the first argument - np.exp(popt_log[0]) = 10.502, which is close to the original value).
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a):
    return N * x ** (-a)
def f_log(x, lN, a):
    return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()

fitting hyperbolic and harmonic functions with curve_fit

I have a problem working with the curve_fit function.
Here I have code with two functions to work with.
The first is a hyperbolic function.
The second is the same but with one parameter = 1.
My problem is that fitting the first function with curve_fit works fine, but fitting the second doesn't.
I have a commercial program that generates correct solutions for both, so it is possible to find a solution for the second function (a particular case of the first one, as I mentioned above).
Is there someone who could give me an idea of what I am doing wrong?
Thanks!
Here is the code to run:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

def hypRegress(ptp, pir):
    xData = np.arange(len(ptp))
    yData = pir
    xData = np.array(xData, dtype=float)
    yData = np.array(yData, dtype=float)
    def funcHyp(x, qi, exp, di):
        return qi*(1 + exp*di*x)**(-1/exp)
    def errfuncHyp(p):
        return funcHyp(xData, p[0], p[1], p[2]) - yData
    #print(xData.min(), xData.max())
    #print(yData.min(), yData.max())
    trialX = np.linspace(xData[0], xData[-1], 1000)
    # Fit an hyperbolic
    popt, pcov = optimize.curve_fit(funcHyp, xData, yData)
    print('popt')
    #print(popt)
    yHYP = funcHyp(trialX, *popt)
    #optimization
    # initial values
    p1, success = optimize.leastsq(errfuncHyp, popt, maxfev=10000)
    print(p1)
    aaaa = funcHyp(trialX, *p1)
    plt.figure()
    plt.plot(xData, yData, 'r+', label='Data', marker='o')
    plt.plot(trialX, yHYP, 'r-', ls='--', label="Hyp Fit")
    plt.plot(trialX, aaaa, 'y', label='Optimized')
    plt.legend()
    plt.show(block=False)
    return p1
def harRegress(ptp, pir):
    xData = np.arange(len(ptp))
    yData = pir
    xData = np.array(xData, dtype=float)
    yData = np.array(yData, dtype=float)
    def funcHar(x, qi, di):
        return qi*(1 + di*x)**(-1)
    def errfuncHar(p):
        return funcHar(xData, p[0], p[1]) - yData
    #print(xData.min(), xData.max())
    #print(yData.min(), yData.max())
    trialX = np.linspace(xData[0], xData[-1], 1000)
    # Fit an harmonic
    popt, pcov = optimize.curve_fit(funcHar, xData, yData)
    print('popt')
    print(popt)
    yHAR = funcHar(trialX, *popt)
    #optimization
    # initial values
    p1, success = optimize.leastsq(errfuncHar, popt, maxfev=1000)
    print(p1)
    aaaa = funcHar(trialX, *p1)
    plt.figure()
    plt.plot(xData, yData, 'r+', label='Data', marker='o')
    plt.plot(trialX, yHAR, 'r-', ls='--', label="Har Fit")
    plt.plot(trialX, aaaa, 'y', label='Optimized')
    plt.legend()
    plt.show(block=False)
    return p1
ptp = ([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14])
pir = ([150,85,90,50,45,60,60,40,40,30,28,30,38,30,26])
hypRegress(ptp,pir)
harRegress(ptp,pir)
input('pause')
It's a classic problem. The curve_fit algorithm starts from an initial guess for the arguments to be optimized, which, if not supplied, is simply all ones.
That means, when you call
popt, pcov = optimize.curve_fit(funcHar, xData, yData)
the first attempt for the fitting routine will be to assume
funcHar(xData, qi=1, di=1)
If you haven't specified any of the other options, the fit will be poor, as evidenced by the large variances of the parameter estimates (check the diagonal of pcov and compare it to the actual values returned in popt).
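For example, a quick check of those variances could look like the following sketch (assuming popt and pcov come from the funcHar fit above; the parameter names are just labels):
import numpy as np

# 1-sigma uncertainties are the square roots of the diagonal of pcov;
# values much larger than the parameters themselves indicate a poor fit
perr = np.sqrt(np.diag(pcov))
for name, value, err in zip(('qi', 'di'), popt, perr):
    print('{} = {:.4g} +/- {:.4g}'.format(name, value, err))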
In many cases, the situation is solved by supplying an intelligent guess. From your HAR model, I gather that the values around x == 0 are about the same size as qi. So you could supply an initial guess of p0 = (pir[0], 1), which will already lead to a satisfying solution. You could also call it with
popt, pcov = optimize.curve_fit(funcHar, ptp, pir, p0=(0, 1))
which leads to the same result. So the problem is just that the algorithm finds a local minimum.
An alternative would've been to supply a different factor, the "parameter determining the initial step bound":
popt, pcov = optimize.curve_fit(funcHar, ptp, pir, p0=(1, 1), factor=1)
In this case, even with the (default) initial guess of p0=(1,1), it gives the same resulting fit.
Remember: fitting is an art, not a science. Oftentimes, by analyzing the model you want to fit, you can already supply a good initial guess.
I can't speak for the algorithm used in the commercial program. If it is open-source (unlikely), you could have a look to see what they do.
