Why does scipy.optimize.curve_fit not fit to the data? - python

I've been trying to fit an exponential to some data for a while using scipy.optimize.curve_fit, but I'm having real difficulty. I really can't see any reason why this wouldn't work, but it just produces a straight line, no idea why!
Any help would be much appreciated.
from __future__ import division
import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as pyplot
def func(x, a, b, c):
    return a*numpy.exp(-b*x) - c
yData = numpy.load('yData.npy')
xData = numpy.load('xData.npy')
trialX = numpy.linspace(xData[0],xData[-1],1000)
# Fit a polynomial
fitted = numpy.polyfit(xData, yData, 10)[::-1]
y = numpy.zeros(len(trialX))
for i in range(len(fitted)):
    y += fitted[i]*trialX**i
# Fit an exponential
popt, pcov = curve_fit(func, xData, yData)
yEXP = func(trialX, *popt)
pyplot.figure()
pyplot.plot(xData, yData, label='Data', marker='o')
pyplot.plot(trialX, yEXP, 'r-',ls='--', label="Exp Fit")
pyplot.plot(trialX, y, label = '10 Deg Poly')
pyplot.legend()
pyplot.show()
xData = [1e-06, 2e-06, 3e-06, 4e-06,
5e-06, 6e-06, 7e-06, 8e-06,
9e-06, 1e-05, 2e-05, 3e-05,
4e-05, 5e-05, 6e-05, 7e-05,
8e-05, 9e-05, 0.0001, 0.0002,
0.0003, 0.0004, 0.0005, 0.0006,
0.0007, 0.0008, 0.0009, 0.001,
0.002, 0.003, 0.004, 0.005,
0.006, 0.007, 0.008, 0.009, 0.01]
yData = [6.37420666067e-09, 1.13082012115e-08,
1.52835756975e-08, 2.19214493931e-08, 2.71258852882e-08, 3.38556130078e-08, 3.55765277358e-08,
4.13818145846e-08, 4.72543475372e-08, 4.85834751151e-08, 9.53876562077e-08, 1.45110636413e-07,
1.83066627931e-07, 2.10138415308e-07, 2.43503982686e-07, 2.72107045549e-07, 3.02911771395e-07,
3.26499455951e-07, 3.48319349445e-07, 5.13187669283e-07, 5.98480176303e-07, 6.57028222701e-07,
6.98347073045e-07, 7.28699930335e-07, 7.50686502279e-07, 7.7015576866e-07, 7.87147246927e-07,
7.99607141001e-07, 8.61398763228e-07, 8.84272900407e-07, 8.96463883243e-07, 9.04105135329e-07,
9.08443443149e-07, 9.12391264185e-07, 9.150842683e-07, 9.16878548643e-07, 9.18389990067e-07]

Numerical algorithms tend to work better when not fed extremely small (or large) numbers.
In this case, the graph shows your data has extremely small x and y values. If you scale them, the fit is remarkably better:
xData = np.load('xData.npy')*10**5
yData = np.load('yData.npy')*10**5
from __future__ import division
import os
os.chdir(os.path.expanduser('~/tmp'))
import numpy as np
import scipy.optimize as optimize
import matplotlib.pyplot as plt
def func(x, a, b, c):
    return a*np.exp(-b*x) - c
xData = np.load('xData.npy')*10**5
yData = np.load('yData.npy')*10**5
print(xData.min(), xData.max())
print(yData.min(), yData.max())
trialX = np.linspace(xData[0], xData[-1], 1000)
# Fit a polynomial
fitted = np.polyfit(xData, yData, 10)[::-1]
y = np.zeros(len(trialX))
for i in range(len(fitted)):
    y += fitted[i]*trialX**i
# Fit an exponential
popt, pcov = optimize.curve_fit(func, xData, yData)
print(popt)
yEXP = func(trialX, *popt)
plt.figure()
plt.plot(xData, yData, label='Data', marker='o')
plt.plot(trialX, yEXP, 'r-',ls='--', label="Exp Fit")
plt.plot(trialX, y, label = '10 Deg Poly')
plt.legend()
plt.show()
Note that after rescaling xData and yData, the parameters returned by curve_fit must also be rescaled. In this case, a and c must be divided by 10**5, while b must be multiplied by 10**5, to obtain fitted parameters for the original data.
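For example, the back-conversion could look like this (a minimal sketch, assuming the same factor of 10**5 was applied to both arrays as above):
scale = 10**5
a_s, b_s, c_s = popt              # parameters fitted on the scaled data
a_orig = a_s / scale              # amplitude: undo the y-scaling
b_orig = b_s * scale              # rate constant: undo the x-scaling
c_orig = c_s / scale              # offset: undo the y-scaling
x_orig = xData / scale            # recover the original, unscaled data
y_orig = yData / scale
print(np.abs(y_orig - (a_orig*np.exp(-b_orig*x_orig) - c_orig)).max())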
One objection you might have to the above is that the scaling has to be chosen rather "carefully". (Read: Not every reasonable choice of scale works!)
You can improve the robustness of curve_fit by providing a reasonable initial guess for the parameters. Usually you have some a priori knowledge about the data which can motivate ballpark / back-of-the-envelope guesses for reasonable parameter values.
For example, calling curve_fit with
guess = (-1, 0.1, 0)
popt, pcov = optimize.curve_fit(func, xData, yData, guess)
helps improve the range of scales on which curve_fit succeeds in this case.

A (slight) improvement to this solution, not accounting for a priori knowledge of the data, might be the following: take the inverse mean of the data set and use that as the "scale factor" to be passed to the underlying leastsq() called by curve_fit(). This allows the fitter to work and returns the parameters on the original scale of the data.
The relevant line is:
popt, pcov = curve_fit(func, xData, yData)
which becomes:
popt, pcov = curve_fit(func, xData, yData,
                       diag=(1./xData.mean(), 1./yData.mean()))
Here is the full example which produces this image:
from __future__ import division
import numpy
from scipy.optimize import curve_fit
import matplotlib.pyplot as pyplot
def func(x, a, b, c):
    return a*numpy.exp(-b*x) - c
xData = numpy.array([1e-06, 2e-06, 3e-06, 4e-06, 5e-06, 6e-06,
7e-06, 8e-06, 9e-06, 1e-05, 2e-05, 3e-05, 4e-05, 5e-05, 6e-05,
7e-05, 8e-05, 9e-05, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005,
0.0006, 0.0007, 0.0008, 0.0009, 0.001, 0.002, 0.003, 0.004, 0.005,
0.006, 0.007, 0.008, 0.009, 0.01])
yData = numpy.array([6.37420666067e-09, 1.13082012115e-08,
1.52835756975e-08, 2.19214493931e-08, 2.71258852882e-08,
3.38556130078e-08, 3.55765277358e-08, 4.13818145846e-08,
4.72543475372e-08, 4.85834751151e-08, 9.53876562077e-08,
1.45110636413e-07, 1.83066627931e-07, 2.10138415308e-07,
2.43503982686e-07, 2.72107045549e-07, 3.02911771395e-07,
3.26499455951e-07, 3.48319349445e-07, 5.13187669283e-07,
5.98480176303e-07, 6.57028222701e-07, 6.98347073045e-07,
7.28699930335e-07, 7.50686502279e-07, 7.7015576866e-07,
7.87147246927e-07, 7.99607141001e-07, 8.61398763228e-07,
8.84272900407e-07, 8.96463883243e-07, 9.04105135329e-07,
9.08443443149e-07, 9.12391264185e-07, 9.150842683e-07,
9.16878548643e-07, 9.18389990067e-07])
trialX = numpy.linspace(xData[0],xData[-1],1000)
# Fit a polynomial
fitted = numpy.polyfit(xData, yData, 10)[::-1]
y = numpy.zeros(len(trialX))
for i in range(len(fitted)):
    y += fitted[i]*trialX**i
# Fit an exponential
popt, pcov = curve_fit(func, xData, yData,
                       diag=(1./xData.mean(), 1./yData.mean()))
yEXP = func(trialX, *popt)
pyplot.figure()
pyplot.plot(xData, yData, label='Data', marker='o')
pyplot.plot(trialX, yEXP, 'r-',ls='--', label="Exp Fit")
pyplot.plot(trialX, y, label = '10 Deg Poly')
pyplot.legend()
pyplot.show()

The model a*exp(-b*x)+c fits the data well, but I suggest a little modification: use
a*x*exp(-b*x)+c
instead (a rough sketch of the mechanics follows). Good luck.
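Purely as an illustration of the mechanics (the func_mod name and the parameter values below are made up, not taken from the question's data), fitting the suggested model could look like this:
import numpy
from scipy.optimize import curve_fit

def func_mod(x, a, b, c):
    # the suggested modification: a*x*exp(-b*x) + c
    return a * x * numpy.exp(-b * x) + c

# synthetic data built from assumed parameter values, just to show the mechanics
rng = numpy.random.default_rng(0)
xs = numpy.linspace(0.1, 10, 50)
ys = func_mod(xs, 2.0, 0.5, 0.1) + rng.normal(0, 0.02, xs.size)

popt_mod, pcov_mod = curve_fit(func_mod, xs, ys, p0=(1.0, 1.0, 0.0))
print(popt_mod)  # should land close to the generating values (2.0, 0.5, 0.1)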

Related

python SciPy curve_fit with np.exp returns with pcov = inf

I'm trying to optimize an exponential fit with scipy.optimize.curve_fit, but the result is not good. My code is:
def func(x, a, b, c):
    return a * np.exp(-b * x) + c

# xdata and ydata are obtained from another dataframe; their type is np.ndarray
xdata =[36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70 ,71,72]
ydata = [4,4,4,6,6,13,22,22,26,28,38,48,55,65,65,92,112,134,171,210,267,307,353,436,669,669,818,1029,1219,1405,1617,1791,2032,2032,2182,2298,2389]
popt, pcov = curve_fit(func, xdata, ydata)
plt.plot(xdata, func(xdata, *popt), 'r-', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.scatter(xdata, ydata, s=1)
plt.show()
Then I got the result like this:
The result showed that:
pcov = [[inf inf inf] [inf inf inf] [inf inf inf]]
popt = [1 1 611.83784]
I don't know how to make my curve fit well. Can you help me? Thank you!
Fitting against exponential functions is exceedingly tough because tiny variations in the exponent can make large differences in the result. The optimizer is optimizing across many orders of magnitude, and errors near the origin are not equally weighted compared to errors higher up the curve.
The simplest way to handle this is to convert your exponential data to a line using a transformation:
y' = np.log(y)
Then instead of needing to use the fancier (and slower) curve_fit, you can simply use numpy's polyfit function and fit a line. If you wish, you can transform the data back into linear space for analysis. Here, I've edited your code to do the fit with np.polyfit, and you can see the fit is sensible.
import numpy as np
import matplotlib.pyplot as plt
# from scipy.optimize import curve_fit
# def func(x, a, b, c):
#     return a * np.exp(-b * x) + c
# xdata and ydata are obtained from another dataframe; their type is np.ndarray
xdata = np.array([36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70 ,71,72])
ydata = np.array([4,4,4,6,6,13,22,22,26,28,38,48,55,65,65,92,112,134,171,210,267,307,353,436,669,669,818,1029,1219,1405,1617,1791,2032,2032,2182,2298,2389])
# popt, pcov = curve_fit(func, xdata, ydata)
# plt.plot(xdata, func(xdata, *popt), 'r-', label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
# Fit a line (deg=1)
P, pcov = np.polyfit(xdata, np.log(ydata), deg=1, cov=True)
print(pcov)
plt.scatter(xdata, ydata, s=1)
plt.plot(xdata, np.exp(P[0]*xdata + P[1]), 'r-', label='polyfit of log(y)')
plt.legend()
plt.show()
The method is not finding the optimal point. One thing to try is changing the initial guess so that b starts negative, because it looks from your data that b must be negative so that the func fits it decently. Also, from the docs of curve_fit, the initial guess is 1 by default if not specified. A good initial guess is:
popt, pcov = curve_fit(func, xdata, ydata, p0=[1, -0.05, 1])
which gives
popt
array([ 1.90782987e+00, -1.01639857e-01, -1.73633728e+02])
pcov
array([[ 1.08960274e+00,  7.93580944e-03, -5.24526701e+01],
       [ 7.93580944e-03,  5.79450721e-05, -3.74693994e-01],
       [-5.24526701e+01, -3.74693994e-01,  3.34388178e+03]])
And the plot

Python Curve_Fit Exponential / Power / Log Curve - Improve Results

I am trying to fit this data which is asymptotically approaching zero (but never reaching it).
I believe the best curve is an inverse logistic function, but I am open to suggestions. The key is the decaying "S-curve" shape, which is expected.
Here is the code I have so far, and the plot image below, which is a pretty ugly fit.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# DATA
x = pd.Series([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469]).values
y = pd.Series([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06]).values
# Inverse Logistic Function
# https://en.wikipedia.org/wiki/Logistic_function
def func(x, L, x0, k, b):
    y = 1/(L / (1 + np.exp(-k*(x-x0))) + b)
    return y
# FIT DATA
p0 = [max(y), np.median(x), 1, min(y)] # this is a mandatory initial guess
popt, pcov = curve_fit(func, x, y,p0, method='dogbox',maxfev=10000)
# PERFORMANCE
modelPredictions = func(x, *popt)
absError = modelPredictions - y
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(y))
print('Parameters:', popt)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
#PLOT
plt.figure()
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
#plt.xscale('log')
plt.show()
Here is the result when this code is run... and what I would like to achieve!
How can I better optimize the curve_fit, so that instead of the code generated RED line, I get something closer to the BLUE drawn line?
Thank you!!
From your plot of data and expected fit, I would guess that you do not really want to model your data y as a logistic-like step function but log(y) as a logistic-like step function.
So, I think you would probably want to use a logistic step function, perhaps adding a linear component, to model the log of this data. I would do this with lmfit, as it comes with these models built in, gives better reporting of results, and allows you to greatly simplify the fitting code (disclaimer: I am a lead author of lmfit):
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from lmfit.models import StepModel, LinearModel
# DATA
x = pd.Series([1, 1, 264, 882, 913, 1095, 1156, 1217, 1234, 1261, 1278,
1460, 1490, 1490, 1521, 1578, 1612, 1612, 1668, 1702, 1704,
1735, 1793, 2024, 2039, 2313, 2313, 2558, 2558, 2617, 2617,
2708, 2739, 2770, 2770, 2831, 2861, 2892, 2892, 2892, 2892,
2892, 2923, 2923, 2951, 2951, 2982, 2982, 3012, 3012, 3012,
3012, 3012, 3012, 3012, 3073, 3073, 3073, 3104, 3104, 3104,
3104, 3135, 3135, 3135, 3135, 3165, 3165, 3165, 3165, 3165,
3196, 3196, 3196, 3226, 3226, 3257, 3316, 3347, 3347, 3347,
3347, 3377, 3377, 3438, 3469, 3469]).values
y = pd.Series([1000, 600, 558.659217877095, 400, 300, 100, 7.75, 6, 8.54,
6.66666666666667, 7.14, 1.1001100110011, 1.12, 0.89, 1, 2,
0.666666666666667, 0.77, 1.12612612612613, 0.7,
0.664010624169987, 0.65, 0.51, 0.445037828215398, 0.27, 0.1,
0.26, 0.1, 0.1, 0.13, 0.16, 0.1, 0.13, 0.1, 0.12, 0.1, 0.13,
0.14, 0.14, 0.17, 0.11, 0.15, 0.09, 0.1, 0.26, 0.16, 0.09,
0.09, 0.05, 0.09, 0.09, 0.1, 0.1, 0.11, 0.11, 0.09, 0.09,
0.11, 0.08, 0.09, 0.09, 0.1, 0.06, 0.07, 0.07, 0.09, 0.05,
0.05, 0.06, 0.07, 0.08, 0.08, 0.07, 0.1, 0.08, 0.08, 0.05,
0.06, 0.04, 0.04, 0.05, 0.05, 0.04, 0.06, 0.05, 0.05, 0.06]).values
model = StepModel(form='logistic') + LinearModel()
params = model.make_params(amplitude=-5, center=1000, sigma=100, intercept=0, slope=0)
result = model.fit(np.log(y), params, x=x)
print(result.fit_report())
plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, np.exp(result.best_fit), 'r-', label="Fitted Curve")
plt.legend()
plt.yscale('log')
plt.show()
That will print out a report with fit statistics and best-fit values of:
[[Model]]
    (Model(step, form='logistic') + Model(linear))
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 73
    # data points      = 87
    # variables        = 5
    chi-square         = 9.38961801
    reduced chi-square = 0.11450754
    Akaike info crit   = -183.688405
    Bayesian info crit = -171.358865
[[Variables]]
    amplitude: -4.89008796 +/- 0.29600969 (6.05%) (init = -5)
    center:     1180.65823 +/- 15.2836422 (1.29%) (init = 1000)
    sigma:      94.0317580 +/- 18.5328976 (19.71%) (init = 100)
    slope:     -0.00147861 +/- 8.1151e-05 (5.49%) (init = 0)
    intercept:  6.95177838 +/- 0.17170849 (2.47%) (init = 0)
[[Correlations]] (unreported correlations are < 0.100)
    C(amplitude, slope)     = -0.798
    C(amplitude, sigma)     = -0.649
    C(amplitude, intercept) = -0.605
    C(center, intercept)    = -0.574
    C(sigma, slope)         =  0.542
    C(sigma, intercept)     =  0.348
    C(center, sigma)        = -0.335
    C(amplitude, center)    =  0.282
You could certainly reproduce all that with scipy.optimize.curve_fit if you desired, but I would leave that as an exercise.
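For completeness, here is a minimal sketch of that exercise (the step_plus_line helper is an assumption, not lmfit API; it reuses the x, y arrays and the starting values from above):
def step_plus_line(x, amplitude, center, sigma, slope, intercept):
    # logistic step plus a line, fitted to log(y), mirroring the lmfit model
    return amplitude / (1 + np.exp(-(x - center) / sigma)) + slope * x + intercept

p0 = [-5, 1000, 100, 0, 0]
popt, pcov = curve_fit(step_plus_line, x, np.log(y), p0=p0, maxfev=10000)
print(popt)

plt.plot(x, y, 'ko', label="Original Noised Data")
plt.plot(x, np.exp(step_plus_line(x, *popt)), 'r-', label="curve_fit Fit")
plt.yscale('log')
plt.legend()
plt.show()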
In your case I'd fit a hyperbolic tangent (see Note 1 below) to the base-10 logarithm of your data.
Let's use
log10(y) = y₀ - a·tanh(λ(x - x₀))
as your function.
Approximately, your x runs from 0 to 3500 and your log10(y) from 3 to -1. Using tanh(2) = -tanh(-2) ≈ 1 we have
y₀ + a = 3, y₀ - a = -1 ⇒ y₀ = 1, a = 2;
λ = (2 - (-2)) / (3500 - 0) = 4/3500; x₀ = (3500 - 0)/2 = 1750.
(This rough estimate is necessary to provide curve_fit with an initial guess, otherwise the procedure gets lost.)
Omitting the boilerplate I have eventually
X = np.linspace(0, 3500, 701)
plt.scatter(x, np.log10(y), label='data')
plt.plot(X, 1-2*np.tanh(4/3500*(X-1750)), label='hand fit')
(y0, a, l, x0), *_ = curve_fit(
    lambda x, y0, a, l, x0: y0 - a*np.tanh(l*(x-x0)),
    x, np.log10(y),
    p0=[1, 2, 4/3500, 3500/2])
plt.plot(X, y0-a*np.tanh(l*(X-x0)), label='curve_fit fit')
plt.legend()
Note 1: the logistic function is the hyperbolic tangent in disguise
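Concretely, the standard logistic function is just a shifted and scaled tanh, which a quick numerical check (a small sketch) makes plain:
import numpy as np
t = np.linspace(-5, 5, 11)
logistic = 1 / (1 + np.exp(-t))
tanh_form = 0.5 * (1 + np.tanh(t / 2))   # 1/(1+e^-t) == (1 + tanh(t/2))/2
print(np.allclose(logistic, tanh_form))  # True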
I see that your plot uses log scaling, and I found that several different sigmoidal equations gave what appear to be good fits to the natural log of the Y data. Here is a graphical Python fitter using the natural log of the Y data with a four-parameter Logistic equation:
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
xData = numpy.array([1,1,264,882,913,1095,1156,1217,1234,1261,1278,1460,1490,1490,1521,1578,1612,1612,1668,1702,1704,1735,1793,2024,2039,2313,2313,2558,2558,2617,2617,2708,2739,2770,2770,2831,2861,2892,2892,2892,2892,2892,2923,2923,2951,2951,2982,2982,3012,3012,3012,3012,3012,3012,3012,3073,3073,3073,3104,3104,3104,3104,3135,3135,3135,3135,3165,3165,3165,3165,3165,3196,3196,3196,3226,3226,3257,3316,3347,3347,3347,3347,3377,3377,3438,3469,3469], dtype=float)
yData = numpy.array([1000,600,558.659217877095,400,300,100,7.75,6,8.54,6.66666666666667,7.14,1.1001100110011,1.12,0.89,1,2,0.666666666666667,0.77,1.12612612612613,0.7,0.664010624169987,0.65,0.51,0.445037828215398,0.27,0.1,0.26,0.1,0.1,0.13,0.16,0.1,0.13,0.1,0.12,0.1,0.13,0.14,0.14,0.17,0.11,0.15,0.09,0.1,0.26,0.16,0.09,0.09,0.05,0.09,0.09,0.1,0.1,0.11,0.11,0.09,0.09,0.11,0.08,0.09,0.09,0.1,0.06,0.07,0.07,0.09,0.05,0.05,0.06,0.07,0.08,0.08,0.07,0.1,0.08,0.08,0.05,0.06,0.04,0.04,0.05,0.05,0.04,0.06,0.05,0.05,0.06], dtype=float)
# fit the natural log of the data
yData = numpy.log(yData)
warnings.filterwarnings("ignore") # do not print "invalid value" warnings during fit
def func(x, a, b, c, d): # Four-Parameter Logistic from zunzun.com
    return d + (a - d) / (1.0 + numpy.power(x / c, b))
# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
print('Parameters:', fittedParameters)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Natural Log of Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
p0 = [max(y), np.median(x), 1, min(y)] # this is a mandatory initial guess
Just to clarify, since this might be your issue: you shouldn't use 1.0 as your initial guess for k. You should use 1.0 / (max(x) - min(x)).
If your X's range over, say, [1200, 8000], then an initial k of 1.0 will really struggle to converge. You want to use 1/6800 as k, so that the exponent k*(x - x0) starts off in a normalized range of roughly [-1, 1].
The main reason is that np.exp(4000) overflows, which will cause Python to struggle to fit the function.
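A minimal sketch of that kind of scale-aware starting guess, reusing x, y and func from the question above (the k0 name is just for illustration):
k0 = 1.0 / (max(x) - min(x))              # k of order 1/(x-range)
p0 = [max(y), np.median(x), k0, min(y)]   # keeps exp(-k*(x-x0)) well-behaved
popt, pcov = curve_fit(func, x, y, p0=p0, method='dogbox', maxfev=10000)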

More accurate curve fitting

I did a curve fit using the curve_fit function in scipy, but the fit was not good. Is there any way to improve it?
The Python code below is what I wrote.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
minus_eta_total =[-0.004, -0.0116, -0.02604, -0.04, -0.08, -0.12, -0.16, -0.2, -0.24, -0.288, -0.3456, -0.4]
I_d_infil_0_sub_1 = [0.004204675, 0.012262849, 0.028331318, 0.045626793, 0.113222124, 0.224759087, 0.403571293, 0.678854397, 1.090002487, 1.837299526, 3.260471157, 5.311900419]
ASR_el = 0.0075
eta_infil_0_sub_1 = np.array(minus_eta_total) + (np.array(I_d_infil_0_sub_1)*np.array(ASR_el))
cons_eq = 8.3144 * 1073 / (0.5 * 4 * 96485)
def func(x, a, b):
    return -cons_eq*np.log(x/a) - b*x
popt_infil_0_sub_1, pcov_infil_0_sub_1 = curve_fit(func, I_d_infil_0_sub_1, eta_infil_0_sub_1)
plt.clf()
plt.plot(I_d_infil_0_sub_1, eta_infil_0_sub_1, linestyle = '--', marker='o', color='k', label = 'original')
plt.plot(I_d_infil_0_sub_1, func(np.asarray(I_d_infil_0_sub_1),*popt_infil_0_sub_1), 'k', label='fit: $\mathit{j_0}$=%5.4f, R$_{ohm}$=%5.4f' % tuple(popt_infil_0_sub_1))
plt.ylim(-0.42, 0.02)
plt.xticks(np.arange(0, 12, 2))
plt.yticks(np.arange(0, -0.42, -0.05))
plt.xlabel('$\mathit{j}$ ($A/cm^2$)', fontsize=14)
plt.ylabel('-\u03b7$_c$ (V)', fontsize=14) # \u03bcm = micro(\u03bc) + meter(m)
plt.legend(loc='upper right')
plt.show(block = False)
The fit is not as good as expected, probably because the chosen equation is not very convenient.
For example, if we choose the equation
y(x) = (exp(-a*x) - 1) / a
with the least mean square relative error as the fitting criterion, the result is
a = 10.17, MSRE = 0.052.
This is a very good result considering so simple an equation with only one parameter.
Of course, a more accurate fit can certainly be achieved by choosing a more complicated equation with more adjustable parameters.
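One possible way to sketch such a fit with curve_fit is to weight each point by |y|, which approximates a least mean square relative error criterion (the func_one_param helper and the starting value are assumptions, not taken from the answer):
def func_one_param(x, a):
    # y(x) = (exp(-a*x) - 1) / a
    return (np.exp(-a * x) - 1.0) / a

x_arr = np.asarray(I_d_infil_0_sub_1)
y_arr = np.asarray(eta_infil_0_sub_1)
# sigma = |y| makes the residuals roughly relative rather than absolute
popt_1p, pcov_1p = curve_fit(func_one_param, x_arr, y_arr,
                             p0=[10.0], sigma=np.abs(y_arr))
print(popt_1p)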

Plotting one sigma error bars on a curve fit line in scipy

I plotted a linear least squares fit curve using scipy.optimize.curve_fit(). My data has some error associated with it, and I added those errors while plotting the fit curve.
Next, I want to plot two dashed lines representing one sigma error bar on the curve fit and shade region between those two lines. This is what I have tried so far:
import sys
import os
import numpy
import matplotlib.pyplot as plt
from pylab import *
import scipy.optimize as optimization
from scipy.optimize import curve_fit
xdata = numpy.array([-5.6, -5.6, -6.1, -5.0, -3.2, -6.4, -5.2, -4.5, -2.22, -3.30, -6.15])
ydata = numpy.array([-18.40, -17.63, -17.67, -16.80, -14.19, -18.21, -17.10, -17.90, -15.30, -18.90, -18.62])
# Initial guess.
x0 = numpy.array([1.0, 1.0])
#data error
sigma = numpy.array([0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.22, 0.45, 0.35])
sigma1 = numpy.array([0.000001, 0.000001, 0.000001, 0.0000001, 0.0000001, 0.13, 0.22, 0.30, 0.00000001, 1.0, 0.05])
#def func(x, a, b, c):
#    return a + b*x + c*x*x

def line(x, a, b):
    return a * x + b
#print optimization.curve_fit(line, xdata, ydata, x0, sigma)
popt, pcov = curve_fit(line, xdata, ydata, sigma =sigma)
print(popt)
print("a =", popt[0], "+/-", pcov[0,0]**0.5)
print("b =", popt[1], "+/-", pcov[1,1]**0.5)
#1 sigma error ######################################################################################
sigma2 = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]) #make change
popt1, pcov1 = curve_fit(line, xdata, ydata, sigma = sigma2) #make change
print(popt1)
print("a1 =", popt1[0], "+/-", pcov1[0,0]**0.5)
print("b1 =", popt1[1], "+/-", pcov1[1,1]**0.5)
#####################################################################################################
plt.errorbar(xdata, ydata, yerr=sigma, xerr= sigma1, fmt="none")
plt.ylim(-11.5, -19.5)
plt.xlim(-2, -7)
xfine = np.linspace(-2.0, -7.0, 100) # define values to plot the function for
plt.plot(xfine, line(xfine, popt[0], popt[1]), 'r-')
plt.plot(xfine, line(xfine, popt1[0], popt1[1]), '--') #make change
plt.show()
However, I think the dashed line I plotted takes one sigma error from my provided xdata and ydata numpy array, not from the curve fit. Do I have to know the coordinates that satisfy my fit curve and then make a second array to make the one sigma error fit curve?
It seems you are plotting two completely different lines.
Instead, you need to plot three lines: the first one is your fit without any corrections, the other two lines should be built with the same parameters a and b, but with added or subtracted sigmas. You obtain the respective sigmas from the covariance matrix you obtain in pcov. So you'll have something like:
y = line(xfine, popt[0], popt[1])
y1 = line(xfine, popt[0] + pcov[0,0]**0.5, popt[1] - pcov[1,1]**0.5)
y2 = line(xfine, popt[0] - pcov[0,0]**0.5, popt[1] + pcov[1,1]**0.5)
plt.plot(xfine, y, 'r-')
plt.plot(xfine, y1, 'g--')
plt.plot(xfine, y2, 'g--')
plt.fill_between(xfine, y1, y2, facecolor="gray", alpha=0.15)
fill_between shades the area between the error bar lines.
This is the result:
You can apply the same technique for your other line if you want.
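For instance, the same pattern applied to the second fit (popt1, pcov1) from your code would look roughly like this:
y3 = line(xfine, popt1[0], popt1[1])
y4 = line(xfine, popt1[0] + pcov1[0,0]**0.5, popt1[1] - pcov1[1,1]**0.5)
y5 = line(xfine, popt1[0] - pcov1[0,0]**0.5, popt1[1] + pcov1[1,1]**0.5)
plt.plot(xfine, y3, 'b--')
plt.fill_between(xfine, y4, y5, facecolor="blue", alpha=0.15)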

Curve fitting an exponential function using SciPy

I have the following "score" function, which is meant to give a score between 0 and 1 for a certain measurement, and looks like:
def func(x, a, b):
    return 1.0/(1.0 + np.exp(-b*(x - a)))
I would like to fit it to the following x and y data:
x = np.array([4000, 2500, 2000, 1000, 500])
y = np.array([ 0.1, 0.3, 0.5, 0.7, 0.9])
But curve_fit does not seem to work:
popt, pcov = curve_fit(func, x, y)
When I try to fit a linear function, curve_fit gives a good fit (green line), but with the exponential function above it just gives a=1 and b=1, which is not a good fit. A good fit should be a=1800 and b=-0.001667, which gives the red line (data in blue).
The reason is likely that the starting condition is not specified. If you give it as some reasonable numbers, then it is more likely that curve_fit will converge. Below is an example with some reasonable starting conditions:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b):
    return 1.0/(1.0 + np.exp(-b*(x - a)))

x = np.array([4000., 2500., 2000., 1000., 500.])
y = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

popt, pcov = curve_fit(func, x, y, p0=[2000., 0.005])

plt.plot(x, y, 'x')
xx = np.linspace(0, 4000, 100)
yy = func(xx, *popt)
plt.plot(xx, yy, lw=5)
plt.show()
