I am having trouble estimating 5 unknown parameters a, b, c, d, e, each of which definitely lies within a known interval. Put simply, it looks like this:
import numpy as np
from scipy.optimize import curve_fit
diap_a = np.arange(0.01, 1, 0.2)
diap_b = np.arange(0.01, 30, 5)
diap_c = np.arange(0.01, 2, 0.5)
diap_d = np.arange(0.01, 2, 0.5)
diap_e = np.arange(0.01, 0.3, 0.03)
X = np.arange(0.01, 1, 0.01)
def func(a, b, c, d, e):
    return a + b + c + d + e  # for example
Y = func(a, b, c, d, e)
I have data (expected values) such that
Y1 = [60, 59, 58, 57, 56, 55, 50, 30, 10]
X1 = [0.048, 0.049, 0.05, 0.05, 0.06, 0.089, 0.1, 0.12, 0.134]
I was trying to implement it this way:
popt, pcov = curve_fit(func, a, b, c, d, e, Y1, X1)
to find the optimal a, b, c, d, e that fit the curve
plt.plot(Y, X)
plt.show()
But it doesn't work.
The result is:
OptimizeWarning: Covariance of the parameters could not be estimated
Sorry for my bad formulation of the problem.
According to the curve_fit() documentation, curve_fit() should take func, X1, and Y1 as its first three parameters. As currently coded, func() does not take x as an argument, so it always returns a single value that has nothing to do with X1 and cannot fit the data. Below is an example graphical fitter using your data with a three-parameter function and scipy's default initial parameter estimates of all 1.0; these are not always optimal. If a given function fits the data badly, the initial parameter estimates may be at fault, and scipy has a genetic algorithm module (differential_evolution) that can help find better estimates if needed.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
xData = numpy.array([0.048, 0.049, 0.05, 0.05, 0.06, 0.089, 0.1, 0.12, 0.134])
yData = numpy.array([60, 59, 58, 57, 56, 55, 50, 30, 10])
def func(x, a, b, c): # simple quadratic example
    return (a * numpy.square(x)) + b * x + c
# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
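A side note on the original question's interval constraints: curve_fit() also accepts a bounds argument, in which case it switches from the default Levenberg–Marquardt method to a trust-region method that honors the bounds. Below is a minimal sketch reusing func, xData, yData, and initialParameters from above; the one finite limit is purely illustrative and not derived from the question's diap_* ranges.
# hedged sketch: constrain only c to be non-negative, leave a and b free
lowerBounds = (-numpy.inf, -numpy.inf, 0.0)
upperBounds = (numpy.inf, numpy.inf, numpy.inf)
boundedParameters, boundedPcov = curve_fit(func, xData, yData, initialParameters,
                                           bounds=(lowerBounds, upperBounds))
print('Bounded parameters:', boundedParameters)
The initial estimates must lie within the bounds, or curve_fit() raises a ValueError.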
I have come across a problem when playing with the parameters of scipy's curve_fit. I initially copied the code suggested by the docs, then changed the equation slightly, and it was fine; but after increasing the np.linspace range, the whole prediction ends up as a straight line. Any ideas?
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
def f(x, a, b, c):
    # This works fine on smaller numbers
    return (a - c) * np.exp(-x / b) + c
xdata = np.linspace(60, 3060, 200)
ydata = f(xdata, 100, 400, 20)
# noise
np.random.seed(1729)
ydata = ydata + np.random.normal(size=xdata.size) * 0.2
# graph
fig, ax = plt.subplots()
plt.plot(xdata, ydata, marker="o")
pred, covar = curve_fit(f, xdata, ydata)
plt.plot(xdata, f(xdata, *pred), label="prediction")
plt.show()
You may need to start with a better guess. The default initial guess (1.0, 1.0, 1.0) sits in a degenerate region: with b = 1.0, exp(-x / b) is vanishingly small (below 1e-26) for every x >= 60, so the model is essentially the constant c and the optimizer has no gradient to improve a or b. Using the initial guess p0 = (50, 200, 100) works:
fig, ax = plt.subplots()
plt.plot(xdata, ydata, marker="o")
pred, covar = curve_fit(f, xdata, ydata, p0 = (50,200,100))
plt.plot(xdata, f(xdata, *pred), label="prediction")
plt.show()
Here is example code using your data and equation, with the initial parameter estimates supplied by scipy's differential_evolution genetic algorithm module. That module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, and it requires bounds within which to search; in this example those bounds are taken from the data's maximum and minimum values. It is much easier to supply ranges for the initial parameter estimates than to pick specific values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
def func(x, a, b, c):
    return (a - c) * numpy.exp(-x / b) + c
xData = numpy.linspace(60, 3060, 200)
yData = func(xData, 100, 400, 20)
# noise
numpy.random.seed(1729)
yData = yData + numpy.random.normal(size=xData.size) * 0.2
# function for the genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # data min and max used for the search bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    parameterBounds = []
    parameterBounds.append([minY, maxY]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    parameterBounds.append([minY, maxY]) # search bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# by default, differential_evolution polishes its best candidate
# with L-BFGS-B, staying within the parameter bounds
geneticParameters = generate_Initial_Parameters()

# now call curve_fit without passing those bounds,
# just in case the best-fit parameters lie outside them
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
This is due to a limitation of the Levenberg–Marquardt algorithm, which curve_fit uses by default. The right way to use it is to provide a decent initial guess for the parameters before optimizing. In my experience this is particularly important when fitting exponential functions like your example. With an iterative algorithm such as LM, the quality of the starting point determines where the result converges; the more parameters you have, the more likely the final result converges to a completely unwanted curve. Overall, the solution is to find a good initial guess somehow, as the other answers did.
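One data-driven way to build such a guess for this exponential: a is roughly the y value at the left edge of the data, c is roughly the plateau at large x, and b is roughly the x-distance over which y - c shrinks by a factor of e. A minimal sketch of that idea, reusing f, xdata, and ydata from the question and assuming xdata is sorted ascending:
# hedged sketch: derive starting values for (a, b, c) from the data itself
a0 = ydata[0]                      # value near the smallest x
c0 = ydata[-1]                     # plateau at the largest x
target = c0 + (a0 - c0) / np.e     # y value after one e-folding of the decay
b0 = xdata[np.argmin(np.abs(ydata - target))] - xdata[0]
pred, covar = curve_fit(f, xdata, ydata, p0=(a0, b0, c0))
print(pred)                        # should land close to (100, 400, 20)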
I'm trying to write an algorithm in Python that predicts the output of a sine wave. For example, if the input is 90 (in degrees), the output is 1.
When I try Linear Regression, the output is pretty bad.
[in]
import pandas as pd
from sklearn.linear_model import LinearRegression
dic = [0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360]
dc = [0, 0.5, 0.866, 1, .866, 0.5, 0, -0.5, -0.866, -1, -0.866, -0.5, 0]
test = [1, 10, 100]
df = pd.DataFrame(dic)
dfy = pd.DataFrame(dc)
test = pd.DataFrame(test)
clf = LinearRegression()
clf.fit(df, dfy)
print(clf.predict(test))
[out]
[[0.7340967 ]
 [0.69718681]
 [0.32808791]]
And logistic regression doesn't fit at all, because it is for classification. What approaches would be better suited to this problem?
Here is a graphical non-linear fitter using your data and a sine function. The numpy sine function works in radians, so the sine function used here rescales the input. I estimated the initial parameters by looking at a scatterplot of the data; given the RMSE of nearly 0.0 and the R-squared of almost 1.0, the data would seem to have no noise component.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
dic = [0.0, 30.0, 60.0, 90.0, 120.0, 150.0, 180.0, 210.0, 240.0, 270.0, 300.0, 330.0, 360.0]
dc = [0.0, 0.5, 0.866, 1.0, 0.866, 0.5, 0.0, -0.5, -0.866, -1.0, -0.866, -0.5, 0.0]
# rename data to match previous example code, converting
# the lists to numpy arrays so arithmetic is elementwise
xData = numpy.array(dic)
yData = numpy.array(dc)
def func(x, amplitude, center, width):
    return amplitude * numpy.sin(numpy.pi * (x - center) / width)
# these are estimated from a scatterplot of the data
initialParameters = numpy.array([-1.0, 180.0, 180.0])
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
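As a sanity check, fitted values near (-1.0, 180.0, 180.0) make this model collapse to a plain sine of the input in degrees, since -sin(pi*(x - 180)/180) = sin(pi*x/180). A short verification sketch, reusing func and fittedParameters from above:
# should print True if the fit converged near (-1, 180, 180)
xCheck = numpy.linspace(0.0, 360.0, 37)
print(numpy.allclose(func(xCheck, *fittedParameters),
                     numpy.sin(numpy.radians(xCheck)), atol=1e-3))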
I am studying nonlinear curve fitting with Python.
I made the example below, but the optimized curve is not drawn well by
plt.plot(basketCont, fittedData)
I also suspect the optimized parameters are not good. Could you give some recommendations? Thank you.
import matplotlib
matplotlib.use('Qt4Agg')
import matplotlib.pyplot as plt
from matplotlib.pyplot import cm
import numpy as np
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a - b * np.exp(c * x)
baskets = np.array([475, 108, 2, 38, 320])
scaling_factor = np.array([95.5, 57.7, 1.4, 21.9, 88.8])
popt,pcov = curve_fit(func, baskets, scaling_factor)
print (popt)
print (pcov)
basketCont=np.linspace(min(baskets),max(baskets),50)
fittedData=[func(x, *popt) for x in basketCont]
fig1 = plt.figure(1)
plt.scatter(baskets, scaling_factor, s=5)
plt.plot(basketCont, fittedData)
plt.grid()
plt.show()
I personally could not get a good fit to your data using the equation you posted; however, the Hill sigmoidal equation gave a good fit. Here is the Python code for the graphical fitter I used.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
baskets = numpy.array([475.0, 108.0, 2.0, 38.0, 320.0])
scaling_factor = numpy.array([95.5, 57.7, 1.4, 21.9, 88.8])
# rename data for simpler code re-use later
xData = baskets
yData = scaling_factor
def func(x, a, b, c): # Hill sigmoidal equation from zunzun.com
    return a * numpy.power(x, b) / (numpy.power(c, b) + numpy.power(x, b))
# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])
# do not print unnecessary warnings during curve_fit()
warnings.filterwarnings("ignore")
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
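A note that applies to all of the fitters above: pcov is returned but never used. Its diagonal holds the variance of each fitted parameter, so one-sigma standard errors follow in one line. A sketch assuming the fittedParameters and pcov from the curve_fit() call above:
# one-sigma standard errors of the fitted parameters
parameterErrors = numpy.sqrt(numpy.diag(pcov))
for name, value, error in zip(['a', 'b', 'c'], fittedParameters, parameterErrors):
    print('%s = %.5f +/- %.5f' % (name, value, error))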
I'm trying to fit a curve to some data that I have, but for some reason I just get the error "'numpy.float64' object cannot be interpreted as an integer" and I don't understand why or how to fix it. I would be grateful for some help; the code is below:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
mud=[0.0014700734999999996,
0.0011840320799999997,
0.0014232304799999995,
0.0008501509799999997,
0.0007235751599999999,
0.0005770661399999999,
0.0005581295999999999,
0.00028703807999999994,
0.00014850233999999998]
F=[0.5750972123893806,
0.5512177433628319,
0.5638906194690266,
0.5240915044247788,
0.5217873451327435,
0.5066008407079646,
0.5027256637168142,
0.4847113274336283,
0.46502123893805314]
fitfunc = lambda p, x: p[0]+p[1]*x # Target function
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
p0 = [0.46,80,1] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:], args=(mud, F))
m = np.linspace(max(mud),min(mud), 9)
ax = plt.plot(mud, F, "b^")
ax3 = plt.plot(m, fitfunc(p1, m), "g-")
Your problem is that your arguments, mud and F, are lists, not arrays, which means that you cannot multiply them by a number. Hence the error. If you define those parameters as np.ndarrays, it will work:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize
mud=np.array([0.0014700734999999996,
0.0011840320799999997,
0.0014232304799999995,
0.0008501509799999997,
0.0007235751599999999,
0.0005770661399999999,
0.0005581295999999999,
0.00028703807999999994,
0.00014850233999999998])
F=np.array([0.5750972123893806,
0.5512177433628319,
0.5638906194690266,
0.5240915044247788,
0.5217873451327435,
0.5066008407079646,
0.5027256637168142,
0.4847113274336283,
0.46502123893805314])
fitfunc = lambda p, x: p[0]+p[1]*x # Target function
errfunc = lambda p, x, y: fitfunc(p, x) - y # Distance to the target function
p0 = [0.46,80,1] # Initial guess for the parameters
p1, success = optimize.leastsq(errfunc, p0[:], args=(mud, F))
print(p1, success)
gives
[ 0.46006301 76.7920086 1. ] 2
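The underlying difference is that a Python list only supports repetition by an integer, while an ndarray supports elementwise arithmetic; the exact TypeError message varies with the type of the scalar multiplier, which is why you saw the numpy.float64 variant. A two-line demonstration:
import numpy as np
try:
    [0.1, 0.2] * 80.0                            # lists cannot be scaled by a float
except TypeError as err:
    print('list:', err)                          # can't multiply sequence by non-int
print('array:', np.array([0.1, 0.2]) * 80.0)     # elementwise: [ 8. 16.]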
Here is a graphical fitter using the Van Deemter chromatography equation; it gives a good fit to your data.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# mud
xData=numpy.array([0.0014700734999999996,
0.0011840320799999997,
0.0014232304799999995,
0.0008501509799999997,
0.0007235751599999999,
0.0005770661399999999,
0.0005581295999999999,
0.00028703807999999994,
0.00014850233999999998])
# F
yData=numpy.array([0.5750972123893806,
0.5512177433628319,
0.5638906194690266,
0.5240915044247788,
0.5217873451327435,
0.5066008407079646,
0.5027256637168142,
0.4847113274336283,
0.46502123893805314])
def func(x, a, b, c): # Van Deemter chromatography equation
    return a + b/x + c*x
# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0])
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data (mud)') # X axis data label
    axes.set_ylabel('Y Data (F)') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
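As a follow-up, the Van Deemter form a + b/x + c*x has a stationary point where its derivative c - b/x**2 vanishes, i.e. at x = sqrt(b/c) whenever b and c share a sign. A sketch locating it from the fit, assuming the fittedParameters from above:
a, b, c = fittedParameters
if b * c > 0:
    xStationary = numpy.sqrt(b / c)  # where d/dx (a + b/x + c*x) = 0
    print('stationary point: x =', xStationary, ', y =', func(xStationary, a, b, c))
else:
    print('the fitted curve is monotonic for x > 0')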
I plotted a linear least-squares fit curve using scipy.optimize.curve_fit(). My data has some error associated with it, and I added that error to the plot of the fit curve.
Next, I want to plot two dashed lines representing the one-sigma error band on the curve fit and shade the region between those two lines. This is what I have tried so far:
import sys
import os
import numpy
import matplotlib.pyplot as plt
from pylab import *
import scipy.optimize as optimization
from scipy.optimize import curve_fit
xdata = numpy.array([-5.6, -5.6, -6.1, -5.0, -3.2, -6.4, -5.2, -4.5, -2.22, -3.30, -6.15])
ydata = numpy.array([-18.40, -17.63, -17.67, -16.80, -14.19, -18.21, -17.10, -17.90, -15.30, -18.90, -18.62])
# Initial guess.
x0 = numpy.array([1.0, 1.0])
#data error
sigma = numpy.array([0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.16, 0.22, 0.45, 0.35])
sigma1 = numpy.array([0.000001, 0.000001, 0.000001, 0.0000001, 0.0000001, 0.13, 0.22, 0.30, 0.00000001, 1.0, 0.05])
#def func(x, a, b, c):
# return a + b*x + c*x*x
def line(x, a, b):
    return a * x + b
#print optimization.curve_fit(line, xdata, ydata, x0, sigma)
popt, pcov = curve_fit(line, xdata, ydata, sigma=sigma)
print(popt)
print("a =", popt[0], "+/-", pcov[0,0]**0.5)
print("b =", popt[1], "+/-", pcov[1,1]**0.5)
#1 sigma error ######################################################################################
sigma2 = numpy.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]) #make change
popt1, pcov1 = curve_fit(line, xdata, ydata, sigma = sigma2) #make change
print(popt1)
print("a1 =", popt1[0], "+/-", pcov1[0,0]**0.5)
print("b1 =", popt1[1], "+/-", pcov1[1,1]**0.5)
#####################################################################################################
plt.errorbar(xdata, ydata, yerr=sigma, xerr= sigma1, fmt="none")
plt.ylim(-11.5, -19.5)
plt.xlim(-2, -7)
xfine = np.linspace(-2.0, -7.0, 100) # define values to plot the function for
plt.plot(xfine, line(xfine, popt[0], popt[1]), 'r-')
plt.plot(xfine, line(xfine, popt1[0], popt1[1]), '--') #make change
plt.show()
However, I think the dashed line I plotted takes the one-sigma error from my provided xdata and ydata arrays, not from the curve fit. Do I have to find the coordinates that satisfy my fit curve and then build a second array to draw the one-sigma error curves?
It seems you are plotting two completely different lines.
Instead, you need to plot three lines: the first is your fit without any corrections; the other two should be built with the same parameters a and b, but with the respective sigmas added or subtracted. You obtain those sigmas from the covariance matrix pcov. So you'll have something like:
y = line(xfine, popt[0], popt[1])
y1 = line(xfine, popt[0] + pcov[0,0]**0.5, popt[1] - pcov[1,1]**0.5)
y2 = line(xfine, popt[0] - pcov[0,0]**0.5, popt[1] + pcov[1,1]**0.5)
plt.plot(xfine, y, 'r-')
plt.plot(xfine, y1, 'g--')
plt.plot(xfine, y2, 'g--')
plt.fill_between(xfine, y1, y2, facecolor="gray", alpha=0.15)
fill_between shades the area between the error bar lines.
You can apply the same technique for your other line if you want.
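Note that adding and subtracting the two sigmas as above treats the parameter errors as independent and ignores their covariance. If you want the band to reflect the full covariance matrix, propagate pcov through the model: for y = a*x + b, var(y) = x**2*var(a) + 2*x*cov(a, b) + var(b). A sketch reusing line, popt, pcov, and xfine from above:
y = line(xfine, *popt)
J = np.vstack([xfine, np.ones_like(xfine)]).T     # Jacobian rows [x, 1]
yVariance = np.einsum('ij,jk,ik->i', J, pcov, J)  # diagonal of J @ pcov @ J.T
yError = np.sqrt(yVariance)
plt.plot(xfine, y, 'r-')
plt.fill_between(xfine, y - yError, y + yError, facecolor="gray", alpha=0.15)
plt.show()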