How to fit the following function using curve_fit - python

I have the following function I need to solve:
np.exp((1-Y)/Y) = np.exp(c) -b*x
I defined the function as:
def function(x, b, c):
    np.exp((1-Y)/Y) = np.exp(c) - b*x
    return y
def function_solve(y, b, c):
    x = (np.exp(c) - np.exp((1-Y)/Y))/b
    return x
then I used:
x_data = [4, 6, 8, 10]
y_data = [0.86, 0.73, 0.53, 0.3]
popt, pcov = curve_fit(function, x_data, y_data,(28.14,-0.25))
answer = function_solve(0.5, popt[0], popt[1])
I tried running the code and the error was:
can't assign to function call
The function I'm trying to fit is y = 1/(c*exp(-b*x)) in its linear form. I have a bunch of y_data and x_data, and I want to get optimal values for c and b.

There are two problems that jump out at me:
ln((1-Y)/Y) = ln(c) - b*x is not valid Python code. On the left side of an assignment you must have a name, whereas here you have a function call, ln(...), hence the error.
ln() is not a function in the Python standard library. There is a math.log() function (and NumPy provides np.log()). Unless you defined ln() somewhere else, it will not work.
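For instance, either of these gives a natural logarithm (np.log also works elementwise on arrays, which is what curve_fit needs):
import math
import numpy as np

math.log(2.0)                 # scalar natural log from the standard library
np.log(np.array([1.0, 2.0]))  # elementwise natural log from NumPy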

Some problems with your code have already been pointed out. Here is a solution:
First, you need to get the correct logarithmic expression of your original function:
y = 1 / (c * exp(-b * x))
y = exp(b * x) / c
ln(y) = b * x + ln(1/c)
ln(y) = b * x - ln(c)
If you want to use that in curve_fit, you need to define the model as follows and fit it against the logarithm of your y data:
def f_log(x, b, c_ln):
    return b * x - c_ln
I now show you the outcome for some randomly generated data (using b = 0.08 and c = 100.5), fitted with the original function f, and then also the output for the data you provided:
[ 8.17260899e-02 1.17566291e+02]
As you can see the fitted values are close to the original ones and the fit describes the data very well.
For your data, fitting f_log to np.log(y_data) yields approximately:
[-0.174 -0.642]
Here is the code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def f(x, b, c):
    return 1. / (c * np.exp(-b * x))

def f_log(x, b, c_ln):
    return b * x - c_ln

# some random data
b_org = 0.08
c_org = 100.5
x_data = np.linspace(0.01, 100., 50)
y_data = f(x_data, b_org, c_org) + np.random.normal(0, 0.5, len(x_data))

# fit the data
popt, pcov = curve_fit(f, x_data, y_data, p0=(0.1, 50))
print(popt)

# plot the data
xnew = np.linspace(0.01, 100., 5000)
plt.plot(x_data, y_data, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.show()
# your data
x_data = np.array([4, 6, 8, 10])
y_data = np.array([0.86, 0.73, 0.53, 0.3])

# fit the linearized model to the log of the data
popt_log, pcov_log = curve_fit(f_log, x_data, np.log(y_data))
print(popt_log)

# plot the data on the log scale
xnew = np.linspace(4, 10., 500)
plt.plot(x_data, np.log(y_data), 'bo')
plt.plot(xnew, f_log(xnew, *popt_log), 'r')
plt.show()
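Since f_log's second parameter is c_ln = ln(c), you can recover the original parametrization afterwards:
b_fit = popt_log[0]          # b
c_fit = np.exp(popt_log[1])  # c = exp(ln(c))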

Your problem is in defining function():
def function(x, b, c):
    ln((1-Y)/Y) = ln(c) - b*x
    return y
You're trying to assign ln(c) - b*x to the call of another function, ln(), rather than to a variable. Instead, solve the equation for one of its variables so the result can be stored in a Python variable.
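As a sketch of that advice (assuming the intended model really is ln((1-Y)/Y) = ln(c) - b*x, which solved for Y gives a logistic form; the starting values below are only an illustrative guess):
import numpy as np
from scipy.optimize import curve_fit

# solving ln((1-Y)/Y) = ln(c) - b*x for Y gives Y = 1 / (1 + c*exp(-b*x))
def function(x, b, c):
    return 1.0 / (1.0 + c * np.exp(-b * x))

x_data = np.array([4, 6, 8, 10])
y_data = np.array([0.86, 0.73, 0.53, 0.3])

# for data decreasing in x, b comes out negative here
popt, pcov = curve_fit(function, x_data, y_data, p0=(-0.5, 0.05))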

Related

Most pythonic way to fit multiple gaussians using scipy.optimize

import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * np.exp(-b * x)

xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func, xdata, ydata)
It is quite easy to fit an arbitrary Gaussian in Python with something like the above method. However, I would like to prepare a function that allows the user to select an arbitrary number of Gaussians and still attempts to find a best fit.
I'm trying to figure out how to modify the function func so that I can pass it an additional parameter n=2 for instance and it would return a function that would try and fit 2 Gaussians, akin to:
from scipy.optimize import curve_fit

def func2(x, a, b, c, d):
    return a * np.exp(-b * x) + c * np.exp(-d * x)

xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func2, xdata, ydata)
I'd like to do this without having to hard-code each additional case, so that I could instead pass a single function like func(..., n=2) and get the same result as above. I'm having trouble finding an elegant solution to this; my guess is that the best solution will involve a lambda function.
You can define the function to take a variable number of arguments using def func(x, *args). The *args variable then contains something like (a1, b1, a2, b2, a3, ...). It would be possible to loop over those and sum the Gaussians, but I'm showing a vectorized solution instead (a loop-based sketch follows the code below).
Since curve_fit can no longer determine the number of parameters from this function, you can provide an initial guess that determines the amount of Gaussians you want to fit. Each Gaussian requires two parameters, so [1, 1]*n produces a parameter vector of the correct length.
from scipy.optimize import curve_fit
import numpy as np

def func(x, *args):
    x = x.reshape(-1, 1)
    a = np.array(args[0::2]).reshape(1, -1)
    b = np.array(args[1::2]).reshape(1, -1)
    return np.sum(a * np.exp(-b * x), axis=1)

n = 3
xdata = np.linspace(0, 4, 50)
ydata = np.linspace(0, 4, 50)
popt, pcov = curve_fit(func, xdata, ydata, p0=[1, 1] * n)
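For comparison, a loop-based equivalent of the vectorized function above might look like this (a sketch assuming the same (a1, b1, a2, b2, ...) parameter layout):
def func_loop(x, *args):
    # sum one exponential term per (a, b) pair in args
    total = np.zeros_like(x, dtype=float)
    for a, b in zip(args[0::2], args[1::2]):
        total += a * np.exp(-b * x)
    return total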

Python how to control curvature when joining two points

I have an original curve, and I am developing a model curve to match it closely. Everything runs fine, but the two curves do not match. How can I control the curvature of my model curve? The code below is based on the answer here.
My code:
def curve_line(point1, point2):
    a = (point2[1] - point1[1])/(np.cosh(point2[0]) - np.cosh(point1[0]))
    b = point1[1] - a*np.sinh(point1[0])
    x = np.linspace(point1[0], point2[0], 100).tolist()
    y = (a*np.cosh(x) + b).tolist()
    return x, y

###### A sample of my code is given below
point1 = [10, 100]
point2 = [20, 50]
x, y = curve_line(point1, point2)
plt.plot(point1[0], point1[1], 'o')
plt.plot(point2[0], point2[1], 'o')
plt.plot(x, y)
My present output: [plot omitted]
I tried the following function as well:
y = 50*np.exp(-x/10) + 2.5
The output is: [plot omitted]
Instead of just guessing the right parameters of your model function, you can fit a model curve to your data using curve_fit.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

x = np.array([1.92, 14.35, 21.50, 25.27, 27.34, 30.32, 32.31, 34.09, 34.21])
y = np.array([8.30, 8.26, 8.13, 7.49, 6.66, 4.59, 2.66, 0.60, 0.06])

def fun(x, a, b, c):
    return a * np.cosh(b * x) + c

coef, _ = curve_fit(fun, x, y)

plt.plot(x, y, label='Original curve')
plt.plot(x, fun(x, *coef), label='Model: %5.3f cosh(%4.2f x) + %4.2f' % tuple(coef))
plt.legend()
plt.show()
If it is important that the start and end points are closely fitted, you can pass uncertainties to curve_fit, adjusting them to lower values towards the ends, e.g. by
s = np.ones(len(x))
s[1:-1] = s[1:-1] * 3
coef, _ = curve_fit(fun, x, y, sigma=s)
Your other approach, a * np.exp(b * x) + c, will also work and gives -0.006 exp(0.21 x) + 8.49.
In some cases you'll have to provide an educated guess for the initial values of the coefficients to curve_fit (it uses 1 as the default), for example:
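The values below are only an illustrative guess, not fitted results:
p0 = (1.0, 0.1, 0.0)  # rough starting guesses for a, b, c
coef, _ = curve_fit(fun, x, y, p0=p0)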

Curve fitting multiple outputs from a single function with scipy

OK, I have a function which uses a range of parameters to calculate the effect on two separate variables over time. These variables have already been curve-matched to some existing data to minimize the variation (shown below)
I want to be able to check the previous working, and match new data. I have been trying to use the scipy.optimize.curve_fit function, by stacking the x and y data resulting from my function (as suggested here: fit multiple parametric curves with scipy).
It may not be the right method, or I may just be misunderstanding, but my code keeps running into a type error: TypeError: Improper input: N=3 must not exceed M=2.
My simplified prototype code was initially taken from here: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    result = ([], [])
    for i in x:
        # set up 2 example curves
        result[0].append(a * np.exp(-b * i) + c)
        result[1].append(a * np.exp(-b * i) + c**2)
    return result  # as a tuple containing 2 lists

# Define the data to be fit with some noise:
xdata = list(np.arange(0, 10, 1))
y = func(xdata, 2.5, 5, 0.5)[0]
y2 = func(xdata, 1, 1, 2)[1]

# Add some noise
y_noise = 0.1 * np.random.normal(size=len(xdata))
y2_noise = 0.1 * np.random.normal(size=len(xdata))
ydata = []
ydata2 = []
for i in range(len(y)):  # clunky
    ydata.append(y[i] + y_noise[i])
    ydata2.append(y2[i] + y2_noise[i])

plt.scatter(xdata, ydata, label='data')
plt.scatter(xdata, ydata2, label='data2')
#plt.plot(xdata, y, 'k-', label='data (original function)')
#plt.plot(xdata, y2, 'k-', label='data2 (original function)')

# stack the data
xdat = xdata + xdata
ydat = ydata + ydata2
popt, pcov = curve_fit(func, xdat, ydat)

plt.plot(xdata, func(xdata, *popt), 'r-',
         label='fit: a=%5.3f, b=%5.3f, c=%5.3f' % tuple(popt))
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()
Any help much appreciated !
Here is graphing example code that fits two different equations with a single shared parameter; if this looks like what you need, it can easily be adapted for your specific problem (a sketch of that adaptation follows the code).
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

y1 = np.array([ 16.00, 18.42, 20.84, 23.26])
y2 = np.array([-20.00, -25.50, -31.00, -36.50, -42.00])
comboY = np.append(y1, y2)

x1 = np.array([5.0, 6.1, 7.2, 8.3])
x2 = np.array([15.0, 16.1, 17.2, 18.3, 19.4])
comboX = np.append(x1, x2)

if len(y1) != len(x1):
    raise Exception('Unequal x1 and y1 data length')
if len(y2) != len(x2):
    raise Exception('Unequal x2 and y2 data length')

def function1(data, a, b, c):  # not all parameters are used here, c is shared
    return a * data + c

def function2(data, a, b, c):  # not all parameters are used here, c is shared
    return b * data + c

def combinedFunction(comboData, a, b, c):
    # single data reference passed in, extract separate data
    extract1 = comboData[:len(x1)]  # first data
    extract2 = comboData[len(x1):]  # second data
    result1 = function1(extract1, a, b, c)
    result2 = function2(extract2, a, b, c)
    return np.append(result1, result2)

# some initial parameter values
initialParameters = np.array([1.0, 1.0, 1.0])

# curve fit the combined data to the combined function
fittedParameters, pcov = curve_fit(combinedFunction, comboX, comboY, initialParameters)

# values for display of fitted function
a, b, c = fittedParameters
y_fit_1 = function1(x1, a, b, c)  # first data set, first equation
y_fit_2 = function2(x2, a, b, c)  # second data set, second equation

plt.plot(comboX, comboY, 'D')  # plot the raw data
plt.plot(x1, y_fit_1)  # plot the first equation using the fitted parameters
plt.plot(x2, y_fit_2)  # plot the second equation using the fitted parameters
plt.show()

print('a, b, c:', fittedParameters)
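Adapting that pattern to the question's own func is then mostly a matter of returning one stacked NumPy array instead of a tuple of lists; a minimal, untested sketch:
import numpy as np
from scipy.optimize import curve_fit

def func_combined(x, a, b, c):
    # x is the stacked x data: first half belongs to curve 1, second half to curve 2
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    curve1 = a * np.exp(-b * x1) + c
    curve2 = a * np.exp(-b * x2) + c**2
    return np.append(curve1, curve2)

# stack the data as NumPy arrays, then fit:
# xdat = np.append(xdata, xdata)
# ydat = np.append(ydata, ydata2)
# popt, pcov = curve_fit(func_combined, xdat, ydat)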

Python curve_fit with multiple independent variables

Python's curve_fit calculates the best-fit parameters for a function with a single independent variable, but is there a way, using curve_fit or something else, to fit for a function with multiple independent variables? For example:
def func(x, y, a, b, c):
    return log(a) + b*log(x) + c*log(y)
where x and y are the independent variables and we would like to fit for a, b, and c.
You can pass curve_fit a multi-dimensional array for the independent variables, but then your func must accept the same thing. For example, calling this array X and unpacking it to x, y for clarity:
import numpy as np
from scipy.optimize import curve_fit

def func(X, a, b, c):
    x, y = X
    return np.log(a) + b * np.log(x) + c * np.log(y)

# some artificially noisy data to fit
x = np.linspace(0.1, 1.1, 101)
y = np.linspace(1., 2., 101)
a, b, c = 10., 4., 6.
z = func((x, y), a, b, c) + np.random.random(101) / 100

# initial guesses for a, b, c:
p0 = 8., 2., 7.
print(curve_fit(func, (x, y), z, p0))
Gives the fit:
(array([ 9.99933937,  3.99710083,  6.00875164]),
 array([[  1.75295644e-03,   9.34724308e-05,  -2.90150983e-04],
        [  9.34724308e-05,   5.09079478e-06,  -1.53939905e-05],
        [ -2.90150983e-04,  -1.53939905e-05,   4.84935731e-05]]))
optimizing a function with multiple input dimensions and a variable number of parameters
This example shows how to fit a polynomial with a two-dimensional input (R^2 -> R) using an increasing number of coefficients. The design is very flexible, so the callable f passed to curve_fit is defined once and works for any number of non-keyword arguments.
minimal reproducible example
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def poly2d(xy, *coefficients):
    x = xy[:, 0]
    y = xy[:, 1]
    proj = x + y
    res = 0
    for order, coef in enumerate(coefficients):
        res += coef * proj ** order
    return res

nx = 31
ny = 21
range_x = [-1.5, 1.5]
range_y = [-1, 1]
target_coefficients = (3, 0, -19, 7)

xs = np.linspace(*range_x, nx)
ys = np.linspace(*range_y, ny)
im_x, im_y = np.meshgrid(xs, ys)
xdata = np.c_[im_x.flatten(), im_y.flatten()]
im_target = poly2d(xdata, *target_coefficients).reshape(ny, nx)

fig, axs = plt.subplots(2, 3, figsize=(29.7, 21))
axs = axs.flatten()

ax = axs[0]
ax.set_title('Unknown polynomial P(x+y)\n[secret coefficients: ' + str(target_coefficients) + ']')
sm = ax.imshow(
    im_target,
    cmap=plt.get_cmap('coolwarm'),
    origin='lower'
)
fig.colorbar(sm, ax=ax)

for order in range(5):
    ydata = im_target.flatten()
    popt, pcov = curve_fit(poly2d, xdata=xdata, ydata=ydata, p0=[0] * (order + 1))
    im_fit = poly2d(xdata, *popt).reshape(ny, nx)

    ax = axs[1 + order]
    title = 'Fit O({:d}):'.format(order)
    for o, p in enumerate(popt):
        if o % 2 == 0:
            title += '\n'
        if o == 0:
            title += ' {:=-{w}.1f} (x+y)^{:d}'.format(p, o, w=int(np.log10(max(abs(p), 1))) + 5)
        else:
            title += ' {:=+{w}.1f} (x+y)^{:d}'.format(p, o, w=int(np.log10(max(abs(p), 1))) + 5)
    title += '\nrms: {:.1f}'.format(np.mean((im_fit - im_target) ** 2) ** .5)
    ax.set_title(title)
    sm = ax.imshow(
        im_fit,
        cmap=plt.get_cmap('coolwarm'),
        origin='lower'
    )
    fig.colorbar(sm, ax=ax)

for ax in axs.flatten():
    ax.set_xlabel('x')
    ax.set_ylabel('y')
plt.show()
Fitting to an unknown number of parameters
In this example, we try to reproduce some measured data, measData.
In this example measData is generated by the function measuredData(inp, a=.2, b=-2, c=-.8, d=.1). In practice, we might have measured measData in some way, so we have no idea how it is described mathematically. Hence the fit.
We fit with a polynomial, described by the function polynomFit(inp, *args). Since we want to try out different orders of polynomials, it is important to be flexible in the number of input parameters.
The independent variables (x and y in your case) are encoded in the 'columns'/second dimension of inp.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def measuredData(inp, a=.2, b=-2, c=-.8, d=.1):
    x = inp[:, 0]
    y = inp[:, 1]
    return a + b*x + c*x**2 + d*x**3 + y

def polynomFit(inp, *args):
    x = inp[:, 0]
    y = inp[:, 1]
    res = 0
    for order in range(len(args)):
        res += args[order] * x**order
    return res + y

inpData = np.linspace(0, 10, 20).reshape(-1, 2)
inpDataStr = ['({:.1f},{:.1f})'.format(a, b) for a, b in inpData]
measData = measuredData(inpData)

fig, ax = plt.subplots()
ax.plot(np.arange(inpData.shape[0]), measData, label='measured', marker='o', linestyle='none')

for order in range(5):
    popt, pcov = curve_fit(polynomFit, xdata=inpData, ydata=measData, p0=[0] * (order + 1))
    fitData = polynomFit(inpData, *popt)
    ax.plot(np.arange(inpData.shape[0]), fitData, label='polyn. fit, order ' + str(order), linestyle='--')
    ax.legend(loc='upper left', bbox_to_anchor=(1.05, 1))
    print(order, popt)

ax.set_xticklabels(inpDataStr, rotation=90)
Result: [plot omitted]
Yes, we can pass multiple variables to curve_fit. I have written a piece of code:
import numpy as np
from scipy.optimize import curve_fit

x = np.random.randn(2, 100)
w = np.array([1.5, 0.5]).reshape(1, 2)
esp = np.random.randn(1, 100)
y = np.dot(w, x) + esp
y = y.reshape(100,)
In the above code I have generated x, a 2D data set of shape (2, 100), i.e. two variables with 100 data points each. I have fit the dependent variable y to the independent variables x with some noise.
def model_func(x, w1, w2, b):
    w = np.array([w1, w2]).reshape(1, 2)
    b = np.array([b]).reshape(1, 1)
    y_p = np.dot(w, x) + b
    return y_p.reshape(100,)
We have defined a model function that establishes the relation between y and x.
Note: the shape of the output of the model function (the predicted y) must be (length of x,).
popt, pcov = curve_fit(model_func, x, y)
popt is a 1D numpy array containing the fitted parameters. In our case there are three.
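As a quick sanity check (exact numbers will vary with the random noise), the fit should land near the generating values:
print(popt)  # roughly [1.5, 0.5, 0.0]: w = [1.5, 0.5] and no bias term was added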
Yes, there is: simply give curve_fit a multi-dimensional array for xData.

Python - calculating trendlines with errors

So I've got some data stored as two lists, and plotted them using
plot(datasetx, datasety)
Then I set a trendline
trend = polyfit(datasetx, datasety, 2)
trendx = []
trendy = []
for a in range(datasetx[0], (datasetx[-1]+1)):
    trendx.append(a)
    trendy.append(trend[0]*a**2 + trend[1]*a + trend[2])
plot(trendx, trendy)
But I have a third list of data, which is the error in the original datasety. I'm fine with plotting the error bars, but what I don't know is how to use this to find the error in the coefficients of the polynomial trendline.
So say my trendline came out to be 5x^2 + 3x + 4 = y, there needs to be some sort of error on the 5, 3 and 4 values.
Is there a tool using NumPy that will calculate this for me?
I think you can use the function curve_fit of scipy.optimize (documentation). A basic example of the usage:
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0, 4, 50)
y = func(x, 5, 3, 4)
yn = y + 0.2 * np.random.normal(size=len(x))

popt, pcov = curve_fit(func, x, yn)
Following the documentation, pcov gives:
The estimated covariance of popt. The diagonals provide the variance
of the parameter estimate.
So in this way you can calculate an error estimate for the coefficients. To get the standard deviation you can take the square root of the variance.
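Concretely, the per-coefficient standard errors are the square roots of the diagonal of pcov:
perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainty for a, b, c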
Now you have an error on the coefficients, but it is only based on the deviation between the ydata and the fit. In case you also want to account for an error on the ydata itself, the curve_fit function provides the sigma argument:
sigma : None or N-length sequence
    If not None, it represents the standard-deviation of ydata. This vector, if given, will be used as weights in the least-squares problem.
A complete example:
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a*x**2 + b*x + c

x = np.linspace(0, 4, 20)
y = func(x, 5, 3, 4)

# generate noisy ydata
yn = y + 0.2 * y * np.random.normal(size=len(x))
# generate error on ydata (sigma must be positive)
y_sigma = 0.2 * y * np.abs(np.random.normal(size=len(x)))

popt, pcov = curve_fit(func, x, yn, sigma=y_sigma)

# plot
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x, yn, yerr=y_sigma, fmt='o')
ax.plot(x, np.polyval(popt, x), '-')
ax.text(0.5, 100, r"a = {0:.3f} +/- {1:.3f}".format(popt[0], pcov[0, 0]**0.5))
ax.text(0.5, 90, r"b = {0:.3f} +/- {1:.3f}".format(popt[1], pcov[1, 1]**0.5))
ax.text(0.5, 80, r"c = {0:.3f} +/- {1:.3f}".format(popt[2], pcov[2, 2]**0.5))
ax.grid()
plt.show()
Then something else, about using numpy arrays: one of the main advantages of numpy is that you can avoid for loops, because operations on arrays are applied elementwise. The for loop in your example can therefore also be written as:
trendx = arange(datasetx[0], (datasetx[-1]+1))
trendy = trend[0]*trendx**2 + trend[1]*trendx + trend[2]
Where I use arange instead of range as it returns a numpy array instead of a list.
In this case you can also use the numpy function polyval:
trendy = polyval(trend, trendx)
I have not been able to find any way of getting the errors in the coefficients that is built into numpy or Python. I have a simple tool that I wrote based on Sections 8.5 and 8.6 of John Taylor's An Introduction to Error Analysis. Maybe this will be sufficient for your task (note that the default return is the variance, not the standard deviation). You can get large errors (as in the provided example) because of significant covariance.
def leastSquares(xMat, yMat):
    '''
    Purpose
    -------
    Perform least squares using the procedure outlined in 8.5 and 8.6 of Taylor,
    solving the matrix equation X a = Y

    Examples
    --------
    >>> from numpy import matrix
    >>> xMat = matrix([[ 1,  5,  25],
    ...                [ 1,  7,  49],
    ...                [ 1,  9,  81],
    ...                [ 1, 11, 121]])
    >>> # matrix has rows of format [constant, x, x^2]
    >>> yMat = matrix([[142],
    ...                [168],
    ...                [211],
    ...                [251]])
    >>> a, varCoef, yRes = leastSquares(xMat, yMat)
    >>> # a is a column matrix, holding the three coefficients a, b, c, corresponding to
    >>> # the equation a + b*x + c*x^2

    Returns
    -------
    a: matrix
        best fit coefficients
    varCoef: matrix
        variance of derived coefficients
    yRes: matrix
        y-residuals of fit
    '''
    xMatSize = xMat.shape
    numMeas = xMatSize[0]
    numVars = xMatSize[1]

    xxMat = xMat.T * xMat
    xyMat = xMat.T * yMat
    xxMatI = xxMat.I
    aMat = xxMatI * xyMat

    yAvgMat = xMat * aMat
    yRes = yMat - yAvgMat
    var = (yRes.T * yRes) / (numMeas - numVars)
    varCoef = xxMatI.diagonal() * var[0, 0]

    return aMat, varCoef, yRes
