Estimating parameter values using optimize.curve_fit - python

I am attempting to estimate the parameters of the non-linear equation:
y(x1, x2) = x1 / (A + B*x1 + C*x2)
using the method outlined in the answer to this question, but I can find no documentation on how to pass multiple independent variables to the curve_fit function appropriately.
Specifically, I am attempting to estimate plant biomass (y) on the basis of the density of the plant (x1) and the density of a competitor (x2). I have three exponential equations (of the form y = a * [1 - exp(-b * x1)]) for the relationship between plant density and plant biomass, with different parameter values for three competitor densities:
For x2 == 146: y = 1697 * [1 - exp(-0.010 * x1)]
For x2 == 112: y = 1994 * [1 - exp(-0.023 * x1)]
For x2 == 127: y = 1022 * [1 - exp(-0.008 * x1)]
I would therefore like to write code along the lines of:
def model_func(self, x_vals, A, B, C):
    return x_vals[0] / (A + B * x_vals[0] + C * x_vals[1])

def fit_nonlinear(self, d, y):
    opt_parms, parm_cov = sp.optimize.curve_fit(self.model_func, [x1, x2], y, p0=(0.2, 0.004, 0.007), maxfev=10000)
    A, B, C = opt_parms
    return A, B, C
However, I have not found any documentation on how to format the argument y (passed to fit_nonlinear) to capture the two-dimensional nature of the x_vals (the documentation for curve_fit states y should be an N-length sequence). Is what I am attempting possible with curve_fit?

Based on your comment above, you want to think of using the flat versions of the matrices. If you take the same element from both the X1 and X2 matrices, that pair of values has a corresponding y-value. Here's a minimal example:

import numpy as np
import scipy as sp
import scipy.optimize

x1 = np.linspace(-1, 1)
x2 = np.linspace(-1, 1)
X1, X2 = np.meshgrid(x1, x2)

def func(X, A, B, C):
    X1, X2 = X
    return X1 / (A + B * X1 + C * X2)

# generate some noisy points corresponding to a set of parameter values
p_ref = [0.15, 0.001, 0.05]
Yref = func([X1, X2], *p_ref)
std = Yref.std()
Y = Yref + np.random.normal(scale=0.1 * std, size=Yref.shape)

# fit a curve to the noisy points
p0 = (0.2, 0.004, 0.007)
p, cov = sp.optimize.curve_fit(func, [X1.flat, X2.flat], Y.flat, p0=p0)

# if the parameters from the fit are close to the ones used
# to generate the noisy points, we succeeded
print(p_ref)
print(p)
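If your measured x1, x2 and y are plain one-dimensional arrays rather than meshgrids, the same idea applies: stack the two independent variables and let the model function unpack them. A minimal sketch, with hypothetical data generated from the model itself so the fit is well behaved:

import numpy as np
import scipy.optimize

def model_func(x_vals, A, B, C):
    x1, x2 = x_vals
    return x1 / (A + B * x1 + C * x2)

# hypothetical plant and competitor densities, with y generated from the model
x1 = np.array([10., 20., 40., 80., 160.])
x2 = np.array([146., 146., 112., 112., 127.])
A_true, B_true, C_true = 0.2, 0.004, 0.007
y = x1 / (A_true + B_true * x1 + C_true * x2)

# stack the independent variables so each column is one (x1, x2) pair
xdata = np.vstack([x1, x2])
opt_parms, parm_cov = scipy.optimize.curve_fit(model_func, xdata, y,
                                               p0=(0.1, 0.01, 0.01), maxfev=10000)
print(opt_parms)  # should recover approximately (0.2, 0.004, 0.007)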

Related

Spectroscopy bimodal fitting isn't giving accurate x data (wavelength) although the y fitting is good, in Python 3.7

I'm trying to fit a bimodal curve using James Phillips's (https://stackoverflow.com/users/2436862/james-phillips) RamanSpectroscopyFit (https://bitbucket.org/zunzuncode/ramanspectroscopyfit/src/master/). It fits the y data (intensity amplitude) perfectly, but it returns the exact same x data for all of the files, which I was expecting to shift more or less. I tried lmfit models, but they don't fit my data well. This Raman equation fits the data as expected; it just doesn't get the x axis right. I'm getting the x axis with peakutils. Can I tweak some parameters in the ramanspectroscopy function to get accurate x readings?
Here's my code
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
from numpy import genfromtxt
from peakutils.plot import plot as pplot
import peakutils

def get_peak(xData, yData):
    # bounds on parameters are set in generate_Initial_Parameters() below
    def double_Lorentz(x, a, b, A, w, x_0, A1, w1, x_01):
        return a * x + b + (2 * A / np.pi) * (w / (4 * (x - x_0) ** 2 + w ** 2)) \
               + (2 * A1 / np.pi) * (w1 / (4 * (x - x_01) ** 2 + w1 ** 2))

    # function for genetic algorithm to minimize (sum of squared error)
    def sumOfSquaredError(parameterTuple):
        warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
        return np.sum((yData - double_Lorentz(xData, *parameterTuple)) ** 2)

    def generate_Initial_Parameters():
        # min and max used for bounds
        maxX = max(xData)
        minX = min(xData)
        maxY = max(yData)
        minY = min(yData)
        parameterBounds = []
        parameterBounds.append([-1.0, 1.0])                # parameter bounds for a
        parameterBounds.append([maxY / -2.0, maxY / 2.0])  # parameter bounds for b
        parameterBounds.append([0.0, maxY * 60.0])         # parameter bounds for A
        parameterBounds.append([0.0, maxY / 2.0])          # parameter bounds for w
        parameterBounds.append([minX, maxX])               # parameter bounds for x_0
        parameterBounds.append([0.0, maxY * 60.0])         # parameter bounds for A1
        parameterBounds.append([0.0, maxY / 2.0])          # parameter bounds for w1
        parameterBounds.append([minX, maxX])               # parameter bounds for x_01
        # "seed" the random number generator for repeatable results
        result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
        return result.x

    # generate initial parameter values
    initialParameters = generate_Initial_Parameters()
    # curve fit the test data
    fittedParameters, pcov = curve_fit(double_Lorentz, xData, yData, initialParameters)
    # create values for display of fitted peak function
    a, b, A, w, x_0, A1, w1, x_01 = fittedParameters
    y_fit = double_Lorentz(xData, a, b, A, w, x_0, A1, w1, x_01)
    index = peakutils.peak.indexes(y_fit, thres=0.5, min_dist=100)
    index = index[:1]
    print("X index:", xData[index], "Y index:", y_fit[index])
    print(fittedParameters)
    return xData[index]

How can I make a 3D plot in matplotlib of an ellipsoid defined by a quadratic equation?

I have the general formula of an ellipsoid:
A*x**2 + C*y**2 + D*x + E*y + B*x*y + F + G*z**2 = 0
where A,B,C,D,E,F,G are constant factors.
How can I plot this equation as a 3D plot in matplotlib? (A wireframe would be best.)
I saw this example but it is in parametric form and I am not sure how to put the z-coordinates in this code. Is there a way to keep the general form to plot this without the parametric form?
I started to put this in some kind of code like this:
from mpl_toolkits import mplot3d
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt

def f(x, y):
    return (A*x**2 + C*y**2 + D*x + E*y + B*x*y + F)

def f(z):
    return G*z**2

x = np.linspace(-2200, 1850, 30)
y = np.linspace(-100, 60, 30)
z = np.linspace(-100, 60, 30)
X, Y, Z = np.meshgrid(x, y, z)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');
I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-1-95b1296ae6a4> in <module>()
18 fig = plt.figure()
19 ax = fig.add_subplot(111, projection='3d')
---> 20 ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)
21 ax.set_xlabel('x')
22 ax.set_ylabel('y')
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\mpl_toolkits\mplot3d\axes3d.py in plot_wireframe(self, X, Y, Z, *args, **kwargs)
1847 had_data = self.has_data()
1848 if Z.ndim != 2:
-> 1849 raise ValueError("Argument Z must be 2-dimensional.")
1850 # FIXME: Support masked arrays
1851 X, Y, Z = np.broadcast_arrays(X, Y, Z)
ValueError: Argument Z must be 2-dimensional.
Side note, but what you have is not the most general equation for a 3d ellipsoid. Your equation can be rewritten as
A*x**2 + C*y**2 + D*x + E*y + B*x*y = - G*z**2 - F,
which means that in effect for each value of z you get a different level of a 2d ellipse, and the slices are symmetric with respect to the z = 0 plane. This shows how your ellipsoid is not general, and it helps check the results to make sure that what we get makes sense.
Assuming we take a general point r0 = [x0, y0, z0], you have
r0 @ M @ r0 + b0 @ r0 + c0 == 0
where
M = [ A    B/2  0
      B/2  C    0
      0    0    G ],
b0 = [D, E, 0],
c0 = F
where @ stands for matrix-vector or vector-vector product.
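As a quick sanity check (a minimal sketch; the coefficients are the same arbitrary example values used further below), you can verify that the matrix form reproduces the original polynomial at a random point:

import numpy as np

A, B, C, D, E, F, G = 2, -1, 2, 3, -4, -3, 4  # arbitrary example coefficients
M = np.array([[A, B/2, 0],
              [B/2, C, 0],
              [0, 0, G]])
b0 = np.array([D, E, 0])
c0 = F

r0 = np.random.rand(3)  # a random point [x0, y0, z0]
x, y, z = r0
quadric = r0 @ M @ r0 + b0 @ r0 + c0
polynomial = A*x**2 + C*y**2 + D*x + E*y + B*x*y + F + G*z**2
print(np.isclose(quadric, polynomial))  # True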
You could take your function and plot its isosurface, but that would be suboptimal: you would need a gridded approximation for your function which is very expensive to do to sufficient resolution, and you'd have to choose the domain for this sampling wisely.
Instead you can perform a principal axis transformation on your data to generalize the parametric plot of a canonical ellipsoid that you yourself linked.
The first step is to diagonalize M as M = V @ D @ V.T, where D is diagonal. Since it's a real symmetric matrix this is always possible and V is orthogonal. Then we have
r0 @ V @ D @ V.T @ r0 + b0 @ r0 + c0 == 0
which we can regroup as
(V.T @ r0) @ D @ (V.T @ r0) + b0 @ V @ (V.T @ r0) + c0 == 0
which motivates the definition of the auxiliary coordinates r1 = V.T @ r0 and vector b1 = b0 @ V, for which we get
r1 @ D @ r1 + b1 @ r1 + c0 == 0.
Since D is a diagonal matrix with the eigenvalues d1, d2, d3 on its diagonal, the above is the equation
d1 * x1**2 + d2 * y1**2 + d3 * z1**2 + b11 * x1 + b12 * y1 + b13 * z1 + c0 == 0
where r1 = [x1, y1, z1] and b1 = [b11, b12, b13].
What's left is to switch from r1 to r2 such that we remove the linear terms:
d1 * (x1 + b11/(2*d1))**2 + d2 * (y1 + b12/(2*d2))**2 + d3 * (z1 + b13/(2*d3))**2 - b11**2/(4*d1) - b12**2/(4*d2) - b13**2/(4*d3) + c0 == 0
So we define
r2 = [x2, y2, z2]
x2 = x1 + b11/(2*d1)
y2 = y1 + b12/(2*d2)
z2 = z1 + b13/(2*d3)
c2 = b11**2/(4*d1) + b12**2/(4*d2) + b13**2/(4*d3) - c0.
For these we finally have
d1 * x2**2 + d2 * y2**2 + d3 * z2**2 == c2,
or equivalently
d1/c2 * x2**2 + d2/c2 * y2**2 + d3/c2 * z2**2 == 1,
which is the canonical form of a second-order surface. In order for this to meaningfully correspond to an ellipsoid we must ensure that d1, d2, d3 and c2 are all strictly positive. If this is guaranteed then the semi-major axes of the canonical form are sqrt(c2/d1), sqrt(c2/d2) and sqrt(c2/d3).
So here's what we do:
1. ensure that the parameters correspond to an ellipsoid
2. generate a theta and phi mesh for polar and azimuthal angles
3. compute the transformed coordinates [x2, y2, z2]
4. shift them back (by r2 - r1) to get [x1, y1, z1]
5. transform the coordinates back by V to get r0, the actual [x, y, z] coordinates we're interested in.
Here's how I'd implement this:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def get_transforms(A, B, C, D, E, F, G):
    """Get transformation matrix and shift for a 3d ellipsoid.

    Assume A*x**2 + C*y**2 + D*x + E*y + B*x*y + F + G*z**2 = 0,
    use principal axis transformation and verify that the inputs
    correspond to an ellipsoid.

    Returns: (d, V, s) tuple of arrays
        d: shape (3,) of semi-major axes in the canonical form
           (X/d1)**2 + (Y/d2)**2 + (Z/d3)**2 = 1
        V: shape (3,3) of the eigensystem
        s: shape (3,) shift from the linear terms
    """
    # construct original matrix
    M = np.array([[A, B/2, 0],
                  [B/2, C, 0],
                  [0, 0, G]])
    # construct original linear coefficient vector
    b0 = np.array([D, E, 0])
    # constant term
    c0 = F

    # compute eigensystem
    D, V = np.linalg.eig(M)
    if (D <= 0).any():
        raise ValueError("Parameter matrix is not positive definite!")

    # transform the shift
    b1 = b0 @ V
    # compute the final shift vector
    s = b1 / (2 * D)
    # compute the final constant term, also has to be positive
    c2 = (b1**2 / (4 * D)).sum() - c0
    if c2 <= 0:
        print(b1, D, c0, c2)
        raise ValueError("Constant in the canonical form is not positive!")

    # compute the semi-major axes
    d = np.sqrt(c2 / D)
    return d, V, s

def get_ellipsoid_coordinates(A, B, C, D, E, F, G, n_theta=20, n_phi=40):
    """Compute coordinates of an ellipsoid on an ellipsoidal grid.

    Returns: x, y, z arrays of shape (n_theta, n_phi)
    """
    # get canonical grid
    theta, phi = np.mgrid[0:np.pi:n_theta*1j, 0:2*np.pi:n_phi*1j]
    r2 = np.array([np.sin(theta) * np.cos(phi),
                   np.sin(theta) * np.sin(phi),
                   np.cos(theta)])  # shape (3, n_theta, n_phi)

    # get transformation data
    d, V, s = get_transforms(A, B, C, D, E, F, G)  # could be *args I guess

    # shift and transform back the coordinates
    r1 = d[:, None, None]*r2 - s[:, None, None]  # broadcast along first of three axes
    r0 = (V @ r1.reshape(3, -1)).reshape(r1.shape)  # shape (3, n_theta, n_phi)
    return r0  # unpackable to x, y, z of shape (n_theta, n_phi)
Here's an example with an ellipsoid and proof that it works:
A, B, C, D, E, F, G = args = 2, -1, 2, 3, -4, -3, 4
x, y, z = get_ellipsoid_coordinates(*args)
print(np.allclose(A*x**2 + C*y**2 + D*x + E*y + B*x*y + F + G*z**2, 0))  # True
The actual plotting from here is trivial. Using the 3d scaling hack from this answer to preserve equal axes:
# create 3d axes
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# plot the data
ax.plot_wireframe(x, y, z)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
# scaling hack
bbox_min = np.min([x, y, z])
bbox_max = np.max([x, y, z])
ax.auto_scale_xyz([bbox_min, bbox_max], [bbox_min, bbox_max], [bbox_min, bbox_max])
plt.show()
Here's how the result looks (figure omitted).
Rotating it around, it's nicely visible that the surface is indeed reflection-symmetric with respect to the z = 0 plane, which was evident from the equation.
You can change the n_theta and n_phi keyword arguments to the function to generate a grid with a different mesh. The fun thing is that you can take any scattered points lying on the unit sphere and plug them into the definition of r2 in the function get_ellipsoid_coordinates (as long as the array has a first dimension of size 3), and the output coordinates will have the same shape, but they will be transformed onto the actual ellipsoid; see the sketch below.
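A minimal sketch of that idea, reusing get_transforms from the code above (drawing from a 3d normal distribution and normalizing is just one convenient way to scatter points uniformly on the unit sphere):

import numpy as np

# scattered points on the unit sphere; first dimension of size 3, as required
n_points = 500
v = np.random.normal(size=(3, n_points))
r2 = v / np.linalg.norm(v, axis=0)

# transform them onto the ellipsoid, mirroring get_ellipsoid_coordinates
d, V, s = get_transforms(2, -1, 2, 3, -4, -3, 4)
r1 = d[:, None] * r2 - s[:, None]
x, y, z = V @ r1  # scattered points lying on the actual ellipsoid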
You can also use other libraries to visualize the surface, for instance mayavi where you can either plot the surface we just computed, or compare it with an isosurface which is built-in there.

Two-dimensional interpolation out of projection

Let's say I have a 3d object on a grid V = V(a, b, c). I want to interpolate V(a, b + alpha*d, c+d).
In other words, define f(d) = V(a, b + alpha*d, c + d). I want to approximate f. Importantly, I want to then apply optimize.root to the approximation, so I would appreciate efficient computation of f.
For example,
alpha = 0.5
aGrid = np.linspace(5, 10, 30)
bGrid = np.linspace(4, 7, 40)
cGrid = np.linspace(0.1, 0.5, 20)
A, B, C = np.meshgrid(aGrid, bGrid, cGrid, indexing='ij')
V = A**2 + B*C
# define initial a, b, c
idx = (7, 8, 9)
a, b, c = A[idx], B[idx], C[idx]
# so V(a, b, c) = V[idx]
A naive approach would be
g = scipy.interpolate.interp2d(bGrid, cGrid, V[7, ...])
f = lambda x: g(b + alpha*x, c + x)
and my ultimate goal:
constant = 10
err = lambda x: f(x) - constant
scipy.optimize.root(err, np.array([5]))
However, this all looks very messy and inefficient. Is there a more pythonic way of accomplishing this?
I have changed the notation to help my understanding of the question (I am used to physics notations). There is a scalar field V(x, y, z) in 3D space.
We define a parametric line in this 3D space:
f_{x0, y0, z0, v_x, v_y, v_z}(t) = (x0 + v_x*t, y0 + v_y*t, z0 + v_z*t)
It could be seen as the trajectory of a particle starting at the point (x0, y0, z0) and moving along a straight line with the velocity vector (v_x, v_y, v_z).
We are looking for the time t1 such that V(f(t1)) is equal to a specific given value V0. Is this the question being asked?
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from scipy.optimize import root_scalar
import matplotlib.pylab as plt

# build the field
aGrid = np.linspace(5, 10, 30)
bGrid = np.linspace(4, 7, 40)
cGrid = np.linspace(0.1, 0.5, 20)
A, B, C = np.meshgrid(aGrid, bGrid, cGrid, indexing='ij')
V = A**2 + B*C

# build a continuous field by linear interpolation of the gridded data
V_interpolated = RegularGridInterpolator((aGrid, bGrid, cGrid), V,
                                         bounds_error=False, fill_value=None)

# define the parametric line
idx = (7, 8, 9)
x0, y0, z0 = A[idx], B[idx], C[idx]
alpha = 0.5
v_x, v_y, v_z = 0, alpha, 1

def line(t):
    xyz = (x0 + v_x*t, y0 + v_y*t, z0 + v_z*t)
    return xyz

# plot V(x, y, z) along this line (to check: is there a unique solution?)
t_span = np.linspace(0, 10, 23)
V_along_the_line = V_interpolated(line(t_span))
plt.plot(t_span, V_along_the_line)
plt.xlabel('t'); plt.ylabel('V')

# find t such that V( line(t) ) == V1
V1 = 80
sol = root_scalar(lambda s: V_interpolated(line(s)) - V1,
                  x0=.0, x1=10)
print(sol)
# converged: True
# flag: 'converged'
# function_calls: 8
# iterations: 7
# root: 5.385594973846983

# get the coordinates of the solution point:
print("(x,y,z)_sol = ", line(sol.root))
# (6.206896551724138, 7.308182102308106, 5.675068658057509)

Fitting Data to a Square-root or Logarithmic Function [duplicate]

I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and Numpy and for polynomial fitting there is a function polyfit(). But I found no such functions for exponential and logarithmic fitting.
Are there any? Or how to solve it otherwise?
For fitting y = A + B log x, just fit y against (log x).
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607, 6.61867463])
# y ≈ 8.46 log(x) + 6.62
For fitting y = Ae^(Bx), taking the logarithm of both sides gives log y = log A + Bx. So fit (log y) against x.
Note that fitting (log y) as if it were linear will emphasize small values of y, causing large deviation for large y. This is because polyfit (linear regression) works by minimizing ∑ᵢ (ΔYᵢ)² = ∑ᵢ (Yᵢ − Ŷᵢ)². When Yᵢ = log yᵢ, the residuals are ΔYᵢ = Δ(log yᵢ) ≈ Δyᵢ / |yᵢ|. So even if polyfit makes a very bad decision for large y, the "divide-by-|y|" factor will compensate for it, causing polyfit to favor small values.
This could be alleviated by giving each entry a "weight" proportional to y. polyfit supports weighted least squares via the w keyword argument.
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> numpy.polyfit(x, numpy.log(y), 1)
array([ 0.10502711, -0.40116352])
# y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)
>>> numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
array([ 0.06009446, 1.41648096])
# y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)
Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for the exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights even if it provides better results.
Now, if you can use scipy, you could use scipy.optimize.curve_fit to fit any model without transformations.
For y = A + B log x the result is the same as the transformation method:
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> scipy.optimize.curve_fit(lambda t,a,b: a+b*numpy.log(t), x, y)
(array([ 6.61867467, 8.46295606]),
array([[ 28.15948002, -7.89609542],
[ -7.89609542, 2.9857172 ]]))
# y ≈ 6.62 + 8.46 log(x)
For y = Ae^(Bx), however, we can get a better fit, since curve_fit computes Δy directly (rather than Δ(log y)). But we need to provide an initial guess so curve_fit can reach the desired local minimum.
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y)
(array([ 5.60728326e-21, 9.99993501e-01]),
array([[ 4.14809412e-27, -1.45078961e-08],
[ -1.45078961e-08, 5.07411462e+10]]))
# oops, definitely wrong.
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y, p0=(4, 0.1))
(array([ 4.88003249, 0.05531256]),
array([[ 1.01261314e+01, -4.31940132e-02],
[ -4.31940132e-02, 1.91188656e-04]]))
# y ≈ 4.88 exp(0.0553 x). much better.
You can also fit a set of data to whatever function you like using curve_fit from scipy.optimize. For example, if you want to fit an exponential function (from the documentation):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    return a * np.exp(-b * x) + c

x = np.linspace(0, 4, 50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
And then if you want to plot, you could do:
plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
(Note: the * in front of popt when you plot will expand out the terms into the a, b, and c that func is expecting.)
I was having some trouble with this, so let me be very explicit so noobs like me can understand.
Let's say that we have a data file or something like that:
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym

"""
Generate some data, let's imagine that you already have this.
"""
x = np.linspace(0, 3, 50)
y = np.exp(x)

"""
Plot your data
"""
plt.plot(x, y, 'ro', label="Original Data")

"""
brute force to avoid errors
"""
x = np.array(x, dtype=float)  # transform your data into a numpy array of floats
y = np.array(y, dtype=float)  # so that curve_fit can work

"""
Create a function to fit to your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data.
"""
def func(x, a, b, c, d):
    return a*x**3 + b*x**2 + c*x + d

"""
Make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)

"""
The result is:
popt[0] = a, popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print("a = %s, b = %s, c = %s, d = %s" % (popt[0], popt[1], popt[2], popt[3]))

"""
Use sympy to generate the LaTeX syntax of the function
"""
xs = sym.Symbol(r'\lambda')
tex = sym.latex(func(xs, *popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' % (tex), fontsize=16)

"""
Print the coefficients and plot the function.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve")  # same as the commented line below
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")

plt.legend(loc='upper left')
plt.show()
the result is:
a = 0.849195983017 , b = -1.18101681765, c = 2.24061176543, d = 0.816643894816
Here's a linearization option on simple data that uses tools from scikit-learn.
Given
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer

np.random.seed(123)

# General Functions
def func_exp(x, a, b, c):
    """Return values from a general exponential function."""
    return a * np.exp(b * x) + c

def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c

# Helper
def generate_data(func, *args, jitter=0):
    """Return a tuple of arrays with random data along a general function."""
    xs = np.linspace(1, 5, 50)
    ys = func(xs, *args)
    noise = jitter * np.random.normal(size=len(xs)) + jitter
    xs = xs.reshape(-1, 1)  # xs[:, np.newaxis]
    ys = (ys + noise).reshape(-1, 1)
    return xs, ys

transformer = FunctionTransformer(np.log, validate=True)
Code
Fit exponential data
# Data
x_samp, y_samp = generate_data(func_exp, 2.5, 1.2, 0.7, jitter=3)
y_trans = transformer.fit_transform(y_samp) # 1
# Regression
regressor = LinearRegression()
results = regressor.fit(x_samp, y_trans) # 2
model = results.predict
y_fit = model(x_samp)
# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, np.exp(y_fit), "k--", label="Fit") # 3
plt.title("Exponential Fit")
Fit log data
# Data
x_samp, y_samp = generate_data(func_log, 2.5, 1.2, 0.7, jitter=0.15)
x_trans = transformer.fit_transform(x_samp) # 1
# Regression
regressor = LinearRegression()
results = regressor.fit(x_trans, y_samp) # 2
model = results.predict
y_fit = model(x_trans)
# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, y_fit, "k--", label="Fit") # 3
plt.title("Logarithmic Fit")
Details
General Steps
1. Apply a log operation to data values (x, y or both)
2. Regress the data to a linearized model
3. Plot by "reversing" any log operations (with np.exp()) and fit to original data
Assuming our data follows an exponential trend, a general equation+ may be y = A * exp(B * x) + C.
We can linearize the latter equation (e.g. y = intercept + slope * x) by taking the log, which for C = 0 gives log(y) = log(A) + B * x.
Given a linearized equation++ and the regression parameters, we could calculate:
A via the intercept (ln(A))
B via the slope (B)
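A minimal sketch of that back-calculation, reusing the names from the exponential-fit code above (the recovery is only approximate here, since the generated data has a non-zero offset c and added noise):

import numpy as np

# after: results = regressor.fit(x_samp, y_trans)
slope = results.coef_.item()           # B
intercept = results.intercept_.item()  # ln(A)

B = slope
A = np.exp(intercept)
print(A, B)  # roughly comparable to the generating parameters (2.5, 1.2)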
Summary of Linearization Techniques
Relationship | Example | General Eqn. | Altered Var. | Linearized Eqn.
-------------|------------|----------------------|----------------|------------------------------------------
Linear | x | y = B * x + C | - | y = C + B * x
Logarithmic | log(x) | y = A * log(B*x) + C | log(x) | y = C + A * (log(B) + log(x))
Exponential | 2**x, e**x | y = A * exp(B*x) + C | log(y) | log(y-C) = log(A) + B * x
Power | x**2 | y = B * x**N + C | log(x), log(y) | log(y-C) = log(B) + N * log(x)
+Note: linearizing exponential functions works best when the noise is small and C=0. Use with caution.
++Note: while altering x data helps linearize exponential data, altering y data helps linearize log data.
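For instance, here is a minimal sketch of the power-law row of the table (made-up noiseless data, assuming C = 0): fit log(y) against log(x) with polyfit, then recover B and N.

import numpy as np

# made-up power-law data y = B * x**N with B = 3, N = 2
x = np.array([1., 2., 3., 4., 5.])
y = 3 * x**2

# linearize: log(y) = log(B) + N * log(x)
N, logB = np.polyfit(np.log(x), np.log(y), 1)
print(N, np.exp(logB))  # ~2.0, ~3.0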
Well I guess you can always use:
np.log --> natural log
np.log10 --> base 10
np.log2 --> base 2
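Since these only differ by a constant factor (log2(x) = ln(x)/ln(2)), the choice of base merely rescales the fitted slope; a quick sketch, reusing the data from the log-fit example above:

import numpy as np

x = np.array([1., 7., 20., 50., 79.])
y = np.array([10., 19., 30., 35., 51.])

b_ln, a_ln = np.polyfit(np.log(x), y, 1)
b_2, a_2 = np.polyfit(np.log2(x), y, 1)
print(np.isclose(b_2, b_ln * np.log(2)))  # True: the slopes differ by a factor of ln(2)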
Slightly modifying IanVS's answer:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b, c):
    #return a * np.exp(-b * x) + c
    return a * np.log(b * x) + c

x = np.linspace(1, 5, 50)  # changed boundary conditions to avoid division by 0
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)

plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
This results in the following graph (figure omitted).
We demonstrate features of lmfit while solving both problems.
Given
import lmfit
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

np.random.seed(123)

# General Functions
def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c

# Data
x_samp = np.linspace(1, 5, 50)
_noise = np.random.normal(size=len(x_samp), scale=0.06)
y_samp = 2.5 * np.exp(1.2 * x_samp) + 0.7 + _noise
y_samp2 = 2.5 * np.log(1.2 * x_samp) + 0.7 + _noise
Code
Approach 1 - lmfit Model
Fit exponential data
regressor = lmfit.models.ExponentialModel() # 1
initial_guess = dict(amplitude=1, decay=-1) # 2
results = regressor.fit(y_samp, x=x_samp, **initial_guess)
y_fit = results.best_fit
plt.plot(x_samp, y_samp, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
Approach 2 - Custom Model
Fit log data
regressor = lmfit.Model(func_log) # 1
initial_guess = dict(a=1, b=.1, c=.1) # 2
results = regressor.fit(y_samp2, x=x_samp, **initial_guess)
y_fit = results.best_fit
plt.plot(x_samp, y_samp2, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
Details
1. Choose a regression class
2. Supply named, initial guesses that respect the function's domain
You can determine the inferred parameters from the regressor object. Example:
regressor.param_names
# ['decay', 'amplitude']
To make predictions, use the ModelResult.eval() method.
model = results.eval
y_pred = model(x=np.array([1.5]))
Note: the ExponentialModel() follows a decay function, which accepts two parameters, one of which is negative.
See also ExponentialGaussianModel(), which accepts more parameters.
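Regarding the decay note above: lmfit's ExponentialModel is amplitude * exp(-x / decay), so a growing exponential a * exp(b * x) corresponds to a negative decay = -1/b; a quick sketch of the correspondence:

import numpy as np

# lmfit's ExponentialModel form: amplitude * exp(-x / decay)
a, b = 2.5, 1.2  # generating parameters used above
amplitude, decay = a, -1 / b

x = np.linspace(1, 5, 5)
print(np.allclose(a * np.exp(b * x), amplitude * np.exp(-x / decay)))  # True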
Install the library via > pip install lmfit.
Wolfram has a closed-form solution for fitting an exponential. They also have similar solutions for fitting a logarithmic and a power law.
I found this to work better than scipy's curve_fit, especially when you don't have data "near zero". Here is an example:
import numpy as np
import matplotlib.pyplot as plt

# Fit the function y = A * exp(B * x) to the data
# returns (A, B)
# From: https://mathworld.wolfram.com/LeastSquaresFittingExponential.html
def fit_exp(xs, ys):
    S_x2_y = 0.0
    S_y_lny = 0.0
    S_x_y = 0.0
    S_x_y_lny = 0.0
    S_y = 0.0
    for (x, y) in zip(xs, ys):
        S_x2_y += x * x * y
        S_y_lny += y * np.log(y)
        S_x_y += x * y
        S_x_y_lny += x * y * np.log(y)
        S_y += y
    a = (S_x2_y * S_y_lny - S_x_y * S_x_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    b = (S_y * S_x_y_lny - S_x_y * S_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    return (np.exp(a), b)

xs = [33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
ys = [3187, 3545, 4045, 4447, 4872, 5660, 5983, 6254, 6681, 7206]

(A, B) = fit_exp(xs, ys)

plt.figure()
plt.plot(xs, ys, 'o-', label='Raw Data')
plt.plot(xs, [A * np.exp(B * x) for x in xs], 'o-', label='Fit')
plt.title('Exponential Fit Test')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

How to fit a two-term exponential in python?

I need to fit some data with a two-term exponential following the equation a*exp(b*x)+c*exp(d*x). In matlab, it's as easy as changing the 1 to a 2 in polyfit to go from a one-term exponential to a two-term. I haven't found a simple solution to do it in python and was wondering if there even was one? I tried using curve_fit but it is giving me lots of trouble and after scouring the internet I still haven't found anything useful. Any help is appreciated!
Yes you can use curve_fit from scipy. Here is an example for your specific fit function.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

x = np.linspace(0, 4, 50)  # Example data

def func(x, a, b, c, d):
    return a * np.exp(b * x) + c * np.exp(d * x)

y = func(x, 2.5, 1.3, 0.5, 0.5)  # Example exponential data

# Here you give the initial parameters for a, b, c, d which Python then
# iterates over to find the best fit
popt, pcov = curve_fit(func, x, y, p0=(1.0, 1.0, 1.0, 1.0))

print(popt)  # This contains your four best-fit parameters

p1 = popt[0]  # This is your a
p2 = popt[1]  # This is your b
p3 = popt[2]  # This is your c
p4 = popt[3]  # This is your d

residuals = y - func(x, p1, p2, p3, p4)
fres = sum((residuals**2) / func(x, p1, p2, p3, p4))  # The chi-square of your fit
print(fres)

""" Now if you need to plot, perform the code below """
curvey = func(x, p1, p2, p3, p4)  # This is your y axis fit-line

plt.plot(x, curvey, 'red', label='The best-fit line')
plt.scatter(x, y, c='b', label='The data points')
plt.legend(loc='best')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
You can do it with leastsq. Something like:
from numpy import exp
from scipy.optimize import leastsq

## regression function
def _exp(a, b, c, d):
    """
    Exponential function y = a * exp(b * x) + c * exp(d * x)
    """
    return lambda x: a * exp(b * x) + c * exp(d * x)

## interpolation
def interpolate(x, df, fun=_exp):
    """
    Interpolate Y from X based on df, a dataframe with columns 'x' and 'y'.
    """
    resid = lambda p, x, y: y - fun(*p)(x)
    ls = leastsq(resid, [1.0, 1.0, 1.0, 1.0], args=(df['x'], df['y']))
    a, b, c, d = ls[0]
    y = fun(a, b, c, d)(x)
    return y
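A minimal usage sketch (the data frame here is made up for illustration):

import numpy as np
import pandas as pd

# made-up sample points following a two-term exponential
xs = np.linspace(0, 2, 30)
df = pd.DataFrame({'x': xs, 'y': 2.0 * np.exp(0.8 * xs) + 0.5 * np.exp(0.3 * xs)})

# evaluate the fitted curve at new x locations
x_new = np.array([0.25, 1.25, 1.75])
print(interpolate(x_new, df))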
