I am trying to use a system of ODEs to fit a curve using Python.
However, both of the equations require calling on themselves at any time point t.
If our two equations are
S'(t) = (a-b-c) * S(t)
R'(t) = (a-b) * R(t) + mN(t)
how would I pass these through to odeint?
So far this code is throwing the error
UnboundLocalError: local variable 'R' referenced before assignment
I am not sure how to get around this. Any suggestions or help would be greatly appreciated. Thanks!
from scipy.optimize import minimize
from scipy.optimize import curve_fit
from scipy import integrate
import numpy as np
import pylab as plt
xdata = np.array([0,1,2,3,6,7,8,9,10,13,14,15,16,17,22,23,24,27,28,29,30,31,34,35,36,37,38,41,42,43,44,45,48,49,50,51,52])
ydata = np.array([100, 97.2912199,91.08273896,86.28651363,70.58056853,49.00137427,47.50069587,48.22363999,47.22288896,42.59400221,29.35959158,30.47252256,30.85180297,33.44706703,41.93176936,44.2826702,46.51306081,57.32118321,62.58641733,53.23377307,50.4804287,51.59281055,67.82561566,61.28460679,65.49713333,67.99324793,74.55147631,104.5586471,98.97800534,94.31637549,99.0441014,109.6035876,151.0500311,135.2589923,135.2083231,145.1832811,169.0801019])
xdata = np.array(xdata, dtype=float)
ydata = np.array(ydata, dtype=float)
def model(y, x, a, b, c, m):
    S = (a - b - c) * y[0]
    R = (a - b) * R + x*(m *S)
    return S, R

def fit_odeint1(x, a, b, c, m):
    return integrate.odeint(model, (S0, R0), x, args=(a, b, c, m))[:,0]

def fit_odeint2(x, a, b, c, m):
    return integrate.odeint(model, (S0, R0), x, args=(a, b, c, m))[:,1]
S0 = ydata[0]
R0 = 0
popt, pcov = curve_fit(fit_odeint1, xdata, ydata)
fitted1 = fit_odeint1(xdata, *popt)
absError = fitted1 - ydata
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(ydata))
print("RMSE "+str(RMSE))
print("R^2 "+str(Rsquared))
print()
popt, pcov = curve_fit(fit_odeint2, xdata, ydata)
fitted2 = fit_odeint2(xdata, *popt)
absError = fitted2 - ydata
SE = np.square(absError) # squared errors
MSE = np.mean(SE) # mean squared errors
RMSE = np.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (np.var(absError) / np.var(ydata))
print("RMSE "+str(RMSE))
print("R^2 "+str(Rsquared))
print()
plt.plot(xdata, ydata, 'D', color='purple')
plt.plot(xdata, fitted1, '--', color='lightcoral')
plt.plot(xdata, fitted2, '--', color='mediumaquamarine')
plt.show()
The UnboundLocalError happens because of your model function: you assign to the variable R, but you also use R inside the right-hand expression, before it has ever been given a value:
def model(y, x, a, b, c, m):
    S = (a - b - c) * y[0]
    R = (a - b) * R + x*(m *S)
    return S, R
The concept you are missing is that, in the equation R'(t) = (a-b) * R(t) + mN(t), R'(t) is different from R(t), so you should use different variables for them: odeint hands you the current values S(t) and R(t) in y, and your function should return the derivatives S'(t) and R'(t).
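A minimal corrected sketch (assuming N(t) is S(t), as your code suggests, and that the extra factor of x in your original line was accidental, since the stated equation has no explicit t term):

def model(y, x, a, b, c, m):
    S, R = y                    # current values: y[0] = S(t), y[1] = R(t)
    dS = (a - b - c) * S        # S'(t) = (a - b - c) * S(t)
    dR = (a - b) * R + m * S    # R'(t) = (a - b) * R(t) + m * S(t)
    return dS, dR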
I have to define two separate functions z(x, mu, c) and landau(x, A, mu, c).
z = (x-mu)/c and landau = 1.64872*A*np.exp(-0.5(z+np.exp(-z)))
As they share variables, I have tried to define z inside the definition of landau, both as a function and not as a function. I have also tried to define z as a function outside the definition of landau. However, nothing I try seems to work. Either Python tells me "'float' object is not callable" or "bad operand type for unary -: 'function'". Is there a quick fix for this?
landau(1,1,1,1) should give an answer that's roughly equal to 1
def landau(x, A, mu, c): # Landau function with variables x, A, c and mu
    # A = amplitude
    # c = scale parameter
    # mu = location parameter
    def z(x, mu, c):
        return (x-mu)/c
    return 1.64872*A*np.exp(-0.5(z(x, mu, c)+np.exp(-z(x, mu, c))))
You missed a * in -0.5(z(x...; it should be -0.5 * (z(x.... At least I'm assuming it's supposed to be a multiplication: as written, Python tries to call the float -0.5 like a function, which is where "'float' object is not callable" comes from.
import numpy as np

def landau(x, A, mu, c): # Landau function with variables x, A, c and mu
    # A = amplitude
    # c = scale parameter
    # mu = location parameter
    return 1.64872 * A * np.exp(-0.5 * (z(x, mu, c) + np.exp(-z(x, mu, c))))

def z(x, mu, c):
    return (x - mu) / c
landau(1, 1, 1, 1)
0.9999992292814129
I am trying to fit a Voigt function to an absorption profile, but it comes out way too broad. I do not quite understand why, as the sigma that I put in is quite small (the same as for the Gaussian fit, which does a nice job). Does anyone know how I can force a better fit?
My code is below.
def Gauss(x, y0, a, x0, sigma):
    return y0 + a * np.exp(-(x - x0)**2 / (2 * sigma**2))

def Voigt(x, x0, a, sigma, gamma):
    """
    Return the Voigt line shape at x with Lorentzian component HWHM gamma
    and Gaussian component standard deviation sigma.
    """
    #sigma = alpha / np.sqrt(2 * np.log(2))
    return a * np.real(wofz((x - x0 + 1j*gamma)/sigma/np.sqrt(2))) / sigma / np.sqrt(2*np.pi)

def fitgauss(x, y):
    mean = sum(x * y) / sum(y)
    minx = x[np.where(y == np.min(y))][0]
    sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
    print(sigma, mean, -(max(y)-min(y)), minx)
    try:
        #popt, pcov = curve_fit(Gauss, x, y, p0=[max(y), -(max(y)-min(y)), mean, sigma])
        popt, pcov = curve_fit(Voigt, x, y, p0=[minx, -(max(y)-min(y)), sigma, 1])
    except RuntimeError:
        print("Error - curve_fit failed, putting in 0.")
    return popt
y= np.array([22910.756 , 17401.46 , 9381.714 , 6923.935 , 6015.0757,
5792.314 , 5454.933 , 5244.275 , 5170.7695, 5011.3506,
5344.995 , 6036.098 , 12318.398])
xx = np.array([6561.49999988, 6562. , 6562.19999999, 6562.39999998,
6562.69999999, 6562.84999999, 6563. , 6563.15000001,
6563.30000001, 6563.60000002, 6563.80000001, 6564. ,
6564.50000012])
popt = fitgauss(xx,y)
plt.plot(xx, y, 'b+:', label='data')
plt.plot(xx, Voigt(xx, *popt), 'r-', label='fit')
plt.legend()
plt.show()
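One hedged guess, not a definitive fix: unlike the Gauss model, the Voigt model above has no baseline offset y0, so the optimizer may broaden the profile to soak up the continuum level. A minimal sketch with a hypothetical Voigt_offset model that also fits the baseline, reusing xx and y from above:

from scipy.special import wofz

def Voigt_offset(x, y0, x0, a, sigma, gamma):
    # same Voigt shape as above, plus a constant baseline y0
    return y0 + a * np.real(wofz((x - x0 + 1j*gamma)/sigma/np.sqrt(2))) / sigma / np.sqrt(2*np.pi)

popt, pcov = curve_fit(Voigt_offset, xx, y,
                       p0=[max(y), 6563.0, -(max(y)-min(y)), 0.5, 0.5])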
I have a set of data and I want to compare which line describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and NumPy, and for polynomial fitting there is the function polyfit(). But I found no such functions for exponential and logarithmic fitting.
Are there any? Or how else can I solve this?
For fitting y = A + B log x, just fit y against (log x).
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607, 6.61867463])
# y ≈ 8.46 log(x) + 6.62
For fitting y = Ae^(Bx), taking the logarithm of both sides gives log y = log A + Bx. So fit (log y) against x.
Note that fitting (log y) as if it were linear will emphasize small values of y, causing large deviation for large y. This is because polyfit (linear regression) works by minimizing ∑ᵢ (ΔYᵢ)² = ∑ᵢ (Yᵢ − Ŷᵢ)². When Yᵢ = log yᵢ, the residuals ΔYᵢ = Δ(log yᵢ) ≈ Δyᵢ / |yᵢ|. So even if polyfit makes a very bad decision for large y, the "divide-by-|y|" factor will compensate for it, causing polyfit to favor small values.
This can be alleviated by giving each entry a "weight" proportional to y: polyfit supports weighted least squares via the w keyword argument.
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> numpy.polyfit(x, numpy.log(y), 1)
array([ 0.10502711, -0.40116352])
# y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)
>>> numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
array([ 0.06009446, 1.41648096])
# y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)
Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for their exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights, even if weighting gives better results.
Now, if you can use scipy, you could use scipy.optimize.curve_fit to fit any model without transformations.
For y = A + B log x the result is the same as the transformation method:
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> scipy.optimize.curve_fit(lambda t,a,b: a+b*numpy.log(t), x, y)
(array([ 6.61867467, 8.46295606]),
array([[ 28.15948002, -7.89609542],
[ -7.89609542, 2.9857172 ]]))
# y ≈ 6.62 + 8.46 log(x)
For y = Ae^(Bx), however, we can get a better fit, since curve_fit minimizes Δy directly rather than Δ(log y). But we need to provide an initial guess so curve_fit can reach the desired local minimum.
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y)
(array([ 5.60728326e-21, 9.99993501e-01]),
array([[ 4.14809412e-27, -1.45078961e-08],
[ -1.45078961e-08, 5.07411462e+10]]))
# oops, definitely wrong.
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y, p0=(4, 0.1))
(array([ 4.88003249, 0.05531256]),
array([[ 1.01261314e+01, -4.31940132e-02],
[ -4.31940132e-02, 1.91188656e-04]]))
# y ≈ 4.88 exp(0.0553 x). much better.
You can also fit a set of data to whatever function you like using curve_fit from scipy.optimize. For example, if you want to fit an exponential function (from the documentation):
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    return a * np.exp(-b * x) + c
x = np.linspace(0,4,50)
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
And then if you want to plot, you could do:
plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
(Note: the * in front of popt when you plot will expand out the terms into the a, b, and c that func is expecting.)
I was having some trouble with this, so let me be very explicit so noobs like me can understand.
Let's say we have a data file or something like that:
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
import sympy as sym
"""
Generate some data, let's imagine that you already have this.
"""
x = np.linspace(0, 3, 50)
y = np.exp(x)
"""
Plot your data
"""
plt.plot(x, y, 'ro',label="Original Data")
"""
brute force to avoid errors
"""
x = np.array(x, dtype=float) #transform your data in a numpy array of floats
y = np.array(y, dtype=float) #so the curve_fit can work
"""
create a function to fit with your data. a, b, c and d are the coefficients
that curve_fit will calculate for you.
In this part you need to guess and/or use mathematical knowledge to find
a function that resembles your data
"""
def func(x, a, b, c, d):
    return a*x**3 + b*x**2 + c*x + d
"""
make the curve_fit
"""
popt, pcov = curve_fit(func, x, y)
"""
The result is:
popt[0] = a , popt[1] = b, popt[2] = c and popt[3] = d of the function,
so f(x) = popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3].
"""
print "a = %s , b = %s, c = %s, d = %s" % (popt[0], popt[1], popt[2], popt[3])
"""
Use sympy to generate the LaTeX syntax of the function
"""
xs = sym.Symbol(r'\lambda')
tex = sym.latex(func(xs,*popt)).replace('$', '')
plt.title(r'$f(\lambda)= %s$' %(tex),fontsize=16)
"""
Print the coefficients and plot the function.
"""
plt.plot(x, func(x, *popt), label="Fitted Curve") #same as line above \/
#plt.plot(x, popt[0]*x**3 + popt[1]*x**2 + popt[2]*x + popt[3], label="Fitted Curve")
plt.legend(loc='upper left')
plt.show()
the result is:
a = 0.849195983017 , b = -1.18101681765, c = 2.24061176543, d = 0.816643894816
Here's a linearization option on simple data that uses tools from scikit-learn.
Given
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer
np.random.seed(123)
# General Functions
def func_exp(x, a, b, c):
    """Return values from a general exponential function."""
    return a * np.exp(b * x) + c

def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c

# Helper
def generate_data(func, *args, jitter=0):
    """Return a tuple of arrays with random data along a general function."""
    xs = np.linspace(1, 5, 50)
    ys = func(xs, *args)
    noise = jitter * np.random.normal(size=len(xs)) + jitter
    xs = xs.reshape(-1, 1)  # xs[:, np.newaxis]
    ys = (ys + noise).reshape(-1, 1)
    return xs, ys
transformer = FunctionTransformer(np.log, validate=True)
Code
Fit exponential data
# Data
x_samp, y_samp = generate_data(func_exp, 2.5, 1.2, 0.7, jitter=3)
y_trans = transformer.fit_transform(y_samp) # 1
# Regression
regressor = LinearRegression()
results = regressor.fit(x_samp, y_trans) # 2
model = results.predict
y_fit = model(x_samp)
# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, np.exp(y_fit), "k--", label="Fit") # 3
plt.title("Exponential Fit")
Fit log data
# Data
x_samp, y_samp = generate_data(func_log, 2.5, 1.2, 0.7, jitter=0.15)
x_trans = transformer.fit_transform(x_samp) # 1
# Regression
regressor = LinearRegression()
results = regressor.fit(x_trans, y_samp) # 2
model = results.predict
y_fit = model(x_trans)
# Visualization
plt.scatter(x_samp, y_samp)
plt.plot(x_samp, y_fit, "k--", label="Fit") # 3
plt.title("Logarithmic Fit")
Details
General Steps
Apply a log operation to data values (x, y or both)
Regress the data to a linearized model
Plot by "reversing" any log operations (with np.exp()) and fit to original data
Assuming our data follows an exponential trend, a general equation+ may be y = A * exp(B * x) + C.
We can linearize the latter equation (e.g. y = intercept + slope * x) by taking the log: log(y - C) = log(A) + B * x.
Given a linearized equation++ and the regression parameters, we could calculate:
A via the intercept (ln(A))
B via the slope (B)
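For the exponential fit above, that looks like the following sketch (here results means the regression from the "Fit exponential data" block, and C is assumed to be 0):

B = results.coef_[0][0]            # slope of the linearized fit -> B
A = np.exp(results.intercept_[0])  # intercept is ln(A), so A = exp(intercept)
print(A, B)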
Summary of Linearization Techniques
Relationship | Example | General Eqn. | Altered Var. | Linearized Eqn.
-------------|------------|----------------------|----------------|------------------------------------------
Linear | x | y = B * x + C | - | y = C + B * x
Logarithmic | log(x) | y = A * log(B*x) + C | log(x) | y = C + A * (log(B) + log(x))
Exponential | 2**x, e**x | y = A * exp(B*x) + C | log(y) | log(y-C) = log(A) + B * x
Power | x**2 | y = B * x**N + C | log(x), log(y) | log(y-C) = log(B) + N * log(x)
+Note: linearizing exponential functions works best when the noise is small and C=0. Use with caution.
++Note: while altering x data helps linearize exponential data, altering y data helps linearize log data.
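The Power row is not demonstrated above; here is a minimal sketch along the same lines, on hypothetical noise-free data with C = 0:

# Hypothetical power-law data: y = 3 * x**2  (B = 3, N = 2, C = 0)
x_samp = np.linspace(1, 5, 50).reshape(-1, 1)
y_samp = 3 * x_samp**2

# Alter both variables: log(y) = log(B) + N * log(x)
results = LinearRegression().fit(np.log(x_samp), np.log(y_samp))
N = results.coef_[0][0]            # exponent N from the slope
B = np.exp(results.intercept_[0])  # coefficient B from the intercept
print(B, N)                        # ~3.0, ~2.0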
Well I guess you can always use:
np.log --> natural log
np.log10 --> base 10
np.log2 --> base 2
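For instance, the transformation trick for y = A + B log x works the same way in any base; only the slope rescales (a quick sketch):

import numpy as np
x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])
B, A = np.polyfit(np.log10(x), y, 1)  # fit y = A + B*log10(x)
print(A, B)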
Slightly modifying IanVS's answer:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def func(x, a, b, c):
    #return a * np.exp(-b * x) + c
    return a * np.log(b * x) + c
x = np.linspace(1,5,50) # changed boundary conditions to avoid division by 0
y = func(x, 2.5, 1.3, 0.5)
yn = y + 0.2*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn)
plt.figure()
plt.plot(x, yn, 'ko', label="Original Noised Data")
plt.plot(x, func(x, *popt), 'r-', label="Fitted Curve")
plt.legend()
plt.show()
This results in the following graph: [plot of the noisy data with the fitted logarithmic curve]
We demonstrate features of lmfit while solving both problems.
Given
import lmfit
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
np.random.seed(123)
# General Functions
def func_log(x, a, b, c):
    """Return values from a general log function."""
    return a * np.log(b * x) + c
# Data
x_samp = np.linspace(1, 5, 50)
_noise = np.random.normal(size=len(x_samp), scale=0.06)
y_samp = 2.5 * np.exp(1.2 * x_samp) + 0.7 + _noise
y_samp2 = 2.5 * np.log(1.2 * x_samp) + 0.7 + _noise
Code
Approach 1 - lmfit Model
Fit exponential data
regressor = lmfit.models.ExponentialModel() # 1
initial_guess = dict(amplitude=1, decay=-1) # 2
results = regressor.fit(y_samp, x=x_samp, **initial_guess)
y_fit = results.best_fit
plt.plot(x_samp, y_samp, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
Approach 2 - Custom Model
Fit log data
regressor = lmfit.Model(func_log) # 1
initial_guess = dict(a=1, b=.1, c=.1) # 2
results = regressor.fit(y_samp2, x=x_samp, **initial_guess)
y_fit = results.best_fit
plt.plot(x_samp, y_samp2, "o", label="Data")
plt.plot(x_samp, y_fit, "k--", label="Fit")
plt.legend()
Details
Choose a regression class
Supply named, initial guesses that respect the function's domain
You can determine the inferred parameters from the regressor object. Example:
regressor.param_names
# ['decay', 'amplitude']
To make predictions, use the ModelResult.eval() method.
model = results.eval
y_pred = model(x=np.array([1.5]))
Note: lmfit's ExponentialModel() follows a decay function, amplitude * exp(-x/decay), so fitting growing data needs a negative decay, which is why the initial guess above uses decay=-1.
See also ExponentialGaussianModel(), which accepts more parameters.
Install the library via > pip install lmfit.
Wolfram has a closed form solution for fitting an exponential. They also have similar solutions for fitting a logarithmic and power law.
I found this to work better than scipy's curve_fit. Especially when you don't have data "near zero". Here is an example:
import numpy as np
import matplotlib.pyplot as plt
# Fit the function y = A * exp(B * x) to the data
# returns (A, B)
# From: https://mathworld.wolfram.com/LeastSquaresFittingExponential.html
def fit_exp(xs, ys):
    S_x2_y = 0.0
    S_y_lny = 0.0
    S_x_y = 0.0
    S_x_y_lny = 0.0
    S_y = 0.0
    for (x, y) in zip(xs, ys):
        S_x2_y += x * x * y
        S_y_lny += y * np.log(y)
        S_x_y += x * y
        S_x_y_lny += x * y * np.log(y)
        S_y += y
    a = (S_x2_y * S_y_lny - S_x_y * S_x_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    b = (S_y * S_x_y_lny - S_x_y * S_y_lny) / (S_y * S_x2_y - S_x_y * S_x_y)
    return (np.exp(a), b)
xs = [33, 34, 35, 36, 37, 38, 39, 40, 41, 42]
ys = [3187, 3545, 4045, 4447, 4872, 5660, 5983, 6254, 6681, 7206]
(A, B) = fit_exp(xs, ys)
plt.figure()
plt.plot(xs, ys, 'o-', label='Raw Data')
plt.plot(xs, [A * np.exp(B *x) for x in xs], 'o-', label='Fit')
plt.title('Exponential Fit Test')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend(loc='best')
plt.tight_layout()
plt.show()
I need to fit some data with a two-term exponential following the equation a*exp(b*x)+c*exp(d*x). In MATLAB, it's as easy as changing the 1 to a 2 in polyfit to go from a one-term exponential to a two-term one. I haven't found a simple solution to do it in Python and was wondering if there even is one? I tried using curve_fit, but it is giving me lots of trouble, and after scouring the internet I still haven't found anything useful. Any help is appreciated!
Yes you can use curve_fit from scipy. Here is an example for your specific fit function.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x = np.linspace(0,4,50) # Example data
def func(x, a, b, c, d):
    return a * np.exp(b * x) + c * np.exp(d * x)
y = func(x, 2.5, 1.3, 0.5, 0.5) # Example exponential data
# Here you give the initial parameters for a, b, c and d, which Python then
# iterates over to find the best fit
popt, pcov = curve_fit(func, x, y, p0=(1.0, 1.0, 1.0, 1.0))
print(popt) # This contains your four best-fit parameters
p1 = popt[0] # This is your a
p2 = popt[1] # This is your b
p3 = popt[2] # This is your c
p4 = popt[3] # This is your d
residuals = y - func(x,p1,p2,p3,p4)
fres = sum( (residuals**2)/func(x,p1,p2,p3,p4) ) # The chi-square of your fit
print(fres)
""" Now if you need to plot, perform the code below """
curvey = func(x,p1,p2,p3,p4) # This is your y axis fit-line
plt.plot(x, curvey, 'red', label='The best-fit line')
plt.scatter(x,y, c='b',label='The data points')
plt.legend(loc='best')
plt.xlabel('x')
plt.ylabel('y')
plt.show()
You can do it with leastsq. Something like:
from numpy import log, exp
from scipy.optimize import leastsq
## regression function
def _exp(a, b, c, d):
    """
    Exponential function y = a * exp(b * x) + c * exp(d * x)
    """
    return lambda x: a * exp(b * x) + c * exp(d * x)

## interpolation
def interpolate(x, df, fun=_exp):
    """
    Interpolate Y from X based on df, a dataframe with columns 'x' and 'y'.
    """
    resid = lambda p, x, y: y - fun(*p)(x)
    ls = leastsq(resid, [1.0, 1.0, 1.0, 1.0], args=(df['x'], df['y']))
    a, b, c, d = ls[0]
    y = fun(a, b, c, d)(x)
    return y
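A hypothetical usage sketch (assuming pandas for the dataframe; convergence from the all-ones starting guess is not guaranteed for every dataset):

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.linspace(0, 4, 50)})
df['y'] = 2.5 * np.exp(1.3 * df['x']) + 0.5 * np.exp(0.5 * df['x'])

# evaluate the fitted two-term exponential at new x values
print(interpolate(np.array([1.5, 2.5]), df))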