I am trying to fit a progression of Gaussian peaks to a spectral lineshape.
The progression is a summation of N evenly spaced Gaussian peaks. When coded as a function, the formula for N=1 looks like this:
A * ((e0-i*hf)/e0)**3 * ((S**i)/math.factorial(i)) * np.exp(-4*np.log(2)*((x-e0+i*hf)/fwhm)**2)
where A, e0, hf, S and fwhm are to be determined from the fit with some good initial guesses.
Importantly, the parameter i starts at 0 and is incremented by 1 for every additional component.
So, for N = 3 the expression would take the form:
A * ((e0-0*hf)/e0)**3 * ((S**0)/math.factorial(0)) * np.exp(-4*np.log(2)*((x-e0+0*hf)/fwhm)**2) +
A * ((e0-1*hf)/e0)**3 * ((S**1)/math.factorial(1)) * np.exp(-4*np.log(2)*((x-e0+1*hf)/fwhm)**2) +
A * ((e0-2*hf)/e0)**3 * ((S**2)/math.factorial(2)) * np.exp(-4*np.log(2)*((x-e0+2*hf)/fwhm)**2)
All the parameters except i are constant for every component in the summation, and this is intended. i is changing in a controlled way depending on the number of parameters.
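In LaTeX notation, the model is the summation

y(x) = \sum_{i=0}^{N-1} A \left(\frac{e_0 - i\,hf}{e_0}\right)^3 \frac{S^i}{i!} \exp\left(-4\ln 2\left(\frac{x - e_0 + i\,hf}{\mathrm{fwhm}}\right)^2\right)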
I am using curve_fit. One way to code the fitting routine would be to explicitly define the expression for any reasonable N and just use the appropriate one. Here it would be 5 or 6, depending on the spacing, which is determined by hf. I could just define a long function with N components, writing the appropriate i value into each component. I understand how to do that (and did). But I would like to code this more intelligently. My goal is to write a function that will accept any value of N, add the appropriate number of components as described above, compute the expression while incrementing i properly, and return the result.
I have attempted a variety of things. My main hurdle is that I don't know how to tell the program to use a particular N and the corresponding values of i. Finally, after some searching, I thought I had found a good way to code it with a lambda function:
from scipy.optimize import curve_fit
import math
import numpy as np

def fullfunc(x, p, n):
    def func(x, A, e0, hf, S, fwhm, i):
        return A * ((e0-i*hf)/e0)**3 * ((S**i)/math.factorial(i)) * np.exp(-4*np.log(2)*((x-e0+i*hf)/fwhm)**2)
    y_fit = np.zeros_like(x)
    for i in range(n):
        y_fit += func(x, p[0], p[1], p[2], p[3], p[4], i)
    return y_fit

p = [1, 26000, 1400, 1, 1000]
x = [27027, 25062, 23364, 21881, 20576, 19417, 18382, 17452, 16611, 15847, 15151]
y = [0.01, 0.42, 0.93, 0.97, 0.65, 0.33, 0.14, 0.06, 0.02, 0.01, 0.004]
n = 7

fittedParameters, pcov = curve_fit(lambda x, p: fullfunc(x, p, n), x, y, p)
A, e0, hf, S, fwhm = fittedParameters
This gives:
TypeError: <lambda>() takes 2 positional arguments but 7 were given
and I don't understand why. I have a feeling the lambda function can't deal with a list of initial parameters.
I would greatly appreciate any advice on how to make this work without explicitly writing all the equations out, as I find that a bit too rigid.
The x and y ranges provided are samples of real data which give a general idea of what the shape is.
Since you only need a summation over the range i = 0, 1, ..., n-1, there is no need to resort to complicated lambda constructs that may or may not work in the context of curve_fit. Just define your fit function as the summation of n components:
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
import math
import numpy as np

def func(x, A, e0, hf, S, fwhm):
    # n is taken from the enclosing scope
    return sum(A * ((e0-i*hf)/e0)**3 * ((S**i)/math.factorial(i)) * np.exp(-4*np.log(2)*((x-e0+i*hf)/fwhm)**2)
               for i in range(n))

p = [1, 26000, 1400, 1, 1000]
x = np.array([27027, 25062, 23364, 21881, 20576, 19417, 18382, 17452, 16611, 15847, 15151])
y = np.array([0.01, 0.42, 0.93, 0.97, 0.65, 0.33, 0.14, 0.06, 0.02, 0.01, 0.004])
n = 7

fittedParameters, pcov = curve_fit(func, x, y, p0=p)
# A, e0, hf, S, fwhm = fittedParameters
print(fittedParameters)

plt.plot(x, y, "ro", label="data")
x_fit = np.linspace(min(x), max(x), 100)
y_fit = func(x_fit, *fittedParameters)
plt.plot(x_fit, y_fit, label="fit")
plt.legend()
plt.show()
Sample output: a plot of the data points together with the fitted curve.
P.S.: By the look of it, these data points are already well fitted with n=1.
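For completeness: the TypeError in the original attempt arises because curve_fit unpacks the fit parameters and calls the model as model(x, p1, p2, p3, p4, p5), while the lambda only accepts (x, p). A variadic lambda that re-packs the parameters would also work (a sketch using the names from the question; p0 must be supplied so that curve_fit knows how many parameters there are):

fittedParameters, pcov = curve_fit(lambda x, *p: fullfunc(x, p, n), x, y, p0=p)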
I am trying to fit different differential equations to a given data set with Python. For this I use the scipy package, specifically the solve_ivp function.
This works fine for me, as long as I have a rough estimate of the parameters (b = 0.005) included in the differential equations, e.g.:
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp
import numpy as np

def f(x, y, b):
    dydx = [-b[0] * y[0]]
    return dydx

xspan = np.linspace(1, 500, 25)
yinit = [5]
b = [0.005]

sol = solve_ivp(lambda x, y: f(x, y, b),
                [xspan[0], xspan[-1]], yinit, t_eval=xspan)

print(sol)
print("\n")
print(sol.t)
print(sol.y)

plt.plot(sol.t, sol.y[0], "b--")
However, what I would like to achieve is that the parameter b (or several parameters) is determined "automatically" based on the best fit of the solved differential equation to a given data set (x and y). Is there a way to do this, for example by combining this example with scipy's curve_fit function, and what would that look like?
Thank you in advance!
Yes, what you have in mind should work; the two pieces are easy to plug together. You want to call
popt, pcov = scipy.optimize.curve_fit(curve, xdata, ydata, p0=[b0])
b = popt[0]
where you now have to define a function curve(x, *p) that transforms an array of points into the corresponding array of values according to the parameters; here the only parameter is b:
def curve(x, b):
    # odefun is the ODE right-hand side, e.g. the function f from the question
    res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=(b,))
    return res.y[0]
Add optional arguments for error tolerances as necessary.
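For instance (rtol and atol are solve_ivp's standard relative and absolute tolerance arguments; the values here are only illustrative):

res = solve_ivp(odefun, [1, 500], [5], t_eval=x, args=(b,), rtol=1e-8, atol=1e-10)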
To make this more realistic, also make the initial point a parameter. Then it becomes more obvious where lists are expected and where single arguments are. To get a proper fitting task, add some random noise to the test data. Also make the decay to zero slower, so that the final plot still looks somewhat interesting.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

xmin, xmax = 1, 500

def f(t, y, b):
    dydt = -b * y
    return dydt

def curve(t, b, y0):
    sol = solve_ivp(lambda t, y: f(t, y, b),
                    [xmin, xmax], [y0], t_eval=t)
    return sol.y[0]

# noisy test data following exp(-0.02*t)
xdata = np.linspace(xmin, xmax, 25)
ydata = np.exp(-0.02*xdata) + 0.02*np.random.randn(*xdata.shape)

y0 = 5
b = 0.005
p0 = [b, y0]
popt, pcov = curve_fit(curve, xdata, ydata, p0=p0)
b, y0 = popt
print(f"b={b}, y0 = {y0}")
This returns
b=0.019975693539459473, y0 = 0.9757709108115179
Now plot the test data against the fitted curve
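A minimal plotting sketch (assuming matplotlib is available; the names are from the code above):

import matplotlib.pyplot as plt

# plot the noisy test data against the curve produced by the fitted parameters
plt.plot(xdata, ydata, "ro", label="data")
plt.plot(xdata, curve(xdata, *popt), "b-", label="fitted curve")
plt.legend()
plt.show()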
I'm trying to fit an exponential function using scipy.optimize.curve_fit() (example data and code below). But it always raises a RuntimeError like this: RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 5000. I'm not sure where I'm going wrong.
import numpy as np
from scipy.optimize import curve_fit
x = np.arange(-1, 1, .01)
param1 = [-1, 2, 10, 100]
fit_func = lambda x, a, b, c, d: a * np.exp(b * x + c) + d
y = fit_func(x, *param1)
popt, _ = curve_fit(fit_func, x, y, maxfev=5000)
This is almost certainly due to the initial guess for the parameters.
You don't pass an initial guess to curve_fit, which means it defaults to a value of 1 for every parameter. Unfortunately, this is a terrible guess in your case. The function of interest is an exponential, one property of which is that the derivative is also an exponential. So all derivatives (first-order, second-order, etc.) will not just be wrong, but have the wrong sign. This means the optimizer will have a very difficult time making progress.
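A quick check (a throwaway snippet using the names from the question) shows just how far off the default guess is; its values are all positive, while the data is all negative:

y_guess = fit_func(x, 1, 1, 1, 1)    # model with curve_fit's default all-ones guess
print(y.min(), y.max())              # data: all negative, down to about -1.6e5
print(y_guess.min(), y_guess.max())  # guess: all positive, roughly between 2 and 8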
You can solve this by giving the optimizer just a smidge of help. Since you know all your data is negative, you could just pass -1 as an initial guess for the first parameter (the scale or amplitude of the function). This alone is enough for the optimizer to arrive at a reasonable result.
import matplotlib.pyplot as plt

# fit_func, x and y are as defined in the question
p0 = (-1, 1, 1, 1)
popt, _ = curve_fit(fit_func, x, y, p0=p0, maxfev=5000)

fig, ax = plt.subplots()
ax.plot(x, y, label="Data", color="k")
ax.plot(x, fit_func(x, *popt), color="r", linewidth=3.0, linestyle=":", label="Fitted")
fig.tight_layout()
You should see the fitted curve lying right on top of the data.
So, I'm trying to fit a set of data with a power law of the following kind:
def f(x, N, a):  # power law fit
    if a > 0:
        return N * x**(-a)
    else:
        return 10.**300

par, cov = scipy.optimize.curve_fit(f, data, time, array([10**(-7), 1.2]))
where the else condition is just to force a to be positive. Using scipy.optimize.curve_fit yields an awful fit (green line), returning values of 1.2e+04 and 1.9e-07 for N and a, respectively, with absolutely no intersection with the data. From fits I've put in manually, the values should land around 1e-07 and 1.2 for N and a, respectively, though putting those into curve_fit as initial parameters doesn't change the result. Removing the condition for a to be positive results in a worse fit, as it chooses a negative a, which leads to a fit with a slope of the wrong sign.
I can't figure out how to get a believable, let alone reliable, fit out of this routine, and I can't find any other good Python curve-fitting routines. Do I need to write my own least-squares algorithm, or is there something I'm doing wrong here?
UPDATE
In the original post, I showed a solution that uses lmfit, which allows you to assign bounds to your parameters. Starting with version 0.17, scipy also allows you to assign bounds to your parameters directly (see documentation). You can find that solution below, after the EDIT; it can hopefully serve as a minimal example of how to use scipy's curve_fit with parameter bounds.
Original post
As suggested by @Warren Weckesser, you could use lmfit to get this task done, which allows you to assign bounds to your parameters and avoids the 'ugly' if-clause.
Since you do not provide any data, I created some which are shown here:
They follow the law f(x) = 10.5 * x ** (-0.08)
I fit them - as suggested by @roadrunner66 - by transforming the power law into a linear function:
y = N * x ** a
ln(y) = ln(N * x ** a)
ln(y) = a * ln(x) + ln(N)
So I first use np.log on the original data and then do the fit. When I now use lmfit, I get the following output:
[[Variables]]
lN: 2.35450302 +/- 0.019531 (0.83%) (init= 1.704748)
a: -0.08035342 +/- 0.005158 (6.42%) (init=-0.5)
So a is pretty close to the original value and np.exp(2.35450302) gives 10.53 which is also very close to the original value.
The plot then looks as follows; as you can see the fit describes the data very well:
Here is the entire code with a couple of inline comments:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import minimize, Parameters, report_fit

# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
plt.plot(xData, yData, 'bo')
plt.show()

# transform data so that we can use a linear fit
lx = np.log(xData)
ly = np.log(yData)
plt.plot(lx, ly, 'bo')
plt.show()

def decay(params, x, data):
    lN = params['lN'].value
    a = params['a'].value
    # our linear model
    model = a * x + lN
    return model - data  # that's what you want to minimize

# create a set of Parameters
params = Parameters()
params.add('lN', value=np.log(5.5), min=0.01, max=100)  # value is the initial value
params.add('a', value=-0.5, min=-1, max=-0.001)         # min, max define parameter bounds

# do fit, here with the default leastsq algorithm
result = minimize(decay, params, args=(lx, ly))

# write error report
report_fit(result)

# plot the data together with the fit
xnew = np.linspace(0.01, 100., 5000)
plt.plot(xData, yData, 'bo')
plt.plot(xnew, np.exp(result.params['lN'].value) * xnew ** result.params['a'].value, 'r')
plt.show()
EDIT
Assuming that you have scipy 0.17 or newer installed, you can also do the following using curve_fit. I show it for your original definition of the power law (red line in the plot below) as well as for the logarithmic data (black line in the plot below). The data is generated in the same way as above. The plot then looks as follows:
As you can see, the data is described very well. If you print popt and popt_log, you obtain array([ 10.47463426, 0.07914812]) and array([ 2.35158653, -0.08045776]), respectively (note: for the latter you will have to take the exponential of the first argument: np.exp(popt_log[0]) = 10.502, which is close to the original value).
Here is the entire code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
# generate some data with noise
xData = np.linspace(0.01, 100., 50)
aOrg = 0.08
Norg = 10.5
yData = Norg * xData ** (-aOrg) + np.random.normal(0, 0.5, len(xData))
# get logarithmic data
lx = np.log(xData)
ly = np.log(yData)
def f(x, N, a):
    return N * x ** (-a)

def f_log(x, lN, a):
    return a * x + lN
# optimize using the appropriate bounds
popt, pcov = curve_fit(f, xData, yData, bounds=(0, [30., 20.]))
popt_log, pcov_log = curve_fit(f_log, lx, ly, bounds=([0, -10], [30., 20.]))
xnew = np.linspace(0.01, 100., 5000)
# plot the data
plt.plot(xData, yData, 'bo')
plt.plot(xnew, f(xnew, *popt), 'r')
plt.plot(xnew, f(xnew, np.exp(popt_log[0]), -popt_log[1]), 'k')
plt.show()
I have a problem working with the curve_fit function.
Here is a code with two functions to work with.
The first is a hyperbolic function.
The second is the same but with one parameter fixed to 1.
My problem is that fitting the first function with curve_fit works fine, but fitting the second doesn't.
I have a commercial program that generates correct solutions for both. So it is possible to find a solution for the second function (a particular case of the first one, as mentioned above).
Is there someone who could give me an idea about what I am doing wrong?
Thanks!
Here is the code to run:
import numpy as np
import matplotlib.pyplot as plt
from scipy import optimize

def hypRegress(ptp, pir):
    xData = np.array(np.arange(len(ptp)), dtype=float)
    yData = np.array(pir, dtype=float)

    def funcHyp(x, qi, exp, di):
        return qi*(1 + exp*di*x)**(-1/exp)

    def errfuncHyp(p):
        return funcHyp(xData, p[0], p[1], p[2]) - yData

    trialX = np.linspace(xData[0], xData[-1], 1000)

    # fit a hyperbolic
    popt, pcov = optimize.curve_fit(funcHyp, xData, yData)
    print('popt')
    print(popt)
    yHYP = funcHyp(trialX, *popt)

    # refine, using popt as initial values
    p1, success = optimize.leastsq(errfuncHyp, popt, maxfev=10000)
    print(p1)
    aaaa = funcHyp(trialX, *p1)

    plt.figure()
    plt.plot(xData, yData, 'r+', label='Data', marker='o')
    plt.plot(trialX, yHYP, 'r-', ls='--', label='Hyp Fit')
    plt.plot(trialX, aaaa, 'y', label='Optimized')
    plt.legend()
    plt.show(block=False)
    return p1
def harRegress(ptp, pir):
    xData = np.array(np.arange(len(ptp)), dtype=float)
    yData = np.array(pir, dtype=float)

    def funcHar(x, qi, di):
        return qi*(1 + di*x)**(-1)

    def errfuncHar(p):
        return funcHar(xData, p[0], p[1]) - yData

    trialX = np.linspace(xData[0], xData[-1], 1000)

    # fit a harmonic
    popt, pcov = optimize.curve_fit(funcHar, xData, yData)
    print('popt')
    print(popt)
    yHAR = funcHar(trialX, *popt)

    # refine, using popt as initial values
    p1, success = optimize.leastsq(errfuncHar, popt, maxfev=1000)
    print(p1)
    aaaa = funcHar(trialX, *p1)

    plt.figure()
    plt.plot(xData, yData, 'r+', label='Data', marker='o')
    plt.plot(trialX, yHAR, 'r-', ls='--', label='Har Fit')
    plt.plot(trialX, aaaa, 'y', label='Optimized')
    plt.legend()
    plt.show(block=False)
    return p1
ptp = ([0,1,2,3,4,5,6,7,8,9,10,11,12,13,14])
pir = ([150,85,90,50,45,60,60,40,40,30,28,30,38,30,26])
hypRegress(ptp,pir)
harRegress(ptp,pir)
input('pause')
It's a classic problem. The curve_fit algorithm starts from an initial guess for the arguments to be optimized, which, if not supplied, is simply all ones.
That means, when you call
popt, pcov = optimize.curve_fit(funcHar, xData, yData)
the first attempt for the fitting routine will be to assume
funcHar(xData, qi=1, di=1)
If you haven't specified any of the other options, the fit will be poor, as evidenced by the large variances of the parameter estimates (check the diagonal of pcov and compare it to the actual values returned in popt).
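For instance, the one-standard-deviation errors of the fitted parameters can be estimated from the diagonal of pcov:

perr = np.sqrt(np.diag(pcov))  # one-sigma uncertainty of each fitted parameter
print(perr)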
In many cases, the situation is solved by supplying an intelligent guess. From your HAR model, I gather that the value around x == 0 is of the same size as qi. So you could supply an initial guess of p0 = (pir[0], 1), which will already lead to a satisfying solution. You could also call it with
popt, pcov = optimize.curve_fit(funcHar, ptp, pir, p0=(0, 1))
which leads to the same result. So the problem is just that the algorithm gets stuck in a local minimum.
An alternative would've been to supply a different factor, the "parameter determining the initial step bound":
popt, pcov = optimize.curve_fit(funcHar, ptp, pir, p0=(1, 1), factor=1)
In this case, even with the (default) initial guess of p0=(1,1), it gives the same resulting fit.
Remember: fitting is an art, not a science. Oftentimes, by analyzing the model you want to fit, you can already supply a good initial guess.
I can't speak for the algorithm used in the commercial program. If it is open-source (unlikely), you could have a look to see what they do.