Python scipy curve_fit: exponential equation not fitting as expected

I have data that I am trying to fit an exponential to. The data is not ideal; however, when I use JMP's built-in curve fit function it works as expected and I get a good approximation of my data (see the figure below, JMP Fit Curve Exponential 3P).
[Figure: JMP Fit Curve Exponential 3P]
I am now trying to replicate this in Python using the curve_fit function from scipy.optimize, as described here. However, this produces a very different curve; see below.
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
df = pd.read_csv('test.csv', sep = ',' ,index_col = None, engine='python')
def exponential_3p(x, a, b, c):
    return a + b * np.exp(c * x)
popt, pcov = curve_fit(exponential_3p,df.x,df.y)
a = popt[0]
b = popt[1]
c = popt[2]
plt.plot(df.x,df.y)
plt.plot(df.x,exponential_3p(df.x, a, b, c))
[Figure: scipy optimize.curve_fit Exponential]

You are yet another victim of the incomprehensible stupidity of scipy.optimize.curve_fit.
Curve-fitting and local optimization problems REQUIRE initial values for all variable parameters. They are not optional. There is no "default value" that makes sense. scipy.optimize.curve_fit lies to you about this: it allows you to omit initial values and silently (not even a warning!) assumes that you meant all initial values to be 1. This is wrong, wrong, wrong.
You must give sensible starting values for the fit.
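A minimal sketch of what supplying starting values could look like for the three-parameter exponential from the question. The data is synthetic (the true values 2.0, 5.0 and -0.05 are assumptions made for illustration, not the asker's test.csv), and the rules of thumb used for the guesses are just one reasonable choice for decaying data.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_3p(x, a, b, c):
    return a + b * np.exp(c * x)

# Synthetic stand-in for the asker's data (assumed values, illustration only).
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 60)
y = exponential_3p(x, 2.0, 5.0, -0.05) + rng.normal(scale=0.05, size=x.size)

# Rough starting values read off the data.
a0 = y[-1]                   # asymptote: where the curve levels off
b0 = y[0] - y[-1]            # total change between the start and the asymptote
c0 = -3.0 / (x[-1] - x[0])   # negative because the data decays; scale set by the x range

popt, pcov = curve_fit(exponential_3p, x, y, p0=[a0, b0, c0])
print(popt)
```

The point is only that p0 is on the right order of magnitude for each parameter; the exact guesses matter much less than their scale and sign.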

Related

Issues fitting gaussian to scatter plot

I'm having a lot of trouble fitting this data, particularly getting the fit parameters to match the expected parameters.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
def gaussian_model(x, a, b, c, d): # add constant d
    return a*np.exp(-(x-b)**2/(2*c**2))+d
x = np.linspace(0, 20, 100)
mu, cov = curve_fit(gaussian_model, xdata, ydata)
fit_A = mu[0]
fit_B = mu[1]
fit_C = mu[2]
fit_D = mu[3]
fit_y = gaussian_model(xdata, fit_A, fit_B, fit_C, fit_D)
print(mu)
plt.plot(x, fit_y)
plt.scatter(xdata, ydata)
plt.show()
Here's the plot
When I printed the parameters, I got values of -17 for amplitude, 2.6 for mean, -2.5 for standard deviation, and 110 for the base. This is very far off from what I would expect from the scatter plot. Any ideas why?
Also, I'm pretty new to coding, so any advice is helpful! Thanks everyone :)
Edit: figured out what was wrong! Just needed to add some guesses.
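A sketch of what "adding some guesses" might look like for this model. The asker's xdata and ydata are not shown, so the data below is synthetic (amplitude 30, mean 12, width 1.5 and baseline 100 are assumed values), and the way the guesses are derived is only one plausible heuristic.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_model(x, a, b, c, d):
    return a * np.exp(-(x - b) ** 2 / (2 * c ** 2)) + d

# Synthetic stand-in data (assumed parameters, illustration only).
rng = np.random.default_rng(1)
xdata = np.linspace(0, 20, 100)
ydata = gaussian_model(xdata, 30.0, 12.0, 1.5, 100.0) + rng.normal(scale=1.0, size=xdata.size)

# Guesses taken from the data itself.
d0 = np.median(ydata)                       # baseline: most points sit on it
a0 = ydata.max() - d0                       # peak height above the baseline
b0 = xdata[np.argmax(ydata)]                # position of the highest point
c0 = (xdata.max() - xdata.min()) / 10.0     # width: a fraction of the x range

popt, pcov = curve_fit(gaussian_model, xdata, ydata, p0=[a0, b0, c0, d0])
print(popt)
```

Because c only enters the model as c**2, its fitted sign is arbitrary; comparing abs(c) against the expected width avoids confusion like the negative standard deviation reported above.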
This is not the answer you expected.
It is an alternative method of fitting a Gaussian.
The process is not iterative and does not require initial "guessed" values of the parameters to start, as the usual methods do.
The result is:
The method of calculation is shown below:
The general principle is explained with examples in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . It is a linear regression with respect to an integral equation whose solution is the Gaussian function.
If one wants a more accurate and/or more specific result according to some specified fitting criterion, one has to use software with a non-linear regression process. The above result can then be used as initial parameter values for a more robust iterative process.

How to get a log function fit using Scipy curve_fit for the data

I am trying to get a*log(b/x)^c type fit for the following data (simplified for 10 data points)
I have tried methods described in some other questions like this one using both curve_fit and lmfit but the solution never converges. My guess is that my initial conditions are bad. I was able to get the other exponential function commented out fit but the application requires a log fit of the form given. The data with the fit that works is attached for reference.
import numpy as np
from scipy.optimize import curve_fit
x=[0, 0.89790454, 1.79580908, 2.69371362, 3.59161816, 4.48952269, 5.38742723, 6.28533177, 7.18323631, 8.08114085]
y=[0.39599324, 0.10255828, 0.07094521, 0.05500624, 0.04636146, 0.04585985, 0.0398909, 0.03340628, 0.03041699, 0.02498938]
x = np.array(x,dtype=float)
y = np.array(y,dtype=float)
def func(x, a, b, c):
    #return a*np.exp(-c*(x*b))+d
    return a*(np.log(b/x)**c)
popt, pcov = curve_fit(func, x, y, p0=[.5,.5,1],maxfev=10000)
print(popt)
a,b ,c = np.asarray(popt)
Replace your function with
def func(x, a, b, c):
    #return a*np.exp(-c*(x*b))+d
    t1 = np.log(b/x)
    t2 = a*t1**c
    print(a,b,c,t1, t2)
    return t2
You will quickly see that t1 = np.log(b / x) may be negative (this happens whenever b < x). A negative number raised to a non-integer power is not a real number, and here numpy produces nan results.
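The effect can be checked without running a fit at all. The sketch below evaluates the model on the first few x values from the question, once at the asker's p0 and once with a slightly non-integer exponent (c = 1.5 is an assumed value, chosen only to stand in for what the optimizer tries once it moves away from c = 1).

```python
import numpy as np

# First few x values from the question, including the problematic x = 0.
x = np.array([0, 0.89790454, 1.79580908, 2.69371362, 3.59161816])

def func(x, a, b, c):
    return a * np.log(b / x) ** c

# Silence the divide-by-zero and invalid-value warnings for this check.
with np.errstate(divide="ignore", invalid="ignore"):
    at_p0 = func(x, 0.5, 0.5, 1.0)    # the asker's starting point
    nearby = func(x, 0.5, 0.5, 1.5)   # same a and b, non-integer exponent

print(np.isfinite(at_p0))   # only x = 0 is non-finite here
print(np.isnan(nearby))     # every point with b < x becomes nan
```

So even before the optimizer takes a step, x = 0 yields an infinite model value, and any non-integer c makes the rest of the curve undefined.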
I have no difficulty with my fitting software (result below).
A frequent cause of difficulty in non-linear fitting with an iterative regression method is the choice of initial parameter values used to start the iterative process.

Exponential curve fitting of pandas data in python

I'm trying to fit an exponential curve to some data represented by a pandas dataframe. The data looks like this:
The code I've used for curve fitting:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import curve_fit
t = df['time'].values
ym = df['value'].values
def func(t, c0, c1, c2, c3):
    return c0 + c1*t - c2*np.exp(-c3*t)
p0 = [6e6, 0.01, 100, 0.01]
c, cov = curve_fit(func, t, ym, p0)
print(c) # Output: [-5.46019366e+06 3.19567938e+03 1.00000000e+08 1.00000000e+06]
yp = func(t, c[0], c[1], c[2], c[3])
plt.figure()
plt.plot(t/60, ym)
plt.plot(t/60, yp)
However, the fitted curve always seems to be linear, like this:
I have tried different methods I've found online and always get the same linear result. My dataframe looks like this, where Cycle_id corresponds to "time" and peak corresponds to "value":
Any suggestion on how to fit this data is much appreciated, since I can't find any errors in my code upon reviewing it and so am not getting any further.
Sorry for the poor answer. I do not have enough practical knowledge of Python. Moreover, it is not possible to get sufficiently accurate data from a picture. The data used in the calculation below was obtained by scanning the picture, so the results are probably not accurate.
I guess that the difficulty you faced comes from the method of calculation, which is iterative and starts from "guessed" values of the parameters.
A non-iterative method that does not need initial guessed values is generally more robust. Such a method is explained in this paper: https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales
There are a lot of numerical examples in the paper, but unfortunately your function is not treated in full detail. It is not difficult to adapt the method to this case: see below. The result is:
Possibly you can use the above values of p, a, b, c as initial parameter values in a more classical method.
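The integral-equation method itself is not reproduced here. As a plainly different but related idea, the sketch below gets starting values by noting that the model c0 + c1*t - c2*exp(-c3*t) is linear in c0, c1, c2 once c3 is fixed, so a coarse scan over c3 with ordinary linear least squares yields a usable p0. The helper name initial_guess, the candidate grid, and the synthetic data (true values 6e6, 3e3, 2e6, 0.02) are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(t, c0, c1, c2, c3):
    return c0 + c1 * t - c2 * np.exp(-c3 * t)

def initial_guess(t, y, rates):
    """Hypothetical helper: scan c3, solve for c0, c1, c2 by linear least squares."""
    best = None
    for c3 in rates:
        # With c3 fixed, the model is linear in (c0, c1, c2).
        A = np.column_stack([np.ones_like(t), t, -np.exp(-c3 * t)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        sse = np.sum((A @ coef - y) ** 2)
        if best is None or sse < best[0]:
            best = (sse, [*coef, c3])
    return best[1]

# Synthetic stand-in data (assumed parameters, illustration only).
rng = np.random.default_rng(2)
t = np.linspace(0, 600, 200)
ym = func(t, 6e6, 3e3, 2e6, 0.02) + rng.normal(scale=2e4, size=t.size)

p0 = initial_guess(t, ym, rates=np.logspace(-4, 0, 60))
c, cov = curve_fit(func, t, ym, p0=p0)
print(p0)
print(c)
```

The scan is crude, but it puts every parameter on the right scale, which is what the iterative fit needs in order to avoid running off to values like the 1e8 and 1e6 seen in the question's output.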
FOR INFORMATION:
The non-iterative regression method uses a convenient integral equation that transforms the non-linear regression into a linear regression. In the present case the integral equation is:

Problem while fitting a set of data points to arbitrary curve with Scipy

I have a set of data points which, according to the model I want to implement, could be modelled with a certain curve (in this case, a product between an exponential and a complementary error function).
For fitting these data into such a curve, I tried:
import numpy as np
from scipy.optimize import curve_fit
from scipy import special
x_fit = np.linspace(0,1,1000)
def fitted_function(x_fit, c, d, S):
    return c*np.exp(((S*d/2)**2)-x_fit*d)*special.erfc(S*d/2-x_fit/S)
FitParameters, FitCovariance = curve_fit(fitted_function, x_data, y_data, maxfev = 100000)
It does not give me any particular error, but the result of the fitting is evidently wrong. I strongly suspect that it has to do with the part x_fit/S, where the fitting parameter S appears as a denominator.
For example, I encounter the same problem while fitting a simple exponential: if I define the fitting curve with
return a*np.exp(-x_fit/b)
with a, b fitting parameters; since the fitting parameter b appears as a denominator, I find the same problem (i.e. the resulting fitted curve is a horizontal line for some reason).
For the case of a simple exponential I can simply bypass this by doing
return a*np.exp(-b*x_fit)
so that b is not a denominator anymore and the fitted curve is really an exponential curve. For my current case, however, I cannot do this, since S appears as a numerator and as a denominator in different parts of the expression.
Any ideas? Thank you in advance!
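One thing worth checking for the simple exponential case is the starting values, in line with the answers to the other questions on this page. The sketch below uses synthetic data (amplitude 3.0 and decay length 0.05 are assumed values; the asker's x_data and y_data are not shown) and fits both parametrizations with a p0 taken from the data. Under those assumptions the two forms should agree, with the fitted rate close to the reciprocal of the fitted decay length.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay_tau(x, a, b):
    return a * np.exp(-x / b)    # b in the denominator

def decay_rate(x, a, b):
    return a * np.exp(-b * x)    # b as a multiplier

# Synthetic stand-in data (assumed parameters, illustration only).
rng = np.random.default_rng(3)
x_data = np.linspace(0, 1, 200)
y_data = decay_tau(x_data, 3.0, 0.05) + rng.normal(scale=0.01, size=x_data.size)

# Starting values from the data: peak height, and the x where y has dropped to 1/e of it.
a0 = y_data.max()
tau0 = x_data[np.argmax(y_data < a0 / np.e)]
tau0 = max(tau0, x_data[1])      # guard against a zero decay length

p_tau, _ = curve_fit(decay_tau, x_data, y_data, p0=[a0, tau0])
p_rate, _ = curve_fit(decay_rate, x_data, y_data, p0=[a0, 1.0 / tau0])
print(p_tau, p_rate)
```

If this holds for the real data as well, the same approach (estimating the scale of S from the data and passing it in p0) may be worth trying for the erfc model, though that is not verified here.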

Scipy.optimize.curve_fit won't fit cosine power law

For several hours now, I have been trying to fit a model to a (generated) dataset as a test case for a problem I've been struggling with. I generated data points for the function f(x) = A*cos^n(x)+b and added some noise. When I try to fit the dataset with this function and curve_fit, I get the following warnings:
./tester.py:10: RuntimeWarning: invalid value encountered in power
return Amp*(np.cos(x))**n + b
/usr/lib/python2.7/dist-packages/scipy/optimize/minpack.py:690: OptimizeWarning: Covariance of the parameters could not be estimated category=OptimizeWarning)
The code I'm using to generate the datapoints and fit the model is the following:
#!/usr/bin/env python
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure, show, rc, plot
def f(x, Amp, n, b):
    return np.real(Amp*(np.cos(x))**n + b)
x = np.arange(0, 6.28, 0.01)
randomPart = np.random.rand(len(x))-0.5
fig = figure()
sample = f(x, 5, 2, 5)+randomPart
frame = fig.add_subplot(1,1,1)
frame.plot(x, sample, label="Sample measurements")
popt, pcov = curve_fit(f, x, sample, p0=(1,1,1))
modeldata = f(x, popt[0], popt[1], popt[2])
print(modeldata)
frame.plot(x, modeldata, label="Best fit")
frame.legend()
frame.set_xlabel("x")
frame.set_ylabel("y")
show()
The noisy data is shown - see the image below.
Does any of you have a clue of what's going on? I suspect it has something to do with the power law going into the complex domain, as the real part of the function is nowhere divergent. I have tried returning only the real part of the function, setting realistic bounds in curve_fit and using a numpy array instead of a python list for p0 already as well. I'm running the latest version of scipy available to me, scipy 0.17.0-1.
The problem is the following:
>>> (-2)**1.1
(-2.0386342710747223-0.6623924280875919j)
>>> np.array(-2)**1.1
__main__:1: RuntimeWarning: invalid value encountered in power
nan
Unlike native python floats, numpy doubles usually refuse to take part in operations leading to complex results:
>>> np.sqrt(-1)
__main__:1: RuntimeWarning: invalid value encountered in sqrt
nan
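The same behaviour, and two ways around it, can be seen in a short script. This is only an illustrative sketch; the sample values in base are arbitrary.

```python
import numpy as np

base = np.array([-2.0, -0.5, 0.5])

# Real-valued arrays: a negative base with a non-integer exponent gives nan.
with np.errstate(invalid="ignore"):
    real_result = base ** 1.1

# Casting to complex first gives the complex result instead of nan.
complex_result = base.astype(np.complex128) ** 1.1

# An integer exponent is fine even for negative bases.
int_result = base ** 2

print(real_result)
print(complex_result)
print(int_result)
```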
As a quick workaround I suggest adding an np.abs call to your function, and using appropriate bounds for fitting to make sure this doesn't give a spurious fit. If your model is near the truth and your sample (I mean the cosine in your sample) is positive, then adding an absolute value around it should be a no-op (update: I realize this is never the case, see the proper approach below).
def f(x, Amp, n, b):
    return Amp*(np.abs(np.cos(x)))**n + b # only change here
With this small change I get this:
For reference, the parameters from the fit are (4.96482314, 2.03690954, 5.03709923), compared with (5, 2, 5) used for the generation.
After giving it a bit more thought I realized that the cosine will always be negative for half your domain (duh). So the workaround I suggested might be a bit problematic, or at least its correctness is non-trivial. On the other hand, thinking of your original formula containing cos(x)^n, with negative values for cos(x) this only makes sense as a model if n is an integer, otherwise you get a complex result. Since we can't solve Diophantine fitting problems, we need to handle this properly.
The most proper way (by which I mean the way that is least likely to bias your data) is this: first do the fitting with a model that converts your data to complex numbers then takes the complex magnitude on output:
def f(x, Amp, n, b):
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
This is obviously much less efficient than my workaround, since in each fitting step we create a new mesh, and do some extra work both in the form of complex arithmetic and an extra magnitude calculation. This gives me the following fit even with no bounds set:
The parameters are (5.02849409, 1.97655728, 4.96529108). These are close too. However, if we put these values back into the actual model (without np.abs), we get imaginary parts as large as -0.37, which is not overwhelming but significant.
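The size of the imaginary part can be checked directly by plugging the fitted parameters quoted above into the complex-valued model. This is a sketch of one way to do that check; it casts the cosine (rather than x) to complex, which is an assumption made here to keep the branch of the complex power unambiguous.

```python
import numpy as np

# Fitted parameters quoted above.
amp, n, b = 5.02849409, 1.97655728, 4.96529108

x = np.arange(0, 2 * np.pi, 0.01)

# Evaluate the model without np.abs, in complex arithmetic.
cos_complex = np.cos(x).astype(np.complex128)
model = amp * cos_complex ** n + b

# Largest magnitude of the imaginary part over the domain.
largest_imag = np.abs(model.imag).max()
print(largest_imag)
```

The imaginary part is largest where cos(x) is close to -1, and it vanishes only when n is an integer, which is why the second fitting step below fixes n.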
So the second step should be redoing the fit with a proper model---one that has an integer exponent. Take the exponent 2 which is obvious from your fit, and do a new fit with this model. I don't believe any other approach gives you a mathematically sound result. You can also start from the original popt, hoping that it's indeed close to the truth. Of course we could use the original function with some currying, but it's much faster to use a dedicated double-specific version of your model.
from __future__ import print_function
import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import subplots, show
def f_aux(x, Amp, n, b):
    # auxiliary model: complex power, then magnitude
    return Amp*np.abs(np.cos(x.astype(np.complex128))**n) + b
def f_real(x, Amp, n, b):
    # real-valued model, only meaningful for integer n
    return Amp*np.cos(x)**n + b
x = np.arange(0, 2*np.pi, 0.01) # pi
randomPart = np.random.rand(len(x)) - 0.5
sample = f_real(x, 5, 2, 5) + randomPart
fig,(frame_aux,frame) = subplots(ncols=2)
for fr in frame_aux,frame:
    fr.plot(x, sample, label="Sample measurements")
    fr.legend()
    fr.set_xlabel("x")
    fr.set_ylabel("y")
# auxiliary fit for n value
popt_aux, pcov_aux = curve_fit(f_aux, x, sample, p0=(1,1,1))
modeldata = f_aux(x, *popt_aux)
#print(modeldata)
print('Auxiliary fit parameters: {}'.format(popt_aux))
frame_aux.plot(x, modeldata, label="Auxiliary fit")
# check visually, test if it's close to an integer, but otherwise
n = np.round(popt_aux[1])
# actual fit with integral exponent
popt, pcov = curve_fit(lambda x,Amp,b,n=n: f_real(x,Amp,n,b), x, sample, p0=(popt_aux[0],popt_aux[2]))
modeldata = f_real(x, popt[0], n, popt[1])
#print(modeldata)
print('Final fit parameters: {}'.format([popt[0],n,popt[1]]))
frame.plot(x, modeldata, label="Best fit")
frame_aux.legend()
frame.legend()
show()
Note that I changed a few things in your code that don't really affect my point. The figure produced by the code above, showing both the auxiliary fit and the proper one:
The output:
Auxiliary fit parameters: [ 5.02628994 2.00886409 5.00652371]
Final fit parameters: [5.0288141074549699, 2.0, 5.0009730316739462]
Just to reiterate: while there might be no visual difference between the auxiliary fit and the proper one, only the latter gives a meaningful answer to your problem.
