I am trying to fit a decaying exponential (damped sine) function to real-world data, but I am having trouble getting the fitted function to line up with the actual data.
Here's my code:
def test_func(x, a, b, c):
    return a*np.exp(-b*x)*np.sin(c*x)

my_time = np.linspace(0, 2.5e-6, 25000)
p0 = [60000, 700000, 2841842]
params, params_covariance = curve_fit(test_func, my_time, my_amp, p0)
Plot: my signal and the fitted function.
My question: why doesn't the fitted function start where my data starts increasing in amplitude?
As I said in my comment, the problem is that your function does not take into account that the decaying oscillation can be shifted in time. If you include this shift as an additional parameter, the fit will probably converge.
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
import numpy as np
def test_func(x, a, b, c, d):
    return a*np.exp(-b*(x+d))*np.sin(c*(x+d))
my_time = np.linspace(0,2.5e-6,25000)
# generate fake data that, like the real signal, stays flat before the oscillation starts
testp0 = [66372, 765189, 2841842, -1.23e-7]
my_amp = test_func(my_time, *testp0)
my_amp[:2222] = my_amp[2222]

p0 = [600, 700000, 2000, -2e-7]
params, params_covariance = curve_fit(test_func, my_time, my_amp, p0)
print(params)
fit_amp = test_func(my_time, *params)
plt.plot(my_time, my_amp, label="data")
plt.plot(my_time, fit_amp, label="fit")
plt.legend()
plt.show()
Sample output
I am trying to fit a curve using Python's scipy.optimize.curve_fit, but the result of the fit is very poor: after obtaining the parameters and reconstructing the curve with them, the fitted curve barely resembles the data.
I should be expecting a downward-sloping curve, but I get an upward-sloping one instead.
My code is below:
import pandas as pd
import numpy as np
import datetime as dt
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
variance = {"Nov2022":0.166092943386744, "May2023":0.119276119381951, "Jun2023":0.113809061614305}
variance = pd.Series(variance)
time = list(variance.index)
# dt.datetime.strptime(time[0], "%b%Y")
time = [dt.datetime.strptime(x, "%b%Y") for x in time]
# print(time)
today = dt.datetime.today()
time_years = [(x - today).days/365 for x in time]
def Var(T, sigma, alpha):
    t = 1/365
    var = sigma*2 / (2*alpha) * (1 - np.exp(-2*alpha * (np.array(T) - t)))
    return var
#the fitting is done here with both parameters
# popt, pcov = curve_fit(ModelVar, time_years, list(variance), bounds=([0,0],[np.inf, np.inf]))
popt, pcov = curve_fit(Var, time_years, list(variance))
sigma = popt[0]
alpha = popt[1]
#after fitting, reconstruct with the given alpha and sigma
pd.options.display.float_format = "{:.15f}".format
fitted_model = Var(time_years, sigma, alpha)
df = pd.DataFrame(variance).rename(columns= {0:"HistoricalVar"}, errors = "raise").assign(fitted_var = fitted_model)
plt.plot(df)
plt.show()
print(df)
I tried the following to fit a sine regression, but I am not able to get a sine curve out of the fit. What am I doing wrong here?
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.optimize import curve_fit
def sinfunc(x, a, b, c, d):
    return a * np.sin(b * (x - np.radians(c))) + d
year=np.arange(0,24,2)
population=np.array([10.2,11.1,12,11.7,10.6,10,10.6,11.7,12,11.1,10.2,10.2])
popt, pcov = curve_fit(sinfunc, year, population, p0=None)
x_data = np.linspace(0, 25, num=100)
plt.scatter(year,population,label='Population')
plt.plot(x_data, sinfunc(x_data, *popt), 'r-',label='Fitted function')
plt.title("Year vs Population")
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.show()
The TI-Nspire gives y = sin(0.58x - 1) + 11.
Update
If I use p0=[1,0.4,1,5] it works well. But shouldn't it be automatic?
The thing you are doing "wrong" is passing p0=None to curve_fit().
All fitting methods really, really require initial values. Unfortunately, scipy.optimize.curve_fit() has the completely unjustifiable option of allowing you to not set initial values and silently (not even a warning!) making the absurd guess that every parameter has an initial value of 1.0. It turns out that for your problem these impossible-to-justify, broken-by-design defaults are so bad that the fit fails to find a good answer. This is not uncommon: curve_fit is lying to you that p0=None is acceptable, and you are believing that lie.
The solution is to recognize that the offset is obviously around 11 and use p0=[1.0, 0.5, 0.5, 11.0].
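As a minimal sketch, reusing the sinfunc, year and population definitions from your question unchanged, the only modification is the explicit p0:
import numpy as np
from scipy.optimize import curve_fit

def sinfunc(x, a, b, c, d):
    return a * np.sin(b * (x - np.radians(c))) + d

year = np.arange(0, 24, 2)
population = np.array([10.2, 11.1, 12, 11.7, 10.6, 10, 10.6, 11.7, 12, 11.1, 10.2, 10.2])

# explicit initial guesses: amplitude ~ 1, angular frequency ~ 0.5, small phase, offset ~ 11
popt, pcov = curve_fit(sinfunc, year, population, p0=[1.0, 0.5, 0.5, 11.0])
print(popt)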
You might consider using lmfit (https://lmfit.github.io/lmfit-py/) for this problem (disclaimer: I am a lead author). lmfit has a Model class for curve fitting with several features that could help here (not that curve_fit cannot solve this problem -- it can). With lmfit, your fit might look like:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def sinfunc(x, a, b, c, d):
    return a * np.sin(b*(x - c)) + d
year=np.arange(0,24,2)
population=np.array([10.2,11.1,12,11.7,10.6,10,
10.6,11.7,12,11.1,10.2,10.2])
# build model from your model function
model = Model(sinfunc)
# create parameters (with initial values!). Note that parameters
# are named from the argument names of your model function
params = model.make_params(a=1, b=0.5, c=0.5, d=11.0)
# you can set min/max for any parameter to put bounds on the values
params['a'].min = 0
params['c'].min = -np.pi
params['c'].max = np.pi
# do the fit to your data with those parameters
result = model.fit(population, params, x=year)
# print out report of fit statistics and parameter values+uncertainties
print(result.fit_report())
# plot data and fit result
plt.scatter(year,population,label='Population')
plt.plot(year, result.best_fit, 'r-',label='Fitted function')
plt.title("Year vs Population")
plt.xlabel('Year')
plt.ylabel('Population')
plt.legend()
plt.show()
This will print out a report of
[[Model]]
Model(sinfunc)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 26
# data points = 12
# variables = 4
chi-square = 0.00761349
reduced chi-square = 9.5169e-04
Akaike info crit = -80.3528861
Bayesian info crit = -78.4132595
[[Variables]]
a: 1.00465520 +/- 0.01247767 (1.24%) (init = 1)
b: 0.57528444 +/- 0.00198556 (0.35%) (init = 0.5)
c: 1.80990367 +/- 0.03823815 (2.11%) (init = 0.5)
d: 11.0250780 +/- 0.00925246 (0.08%) (init = 11)
[[Correlations]] (unreported correlations are < 0.100)
C(b, c) = 0.812
C(b, d) = 0.245
C(c, d) = 0.234
and produce a plot of the data together with the fitted curve.
But, again: the problem is that you were suckered into believing that p0=None is a reasonable use of curve_fit().
I was trying to fit my data to the function that is written below, but when using curve_fit the results don't match the data at all.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
nu=[0.00,0.03,0.01,-0.02,0.00,-0.06]
data=np.loadtxt('impedancia.txt')
use=np.transpose(data)
Z=use[0]
omega=use[1]
def func(x, a, b, c):
    return a/(x**2) + b + c*x**2
popt,poc=curve_fit(func,omega,Z)
plt.plot(omega,Z,'bo',markersize=3.5)
plt.plot(omega, func(omega, *popt))
I was wondering if anyone could help me with this.
Here is my code and the plotted result, using the scipy.optimize.differential_evolution module to estimate initial parameters for the non-linear solver. Note that this code uses a variation of the Lorentzian peak equation similar to yours; the two lines under the "select peak function here" comment let you choose which equation is fitted. The peak equation in your code does not appear to fit the narrow peak of the data as well as the recommended peak equation that is currently selected.
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import warnings
from scipy.optimize import differential_evolution
# bounds on parameters are set in generate_Initial_Parameters() below
def func_original(x, a, b, c):
    return a/(x**2) + b + c*x**2

# bounds on parameters are set in generate_Initial_Parameters() below
def func_recommended(x, a, b, c):
    return a / (b + (x-c)**2)
# select peak function here
#func = func_original
func = func_recommended
# function for genetic algorithm to minimize (sum of squared error)
# bounds on parameters are set in generate_Initial_Parameters() below
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    return np.sum((yData - func(xData, *parameterTuple)) ** 2)
def generate_Initial_Parameters():
    # data min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    maxY = max(yData)
    minY = min(yData)

    minSearch = min([minX, minY])
    maxSearch = max([maxX, maxY])

    parameterBounds = []
    parameterBounds.append([minSearch, maxSearch])  # parameter bounds for a
    parameterBounds.append([minSearch, maxSearch])  # parameter bounds for b
    parameterBounds.append([minSearch, maxSearch])  # parameter bounds for c

    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# load data from text file
data=np.loadtxt('impedancia.txt')
use=np.transpose(data)
yData=use[0]
xData=use[1]
# generate initial parameter values
initialParameters = generate_Initial_Parameters()
# curve fit the data
fittedParameters, niepewnosci = curve_fit(func, xData, yData, initialParameters)
# create values for display of fitted peak function
a, b, c = fittedParameters
y_fit = func(xData, a, b, c)
plt.plot(xData, yData, 'D') # plot the raw data
plt.plot(xData, y_fit) # plot the equation using the fitted parameters
plt.show()
print(fittedParameters)
I am trying to fit a Lorentz peak to some data. The fit seems to run, but the fitted curve does not show up on the graph.
#Defining a Lorentz function to fit a curve to the data which is used in the plot() to create a graph
def lorentz(M, M0, delM):
    d = read()
    M = d[:,0]
    n = delM
    d = (M-M0)**2 + (delM/2)**2
    s = (1/2*pi)*(n/d)
    # Multiplying s by 7.943E5 to convert tesla to A/m for a more sensible numerical answer
    return s*(7.943E5)
#Creating a graph plot of Absorption against Magnetic field for magnetic resonance and using lorentz() to fit a curve to the data
def plot():
    d = read()
    M = d[:,0]
    A = d[:,1]
    pyplot.semilogx(M, A, marker=".", ls="none", label="20 GHz")
    fit_vals, fit_errors = curve_fit(lorentz, M, A)
    pyplot.semilogx(M, lorentz(M, fit_vals[0], fit_vals[1]), label="fit")
    pyplot.xlabel("Magnetic Field, A/m")
    pyplot.ylabel("Absorption")
    pyplot.title("Magnetic resonance")
    pyplot.legend()
    pyplot.show()
The following code runs for me:
import numpy as np
from matplotlib import pyplot
from scipy.optimize import curve_fit
def fn(x, a, b, c):
    return a * np.exp(-b * x) + c

# two columns of a random array stand in for the real data
d = np.random.rand(50, 2)
M = d[:,0]
A = d[:,1]

fit_vals, fit_errors = curve_fit(fn, M, A)
print(' fit coefficients:\n', fit_vals)
print(' Covariance matrix:\n', fit_errors)

# evaluate the fitted exponential model (not a polynomial) on a fine grid for plotting
xs = np.arange(0.01, 1, 0.01)
ys = fn(xs, *fit_vals)

pyplot.semilogx(M, A, 'o', label="data")
pyplot.semilogx(xs, ys, label="fit")
pyplot.xlabel("Magnetic Field,A/m")
pyplot.ylabel("Absorption")
pyplot.title("Magnetic resonance")
pyplot.legend()
pyplot.show()
I use two columns of a random array to make the plot.
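To run this against your real data instead of the random array, a sketch (assuming the read() helper from your question returns the same N x 2 array, with the field in the first column and the absorption in the second) would be:
d = read()   # your data-loading helper from the question (assumed to return an N x 2 array)
M = d[:,0]   # magnetic field
A = d[:,1]   # absorption
fit_vals, fit_errors = curve_fit(fn, M, A)
pyplot.semilogx(M, A, 'o', label="data")
pyplot.semilogx(M, fn(M, *fit_vals), label="fit")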
I am trying to fit a skewed and shifted Gaussian curve using scipy's curve_fit function, but I find that under certain conditions the fit is quite poor, often giving me something close to or exactly a straight line.
The code below is derived from the curve_fit documentation; it uses an arbitrary set of test data, but it displays the issue quite well.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
#def func(x, a, b, c):
# return a*np.exp(-b*x) + c
def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
popt, pcov = curve_fit(func, x, yn,) #p0=(9,35,0,9,1))
y_fit= func(x,popt[0],popt[1],popt[2],popt[3],popt[4])
plt.plot(x,yn)
plt.plot(x,y_fit)
The issue seems to pop up when I shift the Gaussian too far from zero (using mu). I have tried giving initial values, even ones identical to those of my original function, but it does not solve the problem. For a value of mu=10, curve_fit works perfectly, but if I use mu>=30 it no longer fits the data.
Giving starting points for minimization often works wonders. Try giving the minimizer some information on the position of the maximum and the width of the curve:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), np.argmax(yn) ,0,0,1))
Changing this single line in your code, with sigma=10 and mu=50, produces a fit that follows the data.
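One caveat, assuming the same x = np.linspace(0, 100, 100) grid as in your question: np.argmax(yn) returns the index of the maximum, not its x position; the two only coincide here because the grid runs from 0 to 100 in steps of roughly 1. With a differently spaced grid, a safer guess for mu is the x value at the peak:
popt, pcov = curve_fit(func, x, yn, p0=(1./np.std(yn), x[np.argmax(yn)], 0, 0, 1))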
You can call curve_fit many times with random initial guess, and choose the parameters with minimum error.
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import math as math
import scipy.special as sp
def func(x, sigmag, mu, alpha, c, a):
    # normal distribution
    normpdf = (1/(sigmag*np.sqrt(2*math.pi)))*np.exp(-(np.power((x-mu),2)/(2*np.power(sigmag,2))))
    normcdf = (0.5*(1+sp.erf((alpha*((x-mu)/sigmag))/(np.sqrt(2)))))
    return 2*a*normpdf*normcdf + c
x = np.linspace(0,100,100)
y = func(x, 10,30, 0,0,1)
yn = y + 0.001*np.random.normal(size=len(x))
results = []
for i in range(50):
    # random initial guess
    p = np.random.randn(5) * 10
    try:
        popt, pcov = curve_fit(func, x, yn, p)
    except Exception:
        # the fit failed for this starting point; try another one
        continue
    err = np.sum(np.abs(func(x, *popt) - yn))
    results.append((err, popt))
    if err < 0.1:
        break
err, popt = min(results, key=lambda x:x[0])
y_fit= func(x, *popt)
plt.plot(x,yn)
plt.plot(x,y_fit)
print(len(results))