scipy curve_fit strange result - python

I am trying to fit a distribution with scipy's curve_fit. I tried to fit a one-component exponential function, which resulted in an almost straight line (see figure). I also tried a two-component exponential fit, which seemed to work nicely. (Two components just means that part of the equation repeats with different input parameters.) Anyway, here is the one-component fit function:
def Exponential(Z, w0, z0, Z0):
    z = Z - Z0
    termB = (newsigma**2 + z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    termA = (newsigma**2 - z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    return w0/2.0 * numpy.exp(-(z**2 / (2.0*newsigma**2))) * \
           (numpy.exp(termA**2)*erfc(termA) + numpy.exp(termB**2)*erfc(termB))
and the fitting is done with
fitexp = curve_fit(Exponential,newx,y2)
Then I tried something, just to try it out. I took two parameters of the two component fit, but did not use them in the calculation.
def ExponentialNew(Z, w0, z0, w1, z1, Z0):
    z = Z - Z0
    termB = (newsigma**2 + z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    termA = (newsigma**2 - z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    return w0/2.0 * numpy.exp(-(z**2 / (2.0*newsigma**2))) * \
           (numpy.exp(termA**2)*erfc(termA) + numpy.exp(termB**2)*erfc(termB))
And suddenly this works.
Now, my question is: WHY? As you can see, there is absolutely no difference in the calculation of the fit; it just gets two extra variables that are not used. Shouldn't this give the same result?
@Andras_Deak, an actual example:
from scipy.special import erfc
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#setup data
x = [-58.,-54.,-50.,-46.,-42.,-38.,-34.,-30.,-26.,-22.,-18.,-14.,-10.,-6.,-2.,2.,6.,10.,14.,18.,22.,26.,30.,34.,38.,42.,46.,50.,54.,58.]
y = [23.06763817, 16.89802085, 17.83258379, 16.63446237, 13.81878965, 12.97965839, 14.30451789, 16.98288216, 22.26811491, 28.56756908, 33.06990344, 38.59842098, 54.19860393, 86.37381604, 137.47253315, 199.49724512, 238.66047662, 219.89405445, 160.68820199, 103.88901303, 65.92405727, 43.84596266, 31.5395342, 25.9610156, 22.71683709, 18.06740651, 13.85362374, 11.12867065, 10.36502799, 11.31855619]
y_err = [17.9823065, 4.13684885, 1.66490726, 2.4109372, 2.93359141, 1.9701747, 3.19214881, 3.65593012, 2.89089074, 3.58922121, 4.25505348, 4.72728874, 6.77736567, 11.3888196, 21.87771722, 39.0087495, 56.6910311, 51.7592369, 26.39750958, 10.62678862, 7.85893395, 8.11741621, 7.91731416, 7.07739132, 5.41818744, 6.11286843, 8.27070757, 7.85323065, 4.26885499, 0.9047867]
#function to fit
def Exponential2(Z, w0, z0, w1, z1, Z0):
    z = Z - Z0
    s = 3.98098937586   # newsigma
    a = z**2 / (2.0*s**2)
    b = (s**2 + z*z0) / (numpy.sqrt(2.0)*s*z0)
    c = (s**2 - z*z0) / (numpy.sqrt(2.0)*s*z0)
    d = (s**2 + z*z1) / (numpy.sqrt(2.0)*s*z1)
    e = (s**2 - z*z1) / (numpy.sqrt(2.0)*s*z1)
    return (w0/2.0 * numpy.exp(-a) * (numpy.exp(c**2)*erfc(c) + numpy.exp(b**2)*erfc(b))
            + w1/2.0 * numpy.exp(-a) * (numpy.exp(e**2)*erfc(e) + numpy.exp(d**2)*erfc(d)))
#derive and set initial guess
ymaxpos = x[numpy.argmax(y)]   # indexing the list x with a numpy index array would fail
p0_2 = [numpy.max(y), 5, numpy.max(y)/2.0, 20, ymaxpos]
#fit
fitexp2 = curve_fit(Exponential2,x,y,p0=p0_2,sigma=y_err)
#get results
perr = numpy.sqrt(numpy.diag(fitexp2[1]))    # parameter standard errors
w0err, z0err, w1err, z1err = perr[0], perr[1], perr[2], perr[3]
w0, z0, w1, z1, Z0 = fitexp2[0]
#new x array for smoother curve
smoothx = numpy.arange(-58,59,0.1)
y2 = Exponential2(smoothx,w0,z0,w1,z1,Z0)
print('Exponential 2: w0: '+str(w0.round(3))+' +/- '+str(w0err.round(3))
      +' \t z0: '+str(z0.round(3))+' +/- '+str(z0err.round(3))
      +' \t w1: '+str(w1.round(3))+' +/- '+str(w1err.round(3))
      +' \t\t z1: '+str(z1.round(3))+' +/- '+str(z1err.round(3)))
#plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x,y,y_err,fmt='o',markersize=2,label='data')
ax.plot(smoothx,y2,label='fit',color='red')
ax.grid()
ax.legend()
plt.show()
As you can see, the plot does look good, but the returned value z1 is totally unrealistic.
Exponential 2: w0: 312.608 +/- 36.764 z0: 8.263 +/- 1.158 w1: 12.689 +/- 9.138 z1: 1862257.883 +/- 45201809883.8

In my experience, curve_fit can sometimes act up and stick with the initial values for the parameters. I would suspect that in your case adding a few fake parameters changed the heuristics of how the relevant parameters are initialized (although this contradicts the documentation's statement that when no initial values are given, they all default to 1).
It helps a lot in obtaining reliable fits if you specify reasonable bounds and initial values for your fitting parameters (I mean the p0 and bounds keywords). The fact that the default starting values are all supposed to be 1 suggests that for most use cases the default won't cut it.
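For example, here is a minimal sketch of constraining the two-component fit from the question (the bounds keyword needs SciPy >= 0.17, and the limits below are illustrative guesses, not values from the original post):
# box each parameter; order is w0, z0, w1, z1, Z0
lower = [0, 0.1, 0, 0.1, min(x)]
upper = [numpy.inf, 100, numpy.inf, 100, max(x)]
fitexp2 = curve_fit(Exponential2, x, y, p0=p0_2, sigma=y_err, bounds=(lower, upper))
With z1 boxed into a plausible range, the fit cannot wander off to values like 1862257.883.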

Related

FiPy error "The Factor is exactly singular" when applying Neumann boundary conditions

We're trying to solve a one-dimensional coupled continuity-Poisson problem in FiPy. When applying Dirichlet boundary conditions, it gives the correct results, but when we change the boundary conditions to Neumann, which is closer to our problem, it raises the error "The Factor is exactly singular".
Any help is highly appreciated. The code is as follows (0 < x < 2.5):
from fipy import *
from fipy import Grid1D, CellVariable, TransientTerm, DiffusionTerm, Viewer
import numpy as np
import math
import matplotlib.pyplot as plt
from matplotlib import cm
from cachetools import cached, TTLCache  # caching to increase the speed of python
cache = TTLCache(maxsize=100, ttl=86400) # cache object; the first argument is the
                                         # number of objects stored in the cache
#____________________________________________________
nx=50
dx=0.05
L=nx*dx
e=math.e
m = Grid1D(nx=nx, dx=dx)
print(np.log(e))
#____________________________________________________
phi = CellVariable(mesh=m, hasOld=True, value=0.)
ne = CellVariable(mesh=m, hasOld=True, value=0.)
phi_face = phi.faceValue
ne_face = ne.faceValue
x = m.cellCenters[0]
t0 = Variable()
phi.setValue((x-1)**3)
ne.setValue(-6*(x-1))
#____________________________________________________
# @cached(cache)
def S(x, t):
    f = 6*(x-1)*e**(-t) + 54*((x-1)**2)*e**(-2.*t)
    return f
#____________________________________________________
#Boundary Condition:
valueleft_phi=3*e**(-t0)
valueright_phi=6.75*e**(-t0)
valueleft_ne=-6*e**(-t0)
valueright_ne=-6*e**(-t0)
phi.faceGrad.constrain([valueleft_phi], m.facesLeft)
phi.faceGrad.constrain([valueright_phi], m.facesRight)
ne.faceGrad.constrain([valueleft_ne], m.facesLeft)
ne.faceGrad.constrain([valueright_ne], m.facesRight)
#____________________________________________________
eqn0 = DiffusionTerm(1.,var=phi)==ImplicitSourceTerm(-1.,var=ne)
eqn1 = (TransientTerm(1., var=ne) ==
        VanLeerConvectionTerm(phi.faceGrad, var=ne) + S(x, t0))
eqn = eqn0 & eqn1
#____________________________________________________
steps = 1.e4
dt=1.e-4
T=dt*steps
F=dt/(dx**2)
print('F=',F)
#____________________________________________________
vi = Viewer(phi)
with open('out2.txt', 'w') as output:
    while t0() < T:
        print(t0)
        phi.updateOld()
        ne.updateOld()
        res = 1.e30
        #for sweep in range(steps):
        while res > 1.e-4:
            res = eqn.sweep(dt=dt)
        t0.setValue(t0() + dt)
        for m in range(nx):  # note: this rebinds m, which held the mesh above
            output.write(str(phi[m]) + ' ')  # + os.linesep
        output.write('\n')
        if __name__ == '__main__':
            vi.plot()
#____________________________________________________
data = np.loadtxt('out2.txt')
X, T = np.meshgrid(np.linspace(0, L, len(data[0, :])),
                   np.linspace(0, T, len(data[:, 0])))
fig = plt.figure(3)
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, T, Z=data)
plt.show(block=True)
The issue with these equations, particularly eqn0, is that they admit an infinite number of solutions when Neumann boundary conditions are applied on both boundaries. You can fix this by pinning a value somewhere with an internal fixed value. E.g., based on the analytical solution given in the comments, phi = (x-1)**3 * exp(-t), we can pin phi = 0 at x = 1 with
mask = (m.x > 1-dx/2) & (m.x < 1+dx/2)
largeValue = 1e6
value = 0.
#____________________________________________________
eqn0 = (DiffusionTerm(1.,var=phi)==ImplicitSourceTerm(-1.,var=ne)
+ ImplicitSourceTerm(mask * largeValue, var=phi) - mask * largeValue * value)
At this point, the solutions still do not agree with the expected solutions. This is because, while you have called ne.faceGrad.constrain() for the left and right boundaries, the gradient of ne does not appear anywhere in the discretized equations, so the constraint has no effect. You can see this if you plot ne; the gradient is zero at both boundaries despite the constraint, because FiPy never "sees" the constraint.
What does appear is the convective flux ne * phi.faceGrad. By applying fixed-flux boundary conditions, I obtain the expected solutions:
ne_left = 6 * numerix.exp(-t0)
ne_right = -9 * numerix.exp(-t0)
eqn1 = (TransientTerm(1.,var=ne)
== VanLeerConvectionTerm(phi.faceGrad * m.interiorFaces,var=ne)
+ S(x,t0)
+ (m.facesLeft * ne_left * phi.faceGrad).divergence
+ (m.facesRight * ne_right * phi.faceGrad).divergence)
You can probably get better convergence properties with
eqn1 = (TransientTerm(1.,var=ne)
== DiffusionTerm(coeff=ne.faceValue * m.interiorFaces, var=phi)
+ S(x,t0)
+ (m.facesLeft * ne_left * phi.faceGrad).divergence
+ (m.facesRight * ne_right * phi.faceGrad).divergence)
but either formulation seems to work.
Note: constraining phi.faceGrad is fine, because the gradient of phi does appear in DiffusionTerm(coeff=1., var=phi).
Separately, it appears (based on "The Factor is exactly singular") that you are solving with the SciPy LinearLUSolver. The PETSc LinearLUSolver does better, but the baseline value of the solution wanders all over the place. Calling
res = eqn.sweep(dt=dt, solver=LinearGMRESSolver())
also seems to produce stable results (without pinning an internal value). This behavior probably shouldn't be relied on; pinning a value is the right thing to do.
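For reference, a sketch of that solver swap, assuming the solver classes are importable from the top-level fipy package (as in recent FiPy versions):
from fipy import LinearGMRESSolver
res = eqn.sweep(dt=dt, solver=LinearGMRESSolver())  # instead of the default LU solver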

scipy curve_fit returns initial parameter estimates

I am trying to use scipy's curve_fit function to fit a Gaussian function to my data in order to estimate a theoretical power spectral density. When I do, curve_fit always returns the initial parameters (p0=[1,1,1]), which tells me the fit didn't work.
I don't know where the issue is. I am using Python 3.9 (Spyder 5.1.5) from the Anaconda distribution on Windows 11.
Here is a WeTransfer link to the data file:
https://wetransfer.com/downloads/6097ebe81ee0c29ee95a497128c1c2e420220704110130/86bf2d
My code is below. Can someone tell me what the issue is and how I can solve it?
In the plot, the blue curve is my experimental PSD and the orange one is the result of the fit.
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import scipy.constants as cst
File = np.loadtxt('test5.dat')
X = File[:, 1]
Y = File[:, 2]
f_sample = 50000
time = []
for i in range(1, len(X)+1):
    t = i*(1/f_sample)
    time = np.append(time, t)
N = X.shape[0] # number of observation
N1=int(N/2)
delta_t = time[2] - time[1]
T_mes = N * delta_t
freq = np.arange(1/T_mes, (N+1)/T_mes, 1/T_mes)
freq=freq[0:N1]
fNyq = f_sample/2 # Nyquist frequency
nb = 350
freq_block = []
# discrete fourier tansform
X_ft = delta_t*np.fft.fft(X, n=N)
X_ft=X_ft[0:N1]
plt.figure()
plt.plot(time, X)
plt.xlabel('t [s]')
plt.ylabel('x [micro m]')
# Experimental power spectrum on both raw and blocked data
PSD_X_exp = (np.abs(X_ft)**2/T_mes)
PSD_X_exp_b = []
STD_PSD_X_exp_b = []
for i in range(0, N1+2, nb):
    freq_b = np.array(freq[i:i+nb]) # i-nb:i
    psd_b = np.array(PSD_X_exp[i:i+nb])
    freq_block = np.append(freq_block, (1/nb)*np.sum(freq_b))
    PSD_X_exp_b = np.append(PSD_X_exp_b, (1/nb)*np.sum(psd_b))
    STD_PSD_X_exp_b = np.append(STD_PSD_X_exp_b, PSD_X_exp_b/np.sqrt(nb))
plt.figure()
plt.loglog(freq, PSD_X_exp)
plt.legend(['Raw Experimental PSD'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.figure()
plt.loglog(freq_block, PSD_X_exp_b)
plt.legend(['Experimental PSD after blocking'])
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
kB = cst.k # Boltzmann constant [m^2kg/s^2K]
T = 273.15 + 25 # Temperature [K]
r = (2.8 / 2) * 1e-6 # Particle radius [m]
v = 0.00002414 * 10 ** (247.8 / (-140 + T)) # Water viscosity [Pa*s]
gamma = np.pi * 6 * r * v # [m*Pa*s]
Do = kB*T/gamma # expected value for D
f3db_o = 50000 # expected value for f3db
fc_o = 300 # expected value pour fc
n = np.arange(-10,11)
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo = []
    for i in range(0, len(x)):
        # print(i)
        psd_theo = np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2 + (x[i]+n*f_sample)**2))
                          * (1/(1 + ((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo = np.append(PSD_theo, psd_theo)
    return PSD_theo
popt, pcov = curve_fit(theo_spectrum_lorentzian_filter, freq_block, PSD_X_exp_b, p0=[1, 1, 1], sigma=STD_PSD_X_exp_b, absolute_sigma=True, check_finite=True,bounds=(0.1, 10), method='trf', jac=None)
D_, fc_, f3db_ = popt
D1 = D_*Do
fc1 = fc_*fc_o
f3db1 = f3db_*f3db_o
print('Diffusion constant D = ', D1, ' Corner frequency fc= ',fc1, 'f3db(diode,eff)= ', f3db1)
I believe I've successfully fitted your data. Here's the approach I took.
First, I plotted your model (with popt=[1, 1, 1]) against the data you had, and noticed your data was significantly lower than the model. Then I started fiddling with the parameters to push the model upwards, multiplying popt[0] by increasingly large values, and ended up with 1E13 as a ballpark value. Note that I have no idea if this is physically sensible for your model. Then I jury-rigged your fitting function to multiply D_ by 1E13 and ran your code. I got this fit:
So I believe it's a problem of 1) inappropriate starting values and 2) inappropriate bounds. In your position, I would revise this model and check whether there are any problems with the units, and so on.
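One way to avoid hard-coding a magic 1E13 is to fit the order of magnitude directly. Here is a sketch of that idea, reusing the names from the question; the p0 and bounds values are illustrative guesses:
def theo_spectrum_log(x, logD, fc_, f3db_):
    # wrap the original model so the optimizer works on log10(D_), a well-scaled parameter
    return theo_spectrum_lorentzian_filter(x, 10**logD, fc_, f3db_)

popt, pcov = curve_fit(theo_spectrum_log, freq_block[:170], PSD_X_exp_b[:170],
                       p0=[13, 1, 1], sigma=STD_PSD_X_exp_b[:170],
                       absolute_sigma=True, bounds=([0, 0.1, 0.1], [20, 10, 10]))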
Here's what I used to try to fit your model:
plt.figure()
plt.loglog(freq_block[:170], PSD_X_exp_b[:170], label='Exp')
plt.loglog(freq_block[:170],
           theo_spectrum_lorentzian_filter(freq_block[:170],
                                           1E13*popt[0], popt[1], popt[2]),
           label='model')
plt.xlabel('f [Hz]')
plt.ylabel('PSD')
plt.legend()
I limited the data to point 170 because there were some weird backwards values that made me uncomfortable. I would recheck them if I were you.
Here's the model code I used. I didn't change the curve_fit call (except to limit the data to [:170]).
def theo_spectrum_lorentzian_filter(x, D_, fc_, f3db_):
    PSD_theo = []
    D_ = 1E13*D_  # I only changed here
    for i in range(0, len(x)):
        psd_theo = np.sum((((D_*Do)/2*math.pi**2)/((fc_*fc_o)**2 + (x[i]+n*f_sample)**2))
                          * (1/(1 + ((x[i]+n*f_sample)/(f3db_*f3db_o))**2)))
        PSD_theo = np.append(PSD_theo, psd_theo)
    return PSD_theo
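As a side note, the Python loop in this model is slow and easy to get wrong. Here is a vectorized sketch that keeps the loop's arithmetic exactly (note that (D_*Do)/2*math.pi**2 parses as D_*Do*pi**2/2, not D_*Do/(2*pi**2); worth checking against the intended formula):
def theo_spectrum_vectorized(x, D_, fc_, f3db_):
    X = np.asarray(x)[:, None] + n[None, :]*f_sample   # shape (len(x), len(n))
    lorentz = ((D_*Do)/2*math.pi**2) / ((fc_*fc_o)**2 + X**2)
    lowpass = 1/(1 + (X/(f3db_*f3db_o))**2)
    return (lorentz*lowpass).sum(axis=1)
The 1E13 rescaling (or the log10 wrapper above) can then be applied on top.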

How to solve a system of 9 nonlinear differential equations with Python?

I'm desperately trying to solve (and display the graph of) a system of nine nonlinear differential equations which models the path of a boomerang. The system is the following:
All the letters on the left side are variables; the others are either constants or known functions depending on v_G and w_z.
I have tried scipy.odeint with no conclusive results (I had this issue, but the workaround did not work).
I'm beginning to think the problem is linked to the fact that these equations are nonlinear, or that a function in a denominator might cause a singularity that the scipy solver is simply unable to handle. However, I am not familiar with that sort of mathematical knowledge.
What possibilities python-wise do I have to solve this set of equations?
EDIT: Sorry if I was not clear enough. Since it models the path of a boomerang, my goal is not to solve this system analytically (i.e., I don't care about the mathematical expression of each function), but rather to get the values of each function for a specific time range (say, from t1 = 0 s to t2 = 15 s with an interval of 0.01 s between values) in order to display the graph of each function and the path of the boomerang's center of mass (X, Y, Z are its coordinates).
Here is the code I tried :
import scipy.integrate as spi
import numpy as np
#Constants
I3 = 10**-3
lamb = 1
L = 5*10**-1
mu = I3
m = 0.1
Cz = 0.5
rho = 1.2
S = 0.03*0.4
Kz = 1/2*rho*S*Cz
g = 9.81
#Initial conditions
omega0 = 20*np.pi
V0 = 25
Psi0 = 0
theta0 = np.pi/2
phi0 = 0
psi0 = -np.pi/9
X0 = 0
Y0 = 0
Z0 = 1.8
INPUT = (omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0) #initial conditions
def diff_eqs(t, INP):
    '''The main set of equations'''
    Y = np.zeros((9))
    Y[0] = (1/I3) * (Kz*L*(INP[1]**2+(L*INP[0])**2))
    Y[1] = -(lamb/m)*INP[1]
    Y[2] = -(1/(m * INP[1])) * ( Kz*L*(INP[1]**2+(L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    Y[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    Y[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    Y[5] = -np.cos(INP[3])*Y[4]
    Y[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    Y[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    Y[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return Y  # For odeint
t_start = 0.0
t_end = 20
t_step = 0.01
t_range = np.arange(t_start, t_end, t_step)
RES = spi.odeint(diff_eqs, INPUT, t_range)
However, I keep getting the same problem as shown here, and especially the error message:
Excess work done on this call (perhaps wrong Dfun type)
I am not quite sure what it means, but it looks like the solver has trouble solving the system. In any case, when I try to display the 3D path using the XYZ coordinates, I just get 3 or 4 points where there should be something like 2000.
So my questions are:
- Am I doing something wrong in my code?
- If not, is there another, maybe more sophisticated, tool to solve this system?
- If not, is it even possible to get what I want from this system of ODEs?
Thanks in advance
There are several issues:
- if I copy the code, it does not run
- the workaround you mention does not work with odeint; the given solution uses ode
- the SciPy reference for odeint says: "For new code, use scipy.integrate.solve_ivp to solve a differential equation."
- the call RES = spi.odeint(diff_eqs, INPUT, t_range) must be consistent with the function head def diff_eqs(t, INP). By default odeint expects func(y, t), so either swap the arguments in diff_eqs or pass tfirst=True; see the sketch after this list.
There are some issues with the mathematical formulas too:
- have a look at the 3rd formula in your picture. It has no tendency term; it starts with a zero - what does that mean?
- it's hard to check whether you have translated the formulas correctly into code, since the code does not follow the formulas strictly.
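A sketch of the two consistent call styles (the tfirst keyword needs SciPy >= 1.1; solve_ivp is the recommended interface):
# keep diff_eqs(t, INP) as is, and tell odeint that t comes first
RES = spi.odeint(diff_eqs, INPUT, t_range, tfirst=True)
# or use solve_ivp, which expects fun(t, y) natively
sol = spi.solve_ivp(diff_eqs, (t_start, t_end), INPUT, t_eval=t_range)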
Below I tried a solution with scipy's solve_ivp. In case A I'm able to run a pendulum, but in case B no meaningful solution for the boomerang can be found. So check the maths; I guess there is some error in the mathematical expressions.
For the graphics, use pandas to plot all variables together (see code below).
import scipy.integrate as spi
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed for the graphics section below
def diff_eqs_boomerang(t, Y):
    INP = Y
    dY = np.zeros((9))
    dY[0] = (1/I3) * (Kz*L*(INP[1]**2+(L*INP[0])**2))
    dY[1] = -(lamb/m)*INP[1]
    dY[2] = -(1/(m * INP[1])) * ( Kz*L*(INP[1]**2+(L*INP[0])**2) + m*g) + (mu/I3)/INP[0]
    dY[3] = (1/(I3*INP[0]))*(-mu*INP[0]*np.sin(INP[6]))
    dY[4] = (1/(I3*INP[0]*np.sin(INP[3]))) * (mu*INP[0]*np.cos(INP[5]))
    dY[5] = -np.cos(INP[3])*INP[4]
    dY[6] = INP[1]*(-np.cos(INP[5])*np.cos(INP[4]) + np.sin(INP[5])*np.sin(INP[4])*np.cos(INP[3]))
    dY[7] = INP[1]*(-np.cos(INP[5])*np.sin(INP[4]) - np.sin(INP[5])*np.cos(INP[4])*np.cos(INP[3]))
    dY[8] = INP[1]*(-np.sin(INP[5])*np.sin(INP[3]))
    return dY

def diff_eqs_pendulum(t, Y):
    dY = np.zeros((3))
    dY[0] = Y[1]
    dY[1] = -Y[0]
    dY[2] = Y[0]*Y[1]
    return dY
t_start, t_end = 0.0, 12.0
case = 'A'
if case == 'A':   # pendulum
    Y = np.array([0.1, 1.0, 0.0])
    Yres = spi.solve_ivp(diff_eqs_pendulum, [t_start, t_end], Y,
                         method='RK45', max_step=0.01)
if case == 'B':   # boomerang
    Y = np.array([omega0, V0, Psi0, theta0, phi0, psi0, X0, Y0, Z0])
    print('Y initial:'); print(Y); print()
    Yres = spi.solve_ivp(diff_eqs_boomerang, [t_start, t_end], Y,
                         method='RK45', max_step=0.01)
#---- graphics ---------------------
yy = pd.DataFrame(Yres.y).T
tt = np.linspace(t_start,t_end,yy.shape[0])
with plt.style.context('fivethirtyeight'):
    plt.figure(1, figsize=(20, 5))
    plt.plot(tt, yy, lw=8, alpha=0.5)
    plt.grid(axis='y')
    for j in range(3):
        plt.fill_between(tt, yy[j], 0, alpha=0.2, label='y['+str(j)+']')
    plt.legend(prop={'size': 20})

Fitting Tanh curves with python

I need to fit a tanh curve like this one:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def f(x, a1=0.00010, a2=0.00013, a3=0.00013, teta1=1, teta2=0.00555, teta3=0.00555,
      phi1=-50, phi2=600, phi3=-900, a=0.000000019, b=0):
    formule = (a1 * np.tanh(teta1 * (x + phi1)) + a2 * np.tanh(teta2 * (x + phi2))
               + a3 * np.tanh(teta3 * (x + phi3)) + a * x + b)
    return formule
# generate points used to plot
x_plot = np.linspace(-10000, 10000, 1000)
gmodel = Model(f)
result = gmodel.fit(f(x_plot), x=x_plot, a1=1,a2=1,a3=1,teta1=1,teta2=1,teta3=1,phi1=0,phi2=0,phi3=0)
plt.plot(x_plot, f(x_plot), 'bo')
plt.plot(x_plot, result.best_fit, 'r-')
plt.show()
I tried to do something like that, but I got this result:
Is there another way to fit this curve? I don't know what I'm doing wrong.
Basically your fit is fine (although not very nice from the coding point of view). As always, non-linear fits strongly rely on initial parameters; yours are just chosen badly. You could either think about how to determine them manually, or use a pre-made package like differential_evolution from scipy.optimize. I am not using this package, but you can find an example here on SE.
I agree with the answers from mikuszefski and F. Win, but would like to add another point.
Your model includes a line + 3 tanh functions. It's not entirely clear that the data support that many different tanh functions. If they do (and echoing mikuszefski), you will need to tell the fit that these are not identical. Your example starts them off identical, which makes it very difficult for the fit to find a good solution. Either way, it would probably be helpful to be able to easily test whether there really are 1, 2, 3, or more tanh functions.
You may also want to give not only initial values for your parameters, but also realistic boundaries on them, so that the tanh functions are clearly separated and don't wander too far from where they should be.
To clean up your code, and to make it easier to change the number of tanh functions and place boundary constraints, I would suggest making individual models and adding them together, as with:
from lmfit import Model
def f_tanh(x, eta=1, phi=0):
    "tanh function"
    return np.tanh(eta * (x + phi))

def f_line(x, slope=0, intercept=0):
    "line function"
    return slope*x + intercept

# create model as line + 2 tanh functions
gmodel = Model(f_line) + Model(f_tanh, prefix='t1_') + Model(f_tanh, prefix='t2_')
Now you can easily create parameters, with
params = gmodel.make_params(slope=0.003, intercept=0.001,
                            t1_eta=0.021, t1_phi=-2000,
                            t2_eta=0.013, t2_phi=600)
With the fit parameters defined, you can place bounds with:
params['t1_eta'].min = 0
params['t2_eta'].min = 0
params['t1_phi'].min = -3000
params['t1_phi'].max = -1000
params['t2_phi'].min = 0
params['t2_phi'].max = 1000
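To make it easy to test whether 1, 2, or 3 tanh terms are really needed, the same building blocks can be combined in a loop; here is a sketch using the f_line and f_tanh defined above:
def make_model(n_tanh):
    model = Model(f_line)
    for i in range(n_tanh):
        model = model + Model(f_tanh, prefix='t%d_' % (i + 1))
    return model

gmodel = make_model(2)  # compare fit reports for make_model(1), (2), (3)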
I think all of these will help you better explore the data and the fits to it. Putting this all together, you might have:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
def f_tanh(x, eta=1, phi=0):
    "tanh function"
    return np.tanh(eta * (x + phi))

def f_line(x, slope=0, intercept=0):
    "line function"
    return slope*x + intercept

# line + 2 tanh functions
gmodel = Model(f_line) + Model(f_tanh, prefix='t1_') + Model(f_tanh, prefix='t2_')
# generate "data"
x = np.linspace(-10000, 10000, 1000)
y = gmodel.eval(x=x, slope=0.0001,
                t1_eta=0.010, t1_phi=-2100,
                t2_eta=0.004, t2_phi=740)
y = y + np.random.normal(size=len(x), scale=0.02)
# make parameters with initial values
params = gmodel.make_params(slope=0.003, intercept=0.001,
                            t1_eta=0.021, t1_phi=-2000,
                            t2_eta=0.013, t2_phi=600)
# place realistic but generous constraints to keep tanhs separate
params['t1_eta'].min = 0
params['t2_eta'].min = 0
params['t1_phi'].min = -3000
params['t1_phi'].max = -1000
params['t2_phi'].min = 0
params['t2_phi'].max = 1000
result = gmodel.fit(y, params, x=x)
print(result.fit_report())
plt.plot(x, y, 'bo')
plt.plot(x, result.best_fit, 'r-')
plt.show()
This will give a good fit and plot and find the expected values, within the noise level. Hope that helps get you pointed in the right direction.
Your function is a bit confusing and you do not really have function values: you are basically fitting your function to itself. Ideally you would replace f(x_plot) in curve_fit() with real experimental data.
A good way to fit a function is using scipy.optimize.curve_fit
from scipy.optimize import curve_fit
popt, pcov = curve_fit(f, x_plot, f(x_plot),
                       p0=[0.00010, 0.00013, 0.00013, 1, 0.00555, 0.00555,
                           -50, 600, -900, 0.000000019, 0])
plt.plot(x_plot, f(x_plot, *popt))
The resulting fit looks like this
With real data:
test_X = np.array(
[-9.77073e+03, -9.29706e+03, -8.82339e+03, -8.34979e+03, -7.87614e+03, -7.40242e+03, -6.92874e+03, -6.45506e+03,
-5.98143e+03, -5.50771e+03, -5.03404e+03, -4.56012e+03, -4.08674e+03, -3.61304e+03, -3.13937e+03, -2.66578e+03,
-2.19210e+03, -1.71845e+03, -1.24478e+03, -9.78925e+02, -9.29077e+02, -8.79059e+02, -8.29082e+02, -7.79092e+02,
-7.29080e+02, -6.79084e+02, -6.29061e+02, -5.79078e+02, -5.29103e+02, -4.79089e+02, -4.29094e+02, -3.79071e+02,
-3.29074e+02, -2.79062e+02, -2.29079e+02, -1.92907e+02, -1.72931e+02, -1.52930e+02, -1.32937e+02, -1.12946e+02,
-9.29511e+01, -7.29438e+01, -5.29292e+01, -3.29304e+01, -1.29330e+01, 7.04455e+00, 2.70676e+01, 4.70634e+01,
6.70526e+01, 8.70340e+01, 1.07056e+02, 1.27037e+02, 1.47045e+02, 1.67033e+02, 1.87039e+02, 2.20765e+02,
2.70680e+02, 3.20699e+02, 3.70693e+02, 4.20692e+02, 4.70696e+02, 5.20704e+02, 5.70685e+02, 6.20710e+02,
6.70682e+02, 7.20705e+02, 7.70707e+02, 8.20704e+02, 8.70713e+02, 9.20691e+02, 9.70700e+02, 1.23926e+03,
1.73932e+03, 2.23932e+03, 2.73926e+03, 3.23924e+03, 3.73926e+03, 4.23952e+03, 4.73926e+03, 5.23930e+03,
5.71508e+03, 6.21417e+03, 6.71413e+03, 7.21412e+03, 7.71410e+03, 8.21405e+03, 8.71402e+03, 9.21423e+03])
test_Y = np.array(
[-3.17679e-04, -3.27541e-04, -3.51184e-04, -3.60672e-04, -3.75965e-04, -3.86888e-04, -4.03222e-04, -4.23262e-04,
-4.38526e-04, -4.51187e-04, -4.61081e-04, -4.67121e-04, -4.96690e-04, -4.94811e-04, -5.10110e-04, -5.18985e-04,
-5.11754e-04, -4.90964e-04, -4.36904e-04, -3.93638e-04, -3.83336e-04, -3.71110e-04, -3.57207e-04, -3.39643e-04,
-3.24155e-04, -2.97296e-04, -2.74653e-04, -2.43700e-04, -1.95574e-04, -1.60716e-04, -1.43363e-04, -1.33610e-04,
-1.30734e-04, -1.26332e-04, -1.26063e-04, -1.24228e-04, -1.23424e-04, -1.20276e-04, -1.16886e-04, -1.21865e-04,
-1.16605e-04, -1.14148e-04, -1.14728e-04, -1.14660e-04, -1.16927e-04, -1.10380e-04, -1.09836e-04, 4.24232e-05,
8.66095e-05, 8.43905e-05, 9.09867e-05, 8.95580e-05, 9.02585e-05, 8.87033e-05, 8.86536e-05, 8.92236e-05,
9.24438e-05, 9.27929e-05, 9.24961e-05, 9.72166e-05, 1.00432e-04, 1.05457e-04, 1.11278e-04, 1.14716e-04,
1.25818e-04, 1.40721e-04, 1.62968e-04, 1.91776e-04, 2.28125e-04, 2.57918e-04, 2.88941e-04, 3.85003e-04,
4.91916e-04, 5.32483e-04, 5.50929e-04, 5.45350e-04, 5.38903e-04, 5.27765e-04, 5.15592e-04, 4.95717e-04,
4.81722e-04, 4.69538e-04, 4.58643e-04, 4.41407e-04, 4.29820e-04, 4.07784e-04, 3.92236e-04, 3.81761e-04])
I tried this:
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
def function(x, a1, a2, a3, teta1, teta2, teta3, phi1, phi2, phi3, a, b):
    formule = (a1 * numpy.tanh(teta1 * (x + phi1)) + a2 * numpy.tanh(teta2 * (x + phi2))
               + a3 * numpy.tanh(teta3 * (x + phi3)) + a * x + b)
    return formule
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore")  # do not print warnings by genetic algorithm
    val = function(test_X, *parameterTuple)
    return numpy.sum((test_Y - val) ** 2.0)
def generate_Initial_Parameters():
    # bounds follow the parameter order of function():
    # a1, a2, a3, teta1, teta2, teta3, phi1, phi2, phi3, a, b
    parameterBounds = []
    parameterBounds.append([1.4e-04, 1.4e-04])    # a1 (fixed)
    parameterBounds.append([2.0e-04, 2.0e-04])    # a2 (fixed)
    parameterBounds.append([2.5e-04, 2.5e-04])    # a3 (fixed)
    parameterBounds.append([0, 2.0e+01])          # teta1
    parameterBounds.append([0, 4.0e-03])          # teta2
    parameterBounds.append([0, 4.0e-03])          # teta3
    parameterBounds.append([-8.0e+01, 0])         # phi1
    parameterBounds.append([0, 9.0e+02])          # phi2
    parameterBounds.append([-2.1e+03, 0])         # phi3
    parameterBounds.append([-3.4e-08, -2.4e-08])  # a
    parameterBounds.append([-2.2e-05*2, 4.2e-05]) # b
    # seed the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x
# generate initial parameter values
geneticParameters = generate_Initial_Parameters()
# curve fit the test data
fittedParameters, pcov = curve_fit(function, test_X, test_Y, geneticParameters)
print('Parameters', fittedParameters)
modelPredictions = function(test_X, *fittedParameters)
absError = modelPredictions - test_Y
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(test_Y))
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
ytry = function(test_X, *geneticParameters)  # 'ftry' was undefined; plotting the genetic-seed curve instead
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth / 100.0, graphHeight / 100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(test_X, test_Y, 'D')
    # create data for the fitted equation plot
    yModel = function(test_X, *fittedParameters)
    # now the model as a line plot
    axes.plot(test_X, yModel)
    axes.set_xlabel('X Data')  # X axis data label
    axes.set_ylabel('Y Data')  # Y axis data label
    axes.plot(test_X, ytry)
    plt.show()
    plt.close('all')  # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
R-squared: 0.9978, not perfect but not so bad

Scipy's Optimize Curve Fit Limits

Is there any way I can provide limits for SciPy's optimize.curve_fit?
My example:
def optimized_formula(x, m_1, m_2, y_1, y_2, ratio_2):
    return ((log(x[0]) * m_1 + m_2) * ((1 - x[1]/max_age) * (1 - ratio_2))
            + (log(x[1]) * y_1 + y_2) * (x[1]/max_age) * ratio_2)

popt, pcov = optimize.curve_fit(optimized_formula, usage_and_age, prices)
x[0] is age and max_age is a constant. With that in mind, as x[0] approaches maximum, x[1]/max_age approaches 1.
Is it possible to provide a constraint/limit whereby x[1]/max_age > 0.3 and x[1]/max_age < 0.7 and other constraints such as m_1 < 0, m_2 > 0, and so on.
As suggested in another answer, you could use lmfit for this kind of problem. Therefore, I'll add an example of how to use it, in case someone else is interested in this topic too.
Let's say you have a dataset as follows:
xdata = np.array([177.,180.,183.,187.,189.,190.,196.,197.,201.,202.,203.,204.,206.,218.,225.,231.,234.,
252.,262.,266.,267.,268.,277.,286.,303.])
ydata = np.array([0.81,0.74,0.78,0.75,0.77,0.81,0.73,0.76,0.71,0.74,0.81,0.71,0.74,0.71,
0.72,0.69,0.75,0.59,0.61,0.63,0.64,0.63,0.35,0.27,0.26])
and you want to fit a model to the data which looks like this:
model = n1 + (n2 * x + n3) * 1./ (1. + np.exp(n4 * (n5 - x)))
with the constraints that
0.2 < n1 < 0.8
-0.3 < n2 < 0
Using lmfit (version 0.8.3) you then obtain the following output:
n1: 0.26564921 +/- 0.024765 (9.32%) (init= 0.2)
n2: -0.00195398 +/- 0.000311 (15.93%) (init=-0.005)
n3: 0.87261892 +/- 0.068601 (7.86%) (init= 1.0766)
n4: -1.43507072 +/- 1.223086 (85.23%) (init=-0.36379)
n5: 277.684530 +/- 3.768676 (1.36%) (init= 274)
As you can see, the fit reproduces the data very well and the parameters are in the requested ranges.
Here is the entire code that reproduces the plot with a few additional comments:
from lmfit import minimize, Parameters, Parameter, report_fit
import numpy as np
xdata = np.array([177.,180.,183.,187.,189.,190.,196.,197.,201.,202.,203.,204.,206.,218.,225.,231.,234.,
252.,262.,266.,267.,268.,277.,286.,303.])
ydata = np.array([0.81,0.74,0.78,0.75,0.77,0.81,0.73,0.76,0.71,0.74,0.81,0.71,0.74,0.71,
0.72,0.69,0.75,0.59,0.61,0.63,0.64,0.63,0.35,0.27,0.26])
def fit_fc(params, x, data):
    n1 = params['n1'].value
    n2 = params['n2'].value
    n3 = params['n3'].value
    n4 = params['n4'].value
    n5 = params['n5'].value
    model = n1 + (n2 * x + n3) * 1./(1. + np.exp(n4 * (n5 - x)))
    return model - data  # that's what you want to minimize
# create a set of Parameters
# 'value' is the initial condition
# 'min' and 'max' define your boundaries
params = Parameters()
params.add('n1', value= 0.2, min=0.2, max=0.8)
params.add('n2', value= -0.005, min=-0.3, max=10**(-10))
params.add('n3', value= 1.0766, min=-1000., max=1000.)
params.add('n4', value= -0.36379, min=-1000., max=1000.)
params.add('n5', value= 274.0, min=0., max=1000.)
# do fit, here with leastsq model
result = minimize(fit_fc, params, args=(xdata, ydata))
# write error report
report_fit(params)
xplot = np.linspace(min(xdata), max(xdata), 1000)
yplot = result.values['n1'] + (result.values['n2'] * xplot + result.values['n3']) * \
        1./(1. + np.exp(result.values['n4'] * (result.values['n5'] - xplot)))
#plot results
try:
    import pylab
    pylab.plot(xdata, ydata, 'k+')
    pylab.plot(xplot, yplot, 'r')
    pylab.show()
except ImportError:
    pass
EDIT:
If you use version 0.9.x you need to adjust the code accordingly; check here which changes have been made from 0.8.3 to 0.9.x.
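For instance, under 0.9.x the fitted values live on the returned result object rather than on the params you passed in; a minimal sketch of the lines that change:
result = minimize(fit_fc, params, args=(xdata, ydata))
report_fit(result.params)            # not report_fit(params)
n1_best = result.params['n1'].value  # read fitted values via result.params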
Note: the bounds keyword is new in version 0.17 of SciPy.
Let's suppose you want to fit a model to the data which looks like this:
y=a*t**alpha+b
and with the constraint on alpha
0<alpha<2
while other parameters a and b remains free. Then we should use the bounds option of optimize.curve_fit:
import numpy as np
from scipy.optimize import curve_fit

def func(t, a, alpha, b):
    return a*t**alpha + b

param_bounds = ([-np.inf, 0, -np.inf], [np.inf, 2, np.inf])
popt, pcov = curve_fit(func, xdata, ydata, bounds=param_bounds)
Source is here
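A quick self-contained check that the bound is respected (synthetic data with hypothetical values):
t = np.linspace(1, 10, 50)
y = 2.5*t**1.3 + 0.7 + np.random.normal(scale=0.1, size=t.size)
popt, pcov = curve_fit(func, t, y, bounds=param_bounds)
print(popt)  # the fitted alpha stays inside (0, 2)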
Try the lmfit module (http://lmfit.github.io/lmfit-py/). It adds a way to fix or set bounds on parameters for many of the optimization routines in scipy.optimize, including least-squares, and provides many tools to make fitting easier.
Since curve_fit() uses a least-squares approach, you might want to look at scipy.optimize.fmin_slsqp(), which allows you to perform constrained optimizations. Check this tutorial on how to use it.
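For instance, here is a minimal sketch of that route (the one-parameter-pair model and the data names xdata and ydata are hypothetical; large finite numbers stand in for infinite bounds, which older SLSQP versions handle more reliably):
import numpy as np
from scipy.optimize import fmin_slsqp

def sum_sq(p, x, y):
    m_1, m_2 = p
    return np.sum((y - (np.log(x)*m_1 + m_2))**2)  # least-squares objective

p_opt = fmin_slsqp(sum_sq, x0=np.array([-1.0, 1.0]),
                   bounds=[(-1e6, 0.0), (0.0, 1e6)],  # enforces m_1 < 0, m_2 > 0
                   args=(xdata, ydata))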
