Scipy Optimize CurveFit calculates wrong values - python

I am interested in finding the phase shift between two sine-like waves. To do that, I am fitting each wave with scipy.optimize.curve_fit. I have been following this post. However, I obtain negative amplitudes, and the fitted phase sometimes appears shifted by pi radians.
The code I am using is below:
import numpy as np
import scipy.optimize

def fit_sin_LD(t_LD, y_LD):
    '''Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc"'''
    ff = np.fft.fftfreq(len(t_LD), (t_LD[1]-t_LD[0])) # assume uniform spacing
    Fyy = abs(np.fft.fft(y_LD))
    guess_freq = abs(ff[np.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
    guess_amp = np.std(y_LD) * 2.**0.5
    guess_offset = np.mean(y_LD)
    guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])
    def sinfunc_LD(t_LD, A, w, p, c):
        return A * np.sin(w*t_LD + p) + c
    #boundary=([0,-np.inf,-np.pi, 1.5],[0.8, +np.inf, np.pi, 2.5])
    popt, pcov = scipy.optimize.curve_fit(sinfunc_LD, t_LD, y_LD, p0=guess, maxfev=3000) # with maxfev= number I can increase the number of iterations
    A, w, p, c = popt
    f = w/(2.*np.pi)
    fitfunc_LD = lambda t_LD: A*np.sin(w*t_LD + p) + c
    fitted_LD = fitfunc_LD(t_LD)
    dic_LD = {"amp_LD": A, "omega_LD": w, "phase_LD": p, "offset_LD": c, "freq_LD": f, "period_LD": 1./f, "fitfunc_LD": fitted_LD, "maxcov_LD": np.max(pcov), "rawres_LD": (guess, popt, pcov)}
    return dic_LD

def fit_sin_APD(t_APD, y_APD):
    ''' Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc" '''
    ff = np.fft.fftfreq(len(t_APD), (t_APD[1]-t_APD[0])) # assume uniform spacing
    Fyy = abs(np.fft.fft(y_APD))
    guess_freq = abs(ff[np.argmax(Fyy[1:])+1]) # excluding the zero frequency "peak", which is related to offset
    guess_amp = np.std(y_APD) * 2.**0.5
    guess_offset = np.mean(y_APD)
    guess = np.array([guess_amp, 2.*np.pi*guess_freq, 0., guess_offset])
    def sinfunc_APD(t_APD, A, w, p, c):
        return A * np.sin(w*t_APD + p) + c
    #boundary=([0,0,-np.pi, 0.0],[np.inf, np.inf, np.pi, 0.7])
    popt, pcov = scipy.optimize.curve_fit(sinfunc_APD, t_APD, y_APD, p0=guess, maxfev=5000) # with maxfev= number I can increase the number of iterations
    A, w, p, c = popt
    f = w/(2.*np.pi)
    fitfunc_APD = lambda t_APD: A*np.sin(w*t_APD + p) + c
    fitted_APD = fitfunc_APD(t_APD)
    dic_APD = {"amp_APD": A, "omega_APD": w, "phase_APD": p, "offset_APD": c, "freq_APD": f, "period_APD": 1./f, "fitfunc_APD": fitted_APD, "maxcov_APD": np.max(pcov), "rawres_APD": (guess, popt, pcov)}
    return dic_APD
I don't understand why curve_fit returns a negative amplitude (which makes no physical sense). I have also tried setting bounds as a keyword argument:
bounds=([0.0, -np.inf,-np.pi, 0.0],[+np.inf, +np.inf,-np.pi, +np.inf])
but that yields an even stranger result.
I added an image showing this difference:
Does anyone know how to overcome this issue with phases and amplitudes?
Thanks in advance

There are a few things here that I do not understand:
There is no need to define the model function inside the fitting function.
There is no need to define the fitting function twice if the only difference is the naming of the dictionary keys (and I do not see why the keys have to be named differently in the first place).
One could fit the frequency directly instead of omega.
When pre-calculating the fitted values, one can call the model function directly.
Overall, I don't see why the second fit should fail, and with some generic data it doesn't. Since an amplitude can be complex in physics, I don't have a problem with a negative result. Nevertheless, I understand the point of the OP. A fit algorithm knows nothing about physics, and mathematically there is no problem with a negative amplitude: it just corresponds to an additional phase shift of pi. Hence, one can easily force a positive amplitude by flipping the sign and adding pi to the phase. I introduced this here as an optional keyword argument. Moreover, I reduced the code to one fit function, with optional "renaming" of the output dictionary keys as another keyword argument.
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def sinfunc( t, A, f, p, c ):
    return A * np.sin( 2.0 * np.pi * f * t + p) + c

def fit_sin(t_APD, y_APD, addName="", posamp=False):
    ''' Fit sin to the input time sequence, and return fitting parameters "amp", "omega", "phase", "offset", "freq", "period" and "fitfunc" '''
    ff = np.fft.fftfreq( len( t_APD ), t_APD[1] - t_APD[0] ) # assume uniform spacing
    Fyy = abs( np.fft.fft( y_APD ) )
    guess_freq = abs( ff[np.argmax( Fyy[1:] ) + 1] ) # excluding the zero frequency "peak", which is related to offset
    guess_amp = np.std( y_APD ) * 2.**0.5
    guess_offset = np.mean( y_APD )
    guess = np.array( [ guess_amp, guess_freq, 0., guess_offset ] )
    popt, pcov = curve_fit(sinfunc, t_APD, y_APD, p0=guess, maxfev=500) # with maxfev= number I can increase the number of iterations
    if popt[0] < 0 and posamp:
        # A * sin(x + p) == -A * sin(x + p + pi): flip the sign and shift the phase
        popt[0] = -popt[0]
        popt[2] += np.pi
    popt[2] = popt[2] % ( 2 * np.pi )
    A, f, p, c = popt
    fitted_APD = sinfunc( t_APD, *popt )
    dic_APD = {
        "amp{}".format(addName): A,
        "omega{}".format(addName): 2.0 * np.pi * f,
        "phase{}".format(addName): p,
        "offset{}".format(addName): c,
        "freq{}".format(addName): f,
        "period{}".format(addName): 1.0 / f,
        "fitfunc{}".format(addName): fitted_APD,
        "maxcov{}".format(addName): np.max( pcov ),
        "rawres{}".format(addName): ( guess, popt, pcov ) }
    return dic_APD

tl = np.linspace(0,1e-6, 150 )
# np.float was removed from recent NumPy releases; the builtin float is equivalent here
sl1 = np.fromiter( (sinfunc(t, .18, 4998735, 3.6, 2.0 ) + .01 *( 1 - 2 * np.random.random() ) for t in tl ), float )
sl2 = np.fromiter( (sinfunc(t, .06, 4998735, 2.1, 0.4 ) + .01 *( 1 - 2 * np.random.random() ) for t in tl ), float )
ld = fit_sin(tl, sl1, addName="_ld" )
print( ld["amp_ld"] )
ld = fit_sin(tl, sl1, addName="_ld", posamp=True )
print( ld["amp_ld"] )
apd = fit_sin(tl, sl2 )
fig = plt.figure("1")
ax = fig.add_subplot( 1, 1, 1 )
ax.plot( tl, sl1, color="r" )
ax.plot( tl, ld["fitfunc_ld"], color="k", ls="--" )
ax.plot( tl, sl2, color="#50FF80" )
ax.plot( tl, apd["fitfunc"], color="k", ls="--" )
ax.grid()
plt.show()
This gives me:
-0.180108427200549
0.180108427200549
i.e. in the first try, despite the good guess for the amplitude, the amplitude turns out negative. This is probably due to the large phase. As the phase guess is zero, it is easier for the algorithm to switch the sign of the amplitude first and then adjust the phase. As mentioned above, this is corrected easily and does not even require error propagation.
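The same sign flip can be applied after the fact to any pair of fitted parameters, which is what the phase shift between two waves needs. The following is a minimal sketch, not part of the code above: `normalize_sine` and `phase_difference` are hypothetical helper names, and the model is assumed to be A * sin(x + p) + c.

```python
import numpy as np

def normalize_sine(amp, phase):
    """Return an equivalent (amp, phase) pair with amp >= 0 and phase in [0, 2*pi)."""
    if amp < 0:
        # A * sin(x + p) == -A * sin(x + p + pi)
        amp, phase = -amp, phase + np.pi
    return amp, phase % (2 * np.pi)

def phase_difference(phase_a, phase_b):
    """Phase of a minus phase of b, wrapped to the interval [-pi, pi)."""
    return (phase_a - phase_b + np.pi) % (2 * np.pi) - np.pi

# Two parameter sets that describe the same wave
amp1, ph1 = normalize_sine(-0.18, 0.46)
amp2, ph2 = normalize_sine(0.18, 0.46 + np.pi)

# Check that the normalized parameters reproduce the original curve
t = np.linspace(0, 1, 50)
same_wave = np.allclose(-0.18 * np.sin(7 * t + 0.46), amp1 * np.sin(7 * t + ph1))
print(amp1, ph1, same_wave)
```

Normalizing both fits before subtracting the phases avoids a spurious offset of pi when only one of the two amplitudes came out negative.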

Related

Why is my attempt to fit a tanh(x) function to data not working well?

I've got data of anode currents for different anode voltages. I'm trying to fit a tanh(x) curve to the resulting I-V curve using curve_fit, but I keep getting a line.
Since I'm trying to fit a curve to y against log10(x) I did the curve_fit 2 ways:
I took the log10 of the data first and fit the curve second
I fit the curve first and took the log10 of the data second.
Method 1 code and output:
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
def fitfunction(v, a, b, c, d):
    return a * np.tanh(b * v + c) + d
# x data = V90
# y data = np.log10(I90)
pars, cov = curve_fit(fitfunction, V90, np.log10(I90))
plt.plot(V90, fitfunction(V90, *pars), 'r-', linewidth='3', label='Line of Best Fit')
plt.scatter(V90, np.log10(I90), marker='.', label='Data')
plt.title('Graph of Line of Best Fit of Anode Current against Anode Potential')
plt.grid(True)
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
plt.legend()
plt.show()
The resulting graph using the above code was:
Method 2 code and output:
pars, cov = curve_fit(fitfunction, V90, I90)
print(pars)
y = fitfunction(V90, *pars)
if any(i < 0 for i in y):
    y = y + abs(min(y))
y = y[377:]
x = V90[377:]
plt.plot(x, np.log10(y), 'r-', linewidth='3', label='Line of Best Fit')
plt.scatter(V90, np.log10(I90), marker='.', label='Data')
plt.title('Graph of Line of Best Fit of Anode Current against Anode Potential')
plt.grid(True)
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
plt.legend()
plt.show()
The resulting graph using this code was:
I'm not quite sure why in the second method, even if I do cut off the 0 values, there is still such a drastic deviation when V90 < 0.
The part of the data that's causing problems seems to be somewhere in 2.5V-5.0V which is about 500 lines of data:
#Voltages(V) Currents(A) Af=89.9mA Vf=24.3V
2.50000,0.0003815846315912
2.50500,0.0003816979315912
2.51000,0.0003817056315912
2.51500,0.00038173013159120006
2.52000,0.00038178253159120004
2.52500,0.0003818257315912
2.53000,0.0003819050315912
2.53500,0.0003818466315912
2.54000,0.0003818978315912
2.54500,0.0003819977315912
2.55000,0.00038197953159120005
2.55500,0.00038198843159120005
2.56000,0.00038210623159120005
2.56500,0.00038209303159120005
2.57000,0.0003821845315912
2.57500,0.0003821863315912
2.58000,0.00038220063159120004
2.58500,0.0003822367315912
2.59000,0.00038230733159120005
2.59500,0.00038230853159120003
2.60000,0.00038232433159120005
2.60500,0.0003823070315912
2.61000,0.0003824262315912
2.61500,0.0003824784315912
2.62000,0.0003825377315912
2.62500,0.0003825299315912
2.63000,0.0003825463315912
2.63500,0.00038256423159120005
2.64000,0.00038260893159120006
2.64500,0.0003826748315912
2.65000,0.0003826939315912
2.65500,0.0003826620315912
2.66000,0.00038270823159120004
2.66500,0.00038275413159120003
2.67000,0.0003827898315912
2.67500,0.0003828730315912
2.68000,0.00038286673159120005
2.68500,0.00038290933159120004
2.69000,0.0003829376315912
2.69500,0.0003829943315912
2.70000,0.0003830041315912
2.70500,0.00038304703159120005
2.71000,0.0003830539315912
2.71500,0.0003830631315912
2.72000,0.0003831442315912
2.72500,0.00038314893159120005
2.73000,0.0003831841315912
2.73500,0.0003832354315912
2.74000,0.00038327293159120005
2.74500,0.0003833084315912
2.75000,0.0003833653315912
2.75500,0.0003834109315912
2.76000,0.00038340913159120004
2.76500,0.0003834842315912
2.77000,0.00038352123159120005
2.77500,0.00038353283159120005
2.78000,0.00038357873159120003
2.78500,0.00038351133159120004
2.79000,0.00038353613159120004
2.79500,0.0003836058315912
2.80000,0.00038369733159120004
2.80500,0.0003836595315912
2.81000,0.00038369343159120006
2.81500,0.0003837056315912
2.82000,0.0003837095315912
2.82500,0.0003837635315912
2.83000,0.0003838389315912
2.83500,0.00038387283159120004
2.84000,0.00038388863159120006
2.84500,0.00038390353159120004
2.85000,0.00038393663159120005
2.85500,0.00038400643159120006
2.86000,0.00038402783159120004
2.86500,0.0003840540315912
2.87000,0.00038411223159120003
2.87500,0.00038411843159120003
2.88000,0.0003841074315912
2.88500,0.0003841589315912
2.89000,0.00038417953159120005
2.89500,0.0003841864315912
2.90000,0.00038419383159120005
2.90500,0.0003842904315912
2.91000,0.0003843029315912
2.91500,0.0003842731315912
2.92000,0.0003843509315912
2.92500,0.0003843986315912
2.93000,0.00038439503159120003
2.93500,0.0003843986315912
2.94000,0.00038445253159120007
2.94500,0.0003844441315912
2.95000,0.00038448443159120005
2.95500,0.00038446503159120005
2.96000,0.00038451213159120006
2.96500,0.0003845723315912
2.97000,0.0003846322315912
2.97500,0.0003846078315912
2.98000,0.00038468173159120005
2.98500,0.0003846975315912
2.99000,0.00038470823159120004
2.99500,0.00038466083159120003
3.00000,0.00038470853159120003
3.00500,0.00038475533159120005
3.01000,0.00038481043159120003
3.01500,0.0003848238315912
3.02000,0.0003848566315912
3.02500,0.00038489833159120003
3.03000,0.0003848757315912
3.03500,0.0003849278315912
3.04000,0.0003849609315912
3.04500,0.0003849475315912
3.05000,0.0003850175315912
3.05500,0.0003850137315912
3.06000,0.0003850512315912
3.06500,0.0003851350315912
3.07000,0.0003850923315912
3.07500,0.0003851168315912
3.08000,0.00038514873159120003
3.08500,0.0003851388315912
3.09000,0.00038523033159120005
3.09500,0.00038527003159120004
3.10000,0.00038524373159120007
3.10500,0.00038534923159120005
3.11000,0.0003853215315912
3.11500,0.00038534393159120003
3.12000,0.0003853278315912
3.12500,0.0003853394315912
3.13000,0.0003853603315912
3.13500,0.0003853787315912
3.14000,0.00038547143159120003
3.14500,0.0003854083315912
3.15000,0.00038548693159120006
3.15500,0.00038548843159120003
3.16000,0.00038548723159120005
3.16500,0.0003855608315912
3.17000,0.0003855918315912
3.17500,0.0003855635315912
3.18000,0.0003856055315912
3.18500,0.00038563923159120006
3.19000,0.0003856708315912
3.19500,0.00038566013159120003
3.20000,0.0003857125315912
3.20500,0.00038573913159120003
3.21000,0.00038580913159120005
3.21500,0.00038580463159120003
3.22000,0.0003858180315912
3.22500,0.0003858449315912
3.23000,0.00038583533159120004
3.23500,0.00038587913159120006
3.24000,0.0003858967315912
3.24500,0.0003858645315912
3.25000,0.0003859590315912
3.25500,0.0003859778315912
3.26000,0.00038596353159120006
3.26500,0.00038597303159120005
3.27000,0.00038599453159120006
3.27500,0.0003860436315912
3.28000,0.0003860785315912
3.28500,0.0003860782315912
3.29000,0.00038609913159120006
3.29500,0.0003861384315912
3.30000,0.0003861640315912
3.30500,0.0003861927315912
3.31000,0.0003862037315912
3.31500,0.0003861900315912
3.32000,0.0003862180315912
3.32500,0.00038625343159120004
3.33000,0.00038624963159120004
3.33500,0.0003862892315912
3.34000,0.0003863312315912
3.34500,0.00038631103159120006
3.35000,0.00038635183159120004
3.35500,0.00038638793159120004
3.36000,0.0003863819315912
3.36500,0.0003864013315912
3.37000,0.00038648713159120004
3.37500,0.0003864919315912
3.38000,0.00038650973159120005
3.38500,0.0003865127315912
3.39000,0.0003865819315912
3.39500,0.0003865768315912
3.40000,0.00038658543159120006
3.40500,0.00038663733159120003
3.41000,0.00038662423159120006
3.41500,0.0003866617315912
3.42000,0.00038666563159120003
3.42500,0.0003866883315912
3.43000,0.00038664123159120006
3.43500,0.00038672403159120005
3.44000,0.0003867511315912
3.44500,0.00038671243159120005
3.45000,0.00038679173159120005
3.45500,0.00038677233159120004
3.46000,0.0003867619315912
3.46500,0.0003867944315912
3.47000,0.00038678993159120003
3.47500,0.0003868668315912
3.48000,0.00038685753159120004
3.48500,0.0003868131315912
3.49000,0.0003868772315912
3.49500,0.00038690763159120005
3.50000,0.00038699083159120007
3.50500,0.0003869496315912
3.51000,0.0003869472315912
3.51500,0.0003870432315912
3.52000,0.0003870340315912
3.52500,0.0003870396315912
3.53000,0.0003870617315912
3.53500,0.0003870569315912
3.54000,0.00038713173159120003
3.54500,0.00038711033159120005
3.55000,0.00038715503159120006
3.55500,0.00038715383159120003
3.56000,0.0003871439315912
3.56500,0.00038713713159120003
3.57000,0.00038718153159120005
3.57500,0.00038722833159120007
3.58000,0.0003871866315912
3.58500,0.0003872426315912
3.59000,0.0003872560315912
3.59500,0.00038726643159120005
3.60000,0.0003872912315912
3.60500,0.0003872947315912
3.61000,0.0003873543315912
3.61500,0.00038739643159120004
3.62000,0.0003873946315912
3.62500,0.0003874357315912
3.63000,0.00038740023159120003
3.63500,0.0003874494315912
3.64000,0.00038744203159120003
3.64500,0.00038750963159120004
3.65000,0.00038753823159120004
3.65500,0.0003876038315912
3.66000,0.0003876062315912
3.66500,0.0003876419315912
3.67000,0.00038765513159120006
3.67500,0.00038764313159120003
3.68000,0.0003876282315912
3.68500,0.0003876902315912
3.69000,0.0003876959315912
3.69500,0.0003877227315912
3.70000,0.0003877594315912
3.70500,0.0003877594315912
3.71000,0.00038777433159120005
3.71500,0.0003877194315912
3.72000,0.0003878219315912
3.72500,0.00038785143159120005
3.73000,0.0003878336315912
3.73500,0.00038787443159120003
3.74000,0.0003878839315912
3.74500,0.00038790993159120004
3.75000,0.0003879575315912
3.75500,0.0003880106315912
3.76000,0.00038798413159120003
3.76500,0.0003880049315912
3.77000,0.0003880705315912
3.77500,0.00038805773159120003
3.78000,0.0003880672315912
3.78500,0.00038812413159120007
3.79000,0.0003881292315912
3.79500,0.00038814233159120004
3.80000,0.0003881444315912
3.80500,0.0003881864315912
3.81000,0.0003881593315912
3.81500,0.0003881632315912
3.82000,0.0003881823315912
3.82500,0.00038825473159120004
3.83000,0.0003882901315912
3.83500,0.00038829763159120003
3.84000,0.0003882907315912
3.84500,0.00038834053159119997
3.85000,0.0003883670315912
3.85500,0.0003883939315912
3.86000,0.0003883834315912
3.86500,0.0003883804315912
3.87000,0.0003884391315912
3.87500,0.0003884344315912
3.88000,0.0003884836315912
3.88500,0.00038849013159120003
3.89000,0.00038856043159120004
3.89500,0.0003885616315912
3.90000,0.00038857123159120004
3.90500,0.0003886108315912
3.91000,0.0003886305315912
3.91500,0.0003886633315912
3.92000,0.00038866483159119997
3.92500,0.0003886671315912
3.93000,0.00038875213159120003
3.93500,0.0003887288315912
3.94000,0.0003887735315912
3.94500,0.0003887700315912
3.95000,0.00038872023159120005
3.95500,0.00038880303159120004
3.96000,0.0003888695315912
3.96500,0.0003888597315912
3.97000,0.00038894643159120005
3.97500,0.0003889258315912
3.98000,0.00038898183159120003
3.98500,0.0003889664315912
3.99000,0.0003889261315912
3.99500,0.00038897353159120007
4.00000,0.0003890727315912
4.00500,0.0003890072315912
4.01000,0.00038906323159120003
4.01500,0.00038904213159120005
4.02000,0.00038912523159120003
4.02500,0.0003891359315912
4.03000,0.0003891619315912
4.03500,0.0003891288315912
4.04000,0.0003892173315912
4.04500,0.0003891940315912
4.05000,0.0003892384315912
4.05500,0.0003893127315912
4.06000,0.0003893159315912
4.06500,0.0003892709315912
4.07000,0.0003892769315912
4.07500,0.0003892754315912
4.08000,0.00038935383159120003
4.08500,0.00038933983159120003
4.09000,0.00038941553159120004
4.09500,0.00038942323159120006
4.10000,0.0003894319315912
4.10500,0.0003895219315912
4.11000,0.0003895454315912
4.11500,0.00038950883159120003
4.12000,0.0003896175315912
4.12500,0.00038962773159120003
4.13000,0.0003896372315912
4.13500,0.0003896017315912
4.14000,0.0003896599315912
4.14500,0.00038967243159120004
4.15000,0.00038972243159120007
4.15500,0.00038967923159120003
4.16000,0.0003897564315912
4.16500,0.00038985093159120003
4.17000,0.0003898157315912
4.17500,0.0003898777315912
4.18000,0.00038985683159120004
4.18500,0.00038996063159120003
4.19000,0.00038990393159120005
4.19500,0.0003898995315912
4.20000,0.00039002143159120006
4.20500,0.0003900357315912
4.21000,0.00039002343159120004
4.21500,0.00039006813159120005
4.22000,0.00039010213159120004
4.22500,0.0003901114315912
4.23000,0.0003901018315912
4.23500,0.0003901289315912
4.24000,0.0003902195315912
4.24500,0.0003902076315912
4.25000,0.00039021833159120004
4.25500,0.0003902306315912
4.26000,0.00039023533159120003
4.26500,0.00039031403159120004
4.27000,0.0003903662315912
4.27500,0.00039037843159120004
4.28000,0.00039042283159120006
4.28500,0.0003903867315912
4.29000,0.00039044223159120006
4.29500,0.0003904630315912
4.30000,0.0003904529315912
4.30500,0.0003904729315912
4.31000,0.00039054473159120005
4.31500,0.0003905533315912
4.32000,0.0003905974315912
4.32500,0.0003905998315912
4.33000,0.0003906898315912
4.33500,0.00039066933159120004
4.34000,0.00039072833159120004
4.34500,0.00039076823159120004
4.35000,0.0003907929315912
4.35500,0.00039083443159120007
4.36000,0.0003908636315912
4.36500,0.00039086923159120006
4.37000,0.00039087253159120005
4.37500,0.0003908964315912
4.38000,0.0003909136315912
4.38500,0.00039098463159120006
4.39000,0.0003910287315912
4.39500,0.0003910355315912
4.40000,0.0003910477315912
4.40500,0.0003910922315912
4.41000,0.0003910749315912
4.41500,0.0003911458315912
4.42000,0.0003911830315912
4.42500,0.00039118783159120007
4.43000,0.00039116373159120004
4.43500,0.0003912555315912
4.44000,0.0003912483315912
4.44500,0.00039127843159120006
4.45000,0.0003914009315912
4.45500,0.0003913550315912
4.46000,0.00039139433159120007
4.46500,0.0003914075315912
4.47000,0.0003914602315912
4.47500,0.0003914307315912
4.48000,0.0003914930315912
4.48500,0.0003914924315912
4.49000,0.00039153143159120004
4.49500,0.00039155053159120004
4.50000,0.0003916068315912
4.50500,0.0003916372315912
4.51000,0.00039167213159120004
4.51500,0.00039174183159120006
4.52000,0.00039177053159120004
4.52500,0.00039176813159120003
4.53000,0.00039181583159120004
4.53500,0.0003918697315912
4.54000,0.00039187893159120003
4.54500,0.0003918822315912
4.55000,0.0003919370315912
4.55500,0.0003919138315912
4.56000,0.0003919716315912
4.56500,0.00039199283159120006
4.57000,0.00039201103159120003
4.57500,0.00039209833159120004
4.58000,0.00039208733159120003
4.58500,0.0003921516315912
4.59000,0.0003921233315912
4.59500,0.00039217223159120003
4.60000,0.0003922100315912
4.60500,0.0003922246315912
4.61000,0.0003922810315912
4.61500,0.00039230933159120006
4.62000,0.00039232663159120005
4.62500,0.0003923683315912
4.63000,0.00039239963159120006
4.63500,0.0003924449315912
4.64000,0.00039247533159120007
4.64500,0.0003924917315912
4.65000,0.0003925414315912
4.65500,0.0003925510315912
4.66000,0.0003926022315912
4.66500,0.00039258413159120003
4.67000,0.0003926276315912
4.67500,0.0003927253315912
4.68000,0.0003927191315912
4.68500,0.0003926767315912
4.69000,0.0003927778315912
4.69500,0.0003928117315912
4.70000,0.0003928255315912
4.70500,0.00039284933159120005
4.71000,0.00039287673159120007
4.71500,0.0003929137315912
4.72000,0.00039292833159120004
4.72500,0.00039297333159120005
4.73000,0.00039305613159120004
4.73500,0.00039298043159120003
4.74000,0.00039306123159120004
4.74500,0.00039308923159120005
4.75000,0.0003931020315912
4.75500,0.0003931107315912
4.76000,0.00039316343159120004
4.76500,0.00039320483159120006
4.77000,0.00039318313159120003
4.77500,0.0003932430315912
4.78000,0.00039328623159120006
4.78500,0.0003932948315912
4.79000,0.0003933348315912
4.79500,0.0003933708315912
4.80000,0.0003934275315912
4.80500,0.0003934933315912
4.81000,0.0003933869315912
4.81500,0.00039350763159120005
4.82000,0.00039353273159120004
4.82500,0.0003935321315912
4.83000,0.0003935798315912
4.83500,0.0003935836315912
4.84000,0.0003936641315912
4.84500,0.00039368763159120006
4.85000,0.00039365193159120003
4.85500,0.00039369873159120005
4.86000,0.0003937326315912
4.86500,0.00039382033159120005
4.87000,0.0003938480315912
4.87500,0.00039383903159120003
4.88000,0.0003938897315912
4.88500,0.0003940000315912
4.89000,0.0003939517315912
4.89500,0.0003940104315912
4.90000,0.0003940086315912
4.90500,0.0003939877315912
4.91000,0.00039403513159120006
4.91500,0.00039411413159120006
4.92000,0.0003941618315912
4.92500,0.00039414903159120003
4.93000,0.00039421843159120006
4.93500,0.00039421603159120006
4.94000,0.00039430313159120005
4.94500,0.0003943028315912
4.95000,0.0003943370315912
4.95500,0.0003943808315912
4.96000,0.0003943910315912
4.96500,0.00039442853159120004
4.97000,0.0003944494315912
4.97500,0.00039449833159120004
4.98000,0.00039456143159120004
4.98500,0.0003945686315912
4.99000,0.0003945677315912
4.99500,0.0003946011315912
Here is a bit of code, based on yours. Here, there is an option to read from a file, or curve fit perfect data that is created on the fly. If you have "nice" values, like (a,b,c,d)=1,2,3,4, and take the log of the data, the curve fit does well. However, if you fiddle with the parameters, you can wind up with bad fits, even with "perfect" data. I did notice that when rejecting the data V<-3.0, and not taking the log10(I90), you get a fit that looks OK, but it isn't perfect. When trying data when V>-2.5, and taking the log of the I90 data, you get this, which isn't bad:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
https://stackoverflow.com/questions/71008658/how-to-fit-a-tanhx-function-to-data-in-python?noredirect=1#71008658
test code to check tanh fitting
"""
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
def fitfunction(v,a,b,c,d):
    return (a * np.tanh((b * v) + c) + d)
# x data = V90
# y data = np.log10(I90)
# set readdata=True to read data from file
readdata=False
if readdata:
    # arr = np.loadtxt("anode_90ma_test1.csv", delimiter=",")
    arr = np.loadtxt("test_data4.csv", delimiter=",")
    V90=arr[:,0]
    # option of using raw data or log of raw data
    I90=arr[:,1]
    # I90=np.log10(arr[:,1])
else:
    a_coeff=100
    b_coeff=1
    c_coeff=0.2
    d_coeff=-99
    V90=np.arange(-4,4,0.1)
    createvalues = np.vectorize(fitfunction)
    I90=createvalues(V90,a_coeff,b_coeff,c_coeff,d_coeff)
# optionally take log10
# I90=np.log10(I90)
# pars, cov = curve_fit(fitfunction,V90,np.log10(I90))
pars, cov = curve_fit(fitfunction,V90,I90)
print("fit pars = ",pars)
plt.plot(V90,fitfunction(V90,*pars),'r-',linewidth='3',label='Line of Best Fit')
plt.scatter(V90,I90,marker='.',label='Data')
plt.title('Graph of Line of Best Fit of Anode Current against Anode Potential')
plt.grid(True)
plt.xlabel('Voltage (V)')
plt.ylabel('Current (A)')
plt.legend()
plt.show()
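One way to make the fit less sensitive to the parameter values is to derive start values from the data rather than rely on curve_fit's default of all ones. The sketch below reuses the "perfect" data from the listing above. The helper `guess_tanh_start` is a hypothetical name, and its heuristics (half the data range for a, the midpoint for d, the steepest slope for b) are assumptions for illustration that presuppose a > 0 and b > 0.

```python
import numpy as np
from scipy.optimize import curve_fit

def fitfunction(v, a, b, c, d):
    return a * np.tanh(b * v + c) + d

def guess_tanh_start(v, y):
    """Rough start values for a*tanh(b*v + c) + d, assuming a > 0 and b > 0."""
    a0 = (np.max(y) - np.min(y)) / 2      # half the plateau-to-plateau range
    d0 = (np.max(y) + np.min(y)) / 2      # midpoint between the plateaus
    slope = np.gradient(y, v)
    b0 = np.max(slope) / a0               # the steepest slope of the model is a*b
    v_mid = v[np.argmax(slope)]           # the steepest point is where b*v + c = 0
    c0 = -b0 * v_mid
    return [a0, b0, c0, d0]

# Same synthetic data as in the listing above
V90 = np.arange(-4, 4, 0.1)
I90 = fitfunction(V90, 100, 1, 0.2, -99)

p0 = guess_tanh_start(V90, I90)
pars, cov = curve_fit(fitfunction, V90, I90, p0=p0)
print("start values =", p0)
print("fit pars     =", pars)
```

With measured data the slope estimate is noisy, so smoothing the data before taking the gradient may be needed; that step is left out here.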
I have to confess that I do not see any particular problem in the data and in fitting it. I'd actually say that any problems likely occur due to problems with initial values. The outliers are negligible, but one could try to use robust fitting. Start values can be guessed automatically like this:
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
import numpy as np
def f( x, a, b, c, d ):
    return a * np.tanh( b * x + c ) + d
def g( x, a, b, c, d, p ):
    """
    sharpen the transition to the flat part with parameter p
    """
    w = np.copysign( np.ones( len( x ) ), b * x + c )
    return a * w * np.tanh( np.abs( b * x + c )**p )**( 1 / p ) + d
data = np.loadtxt(
    "anode_90ma_test1.txt",
    skiprows=1, delimiter=","
)
xl, yl = data[:,0], np.log10( data[:,1] )
### simple guesses for a, c and d
d0 = ( min( yl ) + max( yl ) ) / 2
a0 = ( max( yl ) - min( yl ) ) / 2
npa = np.argwhere( np.heaviside( yl - d0, 0 ) == 0 )
c0 = -xl[ npa[-1,0] ]
print ("a0, c0, d0 = ", a0, c0, d0 )
### best guess for b via differential equation
### and alternative guesses for a and d
### uses: dy/dx = u * y**2 + v * y + w
### with u = -b / a, v = 2 * b * d / a and w = a * b - b * d**2 / a
dy = np.gradient( yl, xl )
VT = np.array([
    yl**2, yl, np.ones( len( yl ) )
])
V = np.transpose( VT )
eta = np.dot( VT, dy )
A = np.dot( VT, V )
sol = np.linalg.solve( A, eta )
print( sol )
u, v, w = sol
df = -v / 2 / u
bf = np.sqrt( u**2 * df**2 - w * u )
af = -bf / u
print( "d = ", df )
print( "b = ", bf )
print( "a = ", af )
### non-linear fit
sol, cov = curve_fit( f, xl, yl, p0=[ a0, bf, c0, d0 ] )
print( sol )
### with sharpened edges
sol2, cov2 = curve_fit( g, xl, yl, p0=np.append( sol, 1 ) )
### plotting
fig = plt.figure()
ax = fig.add_subplot( 1, 1, 1 )
ax.plot(
    data[:,0], np.log10( data[:,1] ),
    ls='', marker='+', label='data', alpha=0.5
)
ax.plot( data[:,0], f( data[:,0], *sol ), label="round edges" )
ax.plot( data[:,0], g( data[:,0], *sol2 ), label="sharp edges" )
ax.plot( data[:,0], f( data[:,0], af, bf, c0, df ), ls=':',label="guess" )
ax.axhline( y=d0, color='k', ls=':' )
ax.grid()
ax.legend( loc=0 )
plt.show()
providing
a0, c0, d0 = 3.9470363024506527 0.205 -7.350877976113524
[ -0.62643791 -8.42148481 -21.38742817]
d = -6.721723517680841
b = 2.0814552335484855
a = 3.3226840415005183
[ 3.30445925 2.18844797 0.19235933 -6.7106928 ]
and
The final non-linear fit works flawlessly.
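The relation between the tanh model and the quadratic in y that the code comments mention can be written out explicitly. This is a derivation sketch using the same symbols as the listing (a, b, c, d for the model; u, v, w for the coefficients of the linear least-squares step), and it assumes b > 0 when taking the square root.

```latex
\begin{align}
y &= a \tanh(b x + c) + d \\
\frac{dy}{dx} &= a b \left(1 - \tanh^2(b x + c)\right)
              = a b - \frac{b}{a}\,(y - d)^2 \\
              &= \underbrace{-\frac{b}{a}}_{u}\, y^2
               + \underbrace{\frac{2 b d}{a}}_{v}\, y
               + \underbrace{a b - \frac{b d^2}{a}}_{w} \\
d &= -\frac{v}{2u}, \qquad
b = \sqrt{u^2 d^2 - w u}, \qquad
a = -\frac{b}{u}
\end{align}
```

Because the right-hand side is linear in u, v and w, these three coefficients follow from one linear solve against the numerical derivative, with no start values needed.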

Resonance frequency with python curve-fit, error maxfev reached

I have measured the voltage over an LCR tank circuit (with unknown components) to determine the resonance frequency. I have performed a broad frequency sweep and measured the voltages. Now I want to determine the exact location of the resonance peak by fitting a curve to the data. The curve looks like that of a damped, driven harmonic oscillator, so I used the following function to fit the data: A = F0 / sqrt((k-mw^2)^2 + (bw)^2).
This is the code I have for now, but I get the following error: "Optimal parameters not found: Number of calls to function has reached maxfev = 5000."
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
def fit( f, F0 , k, m, b):
    w = 2 * np.pi * f
    return F0 / np.sqrt( ( k - m*w**2)**2 + ( b * w )**2 )
fuData = np.loadtxt( "ohlVW.txt", delimiter=',' )
fuData = fuData[ fuData[:,0].argsort()]
f = fuData[:,0]
U = fuData[:,1]
popt, _ = curve_fit(fit, f, U, maxfev=5000)
F0, k, m, b = popt
print(popt)
plt.scatter(f, U)
x_line = np.arange(min(f), max(f), 1)
y_line = fit(x_line, F0, k, m, b)
plt.figure()
plt.plot(f, U)
plt.plot(x_line, y_line, '--', color='red')
plt.show()
Increasing maxfev did not work. How can I adjust the code to get a nice fit over the data?
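One property of the model above may contribute to the maxfev error: multiplying F0, k, m and b by the same factor leaves the function unchanged, so the four parameters are not independent. In addition, without p0 curve_fit starts from all ones, which is likely far from any resonance. The sketch below is one possible workaround under those assumptions, not a verified fix for the original measurement. It fits synthetic data (a stand-in, since ohlVW.txt is not available here) with three independent parameters and start values read off the data. The names `resonance`, `Umax`, `f0` and `g` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit(f, F0, k, m, b):
    # Model from the question
    w = 2 * np.pi * f
    return F0 / np.sqrt((k - m * w**2)**2 + (b * w)**2)

def resonance(f, Umax, f0, g):
    """Same curve shape with three parameters; equals Umax at f = f0, g sets the width."""
    return Umax * g * f0 / np.sqrt((f0**2 - f**2)**2 + (g * f)**2)

# Synthetic stand-in for the measured sweep (assumed values, not the original data)
rng = np.random.default_rng(0)
f = np.linspace(500.0, 1500.0, 400)
U = resonance(f, 2.0, 1000.0, 50.0) + rng.normal(0.0, 0.01, f.size)

# Scaling all four original parameters by 10 gives the same curve
scale_invariant = np.allclose(fit(f, 1.0, 2.0, 3.0, 4.0), fit(f, 10.0, 20.0, 30.0, 40.0))

# Start values: peak height, peak position, width at 1/sqrt(2) of the peak
i_peak = np.argmax(U)
above = f[U > U[i_peak] / np.sqrt(2)]
p0 = [U[i_peak], f[i_peak], above.max() - above.min()]

popt, pcov = curve_fit(resonance, f, U, p0=p0)
print("Umax, f0, g =", popt)
```

In terms of the original parameters, f0 corresponds to sqrt(k/m) / (2*pi) and g to b / (2*pi*m), so the resonance frequency can be read directly from the fitted f0.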

How to calculate "relative error in the sum of squares" and "relative error in the approximate solution" from least squares method?

I have implemented a 3D Gaussian fit using scipy.optimize.leastsq, and now I would like to tweak the arguments ftol and xtol to optimize performance. However, I don't understand the "units" of these two parameters well enough to make a proper choice. Is it possible to calculate these two parameters from the results? That would give me an understanding of how to choose them. My data are numpy arrays of np.uint8. I tried to read the FORTRAN source code of MINPACK, but my FORTRAN knowledge is zero. I also checked the Levenberg-Marquardt algorithm, but I could not get a number that was below ftol, for example.
Here is a minimal example of what I do:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
class gaussian_model:
    def __init__(self):
        self.prev_iter_model = None
        self.f_vals = []
    def gaussian_1D(self, coeffs, xx):
        A, sigma, mu = coeffs
        # Center rotation around peak center
        x0 = xx - mu
        model = A*np.exp(-(x0**2)/(2*(sigma**2)))
        return model
    def residuals(self, coeffs, I_obs, xx, model_func):
        model = model_func(coeffs, xx)
        residuals = I_obs - model
        if self.prev_iter_model is not None:
            self.f = np.sum(((model-self.prev_iter_model)/model)**2)
            self.f_vals.append(self.f)
        self.prev_iter_model = model
        return residuals
# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)
# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise
# LM Least squares
initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
                                              args=(yy, xx, model.gaussian_1D),
                                              ftol=1E-6, full_output=True)
yy_fit = model.gaussian_1D(pred_coeffs, xx)
rel_SSD = np.sum(((yy-yy_fit)/yy)**2)
RMS_SSD = np.sqrt(rel_SSD/num)
print(RMS_SSD)
print(model.f)
print(model.f_vals)
fig, ax = plt.subplots(1,2)
# Plot results
ax[0].scatter(xx, yy)
ax[0].plot(xx, yy_fit, c='r')
ax[1].scatter(range(len(model.f_vals)), model.f_vals, c='r')
# ax[1].set_ylim(0, 1E-6)
plt.show()
rel_SSD is around 1 and definitely not something below ftol = 1E-6.
EDIT: Based on @user12750353's answer below, I updated my minimal example to try to recreate how lmdif decides on termination with ftol. The problem is that my f_vals are too small, so they are not the right values. I would like to recreate this so that I can see what kind of numbers I get in my main code and choose an ftol that terminates the fitting process earlier.
Since you are passing a function without the gradient, the method called is lmdif. Instead of analytic gradients it uses a forward-difference estimate, f(x + delta) - f(x) ~ delta * df(x)/dx (written here as if x were a single parameter).
In the lmdif source you find the following description:
c ftol is a nonnegative input variable. termination
c occurs when both the actual and predicted relative
c reductions in the sum of squares are at most ftol.
c therefore, ftol measures the relative error desired
c in the sum of squares.
c
c xtol is a nonnegative input variable. termination
c occurs when the relative error between two consecutive
c iterates is at most xtol. therefore, xtol measures the
c relative error desired in the approximate solution.
Looking at the code, the actual reduction acred = 1 - (fnorm1/fnorm)**2 is the counterpart of what you calculated as rel_SSD, but it is evaluated between the last two iterations, not between the fitted function and the target points.
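As a small numeric illustration of that quantity, the following sketch computes the actual relative reduction for a sequence of residual norms and compares it with ftol. The norm values are invented for illustration, `actual_reduction` is a hypothetical helper name, and the real lmdif test also involves the predicted reduction, which is not reproduced here.

```python
import numpy as np

def actual_reduction(fnorm_prev, fnorm_new):
    """Relative reduction in the sum of squares between two consecutive iterates."""
    return 1.0 - (fnorm_new / fnorm_prev) ** 2

# Invented residual norms of successive iterates
fnorms = np.array([40.0, 12.0, 5.2, 5.01, 5.000001, 5.0000001])
acred = actual_reduction(fnorms[:-1], fnorms[1:])

ftol = 1e-6
for step, value in enumerate(acred, start=1):
    print(f"step {step}: acred = {value:.3e}, below ftol: {value <= ftol}")
```

The reduction only drops below ftol once consecutive residual norms agree to roughly seven digits, regardless of how large the residual norm itself still is.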
Example
The problem here is that we need to find out which values the internal variables take. One way to attempt this is to save the coefficients and the residual norm every time the function is called, as follows.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq
class gaussian_model:
    def __init__(self):
        self.prev_iter_model = None
        self.fnorm = []  # residual norm at every call of residuals()
        self.x = []      # coefficients at every call of residuals()

    def gaussian_1D(self, coeffs, xx):
        A, sigma, mu = coeffs
        # Center the coordinate on the peak
        x0 = xx - mu
        model = A*np.exp(-(x0**2)/(2*(sigma**2)))
        # Analytic derivatives with respect to A, sigma and mu, one column each
        grad = np.array([
            model / A,
            model * x0**2 / (sigma**3),
            model * 2 * x0 / (2*(sigma**2))
        ]).transpose()
        return model, grad

    def residuals(self, coeffs, I_obs, xx, model_func):
        model, grad = model_func(coeffs, xx)
        residuals = I_obs - model
        self.x.append(np.copy(coeffs))
        self.fnorm.append(np.sqrt(np.sum(residuals**2)))
        return residuals

    def grad(self, coeffs, I_obs, xx, model_func):
        model, grad = model_func(coeffs, xx)
        # residuals = I_obs - model, so the Jacobian of the residuals is -grad
        return -grad

    def plot_progress(self):
        x = np.array(self.x)
        dx = np.sqrt(np.sum(np.diff(x, axis=0)**2, axis=1))
        plt.plot(dx / np.sqrt(np.sum(x[1:, :]**2, axis=1)))
        fnorm = np.array(self.fnorm)
        plt.plot(1 - (fnorm[1:]/fnorm[:-1])**2)
        # Labels follow the plotting order: relative change in x first, then in f
        plt.legend([r'$||\Delta x||$', r'$||\Delta f||$'], loc='upper left')
# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)
# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy, _ = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise
Then we can see the relative variation of $x$ and $f$
initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
args=(yy, xx, model.gaussian_1D),
xtol=1e-6,
ftol=1e-6, full_output=True)
plt.figure(figsize=(14, 6))
plt.subplot(121)
model.plot_progress()
plt.yscale('log')
plt.grid()
plt.subplot(122)
yy_fit,_ = model.gaussian_1D(pred_coeffs, xx)
# Plot results
plt.scatter(xx, yy)
plt.plot(xx, yy_fit, c='r')
plt.show()
The problem with this is that the function is evaluated both to compute f and to estimate the gradient of f, so the recorded history mixes the two. To produce a cleaner plot, you can pass Dfun so that func is evaluated only once per iteration.
# x data
x_start = 1
x_stop = 10
num = 100
xx, dx = np.linspace(x_start, x_stop, num, retstep=True)
# Simulated data with some noise
A, s_x, mu = 10, 0.5, 3
coeffs = [A, s_x, mu]
model = gaussian_model()
yy, _ = model.gaussian_1D(coeffs, xx)
noise_ampl = 0.5
noise = np.random.normal(0, noise_ampl, size=num)
yy += noise
# LM Least squares
initial_guess = [1, 1, 1]
pred_coeffs, cov_x, info, mesg, ier = leastsq(model.residuals, initial_guess,
args=(yy, xx, model.gaussian_1D),
Dfun=model.grad,
xtol=1e-6,
ftol=1e-6, full_output=True)
plt.figure(figsize=(14, 6))
plt.subplot(121)
model.plot_progress()
plt.yscale('log')
plt.grid()
plt.subplot(122)
yy_fit,_ = model.gaussian_1D(pred_coeffs, xx)
# Plot results
plt.scatter(xx, yy)
plt.plot(xx, yy_fit, c='r')
plt.show()
Note that the quantity I am plotting for the x criterion is not exactly what the lmdif implementation compares against xtol.
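For reference, the forward-difference estimate mentioned at the start of this answer can be sketched as below. This is only an illustration of the idea: the helper name forward_difference_jacobian and the step-size rule are assumptions, not the exact scheme used inside lmdif. It also shows why the history recorded without Dfun is cluttered: each Jacobian costs one extra function call per parameter.

```python
import numpy as np

def forward_difference_jacobian(func, x, eps=1e-8):
    """Estimate d func / d x column by column with forward differences."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(func(x))
    jac = np.empty((f0.size, x.size))
    for j in range(x.size):
        # Illustrative step size; MINPACK chooses its own step internally
        step = eps * max(abs(x[j]), 1.0)
        x_step = x.copy()
        x_step[j] += step
        jac[:, j] = (np.asarray(func(x_step)) - f0) / step
    return jac

xx = np.linspace(1, 10, 5)

def gaussian(coeffs):
    A, sigma, mu = coeffs
    return A * np.exp(-(xx - mu) ** 2 / (2 * sigma ** 2))

coeffs = np.array([10.0, 0.5, 3.0])
jac_fd = forward_difference_jacobian(gaussian, coeffs)

# Analytic derivatives with respect to A, sigma and mu for comparison
model = gaussian(coeffs)
x0 = xx - coeffs[2]
jac_exact = np.column_stack([
    model / coeffs[0],
    model * x0 ** 2 / coeffs[1] ** 3,
    model * x0 / coeffs[1] ** 2,
])
print(np.max(np.abs(jac_fd - jac_exact)))
```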

How to pass data frame series or array to exponential function while calculating curvefit

I am trying to fit using the scipy.optimize module.
My Exponential function:
a - (a - b) * np.exp( -(c + Q / V) * t )
I need to find out a, b, c from the equation by optimizing.
V = 1200 # constant
my data frame looks like this:
time(t) value score(Q)
1.0 2.347 4500
2.0 2.345 4600
3.0 2.523 4655
4.0 2.723 4500
...
...
100.0 5.6787 7000
...
Values in the "value" field increase in a linear way.
My fit function for the above exponential:
def my_exp(Q, t, a, b, c): # just added Q here
    V = 1280
    return a - (a - b) * np.exp( -(c + Q / V) * t )
# Q = 5000 # mean value from column score
# getting values
c, cov = curve_fit(lambda t, a, b, c: my_exp(Q, t, a, b, c), df['time'], df['value'])
Scenario 1: when the "score" column is not given, a constant has to be passed instead (e.g. Q = 5000).
I tried taking the mean value of the score series, and that works.
Scenario 2: when the "score" column is given, the "score" series should be sent to the exponential function,
so that Q provides the score value at each time point.
How can I pass the "score" series values as Q to the exponential function to get optimized values?
Is it the correct way of doing curve_fit for the above-mentioned data or do I need to follow any other curve fitting models?
Hi, as a dirty workaround you could just do the following:
do not pass Q as an argument;
make Q a global list;
as the t values are basically integers, look up the value of Q inside my_exp via Q[ int( round( t ) ) ].
Edit/AddOn
This should make it clear:
import numpy as np
from scipy.optimize import curve_fit
# setting static values
v = 2500
qlist = [ np.random.randint(4500, 4650 ) for i in range( 15 ) ]
tlist = np.arange( 1, 16 )
noiselist = np.random.normal( 0, 0.3, 15 )
a0 = 5
b0 = 2.2
c0 = -.78
# simple function checking for iterable first argument
def my_exp( t, a, b, c):
    if isinstance( t, ( list, tuple, np.ndarray ) ):
        # np.float was removed from NumPy; the builtin float is equivalent here
        out = np.fromiter( ( my_exp( tt, a, b, c) for tt in t ), float )
    else:
        localindex = int( round ( t ) ) - 1 ## -1 as t starts at 1 but index at 0
        localq = qlist[ localindex ]
        out = a - ( a - b ) * np.exp( -( c + localq / v ) * t )
    return out
# creating some test data with noise
testdata = my_exp( tlist, a0, b0, c0 )
testdatanoisy = testdata + noiselist
# fitting, does not even require start values
sol, _ = curve_fit( my_exp, tlist, testdatanoisy )
print( sol )
# works
which gives something like:
>> [ 4.89673111, 1.70423291, -0.72995739 ]
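An alternative to the global list, sketched here under stated assumptions, is to hand curve_fit both t and Q as a two-row xdata array and unpack them inside the model; curve_fit accepts a (k, M)-shaped xdata for functions of k predictors. This is a different technique from the workaround above. The function name my_exp_tq, the synthetic Q values, the true parameters and the start values are all made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

V = 2500.0  # assumed constant, matching v in the synthetic example above

def my_exp_tq(X, a, b, c):
    # X stacks the two predictors: row 0 is t, row 1 is Q
    t, Q = X
    return a - (a - b) * np.exp(-(c + Q / V) * t)

# Synthetic, noise-free data (illustrative values only)
t = np.arange(1.0, 16.0)
Q = np.linspace(4500.0, 4650.0, t.size)
true_params = (5.0, 2.2, -1.6)
values = my_exp_tq((t, Q), *true_params)

# Both predictors go in as one (2, M) array
popt, pcov = curve_fit(my_exp_tq, np.vstack((t, Q)), values, p0=(4.0, 1.0, -1.5))
print(popt)
```

With a data frame, the xdata argument would be built the same way, for example np.vstack((df['time'], df['score'])), assuming the score column is named 'score'.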

Is there a way to add a dependency between parameters for scipy.optimize.curve_fit

I am trying to characterize the spreading of carbon isotopes caused by nuclear tests in the 70's in an ocean model.
The atmospheric signal is a strong spike, which will be carried to depth with the ocean currents (deeper currents are much slower).
My goal is to detect the onset of the rise in concentration and the rate of increase at various depth levels.
I assume that the oceanic concentration of carbon isotopes behaves like a piecewise linear function with 3 segments:
A constant initial value (b) up until time (t_0)
A linear increase in concentrations from time (t_0) to (t_1) with the rate m1.
A linear decrease in concentration after time(t_1) with the rate m2
I am representing the function using this code in python:
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as sio
def piecewise_linear( t, t0, t1, b, m1, m2 ):
    condlist = [ t < t0,
                 ( t >= t0 ) & ( t < t1 ),
                 t >= t1
               ]
    funclist = [ lambda t: b,
                 lambda t: b + m1 * ( t - t0 ),
                 lambda t: b + m1 * ( t - t0 ) + m2 * ( t - t1 )
               ]
    return np.piecewise( t, condlist, funclist )
For a given time array t I want to be able to fit two 'types' of this function:
A full 3-segment line, which is representative of the upper ocean, where the signal propagates fast and the spike is fully captured.
A special case, where at the end of the time series the concentration has not peaked (this would represent the signal in the deep ocean, where it takes a long time to propagate the signal)
As example
t = np.arange( 0, 15, 0.1 )
y_full = piecewise_linear( t, 5, 10, 2, 2, -4 )
y_cut = piecewise_linear( t, 5, 15, 2, 2, -4 )
plt.plot( t, y_full )
plt.plot( t, y_cut )
plt.legend( [ 'surface', 'deep ocean' ] )
For the first case I am getting good results, when I try to fit the function to the signal after adding some random noise:
noise = np.random.normal( 0, 1, len( y_full ) ) * 1
y = y_full
yy = y_full + noise
bounds = ( [ 0, 0, 0, 0, -np.inf ], [ np.inf, np.inf, np.inf, np.inf, 0 ] )
fit,_ = sio.curve_fit( piecewise_linear, t, yy, bounds=bounds )
print( fit )
y_fit = piecewise_linear( t, *tuple( fit ) )
plt.plot( t, yy, color='0.5' )
plt.plot( t, y_fit, linewidth=3 )
plt.plot( t, y, linestyle='--', linewidth=3 )
Which results in
>>[ 5.00001407 10.01945313 2.13055863 1.95208167 -3.95199719]
However when I try to evaluate the second case (deep ocean), I often get poor results like below:
noise = np.random.normal( 0, 1, len( y_full ) ) * 1
y = y_cut
yy = y_cut+noise
bounds = ( [ 0, 0, 0, 0, -np.inf], [ np.inf, np.inf, np.inf, np.inf, 0 ] )
fit,_ = sio.curve_fit( piecewise_linear, t, yy, bounds=bounds )
print( fit )
y_fit = piecewise_linear( t, *tuple( fit ) )
plt.plot( t, yy, color='0.5' )
plt.plot( t, y_fit, linewidth=3 )
plt.plot( t, y, linestyle='--', linewidth=3 )
plt.legend( [ 'noisy data', 'fit', 'original' ] )
I get
>>[ 1.83838997 0.40000014 1.51810839 2.56982348 -1.0622842 ]
The optimization determines that t_0 is larger than t_1, which is nonsensical in this context.
Is there a way to build the condition t_0 < t_1 into the curve fitting? Or do I have to test, which type of curve is given and then fit to two different functions (a 3-segment or 2-segment piecewise linear function)?
Any help is greatly appreciated
You might consider using lmfit (https://lmfit.github.io/lmfit-py) for this.
Lmfit provides a higher-level interface to curve fitting and makes fitting parameters first class python objects. Among other things, this easily allows fixing some parameters, and setting bounds on parameters in a more pythonic manner than what scipy.optimize.curve_fit uses. In particular for your question, lmfit parameters also support using mathematical expressions as constraint expressions for all parameters.
To turn your model function piecewise_linear() into a Model for curve-fitting with lmfit you would do something like
from lmfit import Model
# make a model
mymodel = Model(piecewise_linear)
# create parameters and set initial values
# note that parameters are *named* from the
# names of arguments of your model function
params = mymodel.make_params(t0=0, t1=1, b=3, m1=2, m2=2)
# now, you can place bounds on parameters, maybe like
params['b'].min = 0
params['m1'].min = 0
# but what you want is an inequality constraint, so
# 1. add a new parameter 'tdiff'
# 2. constrain t1 = t0 + tdiff
# 3. set a minimum value of 0 for tdiff
params.add('tdiff', value=1, min=0)
params['t1'].expr = 't0 + tdiff'
# now perform the fit
result = mymodel.fit(yy, params, t=t)
# print out results
print(result.fit_report())
You can read in the lmfit docs or on other SO questions how to extract other information from the fit result.
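The same reparametrization (t1 = t0 + tdiff with tdiff >= 0) can also be tried with plain scipy.optimize.curve_fit by wrapping the model function. The following is a minimal sketch, not a tested recipe for the ocean data: the wrapper name piecewise_linear_dt, the start values p0 and the fixed random seed are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def piecewise_linear(t, t0, t1, b, m1, m2):
    condlist = [t < t0, (t >= t0) & (t < t1), t >= t1]
    funclist = [lambda t: b,
                lambda t: b + m1 * (t - t0),
                lambda t: b + m1 * (t - t0) + m2 * (t - t1)]
    return np.piecewise(t, condlist, funclist)

def piecewise_linear_dt(t, t0, tdiff, b, m1, m2):
    # Fit the non-negative gap tdiff instead of t1, so t1 >= t0 by construction
    return piecewise_linear(t, t0, t0 + tdiff, b, m1, m2)

t = np.arange(0, 15, 0.1)
rng = np.random.default_rng(0)
yy = piecewise_linear(t, 5, 10, 2, 2, -4) + rng.normal(0, 1, t.size)

# Same bounds as in the question; the second entry now constrains tdiff >= 0
bounds = ([0, 0, 0, 0, -np.inf], [np.inf, np.inf, np.inf, np.inf, 0])
p0 = [4, 4, 1, 1, -1]  # illustrative start values
fit, _ = curve_fit(piecewise_linear_dt, t, yy, p0=p0, bounds=bounds)

t0_fit, tdiff_fit = fit[0], fit[1]
t1_fit = t0_fit + tdiff_fit
print(t0_fit, t1_fit)
```

The lower bound of 0 on tdiff plays the role of min=0 in the lmfit version; the uncertainty of t1 then has to be derived from those of t0 and tdiff.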
In this case curve_fit has several disadvantages, so the lmfit solution by M Newville is worth considering. Moreover, curve_fit has no args parameter (in contrast to, e.g., leastsq) that could be used to switch off the second slope. A second fit function without m2 might be a solution here. If, however, curve_fit is a must and a generic fit function working in both cases is required, a solution might look like this (note the starting parameters extracted from the data):
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as sio
"""
we know t0 > 0, t1 > t0, b>0, m1 > 0, m2 < 0
"""
def piecewise_linear( t, t0, a , b, m1, m2 ):
    t0 = abs( t0 )
    t1 = abs( a ) * t0
    b = abs( b )
    m1 = abs( m1 )
    m2 = - abs( m2 )
    condlist = [ t < t0,
                 ( t >= t0 ) & ( t < t1 ),
                 t >= t1
               ]
    funclist = [ lambda t: b,
                 lambda t: b + m1 * ( t - t0 ),
                 lambda t: b + m1 * ( t - t0 ) + m2 * ( t - t1 )
               ]
    return np.piecewise( t, condlist, funclist )
t = np.arange( 0, 15, 0.1 )
y_full = piecewise_linear( t, 5, 2, 2, 2, -4 )
y_cut = piecewise_linear( t, 5, 3, 2, 2, -4 )
####################
#~ plt.plot( t, y_full )
#~ plt.plot( t, y_cut )
#~ plt.legend( [ 'surface', 'deep ocean'] )
####################
#~ noise = np.random.normal( 0, 1, len( y_full ) ) * 1
#~ y = y_full
#~ yy = y_full + noise
#~ bounds = ( [ 0, 0, 0, 0, -np.inf ], [ np.inf, np.inf, np.inf, np.inf, 0 ] )
#~ fit,_ = sio.curve_fit( piecewise_linear, t, yy, bounds=bounds )
#~ print( fit )
#~ y_fit = piecewise_linear( t, *tuple( fit ) )
#~ plt.plot( t, yy, color='0.5' )
#~ plt.plot( t, y_fit, linewidth=3 )
#~ plt.plot( t, y, linestyle='--', linewidth=3 )
####################
noise = np.random.normal( 0, 1, len( y_full ) ) * 1
y = y_cut
yy = y_cut + noise
tPos = np.argmax( yy )
t1Start = t[ tPos ]
t0Start = t[ tPos // 2 ]
bStart = yy[ 0 ]
aStart = 2
m1Start = ( yy[ tPos ] - yy[ tPos // 2 ] ) / ( t1Start - t0Start )
p0 = [ t0Start, aStart, bStart, m1Start, 0 ]
fit,_ = sio.curve_fit( piecewise_linear, t, yy, p0=p0 )
print( fit )
y_fit = piecewise_linear( t, *tuple( fit ) )
plt.plot( t, yy, color='0.5' )
plt.plot( t, y_fit, linewidth=3 )
plt.plot( t, y, linestyle='--', linewidth=3 )
plt.legend( [ 'noisy data', 'fit', 'original' ] )
plt.show()
It works on the test data. One has to keep in mind that the returned fit parameters might be negative. As the function takes the modulus, this needs to be done on the returned parameters as well. Also note that t1 is not fitted directly any more, but as a multiple of t0. Errors, hence, need to be propagated accordingly. The new structure does not require bounds.
Also note, the choice of starting parameters p0 should work for case 1, too.
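The post-processing described above (taking the modulus of the returned parameters and propagating the error to t1 = a * t0) could look roughly like this. It is a sketch using first-order error propagation; the helper name physical_parameters and the example numbers are hypothetical, and pcov stands for the covariance matrix returned by curve_fit.

```python
import numpy as np

def physical_parameters(fit, pcov):
    """Map raw fit parameters (t0, a, b, m1, m2) to the constrained model values."""
    t0_raw, a_raw, b_raw, m1_raw, m2_raw = fit
    t0 = abs(t0_raw)
    t1 = abs(a_raw) * t0
    # First-order propagation for t1 = |a| * |t0|:
    # var(t1) = a^2 var(t0) + t0^2 var(a) + 2 a t0 cov(t0, a)
    var_t1 = (a_raw ** 2 * pcov[0, 0]
              + t0_raw ** 2 * pcov[1, 1]
              + 2 * a_raw * t0_raw * pcov[0, 1])
    return {
        "t0": t0,
        "t1": t1,
        "t1_err": np.sqrt(var_t1),
        "b": abs(b_raw),
        "m1": abs(m1_raw),
        "m2": -abs(m2_raw),
    }

# Hypothetical fit result with a negative t0 and a positive m2
fit_example = np.array([-5.0, 2.0, 2.0, 2.0, 4.0])
pcov_example = np.diag([0.01, 0.04, 0.0, 0.0, 0.0])
params = physical_parameters(fit_example, pcov_example)
print(params)
```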
Since I don't know exactly which regression algorithm is used in Python, I cannot really answer your question. Probably the algorithm is an iterative process, as usual.
As additional information, I will show a very simple method which can give an approximate answer without an iterative process or an initial guess. The theory, based on the fitting of an integral equation, can be found in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales and some examples of its use for piecewise functions are shown in https://fr.scribd.com/document/380941024/Regression-par-morceaux-Piecewise-Regression-pdf
In the case of a piecewise function made of three linear segments, the method of calculation is given on page 30 of the second paper above. It is very easy to code in any computer language; I suppose that it is possible in Python too.
From the data obtained by scanning the original graph:
[figure not reproduced]
The method of regression with integral equation leads to the following result:
[figure not reproduced]
The fitted equation is:
[equation not reproduced]
H is the Heaviside function.
The values of the parameters a1, a2, p1, q1, p2, q2, p3, q3 are given in the figure above.
One can see that the first segment is not exactly horizontal as expected, but the slope is very small: 0.166
It is possible to specify a slope of 0 (that is, p1=0) thanks to a slight change in the second part of the algorithm. The modified algorithm is shown below:
[figure not reproduced]
Now, the result is:
[figure not reproduced]
