Still having trouble with curve fitting - python

I already opened a question on this topic, but I wasn't sure whether I should post it there, so I opened a new question here.
I'm having trouble again when fitting two or more peaks. The first problem occurs with a calculated example function.
import numpy as np
import matplotlib.pyplot as plt

xg = np.random.uniform(0, 1000, 500)
mu1 = 200
sigma1 = 20
I1 = -2
mu2 = 800
sigma2 = 20
I2 = -1
yg3 = 0.0001*xg
yg1 = (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp( - (xg - mu1)**2 / (2 * sigma1**2) )
yg2 = (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp( - (xg - mu2)**2 / (2 * sigma2**2) )
yg=yg1+yg2+yg3
plt.figure(0, figsize=(8,8))
plt.plot(xg, yg, 'r.')
I tried two different approaches that I found in the documentation (modified for my data), which are shown below, but both give me wrong fitting data and a messy chaos of graphs (I guess one line per fitting step).
1st attempt:
import numpy as np
from lmfit.models import PseudoVoigtModel, LinearModel, GaussianModel, LorentzianModel
import sys
import matplotlib.pyplot as plt
gauss1 = PseudoVoigtModel(prefix='g1_')
pars = gauss1.make_params()
pars['g1_center'].set(200)
pars['g1_sigma'].set(15, min=3)
pars['g1_amplitude'].set(-0.5)
pars['g1_fwhm'].set(20, vary=True)
#pars['g1_fraction'].set(0, vary=True)
gauss2 = PseudoVoigtModel(prefix='g2_')
pars.update(gauss2.make_params())
pars['g2_center'].set(800)
pars['g2_sigma'].set(15)
pars['g2_amplitude'].set(-0.4)
pars['g2_fwhm'].set(20, vary=True)
#pars['g2_fraction'].set(0, vary=True)
mod = gauss1 + gauss2 + LinearModel()
pars.add('intercept', value=0, vary=True)
pars.add('slope', value=0.0001, vary=True)
init = mod.eval(pars, x=xg)
out = mod.fit(yg, pars, x=xg)
print(out.fit_report(min_correl=0.5))
plt.figure(5, figsize=(8,8))
out.plot_fit()
When I include the 'fraction' parameter, I often get

NameError: name 'pv1_fraction' is not defined in expr='<_ast.Module object at 0x00000000165E03C8>'

although it should be defined. I get this error with real data using this approach, too.
2nd attempt:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import lmfit
def gauss(x, sigma, mu, A):
    return A * np.exp(-(x - mu)**2 / (2 * sigma**2))

def linear(x, m, n):
    return m * x + n
peak1 = lmfit.model.Model(gauss, prefix='p1_')
peak2 = lmfit.model.Model(gauss, prefix='p2_')
lin = lmfit.model.Model(linear, prefix='l_')
model = peak1 + lin + peak2
params = model.make_params()
params['p1_mu'] = lmfit.Parameter(value=200, min=100, max=250)
params['p2_mu'] = lmfit.Parameter(value=800, min=100, max=1000)
params['p1_sigma'] = lmfit.Parameter(value=15, min=0.01)
params['p2_sigma'] = lmfit.Parameter(value=20, min=0.01)
params['p1_A'] = lmfit.Parameter(value=-2, min=-3)
params['p2_A'] = lmfit.Parameter(value=-2, min=-3)
params['l_m'] = lmfit.Parameter(value=0)
params['l_n'] = lmfit.Parameter(value=0)
out = model.fit(yg, params, x=xg)
print(out.fit_report())
plt.figure(8, figsize=(8,8))
out.plot_fit()
So the result looks like this in both cases. It seems to plot all fitting attempts, but it never solves the fit correctly. The best-fit parameters it finds are within the ranges I gave it.
Does anyone recognize this type of error, or have a solution for it? And does anyone know how to avoid the NameError when calling a model function from lmfit with these approaches?

I have a somewhat tolerable solution for you. Since I don't know how variable your data is, I can't say it will work in a general sense, but it should get you started. If your data lies along 0-1000 and has two peaks or dips along a line, as you showed, then it should work.
I used scipy's curve_fit and put all of the components of the function together into one function. You can pass starting locations into curve_fit via the p0 argument. (You can probably do this with the library you're using, but I'm not familiar with it.) There is a nested loop in which I vary the mu parameters to find the pair with the lowest squared error. If you need to fit your data many times or in some real-time scenario, then this is not for you, but if you just need to fit some data once, launch this code and grab a coffee.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
import pylab
from matplotlib import cm as cm
import time
def my_function_big(x, m, n,            # linear parameters
                    sigma1, mu1, I1,    # gaussian 1
                    sigma2, mu2, I2):   # gaussian 2
    y = (m * x + n
         + (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp(-(x - mu1)**2 / (2 * sigma1**2))
         + (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp(-(x - mu2)**2 / (2 * sigma2**2)))
    return y
#make some data
xs = np.random.uniform(0,1000,500)
mu1 = 200
sigma1 = 20
I1 = -2
mu2 = 800
sigma2 = 20
I2 = -1
yg3 = 0.0001 * xs
yg1 = (I1 / (sigma1 * np.sqrt(2 * np.pi))) * np.exp( - (xs - mu1)**2 / (2 * sigma1**2) )
yg2 = (I2 / (sigma2 * np.sqrt(2 * np.pi))) * np.exp( - (xs - mu2)**2 / (2 * sigma2**2) )
ys = yg1 + yg2 + yg3
xs = np.array(xs)
ys = np.array(ys)
#done making data
#start a double loop...very expensive but this is quick and dirty
#it would seem that the regular optimizer has trouble finding the minima so i
#found that having the near proper mu values helped it zero in much better
start = time.time()
serr = []
_x = []
_y = []
for x in np.linspace(0, 1000, 61):
    for y in np.linspace(0, 1000, 61):
        cfiti = curve_fit(my_function_big, xs, ys, p0=[0, 0, 1, x, 1, 1, y, 1], maxfev=20000000)
        serr.append(np.sum((my_function_big(xs, *cfiti[0]) - ys) ** 2))
        _x.append(x)
        _y.append(y)
serr = np.array(serr)
_x = np.array(_x)
_y = np.array(_y)
print('done loop in loop fitting')
print('time: %0.1f' % (time.time() - start))
gridsize=20
plt.subplot(111)
plt.hexbin(_x, _y, C=serr, gridsize=gridsize, cmap=cm.jet, bins=None)
plt.axis([_x.min(), _x.max(), _y.min(), _y.max()])
cb = plt.colorbar()
cb.set_label('SE')
plt.show()
ix = np.argmin(serr.ravel())
mustart1 = _x.ravel()[ix]
mustart2 = _y.ravel()[ix]
print(mustart1)
print(mustart2)
cfit = curve_fit(my_function_big, xs, ys, p0=[0, 0, 1, mustart1, 1, 1, mustart2, 1], maxfev=2000000000)
xp = np.linspace(0, 1000, 1001)
plt.figure()
plt.scatter(xs, ys)  # plot the synthetic data
plt.plot(xp, my_function_big(xp, *cfit[0]), '-', label='fit function')  # plot the fit evaluated along 0-1000
plt.legend(loc=3, numpoints=1, prop={'size':12})
plt.show()
pylab.close()
Good luck!

In your first attempt:

pars['g1_fraction'].set(0, vary=True)

The fraction must be a value between 0 and 1, but I believe it cannot be exactly zero. Try something like 0.000001 and it will work.
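A minimal sketch of how the parameter setup could then look, assuming the g1_/g2_ PseudoVoigt models from the first attempt above (the 0-1 bounds on the fraction are an assumption based on its meaning):

pars = gauss1.make_params()
pars.update(gauss2.make_params())

# a small nonzero starting value keeps the fraction well-defined
pars['g1_fraction'].set(0.000001, min=0.0, max=1.0, vary=True)
pars['g2_fraction'].set(0.000001, min=0.0, max=1.0, vary=True)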

Related

Solving for most likely intersection of multiple multivariate Gaussians

Is there a performant way to solve directly for the most likely intersection point (X, Y) of several multivariate Gaussians?
I've seen a few posts here asking how to solve for the intersection of two Gaussians, so the concept is familiar to me, but right now I don't see any approach other than iterating and solving for two distributions at a time.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mus = [np.array([[0.3], [0.7]]),
       np.array([[0.3], [0.2]]),
       np.array([[1.5], [0.6]])]
covs = [np.array([[0.85, 0.3], [0.3, 0.25]]),
        np.array([[0.7, -0.41], [-0.41, 0.25]]),
        np.array([[0.5, 0.15], [0.15, 0.15]])]
cmaps = ["Reds", "Blues", "Greens"]

for m, cov, c in zip(mus, covs, cmaps):
    cov_inv = np.linalg.inv(cov)
    cov_det = np.linalg.det(cov)
    x = np.linspace(-3, 3)
    y = np.linspace(-3, 3)
    X, Y = np.meshgrid(x, y)
    coe = 1.0 / ((2 * np.pi)**2 * cov_det)**0.5
    Z = coe * np.e ** (-0.5 * (cov_inv[0,0]*(X-m[0])**2 + (cov_inv[0,1] + cov_inv[1,0])*(X-m[0])*(Y-m[1]) + cov_inv[1,1]*(Y-m[1])**2))
    plt.contour(X, Y, Z, cmap=c)
You can do a LOT better than iterating between 2 solutions at a time. Realize that at every (x, y) point, you have a Z value for all 3 curves, and at the 3-way intersecting point, they are all equal (or within tolerance). And at other points, if you take the lowest Z of the curves, and move towards the center (mu_x, mu_y) of that curve, you are moving in an improving direction.
The below is an iterative algorithm that does that. There is certainly some meat on the bone in terms of possible enhancements. Notably, you could incorporate a "tolerance" for stopping conditions easily, or do some weighted average of the 2 lower z values instead of just the lowest to get the movement vector, or tinker with a larger step size.
Anyhow, this converges very rapidly for many different test starting points.
Code:
import numpy as np
import matplotlib.pyplot as plt
class Curve:
    # a convenience class so we can avoid recomputing the inverse and determinant
    def __init__(self, mu, cov_inv, cov_det):
        self.mu = mu
        self.cov_inv = cov_inv
        self.cov_det = cov_det
        self.coe = 1.0 / ((2 * np.pi)**2 * cov_det)**0.5

    def z(self, x, y):
        Z = self.coe * np.e ** (-0.5 * (self.cov_inv[0,0]*(x-self.mu[0])**2 +
            (self.cov_inv[0,1] + self.cov_inv[1,0])*(x-self.mu[0])*(y-self.mu[1]) +
            self.cov_inv[1,1]*(y-self.mu[1])**2))
        return Z

mus = [np.array([[0.3], [0.7]]),
       np.array([[0.3], [0.2]]),
       np.array([[1.5], [0.6]])]
covs = [np.array([[0.85, 0.3], [0.3, 0.25]]),
        np.array([[0.7, -0.41], [-0.41, 0.25]]),
        np.array([[0.5, 0.15], [0.15, 0.15]])]
cmaps = ["Reds", "Blues", "Greens"]

curves = []
for m, cov, c in zip(mus, covs, cmaps):
    cov_inv = np.linalg.inv(cov)
    cov_det = np.linalg.det(cov)
    x = np.linspace(-3, 3)
    y = np.linspace(-3, 3)
    X, Y = np.meshgrid(x, y)
    coe = 1.0 / ((2 * np.pi)**2 * cov_det)**0.5
    Z = coe * np.e ** (-0.5 * (cov_inv[0,0]*(X-m[0])**2 + (cov_inv[0,1] + cov_inv[1,0])*(X-m[0])*(Y-m[1]) + cov_inv[1,1]*(Y-m[1])**2))
    plt.contour(X, Y, Z, cmap=c)
    curves.append(Curve(m, cov_inv, cov_det))

# iterative algorithm...
pos = np.array((-1, 2))
step_size = 0.1
num_steps = 100
footprints = [pos, ]
for step in range(num_steps):
    zs = [(curves[i].z(*pos), i) for i in range(len(curves))]
    zs.sort()                 # sort by z value, lowest will be first
    c = curves[zs[0][1]]      # the curve to move toward
    vec = c.mu.T - pos
    move_vec = vec * (step_size / np.linalg.norm(vec))
    print(f'move: {move_vec} towards curve {zs[0][1]}')
    pos = pos + move_vec
    pos = pos.flatten()
    # check to see if we have backtracked; if so, shorten the step
    if len(footprints) > 1 and np.linalg.norm(pos - footprints[-2]) < step_size:
        # print(f'norm: {np.linalg.norm(pos - footprints[-2])}')
        step_size *= 0.5
    footprints.append(pos)

plt.plot([t[0] for t in footprints], [t[1] for t in footprints], c='k', lw=2)
plt.show()
Plot: (the three Gaussian contour sets, with the black line showing the path of the iterative search)
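One easy enhancement mentioned above is a stopping tolerance. A minimal sketch, reusing curves, pos, step_size, num_steps, and footprints from the code above (the tol value is an assumption):

tol = 1e-6  # assumed tolerance on the step size
for step in range(num_steps):
    zs = sorted((curves[i].z(*pos), i) for i in range(len(curves)))
    c = curves[zs[0][1]]  # lowest z value: move toward this curve's center
    vec = c.mu.T - pos
    pos = (pos + vec * (step_size / np.linalg.norm(vec))).flatten()
    if len(footprints) > 1 and np.linalg.norm(pos - footprints[-2]) < step_size:
        step_size *= 0.5  # we backtracked, so halve the step
    footprints.append(pos)
    if step_size < tol:  # steps have become negligible: converged
        break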

Python: how to integrate functions with two unknown parameters numerically

Now I have two functions:

rho(u) = np.exp((-2.0 / 0.2) * (u**0.2 - 1.0))
psi(w*(x-u)) = (1/(4.0 * math.sqrt(np.pi))) * np.exp(-((w * (x-u))**2) / 4.0) * (2.0 - (w * (x-u))**2)

I want to integrate rho(u) * psi(w*(x-u)) with respect to u, so that the result of the integral is a function of w and x.
Here's my Python code snippet as I try to solve this integral.
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy import integrate
x = np.linspace(0,10,1000)
w = np.linspace(0,10,500)
u = np.linspace(0,10,1000)
rho = np.exp((-2.0/0.2)*(u**0.2-1.0))
value = np.zeros((500,1000),dtype="float32")
# Integrate the products of rho with
# (1/(4.0*math.sqrt(np.pi)))*np.exp(- ((w[i]*(x[j]-u))**2) / 4.0)*(2.0 - (w[i]*(x[j]-u))**2)
for i in range(len(w)):
    for j in range(len(x)):
        value[i, j] = value[i, j] + integrate.simps(rho * (1/(4.0*math.sqrt(np.pi))) * np.exp(-((w[i]*(x[j]-u))**2) / 4.0) * (2.0 - (w[i]*(x[j]-u))**2), u)
plt.imshow(value,origin='lower')
plt.colorbar()
As illustrated above, I used nested for loops to do the integration. We all know that such an approach is inefficient, so I want to ask whether there are methods that avoid the for loops.
Here is a possibility using scipy.integrate.quad_vec. It executes in 6 seconds on my machine, which I believe is acceptable. It is true, however, that I used a step of 0.1 for both x and w, but such a resolution seems to be a good compromise on a single core.
from functools import partial
import matplotlib.pyplot as plt
from numpy import empty, exp, linspace, pi, sqrt
from scipy.integrate import quad_vec
from time import perf_counter
def func(u, x):
    rho = exp(-10 * (u ** 0.2 - 1))
    var = w * (x - u)
    psi = exp(-var ** 2 / 4) * (2 - var ** 2) / 4 / sqrt(pi)
    return rho * psi
begin = perf_counter()
x = linspace(0, 10, 101)
w = linspace(0, 10, 101)
res = empty((x.size, w.size))
for i, xVal in enumerate(x):
    res[i], err = quad_vec(partial(func, x=xVal), 0, 10)
print(f'{perf_counter() - begin} s')
plt.contourf(w, x, res)
plt.colorbar()
plt.xlabel('w')
plt.ylabel('x')
plt.show()
UPDATE
I had not realised it, but one can also work with a multi-dimensional array in quad_vec. The updated approach below makes it possible to double the resolution in both x and w while keeping the execution time around 7 seconds. Additionally, there are no more visible for loops.
import matplotlib.pyplot as plt
from numpy import exp, mgrid, pi, sqrt
from scipy.integrate import quad_vec
from time import perf_counter
def func(u):
    rho = exp(-10 * (u ** 0.2 - 1))
    var = w * (x - u)
    psi = exp(-var ** 2 / 4) * (2 - var ** 2) / 4 / sqrt(pi)
    return rho * psi
begin = perf_counter()
x, w = mgrid[0:10:201j, 0:10:201j]
res, err = quad_vec(func, 0, 10)
print(f'{perf_counter() - begin} s')
plt.contourf(w, x, res)
plt.colorbar()
plt.xlabel('w')
plt.ylabel('x')
plt.show()
Addressing the comment
Just add the following lines before plt.show() to have both axes scale logarithmically.
plt.gca().set_xlim(0.05, 10)
plt.gca().set_ylim(0.05, 10)
plt.gca().set_xscale('log')
plt.gca().set_yscale('log')
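As a quick sanity check, one can compare the quad_vec result with the simps approach from the question at a single grid point. This assumes the mgrid-based version above has just been run (so x, w, and res exist); the indices picked here are arbitrary:

from scipy.integrate import quad

i, j = 100, 50
xVal, wVal = x[i, j], w[i, j]  # x varies along axis 0, w along axis 1

def integrand(u):
    rho = exp(-10 * (u ** 0.2 - 1))
    var = wVal * (xVal - u)
    return rho * exp(-var ** 2 / 4) * (2 - var ** 2) / 4 / sqrt(pi)

ref, _ = quad(integrand, 0, 10)  # adaptive 1-D quadrature at one (x, w) point
print(ref, res[i, j])            # the two values should agree closely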

Sine plotting in Python

How can I get one graph consisting of two different sinusoidal waves? I wrote this code, but it makes two separate waves:
import random
import numpy as np
import matplotlib.pyplot as plt

Fs = 1000
f = 2
sample = 1000
sample_rate= 0.1
x = np.arange(sample)
noise = 0.0003*np.asarray(random.sample(range(0,1000),sample))
y = np.sin(2 * np.pi * f * x / Fs)+noise
f1 = 10
x1 = np.arange(sample)
y1 = np.sin(2 * np.pi * f1 * x / Fs)+noise
plt.plot(x, y, x1, y1)
plt.xlabel('Time(s)')
plt.ylabel('Amplitude(V)')
plt.show()
I got this (the two waves plotted on top of each other), but I need one graph where the second wave follows the first.
Aside from the "spike" joining the two different signals, this looks more like what you're looking for:
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
Fs = 1000
def generate_noisy_signal(*, length, f, noise_amp=0):
    x = np.arange(length)
    noise = noise_amp * rng.random(length)
    return np.sin(2 * np.pi * f * x / Fs) + noise
signal1 = generate_noisy_signal(length=1000, f=2, noise_amp=0.3)
signal2 = generate_noisy_signal(length=1000, f=10, noise_amp=0.3) + 1.5
signal = np.concatenate([signal1, signal2])
plt.plot(signal)
plt.xlabel("Time(s)")
plt.ylabel("Amplitude(V)")
plt.show()
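To also get rid of the "spike" where the two signals meet, one option (a suggestion, not part of the code above) is to shift the second signal so that it starts at the value where the first one ends, instead of adding a fixed 1.5 offset:

# align the first sample of signal2 with the last sample of signal1
# so the joint has no vertical jump
signal2_aligned = signal2 - signal2[0] + signal1[-1]
signal = np.concatenate([signal1, signal2_aligned])

plt.plot(signal)
plt.xlabel("Time(s)")
plt.ylabel("Amplitude(V)")
plt.show()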

Specific heat fit using Debye+Einstein model in matplotlib

I am trying to fit specific heat data with a combined model of the form

C_el+ph(T) = γT + [α C_Debye(T) + (1 − α) C_Einstein(T)]

where the Debye and Einstein terms are given by eqs. 3 and 4 in the attachment.
I have tried the following code in a Jupyter notebook, following some examples on the web, but I have no idea how I can combine these functions to carry out the fit.
The data is linked https://www.dropbox.com/s/u0r2m3zwl8w77at/HC_ScPtBi.dat?dl=0
Column 1 is Temperature and Column 3 is Y data of interest.
Model is in https://www.dropbox.com/s/9452fq7eydajr5o/Debye.pdf?dl=0
Code is in https://www.dropbox.com/s/hk9b1t0agvt36zn/Untitled2.ipynb?dl=0
from matplotlib import pyplot
import numpy as np
from scipy import integrate
from scipy.optimize import curve_fit
from scipy.integrate import quad
data=np.genfromtxt('HC_ScPtBi.dat', skip_header=1)
R=8.314
n=3
M=1
T=data[10:290,0]
c=data[10:290,2]
def plot_data():
    pyplot.scatter(T, c)
    pyplot.xlabel('$T [K]$')
    pyplot.ylabel('$C$')
plot_data()
def c_einstein(T, T_E):
    x = T_E / T
    return 3 * n * R * x**2 * np.exp(x) / (np.exp(x) - 1)**2
popt0, pcov0 = curve_fit(c_einstein, T, c, 250)
T_E = popt0[0]
delta_T_E = np.sqrt(pcov0[0, 0])
print(f"T_E = {T_E:.5} ± {delta_T_E:.3} K")
print(popt0)
plot_data()
#temps = np.linspace(10, T[-1], 100)
pyplot.plot(T, c_einstein(T, *popt0));
def integrand(y):
    return y**4 * np.exp(y) / (np.exp(y) - 1)**2

@np.vectorize
def c_debye(T, T_D):
    x = T / T_D
    return 9 * n * R * x**3 * quad(integrand, 0, 1/x)[0]
popt1, pcov1 = curve_fit(c_debye, T, c, 150)
T_D = popt1[0]
delta_T_D = np.sqrt(pcov1[0, 0])
print(f"T_D = {T_D:.5} ± {delta_T_D:.3} K")
print(popt1)
plot_data()
pyplot.plot(T, c_einstein(T, *popt0), label='Einstein')
pyplot.plot(T, c_debye(T, *popt1), label='Debye')
pyplot.legend();
If it might be of any use, I obtained an excellent fit to a modified Weibull peak equation, with R-squared = 0.99999 and RMSE = 0.06878.
import numpy

def Peak_WeibullPeak_Modified_model(x):  # from zunzun.com
    a = 6.4654735487019195E+01
    b = 3.4517137038577323E+02
    c = -1.5940608784806631E+00
    d = 2.7331145870203617E+00
    return a * numpy.exp(-0.5 * numpy.power(numpy.log(x/b) / c, d))
You need to combine the Einstein and Debye equations into a single function, which should look something like this:
def func(T, alpha, gamma, T_e, T_d):
    fn = lambda y: y**4 * np.exp(y) / (np.exp(y) - 1)**2
    einst = (1 - alpha) * 3 * n * R * T_e**2 / T**2 * np.exp(T_e/T) / (np.exp(T_e/T) - 1)**2
    debye_int = np.array([integrate.quad(fn, 0, T_d/t)[0] for t in T])
    debye = alpha * 9 * n * R * T**3 / T_d**3 * debye_int
    return einst + debye + gamma * T
You can then use that function in the curve fitting
coefs = curve_fit(func, T, c)[0]
plt.plot(T, func(T, *coefs))
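With four free parameters, curve_fit may also need starting values to converge reliably. A sketch, assuming the single-model fits from the question are still available (popt0 holding the Einstein temperature, popt1 the Debye temperature; the alpha and gamma guesses are assumptions):

p0 = [0.5, 0.001, popt0[0], popt1[0]]  # assumed guesses: alpha, gamma, T_e, T_d
coefs, pcov = curve_fit(func, T, c, p0=p0)
plt.plot(T, func(T, *coefs), label='combined fit')
plt.legend()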

how can I do a maximum likelihood regression using scipy.optimize.minimize

How can I do a maximum likelihood regression using scipy.optimize.minimize? I specifically want to use the minimize function here, because I have a complex model and need to add some constraints. I am currently trying a simple example using the following:
import numpy as np
from scipy.optimize import minimize

def lik(parameters):
    m = parameters[0]
    b = parameters[1]
    sigma = parameters[2]
    for i in np.arange(0, len(x)):
        y_exp = m * x + b
    L = sum(np.log(sigma) + 0.5 * np.log(2 * np.pi) + (y - y_exp) ** 2 / (2 * sigma ** 2))
    return L
x = [1,2,3,4,5]
y = [2,3,4,5,6]
lik_model = minimize(lik, np.array([1,1,1]), method='L-BFGS-B', options={'disp': True})
When I run this, convergence fails. Does anyone know what is wrong with my code? The message I get is 'ABNORMAL_TERMINATION_IN_LNSRCH'. I am using the same algorithm that I have working with optim in R.
Thank you, Aleksander. You were correct that my likelihood function was wrong, not the code. Using a formula I found on Wikipedia, I adjusted the code to:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

def lik(parameters):
    m = parameters[0]
    b = parameters[1]
    sigma = parameters[2]
    for i in np.arange(0, len(x)):
        y_exp = m * x + b
    L = (len(x)/2 * np.log(2 * np.pi) + len(x)/2 * np.log(sigma ** 2) +
         1 / (2 * sigma ** 2) * sum((y - y_exp) ** 2))
    return L
x = np.array([1,2,3,4,5])
y = np.array([2,5,8,11,14])
lik_model = minimize(lik, np.array([1,1,1]), method='L-BFGS-B')
plt.scatter(x,y)
plt.plot(x, lik_model['x'][0] * x + lik_model['x'][1])
plt.show()
Now it seems to be working.
Thanks for the help!
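Since the question mentions needing constraints, here is a minimal sketch of how they could be added to this minimize call (the bound and constraint values are made up for illustration): L-BFGS-B supports box bounds directly, while general constraints need a method such as SLSQP.

# box bounds: m and b unbounded, sigma kept strictly positive
bounds = [(None, None), (None, None), (1e-6, None)]
lik_model = minimize(lik, np.array([1, 1, 1]), method='L-BFGS-B', bounds=bounds)

# a general inequality constraint (here: m + b >= 0) with SLSQP
cons = ({'type': 'ineq', 'fun': lambda p: p[0] + p[1]},)
lik_model = minimize(lik, np.array([1, 1, 1]), method='SLSQP', bounds=bounds, constraints=cons)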
