Errors using curve_fit for Gaussian fit of data - python

I'm trying to do a Gaussian fit for some experimental data, but I keep running into error after error. I've followed a few different threads online, but either the fit isn't good (it's just a horizontal line) or the code won't run at all. I'm following this code from another thread. Below is my code.
I apologize if my code seems a bit messy. There are some leftovers from other attempts at making it work, hence the "astropy" import.
import math as m
import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize as opt
import pandas as pd
import statistics as stats
from astropy import modeling
def gaus(x,a,x0,sigma, offset):
    return a*m.exp(-(x-x0)**2/(2*sigma**2)) + offset
# Python program to get average of a list
def Average(lst):
    return sum(lst) / len(lst)
wavelengths = [391.719, 391.984, 392.248, 392.512, 392.777, 393.041, 393.306, 393.57, 393.835, 394.099, 391.719, 391.455, 391.19, 390.926, 390.661, 390.396]
intensities = [511.85, 1105.85, 1631.85, 1119.85, 213.85, 36.85, 10.85, 6.85, 13.85, 7.85, 511.85, 200.85, 80.85, 53.85, 14.85, 24.85]
n=sum(intensities)
mean = sum(wavelengths*intensities)/n
sigma = m.sqrt(sum(intensities*(wavelengths-mean)**2)/n)
def gaus(x,a,x0,sigma):
    return a*m.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = opt.curve_fit(gaus,wavelengths,intensities,p0=[1,mean,sigma])
print(popt)
plt.scatter(wavelengths, intensities)
plt.title("Helium Spectral Line Peak 1")
plt.xlabel("Wavelength (nm)")
plt.ylabel("Intensity (a.u.)")
plt.show()
Thanks to the kind user, my curve now looks much more reasonable. However, the fitted line seems to double back and connect one of the points to an earlier point. (Screenshot not included.)

There are two problems with your code. The first is that you are performing vector operations on plain Python lists, which raises the first error in the line mean = sum(wavelengths*intensities)/n. You should use np.array instead. The second is that you call math.exp on a whole sequence of values, which again raises an error because math.exp only accepts a single real number, so you should use np.exp here instead.
The following code solves your problem:
import matplotlib.pyplot as plt
import numpy as np
from scipy import optimize as opt
wavelengths = [391.719, 391.984, 392.248, 392.512, 392.777, 393.041,
               393.306, 393.57, 393.835, 394.099, 391.719, 391.455,
               391.19, 390.926, 390.661, 390.396]
intensities = [511.85, 1105.85, 1631.85, 1119.85, 213.85, 36.85, 10.85, 6.85,
               13.85, 7.85, 511.85, 200.85, 80.85, 53.85, 14.85, 24.85]
wavelengths_new = np.array(wavelengths)
intensities_new = np.array(intensities)
n=sum(intensities)
mean = sum(wavelengths_new*intensities_new)/n
sigma = np.sqrt(sum(intensities_new*(wavelengths_new-mean)**2)/n)
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = opt.curve_fit(gaus,wavelengths_new,intensities_new,p0=[1,mean,sigma])
print(popt)
plt.scatter(wavelengths_new, intensities_new, label="data")
plt.plot(wavelengths_new, gaus(wavelengths_new, *popt), label="fit")
plt.title("Helium Spectral Line Peak 1")
plt.xlabel("Wavelength (nm)")
plt.ylabel("Intensity (a.u.)")
plt.legend()  # show the "data" and "fit" labels set above
plt.show()
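On the follow-up about the line connecting back to an earlier point: plt.plot joins points in the order they appear in the array, and the wavelengths in the question are not sorted (they run up to 394.099 and then jump back to 391.719). A minimal sketch of one way around this, assuming the same data, is to sort the points and evaluate the fitted curve on a dense, increasing grid. The names order, x_dense and y_fit, and the use of intensities.max() as the amplitude starting guess, are my own choices and not part of the original answer.

```python
import numpy as np
from scipy import optimize as opt

wavelengths = np.array([391.719, 391.984, 392.248, 392.512, 392.777, 393.041,
                        393.306, 393.57, 393.835, 394.099, 391.719, 391.455,
                        391.19, 390.926, 390.661, 390.396])
intensities = np.array([511.85, 1105.85, 1631.85, 1119.85, 213.85, 36.85, 10.85, 6.85,
                        13.85, 7.85, 511.85, 200.85, 80.85, 53.85, 14.85, 24.85])

def gaus(x, a, x0, sigma):
    return a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# Intensity-weighted mean and standard deviation as starting guesses
n = intensities.sum()
mean = (wavelengths * intensities).sum() / n
sigma = np.sqrt((intensities * (wavelengths - mean) ** 2).sum() / n)

# Assumption: start the amplitude at the largest measured intensity
popt, pcov = opt.curve_fit(gaus, wavelengths, intensities,
                           p0=[intensities.max(), mean, sigma])

# Sort the data so that a line plot never doubles back
order = np.argsort(wavelengths)
x_sorted = wavelengths[order]
y_sorted = intensities[order]

# Evaluate the fitted curve on a dense, strictly increasing grid
x_dense = np.linspace(x_sorted[0], x_sorted[-1], 200)
y_fit = gaus(x_dense, *popt)
```

To plot, plt.scatter(x_sorted, y_sorted) followed by plt.plot(x_dense, y_fit) should then give a single smooth curve through the peak.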

Related

Kernel Density Estimation using scipy's gaussian_kde and sklearn's KernelDensity leads to different results

I created some data from two superposed normal distributions and then applied sklearn.neighbors.KernelDensity and scipy.stats.gaussian_kde to estimate the density function. However, using the same bandwidth (1.0) and the same kernel, the two methods produce different results. Can someone explain the reason for this? Thanks for your help.
Below you can find the code to reproduce the issue:
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
import seaborn as sns
from sklearn.neighbors import KernelDensity
n = 10000
dist_frac = 0.1
x1 = np.random.normal(-5,2,int(n*dist_frac))
x2 = np.random.normal(5,3,int(n*(1-dist_frac)))
x = np.concatenate((x1,x2))
np.random.shuffle(x)
eval_points = np.linspace(np.min(x), np.max(x))
kde_sk = KernelDensity(bandwidth=1.0, kernel='gaussian')
kde_sk.fit(x.reshape([-1,1]))
y_sk = np.exp(kde_sk.score_samples(eval_points.reshape(-1,1)))
kde_sp = gaussian_kde(x, bw_method=1.0)
y_sp = kde_sp.pdf(eval_points)
sns.kdeplot(x)
plt.plot(eval_points, y_sk)
plt.plot(eval_points, y_sp)
plt.legend(['seaborn','scikit','scipy'])
If I change the scipy bandwidth to 0.25, the results of both methods look approximately the same.
What is meant by bandwidth in scipy.stats.gaussian_kde and in sklearn.neighbors.KernelDensity is not the same. scipy.stats.gaussian_kde uses a bandwidth factor (see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html), which it multiplies by the standard deviation of the sample. For a 1-D kernel density estimation the following relation applies:
bandwidth of sklearn.neighbors.KernelDensity = bandwidth factor of scipy.stats.gaussian_kde * standard deviation of the sample
For your estimation this probably means that your sample standard deviation is about 4, since 0.25 * 4 = 1.
See "Getting bandwidth used by SciPy's gaussian_kde function" for more information.
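The relation above can be checked numerically. The sketch below uses only NumPy and SciPy: the hand-written kde_absolute function is a stand-in for an estimator that takes the kernel width directly in data units (which is, as far as I understand, how sklearn's bandwidth behaves for the Gaussian kernel), and it is compared with gaussian_kde after dividing the width by the sample standard deviation. The function name, the seed, and the variable names are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = np.concatenate((rng.normal(-5, 2, 1000), rng.normal(5, 3, 9000)))
eval_points = np.linspace(x.min(), x.max(), 50)

h = 1.0  # kernel standard deviation in data units

def kde_absolute(points, samples, h):
    """Plain Gaussian KDE whose kernel has standard deviation h."""
    z = (points[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

y_abs = kde_absolute(eval_points, x, h)

# gaussian_kde scales the sample standard deviation (ddof=1) by bw_method,
# so dividing h by that standard deviation reproduces the same kernel width
factor = h / x.std(ddof=1)
y_sp = gaussian_kde(x, bw_method=factor)(eval_points)
```

For this mixture the sample standard deviation is a little above 4, so factor comes out near 0.24, which is consistent with the observation that 0.25 makes the two curves look alike.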
To be honest, I don't know why, but using the scipy hyperparameter bw_method='scott' makes it match seaborn exactly.
So it seems to be all about the hyperparameters. We could find out why by understanding them in depth, but in the meantime just use 'scott' or 'silverman' instead of an arbitrary scalar.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
import seaborn as sns
from sklearn.neighbors import KernelDensity
n = 10000
dist_frac = 0.1
x1 = np.random.normal(-5,2,int(n*dist_frac))
x2 = np.random.normal(5,3,int(n*(1-dist_frac)))
x = np.concatenate((x1,x2))
np.random.shuffle(x)
eval_points = np.linspace(np.min(x), np.max(x))
kde_sk = KernelDensity(bandwidth=1, kernel='gaussian')
kde_sk.fit(x.reshape([-1,1]))
y_sk = np.exp(kde_sk.score_samples(eval_points.reshape(-1,1)))
kde_sp = gaussian_kde(x, bw_method='scott') ### I MEAN HERE! ###
y_sp = kde_sp.pdf(eval_points)
sns.kdeplot(x)
plt.plot(eval_points, y_sk)
plt.plot(eval_points, y_sp)
plt.legend(['seaborn','scikit','scipy'])
Increase the sample size passed to np.random.normal; your data points are too few.
Try n=500000 and check the results.

Fitting multiple datasets with shared parameters

I am trying to fit different data sets with different non-linear functions that share some parameters. It looks something like this:
import matplotlib
from matplotlib import pyplot as plt
from scipy import optimize
import numpy as np
#some non-linear functions
def Sigma1x(x,C11,C111,C1111,C11111):
    return C11*x+1/2*C111*pow(x,2)+1/6*C1111*pow(x,3)+1/24*C11111*pow(x,4)
def Sigma2x(x,C12,C112,C1112,C11112):
    return C12*x+1/2*C112*pow(x,2)+1/6*C1112*pow(x,3)+1/24*C11112*pow(x,4)
def Sigma1y(y,C12,C111,C222,C112,C1111,C1112,C2222,C12222):
    return C12*y+1/2*(C111-C222+C112)*pow(y,2)+1/12*(C111+2*C1112-C2222)*pow(y,3)+1/24*C12222*pow(y,4)
def Sigma2y(y,C11,C222,C2222,C22222):
    return C11*y+1/2*C222*pow(y,2)+1/6*C2222*pow(y,3)+1/24*C22222*pow(y,4)
def Sigmaz(z,C11,C12,C111,C222,C112,C1111,C1112,C2222,C1122,C11111,C11112,C12222,C11122,C22222):
    return (C11+C12)*z+1/2*(2*C111-C222+3*C112)*pow(z,2)+1/6*(3/2*C1111+4*C1112-1/2*C222+3*C1122)*pow(z,3)+\
        1/24*(3*C11111+10*C11112-5*C12222+10*C11122-2*C22222)*pow(z,4)
# Experimental datasets
Xdata=np.loadtxt('x-direction.txt') #This contains the x axis and two other datasets, to be fitted with Sigma1x and Sigma2x
Ydata=np.loadtxt('y-direction.txt') #This contains the y axis and two other datasets, to be fitted with Sigma1y and Sigma2y
Zdata=np.loadtxt('z-direction.txt') #This contains the z axis and one dataset, to be fitted with Sigmaz
The question is how to use optimize.leastsq or another package to fit the data with the appropriate functions, given that they share multiple parameters?
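One common way to fit several datasets with shared parameters is to stack the residuals of all datasets into one vector and minimize them together. The sketch below illustrates the idea with scipy.optimize.least_squares on two deliberately truncated functions (linear and quadratic terms only) that share C11. The truncation, the synthetic data, the starting values, and the lowercase function names are all assumptions made for illustration, not the poster's actual model or measurements.

```python
import numpy as np
from scipy import optimize

# Truncated stand-ins for Sigma1x and Sigma2y; both depend on the shared C11
def sigma1x(x, C11, C111):
    return C11 * x + 0.5 * C111 * x ** 2

def sigma2y(y, C11, C222):
    return C11 * y + 0.5 * C222 * y ** 2

def residuals(p, x, s1x, y, s2y):
    """Stack the residuals of both datasets so C11 is fitted against both."""
    C11, C111, C222 = p
    return np.concatenate((sigma1x(x, C11, C111) - s1x,
                           sigma2y(y, C11, C222) - s2y))

# Synthetic, noise-free data generated from made-up "true" values
true_params = (120.0, -650.0, -400.0)
x = np.linspace(0, 0.2, 30)
y = np.linspace(0, 0.15, 25)
s1x = sigma1x(x, true_params[0], true_params[1])
s2y = sigma2y(y, true_params[0], true_params[2])

fit = optimize.least_squares(residuals, x0=[100.0, 0.0, 0.0],
                             args=(x, s1x, y, s2y))
C11_fit, C111_fit, C222_fit = fit.x
```

The same pattern should extend to the full set of functions: unpack the whole parameter vector once inside residuals and concatenate one residual array per dataset.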
I was able to solve the initial question, at least partially. I found symfit very comprehensive and easy to use, so I wrote the following code:
import matplotlib.pyplot as plt
from symfit import *
import numpy as np
from symfit.core.minimizers import DifferentialEvolution, BFGS
Y_strain = np.genfromtxt('Y_strain.csv', delimiter=',')
X_strain=np.genfromtxt('X_strain.csv', delimiter=',')
xmax=max(X_strain[:,0])
xmin=min(X_strain[:,0])
xdata = np.linspace(xmin, xmax, 50)
ymax=max(Y_strain[:,0])
ymin=min(Y_strain[:,0])
ydata=np.linspace(ymin, ymax, 50)
x,y,Sigma1x,Sigma2x,Sigma1y,Sigma2y= variables('x,y,Sigma1x,Sigma2x,Sigma1y,Sigma2y')
C11,C111,C1111,C11111,C12,C112,C1112,C11112,C222,C2222,C12222,C22222 = parameters('C11,C111,C1111,C11111,C12,C112,C1112,C11112,C222,C2222,C12222,C22222')
model =Model({
    Sigma1x:C11*x+1/2*C111*pow(x,2)+1/6*C1111*pow(x,3)+1/24*C11111*pow(x,4),
    Sigma2x:C12*x+1/2*C112*pow(x,2)+1/6*C1112*pow(x,3)+1/24*C11112*pow(x,4),
    #Sigma1y:C12*y+1/2*(C111-C222+C112)*pow(y,2)+1/12*(C111+2*C1112-C2222)*pow(y,3)+1/24*C12222*pow(y,4),
    #Sigma2y:C11*y+1/2*C222*pow(y,2)+1/6*C2222*pow(y,3)+1/24*C22222*pow(y,4),
})
fit = Fit(model, x=X_strain[:,0], Sigma1x=X_strain[:,1],Sigma2x=X_strain[:,2])
fit_result = fit.execute()
print(fit_result)
plt.scatter(Y_strain[:,0],Y_strain[:,2])
plt.scatter(Y_strain[:,0],Y_strain[:,1])
plt.plot(xdata, model(x=xdata, **fit_result.params).Sigma1x)
plt.plot(xdata, model(x=xdata, **fit_result.params).Sigma2x)
However, the resulting fit is very bad:
Parameter  Value           Standard Deviation
C11         1.203919e+02   3.988977e+00
C111       -6.541505e+02   5.643111e+01
C1111       1.520749e+03   3.713742e+02
C11111     -7.824107e+02   1.015887e+03
C11112      4.451211e+03   1.015887e+03
C1112      -1.435071e+03   3.713742e+02
C112        9.207923e+01   5.643111e+01
C12         3.272248e+01   3.988977e+00
Status message         Desired error not necessarily achieved due to precision loss.
Number of iterations   59
Objective              <symfit.core.objectives.LeastSquares object at 0x000001CC00C0A508>
Minimizer              <symfit.core.minimizers.BFGS object at 0x000001CC7F84A548>
Goodness of fit qualifiers:
chi_squared            6.230510793023184
objective_value        3.115255396511592
r_squared              0.991979767376565
Any ideas on how to improve the fit?

Plotting Fourier Transform of Gaussian function with python, but the result was wrong

I have been thinking about this for a long time, but I can't find out what the problem is. I hope you can help me. Thank you.
Gaussian function F_s(w) with mean μ and standard deviation s:
F_s(w) = \frac{1}{\sqrt{2\pi}\, s} e^{-\frac{(w-\mu)^{2}}{2s^{2}}}
Code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fft import fft
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, np.real(np.fft.fft(F_S(w, 0, 1))))
plt.show()
Result:
As was mentioned before, you want the absolute value, not the real part.
A minimal example showing the re/im and abs/phase spectra:
import numpy as np
import matplotlib.pyplot as p
# %matplotlib inline  # IPython/Jupyter magic; uncomment when running in a notebook
n=1001 # add 1 to keep the interval a round number when using linspace
t = np.linspace(-5, 5, n ) # presumed to be time
dt=t[1]-t[0] # time resolution
print(f'sampling every {dt:.3f} sec , so at {1/dt:.1f} Sa/sec, max. freq will be {1/2/dt:.1f} Hz')
y = np.exp(-(t**2)/0.01) # signal in time
fr= np.fft.fftshift(np.fft.fftfreq(n, dt)) # shift helps with sorting the frequencies for better plotting
ft=np.fft.fftshift(np.fft.fft(y)) # fftshift only necessary for plotting in sequence
p.figure(figsize=(20,12))
p.subplot(231)
p.plot(t,y,'.-')
p.xlabel('time (secs)')
p.title('signal in time')
p.subplot(232)
p.plot(fr,np.abs(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, abs');
p.subplot(233)
p.plot(fr,np.real(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, real');
p.subplot(235)
p.plot(fr,np.angle(ft), '.-', lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, phase');
p.subplot(236)
p.plot(fr,np.imag(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, imag');
You have to change from the time scale to the frequency scale.
When you compute the FFT of a real signal you get a symmetric transform, i.e. the negative-frequency side mirrors the positive-frequency side. Usually you will only look at the positive side.
Also, take care with the sample rate: the FFT transforms a time-domain input to the frequency domain, so the time step, or sample rate, of the input matters. Pass your time step as np.fft.fftfreq(n, d=timestep) so that the frequency axis matches your sample rate.
If you simply want the FFT of a normally distributed signal, here is another question about it, with some good explanations of why you are getting this behavior:
Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python
There are two mistakes in your code:
Don't take the real part; take the absolute value when plotting.
The FFT output is not ordered from negative to positive frequencies. From the docs:
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean
of the signal), which is always purely real for real inputs. Then
A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains
the negative-frequency terms, in order of decreasingly negative
frequency.
You can rearrange the elements with np.fft.fftshift.
The working code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fftpack import fft, fftshift
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))))
plt.show()
Also, you might want to consider scaling the x axis too.
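As a rough sketch of what scaling the x axis could look like: np.fft.fftfreq gives the frequency belonging to each FFT bin once it is told the sample spacing, and multiplying the FFT by that spacing turns the discrete sum into an approximation of the continuous transform. For a unit-area Gaussian with standard deviation 1, the magnitude of the continuous Fourier transform is exp(-2π²f²), which gives something to compare against. The variable names dw, freq, spectrum and expected are my own.

```python
import numpy as np

def F_S(w, mu, sig):
    return np.exp(-(w - mu) ** 2 / (2 * sig ** 2)) / (np.sqrt(2 * np.pi) * sig)

n = 100
w = np.linspace(-5, 5, n)
dw = w[1] - w[0]  # sample spacing of the input

# Magnitude spectrum, reordered from negative to positive frequency and
# scaled by the spacing so it approximates the continuous transform
spectrum = np.fft.fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))) * dw

# Matching frequency axis in cycles per unit of w
freq = np.fft.fftshift(np.fft.fftfreq(n, d=dw))

# Analytic magnitude of the transform of a unit-area Gaussian with sigma = 1
expected = np.exp(-2 * np.pi ** 2 * freq ** 2)
```

Plotting spectrum against freq instead of w should then show a Gaussian centred on zero frequency that lies on top of expected.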

Plotting 2D integral function in python

Here are my first steps in the NumPy world.
The goal is to plot the integral of the 2-D function below as a 3-D mesh:
N = \frac{n}{2\sigma\sqrt{\pi}} e^{-\frac{n^{2}x^{2}}{4\sigma^{2}}}
That can be done easily in Matlab with the snippet below:
[x,n] = meshgrid(0:0.1:20, 1:1:100);
mu = 0;
sigma = sqrt(2)./n;
f = normcdf(x,mu,sigma);
mesh(x,n,f);
But the bloody result is ugly enough to drive me to try Python's capabilities for generating scientific plots.
I searched around and found that the first steps toward this in Python might be the snippet below:
from matplotlib.patches import Polygon
import numpy as np
from scipy.integrate import quad
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
sigma = 1
def integrand(x,n):
    return (n/(2*sigma*np.sqrt(np.pi)))*np.exp(-(n**2*x**2)/(4*sigma**2))
t = np.arange(0, 20, 0.01)  # np.linespace does not exist; arange takes a step size
n = np.arange(1, 100, 1)
lower_bound = -100000000000000000000 #-inf
upper_bound = t
tt, nn = np.meshgrid(t,n)
real_integral = quad(integrand(tt,nn), lower_bound, upper_bound)
Axes3D.plot_trisurf(real_integral, tt,nn)
Edit: After looking further into Greg's advice, the code above is the most up-to-date snippet.
Here is the generated exception:
RuntimeError: infinity comparisons don't work for you
It seemingly refers to the quad call...
Would you please help me handle this integrating-and-plotting problem?
Best
Just a few hints to get you in the right direction.
numpy.meshgrid can do the same as MATLAB's function:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html
When you have x and n you can do the math just like in MATLAB:
sigma = numpy.sqrt(2)/n
(in Python, multiplication and division of arrays are element-wise by default, so no dot is needed)
scipy has many more advanced functions; see for example "How to calculate cumulative normal distribution in Python" for a 1D case.
For plotting you can use matplotlib's pcolormesh:
import matplotlib.pyplot as plt
plt.pcolormesh(x,n,real_integral)
Hope this helps until someone can give you a more detailed answer.
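Putting those hints together, here is a minimal sketch under the assumption that sigma = 1 as in the question. The integrand is a normal density with mean 0 and standard deviation sqrt(2)*sigma/n, so its integral from minus infinity to x is the normal CDF, which scipy.stats.norm.cdf evaluates on the whole grid without calling quad. A single quad call (which accepts -np.inf as a bound) is included only as a spot check. The names xx, nn, F, check and plot_mesh are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

sigma = 1.0

def integrand(x, n):
    return (n / (2 * sigma * np.sqrt(np.pi))) * np.exp(-(n ** 2 * x ** 2) / (4 * sigma ** 2))

# Same grid as the Matlab snippet: x = 0:0.1:20, n = 1:1:100
x = np.linspace(0, 20, 201)
n = np.arange(1, 101)
xx, nn = np.meshgrid(x, n)

# Integral of the integrand from -inf to x, evaluated element-wise
F = norm.cdf(xx, loc=0.0, scale=np.sqrt(2) * sigma / nn)

# Spot check at x = 0.5, n = 2 with numerical integration
check, _ = quad(integrand, -np.inf, 0.5, args=(2,))

def plot_mesh(xx, nn, F):
    """Draw the grid as a 3-D surface (imports matplotlib only when called)."""
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(xx, nn, F)
    ax.set_xlabel("x")
    ax.set_ylabel("n")
    plt.show()
```

Calling plot_mesh(xx, nn, F) should give the 3-D surface; plt.pcolormesh(xx, nn, F) is the flat alternative mentioned above.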

curve fitting with lmfit python

I am new to Python and am trying to fit data using lmfit. I am following the lmfit tutorial here:
http://lmfit.github.io/lmfit-py/parameters.html
and this is my code (based on the code explained in the above link):
import numpy as np
import lmfit
import matplotlib.pyplot as plt
from numpy import exp, sqrt, pi
from lmfit import minimize,Parameters,Parameter,report_fit
data=np.genfromtxt('test.txt',delimiter=',')
x=data[:,][:,0]
y=data[:,][:,1]
def fcn2fit(params,x,y):
    """model decaying sine wave, subtract data"""
    S1=params['S1'].value
    t0=params['t0'].value
    T1=params['T1'].value
    S2=params['S2'].value
    T2=params['T2'].value
    model = 1-(1-S1)*exp(-(x-t0)/T1)+S2*(1-exp(-(x-t0)/T2)
    return model - y
params = Parameters()
params.add('S1', value=0.85, min=0.8, max=0.9)
params.add('t0', value=0.05, min=0.01, max=0.1)
params.add('T1', value=0.2, min=0.1, max=0.3)
params.add('S2', value=0.03, min=0.01, max=0.05)
params.add('T2', value=0.3, min=0.2, max=0.4)
result = minimize(fcn2fit, params, args=(x,y))
final = y + result.residual
report_fit (params)
try:
    import pylab
    pylab.plot(x,y, 'k+')
    pylab.plot(x,final, 'r')
    pylab.show()
except:
    pass
Problem:
It returns a syntax error for the line return model - y.
I would appreciate it if you could point me in the right direction.
I think there is a parenthesis problem in the previous line, which causes the return to be parsed as part of the formula. I think there's a ) missing at the end.
You have forgotten a right parenthesis ")" in the previous line (model = ...). The opening and closing parentheses are unbalanced, which causes a syntax error.
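For reference, a small sketch of the model expression with the parentheses balanced. To keep it self-contained it uses plain NumPy and a dictionary of values rather than lmfit's Parameters object; the helper name model_curve and the synthetic x and y are assumptions for illustration only.

```python
import numpy as np

def model_curve(x, S1, t0, T1, S2, T2):
    # Same expression as in the question, with the final ")" added
    return 1 - (1 - S1) * np.exp(-(x - t0) / T1) + S2 * (1 - np.exp(-(x - t0) / T2))

def fcn2fit(values, x, y):
    """Residual between the model and the data."""
    return model_curve(x, **values) - y

# Starting values taken from the question's params.add(...) calls
values = {"S1": 0.85, "t0": 0.05, "T1": 0.2, "S2": 0.03, "T2": 0.3}

# Synthetic data generated from the model itself, so the residual is zero
x = np.linspace(0.05, 2.0, 40)
y = model_curve(x, **values)
residual = fcn2fit(values, x, y)
```

In the original lmfit version only the model = ... line needs to change; the rest of fcn2fit can stay as it is.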
