Fitting with a gaussian - python

I have some problems when trying to fit data from a text file with a gaussian. This is my code, where cal1_p1 is an array containing 54 values.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import curve_fit
cal1=np.loadtxt("C:/Users/Luca/Desktop/G3/X_rays/cal1_5min_Am.txt")
cal1_p1=[0 for a in range(854,908)]
for i in range(0,54):
    cal1_p1[i]=cal1[i+854]
# cal1_p1 takes the following values:
[5.0,6.0,5.0,11.0,4.0,9.0,14.0,13.0,13.0,14.0,12.0,13.0,16.0,20.0,15.0,23.0,23.0,33.0,43.0,46.0,41.0,40.0,49.0,57.0,62.0,61.0,53.0,65.0,64.0,42.0,72.0,55.0,47.0,43.0,38.0,46.0,37.0,39.0,27.0,18.0,20.0,20.0,18.0,10.0,11.0,8.0,10.0,6.0,8.0,8.0,6.0,10.0,6.0,4.0]
x=np.arange(854,908)
def gauss(x,sigma,m):
    return np.exp(-(x-m)**2/(2*sigma**2))/(sigma*np.sqrt(2*np.pi))
popt,pcov=curve_fit(gauss,x,cal1_p1,p0=[10,880])
plt.xlabel("Channel")
plt.ylabel("Counts")
axes=plt.gca()
axes.set_xlim([854,907])
axes.set_ylim([0,75])
plt.plot(x,cal1_p1,"k")
plt.plot(x,gauss(x,*popt),'b', label='fit')
The problem is that the resulting Gaussian is really squeezed, i.e. it has a very small variance. Even if I change the initial guess p0, the result doesn't change. What could be the problem? Thanks for any help you can provide!

The problem is that the Gaussian is normalised, while your data are not. You need to fit an amplitude as well. That is easy to fix, by adding an extra parameter a to your function:
x = np.arange(854, 908)
def gauss(x, sigma, m, a):
    return a * np.exp(-(x-m)**2/(2*sigma**2))/(sigma*np.sqrt(2*np.pi))
popt, pcov = curve_fit(gauss, x, cal1_p1, p0=[10, 880, 1])
print(popt)
plt.xlabel("Channel")
plt.ylabel("Counts")
axes=plt.gca()
axes.set_xlim([854, 907])
axes.set_ylim([0, 75])
plt.plot(x, cal1_p1, "k")
plt.plot(x, gauss(x,*popt), 'b', label='fit')
While I've given 1 as the starting value for a, you'll find that the fitted values (sigma, m, a) are actually:
[ 9.55438603 880.88681556 1398.66618699]
Because a multiplies a normalised Gaussian, it is the area under the curve rather than the peak height. This value can probably be ignored, since I assume you'd only be interested in the relative strength, which can be measured in counts.
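To make the role of a concrete, here is a minimal sketch (not the poster's exact script) that refits the counts listed in the question. The starting values taken from the data and the name peak_height are my own assumptions. Since a scales a normalised Gaussian, the height of the fitted peak is a/(sigma*sqrt(2*pi)).

```python
import numpy as np
from scipy.optimize import curve_fit

# Counts for channels 854..907, copied from the question
counts = np.array([
    5.0, 6.0, 5.0, 11.0, 4.0, 9.0, 14.0, 13.0, 13.0, 14.0,
    12.0, 13.0, 16.0, 20.0, 15.0, 23.0, 23.0, 33.0, 43.0, 46.0,
    41.0, 40.0, 49.0, 57.0, 62.0, 61.0, 53.0, 65.0, 64.0, 42.0,
    72.0, 55.0, 47.0, 43.0, 38.0, 46.0, 37.0, 39.0, 27.0, 18.0,
    20.0, 20.0, 18.0, 10.0, 11.0, 8.0, 10.0, 6.0, 8.0, 8.0,
    6.0, 10.0, 6.0, 4.0,
])
x = np.arange(854, 908)

def gauss(x, sigma, m, a):
    # a scales a unit-area Gaussian, so a is the area under the curve
    return a * np.exp(-(x - m)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Assumed starting values: a rough width, the channel of the highest bin,
# and the total number of counts as a guess for the area
p0 = [10.0, x[np.argmax(counts)], counts.sum()]
popt, pcov = curve_fit(gauss, x, counts, p0=p0)
sigma, m, a = popt

# Height of the fitted curve at its centre
peak_height = a / (abs(sigma) * np.sqrt(2 * np.pi))
print("sigma, m, a:", popt)
print("peak height:", peak_height)
```

The fitted sigma and m should land close to the values quoted above; the peak height is in counts, which is easier to compare with the plot than a itself.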


Sine fitting using scipy is not returning good fit

I'm trying to fit a sine wave to some data I collected, but the fitted amplitude and frequency are way off. Any suggestions?
x=[0,1,3,4,5,6,7,11,12,13,14,15,16,18,20,21,22,24,26,28,29,30,31,32,35,37,38,40,41,42,43,44,45,48,49,50,51,52,53,54,55,57,58,60,61,62,63,65,66,67,68,69,70,71,73,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,112,114,115,116,117,120,122,123,124,125,128,129,130,131,132,136,137,138,139,140,143,145,147,148,150,151,153,154,155,156,160,163,164,165,167,168,169,171,172,173,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,192,193,194,195,196,197,199,201,202,203,204,205,207,209,210,215,217,218,223,224,225,226,228,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,254,255,256,257,258,259,260,261,262,263,264,265,266,267,269,270,271,272,273,274,275,276,279,280,281,282,286,287,288,292,294,295,296,298,301,302,303,310,311,312,313,315,316,317,318,319,320,321,323,324,325,326,328,329,330,331,332,333,334,335,336,337,338,339,340,341,342,343,344,345,348,349,350,351,352,354,356,357,358,359,362,363,365,366,367,371,372,373,374,375,377,378,379,380,381,382,383,384,385,386,387,388,389,390,391,392,393,394,395,396,397,398,399,400,401,402,404,405,406,407,408,411,412,413,417,418,419,420,421,422,428,429,431,435,436,437,443,444,445,446,450,451,452,453,454,455,456,459,460,461,462,464,465,466,467,468,469,470,471,472,473,474,475,476,478,479,480,481,482,483,484,485,486,487,488,489,490,491,492,493,495,496,497,498,499,500,501,505,506,507,512,513,514,515,516,517,519,521,522,523,524,525,526,528,529,530,531,532,533,535,537,538,539,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,559,560,561,562,563,564,566,567,568,569,570,571,572,573,574,575,577,578,579,584,585,586,588,591,592,593,594,596,598,600,601,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,620,621,622,623,624,625,626,627,628,629,630,631,632,633,634,635,636,637,638,639,640,642,643,644,646,647,648,650,652,653,654,655,656,660,661,662,663,665,666,667,668,669,670,671,672,673,676,677,678,679,680,681,682
,684,685,687,688,690,691,692,693,694,695,696,697,698,701,702,703,704,707,708,709,710,712,713,714,715,717,718,719,721,722,723 ]
y=[53.66666667,53.5,51,53.66666667,54.33333333,55.5,57,59,56.5,57.33333333,56,56,57,58,58.66666667,59.5,57,59,58,61.5,60,61,62.5,67,60.66666667,62.5,64.33333333,64,64,65,65,65.66666667,68,70.5,67,67.5,71.5,65,70.5,73.33333333,72,67,76,73.5,72.83333333,75,73,74,73,71,70.5,73.16666667,70,75,69,71,68.33333333,68.5,66.75,62,63.5,63,62.5,61,53.5,61.25,55,57.5,62,54.75,56.5,52.33333333,52.33333333,49,47.66666667,47.5,45,44,42.5,41,37,37.2,34.5,33.4,33.2,34,26,28.6,25,25.5,27,22.66666667,21.66666667,21.5,22.5,22,19.8,19.66666667,20,20,17,26,22.6,19,28,26.33333333,24.25,27,28.5,30,24,33,31,41,38,22,31.66666667,30,39,26,33.5,40,40.5,38,44,47,48,43,42.5,44,43,51.5,48,49.66666667,51.5,47,56,50,50,58,51,58,58.5,57.33333333,57.5,64,57,59,56.5,65.5,60,63.66666667,62,62,65.33333333,66.5,65,66,65,68,65.5,65.83333333,60,65.5,70,68,64,65.42857143,62,68,63.25,62,63.33333333,60.4,59,52.5,52.6,55.16666667,50,51,45.33333333,48.33333333,39.4,38.25,34.33333333,43.25,31.33333333,29.5,29.5,29,27,26,27,25.5,24.5,23,22,22.5,19.5,20,20,18,18.5,17,16,16,15,14,14.5,13,12.5,11.5,11,11,11,10.5,10.5,9,9,10,10,10.5,9,10,10,11,11,11,10,10.66666667,12,12,12.5,13,13,14,14,14.5,16,16,18,16.5,20.5,21.5,21,25,28,22,29,29,28.66666667,36,42,36.75,43.5,48,44.75,50.66666667,53.75,51,57.33333333,58.5,58.66666667,60,60.25,61.75,60,58.5,63,61,60.33333333,62,63,63,60,61.5,62.33333333,62.66666667,61,63.5,61,61.66666667,62,59,60,57.5,56,57,58.5,52.5,50.5,47.5,49.66666667,49.66666667,54.66666667,45.66666667,41,44,33.16666667,49,45,29.5,39.5,29,20.5,23.5,23,19,18.66666667,17,16.75,15.5,15,16,17,13.5,12.2,12,14,13,11,11.5,11.5,11,11.5,11,11.5,11.5,12,13,13,13,13,13.5,14,14,14,15,17,15,16,16,17,18,17,18,18.5,19.5,20.5,20,21.5,20,22,22,23,23,25,26,28,29,36.25,31,37.75,41.33333333,43.6,37.5,46.5,38,47.33333333,46.75,47,50.5,48.5,58,50.5,48.75,54.33333333,56,49,55.5,60,56.5,56,60,56.5,52.75,54,56,57,56,52.66666667,52,52.66666667,53,47.66666667,44,48,50.5,45,46.66666667,48,44.66666667,42.33333333,46.5,43,36.75,41,28,35,36.5
,36,37.33333333,24,30.5,29,29.33333333,32.5,20,25.5,27.5,18,33,25.75,26,19.5,16,15.5,18,13,21,12,12.25,11,5,9,10,7.5,5,7.5,4,4.5,5.666666667,3.5,6.5,5,7,7.333333333,7,9,7.5,9,9.5,11,9,10,12,11.5,12.5,13,14,13.5,13,14,15,15,16,16.5,17.5,19.66666667,19.33333333,20.5,23.66666667,25.5,28.75,31,32.66666667,33.66666667,29,32.33333333,37.6,31,39.5,49,44.14285714,41,42.16666667,45,47.66666667,50.2,52.66666667,52,50,54,53.33333333,54.66666667,54.5,54,56,54,53.5,53,53,52,51.5,51.5,52,48,53,48,50,49.5,48.5,46,45,47,49,48,44,42,42,43,43,42.5,41.5,39.5,46,36,37.5,39,39,38,43,40,38,32.5,34,35.33333333,35,35,30.5,30,31.33333333,33,26,30,27,24,30,28,25,29,25.33333333]
from scipy.optimize import curve_fit
from numpy import sin
def fitting(x, a, b, c):
    return a * sin(b*x + c)
constants = curve_fit(fitting, x, y)
a_fit= constants[0][0]
b_fit= constants[0][1]
c_fit = constants[0][2]
fit_y=[]
for i in x:
    fit_y.append(fitting(i, a_fit, b_fit, c_fit))
plt.plot(x,fit_y, '--', color='red')
plt.scatter(x,y)
You should add an offset to your fitting function, as your data clearly has an offset of around 40.
You also need a reasonable initial estimate p0 so that the fit converges to the right solution. This will do the job:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from numpy import sin
def fitting(x, a,b,c,d):
    return a * sin(b*x + c) + d
p0 = [ (np.max(y)-np.min(y))/2, 6/150, 0, np.mean(y)]
constants = curve_fit(fitting, x, y , p0=p0 )
guess_y = [ fitting(i, *p0) for i in x]
fit_y = [ fitting(i, *constants[0]) for i in x]
plt.plot(x,guess_y, '--', color='green',label='guess')
plt.plot(x,fit_y, '--', color='red',label='fit')
plt.scatter(x,y,label='data')
plt.legend()
If you feel like it, you could even add a linear offset (a*x+b)
Note : thanks for the edit jonsca
I would add this as a comment, but I can't. Fundamentally, a * sin(b*x + c) isn't going to fit your data well: the data don't have a mean of zero, so you'd have to try a*sin(b*x + c) + d, and even then I don't think you'll get a great fit. You could try:
Give it some initial values to work with using the p0 input argument https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html . It never hurts to help the minimizer out.
Try a different function: what you have here looks like a sine wave with an offset 'a0' and maybe a decaying amplitude.
But you really need to just look at your data before trying to force a function to fit it.
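One way to get starting values without eyeballing the plot is to read them off the data. The sketch below is only an illustration on synthetic data (the true parameters, the noise level and the helper name sine_with_offset are my own assumptions, not the asker's data): the offset comes from the mean, the amplitude from half the peak-to-peak range, and the angular frequency from the strongest FFT bin.

```python
import numpy as np
from scipy.optimize import curve_fit

def sine_with_offset(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

# Synthetic, evenly sampled data (assumed values, for illustration only)
rng = np.random.default_rng(0)
x = np.linspace(0, 720, 400)
true_b = 2 * np.pi / 144
y = sine_with_offset(x, 25.0, true_b, 0.5, 40.0) + rng.normal(0, 3, x.size)

# Starting values estimated from the data
d0 = y.mean()                              # offset
a0 = (y.max() - y.min()) / 2               # amplitude
spectrum = np.abs(np.fft.rfft(y - d0))
freqs = np.fft.rfftfreq(x.size, d=x[1] - x[0])
k_peak = np.argmax(spectrum[1:]) + 1       # skip the zero-frequency bin
b0 = 2 * np.pi * freqs[k_peak]             # angular frequency
p0 = [a0, b0, 0.0, d0]

popt, pcov = curve_fit(sine_with_offset, x, y, p0=p0)
print("initial guess:", p0)
print("fitted a, b, c, d:", popt)
```

The FFT estimate assumes evenly spaced x values. The x list in the question has gaps, so there you would have to interpolate onto a regular grid first or simply estimate the period by eye, as the answer above does with 6/150.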

SciPy curve_fit displays straight line and is not fit to the data

I'm trying to fit a set of data to a CDF exponential function. However, I'm not sure what is going wrong either in my code or in the initial parameter guess, but it only creates a straight line. Data was imported from a CSV file.
#Plot Data
plt.figure(1,dpi=120)
plt.title("Cell A3")
plt.xlabel(rawdata[0][0])
plt.ylabel(rawdata[0][1])
plt.scatter(xdata,ydata,label="A3 Cell 1")
#Define Function
def func(t,lam):
    return 1 - (np.exp(-lam * t))
funcdata = func(xdata,1.17)
plt.plot(xdata,funcdata,label="Model")
plt.legend()
#CurveFit data to model
popt, pcov = curve_fit(func,xdata,ydata,p0=(-0.64))
perr = np.sqrt(np.diag(pcov))
[Image: the initial data and the straight line that curve_fit gives]
You cannot correctly fit a simple exponential function of this kind:
y=( 1 - (np.exp(-lam * t)) ) * scale
to the data, because the shape of this function is far from the shape of the data in the range 0<t<5.
It is better to consider a function of the logistic kind.
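As a rough illustration of what a logistic-type fit could look like, here is a sketch on synthetic data. The three-parameter form L/(1+exp(-k*(t-t0))), the parameter names and the simulated values are assumptions of mine; the asker's CSV data is not available here.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, L, k, t0):
    # L: plateau, k: steepness, t0: midpoint of the rise
    return L / (1 + np.exp(-k * (t - t0)))

# Simulated S-shaped data (assumed values, for illustration only)
rng = np.random.default_rng(0)
t = np.linspace(0, 25, 60)
y = logistic(t, 2000.0, 0.6, 8.0) + rng.normal(0, 30, t.size)

# Starting values read from the data: plateau from the maximum,
# midpoint from where the data cross half of the maximum
half_idx = np.argmin(np.abs(y - y.max() / 2))
p0 = [y.max(), 1.0, t[half_idx]]

popt, pcov = curve_fit(logistic, t, y, p0=p0)
perr = np.sqrt(np.diag(pcov))
print("fitted L, k, t0:", popt)
print("one-sigma errors:", perr)
```

Unlike 1 - exp(-lam*t), this curve starts flat, rises and then saturates, which is the shape the answer is pointing at for small t.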
Think about your data and your function. The values in ydata are quite large. What is the maximum value of
def func(t,lam):
    return 1 - (np.exp(-lam * t))
I think you will find that the function never exceeds 1: as lam * t goes to infinity, the function approaches 1. How can a function with a maximum value of 1 fit data in the 1000s? If you want to be able to scale beyond 1, you need more parameters in your function. Try with
def func(t,lam,scale):
    return ( 1 - (np.exp(-lam * t)) ) * scale
and see if scipy is able to better fit the data.
EDIT:
I managed to get that to work; however, you aren't even plotting the optimum parameters. To do that, see my code with simulated xdata and ydata:
#Plot Data
import numpy as np
from scipy.optimize import curve_fit
from matplotlib import pyplot as plt
def func(t,lam,scale):
    return ( 1 - (np.exp(-lam * t)) ) * scale
xdata = np.arange(25.)
ydata = func(xdata, 1.12, 2000.)
plt.figure(1,dpi=120)
plt.title("Cell A3")
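plt.xlabel("t")  # placeholder label; the question's script used rawdata[0][0]
plt.ylabel("y")  # placeholder label; the question's script used rawdata[0][1]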
plt.scatter(xdata,ydata,label="A3 Cell 1")
#CurveFit data to model
popt, pcov = curve_fit(func,xdata,ydata,p0=[0.5, 1000.1])
plt.plot(np.arange(25),func(np.arange(25), *popt),label="Model")
plt.legend()

Plotting Fourier Transform of Gaussian function with python, but the result was wrong

I have been thinking about this for a long time, but I can't find out what the problem is. I hope you can help me. Thank you.
Gaussian function F(s):
$$F(s)=\frac{1}{\sqrt{2\pi}\,s}\,e^{-\frac{(w-\mu)^2}{2s^2}}$$
Code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fft import fft
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, np.real(np.fft.fft(F_S(w, 0, 1))))
plt.show()
As was mentioned before, you want the absolute value, not the real part.
A minimal example showing the re/im and abs/phase spectra:
import numpy as np
import matplotlib.pyplot as p
%matplotlib inline
n=1001 # add 1 to keep the interval a round number when using linspace
t = np.linspace(-5, 5, n ) # presumed to be time
dt=t[1]-t[0] # time resolution
print(f'sampling every {dt:.3f} sec , so at {1/dt:.1f} Sa/sec, max. freq will be {1/2/dt:.1f} Hz')
y = np.exp(-(t**2)/0.01) # signal in time
fr= np.fft.fftshift(np.fft.fftfreq(n, dt)) # shift helps with sorting the frequencies for better plotting
ft=np.fft.fftshift(np.fft.fft(y)) # fftshift only necessary for plotting in sequence
p.figure(figsize=(20,12))
p.subplot(231)
p.plot(t,y,'.-')
p.xlabel('time (secs)')
p.title('signal in time')
p.subplot(232)
p.plot(fr,np.abs(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, abs');
p.subplot(233)
p.plot(fr,np.real(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, real');
p.subplot(235)
p.plot(fr,np.angle(ft), '.-', lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, phase');
p.subplot(236)
p.plot(fr,np.imag(ft), '.-',lw=0.3)
p.xlabel('freq (Hz)')
p.title('spectrum, imag');
You have to change from the time scale to the frequency scale.
When you compute an FFT of real input you get a symmetric transform, i.e. the negative-frequency half mirrors the positive-frequency half. Usually you only look at the positive side.
Also, take care with the sample rate: since the FFT transforms time-domain input to the frequency domain, the time step (or sample rate) of the input matters. Pass the time step to np.fft.fftfreq(n, d=timestep) to get the frequency axis for your sample rate.
If you simply want to take the FFT of a normal-distribution signal, here is another question about it, with some good explanations of why you are getting this behavior:
Fourier transform of a Gaussian is not a Gaussian, but thats wrong! - Python
There are two mistakes in your code:
Don't take the real part; take the absolute value when plotting.
From the docs:
If A = fft(a, n), then A[0] contains the zero-frequency term (the mean
of the signal), which is always purely real for real inputs. Then
A[1:n/2] contains the positive-frequency terms, and A[n/2+1:] contains
the negative-frequency terms, in order of decreasingly negative
frequency.
You can rearrange the elements with np.fft.fftshift.
The working code:
import numpy as np
from matplotlib import pyplot as plt
from math import pi
from scipy.fftpack import fft, fftshift
def F_S(w, mu, sig):
    return (np.exp(-np.power(w-mu, 2)/(2 * np.power(sig, 2))))/(np.power(2*pi, 0.5)*sig)
w=np.linspace(-5,5,100)
plt.plot(w, fftshift(np.abs(np.fft.fft(F_S(w, 0, 1)))))
plt.show()
Also, you might want to consider scaling the x axis too.
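For the x-axis scaling, something along these lines might help. It is a sketch under my own assumptions (1000 samples on [-5, 5), the input axis treated as time, the names gaussian and max_err are made up): np.fft.fftfreq supplies the frequency axis, and multiplying the FFT by the sample spacing approximates the continuous Fourier transform, which for a unit-area Gaussian with sigma = 1 has magnitude exp(-2*pi^2*f^2).

```python
import numpy as np

def gaussian(t, mu, sig):
    return np.exp(-(t - mu)**2 / (2 * sig**2)) / (np.sqrt(2 * np.pi) * sig)

n = 1000
t = np.linspace(-5, 5, n, endpoint=False)   # evenly spaced samples
dt = t[1] - t[0]                            # sample spacing
y = gaussian(t, 0.0, 1.0)

# Frequency axis in cycles per unit of t, reordered to run from negative to positive
freq = np.fft.fftshift(np.fft.fftfreq(n, d=dt))

# Multiplying by dt turns the DFT sum into an approximation of the integral
spectrum = np.fft.fftshift(np.fft.fft(y)) * dt

# Analytic magnitude of the transform of a unit-area Gaussian with sigma = 1
expected = np.exp(-2 * np.pi**2 * freq**2)
max_err = np.max(np.abs(np.abs(spectrum) - expected))
print("largest deviation from the analytic result:", max_err)
```

Plotting np.abs(spectrum) against freq then shows a Gaussian centred at zero frequency with a physically meaningful axis, instead of a curve plotted against the original w values.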

Fitting sin curve using python

I have two lists:
# on x-axis:
# list1:
[70.434654, 37.147266, 8.5787086, 161.40877, -27.31284, 80.429482, -81.918106, 52.320129, 64.064552, -156.40771, 12.37026, 15.599689, 166.40984, 134.93636, 142.55002, -38.073524, -38.073524, 123.88509, -82.447571, 97.934402, 106.28793]
# on y-axis:
# list2:
[86683.961, -40564.863, 50274.41, 80570.828, 63628.465, -87284.016, 30571.402, -79985.648, -69387.891, 175398.62, -132196.5, -64803.133, -269664.06, 36493.316, 22769.121, 25648.252, 25648.252, 53444.855, 684814.69, 82679.977, 103244.58]
I need to fit a sine curve a + b*sin(2*3.14*list1 + c) to the data points obtained by plotting list1 (x-axis) against list2 (y-axis) using Python.
I am not able to get any good result. Can anyone help me with suitable code and an explanation?
Thanks!
This is my graph after plotting list1 (x-axis) against list2 (y-axis):
Well, if you used lmfit, setting up and running your fit would look like this:
xdeg = [70.434654, 37.147266, 8.5787086, 161.40877, -27.31284, 80.429482, -81.918106, 52.320129, 64.064552, -156.40771, 12.37026, 15.599689, 166.40984, 134.93636, 142.55002, -38.073524, -38.073524, 123.88509, -82.447571, 97.934402, 106.28793]
y = [86683.961, -40564.863, 50274.41, 80570.828, 63628.465, -87284.016, 30571.402, -79985.648, -69387.891, 175398.62, -132196.5, -64803.133, -269664.06, 36493.316, 22769.121, 25648.252, 25648.252, 53444.855, 684814.69, 82679.977, 103244.58]
import numpy as np
from lmfit import Model
import matplotlib.pyplot as plt
def sinefunction(x, a, b, c):
    return a + b * np.sin(x*np.pi/180.0 + c)
smodel = Model(sinefunction)
# convert the lists to arrays so that x*np.pi/180.0 works element-wise
result = smodel.fit(np.array(y), x=np.array(xdeg), a=0, b=30000, c=0)
print(result.fit_report())
plt.plot(xdeg, y, 'o', label='data')
plt.plot(xdeg, result.best_fit, '*', label='fit')
plt.legend()
plt.show()
That is assuming your X data is in degrees, and that you really intended to convert that to radians (as numpy's sin() function requires).
But that just addresses the mechanics of how to do the fit (and I'll leave the display of results up to you - it seems like you may need the practice).
The fit result is terrible, because these data are not sinusoidal. They are also not well ordered, which isn't a problem for doing the fit, but does make it harder to see what is going on.
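On the ordering point, a small sketch of how the points from the question could be sorted by x before plotting (the names order, x_sorted and y_sorted are mine):

```python
import numpy as np

xdeg = np.array([70.434654, 37.147266, 8.5787086, 161.40877, -27.31284,
                 80.429482, -81.918106, 52.320129, 64.064552, -156.40771,
                 12.37026, 15.599689, 166.40984, 134.93636, 142.55002,
                 -38.073524, -38.073524, 123.88509, -82.447571, 97.934402,
                 106.28793])
y = np.array([86683.961, -40564.863, 50274.41, 80570.828, 63628.465,
              -87284.016, 30571.402, -79985.648, -69387.891, 175398.62,
              -132196.5, -64803.133, -269664.06, 36493.316, 22769.121,
              25648.252, 25648.252, 53444.855, 684814.69, 82679.977,
              103244.58])

# Indices that put x in ascending order; apply them to both arrays
# so that each (x, y) pair stays together
order = np.argsort(xdeg)
x_sorted = xdeg[order]
y_sorted = y[order]

for xv, yv in zip(x_sorted, y_sorted):
    print(f"{xv:10.3f}  {yv:12.3f}")
```

With the pairs in order, plt.plot(x_sorted, y_sorted, 'o-') draws a line that actually follows x, which makes it easier to judge whether anything sinusoidal is there.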

Gradient in noisy data, python

I have an energy spectrum from a cosmic ray detector. The spectrum follows an exponential curve but it will have broad (and maybe very slight) lumps in it. The data, obviously, contains an element of noise.
I'm trying to smooth out the data and then plot its gradient.
So far I've been using the scipy spline function to smooth it and then np.gradient().
As you can see from the picture, the gradient function's method is to find the differences between each point, and it doesn't show the lumps very clearly.
I basically need a smooth gradient graph. Any help would be amazing!
I've tried 2 spline methods:
def smooth_data(y,x,factor):
    print("smoothing data by interpolation...")
    xnew=np.linspace(min(x),max(x),int(factor*len(x)))
    smoothy=spline(x,y,xnew)
    return smoothy,xnew
def smooth2_data(y,x,factor):
    xnew=np.linspace(min(x),max(x),int(factor*len(x)))
    f=interpolate.UnivariateSpline(x,y)
    g=interpolate.interp1d(x,y)
    return g(xnew),xnew
edit: Tried numerical differentiation:
def smooth_data(y,x,factor):
    print("smoothing data by interpolation...")
    xnew=np.linspace(min(x),max(x),int(factor*len(x)))
    smoothy=spline(x,y,xnew)
    return smoothy,xnew
def minim(u,f,k):
    """functional to be minimised to find optimum u. f is original, u is approx"""
    integral1=abs(np.gradient(u))
    part1=simps(integral1)
    part2=simps(u)
    integral2=abs(part2-f)**2.
    part3=simps(integral2)
    F=k*part1+part3
    return F
def fit(data_x,data_y,denoising,smooth_fac):
    smy,xnew=smooth_data(data_y,data_x,smooth_fac)
    y0,xnnew=smooth_data(smy,xnew,1./smooth_fac)
    y0=list(y0)
    data_y=list(data_y)
    data_fit=fmin(minim, y0, args=(data_y,denoising), maxiter=1000, maxfun=1000)
    return data_fit
However, it just returns the same graph again!
There is an interesting method published on this: Numerical Differentiation of Noisy Data. It should give you a nice solution to your problem. More details are given in another, accompanying paper. The author also gives Matlab code that implements it; an alternative implementation in Python is also available.
If you want to pursue the spline interpolation method, I would suggest adjusting the smoothing factor s of scipy.interpolate.UnivariateSpline().
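A minimal sketch of the smoothing-factor idea, on synthetic exponential data rather than the asker's spectrum. The noise level, the choice s = (number of points) * (noise variance) and the spline degree k=4 are assumptions for illustration; in practice s usually needs some trial and error.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic noisy exponential (assumed values, for illustration only)
rng = np.random.default_rng(1)
x = np.linspace(0, 5, 300)
noise_sigma = 0.02
y = np.exp(-x) + rng.normal(0, noise_sigma, x.size)

# s bounds the sum of squared residuals of the spline fit; a common
# starting point is (number of points) * (noise variance)
s = x.size * noise_sigma**2
spl = UnivariateSpline(x, y, k=4, s=s)

smooth_y = spl(x)              # smoothed curve
dydx = spl.derivative()(x)     # derivative of the smoothing spline

mean_abs_err = np.mean(np.abs(dydx - (-np.exp(-x))))
print("mean abs error of the derivative:", mean_abs_err)
```

A larger s gives a smoother curve and gradient at the cost of washing out narrow features; a smaller s follows the noise more closely.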
Another solution would be to smooth your function through convolution (say with a Gaussian).
The paper I linked to claims to prevent some of the artifacts that come up with the convolution approach (the spline approach might suffer from similar difficulties).
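The convolution approach mentioned above can be sketched as follows. This is only an illustration on synthetic data (a noisy sine; the kernel width of 15 samples is an arbitrary assumption): the data are smoothed with scipy.ndimage.gaussian_filter1d and then differentiated with np.gradient.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

# Synthetic noisy signal (assumed values, for illustration only)
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 500)
y = np.sin(x) + rng.normal(0, 0.05, x.size)

# Convolve with a Gaussian kernel; sigma is given in samples, not in units of x
smooth_y = gaussian_filter1d(y, sigma=15)

# Differentiate the smoothed curve with respect to x
dydx = np.gradient(smooth_y, x)

# Compare with the known derivative away from the edges,
# where the filter's boundary handling distorts the result
interior = slice(60, -60)
mean_abs_err = np.mean(np.abs(dydx[interior] - np.cos(x[interior])))
print("mean abs error of the derivative (interior):", mean_abs_err)
```

The kernel width is the trade-off: wider kernels suppress more noise but also flatten real features, so broad lumps survive better than narrow ones.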
I won't vouch for the mathematical validity of this; it looks like the paper from LANL that EOL cited would be worth looking into. Anyway, I’ve gotten decent results using SciPy’s splines’ built-in differentiation when using splev.
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
from scipy.interpolate import splrep, splev
x = np.arange(0,2,0.008)
data = np.polynomial.polynomial.polyval(x,[0,2,1,-2,-3,2.6,-0.4])
noise = np.random.normal(0,0.1,250)
noisy_data = data + noise
f = splrep(x,noisy_data,k=5,s=3)
#plt.plot(x, data, label="raw data")
#plt.plot(x, noise, label="noise")
plt.plot(x, noisy_data, label="noisy data")
plt.plot(x, splev(x,f), label="fitted")
plt.plot(x, splev(x,f,der=1)/10, label="1st derivative")
#plt.plot(x, splev(x,f,der=2)/100, label="2nd derivative")
plt.hlines(0,0,2)
plt.legend(loc=0)
plt.show()
You can also use scipy.signal.savgol_filter.
Example
import matplotlib.pyplot as plt
import numpy as np
import scipy.signal  # a bare "import scipy" may not load the signal submodule on older SciPy versions
from random import random
# generate data
x = np.array(range(100))/10
y = np.sin(x) + np.array([random()*0.25 for _ in x])
dydx = scipy.signal.savgol_filter(y, window_length=11, polyorder=2, deriv=1)
# Plot result
plt.plot(x, y, label='Original signal')
plt.plot(x, dydx*10, label='1st Derivative')  # x spacing is 0.1, so multiply by 1/0.1 to get dy/dx
plt.plot(x, np.cos(x), label='Expected 1st Derivative')
plt.legend()
plt.show()
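As a variation on the example above, savgol_filter also accepts a delta argument for the sample spacing, so the derivative comes out in units of x without a manual scale factor. The data, window length and noise level below are assumptions for illustration.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic noisy signal on an evenly spaced grid (assumed values)
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 500)
dx = x[1] - x[0]
y = np.sin(x) + rng.normal(0, 0.05, x.size)

# deriv=1 returns the first derivative of the local polynomial fit;
# delta=dx rescales it from "per sample" to "per unit of x"
dydx = savgol_filter(y, window_length=51, polyorder=2, deriv=1, delta=dx)

# Compare with the known derivative, ignoring half a window at each edge
interior = slice(25, -25)
mean_abs_err = np.mean(np.abs(dydx[interior] - np.cos(x[interior])))
print("mean abs error of the derivative (interior):", mean_abs_err)
```

This only works for evenly spaced x; for irregular sampling, a spline or an explicit local polynomial fit is the safer route.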
