I am having trouble fitting a Gaussian to my data. Currently the output of my code looks like this, where orange is the data, blue is my Gaussian fit, and green is the output of a built-in Gaussian fitter. I do not want to use the built-in fitter because it never quite starts at zero and I do not have access to its code. I would like my output to look something like this, where the curve drawn in red is the Gaussian fit.
I have read the curve_fit documentation, but at best I get a fit that looks like this, which fits over all the data. This is undesirable because I am only interested in the central peak. That is my main issue: I do not know how to get curve_fit to fit a Gaussian to the central peak only, as in the second image.
I have considered using a weights function like np.random.choice(), or finding the data file's maximum value and then looking at the second derivative on either side of the central peak to see where the inflection changes, but I am unsure how best to implement this.
How would I best go about this? I have done a lot of googling but can't quite get my head around changing curve_fit to suit my needs.
Cheers for any pointers!
This is a data file.
https://drive.google.com/file/d/1qrAkD74U6L46GoGnvMiUHdPuLEToS6Pv/view?usp=sharing
This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure
plt.close('all')
fpathB4 = r'E:\.1. Work - Current Projects + Old Projects\Current Projects\PF 4MHz Laser System\.8. 1050 SYSTEM\AC traces'
fpath = fpathB4.replace('\\','/') + ('/')
filename = '300'
with open(fpath+filename) as f:
    dataraw = f.readlines()
FWHM = dataraw[8].split(':')[1].split()[0]
FWHM = float(FWHM)
print("##### For AC file -", filename, "#####")
print("Auto-co guess -", FWHM, "ps")
pulseduration = FWHM/np.sqrt(2)
pulseduration = str(pulseduration)
dataraw = dataraw[15:]
print("Pulse duration -", pulseduration, "ps" + "\n")
time = np.array([])
acf1 = np.array([]) #### DATA
fit = np.array([]) #### Gaussian fit
for k in dataraw:
    data = k.split()
    time = np.append(time, float(data[0]))
    acf1 = np.append(acf1, float(data[1]))
    fit = np.append(fit, float(data[2]))
n = len(time)
y = acf1.copy()
x = time.copy()
mean = sum(x*y)/n
sigma = sum(y*(x-mean)**2)/n
def gaus(x,a,x0,sigma):
    return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,gaus(x,*popt)/np.max(gaus(x,*popt)))
figure(num=1, figsize=(8, 3), dpi=96, facecolor='w', edgecolor='k') # figsize = (length, height)
plt.plot(time, acf1/np.max(acf1), label = 'Data - ' + filename, linewidth = 1)
plt.plot(time, fit/np.max(fit), label = '$FWHM_{{\Delta t}}$ (ps) = ' + pulseduration)
plt.autoscale(enable = True, axis = 'x', tight = True)
plt.title("Auto-Correlation Data")
plt.xlabel("Time (ps)")
plt.ylabel("Intensity (a.u.)")
plt.legend()
I think the problem might be that the data are not completely Gaussian-like. It seems you have some kind of Airy/sinc function due to the time resolution of your acquisition instrument. If you are only interested in the center, you can still fit it using a single Gaussian:
import numpy as np
import fitwrap as fw
import pandas as pd
df = pd.read_csv('300', skip_blank_lines=True, skiprows=13, sep=r'\s+')
def gaussian_no_offset(x, x0=2, sigma=1, amp=300):
    return amp*np.exp(-(x-x0)**2/sigma**2)
fw.fit(gaussian_no_offset, df.time, df.acf1)
x0: 2.59158 +/- 0.00828 (0.3%) initial:2
sigma: 0.373 +/- 0.0117 (3.1%) initial:1
amp: 355.02 +/- 9.65 (2.7%) initial:300
If you want to be slightly more precise I can think of a sinc squared function for the peak and a broad gaussian offset. The fit seems nicer, but it really depends on what the data actually represents...
def sinc(x, x0=2.5, amp=300, width=1, amp_g=20, sigma=3):
    return amp*(np.sinc((x-x0)/width))**2 + amp_g*np.exp(-(x-x0)**2/sigma**2)
fw.fit(sinc, df.time, df.acf1)
x0: 2.58884 +/- 0.0021 (0.1%) initial:2.5
amp: 303.84 +/- 3.7 (1.2%) initial:300
width: 0.49211 +/- 0.00565 (1.1%) initial:1
amp_g: 81.32 +/- 2.11 (2.6%) initial:20
sigma: 1.512 +/- 0.0351 (2.3%) initial:3
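If fitwrap is not available, the "fit only the center" idea can also be sketched with scipy.optimize.curve_fit by restricting the fit to a window around the maximum. The data below are synthetic (a narrow peak on a broad pedestal), and the window half-width `half_width` is an assumed tuning value, not something taken from the question's data file.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, x0, sigma):
    """Gaussian without offset; sigma is the standard deviation."""
    return amp * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# Synthetic stand-in for the trace: narrow central peak plus a broad pedestal.
x = np.linspace(-5, 10, 1501)
y = gaussian(x, 300.0, 2.6, 0.25) + gaussian(x, 60.0, 2.6, 2.0)

# Keep only the samples within an assumed half-width of the maximum.
half_width = 0.3  # hypothetical value; tune it for real data
peak_pos = x[np.argmax(y)]
mask = np.abs(x - peak_pos) <= half_width

# Initial guesses taken directly from the windowed data.
p0 = [y.max(), peak_pos, half_width]
popt, pcov = curve_fit(gaussian, x[mask], y[mask], p0=p0)
amp_fit, x0_fit, sigma_fit = popt

# Full width at half maximum of the fitted Gaussian.
fwhm_fit = 2 * np.sqrt(2 * np.log(2)) * abs(sigma_fit)
print(amp_fit, x0_fit, fwhm_fit)
```

The fitted curve can then be evaluated over the full x range for plotting, even though only the masked samples took part in the fit.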
I'd add a constant offset to the Gaussian equation and limit its range with the bounds parameter of curve_fit, so that the fitted curve isn't lifted too far off zero.
So your equation would be (note that curve_fit expects the independent variable x as the first argument):
def gaus(x, y0, a, x0, sigma):
    return y0 + a*np.exp(-(x-x0)**2/(2*sigma**2))
and the curve_fit bounds would be something like this:
curve_fit(..... ,bounds = [[0,a_min, x0_min, sigma_min],[0.1, a_max, x0_max, sigma_max]])
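The offset-plus-bounds idea can be sketched end to end as follows. The data are synthetic and the bound values are assumptions chosen for this illustration; for real data they would have to be picked from the trace itself.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaus_offset(x, y0, a, x0, sigma):
    """Gaussian with a constant offset y0; x must be the first argument."""
    return y0 + a * np.exp(-(x - x0) ** 2 / (2 * sigma ** 2))

# Synthetic noisy data with a small baseline offset.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 301)
y = gaus_offset(x, 0.05, 1.0, 0.2, 0.5) + 0.01 * rng.standard_normal(x.size)

# Bounds are (lower, upper), each ordered like the parameters y0, a, x0, sigma.
lower = [0.0, 0.0, -1.0, 0.05]   # assumed limits for this example
upper = [0.1, 2.0, 1.0, 2.0]
p0 = [0.01, 0.8, 0.0, 1.0]       # the initial guess must lie inside the bounds

popt, pcov = curve_fit(gaus_offset, x, y, p0=p0, bounds=(lower, upper))
y0_fit, a_fit, x0_fit, sigma_fit = popt
print(popt)
```

Keeping the upper bound on y0 small is what stops the optimizer from absorbing side structure into a raised baseline.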
I have a problem with my code.
I am trying to plot the sampled values of the function 'sin(t^3)/2^tan(t)' for
t between 0 and 1.5 at a sampling frequency fs=50Hz.
I have created a function 'sampleFunction' which takes as parameters the string that represents the trigonometric function, the beginning of the interval, the end of the interval and the frequency.
I create tVector(0,0.02,0.04,..,1.48)
Then I take the elements of tVector, use them to evaluate the string, and put the results in another vector y
I return both y and tVector
But when I run it I get an error saying 'y' is not defined
This is the code:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    for i in range(0,len(tVector)):
        t=tVector[i]
        y[i]=eval(functionString)
    return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
functionString='math.sin(t**3)/2**math.tan(t)'
y,t=sampleFunction(functionString,t0,t1,fs)
plt.plot(t,y)
plt.xlabel('time')
plt.ylabel('Amplitude')
You can change your code in the following way:
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    y = np.zeros( tVector.shape )
    for i in range(0,len(tVector)):
        t=tVector[i]
        y[i]=eval(functionString)
    return y,tVector
However, this is not good Python. There are a couple of issues:
You should use vectorized operations.
You should avoid eval like the plague. This has security implications.
For vectorized operations, simply do:
def sampleFunction(functionString,t0,t1,fs):
    t = np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    y = eval(functionString)
    return y, t
and call it as:
sampleFunction('np.sin(t**3)/2**np.tan(t)', 0, 10, 100)
This is much faster (especially for large arrays)
Finally, the vectorized form is only a single line long. You probably don't need the extra function.
You have a problem with the allocation of the 'y' variable as Harold is saying.
However, there are multiple ways of achieving what you are doing and the eval function is, unless you have a very good reason, the absolute worst. Maybe consider one of the possible examples below:
import numpy as np
import matplotlib.pyplot as plt
import math
def sampleFunction(functionString,t0,t1,fs):
    tVector=np.arange(start=t0, stop=t1, step=1/fs, dtype='float')
    t=t0
    y = [float]*len(tVector) # <------------------- Allocate 'y' variable
    for i in range(0,len(tVector)):
        t = tVector[i]
        y[i]=eval(functionString)
    return y,tVector
t0=0
t1 =1.5
fs=50
thold=.1
# Your code
functionString = 'math.sin(t**3)/2**math.tan(t)'
y, t = sampleFunction(functionString,t0,t1,fs)
plt.plot(t, y, color='cyan')
# Using the 'map' built-in function
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = list(map(lambda ti: 0.9*math.sin(ti**3)/2**math.tan(ti), t))  # list() is needed in Python 3, where map returns an iterator
plt.plot(t, y, color='magenta')
# Using Numpy's 'sin' and 'tan'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = 0.8*np.sin(t**3)/2**np.tan(t)
plt.plot(t, y, color='darkorange')
# Using 'list comprehensions'
t = np.arange(start=t0, stop=t1, step=1./fs, dtype='float')
y = [ 0.7*math.sin(ti**3)/2**math.tan(ti) for ti in t]
plt.plot(t, y, color='darkgreen')
plt.xlabel('time')
plt.ylabel('Amplitude')
plt.show()
The result is:
When running the above code, you should have gotten an error message ending with "name 'y' is not defined". If you look at your function definition, you will see that it really isn't. You cannot assign a value to y[i] without defining y first! The following line before the "for" loop fixes that particular problem:
y = [None] * len(tVector)
The code will run fine after that correction.
But: why do you want to pass a function string when you can pass a function? Functions, in Python, are first-class-objects!
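Passing a function instead of a string could look like the sketch below. The names `sample_function` and `signal` are illustrative and do not come from the original code; the expression is evaluated on the whole NumPy array at once, so no loop and no eval are needed.

```python
import numpy as np

def sample_function(func, t0, t1, fs):
    """Sample func on [t0, t1) at sampling frequency fs (in Hz)."""
    t = np.arange(t0, t1, 1.0 / fs)
    return func(t), t

def signal(t):
    # Same expression as in the question, written with NumPy ufuncs.
    return np.sin(t ** 3) / 2 ** np.tan(t)

y, t = sample_function(signal, 0.0, 1.5, 50)
print(len(t), y[0])
```

A lambda works just as well for one-off expressions, for example `sample_function(lambda t: np.cos(t), 0.0, 1.5, 50)`.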
I'm trying to understand scipy.signal.deconvolve.
From the mathematical point of view a convolution is just a multiplication in Fourier space, so I would expect
that for two functions f and g:
Deconvolve(Convolve(f,g) , g) == f
In numpy/scipy this is either not the case or I'm missing an important point.
Although there are some questions related to deconvolve on SO already (like here and here) they do not address this point, others remain unclear (this) or unanswered (here). There are also two questions on SignalProcessing SE (this and this) the answers to which are not helpful in understanding how scipy's deconvolve function works.
The question would be:
How do you reconstruct the original signal f from a convoluted signal,
assuming you know the convolving function g.?
Or in other words: How does this pseudocode Deconvolve(Convolve(f,g) , g) == f translate into numpy / scipy?
Edit: Note that this question is not targeted at preventing numerical inaccuracies (although this is also an open question) but at understanding how convolve/deconvolve work together in scipy.
The following code tries to do that with a Heaviside function and a gaussian filter.
As can be seen in the image, the result of the deconvolution of the convolution is not at
all the original Heaviside function. I would be glad if someone could shed some light into this issue.
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
# Define heaviside function
H = lambda x: 0.5 * (np.sign(x) + 1.)
#define gaussian
gauss = lambda x, sig: np.exp(-( x/float(sig))**2 )
X = np.linspace(-5, 30, num=3501)
X2 = np.linspace(-5,5, num=1001)
# convolute a heaviside with a gaussian
H_c = np.convolve( H(X), gauss(X2, 1), mode="same" )
# deconvolve the result
H_dc, er = scipy.signal.deconvolve(H_c, gauss(X2, 1) )
#### Plot ####
fig , ax = plt.subplots(nrows=4, figsize=(6,7))
ax[0].plot( H(X), color="#907700", label="Heaviside", lw=3 )
ax[1].plot( gauss(X2, 1), color="#907700", label="Gauss filter", lw=3 )
ax[2].plot( H_c/H_c.max(), color="#325cab", label="convoluted" , lw=3 )
ax[3].plot( H_dc, color="#ab4232", label="deconvoluted", lw=3 )
for i in range(len(ax)):
    ax[i].set_xlim([0, len(X)])
    ax[i].set_ylim([-0.07, 1.2])
    ax[i].legend(loc=4)
plt.show()
Edit: Note that there is a matlab example, showing how to convolve/deconvolve a rectangular signal using
yc=conv(y,c,'full')./sum(c);
ydc=deconv(yc,c).*sum(c);
In the spirit of this question it would also help if someone was able to translate this example into python.
After some trial and error I found out how to interpret the results of scipy.signal.deconvolve(), and I am posting my findings as an answer.
Let's start with a working example code
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
# let the signal be box-like
signal = np.repeat([0., 1., 0.], 100)
# and use a gaussian filter
# the filter should be shorter than the signal
# the filter should be such that it's much bigger than zero everywhere
gauss = np.exp(-( (np.linspace(0,50)-25.)/float(12))**2 )
print(gauss.min())  # = 0.013 >> 0
# calculate the convolution (np.convolve and scipy.signal.convolve identical)
# the keywordargument mode="same" ensures that the convolution spans the same
# shape as the input array.
#filtered = scipy.signal.convolve(signal, gauss, mode='same')
filtered = np.convolve(signal, gauss, mode='same')
deconv, _ = scipy.signal.deconvolve( filtered, gauss )
#the deconvolution has n = len(signal) - len(gauss) + 1 points
n = len(signal)-len(gauss)+1
# so we need to expand it by
s = (len(signal)-n)//2  # integer division, so s can be used as a slice index
#on both sides.
deconv_res = np.zeros(len(signal))
deconv_res[s:len(signal)-s-1] = deconv
deconv = deconv_res
# now deconv contains the deconvolution
# expanded to the original shape (filled with zeros)
#### Plot ####
fig , ax = plt.subplots(nrows=4, figsize=(6,7))
ax[0].plot(signal, color="#907700", label="original", lw=3 )
ax[1].plot(gauss, color="#68934e", label="gauss filter", lw=3 )
# we need to divide by the sum of the filter window to get the convolution normalized to 1
ax[2].plot(filtered/np.sum(gauss), color="#325cab", label="convoluted" , lw=3 )
ax[3].plot(deconv, color="#ab4232", label="deconvoluted", lw=3 )
for i in range(len(ax)):
    ax[i].set_xlim([0, len(signal)])
    ax[i].set_ylim([-0.07, 1.2])
    ax[i].legend(loc=1, fontsize=11)
    if i != len(ax)-1 :
        ax[i].set_xticklabels([])
plt.savefig(__file__ + ".png")
plt.show()
This code produces the following image, showing exactly what we want (Deconvolve(Convolve(signal,gauss) , gauss) == signal)
Some important findings are:
The filter should be shorter than the signal
The filter should be much bigger than zero everywhere (here > 0.013 is good enough)
Using the keyword argument mode = 'same' to the convolution ensures that it lives on the same array shape as the signal.
The deconvolution has n = len(signal) - len(gauss) + 1 points.
So in order to let it also reside on the same original array shape we need to pad it with s = (len(signal)-n)//2 zeros on both sides.
Of course, further findings, comments and suggestion to this question are still welcome.
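As a minimal check of the pseudocode Deconvolve(Convolve(f,g), g) == f, the sketch below uses mode="full" instead of mode="same", so no samples of the convolution are cut off and no padding is needed afterwards. The short filter `g` is an assumption picked for this illustration (first tap nonzero, well conditioned); it is not the Gaussian from the code above.

```python
import numpy as np
from scipy.signal import deconvolve

f = np.repeat([0.0, 1.0, 0.0], 20)   # box-like signal, 60 samples
g = np.array([1.0, 0.5, 0.25])       # short filter with a nonzero first tap

# Full convolution has len(f) + len(g) - 1 samples.
fg = np.convolve(f, g, mode="full")

# deconvolve performs polynomial division: fg = convolve(g, quotient) + remainder.
recovered, remainder = deconvolve(fg, g)

print(recovered.shape, np.allclose(recovered, f), np.allclose(remainder, 0.0))
```

With mode="same" the convolution is truncated to len(f) samples, which is why the quotient in the answer above has only len(signal) - len(gauss) + 1 points and has to be padded.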
As written in the comments, I cannot help with the example you posted originally. As @Stelios has pointed out, the deconvolution might not work out due to numerical issues.
I can, however, reproduce the example you posted in your Edit:
The following code is a direct translation of the MATLAB source code:
import numpy as np
import scipy.signal
import matplotlib.pyplot as plt
x = np.arange(0., 20.01, 0.01)
y = np.zeros(len(x))
y[900:1100] = 1.
y += 0.01 * np.random.randn(len(y))
c = np.exp(-(np.arange(len(y))) / 30.)
yc = scipy.signal.convolve(y, c, mode='full') / c.sum()
ydc, remainder = scipy.signal.deconvolve(yc, c)
ydc *= c.sum()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(4, 4))
ax[0][0].plot(x, y, label="original y", lw=3)
ax[0][1].plot(x, c, label="c", lw=3)
ax[1][0].plot(x[0:2000], yc[0:2000], label="yc", lw=3)
ax[1][1].plot(x, ydc, label="recovered y", lw=3)
plt.show()
I am trying to fit a Morse potential using Python and SciPy.
The morse potential is defined as:
V = D*(exp(-2*m*(x-u)) - 2*exp(-m*(x-u)))
where D, m and u are the parameters I need to extract.
Unfortunately the fit is not satisfactory as you can see below (sorry I do not have 10 reputation so the image has to be clicked). Could anyone help me please? I must say I am not the best programmer with python.
Here is my code:
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
xdata2=np.array([1.0 ,1.1 ,1.2 ,1.3 ,1.4 ,1.5 ,1.6 ,1.7 ,1.8 ,1.9 ,2.0 ,2.1 ,2.2 ,2.3 ,2.4 ,2.5 ,2.6 ,2.7 ,2.8 ,2.9 ,3.0 ,3.1 ,3.2 ,3.3 ,3.4 ,3.5 ,3.6 ,3.7 ,3.8 ,3.9 ,4.0 ,4.1 ,4.2 ,4.3 ,4.4 ,4.5 ,4.6 ,4.7 ,4.8 ,4.9 ,5.0 ,5.1 ,5.2 ,5.3 ,5.4 ,5.5 ,5.6 ,5.7 ,5.8 ,5.9])
ydata2=[-1360.121815,-1368.532641,-1374.215047,-1378.090480,-1380.648178,-1382.223113,-1383.091562,-1383.479384,-1383.558087,-1383.445803,-1383.220380,-1382.931531,-1382.609269,-1382.273574,-1381.940879,-1381.621299,-1381.319042,-1381.036231,-1380.772039,-1380.527051,-1380.301961,-1380.096257,-1379.907700,-1379.734621,-1379.575837,-1379.430693,-1379.299282,-1379.181303,-1379.077272,-1378.985220,-1378.903626,-1378.831588,-1378.768880,-1378.715015,-1378.668910,-1378.629996,-1378.597943,-1378.572742,-1378.554547,-1378.543296,-1378.539843,-1378.543593,-1378.554519,-1378.572747,-1378.597945,-1378.630024,-1378.668911,-1378.715015,-1378.768915,-1378.831593]
t=np.linspace(0.1,7)
def morse(q, m, u, x ):
    return (q * (np.exp(-2*m*(x-u))-2*np.exp(-m*(x-u))))
popt, pcov = curve_fit(morse, xdata2, ydata2, maxfev=40000000)
yfit = morse(t,popt[0], popt[1], popt[2])
print(popt)
plt.plot(xdata2, ydata2,"ro")
plt.plot(t, yfit)
plt.show()
Old fit before gboffi's comment
I am guessing the exact depth of the Morse potential does not interest you overly much, so I added an additional parameter (v) that shifts the Morse potential up and down; this includes @gboffi's comment. Furthermore, the first argument of your function must be the independent variable (x), not one of the parameters you want to fit (see http://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.optimize.curve_fit.html)
In addition, such fits are dependent on your starting position. The following should give you what you want.
from scipy.optimize import curve_fit
import numpy as np
import matplotlib.pyplot as plt
xdata2=np.array([1.0 ,1.1 ,1.2 ,1.3 ,1.4 ,1.5 ,1.6 ,1.7 ,1.8 ,1.9 ,2.0 ,2.1 ,2.2 ,2.3 ,2.4 ,2.5 ,2.6 ,2.7 ,2.8 ,2.9 ,3.0 ,3.1 ,3.2 ,3.3 ,3.4 ,3.5 ,3.6 ,3.7 ,3.8 ,3.9 ,4.0 ,4.1 ,4.2 ,4.3 ,4.4 ,4.5 ,4.6 ,4.7 ,4.8 ,4.9 ,5.0 ,5.1 ,5.2 ,5.3 ,5.4 ,5.5 ,5.6 ,5.7 ,5.8 ,5.9])
ydata2=[-1360.121815,-1368.532641,-1374.215047,-1378.090480,-1380.648178,-1382.223113,-1383.091562,-1383.479384,-1383.558087,-1383.445803,-1383.220380,-1382.931531,-1382.609269,-1382.273574,-1381.940879,-1381.621299,-1381.319042,-1381.036231,-1380.772039,-1380.527051,-1380.301961,-1380.096257,-1379.907700,-1379.734621,-1379.575837,-1379.430693,-1379.299282,-1379.181303,-1379.077272,-1378.985220,-1378.903626,-1378.831588,-1378.768880,-1378.715015,-1378.668910,-1378.629996,-1378.597943,-1378.572742,-1378.554547,-1378.543296,-1378.539843,-1378.543593,-1378.554519,-1378.572747,-1378.597945,-1378.630024,-1378.668911,-1378.715015,-1378.768915,-1378.831593]
t=np.linspace(0.1,7)
tstart = [1.e+3, 1, 3, 0]
def morse(x, q, m, u , v):
    return (q * (np.exp(-2*m*(x-u))-2*np.exp(-m*(x-u))) + v)
popt, pcov = curve_fit(morse, xdata2, ydata2, p0 = tstart, maxfev=40000000)
print(popt) # [ 5.10155662 1.43329962 1.7991549 -1378.53461345]
yfit = morse(t,popt[0], popt[1], popt[2], popt[3])
plt.plot(xdata2, ydata2,"ro")
plt.plot(t, yfit)
plt.show()
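Since the fit depends on the starting position, one option is to derive the initial guesses from the data instead of hard-coding them. The sketch below does this on synthetic, noise-free data; the "true" parameter values and the guess heuristics (plateau from the last point, well position from the minimum) are assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def morse(x, q, m, u, v):
    """Morse potential of depth q, width parameter m, minimum at u, shifted by v."""
    return q * (np.exp(-2 * m * (x - u)) - 2 * np.exp(-m * (x - u))) + v

# Synthetic data generated from assumed parameters.
true_params = (5.1, 1.43, 1.8, -1378.5)
x = np.arange(1.0, 6.0, 0.1)
y = morse(x, *true_params)

# Heuristic initial guesses read off the data.
v0 = y[-1]               # the curve flattens towards v at large x
u0 = x[np.argmin(y)]     # position of the minimum
q0 = v0 - y.min()        # well depth measured from the plateau
p0 = [q0, 1.0, u0, v0]   # m0 = 1.0 is an arbitrary starting width

popt, pcov = curve_fit(morse, x, y, p0=p0)
print(popt)
```

With guesses this close to the solution, the very large maxfev used above should not be necessary.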
When I try to fit my data, the results are a bit strange and I don't understand why. The fitted curve is flat, and the first input e=0. seems to raise a division error somewhere.
The only working case is when I modify e[0]=1.0e-9
The result is the following:
Judging from the example here, my code is not far from what I have read, but I am still stuck. Could you please help me find what is going wrong in my case?
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
src_s = np.array((45.59,50.66664,59.74871,65.71018,72.76012,79.06256,84.13755,90.39944,
96.33653,101.65667,106.27968,110.76301,114.41808,117.21922,120.51836))
src_e = np.array((0.0,0.00126,0.00503,0.00804,0.01228,0.01685,0.02127,0.02846,0.03666,
0.04581,0.05620,0.06882,0.08005,0.09031,0.10327))
# plot source data
plt.plot(src_e, src_s, 'o')
# fitting function
def sigma(e, k ,n): return k*(e**n)
# find parameters curve fitting
param, var = curve_fit(sigma, src_e, src_s)
new_e = np.linspace(src_e.min(), src_e.max(), 50)
plt.plot(new_e, sigma(new_e, *param))
# modify first input
src_e[0]=1.0e-9
# relaunch parameters curve fitting
param, var = curve_fit(sigma, src_e, src_s)
new_e = np.linspace(src_e.min(), src_e.max(), 50)
plt.plot(new_e, sigma(new_e, *param))
plt.show()
Thanks in advance for your help.
The root of the problem is a bad initial guess for the parameters (actually, no starting parameters are provided to curve_fit at all).
The target function can easily be linearized: taking logs gives log(s) = log(k) + n*log(e). Let's do that, then do a linear regression to get a good set of initial guesses for curve_fit (passed via p0=). The resulting fit is better (it has a smaller residual) and does not require replacing the first value of src_e with 1e-9:
In [38]:
src_e[0]=1.0e-9
# relaunch parameters curve fitting
param, var = curve_fit(sigma, src_e, src_s)
new_e = np.linspace(src_e.min(), src_e.max(), 50)
src_e[0]=0
plt.plot(new_e, sigma(new_e, *param))
plt.plot(src_e, src_s, 'ro')
plt.savefig('1.png')
print('Residue is:', ((sigma(src_e, *param)-src_s)**2).sum())
Residue is: 2168.65307587
In [39]:
import scipy.stats as ss
src_e[0]=0
V=ss.linregress(np.log(src_e[1:]), np.log(src_s[1:])) # skip the first point to avoid log(0)
param, var = curve_fit(sigma, src_e, src_s, p0=(np.exp(V[1]), V[0]))
new_e = np.linspace(src_e.min(), src_e.max(), 50)
plt.plot(new_e, sigma(new_e, *param))
plt.plot(src_e, src_s, 'ro')
plt.savefig('1.png')
print('Residue is:', ((sigma(src_e, *param)-src_s)**2).sum())
Residue is: 2128.85364181
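The linearize-then-refine approach can be sketched in isolation on synthetic data. The values k=200 and n=0.3 are assumptions chosen for this example, and all strains are strictly positive so that the logarithm is defined everywhere.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import linregress

def sigma(e, k, n):
    """Power law s = k * e**n."""
    return k * e ** n

# Synthetic power-law data with assumed parameters.
e = np.linspace(0.001, 0.1, 15)
s = sigma(e, 200.0, 0.3)

# log(s) = log(k) + n*log(e): the slope gives n, the intercept gives log(k).
reg = linregress(np.log(e), np.log(s))
p0 = (np.exp(reg.intercept), reg.slope)

# Refine the linearized estimate with a nonlinear least-squares fit.
param, cov = curve_fit(sigma, e, s, p0=p0)
print(p0, param)
```

On noisy data the two estimates differ slightly, because the log transform reweights the residuals; the curve_fit step then minimizes the residual in the original units.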
The first point can't be on the curve, because k*(e**n) is 0 at e=0 (for n > 0) while the first measured value is 45.59, so you need to change the curve formula:
import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
src_s = np.array((45.59,50.66664,59.74871,65.71018,72.76012,79.06256,84.13755,90.39944,
96.33653,101.65667,106.27968,110.76301,114.41808,117.21922,120.51836))
src_e = np.array((0.0,0.00126,0.00503,0.00804,0.01228,0.01685,0.02127,0.02846,0.03666,
0.04581,0.05620,0.06882,0.08005,0.09031,0.10327))
# plot source data
plt.plot(src_e, src_s, 'o')
def sigma(e, k ,n, offset): return k*((e+offset)**n)
# find parameters curve fitting
param, var = curve_fit(sigma, src_e, src_s)
new_e = np.linspace(src_e.min(), src_e.max(), 50)
plt.plot(new_e, sigma(new_e, *param))
here is the output: