Related
I am having issues fitting a Gaussian to my data. Currently the output for my code looks like
this. Where orange is the data, blue is the gaussian fit and green is an in-built gaussian fitter however I do not wish to use it as it never quite begins at zero and I do not have access to the code. I would like my output to look something like this where the drawn in red is the gaussian fit.
I have tried reading about the curve_fit documentation however at best I get a fit that looks like this which fits over all the data, however, this is undesirable as I am only interested in the central peak which is my main issue - I do not know how to get curve_fit to fit a gaussian on the central peak like in the second image.
I have considered using a weights function like np.random.choice() or looking at the data file's maximum value and then looking at the second derivative at either side of the central peak to see where there are changes in inflection but am unsure how best to implement this.
How would I best go about this? I have done a lot of googling but cant quite get my head around changing curve_fit to suit my needs.
Cheers for any pointers!
This is a data file.
https://drive.google.com/file/d/1qrAkD74U6L46GoGnvMiUHdPuLEToS6Pv/view?usp=sharing
This is my code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from matplotlib.pyplot import figure
plt.close('all')
fpathB4 = 'E:\.1. Work - Current Projects + Old Projects\Current Projects\PF 4MHz Laser System\.8. 1050 SYSTEM\AC traces'
fpath = fpathB4.replace('\\','/') + ('/')
filename = '300'
with open(fpath+filename) as f:
dataraw = f.readlines()
FWHM = dataraw[8].split(':')[1].split()[0]
FWHM = np.float(FWHM)
print("##### For AC file -", filename, "#####")
print("Auto-co guess -", FWHM, "ps")
pulseduration = FWHM/np.sqrt(2)
pulseduration = str(pulseduration)
dataraw = dataraw[15:]
print("Pulse duration -", pulseduration, "ps" + "\n")
time = np.array([])
acf1 = np.array([]) #### DATA
fit = np.array([]) #### Gaussian fit
for k in dataraw:
data = k.split()
time = np.append(time, np.float(data[0]))
acf1= np.append(acf1, np.float(data[1]))
fit = np.append(fit, np.float(data[2]))
n = len(time)
y = acf1.copy()
x = time.copy()
mean = sum(x*y)/n
sigma = sum(y*(x-mean)**2)/n
def gaus(x,a,x0,sigma):
return a*np.exp(-(x-x0)**2/(2*sigma**2))
popt,pcov = curve_fit(gaus,x,y,p0=[1,mean,sigma])
plt.plot(x,gaus(x,*popt)/np.max(gaus(x,*popt)))
figure(num=1, figsize=(8, 3), dpi=96, facecolor='w', edgecolor='k') # figsize = (length, height)
plt.plot(time, acf1/np.max(acf1), label = 'Data - ' + filename, linewidth = 1)
plt.plot(time, fit/np.max(fit), label = '$FWHM_{{\Delta t}}$ (ps) = ' + pulseduration)
plt.autoscale(enable = True, axis = 'x', tight = True)
plt.title("Auto-Correlation Data")
plt.xlabel("Time (ps)")
plt.ylabel("Intensity (a.u.)")
plt.legend()
I think the problem might be that the data are not completely Gaussian-like. It seems you have some kind of Airy/sinc function due to the time resolution of your acquisition instrument. Still, if you are only interested in the center you can still fit it using a single gaussian:
import fitwrap as fw
import pandas as pd
df = pd.read_csv('300', skip_blank_lines=True, skiprows=13, sep='\s+')
def gaussian_no_offset(x, x0=2, sigma=1, amp=300):
return amp*np.exp(-(x-x0)**2/sigma**2)
fw.fit(gaussian_no_offset, df.time, df.acf1)
x0: 2.59158 +/- 0.00828 (0.3%) initial:2
sigma: 0.373 +/- 0.0117 (3.1%) initial:1
amp: 355.02 +/- 9.65 (2.7%) initial:300
If you want to be slightly more precise I can think of a sinc squared function for the peak and a broad gaussian offset. The fit seems nicer, but it really depends on what the data actually represents...
def sinc(x, x0=2.5, amp=300, width=1, amp_g=20, sigma=3):
return amp*(np.sinc((x-x0)/width))**2 + amp_g*np.exp(-(x-x0)**2/sigma**2)
fw.fit(sinc, df.time, df.acf1)
x0: 2.58884 +/- 0.0021 (0.1%) initial:2.5
amp: 303.84 +/- 3.7 (1.2%) initial:300
width: 0.49211 +/- 0.00565 (1.1%) initial:1
amp_g: 81.32 +/- 2.11 (2.6%) initial:20
sigma: 1.512 +/- 0.0351 (2.3%) initial:3
I'd add a constant to the Gaussian equation, and limit the range of that in the bounds parameter of curve fit, so that the graph isn't raised higher.
So your equation would be:
def gaus(y0,x,a,x0,sigma):
return y0 + a*np.exp(-(x-x0)**2/(2*sigma**2))
and the curve_fit bounds would be something like this:
curve_fit(..... ,bounds = [[0,a_min, x0_min, sigma_min],[0.1, a_max, x0_max, sigma_max]])
in the framework of my bachelor's thesis, I need to evaluate my data with python. Unfortunately there's no suiting script of my fellow students yet and I'm quite new to programming.
I have this data set and I'm trying to fit it with a gaussian by using scipy.optimize.curve_fit. Since there are a lot of unusable counts especially at the end of the axis, I'd like to confine the part that is to be fitted.
Picture raw data
This is what I have so far:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
x=np.arange(5120)
y=array([ 0.81434599, 1.17054264, 0.85279188, ..., 1. ,
1. , 13.56291391]) #most of the data isn't interesting
#to me, part of interest see below
def Gauss(x, a, x0, sigma):
return a * np.exp(-(x - x0)**2 / (2 * sigma**2))
mean = sum(x * y) / sum(y)
sigma = np.sqrt(sum(y * (x - mean)**2) / sum(y))
popt,pcov = curve_fit(Gauss, x, y, p0=[max(y), mean, sigma],
maxfev=360000)
plt.plot(x,y,label='data')
plt.plot(x,Gauss(x, *popt), 'r-',label='fit')
On docs.scipy.org I've found a general description about curve_fit
If I try using
bounds=([2400,-np.inf, -np.inf],[2600, np.inf, np.inf]),
I'm getting the ValueError: x0 is infeasible. What is the problem here?
I also tried to confine it with
popt,pcov = curve_fit(Gauss, x[2400:2600], y[2400:2600], p0=[max(y), mean, sigma], maxfev=360000)
as suggested in a comment on this question: "Error when obtaining gaussian fit for graph" at stackoverflow
In this case I only get a straight line though.
Picture: Confinement with x[2400:2600],y[2400:2600] as arguments of curve_fit
I really hope you can help me out here. I only need a way to fit a small part of my data. Thanks in advance!
interesting data:
y=array([ 0.93396226, 1.00884956, 1.15457413, 1.07590759,
0.88915094, 1.07142857, 1.10714286, 1.14171123, 1.06666667,
0.84975369, 0.95480226, 0.99388379, 1.01675978, 0.83967391,
0.9771987 , 1.02402402, 1.04531722, 1.07492795, 0.97135417,
0.99714286, 1.0248139 , 1.26223776, 1.1533101 , 0.99099099,
1.18867925, 1.15772871, 0.95076923, 1.03313253, 1.02278481,
0.93265993, 1.06705539, 1.00265252, 1.02023121, 0.92076503,
0.99728997, 1.03353659, 1.15116279, 1.04336043, 0.95076923,
1.05515588, 0.92571429, 0.93448276, 1.02702703, 0.90056818,
0.96068796, 1.08493151, 1.13584906, 1.1212938 , 1.0739645 ,
0.98972603, 0.94594595, 1.07913669, 0.98425197, 0.87762238,
0.96811594, 1.02710843, 0.99392097, 0.91384615, 1.09809264,
1.00630915, 0.93175074, 0.87572254, 1.00651466, 0.78772379,
1.12244898, 1.2248062 , 0.97109827, 0.94607843, 0.97900262,
0.97527473, 1.01212121, 1.16422287, 1.20634921, 0.97275204,
1.01090909, 0.99404762, 1.00561798, 1.01146132, 1.08695652,
0.97214485, 1.03525641, 0.99096386, 1.05135952, 1.16451613,
0.90462428, 0.76876877, 0.47701149, 0.27607362, 0.21580547,
0.20598007, 0.16766467, 0.15533981, 0.19745223, 0.15407855,
0.18925831, 0.26997245, 0.47603834, 0.596875 , 0.85126582, 0.96
, 1.06578947, 1.08761329, 0.89548023, 0.99705882, 1.07142857,
0.95677233, 0.86119874, 1.02857143, 0.98250729, 0.94214876,
1.04166667, 0.96024465, 1.07022472, 1.10344828, 1.04859335,
0.96655518, 1.06424581, 1.01754386, 1.03492063, 1.18627451,
0.91036415, 1.03355705, 1.09116809, 0.96083551, 1.01298701,
1.03691275, 1.02923977, 1.11612903, 1.01457726, 1.06285714,
0.98186528, 1.16470588, 0.86645963, 1.07317073, 1.09615385,
1.21192053, 0.94385027, 0.94244604, 0.88390501, 0.95718654,
0.9691358 , 1.01729107, 1.01119403, 1.20350877, 1.12890625,
1.06940063, 0.90410959, 1.14662757, 0.97093023, 1.03021148,
1.10629921, 0.97118156, 1.10693642, 1.07917889, 0.9484127 ,
1.07581227, 0.98006645, 0.98986486, 0.90066225, 0.90066225,
0.86779661, 0.86779661, 0.96996997, 1.01438849, 0.91186441,
0.91290323, 1.03745318, 1.0615942 , 0.97202797, 1.16608997,
0.94182825, 1.08333333, 0.9076087 , 1.18181818, 1.20618557,
1.01273885, 0.93606138, 0.87457627, 0.90575916, 1.09756098,
0.99115044, 1.13380282, 1.04333333, 1.04026846, 1.0297619 ,
1.04334365, 1.03395062, 0.92553191, 0.98198198, 1. ,
0.9439528 , 1.02684564, 1.1372549 , 0.96676737, 0.99649123,
1.07051282, 1.10367893, 1.0866426 , 1.15384615, 0.99667774])
You might find the lmfit module (https://lmfit.github.io/lmfit-py/) useful for this. It is designed to make curve fitting very easy, has built-in models for common peaks like Gaussian, and has many useful features such as allowing you to set bounds on parameters. A fit to your data with lmfit might look like this:
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import GaussianModel, ConstantModel
y = np.array([.....]) # uses your shorter data range
x = np.arange(len(y))
# make a model that is a Gaussian + a constant:
model = GaussianModel(prefix='peak_') + ConstantModel()
# make parameters with starting values:
params = model.make_params(c=1.0, peak_center=90,
peak_sigma=5, peak_amplitude=-5)
# it's not really needed for this data, but you can put bounds on
# parameters like this (or set .vary=False to fix a parameter)
params['peak_sigma'].min = 0 # sigma > 0
params['peak_amplitude'].max = 0 # amplitude < 0
params['peak_center'].min = 80
params['peak_center'].max = 100
# run fit
result = model.fit(y, params, x=x)
# print, plot results
print(result.fit_report())
plt.plot(x, y)
plt.plot(x, result.best_fit)
plt.show()
This will print out
[[Model]]
(Model(gaussian, prefix='peak_') + Model(constant))
[[Fit Statistics]]
# function evals = 54
# data points = 200
# variables = 4
chi-square = 1.616
reduced chi-square = 0.008
Akaike info crit = -955.625
Bayesian info crit = -942.432
[[Variables]]
peak_sigma: 4.03660814 +/- 0.204240 (5.06%) (init= 5)
peak_center: 91.2246614 +/- 0.200267 (0.22%) (init= 90)
peak_amplitude: -9.79111362 +/- 0.445273 (4.55%) (init=-5)
c: 1.02138228 +/- 0.006796 (0.67%) (init= 1)
peak_fwhm: 9.50548558 +/- 0.480950 (5.06%) == '2.3548200*peak_sigma'
peak_height: -0.96766623 +/- 0.041854 (4.33%) == '0.3989423*peak_amplitude/max(1.e-15, peak_sigma)'
[[Correlations]] (unreported correlations are < 0.100)
C(peak_sigma, peak_amplitude) = -0.599
C(peak_amplitude, c) = -0.328
C(peak_sigma, c) = 0.196
and make a plot like this:
I am trying to compute S3x3 moving averages, using asymmetric weights, as described in this MatLab example and I am unsure if my interpretation of the following is correct when translating from MatLab:
Have I set up my matrices in the same way?
Does scipy.signal.convolve2d do the same as MatLab's conv2d in this instance?
Why is my fit so bad?!
In MatLab, the filter is given and applied as:
% S3x3 seasonal filter
% Symmetric weights
sW3 = [1/9;2/9;1/3;2/9;1/9];
% Asymmetric weights for end of series
aW3 = [.259 .407;.37 .407;.259 .185;.111 0];
% dat contains data - simplified adaptation from link above
ns = length(dat) ; first = 1:4 ; last = ns - 3:ns;
trend = conv(dat, sW3, 'same');
trend(1:2) = conv2(dat(first), 1, rot90(aW3,2), 'valid');
trend(ns-1:ns) = conv2(dat(last), 1, aW3, 'valid');
I have interpreted this in python using my own data, I have assumed in doing so that ; in MatLab matrices means new row and that a space means new column
import numpy as np
from scipy.signal import convolve2d
dat = np.array([0.02360784, 0.0227628 , 0.0386366 , 0.03338596, 0.03141621, 0.03430469])
dat = dat.reshape(dat.shape[0], 1) # in columns
sW3 = np.array([[1/9.],[2/9.],[1/3.],[2/9.],[1/9.]])
aW3 = np.array( [[ 0.259, 0.407],
[ 0.37 , 0.407],
[ 0.259, 0.185],
[ 0.111, 0. ]])
trend = convolve2d(dat, sW3, 'same')
trend[:2] = convolve2d(dat[:2], np.rot90(aW3,2), 'same')
trend[-2:] = convolve2d(dat[-2:], np.rot90(aW3,2), 'same')
Plotting the data, the fit is pretty bad...
import matplotlib.pyplot as plt
plt.plot(dat, 'grey', label='raw data', linewidth=4.)
plt.plot(trend, 'b--', label = 'S3x3 trend')
plt.legend()
plt.plot()
Solved.
It turned out that the issue was really to do with the subtle differences in MatLab's conv2d and scipy's convolve2d, from the docs:
C = conv2(h1,h2,A) first convolves each column of A with the vector h1 and then convolves each row of the result with the vector h2
This means that what is actually happening at the edges is incorrect in my code. It should actually be the sum of a the convolution of each endpoint and the relevant columns of aW3
e.g.
nwtrend = np.zeros(dat.shape)
nwtrend = convolve2d(dat, sW3, 'same')
for i in xrange(np.rot90(aW3, 2).shape[1]):
nwtrend[i] = np.convolve(dat[i,0], np.rot90(aW3, 2)[:,i], 'same').sum()
for i in xrange(aW3.shape[1]):
nwtrend[-i-1] = np.convolve(dat[-i-1,0], aW3[:,i], 'same').sum()
To get the output above
plt.plot(nwtrend, 'r', label='S3x3 new',linewidth=2.)
plt.plot(trend, 'b--', label='S3x3 old')
plt.legend(loc='lower centre')
plt.show()
def GaussianMatrix(X,sigma):
row,col=X.shape
GassMatrix=np.zeros(shape=(row,row))
X=np.asarray(X)
i=0
for v_i in X:
j=0
for v_j in X:
GassMatrix[i,j]=Gaussian(v_i.T,v_j.T,sigma)
j+=1
i+=1
return GassMatrix
def Gaussian(x,z,sigma):
return np.exp((-(np.linalg.norm(x-z)**2))/(2*sigma**2))
This is my current way. Is there any way I can use matrix operation to do this? X is the data points.
I myself used the accepted answer for my image processing, but I find it (and the other answers) too dependent on other modules. Therefore, here is my compact solution:
import numpy as np
def gkern(l=5, sig=1.):
"""\
creates gaussian kernel with side length `l` and a sigma of `sig`
"""
ax = np.linspace(-(l - 1) / 2., (l - 1) / 2., l)
gauss = np.exp(-0.5 * np.square(ax) / np.square(sig))
kernel = np.outer(gauss, gauss)
return kernel / np.sum(kernel)
Edit: Changed arange to linspace to handle even side lengths
Edit: Use separability for faster computation, thank you Yves Daoust.
Do you want to use the Gaussian kernel for e.g. image smoothing? If so, there's a function gaussian_filter() in scipy:
Updated answer
This should work - while it's still not 100% accurate, it attempts to account for the probability mass within each cell of the grid. I think that using the probability density at the midpoint of each cell is slightly less accurate, especially for small kernels. See https://homepages.inf.ed.ac.uk/rbf/HIPR2/gsmooth.htm for an example.
import numpy as np
import scipy.stats as st
def gkern(kernlen=21, nsig=3):
"""Returns a 2D Gaussian kernel."""
x = np.linspace(-nsig, nsig, kernlen+1)
kern1d = np.diff(st.norm.cdf(x))
kern2d = np.outer(kern1d, kern1d)
return kern2d/kern2d.sum()
Testing it on the example in Figure 3 from the link:
gkern(5, 2.5)*273
gives
array([[ 1.0278445 , 4.10018648, 6.49510362, 4.10018648, 1.0278445 ],
[ 4.10018648, 16.35610171, 25.90969361, 16.35610171, 4.10018648],
[ 6.49510362, 25.90969361, 41.0435344 , 25.90969361, 6.49510362],
[ 4.10018648, 16.35610171, 25.90969361, 16.35610171, 4.10018648],
[ 1.0278445 , 4.10018648, 6.49510362, 4.10018648, 1.0278445 ]])
The original (accepted) answer below accepted is wrong
The square root is unnecessary, and the definition of the interval is incorrect.
import numpy as np
import scipy.stats as st
def gkern(kernlen=21, nsig=3):
"""Returns a 2D Gaussian kernel array."""
interval = (2*nsig+1.)/(kernlen)
x = np.linspace(-nsig-interval/2., nsig+interval/2., kernlen+1)
kern1d = np.diff(st.norm.cdf(x))
kernel_raw = np.sqrt(np.outer(kern1d, kern1d))
kernel = kernel_raw/kernel_raw.sum()
return kernel
I'm trying to improve on FuzzyDuck's answer here. I think this approach is shorter and easier to understand. Here I'm using signal.scipy.gaussian to get the 2D gaussian kernel.
import numpy as np
from scipy import signal
def gkern(kernlen=21, std=3):
"""Returns a 2D Gaussian kernel array."""
gkern1d = signal.gaussian(kernlen, std=std).reshape(kernlen, 1)
gkern2d = np.outer(gkern1d, gkern1d)
return gkern2d
Plotting it using matplotlib.pyplot:
import matplotlib.pyplot as plt
plt.imshow(gkern(21), interpolation='none')
You may simply gaussian-filter a simple 2D dirac function, the result is then the filter function that was being used:
import numpy as np
import scipy.ndimage.filters as fi
def gkern2(kernlen=21, nsig=3):
"""Returns a 2D Gaussian kernel array."""
# create nxn zeros
inp = np.zeros((kernlen, kernlen))
# set element at the middle to one, a dirac delta
inp[kernlen//2, kernlen//2] = 1
# gaussian-smooth the dirac, resulting in a gaussian filter mask
return fi.gaussian_filter(inp, nsig)
I tried using numpy only. Here is the code
def get_gauss_kernel(size=3,sigma=1):
center=(int)(size/2)
kernel=np.zeros((size,size))
for i in range(size):
for j in range(size):
diff=np.sqrt((i-center)**2+(j-center)**2)
kernel[i,j]=np.exp(-(diff**2)/(2*sigma**2))
return kernel/np.sum(kernel)
You can visualise the result using:
plt.imshow(get_gauss_kernel(5,1))
A 2D gaussian kernel matrix can be computed with numpy broadcasting,
def gaussian_kernel(size=21, sigma=3):
"""Returns a 2D Gaussian kernel.
Parameters
----------
size : float, the kernel size (will be square)
sigma : float, the sigma Gaussian parameter
Returns
-------
out : array, shape = (size, size)
an array with the centered gaussian kernel
"""
x = np.linspace(- (size // 2), size // 2)
x /= np.sqrt(2)*sigma
x2 = x**2
kernel = np.exp(- x2[:, None] - x2[None, :])
return kernel / kernel.sum()
For small kernel sizes this should be reasonably fast.
Note: this makes changing the sigma parameter easier with respect to the accepted answer.
If you are a computer vision engineer and you need heatmap for a particular point as Gaussian distribution(especially for keypoint detection on image)
def gaussian_heatmap(center = (2, 2), image_size = (10, 10), sig = 1):
"""
It produces single gaussian at expected center
:param center: the mean position (X, Y) - where high value expected
:param image_size: The total image size (width, height)
:param sig: The sigma value
:return:
"""
x_axis = np.linspace(0, image_size[0]-1, image_size[0]) - center[0]
y_axis = np.linspace(0, image_size[1]-1, image_size[1]) - center[1]
xx, yy = np.meshgrid(x_axis, y_axis)
kernel = np.exp(-0.5 * (np.square(xx) + np.square(yy)) / np.square(sig))
return kernel
The usage and output
kernel = gaussian_heatmap(center = (2, 2), image_size = (10, 10), sig = 1)
plt.imshow(kernel)
print("max at :", np.unravel_index(kernel.argmax(), kernel.shape))
print("kernel shape", kernel.shape)
max at : (2, 2)
kernel shape (10, 10)
kernel = gaussian_heatmap(center = (25, 40), image_size = (100, 50), sig = 5)
plt.imshow(kernel)
print("max at :", np.unravel_index(kernel.argmax(), kernel.shape))
print("kernel shape", kernel.shape)
max at : (40, 25)
kernel shape (50, 100)
linalg.norm takes an axis parameter. With a little experimentation I found I could calculate the norm for all combinations of rows with
np.linalg.norm(x[None,:,:]-x[:,None,:],axis=2)
It expands x into a 3d array of all differences, and takes the norm on the last dimension.
So I can apply this to your code by adding the axis parameter to your Gaussian:
def Gaussian(x,z,sigma,axis=None):
return np.exp((-(np.linalg.norm(x-z, axis=axis)**2))/(2*sigma**2))
x=np.arange(12).reshape(3,4)
GaussianMatrix(x,1)
produces
array([[ 1.00000000e+00, 1.26641655e-14, 2.57220937e-56],
[ 1.26641655e-14, 1.00000000e+00, 1.26641655e-14],
[ 2.57220937e-56, 1.26641655e-14, 1.00000000e+00]])
Matching:
Gaussian(x[None,:,:],x[:,None,:],1,axis=2)
array([[ 1.00000000e+00, 1.26641655e-14, 2.57220937e-56],
[ 1.26641655e-14, 1.00000000e+00, 1.26641655e-14],
[ 2.57220937e-56, 1.26641655e-14, 1.00000000e+00]])
Building up on Teddy Hartanto's answer. You can just calculate your own one dimensional Gaussian functions and then use np.outer to calculate the two dimensional one. Very fast and efficient way.
With the code below you can also use different Sigmas for every dimension
import numpy as np
def generate_gaussian_mask(shape, sigma, sigma_y=None):
if sigma_y==None:
sigma_y=sigma
rows, cols = shape
def get_gaussian_fct(size, sigma):
fct_gaus_x = np.linspace(0,size,size)
fct_gaus_x = fct_gaus_x-size/2
fct_gaus_x = fct_gaus_x**2
fct_gaus_x = fct_gaus_x/(2*sigma**2)
fct_gaus_x = np.exp(-fct_gaus_x)
return fct_gaus_x
mask = np.outer(get_gaussian_fct(rows,sigma), get_gaussian_fct(cols,sigma_y))
return mask
A good way to do that is to use the gaussian_filter function to recover the kernel.
For instance:
indicatrice = np.zeros((5,5))
indicatrice[2,2] = 1
gaussian_kernel = gaussian_filter(indicatrice, sigma=1)
gaussian_kernel/=gaussian_kernel[2,2]
This gives
array[[0.02144593, 0.08887207, 0.14644428, 0.08887207, 0.02144593],
[0.08887207, 0.36828649, 0.60686612, 0.36828649, 0.08887207],
[0.14644428, 0.60686612, 1. , 0.60686612, 0.14644428],
[0.08887207, 0.36828649, 0.60686612, 0.36828649, 0.08887207],
[0.02144593, 0.08887207, 0.14644428, 0.08887207, 0.02144593]]
Adapting th accepted answer by FuzzyDuck to match the results of this website: http://dev.theomader.com/gaussian-kernel-calculator/ I now present this definition to you:
import numpy as np
import scipy.stats as st
def gkern(kernlen=21, sig=3):
"""Returns a 2D Gaussian kernel."""
x = np.linspace(-(kernlen/2)/sig, (kernlen/2)/sig, kernlen+1)
kern1d = np.diff(st.norm.cdf(x))
kern2d = np.outer(kern1d, kern1d)
return kern2d/kern2d.sum()
print(gkern(kernlen=5, sig=1))
output:
[[0.003765 0.015019 0.02379159 0.015019 0.003765 ]
[0.015019 0.05991246 0.0949073 0.05991246 0.015019 ]
[0.02379159 0.0949073 0.15034262 0.0949073 0.02379159]
[0.015019 0.05991246 0.0949073 0.05991246 0.015019 ]
[0.003765 0.015019 0.02379159 0.015019 0.003765 ]]
As I didn't find what I was looking for, I coded my own one-liner. You can modify it accordingly (according to the dimensions and the standard deviation).
Here is the one-liner function for a 3x5 patch for example.
from scipy import signal
def gaussian2D(patchHeight, patchWidth, stdHeight=1, stdWidth=1):
gaussianWindow = signal.gaussian(patchHeight, stdHeight).reshape(-1, 1)#signal.gaussian(patchWidth, stdWidth).reshape(1, -1)
return gaussianWindow
print(gaussian2D(3, 5))
You get an output like this:
[[0.082085 0.36787944 0.60653066 0.36787944 0.082085 ]
[0.13533528 0.60653066 1. 0.60653066 0.13533528]
[0.082085 0.36787944 0.60653066 0.36787944 0.082085 ]]
You can read more about scipy's Gaussian here.
Yet another implementation.
This is normalized so that for sigma > 1 and sufficiently large win_size, the total sum of the kernel elements equals 1.
def gaussian_kernel(win_size, sigma):
t = np.arange(win_size)
x, y = np.meshgrid(t, t)
o = (win_size - 1) / 2
r = np.sqrt((x - o)**2 + (y - o)**2)
scale = 1 / (sigma**2 * 2 * np.pi)
return scale * np.exp(-0.5 * (r / sigma)**2)
To generate a 5x5 kernel:
gaussian_kernel(win_size=5, sigma=1)
I took a similar approach to Nils Werner's answer -- since convolution of any kernel with a Kronecker delta results in the kernel itself centered around that Kronecker delta -- but I made it slightly more general to deal with both odd and even dimensions. In three lines:
import scipy.ndimage as scim
def gaussian_kernel(dimension: int, sigma: float):
dirac = np.zeros((dimension,dimension))
dirac[(dimension-1)//2:dimension//2+1, (dimension-1)//2:dimension//2+1] = 1.0 / (1 + 3 * ((dimension + 1) % 2))
return scim.gaussian_filter(dirac, sigma=sigma)
The second line creates either a single 1.0 in the middle of the matrix (if the dimension is odd), or a square of four 0.25 elements (if the dimension is even). The division could be moved to the third line too; the result is normalised either way.
For those who like to have the kernel the matrix with one (odd) or four (even) 1.0 element(s) in the middle instead of normalisation, this works:
import scipy.ndimage as scim
def gaussian_kernel(dimension: int, sigma: float, ones_in_the_middle=False):
dirac = np.zeros((dimension,dimension))
dirac[(dimension-1)//2:dimension//2+1, (dimension-1)//2:dimension//2+1] = 1.0
kernel = scim.gaussian_filter(dirac, sigma=sigma)
divisor = kernel[(dimension-1)//2, (dimension-1)//2] if ones_in_the_middle else 1 + 3 * ((dimension + 1) % 2)
return kernel/divisor
I have a netcdf file containing global sea-surface temperatures. Using matplotlib and Basemap, I've managed to make a map of this data, with the following code:
from netCDF4 import Dataset
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
filename = '/Users/Nick/Desktop/SST/SST.nc'
fh = Dataset(filename, mode='r')
lons = fh.variables['LON'][:]
lats = fh.variables['LAT'][:]
sst = fh.variables['SST'][:].squeeze()
fig = plt.figure()
m = Basemap(projection='merc', llcrnrlon=80.,llcrnrlat=-25.,urcrnrlon=150.,urcrnrlat=25.,lon_0=115., lat_0=0., resolution='l')
lon, lat = np.meshgrid(lons, lats)
xi, yi = m(lon, lat)
cs = m.pcolormesh(xi,yi,sst, vmin=18, vmax=32)
m.drawmapboundary(fill_color='0.3')
m.fillcontinents(color='0.3', lake_color='0.3')
cbar = m.colorbar(cs, location='bottom', pad="10%", ticks=[18., 20., 22., 24., 26., 28., 30., 32.])
cbar.set_label('January SST (' + u'\u00b0' + 'C)')
plt.savefig('SST.png', dpi=300)
The problem is that the data is very high resolution (9km grid) which makes the resulting image quite noisy. I would like to put the data onto a lower resolution grid (e.g. 1 degree), but I'm struggling to work out how this could be done. I followed a worked solution to try and use the matplotlib griddata function by inserting the code below into my above example, but it resulted in 'ValueError: condition must be a 1-d array'.
xi, yi = np.meshgrid(lons, lats)
X = np.arange(min(x), max(x), 1)
Y = np.arange(min(y), max(y), 1)
Xi, Yi = np.meshgrid(X, Y)
Z = griddata(xi, yi, z, Xi, Yi)
I'm a relative beginner to Python and matplotlib, so I'm not sure what I'm doing wrong (or what a better approach might be). Any advice appreciated!
If you regrid your data to a coarser lat/lon grid using e.g. bilinear interpolation, this will result in a smoother field.
The NCAR ClimateData guide has a nice introduction to regridding (general, not Python-specific).
The most powerful implementation of regridding routines available for Python is, to my knowledge, the Earth System Modeling Framework (ESMF) Python interface (ESMPy). If this is a bit too involved for your application, you should look into
EarthPy tutorials on regridding (e.g. using Pyresample, cKDTree, or Basemap).
Turning your data into an Iris cube and using Iris' regridding functions.
Perhaps start by looking at the EarthPy regridding tutorial using Basemap, since you are using it already.
The way to do this in your example would be
from mpl_toolkits import basemap
from netCDF4 import Dataset
filename = '/Users/Nick/Desktop/SST/SST.nc'
with Dataset(filename, mode='r') as fh:
lons = fh.variables['LON'][:]
lats = fh.variables['LAT'][:]
sst = fh.variables['SST'][:].squeeze()
lons_sub, lats_sub = np.meshgrid(lons[::4], lats[::4])
sst_coarse = basemap.interp(sst, lons, lats, lons_sub, lats_sub, order=1)
This performs bilinear interpolation (order=1) on your SST data onto a sub-sampled grid (every fourth point). Your plot will look more coarse-grained afterwards. If you do not like that, interpolate back onto the original grid with e.g.
sst_smooth = basemap.interp(sst_coarse, lons_sub[0,:], lats_sub[:,0], *np.meshgrid(lons, lats), order=1)
I usually run my data through a Laplace filter for smoothing. Perhaps you could try the function below and see if it helps with your data. The function can be called with or without a mask (e.g land/ocean mask for ocean data points). Hope this helps. T
# Laplace filter for 2D field with/without mask
# M = 1 on - cells used
# M = 0 off - grid cells not used
# Default is without masking
import numpy as np
def laplace_X(F,M):
jmax, imax = F.shape
# Add strips of land
F2 = np.zeros((jmax, imax+2), dtype=F.dtype)
F2[:, 1:-1] = F
M2 = np.zeros((jmax, imax+2), dtype=M.dtype)
M2[:, 1:-1] = M
MS = M2[:, 2:] + M2[:, :-2]
FS = F2[:, 2:]*M2[:, 2:] + F2[:, :-2]*M2[:, :-2]
return np.where(M > 0.5, (1-0.25*MS)*F + 0.25*FS, F)
def laplace_Y(F,M):
jmax, imax = F.shape
# Add strips of land
F2 = np.zeros((jmax+2, imax), dtype=F.dtype)
F2[1:-1, :] = F
M2 = np.zeros((jmax+2, imax), dtype=M.dtype)
M2[1:-1, :] = M
MS = M2[2:, :] + M2[:-2, :]
FS = F2[2:, :]*M2[2:, :] + F2[:-2, :]*M2[:-2, :]
return np.where(M > 0.5, (1-0.25*MS)*F + 0.25*FS, F)
# The mask may cause laplace_X and laplace_Y to not commute
# Take average of both directions
def laplace_filter(F, M=None):
if M == None:
M = np.ones_like(F)
return 0.5*(laplace_X(laplace_Y(F, M), M) +
laplace_Y(laplace_X(F, M), M))
To answer your original question regarding scipy.interpolate.griddata, too:
Have a close look at the parameter specs for that function (e.g. in the SciPy documentation) and make sure that your input arrays have the right shapes. You might need to do something like
import numpy as np
points = np.vstack([a.flat for a in np.meshgrid(lons,lats)]).T # (n,D)
values = sst.ravel() # (n)
etc.
If you are working on Linux, you can achieve this using nctoolkit (https://nctoolkit.readthedocs.io/en/latest/).
You have not stated the latlon extent of your data, so I will assume it is a global dataset. Regridding to 1 degree resolution would require the following:
import nctoolkit as nc
filename = '/Users/Nick/Desktop/SST/SST.nc'
data = nc.open_data(filename)
data.to_latlon(lon = [-179.5, 179.5], lat = [-89.5, 89.5], res = [1,1])
# visualize the data
data.plot()
Look at this example with xarray...
use the ds.interp method and specify the new latitude and longitude values.
http://xarray.pydata.org/en/stable/interpolation.html#example