Basemap interpolation alternative - regridding data - python

I'm moving from basemap to cartopy, given that basemap is being phased out. I've previously used basemap.interp to interpolate data; e.g. if I have data at 1 degree resolution (180x360), I would run the following to interpolate it to 0.5 degrees.
import numpy as np
from mpl_toolkits import basemap
Old_Lon = np.linspace(-180,180,360)
Old_Lat = np.linspace(-90,90,180)
New_Lon = np.linspace(-180,180,720)
New_Lat = np.linspace(-90,90,360)
New_Lon,New_Lat = np.meshgrid(New_Lon,New_Lat)
New_Data = basemap.interp(Old_Data,Old_Lon,Old_Lat,New_Lon,New_Lat,order=0)
order gives me the option to choose nearest neighbour, bilinear, etc. Is there an alternative that does this in as simple a way? I've seen that SciPy has interpolation routines, but I'm not sure how to apply them. Any help would be appreciated!

I eventually decided to take the raw code from basemap and turn it into a standalone function. I'll be recommending that the cartopy developers implement it, as it's a useful feature. Posting it here as it could be useful to someone else:
import numpy as np

def Interp(datain, xin, yin, xout, yout, interpolation='NearestNeighbour'):
    """
    Interpolates a 2D array onto a new grid (only works for linear grids),
    with the Lat/Lon inputs of the old and new grid. Can perform nearest
    neighbour interpolation or bilinear interpolation (of order 1).
    This is an extract from the basemap module (truncated).
    """
    # Mesh the output coordinates so that they are both 2D arrays
    xout, yout = np.meshgrid(xout, yout)
    # Compute grid coordinates of the output grid
    xcoords = (len(xin) - 1) * (xout - xin[0]) / (xin[-1] - xin[0])
    ycoords = (len(yin) - 1) * (yout - yin[0]) / (yin[-1] - yin[0])
    xcoords = np.clip(xcoords, 0, len(xin) - 1)
    ycoords = np.clip(ycoords, 0, len(yin) - 1)
    # Interpolate to the output grid using nearest neighbour
    if interpolation == 'NearestNeighbour':
        xcoordsi = np.around(xcoords).astype(np.int32)
        ycoordsi = np.around(ycoords).astype(np.int32)
        dataout = datain[ycoordsi, xcoordsi]
    # Interpolate to the output grid using bilinear interpolation
    elif interpolation == 'Bilinear':
        xi = xcoords.astype(np.int32)
        yi = ycoords.astype(np.int32)
        xip1 = np.clip(xi + 1, 0, len(xin) - 1)
        yip1 = np.clip(yi + 1, 0, len(yin) - 1)
        delx = xcoords - xi.astype(np.float32)
        dely = ycoords - yi.astype(np.float32)
        dataout = (1. - delx) * (1. - dely) * datain[yi, xi] + \
                  delx * dely * datain[yip1, xip1] + \
                  (1. - delx) * dely * datain[yip1, xi] + \
                  delx * (1. - dely) * datain[yi, xip1]
    return dataout
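A quick usage sketch matching the question's example (Old_Data is assumed to be the 180x360 array from the question; note that Interp meshgrids internally, so the new axes are passed as 1D arrays):
Old_Lon = np.linspace(-180, 180, 360)
Old_Lat = np.linspace(-90, 90, 180)
New_Lon = np.linspace(-180, 180, 720)
New_Lat = np.linspace(-90, 90, 360)
New_Data = Interp(Old_Data, Old_Lon, Old_Lat, New_Lon, New_Lat,
                  interpolation='NearestNeighbour')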

The SciPy interpolation routines return a function that you can call to perform an interpolation. For nearest neighbour interpolation on a regular grid, you can use scipy.interpolate.RegularGridInterpolator:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Example grids: 10-degree input, 5-degree output
old_lon, old_lat = np.arange(-180, 180, 10), np.arange(-90, 90, 10)
old_data = np.random.random((len(old_lon), len(old_lat)))
new_lon, new_lat = np.arange(-180, 180, 5), np.arange(-90, 90, 5)

nearest_function = RegularGridInterpolator(
    (old_lon, old_lat), old_data, method="nearest", bounds_error=False
)
new_data = np.array(
    [[nearest_function([i, j]) for j in new_lat] for i in new_lon]
).squeeze()
That isn't perfect, though, because the points at lon=175 fall outside the input grid (which stops at lon=170) and all come back as fill values. (If I hadn't set bounds_error=False then you'd get an error there.) In that case, you need to decide how you want to wrap around the dateline. A straightforward solution is to copy the lon=-180 line to the end of the array and call it lon=180.
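A minimal sketch of that wrap, reusing the example grids above (the wrapped_* names are just for illustration):
# Append the lon=-180 column again as lon=180 so the interpolator can
# bridge the dateline (axis 0 of old_data is longitude here).
wrapped_lon = np.append(old_lon, old_lon[0] + 360.0)
wrapped_data = np.concatenate([old_data, old_data[:1, :]], axis=0)
nearest_function = RegularGridInterpolator(
    (wrapped_lon, old_lat), wrapped_data, method="nearest", bounds_error=False
)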
Should you want linear or higher order interpolation one day, which I'd recommend if your data are points rather than cells, you can use scipy.interpolate.RectBivariateSpline:
import numpy as np
from scipy.interpolate import RectBivariateSpline

old_step, new_step = 10, 5
old_lon = np.arange(-180, 180, old_step)
old_lat = np.arange(-90, 90, old_step)
old_data = np.random.random((len(old_lon), len(old_lat)))

interp_function = RectBivariateSpline(old_lon, old_lat, old_data, kx=1, ky=1)
new_lon = np.arange(-180, 180, new_step)
new_lat = np.arange(-90, 90, new_step)
new_data = interp_function(new_lon, new_lat)


What is the best way/method to digitize the data of a 3D surface into a grid of pixels with smaller resolution in Python?

I want to digitize (i.e. average out over cells) photon count data into pixels defined by a coarser grid. The photon count data is stored in a 2D array. I want to split that data into cells, each of which corresponds to one pixel of the output. The idea is basically the same as downscaling an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np

def digitize(function_data, grid_shape):
    """
    function_data = 2D array of function values of some 3D shape,
        e.g. exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the
        smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l / grid_shape[0])
    pixel_len_y = int(l / grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]):      # row index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]):  # column index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(function_data[k][j:j*pixel_len_x])
            hd_pixel = np.ravel(hd_pixel)  # flatten the 2D block into 1D to average it
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
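For reference, here is a minimal block-averaging sketch (not the asker's function; it assumes each image dimension divides evenly by the target grid):
import numpy as np

def block_average(data, grid_shape):
    # Reshape each (block_y, block_x) cell onto its own pair of axes,
    # then average those axes away; shapes must divide evenly.
    ny, nx = grid_shape
    h, w = data.shape
    return data.reshape(ny, h // ny, nx, w // nx).mean(axis=(1, 3))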
You could also use an interpolation function, if you can use SciPy. Here we use one of the gridded-data interpolating functions, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs

# Sampling coordinates
x = np.linspace(-2, 2, 20)
y = np.linspace(-2, 2, 30)
# Your function
f = np.exp(-(x[:, None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid=True)
# Some plots
fig, ax = plt.subplots(ncols=2)
ax[0].imshow(f)
ax[1].imshow(f_hd)
plt.show()
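The same interpolator can also be evaluated on a coarser grid, which is closer to the downsampling in the question (a sketch; note that point-sampling a spline is not identical to block averaging):
x_ld = np.linspace(x.min(), x.max(), x.size // 2)
y_ld = np.linspace(y.min(), y.max(), y.size // 2)
f_ld = interp(x_ld, y_ld, grid=True)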

interpolation periodic boundaries with xarray

I would like to interpolate many xarray datasets containing global climate data onto one common grid. xarray has an interp() method which works fine, but as far as I can tell it does not take periodic boundaries into account, although this is necessary when interpolating on a sphere. Instead, data points which fall outside of the old grid are extrapolated or filled with NaNs. The interpolation is based on SciPy, and I think the other SciPy interpolation methods also do not support periodic boundaries.
I am considering using xesmf, but was wondering if there is an easier solution for this just using xarray?
I would prefer linear interpolation but am flexible in this regard.
This is possible, if you are willing to wrap your data in the longitudinal direction. With some assumptions (DataArray has coords 'lon' and 'lat', 'lon' spans almost 0-360 and doesn't quite go to the boundaries), and borrowing some ideas from this answer, this should work:
import numpy as np
import xarray as xr
data = np.arange(360 * 180).reshape(360, 180)
lon = np.linspace(0.5, 359.5, 360)
lat = np.linspace(-89.5, 89.5, 180)
da = xr.DataArray(
    coords=dict(
        lon=lon,
        lat=lat,
    ),
    data=data,
)
# These will both print 'nan' as lon is outside 0.5-359.5
print(da.interp(lon=0.3, lat=32).values)
print(da.interp(lon=359.7, lat=32).values)
def xr_add_cyclic_points(da):
    """
    Add cyclic points at the start and end of the `lon` dimension of a
    data array.

    Inputs
    da: xr.DataArray including dimensions (lat, lon)
    """
    # Borrows heavily from cartopy.util.add_cyclic_point, but adds points
    # at both the start and the end.
    lon_idx = da.dims.index('lon')
    start_slice = [slice(None)] * da.ndim
    end_slice = [slice(None)] * da.ndim
    start_slice[lon_idx] = slice(0, 1)
    end_slice[lon_idx] = slice(-1, None)
    wrap_data = np.concatenate(
        [da.values[tuple(end_slice)], da.values, da.values[tuple(start_slice)]],
        axis=lon_idx,
    )
    wrap_lon = np.concatenate(
        [da.lon.values[-1:] - 360, da.lon.values, da.lon.values[0:1] + 360]
    )
    # Generate an output DataArray with the new data but the same structure
    # as the input
    outp_da = xr.DataArray(data=wrap_data,
                           coords=dict(lat=da.lat, lon=wrap_lon),
                           dims=da.dims,
                           attrs=da.attrs)
    return outp_da
da_wrapped = xr_add_cyclic_points(da)
# These will print interpolated values.
print(da_wrapped.interp(lon=0.3, lat=32).values)
print(da_wrapped.interp(lon=359.7, lat=32).values)
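With the wrapped array, regridding onto a common grid is then an ordinary interp call (a sketch; the target grid here is made up for illustration):
new_lon = np.linspace(0.25, 359.75, 720)
new_lat = np.linspace(-89.75, 89.75, 360)
da_common = da_wrapped.interp(lon=new_lon, lat=new_lat, method="linear")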

Applying circular filter to image in Python / Applying function to each element of numpy array

I have a Python code that works, but it's quite slow and I believe there has to be a way of doing this more efficiently.
The idea is to apply a filter to an image. The filter is an average of the points which fall within a specified radius. The input is an m x 2 array of x, y coordinates and an m x 1 array z holding the values at those m observation points.
The program that works (but slowly) is the following:
import numpy as np

def haversine(point, xy_list):
    earth_radius = 6378137.0
    dlon = np.radians(xy_list[:, 0]) - np.radians(point[0])
    dlat = np.radians(xy_list[:, 1]) - np.radians(point[1])
    # Note: the cosine terms must use the latitudes (column 1), not the longitudes
    a = np.square(np.sin(dlat / 2.0)) + \
        np.cos(np.radians(point[1])) * np.cos(np.radians(xy_list[:, 1])) * \
        np.square(np.sin(dlon / 2.0))
    return 2 * earth_radius * np.arcsin(np.sqrt(a))

def circular_filter(xy, z, radius):
    filtered = np.zeros(xy.shape[0])
    for q in range(xy.shape[0]):
        dist = haversine(xy[q, :], xy)
        masked_z = np.ma.masked_where(dist > radius, z)
        filtered[q] = masked_z.mean()
    return filtered

x = np.random.uniform(low=-90, high=0, size=(1000, 1))  # x represents longitude
y = np.random.uniform(low=0, high=90, size=(1000, 1))   # y represents latitude
xy = np.hstack((x, y))
z = np.random.rand(1000,)
filtered_z = circular_filter(xy, z, radius=100.)
The problem is that I have 6 million points per data set, and the code is horribly slow. There must be a way to do this more efficiently. I thought of using scipy.spatial.distance.cdist() which is fast, but then I'd have to reproject the data to UTM, and I'd like to avoid reprojection. Any suggestions?
Thanks,
Reniel
After a lot of reading and searching I finally found the reason my code took forever to run. It's because I needed to understand and apply the concept of a filter kernel. Basically I realized there was a connection between my problem and this post:
Local Maxima with circular window
The downside: the user needs to provide a proper EPSG code, but I think I can find workarounds for this later.
The upside: it is very fast and efficient.
What worked for me was converting the lat/long to UTM so that we can create a circular kernel and apply generic_filter from SciPy:
import time
import numpy as np
from pyproj import Proj, transform
from scipy.ndimage import generic_filter

def circular_filter(tile, radius):
    x, y = np.meshgrid(tile['lon'], tile['lat'])
    x = x.reshape(x.size)
    y = np.flipud(y.reshape(y.size))
    z = tile['values'].reshape(tile['values'].size)
    wgs84 = Proj(init='epsg:4326')
    utm18N = Proj(init='epsg:26918')
    x, y = transform(wgs84, utm18N, x, y)
    dem_res = np.abs(x[1] - x[0])  # raster resolution (original data is a GeoTIFF read using GDAL)
    radius = int(np.ceil(radius / dem_res))  # user gives metres, but we figure out the number of cells
    print(radius)
    kernel = np.zeros((2 * radius + 1, 2 * radius + 1))
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    print('Commence circular filter.'); start = time.time()
    tile['values'] = generic_filter(tile['values'], np.mean, footprint=kernel)
    print('Took {:.3f} seconds'.format(time.time() - start))
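A usage sketch, since the expected structure of tile is only implied by the code above (the grid values here are invented for illustration):
tile = {
    'lon': np.linspace(-75.0, -74.0, 100),  # within UTM zone 18N
    'lat': np.linspace(40.0, 41.0, 100),
    'values': np.random.rand(100, 100),
}
circular_filter(tile, radius=500.0)  # radius in metres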
I also took a look at clustering techniques from here: http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
But I realized these clustering techniques serve a completely different purpose.
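For completeness, if reprojection is to be avoided entirely, a ball tree with the haversine metric can find the neighbours within the radius directly on the sphere. This is a sketch using scikit-learn (not used in the original post), with xy and z as in the question:
import numpy as np
from sklearn.neighbors import BallTree

earth_radius = 6378137.0
latlon = np.radians(xy[:, [1, 0]])  # haversine metric expects (lat, lon) in radians
tree = BallTree(latlon, metric='haversine')
# Indices of all points within 100 m of each point (radius given in radians)
neighbours = tree.query_radius(latlon, r=100.0 / earth_radius)
filtered_z = np.array([z[idx].mean() for idx in neighbours])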

To do a spline surface fit using scipy's RectBivariateSpline and SmoothBivariateSpline on noisy data

I am trying to do a 2D surface fit on some imaging data. I attached an example of such data, which is basically a 1014 x 1014 array with a substantial amount of noise (Example_image). Some patches of this array are invalid data, which I masked and set to NaN, as shown in yellow in the example image. As you can see, there is a background gradient from left (brighter) to right (dimmer) which I am trying to remove. The gradient cannot be fitted well by a polynomial, hence my goal is to do a 2D bivariate spline surface fit and subtract the gradient off.
I have tried a number of routines in SciPy, but most of them do not return ideal results.
To start with, I tried RectBivariateSpline (following "Bivariate structured interpolation of large array with NaN values or mask"), but since my image has NaNs in it, running RectBivariateSpline gives only an output of NaNs.
I also tried SmoothBivariateSpline, which is the scattered-data version of the routine. I omitted the pixels that have NaN values and converted the rest into 1D arrays as input. But it failed as the array size is too big. I then tried to chop my array up and run it on smaller chunks, but it gives the following error and quits with a segmentation fault, which I don't know how to interpret:
fitpack2.py:1044: UserWarning:
Error on entry, no approximation returned. The following conditions
must hold:
xb<=x[i]<=xe, yb<=y[i]<=ye, w[i]>0, i=0..m-1
If iopt==-1, then
xb
I then tried first filling in the NaN patches in my image with values using griddata with linear interpolation. Since the patches are huge, the interpolation is not ideal, but at least it gave me an array without NaNs. I then used this array to run RectBivariateSpline again. But the output array is still NaNs.
I suspect that the noise in my image is screwing up the behaviour of both routines, so I also tried first running a Gaussian kernel over my image to smooth it, then filling in the NaN patches with griddata, then running RectBivariateSpline or SmoothBivariateSpline, but they still give me arrays with NaN values as output.
I am not sure that I understand the manual of both tasks correctly, so I attach the following script:
#!/usr/bin/python
import matplotlib
matplotlib.use('qt5agg')
#matplotlib.rc('font',**{'family':'sans-serif','sans-serif':['Helvetica']})
#matplotlib.rc('text.latex', preamble=r'\usepackage{cmbright}')
#matplotlib.rc('text.latex', preamble=r'\usepackage[scaled]{helvet} \renewcommand\familydefault{\sfdefault} \usepackage[T1]{fontenc}')
#matplotlib.rc('text', usetex=True)
import matplotlib.pyplot as plt
import numpy as np
import astropy.io.fits as pyfits
import scipy.interpolate as sp
from astropy.convolution import convolve
from astropy.convolution import Gaussian2DKernel
#------------------------------------------------------------
#Read in the arrays
hdulistorg = pyfits.open('icmj01jrq_flt.fits')
hdulistorg.info()
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1)
hdulist = pyfits.open('jrq_sci_nan_deep.fits')
hdulist.info()
dataarrorg = np.swapaxes(hdulist[0].data, 0,1) #image array
errarrorg = np.swapaxes(hdulistorg[1].data, 0,1) #error array
#Flag some of the problematic values, turn NaNs into 0 for easier handling
dataarr = np.copy(dataarrorg)
w=np.isnan(dataarr)
ww=np.where(dataarr == 0)
www=np.where(dataarr > 100)
wwww=np.where(dataarr < 0)
errarr = 1.0 / (np.copy(errarrorg)+1e-5) # Try to use 1/error as the estimate for weight below
errarr[w] = 0
errarr[ww] = 0
errarr[www] = 0
errarr[wwww]=0
dataarr[w]= 0
dataarr[ww]= 0
dataarr[www]=0
dataarr[wwww]=0
#Make a gaussian kernel smoothed data
maskarr = np.copy(errarr) #For masking the nan regions so they dun get smoothed
maskarr[:]=0
maskarr[w]=1
maskarr[ww]=1
maskarr[www]=1
maskarr[wwww]=1
gauss = Gaussian2DKernel(x_stddev=5)  # keyword is 'stddev' in older astropy versions
condataarr = convolve(dataarr,gauss,normalize_kernel=True,boundary='extend',mask=maskarr)
condataarr[w]=0
conerrarr = np.copy(errarr)
#Setting x,y arrays for the Spline functions
nx, ny = (1014,1014)
x = np.linspace(0, 1013, nx)
y = np.linspace(0, 1013, ny)
xv, yv = np.meshgrid(x, y)
#Make an 1D version of these 2D arrays
dataarrflat = np.ravel(condataarr[0:200,0:200]) #Try only a small chunk!
xvflat = np.ravel(xv[0:200,0:200])
yvflat = np.ravel(yv[0:200,0:200])
errarrflat = np.ravel(conerrarr[0:200,0:200])
notnanloc = np.where(dataarrflat != 0) #Not NaNs
#SmoothBivariateSpline!
rect_S_spline = sp.SmoothBivariateSpline(xvflat[notnanloc], yvflat[notnanloc], dataarrflat[notnanloc],w=errarrflat[notnanloc], kx=3, ky=3)
#Also try using grid data to fix the grid?
gddataarr = np.copy(condataarr)
gddataarrflat = np.ravel(gddataarr)
gdloc = np.where(gddataarrflat != 0) #Not NaNs
gdxvflat = np.ravel(xv)
gdyvflat = np.ravel(yv)
xyarr = np.c_[gdxvflat[gdloc],gdyvflat[gdloc]]
x_grid, y_grid = np.mgrid[0:1013:1014j,0:1013:1014j]
grid_z2 = sp.griddata(xyarr, gddataarrflat[gdloc], (x_grid, y_grid), method='linear')
plt.imshow(grid_z2.T)
#plt.show()
#RectBivariatSpline
rect_B_spline = sp.RectBivariateSpline(x, y, grid_z2.T)
#Result grid (same as input for now)
xnew = np.arange(0, 1013, 1)
ynew = np.arange(0, 1013, 1)
znewS = rect_S_spline(xnew, ynew)
znewB = rect_B_spline(xnew, ynew)
print('znewS', znewS)
print('znewB', znewB)
#Write FITS files
condataarr = np.swapaxes(condataarr, 0, 1)
hdu2 = pyfits.PrimaryHDU(condataarr)
hdulist2 = pyfits.HDUList([hdu2])
hdulist2.writeto('contest.fits',overwrite=True)
hdulist2.close()
hdu3 = pyfits.PrimaryHDU(znewS)
hdulist3 = pyfits.HDUList([hdu3])
hdulist3.writeto('Stest.fits',overwrite=True)
hdulist3.close()
I cannot exactly solve your problem, but I have some code that interfaces a Fortran interpolation routine with Python. You can just call the routines directly from Python; no Fortran knowledge is needed.
You can find the code and a description of it at this github page
https://github.com/haakoan/inter

find tangent vector at a point for discrete data points

I have a curve defined by a minimum of two points in space, e.g.:
A = np.array([-1452.18133319, 3285.44737438, -7075.49516676])
B = np.array([-1452.20175668, 3285.29632734, -7075.49110863])
I want to find the tangent of the curve at discrete points along it, e.g. at the beginning and end of the curve. I know how to do it in Matlab, but I want to do it in Python. This is the code in Matlab:
A = [-1452.18133319 3285.44737438 -7075.49516676];
B = [-1452.20175668 3285.29632734 -7075.49110863];
points = [A; B];
distance = [0.; 0.1667];
pp = interp1(distance, points,'pchip','pp');
[breaks,coefs,l,k,d] = unmkpp(pp);
dpp = mkpp(breaks,repmat(k-1:-1:1,d*l,1).*coefs(:,1:k-1),d);
ntangent = zeros(length(distance), 3);
for j = 1:length(distance)
    ntangent(j,:) = ppval(dpp, distance(j));
end
%The solution would be at beginning and end:
%ntangent =
% -0.1225 -0.9061 0.0243
% -0.1225 -0.9061 0.0243
Any ideas? I tried to find a solution with NumPy and SciPy using multiple methods, e.g.
tck, u= scipy.interpolate.splprep(data)
but none of the methods seems to do what I want.
Give der=1 to splev to get the derivative of the spline:
from scipy import interpolate
import numpy as np
t=np.linspace(0,1,200)
x=np.cos(5*t)
y=np.sin(7*t)
tck, u = interpolate.splprep([x,y])
ti = np.linspace(0, 1, 200)
dxdt, dydt = interpolate.splev(ti,tck,der=1)
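If unit tangent vectors are wanted, divide the derivative by its norm (a small sketch using the arrays from the snippet above):
# Normalise the derivative at each evaluation point to unit length
tangents = np.array([dxdt, dydt])
unit_tangents = tangents / np.linalg.norm(tangents, axis=0)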
OK, I found the solution, which is a small modification of pv's answer above (note that splev works only on 1D vectors).
One problem I was having originally with "tck, u = scipy.interpolate.splprep(data)" is that it requires a minimum of 4 points to work (Matlab works with two points). I was using two points. After increasing the number of data points, it works as I want.
Here is the solution for completeness:
import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
data = np.array([[-1452.18133319, 3285.44737438, -7075.49516676],
                 [-1452.20175668, 3285.29632734, -7075.49110863],
                 [-1452.32645025, 3284.37412457, -7075.46633213],
                 [-1452.38226151, 3283.96135828, -7075.45524248]])
distance = np.array([0., 0.15247556, 1.0834, 1.50007])
data = data.T
tck, u = interpolate.splprep(data, u=distance, s=0)
yderv = interpolate.splev(u, tck, der=1)
and the tangents are (they match the Matlab results when the same data is used):
(-0.13394599723751408, -0.99063114953803189, 0.026614957159932656)
(-0.13394598523149195, -0.99063115868512985, 0.026614950816003666)
(-0.13394595055068903, -0.99063117647357712, 0.026614941718878599)
(-0.13394595652952143, -0.9906311632471152, 0.026614954146007865)
