Interpolation with periodic boundaries in xarray - python

I would like to interpolate many xarray datasets containing global climate data onto one common grid. xarray has an interp() method which works fine, but as far as I can tell it does not take periodic boundaries into account, even though this is necessary when interpolating on a sphere. Instead, data points which fall outside of the old grid are extrapolated or filled with NaNs. The interpolation is based on the scipy package, and I think the other interpolation methods from scipy also do not support periodic boundaries.
I am considering using xesmf, but was wondering if there is an easier solution for this just using xarray?
I would prefer linear interpolation but am flexible in this regard.

This is possible, if you are willing to wrap your data in the longitudinal direction. With some assumptions (the DataArray has coords 'lon' and 'lat'; 'lon' spans almost 0-360 but doesn't quite reach the boundaries), and borrowing some ideas from this answer, this should work:
import numpy as np
import xarray as xr
data = np.arange(360 * 180).reshape(360, 180)
lon = np.linspace(0.5, 359.5, 360)
lat = np.linspace(-89.5, 89.5, 180)
da = xr.DataArray(
    coords=dict(
        lon=lon,
        lat=lat,
    ),
    data=data,
)
# These will both print 'nan' as lon is outside 0.5-359.5
print(da.interp(lon=0.3, lat=32).values)
print(da.interp(lon=359.7, lat=32).values)
def xr_add_cyclic_points(da):
    """
    Add cyclic points at the start and end of the `lon` dimension of a data array.

    Inputs
    da: xr.DataArray including dimensions (lat, lon)
    """
    # Borrows heavily from cartopy.util.add_cyclic_point, but adds points at both the start and the end.
    lon_idx = da.dims.index('lon')
    start_slice = [slice(None)] * da.ndim
    end_slice = [slice(None)] * da.ndim
    start_slice[lon_idx] = slice(0, 1)
    end_slice[lon_idx] = slice(-1, None)
    # Wrap the data: prepend the last column (shifted by -360) and append the first (shifted by +360)
    wrap_data = np.concatenate([da.values[tuple(end_slice)], da.values, da.values[tuple(start_slice)]], axis=lon_idx)
    wrap_lon = np.concatenate([da.lon.values[-1:] - 360, da.lon.values, da.lon.values[0:1] + 360])
    # Generate an output DataArray with the new data but the same structure as the input
    outp_da = xr.DataArray(data=wrap_data,
                           coords=dict(lat=da.lat, lon=wrap_lon),
                           dims=da.dims,
                           attrs=da.attrs)
    return outp_da
da_wrapped = xr_add_cyclic_points(da)
# These will print interpolated values.
print(da_wrapped.interp(lon=0.3, lat=32).values)
print(da_wrapped.interp(lon=359.7, lat=32).values)
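As a quick check (my example, not from the original answer), interpolating the wrapped array onto a new global grid no longer produces NaNs at the seam:
new_lon = np.linspace(0, 359, 72)   # hypothetical target grid
new_lat = np.linspace(-89, 89, 36)
da_regridded = da_wrapped.interp(lon=new_lon, lat=new_lat)  # linear by default
print(da_regridded.isnull().any().values)  # False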

Related

Coordinate offsets in xarray and dask

I'm making use of xarray as the coordinates and automatic alignment are really useful, and I've been using Dask as I'm generally dealing with datasets on the order of terabytes.
I have a 3D source array, generated (or loaded), which depends on wavelength (wl) and on x and y position, with its origin at zero.
I also have a 2D output array dependent only on x and y which accumulates all of the wavelengths from the source array. Ideally the output would be:
output = source.sum('wl')
However, the wavelength dependence means that each wavelength offsets the source origin by a certain amount. The best (and ugliest) solution I could come up with is to loop through each wavelength, reassign the coordinates, interp up to the output coordinates, stack the results into a new array, and then sum.
I have an example code that shows what I'm trying to do:
from dask.distributed import Client
import xarray as xr
import dask.array as da
import numpy as np
client = Client(n_workers=2, threads_per_worker=2, memory_limit='2GB')
client
# Generate some offset data here
wavelengths = np.linspace(0.1,10,1000)
x_offsets = np.linspace(100,400,1000)
y_offsets = np.linspace(100,400,1000)
# Coordinate offsets for each wavelength
offset = xr.Dataset(
    {
        'x': (['wl'], x_offsets),
        'y': (['wl'], y_offsets)
    },
    coords={
        'wl': wavelengths
    })
# Our example source function
source_shape = (1000, 10000, 10000,)
wl_source = np.linspace(0.4,5,source_shape[0])
x_source = np.linspace(-6,6, source_shape[1])
y_source = np.linspace(-6,6, source_shape[2])
source = xr.DataArray(da.random.random(size=source_shape, chunks=(10, 400, 400)),
                      coords=[wl_source, x_source, y_source],
                      dims=['wl', 'x', 'y'])
out_shape = (10000, 10000,)
# Our final output array
x_out = np.linspace(-1000,1000,out_shape[0])
y_out = np.linspace(-1000,1000,out_shape[1])
out = xr.DataArray(da.random.random(size=out_shape, chunks=(4000,4000)),coords=[x_out, y_out], dims=['x','y'])
accum = []
for wl in source.wl:
    # Build our map from source -> output space
    x_map = offset.interp(wl=wl).x + source.x
    y_map = offset.interp(wl=wl).y + source.y
    # Remap coordinates
    source_mapped = source.sel(wl=wl).assign_coords({'x': x_map,
                                                     'y': y_map})
    # Interp up to the output coordinates
    # (interp_like unchunks the array, so we need to rechunk it here)
    accum.append(
        source_mapped.interp_like(out, method='nearest', kwargs={'fill_value': 0}).chunk({'x': 4000, 'y': 4000})
    )
# Accumulate and add to the output
out += xr.concat(accum, dim='wl').sum('wl')
out
This solution ends up with over 1 million tasks. Because of that, building the task graph takes a long time, garbage collection dominates during computation, and either memory is exhausted or I spill so much to disk that I run out of storage. Manually slicing has the same issue.
Additionally, this can't scale if I have more than one source. I've been racking my brain trying to figure out a better solution.
I'm wondering if there's a more efficient way of doing this, either through dask, xarray, or some other library? I'm fairly new to dask and xarray, so I'm still trying to get to grips with how they work and how to better chunk and distribute tasks.
Sorry for the long-winded question!
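No answer is recorded here, but one direction worth sketching (untested, and my own suggestion rather than anything from the thread): xarray's advanced interpolation accepts DataArray indexers, which may express the per-wavelength shift without the Python-level loop. The sketch below assumes that interp aligns the shared wl dimension pointwise (as vectorized sel does) rather than taking an outer product; if it doesn't, this approach needs rethinking.
# Untested sketch: sample the source at (x_out - offset(wl), y_out - offset(wl))
# using DataArray indexers instead of looping over wavelengths.
x_pts = xr.DataArray(x_out, dims='x') - offset.x.interp(wl=source.wl)
y_pts = xr.DataArray(y_out, dims='y') - offset.y.interp(wl=source.wl)
shifted = source.interp(x=x_pts, y=y_pts, method='nearest', kwargs={'fill_value': 0})
out += shifted.sum('wl')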

Error when trying to interpolate using SmoothSphereBivariateSpline(): "ValueError: Error code returned by bispev: 10"

I want to interpolate data, which is randomly scattered on the surface of a sphere, onto a regular longitude/latitude grid. I tried to do this with SmoothSphereBivariateSpline() from the scipy.interpolate package (see the code below).
import numpy as np
from scipy.interpolate import SmoothSphereBivariateSpline
#Define the input data and the original sampling points
NSamp = 2000
Theta = np.random.uniform(0,np.pi,NSamp)
Phi = np.random.uniform(0,2*np.pi, NSamp)
Data = np.ones(NSamp)
Interpolator = SmoothSphereBivariateSpline(Theta, Phi, Data, s=3.5)
#Prepare the grid to which the input shall be interpolated
NLon = 64
NLat = 32
GridPosLons = np.arange(NLon)/NLon * 2 * np.pi
GridPosLats = np.arange(NLat)/NLat * np.pi
LatsGrid, LonsGrid = np.meshgrid(GridPosLats, GridPosLons)
Lats = LatsGrid.ravel()
Lons = LonsGrid.ravel()
#Interpolate
Interpolator(Lats, Lons)
However, when I execute this code it gives me the following error:
ValueError: Error code returned by bispev: 10
Does anyone know what the problem is and how to fix it? Is this a bug or am I doing something wrong?
In the documentation of the __call__ method of SmoothSphereBivariateSpline, note the grid flag (some other interpolators have it too). If it is True, it's understood that you are passing one-dimensional arrays from which a grid is to be formed; this is the default. But you already made a meshgrid from your one-dimensional arrays, so this default behavior doesn't match your input.
Solution: use either
Interpolator(Lats, Lons, grid=False)
or, which is simpler and better:
Interpolator(GridPosLats, GridPosLons)
The latter will return the data in grid form (2D array), which makes more sense than the flattened data you would get with the first version.
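As a quick sanity check of both call styles (a sketch based on the snippets above; I'd verify the exact shapes locally):
# grid=False evaluates point-by-point; grid=True evaluates on the outer product
flat = Interpolator(Lats, Lons, grid=False)    # 1D, one value per (lat, lon) pair
grid = Interpolator(GridPosLats, GridPosLons)  # 2D array of shape (NLat, NLon)
print(flat.shape, grid.shape)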

Basemap interpolation alternative - regridding data

I'm moving from basemap to cartopy given basemap is going to be phased out. I've previously used the basemap.interp functionality to interpolate data, e.g. if I have data at 1 degree resolution (180x360), I would run the following to interpolate it to 0.5 degrees.
import numpy as np
from mpl_toolkits import basemap
Old_Lon = np.linspace(-180,180,360)
Old_Lat = np.linspace(-90,90,180)
New_Lon = np.linspace(-180,180,720)
New_Lat = np.linspace(-90,90,360)
New_Lon,New_Lat = np.meshgrid(New_Lon,New_Lat)
New_Data = basemap.interp(Old_Data,Old_Lon,Old_Lat,New_Lon,New_Lat,order=0)
order gives me options to choose from nearest neighbour, bi-linear, etc. Is there an alternative that does this in as simple a way? I've seen that scipy has interpolation routines, but I'm not sure how to apply them. Any help would be appreciated!
I eventually decided to take the raw code from Basemap and make it into a standalone function - I'll be recommending that the cartopy guys implement it, as it's a useful feature. Posting it here as it could be useful to someone else:
def Interp(datain, xin, yin, xout, yout, interpolation='NearestNeighbour'):
    """
    Interpolates a 2D array onto a new grid (only works for linear grids),
    with the Lat/Lon inputs of the old and new grid. Can perform nearest
    neighbour interpolation or bilinear interpolation (of order 1).

    This is an extract from the basemap module (truncated).
    """
    # Mesh the output coordinates so that they are both 2D arrays
    xout, yout = np.meshgrid(xout, yout)
    # Compute grid coordinates of the output grid
    xcoords = (len(xin) - 1) * (xout - xin[0]) / (xin[-1] - xin[0])
    ycoords = (len(yin) - 1) * (yout - yin[0]) / (yin[-1] - yin[0])
    xcoords = np.clip(xcoords, 0, len(xin) - 1)
    ycoords = np.clip(ycoords, 0, len(yin) - 1)
    # Interpolate to the output grid using nearest neighbour
    if interpolation == 'NearestNeighbour':
        xcoordsi = np.around(xcoords).astype(np.int32)
        ycoordsi = np.around(ycoords).astype(np.int32)
        dataout = datain[ycoordsi, xcoordsi]
    # Interpolate to the output grid using bilinear interpolation
    elif interpolation == 'Bilinear':
        xi = xcoords.astype(np.int32)
        yi = ycoords.astype(np.int32)
        xip1 = np.clip(xi + 1, 0, len(xin) - 1)
        yip1 = np.clip(yi + 1, 0, len(yin) - 1)
        delx = xcoords - xi.astype(np.float32)
        dely = ycoords - yi.astype(np.float32)
        dataout = (1. - delx) * (1. - dely) * datain[yi, xi] + \
                  delx * dely * datain[yip1, xip1] + \
                  (1. - delx) * dely * datain[yip1, xi] + \
                  delx * (1. - dely) * datain[yi, xip1]
    return dataout
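For reference, a usage sketch matching the question's grids (Old_Data is made up here; note that, unlike the basemap.interp call above, this function meshgrids the output coordinates internally, so pass the 1D axes):
Old_Data = np.random.random((180, 360))  # hypothetical (lat, lon) field at 1 degree
New_Data = Interp(Old_Data, Old_Lon, Old_Lat,
                  np.linspace(-180, 180, 720), np.linspace(-90, 90, 360),
                  interpolation='Bilinear')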
--
The SciPy interpolation routines return a function that you can call to perform an interpolation. For nearest neighbour interpolation on a regular grid, you can use scipy.interpolate.RegularGridInterpolator:
import numpy as np
from scipy.interpolate import RegularGridInterpolator
nearest_function = RegularGridInterpolator(
    (old_lon, old_lat), old_data, method="nearest", bounds_error=False
)
new_data = np.array(
    [[nearest_function([i, j]) for j in new_lat] for i in new_lon]
).squeeze()
That isn't perfect, though, because the new points at lon=175 lie beyond the last grid point (lon=170), so they are all fill values. (If I hadn't set bounds_error=False then you'd get an error there.) In that case, you need to ask how you want to wrap around the dateline. A straightforward solution would be to copy the first longitude line (lon=-180) to the end of the array and call it lon=180.
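As a minimal sketch of that wrap-around (using the old_lon/old_lat/old_data arrays defined in the next snippet, with longitude as the first axis):
old_lon_wrapped = np.append(old_lon, 180.0)  # lon = -180 reappears as +180
old_data_wrapped = np.concatenate([old_data, old_data[:1]], axis=0)
wrap_function = RegularGridInterpolator(
    (old_lon_wrapped, old_lat), old_data_wrapped, method="nearest"
)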
Should you want linear or higher order interpolation one day, which I'd recommend if your data are points rather than cells, you can use scipy.interpolate.RectBivariateSpline:
import numpy as np
from scipy.interpolate import RectBivariateSpline
old_step = 10
old_lon = np.arange(-180, 180, old_step)
old_lat = np.arange(-90, 90, old_step)
old_data = np.random.random((len(old_lon), len(old_lat)))
interp_function = RectBivariateSpline(old_lon, old_lat, old_data, kx=1, ky=1)
new_step = 5  # hypothetical value; new_step was undefined in the original snippet
new_lon = np.arange(-180, 180, new_step)
new_lat = np.arange(-90, 90, new_step)
new_data = interp_function(new_lon, new_lat)

Applying circular filter to image in Python / Applying function to each element of numpy array

I have Python code that works, but it's quite slow, and I believe there has to be a way of doing this more efficiently.
The idea is to apply a filter to an image. The filter is an average of the points which fall within a specified radius. The input is an m x 2 array of the x,y coordinates of m observation points, and an m x 1 array z of the values at those points.
The program that works is the following:
import numpy as np

def haversine(point, xy_list):
    earth_radius = 6378137.0
    dlon = np.radians(xy_list[:, 0]) - np.radians(point[0])
    dlat = np.radians(xy_list[:, 1]) - np.radians(point[1])
    # Note: the cosine terms must use the latitudes (second column), not the longitudes
    a = np.square(np.sin(dlat / 2.0)) + np.cos(np.radians(point[1])) * np.cos(np.radians(xy_list[:, 1])) * np.square(np.sin(dlon / 2.0))
    return 2 * earth_radius * np.arcsin(np.sqrt(a))

def circular_filter(xy, z, radius):
    filtered = np.zeros(xy.shape[0])
    for q in range(xy.shape[0]):
        dist = haversine(xy[q, :], xy)
        masked_z = np.ma.masked_where(dist > radius, z)
        filtered[q] = masked_z.mean()
    return filtered

x = np.random.uniform(low=-90, high=0, size=(1000, 1))  # x represents longitude
y = np.random.uniform(low=0, high=90, size=(1000, 1))   # y represents latitude
xy = np.hstack((x, y))
z = np.random.rand(1000,)
filtered_z = circular_filter(xy, z, radius=100.)
The problem is that I have 6 million points per data set, and the code is horribly slow. There must be a way to do this more efficiently. I thought of using scipy.spatial.distance.cdist() which is fast, but then I'd have to reproject the data to UTM, and I'd like to avoid reprojection. Any suggestions?
Thanks,
Reniel
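For what it's worth, a hedged sketch of one alternative that avoids reprojection (my suggestion, not from this thread): scikit-learn's BallTree supports the haversine metric, which takes (lat, lon) in radians and returns distances in radians, so the radius search can stay in geographic coordinates:
# Sketch assuming scikit-learn is available; the function name is hypothetical.
from sklearn.neighbors import BallTree

def circular_filter_balltree(xy, z, radius, earth_radius=6378137.0):
    latlon = np.radians(xy[:, ::-1])  # (lon, lat) columns -> (lat, lon) in radians
    tree = BallTree(latlon, metric='haversine')
    # query_radius expects the radius in radians, i.e. metres / earth radius
    neighbours = tree.query_radius(latlon, r=radius / earth_radius)
    return np.array([z[idx].mean() for idx in neighbours])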
After a lot of reading and searching, I finally found the reason my code took forever to run: I needed to understand and apply the concept of a filter kernel. Basically, I realized there was a connection between my problem and this post:
Local Maxima with circular window
The downside: the user needs to provide a proper EPSG code, but I think I can find workarounds for this later.
The upside: it is very fast and efficient.
What worked for me was converting the lat/long to UTM so that we can create a circular kernel and apply generic_filter from scipy.
import time
import numpy as np
from pyproj import Proj, transform
from scipy.ndimage.filters import generic_filter

def circular_filter(tile, radius):
    x, y = np.meshgrid(tile['lon'], tile['lat'])
    x = x.reshape(x.size)
    y = np.flipud(y.reshape(y.size))
    z = tile['values'].reshape(tile['values'].size)
    wgs84 = Proj(init='epsg:4326')
    utm18N = Proj(init='epsg:26918')
    x, y = transform(wgs84, utm18N, x, y)
    dem_res = np.abs(x[1] - x[0])  # calculates the raster resolution (original data is a geoTiff read using gdal)
    radius = int(np.ceil(radius / dem_res))  # user gives metres, but we figure out the number of cells
    print(radius)
    # Build a circular (disc-shaped) kernel of the requested radius in cells
    kernel = np.zeros((2 * radius + 1, 2 * radius + 1))
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    mask = x**2 + y**2 <= radius**2
    kernel[mask] = 1
    print('Commence circular filter.'); start = time.time()
    tile['values'] = generic_filter(tile['values'], np.mean, footprint=kernel)
    print('Took {:.3f} seconds'.format(time.time() - start))
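A hypothetical usage example (the tile layout is my assumption from how the function indexes it; the coordinates are chosen to fall in UTM zone 18N, since the EPSG code is hard-wired above):
tile = {'lon': np.linspace(-75.0, -74.0, 200),
        'lat': np.linspace(40.0, 41.0, 200),
        'values': np.random.rand(200, 200)}
circular_filter(tile, radius=500.0)  # radius in metres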
I also took a look at clustering techniques from here: http://geoffboeing.com/2014/08/clustering-to-reduce-spatial-data-set-size/
But I realized these clustering techniques serve a completely different purpose.

How to plot an irregular spaced RGB image using python and basemap?

Given that I have three matrices which describe the data that I want to plot:
lons - 2D matrix with [n_lons,n_lats]
lats - 2D matrix with [n_lons,n_lats]
dataRGB - 3D matrix with [n_lons,n_lats,3]
what is the preferred way to plot such data using python and basemap?
For pseudo-color data this is quite simple using the pcolormesh method:
data - 2D matrix with [n_lons,n_lats]
m = Basemap(...)
m.pcolormesh(lons,lats,data,latlon=True)
From reading the documentation, it seems to me that the imshow command should be used in this case, but this method needs regularly gridded data, so I would have to regrid and interpolate my data.
Is there any other way to plot the data?
I ran into this same issue a while ago, and this is the only solution I could come up with:
(Note that this works with matplotlib 1.3.0, but not 1.1.0)
from mpl_toolkits.basemap import Basemap
import numpy.ma as ma
import numpy as np

m = Basemap()  # Define your map projection here

# Assuming var is your variable of interest (NxMx3), lat is (N)x(M) and lon is (N)x(M):

# We need to convert pixel center lat/lons to pixel corner lat/lons (N+1)x(M+1)
cornerLats = getCorners(lat)
cornerLons = getCorners(lon)

# Get coordinate corners
xCorners, yCorners = m(cornerLats, cornerLons, inverse=True)

# Mask the data that is invalid
var = ma.masked_where(np.isnan(var), var)

# We need a flattened tuple (N*M, 3) to pass to pcolormesh
colorTuple = tuple(np.array([var[:, :, 0].flatten(),
                             var[:, :, 1].flatten(),
                             var[:, :, 2].flatten()]).transpose().tolist())

# Setting a larger linewidth will result in more edge distortion, and a
# smaller linewidth will result in a screwed up image for some reason.
m.pcolormesh(xCorners, yCorners, var[:, :, 0], color=colorTuple, clip_on=True, linewidth=0.05)
def getCorners(centers):
    one = centers[:-1, :]
    two = centers[1:, :]
    d1 = (two - one) / 2.
    one = one - d1
    two = two + d1
    stepOne = np.zeros((centers.shape[0] + 1, centers.shape[1]))
    stepOne[:-2, :] = one
    stepOne[-2:, :] = two[-2:, :]
    one = stepOne[:, :-1]
    two = stepOne[:, 1:]
    d2 = (two - one) / 2.
    one = one - d2
    two = two + d2
    stepTwo = np.zeros((centers.shape[0] + 1, centers.shape[1] + 1))
    stepTwo[:, :-2] = one
    stepTwo[:, -2:] = two[:, -2:]
    return stepTwo
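A quick shape check (made-up data) showing that getCorners maps an (N, M) array of pixel centres to (N+1, M+1) corners:
centers = np.random.rand(4, 5)
print(getCorners(centers).shape)  # (5, 6)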
