I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here; however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
import numpy as np
from scipy import interpolate

def interpLevel(grid, value, data, interp='linear'):
    """
    Interpolate 3d data to a common z coordinate.

    Can be used to calculate the wind/pv/whatsoever values for a common
    potential temperature / pressure level.

    grid : numpy.ndarray
        The grid. For example the potential temperature values for the whole 3d
        grid.
    value : float
        The common value in the grid, to which the data shall be interpolated.
        For example, 350.0
    data : numpy.ndarray
        The data which shall be interpolated. For example, the PV values for
        the whole 3d grid.
    interp : str
        This indicates which kind of interpolation will be done. It is directly
        passed on to scipy.interpolate.interp1d() as the *kind* argument.
    returns : numpy.ndarray
        A 2d array containing the *data* values at *value*.
    """
    ret = np.zeros_like(data[0, :, :])
    for yIdx in range(grid.shape[1]):
        for xIdx in range(grid.shape[2]):
            # check if we need to flip the column so it increases monotonically
            if grid[0, yIdx, xIdx] > grid[-1, yIdx, xIdx]:
                ind = -1
            else:
                ind = 1
            f = interpolate.interp1d(grid[::ind, yIdx, xIdx],
                                     data[::ind, yIdx, xIdx],
                                     kind=interp)
            ret[yIdx, xIdx] = f(value)
    return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however, if your grid is rectilinear (such that each vertical column has the same set of z-coordinates), then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course.) For a grid of your size, this will be extremely fast (as in tiny fractions of a second).
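For example, a minimal way to handle out-of-range values (a sketch, reusing the grid, data, and value names from above and assuming import numpy as np) is to clamp the index and the weight so that points outside the grid get the boundary value instead of an extrapolation:
# Clamp the bracketing index into [1, NZ-1] and the weight into [0, 1]
idx = np.clip(grid[:, 0, 0].searchsorted(value), 1, grid.shape[0] - 1)
lower = grid[idx - 1, 0, 0]
upper = grid[idx, 0, 0]
s = np.clip((value - lower) / (upper - lower), 0.0, 1.0)
result = (1 - s) * data[idx - 1, :, :] + s * data[idx, :, :]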
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher-order interpolation and can handle arbitrary sample points, it assumes that the input data is uniformly spaced. It will still produce smooth results on non-uniform data, but they won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights, doing something like the following:
from scipy.interpolate import interp1d

NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx], weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
This is quite an old question, but the best way to do this nowadays is to use MetPy's interpolate_1d function:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
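A minimal sketch of what that call might look like for the original problem (assuming temperature and rel_humidity are the 3-D cubes from the question, with the vertical axis first and temperature varying monotonically along it; 280.0 is just an illustrative target value):
import numpy as np
from metpy.interpolate import interpolate_1d

# Relative humidity on the 280 K temperature surface; result has shape (1, ny, nx)
rh_on_t = interpolate_1d(np.array([280.0]), temperature, rel_humidity, axis=0)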
There is a new implementation of Numba-accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and the k = number of nearest neighbours can be varied to trade off speed against accuracy.
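A minimal sketch of that combination (using the LON, LAT, T, lon1, lat1 arrays from the question; the number of neighbours k and the power are tunable):
import numpy as np
from scipy.spatial import cKDTree

# Build the tree on the scattered source points
tree = cKDTree(np.column_stack([LON.ravel(), LAT.ravel()]))
# Query the 8 nearest neighbours of every target point
dist, idx = tree.query(np.column_stack([lon1, lat1]), k=8)
# Inverse-distance weights (guarding against zero distance)
w = 1.0 / np.maximum(dist, 1e-10) ** 2
T_interp = (w * T.ravel()[idx]).sum(axis=1) / w.sum(axis=1)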
There is a nice inverse-distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This interpolates onto a regular grid, of course, but it works if you first project the data to a pixel grid with pyproj or something similar, all the while being careful which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    nominator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i])*(x - xv[i]) + (y - yv[i])*(y - yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points, return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        nominator = nominator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data.
However, I don't know of an out-of-the-box solution for you.
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for cases 1 and 2 is to search for the cells in the mapping from tripolar space that cover your sample point (you can use a BSP or grid-type structure to speed up this search), pick one of the cells, and interpolate inside it.
Finally, there's a heap of unstructured interpolation options... but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
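As a concrete illustration of the triangulation route (a sketch, using the LON, LAT, T, lon1, lat1 arrays from the question; scipy's LinearNDInterpolator builds the Delaunay triangulation internally):
import numpy as np
from scipy.interpolate import LinearNDInterpolator

pts = np.column_stack([LON.ravel(), LAT.ravel()])
f = LinearNDInterpolator(pts, T.ravel())
# Values at the ~10,000 scattered target locations
T_interp = f(lon1, lat1)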
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest you take a look at GRASS (an open-source GIS package) and its interpolation features (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(image: a diagram with the old data points in red and the new interpolated points in blue; http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I am considering using this method to interpolate some 3D points I have. As an input I have atmospheric concentrations of a gas at various elevations over an area. The data I have appears as values every few feet of vertical elevation for several tens of feet, but horizontally separated by many hundreds of feet (so 'columns' of tightly packed values).
The assumption is that values vary in the vertical direction significantly more than in the horizontal direction at any given point in time.
I want to perform 3D kriging with that assumption accounted for (as a parameter I can adjust or that is statistically defined - either/or).
I believe the scikit-learn module can do this. If it can, my question is how do I create a discrete cell output? That is, output onto a 3D grid of data with cell dimensions of, say, 50 x 50 x 1 feet. Ideally, I would like an output of [x_location, y_location, value] with separation of those (or similar) distances.
Unfortunately I don't have a lot of time to play around with it, so I'm just hoping to figure out if this is possible in Python before delving into it. Thanks!
Yes, you can definitely do that in scikit-learn.
In fact, it is a basic feature of kriging/Gaussian process regression that you can use anisotropic covariance kernels.
As specified in the manual (quoted below), you can either set the parameters of the covariance yourself or estimate them, and you can choose either to have all parameters equal or all different.
theta0 : double array_like, optional
An array with shape (n_features, ) or (1, ). The parameters in the
autocorrelation model. If thetaL and thetaU are also specified, theta0
is considered as the starting point for the maximum likelihood
estimation of the best set of parameters. Default assumes isotropic
autocorrelation model with theta0 = 1e-1.
In the 2d case, something like this should work:
import numpy as np
from sklearn.gaussian_process import GaussianProcess
x = np.arange(1,51)
y = np.arange(1,51)
X, Y = np.meshgrid(x, y)
points = np.column_stack([obs_x, obs_y])  # Replace with your observed coordinates
values = obs_data # Replace with your observed data
gp = GaussianProcess(theta0=0.1, thetaL=.001, thetaU=1., nugget=0.001)
gp.fit(points, values)
XY_pairs = np.column_stack([X.flatten(), Y.flatten()])
predicted = gp.predict(XY_pairs).reshape(X.shape)
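In current scikit-learn versions GaussianProcess has been replaced by GaussianProcessRegressor, where anisotropy is expressed as a per-dimension length scale. A rough 3-D sketch (obs_xyz and obs_vals are hypothetical arrays of shape (n, 3) and (n,); the length scales and grid extents are placeholders you would tune):
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# A shorter length scale in z than in x/y encodes "varies faster vertically"
kernel = RBF(length_scale=[500.0, 500.0, 5.0], length_scale_bounds=(1e-2, 1e4))
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-3, normalize_y=True)
gpr.fit(obs_xyz, obs_vals)

# Predict onto a regular 50 x 50 x 1 ft output grid
xs = np.arange(0.0, 1000.0, 50.0)
ys = np.arange(0.0, 1000.0, 50.0)
zs = np.arange(0.0, 50.0, 1.0)
XX, YY, ZZ = np.meshgrid(xs, ys, zs, indexing='ij')
grid_pts = np.column_stack([XX.ravel(), YY.ravel(), ZZ.ravel()])
predicted = gpr.predict(grid_pts).reshape(XX.shape)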
I have a number of 2-dimensional np.arrays (all of equal size) containing complex numbers. Each of them belongs to one position in a 4-dimensional space. Those positions are sparse and distributed irregularly (a latin hypercube to be precise).
I would like to interpolate this data to other points in the same 4-dimensional space.
I can successfully do this for scalar values, using either scikit-learn's GaussianProcess (kriging), scipy.interpolate.Rbf(), or others:
import numpy as np
from sklearn import gaussian_process as gp

# array of co-ordinates: 2 4D sets
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
# two numbers, one for each of the points above
Y = np.array([1,
              0])
# define the type of gaussian process I want
kriging = gp.GaussianProcess(theta0=1e-2, thetaL=1e-4, thetaU=4.0,
                             corr='linear', normalize=True, nugget=0.00001,
                             optimizer='fmin_cobyla')
# train the model on the data
kmodel = kriging.fit(X, Y)
# interpolate
kmodel.predict(np.array([0.5, 0.5, 0.0, 0.0]))
# returns: array([ 0.5])
If I try to use arrays (or just complex numbers) as data, this doesn't work:
# two arrays of complex numbers, instead of the numbers
Y = np.array([[1+1j, -1-1j],\
[0+0j, 0+0j]])
kmodel = kriging.fit(X,Y)
# returns: ValueError: The number of features in X (X.shape[1] = 1) should match the sample size used for fit() which is 4.
This is obvious since the docstring for kriging.fit() clearly states that it needs an array of n scalars, one for each element in the first dimension of X.
One solution is to decompose the arrays in Y into individual numbers, those into real and imaginary parts, make a separate interpolation of each of those and then put them together again. I can do this with the right combination of loops and some artistry but it would be nice if there was a method (e.g. in scipy.interpolate) that could handle an entire np.array instead of scalar values.
I'm not fixed on a specific algorithm (yet), so I'd be happy to know about any that can use arrays of complex numbers as the "variable" to be interpolated. Since -- as I said -- there are few and irregular points in space (and no grid to interpolate on), simple linear interpolation won't do, of course.
There are two ways of looking at complex numbers:
1. Cartesian Form ( a + bi ), and
2. Polar/Euler Form ( A * exp(i * phi) )
When you say you want to interpolate between complex values, do you want to interpolate with respect to the real/imaginary components (1), or with respect to the number's magnitude and phase (2)?
You CAN break things down into real and imaginary components,
X = 2 + 5j
X_real = np.real(X)
X_imag = np.imag(X)
# Interpolate the X_real and X_imag
# Reconstruct X
X2 = X_real + 1j * X_imag
However, with real-life applications that involve complex numbers, such as digital filter design, you quite often want to work with numbers in polar/exponential form.
Therefore, instead of interpolating the np.real() and np.imag() components, you may want to break them down into magnitude and phase using np.abs() and np.angle() (or np.arctan2()), and interpolate those separately. You might do this, for example, when trying to interpolate the Fourier transform of a digital filter.
Y = 1+2j
mag = np.abs(Y)
phase = np.angle(Y)
The interpolated values can be converted back into complex (Cartesian) numbers using Euler's formula:
# Complex number
y = mag * np.exp( 1j * phase)
# Or if you want the real and imaginary complex components separately,
realPart, imagPart = mag * np.cos(phase) , mag * np.sin(phase)
Depending on what you're doing, this gives you some real flexibility with the interpolation methods you use.
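As a small illustration of interpolating magnitude and phase separately (a sketch; t, z, and tq are hypothetical 1-D sample positions, complex samples, and query positions; np.unwrap avoids jumps at the +/- pi phase boundary):
import numpy as np

mag = np.interp(tq, t, np.abs(z))
phase = np.interp(tq, t, np.unwrap(np.angle(z)))
z_interp = mag * np.exp(1j * phase)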
I ended up working around the problem, but after learning a good deal more about response surfaces and the like, I now understand that this is a far-from-trivial problem. I could not have expected a simple solution in numpy, and the question would have probably been better placed in a forum on mathematics than on programming.
If I had to tackle such a task again, I'd probably use scikit-learn to try to establish either a co-kriging interpolation for both components, or two separate kriging (or, more generally, Gaussian process) models that share a common set of model constants, optimized to minimize the combined error amplitude (i.e., the full model's squared error is the sum of both partial models' errors)
-- but first I'd go and have a look if there aren't any useful papers on the topic already.
I am following this link to do a smoothing of my data set.
The technique is based on the principle of removing the higher order terms of the Fourier Transform of the signal, and so obtaining a smoothed function.
This is part of my code:
N = len(y)
y = y.astype(float) # fix issue, see below
yfft = fft(y, N)
yfft[31:] = 0.0 # set higher harmonics to zero
y_smooth = fft(yfft, N)
ax.errorbar(phase, y, yerr = err, fmt='b.', capsize=0, elinewidth=1.0)
ax.plot(phase, y_smooth/30, color='black') #arbitrary normalization, see below
However, some things do not work properly.
Indeed, you can check the resulting plot:
The blue points are my data, while the black line should be the smoothed curve.
First of all I had to convert my array of data y by following this discussion.
Second, I just normalized arbitrarily to compare the curve with data, since I don't know why the original curve had values much higher than the data points.
Most importantly, the curve appears mirrored ("specular") relative to the data points, and I don't know why this happens.
It would be great to have some advice, especially on the third point, and more generally on how to optimize the smoothing with this technique for my particular data set shape.
Your problem is probably due to the shifting that the standard FFT does. You can read about it here.
Your data is real, so you can take advantage of symmetries in the FT and use the special function np.fft.rfft
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(40)
y = np.log(x + 1) * np.exp(-x/8.) * x**2 + np.random.random(40) * 15
rft = np.fft.rfft(y)
rft[5:] = 0 # Note, rft.shape = 21
y_smooth = np.fft.irfft(rft)
plt.plot(x, y, label='Original')
plt.plot(x, y_smooth, label='Smoothed')
plt.legend(loc=0)
plt.show()
If you plot the absolute value of rft, you will see that there is almost no information in frequencies beyond 5, so that is why I choose that threshold (and a bit of playing around, too).
Here are the results:
From what I can gather you want to build a low pass filter by doing the following:
1. Move to the frequency domain. (Fourier transform)
2. Remove undesired frequencies.
3. Move back to the time domain. (Inverse Fourier transform)
Looking at your code, instead of doing 3) you're just doing another Fourier transform. Instead, try doing an actual inverse Fourier transform to move back to the time domain:
y_smooth = ifft(yfft, N)
Have a look at scipy.signal to see a bunch of already-available filters.
(Edit: I'd be curious to see the results, do share!)
I would be very cautious in using this technique. By zeroing out frequency components of the FFT you are effectively constructing a brick wall filter in the frequency domain. This will result in convolution with a sinc in the time domain and likely distort the information you want to process. Look up "Gibbs phenomenon" for more information.
You're probably better off designing a low pass filter or using a simple N-point moving average (which is itself a LPF) to accomplish the smoothing.
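For instance, a minimal N-point moving-average sketch (assuming y is the 1-D data array from the question and n_pts is a window length you choose):
import numpy as np

kernel = np.ones(n_pts) / n_pts
y_smooth = np.convolve(y, kernel, mode='same')  # same length as y; edge values are biased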
I need to interpolate temperature data linearly in 4 dimensions (latitude, longitude, altitude and time).
The number of points is fairly high (360x720x50x8) and I need a fast method of computing the temperature at any point in space and time within the data bounds.
I have tried using scipy.interpolate.LinearNDInterpolator but using Qhull for triangulation is inefficient on a rectangular grid and takes hours to complete.
By reading this SciPy ticket, the solution seemed to be implementing a new nd interpolator using the standard interp1d to calculate a higher number of data points, and then use a "nearest neighbor" approach with the new dataset.
This, however, takes a long time again (minutes).
Is there a quick way of interpolating data on a rectangular grid in 4 dimensions without it taking minutes to accomplish?
I thought of using interp1d 4 times without calculating a higher density of points, leaving it to the user to call with the coordinates, but I can't get my head around how to do this.
Otherwise would writing my own 4D interpolator specific to my needs be an option here?
Here's the code I've been using to test this:
Using scipy.interpolate.LinearNDInterpolator:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
coords = np.zeros((len(lats),len(lons),len(alts),len(time),4))
coords[...,0] = lats.reshape((len(lats),1,1,1))
coords[...,1] = lons.reshape((1,len(lons),1,1))
coords[...,2] = alts.reshape((1,1,len(alts),1))
coords[...,3] = time.reshape((1,1,1,len(time)))
coords = coords.reshape((data.size,4))
interpolatedData = LinearNDInterpolator(coords,data)
Using scipy.interpolate.interp1d:
import numpy as np
from scipy.interpolate import interp1d
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
interpolatedData = np.array([None, None, None, None])
interpolatedData[0] = interp1d(lats,data,axis=0)
interpolatedData[1] = interp1d(lons,data,axis=1)
interpolatedData[2] = interp1d(alts,data,axis=2)
interpolatedData[3] = interp1d(time,data,axis=3)
Thank you very much for your help!
In the same ticket you have linked, there is an example implementation of what they call tensor product interpolation, showing the proper way to nest recursive calls to interp1d. This is equivalent to quadrilinear interpolation if you choose the default kind='linear' parameter for your interp1d's.
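A minimal sketch of that nesting (assuming the lats, lons, alts, time, and data arrays defined in the question; each successive interp1d call collapses one axis, so four calls yield a single value):
from scipy.interpolate import interp1d

def tensor_interp(lat, lon, alt, t):
    a = interp1d(lats, data, axis=0)(lat)   # -> shape (nlon, nalt, ntime)
    b = interp1d(lons, a, axis=0)(lon)      # -> shape (nalt, ntime)
    c = interp1d(alts, b, axis=0)(alt)      # -> shape (ntime,)
    return interp1d(time, c)(t)             # -> scalar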
While this may be good enough, it is not linear interpolation, and there will be higher-order terms in the interpolation function, as the figure in the Wikipedia entry on bilinear interpolation shows.
This may very well be good enough for what you are after, but there are applications where a triangulated, truly piecewise-linear interpolation is preferred. If you really need this, there is an easy way of working around the slowness of qhull.
Once LinearNDInterpolator has been set up, there are two steps to coming up with an interpolated value for a given point:
1. figure out inside which triangle (a 4D hypertetrahedron in your case) the point is, and
2. interpolate using the barycentric coordinates of the point relative to the vertices as weights.
You probably do not want to mess with barycentric coordinates, so better leave that to LinearNDInterpolator. But you do know some things about the triangulation. Mostly that, because you have a regular grid, within each hypercube the triangulation is going to be the same. So to interpolate a single value, you could first determine in which subcube your point is, build a LinearNDInterpolator with the 16 vertices of that cube, and use it to interpolate your value:
from itertools import product

def interpolator(coords, data, point):
    # coords is a tuple of the per-axis coordinate arrays, e.g. (lats, lons, alts, time)
    dims = len(point)
    indices = []
    sub_coords = []
    for j in range(dims):
        idx = np.digitize([point[j]], coords[j])[0]
        indices += [[idx - 1, idx]]
        sub_coords += [coords[j][indices[-1]]]
    indices = np.array([j for j in product(*indices)])
    sub_coords = np.array([j for j in product(*sub_coords)])
    sub_data = data[tuple(np.swapaxes(indices, 0, 1))]
    li = LinearNDInterpolator(sub_coords, sub_data)
    return li([point])[0]
>>> point = np.array([12.3,-4.2, 500.5, 2.5])
>>> interpolator((lats, lons, alts, time), data, point)
0.386082399091
This cannot work on vectorized data, since that would require storing a LinearNDInterpolator for every possible subcube, and even though it probably would be faster than triangulating the whole thing, it would still be very slow.
scipy.ndimage.map_coordinates is a nice fast interpolator for uniform grids (all boxes the same size). See multivariate-spline-interpolation-in-python-scipy on SO for a clear description.
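A minimal map_coordinates sketch for a uniform grid (assuming the uniformly spaced lats, lons, alts, time axes and the data array from the question): convert physical coordinates to fractional index coordinates, then interpolate.
import numpy as np
from scipy.ndimage import map_coordinates

def to_index(vals, axis_coords):
    # fractional index, assuming uniform spacing along this axis
    return (np.asarray(vals, dtype=float) - axis_coords[0]) / (axis_coords[1] - axis_coords[0])

# one query point (lat, lon, alt, time); coordinates has shape (ndim, npoints)
query = np.array([[12.3], [-4.2], [500.5], [2.5]])
idx = np.vstack([to_index(query[0], lats),
                 to_index(query[1], lons),
                 to_index(query[2], alts),
                 to_index(query[3], time)])
vals = map_coordinates(data, idx, order=1)  # order=1 -> (quadri)linear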
For non-uniform rectangular grids, a simple wrapper, Intergrid, maps / scales non-uniform to uniform grids, then does map_coordinates.
On a 4d test case like yours it takes about 1 μsec per query:
Intergrid: 1000000 points in a (361, 720, 47, 8) grid took 652 msec
For very similar things I use Scientific.Functions.Interpolation.InterpolatingFunction.
import numpy as np
from Scientific.Functions.Interpolation import InterpolatingFunction
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
axes = (lats, lons, alts, time)
f = InterpolatingFunction(axes, data)
You can now leave it to the user to call the InterpolatingFunction with coordinates:
>>> f(0,0,10,3)
0.7085675631375401
InterpolatingFunction has nice additional features, such as integration and slicing.
However, I do not know for sure whether the interpolation is linear. You would have to look in the module source to find out.
I cannot open that address, and can't find enough information about this package.