3D interpolation of 2 lists in python

Hi, I have two sets of data taken from two separate import files, which are both being imported into Python and have been placed in two separate lists as follows:
list 1 is of the form:
(node, x coordinate, y coordinate, z coordinate)
example list 1: [[1,0,0,0],[2,1,0,0],[3,0,1,0],[4,1,1,0],[5,0,0,1],[6,1,0,1],[7,0,1,1],[8,1,1,1]]
list 2 is in the form:
(x coordinate, y coordinate, z coordinate, temperature)
example list 2: [[0,0,0,100],[1,0,0,90],[0,1,0,85],[1,1,0,110],[0,0,1,115],[1,0,1,118],[0,1,1,100],[1,1,1,96]]
From these two lists I need to use the coordinates to create a third list which contains each node number and its corresponding temperature. This task is a simple dictionary lookup if all the x, y and z coordinates match up; however, with the data I am working with this will not always be the case.
For example, suppose I add a new entry at the end of list 1, node number 9:
new entry at end of list 1 [9, 0.5, 0.9, 0.25]
Now I find myself with a node number with no corresponding temperature. At this point an interpolation will need to be performed on list 2 to give me the temperature related to this node. Through basic 3D interpolation calculations I have worked out that this temperature will be 97.9, so my final output list would look like this:
Output list:
(node, temperature)
Output list: [[1,100],[2,90],[3,85],[4,110],[5,115],[6,118],[7,100],[8,96],[9,97.9]]
I am reasonably new to Python, so I am struggling to find a solution to this interpolation problem. I have been researching how to do this for a number of weeks now and have still not been able to find one.
Any help would be greatly appreciated,
Thanks

There are quite a few interpolation routines in scipy, but above 2 dimensions most of them only offer linear and nearest-neighbour interpolation, which might not be sufficient for your use.
All of the interpolation routines are listed on the interpolation page of the scipy docs. Straight away you can ignore the univariate and the 1D and 2D spline sections; you want the multivariate section.
There are 9 functions here, split into those for unstructured data and those for data on a grid:
Unstructured data:
griddata(points, values, xi[, method, ...]) Interpolate unstructured D-dimensional data.
LinearNDInterpolator(points, values[, ...]) Piecewise linear interpolant in N dimensions.
NearestNDInterpolator(points, values) Nearest-neighbour interpolation in N dimensions.
CloughTocher2DInterpolator(points, values[, tol]) Piecewise cubic, C1 smooth, curvature-minimizing interpolant in 2D.
Rbf(*args) A class for radial basis function approximation/interpolation of n-dimensional scattered data.
interp2d(x, y, z[, kind, copy, ...]) Interpolate over a 2-D grid.
For data on a grid:
interpn(points, values, xi[, method, ...]) Multidimensional interpolation on regular grids.
RegularGridInterpolator(points, values[, ...]) Interpolation on a regular grid in arbitrary dimensions
RectBivariateSpline(x, y, z[, bbox, kx, ky, s]) Bivariate spline approximation over a rectangular mesh.
plus an additional one in the "see also" section, though we'll ignore that.
You should read how each of them works; it might help you understand a little better.
The way these functions work, though, is that you pass them your data, i.e. the x, y, z coords and the corresponding values at those points, and they return a function which lets you get a value at any location.
I would recommend the Rbf function here, as from what I can see it's the only nD option which does not limit you to linear or nearest-neighbour interpolation.
For example, you have two lists:
from scipy.interpolate import Rbf

# node_locations = [(node, x_coord, y_coord, z_coord), ...]
# temp_data = [(x0, y0, z0, temp0), (x1, y1, z1, temp1), ...]

# This will unpack your data into columns, rather than rows.
xs, ys, zs, temps = zip(*temp_data)

rbfi = Rbf(xs, ys, zs, temps)

# I don't know how you want your output data, so I'm just dumping it in a dictionary.
node_data = {}
for node, x, y, z in node_locations:
    node_data[node] = rbfi(x, y, z)
Try something like that.
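As a cross-check on the 97.9 figure in the question: the eight sample points there happen to sit on the corners of a unit cube, and in that special case plain trilinear interpolation is available through RegularGridInterpolator. The sketch below is only an illustration under that assumption (it also reads the last entry of list 2 as [1, 1, 1, 96]); it is not a general replacement for Rbf on scattered data, and Rbf will generally give a different value for node 9.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Example data from the question. Assumption: the last entry of list 2 is
# [1, 1, 1, 96] (the question shows z = 11 there, which looks like a typo).
node_locations = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 0, 1, 0], [4, 1, 1, 0],
                  [5, 0, 0, 1], [6, 1, 0, 1], [7, 0, 1, 1], [8, 1, 1, 1],
                  [9, 0.5, 0.9, 0.25]]
temp_data = [[0, 0, 0, 100], [1, 0, 0, 90], [0, 1, 0, 85], [1, 1, 0, 110],
             [0, 0, 1, 115], [1, 0, 1, 118], [0, 1, 1, 100], [1, 1, 1, 96]]

# Arrange the temperatures on a 2x2x2 grid indexed as grid[ix, iy, iz].
# This only works because every coordinate here is exactly 0 or 1.
axis = np.array([0.0, 1.0])
grid = np.empty((2, 2, 2))
for x, y, z, temp in temp_data:
    grid[int(x), int(y), int(z)] = temp

# The default method is "linear", which is trilinear interpolation in 3D.
interp = RegularGridInterpolator((axis, axis, axis), grid)

# Build the [node, temperature] list asked for in the question.
output = [[node, float(interp([[x, y, z]])[0])] for node, x, y, z in node_locations]
print(output)
```

Nodes 1 to 8 get their tabulated temperatures back, and node 9 comes out very close to the 97.9 quoted in the question.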

For scientific computing, I wouldn't use lists but numpy arrays instead.
So in your case:
import numpy as np
nodes = np.array(example_list_1)
temperatures = np.array(example_list_2)
With this you can then go on to use scipy's interpolation functions, like for example:
http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.griddata.html#scipy.interpolate.griddata
from scipy.interpolate import griddata
interpolated = griddata(temperatures[:, :-1],
temperatures[:, -1],
nodes[:, 1:])
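A fuller sketch of the same idea, using the example lists from the question (with the last entry of list 2 read as [1, 1, 1, 96], which is an assumption). griddata's linear method returns NaN for points outside the convex hull of the samples, so this version falls back to the nearest sample there; that fallback is my own choice and not part of the answer above. Because the linear method works on a simplex triangulation rather than on the cube itself, the value for node 9 need not equal the trilinear 97.9.

```python
import numpy as np
from scipy.interpolate import griddata

example_list_1 = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 0, 1, 0], [4, 1, 1, 0],
                  [5, 0, 0, 1], [6, 1, 0, 1], [7, 0, 1, 1], [8, 1, 1, 1],
                  [9, 0.5, 0.9, 0.25]]
example_list_2 = [[0, 0, 0, 100], [1, 0, 0, 90], [0, 1, 0, 85], [1, 1, 0, 110],
                  [0, 0, 1, 115], [1, 0, 1, 118], [0, 1, 1, 100], [1, 1, 1, 96]]

nodes = np.array(example_list_1, dtype=float)
temperatures = np.array(example_list_2, dtype=float)

points = temperatures[:, :-1]   # (x, y, z) of each sample
values = temperatures[:, -1]    # temperature of each sample
targets = nodes[:, 1:]          # (x, y, z) of each node

linear = griddata(points, values, targets, method="linear")
nearest = griddata(points, values, targets, method="nearest")

# Use the linear result where it exists, otherwise the nearest sample.
filled = np.where(np.isnan(linear), nearest, linear)

output = [[int(node), float(temp)] for node, temp in zip(nodes[:, 0], filled)]
print(output)
```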

Related

2D interpolate list of many points [duplicate]

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in SO inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees work nicely in 2d 3d ..., inverse-distance weighting is smooth and local, and the k= number of nearest neighbours can be varied to tradeoff speed / accuracy.
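A minimal sketch of that combination, assuming the grid has been flattened to an (n, 2) array of sample positions. The function name idw_interpolate and its parameters are made up for illustration, and it uses cKDTree, the compiled variant of KDTree. It also treats the coordinates as planar; for lat/lon data you would want to project first or use a proper great-circle distance.

```python
import numpy as np
from scipy.spatial import cKDTree

def idw_interpolate(xy_known, values, xy_query, k=8, power=2.0, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest samples.

    Hypothetical helper: xy_known is (n, d), values is (n,), xy_query is (m, d).
    """
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    k = min(k, len(xy_known))

    tree = cKDTree(xy_known)
    dist, idx = tree.query(xy_query, k=k)
    # query() drops the neighbour axis when k == 1; restore it.
    dist = dist.reshape(len(xy_query), -1)
    idx = idx.reshape(len(xy_query), -1)

    # Clamp tiny distances so a query sitting on a sample does not divide by zero;
    # that sample then dominates the weighted average.
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * values[idx]).sum(axis=1) / weights.sum(axis=1)

xy_known = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [10.0, 20.0, 30.0, 40.0]
result = idw_interpolate(xy_known, values, [[0.5, 0.5], [0.0, 0.0]], k=4)
print(result)
```

The tree is built once and queried for all target points in a single call, which is what keeps this fast for something like 10,000 query points.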
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of course interpolation onto a regular grid, so it assumes you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":
    power=1
    smoothing=20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
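The double Python loop in that script gets slow for large grids. As a rough sketch (not part of the original script), the same inverse-distance formula can be evaluated with numpy broadcasting; inv_dist_grid is a hypothetical name, and the whole distance array of shape (ysize, xsize, number of samples) is held in memory at once, so this suits a modest number of samples.

```python
import numpy as np

def inv_dist_grid(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    """Vectorized inverse-distance weighting on an integer pixel grid."""
    xv = np.asarray(xv, dtype=float)
    yv = np.asarray(yv, dtype=float)
    values = np.asarray(values, dtype=float)

    # Pixel coordinates, shape (ysize, xsize); the added last axis
    # broadcasts against the sample arrays.
    gx, gy = np.meshgrid(np.arange(xsize), np.arange(ysize))
    dist = np.sqrt((gx[..., None] - xv) ** 2
                   + (gy[..., None] - yv) ** 2
                   + smoothing ** 2)

    # Pixels that coincide with a sample take that sample's value directly,
    # mirroring the early return in the loop version.
    hit = dist < 1e-10
    weights = 1.0 / np.where(hit, 1.0, dist) ** power
    grid = (weights * values).sum(axis=-1) / weights.sum(axis=-1)
    any_hit = hit.any(axis=-1)
    grid[any_hit] = values[hit.argmax(axis=-1)[any_hit]]
    return grid

xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
ZI = inv_dist_grid(xv, yv, values, 100, 100, power=1, smoothing=20)
print(ZI.shape)
```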
There's a bunch of options here; which one is best will depend on your data.
However, I don't know of an out-of-the-box solution for you.
You say your input data is from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
3. Unstructured data in tripolar space projected into 2d LAT LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that map from tripolar space to cover your sample point. (You can use a BSP or grid type structure to speed up this search.) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
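The Delaunay option mentioned above is roughly what scipy's LinearNDInterpolator does: it triangulates the scattered points and interpolates linearly inside each triangle. The sketch below uses synthetic points and a planar test field (both invented for illustration), because linear interpolation should reproduce a plane exactly, which makes the result easy to check.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator

rng = np.random.default_rng(0)

# Scattered sample positions; the four corners are added so the convex
# hull is known to cover the whole [0, 10] x [0, 10] square.
corners = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
pts = np.vstack([corners, rng.uniform(0.0, 10.0, size=(200, 2))])

# A planar test field, T = 2x + 3y.
vals = 2.0 * pts[:, 0] + 3.0 * pts[:, 1]

# Builds a Delaunay triangulation of pts internally.
interp = LinearNDInterpolator(pts, vals)

query = np.array([[5.0, 5.0], [2.5, 7.5], [20.0, 20.0]])
result = interp(query)
print(result)
```

The last query point lies outside the convex hull of the samples, so it comes back as NaN (the default fill_value); the first two should match 2x + 3y up to rounding.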
I suggest taking a look at the interpolation features of GRASS, an open source GIS package (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
[image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png]
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

How to plot an array as if the indices i,j were the x,y coordinates?

Hi guys, first question here. I looked for an answer but could not find anything, so I will try to give it my best.
I am currently working on a problem in the field of Computational Physics and I am solving the Navier-Stokes equations numerically using the Finite Difference Method. It's my first time working with Python (using a Google Colaboratory notebook with Python 3). I am solving the equations for a grid of points in a two-dimensional plane. I created this grid using np.arrays
import numpy as np
import matplotlib.pyplot as plt
N = 10
data = np.zeros((N,N))
and then manipulating it. For example
for i in range(N):
    for j in range(N):
        data[i,j] = i
which makes the values of the array increase with index i. However, if I plot my data-array now using
x = np.arange(N)
y = np.arange(N)
plt.contourf(x, y, data)
plt.colorbar()
The result of the example:
It shows that the plotted data increases along the y-axis even though my manipulation of the array should make it increase along the x-axis.
I noticed this happens because the indexing of arrays (i,j) is different from the standard orientation of x- and y-axis, but how can I plot my data-array as if i=x and j=y?
You can use numpy's ndindex function to get the indices based on shape and then unzip the result.
x,y=list(zip(*np.ndindex((N,N))))
The data is indexed row by column, and the same layout can be obtained with meshgrid. If you're interested in the same manipulation, you can make the data with meshgrid as
dx,dy=np.meshgrid(np.arange(N),np.arange(N))
And then plot dx to get variation along the x axis (with meshgrid's default 'xy' indexing, dx varies along the columns and dy along the rows).
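To make the index convention concrete, here is a small sketch of two ways to treat i as x and j as y: transposing the array, or building the coordinate arrays with indexing="ij". The plotting calls are left as comments so the snippet runs without matplotlib; the variable names Z, X and Y are only for illustration.

```python
import numpy as np

N = 10
data = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        data[i, j] = i          # increases with the first index i

x = np.arange(N)
y = np.arange(N)

# contourf(x, y, Z) reads Z[row, col] as Z[y_index, x_index], so transposing
# makes the first index i run along the x axis.
Z = data.T

# Alternatively, build 2-D coordinate arrays that follow the (i, j) layout:
# X[i, j] = x[i] and Y[i, j] = y[j].
X, Y = np.meshgrid(x, y, indexing="ij")

# With matplotlib imported as plt, either of these plots i along x:
#   plt.contourf(x, y, data.T)
#   plt.contourf(X, Y, data)
print(Z[3, 7], X[7, 3])
```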

Fast 3D interpolation of atmospheric data in Numpy/Scipy

I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
import numpy as np
from scipy import interpolate

def interpLevel(grid,value,data,interp='linear'):
    """
    Interpolate 3d data to a common z coordinate.

    Can be used to calculate the wind/pv/whatsoever values for a common
    potential temperature / pressure level.

    grid : numpy.ndarray
        The grid. For example the potential temperature values for the whole 3d
        grid.
    value : float
        The common value in the grid, to which the data shall be interpolated.
        For example, 350.0
    data : numpy.ndarray
        The data which shall be interpolated. For example, the PV values for
        the whole 3d grid.
    interp : str
        This indicates which kind of interpolation will be done. It is directly
        passed on to scipy.interpolate.interp1d().

    returns : numpy.ndarray
        A 2d array containing the *data* values at *value*.
    """
    ret = np.zeros_like(data[0,:,:])
    for yIdx in range(grid.shape[1]):
        for xIdx in range(grid.shape[2]):
            # check if we need to flip the column
            if grid[0,yIdx,xIdx] > grid[-1,yIdx,xIdx]:
                ind = -1
            else:
                ind = 1
            f = interpolate.interp1d(grid[::ind,yIdx,xIdx],
                                     data[::ind,yIdx,xIdx],
                                     kind=interp)
            ret[yIdx,xIdx] = f(value)
    return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course.) For a grid your size, this will be extremely fast (as in tiny fractions of a second).
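A sketch of the snippet above wrapped into a function with the range check added. It assumes the vertical coordinate is the same in every column and sorted in ascending order; interp_to_level and the test data are invented for illustration.

```python
import numpy as np

def interp_to_level(grid_z, data, value):
    """Linearly interpolate data of shape (nz, ny, nx) to the level z == value.

    grid_z is the shared 1-D vertical coordinate, assumed ascending.
    """
    grid_z = np.asarray(grid_z, dtype=float)
    if not (grid_z[0] <= value <= grid_z[-1]):
        raise ValueError("value lies outside the vertical range of the grid")

    # Index of the first level >= value, clamped so that idx - 1 is valid.
    idx = int(np.searchsorted(grid_z, value))
    idx = min(max(idx, 1), len(grid_z) - 1)

    lower, upper = grid_z[idx - 1], grid_z[idx]
    s = (value - lower) / (upper - lower)
    return (1 - s) * data[idx - 1] + s * data[idx]

# Non-uniform levels and a field that is linear in z, so the answer is known.
zs = np.array([0.0, 1.0, 3.0, 6.0])
base = np.arange(12.0).reshape(3, 4)
data = 2.0 * zs[:, None, None] + base      # shape (4, 3, 4)
result = interp_to_level(zs, data, 2.0)
print(result)
```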
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data be uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights doing something like follows:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx], weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
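Putting the pieces of the weights trick together, here is a small self-contained check on synthetic data (the level values and array sizes are made up). It compares the plane obtained from the precomputed weights against calling interp1d directly along the vertical axis, which should agree because the spline interpolation is linear in the data.

```python
import numpy as np
from scipy.interpolate import interp1d

rng = np.random.default_rng(0)

# Shared, non-uniform vertical levels and a random (nz, ny, nx) data cube.
zs = np.array([0.0, 1.0, 2.5, 4.0, 7.0, 10.0])
data = rng.normal(size=(zs.size, 4, 5))

# Interpolating the rows of the identity matrix yields, for any target
# level, the weight each original level contributes to the result.
weight_func = interp1d(zs, np.identity(zs.size), kind="cubic")

value = 3.3
weights = weight_func(value)               # shape (nz,)
plane = np.inner(data.T, weights).T        # shape (ny, nx)

# Reference: cubic interpolation of every column done directly.
reference = interp1d(zs, data, kind="cubic", axis=0)(value)
print(np.abs(plane - reference).max())
```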
This is quite an old question, but the best way to do this nowadays is to use MetPy's interpolate_1d function:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)

Multivariate spline interpolation in python/scipy?

Is there a library module or other straightforward way to implement multivariate spline interpolation in python?
Specifically, I have a set of scalar data on a regularly-spaced three-dimensional grid which I need to interpolate at a small number of points scattered throughout the domain. For two dimensions, I have been using scipy.interpolate.RectBivariateSpline, and I'm essentially looking for an extension of that to three-dimensional data.
The N-dimensional interpolation routines I have found are not quite good enough: I would prefer splines over LinearNDInterpolator for smoothness, and I have far too many data points (often over one million) for, e.g., a radial basis function to work.
If anyone knows of a python library that can do this, or perhaps one in another language that I could call or port, I'd really appreciate it.
If I'm understanding your question correctly, your input "observation" data is regularly gridded?
If so, scipy.ndimage.map_coordinates does exactly what you want.
It's a bit hard to understand at first pass, but essentially, you just feed it a sequence of coordinates that you want to interpolate the values of the grid at in pixel/voxel/n-dimensional-index coordinates.
As a 2D example:
import numpy as np
from scipy import ndimage
import matplotlib.pyplot as plt
# Note that the output interpolated coords will be the same dtype as your input
# data. If we have an array of ints, and we want floating point precision in
# the output interpolated points, we need to cast the array as floats
data = np.arange(40).reshape((8,5)).astype(float)
# I'm writing these as row, column pairs for clarity...
coords = np.array([[1.2, 3.5], [6.7, 2.5], [7.9, 3.5], [3.5, 3.5]])
# However, map_coordinates expects the transpose of this
coords = coords.T
# The "mode" kwarg here just controls how the boundaries are treated
# mode='nearest' is _not_ nearest neighbor interpolation, it just uses the
# value of the nearest cell if the point lies outside the grid. The default is
# to treat the values outside the grid as zero, which can cause some edge
# effects if you're interpolating points near the edge
# The "order" kwarg controls the order of the splines used. The default is
# cubic splines, order=3
zi = ndimage.map_coordinates(data, coords, order=3, mode='nearest')
row, column = coords
nrows, ncols = data.shape
im = plt.imshow(data, interpolation='nearest', extent=[0, ncols, nrows, 0])
plt.colorbar(im)
plt.scatter(column, row, c=zi, vmin=data.min(), vmax=data.max())
for r, c, z in zip(row, column, zi):
    plt.annotate('%0.3f' % z, (c,r), xytext=(-10,10), textcoords='offset points',
                 arrowprops=dict(arrowstyle='->'), ha='right')
plt.show()
To do this in n-dimensions, we just need to pass in the appropriate sized arrays:
import numpy as np
from scipy import ndimage
data = np.arange(3*5*9).reshape((3,5,9)).astype(float)
coords = np.array([[1.2, 3.5, 7.8], [0.5, 0.5, 6.8]])
zi = ndimage.map_coordinates(data, coords.T)
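Since map_coordinates works in index space, points given in physical units have to be converted first. For an evenly spaced grid that is just (point - origin) / spacing per axis. The sketch below assumes such a grid, with made-up axes and a linear test field, and uses order=1 so the result can be checked exactly.

```python
import numpy as np
from scipy import ndimage

# Evenly spaced axes of a regular 3D grid (hypothetical values).
x = np.linspace(0.0, 2.0, 5)      # spacing 0.5
y = np.linspace(10.0, 20.0, 11)   # spacing 1.0
z = np.linspace(-1.0, 1.0, 9)     # spacing 0.25
X, Y, Z = np.meshgrid(x, y, z, indexing="ij")

# A field that is linear in x, y and z, so linear interpolation is exact.
data = 1.0 + 2.0 * X - 0.5 * Y + 4.0 * Z

# Points in physical coordinates, one (x, y, z) triple per row.
points = np.array([[0.3, 12.7, 0.1],
                   [1.9, 19.2, -0.6]])

origin = np.array([x[0], y[0], z[0]])
spacing = np.array([x[1] - x[0], y[1] - y[0], z[1] - z[0]])

# Fractional indices, transposed to the (ndim, npoints) layout expected.
index_coords = ((points - origin) / spacing).T

zi = ndimage.map_coordinates(data, index_coords, order=1, mode="nearest")
expected = 1.0 + 2.0 * points[:, 0] - 0.5 * points[:, 1] + 4.0 * points[:, 2]
print(zi, expected)
```

Switching to order=3 gives the smoother cubic spline result; it will no longer be exactly equal to a hand-computed value except for simple fields like this one.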
As far as scaling and memory usage go, map_coordinates will create a filtered copy of the array if you're using an order > 1 (i.e. not linear interpolation). If you just want to interpolate at a very small number of points, this is a rather large overhead. It doesn't increase with the number of points you want to interpolate at, however. As long as you have enough RAM for a single temporary copy of your input data array, you'll be fine.
If you can't store a copy of your data in memory, you can either a) specify prefilter=False and order=1 and use linear interpolation, or b) replace your original data with a filtered version using ndimage.spline_filter, and then call map_coordinates with prefilter=False.
Even if you have enough RAM, keeping the filtered dataset around can be a big speedup if you need to call map_coordinates multiple times (e.g. interactive use, etc).
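A sketch of option b), filtering once and reusing the result. I use mode="mirror" for both calls here on the assumption that it matches the boundary handling spline_filter applies by default; with other modes the prefiltered and direct results may differ near the edges. The data and coordinates are synthetic.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 5, 9))

# Two query points in index coordinates, laid out as (ndim, npoints).
coords = np.array([[1.2, 3.5, 7.8],
                   [0.5, 0.5, 6.8]]).T

# Do the (comparatively expensive) spline prefilter once and keep it.
filtered = ndimage.spline_filter(data, order=3)

# Later calls skip the prefilter and work on the stored coefficients.
zi_fast = ndimage.map_coordinates(filtered, coords, order=3,
                                  mode="mirror", prefilter=False)

# Reference: let map_coordinates do the filtering itself.
zi_ref = ndimage.map_coordinates(data, coords, order=3, mode="mirror")
print(zi_fast, zi_ref)
```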
Smooth spline interpolation in dim > 2 is difficult to implement, and so there are not many freely available libraries able to do that (in fact, I don't know any).
You can try inverse distance weighted interpolation, see: Inverse Distance Weighted (IDW) Interpolation with Python .
This should produce reasonably smooth results, and scale better than RBF to larger data sets.

Python griddata meshgrid

In Python I want to interpolate some data using scipy.interpolate.griddata(x,y,z,xi,yi).
Since I want to map my unequally spaced original data on the X-Y grid onto an equally spaced XI-YI grid, I have to use a meshgrid:
X, Y = numpy.meshgrid([1,2,3], [2,5,6,8])
XI,YI = numpy.meshgrid([1,2,3],[4,5,6,7])
print scipy.interpolate.griddata(X,Y,X**2+Y**2,XI,YI)
Unfortunately it seems that scipy's griddata does not accept matrices as input for x, y, z, in contrast to MATLAB's griddata function. Does anyone have a hint for how to solve this problem?
The correct call sequence in your case is
print(scipy.interpolate.griddata((X.ravel(),Y.ravel()), (X**2+Y**2).ravel(), (XI, YI)))
I.e., you need to cast the input data points to 1-d. (This could be fixed to work without the .ravel()s in the next version of Scipy.)
I think you need to reshape your grids; griddata expects a list of points with coordinates in column form:
import numpy as np
import scipy.interpolate
points = np.transpose(np.reshape((X,Y), (2,12)))
pointsI = np.transpose(np.reshape((XI,YI), (2,12)))
Z = np.reshape(X**2+Y**2, 12)
print(scipy.interpolate.griddata(points, Z, pointsI))
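For reference, a complete runnable version of the example from the question, written against the (points, values, xi) signature. Passing the target grid as the tuple (XI, YI) returns an array with the same shape as XI, so no reshaping of the result is needed. The variable names points and ZI are my own.

```python
import numpy as np
from scipy.interpolate import griddata

# Unequally spaced source grid and equally spaced target grid.
X, Y = np.meshgrid([1, 2, 3], [2, 5, 6, 8])
XI, YI = np.meshgrid([1, 2, 3], [4, 5, 6, 7])
Z = X**2 + Y**2

# griddata wants the sample positions as an (npoints, ndim) array
# and the sample values as a flat (npoints,) array.
points = np.column_stack([X.ravel(), Y.ravel()])
ZI = griddata(points, Z.ravel(), (XI, YI), method="linear")
print(ZI)
```

Target points that coincide with source points (the y = 5 and y = 6 rows here) should reproduce the source values; the other rows are linear interpolations between neighbouring samples, so they only approximate x**2 + y**2.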
