So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say, temperature T(y,x), for some range of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. Kd-trees work nicely in 2d, 3d, ..., inverse-distance weighting is smooth and local, and the k = number of nearest neighbours can be varied to trade off speed / accuracy.
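For instance, a rough sketch of that combination (synthetic LAT/LON/T arrays stand in for the question's grids; for real use you would first project lat/lon to a sensible planar or 3-D metric):

import numpy as np
from scipy.spatial import cKDTree

# Synthetic stand-ins for the question's LAT(y,x), LON(y,x), T(y,x) grids
LON, LAT = np.meshgrid(np.linspace(-180, 180, 400), np.linspace(-80, 90, 400))
T = np.cos(np.radians(LAT)) + 0.1 * np.sin(np.radians(LON))

# Target points (the question's lat1(t), lon1(t))
rng = np.random.default_rng(0)
lat1 = rng.uniform(-80, 90, 10000)
lon1 = rng.uniform(-180, 180, 10000)

# Build a kd-tree on the source points and query the k nearest neighbours
tree = cKDTree(np.column_stack([LON.ravel(), LAT.ravel()]))
dist, idx = tree.query(np.column_stack([lon1, lat1]), k=8)

# Inverse-distance weights (power 2); guard against zero distances
w = 1.0 / np.maximum(dist, 1e-12) ** 2
T1 = (w * T.ravel()[idx]).sum(axis=1) / w.sum(axis=1)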
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This interpolates to a regular grid, of course, so you would first need to project the data to a pixel grid with pyproj or something similar, being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x,y,power,smoothing,xv,yv,values):
    nominator=0
    denominator=0
    for i in range(0,len(values)):
        dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing)
        #If the point is really close to one of the data points, return the data point value to avoid singularities
        if(dist<0.0000000001):
            return values[i]
        nominator=nominator+(values[i]/pow(dist,power))
        denominator=denominator+(1/pow(dist,power))
    #Return NODATA if the denominator is zero
    if denominator > 0:
        value = nominator/denominator
    else:
        value = -9999
    return value

def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
    valuesGrid = np.zeros((ysize,xsize))
    for x in range(0,xsize):
        for y in range(0,ysize):
            valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
    return valuesGrid

if __name__ == "__main__":
    power=1
    smoothing=20
    #Creating some data, with each coordinate and the values stored in separate lists
    xv = [10,60,40,70,10,50,20,70,30,60]
    yv = [10,20,30,30,40,50,60,70,80,90]
    values = [1,2,2,3,4,6,7,7,8,10]
    #Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    #Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv,yv,values,100,100,power,smoothing)
    # Plotting the result
    n = plt.Normalize(0.0, 100.0)  # plt.normalize was removed; Normalize is the current name
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data comes from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for cases 1 and 2 is to search for the cell that maps from tripolar space to cover your sample point. (You can use a BSP or grid-type structure to speed up this search.) Pick one of the cells and interpolate inside it.
Finally, there's a heap of unstructured interpolation options... but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh.
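For the Delaunay route specifically, scipy.interpolate.griddata with method='linear' builds the triangulation and interpolates linearly inside each triangle; a rough sketch with random stand-in data:

import numpy as np
from scipy.interpolate import griddata

# Scattered source points (random here, standing in for the projected tripolar grid)
rng = np.random.default_rng(0)
src_xy = rng.uniform(0, 100, size=(2000, 2))
src_val = np.sin(src_xy[:, 0] / 10) * np.cos(src_xy[:, 1] / 10)

# Target points to interpolate onto
tgt_xy = rng.uniform(0, 100, size=(10000, 2))

# method='linear' Delaunay-triangulates the source points and interpolates
# linearly within each triangle; points outside the hull come back as NaN
tgt_val = griddata(src_xy, src_val, tgt_xy, method='linear')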
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest taking a look at the interpolation features of GRASS (an open-source GIS package): http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html. It's not in Python, but you can reimplement it or interface with the C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I have data from distinct curves, and want to fit each of them individually. However, the data is mixed into a single array, so first I believe I need a way to separate the data.
I know that each of the individual curves is from the family A/x+B. As of now I cut out each of the curves by hand and curve fit them, but I would like to automate this process and have the computer separate these curves and fit them. I attempted to use machine learning, but didn't know where to start or what packages to use. I am using Python, but can also use C++; in fact, I hope to transfer it to C++ by the end. Where do you think I should start? Is it worth it to use unsupervised machine learning, or is there a better way to separate the data?
The expected curves:
An example of the data
Well, you sure do have an interesting problem.
I see that there are curves with Y-axis values that are considerably larger than the rest. I would simply take the N values with the largest Y-axis values, fit them to an exponential decay curve (or that other curve you mention), take the points that best fit that curve, and leave the other points alone.
Except...
This is a terrible way to extrapolate data. Doing this, you are cherry-picking the data you want. This is falsifying information and is very bad.
Your best bet is to create a single curve that all points fit to, if you cannot isolate all of those points into separate curves with external information.
But...
We do know some information: a valid function must have only 1 output given a single input.
If the X-axis is discrete, this means you can create a lookup table of outputs given the input. This allows you to count how many curves are associated with a specific X-value (which could be a time unit). In other words, you have to have external information to separate points locally. You can then reorder the points in increasing Y-value, and now you have your separate curves defined in discrete points.
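A rough sketch of that lookup-table idea (toy data; it assumes the X values are truly discrete and that, at each X, sorting by Y puts every curve in the same order):

from collections import defaultdict

# Toy samples: several (x, y) pairs, possibly more than one curve per x value
samples = [(1, 10.0), (1, 3.0), (2, 5.0), (2, 1.5), (3, 3.3), (3, 1.0)]

# Group the y values by their (discrete) x value ...
by_x = defaultdict(list)
for x, y in samples:
    by_x[x].append(y)

# ... then sort each group; the k-th smallest y at every x is assigned to curve k
curves = defaultdict(list)
for x in sorted(by_x):
    for k, y in enumerate(sorted(by_x[x])):
        curves[k].append((x, y))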
Basically, this is an unsolvable problem in the general sense, but in your specific application, there might be extra rules that further define the domain and range such that you can do data filtering.
One more thing...
I am making these statements with the assumption that the (X,Y) values are floats that cannot maintain accuracy after some mathematical operations.
If you are using things like unum numbers, you might be able to keep enough information in the decimal such that your fitting functions can differentiate between points without extra filtering.
This case is more of a hope than anything, as adopting a new number representation to get more accuracy to isolate sampled points is a stretch at best.
Just for completeness, there are some mathematical libraries that might help you.
Boost.uBLAS
Eigen
LAPACK++
Hopefully, I have given you enough information to allow you to solve your problem.
I extracted data from the plot for analysis. Here is example code that loads, separates, fits and plots the three data sets. It works when the separate data files are appended into a single text file.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

##########################################################
# data load and separation section
datafilename = 'temp.dat'
textdata = open(datafilename, 'rt').read()

xLists = [[], [], []]
yLists = [[], [], []]

previousY = 0.0 # initialize
whichList = -1 # initialize

datalines = textdata.split('\n')
for line in datalines:
    if not line: # allow for blank lines in data file
        continue
    spl = line.split()
    x = float(spl[0])
    y = float(spl[1])
    if y > previousY + 50.0: # this separator must be greater than max noise
        whichList += 1
    previousY = y
    xLists[whichList].append(x)
    yLists[whichList].append(y)

##########################################################
# curve fitting section
def func(x, a, b):
    return a / x + b

parameterLists = []
for curveIndex in range(len(xLists)):
    # these are the same as the scipy defaults
    initialParameters = numpy.array([1.0, 1.0])

    xData = numpy.array(xLists[curveIndex], dtype=float)
    yData = numpy.array(yLists[curveIndex], dtype=float)

    # curve fit the test data
    fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

    parameterLists.append(fittedParameters)

##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    for curveIndex in range(len(xLists)):
        # first the raw data as a scatter plot
        axes.plot(xLists[curveIndex], yLists[curveIndex], 'D')

        # create data for each fitted equation plot
        xModel = numpy.linspace(min(xLists[curveIndex]), max(xLists[curveIndex]))
        yModel = func(xModel, *parameterLists[curveIndex])

        # now the model as a line plot
        axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
The idea:
Create N naive, easy-to-calculate, sufficiently precise (for clustering) approximations. Then "classify" each data point to the closest such approximation.
This is done like this:
The approximations are analytical approximations using these two equations I derived (for the model y = A/(x + B), matching the code below):

A = (y1 * y2 * (x1 - x2)) / (y2 - y1)
B = A / y1 - x1

where (x1,y1) and (x2,y2) are coordinates of two points on the curve.
To get these two points I assumed that (1) the first points (according to the x-axis) are distributed equally between the different real curves, and (2) the first 2 points of each real curve are all smaller or all bigger than the first 2 points of every other real curve. Thus sorting them and dividing them into N groups will successfully cluster the first 2N points. If these assumptions are false, you can still manually classify the first 2 points of each real curve and the rest will be classified automatically (this is actually the first approach I implemented).
Then cluster the rest of the points to each point's closest approximation, "closest" meaning with the smallest error.
Edit: A stronger approach for the initial approximation could be to calculate A and B for a couple of pairs of points and use their mean A and B as the approximation, and maybe even do K-means on these points/approximations.
The Code:
import numpy as np
import matplotlib.pyplot as plt
# You should probably edit this variable
NUM_OF_CURVES = 4
# <data> should be a 1-D array containing the Y values of the series
# <x_of_data> should be a 1-D array containing the corresponding X values of the series
data, x_of_data = np.loadtxt('...')
# clustering of first 2*num_of_curves points
# I started at NUM_OF_CURVES instead of 0 because my xs started at 0.
# The range (0:NUM_OF_CURVES*2) will probably be better for you.
raw_data = data[NUM_OF_CURVES:NUM_OF_CURVES*3]
raw_xs = x_of_data[NUM_OF_CURVES:NUM_OF_CURVES*3]
sort_ind = np.argsort(raw_data)
Y = raw_data[sort_ind].reshape(NUM_OF_CURVES,-1).T
X = raw_xs[sort_ind].reshape(NUM_OF_CURVES,-1).T
# approximation of A and B for each curve
A = ((Y[0]*Y[1])*(X[0]-X[1]))/(Y[1]-Y[0])
B = (A / Y[0]) - X[0]
# creating approximating curves
f = []
for i in range(NUM_OF_CURVES):
    f.append(A[i]/(x_of_data+B[i]))
curves = np.vstack(f)
# clustering the points to the approximating curves
raw_clusters = [[] for _ in range(NUM_OF_CURVES)]
for i in range(len(data)):
    raw_clusters[np.abs(curves[:,i]-data[i]).argmin()].append((x_of_data[i],data[i]))
# changing the clusters to np.arrays of the shape (2,-1)
# where row 0 contains the X coordinates and row 1 the Y coordinates
clusters = []
for i in range(len(raw_clusters)):
    clusters.append(np.array(list(zip(*raw_clusters[i]))))
Example:
raw series:
separated series:
I'm working on a project to calculate the centroid of a state/country using python.
What I have done so far:
Take an outline of the state and run it through ImageJ to create a csv of the x,y coordinates of the border. This gives me a .csv file with data like this:
556,243
557,243
557,250
556,250
556,252
555,252
555,253
554,253
etc, etc,
For about 2500 data points.
Import this list into a Python script.
Calculate the average of the x and y coordinate arrays. This point is the centroid. (Idea similar to this)
Plot the points and the centroid using matplotlib.
Here is my code:
#####################################################
# Imports #
#####################################################
import csv
import matplotlib.pyplot as plt
import numpy as np
import pylab
#####################################################
# Setup #
#####################################################
#Set empty list for coordinates
x,y =[],[]
#Importing csv data
with open("russiadata.csv", "r") as russiadataFile:
russiadataReader = csv.reader(russiadataFile)
#Create list of points
russiadatalist = []
#Import data
for row in russiadataReader:
#While the rows have data, AKA length not equal to zero.
if len(row) != 0:
#Append data to arrays created above
x.append(float(row[0]))
y.append(float(row[1]))
#Close file as importing is done
russiadataFile.closejust flipped around the
#####################################################
# Data Analysis #
#####################################################
#Convert list to array for computations
x=np.array(x)
y=np.array(y)
#Calculate number of data points
x_len=len(x)
y_len=len(y)
#Set sum of points equal to x_sum and y_sum
x_sum=np.sum(x)
y_sum=np.sum(y)
#Calculate centroid of points
x_centroid=x_sum/x_len
y_centroid=y_sum/y_len
#####################################################
# Plotting #
#####################################################
#Plot all points in data
plt.xkcd()
plt.plot(x,y, "-.")
#Plot centroid and label it
plt.plot(x_centroid,y_centroid,'^')
plt.ymax=max(x)
#Add axis labels
plt.xlabel("X")
plt.ylabel("Y")
plt.title("russia")
#Show the plot
plt.show()
The problem I have run into is that some sides of the state have more points than others, so the centroid is being weighted towards areas with more points. This is not what I want. I'm trying to find the centroid of the polygon that has vertices from the x,y coordinates.
This is what my plot looks like:
https://imgur.com/a/ZdukA
As you can see, the centroid is weighted more towards the section of points with more density. (As a side note, yes, that is Russia. I'm having issues with the plot coming out backwards and stretched/squashed.)
In other words, is there a more accurate way to get the centroid?
Thanks in advance for any help.
It sounds to me like you don't want your centroid to be calculated with the density of the scatter in mind.
If you just want to use surface area, then I would eliminate any point that is contained within the current outline of the scatter. A slightly more accurate way might be to pretend there is a box outlined by your outer-most points, then to check the x- and y-coordinates of all of your points and eliminate any that fall inside of the box. Any points that fall inside the current outline are not contributing to the shape, only the density.
I think the most technical and accurate approach would be very complicated, and here's what I think it would require: to get the outer-most points to connect based on least distance from each other, and furthest distance from all other points. By "connect" I mean to pretend that a line passes through, and ends at, both points. It should be defined mathematically.
Then, for each point, calculate whether or not it falls inside or outside of this outline, and eliminate all that fall inside (they are redundant as they are already inside the shape).
You can find the correct formula for a closed polygon on Wikipedia: https://en.wikipedia.org/wiki/Centroid#Centroid_of_a_polygon
Another formula is helpful to deal with Kaliningrad oblast (exclave) and islands (if you want to be really precise): https://en.wikipedia.org/wiki/Centroid#By_geometric_decomposition
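For reference, a minimal sketch of the closed-polygon centroid formula from the first link; it assumes the vertices (the question's x, y lists) are ordered around the boundary:

import numpy as np

def polygon_centroid(x, y):
    # Shoelace-based centroid of a closed, non-self-intersecting polygon.
    # The polygon is closed automatically (last vertex connects back to the first).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_next = np.roll(x, -1)
    y_next = np.roll(y, -1)
    cross = x * y_next - x_next * y
    area = cross.sum() / 2.0
    cx = ((x + x_next) * cross).sum() / (6.0 * area)
    cy = ((y + y_next) * cross).sum() / (6.0 * area)
    return cx, cy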
That said, such questions are probably a better fit for https://math.stackexchange.com
First, a bit of background:
I am using spherical harmonics as an example of a function on the surface of a sphere like the front spheres in this image:
I produced one of these spheres, coloured according to the value of the harmonic function at points on its surface. I do this first for a very large number of points, so my function is very accurate. I've called this my fine sphere.
Now that I have my fine sphere, I take a relatively small number of points on the sphere. These are the points I wish to interpolate from, the training data, and I call them interp points. Here are my interp points, coloured to their values, plotted on my fine sphere.
Now, the goal of the project is to use these interp points to train a SciPy Radial Basis Function to interpolate my function on the sphere. I was able to do this using:
# Train the interpolation using interp coordinates
rbf = Rbf(interp.phi, interp.theta, harmonic13_coarse)
# The result of the interpolation on fine coordinates
interp_values = rbf(fine.phi, fine.theta)
Which produced this interpolation, plotted on the sphere:
Hopefully, through this last image, you can see my problem. Notice the line running through the interpolation? This is because the interpolation data has a boundary. The boundary is because I trained the radial basis function using spherical coordinates (boundaries at [0,pi] and [0,2pi]).
rbf = Rbf(interp.phi, interp.theta, harmonic13_coarse)
My goal, and why I'm posting this problem, is to interpolate my function on the surface of the sphere using the x,y,z Cartesian coordinates of the data on the sphere. This way, since spheres don't have boundaries, I won't have this boundary error like I do in spherical coordinates. However, I just can't figure out how to do this.
I've tried simply giving the Rbf function the x,y,z coordinates and the value of the function.
rbf=Rbf(interp.x, interp.y, interp.z, harmonic13_coarse)
interp_values=rbf(fine.x,fine.y,fine.z)
But NumPy throws me a Singular Matrix Error
numpy.linalg.linalg.LinAlgError: singular matrix
Is there any way for me to give Rbf my data sites in Cartesian coordinates, with the function values at each site, and have it behave like it does with spherical coordinates but without those boundaries? From the Rbf documentation, there is the attribute norm for defining a different distance norm; would I have to use a spherical distance to get this to work?
I'm pretty much stumped on this. Let me know if you have any ideas for interpolating my function on a sphere without the boundaries of spherical coordinates.
Here is my code in full:
import matplotlib.pyplot as plt
from matplotlib import cm, colors
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
from scipy import special
from scipy.interpolate import Rbf
from collections import namedtuple
from mayavi import mlab
# Nice aliases
pi = np.pi
cos = np.cos
sin = np.sin
# Creating a sphere in Cartesian and Spherical coordinates
# Saves coordinates as named tuples
def coordinates(r, n):
    phi, theta = np.mgrid[0:pi:n, 0:2 * pi:n]
    Coor = namedtuple('Coor', 'r phi theta x y z')
    r = r
    x = r * sin(phi) * cos(theta)
    y = r * sin(phi) * sin(theta)
    z = r * cos(phi)
    return Coor(r, phi, theta, x, y, z)
# Creating a sphere
# fine is coordinates on a fine grid
# interp is coordinates on coarse grid for training interpolation
fine = coordinates(1, 100j)
interp = coordinates(1, 5j)
# Defining function to colour sphere
# Here we are using a spherical harmonic
def harmonic(m, n, theta, phi):
    return special.sph_harm(m, n, theta, phi).real
norm = colors.Normalize()
# One example of the harmonic function, for testing
harmonic13_fine = harmonic(1, 3, fine.theta, fine.phi)
harmonic13_coarse = harmonic(1, 3, interp.theta, interp.phi)
# Train the interpolation using interp coordinates
rbf = Rbf(interp.phi, interp.theta, harmonic13_coarse)
# The result of the interpolation on fine coordinates
interp_values = rbf(fine.phi, fine.theta)
rbf=Rbf(interp.x, interp.y, interp.z, harmonic13_coarse)
interp_values=rbf(fine.x,fine.y,fine.z)
#Figure of harmonic function on sphere in fine coordinates
#Points3d showing interpolation training points coloured to their value
mlab.figure()
vmax, vmin = np.max(harmonic13_fine), np.min(harmonic13_fine)
mlab.mesh(fine.x, fine.y, fine.z, scalars=harmonic13_fine, vmax=vmax, vmin=vmin)
mlab.points3d(interp.x, interp.y, interp.z, harmonic13_coarse,
scale_factor=0.1, scale_mode='none', vmax=vmax, vmin=vmin)
#Figure showing results of rbf interpolation
mlab.figure()
vmax, vmin = np.max(harmonic13_fine), np.min(harmonic13_fine)
mlab.mesh(fine.x, fine.y, fine.z, scalars=interp_values)
# mlab.points3d(interp.x, interp.y, interp.z, scalars, scale_factor=0.1, scale_mode='none',vmax=vmax, vmin=vmin)
mlab.show()
The boundary you see is because you are mapping a closed surface (S2) to an open one (R2). One way or another, you will have boundaries. The local properties of the manifolds are compatible, so it works for most of the sphere, but not globally; that's where you get the line.
The way around it is to use an atlas instead of a single chart. An atlas is a collection of overlapping charts. In the overlapping region, you need to define weights, a smooth function that goes from 0 to 1 on each chart. (Sorry, probably differential geometry was not what you were expecting to hear).
If you don't want to go all the way here, you can notice that your original sphere has an equator where the variance is minimal. You can then rotate your fine sphere and make it coincide with the line. It doesn't solve your problem, but it can certainly mitigate it.
You can replace the standard distance:
def euclidean_norm(x1, x2):
    return np.sqrt(((x1 - x2)**2).sum(axis=0))
with the sphere distance (see, for instance, this question: Haversine Formula in Python (Bearing and Distance between two GPS points)).
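As a sketch of that idea (hedged: the callable-norm convention has changed across SciPy versions; this follows the shape convention of the default norm shown above, with the coordinate dimension on axis 0), a great-circle norm for points on the unit sphere could look like:

import numpy as np
from scipy.interpolate import Rbf

def great_circle_norm(x1, x2):
    # x1, x2 carry the coordinate dimension on axis 0 (same convention as the
    # default Euclidean norm above); for unit vectors the dot product is cos(angle)
    cosang = (x1 * x2).sum(axis=0)
    return np.arccos(np.clip(cosang, -1.0, 1.0))

# Hypothetical usage with the question's variables (assuming unit-sphere x, y, z):
# rbf = Rbf(interp.x, interp.y, interp.z, harmonic13_coarse, norm=great_circle_norm)
# interp_values = rbf(fine.x, fine.y, fine.z)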
I need to interpolate temperature data linearly in 4 dimensions (latitude, longitude, altitude and time).
The number of points is fairly high (360x720x50x8) and I need a fast method of computing the temperature at any point in space and time within the data bounds.
I have tried using scipy.interpolate.LinearNDInterpolator but using Qhull for triangulation is inefficient on a rectangular grid and takes hours to complete.
By reading this SciPy ticket, the solution seemed to be implementing a new nd interpolator using the standard interp1d to calculate a higher number of data points, and then use a "nearest neighbor" approach with the new dataset.
This, however, takes a long time again (minutes).
Is there a quick way of interpolating data on a rectangular grid in 4 dimensions without it taking minutes to accomplish?
I thought of using interp1d 4 times without calculating a higher density of points, but leaving it for the user to call with the coordinates, but I can't get my head around how to do this.
Otherwise would writing my own 4D interpolator specific to my needs be an option here?
Here's the code I've been using to test this:
Using scipy.interpolate.LinearNDInterpolator:
import numpy as np
from scipy.interpolate import LinearNDInterpolator
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
coords = np.zeros((len(lats),len(lons),len(alts),len(time),4))
coords[...,0] = lats.reshape((len(lats),1,1,1))
coords[...,1] = lons.reshape((1,len(lons),1,1))
coords[...,2] = alts.reshape((1,1,len(alts),1))
coords[...,3] = time.reshape((1,1,1,len(time)))
coords = coords.reshape((data.size,4))
interpolatedData = LinearNDInterpolator(coords,data)
Using scipy.interpolate.interp1d:
import numpy as np
from scipy.interpolate import interp1d
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
interpolatedData = np.array([None, None, None, None])
interpolatedData[0] = interp1d(lats,data,axis=0)
interpolatedData[1] = interp1d(lons,data,axis=1)
interpolatedData[2] = interp1d(alts,data,axis=2)
interpolatedData[3] = interp1d(time,data,axis=3)
Thank you very much for your help!
In the same ticket you have linked, there is an example implementation of what they call tensor product interpolation, showing the proper way to nest recursive calls to interp1d. This is equivalent to quadrilinear interpolation if you choose the default kind='linear' parameter for your interp1d's.
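As an illustration of that nesting (my own sketch, not the ticket's code), assuming axes is a list of 1-D coordinate arrays matching the axes of data:

import numpy as np
from scipy.interpolate import interp1d

def tensor_interp(axes, data, point):
    # Interpolate along the last axis, reducing the array by one dimension,
    # then recurse on the remaining axes (separable, tensor-product interpolation).
    reduced = interp1d(axes[-1], data, axis=-1)(point[-1])
    if len(axes) == 1:
        return reduced
    return tensor_interp(axes[:-1], reduced, point[:-1])

# Hypothetical usage with the question's arrays:
# value = tensor_interp([lats, lons, alts, time], data, [12.3, -4.2, 500.5, 2.5])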
While this may be good enough, this is not linear interpolation, and there will be higher order terms in the interpolation function, as this image from the wikipedia entry on bilinear interpolation shows:
This may very well be good enough for what you are after, but there are applications where a triangulated, really piecewise linear, interpolation is preferred. If you really need this, there is an easy way of working around the slowness of qhull.
Once LinearNDInterpolator has been set up, there are two steps to coming up with an interpolated value for a given point:
1. figure out inside which triangle (4D hypertetrahedron in your case) the point is, and
2. interpolate using the barycentric coordinates of the point relative to the vertices as weights.
You probably do not want to mess with barycentric coordinates, so better leave that to LinearNDInterpolator. But you do know some things about the triangulation. Mostly that, because you have a regular grid, within each hypercube the triangulation is going to be the same. So to interpolate a single value, you could first determine in which subcube your point is, build a LinearNDInterpolator with the 16 vertices of that cube, and use it to interpolate your value:
from itertools import product

def interpolator(coords, data, point):
    dims = len(point)
    indices = []
    sub_coords = []
    for j in range(dims):
        idx = np.digitize([point[j]], coords[j])[0]
        indices += [[idx - 1, idx]]
        sub_coords += [coords[j][indices[-1]]]
    indices = np.array([j for j in product(*indices)])
    sub_coords = np.array([j for j in product(*sub_coords)])
    # tuple indexing selects the 2**dims corner values of the surrounding hypercube
    sub_data = data[tuple(np.swapaxes(indices, 0, 1))]
    li = LinearNDInterpolator(sub_coords, sub_data)
    return li([point])[0]
>>> point = np.array([12.3,-4.2, 500.5, 2.5])
>>> interpolator((lats, lons, alts, time), data, point)
0.386082399091
This cannot work on vectorized data, since that would require storing a LinearNDInterpolator for every possible subcube, and even though it probably would be faster than triangulating the whole thing, it would still be very slow.
scipy.ndimage.map_coordinates
is a nice fast interpolator for uniform grids (all boxes the same size).
See multivariate-spline-interpolation-in-python-scipy on SO
for a clear description.
For non-uniform rectangular grids, a simple wrapper
Intergrid maps / scales non-uniform to uniform grids,
then does map_coordinates.
On a 4d test case like yours it takes about 1 μsec per query:
Intergrid: 1000000 points in a (361, 720, 47, 8) grid took 652 msec
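A minimal map_coordinates sketch for the uniform-grid case (stand-in axes mirroring the question's shapes); the key step is converting physical coordinates to fractional array indices:

import numpy as np
from scipy.ndimage import map_coordinates

# Uniformly spaced axes, shaped like the question's data
lats = np.linspace(-90, 90, 361)
lons = np.linspace(-180, 179.5, 720)
alts = np.linspace(1, 1000, 47)
times = np.arange(8)
data = np.random.rand(len(lats), len(lons), len(alts), len(times))

# Query points in physical units, one row per point
points = np.array([[12.3, -4.2, 500.5, 2.5],
                   [45.0, 10.0, 250.0, 6.0]])

# Convert each physical coordinate to a fractional array index (uniform spacing assumed)
axes = [lats, lons, alts, times]
idx = np.array([(points[:, d] - ax[0]) / (ax[1] - ax[0]) for d, ax in enumerate(axes)])

values = map_coordinates(data, idx, order=1)  # order=1 -> multilinear interpolation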
For very similar things I use Scientific.Functions.Interpolation.InterpolatingFunction.
import numpy as np
from Scientific.Functions.Interpolation import InterpolatingFunction
lats = np.arange(-90,90.5,0.5)
lons = np.arange(-180,180,0.5)
alts = np.arange(1,1000,21.717)
time = np.arange(8)
data = np.random.rand(len(lats)*len(lons)*len(alts)*len(time)).reshape((len(lats),len(lons),len(alts),len(time)))
axes = (lats, lons, alts, time)
f = InterpolatingFunction(axes, data)
You can now leave it to the user to call the InterpolatingFunction with coordinates:
>>> f(0,0,10,3)
0.7085675631375401
InterpolatingFunction has nice additional features, such as integration and slicing.
However, I do not know for sure whether the interpolation is linear. You would have to look in the module source to find out.
I cannot open this address, and can't find enough information about this package.