Related
So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and
scipy.spatial.KDTree
described in SO
inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees
work nicely in 2d 3d ..., inverse-distance weighting is smooth and local,
and the k= number of nearest neighbours can be varied to tradeoff speed / accuracy.
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of coarse to a regular grid, but assuming you project the data first to a pixel grid with pyproj or something, all the while being careful what projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
def pointValue(x,y,power,smoothing,xv,yv,values):
nominator=0
denominator=0
for i in range(0,len(values)):
dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing);
#If the point is really close to one of the data points, return the data point value to avoid singularities
if(dist<0.0000000001):
return values[i]
nominator=nominator+(values[i]/pow(dist,power))
denominator=denominator+(1/pow(dist,power))
#Return NODATA if the denominator is zero
if denominator > 0:
value = nominator/denominator
else:
value = -9999
return value
def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
valuesGrid = np.zeros((ysize,xsize))
for x in range(0,xsize):
for y in range(0,ysize):
valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
return valuesGrid
if __name__ == "__main__":
power=1
smoothing=20
#Creating some data, with each coodinate and the values stored in separated lists
xv = [10,60,40,70,10,50,20,70,30,60]
yv = [10,20,30,30,40,50,60,70,80,90]
values = [1,2,2,3,4,6,7,7,8,10]
#Creating the output grid (100x100, in the example)
ti = np.linspace(0, 100, 100)
XI, YI = np.meshgrid(ti, ti)
#Creating the interpolation function and populating the output matrix value
ZI = invDist(xv,yv,values,100,100,power,smoothing)
# Plotting the result
n = plt.normalize(0.0, 100.0)
plt.subplot(1, 1, 1)
plt.pcolor(XI, YI, ZI)
plt.scatter(xv, yv, 100, values)
plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.colorbar()
plt.show()
There's a bunch of options here, which one is best will depend on your data...
However I don't know of an out-of-the-box solution for you
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
Unstructured data in tripolar space projected into 2d LAT LON data
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that maps from tripolar space to cover your sample point. (You can use a BSP or grid type structure to speed up this search) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points, finding those N points can again be done with gridding or a BSP. Another good option is to Delauney triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest you taking a look at GRASS (an open source GIS package) interpolation features (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in python but you can reimplement it or interface with C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
alt text http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I have these arrays that I need to interpolate and make the smoothest possible interpolation:
x = time
y = height
z = latitude
print np.shape(x)
print np.shape(y)
print np.shape(z)
Result:
(99, 25)
(99, 25)
(99, 25)
y is altitude and it's not uniform. It has a bunch of nan's and even though they're all the same size (a variable n_alt with the number of altitudes, which is for this example 99).
x is time and it's uniform all the way through (all the values in one column of that array are the same).
z is latitude and it's the actual 'z' and it's an array with the same number of rows as the number of time points and same number of rows as the altitude points.
I want to interpolate in 2D (the data set has series of nans in both x and y directions) to fill the gaps on the data, since several files will cover a certain altitude range and not others
My questions are:
1) is there a good way to fill the gaps the 2 directions while making the grid uniform (the idea is to plot that and also save the interpolated data (x,y and z) into a new file as well)?
2) what's a good way to contour plot the data with the shape I mentioned earlier (tried plt.contour, but it doesn't give a satisfactory result just plotting that straight up)?
Thanks y'all
Edit:
I believe this will illustrate the question better:
X: Time, Y: Altitude, Z: Latitude or Longitude
I essentially want to fill up the white space (I understand the consequences of extrapolations and all, but I just want, at this point, to have an algorithm that works. The blue dots is my grid and the color plot is just a normal plt.contour (no interpolation done). I want to make such that I have blue dots all over the plot area.
Rafael! With respect to your interpolation question, I can explain the math if you want to manually come up with an interpolation function, but there is an existing resource you might want to look into: scipy.interpolate.RegularGridInterpolator
(see https://docs.scipy.org/doc/scipy-0.16.1/reference/generated/scipy.interpolate.RegularGridInterpolator.html)
If I have misunderstood your issue, another interpolation method from the class might be appropriate: see, scipy.interpolate
For plotting the 3d surface, https://matplotlib.org/examples/mplot3d/surface3d_demo.html might help guide you! Let me know if this helps! Just comment if you would like me to expand! Hopefully those are the resources you were looking for!
My data is regularly spaced, but not quite a grid - each row of points is slightly offset from the one below.
The data is in the form of 3 1D arrays, x, y, z, with each index corresponding to a point. It is smoothly varying data - approximately Gaussian.
The point density is quite high. What is the best way to plot this data?
I tried meshgrid, but it gives me some bad contours through regions that have no data points near the contour's value.
I have tried rbf interpolation according to this post:
Python : 2d contour plot from 3 lists : x, y and rho?
but this just gives me nonsense - all the contours are on one edge - does not reflect the data at all.
Any other ideas for what I can try. Maybe I should be using some sort of nearest neighbour interpolation? Here is a picture of about a 1/4 of my data points: http://imgur.com/a/b00R6
I'm surprised it is causing me such difficulty - it seems like it should be fairly easy to plot.
The easiest way to plot ungridded data is probably tricontour or tricontourf (a filled tricontour plot).
Having 1D arrays of the x, y and z coordinates x, y and z, you'd simply call
plt.tricontourf(x,y,z, n, ...)
to obtain n levels of contours.
The other quick method is to interpolate on a grid using matplotlib.mlab.griddata to obtain a regular grid from the irregular points.
Both methods are compared in an example on the matplotlib page:
Tricontour vs. griddata
Found the answer: needed to rescale my data.
I am trying to interpolate 3D atmospheric data from one vertical coordinate to another using Numpy/Scipy. For example, I have cubes of temperature and relative humidity, both of which are on constant, regular pressure surfaces. I want to interpolate the relative humidity to constant temperature surface(s).
The exact problem I am trying to solve has been asked previously here, however, the solution there is very slow. In my case, I have approximately 3M points in my cube (30x321x321), and that method takes around 4 minutes to operate on one set of data.
That post is nearly 5 years old. Do newer versions of Numpy/Scipy perhaps have methods that handle this faster? Maybe new sets of eyes looking at the problem have a better approach? I'm open to suggestions.
EDIT:
Slow = 4 minutes for one set of data cubes. I'm not sure how else I can quantify it.
The code being used...
def interpLevel(grid,value,data,interp='linear'):
"""
Interpolate 3d data to a common z coordinate.
Can be used to calculate the wind/pv/whatsoever values for a common
potential temperature / pressure level.
grid : numpy.ndarray
The grid. For example the potential temperature values for the whole 3d
grid.
value : float
The common value in the grid, to which the data shall be interpolated.
For example, 350.0
data : numpy.ndarray
The data which shall be interpolated. For example, the PV values for
the whole 3d grid.
kind : str
This indicates which kind of interpolation will be done. It is directly
passed on to scipy.interpolate.interp1d().
returns : numpy.ndarray
A 2d array containing the *data* values at *value*.
"""
ret = np.zeros_like(data[0,:,:])
for yIdx in xrange(grid.shape[1]):
for xIdx in xrange(grid.shape[2]):
# check if we need to flip the column
if grid[0,yIdx,xIdx] > grid[-1,yIdx,xIdx]:
ind = -1
else:
ind = 1
f = interpolate.interp1d(grid[::ind,yIdx,xIdx], \
data[::ind,yIdx,xIdx], \
kind=interp)
ret[yIdx,xIdx] = f(value)
return ret
EDIT 2:
I could share npy dumps of sample data, if anyone was interested enough to see what I am working with.
Since this is atmospheric data, I imagine that your grid does not have uniform spacing; however if your grid is rectilinear (such that each vertical column has the same set of z-coordinates) then you have some options.
For instance, if you only need linear interpolation (say for a simple visualization), you can just do something like:
# Find nearest grid point
idx = grid[:,0,0].searchsorted(value)
upper = grid[idx,0,0]
lower = grid[idx - 1, 0, 0]
s = (value - lower) / (upper - lower)
result = (1-s) * data[idx - 1, :, :] + s * data[idx, :, :]
(You'll need to add checks for value being out of range, of course).For a grid your size, this will be extremely fast (as in tiny fractions of a second)
You can pretty easily modify the above to perform cubic interpolation if need be; the challenge is in picking the correct weights for non-uniform vertical spacing.
The problem with using scipy.ndimage.map_coordinates is that, although it provides higher order interpolation and can handle arbitrary sample points, it does assume that the input data be uniformly spaced. It will still produce smooth results, but it won't be a reliable approximation.
If your coordinate grid is not rectilinear, so that the z-value for a given index changes for different x and y indices, then the approach you are using now is probably the best you can get without a fair bit of analysis of your particular problem.
UPDATE:
One neat trick (again, assuming that each column has the same, not necessarily regular, coordinates) is to use interp1d to extract the weights doing something like follows:
NZ = grid.shape[0]
zs = grid[:,0,0]
ident = np.identity(NZ)
weight_func = interp1d(zs, ident, 'cubic')
You only need to do the above once per grid; you can even reuse weight_func as long as the vertical coordinates don't change.
When it comes time to interpolate then, weight_func(value) will give you the weights, which you can use to compute a single interpolated value at (x_idx, y_idx) with:
weights = weight_func(value)
interp_val = np.dot(data[:, x_idx, y_idx), weights)
If you want to compute a whole plane of interpolated values, you can use np.inner, although since your z-coordinate comes first, you'll need to do:
result = np.inner(data.T, weights).T
Again, the computation should be practically immediate.
This is quite an old question but the best way to do this nowadays is to use MetPy's interpolate_1d funtion:
https://unidata.github.io/MetPy/latest/api/generated/metpy.interpolate.interpolate_1d.html
There is a new implementation of Numba accelerated interpolation on regular grids in 1, 2, and 3 dimensions:
https://github.com/dbstein/fast_interp
Usage is as follows:
from fast_interp import interp2d
import numpy as np
nx = 50
ny = 37
xv, xh = np.linspace(0, 1, nx, endpoint=True, retstep=True)
yv, yh = np.linspace(0, 2*np.pi, ny, endpoint=False, retstep=True)
x, y = np.meshgrid(xv, yv, indexing='ij')
test_function = lambda x, y: np.exp(x)*np.exp(np.sin(y))
f = test_function(x, y)
test_x = -xh/2.0
test_y = 271.43
fa = test_function(test_x, test_y)
interpolater = interp2d([0,0], [1,2*np.pi], [xh,yh], f, k=5, p=[False,True], e=[1,0])
fe = interpolater(test_x, test_y)
I am trying to learn python by doing. I have real world xy (lat long dd) coordinates and z (km below surface) values for about 500,000 earthquakes (hypocenter) from M 0.1 to 2.0. I chopped the data down to 10 rows of xyz values in a .txt tab delimited table. I want to plot the data in a 3d scatter plot rotatable box in matplotlib. I can use basic commands to read the data and the format looks fine. I am not clear if I need to read the data into a list or array for mpl to read and plot the data. Do I need to create an array at all?
I then want to plot the subsurface location of an oil well, given the xyz coordinates of vertices along the well bore (about 40 positions), do I need to create a polyline?. This data has the same general coordiantes (to be evaluated further) as some of the hypcenters. Which set of data should be the plot, and which should be the subplot? Also, I am unclear as to the "float" that I need given the 6 to 7 decimal places of he xy lat long coordinates, and the 2 decimal z coordinates.
Matplotlib is a poor choice for this, actually. It doesn't allow true 3D plotting, and it won't handle the complexity (or number of points in 3D) that you need. Have a look at mayavi instead.
Incidentally, it sounds like you're doing microseismic? (I'm a geophysist, as well, for whatever it's worth.)
As a quick example:
from enthought.mayavi import mlab
import numpy as np
# Generate some random hypocenters
x, y, z, mag = np.random.random((4, 500))
# Make a curved well bore...
wellx, welly, wellz = 3 * [np.linspace(0, 1.5, 10)]
wellz = wellz**2
# Plot the hypocenters, colored and scaled by magnitude
mlab.points3d(x, y, z, mag)
# Plot the wellbore
mlab.plot3d(wellx, welly, wellz, tube_radius=0.1)
mlab.show()
As far as reading in your data goes, it sounds like it should be as simple as:
x, y, z = np.loadtxt('data.txt').T
What problems have you run in to?