python interp2d strange checkers - python

I am interpolating an array (2d_values) of size 105 by 109 with scipy.interpolate.interp2d.
function=interp2d(2d_x_coords,2d_y_cords,2d_values)
interpolated=float(function(2d_x_finer_coords,2d_y_fner_coords))
I am having an issue where interpolated comes out with proper values in most locations but in certain areas of interpolated there are checkerboards and stripes of huge positive and negative numbers when the data is supposed to be between (0 and ~300).
2d_values is a relatively continuous field with a few places with large jumps between adjacent coordinates, and is a map projection of latitude and longitude coordinates so the coordinates are not a regular grid but are distorted as a flat map is.
the picture on the right is 2d_values and the picture on the left is interpolated
this is the code used to perform this

Using griddata instead yielded great results with no issues.

Related

Clustering on evenly spaced grid points

I have a 50 by 50 grid of evenly spaced (x,y) points. Each of these points has a third scalar value. This can be visualized using a contourplot which I have added. I am interested in the regions indicated in by the red circles. These regions of low "Z-values" are what I want to extract from this data.
2D contour plot of 50 x 50 evenly spaced grid points:
I want to do this by using clustering (machine learning), which can be lightning quick when applied correctly. The problem is, however, that the points are evenly spaced together and therefore the density of the entire dataset is equal everywhere.
I have tried using a DBSCAN algorithm with a custom distance metric which takes into account the Z values of each point. I have defined the distance between two points as follows:\
def custom_distance(point1,point2):
average_Z = (point1[2]+point2[2])/2
distance = np.sqrt(np.square((point1[0]-point2[0])) + np.square((point1[1]-point2[1])))
distance = distance * average_Z
return distance
This essentially determines the Euclidean distance between two points and adds to it the average of the two Z values of both points. In the picture below I have tested this distance determination function applied in a DBSCAN algorithm. Each point in this 50 by 50 grid each has a Z value of 1, except for four clusters that I have randomly placed. These points each have a z value of 10. The algorithm is able to find the clusters in the data based on their z value as can be seen below.
DBSCAN clustering result using scalar value distance determination:
Positive about the results I tried to apply it to my actual data, only to be disappointed by the results. Since the x and y values of my data are very large, I have simply scaled them to be 0 to 49. The z values I have left untouched. The results of the clustering can be seen in the image below:
Clustering result on original data:
This does not come close to what I want and what I was expecting. For some reason the clusters that are found are of rectangular shape and the light regions of low Z values that I am interested in are not extracted with this approach.
Is there any way I can make the DBSCAN algorithm work in this way? I suspect the reason that it is currently not working has something to do with the differences in scale of the x,y and z values. I am also open for tips or recommendations on other approaches on how to define and find the lighter regions in the data.

Bilinear interpolation from a (distorted) rectangular 2D grid to arbitrary points, in Python

The task at hand is seemingly simple:
I have a 2D grid of data. The data is available in 2D arrays for X and Y coordinates, as well as the input variable which I want to interpolate. This means I can plot the data using rectangular cells, which means it is possible to use bilinear interpolation. Unfortunately, the data is not precisely aligned with the coordinates, and also not precisely spaced. There were some numerics involved in creating the data, which means that all sampling locations are a little off the mark, and the cell spacing is smooth but not uniform.
I would like to interpolate from this input grid to a set of predefined sample coordinates (as opposed to simply refining the mesh).
In short, an example for my type of input is:
# a nice, regular grid
Xs, Ys = np.meshgrid(np.linspace(0, 1, num=3), np.linspace(0, 1, num=5))
# ...perturbed by some systematic and some random noise ...
X_in = Xs + np.random.normal(scale=0.03, size=(5, 3))
# ...and some systematic deviation
Y_in = (Ys + np.random.normal(scale=0.03, size=(5, 3)))* (1 + Xs**1.5)
# and some variable at each node to interpolate
Z_in = np.random.normal(scale=1, size=(5, 3))
So (X_in, Y_in) are arrays of shape (n, m) which define a mesh with quadrilateral cells, and Z_in another array of ther same shape which provides a value at each node in that mesh. I am looking for some Python library that performs bilinear interpolation of Z_in across those cells.
However, all methods I have found so far either ignore the rectangular structure (and triangulate the data, or fit some 2D spline through arbitrary point clouds), or require a perfectly rectangular and equally-spaced grid as input (which mine is not).
Examples of answers/methods that seem not to be appliccable:
This answer recommends using scipy.ndimage.map_coordinates -- but that effectively uses the indices of the 2D input data array as coordinates, which won't work for me.
scipy.interpolate.interp2d requires either a regular grid (node locations provided by 1D X and Y arrays), or an irregular one, which is flattened, which means that the algorithm cannot know which nodes form a cell. This means it either fits some spline through unstructured data, or triangulates it. And it only interpolates onto regular grids or individual points.
scipy.interpolate.RectBivariateSpline is recommended for interpolation from gridded data but only accepts input points which are perfectly aligned with the coordinate system.
There's also a Matplotlib toolkit for interpolation, which I had thought should be able to do this sort of thing, as it also does interpolated contour plots of rectangular meshes, but as it turns out, even though mpl_toolkits.basemap.interp accepts arbitrary quadrilateral meshes as target for interpolation, it cannot use them as inputs ...
Upon closer inspection, it turns out that even matplotlib.plt.contour() does not seem to perform bilinear interpolation when plotting the input data:
plt.contour(X_in, Y_in, Z_in, levels=np.linspace(Z_in.min(), Z_in.max(), 50))
plt.plot(X_in, Y_in, 'k-')
plt.plot(X_in.T, Y_in.T, 'k-')
As you can see, the contour lines within the cells are straight, but with bilinear interpolation, they should not be, and there should not be those empty quadrilateral areas in the mittle of some cells. I suspect that Matplotlib only finds the contour values on the cell edges and simply draws straight lines between them.
I have found two explanations of the maths of bilinear interpolation from grids which are not perfectly aligned, but I was hoping to come across a ready-made implementation somewhere because I'm sure that this kind of task is not so rare, and a numpy or scipy implementation (if it exists) is probably way faster than whatever I'd implement myself.

2D interpolate list of many points [duplicate]

So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and
scipy.spatial.KDTree
described in SO
inverse-distance-weighted-idw-interpolation-with-python.
Kd-trees
work nicely in 2d 3d ..., inverse-distance weighting is smooth and local,
and the k= number of nearest neighbours can be varied to tradeoff speed / accuracy.
There is a nice inverse distance example by Roger Veciana i Rovira along with some code using GDAL to write to geotiff if you're into that.
This is of coarse to a regular grid, but assuming you project the data first to a pixel grid with pyproj or something, all the while being careful what projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt
def pointValue(x,y,power,smoothing,xv,yv,values):
nominator=0
denominator=0
for i in range(0,len(values)):
dist = sqrt((x-xv[i])*(x-xv[i])+(y-yv[i])*(y-yv[i])+smoothing*smoothing);
#If the point is really close to one of the data points, return the data point value to avoid singularities
if(dist<0.0000000001):
return values[i]
nominator=nominator+(values[i]/pow(dist,power))
denominator=denominator+(1/pow(dist,power))
#Return NODATA if the denominator is zero
if denominator > 0:
value = nominator/denominator
else:
value = -9999
return value
def invDist(xv,yv,values,xsize=100,ysize=100,power=2,smoothing=0):
valuesGrid = np.zeros((ysize,xsize))
for x in range(0,xsize):
for y in range(0,ysize):
valuesGrid[y][x] = pointValue(x,y,power,smoothing,xv,yv,values)
return valuesGrid
if __name__ == "__main__":
power=1
smoothing=20
#Creating some data, with each coodinate and the values stored in separated lists
xv = [10,60,40,70,10,50,20,70,30,60]
yv = [10,20,30,30,40,50,60,70,80,90]
values = [1,2,2,3,4,6,7,7,8,10]
#Creating the output grid (100x100, in the example)
ti = np.linspace(0, 100, 100)
XI, YI = np.meshgrid(ti, ti)
#Creating the interpolation function and populating the output matrix value
ZI = invDist(xv,yv,values,100,100,power,smoothing)
# Plotting the result
n = plt.normalize(0.0, 100.0)
plt.subplot(1, 1, 1)
plt.pcolor(XI, YI, ZI)
plt.scatter(xv, yv, 100, values)
plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
plt.xlim(0, 100)
plt.ylim(0, 100)
plt.colorbar()
plt.show()
There's a bunch of options here, which one is best will depend on your data...
However I don't know of an out-of-the-box solution for you
You say your input data is from tripolar data. There are three main cases for how this data could be structured.
Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
Sampled from a 2d grid in tripolar space, projected into 2d LAT LON data.
Unstructured data in tripolar space projected into 2d LAT LON data
The easiest of these is 2. Instead of interpolating in LAT LON space, "just" transform your point back into the source space and interpolate there.
Another option that works for 1 and 2 is to search for the cells that maps from tripolar space to cover your sample point. (You can use a BSP or grid type structure to speed up this search) Pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points, finding those N points can again be done with gridding or a BSP. Another good option is to Delauney triangulate the unstructured points and interpolate on the resulting triangular mesh.
Personally if my mesh was case 1, I'd use an unstructured strategy as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest you taking a look at GRASS (an open source GIS package) interpolation features (http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html). It's not in python but you can reimplement it or interface with C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
alt text http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap (opengl will do simple interpolation of colours for you with the right options configured and you could render the data as triangles which should be fairly fast). You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.

How to plot coarse-grained average of a set of data points?

I have a set of discrete 2-dimensional data points. Each of these points has a measured value associated with it. I would like to get a scatter plot with points colored by their measured values. But the data points are so dense that points with different colors would overlap with each other, that may not be good for visualization. So I am thinking if I could associate the color for each point based on the coarse-grained average of measured values of some points near it. Does anyone know how to implement this in Python?
Thanks!
I have it done by using sklearn.neighbors.RadiusNeighborsClassifier(), the idea is the take the average of the values of the neighbors within a specific radius. Suppose the coordinates of the data points are in the list temp_coors, the values associated with these points are coloring, then coloring could be coarse-grained in the following way:
r_neigh = RadiusNeighborsRegressor(radius=smoothing_radius, weights='uniform')
r_neigh.fit(temp_coors, coloring)
coloring = r_neigh.predict(temp_coors)

Finding n nearest data points to grid locations

I'm working on a problem where I have a large set (>4 million) of data points located in a three-dimensional space, each with a scalar function value. This is represented by four arrays: XD, YD, ZD, and FD. The tuple (XD[i], YD[i], ZD[i]) refers to the location of data point i, which has a value of FD[i].
I'd like to superimpose a rectilinear grid of, say, 100x100x100 points in the same space as my data. This grid is set up as follows.
[XGrid, YGrid, ZGrid] = np.mgrid[Xmin:Xmax:Xstep, Ymin:Ymax:Ystep, Zmin:Zmax:Zstep]
XG = XGrid[:,0,0]
YG = YGrid[0,:,0]
ZG = ZGrid[0,0,:]
XGrid is a 3D array of the x-value at each point in the grid. XG is a 1D array of the x-values going from Xmin to Xmax, separated by a distance of XStep.
I'd like to use an interpolation algorithm I have to find the value of the function at each grid point based on the data surrounding it. In this algorithm I require 20 data points closest (or at least close) to my grid point of interest. That is, for grid point (XG[i], YG[j], ZG[k]) I want to find the 20 closest data points.
The only way I can think of is to have one for loop that goes through each data point and a subsequent embedded for loop going through all (so many!) data points, calculating the Euclidean distance, and picking out the 20 closest ones.
for i in range(0,XG.shape):
for j in range(0,YG.shape):
for k in range(0,ZG.shape):
Distance = np.zeros([XD.shape])
for a in range(0,XD.shape):
Distance[a] = (XD[a] - XG[i])**2 + (YD[a] - YG[j])**2 + (ZD[a] - ZG[k])**2
B = np.zeros([20], int)
for a in range(0,20):
indx = np.argmin(Distance)
B[a] = indx
Distance[indx] = float(inf)
This would give me an array, B, of the indices of the data points closest to the grid point. I feel like this would take too long to go through each data point at each grid point.
I'm looking for any suggestions, such as how I might be able to organize the data points before calculating distances, which could cut down on computation time.
Have a look at a seemingly simmilar but 2D problem and see if you cannot improve with ideas from there.
From the top of my head, I'm thinking that you can sort the points according to their coordinates (three separate arrays). When you need the closest points to the [X, Y, Z] grid point you'll quickly locate points in those three arrays and start from there.
Also, you don't really need the euclidian distance, since you are only interested in relative distance, which can also be described as:
abs(deltaX) + abs(deltaY) + abs(deltaZ)
And save on the expensive power and square roots...
No need to iterate over your data points for each grid location: Your grid locations are inherently ordered, so just iterate over your data points once, and assign each data point to the eight grid locations that surround it. When you're done, some grid locations may have too few data points. Check the data points of adjacent grid locations. If you have plenty of data points to go around (it depends on how your data is distributed), you can already select the 20 closest neighbors during the initial pass.
Addendum: You may want to reconsider other parts of your algorithm as well. Your algorithm is a kind of piecewise-linear interpolation, and there are plenty of relatively simple improvements. Instead of dividing your space into evenly spaced cubes, consider allocating a number of center points and dynamically repositioning them until the average distance of data points from the nearest center point is minimized, like this:
Allocate each data point to its closest center point.
Reposition each center point to the coordinates that would minimize the average distance from "its" points (to the "centroid" of the data subset).
Some data points now have a different closest center point. Repeat steps 1. and 2. until you converge (or near enough).

Categories