So, I have three numpy arrays which store latitude, longitude, and some property value on a grid -- that is, I have LAT(y,x), LON(y,x), and, say, temperature T(y,x), for some limits of x and y. The grid isn't necessarily regular -- in fact, it's tripolar.
I then want to interpolate these property (temperature) values onto a bunch of different lat/lon points (stored as lat1(t), lon1(t), for about 10,000 t...) which do not fall on the actual grid points. I've tried matplotlib.mlab.griddata, but that takes far too long (it's not really designed for what I'm doing, after all). I've also tried scipy.interpolate.interp2d, but I get a MemoryError (my grids are about 400x400).
Is there any sort of slick, preferably fast way of doing this? I can't help but think the answer is something obvious... Thanks!!
Try the combination of inverse-distance weighting and scipy.spatial.KDTree, described in the SO answer inverse-distance-weighted-idw-interpolation-with-python. k-d trees work nicely in 2d, 3d, ...; inverse-distance weighting is smooth and local, and the number of nearest neighbours k can be varied to trade off speed against accuracy.
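For a concrete starting point, here's a minimal sketch of that combination (cKDTree is the faster C implementation of KDTree). The function name idw is my own, the arguments stand in for the LAT/LON/T/lat1/lon1 arrays from the question, and note that plain Euclidean distance in lon/lat is only a rough metric near the poles:
import numpy as np
from scipy.spatial import cKDTree

def idw(lon_grid, lat_grid, t_grid, lon_q, lat_q, k=8, power=2.0):
    # flatten the (y, x) source grid into an (N, 2) array of lon/lat points
    pts = np.column_stack([lon_grid.ravel(), lat_grid.ravel()])
    tree = cKDTree(pts)
    # find the k nearest grid points for each query point
    dist, idx = tree.query(np.column_stack([lon_q, lat_q]), k=k)
    # guard against zero distance when a query point coincides with a grid point
    dist = np.maximum(dist, 1e-10)
    w = 1.0 / dist**power
    # inverse-distance-weighted average of the k neighbouring values
    return (w * t_grid.ravel()[idx]).sum(axis=1) / w.sum(axis=1)

# e.g. t1 = idw(LON, LAT, T, lon1, lat1) for ~10,000 query points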
There is a nice inverse-distance example by Roger Veciana i Rovira, along with some code using GDAL to write to GeoTIFF if you're into that.
This interpolates to a regular grid, of course, but assuming you first project the data to a pixel grid with pyproj or something similar, all the while being careful about which projection is used for your data.
A copy of his algorithm and example script:
from math import pow
from math import sqrt
import numpy as np
import matplotlib.pyplot as plt

def pointValue(x, y, power, smoothing, xv, yv, values):
    numerator = 0
    denominator = 0
    for i in range(0, len(values)):
        dist = sqrt((x - xv[i])*(x - xv[i]) + (y - yv[i])*(y - yv[i]) + smoothing*smoothing)
        # If the point is really close to one of the data points,
        # return the data point value to avoid singularities
        if dist < 0.0000000001:
            return values[i]
        numerator = numerator + (values[i] / pow(dist, power))
        denominator = denominator + (1 / pow(dist, power))
    # Return NODATA if the denominator is zero
    if denominator > 0:
        value = numerator / denominator
    else:
        value = -9999
    return value

def invDist(xv, yv, values, xsize=100, ysize=100, power=2, smoothing=0):
    valuesGrid = np.zeros((ysize, xsize))
    for x in range(0, xsize):
        for y in range(0, ysize):
            valuesGrid[y][x] = pointValue(x, y, power, smoothing, xv, yv, values)
    return valuesGrid

if __name__ == "__main__":
    power = 1
    smoothing = 20
    # Creating some data, with each coordinate and the values stored in separate lists
    xv = [10, 60, 40, 70, 10, 50, 20, 70, 30, 60]
    yv = [10, 20, 30, 30, 40, 50, 60, 70, 80, 90]
    values = [1, 2, 2, 3, 4, 6, 7, 7, 8, 10]
    # Creating the output grid (100x100, in the example)
    ti = np.linspace(0, 100, 100)
    XI, YI = np.meshgrid(ti, ti)
    # Creating the interpolation function and populating the output matrix value
    ZI = invDist(xv, yv, values, 100, 100, power, smoothing)
    # Plotting the result
    plt.subplot(1, 1, 1)
    plt.pcolor(XI, YI, ZI)
    plt.scatter(xv, yv, 100, values)
    plt.title('Inv dist interpolation - power: ' + str(power) + ' smoothing: ' + str(smoothing))
    plt.xlim(0, 100)
    plt.ylim(0, 100)
    plt.colorbar()
    plt.show()
There are a bunch of options here; which one is best will depend on your data. However, I don't know of an out-of-the-box solution for you.
You say your input data is from tripolar data. There are three main cases for how this data could be structured:
1. Sampled from a 3d grid in tripolar space, projected back to 2d LAT, LON data.
2. Sampled from a 2d grid in tripolar space, projected into 2d LAT, LON data.
3. Unstructured data in tripolar space, projected into 2d LAT, LON data.
The easiest of these is 2. Instead of interpolating in LAT, LON space, "just" transform your point back into the source space and interpolate there (see the sketch below).
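As a minimal sketch of that idea, with a hypothetical inverse mapping to_tripolar(lat, lon) standing in for the real one (which depends on your particular grid), you could interpolate on the regular source grid with scipy.interpolate.RegularGridInterpolator:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# stand-in regular source-grid axes (u, v) and temperature field T(v, u)
u = np.linspace(0.0, 1.0, 400)
v = np.linspace(0.0, 1.0, 400)
T = np.random.rand(400, 400)
interp = RegularGridInterpolator((v, u), T)

def to_tripolar(lat, lon):
    # placeholder mapping: replace with the real inverse transform for your grid
    return (lat + 90.0) / 180.0, (lon + 180.0) / 360.0

# ~10,000 scattered query points
lat1 = np.random.uniform(-80, 80, 10000)
lon1 = np.random.uniform(-170, 170, 10000)
v1, u1 = to_tripolar(lat1, lon1)
t1 = interp(np.column_stack([v1, u1]))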
Another option that works for 1 and 2 is to search for the cell that maps from tripolar space to cover your sample point (you can use a BSP or grid-type structure to speed up this search), pick one of the cells, and interpolate inside it.
Finally there's a heap of unstructured interpolation options .. but they tend to be slow.
A personal favourite of mine is to use a linear interpolation of the nearest N points; finding those N points can again be done with gridding or a BSP. Another good option is to Delaunay-triangulate the unstructured points and interpolate on the resulting triangular mesh (see the sketch below).
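For the Delaunay route, scipy.interpolate.LinearNDInterpolator triangulates the scattered points internally and interpolates linearly on the mesh. A minimal sketch, using synthetic stand-ins for the LAT/LON/T arrays from the question:
import numpy as np
from scipy.interpolate import LinearNDInterpolator

# synthetic stand-ins for the flattened LAT/LON/T arrays from the question
rng = np.random.default_rng(0)
lon = rng.uniform(-180, 180, 2000)
lat = rng.uniform(-80, 80, 2000)
temp = 30.0 * np.cos(np.radians(lat))  # a smooth field to interpolate

# Delaunay-triangulate the scattered points, then interpolate linearly
interp = LinearNDInterpolator(np.column_stack([lon, lat]), temp)

# evaluate at ~10,000 scattered query points
lon1 = rng.uniform(-170, 170, 10000)
lat1 = rng.uniform(-70, 70, 10000)
t1 = interp(lon1, lat1)  # NaN for points outside the convex hull of the data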
Personally, if my mesh were case 1, I'd use an unstructured strategy, as I'd be worried about having to handle searching through cells with overlapping projections. Choosing the "right" cell would be difficult.
I suggest taking a look at the interpolation features of GRASS (an open-source GIS package): http://grass.ibiblio.org/gdp/html_grass62/v.surf.bspline.html. It's not in Python, but you can reimplement it or interface with C code.
Am I right in thinking your data grids look something like this (red is the old data, blue is the new interpolated data)?
(image: http://www.geekops.co.uk/photos/0000-00-02%20%28Forum%20images%29/DataSeparation.png)
This might be a slightly brute-force-ish approach, but what about rendering your existing data as a bitmap? OpenGL will do simple interpolation of colours for you with the right options configured, and you could render the data as triangles, which should be fairly fast. You could then sample pixels at the locations of the new points.
Alternatively, you could sort your first set of points spatially and then find the closest old points surrounding your new point and interpolate based on the distances to those points.
There is a FORTRAN library called BIVAR, which is very suitable for this problem. With a few modifications you can make it usable in python using f2py.
From the description:
BIVAR is a FORTRAN90 library which interpolates scattered bivariate data, by Hiroshi Akima.
BIVAR accepts a set of (X,Y) data points scattered in 2D, with associated Z data values, and is able to construct a smooth interpolation function Z(X,Y), which agrees with the given data, and can be evaluated at other points in the plane.
I am using Python and GeoJSON to do this. I want to specify a point, and that point will be the center of a square; assuming the square is 1 mile by 1 mile, I want to list all the points and polys found in the square, including polys bigger than the square.
I have multiple GeoJSON files, so I will need to do the check a few times, which is fine. I have been playing with the code below, which checks to see if the cell center is near the center of the square, but it will have issues for oddly shaped polygons. I really want to know all items/features that are found in the square.
import json
from shapely.geometry import shape, Point

point = Point(14.9783266342289, 16.87265432621112)
max_distance_from_center = 1

with open('cells.geojson') as f:
    js = json.load(f)

for feature in js['features']:
    polygon = shape(feature['geometry'])
    distance = point.distance(polygon.centroid)
    # print(f'{distance} - {polygon.centroid}')
    if distance < max_distance_from_center:
        print(f'Found cells containing polygon: {feature}')
For source data I was using an exported map from https://azgaar.github.io/Fantasy-Map-Generator/; the grid should be 10 miles by 10 miles. Any suggestions on how to do this?
Update:
Here is a poorly drawn diagram. Within the grid square, I want to identify all markers and polygons that fall within the bounds of the square, even if they go outside of it. I want a list of all features that have some presence in the grid square. I highlighted the areas in yellow.
(poorly drawn image)
I looked at intersects and it may do it. Will try tonight.
You can try this.
First, create the grid square:
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import Point

point = Point(0, -10)
square = point.buffer(0.5).envelope  # a 1 x 1 square centred on the point

fig, ax = plt.subplots(figsize=(5, 5))
gpd.GeoSeries(square).plot(ax=ax)
gpd.GeoSeries(point).plot(ax=ax, color="black", markersize=30)
plt.grid()
plt.show()
and then:
# load the GeoJSON file into a GeoDataFrame
geo_df = gpd.GeoDataFrame.from_file('cells.geojson')

# flag every feature whose geometry intersects the square
geo_df['grid_yn'] = geo_df['geometry'].apply(lambda x: x.intersects(square))
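From there, if you want the actual list of features that have some presence in the square (as the question asks), one small follow-up step could be to filter on that flag:
# keep only the features flagged as intersecting the square
features_in_square = geo_df[geo_df['grid_yn']]
print(len(features_in_square), 'features intersect the square')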
I'm using the scipy.spatial.Voronoi function to bin up my data where the cell sizes decrease with increased 2D point density, in Python.
However, the function places cell boundaries around each point such that each cell can only contain one point (that is the point of a Voronoi diagram, after all).
Is there a way to specify a minimum number of samples to lie in each bin, such that you still have smaller cells in over-dense regions, but no cell contains only one point?
Link to the function in question: https://docs.scipy.org/doc/scipy-0.18.1/reference/generated/scipy.spatial.Voronoi.html
Example code:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

xy = np.random.rand(2, 500)  # stand-in sample data; see note below

vor = Voronoi(xy.T)
fig = voronoi_plot_2d(vor, show_vertices=False, line_colors='orange',
                      line_width=2, line_alpha=0.6, point_size=2)
Note: xy can be any stacked x and y coordinate array of points here.
Perhaps there's an alternative algorithm entirely for this task? Any help will be greatly appreciated!
I have a set of 100k geo locations (lat/lon) and a hexagonal grid (4k polygons). My goal is to calculate the total number of points located within each polygon.
My current algorithm uses two for loops to iterate over all geo points and all polygons, which is really slow if I increase the number of polygons... How would you speed up the algorithm? I have uploaded a minimal example which creates 100k random geo points and uses 561 cells in the grid...
I also saw that reading the GeoJSON file (with 4k polygons) takes some time; maybe I should export the polygons into a CSV?
hexagon_grid.geojson file:
https://gist.github.com/Arnold1/9e41454e6eea910a4f6cd68ff1901db1
minimal python example:
https://gist.github.com/Arnold1/ee37a2e4b2dfbfdca9bfae7c7c3a3755
You don't need to explicitly test each hexagon to see whether a given point is located inside it.
Let's assume, for the moment, that all of your points fall somewhere within the bounds of your hexagonal grid. Because your hexagons form a regular lattice, you only really need to know which of the hexagon centers is closest to each point.
This can be computed very efficiently using a scipy.spatial.cKDTree:
import numpy as np
from scipy.spatial import cKDTree
import json

with open('/tmp/grid.geojson', 'r') as f:
    data = json.load(f)

verts = []
centroids = []
for hexagon in data['features']:
    # a (7, 2) array of xy coordinates specifying the vertices of the hexagon;
    # we ignore the last vertex since it's equal to the first
    xy = np.array(hexagon['geometry']['coordinates'][0][:6])
    verts.append(xy)
    # compute the centroid by taking the average of the vertex coordinates
    centroids.append(xy.mean(0))

verts = np.array(verts)
centroids = np.array(centroids)

# construct a k-D tree from the centroid coordinates of the hexagons
tree = cKDTree(centroids)

# generate 10000 normally distributed xy coordinates
sigma = 0.5 * centroids.std(0, keepdims=True)
mu = centroids.mean(0, keepdims=True)
gen = np.random.RandomState(0)
xy = (gen.randn(10000, 2) * sigma) + mu

# query the k-D tree to find which hexagon centroid is nearest to each point
distance, idx = tree.query(xy, 1)

# count the number of points that are closest to each hexagon centroid
counts = np.bincount(idx, minlength=centroids.shape[0])
Plotting the output:
from matplotlib import pyplot as plt

fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
ax.scatter(xy[:, 0], xy[:, 1], 10, c='b', alpha=0.25, edgecolors='none')
ax.scatter(centroids[:, 0], centroids[:, 1], marker='h', s=(counts + 5),
           c=counts, cmap='Reds')
ax.margins(0.01)
I can think of several different ways you could handle points that fall outside your grid, depending on how much accuracy you need:
1. You could exclude points that fall outside the outer bounding rectangle of your hexagon vertices (i.e. x < xmin, x > xmax etc.). However, this will fail to exclude points that fall within the 'gaps' along the edges of your grid.
2. Another straightforward option would be to set a cut-off on distance according to the spacing of your hexagon centers, which is equivalent to using a circular approximation for your outer hexagons.
3. If accuracy is crucial, you could define a matplotlib.path.Path corresponding to the outer vertices of your hexagonal grid, then use its .contains_points() method to test whether your points are contained within it (see the sketch after this list). Compared to the other two methods, this would probably be slower and more fiddly to code.
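A minimal sketch of the third option; the single large hexagon here is a hypothetical stand-in for the true outer boundary of your grid, and xy plays the role of the query points above:
import numpy as np
from matplotlib.path import Path

# hypothetical outer boundary: one large hexagon centred on the origin
angles = np.linspace(0, 2 * np.pi, 7)[:-1]
boundary_verts = np.column_stack([np.cos(angles), np.sin(angles)])
boundary = Path(boundary_verts)

# stand-in query points
xy = np.random.default_rng(0).uniform(-1, 1, size=(1000, 2))

inside = boundary.contains_points(xy)  # boolean mask, one entry per point
xy_inside = xy[inside]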
I have (tons of) coordinates of points for closed curve(s), sorted in x-increasing order.
When I plot them in the regular way, the result I get is this:
(a circle as an example only; the shapes I currently have can be, at best, classified as amoeboids)
But the result I am looking for is something like this:
I have looked through matplotlib and I couldn't find anything (maybe I had my keywords wrong...?).
I have tried to reformat the data in the following ways:
Pick a point at random, find its nearest neighbor, then the next nearest neighbor, and so on...
This fails at the edges, where the data sometimes isn't too consistent (the nearest neighbor may be on the opposite side of the curve).
To account for inconsistent data, I tried to check whether the slope between two points (which are being considered as nearest neighbors) matches the previously connected slope. This fails, for reasons I could not find (I spent a considerable number of hours before I gave up).
Pick x_minimum and x_maximum (and the corresponding y coordinates), draw an imaginary line, and sort points on either side of the line. This fails when you have a curve that looks like a banana.
Is there a Python package/library that can help me get to where I want? Or can you help me with ideas to sort my data points better? Thanks in advance.
EDIT:
Tried the ConcaveHull on the circle I had; any idea why the lines are overlapping in places? Here's the image:
EDIT2:
The problem was sorted out by changing part of my code as suggested by @Reblochon Masque in the comment section of his answer.
If you don't know how your points are set up (if you do, I recommend you follow that order, as it will be faster), you can use the convex hull from scipy:
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

# RANDOM DATA
x = np.random.normal(0, 1, 100)
y = np.random.normal(0, 1, 100)
xy = np.hstack((x[:, np.newaxis], y[:, np.newaxis]))

# PERFORM CONVEX HULL
hull = ConvexHull(xy)

# PLOT THE RESULTS (repeat the first hull vertex to close the polygon)
verts = np.append(hull.vertices, hull.vertices[0])
plt.scatter(x, y)
plt.plot(x[verts], y[verts])
plt.show()
which, in the example above, results in this:
Notice this method will create a convex bounding polygon around your points.
Here is an example that may do what you want and solve your problem (more info here):
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull

points = np.random.rand(30, 2)  # 30 random points in 2-D
hull = ConvexHull(points)

plt.plot(points[:, 0], points[:, 1], 'o')
# draw each edge (simplex) of the hull in black
for simplex in hull.simplices:
    plt.plot(points[simplex, 0], points[simplex, 1], 'k-')
# outline the hull vertices with a dashed red line and mark the first vertex
plt.plot(points[hull.vertices, 0], points[hull.vertices, 1], 'r--', lw=2)
plt.plot(points[hull.vertices[0], 0], points[hull.vertices[0], 1], 'ro')
plt.show()
The points on the convex hull are plotted separately and joined to form a polygon. You can further manipulate them if you want.
I think this is maybe a good solution (easy and cheap) to implement in your case. It will work well if your shapes are convex.
In case your shapes are not all convex, one approach that might be successful could be to sort the points according to which neighbor is closest and draw a polygon from this sorted set, as in the sketch below.
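A minimal sketch of that nearest-neighbour ordering (the helper name order_by_nearest_neighbour is my own); as noted in the question, it can still go wrong where opposite sides of the curve come close together:
import numpy as np
from scipy.spatial import cKDTree

def order_by_nearest_neighbour(points):
    """Greedily order points so each one is followed by its nearest unvisited neighbour."""
    tree = cKDTree(points)
    n = len(points)
    visited = np.zeros(n, dtype=bool)
    order = [0]
    visited[0] = True
    for _ in range(n - 1):
        # look at a handful of nearest neighbours of the last point in the path
        _, idxs = tree.query(points[order[-1]], k=min(n, 10))
        nxt = next((int(j) for j in np.atleast_1d(idxs) if not visited[j]), None)
        if nxt is None:
            # all close neighbours already used: fall back to the nearest unvisited point
            rem = np.flatnonzero(~visited)
            d = np.linalg.norm(points[rem] - points[order[-1]], axis=1)
            nxt = int(rem[np.argmin(d)])
        order.append(nxt)
        visited[nxt] = True
    return points[np.array(order)]

# usage: ordered = order_by_nearest_neighbour(xy)
# plt.plot(*np.vstack([ordered, ordered[:1]]).T)  # repeat the first point to close the loop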