Extract coordinates enclosed by a matplotlib patch. - python

I have created an ellipse using matplotlib.patches.ellipse as shown below:
patch = mpatches.Ellipse(center, major_ax, minor_ax, angle_deg, fc='none', ls='solid', ec='g', lw='3.')
What I want is a list of all the integer coordinates enclosed inside this patch.
I.e. If I was to plot this ellipse along with every integer point on the same grid, how many of those points are enclosed in the ellipse?
I have tried seeing if I can extract the equation of the ellipse so I can loop through each point and see whether it falls within the line but I can't seem to find an obvious way to do this, it becomes more complicated as the major axis of the ellipse can be orientated at any angle. The information to do this must be stored in patches somewhere, but I can't seem to find it.
Any advice on this would be much appreciated.

Ellipse objects have a method contains_point which will return 1 if the point is in the ellipse, 0 other wise.
Stealing from #DrV 's answer:
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, 30, facecolor=(1,0,0,.2), edgecolor='none')
# calculate the x and y points possibly within the ellipse
y_int = np.arange(-30, -15)
x_int = np.arange(40, 60)
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = list(zip(*(c.flat for c in g)))
# create the list of valid coordinates (from untransformed)
ellipsepoints = np.vstack([p for p in coords if el.contains_point(p, radius=0)])
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
plt.show()
This will give you the result as below:

If you really want to use the methods offered by matplotlib, then:
import matplotlib.pyplot as plt
import matplotlib.patches
import numpy as np
# create an ellipse
el = matplotlib.patches.Ellipse((50,-23), 10, 13.7, 30, facecolor=(1,0,0,.2), edgecolor='none')
# find the bounding box of the ellipse
bb = el.get_window_extent()
# calculate the x and y points possibly within the ellipse
x_int = np.arange(np.ceil(bb.x0), np.floor(bb.x1) + 1, dtype='int')
y_int = np.arange(np.ceil(bb.y0), np.floor(bb.y1) + 1, dtype='int')
# create a list of possible coordinates
g = np.meshgrid(x_int, y_int)
coords = np.array(zip(*(c.flat for c in g)))
# create a list of transformed points (transformed so that the ellipse is a unit circle)
transcoords = el.get_transform().inverted().transform(coords)
# find the transformed coordinates which are within a unit circle
validcoords = transcoords[:,0]**2 + transcoords[:,1]**2 < 1.0
# create the list of valid coordinates (from untransformed)
ellipsepoints = coords[validcoords]
# just to see if this works
fig = plt.figure()
ax = fig.add_subplot(111)
ax.add_artist(el)
ep = np.array(ellipsepoints)
ax.plot(ellipsepoints[:,0], ellipsepoints[:,1], 'ko')
Seems to work:
(Zooming in reveals that even the points hanging on the edge are inside.)
The point here is that matplotlib handles ellipses as transformed circles (translate, rotate, scale, anything affine). If the transform is applied in reverse, the result is a unit circle at origin, and it is very simple to check if a point is within that.
Just a word of warning: The get_window_extent may not be extremely reliable, as it seems to use the spline approximation of a circle. Also, see tcaswell's comment on the renderer-dependency.
In order to find a more reliable bounding box, you may:
create a horizontal and vertical vector into the plot coordinates (their position is not important, ([0,0],[1,0]) and ([0,0], [0,1]) will do)
transform these vectors into the ellipse coordinates (the get_transform, etc.)
find in the ellipse coordinate system (i.e. the system where the ellipse is a unit circle around the origin) the four tangents of the circle which are parallel to these two vectors
find the intersection points of the vectors (4 intersections, but 2 diagonal will be enough)
transform the intersection points back to the plot coordinates
This will give an accurate (but of course limited by the numerical precision) square bounding box.
However, you may use a simple approximation:
all possible points are within a circle whose center is the same as that of the ellipse and whose diameter is the same as that of the major axis of the ellipse
In other words, all possible points are within a square bounding box which is between x0+-m/2, y0+-m/2, where (x0, y0) is the center of the ellipse and m the major axis.

I'd like to offer another solution that uses the Path object's contains_points() method instead of contains_point():
First get the coordinates of the ellipse and make it into a Path object:
elpath=Path(el.get_verts())
(NOTE that el.get_paths() won't work for some reason.)
Then call the path's contains_points():
validcoords=elpath.contains_points(coords)
Below I'm comparing #tacaswell's solution (method 1), #Drv's (method 2) and my own (method 3) (I've enlarged the ellipse by ~5 times):
import numpy
import matplotlib.pyplot as plt
from matplotlib.patches import Ellipse
from matplotlib.path import Path
import time
#----------------Create an ellipse----------------
el=Ellipse((50,-23),50,70,30,facecolor=(1,0,0,.2), edgecolor='none')
#---------------------Method 1---------------------
t1=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
ellipsepoints = numpy.vstack([p for p in coords if el.contains_point(p, radius=0)])
t2=time.time()
print 'time of method 1',t2-t1
#---------------------Method 2---------------------
t2=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
invtrans=el.get_transform().inverted()
transcoords=invtrans.transform(coords)
validcoords=transcoords[:,0]**2+transcoords[:,1]**2<=1.0
ellipsepoints=coords[validcoords]
t3=time.time()
print 'time of method 2',t3-t2
#---------------------Method 3---------------------
t3=time.time()
for ii in range(50):
y=numpy.arange(-100,50)
x=numpy.arange(-30,130)
g=numpy.meshgrid(x,y)
coords=numpy.array(zip(*(c.flat for c in g)))
#------Create a path from ellipse's vertices------
elpath=Path(el.get_verts())
# call contains_points()
validcoords=elpath.contains_points(coords)
ellipsepoints=coords[validcoords]
t4=time.time()
print 'time of method 3',t4-t3
#---------------------Plot it ---------------------
fig,ax=plt.subplots()
ax.add_artist(el)
ep=numpy.array(ellipsepoints)
ax.plot(ellipsepoints[:,0],ellipsepoints[:,1],'ko')
plt.show(block=False)
I got these execution time:
time of method 1 62.2502269745
time of method 2 0.488734006882
time of method 3 0.588987112045
So the contains_point() approach is way slower. The coordinate-transformation method is faster than mine, but when you get irregular shaped contours/polygons, this method would still work.
Finally the result plot:

Related

Getting coordinates of surface nodes using pyvista

I'm wondering if anyone could help me figure out how to apply pyvista to extract the surface nodes of a 3D object. For example, suppose I have a collection of points that builds out a sphere, including 'interior' and 'surface' points:
import numpy as np
import matplotlib.pyplot as plt
N = 50
max_rad = 1
thetavec = np.linspace(0,np.pi,N)
phivec = np.linspace(0,2*np.pi,2*N)
[th, ph] = np.meshgrid(thetavec,phivec)
R = np.random.rand(*th.shape) * max_rad
x = R*np.sin(th)*np.cos(ph)
y = R*np.sin(th)*np.sin(ph)
z = R*np.cos(th)
ax = plt.axes(projection='3d')
ax.plot(x.flatten(), y.flatten(), z.flatten(), '*')
Now I'd like to apply pyvista's extract_surface to locate the 'nodes' that live on the surface, together with their coordinates. That is, I'd like for extract_surface to return an array or dataframe of the coordinates of the surface points. I've tried to build a polydata object just with the vertices above (see link and section 'Initialize with just vertices')
Any help is much appreciated. Thanks!
Since you've confirmed in a comment that you're looking for a convex hull, you can do this using the delaunay_3d() filter. The output of the triangulation is an UnstructuredGrid that contains a grid of tetrahedra that fills the convex hull of you mesh. Calling extract_surface() on this space-filling mesh will give you the actual exterior, i.e. the convex hull:
import numpy as np
import pyvista as pv
# your example data
N = 50
max_rad = 1
thetavec = np.linspace(0,np.pi,N)
phivec = np.linspace(0,2*np.pi,2*N)
[th, ph] = np.meshgrid(thetavec,phivec)
R = np.random.rand(*th.shape) * max_rad
x = R*np.sin(th)*np.cos(ph)
y = R*np.sin(th)*np.sin(ph)
z = R*np.cos(th)
# create a PyVista point cloud (in a PolyData)
points = np.array([x, y, z]).reshape(3, -1).T # shape (n_points, 3)
cloud = pv.PolyData(points)
# extract surface by Delaunay triangulation to get the convex hull
convex_hull = cloud.delaunay_3d().extract_surface() # contains faces
surface_points = convex_hull.cast_to_pointset() # only points
# check what we've got
surface_points.plot(
render_points_as_spheres=True,
point_size=10,
background='paleturquoise',
scalar_bar_args={'color': 'black'},
)
(On older PyVista versions where PolyData.cast_to_pointset() is not available, one can convex_hull.extract_points(range(convex_hull.n_points))).
The result looks like this:
Playing around with this interactively it's obvious that it only contains points from the convex hull (i.e. it doesn't contain interior points).
Also note the colouring: the scalars used are called 'vtkOriginalPointIds' which are what you would actually expect if you tried to guess: it is the index of each point in the original point cloud. So we can use these scalars to extract the indices of the points making up the point cloud:
# grab original point indices
surface_point_inds = surface_points.point_data['vtkOriginalPointIds']
# confirm that the indices are correct
print(np.array_equal(surface_points.points, cloud.points[surface_point_inds, :]))
# True
Of course if you don't need to identify the surface points in the original point cloud then you can just use surface_points.points or even convex_hull.points to get a standalone array of convex hull point coordinates.

speedup geolocation algorithm in python

I have a set 100k of of geo locations (lat/lon) and a hexogonal grid (4k polygons). My goal is to calculate the total number of points which are located within each polygon.
My current algorithm uses 2 for loops to loop over all geo points and all polygons, which is really slow if I increase the number of polygons... How would you speedup the algorithm? I have uploaded a minimal example which creates 100k random geo points and uses 561 cells in the grid...
I also saw that reading the geo json file (with 4k polygons) takes some time, maybe i should export the polygons into a csv?
hexagon_grid.geojson file:
https://gist.github.com/Arnold1/9e41454e6eea910a4f6cd68ff1901db1
minimal python example:
https://gist.github.com/Arnold1/ee37a2e4b2dfbfdca9bfae7c7c3a3755
You don't need to explicitly test each hexagon to see whether a given point is located inside it.
Let's assume, for the moment, that all of your points fall somewhere within the bounds of your hexagonal grid. Because your hexagons form a regular lattice, you only really need to know which of the hexagon centers is closest to each point.
This can be computed very efficiently using a scipy.spatial.cKDTree:
import numpy as np
from scipy.spatial import cKDTree
import json
with open('/tmp/grid.geojson', 'r') as f:
data = json.load(f)
verts = []
centroids = []
for hexagon in data['features']:
# a (7, 2) array of xy coordinates specifying the vertices of the hexagon.
# we ignore the last vertex since it's equal to the first
xy = np.array(hexagon['geometry']['coordinates'][0][:6])
verts.append(xy)
# compute the centroid by taking the average of the vertex coordinates
centroids.append(xy.mean(0))
verts = np.array(verts)
centroids = np.array(centroids)
# construct a k-D tree from the centroid coordinates of the hexagons
tree = cKDTree(centroids)
# generate 10000 normally distributed xy coordinates
sigma = 0.5 * centroids.std(0, keepdims=True)
mu = centroids.mean(0, keepdims=True)
gen = np.random.RandomState(0)
xy = (gen.randn(10000, 2) * sigma) + mu
# query the k-D tree to find which hexagon centroid is nearest to each point
distance, idx = tree.query(xy, 1)
# count the number of points that are closest to each hexagon centroid
counts = np.bincount(idx, minlength=centroids.shape[0])
Plotting the output:
from matplotlib import pyplot as plt
fig, ax = plt.subplots(1, 1, subplot_kw={'aspect': 'equal'})
ax.hold(True)
ax.scatter(xy[:, 0], xy[:, 1], 10, c='b', alpha=0.25, edgecolors='none')
ax.scatter(centroids[:, 0], centroids[:, 1], marker='h', s=(counts + 5),
c=counts, cmap='Reds')
ax.margins(0.01)
I can think of several different ways you could handle points that fall outside your grid depending on how much accuracy you need:
You could exclude points that fall outside the outer bounding rectangle of your hexagon vertices (i.e. x < xmin, x > xmax etc.). However, this will fail to exclude points that fall within the 'gaps' along the edges of your grid.
Another straightforward option would be to set a cut-off on distance according to the spacing of your hexagon centers, which is equivalent to using a circular approximation for your outer hexagons.
If accuracy is crucial then you could define a matplotlib.path.Path corresponding to the outer vertices of your hexagonal grid, then use its .contains_points() method to test whether your points are contained within it. Compared to the other two methods, this would probably be slower and more fiddly to code.

Check if points are inside ellipse faster than contains_point method

I use matplotlib 1.15.1 and I try to generate scattergrams like this:
The ellipses have fixes size and are drawn with center coordinates, width, height and angle (provided from outside): I have no idea what their equotions are.
g_ell_center = (0.8882, 0.8882)
g_ell_width = 0.36401857095483
g_ell_height = 0.16928136341606
g_ellipse = patches.Ellipse(g_ell_center, g_ell_width, g_ell_height, angle=angle, fill=False, edgecolor='green', linewidth=2)
This ellipses should mark normal and semi-normal data on my plot.
Then, I have an array of ~500 points which must be colored according to ellipse they belong to. So I tried to check each point with contains_point method:
colors_array = []
colors_scheme = ['green', 'yellow', 'black']
for point in points_array:
if g_ellipse.contains_point(point, radius=0):
colors_array.append(0)
elif y_ellipse.contains_point(point, radius=0):
colors_array.append(1)
else:
colors_array.append(2)
Finally, points are drawn:
plt.scatter(x_array, y_array, s=10, c=[colors_scheme[x] for x in colors_array], edgecolor="k", linewidths=0.3)
But contains_point is extremely slow! It worked for 5 minutes for 300-points scattergram, and I have to generate thousands of them in parallel. Maybe there's faster approach?
P.S. Whole project is bound to matplotlib, I can't use other libraries.
This approach should test if a point is within an ellipse, given the ellipse's centre, width, height and angle. You find the point's x and y coordinates relative to the ellipse centre, then transform those using the angle to be the coordinates along the major and minor axes. Finally, you find the normalised distance of the point from the cell centre, where a distance of 1 would be on the ellipse, less than 1 is inside, and more than 1 is outside.
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np
fig,ax = plt.subplots(1)
ax.set_aspect('equal')
# Some test points
x = np.random.rand(500)*0.5+0.7
y = np.random.rand(500)*0.5+0.7
# The ellipse
g_ell_center = (0.8882, 0.8882)
g_ell_width = 0.36401857095483
g_ell_height = 0.16928136341606
angle = 30.
g_ellipse = patches.Ellipse(g_ell_center, g_ell_width, g_ell_height, angle=angle, fill=False, edgecolor='green', linewidth=2)
ax.add_patch(g_ellipse)
cos_angle = np.cos(np.radians(180.-angle))
sin_angle = np.sin(np.radians(180.-angle))
xc = x - g_ell_center[0]
yc = y - g_ell_center[1]
xct = xc * cos_angle - yc * sin_angle
yct = xc * sin_angle + yc * cos_angle
rad_cc = (xct**2/(g_ell_width/2.)**2) + (yct**2/(g_ell_height/2.)**2)
# Set the colors. Black if outside the ellipse, green if inside
colors_array = np.array(['black'] * len(rad_cc))
colors_array[np.where(rad_cc <= 1.)[0]] = 'green'
ax.scatter(x,y,c=colors_array,linewidths=0.3)
plt.show()
Note, this whole script takes 0.6 seconds to run and process 500 points. That includes creating and saving the figure, etc.
The process of setting the colors_array using the np.where method above takes 0.00007s for 500 points.
Note, in an older implementation shown below, setting the colors_array in a loop took 0.00016 s:
colors_array = []
for r in rad_cc:
if r <= 1.:
# point in ellipse
colors_array.append('green')
else:
# point not in ellipse
colors_array.append('black')
Your current implementation should only be calling contains_point 25,000 to 50,000 times, which isn't a lot. So, I'm guessing that the implementation of contains_point is targeted toward precision rather than speed.
Since you have a distribution of points where only a small percentage will be in any given ellipse, and therefore most will rarely be anywhere near any given ellipse, you can easily use rectangular coordinates as a short-cut to figure out whether the point is close enough to the ellipse to be worth calling contains_point.
Compute the left and right x coordinates and top and bottom y coordinates of the ellipse, possibly with a bit of padding to account for rendering differences, then check if the point is within those, such as the following pseudo-code:
if point.x >= ellipse_left and point.x <= ellipse_right and _
point.y >= ellipse_top and point.y <= ellipse_bottom:
if ellipse.contains_point(point, radius=0):
... use the contained point here
This approach eliminates expensive calculations for most of the points, allowing simple comparisons instead to rule out the obvious mismatches, while preserving the accuracy of the computations where the point is close enough that it might be in the ellipse. If e.g. only 1% of your points are anywhere near a given ellipse, this approach will eliminate 99% of your calls to contains_point and instead replace them with much faster comparisons.

Take data from a circle in python

I am looking into how the intensity of a ring changes depending on angle. Here is an example of an image:
What I would like to do is take a circle of values from within the center of that doughnut and plot them vs angle. What I'm currently doing is using scipy.ndimage.interpolation.rotate and taking slices radially through the ring, and extracting the maximum of the two peaks and plotting those vs angle.
crop = np.ones((width,width)) #this is my image
slices = np.arange(0,width,1)
stack = np.zeros((2*width,len(slices)))
angles = np.linspace(0,2*np.pi,len(crop2))
for j in range(len(slices2)): # take slices
stack[:,j] = rotate(crop,slices[j],reshape=False)[:,width]
However I don't think this is doing what I'm actually looking for. I'm mostly struggling with how to extract the data I want. I have also tried applying a mask which looks like this;
to the image, but then I don't know how to get the values within that mask in the correct order (ie. in order of increasing angle 0 - 2pi)
Any other ideas would be of great help!
I made a different input image to help verifying correctness:
import numpy as np
import scipy as sp
import scipy.interpolate
import matplotlib.pyplot as plt
# Mock up an image.
W = 100
x = np.arange(W)
y = np.arange(W)
xx,yy = np.meshgrid(x,y)
image = xx//5*5 + yy//5*5
image = image / np.max(image) # scale into [0,1]
plt.imshow(image, interpolation='nearest', cmap='gray')
plt.show()
To sample values from circular paths in the image, we first build an interpolator because we want to access arbitrary locations. We also vectorize it to be faster.
Then, we generate the coordinates of N points on the circle's circumference using the parametric definition of the circle x(t) = sin(t), y(t) = cos(t).
N should be at least twice the circumference (Nyquist–Shannon sampling theorem).
interp = sp.interpolate.interp2d(x, y, image)
vinterp = np.vectorize(interp)
for r in (15, 30, 45): # radii for circles around image's center
xcenter = len(x)/2
ycenter = len(y)/2
arclen = 2*np.pi*r
angle = np.linspace(0, 2*np.pi, arclen*2, endpoint=False)
value = vinterp(xcenter + r*np.sin(angle),
ycenter + r*np.cos(angle))
plt.plot(angle, value, label='r={}'.format(r))
plt.legend()
plt.show()

Python determine the mean and extract maximum value inside a polygon over a grid in an Array

I have 3 NumPy arrays which consist of UTM-X(256) and UTM-Y(256) coordinates, and the accumulated Rainfall(65536) for a Weather Radar 256x256 (km) in UTM.
I also have a Polygon inside the Grid bounds that is a Catchment Boundary in UTM.
I need to determine the Average Rainfall over just the catchment polygon (a clipped sub set of the RADAR data), and the maximum, and the location of the maximum. I have already determined the average over the entire RADAR grid.
So the question is: How do I perform analysis on a subset of a NumPy array that is determined by the Polygon? I would have thought that this would be a very common operation, but have not found any Python scripts to perform this operation.
Here is an illustration of the data set:
Here is an outline of a possible approach.
First find the polygon that bounds the catchment boundary. Presuming you know which of the UTM coordinates of your full set of points form that catchment boundary, say it's like this,
catchment = an np.array of (UTM_X, UTM_Y) point tuples
you could find the boundary of that point set using scipy.spatial.ConvexHull
boundary= scipy.spatial.ConvexHull(catchment)
Next, for your array of rainfall data, you would have to test whether the coordinates fall inside or outside of the boundary of the convex hull.
This previous SO question has some good answers explaining ways to do that coordinate test.
Finally you would gather those rainfall data points that passed the test of being inside the boundary and perform whatever statistical calculations you want to do with appropriate NumPy/SciPy statistical functions.
Assuming the boundary is given as a list of the polygon vertices, you could have matplotlib generate a mask for you over the data coordinates and then use that mask to sum up only the values within the contour.
In other words, when you have a series of coordinates that define the boundary of the polygon that marks the region of interest, then have matplotlib generate a boolean mask indicating all the coordinates that are within this polygon. This mask can then be used to extract only the limited dataset of rainfall within the contour.
The following simple example shows you how this is done:
import numpy as np
from matplotlib.patches import PathPatch
from matplotlib.path import Path
import matplotlib.pyplot as plt
# generate some fake data
xmin, xmax, ymin, ymax = -10, 30, -4, 20
y,x = np.mgrid[ymin:ymax+1,xmin:xmax+1]
z = (x-(xmin+xmax)/2)**2 + (y-(ymin + ymax)/2)**2
extent = [xmin-.5, xmax+.5, ymin-.5, ymax+.5]
xr, yr = [np.random.random_integers(lo, hi, 3) for lo, hi
in ((xmin, xmax), (ymin, ymax))] # create a contour
coordlist = np.vstack((xr, yr)).T # create an Nx2 array of coordinates
coord_map = np.vstack((x.flatten(), y.flatten())).T # create an Mx2 array listing all the coordinates in field
polypath = Path(coordlist)
mask = polypath.contains_points(coord_map).reshape(x.shape) # have mpl figure out which coords are within the contour
f, ax = plt.subplots(1,1)
ax.imshow(z, extent=extent, interpolation='none', origin='lower', cmap='hot')
ax.imshow(mask, interpolation='none', extent=extent, origin='lower', alpha=.5, cmap='gray')
patch = PathPatch(polypath, facecolor='g', alpha=.5)
ax.add_patch(patch)
plt.show(block=False)
print(z[mask].sum()) # prints out the total accumulated
In this example, x and y represent your UTM-X and UTM-Y dataranges. z represents the weather rainfall data, but is in this case a matrix, unlike your single-column view of average rainfall (which is easily remapped onto a grid).
In the last line, I've summed up all the values of z that are within the contour. If you want the mean, just replace sum by mean.

Categories