Finding k nearest neighbors in 3d numpy array - python

So I'm trying to find the k nearest neighbors in a pyvista numpy array from an example mesh. With the neighbors received, I want to implement some region growing in my 3d model.
But unfortunately I receive some weird output, which you can see in the following picture.
It seems like I'm missing something in the KDTree usage. I was following the answer to a similar question: https://stackoverflow.com/a/2486341/9812286
import numpy as np
from sklearn.neighbors import KDTree
import pyvista as pv
from pyvista import examples
# Example dataset with normals
mesh = examples.load_random_hills()
smooth = mesh
NDIM = 3
X = smooth.points
point = X[5000]
tree = KDTree(X, leaf_size=X.shape[0]+1)
# ind = tree.query_radius([point], r=10) # indices of neighbors within distance 0.3
distances, ind = tree.query([point], k=1000)
p = pv.Plotter()
p.add_mesh(smooth)
ids = np.arange(smooth.n_points)[ind[0]]
top = smooth.extract_cells(ids)
random_color = np.random.random(3)
p.add_mesh(top, color=random_color)
p.show()

You're almost there :) The problem is that you are using the points in the mesh to build the tree, but then extracting cells. Of course these are unrelated in the sense that indices for points will give you nonsense when applied as indices of cells.
Either you have to extract_points:
import numpy as np
from sklearn.neighbors import KDTree
import pyvista as pv
from pyvista import examples
# Example dataset with normals
mesh = examples.load_random_hills()
smooth = mesh
NDIM = 3
X = smooth.points
point = X[5000]
tree = KDTree(X, leaf_size=X.shape[0]+1)
# ind = tree.query_radius([point], r=10) # indices of neighbors within distance 0.3
distances, ind = tree.query([point], k=1000)
p = pv.Plotter()
p.add_mesh(smooth)
ids = np.arange(smooth.n_points)[ind[0]]
top = smooth.extract_points(ids) # changed here!
random_color = np.random.random(3)
p.add_mesh(top, color=random_color)
p.show()
Or you have to work with cell centers to begin with:
import numpy as np
from sklearn.neighbors import KDTree
import pyvista as pv
from pyvista import examples
# Example dataset with normals
mesh = examples.load_random_hills()
smooth = mesh
NDIM = 3
X = smooth.cell_centers().points # changed here!
point = X[5000]
tree = KDTree(X, leaf_size=X.shape[0]+1)
# ind = tree.query_radius([point], r=10) # indices of neighbors within distance 0.3
distances, ind = tree.query([point], k=1000)
p = pv.Plotter()
p.add_mesh(smooth)
ids = np.arange(smooth.n_points)[ind[0]]
top = smooth.extract_cells(ids)
random_color = np.random.random(3)
p.add_mesh(top, color=random_color)
p.show()
As you can see, the two results differ, since index 5000 (which we used for the reference point) means something else when indexing points or when indexing cells.
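A quick way to see the mismatch on the same example mesh (just a diagnostic sketch, not part of the fix) is to compare the two counts, which are generally different:
# Point indices and cell indices live in different ranges on the same mesh,
# so one cannot be reused as the other.
print(smooth.n_points, smooth.n_cells)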

Related

How to find the intersection of 2 convex hulls?

I have two convex hulls. Let's assume they are given as scipy.spatial.ConvexHull objects:
import numpy as np
from scipy.spatial import ConvexHull

points1 = np.random.rand(10, 3)
points2 = np.random.rand(10, 3)
hull1 = ConvexHull(points1)
hull2 = ConvexHull(points2)
I would like the convex hull that is the intersection of these two convex hulls, but could not find a built in method to do this.
I assume this can be done manually somehow by using scipy.spatial.HalfspaceIntersection by using half spaces defined by hull1 to cut off hull2, but still having trouble doing it, and can't believe this is not already implemented somewhere.
Note that I don't mind if scipy is not used.
I would try pycddlib, which implements the double description of polyhedra. The double description of a polyhedron is:
V-description: description by vertices
H-description: description by system of linear inequalities ("H" for "hyperplanes")
You probably have the vertices of your two convex polyhedra. Convert to the H-descriptions, then combine the two systems of linear inequalities, and then convert to the V-representation.
Here is an example.
import numpy as np
import pyvista as pv
import cdd as pcdd
from scipy.spatial import ConvexHull
# take one cube
cube1 = pv.Cube()
# take the same cube but translate it
cube2 = pv.Cube()
cube2.translate((0.5, 0.5, 0.5))  # note: recent PyVista versions may require inplace=True
# plot
pltr = pv.Plotter(window_size=[512,512])
pltr.add_mesh(cube1)
pltr.add_mesh(cube2)
pltr.show()
# I don't know why, but there are duplicates in the PyVista cubes;
# here are the vertices of each cube, without duplicates
pts1 = cube1.points[0:8, :]
pts2 = cube2.points[0:8, :]
# make the V-representation of the first cube; you have to prepend
# with a column of ones
v1 = np.column_stack((np.ones(8), pts1))
mat = pcdd.Matrix(v1, number_type='fraction') # use fractions if possible
mat.rep_type = pcdd.RepType.GENERATOR
poly1 = pcdd.Polyhedron(mat)
# make the V-representation of the second cube; you have to prepend
# with a column of ones
v2 = np.column_stack((np.ones(8), pts2))
mat = pcdd.Matrix(v2, number_type='fraction')
mat.rep_type = pcdd.RepType.GENERATOR
poly2 = pcdd.Polyhedron(mat)
# H-representation of the first cube
h1 = poly1.get_inequalities()
# H-representation of the second cube
h2 = poly2.get_inequalities()
# join the two sets of linear inequalities; this will give the intersection
hintersection = np.vstack((h1, h2))
# make the V-representation of the intersection
mat = pcdd.Matrix(hintersection, number_type='fraction')
mat.rep_type = pcdd.RepType.INEQUALITY
polyintersection = pcdd.Polyhedron(mat)
# get the vertices; they are given in a matrix prepended by a column of ones
vintersection = polyintersection.get_generators()
# get rid of the column of ones
ptsintersection = np.array([
    vintersection[i][1:4] for i in range(8)
])
# these are the vertices of the intersection; it remains to take
# the convex hull
ConvexHull(ptsintersection)
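If you also want to visualize the intersection together with the two cubes, here is a minimal sketch (my own addition, assuming the variables from the code above); it turns the hull's triangles into a PyVista mesh:
hull = ConvexHull(ptsintersection)
# PyVista expects faces as [n_vertices, i0, i1, ...]; the hull facets are triangles
faces = np.column_stack(
    (np.full(len(hull.simplices), 3), hull.simplices)
).ravel()
intersection_mesh = pv.PolyData(hull.points, faces)
pltr = pv.Plotter()
pltr.add_mesh(cube1, style="wireframe")
pltr.add_mesh(cube2, style="wireframe")
pltr.add_mesh(intersection_mesh, color="red")
pltr.show()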

What is the best way/method to digitize the data of a 3D surface into a grid of pixels with smaller resolution in Python?

I want to digitize (= average out over cells) photon count data into pixels defined by a grid that specifies how they are aligned. The photon count data is stored in a 2D array. I want to split that data into cells, each of which corresponds to one pixel. The idea is basically the same as downscaling an HD image to a smaller resolution. I'd like to achieve this in Python.
The digitizing function I've written:
import numpy as np

def digitize(function_data, grid_shape):
    """
    function_data: 2D array of function values of some 3D shape,
        e.g. exp(-(x^2 + y^2)) -> want to digitize this
    grid_shape: an array of length 2 which contains the dimensions of the
        smaller resolution
    """
    l = len(function_data)
    pixel_len_x = int(l / grid_shape[0])
    pixel_len_y = int(l / grid_shape[1])
    digitized_data = np.empty((grid_shape[0], grid_shape[1]))
    for i in range(grid_shape[0]):  # row-index of pixel in smaller-resolution grid
        for j in range(grid_shape[1]):  # column-index of pixel in smaller-resolution grid
            hd_pixel = []
            for k in range(pixel_len_y):
                hd_pixel.append(function_data[k][j:j*pixel_len_x])
            hd_pixel = np.ravel(hd_pixel)  # turn the 2D array into 1D to compute the average
            pixel_avg = np.average(hd_pixel)
            digitized_data[i][j] = pixel_avg
    return digitized_data
In theory, this function should do what I want to achieve, but when tested it doesn't yield the expected results. Either a completed version of my function or any other method that achieves my goal would be extremely helpful.
You could also use an interpolation function, if you can use SciPy. Here we use one of the gridded-data interpolating functions, RectBivariateSpline, to upsample your function, but you can find numerous examples on this and other sites.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import RectBivariateSpline as rbs
# Sampling coordinates
x = np.linspace(-2,2,20)
y = np.linspace(-2,2,30)
# Your function
f = np.exp(-(x[:,None]**2 + y**2))
# Interpolator
interp = rbs(x, y, f)
# Higher resolution coordinates
x_hd = np.linspace(x.min(), x.max(), x.size * 5)
y_hd = np.linspace(y.min(), y.max(), y.size * 5)
# New higher res function
f_hd = interp(x_hd, y_hd, grid = True)
# Some plots
fig, ax = plt.subplots(ncols = 2)
ax[0].imshow(f)
ax[1].imshow(f_hd)
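If, on the other hand, you really want the block averaging described in the question (reducing the resolution by averaging each cell) rather than interpolation, a reshape-based sketch avoids explicit loops. This is my own addition and assumes the high-resolution array's dimensions are exact multiples of the target grid shape:
import numpy as np

def downsample_by_averaging(function_data, grid_shape):
    """Average an (H, W) array over blocks so the result has shape grid_shape."""
    h, w = function_data.shape
    gh, gw = grid_shape
    # assumes h % gh == 0 and w % gw == 0
    blocks = function_data.reshape(gh, h // gh, gw, w // gw)
    return blocks.mean(axis=(1, 3))

# Example: 60x60 samples of exp(-(x^2 + y^2)) averaged down to a 20x20 grid
x = np.linspace(-2, 2, 60)
y = np.linspace(-2, 2, 60)
f = np.exp(-(x[:, None]**2 + y[None, :]**2))
print(downsample_by_averaging(f, (20, 20)).shape)  # (20, 20)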

Python shapely: Aggregating points to shape files for a Choropleth map

I'm trying to create a choropleth in Python 3 using shapely, fiona & bokeh for display.
I have a file with about 7000 lines that each contain the location of a town and a counter.
Example:
54.7604;9.55827;208
54.4004;9.95918;207
53.8434;9.95271;203
53.5979;10.0013;201
53.728;10.2526;197
53.646;10.0403;196
54.3977;10.1054;193
52.4385;9.39217;193
53.815;10.3476;192
...
I want to show these in a 12.5 km grid, for which a shapefile is available at
https://opendata-esri-de.opendata.arcgis.com/datasets/3c1f46241cbb4b669e18b002e4893711_0
The code I have works, but it's very slow, because it's a brute-force algorithm that checks each of the 7127 grid cells against all of the 7000 points.
import pandas as pd
import fiona
from shapely.geometry import Polygon, Point, MultiPoint, MultiPolygon
from shapely.prepared import prep
sf = r'c:\Temp\geo_de\Hexagone_125_km\Hexagone_125_km.shp'
shp = fiona.open(sf)
district_xy = [ [ xy for xy in feat["geometry"]["coordinates"][0]] for feat in shp]
district_poly = [ Polygon(xy) for xy in district_xy] # coords to Polygon
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
map_points = [Point(x,y) for x,y in zip(df_p.lon, df_p.lat)] # Convert Points to Shapely Points
all_points = MultiPoint(map_points) # all points
def calc_points_per_poly(poly, points, values):  # Returns total for poly
    poly = prep(poly)
    return sum([v for p, v in zip(points, values) if poly.contains(p)])

# this is the slow part:
# for each shape this sums up the point counts
sum_hex = [calc_points_per_poly(x, all_points, df_p['count']) for x in district_poly]
Since this is extremely slow, I'm wondering if there is a faster way to compute the sum_hex values, especially since the real-world list of points may be a lot larger and a finer grid with more shapes would deliver a better result.
I would recommend using geopandas and its built-in rtree-based spatial index. It lets you run the precise containment check only for points that could possibly lie within a polygon.
import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon, Point
sf = 'Hexagone_125_km.shp'
shp = gpd.read_file(sf)
df_p = pd.read_csv('points_file.csv', sep=';', header=None)
df_p.columns = ('lat', 'lon', 'count')
gdf_p = gpd.GeoDataFrame(df_p, geometry=[Point(x,y) for x,y in zip(df_p.lon, df_p.lat)])
sum_hex = []
spatial_index = gdf_p.sindex
for index, row in shp.iterrows():
    polygon = row.geometry
    possible_matches_index = list(spatial_index.intersection(polygon.bounds))
    possible_matches = gdf_p.iloc[possible_matches_index]
    precise_matches = possible_matches[possible_matches.within(polygon)]
    sum_hex.append(sum(precise_matches['count']))

shp['sum'] = sum_hex
This solution should be faster than yours. You can then plot your GeoDataFrame via Bokeh. If you want more details on spatial indexing, I recommend this article by Geoff Boeing: https://geoffboeing.com/2016/10/r-tree-spatial-index-python/
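As a further shortcut (my addition, not part of the answer above), a spatial join can replace the explicit loop entirely; note that older geopandas versions spell the keyword op= instead of predicate=:
# Hypothetical alternative using a spatial join: attach each point to the
# hexagon containing it, then sum the counts per hexagon.
joined = gpd.sjoin(gdf_p, shp, predicate="within")
counts = joined.groupby("index_right")["count"].sum()
shp["sum"] = counts.reindex(shp.index, fill_value=0)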

Basemap interpolation alternative - regridding data

I'm moving from basemap to cartopy given basemap is going to be phased out. I've previously used the basemap.interp functionality to interpolate data, e.g. say I have data at 1 degree resolution (180x360), I would run the following to interpolate to 0.5 degrees.
import numpy as np
from mpl_toolkits import basemap
Old_Lon = np.linspace(-180,180,360)
Old_Lat = np.linspace(-90,90,180)
New_Lon = np.linspace(-180,180,720)
New_Lat = np.linspace(-90,90,360)
New_Lon,New_Lat = np.meshgrid(New_Lon,New_Lat)
New_Data = basemap.interp(Old_Data,Old_Lon,Old_Lat,New_Lon,New_Lat,order=0)
order lets me choose between nearest neighbour, bilinear, etc. Is there an alternative that does this in as simple a way? I've seen that SciPy has interpolation routines, but I'm not sure how to apply them. Any help would be appreciated!
I eventually decided to take the raw code from Basemap and make it into a standalone function. I'll be recommending that the cartopy developers implement it, as it's a useful feature. Posting it here as it could be useful to someone else:
import numpy as np

def Interp(datain, xin, yin, xout, yout, interpolation='NearestNeighbour'):
    """
    Interpolates a 2D array onto a new grid (only works for linear grids),
    with the Lat/Lon inputs of the old and new grid. Can perform nearest
    neighbour interpolation or bilinear interpolation (of order 1).
    This is an extract from the basemap module (truncated).
    """
    # Mesh coordinates so that they are both 2D arrays
    xout, yout = np.meshgrid(xout, yout)

    # compute grid coordinates of output grid
    delx = xin[1:] - xin[0:-1]   # grid spacings (unused in this truncated extract)
    dely = yin[1:] - yin[0:-1]
    xcoords = (len(xin)-1)*(xout-xin[0])/(xin[-1]-xin[0])
    ycoords = (len(yin)-1)*(yout-yin[0])/(yin[-1]-yin[0])
    xcoords = np.clip(xcoords, 0, len(xin)-1)
    ycoords = np.clip(ycoords, 0, len(yin)-1)

    # Interpolate to output grid using nearest neighbour
    if interpolation == 'NearestNeighbour':
        xcoordsi = np.around(xcoords).astype(np.int32)
        ycoordsi = np.around(ycoords).astype(np.int32)
        dataout = datain[ycoordsi, xcoordsi]

    # Interpolate to output grid using bilinear interpolation
    elif interpolation == 'Bilinear':
        xi = xcoords.astype(np.int32)
        yi = ycoords.astype(np.int32)
        xip1 = np.clip(xi + 1, 0, len(xin)-1)
        yip1 = np.clip(yi + 1, 0, len(yin)-1)
        delx = xcoords - xi.astype(np.float32)
        dely = ycoords - yi.astype(np.float32)
        dataout = (1.-delx)*(1.-dely)*datain[yi, xi] + \
                  delx*dely*datain[yip1, xip1] + \
                  (1.-delx)*dely*datain[yip1, xi] + \
                  delx*(1.-dely)*datain[yi, xip1]

    return dataout
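A hypothetical call mirroring the grids from the question (Old_Data stands for the original 180x360 array; note that Interp builds the meshgrid internally, so the new coordinates are passed as 1-D arrays):
Old_Lon = np.linspace(-180, 180, 360)
Old_Lat = np.linspace(-90, 90, 180)
New_Lon = np.linspace(-180, 180, 720)
New_Lat = np.linspace(-90, 90, 360)
# Old_Data: the existing array indexed as (lat, lon), i.e. shape (180, 360)
New_Data = Interp(Old_Data, Old_Lon, Old_Lat, New_Lon, New_Lat,
                  interpolation='Bilinear')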
--
The SciPy interpolation routines return an object that you can call to perform the interpolation. For nearest-neighbour interpolation on a regular grid, you can use scipy.interpolate.RegularGridInterpolator:
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# old_lon, old_lat, old_data and new_lon, new_lat as defined in the snippet further down
nearest_function = RegularGridInterpolator(
    (old_lon, old_lat), old_data, method="nearest", bounds_error=False
)
new_data = np.array(
    [[nearest_function([i, j]) for j in new_lat] for i in new_lon]
).squeeze()
That isn't perfect, though, because the values at lon=175 are all fill values. (If I hadn't set bounds_error=False then you'd get an error there.) In that case, you need to decide how you want to wrap around the dateline. A straightforward solution would be to copy the first longitude line (lon=-180, the same meridian as lon=180) to the end of the array and call it lon=180.
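A minimal sketch of that wrap-around (my addition, using the old_lon/old_lat/old_data arrays from the snippet below): duplicate the first longitude column at +180 before building the interpolator:
# old_data is indexed as [lon, lat]; repeat the lon = -180 column at lon = +180
old_lon_wrapped = np.append(old_lon, 180.0)
old_data_wrapped = np.concatenate([old_data, old_data[:1, :]], axis=0)
wrapped_function = RegularGridInterpolator(
    (old_lon_wrapped, old_lat), old_data_wrapped, method="nearest"
)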
Should you want linear or higher order interpolation one day, which I'd recommend if your data are points rather than cells, you can use scipy.interpolate.RectBivariateSpline:
import numpy as np
from scipy.interpolate import RectBivariateSpline
old_step = 10
old_lon = np.arange(-180, 180, old_step)
old_lat = np.arange(-90, 90, old_step)
old_data = np.random.random((len(old_lon), len(old_lat)))
interp_function = RectBivariateSpline(old_lon, old_lat, old_data, kx=1, ky=1)
new_step = old_step / 2  # assumption: halve the grid spacing, i.e. double the resolution
new_lon = np.arange(-180, 180, new_step)
new_lat = np.arange(-90, 90, new_step)
new_data = interp_function(new_lon, new_lat)

Extracting clusters in the form of lines from a dataset

I have a dataset similar to the one shown below that, from my point of view, clearly forms lines. Instead of drawing markers, I want to connect the markers within each curve by a line. I am curious what type of clustering algorithm would be a good fit in this case.
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
#Generate (x,y) data
x = np.linspace(0.1,0.9,50)
y = x%1
x += np.sin(2*x%1)
y = y%0.2
#Shuffle (x,y) data
ns = list(range(len(x)))
np.random.shuffle(ns)
x = x[ns]
y = y[ns]
#Plot
fig, axs = plt.subplots(1,2)
axs[0].scatter(x,y)
axs[1].plot(x,y)
plt.savefig("markers vs lines.pdf")
Figure - Left: Markers, Right: Data points connected by lines.
Since you asked for a clustering algorithm, you may want to look at DBSCAN.
http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html
It has two parameters: epsilon and the minimum number of points required to form a cluster.
Here is some code to get you started:
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(42)
%matplotlib inline
#Generate (x,y) data
x = np.linspace(0.1,0.9,50)
y = x%1
x += np.sin(2*x%1)
y = y%0.2
#Shuffle (x,y) data
ns = list(range(len(x)))
np.random.shuffle(ns)
x = x[ns]
y = y[ns]
"""
Fit the Data
"""
X = [i for i in zip(x,y)]
X = StandardScaler().fit_transform(X)
"""
Compute the DBSCAN
"""
db = DBSCAN(eps=0.5, min_samples=1).fit(X)
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_clusters_
"""
Plot the clusters
"""
d= dict(zip(set(labels),['red','green','blue','yellow','purple','grey']))
d[-1] = "black"
plt.scatter(x,y,color=[ d[i] for i in labels])
plt.show()
The result:
Inspired by: http://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html
More about the parameters of DBSCAN here: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN
Hope this helps.
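Once you have the labels, connecting the markers within each cluster (what the question ultimately asks for) is a small extra step; a possible sketch, assuming labels, x and y from the code above:
# Draw each detected cluster as a line by ordering its points along x
for lab in set(labels):
    if lab == -1:        # -1 marks noise points in DBSCAN
        continue
    mask = labels == lab
    order = np.argsort(x[mask])
    plt.plot(x[mask][order], y[mask][order])
plt.show()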
Such data is common in image analysis, due to architecture.
In order to infer perspective, people have used the Hough transform to identify lines of points.
That is probably the best method to use here.
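A minimal sketch of that idea with scikit-image (my own addition; it assumes the scattered points are first rasterized into a binary image, since the Hough transform operates on images):
import numpy as np
from skimage.transform import hough_line, hough_line_peaks

# Rasterize the scattered (x, y) points into a coarse binary image
img, _, _ = np.histogram2d(y, x, bins=200)
img = img > 0

# Accumulate votes for lines and keep the strongest peaks
h, theta, d = hough_line(img)
accum, angles, dists = hough_line_peaks(h, theta, d)
print(len(angles), "candidate lines found")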
