I would like to use Delaunay Triangulation in Python to interpolate the points in 3D.
What I have is
# my array of points
points = [[1,2,3], [2,3,4], ...]
# my array of values
values = [7, 8, ...]
# an object with triangulation
tri = Delaunay(points)
# a set of points at which I want to interpolate
p = [[1.5, 2.5, 3.5], ...]
# this gets simplexes that contain given points
s = tri.find_simplex(p)
# this gets vertices for the simplexes
v = tri.vertices[s]
I was only able to find one answer here that suggests using the transform method for the interpolation, but without being any more specific.
What I need to know is how to use the vertices of the containing simplex to get the weights for the linear interpolation. Let's assume a general n-dim case so that the answer does not depend on the dimension.
EDIT: I do not want to use LinearNDInterpolator or a similar approach because I do not have a number at each point as a value but something more complex (an array/function).
After some experimenting, the solution looks simple (this post was quite helpful):
# dimension of the problem (in this example I use 3D grid,
# but the method works for any dimension n>=2)
n = 3
# my array of grid points (array of n-dimensional coordinates)
points = [[1,2,3], [2,3,4], ...]
# each point has some assigned value that will be interpolated
# (e.g. a float, but it can be a function or anything else)
values = [7, 8, ...]
# a set of points at which I want to interpolate (it must be a NumPy array)
p = np.array([[1.5, 2.5, 3.5], [1.1, 2.2, 3.3], ...])
# create an object with triangulation
tri = Delaunay(points)
# find simplexes that contain interpolated points
s = tri.find_simplex(p)
# get the vertex indices for each simplex
# (tri.simplices is called tri.vertices in old SciPy versions)
v = tri.simplices[s]
# get transform matrices for each simplex (see explanation below)
m = tri.transform[s]
# for each interpolated point p, multiply the transform matrix by
# the vector p-r, where r=m[:,n,:] is one of the simplex vertices to
# which the matrix m is related (again, see below)
b = np.einsum('ijk,ik->ij', m[:,:n,:n], p-m[:,n,:])
# get the weights for the vertices; `b` contains an n-dimensional vector
# with weights for all but the last vertex of the simplex
# (note that for an n-D grid, each simplex consists of n+1 vertices);
# the remaining weight for the last vertex can be computed from
# the condition that the sum of the weights must be equal to 1
w = np.c_[b, 1-b.sum(axis=1)]
The key attribute to understand is transform, which is only briefly documented, but the documentation says all that needs to be said. For each simplex, transform[:,:n,:n] contains the transformation matrix, and transform[:,n,:] contains the vector r to which the matrix is related. It seems that the r vector is chosen as the last vertex of the simplex.
Another tricky point is how to get b, because what I want to do is something like
for i in range(len(p)): b[i] = m[i,:n,:n].dot(p[i]-m[i,n,:])
Essentially, I need an array of dot products, while dot gives the product of two arrays. A loop over the individual simplexes like the one above would work, but it can be done faster in one step, for which there is numpy.einsum:
b = np.einsum('ijk,ik->ij', m[:,:n,:n], p-m[:,n,:])
Now, v contains the indices of the vertex points for each simplex and w holds the corresponding weights. To get the interpolated values p_values at the set of points p, we do (note: values must be a NumPy array for this):
values = np.array(values)
p_values = np.zeros(len(p))
for i in range(len(p)): p_values[i] = np.inner(values[v[i]], w[i])
Or we may do this in a single step using `np.einsum` again:
p_values = np.einsum('ij,ij->i', values[v], w)
Some care must be taken in situations when some of the interpolated points lie outside the grid. In such a case, find_simplex(p) returns -1 for those points, and you will then have to mask them out (perhaps using masked arrays).
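For example, a minimal sketch of that masking (reusing p, s, v, w, and the values array from above; filling the outside points with NaN is just one possible choice) might look like this:
inside = s != -1                     # find_simplex returned -1 for points outside the hull
p_values = np.full(len(p), np.nan)   # placeholder for all interpolated values
p_values[inside] = np.einsum('ij,ij->i', values[v[inside]], w[inside])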
You don't need to implement this from scratch; there is already built-in support in scipy for this:
scipy.interpolate.LinearNDInterpolator
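For scalar values, a minimal usage sketch (assuming the points, values and p arrays from the question) might look like this:
from scipy.interpolate import LinearNDInterpolator

interp = LinearNDInterpolator(points, values)
p_values = interp(p)   # returns NaN for points outside the convex hull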
You need an interval and a linear interpolation, i.e. the length of the edge and the distance of the interpolated points from the start vertex.
I would like to build a function npbatch(U,X) which compares data points in an input matrix (U) with data points in a training matrix (X) and gives me, for each data point in U, the index of the point in X with the shortest Euclidean distance.
I would like to avoid any loops to increase the performance and I would like to use the function scipy.spatial.distance.cdist to compute the distance.
Example Input:
U
array([[0.69646919, 0.28613933, 0.22685145],
[0.55131477, 0.71946897, 0.42310646],
[0.9807642 , 0.68482974, 0.4809319 ]])
X
array([[0.24875591, 0.16306678, 0.78364326],
[0.80852339, 0.62562843, 0.60411363],
[0.8857019 , 0.75911747, 0.18110506]])
--> Expected Output: Array with the three indices of the data points in X which have the shortest distance to the three data points in U.
My overall target is then to get the label of the corresponding data point using the index which I've got. Example for label input would be:
Y
array([1, 0, 0])
Thank you for any hint!
With scipy.spatial.distance.cdist you already chose a well-suited function for the task. To get the indices, we just have to apply numpy.argmin along axis 0 (or axis 1 for cdist(U, X)):
ix = numpy.argmin(scipy.spatial.distance.cdist(X, U), 0)
Getting the label is then trivial:
Y[ix]
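Putting it together, a minimal sketch of the requested npbatch function (only the name and signature come from the question; the body is one possible implementation) could be:
import numpy as np
from scipy.spatial.distance import cdist

def npbatch(U, X):
    # for each data point in U, index of the closest (Euclidean) point in X
    return np.argmin(cdist(X, U), axis=0)

labels = Y[npbatch(U, X)]   # labels of the nearest training points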
I'm trying to extract the coordinates of the local maxima from a numpy 2D matrix. The values are numbers between 0 and 1 representing a likelihood that an object is in that location.
I've tried thresholding the matrix: extracting the argmax, saving its coordinates, changing its value to 0, and looping until the threshold is no longer met.
detections = []
while True:
    maxloc = np.unravel_index(np.argmax(scmap), scmap.shape)
    if scmap[maxloc] > 0.9:
        # other code ..
        detections.append(maxloc)
        scmap[maxloc] = 0
    else:
        break
# after that, what I did is calculating the euclidean distance
# between each pair and excluding the ones that do not meet the
# threshold
I am not satisfied with this, and I think there are more efficient and elegant ways to extract the local maxima. Thoughts?
Locating local maxima is a built-in feature of scikit-image: peak_local_max finds values that are maximal within some predefined distance.
from skimage.feature import peak_local_max
coordinates = peak_local_max(scmap, min_distance=5)
I'm not sure how this is actually implemented, but one implementation method is to perform non-maximum suppression (i.e. iterate through each value in the matrix and compare it with all values within a radius; if the value is not maximal in that window, set it to some predefined value like zero or -inf). Then take the coordinates of all non-suppressed values (possibly above some threshold) as the collection of local maxima.
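As a rough sketch of that suppression idea (an assumption about one possible implementation, not how peak_local_max actually works), scipy.ndimage.maximum_filter can compare each value against its neighborhood:
import numpy as np
from scipy.ndimage import maximum_filter

radius = 5   # neighborhood radius, analogous to min_distance
# a value is kept if it equals the maximum of its window and exceeds the threshold
is_peak = (scmap == maximum_filter(scmap, size=2 * radius + 1)) & (scmap > 0.9)
coordinates = np.argwhere(is_peak)   # (row, col) coordinates of the local maxima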
If you are trying to extract the coordinates of all values in a NumPy matrix that meet a certain threshold, you can simply compare the entire matrix against the threshold.
import numpy as np
data = np.array([
[0, 0.5, 0.95],
[0, 0.95, 0.5],
[0.95, 0.5, 0]
])
thresholded_coordinates = np.argwhere(data > 0.9)
# array([[0, 2], [1,1], [2, 0]])
thresholded_coordinates is a collection of coordinate pairs: (0, 2) indicates the third value in the first row (0-indexed). The full output is shown in the comment on the last line.
Given a 2D numpy array dist with shape (200, 200), where each entry represents the joint probability of (x1, x2) for x1, x2 ∈ {0, 1, ..., 199}, how do I sample bivariate data x = (x1, x2) from this probability distribution with the aid of the NumPy or SciPy API?
This solution works with a probability distribution of any number of dimensions, assuming it is a valid probability distribution (its contents must sum to 1, etc.). It flattens the distribution, samples from that, and adjusts the random index to match the original array shape.
# Create a flat copy of the array
flat = array.flatten()
# Then, sample an index from the 1D array with the
# probability distribution from the original array
sample_index = np.random.choice(a=flat.size, p=flat)
# Take this index and adjust it so it matches the original array
adjusted_index = np.unravel_index(sample_index, array.shape)
print(adjusted_index)
Also, to get multiple samples, add a size keyword argument to the np.random.choice call, and modify adjusted_index before printing it:
adjusted_index = np.array(zip(*adjusted_index))
This is necessary because, for multiple samples, np.unravel_index returns a separate array of indices for each coordinate dimension, so this zips them into a list of coordinate tuples. This is also much more efficient than simply repeating the first code.
Relevant documentation:
np.random.choice
np.unravel_index
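A minimal sketch combining the above to draw several samples at once (array is the normalized distribution as before; n_samples is an arbitrary choice):
import numpy as np

n_samples = 10
flat = array.flatten()
sample_index = np.random.choice(a=flat.size, p=flat, size=n_samples)
adjusted_index = np.unravel_index(sample_index, array.shape)
samples = np.array(list(zip(*adjusted_index)))   # one coordinate tuple per row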
Here's a way, but I'm sure there's a much more elegant solution using scipy.
numpy.random doesn't deal with 2d pmfs, so you have to do some reshaping gymnastics to go this way.
import numpy as np
# construct a toy joint pmf
dist=np.random.random(size=(200,200)) # here's your joint pmf
dist/=dist.sum() # it has to be normalized
# generate the set of all x,y pairs represented by the pmf
pairs=np.indices(dimensions=(200,200)).T # here are all of the x,y pairs
# make n random selections from the flattened pmf without replacement
# whether you want replacement depends on your application
n=50
inds=np.random.choice(np.arange(200**2),p=dist.reshape(-1),size=n,replace=False)
# inds is the set of n randomly chosen indices into the flattened dist array...
# therefore the random x,y selections
# come from selecting the associated elements
# from the flattened pairs array
selections = pairs.reshape(-1,2)[inds]
I can't comment either, but applemonkey496's suggestion for getting multiple samples doesn't work as written. It's an excellent solution otherwise.
Instead of
adjusted_index = np.array(zip(*adjusted_index))
the zipped indices should be converted to a Python list before being put into a numpy array (numpy arrays do not accept zip objects), e.g.:
adjusted_index = np.array(list(zip(*adjusted_index)))
I can't comment, but to improve kevinkayaks' answer:
pairs=np.indices(dimensions=(200,200)).T
selections = pairs.reshape(-1,2)[inds]
is not needed and can be replaced by (with m = 200, the grid width):
np.array([inds//m, inds%m]).T
The matrix "pairs" is not needed anymore.
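Equivalently, np.unravel_index (already used above) recovers the same indices without building the pairs matrix:
selections = np.array(np.unravel_index(inds, (200, 200))).T   # same as np.array([inds//m, inds%m]).T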
Here is my problem: I manipulate 432×46×136×136 grids representing time × space, stored in numpy arrays. I have one array alt, which holds the altitudes of the grid points, and another array temp, which stores the temperature of the grid points.
This is problematic for comparisons: if T1 and T2 are two results, T1[t0,z0,x0,y0] and T2[t0,z0,x0,y0] represent the temperature at H1[t0,z0,x0,y0] and H2[t0,z0,x0,y0] meters, respectively. But I want to compare the temperature of points at the same altitude, not at the same grid point.
Hence I want to modify the z-axis of my matrices to represent the altitude and not the grid point. I create a function conv(alt[t,z,x,y]) which assigns a number between -20 and 200 to each altitude. Here is my code:
def interpolation_extended(self, temp, alt):
    [t, z, x, y] = temp.shape
    new = np.zeros([t, 220, x, y])
    for l in range(0, t):
        for j in range(0, z):
            for lat in range(0, x):
                for lon in range(0, y):
                    new[l, conv(alt[l, j, lat, lon]), lat, lon] = temp[l, j, lat, lon]
    return new
But this definitely takes too much time; I can't work with it. I tried to write it using universal functions with numpy:
def interpolation_extended(self, temp, alt):
    [t, z, x, y] = temp.shape
    new = np.zeros([t, 220, x, y])
    for j in range(0, z):
        new[:, conv(alt[:, j, :, :]), :, :] = temp[:, j, :, :]
    return new
But that does not work. Do you have any idea how to do this in python/numpy without using 4 nested loops?
Thank you
I can't really try the code since I don't have your matrices, but something like this should do the job.
First, instead of declaring conv as a function, get the whole altitude projection for all your data:
conv = np.round(alt / 500.).astype(int)
np.round, NumPy's version of round, rounds all the elements of the matrix using vectorized operations in C, so you get the new array very quickly (at C speed). The following line aligns the altitudes to start at 0 by shifting the whole array by its minimum value (in your case, -20):
conv -= conv.min()
The line above transforms your altitude range from [-20, 200] to [0, 220] (better for indexing).
With that, interpolation can be done easily by getting multidimensional indices:
t, z, y, x = np.indices(temp.shape)
The vectors above contain all the indices needed to index your original matrix. You can then create the new matrix by doing:
new_matrix[t, conv[t, z, y, x], y, x] = temp[t, z, y, x]
without any loop at all.
Let me know if it works. It might give you some errors since it is hard for me to test without data, but it should do the job.
The following toy example works fine:
A = np.random.randn(3,4,5) # Random 3x4x5 matrix -- your temp matrix
B = np.random.randint(0, 10, 3*4*5).reshape(3,4,5) # your conv matrix with altitudes from 0 to 9
C = np.zeros((3,10,5)) # your new matrix
z, y, x = np.indices(A.shape)
C[z, B[z, y, x], x] = A[z, y, x]
C contains your results by altitude.
My code:
from numpy import *

def pca(orig_data):
    data = array(orig_data)
    data = (data - data.mean(axis=0)) / data.std(axis=0)
    u, s, v = linalg.svd(data)
    print s  # should be s**2 instead!
    print v

def load_iris(path):
    lines = []
    with open(path) as input_file:
        lines = input_file.readlines()
    data = []
    for line in lines:
        cur_line = line.rstrip().split(',')
        cur_line = cur_line[:-1]
        cur_line = [float(elem) for elem in cur_line]
        data.append(array(cur_line))
    return array(data)

if __name__ == '__main__':
    data = load_iris('iris.data')
    pca(data)
The iris dataset: http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Output:
[ 20.89551896 11.75513248 4.7013819 1.75816839]
[[ 0.52237162 -0.26335492 0.58125401 0.56561105]
[-0.37231836 -0.92555649 -0.02109478 -0.06541577]
[ 0.72101681 -0.24203288 -0.14089226 -0.6338014 ]
[ 0.26199559 -0.12413481 -0.80115427 0.52354627]]
Desired Output:
Eigenvalues - [2.9108 0.9212 0.1474 0.0206]
Principal Components - Same as I got but transposed so okay I guess
Also, what's with the output of the linalg.eig function? According to the PCA description on wikipedia, I'm supposed to do this:
cov_mat = cov(orig_data)
val, vec = linalg.eig(cov_mat)
print val
But it doesn't really match the output in the tutorials I found online. Plus, if I have 4 dimensions, I thought I should have 4 eigenvalues and not 150 like eig gives me. Am I doing something wrong?
edit: I've noticed that the values differ by a factor of 150, which is the number of elements in the dataset. Also, the eigenvalues are supposed to sum to the number of dimensions, in this case 4. What I don't understand is why this difference is happening. If I simply divided the eigenvalues by len(data) I could get the result I want, but I don't understand why. Either way, the proportion of the eigenvalues isn't altered, but they are important to me, so I'd like to understand what's going on.
You decomposed the wrong matrix.
Principal Component Analysis requires manipulating the eigenvectors/eigenvalues of the covariance matrix, not the data itself. The covariance matrix, created from a data matrix with m features, will be an m x m matrix; the closely related correlation matrix additionally has ones along the main diagonal.
You can indeed use the cov function, but you need further manipulation of your data. It's probably a little easier to use a similar function, corrcoef:
import numpy as NP
import numpy.linalg as LA
# a simulated data set with 8 data points, each point having five features
# (cast to float so the in-place mean centering below works)
data = NP.random.randint(0, 10, 40).reshape(8, 5).astype(float)
# usually a good idea to mean center your data first:
data -= NP.mean(data, axis=0)
# calculate the correlation matrix (a normalized covariance matrix)
C = NP.corrcoef(data, rowvar=0)
# returns an m x m matrix (here a 5 x 5 matrix)
# now get the eigenvalues/eigenvectors of C:
eval, evec = LA.eig(C)
To get the eigenvectors/eigenvalues, I did not decompose the covariance matrix using SVD, though you certainly can. My preference is to calculate them using eig in NumPy's (or SciPy's) LA module--it is a little easier to work with than svd: the return values are the eigenvectors and eigenvalues themselves, and nothing else. By contrast, as you know, svd doesn't return these directly.
Granted, the SVD function will decompose any matrix, not just square ones (to which the eig function is limited); however, when doing PCA you'll always have a square matrix to decompose, regardless of the form your data is in. This is because the matrix you are decomposing in PCA is a covariance (or correlation) matrix, which is square by definition: its rows and columns both index the same set of variables, and each cell holds the covariance of a pair of them (in the correlation matrix, the main diagonal is all ones because a variable has perfect correlation with itself).
The left singular vectors returned by SVD(A) are the eigenvectors of AA^T.
The covariance matrix of a (mean-centered) dataset A is 1/(N-1) * AA^T.
Now, when you do PCA by using the SVD, you have to divide the squared singular values by (N-1) to get the eigenvalues of the covariance matrix with the correct scale.
In your case, N=150 and you haven't done this division, hence the discrepancy.
This is explained in detail here
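A quick numerical check of that relationship (a sketch on random standardized data, not the iris set) could be:
import numpy as np

X = np.random.randn(150, 4)                  # 150 observations, 4 features
X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize as in the question
u, s, v = np.linalg.svd(X, full_matrices=False)

# squared singular values divided by (N-1) match the covariance eigenvalues
print(np.sort(s**2 / (len(X) - 1))[::-1])
print(np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1])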
(Can you ask one question, please? Or at least list your questions separately. Your post reads like a stream of consciousness because you are not asking one single question.)
You probably used cov incorrectly by not transposing the matrix first. If cov_mat is 4-by-4, then eig will produce four eigenvalues and four eigenvectors.
Note how SVD and PCA, while related, are not exactly the same. Let X be a 4-by-150 matrix of observations where each 4-element column is a single observation. Then, the following are equivalent:
a. the left singular vectors of X,
b. the principal components of X,
c. the eigenvectors of X X^T.
Also, the eigenvalues of X X^T are equal to the square of the singular values of X. To see all this, let X have the SVD X = QSV^T, where S is a diagonal matrix of singular values. Then consider the eigendecomposition D = Q^T X X^T Q, where D is a diagonal matrix of eigenvalues. Replace X with its SVD, and see what happens.
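Carrying that substitution through as a quick worked step: X X^T = Q S V^T V S Q^T = Q S^2 Q^T, so D = Q^T X X^T Q = S^2; that is, each eigenvalue of X X^T is the square of the corresponding singular value of X.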
Question already addressed: Principal component analysis in Python