I have a numpy array of size (9126, 12) and two reference cluster points of shape (2, 12), and I'm trying to calculate the distance from each row of the array to each reference point in order to label the rows. I understand how this is meant to work in principle, but I can't implement it because the arrays have different sizes.
I know I could use numpy.linalg, but it's part of a homework assignment, so I'm not allowed to.
ValueError: operands could not be broadcast together with shapes (9126,12) (2,12)
def euclid_dist(v1, v2):
    return np.sqrt(((v1 - v2)**2).sum(axis=1))

def check_euclid_dist(data, reference_vectors):
    npdata = data.to_numpy()
    dst = euclid_dist(npdata, reference_vectors)
    # Get the indices of the minimum element in the numpy array
    result = np.where(dst == np.amin(dst))
    print(result)
    return result
You can compute the distance of each vector to each of the reference points by inserting an extra dimension in both arrays and letting NumPy broadcast them against each other:
distances = np.linalg.norm(npdata[:, None, ...] - reference_vectors[None, ...], axis=-1)
Then you can find the nearest cluster by using np.argmin:
cluster_id = np.argmin(distances, axis=1)
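Since the assignment rules out numpy.linalg, the same broadcasting idea works with just a square, a sum and a sqrt. A minimal sketch with made-up data in the shapes from the question:

import numpy as np

# Hypothetical data with the shapes from the question.
npdata = np.random.rand(9126, 12)
reference_vectors = np.random.rand(2, 12)

# (9126, 1, 12) broadcasts against (1, 2, 12) -> (9126, 2, 12);
# summing the squared differences over the last axis gives all distances at once.
diff = npdata[:, None, :] - reference_vectors[None, :, :]
distances = np.sqrt((diff ** 2).sum(axis=-1))   # shape (9126, 2)

cluster_id = np.argmin(distances, axis=1)       # index of the nearest reference point per row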
I have an image of about 8000x9000 pixels as a numpy array. I also have a list of indices in a 2xN numpy array. The indices are fractional and may fall outside the image bounds. I need to interpolate the image and find the values at the given indices. If an index falls outside the image, I need to return numpy.nan for it. Currently I'm doing it in a for loop as below:
def interpolate_image(image: numpy.ndarray, indices: numpy.ndarray) -> numpy.ndarray:
    """
    :param image:
    :param indices: 2xN matrix. 1st row is dim1 (rows) indices, 2nd row is dim2 (cols) indices
    :return:
    """
    # Todo: Vectorize this
    M, N = image.shape
    num_indices = indices.shape[1]
    interpolated_image = numpy.zeros((1, num_indices))
    for i in range(num_indices):
        x, y = indices[:, i]
        if (x < 0 or x > M - 1) or (y < 0 or y > N - 1):
            interpolated_image[0, i] = numpy.nan
        else:
            # Todo: Do Bilinear Interpolation. For now nearest neighbor is implemented
            interpolated_image[0, i] = image[int(round(x)), int(round(y))]
    return interpolated_image
But the for loop takes a huge amount of time (as expected). How can I vectorize this? I found scipy.interpolate.interp2d, but I'm not able to use it; an explanation of how to use it, or of any other method, would be fine. I also found this, but again it doesn't match my requirements: given x and y indices, it generates interpolated matrices. I don't want that. For the given indices, I just want the interpolated values, i.e. a vector output, not a matrix.
I tried it like this, but as said above, it gives a matrix output:
f = interpolate.interp2d(numpy.arange(image.shape[0]), numpy.arange(image.shape[1]), image, kind='linear')
interp_image_vect = f(indices[:,0], indices[:,1])
RuntimeError: Cannot produce output of size 73156608x73156608 (size too large)
For now, I've implemented nearest-neighbor interpolation. scipy's interp2d doesn't offer nearest neighbor. It would be good if the library function also supported nearest neighbor (so I can compare); if not, that's also fine.
It looks like scipy.interpolate.RectBivariateSpline will do the trick:
from scipy.interpolate import RectBivariateSpline
image = # as given
indices = # as given
M, N = image.shape
spline = RectBivariateSpline(numpy.arange(M), numpy.arange(N), image)
interpolated = spline(indices[0], indices[1], grid=False)
This gets you the interpolated values, but it doesn't give you nan where you need it. You can get that with where:
nans = numpy.zeros(interpolated.shape) + numpy.nan
x_in_bounds = (0 <= indices[0]) & (indices[0] < M)
y_in_bounds = (0 <= indices[1]) & (indices[1] < N)
bounded = numpy.where(x_in_bounds & y_in_bounds, interpolated, nans)
I tested this with a 2624x2624 image and 100,000 points in indices and all told it took under a second.
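For the nearest-neighbor comparison mentioned in the question, the loop can also be vectorized directly with rounding, fancy indexing and a bounds mask. A minimal sketch, assuming image and indices are as in the question; note it returns a flat vector of length N rather than a (1, N) array:

import numpy

def interpolate_image_nn(image, indices):
    """Vectorized nearest-neighbor lookup; out-of-bounds indices produce nan."""
    M, N = image.shape
    x, y = indices                      # the two rows of the 2xN indices array

    in_bounds = (x >= 0) & (x <= M - 1) & (y >= 0) & (y <= N - 1)

    # Round, then clip so out-of-bounds indices don't break the fancy indexing;
    # those positions are overwritten with nan afterwards anyway.
    xi = numpy.clip(numpy.round(x).astype(int), 0, M - 1)
    yi = numpy.clip(numpy.round(y).astype(int), 0, N - 1)

    return numpy.where(in_bounds, image[xi, yi].astype(float), numpy.nan)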
What I am trying to do is take a numpy array representing 3D image data and calculate the hessian matrix for every voxel. My input is a matrix of shape (Z,X,Y) and I can easily take a slice along z and retrieve a single original image.
gx, gy, gz = np.gradient(imgs)
gxx, gxy, gxz = np.gradient(gx)
gyx, gyy, gyz = np.gradient(gy)
gzx, gzy, gzz = np.gradient(gz)
And I can access the hessian for an individual voxel as follows:
x = 100
y = 100
z = 63
H = [[gxx[z][x][y], gxy[z][x][y], gxz[z][x][y]],
     [gyx[z][x][y], gyy[z][x][y], gyz[z][x][y]],
     [gzx[z][x][y], gzy[z][x][y], gzz[z][x][y]]]
But this is cumbersome and I can't easily slice the data.
I have tried using reshape as follows
H = H.reshape(Z, X, Y, 3, 3)
But when I test this by retrieving the hessian for a specific voxel, the value returned from the reshaped array is completely different from the one in the original array.
I think I could use zip somehow but I have only been able to find that for making lists of tuples.
Bonus: if there's a faster way to accomplish this, please let me know. I essentially need to calculate the three eigenvalues of the hessian matrix for every voxel in the 3D data set. Calculating the hessian values is really fast, but finding the eigenvalues for a single 2D image slice takes about 20 seconds. Are there any GPU- or TensorFlow-accelerated libraries for image processing?
We can use a list comprehension to get the hessians -
H_all = np.array([np.gradient(i) for i in np.gradient(imgs)]).transpose(2,3,4,0,1)
Just to give it a bit of explanation: [np.gradient(i) for i in np.gradient(imgs)] loops through the two levels of output from the np.gradient calls, giving an array whose first two axes are the 3 x 3 block of second derivatives. We need these two as the last two axes in the final output, so we push them to the end with the transpose.
Thus, H_all holds all the hessians and hence we can extract our specific hessian given x,y,z, like so -
x = 100
y = 100
z = 63
H = H_all[z, x, y]
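For the bonus about eigenvalues: once the hessians sit in the trailing (3, 3) axes, np.linalg.eigvalsh can compute all of them in one vectorized call over the leading axes, which is usually far faster than looping over slices. A minimal sketch (the volume shape is made up, and symmetrizing is my own addition since the numerical hessian is only approximately symmetric):

import numpy as np

# Hypothetical volume standing in for the (Z, X, Y) data from the question.
imgs = np.random.rand(40, 64, 64)

# All second derivatives stacked into a (Z, X, Y, 3, 3) array, as above.
H_all = np.array([np.gradient(g) for g in np.gradient(imgs)]).transpose(2, 3, 4, 0, 1)

# Symmetrize, then let eigvalsh batch over the leading axes;
# eigenvalues has shape (Z, X, Y, 3): three eigenvalues per voxel.
H_sym = 0.5 * (H_all + H_all.transpose(0, 1, 2, 4, 3))
eigenvalues = np.linalg.eigvalsh(H_sym)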
Let there be some 4D array [x,y,z,k] comprised of k 3D images [x,y,z].
Is there any way to calculate the variance of each individual pixel in 3D from the 4D array?
E.g. I have a 10x10x10x5 array and would like to return a 10x10x10 variance array; the variance is calculated for each pixel (or voxel, really) along k
If this doesn't make sense, let me know and I'll try explaining better.
Currently, my code is:
tensors = []
while error > threshold:
    for _ in range(5):  # arbitrary
        new_tensor = foo(bar)  # always returns array of same size
        tensors.append(new_tensor)

tensors = np.stack(tensors, axis=3)
# tensors.shape
And I would like to calculate a variance array for tensors.
There is a simple way to do that if you're using numpy:
variance = tensors.var(axis=3)
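As a quick sanity check on the shapes, with a hypothetical array matching the 10x10x10x5 example from the question:

import numpy as np

# Hypothetical stand-in for the stacked (10, 10, 10, 5) array from the question.
tensors = np.random.rand(10, 10, 10, 5)

variance = tensors.var(axis=3)  # variance of each voxel along the k axis
print(variance.shape)           # (10, 10, 10)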
I am trying to compute the pairwise distances between all points in two binary areas/volumes/hypervolumes in Tensorflow.
E.g. In 2D the areas are defined as binary tensors with ones and zeros:
input1 = tf.constant(np.array([[1,0,0], [0,1,0], [0,0,1]]))
input2 = tf.constant(np.array([[0,1,0], [0,0,0], [0,1,0]]))
input1 has 3 points and input2 has 2 points.
So far I have managed to convert the binary tensors into arrays of spatial coordinates:
coord1 = tf.where(tf.cast(input1, tf.bool))
coord2 = tf.where(tf.cast(input2, tf.bool))
Where, coord1 will have shape=(3,2) and coord2 will have shape=(2,2). The first dimension refers to the number of points and the second to their spatial coordinates (in this case 2D).
The result that I want is a tensor with shape=(6, ) with the pairwise Euclidean distances between all of the points in the areas.
Example (the order of the distances might be incorrect):
output = [1, sqrt(5), 1, 1, sqrt(5), 1]
Since TensorFlow isn't great with loops and in my real application the number of points in each tensor is unknown, I think I might be missing some linear algebra here.
I'm not familiar with Tensorflow, but my understanding from reading this is that the underlying NumPy arrays should be easy to extract from your data. So I will provide a solution which shows how to calculate pairwise Euclidean distances between points of 3x2 and 2x2 NumPy arrays, and hopefully it helps.
Generating random NumPy arrays in the same shape as your data:
coord1 = np.random.random((3, 2))
coord2 = np.random.random((2, 2))
Import the relevant SciPy function and run:
from scipy.spatial.distance import cdist
distances = cdist(coord1, coord2, metric='euclidean')
This will return a 3x2 array, but you can use distances.flatten() to get your desired 1-dimensional array of length 6.
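If it helps to tie this back to the binary masks in the question, np.argwhere plays the role of tf.where on the NumPy side; a small sketch using those masks:

import numpy as np
from scipy.spatial.distance import cdist

# The binary masks from the question.
input1 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
input2 = np.array([[0, 1, 0], [0, 0, 0], [0, 1, 0]])

coord1 = np.argwhere(input1)   # shape (3, 2)
coord2 = np.argwhere(input2)   # shape (2, 2)

distances = cdist(coord1, coord2, metric='euclidean').flatten()
print(distances)               # [1. 2.236 1. 1. 2.236 1.] -- the six pairwise distances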
I have come up with an answer using only matrix multiplies and transposition. This makes use of the fact that squared distances can be expressed with inner products (||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b):
input1 = np.array([[1,0,0],[0,1,0],[0,0,1]])
input2 = np.array([[1,1,0],[0,0,1],[1,0,0]])
c1 = tf.cast(tf.where(tf.cast(input1, tf.bool)), tf.float32)
c2 = tf.cast(tf.where(tf.cast(input2, tf.bool)), tf.float32)
distances = tf.sqrt(-2 * tf.matmul(c1, tf.transpose(c2))
                    + tf.reduce_sum(tf.square(c2), axis=1)
                    + tf.expand_dims(tf.reduce_sum(tf.square(c1), axis=1), axis=1))
with tf.Session() as sess:
    d = sess.run(distances)
Since TensorFlow broadcasts by default, the fact that the arrays have different shapes doesn't matter.
Hope it helps somebody.
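As a quick check of that identity outside a session, the same expression in plain NumPy agrees with a direct pairwise computation (small made-up coordinate arrays):

import numpy as np

# Small made-up coordinate arrays, one point per row.
c1 = np.array([[0., 0.], [1., 1.], [2., 2.]])
c2 = np.array([[0., 1.], [2., 1.]])

# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, evaluated for all pairs at once.
sq = (c1**2).sum(axis=1)[:, None] + (c2**2).sum(axis=1) - 2 * c1 @ c2.T
d = np.sqrt(sq)

# Direct computation via broadcasting, for comparison.
d_ref = np.sqrt(((c1[:, None, :] - c2[None, :, :])**2).sum(axis=-1))
assert np.allclose(d, d_ref)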
I've been trying to validate my code for calculating the Mahalanobis distance, written in Python (and to double-check by comparing the result with OpenCV).
My data points are of 1 dimension each (5 rows x 1 column).
In OpenCV (C++), I was successful in calculating the Mahalanobis distance when the data points had the above dimensions.
The following code was unsuccessful in calculating the Mahalanobis distance when the matrix was 5 rows x 1 column, but it works when the number of columns in the matrix is more than 1:
import numpy;
import scipy.spatial.distance;
s = numpy.array([[20],[123],[113],[103],[123]]);
covar = numpy.cov(s, rowvar=0);
invcovar = numpy.linalg.inv(covar)
print scipy.spatial.distance.mahalanobis(s[0],s[1],invcovar);
I get the following error:
Traceback (most recent call last):
File "/home/abc/Desktop/Return.py", line 6, in <module>
invcovar = numpy.linalg.inv(covar)
File "/usr/lib/python2.6/dist-packages/numpy/linalg/linalg.py", line 355, in inv
return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
IndexError: tuple index out of range
One-dimensional Mahalanobis distance is really easy to calculate manually:
import numpy as np
s = np.array([[20], [123], [113], [103], [123]])
std = s.std()
print np.abs(s[0] - s[1]) / std
(reducing the formula to the one-dimensional case).
But the problem with the scipy.spatial.distance approach is that np.cov returns a scalar, i.e. a zero-dimensional array, when given a set of 1-D variables, and np.linalg.inv can't invert that. You want to pass in a 2-D array:
>>> covar = np.cov(s, rowvar=0)
>>> covar.shape
()
>>> invcovar = np.linalg.inv(covar.reshape((1,1)))
>>> invcovar.shape
(1, 1)
>>> mahalanobis(s[0], s[1], invcovar)
2.3674720531046645
Covariance needs 2 arrays to compare. Both np.cov() and OpenCV's calcCovarMatrix expect the two arrays to be stacked on top of each other (use vstack). You can also place the 2 arrays side by side if you set rowvar to False in numpy or use COVAR_COLS in OpenCV. If your arrays are multidimensional, just flatten() them first.
So if I want to compare two 32x32 images, I flatten them both into 1x1024 vectors, then stack the two to get a 2x1024 array, and that is the first argument of np.cov().
You should then get a large square matrix that shows the result of comparing each element in array1 with each element in array2. In my example it will be 1024x1024. THAT is what you pass into your invert function.
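To make the shapes concrete, here is a minimal sketch with two hypothetical 32x32 images; at least in my reading, rowvar=False is what makes the 1024 pixel positions the variables, which is what yields the large square matrix described above:

import numpy as np

# Two hypothetical 32x32 images standing in for the data described above.
img1 = np.random.rand(32, 32)
img2 = np.random.rand(32, 32)

stacked = np.vstack([img1.flatten(), img2.flatten()])  # shape (2, 1024)

# rowvar=False treats each of the 1024 pixel positions as a variable.
covar = np.cov(stacked, rowvar=False)
print(covar.shape)  # (1024, 1024)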