Numpy operation for euclidean distance between multidimensional arrays


I have two numpy arrays: A of shape (w, h, 2) and B of shape (n, 2).
In other words, A is a 2-dimensional array of 2D vectors, while B is a 1D array of 2D vectors.
What I want as a result is an array of shape (w, h, n). Along the last dimension, each of the n components is the Euclidean distance between the vector of A at the position given by the first two indices (along w and h) and the corresponding vector of B.
I know that I can just loop through w, h and n manually in Python and calculate the distance for each element, but I would like to know if there is a smart way to do this with numpy operations to improve performance.
I found some similar questions, but unfortunately all of them use input arrays of the same dimensionality.

Approach #1
You could reshape A to 2D, pass it to SciPy's cdist (which expects 2D arrays as inputs) to get the Euclidean distances, and finally reshape the result back to 3D.
Thus, an implementation would be -
from scipy.spatial.distance import cdist
out = cdist(A.reshape(-1,2),B).reshape(w,h,-1)
Approach #2
Since the axis of reduction has length 2 only, we can slice the input arrays instead and save memory on intermediate arrays, like so -
np.sqrt((A[...,0,None] - B[:,0])**2 + (A[...,1,None] - B[:,1])**2)
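As a sanity check, here is a minimal sketch (not part of the original answer) that compares Approach #2 with the explicit triple loop described in the question. The helper names pairwise_dist_loop and pairwise_dist_sliced, and the random test sizes, are illustrative assumptions.

```python
import numpy as np

def pairwise_dist_loop(A, B):
    """Reference version: explicit Python loops over w, h and n."""
    w, h, _ = A.shape
    n = B.shape[0]
    out = np.empty((w, h, n))
    for i in range(w):
        for j in range(h):
            for k in range(n):
                out[i, j, k] = np.sqrt(((A[i, j] - B[k]) ** 2).sum())
    return out

def pairwise_dist_sliced(A, B):
    """Approach #2: slice the two coordinates and let broadcasting do the rest."""
    return np.sqrt((A[..., 0, None] - B[:, 0]) ** 2 + (A[..., 1, None] - B[:, 1]) ** 2)

rng = np.random.default_rng(0)
A = rng.random((4, 5, 2))   # w=4, h=5
B = rng.random((3, 2))      # n=3
out = pairwise_dist_sliced(A, B)
print(out.shape)                                    # (4, 5, 3)
print(np.allclose(out, pairwise_dist_loop(A, B)))   # True
```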
Explanation of A[...,0,None] and A[...,1,None]:
The None introduces a new axis at the end of the sliced A. Let's take a small example -
In [54]: A = np.random.randint(0,9,(4,5,2))
In [55]: A[...,0].shape
Out[55]: (4, 5)
In [56]: A[...,0,None].shape
Out[56]: (4, 5, 1)
In [57]: B = np.random.randint(0,9,(3,2))
In [58]: B[:,0].shape
Out[58]: (3,)
So, we have:
A[...,0,None] : 4 x 5 x 1
B[:,0] : 3
Because broadcasting treats missing leading dimensions as 1, that is essentially:
A[...,0,None] : 4 x 5 x 1
B[:,0] : 1 x 1 x 3
When the subtraction is performed, the singleton dimensions are broadcast to match the corresponding dimensions of the other array -
A[...,0,None] - B[:,0] : 4 x 5 x 3
We repeat this for the second index along the last axis. Finally, we square both arrays, add them, and take the square root to get the Euclidean distances.
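The same broadcasting idea also works when the vectors have an arbitrary length d, not just 2. The sketch below is an alternative not taken from the answer above (the function name is a hypothetical choice): it inserts a new axis into A, subtracts B once, and reduces with np.linalg.norm. The trade-off is that it materializes a (w, h, n, d) intermediate array, which is exactly what the slicing in Approach #2 avoids.

```python
import numpy as np

def pairwise_dist_broadcast(A, B):
    """Euclidean distances between every vector in A (shape (w, h, d))
    and every vector in B (shape (n, d)); returns shape (w, h, n)."""
    # A[..., None, :] has shape (w, h, 1, d); B broadcasts as (1, 1, n, d)
    diff = A[..., None, :] - B              # shape (w, h, n, d)
    return np.linalg.norm(diff, axis=-1)    # reduce over d -> (w, h, n)

A = np.array([[[0.0, 0.0], [3.0, 4.0]]])    # shape (1, 2, 2)
B = np.array([[0.0, 0.0], [3.0, 0.0]])      # shape (2, 2)
D = pairwise_dist_broadcast(A, B)
print(D.shape)   # (1, 2, 2)
print(D[0, 1])   # distances from (3, 4) to (0, 0) and (3, 0): [5. 4.]
```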

Related

How to visualize/connect vectors, matrices and representations in Python and numpy arrays?

I am having trouble connecting scalars, vectors and matrices as they are written in a math/physics class with how they are represented in plain Python and numpy, and with numpy's corresponding notions of dimensions, axes and shapes.
If I have a scalar, say 5
>>> b = np.array(5)
>>> np.ndim(b)
0
I have 0 dimensions for 5 but what are the axes here? There are functions for ndim and shape but not axes.
For a vector like this one, with components 1 and -3:
we say in a physics/math class that it has 2 dimensions because it represents a 2D vector, but numpy seems to use a different notion of this.
Why does ndim give 1 while shape gives what I would call the dimension?
>>> c = np.array([1,-3])
>>> c
array([ 1, -3])
>>> c.ndim
1
>>> c.shape
(2,)
Why does np.ndim give 1, then?
I have looked at a tutorial on axes but still don't see how axes apply here.
How would you represent the vector above in Python and numpy? Would it be [1, -3] in plain Python, or [[1], [-3]]? And in numpy, would it be np.array([1, -3]) or np.array([[1], [-3]])? The latter I tend to write, for my eyes' sake, as
np.array([
    [1],
    [-3]
])
Other than vectors, how would a matrix like this one (a 3x3 matrix with entries 1 to 9) be represented in both plain Python and numpy? The documentation for np.matrix says we should use regular np.arrays instead:
It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future.
When it comes to multidimensional arrays, how would I represent/visualize multiple values for all the points in a 3D cube? Say we have a Rubik's cube where each sub-cube has a temperature and a color given as red, green and blue values, so 4 values per sub-cube.

np.array(5) wraps a scalar in a 0-dimensional array: its shape is the empty tuple (), so it has no axes.
np.array([1,-3]) is a 1D array, so c.shape returns a tuple with a single element, (2,): there is only 1 dimension, and it holds 2 elements.
You are correct that np.array([[1], [-3]]) is the column vector you describe. For that array, .shape gives (2, 1), meaning 2 rows and 1 column, and .ndim gives 2, since there are two axes (rows and columns). It's a 2D/planar array.
For the matrix, you would create it as np.array([[1,2,3], [4,5,6], [7,8,9]]). Its shape is (3, 3), meaning 3 rows and 3 columns, and its ndim is 2 because it's still a 2D/planar array.
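A short sketch that prints shape and ndim for each object discussed above, plus the plain-Python nested lists that .tolist() gives back. The variable names are illustrative.

```python
import numpy as np

scalar = np.array(5)                 # 0-d array
vec_1d = np.array([1, -3])           # 1-D array: neither row nor column
col = np.array([[1], [-3]])          # 2-D column vector: 2 rows x 1 column
row = np.array([[1, -3]])            # 2-D row vector: 1 row x 2 columns
M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

for name, arr in [("scalar", scalar), ("vec_1d", vec_1d),
                  ("col", col), ("row", row), ("M", M)]:
    print(name, arr.shape, arr.ndim)
# scalar () 0
# vec_1d (2,) 1
# col (2, 1) 2
# row (1, 2) 2
# M (3, 3) 2

# The plain-Python equivalents are the nested lists
print(col.tolist())  # [[1], [-3]]
print(M.tolist())    # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```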

An ndarray has a shape, which is a tuple. ndim is the length of that tuple, and may be 0. The array has ndim axes (sometimes called dimensions).
np.array(5)
has shape (), ndim 0, and no axes.
np.array([1,2,3,4])
has shape (4,) and 1 axis. It can be reshaped to (4,1), (1,4), (2,2), or even (2,1,2) or (1,4,1).
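The reshapes listed above can be checked directly. This sketch confirms that ndim always equals the length of the shape tuple, and that a reshape has to preserve the total number of elements.

```python
import numpy as np

v = np.array([1, 2, 3, 4])            # shape (4,), a single axis
shapes = [(4, 1), (1, 4), (2, 2), (2, 1, 2), (1, 4, 1)]
reshaped = {s: v.reshape(s) for s in shapes}
for s, r in reshaped.items():
    print(s, r.ndim)                  # ndim is always len(shape)

# reshape must keep the total number of elements (4), so (3, 2) fails
try:
    v.reshape(3, 2)
except ValueError as err:
    print("ValueError:", err)
```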
Your matrix can be created with
A = np.arange(1,10).reshape(3,3)
That's a 9-element 1d array reshaped to (3,3).
numpy arrays have a printed display in which [] marks the dimensional nesting. A.tolist() produces a plain-Python list with 3 elements, each a 3-element list.
Rows, columns, planes are useful ways of talking about arrays, but are not a formal part of their definition.
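The Rubik's cube part of the question can follow the same idea: one axis per independent index. The layout below is an assumption for illustration, not taken from either answer: axes 0 to 2 give the position of a sub-cube, and a fourth axis holds its 4 values.

```python
import numpy as np

# Hypothetical layout: axes 0-2 are the x, y, z position of a sub-cube,
# axis 3 holds [temperature, red, green, blue]
cube = np.zeros((3, 3, 3, 4))
cube[0, 0, 0] = [21.5, 255, 0, 0]    # one corner sub-cube: 21.5 degrees, pure red

temperature = cube[..., 0]           # shape (3, 3, 3): one value per sub-cube
rgb = cube[..., 1:]                  # shape (3, 3, 3, 3): a color per sub-cube
print(cube.ndim, cube.shape)         # 4 (3, 3, 3, 4)
print(temperature.shape, rgb.shape)  # (3, 3, 3) (3, 3, 3, 3)
print(cube[0, 0, 0])                 # all 4 values of a single sub-cube
```

Keeping the per-cube values on the last axis means slicing with [..., k] picks out one quantity for the whole cube.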

Element wise divide like MATLAB's ./ operator?

I am trying to normalize some Nx3 data. If X is an Nx3 array and D is an Nx1 array, in MATLAB I can do
Y = X./D
If I do the following in Python, I get an error
X = np.random.randn(100,3)
D = np.linalg.norm(X,axis=1)
Y = X/D
ValueError: operands could not be broadcast together with shapes (100,3) (100,)
Any suggestions?
Edit: Thanks to dm2.
Y = X/D.reshape((100,1))
Another way is to use scikit-learn:
from sklearn import preprocessing
Y = preprocessing.normalize(X)

From the numpy documentation on array broadcasting:
When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when
- they are equal, or
- one of them is 1
Both of your arrays have the same first dimension, but X is 2-dimensional while D is 1-dimensional. Comparing from the right, D's only dimension (100) is lined up against X's last dimension (3); they are neither equal nor 1, so the two shapes cannot be broadcast together.
To make sure they do, you could reshape your D array into a 2-dimensional array of shape (100,1), which would satisfy the requirements to broadcast: rightmost dimensions are 3 and 1 (one of them is 1) and the other dimensions are equal (100 and 100).
So:
Y = X/D.reshape((-1,1))
or
Y = X/D.reshape((100,1))
or
Y = X/D[:,np.newaxis]
Should give you the result you're after.
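A sketch that checks the three variants above give the same result. It also shows np.linalg.norm with keepdims=True, an extra option not mentioned in the answer, which keeps the reduced axis so the norms come out with shape (100, 1) directly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
D = np.linalg.norm(X, axis=1)          # shape (100,)

Y1 = X / D.reshape((-1, 1))            # (100, 3) / (100, 1)
Y2 = X / D[:, np.newaxis]              # same thing, adding a new axis
# keepdims=True keeps the reduced axis, giving shape (100, 1) directly
Y3 = X / np.linalg.norm(X, axis=1, keepdims=True)

print(np.allclose(Y1, Y2) and np.allclose(Y1, Y3))   # True
print(np.allclose(np.linalg.norm(Y1, axis=1), 1.0))  # every row now has unit length
```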

numpy .dot using lists and arrays, what's the difference

When I multiply two 1x2 vectors such as [1,2] and [6,3] using numpy.dot, I get a result, when I would expect a shape error like the one I get when multiplying the same values written as 2-D arrays. What is going on under the hood?
import numpy as np
a = [1, 2]
b = [6, 3]
result = np.dot(b, a)
print(result)
Output: 12
But,
a = [[1, 2]]
b = [[6, 3]]
result = np.dot(b, a)
print(result)
Error:
ValueError: shapes (1,2) and (1,2) not aligned: 2 (dim 1) != 1 (dim 0)

As per the numpy.dot documentation:
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
Case 1: a and b are 1-D arrays, so result is 1*6+2*3=12.
Case 2: a and b are 2-D arrays, so np.dot computes their matrix product. It raises ValueError because the last dimension of the first argument (size 2) is not the same as the second-to-last dimension of the second argument (size 1).
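The two cases can be reproduced side by side. This sketch runs the 1-D case, catches the 2-D ValueError, and then transposes the second argument so that the shapes line up as (1, 2) x (2, 1).

```python
import numpy as np

a, b = [1, 2], [6, 3]
r1 = np.dot(b, a)              # both 1-D: inner product 6*1 + 3*2
print(r1)                      # 12

a2, b2 = np.array([[1, 2]]), np.array([[6, 3]])   # both shape (1, 2)
raised = False
try:
    np.dot(b2, a2)             # (1, 2) x (1, 2): 2 != 1, so this fails
except ValueError as err:
    raised = True
    print("ValueError:", err)

r3 = np.dot(b2, a2.T)          # (1, 2) x (2, 1) -> 1x1 matrix
print(r3)                      # [[12]]
```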

Adding on to Anubhav Singh's correct answer, note that a matrix product of a row vector with a column vector returns a 1-by-1 matrix whose sole entry is the dot product of the two vectors, so in this case,
In [32]: a = np.array([[1,2]])
In [33]: b = np.array([[6,3]])
In [34]: a @ b.T
Out[34]: array([[12]])
In [35]: np.dot(a, b.T)
Out[35]: array([[12]])
In [36]: np.dot(a[0], b[0])
Out[36]: 12
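So with 1-D inputs np.dot returns the inner product directly as a scalar, whereas with 2-D inputs the shapes must line up as (1, 2) x (2, 1), and the result is a 1x1 matrix.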

Numpy inner product of 2 column vectors

How can I take the inner product of 2 column vectors in Python's numpy?
The code below does not work:
import numpy as np
x = np.array([[1], [2]])
np.inner(x, x)
It returned
array([[1, 2],
       [2, 4]])
instead of 5

np.inner sums over the last axis of both arguments. Your x has dimensions 2x1 (2 rows, 1 column), so the last dimensions (both 1) match, and the result keeps the remaining, outer dimensions: in effect a 2x1 * 1x2 product, which is a 2x2 matrix.
What you want is to transpose both, so that the product becomes 1x2 * 2x1 = 1x1.
More generally, multiplying something with dimensions NxM by something with dimensions MxK yields something with dimensions NxK; note that the inner dimensions must both be M. For more, review the matrix multiplication rules.
For 2-D inputs, np.inner effectively transposes the second argument (np.inner(a, b) equals a @ b.T), so when you pass in two 2x1 arrays you get a 2x2 result, but if you pass in two 1x2 arrays you get a 1x1 array, [[5]].
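A quick check of the shape rule described above, using the x from the question: for 2-D inputs np.inner(x, x) matches x @ x.T, and transposing both arguments gives the 1x1 result.

```python
import numpy as np

x = np.array([[1], [2]])            # shape (2, 1), a column vector

full = np.inner(x, x)               # sums over the last axis (size 1) -> shape (2, 2)
print(np.array_equal(full, x @ x.T))   # True

one_by_one = np.inner(x.T, x.T)     # x.T has shape (1, 2) -> result shape (1, 1)
print(one_by_one)                   # [[5]]
```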
Try this:
import numpy as np
x = np.array([[1], [2]])
np.inner(np.transpose(x), np.transpose(x))
or simply define x as a 1-D array from the start, in which case np.inner returns the scalar 5:
import numpy as np
x = np.array([1,2])
np.inner(x, x)

I think you mean to have:
x= np.array([1,2])
To get 5 as output, x needs to be a 1-D array of shape (N,), not an Nx1 column, if you want to apply np.inner to it.

Try the following; it returns a 1x1 array containing 5:
np.dot(np.transpose(x), x)

Make sure col_vector has shape (N,1), where N is the number of elements,
then simply sum the element-wise products:
np.sum(col_vector*col_vector)
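For comparison, the sketch below runs the approaches suggested in these answers on the same column vector and shows which ones return a scalar and which return a 1x1 array. np.vdot, which flattens its inputs, and .item(), which extracts a Python scalar from a one-element array, are extra options not mentioned in the answers.

```python
import numpy as np

x = np.array([[1], [2]])              # column vector, shape (2, 1)

results = {
    "inner of transposes": np.inner(x.T, x.T),    # 1x1 array
    "dot(x.T, x)": np.dot(x.T, x),                # 1x1 array
    "sum of products": np.sum(x * x),             # scalar
    "vdot (flattens)": np.vdot(x, x),             # scalar
    "1-D inner": np.inner(x.ravel(), x.ravel()),  # scalar
}
for name, value in results.items():
    print(name, value, np.shape(value))

# .item() extracts the Python scalar from a 1x1 array
scalar = np.dot(x.T, x).item()
print(scalar)   # 5
```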

shape of Vector in numpy

I am confused by the fact that
a = np.array([1,2])
a.T == a  # all True
and also
I = np.array([[1,0],[0,1]])
np.dot(a, I)  # works
np.dot(I, a)  # also works
Is the shape of the vector (or array) in this case 1x2 or 2x1?

The vector a has shape (2,), not 1 × 2 nor 2 × 1 (it is neither a column nor a row vector). That is why transposition has no effect: by default, transposition reverses the order of the axes, and a has only one axis.
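Numpy is very lenient about what kinds of arrays can be multiplied using dot. When the second argument is 1-D, np.dot sums over the last axis of both arguments; when it has 2 or more dimensions, in the words of the documentation,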
it is a sum product over the last axis of a and the second-to-last of b
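To see the difference between a 1-D array and explicit row or column vectors, this sketch adds an axis with np.newaxis. It shows that only the 1-D a is accepted on both sides of I, while an explicit (1, 2) row on the right of I triggers the usual shape mismatch.

```python
import numpy as np

a = np.array([1, 2])
I = np.array([[1, 0], [0, 1]])

print(a.shape, a.T.shape)       # (2,) (2,): transposing a single axis changes nothing
row = a[np.newaxis, :]          # shape (1, 2), an explicit row vector
col = a[:, np.newaxis]          # shape (2, 1), an explicit column vector
print(row.shape, row.T.shape)   # (1, 2) (2, 1)

print(np.dot(a, I), np.dot(I, a))   # [1 2] [1 2]: 1-D a fits on either side
try:
    np.dot(I, row)              # (2, 2) x (1, 2): sizes 2 and 1 do not match
except ValueError as err:
    print("ValueError:", err)
print(np.dot(I, col).shape)     # (2, 1)
```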
