Vectorized implementation for Euclidean distance [duplicate] - python

This question already has answers here:
How can the Euclidean distance be calculated with NumPy?
(25 answers)
Closed 4 years ago.
I am trying to compute a vectorized implementation of the Euclidean distance (between each element in X and Y, using the inner product). The data is as follows:
X = np.random.uniform(low=0, high=1, size=(10000, 5))
Y = np.random.uniform(low=0, high=1, size=(10000, 5))
What I did was:
euclidean_distances_vectorized = np.array(np.sqrt(np.sum(X**2, axis=1) - 2 * np.dot(X, Y.T) + np.sum(Y**2, axis=1)))
Although this gives some output, the answer is wrong, as each row still contains 5 elements.
Does anyone know what I am doing wrong?

If I understood correctly, this should do it:
np.linalg.norm(X - Y, axis=1)
Or with einsum (the square root of the dot product of each row of X - Y with itself):
np.sqrt(np.einsum('ij,ij->i', X - Y, X - Y))
If you want all pairwise distances:
from scipy.spatial.distance import cdist
cdist(X, Y)
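For completeness, the inner-product formulation from the question also works for the full pairwise distance matrix once the two squared-norm terms are broadcast against different axes. A minimal sketch (the np.maximum call is just a guard against tiny negative round-off values before the square root):
import numpy as np

X = np.random.uniform(low=0, high=1, size=(10000, 5))
Y = np.random.uniform(low=0, high=1, size=(10000, 5))

sq_x = np.sum(X**2, axis=1)[:, None]                 # shape (10000, 1)
sq_y = np.sum(Y**2, axis=1)[None, :]                 # shape (1, 10000)
sq_dists = np.maximum(sq_x - 2 * X @ Y.T + sq_y, 0)  # clip round-off negatives
pairwise = np.sqrt(sq_dists)                         # shape (10000, 10000)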

Applying across a numpy axis (row-wise correlation of every pair of rows between two arrays with NaNs)

I've got a function f(x,y) that takes two 1-d arrays and returns a scalar.
If I have a 2d matrix of shape (M,N), how do I efficiently apply the function pairwise across the 0 axis to end up with a square symmetric result of shape (M, M)?
Edit:
I'm trying to calculate pairwise correlation of an array of 1d arrays:
def f(x, y):
    sigma_x_y = np.nanstd(x) * np.nanstd(y)
    covariance = np.nanmean((x - np.nanmean(x)) * (y - np.nanmean(y)))
    return covariance / sigma_x_y
I think this is what you are looking for. The equations are similar to your function f(x, y):
x_m = x - np.nanmean(x,axis=1)[:,None]
y_m = y - np.nanmean(y,axis=1)[:,None]
X = np.nansum(x_m**2,axis=1)
Y = np.nansum(y_m**2,axis=1)
corr = np.dot(x_m,y_m.T)/np.sqrt(np.dot(X[:,None],Y[None]))
EDIT: If you wish to ignore NaN values when calculating the correlation of two rows, simply replace the last line with this:
corr = np.dot(np.nan_to_num(x_m), np.nan_to_num(y_m).T)/np.sqrt(np.dot(X[:,None],Y[None]))
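Putting the pieces together as a runnable sketch (the shapes below are made up for illustration; for complete data this matches the corrcoef-style formula in f, since the 1/N factors cancel):
import numpy as np

def pairwise_corr(x, y):
    # Center each row, ignoring NaNs
    x_m = x - np.nanmean(x, axis=1)[:, None]
    y_m = y - np.nanmean(y, axis=1)[:, None]
    # Row-wise sums of squared deviations (the normalization terms)
    X = np.nansum(x_m**2, axis=1)
    Y = np.nansum(y_m**2, axis=1)
    # Treat NaNs as 0 so they drop out of the cross products
    return np.dot(np.nan_to_num(x_m), np.nan_to_num(y_m).T) / np.sqrt(np.dot(X[:, None], Y[None]))

x = np.random.randn(4, 10)
y = np.random.randn(6, 10)
corr = pairwise_corr(x, y)  # shape (4, 6): corr[i, j] correlates x[i] with y[j]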

How most efficiently compute the diagonal of a matrix product [duplicate]

This question already has answers here:
Using numpy einsum to compute inner product of column-vectors of a matrix
(2 answers)
Closed 2 years ago.
I want to compute the following:
import numpy as np
n= 3
m = 2
x = np.random.randn(n,m)
#Method 1
y = np.zeros(m)
for i in range(m):
    y[i] = x[:, i] @ x[:, i]
#Method 2
y2 = np.diag(x.T @ x)
The first method has the problem that it uses a for loop, which can't be very efficient (I need to do this in PyTorch on a GPU millions of times).
The second method computes the full matrix product, when I only need the diagonal entries, so that can't be very efficient either.
I'm wondering whether there is any clever way of doing this?
Use a manually constructed sum-product. You want the sums of the squares of the individual columns:
y = (x * x).sum(axis=0)
As Divakar suggests, np.einsum will likely offer a less memory-intensive option, since it does not require the temporary array x * x:
y = np.einsum('ij,ij->j', x, x)
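Since the question mentions needing this in PyTorch on a GPU, the same two expressions carry over directly. A minimal sketch, assuming x is already a torch tensor on the target device:
import torch

x = torch.randn(3, 2, device='cuda' if torch.cuda.is_available() else 'cpu')

# Column-wise sums of squares, i.e. the diagonal of x.T @ x without forming it
y = (x * x).sum(dim=0)
y_einsum = torch.einsum('ij,ij->j', x, x)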

Multiplying two matrices (2x1) and (2x2) [duplicate]

This question already has an answer here:
numpy matrix vector multiplication [duplicate]
(1 answer)
Closed 4 years ago.
Hi, for my code I have to multiply a point/vector (1, 0) by the matrix [[1.00583, -0.087156], [0.087156, 1.00583]]. The result should give me a new point (x, y).
This is what I have so far:
import matplotlib.pyplot as plt
import numpy as np
A = np.array([[1],[0]])
B = np.array([[1.00583, -0.087156], [0.087156, 1.00583]])
test = np.multiply(A, B)
print (test)
The result still gives me a (2x2) matrix instead of a (2x1) that I can use as a point. Is there another function or a better way of going about this?
First thing: if you want to do matrix multiplication, use numpy.matmul or the @ operator, e.g. B@A.
Also, when you define A like
A = np.array([[1],[0]])
this creates a 2x1 column vector (not 1x2). So if you want to multiply the matrix B (2x2) by the vector A, the product is B@A, where the result C will be a 2x1 vector:
C = B@A
Otherwise, if you want to multiply A@B and B is still the 2x2 matrix, you should define A as a 1x2 vector:
A = np.array([1,0])
and get a 1x2 result with
C = A@B
test = np.matmul(B, A)
This should do the trick.
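As a quick check of the difference, here is a minimal sketch contrasting the elementwise product with the matrix product, using the values from the question:
import numpy as np

A = np.array([[1], [0]])            # 2x1 column vector
B = np.array([[1.00583, -0.087156],
              [0.087156, 1.00583]])

elementwise = np.multiply(A, B)     # broadcasts to a 2x2 array, not a point
point = B @ A                       # matrix product, shape (2, 1)
print(point.ravel())                # [1.00583  0.087156]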

Memory efficient ways of computing large correlation matrices? [duplicate]

This question already has answers here:
Computing the correlation coefficient between two multi-dimensional arrays
(3 answers)
Closed 6 years ago.
I have two matrices where the variables are the columns, and both matrices have the same number of samples.
One matrix is 800 by 200, and the other is 800 by 100000. I want to compute the correlation matrix between the columns of these matrices, so I've tried this:
import numpy as np
def matcor(x, y):
    xc = x.shape[1]
    return np.corrcoef(x, y, rowvar=False)[xc:, :xc]
xy_cor = matcor(X, Y)
However, this ends up taking a large amount of memory: I get a memory error at around 64 GB used, and it might end up needing more than that. Is there a memory-efficient way to compute this?
Unfortunately, the cov and corrcoef functions don't allow a direct calculation of only the xy correlation. Since the problem is obviously too big to be tackled in full, you cannot compute the full matrix and extract the slice afterwards, which is what you are currently doing. Instead, compute the xy part by hand:
samples = x.shape[0]
centered_x = x - np.sum(x, axis=0, keepdims=True) / samples
centered_y = y - np.sum(y, axis=0, keepdims=True) / samples
cov_xy = 1./(samples - 1) * np.dot(centered_x.T, centered_y)
var_x = 1./(samples - 1) * np.sum(centered_x**2, axis=0)
var_y = 1./(samples - 1) * np.sum(centered_y**2, axis=0)
corrcoef_xy = cov_xy / np.sqrt(var_x[:, None] * var_y[None,:])
You need the variances to normalize the covariance matrix; otherwise, only the first four lines would be needed.
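Wrapped up as a function, a sketch of the approach above (the second matrix is shrunk here so the example runs quickly; with the real 800 by 100000 matrix the result is only 200 by 100000):
import numpy as np

def matcor_xy(x, y):
    # Correlations between the columns of x and the columns of y only,
    # without building the full corrcoef matrix over all columns of both
    samples = x.shape[0]
    centered_x = x - np.sum(x, axis=0, keepdims=True) / samples
    centered_y = y - np.sum(y, axis=0, keepdims=True) / samples
    cov_xy = np.dot(centered_x.T, centered_y) / (samples - 1)
    var_x = np.sum(centered_x**2, axis=0) / (samples - 1)
    var_y = np.sum(centered_y**2, axis=0) / (samples - 1)
    return cov_xy / np.sqrt(var_x[:, None] * var_y[None, :])

X = np.random.randn(800, 200)
Y = np.random.randn(800, 1000)  # stand-in for the 800 by 100000 matrix
xy_cor = matcor_xy(X, Y)        # shape (200, 1000)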

Creating and using a coordinate grid [duplicate]

This question already has an answer here:
indexing spherical subset of 3d grid data in numpy
(1 answer)
Closed 9 years ago.
Can anybody help me: is there any way to create a coordinate grid as a numpy array, like this?
(0,0) (0,1) (0,2) ... (0,n)
(1,0) (1,1) (1,2) ... (1,n)
...........................
(m,0) (m,1) (m,2) ... (m,n)
If yes, how can I find the distance from every point to a circle with center at (m/2, n/2) and radius R?
(x - m/2)^2 + (y - n/2)^2 - R^2 = ?
A standard way of doing this is with the meshgrid function. It makes two arrays with the x and y coordinates of the points you want. To get the coordinates shown in your question, you can do:
import numpy as np
x = np.arange(m+1)
y = np.arange(n+1)
X, Y = np.meshgrid(x, y)
Then, to calculate the distance you want, you can do:
np.sqrt((X - m/2.)**2 + (Y - n/2.)**2) - R
For more information on meshgrid see the documentation
http://docs.scipy.org/doc/numpy/reference/generated/numpy.meshgrid.html
Also, if you want evenly spaced values between two endpoints instead of just 0 through m or 0 through n, consider using the linspace function.
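Putting it together, a minimal sketch with made-up values for m, n, and R that also stacks the two coordinate arrays into the grid of (row, column) pairs shown in the question:
import numpy as np

m, n, R = 4, 5, 2.0  # hypothetical sizes and radius

# indexing='ij' puts the row index i = 0..m along axis 0 and the column index j = 0..n along axis 1
I, J = np.meshgrid(np.arange(m + 1), np.arange(n + 1), indexing='ij')

grid = np.dstack((I, J))  # shape (m+1, n+1, 2): grid[i, j] == (i, j)

# Signed distance from each grid point to the circle with center at (m/2, n/2) and radius R
dist = np.sqrt((I - m / 2.)**2 + (J - n / 2.)**2) - R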
