Numpy cross covariance - python

Let X be a (d_x,n) matrix containing n observations of a d_x-dimensional variable x, and let w be a vector of weights (probabilities) of dimension n. The weighted covariance is given in numpy by
CX = numpy.cov(X, ddof=0, aweights=w)
Let now Y be a (d_y,n) matrix containing n observations of a d_y-dimensional variable y. Is there a clever way to compute the weighted cross covariance, in pseudocode
CXY = sum(w[i] * numpy.outer(X[:, i] - X_mean, Y[:, i] - Y_mean))
?
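One possible approach (a sketch, not necessarily the cleverest; the shapes, seed and helper names below are just for illustration): stack X and Y and let numpy.cov build the joint weighted covariance, then read off the off-diagonal block, or compute that block directly with a weighted matrix product.
import numpy as np

d_x, d_y, n = 3, 2, 100
rng = np.random.default_rng(0)
X = rng.normal(size=(d_x, n))
Y = rng.normal(size=(d_y, n))
w = rng.random(n)
w /= w.sum()                        # weights as probabilities, summing to 1

# joint weighted covariance of the stacked variable [X; Y]
C = np.cov(np.vstack([X, Y]), ddof=0, aweights=w)
CXY = C[:d_x, d_x:]                 # top-right (d_x, d_y) cross-covariance block

# direct computation for comparison
Xc = X - (X @ w)[:, None]           # subtract the weighted means (w sums to 1)
Yc = Y - (Y @ w)[:, None]
CXY_direct = (Xc * w) @ Yc.T        # = sum_i w[i] * outer(Xc[:, i], Yc[:, i])

print(np.allclose(CXY, CXY_direct))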

Related

How to find non-normalized eigen vector?

My goal is to validate whether a given vector is an eigenvector of an n x n square matrix.
The formula that I know is A * x = lambda * x.
Example
import numpy as np
A = np.array([[1, -1],
              [6, 4]])
eigvalues, eigvectors = np.linalg.eig(A)
The eigvectors output are normalized eigenvectors.
According to this website, a normalized eigenvector is an eigenvector having unit length.
In this case I want to check whether the eigenvector below belongs to the n x n square matrix above:
x = [1, 3]  (a column vector)
So, what I did is to multiply the eigenvector in question by the square root of the sum of its squared entries:
normalized_eigvector = np.array([1, 3]) * np.sqrt(1 + 9)
np.array_equal(normalized_eigvector, np.absolute(eigvectors[1]))
Is this the correct way to do it?
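For reference, one way to check it (a sketch, not the code above; for this particular A the eigenvalues turn out to be complex, so both checks print False for x = [1, 3]): compare A @ x against lambda * x, and normalize by dividing by the norm rather than multiplying.
import numpy as np

A = np.array([[1, -1],
              [6,  4]])
eigvalues, eigvectors = np.linalg.eig(A)

x = np.array([1, 3])
x_unit = x / np.linalg.norm(x)       # divide (not multiply) by the norm to normalize

# x is an eigenvector of A iff A @ x equals lambda * x for some eigenvalue lambda
print(any(np.allclose(A @ x, lam * x) for lam in eigvalues))

# numpy returns eigenvectors as *columns* of eigvectors, each of unit length and
# with arbitrary sign, so compare against eigvectors[:, j] rather than eigvectors[j]
print(any(np.allclose(x_unit, s * eigvectors[:, j])
          for j in range(A.shape[1]) for s in (1, -1)))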

Why is numpy saying that my totally positive matrix is not positive semi-definite?

I generate a correlation matrix by drawing from a uniform distribution:
corr = np.random.uniform(low=0.1, high=0.3, size=[20, 20])
and set the main diagonal elements to one
corr[np.diag_indices_from(corr)] = 1
Then, I make the correlation matrix symmetric
corr[np.triu_indices(n=20, k=1)] = corr.T[np.triu_indices(n=20, k=1)]
which yields a totally positive matrix, i.e., all entries of the matrix are strictly positive.
According to numpy, however, the matrix is not positive (semi-)definite.
np.all(np.linalg.eigvals(corr) >= 0)
False
A matrix with all positive entries is still not guaranteed to be PSD.
I will give you two easy ways to construct one:
Any square non-singular matrix can be used to create a PSD matrix with
A = A @ A.T
Any matrix can be used to produce a PSD matrix by symmetrizing it and then shifting its eigenvalues up:
A = (A + A.T)/2
A = A - np.eye(len(A)) * (np.min(np.linalg.eigvalsh(A)) - 0.001)
If you want the minimum perturbation to a symmetric matrix (the least-squares projection onto the positive semidefinite cone), clip the eigenvalues:
w, v = np.linalg.eigh(A)
A_ = (v * np.maximum(w, 0.01)) @ v.T
print(np.linalg.eigvalsh(A_))
Notice that I am giving a margin of 0.01, if I used strictly zero your test could fail due to numerical errors.
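Putting the first suggestion together for this use case, a minimal sketch (the A @ A.T product is PSD by construction, and rescaling to unit diagonal turns it back into a correlation matrix; seed and sizes are arbitrary):
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(low=0.1, high=0.3, size=(20, 20))

S = A @ A.T                          # symmetric and positive semi-definite by construction

d = np.sqrt(np.diag(S))
corr = S / np.outer(d, d)            # rescale to unit diagonal -> valid correlation matrix

print(np.all(np.linalg.eigvalsh(corr) >= -1e-12))   # True, up to round-off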

How to normalize in numpy?

I have the following question: I need a numpy array Y of shape (N, M) where Y[i] contains the same data as X[i], but normalized to have mean 0 and standard deviation 1.
I have mapped the array like this:
(X - np.mean(X)) / np.std(X)
but it doesn't give me the correct answer.
You want to normalize along a specific axis. Since each row X[i] should end up with mean 0 and standard deviation 1, use axis=1 with keepdims=True:
(X - np.mean(X, axis=1, keepdims=True)) / np.std(X, axis=1, keepdims=True)
Otherwise you're calculating the statistics over the whole matrix, i.e. subtracting the global mean of all points/features and doing the same with the standard deviation.
Use norm from linalg if unit Euclidean norm (rather than zero mean / unit standard deviation) is what you are after:
https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html
from numpy import linalg as LA
a = np.arange(9) - 4
LA.norm(a)
>>>7.745966692414834
Then you divide the array by the norm:
a/LA.norm(a)
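For the per-row zero-mean, unit-std version the question actually asks for, a minimal sketch (the array here is made up for illustration):
import numpy as np

X = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=(4, 10))

# standardize each row: subtract its own mean, divide by its own std
Y = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

print(Y.mean(axis=1))   # ~0 for every row
print(Y.std(axis=1))    # ~1 for every row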

Compute distances in k-means (Lloyd's algorithm)

I'm trying to compute the distance between each point of matrix X (shape N,D) and matrix mu (shape K,D) using numpy:
np.array([[np.linalg.norm(x - m) for m in mu] for x in X])
This is very slow. Is there a faster way to get the same result?
We can extend one matrix along a third dimension via broadcasting and then take the norm along the last axis:
np.linalg.norm(X - mu[:,None], axis=-1, ord=2).T
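A memory-lighter alternative (a sketch, using the expansion ||x - m||^2 = ||x||^2 - 2 x.m + ||m||^2 instead of materialising the full (K, N, D) intermediate; shapes and seed are just for illustration):
import numpy as np

N, K, D = 1000, 8, 16
rng = np.random.default_rng(0)
X = rng.normal(size=(N, D))
mu = rng.normal(size=(K, D))

# squared distances via ||x - m||^2 = ||x||^2 - 2 x.m + ||m||^2
sq = (X**2).sum(1)[:, None] - 2 * X @ mu.T + (mu**2).sum(1)[None, :]
dist = np.sqrt(np.maximum(sq, 0))    # clip tiny negatives caused by round-off

# check against the broadcasting version above
ref = np.linalg.norm(X[:, None] - mu[None, :], axis=-1)
print(np.allclose(dist, ref))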

Can covariance of A be used to calculate A'*A?

I am doing a benchmarking test in python on different ways to calculate A'*A, with A being an N x M matrix. One of the fastest ways was to use numpy.dot().
I was curious if I can obtain the same result using numpy.cov() (which gives the covariance matrix) by somehow varying the weights or by somehow pre-processing the A matrix ? But I had no success. Does anyone know if there is any relation between the product A'*A and covariance of A, where A is a matrix with N rows/observations and M columns/variables?
Have a look at the cov source. Near the end of the function it does this:
c = dot(X, X_T.conj())
Which is basically the dot product you want to perform. However, there are all kinds of other operations: checking inputs, subtracting the mean, normalization, ...
In short, np.cov will never ever be faster than np.dot(A.T, A) because internally it contains exactly that operation.
That said, the covariance matrix is computed as C = (A - mean(A)).T @ (A - mean(A)) / (N - 1), where the mean is taken over the rows (observations).
Or in Python:
import numpy as np
a = np.random.rand(10, 3)
m = np.mean(a, axis=0, keepdims=True)
x = np.dot((a - m).T, a - m) / (a.shape[0] - 1)
y = np.cov(a.T)
assert np.allclose(x, y) # check they are equivalent
As you can see, the covariance matrix is equivalent to the raw dot product if you subtract the mean of each variable and divide the result by the number of samples (minus one).
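The relation can also be run the other way: adding the mean term back recovers A'*A from the covariance matrix. A sketch of that identity (no speed win, since np.cov already performs the dot product internally):
import numpy as np

a = np.random.rand(10, 3)
n = a.shape[0]
m = a.mean(axis=0)

# A'A = (n - 1) * cov(A) + n * outer(mean, mean)
lhs = a.T @ a
rhs = (n - 1) * np.cov(a.T) + n * np.outer(m, m)
print(np.allclose(lhs, rhs))   # True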
