Python: Covariance matrix by hand

I have two vectors X0 and X1 (mX0 and mX1 are their means) and I am trying to find the covariance matrix between them. I have managed to find each element of the matrix by doing:
b1 = numpy.zeros(N).reshape((N, 1))
b2 = numpy.zeros(N).reshape((N, 1))
for i in range(N):
    b1[i] = X0[i] - mX0
for j in range(N):
    b2[j] = X1[j] - mX1
bii = sum(p*q for p, q in zip(b1, b1)) / (N - 1)
bij = sum(p*q for p, q in zip(b1, b2)) / (N - 1)
bji = sum(p*q for p, q in zip(b2, b1)) / (N - 1)
bjj = sum(p*q for p, q in zip(b2, b2)) / (N - 1)
but I want a nicer way of doing this through a loop, rather than doing each element separately.

If you want to compute the covariance matrix by hand, study/mimic how numpy.cov does it; if you just want the result, use np.cov(b1, b2) directly.
import numpy as np

np.random.seed(1)
N = 10
b1 = np.random.rand(N)
b2 = np.random.rand(N)
X = np.column_stack([b1, b2])   # stack the two variables as columns
X -= X.mean(axis=0)             # subtract the column means
fact = N - 1                    # unbiased normalization
by_hand = np.dot(X.T, X.conj()) / fact
print(by_hand)
# [[ 0.04735338  0.01242557]
#  [ 0.01242557  0.07669083]]
using_cov = np.cov(b1, b2)
assert np.allclose(by_hand, using_cov)
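The same recipe extends beyond two variables. A minimal sketch, assuming each variable is a length-N 1-D array stacked as a column of a data matrix (the names here are illustrative):
import numpy as np

np.random.seed(1)
N, m = 10, 4
data = np.random.rand(N, m)                 # N observations of m variables
centered = data - data.mean(axis=0)         # subtract the column means
by_hand = centered.T @ centered / (N - 1)   # m x m covariance matrix
assert np.allclose(by_hand, np.cov(data, rowvar=False))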

Related

Efficient Matrix construction for a weighted Euclidean distance

I have M points in 2-dimensional Euclidean space, stored in an array X of size M x 2.
I have constructed a cost matrix whose element ij is the distance d(X[i, :], X[j, :]). The distance function I am using is the standard Euclidean distance weighted by the inverse of a matrix D, i.e. d(x, y) = <D^{-1}(x - y), x - y>. I would like to know if there is a more efficient way of doing this; note that I have practically avoided for loops.
import numpy as np

Dinv = np.linalg.inv(D)

def cost(X, Dinv):
    Msq = len(X) ** 2
    mesh = []
    for i in range(2):                         # separate each coordinate axis
        xmesh = np.meshgrid(X[:, i], X[:, i])  # meshgrid each axis
        xmesh = xmesh[1] - xmesh[0]            # create the difference matrix
        xmesh = xmesh.reshape(Msq)             # reshape into a vector
        mesh.append(xmesh)                     # save/append into the list
    meshv = np.vstack((mesh[0], mesh[1])).T    # recombine the coordinate axes
    Dx = np.einsum("ij,kj->ki", Dinv, meshv)   # apply D^{-1}
    return np.sum(Dx * meshv, axis=1)          # dot the elements
I'd try something like this, mostly optimizing your meshv calculation:
meshv = (X[:, None] - X).reshape(-1, 2)
((meshv @ Dinv.T) * meshv).sum(1)
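A quick sanity check, as a sketch: assuming a small symmetric positive definite D (so Dinv exists), the one-liner reproduces the loop-based cost function above.
import numpy as np

np.random.seed(0)
M = 4
X = np.random.rand(M, 2)
A = np.random.rand(2, 2)
D = A @ A.T + np.eye(2)                     # make D safely invertible (SPD)
Dinv = np.linalg.inv(D)

meshv = (X[:, None] - X).reshape(-1, 2)
fast = ((meshv @ Dinv.T) * meshv).sum(1)
assert np.allclose(fast, cost(X, Dinv))     # matches the meshgrid version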

Solving linear equations where each matrix element is itself a matrix (2D) and each variable is a 1D vector

The simple example of the problem is the block system A·b = x, where the coefficient matrix A = [[A11, A12], [A21, A22]] is assembled from 2D matrices and the right-hand side x = [x1, x2] from 1D vectors. In my case the blocks have different (but conforming) shapes.
Any idea/suggestion is highly appreciated.
As long as the dimensions conform, A is actually "just" a matrix, even if it is built out of smaller matrices. Here's a relatively general example showing how the dimensions must go:
import numpy
import numpy.linalg

l, m, n, k = 2, 3, 4, 5
# if the blocks are known, obviously just define them here.
A11 = numpy.random.random((l, m))
A12 = numpy.random.random((l, n))
A21 = numpy.random.random((k, m))
A22 = numpy.random.random((k, n))
x1 = numpy.random.random((m,))
x2 = numpy.random.random((n,))
# A is square because l + k == m + n, so solve() applies
A = numpy.bmat([[A11, A12],
                [A21, A22]])
x = numpy.concatenate([x1, x2])
b = numpy.linalg.solve(A, x)
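A side note, hedged against NumPy version differences: numpy.bmat returns a numpy.matrix, which is discouraged on current NumPy; numpy.block assembles the same block matrix as a plain ndarray:
A = numpy.block([[A11, A12],
                 [A21, A22]])
b = numpy.linalg.solve(A, x)   # works the same; A must still be square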

dot product with a diagonal matrix, without creating the full matrix

I'd like to calculate a dot product of two matrices, where one of them is a diagonal matrix. However, I don't want to use np.diag or np.diagflat in order to create the full matrix, but instead use the 1D array directly filled with the diagonal values. Is there any way or numpy operation which I can use for this kind of problem?
x = np.arange(9).reshape(3,3)
y = np.arange(3) # diagonal elements
z = np.dot(x, np.diag(y))
and the solution I'm looking for should be without np.diag
z = x ??? y
Directly multiplying the ndarray by your vector will work. NumPy broadcasting conveniently multiplies the nth column of x by the nth element of y.
x = np.random.random((5, 5))
y = np.random.random(5)
diagonal_y = np.diag(y)
z = np.dot(x, diagonal_y)
np.allclose(z, x * y) # Will return True
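The mirror case follows the same broadcasting rule: left-multiplying by a diagonal matrix scales the rows, which broadcasting expresses with a column vector:
np.allclose(np.dot(np.diag(y), x), y[:, None] * x)  # Will return True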
Einstein summation is an elegant solution to this kind of problem:
import numpy as np
x = np.random.uniform(0,1, size=5)
w = np.random.uniform(0,1, size=(5, 3))
diagonal_x = np.diagflat(x)
z = np.dot(diagonal_x, w)
zz = np.einsum('i,ij->ij', x, w)
np.allclose(z, zz) # Will return True
See: https://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html#numpy.einsum
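The same einsum pattern also covers the column-scaling case from the question, as a short sketch:
import numpy as np

x = np.arange(9).reshape(3, 3)
y = np.arange(3)                      # diagonal elements
zz = np.einsum('ij,j->ij', x, y)      # scales the jth column of x by y[j]
assert np.allclose(zz, np.dot(x, np.diag(y)))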

How to change elements in sparse matrix in Python's SciPy?

I have built a small code that I want to use for solving eigenvalue problems involving large sparse matrices. It's working fine; all I want to do now is to set some elements in the sparse matrix to zero, namely the ones in the very top row (which corresponds to implementing boundary conditions). I could just adjust the column vectors (C0, C1, and C2) below to achieve that, but I wondered whether there is a more direct way. Evidently, NumPy indexing does not work with SciPy's sparse package.
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt

# discretize x-axis
N = 11
x = np.linspace(-5, 5, N)
print(x)
V = x * x / 2
h = x[1] - x[0]          # grid spacing
hi2 = 1. / (h**2)

# discretize Schroedinger equation, i.e. build
# banded matrix from the difference equation
C0 = np.ones(N) * 30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
H = sp.spdiags([C2, C1, C0, C1, C2], [-2, -1, 0, 1, 2], N, N)
H *= hi2 * (-1. / 12.) * (-1. / 2.)

# solve for eigenvalues
EV = la.eigsh(H, return_eigenvectors=False)

# check structure of H
plt.figure()
plt.spy(H)
plt.show()
The code above produces a spy plot visualising the banded structure of H. I want to set the elements in the first row to zero.
As suggested in the comments, I'll post the answer that I found to my own question. There are several matrix classes in SciPy's sparse package; they are listed here. One can convert sparse matrices from one class to another, so for what I need to do, I chose to convert my sparse matrix to the csr_matrix class, simply by
H = sp.csr_matrix(H)
Then I can set the elements in the first row to 0 by using the regular NumPy notation:
H[0,0] = 0
H[0,1] = 0
H[0,2] = 0
For completeness, I post the full modified code snippet below.
# SciPy sparse linear algebra takes care of sparse matrix computations
# http://docs.scipy.org/doc/scipy/reference/sparse.linalg.html
import scipy.sparse as sp
import scipy.sparse.linalg as la
import numpy as np
import matplotlib.pyplot as plt

# discretize x-axis
N = 1100
x = np.linspace(-100, 100, N)
V = x * x / 2.
h = x[1] - x[0]          # grid spacing
hi2 = 1. / (h**2)

# discretize Schroedinger equation, i.e. build
# banded matrix from the difference equation
C0 = np.ones(N) * 30. + V
C1 = np.ones(N) * -16.
C2 = np.ones(N) * 1.
H = sp.spdiags([C2, C1, C0, C1, C2], [-2, -1, 0, 1, 2], N, N)
H *= hi2 * (-1. / 12.) * (-1. / 2.)

H = sp.csr_matrix(H)
H[0, 0] = 0
H[0, 1] = 0
H[0, 2] = 0

# check structure of H
plt.figure()
plt.spy(H)
plt.show()

EV = la.eigsh(H, return_eigenvectors=False)
In SciPy, using lil_matrix to change elements is much more efficient than plain NumPy-style indexing on a csr_matrix.
H = sp.csr_matrix(H)
HL = H.tolil()
HL[1, 1] = 5            # same as the NumPy indexing notation
print(HL)
print(HL.todense())     # if a NumPy-style matrix is required
H = HL.tocsr()          # if CSR is required
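For the original goal of zeroing the whole first row, a sketch of the same LIL round trip using a single row assignment (eliminate_zeros is a standard csr_matrix method):
HL = H.tolil()
HL[0, :] = 0          # zero the entire first row in one assignment
H = HL.tocsr()
H.eliminate_zeros()   # drop any explicitly stored zeros from the CSR structure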

find the dot product of sub-arrays in numpy

In numpy, the numpy.dot() function can be used to calculate the matrix product of two 2D arrays. I have two 3D arrays X and Y (say), and I'd like to calculate the matrix Z where Z[i] == numpy.dot(X[i], Y[i]) for all i. Is this possible to do non-iteratively?
How about:
from numpy.core.umath_tests import inner1d
Z = inner1d(X,Y)
For example:
import numpy as np
from numpy.core.umath_tests import inner1d

X = np.random.normal(size=(10, 5))
Y = np.random.normal(size=(10, 5))
Z1 = inner1d(X, Y)
Z2 = [np.dot(X[k], Y[k]) for k in range(10)]
print(np.allclose(Z1, Z2))
returns True
Edit: a correction, since I didn't see the 3D part of the question:
from numpy.core.umath_tests import matrix_multiply
X = np.random.normal(size=(10,5,3))
Y = np.random.normal(size=(10,3,5))
Z1 = matrix_multiply(X,Y)
Z2 = np.array([np.dot(X[k],Y[k]) for k in range(10)])
np.allclose(Z1,Z2) # <== returns True
This works because, as the docstring states, matrix_multiply provides:
matrix_multiply(x1, x2[, out]) — matrix multiplication on the last two dimensions
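A hedged update: numpy.core.umath_tests is a private module and has been removed from recent NumPy releases. On modern NumPy, np.matmul (the @ operator) broadcasts matrix multiplication over the leading dimensions, and einsum spells out the same contraction:
import numpy as np

X = np.random.normal(size=(10, 5, 3))
Y = np.random.normal(size=(10, 3, 5))
Z1 = np.matmul(X, Y)                    # or simply X @ Y
Z2 = np.einsum('ijk,ikl->ijl', X, Y)    # the same contraction, spelled out
Z3 = np.array([np.dot(X[k], Y[k]) for k in range(10)])
assert np.allclose(Z1, Z2) and np.allclose(Z1, Z3)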
