How to generate a random covariance matrix in Python?

So I would like to generate a 50 X 50 covariance matrix for a random variable X given the following conditions:
one variance is 10 times larger than the others
the parameters of X are only slightly correlated
Is there a way of doing this in Python/R etc? Or is there a covariance matrix that you can think of that might satisfy these requirements?
Thank you for your help!

OK, you only need one matrix, and randomness isn't important. Here's a way to construct a matrix according to your description. Start with a 50 by 50 identity matrix. Assign 10 to the first (upper left) diagonal element. Assign a small number (I don't know what's appropriate for your problem, maybe 0.1? 0.01? It's up to you) to all the off-diagonal elements. Now take that matrix and square it (i.e. compute transpose(X) . X, where X is your matrix). Presto! You've squared the eigenvalues, so now you have a covariance matrix.
If the small elements are small enough, X is already positive definite. But squaring guarantees it (assuming there are no zero eigenvalues, which you can verify by computing the determinant: if the determinant is nonzero, there are no zero eigenvalues).
I assume you can find Python functions for these operations.
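For instance, a minimal NumPy sketch of this construction (the off-diagonal value 0.01 is an arbitrary choice for the "slight correlation"):
import numpy as np

n = 50
small = 0.01                  # arbitrary "slight correlation" strength
X = np.full((n, n), small)    # small value everywhere to start
np.fill_diagonal(X, 1.0)      # ones on the diagonal, as in the identity
X[0, 0] = 10.0                # the one large element from the description
cov = X.T @ X                 # squaring the eigenvalues guarantees positive definiteness
print(np.linalg.det(cov))     # nonzero, so there are no zero eigenvalues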

Related

Generating invertible matrices in numpy/tensorflow

I would like to generate invertible matrices (specifically those from GL(n), a general linear group of size n) using Tensorflow and/or Numpy for use with my neural network.
How can this be done and what would be the best way of doing so?
I understand there is a way to generate symmetric invertible matrices by computing (A + A.T)/2 for arbitrary square matrices A, however, I would like mine to not just be symmetric.
I happened to have found one way which I believe can generate a large variety of random invertible matrices using diagonal dominance.
The theorem is that for an n x n matrix, if the absolute value of each diagonal element is strictly larger than the sum of the absolute values of the other elements in its row, and this holds for every row, then the matrix is invertible (here is the corresponding Wikipedia article: https://en.wikipedia.org/wiki/Diagonally_dominant_matrix).
Therefore the following code snippet generates a random invertible matrix.
import numpy as np

n = 5  # size of the invertible matrix I wish to generate
m = np.random.rand(n, n)
mx = np.sum(np.abs(m), axis=1)  # row sums of absolute values
np.fill_diagonal(m, mx)         # new diagonal exceeds the sum of the off-diagonal entries in each row
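As a quick sanity check (my addition, not part of the original snippet), the resulting matrix should indeed be invertible:
m_inv = np.linalg.inv(m)                  # succeeds because m is strictly diagonally dominant
print(np.allclose(m @ m_inv, np.eye(n)))  # True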

numpy and solving symmetric systems

Suppose I have a symmetric matrix A and a vector b and want to find A^(-1) b. Now, this is well-known to be doable in time O(N^2) (where N is the dimension of the vector/matrix), and I believe that in MATLAB this can be done as A\b. But all I can find in python is numpy.linalg.solve(), which does Gaussian elimination and is O(N^3). I must not be looking in the right place...
scipy.linalg.solve has an argument to make it assume a symmetric matrix:
x = scipy.linalg.solve(A, b, assume_a="sym")
If you know your matrix is not just symmetric but positive definite, you can pass the stronger assumption "pos" instead.
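A rough, self-contained sketch (the test matrix here is made up for illustration; the assume_a argument needs a reasonably recent SciPy):
import numpy as np
from scipy.linalg import solve

rng = np.random.default_rng(0)
M = rng.standard_normal((100, 100))
A = M @ M.T + 100 * np.eye(100)      # symmetric positive definite test matrix
b = rng.standard_normal(100)

x_sym = solve(A, b, assume_a="sym")  # symmetric (LDL-based) solver
x_pos = solve(A, b, assume_a="pos")  # Cholesky-based solver for positive definite A
print(np.allclose(x_sym, x_pos))     # True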

Can we calculate only the n-th eigenvalue and eigenvector of a very large sparse matrix?

I have a very big sparse matrix, A = 7Mi-by-7Mi. I am using Matlab's eigs(A,k) function, which can calculate the first k eigenvalues and eigenvectors.
I need all of its eigenvalues and eigenvectors, but I can't store all of the eigenvectors because that requires a lot of memory.
Is there any way (in Matlab or Python) by which I can get the eigenvectors one by one in a for loop? i.e. in the ith iteration, I get the ith eigenvector and eigenvalue.
If you have a good guess about how large the eigenvalue you are looking for is, say lambda_guess, you can use the Power iteration on
(A - lambda_guess * Id)^(-1)
This approach is sometimes referred to as the inverse-shift method. Here the method will converge to the eigenvalue closest to lambda_guess (and the better your guess the faster the convergence). Note that you wouldn't store the inverse, but only compute the solution of
x_next_iter = solve(A - lambda_guess*Id, x_iter), possibly itself with an iterative linear solver.
I would combine this with a subspace iteration method with a subspace of size at least two. This way, on your first iteration, you can find the smallest and second smallest eigenvalues lambda1, lambda2.
Then you can try lambda_guess = lambda2 + epsilon so that the first and second eigenvectors outputted correspond to the second and third smallest eigenvalues, respectively. (If the first eigenvalue of this iteration is not the same as the value of lambda2 from your previous iteration, you need to make epsilon smaller and repeat. In practice you would test that their difference is small enough, to account for roundoff error and the fact that iterative methods are never exact.) You repeat this until you get to the eigenvalue number you are looking for. It's going to be slow, but you will only keep two eigenvectors in memory at any time.
NOTE: we assume all eigenvalues are distinct, otherwise this problem will not have a low memory solution with the usual techniques. In general, if the maximal multiplicity of an eigenvalue is m, you would need m vectors in memory for subspace iteration to converge.
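A minimal NumPy sketch of one shift-invert power iteration (dense solve for clarity; for a matrix this large you would replace np.linalg.solve with a sparse or iterative solver):
import numpy as np

def shift_invert_power(A, lambda_guess, n_iter=100):
    # Converges to the eigenpair whose eigenvalue is closest to lambda_guess.
    n = A.shape[0]
    shifted = A - lambda_guess * np.eye(n)
    x = np.random.rand(n)
    for _ in range(n_iter):
        x = np.linalg.solve(shifted, x)  # x_next_iter = solve(A - lambda_guess*Id, x_iter)
        x = x / np.linalg.norm(x)
    return x @ A @ x, x                  # Rayleigh quotient gives the eigenvalue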

Spectral norm 2x2 matrix in tensorflow

I've got a 2x2 matrix defined by the variables J00, J01, J10, J11 coming in from other inputs. Since the matrix is small, I was able to compute the spectral norm by first computing the trace and determinant
J_T = tf.reduce_sum([J00, J11])
J_ad = tf.reduce_prod([J00, J11])
J_cb = tf.reduce_prod([J01, J10])
J_det = tf.reduce_sum([J_ad, -J_cb])
and then solving the quadratic
L1 = J_T/2.0 + tf.sqrt(J_T**2/4.0 - J_det)
L2 = J_T/2.0 - tf.sqrt(J_T**2/4.0 - J_det)
spectral_norm = tf.maximum(L1, L2)
This works, but it looks rather ugly and it isn't generalizable to larger matrices. Is there cleaner way (maybe a method call that I'm missing) to compute spectral_norm?
The spectral norm of a matrix J equals the largest singular value of the matrix.
Therefore you can use tf.svd() to perform the singular value decomposition, and take the largest singular value:
spectral_norm = tf.svd(J, compute_uv=False)[..., 0]
where J is your matrix.
Notes:
I use compute_uv=False since we are interested only in singular values, not singular vectors.
J does not need to be square.
This solution works also for the case where J has any number of batch dimensions (as long as the two last dimensions are the matrix dimensions).
The ellipsis (...) indexing works as in NumPy.
I take the 0 index because the singular values are returned in descending order, so index 0 is the largest one.
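For example, assuming TensorFlow 2 (where the function lives under tf.linalg.svd), you could assemble the 2x2 matrix from the four scalars and take the largest singular value:
import tensorflow as tf

# J00, J01, J10, J11 stand in for the scalar tensors from the question
J00, J01, J10, J11 = [tf.constant(v) for v in (1.0, 2.0, 3.0, 4.0)]
J = tf.stack([tf.stack([J00, J01]), tf.stack([J10, J11])])  # assemble the 2x2 matrix
spectral_norm = tf.linalg.svd(J, compute_uv=False)[..., 0]  # largest singular value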

How to avoid a seemingly unavoidable divide by zero

Ok, so I'm doing the power method in python.
Basically, the equation revolves around multiplying a matrix A by a vector (y) like this:
for i in range(0, 100):
    y = mult(matrix, y)
    y = scalarMult(y, 1.0 / y[0][0])
Then you multiply the vector y by 1/(the first element in y). Now, if the matrix is sparse or has a zero in just the right spot, you will get a zero for the first element in y. None of my googling skills have yielded a modification to the power method to avoid this.
For those interested, I'm trying to solve for the eigenvalues of a matrix; and my code works as long as there aren't too many zeros.
Instead of dividing by the first element of the vector, you can divide by one of its norms.
For example, if you use the 2-norm (Euclidean norm), the length of the vector will always be 1.
norm = sum(e**2 for e in y)**0.5
The norm of a vector is zero only when the vector itself is zero (all elements 0), so division by zero should not happen.
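Putting it together, a small NumPy sketch of the power method normalized by the 2-norm (the 2x2 matrix here is only a stand-in):
import numpy as np

def power_method(A, n_iter=100):
    y = np.random.rand(A.shape[0])
    for _ in range(n_iter):
        y = A @ y
        y = y / np.linalg.norm(y)  # 2-norm; zero only if y itself is the zero vector
    return y @ A @ y               # Rayleigh quotient: the dominant eigenvalue

A = np.array([[0.0, 2.0], [2.0, 3.0]])
print(power_method(A))             # approximately 4.0, the largest eigenvalue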
