Howto: CVXPY Matrix Inequality Constraints - python

I am trying to formulate an optimization problem in the following way:
My optimization variable x is a n*n matrix.
x should be PSD.
It should satisfy 0 <= x <= I, that is, lie between the all-zeros square matrix and the n-dimensional identity matrix.
Here is what I have come up with so far:
import cvxpy as cp
import numpy as np
import cvxopt
x = cp.Variable((2, 2), PSD=True)
a = cvxopt.matrix([[1, 0], [0, 0]])
b = cvxopt.matrix([[.5, .5], [.5, .5]])
identity = cvxopt.matrix([[1, 0], [0, 1]])
zeros = cvxopt.matrix([[0, 0], [0, 0]])
constraints = [x >= zeros, x <= identity]
objective = cp.Maximize(cp.trace(x*a - x * b))
prob = cp.Problem(objective, constraints)
prob.solve()
This gives me [[1, 0], [0, 0]] as the optimal x, with a maximum trace of .5. That should not be the case: I wrote the same program in CVX in MATLAB and got the matrix [[.85, -.35], [-.35, .14]] with an optimal value of .707, which is correct.
I think my constraint formulation is not correct or not following cvxpy standards. How do I enforce the constraints in my program correctly?
(Here is my matlab version of the code:)
a = [1, 0; 0, 0];
b = [.5, .5; .5, .5];
cvx_begin sdp
variable x(2, 2) hermitian;
maximize(trace(x*a - x*b))
subject to
x >= 0;
x <= eye(2);
cvx_end
TIA

You need to use the PSD constraint. With >= and <=, cvxpy applies the inequality elementwise; the matrix (semidefinite) inequality is written with >> or <<. You already constrained x to be PSD when you created it, so all you need to change is:
constraints = [x << np.eye(2)]
Then I get your solution:
array([[ 0.85355339, -0.35355339],
[-0.35355339, 0.14644661]])
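As a quick sanity check on the difference between the two kinds of inequality, the sketch below uses only NumPy (no solver) to test the matrix reported above against the semidefinite range and to evaluate the objective. The helper names `in_psd_range` and `objective` and the tolerance are made up for this illustration.

```python
import numpy as np

a = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([[0.5, 0.5], [0.5, 0.5]])

# Solution reported above for the semidefinite constraint x << I.
x_sdp = np.array([[0.85355339, -0.35355339],
                  [-0.35355339, 0.14644661]])
# Solution the question obtained with elementwise >= and <=.
x_elementwise = np.array([[1.0, 0.0], [0.0, 0.0]])

def in_psd_range(x, tol=1e-6):
    """True if 0 <= x <= I in the semidefinite sense (eigenvalues in [0, 1])."""
    eigenvalues = np.linalg.eigvalsh(x)
    return bool(eigenvalues.min() >= -tol and eigenvalues.max() <= 1 + tol)

def objective(x):
    """trace(x a - x b), using the matrix product."""
    return float(np.trace(x @ a - x @ b))

print(in_psd_range(x_sdp), objective(x_sdp))
print(in_psd_range(x_elementwise), objective(x_elementwise))
print(bool((x_sdp >= 0).all()))  # elementwise test on the semidefinite solution
```

Both matrices lie in the semidefinite range, but the better one has negative off-diagonal entries, so the elementwise constraint x >= zeros excludes it.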

Related

Finding eigenvalues of a matrix with unknown variables using numpy.linalg.eig

As an example, I have the following matrix:
$$\begin{bmatrix}a+1&1\\1&1\end{bmatrix}$$
I would like to find the eigenvalue of the matrix with python.
This is my attempt:
arr = np.array( [[ a+1, 1],
[ 1, 1]] )
print(np.linalg.eig(arr))
Obviously, Python tells me that a is not defined. But I don't want to define a. It should remain a variable, and I want the eigenvalues expressed in terms of a.
Any ideas?
Kind regards,
Zebraboard
ddejohn is right. What you want is a symbolic operation so use sympy:
from sympy import var, Matrix
var('a')
arr = Matrix( [[ a+1, 1],
[ 1, 1]] )
arr.eigenvals()
gives
{a/2 - sqrt(a**2 + 4)/2 + 1: 1, a/2 + sqrt(a**2 + 4)/2 + 1: 1}
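One way to gain confidence in the symbolic result is to compare it with numpy.linalg for a few concrete values of a. This is only a numeric spot check, not part of the original answer; `closed_form_eigenvalues` is a hypothetical helper that hard-codes the two expressions printed above.

```python
import numpy as np

def closed_form_eigenvalues(a):
    """Eigenvalues of [[a + 1, 1], [1, 1]] from the symbolic result, ascending."""
    root = np.sqrt(a**2 + 4)
    return np.array([a / 2 - root / 2 + 1, a / 2 + root / 2 + 1])

for a in (-3.0, 0.0, 2.5):
    arr = np.array([[a + 1, 1.0],
                    [1.0, 1.0]])
    numeric = np.linalg.eigvalsh(arr)  # symmetric matrix, ascending order
    assert np.allclose(numeric, closed_form_eigenvalues(a))
```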

An equivalent but differentiable argmax expression in Tensorflow

I need a one-hot representation for the maximum value in a tensor.
For example, consider a tensor 2 x 3:
[ [1, 5, 2],
[0, 3, 7] ]
The one-hot-argmax representation I am aiming for looks like this:
[ [0, 1, 0],
[0, 0, 1] ]
I can do it as follows, where my_tensor is a N x 3 tensor:
position = tf.argmax(my_tensor, axis=1) # Shape (N,)
one_hot_pos = tf.one_hot(position, depth=3) # Shape (N x 3)
But this part of the code needs to be differentiable since I'm training over it.
My workaround was as follows, where EPSILON = 1e-3 is a small constant:
max_value = tf.reduce_max(my_tensor, axis=1, keepdims=True)
clip_min = max_value - EPSILON
one_hot_pos = (tf.clip_by_value(my_tensor, clip_min, max_value) - clip_min) / (max_value - clip_min)
The workaround works most of the time, but - as expected - it has some issues:
Sensitive to EPSILON: if it is too small, a division by zero might happen
Can't solve ties: argmax only chooses one even in a tie situation
Do you know any better way of simulating the argmax followed by one_hot situation, while fixing the two mentioned issues, but using only differentiable Tensorflow functions?
Do some maximum, tile and multiplication operations. Like:
a = tf.Variable([ [1, 5, 2], [0, 3, 7] ]) # your tensor
m = tf.reduce_max(a, axis=1) # [5,7]
m = tf.expand_dims(m, -1) # [[5],[7]]
m = tf.tile(m, [1,3]) # [[5,5,5],[7,7,7]]
y = tf.cast(tf.equal(a, m), tf.float32) # [[0,1,0],[0,0,1]]
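This avoids the division by EPSILON and marks every tied maximum. Note, however, that tf.equal returns a boolean mask, so no gradient flows back through y; it is a hard one-hot, not a differentiable approximation.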
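The same max-and-compare idea can be tried out in NumPy, which makes the handling of ties easy to see without a TensorFlow session. This is a sketch of the masking step only; `one_hot_max` is a name invented here, and broadcasting replaces the explicit tile.

```python
import numpy as np

def one_hot_max(t):
    """Mark every entry equal to its row maximum with 1.0, all others with 0.0."""
    t = np.asarray(t)
    row_max = t.max(axis=1, keepdims=True)   # shape (N, 1), broadcasts over columns
    return (t == row_max).astype(np.float32)

print(one_hot_max([[1, 5, 2], [0, 3, 7]]))
print(one_hot_max([[4, 4, 1]]))  # a tie marks both maxima
```

Unlike argmax followed by one_hot, a tied row gets more than one 1, so its entries no longer sum to one.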

Matrix QR factorization algorithms

I've been trying to visualize QR decomposition in a step by step fashion, but I'm not getting expected results. I'm new to numpy so it'd be nice if any expert eye could spot what I might be missing:
import numpy as np
from scipy import linalg
A = np.array([[12, -51, 4],
[6, 167, -68],
[-4, 24, -41]])
#Givens
v = np.array([12, 6])
vnorm = np.linalg.norm(v)
W_12 = np.array([[v[0]/vnorm, v[1]/vnorm, 0],
[-v[1]/vnorm, v[0]/vnorm, 0],
[0, 0, 1]])
W_12 * A #this should return a matrix such that [1,0] = 0
#gram-schmidt
A[:,0]
v = np.linalg.norm(A[:,0]) * np.array([1, 0, 0])
u = (A[:,0] - v)
u = u / np.linalg.norm(u)
W1 = np.eye(3) - 2 * np.outer(u, u.transpose())
W1 * A #this matrix's first column should look like [a, 0, 0]
Any help clarifying why these intermediate results don't show the properties they are supposed to have would be greatly appreciated.
NumPy is designed to work with homogeneous multi-dimensional arrays; it is not specifically a linear algebra package. So by design, the * operator is element-wise multiplication, not the matrix product.
If you want to get the matrix product, there are a few ways:
You can create np.matrix objects, rather than np.ndarray objects, for which the * operator is the matrix product.
You can also use the @ operator, as in W_12 @ A, which is the matrix product.
Or you can use np.dot(W_12, A) or W_12.dot(A), which computes the dot product.
Any one of these, using the data you give, returns the following for Givens rotation:
>>> np.dot(W_12, A)[1, 0]
-2.2204460492503131e-16
And this for the Gram-Schmidt step:
>>> (W1.dot(A))[:, 0]
array([ 1.40000000e+01, -4.44089210e-16, 4.44089210e-16])
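Putting the pieces together, here is a sketch of both steps from the question rewritten with the @ operator. It reuses the question's data; the variable names `givens_result`, `target`, and `reflection_result` are introduced here. The second step, labelled gram-schmidt in the question, builds I - 2uu^T, which is usually called a Householder reflection.

```python
import numpy as np

A = np.array([[12, -51, 4],
              [6, 167, -68],
              [-4, 24, -41]], dtype=float)

# Givens rotation acting on rows 0 and 1, built from the first column.
v = A[:2, 0]
c, s = v / np.linalg.norm(v)
W_12 = np.array([[c, s, 0],
                 [-s, c, 0],
                 [0, 0, 1]])
givens_result = W_12 @ A  # matrix product, not elementwise *

# Reflection I - 2 u u^T that maps the first column onto a multiple of e_1.
x = A[:, 0]
target = np.linalg.norm(x) * np.array([1.0, 0.0, 0.0])
u = (x - target) / np.linalg.norm(x - target)
W1 = np.eye(3) - 2 * np.outer(u, u)
reflection_result = W1 @ A

print(givens_result[1, 0])      # zero up to rounding
print(reflection_result[:, 0])  # first entry is the column norm, rest are zero up to rounding
```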

How to derive with respect to a Matrix element with Sympy

Given the product of a matrix and a vector
A.v
with A of shape (m,n) and v of dim n, where m and n are symbols, I need to calculate the Derivative with respect to the matrix elements.
I haven't found the way to use a proper vector, so I started with 2 MatrixSymbol:
n, m = symbols('n m')
j = tensor.Idx('j')
i = tensor.Idx('i')
l = tensor.Idx('l')
h = tensor.Idx('h')
A = MatrixSymbol('A', n,m)
B = MatrixSymbol('B', m,1)
C=A*B
Now, if I try to derive with respect to one of A's elements with the indices I get back the unevaluated expression:
diff(C, A[i,j])
>>>> Derivative(A*B, A[i, j])
If I introduce the indices in C also (it won't let me use only one index in the resulting vector) I get back the product expressed as a Sum:
C[l,h]
>>>> Sum(A[l, _k]*B[_k, h], (_k, 0, m - 1))
If I derive this with respect to the matrix element I end up getting 0 instead of an expression with the KroneckerDelta, which is the result that I would like to get:
diff(C[l,h], A[i,j])
>>>> 0
I wonder if maybe I shouldn't be using MatrixSymbols to start with. How should I go about implementing the behaviour that I want to get?
SymPy does not yet know matrix calculus; in particular, one cannot differentiate MatrixSymbol objects. You can do this sort of computation with Matrix objects filled with arrays of symbols; the drawback is that the matrix sizes must be explicit for this to work.
Example:
from sympy import *
A = Matrix(symarray('A', (4, 5)))
B = Matrix(symarray('B', (5, 3)))
C = A*B
print(C.diff(A[1, 2]))
outputs:
Matrix([[0, 0, 0], [B_2_0, B_2_1, B_2_2], [0, 0, 0], [0, 0, 0]])
The git version of SymPy (and the next version) handles this better:
In [55]: print(diff(C[l,h], A[i,j]))
Sum(KroneckerDelta(_k, j)*KroneckerDelta(i, l)*B[_k, h], (_k, 0, m - 1))
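The Kronecker-delta structure can also be checked numerically: differentiating A*B with respect to A[i, j] should give a matrix that is zero except for row i, which holds row j of B. The sketch below uses a central finite difference in NumPy rather than SymPy; `derivative_wrt_entry`, the shapes, and the step size are assumptions chosen to mirror the explicit example above.

```python
import numpy as np

def derivative_wrt_entry(A, B, i, j, eps=1e-6):
    """Central finite-difference estimate of d(A @ B) / dA[i, j]."""
    step = np.zeros_like(A)
    step[i, j] = eps
    return ((A + step) @ B - (A - step) @ B) / (2 * eps)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 5))
B = rng.normal(size=(5, 3))

D = derivative_wrt_entry(A, B, 1, 2)

# Expected: delta(i, l) * B[j, h], i.e. only row i = 1 is nonzero and equals B[2, :].
expected = np.zeros((4, 3))
expected[1, :] = B[2, :]
print(np.allclose(D, expected, atol=1e-6))
```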

build matrix from blocks

I have an object that is described by two quantities, A and B (in the real case there can be more than two). Objects are correlated depending on the values of A and B. In particular, I know the correlation matrix for A and for B. As an example:
a = np.array([[1, 1, 0, 0],
[1, 1, 0, 0],
[0, 0, 1, 1],
[0, 0, 1, 1]])
b = np.array([[1, 1, 0],
[1, 1, 1],
[0, 1, 1]])
na = a.shape[0]
nb = b.shape[0]
Correlation for A: if one element has A == 0.5 and the other has A == 1.5, they are fully correlated (red). If one element has A == 0.5 and the other has A == 3.5, they are uncorrelated (blue).
Similarly for B:
Now I want to multiply the two correlation matrices, but the final matrix should have two axes, where the new axes are a folded version of the original axes:
def get_folded_bin(ia, ib):
    return ia * nb + ib
Here is what I am doing:
result = np.swapaxes(np.tensordot(a, b, axes=0), 1, 2).reshape(na* nb, na * nb)
In particular, this must hold:
for ia1 in range(na):
    for ia2 in range(na):
        for ib1 in range(nb):
            for ib2 in range(nb):
                assert a[ia1, ia2] * b[ib1, ib2] == result[get_folded_bin(ia1, ib1), get_folded_bin(ia2, ib2)]
Actually, my problem is to do this with more quantities (A, B, C, ...) in a general way. Maybe there is also a simpler function within numpy to do that.
np.einsum lets you simplify the tensordot expression a bit:
result = np.einsum('ij,kl->ikjl',a,b).reshape(-1, na * nb)
I don't think there's a way of eliminating the reshape.
It may also be easier to generalize to more arrays, though I wouldn't get carried away with too many iteration variables in one einsum expression.
I think finally I have found a solution:
np.kron(a,b)
and then I can compose with
np.kron(np.kron(a,b), c)
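For an arbitrary number of quantities, the nested np.kron calls can be folded with functools.reduce. The sketch below also checks that np.kron agrees with the einsum-and-reshape expression for two matrices; `fold` is a hypothetical helper name, and `c` is a made-up third correlation matrix used only for illustration.

```python
from functools import reduce

import numpy as np

a = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 1]])
b = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
c = np.array([[1, 0],
              [0, 1]])  # hypothetical third quantity
na, nb = a.shape[0], b.shape[0]

def fold(*matrices):
    """Kronecker product of all matrices, applied left to right."""
    return reduce(np.kron, matrices)

folded_ab = fold(a, b)
einsum_ab = np.einsum('ij,kl->ikjl', a, b).reshape(na * nb, na * nb)
print(np.array_equal(folded_ab, einsum_ab))
print(fold(a, b, c).shape)
```

With three matrices the folded index becomes (ia * nb + ib) * nc + ic, following the same pattern as get_folded_bin.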
