I'm trying to solve a Sylvester matrix equation of the form
AX + XB = C
From what I've seen, these equations are usually solved with the Bartels-Stewart algorithm, which takes successive Schur decompositions. I'm aware that scipy.linalg already has a solve_sylvester function, but I'm integrating the solution of the Sylvester equation into a neural network, so I need a way to calculate gradients to make A, B, and C learnable. Currently, I'm just solving a linear system with torch.linalg.solve using the Kronecker product and vectorization trick, but this has terrible runtime complexity. I haven't found any PyTorch support for Sylvester equations, let alone Schur decompositions, but before I try to implement Bartels-Stewart on the GPU, is there a simpler way to find the gradients?
Initially I wrote a solution based on the Bartels-Stewart algorithm for the m=n case that would give a complex X. I had some problems because the eigenvector matrix is not accurate enough. Also, when A, B, and C are real, the real part of X gives the real solution, and the imaginary part must be a solution of AX - XB = 0. Note that the code below solves AX - XB = C; for the form AX + XB = C, pass -B instead of B.
import torch

def sylvester(A, B, C, X=None):
    # Solves A X - X B = C using the eigendecompositions of A and B
    m = B.shape[-1]
    n = A.shape[-1]
    R, U = torch.linalg.eig(A)
    S, V = torch.linalg.eig(B)
    F = torch.linalg.solve(U, (C + 0j) @ V)
    W = R[..., :, None] - S[..., None, :]
    Y = F / W
    X = U[..., :n, :n] @ Y[..., :n, :m] @ torch.linalg.inv(V)[..., :m, :m]
    return X.real if all(torch.isreal(x.flatten()[0])
                         for x in [A, B, C]) else X
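The reasoning behind the function can be sketched in NumPy. This is a minimal illustration rather than a replacement for the PyTorch code; it assumes A and B are diagonalizable and share no eigenvalue, and the name sylvester_eig and the shifted random test matrices are made up for the example.

```python
import numpy as np

def sylvester_eig(A, B, C):
    """Solve A X - X B = C, assuming A and B are diagonalizable
    and have no eigenvalue in common."""
    R, U = np.linalg.eig(A)   # A = U diag(R) U^-1
    S, V = np.linalg.eig(B)   # B = V diag(S) V^-1
    # With Y = U^-1 X V the equation becomes diag(R) Y - Y diag(S) = U^-1 C V,
    # which decouples elementwise: Y_ij * (R_i - S_j) = F_ij
    F = np.linalg.solve(U, C @ V)
    Y = F / (R[:, None] - S[None, :])
    # Map back: X = U Y V^-1, computed through a solve with V^T
    return np.linalg.solve(V.T, (U @ Y).T).T

rng = np.random.default_rng(0)
# Shifts keep the spectra of A and B well separated (an assumption for the demo)
A = rng.standard_normal((5, 5)) + 3 * np.eye(5)
B = rng.standard_normal((4, 4)) - 3 * np.eye(4)
X_true = rng.standard_normal((5, 4))
C = A @ X_true - X_true @ B

X = sylvester_eig(A, B, C)
residual = np.max(np.abs(A @ X - X @ B - C))
print(residual)
```

For real inputs the imaginary part of X should only contain rounding noise, which is why the PyTorch function returns X.real in that case.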
As can be verified on the GPU with
device = 'cuda'
# Try different dimensions
for batch_size, M, N in [(1, 4, 4), (20, 16, 16), (6, 13, 17), (11, 29, 23)]:
    print(batch_size, (M, N))
    A = torch.randn((batch_size, N, N), dtype=torch.float64,
                    device=device, requires_grad=True)
    B = torch.randn((batch_size, M, M), dtype=torch.float64,
                    device=device, requires_grad=True)
    X = torch.randn((batch_size, N, M), dtype=torch.float64,
                    device=device, requires_grad=True)
    C = A @ X - X @ B
    X_ = sylvester(A, B, C)
    C_ = (A) @ X_ - X_ @ (B)
    print(torch.max(abs(C - C_)))
    X.sum().backward()
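A faster variant, which is inaccurate in the current PyTorch version, replaces the linear solve and the matrix inverse with conjugate transposes. Note that the conjugate transpose of an eigenvector matrix equals its inverse only when that matrix is unitary (for example, when A and B are normal), which may be the source of the inaccuracy: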
def sylvester_of_the_future(A, B, C):
    def h(V):
        # Conjugate transpose over the last two dimensions
        return V.transpose(-1, -2).conj()
    m = B.shape[-1]
    n = A.shape[-1]
    R, U = torch.linalg.eig(A)
    S, V = torch.linalg.eig(B)
    F = h(U) @ (C + 0j) @ V
    W = R[..., :, None] - S[..., None, :]
    Y = F / W
    X = U[..., :n, :n] @ Y[..., :n, :m] @ h(V)[..., :m, :m]
    return X.real if all(torch.isreal(x.flatten()[0]) for x in [A, B, C]) else X
I will leave it here; maybe in the future it will work properly.
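On the gradient part of the question: one route that avoids differentiating through a decomposition is implicit differentiation. For AX + XB = C and a scalar loss with dL/dX = grad_X, the adjoint G solves the transposed equation A^T G + G B^T = grad_X, and then dL/dC = G, dL/dA = -G X^T, dL/dB = -X^T G. The NumPy sketch below checks this against finite differences. The helper names are made up, the Kronecker solver is only a stand-in for whatever Sylvester solver is actually used, and wrapping this in a custom autograd function in PyTorch is left as an assumption, not shown.

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve A X + X B = C by vectorization (fine for small sizes only)."""
    n, m = C.shape
    # Row-major vec: vec(A X) = (A kron I) vec(X), vec(X B) = (I kron B^T) vec(X)
    K = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B.T)
    return np.linalg.solve(K, C.reshape(-1)).reshape(n, m)

def sylvester_grads(A, B, X, grad_X):
    """Gradients of a scalar loss w.r.t. A, B, C, given dL/dX = grad_X."""
    # Adjoint equation: A^T G + G B^T = grad_X
    G = solve_sylvester_kron(A.T, B.T, grad_X)
    return -G @ X.T, -X.T @ G, G   # dL/dA, dL/dB, dL/dC

rng = np.random.default_rng(0)
n, m = 3, 4
# Shifts keep the system comfortably nonsingular (an assumption for the demo)
A = rng.standard_normal((n, n)) + 4 * np.eye(n)
B = rng.standard_normal((m, m)) + 4 * np.eye(m)
C = rng.standard_normal((n, m))
Wt = rng.standard_normal((n, m))   # example loss: L = sum(Wt * X)

X = solve_sylvester_kron(A, B, C)
gA, gB, gC = sylvester_grads(A, B, X, Wt)

def loss(A_, B_, C_):
    return np.sum(Wt * solve_sylvester_kron(A_, B_, C_))

# Central finite differences on one entry of A and one entry of C
eps = 1e-6
EA = np.zeros((n, n)); EA[0, 1] = eps
EC = np.zeros((n, m)); EC[2, 1] = eps
fd_A = (loss(A + EA, B, C) - loss(A - EA, B, C)) / (2 * eps)
fd_C = (loss(A, B, C + EC) - loss(A, B, C - EC)) / (2 * eps)
print(fd_A, gA[0, 1], fd_C, gC[2, 1])
```

The backward pass costs one extra Sylvester solve with transposed coefficients, so any forward solver can be reused for the gradients.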
I have a Gaussian beam in 2D:
After doing fft2 and angle I get strange results:
import numpy as np
import matplotlib.pyplot as plt
from scipy import fftpack

def finite2D(x, y, N, M, a, hx):
    f = np.array([[0.0]*N]*N)
    for i in range(len(x)):
        for k in range(len(y)):
            f[i][k] = np.exp(-(x[i]*x[i] + y[k]*y[k]))
    D1 = fftpack.fft2(f)
    D2 = fftpack.fftshift(D1)
    b = N*N/(4*a*M)
    x = np.linspace(-b, b, N)
    y = np.linspace(-b, b, N)
    xx, yy = np.meshgrid(x, y)
    plt.imshow(np.abs(D2))
    plt.show()
    plt.imshow(np.angle(D2))
    plt.show(block=True)
    return D2

a = 5
N = 128
M = 256
b = N*N/(4*a*M)
hx = 2*a/N
x = np.linspace(-a, a, N)
y = np.linspace(-a, a, N)
finite2D(x, y, N, M, a, hx)
It should be phase 0 or close to 0. Why is this not the case, and how do I fix this?
Updated:
def finite2D(x, y, N, M, a, hx):
    f = np.array([[0.0]*N]*N)
    for i in range(len(x)):
        for k in range(len(y)):
            f[i][k] = np.exp(-(x[i]*x[i] + y[k]*y[k]))
    f = fftpack.ifftshift(f)
    D1 = fftpack.fft2(f)
    D2 = fftpack.fftshift(D1)
    b = N*N/(4*a*M)
    x = np.linspace(-b, b, N)
    y = np.linspace(-b, b, N)
    xx, yy = np.meshgrid(x, y)
    plt.imshow(np.abs(D2))
    plt.show()
    plt.imshow(np.angle(D2))
    plt.show(block=True)
    return D2

a = 5
N = 128
M = 256
b = N*N/(4*a*M)
hx = 2*a/N
x = np.linspace(-a, a, N, endpoint=False)
y = np.linspace(-a, a, N, endpoint=False)
finite2D(x, y, N, M, a, hx)
Phase:
The FFT assumes that the origin is in the top-left corner of the image. Thus, you are computing the FFT of a Gaussian shifted by half the image size. This shift leads to a high-frequency phase shift in the frequency domain.
To solve the problem, you need to shift the origin of your Gaussian signal to the top-left corner of the image. ifftshift does this:
f = fftpack.ifftshift(f)
D1 = fftpack.fft2(f)
D2 = fftpack.fftshift(D1)
Note that where the magnitude is very small, the phase is determined by rounding errors, so don't expect zero phase there.
The updated result looks good, but there still is a very small gradient in the central region. This is caused by the half-pixel shift of the Gaussian. This shift is given by the definition of the x and y coordinates:
N = 128
x = np.linspace(-a, a, N)
y = np.linspace(-a, a, N)
For an even-sized N, do
x = np.linspace(-a, a, N, endpoint=False)
y = np.linspace(-a, a, N, endpoint=False)
such that there is a sample where x==0.
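The effect of both fixes can be checked numerically without plotting. The sketch below uses np.fft rather than scipy.fftpack (a substitution made for the example) and compares the phase with and without ifftshift, looking only at frequencies where the magnitude is not negligible; the 1e-6 relative threshold is an arbitrary choice.

```python
import numpy as np

a, N = 5, 128
# endpoint=False puts a sample exactly at x == 0 (index N // 2)
x = np.linspace(-a, a, N, endpoint=False)
xx, yy = np.meshgrid(x, x)
f = np.exp(-(xx**2 + yy**2))

# Origin left at the image centre: the spectrum picks up an alternating sign
F_naive = np.fft.fftshift(np.fft.fft2(f))
# Origin moved to the top-left corner before the transform
F_centered = np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(f)))

# Only look at frequencies where the phase is meaningful
mask = np.abs(F_centered) > 1e-6 * np.abs(F_centered).max()
max_phase_naive = np.abs(np.angle(F_naive[mask])).max()
max_phase_centered = np.abs(np.angle(F_centered[mask])).max()
print(max_phase_naive, max_phase_centered)
```

Without ifftshift the phase jumps between 0 and pi from one frequency bin to the next, while with it the phase stays at rounding-error level inside the masked region.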
In my problem, I have a batch of matrices, represented by a tensor of shape [BATCH,SEQUENCE,DATA]. I have a function that transforms the last axis of this tensor into Cartesian coordinates. It was first implemented in pure Python, and then I implemented it in TF.
Originally the function performed the calculations on a single element of the batch tensor. Now (code below) it can be applied over the batch axis. If I pass only one example to the function (x = (1,SEQUENCE,DATA)), it works properly. When the number of examples grows (BATCH > 1), the function produces wrong results.
I have already tested against both the original numpy function and the TF implementation, and I narrowed the problem down to the tf.einsum call. What is wrong here, and how can I change this behavior?
import numpy as np
import tensorflow as tf

def z2c(zmat):
    cartesian = tf.Variable(tf.zeros(tf.shape(zmat)))
    seq_length = zmat.shape[1]
    batch = zmat.shape[0]
    bead_1 = tf.constant([[zmat[i,1,0], 0., 0] for i in range(batch)])
    cartesian[:,1,:].assign(bead_1)
    bead_2 = [[(zmat[i,2,0]*np.sin(zmat[i,2,1]))+zmat[i,1,0], zmat[i,2,0]*np.cos(zmat[i,2,1]), 0]
              for i in range(batch)]
    cartesian[:,2,:].assign(bead_2)
    r, theta, phi = zmat[:,:,0], zmat[:,:,1], zmat[:,:,2]
    sinTheta = np.sin(theta)
    cosTheta = np.cos(theta)
    sinPhi = np.sin(phi)
    cosPhi = np.cos(phi)
    x = r * cosTheta
    y = r * cosPhi * sinTheta
    z = r * sinPhi * sinTheta
    xyz = tf.Variable([-x,y,z])
    xyz = tf.transpose(tf.Variable([-x,y,z]), (1,2,0))

    def extend(cartesian, xyz, i):#, x,y,z):
        cartesian = tf.Variable(cartesian)
        a,b,c = cartesian[:,i-3,:], cartesian[:,i-2,:], cartesian[:,i-1,:]
        print("Iter: {}".format(i))
        print('a: {}, b: {}, c: {}\n'.format(a,b,c))
        print(a.shape)
        ab = (b - a)
        bc = (c - b) #.normalized()
        bc = bc/tf.linalg.norm(bc)
        n = tf.linalg.cross(ab,bc)
        n = n/tf.linalg.norm(n)
        ncbc = tf.linalg.cross(n,bc)
        M = tf.Variable([[bc[:,0], ncbc[:,0], n[:,0]],
                         [bc[:,1], ncbc[:,1], n[:,1]], [bc[:,2], ncbc[:,2], n[:,2]]])
        pos = xyz[:,i,:]
        cartesian[:,i,:].assign( tf.einsum('bsc,bs->bc', tf.transpose(M), xyz[:,i,:]) + c )
        return cartesian, xyz, i+1

    def cond(cartesian, xyz, i):
        return i < seq_len

    i = tf.Variable(3, dtype=tf.int32)
    seq_len = tf.Variable(seq_length,dtype=tf.int32)
    result, _, _ = tf.while_loop(cond = cond,
                                 body = extend,
                                 loop_vars = (cartesian, xyz, i))
    return result
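One candidate worth checking before the einsum call itself (a guess from reading the code, not verified against the original data) is the normalization: tf.linalg.norm(bc) without an axis argument returns a single norm over the whole tensor, so with BATCH > 1 every row is divided by the same batch-wide scalar. For BATCH = 1 that scalar happens to equal the row norm, which would match the reported behavior. The NumPy sketch below shows the difference on a made-up two-element batch.

```python
import numpy as np

# Two batch elements, each a 3-vector (illustrative values only)
bc = np.array([[3.0, 0.0, 4.0],
               [0.0, 2.0, 0.0]])

# One scalar norm over the whole array: rows are not unit length afterwards
whole = bc / np.linalg.norm(bc)

# One norm per batch element: every row becomes a unit vector
per_row = bc / np.linalg.norm(bc, axis=-1, keepdims=True)

print(np.linalg.norm(whole, axis=-1))    # not all ones
print(np.linalg.norm(per_row, axis=-1))  # all ones
```

In TensorFlow the per-row version would be tf.linalg.norm(bc, axis=-1, keepdims=True), applied to both bc and n.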
I have this interesting problem where I want to calculate the sum over the element-wise product of three matrices
While \mathbf{p}_{ijk} and c_{ijk} can be calculated a priori, my problem is with f_{ijk}(x,y,z). The elements of this matrix are multivariate polynomials that depend on the matrix indices, so numpy.vectorize cannot be trivially applied. My best bet at tackling the issue would be to treat (i,j,k) as additional variables, so that numpy.vectorize is applied to a 6-dimensional instead of a 3-dimensional input. However, I am not sure whether more efficient or alternative ways exist.
This is a simple way to implement that formula efficiently:
import numpy as np
np.random.seed(0)
l, m, n = 4, 5, 6
x, y, z = np.random.rand(3)
p = np.random.rand(l, m, n)
c = np.random.rand(l, m, n)
i, j, k = map(np.arange, (l, m, n))
xi = (x ** (l - i)) * (x ** l)
yj = (y ** (m - j)) * (y ** m)
zk = (z ** (n - k)) * (z ** n)
res = np.einsum('ijk,ijk,i,j,k->', p, c, xi, yj, zk)
print(res)
# 0.0007208482648476157
Or even slightly more compact:
import numpy as np
np.random.seed(0)
l, m, n = 4, 5, 6
x, y, z = np.random.rand(3)
p = np.random.rand(l, m, n)
c = np.random.rand(l, m, n)
t = map(lambda v, s: (v ** (s - np.arange(s))) * (v ** s), (x, y, z), (l, m, n))
res = np.einsum('ijk,ijk,i,j,k->', p, c, *t)
print(res)
# 0.0007208482648476157
Using np.einsum you minimize the need for intermediate arrays, so it should be faster than making f first (which you could get e.g. as f = np.einsum('i,j,k->ijk', xi, yj, zk)), multiplying p, c and f and then summing the result.
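As a quick sanity check, the two routes can be compared directly. This sketch reuses the same separable form of f as the snippets above (which is itself taken from this answer, since the formula is not reproduced here) and confirms that the fused einsum agrees with building f explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
l, m, n = 4, 5, 6
x, y, z = rng.random(3)
p = rng.random((l, m, n))
c = rng.random((l, m, n))

# Per-axis factors of f, same form as in the snippets above
xi = (x ** (l - np.arange(l))) * (x ** l)
yj = (y ** (m - np.arange(m))) * (y ** m)
zk = (z ** (n - np.arange(n))) * (z ** n)

# Route 1: build f explicitly, multiply and sum (allocates l*m*n temporaries)
f = np.einsum('i,j,k->ijk', xi, yj, zk)
res_explicit = np.sum(p * c * f)

# Route 2: let einsum fuse everything into one contraction
res_einsum = np.einsum('ijk,ijk,i,j,k->', p, c, xi, yj, zk)
print(res_explicit, res_einsum)
```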
So I have this 3x3 G matrix (not shown here, it's irrelevant to my problem) that I created using two variables: u (a vector, x - y) and the scalar k. x_j = (x_1 (j), x_2 (j), x_3 (j)) and y_j = (y_1 (j), y_2 (j), y_3 (j)). alpha_j is a 3x3 matrix. The A matrix is a block diagonal matrix of size 3nx3n. I am having trouble with the W matrix. How do I code a matrix of size 3nx3n, where the (i,j)th block is the 3x3 matrix given by alpha_i*G_[ij]*alpha_j? I am lost.
My alpha_j matrix also seems to be having some trouble. The loop keeps throwing the error "only length-1 arrays can be converted to Python scalars." Please help :/
def W(x, y, k, alpha, A):
    u = x - y
    n = x.shape[0]
    W = np.zeros((3*n, 3*n))
    for i in range(0, n-1):
        for j in range(0, n-1):
            #u = -np.array([[x[i,0] - x[j,0]], [x[i,1] - x[j,1]], [0]]) ??
            W[i][j] = (alpha_j(alpha, A) * G(u, k) * alpha_j(alpha, A))
        W[i][i] = np.zeros((n, n))
    return W

def alpha_j(a, A):
    alph = np.array([[0,0,0],[0,0,0],[0,0,0]],complex)
    rho = np.random.rand(3,1)
    for i in range(0, 2):
        for j in range(0, 2):
            alph[i][j] = (rho[i] * a * A[i][j])
    return alph

#-------------------------------------------------------------------
x1 = np.array([[1], [2], [0]])
y1 = np.array([[4], [5], [0]])
# SYSTEM PARAMETERS
# incoming Wave angle
theta = 0 # can range from [0, 2pi)
# susceptibility
chi = 10 + 1j
# wavelength
lam = 0.5 # microns (values between .4-.7)
# frequency
k = (2 * np.pi)/lam # 1/microns
# volume
V_0 = (0.05)**3 # microns^3
# incoming wave vector
K = k * np.array([[0], [np.sin(theta)], [np.cos(theta)]])
# polarization vector
vecinc = np.array([[1], [0], [0]]) # (can choose any vector perpendicular to K)
# for the fixed alpha case
alpha = (V_0 * 3 * chi)/(chi + 3)
# 3 x 3 matrix
A = np.matlib.identity(3) # could be any symmetric matrix,
#-------------------------------------------------------------------
# TEST FUNCTIONS
test = G((x1-y1), k)
print(test)
w = W(x1, y1, k, alpha, A)
print(w)
Sometimes my W loop throws the error "can't set an array element with a sequence." But I need to set each block of this arbitrary matrix W to the 3x3 matrix created by multiplying alpha by G...
To your question of how to create a new array with a block for each element, the following should do the trick:
import numpy as np

G = np.random.random([3,3])
result = np.zeros([9,9])
num_blocks = 3
a = np.random.random([3,3])
b = np.random.random([3,3])
for i in range(G.shape[0]):
    for j in range(G.shape[1]):
        # 3x3 block that belongs at block position (i, j)
        block_result = a*G[i,j]*b
        for k in range(num_blocks):
            for l in range(num_blocks):
                # copy element (k, l) of the block into the big matrix
                result[3*i + k, 3*j + l] = block_result[k, l]
You should be able to generalize from there. I hope I've understood correctly.
EDIT: It looks like I haven't understood correctly. I'm leaving it in hopes it spurs you to an answer. The general idea is to generate ranges of indices to operate on, and then just operate on them directly. Slicing might be helpful, too.
Ah, you asked how to create a diagonal filled with blocks. In that case:
num_diagonal_blocks = 3 # for example
for block_dim in range(num_diagonal_blocks):
    # do your block calculation...
    for k in range(G.shape[0]):
        for l in range(G.shape[1]):
            result[3*block_dim + k, 3*block_dim + l] = ...  # assign to element of block
I think that's nearly it.
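For the 3n x 3n matrix asked about in the question, slice assignment avoids the inner element loops entirely. The sketch below is only an illustration under stated assumptions: assemble_W is a made-up name, G_blocks is a random stand-in for the G(x_i - x_j, k) blocks (G itself is not shown in the question), the products are taken as matrix products, and the diagonal blocks are left at zero as the question's W[i][i] line suggests.

```python
import numpy as np

def assemble_W(alphas, G_blocks):
    """Build a (3n, 3n) matrix whose (i, j) block is alphas[i] @ G_blocks[i, j] @ alphas[j].

    alphas:   array of shape (n, 3, 3)
    G_blocks: array of shape (n, n, 3, 3), hypothetical precomputed G for each pair
    """
    n = len(alphas)
    W = np.zeros((3 * n, 3 * n), dtype=complex)
    for i in range(n):
        for j in range(n):
            if i != j:  # diagonal blocks stay zero (assumption taken from the question)
                W[3*i:3*i + 3, 3*j:3*j + 3] = alphas[i] @ G_blocks[i, j] @ alphas[j]
    return W

rng = np.random.default_rng(0)
n = 4
alphas = rng.random((n, 3, 3)) + 1j * rng.random((n, 3, 3))
G_blocks = rng.random((n, n, 3, 3))   # stand-in values, not the real G
W = assemble_W(alphas, G_blocks)
print(W.shape)
```

Assigning a whole 3x3 array to a 3x3 slice is also what avoids the "can't set an array element with a sequence" error, which appears when a matrix is assigned to a single element such as W[i][j].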
I have a list of 3D points for which I calculate a plane with the numpy.linalg.lstsq method. Now I want to do an orthogonal projection of each point onto this plane, but I can't find my mistake:
from numpy.linalg import lstsq

def VecProduct(vek1, vek2):
    return (vek1[0]*vek2[0] + vek1[1]*vek2[1] + vek1[2]*vek2[2])

def CalcPlane(x, y, z):
    # x, y and z are given in lists
    n = len(x)
    sum_x = sum_y = sum_z = sum_xx = sum_yy = sum_xy = sum_xz = sum_yz = 0
    for i in range(n):
        sum_x += x[i]
        sum_y += y[i]
        sum_z += z[i]
        sum_xx += x[i]*x[i]
        sum_yy += y[i]*y[i]
        sum_xy += x[i]*y[i]
        sum_xz += x[i]*z[i]
        sum_yz += y[i]*z[i]
    M = ([sum_xx, sum_xy, sum_x], [sum_xy, sum_yy, sum_y], [sum_x, sum_y, n])
    b = (sum_xz, sum_yz, sum_z)
    a,b,c = lstsq(M, b)[0]
    '''
    z = a*x + b*y + c
    a*x = z - b*y - c
    x = -(b/a)*y + (1/a)*z - c/a
    '''
    r0 = [-c/a, 0, 0]
    u = [-b/a, 1, 0]
    v = [1/a, 0, 1]
    xn = []
    yn = []
    zn = []
    # orthogonalize u and v with Gram-Schmidt to get u and w
    uu = VecProduct(u, u)
    vu = VecProduct(v, u)
    fak0 = vu/uu
    erg0 = [val*fak0 for val in u]
    w = [v[0]-erg0[0],
         v[1]-erg0[1],
         v[2]-erg0[2]]
    ww = VecProduct(w, w)
    # P_new = ((x*u)/(u*u))*u + ((x*w)/(w*w))*w
    for i in range(len(x)):
        xu = VecProduct([x[i], y[i], z[i]], u)
        xw = VecProduct([x[i], y[i], z[i]], w)
        fak1 = xu/uu
        fak2 = xw/ww
        erg1 = [val*fak1 for val in u]
        erg2 = [val*fak2 for val in w]
        erg = [erg1[0]+erg2[0], erg1[1]+erg2[1], erg1[2]+erg2[2]]
        erg[0] += r0[0]
        xn.append(erg[0])
        yn.append(erg[1])
        zn.append(erg[2])
    return (xn,yn,zn)
This returns me a list of points which are all in a plane, but when I display them, they are not at the positions they should be.
I believe there is already a certain built-in method to solve this problem, but I couldn't find any =(
You are making very poor use of np.linalg.lstsq, since you are feeding it a precomputed 3x3 matrix instead of letting it do the job. I would do it like this:
import numpy as np

def calc_plane(x, y, z):
    a = np.column_stack((x, y, np.ones_like(x)))
    return np.linalg.lstsq(a, z)[0]

>>> x = np.random.rand(1000)
>>> y = np.random.rand(1000)
>>> z = 4*x + 5*y + 7 + np.random.rand(1000)*.1
>>> calc_plane(x, y, z)
array([ 3.99795126, 5.00233364, 7.05007326])
It is actually more convenient to use a formula for your plane that doesn't depend on the coefficient of z not being zero, i.e. use a*x + b*y + c*z = 1. You can similarly compute a, b and c doing:
def calc_plane_bis(x, y, z):
    a = np.column_stack((x, y, z))
    return np.linalg.lstsq(a, np.ones_like(x))[0]

>>> calc_plane_bis(x, y, z)
array([-0.56732299, -0.70949543, 0.14185393])
To project points onto a plane using my alternative equation, note that the vector (a, b, c) is perpendicular to the plane. It is easy to check that the point (a, b, c) / (a**2+b**2+c**2) is on the plane, so the projection can be done by referencing all points to that point on the plane, projecting the points onto the normal vector, subtracting that projection from the points, and then referencing them back to the origin. You could do that as follows:
def project_points(x, y, z, a, b, c):
    """
    Projects the points with coordinates x, y, z onto the plane
    defined by a*x + b*y + c*z = 1
    """
    vector_norm = a*a + b*b + c*c
    normal_vector = np.array([a, b, c]) / np.sqrt(vector_norm)
    point_in_plane = np.array([a, b, c]) / vector_norm
    points = np.column_stack((x, y, z))
    points_from_point_in_plane = points - point_in_plane
    proj_onto_normal_vector = np.dot(points_from_point_in_plane,
                                     normal_vector)
    proj_onto_plane = (points_from_point_in_plane -
                       proj_onto_normal_vector[:, None]*normal_vector)
    return point_in_plane + proj_onto_plane

So now you can do something like:
>>> project_points(x, y, z, *calc_plane_bis(x, y, z))
array([[ 0.13138012, 0.76009389, 11.37555123],
[ 0.71096929, 0.68711773, 13.32843506],
[ 0.14889398, 0.74404116, 11.36534936],
...,
[ 0.85975642, 0.4827624 , 12.90197969],
[ 0.48364383, 0.2963717 , 10.46636903],
[ 0.81596472, 0.45273681, 12.57679188]])
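A result like this is easy to sanity-check: every projected point should satisfy a*x + b*y + c*z = 1, and the displacement of each point should be parallel to the normal (a, b, c). The sketch below uses a compact, equivalent formulation of the projection; the name project_onto_plane, the random points and the example coefficients are all made up for the illustration.

```python
import numpy as np

def project_onto_plane(points, normal, d=1.0):
    """Orthogonally project points of shape (n, 3) onto the plane normal . p = d."""
    normal = np.asarray(normal, dtype=float)
    # Signed offset along the (unnormalized) normal, one value per point
    t = (points @ normal - d) / (normal @ normal)
    return points - t[:, None] * normal

rng = np.random.default_rng(0)
pts = rng.random((100, 3))
normal = np.array([-0.5, -0.7, 0.15])   # arbitrary example coefficients (a, b, c)

proj = project_onto_plane(pts, normal)
on_plane = np.allclose(proj @ normal, 1.0)                    # plane equation holds
parallel = np.allclose(np.cross(pts - proj, normal), 0.0)     # moved along the normal only
print(on_plane, parallel)
```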
One option is simply to do everything with matrices.
If you add your points as row vectors to a matrix X, and y is a vector, then the parameter vector beta of the least squares solution is:
import numpy as np
beta = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(y))
but there's an easier way, if we want to do projections: QR decomposition gives us an orthonormal projection matrix, as Q.T, and Q is itself the matrix of orthonormal basis vectors. So, we can first form QR, then get beta, then use Q.T to project the points.
QR:
Q, R = np.linalg.qr(X)
beta:
# use R to solve for beta
# R is upper triangular, so can use triangular solver:
import scipy.linalg
beta = scipy.linalg.solve_triangular(R, Q.T.dot(y))
So now we have beta, and we can project the points using Q.T very simply:
X_proj = Q.T.dot(X)
That's it!
If you want more information and graphical piccies and stuff, I made a whole bunch of notes, whilst doing something similar, at: https://github.com/hughperkins/selfstudy-IBP/blob/9dedfbb93f4320ac1bfef60db089ae0dba5e79f6/test_bases.ipynb
(Edit: note that if you want to add a bias term, so the best fit doesn't have to pass through the origin, you can simply add an additional column of all 1s to X, which acts as the bias term/feature)
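A small end-to-end sketch of the QR route with a bias column, on made-up data. It uses np.linalg.solve in place of a dedicated triangular solver to stay within NumPy, and it computes the fitted values as Q (Q^T y), which is the orthogonal projection of y onto the column space of X; both choices are assumptions of this example rather than part of the notes linked above.

```python
import numpy as np

rng = np.random.default_rng(0)
xy = rng.random((200, 2))
# Design matrix: x, y and a column of ones for the bias term
X = np.column_stack((xy, np.ones(len(xy))))
y = 4 * xy[:, 0] + 5 * xy[:, 1] + 7 + 0.01 * rng.standard_normal(len(xy))

Q, R = np.linalg.qr(X)                 # reduced QR: Q is (200, 3), R is (3, 3)
beta = np.linalg.solve(R, Q.T @ y)     # R is upper triangular; generic solve used for simplicity
y_fit = Q @ (Q.T @ y)                  # projection of y onto the column space of X

beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta, beta_ref)
```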
This web page has a pretty great code base. It implements the theory expounded by Maple in numpy quite well, as follows:
# import numpy to perform operations on vector
import numpy as np
# vector u
u = np.array([2, 5, 8])
# vector n: n is orthogonal vector to Plane P
n = np.array([1, 1, 7])
# Task: Project vector u on Plane P
# finding norm of the vector n
n_norm = np.sqrt(sum(n**2))
# Apply the formula as mentioned above
# for projecting a vector onto the orthogonal vector n
# find dot product using np.dot()
proj_of_u_on_n = (np.dot(u, n)/n_norm**2)*n
# subtract proj_of_u_on_n from u:
# this is the projection of u on Plane P
print("Projection of Vector u on Plane P is: ", u - proj_of_u_on_n)