I am implementing a Hidden Markov Model and am therefore dealing with very small probabilities. I am handling underflow by representing values in log space (x → log(x)), which means that multiplication is replaced by addition, and addition is handled via numpy.logaddexp or similar.
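A tiny illustration of that trade-off, assuming float64 and an arbitrary probability p = 1e-200 chosen only for the demo: the direct product underflows, while the log-space versions stay finite.

```python
import numpy as np

p = 1e-200
# Direct multiplication underflows: 1e-400 is below the smallest float64
direct = p * p
# In log space, multiplication becomes addition
log_prod = np.log(p) + np.log(p)
# and addition becomes logaddexp: log(p + p) = log(2p)
log_sum = np.logaddexp(np.log(p), np.log(p))
print(direct, log_prod, log_sum)
```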
Is there an easy way to handle matrix multiplication in log space?
This is the best way I could come up with to do it.
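import numpy as np
from scipy.special import logsumexp

def log_space_product(A, B):
    # Works when A.shape[0] == B.shape[1] (e.g. square matrices); see below for the general case
    Astack = np.stack([A]*A.shape[0]).transpose(2, 1, 0)  # Astack[k, i, j] = A[i, k]
    Bstack = np.stack([B]*B.shape[1]).transpose(1, 0, 2)  # Bstack[k, i, j] = B[k, j]
    return logsumexp(Astack + Bstack, axis=0)             # reduce over the shared k axis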
The inputs A and B are the logs of the matrices A0 and B0 you want to multiply, and the function returns the log of A0B0. The idea is that the i,j spot in log(A0B0) is the log of the dot product of the ith row of A0 and the jth column of B0, so it is the logsumexp of the ith row of A plus the jth column of B.
In the code, Astack is built so the i,j spot is a vector containing the ith row of A, and Bstack is built so the i,j spot is a vector containing the jth column of B. Thus Astack + Bstack is a 3D tensor whose i,j spot is the ith row of A plus the jth column of B. Taking logsumexp with axis = 0 then gives the desired result.
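The same idea can be sketched with broadcasting instead of stacking copies, under the assumption that A is (n, m) and B is (m, r) and both are already in log space. The function name log_space_product_bcast is hypothetical, and this version handles non-square shapes as well:

```python
import numpy as np
from scipy.special import logsumexp

def log_space_product_bcast(A, B):
    """Hypothetical helper: log(exp(A) @ exp(B)) for log-matrices A (n, m), B (m, r)."""
    # A[:, :, None] is (n, m, 1) and B[None, :, :] is (1, m, r); their sum is
    # (n, m, r) with entry [i, k, j] = A[i, k] + B[k, j], so reducing over the
    # shared k axis gives the log of the (i, j) dot product.
    return logsumexp(A[:, :, None] + B[None, :, :], axis=1)

rng = np.random.default_rng(0)
A0 = rng.random((3, 3)) + 0.1
B0 = rng.random((3, 3)) + 0.1
result = log_space_product_bcast(np.log(A0), np.log(B0))
print(np.allclose(result, np.log(A0 @ B0)))
```

Like the stacking version, it materializes an (n, m, r) intermediate array, so memory grows with all three dimensions.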
Erik's response doesn't work for non-square matrices in general (e.g. n*m times m*r with n != r). Here is a version that takes that into account:
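import numpy as np
from scipy.special import logsumexp

def log_space_product(A, B):
    # A is (n, m) and B is (m, r), both already in log space
    Astack = np.stack([A]*B.shape[1]).transpose(1, 0, 2)  # (n, r, m): Astack[i, j, k] = A[i, k]
    Bstack = np.stack([B]*A.shape[0]).transpose(0, 2, 1)  # (n, r, m): Bstack[i, j, k] = B[k, j]
    return logsumexp(Astack + Bstack, axis=2)             # reduce over k -> shape (n, r)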
where the i, j spot of Astack contains the i-th row of A and the i, j spot of Bstack contains the j-th column of B.
This works because np.stack([A] * B.shape[1]) has shape (r, n, m), which is transposed into (n, r, m), and np.stack([B] * A.shape[0]) has shape (n, m, r), which is transposed into (n, r, m). We want their first two dimensions to be (n, r) because the result matrix needs to have shape (n, r).
Took a while to figure out myself. Hope this helps anyone implementing an HMM!
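If the (n, m, r) intermediate is too large, a common alternative (a different technique from the stacking above, named plainly here) is to shift by the row maxima of A and the column maxima of B and then use an ordinary matrix product. This is a sketch assuming every row of A and every column of B has at least one finite entry; the name log_space_product_shift is hypothetical. It cannot overflow, but it is less robust than logsumexp: if a row maximum and a column maximum sit at different positions and all other terms are tiny, the shifted product can still underflow to 0.

```python
import numpy as np

def log_space_product_shift(A, B):
    """Hypothetical helper: log(exp(A) @ exp(B)) for log-matrices A (n, m), B (m, r)."""
    a_max = A.max(axis=1, keepdims=True)   # row maxima of A, shape (n, 1)
    b_max = B.max(axis=0, keepdims=True)   # column maxima of B, shape (1, r)
    # After the shift every exponent is <= 0, so exp() cannot overflow;
    # no (n, m, r) intermediate is built, only an ordinary (n, r) product.
    prod = np.exp(A - a_max) @ np.exp(B - b_max)
    return np.log(prod) + a_max + b_max

rng = np.random.default_rng(0)
A0 = rng.random((3, 4)) + 0.1
B0 = rng.random((4, 2)) + 0.1
# Values around exp(-500): the plain product would be ~exp(-1000), which underflows
A = np.log(A0) - 500
B = np.log(B0) - 500
print(np.allclose(log_space_product_shift(A, B), np.log(A0 @ B0) - 1000))
```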
I am implementing the Crank-Nicolson finite-difference method in 2D.
I get a matrix A that is banded, with one band above and one below the main diagonal, but it also contains two additional bands further away from the main diagonal, so it is NOT penta-diagonal.
A picture showing the structure is below. My matrix is the RHS one. The LHS one is easy; it's the penta-diagonal one.
So far I haven't found a way to solve Ax = b in Python with A being the RHS matrix from the picture.
I could barely find a name for it: in these lecture notes https://ocw.mit.edu/ans7870/2/2.086/F12/MIT2_086F12_notes_unit5.pdf it is called an 'outrigger' matrix (page 403).
At the moment I am using spsolve from scipy.sparse.linalg, into which I feed two arguments, namely sparse.csc_matrix(A) and sparse.csc_array(b), where A and b were initially defined as A = sparse.dok_matrix((size, size), dtype=np.complex64) and b = sparse.dok_array((size, 1), dtype=np.complex64) and then populated by iterating through them element by element.
It is extremely slow, and I was wondering whether someone more experienced knows a way to exploit the structure of A.
Thank you!
You should consider using the Gauss-Seidel method.
If your system is diagonally dominant it will converge; if it is not, you can probably make it so by using a higher-resolution grid.
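Here both x and b have shape (N, M) and A has shape (N, N).
Let L = np.diag(np.diag(A)) (the diagonal part of A), vL = np.diag(A).reshape(N, 1) and U = A - L.
The iteration x ← inv(L) @ (b - U @ x) can then be written as x ← (b - U @ x) / vL, so if A is stored as a sparse matrix each iteration costs time linear in the number of nonzeros of A (a few per row here). With L taken as just the diagonal, this is strictly the Jacobi variant; Gauss-Seidel proper uses the lower-triangular part of A and updates x in place.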
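A minimal sketch of that diagonal-split (Jacobi) iteration, assuming A is a scipy.sparse matrix and b is a dense (N, M) array. The function name jacobi_solve, the tolerance, the iteration cap, and the test matrix (bands at offsets 0, ±1 and ±K built with scipy.sparse.diags, made strictly diagonally dominant) are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import scipy.sparse as sp

def jacobi_solve(A, b, tol=1e-10, max_iter=10_000):
    """Hypothetical helper: iterate x <- (b - U @ x) / vL for sparse A (N, N), dense b (N, M)."""
    A = sp.csr_matrix(A)
    diag = A.diagonal()
    vL = diag.reshape(-1, 1)                 # diagonal of A as a column, shape (N, 1)
    U = A - sp.diags(diag)                   # off-diagonal part, U = A - L
    x = np.zeros(b.shape, dtype=np.result_type(A.dtype, b.dtype, np.float64))
    for _ in range(max_iter):
        x_new = (b - U @ x) / vL             # one sweep: a sparse product plus a division
        if np.max(np.abs(x_new - x)) < tol:  # stop once the update is tiny
            return x_new
        x = x_new
    return x

# Example "outrigger" matrix: 6 on the diagonal, -1 on offsets +-1 and +-K
N, K, M = 50, 7, 2
A = sp.diags([-1.0, -1.0, 6.0, -1.0, -1.0], [-K, -1, 0, 1, K], shape=(N, N), format="csr")
rng = np.random.default_rng(0)
b = rng.random((N, M))
x = jacobi_solve(A, b)
print(np.allclose(A @ x, b, atol=1e-8))
```

Convergence here relies on strict diagonal dominance (6 > 1 + 1 + 1 + 1 in every row); without it the loop may simply run to max_iter.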
If you want to make it even more efficient, you can do the multiplications as a sum of rolled diagonals. The contribution of the diagonal at offset k to A @ x is
np.roll(np.diag(np.roll(A, k, axis=0))[:, None] * x, -k, axis=0)
You can precompute the rolled diagonals; then your matrix multiplication is performed by 4 (or 5 if the structure is not symmetric) vector multiplications, plus some additional rolling and adding operations.
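A sketch of the precomputed version, assuming a dense test matrix A whose bands do not wrap around (np.roll is periodic, so the wrapped-in entries must be zeros of A). The helper names rolled_diagonals and banded_matvec are hypothetical; d_k[i] = A[i, (i + k) % N] is the same vector as np.roll(np.diag(np.roll(A, k, axis=0)), -k).

```python
import numpy as np

def rolled_diagonals(A, offsets):
    """Hypothetical helper: {k: d_k} with d_k[i] = A[i, (i + k) % N]."""
    N = A.shape[0]
    rows = np.arange(N)
    return {k: A[rows, (rows + k) % N] for k in offsets}

def banded_matvec(diags, x):
    """A @ x for x of shape (N, M), using only the precomputed diagonals."""
    # np.roll(x, -k, axis=0)[i] == x[(i + k) % N], so row i gets A[i, i+k] * x[i+k]
    return sum(d[:, None] * np.roll(x, -k, axis=0) for k, d in diags.items())

N, K, M = 12, 4, 3
offsets = [-K, -1, 0, 1, K]
rng = np.random.default_rng(0)
A = sum(np.diag(rng.random(N - abs(k)), k) for k in offsets)  # dense banded test matrix
x = rng.random((N, M))
diags = rolled_diagonals(A, offsets)
print(np.allclose(banded_matvec(diags, x), A @ x))
```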
I am given an array X of shape (n, d), an array theta of shape (k, d), and a parameter temp_parameter.
In my code, I want to create a matrix whose i,j component is the dot product of the ith row of X and the jth row of theta, divided by temp_parameter. So far I have this:
k = theta.shape[0]
n = X.shape[0]
zeros = np.zeros((k, n))
for i in range(n):
    for j in range(k):
        zeros[j][i] = np.dot(X[i], theta[j]) / temp_parameter
However, this is extremely inefficient. Can anybody give me a hint on how to improve my code?
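One vectorized way to get the same (k, n) array as the loop is a single matrix product. This is a sketch; the function name scaled_scores and the random test data are hypothetical. If you want the (n, k) orientation instead, X @ theta.T / temp_parameter gives the transpose.

```python
import numpy as np

def scaled_scores(X, theta, temp_parameter):
    """Hypothetical helper: (k, n) array with [j, i] = theta[j] . X[i] / temp_parameter."""
    # (k, d) @ (d, n) -> (k, n): contracts the shared d axis in one BLAS call,
    # matching the zeros[j][i] layout of the double loop
    return theta @ X.T / temp_parameter

rng = np.random.default_rng(0)
X = rng.random((6, 4))
theta = rng.random((3, 4))
S = scaled_scores(X, theta, 0.5)
print(S.shape)
```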
Say I have a list of N points each with d coordinates. There is a choice of representing this as a numpy array of shape:
- (d, N)
- (N, d)
which are mathematically equivalent.
Question. Are there general guidelines/good-practice principles for choosing one over the other? Computationally speaking, is numpy designed with one choice in mind?
An interpretation in favour of (N, d).
If I were to store coordinates of a list of points in a spreadsheet, then I would find it more natural (or is it just me?) to go through the list vertically (downwards) and through the coordinates horizontally. In other words, each row of the spreadsheet corresponds to a Python tuple (fixed length, immutable), and the spreadsheet corresponds to a Python list of such tuples. The number of coordinates in the spreadsheet is fixed, but points could be added to (or removed from) the list, so the length of this list is unbounded, and I find it easier to scroll vertically than horizontally.
Example.
In the k-means clustering algorithm, one wants to calculate the distance between each of N points and each of k cluster centers (all of which have d coordinates). From this article I am learning a way to do this by exploiting broadcasting. If X is an array of shape (N, d) of sample points and C is an array of shape (k, d), then the distances can be conveniently calculated by taking the Euclidean length along the last axis of the array
X - C[:, None]
of shape (k, N, d).
It would be less convenient to do this with arrays of shapes (d, N) and (d, k), respectively.
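A short sketch of that broadcasting step with random toy data (the sizes N, k, d are arbitrary assumptions); np.linalg.norm with axis=-1 takes the length along the coordinate axis:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k, d = 100, 3, 2
X = rng.random((N, d))   # sample points, one per row
C = rng.random((k, d))   # cluster centers, one per row

diff = X - C[:, None]                  # (N, d) - (k, 1, d) -> (k, N, d)
dist = np.linalg.norm(diff, axis=-1)   # Euclidean length over coordinates -> (k, N)
labels = dist.argmin(axis=0)           # index of the nearest center for each point, (N,)
```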
I have a (large) 4D array, consisting of the 5 coefficients in a given basis for a matrix field. Given the 5 basis matrices, I want to efficiently calculate the matrix field.
The coefficient field c[x,y,z,i] being the value of i-th coefficient at position x,y,z
And the matrix field M[x,y,z,a,b] being the (3,3) matrix at position x,y,z
And the basis matrices T_1,...T_5, being the (3,3) basis matrices
I could loop over each position in space:
M[x,y,z,:,:] = T_1[:,:]*c[x,y,z,0] + T_2[:,:]*c[x,y,z,1] + ... + T_5[:,:]*c[x,y,z,4]
But this is very inefficient. My attempts at using np.multiply and np.sum result in broadcasting errors due to the ambiguity of the desired product being a field of 3x3 matrices.
Keep in mind that to numpy, these 4 and 5d arrays are just that, not 3d arrays containing 2d matrices, etc.
Let's try to write your calculation in a way that clarifies dimensions:
M[x,y,z] = T_1*c[x,y,z,0] + T_2*c[x,y,z,1] + ... + T_5*c[x,y,z,4]
M[x,y,z,:,:] = T_1[:,:]*c[x,y,z,0] + T_2[:,:]*c[x,y,z,1] + ... + T_5[:,:]*c[x,y,z,4]
c[x,y,z,i] is a coefficient, right? So M is a weighted sum of the T_n arrays?
One way of expressing this is:
T = np.stack([T_1, T_2, T_3, T_4, T_5], axis=0) # 3d (nab)
M = np.einsum('nab,xyzn->xyzab', T, c)
We could alternatively stack T_i on a new last axis
T = np.stack([T_1, T_2, T_3, T_4, T_5], axis=2) # (abn)
M = np.einsum('abn,xyzn->xyzab', T, c)
or, using that same (a, b, n) stack of T, as broadcast multiplication plus a sum:
M = (T[None,None,None,:,:,:] * c[:,:,:,None,None,:]).sum(axis=-1) # (x,y,z,a,b,n) summed over n
I'm writing this code without testing, so there may be errors, but I think the basic outline is right.
It could also be written as a dot, if I can put the n dimension last in one argument, and 2nd to the last in the other. Or with tensordot. But there's less control over broadcasting of the other dimensions.
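A sketch of the dot and tensordot forms, checked against the einsum version on random toy arrays (the sizes are arbitrary assumptions). For np.dot, n has to be last in c and second-to-last in the other argument, so T is rearranged to (a, n, b):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((5, 3, 3))       # stacked basis matrices, shape (n, a, b)
c = rng.random((4, 6, 7, 5))    # coefficient field, shape (x, y, z, n)

M_einsum = np.einsum('nab,xyzn->xyzab', T, c)
# tensordot: contract axis 3 of c with axis 0 of T -> (x, y, z, a, b)
M_tensordot = np.tensordot(c, T, axes=([3], [0]))
# dot: sums over the last axis of c and the second-to-last axis of T.transpose(1, 0, 2)
M_dot = np.dot(c, T.transpose(1, 0, 2))
print(np.allclose(M_einsum, M_tensordot), np.allclose(M_einsum, M_dot))
```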
For test calculations you could also reshape these arrays so that x, y, z are rolled into one axis and a, b into another, e.g.:
M[xyz,:] = T_n[ab]*c[xyz,n] # etc
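Written out with the (n, a, b) stack of T and random toy arrays (sizes are assumptions), the flattened version is an ordinary 2D matrix product followed by a reshape back to the field layout:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.random((5, 3, 3))       # (n, a, b)
c = rng.random((4, 6, 7, 5))    # (x, y, z, n)

nx, ny, nz, n = c.shape
# Roll x, y, z into one axis and a, b into another: (xyz, n) @ (n, ab) -> (xyz, ab)
M_flat = c.reshape(-1, n) @ T.reshape(n, -1)
M = M_flat.reshape(nx, ny, nz, 3, 3)
print(np.allclose(M, np.einsum('nab,xyzn->xyzab', T, c)))
```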
I have 2 arrays (for the sake of the example, let's name them A and B), and I perform the following manipulations on them, but I get an error at the assignment of "d2" in my code.
n = len(tracks) #tracks is a list containing different-length 3d arrays
n=30; #test with a few tracks
length = len(tracks) #list containing the total number of "samples"
perm_index = np.random.permutation(length) #uniform sampling without replacement
subset_len = 5 # choose the size of subset of tracks A
subset_A = [tracks[x:x+1] for x in xrange(0, subset_len, 1)]
subset_B = [tracks[x:x+1] for x in xrange(subset_len, n, 1)]
tempA = distance_calc.dist_calcsub(len(subset_A), subset_A) # distance matrix calculation
tempA = mcp.sym_mcp(len(subset_A), tempA) # symmetrize mcp ???
tempB = distance_calc.dist_calcsubs(subset_A, subset_B) # distance matrix calculation
#symmetrize mcp ? ? its not diagonal, symmetric . . .
A = affinity.aff_conv(60, tempA) # conversion to affinity
B = affinity.aff_conv(60, tempB) # conversion to affinity
#((row,col)) = np.shape(A)
#A = normalization_affinity.norm_aff(row, col, A) # normalization of affinity matrix
# Normalize A and B for Laplacian using row sums of W, where W = [A B; B' B'*A^-1*B].
# Let d1 = [A B]*1, d2 = [B' B'*A^-1*B]*1, dhat = sqrt(1./[d1; d2]).
d1 = np.sum( np.vstack((A, np.transpose(B))) )
d2 = np.sum(B,0) + np.dot(np.sum(np.transpose(B),0), np.dot(np.linalg.pinv(A), B ))
dhat = np.transpose(np.sqrt( 1/ np.hstack((d1, d2)) ))
A = A* np.dot( dhat[0:subset_len], np.transpose(dhat[0:subset_len]) )
B = B* np.dot( dhat[0:subset_len], np.transpose(dhat[subset_len:n]) )
The error is "ValueError: matrices are not aligned", because the arguments to np.dot are 1D vectors of different sizes. I know why this is happening, but I am following the equations of the Nystrom method exactly.
P.S.: I am following the method described on pp. 90-92 of this thesis: thesis link
Looking at the paper, you've got two problems here.
Let's start with the information you left out of your question. You're trying to do this operation:
bc + B.T * A^−1 * br
where ar and br are column vectors containing the row sums of A and B, and bc contains the column sums of B.
In particular, you're mapping that A^-1 * br to np.dot( np.linalg.pinv(A), np.sum(B, 0)).
The first problem is that np.linalg.pinv is the pseudo-inverse, A+, not the multiplicative inverse, A^-1. Using a completely different operation just because it doesn't give you an error doesn't solve the problem.
So, how do you calculate the multiplicative inverse? Well, you can't. In general, the multiplicative inverse doesn't exist for non-square matrices, so given a 5x10 A, you're stuck right at the beginning.
Anyway, the second problem comes from the fact that your br isn't a column vector. If you want to think in matrix terms, as the paper does, it's a row vector, 1x10 instead of 10x1. If you want to think in NumPy ndarray terms, it's a 1D (10,) array instead of a 2D (10, 1) array. If you think of the operation in matrix multiplication terms, you can't multiply a 10x5 matrix by a 1x10 matrix; if you think of it in NumPy terms as the multidimensional dot product, you can't multiply a (10, 5) array by a (10,) array.
It's true that you can extend the dot product specifically to the domain of MxN matrices vs. M-vectors, and under that definition your multiplication would make sense. But that's not the definition used by either the paper's standard matrix multiplication notation or NumPy's dot function. So, what can you do? Well, in that extended sense it doesn't matter which side the vector sits on, since it is contracted against the length-10 axis either way, so swapping the order of operands is legal, and if you do that, it happens to correspond to the general dot product. So, you could write this as np.dot(np.sum(B, 0), np.linalg.pinv(A)) and get the result you want. There are a number of other ways to rearrange the arrays that are no-ops in your matrix-vs.-vector domain but meaningful for np.dot, and they all give the same result. For example, np.dot(np.linalg.pinv(A).T, np.sum(B, 0)) will also work.
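A quick check of those shapes, assuming a 5x10 A (as in the answer) and a B with 10 columns; the 4x10 B and the random values are toy assumptions. The original argument order fails, while both rearrangements give the same (5,) result:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 10))     # non-square A, as described above
B = rng.random((4, 10))     # assumed: any B whose axis-0 sum has length 10
b = np.sum(B, 0)            # shape (10,)
P = np.linalg.pinv(A)       # pseudo-inverse, shape (10, 5)

# np.dot(P, b) raises ValueError: (10, 5) and (10,) are not aligned
r1 = np.dot(b, P)           # (10,) . (10, 5) -> (5,)
r2 = np.dot(P.T, b)         # (5, 10) . (10,) -> (5,)
print(r1.shape, np.allclose(r1, r2))
```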
I'm also not sure why you're using dot product in the first place. I don't see anything in the notation to imply that
But all of this is a sideshow; if you inverted A properly, you would have something with the same dimensions as A, and multiplying a 5x10 matrix by a 10x1 vector, or a (5, 10) array by a (10,) array, is already perfectly well defined. The only problem is that, again, you can't generally invert non-square matrices, so there's no way you can actually get to this place.
So, the real solution is to go back to wherever you decided on those shapes for A and B and try again.
In particular, it's pretty clear from the illustration in the paper showing the derivation of A and B from the larger matrix that the height of A is the height of B, and the width of A is the width of B.T, which is of course the height of B again.
Also, if the larger matrix is supposed to be symmetric, and A is the upper left corner of a symmetric matrix, A has to be symmetric.
I also think you've mixed up row-column order and x-y order a few times, and bc is supposed to be the column sums of B, not the column sums of B.T (which would just be the row sums of B, flipped into a row vector instead of a column vector).
While we're at it, let's use methods and operators where possible instead of writing everything in the longest possible way.
So, I think what you wanted is something like this:
import numpy as np
A = np.random.random_sample((4, 4)) # square
A = (A + A.T) / 2 # and symmetric
B = np.random.random_sample((4, 10))
ar = A.sum(1)
br = B.sum(1)
bc = B.sum(0) # not B.T.sum(0), that's just br again!
d1 = ar + br
d2 = bc + np.dot(B.T, np.dot(np.linalg.inv(A), br))
Without actually reading the paper I can't be sure this is what you actually want, but this looks like it fits with a quick skim of those two pages, and it runs without any errors, so hopefully you can at least look at the results and see if they are what you want.
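Continuing to the normalization step from the question's comments (dhat = sqrt(1./[d1; d2]), then scaling the A and B blocks), one possible sketch uses np.outer where the question used np.dot on 1D slices, since np.dot of two 1D arrays is an inner product, not an outer one. The toy data (a symmetric A with a heavy diagonal so that d1 and d2 come out positive) and the use of np.linalg.solve in place of forming inv(A) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumed): symmetric A with a heavy diagonal, so d1 and d2 are positive
A = rng.random((4, 4))
A = (A + A.T) / 2 + 4 * np.eye(4)
B = rng.random((4, 10))

ar, br, bc = A.sum(1), B.sum(1), B.sum(0)
d1 = ar + br
d2 = bc + B.T @ np.linalg.solve(A, br)    # same value as np.dot(B.T, np.dot(inv(A), br))

# dhat = sqrt(1 ./ [d1; d2]) as a flat array of length 4 + 10
dhat = np.sqrt(1.0 / np.concatenate([d1, d2]))
m = A.shape[0]
# Outer products build the (m, m) and (m, 10) scaling matrices
A_norm = A * np.outer(dhat[:m], dhat[:m])   # (4, 4)
B_norm = B * np.outer(dhat[:m], dhat[m:])   # (4, 10)
```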
You are summing over the first dimension of B, so the result has shape (10,), the size of the second dimension of B.
You can calculate
np.dot( np.sum(B, 0), np.linalg.pinv(A))
but this gives you a vector with 5 elements, while B_T has a size of only 4. So something doesn't fit in your sample data.