My pytorch code is running too slow due to it not being vectorized and I am unsure how to go about vectorizing it as I am relatively new to PyTorch. Can someone help me do this or point me in the right direction?
level_stride = 8
loc = torch.zeros(H * W, 2)
for i in range(H):
for j in range(W):
loc[i * H + j][0] = level_stride * (j + 0.5)
loc[i * H + j][1] = level_stride * (i + 0.5)
First of all, you defined the tensor to be of size (H*W, 2). This is of course entirely optional, but it might be more expressive to preserve the dimensionality explicitly, by having H and W be separate dimension in the tensor. That makes some operations later on easier as well.
The values you compute to fill the tensor with originate from ranges. We can use the torch.arange function to get the same ranges, already in form of a tensor, ready to be put into your loc tensor. If that is done, you could see it as, completely leaving out the for loops, and just treating j and i as different tensors.
If you're not familiar with tensors, this might seem confusing, but operations between single number values and tensors work just as well, so not much of the rest of the code has to be changed.
I'll give you a expression of how your code could look like with these changes applied:
level_stride = 8
loc = torch.zeros(H, W, 2)
j = torch.arange(W).expand((H, W))
loc[:, :, 0] = level_stride * (j + 0.5)
i = torch.arange(H).expand((W, H)).T
loc[:,:,1] = level_stride * (i + 0.5)
The most notable changes are the assignments to j and i, and the usage of slicing to fill the data into loc.
For completeness, let's go over the expressions that are assigned to i and j.
j starts as a torch.arange(W) which is just like a regular range, just in form of a tensor. Then .expand is applied, which you could see as the tensor being repeated. For example, if H had been 5 and W 2, then a range of 2 would have been created, and expanded to a size of (5, 2). The size of this tensor thereby matches the first two sizes in the loc tensor.
i starts just the same, only that W and H swap positions. This is due to i originating from a range based on H rather then W. Notable here is that .T is applied at the end of that expression. The reason for this is that the i tensor still has to match the first two dimensions of loc. .T transposes the tensor for this reason.
If you have a subject specific reason to have the loc tensor in the (H*W, 2) shape, but are otherwise happy with this solution, you could reshape the tensor in the end with loc.reshape(H*W,2).
I am implementing a Hidden Markov Model and thus am dealing with very small probabilities. I am handling the underflow by representing variables in log space (so x → log(x)) which has the side effect that multiplication is now replaced by addition and addition is handled via numpy.logaddexp or similar.
Is there an easy way to handle matrix multiplication in log space?
This is the best way I could come up with to do it.
from scipy.special import logsumexp
def log_space_product(A,B):
Astack = np.stack([A]*A.shape[0]).transpose(2,1,0)
Bstack = np.stack([B]*B.shape[1]).transpose(1,0,2)
return logsumexp(Astack+Bstack, axis=0)
The inputs A and B are the logs of the matrices A0 and B0 you want to multiply, and the functions returns the log of A0B0. The idea is that the i,j spot in log(A0B0) is the log of the dot product of the ith row of A0 and the jth column of B0. So it is the logsumexp of the ith row of A plus the jth column of B.
In the code, Astack is built so the i,j spot is a vector containing the ith row of A, and Bstack is built so the i,j spot is a vector containing the jth column of B. Thus Astack + Bstack is a 3D tensor whose i,j spot is the ith row of A plus the jth column of B. Taking logsumexp with axis = 0 then gives the desired result.
Erik's response doesn't seem to work for some non-square matrices (e.g. n*m times m*r). Here is a version that takes that into account:
def log_space_product(A,B):
Astack = np.stack([A]*B.shape[1]).transpose(1,0,2)
Bstack = np.stack([B]*A.shape[0]).transpose(0,2,1)
return logsumexp(Astack+Bstack, axis=2)
where the i, j spot of A contains the i-th row of A and i, j spot of B contains the i-th column of B.
This happens because [A] * B.shape[1] is of shape (r, n, m) which is transposed into (n, r, m), and [B] * A.shape[0] is of shape (n, m, r) which is transposed into (n, r, m). We want their first two dimensions to be (n, r) because the result matrix needs to be of shape (n, r).
Took a while to figure out myself. Hope this helps anyone implementing a HMM!
I have 2 arrays (for the sake of the example, let's name them A and B) and i perform the following manipulations at them, but i get an error at the assignment of "d2" in my code.
n = len(tracks) #tracks is a list containing different-length 3d arrays
n=30; #test with a few tracks
length = len(tracks) #list containing the total number of "samples"
perm_index = np.random.permutation(length) #uniform sampling without replacement
subset_len = 5 # choose the size of subset of tracks A
subset_A = [tracks[x:x+1] for x in xrange(0, subset_len, 1)]
subset_B = [tracks[x:x+1] for x in xrange(subset_len, n, 1)]
tempA = distance_calc.dist_calcsub(len(subset_A), subset_A) # distance matrix calculation
tempA = mcp.sym_mcp(len(subset_A), tempA) # symmetrize mcp ???
tempB = distance_calc.dist_calcsubs(subset_A, subset_B) # distance matrix calculation
#symmetrize mcp ? ? its not diagonal, symmetric . . .
A = affinity.aff_conv(60, tempA) # conversion to affinity
B = affinity.aff_conv(60, tempB) # conversion to affinity
#((row,col)) = np.shape(A)
#A = normalization_affinity.norm_aff(row, col, A) # normalization of affinity matrix
# Normalize A and B for Laplacian using row sums of W, where W = [A B; B' B'*A^-1*B].
# Let d1 = [A B]*1, d2 = [B' B'*A^-1*B]*1, dhat = sqrt(1./[d1; d2]).
d1 = np.sum( np.vstack((A, np.transpose(B))) )
d2 = np.sum(B,0) + np.dot(np.sum(np.transpose(B),0), np.dot(np.linalg.pinv(A), B ))
dhat = np.transpose(np.sqrt( 1/ np.hstack((d1, d2)) ))
A = A* np.dot( dhat[0:subset_len], np.transpose(dhat[0:subset_len]) )
B = B* np.dot( dhat[0:subset_len], np.transpose(dhat[subset_len:n]) )
The error again is "ValueError: matrices are not aligned." because the np.dot vectors are 1d vectors of different size; I know the reason why this is happening but I am following exactly the equations to perform the Nystrom method.
P.S: I am following the method described in p.90-92 in this thesis: thesis link
Looking at the paper, you've got two problems here.
Let's start with the information you left out of your question. You're trying to do this operation:
bc + B.T * A^−1 * br
where ar and br are column vectors containing the row sums of A and B and bc is
the column sum of B.
In particular, you're mapping that A^-1 * br to np.dot( np.linalg.pinv(A), np.sum(B, 0)).
The first problem is that np.linalg.pinv is the pseudo-inverse, A+, not the multiplicative inverse, A^-1. Using a completely different operation just because it doesn't give you an error doesn't solve the problem.
So, how do you calculate the multiplicative inverse? Well, you can't. In general, the multiplicative inverse doesn't exist for non-square matrices, so given a 5x10 A, you're stuck right at the beginning.
Anyway, the second problem comes from the fact that your br isn't a column vector. If you want to think in matrix terms, as the paper does, it's a row vector, 10x1 instead of 1x10. If you want to think in numpy ndarray terms, it's a 1D (10,) array instead of a 2D (1, 10) array. If you think of the operation in matrix multiplication terms, you can't multiply a 10x5 matrix with a 10x1 matrix; if you think of it in NumPy terms as the multidimensional dot product, you can't multiply a (10, 5) array with a (10,) array.
It's true that you can extend the dot product to specifically the domain of MxN matrices vs. M vectors, and under that definition your multiplication would make sense. But that's not the definition used by either the paper's standard matrix multiplication notation or NumPy's dot function. So, what can you do? Well, note that the operation you're trying to do is commutative, so swapping the order of operands is perfectly legal—and if you do that, then it does happen to correspond to the general dot product. So, you could write this as np.dot(np.sum(B, 0), np.linalg.pinv(A)) and get the result you want. And there are a number of other ways you could transform the arrays that are idempotent in your matrix-vs.-vector multiplication domain but meaningful for np.dot, and they will all get you the same result. For example, np.dot(np.linalg.pinv(A).T, np.sum(B, 0)) will also work.
I'm also not sure why you're using dot product in the first place. I don't see anything in the notation to imply that
But all of this is a sideshow; if you inverted A properly, you would have something with the same dimensions as A, and multiply a 5x10 matrix with a 10x1 vector, or a (5, 10) array with a (10,) array, is already perfectly well defined. The only problem is that, again, you can't generally invert non-square matrices, so there's no way you can actually get to this place.
So, the real solution is to go back to wherever you decided on those shapes for A and B and try again.
In particular, it's pretty clear from the illustration in the paper showing the derivation of A and B from the larger matrix that the height of A is the height of B, and the width of A is the width of B.T, which is of course the height of B again.
Also, if the larger matrix is supposed to symmetric, and A is the upper left corner of a symmetric matrix, A has to be symmetric.
I also think you've mixed up row-column order and x-y order a few times, and bc is supposed to the column sums of B, not the column sums of B.T (which would just be the row sums of B, flipped into a row vector instead of a column vector).
While we're at it, let's use methods and operators where possible instead of writing everything in the longest possible way.
So, I think what you wanted is something like this:
A = np.random.random_sample((4, 4)) # square
A = (A + A.T) / 2 # and symmetric
B = np.random.random_sample((4, 10))
ar = A.sum(1)
br = B.sum(1)
bc = B.sum(0) # not B.T.sum(0), that's just br again!
d1 = ar + br
d2 = bc + np.dot(B.T, np.dot(np.linalg.inv(A), br))
Without actually reading the paper I can't be sure this is what you actually want, but this looks like it fits with a quick skim of those two pages, and it runs without any errors, so hopefully you can at least look at the results and see if they are what you want.
You are summing over the first dimension of B, so the shape is 10, the size of the second dimension of B.
You can calculate
np.dot( np.sum(B, 0), np.linalg.pinv(A))
but this gives you a vector with 5 elements, but B_T has only a size of 4. So something doesn't fit in your sample data.
I would like to count how many m by n matrices whose elements are 1 or -1 have the property that all its floor(m/2)+1 by n submatrices have full rank. My current method is naive and slow and is in the following python/numpy code. It simply iterates over all matrices and tests all the submatrices.
import numpy as np
import itertools
from scipy.misc import comb
m = 8
n = 4
rowstochoose = int(np.floor(m/2)+1)
maxnumber = comb(m, rowstochoose, exact = True)
matrix_g=(np.array(x).reshape(m,n) for x in itertools.product([-1,1], repeat = m*n))
nofound = 0
for A in matrix_g:
count = 0
for rows in itertools.combinations(range(m), int(rowstochoose)):
if (np.linalg.matrix_rank(A[list(rows)]) == int(min(n,rowstochoose))):
count+=1
else:
break
if (count == maxnumber):
nofound+=1
print nofound, 2**(m*n)
Is there a better/faster way to do this? I would like to do this calculation for n and m up to 20 but any significant improvements would be great.
Context. I am interested in getting some exact solutions for https://math.stackexchange.com/questions/640780/probability-that-every-vector-is-not-orthogonal-to-half-of-the-others .
As a data point to compare implementations. n,m = 4,4 should output 26880 . n,m=5,5 is too slow for me to run. For n = 2 and m = 2,3,4,5,6 the outputs should be 8, 0, 96, 0, 1280.
Current status Feb 2, 2014:
The answer of leewangzhong is fast but is not correct for m > n . leewangzhong is considering how to fix it.
The answer of Hooked does not run for m > n .
(Now a partial solution for n = m//2+1, and the requested code.)
Let k := m//2+1
This is somewhat equivalent to asking, "How many collections of m n-dimensional vectors of {-1,1} have no linearly dependent sets of size min(k,n)?"
For those matrices, we know or can assume:
The first entry of every vector is 1 (if not, multiply the whole by -1). This reduces the count by a factor of 2**m.
All vectors in the list are distinct (if not, any submatrix with two identical vectors has non-full rank). This eliminates a lot. There are choose(2**m,n) matrices of distinct vectors.
The list of vectors are sorted lexicographically (rank isn't affected by permutations). So we're really thinking about sets of vectors instead of lists. This reduces the count by a factor of m! (because we require distinctness).
With this, we have a solution for n=4, m=8. There are only eight different vectors with the property that the first entry is positive. There is only one combination (sorted list) of 8 distinct vectors from 8 distinct vectors.
array([[ 1, 1, 1, 1],
[ 1, 1, 1, -1],
[ 1, 1, -1, 1],
[ 1, 1, -1, -1],
[ 1, -1, 1, 1],
[ 1, -1, 1, -1],
[ 1, -1, -1, 1],
[ 1, -1, -1, -1]], dtype=int8)
100 size-4 combinations from this list have rank 3. So there are 0 matrices with the property.
For a more general solution:
Note that there are 2**(n-1) vectors with first coordinate -1, and choose(2**(n-1),m) matrices to inspect. For n=8 and m=8, there are 128 vectors, and 1.4297027e+12 matrices. It might help to answer, "For i=1,...,k, how many combinations have rank i?"
Alternatively, "What kind of matrices (with the above assumptions) have less than full rank?" And I think the answer is exactly, A sufficient condition is, "Two columns are multiples of each other". I have a feeling that this is true, and I tested this for all 4x4, 5x5, and 6x6 matrices.(Must've screwed up the tests) Since the first column was chosen to be homogeneous, and since all homogeneous vectors are multiples of each other, any submatrix of size k with a homogeneous column other than the first column will have rank less than k.
This is not a necessary condition, though. The following matrix is singular (first plus fourth is equal to third plus second).
array([[ 1, 1, 1, 1, 1],
[ 1, 1, 1, 1, -1],
[ 1, 1, -1, -1, 1],
[ 1, 1, -1, -1, -1],
[ 1, -1, 1, -1, 1]], dtype=int8)
Since there are only two possible values (-1 and 1), all mxn matrices where m>2, k := m//2+1, n = k and with first column -1 have a majority member in each column (i.e. at least k members are the same). So for n=k, the answer is 0.
For n<=8, here's code to generate the vectors.
from numpy import unpackbits, arange, uint8, int8
#all distinct n-length vectors from -1,1 with first entry -1
def nvectors(n):
if n > 8:
raise ValueError #is that the right error?
return -1 + 2 * (
#explode binary numbers to arrays of 8 zeroes and ones
unpackbits(arange(2**(n-1),dtype=uint8)) #unpackbits only takes uint
.reshape((-1,8)) #unpackbits flattens, so we need to shape it to 8 bits
[:,-n:] #only take the last n bytes
.view(int8) #need signed
)
Matrix generator:
#generate all length-m matrices that are combinations of distinct n-vectors
def matrix_g(n,m):
return (array(mat) for mat in combinations(nvectors(n),m))
The following is a function to check that all submatrices of length maxrank have full rank. It stops if any have less than maxrank, instead of checking all combinations.
rankof = np.linalg.matrix_rank
#all submatrices of at least half size have maxrank
#(we only need to check the maxrank-sized matrices)
def halfrank(matrix,maxrank):
return all(rankof(submatr) == maxrank for submatr in combinations(matrix,maxrank))
Generate all matrices that have all half-matrices with full rank
def nicematrices(m,n):
maxrank = min(m//2+1,n)
return (matr for matr in matrix_g(n,m) if halfrank(matr,maxrank))
Putting it all together:
import numpy as np
from numpy import unpackbits, arange, uint8, int8, array
from itertools import combinations
#all distinct n-length vectors from -1,1 with first entry -1
def nvectors(n):
if n > 8:
raise ValueError #is that the right error?
if n==0:
return array([])
return -1 + 2 * (
#explode binary numbers to arrays of 8 zeroes and ones
unpackbits(arange(2**(n-1),dtype=uint8)) #unpackbits only takes uint
.reshape((-1,8)) #unpackbits flattens, so we need to shape it to 8 bits
[:,-n:] #only take the last n bytes
.view(int8) #need signed
)
#generate all length-m matrices that are combinations of distinct n-vectors
def matrix_g(n,m):
return (array(mat) for mat in combinations(nvectors(n),m))
rankof = np.linalg.matrix_rank
#all submatrices of at least half size have maxrank
#(we only need to check the maxrank-sized matrices)
def halfrank(matrix,maxrank):
return all(rankof(submatr) == maxrank for submatr in combinations(matrix,maxrank))
#generate all matrices that have all half-matrices with full rank
def nicematrices(m,n):
maxrank = min(m//2+1,n)
return (matr for matr in matrix_g(n,m) if halfrank(matr,maxrank))
#returns (number of nice matrices, number of all matrices)
def count_nicematrices(m,n):
from math import factorial
return (len(list(nicematrices(m,n)))*factorial(m)*2**m, 2**(m*n))
for i in range(0,6):
print (i, count_nicematrices(i,i))
count_nicematrices(5,5) takes about 15 seconds for me, the vast majority of which is taken by the matrix_rank function.
Since no one's answered yet, here's an answer without code. The useful symmetries that I see are as follows.
Multiply a row by -1.
Multiply a column by -1.
Permute the rows.
Permute the columns.
I would attack this problem by exhaustively generating the non-isomorphs, filtering them, and summing the sizes of their orbits. nauty will be quite useful for the first and third steps. Assuming that most matrices have few symmetries (undoubtedly an excellent assumption for n large, but it's not obvious a priori how large), I would expect 8x8 to be doable, 9x9 to be borderline, and 10x10 to be out of reach.
Expanded pseudocode:
Generate one representative of each orbit of the (m - 1) by (n - 1) 0-1 matrices acted upon by the group generated by row and column permutations, together with the size of the orbit (= (m - 1)! (n - 1)! / the size of the automorphism group). Perhaps the author of the paper that Tim linked would be willing to share his code; otherwise, see below.
For each matrix, replace entries x by (-1)^x. Add one row and one column of 1s. Multiply the size of its orbit by 2^(m + n - 1). This takes care of the sign change symmetries.
Filter the matrices and sum the orbit sizes of the ones that remain. You might save a little computation time here by implementing Gram--Schmidt yourself so that when you try all combinations in lexicographic order there's an opportunity to reuse partial results for the shared prefixes.
Isomorph-free enumeration:
McKay's template can be used to generate the representatives for (m + 1) by n 0-1 matrices from the representatives for m by n 0-1 matrices, in a manner amenable to depth-first search. With each m by n 0-1 matrix, associate a bipartite graph with m black vertices, n white vertices, and the appropriate edge for each 1 entry. Do the following for each m by n representative.
For each length-n vector, construct the graph for the (m + 1) by n matrix consisting of the representative together with the new vector and run nauty to get a canonical labeling and the vertex orbits.
Filter out the possibilities where the vertex corresponding to the new vector is in a different orbit from the black vertex with the lowest number.
Filter out the possibilities with duplicate canonical labelings.
nauty also computes the orders of automorphism groups.
You will need to rethink this problem from a mathematical point of view. That said even with brute force, there are some programming tricks you can use to speed up the process (as SO is a programming site). Little tricks like not recalculating int(min(n,rowstochoose)) and itertools.combinations(range(m), int(rowstochoose)) can save a few percent - but the real gain comes from memoization. Others have mentioned it, but I thought it might be useful to have a complete, working, code example:
import numpy as np
from scipy.misc import comb
import itertools, hashlib
m,n = 4,4
rowstochoose = int(np.floor(m/2)+1)
maxnumber = comb(m, rowstochoose, exact = True)
combo_itr = (x for x in itertools.product([-1,1], repeat = m*n))
matrix_itr = (np.array(x,dtype=np.int8).reshape((n,m)) for x in combo_itr)
sub_shapes = map(list,(itertools.combinations(range(m), int(rowstochoose))))
required_rank = int(min(n,rowstochoose))
memo = {}
no_found = 0
for A in matrix_itr:
check = True
for s in sub_shapes:
view = A[s].view(np.int8)
h = hashlib.sha1(view).hexdigest()
if h not in memo:
memo[h] = np.linalg.matrix_rank(view)
if memo[h] != required_rank:
check = False
break
if check: no_found+=1
print no_found, 2**(m*n)
This gives a speed gain of almost 10x for the 4x4 case - you'll see substantial improvements for larger matrices if you care to wait long enough. It's possible for the larger matrices, where the cost of the rank is proportionally more expensive, that you can order the matrices ahead of time on the hashing:
idx = np.lexsort(view.T)
h = hashlib.sha1(view[idx]).hexdigest()
For the 4x4 case this makes it slightly worse, but I expect that to reverse for the 5x5 case.
Algorithm 1 - memorizing small ones
I would use memorizing of the already checked smaller matrices.
You could simply write down in binary format (0 for -1, 1 for 1) all smaller matrices. BTW, you cane directly check for ranges matrices of (0 and 1) instead of (-1 and 1) - it is the same. Let us call these coding IMAGES. Using long types you can have matrices of up to 64 cells, so, up to 8x8. It is fast. Using String you can them have as large as you need.
Really, 8x8 is more than enough - in the 8GB memory we can place 1G longs. it is about 2^30, so, you can remember matrices of about up to 25-28 elements.
For every size you'll have a set of images:
for 2x2: 1001, 0110, 1000, 0100, 0010, 0001, 0111, 1011, 1101, 1110.
So, you'll have archive=array of NxN, each element of which will be an ordered list of binary images of good matrices.
- (for matrix size MxN, where M>=N, the appropriate place in archive will have coordinates M,N. If M
When you are checking a new large matrix, divide it into small ones
For every small matrix T
If the appropriate place in the archive for size of T has no list, create it and fill by images of all full-rank matrices of size of T and order images. If you are out of memory, stop the process of archive filling.
If T could be in archive, according to size:
Make image of T
Look for image(t) in the list - if it is in it, it is OK, if no, the large matrix should be thrown off.
If T is too big for the archive, check it as you do it.
Algorithm 2 - increasing sizes
The other possibility is to create larger matrices by adding pieces to the lesser ones, already found.
You should decide, up to what size the matrices will grow.
When you find a "correct" matrix of size MxN, try to add a row to top it. New matrices should be checked only for submatrices that include the new row. The same with the new column.
You should set exact algorithm, which sizes are derived from which ones. Thus you can minimize the number of remembered matrices. I thought about that sequence:
Start from 2x2 matrices.
continue with 3x2
4x2, 3x3
5x2, 4x3
6x2, 5x3, 4x4
7x2, 6x3, 5x4
...
So you can remember only (M+N)/2-1 matrices for searching among sizes MxN.
If each time when we can create new size from two old ones, we derive from more square ones, we could also greatly spare place for matrices remembering: For "long" matrices as 7x2 we do need remember and check only the last line 1x2. For matrices 6x3 we should remember their stub of 2x3, and so on.
Also, you don't need to remember the largest matrices - you won't use them for further counting.
Again use "images" for remembering the matrix.