I'm doing the online Computer Vision course by UMich and am new to PyTorch. One of the assignment questions is on batch matrix multiplication, where we have to find the batch matrix product with and without the bmm function. Here is the code.
def batched_matrix_multiply(x, y, use_loop=True):
    """
    Perform batched matrix multiplication between the tensor x of shape (B, N, M)
    and the tensor y of shape (B, M, P).

    If use_loop=True, then you should use an explicit loop over the batch
    dimension B. If loop=False, then you should instead compute the batched
    matrix multiply without an explicit loop using a single PyTorch operator.

    Inputs:
    - x: Tensor of shape (B, N, M)
    - y: Tensor of shape (B, M, P)
    - use_loop: Whether to use an explicit Python loop.

    Hint: torch.stack, bmm

    Returns:
    - z: Tensor of shape (B, N, P) where z[i] of shape (N, P) is the result of
      matrix multiplication between x[i] of shape (N, M) and y[i] of shape
      (M, P). It should have the same dtype as x.
    """
    z = None
    #############################################################################
    # TODO: Implement this function                                             #
    #############################################################################
    # Replace "pass" statement with your code
    z = torch.zeros(x.shape[0], x.shape[1], y.shape[2])
    if use_loop == True:
        for i in range(x.shape[0]):
            z[i] = torch.mm(x[i], y[i])
    else:
        z = torch.bmm(x, y)
    #############################################################################
    #                             END OF YOUR CODE                              #
    #############################################################################
    return z
I've managed to do it without bmm, but without using the torch.stack hint. I initialized a zeros tensor z with the dimensions of the output and performed an ordinary matrix multiplication for each batch element inside the for loop.
I'd like to know what the more efficient answer using torch.stack is.
Great question. I just spent two hours solving this myself. Here's my solution, and it really speeds up the computation, as needed.
if use_loop == False:
    z = torch.bmm(x, y)
else:
    z = torch.zeros(x.shape[0], x.shape[1], y.shape[2])
    # process two batch elements per iteration (assumes the batch size B is even)
    for i in range(0, x.shape[0], 2):
        z[i:i+2] = torch.stack([x[i] @ y[i], x[i+1] @ y[i+1]])
Hope this helped!
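For reference, one common way to read the torch.stack hint is to compute one torch.mm per batch element and stack the (N, P) results along a new batch dimension. A minimal sketch of that idea (not the official solution; it also preserves the dtype of x automatically):

z = torch.stack([torch.mm(x[i], y[i]) for i in range(x.shape[0])])  # shape (B, N, P)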
Given an n-by-n matrix A, where each row of A is a permutation of [n], e.g.,
import torch
n = 100
AA = torch.rand(n, n)
A = torch.argsort(AA, dim=1)
Also given another n-by-n matrix P, we want to construct a 3D tensor Q s.t.
Q[i, j, k] = P[A[i, j], k]
Is there an efficient way to do this in PyTorch?
I am aware of torch.gather, but it seems hard to apply it directly here.
You can directly use:
Q = P[A]
Why not simply use A as an index:
Q = P[A, :]
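A quick sanity check (a minimal sketch with a small n, using the definitions from the question) that this advanced indexing matches the element-wise definition:

import torch

n = 5
A = torch.argsort(torch.rand(n, n), dim=1)  # each row is a permutation of 0..n-1
P = torch.rand(n, n)

Q = P[A]  # shape (n, n, n); equivalent to P[A, :]

i, j, k = 1, 2, 3
print(Q[i, j, k] == P[A[i, j], k])  # tensor(True)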
I'm currently trying to fill a matrix K where each entry in the matrix is just a function applied to two entries of an array x.
At the moment I'm using the most obvious method of running through rows and columns one at a time using a double for-loop:
K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    for j in range(x.shape[0]):
        K[i, j] = f(x[i], x[j])
While this works fine, the resulting matrix is 10,000 by 10,000 and takes a very long time to compute. I was wondering if there is a more efficient way to do this built into NumPy?
EDIT: The function in question here is a gaussian kernel:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.dot(vec, vec) / (2 * sigma**2))
where I set sigma in advance before calculating the matrix.
The array x is an array of shape (10000, 8). So the scalar product in the gaussian is between two vectors of dimension 8.
You can use a single for loop together with broadcasting. This requires changing the implementation of the gaussian function to accept 2D inputs:
def gaussian(a, b, sigma):
    vec = a - b
    return np.exp(-np.sum(vec**2, axis=-1) / (2 * sigma**2))

K = np.zeros((x.shape[0], x.shape[0]), dtype=np.float32)
for i in range(x.shape[0]):
    K[i] = gaussian(x[i:i+1], x, sigma)
Theoretically you could accomplish this even without any for loop, again by using broadcasting, but this creates an intermediate array with len(x)**2 * x.shape[1] elements, which might run out of memory for your array sizes:
K = gaussian(x[None, :, :], x[:, None, :], sigma)
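If memory is the concern, another common trick (a sketch, not part of the original answer; it assumes x and sigma as defined in the question) is to expand the squared distance as ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a·b, which only needs an (N, N) intermediate:

import numpy as np

sq_norms = np.sum(x**2, axis=1)                                  # shape (N,)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * x @ x.T   # shape (N, N)
K = np.exp(-np.maximum(sq_dists, 0) / (2 * sigma**2))            # clamp tiny negative round-off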
I am trying to generate a matrix (a tensor object in PyTorch) that is similar to a Gram matrix, except I need to apply a kernel function instead of the inner product on my input matrix.
A for-loop approach like the one below works:
N = x.shape[0]  # x.shape = (N, d)
G = torch.zeros((N, N))
for i in range(N):
    for j in range(N):
        G[i][j] = K(x[i], x[j])
where x is my input tensor whose shape is (N,d) and the kernel function K(a,b) yields a real value after performing some math. For example:
def K(a, b):
    return ((1 + (a * b)).sum()).pow(2)  # second degree polynomial
I want to generate this matrix G without having to change the kernel function K() and, of course, without for-loops!
My initial attempt was a lambda approach, but the code below obviously doesn't work, as it only yields a list of K(x[i], x[i]).
G = torch.tensor(list(map(lambda a, b: K(a, b), x, x)))
How can I use the lambda function to yield N-by-N matrix?
What would be some other ways to tackle this problem?
Any insight would be appreciated.
You can calculate G from x simply with:
G = (1 + torch.matmul(x, x.T)).pow(2)
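Note that, with the parenthesization exactly as written in K, (1 + (a*b)).sum() equals d + a·b rather than 1 + a·b, so the one-liner above corresponds to the standard second-degree polynomial kernel (1 + a·b)^2. A broadcasting pattern that reproduces the double loop for K exactly as written (a sketch; it materializes an (N, N, d) intermediate tensor) would be:

G = (1 + x[:, None, :] * x[None, :, :]).sum(dim=-1).pow(2)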
I have n vectors which need to be influenced by each other and output n vectors with the same dimensionality d. I believe this is what torch.nn.MultiheadAttention does. But the forward function expects query, key and value as inputs. According to this blog, I need to initialize a random weight matrix of shape (d x d) for each of q, k and v, multiply each of my vectors by these weight matrices, and get three (n x d) matrices. Now, are the q, k and v expected by torch.nn.MultiheadAttention just these three matrices, or do I have it mistaken?
When you want to use self attention, just pass your input vector into torch.nn.MultiheadAttention for the query, key and value.
attention = torch.nn.MultiheadAttention(<input-size>, <num-heads>)
x, _ = attention(x, x, x)
The PyTorch module returns the output states (same shape as the input) and the attention weights used in the process.
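For concreteness, a minimal runnable sketch (the sizes n = 10, d = 64 and num_heads = 8 are just assumed example values; without batch_first=True the module expects inputs of shape (seq_len, batch, embed_dim)):

import torch

n, d = 10, 64                      # assumed: 10 vectors of dimension 64
x = torch.randn(n, 1, d)           # (seq_len, batch, embed_dim) layout by default

attention = torch.nn.MultiheadAttention(embed_dim=d, num_heads=8)
out, weights = attention(x, x, x)  # self-attention: same tensor as query, key and value

print(out.shape)      # torch.Size([10, 1, 64])
print(weights.shape)  # torch.Size([1, 10, 10]), averaged over heads by default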
I'm watching the YouTube videos of Stanford's cs231n and trying to do the assignments as exercises. While doing the SVM one I ran into the following piece of code:
def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # This line
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
Here's the line I'm having trouble with:
scores = X[i].dot(W)
This is computing the product xW; shouldn't it be Wx, by which I mean W.dot(X[i])?
Because W has shape (D, C) and each sample X[i] has shape (D,), W.dot(X[i]) is not a valid product: the inner dimensions of (D, C)·(D,) don't align. You would have to transpose W first and compute W.T.dot(X[i]), which is (C, D)·(D,).
Since X[i].dot(W) == W.T.dot(X[i]), the implementation simply reverses the order of the dot product instead of transposing the weight matrix. Effectively, this just comes down to a decision about how the inputs are arranged: here the (somewhat arbitrary) convention is that samples are rows of X and classes are columns of W, which makes x·W the natural way to compute the scores.
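A tiny sketch to illustrate the equivalence (D = 4 and C = 3 are just assumed example sizes):

import numpy as np

D, C = 4, 3
W = np.random.randn(D, C)
x_i = np.random.randn(D)          # one row of X, shape (D,)

scores_row = x_i.dot(W)           # (D,) . (D, C) -> (C,)
scores_col = W.T.dot(x_i)         # (C, D) . (D,) -> (C,)
print(np.allclose(scores_row, scores_col))  # True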