Given two 2-D pytorch tensors:
A = torch.FloatTensor([[1,2],[3,4]])
B = torch.FloatTensor([[0,0],[1,1],[2,2]])
Is there an efficient way to calculate a tensor of shape (6, 2, 2) where each 2x2 entry is the outer product of a column of A with a row of B?
For example, with A and B above, the 3D tensor should have the following matrices:
[[[0, 0],
[0, 0]],
[[1, 1],
[3, 3]],
[[2, 2],
[6, 6]],
[[0, 0],
[0, 0]],
[[2, 2],
[4, 4]],
[[4, 4],
[8, 8]]]
I know how to do it with a for-loop, but I am wondering whether there is an efficient way to avoid one.
Pytorch tensors implement numpy style broadcast semantics which will work for this problem.
It's not clear from the question whether you want to perform matrix multiplication or element-wise multiplication. In the length-2 case that you showed, the two are equivalent, but this is certainly not true for higher dimensionality! Thankfully the code is almost the same, so I'll just give both options.
A = torch.FloatTensor([[1, 2], [3, 4]])
B = torch.FloatTensor([[0, 0], [1, 1], [2, 2]])
# matrix multiplication
C_mm = (A.T[:, None, :, None] @ B[None, :, None, :]).flatten(0, 1)
# element-wise multiplication
C_ew = (A.T[:, None, :, None] * B[None, :, None, :]).flatten(0, 1)
Code description. A.T transposes A and the indexing with None inserts singleton dimensions, so A.T[:, None, :, None] has shape (2, 1, 2, 1) and B[None, :, None, :] has shape (1, 3, 1, 2). Since @ (matrix multiplication) operates on the last two dimensions of its operands and broadcasts over the remaining dimensions, the result is the matrix product of each column of A with each row of B. In the element-wise case the broadcasting is performed on every dimension. Either way the result is a (2, 3, 2, 2) tensor; to turn it into a (6, 2, 2) tensor we just flatten the first two dimensions using Tensor.flatten.
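As a quick sanity check against the loop version the question mentions (this check is my own addition, not part of the original answer), both one-liners can be compared to a naive loop built with torch.outer:

import torch

A = torch.FloatTensor([[1, 2], [3, 4]])
B = torch.FloatTensor([[0, 0], [1, 1], [2, 2]])

# reference: outer product of each column of A with each row of B
C_loop = torch.stack([torch.outer(A[:, i], B[j])
                      for i in range(A.shape[1])
                      for j in range(B.shape[0])])

C_mm = (A.T[:, None, :, None] @ B[None, :, None, :]).flatten(0, 1)
assert torch.equal(C_mm, C_loop)  # both have shape (6, 2, 2)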
Using PyTorch, torch.combinations will only take a 1D tensor as input, but I would like to apply it to each 1D tensor in a multidimensional tensor.
inp = torch.tensor([[1, 2, 3],
[2, 3, 4]])
torch.combinations((inp), r=2)
The result is an error saying I can't apply it to that shape, but I want to apply it to [1, 2, 3] and [2, 3, 4] individually. I can't do them one by one because the idea is to apply this to large sets of data.
inp = torch.tensor([[1,2,3],[2,3,4]])
inp_tuple = torch.unbind(inp)
print(inp_tuple)
(tensor([1, 2, 3]), tensor([2, 3, 4]))
torch.combinations((inp_tuple), r=2)
I also tried unbinding the tensor and applying torch.combinations to the tuple of tensors, but it gives an error saying it can't be applied to a tuple.
Is there any way to get torch.combinations to apply automatically to each individual 1D tensor in a multidimensional tensor, or to each tensor in a tuple of tensors? If not, are there any alternatives to get all combinations of each individual part of a multidimensional tensor?
Function torch.combinations returns all possible combinations of size r of the elements contained in the 1D input vector. The reason multi-dimensional inputs are not supported is probably that you have no guarantee that the different vectors in your input contain the exact same number of unique elements. If one of the vectors had a duplicate element, you would end up with one set of combinations bigger than another, which is simply not possible to represent with a homogeneous PyTorch tensor.
So from here on, I will assume that the input tensor inp is a 2D tensor shaped (N, C) where each of its N vectors contains C unique elements. The example you gave fits this requirement, since both vectors have three unique elements each: {1, 2, 3} and {2, 3, 4}.
>>> inp = torch.tensor([[1,2,3],[2,3,4]])
The idea is to apply torch.combinations to an index tensor (built with torch.arange) whose length equals that of our vectors. We can then use those combinations as indices to gather values from the different vectors of our input tensor.
We can retrieve all index combinations with the following:
>>> c = torch.combinations(torch.arange(inp.size(1)), r=2)
tensor([[0, 1],
[0, 2],
[1, 2]])
Then we need to reshape and expand both inp and c such that they match in number of dimensions:
>>> x = inp[:,None].expand(-1,len(c),-1)
tensor([[[1, 2, 3],
[1, 2, 3],
[1, 2, 3]],
[[2, 3, 4],
[2, 3, 4],
[2, 3, 4]]])
>>> idx = c[None].expand(len(x), -1, -1)
tensor([[[0, 1],
[0, 2],
[1, 2]],
[[0, 1],
[0, 2],
[1, 2]]])
Finally we can apply torch.gather on x and idx on dim=2. This will return a 3D tensor out such that:
out[i][j][k] = x[i][j][index[i][j][k]]
Let's make our call on torch.gather:
>>> x.gather(dim=2, index=idx)
tensor([[[1, 2],
[1, 3],
[2, 3]],
[[2, 3],
[2, 4],
[3, 4]]])
Which is the desired result.
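If you need this for several inputs, the steps above can be bundled into a small helper. A minimal sketch (the name rowwise_combinations is mine, and it assumes the (N, C) input shape described above):

import torch

def rowwise_combinations(inp, r=2):
    # index combinations are shared by all rows since every row has length C
    c = torch.combinations(torch.arange(inp.size(1)), r=r)
    x = inp[:, None].expand(-1, len(c), -1)   # (N, n_combinations, C)
    idx = c[None].expand(len(inp), -1, -1)    # (N, n_combinations, r)
    return x.gather(dim=2, index=idx)         # (N, n_combinations, r)

>>> rowwise_combinations(torch.tensor([[1, 2, 3], [2, 3, 4]]))
tensor([[[1, 2],
         [1, 3],
         [2, 3]],
        [[2, 3],
         [2, 4],
         [3, 4]]])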
Given a 1-d tensor:
A = torch.tensor([1, 2, 3, 4])
suppose we have some "indexer tensor"
ind1 = torch.tensor([3, 0, 1])
ind2 = torch.tensor([[3, 0], [1, 2]])
as we run A[ind1] & A[ind2]
we get results tensor([4, 1, 2]) & tensor([[4, 1],[2, 3]])
which have the same shape as the indexer tensors (ind1 and ind2), with values mapped from tensor A.
I want to ask: how can I index higher-dimensional tensors in this way?
Currently I have one solution:
For an N-d tensor A, suppose we have the indexer tensor IND,
IND is like [[i11, i12, ..., i1N], [i21, i22, ..., i2N], ..., [iM1, iM2, ..., iMN]], where M is the number of indexed elements.
We can divide IND into N tensors, where
IND_1 = torch.tensor([i11, i21, ... iM1])
...
IND_N = torch.tensor([i1N, i2N, ... iMN])
as we run A[IND_1, ..., IND_N], we get tensor([v1, v2, ..., vM])
Example:
A = tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # shape (2, 2, 2)
ind1 = tensor([1, 0, 1])
ind2 = tensor([1, 1, 0])
ind3 = tensor([0, 1, 0])
A[ind1, ind2, ind3]
=> tensor([7, 4, 5])
# and the good thing is that you can control the shape of the result tensor by modifying the inds' shape.
ind1 = tensor([[0, 0], [1, 0]])
ind2 = tensor([[1, 1], [0, 1]])
ind3 = tensor([[0, 1], [0, 0]])
A[ind1, ind2, ind3]
=> tensor([[3, 4],[5, 3]]) # same as inds' shape
Does anyone have a more elegant solution?
1- Manual approach using unraveled indices on flattened input.
If you want to index on an arbitrary number of axes (all axes of A) then one straightforward approach is to flatten all dimensions and unravel the indices. Let's assume that A is 3D and we want to index it using a stack of ind1, ind2, and ind3:
>>> ind = torch.stack((ind1, ind2, ind3))
You can first map the multi-dimensional indices to linear indices using A's strides:
>>> unraveled = torch.tensor(A.stride()) @ ind.flatten(1)
Then flatten A, index it with unraveled and reshape to the final form:
>>> A.flatten()[unraveled].reshape_as(ind[0])
2- Using a simple split of ind.
You can actually perform the same operation using torch.chunk:
>>> A[ind.chunk(len(ind))][0]
Or alternatively torch.split which is identical:
>>> A[ind.split(1)][0]
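Both approaches can be checked against plain advanced indexing on the 3D example from the question. A minimal sketch, assuming A is contiguous so that A.stride() matches the flattened layout:

import torch

A = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
ind = torch.stack((torch.tensor([1, 0, 1]),
                   torch.tensor([1, 1, 0]),
                   torch.tensor([0, 1, 0])))

# 1- strides map multi-dimensional indices to linear indices on the flat tensor
out1 = A.flatten()[torch.tensor(A.stride()) @ ind.flatten(1)].reshape_as(ind[0])
# 2- one index tensor per axis
out2 = A[ind.chunk(len(ind))][0]

assert torch.equal(out1, out2)  # both give tensor([7, 4, 5])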
3- Initial answer for single-axis indexing.
Let's take a minimal multi-dimensional example with A being a 2-D tensor defined as:
>>> A = torch.tensor([[1, 2, 3, 4],
[5, 6, 7, 8]])
From your description of the problem:
the same shape as the indexer tensor, with values mapped from tensor A.
Then the indexer tensor would need to have the same shape as the indexed tensor A, since A is no longer flat. Otherwise, what would the result of A (shaped (2, 4)) indexed by ind1 (shaped (3,)) be?
If you are indexing on a single dimension then you can utilize torch.gather:
>>> A.gather(1, ind2)
tensor([[4, 1],
[6, 7]])
I am trying to get a good understanding of the broadcasting rules in numpy, but I have noticed that I first need a good understanding of what a 1-dimensional numpy array is. I found multiple sources saying that a 1-dimensional numpy array is neither a horizontal nor a vertical vector. From that I'd expect it to behave differently depending on the operation and on the other operand. But I can't really find a case where a 1-dimensional array behaves like a column vector. For example:
a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
a + b
array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
which indicates that a behaves like a horizontal vector. On the other hand, if we add it to the horizontal vector b:
a = np.arange(3)
b = np.arange(3)[np.newaxis, :]
a + b
array([[0, 2, 4]])
a still behaves like a horizontal vector. Moreover, a seems to be unaffected by transposition with .T. So my question is: do 1-dimensional numpy arrays always mimic horizontal-vector behaviour? If not, in which cases do they behave like a standard vertical vector?
What you just came across is the right-alignment of shapes in numpy broadcasting. When you have a vector of shape (n,) and some other array of shape (a, b, c, d, ..., z), numpy will treat the vector as if it had shape (1, 1, ..., n) and then check whether the trailing dimensions are compatible, i.e. whether n equals z or one of them equals 1.
Now, if you don't want this behaviour, you have to tell numpy explicitly how you want the vector to broadcast against the other array, by adding an axis to the vector with np.newaxis. You can also use the function np.broadcast_arrays to obtain the broadcasted arrays.
For example,
import numpy as np
a = np.array([1, 2, 3])
b = np.eye(3)
# broadcasts a to shape (1, 3) first
# adds the vector a to rows of b
# [[1, 0, 0] [[1, 2, 3]
# [0, 1, 0] + [1, 2, 3]
# [0, 0, 1]] [1, 2, 3]]
print(a + b)
# Tell numpy explicitly, how you want
# your vector to be broadcasted
# Now, a is first broadcasted to shape (3, 1)
# and the vector a is added to the columns of b
# [[1, 0, 0] [[1, 1, 1]
# [0, 1, 0] + [2, 2, 2]
# [0, 0, 1]] [3, 3, 3]]
print(b + a[:, np.newaxis])
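Since np.broadcast_arrays is mentioned above but not demonstrated, here is a small sketch showing how it exposes the shapes numpy broadcasts to:

import numpy as np

a = np.array([1, 2, 3])
b = np.eye(3)

# right-aligned broadcast: a is treated as shape (1, 3) and tiled to (3, 3)
aa, bb = np.broadcast_arrays(a, b)
print(aa.shape, bb.shape)  # (3, 3) (3, 3)
print(aa)
# [[1 2 3]
#  [1 2 3]
#  [1 2 3]]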
I find it weird that numpy.power has no axis argument... is it because there is a better/safer way to achieve the same goal (raising each 2D array in a 3D array to the power of an element of a 1D array)?
Suppose you have a (3, 10, 10) array A and you want to raise each (10, 10) slice to the power of the corresponding element in an array B of shape (3,).
You should be able to do it by using np.power(A,B,axis=0), right?
Yet it yields the following TypeError:
TypeError: 'axis' is an invalid keyword to ufunc 'power'
Since it seems that power does not have an axis or axes argument (despite being a ufunc), what is the preferred way to do it?
There may be a solution using the ufunc.reduce method but I don't really see how that would work with numpy.power...
For now I do:
np.array([A[i,:,:]**B[i] for i in range(3)])
But it looks ugly and is probably less efficient than a numpy method would be.
Thanks
power is not a reduction operation: it does not reduce a collection of numbers to a single number, so an axis argument doesn't make sense. Operations such as sum or max are reductions, so it is meaningful to specify an axis along which to apply the reduction.
The operation that you want is broadcasting. Here's a smaller example, with A having shape (3, 2, 2) and B having shape (3,). We can't write np.power(A, B), because the shapes are not compatible for broadcasting. We first have to add trivial dimensions to B to give it the shape (3, 1, 1). That can be done with, for example, B[:, np.newaxis, np.newaxis] or B.reshape(-1, 1, 1).
In [100]: A
Out[100]:
array([[[1, 1],
[3, 3]],
[[3, 2],
[1, 1]],
[[3, 2],
[1, 3]]])
In [101]: B
Out[101]: array([2, 1, 3])
In [102]: np.power(A, B[:, np.newaxis, np.newaxis])
Out[102]:
array([[[ 1, 1],
[ 9, 9]],
[[ 3, 2],
[ 1, 1]],
[[27, 8],
[ 1, 27]]])
The value of np.newaxis is None, so you'll often see expressions that use None instead of np.newaxis. You can also use the ** operator instead of the function power:
In [103]: A ** B[:, None, None]
Out[103]:
array([[[ 1, 1],
[ 9, 9]],
[[ 3, 2],
[ 1, 1]],
[[27, 8],
[ 1, 27]]])
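If the number of dimensions of A can vary, one way to avoid writing out the np.newaxis indices (a generalization of the above, not from the original answer) is to pad B with one trailing singleton axis per extra dimension of A:

import numpy as np

A = np.full((3, 10, 10), 2)
B = np.array([2, 1, 3])

# append a singleton axis for every dimension of A beyond the first
Bb = B.reshape(B.shape + (1,) * (A.ndim - 1))  # shape (3, 1, 1)
out = A ** Bb
assert out.shape == (3, 10, 10)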
Let us say one has an array of 2D vectors:
v = np.array([ [1, 1], [1, 1], [1, 1], [1, 1]])
v.shape = (4, 2)
And an array of scalars:
s = np.array( [2, 2, 2, 2] )
s.shape = (4,)
I would like the result:
f(v, s) = np.array([ [2, 2], [2, 2], [2, 2], [2, 2]])
Now, executing v*s raises an error. What is the most efficient way to implement f?
Add a new singleton dimension to s:
v*s[:,None]
This is equivalent to reshaping s as (len(s), 1). The shapes of the multiplied operands are then (4, 2) and (4, 1), which are compatible under NumPy broadcasting rules (corresponding dimensions must be equal to each other or equal to 1).
Note that when two operands have unequal numbers of dimensions, NumPy inserts the extra singleton dimensions "in front of" the operand with fewer dimensions. That would give s the shape (1, 4), which is incompatible with (4, 2). Therefore, we explicitly specify where the extra dimension is added, in order to make the shapes compatible.
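A minimal runnable check, using the v and s from the question:

import numpy as np

v = np.array([[1, 1], [1, 1], [1, 1], [1, 1]])
s = np.array([2, 2, 2, 2])

print(v * s[:, None])  # shapes (4, 2) * (4, 1) -> (4, 2)
# [[2 2]
#  [2 2]
#  [2 2]
#  [2 2]]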