PyTorch lists as indexes instead of slice notation

I am working with BERT context vectors and I am trying to extract specific layer activations for specific tokens. I can extract consecutive layers fine using ":" slice notation, but I want specific layers given by a list (or some other method), e.g. the first and fourth layers only.
# (num_target_tokens, num_tokens_in_sequence, num_bert_layers, size_of_bert_vector)
example = torch.randn([3, 12, 13, 768])
indices = torch.tensor([[0, 1], [1, 10], [2, 11]])
a = example[indices[:, 0], indices[:, 1], -4:]
b = example[indices[:, 0], indices[:, 1], 1:5]
# (num_target_tokens, num_specified_layers, size_of_bert_vector)
a.shape
>> torch.Size([3, 4, 768])
b.shape
>> torch.Size([3, 4, 768])
# Desired output shape: torch.Size([3, 4, 768])
c = example[indices[:, 0], indices[:, 1], [1, 3, 5, 7]] # Desired usage
>> shape mismatch: indexing tensors could not be broadcast together with shapes [3], [3], [4]
Is there some elegant way to achieve this with indexing, or will I need to split my tensors to achieve this result?
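One way to make the desired usage work (a sketch, assuming the goal is layers [1, 3, 5, 7] for every target token): give the first two index tensors a trailing unit dimension so they broadcast against the layer list, which resolves the mismatch between shapes [3], [3] and [4].

import torch

example = torch.randn(3, 12, 13, 768)
indices = torch.tensor([[0, 1], [1, 10], [2, 11]])
layers = torch.tensor([1, 3, 5, 7])

# (3, 1), (3, 1) and (4,) broadcast to a (3, 4) grid of indices,
# and the trailing vector dimension of size 768 is carried along
c = example[indices[:, 0, None], indices[:, 1, None], layers]
print(c.shape)  # torch.Size([3, 4, 768])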

Related

Applying torch.combinations on multidimensional tensor or tuple of tensors in PyTorch?

Using PyTorch, torch.combinations will only take a 1D tensor as input, but I would like to apply it to each 1D tensor in a multidimensional tensor.
inp = torch.tensor([[1, 2, 3],
                    [2, 3, 4]])
torch.combinations(inp, r=2)
The result is an error saying it can't be applied to that shape, but I want to apply it to [1, 2, 3] and [2, 3, 4] individually. I can't do it one by one, because the idea is to apply this to large sets of data.
inp = torch.tensor([[1, 2, 3], [2, 3, 4]])
inp_tuple = torch.unbind(inp)
print(inp_tuple)
(tensor([1, 2, 3]), tensor([2, 3, 4]))
torch.combinations(inp_tuple, r=2)
I also tried unbinding the tensor and applying it to the tuple of tensors but it gives an error saying it can't be applied to a tuple.
Is there any way that I can get torch.combinations to automatically apply to each individual 1D tensor in a multidimensional tensor or each tensor in a tuple of tensors? If not are there any alternatives to achieve all combinations of each individual part of a multidimensional tensor?
The function torch.combinations returns all possible combinations of size r of the elements contained in the 1D input vector. The reason why multi-dimensional inputs are not supported is probably that you have no guarantee that the different vectors in your input have the exact same number of unique elements. Obviously, if one of the vectors had a duplicate element, you would end up with one set of combinations bigger than another, which is simply not possible to represent with a homogeneous PyTorch tensor.
So from here on, I will assume that the input tensor inp is a 2D tensor shaped (N, C), where each of its N vectors contains C unique elements. The example you gave fits this requirement, since both vectors have three unique elements each: {1, 2, 3} and {2, 3, 4}.
>>> inp = torch.tensor([[1,2,3],[2,3,4]])
The idea is to apply torch.combinations to a range tensor (torch.arange) whose length equals that of our vectors. We can then use the resulting index pairs to gather values from the different vectors of our input tensor.
We can retrieve all index combinations with the following:
>>> c = torch.combinations(torch.arange(inp.size(1)), r=2)
tensor([[0, 1],
        [0, 2],
        [1, 2]])
Then we need to reshape and expand both inp and c such that they match in number of dimensions:
>>> x = inp[:, None].expand(-1, len(c), -1)
tensor([[[1, 2, 3],
         [1, 2, 3],
         [1, 2, 3]],

        [[2, 3, 4],
         [2, 3, 4],
         [2, 3, 4]]])
>>> idx = c[None].expand(len(x), -1, -1)
tensor([[[0, 1],
         [0, 2],
         [1, 2]],

        [[0, 1],
         [0, 2],
         [1, 2]]])
Finally, we can apply torch.gather on x and idx along dim=2. This will return a 3D tensor out such that:
out[i][j][k] = x[i][j][idx[i][j][k]]
Let's make the call to torch.gather:
>>> x.gather(dim=2, index=idx)
tensor([[[1, 2],
         [1, 3],
         [2, 3]],

        [[2, 3],
         [2, 4],
         [3, 4]]])
Which is the desired result.
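Putting the steps together, a minimal sketch as a reusable helper (the name batched_combinations is just for illustration):

import torch

def batched_combinations(inp, r=2):
    # index combinations are shared by every row of inp
    c = torch.combinations(torch.arange(inp.size(1)), r=r)
    # expand both operands so they match in number of dimensions
    x = inp[:, None].expand(-1, len(c), -1)
    idx = c[None].expand(len(inp), -1, -1)
    # out[i][j][k] = x[i][j][idx[i][j][k]]
    return x.gather(dim=2, index=idx)

inp = torch.tensor([[1, 2, 3], [2, 3, 4]])
print(batched_combinations(inp, r=2))  # shape (2, 3, 2)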

PyTorch: index on a multi-dimensional tensor in a batch

Given a 1-d tensor:
A = torch.tensor([1, 2, 3, 4])
suppose we have some "indexer tensor"
ind1 = torch.tensor([3, 0, 1])
ind2 = torch.tensor([[3, 0], [1, 2]])
When we run A[ind1] and A[ind2], we get tensor([4, 1, 2]) and tensor([[4, 1], [2, 3]]) respectively: each result has the same shape as its indexer tensor (ind1 or ind2), and its values are mapped from tensor A.
How can I index higher-dimensional tensors in the same way?
Currently I have one solution:
For an N-d tensor A, suppose we have an indexer tensor IND of the form [[i11, i12, ..., i1N], [i21, i22, ..., i2N], ..., [iM1, iM2, ..., iMN]], where M is the number of indexed elements.
We can split IND into N tensors, where
IND_1 = torch.tensor([i11, i21, ..., iM1])
...
IND_N = torch.tensor([i1N, i2N, ..., iMN])
When we run A[IND_1, ..., IND_N], we get tensor([v1, v2, ..., vM]).
Example:
A = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # shape [2, 2, 2]
ind1 = torch.tensor([1, 0, 1])
ind2 = torch.tensor([1, 1, 0])
ind3 = torch.tensor([0, 1, 0])
A[ind1, ind2, ind3]
=> tensor([7, 4, 5])
# and the good thing is that you can control the shape of the result tensor by changing the indexers' shapes
ind1 = torch.tensor([[0, 0], [1, 0]])
ind2 = torch.tensor([[1, 1], [0, 1]])
ind3 = torch.tensor([[0, 1], [0, 0]])
A[ind1, ind2, ind3]
=> tensor([[3, 4], [5, 3]])  # same shape as the indexers
Does anyone have a more elegant solution?
1- Manual approach using unraveled indices on flattened input.
If you want to index on an arbitrary number of axes (all axes of A) then one straightforward approach is to flatten all dimensions and unravel the indices. Let's assume that A is 3D and we want to index it using a stack of ind1, ind2, and ind3:
>>> ind = torch.stack((ind1, ind2, ind3))
You can first unravel the indices using A's strides: the dot product of the per-axis strides with the stacked indices gives each element's position in the flattened tensor.
>>> unraveled = torch.tensor(A.stride()) @ ind.flatten(1)
Then flatten A, index it with unraveled and reshape to the final form:
>>> A.flatten()[unraveled].reshape_as(ind[0])
2- Using a simple split of ind.
You can actually perform the same operation using torch.chunk (the trailing [0] removes the extra leading dimension of size 1 introduced by the chunked indexers):
>>> A[ind.chunk(len(ind))][0]
Or alternatively with torch.split, which is identical here:
>>> A[ind.split(1)][0]
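As a quick sanity check, here is a sketch running both approaches on the 3-D example from the question; both should reproduce A[ind1, ind2, ind3]:

import torch

A = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
ind = torch.stack((torch.tensor([1, 0, 1]),
                   torch.tensor([1, 1, 0]),
                   torch.tensor([0, 1, 0])))

# 1- strides + flattened input
flat_idx = torch.tensor(A.stride()) @ ind.flatten(1)
print(A.flatten()[flat_idx].reshape_as(ind[0]))  # tensor([7, 4, 5])

# 2- chunked indexer
print(A[ind.chunk(len(ind))][0])  # tensor([7, 4, 5])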
3- Initial answer for single-axis indexing.
Let's take a minimal multi-dimensional example with A being a 2-D tensor defined as:
>>> A = torch.tensor([[1, 2, 3, 4],
                      [5, 6, 7, 8]])
From your description of the problem:
each result has the same shape as its indexer tensor, and its values are mapped from tensor A.
Then the indexer tensor would need to have the same number of dimensions as the indexed tensor A, since A is no longer flat. Otherwise, what would the result of A (shaped (2, 4)) indexed by ind1 (shaped (3,)) be?
If you are indexing on a single dimension then you can utilize torch.gather:
>>> A.gather(1, ind2)
tensor([[4, 1],
        [6, 7]])

Broadcast and concatenate ragged tensors

I have a ragged tensor of dimensions [BATCH_SIZE, TIME_STEPS, EMBEDDING_DIM]. I want to augment the last axis with data from another tensor of shape [BATCH_SIZE, AUG_DIM]. Each time step of a given example gets augmented with the same value.
If the tensor wasn't ragged with varying TIME_STEPS for each example, I could simply reshape the second tensor with tf.repeat and then use tf.concat:
import tensorflow as tf
# create data
# shape: [BATCH_SIZE, TIME_STEPS, EMBEDDING_DIM]
emb = tf.constant([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [0, 0, 0]]])
# shape: [BATCH_SIZE, 1, AUG_DIM]
aug = tf.constant([[[8]], [[9]]])
# concat
aug = tf.repeat(aug, emb.shape[1], axis=1)
emb_aug = tf.concat([emb, aug], axis=-1)
This approach doesn't work when emb is ragged, since emb.shape[1] is unknown and varies across examples:
# rag and remove padding
emb = tf.RaggedTensor.from_tensor(emb, padding=(0, 0, 0))
# reshape for augmentation - this doesn't work
aug = tf.repeat(aug, emb.shape[1], axis=1)
ValueError: Attempt to convert a value (None) with an unsupported type (<class 'NoneType'>) to a Tensor.
The goal is to create a ragged tensor emb_aug which looks like this:
<tf.RaggedTensor [[[1, 2, 3, 8], [4, 5, 6, 8]], [[1, 2, 3, 9]]]>
Any ideas?
The easiest way to do this is to turn your ragged tensor back into a regular tensor with tf.RaggedTensor.to_tensor() and then proceed with the rest of your solution, but I'll assume that you need the tensor to remain ragged. The key is to find the row_lengths of each example in your ragged tensor, and then use this information to make your augmentation tensor ragged.
Example:
import tensorflow as tf
# data
emb = tf.constant([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [0, 0, 0]]])
aug = tf.constant([[[8]], [[9]]])
# make embeddings ragged for testing
emb_r = tf.RaggedTensor.from_tensor(emb, padding=(0, 0, 0))
print(emb_r.shape)
# (2, None, 3)
Here we'll use a combination of row_lengths and sequence_mask to create a new ragged tensor.
# find the row lengths of the embeddings
rl = emb_r.row_lengths()
print(rl)
# tf.Tensor([2 1], shape=(2,), dtype=int64)
# find the biggest row length
max_rl = tf.math.reduce_max(rl)
print(max_rl)
# tf.Tensor(2, shape=(), dtype=int64)
# repeat the augmented data `max_rl` number of times
aug_t = tf.repeat(aug, repeats=max_rl, axis=1)
print(aug_t)
# tf.Tensor(
# [[[8]
# [8]]
#
# [[9]
# [9]]], shape=(2, 2, 1), dtype=int32)
# create a mask
msk = tf.sequence_mask(rl)
print(msk)
# tf.Tensor(
# [[ True True]
# [ True False]], shape=(2, 2), dtype=bool)
From here, we can use tf.ragged.boolean_mask to make the augmented data ragged:
# make the augmented data a ragged tensor
aug_r = tf.ragged.boolean_mask(aug_t, msk)
print(aug_r)
# <tf.RaggedTensor [[[8], [8]], [[9]]]>
# concatenate!
output = tf.concat([emb_r, aug_r], 2)
print(output)
# <tf.RaggedTensor [[[1, 2, 3, 8], [4, 5, 6, 8]], [[1, 2, 3, 9]]]>
You can find the list of TensorFlow methods that support ragged tensors in the TensorFlow documentation.
Ragged tensors can also be constructed from row lengths directly.
The values input is flat with respect to the future ragged dimension (but not the other dimensions) and can be built with tf.repeat, again using the row lengths to find the appropriate number of repeats per sample:
ragged_lengths = emb.row_lengths()
# `aug` from the question has shape (2, 1, 1); dropping its middle axis makes the
# repeated values shape (3, 1), so the resulting ragged tensor is (2, None, 1)
# and lines up with emb's (2, None, 3) for the final concat
aug = tf.RaggedTensor.from_row_lengths(
    values=tf.repeat(tf.squeeze(aug, axis=1), ragged_lengths, axis=0),
    row_lengths=ragged_lengths)
emb_aug = tf.concat([emb, aug], axis=-1)
# <tf.RaggedTensor [[[1, 2, 3, 8], [4, 5, 6, 8]], [[1, 2, 3, 9]]]>

How does PyTorch Tensor.index_select() evaluate the output tensor?

I am not able to understand how complex (non-contiguous) indexing of a tensor works. Here is some sample code and its output:
import torch

def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))

indices = torch.LongTensor([0, 2])
x = torch.arange(6).view(2, 3)
describe(torch.index_select(x, dim=1, index=indices))
This returns:
Type: torch.LongTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[0, 2],
        [3, 5]])
Can someone explain how it arrives at this output tensor?
Thanks!
You are selecting the first (indices[0] is 0) and third (indices[1] is 2) columns of x along its second axis (dim=1). Essentially, torch.index_select with dim=1 works the same as direct indexing on the second axis with x[:, indices].
>>> x
tensor([[0, 1, 2],
        [3, 4, 5]])
So we are selecting the columns (since you're looking at dim=1 and not dim=0) whose indices are in indices. Imagine having a simple list [0, 2] as indices:
>>> indices = [0, 2]
>>> x[:, indices[0]] # same as x[:, 0]
tensor([0, 3])
>>> x[:, indices[1]] # same as x[:, 2]
tensor([2, 5])
So passing the indices as a torch.Tensor allows you to index on all elements of indices directly, i.e. columns 0 and 2. Similar to how NumPy's indexing works.
>>> x[:, indices]
tensor([[0, 2],
        [3, 5]])
Here's another example to help you see how it works. With x defined as x = torch.arange(9).view(3, 3), we have 3 rows (a.k.a. dim=0) and 3 columns (a.k.a. dim=1).
>>> indices
tensor([0, 2])  # namely 'first' and 'third'
>>> x = torch.arange(9).view(3, 3)
>>> x
tensor([[0, 1, 2],
        [3, 4, 5],
        [6, 7, 8]])
>>> x.index_select(0, indices)  # select first and third rows
tensor([[0, 1, 2],
        [6, 7, 8]])
>>> x.index_select(1, indices)  # select first and third columns
tensor([[0, 2],
        [3, 5],
        [6, 8]])
Note: torch.index_select(x, dim, indices) is equivalent to x.index_select(dim, indices)
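As a quick equivalence check (a small sketch), all three spellings select the same columns:

import torch

x = torch.arange(9).view(3, 3)
indices = torch.tensor([0, 2])
assert torch.equal(torch.index_select(x, 1, indices), x.index_select(1, indices))
assert torch.equal(x.index_select(1, indices), x[:, indices])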

keras-gcn fit model ValueError

I'm using the keras-gcn library to create a model to learn graphs. Here is the code (from the repository):
import numpy as np
from keras_gcn.backend import keras
from keras_gcn import GraphConv

# feature matrix
input_data = np.array([[[0, 1, 2],
                        [2, 3, 4],
                        [4, 5, 6],
                        [7, 7, 8]]])
# adjacency matrix
input_edge = np.array([[[1, 1, 1, 0],
                        [1, 1, 0, 0],
                        [1, 0, 1, 0],
                        [0, 0, 0, 1]]])
labels = np.array([[[1],
                    [0],
                    [1],
                    [0]]])
data_layer = keras.layers.Input(shape=(None, 3), name='Input-Data')
edge_layer = keras.layers.Input(shape=(None, None), dtype='int32', name='Input-Edge')
conv_layer = GraphConv(units=4, step_num=1, kernel_initializer='ones',
                       bias_initializer='ones', name='GraphConv')([data_layer, edge_layer])
model = keras.models.Model(inputs=[data_layer, edge_layer], outputs=conv_layer)
model.compile(optimizer='adam', loss='mae', metrics=['mae'])
model.fit([input_data, input_edge], labels)
However, when I run the code I get the following error:
ValueError: Error when checking target: expected GraphConv to have 3 dimensions, but got array with shape (4, 1)
even though the shape of labels is (1, 4, 1).
You should one-hot encode your labels, something like the following:
labels = np.array([[[0, 1],
                    [1, 0],
                    [0, 1],
                    [1, 0]]])
Also, the number of units in the GraphConv layer should equal the number of unique labels, which is 2 in your case.
I think the issue is a mismatch between the shapes of your edge_layer and data_layer.
When you use keras.layers.Input, you're giving data_layer a shape of shape=(None, 3), while edge_layer gets a shape of shape=(None, None).
Match the shapes and let me know how it goes.
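For reference, a sketch combining the first answer's suggestion with the question's code (one-hot labels and units=2; the initializers, optimizer, and loss are kept as in the original, and whether 'mae' is a sensible loss for one-hot targets is a separate question):

import numpy as np
from keras_gcn.backend import keras
from keras_gcn import GraphConv

input_data = np.array([[[0, 1, 2], [2, 3, 4], [4, 5, 6], [7, 7, 8]]])
input_edge = np.array([[[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]])
# one-hot encoded labels, shape (1, 4, 2)
labels = np.array([[[0, 1], [1, 0], [0, 1], [1, 0]]])

data_layer = keras.layers.Input(shape=(None, 3), name='Input-Data')
edge_layer = keras.layers.Input(shape=(None, None), dtype='int32', name='Input-Edge')
# units now matches the number of classes (2)
conv_layer = GraphConv(units=2, step_num=1, kernel_initializer='ones',
                       bias_initializer='ones', name='GraphConv')([data_layer, edge_layer])
model = keras.models.Model(inputs=[data_layer, edge_layer], outputs=conv_layer)
model.compile(optimizer='adam', loss='mae', metrics=['mae'])
model.fit([input_data, input_edge], labels)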
