Say I have a tensor of shape (B, N^2, C)
and I want to reshape it into (B, C, N, N).
I think I have the two choices below:
A = torch.rand(5, 100, 20) # Original Tensor
# First Method
B = A.transpose(2, 1)
B = B.view(5, 20, 10, 10)
# Second Method
C = A.view(5, 20, 10, 10)
Both methods work, but the outputs are slightly different and I cannot figure out the difference between them.
Thanks
The difference between B and C is that you have used torch.transpose, which swaps two axes and therefore changes the order in which the elements are read out of memory. The view at the end is just a convenient interface for accessing your data; it has no effect on the underlying data of your tensor. What it all comes down to is the contiguous memory data buffer.
If you take a smaller example, something we can grasp more easily:
>>> A = torch.rand(1, 4, 3)
tensor([[[0.4543, 0.9766, 0.0123],
[0.7447, 0.2732, 0.7260],
[0.7814, 0.4766, 0.8939],
[0.3444, 0.0387, 0.8581]]])
Here swapping axis=1 and axis=2 comes down to a batched transpose (in mathematical terms):
>>> B = A.transpose(2, 1)
tensor([[[0.4543, 0.7447, 0.7814, 0.3444],
[0.9766, 0.2732, 0.4766, 0.0387],
[0.0123, 0.7260, 0.8939, 0.8581]]])
In terms of memory layout, A has the following memory arrangement:
>>> A.flatten()
tensor([0.4543, 0.9766, 0.0123, 0.7447, 0.2732, 0.7260, 0.7814, 0.4766, 0.8939,
0.3444, 0.0387, 0.8581])
While B flattens to a different order. By layout I mean the order in which the elements are read out; I am not referring to the shape, which is irrelevant here:
>>> B.flatten()
tensor([0.4543, 0.7447, 0.7814, 0.3444, 0.9766, 0.2732, 0.4766, 0.0387, 0.0123,
0.7260, 0.8939, 0.8581])
As I said, reshaping, i.e. building a view on top of a tensor, doesn't change its memory layout; it is an abstraction level for manipulating tensors more conveniently.
So in the end, yes, you end up with two different results: C reads the elements in the same order as A, while B reads them out in a different order because the transpose swapped its strides. Note that B still points to the same underlying storage as A; only the strides (and hence the element order) differ.
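You can convince yourself of this with a quick check; a minimal sketch using the tensors from the question:
import torch

A = torch.rand(5, 100, 20)
B = A.transpose(2, 1).view(5, 20, 10, 10)  # first method
C = A.view(5, 20, 10, 10)                  # second method

A.data_ptr() == B.data_ptr() == C.data_ptr()  # True: all three share the same buffer
torch.equal(B, C)                             # False: the elements are grouped differently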
Transposing/permuting and view/reshape are NOT the same!
reshape and view only affect the shape of a tensor, but do not change the underlying order of elements.
In contrast, transpose and permute change the underlying order of elements in the tensor. See this answer, and this one for more details.
Here's an example, with B=1, N=3 and C=2, the first channel has even numbers 0..16, and the second channel has odd numbers 1..17:
A = torch.arange(2*9).view(1,9,2)
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15],
[16, 17]]])
If you correctly transpose and then reshape, you get the correct split into even and odd channels:
A.transpose(1,2).view(1,2,3,3)
tensor([[[[ 0, 2, 4],
[ 6, 8, 10],
[12, 14, 16]],
[[ 1, 3, 5],
[ 7, 9, 11],
[13, 15, 17]]]])
However, if you only change the shape (i.e., using view or reshape) you incorrectly "mix" the values from the two channels:
A.view(1,2,3,3)
tensor([[[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]]])
Update (Aug 31st, 2022)
Take a look at this simple example:
# original tensor
x = torch.arange(12).view(3,4)
x.data_ptr() # -> 94308398597888
x.stride() # -> (4, 1)
# transpose
x1 = x.transpose(0, 1)
x1.data_ptr() # -> 94308398597888 (same data)
x1.stride() # -> (1, 4) efficient stride representation can handle this
# messing around a bit more:
x1.view(3,4)
# strides cannot cut it anymore - we get an error
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
# using reshape:
x2 = x1.reshape(3, 4)
x2.data_ptr() # -> 94308399099200 (NOT the same data)
x2.stride() # -> (4, 1)
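If you want to make the copy explicit rather than letting reshape decide, you can call contiguous() yourself first; a minimal sketch continuing the example above:
# contiguous() materialises a row-major copy, after which view works again
x3 = x1.contiguous().view(3, 4)
x3.data_ptr() # -> a new address: the data was copied
x3.stride()   # -> (4, 1)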
I have a list:
code = ['<s>', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'larger', '`tsteps`', 'value', 'means', 'that', 'the', 'LSTM', 'will', 'need', 'more', 'memory', '\n', 'to', 'figure', 'out']
And I want to convert to one hot encoding. I tried:
to_categorical(code)
And I get an error: ValueError: invalid literal for int() with base 10: '<s>'
What am I doing wrong?
Keras only supports one-hot encoding for data that has already been integer-encoded. You can manually integer-encode your strings like so:
Manual encoding
# this integer encoding is purely based on position; you can do this in other ways.
# Note that duplicate words end up mapped to the index of their last occurrence.
integer_mapping = {x: i for i,x in enumerate(code)}
vec = [integer_mapping[word] for word in code]
# vec is
# [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
Using scikit-learn
from sklearn.preprocessing import LabelEncoder
import numpy as np
code = np.array(code)
label_encoder = LabelEncoder()
vec = label_encoder.fit_transform(code)
# array([ 2, 6, 7, 9, 19, 1, 16, 0, 17, 0, 3, 10, 5, 21, 11, 18, 19,
# 4, 22, 14, 13, 12, 0, 20, 8, 15])
You can now feed this into keras.utils.to_categorical:
from keras.utils import to_categorical
to_categorical(vec)
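With the scikit-learn encoding above, this yields one row per token in the list and one column per distinct token:
one_hot = to_categorical(vec)
one_hot.shape  # (26, 23): 26 tokens in the list, 23 distinct values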
Instead, use:
pandas.get_dummies(y_train)
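Applied to the list from the question (with code playing the role of y_train), a minimal sketch:
import pandas as pd

# one indicator column per distinct token, one row per list element
one_hot = pd.get_dummies(code)
one_hot.shape  # (26, 23)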
tf.keras.layers.CategoryEncoding
In TF 2.6.0, One-Hot Encoding (OHE) or Multi-Hot Encoding (MHE) can be implemented using tf.keras.layers.CategoryEncoding, tf.keras.layers.StringLookup, and tf.keras.layers.IntegerLookup.
I believe this approach is not available in TF 2.4.x, so it must have been added later.
See Classify structured data using Keras preprocessing layers for the actual implementation.
def get_category_encoding_layer(name, dataset, dtype, max_tokens=None):
    # Create a layer that turns strings into integer indices.
    if dtype == 'string':
        index = layers.StringLookup(max_tokens=max_tokens)
    # Otherwise, create a layer that turns integer values into integer indices.
    else:
        index = layers.IntegerLookup(max_tokens=max_tokens)
    # Prepare a `tf.data.Dataset` that only yields the feature.
    feature_ds = dataset.map(lambda x, y: x[name])
    # Learn the set of possible values and assign them a fixed integer index.
    index.adapt(feature_ds)
    # Encode the integer indices.
    encoder = layers.CategoryEncoding(num_tokens=index.vocabulary_size())
    # Apply multi-hot encoding to the indices. The lambda function captures the
    # layer, so you can use them, or include them in the Keras Functional model later.
    return lambda feature: encoder(index(feature))
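If you just want to one-hot encode a plain Python list of strings like the one in the question (no tf.data.Dataset involved), the same layers can be used directly; a minimal sketch, assuming TF >= 2.6:
import tensorflow as tf

# `code` is the list of strings from the question
lookup = tf.keras.layers.StringLookup()                   # string -> integer index
lookup.adapt(code)
encoder = tf.keras.layers.CategoryEncoding(
    num_tokens=lookup.vocabulary_size(), output_mode='one_hot')
one_hot = encoder(lookup(code))                           # shape: (len(code), vocabulary_size)
Here output_mode='one_hot' produces one row per token; the function above uses the multi-hot default instead.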
Try converting it to a numpy array first:
from numpy import array
and then:
to_categorical(array(code))
Given an input tensor of shape (C, B, H), i.e. torch.Size([2, 5, 32]), from some neural net layer, where
channels = 2
batch_size = 5
hidden_size = 32
The goal is to flatten the channels and reshape the input tensor to the shape (B, C*H), i.e. torch.Size([5, 2 * 32]), where:
batch_size = 5
hidden_size = 32 * 2
I've tried to do the following:
import torch
t = torch.rand([2, 5, 32])
# Changed from (channels, batch_size, hidden_size)
# -> (batch_size, channels, hidden_size)
t = t.permute(1, 0, 2)
# Reshape using view(), where batch_size is t.size(0)
# and -1 is to flatten the left over values to the other dimension.
z = t.contiguous().view(t.size(0), -1)
print(z.shape)
print(z)
[out]:
torch.Size([5, 64])
tensor([[0.3911, 0.9586, 0.2104, 0.3937, 0.9976, 0.3378, 0.0630, 0.6676, 0.0806,
0.9311, 0.5219, 0.1697, 0.7442, 0.5162, 0.2555, 0.0826, 0.5502, 0.9700,
0.3375, 0.5012, 0.9025, 0.8176, 0.1465, 0.1848, 0.3460, 0.9999, 0.7892,
0.7577, 0.6615, 0.2620, 0.6868, 0.2003, 0.4840, 0.8354, 0.9253, 0.3172,
0.9516, 0.8962, 0.1272, 0.2268, 0.6510, 0.5166, 0.6772, 0.9616, 0.9826,
0.5254, 0.9191, 0.4378, 0.7048, 0.8808, 0.0299, 0.1102, 0.9710, 0.8714,
0.7256, 0.9684, 0.6117, 0.1957, 0.8663, 0.4742, 0.2843, 0.6548, 0.9592,
0.1559],
[0.2333, 0.0858, 0.5284, 0.2965, 0.3863, 0.3370, 0.6940, 0.3387, 0.3513,
0.1022, 0.3731, 0.3575, 0.7095, 0.0053, 0.7024, 0.4091, 0.3289, 0.5808,
0.5640, 0.8847, 0.7584, 0.8878, 0.9873, 0.0525, 0.7731, 0.2501, 0.9926,
0.5226, 0.0925, 0.0300, 0.4176, 0.0456, 0.4643, 0.4497, 0.5920, 0.9519,
0.6647, 0.2379, 0.4927, 0.9666, 0.1675, 0.9887, 0.7741, 0.5668, 0.7376,
0.4452, 0.7449, 0.1298, 0.9065, 0.3561, 0.5813, 0.1439, 0.2115, 0.5874,
0.2038, 0.1066, 0.3843, 0.6179, 0.8321, 0.9428, 0.1067, 0.5045, 0.9324,
0.3326],
[0.6556, 0.1479, 0.9288, 0.9238, 0.1324, 0.0718, 0.6620, 0.2659, 0.7162,
0.7559, 0.7564, 0.2120, 0.3943, 0.9497, 0.7520, 0.8455, 0.4444, 0.4708,
0.8371, 0.6365, 0.3616, 0.0326, 0.1581, 0.4973, 0.6701, 0.9245, 0.8274,
0.3464, 0.7044, 0.5376, 0.0441, 0.5210, 0.8603, 0.7396, 0.2544, 0.3514,
0.5686, 0.3283, 0.7248, 0.4303, 0.9531, 0.5587, 0.8703, 0.1585, 0.9161,
0.9043, 0.9778, 0.4489, 0.9463, 0.8655, 0.5576, 0.1135, 0.1268, 0.3424,
0.1504, 0.2265, 0.1734, 0.1872, 0.3995, 0.1191, 0.0532, 0.6109, 0.1662,
0.6937],
[0.6342, 0.1922, 0.1758, 0.4625, 0.7654, 0.6509, 0.2908, 0.1546, 0.4768,
0.3779, 0.2490, 0.0086, 0.6170, 0.5425, 0.6953, 0.4730, 0.5834, 0.8326,
0.0165, 0.8236, 0.0023, 0.7479, 0.5621, 0.9894, 0.5957, 0.0857, 0.6087,
0.5667, 0.5478, 0.8197, 0.9228, 0.7329, 0.4434, 0.5894, 0.9860, 0.6133,
0.2395, 0.4718, 0.8830, 0.6361, 0.6104, 0.6630, 0.5084, 0.7604, 0.7591,
0.3601, 0.6888, 0.6767, 0.9178, 0.5291, 0.0591, 0.4320, 0.7875, 0.5038,
0.4419, 0.0319, 0.3719, 0.5843, 0.0334, 0.3525, 0.0023, 0.1205, 0.4040,
0.7908],
[0.0989, 0.8436, 0.0425, 0.6247, 0.6091, 0.4778, 0.2692, 0.4785, 0.9217,
0.9604, 0.6355, 0.4686, 0.9414, 0.7722, 0.8013, 0.1660, 0.6578, 0.6414,
0.6814, 0.6212, 0.4124, 0.7102, 0.7416, 0.7404, 0.9842, 0.6542, 0.0106,
0.3826, 0.5529, 0.8079, 0.9855, 0.3012, 0.2341, 0.9353, 0.6597, 0.7177,
0.8214, 0.1438, 0.4729, 0.6747, 0.9310, 0.4167, 0.3689, 0.8464, 0.9395,
0.9407, 0.8419, 0.5486, 0.1786, 0.1423, 0.9900, 0.9365, 0.3996, 0.1862,
0.6232, 0.7547, 0.7779, 0.4767, 0.6218, 0.9079, 0.6153, 0.1488, 0.5960,
0.4015]])
Although permute() + view() achieves the desired output, are there other ways to perform the same operation? Is there a better way that can directly reshape without first permuting the order of the dimensions?
Let's look "behind the curtain" and see why one must have both permute/transpose and view in order to go from a C-B-H to B-C*H:
Elements of tensors are stored as a long contiguous vector in memory. For instance, if you look at a 2-3-4 tensor it has 24 elements stored at 24 consecutive places in memory. This tensor also has a "header" that tells pytorch to treat these 24 values as a 2-by-3-by-4 tensor. This is done by storing not only the size of the tensor, but also "strides": the "stride" one needs to jump in order to get to the next element along each dimension. In our example, size=(2,3,4) and strides=(12, 4, 1) (you can check this out yourself, and you can see more about it here).
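You can check this directly; a quick sketch with a 2-3-4 tensor:
import torch

t = torch.arange(24).view(2, 3, 4)  # 24 consecutive values interpreted as 2-by-3-by-4
t.size()    # torch.Size([2, 3, 4])
t.stride()  # (12, 4, 1): jump 12 elements along dim 0, 4 along dim 1, 1 along dim 2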
Now, if you only want to change the size to 2-(3*4) you do not need to move any item of the tensor in memory, only to update the "header" of the tensor. By setting size=(2, 12) and strides=(12, 1) you are done!
Alternatively, if you want to "transpose" the tensor to 3-2-4 that's a bit more tricky, but you can still do that by manipulating the strides. Setting size=(3, 2, 4) and strides=(4, 12, 1) gives you exactly what you want without moving any of the real tensor elements in memory.
However, once you have manipulated the strides, you cannot trivially change the size of the tensor, because now you would need two different "stride" values for one (or more) dimensions. This is why you must call contiguous() at this point.
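Continuing the 2-3-4 example, a minimal sketch of how this plays out in code:
p = t.permute(1, 0, 2)      # size (3, 2, 4), strides (4, 12, 1): no data was moved
p.is_contiguous()           # False
# p.view(3, 8)              # would raise a RuntimeError: no single stride per dimension can describe this
p.contiguous().view(3, 8)   # works: contiguous() first copies the data into row-major order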
Summary
If you want to move from shape (C, B, H) to (B, C*H) you must use permute, contiguous and view operations; otherwise you will just scramble the entries of your tensor.
A small example with 2-3-4 tensor:
a =
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]],
[[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]]])
If you just change the view of the tensor you get
a.view(3,8)
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15],
[16, 17, 18, 19, 20, 21, 22, 23]])
Which is not what you want!
You need to have
a.permute(1,0,2).contiguous().view(3, 8)
array([[ 0, 1, 2, 3, 12, 13, 14, 15],
[ 4, 5, 6, 7, 16, 17, 18, 19],
[ 8, 9, 10, 11, 20, 21, 22, 23]])
Einops lets you do such element rearrangements in one (readable) line:
from einops import rearrange
import torch
t = torch.rand([2, 5, 32])
y = rearrange(t, 'c b h -> b (c h)')
y.shape # prints torch.Size([5, 64])
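If you want to verify that this matches the permute + contiguous + view approach from the other answer, a quick sanity check:
# rearrange performs the same transposition + flattening
torch.equal(y, t.permute(1, 0, 2).reshape(5, 64))  # True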