Pytorch crossentropy loss with 3d input - python

I have a network which outputs a 3D tensor of size (batch_size, max_len, num_classes). My ground truth has the shape (batch_size, max_len), where each entry is an integer class index in the range [0, num_classes). If I one-hot encode the labels, they become shape (batch_size, max_len, num_classes). Since the original code is too long, I have written a simpler version that reproduces the original error.
criterion = nn.CrossEntropyLoss()
batch_size = 32
max_len = 350
num_classes = 1000
pred = torch.randn([batch_size, max_len, num_classes])
label = torch.randint(0, num_classes,[batch_size, max_len])
pred = nn.Softmax(dim = 2)(pred)
criterion(pred, label)
The shapes of pred and label are, respectively, torch.Size([32, 350, 1000]) and torch.Size([32, 350]).
The error encountered is:
ValueError: Expected target size (32, 1000), got torch.Size([32, 350])
If I one-hot encode labels for computing the loss
x = nn.functional.one_hot(label)
criterion(pred, x)
it'll throw the following error
ValueError: Expected target size (32, 1000), got torch.Size([32, 350, 1000])

From the PyTorch documentation, CrossEntropyLoss expects its input to have shape (N, C, ...), so the second dimension must always be the number of classes. Your code should work if you permute pred so that it has shape (batch_size, num_classes, max_len); note that you want permute (or transpose), not reshape, so the class scores stay aligned with the right positions.
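For example, a minimal sketch of the fix (also dropping the explicit Softmax, since nn.CrossEntropyLoss already applies log-softmax to its input internally):
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size, max_len, num_classes = 32, 350, 1000

pred = torch.randn(batch_size, max_len, num_classes)          # raw logits, no Softmax
label = torch.randint(0, num_classes, (batch_size, max_len))  # integer class indices

# move the class dimension to position 1: (32, 350, 1000) -> (32, 1000, 350)
loss = criterion(pred.permute(0, 2, 1), label)
print(loss)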

Related

Input and output shape to GRU layer in PyTorch

I am getting confused about the input shape to the GRU layer.
I have a batch of 128 images and I extracted 9 features from each image.
So now my shape is (1, 128, 9).
This is the GRU layer:
gru=torch.nn.GRU(input_size=128,hidden_size=8,batch_first=True)
Question 1: Is the input_size=128 correctly defined?
Here is the code of forward function
def forward(self, features):
    features = features.permute(0, 2, 1)  # [1, 9, 128]
    x2, _ = self.gru(features)
Question 2: Is the code in the forward function correctly defined?
Thanks
No, input_size is not correctly defined. Here, input_size means the number of features in a single input vector of the sequence: the input to the GRU is a sequence of vectors, each a 1-D tensor of length input_size. For batched input, the GRU receives a batch of such sequences, so the expected shape is (batch_size, sequence_length, input_size) when batch_first=True, and (sequence_length, batch_size, input_size) when batch_first=False.
import torch
batch_size = 128
input_size = 9    # number of features in each input vector
seq_len = 5       # sequence length - how many input vectors in one sequence
hidden_size = 20  # number of features in the output of the GRU
gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=True)
X = torch.rand((batch_size, seq_len, input_size), dtype=torch.float32)
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([128, 5, 9])
Y.shape=torch.Size([128, 5, 20])
"""
Using batch_first=False
gru = torch.nn.GRU(input_size=input_size, hidden_size=hidden_size, batch_first=False)
X = torch.rand((seq_len, batch_size, input_size), dtype=torch.float32)
print(f'{X.shape=}')
Y,_ = gru(X)
print(f'{Y.shape=}')
output
"""
X.shape=torch.Size([5, 128, 9])
Y.shape=torch.Size([5, 128, 20])
"""

Issues with the output size of a Many-to-Many CNN-LSTM in PyTorch

I am trying to build a binary temporal image classifier by combining ResNet18 and an LSTM. However, I have never really used RNNs before and have been struggling to get the correct output shape.
I am using a batch size of 128 and a sequence size of 32. The images are 80x80 grayscale images.
The current model is:
class CNNLSTM(nn.Module):
    def __init__(self):
        super(CNNLSTM, self).__init__()
        self.resnet = models.resnet18(pretrained=False)
        self.resnet.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3)
        self.resnet.fc = nn.Sequential(nn.Linear(in_features=512, out_features=256, bias=True))
        self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3)
        self.fc1 = nn.Linear(256, 128)
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x_3d):
        # x_3d: torch.Size([128, 32, 1, 80, 80])
        hidden = None
        toret = []
        for t in range(x_3d.size(1)):
            x = self.resnet(x_3d[:, t, :, :, :])
            out, hidden = self.lstm(x.unsqueeze(0), hidden)
            x = self.fc1(out[-1, :, :])
            x = F.relu(x)
            x = self.fc2(x)
            print("x shape: ", x.shape)
            toret.append(x)
        return torch.stack(toret)
This returns a tensor of shape torch.Size([32, 128, 1]), which, as I understand it, means that the nth row represents the nth time step for each element in the batch.
How can I get an output of shape 128x1x32 instead?
And is there a better way to do this?
You could permute the dimensions:
a = torch.rand(32, 128, 1)
a = a.permute(1, 2, 0) # these are the indices of the original dimensions
print(a.shape)
>> torch.Size([128, 1, 32])
But you could also set batch_first=True in the LSTM module:
self.lstm = nn.LSTM(input_size=256, hidden_size=256, num_layers=3, batch_first=True)
With batch_first=True, the LSTM expects its input in the shape (batch_size, seq_len, features) and returns its output in the same layout.
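If you go the batch_first=True route, the forward pass also needs small adjustments, since the LSTM now expects (batch, seq, features). A sketch of one way to do it, assuming the same module attributes as in the question (not tested against the full training code):
def forward(self, x_3d):                                 # x_3d: (128, 32, 1, 80, 80)
    hidden = None
    toret = []
    for t in range(x_3d.size(1)):
        x = self.resnet(x_3d[:, t, :, :, :])             # (128, 256)
        out, hidden = self.lstm(x.unsqueeze(1), hidden)  # input (128, 1, 256) = (batch, seq, features)
        x = self.fc2(F.relu(self.fc1(out[:, -1, :])))    # (128, 1)
        toret.append(x)
    return torch.stack(toret, dim=2)                     # (128, 1, 32)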

ValueError: Can not squeeze dim[1], expected a dimension of 1, got 3 for '{{node ctc/Squeeze}} with input shapes: [?,3]

I'm trying to implement CTC alignment for my video sequence labelling task with a Keras Lambda layer. All the available sample code is for speech/text alignment, and I'm not sure whether the lambda function can be used as-is for videos. When I tried, I got the error:
ValueError: Can not squeeze dim[1], expected a dimension of 1, got 3
for '{{node ctc/Squeeze}} = Squeeze[T=DT_FLOAT,
squeeze_dims=[-1]]' with input shapes: [?,3]
def ctc_lambda_func(args):
    y_pred, labels, input_length, label_length = args
    # the 2 is critical here since the first couple outputs of the RNN
    # tend to be garbage:
    # print "y_pred_shape: ", y_pred.shape
    y_pred = y_pred[:, 2:, :]
    # print "y_pred_shape: ", y_pred.shape
    return K.ctc_batch_cost(labels, y_pred, input_length, label_length)

labels = Input(name='the_labels', shape=[None], dtype=dtype)  # transcription data (batch_size * y_seq_size)
input_length = Input(name='input_length', shape=X_train_length.shape, dtype=dtype)  # unpadded len of all x_sequences in batch
label_length = Input(name='label_length', shape=label_length.shape, dtype=dtype)  # unpadded len of all y_sequences in batch

y_train = np.ndarray(shape=(num_files, max_y_length), dtype=np.float32)
y_train.fill(1)

# Lambda layer with ctc_loss function due to Keras not supporting CTC layers
loss_out = Lambda(function=ctc_lambda_func, name='ctc', output_shape=(208, 30))([y_pred, y_train, X_train_length, label_length])
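For reference, a small sketch of the shapes K.ctc_batch_cost expects according to the Keras documentation: labels of shape (samples, max_label_length), y_pred of shape (samples, time_steps, num_classes), and input_length and label_length each of shape (samples, 1). The squeeze error above suggests one of the length tensors has a trailing dimension of 3 instead of 1. All names and sizes below are illustrative, not from the question's pipeline:
import numpy as np
from tensorflow.keras import backend as K

batch, time_steps, num_classes, max_label_len = 4, 208, 30, 10

y_pred = np.random.rand(batch, time_steps, num_classes).astype("float32")                    # (samples, time_steps, num_classes)
labels = np.random.randint(0, num_classes - 1, size=(batch, max_label_len)).astype("int32")  # (samples, max_label_length)
input_length = np.full((batch, 1), time_steps, dtype="int32")                                # (samples, 1)
label_length = np.full((batch, 1), max_label_len, dtype="int32")                             # (samples, 1)

loss = K.ctc_batch_cost(labels, y_pred, input_length, label_length)                          # shape (samples, 1)
print(loss.shape)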

Tensorflow, can not figure out what shape my inputs and labels have

I am trying to load weights, and for that to work I need to perform the following:
dummy_input = tf.random.uniform(input_shape) # create a tensor of input shape
dummy_label = tf.random.uniform(label_shape) # create a tensor of label shape
hist = model.fit(dummy_input, dummy_label)
I am new to this and can't figure out what these shapes should be.
Some information about my model:
I'm feeding the model images with shape (224, 224, 3), in batches of 16.
I have 423 different classes and use sparse_categorical_crossentropy.
I tried this:
dummy_input = tf.random.uniform([16, 224, 224, 3]) # create a tensor of input shape
dummy_label = tf.random.uniform([16, 1, 423]) # create a tensor of label shape
hist = model.fit(dummy_input, dummy_label, epochs=epochs,
                 steps_per_epoch=len_train // batch_size,
                 validation_steps=len_test // batch_size)
There may be many errors here, but the one I am getting right now is:
ValueError: Shape mismatch: The shape of labels (received (423,))
should equal the shape of logits except for the last dimension (received (1, 423)).
This was the solution:
dummy_input = tf.random.uniform([32, 224, 224, 3]) # create a tensor of input shape
dummy_label = tf.random.uniform([32,]) # create a tensor of label shape
hist = model.fit(dummy_input, dummy_label)
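As a side note on why the (32,) label shape works: with sparse_categorical_crossentropy the labels are integer class indices, one per sample, not one-hot vectors. A sketch of a slightly more realistic dummy (assuming the model from the question is already compiled with that loss):
import tensorflow as tf

batch_size, num_classes = 16, 423

dummy_input = tf.random.uniform([batch_size, 224, 224, 3])
# one integer class index per sample, in [0, num_classes)
dummy_label = tf.random.uniform([batch_size], minval=0, maxval=num_classes, dtype=tf.int32)

hist = model.fit(dummy_input, dummy_label)  # `model` as defined in the question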

Tensorflow: Recurrent Neural Network Batch Training

I am trying to implement an RNN in TensorFlow. I am writing my own functions instead of using RNN cells, as practice.
The problem is sequence tagging. The input size is [32, 48, 900], where 32 is the batch size, 48 is the number of time steps, and 900 is the vocabulary size (each input is a one-hot encoded vector). The output is [32, 48, 145], where the first two dimensions are the same as the input, but the last dimension is the output vocabulary size (one-hot). Basically, this is an NLP tagging problem.
I am getting the following error:
InvalidArgumentError (see above for traceback): logits and labels must
be same size: logits_size=[48,145] labels_size=[1536,145]
The actual labels_size is [32, 48, 145], but the first two dimensions get merged without my control. FYI, 32 * 48 = 1536.
If I run my RNN with a batch size of 1, it works fine as expected. I could not figure out how to solve the issue. The problem occurs in the last line of the code.
I have pasted the relevant part of the code:
inputs = tf.placeholder(shape=[None, self.seq_length, self.vocab_size], dtype=tf.float32, name="inputs")
targets = tf.placeholder(shape=[None, self.seq_length, self.output_vocab_size], dtype=tf.float32, name="targets")
init_state = tf.placeholder(shape=[1, self.hidden_size], dtype=tf.float32, name="state")

initializer = tf.random_normal_initializer(stddev=0.1)

with tf.variable_scope("RNN") as scope:
    hs_t = init_state
    ys = []
    for t, xs_t in enumerate(tf.split(inputs[0], self.seq_length, axis=0)):
        if t > 0: scope.reuse_variables()
        Wxh = tf.get_variable("Wxh", [self.vocab_size, self.hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [self.hidden_size, self.hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [self.hidden_size, self.output_vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [self.hidden_size], initializer=initializer)
        by = tf.get_variable("by", [self.output_vocab_size], initializer=initializer)

        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)
        ys_t = tf.matmul(hs_t, Why) + by
        ys.append(ys_t)

hprev = hs_t
output_softmax = tf.nn.softmax(ys)  # Get softmax for sampling

# outputs = tf.concat(ys, axis=0)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=ys))
The problem likely lies in the size of ys: ys should have size [32, 48, 145], but because only inputs[0] is processed, it only has size [48, 145]. When the batch size is 1, the target size is [1, 48, 145], which matches [48, 145] once the singleton dimension is dropped, so it works.
To solve the problem, you can add a loop over the batch dimension instead of only using inputs[0], for example:
for i in range(inputs.get_shape()[0]):  # requires a statically known batch dimension, e.g. shape=[32, ...]
    for t, xs_t in enumerate(tf.split(inputs[i], self.seq_length, axis=0)):
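Alternatively, a sketch that is not part of the answer above: rather than adding an outer Python loop over the batch, keep the whole batch and split along the time axis, so every matmul processes all 32 sequences at once and the stacked logits line up with the [32, 48, 145] targets. Variable and placeholder names are assumed to be the same as in the question; the [1, hidden_size] init_state broadcasts across the batch in the first time step.
with tf.variable_scope("RNN") as scope:
    hs_t = init_state
    ys = []
    for t, xs_t in enumerate(tf.unstack(inputs, axis=1)):  # xs_t: [batch_size, vocab_size]
        if t > 0: scope.reuse_variables()
        Wxh = tf.get_variable("Wxh", [self.vocab_size, self.hidden_size], initializer=initializer)
        Whh = tf.get_variable("Whh", [self.hidden_size, self.hidden_size], initializer=initializer)
        Why = tf.get_variable("Why", [self.hidden_size, self.output_vocab_size], initializer=initializer)
        bh = tf.get_variable("bh", [self.hidden_size], initializer=initializer)
        by = tf.get_variable("by", [self.output_vocab_size], initializer=initializer)
        hs_t = tf.tanh(tf.matmul(xs_t, Wxh) + tf.matmul(hs_t, Whh) + bh)
        ys.append(tf.matmul(hs_t, Why) + by)                # [batch_size, output_vocab_size]

logits = tf.stack(ys, axis=1)                               # [batch_size, seq_length, output_vocab_size]
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=targets, logits=logits))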
