PyTorch LSTM and cross entropy - python

I am working on sentiment analysis and want to classify the output into 4 classes. For the loss I am using cross-entropy.
The problem is that PyTorch's cross-entropy needs an input of shape (batch_size, output), which I am having trouble producing.
I am using a batch size of 12 and a sequence length of 32.
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, hidden_dim = 256, input_size = 32 , num_layers = 1, num_classes=4, vocab_size = len(vocab_to_int)+1, embedding_dim=100):
        super().__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_classes = num_classes
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers)
        self.fc1 = nn.Linear(hidden_dim, 50)
        self.fc2 = nn.Linear(50, 4)

    def forward(self, x, hidden):
        x = self.embedding(x)
        x = x.view(32, 12, 100)
        x, hidden = self.lstm(x, hidden)
        x = x.contiguous().view(-1, 256)
        x = self.fc1(x) # output shape ([384, 50])
        x = self.fc2(x) # output shape [384, 4]
        return x, hidden

    def init_hidden(self, batch_size=12):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.num_layers, 12, self.hidden_dim).zero_().cuda(), weight.new(self.num_layers, 12, self.hidden_dim).zero_().cuda())
        return hidden

According to the CrossEntropyLoss docs:
input has to be a Tensor of size (C) for unbatched input, (minibatch,C) [for batched input] [...]
The code you provided contains only the RNN class, not the data processing or the actual call to CrossEntropyLoss, but the error you mentioned in the comments makes me think that you didn't reshape the labels tensor to match the output of the neural network. You would then be calculating the loss of a tensor of size (384, 4) against another tensor which I infer is of size (12, 32). Your labels tensor should be of size (384) to match the first dimension of the network output.
Also, you don't have to reshape your tensors manually: you can flatten them after the forward() call with the torch.nn.utils.rnn.pack_padded_sequence() function. If you apply this function to both the output of the neural network and the labels, the packed output has size (384, 4), which PyTorch can handle in the call to CrossEntropyLoss. See the note in the pack_padded_sequence() docs for more details.
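To make the shape requirement concrete, here is a minimal sketch of the manual route. It assumes per-token labels stored as (batch, seq) = (12, 32), which is only what the answer infers; the random tensors are stand-ins for the real network output and labels.

```python
import torch
import torch.nn as nn

seq_len, batch_size, num_classes = 32, 12, 4

# Stand-in for the network output after x.view(-1, 256) and the two linear layers.
# Row index is t * batch_size + b because the LSTM output is (seq, batch, hidden).
logits = torch.randn(seq_len * batch_size, num_classes)           # (384, 4)

# Hypothetical labels stored as (batch, seq) = (12, 32).
labels_bs = torch.randint(0, num_classes, (batch_size, seq_len))

# Transpose to (seq, batch) first so the flattened order matches the logits rows.
labels_flat = labels_bs.t().reshape(-1)                           # (384,)

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, labels_flat)
print(logits.shape, labels_flat.shape, loss.item())
```

If there is only one sentiment label per sequence rather than one per token, the labels cannot be flattened to 384 entries, and the network output would need to be reduced to (12, 4) instead.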

Related

Get Tensor shape at train time in Tensorflow

How do you get the "actual" shape of a tensor at training time? e.g.,
(None, 64) -> (128, 64)
In other words, at training time I get a shape like (None, 64), where None means the first dimension of the tensor is dynamic w.r.t. the input size and 64 is an example value for the second dimension. I assume that at training time the "actual" size of that tensor is known to the framework, so I am wondering how, or whether, I can get the actual size of the tensor, with None evaluated to the train/test/eval dataset size. That is, I would like to get (128, 64) instead of (None, 64), where 128 is the size of the input.
Please consider the following simplified code example.
class ALayer(tensorflow.keras.layers.Layer):
    def call(self, inputs):
        features = tf.matmul(inputs, self.kernel) + self.bias
        # These are the different approaches I've tried.
        print(features.shape)
        # This prints: (None, 64)
        print(tf.shape(features))
        # This prints: Tensor("ALayer/Shape:0", shape=(2,), dtype=int32)
        return features

input_layer = layers.Input(input_dim)
x = ALayer()([input_layer])
x = layers.Dense(1)(x)
model = keras.Model(inputs=[input_layer], outputs=[x])
model.compile()
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, (y_train)))
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, (y_val)))
model.fit(train_dataset, validation_data=val_dataset)
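You should use tf.print. Eager execution is on by default in TF 2.x, but model.fit() runs call() inside a graph, so Python's print only sees the symbolic shape. tf.print runs as part of the graph and prints the actual shape of every batch (tested with TF 2.7):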
import tensorflow as tf

class ALayer(tf.keras.layers.Layer):
    def __init__(self, units=32):
        super(ALayer, self).__init__()
        self.units = units

    def build(self, input_shape):
        self.w = self.add_weight(
            shape=(input_shape[-1], self.units),
            initializer="random_normal",
            trainable=True,
        )
        self.b = self.add_weight(
            shape=(self.units,), initializer="random_normal", trainable=True
        )

    def call(self, inputs):
        features = tf.matmul(inputs, self.w) + self.b
        tf.print('Features shape -->', tf.shape(features), '\n')
        return features

input_layer = tf.keras.layers.Input(shape=(10,))
x = ALayer(10)(input_layer)
x = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=[input_layer], outputs=[x])
model.compile(loss=tf.keras.losses.BinaryCrossentropy())
X_train, y_train = tf.random.normal((64, 10)), tf.random.uniform((64,), maxval=2, dtype=tf.int32)
X_val, y_val = tf.random.normal((64, 10)), tf.random.uniform((64,), maxval=2, dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((X_train, y_train)).batch(32)
val_dataset = tf.data.Dataset.from_tensor_slices((X_val, y_val)).batch(32)
model.fit(train_dataset, validation_data=val_dataset, epochs=1, verbose=0)

Features shape --> [32 10]
Features shape --> [32 10]
Features shape --> [32 10]
Features shape --> [32 10]
<keras.callbacks.History at 0x7fab3ce15910>

Predict new samples with PyTorch model

I am a newbie in neural networks. I have trained my model and now I want to test it. I wrote some code with the help of Google, but it does not work.
The problem is that I do not understand where the 4th dimension comes from.
The code is the following:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

transform = transforms.Compose([
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
main_path = 'D:/RTU/dataset/ready_dataset_2classes'
train_data_path = main_path + '/train'
#test_data_path = main_path + '/test'
weigths_path = 'D:/RTU/dataset/weights_done/weights_noise_original037-97%.pt'
train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=transform)

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 32x32x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layer (500 -> 250)
        self.fc2 = nn.Linear(500, 250)
        # linear layer (250 -> 2)
        self.fc3 = nn.Linear(250, 2)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25) #0.25

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add 2nd hidden layer, with relu activation function
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x

# Disable grad
with torch.no_grad():
    # Retrieve item
    index = 1
    item = train_data[index]
    image = item[0]
    true_target = item[1]
    # Loading the saved model
    mlp = Net()
    optimizer = optim.SGD(mlp.parameters(), lr=0.01)
    epoch=5
    valid_loss_min = np.Inf
    checkpoint = torch.load(weigths_path , map_location=torch.device('cpu'))
    mlp.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    epoch = checkpoint['epoch']
    valid_loss_min = checkpoint['valid_loss_min']
    mlp.eval()
    # Generate prediction
    prediction = mlp(image)
    # Predicted class value using argmax
    predicted_class = np.argmax(prediction)
    # Reshape image
    #image = image.reshape(28, 28, 1)
    # Show result
    plt.imshow(image, cmap='gray')
    plt.title(f'Prediction: {predicted_class} - Actual target: {true_target}')
    plt.show()
The code seems to work up to "mlp.eval()", and then I get the error: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 3-dimensional input of size [3, 32, 32] instead.
What am I doing wrong?
When you train a neural network, you feed small batches of input data to your model. Even though this is not stated explicitly when you write layers in PyTorch, the documentation shows that convolutional layers such as Conv2d receive 4D inputs of shape (N, C, H, W),
with N corresponding to the batch size and C to the number of channels (here 3, because you are using RGB images); H and W are the image height and width.
So when you test your trained model, the test data must be laid out the same way before it is fed into the network.
Thus, if you want to feed a single image to your network, you must reshape it properly:
myimage = myimage.reshape(-1, 3, 32, 32)
print(myimage.shape)
# (1, 3, 32, 32)
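As a small illustration of the same idea, the sketch below adds the batch dimension with unsqueeze(0), which is an alternative to reshape and not something the answer itself uses. The random tensor is a stand-in for one transformed image, and the layer copies the settings of conv1 from the question.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)   # same settings as conv1 in the question
image = torch.randn(3, 32, 32)          # stand-in for one image, laid out as (C, H, W)

batch = image.unsqueeze(0)              # add the batch dimension: (1, 3, 32, 32)
with torch.no_grad():
    out = conv(batch)

print(batch.shape, out.shape)
```

In the question's code this would correspond to calling mlp(image.unsqueeze(0)) instead of mlp(image).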

How many weights does a convolution layer have?

I have a simple convolution network:
import torch.nn as nn
import torch.nn.functional as F

class model(nn.Module):
    def __init__(self, ks=1):
        super(model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=ks, stride=1)
        self.fc1 = nn.Linear(8*8*32*ks, 64)
        self.fc2 = nn.Linear(64, 64)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

cnn = model(1)
Since the kernel size is 1 and the number of output channels is 32, I assumed there would be 32*1*1 weights in this layer. But when I ask PyTorch for the shape of the weight matrix with cnn.conv1.weight.shape, it returns torch.Size([32, 4, 1, 1]). Why should the number of input channels matter for the weights of a Conv2d layer?
Am I missing something?
It matters because you are doing a 2D convolution over images, which means the depth of each filter (kernel) must equal the number of in_channels (PyTorch sets this for you), so the true kernel size is [in_channels, 1, 1]. On the other hand, out_channels is the number of kernels, so the number of weights = number of kernels * size of one kernel = out_channels * (in_channels * kernel_height * kernel_width). Here that is 32 * (4 * 1 * 1) = 128 weights, which matches torch.Size([32, 4, 1, 1]).
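The count can be checked directly on a layer with the same settings as conv1 in the question. This is only a quick sketch; the variable names are illustrative.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=1, stride=1)

# weight layout is (out_channels, in_channels, kernel_height, kernel_width)
out_channels, in_channels, k_h, k_w = conv.weight.shape
n_weights = out_channels * in_channels * k_h * k_w

print(tuple(conv.weight.shape))   # (32, 4, 1, 1)
print(n_weights)                  # 128
print(conv.bias.numel())          # 32 biases, one per output channel
```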

GRU layer works fine but LSTM layer gives ValueError: too many values to unpack (expected 2)

I am using the TensorFlow image captioning tutorial to train a model. It uses a GRU in the decoder, but I want to use an LSTM-based decoder, or in fact a bidirectional LSTM if possible. The GRU works fine, but if I replace it with an LSTM or a bidirectional LSTM, I get the error in the title.
The decoder logic that works with the GRU is given below; I want to use an LSTM or a bidirectional LSTM in place of the GRU.
class RNN_Decoder(tf.keras.Model):
    def __init__(self, embedding_dim, units, vocab_size):
        super(RNN_Decoder, self).__init__()
        self.units = units
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.gru = tf.keras.layers.GRU(self.units,
                                       return_sequences=True,
                                       return_state=True,
                                       recurrent_initializer='glorot_uniform')
        self.fc1 = tf.keras.layers.Dense(self.units)
        self.fc2 = tf.keras.layers.Dense(vocab_size)
        self.attention = BahdanauAttention(self.units)

    def call(self, x, features, hidden):
        # defining attention as a separate model
        context_vector, attention_weights = self.attention(features, hidden)
        # x shape after passing through embedding == (batch_size, 1, embedding_dim)
        x = self.embedding(x)
        # x shape after concatenation == (batch_size, 1, embedding_dim + hidden_size)
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)
        # passing the concatenated vector to the GRU
        output, state = self.gru(x)
        # shape == (batch_size, max_length, hidden_size)
        x = self.fc1(output)
        # x shape == (batch_size * max_length, hidden_size)
        x = tf.reshape(x, (-1, x.shape[2]))
        # output shape == (batch_size * max_length, vocab)
        x = self.fc2(x)
        return x, state, attention_weights

    def reset_state(self, batch_size):
        return tf.zeros((batch_size, self.units))

Tensor output from final layer is of the wrong shape in PyTorch

I am building a sequence-to-label classifier, where the input data are text sequences and output labels are binary. The model is very simple, with GRU hidden layers and a Word Embeddings input layer. I want a [n, 60] input to output a [n, 1] label, but the Torch model returns a [n, 60] output.
My model, with minimal layers:
class Model(nn.Module):
    def __init__(self, weights_matrix, hidden_size, num_layers):
        super(Model, self).__init__()
        self.embedding, num_embeddings, embedding_dim = create_emb_layer(weights_matrix, True)
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.gru = nn.GRU(embedding_dim, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, inp, hidden):
        emb = self.embedding(inp)
        out, hidden = self.gru(emb, hidden)
        out = self.out(out)
        return out, hidden

    def init_hidden(self, batch_size):
        return torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device)
Model Layers:
Model(
  (embedding): Embedding(184901, 100)
  (gru): GRU(100, 60, num_layers=3, batch_first=True)
  (out): Linear(in_features=60, out_features=1, bias=True)
)
Input shapes of my data are: X : torch.Size([64, 60]), and Y : torch.Size([64, 1]), for a single batch of size 64.
When I run the X tensor through the model, it should output a single label, however, the output from the classifier is torch.Size([64, 60, 1]). To run the model, I do the following:
for epoch in range(1):
    running_loss = 0.0
    batch_size = 64
    hidden = model.init_hidden(batch_size)
    for ite, data in enumerate(train_loader, 0):
        x, y = data[:,:-1], data[:,-1].reshape(-1,1)
        optimizer.zero_grad()
        outputs, hidden = model(x, hidden)
        hidden = Variable(hidden.data).to(device)
        loss = criterion(outputs, y)
        loss.backward()
        optimizer.step()
        running_loss = running_loss + loss.item()
        if ite % 2000 == 1999:
            print('[%d %5d] loss: %.3f'%(epoch+1, ite+1, running_loss / 2000))
            running_loss = 0.0
When I print the shape of outputs, it is 64x60x1 rather than 64x1. What I also don't get is how the criterion function is able to calculate the loss when the shapes of outputs and labels are inconsistent. With Tensorflow, this would always throw an error, but it doesn't with Torch.
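The output from your model is of shape torch.Size([64, 60, 1]): 64 is the batch size, and the linear layer is applied at each of the 60 time steps, which gives (60, 1) per sample.
Assuming you're using nn.CrossEntropyLoss, it expects the input to be (N, C) and the target to be (N), where C is the number of classes. It also accepts an input of shape (N, C, d1, ...) together with a target of shape (N, d1, ...).
Your output of shape (64, 60, 1) and your target of shape (64, 1) fit that second form with C = 60 and d1 = 1, so the shapes are consistent and the loss is evaluated.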
For example,
import torch
import torch.nn as nn

outputs = torch.randn(3, 2, 1)
target = torch.empty(3, 1, dtype=torch.long).random_(2)
criterion = nn.CrossEntropyLoss(reduction='mean')
print(outputs)
print(target)
loss = criterion(outputs, target)
print(loss)

# outputs
tensor([[[ 0.5187],
         [ 1.0320]],

        [[ 0.2169],
         [ 2.4480]],

        [[-0.4895],
         [-0.6096]]])
tensor([[0],
        [1],
        [0]])
tensor(0.5731)
Read more here.
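If the goal is one label per sequence, as the question describes, one common option is to keep only the last time step of the GRU output before the linear layer. The sketch below is an assumption about what the asker wants, not part of the answer above: the random tensors stand in for the embedding output and the labels, and BCEWithLogitsLoss is swapped in as a typical choice for binary labels.

```python
import torch
import torch.nn as nn

batch_size, seq_len, embedding_dim, hidden_size = 64, 60, 100, 60

gru = nn.GRU(embedding_dim, hidden_size, num_layers=3, batch_first=True)
head = nn.Linear(hidden_size, 1)

emb = torch.randn(batch_size, seq_len, embedding_dim)   # stand-in for the embedding output
y = torch.randint(0, 2, (batch_size, 1)).float()        # binary labels, shape (64, 1)

out, hidden = gru(emb)        # out: (64, 60, 60), one vector per time step
last_step = out[:, -1, :]     # (64, 60), keep only the final time step
logits = head(last_step)      # (64, 1), one score per sequence

# Assumption: a binary loss on raw logits, in place of CrossEntropyLoss
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, y)
print(out.shape, logits.shape, loss.item())
```

With this change the model output and the labels are both (64, 1), so a shape mismatch can no longer be silently reinterpreted as a class dimension.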
