Multihead attention model - IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

I'm trying to train a multiclass classification model (with 3 classes) using a multihead attention layer and two linear layers with some tabular data, and I'm getting this error:
IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)
I have copied my model/dataset classes and my training loop below; it seems like the error is associated with the data I am passing into my loss function (criterion), which looks like this:
y_pred: tensor([-115.7523, -113.5820, 37.0307], dtype=torch.float64, grad_fn=<SqueezeBackward0>)
and
y: tensor(0).
I am unable to resolve this error, so any help with this would be greatly appreciated.
Here is the dataset and model classes:
class GeneExpressionDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.data = data
        self.features = self.data.iloc[:, 2:].values
        self.labels = self.data.iloc[:, 1].values

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        features = torch.tensor(self.features[idx], dtype=torch.double)
        labels = torch.tensor(self.labels[idx], dtype=torch.long)
        return features, labels
class MultiheadAttention(nn.Module):
    def __init__(self, input_dim, num_heads, dropout_rate):
        super(MultiheadAttention, self).__init__()
        self.input_dim = input_dim
        self.num_heads = num_heads
        self.dropout_rate = dropout_rate
        self.q_linear = nn.Linear(input_dim, input_dim)
        self.k_linear = nn.Linear(input_dim, input_dim)
        self.v_linear = nn.Linear(input_dim, input_dim)
        self.dropout = nn.Dropout(dropout_rate)
        self.out_linear = nn.Linear(input_dim, input_dim)

    def forward(self, query, key, value, mask=None):
        batch_size = query.size(0)
        # Apply linear transformations to obtain query, key, and value representations
        q = self.q_linear(query).view(batch_size, -1, self.num_heads)
        k = self.k_linear(key).view(batch_size, -1, self.num_heads)
        v = self.v_linear(value).view(batch_size, -1, self.num_heads)
        # Compute scaled dot-product attention scores
        scores = torch.matmul(q, k.transpose(1, 2)) / (self.input_dim ** 0.5)
        if mask is not None:
            mask = mask.unsqueeze(1)
            scores = scores.masked_fill(mask == 0, -1e9)
        # Apply softmax to obtain attention weights
        attn_weights = torch.softmax(scores, dim=-1)
        # Apply dropout to the attention weights
        attn_weights = self.dropout(attn_weights)
        # Compute the attention output
        attn_output = torch.matmul(attn_weights, v)
        # Concatenate the attention output from different heads
        attn_output = attn_output.transpose(1, 2).contiguous().view(batch_size, -1, self.num_heads * (self.input_dim // self.num_heads))
        # Apply linear transformation to obtain the final attention output
        out = self.out_linear(attn_output)
        return out
class geneGPT(nn.Module):
    def __init__(self, input_dim, hid_dim, output_dim, num_heads, dropout_rate):
        super().__init__()
        self.attention = MultiheadAttention(input_dim, num_heads, dropout_rate)
        self.fc1 = nn.Linear(num_heads * (input_dim // num_heads), hid_dim)
        self.relu = nn.ReLU()
        self.out = nn.Linear(hid_dim, output_dim)

    def forward(self, x, mask=None):
        x = self.attention(x, x, x, mask)
        x = self.relu(self.fc1(x))
        x = self.out(x)
        return x
and here is the training loop:
print('Training...')
model = geneGPT(INPUT_DIM, HID_DIM, OUTPUT_DIM, NUM_HEADS, DROPOUT_RATE).double().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
for epoch in range(NUM_EPOCHS):
    train_losses = 0.0
    valid_losses = 0.0
    train_accs = 0.0
    valid_accs = 0.0
    for i, (x, y) in enumerate(train_dl):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        y_pred = model(x).squeeze()
        y = y.squeeze()
        print(y_pred, y)
        train_loss = criterion(y_pred, y)
        train_acc = multi_acc(y_pred, y)
        train_loss.backward()
        optimizer.step()
        train_losses += train_loss.item()
        train_accs += train_acc.item()
    for i, (x, y) in enumerate(val_dl):
        x, y = x.to(device), y.to(device)
        y_pred = model(x).squeeze()
        y = y.squeeze()
        valid_loss = criterion(y_pred, y)
        valid_acc = multi_acc(y_pred, y)
        valid_losses += valid_loss.item()
        valid_accs += valid_acc.item()
print("Epoch {}/{} | Loss: {:.4f} | Train Loss:{:.4f} | Valid Loss".format(epoch + 1, NUM_EPOCHS, train_loss / len(train_dl), valid_loss / len(val_dl)))
print("Training Accuracy: {:.4f} | Validation Accuracy: {:.4f}".format(train_accs / len(train_dl), valid_accs / len(val_dl)))
test_accs = 0.0
for i, (x, y) in enumerate(test_dl):
    x, y = x.to(device), y.to(device)
    y_pred = model(x).squeeze()
    y = y.squeeze()
    test_acc = multi_acc(y_pred, y)
    test_accs += test_acc.item()
print("Testing Accuracy: {:.4f}".format(test_accs / len(test_dl)))
torch.save(model.state_dict(), "model.pth")

In your training loop
y_pred = model(x).squeeze()
y = y.squeeze()
you squeeze both tensors, which (with a batch size of 1) drops the batch dimension from both. If you then adjust only the prediction, as in
train_loss = criterion(y_pred.unsqueeze(0), y)
you change the dimension of y_pred again while keeping the dimension of y the same. nn.CrossEntropyLoss expects logits of shape (batch_size, num_classes) and a target of shape (batch_size,), so this relative difference between the dimensions of y_pred and y is what produces errors such as "Expected input batch_size (1) to match target batch_size (0)". Keep the batch dimension on both tensors.
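A minimal sketch of how that loss call could look with the batch dimension kept (this assumes the model, criterion, optimizer and train_dl from the question; the squeeze(1) handling of a possible extra middle dimension in the attention output is an assumption, not part of the original code):
for i, (x, y) in enumerate(train_dl):
    x, y = x.to(device), y.to(device)   # x: (batch, features), y: (batch,)
    optimizer.zero_grad()
    y_pred = model(x)                   # logits; keep the batch dimension
    if y_pred.dim() == 3:               # if an extra middle dim survives the attention block,
        y_pred = y_pred.squeeze(1)      # squeeze only that dim, never dim 0
    # CrossEntropyLoss wants input (batch, num_classes) and target (batch,) of class indices
    train_loss = criterion(y_pred, y)
    train_loss.backward()
    optimizer.step()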

Related

Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead BUT I'm not using numpy here

I get this error in the training loop for this neural network:
class YourModel(torch.nn.Module):
    def __init__(self):
        super(YourModel, self).__init__()
        self.fc1 = nn.Linear(50, 128)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(128, 1)

    def forward(self, x1, x2):
        x = torch.cat((x1, x2), dim=1)
        out = self.fc1(x)
        out = self.sigmoid(out)
        out = self.fc2(out)
        return out

model = YourModel().to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_fn = nn.BCELoss()
My dataloader contains 3 datasets: one with 25 features for 8,000 documents, another with 25 features for 8,000 queries, and the last one with the relation between the two (0 or 1). That's why I'm using a neural network for binary classification. (However, if you know an alternative architecture I'm open to options.)
My batch_size is 1 right now, and here is my training loop:
def train(dataloader, model, loss_fn, optimizer):
    model.train()
    train_loss = 0
    num_batches = len(dataloader)
    all_pred = []
    all_real = []
    for batch, i in enumerate(train_dataloader):  # access to each batch
        i_1 = i[0]
        i_2 = i[1]
        y = i[2].float().view(1, 1)  # find relevance
        #y = torch.clamp(y, min=0, max=1)
        #x = np.hstack((i_1, i_2))
        #x = torch.Tensor(x)
        #x = torch.clamp(x, min=0, max=1)

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        y_pred = model(i_1, i_2).float()
        y_pred = torch.clamp(y_pred, min=0, max=1)
        loss = loss_fn(y_pred, y)

        # Backward pass
        loss.backward()

        # Update the parameters
        optimizer.step()

        train_loss += loss.item()  # sum the loss
        all_pred.append(y_pred)
        all_real.append(y)

        if batch > 0 and batch % 1000 == 0:
            print(f"Partial loss: {train_loss/batch}, F1: {f1_score(all_real, all_pred)}")

    train_loss /= num_batches
    print(f"Total loss: {train_loss}")  # print loss of every epoch
    return train_loss
I'm getting this error: "Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.", but as far as I know I'm not calling numpy on any tensors. And if I use the detach method, I get an error saying that the loss cannot be computed because the tensor does not require grad. So it is pretty much a loop.
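For what it's worth, one common source of this message is handing tensors that are still attached to the autograd graph to sklearn's f1_score, which converts its inputs to NumPy. A minimal, self-contained sketch of keeping detached CPU copies for the metric lists while leaving the tensors used for loss.backward() untouched (the stand-in tensors and the 0.5 threshold are assumptions for illustration):
import torch
from sklearn.metrics import f1_score

all_pred, all_real = [], []

y_pred = torch.rand(1, 1, requires_grad=True)   # stand-in for model(i_1, i_2)
y = torch.tensor([[1.0]])                       # stand-in for the relevance label

# Compute the loss on the attached tensors as before; only keep detached
# CPU copies for the metrics so f1_score never sees a grad-requiring tensor.
all_pred.append(int(y_pred.detach().cpu().item() > 0.5))
all_real.append(int(y.cpu().item()))

print(f1_score(all_real, all_pred, zero_division=0))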

Linear regression using Pytorch

I have a classification problem and I am using PyTorch. My input is a sequence of length 341 and the output is one of three classes {0, 1, 2}. I want to train a linear regression model with PyTorch, and I created the class below, but during training the loss values start as numbers, then become inf and then NaN. I do not know how to fix that. I also tried initializing the weights of the linear model, but it made no difference. Any suggestions?
class regression(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.input_dim = input_dim
        # One layer
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        y_pred = self.linear(x)
        return y_pred

criterion = torch.nn.MSELoss()

def fit(model, data_loader, optim, epochs):
    for epoch in range(epochs):
        for i, (X, y) in enumerate(data_loader):
            X = X.float()
            y = y.unsqueeze(1).float()
            X = Variable(X, requires_grad=True)
            y = Variable(y, requires_grad=True)
            # Make a prediction for the input X
            pred = model(X)
            #loss = (y-pred).pow(2).mean()
            loss = criterion(y, pred)
            optim.zero_grad()
            loss.backward()
            optim.step()
            print(loss)
            print(type(loss))
        # Give some feedback after each 5th pass through the data
        if epoch % 5 == 0:
            print("Epoch", epoch, f"loss: {loss}")
    return None

regnet = regression(input_dim=341)
optim = SGD(regnet.parameters(), lr=0.01)
fit(regnet, data_loader, optim=optim, epochs=5)
pred = regnet(torch.Tensor(test_set.data_info).float())
pred = pred.detach().numpy()
I would additionally suggest replacing MSE with CrossEntropy loss, as it is better suited for multi-class classification problems.
import random

import torch
from torch import nn, optim
from matplotlib import pyplot as plt

# Generate random dataset with your shape to test
# Replace this with your own dataset
data = []
for label in [0, 1, 2]:
    for i in range(1000):
        data.append((torch.rand(341), label))

# train test split
random.shuffle(data)
train, val = data[:1500], data[1500:]

def run_gradient_descent(model, data_train, data_val, batch_size=64, learning_rate=0.01, weight_decay=0, num_epochs=10):
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

    iters, losses = [], []
    iters_sub, train_acc, val_acc = [], [], []

    train_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, shuffle=True)

    # training
    n = 0  # the number of iterations
    for epoch in range(num_epochs):
        for xs, ts in iter(train_loader):
            if len(ts) != batch_size:
                continue
            zs = model(xs)
            loss = criterion(zs, ts)  # compute the total loss
            loss.backward()           # compute updates for each parameter
            optimizer.step()          # make the updates for each parameter
            optimizer.zero_grad()     # a clean up step for PyTorch

            # save the current training information
            iters.append(n)
            losses.append(float(loss) / batch_size)  # compute *average* loss
            if n % 10 == 0:
                iters_sub.append(n)
                train_acc.append(get_accuracy(model, data_train))
                val_acc.append(get_accuracy(model, data_val))

            # increment the iteration number
            n += 1

    # plotting
    plt.title("Training Curve (batch_size={}, lr={})".format(batch_size, learning_rate))
    plt.plot(iters, losses, label="Train")
    plt.xlabel("Iterations")
    plt.ylabel("Loss")
    plt.show()

    plt.title("Training Curve (batch_size={}, lr={})".format(batch_size, learning_rate))
    plt.plot(iters_sub, train_acc, label="Train")
    plt.plot(iters_sub, val_acc, label="Validation")
    plt.xlabel("Iterations")
    plt.ylabel("Accuracy")
    plt.legend(loc='best')
    plt.show()

    return model

def get_accuracy(model, data):
    loader = torch.utils.data.DataLoader(data, batch_size=500)

    correct, total = 0, 0
    for xs, ts in loader:
        zs = model(xs)
        pred = zs.max(1, keepdim=True)[1]  # get the index of the max logit
        correct += pred.eq(ts.view_as(pred)).sum().item()
        total += int(ts.shape[0])
    return correct / total

class MyRegression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(MyRegression, self).__init__()
        # One layer
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

model = MyRegression(341, 3)
run_gradient_descent(model, train, val, batch_size=64, learning_rate=0.01, num_epochs=10)
Because of my reputation I can't comment, so if I were you I would build it like this; I think there is something wrong with the way you set up your Module.
class regression(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(regression, self).__init__()
        # function
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        return self.linear(x)

# define the model
input_dim = 341
output_dim = 3
model = regression(input_dim, output_dim)
# Mean square error
mse = nn.MSELoss()
# Optimization
learning_rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
# train the model
loss_list = []
iteration_number = X
for iteration in range(iteration_number):
    # optimization
    optimizer.zero_grad()
    # forward to get output
    results = model("input_datas_tensor")
    # loss calculation
    loss = mse(results, "outputs_datas_tensor")
    # backward propagation
    loss.backward()
    # updating parameters
    optimizer.step()
    # store loss
    loss_list.append(loss.data)
    if iteration % 5 == 0:
        print("epoch {}, loss {}".format(iteration, loss.data))

RuntimeError: 1D target tensor expected, multi-target not supported Pytorch

I recently shifted to PyTorch from Keras and I am still trying to understand how all this works. Below is the code I have implemented to classify the MNIST dataset using a simple MLP. Just like I used to do in Keras, I have flattened each 28x28 image into a vector of 784 values, and I have also created a one-hot representation of my labels.
In the model I was hoping that, given a vector of 784 values, the model would output a one-hot vector of probabilities, but as soon as my code reaches the loss computation I get the following error:
RuntimeError: 1D target tensor expected, multi-target not supported
Below is my code :
import numpy as np
import matplotlib.pyplot as plt
import torch
import time
from torch import nn, optim
from keras.datasets import mnist
from torch.utils.data import Dataset, DataLoader

RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)
torch.manual_seed(RANDOM_SEED)

# ----------------------------------------------------

class MnistDataset(Dataset):
    def __init__(self, data_size=0):
        (x, y), (_, _) = mnist.load_data()

        x = [i.flatten() for i in x]
        x = np.array(x, dtype=np.float32)

        if data_size < 0 or data_size > len(y):
            assert ("Data size should be between 0 to number of files in the dataset")

        if data_size == 0:
            data_size = len(y)

        self.data_size = data_size

        # picking 'data_size' random samples
        self.x = x[:data_size]
        self.y = y[:data_size]

        # scaling between 0-1
        self.x = (self.x / 255)

        # Creating one-hot representation of target
        y_encoded = []
        for label in y:
            encoded = np.zeros(10)
            encoded[label] = 1
            y_encoded.append(encoded)
        self.y = np.array(y_encoded)

    def __len__(self):
        return self.data_size

    def __getitem__(self, index):
        x_sample = self.x[index]
        label = self.y[index]
        return x_sample, label

# ----------------------------------------------------

num_train_samples = 10000
num_test_samples = 2000

# Each generator returns a single
# sample & its label on each iteration.
mnist_train = MnistDataset(data_size=num_train_samples)
mnist_test = MnistDataset(data_size=num_test_samples)

# Each generator returns a batch of samples on each iteration.
train_loader = DataLoader(mnist_train, batch_size=128, shuffle=True)  # 79 batches
test_loader = DataLoader(mnist_test, batch_size=128, shuffle=True)    # 16 batches

# ----------------------------------------------------

# Defining the Model Architecture
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 100)
        self.act1 = nn.ReLU()
        self.fc2 = nn.Linear(100, 50)
        self.act2 = nn.ReLU()
        self.fc3 = nn.Linear(50, 10)
        self.act3 = nn.Sigmoid()

    def forward(self, x):
        x = self.act1(self.fc1(x))
        x = self.act2(self.fc2(x))
        output = self.act3(self.fc3(x))
        return output

# ----------------------------------------------------

model = MLP()

# Defining optimizer and loss function
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# ----------------------------------------------------

# Training the model
epochs = 10
print("Training Started...")
for epoch in range(epochs):
    for batch_index, (inputs, targets) in enumerate(train_loader):
        optimizer.zero_grad()               # Zero the gradients
        outputs = model(inputs)             # Forward pass
        loss = criterion(outputs, targets)  # Compute the Loss
        loss.backward()                     # Compute the Gradients
        optimizer.step()                    # Update the parameters

    # Evaluating the model
    total = 0
    correct = 0
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, 1)
            total += targets.size(0)
            correct += predicted.eq(targets.data).cpu().sum()
    print('Epoch : {} Test Acc : {}'.format(epoch, (100. * correct / total)))
print("Training Completed Sucessfully")

# ----------------------------------------------------
I also read some other posts related to the same problem, and most of them said that for CrossEntropy loss the target has to be a single number, which totally goes over my head. Can someone please explain a solution? Thank you.
For nn.CrossEntropyLoss you don't need a one-hot representation of the label; you just pass the prediction's logits, whose shape is (batch_size, n_class), and a target vector of shape (batch_size,).
So just pass in the label index vector y instead of the one-hot vector.
Fixed of your code:
class MnistDataset(Dataset):
    def __init__(self, data_size=0):
        (x, y), (_, _) = mnist.load_data()

        x = [i.flatten() for i in x]
        x = np.array(x, dtype=np.float32)

        if data_size < 0 or data_size > len(y):
            assert ("Data size should be between 0 to number of files in the dataset")

        if data_size == 0:
            data_size = len(y)

        self.data_size = data_size

        # picking 'data_size' random samples
        self.x = x[:data_size]
        self.y = y[:data_size]

        # scaling between 0-1
        self.x = (self.x / 255)

        self.y = y  # <--

    def __len__(self):
        return self.data_size

    def __getitem__(self, index):
        x_sample = self.x[index]
        label = self.y[index]
        return x_sample, label
Take a look at Pytorch example for more detail:
https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
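As a quick shape check, here is a minimal sketch based on the documentation linked above (the batch size of 128 simply mirrors the DataLoader in the question):
import torch
from torch import nn

criterion = nn.CrossEntropyLoss()

logits = torch.randn(128, 10)           # (batch_size, n_class) raw scores, no softmax needed
targets = torch.randint(0, 10, (128,))  # (batch_size,) integer class indices, not one-hot

loss = criterion(logits, targets)
print(loss.item())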

PyTorch: how can I make the tensors model(x) and answer(x) the same size?

I'm trying to make a simple linear model to predict the parameters of the formula
y = 3*x1 + x2 - 2*x3
Unfortunately, there is a problem when I try to compute the loss.
def answer(x):
    return 3 * x[:, 0] + x[:, 1] - 2 * x[:, 2]

def loss_f(x):
    y = answer(x)
    y_hat = model(x)
    loss = ((y - y_hat).pow(2)).sum() / x.size(0)
    return loss
When I set batch_size = 3, the sizes of the two results are different:
x = torch.randn(3,3)
answer(x)
tensor([ 2.0201, -3.8354,  2.0059])
model(x)
tensor([[ 0.2085],
        [-0.0670],
        [-1.3635]], grad_fn=<ThAddmmBackward>)
answer(x.data).size()
torch.Size([3])
model(x.data).size()
torch.Size([3, 1])
I think broadcasting is applied automatically in
loss = ((y - y_hat).pow(2)).sum() / x.size(0)
How can I make the two tensors the same size? Thanks
This is my code
import torch
import torch.nn as nn
import torch.optim as optim

class model(nn.Module):
    def __init__(self, input_size, output_size):
        super(model, self).__init__()
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, x):
        y = self.linear(x)
        return y

model = model(3, 1)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.1)

print('Parameters : ')
for p in model.parameters():
    print(p)
print('')
print('Optimizer : ')
print(optimizer)

def generate_data(batch_size):
    x = torch.randn(batch_size, 3)
    return x

def answer(x):
    return 3 * x[:, 0] + x[:, 1] - 2 * x[:, 2]

def loss_f(x):
    y = answer(x)
    y_hat = model(x)
    loss = ((y - y_hat).pow(2)).sum() / x.size(0)
    return loss

x = torch.randn(3, 3)
print(x)
x = torch.FloatTensor(x)

batch_size = 3
epoch_n = 1000
iter_n = 100

for epoch in range(epoch_n):
    avg_loss = 0
    for i in range(iter_n):
        x = torch.randn(batch_size, 3)

        optimizer.zero_grad()
        loss = loss_f(x.data)
        loss.backward()
        optimizer.step()

        avg_loss += loss
    avg_loss = avg_loss / iter_n

    x_valid = torch.FloatTensor([[1, 2, 3]])
    y_valid = answer(x_valid)

    model.eval()
    y_hat = model(x_valid)
    model.train()

    print(avg_loss, y_valid.data[0], y_hat.data[0])

    if avg_loss < 0.001:
        break
You can use Tensor.view
https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view
So something like
answer(x.data).view(-1, 1)
should do the trick.
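A minimal sketch of the adjusted loss function following that suggestion (squeezing the model output to shape (batch,) instead would work just as well):
def loss_f(x):
    y = answer(x).view(-1, 1)   # (batch,) -> (batch, 1), matching model(x)
    y_hat = model(x)            # (batch, 1)
    loss = ((y - y_hat).pow(2)).sum() / x.size(0)
    return loss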

PyTorch loss value does not change

I wrote a module based on this article: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
The idea is to pass the input through multiple streams, then concatenate the outputs and connect them to an FC layer. I divided my source code into 3 custom modules: TextClassifyCnnNet >> FlatCnnLayer >> FilterLayer
FilterLayer:
class FilterLayer(nn.Module):
    def __init__(self, filter_size, embedding_size, sequence_length, out_channels=128):
        super(FilterLayer, self).__init__()

        self.model = nn.Sequential(
            nn.Conv2d(1, out_channels, (filter_size, embedding_size)),
            nn.ReLU(inplace=True),
            nn.MaxPool2d((sequence_length - filter_size + 1, 1), stride=1)
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))

    def forward(self, x):
        return self.model(x)
FlatCnnLayer:
class FlatCnnLayer(nn.Module):
    def __init__(self, embedding_size, sequence_length, filter_sizes=[3, 4, 5], out_channels=128):
        super(FlatCnnLayer, self).__init__()

        self.filter_layers = nn.ModuleList(
            [FilterLayer(filter_size, embedding_size, sequence_length, out_channels=out_channels) for
             filter_size in filter_sizes])

    def forward(self, x):
        pools = []
        for filter_layer in self.filter_layers:
            out_filter = filter_layer(x)
            # reshape from (batch_size, out_channels, h, w) to (batch_size, h, w, out_channels)
            pools.append(out_filter.view(out_filter.size()[0], 1, 1, -1))
        x = torch.cat(pools, dim=3)

        x = x.view(x.size()[0], -1)
        x = F.dropout(x, p=dropout_prob, training=True)

        return x
TextClassifyCnnNet (main module):
class TextClassifyCnnNet(nn.Module):
    def __init__(self, embedding_size, sequence_length, num_classes, filter_sizes=[3, 4, 5], out_channels=128):
        super(TextClassifyCnnNet, self).__init__()

        self.flat_layer = FlatCnnLayer(embedding_size, sequence_length, filter_sizes=filter_sizes,
                                       out_channels=out_channels)

        self.model = nn.Sequential(
            self.flat_layer,
            nn.Linear(out_channels * len(filter_sizes), num_classes)
        )

    def forward(self, x):
        x = self.model(x)
        return x


def fit(net, data, save_path):
    if torch.cuda.is_available():
        net = net.cuda()

    for param in list(net.parameters()):
        print(type(param.data), param.size())

    optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)

    X_train, X_test = data['X_train'], data['X_test']
    Y_train, Y_test = data['Y_train'], data['Y_test']
    X_valid, Y_valid = data['X_valid'], data['Y_valid']

    n_batch = len(X_train) // batch_size

    for epoch in range(1, n_epochs + 1):  # loop over the dataset multiple times
        net.train()
        start = 0
        end = batch_size

        for batch_idx in range(1, n_batch + 1):
            # get the inputs
            x, y = X_train[start:end], Y_train[start:end]
            start = end
            end = start + batch_size

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            predicts = _get_predict(net, x)
            loss = _get_loss(predicts, y)
            loss.backward()
            optimizer.step()

            if batch_idx % display_step == 0:
                print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                    epoch, batch_idx * len(x), len(X_train), 100. * batch_idx / (n_batch + 1), loss.data[0]))

        # print statistics
        if epoch % display_step == 0 or epoch == 1:
            net.eval()
            valid_predicts = _get_predict(net, X_valid)
            valid_loss = _get_loss(valid_predicts, Y_valid)
            valid_accuracy = _get_accuracy(valid_predicts, Y_valid)
            print('\r[%d] loss: %.3f - accuracy: %.2f' % (epoch, valid_loss.data[0], valid_accuracy * 100))

    print('\rFinished Training\n')

    net.eval()

    test_predicts = _get_predict(net, X_test)
    test_loss = _get_loss(test_predicts, Y_test).data[0]
    test_accuracy = _get_accuracy(test_predicts, Y_test)
    print('Test loss: %.3f - Test accuracy: %.2f' % (test_loss, test_accuracy * 100))

    torch.save(net.flat_layer.state_dict(), save_path)


def _get_accuracy(predicts, labels):
    predicts = torch.max(predicts, 1)[1].data[0]
    return np.mean(predicts == labels)


def _get_predict(net, x):
    # wrap them in Variable
    inputs = torch.from_numpy(x).float()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available:
        inputs = inputs.cuda()
    inputs = Variable(inputs)
    return net(inputs)


def _get_loss(predicts, labels):
    labels = torch.from_numpy(labels).long()
    # convert to cuda tensors if cuda flag is true
    if torch.cuda.is_available:
        labels = labels.cuda()
    labels = Variable(labels)
    return F.cross_entropy(predicts, labels)
It seems that the parameters are only updated slightly each epoch, and the accuracy stays the same throughout the whole run, while with the same implementation and the same parameters in TensorFlow it runs correctly.
I'm new to PyTorch, so maybe my code has something wrong; please help me find out. Thank you!
P.S.: I tried using F.nll_loss + F.log_softmax instead of F.cross_entropy. Theoretically it should return the same result, but in fact a different result is printed out (and it is still a wrong loss value).
I have seen that in your original code the weight_decay term is set to 0.1. weight_decay is used to regularize the network's parameters, and a value that large may be too strong, so the regularization overwhelms the training. Try reducing the value of weight_decay.
For convolutional neural networks in computer vision tasks, the weight_decay term is usually set to 5e-4 or 5e-5. I am not familiar with text classification; these values may work for you out of the box, or you may have to tweak them a little by trial and error.
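For instance, a minimal sketch of the optimizer line with a much smaller penalty (5e-4 is only the rule of thumb mentioned above, not a tuned value for this model):
# weaker L2 regularization so it no longer dominates the parameter updates
optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=5e-4)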
Let me know if it works for you.
I realised that the L2 penalty (weight_decay) in the Adam optimizer was making the loss value stay unchanged (I haven't tried other optimizers yet). It works when I remove it:
# optimizer = optim.Adam(net.parameters(), lr=0.01, weight_decay=0.1)
optimizer = optim.Adam(model.parameters(), lr=0.001)
=== UPDATE (See above answer for more detail!) ===
self.features = nn.Sequential(self.flat_layer)
self.classifier = nn.Linear(out_channels * len(filter_sizes), num_classes)
...
optimizer = optim.Adam([
{'params': model.features.parameters()},
{'params': model.classifier.parameters(), 'weight_decay': 0.1}
], lr=0.001)
In my case I was facing the same problem. On my laptop without a GPU the training was fine. When I tried it on a GPU, the model's accuracy and loss did not change after the first epoch. I was using nn.CrossEntropyLoss() with Adam.
Changing Adam to SGD worked for me.
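A minimal sketch of that swap (the learning rate and momentum values here are assumptions, not the original settings):
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)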
I am sharing this in case anyone else runs into it.
