I have this very simple resnet18 network that I am trying to train from scratch for the task of landmark estimation (I have 4 landmarks):
num_classes = 4 * 2 # 4 landmarks, each an (x, y) coordinate pair, flattened to 8 values
class Network(nn.Module):
def __init__(self,num_classes=8):
super().__init__()
self.model_name = 'resnet18'
self.model = models.resnet18()
self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)
def forward(self, x):
x = x.float()
out = self.model(x)
return out
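As a quick sanity check (a hypothetical smoke test, not part of the original post), the replaced head should map an NCHW batch to 8 values per image, one (x, y) pair per landmark:

import torch
net = Network()
dummy = torch.randn(2, 3, 224, 224)  # NCHW batch, as resnet18 expects
print(net(dummy).shape)              # torch.Size([2, 8])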
For the following piece of code:
network = Network()
network.cuda()
criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)
loss_min = np.inf
num_epochs = 1
start_time = time.time()
for epoch in range(1,num_epochs+1):
loss_train = 0
loss_test = 0
running_loss = 0
network.train()
print('size of train loader is: ', len(train_loader))
for step in range(1,len(train_loader)+1):
batch = next(iter(train_loader))
images, landmarks = batch['image'], batch['landmarks']
#RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[64, 600, 800, 3] to have 3 channels, but got 600 channels instead
#using permute to fix the above error
images = images.permute(0,3,1,2)
images = images.cuda()
landmarks = landmarks.view(landmarks.size(0),-1).cuda()
##images = torchvision.transforms.Normalize(images) #find the args later
##landmarks = torchvision.transforms.Normalize(landmarks) #find the args later
predictions = network(images)
# clear all the gradients before calculating them
optimizer.zero_grad()
print('predictions are: ', predictions.float())
print('landmarks are: ', landmarks.float())
# find the loss for the current step
loss_train_step = criterion(predictions.float(), landmarks.float())
loss_train_step = loss_train_step.to(torch.float32)
print("loss_train_step before backward: ", loss_train_step)
# calculate the gradients
loss_train_step.backward()
# update the parameters
optimizer.step()
print("loss_train_step after backward: ", loss_train_step)
loss_train += loss_train_step.item()
print("loss_train: ", loss_train)
running_loss = loss_train/step
print('step: ', step)
print('running loss: ', running_loss)
print_overwrite(step, len(train_loader), running_loss, 'train')
network.eval()
with torch.no_grad():
for step in range(1,len(test_loader)+1):
batch = next(iter(train_loader))
images, landmarks = batch['image'], batch['landmarks']
images = images.permute(0,3,1,2)
images = images.cuda()
landmarks = landmarks.view(landmarks.size(0),-1).cuda()
predictions = network(images)
# find the loss for the current step
loss_test_step = criterion(predictions, landmarks)
loss_test += loss_test_step.item()
running_loss = loss_test/step
print_overwrite(step, len(test_loader), running_loss, 'Validation')
loss_train /= len(train_loader)
loss_test /= len(test_loader)
print('\n--------------------------------------------------')
print('Epoch: {} Train Loss: {:.4f} Valid Loss: {:.4f}'.format(epoch, loss_train, loss_test))
print('--------------------------------------------------')
if loss_test < loss_min:
loss_min = loss_test
torch.save(network.state_dict(), '../moth_landmarks.pth')
print("\nMinimum Valid Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
print('Model Saved\n')
print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time()-start_time))
I get NaN as well as very large MSE losses (output shown for one step only):
size of train loader is: 90
predictions are: tensor([[-0.0380, -0.1871, 0.0729, -0.3570, -0.2153, 0.3066, 1.1273, -0.0558],
[-0.0316, -0.1876, 0.0317, -0.3613, -0.2333, 0.3023, 1.0940, -0.0665],
[-0.0700, -0.1882, 0.0068, -0.3201, -0.1884, 0.2953, 1.0516, -0.0567],
[-0.0844, -0.2009, 0.0573, -0.3166, -0.2597, 0.3127, 1.0343, -0.0573],
[-0.0486, -0.2333, 0.0535, -0.3245, -0.2310, 0.2818, 1.0590, -0.0716],
[-0.0240, -0.1989, 0.0572, -0.3135, -0.2435, 0.2912, 1.0612, -0.0560],
[-0.0942, -0.2439, 0.0277, -0.3147, -0.2368, 0.2978, 1.0110, -0.0874],
[-0.0356, -0.2285, 0.0064, -0.3179, -0.2432, 0.3083, 1.0300, -0.0756]],
device='cuda:0', grad_fn=<AddmmBackward>)
landmarks are: tensor([[501.9200, 240.1600, 691.0000, 358.0000, 295.0000, 294.0000, 488.6482,
279.6466],
[495.6300, 246.0600, 692.0000, 235.0000, 286.0000, 242.0000, 464.0000,
339.0000],
[488.7100, 240.8900, 613.4007, 218.3425, 281.0000, 220.0000, 415.9966,
338.4796],
[502.5721, 245.4983, 640.0000, 131.0000, 360.0000, 143.0000, 542.9840,
321.8463],
[505.1393, 246.4364, 700.0000, 306.0000, 303.0000, 294.0000, 569.6925,
351.8367],
[501.0900, 244.0100, 724.0000, 251.0000, 302.0000, 276.0000, 504.6415,
291.7443],
[495.9500, 244.2800, 608.0000, 127.0000, 323.0000, 166.0000, 491.0000,
333.0000],
[490.2500, 241.3400, 699.0000, 304.0000, 398.6197, 313.8339, 429.1374,
303.8483]], device='cuda:0')
loss_train_step before backward: tensor(166475.6875, device='cuda:0', grad_fn=<MseLossBackward>)
loss_train_step after backward: tensor(166475.6875, device='cuda:0', grad_fn=<MseLossBackward>)
loss_train: 166475.6875
step: 1
running loss: 166475.6875
Besides Network, the other thing I am suspicious of is the transforms. Here are the transforms I am using:
transformed_dataset = MothLandmarksDataset(csv_file='moth_gt.csv',
root_dir='.',
transform=transforms.Compose(
[
Rescale(256),
RandomCrop(224),
ToTensor()#,
##transforms.Normalize(mean = [ 0.485, 0.456, 0.406 ],
## std = [ 0.229, 0.224, 0.225 ])
]
)
)
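As an editorial aside (not from the original post): torchvision's Normalize operates on a single image tensor, while this Compose passes around {'image': ..., 'landmarks': ...} sample dicts, so dropping transforms.Normalize straight into the list above would fail. A minimal dict-aware wrapper, with illustrative names, might look like:

class NormalizeImage(object):
    # Apply transforms.Normalize to the image and pass the landmarks through.
    def __init__(self, mean, std):
        self.norm = transforms.Normalize(mean=mean, std=std)
    def __call__(self, sample):
        return {'image': self.norm(sample['image']),
                'landmarks': sample['landmarks']}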
How can I fix it? Even with Normalize enabled I still get the same problem (NaN and very large loss values, since the predictions are very far off).
How can I debug why the predictions are so far off?
I think there is a problem with the way you are using DataLoader. iter(train_loader) creates an iterator over the data loader, and calling next on it gives you the next batch from the dataset. But you are calling next(iter(train_loader)) in each iteration, which creates a brand-new iterator from train_loader every time and returns its first batch. So this way you end up training (and validating) on the same batch in each iteration. Even if your data loader shuffles the dataset each time, you will end up training on some random samples from the dataset (since each time only the first random batch is used) rather than covering the complete dataset. Try changing your code so that you create iter(train_loader) only once and call next in each iteration:
train_iter = iter(train_loader)
for step in range(1, len(train_loader)+1):
batch = next(train_iter)
# now batch contains (step)th batch (assuming 1-based indexing)
Or, even better, change your for loop to iterate over train_loader itself:
for step, batch in enumerate(train_loader):
# now batch contains (step)th batch (assuming 0-based indexing)
A similar change is also required for the validation/evaluation loop.
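Applying this to the training loop from the question, a minimal sketch (variable names as in the original code):

network.train()
loss_train = 0
for step, batch in enumerate(train_loader, start=1):
    images, landmarks = batch['image'], batch['landmarks']
    images = images.permute(0, 3, 1, 2).cuda()               # NHWC -> NCHW
    landmarks = landmarks.view(landmarks.size(0), -1).cuda()
    optimizer.zero_grad()
    predictions = network(images)
    loss_train_step = criterion(predictions.float(), landmarks.float())
    loss_train_step.backward()
    optimizer.step()
    loss_train += loss_train_step.item()
    print_overwrite(step, len(train_loader), loss_train / step, 'train')

Let me know if this resolves your problem or helps in any way.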
Related
My data is divided into two parts, training and validation. I used the load_dataset and DataLoader functions, and I convert the data in the dataset to torch format using traindataset.set_format.
When starting training, I got the error
new(): invalid data type 'numpy.str_'
on this line:
for step, batch in enumerate(train_dataloader):
So how can I fix this error?
model= MixModel()
#model.load_state_dict(torch.load(r"/media/sh/saved_weightscnnbert.pt"))
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
traindataset = load_dataset('csv', data_files='/content/drive//My Drive/Colab Notebooks/newdataset/newdata_train2',split='train')
testdataset = load_dataset('csv', data_files='/content/drive//My Drive/Colab Notebooks/newdataset/newdata_valid2',split='train')
traindataset =traindataset.map(encode)
testdataset1 = testdataset.map(encode)
traindataset =traindataset.map(lambda examples: {'labels': examples['symptoms']}, batched=True)
testdataset =testdataset1.map(lambda examples: {'labels': examples['symptoms']}, batched=True)
traindataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
testdataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
train_dataloader = torch.utils.data.DataLoader(traindataset, batch_size= 64)
test_dataloader = torch.utils.data.DataLoader(testdataset, batch_size= 64)
# function to train the model
def train():
model.train()
total_loss, total_accuracy = 0, 0
# empty list to save model predictions
total_preds=[]
Labels=[]
# iterate over batches
for step,batch in enumerate(train_dataloader):
# progress update after every 100 batches.
if step % 100 == 0 and not step == 0:
print(' Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))
sent_id, mask, labels = batch['input_ids'],batch['attention_mask'],batch['labels']
# clear previously calculated gradients
model.zero_grad()
# get model predictions for the current batch
preds = model(sent_id, mask, labels)
# compute the loss between actual and predicted values
alpha=0.25
gamma=2
ce_loss = loss_fn(preds, labels)
#pt = torch.exp(-ce_loss)
#focal_loss = (alpha * (1-pt)**gamma * ce_loss).mean() # mean over the batch
# add on to the total loss
total_loss = total_loss + ce_loss.item()
# backward pass to calculate the gradients
ce_loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
# update parameters
optimizer.step()
preds =torch.argmax(preds, dim=1)
total_preds.append(preds)
total_accuracy += (preds == labels).float().sum()
# compute the training loss of the epoch
avg_loss = total_loss / len(train_dataloader)
avg_accuracy = total_accuracy / len(traindataset)
# total_preds is a list of per-batch class-index tensors;
# concatenate them into a single array of shape (number of samples,)
total_preds = np.concatenate(total_preds, axis=0)
#returns the loss and predictions
return avg_loss, total_preds, avg_accuracy
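A hedged debugging aside (editorial, not part of the original thread): new(): invalid data type 'numpy.str_' typically means one of the columns selected in set_format still holds Python strings, which cannot be packed into a torch tensor. Inspecting the raw columns before calling set_format can confirm whether 'symptoms' (copied into 'labels') is still a string that needs mapping to integer class ids:

print(traindataset.features)        # per-column feature types
print(traindataset[0]['symptoms'])  # a raw string here would explain the error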
I have a custom-written model using PyTorch. However, when I try to use .to(device) to train on the GPU, I come across this error. The model does run on the CPU, but it is slow. Can anyone give me some insight into how to solve this?
I moved the inputs to the GPU as well and still receive this error. The second picture shows the initial part of the error; the error from the title appears on its last line.
[Image: model train loop]
[Image: initial part of the error]
def build_network(dim_val, dim_attn, input_size, dec_seq_len, output_sequence_length, n_decoder_layers, n_encoder_layers, n_heads):
# Input Size = 1
t = Transformer(dim_val, dim_attn, input_size ,dec_seq_len, output_sequence_length, n_decoder_layers, n_encoder_layers, n_heads)
return t
def build_optimizer(model ,learning_rate):
optimizer = torch.optim.Adam(model.parameters(), learning_rate)
return optimizer
Initialize network
t = build_network(2, 2, 1, 2, 2, 2, 2, 2).to(device)
Initialize Optimizer
optimizer = build_optimizer(t, 0.001)
MAPE Formula
def mape(actual, pred):
actual, pred = np.array(actual), np.array(pred)
return round(np.mean(np.abs((actual - pred) / actual)) * 100,2)
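For instance, a quick hypothetical check of the formula:

print(mape([100, 200], [110, 190]))  # 0.10 and 0.05 average to 0.075, so this prints 7.5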
for e in range(2):
start_time = time.time()
# Batch Train Loss
batch_losses = []
batch_mape = []
# Batch Test Loss
val_batch_loss = []
val_batch_mape = []
################################################ Train Batch Loop ################################################
for test_x, test_y in train_loader:
X = test_x.to(device)
Y = test_y.to(device)
# Zero Grad at start of loop
optimizer.zero_grad()
# Forward pass and calculate loss
net_out = t(X)
## Loss MSE
loss = torch.mean((net_out - Y) ** 2)
# Train MAPE
train_mape = mape(np.abs((Y.detach().numpy())),
net_out.detach().numpy())
# Backwards pass
loss.backward()
optimizer.step()
# Append the Batch Loss
batch_losses.append(loss.item())
# Append the Batch MAPE
batch_mape.append(train_mape)
################################################ Validation Batch Loop ################################################
for test_x, test_y in val_loader:
X_Test = test_x.to(device)
Y_Test = test_y.to(device)
#test_values = torch.from_numpy(Transformer_Xtest)
val = t(X_Test)
# Val Loss
validation_loss = torch.mean((val - Y_Test) ** 2)
# Append the Batch Loss
val_batch_loss.append(validation_loss.item())
# Val MAPE
validation_mape = mape(np.abs((Y_Test.detach().numpy())),
val.detach().numpy())
# Append the Batch MAPE
val_batch_mape.append(validation_mape)
######## Train Epoch Summary
epoch_train_loss = round(np.mean(batch_losses), 5)
epoch_train_mape = round(np.mean(batch_mape), 5)
######## Val Epoch Summary
epoch_val_loss = round(np.mean(val_batch_loss), 5)
epoch_val_mape = round(np.mean(val_batch_mape), 5)
show_print = print("Epoch: ", e+1, "epoch_train_loss:", epoch_train_loss, "epoch_train_mape: ", epoch_train_mape, "epoch_val_loss:", epoch_val_loss, "epoch_val_mape:", epoch_val_mape,"--- Elapsed Time Seconds ---", round((time.time() - start_time), 3) )
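One hedged observation about this loop (editorial; the full traceback exists only in the screenshots): once the tensors are on the GPU, .numpy() cannot be called on them directly, so the MAPE lines would need an explicit .cpu() first, for example:

train_mape = mape(np.abs(Y.detach().cpu().numpy()),
                  net_out.detach().cpu().numpy())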
I am working on a multilabel text classification task with BERT.
The following is the code for generating an iterable Dataset.
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
train_set = TensorDataset(X_train_id,X_train_attention, y_train)
test_set = TensorDataset(X_test_id,X_test_attention,y_test)
train_dataloader = DataLoader(
train_set,
sampler = RandomSampler(train_set),
drop_last=True,
batch_size=13
)
test_dataloader = DataLoader(
test_set,
sampler = SequentialSampler(test_set),
drop_last=True,
batch_size=13
)
The following are the dimensions of the training set:
In[]
print(X_train_id.shape)
print(X_train_attention.shape)
print(y_train.shape)
Out[]
torch.Size([262754, 512])
torch.Size([262754, 512])
torch.Size([262754, 34])
There are 262754 rows, each with 512 columns. The output should predict values over the 34 possible labels. I am breaking the data down into batches of 13.
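A quick sanity check on the loader, consistent with these shapes (a sketch against the loaders defined above):

ids, attn, labels = next(iter(train_dataloader))
# expected: torch.Size([13, 512]) torch.Size([13, 512]) torch.Size([13, 34]);
# drop_last=True discards the final short batch
print(ids.shape, attn.shape, labels.shape)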
Training code
optimizer = AdamW(model.parameters(), lr=2e-5)
# Training
def train(model):
model.train()
train_loss = 0
for batch in train_dataloader:
b_input_ids = batch[0].to(device)
b_input_mask = batch[1].to(device)
b_labels = batch[2].to(device)
optimizer.zero_grad()
loss, logits = model(b_input_ids,
token_type_ids=None,
attention_mask=b_input_mask,
labels=b_labels)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
optimizer.step()
train_loss += loss.item()
return train_loss
# Testing
def test(model):
model.eval()
val_loss = 0
with torch.no_grad():
for batch in test_dataloader:
b_input_ids = batch[0].to(device)
b_input_mask = batch[1].to(device)
b_labels = batch[2].to(device)
with torch.no_grad():
(loss, logits) = model(b_input_ids,
token_type_ids=None,
attention_mask=b_input_mask,
labels=b_labels)
val_loss += loss.item()
return val_loss
# Train task
max_epoch = 1
train_loss_ = []
test_loss_ = []
for epoch in range(max_epoch):
train_ = train(model)
test_ = test(model)
train_loss_.append(train_)
test_loss_.append(test_)
Out[]
Expected input batch_size (13) to match target batch_size (442).
This is the description of my model:
from transformers import BertForSequenceClassification, AdamW, BertConfig
model = BertForSequenceClassification.from_pretrained(
    "cl-tohoku/bert-base-japanese-whole-word-masking", # Japanese pre-trained model
    num_labels = 34,
    output_attentions = False,
    output_hidden_states = False,
)
I have clearly stated that I want the batch size to be 13; however, during training PyTorch throws the runtime error above. Where is the number 442 even coming from? I asked for each batch to have 13 rows.
I have already confirmed that each batch has input_id with dimensions [13,512], attention tensor with dimensions [13,512], and labels with dimensions [13,34].
I have tried caving in and using a batch size of 442 when initializing the DataLoader, but after a single batch iteration it throws another ValueError saying the input batch size does not match the target batch size, this time showing:
ValueError: Expected input batch_size (442) to match target batch_size (15028).
Why does the batch size keep on changing? Where is the number 15028 even coming from?
The following are some of the answers I have looked through, but I had no luck applying them to my source code:
https://discuss.pytorch.org/t/valueerror-expected-input-batch-size-324-to-match-target-batch-size-4/24498
https://discuss.pytorch.org/t/valueerror-expected-input-batch-size-1-to-match-target-batch-size-64/43071
Pytorch CNN error: Expected input batch_size (4) to match target batch_size (64)
Thanks in advance. Your support is truly appreciated :)
It looks like this model does not handle the multi-target scenario, according to the documentation:
labels (torch.LongTensor of shape (batch_size,), optional) – Labels for computing the sequence classification/regression loss. Indices should be in [0, ..., config.num_labels - 1]. If config.num_labels == 1 a regression loss is computed (Mean-Square loss), If config.num_labels > 1 a classification loss is computed (Cross-Entropy).
So, you need to prepare your labels to have the shape torch.Size([batch_size]), with each entry a class index in the range [0, ..., config.num_labels - 1], just like for PyTorch's original CrossEntropyLoss (see its examples section). That is also where the mysterious numbers come from: 442 = 13 × 34, i.e. your (13, 34) label tensor flattened into one target per element, and with a batch size of 442 the (442, 34) labels likewise give 442 × 34 = 15028.
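Assuming each row of y_train is a one-hot (single-label) encoding over the 34 classes (an assumption, since the post says multilabel), a minimal sketch of that conversion inside the training loop would be:

b_labels = batch[2].to(device)     # one-hot rows, shape [13, 34]
b_labels = b_labels.argmax(dim=1)  # class indices, shape [13], values in [0, 33]

If several labels per row can be active at once, this model's built-in cross-entropy will not fit, and a multi-label loss such as BCEWithLogitsLoss on the raw logits would be needed instead.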
I am very new to PyTorch and am implementing my own image classifier network. However, for each epoch I see that the training accuracy is very good while the validation accuracy is 0 (I observed this up to the 5th epoch). I am using the Adam optimizer with a learning rate of .001, and I resample the whole dataset into training and validation sets after each epoch. Please help me find where I am going wrong.
Here is my code:
### where is data?
data_dir_train = '/home/sup/PycharmProjects/deep_learning/CNN_Data/training_set'
data_dir_test = '/home/sup/PycharmProjects/deep_learning/CNN_Data/test_set'
# Define your batch_size
batch_size = 64
allData = datasets.ImageFolder(root=data_dir_train,transform=transformArr)
# We need to further split our training dataset into training and validation sets.
def split_train_validation():
# Define the indices
num_train = len(allData)
indices = list(range(num_train)) # start with all the indices in training set
split = int(np.floor(0.2 * num_train)) # define the split size
#train_idx, valid_idx = indices[split:], indices[:split]
# Random, non-contiguous split
validation_idx = np.random.choice(indices, size=split, replace=False)
train_idx = list(set(indices) - set(validation_idx))
# define our samplers -- we use a SubsetRandomSampler because it will return
# a random subset of the split defined by the given indices without replacement
train_sampler = SubsetRandomSampler(train_idx)
validation_sampler = SubsetRandomSampler(validation_idx)
#train_loader = DataLoader(allData,batch_size=batch_size,sampler=train_sampler,shuffle=False,num_workers=4)
#validation_loader = DataLoader(dataset=allData,batch_size=1, sampler=validation_sampler)
return (train_sampler,validation_sampler)
Training
from torch.optim import Adam
import torch
import createNN
import torch.nn as nn
import loadData as ld
from torch.autograd import Variable
from torch.utils.data import DataLoader
# check if cuda - GPU support available
cuda = torch.cuda.is_available()
#create model, optimizer and loss function
model = createNN.ConvNet(class_num=2)
optimizer = Adam(model.parameters(),lr=.001,weight_decay=.0001)
loss_func = nn.CrossEntropyLoss()
if cuda:
model.cuda()
# function to save model
def save_model(epoch):
torch.save(model.load_state_dict(),'imageClassifier_{}.model'.format(epoch))
print('saved model at epoch',epoch)
def exp_lr_scheduler ( epoch , init_lr = args.lr, weight_decay = args.weight_decay, lr_decay_epoch = cf.lr_decay_epoch):
lr = init_lr * ( 0.5 ** (epoch // lr_decay_epoch))
def train(num_epochs):
best_acc = 0.0
for epoch in range(num_epochs):
print('\n\nEpoch {}'.format(epoch))
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
model.train()
acc = 0.0
loss = 0.0
total = 0
# train model with training data
for i,(images,labels) in enumerate(train_loader):
# if cuda then move to GPU
if cuda:
images = images.cuda()
labels = labels.cuda()
# Variable class wraps a tensor and we can calculate grad
images = Variable(images)
labels = Variable(labels)
# reset accumulated gradients for each batch
optimizer.zero_grad()
# pass images to the model, which returns a prediction
output = model(images)
#calculate the loss based on prediction and actual
loss = loss_func(output,labels)
# backpropagate the loss and compute gradient
loss.backward()
# update weights as per the computed gradients
optimizer.step()
# prediction class
predVal , predClass = torch.max(output.data, 1)
acc += torch.sum(predClass == labels.data)
loss += loss.cpu().data[0]
total += labels.size(0)
# print the statistics
train_acc = acc/total
train_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
# Valid model with validataion data
model.eval()
acc = 0.0
loss = 0.0
total = 0
for i,(images,labels) in enumerate(validation_loader):
# if cuda then move to GPU
if cuda:
images = images.cuda()
labels = labels.cuda()
# Variable class wraps a tensor and we can calculate grad
images = Variable(images)
labels = Variable(labels)
# reset accumulated gradients for each batch
optimizer.zero_grad()
# pass images to the model, which returns a prediction
output = model(images)
#calculate the loss based on prediction and actual
loss = loss_func(output,labels)
# backpropagate the loss and compute gradient
loss.backward()
# update weights as per the computed gradients
optimizer.step()
# prediction class
predVal, predClass = torch.max(output.data, 1)
acc += torch.sum(predClass == labels.data)
loss += loss.cpu().data[0]
total += labels.size(0)
# print the statistics
valid_acc = acc / total
valid_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, valid_loss))
if(best_acc<valid_acc):
best_acc = valid_acc
save_model(epoch)
# at 30th epoch we save the model
if (epoch == 30):
save_model(epoch)
train(20)
I think you did not take into account that acc += torch.sum(predClass == labels.data) returns a tensor instead of a float value. Depending on the version of PyTorch you are using, you should change it to:
acc += torch.sum(predClass == labels.data).cpu().data[0] #pytorch 0.3
acc += torch.sum(predClass == labels.data).item() #pytorch 0.4
Although your code seems to work on old PyTorch versions, I would recommend upgrading to version 0.4.
I also noticed some other problems/typos in your code.
You are re-creating the samplers and data loaders on every epoch:
for epoch in range(num_epochs):
print('\n\nEpoch {}'.format(epoch))
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
...
That should not happen; creating them once is enough:
train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler, shuffle=False)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
for epoch in range(num_epochs):
print('\n\nEpoch {}'.format(epoch))
...
In the training part you have this (it does not happen in the validation part):
train_acc = acc/total
train_loss = loss / total
print('Mean train acc = {} over epoch = {}'.format(epoch,acc))
print('Mean train loss = {} over epoch = {}'.format(epoch, loss))
where you are printing acc instead of train_acc (and likewise loss instead of train_loss).
Also, in the validation part, you print print('Mean train acc = {} over epoch = {}'.format(epoch, valid_acc)) when the label should be something like 'Mean val acc'.
After changing these lines of code, and using a standard model I created with the CIFAR dataset, training seems to converge: accuracy increases at every epoch while the mean loss value decreases.
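For reference, a condensed sketch of the corrected training bookkeeping with the fixes above applied (PyTorch 0.4 style; the loss accumulator is renamed here so it no longer reuses the name of the per-batch loss, as the original loss += loss.cpu().data[0] line did):

train_sampler, validation_sampler = ld.split_train_validation()
train_loader = DataLoader(ld.allData, batch_size=30, sampler=train_sampler)
validation_loader = DataLoader(dataset=ld.allData, batch_size=1, sampler=validation_sampler)
for epoch in range(num_epochs):
    model.train()
    acc, running_loss, total = 0.0, 0.0, 0
    for images, labels in train_loader:
        if cuda:
            images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        output = model(images)
        loss = loss_func(output, labels)
        loss.backward()
        optimizer.step()
        _, predClass = torch.max(output.data, 1)
        acc += torch.sum(predClass == labels.data).item()  # .item(), not a raw tensor
        running_loss += loss.item()
        total += labels.size(0)
    print('Mean train acc = {} over epoch = {}'.format(acc / total, epoch))
    print('Mean train loss = {} over epoch = {}'.format(running_loss / total, epoch))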
I hope this helps!
I headed into Lasagne and Theano with a modified mnist.py (Lasagne's primary example) to train a very simple XOR.
import numpy as np
import theano
import theano.tensor as T
import time
import lasagne
X_train = [[[[0, 0], [0, 1], [1, 0], [1, 1]]]] # (1)
y_train = [[[[1, 0], [0, 1], [0, 1], [1, 0]]]]
# [0, 1, 1, 0]
X_train = np.array(X_train).astype(np.uint8)
y_train = np.array(y_train).astype(np.uint8)
print X_train.shape
X_val = X_train
y_val = y_train
X_test = X_train
y_test = y_train
def build_mlp(input_var=None):
    # This creates an MLP with one hidden layer of 4 units, followed by a
    # softmax output layer of 2 units. (Dropout on the input is commented out.)
    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 4 rows and 2 columns) and
    # linking it to the given Theano variable `input_var`, if any:
l_in = lasagne.layers.InputLayer(shape=(None, 1, 4, 2), # (2)
input_var=input_var)
# Apply 20% dropout to the input data:
# l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)
    # Add a fully-connected layer of 4 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
l_hid1 = lasagne.layers.DenseLayer(
l_in, num_units=4,
nonlinearity=lasagne.nonlinearities.rectify,
W=lasagne.init.GlorotUniform())
    # Finally, we'll add the fully-connected output layer, of 2 softmax units:
l_out = lasagne.layers.DenseLayer(
l_hid1, num_units=2,
nonlinearity=lasagne.nonlinearities.softmax)
# Each layer is linked to its incoming layer(s), so we only need to pass
# the output layer to give access to a network in Lasagne:
return l_out
# Prepare Theano variables for inputs and targets
input_var = T.tensor4('inputs')
target_var = T.ivector('targets')
network = build_mlp(input_var)
# Create a loss expression for training, i.e., a scalar objective we want
# to minimize (for our multi-class problem, it is the cross-entropy loss):
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()
# We could add some weight decay as well here, see lasagne.regularization.
# Create update expressions for training, i.e., how to modify the
# parameters at each training step. Here, we'll use Stochastic Gradient
# Descent (SGD) with Nesterov momentum, but Lasagne offers plenty more.
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(
loss, params, learning_rate=0.01, momentum=0.9)
# Create a loss expression for validation/testing. The crucial difference
# here is that we do a deterministic forward pass through the network,
# disabling dropout layers.
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
target_var)
test_loss = test_loss.mean()
# As a bonus, also create an expression for the classification accuracy:
test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
dtype=theano.config.floatX)
# Compile a function performing a training step on a mini-batch (by giving
# the updates dictionary) and returning the corresponding training loss:
train_fn = theano.function([input_var, target_var], loss, updates=updates)
# Compile a second function computing the validation loss and accuracy:
val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
# ############################# Batch iterator ###############################
# This is just a simple helper function iterating over training data in
# mini-batches of a particular size, optionally in random order. It assumes
# data is available as numpy arrays. For big datasets, you could load numpy
# arrays as memory-mapped files (np.load(..., mmap_mode='r')), or write your
# own custom data iteration function. For small datasets, you can also copy
# them to GPU at once for slightly improved performance. This would involve
# several changes in the main program, though, and is not demonstrated here.
def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
assert len(inputs) == len(targets)
if shuffle:
indices = np.arange(len(inputs))
np.random.shuffle(indices)
for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
if shuffle:
excerpt = indices[start_idx:start_idx + batchsize]
else:
excerpt = slice(start_idx, start_idx + batchsize)
yield inputs[excerpt], targets[excerpt]
else:
if shuffle:
excerpt = indices[0:len(inputs)]
else:
excerpt = slice(0, len(inputs))
yield inputs[excerpt], targets[excerpt]
num_epochs = 4
# Finally, launch the training loop.
print("Starting training...")
# We iterate over epochs:
for epoch in range(num_epochs):
# In each epoch, we do a full pass over the training data:
train_err = 0
train_batches = 0
start_time = time.time()
for batch in iterate_minibatches(X_train, y_train, 4, shuffle=True):
inputs, targets = batch
print inputs.shape, targets.shape, input_var.shape, input_var.ndim, inputs.ndim
train_err += train_fn(inputs, targets) # (3)
train_batches += 1
# And a full pass over the validation data:
val_err = 0
val_acc = 0
val_batches = 0
for batch in iterate_minibatches(X_val, y_val, 4, shuffle=False):
inputs, targets = batch
err, acc = val_fn(inputs, targets)
val_err += err
val_acc += acc
val_batches += 1
# Then we print the results for this epoch:
print("Epoch {} of {} took {:.3f}s".format(
epoch + 1, num_epochs, time.time() - start_time))
print(" training loss:\t\t{:.6f}".format(train_err / train_batches))
print(" validation loss:\t\t{:.6f}".format(val_err / val_batches))
print(" validation accuracy:\t\t{:.2f} %".format(
val_acc / val_batches * 100))
# After training, we compute and print the test error:
test_err = 0
test_acc = 0
test_batches = 0
for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
inputs, targets = batch
err, acc = val_fn(inputs, targets)
test_err += err
test_acc += acc
test_batches += 1
print("Final results:")
print(" test loss:\t\t\t{:.6f}".format(test_err / test_batches))
print(" test accuracy:\t\t{:.2f} %".format(
test_acc / test_batches * 100))
# Optionally, you could now dump the network weights to a file like this:
# np.savez('model.npz', lasagne.layers.get_all_param_values(network))
I defined a training set at (1), modified the input to the new dimensions at (2), and get an exception at (3):
Traceback (most recent call last):
File "test.py", line 139, in <module>
train_err += train_fn(inputs, targets)
File "/usr/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 513, in __call__
allow_downcast=s.allow_downcast)
File "/usr/local/lib/python2.7/site-packages/theano/tensor/type.py", line 169, in filter
data.shape))
TypeError: ('Bad input argument to theano function with name "test.py:91" at index 1(0-based)', 'Wrong number of dimensions: expected 1, got 4 with shape (1, 1, 4, 2).')
I have no clue what I did wrong. When I print the dimensions (i.e., the program's output up to the exception), I get this:
(1, 1, 4, 2)
Starting training...
(1, 1, 4, 2) (1, 1, 4, 2) Shape.0 4 4
This seems to be perfect. What am I doing wrong, and how must the arrays be shaped for this to work?
The problem is with the second input, targets. Note that the error message indicated this by saying "...at index 1(0-based)...", i.e. the second parameter.
target_var is an ivector but you're providing a 4-dimensional tensor for targets. The solution is to alter your y_train dataset so that it is 1-dimensional:
y_train = [0, 1, 1, 0]
This will cause another error because you currently assert that the first dimension of the inputs and targets should match, but changing
assert len(inputs) == len(targets)
to
assert inputs.shape[2] == len(targets)
will solve the second problem and allow the script to run successfully.
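Put together, the two edits are small (a sketch against the script above, keeping the uint8 cast from the original):

y_train = np.array([0, 1, 1, 0]).astype(np.uint8)  # one class index per XOR sample

and, inside iterate_minibatches, compare axis 2 of the (1, 1, 4, 2) inputs, where the four samples live, against the targets:

assert inputs.shape[2] == len(targets)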