I am building a classifier on the Food-101 dataset. The dataset has predefined training and test sets, both labeled, totalling 101,000 images. I'm trying to reach >=90% top-1 accuracy; I'm currently sitting at 75%. The training set was provided unclean. I would like to know some ways I can improve my model and what I'm doing wrong.
I've partitioned the train and test images into their respective folders. Here, I hold out 0.2 of the training set to validate the learner and run 5 epochs:
np.random.seed(42)
# Hold out 20% of the training images for validation; labels come from the
# file-path regex, and images are resized to 224px.
data = (ImageList.from_folder(path)
        .split_by_rand_pct(valid_pct=0.2)
        .label_from_re(pat=file_parse)
        .transform(size=224)
        .databunch())
top_1 = partial(top_k_accuracy, k=1)
learn = cnn_learner(data, models.resnet50, metrics=[accuracy, top_1], callback_fns=ShowGraph)
learn.fit_one_cycle(5)
epoch train_loss valid_loss accuracy top_k_accuracy time
0 2.153797 1.710803 0.563498 0.563498 19:26
1 1.677590 1.388702 0.637096 0.637096 18:29
2 1.385577 1.227448 0.678746 0.678746 18:36
3 1.154080 1.141590 0.700924 0.700924 18:34
4 1.003366 1.124750 0.707063 0.707063 18:25
And here, I'm trying to find the learning rate, pretty much as it was done in the lectures:
learn.lr_find()
learn.recorder.plot(suggestion=True)
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
Min numerical gradient: 1.32E-06
Min loss divided by 10: 6.31E-08
Using a learning rate of 1e-06, I run another 5 epochs and save the result as stage-2:
learn.fit_one_cycle(5, max_lr=slice(1.e-06))
learn.save('stage-2')
epoch train_loss valid_loss accuracy top_k_accuracy time
0 0.940980 1.124032 0.705809 0.705809 18:18
1 0.989123 1.122873 0.706337 0.706337 18:24
2 0.963596 1.121615 0.706733 0.706733 18:38
3 0.975916 1.121084 0.707195 0.707195 18:27
4 0.978523 1.123260 0.706403 0.706403 17:04
Previously I ran 3 stages altogether, but the model wasn't improving beyond 0.706403, so I didn't want to repeat it. Below is my confusion matrix. I apologize for the terrible resolution; it's the doing of Colab.
Since I’ve created an additional validation set, I decided to use the test set to validate the saved model of stage-2 to see how well it was performing:
path = '/content/food-101/images'
data_test = ImageList.from_folder(path).split_by_folder(train='train', valid='test').label_from_re(file_parse).transform(size=224).databunch()
learn.load('stage-2')
learn.validate(data_test.valid_dl)
This is the result:
[0.87199837, tensor(0.7584), tensor(0.7584)]
Try augmentations like RandomHorizontalFlip, RandomResizedCrop, RandomRotation, Normalize etc. from torchvision transforms. These almost always help in classification problems.
Try label smoothing and/or Mixup; mixed-precision training can also help.
Try a more optimized architecture, like EfficientNet.
Instead of one-cycle, a longer, more manual training approach may help: stochastic gradient descent with a weight decay of 5e-4 and Nesterov momentum of 0.9, warmup training of around 1-3 epochs, then regular training of around 200 epochs. You could set a manual learning-rate schedule, cosine annealing, or some other scheme. This method will consume a lot more time and effort than the usual one-cycle training and should be explored only if the other methods don't show considerable gains.
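If it helps, here is a minimal PyTorch sketch combining several of these suggestions. The transform values, smoothing factor, and schedule lengths are illustrative assumptions rather than tuned settings, and the ResNet-50 is a stand-in for whatever network you use:

import torch
import torch.nn as nn
from torchvision import models, transforms

# Common ImageNet-style augmentation pipeline (values are typical defaults).
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(num_classes=101)  # stand-in; swap in your own network

# Label smoothing via the built-in argument (PyTorch >= 1.10).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# SGD with Nesterov momentum and weight decay, as suggested above.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, nesterov=True, weight_decay=5e-4)

# 3-epoch linear warmup, then cosine annealing over the remaining epochs
# (LinearLR/SequentialLR require a reasonably recent PyTorch).
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1,
                                           total_iters=3)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=197)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[3])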
Related
I'm using PyTorch to implement a simple linear regression model.
The code works perfectly for randomly created datasets, but on the dataset I actually want to train on, it gives significantly wrong results.
Here is the code:
import torch

x = torch.linspace(1, 100, steps=100)
learn_rate = 0.000001

x_train = x[:100]
x_test = x[100:]
y_train = data[:100]  # `data` is my dataset (not shown)
y_test = data[100:]
# y_train = -0.01*x_train + torch.randn(100)*10  # code for generating random data

w = torch.rand(1, requires_grad=True)
b = torch.rand(1, requires_grad=True)

for i in range(1000):
    loss = torch.mean((y_train - (w*x_train + b))**2)
    if i % 100 == 0:
        print(loss)
    loss.backward()
    # Manual gradient-descent step, then reset the gradients.
    w.data.add_(-w.grad.data*learn_rate)
    b.data.add_(-b.grad.data*learn_rate)
    w.grad.data.zero_()
    b.grad.data.zero_()
The result it gives makes no sense.
However, when I used a randomly generated dataset, it worked perfectly.
The two datasets actually look similar, and I am not sure of the reason for the inaccuracy of this model.
Code for plotting data:
import matplotlib.pyplot as plt

plt.plot(x_train.numpy(), y_train.numpy())
plt.plot(x_train.numpy(), (w*x_train + b).data.numpy())
plt.show()
plt.show()
--
Now the problem seems to be that the weight converges much faster than the bias. At the current learning rate the bias will not converge to its optimum, but if I increase the learning rate just a little, the weight simply diverges. It seems I have to set two learning rates.
However, I'm wondering whether setting different learning rates is the best solution for a simple model like this, because I've found that not many models actually use different learning rates for different parameters.
Your code seems to be correct, but your model converges more slowly when there is a large bias in your data, because it has to update the bias parameter many times before it reaches the correct value.
You could try running it for more iterations or increasing the learning rate.
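Regarding your follow-up about two learning rates: PyTorch supports per-parameter learning rates directly via optimizer parameter groups. A minimal sketch on synthetic data with a deliberately large bias (the learning-rate values are illustrative, not tuned):

import torch

# Synthetic data with a large bias term, mimicking the commented-out generator.
x_train = torch.linspace(1, 100, steps=100)
y_train = -0.01 * x_train + 50 + torch.randn(100) * 10

w = torch.rand(1, requires_grad=True)
b = torch.rand(1, requires_grad=True)

# Parameter groups: the bias takes larger steps, since its gradient is not
# amplified by x (which ranges up to 100) the way the weight's gradient is.
optimizer = torch.optim.SGD([
    {'params': [w], 'lr': 1e-5},
    {'params': [b], 'lr': 1e-2},
], lr=1e-5)

for i in range(10000):
    optimizer.zero_grad()
    loss = torch.mean((y_train - (w * x_train + b)) ** 2)
    loss.backward()
    optimizer.step()
    if i % 1000 == 0:
        print(loss.item())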
I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs=epoch, batch_size=batch)
        save batch, epoch, accuracy

Afterwards I kept the smallest epoch count with the smallest corresponding batch size that gave the highest accuracy, e.g. best_params: epoch = 10, batch = 150 => Accuracy = 94%.
My problem is that when I re-run my model with best_params, it doesn't give me the same results (loss, accuracy), and sometimes the accuracy is even very low (e.g. 10%).
I tried fixing the seed, but it didn't give better results.
Regards
Djam75
import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['Nb_Batch', 'Nb_Epoch', 'Accuracy'])
i = 0
lst_loss = []
lst_accuracy = []
lst_epoch = list(np.arange(10, 200, 10))
lst_batch = list(np.arange(100, 400, 20))

for epoch in lst_epoch:
    print('---------------- Epoch ' + str(epoch) + '------------------')
    for batch in lst_batch:
        modelSimple.fit(X_train, Y_train, epochs=epoch, batch_size=batch, verbose=0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i, "Nb_Batch"] = batch
        df.loc[i, "Nb_Epoch"] = epoch
        df.loc[i, "Accuracy"] = score[1] * 100
        i = i + 1
This might be happening due to random parameter initialization: if you build an end-to-end model without transfer-learning the weights, the architecture gets fresh random values for its parameters every time you train.
In this case, a good practice is to use batch normalization layers after some layers, according to your architecture:
tensorflow-implementation
pytorch-implementation
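Since the run-to-run variance comes from random initialization, it can also help to pin every seed source before building the model. A minimal sketch, assuming TensorFlow 2.x (the seed value is arbitrary):

import os
import random

import numpy as np
import tensorflow as tf

# Pin the common sources of randomness for (more) reproducible runs.
os.environ['PYTHONHASHSEED'] = '0'
random.seed(42)
np.random.seed(42)
tf.random.set_seed(42)

Note that some GPU operations are still nondeterministic, so this narrows the variance but may not eliminate it entirely.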
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation; you can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations by using a matrix-operation library like NumPy, as in the toy example below.
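For instance, with hypothetical arrays a and b:

import numpy as np

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

# Loop version: one Python-level iteration per element (slow).
out = np.empty_like(a)
for i in range(len(a)):
    out[i] = a[i] * b[i]

# Vectorized version: a single C-level operation (fast).
out = a * b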
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
Thanks for the idea (batch normalization) and the extra idea: not using any 'for' ;-)
regards
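For reference, that save/load fix looks roughly like this in Keras (the file name is illustrative):

from tensorflow.keras.models import load_model

# After the grid search, persist the trained model that produced best_params.
modelSimple.save('best_model.h5')

# Later, reload the exact trained weights instead of retraining from scratch.
modelSimple = load_model('best_model.h5')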
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well so we can see the problem.
https://www.tensorflow.org/tutorials/estimator/linear
I am following the TensorFlow documentation to implement a linear classifier, but I'd like to use my own data instead of the tutorial set. I just have a few general questions.
My dataset is as follows. It's not a time series.
row[0] - float (changed to binary, 0 = negative, 1 = positive) VALUE TO ESTIMATE
row[1] - string (categorical, changed to vocabulary, ints 1,2,3,4,5,6,7,8,9)
row[2-19] - float (positive and negative)
row[20-60] - ints (percentile ranks, ints 10,20,30,40,50,60,70,80,90)
row[61-95] - ints (binary 1, 0)
I started by using 50k (45k training) rows of data and num_epochs=100, batch_size=256.
{'accuracy': 0.8912, 'accuracy_baseline': 0.8932, 'auc': 0.7101819, 'auc_precision_recall': 0.2830853, 'average_loss': 0.30982444, 'label/mean': 0.1068, 'loss': 0.31013006, 'precision': 0.4537037, 'prediction/mean': 0.11840516, 'recall': 0.0917603, 'global_step': 17600}
Does the column I want to estimate need to be a column of binaries for this model?
Is it a bad idea to mix data types like this? Would it be necessary to normalize the data using something like preprocessing.Normalization?
Should I alter the epochs/batch if I want to use more data?
The accuracy seems high but the loss also seems quite high, why is that?
Any other suggestions?
Thanks for any help or advice.
Here are answers to your questions.
By default, tf.estimator.LinearClassifier performs binary classification with n_classes=2, but you can have more than 2 classes as well.
For a linear classifier, normalizing the data won't affect accuracy much; the accuracy gain from normalizing the same data is larger for non-linear classifiers.
Observe the change in accuracy and loss: if they stop changing much for about 5-10 epochs, you can cap the number of epochs there. You can then repeat the same step while varying the batch size.
Accuracy and loss are not directly dependent on each other. Consider your case of classifying 0 and 1: a two-class model that always predicts 0.51 for the true class has the same accuracy as one that predicts 0.99, but a much higher loss. The best model has high accuracy and low loss; if your model has good accuracy but high loss, it is making huge errors on a few examples.
Tuning your hyperparameters based on several such observations, and feeding in quality data with some preprocessing, is always the best way to reach high accuracy and low loss, and additional data is always good to have.
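On mixing data types: each column just needs an appropriate feature column. A minimal sketch with hypothetical column names standing in for your rows (not your actual schema):

import tensorflow as tf

# Categorical column with a fixed vocabulary, one-hot encoded for the
# linear model; the ints 1-9 mirror row[1] in the question.
category = tf.feature_column.categorical_column_with_vocabulary_list(
    'category', vocabulary_list=[1, 2, 3, 4, 5, 6, 7, 8, 9])

# Plain numeric columns for the float and int features.
numeric = [tf.feature_column.numeric_column(name)
           for name in ['float_feat', 'pct_rank_feat', 'binary_feat']]

feature_columns = [tf.feature_column.indicator_column(category)] + numeric

# n_classes defaults to 2, so the label column should hold 0/1 values.
classifier = tf.estimator.LinearClassifier(feature_columns=feature_columns,
                                           n_classes=2)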
I'm training a sequence-to-sequence (seq2seq) model, and I have different values of input_sequence_length to train on.
For values 10 and 15 I get acceptable results, but when I try to train with 20 I get memory errors, so I switched to training by batches. However, the model overfits and the validation loss explodes, and even with accumulated gradients I get the same behavior, so I'm looking for hints and leads toward more accurate ways to do the update.
Here is my training function (only the batch section):
if batch_size is not None:
    # Number of equidistant batch start indices (using these instead of
    # X.size()[0] directly makes the loop much faster).
    k = len(list(np.arange(0, (X_train_tensor_1.size()[0]//batch_size - 1), batch_size)))
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        epoch_loss = 0
        for i in list(np.arange(0, (X_train_tensor_1.size()[0]//batch_size - 1), batch_size)):
            sequence = X_train_tensor[i:i+batch_size, :, :].reshape(-1, sequence_length, input_size).to(device)
            labels = y_train_tensor[i:i+batch_size, :, :].reshape(-1, sequence_length, output_size).to(device)
            # Forward pass
            outputs = model(sequence)
            loss = criterion(outputs, labels)
            epoch_loss += loss.item()
            # Backward pass (gradients accumulate across the epoch)
            loss.backward()
        # Optimize once per epoch, on the accumulated gradient
        optimizer.step()
        epoch_loss = epoch_loss/k
        model.eval()
        validation_loss, _ = evaluate(model, X_test_hard_tensor_1, y_test_hard_tensor_1)
        model.train()
        training_loss_log.append(epoch_loss)
        print('Epoch [{}/{}], Train MSELoss: {}, Validation : {}'.format(
            epoch+1, num_epochs, epoch_loss, validation_loss))
EDIT:
Here are the parameters that I'm training with:
batch_size = 1024
num_epochs = 25000
learning_rate = 10e-04
optimizer=torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction='mean')
Batch size affects regularization. Training on a single example at a time is quite noisy, which makes it harder to overfit. Training on batches smoothes everything out, which makes it easier to overfit. Translating back to regularization:
Smaller batches add regularization.
Larger batches reduce regularization.
I am also curious about your learning rate. Every call to loss.backward() will accumulate the gradient. If you have set your learning rate to expect a single example at a time, and not reduced it to account for batch accumulation, then one of two things will happen.
The learning rate will be too high for the now-accumulated gradient, training will diverge, and both training and validation errors will explode.
The learning rate won't be too high, and nothing will diverge. The model will just train more quickly and effectively. If the model is too large for the data being fit, then training error will go to 0 but validation error will explode due to overfitting.
Update
Here is a bit more detail regarding the gradient accumulation.
Every call to loss.backward() will accumulate gradient, until you reset it with optimizer.zero_grad(). It will be acted on when you call optimizer.step(), based on whatever it has accumulated.
The way your code is written, you call loss.backward() for every pass through the inner loop, then you call optimizer.step() in the outer loop before resetting. So the gradient has been accumulated, that is summed, over all examples in the batch and not just one example at a time.
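For contrast, the conventional per-batch pattern resets the gradient before every batch, so each optimizer step acts on one batch's gradient only. A sketch reusing the names from your code (batch_starts stands for the same equidistant indices):

for epoch in range(num_epochs):
    for i in batch_starts:                  # hypothetical list of batch start indices
        optimizer.zero_grad()               # reset before every batch
        sequence = X_train_tensor[i:i+batch_size].reshape(-1, sequence_length, input_size).to(device)
        labels = y_train_tensor[i:i+batch_size].reshape(-1, sequence_length, output_size).to(device)
        loss = criterion(model(sequence), labels)
        loss.backward()                     # gradient for this batch only
        optimizer.step()                    # update using this batch alone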
Under most assumptions, that will make the batch-accumulated gradient larger than the gradient for a single example. If the gradients are all aligned, accumulating over B batches makes it B times larger; if the gradients are i.i.d., it will be more like sqrt(B) times larger.
If you do not account for this, then you have effectively increased your learning rate by that factor. Some of that will be mitigated by the smoothing effect of larger batches, which can then tolerate a higher learning rate. Larger batches reduce regularization, larger learning rates add it back. But that will not be a perfect match to compensate, so you will still want to adjust accordingly.
In general, whenever you change your batch size you will also want to re-tune your learning rate to compensate.
Leslie N. Smith has written some excellent papers on a methodical approach to hyperparameter tuning. A great place to start is A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. He recommends you start by reading the diagrams, which are very well done.
I was trying to build a multiple regression model to predict housing prices using the following features:
[bedrooms, bathrooms, sqft_living, view, grade]
(a sample scaled row: [0.09375, 0.266667, 0.149582, 0.0, 0.6])
I have scaled the features using sklearn.preprocessing.MinMaxScaler.
I used Keras to build the model:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam

def build_model(X_train):
    model = Sequential()
    model.add(Dense(5, activation='relu', input_shape=X_train.shape[1:]))
    model.add(Dense(1))
    optimizer = Adam(lr=0.001)
    model.compile(loss='mean_squared_error', optimizer=optimizer)
    return model
When I train the model, my loss values are insanely high, something like 4 or 40 trillion, and the loss only goes down by about a million per epoch, which makes training infeasibly slow. At first I tried increasing the learning rate, but it didn't help much. Then I did some searching and found that others have used a log-MSE loss function, so I tried it, and my model seemed to work fine (it started at a loss of 140 and went down to 0.2 after 400 epochs).
My question is: do I always just use log-MSE when I see very large MSE values for linear/multiple regression problems? Or are there other things I can do to try to fix this issue?
My guess as to why this occurred is that the scales of my predictor and response variables are vastly different: the X's are between 0 and 1, while the highest Y goes up to 8 million. (Am I supposed to scale down my Y's, and then scale back up for predicting?)
A lot of people believe in scaling everything. If your y goes up to 8 million, I'd scale it, yes, and reverse the scaling when you get predictions out later.
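A minimal sketch with scikit-learn, assuming y_train is a 1-D NumPy array and model is the network built above:

from sklearn.preprocessing import MinMaxScaler

# Scale the target into [0, 1] before training.
y_scaler = MinMaxScaler()
y_train_scaled = y_scaler.fit_transform(y_train.reshape(-1, 1))

model.fit(X_train, y_train_scaled, epochs=400, verbose=0)

# Predictions come out in the scaled space; invert them back to price units.
preds = y_scaler.inverse_transform(model.predict(X_test))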
Don't worry too much about the specific loss number you see. Sure, 40 trillion is ridiculously high, and it indicates that changes may be needed to the network architecture or parameters. The main concern is whether the validation loss is actually decreasing and the network is actually learning. If, as you say, it 'went down to 0.2 after 400 epochs', then it sounds like you're on the right track.
There are many other loss functions besides log-MSE, MSE, and MAE for regression problems. Have a look at these. Hope that helps!