Are all train samples used in fit_generator in Keras? - python

I am using model.fit_generator() to train a neural network with Keras. During the fitting process I've set the steps_per_epoch to 16 (len(training samples)/batch_size).
If the mini batch size is set to 12, and the total number of training samples is 195, does it mean that 3 samples won't be used in the training phase?

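Not necessarily. Because the data comes from a generator, the model does not know the total number of training samples, so it ends an epoch once it reaches the number of steps given by the steps_per_epoch argument. In your case it will indeed train on 16 × 12 = 192 samples per epoch.
If you want all samples to be used over the course of training, you can shuffle the data at the start of every epoch so that a different set of 3 samples is left out each time. Note that, according to the Keras documentation, the shuffle argument is only used with keras.utils.Sequence instances.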
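To make the arithmetic concrete, here is a minimal sketch in plain Python (no Keras involved). The batch_generator function is a hypothetical stand-in for a user-written generator that loops over the data forever; the assumption is that Keras simply draws steps_per_epoch batches from such a generator per epoch without restarting it, so the leftover samples show up at the start of the next epoch.

```python
import math

def batch_generator(samples, batch_size):
    """Hypothetical endless generator: yields consecutive batches, then starts over."""
    while True:
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]

samples = list(range(195))   # 195 training samples, identified by index
batch_size = 12
steps_per_epoch = 16         # 195 // 12, as in the question

gen = batch_generator(samples, batch_size)
epochs = []
for epoch in range(2):
    seen = []
    for _ in range(steps_per_epoch):   # one "epoch" = steps_per_epoch draws
        seen.extend(next(gen))
    epochs.append(seen)

print(len(epochs[0]))    # samples consumed in the first epoch
print(epochs[1][:3])     # the second epoch starts with the 3 leftover samples

# Steps needed to see every sample within a single epoch
steps_to_cover_all = math.ceil(len(samples) / batch_size)
print(steps_to_cover_all)
```

With a generator written like this, rounding steps_per_epoch up (17 instead of 16) would cover all 195 samples in each epoch, with the last batch holding only 3 samples.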

Related

Neural Network optimization using epoch and batch

I am trying to optimize a given neural network (e.g. a multilayer perceptron with 2 hidden layers) by finding the number of epochs and the batch size that give the highest accuracy.
for epoch from 10 to 200 (in steps of 10):
    for batch from 40 to 200 (in steps of 20):
        modele.fit(X_train, Y_train, epochs = epoch, batch_size = batch)
        save batch, epoch, accuracy
Afterwards I keep the smallest epoch count, with the smallest corresponding batch size, that gives the highest accuracy,
e.g. best_params: epoch = 10, batch = 150 => Accuracy = 94%
My problem is that when I re-run my model with best_params, it does not give me the same results (loss, accuracy), and sometimes the accuracy is even very low (e.g. 10%).
I tried fixing the seed, but the results are no better.
Regards
Djam75
df=pd.DataFrame(columns=['Nb_Batch','Nb_Epoch','Accuracy'])
i=0
lst_loss=[]
lst_accuracy=[]
lst_epoch=list(np.arange(10,200,10))
lst_batch=list(np.arange(100,400,20))
for epoch in lst_epoch:
    print ('---------------- Epoch ' + str(epoch)+ '------------------')
    for batch in lst_batch:
        modelSimple.fit(X_train, Y_train, nb_epoch = epoch, batch_size = batch, verbose = 0)
        score = modelSimple.evaluate(X_test, Y_test)
        df.loc[i,"Nb_Batch"]=batch
        df.loc[i,"Nb_Epoch"]=epoch
        df.loc[i,"Accuracy"]=score[1]*100
        i=i+1
This might be happening due to random parameter initialization. If you are building an end-to-end model without transfer learning (that is, without reusing pretrained weights), the parameters get random values every time you train the architecture.
In this case, a good practice is to use batch normalization layers after some layers, according to your architecture.
tensorflow-implementation
pytorch-implementation
Extra idea:
Do not use any 'for' or 'while' loops in the model implementation.
You can follow the templates in TensorFlow or PyTorch.
Or, if you build a complete model from scratch, vectorize operations by using a NumPy-like matrix operation library.
Thanks for the update.
I resolved my problem by saving the model and loading it afterwards.
Thanks for the idea (batch normalization) and the extra idea: not using any for loops ;-)
Regards
I think you might not be updating the weight matrix after completing the training for certain batch sizes and epochs.
Please include the code as well in order to see the problem
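One thing worth checking in the grid-search loop from the question is that the same model object is fitted again for every (epoch, batch) pair. In Keras, calling fit repeatedly on one model continues from the weights left by the previous call, so each recorded accuracy would depend on all earlier runs. The following is a minimal sketch of that effect in plain Python; TinyModel is a hypothetical one-weight stand-in for a network, not Keras code, and the seed handling is an assumption made for illustration.

```python
import random

class TinyModel:
    """Hypothetical stand-in for a network: a single weight fitted to y = 3x."""

    def __init__(self, seed):
        # Seeded random initialization, so a fresh model is reproducible
        self.w = random.Random(seed).uniform(-1.0, 1.0)

    def fit(self, xs, ys, epochs, lr=0.01):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                grad = 2 * (self.w * x - y) * x   # d/dw of (w*x - y)^2
                self.w -= lr * grad
        return self

xs = [1.0, 2.0, 3.0]
ys = [3.0, 6.0, 9.0]

# Reusing one model: the second fit() starts from the weights of the first
shared = TinyModel(seed=0)
shared.fit(xs, ys, epochs=1)
w_after_first = shared.w
shared.fit(xs, ys, epochs=1)
w_after_second = shared.w

# Building a fresh, identically seeded model per configuration
fresh_a = TinyModel(seed=0).fit(xs, ys, epochs=1)
fresh_b = TinyModel(seed=0).fit(xs, ys, epochs=1)

print(w_after_first, w_after_second)   # differ: training carried over
print(fresh_a.w, fresh_b.w)            # identical: same start, same data
```

In a Keras setting the equivalent would be to construct and compile a new model inside the inner loop, or to reload saved initial weights before each fit, which matches the save-and-load fix mentioned above.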

training by batches leads to more over-fitting

I'm training a sequence to sequence (seq2seq) model and I have different values to train on for the input_sequence_length.
For values 10 and 15 I get acceptable results, but when I try to train with 20 I get memory errors, so I switched to training by batches. Now the model overfits and the validation loss explodes, and I get the same behavior even with accumulated gradients. I'm looking for hints and leads towards more accurate ways to do the update.
Here is my training function (only with batch section) :
if batch_size is not None:
    k=len(list(np.arange(0,(X_train_tensor_1.size()[0]//batch_size-1), batch_size )))
    for epoch in range(num_epochs):
        optimizer.zero_grad()
        epoch_loss=0
        for i in list(np.arange(0,(X_train_tensor_1.size()[0]//batch_size-1), batch_size )): # by using equidistant batch till the last one it becomes much faster than using the X.size()[0] directly
            sequence = X_train_tensor[i:i+batch_size,:,:].reshape(-1, sequence_length, input_size).to(device)
            labels = y_train_tensor[i:i+batch_size,:,:].reshape(-1, sequence_length, output_size).to(device)
            # Forward pass
            outputs = model(sequence)
            loss = criterion(outputs, labels)
            epoch_loss+=loss.item()
            # Backward and optimize
            loss.backward()
        optimizer.step()
        epoch_loss=epoch_loss/k
        model.eval
        validation_loss,_= evaluate(model,X_test_hard_tensor_1,y_test_hard_tensor_1)
        model.train()
        training_loss_log.append(epoch_loss)
        print ('Epoch [{}/{}], Train MSELoss: {}, Validation : {} {}'.format(epoch+1, num_epochs,epoch_loss,validation_loss))
EDIT:
here are the parameters that I'm training with :
batch_size = 1024
num_epochs = 25000
learning_rate = 10e-04
optimizer=torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction='mean')
Batch size affects regularization. Training on a single example at a time is quite noisy, which makes it harder to overfit. Training on batches smoothes everything out, which makes it easier to overfit. Translating back to regularization:
Smaller batches add regularization.
Larger batches reduce regularization.
I am also curious about your learning rate. Every call to loss.backward() will accumulate the gradient. If you have set your learning rate to expect a single example at a time, and not reduced it to account for batch accumulation, then one of two things will happen.
The learning rate will be too high for the now-accumulated gradient, training will diverge, and both training and validation errors will explode.
The learning rate won't be too high, and nothing will diverge. The model will just train more quickly and effectively. If the model is too large for the data being fit, then training error will go to 0 but validation error will explode due to overfitting.
Update
Here is a bit more detail regarding the gradient accumulation.
Every call to loss.backward() will accumulate gradient, until you reset it with optimizer.zero_grad(). It will be acted on when you call optimizer.step(), based on whatever it has accumulated.
The way your code is written, you call loss.backward() for every pass through the inner loop, then you call optimizer.step() in the outer loop before resetting. So the gradient has been accumulated, that is summed, over all examples in the batch and not just one example at a time.
Under most assumptions, that will make the batch-accumulated gradient larger than the gradient for a single example. If the gradients are all aligned, for B batches, it will be larger by B times. If the gradients are i.i.d. then it will be more like sqrt(B) times larger.
If you do not account for this, then you have effectively increased your learning rate by that factor. Some of that will be mitigated by the smoothing effect of larger batches, which can then tolerate a higher learning rate. Larger batches reduce regularization, larger learning rates add it back. But that will not be a perfect match to compensate, so you will still want to adjust accordingly.
In general, whenever you change your batch size you will also want to re-tune your learning rate to compensate.
Leslie N. Smith has written some excellent papers on a methodical approach to hyperparameter tuning. A great place to start is A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. He recommends you start by reading the diagrams, which are very well done.
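The scaling effect of accumulation described above can be checked with a small worked example. This is a sketch in plain Python rather than PyTorch: mse_grad is a hypothetical helper that plays the role of one loss.backward() call with a mean-reduced MSE loss, and adding its results together mimics what happens when gradients are summed across mini-batches without zero_grad() in between.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of y ~ w * x with respect to w."""
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5
batch_size = 2

batches = [
    (xs[i:i + batch_size], ys[i:i + batch_size])
    for i in range(0, len(xs), batch_size)
]

# Summing per-batch gradients, as repeated backward() calls would do
accumulated = 0.0
for bx, by in batches:
    accumulated += mse_grad(w, bx, by)

full_batch = mse_grad(w, xs, ys)         # gradient over all data at once
averaged = accumulated / len(batches)    # accumulated gradient, rescaled

print(accumulated, full_batch, averaged)
```

Here the accumulated gradient is larger than the full-batch gradient by a factor equal to the number of batches, and dividing by that number recovers it. A common way to get the same effect in a training loop is to divide each batch loss by the number of accumulation steps before calling backward, or to lower the learning rate accordingly.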

Show Model Validation Progress with Keras model.fit()

I am training a CNN model using tf.keras passing training and validation generators as follows:
model.fit(
    x=training_data_generator,
    validation_data=validation_data_generator,
    epochs=n_epochs,
    use_multiprocessing=False,
    max_queue_size=100,
    workers=50
)
The generators are based on tf.keras.utils.Sequence.
The problem is, my data set is huge. Training one epoch takes about a day (despite training on two Titan RTX GPUs) and validation after each epoch takes a few hours.
During training I can see the progress displayed, but during validation all I see is the last snapshot of the training progress bar:
130339/130340 [==============================] - 147432s 1s/step
until the validation finishes and I finally see my validation accuracy, loss, etc.
Is there a way to display a progress bar for validation?
I'm thinking of doing something like this:
for epoch in range(n_epochs):
    model.fit(
        x=training_data_generator,
        epochs=1,
        use_multiprocessing=False,
        max_queue_size=100,
        workers=50
    )
    validation_results = model.evaluate(
        x=validation_data_generator,
        use_multiprocessing=False,
        max_queue_size=100,
        workers=50
    )
    print(validation_results)
Another option I was considering is to create a custom callback that validates the model on_epoch_end, but this seems very non-standard.
Is there a better approach to this?
You can set steps_per_epoch in the fit method.
From the documentation:
Total number of steps (batches of samples) before declaring one epoch finished and starting the next epoch. When training with input tensors such as TensorFlow data tensors, the default None is equal to the number of samples in your dataset divided by the batch size, or 1 if that cannot be determined. If x is a tf.data dataset, and 'steps_per_epoch' is None, the epoch will run until the input dataset is exhausted. This argument is not supported with array inputs.
This lets you limit the number of steps per epoch: with a lower value, each epoch ends sooner, so you get the validation loss and accuracy more often.
Setting steps_per_epoch to a lower value means you need to increase the number of epochs to cover the same amount of training.
For example, with steps_per_epoch=1000, the training and validation loss and accuracy are shown after every 1000 steps rather than only after the entire dataset has been exhausted.
history = model.fit(x_train, y_train,
                    batch_size=2,
                    epochs=30,
                    steps_per_epoch=1000,
                    # We pass some validation for
                    # monitoring validation loss and metrics
                    # at the end of each epoch
                    validation_data=(x_val, y_val))
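How far to raise the epoch count follows from simple arithmetic. The sketch below assumes the 130340 batches per full pass shown in the progress bar from the question; epochs_for_full_passes is a hypothetical helper name, not a Keras function.

```python
import math

def epochs_for_full_passes(batches_per_pass, steps_per_epoch, full_passes):
    """Shortened epochs needed to process `full_passes` passes over the data."""
    total_steps = batches_per_pass * full_passes
    return math.ceil(total_steps / steps_per_epoch)

batches_per_pass = 130340   # taken from the progress bar in the question
steps_per_epoch = 1000

epochs_needed = epochs_for_full_passes(batches_per_pass, steps_per_epoch, full_passes=1)
print(epochs_needed)
```

With these numbers, one original day-long epoch becomes 131 short epochs, each followed by a validation run, so validation_steps may also need to be reduced if a full validation pass takes hours.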

How to obtain weight matrices during training on Scikit

I am training an MLPClassifier using scikit-learn. Let's say I want to train for 5 epochs on MNIST with one hidden layer of 100 neurons.
If I do "mlp = MLPClassifier(...)" and then "mlp.fit(train,test)", then I can obtain the trained weights with "mlp.coefs_".
But what I want is the sequence of weight matrices obtained after each epoch during training. So if I train for 5 epochs I would want a list of size 5 with the history of weight matrices.
Is this possible with scikit? Or should I use Keras?
One option is to train your model with a fraction of the epochs you wanted to do.
Store the parameters.
Then continue training your model with the warm_start = True parameter. You would do this until you got the overall number of epochs you wanted.
In scikit-learn's implementation, the max_iter parameter corresponds to the number of epochs (for the stochastic solvers sgd and adam). This is discussed at this link:
https://stats.stackexchange.com/questions/284491/are-the-epochs-equivalent-to-the-iterations
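The warm_start approach described above could look roughly like the following sketch. It assumes scikit-learn is installed and uses the small bundled digits dataset as a stand-in for MNIST; the variable names and the choice of one epoch per fit call are illustrative assumptions, not the only way to do it.

```python
import warnings

from sklearn.datasets import load_digits
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Small bundled dataset (8x8 digit images), used here instead of MNIST
X, y = load_digits(return_X_y=True)

n_epochs = 5
mlp = MLPClassifier(
    hidden_layer_sizes=(100,),
    max_iter=1,        # one epoch per call to fit()
    warm_start=True,   # continue from the current weights on the next fit()
    random_state=0,
)

weight_history = []
with warnings.catch_warnings():
    # Stopping after one iteration triggers a convergence warning by design
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    for _ in range(n_epochs):
        mlp.fit(X, y)
        # Copy the arrays so later training does not overwrite the snapshot
        weight_history.append([w.copy() for w in mlp.coefs_])

print(len(weight_history))
print([w.shape for w in weight_history[0]])
```

Each entry of weight_history is a list with one matrix per layer, so weight_history[k][0] holds the input-to-hidden weights after epoch k + 1. Bias vectors could be stored the same way from mlp.intercepts_.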

Understanding fit_generator (steps_per_epoch), validation_steps, evaluate_generator (Steps) & predict_generator (steps)

I am new to using Keras for my project. I have been working with generators in my model.
I am confused about which values I should pass for:
1) In fit_generator : steps_per_epoch & validation_steps ?
2) evaluate_generator : steps ?
3) predict_generator : steps ?
I have read the Keras documentation and a few other questions (stack1, stack2), but I still cannot understand it. I will describe the shape of the data I am currently working with and ask my questions accordingly. Also, please correct me if my understanding is wrong.
model.fit_generator(trainGen, steps_per_epoch=25, epochs = 100, validation_data=ValGen, validation_steps= 4)
Q1: For every epoch, there were 25 steps. For each step trainGen yields a tuple of shape (244*100*4, 244*100*2) and perform training.
What will be my batch_size and batches if my steps_per_epoch is 25 ?
Q2:
I understand that val_acc and val_loss are calculated at the end of the 25th step of an epoch. I chose validation_steps = 4, so ValGen yields a tuple of shape (30*100*4, 30*100*2) 4 times at the end of the 25th step of an epoch.
I have chosen arbitrarily validation_steps = 4. But how to choose
correct number of validation_steps ? How does val_loss & val_acc
calculated ? (calculate the mean 4 times either as single batch or
using batch_size)
Q3:
Say for example in evaluate_generator & predict_generator, my Generator yields a tuple shape (30*100*4, 30*100*2) for both.
How to choose the correct number for steps argument for both
evaluate_generator & predict_generator ? In keras document it is mentioned as Total number of steps (batches of samples) to yield from generator before stopping ? In my case what will the batches of samples ?
If any additional information required let me know.
Steps are not a parameter that you "choose"; you compute it as:
steps = number of samples / batch size
So the only parameter you are free to choose is the batch size, which is set to a value at which the model does not run out of memory while training. Typical values are between 32 and 64.
For the training set, you divide the number of training samples by the training batch size; for the validation set, you divide the number of validation samples by the validation batch size. Both batch sizes can be equal.
This applies to all functions that use generators.
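As a worked example of the formula, the sketch below computes the three step counts for hypothetical dataset sizes. The sample counts and the steps_for helper are made up for illustration. It rounds up, on the assumption that the generator yields a final, smaller batch; if the generator drops the incomplete batch, rounding down would be the matching choice.

```python
import math

def steps_for(n_samples, batch_size):
    """Number of generator batches needed to cover n_samples once."""
    return math.ceil(n_samples / batch_size)

# Hypothetical dataset sizes, for illustration only
n_train, n_val, n_test = 1000, 200, 100
batch_size = 32

steps_per_epoch = steps_for(n_train, batch_size)    # for fit_generator
validation_steps = steps_for(n_val, batch_size)     # for fit_generator
predict_steps = steps_for(n_test, batch_size)       # for evaluate/predict_generator

print(steps_per_epoch, validation_steps, predict_steps)
```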
