How does the Keras steps_per_epoch parameter in fit_generator work - python

In Keras documentation - steps_per_epoch: Total number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples of your dataset divided by the batch size.
I have 3000 samples.
If I set steps_per_epoch=3000 it works slowly. If I set steps_per_epoch=300 it works faster, so I thought batching was kicking in!
But then I compared how much video memory is allocated in the two cases and did not notice a big difference. If I use the plain fit() function, the difference is large. So is this a real speed-up, or am I just processing 300 examples instead of 3000?
What is this parameter for, and how can I speed up the training?
My generator code:
def samples_generator(self, path_source, path_mask):
    # loop forever, as Keras generators are expected to do
    while 1:
        file_paths_x = self.get_files(path_source)
        file_paths_y = self.get_files(path_mask)
        for path_x, path_y in zip(file_paths_x, file_paths_y):
            x = self.load_pixels(path_x, 3, cv2.INTER_CUBIC)
            y = self.load_pixels(path_y, 0, cv2.INTER_NEAREST)
            yield (x, y)   # one sample per yield, i.e. an effective batch size of 1

The steps_per_epoch parameter is the number of batches of samples it will take to complete one full epoch. This is dependent on your batch size. The batch size is set where you initialize your training data. For example, if you're doing this with ImageDataGenerator.flow() or ImageDataGenerator.flow_from_directory(), the batch size is specified with the batch_size parameter in each of these.
You said you have 3000 samples.
If your batch size was 100, then steps_per_epoch would be 30.
If your batch size was 10, then steps_per_epoch would be 300.
If your batch size was 1, then steps_per_epoch would be 3000.
This is because steps_per_epoch should be equivalent to the total number of samples divided by the batch size. The process of implementing this in Keras is available in the two videos below.
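For instance, a hedged sketch of how the batch size set in the generator drives steps_per_epoch (assuming a compiled model and a hypothetical data/train directory holding the 3000 images):

from keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(rescale=1./255).flow_from_directory(
    'data/train',              # hypothetical folder containing the 3000 samples
    target_size=(224, 224),
    batch_size=100)            # the batch size is set here, not in fit_generator

# 3000 samples / batch size 100 = 30 batches per epoch
model.fit_generator(train_gen, steps_per_epoch=3000 // 100, epochs=10)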
The reason why you have to set steps_per_epoch is that the generator is designed to run indefinitely (see the docs: "The generator is expected to loop over its data indefinitely."). You implemented this with while 1.
Since fit_generator() is supposed to run for epochs=x epochs, the method must know when the next epoch begins within this indefinite loop (and, hence, when the data has to be drawn from the beginning again).
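A minimal sketch of that idea, assuming a hypothetical load_sample helper, lists of file paths, and a batch size of 10 for the 3000 samples:

import numpy as np

def batch_generator(x_paths, y_paths, batch_size=10):
    while True:   # loop forever, as fit_generator expects
        for start in range(0, len(x_paths), batch_size):
            xs = [load_sample(p) for p in x_paths[start:start + batch_size]]   # load_sample is a stand-in loader
            ys = [load_sample(p) for p in y_paths[start:start + batch_size]]
            yield np.stack(xs), np.stack(ys)

# 3000 samples / batch size 10 = 300 yields make up one epoch, which is
# exactly what steps_per_epoch tells fit_generator:
model.fit_generator(batch_generator(x_paths, y_paths, batch_size=10),
                    steps_per_epoch=300, epochs=10)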
Image preparation for CNN training with Keras
Create and train a CNN in Keras

Related

Can someone explain the relationship between batch size and steps per epoch?

I have a training set containing 272 images.
batch size = 8, steps per epoch = 1 -> does it train the model on just 8 images and then jump to the next epoch?
batch size = 8, steps per epoch = 34 (no shuffle) -> does it train the model on all 272 images and then jump to the next epoch?
Does it update the weights of the model at the end of each step?
If so, does increasing the number of steps per epoch give a better result?
Is there a convention for selecting batch size & steps per epoch?
If I give the definitions using the 272 images as the training dataset and 8 as the batch size:
batch size - the number of images that will be fed to the neural network together.
epoch - one iteration over all the dataset images.
steps - usually, the dataset size and the batch size determine the steps per epoch. By default, here, steps = 272/8 = 34 per epoch. In total, if you want 10 epochs, you get 10 x 34 = 340 steps.
Now, if your dataset is very large, or if there are many possible ways to augment your images (which again leads to a dataset of effectively infinite or dynamic length), how do you set the epoch in this case? You simply use steps per epoch to set a boundary: you pick an arbitrary value, say 100, and thereby treat your effective dataset length as 100 * 8 = 800 images per epoch. How you do the augmentation is another matter; normally, you can rotate, crop, or scale by random values each time.
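As a tiny worked example in plain Python (nothing framework-specific):

import math

dataset_size = 272
batch_size = 8
epochs = 10

steps_per_epoch = math.ceil(dataset_size / batch_size)   # 272 / 8 = 34
total_weight_updates = steps_per_epoch * epochs          # 34 * 10 = 340

# With heavy augmentation the effective dataset has no fixed length,
# so the boundary is set by hand instead of derived:
steps_per_epoch_augmented = 100   # arbitrary choice -> 100 * 8 = 800 images "per epoch"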
Anyway, coming to the answers to your questions -
Yes
Yes
Yes if you are using Mini-batch gradient descent
Well, yes unless it overfits or your data is very small or ... there are a lot of other things to consider.
I am not aware of any. But for a ballpark figure, you can check on the training mechanism of high accuracy open source trained models in your domain.
(Note: I am not actively working in this field any more. So some things may have changed or I may be mistaken.)
The batch size defines the number of samples that propagate through the network before the model parameters are updated.
Each batch of samples goes through one full forward and backward propagation.
Example:
Total training samples (images) = 3000
batch_size = 32
epochs = 500
Then…
32 samples will be taken at a time to train the network.
To go through all 3000 samples it takes 3000/32 ≈ 94 iterations, which is 1 epoch.
This process continues 500 times (epochs).
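In Keras terms, a rough sketch (assuming a compiled model and NumPy arrays x_train / y_train holding the 3000 samples) would be:

import math

# 32 samples are processed per forward/backward pass; Keras derives the
# number of iterations per epoch from the data size and the batch_size argument.
model.fit(x_train, y_train, batch_size=32, epochs=500)

iterations_per_epoch = math.ceil(3000 / 32)     # 94 iterations = 1 epoch
total_iterations = iterations_per_epoch * 500   # 47000 iterations over the whole training run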
You may be limited to small batch sizes based on your system hardware (RAM + GPU).
Smaller batches mean each step in gradient descent may be less accurate, so it may take longer for the algorithm to converge.
But, it has been observed that for larger batches there is a significant degradation in the quality of the model, as measured by its ability to generalize.
Batch size of 32 or 64 is a good starting point.
Summary:
Larger batch sizes result in faster progress in training, but don't always converge as fast.
Smaller batch sizes train slower but can converge faster

training by batches leads to more over-fitting

I'm training a sequence to sequence (seq2seq) model, and I have different values of input_sequence_length to train on.
For values 10 and 15 I get acceptable results, but when I try to train with 20 I get memory errors, so I switched the training to train by batches. However, the model over-fits and the validation loss explodes, and even with accumulated gradients I get the same behavior, so I'm looking for hints and leads towards more accurate ways to do the update.
Here is my training function (batch section only):
if batch_size is not None:
    k = len(list(np.arange(0, (X_train_tensor_1.size()[0] // batch_size - 1), batch_size)))
    for epoch in range(num_epochs):
        optimizer.zero_grad()   # note: the gradient is only reset once per epoch
        epoch_loss = 0
        # using equidistant batch starts up to the last one is much faster
        # than iterating over X.size()[0] directly
        for i in list(np.arange(0, (X_train_tensor_1.size()[0] // batch_size - 1), batch_size)):
            sequence = X_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, input_size).to(device)
            labels = y_train_tensor[i:i + batch_size, :, :].reshape(-1, sequence_length, output_size).to(device)
            # Forward pass
            outputs = model(sequence)
            loss = criterion(outputs, labels)
            epoch_loss += loss.item()
            # Backward pass: the gradient accumulates over all batches of the epoch
            loss.backward()
        # Optimize: a single parameter update per epoch, on the accumulated gradient
        optimizer.step()
        epoch_loss = epoch_loss / k
        model.eval()   # eval() needs the parentheses to actually switch modes
        validation_loss, _ = evaluate(model, X_test_hard_tensor_1, y_test_hard_tensor_1)
        model.train()
        training_loss_log.append(epoch_loss)
        print('Epoch [{}/{}], Train MSELoss: {}, Validation: {}'.format(
            epoch + 1, num_epochs, epoch_loss, validation_loss))
EDIT:
here are the parameters that I'm training with:
batch_size = 1024
num_epochs = 25000
learning_rate = 10e-04
optimizer=torch.optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss(reduction='mean')
Batch size affects regularization. Training on a single example at a time is quite noisy, which makes it harder to overfit. Training on batches smoothes everything out, which makes it easier to overfit. Translating back to regularization:
Smaller batches add regularization.
Larger batches reduce regularization.
I am also curious about your learning rate. Every call to loss.backward() will accumulate the gradient. If you have set your learning rate to expect a single example at a time, and not reduced it to account for batch accumulation, then one of two things will happen.
The learning rate will be too high for the now-accumulated gradient, training will diverge, and both training and validation errors will explode.
The learning rate won't be too high, and nothing will diverge. The model will just train more quickly and effectively. If the model is too large for the data being fit, then training error will go to 0 but validation error will explode due to overfitting.
Update
Here is a bit more detail regarding the gradient accumulation.
Every call to loss.backward() will accumulate gradient, until you reset it with optimizer.zero_grad(). It will be acted on when you call optimizer.step(), based on whatever it has accumulated.
The way your code is written, you call loss.backward() for every pass through the inner loop, then you call optimizer.step() in the outer loop before resetting. So the gradient has been accumulated, that is summed, over all examples in the batch and not just one example at a time.
Under most assumptions, that will make the batch-accumulated gradient larger than the gradient for a single example. If the gradients are all aligned, for B batches, it will be larger by B times. If the gradients are i.i.d. then it will be more like sqrt(B) times larger.
If you do not account for this, then you have effectively increased your learning rate by that factor. Some of that will be mitigated by the smoothing effect of larger batches, which can then tolerate a higher learning rate. Larger batches reduce regularization, larger learning rates add it back. But that will not be a perfect match to compensate, so you will still want to adjust accordingly.
In general, whenever you change your batch size you will also want to re-tune your learning rate to compensate.
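For comparison, a sketch of the more common per-batch update pattern (reusing the names from the question, with optimizer.zero_grad() moved inside the inner loop so that each step acts on one batch's gradient only):

for epoch in range(num_epochs):
    epoch_loss = 0.0
    for i in range(0, X_train_tensor.size(0) - batch_size + 1, batch_size):
        sequence = X_train_tensor[i:i + batch_size].reshape(-1, sequence_length, input_size).to(device)
        labels = y_train_tensor[i:i + batch_size].reshape(-1, sequence_length, output_size).to(device)
        optimizer.zero_grad()           # reset: the gradient reflects this batch only
        outputs = model(sequence)
        loss = criterion(outputs, labels)
        loss.backward()                 # gradient of this single batch
        optimizer.step()                # one parameter update per batch
        epoch_loss += loss.item()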
Leslie N. Smith has written some excellent papers on a methodical approach to hyperparameter tuning. A great place to start is A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay. He recommends you start by reading the diagrams, which are very well done.

Losses keep increasing within iteration

I am just a little confused on the following:
I am training a neural network and have it print out the losses. I am training it for just 4 iterations to try it out, and I use batches. I normally see loss functions as parabolas, where the losses decrease to a minimum point before increasing again. But my losses keep increasing as the iteration progresses.
For example, let's say there are 100 batches in each iteration. In iteration 0, losses started at 26.3 (batch 0) and went up to 1500.7 (batch 100). In iteration 1, it started at 2.4e-14 and went up to 80.8.
I am following an example from spacy (https://spacy.io/usage/examples#training-ner). Should I be comparing the losses across batches instead (i.e. if I take the points from all of the batch 0s it should resemble a parabola)?
If you are using the exact same code as linked, this behaviour is to be expected.
for itn in range(n_iter):
    random.shuffle(TRAIN_DATA)
    losses = {}
    # batch up the examples using spaCy's minibatch
    batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
    for batch in batches:
        texts, annotations = zip(*batch)
        nlp.update(
            texts,        # batch of texts
            annotations,  # batch of annotations
            drop=0.5,     # dropout - make it harder to memorise data
            losses=losses,
        )
    print("Losses", losses)
An "iteration" is the outer loop: for itn in range(n_iter). And from the sample code you can also infer that losses is being reset every iteration. The nlp.update call will actually increment the appropriate loss in each call, i.e. with each batch that it processes.
So yes: the loss increases WITHIN an iteration, for each batch that you process. To check whether your model is actually learning anything, you need to compare the loss across iterations, similar to how the print statement in the original snippet only prints after looping through the batches, not during.
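Concretely, a sketch of tracking the loss per iteration rather than per batch (reusing the variables from the snippet above, and assuming an NER pipeline so that the loss is stored under the "ner" key):

iteration_losses = []
for itn in range(n_iter):
    random.shuffle(TRAIN_DATA)
    losses = {}
    for batch in minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, drop=0.5, losses=losses)
    iteration_losses.append(losses["ner"])   # cumulative loss over the whole iteration
print(iteration_losses)                      # this is the curve that should trend downward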
Hope that helps!

Understanding fit_generator (steps_per_epoch), validation_steps, evaluate_generator (Steps) & predict_generator (steps)

I am new to using Keras for my project. I have been working with generators in my model.
I am literally confused about what values I should supply for:
1) fit_generator: steps_per_epoch & validation_steps?
2) evaluate_generator: steps?
3) predict_generator: steps?
I have referred to the Keras documentation and a few other Stack Overflow questions (stack1, stack2), but I am still not able to understand. It is better if I give an example of the data shapes I am currently working with and ask my questions accordingly. Also, please correct me if my understanding is wrong.
model.fit_generator(trainGen, steps_per_epoch=25, epochs=100, validation_data=ValGen, validation_steps=4)
Q1: For every epoch, there are 25 steps. For each step, trainGen yields a tuple of shape (244*100*4, 244*100*2) and training is performed on it.
What will my batch_size and batches be if my steps_per_epoch is 25?
Q2:
I understood that val_acc and val_loss will be calculated at the end of the 25th step of an epoch. I chose validation_steps = 4, so ValGen yields a tuple of shape (30*100*4, 30*100*2) 4 times at the end of the 25th step of an epoch.
I chose validation_steps = 4 arbitrarily. But how do I choose the correct number of validation_steps? How are val_loss & val_acc calculated? (Is the mean taken over the 4 yields, either as a single batch or using batch_size?)
Q3:
Say, for example, in evaluate_generator & predict_generator my generator yields a tuple of shape (30*100*4, 30*100*2) for both.
How do I choose the correct number for the steps argument of both evaluate_generator & predict_generator? In the Keras documentation it is described as the "Total number of steps (batches of samples) to yield from generator before stopping". In my case, what will the batches of samples be?
If any additional information is required, let me know.
Steps are not a parameter that you "choose"; you compute them as:
steps = number of samples / batch size
So the only parameter that you are free to choose here is the batch size, which is chosen as a value at which the model does not run out of memory while training. Typical values are between 32 and 64.
For the training set, you divide the number of samples in the training set by the training batch size, and for the validation set, you divide the number of samples in the validation set by the validation batch size. Both batch sizes can be equal.
This applies to all functions that use generators.
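As a sketch with made-up sample counts (reusing trainGen and ValGen from the question, plus a hypothetical testGen):

import math

train_samples = 1000     # made-up counts
val_samples = 200
test_samples = 200
batch_size = 32          # the one value you actually choose

model.fit_generator(trainGen,
                    steps_per_epoch=math.ceil(train_samples / batch_size),
                    epochs=100,
                    validation_data=ValGen,
                    validation_steps=math.ceil(val_samples / batch_size))

model.evaluate_generator(testGen, steps=math.ceil(test_samples / batch_size))
predictions = model.predict_generator(testGen, steps=math.ceil(test_samples / batch_size))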

The meaning of batch_size in ptb_word_lm (LSTM model of tensorflow)

I am new to TensorFlow, and I am now a little confused about the meaning of batch_size. As is commonly known, batch_size is the number of samples in each batch, but according to the code in ptb_word_lm, it seems not to be:
reader.py:
data_len = tf.size(raw_data) #the number of words in dataset
batch_len = data_len // batch_size
What does batch_len mean? The number of batches?
ptb_word_lm.py:
self.epoch_size = ((len(data) // batch_size) - 1) // num_steps
What does epoch_size mean? The number of sequences in each batch?
But if batch_size meant the number of batches, then everything would make sense. Have I misunderstood something?
There are a few different concepts here: epoch, step, batch, and unroll steps for LSTM.
At the highest level, you train a network over multiple epochs. In each epoch, you go through and use all the training data (usually in a random order) in steps; in each step, you train on a batch of samples.
I think the confusion the LSTM adds here is that in each step you train on a sequence of batches, instead of a single batch. The length of the sequence is the number of unroll steps (num_steps).
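A small numeric sketch of how these quantities relate (the values are made up):

batch_size = 20
num_steps = 35                                # LSTM unroll length
data_len = 100000                             # made-up number of word ids in the corpus

batch_len = data_len // batch_size            # 5000 positions in each of the 20 parallel streams
epoch_size = (batch_len - 1) // num_steps     # 142 training steps make up one epoch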
