Maximum batch size in Keras Sequential model - python

I'm training a Keras Sequential model where I haven't specified a batch_size. When I looked into a list that saves all my batch sizes during training, I noticed that 1200 seemed to be the maximum batch size.
My input data has 1500 samples. When I set the batch size to 1500, Keras still splits it into 1200 plus an extra 300. When I don't specify anything, I get 37 batches of 32 plus one of size 16 (= 1200), and then 9 batches of 32 plus one of size 12 (= 300).
I looked into the Keras Sequential model documentation here, but haven't found an explanation for why this is happening.
I wondered whether my memory might be too small for 1500 samples, but it is only about 60% used during the computation, and that wouldn't explain the second observation anyway.

That is because, by default, it takes 80% of the data to train with and the other 20% to test (validate) with.
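For illustration, here is a minimal sketch of how such a split can arise, assuming a validation_split of 0.2 is in effect (the data and model below are hypothetical, but the resulting 1200/300 split matches the numbers in the question):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical data matching the question: 1500 samples.
x = np.random.rand(1500, 10)
y = np.random.rand(1500, 1)

model = Sequential([Dense(1, input_shape=(10,))])
model.compile(loss='mse', optimizer='adam')

# With validation_split=0.2, Keras holds out the last 20% of the samples:
# 1200 go to training (37 batches of 32 plus one of 16 at the default
# batch_size of 32) and 300 to validation (9 batches of 32 plus one of 12).
model.fit(x, y, epochs=1, validation_split=0.2)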

Related

What does batch_size mean in Keras' TimeSeriesGenerator?

I'm working on a video classification task with 5 classes, using a TimeDistributed CNN model. The training dataset contains 8 videos with 75 frames each. I have used Keras' TimeseriesGenerator with length equal to 75, since each video contains 75 frames. But it is unclear to me what batch_size should be in this case.
from keras.preprocessing.sequence import TimeseriesGenerator
train_sequences = TimeseriesGenerator(train_data, train_labels, length=75, batch_size=1)
Can anyone tell me what batch size should be considered for this task?
The batch size defines the number of video samples that will be introduced in each iteration of your model. The difference between batch size values lies in how the model's weights are optimized: if the batch size is equal to 3, the model will take in 3 sample videos and only after those 3 inputs will it update the weights.
There isn't an optimal value for the batch size; it's like the No Free Lunch Theorem. I suggest you try different values and look for the best results (see the sketch below).
There are constraints in defining the batch size:
If the value is small, it will require less memory and could be faster, since your model is processing fewer samples at a time. On the other hand, the gradient estimate will be less accurate.
If the value is big, the gradient estimate will be more accurate, but it will need more memory and it could be slower.
So you have to find a balance between gradient-estimation accuracy and computational resource usage.
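A minimal sketch of trying different batch sizes with TimeseriesGenerator (the frame array below is a hypothetical stand-in for the actual video data: 8 videos * 75 frames, flattened to 1024 features each):
import numpy as np
from keras.preprocessing.sequence import TimeseriesGenerator

# Hypothetical stand-in: 600 frames (8 videos x 75 frames), 1024 features each.
train_data = np.random.rand(600, 1024)
train_labels = np.random.randint(0, 5, size=(600, 1))

for batch_size in (1, 2, 4):
    gen = TimeseriesGenerator(train_data, train_labels, length=75, batch_size=batch_size)
    x, y = gen[0]
    # Each batch holds `batch_size` windows of 75 consecutive frames,
    # and the weights are updated once per batch.
    print(batch_size, len(gen), x.shape)  # e.g. 1, 525, (1, 75, 1024)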

what is a "batch" in Keras model.fit() when training data are images

Say I have a set of image data for training: 20 input images and 20 output images, each of size 512*512. First I prepare the training data as "train_image_input" (size 20*512*512) and "train_image_output" (size 20*512*512), then I run the code below in Keras,
model.fit(train_image_input, train_image_output, epochs=3, batch_size=5)
I would like to confirm the definition of a "batch" when the data are images. In the above example, does "batch_size=5" mean
5 images (data size 5*512*512) are taken into training at a time, or
5 columns from a single image (data size 5*512) are taken into training at a time?
I have read the article: https://machinelearningmastery.com/difference-between-a-batch-and-an-epoch/
and the description below confuses me about the definition of a sample/batch when the data are images:
What Is a Sample?
A sample is a single row of data.
It contains inputs that are fed into the algorithm and an output that is used to compare to the prediction and calculate an error.
A training dataset is comprised of many rows of data, e.g. many samples. A sample may also be called an instance, an observation, an input vector, or a feature vector.
Now that we know what a sample is, let’s define a batch.
What Is a Batch?
The batch size is a hyperparameter that defines the number of samples to work through before updating the internal model parameters.
Furthermore, if I set "batch_size=30", which is larger than the number of images, there is no error during code execution, so should I take that to mean the second interpretation (data size 5*512) is correct?
Thanks.
The batch size defines the number of samples that will be propagated through the network.
For instance, let's say you have 1050 training samples and you want to set up a batch_size equal to 100. The algorithm takes the first 100 samples (1st to 100th) from the training dataset and trains the network. Next, it takes the second 100 samples (101st to 200th) and trains the network again. We keep doing this procedure until we have propagated all samples through the network. A problem might arise with the last set of samples: in our example we've used 1050, which is not divisible by 100 without a remainder. The simplest solution is just to take the final 50 samples and train the network on those.
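To make the first interpretation concrete, here is a small sketch (with hypothetical random arrays shaped like the question's data) of how the samples are sliced along the first axis:
import numpy as np

# Hypothetical arrays shaped like the question: 20 images of 512x512.
train_image_input = np.random.rand(20, 512, 512)

batch_size = 5
# Keras slices along the first (sample) axis, so batch_size=5 means
# 5 whole images per batch: each batch has shape (5, 512, 512).
# A batch_size larger than 20 simply yields a single batch of all 20 images,
# which is why batch_size=30 runs without error.
for start in range(0, len(train_image_input), batch_size):
    batch = train_image_input[start:start + batch_size]
    print(batch.shape)  # (5, 512, 512)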

How large should the batch size and number of epochs be when fitting a model in Keras?

I am training a model with 107850 samples and validating on 26963 samples.
How large should the batch size and number of epochs be when fitting a model in Keras to optimize the validation accuracy? Is there any rule of thumb based on the size of the input data? Does an increased number of epochs overfit the model?
Thank You.
No! There is no rule of thumb for selecting the batch size. It's a trade-off between better accuracy and time, so we have to pick a batch size that processes the data quickly and still gives good accuracy. Now, what happens when you take a batch size that is too large? After every batch, your model updates all of its weights; with a large batch size, the error is accumulated over more samples before the model adjusts the weights, so processing a few large batches and updating the weights takes less time than taking smaller batches and updating the weights after every one. But when you take a small batch size (16, 32, 64, etc.), your model updates its weights after every such batch, so it is able to learn your data more accurately, although it takes more time to update all the weights after every batch.
According to research papers, most researchers use batch sizes of 16, 32 or 64; some may use larger batch sizes, but I haven't seen that myself.
I hope this answer is helpful for you.
And if you want the number of epochs to be optimized, use callbacks: your neural network will automatically stop training if the model has not improved for more than 4 or 5 epochs (the exact patience is up to you); see the sketch below.
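A minimal sketch of such a callback, assuming a reasonably recent Keras (restore_best_weights needs Keras >= 2.2.3); `model` and the training/validation arrays are assumed to exist:
from keras.callbacks import EarlyStopping

# Stop training when the validation loss hasn't improved for 5 epochs
# (the "4 or 5 epochs" patience mentioned above) and keep the best weights.
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

# Hypothetical fit call:
# model.fit(x_train, y_train, epochs=200, batch_size=32,
#           validation_data=(x_val, y_val), callbacks=[early_stop])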

Keras/TensorFlow - numpy vs tensor performance

I am trying to train neural networks using TensorFlow 1.12.0 and Keras API.
The setup is as follows. I have a large number of data points: each point consists of a context (call it 24 floats) and a label (1 float). The total amount of data is O(10^7) points. Test networks vary, but a fairly simple one might look like so
v=[keras.layers.Input(shape=input_shape)]
v.append(keras.layers.Dense(4,use_bias=False,activation=tf.nn.tanh)(v[-1]))
v.append(keras.layers.Dense(1,use_bias=False)(v[-1]))
model=keras.models.Model(inputs=v[0],outputs=v[-1])
optimizer = tf.train.RMSPropOptimizer(0.001)
model.compile(loss='mse', optimizer=optimizer, metrics=['mae'])
(...)
history=model.fit(x=train_data,y=train_labels,epochs=EPOCHS,verbose=1,batch_size=10000, shuffle=True)
I've been getting decent results keeping the data in numpy arrays and passing them to model.fit() as-is. However, I am not happy with the performance. It appears to bottleneck in Python code (I even tried to profile it and the Python method slice_arrays in python/keras/utils/generic_utils.py comes up as the primary bottleneck, with up to half of all time spent there.) The GPU (GeForce 1080 Ti) is in use, but its utilization (as reported by nvidia-smi) is rarely above 10-15%.
Looking for a way to speed things up, I tried to convert the data into tensors:
features=tf.convert_to_tensor(train_data)
labels=tf.convert_to_tensor(train_labels)
history=model.fit(x=features,y=labels,epochs=EPOCHS,verbose=1,steps_per_epoch=20, shuffle=True)
model.fit requires an argument batch_size when passed numpy arrays, but steps_per_epoch when passed tensors. Documentation isn't clear, but it seems that I should set steps_per_epoch to the number of data points divided by batch_size. This way I get comparable convergence rates. Whatever the value, GPU utilization is 100%, which is good.
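(For concreteness, the relationship I'm assuming, with hypothetical round numbers:)
num_samples = 10000000                        # O(10^7) points, as described above
batch_size = 100000
steps_per_epoch = num_samples // batch_size   # 100 steps per epoch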
But now there's a problem. With numpy arrays, run time per epoch is relatively independent of the batch size. I see 8 seconds per epoch at batch size 10k, 5 seconds at batch size 100k, and 7 seconds per epoch at batch size 1M. Convergence is generally way better if the batch size is small. So, I'd typically start with batch size 100k or less. On the other hand, with tensor input, run time per epoch goes up exponentially with steps_per_epoch. With batch size 1M (10-20 steps per epoch), I'm at 2 s / epoch, but the convergence rate is abysmal. With batch size 10k, the convergence rate is good, but the time is up to 30 s / epoch (actually slower than with numpy, despite 100% GPU utilization.)
Basically, the only scenario where tensor input actually ends up faster is the one where I don't care to use it in the first place.
What is going on and is there any way to get around this?

Tensorflow: on what does the batch_size depend?

I'm new to TensorFlow and I'm trying to understand what the batch size should be.
The shape of my data is (119396, 12955). How can I choose the best batch_size for my data?
And how does the batch_size depend on the data shape or the algorithm used?
The batch size is the number of input data values that you are introducing at once in the model. It is very important while training, and secondary when testing. For a standard Machine Learning/Deep Learning algorithm, choosing a batch size will have an impact on several aspects:
The bigger the batch size, the more data you will feed at once in a model. Thus, RAM memory consumption will be almost linear with batch size, and there will always be a limit based on your system specs and the size of your model above which your model will overflow.
The bigger the batch size, the faster you will loop over your dataset N times to perform training.
A bigger batch size will slow down your model training speed, meaning that it will take longer for your model to get one single update since that update depends on more data.
A bigger batch size will have more data to average towards the next update of the model, hence training should be smoother: smoother training/test accuracy curves.
Note that the size of the data is only related to the batch size in the sense that the bigger the data, the smaller the maximum batch size becomes (limit set by RAM). The size of the model also has a similar relation.
In practice, you should follow "in powers of 2 and the larger the better, provided that the batch fits into your (GPU) memory". For more in-depth details, check https://stackoverflow.com/a/46655895/9670056.
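As a back-of-the-envelope illustration for the (119396, 12955) data above (a sketch with an assumed memory budget, counting only the input batch, not activations or gradients):
bytes_per_sample = 12955 * 4          # float32 features per sample
budget_bytes = 2 * 1024**3            # assume roughly 2 GB available for input batches

batch_size = 1
while batch_size * 2 * bytes_per_sample <= budget_bytes:
    batch_size *= 2
print(batch_size)  # largest power of two whose input batch fits the assumed budget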
