I am using Keras with the Theano backend and I want to train my network on a GPU. That actually works pretty well. But when I train on a large amount of data, I noticed that there is a bottleneck in the model.fit() function (I am using the functional API).
Inside model.fit() Keras does use the GPU for the training itself. But before the training starts on the GPU, it needs a lot of CPU effort to prepare the training (I don't know exactly what fit() does before the actual training). The problem is that this part only uses one thread, so it takes quite a long time.
Is it possible to force Keras to use multiprocessing at this step?
Edit: added more details about my function call. It looks like this:
optimizer = SGD(lr=0.00001)
early_stopping = EarlyStopping(monitor='val_loss', patience=30, verbose=1, mode='auto')
outname = join(outdir, save_base_name+".model")
checkpoint = ModelCheckpoint(outname, monitor='val_loss', verbose=1, save_best_only=True)
model.compile(loss='hinge', optimizer=optimizer, metrics=['accuracy'])
model.fit(
    train_instances.x,
    train_instances.y,
    batch_size=60,
    epochs=50,
    verbose=1,
    callbacks=[checkpoint, early_stopping],
    validation_data=(valid_instances.x, valid_instances.y),
    shuffle=True
)
The model I use (you can find the implementation here: https://github.com/pexmar/DSCNN_document) has 90 inputs (shared layers) of dimension 100 x 300 (a word2vec embedding layer: 100 words, each with 300 dimensions). I feed 12500 training instances and 1000 validation instances to the network.
Related
I am training a neural network and I want to reduce the learning rate while training.
I am currently using the ReduceLROnPlateau callback provided by Keras. But when it reaches the patience limit, it simply stops and doesn't continue training.
I want to reduce the learning rate and keep the net training.
Here is my code.
optimizer = k.optimizers.Adam(learning_rate=1e-5)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['acc'])

learningRate = callbacks.callbacks.ReduceLROnPlateau(monitor='val_acc', verbose=1, mode='max',
                                                     factor=0.2, min_lr=1e-8, patience=7)

model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=1000,
                    epochs=30,
                    validation_steps=1000,
                    callbacks=[learningRate])
You're using EarlyStopping, and that is what is stopping your training.
"I want to reduce the learning rate and keep the net training but don't know how to do it."
If that is what you want, then remove EarlyStopping from your callbacks.
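For reference, a minimal sketch with ReduceLROnPlateau as the only callback (using the Keras 2.3-style arguments from your snippet; training_generator and validation_generator stand in for your own generators). On its own, ReduceLROnPlateau only lowers the learning rate on a plateau and never halts training:
from keras import callbacks, optimizers

# Multiply the learning rate by 0.2 when val_acc has not improved for 7 epochs
reduce_lr = callbacks.ReduceLROnPlateau(monitor='val_acc', mode='max', factor=0.2,
                                        patience=7, min_lr=1e-8, verbose=1)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.Adam(learning_rate=1e-5),
              metrics=['acc'])

# No EarlyStopping in the callbacks list, so training runs for all 30 epochs
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=1000,
                    validation_steps=1000,
                    epochs=30,
                    callbacks=[reduce_lr])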
I have a few questions about interpreting the performance of certain optimizers on MNIST using a LeNet5 network, and about what the validation loss/accuracy vs. training loss/accuracy graphs actually tell us.
Everything is done in Keras using a standard LeNet5 network, and it is run for 15 epochs with a batch size of 128.
There are two graphs, train acc vs val acc and train loss vs val loss. I made 4 graphs because I ran it twice, once with validation_split = 0.1 and once with validation_data = (x_test, y_test) in the model.fit parameters. Specifically, the difference is shown here:
train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_data=(x_test,y_test), verbose=1)
train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_split=0.1, verbose=1)
These are the graphs I produced:
using validation_data=(x_test, y_test):
using validation_split=0.1:
So my two questions are:
1.) How do I interpret both the train acc vs val acc and the train loss vs val loss graphs? What exactly do they tell me, and why do different optimizers perform differently (i.e. the graphs differ as well)?
2.) Why do the graphs change when I use validation_split instead? Which one would be a better choice to use?
I will attempt to provide an answer.
You can see that towards the end, training accuracy is slightly higher than validation accuracy and training loss is slightly lower than validation loss. This hints at overfitting, and if you train for more epochs the gap should widen.
Even if you use the same model with the same optimizer, you will notice slight differences between runs, because the weights are initialized randomly and there is randomness associated with the GPU implementation. You can look here for how to address this issue.
Different optimizers will usually produce different graphs because they update the model parameters differently. For example, vanilla SGD updates all parameters at a constant rate at every training step, but if you add momentum the update depends on previous updates, which usually results in faster convergence. That means you can reach the same accuracy as vanilla SGD in fewer iterations.
The graphs change because the training data changes if you split it randomly. For MNIST you should simply use the standard test split provided with the dataset.
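For example, a minimal sketch of validating against the standard MNIST test split instead of a random validation_split (this assumes the usual keras.datasets loader and an already compiled LeNet5-style model that expects channels-last 28x28x1 input):
from keras.datasets import mnist
from keras.utils import to_categorical

# Standard MNIST split: 60,000 training images, 10,000 test images
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Validate against the fixed test split so the curves stay comparable across optimizers
history = model.fit(x_train, y_train, epochs=15, batch_size=128,
                    validation_data=(x_test, y_test), verbose=1)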
In Keras, when we are training a model for a fixed number of epochs using model.fit(), one of its parameters is shuffle (a boolean). The Keras documentation about it reads:
"Boolean (whether to shuffle the training data before each epoch)."
Essentially, I am training a Convolutional Neural Network and trying to get reproducible results. So, I followed the instructions and specified seeds as mentioned in this answer.
Although it worked partially (I got reproducible results on my local machine only), I thought setting shuffle=False would help further (by keeping the data inputs in the same order). But, reproducibility aside for a second, doing that dramatically reduced the performance of the model. Specifically, the metrics give the same results after each epoch (i.e. they do not improve), and even increasing the number of epochs gives the same numbers (accuracy is ~75 after 3 epochs and after 30 epochs). Setting shuffle=True, on the other hand, shows gradual, normal improvement in the results.
Training data shape: (143256, 1, 150, 3)
Target data shape: (143256, 3)
Batch Size: 64
metrics = ['accuracy']
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=metrics)
....
model.fit(x_train, to_categorical(y_train), batch_size=batch_size,
          epochs=epochs, verbose=verbose,
          validation_data=(x_val, to_categorical(y_val)),
          shuffle=False, callbacks=[metrics],
          class_weight=class_weights)
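For completeness, the seed setup I followed looks roughly like this (assuming a TensorFlow 1.x backend; treat it as an approximation of the linked answer rather than my exact code):
import os
import random
import numpy as np
import tensorflow as tf

os.environ['PYTHONHASHSEED'] = '0'  # per the Keras FAQ; ideally set before the interpreter starts
np.random.seed(42)                  # NumPy randomness (e.g. data shuffling in fit)
random.seed(42)                     # Python stdlib randomness
tf.set_random_seed(42)              # TensorFlow graph-level randomness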
Is this normal behaviour when shuffling is set to False? Even though the data is not permuted, the weights should still be updated in each epoch, so the metrics should improve over time.
Assuming there is some issue with my implementation, should there be any significant difference in model performance between the two approaches (with or without shuffling)?
How can the results be reproducible with shuffle=True (which they apparently are) when seeds are specified?
Any help will be really appreciated. Thanks!
I have a Keras model and I trained it for 100 epochs.
I got a loss of 0.0085 at epoch 85 and 0.0092 at the last epoch.
My question is:
what does model.save() in Keras save?
Does it save the weights from the last epoch (i.e., epoch 100)?
Or does it save the weights from the best epoch (i.e., epoch 85)?
Or the average/mean of the weights across all 100 epochs?
What is Keras's model.save() actually designed to save after the 100 epochs complete?
Thanks in advance for an explanation :).
model.save() saves the whole architecture, the weights and the optimizer state. This command saves all the details needed to reconstitute your model.
The command will save:
The architecture of the model, allowing you to re-create the model;
The weights of the model;
The training configuration (loss, optimizer);
The state of the optimizer, allowing you to resume training exactly where you left off.
So you can reuse your model using keras.models.load_model(filepath) to reinstantiate your model. load_model will also take care of compiling the model using the saved training configuration.
See the example:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
Source: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
model.save() will save many details about your NN. The most important ones are:
The architecture of the network, including the dimensions (input/output layers, hidden layers, etc.).
The weight matrices for every unit in each layer, together with the activation functions.
And many other details that we may not need to outline here.
Coming back to the second part of your question: when you save the trained model, it is saved as it is after the last epoch. That means the final loss may be higher or lower than in previous epochs, depending on the number of epochs you specified and how close you got to overfitting.
Also, the number of epochs is not saved, and saving it would not make sense in most situations according to François Chollet, the creator of Keras; see this conversation.
This is true unless you use a callback that stops the training of your network early once it has stopped improving for a certain number of epochs (related to what you called the best epoch); see this.
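For instance, a minimal sketch of such a callback (restore_best_weights needs Keras >= 2.2.3; x_val and y_val stand in for your validation data). Without restore_best_weights, the model keeps the last-epoch weights even when training stops early:
from keras.callbacks import EarlyStopping

# Stop once val_loss has not improved for 10 epochs and roll back to the best weights seen so far
early_stopping = EarlyStopping(monitor='val_loss', patience=10,
                               restore_best_weights=True, verbose=1)

model.fit(x_train, y_train, epochs=100,
          validation_data=(x_val, y_val),
          callbacks=[early_stopping])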
"My question is, what does model.save() save? Does it save the weights from the last epoch (i.e., epoch 100), or the weights from the best epoch (i.e., epoch 85), or the average/mean of the weights across all 100 epochs?"
What exactly is saved (weights, optimizer state, etc.) is already covered in the other answers. In your case, the weights of the model at the end of the 100 epochs are saved.
If you would like to save the best model (the one with the lowest loss), you need to create a ModelCheckpoint callback object and pass it to the fit() method via the callbacks argument.
https://keras.io/callbacks/#ModelCheckpoint
https://keras.io/callbacks/#example-model-checkpoints
"What does model.save() in Keras save?" It saves the weights.
"Does it save the weights from the last epoch (i.e., epoch 100)?" Yes.
"Or does it save the weights from the best epoch (i.e., epoch 85)?" No; for saving the weights of the best epoch, use the chunk of code I have given below.
"Or the average/mean of the weights across all 100 epochs?" No.
"What is model.save() actually designed to save after the 100 epochs complete?" Yes, it saves the weights after the 100 epochs complete, but have a look at the following code for saving the weights of only the best epoch.
Use this chunk of code to:
Save the weights of the best epoch only
Update the saved weights after an epoch only if the given criterion improves (val_loss reaches a new minimum)
Additionally, the history after each epoch will be saved in a .csv file.
Code
import pandas as pd
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop when val_loss is no longer decreasing
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')

# Save the model weights only when val_loss improves (best epoch so far)
checkpointer = ModelCheckpoint(filepath='Model_1_weights.h5', verbose=1, save_best_only=True)

# The history object records the training progress of each epoch
history = model.fit(X_train, y_train, batch_size=20, epochs=40,
                    validation_data=(X_valid, y_valid), shuffle=True,
                    callbacks=[checkpointer, earlyStopping])

# Save the progress of each epoch to a .csv file
hist_df = pd.DataFrame(history.history)
hist_csv_file = 'History_Model_1.csv'
with open(hist_csv_file, mode='w') as f:
    hist_df.to_csv(f)
Link: https://keras.io/callbacks/#ModelCheckpoint
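To use what this produces afterwards, you can reload the best-epoch weights into the same architecture and read the history back; the file names simply follow the snippet above:
import pandas as pd

# Restore the best-epoch weights saved by the ModelCheckpoint above
model.load_weights('Model_1_weights.h5')

# Inspect the per-epoch metrics written to the .csv file
history_df = pd.read_csv('History_Model_1.csv', index_col=0)
print(history_df[['loss', 'val_loss']].tail())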
Following this great post: Scaling Keras Model Training to Multiple GPUs, I tried to upgrade my model to run in parallel on my multi-GPU instance.
At first I ran the MNIST example as proposed here: MNIST in Keras, with the additional syntax in the compile command as follows:
# Prepare the list of GPUs to be used in training
NUM_GPU = 8  # or the number of GPUs available on your machine
gpu_list = []
for i in range(NUM_GPU):
    gpu_list.append('gpu(%d)' % i)

# Compile your model by setting the context to the list of GPUs to be used in training.
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'],
              context=gpu_list)
then I trained the model:
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
So far so good. It ran for less than 1s per epoch and I was really excited and happy, until I tried data augmentation.
Up to that point, my training images were a numpy array of size (6000,1,28,28) and the labels were of size (10,60000), one-hot encoded. For data augmentation I used the ImageDataGenerator class:
gen = image.ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
                               height_shift_range=0.08, zoom_range=0.08)
batches = gen.flow(x_train, y_train, batch_size=NUM_GPU*64)
test_batches = gen.flow(x_test, y_test, batch_size=NUM_GPU*64)
and then:
model.fit_generator(batches, batches.N, nb_epoch=1,
                    validation_data=test_batches, nb_val_samples=test_batches.N)
And unfortunately, instead of 1s per epoch I started getting ~11s per epoch... I suppose the "impact" of the ImageDataGenerator is destructive, and that it runs the whole (reading -> augmenting -> writing to GPU) pipeline really slowly and inefficiently.
Scaling Keras to multiple GPUs is great, but data augmentation is essential for my model to be robust enough.
I guess one solution could be to load all the images from the directory and write my own function that shuffles and augments those images, roughly as sketched below. But I'm sure there must be some easier way to optimize this process using the Keras API.
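Something like this rough, numpy-only sketch is what I have in mind (augment_fn is a hypothetical per-batch augmentation function, and x and y are assumed to be indexed along the first, sample axis; I have not benchmarked this):
import numpy as np

def shuffled_augmented_batches(x, y, batch_size, augment_fn):
    """Yield shuffled (x, y) batches forever, applying augment_fn to each image batch."""
    while True:  # Keras generators are expected to loop indefinitely
        idx = np.random.permutation(len(x))
        for start in range(0, len(x), batch_size):
            batch = idx[start:start + batch_size]
            yield augment_fn(x[batch]), y[batch]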
Thanks!
Ok, I've found the solution.
You need to use MXNet's iterator; see here:
Image IO - Loading and pre-processing images
instead of Keras's ImageDataGenerator.
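For illustration, a rough sketch of the kind of record-file iterator that tutorial describes (the .rec path, shapes and augmentation values are placeholders, not my actual setup):
import mxnet as mx

# Decoding and augmentation run in native worker threads instead of the Python main thread
train_iter = mx.io.ImageRecordIter(
    path_imgrec='mnist_train.rec',   # pre-packed record file (placeholder name)
    data_shape=(1, 28, 28),          # channels, height, width
    batch_size=NUM_GPU * 64,
    shuffle=True,
    rand_mirror=False,               # mirroring makes little sense for digits
    max_rotate_angle=8,              # roughly matches rotation_range=8 above
    preprocess_threads=4             # parallel CPU workers for preprocessing
)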