How to reduce the learning rate during training correctly - Python

I am training a neural network and I want to reduce the learning rate while training.
I am currently using the ReduceLROnPlateau callback provided by Keras. But when it reaches the patience threshold, training simply stops and doesn't continue.
I want to reduce the learning rate and keep the net training.
Here is my code:
optimizer = k.optimizers.Adam(learning_rate=1e-5)
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['acc'])
learningRate = callbacks.callbacks.ReduceLROnPlateau(monitor='val_acc', verbose=1, mode='max',
                                                     factor=0.2, min_lr=1e-8, patience=7)
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator,
                    steps_per_epoch=1000,
                    epochs=30,
                    validation_steps=1000,
                    callbacks=[learningRate])

You're using EarlyStopping, and that is what stops your training.
"I want to reduce the learning rate and keep the net training but don't know how to do it."
If you want this, then remove EarlyStopping from your callbacks.
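A minimal sketch of what that looks like, reusing the model and generators from the question (and assuming a recent tf.keras where fit() accepts generators directly): only ReduceLROnPlateau is passed as a callback, so training runs for all 30 epochs while the learning rate is reduced whenever val_acc plateaus.
from tensorflow import keras

optimizer = keras.optimizers.Adam(learning_rate=1e-5)
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])

# Only reduce the learning rate on a plateau; nothing here stops training early.
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor='val_acc', mode='max',
                                              factor=0.2, patience=7,
                                              min_lr=1e-8, verbose=1)

model.fit(training_generator,
          validation_data=validation_generator,
          steps_per_epoch=1000,
          validation_steps=1000,
          epochs=30,
          callbacks=[reduce_lr])   # no EarlyStopping in this list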

Related

Query about TensorFlow Keras learning rate?

I'm implementing a deep architecture using TensorFlow Keras. At first, I compiled the model without defining a learning rate, for example:
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
I'm wondering what the default learning rate is and how TensorFlow Keras sets it. Second, which is preferable: the default learning rate or a custom (user-specified) learning rate?
Then I switched to a custom learning rate. However, I've observed two different ways of assigning a value to the learning rate. For instance, one is lr and the other is learning_rate.
First way to set the learning rate:
optimizer = Adam(learning_rate=0.001)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
Second way to set the learning rate:
optimizer = Adam(lr=0.001)
model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=["accuracy"])
What is the difference between learning_rate and lr?
The lr argument is deprecated, but both essentially do the same thing: lr is just the older name for learning_rate. You can check this here. Credit: @Frightera.
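As a small illustration (assuming tf.keras 2.x), both spellings configure the same hyperparameter, and if you pass neither, Adam defaults to 0.001:
from tensorflow import keras

opt_default = keras.optimizers.Adam()                    # default learning_rate = 0.001
opt_new     = keras.optimizers.Adam(learning_rate=1e-3)  # current argument name
opt_old     = keras.optimizers.Adam(lr=1e-3)             # deprecated alias; may emit a warning,
                                                         # and is removed in newer Keras releases

print(float(opt_default.learning_rate))  # 0.001
print(float(opt_new.learning_rate))      # 0.001
print(float(opt_old.learning_rate))      # 0.001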

How to monitor accuracy in tensorflow (metric accuracy is not available)

I would like to monitor accuracy for my TensorFlow model. However, when I compile my model using metrics=['accuracy'] or metrics=[tf.keras.metrics.Accuracy()] and then train it, the following warning pops up:
WARNING:tensorflow: Early stopping conditioned on metric accuracy which is not available. Available metrics are: loss, val_loss
model.compile(optimizer='adam', loss='mean_squared_error', metrics=["tried both options i mentioned"])
callbacks = [EarlyStopping(monitor='accuracy', patience=1000)]
model.fit(x_train, y_train, epochs=5000, batch_size=100, validation_split=0.2, callbacks=callbacks)
Based on the link here:
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:
So, for other problems like regression, you should use other metrics rather than accuracy, e.g. metrics=[tf.keras.metrics.MeanSquaredError()].
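A minimal sketch along those lines, reusing the model and arrays from the question (the metric name 'mae' below is just an illustrative choice; EarlyStopping has to monitor a key that actually appears in the training logs):
import tensorflow as tf

model.compile(optimizer='adam',
              loss='mean_squared_error',
              metrics=[tf.keras.metrics.MeanAbsoluteError(name='mae')])

# 'mae' and 'val_mae' now show up in the logs, so they can be monitored.
callbacks = [tf.keras.callbacks.EarlyStopping(monitor='val_mae', patience=1000)]
model.fit(x_train, y_train, epochs=5000, batch_size=100,
          validation_split=0.2, callbacks=callbacks)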
In addition to Kaveh's answer, there are other metrics for regression problems. One that I think is quite useful is R squared (https://en.wikipedia.org/wiki/Coefficient_of_determination), and it isn't included in Keras.
The TensorFlow Addons library (https://www.tensorflow.org/addons) implements it, and it can be used in an ANN with the following code:
import tensorflow_addons as tfa

model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01),
              loss="mean_squared_error",
              metrics=[tfa.metrics.RSquare(y_shape=(1,))])

Neural Network optimization for image classification in keras/tensorflow

I am writing a program for classifying images into two categories: "Wires" and "non-Wires". I have hand-labeled around 5000 microscope images; examples:
(example images omitted: one labeled "non-wire", one labeled "wire")
The neural network I am using is adapted from "Deep Learning with Python", the chapter about convolutional networks (I don't think convolutional networks are necessary here because there are no obvious hierarchies; Dense networks should be more suitable):
model = models.Sequential()
model.add(layers.Dense(32, activation='relu',input_shape=(200,200,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dense(2, activation='softmax'))
However, test accuracy after 10 epochs of training does not go over 92% when playing around with the parameters of the network. Training images contain about 1/3 wires and 2/3 non-wires. My question: do you see any obvious mistakes in this neural network design that inhibit accuracy, or do you think I am limited by the image quality? I have about 4000 training and 1000 test images.
You might get some improvement by trying to handle the class imbalance using a weights dictionary. If the label for non-wire is 0 and the label for wire is 1, then the weight dictionary would be
weight_dict = {0: 0.5, 1: 1.0}
and in model.fit you set class_weight=weight_dict.
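For illustration, a minimal sketch of that fit call (x_train, y_train, x_val and y_val are placeholder names for your labeled image arrays, not variables from the original post):
# Pass the class-weight dictionary so the loss up-weights the minority "wire" class.
weight_dict = {0: 0.5, 1: 1.0}
history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=10,
                    batch_size=32,
                    class_weight=weight_dict)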
Without seeing the results of training (training loss and validation loss) I can't tell what else to do. If you are overfitting, try adding some dropout layers. I also recommend trying an adjustable learning rate using the Keras callback ReduceLROnPlateau, and early stopping using the Keras callback EarlyStopping. Documentation is here. Set each callback to monitor validation loss. My suggested code is shown below:
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5, patience=2, verbose=1)
e_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, verbose=0, restore_best_weights=True)
callbacks = [reduce_lr, e_stop]
In model.fit include
callbacks=callbacks
If you want to give a convolutional network a try, I recommend transfer learning using the MobileNet model. Documentation for that is here. My recommended code for that is below:
import tensorflow as tf
from tensorflow.keras.layers import BatchNormalization, Dense, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adamax

base_model = tf.keras.applications.mobilenet.MobileNet(include_top=False,
                                                        input_shape=(200, 200, 3),
                                                        pooling='max',
                                                        weights='imagenet',
                                                        dropout=.4)
x = base_model.output
x = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(rate=.3, seed=123)(x)
output = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=output)
model.compile(Adamax(learning_rate=.001), loss='categorical_crossentropy',
              metrics=['accuracy'])
In model.fit include the callbacks as shown above.

Interpreting training loss/accuracy vs validation loss/accuracy

I have a few questions about interpreting the performance of certain optimizers on MNIST using a LeNet5 network, and about what the validation loss/accuracy vs training loss/accuracy graphs tell us exactly.
Everything is done in Keras using a standard LeNet5 network, and it is run for 15 epochs with a batch size of 128.
There are two graphs, train acc vs val acc and train loss vs val loss. I made 4 graphs because I ran it twice, once with validation_split = 0.1 and once with validation_data = (x_test, y_test) in model.fit parameters. Specifically the difference is shown here:
train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_data=(x_test,y_test), verbose=1)
train = model.fit(x_train, y_train, epochs=15, batch_size=128, validation_split=0.1, verbose=1)
These are the graphs I produced (train/validation accuracy and loss curves; one set using validation_data=(x_test, y_test), one using validation_split=0.1; images not reproduced here).
So my two questions are:
1.) How do I interpret both the train acc vs val acc and the train loss vs val loss graphs? What do they tell me exactly, and why do different optimizers perform differently (i.e. why are the graphs different as well)?
2.) Why do the graphs change when I use validation_split instead? Which one would be a better choice to use?
I will attempt to provide an answer.
You can see that towards the end, training accuracy is slightly higher than validation accuracy and training loss is slightly lower than validation loss. This hints at overfitting, and if you train for more epochs the gap should widen.
Even if you use the same model with the same optimizer, you will notice slight differences between runs because weights are initialized randomly and because of randomness associated with the GPU implementation. You can look here for how to address this issue.
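A common way to reduce that run-to-run variation (a sketch only; it fixes the random seeds but does not make every GPU op deterministic) is to seed Python, NumPy, and TensorFlow before building the model:
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)          # Python's built-in RNG
np.random.seed(SEED)       # NumPy RNG (used e.g. for shuffling)
tf.random.set_seed(SEED)   # TensorFlow RNG (weight initialization, dropout, ...)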
Different optimizers will usually produce different graphs because they update the model parameters differently. For example, vanilla SGD will do updates at a constant rate for all parameters and at all training steps. But if you add momentum, the rate will depend on previous updates and will usually result in faster convergence, which means you can achieve the same accuracy as vanilla SGD in a lower number of iterations.
The graphs change because the validation data differs between the two runs: validation_split holds out part of the training set, while validation_data=(x_test, y_test) evaluates on the test set. For MNIST you should use the standard test split provided with the dataset.
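To make the difference concrete, a small sketch (my own illustration, not from the original answer): with array inputs, validation_split=0.1 holds out the last 10% of the training arrays before any shuffling, roughly equivalent to doing the split by hand:
# Roughly what validation_split=0.1 does with array inputs:
n_val = int(0.1 * len(x_train))
x_tr,  y_tr  = x_train[:-n_val], y_train[:-n_val]
x_val, y_val = x_train[-n_val:], y_train[-n_val:]

train = model.fit(x_tr, y_tr, epochs=15, batch_size=128,
                  validation_data=(x_val, y_val), verbose=1)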

Force Keras model.fit() to use multiprocessing

I am using Keras with the Theano backend and I want to train my network on a GPU. That actually works pretty well. But when I want to train on a huge amount of data, I noticed that there is a bottleneck in the model.fit() function (I am using the functional API).
In the model.fit() function, Keras starts to use the GPU for the training. But before it starts on the GPU, it needs a lot of CPU effort to prepare the training (I don't know exactly what fit() is doing before the actual training). The problem is that this part only uses one thread, so it takes pretty long.
Is it possible to force Keras to use multiprocessing at this step?
Edit: I've added additional details. My function call looks like this:
optimizer = SGD(lr=0.00001)
early_stopping = EarlyStopping(monitor='val_loss', patience=30, verbose=1, mode='auto')
outname = join(outdir, save_base_name + ".model")
checkpoint = ModelCheckpoint(outname, monitor='val_loss', verbose=1, save_best_only=True)
model.compile(loss='hinge', optimizer=optimizer, metrics=['accuracy'])
model.fit(train_instances.x,
          train_instances.y,
          batch_size=60,
          epochs=50,
          verbose=1,
          callbacks=[checkpoint, early_stopping],
          validation_data=(valid_instances.x, valid_instances.y),
          shuffle=True)
The model I use (you can find the implementation here: https://github.com/pexmar/DSCNN_document) has 90 inputs (shared layers) of dimension 100 x 300 (word2vec embedding layer: 100 words, each with 300 dimensions). I give 12500 training instances and 1000 validation instances to the network.
