I have trouble with recording 'val_loss' and 'val_acc' in Keras. 'loss' and 'acc' are easy because they always recorded in history of model.fit.
'val_loss' is recorded if validation is enabled in fit, and val_acc is recorded if validation and accuracy monitoring are enabled. But what does this mean?
My node is model.fit(train_data, train_labels,epochs = 64,batch_size = 10,shuffle = True,validation_split = 0.2, callbacks=[history]).
As you see, I use 5-fold cross-validation and shuffle the data. In this case, how can I enable validation in fit to record 'val_loss' and 'val_acc'?
Thanks
From Keras documentation, we have for models.fit method:
fit(x=None, y=None,
batch_size=None,
epochs=1,
verbose=1,
callbacks=None,
validation_split=0.0, validation_data=None,
shuffle=True,
class_weight=None,
sample_weight=None,
initial_epoch=0,
steps_per_epoch=None,
validation_steps=None
)
'val_loss' is recorded if validation is enabled in fit, and val_accis recorded if validation and accuracy monitoring are enabled. - This is from the keras.callbacks.Callback() object, if used for callbacks parameter in the above fit method.
Instead of using the history callback, which you've used, it can be used as follows:
from keras.callbacks import Callback
logs = Callback()
model.fit(train_data,
train_labels,
epochs = 64,
batch_size = 10,
shuffle = True,
validation_split = 0.2,
callbacks=[logs]
)
'val_loss' is recorded if validation is enabled in fit means: when using the model.fit method you are using either the validatoin_split parameter or you use validation_data parameter to specify the tuple (x_val, y_val) or tuple (x_val, y_val, val_sample_weights) on which to evaluate the loss and any model metrics at the end of each epoch. .
A History object. Its History.history attribute is a record of
training loss values and metrics values at successive epochs, as well
as validation loss values and validation metrics values (if
applicable). - Keras Documentation ( Return value for model.fit method)
You are using the History callback, in your model as follows:
model.fit(train_data,
train_labels,
epochs = 64,
batch_size = 10,
shuffle = True,
validation_split = 0.2,
callbacks=[history]
)
history.history will output a dictionary for you with the : loss, acc, val_loss and val_acc, if you use a variable for saving model.fit like below:
history = model.fit(
train_data,
train_labels,
epochs = 64,
batch_size = 10,
shuffle = True,
validation_split = 0.2,
callbacks=[history]
)
history.history
The output will be like the following:
{'val_loss': [14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849,
14.431451635814849],
'val_acc': [0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403,
0.1046428571712403],
'loss': [14.555215610322499,
14.555215534028553,
14.555215548560733,
14.555215588524229,
14.555215592157273,
14.555215581258137,
14.555215575808571,
14.55521561940511,
14.555215563092913,
14.555215624854679],
'acc': [0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571,
0.09696428571428571]}
You can save the data both by using csvlogger like below as given in the comments or by using the longer method of writing a dictionary to a csv file as given here writing a dictionary to a csv
csv_logger = CSVLogger('training.log')
model.fit(X_train, Y_train, callbacks=[csv_logger])
UPDATE: The val_accuracy dictionary key seems to no longer work today. No idea why, but I removed that code from here despite the OP asking how to log it (also, loss is what actually matters for comparison of cross-validation results).
Using Python 3.7 and Tensorflow 2.0, the following worked for me, after much searching, guessing, and failing repeatedly. I started with someone else's script to get what I needed written to a .json file; it produces one such .json file per training run showing the validation loss per epoch so you can see how the model converged (or did not); accuracy is logged but not as a performance metric.
NOTE: You need to fill in yourTrainDir, yourTrainingData, yourValidationData, yourOptimizer, yourLossFunctionFromKerasOrElsewhere, yourNumberOfEpochs, etc. to enable this code to run:
import numpy as np
import os
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, LambdaCallback
import json
model.compile(
optimizer=yourOptimizer,
loss=yourLossFunctionFromKerasOrElsewhere()
)
# create a custom callback to enable future cross-validation efforts
yourTrainDir = os.getcwd() + '/yourOutputFolderName/'
uniqueID = np.random.randint(999999) # To distinguish validation runs by saved JSON name
epochValidationLog = open(
yourTrainDir +
'val_log_per_epoch_' +
'{}_'.format(uniqueID) +
'.json',
mode='wt',
buffering=1
)
ValidationLogsCallback = LambdaCallback(
on_epoch_end = lambda epoch,
logs: epochValidationLog.write(
json.dumps(
{
'oneIndexedEpoch': epoch + 1,
'Validationloss': logs['val_loss']
}
) + '\n'
),
on_train_end = lambda logs: epochValidationLog.close()
)
# set up the list of callbacks
callbacksList = [
ValidationLogsCallback,
EarlyStopping(patience=40, verbose=1),
]
results = model.fit(
x=yourTrainingData,
steps_per_epoch=len(yourTrainingData),
validation_data=yourValidationData,
validation_steps=len(yourValidationData),
epochs=yourNumberOfEpochs,
verbose=1,
callbacks=callbacksList
)
This produces a JSON file in TrainDir folder recording validation loss and accuracy for each training epoch as its own dictionary-like item. Note that the epoch number is indexed to start at 1 so it matches the output of tensorflow, not the actual index in Python.
I am outputting to .JSON file but it could be anything. Here is my code for analyzing the JSON files produced; I could have put it all in one script but did not.
import os
from pathlib import Path
import json
currentDirectory = os.getcwd()
outFileName = 'CVResults.json'
outFile = open(outFileName, mode='wt')
validationLogPaths = Path().glob('val_log_per_epoch_*.json')
# Necessary list to detect short unique IDs for each training session
stringDecimalDigits = [
'1',
'2',
'3',
'4',
'5',
'6',
'7',
'8',
'9',
'0'
]
setStringDecimalDigits = set(stringDecimalDigits)
trainingSessionsList = []
# Load the JSON files into memory to allow reading.
for validationLogFile in validationLogPaths:
trainingUniqueIDCandidate = str(validationLogFile)[18:21]
# Pad unique IDs with fewer than three digits with zeros at front
thirdPotentialDigitOfUniqueID = trainingUniqueIDCandidate[2]
if setStringDecimalDigits.isdisjoint(thirdPotentialDigitOfUniqueID):
secondPotentialDigitOfUniqueID = trainingUniqueIDCandidate[1]
if setStringDecimalDigits.isdisjoint(secondPotentialDigitOfUniqueID):
trainingUniqueID = '00' + trainingUniqueIDCandidate[:1]
else:
trainingUniqueID = '0' + trainingUniqueIDCandidate[:2]
else:
trainingUniqueID = trainingUniqueIDCandidate
trainingSessionsList.append((trainingUniqueID, validationLogFile))
trainingSessionsList.sort(key=lambda x: x[0])
# Analyze and export cross-validation results
for replicate in range(len(dict(trainingSessionsList).keys())):
validationLogFile = trainingSessionsList[replicate][1]
fileOpenForReading = open(
validationLogFile, mode='r', buffering=1
)
with fileOpenForReading as openedFile:
jsonValidationData = [json.loads(line) for line in openedFile]
bestEpochResultsDict = {}
oneIndexedEpochsList = []
validationLossesList = []
for line in range(len(jsonValidationData)):
tempDict = jsonValidationData[line]
oneIndexedEpochsList.append(tempDict['oneIndexedEpoch'])
validationLossesList.append(tempDict['Validationloss'])
trainingStopIndex = min(
range(len(validationLossesList)),
key=validationLossesList.__getitem__
)
bestEpochResultsDict['Integer_unique_ID'] = trainingSessionsList[replicate][0]
bestEpochResultsDict['Min_val_loss'] = validationLossesList[trainingStopIndex]
bestEpochResultsDict['Last_train_epoch'] = oneIndexedEpochsList[trainingStopIndex]
outFile.write(json.dumps(bestEpochResultsDict, sort_keys=True) + '\n')
outFile.close()
This last block of code creates a JSON summarizing what is in CVResults.json produced above:
from pathlib import Path
import json
import os
import statistics
outFile = open("CVAnalysis.json", mode='wt')
CVResultsPath = sorted(Path().glob('*CVResults.json'))
if len(CVResultsPath) > 1:
print('\nPlease analyze only one CVResults.json file at at time.')
userAnswer = input('\nI understand only one will be analyzed: y or n')
if (userAnswer == 'y') or (userAnswer == 'Y'):
print('\nAnalyzing results in file {}:'.format(str(CVResultsPath[0])))
# Load the first CVResults.json file into memory to allow reading.
CVResultsFile = CVResultsPath[0]
fileOpenForReading = open(
CVResultsFile, mode='r', buffering=1
)
outFile.write(
'Analysis of cross-validation results tabulated in file {}'.format(
os.getcwd()
) +
str(CVResultsFile) +
':\n\n'
)
with fileOpenForReading as openedFile:
jsonCVResultsData = [json.loads(line) for line in openedFile]
minimumValidationLossesList = []
trainedOneIndexedEpochsList = []
for line in range(len(jsonCVResultsData)):
tempDict = jsonCVResultsData[line]
minimumValidationLossesList.append(tempDict['Min_val_loss'])
trainedOneIndexedEpochsList.append(tempDict['Last_train_epoch'])
outFile.write(
'\nTrained validation losses: ' +
json.dumps(minimumValidationLossesList) +
'\n'
)
outFile.write(
'\nTraining epochs required: ' +
json.dumps(trainedOneIndexedEpochsList) +
'\n'
)
outFile.write(
'\n\nMean trained validation loss: ' +
str(round(statistics.mean(minimumValidationLossesList), 4)) +
'\n'
)
outFile.write(
'Median of mean trained validation losses per session: ' +
str(round(statistics.median(minimumValidationLossesList), 4)) +
'\n'
)
outFile.write(
'\n\nMean training epochs required: ' +
str(round(statistics.mean(trainedOneIndexedEpochsList), 1)) +
'\n'
)
outFile.write(
'Median of mean training epochs required per session: ' +
str(round(statistics.median(trainedOneIndexedEpochsList), 1)) +
'\n'
)
outFile.close()
It is possible to save the data of val_loss and val_acc using the ModelCheckpoint class of Keras.
from keras.callbacks import ModelCheckpoint
checkpointer = ModelCheckpoint(filepath='yourmodelname.hdf5',
monitor='val_loss',
verbose=1,
save_best_only=False)
history = model.fit(X_train, y_train, epochs=100, validation_split=0.02, callbacks=[checkpointer])
history.history.keys()
# output
# dict_keys(['val_loss', 'val_mae', 'val_acc', 'loss', 'mae', 'acc'])
An important point, if you omit the validation_split property, you will only get the values of loss, mae and acc.
Hope this helps!
Related
I'm trying to train a basic Graph Neural Network using the StellarGraph library, in particular starting from the example provided in [0].
The example works fine, but now I would like to repeat the same exercize removing the N-Fold Crossvalidation and providing specific training, validation and test sets. I'm trying to do so with the following code:
# One hot encoding
graph_training_set_labels_encoded = pd.get_dummies(graphs_training_set_labels, drop_first=True)
graph_validation_set_labels_encoded = pd.get_dummies(graphs_validation_set_labels, drop_first=True)
graphs = graphs_training_set + graphs_validation_set
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, len(graphs_training_set))],
targets=graph_training_set_labels_encoded,
batch_size=batch_size)
valid_gen = generator.flow([x for x in range(len(graphs_training_set),
len(graphs_training_set) + len(graphs_validation_set))],
targets=graph_validation_set_labels_encoded,
batch_size=batch_size)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0,
patience=20,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Creating Keras model and preparing it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=0, callbacks=[es])
# Calculate performance on the validation data
test_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = test_metrics[model.metrics_names.index("acc")]
print(f"Test Accuracy model = {valid_acc}")
Where graphs_training_set and graphs_validation_set are lists of StellarDiGraphs.
I am able to run this piece of code, but it provides NaN as result. What could be the problem?
Since it is the first time I am using StellarGraph and in particular PaddedGraphGenerator. I think my mistake rely on the usage of that generator, but providing training set and validation set in different manner didn't produce better results.
Thank you in advance.
UPDATE Fixed I typo in the code, as pointed out here (thanks to george123).
[0] https://stellargraph.readthedocs.io/en/stable/demos/graph-classification/gcn-supervised-graph-classification.html
I found a solution digging in the StellarGraph documentation for PaddedGraphGenerator and GCN Neural Network Class GCNSupervisedGraphClassification. Furthermore, I have found a similar question on StellarGraph Issue Tracker which also points out to the solution.
# Graph generator preparation
generator = PaddedGraphGenerator(graphs=graphs)
train_gen = generator.flow([x for x in range(0, num_graphs_for_training)],
targets=training_graphs_labels,
batch_size=35)
valid_gen = generator.flow([x for x in range(num_graphs_for_training, num_graphs_for_training + num_graphs_for_validation)],
targets=validation_graphs_labels,
batch_size=35)
# Stopping criterium
es = EarlyStopping(monitor="val_loss",
min_delta=0.001,
patience=10,
restore_best_weights=True)
# Model definition
gc_model = GCNSupervisedGraphClassification(layer_sizes=[64, 64],
activations=["relu", "relu"],
generator=generator,
dropout=dropout_value)
x_inp, x_out = gc_model.in_out_tensors()
predictions = Dense(units=32, activation="relu")(x_out)
predictions = Dense(units=16, activation="relu")(predictions)
predictions = Dense(units=1, activation="sigmoid")(predictions)
# Let's create the Keras model and prepare it for training
model = Model(inputs=x_inp, outputs=predictions)
model.compile(optimizer=Adam(adam_value), loss=binary_crossentropy, metrics=["acc"])
# GNN Training
history = model.fit(train_gen, epochs=num_epochs, validation_data=valid_gen, verbose=1, callbacks=[es])
# Evaluate performance on the validation data
valid_metrics = model.evaluate(valid_gen, verbose=0)
valid_acc = valid_metrics[model.metrics_names.index("acc")]
# Define test set indices temporary vars
index_begin_test_set = num_graphs_for_training + num_graphs_for_validation
index_end_test_set = index_begin_test_set + num_graphs_for_testing
test_set_indices = [x for x in range(index_begin_test_set, index_end_test_set)]
# Evaluate performance on test set
generator_for_test_set = PaddedGraphGenerator(graphs=graphs)
test_gen = generator_for_test_set.flow(test_set_indices)
result = model.predict(test_gen)
I am trying to use local binary data to train a network to perform regression inference.
Each local binary data has the following layout:
and the whole data consists of several *.bin files with the layout above. Each file has a variable number of sequences of 403*4 bytes. I was able to read one of those files using the following code:
import tensorflow as tf
RAW_N = 2 + 20*20 + 1
def convert_binary_to_float_array(register):
return tf.io.decode_raw(register, out_type=tf.float32)
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(map_func=convert_binary_to_float_array)
Now, I need to create 4 datasets train_data, train_labels, test_data, test_labels as follows:
train_data, train_labels, test_data, test_labels = prepare_ds(raw_dataset, 0.8)
and use them to train & evaluate:
model = build_model()
history = model.fit(train_data, train_labels, ...)
loss, mse = model.evaluate(test_data, test_labels)
My question is: how to implement function prepare_ds(dataset, frac)?
def prepare_ds(dataset, frac):
...
I have tried to use tf.shape, tf.reshape, tf.slice, subscription [:] with no success. I realized that those functions doesn't work properly because after the map() call raw_dataset is a MapDataset (as a result of the eager execution concerns).
If the meta-data is suppose to be part of your inputs, which I am assuming, you could try something like this:
import random
import struct
import tensorflow as tf
import numpy as np
RAW_N = 2 + 20*20 + 1
bytess = random.sample(range(1, 5000), RAW_N*4)
with open('mydata.bin', 'wb') as f:
f.write(struct.pack('1612i', *bytess))
def decode_and_prepare(register):
register = tf.io.decode_raw(register, out_type=tf.float32)
inputs = register[:402]
label = register[402:]
return inputs, label
total_data_entries = 8
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin', '/content/mydata.bin'], record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(decode_and_prepare)
raw_dataset = raw_dataset.shuffle(buffer_size=total_data_entries)
train_ds_size = int(0.8 * total_data_entries)
test_ds_size = int(0.2 * total_data_entries)
train_ds = raw_dataset.take(train_ds_size)
remaining_data = raw_dataset.skip(train_ds_size)
test_ds = remaining_data.take(test_ds_size)
Note that I am using the same bin file twice for demonstration purposes. After running that code snippet, you could feed the datasets to your model like this:
model = build_model()
history = model.fit(train_ds, ...)
loss, mse = model.evaluate(test_ds)
as each dataset contains the inputs and the corresponding labels.
Usually, we can define a callback for a model to stop the epoch if the accuracy reaches a certain level.
I am working on the adjustment of parameters. The val_acc is highly unstable as shown in the picture.
def LSTM_model(X_train, y_train, X_test, y_test, num_classes, batch_size=68, units=128, learning_rate=0.005, epochs=20,
dropout=0.2, recurrent_dropout=0.2):
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if (logs.get('acc') > 0.90):
print("\nReached 90% accuracy so cancelling training!")
self.model.stop_training = True
callbacks = myCallback()
As the graphs show that the val_acc(orange) is fluctuating within a range and not really going up anymore.
Is there a way to automatically stop the training once the general trend of the val_acc stops increasing?
You can achieve this with a callback like this
class terminate_on_plateau(keras.callbacks.Callback):
def __init__(self):
self.patience = 10
self.val_loss = deque([],self.patience)
self.std_threshold = 1e-2
def on_epoch_end(self,epoch,logs=None):
val_loss,val_mae = model.evaluate(x_val,y_val)
self.val_loss.append(val_loss)
if len(self.val_loss) >= self.patience:
std = np.std(self.val_loss)
if std < self.std_threshold:
print('\n\n EarlyStopping on std invoked! \n\n')
# clear the deque
self.val_loss = deque([],self.patience)
model.stop_training = True
As you can see, in terminate_on_plateau, val_loss of epochs are stored in a deque of max length self.patience. Once the length of the deque reaches self.patience, standard deviation of the val_loss will be calculated for every new epoch, and the training process will be terminated (the deque of val_loss will also be cleared), if the calculated std is smaller than a threshold.
Below is a simple script that shows you how to use this
from collections import deque
import numpy as np
import tensorflow as tf
from tensorflow import keras
import tensorflow.keras.backend as K
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input,Dense
x = np.linspace(0,10,1000)
np.random.shuffle(x)
y = np.sin(x) + x
x_train,x_val,y_train,y_val = train_test_split(x,y,test_size=0.3)
input_x = Input(shape=(1,))
y = Dense(10,activation='relu')(input_x)
y = Dense(10,activation='relu')(y)
y = Dense(1,activation='relu')(y)
model = Model(inputs=input_x,outputs=y)
adamopt = tf.keras.optimizers.Adam(lr=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-8)
class terminate_on_plateau(keras.callbacks.Callback):
def __init__(self):
self.patience = 10
self.val_loss = deque([],self.patience)
self.std_threshold = 1e-2
def on_epoch_end(self,epoch,logs=None):
val_loss,val_mae = model.evaluate(x_val,y_val)
self.val_loss.append(val_loss)
if len(self.val_loss) >= self.patience:
std = np.std(self.val_loss)
if std < self.std_threshold:
print('\n\n EarlyStopping on std invoked! \n\n')
# clear the deque
self.val_loss = deque([],self.patience)
model.stop_training = True
model.compile(loss='mse',optimizer=adamopt,metrics=['mae'])
history = model.fit(x_train,y_train,
batch_size=8,
epochs=100,
validation_data=(x_val, y_val),
verbose=1,
callbacks=[terminate_on_plateau()])
The code below is for a custom callback that will stop training when the quantity being monitored fails to improve after patience number of epochs. Set the parameter acc_or_loss to 'loss' in order to monitor validation loss. Set it to 'acc' to monitor validation accuracy. I recommend NOT to monitor validation accuracy as it can swing wildly particularly in the early epochs. I put in print statements so you can see what is going on during training. You can of course remove them later. If you are monitoring validation loss the call back halts training if for a patience number of epochs the validation loss has exceeded the lowest loss found in the previous epochs. If you are monitoring validation accuracy the callback halts training if for a patience number of epochs the validation accuracy has stayed below the highest validation accuracy recorded in the previous epochs
class halt(keras.callbacks.Callback):
def __init__(self, patience, acc_or_loss):
self.acc_or_loss=acc_or_loss
super(halt, self).__init__()
self.patience=patience # specifies how many epochs without improvement before learning rate is adjusted
self.lowest_loss=np.inf
self.highest_acc=0
self.count=0
print ('initializing values ', 'count= ', self.count, ' lowest_loss= ', self.lowest_loss, 'highest acc= ', self.highest_acc)
def on_epoch_end(self, epoch, logs=None):
v_loss=logs.get('val_loss') # get the validation loss for this epoch
v_acc=logs.get('val_accuracy')
if self.acc_or_loss=='loss':
print (' for epoch ', epoch +1, ' v_loss= ', v_loss, ' lowest_loss= ', self.lowest_loss, 'count= ', self.count)
if v_loss< self.lowest_loss:
self.lowest_loss=v_loss
self.count=0
else:
self.count=self.count +1
if self.count>=self.patience:
print('There have been ', self.patience, ' epochs with no reduction of validation loss below the lowest loss')
print ('Terminating training')
self.model.stop_training = True
else:
print (' for epoch ', epoch +1, ' v_acc= ', v_acc, ' highest accuracy= ', self.highest_acc, 'count= ', self.count)
if v_acc>self.highest_acc:
self.count=0
self.highest_acc=v_acc
else:
self.count=self.count +1
if self.count>=self.patience:
print('There have been ', self.patience, ' epochs with noincrease in validation accuracy')
print ('Terminating training')
self.model.stop_training = True
patience= 2 # specify the patience value
acc_or_loss='loss' # specify to monitor validation loss or validation accuracy
callbacks=[halt(patience=patience, acc_or_loss=acc_or_loss)]
# in model.fit include callbacks=callbacks
Or you can just use Keras API in tensorflow : tf.keras.callbacks.EarlyStopping
Given your initial question I'm not sure why you would need custom callbacks
Here is an example of application:
history = model.fit([trainX,trainX,trainX],
np.array(trainLabels),
validation_data = ([testX, testX, testX], np.array(testLabels)),
epochs=EPOCH,
batch_size=BATCH_SIZE,
steps_per_epoch = None,
callbacks=[tf.keras.callbacks.EarlyStopping(
monitor="val_acc",
patience=5,
mode="min",
restore_best_weights = True)])
Some of the above answers are a little complex, you can use the below code.
opt = tf.optimizers.Adadelta(learning_rate=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
es = EarlyStopping(monitor='val_accuracy', mode='max', patience=20)
# will stop if validation accuracy is not improving till 20 epoches, you can give any number in patience.
ms = ModelCheckpoint('save_model.h5', monitor='val_accuracy', mode='max', save_best_only=True)
training_history = model.fit(x=X_train, y=y_train, validation_split=0.1, batch_size=5, epochs=1000, verbose=1,
callbacks = [es, ms])
I just copied this code from my project, which is not for LSTM, you can adjust this code according to your problem/task.
I'm working on a probabilistic forecast model using RNNs and want to log multiple runs with different parameters in Tensorboard to evaluate and compare them. I'm quite new to Tensorboard and couldn't really come up with a good way to organize my runs. I want to be able to sort through them in Tensorboard by parameter values, so currently I'm using this rather clunky approach:
tb = SummaryWriter(log_dir=f'runs/leakyrelu/cuda{cuda_id}/m_epochs{max_epochs}/lr{learning_rate}/'
f'bs{batch_size}/h_h{history_horizon}/f_h{forecast_horizon}/'
f'core_{core_net}/drop_fc{dropout_fc}/'
f'drop_core{dropout_core}')
Is there any smart way or convention on how to do this without creating mile-long filenames or directories kilometres deep?
It seems you are doing HyperParameter tuning with multiple parameters.
The best way to log such runs in Tensorboard is by using its HParams plugin.
Step1: Importing
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp
After that, you create Hparam object of parameters you want to try different values for and create a summary writer.
Step 2: Creating Hparam object and summary writer
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
METRIC_ACCURACY = 'accuracy'
with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
hp.hparams_config(
hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
)
Your created object will look something like this:
HP_NUM_UNITS
HParam(name='num_units', domain=IntInterval(16, 32), display_name=None, description=None)
Step 3: Create a function for training and testing
def train_test_model(hparams):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
tf.keras.layers.Dense(10, activation=tf.nn.softmax),
])
model.compile(
optimizer=hparams[HP_OPTIMIZER],
loss='sparse_categorical_crossentropy',
metrics=['accuracy'],
)
model.fit(x_train, y_train, epochs=1) # Run with 1 epoch to speed things up for demo purposes
_, accuracy = model.evaluate(x_test, y_test)
return accuracy
In this function hparams is a dictionary of type:
{
HParam Object 1: VALUE-FOR-THE-OBJECT,
HParam Object 2: VALUE-FOR-THE-OBJECT,
HParam Object 3: VALUE-FOR-THE-OBJECT,
}
The actual dictionary looks like this:
{HParam(name='num_units', domain=Discrete([16, 32]), display_name=None, description=None): 32,
HParam(name='dropout', domain=RealInterval(0.1, 0.2), display_name=None, description=None): 0.2,
HParam(name='optimizer', domain=Discrete(['adam', 'sgd']), display_name=None, description=None): 'sgd'}
Step 4: Function for logging into the Tensorboard.
def run(run_dir, hparams):
with tf.summary.create_file_writer(run_dir).as_default():
hp.hparams(hparams) # record the values used in this trial
accuracy = train_test_model(hparams)
tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)
Here, run_dir is a path for each individual run.
Step 5: Trying different parameter:
session_num = 0
for num_units in HP_NUM_UNITS.domain.values:
for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
for optimizer in HP_OPTIMIZER.domain.values:
hparams = {
HP_NUM_UNITS: num_units,
HP_DROPOUT: dropout_rate,
HP_OPTIMIZER: optimizer,
}
run_name = "run-%d" % session_num
print('--- Starting trial: %s' % run_name)
print({h.name: hparams[h] for h in hparams})
run('logs/hparam_tuning/' + run_name, hparams)
session_num += 1
Note: num_units will take 2 values '16' and '32' not every value between 16 and 32.
Your Tensorboard will look like this:
Tabular View:
Scatter Plot View:
.
You can also combine this with Tensorboard callback in Keras by setting the path of the callback to run_dir.
For eg:
def train_test_model(hparams, run_dir):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(
optimizer=hparams[HP_OPTIMIZER],
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
callbacks = [
tf.keras.callbacks.TensorBoard(run_dir),
]
model.fit(x_train, y_train, epochs=10, callbacks = callbacks) # Run with 1 epoch to speed things up for demo purposes
_, accuracy = model.evaluate(x_test,
y_test)
return accuracy
The above-mentioned steps are good if you want log custom metrics or a variety of metrics other than accuracy or loss which you have defined in the compile method.
But if you don't want to use custom metrics or don't want to deal with summary writers etc. You can use Keras callbacks to simplify the process.
Complete code with callbacks without summary writers
# Creating Hparams
HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))
# Creating train test function
def train_test_model(hparams, run_dir):
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
model.compile(
optimizer=hparams[HP_OPTIMIZER],
loss='sparse_categorical_crossentropy',
metrics=['accuracy']
)
callbacks = [
tf.keras.callbacks.TensorBoard(run_dir),# log metrics
hp.KerasCallback(run_dir, hparams), # log hparams
]
model.fit(x_train, y_train, epochs=10, callbacks = callbacks) # Run with 1 epoch to speed things up for demo purposes
_, accuracy = model.evaluate(x_test,
y_test)
return accuracy
# Running different configurations
session_num = 0
for num_units in HP_NUM_UNITS.domain.values:
for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
for optimizer in HP_OPTIMIZER.domain.values:
hparams = {
HP_NUM_UNITS: num_units,
HP_DROPOUT: dropout_rate,
HP_OPTIMIZER: optimizer,
}
run_name = "run-%d" % session_num
print('--- Starting trial: %s' % run_name)
print({h.name: hparams[h] for h in hparams})
train_test_model(hparams, 'logs/hparam_tuning/' + run_name)
session_num += 1
Useful Links:
Hyperparameter Tuning with the HParams Dashboard
Hparams demo using all possible Hparam objects - Official Github Repo
As you can see below i have two functions , get_data() outputs a data frame for the selected asset history and passes it to train_model() every thing works fine but as the model trains the accuracy does not seem to change the loss does go down but the accuracy stays the same after the second epoch ,when training with 1000 epochs the accuracy also does not change
Things i tried changing with this code:
changing unit count for each of the LSTM layers
tried using differnet data frames from different sources ( alpha-vantage )
changing epoch count
unfortunately nothing changed
def train_model( df):
if not os.path.exists("/py_stuff/"):
os.makedirs("/py_stuff/")
checkpoint_filepath ="/py_stuff/check_point"
weights_checkpoint = "/py_stuff/"
checkpoint_dir = os.path.dirname(checkpoint_filepath)
model_checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_filepath,
save_weights_only=True,
monitor='accuracy',
mode='max',
save_best_only=True,
verbose=1)
dataset_train = df
training_set = dataset_train.iloc[:, 1:2].values
sc = MinMaxScaler(feature_range=(0,1))
training_set_scaled = sc.fit_transform(training_set)
X_train = []
y_train = []
for i in range(100, len(df)):
X_train.append(training_set_scaled[i-100:i, 0])
y_train.append(training_set_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))
model = Sequential()
model.add(LSTM(units = 100, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=100 , return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=100 , return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=100))
model.add(Dropout(0.2))
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error' , metrics=['accuracy'])
## loading weights
try:
model.load_weights(checkpoint_filepath)
print ("Weights loaded successfully $$$$$$$ ")
except:
print ("No Weights Found !!! ")
model.fit(X_train,y_train,epochs=50,batch_size=100, callbacks=[model_checkpoint_callback])
## saving weights
try:
model.save(checkpoint_filepath)
model.save_weights(filepath=checkpoint_filepath)
print ("Saving weights and model done ")
except OSError as no_model:
print ("Error saving weights and model !!!!!!!!!!!! ")
def get_data(CHOICE):
data = yf.download( # or pdr.get_data_yahoo(...
# tickers list or string as well
tickers = CHOICE,
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period = "5y",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval = "1d",
# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by = 'ticker',
# adjust all OHLC automatically
# (optional, default is False)
auto_adjust = True,
# download pre/post regular market hours data
# (optional, default is False)
prepost = True,
# use threads for mass downloading? (True/False/Integer)
# (optional, default is True)
threads = True,
# proxy URL scheme use use when downloading?
# (optional, default is None)
proxy = None
)
dff = pd.DataFrame(data)
return dff
df = get_data(CHOICE="BTC-USD")
train_model(df)
From your loss function, it looks like you have a regression network. Your loss is Mean Squared Error and the metric accuracy does not have any meaning for regression networks. Accuracy metric is only meaningful when used for classification models. So you can remove the metrics=['accuracy'] from your compile code and and use loss value to evaluate your model. So if the loss is decreasing that means your optimizer is successfully training the network.
You are dealing with a regression problem where the accuracy is not defined.
The accuracy is defined as the probability of belonging to a specific class. For example, the probability of the output to be the digit 9. The number of classes is finite (or countable).
In your case, your network outputs a real number. The notion of accuracy does not make any sense in this context.
For example, the probability of your output to be 1.000 for example is 0. Although (and surprisingly!), a probability of zero does not mean that the event will never happen!
Ideally, Keras should return an error saying accuracy not defined.