I would like to know if it's possible to get the loss/accuracy from model.predict() in Keras.
I tried to figure it out by myself, but I failed. Thank you very much for any input.
Keras model.predict() only takes the input data (X) and produces the output from the trained model. It does not know anything about the actual expected values (y).
You can do what you are asking for with model.evaluate(), which requires both the X and y values of your data set and returns the loss value and metric values for the model in test mode.
Use model.predict() in production to just get the model output. During your testing and validation phase, you may also want to use model.evaluate().
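A minimal sketch of the difference, assuming a compiled model and a labeled test set (model, x_test and y_test are placeholders here):
# predict(): only needs X and returns the model outputs; no loss or accuracy
y_prob = model.predict(x_test)
# evaluate(): needs X and y, and returns the loss plus every metric passed to compile()
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("loss:", loss, "accuracy:", accuracy)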
So my code is like this:
# Fetch dataset
train_data, test_data = tfds.load(name="imdb_reviews", split=["train", "test"],
batch_size=-1, as_supervised=True)
train_examples, train_labels = tfds.as_numpy(train_data)
test_examples, test_labels = tfds.as_numpy(test_data)
np.save("train_examples", train_examples)
np.save("train_labels", train_labels)
np.save("test_examples", test_examples)
np.save("test_labels", test_labels)
# BUILD MODEL
model = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(model, output_shape=[], input_shape=[],
dtype=tf.string, trainable=True, name='gnews_embedding')
model = build_model(hub_layer)
model.summary()
# SAVE AS CHECKPOINT (THE BEST ONLY)
es = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4, verbose=1, mode='min')
checkpoint = tf.keras.callbacks.ModelCheckpoint('model.h5', monitor='val_loss', save_best_only=True)
# TRAIN MODEL
history = model.fit(
train_examples,
train_labels,
epochs=20,
batch_size=BATCH_SIZE,
validation_split = .2,
shuffle = True,
callbacks = [checkpoint, es],
verbose=1)
# CHECK ACCURACY AND LOSS VALUE
model.load_weights('/app/model.h5')
results = model.evaluate(test_examples, test_labels)
text = "The gold rush apple from natora- this is the most expensive apple."
dataset = tf.data.Dataset.from_tensor_slices([text])
np.save("test_txt", test_txt)
resultsTest = model.evaluate(dataset, test_labels)
print("RESULT ACCURACY = ", resultsTest)
So I am trying to get the accuracy/loss of this prediction. Can I evaluate a single sentence like that, or do I have to re-train the model to get the loss/accuracy?
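For what it's worth, a single sentence can be passed to model.evaluate() like any other batch, as long as a ground-truth label is supplied with it; a minimal sketch, where the label 1 is only an assumption for illustration:
single_example = np.array([text])   # batch with one raw string, matching what the model was trained on
single_label = np.array([1])        # assumed ground-truth label for this sentence (illustration only)
loss, acc = model.evaluate(single_example, single_label, verbose=0)
print("loss:", loss, "accuracy:", acc)   # accuracy over a single example is simply 0.0 or 1.0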
I'm fine-tuning a Huggingface model for a downstream task, and I am using StratifiedKFold to evaluate performance on unseen data. The results I'm getting are very encouraging for my particular domain, and it's making me think that I might be leaking data somehow. Suspiciously, the F1 score seems to be consistently increasing from fold to fold. I'm assuming that there is something hanging around between folds that is causing this increase in performance, but I can't see what.
checkpoint = 'roberta-large-mnli'
# LM settings
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
# set training arguments
batch_size=32
training_args = TrainingArguments(num_train_epochs=5,
weight_decay=0.1,
learning_rate=1e-5,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
output_dir="content/drive/My Drive/Projects/test-trainer")
metric4 = load_metric("f1")
# function to tokenize each dataset
def tokenize_function(example):
    return tokenizer(example["message"], example["hypothesis"], truncation=True)
# 5-fold cross validation loop
for train_index, test_index in cv.split(data, data['label']):
    # split into train and test regions based on index positions
    train_set, test_set = data.iloc[list(train_index)], data.iloc[list(test_index)]
    # split training set into train and validation sub-regions
    train_set, val_set = train_test_split(train_set,
                                          stratify=train_set['label'],
                                          test_size=0.10, random_state=42)
    # convert datasets to Dataset objects and gather them in a dictionary
    train_dataset = Dataset.from_pandas(train_set, preserve_index=False)
    val_dataset = Dataset.from_pandas(val_set, preserve_index=False)
    test_dataset = Dataset.from_pandas(test_set, preserve_index=False)
    combined_dataset = DatasetDict({'train': train_dataset,
                                    'test': test_dataset,
                                    'val': val_dataset})
    # tokenize
    tokenized_datasets = combined_dataset.map(tokenize_function, batched=True)
    # instantiate trainer
    trainer = Trainer(
        model,
        training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["val"],
        data_collator=data_collator,
        tokenizer=tokenizer)
    # train
    trainer.train()
    # get predictions
    predictions = trainer.predict(tokenized_datasets["test"])
    preds = np.argmax(predictions.predictions, axis=-1)
    print("F1 score ", metric4.compute(predictions=preds,
                                       references=predictions.label_ids,
                                       average='macro', pos_label=2))
Based on the above, what I think I'm doing is (1) splitting the data into separate folds, (2) splitting the training set into train and val regions, (3) tokenizing, (4) training on the train/val sets, (5) testing on the test set, and (6) starting the next fold with a new Trainer instance. I'd like to think that is correct, but I cannot see from the above why the F1 at fold level would consistently get better over time.
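One detail that stands out in the snippet above as a possible explanation: AutoModelForSequenceClassification.from_pretrained() is called once, outside the loop, so every fold's Trainer receives the same model object and therefore the weights already fine-tuned in the previous folds. A rough sketch of a fully isolated fold, re-loading the checkpoint at the top of the loop and re-using the variables defined above:
for train_index, test_index in cv.split(data, data['label']):
    # ... build tokenized_datasets for this fold exactly as above ...
    # re-load the pre-trained checkpoint so every fold starts from the same weights,
    # instead of continuing from the weights trained in the previous fold
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
    trainer = Trainer(model,
                      training_args,
                      train_dataset=tokenized_datasets["train"],
                      eval_dataset=tokenized_datasets["val"],
                      data_collator=data_collator,
                      tokenizer=tokenizer)
    trainer.train()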
I trained a model on some initial data, got some good scores, and now, after receiving more data, I want to load the pre-trained model and continue training.
Here is a snippet of what I did:
(1) I followed this post, which says to save the model in the 'tf' format:
# saving initial model
model.save(path2initial, save_format='tf')
# load pre-trained model
clf = tf.keras.models.load_model(path2initial)
# create new data generators
train_gen = generators.create(generator_config, 'train')
val_gen = generators.create(generator_config, 'val')
# create metrics, loss, optimizer and callbacks
loss = losses.create(loss_config)
callback_list = callbacks.create(callback_config)
optimizer = optimizers.create(optimizer_config)
metrics = metrics.create(metrics_config)
# compile model
clf.compile(optimizer=optimizer, loss=loss, metrics=metrics)
# train
clf.fit(x=train_gen,
epochs=NB_EPOCHS,
validation_data=val_gen,
steps_per_epoch=math.ceil(len(train_steps) / BATCH_SIZE),
validation_steps=math.ceil(len(val_steps) / BATCH_SIZE),
callbacks=callback_list,
use_multiprocessing=True,
workers=16,
max_queue_size=8,
verbose=1
)
I should note that two of my callbacks are
EarlyStopping(monitor='val_loss', restore_best_weights=True,
min_delta=0.001, patience=10, mode='min', verbose=1)
ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True,
mode='min', save_freq='epoch')
Note that train_gen contains both the initial data and the new data.
With this method, the model improved for only 4 epochs and then didn't change for the
remaining 10 'patience' epochs. Moreover, the results were far worse than the initial model's results.
(2) The second method I tried was to save the model in the default format (that's the only change):
model.save(path2initial)
.
.
.
This model trained for 71/200 epochs, but it seems to have ignored my EarlyStopping() callback: in some epochs the val_loss changed by 1e-4 or even less and training still continued (weird), and then EarlyStopping() stopped it at epoch 71 even though the val_loss had changed! Moreover, the results had barely changed.
For comparison I trained a model from scratch on all the data (both initial and new data) and got way better results:
                Initial data   Method (1)   Method (2)   New model on all data
mean F1 score:  0.735          0.422        0.74         0.803
Is there a proven way to continue training a Keras model?
When loading the model, does the optimizer state get reset?
When loading the model, do I need to define all the callbacks, loss, optimizer and metrics all over again? Do I need to compile it again?
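For reference, a minimal sketch of the load-and-continue pattern, assuming the model was saved with model.save() and its default include_optimizer=True (so, as far as I know, load_model() restores the compiled loss, metrics and optimizer state, and re-compiling with a fresh optimizer is what would reset that state):
import tensorflow as tf
# load the previously saved model; architecture, weights, loss, metrics and
# optimizer state are restored, so no new compile() call is needed
clf = tf.keras.models.load_model(path2initial)
# continue training on the combined generator; callbacks still need to be
# re-created, since they are not stored inside the saved model
clf.fit(x=train_gen,
        epochs=NB_EPOCHS,
        validation_data=val_gen,
        callbacks=callback_list,
        verbose=1)
If the loss or metrics are custom objects, load_model() may additionally need a custom_objects argument to deserialize them.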
I am using tensorflow and keras for a binary classification problem.
I have only a training set of 81 samples (test size 21), but ~1900 features. I know that is too few samples for so many features, but it's a biological problem (gene-expression data), so I have to deal with it.
My model looks like this (using different numbers of neurons per layer, different numbers of hidden layers, regularization and dropout to deal with the high-dimensional data):
model = Sequential()
model.add(Input((input_shape,)))
for i in range(num_hidden):
    model.add(Dense(n_neurons, activation="relu", kernel_regularizer=keras.regularizers.l1_l2(l1_reg, l2_reg)))
    model.add(Dropout(dropout_rate))
model.add(Dense(1, activation="sigmoid"))
ann_optimizer= keras.optimizers.Adam()
model.compile(loss="binary_crossentropy",
optimizer=ann_optimizer, metrics=['accuracy'])
I am using 10-fold nested cross-validation with a grid search in the inner fold, like this:
# fit and evaluate the model
# configure the inner cross-validation procedure (5-fold; 80% inner-training / 20% inner-test split)
cv_inner = ShuffleSplit(n_splits=5, test_size=0.2, random_state=1)
# define the model
ann = KerasRegressor(build_fn=regressionModel_sequential, input_shape=X_train.shape[1],
batch_size=batch_size)
# use a pipeline to prevent leaky preprocessing (StandardScaler fit only on the 80% inner-training data)
pipe = Pipeline(steps=[('scaler', StandardScaler()), ('ann', ann)])
# define the grid search with inner CV to find good parameters
grid_search_result = GridSearchCV(
pipe, param_grid, n_jobs=-1, cv=cv_inner, refit=True, verbose=0)
# refit=True refits a final model on the entire inner-training dataset
# execute search
grid_search_result.fit(X_train, y_train, ann__verbose=0)
logger.info('>>>>> est=%.3f, params=%s' % (grid_search_result.best_score_, grid_search_result.best_params_))
# to get loss curve
ann_val = regressionModel_sequential(input_shape=X_train.shape[1],
n_neurons=grid_search_result.best_params_['ann__n_neurons'],
l1_reg=grid_search_result.best_params_['ann__l1_reg'],
l2_reg=grid_search_result.best_params_['ann__l2_reg'],
num_hidden=grid_search_result.best_params_['ann__num_hidden'],
dropout_rate=grid_search_result.best_params_['ann__dropout_rate'])
# Validation with outer 20 %
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
history = ann_val.fit(X_train, y_train, batch_size=batch_size, verbose=0,
validation_split=0.25, shuffle=True, epochs=grid_search_result.best_params_['ann__epochs'])
plot_history(history, directory, i)
# use the best grid-search result to predict on the outer test dataset
y_predicted = ann_val.predict(X_test)
# print predicted
logger.info(y_predicted[:5])
logger.info(y_test[:5])
rmse = (np.sqrt(metrics.mean_squared_error(y_test, y_predicted)))
mae = metrics.mean_absolute_error(y_test, y_predicted)
r_squared = metrics.r2_score(y_test, y_predicted)
My loss curve seems good, but the accuracy is very bad (example from one outer fold).
Does anyone have suggestions on what I could do to improve my results?
I also know that the underlying biological question is very hard and maybe not possible to solve.
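Since this is a binary classification problem, one small thing that might make the outer-fold results easier to interpret is to score the sigmoid outputs with classification metrics rather than RMSE/MAE/R²; a sketch re-using ann_val, X_test and y_test from above:
import numpy as np
from sklearn import metrics
# turn the sigmoid outputs into hard class labels and score with classification metrics
y_prob = np.asarray(ann_val.predict(X_test)).ravel()   # predicted probabilities in [0, 1]
y_pred = (y_prob > 0.5).astype(int)                    # hard labels at the 0.5 threshold
print("accuracy:", metrics.accuracy_score(y_test, y_pred))
print("ROC AUC :", metrics.roc_auc_score(y_test, y_prob))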
I am going through the Kaggle Digit Recognizer Tutorial and I'm trying to understand how all of this works. I would like to validate a predicted value. Basically, I have a prediction that's wrong, but I want to see what the actual value of that prediction was. I think I am way off:
...
df = pd.read_csv('data/train.csv')
labels = df['label'].values
x_train = df.drop(columns=['label']).values / 255
# trying to produce a crappy dataset for train/test
x_train, x_test, y_train, y_test = train_test_split(x_train, labels, test_size=0.95)
# Purposely trying to get a crappy model so I can learn about validation
model = tf.keras.models.Sequential()
# model.add(tf.keras.layers.Flatten())
# model.add(tf.keras.layers.Dense(128, activation=tf.nn.relu))
model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=1)
predictions = model.predict([x_test])
index_to_predict = 0
print('Prediction: ', np.argmax(predictions[index_to_predict]))
print('Actual: ', predictions.argmax(axis=-1)[index_to_predict])
print(predictions.shape)
vals = x_test[index_to_predict].reshape(28, 28)
plt.imshow(vals)
This yields the following:
How can I get a true "here's the prediction" and "here's the actual" breakdown? My logic for getting the actual value is definitely off.
The true labels (also sometimes called target values or ground-truth labels) are stored in y_train and y_test for the training and test sets, respectively. Therefore, you can simply print the corresponding entry to find the true label:
print('Actual:', y_test[index_to_predict])
y_test[index_to_predict]
will have the actual label and
predictions[index_to_predict]
should have the predicted probability values for each of your classes.
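Putting the two together, a short sketch that prints the predicted class next to its true label, using the variables from the question:
idx = index_to_predict
print('Prediction:', np.argmax(predictions[idx]))  # class with the highest predicted probability
print('Actual:    ', y_test[idx])                  # ground-truth label for that test sample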
I'm training a time series forecasting LSTM-model in Keras.
When I save the trained model and reload it to train it on different data (so the weights get better and the model can predict several types of data), is it alright to do the following:
model.fit(x_train, y_train, epochs=60, batch_size=1, verbose=2)
# make predictions
y_pred_train = np.squeeze(model.predict(x_train), axis=(1,))
y_pred_val = np.squeeze(model.predict(x_val), axis=(1,))
# save whole model
model.save('lstm_model.h5')
# load and fit model
loaded_model = load_model('lstm_model.h5')
loaded_model.fit(x_train, y_train, epochs=60, batch_size=1, verbose=2)
I wonder whether training starts all over again and the trained state is lost after fitting it a second time.
Thank you for helping! :)
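As a quick sanity check (a sketch built on the snippet above, assuming the usual Keras imports), the reloaded model's weights can be compared to the saved ones, which shows that the second fit() continues from the trained weights rather than starting from scratch:
import numpy as np
from keras.models import load_model
w_saved = model.get_weights()                 # weights right before saving
model.save('lstm_model.h5')
loaded_model = load_model('lstm_model.h5')
w_loaded = loaded_model.get_weights()         # weights right after reloading
# True if every weight tensor survived the save/load round trip unchanged
print(all(np.array_equal(a, b) for a, b in zip(w_saved, w_loaded)))
Saving the whole model this way also stores the optimizer state, so the subsequent loaded_model.fit(...) continues from the trained weights instead of restarting from scratch.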