Keras fit using validation_split gets higher results than using validation_data - python

I am using the following fit function:
    history = model.fit([X1_train, X2_train, X3_train], y_train,
                        # validation_data=([X1_test, X2_test, X3_test], y_test),
                        class_weight={0: 1, 1: 10})
and getting average val_acc of 0.7. But when running again, this time with the validation_data option (using data from the same dataset that I kept aside, of size around 30% of train data) I am getting an average val_acc of 0.35. Any reasons for getting such differences?

As requested by the OP, I am posting my comment as an answer and will try to elaborate:
When you set the validation_split argument, the validation samples are taken from the last samples of the training data and labels (i.e. X_train and y_train), before any shuffling. Now, in this specific case, if the proportion of class labels in these selected samples is not the same as the proportion of class labels in the data you provide via the validation_data argument, then you should not necessarily expect the validation accuracy (or loss) to be the same in the two cases. That is simply because your model may have a different accuracy on each of the classes.
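To make this concrete, here is a minimal NumPy sketch. The labels are invented for illustration (they are not the OP's data), and `tail_split_indices` is a hypothetical helper that only mimics the "take the last fraction" behaviour described above. It compares the class balance of the tail slice with that of the whole set.

```python
import numpy as np

def tail_split_indices(n_samples, validation_split):
    """Indices that a tail split (as described for validation_split) would hold out."""
    split_at = int(n_samples * (1.0 - validation_split))
    return np.arange(split_at, n_samples)

def positive_rate(labels):
    """Fraction of samples belonging to class 1."""
    return float(np.mean(labels == 1))

# Toy labels (hypothetical): the positives happen to sit at the start of the array,
# e.g. because the data was sorted or collected class by class.
y = np.array([1] * 20 + [0] * 80)

val_idx = tail_split_indices(len(y), validation_split=0.3)
print("positive rate, whole set :", positive_rate(y))
print("positive rate, tail split:", positive_rate(y[val_idx]))
```

If the two rates differ a lot, shuffling the training arrays before calling fit (or building a stratified hold-out set yourself) is one way to make the two validation setups comparable.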


Using Keras Multi Layer Perceptron with Cross Validation prediction [duplicate]

I'm implementing a Multilayer Perceptron in Keras and using scikit-learn to perform cross-validation. For this, I was inspired by the code found in the issue Cross Validation in Keras
    from sklearn.model_selection import StratifiedKFold
    def load_data():
        # load your data using this function
        ...
    def create_model():
        # create your model using this function
        ...
    def train_evaluate(model, x_train, y_train, x_test, y_test):
        # fit and evaluate here.
        ...
    if __name__ == "__main__":
        X, Y = load_data()
        kFold = StratifiedKFold(n_splits=10)
        for train, test in kFold.split(X, Y):
            model = None
            model = create_model()
            train_evaluate(model, X[train], Y[train], X[test], Y[test])
In my studies on neural networks, I learned that the knowledge representation of a neural network lies in its synaptic weights, and that during the training process the weights are updated to reduce the network's error rate and improve its performance. (In my case, I'm using supervised learning.)
For better training and assessment of neural network performance, a commonly used method is cross-validation, which partitions the data set for training and evaluating the model.
My question is...
In this code snippet:
    for train, test in kFold.split(X, Y):
        model = None
        model = create_model()
        train_evaluate(model, X[train], Y[train], X[test], Y[test])
Do we define, train and evaluate a new neural net for each of the generated partitions?
If my goal is to fine-tune the network for the entire dataset, why is it not correct to define a single neural network and train it with the generated partitions?
That is, why is this piece of code like this?
    for train, test in kFold.split(X, Y):
        model = None
        model = create_model()
        train_evaluate(model, X[train], Y[train], X[test], Y[test])
and not like this?
    model = None
    model = create_model()
    for train, test in kFold.split(X, Y):
        train_evaluate(model, X[train], Y[train], X[test], Y[test])
Is my understanding of how the code works wrong? Or my theory?
"If my goal is to fine-tune the network for the entire dataset"
It is not clear what you mean by "fine-tune", or even what exactly your purpose is for performing cross-validation (CV); in general, CV serves one of the following purposes:
Model selection (choose the values of hyperparameters)
Model assessment
Since you don't define any search grid for hyperparameter selection in your code, it would seem that you are using CV in order to get the expected performance of your model (error, accuracy etc).
Anyway, for whatever reason you are using CV, the first snippet is the correct one; your second snippet
    model = None
    model = create_model()
    for train, test in kFold.split(X, Y):
        train_evaluate(model, X[train], Y[train], X[test], Y[test])
will train your model sequentially over the different partitions (i.e. train on partition #1, then continue training on partition #2 etc), which essentially is just training on your whole data set, and it is certainly not cross-validation...
That said, a final step after the CV, which is often only implied (and frequently missed by beginners), is that once you are satisfied with your chosen hyperparameters and/or model performance as given by your CV procedure, you go back and train your model again, this time on the entire available data.
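The two steps can be sketched as follows. This is a minimal NumPy illustration, not the OP's Keras code: a hypothetical least-squares "model" (`fit_linear`, `mse`) and synthetic data stand in for the network and the real data set. A fresh model is fitted inside every fold, the fold scores are averaged, and only afterwards is one final model fitted on all the data.

```python
import numpy as np

def fit_linear(x, y):
    """Hypothetical stand-in for create_model() + fit: least-squares weights."""
    weights, *_ = np.linalg.lstsq(x, y, rcond=None)
    return weights

def mse(weights, x, y):
    """Mean squared error of a fitted weight vector on (x, y)."""
    return float(np.mean((x @ weights - y) ** 2))

def cross_validate(x, y, n_splits=5, seed=0):
    """Fit a brand-new model on each training fold and score it on the held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_splits)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        weights = fit_linear(x[train_idx], y[train_idx])  # fresh model per fold
        scores.append(mse(weights, x[test_idx], y[test_idx]))
    return np.array(scores)

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
Y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

cv_scores = cross_validate(X, Y)
print("estimated MSE:", cv_scores.mean())

# Final step: once satisfied with the CV estimate, train on the entire data set.
final_weights = fit_linear(X, Y)
```

The CV scores describe how the modelling procedure is expected to perform on unseen data; the final model is the one you would actually deploy.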
You can use wrappers of the Scikit-Learn API with Keras models.
Given inputs x and y, here's an example of repeated 5-fold cross-validation:
    from sklearn.model_selection import RepeatedKFold, cross_val_score
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    # Note: this wrapper module has been removed from recent TensorFlow releases;
    # the separate scikeras package provides a replacement.
    from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
    def buildmodel():
        model = Sequential([
            Dense(10, activation="relu"),
            Dense(5, activation="relu"),
            Dense(1),  # single regression output (assumed; the original snippet is truncated here)
        ])
        model.compile(optimizer='adam', loss='mse', metrics=['mse'])
        return model
    estimator = KerasRegressor(build_fn=buildmodel, epochs=100, batch_size=10, verbose=0)
    kfold = RepeatedKFold(n_splits=5, n_repeats=100)
    results = cross_val_score(estimator, x, y, cv=kfold, n_jobs=2)  # 2 cpus
    results.mean()  # mean of the per-fold scores (based on the MSE loss)
I think many of your questions will be answered if you read about nested cross-validation. This is a good way to "fine tune" the hyperparameters of your model. There's a thread here:
The biggest issue to be aware of is "peeking" or circular logic. Essentially, you want to make sure that none of the data used to assess model accuracy is seen during training.
One example where this might be problematic is if you are running something like PCA or ICA for feature extraction. If doing something like this, you must be sure to run PCA on your training set, and then apply the transformation matrix from the training set to the test set.
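A minimal sketch of that fit-on-train, apply-to-test pattern, assuming scikit-learn's PCA and synthetic data invented for the illustration:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # synthetic features (illustration only)
y = rng.integers(0, 2, size=100)     # synthetic binary labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

pca = PCA(n_components=2)
X_train_reduced = pca.fit_transform(X_train)  # components learned from training data only
X_test_reduced = pca.transform(X_test)        # test data is only projected, never fitted on
```

Inside a cross-validation loop the same fit/transform pair would be repeated for every fold; wrapping the steps in a scikit-learn Pipeline and passing that to cross_val_score is one way to get this automatically.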
The main idea of testing your model performance is to perform the following steps:
Train a model on a training set.
Evaluate your model on data not used during the training process in order to simulate the arrival of new data.
So basically, the data you finally test your model on should mimic the first portion of data you'll get from your client/application to apply your model to.
That's why cross-validation is so powerful: it lets every data point in your whole dataset be used as a simulation of new data.
And now - to answer your question - every cross-validation should follow the following pattern:
    for train, test in kFold.split(X, Y):
        model = training_procedure(train, ...)
        score = evaluation_procedure(model, test, ...)
because, after all, you'll first train your model and then use it on new data. In your second approach you cannot treat it as a mimicry of the training process because, e.g., in the second fold your model would retain information from the first fold, which is not equivalent to your training procedure.
Of course, you could apply a training procedure which uses 10 folds of consecutive training in order to fine-tune the network. But that is not cross-validation; you would then need to evaluate this whole procedure using a scheme like the one above.
The commented out functions make this a little less obvious, but the idea is to keep track of your model performance as you iterate through your folds and at the end provide either those lower level performance metrics or an averaged global performance. For example:
The train_evaluate function ideally would output some accuracy score for each split, which could be combined at the end.
    import numpy as np
    def train_evaluate(model, x_train, y_train, x_test, y_test):
        model.fit(x_train, y_train)
        return model.score(x_test, y_test)
    X, Y = load_data()
    kFold = StratifiedKFold(n_splits=10)
    scores = np.zeros(10)
    idx = 0
    for train, test in kFold.split(X, Y):
        model = create_model()
        scores[idx] = train_evaluate(model, X[train], Y[train], X[test], Y[test])
        idx += 1
    print(scores.mean())  # averaged performance across the folds
So yes you do want to create a new model for each fold as the purpose of this exercise is to determine how your model as it is designed performs on all segments of the data, not just one particular segment that may or may not allow the model to perform well.
This type of approach becomes particularly powerful when applied along with a grid search over hyperparameters. In this approach you train a model with varying hyperparameters using the cross validation splits and keep track of the performance on splits and overall. In the end you will be able to get a much better idea of which hyperparameters allow the model to perform best. For a much more in depth explanation see sklearn Model Selection and pay particular attention to the sections of Cross Validation and Grid Search.
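As a rough illustration of a grid search driven by cross-validation, here is a small NumPy sketch. Everything in it is an assumption made for the example: a closed-form ridge regression (`fit_ridge`) stands in for the network, `alpha` plays the role of the hyperparameter, and the data is synthetic. scikit-learn's GridSearchCV automates the same loop for real estimators.

```python
import numpy as np

def fit_ridge(x, y, alpha):
    """Closed-form ridge regression; alpha is the hyperparameter being tuned."""
    n_features = x.shape[1]
    return np.linalg.solve(x.T @ x + alpha * np.eye(n_features), x.T @ y)

def cv_mse(x, y, alpha, n_splits=5, seed=0):
    """Mean held-out MSE over the folds for one hyperparameter value."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), n_splits)
    errors = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_ridge(x[train_idx], y[train_idx], alpha)  # new model for every fold
        errors.append(np.mean((x[test_idx] @ w - y[test_idx]) ** 2))
    return float(np.mean(errors))

# Synthetic regression data (illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
Y = X @ np.array([2.0, 0.0, -1.0, 0.5]) + rng.normal(scale=0.5, size=120)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
results = {alpha: cv_mse(X, Y, alpha) for alpha in grid}
best_alpha = min(results, key=results.get)
print(results, "-> best alpha:", best_alpha)
```

The same folds are reused for every grid value (fixed `seed`), so the candidates are compared on identical splits.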

How does keras.evaluate() calculate the loss?

I am building an MLP using TensorFlow 2.0. I am plotting the learning curve and also using keras.evaluate on both training and test data to see how well it performed. The code I'm using:
    history = model.fit(X_train, y_train, batch_size=32,
                        epochs=200, validation_split=0.2, verbose=0)
    # evaluate the model
    eval_result_tr = model.evaluate(X_train, y_train)
    eval_result_te = model.evaluate(X_test, y_test)
    print("[training loss, training accuracy]:", eval_result_tr)
    print("[test loss, test accuracy]:", eval_result_te)
    #[training loss, training accuracy]: [0.5734676122665405, 0.9770742654800415]
    #[test loss, test accuracy]: [0.7273344397544861, 0.9563318490982056]
    # plot the learning curve
    import matplotlib.pyplot as plt
    plt.plot(history.history["loss"], label='training')
    plt.plot(history.history['val_loss'], label='validation')
    plt.xlabel("Epoch")
    plt.ylabel("Error")
    plt.title("Learning curve of the base model")
    plt.legend()
    plt.show()
The output is:
My question is: how does keras.evaluate() calculate the training loss to be 0.5734676122665405? I took the average of history.history["loss"], but it returns a different value (0.7975356701016426).
Or am I mistaken to begin with in trying to evaluate the model's performance on the training data with eval_result_tr = model.evaluate(X_train, y_train)?
For community benefit, adding @Dr. Snoopy's answer here in the answer section:
"This has been asked many times before. The loss you see during training is computed while the weights are changing; evaluating outside of training uses fixed weights, so you will always see a different loss value."
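The effect can be reproduced without Keras. The following is a toy NumPy sketch under stated assumptions (a linear model trained with mini-batch gradient descent on synthetic data; `history_loss` and `evaluate` are stand-ins for history.history["loss"] and model.evaluate, not the real API). Each epoch's recorded loss is the average of batch losses measured while the weights were still moving, whereas the evaluation uses the final weights only. Averaging the recorded losses over all epochs additionally mixes in the poor early epochs.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5])   # noiseless synthetic targets

w = np.zeros(3)
lr, batch_size, epochs = 0.05, 32, 20
history_loss = []                    # plays the role of history.history["loss"]

for epoch in range(epochs):
    batch_losses = []
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        err = xb @ w - yb
        batch_losses.append(np.mean(err ** 2))   # loss with the weights *before* this update
        w -= lr * 2 * xb.T @ err / len(xb)       # gradient step changes the weights
    history_loss.append(np.mean(batch_losses))   # epoch value = average over its batches

def evaluate(weights):
    """Loss over the whole set with fixed weights, in the spirit of model.evaluate."""
    return float(np.mean((X @ weights - y) ** 2))

print("mean of history losses :", np.mean(history_loss))
print("last epoch history loss:", history_loss[-1])
print("evaluate after training:", evaluate(w))
```

In the question's setup there is one more difference to keep in mind: with validation_split=0.2, the last 20% of X_train is not trained on, yet model.evaluate(X_train, y_train) scores all of it.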

Difference in every result of network in every run?

I have written a simple neural network (MLP Regressor) to fit simple data frame columns. To find an optimum architecture, I also defined it as a function to see whether it is converging to a pattern. But every time I run the model, it gives me a different result from the last run, and I do not know why. Because it is fairly difficult to make the question reproducible, I cannot post the data, but I can post the architecture of the network here:
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    def MLP():  # After 50
        return nn, nl, a, s, learn, learn_in, max_i  #,
    def process(df):
        y = df.iloc[:, -1]
        X = df.drop(columns=['col3'])
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)
        return X_train, X_test, y_train, y_test
    def MLPreg(x_train, y_train):
        act = ['identity', 'logistic', 'relu', 'tanh']  # 'identity' = linear
        sol = ['lbfgs', 'sgd', 'adam']
        model = MLPRegressor(hidden_layer_sizes=(nl, nn), activation=activ, solver=solv, alpha=0.00001, batch_size='auto',
                             learning_rate=lr, learning_rate_init=learn_in, power_t=0.5, max_iter=max_i, shuffle=True, random_state=None,
                             tol=0.0001, verbose=False, warm_start=False, momentum=0.9, nesterovs_momentum=True, early_stopping=False,
                             validation_fraction=0.1, beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=10, max_fun=15000).fit(x_train, y_train)
        return model
I have even tried to switch off everything that introduces randomness into the model, but I still get a different MSE in every run.
The new model which is below is the simplest version.
    def MLPreg(x_train, y_train):
        model = MLPRegressor(hidden_layer_sizes=(100,),
                             max_iter=100).fit(x_train, y_train)
        return model
The first time mse:
The second time mse:
And so on.
Looking at the documentation of MLPRegressor, these two arguments are important for reproducible results:
shuffle: bool, default=True
    Whether to shuffle samples in each iteration. Only used when solver='sgd' or 'adam'.
Set shuffle=False to have the same behavior between runs.
random_state: int, RandomState instance, default=None
    Determines random number generation for weights and bias initialization, train-test split if early stopping is used, and batch sampling when solver='sgd' or 'adam'. Pass an int for reproducible results across multiple function calls.
Follow these instructions and set an int value for this argument. Neither of your code examples currently does that (one sets it to None, which is the non-reproducible default, the other omits it).
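A quick way to check the effect, sketched here with synthetic data and a hypothetical helper `fit_and_predict` (neither is from the question): fit the same MLPRegressor twice with the same integer random_state and compare the predictions. In scikit-learn's implementation the sample shuffling should draw from the same seeded generator, so the sketch leaves shuffle at its default.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))          # synthetic data, illustration only
y = X[:, 0] - 2.0 * X[:, 1]

def fit_and_predict(seed):
    """Train a small MLP with a fixed seed and return its predictions on X."""
    model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=200, random_state=seed)
    model.fit(X, y)
    return model.predict(X)

same_a = fit_and_predict(seed=27)
same_b = fit_and_predict(seed=27)
print("identical with a fixed seed:", np.array_equal(same_a, same_b))
```

A ConvergenceWarning may appear with so few iterations; it does not affect the comparison.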

Tensorflow Keras model: how to get the best score from a history object

I'm trying to train multiple machine learning models using TensorFlow Keras, and I was wondering whether there is a way to obtain the best score achieved during training once training is complete. I found online that the .fit function returns a History object which can be accessed to get the best score, but the code I've tried raises "AttributeError: 'History' object has no attribute 'best_score'". I cannot find an attribute list online, which is why I am asking here.
Thanks in advance.
    History = model.fit(Xtrain, ytrain, epochs=1, validation_data=(Xtest, ytest), verbose=1)
print("Best: %f using %s" % (History.best_score, History.best_params_))
PS, I know training for 1 epoch will achieve nothing, I'm just trying to test the code
I'm assuming you just want the best score from the history object.
    hist = model.fit(Xtrain, ytrain, epochs=1, validation_data=(Xtest, ytest), verbose=1)  # same call as in the question
    print(hist.history)  # this will print a dictionary object, now you need to grab the metrics / score you're looking for
    # if your score == 'acc', if not replace 'acc' with your metric
    best_score = max(hist.history['acc'])
If your metric is 'accuracy', use that instead.
If you want the best model, you can just use the ModelCheckpoint callback.
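For the score itself, a small pure-Python sketch may help. The dictionary below is invented for illustration and only imitates the shape of hist.history; the actual keys depend on the loss and metrics you compiled the model with. `best_epoch` is a hypothetical helper that returns both the best value and the epoch at which it occurred.

```python
# Stand-in for hist.history (values invented for illustration)
history = {
    "loss": [0.90, 0.62, 0.48, 0.41],
    "val_loss": [0.95, 0.70, 0.66, 0.71],
    "val_accuracy": [0.55, 0.71, 0.78, 0.76],
}

def best_epoch(values, mode):
    """Return (epoch_index, value) of the best entry; mode is 'min' or 'max'."""
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index, values[index]

print("lowest val_loss     :", best_epoch(history["val_loss"], "min"))
print("highest val_accuracy:", best_epoch(history["val_accuracy"], "max"))
```

Use 'min' for losses and 'max' for metrics such as accuracy; epoch indices are zero-based here.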
The fit() method in Keras returns a History object. Its history.history attribute is a dictionary recording training loss values and metric values at successive epochs, as well as validation loss values and validation metric values (if applicable). You can use history.history['loss'] or history.history['val_loss'] to access them. Here is an example:
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['accuracy'])
    History = model.fit(Xtrain, ytrain, epochs=1, validation_data=(Xtest, ytest), verbose=1)
    print(History.history['loss'], History.history['val_loss'])
For the best params, I think callbacks.ModelCheckpoint might be helpful. This callback has an argument, save_best_only, which keeps only the best model seen so far (according to the monitored quantity) instead of saving after every epoch.
    keras.callbacks.ModelCheckpoint(filepath, monitor='val_loss', verbose=0, save_best_only=False, save_weights_only=False, mode='auto', save_freq='epoch')
Kindly refer to the Training history visualization and callback for more information.

Python/Keras - How to access each epoch prediction?

I'm using Keras to predict a time series. As standard I'm using 20 epochs.
I want to check if my model is learning well, by predicting for each one of the 20 epochs.
By using model.predict() I'm getting only one prediction among all epochs (not sure how Keras selects it). I want all predictions, or at least the 10 best.
Would anyone know how to help me?
I think there is a bit of a confusion here.
An epoch is only used while training the neural network, so when training stops (in this case, after the 20th epoch), then the weights correspond to the ones computed on the last epoch.
Keras prints current loss values on the validation set during training after each epoch. If the weights after each epoch are not saved, then they are lost. You can save weights for each epoch with the ModelCheckpoint callback, and then load them back with load_weights on your model.
You can compute your predictions after each training epoch by implementing an appropriate callback by subclassing Callback and calling predict on the model inside the on_epoch_end function.
Then, to use it, you instantiate your callback, put it in a list and pass that list as the callbacks keyword argument to model.fit.
The following code will do the desired job:
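    import tensorflow as tf
    from tensorflow import keras
    # define your custom callback for prediction
    class PredictionCallback(tf.keras.callbacks.Callback):
        def __init__(self, x_valid):
            super().__init__()
            # recent tf.keras versions do not populate self.validation_data,
            # so keep our own reference to the validation inputs
            self.x_valid = x_valid
        def on_epoch_end(self, epoch, logs=None):
            y_pred = self.model.predict(self.x_valid)
            print('prediction: {} at epoch: {}'.format(y_pred, epoch))
    # ...
    # register the callback before training starts
    model.fit(X_train, y_train, batch_size=32, epochs=25,
              validation_data=(X_valid, y_valid),
              callbacks=[PredictionCallback(X_valid)])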
In case you want to make predictions on the test data after every epoch while training is going on, you can try this:
    class CustomCallback(keras.callbacks.Callback):
        def __init__(self, x_test, y_test):
            super().__init__()
            self.x_test = x_test
            self.y_test = y_test
        def on_epoch_end(self, epoch, logs=None):
            # self.model is set by Keras when the callback is passed to fit()
            y_pred = self.model.predict(self.x_test)
            print('y predicted: ', y_pred)
You need to pass the callback to model.fit:
    # your model architecture
    model.fit(x_train, y_train, epochs=10,
              callbacks=[CustomCallback(x_test, y_test)])
Keras provides many hook methods of this kind besides on_epoch_end:
    on_train_begin, on_train_end, on_epoch_begin, on_epoch_end, on_test_begin,
    on_test_end, on_predict_begin, on_predict_end, on_train_batch_begin, on_train_batch_end,
    on_test_batch_begin, on_test_batch_end, on_predict_batch_begin, on_predict_batch_end
