I have a problem: I am not able to reproduce my results with Keras and TensorFlow.
It seems a workaround for this issue was recently published on the Keras documentation site, but somehow it doesn't work for me.
What am I doing wrong?
I'm using a Jupyter Notebook on a MacBook Pro Retina (without an Nvidia GPU).
# ** Workaround from Keras Documentation **
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926
import os
os.environ['PYTHONHASHSEED'] = '0'
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# ** Workaround end **
# ** Start of my code **
# LSTM and CNN for sequence classification in the IMDB dataset
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from sklearn import metrics
# fix random seed for reproducibility
#np.random.seed(7)
# ... importing data and so on ...
# create the model
embedding_vecor_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
Used Python version:
Python 3.6.3 |Anaconda custom (x86_64)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
The workaround is already included in the code (without effect).
Every time I run the training part I get different results.
When I reset the kernel of the Jupyter Notebook, the 1st run matches the previous 1st run and the 2nd run matches the previous 2nd run.
So after a reset I will always get, for example, 0.7782 on the first run, 0.7732 on the second run, etc.
But without a kernel reset the results are different each time I run it.
I would be grateful for any suggestion!
I had exactly the same problem and managed to solve it by closing and restarting the tensorflow session every time I run the model. In your case it should look like this:
#START A NEW TF SESSION
np.random.seed(0)
tf.set_random_seed(0)
sess = tf.Session(graph=tf.get_default_graph())
K.set_session(sess)
embedding_vecor_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
#CLOSE TF SESSION
K.clear_session()
I ran the following code and had reproducible results using GPU and tensorflow backend:
from datetime import datetime
import numpy as np
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Dense
from keras.models import Model

print(datetime.now())
for i in range(10):
    np.random.seed(0)
    tf.set_random_seed(0)
    sess = tf.Session(graph=tf.get_default_graph())
    K.set_session(sess)
    n_classes = 3
    n_epochs = 20
    batch_size = 128
    task = Input(shape=x.shape[1:])
    h = Dense(100, activation='relu', name='shared')(task)
    h1 = Dense(100, activation='relu', name='single1')(h)
    output1 = Dense(n_classes, activation='softmax')(h1)
    model = Model(task, output1)
    model.compile(loss='categorical_crossentropy', optimizer='Adam')
    model.fit(x_train, y_train_onehot, batch_size=batch_size, epochs=n_epochs, verbose=0)
    print(model.evaluate(x=x_test, y=y_test_onehot, batch_size=batch_size, verbose=0))
    K.clear_session()
And obtained this output:
2017-10-23 11:27:14.494482
0.489712882132
0.489712893813
0.489712892765
0.489712854426
0.489712882132
0.489712864011
0.486303713004
0.489712903398
0.489712892765
0.489712903398
My understanding is that if you don't close your tf session (you are effectively doing that when you run in a new kernel), you keep sampling from the same "seeded" distribution.
My answer is the following, which uses Keras with TensorFlow as the backend. Within your nested for loops, where one typically iterates through the various parameters you wish to explore for your model's development, call this function at the start of the innermost loop body:
for ...:
    for ...:
        reset_keras()
        # ... build, train and evaluate the model here ...
where the reset function is defined as
import numpy as np
import tensorflow as tf

def reset_keras():
    sess = tf.keras.backend.get_session()
    tf.keras.backend.clear_session()
    sess.close()
    sess = tf.keras.backend.get_session()
    np.random.seed(1)
    tf.set_random_seed(2)
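For example, a minimal sketch of how this might sit inside a hyperparameter search (the loop values and the build_and_train helper are hypothetical placeholders for your own code):
for neurons in [32, 64, 128]:              # hypothetical hyperparameter grid
    for batch_size in [32, 64]:
        reset_keras()                      # clear the old graph/session and re-seed before each configuration
        score = build_and_train(neurons, batch_size)  # hypothetical helper: build, compile, fit, evaluate
        print(neurons, batch_size, score)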
PS: The function above also prevents your Nvidia GPU from building up too much memory (which happens after many iterations) and eventually becoming very slow... so the function restores GPU performance and keeps results reproducible.
Looks like a bug in TensorFlow or Keras, I'm not sure which. When setting the Keras backend to CNTK, the results are reproducible.
I even tried several versions of TensorFlow, from 1.2.1 up to 1.13.1. None of the TensorFlow versions gave matching results across multiple runs, even when the random seeds were set.
The thing that worked for me was to run the training every time in a new console. In addition to this, I also have these parameters set:
import os
import random
import numpy as np
import tensorflow as tf
from keras import backend as K

RANDOM_STATE = 42
os.environ['PYTHONHASHSEED'] = str(RANDOM_STATE)
random.seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)
tf.set_random_seed(RANDOM_STATE)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
intra_op_parallelism_threads could also be set to a larger value.
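If you want to automate "a new console for every run", one option (just a sketch; train.py is a hypothetical script containing the training code above) is to launch each run as a fresh Python process, with the hash seed fixed before the interpreter starts:
import os
import subprocess
import sys

env = dict(os.environ, PYTHONHASHSEED='42')   # must be set before the interpreter starts
for run in range(3):
    # each run gets a brand-new interpreter, so no hidden state leaks between runs
    subprocess.run([sys.executable, 'train.py'], env=env, check=True)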
I'm trying to implement a small prototype of the GCN model using the StellarGraph library. I have my StellarGraph graph object ready, and I'm trying to solve a multi-class, multi-label classification problem. This means I'm trying to predict more than one column (19 exactly), where each column is encoded as either 0 or 1.
Here is what I've done:
from sklearn.model_selection import train_test_split
from stellargraph.mapper import FullBatchNodeGenerator
train_subjects, test_subjects = train_test_split(nodelist, test_size = .25)
generator = FullBatchNodeGenerator(graph, method="gcn")
from stellargraph.layer import GCN
train_gen = generator.flow(train_subjects['ID'], train_subjects.drop(['ID'], axis = 1))
gcn = GCN(layer_sizes=[16, 16], activations=["relu", "relu"], generator=generator, dropout=0.5)
from tensorflow.keras import layers, optimizers, losses, metrics, Model
x_inp, x_out = gcn.in_out_tensors()
predictions = layers.Dense(units = 1, activation="sigmoid")(x_out)
from tensorflow.keras.metrics import Precision as Precision
model = Model(inputs=x_inp, outputs=predictions)
model.compile(
optimizer=optimizers.Adam(learning_rate = 0.01),
loss=losses.categorical_crossentropy,
metrics= [Precision()])
val_gen = generator.flow(test_subjects['ID'], test_subjects.drop(['ID'], axis = 1))
from tensorflow.keras.callbacks import EarlyStopping
es_callback = EarlyStopping(monitor="val_precision", patience=200, restore_best_weights=True)
history = model.fit(
train_gen,
epochs=200,
validation_data=val_gen,
verbose=2,
shuffle=False,
callbacks=[es_callback])
I have 271045 edges & 16354 nodes in total, including 12265 training nodes. The issue I'm getting is a shape mismatch from Keras, which reads as follows. I suspect it's due to using multiple columns as target columns. I tried the model using only one column (class) & it worked perfectly.
InvalidArgumentError: Incompatible shapes: [1,12265] vs. [1,233035]
[[node LogicalAnd_1 (defined at tmp/ipykernel_52/2745570431.py:7) ]] [Op:__inference_train_function_1405]
It's worth mentioning that 233035 = 12265 (number of training nodes) times 19 (number of classes). Any idea what is going wrong here?
I figured out the problem.
It was a newbie mistake: I initialized the Dense classification layer with 1 unit instead of 19 (the number of classes).
I just needed to fix that line to:
predictions = layers.Dense(units = 19, activation="sigmoid")(x_out)
Have a nice day!
I am using Keras/TensorFlow in Colab. I fit a model and save it. Then I load it and check the performance, which of course should be the same. Then I freeze it and fit it again. I would expect that afterwards the model has the same performance. Of course, during "training" there can be differences in accuracy due to batch size effects. But afterwards, when checking it with model.evaluate, I would expect no differences, since the weights cannot change while the model is frozen. However, it turns out this is not the case.
My code:
import csv
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
(train_x, train_labels), (test_x, test_labels) = tf.keras.datasets.imdb.load_data(num_words=10000)
x_train_padded = pad_sequences(train_x, maxlen=500)
x_test_padded = pad_sequences(test_x, maxlen=500)
model = tf.keras.Sequential([
tf.keras.layers.Embedding(10000, 128, input_length=500),
tf.keras.layers.Conv1D(128, 5, activation='relu'),
tf.keras.layers.GlobalAveragePooling1D(),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),optimizer='adam', metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])
history = model.fit(x=x_train_padded,
y=train_labels,
validation_data=(x_test_padded , test_labels),
epochs=4, batch_size=128)
gives the output:
I save the model:
model.save('test.h5')
and load it back:
modelloaded=tf.keras.models.load_model('test.h5')
and check the performance:
modelloaded.evaluate(x_test_padded , test_labels)
of course still the same:
Now I set the model to non-trainable:
modelloaded.trainable=False
and indeed:
modelloaded.summary()
shows that all parameters are non-trainable:
Now I fit it again, using just one epoch:
history = modelloaded.fit(x=x_train_padded,
y=train_labels,
validation_data=(x_test_padded , test_labels),
epochs=1, batch_size=128)
I understand that although the weights are non-trainable, the accuracy changes as this depends on the batch size.
However, when I check the model afterwards with:
modelloaded.evaluate(x_test_padded , test_labels)
I can see that the model has changed. The loss and the accuracy are different. I do not understand why; I would have expected the same numbers, as the model cannot be trained. It doesn't matter if I call it with different batch sizes:
modelloaded.evaluate(x_test_padded , test_labels, batch_size=16)
The numbers are always the same, but different from those before the model fitting.
Edit:
I tried the following:
modelloaded=tf.keras.models.load_model('test.h5')
modelloaded.trainable=False
for layer in modelloaded.layers:
layer.trainable=False
history = modelloaded.fit(x=x_train_padded,
y=train_labels,
validation_data=(x_test_padded , test_labels),
epochs=1, batch_size=128)
modelloaded.evaluate(x_test_padded, test_labels)
However, the weights are still adjusted (I checked this by comparing
print(modelloaded.trainable_variables) before and afterwards), and the modelloaded.evaluate output gives slightly different results, where I would expect no changes, since the model weights should not have changed. But they did, as I can see when checking
print(modelloaded.trainable_variables).
This appears to be a bigger issue, which is discussed here.
Setting all layers explicitly to non-trainable should work:
for layer in modelloaded.layers:
layer.trainable = False
My mistake was that I did not compile the model again after setting it to non-trainable.
You have to compile the model before fitting it again, otherwise fit will use your latest compile configuration ...
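In other words, a minimal sketch of the order that works (reusing the variable names from the question above):
modelloaded = tf.keras.models.load_model('test.h5')
modelloaded.trainable = False                  # freeze all weights
modelloaded.compile(                           # re-compile AFTER freezing so fit() picks it up
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer='adam',
    metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])
modelloaded.fit(x_train_padded, train_labels, epochs=1, batch_size=128)
modelloaded.evaluate(x_test_padded, test_labels)   # should now match the pre-fit numbers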
I am training an LSTM model with embedding input layer with a vocabulary size of approximately 100,000. While profiling the training via tensorboard, I discovered that most of the training time is spent on "Kernel Launch" (58%), followed by "All Others" (36%). In other words the GPU is idle most of the time due to overhead. The high kernel launch time seems to be driven by the size of the embedding layer.
My question is: how can I improve the training speed? Is it inevitable that most of the training time is spent on kernel launch when working with a large-ish embedding? Increasing the batch size (currently at 128) would help since the kernel launch time doesn't depend on the batch size, but 128 is already on the high side.
I'm also not sure what exactly falls under "All Others".
I am working on a Tesla T4 GPU with Tensorflow 2.2.0, but I see the same behavior using the nightly build.
Following the RNN tutorial on tensorflow.org (https://www.tensorflow.org/tutorials/text/text_classification_rnn), here is an example that highlights the performance issues:
import tensorflow_datasets as tfds
import tensorflow as tf
from datetime import datetime
from tqdm.auto import tqdm
### retrieve data ###
# use imdb_reviews dataset from TFDS
dataset = tfds.load('imdb_reviews',as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
### get encoder ###
# initialize tokenizer
tokenizer = tfds.features.text.Tokenizer()
# build vocabulary
def addOrUpdate(d, token):
    d[token] = d.get(token, 0) + 1

vocab = dict()
dataset_iter = iter(train_dataset)
for el in tqdm(dataset_iter):
    text = el[0].numpy().decode("utf-8")
    for token in tokenizer.tokenize(text):
        addOrUpdate(vocab, token)
# shrink vocabulary (MIN_COUNT>1 significantly reduces model dimension)
MIN_COUNT = 1
vocab_subset = set([k for k,v in vocab.items() if v >= MIN_COUNT])
print("Using vocabulary subset with min_count={:}: {:,} words, ".format(MIN_COUNT,len(vocab_subset)))
# create encoder
encoder = tfds.features.text.TokenTextEncoder(vocab_subset)
### Prepare the data for training ###
def encode(text_tensor, label):
    encoded_text = encoder.encode(text_tensor.numpy())
    return encoded_text, label

def encode_map_fn(text, label):
    # encode
    encoded_text, label = tf.py_function(encode,
                                         inp=[text, label],
                                         Tout=(tf.int64, tf.int64))
    # set shapes
    encoded_text.set_shape([None])
    label.set_shape([])
    return encoded_text, label
train_dataset = train_dataset.map(encode_map_fn)
test_dataset = test_dataset.map(encode_map_fn)
BUFFER_SIZE = 25000
BATCH_SIZE = 128
train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE)
### create the model ###
model = tf.keras.Sequential([
tf.keras.layers.Embedding(encoder.vocab_size, 256, mask_zero=True),
tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
optimizer=tf.keras.optimizers.Adam(1e-4),
metrics=['accuracy'])
### Train the model ###
# create tensorboard callback
log_path = 'logs_'+datetime.now().strftime("%Y%m%d_%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_path,
profile_batch = '10,20')
history = model.fit(train_dataset, epochs=1, steps_per_epoch=30,
callbacks=[tensorboard_callback])
Same code in a Colab Notebook: https://colab.research.google.com/drive/1WoAShXR2cGOYWPQoKdh4IGlhZh4FAK7o?usp=sharing
I haven't tried your code, but from looking at it, I guess the following issue might be related:
If a GPU is present but eager execution is enabled, Embedding layers are still placed on the CPU.
See https://github.com/tensorflow/tensorflow/issues/44194 (it includes a workaround).
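I haven't verified this against your exact setup, but the usual mitigation discussed there is to place the model (or at least the Embedding layer) explicitly on the GPU; a rough sketch, assuming a '/GPU:0' device is available:
import tensorflow as tf

with tf.device('/GPU:0'):
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(encoder.vocab_size, 256, mask_zero=True),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(1)
    ])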
UPDATE: This question was for Tensorflow 1.x. I upgraded to 2.0 and (at least on the simple code below) the reproducibility issue seems fixed on 2.0. So that solves my problem; but I'm still curious about what "best practices" were used for this issue on 1.x.
Training the exact same model/parameters/data on Keras/TensorFlow does not give reproducible results, and the loss is significantly different each time you train the model. There are many StackOverflow questions about that (e.g., How to get reproducible results in keras), but the recommended workarounds don't seem to work for me or many other people on StackOverflow. OK, it is what it is.
But given that limitation of non-reproducibility with Keras on TensorFlow, what's the best practice for comparing models and choosing hyperparameters? I'm testing different architectures and activations, but since the loss estimate is different each time, I'm never sure if one model is better than the other. Is there any best practice for dealing with this?
I don't think the issue has anything to do with my code, but just in case it helps; here's a sample program:
import os
#stackoverflow says turning off the GPU helps reproducibility, but it doesn't help for me
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""
os.environ['PYTHONHASHSEED']=str(1)
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers
import random
import pandas as pd
import numpy as np
#StackOverflow says this is needed for reproducibility but it doesn't help for me
from tensorflow.keras import backend as K
config = tf.ConfigProto(intra_op_parallelism_threads=1,inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=config)
K.set_session(sess)
#make some random data
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))
def run(x, y):
    #StackOverflow says you have to set the seeds but it doesn't help for me
    tf.set_random_seed(1)
    np.random.seed(1)
    random.seed(1)
    os.environ['PYTHONHASHSEED'] = str(1)
    model = keras.Sequential([
        keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
        keras.layers.Dense(20, activation='relu'),
        keras.layers.Dense(10, activation='relu'),
        keras.layers.Dense(1, activation='linear')
    ])
    NUM_EPOCHS = 500
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
    predictions = model.predict(x).flatten()
    loss = model.evaluate(x, y)  #This prints out the loss by side-effect
#Each time we run it gives a wildly different loss. :-(
run(df, y)
run(df, y)
run(df, y)
Given the non-reproducibility, how can I evaluate whether changes in my hyper-parameters and architecture are helping or not?
It's sneaky, but your code does, in fact, lack a step for better reproducibility: resetting the Keras & TensorFlow graphs before each run. Without this, tf.set_random_seed() won't work properly - see correct approach below.
I'd exhaust all the options before throwing in the towel on non-reproducibility; currently I'm aware of only one such instance, and it's likely a bug. Nonetheless, it's possible you'll get notably differing results even if you follow all the steps - in that case, see "If nothing works", but since each of those options is clearly not very productive, it's best to focus on attaining reproducibility:
Definitive improvements:
Use reset_seeds(K) below
Increase numeric precision: K.set_floatx('float64')
Set PYTHONHASHSEED before the Python kernel starts - e.g. from terminal
Upgrade to TF 2, which includes some reproducibility bug fixes, but mind performance
Run CPU on a single thread (painfully slow)
Do not import from tf.python.keras - see here
Ensure all imports are consistent (i.e. don't do from keras.layers import ... and from tensorflow.keras.optimizers import ...)
Use a superior CPU - for example, Google Colab, even if using GPU, is much more robust against numeric imprecision - see this SO
Also see related SO on reproducibility
If nothing works:
Rerun X times w/ exact same hyperparameters & seeds, average results
K-Fold Cross-Validation w/ exact same hyperparameters & seeds, average results - superior option, but more work involved
Correct reset method:
def reset_seeds(reset_graph_with_backend=None):
    if reset_graph_with_backend is not None:
        K = reset_graph_with_backend
        K.clear_session()
        tf.compat.v1.reset_default_graph()
        print("KERAS AND TENSORFLOW GRAPHS RESET")  # optional
    np.random.seed(1)
    random.seed(2)
    tf.compat.v1.set_random_seed(3)
    print("RANDOM SEEDS RESET")  # optional
Running TF on single CPU thread: (code for TF1-only)
session_conf = tf.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=1)
sess = tf.Session(config=session_conf)
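If you are running Keras on top of this TF1 session, you would typically also register it with the backend (a small addition, assuming the Keras backend is imported as K):
from tensorflow.keras import backend as K
K.set_session(sess)  # make Keras use the single-threaded session configured above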
The problem appears to be solved in Tensorflow 2.0 (at least on simple models)! Here is a code snippet that seems to yield repeatable results.
import os
####*IMPORTANT*: Have to do this line *before* importing tensorflow
os.environ['PYTHONHASHSEED']=str(1)
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.layers
import random
import pandas as pd
import numpy as np
def reset_random_seeds():
    os.environ['PYTHONHASHSEED'] = str(1)
    tf.random.set_seed(1)
    np.random.seed(1)
    random.seed(1)
#make some random data
reset_random_seeds()
NUM_ROWS = 1000
NUM_FEATURES = 10
random_data = np.random.normal(size=(NUM_ROWS, NUM_FEATURES))
df = pd.DataFrame(data=random_data, columns=['x_' + str(ii) for ii in range(NUM_FEATURES)])
y = df.sum(axis=1) + np.random.normal(size=(NUM_ROWS))
def run(x, y):
    reset_random_seeds()
    model = keras.Sequential([
        keras.layers.Dense(40, input_dim=df.shape[1], activation='relu'),
        keras.layers.Dense(20, activation='relu'),
        keras.layers.Dense(10, activation='relu'),
        keras.layers.Dense(1, activation='linear')
    ])
    NUM_EPOCHS = 500
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(x, y, epochs=NUM_EPOCHS, verbose=0)
    predictions = model.predict(x).flatten()
    loss = model.evaluate(x, y)  #This prints out the loss by side-effect
#With Tensorflow 2.0 this is now reproducible!
run(df, y)
run(df, y)
run(df, y)
Just the code below works. The KEY point of the question, VERY IMPORTANT, is to call the function reset_seeds() every time before running the model. Doing that, you will obtain reproducible results, as I checked in Google Colab.
import numpy as np
import tensorflow as tf
import random as python_random
def reset_seeds():
    np.random.seed(123)
    python_random.seed(123)
    tf.random.set_seed(1234)

reset_seeds()
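For example, a minimal sketch of calling it before every run (the model-building code and the data names are hypothetical placeholders):
for run in range(2):
    reset_seeds()                   # re-seed before every run
    model = build_model()           # hypothetical: your own model construction
    model.fit(x_train, y_train)     # hypothetical data; both runs should now give matching results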
You have a couple of options for stabilizing performance...
1) Set the seed for your initializers so they are always initialized to the same values (see the sketch after this list).
2) More data generally results in a more stable convergence.
3) Lower learning rates and bigger batch sizes are also good for more predictable learning.
4) Train for a fixed number of epochs instead of using callbacks to modify hyperparameters during training.
5) K-fold validation to train on different subsets. The average of these folds should result in a fairly predictable metric.
6) You also have the option of just training multiple times and taking an average.
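Regarding option 1, a minimal sketch of seeding the weight initializers (the layer sizes and input dimension are arbitrary examples):
from tensorflow import keras
from tensorflow.keras import layers, initializers

model = keras.Sequential([
    layers.Dense(64, activation='relu', input_dim=10,
                 kernel_initializer=initializers.GlorotUniform(seed=42)),
    layers.Dense(1,
                 kernel_initializer=initializers.GlorotUniform(seed=43))
])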
I want to get the loss values as the model trains on each instance.
history = model.fit(..)
For example, the code above returns the loss values for each epoch, not for each mini-batch or instance.
What is the best way to do this? Any suggestions?
There is exactly what you are looking for at the end of this official keras documentation page https://keras.io/callbacks/#callback
Here is the code to create a custom callback
class LossHistory(keras.callbacks.Callback):
    def on_train_begin(self, logs={}):
        self.losses = []

    def on_batch_end(self, batch, logs={}):
        self.losses.append(logs.get('loss'))
model = Sequential()
model.add(Dense(10, input_dim=784, kernel_initializer='uniform'))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
history = LossHistory()
model.fit(x_train, y_train, batch_size=128, epochs=20, verbose=0, callbacks=[history])
print(history.losses)
# outputs
'''
[0.66047596406559383, 0.3547245744908703, ..., 0.25953155204159617, 0.25901699725311789]
'''
If you want to get loss values for each batch, you might want to call model.train_on_batch inside a generator. It's hard to provide a complete example without knowing your dataset, but you will have to break your dataset into batches and feed them one by one.
def make_batches(...):
    ...

batches = make_batches(...)
batch_losses = [model.train_on_batch(x, y) for x, y in batches]
It's a bit more complicated with single instances. You can, of course, train on 1-sized batches, though it will most likely thrash your optimiser (by maximising gradient variance) and significantly degrade performance. Besides, since loss functions are evaluated outside of Python's domain, there is no direct way to hijack the computation without tinkering with C/C++ and CUDA sources. Even then, the backend itself evaluates the loss batch-wise (benefitting from highly vectorised matrix-operations), therefore you will severely degrade performance by forcing it to evaluate loss on each instance. Simply put, hacking the backend will only (probably) help you reduce GPU memory transfers (as compared to training on 1-sized batches from the Python interface). If you really want to get per-instance scores, I would recommend you to train on batches and evaluate on instances (this way you will avoid issues with high variance and reduce expensive gradient computations, since gradients are only estimated during training):
def make_batches(batchsize, x, y):
    ...

batchsize = n
batches = make_batches(n, ...)
batch_instances = [make_batches(1, x, y) for x, y in batches]
losses = [
    (model.train_on_batch(x, y), [model.test_on_batch(*inst) for inst in instances])
    for (x, y), instances in zip(batches, batch_instances)
]
One solution is to calculate the loss function between the training targets and the predictions from the training input. In the case of loss = mean_squared_error and three-dimensional outputs (i.e. image width x height x channels):
model.fit(train_in, train_out, ...)
pred = model.predict(train_in)
loss = np.add.reduce(np.square(train_out - pred), axis=(1,2,3))  # this computes the total squared error for each sample
loss = loss / (pred.shape[1]*pred.shape[2]*pred.shape[3])        # this computes the mean over the entries of each sample
np.savetxt("loss.txt", loss)                                     # this line saves the data to file
After combining resources from here and here, I came up with the following code. Maybe it will help you. The idea is that you can override the Callback class from Keras and then use the on_batch_end method to check the loss value from the logs that Keras supplies automatically to that method.
Here is working code for an NN with that particular function built in. Maybe you can start from here -
import numpy as np
import pandas as pd
import seaborn as sns
import os
import matplotlib.pyplot as plt
import time
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import Callback
# fix random seed for reproducibility
seed = 155
np.random.seed(seed)
# load pima indians dataset
# download directly from website
dataset = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/pima-indians-diabetes/pima-indians-diabetes.data",
header=None).values
X_train, X_test, Y_train, Y_test = train_test_split(dataset[:,0:8], dataset[:,8], test_size=0.25, random_state=87)
class NBatchLogger(Callback):
    def __init__(self, display=100):
        '''
        display: Number of batches to wait before outputting loss
        '''
        self.seen = 0
        self.display = display

    def on_batch_end(self, batch, logs={}):
        self.seen += logs.get('size', 0)
        if self.seen % self.display == 0:
            print('\n{0}/{1} - Batch Loss: {2}'.format(self.seen, self.params['samples'],
                                                       logs.get('loss')))
out_batch = NBatchLogger(display=1000)
np.random.seed(seed)
my_first_nn = Sequential() # create model
my_first_nn.add(Dense(5, input_dim=8, activation='relu')) # hidden layer
my_first_nn.add(Dense(1, activation='sigmoid')) # output layer
my_first_nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
my_first_nn_fitted = my_first_nn.fit(X_train, Y_train, epochs=1000, verbose=0, batch_size=128,
callbacks=[out_batch], initial_epoch=0)
Please let me know if this is what you wanted.