Freeze model and train it - python

I am using Keras with TensorFlow in Colab. I fit a model and save it. Then I load it and check its performance, which of course is the same. Then I freeze it and fit it again. I would expect the model to have the same performance afterwards. Of course, during "training" the reported accuracy can differ because of batch-size effects. But afterwards, when checking it with model.evaluate, I would expect no differences, since the model is frozen and its weights cannot change. However, it turns out this is not the case.
My code:
import csv
import tensorflow as tf
import tensorflow_datasets as tfds
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
(train_x, train_labels), (test_x, test_labels) = tf.keras.datasets.imdb.load_data(num_words=10000)
x_train_padded = pad_sequences(train_x, maxlen=500)
x_test_padded = pad_sequences(test_x, maxlen=500)
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(10000, 128, input_length=500),
    tf.keras.layers.Conv1D(128, 5, activation='relu'),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1)
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer='adam',
              metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])
history = model.fit(x=x_train_padded,
                    y=train_labels,
                    validation_data=(x_test_padded, test_labels),
                    epochs=4, batch_size=128)
gives the output:
I save the model:
model.save('test.h5')
and load it back:
modelloaded=tf.keras.models.load_model('test.h5')
and check the performance:
modelloaded.evaluate(x_test_padded , test_labels)
of course still the same:
Now I set the model to non-trainable:
modelloaded.trainable=False
and indeed:
modelloaded.summary()
shows that all parameters are non-trainable:
Now I fit it again, using just one epoch:
history = modelloaded.fit(x=x_train_padded,
                          y=train_labels,
                          validation_data=(x_test_padded, test_labels),
                          epochs=1, batch_size=128)
I understand that although the weights are non-trainable, the accuracy changes as this depends on the batch size.
However, when I check the model afterwards with:
modelloaded.evaluate(x_test_padded , test_labels)
I can see that the model has changed: the loss and the accuracy are different. I do not understand why; I would have expected the same numbers, since the model cannot be trained. It doesn't matter which batch size I use for evaluation:
modelloaded.evaluate(x_test_padded , test_labels, batch_size=16)
The numbers are always the same, but they differ from those before the second fit.
Edit:
I tried the following:
modelloaded=tf.keras.models.load_model('test.h5')
modelloaded.trainable=False
for layer in modelloaded.layers:
    layer.trainable=False
history = modelloaded.fit(x=x_train_padded,
                          y=train_labels,
                          validation_data=(x_test_padded, test_labels),
                          epochs=1, batch_size=128)
modelloaded.evaluate(x_test_padded, test_labels)
However, the weights are still adjusted (I checked this by comparing print(modelloaded.trainable_variables) before and afterwards), and modelloaded.evaluate gives slightly different results, where I would expect no change, since the model weights should not have changed. But they did, as I can see when checking print(modelloaded.trainable_variables).

This appears to be a bigger issue, which is discussed here.
Setting all layers explicitly non trainable should work:
for layer in modelloaded.layers:
    layer.trainable = False

My mistake was that I did not compile the model again after setting it to non-trainable.

You have to compile the model again before fitting it; otherwise fit() uses the configuration from your latest compile() call, which was made while the weights were still trainable.
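To check the recompile fix without the IMDB data, here is a minimal sketch on small random data (the arrays x and y and the tiny Dense model are hypothetical stand-ins for the question's setup). It freezes the model, calls compile() again, fits once more, and compares get_weights() before and after; with the recompile in place the weights should come out identical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 64 samples, 8 features, binary labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 8)).astype("float32")
y = rng.integers(0, 2, size=(64, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer="adam")
model.fit(x, y, epochs=1, batch_size=16, verbose=0)

# Freeze, then compile again so that fit() picks up the new trainable flags.
model.trainable = False
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), optimizer="adam")

weights_before = [w.copy() for w in model.get_weights()]
model.fit(x, y, epochs=1, batch_size=16, verbose=0)
weights_after = model.get_weights()

# Compare every weight array (kernels and biases of both Dense layers).
unchanged = all(np.array_equal(b, a) for b, a in zip(weights_before, weights_after))
print("weights unchanged:", unchanged)
```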

Related

Bert prediction shape not equal to num_samples

I have a text classification task that I am trying to solve with BERT. Below is the code I am using. The model training code (below) works fine, but I am facing an issue with the prediction part.
from transformers import TFBertForSequenceClassification
import tensorflow as tf
# recommended learning rate for Adam 5e-5, 3e-5, 2e-5
learning_rate = 5e-5
nlabels = 26
# we will do just 1 epoch for illustration, though multiple epochs might be better as long as we will not overfit the model
number_of_epochs = 1
# model initialization
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=nlabels,
                                                        output_attentions=False,
                                                        output_hidden_states=False)
# optimizer Adam
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, epsilon=1e-08)
# we do not have one-hot vectors, so we can use sparse categorical cross-entropy and accuracy
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
metric = tf.keras.metrics.SparseCategoricalAccuracy('accuracy')
model.compile(optimizer=optimizer, loss=loss, metrics=[metric])
bert_history = model.fit(ds_tr_encoded, epochs=number_of_epochs)
I am getting the output using the following
import numpy as np
preds = model.predict(ds_te_encoded)
pred_labels_idx = np.argmax(preds['logits'], axis=1)
The issue I am facing is that the length of pred_labels_idx does not match the cardinality of ds_te_encoded:
len(pred_labels_idx) #426820
tf.data.experimental.cardinality(ds_te_encoded) #<tf.Tensor: shape=(), dtype=int64, numpy=21341>
Not sure why this is happening.
Since ds_te_encoded is a batched tf.data.Dataset, tf.data.experimental.cardinality(...) returns the number of batches (rounded up when the last batch is partial), not the number of samples, whereas model.predict returns one prediction per sample. So I assume you are using a batch size of 20, because 426820 / 20 = 21341. That is probably what is causing the confusion.
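The batches-versus-samples distinction can be reproduced on a toy dataset. This sketch assumes a hypothetical 103-element dataset batched by 20: cardinality reports 6 batches (the last one partial), while iterating over the batches and summing their sizes gives all 103 samples. With drop_remainder=True the partial batch is dropped and the count falls to 5.

```python
import tensorflow as tf

num_samples = 103  # hypothetical dataset size
batch_size = 20

ds = tf.data.Dataset.range(num_samples).batch(batch_size)

# After batch(), each dataset element is a batch, so cardinality counts batches.
num_batches = int(tf.data.experimental.cardinality(ds))

# Summing the batch sizes recovers the number of samples.
num_predicted = sum(int(b.shape[0]) for b in ds)

# Dropping the incomplete final batch changes the batch count (and loses samples).
ds_drop = tf.data.Dataset.range(num_samples).batch(batch_size, drop_remainder=True)
num_batches_drop = int(tf.data.experimental.cardinality(ds_drop))

print(num_batches, num_predicted, num_batches_drop)  # 6 103 5
```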

How do keras LSTM input and output shapes work?

trainX, trainY, sequence_length=len(train), batch_size=batchTrain
)
val=timeseries_dataset_from_array(
valX, valY, sequence_length=len(val), batch_size=batchVal
)
test=timeseries_dataset_from_array(
testX, testY, sequence_length=len(test), batch_size=batchTest
)
return train, val, test
train, val, test = preprocessor()
model=Sequential()
model.add(LSTM(4,return_sequences=True))
model.add(Dense(2,activation='softmax'))
model.compile(optimizer='Adam', loss="mae")
model.fit(train, epochs=200, verbose=2, validation_data=val, shuffle=False)
I'm trying to make an LSTM from time-series data and when I run the above, the loss doesn't change at all. I'm definitely struggling to understand how lstm input/output shapes work. I've read as much online as I could find, but I can't seem to get the model to learn. I'm under the impression that the first argument is the dimensionality of the output space. I want the lstm to return the whole sequence to the output function.
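To see what the windowing step hands to the LSTM, here is a small sketch on a hypothetical series of 10 time steps with 2 features. tf.keras.utils.timeseries_dataset_from_array (the function the question uses; older TensorFlow versions expose it as tf.keras.preprocessing.timeseries_dataset_from_array) yields batches shaped (batch, sequence_length, features). With sequence_length=4 there are 7 windows, split into batches of 3, 3 and 1; with sequence_length=len(data) the whole series becomes a single window.

```python
import numpy as np
import tensorflow as tf

# Hypothetical series: 10 time steps, 2 features per step.
data = np.arange(20, dtype="float32").reshape(10, 2)

# 10 - 4 + 1 = 7 overlapping windows of length 4, grouped into batches of 3.
windows = tf.keras.utils.timeseries_dataset_from_array(
    data, targets=None, sequence_length=4, batch_size=3)
shapes = [tuple(batch.shape) for batch in windows]
print(shapes)  # [(3, 4, 2), (3, 4, 2), (1, 4, 2)]

# A window as long as the series leaves exactly one sample.
whole = tf.keras.utils.timeseries_dataset_from_array(
    data, targets=None, sequence_length=len(data), batch_size=3)
whole_shapes = [tuple(batch.shape) for batch in whole]
print(whole_shapes)  # [(1, 10, 2)]
```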
There are many problems with your model. Your final layer is a dense layer with two units and softmax activation; softmax should be replaced by sigmoid. Since you are using softmax, I guess you are using this model for classification and not regression.
If you are using the model for a classification task, you should use BinaryCrossentropy as the loss, not MeanAbsoluteError.
To answer the question in full detail, you need to post additional information, for example: what are your target variables?
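Applied to the model itself, the answer's suggestion looks roughly like the sketch below; the window length of 5, the 3 features and a single binary label per time step are assumptions for illustration. The first argument of LSTM is the size of each output vector, and return_sequences=True keeps one such vector per time step, so the Dense layer on top is applied to every time step separately.

```python
import numpy as np
import tensorflow as tf

timesteps, features = 5, 3  # hypothetical window length and feature count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, features)),     # (batch, 5, 3)
    tf.keras.layers.LSTM(4, return_sequences=True),  # -> (batch, 5, 4)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # -> (batch, 5, 1), one probability per step
])
model.compile(optimizer="adam", loss=tf.keras.losses.BinaryCrossentropy())

x = np.random.rand(2, timesteps, features).astype("float32")
y_pred = model.predict(x, verbose=0)
print(y_pred.shape)  # (2, 5, 1)
```

With return_sequences=True the labels passed to fit() would also need one value per time step, i.e. shape (batch, timesteps, 1).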

LSTM Model not having any variance during evaluation

I have a question regarding the evaluation of an LSTM model. I have trained an LSTM model and stored it with model.save(...). Now I want to load it with load_model and evaluate it on the validation dataset. Since neural networks are stochastic, I run it several times and compute the mean and the variance of the different metrics I am interested in.
Now I am shocked that after the first run all consecutive runs have the same performance on every metric. I don't think that is right, but I don't know where the error occurs.
So my question is:
what is my mistake in setting up the validation of my model?
and how can I fix that?
Here are the code snippets that should explain what I am doing:
Compile and fit the Model
def compile_and_fit(hparams,
                    MAX_EPOCHS,
                    model_path):
    window = WindowGenerator(input_width=hparams[HP_WINDOW_SIZE],
                             label_width=hparams[HP_WINDOW_SIZE], shift=1,
                             label_columns=['q_MARI'], batch_size=hparams[HP_BATCH_SIZE])
    model = tf.keras.models.Sequential([
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_1"),
        tf.keras.layers.Dropout(hparams[HP_DROPOUT], name="Dropout_1"),
        tf.keras.layers.LSTM(hparams[HP_NUM_UNITS], return_sequences=True, name="LSTM_2"),
        tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1))
    ])
    learning_rate = hparams[HP_LEARNING_RATE]
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(learning_rate=learning_rate),
                  metrics=get_metrics())
    history = model.fit(window.train,
                        epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=get_callbacks(model_path))
    _, a, _, _, _, _ = model.evaluate(window.val)
    return a, model, history
Train and save it
a, model, history = compile_and_fit( hparams = hparams, MAX_EPOCHS = MAX_EPOCHS, model_path = run_path)
model.save(run_path)
Load and evaluate it
model = tf.keras.models.load_model(os.path.join(hparam_path, model_name),
                                   custom_objects={"max_error": max_error, "median_absolute_error": median_absolute_error, "rev_metric": rev_metric, "nse_metric": nse_metric})
model.compile(loss=tf.losses.MeanSquaredError(), optimizer="adam", metrics=get_metrics())
metric_values = np.empty(shape=(nr_runs, len(metrics)), dtype=float)
for j in range(nr_runs):
    window = WindowGenerator(input_width=hparam_vals[i], label_width=hparam_vals[i], shift=1,
                             label_columns=['q_MARI'])
    metric_values[j] = np.array(model.evaluate(window.val))
means = metric_values.mean(axis=0)
varis = metric_values.var(axis=0)
print(f'means: {means}, varis: {varis}')
The results I am getting
To set up the training, I followed these two guides:
https://www.tensorflow.org/tutorials/structured_data/time_series
https://www.tensorflow.org/tensorboard/hyperparameter_tuning_with_hparams
A trained LSTM is not stochastic: evaluation results should be the same for the same data.
There are two steps. When you train the model, randomness (weight initialization, shuffling, dropout) influences the model you end up with. However, once training is done and you have saved the model, its predictions will be the same every time you use that same model.
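The difference between training-time randomness and evaluation can be shown with a small sketch (the model below is hypothetical, loosely mirroring the LSTM/Dropout stack in the question). Dropout only draws random masks in training mode; evaluate() and predict() run the model in inference mode, where Dropout passes its input through unchanged, so repeated evaluations of the same saved model return the same numbers.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(6, 2)),
    tf.keras.layers.LSTM(8, return_sequences=True),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1),
])
x = np.random.default_rng(0).normal(size=(4, 6, 2)).astype("float32")

# Inference mode (what evaluate/predict use): Dropout is a no-op, so calls agree.
pred_1 = model(x, training=False).numpy()
pred_2 = model(x, training=False).numpy()

# Training mode: Dropout samples a fresh random mask on every call.
train_1 = model(x, training=True).numpy()
train_2 = model(x, training=True).numpy()

print("inference runs match:", np.allclose(pred_1, pred_2))
print("training runs match:", np.allclose(train_1, train_2))
```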

Tensorflow neural network doesn’t learn

I built a neural network for a university project. The goal is to find out if sensor data (temperature, humidity and light) can predict if the sunrise happened during a given time frame. So, it is a binary classification.
The problem is that the network does not learn. The accuracy converges towards about 0.8 and does not change after about 5 epochs. Same with the loss, which sits at about 0.4921 after a few epochs. I tried several things like changing the activation function or the number of hidden layers, but nothing worked.
I also created a dataset with an equal number of "sunrise = 1" and "sunrise = 0" data points. The accuracy ended up at exactly 0.5. Therefore, I think that there is something wrong with the network setup itself.
Do you have any idea what could be wrong?
Here is my code:
def build_network():
    input = keras.Input(shape=(4,25), name="input")
    hidden = layers.Dense(1000, activation="sigmoid", name="dense1")(input)
    hidden = layers.Dense(1000, activation="sigmoid", name="dense2")(hidden)
    hidden = layers.Flatten()(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense3")(hidden)
    hidden = layers.Dense(500, activation="sigmoid", name="dense4")(hidden)
    hidden = layers.Dense(10, activation="sigmoid", name="dense5")(hidden)
    output = layers.Dense(1, activation="sigmoid", name="output")(hidden)
    model = keras.Model(inputs=input, outputs=output, name="sunrise_model")
    return model

def train_model():
    training_files = r'data/training'
    test_files = r'data/test'
    print('reading files...')
    train_x, train_y = load_data(training_files)
    test_x, test_y = load_data(test_files)
    print("training network")
    # compile model
    model = build_network()
    model.compile(
        loss=keras.losses.BinaryCrossentropy(from_logits=False),
        optimizer=keras.optimizers.RMSprop(),
        metrics=["accuracy"],
    )
    # Train / fit
    model.fit(train_x, train_y, batch_size=100, epochs=200)
    # evaluate
    test_scores = model.evaluate(test_x, test_y, verbose=2)
    print("Test loss:", test_scores[0])
    print("Test accuracy:", test_scores[1])
Here is the output: loss: 0.4921 - accuracy: 0.8225
Test loss: 0.4921109309196472,
Test accuracy: 0.8225
And here is an example of the data: https://hastebin.com/hazipagija.json
I would use ReLU instead of sigmoid as the activation function. What learning rate did you use? Try a smaller learning rate. I actually get the best results with a variable learning rate; the Keras callback ReduceLROnPlateau makes this easy to do (see its documentation). I also recommend using the Keras callback ModelCheckpoint to save the model with the lowest validation loss, and then using that model to make predictions on the test set. I also think your model has too many parameters and will overfit. Add dropout layers to the model to help reduce this problem. A good alternative would be to reduce the model complexity: take out one of the layers with 1000 nodes and one of the layers with 500 nodes and see what results you get. I also prefer the Adamax optimizer; use its default values.
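The suggestions above can be sketched roughly as follows, assuming the (4, 25) input shape from the question. The layer sizes follow the suggestion to drop one 1000-unit and one 500-unit layer; the dropout rate, the ReduceLROnPlateau settings and the checkpoint file name best_sunrise_model.keras are illustrative assumptions, not tuned values. The hidden layers use ReLU, while the output keeps sigmoid because the loss is binary cross-entropy on probabilities.

```python
import tensorflow as tf

layers = tf.keras.layers


def build_smaller_network(input_shape=(4, 25)):
    """Reduced variant of build_network(): ReLU hidden layers plus dropout."""
    inputs = tf.keras.Input(shape=input_shape, name="input")
    x = layers.Dense(1000, activation="relu", name="dense1")(inputs)  # per row: (4, 1000)
    x = layers.Flatten()(x)                                          # -> 4000 features
    x = layers.Dropout(0.3)(x)                                       # illustrative rate
    x = layers.Dense(500, activation="relu", name="dense3")(x)
    x = layers.Dropout(0.3)(x)
    x = layers.Dense(10, activation="relu", name="dense5")(x)
    outputs = layers.Dense(1, activation="sigmoid", name="output")(x)  # P(sunrise)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="sunrise_model_small")


model = build_smaller_network()
model.compile(
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
    optimizer=tf.keras.optimizers.Adamax(),  # default settings
    metrics=["accuracy"],
)

callbacks = [
    # Halve the learning rate when the validation loss has not improved for 3 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
    # Keep only the model with the lowest validation loss seen so far.
    tf.keras.callbacks.ModelCheckpoint("best_sunrise_model.keras", monitor="val_loss",
                                       save_best_only=True),
]
```

Both callbacks monitor val_loss, so fit() needs validation data, e.g. model.fit(train_x, train_y, validation_split=0.2, epochs=200, callbacks=callbacks).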

Results not reproducible with Keras and TensorFlow in Python

I have the problem that I am not able to reproduce my results with Keras and TensorFlow.
It seems like recently there has been a workaround published on the Keras documentation site for this issue but somehow it doesn't work for me.
What I am doing wrong?
I'm using a Jupyter Notebook on a MBP Retina (without Nvidia GPU).
# ** Workaround from Keras Documentation **
import numpy as np
import tensorflow as tf
import random as rn
# The below is necessary in Python 3.2.3 onwards to
# have reproducible behavior for certain hash-based operations.
# See these references for further details:
# https://docs.python.org/3.4/using/cmdline.html#envvar-PYTHONHASHSEED
# https://github.com/fchollet/keras/issues/2280#issuecomment-306959926
import os
os.environ['PYTHONHASHSEED'] = '0'
# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(42)
# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
rn.seed(12345)
# Force TensorFlow to use single thread.
# Multiple threads are a potential source of
# non-reproducible results.
# For further details, see: https://stackoverflow.com/questions/42022950/which-seeds-have-to-be-set-where-to-realize-100-reproducibility-of-training-res
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
from keras import backend as K
# The below tf.set_random_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see: https://www.tensorflow.org/api_docs/python/tf/set_random_seed
tf.set_random_seed(1234)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
# ** Workaround end **
# ** Start of my code **
# LSTM and CNN for sequence classification in the IMDB dataset
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from keras.preprocessing import sequence
from sklearn import metrics
# fix random seed for reproducibility
#np.random.seed(7)
# ... importing data and so on ...
# create the model
embedding_vecor_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
Used Python version:
Python 3.6.3 |Anaconda custom (x86_64)| (default, Oct 6 2017, 12:04:38)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
The workaround is already included in the code (without effect).
Every time I run the training part, I get different results.
When I reset the kernel of the Jupyter Notebook, the first run matches the first run after the previous reset, the second run matches the second run, and so on.
So after resetting, I always get, for example, 0.7782 on the first run, 0.7732 on the second run, etc.
But without a kernel reset, the results are different every time I run it.
I would be grateful for any suggestions!
I had exactly the same problem and managed to solve it by closing and restarting the tensorflow session every time I run the model. In your case it should look like this:
#START A NEW TF SESSION
np.random.seed(0)
tf.set_random_seed(0)
sess = tf.Session(graph=tf.get_default_graph())
K.set_session(sess)
embedding_vecor_length = 32
neurons = 91
epochs = 1
model = Sequential()
model.add(Embedding(top_words, embedding_vecor_length, input_length=max_review_length))
model.add(LSTM(neurons))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_logarithmic_error', optimizer='adam', metrics=['accuracy'])
print(model.summary())
model.fit(X_train, y_train, epochs=epochs, batch_size=64)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[1]*100))
#CLOSE TF SESSION
K.clear_session()
I ran the following code and had reproducible results using GPU and tensorflow backend:
from datetime import datetime

print(datetime.now())
for i in range(10):
    np.random.seed(0)
    tf.set_random_seed(0)
    sess = tf.Session(graph=tf.get_default_graph())
    K.set_session(sess)
    n_classes = 3
    n_epochs = 20
    batch_size = 128
    task = Input(shape=x.shape[1:])
    h = Dense(100, activation='relu', name='shared')(task)
    h1 = Dense(100, activation='relu', name='single1')(h)
    output1 = Dense(n_classes, activation='softmax')(h1)
    model = Model(task, output1)
    model.compile(loss='categorical_crossentropy', optimizer='Adam')
    model.fit(x_train, y_train_onehot, batch_size=batch_size, epochs=n_epochs, verbose=0)
    print(model.evaluate(x=x_test, y=y_test_onehot, batch_size=batch_size, verbose=0))
    K.clear_session()
And obtained this output:
2017-10-23 11:27:14.494482
0.489712882132
0.489712893813
0.489712892765
0.489712854426
0.489712882132
0.489712864011
0.486303713004
0.489712903398
0.489712892765
0.489712903398
My understanding is that if you don't close your TF session (which you effectively do by restarting the kernel), each run keeps drawing further values from the same "seeded" random sequence, so consecutive runs differ.
My answer is the following, which uses Keras with TensorFlow as the backend. Inside your nested for loops, where one typically iterates through the various parameters to explore during model development, call this function at the start of the innermost loop:
for ...:
    for ...:
        reset_keras()
        ...
where the reset function is defined as
def reset_keras():
    sess = tf.keras.backend.get_session()
    tf.keras.backend.clear_session()
    sess.close()
    sess = tf.keras.backend.get_session()
    np.random.seed(1)
    tf.set_random_seed(2)
PS: The function above also keeps your NVIDIA GPU from accumulating too much memory (which happens after many iterations and eventually makes training very slow), so it restores GPU performance while keeping the results reproducible.
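The calls above (tf.Session, tf.set_random_seed, tf.keras.backend.get_session) only exist in TensorFlow 1.x or under tf.compat.v1. As a rough TensorFlow 2 counterpart, assuming TF 2.7 or newer for tf.keras.utils.set_random_seed, the sketch below clears the Keras session and re-seeds Python, NumPy and TensorFlow before each run; reset_keras_tf2, train_once and the toy data are hypothetical. On GPU, fully deterministic kernels may additionally require tf.config.experimental.enable_op_determinism() (TF 2.9+).

```python
import gc

import numpy as np
import tensorflow as tf


def reset_keras_tf2(seed=1):
    """Hypothetical TF2 counterpart of reset_keras(): drop old state and re-seed."""
    tf.keras.backend.clear_session()
    gc.collect()
    # Seeds Python's random module, NumPy and TensorFlow in one call.
    tf.keras.utils.set_random_seed(seed)


def train_once():
    # Toy data drawn after seeding, so it is identical in every run.
    x = np.random.rand(32, 4).astype("float32")
    y = (x.sum(axis=1, keepdims=True) > 2).astype("float32")
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(8, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(x, y, epochs=2, batch_size=8, shuffle=False, verbose=0)
    return model.get_weights()


results = []
for run in range(2):
    reset_keras_tf2()
    results.append(train_once())

# Compare the trained weights of run 1 and run 2 array by array.
same = all(np.allclose(a, b) for a, b in zip(*results))
print("runs reproducible:", same)
```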
This looks like a bug in TensorFlow/Keras, but I'm not sure. When setting the Keras backend to CNTK, the results are reproducible.
I even tried several versions of TensorFlow, from 1.2.1 to 1.13.1. With all of them, the results of multiple runs don't agree, even when the random seeds are set.
What worked for me was to run the training in a new console every time. In addition, I also have these parameters set:
RANDOM_STATE = 42
os.environ['PYTHONHASHSEED'] = str(RANDOM_STATE)
random.seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)
tf.set_random_seed(RANDOM_STATE)
session_conf = tf.ConfigProto(intra_op_parallelism_threads=1, inter_op_parallelism_threads=1)
sess = tf.Session(graph=tf.get_default_graph(), config=session_conf)
K.set_session(sess)
intra_op_parallelism_threads could also be set to a larger value.
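In TensorFlow 2.x, tf.ConfigProto and tf.Session are gone (outside tf.compat.v1); a sketch of equivalent settings, assuming TF 2.x, uses tf.config.threading instead. The thread counts have to be set before TensorFlow runs its first operation, otherwise TensorFlow raises a RuntimeError. Note that setting PYTHONHASHSEED from inside an already running interpreter does not change that interpreter's own hash seed; to fix it for the current process, it has to be set in the environment before Python (or the Jupyter kernel) starts.

```python
import os
import random

import numpy as np
import tensorflow as tf

RANDOM_STATE = 42

# Configure threading first, before TensorFlow executes any op.
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(1)

# Only affects Python processes started after this point (e.g. subprocesses).
os.environ["PYTHONHASHSEED"] = str(RANDOM_STATE)
random.seed(RANDOM_STATE)
np.random.seed(RANDOM_STATE)
tf.random.set_seed(RANDOM_STATE)
```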
