Training fails if the model is saved beforehand (Python)

I noticed that saving a TensorFlow model prior to training causes the training to perform poorly. Obviously the solution is just to save the model later, but I'm curious why this happens in the first place.
The following code runs fine and produces the output shown below:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
model = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(10)
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
### model.save('my_model') ###
model.fit(x_train, y_train, epochs=3, verbose=1)
Epoch 1/3
1875/1875 [==============================] - 3s 1ms/step - loss: 0.4513 - accuracy: 0.8688
Epoch 2/3
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2326 - accuracy: 0.9333
Epoch 3/3
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1974 - accuracy: 0.9432
However, when the second-to-last line (the model.save call) is uncommented, the training fails badly: the loss still decreases, but the reported accuracy is stuck near random chance:
INFO:tensorflow:Assets written to: my_model\assets
Epoch 1/3
1875/1875 [==============================] - 3s 1ms/step - loss: 0.4156 - accuracy: 0.0948
Epoch 2/3
1875/1875 [==============================] - 2s 1ms/step - loss: 0.2149 - accuracy: 0.1000
Epoch 3/3
1875/1875 [==============================] - 2s 1ms/step - loss: 0.1840 - accuracy: 0.0998

This is a known TensorFlow bug. To avoid it, use metrics=['sparse_categorical_accuracy'] instead of metrics=['accuracy'] when compiling the model.
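As a minimal sketch of the workaround (the only change to the code above is the metrics argument):
# Name the metric explicitly so that saving the model before training
# does not make the 'accuracy' alias resolve to the wrong metric.
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['sparse_categorical_accuracy'])
model.save('my_model')  # saving first is now safe
model.fit(x_train, y_train, epochs=3, verbose=1)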

Related

Retrain a Keras model

I would like to retrain a tensorflow.keras model, so I compared two training schedules:
Code 1:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
model = keras.models.Sequential([keras.layers.Dense(1, input_shape=[8])])
model.compile(loss="mse", optimizer=optimizer)
model.fit(X_train_scaled, y_train, epochs=3)
model.fit(X_train_scaled, y_train, epochs=3)
Code 2:
keras.backend.clear_session()
np.random.seed(42)
tf.random.set_seed(42)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-2)
model = keras.models.Sequential([keras.layers.Dense(1, input_shape=[8])])
model.compile(loss="mse", optimizer=optimizer)
model.fit(X_train_scaled, y_train, epochs=6)
In code 1, I get:
Epoch 1/3
363/363 [==============================] - 0s 330us/step - loss: 1.8888
Epoch 2/3
363/363 [==============================] - 0s 342us/step - loss: 0.5772
Epoch 3/3
363/363 [==============================] - 0s 397us/step - loss: 0.5508
Epoch 1/3
363/363 [==============================] - 0s 470us/step - loss: 0.5428
Epoch 2/3
363/363 [==============================] - 0s 486us/step - loss: 0.5338
Epoch 3/3
363/363 [==============================] - 0s 479us/step - loss: 0.5519
In code 2, I get:
Epoch 1/6
363/363 [==============================] - 0s 332us/step - loss: 1.8888
Epoch 2/6
363/363 [==============================] - 0s 322us/step - loss: 0.5772
Epoch 3/6
363/363 [==============================] - 0s 333us/step - loss: 0.5508
Epoch 4/6
363/363 [==============================] - 0s 331us/step - loss: 0.5413
Epoch 5/6
363/363 [==============================] - 0s 371us/step - loss: 0.5440
Epoch 6/6
363/363 [==============================] - 0s 356us/step - loss: 0.5318
If you want to check this on your own computer, here is the dataset I use:
import tensorflow as tf
from tensorflow import keras
import numpy as np
import os
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
housing = fetch_california_housing()
X_train_full, X_test, y_train_full, y_test = train_test_split(
    housing.data, housing.target.reshape(-1, 1), random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X_train_full, y_train_full, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_valid_scaled = scaler.transform(X_valid)
X_test_scaled = scaler.transform(X_test)
I compared two ways of training the model:
the first runs 3 epochs, then 3 more epochs
the second runs 6 epochs in a single call
I use TensorFlow 2.2.2 (2.3 gives the same results). The losses diverge from epoch 4 onward, and I don't understand why the two schedules don't produce the same result.
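One plausible source of the divergence (an assumption, not something confirmed here) is that model.fit shuffles the training data before each epoch by default, so two separate fit calls may consume the random number generator differently than one six-epoch call. A minimal sketch to test that hypothesis:
# Run each schedule from a fresh model with the same seeds, but with
# shuffling disabled. If the losses then match epoch for epoch, the
# difference came from the shuffle order, not from optimizer state
# being reset between fit calls.
model.fit(X_train_scaled, y_train, epochs=3, shuffle=False)
model.fit(X_train_scaled, y_train, epochs=3, shuffle=False)
# ...versus (after rebuilding the model and resetting the seeds)...
model.fit(X_train_scaled, y_train, epochs=6, shuffle=False)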

How to split model.fit to continue training over multiple days

The TensorFlow model is trained with the following code:
model.fit(train_dataset,
          steps_per_epoch=10000,
          validation_data=test_dataset,
          epochs=20000)
Here steps_per_epoch is 10000 and epochs is 20000.
Is it possible to split the training time across multiple days:
day 1:
model.fit(..., steps_per_epoch=10000, ..., epochs=10, ....)
model.fit(..., steps_per_epoch=10000, ..., epochs=20, ....)
model.fit(..., steps_per_epoch=10000, ..., epochs=30, ....)
day 2:
model.fit(..., steps_per_epoch=10000, ..., epochs=100, ....)
day 3:
model.fit(..., steps_per_epoch=10000, ..., epochs=5, ....)
day (n):
model.fit(..., steps_per_epoch=10000, ..., epochs=n, ....)
The expected total number of epochs is:
20000 = (day1 + day2 + day3 + ... + dayn)
Can I simply stop model.fit and start it again on another day?
Is that the same as running once with epochs=20000?
You can save your model at the end of each day as a pickle file, then load it the next day and continue training.
Training the model on day_1:
import tensorflow_datasets as tfds
import tensorflow as tf
import joblib
train, test = tfds.load(
    'fashion_mnist',
    shuffle_files=True,
    as_supervised=True,
    split=['train', 'test']
)
train = train.repeat(15).batch(64).prefetch(tf.data.AUTOTUNE)
test = test.batch(64).prefetch(tf.data.AUTOTUNE)
model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(28, 28, 1)))
model.add(tf.keras.layers.Conv2D(128, (3,3), activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(512, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=.4))
model.add(tf.keras.layers.Dense(10, activation='sigmoid'))
model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
              optimizer='adam', metrics=['accuracy'])
model.summary()
model.fit(train, batch_size=256, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_1.pkl')
Output after day_1:
Epoch 1/3
150/150 [==============================] - 7s 17ms/step - loss: 23.0504 - accuracy: 0.5786
Epoch 2/3
150/150 [==============================] - 2s 16ms/step - loss: 0.9366 - accuracy: 0.7208
Epoch 3/3
150/150 [==============================] - 3s 17ms/step - loss: 0.7321 - accuracy: 0.7682
157/157 [==============================] - 1s 8ms/step - loss: 0.4627 - accuracy: 0.8405
INFO:tensorflow:Assets written to: ram://***/assets
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_1.pkl']
Load model in day_2 and continue training:
model = joblib.load("/content/model_day_1.pkl")
model.fit(train, batch_size=256, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_2.pkl')
Output after day_2:
Epoch 1/3
150/150 [==============================] - 3s 17ms/step - loss: 0.6288 - accuracy: 0.7981
Epoch 2/3
150/150 [==============================] - 2s 16ms/step - loss: 0.5290 - accuracy: 0.8222
Epoch 3/3
150/150 [==============================] - 2s 16ms/step - loss: 0.5124 - accuracy: 0.8272
157/157 [==============================] - 1s 5ms/step - loss: 0.4131 - accuracy: 0.8598
INFO:tensorflow:Assets written to: ram://***/assets
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_2.pkl']
Load model in day_3 and continue training:
model = joblib.load("/content/model_day_2.pkl")
model.fit(train, batch_size=256, steps_per_epoch=150, epochs=3, verbose=1)
model.evaluate(test, verbose=1)
joblib.dump(model, 'model_day_3.pkl')
Output after day_3:
Epoch 1/3
150/150 [==============================] - 3s 17ms/step - loss: 0.4579 - accuracy: 0.8498
Epoch 2/3
150/150 [==============================] - 2s 17ms/step - loss: 0.4078 - accuracy: 0.8589
Epoch 3/3
150/150 [==============================] - 2s 16ms/step - loss: 0.4073 - accuracy: 0.8560
157/157 [==============================] - 1s 5ms/step - loss: 0.3997 - accuracy: 0.8603
INFO:tensorflow:Assets written to: ram://***/assets
INFO:tensorflow:Assets written to: ram://***/assets
['model_day_3.pkl']
I think you're asking whether multiple calls to model.fit will continue training the model (instead of starting from scratch). The answer is yes, it will. However, a new History object is generated by each model.fit call, so if you are capturing those, you may want to handle them separately.
So running
model.fit(..., epochs=10)
model.fit(..., epochs=10)
will train the model for 20 epochs in total.
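If you prefer Keras's native format over pickling, model.save also stores the optimizer state, so training resumes where it left off. A minimal sketch (the path name is just an example):
# End of a day: save the full model, including optimizer state.
model.save('model_day_n')
# Next day: reload and continue training from where it stopped.
model = tf.keras.models.load_model('model_day_n')
model.fit(train_dataset, steps_per_epoch=10000, epochs=10)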

Learn a new set of data from an existing model to enhance it (tensorflow - keras - callbacks)

I trained a model on a dataset, and everything is OK. Sometimes changes occur, and I get some new data. I'd like to "continue" the training of my existing model on the new set of data, without starting from scratch again.
Here is a simple example to show the problem and where I'm stuck.
import tensorflow as tf
mnist = tf.keras.datasets.mnist
# get the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
# split in 2 parts to get 2 sets of data
x_train1 = x_train[:5000]
x_train2 = x_train[5000:10000]
y_test1 = y_test[:5000]
y_test2 = y_test[5000:]
# set the model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Compile the model
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
# set the callback
checkpoint_path = "CHECKPOINTS/cp.ckpt"
cp_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path, monitor='accuracy', mode="max",
    save_best_only=False, save_weights_only=False,
    save_freq="epoch", verbose=0)
Then I fit my first set of data:
# fit the 1st dataset
model.fit(x_train1, y_test1, epochs=5, callbacks=[cp_callback], verbose=1)
Output:
Epoch 1/5
157/157 [==============================] - 0s 897us/step - loss: 2.3518 - accuracy: 0.1075
Epoch 2/5
157/157 [==============================] - 0s 752us/step - loss: 2.2833 - accuracy: 0.1438
Epoch 3/5
157/157 [==============================] - 0s 731us/step - loss: 2.2656 - accuracy: 0.1564
Epoch 4/5
157/157 [==============================] - 0s 755us/step - loss: 2.2388 - accuracy: 0.1719
Epoch 5/5
157/157 [==============================] - 0s 759us/step - loss: 2.2117 - accuracy: 0.1901
Then the 2nd one:
# fit the 2nd dataset
model.fit(x_train2, y_test2, epochs=5, callbacks=[cp_callback], verbose=1)
Output:
Epoch 1/5
157/157 [==============================] - 0s 943us/step - loss: 2.3240 - accuracy: 0.0964
Epoch 2/5
157/157 [==============================] - 0s 778us/step - loss: 2.2881 - accuracy: 0.1238
Epoch 3/5
157/157 [==============================] - 0s 805us/step - loss: 2.2688 - accuracy: 0.1514
Epoch 4/5
157/157 [==============================] - 0s 814us/step - loss: 2.2498 - accuracy: 0.1496
Epoch 5/5
157/157 [==============================] - 0s 1ms/step - loss: 2.2289 - accuracy: 0.1704
As you can see, the training starts from zero again instead of keeping what was learned before.
How can I do that?
Thanks in advance.
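Note that within a single session the second model.fit call above does continue from the current weights. Reloading matters when you resume in a new session; a minimal sketch, assuming the ModelCheckpoint above wrote a full model (save_weights_only=False) to checkpoint_path:
# In a fresh session: reload the checkpointed model (architecture,
# weights and optimizer state) and keep training on the new data.
model = tf.keras.models.load_model(checkpoint_path)
model.fit(x_train2, y_test2, epochs=5, callbacks=[cp_callback], verbose=1)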

TensorFlow stuck at the same accuracy in Python

I was just following a TensorFlow example from the book Hands-On Machine Learning with Scikit-Learn and TensorFlow but got weird results.
The example:
import tensorflow as tf
from tensorflow import keras
tf.__version__
keras.__version__
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000] / 255.0, y_train_full[5000:] / 255.0
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(300, activation="relu"),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer='sgd',
              metrics=['accuracy'])
history = model.fit(X_train, y_train, epochs=50, validation_data=(X_valid, y_valid))
As the epochs progress we should see accuracy improve, as indicated in the book:
Train on 55000 samples, validate on 5000 samples
Epoch 1/30
55000/55000 [==========] - 3s 55us/sample - loss: 1.4948 - acc: 0.5757 - val_loss: 1.0042 - val_acc: 0.7166
Epoch 2/30
55000/55000 [==========] - 3s 55us/sample - loss: 0.8690 - acc: 0.7318 - val_loss: 0.7549 - val_acc: 0.7616
[...]
Epoch 50/50
55000/55000 [==========] - 4s 72us/sample - loss: 0.3607 - acc: 0.8752 - val_loss: 0.3706 - val_acc: 0.8728
But when I ran it, I got the following:
Epoch 1/30
1719/1719 [==============================] - 3s 2ms/step - loss: 0.0623 - accuracy: 0.1005 - val_loss: 0.0011 - val_accuracy: 0.0914
Epoch 2/30
1719/1719 [==============================] - 3s 2ms/step - loss: 8.7637e-04 - accuracy: 0.1011 - val_loss: 5.2079e-04 - val_accuracy: 0.0914
Epoch 3/30
1719/1719 [==============================] - 3s 2ms/step - loss: 4.9200e-04 - accuracy: 0.1019 - val_loss: 3.4211e-04 - val_accuracy: 0.0914
[...]
Epoch 49/50
1719/1719 [==============================] - 3s 2ms/step - loss: 3.1710e-05 - accuracy: 0.0992 - val_loss: 3.2966e-05 - val_accuracy: 0.0914
Epoch 50/50
1719/1719 [==============================] - 3s 2ms/step - loss: 2.7711e-05 - accuracy: 0.1022 - val_loss: 3.1833e-05 - val_accuracy: 0.0914
So, as you can see, my run reaches a much lower accuracy that never improves: it stays at 0.0914 instead of reaching 0.8728.
Is there something wrong with my TensorFlow installation, setup, or even the code?
You cannot divide the labels y, as in y_valid, y_train = y_train_full[:5000] / 255.0, y_train_full[5000:] / 255.0; only the pixel inputs should be rescaled. The complete code follows:
import tensorflow as tf
from tensorflow import keras
tf.__version__
keras.__version__
fashion_mnist = keras.datasets.fashion_mnist
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist.load_data()
X_train_full = X_train_full / 255.0
X_test = X_test / 255.0
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])
model.compile(loss="sparse_categorical_crossentropy",
              optimizer='sgd',
              metrics=['accuracy'])
history = model.fit(X_train_full, y_train_full, epochs=5, validation_data=(X_test, y_test))
It will give accuracy like:
Epoch 1/5
1875/1875 [==============================] - 2s 1ms/step - loss: 0.9880 - accuracy: 0.6923 - val_loss: 0.5710 - val_accuracy: 0.8054
Epoch 2/5
1875/1875 [==============================] - 2s 944us/step - loss: 0.5281 - accuracy: 0.8227 - val_loss: 0.5112 - val_accuracy: 0.8228
Epoch 3/5
1875/1875 [==============================] - 2s 913us/step - loss: 0.4720 - accuracy: 0.8391 - val_loss: 0.4782 - val_accuracy: 0.8345
Epoch 4/5
1875/1875 [==============================] - 2s 915us/step - loss: 0.4492 - accuracy: 0.8462 - val_loss: 0.4568 - val_accuracy: 0.8410
Epoch 5/5
1875/1875 [==============================] - 2s 935us/step - loss: 0.4212 - accuracy: 0.8550 - val_loss: 0.4469 - val_accuracy: 0.8444
Also, the adam optimizer may give better results than sgd.
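If you want to try that, the only change is the optimizer argument (a quick sketch; results will vary):
# Same model and loss, with Adam instead of SGD.
model.compile(loss="sparse_categorical_crossentropy",
              optimizer='adam',
              metrics=['accuracy'])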

Keras 2.0.8 only performs 1 epoch with Python 3.x, 10 with 2.x

If I switch this to Python 2.x, it performs 10 epochs. Why is that?
Training a logistic regression model:
import keras.backend as K
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD
from sklearn.model_selection import train_test_split, cross_val_score
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
# NOTE: If I run this in Python 3.x, it only performs 1 Epoch
K.clear_session()
model = Sequential()
model.add(Dense(1, input_shape=(4,), activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])
# Save the result of the fit to display the history as a data frame
# and see how the model does
history = model.fit(X_train, y_train)
result = model.evaluate(X_test, y_test)
Output:
Epoch 1/10
960/960 [==============================] - 0s - loss: 0.7943 - acc: 0.5219
Epoch 2/10
960/960 [==============================] - 0s - loss: 0.7338 - acc: 0.5469
Epoch 3/10
960/960 [==============================] - 0s - loss: 0.6847 - acc: 0.5688
Epoch 4/10
960/960 [==============================] - 0s - loss: 0.6446 - acc: 0.6177
Epoch 5/10
960/960 [==============================] - 0s - loss: 0.6113 - acc: 0.6719
Epoch 6/10
960/960 [==============================] - 0s - loss: 0.5832 - acc: 0.7000
Epoch 7/10
960/960 [==============================] - 0s - loss: 0.5591 - acc: 0.7177
Epoch 8/10
960/960 [==============================] - 0s - loss: 0.5381 - acc: 0.7365
Epoch 9/10
960/960 [==============================] - 0s - loss: 0.5196 - acc: 0.7542
Epoch 10/10
960/960 [==============================] - 0s - loss: 0.5031 - acc: 0.7688
32/412 [=>............................] - ETA: 0s
The fit function has a parameter epochs with a default value of 1:
fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None,
    validation_split=0.0, validation_data=None, shuffle=True, class_weight=None,
    sample_weight=None, initial_epoch=0, steps_per_epoch=None,
    validation_steps=None)
However, the default used to be 10; see the changes to fit in models.py in this commit, for example. You most likely have an older version of Keras installed with Python 2.
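To get the same behavior on any version (a small sketch), pass epochs explicitly instead of relying on the default:
# An explicit epochs argument avoids depending on the version's default.
history = model.fit(X_train, y_train, epochs=10)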
