Keras Tuner Save Best Model

Keras Tuner Save Best Model - python

I am currently attempting to used keras tuner to create a model for my CNN, though I am having some issues with saving my model for future use.
As I am used to it, I could regularly just save my model with model.save(filename) to receive a .model file; however, when attempting this with code such as:
tuner = RandomSearch(
build_model,
objective = "val_accuracy",
max_trials = 5,
executions_per_trial = 1,
directory = LOG_DIR
)
tuner.search(x= x_train, y= y_train, epochs= 1, batch_size=64, validation_data= (x_test, y_test))
bestModels = tuner.get_best_models(num_models=1)
highestScoreModel= models[0]
highestScoreModel.fit(x=x_train, y=y_train, batch_size=64, epochs=5, verbose=1, validation_split=0.2)
highestScoreModel.save("Trained_Model")
I received a Trained_Model folder with no model within it, only the parameters. If anyone could assist me with saving the actual trained model I'd be very thankful.
================= Edit / Update ================
I have now found a way to obtain the trial_id by sifting the generated trial files. Though, when I run:
trial_id = getID()
tuner.save_model(trial_id=trial_id, model=highestScoreModel, step=0)
Nothing seems to happen, no save file appears. Again, I would be grateful for any help on this matter.

Try this
Tuner.save_model(trial_id, model, step=0)
where
trial_id is your model trial number
model is your trained model
and step is epochs numbers

Related

How to load the latest checkpoint and save it as a model in Tensorflow?

So, I was wondering how to load the latest checkpoint in Tensorflow having its path/directory and continue the training where I left off. And also how to load the latest checkpoint and save it as a complete model. Please help me
My code:
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
verbose=1,
save_weights_only=True,
save_freq=1*batch_size)
# Create a basic model instance
model = create_model(training, output)
model.save_weights(checkpoint_path.format(epoch=0))
# Create a TensorBoard callback (for metrics monitoring)
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="chatbot/training/logs", histogram_freq=1, update_freq= 1, profile_batch= 1)
# Train the model with the new callback
model.fit(training, output, epochs=500, batch_size = batch_size, validation_data=(training, output), callbacks=[cp_callback, tb_callback], verbose = 1)

The simplest solution to your problem would be to save the entire model with the ModelCheckpoint callback.
You only have to remove the save_weights_only argument for it to work.
cp_callback = tf.keras.callbacks.ModelCheckpoint(
filepath=checkpoint_path,
verbose=1,
save_freq=1*batch_size)
To load the checkpoint and continue training at a later point in time, just call
model = tf.keras.models.load_model(checkpoint_path)
If you want to load a checkpoint given you only saved the model weights, you have to first build your model and transfer your saved weights into it.
model.load_weights(checkpoint_path)
If you need further information about loading and saving models, I would recommend reading the documentation: https://www.tensorflow.org/guide/keras/save_and_serialize
This answer is referencing the answer of :
Save and load weights in keras

Call back not working in tensor flow to stop the training

I have written a call back which stops training when accuracy becomes 99%.But the problem is i get this error .Sometimes if i resolve this error the call back not get called even though acuurqacy becoms 100 %.
'>' not supported between instances of 'NoneType' and 'float'
class myCallback(tf.keras.callbacks.Callback):
def on_epoch_end(self, epoch, logs={}):
if(logs.get('accuracy') > 0.99):
self.model.stop_training = True
def train_mnist():
# Please write your code only where you are indicated.
# please do not remove # model fitting inline comments.
# YOUR CODE SHOULD START HERE
# YOUR CODE SHOULD END HERE
call = myCallback()
mnist = tf.keras.datasets.mnist
(x_train, y_train),(x_test, y_test) = mnist.load_data(path=path)
# YOUR CODE SHOULD START
x_train = x_train/255
y_train = y_train/255
# YOUR CODE SHOULD END HERE
model = tf.keras.models.Sequential([
# YOUR CODE SHOULD START HERE
keras.layers.Flatten(input_shape=(28,28)),
keras.layers.Dense(128,activation='relu'),
keras.layers.Dense(10,activation='softmax')
# YOUR CODE SHOULD END HERE
])
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
# model fitting
history = model.fit(# YOUR CODE SHOULD START HERE
x_train,y_train,epochs=9,callbacks=[call] )
# model fitting
return history.epoch, history.history['acc'][-1]

Two major problems with the above code:
Getting to 100% accuracy on training set almost always means that your model is overfitting. Thats BAD. What you want to do instead is specify the validation_split=.2 parameter in the .fit method, and look for a high accuracy on the validation set.
What you are trying to build in your custom callback is already done in keras.callbacks.EarlyStopping, it even has an option to restore to the best overall model over each epoch. And, by default, it is looking for a validation accuracy, not training accuracy, if you have a validation split.
So, here's what you should do:
Stop using custom callbacks, they take some mastery to get to work. Use EarlyStopping with restore_best instead. like this
Always use validation_split and look for high accuracy in validation set. Like in this quick example.
Did using built-in callbacks resolve your problem?

Running keras model on colab TPU

I am currently trying to train a convolutional neural network CNN using Keras and the Google Colab GPU.
I found this article that discussed the option to increase the training time that is needed to train the model. Since the current training on the GPU is very slow I tried to implement the method from the article. I have the following code:
sgd = optimizers.SGD(lr=0.02)
model.compile(optimizer=sgd,loss='categorical_crossentropy',metrics=['accuracy'])
def create_train_subsets():
X_train =[]
y_train = []
for i in range(80):
cat = i+1
path = 'train_set/by_cat/{}'.format(cat)
for img in os.listdir(path):
actual_image = Image.open(("train_set/by_cat/{}/{}".format(cat,img)))
X_train.append(actual_image)
y_train.append(cat)
return X_train, y_train
# This address identifies the TPU we'll use when configuring TensorFlow.
x_train, y_train = create_train_subsets()
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
tf.logging.set_verbosity(tf.logging.INFO)
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
model,
strategy=tf.contrib.tpu.TPUDistributionStrategy(
tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
history = tpu_model.fit(x_train, y_train,
epochs=20,
batch_size=128 * 8,
validation_split=0.2)
tpu_model.save_weights('./tpu_model.h5', overwrite=True)
# tpu_model.evaluate(x_test, y_test, batch_size=128 * 8)
This code however gives back the following error:
InvalidArgumentError: No OpKernel was registered to support Op 'ConfigureDistributedTPU' used by node ConfigureDistributedTPU (defined at /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1748) with these attrs: [tpu_embedding_config="", is_global_init=false, embedding_config=""]
Registered devices: [CPU, XLA_CPU]
Registered kernels:
<no registered kernels>
[[ConfigureDistributedTPU]]
I did an extensive search online but I can't seem to find any indication on what it means. Also, I am not understanding the process enough to figure out the exact meaning of the error myself.
Therefore, is there anybody out there that can help me understand what is wrong and maybe also knows a solution on how to solve this.
Thank you in advance!

How to run several times a model in tensorflow?

I want to train several times a file ConvNet.py in order to produce some statistics about its training like precision, confussion matrices, etc. So, I tried (in google colab) to do something like
for k in range(10:
%run ConvNet.py
The first training goes well, but when in begin the second, arise a problem. It says that "weights variable already defined, disallowed" (weights is the fist variable that I define in ConvNet.py) and the script stops.
I tried clearing variables with os kill, but there is still problems. How can I fix this?

It's probably better if you do the iterated training in Python directly.
You haven't shared much about your setup, but you can do something like:
import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
def build_model():
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Flatten(input_shape=(28,28,)))
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="sigmoid"))
model.compile(optimizer=tf.train.AdamOptimizer(),
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
return model
for i in range(10):
model = build_model()
model.fit(x_train, y_train)
model.save_weights(f"./weights-{i}.hdf5")
And when you want to do the analysis:
for i in range(10):
model = build_model()
model.load_weights(f"./weights-{i}.hdf5")
do_analysis(model)

keras dosen't load the model and weights when using checkpoint

I'm using keras to build a deep autoencoder. I used its checkpointer to load the model and the weights but the result is always None which I think it means that the checkpoint dosen't work correctly and is not saving weights.
Here is the code how I proceed:
checkpointer = ModelCheckpoint(filepath="weights.best.h5",
verbose=0,
save_best_only=True)
tensorboard = TensorBoard(log_dir='/tmp/autoencoder',
histogram_freq=0,
write_graph=True,
write_images=True)
input_enc = Input(shape=(input_size,))
hidden_1 = Dense(hidden_size1, activation='relu')(input_enc)
hidden_11 = Dense(hidden_size2, activation='relu')(hidden_1)
code = Dense(code_size, activation='relu')(hidden_11)
hidden_22 = Dense(hidden_size2, activation='relu')(code)
hidden_2 = Dense(hidden_size1, activation='relu')(hidden_22)
output_enc = Dense(input_size, activation='tanh')(hidden_2)
autoencoder_yes = Model(input_enc, output_enc)
autoencoder_yes.compile(optimizer='adam',
loss='mean_squared_error',
metrics=['accuracy'])
history_yes = autoencoder_yes.fit(df_noyau_norm_y, df_noyau_norm_y,
epochs=200,
batch_size=batch_size,
shuffle = True,
validation_data=(df_test_norm_y, df_test_norm_y),
verbose=1,
callbacks=[checkpointer, tensorboard]).history
autoencoder_yes.save_weights("weights.best.h5")
print(autoencoder_yes.load_weights("weights.best.h5"))
Can somebody help me find out a way to resolve the problem?
Thanks

No, your interpretation of load_weights returning None is not correct. Load weights is a procedure, it does not return anything, and if you assign the return value of a procedure to a variable, it will get the value of None.
So weight saving is probably working fine, its just your interpretation that is wrong.

you should use save_weights_only=True. Without this the whole model is saved not just the weights. To be able to load weights you must save weights like this:
checkpointer = ModelCheckpoint(filepath="weights.best.h5",
verbose=0, save_weights_only=True,
save_best_only=True)

This is expected behavior not an error. The autoencoder_yes.load_weights("weights.best.h5") doesn't actually return anything, so if you try to print the output of this function you will get None as output.
Expected behavior
In the code that you have provided, you have trained the model and saved the weights. So, the autoencoder_yes is a keras.Model object that has the fine-tuned weights.
In the same script if you load the saved weights once again, nothing is supposed to happen, the weights that you saved will get loaded again.
For clarity
Start with another fresh script, build the same model architecture and reload the weights from the h5 file and then do some predictions. In that case it will silently load the pre-trained weights and do the predictions according to that.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Keras Tuner Save Best Model - python

Try this Tuner.save_model(trial_id, model, step=0) where trial_id is your model trial number model is your trained model and step is epochs numbers

Related

How to load the latest checkpoint and save it as a model in Tensorflow?

Call back not working in tensor flow to stop the training

Running keras model on colab TPU

How to run several times a model in tensorflow?

keras dosen't load the model and weights when using checkpoint

Categories

Resources