I am playing with the mnist_vae example and can't figure out how to properly save/load weights of the trained model.
enc_init_rng, dec_init_rng = random.split(random.PRNGKey(2))
_, init_encoder_params = encoder_init(enc_init_rng, (batch_size, 28 * 28))
_, init_decoder_params = decoder_init(dec_init_rng, (batch_size, 10))
init_params = init_encoder_params, init_decoder_params
opt_init, opt_update, get_params = optimizers.momentum(step_size, mass=0.9)
opt_state = opt_init(init_params)
After that, I train the model using opt_update and want to save it. However, I haven't found any function to save the optimizer state to disk.
I tried to save the parameters and initialize opt_state from them, but not all of the information is preserved, and the resulting opt_state_1 is not the original opt_state.
weights=get_params(opt_state)
jnp.save(file, weights)
weights = jnp.load(file,allow_pickle=True)
opt_state_1 = opt_init(weights)
How do I properly save the model I trained?
You can serialize the full optimizer state (including things like momentum) by converting it into a picklable structure first:
import os
import pickle
from jax.experimental import optimizers

# convert the optimizer state into a picklable pytree and write it to disk
trained_params = optimizers.unpack_optimizer_state(opt_state)
pickle.dump(trained_params, open(os.path.join(config["ckpt_path"], "best_ckpt.pkl"), "wb"))

# later: load it back and rebuild the optimizer state
best_params = pickle.load(open(os.path.join(config["ckpt_path"], "best_ckpt.pkl"), "rb"))
best_opt_state = optimizers.pack_optimizer_state(best_params)
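If you then want to resume training from the restored state, a minimal sketch (assuming the opt_update/get_params from the question, and a loss_fn like the example's ELBO loss, which is not defined here) could look like this:

from jax import grad, jit

opt_state = best_opt_state  # restored above via pack_optimizer_state

@jit
def update(i, opt_state, batch):
    params = get_params(opt_state)
    grads = grad(loss_fn)(params, batch)  # loss_fn: the example's ELBO-style loss (assumption)
    return opt_update(i, grads, opt_state)

# opt_state = update(i, opt_state, batch)  # continue the training loop as before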
Related
I recently switched from Tensorflow 1.14 and the Estimator API to Tensorflow 2.0 and the Keras API. I am working on an image segmentation problem, so the inputs/outputs/labels are all images. When I used Estimator, things were pretty straightforward. In model_fn, where the arguments were (features, labels, mode, params), I could just pick the features and labels, do the necessary processing, and then pass them to tf.summary.image(), and everything worked like a charm. Now, using the Keras API, although it provides greater ease of use, it makes it hard to do simple handling of data during training, which becomes even harder when it is used with the Dataset API. Example:
Tensorflow 1.14/Estimator:
def model_fn(features, labels, mode, params):
loss, train_op, = None, None
eval_metric_ops, training_hooks, evaluation_hooks = None, None, None
output = model(input=features)
predictions = tf.argmax(output, axis=-1)
predictions_dict = {'predicted': predictions}
dice_score = tf.contrib.metrics.f1_score(labels=labels, predictions=output[:, :, :, 1])
if mode in (estimator.ModeKeys.TRAIN, estimator.ModeKeys.EVAL):
global_step = tf.train.get_or_create_global_step()
learning_rate = tf.train.exponential_decay(params['lr'], global_step=global_step,
decay_steps=params['decay_steps'],
decay_rate=params['decay_rate'], staircase=False)
loss = loss_fn(outputs=predictions, labels=labels)
summary.image('Input_Image', features)
summary.image('Label', tf.expand_dims(tf.cast(labels, dtype=tf.float32), axis=-1))
summary.image('Prediction', tf.expand_dims(tf.cast(predictions, dtype=tf.float32), axis=-1))
if mode == estimator.ModeKeys.TRAIN:
with tf.name_scope('Metrics'):
summary.scalar('Dice_Coefficient', dice_score[1])
summary.scalar('Learning_Rate', learning_rate)
summary.merge_all()
train_logs_hook = tf.estimator.LoggingTensorHook({'Dice_Coefficient': dice_score[1]}, every_n_iter=params['train_log_every_n_steps'])
training_hooks = [train_logs_hook]
train_op = Adam(learning_rate=learning_rate, epsilon=params['epsilon']).minimize(loss=loss, global_step=global_step)
if mode == estimator.ModeKeys.EVAL:
eval_metric_ops = {'Metrics/Dice_Coefficient': dice_score}
eval_summary_hook = tf.estimator.SummarySaverHook(output_dir=params['eval_metrics_path'],
summary_op=summary.merge_all(),
save_steps=params['eval_steps_per_summary_save'])
evaluation_hooks = [eval_summary_hook]
return estimator.EstimatorSpec(mode,
predictions=predictions_dict,
loss=loss,
train_op=train_op,
eval_metric_ops=eval_metric_ops,
training_hooks=training_hooks,
evaluation_hooks=evaluation_hooks)
Using Keras with Tensorflow 2.0, AFAIK I can't get this kind of access to the input/output tensors during training or evaluation (notice that even though the estimator doesn't produce the image summaries during evaluation, you can still preview the results by using a tf.estimator.SummarySaverHook). Below is my failed attempt:
def train_data(params): # Similar is the eval_data
def standardization_summaries(image, label, step, writer):
# Some processing to images
with writer.as_default():
tf.summary.image('Input_dataset', image, step=step, max_outputs=1)
tf.summary.image('label_dataset', label, step=step, max_outputs=1)
return image, label
data_set = tf.data.Dataset.from_generator(generator=lambda: data_generator(params),
output_types=(tf.float32, tf.int64),
output_shapes=(tf.TensorShape([None, None]), tf.TensorShape([None, None])))
data_set = data_set.map(lambda x, y: standardization_summaries(image=x, label=y, step=params['global_step'], writer=params['writer']))
data_set = data_set.batch(params['batch_size'])
data_set = data_set.prefetch(buffer_size=-1)
return data_set
model = tf.keras.models.load_model(saved_model)
summary_writer = tf.summary.create_file_writer(save_model_path)
step = tf.Variable(0, trainable=False, dtype=tf.int64)
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=save_model_path, histogram_freq=1, write_graph=True,
write_images=False)
early_stop = tf.keras.callbacks.EarlyStopping(patience=args.early_stop)
callbacks = [tensorboard, early_stop]
params = {'batch_size': args.batch_size,
'global_step': step,
'writer': summary_writer}
model.fit(x=train_data(params), epochs=args.epochs, initial_epoch=args.initial_epoch,
validation_data=val_data(params), steps_per_epoch=2, callbacks=callbacks)
Getting the input images from the dataset API came from here, but this just dumps tons of images whenever the dataset fetches data from the generator. Also, with the step variable being constant and never changing (I can't figure out how to make it increment), everything ends up under step 0, and I can't think of any viable way to connect these outputs with the predicted output, even if I did find a way to print them.
So, the question is: is there anything I am still missing about how the Keras API and Tensorboard work together for image summaries? Is there a way to save image summaries, let's say, every half epoch during training and once at the end of evaluation, or should I just let the model train, get the training outputs through model.predict() at the end, and then inspect whether something went wrong (which is not efficient)?
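Not from the original thread, but one hedged workaround is a custom Keras callback that writes the image summaries itself once per epoch. The sketch below assumes you keep a small fixed batch around (sample_images and sample_labels are hypothetical names) and that the labels are already shaped [batch, height, width, 1]:

import tensorflow as tf

class ImageSummaryCallback(tf.keras.callbacks.Callback):
    # Writes input / label / prediction image summaries at the end of each epoch.
    def __init__(self, log_dir, sample_images, sample_labels):
        super().__init__()
        self.writer = tf.summary.create_file_writer(log_dir)
        self.sample_images = sample_images
        self.sample_labels = sample_labels

    def on_epoch_end(self, epoch, logs=None):
        preds = self.model.predict(self.sample_images)
        pred_mask = tf.cast(tf.argmax(preds, axis=-1), tf.float32)[..., tf.newaxis]
        with self.writer.as_default():
            tf.summary.image('Input_Image', self.sample_images, step=epoch, max_outputs=1)
            tf.summary.image('Label', self.sample_labels, step=epoch, max_outputs=1)
            tf.summary.image('Prediction', pred_mask, step=epoch, max_outputs=1)
        self.writer.flush()

# callbacks = [tensorboard, early_stop, ImageSummaryCallback(save_model_path, sample_images, sample_labels)]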
I am using the tf.keras API to build my CNN model, together with the tf.data Dataset API to create an input pipeline for it. The mnist dataset from tf.keras.datasets is used for testing and is prepared in memory by executing this code:
(train_images,train_labels),(test_images,test_labels) = tf.keras.datasets.mnist.load_data()
and also some preprocessing to be compatible with my keras model:
Train_images = np.expand_dims(train_images,3).astype('float')/255.0
Test_images = np.expand_dims(test_images,3).astype('float')/255.0
Train_labels = tf.keras.utils.to_categorical(train_labels)
Test_labels = tf.keras.utils.to_categorical(test_labels)
These data are stored in memory as arrays, and there are two options for creating a Dataset object. The first one is simply using tf.data.Dataset.from_tensor_slices:
image = tf.data.Dataset.from_tensor_slices((Train_images,Train_labels))
And pass the resulting object to model.fit():
model.fit(x=image,steps_per_epoch=1000)
OR pass in this dataset's iterator instead:
iterator = image.make_one_shot_iterator()
model.fit(x=iterator,steps_per_epoch=1000)
Both of these options work fine, since the dataset named image here is created from data that is already in memory. However, according to the Importing Data guide here, we may want to avoid doing this because it copies the data several times and takes up memory. So another option is to create such a dataset object based on tf.placeholder together with an initializable iterator:
X = tf.placeholder(tf.float32,shape = [60000,28,28,1])
Y = tf.placeholder(tf.float32,shape = [60000,10])
image2 = tf.data.Dataset.from_tensor_slices((X,Y))
iterator2 = image2.make_initializable_iterator()
with tf.Session() as sess:
sess.run(iterator2.initializer, feed_dict={X: Train_images, Y: Train_labels})
sess.run(iterator2.get_next())
This kind of iterator works fine with tf.Session() when fed the in-memory data, and it avoids the multiple copies of the data. But I cannot find a way to make it work with keras.model.fit(), since you cannot really call iterator.initializer or feed any data there. Is there a way to use this kind of iterator?
I don't think Keras officially supports passing initializable iterators; as you noted, there is no place to provide the placeholder-to-value mappings.
However a workaround is possible using keras callbacks:
import tensorflow as tf
import numpy as np
import pandas as pd
# Make sure only tensorflow.keras is imported, don't mix with keras
from tensorflow.keras import layers
import tensorflow.keras.backend as K
# example data
x_values = np.random.randn(200, 100).astype(np.float32)
y_labels = np.random.randint(low=0, high=9, size=200)
graph = tf.Graph()
with graph.as_default():
# make datasets from placeholders as in https://www.tensorflow.org/guide/datasets#reading_input_data
# X:
features_placeholder = tf.placeholder(tf.float32, x_values.shape, name='features')
dataset_x = tf.data.Dataset.from_tensor_slices({'x': features_placeholder})
# Y:
labels_placeholder = tf.placeholder(tf.float32, [None], name='labels')
dataset_y = tf.data.Dataset.from_tensor_slices({'y': labels_placeholder})
# compose datasets to make X-Y pairs for training
dataset0 = tf.data.Dataset.zip((dataset_x, dataset_y))
dataset0 = dataset0.batch(16).repeat()
# build model with keras
inputs = tf.keras.Input(name='x', shape=(x_values.shape[1],))
mlp1 = layers.Dense(16, name='mlp-1', activation='relu')
mlp1_out = mlp1(inputs)
output = layers.Dense(1, name='y', activation='linear')
output_out = output(mlp1_out)
model = tf.keras.Model(inputs=inputs, outputs=output_out)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001), loss='mse', metrics=['mse'])
iterator = dataset0.make_initializable_iterator()
feed_dict = { labels_placeholder: y_labels, features_placeholder: x_values }
class InitIteratorCallback(tf.keras.callbacks.Callback):
"""
Ensures that placeholders in dataset are initialized before each epoch begins
"""
def on_epoch_begin(self, epoch, logs=None):
sess = K.get_session()
sess.run(iterator.initializer, feed_dict=feed_dict)
model.fit(iterator, callbacks=[InitIteratorCallback()],
epochs=10, steps_per_epoch=300)
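As a side note (an assumption about newer TensorFlow versions, not part of the original answer): in TF 2.x with eager execution the placeholder/initializable-iterator machinery is gone, and a dataset built from the same arrays can be passed straight to fit:

# TF 2.x sketch: build the dataset directly from the arrays and hand it to fit
dataset = (tf.data.Dataset
           .from_tensor_slices(({'x': x_values}, y_labels.astype(np.float32)))
           .batch(16)
           .repeat())
model.fit(dataset, epochs=10, steps_per_epoch=300)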
I would like to re-train a pre-trained ResNet-50 model with TensorFlow slim, and use it later for classification purposes.
The ResNet-50 is designed for 1000 classes, but I would like just 10 classes (land cover types) as output.
First, I am trying to code it for only one image, which I can generalize later.
So this is my code:
from tensorflow.contrib.slim.nets import resnet_v1
import tensorflow as tf
import tensorflow.contrib.slim as slim
import numpy as np
batch_size = 1
height, width, channels = 224, 224, 3
# Create graph
inputs = tf.placeholder(tf.float32, shape=[batch_size, height, width, channels])
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
logits, end_points = resnet_v1.resnet_v1_50(inputs, is_training=False)
saver = tf.train.Saver()
with tf.Session() as sess:
saver.restore(sess, 'd:/bitbucket/cnn-lcm/data/ckpt/resnet_v1_50.ckpt')
representation_tensor = sess.graph.get_tensor_by_name('resnet_v1_50/pool5:0')
# list of files to read
filename_queue = tf.train.string_input_producer(['d:/bitbucket/cnn-lcm/data/train/AnnualCrop/AnnualCrop_735.jpg'])
reader = tf.WholeFileReader()
key, value = reader.read(filename_queue)
img = tf.image.decode_jpeg(value, channels=3)
im = np.array(img)
im = im.reshape(1,224,224,3)
predict_values, logit_values = sess.run([end_points, logits], feed_dict= {inputs: im})
print (np.max(predict_values), np.max(logit_values))
print (np.argmax(predict_values), np.argmax(logit_values))
#img = ... #load image here with size [1, 224,224, 3]
#features = sess.run(representation_tensor, {'Placeholder:0': img})
I am a bit confused about what comes next (should I open a graph, or should I load the structure of the network and then load the weights, or load batches?). There is a problem with the image shape as well. There is a lot of documentation out there, and it isn't easy to interpret :/
Any advice on how to correct the code so that it fits my purposes?
The test image: AnnualCrop735
The resnet layer gives you predictions if you provide the num_classes kwarg. Look at the documentation and code for resnet_v1.
You need to add a loss function and training operations on top of it to fine-tune resnet_v1, building the network with reuse:
...
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
logits, end_points = resnet_v1.resnet_v1_50(
inputs,
num_classes=10,
is_training=True,
reuse=tf.AUTO_REUSE)
...
...
classification_loss = slim.losses.softmax_cross_entropy(
predict_values, im_label)
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())
total_loss = classification_loss + regularization_loss
optimizer = tf.train.GradientDescentOptimizer(learning_rate)
train_op = slim.learning.create_train_op(total_loss, optimizer)
slim.learning.train(
train_op,
logdir='/tmp/',
number_of_steps=1000,
save_summaries_secs=300,
save_interval_secs=600)
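One piece the sketch above leaves out is restoring the pre-trained weights into the modified network: since the checkpoint was trained for 1000 classes and the new head has 10, you would typically restore everything except the logits scope. A hedged sketch (the scope name 'resnet_v1_50/logits' and the init_fn wiring are assumptions to verify against your checkpoint and slim version):

# restore all pre-trained variables except the new 10-class logits layer
variables_to_restore = slim.get_variables_to_restore(exclude=['resnet_v1_50/logits'])
init_fn = slim.assign_from_checkpoint_fn(
    'd:/bitbucket/cnn-lcm/data/ckpt/resnet_v1_50.ckpt', variables_to_restore)

slim.learning.train(
    train_op,
    logdir='/tmp/',
    init_fn=init_fn,  # loads the pre-trained weights before training starts
    number_of_steps=1000,
    save_summaries_secs=300,
    save_interval_secs=600)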
I have downloaded a tensorflow checkpoint model named inception_resnet_v2_2016_08_30.ckpt.
Do I need to create a graph (with all the variables) that were used when this checkpoint was created?
How do I make use of this model?
First of all, you have to get the network architecture in memory. You can get the network architecture from here.
Once you have that file, use the following approach to load the model:
import tensorflow as tf
import tensorflow.contrib.slim as slim
from inception_resnet_v2 import inception_resnet_v2, inception_resnet_v2_arg_scope
height = 299
width = 299
channels = 3
X = tf.placeholder(tf.float32, shape=[None, height, width, channels])
with slim.arg_scope(inception_resnet_v2_arg_scope()):
logits, end_points = inception_resnet_v2(X, num_classes=1001,is_training=False)
With this, you have the whole network in memory. Now you can initialize the network from the checkpoint file (.ckpt) by using tf.train.Saver:
saver = tf.train.Saver()
sess = tf.Session()
saver.restore(sess, "/home/pramod/Downloads/inception_resnet_v2_2016_08_30.ckpt")
If you want to do bottleneck feature extraction, it's simple: say you want to get features from the last layer, then you simply declare predictions = end_points["Logits"]. If you want features from another intermediate layer, you can get those names from the inception_resnet_v2.py program above.
After that you can call: output = sess.run(predictions, feed_dict={X:batch_images})
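The same pattern works for an intermediate layer; 'PreLogitsFlatten' below is an assumption, so check the keys actually defined in inception_resnet_v2.py (or print end_points.keys()):

# hypothetical endpoint name; inspect end_points.keys() for the real ones
features = end_points['PreLogitsFlatten']
batch_features = sess.run(features, feed_dict={X: batch_images})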
Do I need to create a graph (with all the variables) that were used when this checkpoint was created?
No, you don't.
As for how to use the checkpoint (.ckpt) file:
1. This article (TensorFlow-Slim image classification library) tells you how to train your model from scratch.
2. The following is example code from the Google blog:
import numpy as np
import os
import tensorflow as tf
import urllib2
from datasets import imagenet
from nets import inception
from preprocessing import inception_preprocessing
import matplotlib.pyplot as plt
slim = tf.contrib.slim
batch_size = 3
image_size = inception.inception_v3.default_image_size
checkpoints_dir = '/root/code/model'
checkpoints_filename = 'inception_resnet_v2_2016_08_30.ckpt'
model_name = 'InceptionResnetV2'
sess = tf.InteractiveSession()
graph = tf.Graph()
graph.as_default()
def classify_from_url(url):
image_string = urllib2.urlopen(url).read()
image = tf.image.decode_jpeg(image_string, channels=3)
processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)
processed_images = tf.expand_dims(processed_image, 0)
# Create the model, use the default arg scope to configure the batch norm parameters.
with slim.arg_scope(inception.inception_resnet_v2_arg_scope()):
logits, _ = inception.inception_resnet_v2(processed_images, num_classes=1001, is_training=False)
probabilities = tf.nn.softmax(logits)
init_fn = slim.assign_from_checkpoint_fn(
os.path.join(checkpoints_dir, checkpoints_filename),
slim.get_model_variables(model_name))
init_fn(sess)
np_image, probabilities = sess.run([image, probabilities])
probabilities = probabilities[0, 0:]
sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x:x[1])]
plt.figure()
plt.imshow(np_image.astype(np.uint8))
plt.axis('off')
plt.show()
names = imagenet.create_readable_names_for_imagenet_labels()
for i in range(5):
index = sorted_inds[i]
print('Probability %0.2f%% => [%s]' % (probabilities[index] * 100, names[index]))
Another way of loading a pre-trained ImageNet model is:
ResNet50
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
model = ResNet50()
model.summary()
InceptionV3
import tensorflow as tf
from tensorflow.keras.applications.inception_v3 import InceptionV3
model = InceptionV3()
model.summary()
You can check a detailed explanation related to this here
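If you go the Keras-applications route, a short hedged usage sketch for classifying a single image (the image path is a placeholder) looks like this:

import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = ResNet50(weights='imagenet')

img = image.load_img('/path/to/image.jpg', target_size=(224, 224))  # placeholder path
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=5)[0])  # top-5 ImageNet classes with probabilities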
I have trained a simple long short-term memory (LSTM) model in Lasagne following the recipe here: https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py
Here is the architecture:
l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
# We now build the LSTM layer which takes l_in as the input layer
# We clip the gradients at GRAD_CLIP to prevent the problem of exploding gradients.
l_forward_1 = lasagne.layers.LSTMLayer(
l_in, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
l_forward_2 = lasagne.layers.LSTMLayer(
l_forward_1, N_HIDDEN, grad_clipping=GRAD_CLIP,
nonlinearity=lasagne.nonlinearities.tanh)
# The l_forward layer creates an output of dimension (batch_size, SEQ_LENGTH, N_HIDDEN)
# Since we are only interested in the final prediction, we isolate that quantity and feed it to the next layer.
# The output of the sliced layer will then be of size (batch_size, N_HIDDEN)
l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
# The sliced output is then passed through the softmax nonlinearity to create probability distribution of the prediction
# The output of this stage is (batch_size, vocab_size)
l_out = lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size, W = lasagne.init.Normal(), nonlinearity=lasagne.nonlinearities.softmax)
# Theano tensor for the targets
target_values = T.ivector('target_output')
# lasagne.layers.get_output produces a variable for the output of the net
network_output = lasagne.layers.get_output(l_out)
# The loss function is calculated as the mean of the (categorical) cross-entropy between the prediction and target.
cost = T.nnet.categorical_crossentropy(network_output,target_values).mean()
# Retrieve all parameters from the network
all_params = lasagne.layers.get_all_params(l_out)
# Compute AdaGrad updates for training
print("Computing updates ...")
updates = lasagne.updates.adagrad(cost, all_params, LEARNING_RATE)
# Theano functions for training and computing cost
print("Compiling functions ...")
train = theano.function([l_in.input_var, target_values], cost, updates=updates, allow_input_downcast=True)
compute_cost = theano.function([l_in.input_var, target_values], cost, allow_input_downcast=True)
# In order to generate text from the network, we need the probability distribution of the next character given
# the state of the network and the input (a seed).
# In order to produce the probability distribution of the prediction, we compile a function called probs.
probs = theano.function([l_in.input_var],network_output,allow_input_downcast=True)
and the model is trained via:
for it in xrange(data_size * num_epochs / BATCH_SIZE):
try_it_out() # Generate text using the p^th character as the start.
avg_cost = 0;
for _ in range(PRINT_FREQ):
x,y = gen_data(p)
#print(p)
p += SEQ_LENGTH + BATCH_SIZE - 1
if(p+BATCH_SIZE+SEQ_LENGTH >= data_size):
print('Carriage Return')
p = 0;
avg_cost += train(x, y)
print("Epoch {} average loss = {}".format(it*1.0*PRINT_FREQ/data_size*BATCH_SIZE, avg_cost / PRINT_FREQ))
How can I save the model so I do not need to train it again? With scikit-learn I generally just pickle the model object. However, I am unclear on the analogous process with Theano / Lasagne.
You can save the weights with numpy:
np.savez('model.npz', *lasagne.layers.get_all_param_values(l_out))  # l_out is the output layer from the question's code
And load them again later on like this:
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(l_out, param_values)
Source: https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py
As for the model definition itself: One option is certainly to keep the code and regenerate the network, before setting the pretrained weights.
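For example, a minimal sketch of that option: wrap the layer definitions from the question in a function, rebuild the network, and then set the saved values (build_network is a hypothetical helper; GRAD_CLIP and vocab_size come from the question's script):

def build_network(vocab_size, n_hidden):
    # same layer definitions as in the question, ending with the softmax DenseLayer
    l_in = lasagne.layers.InputLayer(shape=(None, None, vocab_size))
    l_forward_1 = lasagne.layers.LSTMLayer(l_in, n_hidden, grad_clipping=GRAD_CLIP,
                                           nonlinearity=lasagne.nonlinearities.tanh)
    l_forward_2 = lasagne.layers.LSTMLayer(l_forward_1, n_hidden, grad_clipping=GRAD_CLIP,
                                           nonlinearity=lasagne.nonlinearities.tanh)
    l_forward_slice = lasagne.layers.SliceLayer(l_forward_2, -1, 1)
    return lasagne.layers.DenseLayer(l_forward_slice, num_units=vocab_size,
                                     W=lasagne.init.Normal(),
                                     nonlinearity=lasagne.nonlinearities.softmax)

l_out = build_network(vocab_size, N_HIDDEN)
with np.load('model.npz') as f:
    param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(l_out, param_values)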
You can save the model parameters and the model itself with pickle:
import cPickle as pickle
import os
#save the network and its parameters as a dictionary
netInfo = {'network': network, 'params': lasagne.layers.get_all_param_values(network)}
Net_FileName = 'LSTM.pkl'
# save the dictionary as a .pkl file
pickle.dump(netInfo, open(os.path.join('/path/to/a/folder/', Net_FileName), 'wb'), protocol=pickle.HIGHEST_PROTOCOL)
After saving your model, it can be retrieved by pickle.load:
net = pickle.load(open(os.path.join('/path/to/a/folder/', Net_FileName), 'rb'))
all_params = net['params']
lasagne.layers.set_all_param_values(net['network'], all_params)
I've had success using dill in combination with the numpy.savez function:
import dill as pickle
...
np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
with open('model.dpkl','wb') as p_output:
pickle.dump(network, p_output)
To import the pickled model:
with open('model.dpkl', 'rb') as p_input:
network = pickle.load(p_input)
with np.load('model.npz') as f:
param_values = [f['arr_%d' % i] for i in range(len(f.files))]
lasagne.layers.set_all_param_values(network, param_values)