Deploying TensorFlow Checkpoint to Google Cloud Platform - python

I have trained a TensorFlow model and saved a checkpoint, and I would like to deploy it to Google Cloud Platform. The model deployment documentation says that you need to create a SavedModel, yet others seem to use checkpoints instead of a SavedModel.
Given that I have already spent time training this model and only have checkpoints rather than a SavedModel, is there still a way to deploy it, or will I need to retrain?

A checkpoint maps variable names to tensor values. That, by itself, is not enough for higher-level systems to use your model. A SavedModel, on the other hand, is complete and self-contained. As the answer you link to in your post makes clear, a SavedModel provides all the information needed for serving a TensorFlow model: a set of MetaGraphs, a checkpoint compatible with those graphs, and all necessary asset files. Seen this way, it makes sense that you need to export your model to a SavedModel in order to deploy it to ML Engine. This does not mean you need to retrain, though: what you need to do instead is wrap one of your existing checkpoints into a SavedModel.
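For example, here is a minimal sketch of wrapping an existing checkpoint into a SavedModel with the TF1-style (tf.compat.v1) API; build_graph() is a hypothetical stand-in for whatever code reconstructs the graph your checkpoint was produced from, and the paths and tensor names are placeholders:

import tensorflow as tf

# Hypothetical: build_graph() rebuilds the same graph the checkpoint came from
# and returns its input and output tensors.
with tf.Graph().as_default() as graph:
    input_tensor, output_tensor = build_graph()
    saver = tf.compat.v1.train.Saver()
    with tf.compat.v1.Session(graph=graph) as sess:
        # Restore variable values from the existing checkpoint.
        saver.restore(sess, '/path/to/model.ckpt')
        # Export a SavedModel that ML Engine can serve.
        tf.compat.v1.saved_model.simple_save(
            sess,
            export_dir='/path/to/saved_model',  # must not already exist
            inputs={'inputs': input_tensor},
            outputs={'outputs': output_tensor})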

Related

Load tensorflow SavedModel in Rstudio trained in Google Cloud ML

I trained a model in Google Cloud ML and saved it in the SavedModel format. I've attached the directory for the saved model below.
https://drive.google.com/drive/folders/18ivhz3dqdkvSQY-dZ32TRWGGW5JIjJJ1?usp=sharing
I am trying to load the model into R using the following code, but it is returning <tensorflow.python.training.tracking.tracking.AutoTrackable> with an object size of 552 bytes, which is definitely not correct. If anyone can properly load the model, I would love to know how you did it. It should also be loadable in Python, I assume, so that could work too. The model was trained on GPU; I'm not sure which TensorFlow version. Thank you very much!
library(keras)
list.files("/path/to/inceptdual400OG")
og400<-load_model_tf("/path/to/inceptdual400OG")
Since the shared model is not available anymore (it says it is in the trash folder) and the question does not specify it, I can't tell which framework you used to save the model in the first place. I would suggest trying the Keras load function or the TensorFlow load function, depending on which type of saved model file you have.
Bear in mind to pass the argument compile = FALSE if you have the model already compiled.
Remember to import the latest libraries if you trained your model with tf >= 2.0, because of dependency incompatibilities between TensorFlow and Keras; the output of rsconnect::appDependencies() is also worth checking.
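If loading the model in Python is an acceptable fallback (as the question suggests), here is a minimal sketch, assuming the directory holds a TF2 SavedModel; the path is the one from the question:

import tensorflow as tf

# Try the Keras loader first (compile = FALSE in R corresponds to compile=False here).
model = tf.keras.models.load_model("/path/to/inceptdual400OG", compile=False)
model.summary()

# If it is a plain (non-Keras) SavedModel, fall back to the generic loader.
loaded = tf.saved_model.load("/path/to/inceptdual400OG")
print(list(loaded.signatures.keys()))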

Is there a decent workaround to saving checkpoints in local drive when using TPU in Tensorflow?

A follow up to this question:
How to save a Tensorflow Checkpoint file from Google Colaboratory in when using TPU mode?
Where the official way of saving a checkpoint when using a TensorFlow TPU is to use Google Cloud Storage (GCS).
I am wondering if there is a workaround to this for those who do not wish to use GCS. Perhaps for each variable, call .eval(), save the value, and then set the saved value as the 'init' value for each variable.
A major issue I foresee though is saving and loading the parameters for the optimizers.
For Keras, the weights do seem to be saved from TPU to local
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/shakespeare_with_tpu_and_keras.ipynb
INFO:tensorflow:Copying TPU weights to the CPU
So I imagine that there's a general workaround too, without using Keras.
Take a look at THIS CODE from Keras
If I understood correctly, weights are not saved directly from the TPU; instead, the weights are synced to the CPU and then saved to Colab storage.
EDIT
Also see: this answer.
I just found the solution below after seeing this thread, so I wanted to add this option. According to the TensorFlow documentation, there is an options field that you can use in the save/load/restore functions in Keras, as well as in tf.train.Checkpoint and the save method of tf.train.CheckpointManager, which allows you to pass in an experimental localhost syncing strategy.
Copying their code example:
import tensorflow as tf

model = get_model()  # get_model() is defined elsewhere in the tutorial

# Saving the model to a path on localhost.
saved_model_path = '/tmp/tf_save'
save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')
model.save(saved_model_path, options=save_options)

# Loading the model from a path on localhost.
another_strategy = tf.distribute.MirroredStrategy()
with another_strategy.scope():
    load_options = tf.saved_model.LoadOptions(experimental_io_device='/job:localhost')
    loaded = tf.keras.models.load_model(saved_model_path, options=load_options)
Documentation Sources:
https://www.tensorflow.org/tutorials/distribute/save_and_load#savingloading_from_a_local_device
https://www.tensorflow.org/api_docs/python/tf/train/Checkpoint#restore
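For the checkpoint-style APIs mentioned above, here is a minimal sketch along the same lines, using a toy model and placeholder paths (not taken from the original answer):

import os
import tensorflow as tf

# Toy model and optimizer just to have something to checkpoint.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam()
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)

# experimental_io_device='/job:localhost' routes checkpoint I/O through the
# local host rather than the TPU worker, so the files land on local storage.
ckpt_options = tf.train.CheckpointOptions(experimental_io_device='/job:localhost')

os.makedirs('/tmp/tf_ckpts', exist_ok=True)
checkpoint.save('/tmp/tf_ckpts/ckpt', options=ckpt_options)

# Restore later with the same option.
checkpoint.restore(tf.train.latest_checkpoint('/tmp/tf_ckpts'), options=ckpt_options)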

Deploying Tensorflow app on the customer's infra

I'm currently developing a prediction model using TensorFlow, and my model works well for a customer, so I'm trying to turn it into a real product.
My model needs to be retrained using the customer's input as time passes, and it should be deployed on the customer's infrastructure. (Not SaaS or cloud.) Moreover, I'd like to protect my code and models.
From my understanding of TensorFlow, a trained model can be exported as a protobuf, frozen, and stripped down to the nodes required for prediction using freeze_graph.py from the TensorFlow repo. I tried it and successfully ran my prediction model using Golang + the libtensorflow.so runtime. (Or I could use TensorFlow Serving & C++.)
If I could train my model on our company's infra, I could say "Okay, let's get some beers". However, my model has to be trained on the customer's infra, and without Python code it seems I cannot train my model.
https://www.tensorflow.org/versions/r0.12/how_tos/language_bindings/index.html
At this time, support for gradients, functions and control flow operations ("if" and "while") is not available in languages other than Python. This will be updated when the C API provides necessary support.
Is there any workaround for deploying a TF app without exposing Python code or the model? Thanks in advance.
You can still use Python with a pre-trained model, without exposing all the code you needed to build it in the first place. As an example of this, have a look at the Inception retraining code, which loads a pretrained GraphDef and then retrains a new top layer:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py
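As a rough sketch of that idea (not the retrain.py code itself; the file and tensor names below are approximate placeholders): import the pre-trained GraphDef, grab its bottleneck tensor, and attach a new trainable top layer, so none of the original model-building code needs to ship.

import tensorflow as tf

GRAPH_PB = 'classify_image_graph_def.pb'   # frozen Inception GraphDef (placeholder name)
BOTTLENECK_TENSOR = 'pool_3/_reshape:0'    # bottleneck tensor name (placeholder)

with tf.io.gfile.GFile(GRAPH_PB, 'rb') as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default():
    # Import the pre-trained graph and pull out the bottleneck features.
    bottleneck = tf.import_graph_def(
        graph_def, name='', return_elements=[BOTTLENECK_TENSOR])[0]
    # New top layer: the only part that is trained during retraining.
    logits = tf.compat.v1.layers.dense(bottleneck, units=5, name='new_top_layer')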

TensorFlow restore/deploy network without the model?

I've built and trained some networks with TensorFlow and successfully managed to save and restore the model's parameters.
However, for some scenarios - e.g. like deploying a trained network in a customer's infrastructure - it is not the best solution to ship the full code/model. Thus, I am wondering if there is any way to restore/run a trained network without the original code/model used for training?
I guess this leads to the question of whether TensorFlow is able to save a (compressed?) version of the network architecture into the checkpoint files in addition to the weights of the variables.
Is this somehow possible?
If you really need to restore just from a GraphDef file (*.pb), for instance to load it from another application, you will need to use the freeze_graph.py script from here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py
This script takes a GraphDef (.pb) and a checkpoint (.ckpt) file as input and outputs a GraphDef file that contains the weights in the form of constants (you can read the docs on the script for more details).
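Here is a minimal sketch of consuming such a frozen graph from another Python application; the file name and tensor names are placeholders for your own:

import numpy as np
import tensorflow as tf

FROZEN_PB = 'frozen_model.pb'  # output of freeze_graph.py (placeholder name)

with tf.io.gfile.GFile(FROZEN_PB, 'rb') as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name='')
    # Placeholder tensor names; substitute the real input/output names of your graph.
    inputs = graph.get_tensor_by_name('input:0')
    outputs = graph.get_tensor_by_name('output:0')
    with tf.compat.v1.Session(graph=graph) as sess:
        batch = np.zeros((1, 224, 224, 3), dtype=np.float32)  # dummy input
        predictions = sess.run(outputs, feed_dict={inputs: batch})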

TensorFlow: How do I release a model without source code?

I am using Tensorflow + Python.
I am curious if I can release a saved TensorFlow model (architecture + trained variables) without detailed source code. I'm aware of tf.train.Saver(), but it appears to save only variables, and in order to restore/run them, a user needs to "define" the same architecture.
For testing/running purposes only, is there a way to release a saved {architecture + trained variables} without source code, so that a user can just submit a query and get a result?
The TensorFlow Serving project is intended to make this use case straightforward (assuming that the end user is only using the model for inference, not training). TensorFlow Serving includes an Exporter class that takes your tf.train.Saver, the tf.GraphDef that defines your overall model, and a "signature" that describes the inputs to and output from your model.
The basics tutorial has a good introduction to exporting your model.
You can build a Saver from the MetaGraphDef (saved alongside checkpoints by default: those .meta files) and then use that Saver to restore your model, so users don't have to re-define your graph in their code. But then they still need to figure out the model signature (input and output variables). I solve this using graph collections (but I am interested in finding better ways to do it as well); a minimal sketch of this approach follows the links below.
You can take a look at my example implementation (the eval.py evaluate a model without re-defining a model):
reconstruct saver from meta graph https://github.com/falcondai/cifar10/blob/master/eval.py#L18
get input variables from collections https://github.com/falcondai/cifar10/blob/master/eval.py#L58
how to define your model https://github.com/falcondai/cifar10/blob/master/models/cp2f3d.py
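Here is a minimal sketch of the collections approach described above, using the TF1-style (tf.compat.v1) API; the tensor names, collection names, and paths are placeholders, not the author's:

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# --- At export time (model author) ---
x = tf.compat.v1.placeholder(tf.float32, [None, 4], name='x')
y = tf.compat.v1.layers.dense(x, 2, name='y')
tf.compat.v1.add_to_collection('inputs', x)
tf.compat.v1.add_to_collection('outputs', y)
saver = tf.compat.v1.train.Saver()
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    saver.save(sess, '/tmp/released_model.ckpt')  # also writes released_model.ckpt.meta

# --- At restore time (end user, no model-building code required) ---
with tf.Graph().as_default(), tf.compat.v1.Session() as sess:
    restorer = tf.compat.v1.train.import_meta_graph('/tmp/released_model.ckpt.meta')
    restorer.restore(sess, '/tmp/released_model.ckpt')
    x_in = tf.compat.v1.get_collection('inputs')[0]
    y_out = tf.compat.v1.get_collection('outputs')[0]
    preds = sess.run(y_out, feed_dict={x_in: [[1.0, 2.0, 3.0, 4.0]]})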
