is it possible to share tensorflow checkpoint files with other users (plattform & CPU/GPU independet)? I had shared a tensorflow implementation of the DeconvNet and now I want to provide the trained weights. Can I simply upload the saved model or is there another tf way? I'm asking because I read a tutorial were the weights were stored using numpy.savetxt and then restored during the weight initalization. But this method was used for the MNIST example which uses a very small net..
Thanks!
You could save metagraph + provide code to restore and run your model --
http://tensorflow.org/how_tos/meta_graph
One downside of this is that it doesn't provide annotations of which tensors to feed/fetch, so you need to provide some code showing how to use it.
SavedModel is the next iteration of TensorFlow checkpoint format that takes care of that, but it doesn't have much documentation yet.
I use pickle, in binary mode, to dump and load big numpy matrix and it works quite well.
Related
Basically I have two models to run in sequence. However, the first is a object-based model trained in TF2, the second is trained in TF1.x saved as name-based ckpt.
The foundamental conflict here is that in tf.compat.v1 mode I have to disable_eager_execution to run the mode, while the other model needs Eager execution (otherwise ~2.5 times slower).
I tried to find a way to convert the TF1 ckpt to object-based TF2 model but I don't think it's an easy way... maybe I have to rebuild the model and copy the weights according to the variable one by one (nightmare).
So Anyone knew if there's a way to just temporarily turn off the eager_excution? That would solve everything here... I would really appreciate it!
I regretfully have to inform you that, in my experience, this is not possible. I had the same issue. I believe the tensorflow documentation actually states that once it is turned off it stays off for the remainder of the session. You cannot turn it back on even if you try. This is a problem anytime you turn off eager execution, and the status will remain as long as the Tensorflow module is loaded in a particular python instance.
My suggestion for transferring the model weights and biases is to dump them to a pickle file as numpy arrays. I know it's possible because the Tensorflow 1.X model I was using did this in its code (I didn't write that model). I ended up loading that pickle file and reconstructing a new Tensorflow 2.X model via a for loop. This works well for sequential models. If any branching occurs the looping method won't work super well, or it will be hard successfully implement.
As a heads up, unless you want to train the model further the best way to load initialize those weights is to use the tf.constant_initializer (or something along those lines). When I did convert the model to Tensorflow 2.X, I ended up creating a custom initializer, but apparently you can just use a regular initializer and then set weights and biases via model or layer attributes or functions.
I ultimately had to convert the Tensorflow 1.x + compat code to Tensorflow 2.X, so I could train the models natively in Tensorflow 2.X.
I wish I could offer better news and information, but this was my experience and solution to the same problem.
I would like to save weights (and biases) from a CNN that I implemented and trained from scratch using Tensorflow (Python API).
Now I would like to save these weights in a file and share it with someone so he can use my network. But since I have a lot of weights I don't know. How can/should I do that? Is there a format recommended to do that?
Tensorflow provides a way to save your model: tensorflow.org/api_docs/python/tf/train/Saver. Your friend should then also use Tensorflow to load them. The language you load / save with doesn't affect how it is saved - if you save them in Tensorflow in Python they can be read in Tensorflow in C++.
I am using Tensorflow + Python.
I am curious if I can release a saved Tensorflow model (architecture + trained variables) without detailed source code. I'm aware of tf.train.Saver(), but it looks to save only variables, and in order to restore/run them, a user needs to "define" the same architecture.
For the testing/running purpose only, is there a way to release a saved {architecture+trained variables} without source code, so that a user can just cast a query and get a result?
The TensorFlow Serving project is intended to make this use case straightforward (assuming that the end user is only using the model for inference, not training). TensorFlow Serving includes an Exporter class that takes your tf.train.Saver, the tf.GraphDef that defines your overall model, and a "signature" that describes the inputs to and output from your model.
The basics tutorial has a good introduction to exporting your model.
You can build a Saver from the MetaGraphDef (saved with checkpoints by default: those .meta files). and then use that Saver to restore your model. So users don't have to re-define your graph in their code. But then they still need to figure out the model signature (input, output variables). I solve this using tf.Collection (but i am interested to find better ways to do it as well).
You can take a look at my example implementation (the eval.py evaluate a model without re-defining a model):
reconstruct saver from meta graph https://github.com/falcondai/cifar10/blob/master/eval.py#L18
get input variables from collections https://github.com/falcondai/cifar10/blob/master/eval.py#L58
how to define your model https://github.com/falcondai/cifar10/blob/master/models/cp2f3d.py
From what I've gathered so far, there are several different ways of dumping a TensorFlow graph into a file and then loading it into another program, but I haven't been able to find clear examples/information on how they work. What I already know is this:
Save the model's variables into a checkpoint file (.ckpt) using a tf.train.Saver() and restore them later (source)
Save a model into a .pb file and load it back in using tf.train.write_graph() and tf.import_graph_def() (source)
Load in a model from a .pb file, retrain it, and dump it into a new .pb file using Bazel (source)
Freeze the graph to save the graph and weights together (source)
Use as_graph_def() to save the model, and for weights/variables, map them into constants (source)
However, I haven't been able to clear up several questions regarding these different methods:
Regarding checkpoint files, do they only save the trained weights of a model? Could checkpoint files be loaded into a new program, and be used to run the model, or do they simply serve as ways to save the weights in a model at a certain time/stage?
Regarding tf.train.write_graph(), are the weights/variables saved as well?
Regarding Bazel, can it only save into/load from .pb files for retraining? Is there a simple Bazel command just to dump a graph into a .pb?
Regarding freezing, can a frozen graph be loaded in using tf.import_graph_def()?
The Android demo for TensorFlow loads in Google's Inception model from a .pb file. If I wanted to substitute my own .pb file, how would I go about doing that? Would I need to change any native code/methods?
In general, what exactly is the difference between all these methods? Or more broadly, what is the difference between as_graph_def()/.ckpt/.pb?
In short, what I'm looking for is a method to save both a graph (as in, the various operations and such) and its weights/variables into a file, which can then be used to load the graph and weights into another program, for use (not necessarily continuing/retraining).
Documentation about this topic isn't very straightforward, so any answers/information would be greatly appreciated.
There are many ways to approach the problem of saving a model in TensorFlow, which can make it a bit confusing. Taking each of your sub-questions in turn:
The checkpoint files (produced e.g. by calling saver.save() on a tf.train.Saver object) contain only the weights, and any other variables defined in the same program. To use them in another program, you must re-create the associated graph structure (e.g. by running code to build it again, or calling tf.import_graph_def()), which tells TensorFlow what to do with those weights. Note that calling saver.save() also produces a file containing a MetaGraphDef, which contains a graph and details of how to associate the weights from a checkpoint with that graph. See the tutorial for more details.
tf.train.write_graph() only writes the graph structure; not the weights.
Bazel is unrelated to reading or writing TensorFlow graphs. (Perhaps I misunderstand your question: feel free to clarify it in a comment.)
A frozen graph can be loaded using tf.import_graph_def(). In this case, the weights are (typically) embedded in the graph, so you don't need to load a separate checkpoint.
The main change would be to update the names of the tensor(s) that are fed into the model, and the names of the tensor(s) that are fetched from the model. In the TensorFlow Android demo, this would correspond to the inputName and outputName strings that are passed to TensorFlowClassifier.initializeTensorFlow().
The GraphDef is the program structure, which typically does not change through the training process. The checkpoint is a snapshot of the state of a training process, which typically changes at every step of the training process. As a result, TensorFlow uses different storage formats for these types of data, and the low-level API provides different ways to save and load them. Higher-level libraries, such as the MetaGraphDef libraries, Keras, and skflow build on these mechanisms to provide more convenient ways to save and restore an entire model.
You can try the following code:
with tf.gfile.FastGFile('model/frozen_inference_graph.pb', "rb") as f:
graph_def = tf.GraphDef()
graph_def.ParseFromString(f.read())
g_in = tf.import_graph_def(graph_def, name="")
sess = tf.Session(graph=g_in)
I am using Tensorflow + Python.
I am curious if I can release a saved Tensorflow model (architecture + trained variables) without detailed source code. I'm aware of tf.train.Saver(), but it looks to save only variables, and in order to restore/run them, a user needs to "define" the same architecture.
For the testing/running purpose only, is there a way to release a saved {architecture+trained variables} without source code, so that a user can just cast a query and get a result?
The TensorFlow Serving project is intended to make this use case straightforward (assuming that the end user is only using the model for inference, not training). TensorFlow Serving includes an Exporter class that takes your tf.train.Saver, the tf.GraphDef that defines your overall model, and a "signature" that describes the inputs to and output from your model.
The basics tutorial has a good introduction to exporting your model.
You can build a Saver from the MetaGraphDef (saved with checkpoints by default: those .meta files). and then use that Saver to restore your model. So users don't have to re-define your graph in their code. But then they still need to figure out the model signature (input, output variables). I solve this using tf.Collection (but i am interested to find better ways to do it as well).
You can take a look at my example implementation (the eval.py evaluate a model without re-defining a model):
reconstruct saver from meta graph https://github.com/falcondai/cifar10/blob/master/eval.py#L18
get input variables from collections https://github.com/falcondai/cifar10/blob/master/eval.py#L58
how to define your model https://github.com/falcondai/cifar10/blob/master/models/cp2f3d.py