I have a few questions regarding the SavedModel API, whose documentation I find leaves a lot of details unexplained.
The first three questions are about what to pass to the arguments of the add_meta_graph_and_variables() method of tf.saved_model.builder.SavedModelBuilder, while the fourth question is about why to use the SavedModel API over tf.train.Saver.
What is the format of the signature_def_map argument? Do I normally need to set this argument when saving a model?
Similarly, what is the format of the assets_collection argument?
Why do you save a list of tags with a metagraph as opposed to just giving it a name (i.e. attaching just one unique tag to it)? Why would I add multiple tags to a given metagraph? What if I try to load a metagraph from a pb by a certain tag, but multiple metagraphs in that pb match that tag?
The documentation recommends using SavedModel to save entire models (as opposed to variables only) in self-contained files. But tf.train.Saver also saves the graph, in addition to the variables, in a .meta file. So what are the advantages of using SavedModel? The documentation says
When you want to save and load variables, the graph, and the graph's
metadata--basically, when you want to save or restore your model--we
recommend using SavedModel. SavedModel is a language-neutral,
recoverable, hermetic serialization format. SavedModel enables
higher-level systems and tools to produce, consume, and transform
TensorFlow models.
but this explanation is quite abstract and doesn't really help me understand what the advantages of SavedModel are. What would be concrete examples where SavedModel (as opposed to tf.train.Saver) would be better to use?
Please note that my question is not a duplicate of this question. I'm not asking how to save a model, I am asking very specific questions about the properties of SavedModel, which is only one of multiple mechanisms TensorFlow provides to save and load models. None of the answers in the linked question touch on the SavedModel API (which, once again, is not the same as tf.train.Saver).
EDIT: I wrote this back at TensorFlow 1.4. As of today (TensorFlow 1.12 is stable, there's a 1.13rc and 2.0 is around the corner) the docs linked in the question are much improved.
I'm trying to use tf.saved_model and also found the Docs quite (too) abstract. Here's my stab at a full answer to your questions:
1. signature_def_map:
a. Format: See Tom's answer to Tensorflow: how to save/restore a model. (Ctrl-F for "tf.saved_model" - currently, the only uses of the phrase on that question are in his answer).
b. Need: It's my understanding that you do normally need it. If you intend to use the model, you need to know the inputs and outputs of the graph. I think it is akin to a C++ function signature: if you intend to define a function after it's called, or in another C++ file, you need the signature in your main file (i.e. as a prototype or in a header file).
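For concreteness, here's a minimal sketch of what building a signature_def_map can look like. The placeholder tensors, signature key choice, and export directory are made up for illustration; substitute your real graph's inputs and outputs:

import tensorflow as tf

# Hypothetical stand-ins for your real graph's input and output tensors.
x = tf.placeholder(tf.float32, shape=[None, 8], name='x')
y = tf.identity(x, name='y')

signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'x': x}, outputs={'y': y})

builder = tf.saved_model.builder.SavedModelBuilder('/tmp/my_saved_model')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder.add_meta_graph_and_variables(
        sess,
        tags=[tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature})
builder.save()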
2. assets_collection:
Format: I couldn't find clear documentation, so I went to the builder source code. It appears that the argument is an iterable of Tensors of dtype=tf.string, where each Tensor is a path for the asset directory. So, a TensorFlow Graph collection should work. I guess that is the parameter's namesake, but from the source code I would expect a Python list to work too.
(You didn't ask if you need to set it, but judging from Zoe's answer to What are assets in tensorflow? and iga's answer to the tangentially related Tensorflow serving: "No assets to save/writes" when exporting models, it doesn't usually need to be set.)
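Still, as a rough sketch of the format (the vocabulary file path here is hypothetical), the usual pattern seems to be putting the asset-path tensors into the ASSET_FILEPATHS graph collection and handing that collection to the builder:

import tensorflow as tf

# Hypothetical vocabulary file to be bundled with the SavedModel as an asset.
vocab_path = tf.constant('/tmp/vocab.txt', name='vocab_path')
tf.add_to_collection(tf.GraphKeys.ASSET_FILEPATHS, vocab_path)

# ...then, when adding the metagraph:
# builder.add_meta_graph_and_variables(
#     sess, [tf.saved_model.tag_constants.SERVING],
#     assets_collection=tf.get_collection(tf.GraphKeys.ASSET_FILEPATHS))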
3. Tags:
a. Why a list: I don't know why you must pass a list, but you may pass a list with one element. For instance, in my current project I only use the [tf...tag_constants.SERVING] tag.
b. When to use multiple: Say you're using explicit device placement for operations. Maybe you want to save a CPU version and a GPU version of your graph. Obviously you want to save a serving version of each, and say you want to save training checkpoints. You could use a CPU/GPU tag and a training/serving tag to manage all cases. The docs hint at it:
Each MetaGraphDef added to the SavedModel must be annotated with user-specified tags. The tags provide a means to identify the specific MetaGraphDef to load and restore, along with the shared set of variables and assets. These tags typically annotate a MetaGraphDef with its functionality (for example, serving or training), and optionally with hardware-specific aspects (for example, GPU).
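A minimal sketch of that idea, reusing the builder and sess from the sketch in point 1 (the tag choices are just an example): the first call saves the variables along with a training metagraph tagged for GPU use, and the second adds a serving metagraph to the same SavedModel.

builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.TRAINING, tf.saved_model.tag_constants.GPU])
builder.add_meta_graph([tf.saved_model.tag_constants.SERVING])
builder.save()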
c. Collision: I was too lazy to force a collision myself (I see two cases that would need to be addressed), so I went to the loader source code. Inside def load, you'll see:
saved_model = _parse_saved_model(export_dir)
found_match = False
for meta_graph_def in saved_model.meta_graphs:
  if set(meta_graph_def.meta_info_def.tags) == set(tags):
    meta_graph_def_to_load = meta_graph_def
    found_match = True
    break

if not found_match:
  raise RuntimeError(
      "MetaGraphDef associated with tags " + str(tags).strip("[]") +
      " could not be found in SavedModel. To inspect available tag-sets in"
      " the SavedModel, please use the SavedModel CLI: `saved_model_cli`"
  )
It appears to me that it's looking for an exact match. E.g. say you have a metagraph with tags "GPU" and "Serving" and a metagraph with the tag "Serving" only. If you load "Serving", you'll get the latter metagraph. On the other hand, say you have a metagraph tagged "GPU" and "Serving" and a metagraph tagged "CPU" and "Serving". If you try to load "Serving", you'll get the error. If you try to save two metagraphs with the exact same tags in the same folder, I expect you'll overwrite the first one. It doesn't look like the builder code handles such a collision in any special way.
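Loading then has to name the exact tag-set. Continuing the sketch from point 3b (the export directory is made up):

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], '/tmp/my_saved_model')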
4. SavedModel or tf.train.Saver:
This confused me too. wicke's answer to Should TensorFlow users prefer SavedModel over Checkpoint or GraphDef? cleared it up for me. I'll throw in my two cents:
In the scope of local Python+TensorFlow, you can make tf.train.Saver do everything. But it will cost you. Let me outline the save-a-trained-model-and-deploy use case. You'll need your saver object. It's easiest to set it up to save the complete graph (every variable). You probably don't want to save the .meta all the time, since you're working with a static graph. You'll need to specify that in your training hook. You can read about that on cv-tricks. When your training finishes, you'll need to convert your checkpoint file to a pb file. That usually means clearing the current graph, restoring the checkpoint, freezing your variables to constants with tf.python.framework.graph_util, and writing it out with tf.gfile.GFile. You can read about that on medium. After that, you want to deploy it in Python. You'll need the input and output Tensor names - the string names in the graph def. You can read about that on metaflow (actually a very good blog post for the tf.train.Saver method). Some op nodes will let you feed data into them easily. Some not so much. I usually gave up on finding an appropriate node and added a tf.reshape that didn't actually reshape anything to the graph def. That was my ad-hoc input node. Same for the output. And then finally, you can deploy your model, at least locally in Python.
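For reference, a minimal sketch of that checkpoint-to-pb step. The checkpoint paths and the output node name are made up; yours will differ:

import tensorflow as tf
from tensorflow.python.framework import graph_util

with tf.Session(graph=tf.Graph()) as sess:
    # Rebuild the graph from the .meta file and restore the trained variables.
    saver = tf.train.import_meta_graph('model.ckpt.meta')
    saver.restore(sess, 'model.ckpt')
    # Replace variables with constants so the graph is self-contained.
    frozen_graph_def = graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['output_node'])

with tf.gfile.GFile('frozen_model.pb', 'wb') as f:
    f.write(frozen_graph_def.SerializeToString())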
Or, you could use the answer I linked in point 1 to accomplish all this with the SavedModel API. Fewer headaches thanks to Tom's answer. You'll get more support and features in the future if it ever gets documented appropriately. It looks like it's easier to use command-line serving (the medium link covers doing that with Saver - looks tough, good luck!). It's practically baked into the new Estimators. And according to the Docs,
SavedModel is a language-neutral, recoverable, hermetic serialization format.
Emphasis mine: it looks like you can get your trained models into the growing C++ API much more easily.
The way I see it, it's like the Datasets API. It's just easier than the old way!
As far as concrete examples of SavedModel over tf.train.Saver go: if "basically, when you want to save or restore your model" isn't clear enough for you, the right time to use it is any time it makes your life easier. To me, that looks like always. Especially if you're using Estimators, deploying in C++, or using command-line serving.
So that's my research on your question. Or four enumerated questions. Err, eight question marks. Hope this helps.
While replicating the implementation of the NGNN "Dressing as a Whole" paper, I am stuck on one pickle file which is required to progress further, namely fill_in_blank_1000_from_test_score.pkl.
Can someone help by sharing this file, or an alternative to it?
The GitHub implementation doesn't include it!
https://github.com/CRIPAC-DIG/NGNN
You're not supposed to use main_score.py (which requires the fill_in_blank_1000_from_test_score.pkl pickle); it's obsolete, but the authors fail to mention this in the README. The problem was raised in this issue. Long story short, use another "main": main_multi_modal.py.
One of the comments explains in detail how to proceed; I will copy it here so that it does not get lost:
Download the pre-processed dataset from the authors (the one on Google Drive, around 7 GB)
Download the "normal" dataset in order to get the text to the images (the one on Github, only a few MBs)
Change all the folder paths in the files to your corresponding ones
run "onehot_embedding.py" to create the textual features (The rest of the pre-processing was already done by the authors)
run "main_multi_modal.py" to train. In the end of the file you can adjust the config of the network (Beta, d, T etc.), so the file
"Config.py" is useless here.
If you want to train several instances in the for-loop, you need to reset the graph at the beginning of the training. Just add "tf.reset_default_graph()" at the start of the function "cm_ggnn()" (see the sketch after this quote).
With this setup, I could reproduce the results fairly well, with the same accuracy as in the paper.
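For the graph-reset step above, the change would look roughly like this. The real signature of cm_ggnn() lives in main_multi_modal.py and is not shown here, so the argument is hypothetical:

import tensorflow as tf

def cm_ggnn(config):  # hypothetical signature; see main_multi_modal.py for the real one
    tf.reset_default_graph()  # start each training run in the loop with a fresh graph
    # ... the original training code of main_multi_modal.py continues here ...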
I'm trying to export the fasttext model created by gensim to a binary file. But the docs are unclear about how to achieve this.
What I've done so far:
model.wv.save_word2vec_format('model.bin')
But this does not seem like the best solution, since later, when I want to load the model using:
fasttext.load_facebook_model('model.bin')
I get into an infinite loop, while loading the fasttext.model created by the model.save('fasttext.model') function completes in around 30 seconds.
Using .save_word2vec_format() saves just the full-word vectors, to a simple format that was used by Google's original word2vec.c release. It doesn't save unique things about a full FastText model. Such files would be reloaded with the matched .load_word2vec_format().
The .load_facebook_model() method loads files in the format saved by Facebook's original (non-Python) FastText code release. (The name of this method is somewhat misleading, since 'facebook' could mean so many different things other than a specific data format.) Gensim doesn't have a matched method for saving to this same format – though it probably wouldn't be very hard to implement, and it would make symmetric sense to support this export option.
Gensim's models typically implement gensim-native .save() and .load() options, which make use of a mix of Python 'pickle' serialization and raw large-array files. These are your best options if you want to save the full model state, for later reloading back into Gensim.
(Such files can't be loaded by other FastText implementations.)
Be sure to keep the multiple related files written by this .save() (all with the same user-supplied prefix) together when moving the saved model to a new location.
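A minimal sketch of the gensim-native round trip. The toy corpus is only there so the snippet is self-contained; use your own trained model instead:

from gensim.models import FastText

# Toy corpus purely for illustration.
model = FastText(sentences=[['hello', 'world'], ['hello', 'gensim']], min_count=1)

model.save('fasttext.model')              # gensim-native save; may write extra .npy files
loaded = FastText.load('fasttext.model')  # the full model comes back, n-grams and all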
Update (May 2020): Recent versions of gensim such as 3.8.3 and later include a new contributed FastText.save_facebook_model() method which saves to the original Facebook FastText binary format.
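If your gensim version includes it, the export would look something like this. I believe the function is exposed at module level in gensim.models.fasttext, but check the docs of your installed version:

from gensim.models.fasttext import save_facebook_model

# Writes a Facebook-compatible binary that load_facebook_model() should accept.
save_facebook_model(model, 'model.bin')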
I am using the TensorFlow Object Detection API. In my previous work I used to check the current step and save my model every n steps, something like the approach mentioned here.
In this case though the authors use TensorFlow-Slim to perform the training. So, they use a tf.train.Saver which is passed to the actual function which performs the training: slim.learning.train(). While this function has some parameters regarding the interval for writing down the training model using the parameter save_interval_secs it is time dependent and not step dependent.
So, since tf.train.Saver is a "passive" utility, as mentioned here, that just saves a model with the provided parameters and is ignorant of any notion of time or steps, and since in the object detection code control is handed over to TensorFlow-Slim by passing the saver as a parameter, how can I save my model step-wise (every n steps instead of every x seconds)?
Is the only solution to dig into the slim code and edit it (with all the risks that entails)? Or is there another option I am not familiar with?
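One workaround that comes to mind, assuming your TF version exposes slim.learning.train's train_step_fn hook and the default slim.learning.train_step implementation, is roughly the following sketch (the interval, log directory, and checkpoint path are made up):

import tensorflow as tf

slim = tf.contrib.slim

SAVE_EVERY_N_STEPS = 1000                 # hypothetical interval
saver = tf.train.Saver(max_to_keep=10)

def train_step_fn(sess, train_op, global_step, train_step_kwargs):
    # Let slim perform the actual optimization step...
    total_loss, should_stop = slim.learning.train_step(
        sess, train_op, global_step, train_step_kwargs)
    # ...then checkpoint every N steps instead of every X seconds.
    step = sess.run(global_step)
    if step % SAVE_EVERY_N_STEPS == 0:
        saver.save(sess, 'train_dir/model.ckpt', global_step=global_step)
    return total_loss, should_stop

# slim.learning.train(train_op, 'train_dir',
#                     train_step_fn=train_step_fn, saver=saver)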
P.S.1
I found out there is an astonishingly similar question about this here, but unfortunately it did not get any answers. So, since my problem persists, I will leave this question intact to raise some interest in the issue.
P.S.2
Looking into the slim.learning code, I found out that in train(), after processing the parameters, control is simply passed over to supervisor.Supervisor, which refers to tf.train.Supervisor. This is a bit odd, since that class is considered deprecated. The use of the supervisor is also mentioned in the docstrings of slim.learning.
I use the following code to load the net and set it up; the parameters of the layers are stored in deploy.prototxt.
net = caffe.Net('deploy.prototxt', caffemodel, caffe.TEST)
However, what I want to do is to modify the parameters (e.g. kernel_size, or pad, etc.) of layers dynamically instead of modifying the prototxt file and reload it.
Is there any way to do so?
You can write your own get/set methods and expose them to python.
In layer.hpp:
virtual float GetParameter(const std::string param_name) {return -1;}
virtual void SetParameter(const std::string param_name, float val) {}
Then redefine these methods in layers where you would like to get/set parameters dynamically.
The last step is to expose the methods to python. In _caffe.cpp add this for bp::class_<Layer...:
.def("get_parameter", &Layer<Dtype>::GetParameter)
.def("set_parameter", &Layer<Dtype>::SetParameter)
I would suggest changing the way you think about this problem. What does your "dynamically modified parameter" depend on? The most commonly used variable (and the one I was facing) is the current iteration count. For example, I wanted to reduce the parameter value every 10000 iterations. Based on that, in the layer where you use that parameter, apply a function that modifies it. That achieves the same effect as modifying the prototxt file.
To get the iteration count inside a specific layer, I will just point to someone else's solution here. It is quite straightforward and could probably reduce your workload significantly compared to modifying the prototxt file. Hopefully you can get inspired by this solution and apply it to your case.
https://stackoverflow.com/a/38386603/6591990
[Solution at the end of the post]
I needed to finetune a model and hence wanted to change the lr_mult parameters of individual layers programmatically. My search for help started from the title of this thread and thankfully ended at the link below, titled 'How to modify a prototxt programmatically?': https://github.com/BVLC/caffe/issues/4878
The parameters can be accessed and modified after loading the model-definition prototxt file with google.protobuf's text_format. The modified protobuf can then be written back to a file.
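A sketch of that approach; the layer index, parameter indices, and lr_mult value are just examples, and whether layer[0] actually has a param entry depends on your network definition:

from caffe.proto import caffe_pb2
from google.protobuf import text_format

net_param = caffe_pb2.NetParameter()
with open('deploy.prototxt') as f:
    text_format.Merge(f.read(), net_param)

# e.g. raise the learning-rate multiplier of the first layer's weights
net_param.layer[0].param[0].lr_mult = 10.0

with open('deploy_modified.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net_param))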
I've been working through the TensorFlow documentation (still learning), and I can't figure out how to represent input/output sequence data. My inputs are sequences of 20 8-entry vectors, making an 8x20xN matrix, where N is the number of instances. I'd like to eventually pass these through an LSTM for sequence-to-sequence learning. I know I need a 3D tensor, but I'm unsure which dimensions are which.
RTFMs with pointers to the correct documentation greatly appreciated. I feel like this is obvious and I'm just missing it.
As described in the excellent blog post by WildML, the proper way is to save your examples in a TFRecord using the tf.SequenceExample() format. Using TFRecords for this provides the following advantages:
You can split your data in to many files and load them each on a different GPU.
You can use TensorFlow utilities for loading the data (for example, using Queues to load your data on demand).
Your model code will be separate from your dataset processing (this is a good habit to have).
You can bring new data to your model just by putting it into this format.
TFRecords use protobuf, or protocol buffers, as a way to format your data; the documentation can be found here. The basic idea is that you define a format for your data (in this case tf.SequenceExample), save it to a TFRecord, and load it using the same data definition. Code for this pattern can be found at this ipython notebook.
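As a minimal sketch, writing one of your instances (a length-20 sequence of 8-entry vectors, with made-up values) as a tf.SequenceExample could look like this (the feature names and output file are arbitrary):

import tensorflow as tf

# One instance: a sequence of 20 time steps, each an 8-entry vector (dummy values).
sequence = [[float(t + k) for k in range(8)] for t in range(20)]

example = tf.train.SequenceExample()
# Non-sequential facts about the instance go in the context.
example.context.feature['length'].int64_list.value.append(len(sequence))
# The per-time-step vectors go in a feature list, one Feature per time step.
inputs = example.feature_lists.feature_list['inputs']
for vector in sequence:
    inputs.feature.add().float_list.value.extend(vector)

with tf.python_io.TFRecordWriter('sequences.tfrecord') as writer:
    writer.write(example.SerializeToString())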
As my answer is mostly summarizing the WildML blog post on this topic, I suggest you check that out, again found here.