I want to use TensorFlow Hub to retrain one of its modules in my graph and then use that module, but my problem is that when I create the module with trainable=True and tags={"train"}, I cannot run an evaluation because of the batch normalization layers.
As I read about this issue, I found that I should also create another graph for evaluation, without setting tags={"train"}, but I don't know how to restore the variables from the train graph into the eval graph. I tried creating both modules with the same name and using reuse=True in the eval graph, but that wasn't helpful.
Reposting the solution referenced by Arno (originally given in the comments section) as an answer, for the benefit of the community:
With hub.Module for TF1, the situation is as you say: either the training or the inference graph is instantiated, and there is no good way to import both and share variables between them in a single tf.Session. That's informed by the approach used by Estimators and many other training scripts in TF1 (esp. distributed ones): there's a training Session that produces checkpoints, and a separate evaluation Session that restores model weights from them. (The two will likely also differ in the dataset they read and the preprocessing they perform.)
With TF2 and its emphasis on Eager mode, this has changed. TF2-style Hub modules (as found at https://tfhub.dev/s?q=tf2-preview) are really just TF2-style SavedModels, and these don't come with multiple graph versions. Instead, the call function on the restored top-level object takes an optional training=... parameter if the train/inference distinction is required.
With this, TF2 should match your expectations. See the interactive demo tf2_image_retraining.ipynb and the underlying code in tensorflow_hub/keras_layer.py for how it can be done. The TF Hub team is working on making a more complete selection of modules available for the TF2 release.
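For illustration, here is a minimal sketch of that TF2-style usage with hub.KerasLayer; the module URL and the size of the classification head are placeholders, not something prescribed by the answer above:

import tensorflow as tf
import tensorflow_hub as hub

# hub.KerasLayer loads a TF2-style SavedModel and forwards Keras's training
# flag to it, so no separate "train"-tagged graph is needed.
feature_extractor = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
    trainable=True)  # batch-norm statistics only update when training=True

model = tf.keras.Sequential([
    feature_extractor,
    tf.keras.layers.Dense(10, activation="softmax"),  # illustrative head size
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) runs the module with training=True; model.predict(...) uses training=False.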
Related
Basically I have two models to run in sequence. However, the first is an object-based model trained in TF2; the second is trained in TF1.x and saved as a name-based checkpoint.
The fundamental conflict here is that in tf.compat.v1 mode I have to call disable_eager_execution to run that model, while the other model needs eager execution (otherwise it is ~2.5 times slower).
I tried to find a way to convert the TF1 checkpoint to an object-based TF2 model, but I don't think there is an easy way... maybe I have to rebuild the model and copy the weights over variable by variable (nightmare).
So does anyone know if there's a way to just temporarily turn off eager execution? That would solve everything here... I would really appreciate it!
I regretfully have to inform you that, in my experience, this is not possible. I had the same issue. I believe the TensorFlow documentation actually states that once eager execution is turned off, it stays off for the remainder of the session; you cannot turn it back on even if you try. This is a problem any time you disable eager execution, and the setting persists as long as the TensorFlow module is loaded in a particular Python instance.
My suggestion for transferring the model weights and biases is to dump them to a pickle file as numpy arrays. I know it's possible because the TensorFlow 1.x model I was using did this in its code (I didn't write that model). I ended up loading that pickle file and reconstructing a new TensorFlow 2.x model via a for loop. This works well for sequential models; if any branching occurs, the looping method won't work well, or it will be hard to implement successfully.
As a heads up, unless you want to train the model further, the best way to initialize those weights is to use tf.constant_initializer (or something along those lines). When I converted the model to TensorFlow 2.x, I ended up creating a custom initializer, but apparently you can just use a regular initializer and then set the weights and biases via model or layer attributes or functions.
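As a rough sketch of that dump-and-rebuild approach (the file path, layer sizes, and the assumption of a plain stack of Dense layers are all illustrative):

import pickle
import numpy as np
import tensorflow as tf

# In the TF1.x process: dump the trained variables as numpy arrays, e.g.
#   weights = sess.run(tf.compat.v1.trainable_variables())
#   with open("weights.pkl", "wb") as f:
#       pickle.dump([np.asarray(w) for w in weights], f)

# In a fresh TF2 process: rebuild an equivalent model and copy the arrays in.
with open("weights.pkl", "rb") as f:
    arrays = pickle.load(f)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(10),
])

# Pair up (kernel, bias) arrays with the layers in order; this only works for a
# purely sequential model, as noted above.
it = iter(arrays)
for layer in model.layers:
    layer.set_weights([next(it), next(it)])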
I ultimately had to convert the Tensorflow 1.x + compat code to Tensorflow 2.X, so I could train the models natively in Tensorflow 2.X.
I wish I could offer better news and information, but this was my experience and solution to the same problem.
I'm using models I didn't create but modified (from this repo https://github.com/GeorgeSeif/Semantic-Segmentation-Suite)
I have trained models and can use them to predict well enough, but I want to run entire folders of images through and split the work between multiple GPUs. I don't fully understand how tf.device() works, and what I have tried didn't work at all.
I assumed I could do something like so:
for i, d in enumerate(['/gpu:0', '/gpu:1']):
    with tf.device(d):
        output = sess.run(network, feed_dict={net_input: image_batch[i]})
But this doesn't actually allocate the tasks to the different GPUs, and it doesn't raise an error either.
My question is: is it possible to allocate the different images to different instances of the session on separate GPUs without explicitly modifying the network code before training? I would like to avoid running two different Python scripts with CUDA_VISIBLE_DEVICES=...
Is there a simple way to do this?
From what I understand, the definitions of the operations have to be nested in a "with tf.device()" block. However, when inferencing, the only operation is loading the model and weights, and if I put that in a "with tf.device()" block I get an error saying the graph already exists and cannot be defined twice.
tf.device only applies when building the graph, not executing it, so wrapping session.run calls in a device context does nothing.
Instead I recommend you use TF-Replicator or a tf.distribute distribution strategy (tf.distribute / tf.contrib.distribute depending on the TF version), specifically MirroredStrategy.
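For example, a minimal TF2-style sketch with MirroredStrategy (build_network and the images array are placeholders; in older TF2 releases strategy.run was named strategy.experimental_run_v2):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()      # one replica per visible GPU
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = build_network()                      # variables are mirrored on every GPU

dataset = tf.data.Dataset.from_tensor_slices(images).batch(8)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

@tf.function
def predict_step(batch):
    return model(batch, training=False)

# Each batch drawn from dist_dataset is automatically split across the replicas.
per_replica_outputs = [strategy.run(predict_step, args=(batch,)) for batch in dist_dataset]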
I have a standard pipeline that evaluates the model after training an epoch. I need resnet50 to be finetunable while training, so I instantiate like so:
resnet50_module = hub.Module("https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/1",
                             trainable=True, name="resnet50_finetunable", tags={"train"})
However, I read here that I should unset the tags when evaluating.
I realize that I can save the model, close the session, reset the graph, rebuild the model with tags=None and load the weights from a checkpoint to do the eval. This seems very wasteful, especially since the model is huge due to resnet50 and I need to run hundreds of epochs to get good results. Is there a way to alternate between tags without this?
Thanks!
I'm afraid there is no good way to do this without going through a checkpoint.
Variables are created when hub.Module() is called, so they are tied to a particular graph version (tags={"train"} for training or the empty tag set for inference). What you describe could be read as a feature request to set that separately for each application of the module, but that doesn't exist yet (and has some ramifications).
Is checkpointing to local disk really that expensive compared to the eval you want to run? Wouldn't you want to checkpoint periodically anyway, to allow resuming after a crash?
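For reference, a sketch of that checkpoint round-trip (the checkpoint path is illustrative; the module URL and name are the ones from the question):

import tensorflow as tf
import tensorflow_hub as hub

MODULE = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/1"
CKPT = "/tmp/finetune/model.ckpt"   # illustrative path

# Training graph: instantiate the module with the "train" tag set.
train_graph = tf.Graph()
with train_graph.as_default():
    module = hub.Module(MODULE, trainable=True, tags={"train"},
                        name="resnet50_finetunable")
    # ... build the loss and train_op on top of module(...) here ...
    saver = tf.train.Saver()
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
    # ... run training steps for one epoch ...
    saver.save(sess, CKPT)

# Evaluation graph: same module name but without the "train" tags, so batch
# norm runs in inference mode; the fine-tuned weights come from the checkpoint.
eval_graph = tf.Graph()
with eval_graph.as_default():
    module = hub.Module(MODULE, trainable=False, name="resnet50_finetunable")
    # ... build the eval metrics on top of module(...) here ...
    saver = tf.train.Saver()
with tf.Session(graph=eval_graph) as sess:
    saver.restore(sess, CKPT)
    # ... run evaluation ...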
I am using Tensorflow + Python.
I am curious if I can release a saved TensorFlow model (architecture + trained variables) without the detailed source code. I'm aware of tf.train.Saver(), but it appears to save only the variables, and in order to restore/run them, a user needs to "define" the same architecture.
For the testing/running purpose only, is there a way to release a saved {architecture+trained variables} without source code, so that a user can just cast a query and get a result?
The TensorFlow Serving project is intended to make this use case straightforward (assuming that the end user is only using the model for inference, not training). TensorFlow Serving includes an Exporter class that takes your tf.train.Saver, the tf.GraphDef that defines your overall model, and a "signature" that describes the inputs to and output from your model.
The basics tutorial has a good introduction to exporting your model.
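As a rough sketch of that export path, based on the old session_bundle Exporter API this answer refers to (later superseded by the SavedModel format); x, y, sess, and the export path are placeholders for the model's actual input/output tensors and session:

import tensorflow as tf
from tensorflow_serving.session_bundle import exporter

saver = tf.train.Saver(sharded=True)
model_exporter = exporter.Exporter(saver)
# The signature records which tensors a client feeds and fetches.
signature = exporter.classification_signature(input_tensor=x, scores_tensor=y)
model_exporter.init(sess.graph.as_graph_def(),
                    default_graph_signature=signature)
model_exporter.export("/tmp/export", tf.constant(1), sess)  # version 1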
You can build a Saver from the MetaGraphDef (saved alongside checkpoints by default: those .meta files) and then use that Saver to restore your model, so users don't have to re-define your graph in their code. But they still need to figure out the model signature (input and output variables). I solve this using graph collections (tf.add_to_collection / tf.get_collection), but I am interested in finding better ways to do it as well; a minimal sketch follows after the links below.
You can take a look at my example implementation (eval.py evaluates a model without re-defining it):
reconstruct saver from meta graph https://github.com/falcondai/cifar10/blob/master/eval.py#L18
get input variables from collections https://github.com/falcondai/cifar10/blob/master/eval.py#L58
how to define your model https://github.com/falcondai/cifar10/blob/master/models/cp2f3d.py
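A minimal sketch of that MetaGraphDef approach (TF1-style; the checkpoint path, collection keys, and my_query input are illustrative, not taken from the linked repo):

import tensorflow as tf

# At export time, the author records the signature tensors in named collections:
#   tf.add_to_collection("inputs", input_placeholder)
#   tf.add_to_collection("outputs", logits)
#   tf.train.Saver().save(sess, "/tmp/model.ckpt")   # also writes model.ckpt.meta

# At load time, the user never re-defines the architecture in code:
with tf.Session() as sess:
    saver = tf.train.import_meta_graph("/tmp/model.ckpt.meta")  # rebuilds the graph
    saver.restore(sess, "/tmp/model.ckpt")                      # restores the weights
    inputs, = tf.get_collection("inputs")
    outputs, = tf.get_collection("outputs")
    result = sess.run(outputs, feed_dict={inputs: my_query})    # my_query: user's data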