I used this code for training a model.
I now have 3 files:
model.ckpt-1.meta
model.ckpt-1.index
model.ckpt-1.data-00000-of-00001
How (with what methods) can I use these models now?
I'm not exactly sure what you mean by
How (with what methods) can I use these models now?
The model is not saved in those files, but it can be restored from them.
Those *.ckpt files get saved during training but do not contain your model on their own. If you want to "use" your model, you need to restore those files into it. Take a look at TensorFlow's Checkpoint and CheckpointManager. This tutorial shows a simple snippet of how to restore .ckpt files into your model.
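For files named like the ones above, the .meta file suggests they were written by a TF1-style tf.train.Saver; under that assumption, a minimal restore sketch looks roughly like this (if you are on TF2 with tf.train.Checkpoint, follow the tutorial above instead):
import tensorflow as tf  # TF 1.x API; use tf.compat.v1 under TF 2.x
# Rebuild the saved graph from the .meta file, then load the trained variables into it
saver = tf.train.import_meta_graph("model.ckpt-1.meta")
with tf.Session() as sess:
    saver.restore(sess, "model.ckpt-1")  # note: the checkpoint prefix, not one of the three files
    # look up your input/output tensors by name on tf.get_default_graph() and sess.run() them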
I have successfully trained a convolutional neural network model using Google Colab in a file named model_prep.py. The model reaches 92% accuracy. Now that I'm happy with the model, I have used PyTorch to save it.
torch.save(model, '/content/drive/MyDrive/myModel.pt')
My understanding of this is that once the model has been fully trained, I can use PyTorch to save the trained model so that it can be loaded into future projects for predictions on new data. Therefore I created a separate test.py file where I loaded the trained model like so,
model = torch.load('/content/drive/MyDrive/myModel.pt')
model.eval()
But within the new test.py file, I receive an error message
AttributeError: Can't get attribute 'ResNet1D' on <module '__main__'>
This error does not occur, however, when loading the model in the same notebook where it was trained (model_prep.py). It only occurs when loading the model into a separate notebook with no model architecture defined. How do I go about this problem? I would like to load the trained model into a new, separate file to run it on new data. Can someone suggest a solution?
In the future, I would like to create a GUI using tkinter and deploy the trained model to check predictions using new data within the tkinter file. Is this possible?
I was facing the same error as well. What this is trying to say is: create an instance of your model by calling the class, and then do torch.load().
If you go to PyTorch's post on Saving and Loading Models, in the load section you can clearly see this line: # Model class must be defined somewhere.
Hence I would recommend that, in your test.py file, you define (or import) the model class just as you did in model_prep.py (the file where you created your model), and then load as shown below.
model = ModelClass()  # the model class must be defined or imported in this file
model = torch.load(PATH, map_location=torch.device('cpu'))  # <-- map_location if the current device is CPU
model.eval()  # <-- puts the model in evaluation mode instead of training mode
As stated by the PyTorch blog (here): "Saving a model in this way will save the entire module using Python's pickle module. The disadvantage of this approach is that the serialized data is bound to the specific classes and the exact directory structure used when the model is saved. The reason for this is that pickle does not save the model class itself. Rather, it saves a path to the file containing the class, which is used during load time. Because of this, your code can break in various ways when used in other projects or after refactors."
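(For what it's worth, the approach that same page recommends instead is saving only the state_dict, which sidesteps the pickled-class path problem; a minimal sketch, with the constructor arguments left out as placeholders:)
# In model_prep.py, after training
torch.save(model.state_dict(), '/content/drive/MyDrive/myModel_state.pt')
# In test.py -- the ResNet1D class must still be defined or imported here
model = ResNet1D()  # pass the same constructor arguments you used for training
model.load_state_dict(torch.load('/content/drive/MyDrive/myModel_state.pt'))
model.eval()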
I fixed this with the TorchScript approach. We save the model with
model_scripted = torch.jit.script(model)  # Export to TorchScript
model_scripted.save('model_scripted.pt')  # Save
and to load it
model = torch.jit.load('model_scripted.pt')
model.eval()
More details here
I trained a computer vision model in google colab.
Now I want to get it to my computer to work with it.
If I do model.save("name.h5") and then download the .h5 file, does this save the trained model?
Or is it just the untrained structure of the model?
Yes, model.save("name.h5") saves the trained model. Of course, you should execute this line after you have trained/fit the model.
However, model.save() only saves the model structure and the updated weights; it does not reliably preserve the optimizer and loss state needed to resume training exactly where it left off. Therefore, you should avoid re-training your model after loading it from the saved file.
If you are willing to further re-train your model, you should use tf.keras.models.save_model() to save a model, and tf.keras.models.load_model() to load a model. Further information can be found in the Keras documentation.
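A minimal sketch of that round trip (the filename and x_new are placeholders for your own file and new input data):
import tensorflow as tf
model.save("name.h5")  # or: tf.keras.models.save_model(model, "name.h5")
restored = tf.keras.models.load_model("name.h5")
restored.summary()  # same architecture and trained weights as before
predictions = restored.predict(x_new)  # x_new: your new input data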
I am using Tensorflow + Python.
I am curious if I can release a saved TensorFlow model (architecture + trained variables) without detailed source code. I'm aware of tf.train.Saver(), but it looks like it saves only the variables, and in order to restore/run them, a user needs to "define" the same architecture.
For the testing/running purpose only, is there a way to release a saved {architecture+trained variables} without source code, so that a user can just cast a query and get a result?
The TensorFlow Serving project is intended to make this use case straightforward (assuming that the end user is only using the model for inference, not training). TensorFlow Serving includes an Exporter class that takes your tf.train.Saver, the tf.GraphDef that defines your overall model, and a "signature" that describes the inputs to and output from your model.
The basics tutorial has a good introduction to exporting your model.
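(TF Serving consumes models that have been exported to disk; in current TensorFlow the equivalent of this export step is the SavedModel format. A rough TF1-style sketch, where x and y stand in for your real input and output tensors:)
import tensorflow as tf
# Bundle the graph, the trained variables, and a named input/output signature
tf.saved_model.simple_save(
    sess, "export/1",
    inputs={"input": x},
    outputs={"output": y},
)
# A user can later load and run it without your model-building code:
# with tf.Session(graph=tf.Graph()) as s:
#     tf.saved_model.loader.load(s, [tf.saved_model.tag_constants.SERVING], "export/1")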
You can build a Saver from the MetaGraphDef (saved alongside checkpoints by default: those .meta files) and then use that Saver to restore your model, so users don't have to re-define your graph in their code. But they still need to figure out the model's signature (input and output variables). I solve this using graph collections (tf.add_to_collection), but I am interested in finding better ways to do it as well.
You can take a look at my example implementation (eval.py evaluates a model without re-defining it); a rough sketch of the pattern follows these links:
reconstruct saver from meta graph https://github.com/falcondai/cifar10/blob/master/eval.py#L18
get input variables from collections https://github.com/falcondai/cifar10/blob/master/eval.py#L58
how to define your model https://github.com/falcondai/cifar10/blob/master/models/cp2f3d.py
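A rough sketch of that pattern (the collection keys and tensor names are just placeholders; the linked files show the real code):
import tensorflow as tf
# At training time: tag the tensors users will need, then save
tf.add_to_collection("inputs", x)
tf.add_to_collection("outputs", logits)
saver = tf.train.Saver()
saver.save(sess, "ckpt/model")  # also writes ckpt/model.meta (the MetaGraphDef)
# In a separate evaluation program: no model-building code required
saver = tf.train.import_meta_graph("ckpt/model.meta")
with tf.Session() as sess:
    saver.restore(sess, "ckpt/model")
    x = tf.get_collection("inputs")[0]
    logits = tf.get_collection("outputs")[0]
    preds = sess.run(logits, feed_dict={x: batch})  # batch: your input data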
From what I've gathered so far, there are several different ways of dumping a TensorFlow graph into a file and then loading it into another program, but I haven't been able to find clear examples/information on how they work. What I already know is this:
Save the model's variables into a checkpoint file (.ckpt) using a tf.train.Saver() and restore them later (source)
Save a model into a .pb file and load it back in using tf.train.write_graph() and tf.import_graph_def() (source)
Load in a model from a .pb file, retrain it, and dump it into a new .pb file using Bazel (source)
Freeze the graph to save the graph and weights together (source)
Use as_graph_def() to save the model, and for weights/variables, map them into constants (source)
However, I haven't been able to clear up several questions regarding these different methods:
Regarding checkpoint files, do they only save the trained weights of a model? Could checkpoint files be loaded into a new program, and be used to run the model, or do they simply serve as ways to save the weights in a model at a certain time/stage?
Regarding tf.train.write_graph(), are the weights/variables saved as well?
Regarding Bazel, can it only save into/load from .pb files for retraining? Is there a simple Bazel command just to dump a graph into a .pb?
Regarding freezing, can a frozen graph be loaded in using tf.import_graph_def()?
The Android demo for TensorFlow loads in Google's Inception model from a .pb file. If I wanted to substitute my own .pb file, how would I go about doing that? Would I need to change any native code/methods?
In general, what exactly is the difference between all these methods? Or more broadly, what is the difference between as_graph_def()/.ckpt/.pb?
In short, what I'm looking for is a method to save both a graph (as in, the various operations and such) and its weights/variables into a file, which can then be used to load the graph and weights into another program, for use (not necessarily continuing/retraining).
Documentation about this topic isn't very straightforward, so any answers/information would be greatly appreciated.
There are many ways to approach the problem of saving a model in TensorFlow, which can make it a bit confusing. Taking each of your sub-questions in turn:
The checkpoint files (produced e.g. by calling saver.save() on a tf.train.Saver object) contain only the weights, and any other variables defined in the same program. To use them in another program, you must re-create the associated graph structure (e.g. by running code to build it again, or calling tf.import_graph_def()), which tells TensorFlow what to do with those weights. Note that calling saver.save() also produces a file containing a MetaGraphDef, which contains a graph and details of how to associate the weights from a checkpoint with that graph. See the tutorial for more details.
tf.train.write_graph() only writes the graph structure; not the weights.
Bazel is unrelated to reading or writing TensorFlow graphs. (Perhaps I misunderstand your question: feel free to clarify it in a comment.)
A frozen graph can be loaded using tf.import_graph_def(). In this case, the weights are (typically) embedded in the graph, so you don't need to load a separate checkpoint.
The main change would be to update the names of the tensor(s) that are fed into the model, and the names of the tensor(s) that are fetched from the model. In the TensorFlow Android demo, this would correspond to the inputName and outputName strings that are passed to TensorFlowClassifier.initializeTensorFlow().
The GraphDef is the program structure, which typically does not change through the training process. The checkpoint is a snapshot of the state of a training process, which typically changes at every step of the training process. As a result, TensorFlow uses different storage formats for these types of data, and the low-level API provides different ways to save and load them. Higher-level libraries, such as the MetaGraphDef libraries, Keras, and skflow build on these mechanisms to provide more convenient ways to save and restore an entire model.
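To make the GraphDef-versus-checkpoint distinction concrete, here is a rough sketch of how both come out of a typical TF1 training script (the paths are placeholders):
import tensorflow as tf
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # ... training steps ...
    tf.train.write_graph(sess.graph_def, "out", "graph.pbtxt")  # program structure only, no weights
    saver.save(sess, "out/model.ckpt")  # variable values, plus a .meta MetaGraphDef alongside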
You can try the following code:
import tensorflow as tf  # TF 1.x API (tf.compat.v1 in TF 2.x)
# Read the frozen graph (.pb) and import it into a fresh graph
with tf.gfile.FastGFile('model/frozen_inference_graph.pb', "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
g_in = tf.Graph()
with g_in.as_default():
    tf.import_graph_def(graph_def, name="")  # returns None; the imported ops land in g_in
sess = tf.Session(graph=g_in)
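Once the session exists, tensors can be fetched by name; the names below are placeholders for whatever your frozen graph actually uses:
input_t = sess.graph.get_tensor_by_name("input:0")
output_t = sess.graph.get_tensor_by_name("output:0")
result = sess.run(output_t, feed_dict={input_t: data})  # data: your input batch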