MultiLayerNetwork saved in DL4J is not loading in Python

I have been working with neural networks in Deeplearning4j and now need to switch to Python. To use the same model (a MultiLayerNetwork in DL4J), I saved it as an .h5 file, like this:
File newFile = new File("newModel.h5");
ModelSerializer.writeModel(network, newFile, true);
Now, when I try to load it in Python, I get the following error:
OSError: SavedModel file does not exist at: newModel.h5/{saved_model.pbtxt|saved_model.pb}
I have tried different extensions like .pb and used both relative and absolute paths in Python. Nothing helped. Can anyone explain why this happens? There seems to be very little information about this issue on the internet, and it looks like the only way to get the same model in Python is to train a new one, etc.

A DL4J model is a zip file. Could you clarify what you're trying to do? If you imported it from Keras and need to resave it, the best you can do is export the weights as a NumPy array and recreate the architecture. You can do that with model.params(), which gives you the weights.
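If you go that route, the Python side could look roughly like the sketch below. The file name, the layer sizes, and the assumption that the exported flat parameter vector maps onto [weights, bias] in that order are all illustrative; the exact layout depends on how you export from DL4J and on your actual architecture.

# rough Python-side sketch: rebuild the architecture in Keras and load exported weights
import numpy as np
import tensorflow as tf

params = np.load("params.npy")          # hypothetical file holding the flat DL4J parameter vector

# recreate the same architecture (here: a single 784 -> 10 dense layer, purely as an example)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="softmax", input_shape=(784,))
])

# split the flat vector back into the layer's weight matrix and bias
w = params[:784 * 10].reshape(784, 10)
b = params[784 * 10:784 * 10 + 10]
model.layers[0].set_weights([w, b])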

Related

Unable to load pre-trained model checkpoint with TensorFlow Object Detection API

Similar to this question:
Where can I find model.ckpt in faster_rcnn_resnet50_coco model? (this solution doesn't work for me)
I have downloaded the ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8 with the intention of using it as a starting point. I am using the sample model configuration associated with that model in the TF model zoo.
I am only changing the number of classes and the paths for fine-tuning, training and eval.
With:
fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\model.ckpt"
I get:
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\model.ckpt
With:
fine_tune_checkpoint: "C:\\Users\\Peter\\Desktop\\Adv-ML-Project\\models\\research\\object_detection\\test_data\\checkpoint\\ckpt-0.*"
I get:
tensorflow.python.framework.errors_impl.DataLossError: Unable to open table file C:\Users\Pierre\Desktop\Adv-ML-Project\models\research\object_detection\test_data\checkpoint\ckpt-0.data-00000-of-00001: Data loss: not an sstable (bad magic number): perhaps your file is in a different file format and you need to use a different restore operator?
I'm currently using absolute paths because it's easiest, but if it's a problem I can re-organize my project structure.
Checkpoint Folder
The official documentation from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md
says to do something like
fine_tune_checkpoint: a path prefix to the pre-existing checkpoint (ie:"/usr/home/username/checkpoint/model.ckpt-#####").
Is there something I am doing wrong here? I am running this with the following command (also from documentation):
python object_detection/model_main_tf2.py \
--pipeline_config_path="C:\\Users\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\my_model.config" \
--model_dir="C:\\Users\\Pierre\\Desktop\\Adv-ML-Project\\models\\my_model\\training" \
--alsologtostderr
Try changing the fine_tune_checkpoint path in the config file to something like path_to_folder/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint/ckpt-0
And in your training command, set the model_dir flag to point to just the model directory; don't include the training subfolder, e.g. --model_dir=<path_to>/ssd_resnet152_v1_fpn_1024x1024_coco17_tpu-8
Source
Just change the backslashes to forward slashes, since you're on Windows.
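If you want to sanity-check the checkpoint prefix from Python before launching training, something like the following could help; the directory below is the one from the question, written with forward slashes, and is only an example path:

# quick sanity check: does TensorFlow see a checkpoint in this folder?
import tensorflow as tf

ckpt_dir = "C:/Users/Pierre/Desktop/Adv-ML-Project/models/research/object_detection/test_data/checkpoint"

# prints the checkpoint prefix (e.g. ".../checkpoint/ckpt-0") or None if nothing usable is found
print(tf.train.latest_checkpoint(ckpt_dir))

Whatever prefix this prints (without any .index or .data suffix) is what fine_tune_checkpoint should point to.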

Correct pb file to move Tensorflow model into ML.NET

I have a TensorFlow model that I built (a 1D CNN) that I would now like to implement into .NET.
In order to do so I need to know the Input and Output nodes.
When I upload the model to Netron I get a different graph depending on my save method, and the only one that looks correct comes from an h5 upload. Here is the model.summary():
If I save the model as an h5 with model.save("Mn_pb_model.h5") and load that into Netron to graph it, everything looks correct:
However, ML.NET will not accept the h5 format, so it needs to be saved as a pb.
In looking through samples of adopting TensorFlow in ML.NET, this sample shows a TensorFlow model that is saved in a format similar to the SavedModel format recommended by TensorFlow (and also recommended by ML.NET here: "Download an unfrozen [SavedModel format] ..."). However, when saving and loading the pb file into Netron I get this:
And zoomed in a little further (on the far right side),
As you can see, it looks nothing like it should.
Additionally the input nodes and output nodes are not correct so it will not work for ML.NET (and I think something is wrong).
I am using the recommended way from TensorFlow to determine the Input / Output nodes:
When I try to obtain a frozen graph and load it into Netron, at first it looks correct, but I don't think that it is:
There are four reasons I do not think this is correct.
1. It is very different from the graph when it was uploaded as an h5 (which looks correct to me).
2. As you can see from earlier, I am using 1D convolutions throughout and this is showing that it goes to 2D (and remains that way).
3. The file size is 128MB, whereas the one in the TensorFlow to ML.NET example is only 252KB. Even the Inception model is only 56MB.
4. If I load the Inception model in TensorFlow and save it as an h5, it looks the same as from the ML.NET resource, yet when I save it as a frozen graph it looks different. If I take the same model and save it in the recommended SavedModel format, it shows up all messed up in Netron. Take any model you want and save it in the recommended SavedModel format and you will see for yourself (I've tried it on a lot of different models).
Additionally, in looking at the model.summary() of Inception with its graph, it is similar to its graph in the same way my model.summary() is to the h5 graph.
It seems like there should be an easier way (and a correct way) to save a TensorFlow model so it can be used in ML.NET.
Please show that your suggested solution works: In the answer that you provide, please check that it works (load the pb model [this should also have a Variables folder in order to work for ML.NET] into Netron and show that it is the same as the h5 model, e.g., screenshot it). So that we are all trying the same thing, here is a link to a MNIST ML crash course example. It takes less than 30s to run the program and produces a model called my_model. From here you can save it according to your method and upload it to see the graph on Netron. Here is the h5 model upload:
This answer is made of 3 parts:
going through other programs
NOT going through other programs
Difference between op-level graph and conceptual graph (and why Netron shows you different graphs)
1. Going through other programs:
ML.NET needs an ONNX model, not a pb file.
There are several ways to convert your model from TensorFlow to an ONNX model you can load in ML.NET:
With WinMLTools tools: https://learn.microsoft.com/en-us/windows/ai/windows-ml/convert-model-winmltools
With MMdnn: https://github.com/microsoft/MMdnn
With tf2onnx: https://github.com/onnx/tensorflow-onnx
If trained with Keras, with keras2onnx: https://github.com/onnx/keras-onnx
This SO post could help you too: Load model with ML.NET saved with keras
And here you will find more information on the h5 and pb file formats, what they contain, etc.: https://www.tensorflow.org/guide/keras/save_and_serialize#weights_only_saving_in_savedmodel_format
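For example, with tf2onnx the conversion can be done directly from Python. A minimal sketch, assuming `model` is your trained tf.keras model and tf2onnx >= 1.9 is installed:

# convert a Keras model straight to an ONNX file that ML.NET can consume
import tensorflow as tf
import tf2onnx

spec = (tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="model.onnx")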
2. But you are asking "TensorFlow -> ML.NET without going through other programs":
2.A An overview of the problem:
First, the pb file you made using the code you provided seems, from what you say, not to be the same as the one used in the example you mentioned in a comment (https://learn.microsoft.com/en-us/dotnet/machine-learning/tutorials/text-classification-tf).
Could you try to use the pb file generated via tf.saved_model.save? Does that work?
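For reference, writing a SavedModel-format pb is a one-liner; a tiny sketch, assuming `model` is your trained tf.keras model and "saved_model_dir" is just an example path:

import tensorflow as tf

# writes saved_model.pb plus an accompanying variables/ folder into the target directory
tf.saved_model.save(model, "saved_model_dir")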
A thought about this Microsoft blog post:
From this page we can read:
In ML.NET you can load a frozen TensorFlow model .pb file (also called "frozen graph def", which is essentially a serialized graph_def protocol buffer written to disk)
and:
That TensorFlow .pb model file that you see in the diagram (and the labels.txt codes/Ids) is what you create/train in Azure Cognitive Services Custom Vision and then export as a frozen TensorFlow model file to be used by ML.NET C# code.
So, this pb file is a type of file generated from Azure Cognitive Services Custom Vision.
Perhaps you could try this way too?
2.B Now, we'll try to provide the solution:
In fact, in TensorFlow 1.x you could save a frozen graph easily, using freeze_graph.
But TensorFlow 2.x does not support freeze_graph and convert_variables_to_constants.
You can read some useful information here too: Tensorflow 2.0 : frozen graph support
Some users are wondering how to do this in TF 2.x: how to freeze graph in tensorflow 2.0 (https://github.com/tensorflow/tensorflow/issues/27614)
There are, however, some solutions to create the pb file you want to load in ML.NET (a condensed sketch follows the links below):
https://leimao.github.io/blog/Save-Load-Inference-From-TF2-Frozen-Graph/
How to save Keras model as frozen graph? (already linked in your question though)
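Condensing those links, the TF2 approach is to wrap the model in a concrete function, fold its variables into constants, and write the resulting graph to disk. A rough sketch, assuming `model` is your trained tf.keras model (names and paths are only examples):

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

# wrap the Keras model in a concrete function with a fixed input signature
full_model = tf.function(lambda x: model(x))
full_model = full_model.get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

# freeze: turn the variables into constants embedded in the graph
frozen_func = convert_variables_to_constants_v2(full_model)

# write the frozen graph to a single .pb file
tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
                  logdir=".",
                  name="frozen_graph.pb",
                  as_text=False)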
3. Difference between op-level graph and conceptual graph (and why Netron shows you different graphs):
As #mlneural03 said in a comment to your question, Netron shows a different graph depending on what file format you give it:
If you load an h5 file, Netron will display the conceptual graph
If you load a pb file, Netron will display the op-level graph
What is the difference between an op-level graph and a conceptual graph?
In TensorFlow, the nodes of the op-level graph represent the operations ("ops"), like tf.add, tf.matmul, tf.linalg.inv, etc.
The conceptual graph shows you your model's structure.
Those are completely different things.
"ops" is an abbreviation for "operations".
Operations are nodes that perform the computations.
So, that's why you get a very large graph with a lot of nodes when you load the pb file in Netron: you see all the computation nodes of the graph.
But when you load the h5 file in Netron, you "just" see your model's structure, the design of your model.
In TensorFlow, you can view your graph with TensorBoard:
By default, TensorBoard displays the op-level graph.
To view the conceptual graph, in TensorBoard, select the "keras" tag.
There is a Jupyter Notebook that explains very clearly the difference between the op-level graph and the conceptual graph here: https://colab.research.google.com/github/tensorflow/tensorboard/blob/master/docs/graphs.ipynb
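Logging the graph to TensorBoard only takes a callback; a small sketch, assuming a compiled tf.keras model and some training data (x_train, y_train):

import tensorflow as tf

# write the graph to ./logs during training
tb_callback = tf.keras.callbacks.TensorBoard(log_dir="logs", write_graph=True)
model.fit(x_train, y_train, epochs=1, callbacks=[tb_callback])

Then run tensorboard --logdir logs: the Graphs tab shows the op-level graph by default, and selecting the "keras" tag switches to the conceptual graph.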
You can also read this "issue" on the TensorFlow Github too, related to your question: https://github.com/tensorflow/tensorflow/issues/39699
In a nutshell:
In fact there is no problem, just a little misunderstanding (and that's OK, we can't know everything).
You would like to see the same graph when loading the h5 file and the pb file in Netron, but that cannot work, because the files do not contain the same graphs. These graphs are two ways of displaying the same model.
The pb file created with the method we described will be the correct pb file to load with ML.NET, as described in the Microsoft tutorial we talked about. So, if you load your pb file as described in those tutorials, you will load your real/true model.

How to convert matterport mask_rcnn keras(.h5) model to coreml model(.mlmodel) using coremltools

I trained a model using Matterport Mask R-CNN. I already have the .h5 model file but I am not able to convert it to .mlmodel, as there are many custom layers involved. I have already tried whatever I could find on Google regarding this. I also tried https://github.com/edouardlp/Mask-RCNN-CoreML for the conversion. So far, no success.
Has anybody been able to do the conversion successfully? If yes, can you share the codebase or a tutorial for it?
I was able to convert using the same GitHub repo mentioned in the question. But you can't debug the code in Xcode, as Mask R-CNN is too memory heavy. It's better to use another architecture like DeepLab.
Here's a github project https://github.com/edouardlp/Mask-RCNN-CoreML/releases/tag/0.2 with a MaskRCNN.ml model.
Note: You have to copy the models into the project to get it to compile.

Tensorflow, change checkpoint model (.meta .index .data) to frozen model (.pb)

I am not familiar with tensorflow.
I want to convert this network, https://github.com/jiangsutx/SRN-Deblur, from TensorFlow to NVIDIA TensorRT. It needs a '.pb' model file, but the project only provides three model files, as follows:
deblur.model-52300.data-00000-of-00001
deblur.model-52300.index
deblur.model-52300.meta
So I want to transform these files to a '.pb' file.
I have tested the ideas given by:
https://blog.metaflow.fr/tensorflow-how-to-freeze-a-model-and-serve-it-with-a-python-api-d4f3596b3adc
Tensorflow: How to convert .meta, .data and .index model files into one graph.pb file
The problem is that both of these approaches fail because get_checkpoint_state() and latest_checkpoint() return None.
Is this caused by the missing checkpoint file?
Are there other ways to implement this?
Any idea is appreciated.
Thanks.
As you can see from their own repo: they use get_checkpoint_state to test a pre-trained model.
https://github.com/jiangsutx/SRN-Deblur/blob/master/models/model.py#L245
So I'd say yes, it is because of the missing checkpoint file, which is not provided by the author.
From experience, usually the first method from metaflow works quite well.
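That said, if only the checkpoint state file is missing but the .meta/.index/.data files are present, you can bypass get_checkpoint_state entirely, restore from the explicit prefix, and freeze the graph yourself. A rough TF1-style sketch, assuming the prefix from the question and a placeholder output node name (substitute the network's real output node(s)):

import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

ckpt_prefix = "checkpoints/deblur.model-52300"   # path prefix, not a single file
output_node_names = ["output_node"]              # hypothetical; replace with the real output node name(s)

with tf.Session() as sess:
    # rebuild the graph from the .meta file and restore the weights from .index/.data
    saver = tf.train.import_meta_graph(ckpt_prefix + ".meta")
    saver.restore(sess, ckpt_prefix)             # no 'checkpoint' state file needed here

    # fold the variables into constants and write a single .pb file
    frozen = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, output_node_names)
    with tf.io.gfile.GFile("frozen_model.pb", "wb") as f:
        f.write(frozen.SerializeToString())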

Issues while interfacing caffe with c++ or python

What I have read in the tutorials is that you create your data, then write the model using protobuf, and then write the solver file. Finally, you train the model and you get your generated file. All of this is done through the command line. Now there are two questions:
1) Suppose I have the generated model; how do I load a new image that is not in the test folder and perform a forward pass? Should it be done through the command line or from some language (C++, Python)?
2) I guess the above was one way of doing it. What is the best way to train the classifier (via the command line or through code), and how do you use the generated model file (after training) in your code?
I want to interface Caffe with my code, but I am not able to find a short tutorial that goes step by step through any dataset, say MNIST; the model doesn't need to be as complicated as LeNet, a simple fully connected layer will also do. Can anyone tell me how to write simple code using C++ or Python and train any dataset from scratch?
Sample C++/Python code for training a classifier and using it to predict new data with Caffe would also be appreciated.
Training is best done using the command line. See this tutorial.
Once you have trained a model and you have a myModel.caffemodel file (a binary file storing the weights of the different layers) and a deploy.prototxt file (a text file describing your net), you can use the Python interface to classify images.
You can run the Python script classify.py to classify image(s) from the command line. This script wraps around classifier.py, a Python object that holds a trained net and allows you to perform forward passes in Python.
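If you prefer to do the forward pass yourself, a minimal pycaffe sketch could look like the following; the file names and the "data"/"prob" blob names are assumptions that depend on your deploy.prototxt:

import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net("deploy.prototxt", "myModel.caffemodel", caffe.TEST)

# load and preprocess an image to match the net's input blob (here: a 28x28 grayscale MNIST digit)
image = caffe.io.load_image("new_image.png", color=False)
image = caffe.io.resize_image(image, (28, 28))
net.blobs["data"].data[...] = image.transpose(2, 0, 1)   # HWC -> CHW

# forward pass and read the predicted class from the output blob
out = net.forward()
print("predicted class:", out["prob"].argmax())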
