Creating a custom LSTMCell in TensorFlow with GPU support - python

I would like to integrate an attentional component into the LSTM model I'm creating. Unfortunately, with TensorFlow 2.3.1 (the version I'm using), it appears that if you subclass LSTMCell, you have to run the model on the CPU. From the TensorFlow documentation:
CuDNN is only available at the layer level, and not at the cell level.
Which means I'm relegated to the CPU if I try something like this:
output = keras.layers.RNN(AttentionLSTMCell(400), return_sequences=True, stateful=False)(input_layer)
Where AttentionLSTMCell is a custom class I made that takes in some additional constants (generally an output of the previous timestep and some new input) that condition the output of the LSTM. In fact, the documentation seems to suggest that only certain kinds of conditioning are allowed. I am about to dig into creating a full custom Layer (perhaps copying the existing one and seeing if I can add my new inputs in call), but is there a better way? This makes prototyping quite difficult. Large recurrent networks are slow to train, especially in my case, where I integrate image data as input.
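For reference, a rough sketch of the kind of cell I mean. The names, shapes and the conditioning itself are placeholders, and as a subclassed cell it still runs on the generic RNN kernel rather than CuDNN:

import tensorflow as tf

class AttentionLSTMCell(tf.keras.layers.LSTMCell):
    """Illustrative only: an LSTMCell whose per-step output is conditioned on an
    extra context tensor passed via the `constants` argument of keras.layers.RNN."""

    def __init__(self, units, context_dim, **kwargs):
        super().__init__(units, **kwargs)
        self.context_dim = context_dim

    def build(self, input_shape):
        # When constants are used, Keras may hand the cell a list of shapes:
        # [step_input_shape, *constants_shapes].
        if isinstance(input_shape, list):
            input_shape = input_shape[0]
        super().build(input_shape)
        # Projects the external context onto the cell's output space.
        self.context_kernel = self.add_weight(
            name="context_kernel", shape=(self.context_dim, self.units),
            initializer="glorot_uniform")

    def call(self, inputs, states, constants=None, training=None):
        h, new_states = super().call(inputs, states, training=training)
        if constants is not None:
            # Mix the external context into the output (and the carried hidden state).
            h = h + tf.matmul(constants[0], self.context_kernel)
            new_states = [h, new_states[1]]
        return h, new_states

# Usage sketch (shapes are arbitrary placeholders):
# seq_in = tf.keras.Input(shape=(None, 128))
# context = tf.keras.Input(shape=(256,))
# out = tf.keras.layers.RNN(AttentionLSTMCell(400, context_dim=256),
#                           return_sequences=True)(seq_in, constants=context)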

Related

Why are there two different flags to disable gradient computation in PyTorch?

I am an intermediate learner of PyTorch, and in some recent cases I have seen people use torch.inference_mode() instead of the famous torch.no_grad() while validating a trained agent in reinforcement learning (RL) experiments. I checked the documentation, and it has a table covering the two flags that disable gradient computation. To be honest, the descriptions read exactly the same to me. Has someone figured out an explanation?
So I have been scraping the web for a few days and I think I have my explanation. torch.inference_mode() was added as an even more optimized way of doing inference with PyTorch (versus torch.no_grad()). I listened to the PyTorch podcast, and they explain why there exist two different flags.
Version counting of tensors: Say you have some PyTorch code that you used to train an agent. When you use torch.no_grad() and just run inference on the trained model, some PyTorch machinery is still in play, such as the version counter that is allocated every time a tensor is created and is incremented (version bump) whenever you mutate that tensor. Keeping track of the versions of all tensors has a computational cost, and under no_grad it can't simply be dropped, because autograd still has to watch for mutations of a tensor, either directly or indirectly through an alias/view of a tensor that was saved for the backward computation.
View tracking of tensors: PyTorch tensors are strided, meaning PyTorch uses strides in the backend for indexing, which lets you directly access specific elements in the memory block. But what happens in torch.autograd if you take a tensor, create a new view of it, and mutate it in a way that affects a tensor involved in the backward computation? Under torch.no_grad(), PyTorch still records some view metadata so it can keep track of which tensors require gradients and which don't. This also adds overhead to your computation.
So torch.autograd checks for these changes, and neither is tracked once you switch to torch.inference_mode() (instead of torch.no_grad()). If your code does not rely on the two behaviours above, inference mode works and reduces execution time. (The PyTorch dev team say they have seen a 5-10% speedup while deploying models in production at Facebook.)
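A toy illustration of the two context managers (the model here is just a stand-in, not from the question):

import torch

model = torch.nn.Linear(10, 2)   # stand-in for a trained network
x = torch.randn(1, 10)

# no_grad: gradients are off, but version counters and view metadata are
# still maintained, so the outputs can later interact with autograd code.
with torch.no_grad():
    y1 = model(x)

# inference_mode: additionally skips version counting and view tracking;
# tensors created here cannot be used later in autograd-recorded operations.
with torch.inference_mode():
    y2 = model(x)

print(y1.requires_grad, y2.requires_grad)   # False False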

How to temporarily turn off/on eager_execution in TF2.x?

Basically I have two models to run in sequence. However, the first is an object-based model trained in TF2, while the second was trained in TF1.x and saved as a name-based ckpt.
The fundamental conflict here is that in tf.compat.v1 mode I have to call disable_eager_execution() to run the model, while the other model needs eager execution (otherwise it is ~2.5 times slower).
I tried to find a way to convert the TF1 ckpt to an object-based TF2 model, but I don't think there is an easy way... maybe I have to rebuild the model and copy the weights variable by variable (nightmare).
So, does anyone know if there's a way to just temporarily turn off eager execution? That would solve everything here... I would really appreciate it!
I regretfully have to inform you that, in my experience, this is not possible. I had the same issue. I believe the TensorFlow documentation actually states that once eager execution is turned off it stays off for the remainder of the session; you cannot turn it back on even if you try. This applies anytime you turn off eager execution, and the status remains as long as the TensorFlow module is loaded in a particular Python instance.
My suggestion for transferring the model weights and biases is to dump them to a pickle file as NumPy arrays. I know it's possible because the TensorFlow 1.x model I was using did this in its code (I didn't write that model). I ended up loading that pickle file and reconstructing a new TensorFlow 2.x model via a for loop. This works well for sequential models. If any branching occurs, the looping method won't work very well, or it will be hard to implement successfully.
As a heads up, unless you want to train the model further, the best way to initialize those weights is to use tf.constant_initializer (or something along those lines). When I converted the model to TensorFlow 2.x, I ended up creating a custom initializer, but apparently you can just use a regular initializer and then set the weights and biases via model or layer attributes or functions.
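A rough sketch of that workflow, with a made-up pickle layout and layer sizes (the real ones depend on your model):

import pickle
import tensorflow as tf

# In the TF1.x process (sketch): dump the variables as NumPy arrays, e.g.
#   weights = [sess.run(v) for v in tf.compat.v1.trainable_variables()]
#   pickle.dump(weights, open("weights.pkl", "wb"))

# In a fresh TF2 process: rebuild the architecture and copy the arrays in.
with open("weights.pkl", "rb") as f:
    weights = pickle.load(f)   # assumed layout: [kernel_1, bias_1, kernel_2, bias_2, ...]

model = tf.keras.Sequential([          # hypothetical architecture
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    tf.keras.layers.Dense(10),
])

for layer, kernel, bias in zip(model.layers, weights[0::2], weights[1::2]):
    layer.set_weights([kernel, bias])  # for further training, an initializer also works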
I ultimately had to convert the Tensorflow 1.x + compat code to Tensorflow 2.X, so I could train the models natively in Tensorflow 2.X.
I wish I could offer better news and information, but this was my experience and solution to the same problem.

TF Keras v 1.14+: subclass model or subclass layer for "module"

TensorFlow has some docs for subclassing the (tf.)Keras Model and Layer.
However, it is unclear which to use for "modules" or "blocks" (e.g. several layers collectively).
Since it is technically several layers, I feel that subclassing Layer would be misleading, and while subclassing Model works, I am unsure whether there are any penalties for doing so.
e.g.
x = inputs
a = self.dense_1(x) # <--- self.dense_1 = tf.keras.layers.Dense(...)
b = self.dense_2(a)
c = self.add([x, b])
which is appropriate to use?
(Please note that this answer is old; Keras later changed to allow and routinely use subclassing.)
Initially, there is no need to subclass anything with Keras. Unless you have a particular reason for it (something beyond building, training and predicting), you don't need to subclass anything in Keras.
Building a Keras model:
Whether you use Sequential (the model is ready, just add layers) or Model (create a graph of layers and finally call Model(inputs, outputs)), you don't need to subclass.
The moment you create an instance of Sequential or Model, you have a fully defined model, ready to use in all situations.
This model can even be used as part of other models, and its layers can easily be accessed to get intermediate outputs and to create new branches in your graph.
So, I don't see any reason at all to subclass Model, unless you are using some additional framework that would require it (but I don't think there is one). This seems to be a habit brought over by PyTorch users (because this kind of model building is typical for PyTorch: create a subclass of Module, add layers and write a call method). But PyTorch doesn't offer the same ease as Keras for getting intermediate results.
The main advantage of using Keras is exactly this: you can easily access layers from blocks and models and instantly start branching from that point, without needing to rebuild any call methods or add any extra code to the models. So, when you subclass Model, you just defeat the purpose of Keras and make everything more difficult.
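For example, with the functional API you can branch off an intermediate layer of an existing model without touching any call methods (a minimal sketch; sizes are arbitrary):

from tensorflow import keras

inputs = keras.Input(shape=(32,))
hidden = keras.layers.Dense(64, activation="relu", name="hidden")(inputs)
outputs = keras.layers.Dense(10)(hidden)
model = keras.Model(inputs, outputs)

# Later: grab the intermediate output and start a new branch from it.
branch = keras.layers.Dense(1)(model.get_layer("hidden").output)
side_model = keras.Model(model.input, branch)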
The docs you mentioned say:
Model subclassing is particularly useful when eager execution is enabled since the forward pass can be written imperatively.
I don't really understand what "imperatively" means, and I don't see how it would be easier than just building a model with regular layers.
Another quote from the docs:
Key Point: Use the right API for the job. While model subclassing offers flexibility, it comes at a cost of greater complexity and more opportunities for user errors. If possible, prefer the functional API.
Well... it's always possible.
Subclassing layers
Here, there may be good reasons to do so. And these reasons are:
You want a layer that performs custom calculations that are not available with regular layers
This layer must have persistent weights.
If you don't need both of the things above, you don't need to subclass a layer. If you just want custom calculations without weights, a Lambda layer is enough.
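A contrived example of the two cases (the layer itself is made up, just to show the pattern):

import tensorflow as tf

# Custom calculation *with* persistent weights -> subclass Layer:
class ScaleShift(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.scale = self.add_weight(name="scale", shape=(input_shape[-1],),
                                     initializer="ones")
        self.shift = self.add_weight(name="shift", shape=(input_shape[-1],),
                                     initializer="zeros")

    def call(self, inputs):
        return inputs * self.scale + self.shift

# Custom calculation *without* weights -> a Lambda layer is enough:
double = tf.keras.layers.Lambda(lambda x: 2.0 * x)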

Deploy Semantic Segmentation Network (U-Net) with TensorRT (no upsampling support)

I am trying to deploy a trained U-Net with TensorRT. The model was trained using Keras (with Tensorflow as backend). The code is very similar to this one: https://github.com/zhixuhao/unet/blob/master/model.py
When I converted the model to UFF format, using some code like this:
import uff
import os

uff_fname = os.path.join("./models/", "model_" + idx + ".uff")
uff_model = uff.from_tensorflow_frozen_model(
    frozen_file=os.path.join('./models', trt_fname),
    output_nodes=output_names,
    output_filename=uff_fname,
)
I get the following warnings:
Warning: No conversion function registered for layer: ResizeNearestNeighbor yet.
Converting up_sampling2d_32_12/ResizeNearestNeighbor as custom op: ResizeNearestNeighbor
Warning: No conversion function registered for layer: DataFormatVecPermute yet.
Converting up_sampling2d_32_12/Shape-0-0-VecPermuteNCHWToNHWC-LayoutOptimizer as custom op: DataFormatVecPermute
I tried to avoid this by replacing the upsampling layer with upsampling (bilinear interpolation) and transposed convolution. But the converter threw me similar errors. I checked https://docs.nvidia.com/deeplearning/sdk/tensorrt-support-matrix/index.html and it seems none of these operations are supported yet.
I am wondering if there is any workaround to this problem? Is there any other format/framework that TensorRT likes and has upsampling supported? Or is it possible to replace it with some other supported operations?
I also saw somewhere that one can add custom operations to replace the unsupported ones for TensorRT, though I am not so sure what that workflow would look like. It would also be really helpful if someone could point to an example of a custom layer.
Thank you in advance!
The warnings are because these operations are not supported yet by TensorRT, as you already mentioned.
Unfortunately there is no easy way to fix this. You either have to modify the graph (even after training) to use only a combination of supported operations, or write these operations yourself as custom layers.
However, there is a better way to run inference on other devices in C++. You can use TensorFlow mixed with TensorRT. TensorRT will analyze the graph for ops that it supports and convert them to TensorRT nodes, and the rest of the graph will be handled by TensorFlow as usual. More information here. This solution is much faster than rewriting the operations yourself. The only complicated part is building TensorFlow from source on your target device and generating the dynamic library tensorflow_cc. Recently there have been many guides and much support for TensorFlow ports to various architectures, e.g. ARM.
Update 09/28/2019
Nvidia released TensorRT 6.0.1 about two weeks ago and added a new API called "IResizeLayer". This layer supports "Nearest" interpolation and can thus be used to implement upsampling. No need to use custom layers/plugins any more!
Original answer:
Thanks for all the answers and suggestions posted here!
In the end, we implemented the network directly with the TensorRT C++ API and loaded the weights from the .h5 model file. We haven't had the time to profile and polish the solution yet, but the inference seems to be working according to the test images we fed in.
Here's the workflow we've adopted:
Step 1: Code the upsampling layer.
In our U-Net model, all the upsampling layers have a scaling factor of (2, 2) and they all use ResizeNearestNeighbor interpolation. Essentially, the pixel value at (x, y) in the original tensor goes to four pixels in the new tensor: (2x, 2y), (2x+1, 2y), (2x, 2y+1) and (2x+1, 2y+1). This can easily be coded up as a CUDA kernel function.
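As a reference for checking the kernel's output, the same index mapping can be written in a few lines of NumPy (assuming CHW tensors):

import numpy as np

def upsample_nearest_2x(x):
    # x has shape (C, H, W); each source pixel (y, x) fills the 2x2 output block
    # at rows 2y, 2y+1 and columns 2x, 2x+1.
    return np.repeat(np.repeat(x, 2, axis=1), 2, axis=2)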
Once we had the upsampling kernel, we needed to wrap it with the TensorRT API, specifically the IPluginV2Ext class. The developer reference describes which functions need to be implemented. I'd say enqueue() is the most important one, because the CUDA kernel gets executed there.
There are also examples in the TensorRT Samples folder. For my version, these resources are helpful:
Github: Leaky Relu as custom layer
TensorRT-5.1.2.2/samples/sampleUffSSD
TensorRT-5.1.2.2/samples/sampleSSD
Step 2: Code the rest of the network using TensorRT API
The rest of the network should be quite straightforward. Just call the different "addXxxLayer" functions from the TensorRT network definition API.
One thing to keep in mind:
Depending on which version of TRT you are using, the way to add padding can be different. I think the newest version (5.1.5) allows developers to add parameters to addConvolution() so that the proper padding mode can be selected.
My model was trained using Keras, where the default padding mode is that the right and bottom get more padding if the total amount of padding is odd. Check this Stack Overflow link for details. There's a mode in 5.1.5 that represents this padding scheme.
If you are on an older version (5.1.2.2), you will need to add the padding as a separate layer before the convolution layer; it has two parameters: pre-padding and post-padding.
Also, everything is NCHW in TensorRT.
Helpful sample:
TensorRT-5.1.2.2/samples/sampleMNISTAP
Step 3: Load the weights
TensorRT wants weights in the format [out_c, in_c, filter_h, filter_w], which is mentioned in an archived documentation page. Keras stores weights in the format [filter_h, filter_w, in_c, out_c].
We got a pure weights file by calling model.save_weights('weight.h5') in Python. Then we read the weights into NumPy arrays using h5py, performed the transposition and saved the transposed weights to a new file. We also figured out the group and dataset names using h5py; this info was used when loading the weights into the C++ code via the HDF5 C++ API.
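A sketch of that transposition step in Python. The dataset path is hypothetical; the real group/dataset names come from inspecting weight.h5 with h5py (e.g. f.visit(print)):

import h5py
import numpy as np

with h5py.File("weight.h5", "r") as f:
    keras_kernel = f["conv2d_1/conv2d_1/kernel:0"][()]   # (filter_h, filter_w, in_c, out_c)

# TensorRT expects (out_c, in_c, filter_h, filter_w):
trt_kernel = np.ascontiguousarray(keras_kernel.transpose(3, 2, 0, 1))
np.save("conv2d_1_kernel.npy", trt_kernel)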
We compared the outputs layer by layer between the C++ code and the Python code. For our U-Net, all the activation maps are identical up to about the third block (after 2 poolings). After that, there is a tiny difference between pixel values; the absolute percentage error is about 10^-8, so we don't think it's that bad. We are still in the process of polishing the C++ implementation.
Again, thanks for all the suggestions and answers we got in this post. Hope our solution can be helpful as well!
Hey, I've done something similar. I'd say the best way to tackle the issue is to export your model to .onnx with a good tool like this one; if you check the support matrix for ONNX, upsample is supported.
Then you can use https://github.com/onnx/onnx-tensorrt to convert the ONNX model to TensorRT. I've used this to convert a network that I trained in PyTorch and that had upsample layers. The repo for onnx-tensorrt is a bit more active, and if you check the PR tab you can see other people writing custom layers and fork from there.

Tensorflow LSTM model parameter learning inside parameter

I'm trying to train my LSTM model in TensorFlow, and my module has to calculate a parameter inside another parameter. I want to train both parameters together.
More details are in the picture below.
I think that the TensorFlow LSTM module's input must be a complete sequence, with parameters supplied as something like tf.placeholder.
How can I do this in TensorFlow? Or can you recommend another framework better suited to this task than TensorFlow?
Sorry for my poor English.
First of all, your usage of the word parameter is quite confusing. Normally, parameters refers to trainable parameters, i.e. every variable that is trained by the optimizer. There are also so-called hyper-parameters, which have to be set by hand, e.g. the model topology.
TensorFlow works with tensors, which are representations of the data used to build the workflow; they are filled with data at run time via placeholders, which act as entry points for the data.
Also, if you have trouble building your model in TensorFlow, there is Keras. Keras can run with TensorFlow as its backend, but model building is much easier. Keras is also available in the TensorFlow API as tf.keras. In Keras, one or more LSTMs are simplified as a layer that can be added to your model.
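For instance, a minimal Keras LSTM model looks like this (sizes are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(None, 8)),  # (timesteps, features)
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
# model.fit(x_train, y_train, epochs=10)   # x_train shape: (batch, timesteps, 8)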
If you like a more specific answer to your question, please provide code to describe your problem.
