Training RNN on GPU - which tf.keras layer should I use? - python

I am training RNNs that I built using tf.keras.layers.GRU layers. They are taking a long time to train (>2 hours), so I am going to move them to the GPU for training. I am wondering a few things about training on the GPU:
What is the difference between tf.keras.layers.CuDNNGRU and tf.keras.layers.GRU (and also tf.keras.layers.LSTM vs. tf.keras.layers.CuDNNLSTM)? I understand from this post that CuDNNGRU layers train faster than GRU layers, but:
Do the 2 layers converge to different results with the same seed?
Do the 2 layers perform the same during inference?
Do CuDNN layers require a GPU during inference?
Can GRU layers run inference on a GPU?
Are CuDNN layers easily deployable? I am currently using coremlconverter to convert my keras model to CoreML for deployment.
Is there an equivalent CuDNN layer for tf.keras.layers.SimpleRNN (i.e. tf.keras.layers.CuDNNSimpleRNN)? I am not committed to a specific architecture yet, and so I believe I would need the tf.keras.layers.CuDNNSimpleRNN layer if I decide on SimpleRNNs and the CuDNN layer has some functionality that I need.
With CuDNN layers, do I need to have tensorflow-gpu installed? Or do they still get deployed to the GPU as long as I have the relevant drivers installed?

If you are using a CUDA-compatible GPU, it absolutely makes sense to use the CuDNN layers. They have a different implementation that tries to overcome the computation-parallelization issues inherent in the RNN architecture. They usually perform slightly worse, but are 3x-6x faster: https://twitter.com/fchollet/status/918170264608817152?lang=en
Do the 2 layers converge to different results with the same seed?
Yes.
Do the 2 layers perform the same during inference?
You should get comparable performance, but not exactly the same results.
Do CuDNN layers require a GPU during inference?
Yes, but you can convert the trained weights to a compatible plain GRU/LSTM for CPU inference.
Can GRU layers run inference on a GPU?
Yes
With CuDNN layers, do I need to have tensorflow-gpu installed? Or do they still get deployed to the GPU as long as I have the relevant drivers installed?
Yes, and you need a CUDA-compatible GPU.
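For reference, a minimal sketch of how the two layer types can be swapped (this assumes a TF 1.x tf.keras build where CuDNNGRU is available; the model, sizes, and loss below are made up for illustration):

import tensorflow as tf

def build_model(use_cudnn, vocab_size=1000, units=64):
    # CuDNNGRU runs only on a CUDA GPU; the plain GRU runs on CPU or GPU.
    rnn_layer = (tf.keras.layers.CuDNNGRU(units) if use_cudnn
                 else tf.keras.layers.GRU(units))
    return tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, 32),
        rnn_layer,
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

model = build_model(use_cudnn=True)   # requires a CUDA-compatible GPU
model.compile(optimizer='adam', loss='binary_crossentropy')

Note that in TF 2.x the two classes were merged: tf.keras.layers.GRU/LSTM automatically use the CuDNN kernel on a GPU when their arguments allow it.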

Related

Using GPU for tensorflow object detection

I have trained a Faster R-CNN model for object detection using the TensorFlow Object Detection API on Google Colab. But when testing videos, Google Colab crashes, so I decided to test on my PC and installed CUDA 10.0, cuDNN 7.6.5, and tensorflow-gpu 1.15.
But the test is very slow, as if it were running on the CPU. I get this message when testing, so I guess it is using my GPU (photo).
Does anyone know a solution to test a video in a faster way?
Is the problem with CUDA or my GPU?
Thank you
From Comments
tf.test.is_gpu_available() tells whether TensorFlow can access a GPU.
THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: use tf.config.list_physical_devices('GPU') instead.
If it returns True, then there is no issue in using the GPU. But the GeForce MX110 is among the slowest cards, and it also has little RAM. For better performance, you can check the list of CUDA-enabled GPU cards. For more details on hardware requirements, refer here.
(Paraphrased from Stanley Zheng and Dr.Snoopy)
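As a quick check, something like the following can be used (the experimental variant below exists on both TF 1.15 and TF 2.x; on TF 2.1+ you can call tf.config.list_physical_devices directly):

import tensorflow as tf

# Deprecated, but still present in TF 1.15:
print(tf.test.is_gpu_available())
# Preferred replacement; an empty list means TensorFlow cannot see the GPU:
print(tf.config.experimental.list_physical_devices('GPU'))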

Make TensorFlow use the GPU on an ARM Mac

I have installed TensorFlow on an M1 (ARM) Mac according to these instructions. Everything works fine.
However, model training is happening on the CPU. How do I switch training to the GPU?
In: tensorflow.config.list_physical_devices()
Out: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
In the documentation of Apple's TensorFlow distribution I found the following slightly confusing paragraph:
It is not necessary to make any changes to your existing TensorFlow scripts to use ML Compute as a backend for TensorFlow and TensorFlow Addons. There is an optional mlcompute.set_mlc_device(device_name='any') API for ML Compute device selection. The default value for device_name is 'any', which means ML Compute will select the best available device on your system, including multiple GPUs on multi-GPU configurations. Other available options are CPU and GPU. Please note that in eager mode, ML Compute will use the CPU. For example, to choose the CPU device, you may do the following:
# Import mlcompute module to use the optional set_mlc_device API for device selection with ML Compute.
from tensorflow.python.compiler.mlcompute import mlcompute
# Select CPU device.
mlcompute.set_mlc_device(device_name='cpu') # Available options are 'cpu', 'gpu', and 'any'.
So I try to run:
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
and get:
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
At this point I am stuck. How can I train Keras models on the GPU of my MacBook Air?
TensorFlow version: 2.4.0-rc0
Update
The tensorflow_macos TF 2.4 repository has been archived by the owner. For TF 2.5, refer here.
It's probably better not to disable eager execution entirely, but only for tf.functions. Try this and check your GPU usage; the warning message can be misleading.
import tensorflow as tf
# Run tf.function-decorated code as graphs rather than eagerly
tf.config.run_functions_eagerly(False)
The current release of Mac-optimized TensorFlow (TensorFlow 2.4rc0) has several issues that are not yet fixed. Eager mode is the default behavior in TensorFlow 2.x, and that is unchanged in TensorFlow-MacOS. But unlike the official build, this optimized version forcibly uses the CPU in eager mode, as stated here:
... in eager mode, ML Compute will use the CPU.
That's why, even if we explicitly set device_name='gpu', it switches back to the CPU as long as eager mode is still on.
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
Disabling eager mode may let the program utilize the GPU, but it is not the general behavior and can lead to puzzling performance on both CPU and GPU. For now, the most appropriate approach is to choose device_name='any', so that ML Compute queries the available devices on the system and selects the best device(s) for training the network, as sketched below.
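Concretely, using the same mlcompute API quoted from the Apple documentation above:

from tensorflow.python.compiler.mlcompute import mlcompute

# Let ML Compute query the available devices and pick the best one(s).
mlcompute.set_mlc_device(device_name='any')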
Try turning off eager execution via the following:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Let me know if it works.

Does the Keras implementation in TF2 support everything native Keras can do with TF1?

We are working with Keras on top of TensorFlow 1.x, but now that TF 2.0 is coming, we are thinking of switching to it and using the Keras API implementation built into TF 2.0.
But before we do so, I would like to ask: do you know whether the Keras implementation in TF 2.0 supports everything native Keras does with TF 1.x, or are there any features missing?
Moreover, will I be able to use my Keras code 1:1 with the new TF 2.0 implementation of the Keras API, or do we need to re-write parts of our existing Keras code?
If you want to use TensorFlow, then I highly recommend switching to the TensorFlow implementation of Keras (i.e. tf.keras), because it supports more TF features and is also more efficient and better optimized than native Keras.
Actually, the Keras maintainers released a new version (2.2.5) of Keras a few days ago (after more than 10 months with no new release!) and they also recommend using tf.keras. Here are the release notes:
Keras 2.2.5 is the last release of Keras that implements the 2.2.* API. It is the last release to only support TensorFlow 1 (as well as Theano and CNTK).
The next release will be 2.3.0, which makes significant API changes and add support for TensorFlow 2.0. The 2.3.0 release will be the last major release of multi-backend Keras. Multi-backend Keras is superseded by tf.keras.
At this time, we recommend that Keras users who use multi-backend Keras with the TensorFlow backend switch to tf.keras in TensorFlow 2.0. tf.keras is better maintained and has better integration with TensorFlow features.
This: "Multi-backend Keras is superseded by tf.keras" is a strong indicator that it is better to switch to tf.keras, especially if you are still at the beginning of your project.

How to change the processing unit during run time (from GPU to CPU)?

In the context of deep neural networks training, the training works faster when it uses the GPU as the processing unit.
This is done by configuring cuDNN optimizations and changing the processing unit in the environment variables with the following line (Python 2.7 and Keras on Windows):
os.environ["THEANO_FLAGS"] = "floatX=float32,device=gpu,optimizer_including=cudnn,gpuarray.preallocate=0.8,dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic,dnn.include_path=e:/toolkits.win/cuda-8.0.61/include,dnn.library_path=e:/toolkits.win/cuda-8.0.61/lib/x64"
The output is then:
Using gpu device 0: TITAN Xp (CNMeM is disabled, cuDNN 5110)
The problem is that GPU memory is limited compared to RAM (12 GB and 128 GB respectively), and training is only one phase of the whole flow. Therefore, I want to switch back to the CPU once training is completed.
I've tried the following line, but it has no effect:
os.environ["THEANO_FLAGS"] = "floatX=float32,device=cpu"
My questions are:
Is it possible to change from GPU to CPU and vice-versa during runtime? (technically)
If yes, how can I do it programmatically in Python? (2.7, Windows, and Keras with Theano backend).
Yes, this is possible, at least for the TensorFlow backend. You just have to import tensorflow as well and put your code inside the following with blocks:
import tensorflow as tf

with tf.device('/cpu:0'):
    ...  # your code: ops created in this block are pinned to the CPU

with tf.device('/gpu:0'):
    ...  # your code: ops created in this block are pinned to the first GPU
I am unsure whether this also works for the Theano backend. However, switching from one backend to the other is just a matter of setting a flag beforehand, so it should not cause too much trouble.
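For the Keras side of the question, a minimal sketch of the train-on-GPU, predict-on-CPU flow with the TensorFlow backend might look like this (shown with tf.keras; the model and data are made up for illustration, and '/gpu:0' assumes a visible GPU):

import numpy as np
import tensorflow as tf
from tensorflow import keras

x = np.random.rand(256, 10).astype('float32')    # toy data
y = np.random.rand(256, 1).astype('float32')

with tf.device('/gpu:0'):                         # training pinned to the GPU
    model = keras.Sequential([keras.layers.Dense(32, activation='relu'),
                              keras.layers.Dense(1)])
    model.compile(optimizer='adam', loss='mse')
    model.fit(x, y, epochs=2, batch_size=32, verbose=0)

with tf.device('/cpu:0'):                         # inference pinned to the CPU
    preds = model.predict(x)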

Keras batch normalization for shared layers

In a previous Keras version I used the batch normalization parameter mode=2 to use shared modules in my network. After upgrading, this parameter no longer exists, and when I use batch normalization the network doesn't seem to learn.
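For reference, in current (tf.)Keras a shared batch-normalization module is simply the same layer instance called on several inputs. The sketch below only illustrates that setup (shapes are made up); it is not a fix for the training issue:

import tensorflow as tf
from tensorflow.keras import layers

shared_bn = layers.BatchNormalization()          # a single instance...

inp_a = tf.keras.Input(shape=(16,))
inp_b = tf.keras.Input(shape=(16,))
out_a = shared_bn(inp_a)                         # ...reused on both branches,
out_b = shared_bn(inp_b)                         # sharing weights and statistics

model = tf.keras.Model([inp_a, inp_b], [out_a, out_b])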
