Using GPU for tensorflow object detection - python

I have trained a Faster R-CNN model for object detection using the TensorFlow Object Detection API on Google Colab. Colab crashes when I test on videos, so I decided to test on my PC instead and installed CUDA 10.0, cuDNN 7.6.5, and tensorflow-gpu 1.15.
But the test is very slow, as if it were running on the CPU. I get this message when testing, so I guess it is using my GPU (photo).
Does anyone know a solution to test a video in a faster way?
Is the problem with CUDA or my GPU?
Thank you

From Comments
tf.test.is_gpu_available() tells whether TensorFlow can access a GPU.
Note that this function is deprecated and will be removed in a future version; use tf.config.list_physical_devices('GPU') instead.
If it returns True, then TensorFlow is able to use the GPU. But the
GeForce MX110 is a very slow GPU and also has little RAM. For better
performance, see the list of CUDA-enabled GPU cards. For more details,
refer to the hardware requirements.
(Paraphrased from Stanley Zheng and Dr.Snoopy)
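As a minimal sketch of the check suggested in the comments, assuming a TensorFlow 2.x install where tf.config.list_physical_devices is available, the helper below lists the GPUs TensorFlow can see. The name visible_gpus is made up for this example, and the find_spec guard is only there so the snippet also runs on a machine without TensorFlow.

```python
import importlib.util


def visible_gpus():
    """Return the names of the GPUs TensorFlow can see ([] if none or TF is missing)."""
    if importlib.util.find_spec("tensorflow") is None:
        return []  # TensorFlow is not installed in this environment
    import tensorflow as tf
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
print("GPUs visible to TensorFlow:", gpus if gpus else "none")
```

An empty list means inference falls back to the CPU. A non-empty list only shows that the GPU is visible, not that it is fast enough for video.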


(Tensorflow) Stuck at Epoch 1 during

I've been trying to make TensorFlow 2.8.0 work with my GPU on Windows (GeForce GTX 1650 Ti). Even though TensorFlow detects the GPU, any model I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel (I've tried Jupyter Notebook and Spyder) hangs and restarts.
Based on TensorFlow's website, I've downloaded the respective cuDNN and CUDA versions, which I've verified (together with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
Output: '64_8' # Looks like v8 but I've actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2) so I think it's fine?
GPU Checks
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Output: True
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
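The version checks above go through an internal module. As a hedged alternative, assuming a recent TensorFlow 2.x release that provides tf.sysconfig.get_build_info(), the same information can be read through a public call. The helper name build_versions is made up for this sketch, and .get() is used because the keys may be absent on CPU-only builds.

```python
import importlib.util


def build_versions():
    """Return the CUDA/cuDNN versions TensorFlow was built against ({} if TF is missing)."""
    if importlib.util.find_spec("tensorflow") is None:
        return {}  # TensorFlow is not installed in this environment
    import tensorflow as tf
    info = tf.sysconfig.get_build_info()
    return {
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


print(build_versions())
```

These values describe what the binary was built against, not the DLLs actually found at run time, so a match here does not rule out missing cuDNN files.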
When I then try to fit any sort of model, it fails as described above. Surprisingly, although it can't run code such as that in TensorFlow's CNN tutorial, it does work when I run the chunk of code from this stackoverflow question, which looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code that I came across for the past couple of hours, and the only time where it does not get stuck at Epoch 1 is with the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1' and everything seems to work fine.)
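A minimal sketch of that CPU-only test. It assumes the variable is set before TensorFlow is first imported, since CUDA devices are enumerated during initialization; setting it later in the same process may have no effect.

```python
import os

# Hide all CUDA devices from this process. Do this before the first
# `import tensorflow`, otherwise the GPU may already have been initialized.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # import only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])
```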
Update (Solution)
It seems like the suggestions from this post helped: I copied all the cudnn*.dll files from the zipped cuDNN bin subfolder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin).
I had initially misread the instruction to copy all cudnn*.dll files as copying only cudnn64_8.dll, rather than every cudnn*.dll file in that folder.
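The copy step described above can be scripted so that no cudnn*.dll file is missed. This is only a sketch: copy_cudnn_dlls is a hypothetical helper, and writing into Program Files will likely need an elevated prompt.

```python
import shutil
from pathlib import Path


def copy_cudnn_dlls(src_bin, dst_bin):
    """Copy every cudnn*.dll from src_bin into dst_bin; return the copied file names."""
    src, dst = Path(src_bin), Path(dst_bin)
    copied = []
    for dll in sorted(src.glob("cudnn*.dll")):
        shutil.copy2(dll, dst / dll.name)  # overwrites an existing file of the same name
        copied.append(dll.name)
    return copied


# Example call with the folders from the post (adjust to where the zip was extracted):
# copy_cudnn_dlls(
#     r"cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin",
#     r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin",
# )
```

Printing the returned list is a quick way to confirm that more than cudnn64_8.dll was copied.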

Tensorflow-gpu only gives the object detection for the first frame

I wrote some code for an object detection task a long time ago. Now I have come back to test it, and the model only predicts on the first frame. This happens ONLY when I use the GPU; on the CPU it works normally.
My problem is the same as in this question:
Why Tensorflow-gpu only gives the object prediction once
Tensorflow 2.8
Python 3.8
Cuda 11.2
GPU RTX2060-m
I have solved the problem by downgrading CUDA to 10.1 and using TensorFlow 2.2 instead.

Make TensorFlow use the GPU on an ARM Mac

I have installed TensorFlow on an M1 (ARM) Mac according to these instructions. Everything works fine.
However, model training is happening on the CPU. How do I switch training to the GPU?
In: tensorflow.config.list_physical_devices()
Out: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
In the documentation of Apple's TensorFlow distribution I found the following slightly confusing paragraph:
It is not necessary to make any changes to your existing TensorFlow scripts to use ML Compute as a backend for TensorFlow and TensorFlow Addons. There is an optional mlcompute.set_mlc_device(device_name='any') API for ML Compute device selection. The default value for device_name is 'any', which means ML Compute will select the best available device on your system, including multiple GPUs on multi-GPU configurations. Other available options are CPU and GPU. Please note that in eager mode, ML Compute will use the CPU. For example, to choose the CPU device, you may do the following:
# Import mlcompute module to use the optional set_mlc_device API for device selection with ML Compute.
from tensorflow.python.compiler.mlcompute import mlcompute
# Select CPU device.
mlcompute.set_mlc_device(device_name='cpu') # Available options are 'cpu', 'gpu', and 'any'.
So I try to run:
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
and get:
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
At this point I am stuck. How can I train Keras models on the GPU of my MacBook Air?
TensorFlow version: 2.4.0-rc0
The tensorflow_macos tf 2.4 repository has been archived by the owner. For tf 2.5, refer to here.
It's probably not useful to disable eager execution entirely, only for tf.function code. Try this and check your GPU usage; the warning message can be misleading.
import tensorflow as tf
tf.config.run_functions_eagerly(False)
The current release of Mac-optimized TensorFlow (TensorFlow 2.4rc0) has several issues that are not yet fixed. Eager mode is the default behavior in TensorFlow 2.x, and that is unchanged in TensorFlow-macOS. But unlike the official build, this optimized version forces the CPU in eager mode, as they state here:
... in eager mode, ML Compute will use the CPU.
That's why, even if we explicitly set device_name='gpu', it switches back to the CPU: eager mode is still on.
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
Disabling eager mode may let the program use the GPU, but that is not the general behavior and it can lead to puzzling performance on both CPU and GPU. For now, the most appropriate approach may be to choose device_name='any', so that ML Compute queries the available devices on the system and selects the best device(s) for training the network.
Try turning off eager execution via the following:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Let me know if it works.

Fail to find the dnn implementation for LSTM

I'm trying to run a simple LSTM model with the following code:
model = tf.keras.models.Sequential()
model.compile(optimizer=tf.keras.optimizers.RMSprop(), loss='mae')
single_step_history = model.fit(..., epochs=EPOCHS, ...)
The error happens when it tries to fit the model:
tensorflow.python.framework.errors_impl.UnknownError: [_Derived_] Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]] [Op:__inference_distributed_function_3107]
There's another error like this
2020-02-22 19:08:06.478567: W tensorflow/core/kernels/data/] The calling
iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the
dataset, the partially cached contents of the dataset will be discarded. This can happen if you have
an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use
`dataset.take(k).cache().repeat()` instead.
I tried all the methods from this question, but none of them worked for me.
My environment is:
tensorflow-gpu 2.0
CUDA v10
CuDNN 7.6.5
OK, I found that I didn't have the latest NVIDIA driver. I upgraded it, and now it works.
Answering here for the benefit of the community even if the user has provided the solution.
Upgrading the NVIDIA driver to the latest version resolved the issue.
You can update the NVIDIA driver manually from here: select the product details and OS, download the most recent driver from their website, then run the installer and overwrite the old driver.
Try the following:
import tensorflow as tf
# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, enable=True)

How to change the processing unit during run time (from GPU to CPU)?

In the context of deep neural networks training, the training works faster when it uses the GPU as the processing unit.
This is done by configuring cuDNN optimizations and selecting the processing unit through an environment variable, with the following line (Python 2.7 and Keras on Windows):
os.environ["THEANO_FLAGS"] = "floatX=float32,device=gpu,optimizer_including=cudnn,gpuarray.preallocate=0.8,dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic,dnn.include_path=e:/,dnn.library_path=e:/"
The output is then:
Using gpu device 0: TITAN Xp (CNMeM is disabled, cuDNN 5110)
The problem is that the GPU memory is limited compared to the RAM (12GB and 128GB respectively), and the training is only one phase of the whole flow. Therefore I want to change back to CPU once the training is completed.
I've tried the following line, but it has no effect:
os.environ["THEANO_FLAGS"] = "floatX=float32,device=cpu"
My questions are:
Is it possible to change from GPU to CPU and vice-versa during runtime? (technically)
If yes, how can I do it programmatically in Python? (2.7, Windows, and Keras with Theano backend).
Yes, this is possible, at least for the TensorFlow backend. You just have to import tensorflow as well and put your code inside the following with blocks:
import tensorflow as tf

with tf.device('/cpu:0'):
    ...  # code that should run on the CPU
with tf.device('/gpu:0'):
    ...  # code that should run on the GPU
I am unsure whether this also works for the Theano backend. However, switching from one backend to the other is just a matter of setting a flag beforehand, so this should not cause too much trouble.
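For the Theano backend, one reason the second assignment in the question has no effect is that Theano reads THEANO_FLAGS once, when it is imported. A possible workaround, sketched below under the assumption that the post-training phase can be split into its own script, is to launch that phase in a child process with its own THEANO_FLAGS. The helper name run_with_flags is hypothetical, the sketch is written for Python 3 rather than the 2.7 setup in the question, and the child command only echoes the variable instead of importing Theano.

```python
import os
import subprocess
import sys


def run_with_flags(flags, code):
    """Run a Python snippet in a child process with its own THEANO_FLAGS."""
    env = dict(os.environ, THEANO_FLAGS=flags)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# The child only prints the flags it received; a real script would import
# theano/keras here and run the CPU-only phase.
print(run_with_flags(
    "floatX=float32,device=cpu",
    "import os; print(os.environ['THEANO_FLAGS'])",
))
```

Because each process imports Theano afresh, the GPU memory held by the training process is released when that process exits.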
