setUpNet DNN module was not built with CUDA backend; switching to CPU - python

I want to run my Python script on the GPU, as you can see in this screenshot.
I ran watch nvidia-smi to monitor the GPU processes; unfortunately, the script uses only 41 MiB of GPU memory.
This is part of my code:
import time
import math
import cv2
import numpy as np

labelsPath = "./coco.names"
LABELS = open(labelsPath).read().strip().split("\n")
weightsPath = "./yolov3.weights"
configPath = "./yolov3.cfg"
net = cv2.dnn.readNetFromDarknet(configPath, weightsPath)
# names of the YOLO output layers; unlike ln[i[0] - 1], this works whether
# getUnconnectedOutLayers() returns an Nx1 array (older OpenCV) or a flat one (newer)
ln = net.getUnconnectedOutLayersNames()
vid_path = "./input.mp4"  # placeholder: the real video path is not shown in the post
vs = cv2.VideoCapture(vid_path)
# vs = cv2.VideoCapture(0)  # use this if you want to use the webcam feed
writer = None
(W, H) = (None, None)
fl = 0
q = 0
while True:
    (grabbed, frame) = vs.read()
    if not grabbed:
        break  # end of the video stream
    if W is None or H is None:
        (H, W) = frame.shape[:2]
    FW = 1075
    FR = np.zeros((H + 210, FW, 3), np.uint8)
    col = (255, 255, 255)
    FH = H + 210
    FR[:] = col
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)  # the blob must be set as input before forward()
    start = time.time()
    layerOutputs = net.forward(ln)
    end = time.time()
I tried adding code to force the network to run on the GPU, but when I run the script again it prints this message and keeps running on the CPU:
[ WARN:0] global /io/opencv/modules/dnn/src/dnn.cpp (1363) setUpNet DNN module was not built with CUDA backend; switching to CPU

You'll need to build OpenCV manually (from source) with CUDA support; the prebuilt opencv-python packages from pip do not include the CUDA backend.
Here is a great tutorial on how to do so.

If you already have the opencv-python package installed via pip, you may have to uninstall it first; only then will your program pick up the custom-built OpenCV.
pip3 uninstall opencv-python
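The warning means the network was asked for the CUDA backend on an OpenCV build that does not have it. A minimal sketch, assuming OpenCV 4.2 or newer (where cv2.dnn.DNN_BACKEND_CUDA exists), that requests the CUDA backend only when cv2.cuda.getCudaEnabledDeviceCount() reports a usable device and otherwise selects the CPU explicitly. The name select_dnn_backend and the cv2_module parameter (there only so the logic can be exercised without a GPU) are hypothetical. A positive device count does not prove the DNN module itself was compiled with cuDNN, so the warning can still appear on a partial build.

```python
def select_dnn_backend(net, cv2_module=None) -> str:
    """Point a cv2.dnn.Net at CUDA if this OpenCV build can use it, else at the CPU.

    Returns "cuda" or "cpu". cv2_module is injectable only for testing.
    """
    if cv2_module is None:
        import cv2 as cv2_module  # the real OpenCV module
    cuda = getattr(cv2_module, "cuda", None)
    # getCudaEnabledDeviceCount(): 0 if built without CUDA, -1 if no usable driver
    n_devices = cuda.getCudaEnabledDeviceCount() if cuda is not None else 0
    dnn = cv2_module.dnn
    if n_devices > 0:
        net.setPreferableBackend(dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(dnn.DNN_TARGET_CUDA)
        return "cuda"
    # Explicit CPU selection avoids the "switching to CPU" fallback warning
    net.setPreferableBackend(dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(dnn.DNN_TARGET_CPU)
    return "cpu"
```

Call it once right after cv2.dnn.readNetFromDarknet(configPath, weightsPath) and before the first net.forward(...).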

Compatibility chart of CUDA and cuDNN:
Check the compute capability of your GPU from:
In my case, it is 7.5.
Under "GPUs supported", CUDA SDK 11.0 – 11.2 supports compute capability 3.5 – 8.6 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere), which covers 7.5.
Check your GPU against the list of supported NVIDIA hardware.
In my case, I was using a Tesla T4 (Turing architecture), which is compatible with cuDNN.
Even so, the compilation report shows that CMake returned cuDNN availability as "NO".
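CMake's configuration summary is also embedded in the installed library and returned by cv2.getBuildInformation(), so the same "NVIDIA CUDA" and "cuDNN" check can be repeated from Python after installing a custom build. A rough sketch, assuming the summary uses "Name: value" lines as typical builds do (the exact layout is not a stable API); the helper names are hypothetical. The DNN CUDA backend needs both entries to start with YES.

```python
from typing import Optional


def build_feature(info: str, name: str) -> Optional[str]:
    """Return the value printed for `name` (e.g. "NVIDIA CUDA", "cuDNN")
    in a build-information summary, or None if that line is absent."""
    for line in info.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip() == name:
            return value.strip()
    return None


def has_build_feature(info: str, name: str) -> bool:
    """True only if the feature is listed and reported as YES."""
    value = build_feature(info, name)
    return value is not None and value.startswith("YES")


# Usage (requires OpenCV):
#   import cv2
#   info = cv2.getBuildInformation()
#   print(has_build_feature(info, "NVIDIA CUDA"), has_build_feature(info, "cuDNN"))
```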
I got the Docker image using:
sudo docker pull nvidia/cuda:11.1-cudnn8-runtime-ubuntu18.04
Compiled OpenCV with CUDA from:


How to solve Qt display/platform error on Google Colab

I am trying to run an optical flow model, RAFT, on Google Colab. I have installed the setup file and libraries necessary for it but when I try to run the demo file, I get a Qt error that looks like this.
!python --model=raft-things.pth --path=demo-frames
/usr/local/lib/python3.10/site-packages/torch-1.13.0-py3.10-linux-x86_64.egg/torch/ UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.10/site-packages/opencv_python-" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
Available platform plugins are: xcb, eglfs, minimal, minimalegl, offscreen, vnc.
I have never had any GUI errors on Colab with other libraries, so I am not sure why torch is causing it.
Here is the file:
import argparse
import os
import cv2
import glob
import numpy as np
import torch
from PIL import Image
from raft import RAFT
from raft.utils import flow_viz
from raft.utils.utils import InputPadder

DEVICE = 'cuda'

def load_image(imfile):
    img = np.array(Image.open(imfile)).astype(np.uint8)
    img = torch.from_numpy(img).permute(2, 0, 1).float()
    return img[None].to(DEVICE)

def viz(img, flo):
    img = img[0].permute(1, 2, 0).cpu().numpy()
    flo = flo[0].permute(1, 2, 0).cpu().numpy()
    # map flow to rgb image
    flo = flow_viz.flow_to_image(flo)
    img_flo = np.concatenate([img, flo], axis=0)
    # import matplotlib.pyplot as plt
    # plt.imshow(img_flo / 255.0)
    cv2.imshow('image', img_flo[:, :, [2, 1, 0]] / 255.0)
    cv2.waitKey()  # imshow needs an event loop to actually draw the window

def demo(args):
    model = torch.nn.DataParallel(RAFT(args))
    model.load_state_dict(torch.load(args.model, map_location=DEVICE))
    model = model.module
    model.to(DEVICE)  # the inputs live on DEVICE, so the weights must too
    model.eval()
    with torch.no_grad():
        images = glob.glob(os.path.join(args.path, '*.png')) + \
                 glob.glob(os.path.join(args.path, '*.jpg'))
        images = sorted(images)
        for imfile1, imfile2 in zip(images[:-1], images[1:]):
            image1 = load_image(imfile1)
            image2 = load_image(imfile2)
            padder = InputPadder(image1.shape)
            image1, image2 = padder.pad(image1, image2)
            flow_low, flow_up = model(image1, image2, iters=20, test_mode=True)
            viz(image1, flow_up)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', help="restore checkpoint")
    parser.add_argument('--path', help="dataset for evaluation")
    parser.add_argument('--small', action='store_true', help='use small model')
    parser.add_argument('--mixed_precision', action='store_true', help='use mixed precision')
    parser.add_argument('--alternate_corr', action='store_true', help='use efficient correlation implementation')
    args = parser.parse_args()
    demo(args)
My hunch is that the issue might come from the model being outdated (two years old). But I have tried both the PyTorch and CUDA versions they specified and the most recent stable versions (both installed through conda), and I am still getting the same error.
Before this, I was getting a CudaCheck error, but I assumed that was because the package wasn't installed correctly without the required Python version (3.8+). I resolved that by creating a separate Python 3.10 kernel and installing with it. Now I get this error first.
My other hunch is that it has something to do with cv2 functions like cv2.namedWindow or cv2.imshow; that is what I gathered from another post, but nothing there solved my issue. I can't simply drop them, because a lot of the architecture is built on cv2.
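If the error does come from cv2.imshow, one way to test that hunch is to write the visualization to disk instead: Colab has no X display, and cv2.imwrite does not load the Qt "xcb" plugin. A sketch, assuming viz() only needs to save its output; flow_panel_bgr and save_flow_panel are hypothetical names. Alternatives are setting the environment variable QT_QPA_PLATFORM=offscreen (one of the plugins listed in the error) before importing cv2, or using Colab's google.colab.patches.cv2_imshow.

```python
import numpy as np


def flow_panel_bgr(img_rgb: np.ndarray, flo_rgb: np.ndarray) -> np.ndarray:
    """Stack the frame above its flow image (as viz() does) and return a
    contiguous BGR uint8 array that cv2.imwrite can save."""
    panel = np.concatenate([img_rgb, flo_rgb], axis=0)
    panel = np.clip(panel, 0, 255).astype(np.uint8)
    # [:, :, ::-1] is the same channel swap as [2, 1, 0] in the original viz()
    return np.ascontiguousarray(panel[:, :, ::-1])


def save_flow_panel(img_rgb: np.ndarray, flo_rgb: np.ndarray, out_path: str) -> None:
    import cv2  # imported lazily; imwrite needs no Qt plugin or display
    if not cv2.imwrite(out_path, flow_panel_bgr(img_rgb, flo_rgb)):
        raise IOError(f"could not write {out_path}")
```

Inside viz(), the cv2.imshow call would become save_flow_panel(img, flo, "flow.png") (the file name is illustrative).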

Could not load dynamic library '' ? / failed call to cuInit: UNKNOWN ERROR (303)?

I am a beginner TensorFlow user and am running into the following issue when attempting to load an already saved segmentation model and run it on test images.
I installed all the libraries in a virtual environment that I created.
The same code runs on google colab and now i am trying to run it on my machine.
My Environment
Ubuntu 16 /
tensorflow 2.5.0
My code
When running the code:
import os
from glob import glob
from tqdm import tqdm
import cv2
import tensorflow as tf
import nibabel as nib
import numpy as np
import matplotlib.pyplot as plt

test_images = sorted(glob("/mnt/DATA2To/projet/all/Souris/SOD/data/20210726_C321/EXVIVO/IRM/test-20210726_C321/*"))  # images 160x120x1
i = 0  # iterator initialized to zero
model = tf.keras.models.load_model("/mnt/DATA2To/projet/all/Souris/SOD/segmentation/segmentation-moelle.h5", compile=False)
for path in tqdm(test_images, total=len(test_images)):
    x = nib.load(path)  # load the images (160x1x120)
    new_header = x.header.copy()  # copy the header to a variable for writing the results at the end
    x = np.asanyarray(x.dataobj)  # get the data (get_data() is removed in recent nibabel)
    original_image = x
    original_image_bis = original_image.transpose((0, 2, 1))
    h, w, _ = original_image_bis.shape
    original_image_bis = cv2.resize(original_image_bis, (w, h))
    x = x.transpose((0, 2, 1))  # permute the image axes to (160x120x1)
    x = cv2.resize(x, (128, 128))  # resize the image to have a shape of (128x128)
    x = (x - x.min()) / (x.max() - x.min())  # do the min-max normalisation
    x.shape = x.shape + (1,)  # add the third axis (128x128x1)
    x = x.astype(np.float32)
    x1 = np.expand_dims(x, axis=0)
    pred_mask = model.predict(x1)[0]
    #pred_mask = (np.where(pred_mask > np.mean(pred_mask), 1,0))
    pred_mask = pred_mask.astype(np.float32)
    pred_mask1 = cv2.resize(pred_mask, (w, h))
    pred_mask1 = np.where(pred_mask1 > 0.92, 1, 0)
    pred_mask1.shape = pred_mask1.shape + (1,)  # add the third axis (160x120x1)
    pred_mask1 = pred_mask1.transpose((0, 2, 1))  # permute the image axes to (160x1x120)
    Sform = new_header.get_base_affine()
    pred_mask2 = nib.Nifti1Image(pred_mask1, None, header=new_header)
    fname = "/mnt/DATA2To/projet/all/Souris/SOD/data/20210726_C321/EXVIVO/IRM/results-moelle/image%04d.nii" % i
    nib.save(pred_mask2, fname)
    i += 1
My Error
I am greeted with this error :
(venv) etudiant#PTT:~$ python3 '/home/etudiant/Documents/code/'
2021-07-28 09:58:12.539200: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/software//mrtrix/lib::/opt/minc/lib:/opt/minc/lib/InsightToolkit
2021-07-28 09:58:12.539221: I tensorflow/stream_executor/cuda/] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2021-07-28 09:58:13.429146: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /mnt/software//mrtrix/lib::/opt/minc/lib:/opt/minc/lib/InsightToolkit
2021-07-28 09:58:13.429164: W tensorflow/stream_executor/cuda/] failed call to cuInit: UNKNOWN ERROR (303)
2021-07-28 09:58:13.429179: I tensorflow/stream_executor/cuda/] kernel driver does not appear to be running on this host (PTT): /proc/driver/nvidia/version does not exist
2021-07-28 09:58:13.429322: I tensorflow/core/platform/] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
0it [00:00, ?it/s]
Can anyone tell me what is wrong and how to fix it?
W stands for "Warning" and I stands for "Information".
There are no problems with your code, TF just tells you it did not find the libraries required for GPU computation; this does not mean that TensorFlow does not run successfully on CPU.
If you want to stop seeing these messages in the future, you can suppress them.
Solution 1:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
Level 2 means to ignore warning and information, and print only the error.
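Solution 2:
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
Solution 3:
import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)
Note that Solutions 2 and 3 only change TensorFlow's Python logger. The W/I lines in the log above come from TensorFlow's C++ runtime and are controlled by TF_CPP_MIN_LOG_LEVEL (Solution 1), which must be set before TensorFlow is imported.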
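Beyond silencing the messages, the last cuda lines in the log (failed call to cuInit, and "kernel driver does not appear to be running on this host ... /proc/driver/nvidia/version does not exist") suggest that no NVIDIA kernel driver is loaded, so TensorFlow can only use the CPU. A small Linux-only check of that same file, written as a hypothetical helper; the path comes straight from the log.

```python
from pathlib import Path
from typing import Optional


def nvidia_driver_version_line(proc_file: str = "/proc/driver/nvidia/version") -> Optional[str]:
    """Return the first line of the NVIDIA kernel-driver version file,
    or None if the file is missing (i.e. no NVIDIA kernel module is loaded)."""
    path = Path(proc_file)
    if not path.is_file():
        return None
    lines = path.read_text().splitlines()
    return lines[0] if lines else ""


# Usage:
#   print(nvidia_driver_version_line() or "no NVIDIA kernel driver loaded")
```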

Decode/encode HEVC with OpenCV Python with Intel Media SDK backend

I am trying to process some FHD video on Win10 in Python. I enabled the integrated GPU in BIOS, installed latest driver through Intel driver support, installed Intel Media SDK successfully and rebooted. Then I downloaded OpenCV 4.5 with all hardware codecs from here and added to a dummy project (made sure no other OpenCV exists). When I tried to use it for decoding/encoding, I had the following errors.
When decoding, I got an error like:
decoder = cv.VideoCapture('myfile.hevc', cv.CAP_INTEL_MFX)
# MFX: LoadPlugin failed for codec: 1129727304
but using cv.CAP_FFMPEG works fine.
When encoding, I got an error like:
writer = cv.VideoWriter('output.hevc', apiPreference = cv.CAP_INTEL_MFX, fourcc = cv.VideoWriter_fourcc('h','v','c','1'), fps = 20.0, frameSize = (640,480))
# MFX: Unsupported FourCC: hvc1 (0x31637668)
# 'hevc' or 'h265' give same error code
# write to 'output.mp4' with 'mp4v' yield a different error code 0x7634706d
Writing with FFmpeg, on the other hand, works:
writer = cv.VideoWriter('C:/testData/output.mp4', apiPreference = cv.CAP_FFMPEG , fourcc = cv.VideoWriter_fourcc('m','p','4','v'), fps = 20.0, frameSize = (640,480))
# success
Much appreciated.
[Edit 01]
Here is the screenshot of running mediasdk_system_analyzer_64.exe.
[Edit 02]
Even though the HW decoder does not show up in the analyzer, it works after I disable all monitors but one, as suggested here.
.\sample_multi_transcode.exe -i::h265 myfile.hevc -o::h265 output.hevc -hw
# session 0 [0000023D33FCA130] PASSED (MFX_ERR_NONE) 8.34417 sec, 226 frames, 27.085 fps
VideoCapture with cv.CAP_INTEL_MFX works now. Yet VideoWriter gives the same error.
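Since decoding works with cv.CAP_FFMPEG, one defensive pattern is to try capture backends in order and keep the first one that opens. A sketch, assuming VideoCapture.isOpened() returns False when the MFX plugin fails to load (not verified here); open_capture and its factory parameter are hypothetical. It only works around the decode failure and does nothing for the VideoWriter FourCC error.

```python
def open_capture(path, backends, factory=None):
    """Try each VideoCapture backend in order; return (capture, backend).

    factory defaults to cv2.VideoCapture and is injectable only for testing.
    """
    if factory is None:
        import cv2
        factory = cv2.VideoCapture  # VideoCapture(filename, apiPreference)
    for backend in backends:
        cap = factory(path, backend)
        if cap.isOpened():
            return cap, backend
        cap.release()  # free the failed handle before trying the next backend
    raise RuntimeError(f"no backend could open {path!r}")


# Usage:
#   cap, used = open_capture("myfile.hevc", [cv.CAP_INTEL_MFX, cv.CAP_FFMPEG])
```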

unsupported operation _FusedBatchNormV3 with tensorRT and jetson tx2

On a Jetson TX2 I am running:
Linux4Tegra R32.2.1
UFF Version 0.6.3
Cuda 10
Python 3.6.8
I get this error message:
[TensorRT] ERROR: UffParser: Validator error: sequential/batch_normalization_1/FusedBatchNormV3: Unsupported operation _FusedBatchNormV3
From this code:
import tensorrt as trt
import uff

# args is parsed from the command line elsewhere in the script (not shown)
output_nodes = [args.output_node_names]
input_node = args.input_node_name
frozen_graph_pb = args.frozen_graph_pb
uff_model = uff.from_tensorflow(frozen_graph_pb, output_nodes)  # successfully creates the UFF model
G_LOGGER = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(G_LOGGER)
builder.max_batch_size = 10
builder.max_workspace_size = 1 << 30
network = builder.create_network()  # the builder must exist before the network is created
data_type = trt.DataType.FLOAT
parser = trt.UffParser()
input_verified = parser.register_input(input_node, (1, 234, 234, 3))  # returns True
output_verified = parser.register_output(output_nodes[0])  # returns True
buffer_verified = parser.parse_buffer(uff_model, network, data_type)  # returns False
The uff model was created successfully.
The parser successfully registered the inputs and outputs.
Parsing the buffer fails with the error above.
Does anyone know whether FusedBatchNormV3 is truly unsupported in TensorRT, and if so, whether there is an existing plugin I can pull in using the graphsurgeon module?
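A workaround reported in community threads (not an official TensorRT feature, so treat it as an assumption) is to rename FusedBatchNormV3 nodes to the older FusedBatchNorm op, which the UFF converter of this generation handles, before calling uff.from_tensorflow. A sketch that edits a TensorFlow GraphDef in place; the helper name is hypothetical. It is only reasonable for inference graphs (is_training false) where the V3 op's extra sixth output is unused and scale/offset share the input's float dtype.

```python
def downgrade_fused_batchnorm_v3(graph_def) -> int:
    """Rewrite FusedBatchNormV3 nodes as FusedBatchNorm; return how many changed.

    Works on any object with a `.node` sequence whose items have `.op` and a
    mapping-like `.attr` (a TensorFlow GraphDef satisfies this).
    """
    changed = 0
    for node in graph_def.node:
        if node.op == "FusedBatchNormV3":
            node.op = "FusedBatchNorm"
            # FusedBatchNorm has no separate "U" dtype attr for scale/offset/mean/var
            if "U" in node.attr:
                del node.attr["U"]
            changed += 1
    return changed


# Usage sketch (untested here):
#   graph_def = tf.compat.v1.GraphDef()
#   with open(frozen_graph_pb, "rb") as f:
#       graph_def.ParseFromString(f.read())
#   downgrade_fused_batchnorm_v3(graph_def)
#   uff_model = uff.from_tensorflow(graph_def, output_nodes)
```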

Tensorflow crash with CUDNN_STATUS_ALLOC_FAILED

Been searching the web for hours with no results, so figured I'd ask here.
I'm trying to make a self-driving car following Sentdex's tutorial, but when running the model, I get a bunch of fatal errors. I've searched all over the internet for the solution, and many seem to have the same problem. However, none of the solutions I've found (including this Stack Overflow post) work for me.
Here is my software:
Tensorflow: 1.5, GPU version
CUDA: 9.0, with the patch
CUDnn: 7
Windows 10 Pro
Python 3.6
Nvidia GTX 1070 Ti, with latest drivers
Intel i5 7600K
Here is the crash log:
2018-02-04 16:29:33.606903: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.608872: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:33.609308: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145249: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\] could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2018-02-04 16:29:35.145563: E C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-02-04 16:29:35.149896: F C:\tf_jenkins\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\kernels\] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
Here's my code:
import tensorflow as tf
import numpy as np
import cv2
import time
from PIL import ImageGrab
from getkeys import key_check
from alexnet import alexnet
import os
from sendKeys import PressKey, ReleaseKey, W,A,S,D,Sp
import random
WIDTH = 80
LR = 1e-3
MODEL_NAME = 'DiRT-AI-Driver-{}-{}-{}-epochs.model'.format(LR, 'alexnetv2', EPOCHS)
def straight():
def left():
def right():
def brake():
def handbrake():
model = alexnet(WIDTH, HEIGHT, LR)
def main():
last_time = time.time()
for i in list(range(4))[::-1]:
paused = False
if not paused:
screen = np.array(ImageGrab.grab(bbox=(0,40,1024,768)))
screen = cv2.cvtColor(screen,cv2.COLOR_BGR2GRAY)
screen = cv2.resize(screen,(80,60))
print('Loop took {} seconds'.format(time.time()-last_time))
last_time = time.time()
print('took time')
prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
moves = list(np.around(prediction))
print('got moves')
if moves == [1,0,0,0,0]:
elif moves == [0,1,0,0,0]:
elif moves == [0,0,1,0,0]:
elif moves == [0,0,0,1,0]:
elif moves == [0,0,0,0,1]:
keys = key_check()
if 'T' in keys:
if paused:
pased = False
paused = True
I've found that the line that crashes Python and produces the first three errors is this one:
prediction = model.predict([screen.reshape(WIDTH,HEIGHT,1)])[0]
When running the code, the CPU goes up to a whopping 100%, suggesting that something is seriously off. The GPU goes to about 40-50%.
I've tried TensorFlow 1.2 and 1.3, as well as CUDA 8, to no avail. When installing CUDA I don't install the bundled drivers, since they are too old for my GPU. I tried different cuDNN versions too, with no luck.
In my case, the issue happened because another python console with tensorflow imported was running. Closing it solved the problem.
I'm on Windows 10; the main errors were:
failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
Probably you're running out of GPU memory.
If you're using TensorFlow 1.x:
1st option) set allow_growth to true.
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)
2nd option) set memory fraction.
# change the memory fraction as you want
import tensorflow as tf
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
If you're using TensorFlow 2.x:
1st option) set set_memory_growth to true.
# Currently the ‘memory growth’ option should be the same for all GPUs.
# You should set the ‘memory growth’ option before initializing GPUs.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # memory growth must be set before the GPUs have been initialized
        print(e)
2nd option) set memory_limit as you want.
Just change the GPU index and memory_limit (in MB) in the code below.
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])
    except RuntimeError as e:
        # virtual devices must be set before the GPUs have been initialized
        print(e)
Try setting:
import os
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'  # set this before TensorFlow initializes the GPU
This solved my problem.
My environment:
Cudnn 7.6.5
Tensorflow 2.4
Cuda Toolkit 10.1
RTX 2060
Try adding the CUDA path to your environment variables; it seems the problem is with CUDA.
Set the CUDA path in ~/.bashrc (for example, edit it with nano), then run source ~/.bashrc or open a new terminal:
# CUDA (NVIDIA) paths
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
I encountered the same problem, and then found out it was because I was also using the GPU for other things, even though they didn't show up as using the GPU in Task Manager (Windows). These can include rendering or encoding videos, playing a GPU-heavy game, coin mining, and so on.
If something is still using the GPU heavily, just close it and the problem should be solved.
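Two of the answers here trace the ALLOC_FAILED errors to another process holding GPU memory. A sketch for listing compute processes with nvidia-smi's query mode, assuming nvidia-smi is on PATH; the helper names are hypothetical. On Windows with the WDDM driver model, per-process memory is often reported as [N/A] and some processes may not be listed at all, which fits the Task Manager observation above.

```python
import subprocess
from typing import List, NamedTuple, Optional


class GpuProcess(NamedTuple):
    pid: int
    name: str
    used_mib: Optional[int]  # None when the driver reports [N/A]


def parse_compute_apps(csv_text: str) -> List[GpuProcess]:
    """Parse output of nvidia-smi --query-compute-apps=pid,process_name,used_memory
    --format=csv,noheader,nounits (one "pid, name, MiB" row per process)."""
    procs = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue
        pid, mem = parts[0], parts[-1]
        name = ",".join(parts[1:-1])  # tolerate commas inside process names
        procs.append(GpuProcess(int(pid), name, int(mem) if mem.isdigit() else None))
    return procs


def list_gpu_processes() -> List[GpuProcess]:
    out = subprocess.run(
        ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_apps(out)
```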
I had an almost identical problem. Fixed it by reinstalling tensorflow-gpu.
conda uninstall tensorflow-gpu
conda install tensorflow-gpu
I think pip should work as well.
