I am on Windows 10, using Python 3.9.6, and my cv2 version is 4.4.0. I built OpenCV with CUDA successfully, and cv2.cuda.getCudaEnabledDeviceCount() returns 1 as expected. The following lines also work fine.
net = cv2.dnn.readNetFromCaffe(proto_file, weights_file)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
# ... (multiple lines that process the frame and build the input blob) ...
net.setInput(in_blob)
However, executing the following line throws an exception.
output = net.forward()
The exception:
cv2.error: OpenCV(4.4.0) G:\opencv-4.4.0\opencv-4.4.0\modules\dnn\src\dnn.cpp:2353: error: (-216:No CUDA support) OpenCV was not built to work with the selected device. Please check CUDA_ARCH_PTX or CUDA_ARCH_BIN in your build configuration. in function 'cv::dnn::dnn4_v20200609::Net::Impl::initCUDABackend'
The message says that OpenCV was not built to work with the selected device (which I'm guessing is my GPU).
It seems to be a mismatch involving CUDA_ARCH_BIN and/or CUDA_ARCH_PTX. My GPU is an NVIDIA GeForce MX130, whose CUDA_ARCH_BIN value I found to be 6.1, and I set it accordingly in CMake.
How can I resolve these issues? Let me know if I need to provide any more information.
"Sources say" the MX130 has a Maxwell core, not a Pascal core. Maxwell is the predecessor of Pascal.
Hence, you only have CUDA compute capability 5.0.
You should verify that with an appropriate tool such as GPU-Z, which does its best to query the hardware directly instead of going by published specs.
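If you would rather not install an extra tool, a CUDA-enabled OpenCV build can also report the device's compute capability itself; a minimal check, assuming the cv2.cuda module imports correctly:
import cv2

# Prints a device summary that includes the CUDA compute capability;
# for an MX130 this should report 5.0 rather than 6.1.
cv2.cuda.printCudaDeviceInfo(0)
If it reports 5.0, rebuilding with CUDA_ARCH_BIN set to 5.0 should make the error go away.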
Sources:
https://en.wikipedia.org/wiki/GeForce_10_series#GeForce_10_(10xx)_series_for_notebooks (notice how the Fab (nm) is different and the code name is GM108, not GPxxx)
https://www.techpowerup.com/gpu-specs/geforce-mx130.c3043
Related
I installed OpenCV for GPU use in Python, following tutorials on YouTube.
I ran into a major difficulty when trying to check whether Python recognizes the GPU.
After the installation, I executed this code to verify whether my GPU is detected:
import cv2
from cv2 import cuda
cuda.printCudaDeviceInfo(0)
The first output was:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\core\include\opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'
So I thought my install was wrong, and I did the install again.
After many attempts, I tried the same code verification as before, but from the site-packages folder of my Miniconda install (the same location as the GPU-enabled cv2).
And surprisingly, when I use the method cuda.printCudaDeviceInfo(0) there, the output is:
*** CUDA Device Query (Runtime API) version (CUDART static linking) ***
Device count: 1
Device 0: "NVIDIA T400"
...
Compute Mode:
Default (multiple host threads can use ::cudaSetDevice() with device simultaneously)
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.70, CUDA Runtime Version = 11.70, NumDevs = 1
So the GPU is detected when I use Python from that folder.
But I want to be able to use Python from other folders.
I thought it was a PATH error, so I added the cv2 location to my system environment variables, but I got the same result.
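One way to see which cv2 module a given Python session actually picks up; a minimal check (not part of the original setup), since a CPU-only cv2 on the import path can shadow the GPU build:
import cv2

# Show where the imported cv2 lives; if this points into a CPU-only
# package rather than the GPU build, that explains the error.
print(cv2.__file__)

# Returns 0 when the imported build was compiled without CUDA support.
print(cv2.cuda.getCudaEnabledDeviceCount())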
Does anyone have an idea about how to fix this?
Thank you.
I am trying to run a profiling script for PyTorch on MS WSL 2.0 with Ubuntu 20.04.
WSL is on the newest version (wsl --update). I am running the stable conda PyTorch build with CUDA 11.3 from the PyTorch website, with PyTorch 1.11. My GPU is a GTX 1650 Ti.
I can run my script fine and it finishes without error, but when I try to profile it using PyTorch's bottleneck profiling tool, python -m torch.utils.bottleneck run.py,
it first throws this warning when starting the autograd profiler:
Running your script with the autograd profiler...
WARNING:2022-06-01 13:37:49 513:513 init.cpp:129] function status failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2022-06-01 13:37:49 513:513 init.cpp:130] CUPTI initialization failed - CUDA profiler activities will be missing
Then, if I run for a small number of epochs, the script finishes fine and also shows the CUDA profiling stats (even though it says profiler activities will be missing). But when I do a longer run, I get the message Killed after the script runs "through" the autograd profiler. The command dmesg gives this output at the end:
[ 1224.321233] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=python,pid=295,uid=1000
[ 1224.321421] Out of memory: Killed process 295 (python) total-vm:55369308kB, anon-rss:15107852kB, file-rss:0kB, shmem-rss:353072kB, UID:1000 pgtables:39908kB oom_score_adj:0
[ 1224.746786] oom_reaper: reaped process 295 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:353936kB
So, when using the profiler, there seems to be a memory problem (not necessarily related to the CUPTI warning above). Is this caused by the profiler saving too much data in memory? If so, it might be a common problem for runs that are too long, right?
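If the in-memory trace is indeed the cause, one workaround is to profile only a bounded window of steps instead of the whole run. A sketch using torch.profiler (available in PyTorch 1.11) rather than the bottleneck tool; train_step here is a hypothetical stand-in for one iteration of the actual script:
from torch.profiler import ProfilerActivity, profile, schedule

# Collect profiling data for only a few iterations so the trace held
# in memory stays small; the step counts here are arbitrary.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
) as prof:
    for _ in range(10):
        train_step()  # hypothetical: one training iteration
        prof.step()   # advance the profiler schedule

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))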
The warning CUPTI_ERROR_NOT_INITIALIZED indicates that CUPTI (short for "CUDA Profiling Tools Interface") is not running. I read in another post that this might be because I am running a newer version of CUPTI that is not backward compatible with the older CUDA 11.3. Since CUPTI is not included in conda's cudatoolkit by default, the system is probably trying to locate it but does not find it or cannot use it.
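A quick way to test whether the dynamic loader can find a CUPTI library at all; a minimal check, noting that the exact soname varies across CUDA versions:
import ctypes

# Try to load the CUPTI shared library; an OSError means the loader
# cannot find it, which matches the CUPTI initialization warning.
try:
    ctypes.CDLL("libcupti.so")
    print("CUPTI found")
except OSError as exc:
    print("CUPTI not found:", exc)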
I'd appreciate any help with this issue. It would be quite nice to get a longer profiling run, in order to determine the bottlenecks / expensive operations in my PyTorch code.
Thanks!
Hi, I built OpenCV from source and would like to use it on the GPU, so I set all the flags, but I still get this error when I try to put an image on the GPU. Does anyone have an idea? Here is what the configuration looks like in the end. I have already struggled for hours with this. I think the configuration is right, but maybe some path isn't?
cv2.error: OpenCV(4.4.0) /tmp/pip-req-build-v7sdauef/opencv/modules/core/include/opencv2/core/private.cuda.hpp:106: error: (-216:No CUDA support) The library is compiled without CUDA support in function 'throw_no_cuda'
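One thing worth checking (a suggestion beyond what the post shows): the build path in the traceback looks like a pip build directory, so a CPU-only wheel may be shadowing the source build. The imported module can report its own build configuration:
import cv2

# Print only the CUDA-related lines of the imported module's build
# configuration; "NO" here means this cv2 was built without CUDA.
for line in cv2.getBuildInformation().splitlines():
    if "CUDA" in line:
        print(line)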
I'm trying to offload computation to a GPU in TensorFlow eager execution on Google Colaboratory, but due to the way Colab handles GPUs I'm having trouble.
Normally, when using a GPU you change the runtime to a GPU accelerated one and Tensorflow automatically uses the available GPU for your calculations. If you were to try and set the GPU manually, you'd find you can't because there isn't one in the list of connected devices.
Eager execution, however, doesn't automatically use a GPU if one is available, and because you can't set one manually it doesn't seem like one can be used.
Please see the attached notebook: https://drive.google.com/file/d/1NeJQoO5Yi5V-m7Hmm85wmm1Cl5SrY33m/view?usp=sharing
Trying to specify a GPU to use throws the following error: RuntimeError: Error copying tensor to device: /job:localhost/replica:0/task:0/device:GPU:0. /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
Eager execution does in fact let you specify GPUs manually (for example, search for "GPU" in https://www.tensorflow.org/programmers_guide/eager).
The particular error message you've included indicates that TensorFlow can't find a GPU, so make sure that you've installed the GPU-compatible version of TensorFlow 1.7. After you've done that, you'll need to make one more change to your code. In particular, copy the tensors x and y to GPU before multiplying them, like so:
with tf.device('/gpu:0'):
    x = x.gpu()
    y = y.gpu()
    %timeit x * y
I am struggling with the following Python code:
import pyopencl as cl
ctx = cl.Context(dev_type=cl.device_type.GPU)
It gives the following exception:
RuntimeError: clcreatecontextfromtype failed: DEVICE_NOT_FOUND
My OS is Linux Mint Debian Edition 2, running on a laptop with an i7-5600U. It also has a graphics card, but I do not use it. I am using Python 3.4.2.
I have installed the Debian package amd-opencl-icd (I first tried beignet, but then the command clinfo failed).
I have installed pyopencl using pip and OpenCL using this tutorial. Note that I did not do the fourth step (creating the symbolic link to intel64.icd), since I did not have this file. The test at the end of the tutorial succeeded.
Do you have any hint about what is happening? I am surprised that the C++ OpenCL test (in the tutorial) and the installation of pyopencl both succeeded, yet this simple pyopencl command fails.
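One quick diagnostic is to enumerate what OpenCL actually exposes. A minimal sketch, assuming pyopencl itself imports fine (which it apparently does):
import pyopencl as cl

# List every platform and its devices; if no platform reports a GPU
# device, requesting a GPU-only context fails with DEVICE_NOT_FOUND.
for platform in cl.get_platforms():
    print(platform.name)
    for device in platform.get_devices():
        print("  ", device.name, "-", cl.device_type.to_string(device.type))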
EDIT
After installing the Intel driver, I now have a different issue.
The command clinfo gives the following:
terminate called after throwing an instance of 'unsigned long'
And the above Python code gives:
LogicError: clcreatecontextfromtype failed: INVALID_PLATFORM
You've installed the Intel OpenCL SDK, which gives you the compiler and maybe the CPU runtime. You're trying to create a context consisting of GPU devices, which means you need the runtime for Intel HD Graphics. Grab the 64-bit driver from the link below.
https://software.intel.com/en-us/articles/opencl-drivers#latest_linux_driver
The CPU runtime is also available from that link. You need to follow the same procedure as before for the OpenCL HD Graphics driver (converting the .rpm to a .deb). The CPU driver has a script you can execute.
The INVALID_PLATFORM error you got after installing the runtime appears to arise because, when creating a context from a device type, pyopencl expects the platform to be passed as a property. The properties are given as a list of (key, value) tuples, as shown in the snippet below for the first available platform: the key is one of the values in context_properties, and the value is the platform object itself.
import pyopencl as cl

platforms = cl.get_platforms()
ctx = cl.Context(
    dev_type=cl.device_type.GPU,
    properties=[(cl.context_properties.PLATFORM, platforms[0])],
)
print(ctx.devices)
On my platform this prints
[<pyopencl.Device 'Intel(R) HD Graphics 4600' on 'Intel(R) OpenCL' at 0x1c04b217140>]
as my first platform is Intel.