Runtime error using Python Library Keops using CUDA in Ubuntu18.04

I am trying to run the samples from the Python library GeomLoss, which depends on CUDA, PyTorch and KeOps, on Ubuntu 18.04.3. I installed Python 3.7 using Anaconda, and I am using CUDA 10.1. The gcc version is 7.4.0.
When I run the samples from GeomLoss, I get this error message:
RuntimeError: [KeOps] This KeOps shared object has been compiled
without cuda support: try to set tagHostDevice to 0 or recompile the
formula with a working version of cuda.
I cannot set tagHostDevice to 0, since according to their documentation this disables GPU computation. I checked the CUDA and PyTorch installations and there were no errors.
But when I tried to run the installation check from KeOps:
import torch
import pykeops.torch as pktorch
x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))
I received this error message:
error -- unsupported GNU version! gcc versions later than 6 are not supported! ^~~~~ CMake Error at
keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o.Release.cmake:219
I checked the CUDA documentation: for Ubuntu 18.04.3 on x86_64, the natively supported compiler should be gcc 7.3.0. Running gcc --version shows that the system default is gcc 7.4.0. I am not sure whether this is what breaks KeOps with CUDA on the GPU. Also, I believe KeOps does not support gcc versions before 7, so I am confused about what I should do to fix the problem.
Has anyone experienced similar problems with GeomLoss, KeOps or other libraries? I would be grateful for any suggestions. Thanks!
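The "unsupported GNU version" message comes from the CUDA toolchain that nvcc is using, so a first step is to compare the host compiler's major version with the limit quoted in the error. A minimal sketch in Python, assuming the limit of 6 from the message above and the usual one-line format of gcc --version; gcc_major and host_compiler_ok are illustrative names, not part of KeOps:

```python
import re
import shutil
import subprocess


def gcc_major(version_text):
    """Return the major version from the first line of `gcc --version` output."""
    lines = version_text.splitlines()
    if not lines:
        return None
    # The first line usually ends with the full version, e.g. "... 7.4.0".
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", lines[0])
    return int(match.group(1)) if match else None


def host_compiler_ok(version_text, max_supported_major):
    """True if gcc's major version does not exceed the given limit."""
    major = gcc_major(version_text)
    return major is not None and major <= max_supported_major


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        text = subprocess.run([gcc, "--version"], capture_output=True, text=True).stdout
        # 6 is the limit quoted in the error message above (an assumption elsewhere).
        print("gcc major:", gcc_major(text), "within limit:", host_compiler_ok(text, 6))
```

If the check fails even though the CUDA release you installed documents support for your gcc, it is worth confirming which nvcc the build actually picks up.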

The following steps worked for me:
First, while checking the dependencies listed in this link, I noticed that the nvcc compiler was not installed. Following the NVIDIA CUDA Toolkit Installation Guide, I ran:
wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run
Then I realized that the nvcc command was still not found, so I added the CUDA directories to the path:
nano ~/.bashrc
# Add the following two lines:
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
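To confirm that the PATH change took effect (open a new shell or run source ~/.bashrc first), you can look nvcc up from Python. A minimal sketch; parse_nvcc_release and nvcc_status are illustrative names, and the "release X.Y" pattern assumes the usual nvcc --version output:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def nvcc_status():
    """Return (path, release) for the nvcc found on PATH, or (None, None)."""
    path = shutil.which("nvcc")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    print(nvcc_status())
```

A result of (None, None) means the shell that launched Python still does not see /usr/local/cuda/bin.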
In general, it helps to read the PyKeOps logs to follow the build process. You can turn on verbose output and the debug build type to find out what failed:
# Testing PyKeops installation
import pykeops
# Changing verbose and mode
pykeops.verbose = True
pykeops.build_type = 'Debug'
# Clean up the already compiled files
pykeops.clean_pykeops()
# Test Numpy integration
pykeops.test_numpy_bindings()

Related

CUDA_HOME environment variable is not set

I have a working environment for deep learning with PyTorch on a GPU, and I ran into a problem when I tried to use mmcv.ops.point_sample, which returned:
ModuleNotFoundError: No module named 'mmcv._ext'
I have read that you should actually use mmcv-full to solve it, but I got another error when I tried to install it:
pip install mmcv-full
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
This seems logical enough, since I never installed CUDA on my Ubuntu machine (I am not the administrator). Still, training ran fine on models I built myself, so I am guessing the PyTorch package ships with the minimal code required to run CUDA tensor operations.
So my main question is: where is CUDA installed when it comes with the PyTorch package, and can I use that path as the CUDA_HOME environment variable?
Additionally, if anyone knows good sources on the internals of CUDA with PyTorch/TensorFlow, I would like to take a look. (I have been reading the cudatoolkit documentation, which is good, but it seems aimed more at C++ CUDA developers than at the interaction between Python and the library.)

You can check whether it is installed, and where, with these commands:
which nvidia-smi
which nvcc
cat /usr/local/cuda/version.txt
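The same checks can be scripted. A minimal sketch, assuming a Linux layout where the toolkit root contains bin/nvcc; guess_cuda_home is a hypothetical helper, not part of PyTorch or mmcv, and the candidate order is my own choice:

```python
import os
import shutil


def guess_cuda_home(environ=None, default="/usr/local/cuda"):
    """Return the first candidate directory that contains bin/nvcc, else None."""
    environ = os.environ if environ is None else environ
    candidates = [environ.get("CUDA_HOME"), environ.get("CUDA_PATH")]

    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    if nvcc:
        # .../cuda/bin/nvcc -> .../cuda
        candidates.append(os.path.dirname(os.path.dirname(os.path.realpath(nvcc))))
    candidates.append(default)

    for candidate in candidates:
        if candidate and os.path.isfile(os.path.join(candidate, "bin", "nvcc")):
            return candidate
    return None


if __name__ == "__main__":
    print(guess_cuda_home() or "no CUDA toolkit with nvcc found")
```

If this prints a directory, that is a reasonable value to try for CUDA_HOME; if it finds nothing, there is probably no full toolkit on the machine to point CUDA_HOME at.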

How to find the right libnvinfer version for Cuda

I am building a Docker image, for deep learning:
cuda:11.2.0-cudnn8-devel-ubuntu20.04
PYTHON_VERSION=3.7.9
For this I need to install 3 dependencies, but I can't find the right versions. The error I get when building the Docker image:
E: Version '8.1.1.33-1+cuda11.2' for 'libnvinfer8' was not found E:
Version '8.1.1.33-1+cuda11.2' for 'libnvinfer-dev' was not found E:
Version '8.1.1.33-1+cuda11.2' for 'libnvinfer-plugin8' was not found
I experimented with other versions as well, but without success. So the question is: where or how can I find the versions that work with CUDA 11.2 and Ubuntu 20.04? Is there a rule of thumb?

To check which version of TensorRT your tensorflow installation expects:
>>> import tensorflow
>>> tensorflow.__version__
'2.8.0'
>>> from tensorflow.python.compiler.tensorrt import trt_convert as trt
>>> trt.trt_utils._pywrap_py_utils.get_linked_tensorrt_version()
(7, 2, 2)
>>> trt.trt_utils._pywrap_py_utils.get_loaded_tensorrt_version()
2022-03-24 08:59:15.415733: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
The last command shows that indeed libnvinfer.so is missing on your system (you can also check this fact using ldconfig -p | grep libnv).
To install it (adapted from Tensorflow's gpu.Dockerfile), take the TensorRT version in your output from above, double check it's available for your CUDA version on the nvidia repository, and install:
export LIBNVINFER=7.2.2 LIBNVINFER_MAJOR_VERSION=7 CUDA_VERSION=11.0
apt-get install -y libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}-1+cuda${CUDA_VERSION} libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}-1+cuda${CUDA_VERSION}
The last Python command above should now start working. If you get a similar error again, double check that the so file locations (check with ldconfig -p | grep libnv) are included in LD_LIBRARY_PATH.
Also double check your CUDA version. In my case, I was running the docker.io/nvidia/cuda:11.5.1-cudnn8-runtime-ubuntu20.04 image, which already contains the math libraries and specifically libnvrtc.so.11.2 (that is, a newer CUDA version than the TensorRT packages on the NVIDIA repository support). This becomes evident after running the apt-get command above, which gave this output:
The following packages have unmet dependencies:
libnvinfer7 : Depends: cuda-nvrtc-11-0 but it is not going to be installed
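The version pinning can also be written as a small helper that turns the tuple reported by TensorFlow into apt package specs. A minimal sketch; tensorrt_apt_specs is an illustrative name, and the "-1+cudaX.Y" suffix follows the version strings shown in the question's error output, so check it against what apt actually lists:

```python
def tensorrt_apt_specs(trt_version, cuda_version):
    """Build pinned apt package specs such as 'libnvinfer7=7.2.2-1+cuda11.0'."""
    major = trt_version[0]
    version = ".".join(str(part) for part in trt_version)
    suffix = f"{version}-1+cuda{cuda_version}"
    return [
        f"libnvinfer{major}={suffix}",
        f"libnvinfer-plugin{major}={suffix}",
    ]


# (7, 2, 2) is the tuple returned by get_linked_tensorrt_version() above.
print(" ".join(tensorrt_apt_specs((7, 2, 2), "11.0")))
```

Running apt-cache madison libnvinfer7 in the image shows which pinned versions the configured repositories really offer.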

When in doubt:
https://www.tensorflow.org/install/source#gpu
This was actually the answer and it works like a charm. Follow the official site's tested build configurations:
tensorflow-2.5.0
cuda:11.2.0
PYTHON_VERSION=3.8.11
libcudnn8=8.1.1.33-1+cuda11.2
libcudnn8-dev=8.1.1.33-1+cuda11.2

ImportError: Could not load dynamic library 'cudart64_110.dll' [duplicate]

I was originally running TensorFlow from PyCharm.
In PyCharm, the error in the title did not appear.
But after I switched to VS Code and installed the Python extension,
the error in the title appears every time I write and execute import tensorflow as tf.
ImportError: Could not load dynamic library 'cudart64_110.dll'
Since there was no problem in PyCharm, it does not seem to be an environment variable problem.
When I type the same command that VS Code executed into the command prompt window, a different message appears:
"Connection failed because the target computer refused to connect."
My OS: Windows 10
I am using Anaconda, and I created a virtual environment.
vscode ver : 1.53.2
tensorflow ver : 2.4.1
CUDA : 11.2
cudnn : 8.1

This is due to TensorFlow's GPU support. TensorFlow now ships with GPU support, so the system needs a supported graphics card plus CUDA and cuDNN installations. If the CUDA installation is missing, you get the message above. The latest versions of TensorFlow sometimes won't run without CUDA.
Try installing TensorFlow 1.15 with Python 3.7.4:
https://www.python.org/ftp/python/3.7.4/python-3.7.4-amd64.exe
pip install tensorflow==1.15
NB: Normally TensorFlow will run without CUDA, but the message will always be shown in the prompt.

I agree that this is due to your CUDA version. Check the bottom of the TensorFlow GPU build configuration table: for 2.4 you need CUDA 11.0 and cuDNN 8.0, and you have neither. In addition, you need MSVC 2019 to compile it.
Note that for newer versions of tensorflow-gpu (>=2.3.0), conda will NOT download everything; you need to install the rest manually.
Since all the evidence points to a GPU support problem, and tensorflow-gpu can still run without using the GPU, it is possible that it was running on the CPU when you used PyCharm.
I would suggest you double-check that it runs as intended in PyCharm with
print(tf.config.list_physical_devices('GPU'))
or simply reinstall everything.
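The version comparison can be made explicit with a small lookup. A minimal sketch that contains only the two pairings quoted on this page (2.4 and 2.5); TESTED and mismatches are illustrative names, and the official tested-configurations table remains the authority:

```python
# Pairings quoted on this page; extend from the official table as needed.
TESTED = {
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def mismatches(tf_version, cuda, cudnn, table=TESTED):
    """Return a list of human-readable mismatches for a TensorFlow install."""
    key = ".".join(tf_version.split(".")[:2])
    expected = table.get(key)
    if expected is None:
        return [f"no entry for TensorFlow {key}"]
    found = {"cuda": cuda, "cudnn": cudnn}
    return [
        f"{name}: expected {expected[name]}, found {found[name]}"
        for name in ("cuda", "cudnn")
        if found[name] != expected[name]
    ]


# The versions listed in the question: TensorFlow 2.4.1, CUDA 11.2, cuDNN 8.1.
print(mismatches("2.4.1", "11.2", "8.1"))
```

For the question's setup this reports both CUDA and cuDNN as mismatched, which is consistent with TensorFlow looking for cudart64_110.dll (the CUDA 11.0 runtime).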

I copied "cudart64_110.dll" to the CUDA/v11.2/bin folder and it was resolved.
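To see whether the DLL is reachable from the environment that VS Code launches, you can search PATH from inside that interpreter. A minimal sketch, assuming the loader finds the library through PATH; find_on_path is an illustrative helper:

```python
import os


def find_on_path(filename, path=None):
    """Return every directory on the search path that contains `filename`."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    print(find_on_path("cudart64_110.dll") or "cudart64_110.dll is not on PATH")
```

Running it in both the PyCharm and the VS Code terminals shows whether the two editors start Python with different PATH values.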

How to find installation path in virtual environment?

I created a virtual environment using Anaconda Python and installed the CUDA toolkit in it. Now I have to give the path of the CUDA installation in a makefile. The default path /usr/local/cuda/include/ doesn't exist. How can I find the right CUDA path?
I have to change the makefile lines given below:
COMMON+= -DGPU -I/usr/local/cuda/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
The command which nvcc gives /usr/bin/nvcc
locate cuda | grep /cuda$ gives
/home/tan/.conda/envs/tensorflow_env/include/opencv2/core/cuda
/home/tan/.conda/envs/tensorflow_env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/envs/tensorflow_gpu/include/opencv2/core/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/libopencv-3.4.2-hb342d67_1/include/opencv2/core/cuda
/home/tan/.conda/pkgs/numba-0.42.0-py36h962f231_0/lib/python3.6/site-packages/numba/cuda
/home/tan/.conda/pkgs/opencv3-3.1.0-py36_0/include/opencv2/core/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.3.0-py27h0dbb4d0_1/lib/python2.7/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.3.0-py36h5293eaa_1/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/anaconda3/lib/python3.6/site-packages/numba/cuda
/home/tan/anaconda3/pkgs/numba-0.38.0-py36h637b7d7_0/lib/python3.6/site-packages/numba/cuda
/home/tan/opencv3/opencv-3.4.1/build/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/build_dnn/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/build_gpu/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/include/opencv2/core/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/perf/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaarithm/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudabgsegm/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudacodec/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudafeatures2d/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudafilters/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaimgproc/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudalegacy/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaobjdetect/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaoptflow/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudastereo/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudawarping/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/photo/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/stitching/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/superres/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/videostab/src/cuda
/home/tan/opencv3/opencv_contrib-3.4.1/modules/hfs/src/cuda
/home/tan/opencv3/opencv_contrib-3.4.1/modules/xfeatures2d/src/cuda
/usr/include/flann/util/cuda

A full CUDA installation via the runfile for Ubuntu is 2.4 GB, while the Anaconda package is only ~370 MB. The latter contains only what is required to run libraries that depend on it, e.g. PyTorch or TensorFlow. It is not a full installation and most likely does not have what you're looking for.
You need the full development package, which can be found on the NVIDIA website.
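To check what the environment really contains before editing the makefile, you can search it for the files the build needs. A minimal sketch; find_cuda_files is an illustrative helper, and the default file names (a CUDA header, the compiler, the runtime library) are my assumption about what the -I and -l flags in the makefile require:

```python
import os
import sys


def find_cuda_files(root, names=("cuda_runtime.h", "nvcc", "libcudart.so")):
    """Map each file name to the directories under `root` that contain it."""
    wanted = set(names)
    found = {name: [] for name in names}
    for directory, _subdirs, files in os.walk(root):
        for name in wanted.intersection(files):
            found[name].append(directory)
    return {name: sorted(dirs) for name, dirs in found.items()}


if __name__ == "__main__":
    # sys.prefix is the root of the active conda or virtual environment.
    for name, dirs in find_cuda_files(sys.prefix).items():
        print(name, "->", dirs or "not found")
```

If cuda_runtime.h is reported as not found, the environment has only runtime libraries and the makefile's include path cannot point into it.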

You can have multiple full installations of CUDA on your machine. It is my understanding that CUDA is backwards compatible, so you only need one.
However, compiling with a newer version of CUDA might link you against incompatible newer libraries from other packages.
If you do need to install multiple versions of CUDA, you need to add links to .bashrc and so on. Here is a website with instructions:
https://medium.com/@peterjussi/multicuda-multiple-versions-of-cuda-on-one-machine-4b6ccda6faae
tl;dr: run this command.
sudo ldconfig /usr/local/cuda-8.0/lib64
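Before switching versions, it helps to see which installations exist. A minimal sketch, assuming the conventional layout of versioned cuda-X.Y directories under /usr/local with /usr/local/cuda pointing at the active one; list_cuda_installs is an illustrative helper:

```python
import os
import re


def list_cuda_installs(base="/usr/local"):
    """Return ({version: path}, active_target) for cuda-X.Y directories under `base`."""
    installs = {}
    if os.path.isdir(base):
        for entry in sorted(os.listdir(base)):
            match = re.fullmatch(r"cuda-(\d+\.\d+)", entry)
            full = os.path.join(base, entry)
            if match and os.path.isdir(full):
                installs[match.group(1)] = full
    default = os.path.join(base, "cuda")
    # By convention, <base>/cuda is a symlink to the active version.
    active = os.path.realpath(default) if os.path.exists(default) else None
    return installs, active


if __name__ == "__main__":
    versions, active = list_cuda_installs()
    print("installed:", versions)
    print("active:", active)
```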

Tensorflow GPU setup: error with CUDA on PyCharm

I have TF 0.8 installed on Python3, MacOSX El Capitan.
When running a simple test code for TF I get this message:
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10):
Library not loaded: @rpath/libcudart.7.5.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
My .bash_profile is as follows:
export PATH=/usr/local/bin:$PATH
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-7.5/lib:/usr/local/cuda/lib
In /Developer/NVIDIA/CUDA-7.5/lib I have a file called libcudart.7.5.dylib.
In /usr/local/cuda/lib I have an alias called libcudart.7.5.dylib.
I have tried several permutations of .bash_profile without success. Any idea what may be wrong?
Note that I can successfully use my GPU with Theano so there's no reason to believe the GPU/cuDNN/CUDA install may be faulty.

If you're getting this error, make sure you installed CUDA, cuDNN correctly as described in the Tensorflow install instructions. Note the TF, CUDA, cuDNN version you're installing and what Python version you're using.
Filenames, paths etc frequently vary, so small tweaks in filenames and paths may be needed in your case if errors are cropping up. Sometimes it's hard for others to help you because your system may have a very specific path setup/version that cannot be understood by someone in a forum.
If you're getting exactly the error I'm describing in the OP, take a step back and check:
is this happening in PyCharm?
is this happening in iPython within PyCharm?
is it happening in both?
is this happening in iPython in Terminal?
In my case it was happening only in PyCharm. In iPython outside of PyCharm (that is using the Mac 'Terminal' software) everything worked fine. But when doing iPython in PyCharm, or when running the test file through PyCharm, I would get the error. This means it has something to do with PyCharm, not the Tensorflow install.
Make sure your DYLD_LIBRARY_PATH points to the directory that contains the libcudart.7.5.dylib file. Navigate there with Finder, or do a Spotlight search to find the file or its alias. Then place that path in your .bash_profile. In my case, this is working:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib
If your problem is PyCharm, a specific configuration is needed. Go to the upper right corner of the GUI and click the gray down arrow.
Choose "Edit Configuration". You will see an Environment option where you need to click on the ... box and enter the DYLD_LIBRARY_PATH that applies to your case.
Note that there's an Environment option for the specific file you're working on (it will be highlighted in the left panel) and for Defaults (put DYLD_... there as well if you want future files you create to have this). Note that you need to save this config or else when you close PyCharm it won't stick.
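To verify that the setting actually reaches the Python process, you can print the relevant variables from inside it. A minimal sketch; env_report is an illustrative helper and the list of variable names is my own choice:

```python
import json
import os


def env_report(names=("PATH", "DYLD_LIBRARY_PATH", "LD_LIBRARY_PATH"), environ=None):
    """Return {variable: list of path entries} for the current process."""
    environ = os.environ if environ is None else environ
    return {
        name: [entry for entry in environ.get(name, "").split(os.pathsep) if entry]
        for name in names
    }


if __name__ == "__main__":
    # Run once from Terminal and once from a PyCharm run configuration, then compare.
    print(json.dumps(env_report(), indent=2))
```

An empty DYLD_LIBRARY_PATH list under PyCharm but not under Terminal confirms that the run configuration, not the TensorFlow install, is what needs fixing.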

This extends pepe's answer, which is the correct one; I don't mind if the following is integrated into the original answer.
If you want this change to be permanent in PyCharm (it affects only the current project) instead of repeating it for every new file, you can set DYLD_LIBRARY_PATH under "Defaults" in the interface described above by pepe.
Keep in mind that this change by itself doesn't alter the run configuration of the current script, which still needs to be changed manually (or deleted and regenerated from the new defaults).

Could you try TF 0.9, which adds Mac OS X GPU support?
https://github.com/tensorflow/tensorflow/blob/r0.9/RELEASE.md

It appears that CUDA is not being found on your system. This could be for a number of reasons, including installing CUDA for one version of Python while running a different version of Python that isn't aware of the other version's installed files.
Please take a look at my answer here:
https://stackoverflow.com/a/41073045/1831325

In my case, the TensorFlow version is 1.1, and the dlopen error happens in both
IPython and PyCharm.
Environment:
Cuda version:8.0.62
cudnn version:6
The error is a little different in PyCharm and IPython. I cannot remember the details, but IPython says there is no libcudnn.5.dylib, while PyCharm just reports an
import error, image not found
Solution:
Download cudnn version 5, from
https://developer.nvidia.com/rdp/cudnn-download
Unzip the cuDNN archive. Copy the contents of lib/ to /usr/local/cuda/lib and the header from include/ to /usr/local/cuda/include:
unzip cuda.zip
cd cuda
sudo cp -r lib/* /usr/local/cuda/lib/
sudo cp include/cudnn.h /usr/local/cuda/include
Add the lib directory path to your DYLD_LIBRARY_PATH, like this in my ~/.bash_profile:
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-8.0/lib:/usr/local/cuda/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
The official TensorFlow installation guide says cuDNN 5.1 is needed, so this was all down to my carelessness.
https://www.tensorflow.org/install/install_mac
Requirements to run TensorFlow with GPU support.
If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:
...
The NVIDIA drivers associated with CUDA Toolkit 8.0.
cuDNN v5.1. For details, see NVIDIA's documentation.
...

I think the problem is SIP (System Integrity Protection). Restricted processes run with cleared environment variables, which is why you get this error.
You need to boot into Recovery Mode, start Terminal and enter
$ csrutil disable
and then reboot.
