How to find the right libnvinfer version for CUDA - python

I am building a Docker image for deep learning:
cuda:11.2.0-cudnn8-devel-ubuntu20.04
PYTHON_VERSION=3.7.9
For this task I need to install 3 dependencies, but I can't find the right versions. This is the error I get when building the Docker image:
E: Version '8.1.1.33-1+cuda11.2' for 'libnvinfer8' was not found
E: Version '8.1.1.33-1+cuda11.2' for 'libnvinfer-dev' was not found
E: Version '8.1.1.33-1+cuda11.2' for 'libnvinfer-plugin8' was not found
I experimented with other versions as well, but had no success. So the question is: where/how can I find the right versions that work with CUDA 11.2 and Ubuntu 20.04? Is there a rule of thumb?

To check which version of TensorRT your TensorFlow installation expects:
>>> import tensorflow
>>> tensorflow.__version__
'2.8.0'
>>> from tensorflow.python.compiler.tensorrt import trt_convert as trt
>>> trt.trt_utils._pywrap_py_utils.get_linked_tensorrt_version()
(7, 2, 2)
>>> trt.trt_utils._pywrap_py_utils.get_loaded_tensorrt_version()
2022-03-24 08:59:15.415733: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
The last command shows that libnvinfer.so.7 is indeed missing on your system (you can also verify this with ldconfig -p | grep libnv).
To install it (adapted from TensorFlow's gpu.Dockerfile), take the TensorRT version from the output above, double-check that it is available for your CUDA version in the NVIDIA repository, and install:
export LIBNVINFER=7.2.2 LIBNVINFER_MAJOR_VERSION=7 CUDA_VERSION=11.0
apt-get install -y libnvinfer${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}-1+cuda${CUDA_VERSION} libnvinfer-plugin${LIBNVINFER_MAJOR_VERSION}=${LIBNVINFER}-1+cuda${CUDA_VERSION}
The last Python command above should now start working. If you get a similar error again, double-check that the .so file locations (check with ldconfig -p | grep libnv) are included in LD_LIBRARY_PATH.
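If you prefer to run the check from Python instead of the shell, here is a minimal sketch (assuming the TensorRT 7 soname reported by get_linked_tensorrt_version() above; adjust it to your version):
import ctypes
import os

# Show the search path the dynamic loader will use (may be unset).
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<not set>"))

# Try to load the TensorRT runtime; adjust the soname to the major version
# reported by get_linked_tensorrt_version() above.
try:
    ctypes.CDLL("libnvinfer.so.7")
    print("libnvinfer.so.7 found")
except OSError as err:
    print("libnvinfer.so.7 not found:", err)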
Also double-check your CUDA version. In my case, I was running the docker.io/nvidia/cuda:11.5.1-cudnn8-runtime-ubuntu20.04 image, which already contains the math libraries, specifically libnvrtc.so.11.2 (i.e. a newer CUDA version than TensorRT supports in the NVIDIA repository). This became evident after running the apt-get command above, which gave this output:
The following packages have unmet dependencies:
libnvinfer7 : Depends: cuda-nvrtc-11-0 but it is not going to be installed

When in doubt:
https://www.tensorflow.org/install/source#gpu
This was actually the answer, and it works like a charm. Follow the official site for the tested build configurations:
tensorflow-2.5.0
cuda:11.2.0
PYTHON_VERSION=3.8.11
libcudnn8=8.1.1.33-1+cuda11.2
libcudnn8-dev=8.1.1.33-1+cuda11.2
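If a GPU build of TensorFlow is already installed, you can also ask it directly which CUDA and cuDNN versions it was built against. A minimal sketch, assuming a recent TensorFlow 2.x release where tf.sysconfig.get_build_info() is available:
import tensorflow as tf

# On recent TensorFlow 2.x GPU builds this mapping contains the CUDA and
# cuDNN versions the wheel was compiled against.
build = tf.sysconfig.get_build_info()
print("CUDA:", build.get("cuda_version"))
print("cuDNN:", build.get("cudnn_version"))
Matching those values against the NVIDIA repository is a quick way to pick the apt package versions to pin in the Dockerfile.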

Related

Runtime error using Python library KeOps using CUDA in Ubuntu 18.04

I am trying to run samples from the Python library GeomLoss, which depends on CUDA, PyTorch and KeOps, on Ubuntu 18.04.3. I installed Python 3.7 using Anaconda, and I am using CUDA 10.1. The gcc version is 7.4.0.
When I run samples from GeomLoss, the error message says:
RuntimeError: [KeOps] This KeOps shared object has been compiled
without cuda support: try to set tagHostDevice to 0 or recompile the
formula with a working version of cuda.
I cannot change tagHostDevice to 0, since that would disable GPU calculation according to their documentation. I checked the CUDA and PyTorch installations and there was no error.
But when I tried to run the installation checking code from KeOps:
import torch
import pykeops.torch as pktorch
x = torch.arange(1, 10, dtype=torch.float32).view(-1, 3)
y = torch.arange(3, 9, dtype=torch.float32).view(-1, 3)
my_conv = pktorch.Genred('SqNorm2(x-y)', ['x = Vi(3)', 'y = Vj(3)'])
print(my_conv(x, y))
I received error message:
error -- unsupported GNU version! gcc versions later than 6 are not supported! ^~~~~ CMake Error at
keopslibKeOpstorch91c92bd508_generated_link_autodiff.cu.o.Release.cmake:219
I checked the CUDA documentation: for Ubuntu 18.04.3, the supported native Linux distribution compiler should be gcc 7.3.0 for x86_64. I used gcc --version to check the default gcc on the system and it is gcc 7.4.0. I am not sure if this is the problem when using KeOps with CUDA and the GPU. Also, I believe KeOps does not support gcc versions before 7. So I am really confused about what I should do to fix the problem right now.
I am wondering if anyone has experienced similar problems with GeomLoss and KeOps or other libraries. I am indeed grateful for any suggestions. Thanks!
I did the following steps and it worked for me:
First, by checking the dependencies in this link, I noticed that the nvcc compiler was not installed. Going through the NVIDIA Toolkit Installation Guide, I did the following steps:
wget https://developer.download.nvidia.com/compute/cuda/11.2.2/local_installers/cuda_11.2.2_460.32.03_linux.run
sudo sh cuda_11.2.2_460.32.03_linux.run
Then I realized that the nvcc command was not working, so I added it to the path using:
nano ~/.bashrc
# Add the following two lines:
export PATH="$PATH:/usr/local/cuda/bin"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
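After reloading the shell (e.g. source ~/.bashrc), you can quickly confirm from Python that the compiler KeOps needs is actually picked up from the new PATH. A small sketch, assuming the default /usr/local/cuda install location:
import shutil
import subprocess

# Should print something like /usr/local/cuda/bin/nvcc once PATH is updated.
nvcc = shutil.which("nvcc")
print("nvcc:", nvcc)

if nvcc:
    # Print the toolkit version that KeOps will compile against.
    subprocess.run([nvcc, "--version"], check=True)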
In general, it is important to look at the PyKeOps logs to check the process. You can always enable verbose and debug mode and inspect the details to see what failed:
# Testing PyKeops installation
import pykeops
# Changing verbose and mode
pykeops.verbose = True
pykeops.build_type = 'Debug'
# Clean up the already compiled files
pykeops.clean_pykeops()
# Test Numpy integration
pykeops.test_numpy_bindings()
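Since GeomLoss goes through the PyTorch bindings, it can also help to run the corresponding torch check; the same pykeops versions that provide test_numpy_bindings() also expose a torch variant:
# Test the PyTorch integration too (GeomLoss uses the torch bindings).
import pykeops

pykeops.test_torch_bindings()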

ImportError: Could not find 'cudart64_100.dll'

I'm trying to install tensorflow-gpu==2.0.0-beta1 on my Windows 10 machine and got this error:
ImportError: Could not find 'cudart64_100.dll'. TensorFlow requires
that this DLL be installed in a directory that is named in your %PATH%
environment variable. Download and install CUDA 10.0 from this URL:
https://developer.nvidia.com/cuda-90-download-archive
I did everything from:
the official documentation: https://www.tensorflow.org/install/gpu
and from here: https://medium.com/@teavanist/install-tensorflow-gpu-on-windows-10-5a23c46bdbc7
Checked PATH variable: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin
also have CUDA_PATH : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0 in variables
file C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin\cudart64_100.dll exists
Did system restart
But the error still occurs.
How can I fix this?
The simplest way to fix it is to install the latest 'NVIDIA GPU Computing Toolkit', because without it you'll be missing the 'cudart64_100.dll' library.
The only issue is that the latest copy of CUDA has this particular library upgraded to 'cudart64_101.dll', while the latest TensorFlow still requires the older 'cudart64_100.dll'.
Anyway, one way to deal with this issue is to install the latest CUDA plus CUDA 10.0 from September 2018, and then copy the 'cudart64_100.dll' library from the old install into the new one.
Or just visit my site where I linked the 'cudart64_100.dll' library downloaded from the CUDA Toolkit 10.0 (Sept 2018), to make it easier to copy it into the latest CUDA directory.
Here are some screenshots to illustrate the process: https://www.joe0.com/2019/10/19/how-resolve-tensorflow-2-0-error-could-not-load-dynamic-library-cudart64_100-dll-dlerror-cudart64_100-dll-not-found/
I faced a similar issue. I added the directory of the cudart64_100.dll file to the PATH variable, but it still reported that "cudart64_100.dll" was not found. In the end I finally managed to make it work by adding the following code. Hope it helps.
import ctypes
hllDll = ctypes.WinDLL("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin\\cudart64_100.dll")
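Note that for this workaround the DLL has to be loaded before TensorFlow is imported. A minimal sketch of the full sequence (the path below is the default CUDA 10.0 location and may differ on your machine):
import ctypes

# Pre-load the CUDA runtime so the already-loaded module is reused when
# TensorFlow asks Windows for cudart64_100.dll at import time.
ctypes.WinDLL("C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.0\\bin\\cudart64_100.dll")

import tensorflow as tf  # should no longer complain about cudart64_100.dll
print(tf.__version__)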
I had a similar error:
cudart64_101.dll not found
This is because the latest version of CUDA is installed, but TensorFlow still needs the .dll files from the older CUDA version to work properly. The solution would be to try installing the previous version of CUDA.
Once you have downloaded CUDA 10.1, run the .exe file, which will first extract the necessary files into C:\Users\your_user_name\AppData\Local\Temp\CUDA.
Once the extraction is complete, do not proceed with the installation. Instead, navigate to the directory C:\Users\your_user_name\AppData\Local\Temp\CUDA\cudart\bin, where you will find the missing DLL files cudart64_101.dll and cudart32_101.dll. Copy both files to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin and then cancel the installation.
The same steps will work for any CUDA version. Hope this helps, thank you!
I was also dealing with the situation where TensorFlow expects cudart64_101.dll while NVIDIA offers version 10.2 as the main version (which includes cudart64_102.dll).
I simply installed both versions and have both in the Windows PATH. Besides disk space, I have not had any problems so far, and the GPU is used by TensorFlow.
*EDIT
Every Tensorflow version requires a specific version of CUDA.
The easiest way is to open https://www.tensorflow.org/install/gpu and read which version you need.
ex. CUDA® Toolkit —TensorFlow supports CUDA® 11.2 (TensorFlow >= 2.5.0)
If you want to install TensorFlow v2.5.0 you must have exactly CUDA v11.2 installed.
TensorFlow 2.2 tries to import cudart64_101.dll, cusparse64_10.dll and cublas64_10.dll, but these are part of CUDA 10.1. If you have a different CUDA version, you will get an ImportError such as "Could not find 'cudart64_101.dll'", as these files are only available in CUDA 10.1.
If you want to use an older TensorFlow version, you have to use it with the appropriate CUDA version.
Just rename
\program files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_102.dll
to cudart64_100.dll
You can find the cudart64_100.dll file at this website link.
Extract it and add cudart64_100.dll to your C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin.
After you run your Python script you will see:
Successfully opened dynamic library cudart64_100.dll
NVidia maintains a download archive of older CUDA driver releases. I would recommend downloading the installer from here instead of a third-party archive:
https://developer.nvidia.com/cuda-toolkit-archive
That page has a link to a page for downloading version 10.0:
https://developer.nvidia.com/cuda-10.0-download-archive
For the cuDNN libraries, the archive may be found here:
https://developer.nvidia.com/rdp/cudnn-archive
You need to be a member of the NVIDIA developer program to access this archive, but it is free to sign up.
There should be no problem installing multiple versions at once, since the CUDA installer uses a separate installation directory for each version and the library files have different names. You will, however, want to select a custom installation to avoid replacing your display drivers, since you're installing older versions.
The cuDNN libraries (at least for Windows) are distributed as zip files. Copy the contents where you would like them (I installed each as a subdirectory under the corresponding CUDA installation directory).
Finally, add all of the various CUDA and cuDNN directories to your PATH, if they aren't already there, so Python and TensorFlow can find them.
The actual fix is to download and install the CUDA Toolkit.
But if you're in a hurry, you can also disable the GPU before importing TensorFlow:
import os
# BAD IDEA: disabling the GPU. The actual fix is to download and install the CUDA Toolkit.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
import tensorflow as tf
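After installing the right toolkit (or to confirm whether the workaround above really disabled the GPU), a quick check of what TensorFlow can see, assuming a TF 2.x install:
import tensorflow as tf

# An empty list means TensorFlow found no usable GPU (missing CUDA DLLs,
# CUDA_VISIBLE_DEVICES="-1", or no driver); otherwise each GPU is listed.
print(tf.config.list_physical_devices("GPU"))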
Renaming the libraries might appear to solve this issue, but it is not advised, as the root cause is a version mismatch between CUDA and cuDNN. It is advised to install the right CUDA and cuDNN versions.
I too have faced similar issues and realized that the problem was a CUDA and cuDNN version mismatch.
You can refer here for the proper CUDA and cuDNN versions. From that reference, for TensorFlow 2.4.0 it is recommended to use CUDA 11.0 and cuDNN 8.0.
Or you can refer here to download the cuDNN that matches your CUDA version.

How to find installation path in virtual environment?

I created a virtual environment using Anaconda Python and installed the CUDA toolkit in that environment. Now I have to give the path of the CUDA installation in a makefile. The default path /usr/local/cuda/include/ doesn't exist. How can I find the right path of CUDA?
I have to make changes in the makefile given below:
COMMON+= -DGPU -I/usr/local/cuda/include/
CFLAGS+= -DGPU
LDFLAGS+= -L/usr/local/cuda/lib64 -lcuda -lcudart -lcublas -lcurand
The command which nvcc gives /usr/bin/nvcc
locate cuda | grep /cuda$ gives
/home/tan/.conda/envs/tensorflow_env/include/opencv2/core/cuda
/home/tan/.conda/envs/tensorflow_env/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/envs/tensorflow_gpu/include/opencv2/core/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/envs/tensorflow_gpu/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/libopencv-3.4.2-hb342d67_1/include/opencv2/core/cuda
/home/tan/.conda/pkgs/numba-0.42.0-py36h962f231_0/lib/python3.6/site-packages/numba/cuda
/home/tan/.conda/pkgs/opencv3-3.1.0-py36_0/include/opencv2/core/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.10.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-gpu_py36had579c0_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/external/local_config_cuda/cuda/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.12.0-mkl_py36h3c3e929_0/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.3.0-py27h0dbb4d0_1/lib/python2.7/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/.conda/pkgs/tensorflow-base-1.3.0-py36h5293eaa_1/lib/python3.6/site-packages/tensorflow/include/tensorflow/stream_executor/cuda
/home/tan/anaconda3/lib/python3.6/site-packages/numba/cuda
/home/tan/anaconda3/pkgs/numba-0.38.0-py36h637b7d7_0/lib/python3.6/site-packages/numba/cuda
/home/tan/opencv3/opencv-3.4.1/build/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/build_dnn/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/build_gpu/modules/core/CMakeFiles/opencv_perf_core.dir/perf/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/include/opencv2/core/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/perf/cuda
/home/tan/opencv3/opencv-3.4.1/modules/core/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaarithm/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudabgsegm/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudacodec/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudafeatures2d/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudafilters/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaimgproc/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudalegacy/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaobjdetect/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudaoptflow/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudastereo/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/cudawarping/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/photo/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/stitching/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/superres/src/cuda
/home/tan/opencv3/opencv-3.4.1/modules/videostab/src/cuda
/home/tan/opencv3/opencv_contrib-3.4.1/modules/hfs/src/cuda
/home/tan/opencv3/opencv_contrib-3.4.1/modules/xfeatures2d/src/cuda
/usr/include/flann/util/cuda
A full CUDA installation via the runfile for Ubuntu is 2.4 GB, while the Anaconda package is only ~370 MB. The latter contains the dependencies required to run libraries that depend on it, e.g. PyTorch or TensorFlow. It's not a full installation and most likely does not have what you're looking for.
You need the full development package, which can be found on the NVIDIA website.
You can have multiple full installations of CUDA on your machine. It is my understanding that CUDA is backwards compatible, so you only need one.
However, compiling with a newer version of CUDA might link in incompatible newer libraries from other packages.
If you do need to install multiple versions of CUDA, you need to add links in .bashrc and so on. Here is a website with instructions:
https://medium.com/@peterjussi/multicuda-multiple-versions-of-cuda-on-one-machine-4b6ccda6faae
tl;dr run this command.
sudo ldconfig /usr/local/cuda-8.0/lib64
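If you do install the full toolkit and want to double-check which nvcc (and therefore which CUDA root) your makefile will pick up, here is a small Python sketch, assuming nvcc is on the PATH:
import os
import shutil

# Resolve the nvcc that is first on the PATH and derive the toolkit root,
# e.g. /usr/local/cuda-8.0/bin/nvcc -> /usr/local/cuda-8.0
nvcc = shutil.which("nvcc")
if nvcc:
    cuda_home = os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    print("CUDA root:", cuda_home)
    print("Headers:  ", os.path.join(cuda_home, "include"))
    print("Libraries:", os.path.join(cuda_home, "lib64"))
else:
    print("nvcc not found on PATH")
The include and lib64 paths printed here are what would go after -I and -L in the makefile snippet from the question.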

When I am installing caffe on my Ubuntu 16.04, I have an issue with OpenCV

I was trying to install caffe on a Linux machine, and while trying to do the make build, I am getting an issue with OpenCV. I am getting the following error:
/usr/bin/ld: cannot find -lopencv_imgcodecs
/usr/bin/ld: cannot find -lopencv_videoio
collect2: error: ld returned 1 exit status
Can anyone help me resolve this issue?
This usually happens when you have either forgotten to uncomment line 21 (in my case) when you have OpenCV 3, or you have not checked the version properly. If OpenCV >= 3.0, then in the Makefile at line 181 (in my case) add the libraries like this:
LIBRARIES += glog gflags protobuf leveldb snappy \
lmdb boost_system hdf5_hl hdf5 \
opencv_core opencv_highgui opencv_imgproc opencv_imgcodecs
And also make sure to rebuild.

Building Python 3 on OS X: [Python/importlib.h] Error 133

I am trying to build Python (3.5.2) on OS X El Capitan (10.11.5). However, I run into an error when I try to make it. The error seems to be related to _freeze_importlib.
/usr/local/src/Python-3.5.2 $ make
if test "no" != "yes"; then \
./Programs/_freeze_importlib \
./Lib/importlib/_bootstrap.py Python/importlib.h; \
fi
dyld: lazy symbol binding failed: Symbol not found: _getentropy
Referenced from: /usr/local/src/Python-3.5.2/./Programs/_freeze_importlib
Expected in: /usr/lib/libSystem.B.dylib
dyld: Symbol not found: _getentropy
Referenced from: /usr/local/src/Python-3.5.2/./Programs/_freeze_importlib
Expected in: /usr/lib/libSystem.B.dylib
/bin/sh: line 1: 56666 Trace/BPT trap: 5 ./Programs/_freeze_importlib ./Lib/importlib/_bootstrap.py Python/importlib.h
make: *** [Python/importlib.h] Error 133
/usr/local/src/Python-3.5.2 $
You can see my steps on GitHub.
The full Terminal output up to the make fail is in a Gist.
I fully acknowledge that this is an academic exercise, as El Capitan comes with Python 2.7.10 and you can easily install Python 3.5.2 with the official OS X installer package or via Homebrew.
The documentation for Using Python on Unix platforms provides build instructions. The documentation for Using Python on a Macintosh specifically says to use the the OS X installer package.
However, it should be possible to build on Mac.
Python on a Macintosh running Mac OS X is in principle very similar to Python on any other Unix platform, but there are a number of additional features such as the IDE and the Package Manager that are worth pointing out.
At this point, I'm not worried about those additional features. Just curious about why I am getting a make error.
Fixed.
In the output of ./configure, I noticed a reference to /Applications/Xcode-beta.app/Contents/Developer/. I installed Xcode 8 (beta) a few days ago. After switching back to the regular Command Line Tools (with Xcode 7.3.1)
sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
make succeeded. Not perfectly.
Python build finished successfully!
The necessary bits to build these optional modules were not found:
_dbm _gdbm _sqlite3
_ssl nis ossaudiodev
spwd zlib
To find the necessary bits, look in setup.py in detect_modules() for the module's name.
Failed to build these modules:
_lzma _tkinter
I hope I don't need those modules.
I've put the full output of ./configure and make in this Gist. I didn't include the output for make install as it was too long and seemed to only repeat the warnings and errors of make.
Notes
I didn't use --enable-framework or --enable-universalsdk.
I think a better solution is xcode-select --install.
If you update Xcode to 8 (beta), you have to run xcode-select --install again to install all the build tools that go with it.
I ran into the same issue as yours, and I was able to successfully install Python through brew with Xcode 8 after running xcode-select --install.
I hope this helps others with the same issue.
