I have already read this previous question, Different CUDA versions shown by nvcc and NVIDIA-smi, but it did not answer my question.
That question addresses whether there is a problem with the installation. It does not answer my question: "If I install other applications in Python that require CUDA, which CUDA version should I assume that I have?"
In that question, the author had intentionally installed two different versions of CUDA on his system. I have only installed CUDA 10.1 on my computer, yet nvidia-smi, called from a Python console, claims that I have version 11.1 installed.
CUDA was installed on my computer following the instructions on Nvidia's homepage, by downloading the installer files. I have not installed any CUDA packages via pip or pip3 in Python.
Screenshots show the version according to cmd, the version in the file system, the version according to the System Environment Variables, and the version according to nvidia-smi called from a Python console.
If I install other applications in Python, which CUDA version should I assume that I have? How can I get rid of the 11.1 version, and only keep the 10.1 version?
"If I install other applications in Python that require CUDA, which CUDA version should I assume that I have?"
You have CUDA 10.1. You will satisfy the needs of any CUDA application in Python, such as TensorFlow, if that application was linked against CUDA 10.1.
If I install other applications in Python, which CUDA version should I assume that I have?
You have CUDA 10.1
How can I get rid of the 11.1 version, and only keep the 10.1 version?
You can't, and don't want to. The CUDA 11.1 version reported is the version of the CUDA driver API. CUDA applications that are usable in Python will be linked either against a specific version of the runtime API, in which case you should assume your CUDA version is 10.1, or else they will be linked against the driver API. If linked against the driver API only, then based on your GPU driver install, any linkage against any driver API version up through CUDA 11.1 will work. That would include any driver API applications linked against CUDA 10.1.
If you were to uninstall the driver that is reporting the 11.1 version, you would break your CUDA install and nothing would work. The driver reporting 11.1 is perfectly fine and no problem at all for usage of CUDA applications that expect CUDA 10.1.
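If it helps to see the distinction concretely, here is a minimal sketch (not part of the original answer) that prints both numbers, assuming nvcc and nvidia-smi are on your PATH: nvcc reports the toolkit you installed (10.1 here), while nvidia-smi reports the highest CUDA version the driver supports (11.1 here).

import subprocess

def first_lines(cmd, n=5):
    # Run a command and return the first n lines of its output.
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return "\n".join(out.splitlines()[:n])

# The toolkit (runtime) version your applications should be built against:
print(first_lines(["nvcc", "--version"]))
# The driver API version, i.e. the ceiling the driver supports:
print(first_lines(["nvidia-smi"]))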
Related
I have tried to get my laptop GPU to work with TensorFlow, however I keep encountering this issue.
I had TensorFlow installed through pip (in an Anaconda env) with CUDA 11.2 and cuDNN 8.1, and it won't work!
I then tried a combination previously known to work (TensorFlow 2.4 with CUDA 11.0, and so on),
but pip will not install tensorflow 2.4.0 (I assume it is no longer supported).
I have included a photo with proof of my CUDA and cuDNN versions.
I believe the issue may lie in the folder you extract your cuDNN to.
Personally, I've extracted my cuDNN to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2.
When you open the cuDNN zip file, open the "cuda" folder inside it, and then extract the rest (bin etc.) into the above-mentioned directory.
Make sure you restart the program/kernel so it can detect the new files.
Also, don't forget to add the CUDA path to your environment variables, though as it knows to look for cudnn64_8.dll I expect this is fine.
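To confirm from Python that the DLL is actually discoverable before re-running TensorFlow, a minimal sketch like this can help (Windows only; the name cudnn64_8.dll assumes cuDNN 8.x, adjust it for your version):

import ctypes

try:
    # WinDLL searches the directories listed on PATH for the DLL.
    ctypes.WinDLL("cudnn64_8.dll")
    print("cudnn64_8.dll was found and loaded")
except OSError as err:
    print("cudnn64_8.dll could not be loaded - check the extraction folder and PATH")
    print(err)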
I have been trying to access CUDA within TensorFlow 2.3.1 (through PyCharm) to access my compatible GPU for object detection. TensorFlow was working previously, but when I installed CUDA I realised that it was 11.1, whereas TensorFlow has a prerequisite of CUDA 10.1. I had both versions on my computer, and tried to uninstall the entire 11.1 toolkit using the Windows add/remove tool (which is the recommended way). This seemed to work in removing 11.1; however, when I try to re-install the TensorFlow package through the Project Interpreter settings (within PyCharm) it comes up with the following error:
UnsatisfiableError: The following specifications were found to be incompatible with the existing python installation in your
environment:
Specifications:
tensorflow -> python[version='3.5.*|3.6.*|3.7.*']
Your python: python=3.8
If python is on the left-most side of the chain, that's the version you've asked for. When python appears to the right, that indicates
that the thing on the left is somehow not available for the python
version you are constrained to. Note that conda will not change your
python version to a different minor version unless you explicitly
specify that.
The following specifications were found to be incompatible with your system:
feature:/win-64::__cuda==11.1=0
Your installed version is: 11.1
EDIT - this is the same when I try to install into the Conda environment through Anaconda.
System setup:
Windows 10 (64bit)
Tensorflow 2.3.1
Cuda 10.1 (previously 11.1 installed - but I thought uninstalled)
cuDNN 7 (cudnn64_7)
Python 3.8
Graphics: 2070Super (driver:456.55)
I understand that PyCharm is unable to install TensorFlow because it has a prerequisite of CUDA 10.1, but I can't find any reference to where it's still pointing to the now-removed 11.1 version. All my path environment variables point to the 10.1 directory. I wonder if there isn't a text/init file somewhere that hard-sets the CUDA version, but I haven't found anything on the NVIDIA site.
Sorry for the noob question, but I am hoping someone can point out where this reference to the newer 11.1 version might be lingering.
So I feel somewhat embarrassed - it turns out that, despite the TensorFlow website indicating that TensorFlow 2.0 was compatible with Python 3.8, once I reverted to an earlier Python 3.7 it seems to have at least resolved that problem. I was fixated on the fact that it was falsely reporting CUDA v11.1. I think this is now resolved.
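For anyone else who hits the same conda message, a quick sanity check of which interpreter the IDE or environment is actually using can save time. This is only a sketch based on the 3.5-3.7 range quoted in the error above:

import sys

# Show exactly which interpreter is active and whether it falls in the
# Python version range the conda package was built for.
major, minor = sys.version_info[:2]
print("Python {}.{} at {}".format(major, minor, sys.executable))
if not (3, 5) <= (major, minor) <= (3, 7):
    print("This interpreter is outside the 3.5-3.7 range the package was built for.")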
The machine I'm using has a Titan XP and runs Ubuntu 18.10. I'm not the owner, so I'm not sure how it was configured previously. The CUDA version is 9.*, most likely 9.0. There is no folder like /usr/local/cuda. Though it sounds strange (because no CUDA version is compatible with 18.10), it previously worked pretty well for both TensorFlow and PyTorch. Now, when running tensorflow-gpu v1.12.0 in Python 2.7 with cudatoolkit 9.2 and cuDNN 7.2.1 (this worked well previously without any change), it reports:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
But when I switch my conda env to Python 3.6 with PyTorch 0.4.1, cudatoolkit 9.0 and cuDNN 7.6 (as shown in PyCharm), I get:
torch.cuda.is_available() # True
This shows that the GPU is being used by the PyTorch code. I've also checked GPU RAM with nvidia-smi: when PyTorch is running, GPU RAM is occupied.
Although there is no Cuda folder like /usr/local/cuda/, when I run:
nvcc -V
There is:
Cuda compilation tools, release 9.1, V9.1.85
Can someone give me a hint about how these strange things happen? What should I do to make my tensorflow-gpu work? I'm totally confused.
Anaconda environments install their own version of the CUDA toolkit when you install things like pytorch and tensorflow-gpu with conda. That looks like it's how your Python 3.6 environment was set up. Is your 2.7 version of Python a system install or part of another Python environment? It's possible that your Tensorflow was built against a CUDA toolkit that is no longer installed, for whatever reason, or in any case that you were trying to use Tensorflow without having the path to the libraries it was built against in your LD_LIBRARY_PATH (perhaps because of an unusual install location).
You can type which nvcc to see which part of your PATH is currently pointing to that executable. That will tell you where your CUDA toolkit is installed. I'm guessing that your PATH was still pointing to a conda environment when you last ran nvcc, or to some version of the CUDA toolkit in an unusual install location in any case.
First, I'd suggest abandoning any effort to use your system python with Tensorflow. My suggestion is to either modify or create a new conda environment and install tensorflow-gpu with conda, which will also install the CUDA toolkit for that environment. Note that your CUDA install will not be in /usr/local/cuda if you go down this path, it'll be located inside your conda environment instead.
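To see which toolkit a given environment actually resolves, a small diagnostic like the following can help (a sketch for Python 3; running which nvcc and echo $LD_LIBRARY_PATH in the shell gives the same information):

import os
import shutil
import sys

# sys.prefix points into the active conda environment, if any.
print("Python prefix:", sys.prefix)
# shutil.which mirrors the shell's `which nvcc`.
print("nvcc resolves to:", shutil.which("nvcc"))
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<not set>"))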
I started learning about TensorFlow recently and decided to switch to the GPU version because it is much faster, but I cannot import it; it always gives the same error.
I already tried:
Installing it with pip, in Python 3.6.8, with CUDA 10 and the most recent cuDNN for CUDA 10
I tried reinstalling Python, CUDA and cuDNN
Tried installing Visual Studio and installed CUDA 9 and cuDNN
I tried installing the latest Anaconda, created a "default" env and another with Python 3.6 (also tried 3.5), and ran pip install tensorflow-gpu in both cases
My last attempt was to follow a tutorial on YouTube; I did exactly as demonstrated (https://www.youtube.com/watch?v=KZFn0dvPZUQ)
Everything I tried returned the same error.
Traceback: https://pastebin.com/KMEsZAmq
The complete code: https://pastebin.com/7tS0Rd5S (was working on CPU version)
My Specs:
i5-8400
8 GB Ram
GTX 1060 6GB
W10 home x64
Just have a look here:
https://www.tensorflow.org/install/gpu
TensorFlow supports CUDA 9.0, so you will need to downgrade your CUDA or use one of TensorFlow's Docker images:
https://www.tensorflow.org/install/docker
With Docker, the container ships its own CUDA toolkit, so it won't use your local CUDA install (you still need the NVIDIA driver on the host).
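Once the CUDA version matches what your TensorFlow binary was built for, a quick check like this should list your GPU (a sketch written for the TF 1.x builds this question is about):

from tensorflow.python.client import device_lib

# Lists all devices TensorFlow can see; an empty list means the GPU
# (or the CUDA/cuDNN libraries) was not picked up.
devices = device_lib.list_local_devices()
print([d.name for d in devices if d.device_type == "GPU"])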
I have been trying to install tensorflow-gpu on Windows 10 for three days.
https://www.tensorflow.org/install/install_windows#requirements_to_run_tensorflow_with_gpu_support
says:
If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:
CUDA® Toolkit 9.0. For details, see NVIDIA's documentation. Ensure that you append the relevant CUDA pathnames to the %PATH% environment variable as described in the NVIDIA documentation.
The NVIDIA drivers associated with CUDA Toolkit 9.0.
cuDNN v6.0. For details, see NVIDIA's documentation. Note that cuDNN is typically installed in a different location from the other CUDA DLLs. Ensure that you add the directory where you installed the cuDNN DLL to your %PATH% environment variable.
GPU card with CUDA Compute Capability 3.0 or higher. See NVIDIA documentation for a list of supported GPU cards.
I downloaded CUDA Toolkit 9.0 from the archives,
but there is no cuDNN 6.0 for CUDA 9.0 here: https://developer.nvidia.com/rdp/cudnn-download
It's driving me mad, as the only thing available there is cuDNN v7.
Please help me.
Apparently I can't comment... but I am having this exact same issue! TensorFlow has conflicting requirements for install. CUDA Toolkit v8.0 is the last supported version for cuDNN v6.0.
For everyone who comes to this thread with issues about cuDNN or cudart errors, here are a few notes:
Tensorflow documentation may or may not be updated quickly enough after a new release.
Tensorflow can be compiled (built) from scratch, which allows you to decide what CUDA and cuDNN version to use, so if you are using a pre-compiled binary, you will need the version of CUDA and cuDNN it was built for.
You need to have cuDNN on your PATH.
Tensorflow's documentation for installing a binary will always specify the version of CUDA and cuDNN it needs.
If things don't work, try running a simple hello world TensorFlow program and read the errors to learn what version of CUDA / cuDNN to use (see the sketch after the download links below).
For example, a missing cudart64_81.dll needs the 64 bit version of CUDA 8.1.
A missing cudnn64_6.dll needs cuDNN 6.0
CUDA can be downloaded from: https://developer.nvidia.com/cuda-toolkit-archive
cuDNN can be downloaded from: https://developer.nvidia.com/rdp/cudnn-archive
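As a companion to the notes above, here is a small sketch (Windows, Python 3) that scans the directories on PATH for the CUDA runtime and cuDNN DLLs, so you can see at a glance which versions TensorFlow will actually find:

import glob
import os

# Look for e.g. cudart64_90.dll (CUDA 9.0) or cudnn64_6.dll (cuDNN 6) on PATH.
for directory in os.environ.get("PATH", "").split(os.pathsep):
    for pattern in ("cudart64_*.dll", "cudnn64_*.dll"):
        for hit in glob.glob(os.path.join(directory, pattern)):
            print(hit)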