TensorFlow GPU doesn't work, How to install it?

I recently started learning TensorFlow and decided to switch to the GPU version because it is much faster, but I cannot import it; it always gives the same error.
I have already tried:
Installing it with pip on Python 3.6.8, with CUDA 10 and the most recent cuDNN for CUDA 10
Reinstalling Python, CUDA and cuDNN
Installing Visual Studio, then CUDA 9 and cuDNN
Installing the latest Anaconda, creating a "default" env and another with Python 3.6 (also tried 3.5), and running pip install tensorflow-gpu in both
Following a YouTube tutorial exactly as demonstrated (https://www.youtube.com/watch?v=KZFn0dvPZUQ), which was my last attempt
Everything I tried returned the same error.
Traceback: https://pastebin.com/KMEsZAmq
The complete code: https://pastebin.com/7tS0Rd5S (was working on CPU version)
My Specs:
i5-8400
8 GB Ram
GTX 1060 6GB
Windows 10 Home x64

Have a look here:
https://www.tensorflow.org/install/gpu
TensorFlow supports CUDA 9.0, so you will need to downgrade your CUDA installation or use one of TensorFlow's Docker images:
https://www.tensorflow.org/install/docker
With Docker, TensorFlow uses the CUDA libraries bundled in the image rather than the CUDA toolkit installed on your machine.

Related

Why does nvidia-smi show a wrong CUDA version?

I have already read this previous question, but it did not answer mine: Different CUDA versions shown by nvcc and nvidia-smi
That question addresses whether there is a problem with the installation. It does not answer my question: "If I install other applications in Python that require CUDA, which CUDA version should I assume that I have?"
In the previous question, the author had intentionally installed two different versions of CUDA on his system. I have only installed CUDA 10.1 on my computer, yet Python claims that I have installed version 11.1.
CUDA was installed on my computer following the instructions on NVIDIA's website, by downloading the installer files. I have not installed CUDA packages via pip or pip3 in Python.
Version according to cmd.
Version in the file system.
Version according to System Environment Variables.
Version according to nvidia-smi called from a python console.
If I install other applications in Python, which CUDA version should I assume that I have? How can I get rid of the 11.1 version, and only keep the 10.1 version?

"If I install other applications in Python that require CUDA, which CUDA version should I assume that I have?"
You have CUDA 10.1. You will satisfy the needs of any CUDA application in Python, such as TensorFlow, if that application was linked against CUDA 10.1.
If I install other applications in Python, which CUDA version should I assume that I have?
You have CUDA 10.1
How can I get rid of the 11.1 version, and only keep the 10.1 version?
You can't, and don't want to. The CUDA 11.1 version reported is the version of the CUDA driver API. CUDA applications that are usable in Python will be linked either against a specific version of the runtime API, in which case you should assume your CUDA version is 10.1, or else they will be linked against the driver API. If linked against the driver API only, then based on your GPU driver install, any linkage against any driver API version up through CUDA 11.1 will work. That would include any driver API applications linked against CUDA 10.1.
If you were to uninstall the driver that is reporting the 11.1 version, you would break your CUDA install and nothing would work. The driver reporting 11.1 is perfectly fine and no problem at all for usage of CUDA applications that expect CUDA 10.1.
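
The rule described above (the driver's CUDA version must be at least as new as the runtime version the application was linked against) can be sketched as a small comparison. This is only an illustration of the reasoning, not an official NVIDIA check; the function names are made up for this example, and the version strings are the ones from the question.

```python
def parse_version(text):
    """Turn a version string such as '10.1' into a comparable tuple (10, 1)."""
    parts = [int(part) for part in text.strip().split(".")]
    # Pad to two components so that '11' and '11.0' compare as equal.
    return tuple((parts + [0, 0])[:2])


def driver_supports_runtime(driver_cuda, runtime_cuda):
    """Return True if the driver's CUDA version is at least the runtime version."""
    return parse_version(driver_cuda) >= parse_version(runtime_cuda)


# Values from the question: nvidia-smi reports 11.1, the installed toolkit is 10.1.
print(driver_supports_runtime("11.1", "10.1"))  # True
print(driver_supports_runtime("10.1", "11.1"))  # False
```

In other words, a newer driver is fine for an older toolkit, but not the other way around.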

How to get tensorflow keras to use my GPU?

I am trying to use keras in tensorflow to train a CNN network for some image classification. Obviously, the training running on my CPU is incredibly slow and so I need to use my GPU to do the training. I've found many similar questions on StackOverflow, none of which have helped me get the GPU to work, hence I am asking this question separately.
I've got an NVIDIA GeForce GTX 1060 3GB and the 466.47 NVIDIA driver installed. I've installed the CUDA toolkit from the NVIDIA website (installation is confirmed with nvcc -V command outputting my version 11.3), and downloaded the CUDNN library. I unzipped the CUDNN file and copied the files to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3, as stated on the NVIDIA website. Finally, I've checked that it's on PATH (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin and C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\libnvvp are both in the environment variable 'Path').
I then set up an environment using conda, installing some packages that I need, like scikit-learn, as well as tensorflow-gpu=2.3. After launching Jupyter Notebook from that environment, I run this code to check whether it's picking up the GPU:
import tensorflow as tf
print(tf.__version__)
print(tf.config.list_physical_devices())
And get this:
2.3.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
I have tried literally everything I have come into contact with on this topic, but am not getting any success in getting it to work. Any help would be appreciated.

First, you have to install all the CUDA requirements. If you have Ubuntu 20.04, here is how you can install the requirements. Then it's the right time to install TensorFlow. Since you intend to use your GPU, you have to install the tensorflow-gpu package, not tensorflow alone.

I'm guessing you have installed TensorFlow correctly using pip install tensorflow.
NVIDIA GPU cards with CUDA architectures 3.5, 5.0, 6.0, 7.0, 7.5, 8.0 and higher than 8.0 are currently supported by TensorFlow. If you have the supported cards but TensorFlow cannot detect your GPU, you have to install the following software:
NVIDIA GPU drivers: CUDA 11.0 requires 450.x or higher.
CUDA Toolkit: TensorFlow supports CUDA 11 (TensorFlow >= 2.4.0).
cuDNN SDK 8.0.4
You can optionally install TensorRT 6.0 to improve latency and throughput for inference on some models.
For more info, please refer to the TensorFlow documentation: https://www.tensorflow.org/install/gpu
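
To check the driver requirement from the list above, something like the following sketch may help. It assumes nvidia-smi is on PATH and queries it with --query-gpu=driver_version; the helper names and the constant are made up for this example, and the 450 threshold is simply the figure quoted in this answer.

```python
import subprocess


def parse_driver_major(text):
    """Extract the major driver version from a string like '466.47'."""
    first_line = text.strip().splitlines()[0]
    return int(first_line.split(".")[0])


def query_driver_version():
    """Ask nvidia-smi for the driver version; return None if it cannot be run."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


MIN_DRIVER_FOR_CUDA_11_0 = 450  # threshold quoted in the requirement list above

version = query_driver_version()
if version is None:
    print("nvidia-smi not available")
else:
    ok = parse_driver_major(version) >= MIN_DRIVER_FOR_CUDA_11_0
    print(version, "OK for CUDA 11.0" if ok else "too old for CUDA 11.0")
```

With the 466.47 driver mentioned in the question, the major version is 466, which clears the 450 threshold, so the driver itself is unlikely to be the problem.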

I recommend using conda to install the CUDA Toolkit packages as well as cuDNN, which saves you from hunting for the right downloads (and from making changes in the system folders):
conda install -c conda-forge cudatoolkit=11.0 cudnn=8.1
Then you can install keras and tensorflow-gpu by typing
conda install keras==2.7
pip install tensorflow-gpu==2.7
and it should work right away.
Based on this issue
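
After installing, a quick way to see whether the GPU is picked up, and which CUDA/cuDNN versions the installed TensorFlow build expects, is sketched below. It assumes TensorFlow 2.1 or newer for tf.config.list_physical_devices; tf.sysconfig.get_build_info() only exists in more recent 2.x releases, so the code looks it up defensively. The function name describe_tensorflow_gpu is made up for this example.

```python
def describe_tensorflow_gpu():
    """Return a dict describing the TensorFlow build, or None if TF cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = {
        "version": tf.__version__,
        "gpus": [device.name for device in tf.config.list_physical_devices("GPU")],
    }
    # get_build_info() is not present in older releases, so look it up defensively.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        build = get_build_info()
        info["cuda_version"] = build.get("cuda_version")
        info["cudnn_version"] = build.get("cudnn_version")
    return info


print(describe_tensorflow_gpu())
```

If "gpus" comes back empty, compare the reported cuda_version and cudnn_version with what is actually installed in the environment; a mismatch there is a common reason for TensorFlow falling back to the CPU.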

RTX 3080, Ubuntu 18.04, cuda 11.1, gcc 9.1.0, fail to install pytorch 1.7.1

The NVIDIA driver information is shown below:
[image 1]
I used Anaconda to create a virtual environment and install PyTorch 1.7.1. When I ran the test code:
print(torch.rand(3,3).cuda())
I got this error:
[image 2]
This may be caused by the wrong GeForce driver, but I have confirmed that the CUDA, gcc and PyTorch versions match.
I also installed PyTorch 1.7.0 and 1.6.0 and got the same result.
Next, I created another environment and installed PyTorch 1.5.0. After running the same test code, I had to wait 5 minutes before I saw the result:
[image 3]
If I run the code again, I get the result right away. But if I exit Python and start it again, I have to wait again the first time I run the test code.
How can I solve this problem, or which PyTorch and CUDA versions should I install? Thanks a lot.

Why does my CUDA work for Pytorch but not for Tensorflow suddenly?

The machine I'm using has a Titan Xp and runs Ubuntu 18.10. I'm not the owner, so I'm not sure how it was configured previously. The CUDA version is 9.*, most likely 9.0. There is no folder like /usr/local/cuda. Though it sounds strange (because no CUDA version is compatible with 18.10), it previously worked well for both TensorFlow and PyTorch. Now, when running tensorflow-gpu v1.12.0 in Python 2.7 with cudatoolkit 9.2 and cudnn 7.2.1 (this worked well previously without any change), it reports:
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory
But when I switch my conda env to Python 3.6 with pytorch 0.4.1, cudatoolkit 9.0 and cudnn 7.6 (the versions shown in PyCharm), I get:
torch.cuda.is_available() # True
This shows that the GPU is being used by the PyTorch code. I've also checked GPU memory with nvidia-smi: while PyTorch is running, memory is occupied.
Although there is no CUDA folder like /usr/local/cuda/, when I run:
nvcc -V
I get:
Cuda compilation tools, release 9.1, V9.1.85
Can someone give me a hint about how these strange things happen? What should I do to make tensorflow-gpu work? I'm totally confused orz.

Anaconda environments install their own version of the CUDA toolkit when you install things like pytorch and tensorflow-gpu with conda. That looks like it's how your Python 3.6 environment was set up. Is your 2.7 version of Python a system install or part of another Python environment? It's possible that your Tensorflow was built against a CUDA toolkit that is no longer installed, for whatever reason, or in any case that you were trying to use Tensorflow while not having the path to the libraries that it was built against in your LD_LIBRARY_PATH (perhaps because of an unusual install location)
You can type which nvcc to see which part of your PATH is currently pointing to that executable. That will tell you where your CUDA toolkit is installed. I'm guessing that your PATH was still pointing to a conda environment when you last ran nvcc, or to some version of the CUDA toolkit in an unusual install location in any case.
First, I'd suggest abandoning any effort to use your system python with Tensorflow. My suggestion is to either modify or create a new conda environment and install tensorflow-gpu with conda, which will also install the CUDA toolkit for that environment. Note that your CUDA install will not be in /usr/local/cuda if you go down this path, it'll be located inside your conda environment instead.
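
The two checks suggested above (where nvcc comes from, and whether the library TensorFlow wants is reachable through LD_LIBRARY_PATH) can be sketched in Python as follows. This is a rough diagnostic only: the dynamic loader also consults other locations such as the ldconfig cache, so an empty result here does not prove the library is missing. The find_library helper is made up for this example.

```python
import os
import shutil


def find_library(filename, search_dirs):
    """Return every directory in search_dirs that contains filename."""
    return [d for d in search_dirs if d and os.path.isfile(os.path.join(d, filename))]


# Equivalent of `which nvcc`: None means nvcc is not on PATH.
print("nvcc:", shutil.which("nvcc"))

# Directories the dynamic loader is told to search via LD_LIBRARY_PATH.
ld_dirs = os.environ.get("LD_LIBRARY_PATH", "").split(os.pathsep)
print("libcublas.so.9.0 found in:", find_library("libcublas.so.9.0", ld_dirs))
```

If nvcc resolves to a path inside a conda environment, that would explain why it reports a toolkit version even though /usr/local/cuda does not exist.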

ImportError: Could not find 'cudnn64_7.dll' , while importing tensorflow

This is an issue that many of us must have come across. While installing TensorFlow, this is one of the error messages that pops up for many users. I could not install TensorFlow 1.10.0 due to the following error that I posted a few days back at
ImportError: Could not find 'cudnn64_7.dll'
I am using Windows 10 and was trying to run
import tensorflow as tf
in a conda environment.
What can I do to resolve this issue?

1) Go to the cuDNN Archive.
2) Click on "Download cuDNN v7.6.1 (June 24, 2019), for CUDA 10.0". (You need CUDA 10.0 installed, NOT 10.1. If you installed the wrong version, uninstall it and install 10.0, which works with tensorflow-gpu.)
3) Click on the link for your operating system.
4) Unzip it. It should unzip to a folder called CUDA.
5) Go into the CUDA folder and copy the contents.
6) Open the location where CUDA 10 is installed. On Windows 10 the default is C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0.
7) Paste the contents from your clipboard into that folder.
8) Have a coffee. You are done!
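
The copy-and-paste in steps 5 to 7 can also be scripted. The sketch below merges an extracted cuDNN folder into a CUDA install directory with shutil.copytree (the dirs_exist_ok flag needs Python 3.8+; files with the same name are overwritten) and then lists the cuDNN files it finds in bin. The function name and both paths in the commented call are assumptions for illustration; adjust them to your machine, and note that writing under Program Files normally requires an elevated prompt.

```python
import shutil
from pathlib import Path


def merge_cudnn(extracted_cuda_dir, cuda_install_dir):
    """Merge the extracted cuDNN tree into the CUDA install and list cuDNN files in bin."""
    src = Path(extracted_cuda_dir)
    dst = Path(cuda_install_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"extracted cuDNN folder not found: {src}")
    # dirs_exist_ok=True merges into the existing bin/include/lib folders.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.name for p in (dst / "bin").glob("cudnn*"))


# Hypothetical paths, shown only as an example of how the function might be called:
# merge_cudnn(r"C:\Users\me\Downloads\cuda",
#             r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0")
```

If cudnn64_7.dll shows up in the returned list and the bin folder is on PATH, the import error from the question should no longer be caused by a missing file.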

Jeremy Demers' answer worked for me, and I was able to repeat his process. However, I used cuDNN for CUDA 10.1 instead of CUDA 10, and installed TensorFlow version 2.4.0-dev20200705, first via pip install tensorflow-gpu and then via pip install tensorflow-nightly to get the latest build. Hardware: 2060 Super, 8GB.
Edit:
The recommended way to get tensorflow nightly via pip is:
pip install tf-nightly

Here is what I did.
Step 1) Installed 'NVIDIA GEFORCE EXPERIENCE' in my computer to check my Driver version.
Step 2) The driver version was an old one. Update was available. So I updated my Graphic driver.
My GPU properties now are:-
NVIDIA GEFORCE EXPERIENCE Version 3.14.1.48
GeForce 940MX
Driver Version 398.82
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
7.9 GB RAM
Now, through conda environment ( I created an environment named 'tensorflow' ), when I executed the statement
(tensorflow) C:\Users\Arnab Sinha>pip install --ignore-installed --upgrade tensorflow-gpu
I encountered the following message :-
pandas 0.23.4 requires python-dateutil>=2.5.0, which is not installed.
pandas 0.23.4 requires pytz>=2011k, which is not installed.
I then installed the required packages by executing the following commands one after the other
pip install python-dateutil
and
pip install pytz
after which I ran the command in Python 3.6.6
import tensorflow as tf
and then
print(tf.__version__)
which gave the output
1.10.0
This is how I installed TensorFlow 1.10.0 on my computer. Anaconda Navigator, however, does not offer TensorFlow 1.10.0 yet. Please let me know if you find the update there.
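
The pandas warnings above come from missing dependencies. A small sketch for checking up front whether a module can be imported is shown below; note that the python-dateutil package is imported as dateutil. The missing_modules helper is made up for this example, and running pip check in the environment gives a more complete report.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names of the two packages pip complained about.
print(missing_modules(["dateutil", "pytz"]))
```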
