I finally got the TensorFlow executor to open the appropriate libraries, as shown in the picture below:
However, it then gave me a framework error:
Here are my versions:
cuda 10.1
cudnn 7.6.4 for 10.1
tensorflow 2.1.0
tensorflow-gpu 2.1.0
I ran tf.config.list_physical_devices('GPU') and the result was fine.
See this thread (link added), which explains that device 0 is simply the device ID that the system assigns to the primary GPU:
Tensorflow not detecting GPU - Adding visible gpu devices: 0
The framework error might be caused by the TensorFlow version. Try using a more recent version.
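Since the answer points at the TensorFlow version, one way to rule a mismatch in or out is to compare the installed versions against TensorFlow's tested build configurations. This is only a minimal sketch: the table is a hand-copied subset covering the releases mentioned in these threads, and the helper names (major_minor, check_combo) are made up for illustration.

```python
# Hand-copied subset of TensorFlow's tested GPU build configurations.
# Treat it as an assumption and check the official table for your release.
TESTED = {
    "2.1": {"cuda": "10.1", "cudnn": "7.6"},
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
    "2.5": {"cuda": "11.2", "cudnn": "8.1"},
}


def major_minor(version):
    """Reduce '7.6.4' to '7.6' so that patch releases compare equal."""
    return ".".join(version.split(".")[:2])


def check_combo(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions (an empty list means no mismatch)."""
    expected = TESTED.get(major_minor(tf_version))
    if expected is None:
        return ["TensorFlow %s is not in this table" % tf_version]
    problems = []
    if major_minor(cuda_version) != expected["cuda"]:
        problems.append("CUDA %s found, %s expected" % (cuda_version, expected["cuda"]))
    if major_minor(cudnn_version) != expected["cudnn"]:
        problems.append("cuDNN %s found, %s expected" % (cudnn_version, expected["cudnn"]))
    return problems


# Versions from the question, then the same libraries with a newer TensorFlow.
print(check_combo("2.1.0", "10.1", "7.6.4"))
print(check_combo("2.5.0", "10.1", "7.6.4"))
```

An empty list for the question's versions would suggest looking elsewhere (driver, PATH, GPU architecture) rather than at the CUDA/cuDNN pairing. Keep in mind that upgrading TensorFlow usually means upgrading CUDA and cuDNN along with it.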
I'm working on a CNN, and I noticed that during the training phase it uses CPU 100% instead of GPU (I have a GTX 1660Ti).
Tensorflow doesn't recognize my 1660Ti
I tried to follow this guide from TensorFlow website.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
outputs
Num GPUs Available: 0
I tried to read all devices recognized by TensorFlow
tf.config.list_physical_devices()
outputs
[ PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU') ]
What I read on the topic
Searching the internet, I found that I might need to install the NVIDIA CUDA Toolkit. I installed it from here, but that didn't solve the problem.
I also found that CUDA is not enabled on all NVIDIA GPUs: source. That struck me as a little strange: why would NVIDIA cut off some of its customers from using CUDA?
Additional information
My requirements.txt (in case the software versions help to solve my problem):
matplotlib==3.4.2
keras==2.4.3
tensorflow-gpu==2.5.0
seaborn==0.11.1
I'm running the Python code in a Jupyter Notebook (installed via pip).
My question
Is there a way to use my GPU for CUDA (or at least to use it with TensorFlow, as in this case)?
I finally solved it.
I had to download cuDNN from here, and after following this installation guide I finally got it working.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
now outputs
Num GPUs Available: 1
and
tf.config.list_physical_devices()
now outputs
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
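Because the missing piece here turned out to be cuDNN, a quick check of whether the CUDA and cuDNN libraries can be located at all may save some time. The sketch below uses only the standard library. The library names are assumptions that match CUDA 10.1 and cuDNN 7 or 8, so adjust them to your install, and note that a "not found" result is a hint rather than proof, since TensorFlow uses its own loader.

```python
import ctypes.util
import sys


def locate(names):
    """Map each library name to the path found by the standard lookup, or None."""
    return {name: ctypes.util.find_library(name) for name in names}


if sys.platform.startswith("win"):
    # On Windows the lookup walks the directories listed in PATH.
    names = ["cudart64_101", "cudnn64_7", "cudnn64_8"]  # assumed DLL names
else:
    # On Linux the lookup asks the system linker configuration.
    names = ["cudart", "cudnn"]

for name, path in locate(names).items():
    print(name, "->", path or "not found")
```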
I've got this message:
F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status:
GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size,
0, d.stream(), gen, data, size, dist) status: Internal: no kernel
image is available for execution on the device
while launching my YOLO detection algorithm on my laptop's GPU. (If I disable GPU detection in TensorFlow, everything works fine, but slowly.)
I thought it was due to an error in the CUDA/cuDNN installation procedure, but I've performed the checks and everything seems to be fine.
Can someone help me to figure out what's going on?
I'm using:
Windows 10
GPU: NVIDIA GeForce 940MX
CUDA 10.1
cuDNN 7.6
Python 3.7
Tensorflow 2.3.1
keras 2.4
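The "no kernel image is available" message is CUDA's way of saying that the binary contains no code compiled for this GPU's architecture, so it may help to compare the card's compute capability (5.0 for the 940MX) with the architectures the TensorFlow build targets. Here is a rough sketch of that compatibility rule. The architecture lists are an assumption based on what the TensorFlow GPU page names for pip builds, and which of them ship PTX is a guess on my part.

```python
def can_run(device_cc, sass_archs, ptx_archs):
    """Check whether a device can execute kernels from a given build.

    Compiled (SASS) code runs on devices with the same major version and an
    equal or higher minor version. PTX can be JIT-compiled for any device
    whose compute capability is equal to or higher than the PTX target.
    """
    dev_major, dev_minor = device_cc
    for major, minor in sass_archs:
        if major == dev_major and minor <= dev_minor:
            return True
    return any(arch <= device_cc for arch in ptx_archs)


# Assumed build targets, not read from any actual TensorFlow binary.
SASS = [(3, 5), (3, 7), (5, 2), (6, 0), (6, 1), (7, 0)]
PTX = [(7, 0)]

print("GeForce 940MX (5.0):", can_run((5, 0), SASS, PTX))
print("RTX 2070 Super (7.5):", can_run((7, 5), SASS, PTX))
```

If the check comes out negative for your card, the usual options are a TensorFlow build that includes your architecture (an older release or a build from source) or running on the CPU.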
So I decided to move from Windows 10 to Ubuntu 20.04, since it's a better environment that's widely used in industry. When I cloned my clean Windows code from GitHub, I immediately ran into issues trying to run it. For context, my code uses TensorFlow 2.2.0 to segment images, and I'm training from scratch on a dataset. The problem occurs as soon as I run my train.py program. Initially there was an error, which I fixed by inserting the following two lines immediately after importing TensorFlow (after verifying that TensorFlow could see and access my GPU on Ubuntu):
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
Following this, I get OOM errors, despite being able to run the same code and dataset on Windows. I was able to run the program once when I initially cloned the repo, but the day after I got these OOM errors and have been encountering them ever since. The code still works on Windows. The log below shows what I got when I ran it in Ubuntu 20.04 on the same machine that hosts my Windows installation (I am dual booting). I installed TF GPU support exactly as they outlined, with the versions they mentioned. The only part that worries me is that TF doesn't have Ubuntu 20.04 support, which I speculate may be causing this issue, but as a new Ubuntu user I can't be entirely sure. The terminal output and the train.py program are linked below.
https://drive.google.com/drive/folders/1GRkqCwwdnoPWzsPklq2NIS82P1bFfRr1?usp=sharing
Relevant specs:
GPU - NVIDIA 2070 Super
CPU - Ryzen 3600
RAM - 32 GB
Tensorflow - 2.2.0
NVIDIA Driver - 451.x
CUDA - 10.1
cuDNN - 7.6.5
Does anyone have any insight into what could be causing this issue?
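As a side note on the two memory-growth lines in the question: indexing gpus[0] raises an IndexError when no GPU is visible, and it only configures the first card. A slightly more defensive variant could look like the sketch below. The helper name enable_memory_growth is made up, and the optional tf_module argument exists only so the function can be exercised without a GPU. This does not explain the OOM by itself, but it removes one variable.

```python
def enable_memory_growth(tf_module=None):
    """Turn on memory growth for every visible GPU and return how many were set."""
    if tf_module is None:
        # Imported lazily so the helper can be tested with a stand-in module.
        import tensorflow as tf_module
    gpus = tf_module.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf_module.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


# Call once, right after importing TensorFlow and before building any model,
# because devices cannot be reconfigured after they have been initialized:
# enable_memory_growth()
```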
I understand this is not a recommended setup for machine learning in any sense, but I would like to work with what I have.
Not being an expert, I have been told that tf-gpu should work with any device supported by cuda.
When I run:
from numba import cuda
cuda.detect()
I get:
Found 1 CUDA devices
id 0 b'GeForce MX130' [SUPPORTED]
compute capability: 5.0
pci device id: 0
pci bus id: 1
Summary:
1/1 devices are supported
And I can get the GPU to work with some basic 'vectorized' tasks.
Also, running:
import tensorflow as tf
tf.test.is_built_with_cuda()
will return True
However, running
tf.config.experimental.list_physical_devices('gpu')
will return an empty list.
Running:
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Will return:
Num GPUs Available: 0
Running:
strategy = tf.distribute.MirroredStrategy()
print("Number of devices: {}".format(strategy.num_replicas_in_sync))
will return:
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Number of devices: 1
I have trained some basic models with the non-GPU version of TensorFlow, but I have no clue how to deal with tf-gpu. I was able to fit a model with CuDNNLSTM layers, but according to Task Manager the script didn't use the GPU.
I would appreciate any advice on how to get it to use my 'gpu', or a confirmation that it is not possible. Thanks!
EDITED:
I uninstalled keras and both tensorflow versions and installed only tensorflow-gpu. Nothing changed.
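To narrow down a case like this one, where the build reports CUDA support but no GPU shows up, it can help to collect the relevant facts in one place. The sketch below is only an illustration: gpu_report is a made-up helper name, and the optional tf_module argument is there purely to make it testable. As far as I can tell the device type filter is an exact string match, so the sketch passes "GPU" in upper case.

```python
def gpu_report(tf_module=None):
    """Summarize what the installed TensorFlow build sees."""
    if tf_module is None:
        # Imported lazily so the helper can be tested with a stand-in module.
        import tensorflow as tf_module
    devices = tf_module.config.experimental.list_physical_devices("GPU")
    return {
        "tf_version": tf_module.__version__,
        "built_with_cuda": tf_module.test.is_built_with_cuda(),
        "gpus": [device.name for device in devices],
    }


# Typical use in a notebook cell:
# print(gpu_report())
```

If built_with_cuda is True while the GPU list is empty, the startup log is usually the next place to look, in particular any "Could not load dynamic library" lines naming a CUDA or cuDNN file.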
Unfortunately, no.
Even though the official specs say 'Yes', the CUDA-enabled GPU list does not mention the MX130.
(I am also running an MX130 on my notebook.)
References:
official specs: https://www.nvidia.com/en-us/geforce/gaming-laptops/mx130/specifications/
CUDA enabled GPU list: https://developer.nvidia.com/cuda-gpus
Absolutely YES!
I assume that compute capability 5.0 is enough.
I tested my GeForce MX130 with tensorflow-gpu installed by conda (which handles CUDA, version compatibility, etc.) in Python 3.7:
conda install tensorflow-gpu
That's it! No further action was required.
The following versions were installed:
tensorflow-gpu: 2.1.0
cudatoolkit: 10.1.243
cudnn: 7.6.5
... and it worked!
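Install both CUDA and cuDNN, and set the paths for them. To check whether TensorFlow is using the GPU, use this (tf.compat.v1 is needed on TensorFlow 2.x, where tf.Session no longer exists):
import tensorflow as tf
sess = tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True))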
It should show your GPU name in its output.
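The session-based check above is in the TensorFlow 1.x style. On TensorFlow 2.x with eager execution, a rough equivalent is to switch on device placement logging and run a small op. The sketch below assumes TensorFlow 2.x. The helper name placement_of_matmul is made up, and the tf_module argument exists only so the function can be tried without TensorFlow installed.

```python
def placement_of_matmul(tf_module=None):
    """Run a tiny matmul with placement logging on and return the device used."""
    if tf_module is None:
        # Imported lazily so the helper can be tested with a stand-in module.
        import tensorflow as tf_module
    # Ask TensorFlow to log the device each op is placed on.
    tf_module.debugging.set_log_device_placement(True)
    a = tf_module.constant([[1.0, 2.0], [3.0, 4.0]])
    product = tf_module.matmul(a, a)
    # The device string ends in GPU:0 when the op ran on the first GPU.
    return product.device


# print(placement_of_matmul())
```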
Yes, you can: tensorflow-gpu 2.5.0 + CUDA 11.2 + cuDNN 8.1.
Review your PATH environment variable if you are using Windows. On my system it points to:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;
C:\Apps\CUDNN8.1\bin;
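To check that PATH really contains the CUDA and cuDNN directories, you can scan it for the runtime DLLs. This is a minimal sketch using only the standard library. The file name patterns are assumptions about how the DLLs are named (cudart64_*.dll and cudnn64_*.dll), and dirs_on_path_with is a made-up helper name.

```python
import glob
import os


def dirs_on_path_with(pattern, path_value=None):
    """List the PATH directories that contain a file matching the pattern."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Escape the directory so brackets in folder names are taken literally.
        if directory and glob.glob(os.path.join(glob.escape(directory), pattern)):
            hits.append(directory)
    return hits


for pattern in ("cudart64_*.dll", "cudnn64_*.dll"):
    print(pattern, "->", dirs_on_path_with(pattern) or "not found on PATH")
```

If several directories show up for the same pattern, an older CUDA install earlier in PATH may be shadowing the one you intended to use.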
I have created a DLAMI (Deep Learning AMI (Amazon Linux) Version 8.0 - ami-9109beee on g2.8xlarge) and installed jupyter notebook to create a simple Keras LSTM. When I try to turn my Keras model into a GPU model using the multi_gpu_model function, I see the following error logged:
Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id:
0000:00:03.0, compute capability: 3.0) with Cuda compute capability
3.0. The minimum required Cuda capability is 3.5.
I've tried reinstalling tensorflow-gpu to no avail. Is there any way to align the compatibilities on this AMI?
This was resolved by uninstalling, then reinstalling tensorflow-gpu through the conda environment provided by the AMI.
TensorFlow binaries that you install through pip or similar are built only for CUDA compute capability 3.5 and higher, even though the TensorFlow source code does support compute capability 3.0.
Unfortunately, the only way to obtain a TensorFlow installation that supports compute capability 3.0 is to build it from source.
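If you want to detect this situation from a script (for example, to fall back to the CPU on purpose), the warning quoted in the question can be parsed for the two capability values. The sketch below assumes the message keeps the wording shown above. The pattern and the helper name parse_capability_warning are my own, not part of TensorFlow.

```python
import re

# Matches the wording of the warning quoted in the question.
PATTERN = re.compile(
    r"compute capability:\s*(\d+)\.(\d+)"
    r".*?minimum required Cuda capability is\s+(\d+)\.(\d+)",
    re.IGNORECASE | re.DOTALL,
)

SAMPLE = (
    "Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id: "
    "0000:00:03.0, compute capability: 3.0) with Cuda compute capability "
    "3.0. The minimum required Cuda capability is 3.5."
)


def parse_capability_warning(log_text):
    """Return ((have_major, have_minor), (need_major, need_minor)) or None."""
    match = PATTERN.search(log_text)
    if match is None:
        return None
    have = (int(match.group(1)), int(match.group(2)))
    need = (int(match.group(3)), int(match.group(4)))
    return have, need


have, need = parse_capability_warning(SAMPLE)
print("device:", have, "required:", need, "too old:", have < need)
```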