AWS DLAMI - GPUs not showing up for Keras

I launched an instance from the DLAMI (Deep Learning AMI (Amazon Linux) Version 8.0 - ami-9109beee, on a g2.8xlarge) and installed Jupyter Notebook to build a simple Keras LSTM. When I try to convert my Keras model into a multi-GPU model with the multi_gpu_model function, the following message is logged:
Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id:
0000:00:03.0, compute capability: 3.0) with Cuda compute capability
3.0. The minimum required Cuda capability is 3.5.
I've tried reinstalling tensorflow-gpu, to no avail. Is there any way to make the TensorFlow build compatible with this GPU's compute capability on this AMI?

This was resolved by uninstalling, then reinstalling tensorflow-gpu through the conda environment provided by the AMI.

The TensorFlow binaries that you install through pip or similar are built to support CUDA compute capability 3.5 and higher only, although TensorFlow itself does support compute capability 3.0.
Unfortunately, the only way to obtain a TensorFlow installation that supports compute capability 3.0 is to build it from source.
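To make the mismatch concrete, here is a minimal sketch that pulls both capability numbers out of the logged message and compares them. The message text is copied from the question (joined onto one line); the helper name parse_capabilities is hypothetical and not part of TensorFlow.

```python
import re

# Message copied from the question, joined onto a single line for parsing.
LOG = ("Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id: "
       "0000:00:03.0, compute capability: 3.0) with Cuda compute capability "
       "3.0. The minimum required Cuda capability is 3.5.")


def parse_capabilities(message):
    """Return (device_capability, required_minimum) as (major, minor) tuples."""
    device = re.search(r"compute capability: (\d+)\.(\d+)", message)
    required = re.search(r"minimum required Cuda capability is (\d+)\.(\d+)", message)
    if device is None or required is None:
        raise ValueError("message does not look like a capability warning")
    return ((int(device.group(1)), int(device.group(2))),
            (int(required.group(1)), int(required.group(2))))


device_cc, required_cc = parse_capabilities(LOG)
# Tuples compare element by element, so (3, 0) >= (3, 5) is False.
print("device:", device_cc, "required:", required_cc,
      "usable by this build:", device_cc >= required_cc)
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings or floats directly.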

Related

tensorflow framework error: duplicate node name in graph: 'ones'

I finally got the TensorFlow executor to open the appropriate libraries successfully (screenshot not included).
However, it gave me the framework error quoted in the title.
Here are my versions:
cuda 10.1
cudnn 7.6.4 for 10.1
tensorflow 2.1.0
tensorflow-gpu 2.1.0
I ran tf.config.list_physical_devices('GPU') and the output was fine.

See this thread, which explains that "device 0" is simply the device ID the system assigns to the primary GPU:
Tensorflow not detecting GPU - Adding visible gpu devices: 0
The framework error might be caused by the TensorFlow version; try a more recent version.

Cant train with GPU in TensorFlow

I'm working on a CNN, and I noticed that during the training phase it uses 100% of the CPU instead of the GPU (I have a GTX 1660Ti).
Tensorflow doesn't recognize my 1660Ti
I tried to follow this guide from TensorFlow website.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
outputs
Num GPUs Available: 0
I tried to read all devices recognized by TensorFlow
tf.config.list_physical_devices()
outputs
[ PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU') ]
What I read on the topic
Searching the internet, I found that I might need to install the NVIDIA CUDA Toolkit. I did so from here, but it didn't solve the problem.
I also found that CUDA is not enabled on all NVIDIA GPUs: source. I found that a little strange; why would NVIDIA cut off part of their customers from using CUDA?
Additional information
My requirements.txt (in case the software versions help):
matplotlib==3.4.2
keras==2.4.3
tensorflow-gpu==2.5.0
seaborn==0.11.1
I'm running the Python code in a Jupyter Notebook (installed via pip).
My question
Is there a way to use my GPU for CUDA (or at least with TensorFlow, as in this case)?

I finally solved it.
I had to download cuDNN from here; after following this installation guide, I got it working.
import tensorflow as tf
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
now outputs
Num GPUs Available: 1
and
tf.config.list_physical_devices()
now outputs
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
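Since the missing piece here was cuDNN, it can help to see which CUDA and cuDNN versions the installed TensorFlow build expects. The following is a hedged sketch: it assumes TensorFlow 2.3 or later, where tf.sysconfig.get_build_info() is available, and the helper name describe_build is hypothetical. If TensorFlow cannot be imported, the sketch simply prints nothing.

```python
def describe_build(build_info):
    """Summarise the CUDA/cuDNN versions a TensorFlow build was compiled against."""
    # CPU-only builds do not carry these keys, hence the defaults.
    cuda = build_info.get("cuda_version", "unknown")
    cudnn = build_info.get("cudnn_version", "unknown")
    return f"built for CUDA {cuda}, cuDNN {cudnn}"


try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed in this environment

if tf is not None and hasattr(tf.sysconfig, "get_build_info"):
    print(describe_build(tf.sysconfig.get_build_info()))
    # An empty list here means TensorFlow still cannot see a GPU.
    print(tf.config.list_physical_devices("GPU"))
```

If the versions printed here do not match the CUDA toolkit and cuDNN libraries actually installed, TensorFlow usually falls back to the CPU.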

Problem on using GPU and Tensorflow: "no kernel image is available for execution on the device"

I've got this message:
F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status:
GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size,
0, d.stream(), gen, data, size, dist) status: Internal: no kernel
image is available for execution on the device
while launching my YOLO detection algorithm on my laptop's GPU. (If I disable GPU detection in TensorFlow, everything works fine, but slowly.)
I thought it was due to an error in the CUDA/cuDNN installation procedure, but I've performed the checks and everything seems to be fine.
Can someone help me figure out what's going on?
I'm using:
Windows 10
GPU: NVIDIA GeForce 940MX
CUDA 10.1
cuDNN 7.6
Python 3.7
Tensorflow 2.3.1
keras 2.4

Can I get tensorflow-gpu to work with NVIDIA GeForce MX130?

I understand this is not a recommended setup for machine learning in any sense, but I would like to work with what I have.
Not being an expert, I have been told that tf-gpu should work with any device supported by cuda.
When I run:
from numba import cuda
cuda.detect()
I get:
Found 1 CUDA devices
id 0 b'GeForce MX130' [SUPPORTED]
compute capability: 5.0
pci device id: 0
pci bus id: 1
Summary:
1/1 devices are supported
And I can get the GPU to work with some basic 'vectorized' tasks.
Also, running:
import tensorflow as tf
tf.test.is_built_with_cuda()
will return True
However, running
tf.config.experimental.list_physical_devices('gpu')
will return an empty list.
Running:
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Will return:
Num GPUs Available: 0
Running:
strategy = tf.distribute.MirroredStrategy()
print("Number of devices: {}".format(strategy.num_replicas_in_sync))
will return:
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Number of devices: 1
I have trained some basic models with the non-gpu version of tensorflow but I have no clue about how to deal with tf-gpu. I was able to fit a model with CuDNNLSTM layers, but the script didn't use the GPU, according to task manager.
I would appreciate any advice on how to get it to use my GPU, or confirmation that it is not possible. Thanks!
EDITED:
I uninstalled keras and both tensorflow versions and installed only tensorflow-gpu. Nothing changed.

Unfortunately, no.
Even though the official specs say 'Yes', NVIDIA's list of CUDA-enabled GPUs does not mention the MX130.
(I am also running an MX130 in my notebook.)
References:
official specs: https://www.nvidia.com/en-us/geforce/gaming-laptops/mx130/specifications/
CUDA enabled GPU list: https://developer.nvidia.com/cuda-gpus

Absolutely YES!
I assume that compute capability 5.0 is enough.
I tested my GeForce MX130 with tensorflow-gpu installed by conda (which handles CUDA, version compatibility, etc.) in Python 3.7:
conda install tensorflow-gpu
That's it! No further action was required.
The following versions were installed:
tensorflow-gpu: 2.1.0
cudatoolkit: 10.1.243
cudnn: 7.6.5
... and it worked!

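Install both CUDA and cuDNN, and add them to your PATH. To check whether TensorFlow is using the GPU, run the following (this is the TensorFlow 1.x API; tf.Session is not available at the top level in 2.x):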
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
It should show your GPU name in its output.
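The session call above is the TensorFlow 1.x API. As a hedged sketch for TensorFlow 2.x (an assumption about the reader's version, not something this answer states), placement logging can be switched on with tf.debugging.set_log_device_placement. The helper name run_placement_demo is hypothetical; it returns None when TensorFlow is not importable.

```python
def run_placement_demo():
    """Run a tiny matmul with placement logging on; return the result's device."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    # Log the device each operation is assigned to (TF 2.x API).
    tf.debugging.set_log_device_placement(True)
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    product = tf.matmul(a, a)
    # In eager mode the tensor records the device that produced it.
    return product.device


print(run_placement_demo())
```

If the GPU is in use, the returned device string ends in a GPU entry rather than a CPU one.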

Yes, you can: tensorflow-gpu 2.5.0 + CUDA 11.2 + cuDNN 8.1.
Check your PATH environment variable if you are using Windows. On my system it points to:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;
C:\Apps\CUDNN8.1\bin;
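A quick way to review the variable is to list only the PATH entries that mention CUDA or cuDNN. This is a minimal sketch using the standard library; the helper name cuda_path_entries is hypothetical, and the example value reuses the three directories from this answer plus one unrelated entry (C:\Windows\system32) added purely for illustration.

```python
import os


def cuda_path_entries(path_value=None, sep=os.pathsep):
    """Return PATH entries that mention CUDA or cuDNN (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    entries = [e.strip() for e in path_value.split(sep) if e.strip()]
    return [e for e in entries if "cuda" in e.lower() or "cudnn" in e.lower()]


# Example value built from the answer above; ';' is the Windows separator.
example = (r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;"
           r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;"
           r"C:\Apps\CUDNN8.1\bin;"
           r"C:\Windows\system32")

for entry in cuda_path_entries(example, sep=";"):
    print(entry)
```

Calling cuda_path_entries() with no arguments inspects the PATH of the current process, which is what the Jupyter kernel or Python interpreter actually sees.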

cuda_dnn error with Tensorflow in Windows: "could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED"

I want to perform some calculation with TensorFlow, Keras and Nvidia GeForce GPU on Windows. I installed the required software and it worked fine (TensorFlow, tensorflow-gpu, Keras, Nvidia driver, CUDA v9.2, CUDA DNN 9.0). I managed to perform some calculations.
Now something has suddenly gone wrong: TensorFlow crashes Python when executing model.fit. The log messages are as follows:
2018-06-14 06:43:44.339292: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1423 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-06-14 06:43:45.231652: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-06-14 06:43:45.231904: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:459] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2018-06-14 06:43:45.232704: F T:\src\github\tensorflow\tensorflow\core\kernels\conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
[I 06:44:05.636 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel cc7fe8f9-2c33-4e3d-9af6-8ddb6a12c0bb restarted
I tried to limit the per_process_gpu_memory_fraction from the Python code, but it didn't help.
import tensorflow as tf
gpu_options = tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
s = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
I simply can't find a solution to this problem for Windows. I even tried deleting the NVIDIA local cache files, but that didn't help either.
Any hints?

After some hours of digging, I issued two commands at the Anaconda prompt:
pip3 install --upgrade tensorflow
pip3 install --upgrade tensorflow-gpu
This probably resolved some local inconsistency, and now everything works again.
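For readers who land on TensorFlow 2.x after upgrading, the tf.GPUOptions(allow_growth=True) setting tried in the question is expressed differently. The sketch below is an assumption about the 2.x equivalent and is not the fix reported in this answer; the helper name enable_memory_growth is hypothetical, and it returns None when TensorFlow is not importable.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand; return GPUs configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return None  # TensorFlow is not installed in this environment
    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            # Must be called before the GPU is first used by TensorFlow.
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError:
            # Raised if the device was already initialised before this call.
            pass
    return configured


print(enable_memory_growth())
```

Call it at the top of the script or notebook, before building any model, because the setting cannot be changed once the GPU has been initialised.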
