I am very close to having a GPU-enabled environment configured with the Keras/TensorFlow Python library. When I try to train my model, I get a long error message:
2018-11-27 18:34:47.776387: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-11-27 18:34:48.769258: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-27 18:34:48.769471: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-11-27 18:34:48.769595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-11-27 18:34:48.769825: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3024 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-11-27 18:34:50.405201: E tensorflow/stream_executor/cuda/cuda_dnn.cc:363] Loaded runtime CuDNN library: 7.1.4 but source was compiled with: 7.2.1. CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
I've looked at a couple of similar Stack Overflow posts, and it appears that I need to change either the cuDNN version or the tensorflow-gpu version. I downloaded the correct version of cuDNN from Nvidia's website, but it did not appear to change anything. I have also found several posts about changing my tensorflow-gpu version, but which version should I download, and how? I am using Windows 10.
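The compatibility rule quoted in the error message can be written out as a small check. This is a minimal sketch of the rule as the log states it, not TensorFlow's own code; the names `parse_version` and `cudnn_compatible` are hypothetical.

```python
def parse_version(text):
    """Turn a string such as '7.1.4' into a tuple of ints: (7, 1, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def cudnn_compatible(loaded, compiled):
    """Apply the rule quoted in the log message.

    The major versions must match. From cuDNN 7.0 on, the loaded minor
    version may be equal to or higher than the compiled one; before 7.0
    the minor versions must match exactly. The patch level is not compared.
    """
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    if loaded_major != compiled_major:
        return False
    if compiled_major >= 7:
        return loaded_minor >= compiled_minor
    return loaded_minor == compiled_minor


# Versions taken from the error message above.
print(cudnn_compatible("7.1.4", "7.2.1"))  # False: 7.1 is older than the 7.2 the binary expects
```

Under this rule, the TensorFlow build in the log needs cuDNN 7.2 or a later 7.x release; the other option is a TensorFlow build compiled against cuDNN 7.1 or an earlier 7.x release.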
This PowerShell script should update your cuDNN files. It downloads a cuDNN archive, extracts it with 7-Zip, and copies the files into the CUDA v9.0 directory:
$ur='https://dsvmteststore.blob.core.windows.net/patches/cuda/cudnnpatch.zip?st=2019-02-20T04%3A10%3A00Z&se=2019-03-01T04%3A10%3A00Z&sp=r&sv=2017-07-29&sr=c&sig=w1VqK70ZcWWbbRW2K4Y8q5298dNxBqsoP71%2F4nF6uYM%3D'
Invoke-WebRequest -Uri $ur -OutFile '.\cudnnpatch.zip' -UseBasicParsing
$from='.\cudnnpatch.zip'
$to='.\'
cmd /c "c:\7-Zip\7z.exe x $from -o$to -y"
$root='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0'
$dll='\bin\cudnn64_7.dll'
$header='\include\cudnn.h'
$lib='\lib\x64\cudnn.lib'
$from='.\cuda'
Copy-Item "$from$dll" "$root$dll" -Force
Copy-Item "$from$header" "$root$header" -Force
Copy-Item "$from$lib" "$root$lib" -Force
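For readers not using PowerShell, the copy step at the end of the script can be sketched in Python. The function name `copy_cudnn_files` and its parameters are hypothetical; the three relative paths are the ones from the script above. Unlike `Copy-Item`, this sketch also creates missing target directories, which is an addition of mine.

```python
import shutil
from pathlib import Path

# Relative paths of the three cuDNN files, as in the script above.
CUDNN_FILES = (
    Path("bin") / "cudnn64_7.dll",
    Path("include") / "cudnn.h",
    Path("lib") / "x64" / "cudnn.lib",
)


def copy_cudnn_files(extracted_dir, cuda_root):
    """Copy the cuDNN files from an extracted archive into the CUDA directory.

    Existing files are overwritten, like Copy-Item -Force.
    Returns the list of destination paths that were written.
    """
    written = []
    for relative in CUDNN_FILES:
        source = Path(extracted_dir) / relative
        target = Path(cuda_root) / relative
        # Extra step not in the PowerShell script: create the folder if needed.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        written.append(target)
    return written
```

With the directories from the script, the call would be `copy_cudnn_files("cuda", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0")`.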
I've got this message:
F .\tensorflow/core/kernels/random_op_gpu.h:232] Non-OK-status: GpuLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: no kernel image is available for execution on the device
while launching my YOLO detection algorithm on my laptop's GPU. (If I stop TensorFlow from using the GPU, everything works, but slowly.)
I thought it was due to an error in the CUDA/cuDNN installation procedure, but I've performed the checks and everything seems to be fine.
Can someone help me to figure out what's going on?
I'm using:
Windows 10
GPU: NVIDIA GeForce 940MX
CUDA 10.1
cuDNN 7.6
Python 3.7
Tensorflow 2.3.1
keras 2.4
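The message "no kernel image is available for execution on the device" generally means that the binary contains no GPU code compiled for the device's architecture, so it may help to check which compute capabilities the installed TensorFlow was built for. This is a minimal sketch, assuming a TensorFlow release that provides `tf.sysconfig.get_build_info()`; the keys it returns vary between builds, so they are read with `.get`. `build_report` is a hypothetical helper name.

```python
def build_report():
    """Collect build details of the installed TensorFlow, if any."""
    try:
        import tensorflow as tf
    except ImportError:
        return {"tensorflow": None}

    report = {"tensorflow": tf.__version__}
    # get_build_info is not present in older releases, so look it up defensively.
    get_build_info = getattr(tf.sysconfig, "get_build_info", None)
    if get_build_info is not None:
        info = get_build_info()
        for key in ("cuda_version", "cudnn_version", "cuda_compute_capabilities"):
            # A key that this build does not report is stored as None.
            report[key] = info.get(key)
    return report


print(build_report())
```

If the reported compute capabilities do not cover the GPU in use, that would be consistent with the error, and a different TensorFlow build would be the thing to try.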
I understand this is not a recommended setup for machine learning in any sense, but I would like to work with what I have.
I am not an expert, but I have been told that tf-gpu should work with any device supported by CUDA.
When I run:
from numba import cuda
cuda.detect()
I get:
Found 1 CUDA devices
id 0 b'GeForce MX130' [SUPPORTED]
compute capability: 5.0
pci device id: 0
pci bus id: 1
Summary:
1/1 devices are supported
And I can get the GPU to work with some basic 'vectorized' tasks.
Also, running:
import tensorflow as tf
tf.test.is_built_with_cuda()
will return True
However, running
tf.config.experimental.list_physical_devices('gpu')
will return an empty list.
Running:
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Will return:
Num GPUs Available: 0
Running:
strategy = tf.distribute.MirroredStrategy()
print("Number of devices: {}".format(strategy.num_replicas_in_sync))
will return:
WARNING:tensorflow:There are non-GPU devices in `tf.distribute.Strategy`, not using nccl allreduce.
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:CPU:0',)
Number of devices: 1
I have trained some basic models with the non-gpu version of tensorflow but I have no clue about how to deal with tf-gpu. I was able to fit a model with CuDNNLSTM layers, but the script didn't use the GPU, according to task manager.
I will appreciate any advice on how to get it to use my 'gpu' or a confirmation that it is not possible. Thanks!
EDITED:
I uninstalled keras and both tensorflow versions and installed only tensorflow-gpu. Nothing changed.
Unfortunately, no.
Even though the official specs state 'Yes', the CUDA GPU list does not mention the MX130.
(I am also running an MX130 on my notebook.)
reference:
official specs: https://www.nvidia.com/en-us/geforce/gaming-laptops/mx130/specifications/
CUDA enabled GPU list: https://developer.nvidia.com/cuda-gpus
Absolutely YES!
I assume that the compute capability: 5.0 is enough.
I tested my GeForce MX130 with tensorflow-gpu installed by conda (which handles CUDA, version compatibility, etc.) in Python 3.7:
conda install tensorflow-gpu
That's it! No further action was required.
The following versions were installed:
tensorflow-gpu: 2.1.0
cudatoolkit: 10.1.243
cudnn: 7.6.5
... and it worked!
Install both CUDA and cuDNN, and add their directories to your PATH. To check whether TensorFlow is using the GPU, use this:
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
It should show your GPU's name in its output.
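`tf.Session` and `tf.ConfigProto` belong to the TensorFlow 1.x API. On TensorFlow 2.x, a comparable check is to list the physical GPUs and turn on device placement logging. The sketch below assumes TensorFlow 2.x; `gpu_check` is a hypothetical helper name.

```python
def gpu_check():
    """Return the names of the GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # Log the device each operation is placed on (TensorFlow 2.x).
    tf.debugging.set_log_device_placement(True)
    gpus = tf.config.experimental.list_physical_devices("GPU")

    # Run one small operation so that a placement line is logged.
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    tf.matmul(a, a)
    return [gpu.name for gpu in gpus]


print(gpu_check())
```

An empty list means TensorFlow did not detect a GPU; in that case the placement log should show the operation running on the CPU.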
Yes, you can: tensorflow-gpu 2.5.0 + CUDA 11.2 + cuDNN 8.1.
If you are using Windows, review your PATH environment variable. On my system it points to:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin;
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp;
C:\Apps\CUDNN8.1\bin;
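A quick way to review the PATH is to list the entries that mention CUDA or cuDNN and check that each directory exists. This is a minimal sketch; `cuda_path_entries` is a hypothetical helper, and matching on the substrings "cuda" and "cudnn" is an assumption about how the directories are named.

```python
import os


def cuda_path_entries(path_value=None):
    """Return (entry, exists) pairs for PATH entries that mention CUDA or cuDNN."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    results = []
    for entry in path_value.split(os.pathsep):
        entry = entry.strip()
        if not entry:
            continue
        lowered = entry.lower()
        if "cuda" in lowered or "cudnn" in lowered:
            # Record whether the directory is actually present on disk.
            results.append((entry, os.path.isdir(entry)))
    return results


for entry, exists in cuda_path_entries():
    print("ok     " if exists else "missing", entry)
```

An entry reported as missing points to a directory that does not exist, which usually means a typo or a toolkit version that was uninstalled.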
I am trying to get TensorFlow to start on my machine, but I always get stuck with a "Could not identify NUMA node" error message.
I use a Conda environment:
tensorflow-gpu 1.12.0
cudatoolkit 9.0
cudnn 7.1.2
nvidia-smi says: Driver Version 418.43, CUDA Version 10.1
Here is the error output:
>>> import tensorflow as tf
>>> tf.Session()
2019-04-04 09:56:59.851321: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-04-04 09:56:59.950066: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:950] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2019-04-04 09:56:59.950762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 1.84GiB
2019-04-04 09:56:59.950794: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-04-04 09:59:45.338767: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-04 09:59:45.338799: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-04-04 09:59:45.338810: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-04-04 09:59:45.339017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1193] Could not identify NUMA node of platform GPU id 0, defaulting to 0. Your kernel may not have been built with NUMA support.
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Unfortunately, I have no idea what to do with the error code.
I could fix it with a new conda environment:
conda create --name tf python=3
conda activate tf
conda install cudatoolkit=9.0 tensorflow-gpu=1.11.0
A table of compatible CUDA/TF combinations is available here.
In my case, the combination of cudatoolkit=9.0 and tensorflow-gpu=1.12 inexplicably led to a std::bad_alloc error.
However, cudatoolkit=9.0 and tensorflow-gpu=1.11.0 work fine.
I had the same issue and finally found out that it was caused by using Adam to optimize the model. Once you switch to another optimizer, it should work.
If you get this error on a Mac and the error message includes the line "Metal device set to: Apple M1" (or any other chip), uninstalling tensorflow-metal should resolve it:
pip uninstall tensorflow-metal
I want to perform some calculations with TensorFlow, Keras and an Nvidia GeForce GPU on Windows. I installed the required software (TensorFlow, tensorflow-gpu, Keras, Nvidia driver, CUDA v9.2, CUDA DNN 9.0) and it worked fine; I managed to perform some calculations.
Now something has suddenly gone wrong: TensorFlow crashes Python when executing model.fit. The log messages are as follows:
2018-06-14 06:43:44.339292: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1423 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
2018-06-14 06:43:45.231652: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-06-14 06:43:45.231904: E T:\src\github\tensorflow\tensorflow\stream_executor\cuda\cuda_dnn.cc:459] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2018-06-14 06:43:45.232704: F T:\src\github\tensorflow\tensorflow\core\kernels\conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)
[I 06:44:05.636 NotebookApp] KernelRestarter: restarting kernel (1/5), keep random ports
kernel cc7fe8f9-2c33-4e3d-9af6-8ddb6a12c0bb restarted
I tried to limit the per_process_gpu_memory_fraction from the Python code, but it didn't help.
import tensorflow as tf
gpu_options = tf.GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
s = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=gpu_options))
I simply can't find a solution to this problem for the Windows platform. I even tried deleting the Nvidia local cache files, but that didn't help either.
Any hint?
After some hours of digging, I issued two commands at the Anaconda prompt:
pip3 install --upgrade tensorflow
pip3 install --upgrade tensorflow-gpu
This probably resolved some local inconsistency, and now everything works again.
I have created a DLAMI (Deep Learning AMI (Amazon Linux) Version 8.0 - ami-9109beee on g2.8xlarge) and installed jupyter notebook to create a simple Keras LSTM. When I try to turn my Keras model into a GPU model using the multi_gpu_model function, I see the following error logged:
Ignoring visible gpu device (device: 0, name: GRID K520, pci bus id:
0000:00:03.0, compute capability: 3.0) with Cuda compute capability
3.0. The minimum required Cuda capability is 3.5.
I've tried reinstalling tensorflow-gpu to no avail. Is there any way to align the compatibilities on this AMI?
This was resolved by uninstalling, then reinstalling tensorflow-gpu through the conda environment provided by the AMI.
TensorFlow binaries installed through pip or similar channels are built for CUDA compute capability 3.5 and higher only, although the TensorFlow source code does support compute capability 3.0.
Unfortunately, the only way to obtain a TensorFlow installation that supports compute capability 3.0 is to build it from source.
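The check behind that log message amounts to comparing the device's compute capability with the minimum the binary was built for. Here is a minimal sketch with the hypothetical helpers `parse_capability` and `meets_minimum`; the values are compared as integer tuples instead of floats, so the major and minor parts are each compared as whole numbers.

```python
def parse_capability(text):
    """Convert a compute capability string such as '3.0' to a tuple (3, 0)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def meets_minimum(device_capability, minimum="3.5"):
    """True if the device's compute capability is at least the required minimum."""
    return parse_capability(device_capability) >= parse_capability(minimum)


# Values from the log message above: GRID K520 (3.0) against a 3.5 minimum.
print(meets_minimum("3.0"))  # False
print(meets_minimum("5.0"))  # True
```

The default of "3.5" is taken from the log message in the question; a build from source with a lower minimum would pass a different value.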