keras with TensorFlow GPU, CUDA_ERROR_LAUNCH_FAILED hyper parameters search

I am working with Keras on the TensorFlow backend.
I am writing a search script to tune the hyperparameters of my CuDNNLSTM networks.
After creating about 10 different CuDNNLSTM networks during the search, I received this error:
tensorflow\stream_executor\cuda\cuda_driver.cc:1108 could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED
The error is raised in tensorflow\python\client\session.py, in _run, _do_run, _do_call.
OS: Windows 10 64-bit
Python: 3.6.5
Keras version: 2.1.6
TensorFlow (GPU): 1.10.0
CUDA: 9.0
cuDNN: 7.3
GPU: GeForce GTX 1080 Ti
Has anyone encountered this problem?
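The thread records no answer to this question. One mitigation that is often suggested when many models are built in a single process is to release the backend graph and session after every trial. Whether that resolves CUDA_ERROR_LAUNCH_FAILED in this setup is an assumption, not something the source confirms. The sketch below shows only the loop structure; run_search, build_and_train and cleanup are hypothetical names introduced for illustration.

```python
def run_search(configs, build_and_train, cleanup):
    """Train one model per config and call cleanup() after every trial.

    build_and_train: callable taking a config and returning a score.
    cleanup: callable with no arguments that releases per-trial resources.
    """
    results = []
    for config in configs:
        try:
            score = build_and_train(config)
            results.append((config, score))
        finally:
            # Runs even if the trial raises, so resources held by one
            # trial are released before the next one starts.
            cleanup()
    return results


# Hypothetical usage with Keras on the TensorFlow backend:
#   from keras import backend as K
#   results = run_search(configs, build_and_train, K.clear_session)
```

Passing the cleanup step as a callable keeps the loop independent of the backend, so it can be tried without changing the model-building code.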

Related

(Tensorflow) Stuck at Epoch 1 during model.fit()

I've been trying to make TensorFlow 2.8.0 work with my Windows GPU (GeForce GTX 1650 Ti). Even though it detects my GPU, any model I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel (I've tried both Jupyter Notebook and Spyder) hangs and restarts.
Following TensorFlow's website, I downloaded the matching cuDNN and CUDA versions, and I verified them (together with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8, but I actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2), so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it fails as described above. Surprisingly, even though code such as that in TensorFlow's CNN Tutorial does not run, the chunk of code from this stackoverflow question does, and that is the only code that has ever worked. It looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code I came across for the past couple of hours, and the only one that does not get stuck at Epoch 1 is the one from the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)

Update (Solution)
The suggestions from this post seem to have helped: I copied the following files from the zipped cuDNN bin subfolder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin):
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
It seems I initially misinterpreted the instruction to "copy all cudnn*.dll files" as copying only cudnn64_8.dll, rather than also copying every file listed above.
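The check described in the update can be sketched in a few lines of Python. This is a minimal sketch that only tests whether the files named in the post exist in a given folder; missing_cudnn_dlls and EXPECTED_CUDNN_DLLS are hypothetical names, and the list covers only the DLLs mentioned above for cuDNN 8.1.

```python
from pathlib import Path

# DLL names taken from the post (cuDNN 8.1 for CUDA 11.2 on Windows).
EXPECTED_CUDNN_DLLS = [
    "cudnn64_8.dll",
    "cudnn_adv_infer64_8.dll",
    "cudnn_adv_train64_8.dll",
    "cudnn_cnn_infer64_8.dll",
    "cudnn_cnn_train64_8.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn_ops_train64_8.dll",
]


def missing_cudnn_dlls(cuda_bin_dir):
    """Return the expected cuDNN DLL names that are not files in cuda_bin_dir."""
    bin_dir = Path(cuda_bin_dir)
    return [name for name in EXPECTED_CUDNN_DLLS
            if not (bin_dir / name).is_file()]


# Example call, using the folder named in the post:
#   missing_cudnn_dlls(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin")
```

An empty list means every file named in the post is present; it does not prove that the versions match the installed TensorFlow build.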

Do pretrained tensorflow models need to be used by machines with the same versions?

I trained a CNN on a Linux machine with Keras/TensorFlow but can’t get the pretrained model to run on my Raspberry Pi. The model was made on Ubuntu 16.04 with Python 3.6.7, TensorFlow version 1.7.0, cuDNN 7.0.5 and CUDA 9. I am trying to run it on the Raspberry Pi 3 Model B+ with Python 3.5.3 and TensorFlow version 1.13.1.
I have no problem loading and running the pretrained model on the same machine it was created on. The issue is only when I try to run that same pretrained model on the RPi system. I end up getting a segmentation fault.
I tried updating the Linux machine that created the model to tensorflow 1.12 but after tensorflow 1.12 successfully installed, I got "Failed to get convolution algorithm. This is probably because cuDNN failed to initialize" errors, so I'd rather not go down that route. I want to know if it's possible to just use this pretrained model with tensorflow 1.13.1 on the RPi.
Here's what I'm doing on the RPi:
>>> import tensorflow as tf
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: compiletime version 3.4 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.5
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: builtins.type size changed, may indicate binary incompatibility. Expected 432, got 412
>>> print(tf.__version__)
1.13.1
>>> from keras.models import load_model
Using TensorFlow backend.
>>> model = load_model(save_dir+model_name)
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/pi/.local/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
2019-03-25 17:08:11.471364: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
2019-03-25 17:12:55.123877: W tensorflow/core/framework/allocator.cc:124] Allocation of 209715200 exceeds 10% of system memory.
Backend terminated (returncode: -11)
Fatal Python error: Segmentation fault
I need some guidance on why this is happening. Are the versions incompatible? Maybe the model is too large for the RPi (I doubt it, since it's a fairly shallow model with 18 layers)? The other forum posts I've seen about segmentation faults seemed a lot more dire (e.g., they can't even run standard commands in the Terminal without seeing a segmentation error). This segmentation fault only happens (and happens repeatably) with the commands above.
Any advice/help greatly appreciated!
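The allocator warnings in the log can be put in perspective with a little arithmetic. The sketch below assumes the Raspberry Pi 3 Model B+ has 1 GiB of RAM and that the tensor holds 4-byte float32 values; neither is stated in the post, and the result does not establish the cause of the segmentation fault.

```python
ALLOCATION_BYTES = 209_715_200   # size reported in the allocator warning
PI_RAM_BYTES = 1 * 1024 ** 3     # assumption: 1 GiB of RAM on the Pi 3 Model B+
BYTES_PER_FLOAT32 = 4            # assumption: the tensor stores float32 values

allocation_mib = ALLOCATION_BYTES / 1024 ** 2
float32_values = ALLOCATION_BYTES // BYTES_PER_FLOAT32
fraction_of_ram = ALLOCATION_BYTES / PI_RAM_BYTES

print(f"single allocation: {allocation_mib:.0f} MiB")    # 200 MiB
print(f"float32 values:    {float32_values:,}")          # 52,428,800
print(f"share of RAM:      {fraction_of_ram:.1%}")       # 19.5%
```

Under these assumptions a single tensor takes roughly a fifth of the board's memory, and the log shows the same allocation twice, so memory pressure is worth ruling out alongside the version mismatch.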

tesla c2075 with tensorflow cuda version installation [duplicate]

This question already has answers here:
How can I make tensorflow run on a GPU with capability 2.x?
(3 answers)
Closed 3 years ago.
I'm new to GPU-based model training.
I have a Tesla C2075 GPU with 6 GB of memory and am using Keras CuDNNLSTM for faster training.
I have installed CUDA 9 with cuDNN 7.0.5 and tensorflow-gpu==1.12.0, and I am using Ubuntu 16.04.
Is the Tesla C2075 compatible with CUDA 9?
I checked https://developer.nvidia.com/cuda-gpus, which lists the Tesla C2075 with compute capability 2.0. What is compute capability?
While running my model, TensorFlow logs:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1482] Ignoring visible gpu device (device: 0, name: Tesla C2075, pci bus id: 0000:03:00.0, compute capability: 2.0) with Cuda compute capability 2.0. The minimum required Cuda capability is 3.5.
I also get this error when calling model.fit(...):
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU,XLA_CPU,XLA_GPU], Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[node bidirectional_1/CudnnRNN (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py:922) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=87654321, seed2=0](bidirectional_1/transpose, bidirectional_1/ExpandDims_1, bidirectional_1/ExpandDims_2, bidirectional_1/concat)]]
Thanks

The CUDA compute capability describes the architecture and hardware features of the GPU; there is quite an extensive list on Wikipedia.
The TensorFlow webpage says that you need a GPU with a compute capability of at least 3.5 (older versions seemed to accept 3.0, but never lower).
Unfortunately this is a hardware limitation; the only way of changing your compute capability is to use a different GPU. Simply put: you cannot run TensorFlow on that GPU.
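The comparison behind the log message can be illustrated with a short sketch. The minimum of 3.5 comes from the log in the question; is_supported and parse_compute_capability are hypothetical helper names, not TensorFlow functions.

```python
MIN_COMPUTE_CAPABILITY = (3, 5)  # minimum reported in the TensorFlow log above


def parse_compute_capability(text):
    """Turn a string such as '2.0' into a (major, minor) tuple of ints."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_supported(capability, minimum=MIN_COMPUTE_CAPABILITY):
    """True if the capability string meets the minimum (tuples compare in order)."""
    return parse_compute_capability(capability) >= minimum


print(is_supported("2.0"))  # Tesla C2075 from the question -> False
print(is_supported("6.1"))  # for comparison, a GTX 1080 -> True
```

Comparing (major, minor) tuples avoids the pitfalls of comparing version strings or floats directly.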

Tensorflow-gpu only running on CPU

I am running TensorFlow 1.8.0, installed with 'pip install tensorflow-gpu', with CUDA 9.0 and cuDNN 7.1.4 on Windows 10, and I have been trying to get TensorFlow to work with my NVIDIA GeForce GTX 860M graphics card. I am using IDLE to write and run my code.
When I run any code, it only executes on my CPU. This is the sample code I'm running until I can get TensorFlow onto my GPU and move on to the MNIST TensorFlow tutorials:
import tensorflow as tf
# Initialize two constants
x1 = tf.constant([1,2,3,4])
x2 = tf.constant([5,6,7,8])
# Multiply
result = tf.multiply(x1, x2)
# Initialize the Session
sess = tf.Session()
# Print the result
print(sess.run(result))
# Close the session
sess.close()
When I run this, I get the following "failed to create session" error:
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
So I added the following lines to the beginning of the code:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '1'
and now the code runs successfully with the correct output
[ 5 12 21 32]
but it only runs on my CPU, and when I run the following commands in the IDLE shell,
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
the output is
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1631189080169918107
]
I have tried restarting my computer, uninstalling and reinstalling tensorflow-gpu, and deleting other versions of CUDA I previously had installed, but I cannot get my program to recognize my NVIDIA graphics card. I looked it up, and it is a supported card.
What else can I do to get my code to recognize and run on my graphics card?

Installing CUDA is not enough; you also need to install cuDNN 7.0.5 and some other dependencies. Refer to http://www.python36.com/install-tensorflow-gpu-windows/ for complete details.
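Separately from the cuDNN point, the CUDA_VISIBLE_DEVICES = '1' line in the question is worth a second look. CUDA device indices start at 0, so on a machine where the GTX 860M is the only CUDA device (an assumption; the post does not say), the value '1' hides that GPU, which would be consistent with the CPU-only device list. The sketch below is a simplified model of how the variable is interpreted, covering numeric indices only; visible_gpus is a hypothetical helper, not a CUDA or TensorFlow function.

```python
def visible_gpus(env_value, physical_count):
    """Return the physical GPU indices left visible by CUDA_VISIBLE_DEVICES.

    Simplified model: handles comma-separated numeric indices only
    (no UUIDs). env_value=None means the variable is not set.
    """
    if env_value is None:
        return list(range(physical_count))  # unset: every GPU is visible
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        if not token.isdigit() or int(token) >= physical_count:
            break  # an invalid index hides itself and everything after it
        visible.append(int(token))
    return visible


print(visible_gpus("1", 1))   # value from the question, single GPU -> []
print(visible_gpus("0", 1))   # index of the first GPU -> [0]
print(visible_gpus("-1", 1))  # common way to force CPU-only -> []
```

If this model applies, removing the variable or setting it to '0' would expose the GPU again, at which point the original "Failed to create session" error still needs its own fix.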

tensorflow keras do not use all available resources

I'm quite new to deep learning and, in order to improve my knowledge, I've been reading some books and following an online video course.
In this video course I have to do an exercise with a convolutional neural network.
I've built a CNN trained on 10,000 images of 64x64 pixels (to recognize cat and dog images).
from keras.models import Sequential
from keras.layers import Convolution2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
# Initialising the CNN
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Convolution2D(32,3,3,input_shape=(64,64,3),activation='relu'))
# Step 2 - Pooling
classifier.add(MaxPooling2D(pool_size = (2,2)))
classifier.add(Convolution2D(32,3,3,activation='relu'))
classifier.add(MaxPooling2D(pool_size = (2,2)))
# Step 3 - Flattening
classifier.add(Flatten())
# Step 4 - Full connection
classifier.add(Dense(output_dim = 128 ,activation='relu'))
classifier.add(Dense(output_dim = 1 ,activation='sigmoid'))
# Compiling the CNN
classifier.compile(optimizer = 'adam' , loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory(
    'dataset/training_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
test_set = test_datagen.flow_from_directory(
    'dataset/test_set',
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')
classifier.fit_generator(training_set,
                         steps_per_epoch=8000,
                         epochs=25,
                         validation_data=test_set,
                         validation_steps=2000)
The first time I installed Anaconda I didn't install the GPU module, and when I started fitting my CNN
I had to wait 1190 seconds per epoch with the CPU working at 70%.
For your information, my computer is quite fast: an i7 6800K overclocked to 4.2 GHz, an MSI GTX 1080 video card, and 32 GB of 3333 MHz RAM.
I thought that with this computer, installing the TensorFlow GPU module was almost compulsory.
I read in some posts how to check whether TensorFlow is correctly configured to use the GPU,
and after launching:
In [1]: from tensorflow.python.client import device_lib
In [2]: print(device_lib.list_local_devices())
I have this result:
2017-10-16 10:41:25.780983: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-10-16 10:41:25.781067: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-10-16 10:41:26.635590: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:955] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.8225
pciBusID 0000:03:00.0
Total memory: 8.00GiB
Free memory: 6.61GiB
2017-10-16 10:41:26.635807: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:976] DMA: 0
2017-10-16 10:41:26.636324: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:986] 0: Y
2017-10-16 10:41:26.637179: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\35\tensorflow\core\common_runtime\gpu\gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
[name: "/cpu:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 16495731140373557390
, name: "/gpu:0"
device_type: "GPU"
memory_limit: 6740156088
locality {
bus_id: 1
}
incarnation: 6266244792178813148
physical_device_desc: "device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0"
]
gpu:0 is listed, and I read in the documentation that TensorFlow will then automatically use the GPU for computation.
Launching the fit method with this configuration, I have to wait 950 seconds per epoch, which is better than 1190 seconds. The CPU never gets over 10% and, strangely, the GPU never gets over 10-13% either.
I assume there is something wrong with my configuration because the teacher in the course, using a MacBook (I don't know its exact configuration) without the TensorFlow GPU module, takes approximately 90 seconds per epoch.
I'm not a Python or TensorFlow expert, but it really seems that something is wrong or that there is something else I need to understand.
Could someone give me some advice, something to read, or some tests to run to better understand where the bottleneck is?
Thank you
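One thing worth checking in the code above is the meaning of steps_per_epoch. In the Keras 2 fit_generator API, steps_per_epoch counts batches, not images, so with batch_size=32 a value of 8000 processes far more images per epoch than the dataset contains. The sketch below works through the arithmetic, assuming the 10,000 images are split into 8,000 for training and 2,000 for testing (suggested by the numbers in the code, but not stated in the post); steps_for is a hypothetical helper.

```python
import math


def steps_for(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


TRAIN_IMAGES = 8000  # assumption: size of dataset/training_set
TEST_IMAGES = 2000   # assumption: size of dataset/test_set
BATCH_SIZE = 32      # from the flow_from_directory calls

print(steps_for(TRAIN_IMAGES, BATCH_SIZE))  # 250 batches cover the training set once
print(steps_for(TEST_IMAGES, BATCH_SIZE))   # 63 batches cover the test set once
print(8000 * BATCH_SIZE)                    # 256000 images per epoch as written
```

If the course material was written for the older API, where the corresponding argument counted samples, the teacher's epochs would cover about 32 times fewer images, which would account for much of the timing gap regardless of GPU configuration.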

I don't have a GPU on Windows, but I got really good results by installing the Intel Distribution for Python with Anaconda: https://software.intel.com/en-us/articles/using-intel-distribution-for-python-with-anaconda.
For TensorFlow, a Python 3.5 environment seems to work best (in the previous link, use python=3.5).
I then installed TensorFlow with pip inside this Anaconda environment, following the "installing with Anaconda" instructions.
Then I installed Keras with conda install keras. Make sure this does not replace the existing numpy or TensorFlow packages; look for installation options that leave these optimized packages in place. If the conda version doesn't work, pip install keras may be a better choice, with the same caution.
This setup gave me 100% usage on all processors (according to the Windows Resource Monitor).
If this doesn't solve your problem, you can also try getting the numpy and scipy packages from here. Unfortunately I had no success at all with the Keras and TensorFlow packages from this source, but the numpy build is high quality.
With a GPU, your problem may be the lack of a proper CUDA driver and the cuDNN library.
Follow this and this.
Unfortunately these things vary a lot from computer to computer. I strictly followed the instructions on these sites, and on the TensorFlow site, for a Linux machine, and the results were astonishing.

On top of Daniel's answer (check CUDA & cuDNN) - it is never a good idea to have both tensorflow and tensorflow-gpu packages installed side by side; most probably, you are using the tensorflow (i.e. the CPU) one.
To avoid this, you should uninstall both packages, and then re-install tensorflow-gpu, i.e.:
pip uninstall tensorflow tensorflow-gpu
pip install tensorflow-gpu
See also accepted answer (and comment) here, on a similar issue.
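Before uninstalling, the side-by-side installation described above can be confirmed from Python. This is a minimal sketch, assuming Python 3.8 or later for importlib.metadata; find_tensorflow_variants is a hypothetical helper name.

```python
from importlib import metadata


def find_tensorflow_variants(installed_names):
    """Return which of 'tensorflow' and 'tensorflow-gpu' appear in installed_names."""
    variants = {"tensorflow", "tensorflow-gpu"}
    normalized = {name.lower().replace("_", "-")
                  for name in installed_names if name}
    return sorted(variants & normalized)


if __name__ == "__main__":
    # Names of every distribution installed in the current environment.
    names = [dist.metadata["Name"] for dist in metadata.distributions()]
    found = find_tensorflow_variants(names)
    if len(found) > 1:
        print("Both packages are installed:", ", ".join(found))
    else:
        print("Installed TensorFlow packages:", found)
```

Running it inside the same environment that executes the training script matters, since conda and pip environments can each hold their own copies.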
