I've been trying to get TensorFlow 2.8.0 to work with my GPU (GeForce GTX 1650 Ti) on Windows. Even though it detects my GPU, any model I fit gets stuck at Epoch 1 indefinitely until the kernel (I've tried Jupyter Notebook and Spyder) hangs and restarts.
Following TensorFlow's website, I downloaded the matching CUDA and cuDNN versions, and verified them (together with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8, but I've actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2), so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it just hangs as described above. What is surprising is that even though it can't run code such as TensorFlow's CNN tutorial, the only time it ever works is when I run the chunk of code from this Stack Overflow question, which looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code I came across for the past couple of hours, and the only time it does not get stuck at Epoch 1 is with the code from the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)
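For reference, the CPU-only check looked roughly like this (a minimal sketch; the variable has to be set before TensorFlow initializes the GPU):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide all GPUs from TensorFlow

import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # expected: []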
Update (Solution)
It seems the suggestions from this post helped: I copied the following files from the zipped cuDNN bin subfolder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin):
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
It seems I initially misinterpreted the instruction to copy all cudnn*.dll files, and only copied cudnn64_8.dll rather than all of the files listed above.
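A quick way to double-check the copy (a minimal sketch, assuming the default CUDA v11.2 install path shown above):
from pathlib import Path

cuda_bin = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin")
print(sorted(p.name for p in cuda_bin.glob("cudnn*.dll")))  # should list cudnn64_8.dll plus the adv/cnn/ops DLLs above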
Related
Problem: I followed Microsoft's instructions to properly install and run TensorFlow 2 in WSL with GPU acceleration using DirectML (here's the document).
After the installation, when I try to import tensorflow in Python I get the following output:
>>> import tensorflow
2022-11-22 15:52:33.090032: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pietro/miniconda3/envs/testing/lib/python3.9/site-package
/tensorflow/__init__.py", line 440, in <module>
_ll.load_library(_plugin_dir)
File "/home/pietro/miniconda3/envs/testing/lib/python3.9/site-package
/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/pietro
/miniconda3/envs/testing/lib/python3.9/site-packages/tensorflow-plugin
/libtfdml_plugin.so: undefined symbol:_ZN10tensorflow8internal15LogMessageFatalD1Ev, version tensorflow
I tried instead to follow the instructions for TensorFlow 1 and PyTorch (just in case something was wrong with my machine) and they both work perfectly, so I assume this issue only involves TensorFlow 2 somehow.
Did anyone encounter the same problem?
Thanks to everybody in advance :)
Pietro
I had the same problem, and downgrading TensorFlow from 2.11 to 2.10 fixed it. First remove the existing version:
pip uninstall tensorflow-cpu
Then re-install, this time with 2.10.0:
pip install tensorflow-cpu==2.10.0
After that, try importing it in Python. You should see something like the following (apologies for the messy output):
>>> import tensorflow as tf
2022-11-28 22:41:21.693757: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-28 22:41:21.806150: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-28 22:41:22.982148: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2022-11-28 22:41:22.982289: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
2022-11-28 22:41:22.996385: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
2022-11-28 22:41:27.615851: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
You can test that it works by adding two tensors. Run a command like the following:
print(tf.add([1.0, 2.0], [3.0, 4.0]))
And somewhere in the output, you should be able to verify that DirectML has found your GPU:
2022-11-28 22:43:42.632447: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3080)
Hope this helps!
I was working on my notebook in Visual Studio Code, and training the model for the umpteenth time felt slow.
So I decided to try training it on the GPU instead of the CPU.
I followed this tutorial: https://www.youtube.com/watch?v=hHWkvEcDBO0
It is basically very similar to what you can find in the TensorFlow docs.
You have to:
Download and install the Nvidia CUDA Toolkit (here 11.x)
Download cuDNN
Extract its contents into the same folder as the CUDA Toolkit
Add two entries to the PATH variable (a quick way to double-check them from Python is sketched below)
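As a sanity check, something like this should print the CUDA and cuDNN folders (the folder names are only assumptions; adjust them to your install):
import os

for entry in os.environ["PATH"].split(os.pathsep):
    if "cuda" in entry.lower() or "cudnn" in entry.lower():
        print(entry)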
After this, I restarted my computer and was able to see my GPU when running:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Sadly, every time a training or prediction cell was run, I got an error:
Canceled future for execute_request message before replies were done
I searched the logs and found this:
warn 22:44:34.276: StdErr from Kernel Process 2022-10-03 22:44:34.276506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
error 22:44:34.637: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\Users\Variraptor\anaconda3\lib\site-packages\traitlets\traitlets.py:2196: FutureWarning: Supporting extra quotes around Unicode is deprecated in traitlets 5.0. Use 'hmac-sha256' instead of '"hmac-sha256"' – or use CUnicode.
warn(
c:\Users\Variraptor\anaconda3\lib\site-packages\traitlets\traitlets.py:2151: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use 'dd5c0f8d-3774-496a-930e-bf20e1603651' instead of 'b"dd5c0f8d-3774-496a-930e-bf20e1603651"'.
warn(
2022-10-03 22:44:25.003727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-03 22:44:25.604543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3962 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
2022-10-03 22:44:34.276506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
info 22:44:34.638: kill daemon
error 22:44:34.639: Raw kernel process exited code: 3221226505
error 22:44:34.640: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:32353)
at c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:51405
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:51390)
at y.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:44872)
at c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2320921
at t.swallowExceptions (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:7:118974)
at dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2320899)
at t.RawSession.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2325836)
at processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 22:44:34.640: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
info 22:44:34.642: Cancel all remaining cells true || Error || undefined
info 22:44:34.642: Cancel pending cells
info 22:44:34.642: Cell 8 executed with state Error
I tried to follow the answer from Nicola Manca in another topic, but it doesn't seem to solve my problem.
Since this is my first step with GPUs, and I'm not a native English speaker, I'm completely stuck.
Could you help me understand the error here?
Many thanks.
I also faced the same issue, but in my case the GPU installation was unsuccessful.
The problem was with the environment variables that had been added; once I deleted them, it worked for me.
See this question, it may help you: WingIDE C:\Python27 __init__.py" raise CodecRegistryError SyntaxError: invalid syntax
I have trained a Faster R-CNN model for object detection using the TensorFlow Object Detection API on Google Colab. But when testing videos, Google Colab crashes, which is why I decided to test on my PC and installed CUDA 10.0, cuDNN 7.6.5, and tensorflow-gpu 1.15.
But the test is very slow, as if it were running on the CPU. I get this message when testing, so I guess it is using my GPU (photo).
Does anyone know a way to test a video faster?
Is the problem with CUDA or my GPU?
Thank you
From Comments
tf.test.is_gpu_available() tells whether TensorFlow can access a GPU.
THIS FUNCTION IS DEPRECATED. It will be removed in a future version. Instructions for updating: use tf.config.list_physical_devices('GPU') instead.
If it returns True, then there is no issue in using the GPU. But the GeForce MX110 is among the slowest cards, and it also has little RAM. For better performance you can see the list of CUDA-enabled GPU cards. For more details on hardware requirements you can refer here.
(Paraphrased from Stanley Zheng and Dr.Snoopy)
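For reference, a minimal check using the non-deprecated API (a sketch, not part of the original comments) would be:
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')  # replacement for tf.test.is_gpu_available()
print("Num GPUs Available:", len(gpus))
print(gpus)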
I am asking this question because I can successfully train a segmentation network on my laptop's RTX 2070 with 8 GB of VRAM, yet with exactly the same code and exactly the same software libraries installed on my desktop PC with a GTX 1080 Ti (11 GB of VRAM), training still throws out-of-memory errors.
Why does this happen, considering that:
The same Windows 10 + CUDA 10.1 + cuDNN 7.6.5.32 + Nvidia driver 418.96 (which comes along with CUDA 10.1) setup is installed on both the laptop and the PC.
Training with TensorFlow 2.3 runs smoothly on the GPU on my PC, yet it fails to allocate memory for training only with PyTorch.
PyTorch recognises the GPU (it prints GTX 1080 TI) via the command: print(torch.cuda.get_device_name(0))
PyTorch allocates memory when running this command: torch.rand(20000, 20000).cuda() # allocated 1.5 GB of VRAM
What is the solution to this?
Most people (even in the thread below) jump to suggest that decreasing the batch_size will solve this problem. In this case, it does not. For example, it would be illogical for a network to train on 8 GB of VRAM yet fail to train on 11 GB of VRAM, given that no other applications were consuming video memory on the 11 GB system and the exact same configuration was installed and used.
The reason this happened in my case was that, when using the DataLoader object, I had set a very high value (12) for the num_workers parameter. Decreasing this value to 4 in my case solved the problem.
In fact, although it sits at the bottom of the thread, the answer provided by Yurasyk at https://github.com/pytorch/pytorch/issues/16417#issuecomment-599137646 pointed me in the right direction.
Solution: decrease the number of workers in the PyTorch DataLoader. Although I do not exactly understand why this solution works, I assume it is related to the worker processes spawned behind the scenes for data fetching; on some machines such an error may appear.
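For illustration, a minimal sketch of the change (the dataset here is a dummy stand-in, not my actual data):
import torch
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # guard needed on Windows when num_workers > 0
    # Dummy dataset just to make the sketch self-contained; substitute your own Dataset.
    train_dataset = TensorDataset(torch.randn(100, 3, 64, 64),
                                  torch.zeros(100, dtype=torch.long))

    # num_workers was 12; lowering it to 4 resolved the out-of-memory error in my case.
    train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True, num_workers=4)

    for images, labels in train_loader:
        pass  # training step goes here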
When training deep neural networks, training runs faster when the GPU is used as the processing unit.
This is done by configuring cuDNN optimizations and selecting the processing unit via an environment variable, with the following line (Python 2.7 and Keras on Windows):
os.environ["THEANO_FLAGS"] = "floatX=float32,device=gpu,optimizer_including=cudnn,gpuarray.preallocate=0.8,dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic,dnn.include_path=e:/toolkits.win/cuda-8.0.61/include,dnn.library_path=e:/toolkits.win/cuda-8.0.61/lib/x64"
The output is then:
Using gpu device 0: TITAN Xp (CNMeM is disabled, cuDNN 5110)
The problem is that GPU memory is limited compared to RAM (12 GB vs. 128 GB), and training is only one phase of the whole flow. Therefore, I want to switch back to the CPU once training is complete.
I've tried the following line, but it has no effect:
os.environ["THEANO_FLAGS"] = "floatX=float32,device=cpu"
My questions are:
Is it possible to change from GPU to CPU and vice-versa during runtime? (technically)
If yes, how can I do it programmatically in Python? (2.7, Windows, and Keras with Theano backend).
Yes, this is possible, at least for the TensorFlow backend. You just have to import tensorflow as well and wrap your code in the corresponding with blocks:
with tf.device('/cpu:0'):
    pass  # your code here runs on the CPU
with tf.device('/gpu:0'):
    pass  # your code here runs on the GPU
I am unsure whether this also works for the Theano backend. However, switching from one backend to the other is just a matter of setting a flag beforehand, so it should not cause too much trouble.
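As a self-contained illustration of the pattern with the TensorFlow 2 API (only a sketch; the original question targets Python 2.7 and a Theano-backed Keras):
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back gracefully if no GPU is present

with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)  # runs on the CPU

with tf.device('/gpu:0'):
    c = tf.matmul(a, a)  # runs on the GPU if one is available

print(b.device, c.device)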