I was working on my notebook in Visual Studio Code, and training the model for the X-th time felt long.
So I decided I should try training it on the GPU instead of the CPU.
I followed this tutorial: https://www.youtube.com/watch?v=hHWkvEcDBO0
It is basically very similar to what you can find in the TensorFlow docs.
You have to:
Download and install the NVIDIA CUDA Toolkit (here 11.x)
Download cuDNN
Extract its contents into the same folder as the CUDA Toolkit
Add two entries to the PATH variable (see the example paths below).
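For reference, on a default install the two PATH entries typically look like this (assuming CUDA 11.2 in the standard location; adjust the version to your install):
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\libnvvp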
After this operation, I restarted my computer and was able to see my GPU when running:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
Sadly, every time a training or predict cell was run, I got this error:
Canceled future for execute_request message before replies were done
I searched the logs and discovered this:
warn 22:44:34.276: StdErr from Kernel Process 2022-10-03 22:44:34.276506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
error 22:44:34.637: Disposing session as kernel process died ExitCode: 3221226505, Reason: c:\Users\Variraptor\anaconda3\lib\site-packages\traitlets\traitlets.py:2196: FutureWarning: Supporting extra quotes around Unicode is deprecated in traitlets 5.0. Use 'hmac-sha256' instead of '"hmac-sha256"' – or use CUnicode.
warn(
c:\Users\Variraptor\anaconda3\lib\site-packages\traitlets\traitlets.py:2151: FutureWarning: Supporting extra quotes around Bytes is deprecated in traitlets 5.0. Use 'dd5c0f8d-3774-496a-930e-bf20e1603651' instead of 'b"dd5c0f8d-3774-496a-930e-bf20e1603651"'.
warn(
2022-10-03 22:44:25.003727: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-03 22:44:25.604543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3962 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
2022-10-03 22:44:34.276506: I tensorflow/stream_executor/cuda/cuda_dnn.cc:384] Loaded cuDNN version 8500
info 22:44:34.638: kill daemon
error 22:44:34.639: Raw kernel process exited code: 3221226505
error 22:44:34.640: Error in waiting for cell to complete [Error: Canceled future for execute_request message before replies were done
at t.KernelShellFutureHandler.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:32353)
at c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:51405
at Map.forEach (<anonymous>)
at y._clearKernelState (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:51390)
at y.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:44872)
at c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2320921
at t.swallowExceptions (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:7:118974)
at dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2320899)
at t.RawSession.dispose (c:\Users\Variraptor\.vscode\extensions\ms-toolsai.jupyter-2022.7.1102252217\out\extension.node.js:2:2325836)
at processTicksAndRejections (node:internal/process/task_queues:96:5)]
warn 22:44:34.640: Cell completed with errors {
message: 'Canceled future for execute_request message before replies were done'
}
info 22:44:34.642: Cancel all remaining cells true || Error || undefined
info 22:44:34.642: Cancel pending cells
info 22:44:34.642: Cell 8 executed with state Error
I tried to follow Nicola Manca's answer from another topic, but it doesn't seem to solve my problem.
Since these are my first steps with GPUs, and I'm not a native English speaker, I'm completely stuck.
Could you help me understand the error here?
Many thanks.
I also faced the same issue, but in my case the GPU installation was unsuccessful.
The problem was with the environment variables added during setup; once I deleted them, it worked for me.
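If you want to see which entries might be involved before deleting anything, here is a minimal sketch (plain Python, nothing specific to my setup) that prints every PATH entry mentioning CUDA:
import os
# print every PATH entry that mentions CUDA, to spot stale or duplicate entries
for entry in os.environ["PATH"].split(os.pathsep):
    if "cuda" in entry.lower():
        print(entry)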
I ran predictions using two models and then shut the models down.
When I try to restart, the process fails due to an out-of-memory problem.
This error occurred while checking the available GPUs with:
logger.debug(device_lib.list_local_devices())
File "/home/kgtmx/anaconda3/envs/ai-engine/lib/python3.6/site-packages/tensorflow/python/client/device_lib.py", line 43, in list_local_devices
_convert(s) for s in _pywrap_device_lib.list_devices(serialized_config)
RuntimeError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
I checked the GPU status with nvidia-smi, but there was no problem.
I tried looking for solutions to this problem, but none of them worked.
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
I also tried to limit memory usage, but it did not take effect.
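For reference, the programmatic route I mean by "limiting memory usage" looks like this (a sketch; the 2048 MB cap is an arbitrary example value, and it must run before anything touches the GPU):
import tensorflow as tf
# cap GPU:0 at a fixed amount of memory instead of letting TF grab it all
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)])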
tf.keras.backend.clear_session()
clear_session() does not work either.
from numba import cuda
cuda.current_context().trashing.clear()
I can't even use numba because of the same OOM problem.
I am currently using TensorFlow 2.5.0 on Python 3.6 (as the traceback path above shows).
Problem: I followed Microsoft's instructions for installing and running TensorFlow 2 in WSL with GPU acceleration using DirectML (here's the document).
After the installation, when I try to import tensorflow in Python I get the following output:
>>> import tensorflow
2022-11-22 15:52:33.090032: I tensorflow/core/platform/cpu_feature_guard.cc:193]
This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)
to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pietro/miniconda3/envs/testing/lib/python3.9/site-package
/tensorflow/__init__.py", line 440, in <module>
_ll.load_library(_plugin_dir)
File "/home/pietro/miniconda3/envs/testing/lib/python3.9/site-package
/tensorflow/python/framework/load_library.py", line 151, in load_library
py_tf.TF_LoadLibrary(lib)
tensorflow.python.framework.errors_impl.NotFoundError: /home/pietro
/miniconda3/envs/testing/lib/python3.9/site-packages/tensorflow-plugin
/libtfdml_plugin.so: undefined symbol:_ZN10tensorflow8internal15LogMessageFatalD1Ev, version tensorflow
I tried instead to follow the instructions for TensorFlow 1 and PyTorch (just in case something was wrong with my machine) and they both work perfectly, so I assume this issue only involves TensorFlow 2 somehow.
Did anyone encounter the same problem?
Thanks to everybody in advance :)
Pietro
Had the same problem, and downgrading TensorFlow from 2.11 fixed it. First remove the existing version:
pip uninstall tensorflow-cpu
Then re-install, this time with 2.10.0:
pip install tensorflow-cpu==2.10.0
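If the DirectML plugin itself was removed along the way, it can also be reinstalled (this is the plugin package named in Microsoft's document):
pip install tensorflow-directml-plugin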
After that, try importing it in Python. You should see something like the following (apologies for the messy output):
>>> import tensorflow as tf
2022-11-28 22:41:21.693757: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-11-28 22:41:21.806150: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2022-11-28 22:41:22.982148: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdirectml.d6f03b303ac3c4f2eeb8ca631688c9757b361310.so
2022-11-28 22:41:22.982289: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libdxcore.so
2022-11-28 22:41:22.996385: I tensorflow/c/logging.cc:34] Successfully opened dynamic library libd3d12.so
2022-11-28 22:41:27.615851: I tensorflow/c/logging.cc:34] DirectML device enumeration: found 1 compatible adapters.
You can test that it works by adding two tensors; run a command like the following:
print(tf.add([1.0, 2.0], [3.0, 4.0]))
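The sum itself should come back as an ordinary tensor:
tf.Tensor([4. 6.], shape=(2,), dtype=float32)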
And somewhere in the output, you should be able to verify that DirectML has found your GPU:
2022-11-28 22:43:42.632447: I tensorflow/c/logging.cc:34] DirectML: creating device on adapter 0 (NVIDIA GeForce RTX 3080)
Hope this helps!
I am trying to run a profiling script for PyTorch on MS WSL 2 with Ubuntu 20.04.
WSL is on the newest version (wsl --update). I am running the stable conda PyTorch CUDA 11.3 build from the PyTorch website, with PyTorch 1.11. My GPU is a GTX 1650 Ti.
I can run my script fine and it finishes without error, but when I try to profile it using PyTorch's bottleneck profiling tool (python -m torch.utils.bottleneck run.py),
it first throws this warning when starting the autograd profiler:
Running your script with the autograd profiler...
WARNING:2022-06-01 13:37:49 513:513 init.cpp:129] function status failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2022-06-01 13:37:49 513:513 init.cpp:130] CUPTI initialization failed - CUDA profiler activities will be missing
Then, if I run for a small number of epochs, the script finishes fine and also shows the CUDA profiling stats (even though it says profiler activities will be missing). But on a longer run, I get the message Killed after the script runs "through" the autograd profiler. The command dmesg gives this output at the end:
[ 1224.321233] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=python,pid=295,uid=1000
[ 1224.321421] Out of memory: Killed process 295 (python) total-vm:55369308kB, anon-rss:15107852kB, file-rss:0kB, shmem-rss:353072kB, UID:1000 pgtables:39908kB oom_score_adj:0
[ 1224.746786] oom_reaper: reaped process 295 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:353936kB
So, when using the profiler, there seems to be a memory error (which might not necessarily be related to the CUPTI warning above). Is this related to the profiler somehow saving too much data in memory? If so, it might be a common problem for runs that are too long, right?
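If the in-memory trace is the culprit, one workaround might be torch.profiler with a schedule, which keeps only short windows of trace data in memory and flushes them to disk. A sketch (train_step() is a hypothetical stand-in for one iteration of my training loop):
import torch
from torch.profiler import profile, schedule, ProfilerActivity, tensorboard_trace_handler
# profile only a few short windows and flush each one to ./log
with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        schedule=schedule(wait=1, warmup=1, active=3, repeat=2),
        on_trace_ready=tensorboard_trace_handler("./log")) as prof:
    for step in range(20):
        train_step()  # hypothetical: one iteration of the training loop
        prof.step()   # tell the profiler a step has finished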
The CUDA warning CUPTI_ERROR_NOT_INITIALIZED indicates that CUPTI (short for "CUDA Profiling Tools Interface") is not running. I read in another post that this might be because I'm running a newer version of CUPTI that is not backward-compatible with the older CUDA 11.3. Since CUPTI is not included in the conda cudatoolkit package by default, the system is probably trying to locate it but cannot find or use it.
I'd appreciate any help with this issue. It would be quite nice to get a longer profiling run working, in order to determine the bottlenecks / expensive operations in my PyTorch code.
Thanks!
I've been trying to make TensorFlow 2.8.0 work with my Windows GPU (GeForce GTX 1650 Ti), and even though it detects my GPU, any model that I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel (I've tried Jupyter Notebook and Spyder) hangs and restarts.
Based on TensorFlow's website, I've downloaded the corresponding cuDNN and CUDA versions, which I've verified (together with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8, but I've actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2), so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it just fails as described above. What is surprising is that even though it can't run code such as that in TensorFlow's CNN tutorial, the only time it ever works is when I run the chunk of code from this Stack Overflow question. That chunk of code looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code I came across for the past couple of hours, and the only time it does not get stuck at Epoch 1 is with the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)
Update (Solution)
It seems like the suggestions from this post helped: I copied the following files from the zipped cuDNN bin subfolder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin):
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
It seems I had initially misinterpreted the instruction to copy all cudnn*.dll files, and had copied only cudnn64_8.dll instead of every file listed above.
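For anyone who wants to script that step, a minimal sketch using the paths from above (adjust them to where you extracted the archive, and run with administrator rights since the destination is under Program Files):
import glob, shutil
src = r"cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin"   # extracted cuDNN archive
dst = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"
# copy every cudnn*.dll, not just cudnn64_8.dll
for dll in glob.glob(src + r"\cudnn*.dll"):
    shutil.copy(dll, dst)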
I am on Windows 10, using Python 3.9.6, and my cv2 version is 4.4.0. I built OpenCV with CUDA successfully, and cv2.cuda.getCudaEnabledDeviceCount() returns 1 as expected. The following lines also work fine.
net = cv2.dnn.readNetFromCaffe(proto_file, weights_file)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
# multiple lines
# processing frame
# and setting input blob
net.setInput(in_blob)
However, executing the following line throws an exception.
output = net.forward()
The exception:
cv2.error: OpenCV(4.4.0) G:\opencv-4.4.0\opencv-4.4.0\modules\dnn\src\dnn.cpp:2353: error: (-216:No CUDA support) OpenCV was not built to work with the selected device. Please check CUDA_ARCH_PTX or CUDA_ARCH_BIN in your build configuration. in function 'cv::dnn::dnn4_v20200609::Net::Impl::initCUDABackend'
The message says that OpenCV was not built to work with the selected device (which I'm guessing is my GPU).
It seems to be a conflict with CUDA_ARCH_BIN and/or CUDA_ARCH_PTX. My GPU model is an NVIDIA GeForce MX130, whose CUDA_ARCH_BIN value I found to be 6.1, and I set it accordingly in CMake.
How can I resolve these issues? Let me know if I need to provide any more information.
"Sources say" the MX130 has a Maxwell core, not a Pascal core. Maxwell is the predecessor of Pascal.
Hence, you only have CUDA compute capability 5.0.
You should verify that with an appropriate tool such as GPU-Z, which does its best to query the hardware directly instead of going by published specs.
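If it really is compute capability 5.0, rebuilding OpenCV with matching flags should resolve the error. A sketch of the relevant CMake options (the source path and any other options are assumed to match your existing setup):
cmake -D WITH_CUDA=ON -D CUDA_ARCH_BIN=5.0 -D CUDA_ARCH_PTX=5.0 <your other options> ../opencv-4.4.0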
Sources:
https://en.wikipedia.org/wiki/GeForce_10_series#GeForce_10_(10xx)_series_for_notebooks (notice how the Fab (nm) is different and the code name is GM108, not GPxxx)
https://www.techpowerup.com/gpu-specs/geforce-mx130.c3043