Has anyone found a way to get stable access to a GPU runtime?
At the moment I follow this process:
Runtime -> Change runtime type -> "Python 2" and "GPU" -> Save -> Runtime -> Connect to runtime...
And check if GPU is enabled:
import tensorflow as tf
tf.test.gpu_device_name()
However, I get '' (an empty string), though about 1 time in 30 I was able to connect. Does anyone have any idea what is going on?
The way to authoritatively know what kind of runtime you're connected to is to hover over the CONNECTED button on the top-right; if the hover tooltip is suffixed with "(GPU)" then you've got a GPU.
You can test the health of the GPU hardware by inspecting the output of !/opt/bin/nvidia-smi (which, incidentally, is only present on a GPU runtime).
Tensorflow not being able to see the GPU while nvidia-smi can is usually a symptom of having done something like:
!pip install -U tensorflow
which gets you a TF build that doesn't know how to talk to the GPU. All Colaboratory runtimes already have TF preinstalled, so you should not need to reinstall it. If you need a particular feature of TF that is not available in the preinstalled version, you can get a build that knows how to talk to the GPU with !pip install -U tensorflow-gpu. Note, however, that the preinstalled TF build is better optimized for the particular CPU platform used, so you'll be giving up some performance, as well as using a lot more RAM.
If you only ended up with a reinstalled TF build because you !pip install -U'd something else that depends on tensorflow, you can avoid this by specifying --upgrade-strategy=only-if-needed, which should leave the preinstalled TF in place.
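For example (the package name here is just a placeholder):
!pip install -U --upgrade-strategy=only-if-needed some-package-that-depends-on-tensorflow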
If you've messed up your runtime and want to wipe the slate clean, execute
!kill -9 -1 and wait 15-30s to reconnect.
Related
I am trying to run a profiling script for pytorch on MS WSL 2.0 Ubuntu 20.04.
WSL is on the newest version (wsl --update). I am running the stable conda build of pytorch 1.11 with CUDA 11.3 from the pytorch website. My GPU is a GTX 1650 Ti.
I can run my script fine and it finishes without error, but when I try to profile it using pytorch's bottleneck profiling tool, python -m torch.utils.bottleneck run.py,
it first throws this warning when starting the autograd profiler:
Running your script with the autograd profiler...
WARNING:2022-06-01 13:37:49 513:513 init.cpp:129] function status failed with error CUPTI_ERROR_NOT_INITIALIZED (15)
WARNING:2022-06-01 13:37:49 513:513 init.cpp:130] CUPTI initialization failed - CUDA profiler activities will be missing
Then, if I run for a small number of epochs, the script finishes fine, and it also shows the CUDA profiling stats (even though it says profiler activities will be missing). But when I do a longer run, I get the message Killed after the script runs "through" the autograd profiler. The command dmesg gives this output at the end:
[ 1224.321233] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=python,pid=295,uid=1000
[ 1224.321421] Out of memory: Killed process 295 (python) total-vm:55369308kB, anon-rss:15107852kB, file-rss:0kB, shmem-rss:353072kB, UID:1000 pgtables:39908kB oom_score_adj:0
[ 1224.746786] oom_reaper: reaped process 295 (python), now anon-rss:0kB, file-rss:0kB, shmem-rss:353936kB
So, when using the profiler, there seems to be a memory error (which might not necessarily be related to the above CUPTI warning). Is this related to the profiler somehow saving too much data in memory? If so, it might be a general problem that occurs for runs that are too long, right?
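(If that is the cause, I suppose profiling only a bounded window of steps, e.g. with torch.profiler's schedule API instead of the bottleneck tool, might avoid accumulating data over the whole run. A rough sketch; train_one_step and the step counts are placeholders:)
from torch.profiler import profile, schedule, ProfilerActivity, tensorboard_trace_handler

# Profile only a few active steps per cycle instead of the entire run.
with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
    on_trace_ready=tensorboard_trace_handler("./profiler_logs"),
) as prof:
    for step in range(10):
        train_one_step()  # placeholder for one training iteration
        prof.step()       # tell the profiler a step has finished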
The CUDA warning CUPTI_ERROR_NOT_INITIALIZED indicates that CUPTI (short for "CUDA Profiling Tools Interface") is not running. I read in another post that this might be because I am running a newer version of CUPTI that is not backward-compatible with the older CUDA 11.3. Since CUPTI is not included in the conda cudatoolkit by default, the system is probably trying to locate CUPTI but cannot find or use it.
I'd appreciate any help with this issue. It would be quite nice to profile a longer run in order to determine the bottlenecks / expensive operations in my pytorch code.
Thanks!
I have installed TensorFlow on an M1 (ARM) Mac according to these instructions. Everything works fine.
However, model training is happening on the CPU. How do I switch training to the GPU?
In: tensorflow.config.list_physical_devices()
Out: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
In the documentation of Apple's TensorFlow distribution I found the following slightly confusing paragraph:
It is not necessary to make any changes to your existing TensorFlow scripts to use ML Compute as a backend for TensorFlow and TensorFlow Addons. There is an optional mlcompute.set_mlc_device(device_name='any') API for ML Compute device selection. The default value for device_name is 'any', which means ML Compute will select the best available device on your system, including multiple GPUs on multi-GPU configurations. Other available options are CPU and GPU. Please note that in eager mode, ML Compute will use the CPU. For example, to choose the CPU device, you may do the following:
# Import mlcompute module to use the optional set_mlc_device API for device selection with ML Compute.
from tensorflow.python.compiler.mlcompute import mlcompute
# Select CPU device.
mlcompute.set_mlc_device(device_name='cpu') # Available options are 'cpu', 'gpu', and 'any'.
So I try to run:
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
and get:
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
At this point I am stuck. How can I train keras models on the GPU of my MacBook Air?
TensorFlow version: 2.4.0-rc0
Update
The tensorflow_macos tf 2.4 repository has been archived by the owner. For tf 2.5, see here.
It's probably better not to disable eager execution entirely, but only for tf.functions. Try this and check your GPU usage; the warning message can be misleading.
import tensorflow as tf
tf.config.run_functions_eagerly(False)
The current release of Mac-optimized TensorFlow (TensorFlow 2.4rc0) has several issues that are not yet fixed. Eager mode is the default behavior in TensorFlow 2.x, and that is unchanged in TensorFlow-MacOS. But unlike the official build, this optimized version forcibly uses the CPU in eager mode, as they state here:
... in eager mode, ML Compute will use the CPU.
That's why, even if we explicitly set device_name='gpu', it switches back to the CPU while eager mode is still on.
from tensorflow.python.compiler.mlcompute import mlcompute
mlcompute.set_mlc_device(device_name='gpu')
WARNING:tensorflow: Eager mode uses the CPU. Switching to the CPU.
Disabling eager mode may let the program utilize the GPU, but it's not general behavior and can lead to puzzling performance on both CPU and GPU. For now, the most appropriate approach is probably to choose device_name='any'; with that, ML Compute will query the available devices on the system and select the best device(s) for training the network.
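In code, that is the same call as above, just with 'any':
from tensorflow.python.compiler.mlcompute import mlcompute
# Let ML Compute pick the best available device(s) instead of forcing 'gpu'.
mlcompute.set_mlc_device(device_name='any')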
Try turning off eager execution via the following:
import tensorflow as tf
tf.compat.v1.disable_eager_execution()
Let me know if it works.
I just installed the tensorboard profiler with
pip install -U tensorboard_plugin_profile
The version is 2.3.
Tensorflow-Version 2.3
Tensorboard-Version 2.3
cudatoolkit-Version 10.1.243
When I now try to open the Profile tab in TensorBoard, I see the profiler window as usual, but it is empty, with the error message:
DEM6561: Failed to load libcupti (is it installed and accessible?)
And the warning:
No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.
I think it has something to do with the environment paths and variables, but I don't know how they work with the virtual environments of Anaconda. (I don't have a CUDA folder I can link to.)
Has anyone had the same problem, or any ideas what I can try?
Thanks in advance!
First, make sure that CUPTI has been added to Path (via Environment Variables if you're using Windows), adding a path that should look like this:
%CUDA_PATH%\extras\CUPTI\lib64
Second, check if TensorFlow is looking for the correct CUPTI dll. I've encountered this exact same issue, and as I checked, it appears that TF 2.4 is looking for cupti64_110.dll instead of cupti64_2020.1.1.dll. It is currently a known issue and will be addressed in TF 2.5. I'm not sure if that's also the case with TF 2.3.
I basically resolved the issue by copying the dll in the same directory and renaming it (roughly as in the sketch below). Let me know if this helps!
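A minimal sketch of that copy-and-rename in Python; the CUDA_PATH layout and dll names are taken from above and may differ on your install:
import os
import shutil

# Assumes the standard %CUDA_PATH%\extras\CUPTI\lib64 layout mentioned above.
cupti_dir = os.path.join(os.environ["CUDA_PATH"], "extras", "CUPTI", "lib64")
src = os.path.join(cupti_dir, "cupti64_2020.1.1.dll")  # the dll actually shipped
dst = os.path.join(cupti_dir, "cupti64_110.dll")       # the name TF 2.4 looks for
if not os.path.exists(dst):
    shutil.copyfile(src, dst)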
I run Keras code (TensorFlow backend) on a GPU. I simply run it without setting anything; the code runs on the GPU automatically, and I can see the GPU usage. When I run my normal Python code and check my GPU usage, it turns out to be 0%.
My questions are:
(1) How do I make the CPU send the data to the GPU and always let the GPU compute it?
(2) I heard that the default numpy array data type is "float". Does this have something to do with the GPU?
In the context of deep neural networks training, the training works faster when it uses the GPU as the processing unit.
This is done by configuring cuDNN optimizations and changing the processing unit in the environment variables with the following line (Python 2.7 and Keras on Windows):
os.environ["THEANO_FLAGS"] = "floatX=float32,device=gpu,optimizer_including=cudnn,gpuarray.preallocate=0.8,dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic,dnn.include_path=e:/toolkits.win/cuda-8.0.61/include,dnn.library_path=e:/toolkits.win/cuda-8.0.61/lib/x64"
The output is then:
Using gpu device 0: TITAN Xp (CNMeM is disabled, cuDNN 5110)
The problem is that GPU memory is limited compared to RAM (12GB and 128GB respectively), and training is only one phase of the whole flow. Therefore I want to change back to the CPU once training is completed.
I've tried the following line, but it has no effect:
os.environ["THEANO_FLAGS"] = "floatX=float32,device=cpu"
My questions are:
Is it possible to change from GPU to CPU and vice-versa during runtime? (technically)
If yes, how can I do it programmatically in Python? (2.7, Windows, and Keras with Theano backend).
Yes, this is possible, at least for the tensorflow backend. You just have to also import tensorflow and put your code inside the following with blocks:
import tensorflow as tf

with tf.device('/cpu:0'):
    # your code -- ops defined here are pinned to the CPU
    ...

with tf.device('/gpu:0'):
    # your code -- ops defined here are pinned to the GPU
    ...
I am unsure if this also works for the Theano backend. However, switching from one backend to the other is just a matter of setting a flag beforehand, so it should not cause too much trouble.
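For the record, with multi-backend Keras that flag is the KERAS_BACKEND environment variable (or the keras.json config file), which must be set before Keras is imported; a minimal sketch:
import os

# Must be set before the first `import keras`; "theano" works the same way.
os.environ["KERAS_BACKEND"] = "tensorflow"

import keras  # Keras now uses the selected backend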