Make GPU available again after numba.cuda.close()? - python

When I run cuda.select_device(0) and then cuda.close(), PyTorch cannot access the GPU again. I know there is a way to let PyTorch use the GPU again without restarting the kernel, but I forgot how. Does anyone know?
from numba import cuda as cu
import torch
# random tensor
a = torch.rand(100, 100)
# the tensor can be loaded onto the GPU
a.cuda()
device = cu.get_current_device()
device.reset()
# throws "RuntimeError: CUDA error: invalid argument"
a.cuda()
cu.close()
# throws "RuntimeError: CUDA error: invalid argument"
a.cuda()
torch.cuda.is_available()
# True
And then trying to run cuda-based pytorch code yields:
RuntimeError: CUDA error: invalid argument

Could you provide a more complete snippet? I am running
from numba import cuda
import torch
device = cuda.get_current_device()
device.reset()
cuda.close()
torch.cuda.is_available()
which prints True, so I am not sure what your issue is.

I had the same issue, but with TensorFlow and Keras, when iterating through a for loop to tune hyperparameters. It did not free up the GPU memory used by older models. The cuda solution did not work for me, but the following did:
import gc
gc.collect()
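The loop shape that answer describes might look like the sketch below. `DummyModel` is a hypothetical stand-in for a Keras model; the point is only where the `del` and `gc.collect()` sit in the tuning loop, not the framework:

```python
import gc

class DummyModel:
    # Hypothetical stand-in for a Keras model; only the loop
    # structure matters here, not the framework.
    def __init__(self, units):
        self.weights = [0.0] * units

sizes = []
for units in (32, 64, 128):
    model = DummyModel(units)
    sizes.append(len(model.weights))
    # Drop the only reference to the old model and force a
    # collection pass before the next iteration builds a new one.
    del model
    gc.collect()
```

With a real Keras model, the same pattern applies: delete the reference to the previous model before collecting, or the old graph and its GPU buffers stay reachable.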

There are two possible culprits here.
One is a driver issue, be it the numba driver or the kernel driver managing the GPU. The reason for suspecting this is that Roger did not see the issue, and no such issue has been reported on the numba repo.
The other possible issue is
cuda.select_device(0)
which is not needed. Is there any strong reason to use it explicitly?
Analysis:
Keep in mind the design of cuda.get_current_device()
and cuda.close()
They are tied to the context, not to the GPU. As per the documentation of get_current_device:
Get current device associated with the current thread
Do check
gpus = cuda.list_devices()
before and after your code. If the GPUs listed are the same, then you need to create the context again. If creating the context again is a problem, please attach your complete code and a debug log if possible.

Related

Stop printing warning message "Using CPU. Note: This module is much faster with a GPU."

I am running a Python program using the excellent EasyOCR module. It relies on PyTorch for image detection, and on each iteration it produces the warning: "Using CPU. Note: This module is much faster with a GPU."
What can I add to my code to stop this output without stopping other output? I don't have a GPU so that is not an option.
After looking into the source code, I noticed that verbose is set to True by default in the constructor.
After setting verbose=False the message stops appearing.
reader = easyocr.Reader(['en'], gpu=False, verbose=False)
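If a library printed such messages without exposing a verbose flag, a generic (if blunt) fallback is redirecting stdout around the call. This is a general Python technique, not something EasyOCR-specific; `noisy_call` below is a hypothetical stand-in for the library call:

```python
import contextlib
import io

def noisy_call():
    # Hypothetical stand-in for a library call that prints a warning.
    print("Using CPU. Note: This module is much faster with a GPU.")
    return "result"

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    value = noisy_call()
# `value` holds the return value; the warning went into `buffer`
# instead of the console.
```

The drawback is that it silences everything the call prints, so a dedicated flag like verbose=False is preferable when one exists.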
I think there is a command line parameter --gpu=false. Have you tried that?

What does "RuntimeError: CUDA error: device-side assert triggered" in PyTorch mean?

I have seen a lot of specific posts to particular case-specific problems, but no fundamental motivating explanation. What does this error:
RuntimeError: CUDA error: device-side assert triggered
mean? Specifically, what is the assert that is being triggered, why is the assert there, and how do we work backwards to debug the problem?
As-is, this error message is near useless for diagnosing any problem, because it only seems to say that "some code somewhere that touches the GPU" has a problem. The CUDA documentation does not seem helpful in this regard either, though I could be wrong.
https://docs.nvidia.com/cuda/cuda-gdb/index.html
When I shifted my code to work on CPU instead of GPU, I got the following error:
IndexError: index 128 is out of bounds for dimension 0 with size 128
So perhaps there is a mistake in the code which, for some strange reason, surfaces as a CUDA error.
When a device-side error is detected while CUDA device code is running, that error is reported via the usual CUDA runtime API error reporting mechanism. The usual detected error in device code would be something like an illegal address (e.g. attempt to dereference an invalid pointer) but another type is a device-side assert. This type of error is generated whenever a C/C++ assert() occurs in device code, and the assert condition is false.
Such an error occurs as a result of a specific kernel. Runtime error checking in CUDA is necessarily asynchronous, but there are probably at least 3 possible methods to start to debug this.
Modify the source code to effectively convert asynchronous kernel launches to synchronous kernel launches, and do rigorous error-checking after each kernel launch. This will identify the specific kernel that has caused the error. At that point it may be sufficient simply to look at the various asserts in that kernel code, but you could also use step 2 or 3 below.
Run your code with cuda-memcheck. This is a tool something like "valgrind for device code". When you run your code with cuda-memcheck, it will tend to run much more slowly, but the runtime error reporting will be enhanced. It is also usually preferable to compile your code with -lineinfo. In that scenario, when a device-side assert is triggered, cuda-memcheck will report the source code line number where the assert is, and also the assert itself and the condition that was false. You can see here for a walkthrough of using it (albeit with an illegal address error instead of assert(), but the process with assert() will be similar).
It should also be possible to use a debugger. If you use a debugger such as cuda-gdb (e.g. on Linux), the debugger will have back-trace reports that will indicate which line the assert was on when it was hit.
Both cuda-memcheck and the debugger can be used if the CUDA code is launched from a python script.
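For PyTorch specifically, the synchronous-launch approach from step 1 is usually achieved with the `CUDA_LAUNCH_BLOCKING` environment variable rather than by editing kernel code:

```python
import os

# Force synchronous kernel launches so the error is reported at the
# offending call site instead of at a later, unrelated CUDA call.
# This must be set before the first CUDA operation runs, ideally
# before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

With this set, the Python traceback points at the operation whose kernel actually triggered the assert, which makes the generic error message actionable.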
At this point you have discovered what the assert is and where in the source code it is. Why it is there cannot be answered generically. This will depend on the developer's intention, and if it is not commented or otherwise obvious, you will need some method to intuit that somehow. The question of "how to work backwards" is also a general debugging question, not specific to CUDA. You can use printf in CUDA kernel code, and also a debugger like cuda-gdb to assist with this (for example, set a breakpoint prior to the assert, and inspect machine state - e.g. variables - when the assert is about to be hit).
With newer GPUs, instead of cuda-memcheck you will probably want to use compute-sanitizer. It works in a similar fashion.
In my case, this error was caused because my loss function only accepts values in [0, 1], and I was passing other values.
Normalizing my loss function's input solved it:
saida_G -= saida_G.min(1, keepdim=True)[0]
saida_G /= saida_G.max(1, keepdim=True)[0]
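The same row-wise min-max rescaling, sketched in plain Python to make the arithmetic explicit (this assumes no row is constant, which would divide by zero):

```python
def minmax_rows(rows):
    # Subtract each row's minimum, then divide by the shifted row's
    # maximum, mapping every row onto [0, 1].
    out = []
    for row in rows:
        lo = min(row)
        shifted = [v - lo for v in row]
        hi = max(shifted)
        out.append([v / hi for v in shifted])
    return out

minmax_rows([[2.0, 4.0, 6.0]])  # [[0.0, 0.5, 1.0]]
```

The two torch lines above do exactly this along dimension 1, in place.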

How to get rid of theano.gof.compilelock?

I am using the joblib library to run multiple NNs on multiple CPUs at once. The idea is to make a final prediction as the average of all the different NN predictions. I use Keras with Theano as the backend.
My code works if I set n_jobs=1 but fails for anything greater than 1.
Here is the error message:
[Parallel(n_jobs=3)]: Using backend ThreadingBackend with 3 concurrent workers.
Using Theano backend.
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '6088' (I am process '6032')
WARNING (theano.gof.compilelock): Overriding existing lock by dead process '6088' (I am process '6032')
The code I use is rather simple (it works for n_jobs=1):
from joblib import Parallel, delayed
result = Parallel(n_jobs=1,verbose=1, backend="threading")(delayed(myNNfunction)(arguments,i,X_train,Y_train,X_test,Y_test) for i in range(network))
For information (I don't know if this is relevant), this is my parameters for keras:
os.environ['KERAS_BACKEND'] = 'theano'
os.environ["MKL_THREADING_LAYER"] = "GNU"
os.environ['MKL_NUM_THREADS'] = '3'
os.environ['GOTO_NUM_THREADS'] = '3'
os.environ['OMP_NUM_THREADS'] = '3'
I have tried to use the technique proposed here, but it didn't change a thing. To be precise, I created a file in C:\Users\myname.theanorc with this in it:
[global]
base_compiledir=/tmp/%(user)s/theano.NOBACKUP
I've read somewhere (I can't find the link, sorry) that on Windows machines I shouldn't call the file .theanorc.txt but only .theanorc; in any case, it doesn't work.
Would you know what I am missing?
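One workaround sometimes suggested for compile-lock contention (an assumption on my part, not verified against this exact setup, and only applicable when workers run as separate processes rather than threads) is to give each worker its own compilation directory before theano is imported, reusing the base_compiledir flag from the .theanorc above:

```python
import os

# Must run before `import theano`: each process gets a private
# compiledir keyed on its PID, so workers never contend for the
# same lock file. The /tmp path here is illustrative.
os.environ["THEANO_FLAGS"] = "base_compiledir=/tmp/theano-%d" % os.getpid()
```

Note that with backend="threading", as in the snippet above, all workers share one process (and one compiledir), so this only helps with a process-based backend.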

How can I change device used of theano

I tried to change the device used in a theano-based program.
from theano import config
config.device = "gpu1"
However, I got the error:
Exception: Can't change the value of this config parameter after initialization!
What is the best way to change the device from gpu to gpu1 in code?
Thanks
Another possibility which worked for me was setting the environment variable in the process, before importing theano:
import os
os.environ['THEANO_FLAGS'] = "device=gpu1"
import theano
There is no way to change this value in code running in the same process. The best you could do is to have a "parent" process that alters, for example, the THEANO_FLAGS environment variable and spawns children. However, the method of spawning will determine which environment the children operate in.
Note also that there is no way to do this in a way that maintains a process's memory through the change. You can't start running on CPU, do some work with values stored in memory then change to running on GPU and continue running using the values still in memory from the earlier (CPU) stage of work. The process must be shutdown and restarted for a change of device to be applied.
As soon as you import theano the device is fixed and cannot be changed within the process that did the import.
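A minimal sketch of that parent-process approach (`worker.py` is a hypothetical script that does `import theano` only after it starts):

```python
import os
import subprocess
import sys

def theano_env(device):
    # Copy the current environment and pin THEANO_FLAGS, so the
    # child process fixes its device before it ever imports theano.
    env = dict(os.environ)
    env["THEANO_FLAGS"] = "device=" + device
    return env

# One child per device; each sees only its own setting, e.g.:
# subprocess.Popen([sys.executable, "worker.py"], env=theano_env("gpu1"))
```

The key point is that the environment variable is set in the child's environment before its interpreter starts, which is the only moment the device choice can still be influenced.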
Remove the "device" config in .theanorc, then in your code:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")
It works for me.
https://groups.google.com/forum/#!msg/theano-users/woPgxXCEMB4/l654PPpd5joJ

Sailfish: how to run on computer without gpu

Is there a way to run Sailfish on a system without a GPU?
Attempts so far: PyOpenCL works OK. However, none of the Sailfish examples run properly.
The error appears in Sailfish's backend_opencl.py:
...
devices = platform.get_devices(device_type=cl.device_type.GPU)
RuntimeError: clGetDeviceIDs failed device not found
This is because the target device type is hardcoded as GPU.
You could try changing their code to something like:
platform.get_devices(device_type=cl.device_type.ALL)
It will look up any device: GPU, CPU, or accelerator.
