Is there a way to run sailfish on a system without GPU?
Attempts so far: PyOpenCL works OK. However, none of the examples from sailfish can be run properly!
Error appears in sailfish backend_opencl.py:
...
devices = platform.get_devices(device_type=cl.device_type.GPU)
RuntimeError: clGetDeviceIDs failed device not found
This is because the target device type is hardcoded as GPU.
You could try to change their code to something like:
platform.get_devices(device_type=cl.device_type.ALL)
It will look up any device type: GPU, CPU, or accelerator.
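To sanity-check what PyOpenCL can actually see on a GPU-less machine, here is a minimal standalone sketch (independent of sailfish) that enumerates every platform and device:
import pyopencl as cl

# Print every OpenCL platform and device (GPU, CPU, accelerator) that PyOpenCL can see.
for platform in cl.get_platforms():
    for dev in platform.get_devices(device_type=cl.device_type.ALL):
        print(platform.name, "->", dev.name, "(", cl.device_type.to_string(dev.type), ")")
If a CPU device shows up here but sailfish still fails, the hardcoded device_type=GPU lookup above is the culprit.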
I try to run a Python script that trains several Neural Networks using TensorFlow and Keras. The problem is that I cannot restrict the number of cores used on the server, even though it works on my local desktop.
The basic structure is that I have defined a function run_net that runs the neural net. This function is called with different parameters in parallel using joblib (see below). Additionally, I have tried running the function iteratively with different parameters, which didn't solve the problem.
Parallel(n_jobs=1, backend="multiprocessing")(
    delayed(run_net)(params) for params in param_list  # param_list: placeholder for the different parameter sets
)
If I run that on my local Windows desktop, everything works fine. However, if I try to run the same script on our institute's server with 48 cores and check CPU usage with htop, all cores are used. I have already tried setting n_jobs in joblib's Parallel to 1, and CPU usage still goes to 100% once the TensorFlow models are being trained.
I already searched for different solutions and the main one that I found is the one below. I define that before running the parallel jobs shown above. I also tried placing the code below before every fit or predict method of the model.
import tensorflow as tf
K = tf.compat.v1.keras.backend  # assuming K refers to the (compat.v1) Keras backend

NUM_PARALLEL_EXEC_UNITS = 5
config = tf.compat.v1.ConfigProto(
    intra_op_parallelism_threads=NUM_PARALLEL_EXEC_UNITS,
    inter_op_parallelism_threads=2,
    device_count={"CPU": NUM_PARALLEL_EXEC_UNITS},
)
session = tf.compat.v1.Session(config=config)
K.set_session(session)
At this point, I am quite lost and have no idea how to make TensorFlow and/or Keras use a limited number of cores, as the server I am using is shared across the institute.
The server is running Linux. However, I don't know which exact distribution/version it is. I am very new to running code on a server.
These are the versions I am using:
python == 3.10.8
tensorflow == 2.10.0
keras == 2.10.0
If you need any other information, I am happy to provide that.
Edit 1
Neither the answer suggested in this thread nor using only these commands works:
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)
After trying some things, I have found a solution to my problem. With the following code, I can restrict the number of CPUs used:
os.environ["OMP_NUM_THREADS"] = "5"
tf.config.threading.set_intra_op_parallelism_threads(5)
tf.config.threading.set_inter_op_parallelism_threads(5)
Note that I have no idea how many CPUs will be used in the end. I noticed that it isn't five cores being used but more. As I don't really care about the exact number of cores, just that not all of them are used, I am fine with this solution for now. If anybody knows how to calculate the number of cores used from the information provided above, let me know.
I have access to a server with multiple GPUs that can be accessed simultaneously by many users.
I choose only one gpu_id from the terminal and have code like this:
device = "cuda:"+str(FLAGS.gpu_id) if torch.cuda.is_available() else "cpu"
where FLAGS holds the arguments parsed from the terminal.
Even though I select only one id, I saw that I am using two different GPUs. That causes issues when the other GPU's memory is almost full, and my process terminates with a "CUDA out of memory" error.
I want to understand what the possible causes for this are.
It is hard to tell what is wrong without knowing how you use the device parameter. In any case, you can try to achieve what you want with a different approach. Run your python script in the following way:
CUDA_VISIBLE_DEVICES=0 python3 my_code.py
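If it has to happen inside the script instead, a minimal sketch (assuming an argparse-style --gpu_id flag like in the question) is to restrict visibility before torch is imported:
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--gpu_id", type=int, default=0)
FLAGS, _ = parser.parse_known_args()

# Hide all other GPUs from this process; this must happen before CUDA is initialized,
# which is why torch is imported only afterwards.
os.environ["CUDA_VISIBLE_DEVICES"] = str(FLAGS.gpu_id)

import torch
# Inside the process the selected GPU is now always index 0.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")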
So when I run cuda.select_device(0) and then cuda.close(), PyTorch cannot access the GPU again. I know that there is a way for PyTorch to use the GPU again without having to restart the kernel, but I forgot how. Does anyone else know?
from numba import cuda as cu
import torch
# random tensor
a = torch.rand(100, 100)
# the tensor can be loaded onto the GPU
a.cuda()
device = cu.get_current_device()
device.reset()
# throws "RuntimeError: CUDA error: invalid argument"
a.cuda()
cu.close()
# throws "RuntimeError: CUDA error: invalid argument"
a.cuda()
torch.cuda.is_available()
# True
And then trying to run cuda-based pytorch code yields:
RuntimeError: CUDA error: invalid argument
Could you provide a more complete snippet? I am running
from numba import cuda
import torch
device = cuda.get_current_device()
device.reset()
cuda.close()
torch.cuda.is_available()
which prints True, so I am not sure what your issue is.
I had the same issue, but with TensorFlow and Keras, when iterating through a for loop to tune hyperparameters. It did not free up the GPU memory used by older models. The cuda solution did not work for me. The following did:
import gc
gc.collect()
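For context, a minimal sketch of where that call could sit in such a tuning loop (build_model and the hyperparameter values are hypothetical placeholders, not from the original code):
import gc
import tensorflow as tf

def build_model(units):
    # Hypothetical model factory used only for illustration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

for units in (32, 64, 128):   # hypothetical hyperparameter values
    model = build_model(units)
    # ... model.fit(...) and evaluation would go here ...
    del model                 # drop the Python reference to the old model
    gc.collect()              # force collection so its GPU memory can be released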
This has two possible culprits.
One is some driver issue, be it the Numba driver or the kernel driver managing the GPU. The reason for suspecting this is that Roger did not see the issue, and no such issue has been reported to the Numba repo.
The other possible issue is
cuda.select_device(0)
which is not needed. Is there any strong reason to use it explicitly?
Analysis:
Keep in mind the design of cuda.get_current_device() and cuda.close(): they depend on the context, not on the GPU. As per the documentation of get_current_device:
Get current device associated with the current thread
Check
gpus = cuda.list_devices()
before and after your code. If the GPUs listed are the same, then you need to create the context again. If creating the context again is a problem, please attach your complete code and a debug log if possible.
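A minimal sketch of that check (re-creating the context via cuda.current_context() is my assumption of one possible way, not something stated above):
from numba import cuda

print([(d.id, d.name) for d in cuda.list_devices()])   # devices visible before

# ... the reset()/close() calls from the question would go here ...

print([(d.id, d.name) for d in cuda.list_devices()])   # devices visible after

# If the same GPUs are still listed, a fresh context has to be created;
# one possible way is to ask Numba for it lazily:
ctx = cuda.current_context()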
We only have 4 GPU devices, and we have more than 4 users running CUDA programs, so before I run my program I want to check which device is not busy; otherwise the memory allocation will fail. But I haven't found a function to get this information. I know that when we want to use a device we call "cudaSetDevice()", so there must be an index for each device, and that "nvidia-smi" can show more detail, including which process is using which device and how much memory it uses. Who can help me?
The values for cudaSetDevice start at 0 and then increase monotonically for each additional device. Alternatively you can set the environment variable CUDA_VISIBLE_DEVICES to select which device to use. (see https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-control-gpu-visibility-cuda_visible_devices/).
To get information about what is using the device you need to use the driver API: http://docs.nvidia.com/cuda/cuda-driver-api/index.html
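As an illustration only (this is not from the answer above), a minimal sketch using the pynvml bindings, assuming that package is installed, to pick the device with the most free memory:
import pynvml

pynvml.nvmlInit()
best_idx, best_free = None, 0
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)   # the same data nvidia-smi reports
    if mem.free > best_free:
        best_idx, best_free = i, mem.free
pynvml.nvmlShutdown()

print("least busy device:", best_idx, "free MiB:", best_free // (1024 * 1024))
# The chosen index can then be passed to cudaSetDevice() or exported as CUDA_VISIBLE_DEVICES.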
I tried to change the device used in a Theano-based program.
from theano import config
config.device = "gpu1"
However, I got the error:
Exception: Can't change the value of this config parameter after initialization!
I wonder what the best way is to change the device from gpu to gpu1 in code.
Thanks
Another possibility which worked for me was setting the environment variable in the process, before importing theano:
import os
os.environ['THEANO_FLAGS'] = "device=gpu1"
import theano
There is no way to change this value in code running in the same process. The best you could do is to have a "parent" process that alters, for example, the THEANO_FLAGS environment variable and spawns children. However, the method of spawning will determine which environment the children operate in.
Note also that there is no way to do this in a way that maintains a process's memory through the change. You can't start running on CPU, do some work with values stored in memory then change to running on GPU and continue running using the values still in memory from the earlier (CPU) stage of work. The process must be shutdown and restarted for a change of device to be applied.
As soon as you import theano the device is fixed and cannot be changed within the process that did the import.
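A minimal sketch of that parent/child pattern (worker.py is a hypothetical child script that imports theano and does the actual work):
import os
import subprocess

for device in ("gpu0", "gpu1"):
    env = os.environ.copy()
    env["THEANO_FLAGS"] = "device=" + device
    # Each child reads the flag at import time, so each one can run on a different device.
    subprocess.run(["python", "worker.py"], env=env, check=True)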
Remove the "device" config in .theanorc, then in your code:
import theano.sandbox.cuda
theano.sandbox.cuda.use("gpu0")
It works for me.
https://groups.google.com/forum/#!msg/theano-users/woPgxXCEMB4/l654PPpd5joJ