Pytorch Multi-GPU Issue

Pytorch Multi-GPU Issue - python

I want to train my model with 2 GPU(id 5, 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 train.py. However, when I printed torch.cuda.current_device I still got the id 0 rather than 5,6. But torch.cuda.device_count is 2, which semms right. How can I use GPU5,6 correctly?

It is most likely correct. PyTorch only sees two GPUs (therefore indexed 0 and 1) which are actually your GPU 5 and 6.
Check the actual usage with nvidia-smi. If it is still inconsistent, you might need to set an environment variable:
export CUDA_DEVICE_ORDER=PCI_BUS_ID
(See Inconsistency of IDs between 'nvidia-smi -L' and cuDeviceGetName())

you can check the device name to verify whether that is the correct name of that GPU. However, I think when you set the Cuda_Visible outside, you have forced torch to look only at that 2 gpu. So torch will manually set index for them as 0 and 1. Because of this, when you check the current_device, it will output 0

Related

Getting the autograd counter of a tensor in PyTorch

I am using PyTorch for training a network. I was going through the autograd documentation and here it is mentioned that for each tensor there is a counter that the autograd implements to track the "version" of any tensor. How can I get this counter for any tensor in the graph?
Reason why I need it.
I have encountered the autograd error
[torch.cuda.FloatTensor [x, y, z]], which is output 0 of torch::autograd::CopySlices, is at version 7; expected version 6 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
This is not new to me and I have been successful in handling it before. This time around I am not able to see why the tensor would be at version 7 instead of being at 6. To answer this, I would want to know the version at any given point in the run.
Thanks.

It can be obtained through the command tensor_name._version.
As an example of how to use it, following MSE is provided.
import torch
a = torch.zeros(10, 5)
print(a._version) # prints 0
a[:, 1] = 1
print(a._version) # prints 1

stokesCavity example for FiPy returns False

I have tried to run the stokesCavity example, which uses lid-driven boundary conditions for the flow. At the end of the code, the values in the top-right cell are compared with some reference values.
>>> print(numerix.allclose(pressure.globalValue[..., -1], 162.790867927)) #doctest: +NOT_PYAMGX_SOLVER
1
>>> print(numerix.allclose(xVelocity.globalValue[..., -1], 0.265072740929)) #doctest: +NOT_PYAMGX_SOLVER
1
>>> print(numerix.allclose(yVelocity.globalValue[..., -1], -0.150290488304)) #doctest: +NOT_PYAMGX_SOLVER
1
When I tried running this example, my output was
False
False
False
The actual values I get for the top-right cell are 129.235, 0.278627 and -0.166620 (instead of 162.790867927, 0.265072740929 and -0.150290488304). Does anyone know why am I getting different values? I've tried changing the solver (used scipy, Trilinos and pysparse), but the results do not change up to the 12th digit. The velocity profile looks similar to the one shown in their manual, but I am still worried something is off.
I run it on Linux (python 2.7.14, fipy 3.2, pysparse 1.2.dev0, Trilinos 12.12, scipy 1.2.1) and on Windows (python 2.7.15, fipy 3.1.3, scipy 1.1.0).

When run as part of the test suite, this example only does 5 sweeps, and the numerical check is hard-wired for this. When you run the example in isolation, it does 300 sweeps and the solution is better (or at least differently) converged. There's nothing wrong with the example, other than it's not written in a very robust way. Thanks for asking about this; we'll try to clean up the example.

Python Kernel Died when using triple for loop containing requires_grad=True parameters (Pytorch)

I just want to store para_k in every position of COV matrix, where:
para_k=torch.tensor([2.0],requires_grad=True).
And because I want to update parameters by optimising a loss function later(omitted here), I have to add another for loop. Therefore, the whole code is as follows:
import torch
para_k=torch.tensor([2.0],requires_grad=True)
for t in range(5):
num_row=300
num_col=300
COV = torch.zeros((num_row,num_col))
for i in range(num_row):
for j in range(num_col):
COV[i,j]=para_k
print('i= %d ,j= %d' % (i,j) )
This simple computation cannot be finished by Spyder!!
Things can go well if I change nrow and ncol to be smaller than 100.
If I use number larger than 100, it will send error message (sometimes even one iteration cannot be finished) that Kernel died and restarting: Spyder Kernel Died
I even uninstall and install again my Anaconda and reinstall pytorch. I also tried to run Spyder as administrator, still got the error.
I check the event of my system, the error is: ntdll.dll regarding to pythonw.exe. I listed pythonw.exe as exception of firewall, but still cannot solve the problem.

How do I check if PyTorch is using the GPU?

How do I check if PyTorch is using the GPU? The nvidia-smi command can detect GPU activity, but I want to check it directly from inside a Python script.

These functions should help:
>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
1
>>> torch.cuda.current_device()
0
>>> torch.cuda.device(0)
<torch.cuda.device at 0x7efce0b03be0>
>>> torch.cuda.get_device_name(0)
'GeForce GTX 950M'
This tells us:
CUDA is available and can be used by one device.
Device 0 refers to the GPU GeForce GTX 950M, and it is currently chosen by PyTorch.

As it hasn't been proposed here, I'm adding a method using torch.device, as this is quite handy, also when initializing tensors on the correct device.
# setting device on GPU if available, else CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)
print()
#Additional Info when using cuda
if device.type == 'cuda':
print(torch.cuda.get_device_name(0))
print('Memory Usage:')
print('Allocated:', round(torch.cuda.memory_allocated(0)/1024**3,1), 'GB')
print('Cached: ', round(torch.cuda.memory_reserved(0)/1024**3,1), 'GB')
Edit: torch.cuda.memory_cached has been renamed to torch.cuda.memory_reserved. So use memory_cached for older versions.
Output:
Using device: cuda
Tesla K80
Memory Usage:
Allocated: 0.3 GB
Cached: 0.6 GB
As mentioned above, using device it is possible to:
To move tensors to the respective device:
torch.rand(10).to(device)
To create a tensor directly on the device:
torch.rand(10, device=device)
Which makes switching between CPU and GPU comfortable without changing the actual code.
Edit:
As there has been some questions and confusion about the cached and allocated memory I'm adding some additional information about it:
torch.cuda.max_memory_cached(device=None) Returns the maximum GPU memory managed by the caching allocator in bytes for a
given device.
torch.cuda.memory_allocated(device=None) Returns the current GPU memory usage by tensors in bytes for a given device.
You can either directly hand over a device as specified further above in the post or you can leave it None and it will use the current_device().
Additional note: Old graphic cards with Cuda compute capability 3.0 or lower may be visible but cannot be used by Pytorch! Thanks to hekimgil for pointing this out! - "Found GPU0 GeForce GT 750M which is of cuda capability 3.0. PyTorch no longer supports this GPU because it is too old. The minimum cuda capability that we support is 3.5."

After you start running the training loop, if you want to manually watch it from the terminal whether your program is utilizing the GPU resources and to what extent, then you can simply use watch as in:
$ watch -n 2 nvidia-smi
This will continuously update the usage stats for every 2 seconds until you press ctrl+c
If you need more control on more GPU stats you might need, you can use more sophisticated version of nvidia-smi with --query-gpu=.... Below is a simple illustration of this:
$ watch -n 3 nvidia-smi --query-gpu=index,gpu_name,memory.total,memory.used,memory.free,temperature.gpu,pstate,utilization.gpu,utilization.memory --format=csv
which would output the stats something like:
Note: There should not be any space between the comma separated query names in --query-gpu=.... Else those values will be ignored and no stats are returned.
Also, you can check whether your installation of PyTorch detects your CUDA installation correctly by doing:
In [13]: import torch
In [14]: torch.cuda.is_available()
Out[14]: True
True status means that PyTorch is configured correctly and is using the GPU although you have to move/place the tensors with necessary statements in your code.
If you want to do this inside Python code, then look into this module:
https://github.com/jonsafari/nvidia-ml-py or in pypi here: https://pypi.python.org/pypi/nvidia-ml-py/

From practical standpoint just one minor digression:
import torch
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
This dev now knows if cuda or cpu.
And there is a difference in how you deal with models and with tensors when moving to cuda. It is a bit strange at first.
import torch
import torch.nn as nn
dev = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
t1 = torch.randn(1,2)
t2 = torch.randn(1,2).to(dev)
print(t1) # tensor([[-0.2678, 1.9252]])
print(t2) # tensor([[ 0.5117, -3.6247]], device='cuda:0')
t1.to(dev)
print(t1) # tensor([[-0.2678, 1.9252]])
print(t1.is_cuda) # False
t1 = t1.to(dev)
print(t1) # tensor([[-0.2678, 1.9252]], device='cuda:0')
print(t1.is_cuda) # True
class M(nn.Module):
def __init__(self):
super().__init__()
self.l1 = nn.Linear(1,2)
def forward(self, x):
x = self.l1(x)
return x
model = M() # not on cuda
model.to(dev) # is on cuda (all parameters)
print(next(model.parameters()).is_cuda) # True
This all is tricky and understanding it once, helps you to deal fast with less debugging.

From the official site's get started page, you can check if the GPU is available for PyTorch like so:
import torch
torch.cuda.is_available()
Reference: PyTorch | Get Started

Query
Command
Does PyTorch see any GPUs?
torch.cuda.is_available()
Are tensors stored on GPU by default?
torch.rand(10).device
Set default tensor type to CUDA:
torch.set_default_tensor_type(torch.cuda.FloatTensor)
Is this tensor a GPU tensor?
my_tensor.is_cuda
Is this model stored on the GPU?
all(p.is_cuda for p in my_model.parameters())

To check if there is a GPU available:
torch.cuda.is_available()
If the above function returns False,
you either have no GPU,
or the Nvidia drivers have not been installed so the OS does not see the GPU,
or the GPU is being hidden by the environmental variable CUDA_VISIBLE_DEVICES. When the value of CUDA_VISIBLE_DEVICES is -1, then all your devices are being hidden. You can check that value in code with this line: os.environ['CUDA_VISIBLE_DEVICES']
If the above function returns True that does not necessarily mean that you are using the GPU. In Pytorch you can allocate tensors to devices when you create them. By default, tensors get allocated to the cpu. To check where your tensor is allocated do:
# assuming that 'a' is a tensor created somewhere else
a.device # returns the device where the tensor is allocated
Note that you cannot operate on tensors allocated in different devices. To see how to allocate a tensor to the GPU, see here: https://pytorch.org/docs/stable/notes/cuda.html

Almost all answers here reference torch.cuda.is_available(). However, that's only one part of the coin. It tells you whether the GPU (actually CUDA) is available, not whether it's actually being used. In a typical setup, you would set your device with something like this:
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
but in larger environments (e.g. research) it is also common to give the user more options, so based on input they can disable CUDA, specify CUDA IDs, and so on. In such case, whether or not the GPU is used is not only based on whether it is available or not. After the device has been set to a torch device, you can get its type property to verify whether it's CUDA or not.
if device.type == 'cuda':
# do something

Simply from command prompt or Linux environment run the following command.
python -c 'import torch; print(torch.cuda.is_available())'
The above should print True
python -c 'import torch; print(torch.rand(2,3).cuda())'
This one should print the following:
tensor([[0.7997, 0.6170, 0.7042], [0.4174, 0.1494, 0.0516]], device='cuda:0')

If you are here because your pytorch always gives False for torch.cuda.is_available() that's probably because you installed your pytorch version without GPU support. (Eg: you coded up in laptop then testing on server).
The solution is to uninstall and install pytorch again with the right command from pytorch downloads page. Also refer this pytorch issue.

It is possible for
torch.cuda.is_available()
to return True but to get the following error when running
>>> torch.rand(10).to(device)
as suggested by MBT:
RuntimeError: CUDA error: no kernel image is available for execution on the device
This link explains that
... torch.cuda.is_available only checks whether your driver is compatible with the version of cuda we used in the binary. So it means that CUDA 10.1 is compatible with your driver. But when you do computation with CUDA, it couldn't find the code for your arch.

If you are using Linux I suggest to install nvtop
https://github.com/Syllo/nvtop
You will get something like this:

For a MacBook M1 system:
import torch
print(torch.backends.mps.is_available(), torch.backends.mps.is_built())
And both should be True.

Create a tensor on the GPU as follows:
$ python
>>> import torch
>>> print(torch.rand(3,3).cuda())
Do not quit, open another terminal and check if the python process is using the GPU using:
$ nvidia-smi

Using the code below
import torch
torch.cuda.is_available()
will only display whether the GPU is present and detected by pytorch or not.
But in the "task manager-> performance" the GPU utilization will be very few percent.
Which means you are actually running using CPU.
To solve the above issue check and change:
Graphics setting --> Turn on Hardware accelerated GPU settings, restart.
Open NVIDIA control panel --> Desktop --> Display GPU in the notification area
[Note: If you have newly installed windows then you also have to agree the terms and conditions in NVIDIA control panel]
This should work!

step 1: import torch library
import torch
#step 2: create tensor
tensor = torch.tensor([5, 6])
#step 3: find the device type
#output 1: in the below, the output we can get the size(tensor.shape), dimension(tensor.ndim),
#and device on which the tensor is processed
tensor, tensor.device, tensor.ndim, tensor.shape
(tensor([5, 6]), device(type='cpu'), 1, torch.Size([2]))
#or
#output 2: in the below, the output we can get the only device type
tensor.device
device(type='cpu')
#As my system using cpu processor "11th Gen Intel(R) Core(TM) i5-1135G7 # 2.40GHz 2.42 GHz"
#find, if the tensor processed GPU?
print(tensor, torch.cuda.is_available()
# the output will be
tensor([5, 6]) False
#above output is false, hence it is not on gpu
#happy coding :)

TensorFlow Distributed Runtime Model Parallel CIFAR-10

I have tried to modify the CIFAR-10 example to run on the new TensorFlow distributed runtime. However, I get the following error when trying to run the program:
InvalidArgumentError: Cannot assign a device to node 'softmax_linear/biases/ExponentialMovingAverage':
Could not satisfy explicit device specification '/job:local/task:0/device:CPU:0'
I start the cluster using the following commands. On the first node I run:
bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|10.31.101.101:7777;10.31.101.224:7778' --job_name=local --task_id=0
...and on the second node I run:
bazel-bin/tensorflow/core/distributed_runtime/rpc/grpc_tensorflow_server --cluster_spec='local|10.31.101.101:7777;10.31.101.224:7778' --job_name=local --task_id=1
For the CIFAR-10 multi-GPU code, I make the simple modifications, replacing two lines in the train() function. The following line:
with tf.Graph().as_default(), tf.device('/cpu:0'):
...is replaced with:
with tf.Graph().as_default(), tf.device('/job:local/task:0/cpu:0'):
and the following line:
with tf.device('/gpu:%d' % i):
...is replaced with:
with tf.device('/job:local/task:0/gpu:%d' % i):
In my understanding, the second substitution should take care of the model substitution. Running a simpler example, like the code below, works fine:
with tf.device('/job:local/task:0/cpu:0'):
c = tf.constant("Hello, distributed TensorFlow!")
sess.run(c)
print(c)

I can't tell from your program, but my guess is that you also have to modify the line that creates the session to specify the address of one of your worker tasks. For example, given your configuration above, you might write:
sess = tf.Session(
"grpc://10.31.101.101:7777",
config=tf.ConfigProto(
allow_soft_placement=True,
log_device_placement=FLAGS.log_device_placement))
As it happens, we've been trying to improve that error message to make it less confusing. If you update to the latest version in GitHub and run the same code, you should see an error message that explains why the device specification could not be satisfied.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pytorch Multi-GPU Issue - python

I want to train my model with 2 GPU(id 5, 6), so I run my code with CUDA_VISIBLE_DEVICES=5,6 train.py. However, when I printed torch.cuda.current_device I still got the id 0 rather than 5,6. But torch.cuda.device_count is 2, which semms right. How can I use GPU5,6 correctly?

Related

Getting the autograd counter of a tensor in PyTorch

stokesCavity example for FiPy returns False

Python Kernel Died when using triple for loop containing requires_grad=True parameters (Pytorch)

How do I check if PyTorch is using the GPU?

TensorFlow Distributed Runtime Model Parallel CIFAR-10

Categories

Resources