PyTorch CUDA experience

While using the lab server to train my model, I ran into out-of-memory (OOM) errors. Below I share some solutions and thoughts.

Assume the following scenario:

  • The default GPU is full, so even torch.tensor([1,2,3]).cuda() raises an OOM error.

You should try choosing another GPU.

CUDA_VISIBLE_DEVICES

Code

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'

Add this piece of code to your script file; when you execute it, only the corresponding GPUs will be used.
(Note: this will not work in a Jupyter Notebook.)
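The reason the environment variable is unreliable in a notebook is that it is only read once, when CUDA is first initialized; a long-running kernel has often initialized CUDA already. A minimal sketch of the safe ordering in a fresh script (the GPU indices 2,3 are just the example values from above):

```python
import os
import torch

# CUDA_VISIBLE_DEVICES is read when CUDA is first initialized.
# CUDA initialization is lazy, so setting the variable before the
# first CUDA call (ideally at the very top of the script) works;
# in a long-running Jupyter kernel it is usually already too late.
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'

# In a fresh process, CUDA has not been touched yet:
print(torch.cuda.is_initialized())  # False
```

If `torch.cuda.is_initialized()` already prints True, the assignment above has no effect and you need a fresh process (or kernel restart).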

OR

CUDA_VISIBLE_DEVICES=2 python test.py

Prepend CUDA_VISIBLE_DEVICES=2 when you execute your script file. The script will then run on the specified GPU.

Note

Even if you select GPU 2 or 3 this way, the output will still show tensor([1, 2, 3], device='cuda:0').

  • From @pjavia's answer on the PyTorch forum:
    @MrTuo This is how the pytorch 0.4.1 convention works. If you say CUDA_VISIBLE_DEVICES=2,3, then for pytorch GPU 2 is cuda:0 and GPU 3 is cuda:1. Just check whether your code is consistent with this convention.

I tested this on torch 1.0.1, and it is consistent with this answer.
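The renumbering can be illustrated without a GPU at all: the visible devices are simply re-indexed from 0 in the order they appear in the variable. A small sketch of that mapping:

```python
import os

# With only GPUs 2 and 3 visible, PyTorch re-indexes them from 0:
# physical GPU 2 becomes cuda:0, physical GPU 3 becomes cuda:1.
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'

visible = os.environ['CUDA_VISIBLE_DEVICES'].split(',')
mapping = {f'cuda:{i}': f'physical GPU {gpu}' for i, gpu in enumerate(visible)}
print(mapping)
# {'cuda:0': 'physical GPU 2', 'cuda:1': 'physical GPU 3'}
```

This is why the earlier output shows device='cuda:0' even though the tensor physically lives on GPU 2.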

Torch.cuda

torch.cuda.set_device(1)

torch.tensor([1,2,3]).cuda()
# output: tensor([1, 2, 3], device='cuda:1')

The first line is useful in Jupyter Notebook: once you set a certain GPU device, the following code will run on that GPU.

In effect, it sets the default GPU for the session.

Torch.device

device = torch.device('cuda:3')
# X = X.to(device)

Create a device handle for a certain GPU; then, while executing the code, transfer variables to that device (it can also be the CPU).
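This pattern makes the code device-agnostic. A minimal sketch, where the index 3 is just the example from above and assumes the machine exposes at least four GPUs; otherwise it falls back to the CPU:

```python
import torch

# Pick cuda:3 only if that many GPUs are actually visible,
# otherwise fall back to the CPU so the same code runs anywhere.
device = torch.device('cuda:3' if torch.cuda.device_count() > 3 else 'cpu')

X = torch.tensor([1, 2, 3])
X = X.to(device)  # .to() accepts both GPU and CPU devices
print(X.device)
```

The advantage over .cuda() is that the target device lives in one variable, so switching between CPU and any GPU requires changing a single line.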

Reference

  1. Set Default GPU in PyTorch
  2. Pytorch forum: CUDA_VISIBLE_DEVICE is of no use