The following code runs the EleutherAI/gpt-neo-1.3B model. It runs on the CPU, and I don't understand why it does not use my GPU. Did I miss something?
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
prompt = "What is the capital of France?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=50)
gen_text = tokenizer.batch_decode(gen_tokens)[0]
print(gen_text)
By the way, here is the output of the nvidia-smi command:
Thu Feb 16 14:58:28 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.108.03 Driver Version: 510.108.03 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:73:00.0 On | N/A |
| 30% 31C P8 34W / 350W | 814MiB / 24576MiB | 22% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA RTX A5000 Off | 00000000:A6:00.0 Off | Off |
| 30% 31C P8 16W / 230W | 8MiB / 24564MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3484 G /usr/lib/xorg/Xorg 378MiB |
| 0 N/A N/A 3660 G /usr/bin/gnome-shell 62MiB |
| 0 N/A N/A 4364 G ...662097787256072160,131072 225MiB |
| 0 N/A N/A 37532 G ...6/usr/lib/firefox/firefox 142MiB |
| 1 N/A N/A 3484 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------+
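The nvidia-smi output above lists only graphical (Type G) processes, which is consistent with the model running on the CPU. In PyTorch, `from_pretrained` loads the weights into CPU memory, and computation only moves to the GPU when both the model and its input tensors are explicitly placed there with `.to(...)`. The sketch below is one way to do that, assuming PyTorch was installed with CUDA support; the helper name `generate_on_gpu` and its `device` parameter are hypothetical, and the generation arguments are copied from the code above.

```python
def generate_on_gpu(prompt, model_name="EleutherAI/gpt-neo-1.3B", device=None):
    """Hypothetical helper: run the generation above on the GPU when one is visible."""
    # Imported inside the function so this sketch can be loaded without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if device is None:
        # Fall back to the CPU when PyTorch cannot see any CUDA device.
        device = "cuda" if torch.cuda.is_available() else "cpu"

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # from_pretrained loads the weights on the CPU; .to(device) moves them to the GPU.
    model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

    # The input ids must live on the same device as the model.
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    gen_tokens = model.generate(input_ids, do_sample=True, temperature=0.9, max_length=50)
    return tokenizer.batch_decode(gen_tokens)[0]
```

Usage: `print(generate_on_gpu("What is the capital of France?"))`. While it runs, nvidia-smi should list a python process of Type C on the GPU in use. To pick the RTX A5000 explicitly, a device string such as `"cuda:1"` could be passed, bearing in mind that CUDA's device numbering is not guaranteed to match nvidia-smi's.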
I'm working on Ubuntu 18.04 with tensorflow==2.2.
I've installed CUDA 10.1. My GPU is detected, but the program seems to get stuck after "Created TensorFlow device", or at least takes 2-3 minutes to run.
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti On | 00000000:01:00.0 On | N/A |
| 32% 34C P0 1W / 38W | 320MiB / 2000MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1013 G /usr/lib/xorg/Xorg 24MiB |
| 0 N/A N/A 1130 G /usr/bin/gnome-shell 48MiB |
| 0 N/A N/A 1343 G /usr/lib/xorg/Xorg 178MiB |
| 0 N/A N/A 1489 G /usr/bin/gnome-shell 48MiB |
| 0 N/A N/A 2643 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 2734 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 2778 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 5438 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6178 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6691 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 7007 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+
Importing works fine:
import tensorflow as tf
tf.config.list_physical_devices("GPU")
Output:
Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
But this seems to get stuck:
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1425 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
...
I've installed tensorflow with pip3 install tensorflow==2.2 and also tried pip3 install tensorflow-gpu.
Any ideas?
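The two CUDA version numbers in this question measure different things: `nvcc --version` reports the installed CUDA toolkit (10.1), while "CUDA Version" in the nvidia-smi header is the newest CUDA version the driver supports (11.1). A toolkit works with the driver as long as it is not newer than that number. The sketch below checks this from the two outputs; the function names are hypothetical, and the regular expressions assume exactly the output formats shown above.

```python
import re


def parse_versions(nvcc_output, smi_output):
    """Return (toolkit_version, driver_cuda_version) as (major, minor) tuples."""
    # nvcc prints e.g. "Cuda compilation tools, release 10.1, V10.1.243".
    toolkit = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    # The nvidia-smi header prints e.g. "CUDA Version: 11.1".
    driver = re.search(r"CUDA Version: (\d+)\.(\d+)", smi_output)
    if toolkit is None or driver is None:
        raise ValueError("could not find a CUDA version in the given output")
    return (
        (int(toolkit.group(1)), int(toolkit.group(2))),
        (int(driver.group(1)), int(driver.group(2))),
    )


def toolkit_supported_by_driver(nvcc_output, smi_output):
    """True if the installed toolkit is not newer than what the driver supports."""
    toolkit, driver = parse_versions(nvcc_output, smi_output)
    # Tuples compare element by element, so (10, 1) <= (11, 1).
    return toolkit <= driver


nvcc_out = "Cuda compilation tools, release 10.1, V10.1.243"
smi_out = "| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |"
print(parse_versions(nvcc_out, smi_out))  # ((10, 1), (11, 1))
print(toolkit_supported_by_driver(nvcc_out, smi_out))  # True
```

On a real machine the two strings could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` and the equivalent call for `nvidia-smi`. This only rules out a toolkit/driver mismatch; it does not by itself explain the slow start.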
Please check that you have followed all of the steps below to install tensorflow-gpu on Ubuntu.
STEP 1: Nvidia Drivers
STEP 2: Installation of NVIDIA CUDA
STEP 3: Installation of Deep Neural Network library (cuDNN)
STEP 4: Finally, installation of TensorFlow with GPU support
pip install --upgrade tensorflow-gpu
STEP 5: Checking the installation
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
You can check this reference to complete all these steps, then run the code above again. Please let us know if the issue still persists.
I have been using this:
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
in order to run on the GPU. It had been working properly until today.
The problem now is that, partway through the run, my program stops using the GPU and switches to the CPU, so it becomes far too slow.
Any idea why that is happening?
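`CUDA_VISIBLE_DEVICES` is read when CUDA is initialised in the process, so it only takes effect if it is set before TensorFlow (or any other CUDA library) touches the GPU, which in practice means before the import. By default CUDA also orders devices fastest-first, which need not match nvidia-smi's PCI bus order; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` makes index "1" refer to the card nvidia-smi lists as GPU 1. A minimal sketch, assuming the Tesla K40c (GPU 1 in the output below) is the intended device:

```python
import os

# Both variables must be set before TensorFlow/PyTorch initialise CUDA,
# i.e. before the framework is imported.
# Number devices by PCI bus ID so indices match the nvidia-smi listing.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# Expose only physical GPU 1; inside this process it will appear as device 0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import tensorflow as tf  # import the framework only after the variables are set

print(os.environ["CUDA_DEVICE_ORDER"], os.environ["CUDA_VISIBLE_DEVICES"])
```

The same effect can be had from the shell with `CUDA_DEVICE_ORDER=PCI_BUS_ID CUDA_VISIBLE_DEVICES=1 python3 script.py`, where `script.py` stands in for the actual program.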
nvidia-smi output at the beginning of the execution:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 On | 00000000:01:00.0 On | N/A |
| 0% 42C P8 14W / 200W | 363MiB / 4039MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c On | 00000000:05:00.0 Off | 0 |
| 35% 74C P0 136W / 235W | 11011MiB / 11441MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1037 G /usr/lib/xorg/Xorg 20MiB |
| 0 1150 G /usr/bin/gnome-shell 12MiB |
| 0 7430 G /usr/lib/xorg/Xorg 166MiB |
| 0 7560 G /usr/bin/gnome-shell 158MiB |
| 1 13772 C python3 10998MiB |
+-----------------------------------------------------------------------------+
And then, when it begins to run too slowly:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 On | 00000000:01:00.0 On | N/A |
| 0% 42C P8 14W / 200W | 363MiB / 4039MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c On | 00000000:05:00.0 Off | 0 |
| 35% 69C P0 63W / 235W | 11011MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1037 G /usr/lib/xorg/Xorg 20MiB |
| 0 1150 G /usr/bin/gnome-shell 12MiB |
| 0 7430 G /usr/lib/xorg/Xorg 166MiB |
| 0 7560 G /usr/bin/gnome-shell 158MiB |
| 1 13772 C python3 10998MiB |
+-----------------------------------------------------------------------------+
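In the two snapshots above, the python3 process still holds 10998MiB on the Tesla K40c, but GPU-Util drops from 94% to 0% and power draw from 136W to 63W. To see exactly when that happens during a run, nvidia-smi can be polled in its CSV query mode. A sketch, assuming `nvidia-smi` is on the PATH; `index`, `utilization.gpu` and `memory.used` are standard `--query-gpu` fields, while `parse_gpu_csv` and `poll_gpus` are hypothetical helper names.

```python
import subprocess
import time

QUERY = "index,utilization.gpu,memory.used"


def _to_int(field):
    # Some GPUs report "[N/A]" or "[Not Supported]" instead of a number.
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        rows.append({"index": int(index), "util_percent": _to_int(util), "mem_mib": _to_int(mem)})
    return rows


def poll_gpus(interval_s=5.0, samples=12):
    """Print per-GPU utilisation every `interval_s` seconds, `samples` times."""
    for _ in range(samples):
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        for gpu in parse_gpu_csv(out):
            print(time.strftime("%H:%M:%S"), gpu)
        time.sleep(interval_s)


# Parsing example using the values from the two snapshots above (GPU 1 = Tesla K40c).
busy = parse_gpu_csv("0, 0, 363\n1, 94, 11011")
idle = parse_gpu_csv("0, 0, 363\n1, 0, 11011")
print(busy[1], idle[1])
```

nvidia-smi can also loop by itself with `-l SECONDS` (for example `nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv -l 5`); the Python version is only useful if the samples need to be logged alongside the training output.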
I have a notebook with a GPU. Output of nvidia-smi:
Thu Oct 18 20:49:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87 Driver Version: 390.87 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 540M Off | 00000000:01:00.0 N/A | N/A |
| N/A 44C P8 N/A / N/A | 12MiB / 964MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
It is running Keras code:
Using TensorFlow backend.
2018-10-18 20:26:08.963084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-18 20:26:08.963593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GT 540M major: 2 minor: 1 memoryClockRate(GHz): 1.344
pciBusID: 0000:01:00.0
totalMemory: 964.50MiB freeMemory: 917.75MiB
2018-10-18 20:26:08.963633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1455] Ignoring visible gpu device (device: 0, name: GeForce GT 540M, pci bus id: 0000:01:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5.
2018-10-18 20:26:08.963652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-18 20:26:08.963663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-18 20:26:08.963673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
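The decisive line in this log is the "Ignoring visible gpu device" message: the GeForce GT 540M has compute capability 2.1, while this TensorFlow build requires at least 3.5, so TensorFlow does not use the GPU and the model runs on the CPU. The sketch below extracts both numbers from such a line and compares them as (major, minor) tuples; `capability_check` is a hypothetical name, and the regular expression assumes exactly the message wording shown above.

```python
import re

# Matches the wording of TensorFlow's "Ignoring visible gpu device" log message.
PATTERN = re.compile(
    r"with Cuda compute capability (\d+)\.(\d+)\. "
    r"The minimum required Cuda capability is (\d+)\.(\d+)"
)


def capability_check(log_line):
    """Return (device_cc, required_cc, usable), or None if the line doesn't match."""
    match = PATTERN.search(log_line)
    if match is None:
        return None
    major, minor, req_major, req_minor = map(int, match.groups())
    device_cc, required_cc = (major, minor), (req_major, req_minor)
    # Tuples compare element by element, so (2, 1) < (3, 5).
    return device_cc, required_cc, device_cc >= required_cc


line = ("Ignoring visible gpu device (device: 0, name: GeForce GT 540M, pci bus id: "
        "0000:01:00.0, compute capability: 2.1) with Cuda compute capability 2.1. "
        "The minimum required Cuda capability is 3.5.")
print(capability_check(line))  # ((2, 1), (3, 5), False)
```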
When I train a VGG16 network on the GPU with TensorFlow, it always shows CUDA_ERROR_OUT_OF_MEMORY and always stops with the error tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
I searched the internet for those messages and found some tips:
set config.gpu_options.allow_growth to True.
set config.gpu_options.per_process_gpu_memory_fraction to a smaller fraction like 0.6.
set a smaller batch size.
But these tips don't help; the process runs as if nothing had changed.
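For reference, this is roughly how the two `gpu_options` tips are applied with the TensorFlow 1.x session API (TensorFlow 2.x keeps the same classes under `tf.compat.v1`). Both options only affect a session created with this config, so the config has to be passed to the session that actually runs the training graph. A sketch, assuming TensorFlow is installed; `make_session` is a hypothetical helper name and 0.6 is simply the fraction from the tip above.

```python
def make_session(memory_fraction=0.6, allow_growth=True):
    """Create a TF1-style session with the GPU memory options from the tips above."""
    import tensorflow as tf  # imported here so the sketch loads without TensorFlow

    # TF 2.x keeps the 1.x session API under tf.compat.v1; old 1.x releases expose it at the top level.
    v1 = tf.compat.v1 if hasattr(tf, "compat") and hasattr(tf.compat, "v1") else tf

    config = v1.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it all up front.
    config.gpu_options.allow_growth = allow_growth
    # Cap this process at a fraction of the card's total memory.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return v1.Session(config=config)
```

With both options set, TensorFlow starts with a small allocation and grows it as needed, up to the fraction cap.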
Here is my hardware:
GPU: NVIDIA GTX 1060
Memory: 3 GB + 4 GB (shared memory)
I monitored GPU usage with nvidia-smi; the details are below.
Before running:
Thu Apr 19 14:21:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 50C P8 7W / N/A | 587MiB / 3072MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 8244 C+G ...6)\Youdao\YoudaoNote\YNoteCefRender.exe N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After the process starts:
Thu Apr 19 14:24:23 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 48C P2 28W / N/A | 1133MiB / 3072MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 10 steps:
Thu Apr 19 14:30:40 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 64C P2 31W / N/A | 2595MiB / 3072MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 60 steps:
Some messages appeared, but the process kept running:
2018-04-19 14:33:56.384528: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.423080: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.474281: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
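Converting the requested sizes to GiB makes the pattern in these three messages easier to see: the first request is exactly 2 GiB, and each retry asks for about 90% of the previous amount. The messages also say "on host", which suggests these are allocations of host (system) memory made through the CUDA driver rather than of the 3 GB on the card. The byte counts below are copied from the log; the rest is plain arithmetic.

```python
import re

log = """\
failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
"""

GIB = 1024 ** 3  # bytes per GiB

# Pull the requested sizes out of the log lines.
sizes = [int(n) for n in re.findall(r"failed to alloc (\d+) bytes on host", log)]
# Size of each request in GiB.
gib = [round(n / GIB, 2) for n in sizes]
# Ratio of each retry to the request before it.
ratios = [round(b / a, 3) for a, b in zip(sizes, sizes[1:])]

print(gib)     # [2.0, 1.8, 1.62]
print(ratios)  # [0.9, 0.9]
```

The later message after 170 steps requests 4294967296 bytes, which is exactly 4 GiB.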
Thu Apr 19 14:36:13 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 63C P2 33W / N/A | 2602MiB / 3072MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 170 steps:
About eight hundred lines of messages appeared, and then the process stopped with errors.
The eight hundred lines all look like this:
2018-04-19 14:49:35.688274: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
It stopped with these errors:
Traceback (most recent call last):
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
return fn(*args)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
status, run_metadata)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\contextlib.py", line 88, in __exit__
next(self.gen)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "vgg16_train_and_test.py", line 212, in <module>
train()
File "vgg16_train_and_test.py", line 124, in train
coord.join(threads)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
six.reraise(*self._exc_info_to_raise)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 234, in _run
sess.run(enqueue_op)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
run_metadata_ptr)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
[[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]