Tensorflow is unable to run code on too many GPUs? - python

I have the following test code:
import tensorflow as tf
import numpy as np
def body(x):
    a = tf.random_uniform(shape=[2, 2], dtype=tf.int32, maxval=100)
    b = tf.constant(np.array([[1, 2], [3, 4]]), dtype=tf.int32)
    c = a + b
    return tf.nn.relu(x + c)
def condition(x):
    return tf.reduce_sum(x) < 100
x = tf.Variable(tf.constant(0, shape=[2, 2]))
with tf.Session():
    tf.initialize_all_variables().run()
    result = tf.while_loop(condition, body, [x])
    print(result.eval())
When I run it on my GPU cluster, I get the following output before the process is killed:
2018-03-30 18:10:33.473913: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10415 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:3d:00.0, compute capability: 6.1)
2018-03-30 18:10:33.591203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10415 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:3e:00.0, compute capability: 6.1)
2018-03-30 18:10:33.688390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10415 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:60:00.0, compute capability: 6.1)
2018-03-30 18:10:33.806845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 10415 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:61:00.0, compute capability: 6.1)
2018-03-30 18:10:33.913200: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 10415 MB memory) -> physical GPU (device: 4, name: GeForce GTX 1080 Ti, pci bus id: 0000:b1:00.0, compute capability: 6.1)
2018-03-30 18:10:34.018533: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 10415 MB memory) -> physical GPU (device: 5, name: GeForce GTX 1080 Ti, pci bus id: 0000:b2:00.0, compute capability: 6.1)
Killed
When I run the script using CUDA_VISIBLE_DEVICES='6' python script.py it aborts using the GPU. What could be causing this? Could this be a defective GPU?
nvidia-smi reports the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.25 Driver Version: 390.25 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:3D:00.0 Off | N/A |
| 28% 21C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 108... Off | 00000000:3E:00.0 Off | N/A |
| 28% 21C P8 7W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce GTX 108... Off | 00000000:60:00.0 Off | N/A |
| 28% 24C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce GTX 108... Off | 00000000:61:00.0 Off | N/A |
| 28% 25C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce GTX 108... Off | 00000000:B1:00.0 Off | N/A |
| 28% 19C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce GTX 108... Off | 00000000:B2:00.0 Off | N/A |
| 28% 20C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce GTX 108... Off | 00000000:DA:00.0 Off | N/A |
| 28% 22C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce GTX 108... Off | 00000000:DB:00.0 Off | N/A |
| 28% 21C P8 8W / 250W | 0MiB / 11178MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The TensorFlow version is 1.7.0 and the CUDA version is 9.0.176.

The problem was that I didn't request enough host RAM when creating the job to use that many GPUs. Using all 8 GPUs needs a fair amount of host memory, perhaps ~60 GiB.
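If the job cannot be given more host RAM, one workaround is to expose fewer GPUs to the process. The sketch below is only an illustration: the 7.5 GiB per GPU figure is simply the rough estimate from this answer (about 60 GiB for 8 GPUs) divided by 8, not a measured value, and `gpus_for_ram_budget` and `visible_devices` are hypothetical helper names.

```python
import os


def gpus_for_ram_budget(ram_gib, gib_per_gpu=7.5, total_gpus=8):
    """Return how many GPUs fit into a host-RAM budget.

    gib_per_gpu is an assumption taken from the rough estimate above
    (about 60 GiB for 8 GPUs); adjust it for your own workload.
    """
    if ram_gib < 0 or gib_per_gpu <= 0:
        raise ValueError("ram_gib must be >= 0 and gib_per_gpu must be > 0")
    return min(total_gpus, int(ram_gib // gib_per_gpu))


def visible_devices(count):
    """Build a CUDA_VISIBLE_DEVICES value exposing the first `count` GPUs."""
    return ",".join(str(i) for i in range(count))


# Example: a job that was granted 32 GiB of host RAM.
count = gpus_for_ram_budget(ram_gib=32)

# This must be set before TensorFlow is imported, otherwise it has no effect.
# An empty string hides every GPU from the process.
os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices(count)
print(count, os.environ["CUDA_VISIBLE_DEVICES"])
```

The same restriction can be applied from the shell, as in the question, by prefixing the command with CUDA_VISIBLE_DEVICES.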

Related

Tensorflow stuck after "Created TensorFlow device"

I'm working on Ubuntu 18.04 with tensorflow==2.2.
I've installed CUDA 10.1. My GPU is detected, but the program seems to get stuck after "Created TensorFlow device", or at least takes 2-3 minutes to run.
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti On | 00000000:01:00.0 On | N/A |
| 32% 34C P0 1W / 38W | 320MiB / 2000MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1013 G /usr/lib/xorg/Xorg 24MiB |
| 0 N/A N/A 1130 G /usr/bin/gnome-shell 48MiB |
| 0 N/A N/A 1343 G /usr/lib/xorg/Xorg 178MiB |
| 0 N/A N/A 1489 G /usr/bin/gnome-shell 48MiB |
| 0 N/A N/A 2643 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 2734 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 2778 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 5438 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6178 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 6691 G /usr/lib/firefox/firefox 1MiB |
| 0 N/A N/A 7007 G /usr/lib/firefox/firefox 1MiB |
+-----------------------------------------------------------------------------+
Importing works fine:
import tensorflow as tf
tf.config.list_physical_devices("GPU")
Output:
Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 750 Ti computeCapability: 5.0
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
But this seems to get stuck:
print(tf.reduce_sum(tf.random.normal([1000, 1000])))
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1425 MB memory) -> physical GPU (device: 0, name: GeForce GTX 750 Ti, pci bus id: 0000:01:00.0, compute capability: 5.0)
...
I've installed tensorflow with pip3 install tensorflow==2.2 and also tried pip3 install tensorflow-gpu.
Any ideas?
Please check that you have followed all the required steps below to install tensorflow-gpu on Ubuntu.
STEP 1: Nvidia Drivers
STEP 2: Installation of NVIDIA CUDA
STEP 3: Installation of Deep Neural Network library (cuDNN)
STEP 4: Finally installing TENSORFLOW with GPU support
pip install --upgrade tensorflow-gpu
STEP 5: Checking the installation
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
You can check this reference to complete all these steps, then run the same code as above. Please let us know if the issue still persists.
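To tell a slow one-time startup apart from a real hang, it may help to time the same operation a few times within one process. The sketch below is a generic timing helper that uses only the standard library; `time_calls` is a hypothetical name and not part of TensorFlow.

```python
import time


def time_calls(fn, repeats=3):
    """Call fn() `repeats` times and return the duration of each call in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload so the example runs without TensorFlow installed.
for i, seconds in enumerate(time_calls(lambda: sum(range(100000)))):
    print("call %d: %.4f s" % (i, seconds))
```

With TensorFlow 2.x one could pass something like `lambda: tf.reduce_sum(tf.random.normal([1000, 1000])).numpy()` instead of the stand-in. If only the first call takes minutes and the later ones are fast, the delay is probably initialization rather than the computation itself.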

Tensorflow does not get GPU

Python version: 3.7.6
Tensorflow version: 2.3.0
CUDA: 10.2.89
CUDNN: 10.2
nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:32:27_Pacific_Daylight_Time_2019
Cuda compilation tools, release 10.2, V10.2.89
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 451.48 Driver Version: 451.48 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 WDDM | 00000000:04:00.0 On | N/A |
| 0% 47C P8 8W / 200W | 463MiB / 8192MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1268 C+G Insufficient Permissions N/A |
| 0 N/A N/A 1308 C+G Insufficient Permissions N/A |
| 0 N/A N/A 4936 C+G ...\Direct4\jabra-direct.exe N/A |
| 0 N/A N/A 7500 C+G Insufficient Permissions N/A |
| 0 N/A N/A 7516 C+G ...w5n1h2txyewy\SearchUI.exe N/A |
| 0 N/A N/A 9668 C+G Insufficient Permissions N/A |
| 0 N/A N/A 10676 C+G C:\Windows\explorer.exe N/A |
| 0 N/A N/A 10828 C+G ...st\Desktop\Mattermost.exe N/A |
| 0 N/A N/A 11536 C+G ...8bbwe\Microsoft.Notes.exe N/A |
| 0 N/A N/A 14604 C+G ...es.TextInput.InputApp.exe N/A |
+-----------------------------------------------------------------------------+
I tried:
print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
Num GPUs Available: 0
Why is TensorFlow not able to detect the GPU?
It seems you are trying to use the GPU build of TensorFlow, but the CUDA and cuDNN versions you have installed are not supported by it.
Note: GPU support is available on Ubuntu and Windows, and only with CUDA-enabled cards.
If you have a CUDA-enabled card, follow the instructions below.
As stated in the TensorFlow documentation, the software requirements are as follows:
NVIDIA GPU drivers - 418.x or higher
CUDA - 10.1 (TensorFlow >= 2.1.0)
cuDNN - 7.6
Make sure you have these exact versions of the software mentioned above. See this
Also, check the system requirements here.
Make sure you have installed all the C++ redistributables - here
For downloading the software mentioned above see here.
For downloading TensorFlow, follow the instructions provided here to correctly install the necessary packages.
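The question reports CUDA 10.2 from nvcc, while the list above asks for 10.1. A small check like the following can make such a mismatch explicit. It is a sketch only: `parse_nvcc_release` is a hypothetical helper, the sample string is copied from the nvcc output in the question, and the required version is the one quoted in this answer.

```python
import re

# Requirement quoted in the answer above: CUDA 10.1 for TensorFlow >= 2.1.0.
REQUIRED_CUDA = (10, 1)


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Last line of the nvcc output shown in the question.
sample = "Cuda compilation tools, release 10.2, V10.2.89"
found = parse_nvcc_release(sample)

if found == REQUIRED_CUDA:
    print("CUDA toolkit matches the required version")
else:
    print("CUDA toolkit %s does not match required %s" % (found, REQUIRED_CUDA))
```

On a real machine the text could come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout` rather than a hard-coded sample.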

Why Python stops using GPU and switches to CPU in runtime

I have been using this:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
in order to run on the GPU. It had been working properly until today.
The problem now is that, partway through a run, my program stops using the GPU and switches to the CPU, so it becomes very slow.
Any idea why this is happening?
Output of nvidia-smi at the beginning of the execution:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 On | 00000000:01:00.0 On | N/A |
| 0% 42C P8 14W / 200W | 363MiB / 4039MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c On | 00000000:05:00.0 Off | 0 |
| 35% 74C P0 136W / 235W | 11011MiB / 11441MiB | 94% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1037 G /usr/lib/xorg/Xorg 20MiB |
| 0 1150 G /usr/bin/gnome-shell 12MiB |
| 0 7430 G /usr/lib/xorg/Xorg 166MiB |
| 0 7560 G /usr/bin/gnome-shell 158MiB |
| 1 13772 C python3 10998MiB |
+-----------------------------------------------------------------------------+
And then, when it begins to run too slowly:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 On | 00000000:01:00.0 On | N/A |
| 0% 42C P8 14W / 200W | 363MiB / 4039MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K40c On | 00000000:05:00.0 Off | 0 |
| 35% 69C P0 63W / 235W | 11011MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1037 G /usr/lib/xorg/Xorg 20MiB |
| 0 1150 G /usr/bin/gnome-shell 12MiB |
| 0 7430 G /usr/lib/xorg/Xorg 166MiB |
| 0 7560 G /usr/bin/gnome-shell 158MiB |
| 1 13772 C python3 10998MiB |
+-----------------------------------------------------------------------------+
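In the two snapshots above, GPU 1 keeps about 11 GiB allocated while its utilization drops from 94% to 0%. To catch the moment this happens, the nvidia-smi query output could be logged and parsed periodically. The following is a minimal sketch, assuming nvidia-smi is on the PATH; `parse_gpu_rows`, `idle_but_allocated`, and the 1000 MiB threshold are hypothetical choices, and the sample values are taken from the second snapshot.

```python
import subprocess

# Machine-readable query: one CSV line per GPU, no header, no units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def read_gpu_csv():
    """Run nvidia-smi and return its CSV output (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


def parse_gpu_rows(csv_text):
    """Parse 'index, util, mem' lines; rows with non-numeric fields are skipped."""
    rows = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        try:
            index, util, mem = (int(field) for field in fields)
        except ValueError:
            continue  # e.g. a GPU that reports the value as unsupported
        rows.append({"index": index, "util_percent": util, "memory_mib": mem})
    return rows


def idle_but_allocated(rows, min_memory_mib=1000):
    """Return indices of GPUs holding memory while showing 0% utilization."""
    return [
        row["index"]
        for row in rows
        if row["util_percent"] == 0 and row["memory_mib"] >= min_memory_mib
    ]


# Values from the second nvidia-smi snapshot above.
sample = "0, 0, 363\n1, 0, 11011\n"
print(idle_but_allocated(parse_gpu_rows(sample)))  # GPU 1 in this sample
```

A GPU flagged this way suggests the process still holds its memory but is no longer submitting work, which narrows the search to whatever the program is doing on the CPU at that point.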

How to check if keras tensorflow backend is running on the GPU or CPU?

I have a notebook with a GPU. Output of nvidia-smi:
Thu Oct 18 20:49:22 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.87 Driver Version: 390.87 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GT 540M Off | 00000000:01:00.0 N/A | N/A |
| N/A 44C P8 N/A / N/A | 12MiB / 964MiB | N/A Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+
When I run Keras code, it prints:
Using TensorFlow backend.
2018-10-18 20:26:08.963084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-10-18 20:26:08.963593: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: GeForce GT 540M major: 2 minor: 1 memoryClockRate(GHz): 1.344
pciBusID: 0000:01:00.0
totalMemory: 964.50MiB freeMemory: 917.75MiB
2018-10-18 20:26:08.963633: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1455] Ignoring visible gpu device (device: 0, name: GeForce GT 540M, pci bus id: 0000:01:00.0, compute capability: 2.1) with Cuda compute capability 2.1. The minimum required Cuda capability is 3.5.
2018-10-18 20:26:08.963652: I tensorflow/core/common_runtime/gpu/gpu_device.cc:965] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-18 20:26:08.963663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0
2018-10-18 20:26:08.963673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] 0: N
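The log above already hints at the answer: the "Ignoring visible gpu device" line says the GPU has compute capability 2.1 while the minimum required is 3.5, so TensorFlow skips it and the work presumably runs on the CPU. A small script can scan a captured log for such lines. This is a sketch; `ignored_gpus` is a hypothetical helper and the pattern is written against the exact log format shown here, which may differ in other TensorFlow versions.

```python
import re

# Matches the "Ignoring visible gpu device (...)" message in the log above.
IGNORED = re.compile(
    r"Ignoring visible gpu device \(device: (\d+), name: ([^,]+),"
    r".*?compute capability: (\d+\.\d+)\)"
)


def ignored_gpus(log_text):
    """Return (device_index, name, compute_capability) for each skipped GPU."""
    return [
        (int(device), name, capability)
        for device, name, capability in IGNORED.findall(log_text)
    ]


# Line copied from the log in the question (timestamp prefix omitted).
sample_log = (
    "Ignoring visible gpu device (device: 0, name: GeForce GT 540M, "
    "pci bus id: 0000:01:00.0, compute capability: 2.1) with Cuda compute "
    "capability 2.1. The minimum required Cuda capability is 3.5."
)

for device, name, capability in ignored_gpus(sample_log):
    print("GPU %d (%s) skipped, compute capability %s" % (device, name, capability))
```

An empty result would mean no GPU was skipped for this reason, in which case the device placement itself would need to be checked.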

Tensorflow: CUDA_ERROR_OUT_OF_MEMORY tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized

When I train a VGG16 network on the GPU with TensorFlow, it always shows CUDA_ERROR_OUT_OF_MEMORY and always stops with the error tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
I searched the internet for those messages and found some tips:
Set config.gpu_options.allow_growth to True.
Set config.gpu_options.per_process_gpu_memory_fraction to a smaller fraction, like 0.6.
Use a smaller batch size.
But these tips don't work; the process runs as if nothing had changed.
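For reference, the first two tips are usually applied through the session configuration. The sketch below shows one way to build such a config; `make_session_config` is a hypothetical helper, TensorFlow is imported lazily so the snippet loads even where it is not installed, and the fallback between `tf.compat.v1.ConfigProto` and `tf.ConfigProto` is an assumption meant to cover both newer and older releases.

```python
def make_session_config(memory_fraction=0.6, allow_growth=True):
    """Build a TF1-style session config with the GPU memory options set."""
    import tensorflow as tf  # lazy import: only needed when the config is built

    # Newer releases expose the TF1 API under tf.compat.v1,
    # older ones have ConfigProto at the top level.
    v1 = getattr(getattr(tf, "compat", None), "v1", None)
    config_cls = v1.ConfigProto if v1 is not None else tf.ConfigProto

    config = config_cls()
    # Allocate GPU memory on demand instead of reserving it all up front.
    config.gpu_options.allow_growth = allow_growth
    # Cap the share of GPU memory this process may use.
    config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return config
```

In TF1-style code the result would be passed as `tf.Session(config=make_session_config())`. The options only take effect if the config is handed to the session that actually runs the training, which is worth double-checking when changing them appears to have no effect.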
Here is my hardware:
GPU: NVIDIA GTX 1060
Memory: 3 GB + 4 GB (shared memory)
I monitored GPU usage with nvidia-smi; the details are below.
Before Running:
Thu Apr 19 14:21:59 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 50C P8 7W / N/A | 587MiB / 3072MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 8244 C+G ...6)\Youdao\YoudaoNote\YNoteCefRender.exe N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
When the process begins:
Thu Apr 19 14:24:23 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 48C P2 28W / N/A | 1133MiB / 3072MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 10 steps:
Thu Apr 19 14:30:40 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 64C P2 31W / N/A | 2595MiB / 3072MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 60 steps:
Some messages appeared, but the process kept running:
2018-04-19 14:33:56.384528: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 2147483648 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.423080: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1932735232 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
2018-04-19 14:33:56.474281: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 1739461632 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
Thu Apr 19 14:36:13 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 388.31 Driver Version: 388.31 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1060 WDDM | 00000000:01:00.0 On | N/A |
| N/A 63C P2 33W / N/A | 2602MiB / 3072MiB | 43% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7300 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
| 0 9988 C+G C:\Windows\explorer.exe N/A |
| 0 10696 C+G ...t_cw5n1h2txyewy\ShellExperienceHost.exe N/A |
| 0 10808 C+G ...dows.Cortana_cw5n1h2txyewy\SearchUI.exe N/A |
| 0 11024 C+G Insufficient Permissions N/A |
| 0 11092 C+G C:\Windows\System32\mstsc.exe N/A |
| 0 13076 C+G ...ogram Files (x86)\Skype\Phone\Skype.exe N/A |
| 0 14404 C ...ools\Anaconda3\envs\py36_tfg\python.exe N/A |
| 0 14664 C+G ...osoft Office\root\Office16\POWERPNT.EXE N/A |
+-----------------------------------------------------------------------------+
After 170 steps:
About eight hundred lines of messages like the following appeared, then the process stopped with errors:
2018-04-19 14:49:35.688274: E c:\l\work\tensorflow-1.1.0\tensorflow\stream_executor\cuda\cuda_driver.cc:924] failed to alloc 4294967296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY
Stopped with some errors:
Traceback (most recent call last):
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1039, in _do_call
    return fn(*args)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1021, in _run_fn
    status, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\contextlib.py", line 88, in __exit__
    next(self.gen)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
    [[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "vgg16_train_and_test.py", line 212, in <module>
    train()
  File "vgg16_train_and_test.py", line 124, in train
    coord.join(threads)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\coordinator.py", line 389, in join
    six.reraise(*self._exc_info_to_raise)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\training\queue_runner_impl.py", line 234, in _run
    sess.run(enqueue_op)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 778, in run
    run_metadata_ptr)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "C:\DevTools\Anaconda3\envs\py36_tfg\lib\site-packages\tensorflow\python\client\session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Dst tensor is not initialized.
    [[Node: input/input/div/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_111_input/input/div", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
