I would like to train a TensorFlow model using my GPU.
I'm using:
tensorboard 2.4.1
tensorboard-plugin-wit 1.8.0
tensorflow-estimator 2.4.0
tensorflow-gpu 2.4.1
CUDA 11.0
cuDNN 8.0.4
GPU RTX 3060 Laptop 6 GB
Nvidia FrameView SDK 1.1.4923.29548709
Nvidia Graphics Drivers 461.72
Nvidia PhysX 9.19.0218
Python 3.8.5
IDE Spyder 4.2.1
OS Windows 10 LTSC-2019 (modified)
What did I do before posting this question?
1/ I've installed Nvidia Graphics Drivers
2/ I've followed this Tensorflow tutorial : https://www.tensorflow.org/install/gpu
So I've copied the cuda folder from the cuDNN download archive into C:\tools\
I've also added all the required variables to Path.
3/ Tried to train my model (everything works if I use the CPU instead):
with tf.device("/GPU:0"):
    history = model.fit(images, imagesID, epochs=50, validation_split=0.2)
Error:
2021-03-14 15:07:16.145096: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-03-14 15:07:16.145335: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-03-14 15:07:16.146411: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2021-03-14 15:07:16.146595: E tensorflow/stream_executor/cuda/cuda_dnn.cc:340] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2021-03-14 15:07:16.146845: W tensorflow/core/framework/op_kernel.cc:1763] OP_REQUIRES failed at conv_ops_fused_impl.h:697 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
So I've found this on the Internet: https://github.com/tensorflow/tensorflow/issues/45779
Thus, I've implemented this code at the top to limit GPU memory:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        print(e)
Error:
Physical devices cannot be modified after being initialized
So I've found this: https://github.com/tensorflow/tensorflow/issues/25138
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.2
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
But I still have the same error:
2021-03-14 15:07:16.145096: E tensorflow/stream_executor/cuda/cuda_dnn.cc:336] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
...
I'm completely lost because I lack knowledge about TensorFlow GPU errors...
Detail of all logs is here : https://pastebin.com/Xtsv3mLe
I'm not very good at writing posts, I hope I was clear enough.
Thank you in advance !!
You need CUDA 11.0, not 11.1. You can find more information on what you need here: https://www.tensorflow.org/install/gpu. This walkthrough may be even more helpful than the install guide alone, although you should read both: https://alejandro-gc.medium.com/setting-up-your-gpu-for-tensorflow-2-4-2021-d98cac79a686.
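For reference, here is a minimal sketch of the check I would put at the very top of the script, before any model or tensor is created (the "Physical devices cannot be modified after being initialized" error means the GPU had already been initialized by the time set_memory_growth was called). It only uses the standard tf.config APIs available in TF 2.4:
import tensorflow as tf

# Must run before anything touches the GPU (no models or tensors created yet).
gpus = tf.config.list_physical_devices('GPU')
print("GPUs visible to TensorFlow:", gpus)  # an empty list means CUDA/cuDNN was not picked up

for gpu in gpus:
    # Allocate GPU memory on demand instead of grabbing it all up front.
    tf.config.experimental.set_memory_growth(gpu, True)
If the printed list is empty, the problem is in the CUDA/cuDNN installation rather than in the training code.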
Related
I am trying out detectron2 and want to train the sample model.
When running the following code I get (<class 'RuntimeError'>, RuntimeError('No CUDA GPUs are available'), <traceback object at 0x7f42b094ebc0>). The code is below:
import detectron2
from detectron2.utils.logger import setup_logger
setup_logger()
# import some common libraries
import matplotlib.pyplot as plt
import numpy as np
import cv2
# import some common detectron2 utilities
from detectron2.engine import DefaultPredictor
from detectron2.config import get_cfg
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.data.datasets import register_coco_instances
import random
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
import os
# To verify the data loading is correct, let's visualize the annotations of randomly selected samples in the training set:
register_coco_instances("fruits_nuts", {}, "../data/trainval.json", "../data/images")
fruits_nuts_metadata = MetadataCatalog.get("fruits_nuts")
dataset_dicts = DatasetCatalog.get("fruits_nuts")
'''
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=fruits_nuts_metadata, scale=0.5)
    vis = visualizer.draw_dataset_dict(d)
    cv2.imshow('new', vis.get_image()[:, :, ::-1])
    cv2.waitKey(0)
'''
# train model
cfg = get_cfg()
cfg.merge_from_file("../detectron2_repo/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("fruits_nuts",)
cfg.DATASETS.TEST = () # no metrics implemented for this dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = "detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl" # initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 300 # 300 iterations seems good enough, but you can certainly train longer
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128 # faster, and good enough for this toy dataset
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3 # 3 classes (date, fig, hazelnut)
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
I ran the script collect_env.py from torch:
/home/project/.venv/bin/python /home/project/src/collect_env.py
Collecting environment information...
PyTorch version: 1.10.2+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.31
Python version: 3.8.10 (default, Nov 26 2021, 20:14:08) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.13.0-27-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: Could not collect
Nvidia driver version: Could not collect
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.22.1
[pip3] torch==1.10.2
[pip3] torchvision==0.11.3
[conda] Could not collect
Process finished with exit code 0
I have an RTX 3080 graphics card in the system. However, it seems that it's not being found.
Any suggestions as to why?
Is there a way to run the training without CUDA?
I appreciate your replies!
I'm not sure if this works for you, but let's look at it from a Windows user's perspective.
I'm using Detectron2 on Windows 10 with an RTX 3060 Laptop GPU, CUDA enabled.
The first thing you should check is CUDA. You can check it with the command:
nvcc -V
It should show a message like this:
C:\Users\User>nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:41:42_Pacific_Daylight_Time_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0
And to check whether your PyTorch is installed with CUDA enabled, use these commands (reference: the PyTorch website):
import torch
torch.cuda.is_available()
Judging by the system info shared in this question, you haven't installed CUDA on your system, and no GPU (driver) is detected either.
As far as I know, they recommend installing PyTorch with CUDA to run Detectron2 on an (NVIDIA) GPU
(you can check the PyTorch website and the Detectron2 GitHub repo for more details).
Or, you can use this option:
Add this line of code to your Python program (as referenced in issue #300):
cfg.MODEL.DEVICE = "cpu"
I hope it helps. Cheers.
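If you prefer not to hard-code the device, a minimal sketch that falls back to CPU automatically (using the same cfg object as in the question) could look like this:
import torch
from detectron2.config import get_cfg

cfg = get_cfg()
# Use the GPU when PyTorch can actually see one, otherwise fall back to CPU.
cfg.MODEL.DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
Training on CPU will be much slower, but it avoids the RuntimeError when no CUDA device is visible.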
I have a computer with a few NVIDIA GPUs, use the 'segmentation_models' package, and build a NN based on Unet:
import segmentation_models as sm
import keras.backend as K
from keras import optimizers
from keras.utils import multi_gpu_model
lr = 2e-4
NUM_GPUS = 3
learning_rate = lr * NUM_GPUS
adam = optimizers.Adam(lr=learning_rate)
def dice_coef(y_true, y_pred, smooth=1):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
model = sm.Unet('efficientnetb3', encoder_weights='imagenet', classes=4, activation='softmax', encoder_freeze=False)
parallel_model = multi_gpu_model(model, gpus=NUM_GPUS)
model = parallel_model
model.compile(adam, 'categorical_crossentropy', [dice_coef])
history = model.fit_generator(
    generator=train_gen, steps_per_epoch=len(train_gen),
    validation_data=validation_gen,
    epochs=50, callbacks=[clr, checkpoints, csv_logger],
    initial_epoch=0)
After training, I save the weights for future use in CPU mode:
single_gpu_model = model.layers[-2]
single_gpu_model.save(single_proc_model_path_1_kernel)
And I try to work with these weights:
import keras
model1 = keras.models.load_model(single_proc_model_path_1_kernel)
...
pr_mask = self.model1.predict(img_exp)
Machine for NN training: Ubuntu 16.04.4 LTS, 3 x K80 GPU; python 3.6.7, tensorflow 1.12.0 - all code works here.
Win10 with 1 GeForce GTX 1080; python 3.7.3, tensorflow-gpu 1.13.1 - code works here too.
Win10 without NVidia GPU; tensorflow-gpu 1.13.1 - ERROR when loading model:
tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
docker with Ubuntu 18.04.3 LTS; python 3.6.9, tensorflow 2.1.0.
Error when loading model:
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Segmentation Models: using keras framework.
tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (b36a4cf2df2e): /proc/driver/nvidia/version does not exist
What should I change to force the code to work on a machine with CPUs only?
Tensorflow 1.15 resolved all the problems. Thanks.
You can try setting the environment variable CUDA_VISIBLE_DEVICES to either blank or the empty string "", or possibly -1.
Otherwise you'll need to tell the tensorflow backend to use CPU only.
See also: Can Keras with Tensorflow backend be forced to use CPU or GPU at will?
Note that keras multi_gpu_model is deprecated and you should alter your code to use tf.distribute.MirroredStrategy instead. I haven't personally worked with it but I imagine this new API is designed to work more seamlessly across GPU/CPU situations like yours.
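For instance, a minimal sketch of the environment-variable approach (the model filename below is a placeholder for your own saved file, and the variable must be set before tensorflow/keras is imported):
import os

# Hide all CUDA devices from TensorFlow; this must run before tensorflow/keras is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import keras

# "single_gpu_unet.h5" is a placeholder for your saved single-GPU model file.
# If the model was compiled with custom metrics (e.g. dice_coef), either pass them via
# custom_objects={'dice_coef': dice_coef} or load with compile=False for inference only.
model1 = keras.models.load_model("single_gpu_unet.h5", compile=False)
With the GPUs hidden, TensorFlow falls back to its CPU kernels and the cuInit/libcuda errors should disappear.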
I am trying to run a prediction with a model built in Keras on my NVIDIA Tegra TX2 using TensorFlow and Python (2.7), and quite randomly TensorFlow throws the following exception:
Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4504 MB memory) -> physical GPU (device: 0, name: NVIDIA Tegra X2, pci bus id: 0000:00:00.0, compute capability: 6.2)
2019-10-04 16:17:50.786531: E tensorflow/stream_executor/cuda/cuda_driver.cc:1032] could not synchronize on CUDA context: CUDA_ERROR_UNKNOWN: unknown error :: *** Begin stack trace ***
stream_executor::gpu::GpuDriver::SynchronizeContext(stream_executor::gpu::GpuContext*)
stream_executor::StreamExecutor::SynchronizeAllActivity()
tensorflow::GPUUtil::SyncAll(tensorflow::Device*)
*** End stack trace ***
...
tensorflow.python.framework.errors_impl.InternalError: GPU sync failed
Sometimes, after a few reboots or some waiting, the problem is resolved and I can run the prediction again, but 8 out of 10 times this error appears.
I've already tried the following:
Change the query amount and memory usage as follows:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config, ...)
Re-Install Tensorflow build for TX2 and Jetpack v3.3
I would be really happy for any further suggestions.
Since this problem is intermittent, it may be happening when TensorFlow is not getting the required memory. It is a known issue and you have already tried the basic troubleshooting steps. Since you are still getting the issue, try the steps below as well:
reinstall libhdf5-dev, python-h5py
sudo apt-get install libhdf5-dev
sudo apt-get install python-h5py
and then set gpu allow growth as per "https://github.com/keras-team/keras/issues/4161#issuecomment-366031228"
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
Adding the following lines solved the issue. Note: my TensorFlow version is 2.1.
import tensorflow as tf
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)
I am trying to use an AlexNet CNN with TensorFlow. I can train the model without any error messages, I have access to TensorBoard, and I can test the trained model with no errors.
The only problem is that while training, GPU usage stays mostly at 0% and only rarely fluctuates up to 25%, while all my CPUs are working at over 90%. So I am assuming it is using the CPU instead of the GPU.
Here is my setup
Windows 8.1 x64
GPU 1070 driver version 3.88
tensorflow-gpu 1.8.0
CUDA toolkit v9.0
cuDNN version 7
I can import tensorflow in Python with no errors. I ran some tests to see if it is installed correctly.
Test 1
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 625346735515728619
, name: "/device:GPU:0"
device_type: "GPU"
memory_limit: 6911164212
locality {
bus_id: 1
links {
}
}
incarnation: 15764160474642097170
physical_device_desc: "device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1"
]
Test 2
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
b'Hello, TensorFlow!'
Test 3
import ctypes
import imp
import sys
def main():
    try:
        import tensorflow as tf
        print("TensorFlow successfully installed.")
        if tf.test.is_built_with_cuda():
            print("The installed version of TensorFlow includes GPU support.")
        else:
            print("The installed version of TensorFlow does not include GPU support.")
        sys.exit(0)
    except ImportError:
        print("ERROR: Failed to import the TensorFlow module.")

    candidate_explanation = False

    python_version = sys.version_info.major, sys.version_info.minor
    print("\n- Python version is %d.%d." % python_version)
    if not (python_version == (3, 5) or python_version == (3, 6)):
        candidate_explanation = True
        print("- The official distribution of TensorFlow for Windows requires "
              "Python version 3.5 or 3.6.")

    try:
        _, pathname, _ = imp.find_module("tensorflow")
        print("\n- TensorFlow is installed at: %s" % pathname)
    except ImportError:
        candidate_explanation = False
        print("""
- No module named TensorFlow is installed in this Python environment. You may
  install it using the command `pip install tensorflow`.""")

    try:
        msvcp140 = ctypes.WinDLL("msvcp140.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'msvcp140.dll'. TensorFlow requires that this DLL be
  installed in a directory that is named in your %PATH% environment
  variable. You may install this DLL by downloading Microsoft Visual
  C++ 2015 Redistributable Update 3 from this URL:
  https://www.microsoft.com/en-us/download/details.aspx?id=53587""")

    try:
        cudart64_80 = ctypes.WinDLL("cudart64_80.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'cudart64_80.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Download and install CUDA 8.0 from
  this URL: https://developer.nvidia.com/cuda-toolkit""")

    try:
        nvcuda = ctypes.WinDLL("nvcuda.dll")
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'nvcuda.dll'. The GPU version of TensorFlow requires that
  this DLL be installed in a directory that is named in your %PATH%
  environment variable. Typically it is installed in 'C:\Windows\System32'.
  If it is not present, ensure that you have a CUDA-capable GPU with the
  correct driver installed.""")

    cudnn5_found = False
    try:
        cudnn5 = ctypes.WinDLL("cudnn64_5.dll")
        cudnn5_found = True
    except OSError:
        candidate_explanation = True
        print("""
- Could not load 'cudnn64_5.dll'. The GPU version of TensorFlow
  requires that this DLL be installed in a directory that is named in
  your %PATH% environment variable. Note that installing cuDNN is a
  separate step from installing CUDA, and it is often found in a
  different directory from the CUDA DLLs. You may install the
  necessary DLL by downloading cuDNN 5.1 from this URL:
  https://developer.nvidia.com/cudnn""")

    cudnn6_found = False
    try:
        cudnn = ctypes.WinDLL("cudnn64_6.dll")
        cudnn6_found = True
    except OSError:
        candidate_explanation = True

    if not cudnn5_found or not cudnn6_found:
        print()
        if not cudnn5_found and not cudnn6_found:
            print("- Could not find cuDNN.")
        elif not cudnn5_found:
            print("- Could not find cuDNN 5.1.")
        else:
            print("- Could not find cuDNN 6.")
        print("""
The GPU version of TensorFlow requires that the correct cuDNN DLL be installed
in a directory that is named in your %PATH% environment variable. Note that
installing cuDNN is a separate step from installing CUDA, and it is often
found in a different directory from the CUDA DLLs. The correct version of
cuDNN depends on your version of TensorFlow:
* TensorFlow 1.2.1 or earlier requires cuDNN 5.1. ('cudnn64_5.dll')
* TensorFlow 1.3 or later requires cuDNN 6. ('cudnn64_6.dll')
You may install the necessary DLL by downloading cuDNN from this URL:
https://developer.nvidia.com/cudnn""")

    if not candidate_explanation:
        print("""
- All required DLLs appear to be present. Please open an issue on the
  TensorFlow GitHub page: https://github.com/tensorflow/tensorflow/issues""")

    sys.exit(-1)


if __name__ == "__main__":
    main()
I get
TensorFlow successfully installed.
The installed version of TensorFlow includes GPU support.
Training in python
When I start to train in Python, I get some warnings, but from what I found when searching, it seems I can ignore them:
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1 | time: 2.863s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2 | total loss: 1.73151 | time: 4.523s
Training in cmd
When I train in cmd, I get slightly different messages:
curses is not supported on this machine (please install/reinstall curses for an optimal experience)
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\initializations.py:119: UniformUnitScaling.__init__ (from tensorflow.python.ops.init_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.initializers.variance_scaling instead with distribution=uniform to get equivalent behavior.
WARNING:tensorflow:From C:\Users\Jay\AppData\Local\Programs\Python\Python36\lib\site-packages\tflearn\objectives.py:66: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-05-13 19:13:07.272665: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2018-05-13 19:13:07.749663: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1070 major: 6 minor: 1 memoryClockRate(GHz): 1.7845
pciBusID: 0000:01:00.0
totalMemory: 8.00GiB freeMemory: 6.77GiB
2018-05-13 19:13:07.766329: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:08.295258: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:08.310539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-05-13 19:13:08.317846: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-05-13 19:13:08.325655: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-05-13 19:13:09.481654: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-13 19:13:09.492392: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-13 19:13:09.507539: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2018-05-13 19:13:09.514839: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2018-05-13 19:13:09.522600: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6540 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1070, pci bus id: 0000:01:00.0, compute capability: 6.1)
---------------------------------
Run id: pygta5-car-fast-0.001-alexnetv2-10-epochs-300K-data.model
Log directory: log/
---------------------------------
Training samples: 1500
Validation samples: 500
--
Training Step: 1 | time: 2.879s
| Momentum | epoch: 001 | loss: 0.00000 - acc: 0.0000 -- iter: 0064/1500
Training Step: 2 | total loss: 1.60460 | time: 4.542s
Performance while training
While training, CPU usage is almost always over 90%, while GPU usage stays around 0-25%.
Thank you for checking out this long post. I can't seem to locate where the problem starts. Any help will be greatly appreciated.
Are you using your own model or some existing code?
If you're using your own implementation the low GPU utilization might be due to your specific implementation. Often, the data pipeline is to blame for this. Check out the performance guide for TF (https://www.tensorflow.org/performance/performance_guide) as well as how to efficiently import data using tf.data.Dataset (https://www.tensorflow.org/programmers_guide/datasets).
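Regarding the data pipeline point: a minimal TF 1.x-style tf.data input pipeline sketch looks like the following (the parsing function, image size, and filenames list are placeholders, not taken from your code):
import tensorflow as tf

def _parse(path):
    # Hypothetical decode/resize step; replace with your own preprocessing.
    image = tf.image.decode_jpeg(tf.read_file(path), channels=3)
    image = tf.image.resize_images(image, [227, 227]) / 255.0
    return image

# `filenames` is assumed to be a Python list of image paths.
dataset = (tf.data.Dataset.from_tensor_slices(filenames)
           .map(_parse, num_parallel_calls=4)  # decode on several CPU threads
           .batch(64)
           .prefetch(1))                       # overlap preprocessing with GPU compute
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()
The prefetch step keeps the GPU fed while the CPU prepares the next batch, which is usually what raises GPU utilization.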
If it is not the data pipeline then some of your code is probably executed on the CPU, which results in a lot of copying from GPU to CPU and vice versa. Maybe you perform some of the operations with Numpy or something similar, which only runs on CPU?! Also, other common guidelines apply, e.g. try not to use placeholders and feed_dict, etc...
Overall it would be more helpful if we could see your code, or if we knew which model you are using in case it is existing code. Without seeing your code it's hard to know what the problem is. Try downloading the MNIST CNN example from TensorFlow directly and run it on your computer. It should reach a GPU utilization of at least 80%. If this is not the case, the problem most likely lies in your setup rather than in your model.
Unfortunately, the TensorFlow + CUDA + cuDNN installation is known to be a pain in the ass, especially on Windows.
I would recommend using precompiled wheels from here: https://github.com/fo40225/tensorflow-windows-wheel/
You can pick the latest TensorFlow, CUDA, and cuDNN versions (even combinations not supported by official TF releases). You can also take advantage of AVX2 instructions by selecting the appropriate build.
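After installing a wheel, a quick sanity check along these lines (a minimal sketch using standard TF 1.x APIs, nothing specific to those wheels) confirms that the GPU build is the one actually being imported:
import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.__version__)                    # version of the wheel that actually got imported
print(tf.test.is_built_with_cuda())      # True for a GPU-enabled build
print(tf.test.gpu_device_name())         # e.g. '/device:GPU:0' if the GPU is usable
print(device_lib.list_local_devices())   # full device list, including memory limits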
I've been trying to understand a problem for a long time now. Please help me.
I'm trying to run the 'Keras' example from the standard example git lib (there).
If I use the CPU, everything works fine; but if I try to use GPU acceleration, it crashes WITHOUT catching any errors:
# build the model: a single LSTM
print('Build model...')
print(' 1')
model = Sequential()
print(' 2')
model.add(LSTM(128, input_shape=(maxlen, len(chars))))
print(' 3')
model.add(Dense(len(chars)))
print(' 4')
model.add(Activation('softmax'))
print(' 5')
optimizer = RMSprop(lr=0.01)
print(' Compilling')
model.compile(loss='categorical_crossentropy', optimizer=optimizer)
I added some print() calls to better locate the error.
And this is what I get:
runfile('C:/Users/kostya/Desktop/temp/python/test.py', wdir='C:/Users/kostya/Desktop/temp/python/')
Using Theano backend.
Using cuDNN version 5110 on context None
Preallocating 1638/2048 Mb (0.800000) on cuda
Mapped name None to device cuda: GeForce GTX 650 (0000:01:00.0)
WARNING: Preallocating too much memory can prevent cudnn and cublas from working properly
DEVICE: cuda
corpus length: 206433
total chars: 79
nb sequences: 68798
Vectorization...
Build model...
1
2
Ядро остановилось, перезапуск *(translation: The kernel has stopped, restarting)*
I get a similar error if I run it through the standard Python console (python.exe crashes).
I use: Win 10-64, Python 3.6.1, Anaconda with a separate activated environment, CUDA 8.0, cuDNN 5.1, mkl 2017.0.3, numpy 1.13.0, theano 0.9.0, conda-forge.keras 2.0.2, m2w64-openblas 0.2.19, conda-forge.pygpu 0.6.8, VC 14.0, etc.
That's my .theanorc.txt configuration file. (I'm sure the culprit is in here: if I set device = cpu, it works fine, but slowly.)
[global]
floatX = float32
device = cuda
optimizer_including = cudnn
[nvcc]
flags=-LC:\Users\kostya\Anaconda3\envs\keras\libs
compiler_bindir=C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0
[dnn]
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\lib\x64
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v8.0\include
[gpuarray]
preallocate = 0.8
You are trying to use a gpuarray backend option (preallocate) with the CUDA backend. From the Theano docs:
This value allocates GPU memory ONLY when using (GpuArray Backend). For the old backend, please see config.lib.cnmem
Try replacing, in your Theano config,
[gpuarray]
preallocate = 0.8
with
[lib]
cnmem = 0.8
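As a side note, the same options can also be passed through the THEANO_FLAGS environment variable instead of editing .theanorc (a minimal sketch; the flag names mirror the config above and should be checked against your Theano version's documentation):
import os

# Same options as the .theanorc above, passed via THEANO_FLAGS instead.
# Must be set before theano is imported.
os.environ["THEANO_FLAGS"] = "device=cuda,floatX=float32,lib.cnmem=0.8"

import theano
print(theano.config.device)  # should print 'cuda' if the GPU backend loaded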