tensorflow built from source using cuda or not? - python

I built tensorflow with GPU support from source for python on macOS following the official instructions. When I import tensorflow though, I don't get the typical CUDA loading messages I do when I use the pip version (as below).
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
However, when I run my test program with my build, I do see that the GPU is being found and used (I think).
~/Drive/thesis/image_keras$ python3 demo.py
Using TensorFlow backend.
Found 2125 images belonging to 2 classes.
Found 832 images belonging to 2 classes.
demo.py:64: UserWarning: Update your `fit_generator` call to the Keras 2 API: `fit_generator(<keras.pre..., validation_data=<keras.pre..., steps_per_epoch=128, epochs=25, validation_steps=832)`
nb_val_samples=nb_validation_samples)
Epoch 1/25
2017-04-13 08:39:24.542434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:865] OS X does not support NUMA - returning NUMA node zero
2017-04-13 08:39:24.542538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.77GiB
2017-04-13 08:39:24.542551: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-04-13 08:39:24.542557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-04-13 08:39:24.542566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
49/128 [==========>...................] - ETA: 18s - loss: 0.7352 - acc: 0.5166
It looks like it's using the GPU, but without the CUDA loading messages I'm not sure. If it makes a difference, I am running CUDA-8.0 with cuDNN-8.0-v5.1.

tensorflow.test.is_gpu_available()
tensorflow.test.is_built_with_cuda()
If TensorFlow was built with CUDA and can use a GPU, both of these calls should return True.
I rely on these calls because, as noted in the previous answer, I don't get the "successfully opened CUDA library" lines in my output, even though I'm using the pip version.
I use Tensorflow 1.4.0.
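A minimal sketch of the check above, wrapped so that it also runs on a machine where TensorFlow is not installed. The helper name gpu_build_report is hypothetical. It assumes tf.config.list_physical_devices exists (newer 2.x releases) and falls back to tf.test.is_gpu_available on older releases such as 1.4.

```python
def gpu_build_report():
    """Report whether TensorFlow was built with CUDA and whether it sees a GPU.

    Both values stay None when TensorFlow cannot be imported.
    """
    report = {"built_with_cuda": None, "gpu_available": None}
    try:
        import tensorflow as tf
    except ImportError:
        return report

    report["built_with_cuda"] = bool(tf.test.is_built_with_cuda())

    # Newer releases expose the physical device list; older ones only
    # provide tf.test.is_gpu_available().
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        report["gpu_available"] = len(list_devices("GPU")) > 0
    else:
        report["gpu_available"] = bool(tf.test.is_gpu_available())
    return report
```

A build with CUDA support but no usable GPU would show built_with_cuda as True and gpu_available as False, which is the situation to look for when the loading messages are absent.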

Related

(Tensorflow) Stuck at Epoch 1 during model.fit()

I've been trying to make TensorFlow 2.8.0 work with my GPU (GeForce GTX 1650 Ti) on Windows. TensorFlow detects the GPU, but any model I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel (I've tried both Jupyter Notebook and Spyder) hangs and restarts.
Following TensorFlow's website, I downloaded the matching cuDNN and CUDA versions, and I verified them (along with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8 but I've actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2) so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it fails as described above. Surprisingly, even though it can't run code such as that in TensorFlow's CNN Tutorial, the one thing that does work is the chunk of code from this stackoverflow question, which looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code that I came across for the past couple of hours, and the only time where it does not get stuck at Epoch 1 is with the link above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)
Update (Solution)
It seems like the suggestions from this post helped - I've copied the following files from the zipped cudnn bin sub folder (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my cuda bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin)
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
I had initially misread the instruction to copy all cudnn*.dll files as meaning only cudnn64_8.dll, so the other files listed above had never been copied over.
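The check behind this fix can be sketched as a small script that reports which of the cuDNN DLLs named above are absent from the CUDA bin folder. The function name missing_cudnn_dlls is hypothetical, and the expected list simply repeats the file names from this post (cuDNN 8.1 on Windows); other cuDNN releases may ship a different set.

```python
from pathlib import Path

# File names quoted in the post above, plus the main cudnn64_8.dll library.
EXPECTED_CUDNN_DLLS = [
    "cudnn64_8.dll",
    "cudnn_adv_infer64_8.dll",
    "cudnn_adv_train64_8.dll",
    "cudnn_cnn_infer64_8.dll",
    "cudnn_cnn_train64_8.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn_ops_train64_8.dll",
]


def missing_cudnn_dlls(cuda_bin_dir, expected=EXPECTED_CUDNN_DLLS):
    """Return the expected DLL names that are not present in cuda_bin_dir."""
    bin_dir = Path(cuda_bin_dir)
    if not bin_dir.is_dir():
        return list(expected)
    # Compare case-insensitively, since Windows file names are not case-sensitive.
    present = {p.name.lower() for p in bin_dir.iterdir() if p.is_file()}
    return [name for name in expected if name.lower() not in present]
```

For the setup described here, it would be called with the CUDA bin folder, for example missing_cudnn_dlls(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"). An empty list means every file in the list was found.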

Unknown error/crash - TensorFlow LSTM with GPU (no output after start of 1st epoch)

I'm trying to train a model using LSTM layers. I'm using a GPU and all needed libraries are loaded.
When I'm building the model this way:
model = keras.Sequential()
model.add(layers.LSTM(256, activation="relu", return_sequences=False)) # note the activation function
model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))
model.compile(
loss=keras.losses.BinaryCrossentropy(),
optimizer="adam",
metrics=["accuracy"]
)
It works. But it's using activation="relu" on the LSTM layer, so it's not CuDNNLSTM - that's automatically chosen when the activation function is tanh (default) - if I'm not wrong.
So, it's painfully slow and I would like to run the faster CuDNNLSTM. My code for that:
model = keras.Sequential()
model.add(layers.LSTM(256, return_sequences=False))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(256, activation="relu"))
model.add(layers.Dropout(0.2))
model.add(layers.Dense(1))
model.add(layers.Activation(activation="sigmoid"))
model.compile(
loss=keras.losses.BinaryCrossentropy(),
optimizer="adam",
metrics=["accuracy"]
)
It's basically the same, only without the activation function provided, so tanh will be used.
But now it's not training, and the end of output looks like this:
2021-04-19 22:41:46.046218: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-04-19 22:41:46.046426: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:41:46.046642: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:41:46.046942: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-04-19 22:41:46.047124: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-04-19 22:41:46.047312: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusolver64_10.dll
2021-04-19 22:41:46.047489: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-04-19 22:41:46.047663: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
2021-04-19 22:41:46.047936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-04-19 22:41:46.665456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-04-19 22:41:46.665712: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267] 0
2021-04-19 22:41:46.665876: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0: N
2021-04-19 22:41:46.666186: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2982 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2021-04-19 22:41:46.667505: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-04-19 22:42:07.374456: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
Epoch 1/50
2021-04-19 22:42:08.922891: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-04-19 22:42:09.272264: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-04-19 22:42:09.302667: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudnn64_8.dll
Process finished with exit code -1073740791 (0xC0000409)
It just starts the first epoch, then freezes for a minute and exits with this weird exit code.
Shape of the input data: tf.Tensor([50985 29 7], shape=(3,), dtype=int32)
My GPU: Nvidia GTX 1050 Ti
CUDA: v11.3
OS: Windows 10
IDE: PyCharm
Finding a solution is a bit challenging because no error is output. Am I doing something wrong? Has anyone encountered a similar issue? What might help?
// Edit; I tried:
running this model with much fewer units (2 instead of 256) and lower batch_size
downgrading tensorflow to 2.4.0, CUDA to 11.0 and cudnn to 8.0.1 with python 3.7.1 (this should be a valid combination according to this list from the TensorFlow website)
restarting my PC :)
I found the solution... kinda.
It works as it should after I downgraded tensorflow to 2.1.0, CUDA to 10.1 and cudnn to 7.6.5 (at the time, the 4th combination from this list on the TensorFlow website).
I don't know why it didn't work with the newest version, or with the valid combination for tensorflow 2.4.0.
It's working well so my issue is solved. Nonetheless it would be nice to know why using LSTM with cudnn on higher versions didn't work for me, as I haven't found this issue anywhere.
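As a rough illustration of checking an installation against the list mentioned above, the sketch below hard-codes only the three combinations quoted in these posts. It is not a complete compatibility table. The names TESTED_COMBINATIONS and check_combination are hypothetical. As this post shows, a matching combination does not guarantee that training works.

```python
# Combinations quoted in the posts above; deliberately incomplete.
TESTED_COMBINATIONS = {
    "2.8.0": {"cuda": "11.2", "cudnn": "8.1"},
    "2.4.0": {"cuda": "11.0", "cudnn": "8.0"},
    "2.1.0": {"cuda": "10.1", "cudnn": "7.6"},
}


def _same_major_minor(installed, expected):
    """Compare only the first two version components, e.g. 7.6.5 matches 7.6."""
    return installed.split(".")[:2] == expected.split(".")[:2]


def check_combination(tf_version, cuda_version, cudnn_version):
    """Return a list of mismatch descriptions; an empty list means no mismatch."""
    expected = TESTED_COMBINATIONS.get(tf_version)
    if expected is None:
        return ["no entry for TensorFlow %s in this table" % tf_version]
    problems = []
    if not _same_major_minor(cuda_version, expected["cuda"]):
        problems.append("CUDA %s installed, %s expected" % (cuda_version, expected["cuda"]))
    if not _same_major_minor(cudnn_version, expected["cudnn"]):
        problems.append("cuDNN %s installed, %s expected" % (cudnn_version, expected["cudnn"]))
    return problems


print(check_combination("2.1.0", "10.1", "7.6.5"))
print(check_combination("2.4.0", "11.3", "8.0.1"))
```

The second call uses the CUDA v11.3 from the original setup together with the TensorFlow 2.4.0 entry, so it reports a CUDA mismatch.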
replace
y1 = LSTM(64)(input)
with
y1 = RNN(tf.keras.layers.LSTMCell(64))(input)
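The Keras documentation for the LSTM layer lists conditions under which the layer can use the fused cuDNN kernel. The sketch below encodes the argument-related ones as I understand them from that documentation, without importing TensorFlow; the helper name cudnn_lstm_blockers is hypothetical. Wrapping an LSTMCell in RNN, as suggested above, is generally understood to take the generic non-cuDNN path, which would explain why it avoids the crash at the cost of speed, but that is an inference rather than something confirmed in this thread.

```python
def cudnn_lstm_blockers(activation="tanh", recurrent_activation="sigmoid",
                        recurrent_dropout=0.0, unroll=False, use_bias=True):
    """Return the reasons an LSTM with these arguments would skip the cuDNN kernel.

    Only argument-level conditions are checked; masking and eager-execution
    requirements are outside the scope of this sketch.
    """
    blockers = []
    if activation != "tanh":
        blockers.append("activation must be 'tanh'")
    if recurrent_activation != "sigmoid":
        blockers.append("recurrent_activation must be 'sigmoid'")
    if recurrent_dropout != 0:
        blockers.append("recurrent_dropout must be 0")
    if unroll:
        blockers.append("unroll must be False")
    if not use_bias:
        blockers.append("use_bias must be True")
    return blockers


# First model in the question: activation="relu" rules out the cuDNN kernel.
print(cudnn_lstm_blockers(activation="relu"))
# Second model: default arguments, so nothing at the argument level blocks it.
print(cudnn_lstm_blockers())
```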

It seems tensorflow is not recognizing my GPU, how to fix it?

I installed tensorflow-gpu on my new computer, and the system recognizes my GPU perfectly. To confirm that, I ran the following in my terminal:
Nvidia test:
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
However, to find out whether tensorflow recognizes the GPU, I tried:
Tensorflow test:
import tensorflow as tf
tf.test.is_gpu_available()
Result:
2020-05-04 22:51:25.687188: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-04 22:51:25.687914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.65GHz coreCount: 34 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-04 22:51:25.687956: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-04 22:51:25.687972: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-04 22:51:25.687986: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-04 22:51:25.688002: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-04 22:51:25.688015: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-04 22:51:25.688029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-04 22:51:25.688112: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-05-04 22:51:25.688124: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-05-04 22:51:25.688160: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-04 22:51:25.688170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-04 22:51:25.688178: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
False
On the other hand, following the suggestion given here, I tried the following commands:
Other Tensorflow test:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
And I get the following logs.
My logs:
2020-05-04 22:53:35.486634: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-04 22:53:35.487357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:06:00.0 name: GeForce RTX 2060 SUPER computeCapability: 7.5
coreClock: 1.65GHz coreCount: 34 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-05-04 22:53:35.487403: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-05-04 22:53:35.487421: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-04 22:53:35.487436: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-04 22:53:35.487451: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-04 22:53:35.487464: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-04 22:53:35.487477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-04 22:53:35.487564: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory
2020-05-04 22:53:35.487574: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2020-05-04 22:53:35.487591: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-05-04 22:53:35.487598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2020-05-04 22:53:35.487604: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 12034437465466050746
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 16469163198093824972
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 5712734079173508475
physical_device_desc: "device: XLA_GPU device"
]
Is tensorflow recognizing my GPU? Some results tell me True and others False. Please help.
Edit
Other test I have done:
import tensorflow as tf
tf.test.is_built_with_cuda()
True
What is wrong here?
Your GPU is being detected; you just didn't install everything correctly. You need this post: Which TensorFlow and CUDA version combinations are compatible?. I think your CUDA is OK, but judging from the logs, cuDNN isn't installed correctly.
TensorFlow is detecting your GPU, but that does not mean it is fully functional.
As you can see in the logs, the libcudnn library (libcudnn.so.7) does not appear to be installed on the system.
You can find the installation instructions for this library in the official documentation: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html
Make sure that you also have all the dependencies listed here installed: https://www.tensorflow.org/install/gpu
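To spot the missing library without reading the whole log by eye, something like the following sketch could be used. It only relies on the "Could not load dynamic library" wording that appears in the logs above; the function name missing_libraries is hypothetical, and other TensorFlow versions may phrase the warning differently.

```python
import re

# Matches the warning text shown in the logs above.
_MISSING_LIB = re.compile(r"Could not load dynamic library '([^']+)'")


def missing_libraries(log_text):
    """Return library names the log reports as not loadable, in order, without duplicates."""
    seen = []
    for name in _MISSING_LIB.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


sample = (
    "2020-05-04 22:51:25.688029: I tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10\n"
    "2020-05-04 22:51:25.688112: W tensorflow/stream_executor/platform/default/"
    "dso_loader.cc:55] Could not load dynamic library 'libcudnn.so.7'; dlerror: "
    "libcudnn.so.7: cannot open shared object file: No such file or directory\n"
)
print(missing_libraries(sample))  # ['libcudnn.so.7']
```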

tesla c2075 with tensorflow cuda version installation [duplicate]

This question already has answers here:
How can I make tensorflow run on a GPU with capability 2.x?
(3 answers)
Closed 3 years ago.
I'm new to GPU-based model training.
I have a Tesla C2075 with 6GB of memory and am using Keras CuDNNLSTM for faster training.
I have installed cuda-9 with cudnn=7.0.5 and tensorflow-gpu==1.12.0 on Ubuntu 16.04.
Is the Tesla C2075 compatible with cuda-9?
I checked https://developer.nvidia.com/cuda-gpus, which lists the Tesla C2075 with compute capability 2.0. What is compute capability?
When I run my model, the tensorflow log shows:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1482] Ignoring visible gpu device (device: 0, name: Tesla C2075, pci bus id: 0000:03:00.0, compute capability: 2.0) with Cuda compute capability 2.0. The minimum required Cuda capability is 3.5.
I also get an error during model.fit(...):
InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNN' with these attrs. Registered devices: [CPU,XLA_CPU,XLA_GPU], Registered kernels:
device='GPU'; T in [DT_DOUBLE]
device='GPU'; T in [DT_FLOAT]
device='GPU'; T in [DT_HALF]
[[node bidirectional_1/CudnnRNN (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py:922) = CudnnRNN[T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", is_training=true, rnn_mode="lstm", seed=87654321, seed2=0](bidirectional_1/transpose, bidirectional_1/ExpandDims_1, bidirectional_1/ExpandDims_2, bidirectional_1/concat)]]
Thanks
The CUDA compute capability describes the architecture and hardware features of the GPU; there is quite an extensive list on Wikipedia.
The TensorFlow webpage says you need a GPU with compute capability 3.5 or higher (older versions seemed to accept 3.0, but never lower).
Unfortunately this is a hardware limitation: the only way to change your compute capability is to use a different GPU. Simply put, you cannot run TensorFlow on that GPU.
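The comparison TensorFlow makes in the log line above can be sketched as follows. The helper names are hypothetical, and the default minimum of 3.5 is taken from that log message. Capabilities are compared as (major, minor) tuples rather than floats, so that a minor version with two digits would still order correctly.

```python
import re

_CAPABILITY = re.compile(r"compute capability: (\d+)\.(\d+)")


def parse_compute_capability(log_line):
    """Extract (major, minor) from a TensorFlow device log line, or None if absent."""
    match = _CAPABILITY.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(capability, minimum=(3, 5)):
    """True when the GPU's (major, minor) capability is at least the required one."""
    return capability >= minimum


line = ("Ignoring visible gpu device (device: 0, name: Tesla C2075, "
        "pci bus id: 0000:03:00.0, compute capability: 2.0) with Cuda compute "
        "capability 2.0. The minimum required Cuda capability is 3.5.")
capability = parse_compute_capability(line)
print(capability, meets_minimum(capability))  # (2, 0) False
```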

Tensorflow not showing "Successfully opened so & so CUDA libraries locally"

I configured tensorflow to work with CUDA support on my GPU (GeForce 840M), but programs run quite slowly compared to how they used to run on my CPU. Also, I do not get any message saying that such-and-such CUDA library was successfully opened when I run a program. Instead, this is what I get in the logs when I run any tensorflow program:
python Neuralnet.py
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
Extracting /tmp/data/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
2017-03-28 07:53:57.979382: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE4.1 instructions,
but these are available on your machine and could speed up CPU computations.
2017-03-28 07:53:57.979413: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use SSE4.2 instructions,
but these are available on your machine and could speed up CPU computations.
2017-03-28 07:53:57.979431: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use AVX instructions,
but these are available on your machine and could speed up CPU computations.
2017-03-28 07:53:57.979438: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use AVX2 instructions,
but these are available on your machine and could speed up CPU computations.
2017-03-28 07:53:57.979447: W tensorflow/core/platform/cpu_feature_guard.cc:45]
The TensorFlow library wasn't compiled to use FMA instructions,
but these are available on your machine and could speed up CPU computations.
2017-03-28 07:53:58.233876: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901]
successful NUMA node read from SysFS had negative value (-1),
but there must be at least one NUMA node, so returning NUMA node zero
2017-03-28 07:53:58.234333: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887]
Found device 0 with properties:
name: GeForce 840M
major: 5 minor: 0 memoryClockRate (GHz) 1.124
pciBusID 0000:08:00.0
Total memory: 1.96GiB
Free memory: 1.75GiB
2017-03-28 07:53:58.234362: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-03-28 07:53:58.234372: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-03-28 07:53:58.234388: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977]
Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce 840M, pci bus id: 0000:08:00.0)
('Epoch', 0, 'completed out of', 15, 'loss:', 115374329.04653475)
And so on. The program started running, but it didn't run as fast as I expected. I installed CUDA following the official documentation, but I did not reset the git master head since that was creating issues, and I used the provided optimization flags, bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package, when building through bazel.
Did you use nvidia-smi to tell whether you have the right cuda drivers installed and that your gpu is visible to the system?
In TF you can set the log_device_placement option to understand if any ops are being assigned to the GPU.
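Both suggestions can be sketched in a few lines. The function names are hypothetical. The nvidia-smi query flags used here are ones I believe the tool supports, and tf.debugging.set_log_device_placement is assumed to exist only in newer TensorFlow releases; on older ones, such as the 2017 build in this question, the equivalent is passing tf.ConfigProto(log_device_placement=True) to tf.Session.

```python
import shutil
import subprocess


def query_nvidia_smi(timeout=10):
    """Return 'name, driver_version' strings per GPU, or None if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout, check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def enable_device_placement_logging():
    """Turn on op placement logging when the newer-style switch is available."""
    try:
        import tensorflow as tf
    except ImportError:
        return False
    debugging = getattr(tf, "debugging", None)
    setter = getattr(debugging, "set_log_device_placement", None)
    if setter is None:
        # Older releases: use tf.Session(config=tf.ConfigProto(log_device_placement=True)).
        return False
    setter(True)
    return True
```

With placement logging on, each op reports the device it was assigned to, so ops that silently fall back to the CPU become visible.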
