I found this snippet on SO and ran it on my Ubuntu 22.04 VM:
import tensorflow as tf
from tensorflow.python.client import device_lib
print(tf.version.VERSION)
local_device_protos = device_lib.list_local_devices()
print([x.name for x in local_device_protos if x.device_type == 'CPU'])
output:
2.9.1
['/device:CPU:0']
I was suspicious about how little CPU my TF code uses. Now I see why.
Question: why just one? I have 20 CPUs allocated to this VM and htop shows all of them, but when I run a Jupyter notebook with TF code, the maximum CPU usage is 160%.
Any ideas on how to investigate this?
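One point that may help with the investigation: TensorFlow exposes all CPU cores as a single logical device, so seeing only '/device:CPU:0' does not by itself mean that one core is in use. Parallelism inside that device comes from the inter-op and intra-op thread pools. Below is a minimal diagnostic sketch, assuming TF 2.x; the helper name cpu_report is hypothetical, and the TensorFlow part is skipped if the import fails.

```python
import os


def cpu_report():
    """Collect the CPU counts the OS reports, plus TF thread settings if TF is importable."""
    report = {"os_cpu_count": os.cpu_count()}
    # sched_getaffinity exists on Linux only; it lists the cores this process may run on.
    if hasattr(os, "sched_getaffinity"):
        report["usable_cores"] = len(os.sched_getaffinity(0))
    try:
        import tensorflow as tf
    except ImportError:
        report["tensorflow"] = None
        return report
    report["tensorflow"] = tf.version.VERSION
    # A value of 0 means "let TensorFlow pick an appropriate number".
    report["inter_op_threads"] = tf.config.threading.get_inter_op_parallelism_threads()
    report["intra_op_threads"] = tf.config.threading.get_intra_op_parallelism_threads()
    return report


if __name__ == "__main__":
    for key, value in cpu_report().items():
        print(f"{key}: {value}")
```

If usable_cores is well below 20, the limit probably sits outside TensorFlow (CPU affinity or container settings). If it is 20 and both thread settings are 0, low utilization more likely comes from the workload itself, for example small ops or a slow input pipeline.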
Related
I've been trying to make TensorFlow 2.8.0 work with my GPU (GeForce GTX 1650 Ti) on Windows. TensorFlow detects the GPU, but any model I build gets stuck at Epoch 1 indefinitely when I call the fit method, until the kernel hangs and restarts (I've tried both Jupyter Notebook and Spyder).
Following TensorFlow's website, I downloaded the matching cuDNN and CUDA versions, and verified them (along with TensorFlow's detection of my GPU) by running the following commands:
CUDA (Supposed to be 11.2)
(on command line)
nvcc --version
Build cuda_11.2.r11.2/compiler.29373293_0
(In python)
import tensorflow.python.platform.build_info as build
print(build.build_info['cuda_version'])
Output: '64_112'
cuDNN (Supposed to be 8.1)
import tensorflow.python.platform.build_info as build
print(build.build_info['cudnn_version'])
Output: '64_8' # Looks like v8, but I actually installed v8.1 (cuDNN v8.1.1 (February 26th, 2021), for CUDA 11.0, 11.1 and 11.2), so I think it's fine?
GPU Checks
tf.config.list_physical_devices('GPU')
Output: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
tf.test.is_gpu_available()
Output: True
tf.test.gpu_device_name()
Output: This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Created device /device:GPU:0 with 2153 MB memory: -> device: 0, name: NVIDIA GeForce GTX 1650 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5
When I then try to fit any sort of model, it fails as described above. Surprisingly, although it can't run code such as that in TensorFlow's CNN tutorial, it does work when I run the chunk of code from this Stack Overflow question, which looks almost the same as every other chunk that failed.
Can someone help me with this issue? I've been desperately testing TensorFlow with every chunk of code I came across for the past couple of hours, and the only time it does not get stuck at Epoch 1 is with the code linked above.
(I've also tried running only on my CPU via os.environ['CUDA_VISIBLE_DEVICES'] = '-1', and everything seems to work fine.)
Update (Solution)
It seems the suggestions from this post helped. I copied the following files from the bin subfolder of the cuDNN zip (cudnn-11.2-windows-x64-v8.1.1.33\cuda\bin) into my CUDA bin folder (C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin):
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
It seems I initially misread the instruction to copy all cudnn*.dll files as copying only cudnn64_8.dll, rather than also copying every file listed above.
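To catch this kind of partial copy, a small check along these lines could help. It is only a sketch: the helper name missing_cudnn_dlls is hypothetical, the file names are the ones from this post (cuDNN 8.x for CUDA 11.2), and the default path is the install location quoted above, which may differ on other machines.

```python
from pathlib import Path

# File names taken from the list in the post, plus the main cudnn64_8.dll.
EXPECTED_CUDNN_DLLS = (
    "cudnn64_8.dll",
    "cudnn_adv_infer64_8.dll",
    "cudnn_adv_train64_8.dll",
    "cudnn_cnn_infer64_8.dll",
    "cudnn_cnn_train64_8.dll",
    "cudnn_ops_infer64_8.dll",
    "cudnn_ops_train64_8.dll",
)


def missing_cudnn_dlls(cuda_bin):
    """Return the expected cuDNN DLL names that are not present in cuda_bin."""
    cuda_bin = Path(cuda_bin)
    return [name for name in EXPECTED_CUDNN_DLLS if not (cuda_bin / name).is_file()]


if __name__ == "__main__":
    # Assumed install location, copied from the post; adjust to your setup.
    cuda_bin = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin"
    missing = missing_cudnn_dlls(cuda_bin)
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only shows that the files are in place; it does not prove that TensorFlow can load them.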
My system has a GPU.
When I run TensorFlow on it, TF automatically detects the GPU and starts running on it.
How can I change this?
I.e. how can I run Tensorflow without GPU?
This should work. It hides the CUDA devices from TensorFlow, so the code falls back to the CPU.
import os
#os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" #If the line below doesn't work, uncomment this line (make sure to comment the line below); it should help.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # Set this before TensorFlow initializes the GPU; doing it before the import is safest
import tensorflow as tf
#Your Code Here
I am using TensorFlow 1.14 and Keras 2.3.1, installed using Anaconda on a 64-bit system with 64-bit Python. I can't upgrade to TensorFlow 2 because of deprecated features I need for my application. I'm trying to compare TensorFlow's performance on different numbers of cores, but it always uses all the cores of my CPU. Here's the code I'm using to try to get it to run on a single core:
import tensorflow as tf

config = tf.ConfigProto(device_count={"CPU": 8},
                        inter_op_parallelism_threads=8,
                        intra_op_parallelism_threads=1)
sess = tf.Session(config=config)
with sess:
    with tf.device("/CPU:0"):
        pass  # Do stuff and look at CPU core usage.
From everything I can find on the web, this should work. I've tried multiple variations of it, but all cores of my CPU are still used at 100% :(
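For comparison, here is a minimal sketch of a configuration that asks for one thread in every pool. It assumes the goal really is a single core, so every count is set to 1 (the snippet above requests 8 CPU devices and 8 inter-op threads). The helper names single_core_settings and make_single_core_session are hypothetical, and tf.compat.v1 is used so the same code imports under TF 1.14 and 2.x. Setting OMP_NUM_THREADS is an assumption that only matters for MKL/OpenMP builds, which Anaconda packages often are; the post does not say which build is installed.

```python
import os


def single_core_settings():
    """Keyword arguments for ConfigProto that request one thread per pool."""
    return {
        "device_count": {"CPU": 1},
        "inter_op_parallelism_threads": 1,
        "intra_op_parallelism_threads": 1,
    }


def make_single_core_session():
    """Create a TF1-style session limited to one thread per pool."""
    # Assumption: relevant for MKL/OpenMP builds only; harmless otherwise.
    os.environ.setdefault("OMP_NUM_THREADS", "1")
    # Imported here so the environment variable is set before TensorFlow loads.
    import tensorflow as tf

    config = tf.compat.v1.ConfigProto(**single_core_settings())
    return tf.compat.v1.Session(config=config)
```

The session can be used like the one in the question (with make_single_core_session() as sess: ...). TensorFlow may still start a few helper threads, so total usage can sit slightly above one core.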
I am running TensorFlow 1.8.0, installed with 'pip install tensorflow-gpu', with CUDA 9.0 and cuDNN 7.1.4 on Windows 10, and I have been trying to get it to work with my NVIDIA GeForce GTX 860M graphics card. I am using IDLE to write and run my code.
When I run any code, it only runs on my CPU. Until I can get the MNIST TensorFlow tutorials running on my GPU, this is the sample code I'm using:
import tensorflow as tf
# Initialize two constants
x1 = tf.constant([1,2,3,4])
x2 = tf.constant([5,6,7,8])
# Multiply
result = tf.multiply(x1, x2)
# Initialize the Session
sess = tf.Session()
# Print the result
print(sess.run(result))
# Close the session
sess.close()
When I run this, I get this "failed to create session" error:
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
So I added the following lines to the beginning of the code
import os
os.environ["CUDA_VISIBLE_DEVICES"] = '1'
and now the code runs successfully with the correct output
[ 5 12 21 32]
but it only runs on my CPU, and when I try the following commands in the IDLE command line,
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
the output is
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 1631189080169918107
]
I have tried restarting my computer, uninstalling and reinstalling tensorflow-gpu, and deleting other versions of CUDA I previously had installed, but I cannot get my program to recognize my NVIDIA graphics card. I looked it up, and it is a supported card.
What else can I do to get my code to recognize and run on my graphics card?
Installing CUDA is not enough; you also need to install cuDNN 7.0.5 and some other dependencies. Refer to http://www.python36.com/install-tensorflow-gpu-windows/ for complete details.
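One more thing worth checking in the question's code: CUDA_VISIBLE_DEVICES takes zero-based device indices. On a machine with a single NVIDIA GPU, the value '1' points at a second GPU that does not exist, which leaves TensorFlow with no visible GPU and would match the CPU-only device list shown in the question. The sketch below is a simplified model of that rule for illustration only. The function visible_gpu_indices is hypothetical (it is not part of CUDA or TensorFlow), and it ignores UUID-style entries.

```python
def visible_gpu_indices(env_value, physical_gpu_count):
    """Simplified model of which GPU indices stay visible for a CUDA_VISIBLE_DEVICES value."""
    if env_value is None:
        # Variable not set: every physical GPU is visible.
        return list(range(physical_gpu_count))
    visible = []
    for token in env_value.split(","):
        token = token.strip()
        # An invalid entry hides itself and everything listed after it.
        if not token.isdigit() or int(token) >= physical_gpu_count:
            break
        visible.append(int(token))
    return visible


if __name__ == "__main__":
    for value in (None, "0", "1", "-1"):
        print(repr(value), "->", visible_gpu_indices(value, physical_gpu_count=1))
```

With one physical GPU, only an unset variable or the value '0' keeps the card visible in this model, so trying os.environ["CUDA_VISIBLE_DEVICES"] = '0' (or removing the line) is a reasonable next step before reinstalling anything.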
Has anyone found a way to get stable access to a GPU runtime?
At the moment I follow this process:
Runtime -> Change runtime type -> "Python 2" and "GPU" -> Save -> Runtime -> Connect to runtime...
And check if GPU is enabled:
import tensorflow as tf
tf.test.gpu_device_name()
However, I get '', though about 1 time in 30 I was able to connect. Does anyone have any idea what is going on?
The way to authoritatively know what kind of runtime you're connected to is to hover over the CONNECTED button on the top-right; if the hover tooltip is suffixed "(GPU)" then you've got a GPU.
You can test for the health of the GPU HW by inspecting the output of executing !/opt/bin/nvidia-smi (which will only be found on a GPU runtime, by the way).
Tensorflow not being able to see the GPU while nvidia-smi can is usually a symptom of having done something like:
!pip install -U tensorflow
which gets you a TF build that doesn't know how to talk to the GPU. All colaboratory runtimes already have TF preinstalled, so you should not need to re-install it. If you need a particular feature of TF that is not available in the pre-installed version, you can get a build that knows how to talk to the GPU with !pip install -U tensorflow-gpu though note that the pre-installed TF build is better optimized for the particular CPU platform used so you'll be giving up some performance, as well as using a lot more RAM.
If you've only got a reinstalled TF build as a result of !pip install -U'ing something else that depends on tensorflow, you can avoid this by specifying --upgrade-strategy=only-if-needed which should leave the pre-installed TF in place.
If you've messed up your runtime and want to wipe the slate clean, execute !kill -9 -1 and wait 15-30s to reconnect.
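The nvidia-smi check described above can also be scripted, so a notebook can tell early whether it landed on a GPU runtime. This is a minimal sketch: the helper name nvidia_smi_output is hypothetical, and the default path is the /opt/bin/nvidia-smi location mentioned in this answer, which may differ on other setups.

```python
import os
import subprocess


def nvidia_smi_output(path="/opt/bin/nvidia-smi"):
    """Return nvidia-smi's text output, or None if the binary is absent or fails."""
    if not os.path.isfile(path):
        return None
    try:
        result = subprocess.run([path], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    output = nvidia_smi_output()
    if output is None:
        print("nvidia-smi is unavailable; this is probably not a GPU runtime.")
    else:
        print(output)
```

If this prints the usual nvidia-smi table while tf.test.gpu_device_name() still returns '', the reinstalled TF build described above is the more likely cause.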