Recently, the Ubuntu 18.04 server I work on was upgraded from TensorFlow 2.0.0 to 2.4.0, and code that previously accessed the GPUs without issue stopped working. Running pip list from my Jupyter notebook, I noticed there are now two versions installed. I also tried tf.test.gpu_device_name(), which returned nothing. Previously I was assigning a GPU to my code with:
import os
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" # see issue #152
os.environ["CUDA_VISIBLE_DEVICES"]="1"
And to see the list of all devices, I was using:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
After the upgrade, the code above returns only the CPU name, not the GPUs.
My questions are:
Could this problem be caused by the multiple versions installed on the server? If so, can I select a particular version to run my code? Right now I see tensorflow-gpu 2.3.0 and tf-nightly 2.4.0. I know uninstalling one of them could solve it, but I don't have sudo access.
Do I need to use new code to assign a GPU because of the version change?
Do I need to upgrade the whole codebase to make it compatible with TF 2.4?
I also think tf-nightly-gpu may solve the problem, but I need to be 100% sure.
I am using python3. Thank you.
To handle multiple TensorFlow installations and GPU access, you should use Anaconda; this also avoids the sudo problem. Create an environment, install the cudatoolkit package, and then install tf-nightly. You can check here for an earlier version as an example. You should not have to change anything in your code. Furthermore, from TF 2.x onward the GPU build ships together with the CPU package, so tf-nightly-gpu is not necessary.
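If the environment is set up correctly, the TF 2.x device APIs replace the old device_lib call; here is a minimal sketch, assuming the machine exposes at least two GPUs (as CUDA_VISIBLE_DEVICES="1" above implies):
import tensorflow as tf
# TF 2.1+ replacement for device_lib.list_local_devices()
gpus = tf.config.list_physical_devices('GPU')
print(gpus)
# Restrict TensorFlow to one GPU, roughly what CUDA_VISIBLE_DEVICES="1" did before
if len(gpus) > 1:
    tf.config.set_visible_devices(gpus[1], 'GPU')
    print(tf.config.get_visible_devices('GPU'))
Setting CUDA_VISIBLE_DEVICES before importing TensorFlow also still works, so the old assignment code does not have to change.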
I have a working environment for PyTorch deep learning with a GPU, and I ran into a problem when I tried using mmcv.ops.point_sample, which returned:
ModuleNotFoundError: No module named 'mmcv._ext'
I have read that you should actually use mmcv-full to solve it, but I got another error when I tried to install it:
pip install mmcv-full
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
That seems logical enough, since I never installed CUDA on my Ubuntu machine (I am not the administrator), yet it still ran deep learning training fine on models I built myself; I'm guessing the package came with the minimal code required for running CUDA tensor operations.
So my main question is: where is CUDA installed when used through the PyTorch package, and can I use that path as the CUDA_HOME environment variable?
Additionally, if anyone knows some nice sources for gaining insight into the internals of CUDA with PyTorch/TensorFlow, I'd like to take a look (I have been reading the CUDA toolkit documentation, which is nice, but it seems targeted at C++ CUDA developers rather than at how Python interacts with the library).
You can check it and inspect the paths with these commands:
which nvidia-smi
which nvcc
cat /usr/local/cuda/version.txt
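From within Python, PyTorch also reports the CUDA version it was built with and the CUDA root its extension builder detected; a small sketch (this only inspects the installed wheel, it does not set CUDA_HOME for you):
import torch
from torch.utils.cpp_extension import CUDA_HOME
print(torch.version.cuda)         # CUDA version the PyTorch wheel was built against
print(torch.cuda.is_available())  # whether a usable GPU and driver are present
print(CUDA_HOME)                  # CUDA root PyTorch found on the system, or None
If CUDA_HOME prints None, the wheel most likely only bundles the CUDA runtime libraries; building mmcv-full from source still needs a full CUDA toolkit install that provides nvcc.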
This question already has answers here:
Fix not load dynamic library for Tensorflow GPU
(5 answers)
Closed 1 year ago.
I was originally running TensorFlow from PyCharm.
In PyCharm, the message in the title did not appear.
But after I switched to VS Code and installed the Python extension,
the error in the title appears repeatedly whenever I write and execute import tensorflow as tf:
ImportError: Could not load dynamic library 'cudart64_110.dll'
Considering that there was no problem in PyCharm, it does not seem to be an environment variable problem.
When I type the same command that was executed in VS Code into a command prompt window, another message appears:
"Connection failed because the target computer refused to connect."
My OS: Windows 10
I am using Anaconda, and I created a virtual environment.
vscode ver : 1.53.2
tensorflow ver : 2.4.1
CUDA : 11.2
cudnn : 8.1
It is due to TensorFlow GPU support. TensorFlow now comes with GPU support in the main package, and the system needs a supported graphics card plus CUDA and cuDNN installations. If you missed the CUDA installation, you will get the above message. The latest versions of TensorFlow sometimes won't run without CUDA.
Try to install tensorflow 1.15 and python 3.7.4
https://www.python.org/ftp/python/3.7.4/python-3.7.4-amd64.exe
pip install tensorflow==1.15
NB: Normally TensorFlow will run without CUDA, but the message will always be shown in the prompt.
I would agree that this is due to your CUDA version. Check the bottom of the TensorFlow GPU build configuration table: it says that for 2.4 you need CUDA 11.0 and cuDNN 8.0, neither of which you have; in addition, you need MSVC 2019 to compile it.
Notice that for newer versions of tensorflow-gpu (>=2.3.0), conda will NOT download everything; you need to install these components manually.
Because all the evidence points to a GPU support problem, and tensorflow-gpu can still run without using the GPU, it is possible that it was running on the CPU when you used PyCharm.
I would suggest you double-check that it runs as intended in PyCharm with
print(tf.config.list_physical_devices('GPU'))
or just simply reinstall everything
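A slightly fuller check, as a sketch, that also reports whether the installed wheel was built with CUDA at all:
import tensorflow as tf
print(tf.__version__)
print(tf.test.is_built_with_cuda())            # False means a CPU-only build
print(tf.config.list_physical_devices('GPU'))  # empty list means no usable GPU was found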
I copied "cudart64_110.dll" into the CUDA/v11.2/bin folder and the problem was resolved.
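If you want to confirm the DLL is now discoverable before re-running TensorFlow, one quick check on Windows is to load it directly with ctypes (this assumes the bin folder is on PATH, as the CUDA installer normally arranges):
import ctypes
# Raises OSError if cudart64_110.dll cannot be found on the DLL search path
ctypes.WinDLL("cudart64_110.dll")
print("cudart64_110.dll loaded successfully")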
For a university assignment we are supposed to implement a TensorFlow project using the Python libraries for tensorflow and keras. I can install both of them just fine using pip3, but executing any piece of code results in some kind of error.
I've settled on testing the very complicated code:
import keras
Using python 3.6 and the newest tensorflow and keras (pip3 install tensorflow keras) I get the error ModuleNotFoundError: No module named 'tensorflow.python'; 'tensorflow' is not a package. I checked, and import tensorflow finds the package, but returns some error about AVX instructions and dumps the core.
I researched, and my CPU does not support AVX instructions which are part of tensorflow >= 1.6.0. I could not find a precompiled version that runs on my laptop without AVX, and I don't have the time to compile myself.
I tried downgrading to tensorflow == 1.5.0 and keras == 2.1.3, which was the current Keras version when tensorflow == 1.5.0 was around, but I still get errors about missing modules or attributes, a different one for each version and import statement.
For example when I use the code:
import keras
from keras.datasets import mnist
I instead get the error AttributeError: module 'keras.utils' has no attribute 'Sequence'. I'm on an Intel Pentium, which I assume is the problem. I am fully aware that my setup is in no way suitable for machine learning, and it isn't supposed to be, but nevertheless I'd like to work on that assignment.
Anyone got experience with installing TensorFlow on older machines?
System:
Ubuntu 18.04.2 LTS
Intel(R) Pentium(R) 3556U # 1.70GHz (Dual Core)
4GB RAM
I had the same trouble, but I seem to have solved it. (However, the Python version must be 3.5.)
For CPUs that do not support AVX, the tensorflow must be version 1.5 or lower.
If you want to install Tensorflow 1.5, the Python version must be 3.5 or lower.
The successful procedure is as follows.
(1) Uninstall your Anaconda.
(2) Download the following version of Anaconda from the following
URL. Version: Anaconda3-4.2.0-Windows-x86_64.exe
URL:https://repo.anaconda.com/archive/ or https://repo.anaconda.com/archive/Anaconda3-4.2.0-Windows-x86_64.exe
(3) Double-click the anaconda icon of “(2)” above, and install the
anaconda according to the GUI instructions.
(4) Start Anaconda Prompt
(5) Enter “pip install tensorflow==1.5” in Anaconda Prompt and press
the return key. Wait for the installation to finish. (See the log)
(6) Enter "pip install keras==2.2.4" in Anaconda Prompt and press the
return key. Wait for the installation to finish.(See the log)
This completes the installation. If you enter "import tensorflow" in a Jupyter notebook, some future-deprecation warnings may be displayed. (See this log.)
System:
My PC does not support AVX like your PC. My PC's specs are as follows.
PC:Surface Go
CPU:Intel(R) Pentium(R) CPU 4415Y @ 1.60 GHz
Windows10:64bit
How to test?
Enter and execute the following commands in a Jupyter notebook, or use this file.
import tensorflow as tf
print(tf.__version__)
print(tf.keras.__version__)
or
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
If your install is successful, the following output will be displayed in your Jupyter notebook:
1.5.0
2.1.2-tf
P.S.
I'm not very good at English, so I'm sorry if I have some impolite or unclear expressions.
Sticking with the Pentium configuration is not recommended for default TensorFlow builds because of the AVX dependencies. Also, many recent advances in this area are not available in earlier builds of TF, so you will find it difficult to replicate research work. Your options:
Get a Google Colab (https://colab.research.google.com/) notebook, install Keras and TF and get going with your work
There have been genuine requests for this support; refer to this link [https://github.com/tensorflow/tensorflow/issues/18689], where unofficial builds are provided. See if one of them works.
Build TensorFlow from scratch (the hardest option), with the right set of Bazel flags (remove all AVX/threading options).
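Before picking an option, it may help to confirm that the CPU really lacks AVX; a small Linux-only sketch that reads the kernel's CPU flags:
# Check whether the CPU advertises AVX support (Linux only)
with open('/proc/cpuinfo') as f:
    flag_lines = [line for line in f if line.startswith('flags')]
has_avx = any('avx' in line.split() for line in flag_lines)
print('AVX supported:', has_avx)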
I had various issues getting TensorFlow onto my system and eventually succeeded with v1.4.1. Trying to run this: https://github.com/sherjilozair/char-rnn-tensorflow
SystemError: built-in function AppendInt32ArrayToTensorProto returned NULL without setting an error
I searched and couldn't find this specific issue, or any patches for it in newer versions.
You are using an older Tensorflow version, which is probably not compatible with your current python version.
check your computer configuration and install a matching Tensorflow version with the help of the following table: https://www.tensorflow.org/install/pip#package-location
Install a python version that matches your Tensorflow version (also can be found in the link provided above)
Check your python version: $ python3 --version
Check your Tensorflow version:$ pip3 list | grep tensorflow
If the versions match as stated in the table above, you should get rid of the error.
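As a quick sanity check from inside the interpreter (equivalent to the shell commands above), a small sketch:
import sys
import tensorflow as tf
print(sys.version)      # Python version the script is actually running under
print(tf.__version__)   # installed TensorFlow version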
I encountered a similar problem when I was trying to run the TensorFlow image retraining script: https://github.com/tensorflow/hub/raw/master/examples/image_retraining/retrain.py
In my case the problem was caused by Tensorflow 1.11.0 not being compatible with python 3.7.0.
Steps that solved the problem for me:
Uninstall python 3.7.0.
Install python 3.6.0.
I ran the script again, and now it runs properly.
Hope it will help :)
I just installed Tensorflow 1.0.0 using pip. When running, I get warnings like the one shown below.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
I get 5 more similar warnings for SSE4.1, SSE4.2, AVX, AVX2, and FMA.
Despite these warnings the program seems to run fine.
export TF_CPP_MIN_LOG_LEVEL=2 solved the problem for me on Ubuntu.
https://github.com/tensorflow/tensorflow/issues/7778
My proposed way to solve the problem:
#!/usr/bin/env python3
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'  # must be set before TensorFlow is imported
import tensorflow as tf
This should work on at least any Debian or Ubuntu system.
I don't know much about C, but I found this:
bazel build --linkopt='-lrt' -c opt --copt=-mavx --copt=-msse4.2 --copt=-msse4.1 --copt=-msse3 -k //tensorflow/tools/pip_package:build_pip_package
How did you build your program?
It seems that even if you don't have a compatible (i.e. Nvidia) GPU, you can actually still install the precompiled package for tensorflow-gpu via pip install tensorflow-gpu. It looks like in addition to the GPU support it also supports (or at least doesn't complain about) the CPU instruction set extensions like SSE3, AVX, etc. The only downside I've observed is that the Python wheel is a fair bit larger: 90MB for tensorflow-gpu instead of 42MB for plain tensorflow.
On my machine without an Nvidia GPU I've confirmed that tensorflow-gpu 1.0 runs fine without displaying the cpu_feature_guard warnings.
It would seem that the PIP build for the GPU is bad as well as I get the warnings with the GPU version and the GPU installed...
Those are simply warnings.
They are just informing you that if you build TensorFlow from source, it can be faster on your machine.
Those instructions are not enabled by default in the available builds, I think so that the builds stay compatible with as many CPUs as possible.
As the warnings say, you should only compile TF with these flags if you need to make TF faster.
You can use the TF environment variable TF_CPP_MIN_LOG_LEVEL, which works as follows:
It defaults to 0, displaying all logs.
To filter out INFO logs, set it to 1.
To additionally filter out WARNING logs, set it to 2.
To additionally filter out ERROR logs, set it to 3.
So you can do the following to silence the warnings:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf