Installing TensorFlow Profiler for Dummies - python

I am trying to follow this guide to install the TensorFlow Profiler, to better understand why my recently installed Keras does run on the GPU but hardly uses any resources (and is really slow). However, I cannot get any results: the guide does not give me enough information, since I am not a programmer by trade and obviously lack the necessary knowledge.
What have I tried so far?
I use Anaconda and have a running version of Python 3.7 installed. I also installed TensorFlow and the necessary drivers so that TensorFlow can access my GPUs. Following the linked guide, I downloaded "install_and_run.py" and tried executing it from the conda prompt. I get asked to specify --envdir and --logdir. Where do I point these? Is the environment directory just the directory of my current conda environment? I tried pointing both envdir and logdir at that directory and ended up with an error saying that the command "True" is unknown and that 'True' returned non-zero exit status 1.
I could not come up with any solution for this. It should probably be mentioned that I have very little experience in using the conda prompt to run .py-files and usually only use it to install packages.
I am also unsure what is meant by the subsequent steps that talk about the CUPTI path. The given path is not a complete path as far as I know. Where am I supposed to look for it? Or am I meant to execute some of this
/sbin/ldconfig -N -v $(sed 's/:/ /g' <<< $LD_LIBRARY_PATH) | \
grep libcupti
as a command? I have tried running /sbin/ldconfig -N -v $ but my system could not find the path (potentially because I started looking from the wrong directory?).
Any help is much appreciated. Sorry for the potentially confusing post from a confused person.
Thank you!

The TensorFlow Profiler is no longer bundled with TensorBoard. There is a tutorial on how to install and run it when fitting a Keras model.
The summary is:
Inside your env run pip install tensorboard_plugin_profile
Declare a tensorboard callback as you normally would
tboard_callback = tf.keras.callbacks.TensorBoard(log_dir=logs,
                                                 histogram_freq=1,
                                                 profile_batch='500,520')
Fit your model (with declared tensorboard callback)
In a separate terminal (with your env activated) run tensorboard --logdir=path/to/logs
The Profiler tab shown in the tutorial may not be visible, but there should be a Profile option available in the drop-down menu in the top right corner.
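The steps above can be put together roughly as follows. This is only a minimal sketch: the helper names profiler_callback_kwargs and make_profiler_callback, the "logs" base directory, and the timestamped sub-directory are my own assumptions and not part of the tutorial. TensorFlow is imported inside the second function so that the first one can be tried without it.

```python
import datetime
import os


def profiler_callback_kwargs(base_dir="logs", first_batch=500, last_batch=520):
    """Build keyword arguments for tf.keras.callbacks.TensorBoard (hypothetical helper)."""
    # One sub-directory per run keeps separate runs apart in TensorBoard.
    run_name = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    return {
        "log_dir": os.path.join(base_dir, run_name),
        "histogram_freq": 1,
        # Range of batches to profile, written as 'start,stop' like in the answer.
        "profile_batch": f"{first_batch},{last_batch}",
    }


def make_profiler_callback(**kwargs):
    """Create the callback; needs tensorflow and tensorboard_plugin_profile installed."""
    import tensorflow as tf  # imported lazily so the helper above works without TF

    return tf.keras.callbacks.TensorBoard(**profiler_callback_kwargs(**kwargs))


# Usage with your own compiled Keras model and data (not defined here):
#   tboard_callback = make_profiler_callback(base_dir="logs")
#   model.fit(x_train, y_train, epochs=2, callbacks=[tboard_callback])
# Afterwards, in a second terminal with the env activated:
#   tensorboard --logdir=logs
```

If training runs for fewer batches than the profiled range, there is nothing to profile, so the range may need to be lowered for small datasets.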

Related

CUDA_HOME environment variable is not set

I have a working environment for using PyTorch deep learning with a GPU, and I ran into a problem when I tried using mmcv.ops.point_sample, which returned:
ModuleNotFoundError: No module named 'mmcv._ext'
I have read that you should actually use mmcv-full to solve it, but I got another error when I tried to install it:
pip install mmcv-full
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Which seems logical enough, since I never installed CUDA on my Ubuntu machine (I am not the administrator). Still, deep learning training ran fine on models I built myself, and I'm guessing the package came with the minimal code required for running CUDA tensor operations.
So my main question is: where is CUDA installed when it is used through the PyTorch package, and can I use that same path as the CUDA_HOME environment variable?
Additionally, if anyone knows some good sources for gaining insight into the internals of CUDA with PyTorch/TensorFlow, I'd like to take a look. (I have been reading the cudatoolkit documentation, which is cool, but it seems targeted more at C++ CUDA developers than at the inner workings between Python and the library.)
You can check whether CUDA is installed, and where, with these commands:
which nvidia-smi
which nvcc
cat /usr/local/cuda/version.txt
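The same checks can be done from Python, which may be handy inside the environment where pip runs. This is a rough sketch under assumptions: find_cuda_candidates is a made-up helper name, /usr/local/cuda is only the conventional install location from the command above, and whether any of these paths is acceptable as CUDA_HOME for building mmcv-full is not something the script can confirm.

```python
import os
import shutil


def find_cuda_candidates(environ=None):
    """Collect places where a CUDA toolkit might live (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Look for the nvcc compiler on the PATH of the given environment.
    nvcc = shutil.which("nvcc", path=environ.get("PATH"))
    return {
        "CUDA_HOME": environ.get("CUDA_HOME"),
        "CUDA_PATH": environ.get("CUDA_PATH"),
        "nvcc": nvcc,
        # If nvcc sits in <root>/bin/nvcc, <root> is a candidate install root.
        "root_from_nvcc": os.path.dirname(os.path.dirname(nvcc)) if nvcc else None,
        # Conventional location on Linux; it may simply not exist.
        "default_root_exists": os.path.isdir("/usr/local/cuda"),
    }


if __name__ == "__main__":
    for key, value in find_cuda_candidates().items():
        print(f"{key}: {value}")
```

If nvcc is not found anywhere, that would suggest only the runtime libraries shipped with the PyTorch package are present and no full toolkit is installed.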

Environment issues with running Anaconda Python in VS Code

I am trying to learn Python and debug code for the first time in VS Code (latest edition). I have anaconda running and the code I have runs fine by itself but now I need to know how to update the code and debug it for the first time.
I keep getting the following error related to NumPy:
Exception has occurred: ImportError
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
The Python version is: Python3.7 from "C:\Miniconda2\envs\myproject_flask\python.exe"
The NumPy version is: "1.18.5"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: DLL load failed: The specified module could not be found.
In the Miniconda path above the Python.exe is version 3.7.7. I tried to install NumPy like so in myproject directory:
(myproject_flask) c:\MyProject\source\MyProject.Flask>conda install numpy=1.18.5
I still get the same error when I go to F5 to debug and run to a breakpoint.
Need help with my environment.
I need to use VS Code in my Windows environment with Anaconda.
You should launch VS Code from Anaconda Navigator so that the environment is initialized.
When you have problems with importing the numpy C-extensions in Anaconda it's very likely that your virtual environment hasn't been activated. Having just Python on the path is not good enough. You also need access to the libraries. This is what the activation does.
When you are running/debugging code you should see a terminal open. You can tell from the prompt
(myproject_flask) C:\MyProject
that conda has been activated. Sometimes this just takes a bit too long for VSCode. So simply push the start button a second time.
Note that the Code Runner extension in VSCode is also known to cause this kind of problem.
However, I wonder why you are using Miniconda2 with Python3, although in general this should work.
I was getting the error even after adding all the Anaconda paths. It was due to VS Code running the code in the Python debugger terminal, which is not able to enter the conda environment.
This worked for me:
Press Ctrl+Shift+P > type "Terminal: Select Default Profile" > select "Command Prompt".
After this, the code ran in Command Prompt by default, inside the environment.
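To see whether the process started by the debugger really runs inside an activated conda environment, something like the following can be put at the top of the script being debugged, before importing numpy. It is a sketch, not a definitive check: conda_activation_report is a hypothetical helper, and it relies only on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables that conda sets on activation.

```python
import os
import sys


def _inside(path, root):
    """True if path equals root or lies below it (case-insensitive on Windows)."""
    path = os.path.normcase(os.path.normpath(path))
    root = os.path.normcase(os.path.normpath(root))
    return path == root or path.startswith(root + os.sep)


def conda_activation_report(environ=None, executable=None):
    """Summarise whether a conda env looks activated for this process."""
    environ = os.environ if environ is None else environ
    executable = sys.executable if executable is None else executable
    prefix = environ.get("CONDA_PREFIX")  # set by `conda activate`
    entries = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    return {
        "activated": prefix is not None,
        "env_name": environ.get("CONDA_DEFAULT_ENV"),
        # Is the running python.exe the one from the activated env?
        "interpreter_in_env": prefix is not None and _inside(executable, prefix),
        # Are the env's own directories (where its DLLs live) on PATH?
        "env_dirs_on_path": prefix is not None
        and any(_inside(p, prefix) for p in entries),
    }


if __name__ == "__main__":
    for key, value in conda_activation_report().items():
        print(f"{key}: {value}")
```

If "activated" or "env_dirs_on_path" comes out False under F5 but True in a normal Anaconda prompt, the terminal used by the debugger is the likely culprit.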

Cannot Import TensorFlow - ModuleNotFoundError: No module named 'tensorflow'

I have seen TensorFlow installation issues many times on this site. I have tried most of the fixes that I've come across, but none of them have quite worked.
Technical information:
Windows 10, 64 bit
Python version: Python 3.6
PyCharm
Current conda virtual environment: tensorflow_env, with interpreter as "Python 3.6 (tensorflow_env) (2) C:\Users\username\Anaconda3\envs\tensorflow_env\pythonw.exe"
Please let me know if I'm missing any important information, so I can add it.
Description of the issue:
In PyCharm, I have the code "import tensorflow" which results in the error mentioned in the title.
Now, when I start typing "import tens" it auto-completes to "import tensorflow" for me (see the figure below), which makes me think that it sees the module in some way, but it just can't import for some reason.
Additionally, my project interpreter has tensorflow as a listed package (see the figure below).
In order to install tensorflow, I have tried a number of methods. Here are some of them.
pip install tensorflow (resulted in "successful installation of . . .")
pip3 install tensorflow (resulted in "successful installation of . . .")
A number of ways of upgrading tensorflow (e.g. pip upgrade), most of which have resulted in success messages.
Through the interpreter itself, resulting in the following image (note the "Package 'tensorflow' installed correctly" message at the bottom).
Now, that all being said, I can technically import tensorflow using the command prompt and the following commands.
conda activate tensorflow_env
python
import tensorflow (does not result in error)
import keras (results in "Using TensorFlow backend.")
However, I would like to use this outside the command prompt.
I've tried a few Youtube tutorials (most notably tech with tim's video on the subject, as well as his troubleshooting video), the instructions on the tensorflow website, removing every version of python from my computer and trying again (twice), and the instructions I've seen on other posts on stackoverflow.
I'm certain I'm just missing something simple and obvious, but I need some help figuring out what that is.
I appreciate it. Thanks!
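One way to narrow this down might be to run the same small script once from the conda prompt and once from PyCharm, and compare the output. This is only a diagnostic sketch with a made-up helper name (describe_module); it does not import tensorflow, it only asks Python where it would look for it.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can locate a module."""
    spec = importlib.util.find_spec(name)  # None if the module cannot be found
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_module("tensorflow").items():
        print(f"{key}: {value}")
```

If the two runs print different interpreter paths, the run configuration in PyCharm is presumably not using the tensorflow_env interpreter that the command prompt uses.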

Encountering Import Error DLL load failed constantly

I have been trying to install scikit-learn and PyTorch using their respective commands given in the docs:
The commands for installing PyTorch are:
1) pip3 install https://download.pytorch.org/whl/cpu/torch-1.0.1-cp37-cp37m-win_amd64.whl
2) pip3 install torchvision
The command for installing scikit-learn is:
pip install -U scikit-learn
Some background:
I am using Windows 8.1, Python 3.7.2. My pip is updated. I have also installed Anaconda to try solving this with conda, but had zero luck! (Also, here I am running into a 'conda' unrecognized error, which is another story.) Here are the paths my PATH variable holds.
PATH
C:\Users\satya\Anaconda3;
C:\Users\satya\Anaconda3\Library\mingw-w64\bin;
C:\Users\satya\Anaconda3\Library\usr\bin;
C:\Users\satya\Anaconda3\Library\bin;
C:\Users\satya\Anaconda3\Scripts;
C:\Users\satya\AppData\Local\Programs\Python\Python37\Scripts\;
C:\Users\satya\AppData\Local\Programs\Python\Python37\; C:\Users\satya\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Python 3.7
The Actual Problem:
The same installation commands given above work perfectly fine on my other Windows 10 machine, but on my Windows 8.1 machine they lead to this error, which has become a real PITA:
Import Error: DLL load failed The specified module could not be found
When I import sklearn or import torch I get the exact same error. All the time.
Back Story:
I have searched almost all the related questions I could find on Stackoverflow and Github for 6+ hours to help me solve this problem. But none of the answers have helped so far, and some haven't had an "understandable" answer. Maybe it's just a small fix, but now I am choosing to post a question on SO.
My Question Again:
Can someone please help out and try to explain what I am missing here? I really want to fix this error for good (and want to be in a position to fix it if I encounter it again). An elaborate answer would really help me understand.
Thank You!
Please check your python build number with the following command.
conda list python
Python 3.7.2 with build number h8c8aaf0_2 has an issue that has since been solved.
If this is the case, an update will do.
conda update python

Tensorflow GPU setup: error with CUDA on PyCharm

I have TF 0.8 installed on Python3, MacOSX El Capitan.
When running a simple test code for TF I get this message:
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10):
Library not loaded: @rpath/libcudart.7.5.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
My .bash_profile is as follows:
export PATH=/usr/local/bin:$PATH
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-7.5/lib:/usr/local/cuda/lib
In /Developer/NVIDIA/CUDA-7.5/lib I have a file called libcudart.7.5.dylib
In /usr/local/cuda/lib I have an alias called libcudart.7.5.dylib
I have tried several permutations of .bash_profile without success. Any idea what may be wrong?
Note that I can successfully use my GPU with Theano so there's no reason to believe the GPU/cuDNN/CUDA install may be faulty.
If you're getting this error, make sure you installed CUDA, cuDNN correctly as described in the Tensorflow install instructions. Note the TF, CUDA, cuDNN version you're installing and what Python version you're using.
Filenames, paths etc frequently vary, so small tweaks in filenames and paths may be needed in your case if errors are cropping up. Sometimes it's hard for others to help you because your system may have a very specific path setup/version that cannot be understood by someone in a forum.
If you're getting exactly the error I'm describing in the OP, take a step back and check:
is this happening in PyCharm?
is this happening in iPython within PyCharm?
is it happening in both?
is this happening in iPython in Terminal?
In my case it was happening only in PyCharm. In iPython outside of PyCharm (that is using the Mac 'Terminal' software) everything worked fine. But when doing iPython in PyCharm, or when running the test file through PyCharm, I would get the error. This means it has something to do with PyCharm, not the Tensorflow install.
Make sure your DYLD_LIBRARY_PATH is correctly pointing to the libcudart.7.5.dylib file. Navigate there with Finder, or do a Spotlight search and find the file or its alias. Then place that path in your .bash_profile. In my case, this is working:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib
If your problem is PyCharm, a specific configuration is needed. Go to the upper right corner of the GUI and click the gray down arrow.
Choose "Edit Configuration". You will see an Environment option where you need to click on the ... box and enter the DYLD_LIBRARY_PATH that applies to your case.
Note that there's an Environment option for the specific file you're working on (it will be highlighted in the left panel) and for Defaults (put DYLD_... there as well if you want future files you create to have this). Note that you need to save this config or else when you close PyCharm it won't stick.
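To confirm what a script actually sees when it is started from PyCharm versus Terminal, a small check like this can be run in both. It is a sketch under assumptions: find_library is a hypothetical helper, and the file name is the one from the error message above, so it needs adjusting for other CUDA versions.

```python
import os


def find_library(filename, search_path):
    """Return every directory in a colon-separated search path that holds filename."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # os.path.exists follows symlinks, so a working symlink counts as a hit.
        if directory and os.path.exists(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    value = os.environ.get("DYLD_LIBRARY_PATH", "")
    print("DYLD_LIBRARY_PATH seen by this process:", value or "(not set)")
    print("libcudart.7.5.dylib found in:", find_library("libcudart.7.5.dylib", value))
```

If the variable shows up as "(not set)" only when the script is run from PyCharm, the run configuration is the place to fix it, as described above.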
This is an extension to pepe's answer, which is the correct one. I don't mind if the following is integrated into the original answer.
I would like to add that you can make this change permanent in PyCharm (it affects only the current project), so that you do not have to repeat it for every new file. In the interface shown above by pepe, set DYLD_LIBRARY_PATH under "Defaults".
Keep in mind that this change by itself doesn't alter the run configuration of the current script, which still needs to be changed manually (or deleted and regenerated from the new defaults).
Could you try TF 0.9, which adds Mac OS X GPU support?
https://github.com/tensorflow/tensorflow/blob/r0.9/RELEASE.md
It appears that CUDA is not being found on your system. This could be for a number of reasons, including installing CUDA for one version of Python while running a different version of Python that isn't aware of the other version's installed files.
Please take a look at my answer here.
https://stackoverflow.com/a/41073045/1831325
In my case, the TensorFlow version is 1.1, and the dlopen error happens in both IPython and PyCharm.
Environment:
Cuda version:8.0.62
cudnn version:6
The error is a little different in PyCharm and IPython. I cannot remember the details, but IPython says there is no libcudnn.5.dylib, while PyCharm just says there is an import error, image not found.
Solution:
Download cudnn version 5, from
https://developer.nvidia.com/rdp/cudnn-download
Unzip the cudnn. Copy lib/ to /usr/local/cuda/lib. Copy include/ to /usr/local/cuda/include
unzip cuda.zip
cd cuda
sudo cp -P lib/libcudnn* /usr/local/cuda/lib
sudo cp include/cudnn.h /usr/local/cuda/include
Add the lib directory path to your DYLD_LIBRARY_PATH, like this in my ~/.bash_profile:
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-8.0/lib:/usr/local/cuda/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
The official TensorFlow installation guide says that cuDNN 5.1 is needed, so this was all down to my own carelessness.
https://www.tensorflow.org/install/install_mac
Requirements to run TensorFlow with GPU support.
If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:
...
The NVIDIA drivers associated with CUDA Toolkit 8.0.
cuDNN v5.1. For details, see NVIDIA's documentation.
...
I think the problem is SIP (System Integrity Protection). Restricted processes run with cleared environment variables, which is why you get this error.
You need to boot into Recovery Mode, start Terminal, and enter
$ csrutil disable
and then reboot.
