Setting up Anaconda to use Tensorboard Profiler - python

I just installed the TensorBoard profiler with
pip install -U tensorboard_plugin_profile
The plugin version is 2.3.
TensorFlow version: 2.3
TensorBoard version: 2.3
cudatoolkit version: 10.1.243
When I now try to open the Profile tab in TensorBoard, the profiler window appears as normal, but it is empty and shows this error message:
DEM6561: Failed to load libcupti (is it installed and accessible?)
And the warning:
No step marker observed and hence the step time is unknown. This may happen if (1) training steps are not instrumented (e.g., if you are not using Keras) or (2) the profiling duration is shorter than the step time. For (1), you need to add step instrumentation; for (2), you may try to profile longer.
I think it has something to do with the environment paths and variables, but I don't know how they work with Anaconda's virtual environments. (I don't have a CUDA folder I can link to.)
Has anyone had the same problem, or does anyone have ideas about what I can try?
Thanks ahead!

First, make sure that the CUPTI directory has been added to PATH (via Environment Variables if you're using Windows). The path should look like this:
%CUDA_PATH%\extras\CUPTI\lib64
Second, check whether TensorFlow is looking for the correct CUPTI DLL. I've encountered this exact issue, and when I checked, it appeared that TF 2.4 looks for cupti64_110.dll instead of cupti64_2020.1.1.dll. It is currently a known issue and will be addressed in TF 2.5. I'm not sure whether the same applies to TF 2.3.
I basically resolved the issue by copying the DLL within the same directory and renaming the copy. Let me know if this helps!
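Both checks can be scripted. A minimal sketch, assuming a Windows layout: it derives the CUPTI folder from the `CUDA_PATH` variable, lists any `cupti64*.dll` files found on `PATH`, and copies one under the name TensorFlow asks for. The helper names are hypothetical, and the default target `cupti64_110.dll` is only the name from the TF 2.4 case above; use whatever name your own TensorFlow log reports.

```python
import glob
import os
import shutil


def cupti_dir_from_cuda_path(cuda_path):
    """Return the CUPTI lib64 folder under a CUDA install (hypothetical helper)."""
    return os.path.join(cuda_path, "extras", "CUPTI", "lib64")


def find_cupti_dlls(search_path):
    """List cupti64*.dll files in every directory of a PATH-style string."""
    matches = []
    for directory in search_path.split(os.pathsep):
        if directory:  # skip empty entries such as a trailing separator
            matches.extend(sorted(glob.glob(os.path.join(directory, "cupti64*.dll"))))
    return matches


def copy_as_expected_name(existing_dll, expected_name="cupti64_110.dll"):
    """Copy a CUPTI DLL next to itself under the name TensorFlow looks for.

    The default name is an assumption taken from the TF 2.4 report.
    """
    target = os.path.join(os.path.dirname(existing_dll), expected_name)
    if not os.path.exists(target):  # never overwrite an existing file
        shutil.copyfile(existing_dll, target)
    return target


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        print("CUPTI should be in:", cupti_dir_from_cuda_path(cuda_path))
    found = find_cupti_dlls(os.environ.get("PATH", ""))
    print("CUPTI DLLs on PATH:", found or "none")
```

Run it from the same Anaconda environment TensorBoard uses, so it sees the same `PATH`.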

Related

CUDA not available in PyTorch after following all installation steps

I am trying to get my BERT transformers model to run on CUDA and have followed all the installation steps here:
https://medium.com/#jjlovesstudying/python-cuda-set-up-on-windows-10-for-gpu-support-78126284b085
However, after adding the folders to the PATH variable, I restarted PyCharm and ran the following command:
torch.cuda.is_available()
which returns False. I appreciate this is not reproducible, but does anyone have an idea how to debug or fix this? I'm happy to provide any extra information needed.
Have you checked that the modules are loaded at each step?
What do you get for import torch?
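One way to gather that information is a short script run with the same interpreter PyCharm uses. This is a diagnostic sketch, not a fix: it reports the installed PyTorch version, the CUDA version the wheel was built with (`torch.version.cuda`, which is `None` for CPU-only builds), and whether a GPU is visible. The function name `cuda_diagnostics` is hypothetical.

```python
import importlib
import importlib.util


def cuda_diagnostics():
    """Collect basic facts about the installed PyTorch build and CUDA visibility."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    torch = importlib.import_module("torch")
    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # None here means a CPU-only PyTorch build is installed.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints `None`, changing PATH will not help; a CUDA-enabled PyTorch build is needed instead.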

Could not load library cudnn_ops_infer64_8.dll. Error code 126 Please make sure cudnn_ops_infer64_8.dll is in your library path

Could not load library cudnn_ops_infer64_8.dll. Error code 126 Please make sure cudnn_ops_infer64_8.dll is in your library path.
I've been searching online for hours and haven't found anything, so I would really appreciate anyone sharing their thoughts. I'm trying to run the ai-benchmark library, which internally tests GPU performance against popular datasets.
You should have downloaded the cuDNN zip file. Extract it, and in its bin folder you will see:
cudnn_adv_infer64_8.dll
cudnn_adv_train64_8.dll
cudnn_cnn_infer64_8.dll
cudnn_cnn_train64_8.dll
cudnn_ops_infer64_8.dll
cudnn_ops_train64_8.dll
Copy these files into the bin folder of your NVIDIA GPU Computing Toolkit\CUDA installation.
In my case, the cause was that I had not installed zlib; after installing it, everything works.
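The copy step can be scripted. A minimal sketch, assuming you have already extracted the cuDNN archive: `copy_cudnn_dlls` is a hypothetical helper, and the glob pattern `cudnn*64_8.dll` is an assumption chosen to match the cuDNN 8 file names listed above (it also picks up the main `cudnn64_8.dll` if present).

```python
import glob
import os
import shutil


def copy_cudnn_dlls(cudnn_bin, cuda_bin, pattern="cudnn*64_8.dll"):
    """Copy cuDNN DLLs from an extracted cuDNN bin folder into the CUDA bin folder.

    Returns the copied file names in sorted order.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(cudnn_bin, pattern))):
        name = os.path.basename(src)
        # copy2 also preserves file timestamps
        shutil.copy2(src, os.path.join(cuda_bin, name))
        copied.append(name)
    return copied
```

Point `cudnn_bin` at the extracted archive's bin folder and `cuda_bin` at your CUDA toolkit's bin folder. Writing under Program Files usually requires an administrator prompt.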
Leaving an answer to respond to Diego Rueda's comment on MADM4X's post.
I ran into the same issue: I copied the cuDNN files into my CUDA toolkit install and received error code 126.
You need to specifically download/copy cuDNN version 8.1.x. If you use the latest version (8.3.x), you'll receive the error code described in the original post.
TensorFlow doesn't seem to be as sensitive to the Toolkit's version (I'm running 11.4), but I haven't explored all of the features to make sure they work.
For more context, you can find the specific CUDA/cuDNN versions listed on TensorFlow's website.
Jupyter notebook was locking the file in my case, closing Jupyter notebook server solved it for me.
I had the same problem and just went through the prerequisites: https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#prerequisites-windows
Check that all the cudnn[.dll, .h, .lib]* files and zlibwapi.dll are in the system path. Once all of them are on the path, the problem is solved.
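A quick way to see which files are still missing is to look them up in every `PATH` directory. This is a sketch: `missing_from_path` is a hypothetical helper, and `REQUIRED` is an example list (the DLL from the error above plus `zlibwapi.dll`); extend it with the other cuDNN files you need.

```python
import os


def missing_from_path(required, search_path=None):
    """Return the required file names not found in any PATH directory."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [
        name
        for name in required
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs)
    ]


# Example list only; add the remaining cuDNN DLLs as needed.
REQUIRED = [
    "cudnn_ops_infer64_8.dll",
    "zlibwapi.dll",
]

if __name__ == "__main__":
    print("Missing:", missing_from_path(REQUIRED) or "nothing")
```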
I faced the same issue and was able to fix it by downloading the files from cuDNN and copying them to "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDNN\v5.1\bin".

tensorboard: command not found, when I use

I already use the Python IDE PyCharm and have installed the tensorflow package, but when I open the terminal and call tensorboard as follows
--logdir = '/user/....',
it shows "command not found".
Is there still something I'm missing, like setting the TensorFlow path or anything else?
Maybe you need to activate your relevant environment via source activate some_conda_env.
If that isn't the cause, you're going to need to give more information than you've given in the question (versions, environment, etc.).
Update (due to discussion in comments):
It seems that tensorboard is not in your search path.
That can be fixed, but meanwhile you can just run tensorboard by using the full path. e.g. /my/full/path/to/tensorboard --logdir='/path/to/logger' etc.
Good luck!
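If you'd rather not hunt for the full path by hand, a small Python helper can locate it. A sketch under two assumptions: `shutil.which` finds `tensorboard` when it is on the search path, and otherwise the installed TensorBoard can be launched as the module `tensorboard.main` with the current interpreter. The helper name `tensorboard_command` is hypothetical. Passing `--logdir` and the path as separate list items also sidesteps shell quoting.

```python
import shutil
import sys


def tensorboard_command(logdir):
    """Build a command line that launches TensorBoard for `logdir`."""
    exe = shutil.which("tensorboard")
    if exe is not None:
        # Full path to the executable, as suggested above.
        return [exe, "--logdir", logdir]
    # Fallback (assumption): run TensorBoard's entry module with this interpreter.
    return [sys.executable, "-m", "tensorboard.main", "--logdir", logdir]
```

For example, `subprocess.run(tensorboard_command("/path/to/logger"), check=True)` starts TensorBoard with whichever command was found.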

Error in prediction using sknn.mlp

I use Anaconda on a Windows 10 laptop with Python 2.7 and Spark 2.1. I built a deep learning model using the sknn.mlp package, and the model is complete, but when I try to predict with the predict function, it throws an error. The same code runs fine on my Mac, so I'm wondering what is wrong with my Windows packages.
'NoneType' object is not callable
I verified the input data: it is a numpy.array with no null values, its dimensions match the training data, and all attributes are the same. I'm not sure what it can be.
I don't work with Python on Windows, so this answer will be very vague, but maybe it will guide you in the right direction. Sometimes there are cross-platform errors because one module has not yet been updated for the OS, frequently when another related module gets an update. I recall this happened to me with a Django application, which required somebody more familiar with Windows to fix it for me.
Maybe you could try an environment with older versions of your modules until you find the culprit.
I finally solved the problem on windows. Here is the solution in case you face it.
The Theano package was faulty. I installed the latest version from github and then it threw another error as below:
RuntimeError: To use MKL 2018 with Theano you MUST set "MKL_THREADING_LAYER=GNU" in your environment.
To solve this, I created a user environment variable named MKL_Threading_Layer with the value GNU. After restarting the kernel, it worked.
Hope it helps!
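If editing Windows environment variables isn't convenient, a possible alternative is to set the variable from Python, as long as that happens before Theano is imported in the same process. This is a sketch under that assumption; an `os.environ` change only affects the current process and its children, not the system-wide settings described above.

```python
import os

# Assumption: Theano checks this variable when it is imported (per the
# error above), so it must be set before `import theano` in this process.
os.environ["MKL_THREADING_LAYER"] = "GNU"

# import theano  # import only after the variable is set
```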

theano NotImplementedError

I am running some theano code making use of tensor.advanced_subtensor
I am getting the following error :
NotImplementedError: Could not import inplace_increment, so some advanced indexing features are disabled. They will be available if you update NumPy to version 1.8 or later, or to the latest development version. You may need to clear the cache (theano-cache clear) afterwards.
I have the latest version of theano (0.6.0.dev-60b5ccc2bcabb1010714376764daf8a50722cee9) and numpy (1.8.0). Why am I still getting this error? How can I resolve this error? How do I clear theano cache?
The Theano cache is usually in ~/.theano/ if you are using a Unix-like system (*nix).
You need to clear the Theano cache. The cache lives in the ~/.theano/ folder.
Follow the steps below to clear it manually.
import theano
print(theano.config.compiledir)
# then delete the directory printed above.
If you don't want to delete it manually, use the command below:
theano-cache purge
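The manual deletion can also be done from Python. A sketch, assuming Theano is installed and nothing else is using the cache at the time; the helper name `remove_compiledir` is hypothetical, and the directory it removes is whatever `theano.config.compiledir` reports.

```python
import os
import shutil


def remove_compiledir(compiledir):
    """Delete Theano's compile cache directory; return True if it existed."""
    if not os.path.isdir(compiledir):
        return False
    shutil.rmtree(compiledir)
    return True
```

For example, run `import theano` and then `remove_compiledir(theano.config.compiledir)`. Theano should rebuild the cache the next time it compiles a function.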