Tensorflow GPU setup: error with CUDA on PyCharm

Tensorflow GPU setup: error with CUDA on PyCharm - python

I have TF 0.8 installed on Python3, MacOSX El Capitan.
When running a simple test code for TF I get this message:
ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10):
Library not loaded: #rpath/libcudart.7.5.dylib
Referenced from: /Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
My .bash_profile is as follows:
export PATH=/usr/local/bin:$PATH
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-7.5/lib:/usr/local/cuda/lib
in /Developer/NVIDIA/CUDA-7.5/lib I have a file called libcudart.7.5.dylib
in /usr/local/cuda/lib I have an alias called libcudart.7.5.dylib
I have tried several permutations of .bash_profile without success. ANy idea what may be wrong?
Note that I can successfully use my GPU with Theano so there's no reason to believe the GPU/cuDNN/CUDA install may be faulty.

If you're getting this error, make sure you installed CUDA, cuDNN correctly as described in the Tensorflow install instructions. Note the TF, CUDA, cuDNN version you're installing and what Python version you're using.
Filenames, paths etc frequently vary, so small tweaks in filenames and paths may be needed in your case if errors are cropping up. Sometimes it's hard for others to help you because your system may have a very specific path setup/version that cannot be understood by someone in a forum.
If you're getting exactly the error I'm describing in the OP, take a step back and check:
is this happening in PyCharm?
is this happening in iPython within PyCharm?
is it happening in both?
is this happening in iPython in Terminal?
In my case it was happening only in PyCharm. In iPython outside of PyCharm (that is using the Mac 'Terminal' software) everything worked fine. But when doing iPython in PyCharm, or when running the test file through PyCharm, I would get the error. This means it has something to do with PyCharm, not the Tensorflow install.
Make sure your DYLD_LIBRARY_PATH is correctly pointing to the libcudart.7.5.dylib file. Navigate there with Finder, do a Spotlight search search and find the file or its alias. Then place that path in your .bash_profile. In my case, this is working:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib
If your problem is PyCharm, a specific configuration is needed. Go to the upper right corner of the GUI and click the gray down arrow.
Choose "Edit Configuration". You will see an Environment option where you need to click on the ... box and enter the DYLD_LIBRARY_PATH that applies to your case.
Note that there's an Environment option for the specific file you're working on (it will be highlighted in the left panel) and for Defaults (put DYLD_... there as well if you want future files you create to have this). Note that you need to save this config or else when you close PyCharm it won't stick.

As an extension to pepe answer, which is the correct one, I don't mind if the following is integrated to the original answer.
I would like to add that if you wish to make this change permanent in pyCharm (affects only the current project) and not having it to do for every net file, is possible, from the interface shown above by pepe, by going under the "Default" to set the DYLD_LIBRARY_PATH.
Keep in mind, that this change by itself it doesn't alter the run configuration of the current script, which eventually still need to be manually changed ( or deleted and regenerated from the new defaults)

Could you try TF 0.9, which adds the MacOX GPU support?
https://github.com/tensorflow/tensorflow/blob/r0.9/RELEASE.md

It appears that you are not finding CUDA on your system. This could be for a number of reasons including installing CUDA for one version of python while running a different version of python that isn't aware of the other versions installed files.
Please take a look at my answer here.
https://stackoverflow.com/a/41073045/1831325

In my case, tensorflow version is 1.1, the dlopen error happens in both
ipython and pycharm
Environment:
Cuda version:8.0.62
cudnn version:6
error is a little different in pycharm and ipython. I cannot remeber too much detail, but ipython says there is no libcudnn.5.dylib, but pycharm just says there is
import error, image not found
Solution:
Download cudnn version 5, from
https://developer.nvidia.com/rdp/cudnn-download
Unzip the cudnn. Copy lib/ to /usr/local/cuda/lib. Copy include/ to /usr/local/cuda/include
unzip cuda.zip
cd cuda
sudo cp -r lib /usr/local/cuda/lib
sudo cp include/cudnn.h /usr/local/cuda/include
Add the lib directory path to your DYLD_LIBRARY_PATH. like this in my ~/.bash_profile:
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-8.0/lib:/usr/local/cuda/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
In the tensorflow official installation guide, it says need cudnn 5.1, so ,this is all about careless。
https://www.tensorflow.org/install/install_mac
Requirements to run TensorFlow with GPU support.
If you are installing TensorFlow with GPU support using one of the mechanisms described in this guide, then the following NVIDIA software must be installed on your system:
...
The NVIDIA drivers associated with CUDA Toolkit 8.0.
cuDNN v5.1. For details, see NVIDIA's documentation.
...

I think the problem is in the SIP (System Integrity Protection). Restricted processes run with cleared environment variables and you got this error.
You need to go to Recovery Mode, start Terminal and input
$ csrutil disable
, and reboot

Related

Custom OpenCV build works in command prompt but not working in Pycharm 2022.3.1?

Last December I've compiled OpenCV 4.6.0 with contrib modules + cuda support with cMake in a conda environment called opencv-460+cuda with python 3.9.15 and numpy 1.23.5. The cuda version installed is the cuda_11.0.3_451.82_win10 with cudnn-11.0-windows-x64-v8.0.5.39. Then I configured my previosly installed Pycharm 2021.2 to execute my own cv2 based scripts and all worked like a digital heaven. Only had to proper config these: .
Then one day I've decide to install CUDA v9.0 because Re3 object tracking algorithm requirement and it worked just fine alongside the v11.0 after settings the correct order in the environment variables panel
And lastly I choose to compile Marlin 2.1(yes, for 3D printers) with Visual Studio Code and after this, things got weird.
I couldn't execute cv2 scripts inside Pycharm nor the Anaconda Prompt with env opencv-460+cuda activated due to
import cv2
ImportError: DLL load failed while importing cv2: Couldn't find the specified module.
Then I get Windows to a past point using its restoration tool. It removed the CUDA v9.0(not the files) and the Visual Studio Code. And now the opencv scripts work in the Anaconda Prompt and in Windows PS but not in Pycharm 2022.3
I have even installed cuda v9.0 again and I still can execute my cv2 programs in Anaconda Prompt but not in Pycharm and I want to know what the h3ll have VS Code done to my Windows
I have even run Dependency Walker on the cv2.cp39-win_amd64.pyd file and I've got these . It finds all DLLs but in Checksum column says incorrect for the first three elements, altough when I put the caret on top of them appears the next
And other odd behavior is when I use Terminal from Pycharm I can import cv2 and even execute scripts but I can't execute it with Pycharm console and I WANT THE WAY THINGS WERE BEFORE VS CODE

CUDA_HOME environment variable is not set

I have a working environment for using pytorch deep learning with gpu, and i ran into a problem when i tried using mmcv.ops.point_sample, which returned :
ModuleNotFoundError: No module named 'mmcv._ext'
I have read that you should actually use mmcv-full to solve it, but i got another error when i tried to install it:
pip install mmcv-full
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Which seems logic enough since i never installed cuda on my ubuntu machine(i am not the administrator), but it still ran deep learning training fine on models i built myself, and i'm guessing the package came in with minimal code required for running cuda tensors operations.
So my main question is where is cuda installed when used through pytorch package, and can i use the same path as the environment variable for cuda_home?
Additionaly if anyone knows some nice sources for gaining insights on the internals of cuda with pytorch/tensorflow I'd like to take a look (I have been reading cudatoolkit documentation which is cool but this seems more targeted at c++ cuda developpers than the internal working between python and the library)

you can chek it and check the paths with these commands :
which nvidia-smi
which nvcc
cat /usr/local/cuda/version.txt

Environment issues with running Anaconda Python in VS Code

I am trying to learn Python and debug code for the first time in VS Code (latest edition). I have anaconda running and the code I have runs fine by itself but now I need to know how to update the code and debug it for the first time.
I keep getting the following error related to NumPy:
Exception has occurred: ImportError
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
The Python version is: Python3.7 from "C:\Miniconda2\envs\myproject_flask\python.exe"
The NumPy version is: "1.18.5"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: DLL load failed: The specified module could not be found.
In the Miniconda path above the Python.exe is version 3.7.7. I tried to install NumPy like so in myproject directory:
(myproject_flask) c:\MyProject\source\MyProject.Flask>conda install numpy=1.18.5
I still get the same error when I go to F5 to debug and run to a breakpoint.
Need help with my environment.
I need to use VS Code in my Windows environment with Anaconda.

You should launch VS Code from Anaconda Navigator so that the environment is initialized.

When you have problems with importing the numpy C-extensions in Anaconda it's very likely that your virtual environment hasn't been activated. Having just Python on the path is not good enough. You also need access to the libraries. This is what the activation does.
When you are running/debugging code you should see a terminal open. You can tell from the prompt
(myproject_flask) C:\MyProject
that conda has been activated. Sometimes this just takes a bit too long for VSCode. So simply push the start button a second time.
Note that the Code Runner extension in VSCode is also known to cause this kind of problems.
However, I wonder why you are using Miniconda2 with Python3, although in general this should work.

I was getting the error even after adding all the anaconda paths, it was due to VScode running the code in the python debugger terminal which is not able to enter the conda environment.
This worked for me:
press ctrl+Shift+P > Type Terminal:Select Default profile > Select Command prompt
after this code ran in Command prompt by default, inside the environment.

Installing TensorFlow Profiler for Dummies

I am trying to follow this guide to install the TensorFlow Profiler to better understand why my recently installed Keras does run on the GPU, but hardly uses any ressources (and is really slow). However, I am unable to come to any results, since the guide does not provide me with sufficient information, since I am not a programmer by trade and obiously lack necessary knowledge.
What have I tried so far?
I use Anaconda and have a running version of python 3.7 installed. I also installed tensorflow and the necessary drivers and such so that tensorflow is able to access my GPUs. Following the linked guide, I downloaded the "install_and_run.py" and tried executing it using the conda prompt. I get asked to specify --envdir and --logdir. Where do I point these? Is the environment directory just the directory to my current conda environment? Since I tried pointing both envdir and logdir into that direction and ended up with the error that the command
True" is unknown and "True' returned non-zero exit status 1.
I could not come up with any solution for this. It should probably be mentioned that I have very little experience in using the conda prompt to run .py-files and usually only use it to install packages.
I am also unsure what is meant by the subsequent steps that talk about the CUPTI path. The given path is no complete path as far as i know. Where am I supposed to look for it? Or am I meant to exectute some of this
/sbin/ldconfig -N -v $(sed 's/:/ /g' <<< $LD_LIBRARY_PATH) | \
grep libcupti
as a command? I have tried running /sbin/ldconfig -N -v $ but my system could not find the path (potentially because I started looking from the wrong directory?).
Any help is much appreciated. Sorry for the potentially confusing post from a confused person.
Thank you!

Tensorflow profiler is no longer bundled with Tensorboard. There is a tutorial on how to install and run it, when fitting Keras model.
The summary is:
Inside your env run pip install tensorboard_plugin_profile
Declare a tensorboard callback as you normally would
tboard_callback = tf.keras.callbacks.TensorBoard(log_dir = logs,
histogram_freq = 1,
profile_batch = '500,520')
Fit your model (with declared tensorboard callback)
On a separe terminal (with your env activated) run tensorboard --logdir=path/to/logs
The Profiler tab shown in the tutorial may not be visisble, but there should be a profile option available in the drop down menu on the top right corner.

The kernel appears to have died. It will restart automatically python 3

Whenever I try to import tensorflow in my windows machine, its saying that The kernel appears to have died. It will restart automatically and then its not even working.
The below is the following message given by the jupyter terminal.
Warning! HDF5 library version mismatched error
The HDF5 header files used to compile this application do not match
the version used by the HDF5 library to which this application is linked.
Data corruption or segmentation faults may occur if the application continues.
This can happen when an application was compiled by one version of HDF5 but
linked with a different version of static or shared HDF5 library.
You should recompile the application or check your shared library related
settings such as 'LD_LIBRARY_PATH'.
You can, at your own risk, disable this warning by setting the environment
variable 'HDF5_DISABLE_VERSION_CHECK' to a value of '1'.
Setting it to 2 or higher will suppress the warning messages totally.
Headers are 1.10.1, library is 1.10.2
what could solve this problem.
My version of Python is 3.6.3
and I updated the conda package also.
I have a Windows 10 Machine with 16GB RAM, so it cant be a memory issue also.
I was working with tensorflow previously, but now its not working.
This started to happen like 2 months back! when i was working on my university assignment this happened. The same code was working properly and I ran the code once again on the very same day jupyter notebook it crashed, since then I'm facing this problem.
I also tried to import tensorflow in the command prompt, its still showing the same error.
Has anyone encountered the same problem? What could be the fix?

Message says, Headers are 1.10.1, library is 1.10.2
You need to install 1.10.1 version
conda install -c anaconda hdf5=1.10.1

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.