Tensorflow OMP: Error #15 when training

I am training my neural network using TensorFlow on a CentOS HPC cluster. However, I got this error at the start of the training process:
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
The code is for instance segmentation and it worked fine for many people, but failed in my case.
Why does it occur? How can I solve it?

I had a similar issue on macOS with the same error message (see this question) and found the following reasons:
Problem:
I had a conda environment where NumPy, SciPy and TensorFlow were installed.
Conda uses Intel(R) MKL Optimizations; see the docs:
Anaconda has packaged MKL-powered binary versions of some of the most popular numerical/scientific Python libraries into MKL Optimizations for improved performance.
The Intel MKL functions (e.g. FFT, LAPACK, BLAS) are threaded with the OpenMP technology.
But on macOS you do not need MKL, because the Accelerate Framework comes with its own optimization algorithms and already uses OpenMP. That is the reason for the error message: OMP Error #15: ...
Workaround:
You should install all packages without MKL support:
conda install nomkl
and then use
conda install numpy scipy pandas tensorflow
followed by
conda remove mkl mkl-service
For more information see conda MKL Optimizations.

I solved this problem by asking an HPC server expert. This may be useful for Compute Canada users.
Why does it occur?
This error is due to a conflict between a TensorFlow pre-built Python wheel (which is specific to the Compute Canada system) and the conda environment.
Quote : "conda is always a bit problematic because it downloads precompiled binaries, mileage may vary..."
How can it be solved?
As @abccd pointed out, "The best thing to do is to ensure that only a single OpenMP runtime is linked into the process". However, I haven't figured out how to ensure that.
So I uninstalled conda and installed everything in the module system using pip install. After that, the network trained fine.
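As a rough diagnostic (a sketch, not part of the original answer), on Linux you can list which OpenMP runtime libraries are mapped into the running Python process after importing the libraries you suspect. It assumes a Linux `/proc/self/maps` file, and the name pattern (Intel `libiomp5`, GNU `libgomp`, LLVM `libomp`) is my own choice; seeing `libiomp5` at two different paths would point to the duplicate-runtime problem the error describes.

```python
import os
import re

# Intel (libiomp5), GNU (libgomp) and LLVM (libomp) OpenMP runtimes, incl. versioned names.
OPENMP_LIB = re.compile(r"^lib(iomp5|gomp|omp)[^/]*\.(so|dylib)")


def openmp_libs_from_maps(maps_text):
    """Return sorted, de-duplicated OpenMP runtime paths from a /proc/<pid>/maps dump."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split()
        # Fields: address, perms, offset, device, inode, [pathname]
        if len(fields) >= 6 and OPENMP_LIB.search(os.path.basename(fields[5])):
            found.add(fields[5])
    return sorted(found)


def loaded_openmp_libs():
    """Linux only: inspect the current process; returns [] where /proc is unavailable."""
    try:
        with open("/proc/self/maps") as maps:
            return openmp_libs_from_maps(maps.read())
    except OSError:
        return []


if __name__ == "__main__":
    # Import the suspect libraries (e.g. numpy, tensorflow) before calling this.
    for path in loaded_openmp_libs():
        print(path)
```

Call `loaded_openmp_libs()` after importing TensorFlow (and NumPy/SciPy) in the environment that produces the error. Because Error #15 aborts the process, you may need to set `KMP_DUPLICATE_LIB_OK` temporarily just so this diagnostic can run.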

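I solved it, as the message suggests, by setting the environment variable at the top of my script, before importing TensorFlow:
import os
# Unsafe workaround from the OMP hint: lets a second OpenMP runtime load instead of aborting
os.environ['KMP_DUPLICATE_LIB_OK'] = 'TRUE'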

Simply downgrading my version of TensorFlow using Anaconda did it for me.

Related

How to use pandas on M1 mac? (without rosetta or changing to x86 environment in any other way)

The last time I wrote a Python project was less than two months ago, and everything worked fine. I'm not sure whether I messed something up on my Mac while working on another project, but now when I try to run Python files that used to run perfectly, the following error appears:
dlopen(/opt/homebrew/lib/python3.9/site-packages/pandas/_libs/interval.cpython-39-darwin.so, 0x0002): tried: '/opt/homebrew/lib/python3.9/site-packages/pandas/_libs/interval.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')), '/usr/local/lib/interval.cpython-39-darwin.so' (no such file), '/usr/lib/interval.cpython-39-darwin.so' (no such file)
I understand there is an x86 vs. ARM architecture issue, so I checked which platform the terminal is on with:
python -c 'import platform; print(platform.platform())'
which confirmed it was arm64.
Searching online and looking at similar issues such as Trouble installing Pandas on new MacBook Air M1, it seems possible to run the Python project in an x86 environment. However, as already mentioned, it worked fine before, and there seems to have been no update since. So what could have happened that pandas (and perhaps other libraries) no longer work on ARM, and how can it be reverted?
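To see which side of the mismatch is wrong, you can compare the interpreter's architecture (`platform.machine()`) with the architecture recorded in the failing extension module's Mach-O header. The sketch below is a hypothetical helper, not a pandas or Python tool; it only recognises thin 64-bit and universal (fat) headers and ignores CPU subtypes, so `arm64e` is reported as plain `arm64`.

```python
import platform

# Constants from <mach-o/loader.h>, <mach-o/fat.h> and <mach/machine.h>.
MH_MAGIC_64_LE = b"\xcf\xfa\xed\xfe"  # 0xfeedfacf stored little-endian (thin 64-bit binary)
FAT_MAGIC_BE = b"\xca\xfe\xba\xbe"    # universal ("fat") binary header, big-endian
CPU_TYPES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_archs(header):
    """Return the architectures named in the first bytes of a Mach-O file."""
    if header[:4] == MH_MAGIC_64_LE:
        cputype = int.from_bytes(header[4:8], "little")
        return [CPU_TYPES.get(cputype, hex(cputype))]
    if header[:4] == FAT_MAGIC_BE:
        count = int.from_bytes(header[4:8], "big")
        archs = []
        for i in range(count):
            start = 8 + 20 * i  # each fat_arch record is 20 bytes; cputype comes first
            cputype = int.from_bytes(header[start:start + 4], "big")
            archs.append(CPU_TYPES.get(cputype, hex(cputype)))
        return archs
    return []  # not a Mach-O file this sketch understands


def file_archs(path):
    with open(path, "rb") as f:
        return macho_archs(f.read(4096))


if __name__ == "__main__":
    print("interpreter:", platform.machine())
```

For example, `file_archs("/opt/homebrew/lib/python3.9/site-packages/pandas/_libs/interval.cpython-39-darwin.so")` returning `['x86_64']` while `platform.machine()` reports `arm64` would confirm that an x86_64 build of pandas ended up in the ARM Homebrew Python. The `file` command-line tool reports the same information.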

You should try using miniforge.
Its definition, from its GitHub repository:
This repository holds a minimal installer for Conda specific to conda-forge. Miniforge allows you to install the conda package manager with the following features pre-configured:
Its main feature that will be useful for us:
An emphasis on supporting various CPU architectures (x86_64, ppc64le, and aarch64 including Apple M1).
The process I use:
Create a conda environment, usually with Python 3.9.
Install the packages from conda; most of them are available, but some are not.
After installing every package possible with miniforge, I use pip for the remaining ones.
This workflow has worked pretty well for me, and I hope it helps you.
I want to use the native M1 performance, and I think you will see the difference.
By default, miniforge only downloads ARM-compatible builds of Python packages. So far I have not faced any major issues with most data science libraries, except PyTorch.

tensorflow nightly wheel in python

I'm a newbie in Python.
Can someone explain the difference between
the tf-nightly and tensorflow wheels?
Which one should I install?
https://pypi.python.org/pypi/tensorflow vs
https://pypi.python.org/pypi/tf-nightly
I'm stuck on nightly packages; I don't know what they are.

I searched softwareengineering.stackexchange.com and found this:
No, it means that every night, everything that has been checked into source control is built. That build is a "nightly build".
And TensorFlow's installation page says:
People who are a little more adventurous can also try our nightly binaries
So we can conjecture that tf-nightly is only for the adventurous, because it may build untested or insufficiently tested source code into binary form, which may result in unexpected errors or failures.
If you use the "conservative" installation, pip3 install -U tensorflow, the binary is built from fully tested source code (especially code that has been exposed to users like us), usually tagged with 1.x in the GitHub branch.
I highly recommend installing from source yourself: you can better tailor the build and get better performance. Just follow this official tutorial. You may need to download some pdg files from elsewhere, because the NVIDIA site was under maintenance, as stated on the related pages.

By default, you should use tensorflow, not the nightly variants. However, some problems that persist in the official tensorflow packages may already be fixed in nightly. See ValueError: Input 0 of layer dense is incompatible with the layer: its rank is undefined, but the layer requires a defined rank.
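If you are unsure which of the two ended up in your environment, a small check with the standard library's `importlib.metadata` can tell you. This is a sketch; it assumes nightly wheels carry a PEP 440 development-release suffix (versions such as `2.16.0.dev20231103`), which is how tf-nightly versions have looked on PyPI but is not something this code can guarantee.

```python
from importlib import metadata


def classify_tf_version(version):
    """Nightly wheels carry a PEP 440 dev-release suffix such as '.dev20231103'."""
    return "nightly" if ".dev" in version else "release"


def installed_tf_builds(names=("tensorflow", "tf-nightly")):
    """Map each installed TensorFlow distribution name to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


if __name__ == "__main__":
    for name, version in installed_tf_builds().items():
        print(f"{name} {version} ({classify_tf_version(version)})")
```

If both show up, uninstall one of them: both distributions provide the same `tensorflow` import package, so having both installed is a common source of confusion.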

Python - ensure I'm running the same package versions in both Windows and Linux

I have a Windows 10 machine that I'm using to develop my code (Anaconda 3.5). Now I need to get my code running on a Linux server, so that others can use it as part of an application. What is the best way of setting up and maintaining my Linux environment so that it replicates the Windows one in terms of packages and version numbers?
I'm training and saving DataFrames, SVMs (Sklearn) and ANNs (Keras) in my Windows environment, which is running Anaconda Python 3.5.
On the Linux server I need to be able to load and use these models, which requires having the same packages and package versions.
How do I keep the environments running the same package versions?
The plan is to release newer and better models as I get more data. These might run on newer versions of Keras, Sklearn etc. as versions are released. How can I ensure that in Python I can have the latest package versions but still be able to run older models (possibly trained and saved using older package versions) if required? Backwards compatibility is very important.
Background:
I'm creating a 'sizing algorithm' that uses a number of ANNs and SVMs. For others to use this algorithm, it will run on a Linux server and somehow (the software guy assures me it can be done) be integrated, or linked, into the company's software. The different models will be loaded into memory and used when called to size something. It is important that the older sizing algorithms can still be used even as I release newer, better versions.
Apparently I am the company's Python expert, even though I have only been using it since January and have no experience in releasing algorithms for others to use. I would really appreciate your help in the best way of setting up the system.
Many thanks

On a machine with the correct packages:
pip freeze > requirements.txt
On machines that need the correct packages, after copying that file over:
pip install -r requirements.txt
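To check that the Linux server really matches the Windows pins after installing, a small comparison script can help. This is a hypothetical helper, not a pip feature; it only understands `name==version` lines and skips anything else (for example the `name @ file:///...` lines that `pip freeze` may emit inside a conda environment).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Read 'name==version' lines as written by pip freeze; other lines are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def version_mismatches(pins, lookup=metadata.version):
    """Return {name: (pinned, installed_or_None)} for every pin the environment misses."""
    bad = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            bad[name] = (pinned, installed)
    return bad
```

For example, `version_mismatches(parse_pins(open("requirements.txt").read()))` returns an empty dict when every pinned package is installed at exactly the pinned version.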

Tensorflow: Compilation using SSE and AVX [duplicate]

This is the message I received from running a script to check whether TensorFlow is working:
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I noticed that it mentions SSE4.2 and AVX.
What are SSE4.2 and AVX?
How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?
How can I make TensorFlow compile using the two libraries?

I just ran into this same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 suffices. In the end, I successfully built with
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
without any warnings or errors.
Probably the best choice for any system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
(Update: the build scripts may be eating -march=native, possibly because it contains an =.)
-mfpmath=both only works with gcc, not clang. -mfpmath=sse is probably just as good, if not better, and is the default for x86-64. 32-bit builds default to -mfpmath=387, so changing that will help for 32-bit. (But if you want high-performance for number crunching, you should build 64-bit binaries.)
I'm not sure whether TensorFlow builds with -O2 or -O3 by default. gcc -O3 enables full optimization including auto-vectorization, but that can sometimes make code slower.
What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not linking, so you need a different option for cross-file link-time-optimization)
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). Also if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.
-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC. (For ICC, you can use -xHOST instead of -march=native.)

Let's start by explaining why you see these warnings in the first place.
Most probably you did not install TF from source and instead used something like pip install tensorflow. That means you installed pre-built binaries (built by someone else) that were not optimized for your architecture. These warnings tell you exactly that: something is available on your architecture, but it will not be used because the binary was not compiled with it. Here is the relevant part of the documentation.
TensorFlow checks on startup whether it has been compiled with the
optimizations available on the CPU. If the optimizations are not
included, TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA
instructions not included.
The good news is that if, as is most likely, you just want to learn or experiment with TF, everything will work properly and you should not worry about it.
What are SSE4.2 and AVX?
Wikipedia has a good explanation of SSE4.2 and AVX. This knowledge is not required to be good at machine learning. You can think of them as sets of additional instructions that let a computer apply a single instruction to multiple data points, for operations that parallelize naturally (for example, adding two arrays).
Both SSE and AVX are implementations of the abstract idea of SIMD (single instruction, multiple data), which is
a class of parallel computers in Flynn's taxonomy. It describes
computers with multiple processing elements that perform the same
operation on multiple data points simultaneously. Thus, such machines
exploit data level parallelism, but not concurrency: there are
simultaneous (parallel) computations, but only a single process
(instruction) at a given moment.
This is enough to answer your next question.
How do SSE4.2 and AVX improve CPU computations for TF tasks?
They allow more efficient computation of various vector (matrix/tensor) operations. You can read more in these slides.
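To make the "adding two arrays" example concrete, here is a purely conceptual Python model (it does not execute any SIMD instructions itself): one vector instruction processes a whole register's worth of elements. It assumes 32-bit floats, so a 128-bit SSE register holds 4 values and a 256-bit AVX register holds 8.

```python
def simd_add(a, b, lanes):
    """Conceptual model of SIMD addition.

    Each loop iteration stands for one vector instruction that adds `lanes`
    elements at once. Returns (element-wise sum, number of vector instructions).
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = []
    instructions = 0
    for start in range(0, len(a), lanes):
        # On real hardware this whole chunk is a single instruction (e.g. one AVX add).
        out.extend(x + y for x, y in zip(a[start:start + lanes], b[start:start + lanes]))
        instructions += 1
    return out, instructions


if __name__ == "__main__":
    n = 1000
    a = [float(i) for i in range(n)]
    b = [float(i) for i in range(n)]
    # 32-bit floats: 1 per scalar op, 4 per 128-bit SSE register, 8 per 256-bit AVX register.
    for name, lanes in [("scalar", 1), ("SSE", 4), ("AVX", 8)]:
        _, count = simd_add(a, b, lanes)
        print(name, count)
```

Run as a script, it reports 1000, 250 and 125 instructions for the scalar, SSE and AVX widths on 1000 elements. Real speedups are usually smaller, because memory bandwidth and other overheads also matter.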
How to make Tensorflow compile using the two libraries?
You need to have a binary which was compiled to take advantage of these instructions. The easiest way is to compile it yourself. As Mike and Yaroslav suggested, you can use the following bazel command
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

Let me answer your 3rd question first:
If you want to run a self-compiled version within a conda env, you can. These are the general instructions I run to install TensorFlow on my system with the additional instructions enabled. Note: this build was for an AMD A10-7850 (check which instructions your CPU supports; they may differ) running Ubuntu 16.04 LTS. I use Python 3.5 within my conda env. Credit goes to the TensorFlow source install page and the answers provided above.
git clone https://github.com/tensorflow/tensorflow
# Install Bazel
# https://bazel.build/versions/master/docs/install.html
sudo apt-get install python3-numpy python3-dev python3-pip python3-wheel
# Create your virtual env with conda.
source activate YOUR_ENV
pip install six numpy wheel packaging appdirs
# Follow the configure instructions at:
# https://www.tensorflow.org/install/install_sources
# Build as shown below. Note: check which instructions your CPU
# supports. Also, if resources are limited, consider adding the flag
# --local_resources 2048,.5,1.0 . This limits how much RAM and how many
# local resources are used, but increases compile time.
bazel build -c opt --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
# Create the wheel like so:
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
# Inside your conda env:
pip install /tmp/tensorflow_pkg/NAME_OF_WHEEL.whl
# Then install the rest of your stack
pip install keras jupyter  # etc.
As to your 2nd question:
In my opinion, a self-compiled version with optimizations is well worth the effort. On my particular setup, calculations that used to take 560-600 seconds now take only about 300 seconds! Although the exact numbers will vary, I think you can expect roughly a 35-50% reduction in run time on your particular setup.
Lastly your 1st question:
Many of the answers have been provided above already. To summarize: AVX, SSE4.1, SSE4.2, and FMA are different extended instruction sets on x86 CPUs. Many contain optimized instructions for processing matrix or vector operations.
I will highlight my own misconception to hopefully save you some time: It's not that SSE4.2 is a newer version of instructions superseding SSE4.1. SSE4 = SSE4.1 (a set of 47 instructions) + SSE4.2 (a set of 7 instructions).
In the context of TensorFlow compilation, if your computer supports AVX2 and AVX, and SSE4.1 and SSE4.2, you should put in optimizing flags for all of them. Don't do what I did and go with SSE4.2 alone, thinking that it's newer and should supersede SSE4.1. That's clearly WRONG! I had to recompile because of that, which cost me a good 40 minutes.

These are SIMD vector processing instruction sets.
Using vector instructions is faster for many tasks; machine learning is such a task.
Quoting the tensorflow installation docs:
To be compatible with as wide a range of machines as possible, TensorFlow defaults to only using SSE4.1 SIMD instructions on x86 machines. Most modern PCs and Macs support more advanced instructions, so if you're building a binary that you'll only be running on your own machine, you can enable these by using --copt=-march=native in your bazel build command.

Thanks to all these replies plus some trial and error, I managed to install it on a Mac with clang, so I'm sharing my solution in case it is useful to someone.
Follow the instructions on Documentation - Installing TensorFlow from Sources
When prompted for
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
then copy-paste this string:
-mavx -mavx2 -mfma -msse4.2
(The default option caused errors, so did some of the other flags. I got no errors with the above flags. BTW I replied n to all the other questions)
After installing, I verify a ~2x to 2.5x speedup when training deep models with respect to another installation based on the default wheels - Installing TensorFlow on macOS
Hope it helps

I recently installed it from source, and below are all the steps needed to build it with the instructions mentioned above enabled.
Other answers already describe why those messages are shown. My answer gives a step-by-step guide to the installation, which may help people who struggle with the actual installation, as I did.
Install Bazel
Download it from one of their available releases, for example 0.5.2.
Extract it, go into the directory and build it: bash ./compile.sh.
Copy the executable to /usr/local/bin: sudo cp ./output/bazel /usr/local/bin
Install Tensorflow
Clone tensorflow: git clone https://github.com/tensorflow/tensorflow.git
Go to the cloned directory to configure it: ./configure
It will prompt you with several questions. Below I have suggested a response to each of them; you can, of course, choose your own responses as you prefer:
Using python library path: /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with MKL support? [y/N] y
MKL support will be enabled for TensorFlow
Do you wish to download MKL LIB from the web? [Y/n] Y
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Do you wish to use jemalloc as the malloc implementation? [Y/n] n
jemalloc disabled
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] N
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with Hadoop File System support? [y/N] N
No Hadoop File System support will be enabled for TensorFlow
Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] N
No XLA JIT support will be enabled for TensorFlow
Do you wish to build TensorFlow with VERBS support? [y/N] N
No VERBS support will be enabled for TensorFlow
Do you wish to build TensorFlow with OpenCL support? [y/N] N
No OpenCL support will be enabled for TensorFlow
Do you wish to build TensorFlow with CUDA support? [y/N] N
No CUDA support will be enabled for TensorFlow
Build the pip package. To build it, you have to specify which instructions you want (the ones TensorFlow told you are missing).
Build pip script: bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.1 --copt=-msse4.2 -k //tensorflow/tools/pip_package:build_pip_package
Build pip package: bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
Install Tensorflow pip package you just built: sudo pip install /tmp/tensorflow_pkg/tensorflow-1.2.1-cp27-cp27mu-linux_x86_64.whl
Now next time you start up Tensorflow it will not complain anymore about missing instructions.

This is the simplest method. Only one step.
It has a significant impact on speed. In my case, the time taken per training step almost halved.
Refer to custom builds of tensorflow.

I wrote a small Bash script for Mac (easily ported to Linux) that retrieves all CPU features and applies some of them when building TF. I'm on TF master and use it fairly often (a couple of times a month).
https://gist.github.com/venik/9ba962c8b301b0e21f99884cbd35082f
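The same idea can be sketched in Python for Linux (this is not the linked script, just an illustration): read the feature flags the kernel reports in `/proc/cpuinfo` and emit a `--copt` only for instruction sets the CPU actually has, since enabling an unsupported one produces a binary that faults with illegal instructions. The flag-to-option table is a partial, hand-written assumption; `-march=native`, discussed above, remains the simpler route when building for the same machine.

```python
# /proc/cpuinfo flag name -> gcc/clang option enabling that instruction set (partial list).
FLAG_TO_COPT = [
    ("sse4_1", "-msse4.1"),
    ("sse4_2", "-msse4.2"),
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
    ("avx512f", "-mavx512f"),
]


def cpu_flags(cpuinfo_text):
    """Collect the feature flags from the first 'flags' line of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # e.g. ARM Linux, which reports 'Features' instead


def bazel_copts(flags):
    """Build --copt options only for instruction sets the CPU actually reports."""
    return [f"--copt={opt}" for flag, opt in FLAG_TO_COPT if flag in flags]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(" ".join(bazel_copts(cpu_flags(f.read()))))
    except OSError:
        print("no /proc/cpuinfo; on macOS try: sysctl machdep.cpu.features")
```

For example, a CPU reporting `sse4_1 sse4_2 avx fma` yields `--copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mfma`, which can be appended to the `bazel build` command.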

To compile TensorFlow with SSE4.2 and AVX, you can directly use
bazel build --config=mkl \
    --config="opt" \
    --copt="-march=broadwell" \
    --copt="-O3" \
    //tensorflow/tools/pip_package:build_pip_package
Source:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/docker/Dockerfile.devel-cpu-mkl

2.0 COMPATIBLE SOLUTION:
Execute the below commands in Terminal (Linux/MacOS) or in Command Prompt (Windows) to install Tensorflow 2.0 using Bazel:
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
#The repo defaults to the master development branch. You can also checkout a release branch to build:
git checkout r2.0
#Configure the Build => Use the Below line for Windows Machine
python ./configure.py
#Configure the Build => Use the Below line for Linux/MacOS Machine
./configure
#This script prompts you for the location of TensorFlow dependencies and asks for additional build configuration options.
#Build Tensorflow package
#CPU support
bazel build --config=opt //tensorflow/tools/pip_package:build_pip_package
#GPU support
bazel build --config=opt --config=cuda --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

When building TensorFlow from source, you'll run the configure script. One of the questions that the configure script asks is as follows:
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]
The configure script will attach the flag(s) you specify to the bazel command that builds the TensorFlow pip package. Broadly speaking, you can respond to this prompt in one of two ways:
If you are building TensorFlow on the same type of CPU type as the one on which you'll run TensorFlow, then you should accept the default (-march=native). This option will optimize the generated code for your machine's CPU type.
If you are building TensorFlow on one CPU type but will run TensorFlow on a different CPU type, then consider supplying a more specific optimization flag as described in the gcc documentation.
After configuring TensorFlow as described in the preceding bulleted list, you should be able to build TensorFlow fully optimized for the target CPU just by adding the --config=opt flag to any bazel command you are running.

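To hide those warnings, set this before importing TensorFlow:
import os
# 0 = all logs, 1 = hide INFO, 2 = also hide WARNING, 3 = also hide ERROR (C++ logs only)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf  # must come after the variable is set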

Illegal instruction: 4 when importing python plugins

I tried to install the hoomd_script molecular dynamics software on my iMac (an iMac Pro from before 2009, running OS X El Capitan v10.11.3). I successfully compiled it on the iMac, but when I import hoomd_script in Python 2.7.12, Python crashes completely and I get the error:
Illegal instruction: 4.
I have installed all the prerequisite packages (including boost, sphinx, git, mpich2, numpy, cmake, pkg-config, sqlite) using conda.
I ran python -vc 'hoomd_script' to test, and the result is here. I tried reinstalling all the packages, including conda, and recompiling hoomd, but nothing changed. How can I fix this? Thanks!

As stated on the HOOMD-blue web page, the conda builds require a CPU capable of AVX instructions (2011 or newer). The illegal instruction results because you are trying to execute an instruction that your processor does not support.
Compiling hoomd from a clean build directory on your system should result in a binary that your system can execute. Note that conda-provided prerequisite libraries are difficult to work with: I recommend using macports or homebrew.
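Before choosing between the conda build and a source build, you can check whether the CPU reports AVX at all. A minimal sketch for macOS, assuming Intel Macs list AVX as the token `AVX1.0` in the output of `sysctl -n machdep.cpu.features` (on Linux, look for `avx` in the `flags` line of `/proc/cpuinfo` instead). It sticks to calls that also exist in Python 2.7, which the question uses.

```python
import subprocess
import sys


def has_avx(features_text):
    """Return True if a macOS cpu.features string contains the 'AVX1.0' token."""
    return "AVX1.0" in features_text.upper().split()


def mac_cpu_features():
    """Raw output of `sysctl -n machdep.cpu.features`, or '' if unavailable
    (the key may not exist on some Macs, e.g. Apple silicon)."""
    try:
        out = subprocess.check_output(["sysctl", "-n", "machdep.cpu.features"])
    except (OSError, subprocess.CalledProcessError):
        return ""
    return out.decode("ascii", "replace")


if __name__ == "__main__" and sys.platform == "darwin":
    print("AVX supported: %s" % has_avx(mac_cpu_features()))
```

If it prints `False`, the conda builds cannot run on that machine, which matches the "Illegal instruction: 4" crash.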
