XGBoost parallelization issue on macOS High Sierra - python

I am using an Anaconda environment on macOS High Sierra and could not get XGBoost to run with 8 threads, even though I set the nthread parameter to 8.
The Python code looks like this. When I run it, I monitor the execution with htop, and only one process is at 100%.
clf = xgb.XGBClassifier(
    **xgb_params,
    n_estimators=1000,
    nthread=8
)
Then I searched the internet and found the link below. Several people recommend it, so I followed it.
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en
➜ ~ brew install gcc --without-multilib
Warning: gcc 8.2.0 is already installed and up-to-date
To reinstall 8.2.0, run `brew reinstall gcc`
After seeing this info, I added the following lines to my xgboost config
export CC = gcc-8
export CXX = g++-8
When the build was done, I tried again and nothing changed.
So I kept searching for a solution. I found this page.
https://clang-omp.github.io/
Then I ran the following line.
brew install llvm
And I tried the example on that site. I created a file called hello.c and put the following code inside.
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n",
           omp_get_thread_num(), omp_get_num_threads());
}
Then I tried to compile it as mentioned.
clang -fopenmp hello.c -o hello
It worked! I also tried gcc-8 as follows.
gcc-8 -fopenmp hello.c -o hello
It also worked. So, this is the output when I run ./hello
➜ ~ ./hello
Hello from thread 4, nthreads 8
Hello from thread 7, nthreads 8
Hello from thread 2, nthreads 8
Hello from thread 1, nthreads 8
Hello from thread 6, nthreads 8
Hello from thread 3, nthreads 8
Hello from thread 0, nthreads 8
Hello from thread 5, nthreads 8
So I set gcc-8 in the XGBoost config file, and as shown above, gcc-8 can build parallel code with the -fopenmp option. However, XGBoost still does not run in parallel, even though I compiled it with this setting and set the nthread parameter to 8.
Is there anything else I can try?
Edit 1: I tried more complex code to make sure parallelization works. The output shows that 8 threads do the work, and I can also see this in htop.
➜ ~ clang -fopenmp OpenMP.c -o OpenMP
➜ ~ ./OpenMP
---- Serial
---- Serial done in 37.058571 seconds.
---- Parallel
---- Parallel done in 9.674641 seconds.
---- Check
Passed
Edit 2: I installed gcc-7 and went through the same process. It did not work either.
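One way to check whether XGBoost actually uses the extra threads is to time the same fit with different nthread values: with a working OpenMP build, nthread=8 should be clearly faster than nthread=1 on a large enough dataset. Below is a minimal timing sketch; the model construction and the X, y data in the usage comment are hypothetical placeholders, not part of the original post.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.time()
    result = fn(*args, **kwargs)
    return result, time.time() - start

# Hypothetical usage -- compare wall-clock times of the same fit:
#   _, t1 = timed(xgb.XGBClassifier(nthread=1, n_estimators=200).fit, X, y)
#   _, t8 = timed(xgb.XGBClassifier(nthread=8, n_estimators=200).fit, X, y)
# If the build is parallel, t8 should be noticeably smaller than t1.
```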


How to use cuda-gdb for Python debugging?

I wrote a hello.py that is simply
print("hi")
and then run
cuda-gdb python3 hello.py
I get:
Reading symbols from python3...
(No debugging symbols found in python3)
"/home/x/Desktop/py_projects/hello.py" is not a core dump: file format not recognized
How can I debug CUDA functions called from Python code?
It's possible to use cuda-gdb from Python, assuming you only need to debug the C/C++ portion. I don't know of a debugger that can jump from debugging Python to debugging CUDA C++. Here is one possible approach, a copy of what is presented here.
To debug a CUDA C/C++ library function called from python, the following is one possibility, inspired from this article.
For this walkthrough, I will use the t383.py and t383.cu files verbatim from this answer, and I'll be using CUDA 10 and Python 2.7.5 on CentOS 7.
Compile your CUDA C/C++ library using the -G and -g switches, as you would for ordinary debugging:
$ nvcc -Xcompiler -fPIC -std=c++11 -shared -arch=sm_60 -G -g -o t383.so t383.cu -DFIX
We'll need two terminal sessions for this. I will refer to them as session 1 and session 2. In session 1, start your python interpreter:
$ python
...
>>>
In session 2, find the process ID associated with your python interpreter (replace USER with your actual username):
$ ps -ef |grep USER
...
USER 23221 22694 0 23:55 pts/0 00:00:00 python
...
$
In the above example, 23221 is the process ID for the python interpreter (use man ps for help)
In session 2, start cuda-gdb so as to attach to that process ID:
$ cuda-gdb -p 23221
... (lots of spew here)
(cuda-gdb)
In session 2, at the (cuda-gdb) prompt, set a breakpoint at a desired location in your CUDA C/C++ library. For this example, we will set a breakpoint at one of the first lines of kernel code, line 70 in the t383.cu file. If you haven't yet loaded the library (we haven't, in this walk through), then cuda-gdb will point this out and ask you if you want to make the breakpoint pending on a future library load. Answer y to this (alternatively, before starting this cuda-gdb session, you could have run your python script once from within the interpreter, as we will do in step 7 below. This would load the symbol table for the library and avoid this prompt). After the breakpoint is set, we will issue the continue command in cuda-gdb in order to get the python interpreter running again:
(cuda-gdb) break t383.cu:70
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (t383.cu:70) pending.
(cuda-gdb) continue
Continuing.
In session 1, run your python script:
>>> execfile("t383.py")
init terrain_height_map
1, 1, 1, 1, 1,
1, 1, 1, 1,
1, 1, 1, 1,
Our Python interpreter has now halted (and is unresponsive), because in session 2 we see that the breakpoint has been hit:
[New Thread 0x7fdb0ffff700 (LWP 23589)]
[New Thread 0x7fdb0f7fe700 (LWP 23590)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Thread 1 "python" hit Breakpoint 1, update_water_flow<<<(1,1,1),(1024,1,1)>>> (
water_height_map=0x80080000800000, water_flow_map=0xfffcc000800600,
d_updated_water_flow_map=0x7fdb00800800, SIZE_X=4, SIZE_Y=4) at t383.cu:70
70 int col = index % SIZE_X;
(cuda-gdb)
We see that the breakpoint is at line 70 of our library (kernel) code, just as expected. Ordinary C/C++ cuda-gdb debugging can proceed at this point within session 2, as long as you stay within the library function.
When you are finished debugging (you may need to remove any breakpoints set) you can once again type continue in session 2, to allow control to return to the python interpreter in session 1, and for your application to finish.
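For a script (rather than an interactive interpreter), it can help to print the PID and pause before the first CUDA call, giving you time to attach from session 2. This is a sketch of that common pattern, not part of the original walkthrough; the helper name is mine.

```python
import os
import sys

def report_pid_and_pause(wait=True):
    """Print this interpreter's PID so `cuda-gdb -p <pid>` can attach
    from another terminal, then optionally block until a line of input
    arrives (press Enter once the debugger is attached)."""
    pid = os.getpid()
    print("attach with: cuda-gdb -p %d" % pid)
    if wait:
        sys.stdin.readline()  # resume once breakpoints are set
    return pid

# Call report_pid_and_pause() just before the first call into the CUDA
# library, set your breakpoints in session 2, then press Enter here.
```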
To complete Robert's answer, if you are using CUDA-Python, you can use the --args option to pass a command line that contains arguments. For example, this is a valid command line:
$ cuda-gdb --args python3 hello.py
Your original command is not valid because, without --args, cuda-gdb interprets the parameter as a host core dump file.
Here is the complete command line with an example from the CUDA-Python repository:
$ cuda-gdb -q --args python3 simpleCubemapTexture_test.py
Reading symbols from python3...
(No debugging symbols found in python3)
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) run
Starting program: /usr/bin/python3 simpleCubemapTexture_test.py.
...
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fff67858600 in transformKernel<<<(8,8,1),(8,8,1)>>> ()
(cuda-gdb) p $pc
$1 = (void (*)()) 0x7fff67858600 <transformKernel>
(cuda-gdb) bt
#0 0x00007fff67858600 in transformKernel<<<(8,8,1),(8,8,1)>>> ()

Boost Python own module throws Segmentation Fault `GlobalError::PushToStack()`

I'm trying to wrap an existing C++ library of ours for Python 3.6. I've followed these Boost.Python tutorials:
https://flanusse.net/interfacing-c++-with-python.html
https://www.mantidproject.org/Boost_Python_Introduction
https://github.com/TNG/boost-python-examples/blob/master/01-HelloWorld/CMakeLists.txt
All of them SIGSEGV, so I ran the command under gdb:
gdb --args python -c 'import MyPyLib'
And the actual output is:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff3bb02a9 in GlobalError::PushToStack() () from /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0
I tried to run the boost-python-examples from GitHub and got the same problem. If it helps, I'm on:
gcc 7.4.0
g++ 7.4.0
python 3.6.8
libboost-python-dev 1.65.1
I found the problem: all examples use
find_package(Boost REQUIRED COMPONENTS python)
But if you pay attention, there are two libraries in the system:
sudo ldconfig -p | grep "libboost_python*"
libboost_python3-py36.so.1.65.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so.1.65.1
libboost_python3-py36.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_python3-py36.so
libboost_python-py27.so.1.65.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_python-py27.so.1.65.1
libboost_python-py27.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libboost_python-py27.so
So I started suspecting that my module was being linked against the Python 2.7 boost-python library.
In CMakeLists.txt, I swapped in the actual component:
find_package(Boost REQUIRED COMPONENTS python3)
And now it works fine. It's quite surprising that such a mismatch throws such a cryptic error. Also, when using python3, CMake complains that no headers were found or indexed.
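One quick way to catch this kind of mismatch is to check which libboost_python variant the compiled module actually links against, e.g. by running `ldd` on the extension. The little parser below is my own sketch of that diagnostic (the helper names are not from the original post):

```python
import subprocess

def linked_boost_python(ldd_output):
    """Return the libboost_python sonames mentioned in `ldd` output."""
    return [line.split()[0] for line in ldd_output.splitlines()
            if 'libboost_python' in line]

def check_module(path):
    # Run ldd on the built extension module and report which
    # boost-python libraries it links against.
    out = subprocess.check_output(['ldd', path]).decode()
    return linked_boost_python(out)

# If this reports libboost_python-py27 for a module meant for
# Python 3, the CMake component is wrong.
```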

Installing the "ring.cx SIP client" on a Raspberry PI

The Situation
I would like to get terminal-based (headless) SIP calls working on my Raspberry Pi and I already tried this using linphone:
RaspberryPI: Making SIP outbound calls using linphonec or an alternative SIP soft phone
Since I am currently stuck there, I wanted to try another option, which was SFLPhone. They pointed me towards the Ring software project, which offers a daemon, dring, that allows making SIP calls through a scripting interface:
Indeed, the daemon can run standalone and be controlled using the DBus API.
Note the project has been renamed to "Ring" (version bumped to 2.x). Experimental packages are available at http://ring.cx/en/documentation/linux-installation
A major feature of Ring 2.x is the optional "DHT" account type, allowing calls to be made without any SIP server.
There are many other enhancements such as ICE support, UPnP support, stability improvements etc.
(Note: the clients are being rewritten (GTK3, Qt5) and there is a new OS X client; they are not yet feature-complete and are in heavy development.)
The new daemon dring source Git repo URI is: https://gerrit-ring.savoirfairelinux.com/ring
The DBus API is mostly the same as before. In the tools/dringctrl directory you will find an example python client that we use for testing (uses python3-dbus).
We are willing to fix any bugs you may find; the daemon bugtracker is here: https://projects.savoirfairelinux.com/projects/ring-daemon/issues
Also look at https://projects.savoirfairelinux.com/projects/ring/wiki for build instructions etc.
Regards and good luck with your embedded project,
A. B.
Compiling the dependencies
I tried to compile the dependencies for the project as stated in the README:
git clone https://gerrit-ring.savoirfairelinux.com/ring
cd ring
# Compile the dependencies first
cd ../contrib/
rm -fr native/ && mkdir native
cd native
../bootstrap
make
I got this error:
libvpx.webm-4640a0c4804b/third_party/googletest/src/include/gtest/gtest.h
mv libvpx-4640a0c4804b49f1870d5a2d17df0c7d0a77af2f libvpx && touch libvpx
cd libvpx && CROSS= ./configure --target=armv7-linux-gcc \
--as=yasm --disable-docs --disable-examples --disable-unit-tests --disable-install-bins --disable-install-docs --enable-realtime-only --enable-error-concealment --disable-runtime-cpu-detect --disable-webm-io --enable-pic --prefix=/home/pi/ring/contrib/arm-linux-gnueabihf
disabling docs
disabling examples
disabling unit_tests
disabling install_bins
disabling install_docs
enabling realtime_only
enabling error_concealment
disabling runtime_cpu_detect
disabling webm_io
enabling pic
Configuring selected codecs
enabling vp8_encoder
enabling vp8_decoder
enabling vp9_encoder
enabling vp9_decoder
Configuring for target 'armv7-linux-gcc'
enabling armv7
enabling neon
enabling neon_asm
enabling media
Unable to invoke compiler: arm-none-linux-gnueabi-gcc -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
Configuration failed. This could reflect a misconfiguration of your
toolchains, improper options selected, or another problem. If you
don't see any useful error messages above, the next step is to look
at the configure error log file (config.log) to determine what
configure was trying to do when it died.
../../contrib/src/vpx/rules.mak:105: recipe for target '.vpx' failed
make: *** [.vpx] Error 1
Compiling ring
Even though compiling the dependencies failed, I attempted to compile ring:
git clone https://gerrit-ring.savoirfairelinux.com/ring
cd ring
./autogen.sh
./configure
make
make install
This caused the following error:
checking for PJPROJECT... no
configure: error: Missing pjproject files
pi#phone ~/ring $ make
make: *** No targets specified and no makefile found. Stop.
pi#phone ~/ring $ make install
make: *** No rule to make target 'install'. Stop.
So currently I am stuck, and I fear that I will not be able to get beyond the current state of my project.
Edit: Now, without the video codecs (as aberaud suggested), I run into the following error:
/bin/bash ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -I../include/opendht -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -fPIC -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -g -fPIC -O3 -std=c++0x -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -g -fPIC -O3 -std=c++0x -c -o libopendht_la-dht.lo `test -f 'dht.cpp' || echo './'`dht.cpp
libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -I../include/opendht -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -fPIC -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -g -fPIC -O3 -std=c++0x -I/home/pi/sip-desaster/ring/contrib/arm-linux-gnueabihf/include -g -fPIC -O3 -std=c++0x -c dht.cpp -fPIC -DPIC -o libopendht_la-dht.o
In file included from ../include/opendht/dht.h:29:0,
from dht.cpp:27:
../include/opendht/infohash.h:58:22: error: expected initializer before ‘:’ token
dht.cpp:3105:1: error: expected ‘}’ at end of input
Makefile:386: recipe for target 'libopendht_la-dht.lo' failed
make[2]: *** [libopendht_la-dht.lo] Error 1
make[2]: Leaving directory '/home/pi/sip-desaster/ring/contrib/native/opendht/src'
Makefile:395: recipe for target 'install-recursive' failed
make[1]: *** [install-recursive] Error 1
make[1]: Leaving directory '/home/pi/sip-desaster/ring/contrib/native/opendht'
../../contrib/src/opendht/rules.mak:28: recipe for target '.opendht' failed
make: *** [.opendht] Error 2
Contrib
The "contrib" part of the Ring build is about building dependencies that are not available on the target system, mostly used to build full working packages when cross-compiling or for systems without proper dependency management (basically all OSs except Linux distros).
When in the contrib/native directory, you can run make list to see the list of packages to be built. Since you are cross-compiling you will need to build almost everything.
If you somehow mess up the contrib build you can safely delete the contrib/native and contrib/{target_tuple} (if any) directories and start again.
libvpx
The libvpx library is used by libav to provide the vp8 and vp9 video codecs. It's not a hard dependency, and since your project doesn't use video, you can safely disable it. We have also encountered issues when cross-compiling vpx.
In contrib/src/libav/rules.mak, line 70, DEPS_libav defines the list of dependencies for libav. You can remove vpx, x264 and $(DEPS_vpx) from the list since you don't use video. You may also add the speex and opus audio codecs to the list (they should be in the list but aren't; see this patch as an example).
After cleaning contrib as described above and bootstrapping again, vpx and x264 shouldn't show up in the list of "To-be-built packages" when running make list. Then try to build contrib by running make.
If after trying this you encounter the same issue for other packages you may have some sort of cross compilation build path issue (I would then need more logs/details).
As a very last resort, compiling on the Pi itself (with Raspbian) is horribly slow, but it has the advantage of using local distro-provided dependencies and removing the hassles of cross compilation.
Good luck
I am providing my own answer to document the steps that will hopefully lead to the desired end result:
Download the sources
git clone https://gerrit-ring.savoirfairelinux.com/ring
cd ring
Build the contrib section
According to aberaud's remarks, I can update contrib/src/libav/rules.mak and remove any video-related dependencies (remember that I am headless).
So I changed line 70 from
DEPS_libav = zlib x264 vpx $(DEPS_vpx)
to
DEPS_libav = zlib opus speex
Now build the contrib section.
cd ../contrib/
rm -fr native/ && mkdir native
cd native
../bootstrap
make
I was hitting the same error trying to compile the contrib.
The version of Raspbian I was using came with the older version of the gcc compiler, version 4.6. After I upgraded to 4.8 it compiled instantly. Well, as instantly as anything compiles on the Pi at any rate.

How can I install wxPython 3.0.1.1 in Python 2.7 virtualenv on Ubuntu 14.10?

The following procedure fails. Am I missing something?
Install various Ubuntu packages (prerequisites for compilation)
Get http://downloads.sourceforge.net/wxpython/wxPython-src-3.0.1.1.tar.bz2
Uncompress to wxPython-src-3.0.1.1/
Create new virtualenv called test
Activate test virtualenv
In terminal, from wxPython-src-3.0.1.1/:
./configure --prefix=/home/username/.virtualenvs/test --with-gtk2 --enable-unicode --with-opengl
#lots of output, confirms "Configured wxWidgets 3.0.1 for `x86_64-unknown-linux-gnu'"
make install
#lots of output, confirms:
# The installation of wxWidgets is finished. On certain
# platforms (e.g. Linux) you'll now have to run ldconfig
# if you installed a shared library and also modify the
# LD_LIBRARY_PATH (or equivalent) environment variable.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.virtualenvs/test/lib
# don't run ldconfig since that is a system tool, not appropriate for virtualenv?
cd wxPython
python setup.py install
# lots of output, starting:
# WARNING: WXWIN not set in environment. Assuming '..'
# Found wx-config: /home/username/.virtualenvs/test/bin/wx-config
# Using flags: --toolkit=gtk2 --unicode=yes --version=3.0
# Preparing CORE...
# Preparing STC...
# Preparing GLCANVAS...
# Preparing GIZMOS...
# running install
# etc
The final command fails with:
src/gtk/_core_wrap.cpp:20407:7: note: ‘arg3’ was declared here
int arg3 ;
^
src/gtk/_core_wrap.cpp: In function ‘PyObject* _wrap_Image_SetAlphaBuffer(PyObject*, PyObject*, PyObject*)’:
src/gtk/_core_wrap.cpp:3747:13: warning: ‘arg3’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if (ALPHASIZE != self->GetWidth() * self->GetHeight()) {
^
src/gtk/_core_wrap.cpp:20474:7: note: ‘arg3’ was declared here
int arg3 ;
^
cc1plus: some warnings being treated as errors
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
I used python setup.py install &> ~/error.txt to pass the error messages on to a knowledgeable colleague, who identified that the C compilation was using the -Werror=format-security flag. This version of wxPython (and maybe others) cannot compile with that flag.
My $CPPFLAGS and $CFLAGS environment variables were empty. It turns out that this flag is triggered by hardening-wrapper.
So, I overrode the flag by invoking the final step as follows, and wxPython was installed successfully:
CFLAGS=-Wno-error=format-security CPPFLAGS=-Wno-error=format-security python setup.py install
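The same override can be applied from Python by building a modified environment for the install step; this is just a sketch of the idea behind the one-liner above, with the flag values taken from that command (the helper name is mine):

```python
import os
import subprocess

def build_env(extra_flags="-Wno-error=format-security"):
    """Return a copy of the current environment with CFLAGS/CPPFLAGS
    overridden so -Werror=format-security does not abort the build."""
    env = dict(os.environ)
    env['CFLAGS'] = extra_flags
    env['CPPFLAGS'] = extra_flags
    return env

# Hypothetical usage:
#   subprocess.call(['python', 'setup.py', 'install'], env=build_env())
```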

Programmatically testing for OpenMP support from a Python setup script

I'm working on a Python project that uses Cython and C to speed up time-sensitive operations. In a few of our Cython routines, we use OpenMP to further speed up an operation if idle cores are available.
This leads to a bit of an annoying situation on OS X, since the default compiler for recent OS versions (llvm/clang on 10.7 and 10.8) doesn't support OpenMP. Our stopgap solution is to tell people to set gcc as their compiler when they build. We'd really like to do this programmatically, since clang can build everything else with no issues.
Right now, compilation will fail with the following error:
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: Command "cc -bundle -undefined dynamic_lookup -L/usr/local/lib -L/usr/local/opt/sqlite/lib build/temp.macosx-10.8-x86_64-2.7/yt/utilities/lib/geometry_utils.o -lm -o yt/utilities/lib/geometry_utils.so -fopenmp" failed with exit status 1
The relevant portion of our setup script looks like this:
config.add_extension("geometry_utils",
                     ["yt/utilities/lib/geometry_utils.pyx"],
                     extra_compile_args=['-fopenmp'],
                     extra_link_args=['-fopenmp'],
                     libraries=["m"],
                     depends=["yt/utilities/lib/fp_utils.pxd"])
The full setup.py file is here.
Is there a way to programmatically test for OpenMP support from inside the setup script?
I was able to get this working by checking to see if a test program compiles:
import os, tempfile, subprocess, shutil

# see http://openmp.org/wp/openmp-compilers/
omp_test = r"""
#include <omp.h>
#include <stdio.h>
int main() {
#pragma omp parallel
printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads());
}
"""

def check_for_openmp():
    # Build the test program in a scratch directory; the compiler's exit
    # status (0 == success) tells us whether -fopenmp is supported.
    tmpdir = tempfile.mkdtemp()
    curdir = os.getcwd()
    os.chdir(tmpdir)
    filename = r'test.c'
    with open(filename, 'w', 0) as file:
        file.write(omp_test)
    with open(os.devnull, 'w') as fnull:
        result = subprocess.call(['cc', '-fopenmp', filename],
                                 stdout=fnull, stderr=fnull)
    os.chdir(curdir)
    # clean up
    shutil.rmtree(tmpdir)
    return result
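With that check in place, the setup script can add the OpenMP flags only when the test program compiled (subprocess.call returns 0 on success). The mapping below is my own sketch of how the result might feed into add_extension, not code from the original answer:

```python
def openmp_args(check_result):
    """Map check_for_openmp()'s return code (0 == compiler accepted
    -fopenmp) to the extra args for the extension."""
    if check_result == 0:
        return {'extra_compile_args': ['-fopenmp'],
                'extra_link_args': ['-fopenmp']}
    return {'extra_compile_args': [], 'extra_link_args': []}

# Hypothetical usage in setup.py:
#   config.add_extension("geometry_utils",
#                        ["yt/utilities/lib/geometry_utils.pyx"],
#                        libraries=["m"],
#                        **openmp_args(check_for_openmp()))
```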
