I write a hello.py that is simply
print("hi")
and then run
cuda-gdb python3 hello.py
I get:
Reading symbols from python3...
(No debugging symbols found in python3)
"/home/x/Desktop/py_projects/hello.py" is not a core dump: file format not recognized
How can I debug CUDA functions called from Python code?
It's possible to use cuda-gdb on a Python application, assuming you only need to debug the C/C++ portion. I don't know of a debugger that can jump from debugging Python to debugging CUDA C++. Here is one possible approach, a copy of what is presented here.
To debug a CUDA C/C++ library function called from Python, the following is one possibility, inspired by this article.
For this walkthrough, I will use the t383.py and t383.cu files verbatim from this answer, with CUDA 10 and Python 2.7.5 on CentOS 7.
Compile your CUDA C/C++ library using the -G and -g switches, as you would for an ordinary debug build:
$ nvcc -Xcompiler -fPIC -std=c++11 -shared -arch=sm_60 -G -g -o t383.so t383.cu -DFIX
We'll need two terminal sessions for this. I will refer to them as session 1 and session 2. In session 1, start your python interpreter:
$ python
...
>>>
In session 2, find the process ID associated with your python interpreter (replace USER with your actual username):
$ ps -ef |grep USER
...
USER 23221 22694 0 23:55 pts/0 00:00:00 python
...
$
In the above example, 23221 is the process ID for the python interpreter (use man ps for help).
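As an alternative to grepping ps output, you can print the interpreter's PID from within session 1 itself:

```python
import os

# The PID printed here is the value to pass to `cuda-gdb -p`
# in session 2.
print(os.getpid())
```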
In session 2, start cuda-gdb so as to attach to that process ID:
$ cuda-gdb -p 23221
... (lots of spew here)
(cuda-gdb)
In session 2, at the (cuda-gdb) prompt, set a breakpoint at a desired location in your CUDA C/C++ library. For this example, we will set a breakpoint at one of the first lines of kernel code, line 70 in the t383.cu file. If you haven't yet loaded the library (we haven't, in this walkthrough), then cuda-gdb will point this out and ask whether you want to make the breakpoint pending on a future library load. Answer y to this (alternatively, before starting this cuda-gdb session, you could have run your python script once from within the interpreter, as we will do in step 7 below; this would load the symbol table for the library and avoid the prompt). After the breakpoint is set, we issue the continue command in cuda-gdb to get the python interpreter running again:
(cuda-gdb) break t383.cu:70
No symbol table is loaded. Use the "file" command.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (t383.cu:70) pending.
(cuda-gdb) continue
Continuing.
In session 1, run your python script:
>>> execfile("t383.py")
init terrain_height_map
1, 1, 1, 1, 1,
1, 1, 1, 1,
1, 1, 1, 1,
Our Python interpreter has now halted (and is unresponsive), because in session 2 we see that the breakpoint has been hit:
[New Thread 0x7fdb0ffff700 (LWP 23589)]
[New Thread 0x7fdb0f7fe700 (LWP 23590)]
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
Thread 1 "python" hit Breakpoint 1, update_water_flow<<<(1,1,1),(1024,1,1)>>> (
water_height_map=0x80080000800000, water_flow_map=0xfffcc000800600,
d_updated_water_flow_map=0x7fdb00800800, SIZE_X=4, SIZE_Y=4) at t383.cu:70
70 int col = index % SIZE_X;
(cuda-gdb)
We see that the breakpoint is at line 70 of our library (kernel) code, just as expected. Ordinary C/C++ cuda-gdb debugging can proceed at this point within session 2, as long as you stay within the library function.
When you are finished debugging (you may need to remove any breakpoints you set), you can type continue again in session 2 to return control to the python interpreter in session 1 and let your application finish.
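A side note on step 7: execfile exists only in Python 2. If you repeat this walkthrough under Python 3, a rough equivalent is exec(open(path).read()); the sketch below uses a temporary stand-in script so it is runnable as-is.

```python
import os
import tempfile

# Create a stand-in for t383.py so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("result = 6 * 7\n")
    path = f.name

# Python 3 replacement for execfile(path): compile and exec the file
# contents in a fresh namespace.
ns = {}
with open(path) as fh:
    exec(compile(fh.read(), path, "exec"), ns)
print(ns["result"])  # prints 42

os.remove(path)
```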
To complete Robert's answer: if you are using CUDA-Python, you can use the --args option to pass a command line that contains arguments. For example, this is a valid command line:
$ cuda-gdb --args python3 hello.py
Your original command is not valid because, without --args, cuda-gdb interprets the second parameter as a host core-dump file.
Here is the complete command line with an example from the CUDA-Python repository:
$ cuda-gdb -q --args python3 simpleCubemapTexture_test.py
Reading symbols from python3...
(No debugging symbols found in python3)
(cuda-gdb) set cuda break_on_launch application
(cuda-gdb) run
Starting program: /usr/bin/python3 simpleCubemapTexture_test.py
...
[Switching focus to CUDA kernel 0, grid 1, block (0,0,0), thread (0,0,0), device 0, sm 0, warp 0, lane 0]
0x00007fff67858600 in transformKernel<<<(8,8,1),(8,8,1)>>> ()
(cuda-gdb) p $pc
$1 = (void (*)()) 0x7fff67858600 <transformKernel>
(cuda-gdb) bt
#0 0x00007fff67858600 in transformKernel<<<(8,8,1),(8,8,1)>>> ()
I am using an Anaconda environment on macOS High Sierra and could not run XGBoost with 8 threads even though I set the nthread parameter to 8.
The Python code is like this. When I run it, I monitor the execution with htop; there is only one process at 100%.
clf = xgb.XGBClassifier(
    **xgb_params,
    n_estimators=1000,
    nthread=8
)
Then I searched the internet and found this link. Some people refer to it, and I followed it.
https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_on_Mac_OSX?lang=en
➜ ~ brew install gcc --without-multilib
Warning: gcc 8.2.0 is already installed and up-to-date
To reinstall 8.2.0, run `brew reinstall gcc`
After seeing this info, I added the following lines to my XGBoost config:
export CC = gcc-8
export CXX = g++-8
When the build was done, I tried again and nothing changed.
So I kept searching for a solution and found this page.
https://clang-omp.github.io/
Then I ran the following line.
brew install llvm
I tried the example on that site: I created a file called hello.c and put the following code inside.
#include <omp.h>
#include <stdio.h>

int main() {
    #pragma omp parallel
    printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(),
           omp_get_num_threads());
}
Then I tried to compile it as mentioned.
clang -fopenmp hello.c -o hello
It worked! I also tried gcc-8 as follows.
gcc-8 -fopenmp hello.c -o hello
It also worked. So, this is the output when I run ./hello
➜ ~ ./hello
Hello from thread 4, nthreads 8
Hello from thread 7, nthreads 8
Hello from thread 2, nthreads 8
Hello from thread 1, nthreads 8
Hello from thread 6, nthreads 8
Hello from thread 3, nthreads 8
Hello from thread 0, nthreads 8
Hello from thread 5, nthreads 8
So I added gcc-8 to the XGBoost config file; gcc-8 is able to produce parallel code with the -fopenmp option, as you can see. However, XGBoost still does not run in parallel even though I compiled it with this setting and set the nthread parameter to 8.
Is there anything else I can try?
Edit 1: I tried more complex code to confirm that parallelization works. I tried this code. The output shows 8 threads working; I can also see it in htop.
➜ ~ clang -fopenmp OpenMP.c -o OpenMP
➜ ~ ./OpenMP
---- Serial
---- Serial done in 37.058571 seconds.
---- Parallel
---- Parallel done in 9.674641 seconds.
---- Check
Passed
Edit 2: I installed gcc-7 and went through the same process. It did not work either.
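One environment-level check worth doing from Python itself (this is an assumption about a possible cause, not a confirmed fix): OpenMP-backed libraries such as XGBoost honor the OMP_NUM_THREADS environment variable, and an inherited value of 1 would produce exactly this single-core symptom regardless of nthread. A minimal sketch:

```python
import os

# If OMP_NUM_THREADS was inherited as "1" (e.g. from a parent shell or
# a job scheduler), OpenMP code runs single-threaded no matter what
# nthread says. Set it before the library is imported; setdefault
# leaves any deliberate setting alone.
os.environ.setdefault("OMP_NUM_THREADS", str(os.cpu_count()))
print(os.environ["OMP_NUM_THREADS"])
```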
I am having trouble with gmsh.
Direct execution works fine:
!gmsh -3 -algo meshadapt tmp_0.geo -o SFM.msh
While execution from code fails:
try:
    out = subprocess.check_output(
        ["gmsh", "gmsh -3 -algo meshadapt tmp_0.geo -o SFM.msh"],
        stderr=subprocess.STDOUT
    ).strip().decode('utf8')
except subprocess.CalledProcessError as e:
    out = e.output
print(out)
with:
b"--------------------------------------------------------------------------\n[[23419,1],0]: A high-performance Open MPI point-to-point messaging module\nwas
unable to find any relevant network interfaces:\n\nModule: OpenFabrics
(openib)\n Host: 931136e3f6fe\n\nAnother transport will be used
instead, although this may result in\nlower
performance.\n--------------------------------------------------------------------------\n\x1b[1m\x1b[31mFatal : Can't open display: (FLTK internal
error)\x1b[0m\n--------------------------------------------------------------------------\nMPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD \nwith errorcode
1.\n\nNOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.\nYou may or may not see output from other processes,
depending on\nexactly when Open MPI kills
them.\n--------------------------------------------------------------------------\n"
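Independent of the MPI and display errors above, note that the argument list in the snippet is malformed: subprocess expects each token of the command line as a separate list element, not the whole command as one string. A runnable sketch of the correct shape (with echo standing in for gmsh, which may not be installed):

```python
import subprocess

# Each command-line token is its own list element. For the real call,
# replace "echo" with "gmsh".
out = subprocess.check_output(
    ["echo", "-3", "-algo", "meshadapt", "tmp_0.geo", "-o", "SFM.msh"],
    stderr=subprocess.STDOUT,
).decode("utf8").strip()
print(out)  # echo prints the tokens back joined by spaces
```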
So how can I emulate Jupyter's ! execution from Python 3 code?
@Hristo:
_=/opt/conda/bin/jupyter SHLVL=1 PATH=/opt/conda/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=931136e3f6fe HOME=/root LC_ALL=C.UTF-8 PWD=/ JPY_PARENT_PID=1
LANG=C.UTF-8 TERM=xterm-color CLICOLOR=1 PAGER=cat GIT_PAGER=cat
MPLBACKEND=module://ipykernel.pylab.backend_inline env DISPLAY=:0 gmsh
-3 -algo meshadapt tmp_0.geo -o SFM.msh
@Gilles:
Same result.
It seems the root cause is that the $DISPLAY environment variable is not set.
First, make sure $DISPLAY is set when your Jupyter notebook starts.
You might also have to direct mpirun to export it to all the MPI tasks.
Starting from Open MPI 3.0.0, you can achieve this with
export OMPI_MCA_mca_base_env_list=DISPLAY
before starting your Jupyter notebook.
By the way, does your application need to open the X display at all?
If it does not do any graphics, it could be adjusted to work correctly when no display is available.
[ADDENDUM]
Another possibility is that gmsh thinks a display is available because DISPLAY is set, so it tries to open it and fails. You can try unsetting this environment variable and see how things go, both from the command line (interactive mode) and via the notebook (batch mode).
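If you want to try running the child without DISPLAY while leaving the notebook's own environment untouched, you can pass a modified environment to subprocess. A sketch, with the env command standing in for gmsh:

```python
import os
import subprocess

# Copy the parent environment and drop DISPLAY for the child only;
# the notebook process keeps its own DISPLAY (if any).
child_env = dict(os.environ)
child_env.pop("DISPLAY", None)

out = subprocess.check_output(["env"], env=child_env).decode()
has_display = any(line.startswith("DISPLAY=") for line in out.splitlines())
print(has_display)  # prints False
```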
I want to execute Linux commands (e.g. bt, break, frame, etc.) at the gdb prompt using Python scripting.
For example, I am using subprocess.call(["gdb"], shell=True).
This line takes me to the (gdb) prompt by executing the gdb command, but after it, when I try
subprocess.call(["backtrace"], shell=True)
it shows /bin/sh: backtrace: command not found
backtrace isn't a Linux command; it's a gdb command.
If you want to send commands to a popen'd gdb session, you have to push them through stdin, with something like...
import subprocess
gdb = subprocess.Popen(['gdb'], stdin=subprocess.PIPE, text=True)  # text=True so we can write str, not bytes
gdb.stdin.write('backtrace\n')
gdb.stdin.flush()  # make sure the command actually reaches gdb
The commands you type at a (gdb) prompt, like backtrace, break, frame, etc., are gdb commands. Only gdb knows how to interpret them, and they won't work with subprocess.call(), which can only run Linux executables.
There are two ways to accomplish something close to what you want:
Start GDB under the control of Python and use the GDB/MI protocol to talk to it. This is how pyclewn works. E.g. p = subprocess.Popen(['gdb', '-i=mi'], stdin=fd_in, stdout=fd_out). See also https://bitbucket.org/minami/python-gdb-mi/src/tip/gdbmi/session.py?at=default
Use GDB's builtin Python scripting. (API Reference) E.g.
Save this to t.py
import gdb
gdb.execute('set confirm off')
gdb.execute('file /bin/true')
gdb.execute('start')
gdb.execute('backtrace')
gdb.execute('quit')
Then run:
$ gdb -q -x t.py
Temporary breakpoint 1 at 0x4014c0: file true.c, line 59.
Temporary breakpoint 1, main (argc=1, argv=0x7fffffffde28) at true.c:59
59 if (argc == 2)
#0 main (argc=1, argv=0x7fffffffde28) at true.c:59
$
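The stdin-driving approach from the first option can be sketched without gdb installed at all; here python3 itself stands in for the interactive child, reading commands from a pipe (a hedged illustration of the mechanics, not a full GDB/MI client):

```python
import subprocess

# "python3 -" reads a script from stdin, standing in for gdb reading
# commands from stdin; driving the child through a pipe works the
# same way in both cases.
p = subprocess.Popen(
    ["python3", "-"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
)
out, _ = p.communicate("print(6 * 7)\n")
print(out.strip())  # prints 42
```

For a real gdb session you would keep the pipe open and read responses incrementally, which is exactly the bookkeeping that GDB/MI libraries handle for you.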
I'd like to use gdb 7's support for debugging Python "inferior processes".
What do I need to be able to do that?
For example:
What flags does the inferior Python's configure script need to have been run with?
Does the inferior Python process have to be Python 2.7 or newer (I see that's when the part of the gdb support for this that's in Python source tree was committed)? Or is Python 2.7 only needed by the gdb process itself?
What files need to have been installed that might not be packaged by all distributions? For example, on packages.ubuntu.com I don't get any hits for python-gdb.py, which I believe is needed.
It would be very handy to know what's needed on particular distributions. I am particularly interested in Ubuntu and CentOS.
Python seems to need to have been compiled with --with-pydebug (on Ubuntu 12.04, package python-dbg contains a suitable Python executable, itself called python-dbg). The inferior Python does not need to be Python 2.7 -- 2.6 loads the 2.7 gdb extensions successfully (see the debugging session below). At least on Ubuntu 12.04, the file that gets installed that defines the gdb extensions is called libpython.py, not python-gdb.py (for some reason, building Python yields a build directory containing both those files -- they are identical).
However, I don't think it's currently possible to debug using production core files: it looks like the gdb extensions for Python inferior processes try to get hold of variables that get optimized away in a production binary (for example, the f variable in PyEval_EvalFrameEx). It seems Linux/gdb/Python have yet to reach the level of awesomeness achieved for JavaScript on Illumos, which Bryan Cantrill reports is able to debug production core dumps in this way:
http://www.infoq.com/presentations/Debugging-Production-Systems
Here's the debug session on Ubuntu 12.04 showing gdb running a Python 2.6 inferior process to debug a segfault, using Python 2.7's gdb extensions. First, the code that causes the segfault:
~/Downloads/Python-2.6.4$ cat ~/bin/dumpcore.py
class Foo:
    def bar(self):
        from ctypes import string_at
        string_at(0xDEADBEEF)  # this code will cause Python to segfault

def main():
    f = Foo()
    f.someattr = 42
    f.someotherattr = {'one':1, 'two':2L, 'three':[(), (None,), (None, None)]}
    f.bar()

if __name__ == "__main__":
    main()
and the debugging session:
~/Downloads/Python-2.6.4$ gdb --args ./python ~/bin/dumpcore.py
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/john/Downloads/Python-2.6.4/python...done.
(gdb) run
Starting program: /home/john/Downloads/Python-2.6.4/python /home/john/bin/dumpcore.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x0000000000468d67 in PyString_FromString (str=0xdeadbeef <Address 0xdeadbeef out of bounds>) at Objects/stringobject.c:116
116 size = strlen(str);
(gdb) py-bt
Undefined command: "py-bt". Try "help".
(gdb) python
>import sys
>sys.path.insert(0, "/home/john/Downloads/Python-2.7/Tools/gdb")
>import libpython
>(gdb) py-bt
#10 Frame 0x8f0f90, for file /home/john/Downloads/Python-2.6.4/Lib/ctypes/__init__.py, line 496, in string_at (ptr=3735928559, size=-1)
return _string_at(ptr, size)
#14 Frame 0x8ebf90, for file /home/john/bin/dumpcore.py, line 5, in bar (self=<Foo(someattr=42, someotherattr={'three': [(), (None,), (None, None)], 'two': 2L, 'one': 1}) at remote 0x7ffff6e03240>, string_at=<function at remote 0x7ffff6e1c990>)
string_at(0xDEADBEEF) # this code will cause Python to segfault
#17 Frame 0x8ebd80, for file /home/john/bin/dumpcore.py, line 12, in main (f=<Foo(someattr=42, someotherattr={'three': [(), (None,), (None, None)], 'two': 2L, 'one': 1}) at remote 0x7ffff6e03240>)
f.bar()
#20 Frame 0x8eb680, for file /home/john/bin/dumpcore.py, line 16, in <module> ()
main()
(gdb)
For CentOS 6, you simply need to do:
yum install gdb python-debuginfo
debuginfo-install python
You can then debug running python processes by simply attaching to them with gdb:
gdb python [process id]
Once connected, just type:
py-bt
I'm trying to debug C/C++ code located in shared libraries, which are loaded by ctypes.cdll.LoadLibrary() in Python, and then specific functions are called from Python.
The Python code forks child processes, so I need to be able to break whether the C function is called from the Python parent or a child process.
A dead-simple example: test.c
// j = clib.call1(i)
int call1(int i)
{
    return i*2;
}
test.py
import os, sys, ctypes
path = os.path.abspath(os.path.join(os.path.dirname(sys.argv[0]), "test.so"))
clib = ctypes.cdll.LoadLibrary(path)
i = 20
j = clib.call1(i)
print "i=%d j=%d\n" %(i, j)
$ gcc -g -O0 test.c -shared -o test.so
$ gdb --args python-dbg test.py
(gdb) break test.c call1
Function "test.c call1" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (test.c call1) pending.
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y <PENDING> test.c call1
(gdb) run
Starting program: /usr/bin/python-dbg test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
i=20 j=40
[23744 refs]
[Inferior 1 (process 44079) exited normally]
You can see from my terminal log that gdb did not hit the breakpoint when python loaded the library. I am seeing the same behavior with my application.
Break on call1 instead:
(gdb) break call1
This should work too:
(gdb) break test.c:call1