Tracing a segmentation fault in Python

I'm developing C extensions from Python and I get some segfaults (inevitable during development...).
I'm looking for a way to display which line of code the segfault happens at (an idea is to trace every single line of code). How can I do that?

If you are on Linux, run python under gdb:
gdb python
(gdb) run /path/to/script.py
## wait for segfault ##
(gdb) backtrace
## stack trace of the c code

Here's a way to output the filename and line number of every line of Python your code runs:
import sys

def trace(frame, event, arg):
    print("%s, %s:%d" % (event, frame.f_code.co_filename, frame.f_lineno))
    return trace

def test():
    print("Line 8")
    print("Line 9")

sys.settrace(trace)
test()
Output:
call, test.py:7
line, test.py:8
Line 8
line, test.py:9
Line 9
return, test.py:9
(You'd probably want to write the trace output to a file, of course.)
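For example, here is a minimal sketch of that (the file name trace.log is just an illustration); flushing after every write matters, because a segfault kills the process before buffered output is written:
import sys

log = open("trace.log", "w")  # illustrative file name

def trace(frame, event, arg):
    log.write("%s, %s:%d\n" % (event, frame.f_code.co_filename, frame.f_lineno))
    log.flush()  # flush immediately so the last lines survive a crash
    return trace

sys.settrace(trace)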

Segfaults from C extensions are very frequently the result of not incrementing a reference count when you create a new reference to an object. That makes them very hard to track down, as the segfault occurs only after the last reference to the object is removed, and even then often only when some other object is being allocated.
You don't say how much C extension code you have written so far, but if you're just starting out consider whether you can use either ctypes or Cython. Ctypes may not be flexible enough for your needs, but you should be able to link to just about any C library with Cython and have all the reference counts maintained for you automatically.
That isn't always sufficient: if your Python objects and any underlying C objects have different lifetimes you can still get problems, but it does simplify things considerably.
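For illustration, a minimal ctypes sketch (assuming the standard C library can be located on your system); ctypes handles the argument conversion and keeps the Python-side objects alive for you:
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))  # locate and load the C library portably
print(libc.strlen(b"hello"))                       # prints 5; bytes go in as char*, an int comes back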

I came here looking for a solution to the same problem, and none of the other answers helped me. What did help was faulthandler; on Python 2.7 you can install the backport with pip install faulthandler.
faulthandler was only added to the standard library in Python 3.3, released in September 2012, after most of the other answers here were written.
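A minimal way to use it (the same on 3.3+ and on 2.7 with the backport) is to enable it at the top of your script:
import faulthandler
faulthandler.enable()  # on a segfault, dump the Python traceback to stderr

# ... rest of the script, including the C-extension calls that may crash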

There are somewhat undocumented Python extensions for gdb.
From the Python source, grab Tools/gdb/libpython.py (it is not included in a normal install).
Put it somewhere on sys.path.
Then:
# gdb /gps/python2.7_x64/bin/python coredump
...
Core was generated by `/usr/bin/python script.py'.
Program terminated with signal 11, Segmentation fault.
#0 call_function (oparg=<optimized out>, pp_stack=0x7f9084d15dc0) at Python/ceval.c:4037
...
(gdb) python
>import libpython
>
>end
(gdb) bt
#0 call_function (oparg=<optimized out>, pp_stack=0x7f9084d15dc0) at Python/ceval.c:4037
#1 PyEval_EvalFrameEx (f=f#entry=
Frame 0x7f9084d20ad0,
for file /usr/lib/python2.7/site-packages/librabbitmq/__init__.py, line 220,
in drain_events (self=<Connection(channels={1: <Channel(channel_id=1, connection=<...>, is_open=True, connect_timeout=4, _default_channel=<....(truncated), throwflag=throwflag#entry=0) at Python/ceval.c:2681
...
(gdb) py-list
218 else:
219 timeout = float(timeout)
>220 self._basic_recv(timeout)
221
222 def channel(self, channel_id=None):
As you can see, we now have visibility into the Python stack corresponding to the CPython call chain.
Some caveats:
Your version of gdb needs to be at least 7, and it needs to have been compiled with --with-python.
gdb embeds Python (by linking to libpython); it doesn't run it in a subshell. This means it may not necessarily match the version of Python that is on $PATH.
You need to download libpython.py from whatever version of the Python source matches the Python that gdb is linked to.
You may have to run gdb as root - if so, you may need to set up sys.path to match that of the code you are debugging.
If you cannot copy libpython.py into sys.path, you can add its location to sys.path like this:
(gdb) python
>import sys
>sys.path.append('/path/to/containing/dir/')
>import libpython
>
>end
This is somewhat poorly documented in the Python dev docs, the Fedora wiki, and the Python wiki.
If you have an older gdb or just can't get this working, there is also a gdbinit in the Python source that you can copy to ~/.gdbinit, which adds some similar functionality.

Here are 3 more alternatives:
1: Executing a script with faulthandler enabled:
python3 -X faulthandler your_script.py
2: Executing a script in debug mode (pdb)
python3 -m pdb your_script.py
and execute the script with the continue command.
The gdb tool provided the most information, but none of them print the last executed line number in my script.
3: I ended up using pytest. For this to work, I wrapped my code in a function prefixed with test_ and executed the script like this:
pytest your_script.py
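For example, a rough sketch of that wrapper (run_everything() is a hypothetical stand-in for whatever your script currently does at module level):
# your_script.py
def test_whole_script():   # pytest collects functions whose names start with test_
    run_everything()       # hypothetical: the code that used to run at the top level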

Mark's answer is awesome. If you happen to be on a machine that has lldb readily available in your path, and not gdb (which was my case), Mark's answer becomes:
lldb python
(lldb) process launch -- /path/to/script.py
## wait for segfault ##
(lldb) bt
## stack trace of the c code
I wish I would have stumbled on this answer earlier :)

Related

Python shell in debugger after opening a python core dump

I have a core dump of a running CPython program and would like to execute Python code in the dumped process's context.
I have loaded the core and the interpreter into gdb with gdb python core-dump-file.
I know about python-interactive, but it isn't able to see the context (ex: import sys; sys.modules doesn't give me any of the process's modules)
How can I do this?
I don't mind calling CPython's C functions if that is the only possible way.
1) First, check whether your gdb has been built with Python support.
You can do this (at the gdb prompt) by:
(gdb) python print("Hi from python")
If you want to check the version of python in your system try:
(gdb) python import sys
(gdb) python print(sys.version)
If these commands fail, it probably means that your gdb was never built with Python support in the first place.
You should build gdb from source, and in the configure step add --with-python="Path to python"
eg.
./configure --with-python=/usr/bin/python36
Hope this helps!!

Debugging TensorFlow tests: pdb or gdb?

I am debugging decode_raw_op_test from TensorFlow. The test file is written in Python; however, it executes code from underlying C++ files.
Using pdb, I could debug the Python test file, but it doesn't recognize the C++ files. Is there a way to debug the underlying C++ code?
(I tried using gdb on decode_raw_op_test but it gives "File not in executable format: File format not recognized")
Debugging a mixed Python and C++ program is tricky. You can use gdb to debug the C++ parts of TensorFlow, however. There are two main ways to do this:
Run python under gdb, rather than the test script itself. Let's say that your test script is in bazel-bin/tensorflow/python/kernel_tests/decode_raw_op_test. You would run the following command:
$ gdb python bazel-bin/tensorflow/python/kernel_tests/decode_raw_op_test
(gdb) run
Note that gdb does not have great support for debugging the Python parts of the code. I'd recommend narrowing down the test case that you run to a single, simple test, and setting a breakpoint on a TensorFlow C API method, such as TF_Run, which is the main entry point from Python into C++ in TensorFlow.
Attach gdb to a running process. You can get the process ID of the Python test using ps and then run (where $PID is the process ID):
$ gdb -p $PID
You will probably need to arrange for your Python code to block so that there's time to attach. Calling the raw_input() function is an easy way to do this.
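For example, a small sketch of that blocking call (Python 2 here, to match the answer; use input() on Python 3):
import os
# pause so you have time to attach with: gdb -p <pid>
raw_input("attach gdb to pid %d, then press Enter to continue..." % os.getpid())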
You could also debug using the steps below:
gdb python
then on gdb prompt, type
run bazel-bin/tensorflow/python/kernel_tests/decode_raw_op_test
Adding to mrry's answer: in today's TF2 environment, the main entry point would be TFE_Execute, so this should be where you add the breakpoint.

Where should I start to debug a python program with "Segmentation fault: 11"

I have a python script test.py.
Running python test.py gives the brief message:
Segmentation fault: 11
In general, where should I start to debug such an issue?
In general (and without specifics for your code), the best thing to do is to start putting print statements into your test script until you can narrow things down to a single line that is causing the segfault. Put in a bunch of print statements, and move them around as you narrow down which ones run and which don't because they come after the segfault.
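One detail worth adding: flush after each print, because a segfault kills the process before buffered output reaches the terminal, which can make the wrong line look like the culprit. A sketch (step_one() and step_two() are hypothetical names for pieces of your script):
import sys

print("before step one")
sys.stdout.flush()   # make sure the message is written before a possible crash
step_one()           # hypothetical call
print("before step two")
sys.stdout.flush()
step_two()           # if "before step two" is the last line you see,
                     # the crash is somewhere inside step_two()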
If your Python script is causing a segmentation fault, that usually means that some Python module that is implemented in C is doing something wrong. You should be able to tell easily using gdb. Try running:
gdb `which python`
# This starts an interactive gdb session. Type:
set args /path/to/python/script.py
r
# The program will now run. Interact with it until the segfault occurs. Then type:
bt
This will give you the C call stack leading up to the segmentation fault. (gdb may print a message about missing debug symbols and give you the command to run to install them. Debug symbols will give you a more detailed stack trace, including function names and file line numbers.) Use this information to determine more quickly which call in Python is causing the segfault.

Can I view source-code of python built-in function while debugging with pdb?

How can I examine the code of a python built-in function, for example step into sum()?
https://docs.python.org/2/library/functions.html#sum.
I expected to see what sum() does using the code below and the s command in pdb:
import pdb

def adder(nums):
    x = sum(nums)
    return x

pdb.set_trace()
print adder([1, 2, 3, 4])
Some of the Python modules are written in C (to increase performance) and cannot be stepped through in pdb. If you really want to see what's going on in these functions it's possible, but not trivial. To examine C functions I typically use the GNU Debugger (GDB) and compile Python with debugging symbols enabled.
Download the Python source code found at https://www.python.org/downloads/
Untar the Python source code | tar xzvf Python-2.7.6.tar.gz
Enter the untarred directory and run the configuration script with debugging enabled | ./configure --with-pydebug
Compile with debug symbols | make
Start your custom compiled debug Python with the GNU Debugger | gdb ./python
Set a breakpoint in GDB for the sum() call | b bltinmodule.c:builtin_sum.
Run your script from GDB (I called mine sumtest.py) | run ~/sumtest.py
The first thing that happens is that you hit your pdb.set_trace() call. Continue using c.
The next break is in the middle of the sum function in C. You can use info locals to list all the local variables. Just like in pdb, c is used to continue execution to the next breakpoint and s is used to step through single instructions.

How to step through Python code to help debug issues?

In Java/C# you can easily step through code to trace what might be going wrong, and IDE's make this process very user friendly.
Can you trace through python code in a similar fashion?
Yes! There's a Python debugger called pdb just for doing that!
You can launch a Python program through pdb by using python -m pdb myscript.py.
There are a few commands you can then issue, which are documented on the pdb page.
Some useful ones to remember are:
b: set a breakpoint
c: continue debugging until you hit a breakpoint
s: step through the code
n: to go to next line of code
l: list source code for the current file (default: 11 lines including the line being executed)
u: navigate up a stack frame
d: navigate down a stack frame
p: to print the value of an expression in the current context
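For example, with a throwaway script like the one below (buggy.py is a made-up name), you could run python -m pdb buggy.py, set a breakpoint with b 3, continue to it with c, inspect values with p a and p b, then execute the line with n and print result:
# buggy.py - a made-up example to step through
def add(a, b):
    result = a + b
    return result

print(add(1, 2))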
If you don't want to use a command line debugger, some IDEs like Pydev, Wing IDE or PyCharm have a GUI debugger. Wing and PyCharm are commercial products, but Wing has a free "Personal" edition, and PyCharm has a free community edition.
By using Python Interactive Debugger 'pdb'
The first step is to make the Python interpreter enter debugging mode.
A. From the Command Line
The most straightforward way is to run the script under pdb from the command line:
$ python -m pdb scriptName.py
> .../pdb_script.py(7)<module>()
-> """
(Pdb)
B. Within the Interpreter
Useful while developing early versions of modules and experimenting with them more interactively.
$ python
Python 2.7 (r27:82508, Jul 3 2010, 21:12:11)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pdb_script
>>> import pdb
>>> pdb.run('pdb_script.MyObj(5).go()')
> <string>(1)<module>()
(Pdb)
C. From Within Your Program
For a big project or a long-running module, you can start the debugger from inside the program using import pdb and pdb.set_trace(), like this:
#!/usr/bin/env python
# encoding: utf-8
#
import pdb

class MyObj(object):
    def __init__(self, count):
        self.count = count

    def go(self):
        for i in range(self.count):
            pdb.set_trace()
            print i
        return

if __name__ == '__main__':
    MyObj(5).go()
Step-by-step debugging commands:
Execute the next statement… with “n” (next)
Repeating the last debugging command… with ENTER
Quitting it all… with “q” (quit)
Printing the value of variables… with “p” (print)
a) p a
Turning off the (Pdb) prompt… with “c” (continue)
Seeing where you are… with “l” (list)
Stepping into subroutines… with “s” (step into)
Continuing… but just to the end of the current subroutine… with “r” (return)
Assign a new value
a) !b = "B"
Set a breakpoint
a) break linenumber
b) break functionname
c) break filename:linenumber
Temporary breakpoint
a) tbreak linenumber
Conditional breakpoint
a) break linenumber, condition
Note: all these commands should be executed from the (Pdb) prompt.
For in-depth knowledge, refer to:
https://pymotw.com/2/pdb/
https://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/
There is a module called pdb in Python. At the top of your Python script you do
import pdb
pdb.set_trace()
and you will enter debugging mode. You can use 's' to step and 'n' to go to the next line, similar to what you would do with the 'gdb' debugger.
Starting in Python 3.7, you can use the breakpoint() built-in function to enter the debugger:
foo()
breakpoint() # drop into the debugger at this point
bar()
By default, breakpoint() will import pdb and call pdb.set_trace(). However, you can control debugging behavior via sys.breakpointhook() and use of the environment variable PYTHONBREAKPOINT.
See PEP 553 for more information.
ipdb (IPython debugger)
ipdb adds IPython functionality to pdb, offering the following HUGE improvements:
tab completion
show more context lines
syntax highlighting
Much like pdb, ipdb is still far from perfect and completely rudimentary compared to GDB, but it is already a huge improvement over pdb.
Usage is analogous to pdb, just install it with:
python3 -m pip install --user ipdb
and then add to the line you want to step debug from:
__import__('ipdb').set_trace(context=21)
You likely want to add a shortcut for that from your editor, e.g. for Vim snipmate I have:
snippet ipd
__import__('ipdb').set_trace(context=21)
so I can type just ipd<tab> and it expands to the breakpoint. Then removing it is easy with dd since everything is contained in a single line.
context=21 increases the number of context lines as explained at: How can I make ipdb show more lines of context while debugging?
Alternatively, you can also debug programs from the start with:
ipdb3 main.py
but you generally don't want to do that because:
you would have to go through all function and class definitions as Python reads those lines
I don't know how to set the context size there without hacking ipdb. Patch to allow it: https://github.com/gotcha/ipdb/pull/155
Or alternatively, as in raw pdb 3.2+ you can set some breakpoints from the command line:
ipdb3 -c 'b 12' -c 'b myfunc' ~/test/a.py
although -c c is broken for some reason: https://github.com/gotcha/ipdb/issues/156
python -m module debugging has been asked at: How to debug a Python module run with python -m from the command line? and since Python 3.7 can be done with:
python -m pdb -m my_module
Serious missing features of both pdb and ipdb compared to GDB:
persistent command history across sessions: Save command history in pdb
ipdb specific annoyances:
multithreading does not work well if you don't hack some settings...
ipdb, multiple threads and autoreloading programs causing ProgrammingError
https://github.com/gotcha/ipdb/issues/51
Tested in Ubuntu 16.04, ipdb==0.11, Python 3.5.2.
VSCode
If you want to use an IDE, this is a good alternative to PyCharm.
Install VSCode
Install the Python extension, if it's not already installed
Create a file mymodule.py with Python code
To set a breakpoint, hover over a line number and click the red dot, or press F9
Hit F5 to start debugging and select Python File
It will stop at the breakpoint and you can do your usual debugging, like inspecting the values of variables, either in the VARIABLES tab (usually on the left) or in the Debug Console (usually at the bottom, next to your Terminal).
More information
Python debugging in VS Code
Getting Started with Python in VS Code
Debugging in Visual Studio Code
There is now a breakpoint() built-in, which replaces import pdb; pdb.set_trace().
It also has several new features, such as control through environment variables.
Python Tutor is an online single-step debugger meant for novices. You can put in code on the edit page then click "Visualize Execution" to start it running.
Among other things, it supports:
hiding variables, e.g. to hide a variable named x, put this at the end:
#pythontutor_hide: x
saving/sharing
a few other languages like Java, JS, Ruby, C, C++
However it also doesn't support a lot of things, for example:
Reading/writing files - use io.StringIO and io.BytesIO instead, as in the sketch after this list
Code that is too large, runs too long, or defines too many variables or objects
Command-line arguments
Lots of standard library modules like argparse, csv, enum, html, os, sys, weakref...
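A minimal sketch of that substitution (the contents are made up); io.StringIO behaves like an open text file, so code written against a file object works on it unchanged:
import io

fake_file = io.StringIO("first line\nsecond line\n")  # stands in for a real open() call
for line in fake_file:
    print(line.rstrip())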
Python 3.7+
Let's take look at what breakpoint() can do for you in 3.7+.
I have installed ipdb and pdbpp, which are both enhanced debuggers, via
pip install pdbpp
pip install ipdb
My test script doesn't really do much; it just calls breakpoint().
# test_188_breakpoint.py
myvars = dict(foo="bar")
print("before breakpoint()")
breakpoint()  # 👈
print(f"after breakpoint myvars={myvars}")
breakpoint() is linked to the PYTHONBREAKPOINT environment variable.
CASE 1: disabling breakpoint()
You can set the variable via bash as usual
export PYTHONBREAKPOINT=0
This turns off breakpoint() so that it does nothing (as long as you haven't modified sys.breakpointhook, which is outside the scope of this answer).
This is what a run of the program looks like:
(venv38) myuser#explore$ export PYTHONBREAKPOINT=0
(venv38) myuser#explore$ python test_188_breakpoint.py
before breakpoint()
after breakpoint myvars={'foo': 'bar'}
(venv38) myuser#explore$
Didn't stop, because I disabled breakpoint. Something that pdb.set_trace() can't do 😀😀😀!
CASE 2: using the default pdb behavior:
Now, let's unset PYTHONBREAKPOINT, which puts us back to the normal, enabled-breakpoint behavior (it's only disabled when set to 0, not when empty).
(venv38) myuser#explore$ unset PYTHONBREAKPOINT
(venv38) myuser#explore$ python test_188_breakpoint.py
before breakpoint()
[0] > /Users/myuser/kds2/wk/explore/test_188_breakpoint.py(6)<module>()
-> print(f"after breakpoint myvars={myvars}")
(Pdb++) print("pdbpp replaces pdb because it was installed")
pdbpp replaces pdb because it was installed
(Pdb++) c
after breakpoint myvars={'foo': 'bar'}
It stopped, but I actually got pdbpp, because it replaces pdb entirely while installed. If I uninstalled pdbpp, I'd be back to normal pdb.
Note: a standard pdb.set_trace() would still get me pdbpp
CASE 3: calling a custom debugger
But let's call ipdb instead. This time, instead of setting the environment variable, we can use bash to set it only for this one command.
(venv38) myuser#explore$ PYTHONBREAKPOINT=ipdb.set_trace py test_188_breakpoint.py
before breakpoint()
> /Users/myuser/kds2/wk/explore/test_188_breakpoint.py(6)<module>()
5 breakpoint()
----> 6 print(f"after breakpoint myvars={myvars}")
7
ipdb> print("and now I invoked ipdb instead")
and now I invoked ipdb instead
ipdb> c
after breakpoint myvars={'foo': 'bar'}
Essentially, what it does, when looking at $PYTHONBREAKPOINT:
from ipdb import set_trace # function imported on the right-most `.`
set_trace()
Again, much cleverer than a plain old pdb.set_trace() 😀😀😀
In practice? I'd probably settle on one debugger.
Say I always want ipdb; I would:
export it via .profile or similar.
disable it on a command-by-command basis, without modifying the normal value.
Example (pytest and debuggers often make for unhappy couples):
(venv38) myuser#explore$ export PYTHONBREAKPOINT=ipdb.set_trace
(venv38) myuser#explore$ echo $PYTHONBREAKPOINT
ipdb.set_trace
(venv38) myuser#explore$ PYTHONBREAKPOINT=0 pytest test_188_breakpoint.py
=================================== test session starts ====================================
platform darwin -- Python 3.8.6, pytest-5.1.2, py-1.9.0, pluggy-0.13.1
rootdir: /Users/myuser/kds2/wk/explore
plugins: celery-4.4.7, cov-2.10.0
collected 0 items
================================== no tests ran in 0.03s ===================================
(venv38) myuser#explore$ echo $PYTHONBREAKPOINT
ipdb.set_trace
P.S. I'm using bash under macOS; any POSIX shell will behave substantially the same. Windows, with either PowerShell or cmd, may have different capabilities, especially around PYTHONBREAKPOINT=<some value> <some command> to set an environment variable for only one command.
If you come from a Java/C# background, I guess your best bet would be to use Eclipse with PyDev. This gives you a fully functional IDE with a debugger built in. I use it with Django as well.
https://wiki.python.org/moin/PythonDebuggingTools
pudb is a good drop-in replacement for pdb
PyCharm is an IDE for Python that includes a debugger. Watch this YouTube video for an introduction on using it to step through code:
PyCharm Tutorial - Debug python code using PyCharm (the debugging starts at 6:34)
Note: PyCharm is a commercial product, but the company does provide a free license to students and teachers, as well as a "lightweight" Community version that is free and open-source.
If you want an IDE with integrated debugger, try PyScripter.
Programmatically stepping and tracing through Python code is possible too (and it's easy!). Look at the sys.settrace() documentation for more details. Also, here is a tutorial to get you started.
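As a small variation on the settrace example earlier on this page, here is a sketch that logs only function calls rather than every line:
import sys

def call_logger(frame, event, arg):
    if event == "call":  # fires once per function call
        print("calling %s (%s:%d)" % (frame.f_code.co_name,
                                      frame.f_code.co_filename,
                                      frame.f_lineno))
    return None  # returning None disables per-line tracing in that frame

sys.settrace(call_logger)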
Visual Studio with PTVS could be an option for you: http://www.hanselman.com/blog/OneOfMicrosoftsBestKeptSecretsPythonToolsForVisualStudioPTVS.aspx
