best way to deal with python pdb flakiness re/stdout?

If I have a program where stdout is redirected, my pdb prompts all go to the redirection target, because the library was written to write to stdout.
Oftentimes this problem is subtle, causing me to think a program is hanging when it's really waiting for input.
How do people work around this? (Unfortunately, using other debuggers like winpdb is not an option).

This answer is just to supplement Ned's, as a way of wrapping the pdb.py main() function in a manner which doesn't require copying 40 lines just to change one of them:
# sane_pdb.py: launch Pdb with stdout on original
import sys, pdb

def fixed_pdb(Pdb=pdb.Pdb):
    '''make Pdb() tied to original stdout'''
    return Pdb(stdout=sys.__stdout__)

if __name__ == '__main__':
    pdb.Pdb = fixed_pdb
    pdb.main()
I don't know if it actually works for the questioner's problem, but it does what Ned described...
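You would then invoke it the same way you would invoke pdb itself, for example (script name illustrative):
python sane_pdb.py yourscript.py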

The problem here is that pdb uses the Cmd class, where by default:
use_rawinput = 1
This means that Cmd reads commands with the raw_input() built-in instead of sys.stdin.readline(). This is done because raw_input() supports history (but only if the readline module is loaded) and other useful bits. The only catch is that raw_input() writes its prompt to stdout, so it does not play well with output redirection. If you have a script:
#!/usr/bin/python
name=raw_input("Enter your name: ")
and run it
> python test.py
Enter your name: Alex
but if you run it with output redirection, it appears stuck because the prompt disappears into the pipe:
> python test.py | tee log
This is exactly what pdb uses, and why it appears stuck as well. As mentioned, sys.stdin.readline() has no such problem, and if you rewrite the above script using readline() it should work.
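A minimal sketch of that rewrite (writing the prompt to stderr is my own addition: stderr is unbuffered and is not captured by "| tee log", so the prompt stays visible):
#!/usr/bin/python
import sys
sys.stderr.write("Enter your name: ")  # prompt on stderr survives stdout redirection
name = sys.stdin.readline().rstrip("\n")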
Back to the original issue: all you need to do is tell Cmd not to use raw_input():
Cmd.use_rawinput = 0
or (renaming the variable so it does not shadow the pdb module):
debugger = pdb.Pdb()
debugger.use_rawinput = 0
debugger.set_trace()

If you are invoking pdb in code, you can pass your own stdout into the constructor. sys.__stdout__ might be a good choice.
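For instance, a minimal sketch of that in-code invocation:
import sys, pdb
# prompts go to the real console even though sys.stdout has been redirected
pdb.Pdb(stdout=sys.__stdout__).set_trace()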
If you are invoking pdb from the command line, you could copy the main() function from pdb.py into your own sane_pdb.py. Then change the Pdb() initialization to:
pdb = Pdb(stdout=sys.__stdout__)
Then you can invoke sane_pdb.py instead of pdb.py. It's not awesome that you'd have to copy 40 lines into your own file just to change one of them, but it's an option.

Related

Python debugger with line editing in a program that uses stdin

To add an ad hoc debugger breakpoint in a Python script, I can insert the line
import pdb; pdb.set_trace()
Pdb reads from standard input, so this doesn't work if the script itself also reads from standard input. As a workaround, on a Unix-like system, I can tell pdb to read from the terminal:
import pdb; pdb.Pdb(stdin=open('/dev/tty', 'r'), stdout=open('/dev/tty', 'w')).set_trace()
This works, but unlike with a plain pdb.set_trace, I don't get the benefit of command-line editing provided by the readline library (arrow keys, etc.).
How can I enter pdb without interfering with the script's stdin and stdout, and still get command-line editing?
Ideally the same code should work in both Python 2 and Python 3. Compatibility with non-Unix systems would be a bonus.
Toy program as a test case:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    #import pdb; pdb.set_trace()
    import pdb; pdb.Pdb(stdin=open('/dev/tty', 'r'), stdout=open('/dev/tty', 'w')).set_trace()
    sys.stdout.write(line)
Usage: { echo one; echo two; } | python cat.py
I hope I have not missed anything important, but it seems like you cannot do this in an entirely trivial way, because readline is only used if pdb.Pdb (resp. cmd.Cmd, which it subclasses) has use_rawinput set to non-zero; that, however, would result in ignoring your stdin and mixing input for the debugger and the script itself. That said, the best I've come up with so far is:
#!/usr/bin/env python3
import os
import sys
import pdb

pdb_inst = pdb.Pdb()

# keep a handle on the original (piped) stdin, then point fd 0 at the terminal
stdin_called = os.fdopen(os.dup(0))
console_new = open('/dev/tty')
os.dup2(console_new.fileno(), 0)
console_new.close()
sys.stdin = os.fdopen(0)

for line in stdin_called:
    pdb_inst.set_trace()
    sys.stdout.write(line)
It is relatively invasive to your original script, although it could at least be placed outside of it and imported, then called or used as a wrapper.
I've duplicated the incoming STDIN to a new file descriptor and opened that as stdin_called. Then (based on your example) I've opened /dev/tty for reading, replaced the process's file descriptor 0 (for STDIN; strictly it should be the value returned by sys.stdin.fileno()) with the one just opened, and also reassigned a corresponding file-like object to sys.stdin. This way the program's loop and pdb use their own separate input streams, while pdb gets to interact with what appears to be a normal console STDIN that it is happy to enable readline on.
It isn't pretty, but it should do what you were after, and it hopefully provides useful hints. When in pdb it uses readline (line editing, history, completion), if available:
$ { echo one; echo two; } | python3 cat.py
> /tmp/so/cat.py(16)<module>()
-> sys.stdout.write(line)
(Pdb) c
one
> /tmp/so/cat.py(15)<module>()
-> pdb_inst.set_trace()
(Pdb) con[TAB][TAB]
condition cont continue
(Pdb) cont
two
Note that starting with Python 3.7 you can use breakpoint() instead of import pdb; pdb.Pdb().set_trace() for convenience, and you can also check the result of the dup2 call to make sure the file descriptor was created/replaced as expected.
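A sketch of both suggestions combined (assuming Python 3.7+, where os.dup2 returns the new descriptor; console_new is from the snippet above):
fd = os.dup2(console_new.fileno(), 0)
assert fd == 0, "stdin's descriptor was not replaced"
breakpoint()  # honors PYTHONBREAKPOINT; defaults to pdb.set_trace()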
EDIT: As mentioned earlier and noted in a comment by the OP, this is both ugly and invasive to the script. It doesn't get any prettier, but we can employ a few tricks to reduce the impact on the surrounding code. One such option I've hacked together:
import sys

# Add this: BEGIN
import os
import pdb
import inspect

pdb_inst = pdb.Pdb()

class WrapSys:
    def __init__(self):
        self.__stdin = os.fdopen(os.dup(0))
        self.__console = open('/dev/tty')
        os.dup2(self.__console.fileno(), 0)
        self.__console.close()
        self.__console = os.fdopen(0)
        self.__sys = sys

    def __getattr__(self, name):
        if name == 'stdin':
            if any(f.filename.endswith("pdb.py") for f in inspect.stack()):
                return self.__console
            else:
                return self.__stdin
        else:
            return getattr(self.__sys, name)

sys = WrapSys()
# Add this: END

for line in sys.stdin:
    pdb_inst.set_trace()  # Inject breakpoint
    sys.stdout.write(line)
I have not dug all the way through, but as is, pdb/cmd really seems to need not only sys.stdin, but also for it to use fd 0 in order for readline to kick in. The above example takes things up a notch and, within our script, hijacks what sys stands for in order to present a different meaning for sys.stdin whenever code from pdb.py is on the stack. One obvious caveat: if anything other than pdb also expects sys.stdin's file descriptor to be 0, it is still out of luck (or will read its input from a different stream if it just goes for it).

How should a Python file be written such that it can be both a module and a script with command line options and pipe capabilities?

I'm considering how a Python file could be made to be an importable module as well as a script that is capable of accepting command line options and arguments as well as pipe data. How should this be done?
My attempt seems to work, but I want to know if my approach is how such a thing should be done (if such a thing should be done). Could there be complexities (such as when importing it) that I have not considered?
#!/usr/bin/env python
"""
usage:
    program [options]

options:
    --version        display version and exit
    --datamode       engage data mode
    --data=FILENAME  input data file [default: data.txt]
"""
import docopt
import sys

version = "1.0"  # placeholder; referenced by the --version option below

def main(options):
    print("main")
    datamode = options["--datamode"]
    filename_input_data = options["--data"]
    if datamode:
        print("engage data mode")
        process_data(filename_input_data)
    if not sys.stdin.isatty():
        print("accepting pipe data")
        input_stream = sys.stdin
        input_stream_list = [line for line in input_stream]
        print("input stream: {data}".format(data=input_stream_list))

def process_data(filename):
    print("process data of file {filename}".format(filename=filename))

if __name__ == "__main__":
    options = docopt.docopt(__doc__)
    if options["--version"]:
        print(version)
        exit()
    main(options)
That's it, you're good.
Nothing matters[1] except the if __name__ == '__main__', as noted elsewhere
From the docs (emphasis mine):
A module’s __name__ is set equal to '__main__' when read from standard input, a script, or from an interactive prompt. A module can discover whether or not it is running in the main scope by checking its own __name__, which allows a common idiom for conditionally executing code in a module when it is run as a script or with python -m but not when it is imported
I also like how python 2's docs poetically phrase it
It is this environment in which the idiomatic “conditional script” stanza causes a script to run:
That guard guarantees that the code underneath it will only be executed when the module is run as the entry point; put all your argument-grabbing code there. If there is no other top-level code except class/function declarations, the file will be safe to import.
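A toy illustration of the guard (module name hypothetical):
# mymodule.py
def main():
    print("running as a script")

if __name__ == "__main__":
    main()  # runs for "python mymodule.py", but not for "import mymodule"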
Other complications?
Yes:
Multiprocessing (a new interpreter is started and things are re-imported); if __name__ == '__main__' covers that.
If you're used to C coding, you might be thinking you can protect your imports with ifdefs and the like. There are some analogous hacks in Python, but they're not what you're looking for.
I like having a main method like C and Java - when's that coming out? Never.
But I'm paranoid! What if someone changes my main function? Stop being friends with that person. As long as you're the user, I assume this isn't an issue.
I mentioned the -m flag. That sounds great, what's that? See here and here, but don't worry about it.
Footnotes:
[1] Well, the fact that you put your main code in a function is nice. It means things will run slightly faster, since local variable lookups are faster than global lookups in CPython.

Use pdb to debug into subprocess?

I have some Python code with many calls to subprocess (for example, subprocess.check_call()). pdb apparently can't debug into the subprocess.
Is there any way (e.g. adding code) to make it do that, or must I use a different debugger?
It turns out the obstacle was that the code calling subprocess also redirected stdout:
subprocess.call(["called_program",
                 "-q", num_processes,
                 "-k", yaml_key],
                stdout=logfile,
                stderr=subprocess.STDOUT)
Per tdelaney's comment, I removed the redirects and put a breakpoint in called_program.py. Now I can use pdb within that module.
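A sketch of the working arrangement, reusing the names from the snippet above (the breakpoint goes wherever you want to stop in the child):
# parent: no stdout/stderr redirection, so the child's pdb prompt reaches the console
subprocess.check_call(["called_program", "-q", num_processes, "-k", yaml_key])

# child (called_program.py), at the point of interest:
import pdb; pdb.set_trace()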
You will have to step through the code with pdb. If you have the source files, set a breakpoint on the line of interest and pdb will stop there automatically.
This is what we do in .NET; hopefully it works for Python too.

Running a python debug session from a program, not from the console

I'm writing a little python IDE, and I want to add simple debugging. I don't need all the features of winpdb.
How do I launch a python program (by file name) with a breakpoint set at a line number so that it runs until that line number and halts?
Note that I don't want to do this from the command-line, and I don't want to edit the source (by inserting set_trace, for example). And I don't want it to stop at the first line so I have to run the debugger from there. I've tried all the obvious ways with pdb and bdb, but I must be missing something.
Pretty much the only viable way to do it (as far as I know) is to run Python as a subprocess from within your IDE. This avoids "pollution" from the current Python interpreter, which makes it fairly likely that the program will run in the same way as if you had started it independently. (If you have issues with this, check the subprocess environment.) In this manner, you can run a script in "debug mode" using
p = subprocess.Popen(args=[sys.executable, '-m', 'pdb', 'scriptname.py', 'arg1'],
                     stdin=subprocess.PIPE,
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
This will start up Python at the debugger prompt. You'll need to run some debugger commands to set breakpoints, which you can do like so:
o,e = p.communicate('break scriptname.py:lineno')
If this works, o should be the normal output of the Python interpreter after it sets a breakpoint, and e should be empty. I'd suggest you play around with this and add some checks in your code to verify that the breakpoints were properly set.
After that, you can start the program running with
p.communicate('continue')
At this point you'd probably want to hook the input, output, and error streams up to the console that you're embedding in your IDE. You would probably need to do this with an event loop, roughly like so:
while p.returncode is None:
    o, e = p.communicate(console.read())
    console.write(o)
    console.write(e)
You should consider that snippet to be effectively pseudocode, since depending on how exactly your console works, it'll probably take some tinkering to get it right.
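One caveat worth flagging: communicate() waits for the process to terminate and can only be called once, so a real event loop would talk to the pipes directly, roughly like this (line number illustrative):
p.stdin.write(b'break scriptname.py:42\n')
p.stdin.flush()
response = p.stdout.readline()  # e.g. pdb's breakpoint confirmation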
If this seems excessively messy, you can probably simplify the process a bit using the features of Python's pdb and bdb modules (I'm guessing "Python debugger" and "basic debugger" respectively). The best reference on how to do this is the source code of the pdb module itself. Basically, the way the responsibilities of the modules are split is that bdb handles "under the hood" debugger functionality, like setting breakpoints, or stopping and restarting execution; pdb is a wrapper around this that handles user interaction, i.e. reading commands and displaying output.
For your IDE-integrated debugger, it would make sense to adjust the behavior of the pdb module in two ways that I can think of:
have it automatically set breakpoints during initialization, without you having to explicitly send the textual commands to do so
make it take input from and send output to your IDE's console
Just these two changes should be easy to implement by subclassing pdb.Pdb. You can create a subclass whose initializer takes a list of breakpoints as an additional argument:
class MyPDB(pdb.Pdb):
    def __init__(self, breakpoints, completekey='tab',
                 stdin=None, stdout=None, skip=None):
        pdb.Pdb.__init__(self, completekey, stdin, stdout, skip)
        self._breakpoints = breakpoints
The logical place to actually set up the breakpoints is just after the debugger reads its .pdbrc file, which occurs in the pdb.Pdb.setup method. To perform the actual setup, use the set_break method inherited from bdb.Bdb:
def setInitialBreakpoints(self):
    _breakpoints = self._breakpoints
    self._breakpoints = None  # to avoid setting breaks twice
    for bp in _breakpoints:
        self.set_break(filename=bp.filename, lineno=bp.lineno,
                       temporary=bp.temporary, cond=bp.cond,
                       funcname=bp.funcname)

def setup(self, f, t):
    pdb.Pdb.setup(self, f, t)
    self.setInitialBreakpoints()
This piece of code works with each breakpoint passed in as, e.g., a named tuple whose fields mirror set_break's parameters. You could also experiment with constructing bdb.Breakpoint instances directly, but I'm not sure that would work properly, since bdb.Bdb maintains its own information about breakpoints.
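For concreteness, a hypothetical record type matching the fields used above:
from collections import namedtuple
Breakpoint = namedtuple('Breakpoint',
                        'filename lineno temporary cond funcname')
breakpoints = [Breakpoint('scriptname.py', 42, False, None, None)]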
Next, you'll need to create a new main method for your module which runs it the same way pdb runs. To some extent, you can copy the main method from pdb (and the if __name__ == '__main__' statement of course), but you'll need to augment it with some way to pass in the information about your additional breakpoints. What I'd suggest is writing the breakpoints to a temporary file from your IDE, and passing the name of that file as a second argument:
tmpfilename = ...
# write breakpoint info
p = subprocess.Popen(args=[sys.executable, '-m', 'mypdb', tmpfilename, ...], ...)
# delete the temporary file
Then in mypdb.main(), you would add something like this:
def main():
    # code excerpted from pdb.main()
    ...
    del sys.argv[0]
    # add this
    bpfilename = sys.argv[0]
    with open(bpfilename) as f:
        # read breakpoint info
        breakpoints = ...
    del sys.argv[0]
    # back to excerpt from pdb.main()
    sys.path[0] = os.path.dirname(mainpyfile)
    pdb = Pdb(breakpoints) # modified
Now you can use your new debugger module just like you would use pdb, except that you don't have to explicitly send break commands before the process starts. This has the advantage that you can directly hook the standard input and output of the Python subprocess to your console, if it allows you to do that.

python: run interactive python shell from program

I often have the case that I'll be writing a script, and I'm up to a part of the script where I want to play around with some of the variables interactively. Getting to that part requires running a large part of the script I've already written.
In this case it isn't trivial to run this program from inside the shell. I would have to recreate the conditions of that function somehow.
What I want to do is call a function, like runshell(), which will run the python shell at that point in the program, keeping all variables in scope, allowing me to poke around in it.
How would I go about doing that?
import code
code.interact(local=locals())
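If you want the runshell() helper described in the question, here is a minimal sketch (the frame inspection is my own addition, so the shell sees the caller's variables):
import code
import inspect

def runshell():
    # expose the calling frame's globals and locals to the interactive shell
    frame = inspect.currentframe().f_back
    code.interact(local=dict(frame.f_globals, **frame.f_locals))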
But using the Python debugger is probably more what you want:
import pdb
pdb.set_trace()
By far the most convenient method that I have found is:
import IPython
IPython.embed()
You get all your global and local variables and all the creature comforts of IPython: tab completion, auto indenting, etc.
You have to install the IPython module to use it of course:
pip install ipython
For practicality I'd like to add that you can put the debugger trace in a one-liner:
import pdb; pdb.set_trace()
which is a nice line to add to an editor that supports snippets, like TextMate or Vim+SnipMate. I have it set up to expand "break" into the above one-liner.
You can use the Python debugger's (pdb) set_trace function.
For example, given a script like this:
def whatever():
    x = 3
    import pdb
    pdb.set_trace()

if __name__ == '__main__':
    whatever()
You get the scope at the point when set_trace is called:
$ python ~/test/test.py
--Return--
> /home/jterrace/test/test.py(52)whatever()->None
-> pdb.set_trace()
(Pdb) x
3
(Pdb)
Not exactly a perfect source, but I've written a few manholes before. Here is one I wrote for an abandoned pet project: http://code.google.com/p/devdave/source/browse/pymethius/trunk/webmud/handlers/konsole.py
And here is one from the Twisted library: http://twistedmatrix.com/trac/browser/tags/releases/twisted-8.1.0/twisted/manhole/telnet.py (the console logic is in Shell.doCommand).
