Python debugger with line editing in a program that uses stdin - python

To add an ad hoc debugger breakpoint in a Python script, I can insert the line
import pdb; pdb.set_trace()
Pdb reads from standard input, so this doesn't work if the script itself also reads from standard input. As a workaround, on a Unix-like system, I can tell pdb to read from the terminal:
import pdb; pdb.Pdb(stdin=open('/dev/tty', 'r'), stdout=open('/dev/tty', 'w')).set_trace()
This works, but unlike with a plain pdb.set_trace(), I don't get the benefit of command line editing provided by the readline library (arrow keys, etc.).
How can I enter pdb without interfering with the script's stdin and stdout, and still get command line editing?
Ideally the same code should work in both Python 2 and Python 3. Compatibility with non-Unix systems would be a bonus.
Toy program as a test case:
#!/usr/bin/env python
import sys
for line in sys.stdin:
    #import pdb; pdb.set_trace()
    import pdb; pdb.Pdb(stdin=open('/dev/tty', 'r'), stdout=open('/dev/tty', 'w')).set_trace()
    sys.stdout.write(line)
Usage: { echo one; echo two; } | python cat.py

I hope I have not missed anything important, but it seems like you cannot really do that in an entirely trivial way, because readline only gets used if pdb.Pdb (or rather the cmd.Cmd it subclasses) has use_rawinput set to non-zero, which would however result in ignoring your stdin and mixing the debugger's input with the script's. That said, the best I've come up with so far is:
#!/usr/bin/env python3
import os
import sys
import pdb
pdb_inst = pdb.Pdb()
stdin_called = os.fdopen(os.dup(0))
console_new = open('/dev/tty')
os.dup2(console_new.fileno(), 0)
console_new.close()
sys.stdin = os.fdopen(0)
for line in stdin_called:
    pdb_inst.set_trace()
    sys.stdout.write(line)
It is relatively invasive to your original script, even though the setup could at least be placed outside of it and imported and called, or used as a wrapper.
I've redirected (duplicated) the incoming STDIN to a new file descriptor and opened that as stdin_called. Then (based on your example) I've opened /dev/tty for reading, replaced the process's file descriptor 0 (for STDIN; it should rather use the value returned by sys.stdin.fileno()) with the one I've just opened, and also reassigned a corresponding file-like object to sys.stdin. This way the program's loop and pdb each use their own input stream, while pdb gets to interact with what appears to be a perfectly normal console STDIN that it is happy to enable readline on.
It isn't pretty, but it should do what you were after, and it hopefully provides useful hints. When in pdb it uses readline (line editing, history, completion) if available:
$ { echo one; echo two; } | python3 cat.py
> /tmp/so/cat.py(16)<module>()
-> sys.stdout.write(line)
(Pdb) c
one
> /tmp/so/cat.py(15)<module>()
-> pdb_inst.set_trace()
(Pdb) con[TAB][TAB]
condition cont continue
(Pdb) cont
two
Note that starting with Python 3.7 you can use breakpoint() instead of import pdb; pdb.Pdb().set_trace() for convenience, and you could also check the result of the dup2 call to make sure the file descriptor really got created/replaced as expected.
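As an illustration of both points, here is a minimal sketch that factors the redirection into a small importable helper, checks the dup2 result, and relies on a plain breakpoint() in the script; the module name tty_stdin.py and the function name detach_stdin are placeholders of my own, not part of the answer above:
# tty_stdin.py -- hypothetical helper (a sketch; Python 3.7+, Unix only)
import os
import sys

def detach_stdin():
    """Return the piped data as a file object and point fd 0 at the terminal."""
    piped = os.fdopen(os.dup(0))        # duplicate of the original (piped) stdin
    tty = open('/dev/tty')
    if os.dup2(tty.fileno(), 0) != 0:   # os.dup2 returns the new descriptor (3.7+)
        raise OSError("could not replace file descriptor 0")
    tty.close()
    sys.stdin = os.fdopen(0)            # keep sys.stdin consistent with fd 0
    return piped
The toy script would then iterate with for line in tty_stdin.detach_stdin(): and call a plain breakpoint() inside the loop.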
EDIT: As mentioned earlier and noted in a comment by OP, this is both ugly and invasive to the script. It's not making it any prettier, but we can employ a few tricks to reduce the impact on its surroundings. One such option I've hacked together:
import sys
# Add this: BEGIN
import os
import pdb
import inspect
pdb_inst = pdb.Pdb()
class WrapSys:
    def __init__(self):
        self.__stdin = os.fdopen(os.dup(0))
        self.__console = open('/dev/tty')
        os.dup2(self.__console.fileno(), 0)
        self.__console.close()
        self.__console = os.fdopen(0)
        self.__sys = sys
    def __getattr__(self, name):
        if name == 'stdin':
            if any((f.filename.endswith("pdb.py") for f in inspect.stack())):
                return self.__console
            else:
                return self.__stdin
        else:
            return getattr(self.__sys, name)
sys = WrapSys()
# Add this: END
for line in sys.stdin:
    pdb_inst.set_trace()  # Inject breakpoint
    sys.stdout.write(line)
I have not dug all the way through, but as is, pdb/cmd really seems to need not only sys.stdin but also for it to use fd 0 in order for readline to kick in. The above example takes things up a notch and, within our script, hijacks what sys stands for in order to present a different meaning for sys.stdin whenever code from pdb.py is on the stack. One obvious caveat: if anything other than pdb also expects and depends on sys.stdin's fd being 0, it would still be out of luck (or would read its input from a different stream if it just went for it).

Related

call an executable string from python

I'm trying to find a way to run an executable script that can be downloaded from the web from Python, without saving it as a file. The script can be python code or bash or whatever - it should execute appropriately based on the shebang. I.e. if the following were saved in a file called script, then I want something that will run ./script without needing to save the file:
#!/usr/bin/env python3
import sys
from my_module import *
scoped_hash = sys.argv[1]
print(scoped_hash)
I have a function that reads such a file from the web and attempts to execute it:
def execute_artifact(command_string):
    os.system('sh | ' + command_string)
Here's what happens when I call it:
>>> print(string)
'#!/usr/bin/env python3\nimport sys\nfrom my_module import *\n\nscoped_hash = sys.argv[1]\n\nobject_string = read_artifact(scoped_hash)\nparsed_object = parse_object(object_string)\nprint(parsed_object)\n'
>>> execute_artifact(string)
sh-3.2$ Version: ImageMagick 7.0.10-57 Q16 x86_64 2021-01-10 https://imagemagick.org
Copyright: © 1999-2021 ImageMagick Studio LLC
License: https://imagemagick.org/script/license.php
Features: Cipher DPC HDRI Modules OpenMP(4.5)
Delegates (built-in): bzlib freetype gslib heic jng jp2 jpeg lcms lqr ltdl lzma openexr png ps tiff webp xml zlib
Usage: import [options ...] [ file ]
Bizarrely, ImageMagick is called. I'm not sure what's going on, but I'm sure there's a better way to do this. Can anyone help me?
EDIT: This answer was added before OP updated requirements to include:
The script can be python code or bash or whatever - it should execute appropriately based on the shebang.
Some may still find the below helpful if they decided to try to parse the shebang themselves:
Probably, the sanest way to do this is to pass the string to the python interpreter as standard input:
import subprocess
p = subprocess.Popen(["python"], stdin=subprocess.PIPE)
p.communicate(command_string.encode())
My instinct tells me this entire thing is fraught with pitfalls. Perhaps, at least, you want to launch it using the same executable that launched your current process, so:
import subprocess
import sys
p = subprocess.Popen([sys.executable], stdin=subprocess.PIPE)
p.communicate(command_string.encode())
If you want to use arguments, the -c option lets you pass the code in as a string and still have access to the remaining arguments, so:
import subprocess
import sys
command_string = """
import sys
print(f"{sys.argv=}")
"""
completed_process = subprocess.run([sys.executable, "-c", command_string, "foo", "bar", "baz"])
The above prints:
sys.argv=['-c', 'foo', 'bar', 'baz']
This cannot be done in full generality.
If you want the shebang line to be interpreted as usual, you must write the script to a file. This is a hard requirement of the protocol that makes shebangs work. When a script with a shebang line is executed by the operating system, the kernel (and yes, it’s not the shell which does it, unlike what the question implies) reads the script and invokes the interpreter specified in the shebang, passing the pathname of the script as a command line argument. For that mechanism to work, the script must exist in the file system where the interpreter can find it. (It’s a rather fragile design, leading to some security issues, but it is what it is.)
Many interpreters will allow you to specify the program text on standard input or on the command line, but it is nowhere guaranteed that it will work for any interpreter. If you know you are working with an interpreter which can do it, you can simply try to parse the shebang line yourself and invoke the interpreter manually:
import io
import subprocess
import re
_RE_SHBANG = re.compile(br'^#!\s*(\S+)(?:\s+(.*))?\s*\n$')
def execute(script_body):
    stream = io.BytesIO(script_body)
    shebang = stream.readline()
    m = _RE_SHBANG.match(shebang)
    if not m:
        # not a shebang
        raise ValueError(shebang)
    interp, arg = m.groups()
    arg = (arg,) if arg is not None else ()
    return subprocess.call([interp, *arg, '-c', script_body])
The above will work for POSIX shell and Python scripts, but not e.g. for Perl, node.js or standalone Lua scripts, as the respective interpreters take the -e option instead of -c (and the latter doesn’t even ignore shebangs in code given on the command line, so that needs to be separately stripped too). Feeding the script to the interpreter through standard input is also possible, but considerably more involved, and will prevent the script itself from using the standard input stream. That is also possible to overcome, but it doesn’t change the fact that it’s just a makeshift workaround that isn’t anywhere guaranteed to work in the first place. Better to simply write the script to a file anyway.
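For what it is worth, the stdin-feeding variant mentioned above could look roughly like this for interpreters that accept a program on standard input (e.g. sh or python); the helper name execute_via_stdin is mine, and interp/args are assumed to come from the same shebang parsing as in execute() above:
import subprocess

def execute_via_stdin(interp, args, script_body):
    # Hypothetical variant: the interpreter reads the program text from its
    # stdin, so the executed script can no longer consume standard input itself.
    return subprocess.run([interp, *args], input=script_body).returncode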

Sending curses application's output to tty1

Goal
I'd like to make my curses Python application display its output on a Linux machine's first physical console (TTY1) by adding it to /etc/inittab, reloading init with telinit q and so on.
I'd like to avoid a hacky way of using IO redirection when starting it from /etc/inittab with:
1:2345:respawn:/path/to/app.py > /dev/tty1 < /dev/tty1
What I'm after is doing it natively from within my app, similar to the way getty does it, i.e. you use a command line argument to tell it on which TTY to listen to:
S0:2345:respawn:/sbin/getty -L ttyS1 115200 vt100
Example code
For simplicity, let's say I've written this very complex app that when invoked, prints some content using ncurses routines.
import curses
class CursesApp(object):
    def __init__(self, stdscr):
        self.stdscr = stdscr
        # Code producing some output, accepting user input, etc.
        # ...
curses.wrapper(CursesApp)
The code I already have does everything I need, except that it only shows its output on the terminal it's run from. When invoked from inittab without the hacky redirection I mentioned above, it works but there's no output on TTY1.
I know that init doesn't redirect input and output by itself, so that's expected.
How would I need to modify my existing code to send its output to the requested TTY instead of STDOUT?
PS. I'm not asking how to add support for command line arguments, I already have this but removed it from the code sample for brevity.
This is rather simple. Just open the terminal device once for input and once for output; then duplicate the input descriptor over the process's file descriptor 0, and the output descriptor over file descriptors 1 and 2. Then close the now-redundant handles to the TTY:
import os
import sys
with open('/dev/tty6', 'rb') as inf, open('/dev/tty6', 'wb') as outf:
    os.dup2(inf.fileno(), 0)
    os.dup2(outf.fileno(), 1)
    os.dup2(outf.fileno(), 2)
I tested this with the cmd module running on TTY6:
import cmd
cmd.Cmd().cmdloop()
Works perfectly. With curses, the garbled output makes it apparent that something is missing: the TERM environment variable:
os.environ['TERM'] = 'linux'
Execute all these statements before even importing curses and it should work.
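Putting the answer's pieces together, a sketch of how the app might attach itself to a given TTY before importing curses (the helper name attach_to_tty and the hard-coded '/dev/tty1' are placeholders; your command line parsing would supply the real path):
import os

def attach_to_tty(tty_path):
    with open(tty_path, 'rb') as inf, open(tty_path, 'wb') as outf:
        os.dup2(inf.fileno(), 0)   # stdin
        os.dup2(outf.fileno(), 1)  # stdout
        os.dup2(outf.fileno(), 2)  # stderr
    os.environ['TERM'] = 'linux'

attach_to_tty('/dev/tty1')

import curses  # imported only after the descriptors and TERM are in place
# ... curses.wrapper(CursesApp) as in the question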

Suppress matplotlib figures when running .py files via python or ipython terminal

I am writing a test_examples.py to test the execution of a folder of python examples. Currently I use glob to parse the folder and then use subprocess to execute each python file. The issue is that some of these files are plots and they open a Figure window that halts until the window is closed.
A lot of the questions on this issue offer solutions from within the file, but how could I suppress the output whilst running the file externally without any modification?
What I have done so far is:
import subprocess as sb
import glob
from nose import with_setup
def test_execute():
    files = glob.glob("../*.py")
    files.sort()
    for fl in files:
        try:
            sb.call(["ipython", "--matplotlib=Qt4", fl])
        except:
            assert False, "File: %s ran with some errors\n" % (fl)
This kind of works, in that it suppresses the Figures, but it doesn't throw any exceptions (even if the program has an error). I am also not 100% sure what it is doing. Is it appending all of the figures to Qt4 or will the Figure be removed from memory when that script has finished?
Ideally I would like to run each .py file and capture its stdout and stderr, then use the exit condition to report the stderr and fail the tests. Then when I run nosetests it will run the examples folder of programs and check that they all run.
You could force matplotlib to use the Agg backend (which won't open any windows) by inserting the following lines at the top of each source file:
import matplotlib
matplotlib.use('Agg')
Here's a one-liner shell command that will dynamically insert these lines at the top of my_script.py (without modifying the file on disk) before piping the output to the Python interpreter for execution:
~$ sed "1i import matplotlib\nmatplotlib.use('Agg')\n" my_script.py | python
You should be able to make the equivalent call using subprocess, like this:
p1 = sb.Popen(["sed", "1i import matplotlib\nmatplotlib.use('Agg')\n", fl],
              stdout=sb.PIPE)
exit_cond = sb.call(["python"], stdin=p1.stdout)
You could capture the stderr and stdout from your scripts by passing the stdout= and stderr= arguments to sb.call(). This would, of course, only work in Unix environments that have the sed utility.
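For example, building on the snippet above (the log file names here are my own, for illustration):
with open("out.log", "w") as out, open("err.log", "w") as err:
    exit_cond = sb.call(["python"], stdin=p1.stdout, stdout=out, stderr=err)
assert exit_cond == 0, "File: %s ran with some errors\n" % (fl)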
Update
This is actually quite an interesting problem. I thought about it a bit more, and I think this is a more elegant solution (although still a bit of a hack):
#!/usr/bin/python
import sys
import os
import glob
from contextlib import contextmanager
import traceback
set_backend = "import matplotlib\nmatplotlib.use('Agg')\n"
@contextmanager
def redirected_output(new_stdout=None, new_stderr=None):
    save_stdout = sys.stdout
    save_stderr = sys.stderr
    if new_stdout is not None:
        sys.stdout = new_stdout
    if new_stderr is not None:
        sys.stderr = new_stderr
    try:
        yield None
    finally:
        sys.stdout = save_stdout
        sys.stderr = save_stderr
def run_exectests(test_dir, log_path='exectests.log'):
    test_files = glob.glob(os.path.join(test_dir, '*.py'))
    test_files.sort()
    passed = []
    failed = []
    with open(log_path, 'w') as f:
        with redirected_output(new_stdout=f, new_stderr=f):
            for fname in test_files:
                print(">> Executing '%s'" % fname)
                try:
                    code = compile(set_backend + open(fname, 'r').read(),
                                   fname, 'exec')
                    exec(code, {'__name__':'__main__'}, {})
                    passed.append(fname)
                except:
                    traceback.print_exc()
                    failed.append(fname)
                    pass
    print(">> Passed %i/%i tests: " % (len(passed), len(test_files)))
    print("Passed: " + ', '.join(passed))
    print("Failed: " + ', '.join(failed))
    print("See %s for details" % log_path)
    return passed, failed
if __name__ == '__main__':
    run_exectests(*sys.argv[1:])
Conceptually this is very similar to my previous solution - it works by reading in the test scripts as strings, and prepending them with a couple of lines that will import matplotlib and set the backend to a non-interactive one. The string is then compiled to Python bytecode and executed. The main advantage is that this ought to be platform-independent, since sed is not required.
The {'__name__':'__main__'} trick with the globals is necessary if, like me, you tend to write your scripts like this:
def run_me():
...
if __name__ == '__main__':
run_me()
A few points to consider:
If you try to run this function from within an ipython session where you've already imported matplotlib and set an interactive backend, the set_backend trick won't work and you'll still get figures popping up. The easiest way is to run it directly from the shell (~$ python exectests.py testdir/ logfile.log), or from an (i)python session where you haven't set an interactive backend for matplotlib. It should also work if you run it in a different subprocess from within your ipython session.
I'm using the contextmanager trick from this answer to redirect stdout and stderr to a log file. Note that this isn't threadsafe, but I think it's pretty unusual for scripts to open subprocesses.
Coming to this late, but I am trying to figure something similar out myself, and this is what I have come up with so far. Basically, if your plots are calling, for example, matplotlib.pyplot.show to show the plot, you can mock that method out using a patch decorator. Something like:
import glob
import subprocess as sb
from unittest.mock import patch

@patch('matplotlib.pyplot.show')  # passes a mock object to the decorated function
def test_execute(mock_show):
    assert mock_show() == None  # shouldn't do anything
    files = glob.glob("../*.py")
    files.sort()
    for fl in files:
        try:
            sb.call(["ipython", fl])
        except:
            assert False, "File: %s ran with some errors\n" % (fl)
Basically the patch decorator should replace any call to matplotlib.pyplot.show within the decorated function with a mock object that doesn't do anything. At least that's how it's supposed to work in theory. In my application, my terminal is still trying to open plots and this is resulting in errors. I hope it works better for you, and I will update if I figure out something wrong in the above that is leading to my issue.
Edit: for completeness, you might be generating figures with a call to matplotlib.pyplot.figure() or matplotlib.pyplot.subplots(), in which case these are what you would mock out instead of matplotlib.pyplot.show(). Same syntax as above, you would just use:
@patch('matplotlib.pyplot.figure')
or:
@patch('matplotlib.pyplot.subplots')

Capturing log information from bunch of python functions in single file

I have 3 scripts. One is starttest.py, which kicks off the execution of methods called in test.py. The methods are defined in module.py.
There are many print statements in each file, and I want to capture every print statement in my log file from starttest.py itself. I tried using sys.stdout in starttest.py, but that only captures the print statements from starttest.py; it has no effect on the print statements in test.py and module.py.
Any suggestions to capture the print statements from all of the files in a single place only?
Before importing anything from test.py or module.py, replace the sys.stdout file object with one of your liking:
import sys
sys.stdout = open("test-output.txt", "wt")
# Import the rest
If you're running on a Unix-like operating system, there is a safer method that does not need to replace the file object reference. Especially since overwriting sys.stdout does not guarantee that the previous object is destroyed:
import os
import sys
fd = os.open("test-output.txt", os.O_WRONLY | os.O_CREAT, 0o644)
os.dup2(fd, sys.stdout.fileno())
os.close(fd)
Note that the above trick is used by almost all daemonization implementations for Python.
Even though not directly related, remember also that you can use your shell to redirect command output to a file (works on Windows too):
python starttest.py >test-output.txt
Maybe look at the logging module that comes with Python.
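For example, a minimal sketch: configure logging once in starttest.py and have test.py and module.py call logging.info() instead of print(); the file name here is arbitrary:
import logging

logging.basicConfig(filename="test-output.txt",
                    level=logging.INFO,
                    format="%(asctime)s %(module)s: %(message)s")
logging.info("this line ends up in test-output.txt no matter which module logs it")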

best way to deal with python pdb flakiness re/stdout?

If I have a program where stdout is redirected, my pdb prompts all go to the redirection, because the library was written to write to stdout.
Oftentimes this problem is subtle, causing me to think a program is hanging when it's really waiting for input.
How do people work around this? (Unfortunately, using other debuggers like winpdb is not an option).
This answer is just to supplement Ned's, as a way of wrapping the pdb.py main() function in a manner which doesn't require copying 40 lines just to change one of them:
# sane_pdb.py: launch Pdb with stdout on original
import sys, pdb
def fixed_pdb(Pdb=pdb.Pdb):
    '''make Pdb() tied to original stdout'''
    return Pdb(stdout=sys.__stdout__)
if __name__ == '__main__':
    pdb.Pdb = fixed_pdb
    pdb.main()
I don't know if it actually works for the questioner's problem, but it does what Ned described...
The problem here is that PDB uses the Cmd class, where by default:
use_rawinput = 1
It means that Cmd will use the raw_input() method by default instead of sys.stdin.readline() to read from the console. This is done because raw_input() supports history (but only if the readline module is loaded) and other useful bits. The only issue is that raw_input() does not support redirection, so if you have a script:
#!/usr/bin/python
name=raw_input("Enter your name: ")
and run it
> python test.py
Enter your name: Alex
but if you run it with output redirection, it will get stuck:
> python test.py | tee log
This is exactly what PDB uses and why it gets stuck as well. As I mentioned, sys.stdin.readline() supports redirection, and if you rewrite the above script using readline() it should work.
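For example, the toy script rewritten to read with readline() instead of raw_input() might look like this (a sketch):
#!/usr/bin/python
import sys
sys.stdout.write("Enter your name: ")
sys.stdout.flush()
name = sys.stdin.readline().rstrip("\n")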
Back to the original issue: all you need to do is tell Cmd not to use raw_input():
Cmd.use_rawinput = 0
or
pdb = pdb.Pdb()
pdb.use_rawinput=0
pdb.set_trace()
If you are invoking pdb in code, you can pass your own stdout into the constructor. sys.__stdout__ might be a good choice.
If you are invoking pdb from the command line, you could copy the main() function from pdb.py into your own sane_pdb.py. Then change the Pdb() initialization to:
pdb = Pdb(stdout=sys.__stdout__)
Then you can invoke sane_pdb.py instead of pdb.py. It's not awesome that you'd have to copy 40 lines into your own file just to change one of them, but it's an option.
