Python debugging: stop at particular output

I have a complex Python project with lots of modules, loggers, Twisted Deferreds and other stuff.
Somewhere in the code a line is printed to the logs, and I want to find out where. Usually I just search the codebase for that string, but now the string is generated dynamically, so it is not searchable.
I wonder if there is any way to run Python in some debug mode and tell it to stop when a given pattern appears in stdout, and then print the location in the code where it stopped?

How about replacing sys.stdout?
For example:
import sys
import traceback

class StacktraceOnPrint:
    def __init__(self, orig_stdout, substring):
        self.orig_stdout = orig_stdout
        self.substring = substring

    def write(self, txt):
        if self.substring in txt:
            traceback.print_stack()  # OR: import pdb; pdb.set_trace()
        self.orig_stdout.write(txt)

sys.stdout = StacktraceOnPrint(sys.stdout, 'blah')

print 'test ...'
print 'Hello blah.'
print 'test ...'
NOTE: traceback.print_stack writes to sys.stderr. If you also want to catch sys.stderr, use a different function (such as traceback.format_stack); otherwise the write recurses forever and causes RuntimeError: maximum recursion depth exceeded.

You can use the pdb module.
It lets you interactively debug your code while it is running.
You could probably write a script that steps through your program until the line in question shows up in the log file.
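For example, a minimal sketch of starting a program under the debugger (the module and entry-point names are placeholders, not from the original program):
import pdb
import myscript                 # hypothetical module containing the program's entry point

pdb.run('myscript.main()')      # opens the (Pdb) prompt; step with 's'/'n', continue with 'c'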

You can include the pathname, module, funcName and line number in each log record by setting a formatter:
formatter = logging.Formatter('[%(asctime)s] p%(process)s {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s','%m-%d %H:%M:%S')
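For example, attaching that formatter to a stream handler on the root logger (a minimal sketch; the message is just illustrative):
import logging
import sys

formatter = logging.Formatter('[%(asctime)s] p%(process)s {%(pathname)s:%(lineno)d} %(levelname)s - %(message)s', '%m-%d %H:%M:%S')
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(formatter)

logger = logging.getLogger()                 # root logger, so records from all modules are covered
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("where did this come from?")    # the record now shows the pathname and line number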

Looks like you are using twisted. You can start your twisted program under pdb. Check twistd --help:
...
-b, --debug run the application in the Python Debugger (implies
nodaemon), sending SIGUSR2 will drop into debugger
...
After you start the program under pdb you can put breakpoints where you like. You can also specify condition for the breakpoint to be honored:
(Pdb) b myfunc, somecondition
But in your particular case it looks like it is quite hard to determine where to break the program.
So you might consider another approach. For example, you could redirect the logs to the stdin of some script. The script watches for the log line in question; when it detects that line, it sends SIGUSR2 to the twisted program, which then drops into the debugger. After that, just inspect your program with pdb.
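A rough sketch of such a watcher, assuming the program runs under twistd --debug and its logs are piped into the script (the pattern and the way the PID is passed are assumptions for illustration):
#!/usr/bin/env python
# Hypothetical usage: twistd --debug ... | python watch_and_signal.py <pid-of-twistd>
import os
import signal
import sys

pid = int(sys.argv[1])              # PID of the program running under twistd --debug
pattern = 'blah'                    # the (dynamically generated) fragment to watch for

for line in sys.stdin:
    sys.stdout.write(line)          # pass the log through unchanged
    if pattern in line:
        os.kill(pid, signal.SIGUSR2)   # per twistd --help, SIGUSR2 drops it into the debugger
        break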

Related

How to read standard output of a Python script from within it?

Here is my problem. I have an application which prints some traces to standard output using the logging module. Now I want to be able to read those traces at the same time, in order to wait for a specific trace I need.
This is for testing purposes, so for example, if the expected trace does not occur within about 2 seconds, the test fails.
I know I can read the output of another script by using something like this:
import subprocess

p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
while True:
    line = p.stdout.readline()
    print line
    if line == '' and p.poll() is not None:
        break
But, how can I do something similar from the script itself?
Thanks in advance.
EDIT
So, since my problem was expecting a certain trace to appear while the Python application is running, and since I couldn't find a simple way to do that from within the application itself, I decided to start the application (as suggested in the comments) from another script.
The module I found very helpful, and easier to use than the subprocess module, is the pexpect module.
If you want to do some pre-processing of the logger messages you can do something like:
#!/usr/bin/python
import sys
import logging
import time
import types

def debug_wrapper(self, msg):
    if hasattr(self, 'last_time_seen') and 'message' in msg:
        print("INFO: seconds past since last time seen " + str(time.time() - self.last_time_seen))
    self.last_time_seen = time.time()
    self.debug_original(msg)

logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

logger = logging.getLogger("test")
logger.debug_original = logger.debug
logger.debug = types.MethodType(debug_wrapper, logger)

while True:
    logger.debug("INFO: some message.")
    time.sleep(1)
This works by replacing the original debug function of the logger object with your custom debug_wrapper function, in which you can do whatever processing you want, such as storing the last time you saw a message.
You can store the script output to a file in real time and then read its contents from within the script, also in real time (since the output file is updated dynamically).
To store the script output to a file in real-time, you may use unbuffer which comes with the expect package.
sudo apt-get install expect
Then, while running the script use:
unbuffer python script.py > output.txt
You just have to print the output in the script; it is written to the output file as it is produced, so you can read that file at any point.
Also, use > to overwrite (or create) the file and >> to append to a previously created output.txt file.
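A rough sketch of reading such a file while it is still being written (the file name matches the redirection above; the trace to wait for is hypothetical):
import time

with open("output.txt") as f:
    while True:
        line = f.readline()
        if not line:                   # nothing new yet, wait a little and retry
            time.sleep(0.1)
            continue
        if "expected trace" in line:   # hypothetical trace we are waiting for
            print("found it")
            break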
If you want to record the output from print statements in other Python code, you can redirect sys.stdout to a string-like file object as follows:
import io
import sys

def foo():
    print("hello world, what else ?")

stream = io.StringIO()
sys.stdout = stream
try:
    foo()
finally:
    sys.stdout = sys.__stdout__

print(stream.getvalue())
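On Python 3.4+ you can get the same effect with contextlib.redirect_stdout, which restores sys.stdout for you (a minimal sketch):
import contextlib
import io

def foo():
    print("hello world, what else ?")

stream = io.StringIO()
with contextlib.redirect_stdout(stream):
    foo()                      # anything printed in this block goes into stream

print(stream.getvalue())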

Opening a file in a Python module and closing it when the script ends

I have a Python module that performs some logging during some of the methods it contains:
module.py
import os
import time

LOG_FILE = "/var/log/module.log"

def log(message):
    with os.fdopen(os.open(LOG_FILE, os.O_RDWR | os.O_CREAT, 0o664), "a+") as f:
        f.write("[%s] %s\n" % (time.strftime("%c"), message))

def do_something():
    log("Doing something")
    # ...
In this implementation the log file will be opened and closed every time the log method is called.
I'm considering refactoring it so the file is opened once when the module is loaded, but I'm not sure how to ensure it is closed when a script importing the module ends. Is there a clean way to do this?
Edit: I'm not asking about closing the file when an exception is encountered, but when the script that imports my module exits.
The OS takes care of open file descriptors when a process dies. This may lead to data loss if file buffers inside the application are not flushed. You could add f.flush() in the log() function after each write (note: this does not guarantee that the data is physically written to disk, so it may still be lost on a power failure; see Threadsafe and fault-tolerant file writes).
Python may also close (and flush) the file on exit during garbage collection, but you shouldn't rely on it.
atexit works only during a normal exit (and on exit from some signals). It won't help if the script is killed abruptly.
As @René Fleschenberg suggested, use the logging module, which calls .flush() and perhaps registers atexit handlers for you.
Python is usually pretty good at cleaning up after itself. If you must do something when the script ends, you need to look at the atexit module - but even then, it offers no guarantees.
You may also want to consider logging to either stdout or stderr, depending on purpose, which avoids keeping a file around all together:
import sys
import time

def log(message):
    sys.stderr.write("[%s] %s\n" % (time.strftime("%c"), message))
Python will automatically close the opened files for you when the script that has imported your module exits.
But really, just use Python's logging module.
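For example, a minimal sketch of what that could look like for this module (the path and format are just illustrative):
import logging

logger = logging.getLogger(__name__)
handler = logging.FileHandler("/var/log/module.log")   # illustrative path
handler.setFormatter(logging.Formatter("[%(asctime)s] %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

def do_something():
    logger.info("Doing something")
    # logging keeps the file open and flushes/closes its handlers at interpreter exit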

How can I effectively test my readline-based Python program using subprocesses?

I have a Python program which, under certain conditions, should prompt the user for a filename. However, there is a default filename which I want to provide, which the user can edit if they wish. Typically this means they need to hit the backspace key to delete the current filename and replace it with the one they prefer.
To do this, I've adapted this answer for Python 3, into:
import readline

def rlinput(prompt, prefill=''):
    readline.set_startup_hook(lambda: readline.insert_text(prefill))
    try:
        return input(prompt)
    finally:
        readline.set_startup_hook()

new_filename = rlinput("What filename do you want?", "foo.txt")
This works as expected when the program is run interactively as intended - after backspacing and entering a new filename, new_filename contains bar.txt or whatever filename the user enters.
However, I also want to test the program using unit tests. Generally, to do this, I run the program as a subprocess, so that I can feed it input to stdin (and hence test it as a user would use it). I have some unit testing code which (simplified) looks like this:
from subprocess import Popen, PIPE

p = Popen(['mypythonutility', 'some', 'arguments'], stdin=PIPE)
p.communicate('\b\b\bbar.txt')
My intention is that this should simulate the user 'backspacing' over the provided foo.txt, and entering bar.txt instead.
However, this doesn't seem to have the desired effect. Instead, it would appear, after some debugging, that new_filename in my program ends up with the equivalent of \b\b\bbar.txt in it. I was expecting just bar.txt.
What am I doing wrong?
The appropriate way to control an interactive child process from Python is to use the pexpect module. This module makes the child process believe that it is running in an interactive terminal session, and lets the parent process determine exactly which keystrokes are sent to the child process.
Pexpect is a pure Python module for spawning child applications; controlling them; and responding to expected patterns in their output. Pexpect works like Don Libes’ Expect. Pexpect allows your script to spawn a child application and control it as if a human were typing commands.
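A rough sketch of what the test could look like with pexpect (the command, prompt text and prefill length are assumptions for illustration):
import pexpect

child = pexpect.spawn('mypythonutility some arguments')   # hypothetical command under test
child.expect('What filename do you want')                 # wait for the readline prompt
child.send('\b' * len('foo.txt'))                         # simulate backspacing over the prefill
child.sendline('bar.txt')                                 # type the replacement filename
child.expect(pexpect.EOF)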

How to know if a running script dies?

So I'm somewhat new to programming and mostly self-taught, so sorry if this question is a bit on the novice side.
I have a Python script that runs over long periods (e.g. it downloads pages every few seconds for days at a time); it's sort of a monitoring script for a web app.
Every so often, something will disrupt it, and it'll need to be restarted. I've gotten these events down to a bare minimum, but it still happens every few days, and when the script does get killed it could be bad news if I don't notice for a few hours.
Right now it's running in a screen session on a VPS.
Could someone point me in the right direction as far as knowing when the script dies and having it automatically restart?
Would this be something to write in Bash? Or something else? I've never done anything like it before and don't know where to start or even look for information.
You could try supervisord; it's a tool for controlling daemon processes.
You should daemonize your program.
As described in Efficient Python Daemon, you can install and use python-daemon, which implements the well-behaved daemon specification of PEP 3143, "Standard daemon process library".
Create a file mydaemon.py with contents like this:
#!/usr/bin/env python
import daemon
import time
import logging

def do_something():
    name = 'mydaemon'
    logger = logging.getLogger(name)
    handler = logging.FileHandler('/tmp/%s.log' % (name))
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)

    while True:
        try:
            time.sleep(5)
            with open("/tmp/file-does-not-exist", "r") as f:
                f.write("The time is now " + time.ctime())
        except Exception as ex:
            logger.error(ex)

def run():
    with daemon.DaemonContext():
        do_something()

if __name__ == "__main__":
    run()
To actually run it use:
python mydaemon.py
This will spawn do_something() within the DaemonContext, and then the script mydaemon.py will exit. You can see the running daemon with pgrep -fl mydaemon.py. This short example simply logs errors to a log file at /tmp/mydaemon.log. You'll need to kill the daemon manually or it will run indefinitely.
To run your own program, just replace the contents of the try block with a call to your code.
I believe a wrapper bash script that executes the python script inside a loop should do the trick.
while true; do
    # Execute python script here
    echo "Web app monitoring script disrupted ... Restarting script."
done
Hope this helps.
That depends on the kind of failure you want to guard against. If it's just the script crashing, the simplest thing to do would be to wrap your main function in a try/except:
import logging as log

while True:
    try:
        main()
    except:
        log.exception("main() crashed")
If something is killing the Python process, it might be simplest to run it in a shell loop:
while sleep 1; do python checker.py; done
And if it's crashing because the machine is going down… well… Quis custodiet ipsos custodes?
However, to answer your question directly: the absolute simplest way to check if it's running from the shell would be to grep the output of ps:
ps aux | grep "python checker.py" | grep -v grep > /dev/null 2>&1
running=$?
Of course, this isn't fool-proof, but it's generally Good Enough.

How to capture Python interpreter's and/or CMD.EXE's output from a Python script?

Is it possible to capture Python interpreter's output from a Python script?
Is it possible to capture Windows CMD's output from a Python script?
If so, which librar(y|ies) should I look into?
If you are talking about the Python interpreter or CMD.EXE that is the 'parent' of your script, then no, it isn't possible. In every POSIX-like system (you appear to be running Windows, and that might have some quirk I don't know about, YMMV) each process has three streams: standard input, standard output and standard error. By default (when running in a console) these are directed to the console, but redirection is possible using the pipe notation:
python script_a.py | python script_b.py
This ties the standard output stream of script A to the standard input stream of script B. Standard error still goes to the console in this example. See the article on standard streams on Wikipedia.
If you're talking about a child process, you can launch it from Python like so (stdin is also an option if you want two-way communication):
import subprocess
# Of course you can open things other than python here :)
process = subprocess.Popen(["python", "main.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
x = process.stderr.readline()
y = process.stdout.readline()
process.wait()
See the Python subprocess module for information on managing the process. For communication, the process.stdin and process.stdout pipes are considered standard file objects.
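Note that reading the two pipes line by line like this can deadlock if the child fills the one you are not currently reading; communicate() avoids that by reading both to the end (a minimal sketch):
import subprocess

process = subprocess.Popen(["python", "main.py"],
                           stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = process.communicate()   # drains both streams and waits for the process
print(out)
print(err)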
For use with pipes, reading from standard input as lassevk suggested, you'd do something like this:
import sys
x = sys.stderr.readline()
y = sys.stdin.readline()
sys.stdin and sys.stdout are standard file objects as noted above, defined in the sys module. You might also want to take a look at the pipes module.
Reading data with readline() as in my example is a pretty naïve way of getting data, though. If the output is not line-oriented or is nondeterministic, you probably want to look into polling, which unfortunately does not work on Windows, but I'm sure there's some alternative out there.
I think I can point you to a good answer for the first part of your question.
1. Is it possible to capture Python interpreter's output from a Python script?
The answer is "yes", and personally I like the following lifted from the examples in the PEP 343 -- The "with" Statement document.
from contextlib import contextmanager
import sys

@contextmanager
def stdout_redirected(new_stdout):
    saved_stdout = sys.stdout
    sys.stdout = new_stdout
    try:
        yield None
    finally:
        sys.stdout.close()
        sys.stdout = saved_stdout
And used like this:
with stdout_redirected(open("filename.txt", "w")):
    print "Hello world"
A nice aspect of it is that it can be applied selectively around just a portion of a script's execution, rather than its entire extent, and stays in effect even when unhandled exceptions are raised within its context. If you re-open the file in append-mode after its first use, you can accumulate the results into a single file:
with stdout_redirected(open("filename.txt", "w")):
    print "Hello world"
print "screen only output again"
with stdout_redirected(open("filename.txt", "a")):
    print "Hello world2"
Of course, the above could be extended to redirect sys.stderr to the same or another file as well. Also see this answer to a related question.
Actually, you definitely can, and it's beautiful, ugly, and crazy at the same time!
You can replace sys.stdout and sys.stderr with StringIO objects that collect the output.
Here's an example, save it as evil.py:
import sys
import StringIO
s = StringIO.StringIO()
sys.stdout = s
print "hey, this isn't going to stdout at all!"
print "where is it ?"
sys.stderr.write('It actually went to a StringIO object, I will show you now:\n')
sys.stderr.write(s.getvalue())
When you run this program, you will see that:
nothing went to stdout (where print usually prints to)
the first string that gets written to stderr is the one starting with 'It'
the next two lines are the ones that were collected in the StringIO object
Replacing sys.stdout/err like this is an application of what's called monkeypatching. Opinions may vary whether or not this is 'supported', and it is definitely an ugly hack, but it has saved my bacon when trying to wrap around external stuff once or twice.
Tested on Linux, not on Windows, but it should work just as well. Let me know if it works on Windows!
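If you do this in a longer-lived program you will normally want to put the real stream back afterwards; a minimal sketch, sticking with the Python 2 StringIO module used above:
import sys
import StringIO

saved_stdout = sys.stdout
sys.stdout = StringIO.StringIO()
try:
    print("captured")                        # goes into the StringIO object
finally:
    captured = sys.stdout.getvalue()
    sys.stdout = saved_stdout                # put the real stdout back

print("captured text was: %r" % captured)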
You want subprocess. Look specifically at Popen in 17.1.1 and communicate in 17.1.2.
In which context are you asking?
Are you trying to capture the output from a program you start on the command line?
if so, then this is how to execute it:
somescript.py | your-capture-program-here
and to read the output, just read from standard input.
If, on the other hand, you're executing that script or cmd.exe or similar from within your program, and want to wait until the script/program has finished and capture all its output, then you need to look at the library calls you use to start that external program; most likely there is a way to ask it to give you some way to read the output and wait for completion.
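For example, with the standard subprocess module that could look like this (the command is just an illustration):
import subprocess

# Run a command (here CMD.EXE's "dir", purely as an illustration), wait for it
# to finish and capture everything it wrote to standard output.
output = subprocess.check_output(["cmd.exe", "/c", "dir"])
print(output)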
