I'm looking for a way to debug a python exception "retrospectively". Essentially if my program raises an exception that isn't handled, I want it to save off the program state so I can go back later and debug the problem.
I've taken a look at the pdb docs, and it seems that you can do this, but only if you can interact with the program at the point of the exception. This won't work for me as the program will be run in the background (without a controlling terminal).
My first (doomed!) approach was to put a try/except block at the highest level of my program, and in the except block extract the traceback object from the current exception and write it to disk using pickle. I planned to then write a separate program that would unpickle the object and use pdb.post_mortem to debug the crashed program. It turns out that traceback objects aren't pickleable, but I wouldn't have expected that approach to work anyway, as it wouldn't save off the entire program state.
As far as I know, there isn't any way to do what you're asking. That said, it sounds like you might be looking for a remote debugger. There are a couple of options:
rconsole - This isn't really a debugger, but it allows you to get an interactive prompt inside another process. This can be useful for debugging purposes. I haven't tried this, but it looks relatively straightforward.
rpdb2's embedded debugger - This lets you start a debugger and then connect to it from another shell.
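For rpdb2 the debugger is started from inside your program; here is a minimal sketch going from rpdb2's documented entry point (the password string is whatever you choose):

import rpdb2

# Execution pauses at this line until a debugger attaches with the
# same password (or a timeout expires), then continues under its control.
rpdb2.start_embedded_debugger("some_password")

You then attach from another shell with the rpdb2 console (or Winpdb) using the same password.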
What you can do is use twisted.python and write the traceback to a file; it gives you the exact traceback, including the exception.
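A minimal sketch, assuming something_bad() is the failing call:

from twisted.python.failure import Failure

try:
    something_bad()
except Exception:
    # Failure() captures the exception currently being handled
    failure = Failure()
    with open("crash.log", "w") as fh:
        fh.write(failure.getTraceback())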
At the moment the exception is caught, before the stack is unwound, the state is available for inspection with the inspect module: http://docs.python.org/2/library/inspect.html
Generically you would use inspect.getinnerframes on your traceback object. The local variables in each stack frame are available as .f_locals so you can see what they are.
The hard part is serializing all of them correctly: depending on what types you have in the local scope you may or may not be able to pickle them, dump them to JSON, or whatever.
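A minimal sketch of walking the frames, with something_bad() standing in for the failing call:

import inspect
import sys

try:
    something_bad()
except Exception:
    tb = sys.exc_info()[2]
    # getinnerframes returns one record per frame, outermost first
    for frame, filename, lineno, func, context, index in inspect.getinnerframes(tb):
        print(filename, lineno, func)
        print(frame.f_locals)  # local variables of that frame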
You could create an entirely separate execution environment in the top level:
myEnv = {}
myEnv.update(globals())
Then execute your code within that execution environment. If an exception occurs you have the traceback (stack) and all the globals, so you can pretty well reconstruct the program state.
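A sketch of the idea; here "myprogram.py" is a hypothetical script being run inside the captured environment:

import traceback

myEnv = {}
myEnv.update(globals())

try:
    with open("myprogram.py") as fh:
        exec(fh.read(), myEnv)
except Exception:
    traceback.print_exc()
    # myEnv still holds every global as it was at the moment of the crash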
I hope this helps (it helped me):
import logging, traceback

_logger = logging.getLogger(__name__)

try:
    something_bad()
except Exception as error:
    _logger.exception("Oh no!")  # Logs original traceback
    store_exception_somewhere(error)
Also, in Python 3 there are a few new options, like raise new_exc from original_exc or raise OtherException(...).with_traceback(tb).
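For example, exception chaining keeps both tracebacks attached, as in this small sketch:

try:
    something_bad()
except KeyError as original_exc:
    # "from" links the new exception to the original one;
    # both tracebacks are printed if this goes unhandled
    raise RuntimeError("lookup failed") from original_exc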
There is no code to show as this is a conceptual question. So here is the concise concept breakdown:
Local server (my code) calls the acquire_remote_data function (my code), which calls the data module (not my code), which in turn uses requests (obviously not my code lol). requests then throws the exception named in the title whenever the server temporarily loses its internet connection.
Of course, my first move was to use Python's try/except around the call to the data module inside acquire_remote_data, only to quickly realize that it wasn't capturing an exception raised that far down the path, in modules I have no control over.
One option is to contact the author of the data module, but who knows when, if ever, they will modify their code to handle the exception.
Is there a workaround technique that can be used to keep this from crashing the server?
*** I figured it out. Using try/except was the correct first choice; however, I needed the catch-all except Exception. I didn't think of that the first time around because my linter had squawked in the past about catching too general an exception. What I learned was how to silence my linter about specific issues.
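A minimal sketch of that workaround, where data_module.fetch() is a hypothetical stand-in for the third-party call:

import logging

logger = logging.getLogger(__name__)

def acquire_remote_data():
    try:
        return data_module.fetch()  # hypothetical third-party call
    except Exception:  # pylint: disable=broad-except
        # The connection error originates several modules down, so a
        # catch-all here keeps it from crashing the server.
        logger.exception("remote fetch failed, will retry later")
        return None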
I am running a parallelized grid search in Python using joblib.Parallel. My script is relatively straightforward:
# Imports for data and classes...

Parallel(n_jobs=n_jobs)(
    delayed(biz_model)(
        ...
    )
    for ml_model_params in grid
    for past_horizon in past_horizons
)
When I run it on my local machine, it seems to run fine, though I can only test it on small datasets for memory reasons. Yet when I try to run it on a remote Oracle Linux server, it begins some runs and after a while outputs:
/u01/.../resources/python/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 1 leaked semaphores to clean up at shutdown
len(cache))
Aborted!
I tried to reproduce it locally, and with small experiments it does run. The unparallelized script also runs, and the number of jobs (whether low or high) doesn't prevent the bug from happening.
So my question is: given that there is no traceback, is there a way to make joblib or Parallel more verbose? I just cannot get an idea of where to look for possible causes of the failure without a traceback. Obviously, if some possible reason for the abort can be inferred from just this output (and I fail to grasp it), I would very much appreciate the pointer.
Thanks in advance.
Using a logger, catching the exception, logging it, flushing the logs, and raising it again usually does the trick. Note that a try/except cannot live inside the generator expression itself, so wrap the worker function:
# Imports for data and classes...
import logging

logger = logging.getLogger(__name__)  # create a logger

def biz_model_logged(*args, **kwargs):
    try:
        return biz_model(*args, **kwargs)
    except BaseException as e:
        logger.exception(e)
        # you can use a loop here if you have more than one handler
        logger.handlers[0].flush()
        raise

Parallel(n_jobs=n_jobs)(
    delayed(biz_model_logged)(
        ...
    )
    for ml_model_params in grid
    for past_horizon in past_horizons
)
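Independently of the logging, joblib's Parallel also accepts a verbose parameter that prints progress and timing messages as tasks complete, which can help narrow down where the abort happens:

# verbose above 10 reports every completed task
Parallel(n_jobs=n_jobs, verbose=50)(
    delayed(biz_model_logged)(
        ...
    )
    for ml_model_params in grid
    for past_horizon in past_horizons
)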
I am using logging throughout my code for easier control over logging level. One thing that's been confusing me is
when do you prefer logging.error() to raise Exception()?
To me, logging.error() doesn't really make sense, since the program should stop when it encounters an error, which is what raise Exception() does.
In what scenarios, do we put up an error message with logging.error() and let the program keep running?
Logging is merely to leave a trace of events for later inspection but does absolutely nothing to influence the program execution; raising an exception allows you to signal an error condition to higher up callers and let them handle it. Typically you use both together, it's not an either-or.
That purely depends on your process flow, but it basically boils down to whether you want to deal with an encountered error (and how), or propagate it to the user of your code, or both.
logging.error() is typically used to log when an error occurs - that doesn't mean that your code should halt and raise an exception as there are recoverable errors. For example, your code might have been attempting to load a remote resource - you can log an error and try again at later time without raising an exception.
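A small sketch of that pattern, where fetch() is a hypothetical network call:

import logging
import time

logger = logging.getLogger(__name__)

def load_resource(url, retries=3):
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)  # hypothetical network call
        except ConnectionError:
            # Recoverable: log the error and try again instead of raising.
            logger.error("attempt %d/%d to load %s failed", attempt, retries, url)
            time.sleep(2 ** attempt)
    # No longer recoverable: now raising is appropriate.
    raise RuntimeError("could not load %s after %d attempts" % (url, retries))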
By the way, you can log an exception in full, with a stack trace, by using logging.exception().
What if, when a user is using my Python application and the application crashes, the state of the application can be saved to a file and sent to me, the developer? I open the Python interpreter and start debugging from the point where the user crashed.
To clarify, when I'm debugging an application and it raises an unhandled exception, I can debug the application post-mortem, getting access to all the local variables and their values, which is crucial to quickly fixing bugs. When a user's application crashes, though, I only receive the stack trace for when the error occurred, which is helpful, but not nearly as much as debugging interactively would be.
So is it possible to save the state of a Python application to a file, close the interpreter, then resume the execution from that file at a later stage?
This tool may help, but you'll need to call the dumper in your code when the exception happens. It simply pickles the traceback and frame objects into files.
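One way to wire the dumper in is through sys.excepthook, so it runs on any unhandled exception; save_dump here is a hypothetical stand-in for the tool's entry point:

import sys

def crash_handler(exc_type, exc_value, tb):
    # Hand the traceback to the dump tool, then fall back to the
    # default handler so the stack trace is still printed.
    save_dump("crash.dump", tb)  # hypothetical dumper call
    sys.__excepthook__(exc_type, exc_value, tb)

sys.excepthook = crash_handler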
I have a long running Python program that raises exception at some point. Is there some way to run this from ipython session and stop on the exception so I could examine the live data?
You may want ipython -i yourscript.py, which will execute your script in the interpreter environment. But this won't let you inspect the local environment where the exception happened, for example local variables within a function – you'll just be able to inspect globals. You probably want this instead:
In [1]: %run test.py
<exception occurs>
In [2]: %debug
If you're not familiar with using PDB, check out some docs first.
Edit: thanks to Thomas K.
Yes, depending on how you are set up. You can import your program and run it like any other module inside a try/except block:
import yourprogram

try:
    yourprogram.main_function(args)
except Exception:
    print("we blew up, investigate why")
If your program is not in a function, you may need to put the try block around your import. The problem with this approach is that the variables you want to look at may no longer be in scope. I usually use print statements or log messages at various points to figure out what doesn't look the way I expect.
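One way around the scope problem is to drop into pdb's post-mortem mode from the except block; the traceback still references every frame, so their locals remain inspectable. A sketch, reusing the hypothetical yourprogram from above:

import pdb
import sys
import traceback

import yourprogram

try:
    yourprogram.main_function(args)
except Exception:
    traceback.print_exc()
    # post_mortem walks the traceback, so locals in the frame that
    # raised are available even though they have left normal scope
    pdb.post_mortem(sys.exc_info()[2])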