Consider the following twisted code, using deferLater:
import random

from twisted.internet.task import deferLater
from twisted.internet import reactor

def random_exception(msg='general'):
    if random.random() < 0.5:
        raise Exception("Random exception with 50%% likelihood occurred in %s!" % msg)

def dolater():
    random_exception('dolater')
    print "it's later!"

def whoops(failure):
    failure.trap(Exception)
    print failure

defer = deferLater(reactor, 10, dolater)
defer.addErrback(whoops)
reactor.run()
An exception is raised during the 10-second sleep (namely a KeyboardInterrupt); however, it seems that the whoops method is never called. My assumption is that since I add the errback after the deferred kicks off, it's never properly registered. Advice appreciated.
EDIT:
Alright, no one likes my use of the signal (not the exception) KeyboardInterrupt to show an error condition outside of the deferred. I thought pretty hard about an actual exception that might occur outside the deferred callback, but couldn't think of a particularly good one; almost everything would be some kind of signal (or developer error), so signal handling is fine for now. But that wasn't really the heart of the question.
As I understand it, Twisted's callback/errback system handles errors within the callback structure, e.g. if dolater raises an Exception of some kind. To demonstrate this, I have added an exception that can occur during dolater; if the exception occurs there, the errback handles it just fine.
My concern was what happens if something goes wrong while the reactor is just reacting normally; the only thing I could get to go wrong was a keyboard interrupt, and I wanted whoops to fire in that case. It appears that if I put other async events into the reactor and raise exceptions from there, the dolater code wouldn't be affected, and I would have to add errbacks to those other async events. There is no master error handling for an entire Twisted program.
So signals it is, until I can find some way to cause the reactor to fail without a signal.
If by KeyboardInterrupt you mean a signal (ctrl-c, SIGINT, etc.), then what you need to do is set up a signal handler with your whoops function as the callback.
By following two previous answers from @jean-paul-calderone, twisted: catch keyboardinterrupt and shutdown properly and twisted - interrupt callback via KeyboardInterrupt, I tried the following, and I think it matches your need:
import signal

from twisted.internet import reactor, task

def dolater():
    print "it's later!"

def whoops(signum, stackframe):
    print "I'm here because of signal number " + str(signum)
    reactor.stop()

defer = task.deferLater(reactor, 10, dolater)
signal.signal(signal.SIGINT, whoops)
reactor.run()
That will call whoops on a SIGINT. I put a reactor.stop() in whoops because otherwise the reactor would just keep on running; take that out if you really want it to keep running in the face of a Ctrl-C.
Note: I'm not explicitly showing how to fire an errback from the signal handler because (at least to my understanding) that doesn't really map to how Deferreds should be used. I imagine if you found a way to get the Deferred into the signal handler you could fire its errback, but I think that's outside the expected use-case for Twisted and may have crazy consequences.
The problem is with the actual exception you're trying to catch: specifically, KeyboardInterrupt is not a subclass of Exception and thus cannot be caught with it. If you just change the line:
failure.trap(Exception)
into:
failure.trap(KeyboardInterrupt)
it surely would catch it. More on Python's exception hierarchy can be found in the official Python docs: https://docs.python.org/2/library/exceptions.html
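A quick check of the hierarchy confirms this (valid on Python 2.5+ and Python 3):

print(issubclass(KeyboardInterrupt, Exception))      # False
print(issubclass(KeyboardInterrupt, BaseException))  # True
print(issubclass(Exception, BaseException))          # True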
Twisted is a library for doing many things concurrently. The things are kept as isolated as possible (given that this is still Python, there's still global state, etc).
If you have a TCP server with two clients connected to it and one of them sends you some bad data that triggers a bug in your parser that leads to an exception being raised, that exception isn't going to cause the other client to receive any error. Nor would you want it to, I hope (at least not automatically).
Similarly, if you have a client connected to your server and you start a delayed call with deferLater and the client triggers that bug, you wouldn't want the error to be delivered to the errback on the Deferred returned by deferLater.
The idea here is that separate event sources are generally treated separately (until you write some code that glues them together somehow).
For the ten seconds that pass between when you call deferLater and when Twisted begins to run the function you passed to it, any errors that happen - including you hitting C-c on your keyboard to make Python raise a KeyboardInterrupt - aren't associated with that delayed call and won't be delivered to the errback you attach to its Deferred.
Only exceptions raised by your dolater function will cause the errback chain of that Deferred to begin execution.
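To make that concrete, here is a minimal sketch (the task and errback names are made up for illustration): an exception raised in one delayed call is delivered only to the errback attached to that call's own Deferred, and the other Deferred is unaffected.

from twisted.internet import reactor
from twisted.internet.task import deferLater

def broken_task():
    raise ValueError("boom")          # only d1's errback sees this

def healthy_task():
    print("healthy_task ran fine")

def on_broken_error(failure):
    print("broken_task failed: %s" % failure.getErrorMessage())

def on_healthy_error(failure):
    print("healthy_task failed: %s" % failure.getErrorMessage())

d1 = deferLater(reactor, 1, broken_task)
d1.addErrback(on_broken_error)

d2 = deferLater(reactor, 2, healthy_task)
d2.addErrback(on_healthy_error)       # never fires; d1's failure stays on d1

reactor.callLater(3, reactor.stop)
reactor.run()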
Related
I have a callback associated with a RabbitMQ queue through pika's basic_consume, like so:
channel.basic_consume(queue=REQUESTS_QUEUE,
                      on_message_callback=request_callback,
                      auto_ack=False)
And the request callback function is:
def request_callback(channel, method, properties, body):
    try:
        readings = json_util.loads(body)
        location_updater.update_location(readings)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        logger.exception('EXCEPTION: ')
Whenever the code inside the except block is executed, this particular callback stops working (i.e. it stops being called when a message is sent to its associated queue). All the other callbacks I have associated with other queues keep working fine. If I comment out the try...except logic, the callback keeps working fine for further requests, even after an exception occurs.
I'm still getting used to Python, so it might be something simple. Can anyone help?
I'm assuming the exception comes from a statement before channel.basic_ack, and I'm also assuming you're calling channel.basic_qos to set a prefetch value.
The exception prevents the call to basic_ack, which prevents RabbitMQ from removing the message from the queue. If you have reached the prefetch value, no further messages will be delivered to that client because RabbitMQ assumes your client is still processing them.
You need to decide what to do with that message when an exception happens. I'm assuming that the message can't be considered to be processed, so you should reject (nack) the message (https://www.rabbitmq.com/nack.html). Do this in the except block. This will cause the message to be re-enqueued and re-delivered, potentially to a different consumer.
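A minimal sketch of that, reusing the names from the question (whether you requeue or instead drop/dead-letter the message is up to you):

def request_callback(channel, method, properties, body):
    try:
        readings = json_util.loads(body)
        location_updater.update_location(readings)
        channel.basic_ack(delivery_tag=method.delivery_tag)
    except Exception:
        logger.exception('EXCEPTION: ')
        # Reject the message so RabbitMQ stops counting it against the
        # prefetch window. requeue=True re-enqueues it for redelivery;
        # use requeue=False to drop it (or dead-letter it if configured).
        channel.basic_nack(delivery_tag=method.delivery_tag, requeue=True)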
Closing the channel and/or the connection will also have the same effect. This ensures that clients that crash do not permanently "hold onto" messages.
I would like some clean-up activities to occur in case of crash of my program. I understand that some situations cannot be handled (a SIGKILL for instance) but I would like to cover as much as possible.
The atexit module was a good candidate but the docs explicitly state that
The functions registered via this module are not called when the
program is killed by a signal not handled by Python, when a Python
fatal internal error is detected, or when os._exit() is called.
Are there functions or Python features that allow handling program terminations caused by sys.exit() and unhandled exceptions? (These are the main ones I am concerned with.)
SIGKILL cannot be handled, no matter what: your program is just terminated (killed violently), and you can do nothing about it.
The only thing you can do about SIGKILL is to look for data that needs to be cleaned up during the next launch of your program.
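One common pattern for that (a sketch; the marker path is made up) is to leave a marker file around while the program runs and check for it at startup:

import os

MARKER = '/tmp/myapp.running'  # hypothetical marker file path

if os.path.exists(MARKER):
    # The previous run never removed its marker, so it was killed without
    # getting a chance to clean up: recover leftover state here.
    print('previous run did not exit cleanly, recovering...')

open(MARKER, 'w').close()  # mark this run as active
# Remove MARKER on clean shutdown, e.g. from an atexit handler as described below.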
For other cases, use atexit to handle normal termination of the Python interpreter. If you've got some unhandled exceptions, see where they can occur and wrap those pieces of code in try/except blocks:
try:
    pass
except ValueError as e:
    pass
except:
    # catch all other exceptions
    pass
To deal with sys.exit calls, you can wrap the entire program's starting point in a try/except block and catch the SystemExit exception:
try:
    # your program goes here
    # you're calling your functions from here, etc.
    main()  # hypothetical entry point so the try block is not empty
except SystemExit:
    # do cleanup
    raise
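Putting the pieces together, here is a sketch (the names are illustrative) that covers normal termination, sys.exit(), and unhandled exceptions via atexit, and converts catchable signals into a normal exit so the same cleanup still runs. SIGKILL remains out of reach, as noted above.

import atexit
import signal
import sys

def cleanup():
    # Runs on normal termination, on sys.exit(), and after an unhandled
    # exception (the interpreter still shuts down normally in that case).
    print('cleaning up...')

atexit.register(cleanup)

def exit_on_signal(signum, frame):
    # Turn a catchable signal (SIGTERM, SIGINT, ...) into SystemExit so
    # the atexit handlers get a chance to run.
    sys.exit(1)

signal.signal(signal.SIGTERM, exit_on_signal)
signal.signal(signal.SIGINT, exit_on_signal)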
I have a C-extension which implements an LRU cache https://github.com/pbrady/fastcache . I've recently noticed that in an application (SymPy) which makes fairly heavy use of caching, a timeout signal gets lost and the application continues to run. This only happens when using my C-extension and not with a pure Python LRU cache (i.e. functools.lru_cache) https://github.com/pbrady/fastcache/issues/26.
I've peppered my routines with calls to PyErr_CheckSignals() and the signal gets lost less frequently but it's still happening. Note that during a call, the cache will call PyObject_Hash, PyDict_Get/Set/DelItem and PyObject_Call (in the case of a miss).
Here's the relevant snippet of the SymPy code (timeout is an integer):
def _timeout(self, function, timeout):
    def callback(x, y):
        signal.alarm(0)
        raise Skipped("Timeout")
    signal.signal(signal.SIGALRM, callback)
    signal.alarm(timeout)  # Set an alarm with a given timeout
    function()
    signal.alarm(0)  # Disable the alarm
Can something be overwriting the signal? If so, how do I work around this?
Turns out there is no real mystery here but it is a little tricky.
In addition to calls to PyErr_CheckSignals(), the signal can be caught by the interpreter whenever control passes to it via calls to PyObject_Hash or PyDict_Get/Set/DelItem. If the signal is caught by the interpreter in one of these functions, it will trigger an exception due to the callback function (and the signal will go away, since it was handled). However, I was not checking the return value of all my functions (i.e. I knew my argument was hashable, so I wasn't checking the return value of PyDict_SetItem).
Thus the exception was ignored and the program continued to execute as if the signal had not happened.
Special thanks to Ondrej for talking through this.
When I use multiprocessing.Queue.get I sometimes get an exception due to EINTR.
I definitely know that sometimes this happens for no good reason (I open another pane in a tmux buffer), and in such a case I would want to continue working and retry the operation.
I can imagine that in some other cases the error would be due to a good reason and I should stop running or fix some error.
How can I distinguish the two?
Thanks in advance
The EINTR error can be returned from many system calls when the application receives a signal while waiting for other input. Typically these signals can be quite benign and already handled by Python, but the underlying system call still ends up being interrupted. When doing C/C++ coding this is one reason why you can't entirely rely on functions like sleep(). The Python libraries sometimes handle this error code internally, but obviously in this case they're not.
You might be interested to read this thread which discusses this problem.
The general approach to EINTR is to simply handle the error and retry the operation - this should be a safe thing to do with the get() method on the queue. Something like the following could be used, passing the queue as a parameter and replacing the use of the get() method on the queue:
import errno

def my_queue_get(queue, block=True, timeout=None):
    while True:
        try:
            return queue.get(block, timeout)
        except IOError as e:
            if e.errno != errno.EINTR:
                raise

# Now replace instances of queue.get() with my_queue_get(queue), with other
# parameters passed as usual.
Typically you shouldn't need to worry about EINTR in a Python program unless you know you're waiting for a particular signal (for example SIGHUP) and you've installed a signal handler which sets a flag and relies on the main body of the code to pick up the flag. In this case, you might need to break out of your loop and check the signal flag if you receive EINTR.
However, if you're not using any signal handling then you should be able to just ignore EINTR and repeat your operation - if Python itself needs to do something with the signal it should have already dealt with it in the signal handler.
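If you are in that situation, a rough sketch of the flag-checking variant might look like this (the SIGHUP handler and flag are made up for illustration):

import errno
import signal

hup_received = False

def on_sighup(signum, frame):
    global hup_received
    hup_received = True

signal.signal(signal.SIGHUP, on_sighup)

def get_checking_signals(queue, block=True, timeout=None):
    while True:
        try:
            return queue.get(block, timeout)
        except IOError as e:
            if e.errno != errno.EINTR:
                raise
            if hup_received:
                # Our own signal interrupted the call; let the caller
                # handle it instead of silently retrying.
                raise RuntimeError('interrupted by SIGHUP')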
Old question, modern solution: as of Python 3.5, the wonderful PEP 475 - Retry system calls failing with EINTR has been implemented and solves the problem for you. Here is the abstract:
System call wrappers provided in the standard library should be retried automatically when they fail with EINTR, to relieve application code from the burden of doing so.
By system calls, we mean the functions exposed by the standard C library pertaining to I/O or handling of other system resources.
Basically, the standard library will catch EINTR and retry the interrupted call for you, so you don't have to handle it anymore. If you are targeting an older release, the while True loop is still the way to go. Note, however, that if you are using Python 3.3 or 3.4, you can catch the dedicated exception InterruptedError instead of catching IOError and checking for EINTR.
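For example, on Python 3.3/3.4 the retry helper from the earlier answer could be reduced to something like this (a sketch; on 3.5+ even this is unnecessary):

def my_queue_get(queue, block=True, timeout=None):
    while True:
        try:
            return queue.get(block, timeout)
        except InterruptedError:
            # EINTR is raised as a dedicated exception on 3.3/3.4; just retry.
            continue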
I have been using Pyro 3 for a little while now, with great success, but occasionally I have noticed that when a signal such as SIGHUP or SIGINT arrives while Pyro is doing some remote communication, the process hangs. Hence the question: is Pyro signal safe?
Thanks in advance.
It seems the issue here is that by default Python sets up handlers for signals such as SIGINT which raise exceptions. If you therefore receive a signal while doing some Pyro comms, the exception is raised and off it goes to look for an appropriate except clause, not finishing what it was doing. If you then try to use Pyro again, for example in the except/finally clause, you can get issues. In my case it was sending some messages from a finally block to a log via a queue which was proxied to another process using Pyro.
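One possible workaround (a sketch going beyond what this answer spells out; the main loop is hypothetical) is to install handlers that only record the signal, so an in-flight Pyro call can finish before you shut down:

import signal

shutdown_requested = False

def request_shutdown(signum, frame):
    # Just note the request; don't raise in the middle of a Pyro call.
    global shutdown_requested
    shutdown_requested = True

signal.signal(signal.SIGINT, request_shutdown)
signal.signal(signal.SIGTERM, request_shutdown)

# Hypothetical main loop: finish each remote call, then check the flag.
# while not shutdown_requested:
#     do_one_unit_of_pyro_work()
# clean_up_and_exit()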