When can DeadlineExceededError exception be thrown in Google App Engine - python

I am writing a deferred task which is intended to construct a file in the blobstore for download. I am modelling the code on the example given in the docs:
http://code.google.com/appengine/articles/deferred.html
The idea is to structure the code so that if there is a DeadlineExceededError the handler can tidy up and kick off a new deferred task to continue later.
What I'd like to know is when exactly can this exception be thrown? Are there any operations which are guaranteed to be atomic and therefore will not be interrupted?
In the example (referenced above) they update a variable called start_key as they finish processing each record, but if the main loop were interrupted between the extending of the to_put and to_delete lists, the data would be wrong, as it would miss a set of deletes.
If an exception can be raised at any point then it could be halfway through the batch_write, or between the db.put and clearing of the to_put list.
This is logically equivalent to a thread-safety problem; to solve it, one normally relies on knowing which operations are guaranteed to be atomic and which are not.
How does this work?
Thanks

A DeadlineExceededError can be thrown literally any time at all. If there were a time when it couldn't be thrown, an abusive app could simply execute that code in a loop.
You can deal with this in several ways:
Proactively check how long you've been executing for and stop at a good time before you hit the deadline.
Put the exception handler somewhere it can store the state as of the last set of completed operations (e.g., discarding anything since the last iteration of the outer loop in which the exception was thrown).
Use backends, which do not have deadlines.
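The first option (checking elapsed time yourself) could look roughly like the sketch below. It is plain Python rather than App Engine code: `process_with_budget`, `handle` and the injectable `clock` are names made up for this illustration, and the time budget is an assumption you would tune to sit safely below the real deadline.

```python
import time


def process_with_budget(items, start_index, budget_seconds, handle,
                        clock=time.monotonic):
    """Process items[start_index:] until done or until the time budget is spent.

    Returns None when everything was processed, otherwise the index of the
    first unprocessed item so that a follow-up task can resume from there.
    """
    deadline = clock() + budget_seconds
    index = start_index
    while index < len(items):
        # Check the clock only between whole units of work, so the saved
        # position always points at a clean boundary.
        if clock() >= deadline:
            return index
        handle(items[index])
        index += 1
    return None
```

In a deferred-task setting, a non-None return value is what you would hand to the next task as its starting point. This does not remove the need for an exception handler, since the real deadline can still fire if a single `handle` call runs long.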

Related

Why is _post_put_hook not running inside a transaction?

I have some code that queues up a task inside _post_put_hook.
The task retrieves the key and fetches the entity. However, sometimes the worker fails because the object for that key hasn't been created yet, but it will succeed when it next runs. Note that we're retrieving the object by key, so I expect the data to be consistent.
I'm only calling the enqueue on commit, so I'd expect the object to be created by the time the task runs. In the sample below, I find that _post_put_hook is not in a transaction which seems to be the cause of the issue, but why isn't it in a transaction?
Here's a sample:
    @ndb.synctasklet
    def log_usage(self):
        @ndb.transactional_tasklet(xg=True)
        def _txn():
            yield Log.insert_document_log_async()
        yield _txn()

    class Log(ndb.Expando):
        @classmethod
        @ndb.tasklet
        def insert_document_log_async(cls):
            log = cls()
            logging.debug("insert document log in transaction: {}".format(ndb.in_transaction()))
            yield log.put_async()

        @ndb.synctasklet
        def _post_put_hook(self, future):
            @ndb.synctasklet
            def _callback_on_commit():
                key = future.get_result()
                yield SqlTaskHelper.enqueue_syncronise_sql_model_async(key)
            logging.debug("_post_put_hook In transaction: {}".format(ndb.in_transaction()))
            ndb.get_context().call_on_commit(lambda: _callback_on_commit())
The code is executed as follows:
log_usage is called which calls insert_document_log_async
When calling insert_document_log_async, logging indicates that we're in a transaction (insert document log in transaction: True).
But the _post_put_hook logging indicates we're not in a transaction (so call_on_commit is executed immediately, which is what I suspect the issue is). The task runs shortly after and the entity isn't always available.
I'd like to know why _post_put_hook is executing outside of a transaction.
Thanks
Your question was answered on Google Groups. I'm re-posting from there:
"Note that post hooks do not check whether the RPC was successful. The hook runs regardless of failure that might have occurred due to issues, more specifically the contention which is when you attempt to write to a single entity group too quickly. Also note that it is normal that a small number of datastore operations will result in timeout in normal operation. Read more here about the most common datastore issues and here how to avoid the contention.
In case you need any coding assistance, I suggest you post your inquiries on Stack Overflow where the community of developers are better prepared to assist you in that matter. Google Groups is oriented more towards general opinions, trends, and issues of general nature regarding Google Cloud Platform.
If an exception is detected by Datastore, it would be raised when the code calls get_result(), so the key would not return. However, note that “all post- hooks have a Future argument at the end of the call signature. This Future object holds the result of the action. You can call get_result on this Future to retrieve the result; you can be sure that get_result won't block, since the Future is complete by the time the hook is called.”
That said, in case you don’t have an exception, the future already has the result and get_result function is not blocking, occasionally failing to retrieve the key. Take a look at this Stack Overflow post with a suggestion to resolve an issue similar to your case."

Why is asyncio.Future incompatible with concurrent.futures.Future?

The two classes represent excellent abstractions for concurrent programming, so it's a bit disconcerting that they don't support the same API.
Specifically, according to the docs:
asyncio.Future is almost compatible with concurrent.futures.Future.
Differences:
result() and exception() do not take a timeout argument and raise an exception when the future isn’t done yet.
Callbacks registered with add_done_callback() are always called via the event loop's call_soon_threadsafe().
This class is not compatible with the wait() and as_completed() functions in the concurrent.futures package.
The above list is actually incomplete, there are a couple more differences:
running() method is absent
result() and exception() may raise InvalidStateError if called too early
Are any of these due to the inherent nature of an event loop that makes these operations either useless or too troublesome to implement?
And what is the meaning of the difference related to add_done_callback()? Either way, the callback is guaranteed to happen at some unspecified time after the future is done, so isn't it perfectly consistent between the two classes?
The core reason for the difference is in how threads (and processes) handle blocks vs how coroutines handle events that block. In threading, the current thread is suspended until whatever condition resolves and the thread can go forward. For example in the case of the futures, if you request the result of a future, it's fine to suspend the current thread until that result is available.
However the concurrency model of an event loop is that rather than suspending code, you return to the event loop and get called again when ready. So it is an error to request the result of an asyncio future that doesn't have a result ready.
You might think that the asyncio future could just wait and while that would be inefficient, would it really be all that bad for your coroutine to block? It turns out though that having the coroutine block is very likely to mean that the future never completes. It is very likely that the future's result will be set by some code associated with the event loop running the code that requests the result. If the thread running that event loop blocks, no code associated with the event loop would run. So blocking on the result would deadlock and prevent the result from being produced.
So, yes, the differences in interface are due to this inherent difference. As an example, you wouldn't want to use an asyncio future with the concurrent.futures waiter abstraction because again that would block the event loop thread.
The add_done_callback() difference guarantees that callbacks will be run in the event loop. That's desirable because they will get the event loop's thread-local data. Also, a lot of coroutine code assumes that it will never be run at the same time as other code from the same event loop. That is, coroutines are only thread safe under the assumption that two coroutines from the same event loop do not run at the same time. Running the callbacks in the event loop avoids a lot of thread safety issues and makes it easier to write correct code.
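To make the contrast concrete, here is a small sketch using only the standard library. The function names `blocking_result` and `event_loop_result` are made up for this example. The first blocks a thread on a concurrent.futures.Future. The second shows that asking an asyncio future for its result too early raises InvalidStateError, and that the coroutine has to await the future instead.

```python
import asyncio
import concurrent.futures


def blocking_result():
    # Thread world: result() may suspend the calling thread until the
    # worker thread has produced a value (or the timeout expires).
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        fut = pool.submit(lambda: 21 * 2)
        return fut.result(timeout=5)


async def event_loop_result():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    try:
        fut.result()  # not done yet: raises instead of blocking
        early = "no error"
    except asyncio.InvalidStateError:
        early = "InvalidStateError"
    # The result is set by code scheduled on the same loop, which can only
    # run once this coroutine yields control by awaiting.
    loop.call_soon(fut.set_result, 42)
    value = await fut
    return early, value
```

If `fut.result()` blocked here instead of raising, the `call_soon` callback could never run, which is the deadlock described above.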
concurrent.futures.Future provides a way to share results between different threads and processes usually when you use Executor.
asyncio.Future solves the same task but for coroutines, which are a special sort of function usually running asynchronously in one process/thread. "Asynchronously" in the current context means that the event loop manages the execution flow of these coroutines: it may suspend execution inside one coroutine, start executing another coroutine and later return to executing the first one - everything usually in one thread/process.
These objects (and many other threading/asyncio objects like Lock, Event, Semaphore etc.) look similar because the idea of concurrency in your code with threads/processes and coroutines is similar.
I think the main reason the objects are different is historical: asyncio was created much later than threading and concurrent.futures. It's probably impossible to change concurrent.futures.Future to work with asyncio without breaking the class API.
Should both classes be one in an "ideal world"? This is probably a debatable issue, but I see many disadvantages of that: while asyncio and threading look similar at first glance, they're very different in many ways, including the internal implementation and the way of writing asyncio/non-asyncio code (see the async/await keywords).
I think it's probably for the best that the classes are different: we clearly separate ways of concurrency that are different by nature (even if their similarity looks strange at first).

Django with Celery - existing object not found

I am having a problem executing a Celery task from another Celery task.
Here is the problematic snippet (data object already exists in database, its attributes are just updated inside finalize_data function):
    def finalize_data(data):
        data = update_statistics(data)
        data.save()
        from apps.datas.tasks import optimize_data
        optimize_data.delay(data.pk)

    @shared_task
    def optimize_data(data_pk):
        data = Data.objects.get(pk=data_pk)
        # Do something with data
Get call in optimize_data function fails with "Data matching query does not exist."
If I call the retrieve by pk function in finalize_data function it works fine. It also works fine if I delay the celery task call for some time.
This line:
optimize_data.apply_async((data.pk,), countdown=10)
instead of
optimize_data.delay(data.pk)
works fine. But I don't want to use hacks in my code. Is it possible that .save() call is asynchronously blocking access to that row/object?
I know that this is an old post but I stumbled on this problem today. Lee's answer pointed me to the correct direction but I think a better solution exists today.
Using the on_commit handler provided by Django, this problem can be solved without the hack of countdowns in the code, which may leave readers wondering why they exist.
I'm not sure if this existed when the question was posted but I'm just posting the answer so that people who come here in the future know about the alternative.
I'm guessing your caller is inside a transaction that hasn't committed before celery starts to process the task. Hence celery can't find the record. That is why adding a countdown makes it work.
A 1 second countdown will probably work as well as the 10 second one in your example. I've used 1 second countdowns throughout code to deal with this issue.
Another solution is to stop using transactions.
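The visibility problem guessed at in this answer can be reproduced without Django or Celery. The sketch below uses the standard-library sqlite3 module with two connections to the same file: `writer` stands in for the code that saves the row and `reader` for the worker. The table name and columns are invented for the demo, and real databases differ in their isolation details, so treat it as an illustration of the general effect only.

```python
import os
import sqlite3
import tempfile


def visible_rows(conn, pk):
    # fetchall() runs the statement to completion so no read lock lingers.
    return conn.execute("SELECT id FROM data WHERE id = ?", (pk,)).fetchall()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sqlite3")
        writer = sqlite3.connect(path)
        writer.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, stats TEXT)")
        writer.commit()
        reader = sqlite3.connect(path)  # plays the role of the worker

        # The INSERT implicitly opens a transaction that is not yet committed.
        writer.execute("INSERT INTO data (id, stats) VALUES (1, 'new')")
        before = visible_rows(reader, 1)  # worker looks too early

        writer.commit()
        after = visible_rows(reader, 1)  # worker looks after the commit

        reader.close()
        writer.close()
        return before, after
```

The reader finds nothing before the commit and finds the row afterwards, which matches the symptom of a task that fails immediately but works after a short countdown.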
You could use an on_commit hook to make sure the celery task isn't triggered until after the transaction commits?
DjangoDocs#performing-actions-after-commit
It's a feature that was added in Django 1.9.
    from django.db import transaction

    def do_something():
        pass  # send a mail, invalidate a cache, fire off a Celery task, etc.

    transaction.on_commit(do_something)
You can also wrap your function in a lambda:
transaction.on_commit(lambda: some_celery_task.delay('arg1'))
The function you pass in will be called immediately after a hypothetical database write made where on_commit() is called would be successfully committed.
If you call on_commit() while there isn’t an active transaction, the callback will be executed immediately.
If that hypothetical database write is instead rolled back (typically when an unhandled exception is raised in an atomic() block), your function will be discarded and never called.
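The three rules quoted above (run after commit, run immediately when there is no transaction, discard on rollback) can be captured in a toy model. This is not Django's implementation and `ToyTransaction` is a hypothetical class; it only mimics the callback behaviour so you can see the ordering without a database.

```python
class ToyTransaction:
    """Toy stand-in for a database transaction with on_commit callbacks."""

    def __init__(self):
        self._active = False
        self._callbacks = []

    def begin(self):
        self._active = True

    def on_commit(self, func):
        if self._active:
            # Inside a transaction: remember the callback for later.
            self._callbacks.append(func)
        else:
            # No active transaction: run straight away.
            func()

    def commit(self):
        callbacks, self._callbacks = self._callbacks, []
        self._active = False
        for func in callbacks:
            func()

    def rollback(self):
        # Rolled back: the queued callbacks are dropped and never called.
        self._callbacks = []
        self._active = False
```

Queuing the task through a mechanism with these semantics means the worker can only start once the row is actually visible, which is why it replaces the countdown workaround.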

Python Daemons - Program Structure and Exception Control

I've been doing amateur coding in Python for a while now and feel quite comfortable with it. Recently though I've been writing my first Daemon and am trying to come to terms with how my programs should flow.
With my past programs, exceptions could be handled by simply aborting the program, perhaps after some minor cleaning up. The only consideration I had to give to program structure was the effective handling of non-exception input. In effect, "Garbage In, Nothing Out".
In my Daemon, there is an outside loop that effectively never ends and a sleep statement within it to control the interval at which things happen. Processing of valid input data is easy but I'm struggling to understand the best practice for dealing with exceptions. Sometimes the exception may occur within several levels of nested functions and each needs to return something to its parent, which must, in turn, return something to its parent until control returns to the outer-most loop. Each function must be capable of handling any exception condition, not only for itself but also for all its subordinates.
I apologise for the vagueness of my question but I'm wondering if anyone could offer me some general pointers into how these exceptions should be handled. Should I be looking at spawning sub-processes that can be terminated without impact to the parent? A (remote) possibility is that I'm doing things correctly and actually do need all that nested handling. Another very real possibility is that I haven't got a clue what I'm talking about. :)
Steve
Exceptions are designed for the purpose of (potentially) not being caught immediately-- that's how they differ from when a function returns a value that means "error". Each exception can be caught at the level where you want to (and can) do something about it.
At a minimum, you could start by catching all exceptions at the main loop and logging a message. This is simple and ensures that your daemon won't die. At the main loop it's probably too late to fix most problems, so you can catch specific exceptions sooner. E.g. if a file has the wrong format, catch the exception in the routine that opens and tries to use the file, not deep in the parsing code where the problem is discovered; perhaps you can try another format. Basically if there's a place where you could recover from a particular error condition, catch it there and do so.
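As a rough sketch of that layering: the low-level routine below catches only the error it can recover from (by trying another format), and the main loop catches everything else, logs it and carries on. The names `load_value` and `run_daemon` are invented for the example, and the loop runs over a finite list so it can be tried out; a real daemon would loop forever.

```python
import logging
import time

log = logging.getLogger("daemon")


def load_value(text):
    # Recover locally where we know how: fall back to a second format.
    try:
        return int(text)
    except ValueError:
        # If this also fails, the ValueError propagates to the caller.
        return float(text)


def run_daemon(work_items, interval=0.0):
    results, failures = [], 0
    for text in work_items:
        try:
            results.append(load_value(text))
        except Exception:
            # Last line of defence: record the traceback and keep running.
            log.exception("failed to process %r", text)
            failures += 1
        time.sleep(interval)
    return results, failures
```

The intermediate functions need no handlers of their own: anything they cannot deal with simply passes through them on its way up to the main loop.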
The answer will be "it depends".
If an exception occurs in some low-level function, it may be appropriate to catch it there if there is enough information available at this level to let the function complete successfully in spite of the exception. E.g. when reading triangles from an .stl file, the normal vector of the triangle is both explicitly given and implicitly given by the sequence of the three points that make up the triangle. So if the normal vector is given as (0,0,0), which is a 0-length vector and should trigger an exception in the constructor of a Normal vector class, that can be safely caught in the constructor of a Triangle class, because it can still be calculated by other means.
If there is not enough information available to handle an exception, it should trickle upwards to a level where it can be handled. E.g. if you are writing a module to read and interpret a file format, it should raise an exception if the file it was given doesn't match the file format. In this case it is probably the top level of the program using that module that should handle the exception and communicate with the user. (Or in case of a daemon, log the error and carry on.)

Communicating end of Queue

I'm learning to use the Queue module, and am a bit confused about how a queue consumer thread can be made to know that the queue is complete. Ideally I'd like to use get() from within the consumer thread and have it throw an exception if the queue has been marked "done". Is there a better way to communicate this than by appending a sentinel value to mark the last item in the queue?
original (most of this has changed; see updates below)
Based on some of the suggestions (thanks!) of Glenn Maynard and others, I decided to roll up a descendant of Queue.Queue that implements a close method. It's available in the form of a primitive (unpackaged) module. I'll clean this up a bit and package it properly when I have a bit more time. For now the module only contains the CloseableQueue class and the Closed exception class. I'm planning to expand it to also include subclasses of Queue.LifoQueue and Queue.PriorityQueue.
It's in a pretty preliminary state currently, which is to say that although it passes its test suite, I haven't actually used it for anything yet. Your mileage may vary. I'll keep this answer updated with exciting news.
The CloseableQueue class differs a bit from Glenn's suggestion in that closing the queue will prevent future puts, but not prevent future gets until the queue is emptied. This made the most sense to me; it seemed like functionality to clear the queue could be added as a separate mixin* that would be orthogonal to the closeability functionality. So basically with CloseableQueue, by closing the queue you indicate that the last element has been put. There's also an option to do this atomically by passing last=True to the final put call. Subsequent calls to put, and subsequent calls to get once the queue is emptied, as well as outstanding blocked calls matching those descriptions, will raise the Closed exception.
This is mostly useful for situations where a single producer is generating data for one or more consumers, but it could also be useful for a multi-multi arrangement where consumers are waiting for a particular item or set of items. In particular it doesn't provide a way to determine that all of a number of producers have finished production. Getting that working would entail the provision of some way to register producers (.open()?), as well as a way to indicate that producer registration is itself closed.
Suggestions and/or code reviews are quite welcome. I haven't written a whole lot of concurrency code, but hopefully the test suite is thorough enough that the fact that the code passes it is an indication of the code's quality, rather than the suite's lack thereof. I was able to reuse a bunch of the code from the Queue module's test suite: the file itself is included in this module and used as a basis for various subclasses and routines, including regression testing. This probably (hopefully) helped to avoid complete ineptitude in the testing department. The code itself just overrides Queue.get and Queue.put with fairly minimal changes, and adds the close and closed methods.
I've sort of intentionally avoided using any new-fangled fanciness like context managers in both the code itself and in the test suite in an effort to keep the code as backwards-compatible as is the Queue module itself, which is considerably backwards indeed. I'll probably add __enter__ and __exit__ methods at some point; otherwise, the contextlib's closing function should be applicable to a CloseableQueue instance.
*: Here I use the term "mixin" loosely. As the Queue module's classes are old-style, mixins would need to be mixed using class factory functions; some restrictions apply; offer void where prohibited by Guido.
update
The CloseableQueue module now provides CloseableLifoQueue and CloseablePriorityQueue classes. I've also added some convenience functions to support iteration. Still need to rework it as a proper package. There's a class factory function to allow for convenient subclassing of other Queue.Queue-derived classes.
update 2
CloseableQueue is now available via PyPI, e.g. with
$ easy_install CloseableQueue
Comments and criticism are welcome, especially from this answer's anonymous downvoter.
Queues don't inherently have the idea of being complete or done. They can be used indefinitely. To close one up when you are done, you will indeed need to put None or some other magic value at the end and write the logic to check for it, as you described. The ideal way would probably be subclassing the Queue object.
See http://en.wikipedia.org/wiki/Queue_(data_structure) to learn more about queue in general.
A sentinel is a natural way to shut down a queue, but there are a couple things to watch out for.
First, remember that you may have more than one consumer, so you need to send a sentinel once for each running consumer, and guarantee that each consumer will only consume one sentinel, to ensure that each consumer receives its shutdown sentinel.
Second, remember that Queue defines an interface, and that when possible, code should behave regardless of the underlying Queue. You might have a PriorityQueue, or you might have some other class that exposes the same interface and returns values in some other order.
Unfortunately, it's hard to deal with both of these. To deal with the general case of different queues, a consumer that's shutting down must continue to consume values after receiving its shutdown sentinel until the queue is empty. That means that it may consume another thread's sentinel. This is a weakness of the Queue interface: it should have a Queue.shutdown call to cause an exception to be thrown by all consumers, but that's missing.
So, in practice:
if you're sure you're only ever using a regular Queue, simply send one sentinel per thread.
if you may be using a PriorityQueue, ensure that the sentinel has the lowest priority.
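For the plain FIFO case, "one sentinel per thread" can be sketched as follows. This uses the Python 3 `queue` module (the thread above refers to the Python 2 `Queue` module), and `worker` and `run` are names chosen for the example. Python 3.13 and later also provide a `Queue.shutdown()` method, but the sketch sticks to sentinels so that it runs on older versions.

```python
import queue
import threading

_SENTINEL = object()  # unique marker that can never collide with real data


def worker(q, results, lock):
    while True:
        item = q.get()
        if item is _SENTINEL:
            break  # each worker consumes exactly one sentinel, then exits
        with lock:
            results.append(item * item)


def run(items, n_workers=3):
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    threads = [threading.Thread(target=worker, args=(q, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    # FIFO order guarantees the sentinels come out after all real items.
    for _ in threads:
        q.put(_SENTINEL)
    for t in threads:
        t.join()
    return sorted(results)
```

This relies on FIFO ordering; with a PriorityQueue the sentinel would need to sort after every real item, as noted above.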
A Queue is a FIFO (first in, first out) structure, so remember that the consumer can be faster than the producer. When a consumer thread detects that the queue is empty, it normally takes one of the following actions:
Send to API: switch to the next thread.
Send to API: sleep for some milliseconds and then check the queue again.
Send to API: wait on an event (such as a new message in the queue).
If you want the consumer thread to terminate after the job is complete, then put a sentinel value in the queue to terminate the task.
The best practice way of doing this would be to have the queue itself notify a client that it has reached the 'done' state. The client can then take any action that is appropriate.
What you have suggested, periodically checking the queue to see if it is done, would be highly undesirable. Polling is an antipattern in multithreaded programming; you should always be using notifications.
EDIT:
So you're saying that the queue itself knows that it's 'done' based on some criteria and needs to notify the clients of that fact. I think you are correct, and the best way to do this is by throwing when a client calls get() and the queue is in the done state. Throwing would negate the need for a sentinel value on the client side. Internally the queue can detect that it is 'done' in any way it pleases, e.g. the queue is empty, its state was set to done, etc. I don't see any need for a sentinel value.
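A queue that throws from get() once it is done might look something like this minimal sketch. It is not the API of the CloseableQueue package mentioned earlier; `TinyCloseableQueue`, `Closed`, `drain` and `demo` are hypothetical names, and the queue is unbounded with no timeouts to keep the example short.

```python
import collections
import threading


class Closed(Exception):
    """Raised by put() after close(), and by get() once closed and empty."""


class TinyCloseableQueue:
    def __init__(self):
        self._items = collections.deque()
        self._cond = threading.Condition()
        self._closed = False

    def put(self, item):
        with self._cond:
            if self._closed:
                raise Closed("put() after close()")
            self._items.append(item)
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()  # wake every consumer blocked in get()

    def get(self):
        with self._cond:
            while not self._items:
                if self._closed:
                    raise Closed("queue closed and empty")
                self._cond.wait()  # notification, not polling
            return self._items.popleft()


def drain(q):
    out = []
    try:
        while True:
            out.append(q.get())
    except Closed:
        return out


def demo():
    q = TinyCloseableQueue()
    results = []
    consumer = threading.Thread(target=lambda: results.extend(drain(q)))
    consumer.start()
    for i in range(5):
        q.put(i)
    q.close()
    consumer.join(timeout=5)
    return results
```

Items already in the queue are still delivered after close(), so consumers need no sentinel: they just loop until get() raises.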
