AppEngine Timeout with Task Queues - python

I'm trying to execute a task in AppEngine through the Task Queues, but I still seem to be hitting a 60 second timeout. I'm unsure what I'm doing incorrectly, since the limit should be 10 minutes, as advertised.
I have a call to urlfetch.fetch() that appears to be the culprit. My call is:
urlfetch.fetch(url, payload=query_data, method=method, deadline=300)
The tail end of my stack trace shows the method that triggers the url fetch call right before the DeadlineExceededError:
File "/base/data/home/apps/s~mips-conversion-scheduler/000-11.371629749593131630/views.py", line 81, in _get_mips_updated_data
policies_changed = InquiryClient().get_changed_policies(company_id, initial=initial).json()
When I look at the task queue information it shows:
Method/URL: POST /tasks/queue-initial-load
Dispatched time (UTC): 2013/11/14 15:18:49
Seconds late: 0.18
Seconds to process task: 59.90
Last http response code: 500
Reason to retry: AppError
My View that processes the task looks like:
class QueueInitialLoad(webapp2.RequestHandler):
    def post(self):
        company = self.request.get("company")
        if company:
            company_id = self.request.get("company")
            queue_policy_load(company_id, queue_name="initialLoad", initial=True)
with the queue_policy_load being the method that triggers the urlfetch call.
Is there something obvious I'm missing that makes me limited to the 60 second timeout instead of 10 minutes?

Might be a little too general, but here are some thoughts that might help close the loop. There are 2 kinds of task queues, push queues and pull queues. Push queue tasks execute automatically, and they are only available to your App Engine app. On the other hand, pull queue tasks wait to be leased, are available to workers outside the app, and can be batched.
If you want to configure your queue, you can do it in the queue config file. In Java, that happens in the queue.xml file, and in Python it happens in the queue.yaml file (a minimal sketch follows the list below). In terms of push queues specifically, push queue tasks are processed by handlers (URLs) as POST requests. They:
Are executed ASAP
May cause new instances (Frontend or Backend)
Have a task duration limit of 10 minutes
But they have an unlimited duration if the tasks are run on a backend instance
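To make the configuration point above concrete, here is a minimal queue.yaml sketch defining one push queue and one pull queue (the queue names, rate, and retry limit are just illustrative values, not something your app requires):

# queue.yaml -- example queue definitions (names/values are illustrative)
queue:
- name: initialLoad
  rate: 5/s
  retry_parameters:
    task_retry_limit: 3
- name: pullQueue
  mode: pull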
Here is a quick Python code example showing how you can add tasks to a named push queue. Have a look at the Google developers page for Task Queues if you need more information: https://developers.google.com/appengine/docs/python/taskqueue/
Adding Tasks to a Named Push Queue:
from google.appengine.api import taskqueue

queue = taskqueue.Queue("Qname")
task = taskqueue.Task(url='/handler', params=args)
queue.add(task)
On the other hand, let's say that you wanted to use a pull queue. You could add tasks in Python to a pull queue using the following:
queue = taskqueue.Queue("Qname")
task = taskqueue.Task(payload=load, method='PULL')
queue.add(task)
You can then lease these tasks out using the following approach in Python:
queue = taskqueue.Queue("Qname")
tasks = queue.lease_tasks(lease_seconds, max_tasks)
Remember that, for pull queues, a leased task that is not deleted before its lease expires goes back on the queue (so failed work effectively gets retried); delete tasks once they have been processed successfully.
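As a minimal sketch of that lease/process/delete cycle (the lease values and the process() helper are assumptions for illustration, not part of the API):

queue = taskqueue.Queue("Qname")
tasks = queue.lease_tasks(lease_seconds=60, max_tasks=100)
for task in tasks:
    process(task.payload)    # process() = your own processing logic (assumption)
queue.delete_tasks(tasks)    # delete only after successful processing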
Hope that helps in terms of providing a general perspective!

The task queues have a 10 min deadline, but a URLFetch call has a 1 min deadline:
maximum deadline (request handler): 60 seconds
UPDATE: the intended behaviour is to allow a URLFetch deadline of up to 10 minutes when running in a TaskQueue, see this bug.

GAE has evolved since this question was asked, and this answer pertains to today, where the idea of "backend" instances is deprecated. GAE apps can instead be configured as Services (aka modules) that run with a manual scaling policy, which allows you to set longer timeouts. If you run your app with an automatic scaling policy, your urlfetch calls are capped at 60 seconds and your queued tasks at 10 minutes:
https://cloud.google.com/appengine/docs/python/an-overview-of-app-engine
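As a rough sketch, a service configured for manual scaling might look like this (the service name, handler path, and instance count are made-up examples):

# worker.yaml -- hypothetical service/module config
module: worker
runtime: python27
api_version: 1
threadsafe: true
manual_scaling:
  instances: 1
handlers:
- url: /tasks/.*
  script: views.app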

Related

What does it mean for a celery task to be "Received"? When all celery workers are blocked, what is happening with new tasks that are not "Received"?

I'm working on a new monitoring system that can measure Celery queue throughput and help alert the team when the queue is getting backed up. Over the course of my work, I've come across some peculiar behaviors that I don't understand (and that are not well documented in the Celery specs).
For testing purposes, I've set up an endpoint that will populate the queue with 16 long-running tasks that can be used to simulate a backed-up queue. The framework is Flask and the queue broker is Redis. Celery is configured for each worker to work on up to 4 tasks in parallel, and I have 2 workers running.
api/health.py
from flask import Blueprint, make_response

from jobs import sleepy_job  # defined in jobs.py below


def health():
    health = Blueprint("health", __name__)

    @health.route("/api/debug/create-long-queue", methods=["GET"])
    def long_queue():
        for i in range(16):
            sleepy_job.delay()
        return make_response({}, 200)

    return health
jobs.py
import time

@celery.task(priority=HIGH_PRIORITY)
def sleepy_job(*args, **kwargs):
    time.sleep(30)
Here's what I do to simulate a backed-up production queue:
I call /api/debug/create-long-queue to simulate a back-up in my queue. Based on the above math, the workers should be busy sleeping for 1 minute each (Together, they can concurrently handle 8 tasks at a time. Each task just sleeps for 30 seconds, and there are 16 tasks total.)
I make another API call shortly after (< 5 s), which kicks off a different job with real business logic (processing of an inbound webhook API call). We'll call this job handle_incoming_message.
Here's what I see using flower to inspect the queue:
While all workers are blocked by the first 8 sleepy_job tasks, I see no sign of the new handle_incoming_message on the queue, even though I am certain handle_incoming_message.delay() has been called as a result of the 2nd API call.
After the first 8 sleepy_job tasks have been completed (~30s), I see the new handle_incoming_message on the queue with state RECEIVED.
After the second (and final) 8 sleepy_job tasks have been completed, I now see handle_incoming_message has state STARTED (and I can confirm this as the UI updates with the new data that was received and processed in that task.)
Questions
So it seems clear that when the workers are momentarily unblocked after handling the first 8 sleepy_job tasks, they are doing something to mark/acknowledge the new handle_incoming_message task in a way that is visible to flower. But this leaves several unanswered questions:
What is the state of the new handle_incoming_message task when the workers are blocked?
What changes after workers are unblocked that makes it so flower now has visibility into the new handle_incoming_message task?
What does the "RECEIVED" state actually mean?
(Bonus: How can I get visibility into tasks that are queued while workers are blocked?)
When all workers are blocked SOME tasks could be in the received state because of prefetching (look in the documentation for that). So chances are very high that your tasks are simply in the queue, waiting to be received by Celery workers (coordinating processes - these are not actual worker processes).
Flower is a simple service that is built upon a Celery feature called "task events". In simple terms it (Flower) subscribes itself as receiver of all events (received, succeeded, started, failed, etc) and then visually represents those to the web clients. More about it here. So when task gets received by a Celery worker, a "task-received" event is sent. Flower fetches this event, and changes the state of that task in the dashboard.
When a task is "received" it means that the particular Celery worker took that task off the queue and it may be executed immediately (if there is a free worker process to execute it), or the Celery worker will wait for a worker process to become ready to run the task. I have already mentioned prefetching - Celery workers will often take more tasks than they have available worker processes.
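If you want workers to reserve fewer tasks ahead of time, a common tweak is to lower the prefetch multiplier - a minimal sketch, assuming "celery" is your Celery app instance and a Celery 4+ style configuration (older versions use the upper-case setting names):

# "celery" = your Celery app instance (assumption)
celery.conf.worker_prefetch_multiplier = 1   # reserve only as many tasks as there are worker processes
celery.conf.task_acks_late = True            # acknowledge a task only after it finishes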
Celery does not give users a way to list what is in a particular queue. That is why you will see many similar questions - including this one which offers answers. You will see my short answer there among others. In short, it depends on your broker of choice. If it is Redis, then you simply go through the list of objects. If it is RabbitMQ then you can use their tooling to inspect queues. I think the decision not to provide this is a good one, as this information is never reliable. By the time you list all the tasks in a particular queue, there may be thousands of new ones...
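For what it's worth, with a Redis broker you could peek at the default queue roughly like this (a sketch assuming the default queue name "celery" and a local broker; the exact message layout depends on your Celery protocol version):

import json
import redis

# Assumes the default "celery" queue on a local Redis broker
r = redis.StrictRedis(host="localhost", port=6379, db=0)
for raw in r.lrange("celery", 0, -1):       # queued, not-yet-received messages
    message = json.loads(raw)
    print(message["headers"].get("task"))   # task name, e.g. "jobs.sleepy_job"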

How to detect Celery task which doing similar job before run another task?

My Celery task does time-consuming calculations on some database-stored entity. The workflow is like this: get information from the database, compile it into some serializable object, save the object. Other tasks do other calculations (like rendering images) on the loaded object.
But serialization is time-consuming, so I'd like to have one long-running task per entity, which holds the serialized object in memory and processes client requests delivered through a messaging queue (Redis pub/sub). If there are no requests for a while, the task exits. After that, if a client needs some job done, it starts another task, which loads the object, processes it, and stays around for a while to handle further jobs. At startup, this task should check that it is the only worker for this particular entity, to avoid collisions. So what is the best strategy to check whether another task is already running for this entity?
1) My first idea was to send a message to a channel associated with the entity and wait for a response. Bad idea: the target task can be busy with calculations, and waiting for a response with a timeout just wastes time.
2) Storing the Celery task id in the database is even worse - the task could be killed, but the record would stay, so we'd need to verify that the target task is actually alive.
3) The third idea is to inspect the workers for running tasks, checking their state for the entity id (which each task would report at startup). It also seems that collisions can happen here, e.g. if several tasks are scheduled but not yet running.
For now I think idea 1 is best, with a modification: the task sends a message to the entity channel on startup with its start time, but then immediately starts working rather than waiting for a response. It then checks the message queue, and if someone has responded, they compare timestamps and the task with the later timestamp quits. This seems complicated enough - is there a better solution?
The final solution is to start a supervisor thread in the task, which replies to 'discover' messages from competing tasks.
So the workflow is like this:
Task starts, then subscribes to a Redis PubSub channel keyed by the entity ID
Task sends a 'discover' message to the channel
Task waits a little bit
Task looks for a 'reply' among the incoming messages on the channel; if one is found, it exits
Task starts a supervisor thread, which answers all incoming 'discover' messages with 'reply'
This works fine except when several tasks start simultaneously, e.g. after a worker restart. To avoid this, the subscription process needs to be made atomic, using a Redis lock:
from redis import StrictRedis

class RedisChannel:
    def __init__(self, channel_id):
        self.channel_id = channel_id
        self.redis = StrictRedis()
        self.channel = self.redis.pubsub()
        with self.redis.lock(channel_id):
            self.channel.subscribe(channel_id)
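For completeness, here is a rough sketch of the discover/reply handshake described above; the class name, message contents, and timing are assumptions layered on top of the RedisChannel shown here:

import threading
import time

# Sketch only: message format, wait time, and class name are assumptions
class EntityWorker(RedisChannel):
    def claim(self, wait=1.0):
        # Announce ourselves, then briefly listen for a competing 'reply'
        self.redis.publish(self.channel_id, "discover")
        deadline = time.time() + wait
        while time.time() < deadline:
            msg = self.channel.get_message(timeout=0.1)
            if msg and msg.get("type") == "message" and msg["data"] == b"reply":
                return False    # another task already owns this entity
        # Nobody answered: start the supervisor thread that answers future 'discover's
        t = threading.Thread(target=self._supervise)
        t.daemon = True
        t.start()
        return True

    def _supervise(self):
        for msg in self.channel.listen():
            if msg.get("type") == "message" and msg["data"] == b"discover":
                self.redis.publish(self.channel_id, "reply")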

Batch processing of incoming notifications with GAE

My App Engine app receives notifications from SendGrid reporting email deliveries, opens, etc. SendGrid doesn't do much batching of these notifications, so I could receive several per second.
I'd like to do batch processing of the incoming notifications, such as processing all of the notifications received in the last minute (my processing includes transactions, so I need to combine them to avoid contention). There seem to be several ways of doing this...
For storing the incoming notifications, I could:
add an entity to the datastore or
create a pull queue task.
For triggering processing, I could:
Run a CRON job every minute (is this a good idea?) or
Have the handler that processes the incoming Sendgrid requests trigger processing of notifications but only if the last trigger was more than a minute ago (could store a last trigger date in memcache).
I'd love to hear pros and cons of the above or other approaches.
After a couple of days, I've come up with an implementation that works pretty well.
For storing incoming notifications, I'm storing the data in a pull queue task. I didn't know at the time of my question that you can actually store any raw data you want in a task, and that the task doesn't have to be the execution of a function itself. You probably could store the incoming data in the datastore, but then you'd sort of be creating your own pull tasks, so you might as well use the pull tasks provided by GAE.
For triggering a worker to process tasks in the pull queue, I came across this excellent blog post about On-demand Cron Jobs by a former GAE developer. I don't want to repeat that entire post here, but the basic idea is that each time you add a task to the pull queue, you also create a worker task (regular push queue) to process the tasks in the pull queue. For the worker task, you use a task name corresponding to a time interval, which makes sure you only have one worker task per interval. It gives you the benefit of a 1-minute CRON job with the added bonus that it only runs when needed, so you don't have a worker firing when there is nothing to do.
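A rough sketch of that pattern, assuming a pull queue named "notifications" and a worker handler at /tasks/process-notifications (both names are made up for illustration):

import time
from google.appengine.api import taskqueue

# "notifications" queue and the handler URL below are made-up examples
def enqueue_notification(raw_body):
    # Store the raw SendGrid notification as the payload of a pull task
    taskqueue.Queue("notifications").add(
        taskqueue.Task(payload=raw_body, method="PULL"))

    # Schedule at most one worker per minute: the task name makes duplicates a no-op
    interval = int(time.time() / 60)
    try:
        taskqueue.add(url="/tasks/process-notifications",
                      name="process-notifications-%d" % interval)
    except (taskqueue.TaskAlreadyExistsError, taskqueue.TombstonedTaskError):
        pass  # a worker for this minute has already been scheduled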

Can AppEngine python threads last longer than the original request?

We're trying to use the new python 2.7 threading ability in Google App Engine and it seems like the created thread is getting killed before it finishes running. Our scenario:
User sends a message to the server
We update the user's data
We spawn a thread to do some more heavy duty processing
We return a response to the user before waiting for the heavy duty processing to finish
My assumption was that the thread would continue to run after the request had returned, as long as it did not exceed the total request time limit. What we're seeing though is that the thread is randomly killed partway through its execution. No exceptions, no errors, nothing. It just stops running.
Are threads allowed to exist after the response has been returned? This does not repro on the dev server, only on live servers.
We could of course use a task queue instead, but that's a real pain since we'd have to set up a url for the action and serialize/deserialize the data.
The 'Sandboxing' section of this page:
http://code.google.com/appengine/docs/python/python27/using27.html#Sandboxing
indicates that threads cannot run past the end of the request.
Deferred tasks are the way to do this. You don't need a URL or serialization to use them:
from google.appengine.ext import deferred
deferred.defer(myfunction, arg1, arg2)
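If you need more control, defer also accepts task options via underscore-prefixed arguments; for example, to run 30 seconds from now on a specific queue (the queue name here is just an example):

# "background" is an example queue name, not something defer requires
deferred.defer(myfunction, arg1, arg2, _countdown=30, _queue="background")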

Python App Engine: Task Queues

I need to import some data to show to the user, but the page execution time exceeds the 30 second limit. So I decided to split my big code into several tasks and try Task Queues. I add about 10-20 tasks to the queue and App Engine executes the tasks in parallel while the user is waiting for data. How can I determine that my tasks are completed, so I can show the user the data ASAP? Can I somehow iterate over active tasks?
I've solved this in the past by keeping the status for the tasks in memcached, and polling (via Ajax) to determine when the tasks are finished.
If you go this way, it's best if you can always "manually" determine the status of the tasks without looking in memcached, since there's always the (slim) chance that memcache will go down or will get cleared or something as a task is running.
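A minimal sketch of that idea (the key names, job_id, and helper functions are assumptions for illustration, not an established API):

from google.appengine.api import memcache

# Sketch only: key scheme and job_id are illustrative
TOTAL_KEY = "import:%s:total"
DONE_KEY = "import:%s:done"

def start_import(job_id, num_tasks):
    memcache.set(TOTAL_KEY % job_id, num_tasks)
    memcache.set(DONE_KEY % job_id, 0)

def task_finished(job_id):
    # Call this at the end of each task
    memcache.incr(DONE_KEY % job_id)

def is_done(job_id):
    # Polled by the Ajax status handler; treat missing keys as "unknown"
    done = memcache.get(DONE_KEY % job_id)
    total = memcache.get(TOTAL_KEY % job_id)
    return done is not None and total is not None and done >= total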
