I am having a problem with executing a Celery task from another Celery task.
Here is the problematic snippet (the data object already exists in the database; its attributes are just updated inside the finalize_data function):
def finalize_data(data):
    data = update_statistics(data)
    data.save()
    from apps.datas.tasks import optimize_data
    optimize_data.delay(data.pk)
from celery import shared_task

@shared_task
def optimize_data(data_pk):
    data = Data.objects.get(pk=data_pk)
    # Do something with data
The get() call in the optimize_data function fails with "Data matching query does not exist."
If I run the retrieve-by-pk query inside the finalize_data function it works fine. It also works fine if I delay the Celery task call for some time.
This line:
optimize_data.apply_async((data.pk,), countdown=10)
instead of
optimize_data.delay(data.pk)
works fine. But I don't want to use hacks in my code. Is it possible that the .save() call is asynchronously blocking access to that row/object?
I know that this is an old post but I stumbled on this problem today. Lee's answer pointed me in the right direction, but I think a better solution exists today.
Using the on_commit handler provided by Django, this problem can be solved without the hackish countdown in the code, which might not make it obvious to the reader why it exists.
I'm not sure if this existed when the question was posted but I'm just posting the answer so that people who come here in the future know about the alternative.
I'm guessing your caller is inside a transaction that hasn't committed before celery starts to process the task. Hence celery can't find the record. That is why adding a countdown makes it work.
A 1 second countdown will probably work as well as the 10 second one in your example. I've used 1 second countdowns throughout code to deal with this issue.
Another solution is to stop using transactions.
You could use an on_commit hook to make sure the Celery task isn't triggered until after the transaction commits.
See the Django docs: "Performing actions after commit".
It's a feature that was added in Django 1.9.
from django.db import transaction

def do_something():
    pass  # send a mail, invalidate a cache, fire off a Celery task, etc.

transaction.on_commit(do_something)
You can also wrap your function in a lambda:
transaction.on_commit(lambda: some_celery_task.delay('arg1'))
The function you pass in will be called immediately after a hypothetical database write made where on_commit() is called would be successfully committed.
If you call on_commit() while there isn’t an active transaction, the callback will be executed immediately.
If that hypothetical database write is instead rolled back (typically when an unhandled exception is raised in an atomic() block), your function will be discarded and never called.
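For the case in the question, a minimal sketch (reusing the finalize_data, update_statistics and optimize_data names from the snippet above) could look like this:

from django.db import transaction
from apps.datas.tasks import optimize_data

def finalize_data(data):
    data = update_statistics(data)
    data.save()
    # Enqueue the task only once the surrounding transaction has committed,
    # so the worker is guaranteed to see the saved row.
    transaction.on_commit(lambda: optimize_data.delay(data.pk))

If there is no active transaction, the callback runs immediately, so this behaves like the plain .delay() call outside of atomic blocks.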
Step Functions are AWS structures that control the flow of lambdas (or other events). All my lambdas use Python (but Lambdas can use most major languages). Throughout the process my step function sends status updates back to the client (the client triggered it via API). Let's say it progresses through these updates: Started -> In Progress -> Finishing -> Done. For handled errors it will send an 'Error' status back to the client. So the client could see a timeline like this: Started -> In Progress -> Errored. This is ideal - so the user knows the process has stopped.
But when there are unexpected/unhandled errors the client never really knows, and the timeline might sit at 'In Progress' indefinitely; the user doesn't know what happened. So I started looking into the built-in Step Function error handling. I like this option because I can create a 'Catch' handler for each lambda or event, where I can communicate back to the client if there is an error. The downside is that it really made the step function template/design messy; see the before/after screenshots below.
BEFORE: [screenshot of the original state machine graph]
AFTER: [screenshot of the state machine graph after adding the 'Catch' handlers]
The template code that generates these graphs doesn't look much better. So I considered an alternative which seems similarly messy. I could add a single try/except block within each lambda for the entire lambda - to catch any/all errors. For example:
def lambda_handler(event, context):
    try:
        pass  # Execute function tasks
    except Exception:
        pass  # Communicate back to client that there was an error
Similar to the step function 'Catch' functions this would ensure that I catch and communicate any error. But this seems like a bad idea just because of what it is (adding blanket/blind try/except).
So right now I'm stuck between messy/repeated code and try/except-ing everything. Am I implementing step function 'Catch' incorrectly? Am I missing a better way to handle unknown Python errors? Is there another approach entirely?
As @stijndepestel pointed out, having a catch-all error check is a good idea.
What I do in my Python Lambda functions is this: I have a custom router class which, besides route management, handles all errors. If the error inherits from a base error class that I've created, then it's a custom error that I threw deliberately; those are assigned special info when created that automatically gets formatted when they are converted into strings. The router sends that back to the client if possible.
But if the error is some unknown/unexpected one, then the router prints it with as much detail as possible to CloudWatch Logs, and then returns a generic "500 Internal Server Error" message to the client.
I'd probably set it up in the future to notify me by email or something like that when such errors occur, so that I can take action quickly.
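A minimal sketch of that pattern, with made-up names (AppError, handle) standing in for my actual router code:

import logging

logger = logging.getLogger(__name__)

class AppError(Exception):
    """Base class for errors raised on purpose; str(err) is safe to show the client."""
    def __init__(self, message, status_code=400):
        super().__init__(message)
        self.status_code = status_code

def handle(route, event, context):
    try:
        return route(event, context)
    except AppError as err:
        # Known error raised deliberately: return its message to the client.
        return {"statusCode": err.status_code, "body": str(err)}
    except Exception:
        # Unknown/unexpected error: log full details to CloudWatch Logs
        # and return a generic 500 to the caller.
        logger.exception("Unhandled error")
        return {"statusCode": 500, "body": "Internal Server Error"}

The real router picks the route function based on the event before calling something like handle(); the point is that every error path is funneled through one place.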
I don't see why having a try-catch system for the entirety of your lambda is such a bad idea. It just ensures that you're always in control of how errors are communicated to the caller of the lambda function.
Imagine, for example, a lambda that serves as a back-end for an HTTP API: it would be better practice to have a try-catch around everything, so you can communicate to your clients what the problem was, or at least provide a generic HTTP 500 type error. In this case the functions will be called by AWS Step Functions, which means your error messages don't have to be user friendly, but the fact that you might want to be in control of how unexpected exceptions are handled is still the same in my book.
I have a Django rest framework app that calls 2 huey tasks in succession in a serializer create method like so:
...
def create(self, validated_data):
    user = self.context['request'].user
    player_ids = validated_data.get('players', [])
    game = Game.objects.create()
    tasks.make_players_friends_task(player_ids)
    tasks.send_notification_task(user.id, game.id)
    return game
# tasks.py
from huey.contrib.djhuey import db_task

@db_task()
def make_players_friends_task(ids):
    players = User.objects.filter(id__in=ids)
    # process players

@db_task()
def send_notification_task(user_id, game_id):
    user = User.objects.get(id=user_id)
    game = Game.objects.get(id=game_id)
    # send notifications
When running the huey process in the terminal and hitting this endpoint, I can see that only one or the other of the tasks is ever called, but never both. I am running huey with the default settings (Redis with 1 worker thread).
If I alter the code so that I am passing in the objects themselves as parameters, rather than the ids, and remove the Django queries in the @db_task methods, things seem to work alright.
The reason I initially used the ids as parameters is that I assumed (or read somewhere) that huey uses JSON serialization by default, but after looking into it, pickle is actually the default serializer.
One theory is that since I am only running one worker, and also have a @db_periodic_task method in the app, the process can only handle listening for tasks or executing them at any time, but not both. This is the way Celery seems to work, where you need a separate process each for the scheduler and the workers, but this isn't mentioned in huey's documentation.
If you run the huey consumer it will actually spawn a separate scheduler alongside the number of workers you've specified, so that's not going to be your problem.
You're not giving enough information to properly see what's going wrong, so check the following:
If you run the huey consumer in the terminal, observe whether all your tasks show up as properly registered so that the consumer is actually capable of consuming them.
Check whether your redis process is running.
Try performing the tasks with a blocking call to see on which tasks it fails:
task_result = tasks.make_players_friends_task(player_ids)
task_result.get(blocking=True)
task_result = tasks.send_notification_task(user.id, game.id)
task_result.get(blocking=True)
Do this with a debugger or print statements to see whether it makes it to the end of your function or where it gets stuck.
Make sure to always restart your consumer when you change code. It doesn't automatically pick up new code like the Django dev server does. The fact that your code works as intended when pickling whole objects instead of passing ids could point to this, as it would be really weird for that change to break things. On the other hand, you shouldn't pass Django ORM objects into tasks; it makes much more sense to use your id approach.
I have some code that queues up a task inside _post_put_hook.
The task retrieves the key and fetches the entity. However, sometimes the worker fails because the object for that key hasn't been created yet, but it will succeed when it next runs. Note that we're retrieving the object by key, so I expect the data to be consistent.
I'm only calling the enqueue on commit, so I'd expect the object to be created by the time the task runs. In the sample below, I find that _post_put_hook is not in a transaction, which seems to be the cause of the issue, but why isn't it in a transaction?
Here's a sample:
import logging

from google.appengine.ext import ndb


@ndb.synctasklet
def log_usage(self):
    @ndb.transactional_tasklet(xg=True)
    def _txn():
        yield Log.insert_document_log_async()
    yield _txn()


class Log(ndb.Expando):
    @classmethod
    @ndb.tasklet
    def insert_document_log_async(cls):
        log = cls()
        logging.debug("insert document log in transaction: {}".format(ndb.in_transaction()))
        yield log.put_async()

    @ndb.synctasklet
    def _post_put_hook(self, future):
        @ndb.synctasklet
        def _callback_on_commit():
            key = future.get_result()
            yield SqlTaskHelper.enqueue_syncronise_sql_model_async(key)

        logging.debug("_post_put_hook In transaction: {}".format(ndb.in_transaction()))
        ndb.get_context().call_on_commit(lambda: _callback_on_commit())
The code is executed as follows:
log_usage is called, which calls insert_document_log_async.
When calling insert_document_log_async, logging indicates that we're in a transaction (insert document log in transaction: True).
But the _post_put_hook logging indicates we're not in a transaction (so call_on_commit is executed immediately, which I suspect is the issue). The task runs shortly after and the entity isn't always available.
I'd like to know why _post_put_hook is executing outside of a transaction.
Thanks
Your question was answered on Google Groups. I'm re-posting from there:
"Note that post hooks do not check whether the RPC was successful. The hook runs regardless of failure that might have occurred due to issues, more specifically the contention which is when you attempt to write to a single entity group too quickly. Also note that it is normal that a small number of datastore operations will result in timeout in normal operation. Read more here about the most common datastore issues and here how to avoid the contention.
In case you need any coding assistance, I suggest you post your inquiries on Stack Overflow where the community of developers are better prepared to assist you in that matter. Google Groups is oriented more towards general opinions, trends, and issues of general nature regarding Google Cloud Platform.
If an exception is detected by Datastore, it would be raised when the code calls get_result(), so the key would not be returned. However, note that “all post-hooks have a Future argument at the end of the call signature. This Future object holds the result of the action. You can call get_result on this Future to retrieve the result; you can be sure that get_result won't block, since the Future is complete by the time the hook is called.”
That said, in case you don’t have an exception, the future already has the result and the get_result function is not blocking, yet it occasionally fails to retrieve the key. Take a look at this Stack Overflow post with a suggestion to resolve an issue similar to your case."
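Following that advice, one defensive rewrite of the hook from the question (a sketch only; SqlTaskHelper and the surrounding model are unchanged, and this does not by itself fix the consistency timing) is to check the future before enqueueing anything:

@ndb.synctasklet
def _post_put_hook(self, future):
    # Post hooks run even when the put failed, so surface any RPC error first.
    try:
        key = future.get_result()  # won't block; the future is complete by now
    except Exception:
        logging.exception("put_async failed, skipping enqueue")
        return

    @ndb.synctasklet
    def _callback_on_commit():
        yield SqlTaskHelper.enqueue_syncronise_sql_model_async(key)

    # call_on_commit runs the callback immediately when there is no active
    # transaction, otherwise only after a successful commit.
    ndb.get_context().call_on_commit(_callback_on_commit)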
I have a basic Django project that I use as a front-end interface for a (Condor) computing cluster for generating simulations. From the Django app the users can start simulations (in Condor). The simulation-related metadata and the simulation state are kept in a DB.
I need to add a new feature: notification when (some) simulations are done.
Since I want a simple solution (and I am already using background tasks) I was thinking of using a repeating task that at fixed intervals queries Condor about the tasks, updates the DB and, if necessary, sends notifications.
So if I want to update the statuses every 10 minutes I will have something like:
from background_task import background

@background(schedule=1)
def check_simulations(repeat=600):
    # look up simulation statuses
    simulation_list = get_Simulations()
    for sim in simulation_list:
        if sim.status == Simulation.DONE:
            # assuming each simulation keeps a reference to its user
            sim.user.email_user('Simulation Complete', 'You have been notified')

def initialize():
    check_simulations()
However, this task (or rather the initialize() method) must be started (called once) to create and schedule the check_simulations() task (which will practically serialize the call and save it in the DB); after that the background-tasks thread will read it, execute it and also reschedule it (if there is an error).
My questions:
Where should I put the call to the initialize() method so that it is only run once?
One such place could be, for instance, urls.py, but this is an extremely ugly solution. Is there a better way?
How can I ensure that a server restart will not create and schedule a new task (if one already exists)?
This may happen if a task is already scheduled (so a serialized task is in the background-tasks table) and the webserver is restarted, so the initialize() method is called again and a new task is created and scheduled...
I had a similar problem and I solved it this way.
I initialize my task in urls.py; I don't know if you can use other places to put it. I also added an if to check whether the task already exists in the database:
from background_task.models import Task

if not Task.objects.filter(verbose_name="update_orders").exists():
    tasks.update_orders(repeat=300, verbose_name="update_orders")
I have tested it and it works fine. You can also search for the task with other parameters like name, hash, ...
You can check the Task model here: https://github.com/arteria/django-background-tasks/blob/master/background_task/models.py
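Putting the two pieces together for the question's check_simulations task (the import path below is illustrative, not part of the original code), the guarded initialization could look like this:

# urls.py, or wherever the one-time initialization ends up
from background_task.models import Task
from simulations import tasks  # illustrative import path

# Only schedule the repeating task if it isn't already serialized in the DB,
# so a server restart doesn't create a duplicate schedule.
if not Task.objects.filter(verbose_name="check_simulations").exists():
    tasks.check_simulations(repeat=600, verbose_name="check_simulations")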
One of my methods doesn't work when run in an atomic context. I want to ask Django if it's currently running a transaction.
The method can create a thread or a process and saves the result to the database. This is a bit odd, but there is a huge performance benefit when a process can be used.
I find that processes in particular are a bit sketchy with Django. I know that Django will raise an exception if the method chooses to save the results in a process and the method is run in an atomic context.
If I can check for an atomic context then I can raise an exception straight away (instead of getting odd errors) or force the method to only create a thread.
I found the is_managed() method but according to this question it's been removed in Django 1.8.
According to this ticket there are a couple of ways to detect this: not transaction.get_autocommit() (using a public API) or transaction.get_connection().in_atomic_block (using a private API).
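A small sketch wrapping both checks in a helper (the helper name is mine, not a Django API):

from django.db import transaction

def in_atomic_context(using=None):
    # Public API: autocommit is switched off while inside an atomic block.
    if not transaction.get_autocommit(using=using):
        return True
    # Private API: ask the connection object directly.
    return transaction.get_connection(using=using).in_atomic_block

The method from the question could then raise straight away, or fall back to a thread, whenever this returns True.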