I'm planning on using django-celery-results backend to track status and results of Celery tasks.
Is the django-celery-results backend suitable to store status of task while it is running, or only after it has finished?
It's not clear when the TaskResult record is first created (upon task creation, task execution, or completion?).
If it's created upon task creation, will its status automatically be updated to STARTED when the task is picked up, provided the task_track_started option is set?
Can the TaskResult instance be accessed within the task function?
Another question here appears to indicate so, but doesn't mention the status being updated to STARTED while the task is running.
About TaskResult: of course, you can access it during task execution; you only need to import it:
from django_celery_results.models import TaskResult
With this you can query the TaskResult model, filtering by task_id, task_name, etc.
This is the official model code; you can filter by any of these fields: https://github.com/celery/django-celery-results/blob/master/django_celery_results/models.py
Example:
task = TaskResult.objects.filter(task_name=task_name, status='SUCCESS').first()
In addition, you can store any extra data your task needs in the model's fields. While the task has been created and is still executing you can modify those fields, but once the status is SUCCESS you should only read them; the task status itself is saved automatically by default.
Example:
task_result.meta = json.dumps({'help_text': {'date': '2020-11-30'}})
I recommend working with the to_dict() function; there you can see the model's attributes.
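For example, a running task can look up (and annotate) its own row by its request id; a minimal sketch, assuming the django-db backend described below is configured, with a made-up task name long_running_job:

import json

from celery import shared_task
from django_celery_results.models import TaskResult

@shared_task(bind=True)
def long_running_job(self):
    # The row may not exist this early in execution (it appears once the
    # backend first stores a state), so treat the lookup as optional.
    row = TaskResult.objects.filter(task_id=self.request.id).first()
    if row is not None:
        row.meta = json.dumps({'help_text': {'progress': 'step 1 of 3'}})
        row.save(update_fields=['meta'])
    # ... the actual work of the task goes here ...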
The backend is configured in the settings module as:
CELERY_RESULT_BACKEND = 'django-db' # in this case it is django DB
If you configured the Django database as the backend, then you can import the model as:
from django_celery_results.models import TaskResult
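If you also want the stored status to switch to STARTED while the task is running (as asked above), the task_track_started option has to be enabled as well; a sketch, assuming settings are read with the usual CELERY_ namespace:

CELERY_RESULT_BACKEND = 'django-db'
CELERY_TASK_TRACK_STARTED = True  # the TaskResult row reports STARTED once a worker picks the task up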
I have a standalone script that scrapes a page, initiates a connection to a database, and writes data to it. I need it to execute periodically, every x hours. I could do this with a bash script, with the pseudocode:
while true
do
python scraper.py
sleep 60*60*x
done
From what I read about message brokers, they are used for sending "signals" from one running program to another, like HTTP in principle. For example, one piece of code accepts an email id from the user and sends a signal with that email id to another piece of code that will send the email.
I need Celery to run a periodic task on Heroku. I already have MongoDB on a separate server. Why do I need to run another server for RabbitMQ or Redis just for this? Can I use Celery without the broker?
Celery's architecture is designed to scale and distribute tasks across several servers. For sites like yours it might be overkill. A queue service is generally needed to maintain the task list and to signal the status of finished tasks.
You might want to take a look at Huey instead. Huey is a small-scale Celery "clone" that needs only Redis as an external dependency, not RabbitMQ. It still uses Redis's queue mechanism to line the tasks up in a queue.
There is also the Advanced Python Scheduler, which does not even need Redis and can hold the state of the queue in memory, in-process.
Alternatively, if you have a very small number of periodic tasks and no delayed tasks, I would just use cron and plain Python scripts to run them.
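For the scraper in the question, an Advanced Python Scheduler version could be as small as the following; a sketch, assuming APScheduler 3.x and that the scraping logic is wrapped in a scrape() function:

from apscheduler.schedulers.blocking import BlockingScheduler

def scrape():
    # scrape the page, connect to the database and write the results
    ...

sched = BlockingScheduler()
sched.add_job(scrape, 'interval', hours=3)  # here x = 3; adjust to taste
sched.start()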
As the Celery documentation explains:
Celery communicates via messages, usually using a broker to mediate between clients and workers. To initiate a task, a client adds a message to the queue, which the broker then delivers to a worker.
You can use your existing MongoDB database as the broker; see Using MongoDB in the Celery documentation.
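In that case the broker setting simply points at the existing MongoDB instance; a sketch, assuming MongoDB on the default port and a Celery/Kombu version that still ships the MongoDB transport:

BROKER_URL = 'mongodb://localhost:27017/celery_broker'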
For an application like this, it's better to use Django Background Tasks.
Installation
Install from PyPI:
pip install django-background-tasks
Add to INSTALLED_APPS:
INSTALLED_APPS = (
    # ...
    'background_task',
    # ...
)
Migrate your database:
python manage.py makemigrations background_task
python manage.py migrate
Creating and registering tasks
To register a task use the background decorator:
from background_task import background
from django.contrib.auth.models import User
@background(schedule=60)
def notify_user(user_id):
    # look up the user by id and send them a message
    user = User.objects.get(pk=user_id)
    user.email_user('Here is a notification', 'You have been notified')
This will convert notify_user into a background task function. When you call it from regular code, it will actually create a Task object and store it in the database. The database then contains serialised information about which function actually needs running later on. This does place limits on the parameters that can be passed when calling the function: they must all be serializable as JSON. That is why in the example above a user_id is passed rather than a User object.
Calling notify_user as normal will schedule the original function to be run 60 seconds from now:
notify_user(user.id)
This is the default schedule time (as set in the decorator), but it can be overridden:
notify_user(user.id, schedule=90) # 90 seconds from now
notify_user(user.id, schedule=timedelta(minutes=20)) # 20 minutes from now
notify_user(user.id, schedule=timezone.now()) # at a specific time
You can also run the original function right now, in synchronous mode:
notify_user.now(user.id) # launch a notify_user function and wait for it
notify_user = notify_user.now # revert task function back to normal function.
Useful for testing.
You can specify a verbose name and a creator when scheduling a task:
notify_user(user.id, verbose_name="Notify user", creator=user)
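One thing that is easy to miss: queued tasks are only executed while a worker process is running, which for django-background-tasks is the process_tasks management command:
python manage.py process_tasks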
I've hit a really nasty situation. I have the following setup.
I have a Django model representing an FSM, with a django-fsm field.
I have a Celery task that sends out an email and then advances the state of the main object's FSM. From the Celery task's perspective, the object "seems" to be saved. But from the main Django process' perspective, the object isn't being updated. The strange thing is that ancillary objects are being saved properly to the DB and are later accessible from the main Django process.
I explicitly call .save() on the object from the Celery task, and the date_last_modified = models.DateTimeField(auto_now=True, null=True) field has a later timestamp in the Celery task than the main thread, although I'm not sure if that's an indication of anything, i.e. it may have been updated but the update has not been flushed out to the DB.
I'm using django 1.5.1,
postgresql 9.3.0,
celery v3.1.0,
Redis 2.6.10
Running Celery like so
$ celery -A tracking worker -E -B -l info
Any ideas as to why this may be happening would be greatly appreciated.
Are you re-getting the object after the save? I.e. not just looking at the instance you got before the save?
I had similar problem with Django 1.5
I guess it's because Django does not commit changes to the database immediately.
Adding
'OPTIONS': {
    'autocommit': True
}
to the DATABASES setting fixed the problem for me.
The problem does not exist in Django 1.6+, because autocommit is the default there.
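For clarity, that block goes inside the per-database entry of DATABASES; a sketch for a PostgreSQL setup on Django 1.5, assuming the usual 'default' alias:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'OPTIONS': {
            'autocommit': True,
        },
    }
}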
What about transactions? You can try setting CELERY_EAGER_PROPAGATES_EXCEPTIONS=True and running Celery with -l DEBUG to see whether any error happens after the model's .save() call.
Also take care with concurrent updates: if one process reads the model, then the Celery task reads and saves the same model, and the initial process calls model.save() later, it will overwrite all the fields with its own stale values.
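One way to limit that kind of clobbering is to save only the fields the current process actually changed (update_fields is available from Django 1.5); a sketch, assuming an instance named obj with a status field:

# Save only what this process changed, instead of overwriting every field:
obj.status = 'DONE'
obj.save(update_fields=['status'])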
I had an issue that looks like yours on Django 3.2.7, using get_nb_lines.delay(flow.pk) within a class-based UpdateView.
After the fix, I suppose it was some kind of concurrent or crossing update (I don't know what to call it).
I understood that after I noticed that get_nb_lines.apply_async((flow.pk,), countdown=5) had fixed my problem. If anybody can explain this another way, I'll take it.
Take care, because the parameters sent to apply_async must be an iterable, as an error message pointed out; so in my case I had to pass flow.pk as a tuple (add a comma after flow.pk).
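If the root cause is the task firing before the view's transaction commits (which would explain why a countdown helps), an alternative to a fixed delay is to enqueue the task only after the commit; a sketch, reusing the get_nb_lines task and flow object from above:

from django.db import transaction

# Enqueue only once the surrounding transaction has committed, so the
# worker is guaranteed to see the row the view just saved.
transaction.on_commit(lambda: get_nb_lines.delay(flow.pk))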
I want to develop an application that monitors the database for new records and allows me to execute a method in the context of my Django application when a new record is inserted.
I am planning to use an approach where a Celery task checks the database for changes since the last check and triggers the above method.
Is there a better way to achieve this?
I'm using SQLite as the backend and tried apsw's setupdatehook API, but it doesn't seem to run my module in Django context.
NOTE: The updates are made by a different application outside Django.
Create a celery task to do whatever it is you need to do with the object:
tasks.py
from celery.decorators import task
@task()
def foo(object):
    object.do_some_calculation()
Then create a Django signal handler that is fired every time an instance of your model is saved, queuing up your task in Celery:
models.py
class MyModel(models.Model):
    ...

from django.db.models.signals import post_save
from django.dispatch import receiver
from mymodel import tasks

@receiver(post_save, sender=MyModel)
def queue_task(sender, instance, created, **kwargs):
    tasks.foo.delay(object=instance)
What's important to note is that Django's signals are synchronous; in other words, the queue_task function runs within the request cycle. But all queue_task does is tell Celery to handle the actual guts of the work (do_some_calculation) in the background.
A better way would be to have the application that modifies the records call yours, or at least make a Celery queue entry, so that you don't have to query the database too often to see if something changed.
But if that is not an option, letting Celery query the database to find out whether something changed is probably the next best option (surely better than the other possible option of calling a web service from a database trigger, which you should really avoid).
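If you do end up polling, a periodic Celery task can remember the highest primary key it has processed and only touch newer rows; a rough sketch, with hypothetical myapp/MyModel names and a beat schedule:

from celery import shared_task
from django.core.cache import cache

from myapp.models import MyModel  # hypothetical app and model names

@shared_task
def check_for_new_records():
    # Remember the highest primary key processed so far.
    last_seen = cache.get('last_seen_pk', 0)
    for row in MyModel.objects.filter(pk__gt=last_seen).order_by('pk'):
        row.do_some_calculation()
        last_seen = row.pk
    cache.set('last_seen_pk', last_seen, timeout=None)

# Scheduled via celery beat in settings, for example:
# CELERY_BEAT_SCHEDULE = {
#     'poll-for-new-records': {
#         'task': 'myapp.tasks.check_for_new_records',
#         'schedule': 60.0,  # seconds
#     },
# }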
I'm using Django to develop a classifier service, and a user can build a model using an API like http://localhost/api/buildmodel. However, building a model takes a long time, maybe 2 hours, and I'm using a web page to show the result. How should I design my Django program so that it returns immediately and then shows the result once the build finishes? Maybe I could use AJAX, but I want to implement it in Python, e.g. using an async method and calling a callback function after the build. Any suggestions will be appreciated.
You need to use a task queue manager. Celery is by far the most popular task manager for Django. The idea is that you give a task to this manager, it processes the task, and once it is done it can fire a callback function. Within the callback function, you can run your logic to notify the user that the task has been completed.
One way to do it is to create a row in a persistent database (or a redis key/value pair) for the task which says if it is running or finished. Have the code set the value to be running when the task starts and done when the task completes. Then have an AJAX call do a GET lookup on a URL that sends the status for the task via a web service. You can put that in a setInterval() to periodically poll the database to see if it is done. You could send an email on completion or just have a landing page / dashboard that shows the status of the tasks being run.
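With Celery specifically, the task's own state can stand in for that status row: hand the task id back to the page and let the AJAX call poll a tiny status view; a sketch, assuming a hypothetical build_status URL and the default result backend states:

from celery.result import AsyncResult
from django.http import JsonResponse

def build_status(request, task_id):
    # PENDING / STARTED / SUCCESS / FAILURE, depending on backend settings
    result = AsyncResult(task_id)
    return JsonResponse({'task_id': task_id, 'state': result.state})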
I have a reminder type app that schedules tasks in celery using the "eta" argument. If the parameters in the reminder object changes (e.g. time of reminder), then I revoke the task previously sent and queue a new task.
I was wondering if there's any good way of keeping track of revoked tasks across celeryd restarts. I'd like to have the ability to scale celeryd processes up/down on the fly, and it seems that any celeryd processes started after the revoke command was sent will still execute that task.
One way of doing it is to keep a list of revoked task ids, but this method will result in the list growing arbitrarily. Pruning this list requires guarantees that the task is no longer in the RabbitMQ queue, which doesn't seem to be possible.
I've also tried using a shared --statedb file for each of the celeryd workers, but it seems that the statedb file is only updated on termination of the workers and thus not suitable for what I would like to accomplish.
Thanks in advance!
Interesting problem, I think it should be easy to solve using broadcast commands.
When a new worker starts up, it can ask all the other workers to dump their revoked tasks to it. That means adding two new remote control commands; you can easily add new commands by using @Panel.register.
Module control.py:
from celery.worker import state
from celery.worker.control import Panel
@Panel.register
def bulk_revoke(panel, ids):
    state.revoked.update(ids)

@Panel.register
def broadcast_revokes(panel, destination):
    panel.app.control.broadcast("bulk_revoke", arguments={
        "ids": list(state.revoked)},
        destination=destination)
Add it to CELERY_IMPORTS:
CELERY_IMPORTS = ("control", )
The only remaining piece is to hook this up so that the new worker triggers broadcast_revokes at startup. I guess you could use the worker_ready signal for this:
from celery import current_app as celery
from celery.signals import worker_ready
def request_revokes_at_startup(sender=None, **kwargs):
    celery.control.broadcast("broadcast_revokes",
                             destination=sender.hostname)

# Connect the handler so it runs once the worker is up:
worker_ready.connect(request_revokes_at_startup)
I had to do something similar in my project and used celerycam with django-admin-monitor. The monitor takes a snapshot of the tasks and saves them in the database periodically, and there is a nice user interface to browse and check the status of all tasks. You can use it even if your project is not Django based.
I implemented something similar to this some time ago, and the solution I came up with was very similar to yours.
The way I solved this problem was to have the worker fetch the Task object from the database when the job ran (by passing it the primary key, as the documentation recommends). In your case, before the reminder is sent the worker should perform a check to ensure that the task is "ready" to be run. If not, it should simply return without doing any work (assuming that the ETA has changed and another worker will pick up the new job).
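Concretely, the check at the top of the task can re-fetch the object and compare its stored schedule against the time this job was enqueued for; a rough sketch, assuming a hypothetical Reminder model with a remind_at field:

from celery import shared_task

from myapp.models import Reminder  # hypothetical model with a remind_at field

@shared_task
def send_reminder(reminder_pk, scheduled_for):
    reminder = Reminder.objects.filter(pk=reminder_pk).first()
    # If the reminder was deleted or rescheduled since this job was enqueued,
    # do nothing; the job queued for the new time will handle it.
    if reminder is None or reminder.remind_at.isoformat() != scheduled_for:
        return
    reminder.send()  # whatever actually delivers the reminder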