Python Celery: Update django model after state change

I managed to find two similar topics discussing this issue, but unfortunately I couldn't work out the best solution from them:
Update Django Model Field Based On Celery Task Status
Update Django Model Field Based On Celery Task Status
I use Django & Celery (+ Redis as the message broker), and I would like to update a Django model when the Celery task status changes (e.g. pending -> success, pending -> failure).
My code:
import time
from celery import shared_task

@shared_task(name="run_simulation")
def run_simulation(simulation_id: str):
    t1_start = time.perf_counter()
    doSomeWork()  # we may change this to sleep, for instance
    t1_end = time.perf_counter()
    return {'process_time': t1_end - t1_start}
and the particular view from which I am calling the task:
def run_simulation(request):
    form = SimulationForm(request.POST)
    if form.is_valid():
        new_simulation = form.save()
        new_simulation.save()
        task_id = tasks.run_simulation.delay(new_simulation.id)
The question is: what is the preferred way to update the state of the Simulation model when the status of the task changes?
In the docs I found handlers that use the on_success, on_failure, etc. methods: http://docs.celeryproject.org/en/latest/userguide/tasks.html#handlers

I don't think there's a preferred method to do something like this since it depends on your project.
You can use a monitoring task like in the link you sent: give it the monitored task's id and re-schedule it until the monitored task reaches a finished state.
from celery.result import AsyncResult

@app.task(bind=True)
def monitor_task(self, t_id):
    """Monitor a task and retry until it is done."""
    res = AsyncResult(t_id, backend=self.backend, app=self.app)
    if not res.ready():
        raise self.retry(
            countdown=10,
            exc=Exception("Main task not done yet.")
        )
You can also create an event receiver, check the state of the task, and then save it to the DB, as sketched after the link below.
http://docs.celeryproject.org/en/latest/userguide/monitoring.html#real-time-processing
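For illustration, a minimal sketch of such a receiver that mirrors task state into the model from the question (the Simulation import path and its task_id/status fields are assumptions, not from the question):

# Hypothetical sketch: an event receiver that saves task state to the DB.
from simulations.models import Simulation  # assumed import path

def my_monitor(app):
    state = app.events.State()

    def save_state(event):
        state.event(event)
        task = state.tasks.get(event['uuid'])
        # Assumes Simulation stores the task id when the task is scheduled.
        Simulation.objects.filter(task_id=task.uuid).update(status=task.state)

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-succeeded': save_state,
            'task-failed': save_state,
            '*': state.event,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)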
Now, if you are only interested in the success and failure states, you can create success and failure callbacks and take care of saving the state to the DB there.
tasks.run_simulation.apply_async(
    (sim_id,),
    link=tasks.success_handler.s(),
    link_error=tasks.error_handler.s()
)
http://docs.celeryproject.org/en/latest/userguide/calling.html#linking-callbacks-errbacks
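The success_handler and error_handler tasks referenced above are left undefined; a minimal sketch, assuming run_simulation is adjusted to include its simulation_id in the returned dict (the Simulation import path and fields are hypothetical):

from celery import shared_task

from simulations.models import Simulation  # hypothetical import path

@shared_task
def success_handler(result):
    # `result` is the parent task's return value; this assumes it carries
    # 'simulation_id' (run_simulation would need to return it).
    Simulation.objects.filter(id=result['simulation_id']).update(status='SUCCESS')

@shared_task
def error_handler(request, exc, traceback):
    # An errback accepting (request, exc, traceback) is called with the
    # failed task's request, the exception instance and the traceback.
    Simulation.objects.filter(task_id=request.id).update(status='FAILURE')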

Related

How to stop the execution of a long process if something changes in the db?

I have a view that sends a message to a RabbitMQ queue.
message = {'origin': 'Bytes CSV',
           'data': {'csv_key': str(csv_entry.key),
                    'csv_fields': csv_fields,
                    'order_by': order_by,
                    'filters': filters}}
...
queue_service.send(message=message, headers={}, exchange_name=EXCHANGE_IN_NAME,
                   routing_key=MESSAGES_ROUTING_KEY.replace('#', 'bytes_counting.create'))
On my consumer, I have a long process to generate a CSV.
def create(self, data):
    csv_obj = self._get_object(key=data['csv_key'])
    if csv_obj.status == CSVRequestStatus.CANCELED:
        self.logger.info(f'CSV {csv_obj.key} was canceled by the user')
        return
    result = self.generate_result_data(filters=data['filters'], order_by=data['order_by'], csv_obj=csv_obj)
    csv_data = self._generate_csv(result=result, csv_fields=data['csv_fields'], csv_obj=csv_obj)
    file_key = self._post_csv(csv_data=csv_data, csv_obj=csv_obj)
    csv_obj.status = CSVRequestStatus.READY
    csv_obj.status_additional = CSVRequestStatusAdditional.SUCCESS
    csv_obj.file_key = file_key
    csv_obj.ready_at = timezone.now()
    csv_obj.save(update_fields=['status', 'status_additional', 'ready_at', 'file_key'])
    self.logger.info(f'CSV {csv_obj.name} created')
The long process happens inside self._generate_csv, because self.generate_result_data returns a queryset, which is lazy.
As you can see, if a user changes the status of the csv_request through an endpoint BEFORE the message starts to be consumed, the process will not be evaluated. My goal is to make this cancellation also possible during the execution of self._generate_csv.
So far I have tried to use threading, but unsuccessfully.
How can I achieve my goal?
Thanks a lot!
Why don't you check out the Celery library? Using Celery with Django and a RabbitMQ broker is much easier than directly leveraging RabbitMQ queues.
Celery has an inbuilt function revoke to terminate an ongoing task:
>>> from celery.task.control import revoke  # older API; modern Celery uses app.control.revoke
>>> revoke(task_id, terminate=True)
related SO answer
celery docs
For your use case, you probably want something like (code snippets):
## celery/tasks.py
from myproject.celery import app  # import your Celery application instance

@app.task(name="create_csv", queue="my_queue")
def create_csv(message):
    # ...snip...
    pass

## main.py
from celery import uuid, current_app

def start_task(task_id, message):
    current_app.send_task(
        "create_csv",
        args=[message],
        task_id=task_id,
    )

def kill_task(task_id):
    current_app.control.revoke(task_id, terminate=True)

## signals.py
from django.db import models
from django.dispatch import receiver
from .models import MyModel
from .main import kill_task

# choose the appropriate signal to listen for the DB change
@receiver(models.signals.post_save, sender=MyModel)
def handler(sender, instance, **kwargs):
    kill_task(instance.task_id)
Use celery.uuid to generate task IDs, which can be stored in the DB or cache; use the same task ID to control the task, i.e. to request termination.
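Putting the pieces together, a hypothetical view might look like this (CSVRequest, its fields, and the module paths are assumptions, not from the question):

from celery import uuid
from django.http import JsonResponse

from .main import start_task

def create_csv_view(request):
    task_id = uuid()  # generate the task id up front so we can store it
    # Hypothetical model with a task_id field; storing the id lets a later
    # endpoint or signal handler revoke the running task.
    csv_request = CSVRequest.objects.create(task_id=task_id)
    start_task(task_id, {'csv_key': str(csv_request.key)})
    return JsonResponse({'task_id': task_id})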
Since self._generate_csv is the slowest part, the obvious solution is to work on this function.
To do this, you can divide the creation of the CSV file into several pieces. After creating each piece, check the status and see whether you can continue to create the file. At the very end, glue all the pieces into the finished file; a sketch of this idea follows below.
Here is a method for combining multiple files into one.
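A minimal sketch of that idea, re-reading the status from the DB between chunks (the chunk size and helper name are assumptions):

CHUNK_SIZE = 10_000  # hypothetical chunk size

def _generate_csv_in_pieces(self, result, csv_fields, csv_obj):
    pieces = []
    for offset in range(0, result.count(), CHUNK_SIZE):
        # Re-read the status from the DB before each chunk so a user
        # cancellation takes effect mid-generation.
        csv_obj.refresh_from_db(fields=['status'])
        if csv_obj.status == CSVRequestStatus.CANCELED:
            self.logger.info(f'CSV {csv_obj.key} was canceled by the user')
            return None
        chunk = result[offset:offset + CHUNK_SIZE]
        pieces.append(self._generate_csv(result=chunk, csv_fields=csv_fields, csv_obj=csv_obj))
    return ''.join(pieces)  # glue all the pieces into the finished file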

Django steps or process messages via REST

For learning purposes I want to implement the following:
I have a script that runs Selenium, for example, in the background, and I have some log messages that help me see what is going on in the terminal.
But I want to get the same messages in my REST request to the Angular app.
print('Started')
print('Logged in')
...
print('Processing')
...
print('Success')
In my view.py file
class RunTask(viewsets.ViewSet):
    queryset = Task.objects.all()

    @action(detail=False, methods=['GET'], name='Run Test Script')
    def run(self, request, *args, **kwargs):
        result = task()
        if result['success']:
            return Response(data=result)
        else:
            return Response(data=result['message'])
def task():
    print('Starting')
    print('Logged in')
    ...
    print('Processing')
    ...
    print('Success')
    return {
        'success': True,  # or False
        'message': 'my status message'
    }
Now it shows me only the final result of the task, but I want to stream the same messages to indicate the process status in the frontend.
And I can't understand how to organize it.
How can I tell Angular about my process status?
Unfortunately, it's not that simple. The REST API lets you start the task, but since it runs in the same thread, the HTTP request will block until the task is finished before sending the response. Your print statements won't appear in the HTTP response but in your server output (if you look at the shell where you ran python manage.py runserver, you'll see them).
Now, if you wish to have that output in real time, you'll have to look at WebSockets. They allow you to open a "tunnel" between the browser and the server and send/receive messages in real time. The django-channels library allows you to implement them.
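For illustration, a minimal django-channels sketch (consumer, group name, and the report helper are assumptions; routing configuration is omitted):

from asgiref.sync import async_to_sync
from channels.generic.websocket import WebsocketConsumer
from channels.layers import get_channel_layer

class TaskProgressConsumer(WebsocketConsumer):
    def connect(self):
        # Join a group so any process can push messages to this socket.
        async_to_sync(self.channel_layer.group_add)("task_progress", self.channel_name)
        self.accept()

    def progress_message(self, event):
        # Handler for messages of type "progress.message" sent to the group.
        self.send(text_data=event["text"])

def report(text):
    # Call this from the task instead of print() to push a status line.
    channel_layer = get_channel_layer()
    async_to_sync(channel_layer.group_send)(
        "task_progress", {"type": "progress.message", "text": text}
    )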
However, for long-running background tasks (like a Selenium scraper), I would advise looking into the Celery task queue. Basically, your Django process schedules tasks into the queue, and the tasks in the queue are then executed by one (or more!) "worker" processes. The advantage of this is that your Django process won't be blocked by the long task: it just adds some work to the queue and then responds.
When you add a task to the queue, Celery gives you a unique identifier for it, which you can return in the HTTP response. You can then implement another endpoint which takes a task id as a parameter and returns the state of the task (is it pending? done? failed?), as sketched below.
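A minimal sketch of such a status endpoint (URL routing omitted; the view name is an assumption):

from celery.result import AsyncResult
from django.http import JsonResponse

def task_status(request, task_id):
    # AsyncResult looks the task up via the configured result backend.
    result = AsyncResult(task_id)
    # result.state is one of PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED.
    return JsonResponse({"task_id": task_id, "state": result.state})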
For this to work, you'll have to set up a "broker", a kind of database that stores the tasks to do and their results (typically RabbitMQ or Redis). The Celery documentation explains this well: https://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
Either way you choose, it's not a trivial thing and will need quite some work before you have results; but it's interesting to see how it expands the possibilities of a classical HTTP server.

How to provide user constant notification about Celery's Task execution status?

I integrated my project with Celery in this way; inside views.py, after receiving a request from the user:
def upload(request):
    if "POST" == request.method:
        # save the file
        task_parse.delay()
    # continue
and in tasks.py
from __future__ import absolute_import
from celery import shared_task
from uploadapp.main import aunit

@shared_task
def task_parse():
    aunit()
    return True
In short, the shared task will run the function aunit() from a third Python file, uploadapp/main.py.
Let's assume that aunit() is a resource-heavy process that takes time (like file parsing). Since I integrated it with Celery, it now runs totally asynchronously, which is good for me. So: the task starts -> a Celery worker processes it -> it finishes, and Celery sets the status to SUCCESS. I can view that using Flower.
But what I want is to also notify the user through the Django UI that their task is done processing, as soon as Celery has finished processing on the back side and set the status to SUCCESS.
Now, I know this is possible if:
1.) I constantly request the STATUS and check whether it returns SUCCESS or not.
How do I do that via Celery? How can you query the Celery task status from your views.py and notify the user asynchronously with just Celery's Python module?
You need a real-time mechanism. I would suggest Firebase: update the Firebase real-time DB field for the user id with a boolean True at the end of the Celery task, then implement a JavaScript function that listens for changes to the Firebase database user_id object and updates the UI.
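Alternatively, for option 1.) from the question (constantly requesting the STATUS), a minimal polling sketch (URL wiring and the status view name are assumptions):

from celery.result import AsyncResult
from django.http import JsonResponse

from uploadapp.tasks import task_parse  # assumed module path

def upload(request):
    if request.method == "POST":
        # save the file ...
        result = task_parse.delay()
        # Return the task id so the frontend can poll for the status.
        return JsonResponse({"task_id": result.id})

def parse_status(request, task_id):
    # The frontend polls this endpoint until state == "SUCCESS".
    return JsonResponse({"state": AsyncResult(task_id).state})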

python django race condition with celery

Working on a Python Django project, here is what I want:
A user accesses Page1 with an object argument; longFunction() of the object is triggered and handed off to Celery so the page can be returned immediately.
If the user tries to access Page2 with the same object argument, I want the page to hang until the object's longFunction() triggered by Page1 has terminated.
So I tried locking the MySQL DB row with objects.select_for_update(), but it doesn't work.
Here is a simplified version of my code:
def Page1(request, arg_id):
    obj = Vm.objects.select_for_update().get(id=arg_id)
    obj.longFunction.delay()
    return render_to_response(...)

def Page2(request, arg_id):
    vm = Vm.objects.select_for_update().get(id=arg_id)
    return render_to_response(...)
I want Page2 to hang at the line vm = Vm.objects.select_for_update().get(id=arg_id) until longFunction() has completed. I'm new to Celery, and it looks like the MySQL connection initiated in Page1 is lost when Page1 returns, even if longFunction() is not finished.
Is there another way I can achieve that?
Thanks
Maybe this can be helpful for you:
import time

from celery.result import AsyncResult
from yourapp.celery import app

def Page1(request, arg_id):
    obj = Vm.objects.select_for_update().get(id=arg_id)
    celery_task_id = obj.longFunction.delay().id  # .delay() returns an AsyncResult
    return render_to_response(...)

def Page2(request, arg_id, celery_task_id):
    task = AsyncResult(app=app, id=celery_task_id)
    while task.state != "SUCCESS":  # the built-in final state is SUCCESS
        time.sleep(1)  # wait or do whatever you want
    vm = Vm.objects.select_for_update().get(id=arg_id)
    return render_to_response(...)
More info at http://docs.celeryproject.org/en/latest/reference/celery.states.html
The database lock from select_for_update is released when the transaction closes (in Page1). This lock doesn't get carried over to the Celery task. You can lock in the Celery task, but that won't solve your problem because Page2 might get loaded before the Celery task obtains the lock.
Mikel's answer will work. You could also put a lock in the cache, as described in the Celery cookbook; a sketch of that idea follows.
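For illustration, a sketch of the cache-lock idea (the key name and timeout are assumptions):

from django.core.cache import cache

LOCK_EXPIRE = 60 * 5  # seconds; should exceed longFunction's worst case

def acquire_lock(vm_id):
    # cache.add is atomic: it only sets the key if it does not exist yet,
    # and returns False when someone else already holds the lock.
    return cache.add(f"vm-lock-{vm_id}", "locked", LOCK_EXPIRE)

def release_lock(vm_id):
    cache.delete(f"vm-lock-{vm_id}")

# In the Celery task: call acquire_lock(vm_id) before the work and
# release_lock(vm_id) in a finally block afterwards.
# In Page2: loop (with a sleep) until acquire_lock(vm_id) succeeds,
# then release it and render the page.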

In celery how to get the task status for all the tasks for specific task name?

In Celery I want to get the task status for all the tasks with a specific task name. I tried the code below.
import celery.events.state

# Celery state instance.
stat = celery.events.state.State()
# tasks_by_type will return a list of tasks.
query = stat.tasks_by_type("my_task_name")
# Print the tasks.
print(query)
But I'm getting an empty list from this code.
celery.events.state.State() is a data structure used to keep track of the state of Celery workers and tasks. When you call State(), you get an empty state object with no data.
You should use app.events.Receiver (stream processing) or celery.events.snapshot (batch processing) to capture state that contains tasks.
Sample Code:
from celery import Celery

def my_monitor(app):
    state = app.events.State()

    def announce_failed_tasks(event):
        state.event(event)
        # The task name is sent only with the -received event, and state
        # will keep track of this for us.
        task = state.tasks.get(event['uuid'])
        print('TASK FAILED: %s[%s] %s' % (
            task.name, task.uuid, task.info(),))

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            'task-failed': announce_failed_tasks,
            '*': state.event,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == '__main__':
    app = Celery(broker='amqp://guest@localhost//')
    my_monitor(app)
This isn't natively supported. Depending on the backend (Mongo, Redis, etc.), you may or may not be able to introspect the contents of a queue and find out what's in it. Even if you can, you'll miss items currently in progress.
That said, you could manage this yourself:
from celery.result import AsyncResult

result = mytask.delay(...)
my_datastore.save("mytask", result.id)  # my_datastore is a stand-in for your own storage
...
for id in my_datastore.find(task="mytask"):
    res = AsyncResult(id)
    print(res.state)
In Celery you can easily find the status of a task by accessing it through its task ID, if you want to access it from another function.
Sample Code:-
@task(name='Sum_of_digits')
def ABC(x, y):
    return x + y
Add this task for processing
res = ABC.delay(1, 2)
Now use res to fetch the state, status and result (res.get()):
print(f"id={res.id}, state={res.state}, status={res.status}")
