Python Django race condition with Celery

I'm working on a Python Django project; here is what I want:
A user accesses Page1 with an object argument; the object's function longFunction() is triggered and handed off to Celery so the page can be returned immediately.
If the user tries to access Page2 with the same object argument, I want the page to hang until the longFunction() call triggered by Page1 has finished.
So I tried locking the MySQL row with objects.select_for_update(), but it doesn't work.
Here is a simplified version of my code:
def Page1(request, arg_id):
    obj = Vm.objects.select_for_update().get(id=arg_id)
    obj.longFunction.delay()
    return render_to_response(...)

def Page2(request, arg_id):
    vm = Vm.objects.select_for_update().get(id=arg_id)
    return render_to_response(...)
I want Page2 to hang at the line vm = Vm.objects.select_for_update().get(id=arg_id) until longFunction() is complete. I'm new to Celery, and it looks like the MySQL connection initiated in Page1 is lost when Page1 returns, even if longFunction() is not finished.
Is there another way I can achieve that?
Thanks

Maybe this can be helpful for you:
from celery.result import AsyncResult
from yourapp.celery import app

def Page1(request, arg_id):
    obj = Vm.objects.select_for_update().get(id=arg_id)
    celery_task_id = obj.longFunction.delay()
    return render_to_response(...)

def Page2(request, arg_id, celery_task_id):
    task = AsyncResult(app=app, id=celery_task_id)
    while task.state != "SUCCESS":
        # wait (sleep briefly) or do whatever you want
        pass
    vm = Vm.objects.select_for_update().get(id=arg_id)
    return render_to_response(...)
More info at http://docs.celeryproject.org/en/latest/reference/celery.states.html

The database lock from select_for_update is released when the transaction closes (in Page1), so it does not carry over to the Celery task. You can lock inside the Celery task, but that won't solve your problem because Page2 might load before the Celery task obtains the lock.
Mikel's answer will work. You could also put a lock in the cache, as described in the Celery cookbook; a sketch of that pattern is below.
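A minimal sketch of the cache-lock pattern, loosely following the Celery cookbook's "ensuring a task is only executed one at a time" recipe. The key name, timeout, and helper names are illustrative, not taken from the question; only Vm and longFunction come from your code.

import time

from django.core.cache import cache

LOCK_EXPIRE = 60 * 10  # safety timeout; should exceed longFunction's worst case

def vm_lock_key(arg_id):
    return "vm-lock-%s" % arg_id

def run_long_function(vm):
    # Body of the Celery task: take the lock, do the work, release the lock.
    # cache.add only sets the key if it is absent, so it acts as an atomic
    # "try-lock" on backends with an atomic add (e.g. memcached).
    if not cache.add(vm_lock_key(vm.id), "locked", LOCK_EXPIRE):
        return  # another worker already holds the lock
    try:
        vm.longFunction()
    finally:
        cache.delete(vm_lock_key(vm.id))

def wait_for_vm(arg_id, poll_seconds=1):
    # Used by Page2: block until the lock for this object is gone.
    while cache.get(vm_lock_key(arg_id)) is not None:
        time.sleep(poll_seconds)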

Related

How to stop the execution of a long process if something changes in the db?

I have a view that sends a message to a RabbitMQ queue.
message = {'origin': 'Bytes CSV',
           'data': {'csv_key': str(csv_entry.key),
                    'csv_fields': csv_fields,
                    'order_by': order_by,
                    'filters': filters}}
...
queue_service.send(message=message, headers={}, exchange_name=EXCHANGE_IN_NAME,
                   routing_key=MESSAGES_ROUTING_KEY.replace('#', 'bytes_counting.create'))
On my consumer, I have a long process to generate a CSV.
def create(self, data):
    csv_obj = self._get_object(key=data['csv_key'])
    if csv_obj.status == CSVRequestStatus.CANCELED:
        self.logger.info(f'CSV {csv_obj.key} was canceled by the user')
        return
    result = self.generate_result_data(filters=data['filters'], order_by=data['order_by'], csv_obj=csv_obj)
    csv_data = self._generate_csv(result=result, csv_fields=data['csv_fields'], csv_obj=csv_obj)
    file_key = self._post_csv(csv_data=csv_data, csv_obj=csv_obj)
    csv_obj.status = CSVRequestStatus.READY
    csv_obj.status_additional = CSVRequestStatusAdditional.SUCCESS
    csv_obj.file_key = file_key
    csv_obj.ready_at = timezone.now()
    csv_obj.save(update_fields=['status', 'status_additional', 'ready_at', 'file_key'])
    self.logger.info(f'CSV {csv_obj.name} created')
The long process happens inside self._generate_csv, because self.generate_result_data returns a queryset, which is lazy.
As you can see, if a user changes the status of the csv_request through an endpoint BEFORE the message starts to be consumed, the process is never evaluated. My goal is to let this also happen during the execution of self._generate_csv.
So far I have tried to use threading, but without success.
How can I achieve my goal?
Thanks a lot!
Why don't you check out the Celery library? Using Celery with Django and a RabbitMQ broker is much easier than driving RabbitMQ queues directly.
Celery has a built-in function, revoke, to terminate an ongoing task:
>>> from celery.task.control import revoke
>>> revoke(task_id, terminate=True)
related SO answer
celery docs
For your use case, you probably want something like (code snippets):
## celery/tasks.py
from celery import app

@app.task(queue="my_queue")
def create_csv(message):
    # ...snip...
    pass

## main.py
from celery import uuid, current_app

def start_task(task_id, message):
    current_app.send_task(
        "create_csv",
        args=[message],
        task_id=task_id,
    )

def kill_task(task_id):
    current_app.control.revoke(task_id, terminate=True)

## signals.py
from django.db import models
from django.dispatch import receiver
from .models import MyModel
from .main import kill_task

# choose an appropriate signal to listen for the DB change
@receiver(models.signals.post_save, sender=MyModel)
def handler(sender, instance, **kwargs):
    kill_task(instance.task_id)
Use celery.uuid to generate task IDs, which can be stored in the DB or cache, and use the same task ID to control the task, i.e. to request termination.
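For example, a hedged sketch of the dispatch side (the task_id field on csv_entry is an assumption, not from the question):

from celery import uuid

task_id = uuid()                  # generate the id up front
csv_entry.task_id = task_id       # persist it so the signal handler can find it
csv_entry.save(update_fields=['task_id'])
start_task(task_id, message)      # dispatch with the same id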
Since self._generate_csv is the slowest part, the obvious solution is to work on that function.
To do this, you can divide the creation of the CSV file into several pieces. After creating each piece, check the status and see whether you can continue creating the file. At the very end, glue all the pieces into the finished file (a sketch of this idea follows below).
Here is a method for combining multiple files into one.
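A rough sketch of that idea, reusing the status names from the question; CHUNK_SIZE, the values() call, and the StringIO buffer are assumptions about how the pieces might be produced:

import csv
import io

CHUNK_SIZE = 1000  # rows per piece; tune to your data

def _generate_csv(self, result, csv_fields, csv_obj):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=csv_fields)
    writer.writeheader()
    rows = result.values(*csv_fields)  # stay lazy; evaluate piece by piece
    total = rows.count()
    for start in range(0, total, CHUNK_SIZE):
        # Re-read the status before each piece so a cancellation issued
        # while the file is being built is noticed.
        csv_obj.refresh_from_db(fields=['status'])
        if csv_obj.status == CSVRequestStatus.CANCELED:
            self.logger.info(f'CSV {csv_obj.key} was canceled mid-generation')
            return None
        for row in rows[start:start + CHUNK_SIZE]:
            writer.writerow(row)
    return buffer.getvalue()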

How to provide user constant notification about Celery's Task execution status?

I integrated my project with Celery in this way. Inside views.py, after receiving the request from the user:
def upload(request):
    if "POST" == request.method:
        # save the file
        task_parse.delay()
        # continue
and in tasks.py
from __future__ import absolute_import
from celery import shared_task
from uploadapp.main import aunit

@shared_task
def task_parse():
    aunit()
    return True
In short, the shared task runs a function aunit() from a third Python file, main.py, located in the uploadapp/ directory.
Let's assume that aunit() is a resource-heavy process which takes time (like file parsing). Since I integrated it with Celery, it now runs fully asynchronously, which is good for me. So the task starts -> Celery processes it -> it finishes, and Celery sets the status to finished. I can view that using Flower.
But what I want is to notify the user, through the Django UI, that their task is done processing as soon as Celery has finished processing on the back end and set the status to SUCCESS.
Now, I know this is possible if:
1.) I constantly request the STATUS and see whether it returns SUCCESS or not.
How do I do that via Celery? How can you query a Celery task's status from views.py and notify the user asynchronously with just Celery's Python module?
You need a real-time mechanism. I would suggest Firebase: update the Firebase real-time DB field for the user id with a boolean=True at the end of the Celery task, and implement a JavaScript function that listens for changes to that user_id object in the Firebase database and updates the UI.
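If you would rather stay with plain Celery and the polling idea from the question (option 1), a minimal sketch of a pollable status view follows; the view name and URL wiring are assumptions, not part of the question:

from celery.result import AsyncResult
from django.http import JsonResponse

def task_status(request, task_id):
    # The upload view would return task_parse.delay().id to the browser,
    # and the browser polls this endpoint until the state becomes "SUCCESS".
    # A Celery result backend must be configured for states to be stored.
    result = AsyncResult(task_id)
    return JsonResponse({'task_id': task_id, 'state': result.state})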

Check if in celery task

How can I check whether a function is being executed by Celery?
def notification():
    # in_celery() returns True if called from celery_test(),
    # False if called from not_celery_test()
    if in_celery():
        # Send mail directly without creating an additional celery subtask
        ...
    else:
        # Send mail with creation of a celery task
        ...

@celery.task()
def celery_test():
    notification()

def not_celery_test():
    notification()
Here is one way to do it, using celery.current_task. This is the code used by the task:
def notification():
    from celery import current_task
    if not current_task:
        print "directly called"
    elif current_task.request.id is None:
        print "called synchronously"
    else:
        print "dispatched"

@app.task
def notify():
    notification()
This is code you can run to exercise the above:
from core.tasks import notify, notification
print "DIRECT"
notification()
print "NOT DISPATCHED"
notify()
print "DISPATCHED"
notify.delay().get()
My task code in the first snippet was in a module named core.tasks, and I put the code from the last snippet into a custom Django management command. This tests 3 cases:
Calling notification directly.
Calling notification through a task executed synchronously. That is, this task is not dispatched through Celery to a worker. The code of the task executes in the same process that calls notify.
Calling notification through a task run by a worker. The code of the task executes in a different process from the process that started it.
The output was:
NOT DISPATCHED
called synchronously
DISPATCHED
DIRECT
directly called
There is no line from the print in the task on the output after DISPATCHED because that line ends up in the worker log:
[2015-12-17 07:23:57,527: WARNING/Worker-4] dispatched
Important note: I initially was using if current_task is None in the first test but it did not work. I checked and rechecked. Somehow Celery sets current_task to an object which looks like None (if you use repr on it, you get None) but is not None. Unsure what is going on there. Using if not current_task works.
Also, I've tested the code above in a Django application but I've not used it in production. There may be gotchas I don't know.

Asynchronous call in google appengine using task queues in python

I'm new to the Task Queue API in Google App Engine. I have created a new queue and added a task to it using the taskqueue.add() function. I have defined the URL of the task and written the logic for that URL. But the task is NOT HAPPENING ASYNCHRONOUSLY: the app waits for the task to complete and only then continues executing the statements after the taskqueue.add() call. How do I make the task asynchronous? Any help on this issue is appreciated.
The code looks like this:
class botinitiate(webapp.RequestHandler):
    def get(self):
        # some more statements here
        template_values = {'token': token,
                           'me': user.user_id()
                           }
        taskqueue.add(url='/autobot', params={'key': game_key}, queue_name='autobot')
        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, template_values))

class autobot(webapp.RequestHandler):
    def post(self):
        # task logic goes here
        pass

application = webapp.WSGIApplication([('/botinitiate', botinitiate), ('/autobot', autobot)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == "__main__":
    main()
Thanks
The recently developed dev_appserver2 provides concurrency between user requests and task queue requests, for a more accurate emulation of production.
Task queues on App Engine are asynchronous; there's no way for the request that enqueued the task to know when the task is run (short of making RPC calls or other deliberate communication). What you may be observing is the single-threaded nature of the dev_appserver development environment; this certainly won't be the case in production.
So you'd use:
add_async(task, transactional=False, rpc=None)
Source: https://developers.google.com/appengine/docs/python/taskqueue/queues
You'd need to read the docs at the URL above and adapt them to your own code; a rough sketch is below.
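A hedged sketch of how the handler from the question might use the asynchronous variant (the only change is the non-blocking enqueue; identifiers such as game_key and template_values come from the question's snippet, and get_result() is optional, needed only if you want confirmation that the enqueue succeeded):

from google.appengine.api import taskqueue

class botinitiate(webapp.RequestHandler):
    def get(self):
        # ... build token, game_key and template_values as before ...
        queue = taskqueue.Queue('autobot')
        task = taskqueue.Task(url='/autobot', params={'key': game_key})
        rpc = queue.add_async(task)   # returns immediately with a UserRPC

        path = os.path.join(os.path.dirname(__file__), 'index.html')
        self.response.out.write(template.render(path, template_values))

        rpc.get_result()              # optional: block only to confirm the enqueue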

GAE Task Queue: Task not running/outputting

I've started using the task queue to schedule a time-intensive task to run in the background. The task I want to run is in the URL '/test', and the URL I am using to schedule the task is '/bgtest'. Here is the handler for '/bgtest':
class RunTestAsBackgroundProcess(BaseHandler):
    def get_secure(self):
        taskqueue.add(url='/test', method='GET')
        logging.debug("Task added to queue")
        return
The '/test' task outputs data to the logs, and when I visit /test normally it executes, finishes, and I can find the results in the logs. However, when I run /bgtest I see nothing in the logs except the "Task added to queue" message from the function above. Strangely, the Task Queue page in the admin console says that a task has run in the last minute but doesn't give me any details about it. Any ideas?
EDIT: Just to explain the code, BaseHandler is a superclass I use to check that the user is logged in to Facebook, and get_secure() is the method called after the superclass's get() method.
EDIT: /test runs this class:
class CalculateTestAllocations(BaseHandler):
    def get_secure(self):
        dbuser = db.GqlQuery("SELECT * FROM User WHERE fbid = :1", self.user['uid'])[0]
        if (dbuser.isadmin != True):
            self.redirect('/')

        # test data
        drivers = []
        passengers = []
        drivers.append(allocation.Driver("01", allocation.Location(51.440958, -2.576318), 3, 1000))    # coming from Bristol
        drivers.append(allocation.Driver("02", allocation.Location(55.935628, -3.285044), 3, 1000))    # coming from Edinburgh
        passengers.append(allocation.Passenger("03", allocation.Location(51.483193, -3.208187), 1000)) # coming from Cardiff
        passengers.append(allocation.Passenger("04", allocation.Location(52.469263, -1.860303), 1000)) # coming from Birmingham
        passengers.append(allocation.Passenger("05", allocation.Location(53.783703, -1.541841), 1000)) # coming from Leeds
        passengers.append(allocation.Passenger("06", allocation.Location(54.973994, -1.636391), 1000)) # coming from Newcastle

        logging.debug("Running allocation engine now (GET)")
        alloc = allocation.Allocation()
        alloc.buildProblem(drivers, passengers, allocation.Location(52.951923, -1.169967)) # destination at Nottingham
        alloc.solveAndOutput()
This populates a set of test data for my allocation algorithm (which takes in a set of drivers and passengers and calculates the optimum route for them) and then tells the algorithm to run. The stuff sent to the log is contained in the allocation.solveAndOutput() method, which does this:
def solveAndOutput(self):
    routes = self.solveProblem()
    logging.warn("Num routes: " + str(len(routes)))
    logging.warn("Length of first route: " + str(len(routes[0])))
    for route in routes:
        print self.getStaticMapAddress(route)
        logging.debug(self.getStaticMapAddress(route))
As I said, if I just run /test I get these outputs, but if I run /bgtest nothing happens, even though the task queue says it ran something in the past minute.
It looks like your /test handler is fetching what I can only presume is a session, and then redirecting based on it. That's obviously not going to work in a Task Queue task: there is no user, and hence no session. One way to work around that is sketched below.
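A hedged sketch of that workaround: give the task its own handler that skips the Facebook/session check and instead trusts App Engine's X-AppEngine-QueueName header, which is set only on genuine task queue requests. The handler name, URL, and run_test_allocations() helper are made up for illustration; the allocation logic would be factored out of the /test handler.

class RunTestTaskHandler(webapp.RequestHandler):
    # Hit only by the task queue; no Facebook session exists here.
    def post(self):
        # App Engine strips this header from external requests, so its
        # presence means the request really came from the task queue.
        if 'X-AppEngine-QueueName' not in self.request.headers:
            self.error(403)
            return
        run_test_allocations()  # hypothetical: the allocation code from /test

# and enqueue against that handler with POST:
taskqueue.add(url='/testtask', method='POST')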
