Check if in celery task - python

How to check that a function in executed by celery?
def notification():
# in_celery() returns True if called from celery_test(),
# False if called from not_celery_test()
if in_celery():
# Send mail directly without creation of additional celery subtask
...
else:
# Send mail with creation of celery task
...
#celery.task()
def celery_test():
notification()
def not_celery_test():
notification()

Here is one way to do it by using celery.current_task. Here is the code to be used by the task:
def notification():
from celery import current_task
if not current_task:
print "directly called"
elif current_task.request.id is None:
print "called synchronously"
else:
print "dispatched"
#app.task
def notify():
notification()
This is code you can run to exercise the above:
from core.tasks import notify, notification
print "DIRECT"
notification()
print "NOT DISPATCHED"
notify()
print "DISPATCHED"
notify.delay().get()
My task code in the first snippet was in a module named core.tasks. And I shoved the code in the last snippet in a custom Django management command. This tests 3 cases:
Calling notification directly.
Calling notification through a task executed synchronously. That is, this task is not dispatched through Celery to a worker. The code of the task executes in the same process that calls notify.
Calling notification through a task run by a worker. The code of the task executes in a different process from the process that started it.
The output was:
NOT DISPATCHED
called synchronously
DISPATCHED
DIRECT
directly called
There is no line from the print in the task on the output after DISPATCHED because that line ends up in the worker log:
[2015-12-17 07:23:57,527: WARNING/Worker-4] dispatched
Important note: I initially was using if current_task is None in the first test but it did not work. I checked and rechecked. Somehow Celery sets current_task to an object which looks like None (if you use repr on it, you get None) but is not None. Unsure what is going on there. Using if not current_task works.
Also, I've tested the code above in a Django application but I've not used it in production. There may be gotchas I don't know.

Related

Django steps or process messages via REST

For learning purpose I want to implement the next thing:
I have a script that runs selenium for example in the background and I have some log messages that help me to see what is going on in the terminal.
But I want to get the same messages in my REST request to the Angular app.
print('Started')
print('Logged in')
...
print('Processing')
...
print('Success')
In my view.py file
class RunTask(viewsets.ViewSet):
queryset = Task.objects.all()
#action(detail=False, methods=['GET'], name='Run Test Script')
def run(self, request, *args, **kwargs):
task = task()
if valid['success']:
return Response(data=task)
else:
return Response(data=task['message'])
def task()
print('Staring')
print('Logged in')
...
print('Processing')
...
print('Success')
return {
'success': True/False,
'message': 'my status message'
}
Now it shows me only the result of the task. But I want to get the same messages to indicate process status in frontend.
And I can't understand how to organize it.
Or how I can tell angular about my process status?
Unfortunately, it's not that simple. Indeed, the REST API lets you start the task, but since it runs in the same thread, the HTTP request will block until the task is finished before sending the response. Your print statements won't appear in the HTTP response but on your server output (if you look at the shell where you ran python manage.py runserver, you'll see those print statements).
Now, if you wish to have those output in real-time, you'll have to look for WebSockets. They allow you to open a "tunnel" between the browser and the server, and send/receive messages in real-time. The django-channels library allow you to implement them.
However, for long-running background tasks (like a Selenium scraper), I would advise to look into the Celery task queue. Basically, your Django process will schedule task into the queue. The tasks into the queue will then be executed by one (or more !) "worker" processes. The advantage of this is that your Django process won't be blocked by the long task: it justs add some work into the queue and then respond.
When you add tasks in the queue, Celery will give you a unique identifier for this task, that you can return in the HTTP response. You can then very well implement another endpoint which takes a task id in parameter and return the state of the task (is it pending ? done ? failed ?).
For this to work, you'll have to setup a "broker", a kind of database that will store the tasks to do and their results (typically RabbitMQ or Redis). Celery documentation explains this well: https://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
Either way you choose, it's not a trivial thing and will need quite some work before having some results ; but it's interesting to see how it expands the possibilities of a classical HTTP server.

How to use result callback for celery AsyncTask

I have the following situatuion.
My client sends following task to the worker:
# client
task = my_task.apply_async((some_params), queue='my_queue')
# task.get() # This blocks
My worker executes the task properly and returns the result.
So retrieving the result with task.get() works but it blocks. Now what I'd like to have is a callback that is called when a result (success or failure) is available.
There is a on_success function of the Task class. But that is used in the worker. Similiar Question
Any ideas or solutions?
You can have callbacks with a task, but it's not the caller or client that can be notified or called back (because celery is out of process), it will be another celery task that has to be used as the callback. If you would like to use a callback, you can use celery's link or canvassing functionality.
For a simple callback that uses the result of the initiating task you can do the following:
#app.task
def add(m, n):
return m + n
#app.task
def callback(result):
print(f'My result was {result}')
def client_caller():
add.apply_async(args=(2, 2), link=callback.s())

Celery, RabbitMQ, Redis: Celery message enters exchange, but not queue?

I'm using Python 2.7 (sigh), celery==3.1.19, librabbitmq==1.6.1, rabbitmq-server-3.5.6-1.noarch, and redis 2.8.24 (from redis-cli info).
I'm attempting to send a message from a celery producer to a celery consumer, and obtain the result back in the producer. There is 1 producer and 1 consumer, but 2 rabbitmq's (as brokers) and 1 redis (for results) in between.
The problem I'm facing is:
In the consumer, I get back get an AsyncResult via async_result =
ZipUp.delay(unique_directory), but async_result.ready() never
returns True (at least for 9 seconds it doesn't) - even for a
consumer task that does essentially nothing but return a string.
I can see, in the rabbitmq management web interface, my message
being received by the rabbitmq exchange, but it doesn't show up in
the corresponding rabbitmq queue. Also, a log message sent by the
very beginning of the ZipUp task doesn't appear to be getting
logged.
Things work if I don't try to get a result back from the AsyncResult! But I'm kinda hoping to get the result of the call - it's useful :).
Below are configuration specifics.
We're setting up Celery as follows for returns:
CELERY_RESULT_BACKEND = 'redis://%s' % _SHARED_WRITE_CACHE_HOST_INTERNAL
CELERY_RESULT = Celery('TEST', broker=CELERY_BROKER)
CELERY_RESULT.conf.update(
BROKER_HEARTBEAT=60,
CELERY_RESULT_BACKEND=CELERY_RESULT_BACKEND,
CELERY_TASK_RESULT_EXPIRES=100,
CELERY_IGNORE_RESULT=False,
CELERY_RESULT_PERSISTENT=False,
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
)
We have another Celery configuration that doesn't expect a return value, and that works - in the same program. It looks like:
CELERY = Celery('TEST', broker=CELERY_BROKER)
CELERY.conf.update(
BROKER_HEARTBEAT=60,
CELERY_RESULT_BACKEND=CELERY_BROKER,
CELERY_TASK_RESULT_EXPIRES=100,
CELERY_STORE_ERRORS_EVEN_IF_IGNORED=False,
CELERY_IGNORE_RESULT=True,
CELERY_ACCEPT_CONTENT=['json'],
CELERY_TASK_SERIALIZER='json',
CELERY_RESULT_SERIALIZER='json',
)
The celery producer's stub looks like:
#CELERY_RESULT.task(name='ZipUp', exchange='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION)
def ZipUp(directory): # pylint: disable=invalid-name
""" Task stub """
_unused_directory = directory
raise NotImplementedError
It's been mentioned that using queue= instead of exchange= in this stub would be simpler. Can anyone confirm that (I googled but found exactly nothing on the topic)? Apparently you can just use queue= unless you want to use fanout or something fancy like that, since not all celery backends have the concept of an exchange.
Anyway, the celery consumer starts out with:
#task(queue='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION, name='ZipUp')
#StatsInstrument('workflow.ZipUp')
def ZipUp(directory): # pylint: disable=invalid-name
'''
Zip all files in directory, password protected, and return the pathname of the new zip archive.
:param directory Directory to zip
'''
try:
LOGGER.info('zipping up {}'.format(directory))
But "zipping up" doesn't get logged anywhere. I searched every (disk-backed) file on the celery server for that string, and got two hits: /usr/bin/zip, and my celery task's code - and no log messages.
Any suggestions?
Thanks for reading!
It appears that using the following task stub in the producer solved the problem:
#CELERY_RESULT.task(name='ZipUp', queue='cognition.workflow.ZipUp_%s' % INTERNAL_VERSION)
def ZipUp(directory): # pylint: disable=invalid-name
""" Task stub """
_unused_directory = directory
raise NotImplementedError
In short, it's using queue= instead of exchange= .

python django race condition with celery

Working on a python django project, here is what I want:
User access Page1 with object argument, function longFunction() of the object is triggered and passed to celery so the page can be returned immediately
If user tries to access Page2 with same object argument, I want the page to hang until object function longFunction() triggered by Page1 is terminated.
So I tried by locking mysql db row with objects.select_for_update() but it doesn't work.
Here is a simplified version of my code:
def Page1(request, arg_id):
obj = Vm.objects.select_for_update().get(id=arg_id)
obj.longFunction.delay()
return render_to_response(...)
def Page2(request, arg_id):
vm = Vm.objects.select_for_update().get(id=arg_id)
return render_to_response(...)
I want that Page2 hangs at the line vm = Vm.objects.select_for_update().get(id=arg_id) until longFunction() is completed. I'm new to celery and it looks like the mysql connection initiated on Page1 is lost when the Page1 returns, even if longFunction() is not finished.
Is there another way I can achieve that?
Thanks
Maybe this can be helpul for you:
from celery.result import AsyncResult
from yourapp.celery import app
def Page1(request, arg_id):
obj = Vm.objects.select_for_update().get(id=arg_id)
celery_task_id = obj.longFunction.delay()
return render_to_response(...)
def Page2(request, arg_id, celery_task_id):
task = AsyncResult(app=app, id=celery_task_id)
state = task.state
while state != "SUCCESFUL":
# wait or do whatever you want
vm = Vm.objects.select_for_update().get(id=arg_id)
return render_to_response(...)
More info at http://docs.celeryproject.org/en/latest/reference/celery.states.html
The database lock from select_for_update is released when the transaction closes (in page 1). This lock doesn't get carried to the celery task. You can lock in the celery task but that won't solve your problem because page 2 might get loaded before the celery task obtains the lock.
Mikel's answer will work. You could also put a lock in the cache as described in the celery cookbook.

Celery dynamic queue creation and routing

I'm trying to call a task and create a queue for that task if it doesn't exist then immediately insert to that queue the called task. I have the following code:
#task
def greet(name):
return "Hello %s!" % name
def run():
result = greet.delay(args=['marc'], queue='greet.1',
routing_key='greet.1')
print result.ready()
then I have a custom router:
class MyRouter(object):
def route_for_task(self, task, args=None, kwargs=None):
if task == 'tasks.greet':
return {'queue': kwargs['queue'],
'exchange': 'greet',
'exchange_type': 'direct',
'routing_key': kwargs['routing_key']}
return None
this creates an exchange called greet.1 and a queue called greet.1 but the queue is empty. The exchange should be just called greet which knows how to route a routing key like greet.1 to the queue called greet.1.
Any ideas?
When you do the following:
task.apply_async(queue='foo', routing_key='foobar')
Then Celery will take default values from the 'foo' queue in CELERY_QUEUES,
or if it does not exist then automatically create it using (queue=foo, exchange=foo, routing_key=foo)
So if 'foo' does not exist in CELERY_QUEUES you will end up with:
queues['foo'] = Queue('foo', exchange=Exchange('foo'), routing_key='foo')
The producer will then declare that queue, but since you override the routing_key,
actually send the message using routing_key = 'foobar'
This may seem strange but the behavior is actually useful for topic exchanges,
where you publish to different topics.
It's harder to do what you want though, you can create the queue yourself
and declare it, but that won't work well with automatic message publish retries.
It would be better if the queue argument to apply_async could support
a custom kombu.Queue instead that will be both declared and used as the destination.
Maybe you could open an issue for that at http://github.com/celery/celery/issues

Categories