Celery: @shared_task and non-standard BROKER_URL - python

I have a Celery 3.1.19 setup which uses a BROKER_URL including a virtual host.
# in settings.py
BROKER_URL = 'amqp://guest:guest@localhost:5672/yard'
Celery starts normally, loads the tasks, and the tasks I define with the @app.task decorator work fine. I assume that my RabbitMQ and Celery configuration on this end is correct.
Tasks I define with @shared_task and load via app.autodiscover_tasks also load correctly at startup. However, when I call such a task, the message ends up in the (still existing) default virtual host at amqp://guest:guest@localhost:5672/.
Question: What am I missing here? Where do shared tasks get their actual configuration from?
And here are some more details:
# celery_app.py
from celery import Celery

celery_app = Celery('celery_app')
celery_app.config_from_object('settings')
celery_app.autodiscover_tasks(['connectors'])

@celery_app.task
def i_do_work():
    print 'this works'
And in connectors/tasks.py (with an __init__.py in the same folder):
# in connectors/tasks.py
from celery import shared_task

@shared_task
def I_do_not_work():
    print 'bummer'
Again, the shared task does get picked up by the Celery instance; it just somehow lacks the context to send messages to the right BROKER_URL.
By the way, why are shared tasks so poorly documented? Do they rely on some Django context? I am not using Django.
Or do I need additional parameters in my settings?
Thanks a lot.

The celery_app was not yet imported at application start. Within my project, I added the following code to the __init__.py at the same module level as my celery_app definition.
from __future__ import absolute_import

try:
    from .celery_app import celery_app
except ImportError:
    # just in case someone develops the application without
    # celery running
    pass
I was confused by the fact that Celery seems to come with a perfectly working default app. In this case, a more interface-like structure with a NotImplementedError might have been more helpful. Nevertheless, Celery is awesome.

Related

running code when celery starts in worker mode

I am looking to run some code when a Celery worker starts, not when, say, a task is imported to be used from a client-type application.
celery_app = Celery(__name__)
# I want to only create the engine if this file is used by a worker
engine = create_engine(str(POSTGRES_URL))
You are looking for worker signals (https://docs.celeryproject.org/en/latest/userguide/signals.html?highlight=worker_ready#worker-signals). It is all nicely explained there. I am guessing worker_ready is the one you should look at first.
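For illustration, here is a minimal sketch of that idea applied to the snippet above. It assumes SQLAlchemy's create_engine and a POSTGRES_URL placeholder of your own; worker_process_init is the alternative signal to consider for per-child-process resources under the prefork pool:
# tasks.py -- sketch only, not a drop-in solution
from celery import Celery
from celery.signals import worker_ready
from sqlalchemy import create_engine  # assumption: the engine comes from SQLAlchemy

POSTGRES_URL = 'postgresql://localhost/app'  # placeholder; use your real setting
celery_app = Celery(__name__)
engine = None  # stays None in client processes that merely import this module

@worker_ready.connect
def create_db_engine(sender=None, **kwargs):
    # Runs only once a worker has started, never when a client just imports the tasks.
    global engine
    engine = create_engine(str(POSTGRES_URL))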

Celery Task Priority

I want to manage tasks using Celery. I want to have a single task queue (with concurrency 1) and be able to push tasks onto the queue with different priorities such that higher priority tasks will preempt the others.
I am adding three tasks to a queue like so:
add_tasks.py
from tasks import example_task
example_task.apply_async((1,), priority=1)
example_task.apply_async((2,), priority=3)
example_task.apply_async((3,), priority=2)
I have the following configuration:
tasks.py
from __future__ import absolute_import, unicode_literals
from celery import Celery
from kombu import Queue, Exchange
import time

app = Celery('tasks', backend='rpc://', broker='pyamqp://')
app.conf.task_queues = [Queue('celery', Exchange('celery'), routing_key='celery', queue_arguments={'x-max-priority': 10})]

@app.task
def example_task(task_num):
    time.sleep(3)
    print('Started {}'.format(task_num))
    return True
I expect the second task I added to run before the third because it has a higher priority, but it doesn't. They run in the order in which they were added.
I am following the docs and thought I had configured the app correctly.
Am I doing something wrong or am I misunderstanding the priority feature?
It is possible that the queue never gets a chance to prioritize the messages, because the worker prefetches them before any sorting can happen. Try these two settings (adapt to your project as needed):
CELERY_ACKS_LATE = True
CELERYD_PREFETCH_MULTIPLIER = 1
Prefetch multiplier is 4 by default.
I developed a sample application to demonstrate Celery's priority tasking on a very small scale. Please have a look at it here. While developing it, I encountered a very similar problem, and this change in settings actually solved it.
Note that you also require RabbitMQ version 3.5.0 or higher.
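As a sketch, assuming the tasks.py from the question, the same two settings can also be set directly on the app object (Celery 4+ spells them in lowercase; the uppercase names above belong in an old-style config module):
# additions to the question's tasks.py -- sketch
app.conf.task_acks_late = True            # CELERY_ACKS_LATE: acknowledge only after the task finishes
app.conf.worker_prefetch_multiplier = 1   # CELERYD_PREFETCH_MULTIPLIER: fetch one message at a time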

Make Celery use Django's test database without task_always_eager

When running tests in Django applications that make use of Celery tasks I can't fully test tasks that need to get data from the database since they don't connect to the test database that Django creates.
Setting task_always_eager in Celery to True partially solves this problem but as the documentation for testing says, this doesn't fully reflect how the code will run on a real Celery worker and isn't suitable for testing.
How can I make Celery tasks use the Django test database when running Django tests without setting task_always_eager = True?
Short answer: you must run a celery worker as in production.
Easy:
Use a dedicated test db (as in production)
Configure celery to use it
Start a celery worker manually before you run the tests
Advanced:
Use the auto-created test db (it may be sqlite)
Run a celery worker in your test setUp()
Configure celery to use the auto-created test db (copy django.conf.settings.DATABASES from the test process to celery)
And in either case you must provide a message broker for celery.
I have a test that requires a dedicated celery worker to check code that passes messages between a celery task and the calling code:
https://gist.github.com/Sovetnikov/a7ad982fc77e8dfbc528bfc20fcf3b1e
This Python module is two-in-one: a unit test and a celery worker runner with self-contained configuration.
My code does not use any db, but you can easily adapt it to your needs. Just pass django.conf.settings.DATABASES (as JSON, pickle, or whatever method you like) to the celery starter code and configure the Django DATABASES setting to point to the test db.
Additional info:
There is a complete solution for this case at https://github.com/RentMethod/celerytest (I tried some old version of it and had no luck, because it uses threads and, with the Python GIL... I think it is over-complicated).
Sample code showing how to configure the DATABASES settings and initialize Django itself in a single module: https://gist.github.com/Sovetnikov/369a8d05ba2b6482fa20769bc498f122
A simple solution is to use celery.contrib.testing.worker.start_worker to spawn a Celery worker within the Django test process. Because it lives in the same process, it can access the default in-memory test database, but because it lives in its own thread, it isn't eager and the task_always_eager flag is not needed or recommended.
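A minimal sketch of that approach, assuming a project-level Celery app importable as myproject.celery.app and a task my_task (both placeholders; adjust to your project):
# test_tasks.py -- sketch only
from celery.contrib.testing.worker import start_worker
from django.test import TransactionTestCase

from myproject.celery import app      # placeholder import path
from myproject.tasks import my_task   # placeholder task

class TaskTests(TransactionTestCase):
    @classmethod
    def setUpClass(cls):
        super().setUpClass()
        # Runs a real (non-eager) worker in a background thread of the test process.
        cls.worker = start_worker(app, perform_ping_check=False)
        cls.worker.__enter__()

    @classmethod
    def tearDownClass(cls):
        cls.worker.__exit__(None, None, None)
        super().tearDownClass()

    def test_my_task(self):
        result = my_task.delay(42)
        self.assertTrue(result.get(timeout=10))
TransactionTestCase (rather than TestCase) matters here: the worker thread uses its own database connection, so it can only see data that has actually been committed.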

Asynchronous Flask functions

I have an app in Flask in which I need to perform some asynchronous action. I've read about Celery, but I'm not sure it's the right fit.
Basically, I have a button that takes input and runs a query whose results are returned to the template; this is quick. But I also want it to run another task (sending a SOAP envelope to a web service), and this is slow. I don't want the user to have to wait for the web-service call to finish. I'd like the query that returns new data to the template to happen as quickly as possible, and the web-service call to happen in the background.
Is this doable?
I know there are lots of Celery-related threads here, but this might be of some help.
Using Celery for asynchronous activity requires more than just installing and importing the lib.
Requirements:
Celery lib
A queue broker, like Redis (an in-memory db), installed
Separate file that creates celery object
I found the Flask documentation on using Celery with Flask lacking. My preferred method was to create a tasks.py file and put in:
from celery import Celery
# Other imports for functionality here

app = Celery('tasks', broker='redis://localhost:6379')

@app.task
def your_function(args):
    # do something with args
    # return something
    ...
Then in the application file, make sure this is imported:
from tasks import your_function
And then call it asynchronously where you need to in the app:
your_function.delay(args)
Then you must make sure a celery daemon/worker is running. This can be done via init, systemd, launchctl, or manually at the CLI (not ideal). Redis must also be running and listening on the URL you give it.
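For completeness, a minimal sketch of how the Flask side might hand off the slow work (run_quick_query, the /search route, and results.html are hypothetical placeholders, not part of the answer above):
# app.py -- sketch only
from flask import Flask, render_template, request

from tasks import your_function

flask_app = Flask(__name__)

@flask_app.route('/search', methods=['POST'])
def search():
    results = run_quick_query(request.form['q'])  # the fast query, rendered right away
    your_function.delay(request.form['q'])        # the slow SOAP call runs in the Celery worker
    return render_template('results.html', results=results)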
I hope this helps someone else.
Sounds like you need Tornado! It is an asynchronous web server gateway compatible with Flask:
from tornado.wsgi import WSGIContainer
from tornado.httpserver import HTTPServer
from tornado.ioloop import IOLoop
from YourModule import app
http_server = HTTPServer(WSGIContainer(app))
http_server.listen(8080)
IOLoop.instance().start()
I prefer Tornado for its speed, reliability, and simplicity with Flask, which I love for its beauty.

Module importing multiple times with django + celery

I have a module which is expensive to import (it involves downloading a ~20MB index file), which is used by a celery worker. Unfortunately I can't figure out how to have the module imported only once, and only by the celery worker.
Version 1 tasks.py file:
import expensive_module

from celery import shared_task

@shared_task
def f():
    expensive_module.do_stuff()
When I organize the file this way the expensive module is imported both by the web server and the celery instance, which is what I'd expect, since the tasks module is imported in both and they're different processes.
Version 2 tasks.py file:
from celery import shared_task

@shared_task
def f():
    import expensive_module
    expensive_module.do_stuff()
In this version the web server never imports the module (which is good), but the module gets re-imported by the celery worker every time f.delay() is called. This is what really confuses me. In this scenario, why is the module re-imported every time this function is run by the celery worker? How can I re-organize this code to have only the celery worker import the expensive module, and have the module imported only once?
As a follow-on, less important question: in Version 1 of the tasks.py file, why does the web instance import the expensive module twice? Both times it's imported from urls.py when Django runs self._urlconf_module = import_module(self.urlconf_name).
Make a duplicate tasks.py file for the web server which has empty tasks and no unneeded imports.
For celery, use Version 1, where you import only once instead of every time you call the task.
Been there, and it works.
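For illustration, a minimal sketch of what such a duplicated tasks.py for the web server could look like (the names mirror Version 1 from the question; the worker keeps the real file):
# tasks.py as seen by the web server only -- sketch, not the answerer's exact file
from celery import shared_task

@shared_task
def f():
    # The web server only needs this signature so that f.delay() can be called;
    # the worker's copy of tasks.py keeps the real expensive_module.do_stuff().
    pass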
