Changing CELERYBEAT_SCHEDULER - python

I have two Django apps using the same database. Both use Celery and have their own CELERYBEAT_SCHEDULE. Is there a way to control which tasks run on which Celery worker? Right now tasks from both apps get scheduled on the same worker.
I suspect the reason is that CELERYBEAT_SCHEDULER is set to 'djcelery.schedulers.DatabaseScheduler'. I couldn't find any approach/scheduler that doesn't use the database. As a broker I'm using Redis (a local/separate instance for each app), so tasks from outside of CELERYBEAT_SCHEDULE work properly.

You can define two queues in the CELERY_QUEUES setting and assign one queue to each worker/beat using the --queues=queue1 parameter.
You can assign a task to a specific queue in its schedule entry using options:
'options': {'queue': 'queue1'},
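As a rough sketch, assuming the old djcelery-style setting names from the question and illustrative queue/task names, each app's settings could look something like this:
# settings.py of app 1 (queue, exchange and task names are illustrative)
from datetime import timedelta

from kombu import Exchange, Queue

CELERY_QUEUES = (
    Queue('queue1', Exchange('queue1'), routing_key='queue1'),
)

CELERYBEAT_SCHEDULE = {
    'app1-periodic-task': {
        'task': 'app1.tasks.do_work',
        'schedule': timedelta(minutes=5),
        'options': {'queue': 'queue1'},  # beat publishes this entry to queue1 only
    },
}
App 2 would do the same with queue2, and each app's worker is started against its own queue, e.g. celery -A app1 worker --queues=queue1, so it never picks up the other app's scheduled tasks.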


Celery send_task() method

I have my API, and some endpoints need to forward requests to Celery. The idea is to have a dedicated API service that only instantiates a Celery client and uses the send_task() method, and a separate service (the workers) that consumes the tasks. The code for the task definitions should live in that worker service. Basically, the Celery app (API) and the Celery worker are split into two separate services.
I don't want my API to know about any Celery task definitions; the endpoints only need to call celery_client.send_task('some_task', (some_arguments)). So on one service I have my API, and on another service/host I have the Celery code base where my Celery worker will execute the tasks.
I came across this great article that describes what I want to do.
https://medium.com/@tanchinhiong/separating-celery-application-and-worker-in-docker-containers-f70fedb1ba6d
and this post Celery - How to send task from remote machine?
I need help on how to create routes for tasks from the API. I was expecting celery_client.send_task() to have a queue= keyword, but it does not. I need to have two queues, and two workers that will consume content from these two queues.
Commands for my workers:
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_1
celery -A <path_to_my_celery_file>.celery_client worker --loglevel=info -Q queue_2
I have also read Celery's "Routing Tasks" documentation, but it is still unclear to me how to establish this communication.
Your API side should hold the router. I guess that's not an issue, because it is only a map of task -> queue (i.e. send task1 to queue1).
In other words, your celery_client should have task_routes like:
task_routes = {
    'mytasks.some_task': 'queue_1',
    'mytasks.some_other_task': 'queue_2',
}
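As a minimal sketch of the client side (the broker URL, module and task names are assumptions, not from the question or the article), the API service could look like:
# celery_client.py on the API service -- no task code is imported here
from celery import Celery

celery_client = Celery('api_client', broker='redis://localhost:6379/0')

celery_client.conf.task_routes = {
    'mytasks.some_task': 'queue_1',
    'mytasks.some_other_task': 'queue_2',
}

# The API only knows the task's registered name:
result = celery_client.send_task('mytasks.some_task', args=(1, 2))
print(result.id)  # calling result.get() would additionally require a result backend
With these routes in place, send_task('mytasks.some_task', ...) ends up on queue_1 and is consumed by the worker started with -Q queue_1; the worker side must register a task under exactly that name.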

Use existing celery workers for Airflow's Celeryexecutor workers

I am trying to introduce dynamic workflows into my landscape, involving multiple steps of model inference where the output of one model gets fed into another model. Currently we have a few Celery workers spread across hosts to manage the inference chain. As the complexity increases, we are attempting to build workflows on the fly. For that purpose, I got a dynamic DAG setup with the CeleryExecutor working.
Now, is there a way I can retain the current Celery setup and route Airflow-driven tasks to the same workers? I do understand that the setup on these workers should have access to the DAG folders and the same environment as the Airflow server. I want to know how the Celery worker needs to be started on these servers so that Airflow can route the same tasks that used to be run by the manual workflow from a Python application. If I start the workers using the command "airflow celery worker", I cannot access my application tasks. If I start Celery the way it is currently, i.e. "celery -A proj", Airflow has nothing to do with it. Looking for ideas to make it work.
Thanks @DejanLekic. I got it working (though the DAG task scheduling latency was so high that I dropped the approach). If someone is looking to see how this was accomplished, here are a few things I did to get it working.
Change the airflow.cfg executor, queue and result back-end settings (obvious).
If you have to use Celery workers spawned outside the Airflow umbrella, change the celery_app_name setting to celery.execute instead of airflow.executors.celery_execute and change the executor to "LocalExecutor". I have not tested this, but it may even be possible to avoid switching to the Celery executor by registering Airflow's task in the project's Celery app.
Each task will now call send_task(); the AsyncResult object returned is then stored either in XCom (implicitly or explicitly) or in Redis (implicitly pushed to the queue), and the child task then gathers the AsyncResult (an implicit call to get the value from XCom or Redis) and calls .get() to obtain the result from the previous step (see the sketch below).
Note: It is not necessary to split send_task() and .get() between two tasks of the DAG. By splitting them between parent and child, I was trying to take advantage of the lag between tasks. But in my case, the Celery execution of tasks completed faster than Airflow's inherent latency in scheduling dependent tasks.
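For illustration, here is a rough sketch of the send_task()/XCom hand-off described above, assuming Airflow's PythonOperator, a Redis broker/backend, and an invented task name (this is not the exact code used):
# dag_helpers.py -- illustrative sketch only
from celery import Celery
from celery.result import AsyncResult

# Point at the same broker/result backend the existing Celery workers use
# (URLs are placeholders).
celery_app = Celery('proj',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/1')

def submit_inference(**context):
    # Fire the task on the existing workers; the returned id goes to XCom.
    res = celery_app.send_task('proj.tasks.model_a_inference',
                               args=['s3://bucket/input'])
    return res.id  # PythonOperator pushes the return value to XCom

def collect_inference(**context):
    # Pull the id pushed by the upstream task and block for the result.
    task_id = context['ti'].xcom_pull(task_ids='submit_inference')
    return AsyncResult(task_id, app=celery_app).get(timeout=600)

# In the DAG these would be wired into two PythonOperators, e.g.:
#   submit = PythonOperator(task_id='submit_inference', python_callable=submit_inference)
#   collect = PythonOperator(task_id='collect_inference', python_callable=collect_inference)
#   submit >> collect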

How to lock variable in celery task?

I have a Celery task that mangles some variable. It works perfectly if I run a single Celery worker, but when I use concurrency it all gets messed up. How can I lock the critical section where the variable is mangled?
inb4: using Python 3.6, Redis as both broker and result backend. threading.Lock doesn't help here.
As long as Celery runs with multiple workers (processes), a thread lock will not help, because it only works inside a single process. Moreover, a threading lock is useful when you control the overall process, and with Celery there is no way to achieve that.
This means Celery requires a distributed lock. For Django I always use the Django cache, as in: here. If you need more generic locks, especially Redis-based ones that work for any Python app, you can use sherlock.
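A minimal sketch of such a cache-based lock, assuming Django's cache framework is configured (backed by Redis or Memcached) and using illustrative names:
from contextlib import contextmanager

from celery import shared_task
from django.core.cache import cache

LOCK_EXPIRE = 60  # seconds; should comfortably exceed the task's runtime

@contextmanager
def cache_lock(lock_id):
    # cache.add is atomic: it sets the key only if it does not exist yet,
    # so exactly one worker process can hold the lock at a time.
    acquired = cache.add(lock_id, 'locked', LOCK_EXPIRE)
    try:
        yield acquired
    finally:
        if acquired:
            cache.delete(lock_id)

@shared_task
def mangle_variable():
    with cache_lock('mangle-variable-lock') as acquired:
        if not acquired:
            return  # another worker is in the critical section; skip or retry later
        # ... critical section that mangles the shared state ...
Because the lock lives in a shared cache rather than in-process memory, it holds across the worker's subprocesses and across machines.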
I know this question is 2+ years old, but right now I'm tuning my Celery configs and I came across this topic.
I am using Python 2.7 with Django 1.11 and Celery 4 on a Linux machine, with RabbitMQ as the broker.
My setup runs Celery as a daemon, with Celery beat handling scheduled tasks.
So, by dedicating a queue to a given task, you can serve that queue with a worker (process) running with concurrency=1 (a single subprocess).
This solution solves concurrency problems for the task when it runs through Celery, but if your code also runs the task without Celery, it won't respect those concurrency constraints.
Example code:
from datetime import timedelta

from kombu import Exchange, Queue

default_exchange = Exchange('celery_periodic', type='direct')
celery_task_1_exchange = Exchange('celery_task_1', type='direct')

CELERY_TASK_QUEUES = (
    Queue('celery_periodic', default_exchange, routing_key='celery_periodic'),
    Queue('celery_task_1', celery_task_1_exchange, routing_key='celery_task_1'),
)

CELERY_BEAT_SCHEDULE = {
    'celery-task-1': {
        'task': 'tasks.celery_task_1',
        'schedule': timedelta(minutes=15),
        'options': {'queue': 'celery_task_1'},
    },
}
and finally, in /etc/default/celeryd (docs here: https://docs.celeryproject.org/en/latest/userguide/daemonizing.html#example-configuration):
CELERYD_NODES="worker1 worker2"
CELERYD_OPTS="--concurrency=1 --time-limit=600 -Q:worker1 celery_periodic -Q:worker2 celery_task_1"
--concurrency=N means you will have exactly N worker subprocesses for your worker instance (meaning the worker instance can handle N concurrent tasks) (from here: https://stackoverflow.com/a/44903753/9412892).
Read more here: https://docs.celeryproject.org/en/stable/userguide/workers.html#concurrency
BR,
Eduardo

Setup celery periodic task

How do I set up a periodic task with Celerybeat and Flask that queries a database every hour?
The environment looks like this:
/
|-app
|-__init__.py
|-jobs
|-task.py
|-celery-beat.sh
|-celery-worker.sh
|-manage.py
I currently have a query function called run_query() located in task.py
I want the scheduler to kick in once the application initiates, so I have the following lines in my /app/__init__.py file:
celery = Celery()

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    sender.add_periodic_task(1, app.jobs.task.run_query())
(For simplicity's sake, I've set it up so that if it runs, it will run every minute. No such luck yet.)
When I launch celery-worker.sh it recognizes my function under the [tasks] heading, but the scheduled function never runs. I can manually force the function to run by issuing the following at the command prompt:
>>> from app.jobs import task
>>> task.run_query.delay()
EDIT: Added celerybeat.sh
As a follow-up: if the database is accessed through a Flask context, is it wise to create a new Flask context to access the database during my async function call? Should I use the existing Flask context? Or should I forget contexts altogether and just open a connection to the database? My worry is that if I just open a new connection it may interfere with the existing context's connection.
To run periodic tasks you need some kind of scheduler (e.g. celery beat).
celery beat is a scheduler; it kicks off tasks at regular intervals, which are then executed by available worker nodes in the cluster.
You have to ensure only a single scheduler is running for a schedule at a time, otherwise you'd end up with duplicate tasks. Using a centralized approach means the schedule doesn't have to be synchronized, and the service can operate without using locks.
Reference: periodic-tasks
You can invoke scheduler with command,
$ celery -A proj beat #different process from your worker
You can also embed beat inside the worker by enabling the worker's -B option. This is convenient if you'll never run more than one worker node, but it's not commonly used and for that reason isn't recommended for production use. Starting the scheduler:
$ celery -A proj worker -B
Reference: celery starting scheduler
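For completeness, a small sketch of the question's setup with beat in mind (the module paths and broker URL are assumptions); note that add_periodic_task() expects a task signature (run_query.s()), not the result of calling the function:
# app/__init__.py (sketch)
from celery import Celery

celery = Celery('app', broker='redis://localhost:6379/0')

@celery.on_after_configure.connect
def setup_periodic_tasks(sender, **kwargs):
    from app.jobs.task import run_query
    # every hour; use a smaller interval such as 60.0 while testing
    sender.add_periodic_task(3600.0, run_query.s(), name='run query every hour')
Then run both processes, for example celery -A app worker --loglevel=info in celery-worker.sh and celery -A app beat --loglevel=info in celery-beat.sh (or a single worker with -B while developing).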

Multiple Celery instances consuming a single queue

Is it possible to have multiple Celery instances, possibly on different machines, consuming from a single queue of tasks, working with Django, preferably using the Django ORM as the backend?
How can I implement this if it's possible? I can't seem to find any documentation for this.
Yes, it's possible; they just have to use the same broker. For instance, if you are using AMQP, the configs on your servers must share the same
BROKER_URL = 'amqp://user:password@localhost:5672//'
See the routing page for more details. For instance, let's say you want to have a common queue for the two servers, plus one queue specific to each of them; you could do the following.
On server 1:
CELERY_ROUTES = {'your_app.your_specific_tasks1': {'queue': 'server1'}}
user@server1:/$ celery -A your_celery_app worker -Q server1,default
On server 2:
CELERY_ROUTES = {'your_app.your_specific_tasks2': {'queue': 'server2'}}
user@server2:/$ celery -A your_celery_app worker -Q server2,default
Of course this is optional; by default all tasks are routed to the queue named celery.
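A small sketch of the shared piece both machines would deploy (the broker credentials and module names are placeholders):
# proj/celery.py -- shared code base deployed to both servers
from celery import Celery

app = Celery('your_celery_app',
             broker='amqp://user:password@broker-host:5672//')
app.config_from_object('django.conf:settings')  # picks up CELERY_ROUTES, etc.
app.autodiscover_tasks()                        # finds tasks.py in installed Django apps

@app.task
def common_task(x, y):
    # No CELERY_ROUTES entry, so this goes to the shared/default queue and
    # whichever server is listening on that queue will pick it up.
    return x + y
Note that the default queue is named celery unless you override it (e.g. CELERY_DEFAULT_QUEUE = 'default'), which is what the -Q server1,default commands above assume.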
