Celery multi workers unexpected task execution order - python

I run celery:
celery multi start --app=myapp fast_worker
slow_worker
-Q:fast_worker fast-queue
-Q:slow_worker slow-queue
-c:fast_worker 1 -c:slow_worker 1
--logfile=%n.log --pidfile=%n.pid
And celerybeat:
celery beat -A myapp
Task:
#task.periodic_task(run_every=timedelta(seconds=5), ignore_result=True)
def test_log_task_queue():
import time
time.sleep(10)
print "test_log_task_queue"
Routing:
CELERY_ROUTES = {
'myapp.tasks.test_log_task_queue': {
'queue': 'slow-queue',
'routing_key': 'slow-queue',
},
}
I use rabbitMQ. When I open rabbitMQ admin panel, I see that my tasks are in slow-queue, but when I open logs I see task output for both workers. Why do both workers execute my tasks, even when task not in worker queue?

It looks like celery multi creates something like shared queues. To fix this problem, I added -X option:
celery multi start --app=myapp fast_worker
slow_worker
-Q:fast_worker fast-queue
-Q:slow_worker slow-queue
-X:fast_worker slow-queue
-X:slow_worker fast-queue
-c:fast_worker 1 -c:slow_worker 1
--logfile=%n.log --pidfile=%n.pid

Related

How to send a task with schedule to celery while worker is running?

I've problem to creating and adding beat schedule task to the celery worker and running that when the celery worker is currently running
well this is my code
from celery import Celery
from celery.schedules import crontab
app = Celery('proj2')
app.config_from_object('proj2.celeryconfig')
#app.task
def hello() -> None:
print(f'Hello')
app.add_periodic_task(10.0, hello)
so when i am running the celery
$ celery -A proj2 worker --beat -E
this works very well and the task will run every 10 seconds
But
I need to create a task with schedule and sending that to the worker to run it automatically
Imagine i have run the worker celery and every things is ok
well i go to the python shell and do following code
>>> from proj2.celery import app
>>>
>>> app.conf.beat_schedule = {
... 'hello-every-5': {
... 'tasks': 'proj2.tasks.bye' # i have a task called bye in tasks module
... 'schedule': 5.0,
... }
...}
>>>
it's not working. Also there is no error
but it seems does not send to the worker
p.s: i have also used add_periodic_task method. still not working

Celery not processing tasks everytime

I am having below configuration for celery
celery = Celery(__name__,
broker=os.environ.get('CELERY_BROKER_URL', 'redis://'),
backend=os.environ.get('CELERY_BROKER_URL', 'redis://'))
celery.config_from_object(APP_SETTINGS)
ssl = celery.conf.get('REDIS_SSL', True)
r = redis.StrictRedis(REDIS_BROKER, int(REDIS_BROKER_PORT), 0,
charset='utf-8', decode_responses=True, ssl=ssl)
db_uri = celery.conf.get('SQLALCHEMY_DATABASE_URI')
#celery.task
def process_task(data):
#some code here
I am calling process task inside API endpoint like
process_task.delay(data)
sometimes it's processing tasks sometimes not.
can someone help me to resolve this issue?
I am running worker like celery worker -A api.celery --loglevel=DEBUG --concurrency=10
Once all the worker-processes are busy the new tasks will just sit on the queue waiting for the next idle worker-process to start the task. This is most likely why you perceive this as "not processing tasks everytime". Go through the monitoring and management section of the Celery documentation to find how to monitor your Celery cluster. For starters, do celery worker -A api.celery inspect active to check the currently running tasks.

Celery tasks on multiple machines

I have a server where I installed a RabbitMQ broker and two Celery consumers (main1.py and main2.py) both connected to the same broker.
In the first consumer (main1.py), I implemented a Celery Beat that sends multiple times a different task on a specific queue:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
[
('tasks.beat', {'queue': 'print-queue'}),
],
)
app.conf.beat_schedule = {
'beat-every-10-seconds': {
'task': 'tasks.beat',
'schedule': 10.0
},
}
#app.task(name='tasks.beat', bind=True)
def beat(self):
for i in range(10):
app.send_task("tasks.print", args=[i], queue="print-queue")
return None
In the second consumer (main2.py), I implemented the task said above:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
[
('tasks.print', {'queue': 'print-queue'}),
],
)
#app.task(name='tasks.print', bind=True)
def print(self, name):
return name
When I start the two Celery worker:
consumer1: celery worker -A main1 -Q print-queue --beat
consumer2: celery worker -A main2 -Q print-queue
I get these errors:
[ERROR/MainProcess] Received unregistered task of type 'tasks.print'
on the first consumer
[ERROR/MainProcess] Received unregistered task of type 'tasks.beat'
on the second consumer
Is it possible to split tasks on different Celery Applications both connected to the same broker?
Thanks in advance!
Here's what is happening. You have two workers A and B one of which also happens to be running celery beat (say that one is B).
celery beat submits task.beat to the queue. All this does is enqueue a message in rabbit with some metadata including the name of the task.
one of the two workers reads the message. Both A and B are listening to the same queue so either may read it.
a. If A reads the message it will try to find the task called tasks.beat this blows up because A doesn't define that task.
b. If B reads the message it will successfully try to find the task called tasks.beat (since it does have that task) and will run the code. tasks.beat will enqueue a new message in rabbit containing the metadata for tasks.print.
The same problem will again occur because only one of A and B defines tasks.print but either may get the message.
In practice, celery may be doing some checks to throw an error message earlier but I'm fairly certain this is the underlying problem.
In short, all workers (including beat) on a queue should be running the same code.

celery apply_async choking rabbit mq

I am using celery's apply_async method to queue tasks. I expect about 100,000 such tasks to run everyday (number will only go up). I am using RabbitMQ as the broker. I ran the code a few days back and RabbitMQ crashed after a few hours. I noticed that apply_async creates a new queue for each task with x-expires set at 1 day. My hypothesis is that RabbitMQ chokes when so many queues are being created. How can I stop celery from creating these extra queues for each task?
I also tried giving the queue parameter to the apply_async and assigned a x-message-ttl to that queue. Messages did go this new queue, however they were immediately consumed and never reached the ttl of 30sec that I had put. And this did not stop celery from creating those extra queues.
Here's my code:
views.py
from celery import task, chain
chain(task1.s(a), task2.s(b),)
.apply_async(link_error=error_handler.s(a), queue="async_tasks_queue")
tasks.py
from celery.result import AsyncResult
#shared_task
def error_handler(uuid, a):
#Handle error
#shared_task
def task1(a):
#Do something
return a
#shared_task
def task2(a, b):
#Do something more
celery.py
app = Celery(
'app',
broker=settings.QUEUE_URL,
backend=settings.QUEUE_URL,
)
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.amqp.queues.add("async_tasks_queue", queue_arguments={'durable' : True , 'x-message-ttl': 30000})
From the celery logs:
[2016-01-05 01:17:24,398: INFO/MainProcess] Received task:
project.tasks.task1[615e094c-2ec9-4568-9fe1-82ead2cd303b]
[2016-01-05 01:17:24,834: INFO/MainProcess] Received task:
project.decorators.wrapper[bf9a0a94-8e71-4ad6-9eaa-359f93446a3f]
RabbitMQ had 2 new queues by the names "615e094c2ec945689fe182ead2cd303b" and "bf9a0a948e714ad69eaa359f93446a3f" when these tasks were executed
My code is running on Django 1.7.7, celery 3.1.17 and RabbitMQ 3.5.3.
Any other suggestions to execute tasks asynchronously are also welcome
Try using a different backend - I recommend Redis. When we tried using Rabbitmq as both broker and backend we discovered that it was ill suited to the broker role.

Topic Exchange with Celery and RabbitMQ

I'm a bit confused on what my configuration should look like to set up a topic exchange.
http://www.rabbitmq.com/tutorials/tutorial-five-python.html
This is what I'd like to accomplish:
Task1 -> send to QueueOne and QueueFirehose
Task2 -> sent to QueueTwo and QueueFirehose
then:
Task1 -> consume from QueueOne
Task2 -> consume from QueueTwo
TaskFirehose -> consume from QueueFirehose
I only want Task1 to consume from QueueOne and Task2 to consume from QueueTwo.
That problem now is that when Task1 and 2 run, they also drain QueueFirehose, and TaskFirehose task never executes.
Is there something wrong with my config, or am I misunderstanding something?
CELERY_QUEUES = {
"QueueOne": {
"exchange_type": "topic",
"binding_key": "pipeline.one",
},
"QueueTwo": {
"exchange_type": "topic",
"binding_key": "pipeline.two",
},
"QueueFirehose": {
"exchange_type": "topic",
"binding_key": "pipeline.#",
},
}
CELERY_ROUTES = {
"tasks.task1": {
"queue": 'QueueOne',
"routing_key": 'pipeline.one',
},
"tasks.task2": {
"queue": 'QueueTwo',
"routing_key": 'pipeline.two',
},
"tasks.firehose": {
'queue': 'QueueFirehose',
"routing_key": 'pipeline.#',
},
}
Assuming that you actually meant something like this:
Task1 -> send to QueueOne
Task2 -> sent to QueueTwo
TaskFirehose -> send to QueueFirehose
then:
Worker1 -> consume from QueueOne, QueueFirehose
Worker2 -> consume from QueueTwo, QueueFirehose
WorkerFirehose -> consume from QueueFirehose
This might not be exactly what you meant, but i think it should cover many scenarios and hopefully yours too.
Something like this should work:
# Advanced example starting 10 workers in the background:
# * Three of the workers processes the images and video queue
# * Two of the workers processes the data queue with loglevel DEBUG
# * the rest processes the default' queue.
$ celery multi start 10 -l INFO -Q:1-3 images,video -Q:4,5 data
-Q default -L:4,5 DEBUG
For more options and reference: http://celery.readthedocs.org/en/latest/reference/celery.bin.multi.html
This was straight from the documentation.
I too had a similar situation, and I tackled it in a slightly different way. I couldnt use celery multi with supervisord.
So instead I created multiple program in supervisord for each worker. The workers will be on different processes anyway, so just let supervisord take care of everything for you.
The config file looks something like:-
; ==================================
; celery worker supervisor example
; ==================================
[program:Worker1]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueOne, QueueFirehose
directory=/path/to/project
user=nobody
numprocs=1
stdout_logfile=/var/log/celery/worker1.log
stderr_logfile=/var/log/celery/worker1.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
; if rabbitmq is supervised, set its priority higher
; so it starts first
priority=998
Similarly, for Worker2 and WorkerFirehose, edit the corresponding lines to make:
[program:Worker2]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueTwo, QueueFirehose
and
[program:WorkerFirehose]
; Set full path to celery program if using virtualenv
command=celery worker -A proj --loglevel=INFO -Q QueueFirehose
include them all in supervisord.conf file and that should do it.

Categories