Celery tasks on multiple machines

Celery tasks on multiple machines - python

I have a server where I installed a RabbitMQ broker and two Celery consumers (main1.py and main2.py) both connected to the same broker.
In the first consumer (main1.py), I implemented a Celery Beat that sends multiple times a different task on a specific queue:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
[
('tasks.beat', {'queue': 'print-queue'}),
],
)
app.conf.beat_schedule = {
'beat-every-10-seconds': {
'task': 'tasks.beat',
'schedule': 10.0
},
}
#app.task(name='tasks.beat', bind=True)
def beat(self):
for i in range(10):
app.send_task("tasks.print", args=[i], queue="print-queue")
return None
In the second consumer (main2.py), I implemented the task said above:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
[
('tasks.print', {'queue': 'print-queue'}),
],
)
#app.task(name='tasks.print', bind=True)
def print(self, name):
return name
When I start the two Celery worker:
consumer1: celery worker -A main1 -Q print-queue --beat
consumer2: celery worker -A main2 -Q print-queue
I get these errors:
[ERROR/MainProcess] Received unregistered task of type 'tasks.print'
on the first consumer
[ERROR/MainProcess] Received unregistered task of type 'tasks.beat'
on the second consumer
Is it possible to split tasks on different Celery Applications both connected to the same broker?
Thanks in advance!

Here's what is happening. You have two workers A and B one of which also happens to be running celery beat (say that one is B).
celery beat submits task.beat to the queue. All this does is enqueue a message in rabbit with some metadata including the name of the task.
one of the two workers reads the message. Both A and B are listening to the same queue so either may read it.
a. If A reads the message it will try to find the task called tasks.beat this blows up because A doesn't define that task.
b. If B reads the message it will successfully try to find the task called tasks.beat (since it does have that task) and will run the code. tasks.beat will enqueue a new message in rabbit containing the metadata for tasks.print.
The same problem will again occur because only one of A and B defines tasks.print but either may get the message.
In practice, celery may be doing some checks to throw an error message earlier but I'm fairly certain this is the underlying problem.
In short, all workers (including beat) on a queue should be running the same code.

Related

Celery not processing tasks everytime

I am having below configuration for celery
celery = Celery(__name__,
broker=os.environ.get('CELERY_BROKER_URL', 'redis://'),
backend=os.environ.get('CELERY_BROKER_URL', 'redis://'))
celery.config_from_object(APP_SETTINGS)
ssl = celery.conf.get('REDIS_SSL', True)
r = redis.StrictRedis(REDIS_BROKER, int(REDIS_BROKER_PORT), 0,
charset='utf-8', decode_responses=True, ssl=ssl)
db_uri = celery.conf.get('SQLALCHEMY_DATABASE_URI')
#celery.task
def process_task(data):
#some code here
I am calling process task inside API endpoint like
process_task.delay(data)
sometimes it's processing tasks sometimes not.
can someone help me to resolve this issue?
I am running worker like celery worker -A api.celery --loglevel=DEBUG --concurrency=10

Once all the worker-processes are busy the new tasks will just sit on the queue waiting for the next idle worker-process to start the task. This is most likely why you perceive this as "not processing tasks everytime". Go through the monitoring and management section of the Celery documentation to find how to monitor your Celery cluster. For starters, do celery worker -A api.celery inspect active to check the currently running tasks.

How do I get a Celery worker to consume an 'outside' RabbitMQ queue?

I have the following scripts:
celery_tasks.py
from celery import Celery
app = Celery(broker='amqp://guest:guest#localhost:5672//')
app.conf.task_default_queue = 'test_queue'
#app.task(acks_late=True)
def test(a):
return a
publish.py
from celery_tasks import test
test.delay('abc')
When i run publish.py and start the worker (celery -A celery_tasks worker --loglevel=DEBUG), the 'abc' content is published in the 'test_queue' and is consumed by the worker.
Is there a way for the worker to consume something from a queue that was not posted by Celery? For example, when I put something in the test_queue straight through RabbitMQ, without going through the Celery publisher, and run the Celery worker, it gave me the following warning:
WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: 'abc' (3b)
{content_type:None content_encoding:None
delivery_info:{'exchange': '', 'redelivered': False, 'delivery_tag': 1, 'consumer_tag': 'None2', 'routing_key': 'test_queue'} headers={}}
Is there a way to solve this?

Celery has a specific format and a set of headers that needs to be maintained to comply with it. Therefore you would have to reverse engineer it to make celery-compliant message not produced by celery.
Keep in mind that celery is not really made to send messages across the broker, but to send tasks, which are enhanced messages therefore have extras in the header part of the amqp message

It's a late answer but custom consumers might help you. I'm using this for consuming messages from rabbitmq. Where these messages are being populated from another app with pika.
http://docs.celeryproject.org/en/latest/userguide/extending.html#custom-message-consumers

Celery multi workers unexpected task execution order

I run celery:
celery multi start --app=myapp fast_worker
slow_worker
-Q:fast_worker fast-queue
-Q:slow_worker slow-queue
-c:fast_worker 1 -c:slow_worker 1
--logfile=%n.log --pidfile=%n.pid
And celerybeat:
celery beat -A myapp
Task:
#task.periodic_task(run_every=timedelta(seconds=5), ignore_result=True)
def test_log_task_queue():
import time
time.sleep(10)
print "test_log_task_queue"
Routing:
CELERY_ROUTES = {
'myapp.tasks.test_log_task_queue': {
'queue': 'slow-queue',
'routing_key': 'slow-queue',
},
}
I use rabbitMQ. When I open rabbitMQ admin panel, I see that my tasks are in slow-queue, but when I open logs I see task output for both workers. Why do both workers execute my tasks, even when task not in worker queue?

It looks like celery multi creates something like shared queues. To fix this problem, I added -X option:
celery multi start --app=myapp fast_worker
slow_worker
-Q:fast_worker fast-queue
-Q:slow_worker slow-queue
-X:fast_worker slow-queue
-X:slow_worker fast-queue
-c:fast_worker 1 -c:slow_worker 1
--logfile=%n.log --pidfile=%n.pid

celery apply_async choking rabbit mq

I am using celery's apply_async method to queue tasks. I expect about 100,000 such tasks to run everyday (number will only go up). I am using RabbitMQ as the broker. I ran the code a few days back and RabbitMQ crashed after a few hours. I noticed that apply_async creates a new queue for each task with x-expires set at 1 day. My hypothesis is that RabbitMQ chokes when so many queues are being created. How can I stop celery from creating these extra queues for each task?
I also tried giving the queue parameter to the apply_async and assigned a x-message-ttl to that queue. Messages did go this new queue, however they were immediately consumed and never reached the ttl of 30sec that I had put. And this did not stop celery from creating those extra queues.
Here's my code:
views.py
from celery import task, chain
chain(task1.s(a), task2.s(b),)
.apply_async(link_error=error_handler.s(a), queue="async_tasks_queue")
tasks.py
from celery.result import AsyncResult
#shared_task
def error_handler(uuid, a):
#Handle error
#shared_task
def task1(a):
#Do something
return a
#shared_task
def task2(a, b):
#Do something more
celery.py
app = Celery(
'app',
broker=settings.QUEUE_URL,
backend=settings.QUEUE_URL,
)
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.amqp.queues.add("async_tasks_queue", queue_arguments={'durable' : True , 'x-message-ttl': 30000})
From the celery logs:
[2016-01-05 01:17:24,398: INFO/MainProcess] Received task:
project.tasks.task1[615e094c-2ec9-4568-9fe1-82ead2cd303b]
[2016-01-05 01:17:24,834: INFO/MainProcess] Received task:
project.decorators.wrapper[bf9a0a94-8e71-4ad6-9eaa-359f93446a3f]
RabbitMQ had 2 new queues by the names "615e094c2ec945689fe182ead2cd303b" and "bf9a0a948e714ad69eaa359f93446a3f" when these tasks were executed
My code is running on Django 1.7.7, celery 3.1.17 and RabbitMQ 3.5.3.
Any other suggestions to execute tasks asynchronously are also welcome

Try using a different backend - I recommend Redis. When we tried using Rabbitmq as both broker and backend we discovered that it was ill suited to the broker role.

Django Celery Periodic Tasks Run But RabbitMQ Queues Aren't Consumed

Question
After running tasks via celery's periodic task scheduler, beat, why do I have so many unconsumed queues remaining in RabbitMQ?
Setup
Django web app running on Heroku
Tasks scheduled via celery beat
Tasks run via celery worker
Message broker is RabbitMQ from ClouldAMQP
Procfile
web: gunicorn --workers=2 --worker-class=gevent --bind=0.0.0.0:$PORT project_name.wsgi:application
scheduler: python manage.py celery worker --loglevel=ERROR -B -E --maxtasksperchild=1000
worker: python manage.py celery worker -E --maxtasksperchild=1000 --loglevel=ERROR
settings.py
CELERYBEAT_SCHEDULE = {
'do_some_task': {
'task': 'project_name.apps.appname.tasks.some_task',
'schedule': datetime.timedelta(seconds=60 * 15),
'args': ''
},
}
tasks.py
#celery.task
def some_task()
# Get some data from external resources
# Save that data to the database
# No return value specified
Result
Every time the task runs, I get (via the RabbitMQ web interface):
An additional message in the "Ready" state under my "Queued Messages"
An additional queue with a single message in the "ready" state
This queue has no listed consumers

It ended up being my setting for CELERY_RESULT_BACKEND.
Previously, it was:
CELERY_RESULT_BACKEND = 'amqp'
I no longer had unconsumed messages / queues in RabbitMQ after I changed it to:
CELERY_RESULT_BACKEND = 'database'
What was happening, it would appear, is that after a task was executed, celery was sending info about that task back via rabbitmq, but, there was nothing setup to consume these responses messages, hence a bunch of unread ones ending up in the queue.
NOTE: This means that celery would be adding database entries recording the outcomes of tasks. To keep my database from getting loaded up with useless messages, I added:
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
Relevant parts from Settings.py
########## CELERY CONFIGURATION
import djcelery
# https://github.com/celery/django-celery/
djcelery.setup_loader()
INSTALLED_APPS = INSTALLED_APPS + (
'djcelery',
)
# Compress all the messages using gzip
# http://celery.readthedocs.org/en/latest/userguide/calling.html#compression
CELERY_MESSAGE_COMPRESSION = 'gzip'
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-transport
BROKER_TRANSPORT = 'amqplib'
# Set this number to the amount of allowed concurrent connections on your AMQP
# provider, divided by the amount of active workers you have.
#
# For example, if you have the 'Little Lemur' CloudAMQP plan (their free tier),
# they allow 3 concurrent connections. So if you run a single worker, you'd
# want this number to be 3. If you had 3 workers running, you'd lower this
# number to 1, since 3 workers each maintaining one open connection = 3
# connections total.
#
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-pool-limit
BROKER_POOL_LIMIT = 3
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-connection-max-retries
BROKER_CONNECTION_MAX_RETRIES = 0
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-url
BROKER_URL = os.environ.get('CLOUDAMQP_URL')
# Previously, had this set to 'amqp', this resulted in many read / unconsumed
# queues and messages in RabbitMQ
# See: http://docs.celeryproject.org/en/latest/configuration.html#celery-result-backend
CELERY_RESULT_BACKEND = 'database'
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
########## END CELERY CONFIGURATION

Looks like you are getting back responses from your consumed tasks.
You can avoid that by doing:
#celery.task(ignore_result=True)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Celery tasks on multiple machines - python

Related

Celery not processing tasks everytime

How do I get a Celery worker to consume an 'outside' RabbitMQ queue?

Celery multi workers unexpected task execution order

celery apply_async choking rabbit mq

Django Celery Periodic Tasks Run But RabbitMQ Queues Aren't Consumed

Categories

Resources