Celery retried messages remain in same worker

I had the impression that if a Celery worker receives a task and the task is retried, the task stays in that worker's memory (with its ETA) and does not return to the queue.
As a result, if a task is retried and the worker is busy with other tasks when the ETA arrives, the retried task has to wait until the worker finishes those other tasks.
I looked in the documentation for something that matches what I remembered, but I can't find anything.
To check it, I created two tasks:
@app.task(bind=True, name='task_that_holds_worker', rate_limit='4/m',
          default_retry_delay=5 * 60,
          max_retries=int(60 * 60 * 24 * 1 / (60 * 5)))
def task_that_holds_worker(self, *args, **kwargs):
    # Keep the worker busy for a long time.
    import time
    time.sleep(50000)

@app.task(bind=True, name='retried_task', rate_limit='2/m',
          default_retry_delay=10 * 60,
          max_retries=int(60 * 60 * 24 * 1 / (60 * 10)))
def retried_task(self, *args, **kwargs):
    # Always retry, so the task comes back with an ETA.
    self.retry()
These are the simplest possible tasks, just to check whether the retried task gets processed by another worker while the first worker is busy with a different task.
I then launched one worker and triggered the two tasks in the following way:
from some_app import tasks
from some_app.celery_app import app
current_app = app.tasks
async_result = tasks.retried_task.delay()
import time
time.sleep(20)
async_result = tasks.task_that_holds_worker.delay()
The worker processed the retried task and retried it,
and then moved on to the task that sleeps.
I then launched another worker, and I can see that it does not get the 'retried' task; only the first worker does.
Each worker was launched with --prefetch-multiplier=1 --concurrency=1.
Is there something wrong with the way I reproduced this?
Or is this the way a retried Celery task behaves?
Thanks in advance!
celery: 4.1.2
Python: 3.6.2
Rabbitmq Image: rabbitmq:3.6.9-management
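For reference, the two max_retries expressions in the task decorators above divide one day by the retry delay, so each task keeps retrying for roughly 24 hours. A quick check of the arithmetic (the variable names are only illustrative and do not appear in the question):

```python
SECONDS_PER_DAY = 60 * 60 * 24 * 1

# task_that_holds_worker: default_retry_delay of 5 minutes
holds_worker_max_retries = int(SECONDS_PER_DAY / (60 * 5))

# retried_task: default_retry_delay of 10 minutes
retried_task_max_retries = int(SECONDS_PER_DAY / (60 * 10))

print(holds_worker_max_retries, retried_task_max_retries)  # 288 144
```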

This seems to be an issue with tasks that have an ETA. The first available worker takes the task, counts down until the task's ETA, and doesn't release it back to the queue (the prefetch count is increased and ignored).
https://github.com/celery/celery/issues/2541
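The behaviour described here can be pictured with a toy model. This is not Celery code: ToyBroker and ToyWorker are hypothetical stand-ins, and the sketch only assumes that a worker keeps any delivered message with an ETA in its own memory until the ETA passes.

```python
from collections import deque


class ToyBroker:
    """Hypothetical stand-in for a broker queue (not a Celery or RabbitMQ API)."""

    def __init__(self):
        self.queue = deque()

    def publish(self, message):
        self.queue.append(message)

    def deliver(self):
        # Hand the oldest message to whichever worker asks first.
        return self.queue.popleft() if self.queue else None


class ToyWorker:
    """Hypothetical worker that holds ETA messages locally until they are due."""

    def __init__(self, name, broker):
        self.name = name
        self.broker = broker
        self.held = []  # ETA messages waiting in this worker's memory

    def consume(self):
        message = self.broker.deliver()
        if message is None:
            return "nothing to do"
        if message.get("eta") is not None:
            # Assumption: the message stays here and never goes back to the queue.
            self.held.append(message)
            return "held until eta"
        return "executed"


broker = ToyBroker()
worker1 = ToyWorker("worker1", broker)
worker2 = ToyWorker("worker2", broker)

# A retry is published as a new message carrying an ETA.
broker.publish({"task": "retried_task", "eta": 600})

first = worker1.consume()   # worker1 takes the message and keeps it
second = worker2.consume()  # the queue is already empty for worker2
print(first, "|", second)
```

In this model a second worker started later has nothing to pick up, which matches what the question observed.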

There is an error in how you reproduced it. Unless you have a special broker, Celery will always requeue a task retry request back to the broker. Workers do not retain any memory of which tasks they attempted, and no data is added to the retry request that would allow Celery to route the task request back to the same worker. There is no guarantee that the same worker will retry a task it has seen before. You can confirm this in the Celery source, in celery/app/task.py:
# get the signature of the task as called
S = self.signature_from_request(
    request, args, kwargs,
    countdown=countdown, eta=eta, retries=retries,
    **options
)
if max_retries is not None and retries > max_retries:
    if exc:
        # On Py3: will augment any current exception with
        # the exc' argument provided (raise exc from orig)
        raise_with_context(exc)
    raise self.MaxRetriesExceededError(
        "Can't retry {0}[{1}] args:{2} kwargs:{3}".format(
            self.name, request.id, S.args, S.kwargs))
ret = Retry(exc=exc, when=eta or countdown)
if is_eager:
    # if task was executed eagerly using apply(),
    # then the retry must also be executed eagerly.
    S.apply().get()
    if throw:
        raise ret
    return ret
try:
    S.apply_async()
except Exception as exc:
    raise Reject(exc, requeue=False)
if throw:
    raise ret
return ret
The important part is the S.apply_async() call near the end, which shows how the retry works. Celery builds the task request's signature (this includes the task name and the arguments to the task, and sets the eta, countdown, and retries), and then simply calls apply_async, which under the hood just queues a new task request on the broker.
Your sample did not work because Celery workers will often pull more than one task request off the broker, so what likely happened is that the first worker grabbed the task off the broker before the second worker had come online.
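The control flow of the excerpt above can be condensed into a small stand-alone sketch. It is not Celery's implementation: build_retry_message, MaxRetriesExceeded, and the dict-based message are hypothetical names, used only to show that a retry amounts to publishing a new request with an incremented retry counter and an ETA.

```python
class MaxRetriesExceeded(Exception):
    """Hypothetical counterpart of the MaxRetriesExceededError raised above."""


def build_retry_message(request, countdown, max_retries, now):
    """Return the new request that a retry would publish to the broker."""
    retries = request["retries"] + 1
    if max_retries is not None and retries > max_retries:
        raise MaxRetriesExceeded(
            "Can't retry {0}[{1}]".format(request["task"], request["id"]))
    # Same task name, id, and arguments; only the retry counter and ETA change.
    return dict(request, retries=retries, eta=now + countdown)


published = []  # stands in for the broker queue

request = {"task": "retried_task", "id": "abc", "args": (), "retries": 0}
published.append(
    build_retry_message(request, countdown=600, max_retries=144, now=1000))

print(published[0]["retries"], published[0]["eta"])  # 1 1600
```

Nothing in the published message names the worker that ran the first attempt, so any worker consuming the queue may receive it.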

Related

Celery chunks tasks never Retry

I have the following chunked tasks:
update_matching_product.chunks(update_matching_products, 10).apply_async(priority=5)
When self.retry() is raised inside update_matching_product, the chunked task never retries. Instead, it shows "Task can be retried".
If we open the Celery code, we see:
if request.called_directly:
    # raises orig stack if PyErr_Occurred,
    # and augments with exc' if that argument is defined.
    raise_with_context(exc or Retry('Task can be retried', None))
So the question is: why are chunked tasks called_directly, and how can failed chunked tasks be retried?
PS: Celery is not in task_always_eager mode.

Python signal from celery task

I am running a Celery task which, on success, runs a subtask to send a signal.
@celery.task(name='sendmail')
def send_async_email(msg):
    return mail.send(msg)

def send_mail(msg):
    # do some processing
    send_async_email.apply_async((msg,), link=send_email_signal.s(msg))

@celery.task
def send_email_signal(result, email_type, msg):
    email_sent_signal.send(msg, email_type=email_type)

signals.email_sent_signal.connect(track.track_emails_sent)

def track_emails_sent(msg):
    # adds logs to logging system
    pass
The problem is that when I send an email, everything works as expected, but I see a duplicate entry in my logging system.
I receive one email as expected, and according to Celery Flower, send_email_signal ran once. But the logs contain two entries.
I have multiple Celery workers running on the Celery box. I want to understand how a Python signal sent from a Celery callback task is handled.

Celery tasks on multiple machines

I have a server where I installed a RabbitMQ broker and two Celery consumers (main1.py and main2.py) both connected to the same broker.
In the first consumer (main1.py), I set up Celery Beat to periodically run a task that sends a different task several times to a specific queue:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
    [
        ('tasks.beat', {'queue': 'print-queue'}),
    ],
)
app.conf.beat_schedule = {
    'beat-every-10-seconds': {
        'task': 'tasks.beat',
        'schedule': 10.0
    },
}

@app.task(name='tasks.beat', bind=True)
def beat(self):
    for i in range(10):
        app.send_task("tasks.print", args=[i], queue="print-queue")
    return None
In the second consumer (main2.py), I implemented the task mentioned above:
app = Celery('tasks', broker=..., backend=...)
app.conf.task_routes = (
    [
        ('tasks.print', {'queue': 'print-queue'}),
    ],
)

@app.task(name='tasks.print', bind=True)
def print(self, name):
    return name
When I start the two Celery workers:
consumer1: celery worker -A main1 -Q print-queue --beat
consumer2: celery worker -A main2 -Q print-queue
I get these errors:
[ERROR/MainProcess] Received unregistered task of type 'tasks.print'
on the first consumer, and
[ERROR/MainProcess] Received unregistered task of type 'tasks.beat'
on the second consumer.
Is it possible to split tasks across different Celery applications that are both connected to the same broker?
Thanks in advance!

Here's what is happening. You have two workers, A and B, one of which also happens to be running celery beat (say that one is B).
1. celery beat submits tasks.beat to the queue. All this does is enqueue a message in RabbitMQ with some metadata, including the name of the task.
2. One of the two workers reads the message. Both A and B are listening to the same queue, so either may read it.
   a. If A reads the message, it will try to find the task called tasks.beat. This blows up because A doesn't define that task.
   b. If B reads the message, it will successfully find the task called tasks.beat (since it does have that task) and will run the code. tasks.beat will enqueue a new message in RabbitMQ containing the metadata for tasks.print.
3. The same problem will occur again, because only one of A and B defines tasks.print, but either may get the message.
In practice, Celery may be doing some checks to throw an error message earlier, but I'm fairly certain this is the underlying problem.
In short, all workers (including beat) on a queue should be running the same code.
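The failure mode can be reproduced without a broker. In the sketch below, each worker is reduced to a dict of the task names it defines; handle, UnregisteredTask, and the two registries are hypothetical stand-ins, not Celery APIs.

```python
class UnregisteredTask(Exception):
    """Hypothetical error mirroring 'Received unregistered task of type ...'."""


def handle(registry, message):
    """Look up the task named in the message and run it, as a worker would."""
    try:
        func = registry[message["task"]]
    except KeyError:
        raise UnregisteredTask(message["task"])
    return func(*message["args"])


# The main1 worker only defines tasks.beat; the main2 worker only defines tasks.print.
main1_registry = {"tasks.beat": lambda: "beat ran"}
main2_registry = {"tasks.print": lambda name: name}

message = {"task": "tasks.print", "args": [3]}

# Both workers consume the same queue, so either one may receive the message.
for worker_name, registry in [("main1", main1_registry), ("main2", main2_registry)]:
    try:
        print(worker_name, "->", handle(registry, message))
    except UnregisteredTask as exc:
        print(worker_name, "-> unregistered task", exc)
```

If the two applications have to stay separate, one option consistent with the rule above would be to give each its own queue, so that a worker only ever receives tasks it defines.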

celery apply_async choking rabbit mq

I am using Celery's apply_async method to queue tasks. I expect about 100,000 such tasks to run every day (the number will only go up). I am using RabbitMQ as the broker. I ran the code a few days ago and RabbitMQ crashed after a few hours. I noticed that apply_async creates a new queue for each task, with x-expires set to 1 day. My hypothesis is that RabbitMQ chokes when so many queues are being created. How can I stop Celery from creating these extra queues for each task?
I also tried passing the queue parameter to apply_async and assigning an x-message-ttl to that queue. Messages did go to this new queue; however, they were consumed immediately and never reached the TTL of 30 seconds that I had set. And this did not stop Celery from creating those extra queues.
Here's my code:
views.py
from celery import task, chain
# Wrap the chained call in parentheses so it can span two lines.
(
    chain(task1.s(a), task2.s(b))
    .apply_async(link_error=error_handler.s(a), queue="async_tasks_queue")
)
tasks.py
from celery import shared_task
from celery.result import AsyncResult

@shared_task
def error_handler(uuid, a):
    # Handle error
    pass

@shared_task
def task1(a):
    # Do something
    return a

@shared_task
def task2(a, b):
    # Do something more
    pass
celery.py
app = Celery(
    'app',
    broker=settings.QUEUE_URL,
    backend=settings.QUEUE_URL,
)
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.amqp.queues.add("async_tasks_queue", queue_arguments={'durable': True, 'x-message-ttl': 30000})
From the celery logs:
[2016-01-05 01:17:24,398: INFO/MainProcess] Received task: project.tasks.task1[615e094c-2ec9-4568-9fe1-82ead2cd303b]
[2016-01-05 01:17:24,834: INFO/MainProcess] Received task: project.decorators.wrapper[bf9a0a94-8e71-4ad6-9eaa-359f93446a3f]
RabbitMQ had 2 new queues by the names "615e094c2ec945689fe182ead2cd303b" and "bf9a0a948e714ad69eaa359f93446a3f" when these tasks were executed.
My code is running on Django 1.7.7, celery 3.1.17 and RabbitMQ 3.5.3.
Any other suggestions to execute tasks asynchronously are also welcome.

Try using a different result backend - I recommend Redis. When we tried using RabbitMQ as both broker and backend, we discovered that it was ill-suited to the backend role.
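A rough sketch of that suggestion for the Celery 3.1 setup in the question. The old-style setting names below exist in Celery 3.1, but several things here are assumptions: that the app loads its configuration from Django settings (the question instead passes backend= in celery.py, which would be the place to change in that case), that the Redis URL placeholder is replaced with a real instance, and that ignoring results is acceptable for the application.

```python
# Django settings sketch (Celery 3.1 setting names).

# Store task results in Redis instead of RabbitMQ.
# The URL is a placeholder; point it at your own Redis instance.
CELERY_RESULT_BACKEND = "redis://localhost:6379/0"

# Optionally, and only if no caller ever reads task results,
# skip storing them altogether.
CELERY_IGNORE_RESULT = True
```

The per-task queues seen in the question are named after task ids, which is consistent with results being stored in RabbitMQ, so moving or disabling result storage is the part that should stop them from appearing.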

Django Celery get task count

I am currently using django with celery and everything works fine.
However, I want to give users the opportunity to cancel a task if the server is overloaded, by checking how many tasks are currently scheduled.
How can I achieve this?
I am using Redis as the broker.
I just found this:
Retrieve list of tasks in a queue in Celery
It is somewhat related to my issue, but I don't need to list the tasks, just count them :)

Here is a broker-agnostic way to get the number of messages in a queue using Celery.
By using connection_or_acquire, you can minimize the number of open connections to your broker by utilizing Celery's internal connection pooling.
celery = Celery(app)
with celery.connection_or_acquire() as conn:
    conn.default_channel.queue_declare(
        queue='my-queue', passive=True).message_count
You can also extend Celery to provide this functionality:
from celery import Celery as _Celery

class Celery(_Celery):
    def get_message_count(self, queue):
        '''
        Raises: amqp.exceptions.NotFound: if queue does not exist
        '''
        with self.connection_or_acquire() as conn:
            return conn.default_channel.queue_declare(
                queue=queue, passive=True).message_count

celery = Celery(app)
num_messages = celery.get_message_count('my-queue')

If your broker is configured as redis://localhost:6379/1, and your tasks are submitted to the general celery queue, then you can get the length by the following means:
import redis
queue_name = "celery"
client = redis.Redis(host="localhost", port=6379, db=1)
length = client.llen(queue_name)
Or, from a shell script (good for monitors and such):
$ redis-cli -n 1 -h localhost -p 6379 llen celery

If you have already configured redis in your app, you can try this:
from celery import Celery
QUEUE_NAME = 'celery'
celery = Celery(app)
client = celery.connection().channel().client
length = client.llen(QUEUE_NAME)

Get a redis client instance used by Celery, then check the queue length. Don't forget to release the connection every time you use it (use .acquire):
# Get a configured instance of celery:
from project.celery import app as celery_app

def get_celery_queue_len(queue_name):
    with celery_app.pool.acquire(block=True) as conn:
        return conn.default_channel.client.llen(queue_name)
Always acquire a connection from the pool, don't create it manually. Otherwise, your redis server will run out of connection slots and this will kill your other clients.

I'll expand on @StephenFuhry's answer regarding the not-found error, because a more or less broker-agnostic way of retrieving the queue length is beneficial even if Celery suggests working with brokers directly. In Celery 4 (with a Redis broker) this error looks like:
ChannelError: Channel.queue_declare: (404) NOT_FOUND - no queue 'NAME' in vhost '/'
Observations:
1. ChannelError is a kombu exception (in fact, it's amqp's and kombu "re-exports" it).
2. On a Redis broker, Celery/Kombu represent queues as Redis lists.
3. Redis collection-type keys are removed whenever the collection becomes empty.
If we look at what queue_declare does, it has these lines:
if passive and not self._has_queue(queue, **kwargs):
    raise ChannelError(...)
Kombu Redis virtual transport's _has_queue is this:
def _has_queue(self, queue, **kwargs):
    with self.conn_or_acquire() as client:
        with client.pipeline() as pipe:
            for pri in self.priority_steps:
                pipe = pipe.exists(self._q_for_pri(queue, pri))
            return any(pipe.execute())
The conclusion is that on a Redis broker ChannelError raised from queue_declare is okay (for an existing queue of course), and just means that the queue is empty.
Here's an example of how to output all active Celery queues' lengths (normally should be 0, unless your worker can't cope with the tasks).
from kombu.exceptions import ChannelError

def get_queue_length(name):
    with celery_app.connection_or_acquire() as conn:
        try:
            ok_nt = conn.default_channel.queue_declare(queue=name, passive=True)
        except ChannelError:
            return 0
        else:
            return ok_nt.message_count

for queue_info in celery_app.control.inspect().active_queues().values():
    print(queue_info[0]['name'], get_queue_length(queue_info[0]['name']))
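The reason an empty queue looks like a missing one can be shown with a toy model of Redis list semantics. ToyRedis below is a hypothetical in-memory stand-in, not the redis client; it only mimics the rule that a list key disappears once its last element is popped.

```python
class ToyRedis:
    """Hypothetical in-memory model of Redis list keys (not the redis client)."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, value):
        self._lists.setdefault(key, []).insert(0, value)

    def rpop(self, key):
        items = self._lists.get(key)
        if not items:
            return None
        value = items.pop()
        if not items:
            # Mimic Redis: an empty list key is removed automatically.
            del self._lists[key]
        return value

    def exists(self, key):
        return key in self._lists

    def llen(self, key):
        return len(self._lists.get(key, []))


client = ToyRedis()
client.lpush("celery", "task-message")
before = client.exists("celery")  # the key exists while it holds a message
client.rpop("celery")             # a worker consumes the only message
after = client.exists("celery")   # the key is gone once the list is empty
print(before, after, client.llen("celery"))  # True False 0
```

An existence check like the one in _has_queue therefore reports "no queue" for a drained queue, which is why treating ChannelError as a length of 0 is reasonable on a Redis broker.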
