So I am using Celery with RabbitMQ. I have a RESTful API that registers a user, and I am using a remote Celery worker to send the registration email asynchronously so my API can return a fast response.
from .tasks import send_registration_email

def register_user(user_data):
    # save user to the database etc
    send_registration_email.delay(user.id)
    return {'status': 'success'}
This works fine. The email is sent in a non-blocking, asynchronous way (and can be retried if it fails, which is cool). The problem shows up when I look at the RabbitMQ management console: I can see that send_registration_email has created a queue with a random name. Something like:
I can see that the task has been executed successfully. So why does the random queue stay in RabbitMQ forever? This is the task payload:
{"status": "SUCCESS", "traceback": null, "result": true, "task_id": "aad10877-3508-4179-a5fb-99f1bd0b8b2f", "children": []}
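This is normal behaviour if you have configured CELERY_RESULT_BACKEND in your settings: with the (legacy) AMQP result backend, each task result is published to its own queue, named after the task id. See the Celery documentation on result backends for details.
You can either disable the result backend (or ignore results for this particular task), or shorten the lifetime of each result message.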
I am getting an error
redis.exceptions.ConnectionError: Error 24 connecting to redis-service:6379. Too many open files.
...
OSError: [Errno 24] Too many open files
I know this can be fixed by increasing the ulimit, but I don't think that's the real issue here; also, this is a service running in a container.
The application starts up correctly and works fine for 48 hours, and then I get the above error.
This implies that the number of open connections keeps growing over time.
What my application basically does:
background_task (run with Celery) -> collects data from Postgres and sets it in Redis
Prometheus reaches the app at '/metrics', which is a Django view -> collects data from Redis and serves it using the Django Prometheus exporter
The code looks something like this:
views.py
from prometheus_client.core import GaugeMetricFamily, REGISTRY
from my_awesome_app.taskbroker.celery import app

class SomeMetricCollector:
    def get_sample_metrics(self):
        with app.connection_or_acquire() as conn:
            client = conn.channel().client
            result = client.get('some_metric_key')
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)

REGISTRY.register(SomeMetricCollector())
tasks.py
# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it collects data from Postgres is irrelevant to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query

@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        client = conn.channel().client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True
I don't think the background data collection in tasks.py is what makes the Redis connections keep growing; I think it's the Django view '/metrics' in views.py.
Can you please tell me what I am doing wrong here?
Is there a better way to read from Redis in a Django view? The Prometheus instance scrapes the Django application every 5s.
This answer is based on my use case and research.
The issue here, as I see it, is that each request to /metrics runs in a new thread, in which views.py creates new connections in the Celery broker's connection pool.
This can be handled by letting Django manage its own Redis connection pool through its cache backend, letting Celery manage its own Redis connection pool, and not using each other's connection pools from their respective threads.
Django Side
config.py
# CACHES
# ------------------------------------------------------------------------------
# For more details on options for your cache backend please refer
# https://docs.djangoproject.com/en/3.1/ref/settings/#backend
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": "redis://localhost:6379/0",
        "OPTIONS": {
            "CLIENT_CLASS": "django_redis.client.DefaultClient",
        },
    }
}
views.py
from prometheus_client.core import GaugeMetricFamily, REGISTRY
# *: Replacing celery app with Django cache backend
from django.core.cache import cache

class SomeMetricCollector:
    def get_sample_metrics(self):
        # *: This is how you get a client backed by django-redis's connection pool; it is still context managed.
        with cache.client.get_client() as client:
            result = client.get('some_metric_key')
            return {'some_metric_key': result}

    def collect(self):
        sample_metrics = self.get_sample_metrics()
        for key, value in sample_metrics.items():
            yield GaugeMetricFamily(key, 'This is a custom metric', value=value)

REGISTRY.register(SomeMetricCollector())
This ensures that Django maintains its own Redis connection pool and doesn't spin up new connections unnecessarily.
Celery Side
tasks.py
# This is my boilerplate taskbroker app
from my_awesome_app.taskbroker.celery import app
# How it collects data from Postgres is irrelevant to this issue.
from my_awesome_app.utility_app.utility import some_value_calculated_from_query

@app.task()
def app_metrics_sync_periodic():
    with app.connection_or_acquire() as conn:
        # *: Reuse the connection's default channel (and its Redis client) instead of opening a new channel on every call.
        client = conn.default_channel.client
        client.set('some_metric_key', some_value_calculated_from_query(), ex=21600)
        return True
How do I monitor connections?
There is a nice Prometheus Celery exporter that will help you monitor your Celery task activity, though I'm not sure how you could add connection-pool and connection monitoring to it.
The easiest way to manually verify whether the connections grow every time /metrics is hit on the web app is:
$ redis-cli
127.0.0.1:6379> CLIENT LIST
...
The CLIENT LIST command lists every open connection, so you can see whether the number of connections is growing.
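To track the count over time rather than eyeballing the output, you can group the `CLIENT LIST` entries by client host from Python. A minimal sketch: `client` is assumed to be a redis-py `Redis` instance, whose `client_list()` method returns one dict per connection with an `addr` field such as `127.0.0.1:52555`. Calling it before and after a few `/metrics` scrapes shows which host is accumulating connections.

```python
from collections import Counter


def connections_by_host(client):
    """Count open Redis connections per client host.

    `client` is expected to be a redis-py Redis instance; client_list()
    returns a list of dicts, one per connection, each with an "addr" key
    formatted as "host:port".
    """
    hosts = Counter()
    for entry in client.client_list():
        host, _, _port = entry["addr"].rpartition(":")
        hosts[host] += 1
    return hosts


# Usage (requires a running Redis server):
#   import redis
#   r = redis.Redis.from_url("redis://localhost:6379/0")
#   print(connections_by_host(r))
```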
Sadly, I don't use dedicated queues, but I would recommend using them. This is how my worker runs:
$ celery -A my_awesome_app.taskbroker worker --concurrency=20 -l ERROR -E
For learning purposes, I want to implement the following:
I have a script that runs in the background (Selenium, for example) and prints log messages that help me see what is going on in the terminal.
But I want to get the same messages in my REST response to the Angular app.
print('Started')
print('Logged in')
...
print('Processing')
...
print('Success')
In my views.py file:
from rest_framework import viewsets
from rest_framework.decorators import action
from rest_framework.response import Response

class RunTask(viewsets.ViewSet):
    queryset = Task.objects.all()

    @action(detail=False, methods=['GET'], name='Run Test Script')
    def run(self, request, *args, **kwargs):
        result = task()
        if result['success']:
            return Response(data=result)
        else:
            return Response(data=result['message'])

def task():
    print('Started')
    print('Logged in')
    ...
    print('Processing')
    ...
    print('Success')
    return {
        'success': True,  # or False
        'message': 'my status message'
    }
Right now it only shows me the result of the task, but I want to get the same messages in the frontend to indicate the process status.
I can't figure out how to organize this.
In other words, how can I tell Angular about my process status?
Unfortunately, it's not that simple. The REST API does let you start the task, but since the task runs in the same thread, the HTTP request blocks until the task is finished before sending the response. Your print statements won't appear in the HTTP response but in your server output (if you look at the shell where you ran python manage.py runserver, you'll see them).
Now, if you want that output in real time, you'll have to look into WebSockets. They let you open a "tunnel" between the browser and the server and send/receive messages in real time. The django-channels library lets you implement them.
However, for long-running background tasks (like a Selenium scraper), I would advise looking into the Celery task queue. Basically, your Django process schedules tasks in the queue, and the queued tasks are then executed by one (or more!) "worker" processes. The advantage is that your Django process isn't blocked by the long task: it just adds some work to the queue and then responds.
When you add a task to the queue, Celery gives you a unique identifier for it, which you can return in the HTTP response. You can then implement another endpoint that takes a task id as a parameter and returns the state of the task (is it pending? done? failed?).
For this to work, you'll have to set up a "broker", a kind of database that stores the tasks to run (typically RabbitMQ or Redis), plus a result backend (which can also be Redis) if you want to query task states and results. The Celery documentation explains this well: https://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
Whichever way you choose, it's not trivial and will take quite some work before you see results; but it's interesting to see how it expands the possibilities of a classic HTTP server.
I have the following scripts:
celery_tasks.py
from celery import Celery

app = Celery(broker='amqp://guest:guest@localhost:5672//')
app.conf.task_default_queue = 'test_queue'

@app.task(acks_late=True)
def test(a):
    return a
publish.py
from celery_tasks import test
test.delay('abc')
When I run publish.py and start the worker (celery -A celery_tasks worker --loglevel=DEBUG), the 'abc' content is published to 'test_queue' and consumed by the worker.
Is there a way for the worker to consume something from a queue that was not published by Celery? For example, when I put something in test_queue directly through RabbitMQ, without going through the Celery publisher, and then run the Celery worker, it gives me the following warning:
WARNING/MainProcess] Received and deleted unknown message. Wrong destination?!?
The full contents of the message body was: body: 'abc' (3b)
{content_type:None content_encoding:None
delivery_info:{'exchange': '', 'redelivered': False, 'delivery_tag': 1, 'consumer_tag': 'None2', 'routing_key': 'test_queue'} headers={}}
Is there a way to solve this?
Celery messages follow a specific format, with a set of headers that must be present for a message to be compliant. Therefore you would have to reverse-engineer that format to produce a Celery-compliant message without Celery.
Keep in mind that Celery isn't really made for passing arbitrary messages through the broker, but for sending tasks, which are enriched messages and therefore carry extra fields in the headers of the AMQP message.
It's a late answer, but custom consumers might help you. I'm using them to consume messages from RabbitMQ that are published by another app with pika.
http://docs.celeryproject.org/en/latest/userguide/extending.html#custom-message-consumers
I am currently trying to set up Celery to handle responses from a chatbot and forward those responses to a user.
The chatbot hits the /response endpoint of my server, which triggers the following function in my server.py module:
import tasks

def handle_response(user_id, message):
    """Endpoint to handle the response from the chatbot."""
    tasks.send_message_to_user.apply_async(args=[user_id, message])
    return ('OK', 200,
            {'Content-Type': 'application/json; charset=utf-8'})
In my tasks.py file, I import Celery and create the send_message_to_user task:
from celery import Celery

celery_app = Celery('tasks', broker='redis://')

@celery_app.task(name='send_message_to_user')
def send_message_to_user(user_id, message):
    """Send the message to a user."""
    # Here is the logic to send the message to a specific user
My problem is that the chatbot may send several messages in a row to the same user: the send_message_to_user tasks are properly put in the queue, but then a race condition arises and sometimes the messages reach the user in the wrong order.
How could I make each send_message_to_user task wait for the previous task with the same name and the same user_id argument before executing?
I have looked at the thread Running "unique" tasks with celery, but a lock isn't the solution for me, as I don't want to implement ugly retries while waiting for the lock to be released.
Does anyone have an idea how to solve this issue in a clean(-ish) way?
Also, it's my first post here so I'm open to any suggestions to improve my request.
Thanks!
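One possible way to keep per-user ordering without locks, sketched under stated assumptions: always send a given user's messages to the same queue, and consume each such queue with a single-process worker (`--concurrency=1`), so tasks from one queue run one at a time, in order. Only the routing part is shown; the queue prefix and the number of queues are hypothetical choices. Ordering can still break if a task is retried, because the retry is published behind any later messages.

```python
import zlib

NUM_USER_QUEUES = 4  # assumption: one single-process worker per queue
QUEUE_PREFIX = "user_messages_"  # hypothetical queue name prefix


def queue_for_user(user_id, num_queues=NUM_USER_QUEUES):
    """Map a user id to a fixed queue name.

    zlib.crc32 is stable across processes (unlike hash() on str, which is
    randomised per interpreter), so every producer picks the same queue.
    """
    bucket = zlib.crc32(str(user_id).encode("utf-8")) % num_queues
    return f"{QUEUE_PREFIX}{bucket}"


# In handle_response, route the task explicitly:
#   tasks.send_message_to_user.apply_async(
#       args=[user_id, message], queue=queue_for_user(user_id))
#
# And run one worker per queue, each with a single process, e.g.:
#   celery -A tasks worker -Q user_messages_0 --concurrency=1
```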
I'm having an issue logging to Sentry from within a Celery task. Errors in tasks work fine. However, when I try to log an event manually, it gets logged to the Celery logs, but not to the Sentry server.
The code I'm using is:
import logging
from raven.handlers.logging import SentryHandler

@task
def myWorker():
    logger = logging.getLogger('celery.task')
    logger.addHandler(SentryHandler())
    logger.warning("Some condition happened", exc_info=True, extra={'extra': 'data'})
I've found some posts on here and around the net about this, but they all seem to be very out of date.