I'm using Django, Celery and RabbitMQ. I have a simple task that sends emails. The task works, but it's very slow.
For example, when I send 5000 emails, all 5000 messages go straight to RabbitMQ as normal, but once in the message broker it then takes around 30 minutes to complete and clear all the tasks.
Without Celery, the same work takes just a few minutes to process all 5000 emails.
Have I misconfigured something? It would be very helpful if someone could spot my speed issue.
task.py
class SendMessage(Task):
    name = "Sending SMS"
    max_retries = 10
    default_retry_delay = 3

    def run(self, message_id, gateway_id=None, **kwargs):
        logging.debug("About to send a message.")
        try:
            message = Message.objects.get(pk=message_id)
        except Exception as exc:
            raise SendMessage.retry(exc=exc)

        if not gateway_id:
            if hasattr(message.billee, 'sms_gateway'):
                gateway = message.billee.sms_gateway
            else:
                gateway = Gateway.objects.all()[0]
        else:
            gateway = Gateway.objects.get(pk=gateway_id)

        account = Account.objects.get(user=message.sender)

        if account._balance() >= message.length:
            response = gateway._send(message)
            if response.status == 'Sent':
                # Take credit from the user's account.
                transaction = Transaction(
                    account=account,
                    amount=-message.charge,
                )
                transaction.save()
                message.billed = True
                message.save()
            else:
                pass
settings.py
# Celery
BROKER_URL = 'amqp://admin:xxxxxx@xx.xxx.xxx.xxx:5672//'
CELERY_SEND_TASK_ERROR_EMAILS = True
Apache config
<VirtualHost *:80>
    ServerName www.domain.com
    DocumentRoot /srv/project/domain
    WSGIDaemonProcess domain.com processes=2 threads=15 display-name=%{GROUP}
    WSGIProcessGroup domain.com
    WSGIScriptAlias / /srv/project/domain/apache/django.wsgi
    ErrorLog /srv/project/logs/error.log
</VirtualHost>
celeryd config
# Name of nodes to start, here we have a single node
#CELERYD_NODES="w1"
# or we could have three nodes:
CELERYD_NODES="w1 w2 w3"
# Where to chdir at start.
CELERYD_CHDIR="/srv/project/domain"
# How to call "manage.py celeryd_multi"
CELERYD_MULTI="$CELERYD_CHDIR/manage.py celeryd_multi"
# How to call "manage.py celeryctl"
CELERYCTL="$CELERYD_CHDIR/manage.py celeryctl"
# Extra arguments to celeryd
CELERYD_OPTS="--time-limit=900 --concurrency=8"
# %n will be replaced with the nodename.
CELERYD_LOG_FILE="/srv/project/logs/celery/%n.log"
CELERYD_PID_FILE="/srv/project/celery/%n.pid"
# Workers should run as an unprivileged user.
CELERYD_USER="root"
CELERYD_GROUP="root"
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="domain.settings"
# Celery Beat Settings.
# Where to chdir at start.
CELERYBEAT_CHDIR="/srv/project/domain"
# Path to celerybeat
CELERYBEAT="$CELERYBEAT_CHDIR/manage.py celerybeat"
You are processing ~2.78 tasks/second (5000 tasks in 30 minutes), which I agree isn't that high. You have 3 nodes, each running with a concurrency of 8, so you should be able to process 24 tasks in parallel.
Things to check:
CELERYD_PREFETCH_MULTIPLIER - This is set to 4 by default, but if you have lots of short tasks it can be worthwhile to increase it. It reduces the impact of the round-trips needed to fetch messages from the broker, at the cost that tasks will not be distributed as evenly across workers (see the settings sketch after this list).
DB connection/queries - I count 5+ DB queries being executed for the successful case. If you are using the default result backend for django-celery, there are additional queries for storing the task result in the DB. django-celery will also close and reopen the DB connection after each task, which adds some overhead. If you have 5 queries and each one takes 100ms, then your task will take at least 500ms with or without celery. Running the queries by themselves is one thing, but you also need to ensure that nothing in your task is locking the table/rows and preventing other tasks from running efficiently in parallel (see the query sketch after this list).
Gateway response times - Your task appears to call a remote service, which I'm assuming is an SMS gateway. If that server is slow to respond, then your task will be slow. The response times might also differ between a single call and doing this at peak load. In the US, long-code SMS can only be sent at a rate of 1 per second, and depending on where the gateway does that rate limiting, it might be slowing down your task.
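A minimal sketch of what the prefetch tweak could look like in your settings.py (the value of 10 is only an illustration, not a recommendation; tune it for your workload):
# settings.py
# Default is 4; a higher value lets each worker fetch more of these short
# tasks per round-trip to the broker, at the cost of less even distribution.
CELERYD_PREFETCH_MULTIPLIER = 10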
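For the query count, something like select_related can collapse several of those lookups into a single query. A rough sketch against the models from the question, assuming billee and sender are foreign-key (or one-to-one) relations; adjust to your actual schema:
# Fetch the message together with its related objects in one query
# instead of several separate lookups (relation names assumed from the question).
message = (Message.objects
           .select_related('billee__sms_gateway', 'sender')
           .get(pk=message_id))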
Related
Problem
I have an app that uses nginx to serve my Python Flask app in production, and after only a few requests it starts locking up and timing out (it will serve the first request or two quickly, then start timing out and locking up afterwards). The nginx app is served via Docker; the uwsgi Python app is served on barebones macOS (this Python app interfaces with the Docker instance running on the OS itself); the routing occurs via Traefik.
Findings
This problem only occurs in production, and the only difference there is that I'm using Traefik's LetsEncrypt SSL certs to protect the API with HTTPS. I've narrowed the problem down to the following two docker-compose config lines (when they're present the problem persists; when they're removed the problem goes away, but SSL is no longer enabled):
- "traefik.http.routers.harveyapi.tls=true"
- "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"
Once locked up, I must restart the uwsgi processes to fix the problem, only to have it lock right back up. Restarting nginx (the Docker container) doesn't fix the problem, which leads me to believe that uwsgi doesn't like the SSL config I'm using. Once I disable SSL support, I can send 2000 requests to the API and have it take only a second or two. Once it's enabled again, uwsgi can't even respond to 2 requests.
Desired Outcome
I'd like to be able to support SSL certs to enforce HTTPS connections to this API. I can currently run HTTP with this setup fine (thousands of concurrent connections) but that breaks when trying to use HTTPS.
Configs
I host dozens of other PHP sites with near-identical setups. The only difference between those projects and this one is that they run PHP in Docker, while this runs Python uwsgi on barebones macOS. Here is the complete dump of configs for this project:
traefik.toml
# Traefik v2 Configuration
# Documentation: https://doc.traefik.io/traefik/migration/v1-to-v2/
[entryPoints]
# http should be redirected to https
[entryPoints.web]
address = ":80"
[entryPoints.web.http.redirections.entryPoint]
to = "websecure"
scheme = "https"
[entryPoints.websecure]
address = ":443"
[entryPoints.websecure.http.tls]
certResolver = "letsencrypt"
# Enable ACME (Let's Encrypt): automatic SSL
[certificatesResolvers.letsencrypt.acme]
email = "email#example.com"
storage = "/etc/traefik/acme/acme.json"
[certificatesResolvers.letsencrypt.acme.httpChallenge]
entryPoint = "web"
[log]
level = "DEBUG"
# Enable Docker Provider
[providers.docker]
endpoint = "unix:///var/run/docker.sock"
exposedByDefault = false # Must pass `traefik.enable=true` label to use Traefik
network = "traefik"
# Enable Ping (used for healthcheck)
[ping]
docker-compose.yml
version: "3.8"
services:
harvey-nginx:
build: .
restart: always
networks:
- traefik
labels:
- traefik.enable=true
labels:
- "traefik.http.routers.harveyapi.rule=Host(`project.com`, `www.project.com`)"
- "traefik.http.routers.harveyapi.tls=true"
- "traefik.http.routers.harveyapi.tls.certresolver=letsencrypt"
networks:
traefik:
name: traefik
uwsgi.ini
[uwsgi]
; uwsgi setup
master = true
memory-report = true
auto-procname = true
strict = true
vacuum = true
die-on-term = true
need-app = true
; concurrency
enable-threads = true
cheaper-initial = 5 ; workers to spawn on startup
cheaper = 2 ; minimum number of workers to go down to
workers = 10 ; highest number of workers to run
; workers
harakiri = 60 ; Restart workers if they have hung on a single request
max-requests = 500 ; Restart workers after this many requests
max-worker-lifetime = 3600 ; Restart workers after this many seconds
reload-on-rss = 1024 ; Restart workers after this much resident memory
reload-mercy = 3 ; How long to wait before forcefully killing workers
worker-reload-mercy = 3 ; How long to wait before forcefully killing workers
; app setup
protocol = http
socket = 127.0.0.1:5000
module = wsgi:APP
; daemonization
; TODO: Name processes `harvey` here
daemonize = /tmp/harvey_daemon.log
nginx.conf
server {
    listen 80;
    error_log /var/log/nginx/error.log;
    access_log /var/log/nginx/access.log;

    location / {
        include uwsgi_params;
        # TODO: Please note this only works for macOS: https://docs.docker.com/desktop/networking/#i-want-to-connect-from-a-container-to-a-service-on-the-host
        # and will require adjusting for your OS.
        proxy_pass http://host.docker.internal:5000;
    }
}
Dockerfile
FROM nginx:1.23-alpine
RUN rm /etc/nginx/conf.d/default.conf
COPY nginx.conf /etc/nginx/conf.d
Additional Context
I've added additional findings on the GitHub issue where I've documented my journey for this problem: https://github.com/Justintime50/harvey/issues/67
This is no longer a problem, and the solution is really frustrating: it was Docker's fault. For ~6 months there was a bug in Docker that was dropping connections (ultimately leading to the timeouts mentioned above), which was finally fixed in Docker Desktop 4.14.
The moment I upgraded Docker (it had just come out at the time and I thought I would try the hail-Mary upgrade, having already turned every dial and adjusted every config param without any luck), it finally stopped timing out and dropping connections. I was suddenly able to send through tens of thousands of concurrent requests without issue.
TLDR: Neither uWSGI, nginx, nor my config were at fault here. Docker had a bug that has since been patched. If others on macOS are facing this problem, try upgrading to at least Docker Desktop 4.14.
I have an application on Flask and uWSGI with a jobstore in SQLite. I start the scheduler along with the application, and add new tasks through add_job when some URL is visited.
I can see that the tasks are saved correctly in the jobstore, and I can view them through the API, but they do not execute at the appointed time.
A few important details:
uwsgi.ini
processes = 1
enable-threads = true
__init__.py
scheduler = APScheduler()
scheduler.init_app(app)

with app.app_context():
    scheduler.start()
main.py
scheduler.add_job(
    id='{}{}'.format(test.id, g.user.id),
    func=pay_day,
    args=[test.id, g.user.id],
    trigger='interval',
    minutes=test.timer
)
in service.py
def pay_day(tid, uid):
    with scheduler.app.app_context():
        # *some code here*
Interesting behavior: if you create a task by visiting the URL and restart the application after that, the task will be executed. But if the application is running and one of the users creates a task by visiting the URL, then that task will not run until the application is restarted.
I don't get any errors or exceptions, even in the scheduler logs.
I'm out of ideas about how to make it work or what I did wrong. I need a hint.
uWSGI employs some tricks which disable the Global Interpreter Lock, and with it the use of threads, which are vital to the operation of APScheduler. To fix this, you need to re-enable the GIL using the --enable-threads switch. See the uWSGI documentation for more details.
I know that you have enable-threads = true in uwsgi.ini, but try enabling it from the command line as well.
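For example, something like this (a sketch, assuming your config file is named uwsgi.ini):
uwsgi --ini uwsgi.ini --enable-threads
If the scheduler starts firing jobs once the flag is passed explicitly, you know the ini setting wasn't being picked up.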
I keep getting [ERROR/MainProcess] consumer: Cannot connect to amqp://ec2celeryuser... when I run celery -A tasks worker in the terminal.
Basically what I'm trying to do is get celery/rabbitmq working properly across two EC2 instances, so that a silly task in tasks.py gets passed to rabbitmq for processing.
Instance 1 - Houses RabbitMQ
This currently runs RabbitMQ fine. If I run sudo rabbitmqctl status it outputs:
Status of node 'rabbit@ip-xx-xxx-xxx-xx' ...
[{pid,786},
Instance 2 - Houses Celery
I'm trying to run celery on instance 2 against Instance 1 using the following in terminal:
celery -A tasks worker
I have a file celeryconfig.py:
BROKER_URL = 'amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1/'
#CELERY SETTINGS
CELERY_IMPORTS = ("tasks",)
CELERY_RESULT_BACKEND = "amqp"
I have a file client.py:
from tasks import add
result = add.delay(4, 4) # call task
result_sum = result.get(timeout=5) # wait to get result for a maximum of 5 seconds
I have a file tasks.py:
from celery import Celery

app = Celery('tasks', broker='amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1/')

@app.task
def add(x, y):
    return x + y
I've properly set up a vhost and a user ec2celeryuser, and given this user permissions with:
sudo rabbitmqctl set_permissions -p /celeryserver1 ec2celeryuser ".*" ".*" ".*"
If I do sudo rabbitmqctl list_users on RabbitMQ (instance 1), it shows:
ec2celeryuser []
guest [administrator]
I've tried both usernames with their passwords, but no change.
I've been following the Celery guide and a tutorial, without much luck.
What am I doing wrong here? Clearly there is a connection issue, but what am I doing wrong?
Thank you!
Thanks to user natdempk for helping me fix the configuration syntax of my queues.
The issue was creating a vhost in rabbitmq like:
sudo rabbitmqctl add_vhost /celeryserver1
when it should have been:
sudo rabbitmqctl add_vhost celeryserver1
I then had to reset the permissions for my user ec2celeryuser like:
sudo rabbitmqctl set_permissions -p celeryserver1 ec2celeryuser ".*" ".*" ".*"
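For reference, the broker URL then needs to point at that same vhost, with no leading or trailing slash around the vhost name. A sketch reusing the placeholder host and credentials from the question:
BROKER_URL = 'amqp://ec2celeryuser:mypasshere@xx.xxx.xx.xx:5672/celeryserver1'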
The way I realized this was the issue: I checked /var/log/rabbitmq/<last log file.log> and saw:
=INFO REPORT==== 30-Apr-2014::12:45:58 ===
accepted TCP connection on [::]:5672 from xx.xxx.xxx.xxx:45964
=INFO REPORT==== 30-Apr-2014::12:45:58 ===
starting TCP connection <x.xxx.x> from xx.xxx.xxx.xxx:45964
=ERROR REPORT==== 30-Apr-2014::12:46:01 ===
exception on TCP connection <x.xxx.x> from xx.xxx.xxx.xxx:45964
{channel0_error,opening,
    {amqp_error,access_refused,
        "access to vhost 'celeryserver1/' refused for user 'ec2celeryuser'",
        'connection.open'}}
Since fixing the vhost, I now pleasantly see:
[2014-04-30 13:08:10,101: WARNING/MainProcess] celery@ip-xx-xxx-xx-xxx ready.
So I see a few things wrong here. First, your broker URL for RabbitMQ in tasks.py doesn't seem correct. It should read something like the below.
app = Celery('tasks', broker='amqp://ec2celeryuser:ec2celerypassword@xx.xxx.xx.xx/celeryserver1/')
Also you might want to specify the app you want celery to serve when you run the worker process. You can do this by running celery -A tasks worker from the directory tasks.py is located in.
Another thing is your code in client.py to call your task seems incorrect. From the celery documentation, you can call the task as follows:
from tasks import add
result = add.delay(4, 4) # call task
result_sum = result.get(timeout=5) # wait to get result for a maximum of 5 seconds
Fixing these might solve your issue, or at least get you closer.
I'm trying to send ~400 HTTP GET requests and collect the results.
I'm running from django.
My solution was to use celery with gevent.
To start the celery tasks I call get_reports:
def get_reports(self, clients, *args, **kw):
    sub_tasks = []
    for client in clients:
        s = self.get_report_task.s(self, client, *args, **kw).set(queue='io_bound')
        sub_tasks.append(s)
    res = celery.group(*sub_tasks)()
    reports = res.get(timeout=30, interval=0.001)
    return reports

@celery.task
def get_report_task(self, client, *args, **kw):
    report = send_http_request(...)
    return report
I use 4 workers:
manage celery worker -P gevent --concurrency=100 -n a0 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a1 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a2 -Q io_bound
manage celery worker -P gevent --concurrency=100 -n a3 -Q io_bound
And I use RabbitMQ as the broker.
And although it works much faster than running the requests sequentially (400 requests took ~23 seconds), I noticed that most of that time was overhead from celery itself, i.e. if I changed get_report_task like this:
@celery.task
def get_report_task(self, client, *args, **kw):
    return []
this whole operation took ~19 seconds.
That means I spend 19 seconds just sending all the tasks to celery and getting the results back.
The queuing rate of messages to RabbitMQ seems to be capped at about 28 messages/sec, and I think this is my bottleneck.
I'm running on a win 8 machine if that matters.
Some of the things I've tried:
using redis as broker
using redis as results backend
tweaking these settings:
BROKER_POOL_LIMIT = 500
CELERYD_PREFETCH_MULTIPLIER = 0
CELERYD_MAX_TASKS_PER_CHILD = 100
CELERY_ACKS_LATE = False
CELERY_DISABLE_RATE_LIMITS = True
I'm looking for any suggestions that will help speed things up.
Are you really running on Windows 8 without a virtual machine? I did the following simple test on a 2-core MacBook with 8GB RAM running OS X 10.7:
import celery
from time import time

@celery.task
def test_task(i):
    return i

grp = celery.group(test_task.s(i) for i in range(400))
tic1 = time(); res = grp(); tac1 = time()
print 'queued in', tac1 - tic1
tic2 = time(); vals = res.get(); tac2 = time()
print 'executed in', tac2 - tic2
I'm using Redis as the broker, Postgres as the result backend, and the default worker with --concurrency=4. Guess what the output is? Here it is:
queued in 3.5009469986
executed in 2.99818301201
Well, it turns out I had 2 separate issues.
First off, the task was a member method. After extracting it out of the class, the time went down to about 12 seconds. I can only assume it has something to do with the pickling of self.
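For illustration, roughly what the extracted version looks like as a module-level function (a sketch reusing the names from the question; send_http_request stands in for the real HTTP call):
@celery.task
def get_report_task(client, *args, **kw):
    # Module-level task: no class instance (self) has to be pickled
    # and shipped along with every message.
    report = send_http_request(...)
    return report
The signature built in get_reports then becomes get_report_task.s(client, *args, **kw), with no explicit self argument.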
The second thing was the fact that it ran on Windows.
After running it on my Linux machine, the run time was less than 2 seconds.
Guess Windows just isn't cut out for high performance.
How about using twisted instead? You can achieve a much simpler application structure. You can send all 400 requests from the django process at once and wait for all of them to finish. This works concurrently because twisted sets the sockets into non-blocking mode and only reads data when it's available.
I had a similar problem a while ago and developed a nice bridge between twisted and django. I've been running it in a production environment for almost a year now. You can find it here: https://github.com/kowalski/featdjango/. In simple words, it has the main application thread running the main twisted reactor loop, and the django view work is delegated to a thread. It uses a special threadpool, which exposes methods to interact with the reactor and use its asynchronous capabilities.
If you use it, your code would look like this:
from twisted.internet import defer
from twisted.web.client import getPage
import threading

def get_reports(self, urls, *args, **kw):
    ct = threading.current_thread()
    defers = list()
    for url in urls:
        # here the Deferred is created which will fire when
        # the call is complete
        d = ct.call_async(getPage, args=[url] + list(args), kwargs=kw)
        # here we keep it for reference
        defers.append(d)

    # here we create a Deferred which will fire when all the
    # constituent Deferreds are completed
    deferred_list = defer.DeferredList(defers, consumeErrors=True)

    # here we tell the current thread to wait until we are done
    results = ct.wait_for_defer(deferred_list)

    # the results is a list of the form (C{bool} success flag, result)
    # below we unpack it
    reports = list()
    for success, result in results:
        if success:
            reports.append(result)
        else:
            # here handle the failure, or just ignore it
            pass

    return reports
This still is something you can optimize a lot. Here, every call to getPage() would create a separate TCP connection and close it when it's done. This is as optimal as it can be, provided that each of your 400 requests is sent to a different host. If this is not the case, you can use an http connection pool, which uses persistent connections and http pipelining. You instantiate it like this:
from feat.web import httpclient
pool = httpclient.ConnectionPool(host, port, maximum_connections=3)
Then a single request is performed like this (this goes in place of the getPage() call):
d = ct.call_async(pool.request, args=(method, path, headers, body))
Question
After running tasks via celery's periodic task scheduler, beat, why do I have so many unconsumed queues remaining in RabbitMQ?
Setup
Django web app running on Heroku
Tasks scheduled via celery beat
Tasks run via celery worker
Message broker is RabbitMQ from CloudAMQP
Procfile
web: gunicorn --workers=2 --worker-class=gevent --bind=0.0.0.0:$PORT project_name.wsgi:application
scheduler: python manage.py celery worker --loglevel=ERROR -B -E --maxtasksperchild=1000
worker: python manage.py celery worker -E --maxtasksperchild=1000 --loglevel=ERROR
settings.py
CELERYBEAT_SCHEDULE = {
    'do_some_task': {
        'task': 'project_name.apps.appname.tasks.some_task',
        'schedule': datetime.timedelta(seconds=60 * 15),
        'args': ''
    },
}
tasks.py
@celery.task
def some_task():
    # Get some data from external resources
    # Save that data to the database
    # No return value specified
    pass
Result
Every time the task runs, I get (via the RabbitMQ web interface):
An additional message in the "Ready" state under my "Queued Messages"
An additional queue with a single message in the "ready" state
This queue has no listed consumers
It ended up being my setting for CELERY_RESULT_BACKEND.
Previously, it was:
CELERY_RESULT_BACKEND = 'amqp'
I no longer had unconsumed messages / queues in RabbitMQ after I changed it to:
CELERY_RESULT_BACKEND = 'database'
What was happening, it would appear, is that after a task was executed, celery was sending info about that task back via rabbitmq, but there was nothing set up to consume these response messages, hence a bunch of unread ones ended up in the queue.
NOTE: This means that celery would be adding database entries recording the outcomes of tasks. To keep my database from getting loaded up with useless messages, I added:
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
Relevant parts from Settings.py
########## CELERY CONFIGURATION
import djcelery
# https://github.com/celery/django-celery/
djcelery.setup_loader()
INSTALLED_APPS = INSTALLED_APPS + (
    'djcelery',
)
# Compress all the messages using gzip
# http://celery.readthedocs.org/en/latest/userguide/calling.html#compression
CELERY_MESSAGE_COMPRESSION = 'gzip'
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-transport
BROKER_TRANSPORT = 'amqplib'
# Set this number to the amount of allowed concurrent connections on your AMQP
# provider, divided by the amount of active workers you have.
#
# For example, if you have the 'Little Lemur' CloudAMQP plan (their free tier),
# they allow 3 concurrent connections. So if you run a single worker, you'd
# want this number to be 3. If you had 3 workers running, you'd lower this
# number to 1, since 3 workers each maintaining one open connection = 3
# connections total.
#
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-pool-limit
BROKER_POOL_LIMIT = 3
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-connection-max-retries
BROKER_CONNECTION_MAX_RETRIES = 0
# See: http://docs.celeryproject.org/en/latest/configuration.html#broker-url
BROKER_URL = os.environ.get('CLOUDAMQP_URL')
# Previously, had this set to 'amqp', this resulted in many read / unconsumed
# queues and messages in RabbitMQ
# See: http://docs.celeryproject.org/en/latest/configuration.html#celery-result-backend
CELERY_RESULT_BACKEND = 'database'
# Delete result records ("tombstones") from database after 4 hours
# http://docs.celeryproject.org/en/latest/configuration.html#celery-task-result-expires
CELERY_TASK_RESULT_EXPIRES = 14400
########## END CELERY CONFIGURATION
Looks like you are getting back responses from your consumed tasks.
You can avoid that by doing:
@celery.task(ignore_result=True)
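Applied to the task from the question, that would look something like this (a sketch, assuming the same djcelery setup as above):
@celery.task(ignore_result=True)
def some_task():
    # Get some data from external resources and save it to the database.
    # With ignore_result=True, nothing is published to the result backend,
    # so no per-task result ends up sitting in RabbitMQ.
    pass
If you never read task results anywhere, the CELERY_IGNORE_RESULT setting can disable them globally instead.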