Celery Worker is offline - python

I am running a celery worker
foreman run python manage.py celery worker -E --maxtasksperchild=1000
And a celerymon
foreman run python manage.py celerymon
as well as celerycam
foreman run python manage.py celerycam
The django admin shows that my worker is offline and all tasks remain in the delayed state. I have tried killing and restarting it several times but it does not seem to be online.
Here is my configuration
BROKER_TRANSPORT = 'amqplib'
BROKER_POOL_LIMIT=0
BROKER_CONNECTION_MAX_RETRIES = 0
BROKER_URL = os.environ.get('AMQP_URL')
CELERY_RESULT_BACKEND = 'database'
CELERY_TASK_RESULT_EXPIRES = 14400

Related

How do celery workers communicate in Heroku

I have some celery workers in a Heroku app. My app is using python3.6and django, these are the relevant dependencies and their versions:
celery==3.1.26.post2
redis==2.10.3
django-celery==3.2.2
I do not know if the are useful to this question, but just in case. On Heroku we are running the Heroku-18 stack.
As it's usual, we have our workers declared in a Procfile, with the following content:
web: ... our django app ....
celeryd: python manage.py celery worker -Q celery --loglevel=INFO -O fair
one_type_of_worker: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
another_type: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
So, my current understanding of this process is the following:
Our celery queues run on multiple workers, each worker runs as a dyno on Heroku (not a server, but a “worker process” kind of thing, since servers aren’t a concept on Heroku). We also have multiple dynos running the same celery worker with the same queue, which results in multiple parallel “threads” for that queue to run more tasks simultaneously (scalability).
The web workers, celery workers, and celery queues can talk to each other because celery manages the orchestration between them. I think it's specifically the broker that handles this responsibility. But for example, this lets our web workers schedule a celery task on a specific queue and it is routed to the correct queue/worker, or a task running in one queue/worker can schedule a task on a different queue/worker.
Now here is when comes my question, so does the worker communicate? Do they use an API endpoint in localhost with a port? RCP? Do they use the broker url? Magic?
I'm asking this because I'm trying to replicate this setup in ECS and I need to know how to set it up for celery.
Here you go to know how celery works at heroku: https://devcenter.heroku.com/articles/celery-heroku
You can't run celery on Heroku without getting a Heroku dyno for celery. Also, make sure you have Redis configured on your Django celery settings.
to run the celery on Heroku, you just add this line to your Procfile
worker: celery -A YOUR-PROJECT_NAME worker -l info -B
Note: above celery commands will run both celery worker and celery beat
If you want to run it separately, you can use separate commands but one command is recommended

Running Celery as a Flask app with Gunicorn

I'm running Celery as a Flask microservice where it has tasks.py with tasks and manage.py contains the call to run the flask server.
This is part of the manage.py
class CeleryWorker(Command):
"""Starts the celery worker."""
name = 'celery'
capture_all_args = True
def run(self, argv):
if "down" in argv:
ret = subprocess.call(
['pkill', '-9', '-f', "my_app.celery"])
sys.exit(ret)
else:
ret = subprocess.call(
['celery', 'worker', '-A', 'my_app.celery'] + argv)
sys.exit(ret)
manager.add_command("celery", CeleryWorker())
I can start the service with either python manage.py runserver or `celery worker -A my_app.celery and it runs perfectly and registers all tasks in tasks.py.
But in production, i want to handle multiple requests to this microservice and want to add gunicorn to serve those requests. How do i do it?
I'm not able to figure out how i can run both my gunicorn command and celery command together.
Also, i'm running other api services using gunicorn in production from its create_app, since i dont need them to run the celery command.
Recommend to use Supervisor, which allow you to control a number of processes.
step1: pip install supervisor
step2: vi supervisord.conf
[program:flask_wsgi]
command=gunicorn -w 3 --worker-class gevent wsgi:app
directory=$SRC_PATH
autostart=true
[program:celery]
command=celery worker -A app.celery --loglevel=info
directory=$SRC_PATH
autostart=true
step3: run supervisord -c supervisord.conf

Django, Django Dynamic Scraper, Djcelery and Scrapyd - Not Sending Tasks in Production

I'm using Django Dynamic Scraper to build a basic web scraper. I have it 99% of the way finished. It works perfectly in development alongside Celery and Scrapyd. Tasks are sent and fulfilled perfectly.
As for production I'm pretty sure I have things set up correctly:
I'm using Supervisor to run Scrapyd and Celery on my VPS. They are both pointing at the correct virtualenv installations etc...
Here's how I know they're both set up fine for the project: When I SSH into my server and use the manage.py shell to execute a celery task, it returns an Async task which is then executed. The results appear in the database and both my scrapyd and celery log show the tasks being processed.
The issue is that my scheduled tasks are not being fired automatically - despite working perfectly find in development.
# django-celery settings
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
And my Supervisor configs:
Celery Config:
[program:IG_Tracker]
command=/home/dean/Development/IG_Tracker/venv/bin/celery --
app=IG_Tracker.celery:app worker --loglevel=INFO -n worker.%%h
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
killasgroup=true
priority=998
Scrapyd Config:
[program:scrapyd]
directory=/home/dean/Development/IG_Tracker/instagram/ig_scraper
command=/home/dean/Development/IG_Tracker/venv/bin/scrapyd
environment=MY_SETTINGS=/home/dean/Development/IG_Tracker/IG_Trackersettings.py
user=dean
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
stderr_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
startsecs=10
I have followed the docs as close as I could and used the recommended tools for deployment (eg. scrapyd-deploy etc...). Additionally, when I run celery and scrapyd manually on the server (as one would in development) things work fine. It's just when the two are run using supervisor.
I'm probably missing some setting or another which is preventing my celery tasks stored in the SQLite DB from being picked up and ran automatically by celery/scrapyd when in production.
Okay - so I eventually got this working. Maybe this can help someone else. My issue was that I only had ONE supervisor process for celery where as it needs two - one for actually running the tasks (worker) and another for supervising the scheduling. I only had the worker. This explains why everything worked fine when I fired off a task using the django shell (essentially manually passing a task to the worker).
Here is my conf file for the 'scheduler' celery process:
[program:celery_beat]
command=/home/dean/Development/IG_Tracker/venv/bin/celery beat -A
IG_Tracker --loglevel=INFO
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs = 600
killasgroup=true
priority=998
I added that and ran:
supervisorctl reread
supervisorctl update
supervisotctl restart all
My tasks began running right away.

Django Celery Periodic Task not running (Heroku)?

I have a Django application that I've deployed with Heroku. I'm trying to user celery to create a periodic task every minute. However, when I observe the logs for the worker using the following command:
heroku logs -t -p worker
I don't see my task being executed. Perhaps there is a step I'm missing? This is my configuration below...
Procfile
web: gunicorn activiist.wsgi --log-file -
worker: celery worker --app=trending.tasks.app
Tasks.py
import celery
app = celery.Celery('activiist')
import os
from celery.schedules import crontab
from celery.task import periodic_task
from django.conf import settings
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
app.conf.update(BROKER_URL=os.environ['REDIS_URL'],
CELERY_RESULT_BACKEND=os.environ['REDIS_URL'])
os.environ['DJANGO_SETTINGS_MODULE'] = 'activiist.settings'
from trending.views import *
#periodic_task(run_every=crontab())
def add():
getarticles(30)
One thing to add. When I run the task using the python shell and the "delay()" command, the task does indeed run (it shows in the logs) -- but it only runs once and only when executed.
You need separate worker for the beat process (which is responsible for executing periodic tasks):
web: gunicorn activiist.wsgi --log-file -
worker: celery worker --app=trending.tasks.app
beat: celery --app=trending.tasks.app
Worker isn't necessary for periodic tasks so the relevant line can be omitted. The other possibility is to embed beat inside the worker:
web: gunicorn activiist.wsgi --log-file -
worker: celery worker --app=trending.tasks.app -B
but to quote the celery documentation:
You can also start embed beat inside the worker by enabling workers -B option, this is convenient if you will never run more than one worker node, but it’s not commonly used and for that reason is not recommended for production use

Celery beat doesn't run periodic task

I have task
class BasecrmSync(PeriodicTask):
run_every = schedules.crontab(minute='*/1')
def run(self, **kwargs):
bc = basecrm.Client(access_token=settings.BASECRM_AUTH_TOKEN)
sync = basecrm.Sync(client=bc, device_uuid=settings.BASECRM_DEVICE_UUID)
sync.fetch(synchronize)
And celery config with db broker
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
BROKER_URL = 'django://'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
I run
celery -A renuval_api worker -B --loglevel=debug
But it doesn't run task...
Also I've tried run by
python3 manage.py celery worker --loglevel=DEBUG -E -B -c 1 --settings=renuval_api.settings.local
But It uses amqp transport and I can't understand why.
I run a separate process for the beat function itself. I could never get periodic tasks to fire otherwise. Of course, I may have this completely wrong, but it works for me and has for some time.
For example, I have the celery worker with its app running in one process:
celery worker --app=celeryapp:app -l info --logfile="/var/log/celery/worker.log"
And I have the beat pointed to the same app in its own process:
celery --app=celeryapp:app beat
They are pointed at the same app and settings, and beat fires off the task which the worker picks up and does. This app is in the same code tree as my Django apps, but the processes are not running in Django. Perhaps you could run something like:
python3 manage.py celery beat --loglevel=DEBUG -E -B -c 1 --settings=renuval_api.settings.local
I hope that helps.

Categories