Celery task not received by worker when two projects

Celery task not received by worker when two projects - python

My colleague has written celery tasks, necessary configuration in settings file, also supervisors config file. Everything is working perfectly fine. The projects is handed over to me and I seeing some issues that I have to fix.
There are two projects running on a single machine, both projects are almost same, lets call them projA and projB.
supervisord.conf file is as:
;for projA
[program:celeryd]
directory=/path_to_projA/
command=celery -A project worker -l info
...
[program:celerybeat]
directory=/path_to_projA/
command=celery -A project beat -l info
...
; For projB
[program:celerydB]
directory=/path_to_projB/
command=celery -A project worker -l info
...
[program:celerybeatB]
directory=/path_to_projB/
command=celery -A project beat -l info
...
The issue is, I am creating tasks through a loop and only one task is received from celeryd of projA, and remaining task are not in received (or could be received by celeryd of projB).
But when I stop celery programs for projB everything works well. Please note, the actual name of django-app is project hence celery -A project worker/beat -l info.
Please bare, I am new to celery, any help is appreciated. TIA.

As the Celery docs says,
Celery is an asynchronous task queue/job queue based on distributed message passing.
When multiple tasks are created through a loop, tasks are evenly distributed to two different workers ie worker of projA and worker of projB since your workers are same.
If projects are similar or as you mentioned almost same, you can use Celery Queue but of course your queues across projects should be different.
Celery Docs for the same is provided here.
You need to set CELERY_DEFAULT_QUEUE, CELERY_DEFAULT_ROUTING_KEY and CELERY_QUEUES
in your settings.py file.
And your supervisor.conf file needs queue name in the commands line for all the programs.
For Ex: command=celery -A project beat -l info -Q <queue_name>
And that should work, based on my experience.

Related

How do celery workers communicate in Heroku

I have some celery workers in a Heroku app. My app is using python3.6and django, these are the relevant dependencies and their versions:
celery==3.1.26.post2
redis==2.10.3
django-celery==3.2.2
I do not know if the are useful to this question, but just in case. On Heroku we are running the Heroku-18 stack.
As it's usual, we have our workers declared in a Procfile, with the following content:
web: ... our django app ....
celeryd: python manage.py celery worker -Q celery --loglevel=INFO -O fair
one_type_of_worker: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
another_type: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
So, my current understanding of this process is the following:
Our celery queues run on multiple workers, each worker runs as a dyno on Heroku (not a server, but a “worker process” kind of thing, since servers aren’t a concept on Heroku). We also have multiple dynos running the same celery worker with the same queue, which results in multiple parallel “threads” for that queue to run more tasks simultaneously (scalability).
The web workers, celery workers, and celery queues can talk to each other because celery manages the orchestration between them. I think it's specifically the broker that handles this responsibility. But for example, this lets our web workers schedule a celery task on a specific queue and it is routed to the correct queue/worker, or a task running in one queue/worker can schedule a task on a different queue/worker.
Now here is when comes my question, so does the worker communicate? Do they use an API endpoint in localhost with a port? RCP? Do they use the broker url? Magic?
I'm asking this because I'm trying to replicate this setup in ECS and I need to know how to set it up for celery.

Here you go to know how celery works at heroku: https://devcenter.heroku.com/articles/celery-heroku
You can't run celery on Heroku without getting a Heroku dyno for celery. Also, make sure you have Redis configured on your Django celery settings.
to run the celery on Heroku, you just add this line to your Procfile
worker: celery -A YOUR-PROJECT_NAME worker -l info -B
Note: above celery commands will run both celery worker and celery beat
If you want to run it separately, you can use separate commands but one command is recommended

How to set up celery persistent state database

Celery task revocation is stored in the memory, so it will not persist when worker is restarted.
In Celery documentation it can be persisted using command celery -A proj worker -l info --statedb=/var/run/celery/worker.state
http://celery.readthedocs.io/en/latest/userguide/workers.html#worker-persistent-revokes
but when I run the command, I got error file not found, so I created the file, I ran the command again but then it tells me db type could not be determined.
I try to lookup how to set the persistent database to use in celery but got no results. Any help will be apreciated

So it turns out, I have to create the directory first and celery worker should be permitted creating a file in that directory.
My solution was to create celery directory in the project then run command:
celery -A proj worker -l info --statedb=celery/working.state
and it works

Django, Django Dynamic Scraper, Djcelery and Scrapyd - Not Sending Tasks in Production

I'm using Django Dynamic Scraper to build a basic web scraper. I have it 99% of the way finished. It works perfectly in development alongside Celery and Scrapyd. Tasks are sent and fulfilled perfectly.
As for production I'm pretty sure I have things set up correctly:
I'm using Supervisor to run Scrapyd and Celery on my VPS. They are both pointing at the correct virtualenv installations etc...
Here's how I know they're both set up fine for the project: When I SSH into my server and use the manage.py shell to execute a celery task, it returns an Async task which is then executed. The results appear in the database and both my scrapyd and celery log show the tasks being processed.
The issue is that my scheduled tasks are not being fired automatically - despite working perfectly find in development.
# django-celery settings
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
And my Supervisor configs:
Celery Config:
[program:IG_Tracker]
command=/home/dean/Development/IG_Tracker/venv/bin/celery --
app=IG_Tracker.celery:app worker --loglevel=INFO -n worker.%%h
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
killasgroup=true
priority=998
Scrapyd Config:
[program:scrapyd]
directory=/home/dean/Development/IG_Tracker/instagram/ig_scraper
command=/home/dean/Development/IG_Tracker/venv/bin/scrapyd
environment=MY_SETTINGS=/home/dean/Development/IG_Tracker/IG_Trackersettings.py
user=dean
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
stderr_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
startsecs=10
I have followed the docs as close as I could and used the recommended tools for deployment (eg. scrapyd-deploy etc...). Additionally, when I run celery and scrapyd manually on the server (as one would in development) things work fine. It's just when the two are run using supervisor.
I'm probably missing some setting or another which is preventing my celery tasks stored in the SQLite DB from being picked up and ran automatically by celery/scrapyd when in production.

Okay - so I eventually got this working. Maybe this can help someone else. My issue was that I only had ONE supervisor process for celery where as it needs two - one for actually running the tasks (worker) and another for supervising the scheduling. I only had the worker. This explains why everything worked fine when I fired off a task using the django shell (essentially manually passing a task to the worker).
Here is my conf file for the 'scheduler' celery process:
[program:celery_beat]
command=/home/dean/Development/IG_Tracker/venv/bin/celery beat -A
IG_Tracker --loglevel=INFO
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs = 600
killasgroup=true
priority=998
I added that and ran:
supervisorctl reread
supervisorctl update
supervisotctl restart all
My tasks began running right away.

Running two celery workers in a server for two django application

I have a server in which two django application are running appone, apptwo
for them, two celery workers are started with commands:
celery worker -A appone -B --loglevel=INFO
celery worker -A apptwo -B --loglevel=INFO
Both points to same BROKER_URL = 'redis://localhost:6379'
redis is setup with db 0 and 1
I can see the task configured in these two apps in both app's log, which is leading to warnings and errors.
Can we configure in django settings such that the celery works exclusively without interfering with each other's tasks?

You can route tasks to different queues. Start Celery with two different -Q myqueueX and then use different CELERY_DEFAULT_QUEUE in your two Django projects.
Depending on your Celery configuration, your Django setting should look something like:
CELERY_DEFAULT_QUEUE = 'myqueue1'
You can also have more fine grained control with:
#celery.task(queue="myqueue3")
def some_task(...):
pass
More options here:
How to keep multiple independent celery queues?

Do not create pidfile and logfile when running celery worker as a daemon

While running the celery working using the following command creates two files w1.log and w1.pid which I do not want.
celery multi start w1 -A destiPak.celery -l info
Output
celery multi v3.1.20 (Cipater)
> Starting nodes...
> w1#foo-bar: OK
Show worker
celery multi show w1
Output
/Users/foo/bar/bin/python -m celery worker --detach -n w1#foo-bar --pidfile=w1.pid --logfile=w1.log --executable=/Users/foo/bar/bin/python
Suggestions, how to avoid creating those log file while Running the worker as a daemon

It's hard to understand why you would not want these files - the pid file is only a few bytes and the log files will contain useful information, and you can use logrotate or whatever to ensure that they do not take up too much space.
That said, if you use supervisord to manage the workers instead of celery multi you can configure it to not generate log files, plus it does not use .pid files.
here's a supervisord config file that should do what you want
[program:celery]
command=celery worker -n w1#foo-bar
autostart=true
stdout_logfile=/dev/null
redirect_stderr=true

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.