Celery cannot detect nodes - python

I've started a Celery 3 worker (Redis backend) on my dev machine with a command like:
celery -A tasks worker --loglevel=info -E
(and the celery screen says that events are enabled)
Then I try to get stats for this worker with the command:
celery status
which results in
Error: No nodes replied within time constraint
What can be a possible cause for this?
I've already tried restarting the worker and the machine.

Related

How do celery workers communicate in Heroku

I have some celery workers in a Heroku app. My app uses Python 3.6 and Django; these are the relevant dependencies and their versions:
celery==3.1.26.post2
redis==2.10.3
django-celery==3.2.2
I do not know if they are useful to this question, but just in case. On Heroku we are running the Heroku-18 stack.
As usual, we have our workers declared in a Procfile, with the following content:
web: ... our django app ....
celeryd: python manage.py celery worker -Q celery --loglevel=INFO -O fair
one_type_of_worker: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
another_type: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
So, my current understanding of this process is the following:
Our celery queues run on multiple workers, each worker runs as a dyno on Heroku (not a server, but a “worker process” kind of thing, since servers aren’t a concept on Heroku). We also have multiple dynos running the same celery worker with the same queue, which results in multiple parallel “threads” for that queue to run more tasks simultaneously (scalability).
The web workers, celery workers, and celery queues can talk to each other because celery manages the orchestration between them. I think it's specifically the broker that handles this responsibility. For example, this lets our web workers schedule a celery task on a specific queue and have it routed to the correct queue/worker, and it lets a task running in one queue/worker schedule a task on a different queue/worker.
Now here is my question: how do the workers communicate? Do they use an API endpoint on localhost with a port? RPC? Do they use the broker URL? Magic?
I'm asking this because I'm trying to replicate this setup in ECS and I need to know how to set it up for celery.
This article explains how celery works on Heroku: https://devcenter.heroku.com/articles/celery-heroku
You can't run celery on Heroku without getting a Heroku dyno for celery. Also, make sure you have Redis configured in your Django celery settings.
To run celery on Heroku, just add this line to your Procfile:
worker: celery -A YOUR-PROJECT_NAME worker -l info -B
Note: the command above runs both the celery worker and celery beat.
If you want to run them separately, you can use separate commands, but one command is recommended.
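On the original question of how the workers talk to each other: as far as I know, they do not talk to each other directly. Every process (web dyno or worker dyno) only connects to the broker URL, publishes task messages to a named queue there, and each worker consumes from the queues given with -Q. So on ECS, each container would mainly need the same broker URL and network access to the broker. Below is a toy, in-memory sketch of that model. ToyBroker, send_task and run_worker_once are made-up names for illustration, not Celery APIs.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for Redis/RabbitMQ: one FIFO per queue name."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, queue_name, message):
        self.queues[queue_name].append(message)

    def consume(self, queue_name):
        # Return the oldest message, or None if the queue is empty.
        q = self.queues[queue_name]
        return q.popleft() if q else None


def send_task(broker, queue_name, task_name, *args):
    """What a web process (or another task) does: publish and move on."""
    broker.publish(queue_name, {"task": task_name, "args": args})


def run_worker_once(broker, queue_name, registry):
    """What a worker does: pull one message from its queue and run it."""
    message = broker.consume(queue_name)
    if message is None:
        return None
    func = registry[message["task"]]
    return func(*message["args"])


# Hypothetical task registry; in a real app this is the shared task code.
TASKS = {"add": lambda a, b: a + b}

broker = ToyBroker()
send_task(broker, "celery", "add", 2, 3)              # e.g. from the web dyno
result = run_worker_once(broker, "celery", TASKS)     # e.g. on the celeryd dyno
idle = run_worker_once(broker, "other_queue", TASKS)  # nothing was routed here
print(result, idle)
```

The producer and the worker never reference each other, only the broker and a queue name, which is why adding more dynos (or containers) on the same queue scales out without extra wiring.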

Celery worker stops when console is closed [duplicate]

I am running a celery worker like this:
celery worker --app=portalmq --logfile=/tmp/portalmq.log --loglevel=INFO -E --pidfile=/tmp/portalmq.pid
Now I want to run this worker in the background. I have tried several things, including:
nohup celery worker --app=portalmq --logfile=/tmp/portal_mq.log --loglevel=INFO -E --pidfile=/tmp/portal_mq.pid >> /tmp/portal_mq.log 2>&1 </dev/null &
But it is not working. I have checked the celery documentation, and I found this:
Running the worker as a daemon
Running the celery worker server
This comment is especially relevant:
In production you will want to run the worker in the background as a daemon.
To do this you need to use the tools provided by your platform, or something
like supervisord (see Running the worker as a daemon for more information).
This is too much overhead just to run a process in the background. I would need to install supervisord on my servers and get familiar with it. No go at the moment. Is there a simple way of running a celery worker in the background?
supervisor is really simple and requires very little work to set up; the same applies to celery in combination with supervisor.
It should not take more than 10 minutes to set it up :)
install supervisor with apt-get
create the /etc/supervisor/conf.d/celery.conf config file
paste something like this into the celery.conf file
[program:celery]
directory = /my_project/
command = /usr/bin/python manage.py celery worker
plus (if you need them) some optional and useful settings (with dummy values)
user = celery_user
group = celery_group
stdout_logfile = /var/log/celeryd.log
stderr_logfile = /var/log/celeryd.err
autostart = true
environment=PATH="/some/path/",FOO="bar"
restart supervisor (or run supervisorctl reread; supervisorctl add celery)
after that you get the nice ctl commands to manage the celery process:
supervisorctl start/restart/stop celery
supervisorctl tail [-f] celery [stderr]
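The fragments above can be assembled into one file. Here is a minimal sketch that renders the celery.conf from this answer with Python's configparser. All values are the dummy values used above, and render_supervisor_conf is a hypothetical helper name. The group line is left out of the sketch because I am not certain supervisor accepts it in a program section.

```python
import configparser
import io


def render_supervisor_conf(program, options):
    """Return the text of a supervisor [program:x] section."""
    # interpolation=None keeps any literal '%' in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # preserve the case of option names
    parser[f"program:{program}"] = options
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


conf_text = render_supervisor_conf(
    "celery",
    {
        "directory": "/my_project/",
        "command": "/usr/bin/python manage.py celery worker",
        "user": "celery_user",
        "stdout_logfile": "/var/log/celeryd.log",
        "stderr_logfile": "/var/log/celeryd.err",
        "autostart": "true",
        "environment": 'PATH="/some/path/",FOO="bar"',
    },
)
# Writing to /etc/supervisor/conf.d/celery.conf needs root, so only print here.
print(conf_text)
```

After saving the output as /etc/supervisor/conf.d/celery.conf, the supervisorctl commands shown above apply unchanged.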
celery worker -A app.celery --loglevel=info --detach
For me this one worked, I was using celery with django
celery -A proj_name worker -l INFO --detach
I faced the same problem. A lazy solution is to use & at the end of the command.
For example:
celery worker -A <app>.celery --loglevel=info &
The command below, when executed in a terminal, will start celery as a background process.
celery -A app.celery worker --loglevel=info --detach
In case you want to stop it, run ps aux | grep celery (as Kaiss B. mentioned in a comment on another answer) and then kill -9 <process id> to kill the process.
But first of all you need to install celery:
apt install python-celery-common
Some of you might be wondering why the other, upvoted answers do not work on your system. That is because celery changed the command syntax from
celery worker -A app.celery --loglevel=info --detach
to
celery -A app.celery worker --loglevel=info --detach
Hope that helps.
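If you would rather control the process from a script than rely on --detach or a trailing &, the nohup approach can be sketched in Python with subprocess. This is a sketch under assumptions: worker_command, start_in_background and stop_from_pidfile are hypothetical helper names, the app name and paths are placeholders, and it uses the newer argument order (celery -A app worker ...). Stopping with SIGTERM instead of kill -9 should give the worker a chance to finish running tasks, since Celery documents TERM as a warm shutdown.

```python
import os
import signal
import subprocess


def worker_command(app, loglevel="info", pidfile=None):
    """Build the argv for a worker (newer 'celery -A app worker' order)."""
    cmd = ["celery", "-A", app, "worker", f"--loglevel={loglevel}"]
    if pidfile:
        cmd.append(f"--pidfile={pidfile}")
    return cmd


def start_in_background(cmd, logfile):
    """Start cmd detached from the terminal, roughly like 'nohup cmd &'."""
    log = open(logfile, "ab")
    try:
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # own session: closing the console does not signal it
        )
    finally:
        log.close()  # the child process keeps its own copy of the descriptor


def stop_from_pidfile(pidfile):
    """Send SIGTERM to the process whose pid is stored in pidfile."""
    with open(pidfile) as fh:
        pid = int(fh.read().strip())
    os.kill(pid, signal.SIGTERM)
    return pid


# Only builds and prints the command; nothing is started here.
print(worker_command("app.celery", pidfile="/tmp/celery.pid"))
```

A typical use would be start_in_background(worker_command("app.celery", pidfile="/tmp/celery.pid"), "/tmp/celery.log") to start the worker, and stop_from_pidfile("/tmp/celery.pid") to stop it later.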

How to set up celery persistent state database

Celery task revocations are stored in memory, so they do not persist when the worker is restarted.
According to the Celery documentation, they can be persisted using the command celery -A proj worker -l info --statedb=/var/run/celery/worker.state
http://celery.readthedocs.io/en/latest/userguide/workers.html#worker-persistent-revokes
but when I ran the command, I got a file-not-found error. I created the file and ran the command again, but then it told me the db type could not be determined.
I tried to look up how to set the persistent database to use in celery but got no results. Any help will be appreciated.
So it turns out I have to create the directory first, and the celery worker must have permission to create a file in that directory.
My solution was to create a celery directory in the project and then run the command:
celery -A proj worker -l info --statedb=celery/working.state
and it works
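The fix above can be scripted so the directory exists before the worker starts. This is a minimal sketch; prepare_statedb is a hypothetical helper name and the demo path is a throwaway temp directory. It deliberately creates only the directory, not the state file. As far as I know, Celery opens the state file with the standard shelve module, which would explain why an empty pre-created file produced the "db type could not be determined" error.

```python
import os
import tempfile


def prepare_statedb(path):
    """Create the parent directory of the state file and check it is writable.

    The state file itself is not created; the worker is left to create it.
    Returns the --statedb option to pass to 'celery ... worker'.
    """
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    if not os.access(directory, os.W_OK):
        raise PermissionError(f"worker cannot write to {directory}")
    return f"--statedb={path}"


# Demo in a throwaway directory; in a project this could be celery/worker.state.
with tempfile.TemporaryDirectory() as tmp:
    state_path = os.path.join(tmp, "celery", "worker.state")
    option = prepare_statedb(state_path)
    dir_created = os.path.isdir(os.path.dirname(state_path))
    file_created = os.path.exists(state_path)

print(dir_created, file_created)
```

The check has to run as the same user that runs the worker, otherwise the writability test says nothing about the worker's permissions.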

Airflow celery worker keeps trying to run completed tasks

I recently added a new machine to my Airflow celery cluster (one that is listening on a separate queue).
Everything seemed to be running fine BUT the new worker keeps picking up the same couple of (completed) tasks over and over again. This is invisible from the airflow web interface, which just shows the old tasks as complete and no new tasks being picked up by the worker.
Checking the old task logs gives me messages like the following:
[2018-04-15 04:13:15,374] {base_task_runner.py:95} INFO - Subtask:
[2018-04-15 04:13:15,373] {models.py:1120} INFO - Dependencies not met
for <TaskInstance: my_task 2018-04-13 03:05:00 [success]>, dependency
'Task Instance State' FAILED: Task is in the 'success' state which is
not a valid state for execution. The task must be cleared in order to
be run.
over and over again
I've checked the metadata database and the tasks do show up as 'done'. I've tried restarting Celery, the scheduler, the worker and the servers themselves to no avail. Both the worker and the scheduler are running on UTC timezone as intended.
setup info:
EC2 cluster on AWS
MySQL Celery backend
Airflow 1.8.0
Has anyone ever run into anything like this?

Django, Django Dynamic Scraper, Djcelery and Scrapyd - Not Sending Tasks in Production

I'm using Django Dynamic Scraper to build a basic web scraper. I have it 99% of the way finished. It works perfectly in development alongside Celery and Scrapyd. Tasks are sent and fulfilled perfectly.
As for production I'm pretty sure I have things set up correctly:
I'm using Supervisor to run Scrapyd and Celery on my VPS. They are both pointing at the correct virtualenv installations etc...
Here's how I know they're both set up fine for the project: When I SSH into my server and use the manage.py shell to execute a celery task, it returns an Async task which is then executed. The results appear in the database and both my scrapyd and celery log show the tasks being processed.
The issue is that my scheduled tasks are not being fired automatically - despite working perfectly fine in development.
# django-celery settings
import djcelery
djcelery.setup_loader()
BROKER_URL = 'django://'
CELERYBEAT_SCHEDULER = 'djcelery.schedulers.DatabaseScheduler'
And my Supervisor configs:
Celery Config:
[program:IG_Tracker]
command=/home/dean/Development/IG_Tracker/venv/bin/celery --app=IG_Tracker.celery:app worker --loglevel=INFO -n worker.%%h
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
killasgroup=true
priority=998
Scrapyd Config:
[program:scrapyd]
directory=/home/dean/Development/IG_Tracker/instagram/ig_scraper
command=/home/dean/Development/IG_Tracker/venv/bin/scrapyd
environment=MY_SETTINGS=/home/dean/Development/IG_Tracker/IG_Trackersettings.py
user=dean
autostart=true
autorestart=true
redirect_stderr=true
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
stderr_logfile=/home/dean/Development/IG_Tracker/scrapyd.log
startsecs=10
I have followed the docs as closely as I could and used the recommended tools for deployment (eg. scrapyd-deploy etc...). Additionally, when I run celery and scrapyd manually on the server (as one would in development) things work fine. The problem only appears when the two are run using supervisor.
I'm probably missing some setting or another which is preventing my celery tasks stored in the SQLite DB from being picked up and run automatically by celery/scrapyd when in production.
Okay - so I eventually got this working. Maybe this can help someone else. My issue was that I only had ONE supervisor process for celery, whereas it needs two: one for actually running the tasks (worker) and another for handling the scheduling (beat). I only had the worker. This explains why everything worked fine when I fired off a task using the django shell (essentially manually passing a task to the worker).
Here is my conf file for the 'scheduler' celery process:
[program:celery_beat]
command=/home/dean/Development/IG_Tracker/venv/bin/celery beat -A IG_Tracker --loglevel=INFO
directory=/home/dean/Development/IG_Tracker/
user=root
numprocs=1
stdout_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
stderr_logfile=/home/dean/Development/IG_Tracker/celery-worker.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs = 600
killasgroup=true
priority=998
I added that and ran:
supervisorctl reread
supervisorctl update
supervisorctl restart all
My tasks began running right away.
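To make the worker/beat split concrete, here is a toy simulation of why a worker alone never runs scheduled tasks. It is a simplified stand-in, not Celery's actual implementation. ToyBeat, worker_drain and the run_spiders task name are all made up for illustration.

```python
from collections import deque


class ToyBeat:
    """Stand-in for 'celery beat': turns a schedule into queued messages."""

    def __init__(self, schedule):
        # schedule maps task name -> interval in seconds
        self.schedule = schedule
        self.last_run = {name: 0 for name in schedule}

    def tick(self, now, queue):
        # Enqueue every task whose interval has elapsed since its last run.
        for name, interval in self.schedule.items():
            if now - self.last_run[name] >= interval:
                queue.append(name)
                self.last_run[name] = now


def worker_drain(queue):
    """Stand-in for the worker: it only runs what is already queued."""
    done = []
    while queue:
        done.append(queue.popleft())
    return done


schedule = {"run_spiders": 60}  # hypothetical periodic task, every 60 s

# Worker only: the schedule exists, but nothing ever enqueues it.
queue = deque()
without_beat = worker_drain(queue)

# Worker plus beat: beat enqueues due tasks and the worker picks them up.
beat = ToyBeat(schedule)
for now in (60, 90, 120):
    beat.tick(now, queue)
with_beat = worker_drain(queue)

print(without_beat, with_beat)
```

Firing a task from the django shell corresponds to appending to the queue by hand, which is why that path worked even without the beat process.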
