I'm learning Celery and I'd like to ask:
What is the absolute simplest way to get Celery to run automatically when Django starts on Ubuntu? Right now I manually start celery -A {prj name} worker -l INFO from the terminal.
Is there any configuration that lets Celery pick up changes to the tasks.py code without restarting Celery? Right now I Ctrl+C and retype celery -A {prj name} worker -l INFO every time I change something in tasks.py. I can foresee a problem with this approach in production: if Celery starts automatically, would I need to restart Ubuntu instead?
(Setup: VPS, Django, Ubuntu 18.10, no Docker, no external resources, using Redis, which starts automatically.)
I am aware this is similar to Django-Celery in production and How to ..., but it is still a bit unclear to me, as those refer to Amazon and to using shell scripts and crontabs. It seems a bit peculiar that these things wouldn't work out of the box.
I'll give the benefit of the doubt that I have misunderstood how Celery is meant to be set up.
I have a deploy script that launches Celery in production.
In production it's better to launch the workers like this:
celery multi stop 5
celery multi start 5 -A {prj name} -Q:1 default -Q:2,3 QUEUE1 -Q:4,5 QUEUE2 --pidfile="%n.pid"
This will stop and then launch five workers bound to different queues.
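To check that the workers actually came up, you can ping them with Celery's built-in status command (substituting your own project name):
celery -A {prj name} status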
When Celery starts, it loads the current instance of your code, which means you need to relaunch it to apply modifications; you cannot add a file watcher in production (the memory cost is too high).
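For development only, a common workaround is to wrap the worker in a file watcher such as watchmedo from the watchdog package, so the worker restarts whenever a .py file changes. A rough sketch (flags may differ slightly between watchdog versions, and this is not meant for production):
pip install watchdog
watchmedo auto-restart -d . -p '*.py' -R -- celery -A {prj name} worker -l INFO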
Related
I wanted to know if there is a way to restart a Celery worker when it goes down due to some error or issue, so that it can be restarted automatically and programmatically.
Check out this SO thread.
As you are using Windows, check for the ability to run Celery as a service, as explained right here on SO.
I use this tool http://python-rq.org/
I have a Flask app, but I could not find a way to start the rq worker other than with the rq CLI, e.g. $ rq worker.
I need to have the worker running all the time. How can I run it as a service? I also need the service to start on boot.
You should investigate some supervisor program to control your rq worker. Take a look at supervisor or systemd. I personally use supervisord and it's pretty popular in the Python community.
This is how any supervisor program works (not to be confused with supervisord): the supervisor itself is a service (controlled by another service, e.g. systemd, init.d, etc.), and it runs the programs specified in its configuration file. If a program exits or has issues, the supervisor will respawn it.
If you were in the docker ecosystem, it'd be simpler because docker can be your supervisor.
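For concreteness, a minimal supervisord program section for an rq worker could look roughly like this (the file location, paths, and log names are assumptions to adapt to your setup):
; /etc/supervisor/conf.d/rqworker.conf
[program:rqworker]
command=/path/to/venv/bin/rq worker
directory=/path/to/your/app
autostart=true
autorestart=true
stdout_logfile=/var/log/rqworker.out.log
stderr_logfile=/var/log/rqworker.err.log
After saving it, run supervisorctl reread and supervisorctl update to pick it up, and supervisorctl status to confirm the worker is running.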
There are multiple options:
For development, I recommend a Docker container (https://docker.com). For production, the cleanest way is to use the version packaged for your system (assuming you're using a GNU/Linux box) along with a dedicated systemd unit.
For example, on Debian:
apt-get update && apt-get install redis
Then edit redis.conf and start it:
systemctl start redis
Enable it (i.e., start Redis at boot):
systemctl enable redis
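The same pattern works for the worker process itself. A minimal unit sketch, assuming the worker lives in a virtualenv and the unit is saved as /etc/systemd/system/rqworker.service (the paths, user, and unit names are placeholders):
[Unit]
Description=rq worker
After=network.target redis.service

[Service]
Type=simple
User=youruser
WorkingDirectory=/path/to/your/app
ExecStart=/path/to/venv/bin/rq worker
Restart=on-failure

[Install]
WantedBy=multi-user.target
Then systemctl enable --now rqworker starts it immediately and at every boot.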
I have a persistent background process currently running as a standalone Python script on my Ubuntu server, managed by supervisor. However, I am migrating to Heroku and wonder if anyone has experience setting up the same kind of environment.
The specifications of the script:
Fetch information from external API
Do calculations on the data
Store the data to the database
If the run took less than 5 seconds, sleep for the remaining time, otherwise run again
I could run a cron job every 5 seconds, but from time to time steps 1-3 can take up to a full hour.
Any tips?
Thanks.
What you want to do is create a worker process. Simply define a command-line script so that you can call it easily, then add a new worker entry to your Procfile like so:
# Procfile
web: python manage.py runserver # example
worker: python manage.py start_cronjob # command to run your background process
Once you've got this defined in your Procfile, go ahead and push your app to Heroku, then scale up a worker process:
$ heroku scale worker=1
This will launch a single worker process.
To view the logs and ensure things are working as expected, you can say:
$ heroku logs --tail --ps worker
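For reference, the start_cronjob command itself can be a simple loop. Here is a rough sketch written as a Django management command (the file path is illustrative, and run_once is a placeholder for your fetch/calculate/store steps):
# yourapp/management/commands/start_cronjob.py (illustrative location)
import time

from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Run the background job in a loop, at most once every 5 seconds."

    def handle(self, *args, **options):
        while True:
            started = time.monotonic()
            self.run_once()  # steps 1-3: fetch from the API, calculate, store to the DB
            elapsed = time.monotonic() - started
            if elapsed < 5:
                time.sleep(5 - elapsed)  # pad short runs out to 5 seconds

    def run_once(self):
        # Placeholder for the actual work described above.
        pass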
I am trying to install an init.d script to run Celery for scheduling tasks. When I try to start it with sudo /etc/init.d/celeryd start, it throws the error "User does not exist: 'celery'".
My Celery configuration file (/etc/default/celeryd) contains these lines:
# Workers should run as an unprivileged user.
CELERYD_USER="celery"
CELERYD_GROUP="celery"
I know that these are wrong, which is why it throws the error.
The documentation just says this:
CELERYD_USER
User to run celeryd as. Default is current user.
Nothing more about it.
Any help will be appreciated.
I am adding a proper answer so that it is clearly visible:
Workers are Unix processes that run the various Celery tasks. As you can see in the documentation, CELERYD_USER and CELERYD_GROUP determine the user and group the workers will run as in your Unix environment.
So what happened initially in your case is that Celery tried to start the worker as a user named "celery", which did not exist on your system. When you commented out these two options, Celery started the workers as the user that issued the command sudo /etc/init.d/celeryd start, which in this case is the root (administrator) user (the default is the current user).
However, it is recommended to run the workers as unprivileged users and not as root, for obvious reasons. So I recommend actually adding the celery user and group, using the short tutorial found here http://www.cyberciti.biz/faq/unix-create-user-account/, and uncommenting again the
CELERYD_USER="celery"
CELERYD_GROUP="celery"
options.
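On most GNU/Linux systems, creating that unprivileged account comes down to something like this (a sketch; the exact flags and the nologin path vary a bit by distribution):
sudo groupadd celery
sudo useradd --system --gid celery --shell /usr/sbin/nologin --no-create-home celery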
I have a WSGI app with a Celery component. Basically, when certain requests come in, they can hand off relatively time-consuming tasks to Celery. I have a working version of this product on a server I set up myself, but our client recently asked me to deploy it to Cloud Foundry. Since Celery is not available as a service on Cloud Foundry, we (me and the client's deployment team) decided to deploy the app twice: once as a WSGI app and once as a standalone Celery app, sharing a RabbitMQ service.
The code between the two apps is identical. The WSGI app responds correctly, returning the expected web pages. vmc logs celeryapp shows that Celery appears to be up and running, but when I send requests to the WSGI app that should become Celery tasks, they disappear as soon as they hit a .delay() statement. They neither appear in the Celery logs nor show up as errors.
Attempts to debug:
I can't use celery.contrib.rdb in Cloud Foundry (to supply a telnet interface to pdb), as each app is sandboxed and port-restricted.
I don't know how to find the specific rabbitmq instance these apps are supposed to share, so I can see what messages it's passing.
Update: to corroborate the above statement about finding rabbitmq, here's what happens when I try to access the node that should be sharing celery tasks:
root@cf:~# export RABBITMQ_NODENAME=eecef185-e1ae-4e08-91af-47f590304ecc
root@cf:~# export RABBITMQ_NODE_PORT=57390
root@cf:~# ~/cloudfoundry/.deployments/devbox/deploy/rabbitmq/sbin/rabbitmqctl list_queues
Listing queues ...
=ERROR REPORT==== 18-Jun-2012::11:31:35 ===
Error in process <0.36.0> on node 'rabbitmqctl17951@cf' with exit value: {badarg,[{erlang,list_to_existing_atom,["eecef185-e1ae-4e08-91af-47f590304ecc@localhost"]},{dist_util,recv_challenge,1},{dist_util,handshake_we_started,1}]}
Error: unable to connect to node 'eecef185-e1ae-4e08-91af-47f590304ecc@cf': nodedown
diagnostics:
- nodes and their ports on cf: [{'eecef185-e1ae-4e08-91af-47f590304ecc',57390},
{rabbitmqctl17951,36032}]
- current node: rabbitmqctl17951@cf
- current node home dir: /home/cf
- current node cookie hash: 1igde7WRgkhAea8fCwKncQ==
How can I debug this and/or why are my tasks vanishing?
Apparently the problem was caused by a deadlock between the broker and the Celery worker, such that the worker would never acknowledge the task as complete and never accept a new task, but never crashed or failed either. The tasks weren't vanishing; they were simply staying in the queue forever.
Update: the deadlock was caused by the fact that we were running celeryd inside a wrapper script that installed dependencies (literally pip install -r requirements.txt && ./celeryd -lINFO). Because of how Cloud Foundry manages process trees, Cloud Foundry would try to kill the parent process (bash), which would HUP celeryd, but ultimately lots of child processes would never die.
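One common way to avoid leaving that intermediate bash process in the tree is to exec the worker as the last step of the wrapper, so it replaces the shell and receives signals directly. A sketch of such a wrapper:
#!/bin/sh
# Install dependencies, then replace this shell with celeryd so signals reach it directly.
pip install -r requirements.txt
exec ./celeryd -l INFO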