I am building a Python 3 application that will consume messages from RabbitMQ. Is there a Python background job library that can make this easy? I am looking for something similar to Sneakers in Ruby. I would like the library to have:
an easy way to define tasks that process RabbitMQ messages (I have a separate non-Python producer application that will create messages and put them into RabbitMQ)
a way to configure the number of worker processes that run the tasks
a way to run workers as daemonized processes
I believe you're looking for Celery.
You'll define a task as follows:
from celery import Celery
app = Celery('tasks', broker='amqp://guest@localhost//')  # broker URL shown is RabbitMQ's default; adjust for your setup

@app.task
def mytask(param):
    return 1 + 1
The task message will be put in the message broker (for example the RabbitMQ you mentioned), and then consumed and executed by a Celery worker.
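For example, enqueueing the task from Python code might look like this (a minimal sketch; the module name tasks is assumed):

from tasks import mytask

# .delay() does not run the function locally; it publishes a message to the broker
# (RabbitMQ), and a Celery worker picks it up and executes it.
result = mytask.delay(42)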
You can configure the number of worker processes:
celery worker --concurrency=10
And yes, it can be daemonized.
To consume tasks from RabbitMQ you have to define a worker, and to run that worker in daemonized mode you have to create a supervisor (e.g. supervisord) configuration for it.
Command to start the worker:
celery worker --concurrency=10 -Ofair --loglevel=DEBUG -A file_name_without_extension -Q queue_name
Steps to create the supervisor config (a minimal sketch follows the links below):
https://thomassileo.name/blog/2012/08/20/how-to-keep-celery-running-with-supervisor/
http://python-rq.org/patterns/supervisor/
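A minimal supervisord program section for the worker could look roughly like this (paths, module name and queue name are placeholders, mirroring the command above):

[program:celeryworker]
directory=/path/to/your/project
command=celery worker --concurrency=10 -Ofair --loglevel=DEBUG -A file_name_without_extension -Q queue_name
autostart=true
autorestart=true
stdout_logfile=/var/log/celery/worker.log
redirect_stderr=true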
Related
I am a little frustrated with the Celery documentation. I understand that the command
celery -A my-module worker -l INFO -n w1
starts a worker instance named w1. This means that the worker instance starts some default number of "processes" in the OS.
But can celery also start threads instead of processes? For example what does the following command do?
celery -A my-module worker --pool threads -l INFO -n w1
I tried reading through the documentation but could not find anything that answers the question: "What does multi-threading mean when it comes to Celery? Can Celery support multi-threading in place of multi-processing?"
I have some Celery workers in a Heroku app. My app is using Python 3.6 and Django; these are the relevant dependencies and their versions:
celery==3.1.26.post2
redis==2.10.3
django-celery==3.2.2
I do not know if they are useful to this question, but just in case. On Heroku we are running the Heroku-18 stack.
As is usual, we have our workers declared in a Procfile, with the following content:
web: ... our django app ....
celeryd: python manage.py celery worker -Q celery --loglevel=INFO -O fair
one_type_of_worker: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
another_type: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
So, my current understanding of this process is the following:
Our celery queues run on multiple workers, each worker runs as a dyno on Heroku (not a server, but a “worker process” kind of thing, since servers aren’t a concept on Heroku). We also have multiple dynos running the same celery worker with the same queue, which results in multiple parallel “threads” for that queue to run more tasks simultaneously (scalability).
The web workers, celery workers, and celery queues can talk to each other because celery manages the orchestration between them. I think it's specifically the broker that handles this responsibility. But for example, this lets our web workers schedule a celery task on a specific queue and it is routed to the correct queue/worker, or a task running in one queue/worker can schedule a task on a different queue/worker.
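For example, what I mean is that from the web dyno we can do something like the following (the task and queue names here are made up), and the broker routes it to whichever worker consumes that queue:

# Publishes a message for the "emails" queue via the broker; the worker started
# with -Q emails will pick it up and execute it.
send_email_task.apply_async(args=[user_id], queue='emails')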
Now here comes my question: how do the workers communicate? Do they use an API endpoint on localhost with a port? RPC? Do they use the broker URL? Magic?
I'm asking this because I'm trying to replicate this setup in ECS and I need to know how to set it up for celery.
Here is a guide on how Celery works on Heroku: https://devcenter.heroku.com/articles/celery-heroku
You can't run Celery on Heroku without getting a Heroku dyno for Celery. Also, make sure you have Redis configured in your Django Celery settings.
To run Celery on Heroku, you just add this line to your Procfile:
worker: celery -A YOUR-PROJECT_NAME worker -l info -B
Note: the above celery command will run both the celery worker and celery beat (because of the -B flag).
If you want to run them separately, you can use separate commands, but a single command is recommended.
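If you do split them, the Procfile might look roughly like this (same project-name placeholder as above):

worker: celery -A YOUR-PROJECT_NAME worker -l info
beat: celery -A YOUR-PROJECT_NAME beat -l info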
I use Celery 4.x with Django and have more than two tasks in my celery queue. Due to a GPU limit, I can only run at most two at the same time. Is there a way to let the third task wait and only run once one of the previous two tasks finishes? I have set the CELERYD_CONCURRENCY parameter in Django's settings.py, which does not seem to work.
Does anyone know? Thanks.
Run your worker using the concurrency argument:
celery -A proj worker -l info --concurrency 2 -Q queue_name
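If you would rather keep this in configuration instead of the command line: Celery 4.x renamed its settings to lowercase, so the equivalent of the old CELERYD_CONCURRENCY is worker_concurrency, which may be why the uppercase setting appeared to have no effect. A minimal sketch, assuming you configure the Celery app object directly:

from celery import Celery

app = Celery('proj')
# Allow at most two tasks to run at the same time (e.g. because of the GPU limit).
app.conf.worker_concurrency = 2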
My colleague has written the Celery tasks, the necessary configuration in the settings file, and also the supervisord config file. Everything is working perfectly fine. The project has been handed over to me, and I am seeing some issues that I have to fix.
There are two projects running on a single machine; both projects are almost the same. Let's call them projA and projB.
supervisord.conf file is as:
;for projA
[program:celeryd]
directory=/path_to_projA/
command=celery -A project worker -l info
...
[program:celerybeat]
directory=/path_to_projA/
command=celery -A project beat -l info
...
; For projB
[program:celerydB]
directory=/path_to_projB/
command=celery -A project worker -l info
...
[program:celerybeatB]
directory=/path_to_projB/
command=celery -A project beat -l info
...
The issue is, I am creating tasks through a loop and only one task is received by the celeryd of projA; the remaining tasks are not received (or may be received by the celeryd of projB).
But when I stop the celery programs for projB, everything works well. Please note, the actual name of the django app is project, hence celery -A project worker/beat -l info.
Please bear with me, I am new to Celery; any help is appreciated. TIA.
As the Celery docs say,
Celery is an asynchronous task queue/job queue based on distributed message passing.
When multiple tasks are created through a loop, the tasks are evenly distributed to the two different workers, i.e. the worker of projA and the worker of projB, since both workers consume from the same default queue.
If the projects are similar or, as you mention, almost the same, you can use separate Celery queues, but of course the queue names across the projects must be different.
The Celery docs for this are provided here.
You need to set CELERY_DEFAULT_QUEUE, CELERY_DEFAULT_ROUTING_KEY and CELERY_QUEUES in your settings.py file.
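For example, projA's settings.py could declare its own queue like this (the queue, exchange and routing-key names here are only illustrative):

from kombu import Exchange, Queue

CELERY_DEFAULT_QUEUE = 'projA'
CELERY_DEFAULT_ROUTING_KEY = 'projA'
CELERY_QUEUES = (
    Queue('projA', Exchange('projA'), routing_key='projA'),
)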
And your supervisord.conf file needs the queue name on the command line for all the worker programs.
For example: command=celery -A project worker -l info -Q <queue_name>
And that should work, based on my experience.
I have Celery running in a docker container processing tasks from RabbitMQ. I am trying to stop and remove the Celery container while allowing the currently running tasks to complete. The docs suggest that sending the TERM or INT signal to the main process should trigger a warm shutdown of Celery, but I am finding that the child processes are just being killed.
When I send TERM to the running process it throws:
WorkerLostError('Worker exited prematurely: signal 15 (SIGTERM).',)
When I send INT the running process just exits with no error, although it too doesn't allow the tasks to finish as the docs suggest.
I am starting the docker container with the command:
su -m celery_user -c "python manage.py celery worker -Q queue-name"
Any thoughts on why this might be happening? Could it be that the signal is terminating the container as well as the celery process?
I am sending the signal with:
docker kill --signal="TERM" containerid
or docker exec containerid kill -15 1
docker kill will kill the container. What you need to do is to send the signal only to the main celery process.
Personally I use supervisord inside the docker container to manage the celery worker. By default supervisord will send SIGTERM to stop the process.
Here's a sample supervisord config for Celery:
[program:celery]
command=celery worker -A my.proj.tasks --loglevel=DEBUG -Ofair --hostname celery.host.domain.com --queues=celery
environment=PYTHONPATH=/etc/foo/celeryconfig:/bar/Source,PATH=/foo/custom/bin:/usr/kerberos/bin
user=celery-user
autostart=true
stdout_logfile=/var/log/supervisor/celery.log
redirect_stderr=true
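If the running tasks need more than supervisord's default 10 seconds to finish after SIGTERM, stopwaitsecs can be raised in the same program section (the value below is just an example):

; give long-running tasks up to 10 minutes to finish before supervisord escalates to SIGKILL
stopwaitsecs=600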