I have a Django REST API set up on one machine (currently in test on my local machine, but it will be on a web server eventually). Let's call this machine "client". I also have a computing server to run CPU-intensive tasks that require a long execution time. Let's call this machine "run-server".
"run-server" runs a Celery worker connected to a local RabbitMQ server. The worker currently lives in a git repository with this structure:
proj/
    client.py
    cmd.sh
    requirements.txt
    tasks.py
The whole thing runs in a virtualenv, for what it's worth. The cmd.sh basically executes celery multi start workername -A tasks -l info on "run-server". The client.py is a CLI script that can submit a task to "run-server" manually from the shell of any machine (i.e. the "client").
I want to run the equivalent of the client script from a Django setup without having to copy the tasks.py and client.py code into the Django repository. Ideally I would pip install proj in the Django environment and import proj to use it just like the client script does.
How can I package proj to achieve that?
I am used to packaging my own Python modules with a structure roughly like this:
proj/
    bin/
        proj
    proj/
        __init__.py
        __main__.py
        script.py
    setup.py
    requirements.txt
I managed to make it work on my own. The structure above just works. Instead of celery multi start workername -A tasks -l info, you simply replace it with celery multi start workername -A proj.tasks -l info and everything works. The same version of the module has to be installed on both the Django side and the worker side, because job queueing is done via duck typing (i.e. the task paths and names must match).
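For reference, a minimal setup.py sketch for a layout like the one above (the entry point and dependency names are illustrative, not taken from the original project, and assume __main__.py exposes a main() function):

# setup.py -- minimal packaging sketch with illustrative names
from setuptools import setup, find_packages

setup(
    name='proj',
    version='0.1.0',
    packages=find_packages(),        # picks up the inner proj/ package
    install_requires=['celery'],     # plus whatever requirements.txt lists
    entry_points={
        'console_scripts': [
            'proj = proj.__main__:main',  # assumed CLI entry point
        ],
    },
)

After pip install proj (or pip install -e . during development), the Django code can do from proj.tasks import some_task and call some_task.delay(...) exactly as the client script does, provided both sides run the same version.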
How to structure my python rest api (FastAPI) project?
Different API endpoints submit tasks to different Celery workers. I want each Celery worker to be built as a separate image, with all builds managed by docker-compose.
I tried separating the API directory from the Celery worker directories and putting a Dockerfile in each, but I ran into a problem where tasks submitted to a worker were not recognized (unregistered tasks). Maybe there is a way to fix it, but that would seem like a workaround to me.
Update
my_app/
    docker-compose.yml
    fastapi_app/
        api/
            ...
        app.py
        Dockerfile
    worker_app1/
        core_app_code/
            ...
        Dockerfile
    worker_app2/
        core_app_code/
            ...
        Dockerfile
The main question is: where should the tasks be defined for each worker, so that fastapi_app can submit them?
You don't need two Dockerfiles for the Celery worker and the API; you can write the celery command directly in the docker-compose file.
See the example below for running a Celery worker from a docker-compose file.
version: "3"
services:
worker:
build: . #your celery app path
command: celery -A tasks worker --loglevel=info #change loglevel and worker for production
depends-on:
- "redis" #your amqp broker
I have Django and Celery set up. I am only using one node for the worker.
I want to use it as an asynchronous queue and as a scheduler.
I can launch the worker as follows, with the -B option, and it will do both.
celery worker start 127.0.0.1 --app=myapp.tasks -B
However, it is unclear how to do this in production when I want to daemonize the process. Do I need to set up both init scripts?
I have tried adding the -B option to the init.d script, but it doesn't seem to have any effect. The documentation is not very clear.
Personally I use Supervisord, which has some nice options and configurability. There are example supervisord config files here
A couple of ways to achieve this:
http://celery.readthedocs.org/en/latest/tutorials/daemonizing.html
1. The Celery distribution comes with a generic init script located at path-to-celery/celery-3.1.10/extra/generic-init.d/celeryd.
This can be placed in /etc/init.d/celeryd-name and configured using a configuration file, also present in the distribution, which would look like the following:
# Names of nodes to start (space-separated)
#CELERYD_NODES="my_application-node_1"
# Where to chdir at start. This could be the root of a virtualenv.
#CELERYD_CHDIR="/path/to/my_application"
# How to call celeryd-multi
#CELERYD_MULTI="$CELERYD_CHDIR/bin/celeryd-multi"
# Extra arguments
#CELERYD_OPTS="--app=my_application.path.to.worker --time-limit=300 --concurrency=8 --loglevel=DEBUG"
# Create log/pid dirs, if they don't already exist
#CELERY_CREATE_DIRS=1
# %n will be replaced with the nodename
#CELERYD_LOG_FILE="/path/to/my_application/log/%n.log"
#CELERYD_PID_FILE="/var/run/celery/%n.pid"
# Workers run as an unprivileged user
#CELERYD_USER=my_user
#CELERYD_GROUP=my_group
You can add the following celerybeat elements to the same file to configure celery beat:
# Where to chdir at start.
CELERYBEAT_CHDIR="/opt/Myproject/"
# Extra arguments to celerybeat
CELERYBEAT_OPTS="--schedule=/var/run/celery/celerybeat-schedule"
This config should then be saved (at least for CentOS) as /etc/default/celeryd-config-name.
Look at the init file for the exact location.
Now you can run celery as a daemon by running:
/etc/init.d/celeryd start/restart/stop
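For the scheduler half of the question, the same extra/generic-init.d/ directory in the Celery distribution also ships a celerybeat init script, which reads the CELERYBEAT_* settings from its own config file (check the script for the exact location, as with celeryd). A rough sketch of installing it, with an illustrative source path:

# install the generic celerybeat init script alongside celeryd (paths are illustrative)
sudo cp path-to-celery/celery-3.1.10/extra/generic-init.d/celerybeat /etc/init.d/celerybeat
sudo chmod +x /etc/init.d/celerybeat
sudo /etc/init.d/celerybeat start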
2. Using supervisord, as mentioned in the other answer.
The supervisord configuration files are also in the distribution at path-to-dist/celery-version/extra/supervisord.
Configure using those files and use supervisorctl to run the service as a daemon.
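A rough supervisord sketch for this question's setup, running the worker and beat as two separate daemonized programs (the program names, paths, and app module are illustrative):

; /etc/supervisor/conf.d/myapp-celery.conf -- illustrative sketch
[program:myapp-worker]
directory=/path/to/project
command=celery worker --app=myapp.tasks --loglevel=INFO
autostart=true
autorestart=true

[program:myapp-beat]
directory=/path/to/project
command=celery beat --app=myapp.tasks --loglevel=INFO
autostart=true
autorestart=true

Running beat as its own program (instead of relying on -B) keeps the scheduler alive independently of worker restarts.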
I am relatively new to docker, celery and rabbitMQ.
In our project we currently have the following setup:
1 physical host with multiple docker containers running:
1x rabbitmq:3-management container
# pull image from docker hub and install
docker pull rabbitmq:3-management
# run docker image
docker run -d -e RABBITMQ_NODENAME=my-rabbit --name some-rabbit -p 8080:15672 -p 5672:5672 rabbitmq:3-management
1x celery container
# pull docker image from docker hub
docker pull celery
# run celery container
docker run --link some-rabbit:rabbit --name some-celery -d celery
(there are some more containers, but they should not have anything to do with the problem)
Task File
To get to know celery and rabbitmq a bit, I created a tasks.py file on the physical host:
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest:guest@172.17.0.81/')

@app.task(name='tasks.add')
def add(x, y):
    return x + y
The whole setup seems to be working quite fine actually. So when I open a python shell in the directory where tasks.py is located and run
>>> from tasks import add
>>> add.delay(4,4)
The task gets queued and is picked up directly by the celery worker.
However, according to the logs, the celery worker does not know the tasks module:
$ docker logs some-celery
[2015-04-08 11:25:24,669: ERROR/MainProcess] Received unregistered task of type 'tasks.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
The full contents of the message body was:
{'callbacks': None, 'timelimit': (None, None), 'retries': 0, 'id': '2b5dc209-3c41-4a8d-8efe-ed450d537e56', 'args': (4, 4), 'eta': None, 'utc': True, 'taskset': None, 'task': 'tasks.add', 'errbacks': None, 'kwargs': {}, 'chord': None, 'expires': None} (256b)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/celery/worker/consumer.py", line 455, in on_task_received
strategies[name](message, body,
KeyError: 'tasks.add'
So the problem obviously is that the celery worker in the celery container does not know the tasks module.
As I am not a Docker specialist, I wanted to ask: what is the best way to get the tasks module into the celery container?
Any help is appreciated :)
EDIT 4/8/2015, 21:05:
Thanks to Isowen for the answer. Just for completeness here is what I did:
Let's assume my tasks.py is located on my local machine in /home/platzhersh/celerystuff. Now I created a celeryconfig.py in the same directory with the following content:
CELERY_IMPORTS = ('tasks',)
CELERY_IGNORE_RESULT = False
CELERY_RESULT_BACKEND = 'amqp'
As mentioned by Isowen, celery searches the container's /home/user for tasks and config files. So we mount /home/platzhersh/celerystuff into the container when starting it:
docker run -v /home/platzhersh/celerystuff:/home/user --link some-rabbit:rabbit --name some-celery -d celery
This did the trick for me. Hope this helps some other people with similar problems.
I'll now try to expand that solution by putting the tasks also in a separate docker container.
As you suspect, the issue is that the celery worker does not know the tasks module. There are two things you need to do:
Get your tasks definitions "into" the docker container.
Configure the celery worker to load those task definitions.
For Item (1), the easiest way is probably to use a "Docker Volume" to mount a host directory of your code onto the celery docker instance. Something like:
docker run --link some-rabbit:rabbit -v /path/to/host/code:/home/user --name some-celery -d celery
Where /path/to/host/code is your host path, and /home/user is the path to mount it at inside the instance. Why /home/user in this case? Because the Dockerfile for the celery image defines the working directory (WORKDIR) as /home/user.
(Note: Another way to accomplish Item (1) would be to build a custom docker image with the code "built in", but I will leave that as an exercise for the reader.)
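A rough sketch of that custom-image approach, assuming the stock celery base image (with its /home/user working directory) and the tasks.py/celeryconfig.py pair from the question's edit:

# Dockerfile -- illustrative sketch, not from the original answer
FROM celery
# bake the task and config code into the image's working directory
COPY tasks.py celeryconfig.py /home/user/

Built with docker build -t some-celery-with-tasks . and run with the same --link option as above, the worker then ships its task code instead of mounting it from the host.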
For Item (2), you need to create a celery configuration file that imports the tasks file. This is a more general issue, so I will point to a previous stackoverflow answer: Celery Received unregistered task of type (run example)
By following this tutorial, I now have a Celery-Django app that works fine if I launch the worker with this command:
celery -A myapp worker -n worker1.%h
In my Django settings.py, I set all the Celery parameters (IP of the message broker, etc.). Everything works well.
My next step is to run this app as a daemon. So I followed this second tutorial, and everything was simple, except that now my Celery parameters from settings.py are not loaded. For example, the message broker IP is set to 127.0.0.1, but in my settings.py I set it to another IP address.
In the tutorial, they say:
make sure that the module that defines your Celery app instance also sets a default value for DJANGO_SETTINGS_MODULE as shown in the example Django project in First steps with Django.
So I made sure of that. I had this in /etc/default/celeryd:
export DJANGO_SETTINGS_MODULE="myapp.settings"
Still not working... So I also added this line to /etc/init.d/celeryd; again, not working.
I don't know what to do anymore. Does someone have a clue?
EDIT:
Here is my celery.py:
from __future__ import absolute_import
import os
from django.conf import settings
from celery import Celery
# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myapp.settings')
app = Celery('myapp')
# Using a string here means the worker will not have to
# pickle the object when using Windows.
app.config_from_object('django.conf:settings')
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
EDIT #2:
Here is my /etc/default/celeryd:
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="worker1.%h"
# Absolute or relative path to the 'celery' command:
CELERY_BIN="/usr/local/bin/celery"
# App instance to use
# comment out this line if you don't use an app
CELERY_APP="myapp"
# Where to chdir at start.
CELERYD_CHDIR="/home/ubuntu/myapp-folder/"
# Extra command-line arguments to the worker
CELERYD_OPTS=""
# %N will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%N.log"
CELERYD_PID_FILE="/var/run/celery/%N.pid"
# Workers should run as an unprivileged user.
# You need to create this user manually (or you can choose
# a user/group combination that already exists, e.g. nobody).
CELERYD_USER="ubuntu"
CELERYD_GROUP="ubuntu"
# If enabled pid and log directories will be created if missing,
# and owned by the userid/group configured.
CELERY_CREATE_DIRS=1
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE=myapp.settings
export PYTHONPATH=$PYTHONPATH:/home/ubuntu/myapp-folder
All the answers here could be part of the solution, but in the end it was still not working.
But I finally managed to make it work.
First of all, in /etc/init.d/celeryd, I changed this line:
CELERYD_MULTI=${CELERYD_MULTI:-"celeryd-multi"}
to:
CELERYD_MULTI=${CELERYD_MULTI:-"celery multi"}
The first one was flagged as deprecated, which could have been the problem.
Moreover, I set this option:
CELERYD_OPTS="--app=myapp"
And don't forget to export some environment variables:
# Name of the projects settings module.
export DJANGO_SETTINGS_MODULE="myapp.settings"
export PYTHONPATH="$PYTHONPATH:/home/ubuntu/myapp-folder"
With all of this, it's now working on my side.
The problem is most likely that celeryd can't find your Django settings file because myapp.settings isn't on the $PYTHONPATH when the application runs.
From what I recall, Python will look in the $PYTHONPATH as well as the local folder when importing files. When celeryd runs, it likely checks the path for the app module, doesn't find it, then looks in the current folder for a folder with an __init__.py (i.e. a Python package).
I think that all you should need to do is add this to your /etc/default/celeryd file:
export PYTHONPATH="$PYTHONPATH:/path/to/your/app"
The method below does not help to run celeryd; rather, it helps to run the celery worker as a service that will be started at boot time.
Commands like sudo service celery status also work.
celery.conf
# This file sits in /etc/init
description "Celery for example"
start on runlevel [2345]
stop on runlevel [!2345]
#Send KILL after 10 seconds
kill timeout 10
script
#project (working_ecm) and virtualenv (working_ecm/env) settings
chdir /home/hemanth/working_ecm
exec /home/hemanth/working_ecm/env/bin/python manage.py celery worker -B -c 2 -f /var/log/celery-ecm.log --loglevel=info >> /tmp/upstart-celery-job.log 2>&1
end script
respawn
In your second tutorial, they set the DJANGO_SETTINGS_MODULE variable to:
export DJANGO_SETTINGS_MODULE="settings"
This could be a reason why your settings are not found, since the script changes to the directory
"/home/ubuntu/myapp-folder/"
Then you defined your app as "myapp", and you say the settings are in "myapp.settings".
This could lead to it searching for the settings file in
"/home/ubuntu/myapp-folder/myapp/myapp/settings"
So my suggestion is to remove the "myapp." prefix in the DJANGO_SETTINGS_MODULE variable, and don't forget the quotation marks, for example as sketched below.
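A sketch of the change this answer suggests, in /etc/default/celeryd:

# as suggested: drop the "myapp." prefix; the settings module must then be
# importable from CELERYD_CHDIR or the PYTHONPATH
export DJANGO_SETTINGS_MODULE="settings"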
I'd like to add an answer for anyone stumbling on this more recently.
I followed the getting started First Steps guide to a tee with Celery 4.4.7, as well as the Daemonization tutorial without luck.
My initial issue:
celery -A app_name worker -l info works without issue (actual celery configuration is OK).
I could start celeryd as a daemon and the status command would show OK, but it couldn't receive tasks. Checking the logs, I saw the following:
[2020-11-01 09:33:15,620: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@127.0.0.1:5672//: [Errno 111] Connection refused.
This was an indication that celeryd was not connecting to my broker (redis). Given that CELERY_BROKER_URL was already set in my configuration, this meant my celery app settings were not being pulled in for the daemon process.
I tried sudo C_FAKEFORK=1 sh -x -l -E /etc/init.d/celeryd start to see if any of my celery settings were pulled in, and I noticed that the app was set to the default (not the app name specified as CELERY_APP in /etc/default/celeryd).
Since celery -A app_name worker -l info worked, I fixed the issue by exporting CELERY_APP in /etc/default/celeryd instead of just setting the variable as per the documentation.
TL;DR
If celery -A app_name worker -l info works (replace app_name with what you've defined in the Celery first steps guide), and sudo C_FAKEFORK=1 sh -x -l -E /etc/init.d/celeryd start does not show your celery app settings being pulled in, add the following to the end of your /etc/default/celeryd:
export CELERY_APP="app_name"
I am using Fabric to deploy a Celery broker (running RabbitMQ) and multiple Celery workers with celeryd daemonized through supervisor. I cannot for the life of me figure out how to reload the tasks.py module short of rebooting the servers.
/etc/supervisor/conf.d/celeryd.conf
[program:celeryd]
directory=/fab-mrv/celeryd
environment=[RABBITMQ credentials here]
command=xvfb-run celeryd --loglevel=INFO --autoreload
autostart=true
autorestart=true
celeryconfig.py
import os
## Broker settings
BROKER_URL = "amqp://%s:%s@hostname" % (os.environ["RMQU"], os.environ["RMQP"])
# List of modules to import when celery starts.
CELERY_IMPORTS = ("tasks", )
## Using the database to store task state and results.
CELERY_RESULT_BACKEND = "amqp"
CELERYD_POOL_RESTARTS = True
Additional information
celery --version 3.0.19 (Chiastic Slide)
python --version 2.7.3
lsb_release -a Ubuntu 12.04.2 LTS
rabbitmqctl status ... 2.7.1 ...
Here are some things I have tried:
The celeryd --autoreload flag
sudo supervisorctl restart celeryd
celery.control.broadcast('pool_restart', arguments={'reload': True})
ps auxww | grep celeryd | grep -v grep | awk '{print $2}' | xargs kill -HUP
And unfortunately, nothing causes the workers to reload the tasks.py module (e.g. after running git pull to update the file). The gist of the relevant fab functions is available here.
The brokers/workers run fine after a reboot.
Just a shot in the dark: with the celeryd --autoreload option, did you make sure you have one of the file system notification backends? The docs recommend pyinotify for Linux, so I'd start by making sure you have that installed.
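If it is missing, installing it is a one-liner (assuming a Linux host, since pyinotify only works there):

pip install pyinotify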
I faced a similar problem and was able to use Watchdog to reload the tasks.py modules when changes are detected. To install:
pip install watchdog
You can use the Watchdog API programmatically, for example to monitor for directory changes in the file system. Additionally, Watchdog provides an optional shell utility called watchmedo that can be used to execute commands on events. Here is an example that starts the Celery worker via Watchdog and reloads it on any changes to .py files, including changes via git pull:
watchmedo auto-restart --directory=./ --pattern="*.py" --recursive -- celery worker --app=worker.app --concurrency=1 --loglevel=INFO
Using Watchdog's watchmedo, I was able to git pull changes and the respective tasks.py modules were auto-reloaded without any reboot of the container or server.