REST API project structure to build separate images for celery workers - python

How should I structure my Python REST API (FastAPI) project?
Different API endpoints submit tasks to different celery workers. I want each celery worker to be built as a separate image, with all builds managed by docker-compose.
I tried separating the api directory from the celery worker directories and putting a Dockerfile in each, but I ran into a problem where a task was submitted to a worker that did not have it registered. Maybe there is a way to fix that, but it would seem like a workaround to me.
Update
my_app/
    docker-compose.yml
    fastapi_app/
        api/
            ...
        app.py
        Dockerfile
    worker_app1/
        core_app_code/
            ...
        Dockerfile
    worker_app2/
        core_app_code/
            ...
        Dockerfile
The main question is: where should the tasks be defined for each worker, so that the fastapi_app can submit them?

You don't need two Dockerfiles for the celery worker and the API; you can write the celery command directly in the docker-compose file.
See the example below for running a celery worker from a docker-compose file.
version: "3"
services:
  worker:
    build: .  # your celery app path
    command: celery -A tasks worker --loglevel=info  # change loglevel and worker options for production
    depends_on:
      - "redis"  # your broker

Related

How do celery workers communicate in Heroku

I have some celery workers in a Heroku app. My app is using Python 3.6 and Django; these are the relevant dependencies and their versions:
celery==3.1.26.post2
redis==2.10.3
django-celery==3.2.2
I do not know if they are useful to this question, but just in case: on Heroku we are running the Heroku-18 stack.
As is usual, we have our workers declared in a Procfile, with the following content:
web: ... our django app ....
celeryd: python manage.py celery worker -Q celery --loglevel=INFO -O fair
one_type_of_worker: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
another_type: python manage.py celery worker -Q ... --maxtasksperchild=3 --loglevel=INFO -O fair
So, my current understanding of this process is the following:
Our celery queues run on multiple workers, each worker runs as a dyno on Heroku (not a server, but a “worker process” kind of thing, since servers aren’t a concept on Heroku). We also have multiple dynos running the same celery worker with the same queue, which results in multiple parallel “threads” for that queue to run more tasks simultaneously (scalability).
The web workers, celery workers, and celery queues can talk to each other because celery manages the orchestration between them. I think it's specifically the broker that handles this responsibility. But for example, this lets our web workers schedule a celery task on a specific queue and it is routed to the correct queue/worker, or a task running in one queue/worker can schedule a task on a different queue/worker.
Now here is where my question comes in: how do the workers communicate? Do they use an API endpoint on localhost with a port? RPC? Do they use the broker URL? Magic?
I'm asking this because I'm trying to replicate this setup in ECS and I need to know how to set it up for celery.
Here is a guide on how Celery works on Heroku: https://devcenter.heroku.com/articles/celery-heroku
You can't run celery on Heroku without getting a Heroku dyno for celery. Also, make sure you have Redis configured in your Django Celery settings.
To run celery on Heroku, you just add this line to your Procfile:
worker: celery -A YOUR-PROJECT_NAME worker -l info -B
Note: the above celery command will run both the celery worker and celery beat (because of the -B flag).
If you want to run them separately, you can use separate commands, but a single command is recommended.
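As for the actual question: the web dynos and worker dynos never talk to each other directly; every process simply connects to the same broker URL, and the broker routes messages between them. A minimal sketch of the relevant settings, assuming the Heroku Redis add-on (which exposes a REDIS_URL config var); the localhost fallback is an assumption:
# settings.py -- sketch only
import os

# Every dyno (web or celery) reads the same broker URL from the environment;
# "communication" is just publishing and consuming messages on this broker.
BROKER_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
CELERY_RESULT_BACKEND = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
The same pattern carries over to ECS: give every container the same broker URL and the broker takes care of the routing.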

How to start remote celery workers from django

I'm trying to use django in combination with celery.
Therefore I came across autodiscover_tasks() and I'm not fully sure how to use it. The celery workers get tasks added by other applications (in this case a node backend).
So far I used this to start the worker:
celery worker -Q extraction --hostname=extraction_worker
which works fine.
Now I'm not sure what the general idea of the django-celery integration is. Should workers still be started externally (e.g. with the command above), or should they be managed and started from the Django application?
My celery.py looks like:
import os
from celery import Celery
from django.conf import settings

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'main.settings')
app = Celery('app')
app.config_from_object('django.conf:settings', namespace='CELERY')
# Load task modules from all registered Django app configs.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)
then I have 2 apps containing a tasks.py file with:
from celery import shared_task

@shared_task
def extraction(total):
    return 'Task executed'
How can I now get Django to register the worker for those tasks?
You just start the worker process as documented; you don't need to register anything else:
In a production environment you’ll want to run the worker in the background as a daemon - see Daemonization - but for testing and development it is useful to be able to start a worker instance by using the celery worker manage command, much as you’d use Django’s manage.py runserver:
celery -A proj worker -l info
For a complete listing of the command-line options available, use the
help command:
celery help
The celery worker collects/registers tasks when it starts, and it consumes the tasks it has discovered.
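Since the tasks are submitted by another application (the node backend here), that client only needs the broker URL and the task name; it does not need the Django code at all. A minimal Python sketch of such an external submitter, assuming a hypothetical broker URL and the extraction queue from the question (the fully qualified task name depends on your app layout):
# external_client.py -- sketch; broker URL and task name are assumptions
from celery import Celery

# A bare Celery "client": no tasks are defined here, it only publishes messages.
client = Celery(broker='amqp://guest:guest@localhost//')

# Submit by registered name and route to the queue the worker listens on (-Q extraction).
result = client.send_task('myapp.tasks.extraction', args=[42], queue='extraction')
print(result.id)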

Running two celery workers in a server for two django application

I have a server on which two Django applications are running: appone and apptwo.
For them, two celery workers are started with the commands:
celery worker -A appone -B --loglevel=INFO
celery worker -A apptwo -B --loglevel=INFO
Both point to the same BROKER_URL = 'redis://localhost:6379'.
Redis is set up with db 0 and 1.
I can see the tasks configured in these two apps in both apps' logs, which is leading to warnings and errors.
Can we configure the Django settings so that each celery worker works exclusively on its own app's tasks, without interfering with the other's?
You can route tasks to different queues. Start Celery with two different queues (-Q myqueue1 and -Q myqueue2), and then use a different CELERY_DEFAULT_QUEUE in each of your two Django projects.
Depending on your Celery configuration, your Django setting should look something like:
CELERY_DEFAULT_QUEUE = 'myqueue1'
You can also have more fine-grained control with:
@celery.task(queue="myqueue3")
def some_task(...):
    pass
More options here:
How to keep multiple independent celery queues?
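Putting the two parts together, a minimal sketch of how the settings and worker commands might line up (the queue names are the placeholder ones from above):
# appone/settings.py (sketch)
CELERY_DEFAULT_QUEUE = 'myqueue1'

# apptwo/settings.py (sketch)
CELERY_DEFAULT_QUEUE = 'myqueue2'

# Then bind each worker to its own queue so it never consumes the other app's tasks:
#   celery worker -A appone -B -Q myqueue1 --loglevel=INFO
#   celery worker -A apptwo -B -Q myqueue2 --loglevel=INFO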

Celery task not received by worker when two projects

My colleague has written the celery tasks, the necessary configuration in the settings file, and also the supervisor config file. Everything is working perfectly fine. The project has been handed over to me and I am seeing some issues that I have to fix.
There are two projects running on a single machine; both projects are almost the same, let's call them projA and projB.
supervisord.conf file is as:
;for projA
[program:celeryd]
directory=/path_to_projA/
command=celery -A project worker -l info
...
[program:celerybeat]
directory=/path_to_projA/
command=celery -A project beat -l info
...
; For projB
[program:celerydB]
directory=/path_to_projB/
command=celery -A project worker -l info
...
[program:celerybeatB]
directory=/path_to_projB/
command=celery -A project beat -l info
...
The issue is, I am creating tasks through a loop and only one task is received by the celeryd of projA; the remaining tasks are not received (or may be received by the celeryd of projB).
But when I stop the celery programs for projB, everything works well. Please note, the actual name of the django app is project, hence celery -A project worker/beat -l info.
Please bear with me, I am new to celery; any help is appreciated. TIA.
As the Celery docs say,
Celery is an asynchronous task queue/job queue based on distributed message passing.
When multiple tasks are created through a loop, the tasks are evenly distributed between the two different workers, i.e. the worker of projA and the worker of projB, since both workers consume from the same default queue.
If the projects are similar, or as you mentioned almost the same, you can use Celery queues, but of course your queues should be different across the projects.
The Celery docs for this are provided here.
You need to set CELERY_DEFAULT_QUEUE, CELERY_DEFAULT_ROUTING_KEY and CELERY_QUEUES
in your settings.py file.
And your supervisord.conf file needs the queue name on the command line for all of the programs.
For Ex: command=celery -A project beat -l info -Q <queue_name>
And that should work, based on my experience.
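For reference, a minimal sketch of what those settings might look like for projA (the queue name proja_queue is a placeholder; projB would mirror it with its own name):
# projA settings.py -- sketch only
from kombu import Queue

CELERY_DEFAULT_QUEUE = 'proja_queue'
CELERY_DEFAULT_ROUTING_KEY = 'proja_queue'
CELERY_QUEUES = (
    Queue('proja_queue', routing_key='proja_queue'),
)

# matching supervisord command for projA's worker:
#   command=celery -A project worker -l info -Q proja_queue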

Celery & RabbitMQ running as docker containers: Received unregistered task of type '...'

I am relatively new to docker, celery and rabbitMQ.
In our project we currently have the following setup:
1 physical host with multiple docker containers running:
1x rabbitmq:3-management container
# pull image from docker hub and install
docker pull rabbitmq:3-management
# run docker image
docker run -d -e RABBITMQ_NODENAME=my-rabbit --name some-rabbit -p 8080:15672 -p 5672:5672 rabbitmq:3-management
1x celery container
# pull docker image from docker hub
docker pull celery
# run celery container
docker run --link some-rabbit:rabbit --name some-celery -d celery
(there are some more containers, but they should not have anything to do with the problem)
Task File
To get to know celery and rabbitmq a bit, I created a tasks.py file on the physical host:
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest:guest@172.17.0.81/')

@app.task(name='tasks.add')
def add(x, y):
    return x + y
The whole setup seems to be working quite fine actually. So when I open a python shell in the directory where tasks.py is located and run
>>> from tasks import add
>>> add.delay(4,4)
The task gets queued and is directly pulled by the celery worker.
However, according to the logs, the celery worker does not know the tasks module:
$ docker logs some-celery
[2015-04-08 11:25:24,669: ERROR/MainProcess] Received unregistered task of type 'tasks.add'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you are using relative imports?
Please see http://bit.ly/gLye1c for more information.
The full contents of the message body was:
{'callbacks': None, 'timelimit': (None, None), 'retries': 0, 'id': '2b5dc209-3c41-4a8d-8efe-ed450d537e56', 'args': (4, 4), 'eta': None, 'utc': True, 'taskset': None, 'task': 'tasks.add', 'errbacks': None, 'kwargs': {}, 'chord': None, 'expires': None} (256b)
Traceback (most recent call last):
File "/usr/local/lib/python3.4/site-packages/celery/worker/consumer.py", line 455, in on_task_received
strategies[name](message, body,
KeyError: 'tasks.add'
So the problem obviously seems to be, that the celery workers in the celery container do not know the tasks module.
Now as I am not a docker specialist, I wanted to ask how I would best import the tasks module into the celery container?
Any help is appreciated :)
EDIT 4/8/2015, 21:05:
Thanks to Isowen for the answer. Just for completeness here is what I did:
Let's assume my tasks.py is located on my local machine in /home/platzhersh/celerystuff. Now I created a celeryconfig.py in the same directory with the following content:
CELERY_IMPORTS = ('tasks')
CELERY_IGNORE_RESULT = False
CELERY_RESULT_BACKEND = 'amqp'
As mentioned by Isowen, celery searches /home/user of the container for tasks and config files. So we mount the /home/platzhersh/celerystuff into the container when starting:
docker run -v /home/platzhersh/celerystuff:/home/user --link some-rabbit:rabbit --name some-celery -d celery
This did the trick for me. Hope this helps some other people with similar problems.
I'll now try to expand that solution by putting the tasks also in a separate docker container.
As you suspect, the issue is because the celery worker does not know the tasks module. There are two things you need to do:
Get your tasks definitions "into" the docker container.
Configure the celery worker to load those task definitions.
For Item (1), the easiest way is probably to use a "Docker Volume" to mount a host directory of your code onto the celery docker instance. Something like:
docker run --link some-rabbit:rabbit -v /path/to/host/code:/home/user --name some-celery -d celery
Where /path/to/host/code is your host path, and /home/user is the path to mount it at inside the instance. Why /home/user in this case? Because the Dockerfile for the celery image defines the working directory (WORKDIR) as /home/user.
(Note: Another way to accomplish Item (1) would be to build a custom docker image with the code "built in", but I will leave that as an exercise for the reader.)
For Item (2), you need to create a celery configuration file that imports the tasks file. This is a more general issue, so I will point to a previous stackoverflow answer: Celery Received unregistered task of type (run example)
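Once the volume is mounted and the config is in place, a quick way to confirm that the worker has actually registered the task is to inspect it from the same Python shell (a sketch; it assumes the tasks.py app object from the question):
# check_registration.py -- sketch
from tasks import app

# Ask the running workers which task names they have registered.
registered = app.control.inspect().registered()
print(registered)  # 'tasks.add' should appear for the some-celery worker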
