I have a FastAPI app from which I want to call a Celery task.
I cannot import the task because the two live in different code bases, so I have to call it by name.
In tasks.py:
import os
from typing import Any, Dict

from celery import Celery

imagery = Celery(
    "imagery", broker=os.getenv("BROKER_URL"), backend=os.getenv("REDIS_URL")
)
...
@imagery.task(bind=True, name="filter")
def filter_task(self, **kwargs) -> Dict[str, Any]:
    print('running task')
The celery worker is running with this command:
celery worker -A worker.imagery -P threads --loglevel=INFO --queues=imagery
Now, in my FastAPI code base, I want to run the filter task.
My understanding is that I have to use celery.send_task().
In app.py I have:
import os

from celery import Celery, states
from fastapi import FastAPI
from starlette.responses import JSONResponse, PlainTextResponse

from app import models

app = FastAPI()
tasks = Celery(broker=os.getenv("BROKER_URL"), backend=os.getenv("REDIS_URL"))

@app.post("/filter", status_code=201)
async def upload_images(data: models.FilterProductsModel):
    """
    TODO: use a celery task(s) to query the database and upload the results to S3
    """
    data = ['ok', 'un test']
    result = tasks.send_task('workers.imagery.filter', args=list(data))
    return PlainTextResponse(f"here is the id: {str(result.ready())}")
After calling the /filter endpoint, I don't see any task being picked up by the worker.
So I tried different names in send_task():
filter
imagery.filter
worker.imagery.filter
Why does my task never get picked up by the worker, and why does nothing show in the log?
Is my task name wrong?
Edit:
The worker process runs in Docker. Here is the full path of the file on its disk.
tasks.py : /workers/worker.py
So if I follow the import scheme, the name of the task would be workers.worker.filter, but this does not work either; nothing gets printed in the Docker logs. Is a print supposed to appear in the STDOUT of the Celery CLI?
Your Celery worker is subscribed to the imagery queue only. On the other hand, you try to send the task to the default queue (if you did not change the configuration, the name of that queue is celery) with result = tasks.send_task('workers.imagery.filter', args=list(data)). It is not surprising that you do not see the task being executed by your worker, as you have been sending tasks to the default queue the whole time.
To fix this, try the following:
result = tasks.send_task('workers.imagery.filter', args=list(data), queue='imagery')
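If you want to double-check which queues the worker actually consumes from, you can ask it from the command line (using the same app path as in the worker command above):

celery -A worker.imagery inspect active_queues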
OP here.
This is the solution I used:

from celery import signature

task = signature("filter", kwargs=data.dict(), queue="imagery")
res = task.delay()

As mentioned by @DejanLekic, I had to specify the queue.
Related
I have configured a CentOS server (server A) with:
Redis 6.2.6
Python 3.8.1
Celery 5.2.1
Flower
On another server (server B) I send tasks to Redis that are executed by a worker on server A.
I want to implement multiple queues and change the name of the default queue: queues "high", "normal" and "low", with "normal" being the default queue (instead of the celery queue).
If I hardcode the queue in the task, it works (they are sent to the right queue).
But if I try to do it with "task_routes", it doesn't.
I tried to:
hardcode the route in this setting
implement a routing function
implement a routing class with the routing function
But it never works. Moreover, even though I set the task_create_missing_queues setting to False, all tasks are sent to the celery queue (the default queue is still celery and not normal...).
Any ideas?
Here is the code for my celeryconfig.py file (with the routing function):
from celery import Celery
from kombu import Exchange, Queue
from celery.exceptions import Reject
import re
task_create_missing_queues = False
task_queues = (
    Queue('high', Exchange('high'), routing_key='high'),
    Queue('normal', Exchange('normal'), routing_key='normal'),
    Queue('low', Exchange('low'), routing_key='low')
)
task_default_queue = 'normal'
task_default_exchange = 'normal'
task_default_routing_key = 'normal'
def route_tasks(name, args, kwargs, options, task=None, **kw):
    if ':' not in name:
        return {'queue': 'normal'}
    namespace, _ = name.split(':')
    return {'queue': namespace}

task_routes = (route_tasks,)
And my tasks are defined with a name like this via the decorator (when I hardcode the queue for each task, this is where I add queue="queue_name"); my Celery app is called celery:
@celery.task(bind=True, name="high:long_task")
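Given that naming scheme, the routing function above should resolve the queue from the part of the task name before the colon. A quick sanity check you can run against the function itself (a sketch, assuming celeryconfig.py is importable as shown above):

from celeryconfig import route_tasks

print(route_tasks('high:long_task', (), {}, {}))   # expected: {'queue': 'high'}
print(route_tasks('unprefixed_task', (), {}, {}))  # expected: {'queue': 'normal'}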
I can see in Flower that the configuration is picked up correctly.
My worker is started with celery multi and -Q high,normal,low.
Many thanks!
I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for a list of words using a thread pool, then commit them to a database.
I keep running into an error saying I'm working outside of the application context, even though I'm creating a new Polly and S3 client each time.
How can I generate/upload multiple audio files at once?
from flask import current_app
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid

def upload_audio_file_to_s3(file):
    app = current_app._get_current_object()
    with app.app_context():
        s3 = boto3.client(service_name='s3',
                          aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
                          aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
        extension = file.filename.rsplit('.', 1)[1].lower()
        file.filename = f"{uuid.uuid4().hex}.{extension}"
        s3.upload_fileobj(file,
                          app.config.get('S3_BUCKET'),
                          f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
                          ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
        return file.filename

def generate_polly(voice_id, text):
    app = current_app._get_current_object()
    with app.app_context():
        polly_client = boto3.Session(
            aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
            aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
            region_name=app.config.get('AWS_REGION')).client('polly')
        response = polly_client.synthesize_speech(VoiceId=voice_id,
                                                  OutputFormat='mp3', Text=text)
        return response['AudioStream'].read()

def generate_polly_from_term(vocab_term, gender='m'):
    app = current_app._get_current_object()
    with app.app_context():
        audio = generate_polly('Celine', vocab_term.term)
        file = io.BytesIO(audio)
        file.filename = 'temp.mp3'
        file.content_type = 'mp3'
        return vocab_term.id, upload_audio_file_to_s3(file)

def generate_async_audio_service(terms):
    pool = ThreadPool(processes=12)
    results = pool.map(generate_polly_from_term, terms)
    # do something w/ results
This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.
Celery is a task manager for Python. The reason you would want to use it: if requests keep hitting Flask but the work they trigger takes longer to finish than the interval at which new requests arrive, some of them will be blocked and you won't get all of your results. To fix this, you hand the work off to another process. This goes like so:
1) Client sends a request to Flask to process audio files
2) The files land in Flask to be processed; Flask will send an asynchronous task to Celery.
3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)
4) Flask is now unburdened from that task and can receive more
5) Celery finishes the task, including the upload to your database
Celery and Flask are then two separate Python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve the state of a task through Flask if you want the client to verify that the task was/was not completed. The route in your Flask app.py would look like:
@app.route('/my-route', methods=['POST'])
def process_audio():
    # Get your files and save to common temp storage
    save_my_files(target_dir, files)
    response = celery_app.send_task('celery_worker.files', args=[target_dir])
    return jsonify({'task_id': response.task_id})
Where celery_app comes from another module worker.py:
import os
from celery import Celery
env = os.environ
# These defaults assume a RabbitMQ broker (with the rpc:// result backend)
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
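If you want the client to be able to poll for completion (as mentioned above), a minimal status route could sit next to the POST route. This is just a sketch; the route path and response shape are made up, and it assumes jsonify is imported from flask:

@app.route('/status/<task_id>', methods=['GET'])
def task_status(task_id):
    # AsyncResult looks the task up in the configured result backend
    result = celery_app.AsyncResult(task_id)
    return jsonify({'task_id': task_id,
                    'state': result.state,
                    'result': result.result if result.ready() else None})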
Then, your celery process would have a worker configured something like:
import os

from celery import Celery
from celery.signals import after_task_publish

env = os.environ
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)

@celery_app.task(name='celery_worker.files')
def async_files(path):
    # Get the file from path
    # Process it
    # Upload to the database
    # This is just if you want to return an actual result; fill it in with whatever you need
    return {'task_state': "FINISHED"}
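To run this, start a worker against that module. The module name celery_worker.py here is an assumption (chosen to match the celery_worker.files task name used above):

celery -A celery_worker.celery_app worker --loglevel=INFO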
This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your Flask app available to whoever wants to send files to it without blocking anything else.
Hopefully that's somewhat helpful.
I have a Django project on an Ubuntu EC2 node, which I have been using to set up an asynchronous task queue using Celery.
I am following http://michal.karzynski.pl/blog/2014/05/18/setting-up-an-asynchronous-task-queue-for-django-using-celery-redis/ along with the docs.
I've been able to get a basic task working at the command line, using:
(env1)ubuntu@ip-172-31-22-65:~/projects/tp$ celery --app=myproject.celery:app worker --loglevel=INFO
I just realized that I have a bunch of tasks in my queue that had not executed:
[2015-03-28 16:49:05,916: WARNING/MainProcess] Restoring 4 unacknowledged message(s).
(env1)ubuntu@ip-172-31-22-65:~/projects/tp$ celery -A tp purge
WARNING: This will remove all tasks from queue: celery.
There is no undo for this operation!
(to skip this prompt use the -f option)
Are you sure you want to delete all tasks (yes/NO)? yes
Purged 81 messages from 1 known task queue.
How do I get a list of the queued items from the command line?
If you want to get all scheduled tasks,
celery inspect scheduled
To find all active queues
celery inspect active_queues
For status
celery inspect stats
For all commands
celery inspect
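These commands accept the same -A option used earlier, so for the project in the question you would run, for example, celery -A tp inspect scheduled.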
If you want to get at it explicitly: since you are using Redis as the queue, then

redis-cli
> KEYS *  # find all keys

Then find the key related to Celery:

> LLEN <key>  # gives the length of the list, i.e. the number of queued messages (for the default queue the key is celery)
Here is a copy-paste solution for Redis:
def get_celery_queue_len(queue_name):
    from yourproject.celery import app as celery_app
    with celery_app.pool.acquire(block=True) as conn:
        return conn.default_channel.client.llen(queue_name)

def get_celery_queue_items(queue_name):
    import base64
    import json
    from yourproject.celery import app as celery_app
    with celery_app.pool.acquire(block=True) as conn:
        tasks = conn.default_channel.client.lrange(queue_name, 0, -1)
    decoded_tasks = []
    for task in tasks:
        j = json.loads(task)
        body = json.loads(base64.b64decode(j['body']))
        decoded_tasks.append(body)
    return decoded_tasks
It works with Django. Just don't forget to change yourproject.celery.
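A usage sketch, assuming the default queue name celery (the structure of the decoded message bodies depends on the Celery message protocol version, so just print them to see what you get):

print(get_celery_queue_len('celery'))
for body in get_celery_queue_items('celery'):
    print(body)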
Is there a way to get a list of registered tasks?
I tried:
celery_app.tasks.keys()
Which only returns built-in Celery tasks like celery.chord, celery.chain, etc.
For the newer versions of Celery (4.0 or above), we can get registered tasks as follows:
from celery import current_app
tasks = current_app.tasks.keys()
For older versions of Celery (< 4), we can get registered tasks as follows:
from celery.task.control import inspect
i = inspect()
i.registered_tasks()
This will give a dictionary of all workers & related registered tasks.
from itertools import chain
set(chain.from_iterable(i.registered_tasks().values()))
In case you have multiple workers running the same tasks, or you just need a set of all registered tasks, this does the job.
Alternate way:
From the terminal you can get a dump of registered tasks by using this command:
celery inspect registered
To inspect tasks related to a specific app, you can pass the app name:
celery -A app_name inspect registered
With the newer versions of Celery (4.0 and above), the following seems to be the right way:

from celery import current_app

current_app.loader.import_default_modules()
tasks = list(sorted(name for name in current_app.tasks
                    if not name.startswith('celery.')))
In a shell, try:
from celery import current_app
print(current_app.tasks.keys())
current_app.tasks has all tasks available as a dictionary. The keys are all the names of registered tasks in the current celery app you are running.
Is there a way to get all the results from every worker on a Celery broadcast task? I would like to monitor whether everything went OK on all the workers. A list of the workers that the task was sent to would also be appreciated.
No, that is not easily possible.
But you don't have to limit yourself to the built-in amqp result backend,
you can send your own results using Kombu (http://kombu.readthedocs.org),
which is the messaging library used by Celery:
from celery import Celery
from kombu import Exchange

results_exchange = Exchange('myres', type='fanout')

app = Celery()

@app.task(ignore_result=True)
def something():
    res = do_something()  # whatever work the task actually performs
    with app.producer_or_acquire(block=True) as producer:
        producer.publish(
            {'result': res},
            exchange=results_exchange,
            serializer='json',
            declare=[results_exchange],
        )
producer_or_acquire will create a new kombu.Producer using the Celery broker connection pool.
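On the receiving side, each monitor (or worker) can bind its own queue to that fanout exchange and collect the results. A minimal Kombu consumer sketch; the broker URL and the queue name myres-monitor are assumptions:

import socket

from kombu import Connection, Exchange, Queue

results_exchange = Exchange('myres', type='fanout')
# every consumer gets its own queue bound to the fanout exchange
results_queue = Queue('myres-monitor', exchange=results_exchange)

def on_result(body, message):
    # body is the {'result': ...} dict published by the task above
    print('received result:', body)
    message.ack()

with Connection('amqp://guest:guest@localhost//') as conn:
    with conn.Consumer(results_queue, callbacks=[on_result], accept=['json']):
        try:
            while True:
                conn.drain_events(timeout=5)
        except socket.timeout:
            pass  # no more results within the timeout window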