I have a very simple Flask app example which uses a Celery worker to process a task asynchronously:
app.py
import os

from flask import Flask

# make_celery, db and conn_str come from the rest of the project (not shown here)
app = Flask(__name__)
app.config['CELERY_BROKER_URL'] = os.environ.get('REDISCLOUD_URL', 'redis://localhost:6379')
app.config['CELERY_RESULT_BACKEND'] = os.environ.get('REDISCLOUD_URL', 'redis://localhost:6379')
app.config['SQLALCHEMY_DATABASE_URI'] = conn_str
celery = make_celery(app)
db.init_app(app)


@app.route('/')
def index():
    return "Working"


@app.route('/test')
def test():
    task = reverse.delay("hello")
    return task.id


@celery.task(name='app.reverse')
def reverse(string):
    return string[::-1]


if __name__ == "__main__":
    app.run()
To run it locally, I run celery -A app.celery worker --loglevel=INFO
in one terminal, and python app.py in another terminal.
I'm wondering how I can deploy this application on Google Cloud. I don't want to use Task Queues since it is only compatible with Python 2. Is there good documentation available for doing something like this? Thanks
App Engine Task Queues is the previous version of Google Cloud Tasks, which has full support for App Engine Flex/Standard and Python 3.x runtimes.
You need to create a Cloud Tasks queue and an App Engine service to handle the tasks.
Gcloud command to create a queue
gcloud tasks queues create [QUEUE_ID]
Task handler code
from flask import Flask, request

app = Flask(__name__)


@app.route('/example_task_handler', methods=['POST'])
def example_task_handler():
    """Log the request payload."""
    payload = request.get_data(as_text=True) or '(empty payload)'
    print('Received task with payload: {}'.format(payload))
    return 'Printed task payload: {}'.format(payload)
Code to push a task
"""Create a task for a given queue with an arbitrary payload."""
from google.cloud import tasks_v2
client = tasks_v2.CloudTasksClient()
# replace with your values.
# project = 'my-project-id'
# queue = 'my-appengine-queue'
# location = 'us-central1'
# payload = 'hello'
parent = client.queue_path(project, location, queue)
# Construct the request body.
task = {
'app_engine_http_request': { # Specify the type of request.
'http_method': tasks_v2.HttpMethod.POST,
'relative_uri': '/example_task_handler'
}
}
if payload is not None:
# The API expects a payload of type bytes.
converted_payload = payload.encode()
# Add the payload to the request.
task['app_engine_http_request']['body'] = converted_payload
if in_seconds is not None:
timestamp = datetime.datetime.utcnow() + datetime.timedelta(seconds=in_seconds)
# Add the timestamp to the tasks.
task['schedule_time'] = timestamp
# Use the client to build and send the task.
response = client.create_task(parent=parent, task=task)
print('Created task {}'.format(response.name))
return response
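For instance, a minimal usage sketch of the snippet above (the project, queue and location values are placeholders to replace with your own):
# Placeholder values; use your own project ID, queue name and region.
create_task(project='my-project-id',
            queue='my-appengine-queue',
            location='us-central1',
            payload='hello',
            in_seconds=60)  # schedule the task roughly one minute from now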
requirements.txt
Flask==1.1.2
gunicorn==20.0.4
google-cloud-tasks==2.0.0
You can check the full example on the GCP Python examples GitHub page.
Related
Context
I developed a Flask API that sends tasks to my computing environment.
To use it, you make a POST request to the API.
The API then receives your request, processes it, and sends a message with the necessary data through the RabbitMQ broker to be handled by the computing environment.
At the end, it should send the result back to the API.
Some code
Here is an example of my API and my Celery application:
# main.py
# Package
import time

from flask import Flask
from flask import request, jsonify, make_response

# Own module
from celery_app import celery_app

# Environment
app = Flask(__name__)


# Endpoint
@app.route("/test", methods=["POST"])
def test():
    """
    Test route

    Returns
    -------
    Json formatted output
    """
    # Do some preprocessing in here
    result = celery_app.send_task("tasks.Client", args=[1, 2])
    while result.state == "PENDING":
        time.sleep(0.01)
    result = result.get()
    if result["success"]:
        result_code = 200
    else:
        result_code = 500
    output = str(result)
    return make_response(
        jsonify(
            text=output,
            code_status=result_code,
        ),
        result_code,
    )


# Main thread
if __name__ == "__main__":
    app.run()
In a different file, I have set up my Celery application, connected to the RabbitMQ queue:
# celery_app.py
from celery import Celery, Task

# USER, PASSWORD, HOSTNAME, PORT and COLLECTION are placeholders defined elsewhere
celery_app = Celery("my_celery",
                    broker=f"amqp://{USER}:{PASSWORD}@{HOSTNAME}:{PORT}/{COLLECTION}",
                    backend="rpc://"
                    )

celery_app.conf.task_serializer = "pickle"
celery_app.conf.result_serializer = "pickle"
celery_app.conf.accept_content = ["pickle"]
celery_app.conf.broker_connection_max_retries = 5
celery_app.conf.broker_pool_limit = 1


class MyTask(Task):
    def run(self, a, b):
        return a + b


celery_app.register_task(MyTask())
To run it, you should launch:
python3 main.py
Do not forget to also run the Celery worker (after registering the tasks in it), for example with celery -A celery_app worker --loglevel=INFO.
Then you can make a POST request to it:
curl -X POST http://localhost:8000/test
The problem to resolve
While this simple API is running, I send requests to my endpoint.
Unfortunately, it fails about one time in four.
I get two error messages:
The first message is:
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
Then, because of the timeout, my server has lost the message, so:
File "main.py", line x, in test
result = celery_app.send_task("tasks.Client", args=[1, 2])
amqp.exceptions.InvalidCommand: Channel.close_ok: (503) COMMAND_INVALID - unimplemented method
Resolve this error
There are two ways to get around this problem:
retry sending the task until it fails 5 times in a row (try / except amqp.exceptions.InvalidCommand)
change the timeout value.
Unfortunately, these don't seem to be the best ways to solve it.
Can you help me?
Regards
PS:
my_packages:
Flask==2.0.2
python==3.6
celery==4.4.5
rabbitmq==latest
1. PreconditionFailed
I changed my RabbitMQ version from latest to 3.8.14.
Then, I set a Celery task timeout using time_limit and soft_time_limit.
And it works :)
2. InvalidCommand
To resolve this problem, I used Celery's retry functionality.
I set up:
max_retries=3
autoretry_for=(InvalidCommand,)
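A minimal sketch of a task combining the timeout and retry options above (the limit values are illustrative placeholders, not taken from the original project):
# Sketch only: illustrative retry and time-limit options on the existing celery_app
from amqp.exceptions import InvalidCommand

from celery_app import celery_app


@celery_app.task(name="tasks.Client",
                 autoretry_for=(InvalidCommand,),  # retry automatically on this error
                 max_retries=3,
                 soft_time_limit=60,   # illustrative soft limit, in seconds
                 time_limit=120)       # illustrative hard limit, in seconds
def client_task(a, b):
    return a + b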
I have set up a Pyramid server with one method that sends tasks through Celery (in a distributed fashion):
from logging import info
from wsgiref.simple_server import make_server

from celery import Celery
from pyramid.config import Configurator
from pyramid.view import view_config

celery_app = Celery('mycelery', broker='', backend='')


@view_config(route_name='run', request_method='POST')
def run_task(request):
    req = request.json_body
    task = celery_app.signature(
        req['mytaskname'],
        kwargs={'data': req['mykey']},
        queue='jobs'
    ).delay()


if __name__ == '__main__':
    with Configurator() as config:
        config.add_route('run', '/v2/run')
        config.scan()
        app = config.make_wsgi_app()
    # API_HOST and API_PORT are defined elsewhere (e.g. from the environment)
    server = make_server(API_HOST, int(API_PORT), app)
    info(f'server at {API_HOST}:{API_PORT}')
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        pass
Note how the task name is included in the request body, which implies that the user needs to know beforehand which task names are available. Now, I just need to set up the worker on a different machine and point it to the same Celery broker instance:
from celery import Celery

celery_app = Celery('mycelery', broker='', backend='')


@celery_app.task(name='example_task_name', bind=True)
def my_task(self, data):
    pass
I want to be able to set up (and kill) workers at any time on different machines, but the server should list all the task names available to my application, with a brief description of each. I thought about using the worker_ready.connect decorator to send a signal to the API, but I am not able to successfully integrate Celery with Pyramid:
from celery.signals import worker_ready


@worker_ready.connect
def register_worker(sender, **k):
    celery_app.signature(
        'join',
        kwargs={'task_name': 'example_task_name', 'task_description': 'example task'},
        queue='events'
    ).delay()
Any ideas?
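As a side note, a minimal sketch of one way to enumerate the task names registered on the currently running workers using Celery's inspection API (the broker and backend URLs are placeholders, and this assumes the workers are already connected):
from celery import Celery

# Placeholder broker/backend URLs; point these at your own broker.
celery_app = Celery('mycelery', broker='amqp://localhost//', backend='rpc://')


def list_registered_tasks():
    # Ask every currently running worker for its registered task names.
    registered = celery_app.control.inspect().registered() or {}
    # 'registered' maps worker hostnames to lists of task names.
    return sorted({name for names in registered.values() for name in names})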
I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for the list of words using a threading pool and then commit them to a database.
I keep running into an error saying that I'm working outside of the application context, even though I'm creating a new Polly and S3 client each time.
How can I generate/upload multiple audio files at once?
from flask import current_app
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid


def upload_audio_file_to_s3(file):
    app = current_app._get_current_object()
    with app.app_context():
        s3 = boto3.client(service_name='s3',
                          aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
                          aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
        extension = file.filename.rsplit('.', 1)[1].lower()
        file.filename = f"{uuid.uuid4().hex}.{extension}"
        s3.upload_fileobj(file,
                          app.config.get('S3_BUCKET'),
                          f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
                          ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
        return file.filename


def generate_polly(voice_id, text):
    app = current_app._get_current_object()
    with app.app_context():
        polly_client = boto3.Session(
            aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
            aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
            region_name=app.config.get('AWS_REGION')).client('polly')
        response = polly_client.synthesize_speech(VoiceId=voice_id,
                                                  OutputFormat='mp3', Text=text)
        return response['AudioStream'].read()


def generate_polly_from_term(vocab_term, gender='m'):
    app = current_app._get_current_object()
    with app.app_context():
        audio = generate_polly('Celine', vocab_term.term)
        file = io.BytesIO(audio)
        file.filename = 'temp.mp3'
        file.content_type = 'mp3'
        return vocab_term.id, upload_audio_file_to_s3(file)


def generate_async_audio_service(terms):
    pool = ThreadPool(processes=12)
    results = pool.map(generate_polly_from_term, terms)
    # do something w/ results
This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.
Celery is a task queue for Python. The reason you would want to use it: if you have tasks hitting Flask, but they take longer to finish than the interval at which new tasks come in, then certain tasks will be blocked and you won't get all of your results. To fix this, you hand the work off to another process. This goes like so:
1) Client sends a request to Flask to process audio files
2) The files land in Flask to be processed; Flask will send an asynchronous task to Celery.
3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)
4) Flask is now unburdened from that task and can receive more
5) Celery finishes the task, including the upload to your database
Celery and Flask are then two separate Python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve the state of a task through Flask if you want the client to verify that the task was/was not completed. The route in your Flask app.py would look like:
@app.route('/my-route', methods=['POST'])
def process_audio():
    # Get your files and save to common temp storage
    save_my_files(target_dir, files)
    response = celery_app.send_task('celery_worker.files', args=[target_dir])
    return jsonify({'task_id': response.task_id})
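If you later want to check on that task from Flask, a minimal sketch of a status route (it reuses the same app, celery_app and jsonify as above; the route path is just an illustration):
from celery.result import AsyncResult


@app.route('/my-route/status/<task_id>', methods=['GET'])
def task_status(task_id):
    # Look up the task state (and its result, if it has finished) by id.
    result = AsyncResult(task_id, app=celery_app)
    payload = {'task_id': task_id, 'state': result.state}
    if result.ready():
        payload['result'] = result.result
    return jsonify(payload)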
Where celery_app comes from another module worker.py:
import os

from celery import Celery

env = os.environ

# This is for a RabbitMQ broker
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
Then, your Celery process would have a worker configured something like:
import os

from celery import Celery
from celery.signals import after_task_publish

env = os.environ

CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)


@celery_app.task(name='celery_worker.files')
def async_files(path):
    # Get file from path
    # Process
    # Upload to database
    # This is just if you want to return an actual result; you can fill this in with whatever
    return {'task_state': "FINISHED"}
This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your Flask app available to whoever wants to send files to it without blocking anything else.
Hopefully that's somewhat helpful.
Basically, I have a Flask app hosted on an Azure instance. When I post some data to an API endpoint, Celery starts a process in the background and the API immediately sends a response to the client.
Here is a basic tasks.py sample:
from celery import Celery

app = Celery('tasks', broker='amqp://localhost//')


@app.task
def reverse(main):
    return main[::-1]
Error:
Basic Flask example:
from flask import Flask
from flask import request

from tasks import *

app = Flask(__name__)


@app.route('/params', methods=['POST'])
def get_url():
    main = request.args.get('main')
    reverse.delay(main)
    return main


if __name__ == "__main__":
    app.run()
Again, the Flask app is running on an Azure instance. Do I have to change localhost to an IP in tasks.py?
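A minimal sketch of making the broker URL configurable instead of hard-coding localhost (the environment variable name and the example URL are placeholder assumptions; whatever host you use just has to be reachable from both the Flask app and the worker):
import os

from celery import Celery

# Placeholder assumption: read the broker URL from the environment, e.g.
# CELERY_BROKER_URL=amqp://user:password@my-rabbitmq-host:5672//
broker_url = os.environ.get('CELERY_BROKER_URL', 'amqp://localhost//')
app = Celery('tasks', broker=broker_url)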
I'm building a Flask application which relies on Celery to process some long running tasks. Each task will essentially append a dictionary to a shared list once it has finished processing - this list is shared by the celery workers and the routes of the Flask application. The Flask component essentially consists of a set of routes to retrieve the contents of the shared list and modify the order of the elements.
I think I have successfully shared the list between the Celery workers using a Manager from Python's multiprocessing module. However, the changes made to this list are not seen by the Flask application. Here is a minimal application which illustrates the issue:
import os
import json

from flask import Flask
from multiprocessing import Manager
from celery import Celery

application = Flask(__name__)

redis_url = os.environ.get('REDIS_URL')
if redis_url is None:
    redis_url = 'redis://localhost:6379/0'

# Set the secret key to enable cookies
application.secret_key = 'some secret key'
application.config['SESSION_TYPE'] = 'filesystem'

# Redis and Celery configuration
application.config['BROKER_URL'] = redis_url
application.config['CELERY_RESULT_BACKEND'] = redis_url

celery = Celery(application.name, broker=redis_url)
celery.conf.update(BROKER_URL=redis_url,
                   CELERY_RESULT_BACKEND=redis_url)

manager = Manager()
shared_queue = manager.list()  # THIS IS THE SHARED LIST


@application.route("/submit", methods=['GET'])
def submit_song():
    add_song_to_queue.delay()
    return 'Added a song to the queue'


@application.route("/playlist", methods=['GET', 'POST'])
def get_playlist():
    playlist = []
    i = 0
    queue_size = len(shared_queue)
    while i < queue_size:
        print(shared_queue[i])
        playlist.append(shared_queue[i])
        i += 1
    return json.dumps(playlist)


@celery.task
def add_song_to_queue():
    shared_queue.append({'some': 'data!'})
    print(len(shared_queue))


if __name__ == "__main__":
    application.run(host='0.0.0.0', debug=True)
In the Celery logs I can clearly see that the dictionaries are being appended to the list, and that the size of the list increases. However, when I access the /playlist route in my browser I always get an empty list.
Does anyone know how I can get the list to be shared among all the workers and the Flask application?
I found a solution by moving away from Celery and instead using multiprocessing.Pool as a task queue, together with shared memory through Manager, as shown in the sample code in the question. This link has an excellent example of how this solution can be integrated with Flask: http://gouthamanbalaraman.com/blog/python-multiprocessing-as-a-task-queue.html
from multiprocessing import Pool
from flask import Flask

app = Flask(__name__)
_pool = None


def expensive_function(x):
    # import packages that are used in this function
    # do your expensive, time-consuming process
    return x * x


@app.route('/expensive_calc/<int:x>')
def route_expcalc(x):
    f = _pool.apply_async(expensive_function, [x])
    r = f.get(timeout=2)
    return 'Result is %d' % r


if __name__ == '__main__':
    _pool = Pool(processes=4)
    try:
        # insert production server deployment code
        app.run()
    except KeyboardInterrupt:
        _pool.close()
        _pool.join()
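A quick usage sketch, assuming the app is running on the default Flask development port:
curl http://localhost:5000/expensive_calc/7
# Result is 49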