My app contains two components:
FastAPI for the web service
Kafka for consuming messages
I want to run each of them in its own thread, started from the main thread. The Kafka consumer already runs on a separate thread, but I don't know how to run FastAPI under gunicorn the same way. I have read that a subprocess can be launched from a thread.
Is there a way to run gunicorn programmatically, like uvicorn.run()?
Here is my main code:
def create_app():
    logging.basicConfig(level=logging.DEBUG)
    # register config
    config = Config()
    # register dependency
    di_container = Container()
    di_container.config.from_dict(config.get_config())
    di_container.wire(modules=[runner_rating])
    # start kafka consumer in a separate thread
    kafka_client = di_container.kafka_consumer.client()
    kafka_client.listen()
    # how to run this inside thread?
    app = FastAPI()
    app.container = di_container
    return app
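For the threading half of the question, here is a minimal standard-library sketch of the pattern, assuming kafka_client.listen() blocks in a poll loop. FakeConsumer, stop_event and start_consumer_thread are hypothetical names used only for illustration; this is not the Kafka client's API and it does not start gunicorn itself.

```python
import threading
import time


class FakeConsumer:
    """Hypothetical stand-in for the Kafka client; listen() blocks like a poll loop."""

    def __init__(self):
        self.stop_event = threading.Event()
        self.polls = 0

    def listen(self):
        # Loop until asked to stop; a real consumer would poll the broker here.
        while not self.stop_event.is_set():
            self.polls += 1
            time.sleep(0.01)


def start_consumer_thread(consumer):
    # daemon=True lets the process exit even if the consumer is still looping.
    thread = threading.Thread(target=consumer.listen, name="kafka-consumer", daemon=True)
    thread.start()
    return thread


consumer = FakeConsumer()
worker = start_consumer_thread(consumer)
time.sleep(0.05)           # the main thread is free here, e.g. to build and return the app
consumer.stop_event.set()  # ask the consumer loop to finish
worker.join(timeout=1)
```

If a thread like this is started inside create_app(), keep in mind that every gunicorn worker process that calls create_app() would start its own consumer thread.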
Related
TLDR:
I need to set up a flask app for multiprocessing such that the API and the stomp queue listener run in separate processes and therefore do not interfere with each other's operations.
Details:
I am building a python flask app that has API endpoints and also creates a message queue listener to connect to an activemq queue with the stomp package.
I need to implement multiprocessing such that the API and listener do not block each other's operation. That way the API will accept new requests and the listener will continue to listen for new messages and carry out tasks accordingly.
A simplified version of the code is shown below (some details are omitted for brevity).
Problem: The multiprocessing is causing the application to get stuck. The worker's run method is not called consistently, and therefore the listener never gets created.
# Start the worker as a subprocess -- this is not working -- app gets stuck before the worker's run method is called
m = Manager()
shared_state = m.dict()
worker = MyWorker(shared_state=shared_state)
worker.start()
After several days of troubleshooting I suspect the problem is that the multiprocessing is not set up correctly. When I stripped out all of the multiprocessing code and called the worker's run method directly, all of the queue management code worked: the CustomWorker module creates the listener, creates the message, and picks up the message. I think this indicates that the queue management code is correct and that the source of the problem is most likely the multiprocessing.
# Removing the multiprocessing and calling the worker's run method directly works without getting stuck so the issue is likely due to multiprocessing not being setup correctly
worker = MyWorker()
worker.run()
Here is the code I have so far:
App
This part of the code creates the API and attempts to create a new process to create the queue listener. The 'custom_worker_utils' module is a custom module that creates the stomp listener in the CustomWorker() class run method.
from flask import Flask, request, make_response, jsonify
from flask_restx import Resource, Api
import sys, os, logging, time
basedir = os.path.dirname(os.getcwd())
sys.path.append('..')
from custom_worker_utils.custom_worker_utils import *
from multiprocessing import Manager
# app.py
def create_app():
    app = Flask(__name__)
    app.config['BASE_DIR'] = basedir
    api = Api(app, version='1.0', title='MPS Worker', description='MPS Common Worker')
    logger = get_logger()
    '''
    This is a placeholder to trigger the sending of a message to the first queue
    '''
    @api.route('/initialapicall', endpoint="initialapicall", methods=['GET', 'POST', 'PUT', 'DELETE'])
    class InitialApiCall(Resource):
        # Sends a message to the queue
        def get(self, *args, **kwargs):
            mqconn = get_mq_connection()
            message = create_queue_message(initial_tracker_file)
            mqconn.send('/queue/test1', message, headers = {"persistent":"true"})
            return make_response(jsonify({'message': 'Initial Test Call Worked!'}), 200)
    # Start the worker as a subprocess -- this is not working -- app gets stuck before the worker's run method is called
    m = Manager()
    shared_state = m.dict()
    worker = MyWorker(shared_state=shared_state)
    worker.start()
    # Removing the multiprocessing and calling the worker's run method directly works without getting stuck so the issue is likely due to multiprocessing not being setup correctly
    #worker = MyWorker()
    #worker.run()
    return app
Custom worker utils
The run() method is called, connects to the queue and creates the listener with the stomp package
# custom_worker_utils.py
from multiprocessing import Manager, Process
from _datetime import datetime
import os, time, json, stomp, requests, logging, random
'''
The listener
'''
class MyListener(stomp.ConnectionListener):
    def __init__(self, p):
        self.process = p
        self.logger = p.logger
        self.conn = p.mqconn
        self.conn.connect(_user, _password, wait=True)
        self.subscribe_to_queue()
    def on_message(self, headers, message):
        message_data = json.loads(message)
        ticket_id = message_data[constants.TICKET_ID]
        prev_status = message_data[constants.PREVIOUS_STEP_STATUS]
        task_name = message_data[constants.TASK_NAME]
        # Run the service
        if prev_status == "success":
            resp = self.process.do_task(ticket_id, task_name)
        elif hasattr(self, 'revert_task'):
            resp = self.process.revert_task(ticket_id, task_name)
        else:
            resp = True
        if (resp):
            self.logger.debug('Acknowledging')
            self.logger.debug(resp)
            self.conn.ack(headers['message-id'], self.process.conn_id)
        else:
            self.conn.nack(headers['message-id'], self.process.conn_id)
    def on_disconnected(self):
        self.conn.connect('admin', 'admin', wait=True)
        self.subscribe_to_queue()
    def subscribe_to_queue(self):
        queue = os.getenv('QUEUE_NAME')
        self.conn.subscribe(destination=queue, id=self.process.conn_id, ack='client-individual')
def get_mq_connection():
    conn = stomp.Connection([(_host, _port)], heartbeats=(4000, 4000))
    conn.connect(_user, _password, wait=True)
    return conn
class CustomWorker(Process):
    def __init__(self, **kwargs):
        super(CustomWorker, self).__init__()
        self.logger = logging.getLogger("Worker Log")
        log_level = os.getenv('LOG_LEVEL', 'WARN')
        self.logger.setLevel(log_level)
        self.mqconn = get_mq_connection()
        self.conn_id = random.randrange(1,100)
        for k, v in kwargs.items():
            setattr(self, k, v)
    def revert_task(self, ticket_id, task_name):
        # If the subclass does not implement this,
        # then there is nothing to undo so just return True
        return True
    def run(self):
        lst = MyListener(self)
        self.mqconn.set_listener('queue_listener', lst)
        while True:
            pass
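A side note on the `while True: pass` loop at the end of run(): it keeps the process alive, but it also spins a CPU core continuously. Below is a sketch of a gentler keep-alive, assuming the listener callbacks are delivered on a thread owned by the messaging library so that the run() thread only needs to stay parked. keep_alive and stop_event are hypothetical names, and this does not by itself address the stuck start-up.

```python
import threading


def keep_alive(stop_event, poll_seconds=1.0):
    """Block the calling thread until stop_event is set, without busy-waiting.

    Returns the number of times the wait timed out before the event was set.
    """
    wakeups = 0
    # Event.wait() sleeps in the OS instead of burning CPU like `while True: pass`.
    while not stop_event.wait(timeout=poll_seconds):
        wakeups += 1  # a periodic health check could go here
    return wakeups


stop_event = threading.Event()
# Simulate a shutdown request arriving shortly after start-up.
threading.Timer(0.05, stop_event.set).start()
wakeups = keep_alive(stop_event, poll_seconds=0.01)
```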
Seems like Celery is exactly what you need.
Celery is a task queue that can distribute work across worker processes and even across machines.
Miguel Grinberg wrote a great post about this, showing how to accept tasks via Flask and run them as Celery tasks.
Good Luck!
To resolve this issue I decided to run the Flask API and the message queue listener as two entirely separate applications in the same Docker container. I installed and configured supervisord to start the processes individually.
[supervisord]
nodaemon=true
logfile=/home/appuser/logs/supervisord.log
[program:gunicorn]
command=gunicorn -w 1 -c gunicorn.conf.py "app:create_app()" -b 0.0.0.0:8081 --timeout 10000
directory=/home/appuser/app
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_worker_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_worker_stderr.log
[program:mqlistener]
command=python3 start_listener.py
directory=/home/appuser/mqlistener
user=appuser
autostart=true
autorestart=true
stdout_logfile=/home/appuser/logs/supervisord_mqlistener_stdout.log
stderr_logfile=/home/appuser/logs/supervisord_mqlistener_stderr.log
Simple Problem
App Configuration update is not reflected in a celery worker.
Configuration
Currently I have this initial config on the app.
In my config.py
class AppConfig(object):
    DISABLED = False
and loaded in like this when the app server starts in my app.py.
app = Flask(__name__)
app.config.from_object('config.AppConfig')
Celery Task
In my Celery task I use this configuration to enable or disable it.
@celery.task(bind=True, name="task_example")
def task_example(self):
    if app.config['DISABLED'] is True:
        return
    # otherwise proceed.
Which is called inside another service.
class Service(object):
    def process(self):
        task_example.delay()
Testing
In a test, I set app.config['DISABLED'] to True and then call the Service class, which in turn calls the Celery task task_example.
class TaskTest(TestCase):
    def test_task(self):
        app.config.update(DISABLED=True)
        service = Service()
        # Problem lies down here because when it was called
        # the `DISABLED` remain `False`.
        service.process()
Unfortunately, inside task_example() DISABLED is still False (so the task stays enabled). Is it possible to update the config once inside the test class and propagate the change to the Celery worker?
I have created a flask application using Blueprints.
This application receives data via paho.mqtt.client.
Receiving data is also the trigger to process it and to run further processes afterwards.
'system' is a blueprint containing mqtt.py and functions.py
functions.py contains the function to process the data once received
mqtt.py contains the definition of the mqtt client
mqtt.py
from app.system import functions
import paho.mqtt.client as mqtt
#....
def on_message(mqttc,obj,msg):
    try:
        data = json.loads(msg.payload.decode('utf-8'))
        # start main process
        functions.process(data)
    except Exception as e:
        print("error: ", e)
        pass
Once I receive data and the on_message callback is triggered I get an out of application context error:
error: Working outside of application context.
This typically means that you attempted to use functionality that needed
to interface with the current application object in some way. To solve
this, set up an application context with app.app_context(). See the
documentation for more information.
How can I get the application context within the on_message callback?
I tried importing current_app and using something like this
from flask import current_app
#...
def on_message(mqttc,obj,msg):
    try:
        data = json.loads(msg.payload.decode('utf-8'))
        app = current_app._get_current_object()
        with app.app_context():
            # start main process
            functions.process(data)
    except Exception as e:
        print("error: ", e)
I still get the same error
There is this package - https://flask-mqtt.readthedocs.io/en/latest/ - that might help, but it only works with one worker instance.
Most of the time you set the application context when you create the app object.
So wherever you create your app is where you should initialize the extension. In your case it sounds like functions.py needs mqtt.py to carry out its logic, so you should initialize your mqtt client in your application creation.
From the flask docs - http://flask.pocoo.org/docs/1.0/appcontext/
If you see that error while configuring your application, such as when
initializing an extension, you can push a context manually since you
have direct access to the app. Use app_context() in a with block, and
everything that runs in the block will have access to current_app.
def create_app():
    app = Flask(__name__)
    with app.app_context():
        # init_db()
        # initialize the mqtt client here
        pass
    return app
I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for the list of words using a threading pool and then commit them to a database.
I keep running into an error saying that I'm working outside of the application context, even though I'm creating a new polly and s3 instance each time.
How can I generate/upload multiple audio files at once?
from flask import current_app
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid
def upload_audio_file_to_s3(file):
    app = current_app._get_current_object()
    with app.app_context():
        s3 = boto3.client(service_name='s3',
                          aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
                          aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
        extension = file.filename.rsplit('.', 1)[1].lower()
        file.filename = f"{uuid.uuid4().hex}.{extension}"
        s3.upload_fileobj(file,
                          app.config.get('S3_BUCKET'),
                          f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
                          ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
        return file.filename
def generate_polly(voice_id, text):
    app = current_app._get_current_object()
    with app.app_context():
        polly_client = boto3.Session(
            aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
            aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
            region_name=app.config.get('AWS_REGION')).client('polly')
        response = polly_client.synthesize_speech(VoiceId=voice_id,
                                                  OutputFormat='mp3', Text=text)
        return response['AudioStream'].read()
def generate_polly_from_term(vocab_term, gender='m'):
    app = current_app._get_current_object()
    with app.app_context():
        audio = generate_polly('Celine', vocab_term.term)
        file = io.BytesIO(audio)
        file.filename = 'temp.mp3'
        file.content_type = 'mp3'
        return vocab_term.id, upload_audio_file_to_s3(file)
def generate_async_audio_service(terms):
    pool = ThreadPool(processes=12)
    results = pool.map(generate_polly_from_term, terms)
    # do something w/ results
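A likely reason for the error is that current_app is looked up from state bound to the thread handling the request, so threads created by ThreadPool start without it, and calling current_app inside them fails before app_context() is ever reached. The standard-library sketch below imitates that behaviour with threading.local; it is an analogy, not Flask code, and FakeApp, _context and both helper functions are hypothetical names.

```python
import threading
from functools import partial
from multiprocessing.pool import ThreadPool

# Per-thread storage, standing in for the state behind current_app.
_context = threading.local()


class FakeApp:
    """Hypothetical stand-in for the Flask app object."""

    config = {"S3_BUCKET": "example-bucket"}


def bucket_from_context(term):
    # Looks the app up from per-thread state, as current_app would.
    app = getattr(_context, "app", None)
    return None if app is None else f"{app.config['S3_BUCKET']}/{term}"


def bucket_from_argument(app, term):
    # Receives the app explicitly, so it works in any thread.
    return f"{app.config['S3_BUCKET']}/{term}"


app = FakeApp()
_context.app = app  # the "context" exists only in the calling thread

with ThreadPool(processes=2) as pool:
    implicit = pool.map(bucket_from_context, ["a", "b"])
    explicit = pool.map(partial(bucket_from_argument, app), ["a", "b"])
```

Here the implicit lookups come back empty while the explicit ones succeed. In Flask terms this would correspond to calling current_app._get_current_object() once in the view, passing that object to the pool tasks, and entering app.app_context() inside each task.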
This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.
Celery is a task queue for Python. It is useful when requests arrive at Flask faster than the work they trigger can finish: without it, slow tasks block new ones and you won't get all of your results. The fix is to hand the work to another process. The flow goes like this:
1) Client sends a request to Flask to process audio files
2) The files land in Flask to be processed, and Flask sends an asynchronous task to Celery.
3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)
4) Flask is now unburdened from that task and can receive more requests
5) Celery finishes the task, including the upload to your database
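The hand-off in the steps above can be imitated with the standard library alone, which may make the moving parts easier to see. This is an analogy under stated assumptions, not Celery: a queue.Queue plays the broker, a dict plays the result backend, and submit, worker, tasks and states are hypothetical names.

```python
import queue
import threading
import uuid

tasks = queue.Queue()   # plays the role of the broker
states = {}             # plays the role of the result backend
states_lock = threading.Lock()


def submit(payload):
    """What the web handler does: enqueue the work and return an id immediately."""
    task_id = uuid.uuid4().hex
    with states_lock:
        states[task_id] = "PENDING"
    tasks.put((task_id, payload))
    return task_id


def worker():
    """What the background worker does: take tasks and record the outcome."""
    while True:
        item = tasks.get()
        if item is None:      # sentinel: stop the worker
            tasks.task_done()
            break
        task_id, payload = item
        # ... process the payload here ...
        with states_lock:
            states[task_id] = "FINISHED"
        tasks.task_done()


thread = threading.Thread(target=worker, daemon=True)
thread.start()
task_id = submit({"path": "/tmp/audio"})
tasks.join()        # wait until the worker has handled every queued task
tasks.put(None)
thread.join(timeout=1)
```

The caller gets task_id back before the work is done and can look up states[task_id] later, which mirrors how a client could poll Flask for the state of a Celery task.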
Celery and Flask are then two separate python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve the state from a task through Flask if you want the client to verify that the task was/was not completed. The route in your Flask app.py would look like:
@app.route('/my-route', methods=['POST'])
def process_audio():
    # Get your files and save to common temp storage
    save_my_files(target_dir, files)
    response = celery_app.send_task('celery_worker.files', args=[target_dir])
    return jsonify({'task_id': response.task_id})
Where celery_app comes from another module worker.py:
import os
from celery import Celery
env = os.environ
# This is for a rabbitMQ backend
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
Then, your celery process would have a worker configured something like:
import os
from celery import Celery
from celery.signals import after_task_publish
env = os.environ
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')
# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)
@celery_app.task(name='celery_worker.files')
def async_files(path):
    # Get file from path
    # Process
    # Upload to database
    # This is just if you want to return an actual result, you can fill this in with whatever
    return {'task_state': "FINISHED"}
This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your flask app available to whoever wants to send files to it without blocking anything else.
Hopefully that's somewhat helpful
I have a scheduler that sends a message periodically in my Flask app. Under gunicorn I configured 10 sync workers, so the app creates 10 schedulers and sends the same message 10 times. Is there any way to send only one message?
The code for flask app:
import atexit
from apscheduler.schedulers.background import BackgroundScheduler
def send_msg():
    # here we send msg
    pass
@app.before_first_request
def activate_job():
    scheduler = BackgroundScheduler()
    scheduler.add_job(send_msg, 'interval', minutes=5)
    scheduler.start()
    atexit.register(lambda: scheduler.shutdown())
Each worker calls activate_job, which is why your message is sent once per worker. I solved the problem by adding the background task in the main function that is called when I run my app, and, in your case, by wrapping the body of the job function in with app.app_context():
def send_msg():
    with app.app_context():
        # here we send msg
        pass
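A different technique that is sometimes used when several worker processes run the same start-up code (not the approach described in the answer) is to let only the process that wins an exclusive file lock start the scheduler. A minimal sketch, assuming a Unix host where fcntl is available; LOCK_PATH and try_acquire_scheduler_lock are hypothetical names.

```python
import fcntl
import os
import tempfile

LOCK_PATH = os.path.join(tempfile.gettempdir(), "scheduler-example.lock")  # hypothetical path


def try_acquire_scheduler_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if the lock is already held."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock raise instead of waiting when the lock is taken.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle  # keep this object alive for as long as the scheduler runs


first = try_acquire_scheduler_lock()   # this caller would start the scheduler
second = try_acquire_scheduler_lock()  # any later caller would skip it
```

In activate_job, the scheduler would then be created only when the call returns a handle. The lock is released when the holding process exits, so a restarted worker can take over.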