Flask background task using `threading.Thread` is blocking the main thread

I have a long-running background task that spins up the Flask app again to do some auditing in the background. The front end is a web application that uses Socket.IO to communicate with the main Flask backend in order to handle several async behaviors.
I make sure to only fire the background task when running in the main thread, and I call eventlet.monkey_patch() only at the very beginning of the script.
If the background thread has a lot of items to audit, it blocks the main thread; the more items in memory, the longer the main thread is blocked. The audit is not CPU intensive at all, it's just some DB inserts and logging.
The items that need to be audited are added to an object in memory from the main thread and are passed by reference to the child thread (like an in-memory queue).
If I don't monkey patch eventlet, everything works fine, but then Flask's auto-reload won't work, and I need it for development.
- I run the app with socketio.run(app) in dev.
- The behavior persists when using gunicorn/eventlet.
- When the background task is just sleeping (sleep(2)), there is no blocking.
import eventlet
eventlet.monkey_patch()

import atexit
import socket
import struct
import threading
from time import sleep

# ... the rest of the code is a basic Flask create_app factory that at some
# point starts the new thread, but only if it is running in the main thread

# the async code that runs is the following
class AsyncAuditor(threading.Thread):
    def __init__(self, tasks: list, stop: threading.Event):
        super().__init__()
        self.tasks = tasks
        self.stop_event = stop

    def run(self):
        from app import init_app
        from dal import db

        app = init_app(mode='sys')
        app.logger.info('starting async audit thread')
        with app.app_context():
            try:
                while not self.stop_event.is_set():
                    if len(self.tasks) > 0:
                        task: dict
                        for task in self.tasks:
                            app.logger.debug(threading.current_thread().name + ' new audit record')
                            # encryptor comes from the application code (not shown)
                            task.payload = encryptor.encrypt(task.payload)
                            task.ip = struct.unpack("!I", socket.inet_aton(task.ip))[0]
                            db.session.add(task)
                        self.tasks.clear()
                        db.session.commit()
                    sleep(2)
                app.logger.info('exiting async audit thread')
            except BaseException:
                app.logger.exception('Exception in async audit thread')

# there is some code that tries to gracefully stop the thread when the app exits
stop_event = threading.Event()
async_task = AsyncAuditor(API.audit_tasks, stop_event)
async_task.start()

def exit_async_thread():
    stop_event.set()
    async_task.join()

atexit.register(exit_async_thread)
I expected that while the child thread is working, the main thread would not be blocked by any DB operations; in fact, as mentioned before, if I don't monkey patch eventlet, everything works fine in both the main thread and the child one. Instead, I'm getting 9 and even 30 second delays when hitting an endpoint in the Flask application while the background task is working.
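One direction worth testing (a sketch only; flush_audit_batch is a hypothetical helper, not part of the code above): eventlet's tpool module runs a truly blocking call in a real OS thread pool, so the hub can keep serving other green threads while the call completes.

from eventlet import tpool

def flush_audit_batch(app, db, batch):
    # hypothetical helper standing in for the body of the audit loop;
    # it runs inside a real OS thread, so blocking DB I/O does not
    # stall the eventlet hub
    with app.app_context():
        for task in batch:
            db.session.add(task)
        db.session.commit()

# inside AsyncAuditor.run(), the blocking work could then be offloaded like:
#   tpool.execute(flush_audit_batch, app, db, list(self.tasks))
#   self.tasks.clear()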

Related

How to trigger socket emit from Celery task signal

I have a scenario where I want to trigger an event any time a new task is created or finished. I'm currently trying to make use of Celery task signals to do so, but can't figure out how to trigger a certain socket event from within the task signal functions.
from .extensions import celery, socketio
from celery.signals import task_prerun, task_postrun

@celery.task
def myTask():
    """ do some things """
    return {"things"}

@task_prerun.connect(sender=myTask)
def task_prerun_notifier(sender=None, **kwargs):
    print("From task_prerun_notifier ==> Running just before add() executes")
    socketio.emit("trigger me")  # doesn't work

@task_postrun.connect(sender=myTask)
def task_postrun_notifier(sender=None, **kwargs):
    print("From task_postrun_notifier ==> Ok, done!")
    socketio.emit("trigger me")  # doesn't work
The task signals run at the correct time, but I am not able to trigger the trigger me event. I have tried plain emit("trigger me") instead of socketio.emit("trigger me") as well, but without any luck.
How do you execute emits from within a Celery task signal?
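One pattern from the Flask-SocketIO documentation that fits this situation, sketched below: a process other than the web server can emit by creating a SocketIO instance bound to a message queue. The Redis URL is an assumption, and the server-side SocketIO instance must be created with the same message_queue parameter.

# in the Celery worker process
from flask_socketio import SocketIO
from celery.signals import task_postrun

# assumes the server was created with SocketIO(app, message_queue='redis://localhost:6379/0')
external_socketio = SocketIO(message_queue='redis://localhost:6379/0')

@task_postrun.connect
def task_postrun_notifier(sender=None, **kwargs):
    # the emit is pushed onto the queue and relayed to clients by the server process
    external_socketio.emit("trigger me")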

Aborting code execution in a Python Process without terminating the process

Let's say I have a (websocket) API, api.py, as such:
from flask import Flask, request
from flask_socketio import SocketIO, emit
from worker import Worker

app = Flask(__name__)
socketio = SocketIO(app)

worker = Worker()
worker.start()

@socketio.on('connect')
def connect():
    print("Client", request.sid, "connected")

@socketio.on('get_results')
def get_results(query):
    """
    The only endpoint of the API.
    """
    print("Client", request.sid, "requested results for query", query)
    # Set the worker to work, wait for results to be ready, and
    # send the results back to the client.
    worker.task_queue.put(query)
    results = worker.result_queue.get()
    emit("results", results)

@socketio.on('disconnect')
def disconnect():
    print("Client", request.sid, "disconnected, perhaps before results were ready")
    # What to do here?

socketio.run(app, host='')
The API will serve many clients, but it only has a single worker to produce the results that should be served. worker.py:
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self):
        super().__init__()
        self.task_queue = Queue()
        self.result_queue = Queue()
        self.some_stateful_variable = 0
        # Do other computationally expensive work

    def reset_state(self):
        # Computationally inexpensive.
        pass

    def do_work(self, task):
        # Computationally expensive. Takes long time.
        # Modifies internal state.
        pass

    def run(self):
        while True:
            task = self.task_queue.get()
            results = self.do_work(task)
            self.result_queue.put(results)
The worker gets a request, i.e. a task to do, and sets about producing a result. When the result is ready, the client is served it.
But not all clients are patient. They may leave, i.e. disconnect from the API, before the results are ready. They don't want them, and the worker therefore ends up working on a task that does not need to finish. That makes other clients in the queue wait unnecessarily. How can I avoid this situation and get the worker to abort executing do_work for a task that does not need to finish?
1. On the client side: when the user closes the browser tab or leaves the page, send a request to your Flask server containing the id of the task you would like to cancel.
2. On the server side, put a cancel status for that task in the database, or in any variable shared between the Flask server and your worker process.
3. Divide the task processing into several stages and check the task's status before each stage; if the status is cancelled, stop processing (see the sketch below).
Another choice for point 1 is to do some monitoring on the server side in a separate process: count the interval between status requests from the client side.
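A rough sketch of points 2 and 3 (the Manager dict, stage callables, and names below are illustrative assumptions, not code from the question):

from multiprocessing import Manager

manager = Manager()
cancelled = manager.dict()        # shared between the Flask process and the Worker

# Flask side, in the handler that receives the cancel request:
#   cancelled[task_id] = True

# Worker side: split do_work into stages and check the flag before each one
def do_work_staged(task_id, stages, cancelled):
    for stage in stages:          # each stage is a callable doing one chunk of work
        if cancelled.get(task_id):
            return None           # task was cancelled, stop early
        stage()
    return "result"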
I've handled similar problems by launching an entirely separate process via:
sp.call('start python path\\worker.py', shell=True)  # sp is presumably the subprocess module
worker.py would then report its PID back to api.py via redis; it's then straightforward to kill the process at any point from api.py.
Of course, how viable that is for you will depend on how much data resides within api.py and is shared with worker.py - whether it's feasible for that data to also pass via redis is for you to decide.
The added benefit is that you decouple the socket server from the heavy compute - and you can go quasi-multi-core (one thread per worker.py). You could go fully multi-core by incorporating multiprocessing into each worker.py if you wished.
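A minimal sketch of that PID handoff (the redis-py client, key name, and signal choice are assumptions, not part of the answer above):

import os
import signal
import redis

r = redis.Redis()                     # assumes Redis on localhost:6379

# in worker.py, once at startup:
r.set('worker_pid', os.getpid())

# in api.py, when the worker's current task is no longer wanted:
pid = int(r.get('worker_pid'))
os.kill(pid, signal.SIGTERM)          # terminate the worker; relaunch it afterwards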

Python Multiprocessing: Signal job completion without passing Event object through a queue

Problem Outline
I have a python flask server where one of the endpoints has a moderate amount of work to do (the real code reads, resizes and returns an image). I want to optimise the endpoint so that it can be called multiple times in parallel.
The code I currently have (shown below) does not work because it relies on passing a multiprocessing.Event object through a multiprocessing.JoinableQueue, which is not allowed and results in the following error:
RuntimeError: Condition objects should only be shared between processes through inheritance
How can I use a separate process to compute some jobs and notify the main thread when a specific job is complete?
Proof of Concept
Flask can be multithreaded so if one request is waiting on a result other threads can continue to process other requests. I have a basic proof of concept here that shows that parallel requests can be optimised using multiprocessing: https://github.com/alanbacon/flask_multiprocessing
The example code on github spawns a new process for every request, which I understand has considerable overheads. I've also noticed that my proof-of-concept server crashes if there are more than 10 or 20 concurrent requests; I suspect this is because too many processes are being spawned.
Current Attempt
I have tried to create a set of workers that pick jobs off a queue. When a job is complete the result is written to a shared memory area. Each job contains the work to be done and an Event object that can be set when the job is complete to signal the main thread.
Each request thread passes in a job with a newly created Event object; it then immediately waits on that event before returning the result. While one server request thread is waiting, the server is able to use other threads to continue to serve other requests.
The problem as mentioned above is that Event objects can not be passed around in this way.
What approach should I take to circumvent this problem?
from flask import Flask, request, Response
import multiprocessing
import uuid

app = Flask(__name__)

# flask config
app.config['PROPAGATE_EXCEPTIONS'] = True
app.config['DEBUG'] = False

def simpleWorker(complexity):
    temp = 0
    for i in range(0, complexity):
        temp += 1

mgr = multiprocessing.Manager()
results = mgr.dict()
joinableQueue = multiprocessing.JoinableQueue()
lock = multiprocessing.Lock()

def mpWorker(joinableQueue, lock, results):
    while True:
        next_task = joinableQueue.get()        # blocking call
        if next_task is None:                  # poison pill to kill worker
            break
        simpleWorker(next_task['complexity'])  # pretend to do heavy work
        result = next_task['val'] * 2          # compute result
        ID = next_task['ID']
        with lock:
            results[ID] = result               # output result to shared memory
        next_task['event'].set()               # tell main process result is calculated
        joinableQueue.task_done()              # remove task from queue

@app.route("/work/<ID>", methods=['GET'])
def work(ID=None):
    if request.method == 'GET':
        # send a task to the consumer and wait for it to finish
        uid = str(uuid.uuid4())
        event = multiprocessing.Event()
        # pass event to job so that job can tell this thread when processing is
        # complete
        joinableQueue.put({
            'val': ID,
            'ID': uid,
            'event': event,
            'complexity': 100000000
        })
        event.wait()  # wait for result to be calculated
        # get result from shared memory area, and clean up
        with lock:
            result = results[uid]
            del results[uid]
        return Response(str(result), 200)

if __name__ == "__main__":
    num_consumers = multiprocessing.cpu_count() * 2
    consumers = [
        multiprocessing.Process(
            target=mpWorker,
            args=(joinableQueue, lock, results))
        for i in range(num_consumers)
    ]
    for c in consumers:
        c.start()

    host = '127.0.0.1'
    port = 8080
    app.run(host=host, port=port, threaded=True)
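One way around that restriction, sketched below (an illustration, not a drop-in patch for the server above): Event objects created through a multiprocessing.Manager are proxies, so they can be pickled and passed through a queue, unlike a plain multiprocessing.Event.

import multiprocessing

def worker(q):
    job = q.get()
    job['results'][job['id']] = job['val'] * 2   # write result to the shared dict
    job['event'].set()                           # proxy Event can be set from the child

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    q = multiprocessing.Queue()
    multiprocessing.Process(target=worker, args=(q,), daemon=True).start()

    event = mgr.Event()            # manager-backed Event survives pickling
    results = mgr.dict()
    q.put({'id': 'job-1', 'val': 21, 'event': event, 'results': results})

    event.wait()                   # only this request thread blocks, not the whole server
    print(results['job-1'])        # -> 42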

Why am I not able to do simultaneous requests in Tornado?

The Tornado app below has 2 endpoints. One (/) is slow because it waits for an IO operation, and the other (/hello) is fast.
My requirement is to make requests to both endpoints simultaneously. I observed that the 2nd request is only handled after the 1st one finishes. Even though it is asynchronous, why is it not able to handle both requests at the same time?
How can I make it handle them simultaneously?
Edit: I am using Windows 7, Eclipse IDE
****************Module*****************
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.do_something()
        self.write("FINISHED")
        self.finish()

    def do_something(self):
        inp = input("enter to continue")
        print(inp)

class HelloHandler(tornado.web.RequestHandler):
    def get(self):
        print("say hello")
        self.write("Hello bro")
        self.finish()

def make_app():
    return tornado.web.Application([
        (r"/", MainHandler),
        (r"/hello", HelloHandler)
    ])

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
It is asynchronous only if you make it so. A Tornado server runs in a single thread. If that thread is blocked by a synchronous function call, nothing else can happen on that thread in the meantime. What @tornado.web.asynchronous enables is the use of generators:
@tornado.web.asynchronous
def get(self):
    yield from self.do_something()
    ^^^^^^^^^^
This yield/yield from (in current Python versions await) feature suspends the function and lets other code run on the same thread while the asynchronous call completes elsewhere (e.g. waiting for data from the database, waiting for a network request to return a response). I.e., if Python doesn't actively have to do something but is waiting for external processes to complete, it can yield processing power to other tasks. But since your function is very much running in the foreground and blocking the thread, nothing else will happen.
See http://www.tornadoweb.org/en/stable/guide/async.html and https://docs.python.org/3/library/asyncio.html.
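For comparison, a rough sketch of the same handlers in current Tornado style (coroutines instead of the old decorator); IOLoop.run_in_executor pushes the blocking input() call onto an executor thread so /hello stays responsive in the meantime:

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        # run the blocking call on an executor thread; the IOLoop keeps serving /hello
        await tornado.ioloop.IOLoop.current().run_in_executor(None, self.do_something)
        self.write("FINISHED")

    def do_something(self):
        inp = input("enter to continue")
        print(inp)

class HelloHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello bro")

if __name__ == "__main__":
    tornado.web.Application([(r"/", MainHandler), (r"/hello", HelloHandler)]).listen(8888)
    tornado.ioloop.IOLoop.current().start()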

Flask/Werkzeug: intercept a worker termination

Is there any way to execute some code just before a worker is turned off?
I'm not too confident about the execution model of Flask/Werkzeug; the situation is this:
During the creation of the Flask application I start a daemon thread to do some external work (essentially waiting on a queue). I've set this thread up as a daemon because I don't want it to prevent the shutdown of the worker running the Flask application when that's needed.
Here is my problem: I need to execute some cleanup code just before the thread is killed by the worker, and my plan is to do those operations on a termination event (if any) of the worker.
With Python you can use the uwsgi.atexit hook; the callback function will be executed before the worker exits.
import uwsgi, os
from flask import Flask

app = Flask('demo')

@app.route('/')
def index():
    return "Hello World"

def callback():
    print("Worker %i exiting" % os.getpid())

uwsgi.atexit = callback
