Is there any way to execute some code just before a worker is shut down?
I'm not too confident about the execution model of Flask/Werkzeug; the situation is this:
During the creation of the Flask application I start a daemon thread to do some external work (essentially waiting on a queue). I've set this thread up as a daemon because I don't want it to prevent the shutdown of the worker running the Flask application when that's needed.
Here is my problem: I need to execute some cleanup code just before the thread is killed along with the worker, and my idea is to do those operations on a termination event (if any) of the worker.
With Python you can use the uwsgi.atexit hook; the assigned callback function will be executed before exit.
import os

import uwsgi
from flask import Flask

app = Flask('demo')

@app.route('/')
def index():
    return "Hello World"

# Runs in each worker just before it exits.
def callback():
    print("Worker %i exiting" % os.getpid())

uwsgi.atexit = callback
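For reference, a hypothetical way to run the example (the flags and the demo.py filename are illustrative assumptions, not from the original answer):

uwsgi --http :9090 --master --processes 2 --module demo:app

Each worker then prints its own "Worker <pid> exiting" line when uWSGI shuts down or reloads.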
Related
I have a scenario where I want to trigger an event any time a new task is created or finished. I'm currently trying to use Celery task signals to do so, but can't figure out how to trigger a socket event from within the task signal functions.
from .extensions import celery, socketio
from celery.signals import task_prerun, task_postrun

@celery.task
def myTask():
    """ do some things """
    return {"things"}

@task_prerun.connect(sender=myTask)
def task_prerun_notifier(sender=None, **kwargs):
    print("From task_prerun_notifier ==> Running just before add() executes")
    socketio.emit("trigger me")  # doesn't work

@task_postrun.connect(sender=myTask)
def task_postrun_notifier(sender=None, **kwargs):
    print("From task_postrun_notifier ==> Ok, done!")
    socketio.emit("trigger me")  # doesn't work
The task signals run at the correct time, but I am not able to trigger the trigger me event. I have tried it with just emit("trigger me") instead of socketio.emit("trigger me") as well, but without any luck.
How do you execute emits from within a Celery task signal?
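One approach Flask-SocketIO supports for exactly this situation is emitting from an external process through a message queue. A minimal sketch, assuming a Redis instance at redis://localhost:6379 and that the web app's SocketIO was created with the same message_queue argument (both assumptions here):

from celery.signals import task_postrun
from flask_socketio import SocketIO

# A write-only SocketIO handle for the celery worker process; emits are
# forwarded through the queue to the server actually holding the connections.
external_sio = SocketIO(message_queue="redis://localhost:6379")

@task_postrun.connect
def task_postrun_notifier(sender=None, **kwargs):
    external_sio.emit("trigger me")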
Let's say I have a (websocket) API, api.py, as such:
from flask import Flask, request
from flask_socketio import SocketIO, emit

from worker import Worker

app = Flask(__name__)
socketio = SocketIO(app)

worker = Worker()
worker.start()

@socketio.on('connect')
def connect():
    print("Client", request.sid, "connected")

@socketio.on('get_results')
def get_results(query):
    """
    The only endpoint of the API.
    """
    print("Client", request.sid, "requested results for query", query)
    # Set the worker to work, wait for results to be ready, and
    # send the results back to the client.
    worker.task_queue.put(query)
    results = worker.result_queue.get()
    emit("results", results)

@socketio.on('disconnect')
def disconnect():
    print("Client", request.sid, "disconnected, perhaps before results were ready")
    # What to do here?

socketio.run(app, host='')
The API will serve many clients, but it has only a single worker to produce the results that should be served. worker.py:
from multiprocessing import Process, Queue

class Worker(Process):
    def __init__(self):
        super().__init__()
        self.task_queue = Queue()
        self.result_queue = Queue()
        self.some_stateful_variable = 0
        # Do other computationally expensive work

    def reset_state(self):
        # Computationally inexpensive.
        pass

    def do_work(self, task):
        # Computationally expensive. Takes long time.
        # Modifies internal state.
        pass

    def run(self):
        while True:
            task = self.task_queue.get()
            results = self.do_work(task)
            self.result_queue.put(results)
The worker gets a request, i.e. a task to do, and sets about producing a result. When the result is ready, the client is served it.
But not all clients are patient. They may leave, i.e. disconnect from the API, before the results are ready. They don't want them, and the worker therefore ends up working on a task that does not need to finish. That makes other clients in the queue wait unnecessarily. How can this situation be avoided, so that the worker aborts do_work for a task that no longer needs to finish?
1. Client side: when the user closes the browser tab or leaves the page, send a request to your Flask server containing the id of the task you would like to cancel.
2. Server side: put a "cancelled" status for the task in a database or in any variable shared between the Flask server and your worker process.
3. Divide task processing into several stages and check the status of the task before each stage; if the status is "cancelled", stop the task processing (a minimal sketch follows this list).

Another choice for point 1 is to do some monitoring on the server side in a separate process: count the interval between status requests from the client side.
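A minimal sketch of points 2 and 3, assuming tasks carry an id and using a multiprocessing.Manager dict as the shared variable (both are assumptions; any shared store such as a database works the same way):

from multiprocessing import Manager

manager = Manager()
cancelled = manager.dict()  # task_id -> True; shared between api.py and worker.py

# api.py, in the disconnect handler:
#     cancelled[current_task_id] = True

# worker.py: do_work split into stages, checking the flag before each one.
def do_work_in_stages(task_id, stages):
    for stage in stages:
        if cancelled.pop(task_id, False):
            return None  # client is gone; abort the remaining stages
        stage()
    return "results"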
I've handled similar problems by launching an entirely separate process via:
import subprocess as sp
sp.call('start python path\\worker.py', shell=True)  # 'start' is Windows cmd
worker.py would then report its PID back to api.py via Redis; it's then straightforward to kill the process at any point from api.py.
Of course, how viable that is for you will depend on how much data resides within api.py and is shared with worker.py; whether it's feasible for that to also pass via Redis is for you to decide.
The added benefit is that you decouple the socket handling from the heavy compute, and you can go quasi-multi-core (a single thread per worker.py). You could go fully multi-core by incorporating multiprocessing into each worker.py if you wished.
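A hypothetical sketch of that PID handshake (the key name, Redis location, and kill signal are all assumptions):

import os
import signal

import redis

r = redis.Redis()  # assumes a local Redis on the default port

# worker.py, at startup:
r.set("worker_pid", os.getpid())

# api.py, when the task should be aborted:
os.kill(int(r.get("worker_pid")), signal.SIGTERM)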
I have a long-running background task that spins up the Flask app again to do some auditing in the background. The front end is a web application and uses socketio to communicate with the main Flask app on the backend to handle multiple async behaviors.
I make sure to only fire the background task when the main thread is created, and I call eventlet.monkey_patch() only at the very beginning of the script.
If the background thread has a lot of stuff to audit, it blocks the main thread; the more stuff in memory, the longer it blocks the main thread for. The audit is not CPU-intensive at all; it's just some db inserts and logging.
The items that need to be audited are added to an object in memory from the main thread and are passed by reference to the child thread (like an in-memory queue).
If I don't monkey patch eventlet, then everything works fine, but then Flask's auto-reload won't work, and I need it for development.
I run the app with socketio.run(app) in dev.
The behavior persists when using gunicorn/eventlet.
While the background task is sleeping in sleep(2), no blocking happens.
import eventlet
eventlet.monkey_patch()  # done at the very beginning of the script

import atexit
import socket
import struct
import threading
from time import sleep

# ... rest of code is a basic Flask app create_app factory that at some
# point starts the new thread if it's the main thread.

# The async code that runs is the following:
class AsyncAuditor(threading.Thread):
    def __init__(self, tasks: list, stop: threading.Event):
        super().__init__()
        self.tasks = tasks
        self.stop_event = stop

    def run(self):
        from app import init_app
        from dal import db
        app = init_app(mode='sys')
        app.logger.info('starting async audit thread')
        with app.app_context():
            try:
                while not self.stop_event.is_set():
                    if len(self.tasks) > 0:
                        task: dict
                        for task in self.tasks:
                            app.logger.debug(threading.current_thread().name + ' new audit record')
                            task.payload = encryptor.encrypt(task.payload)  # encryptor: app-level helper
                            task.ip = struct.unpack("!I", socket.inet_aton(task.ip))[0]
                            db.session.add(task)
                        self.tasks.clear()
                        db.session.commit()
                    sleep(2)
                app.logger.info('exiting async audit thread')
            except BaseException:
                app.logger.exception('Exception')

# There's some code that tries to gracefully exit if the app needs to exit:
stop_event = threading.Event()
async_task = AsyncAuditor(API.audit_tasks, stop_event)
async_task.start()

def exit_async_thread():
    stop_event.set()
    async_task.join()

atexit.register(exit_async_thread)
I expect that while the child thread is working, the main thread would not be blocked by any db operations. In fact, as I mentioned before, if I don't monkey patch eventlet, everything works fine in the main thread and in the child one as well. Instead, I'm getting 9 and even 30 second delays when hitting an endpoint in the Flask application while the background task is working.
I've been pulling my hair out trying to figure this one out, hoping someone else has already encountered this and knows how to solve it :)
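Not a confirmed fix, but one mitigation commonly suggested for eventlet setups like this is to let Flask-SocketIO manage the loop as a green background task and yield with socketio.sleep, so the auditor cooperates with the event loop instead of competing with it. A sketch, reusing the names from the code above (flush_to_db is a hypothetical stand-in for the encrypt/add/commit block in AsyncAuditor.run):

# Assumes the `socketio` instance from the app factory and the same
# tasks list / stop_event as above.
def audit_loop(tasks, stop_event):
    while not stop_event.is_set():
        if tasks:
            flush_to_db(tasks)  # hypothetical: the db.session work from above
            tasks.clear()
        socketio.sleep(2)  # cooperative sleep; yields back to the event loop

socketio.start_background_task(audit_loop, API.audit_tasks, stop_event)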
I'm trying to build a very simple Flask endpoint that just needs to call a long-running, blocking PHP script (think while true {...}). I've tried a few different methods to launch the script asynchronously, but the problem is that my browser never actually receives the response, even though the code for generating the response after running the script is executed.
I've tried using both multiprocessing and threading; neither seems to work:
import json
import multiprocessing
import os
import subprocess
import threading

# multiprocessing attempt
@app.route('/endpoint')
def endpoint():
    def worker():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)
    p = multiprocessing.Process(target=worker)
    print('111111')
    p.start()
    print('222222')
    return json.dumps({
        'success': True
    })

# threading attempt
@app.route('/endpoint')
def endpoint():
    def thread_func():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)
    t = threading.Thread(target=thread_func)
    print('111111')
    t.start()
    print('222222')
    return json.dumps({
        'success': True
    })
In both scenarios I see the 111111 and 222222, yet my browser still hangs on the response from the endpoint. I've tried p.daemon = True for the process, as well as p.terminate(), but no luck. I had hoped launching a script with nohup in a different shell and a separate process/thread would just work, but somehow Flask or uWSGI is affected by it.
Update
Since this does work locally on my Mac when I start my Flask app directly with python app.py and hit it without going through my Nginx proxy and uWSGI, I'm starting to believe it may not be the code itself that has issues. And because my Nginx just forwards the request to uWSGI, I believe it may be something there that's causing it.
Here is my ini configuration for the domain for uWSGI, which I'm running in emperor mode:
[uwsgi]
protocol = uwsgi
max-requests = 5000
chmod-socket = 660
master = True
vacuum = True
enable-threads = True
auto-procname = True
procname-prefix = michael-
chdir = /srv/www/mysite.com
module = app
callable = app
socket = /tmp/mysite.com.sock
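For what it's worth, one uWSGI setting that may be relevant to this symptom (an educated guess, not a confirmed fix): a process spawned from a request handler inherits the uWSGI socket file descriptor, and a long-lived child holding that descriptor open can keep the connection from completing. uWSGI's close-on-exec option marks its sockets FD_CLOEXEC so exec'd children don't inherit them:

[uwsgi]
close-on-exec = true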
This kind of stuff is the actual, and probably main, use case for Python Celery (https://docs.celeryproject.org/). As a general rule, do not run long-running, CPU-bound jobs in the wsgi process: it's tricky, it's inefficient, and most importantly it's more complicated than setting up an async task in a celery worker. If you just want to prototype, you can set the broker to memory instead of using an external server, or run a single-threaded redis on the very same machine.
This way you can launch the task and call task.result(), which is blocking, but it blocks in an IO-bound fashion; or, even better, you can return immediately with the task_id and build a second endpoint /result?task_id=<task_id> that checks whether the result is available:
from celery.result import AsyncResult

result = AsyncResult(task_id, app=app)
if result.state == "SUCCESS":
    return result.get()
else:
    return result.state  # or do something else depending on the state
This way you have a non-blocking wsgi app that does what it is best suited for: short, CPU-unbound calls that involve IO at most, with OS-level scheduling. You can then rely directly on the wsgi server's workers|processes|threads (or whatever you need) to scale the API in whichever wsgi server you like (uwsgi, gunicorn, etc.) for 99% of workloads, while celery scales horizontally by increasing the number of worker processes.
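For concreteness, a minimal sketch of the two endpoints under the prototype setup described above (the in-memory broker, the task body, and all names here are illustrative assumptions):

from celery import Celery
from celery.result import AsyncResult
from flask import Flask, request

flask_app = Flask(__name__)
# Prototype-only transport, as suggested above; use redis/rabbitmq in production.
celery_app = Celery(__name__, broker="memory://", backend="cache+memory://")

@celery_app.task
def long_job():
    return {"success": True}  # stand-in for the long-running work

@flask_app.route('/endpoint')
def launch():
    task = long_job.delay()  # returns immediately; work runs in a celery worker
    return {"task_id": task.id}

@flask_app.route('/result')
def result():
    res = AsyncResult(request.args["task_id"], app=celery_app)
    if res.state == "SUCCESS":
        return res.get()
    return res.state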
This approach works for me: it calls the timeout command (sleep 10s) on the command line and lets it work in the background. It returns the response immediately.
import subprocess

@app.route('/endpoint1')
def endpoint1():
    subprocess.Popen('timeout 10', shell=True)
    return 'success1'
However, this was not tested on a WSGI server, only locally.
Would it be enough to use a background task? Then you only need to import threading, e.g.:
import threading
# ... other imports

def endpoint():
    """My endpoint."""
    try:
        t = BackgroundTasks()
        t.start()
    except RuntimeError as exception:
        return f"An error occurred during endpoint: {exception}", 400
    return "successfully started.", 200

class BackgroundTasks(threading.Thread):
    def run(self, *args, **kwargs):
        ...  # do long running stuff
When I try to start a thread in the same process as a Flask app is running, two threads are started, so "once" will be printed twice.
from threading import Timer

from flask import Flask

app = Flask(__name__)
app.config.update(dict(
    DEBUG = True
))

def once():
    print("once")

t = Timer(1, once, ())
t.start()
app.run()
This only happens when DEBUG is true.
Anyone have any idea how to prevent this from happening when debugging?
Werkzeug's reloading support has to fork in order to reload the module correctly. As such, your module is imported at least twice; more if you alter the module and it is reloaded.
You can switch the reloader off with use_reloader=False:
app.run(use_reloader=False)
or you can start your thread in a function decorated with @app.before_first_request:
t = Timer(1, once, ())

@app.before_first_request
def start_thread():
    t.start()
The start_thread function is now only executed when the first request comes in, not at import time. (Note that before_first_request was deprecated in Flask 2.2 and removed in Flask 2.3; an alternative guard follows.)
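On Flask versions without before_first_request, a commonly used alternative (a standard Werkzeug idiom, not part of the original answer) is to check the WERKZEUG_RUN_MAIN environment variable, which the reloader sets only in its child process:

import os
from threading import Timer

def once():
    print("once")

# Start the timer only in the process that actually serves requests; the
# reloader's parent process never sets WERKZEUG_RUN_MAIN. (Without the
# reloader the variable is absent, so drop the guard or also check app.debug.)
if os.environ.get("WERKZEUG_RUN_MAIN") == "true":
    Timer(1, once, ()).start()

app.run()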