I'm using Celery with Django for an online game.
I've written middleware to check whether Celery is available and running, based on this answer: Detect whether Celery is Available/Running
My code actually looks like this:
from celery.task.control import inspect

class CeleryCheckMiddleware(object):
    def process_request(self, request):
        insp = inspect().stats()
        if not insp:
            return render(...)
        else:
            return None
But I forgot the caveat in the comment at the bottom of that answer, 'I've discovered that the above adds two reply.celery.pidbox queues to rabbitmq every time it's run. This leads to an incremental increase in rabbitmq's memory usage.'
I'm now (only one day later!) noticing occasional 500 errors starting from the line insp = inspect().stats() and terminating with OSError: [Errno 4] Interrupted system call.
Is there a memory-safe way to check whether Celery is available and running?
This feels very heavy. You might be better off running an async task and collecting the result with an acceptable timeout. It's naive, but it shouldn't impact resources much, depending on how often you have to call it.
from celery.exceptions import TimeoutError

@app.task
def celery_alive():
    return "OK"

def process_request(self, request):
    res = celery_alive.apply_async()
    try:
        return "OK" == res.get(timeout=settings.ACCEPTABLE_TRANSACTION_TIME)
    except TimeoutError:
        return False
The script below worked for me.
# Import the celery app from the project
from application_package import app as celery_app

def get_celery_worker_status():
    insp = celery_app.control.inspect()
    nodes = insp.stats()
    if not nodes:
        raise Exception("celery is not running.")
    logger.error("celery workers are: {}".format(nodes))
    return nodes
I have a tornado webservice which is going to serve around 500 requests per minute. All of these requests hit one specific endpoint. There is a C++ program that I have compiled using Cython and use inside the tornado service as my processing engine. Each request that goes to /check/ triggers a function call in the C++ program (I will call it handler) and the return value gets sent to the user as the response.
This is how I wrap the handler class. One important point is that I do not instantiate the handler in __init__. There is another route in my tornado code; I want to start loading the DataStructure after an authorized request hits that route (e.g. /reload/).
executors = ThreadPoolExecutor(max_workers=4)

class CheckerInstance(object):
    def __init__(self, *args, **kwargs):
        self.handler = None
        self.is_loading = False
        self.is_live = False

    def init(self):
        if not self.handler:
            self.handler = pDataStructureHandler()
            self.handler.add_words_from_file(self.data_file_name)
        self.end_loading()
        self.go_live()

    def renew(self):
        self.handler = None
        self.init()
class CheckHandler(tornado.web.RequestHandler):
    async def get(self):
        query = self.get_argument("q", None).encode('utf-8')
        answer = query

        if not checker_instance.is_live:
            self.write(dict(answer=self.get_argument("q", None), confidence=100))
            return

        checker_response = await checker_instance.get_response(query)
        answer = checker_response[0]
        confidence = checker_response[1]

        if self.request.connection.stream.closed():
            return

        self.write(dict(correct=answer, confidence=confidence, is_cache=is_cache))

    def on_connection_close(self):
        self.wait_future.cancel()
class InstanceReloadHandler(BasicAuthMixin, tornado.web.RequestHandler):
    def prepare(self):
        self.get_authenticated_user(check_credentials_func=credentials.get, realm='Protected')

    def new_file_exists(self):
        return True

    def can_reload(self):
        return not checker_instance.is_loading

    def get(self):
        error = False
        message = None
        if not self.can_reload():
            error = True
            message = 'another job is being processed!'
        else:
            if not self.new_file_exists():
                error = True
                message = 'no new file found!'
            else:
                checker_instance.go_fake()
                checker_instance.start_loading()
                tornado.ioloop.IOLoop.current().run_in_executor(executors, checker_instance.renew)
                message = 'job started!'

        if self.request.connection.stream.closed():
            return

        self.write(dict(
            success=not error, message=message
        ))

    def on_connection_close(self):
        self.wait_future.cancel()
def main():
    app = tornado.web.Application(
        [
            (r"/", MainHandler),
            (r"/check", CheckHandler),
            (r"/reload", InstanceReloadHandler),
            (r"/health", HealthHandler),
            (r"/log-event", SubmitLogHandler),
        ],
        debug=options.debug,
    )

checker_instance = CheckerInstance()
I want this service to keep responding after checker_instance.renew starts running in another thread. But this is not what happens. When I hit the /reload/ endpoint and renew function starts working, any request to /check/ halts and waits for the reloading process to finish and then it starts working again. When the DataStructure is being loaded, the service should be in fake mode and respond to people with the same query that they send as input.
I have tested this code in my development environment with an i5 CPU (4 CPU cores) and it works just fine! But in the production environment (3 double-thread CPU cores) the /check/ endpoint halts requests.
It is difficult to fully trace the events being handled because you have clipped out some of the code for brevity. For instance, I don't see a get_response implementation here so I don't know if it is awaiting something itself that could be dependent on the state of checker_instance.
One area I would explore is the thread-safety (or seeming absence thereof) of passing checker_instance.renew to run_in_executor. This feels questionable to me because you are mutating the state of a single instance of CheckerInstance from a separate thread. While it might not break things explicitly, it does seem like this could be introducing odd race conditions or unanticipated copies of memory that might explain the unexpected behavior you are experiencing.
If possible, I would make whatever load behavior you want to offload to a thread completely self-contained, and when the data is loaded, return it as the function result, which can then be fed back into your checker_instance. If you were to do this with the code as-is, you would want to await the run_in_executor call for its result and then update the checker_instance. This would mean the reload GET request waits until the data is loaded. Alternatively, in your reload GET request, you could use ioloop.spawn_callback to call a function that triggers the run_in_executor in this manner, allowing the reload request to complete instead of waiting.
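For illustration, here is a minimal sketch of the spawn_callback variant, assuming the executors and checker_instance globals from the question and a hypothetical self-contained load_data_structure loader; the reload request returns immediately, the data is built inside the worker thread, and the shared instance is only mutated back on the IOLoop thread:

import tornado.ioloop
import tornado.web

class InstanceReloadHandler(tornado.web.RequestHandler):
    def get(self):
        # Kick off the reload without tying it to this request.
        tornado.ioloop.IOLoop.current().spawn_callback(self._reload)
        self.write(dict(success=True, message='job started!'))

    async def _reload(self):
        loop = tornado.ioloop.IOLoop.current()
        # Build the new handler entirely inside the worker thread...
        new_handler = await loop.run_in_executor(executors, load_data_structure)
        # ...and mutate the shared instance only back on the IOLoop thread.
        checker_instance.handler = new_handler
        checker_instance.is_live = True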
I've been pulling my hair out trying to figure this one out, hoping someone else has already encountered this and knows how to solve it :)
I'm trying to build a very simple Flask endpoint that just needs to call a long running, blocking php script (think while true {...}). I've tried a few different methods to async launch the script, but the problem is my browser never actually receives the response back, even though the code for generating the response after running the script is executed.
I've tried using both multiprocessing and threading, neither seem to work:
# multiprocessing attempt
@app.route('/endpoint')
def endpoint():
    def worker():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)

    p = multiprocessing.Process(target=worker)
    print '111111'
    p.start()
    print '222222'
    return json.dumps({
        'success': True
    })
# threading attempt
@app.route('/endpoint')
def endpoint():
    def thread_func():
        subprocess.Popen('nohup php script.php &', shell=True, preexec_fn=os.setpgrp)

    t = threading.Thread(target=thread_func)
    print '111111'
    t.start()
    print '222222'
    return json.dumps({
        'success': True
    })
In both scenarios I see the 111111 and 222222, yet my browser still hangs on the response from the endpoint. I've tried p.daemon = True for the process, as well as p.terminate(), but no luck. I had hoped launching a script with nohup in a different shell and a separate process/thread would just work, but somehow Flask or uWSGI is impacted by it.
Update
Since this does work locally on my Mac when I start my Flask app directly with python app.py and hit it directly without going through my Nginx proxy and uWSGI, I'm starting to believe it may not be the code itself that is having issues. And because my Nginx just forwards the request to uWSGI, I believe it may possibly be something there that's causing it.
Here is my ini configuration for the domain for uWSGI, which I'm running in emperor mode:
[uwsgi]
protocol = uwsgi
max-requests = 5000
chmod-socket = 660
master = True
vacuum = True
enable-threads = True
auto-procname = True
procname-prefix = michael-
chdir = /srv/www/mysite.com
module = app
callable = app
socket = /tmp/mysite.com.sock
This kind of thing is the actual, and probably main, use case for Python Celery (https://docs.celeryproject.org/). As a general rule, do not run long-running, CPU-bound jobs in the WSGI process: it's tricky, it's inefficient, and most importantly, it's more complicated than setting up an async task in a Celery worker. If you just want to prototype, you can set the broker to memory and avoid an external server, or run a single-threaded Redis on the very same machine.
This way you can launch the task and call task.result(), which blocks, but in an IO-bound fashion; or, even better, you can return immediately with the task_id and build a second endpoint /result?task_id=<task_id> that checks whether the result is available:
result = AsyncResult(task_id, app=app)

if result.state == "SUCCESS":
    return result.get()
else:
    return result.state  # or do something else depending on the state
This way you have a non-blocking WSGI app that does what it is best suited for: short, CPU-light calls that at most do IO with OS-level scheduling. You can then rely directly on the WSGI server's workers/processes/threads (in uWSGI, Gunicorn, etc.) to scale the API for 99% of workloads, while Celery scales horizontally by increasing the number of worker processes.
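A minimal sketch of that layout, assuming Flask and a local Redis as broker/backend; long_running_job is a hypothetical task name, the first endpoint only enqueues and returns the task_id, and the second endpoint polls for the result:

from celery import Celery
from celery.result import AsyncResult
from flask import Flask, jsonify, request

flask_app = Flask(__name__)
celery_app = Celery("tasks",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")

@celery_app.task
def long_running_job(payload):
    # Placeholder for the real CPU-heavy work done in the worker.
    return {"processed": payload}

@flask_app.route("/endpoint", methods=["POST"])
def endpoint():
    task = long_running_job.delay(request.get_json())
    return jsonify(task_id=task.id), 202  # return immediately

@flask_app.route("/result")
def result():
    res = AsyncResult(request.args["task_id"], app=celery_app)
    if res.state == "SUCCESS":
        return jsonify(result=res.get())
    return jsonify(state=res.state)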
This approach works for me: it calls the timeout command (sleeping 10 s) on the command line and lets it run in the background, returning the response immediately.
@app.route('/endpoint1')
def endpoint1():
    subprocess.Popen('timeout 10', shell=True)
    return 'success1'
However, I have not tested this on a WSGI server, only locally.
Would it be enough to use a background task? Then you only need to import threading, e.g.:
import threading
import ....

def endpoint():
    """My endpoint."""
    try:
        t = BackgroundTasks()
        t.start()
    except RuntimeError as exception:
        return f"An error occurred during endpoint: {exception}", 400
    return "successfully started.", 200

class BackgroundTasks(threading.Thread):
    def run(self, *args, **kwargs):
        ...  # do long running stuff
I have a Django web project on a server. I want it to run MATLAB code to produce a text file (which will be used later). Here is my code:
if request.method == "POST":
    run_octave(Dataset, is_multiclass, require_mapper, mapper, require_aspect_ratio, aspect_ratio)
    return redirect('meta2db.views.meta2db')

def run_octave(dataset, is_multiclass, require_mapper, mapper, require_aspect_ratio, aspect_ratio):
    origWD = os.getcwd()
    args = ["octave", "dbEval.m", dataset, is_multiclass, require_mapper,
            mapper, require_aspect_ratio, aspect_ratio]
    os.chdir(os.path.join(os.path.abspath(sys.path[0]), "../scripts/"))
    # subprocess call here
    process = subprocess.Popen(args, stdout=subprocess.PIPE)
    for line in process.stdout:
        time.sleep(0.5)
        Group("eval_status").send({"text": line.decode('utf-8')}, immediately=True)
    if process.poll() is None:
        process.kill()
    else:
        print(process.communicate())
    os.chdir(origWD)
I use a POST request to run the Octave code via a subprocess call. However, the MATLAB code takes a while to finish and always makes the client time out, which is not acceptable. My question is how to solve this kind of problem another way; a plain POST request does not seem like a good solution.
This would be an asynchronous operation. It is not built into Django by default, but there are several ways to make it possible.
The most common choice is probably to use Celery. This is a distributed task queue that can be combined with Django. It also requires that you install a message broker such as RabbitMQ.
http://www.celeryproject.org/
A newer alternative is Django Channels, which is part of the Django project but not a default part of Django (at least not yet).
https://github.com/django/channels
See this question for more comparison of the two projects.
How Django channels are different than celery?
Both of these libraries are quite complex. If you are looking for more lightweight alternatives, see this question:
Simple approach to launching background task in Django
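To illustrate the Celery route with the code from the question, here is a minimal sketch, assuming a configured Celery app and that run_octave is importable into the tasks module; run_octave_task is a hypothetical name, and the view enqueues the job and redirects immediately instead of waiting for Octave to finish:

# tasks.py: wrap the existing run_octave call in a Celery task.
from celery import shared_task

@shared_task
def run_octave_task(dataset, is_multiclass, require_mapper, mapper,
                    require_aspect_ratio, aspect_ratio):
    # Runs inside a Celery worker, not in the request/response cycle.
    run_octave(dataset, is_multiclass, require_mapper, mapper,
               require_aspect_ratio, aspect_ratio)

# views.py: enqueue the job and redirect right away instead of blocking.
if request.method == "POST":
    run_octave_task.delay(Dataset, is_multiclass, require_mapper, mapper,
                          require_aspect_ratio, aspect_ratio)
    return redirect('meta2db.views.meta2db')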
You could make it async.
By async I mean that in the POST request you generate a UUID that identifies the operation.
You launch the operation inside the POST request using a different thread/process.
To check whether the operation is done, you set up a status view that, given the operation id, returns its status.
Adding just a code snippet as a proof of concept. I did not test it!
from concurrent.futures import ThreadPoolExecutor
import uuid

class Operation(object):
    operations = {}

    def __init__(self, *task):
        self.id = str(uuid.uuid4())
        self.task = task
        self.done = False
        self._thread = None
        self._executor = ThreadPoolExecutor()
        self.__class__.operations[self.id] = self

    def run(self):
        self._thread = self._executor.submit(*self.task)
        self._thread.add_done_callback(self._callback)
        return self.id

    def _callback(self, future):
        self.done = True

    @classmethod
    def is_operation_done(cls, id):
        try:
            return cls.operations[id].done
        except KeyError:
            raise Exception("Operation not found")  # FIXME Custom exception here!!


if request.method == "POST":
    operation = Operation(run_octave,
                          Dataset,
                          is_multiclass,
                          require_mapper,
                          mapper,
                          require_aspect_ratio,
                          aspect_ratio)
    id = operation.run()
    return id  # jsonize it if you want to pass it to the frontend


def is_operation_done(request, operation_id):
    # This implements polling, but it would be better to have a socket!
    return Operation.is_operation_done(operation_id)


def run_octave(dataset, is_multiclass, require_mapper, mapper, require_aspect_ratio, aspect_ratio):
    .....
    .....
def run_octave(dataset,is_multiclass,require_mapper,mapper,require_aspect_ratio,aspect_ratio):
.....
.....
To minimize the request time, I want to execute the method after returning 200 to the client.
@app.route('/register', methods=['POST'])
def register():
    # code and code
    return 200
    send_email_with_validation_url()
How can I do it? With threads?
You can do it with threads, but without some control you could end up with lots of threads choking resources. You could also end up with processes crashing without you being aware.
This is the job for a queue system. Celery would be a good fit. Something along the lines of:
from celery import Celery

celery_app = Celery('tasks', broker='amqp://guest@localhost//')

@celery_app.task
def send_email_job(address):
    send_email_with_validation_url()

@app.route('/register', methods=['POST'])
def register():
    # code and code
    send_email_job.delay(address)
    return 200
In this example, send_email_job will be scheduled to run in the background (in a different thread, process, or even machine if you want) with the given arguments, and your server will return immediately.
Celery is great, but if the task isn't critical, asyncio would be a good option to explore; see this.
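As a rough sketch of that idea, assuming the register view from the question, one option is to run a dedicated event loop in a daemon thread and hand it fire-and-forget coroutines; send_email_with_validation_url_async is a hypothetical async version of the email call:

import asyncio
import threading

# Run a dedicated event loop in a daemon thread so a sync Flask view can
# hand it fire-and-forget coroutines without blocking the response.
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def send_email_with_validation_url_async():
    ...  # hypothetical async version of the email-sending call

@app.route('/register', methods=['POST'])
def register():
    # code and code
    asyncio.run_coroutine_threadsafe(send_email_with_validation_url_async(), loop)
    return "OK", 200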
Is there a way in Flask to send the response to the client and then continue doing some processing? I have a few book-keeping tasks which are to be done, but I don't want to keep the client waiting.
Note that these are actually really fast things I wish to do, thus creating a new thread, or using a queue, isn't really appropriate here. (One of these fast things is actually adding something to a job queue.)
QUICK and EASY method.
We will use Python's threading library to achieve this.
Your API consumer has sent something to process, which is processed by the my_task() function that takes 10 seconds to execute.
But the consumer of the API wants a response as soon as they hit your API, which is the return_status() function.
You tie my_task to a thread and then return a quick response to the API consumer, while in the background the big process completes.
Below is a simple POC.
import os
import time
from threading import Thread

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def main():
    return "Welcome!"

@app.route('/add_')
def return_status():
    """Return the response first and tie my_task to a thread."""
    Thread(target=my_task).start()
    return jsonify('Response asynchronously')

def my_task():
    """Big function doing some job; here I just put a pandas DataFrame-to-CSV conversion."""
    time.sleep(10)
    import pandas as pd
    pd.DataFrame(['sample data']).to_csv('./success.csv')
    print('large function completed')

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
Sadly teardown callbacks do not execute after the response has been returned to the client:
import flask
import time

app = flask.Flask("after_response")

@app.teardown_request
def teardown(request):
    time.sleep(2)
    print("teardown_request")

@app.route("/")
def home():
    return "Success!\n"

if __name__ == "__main__":
    app.run()
When curling this you'll note a 2s delay before the response displays, rather than the curl ending immediately and then a log 2s later. This is further confirmed by the logs:
teardown_request
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
The correct way to execute after a response is returned is to use WSGI middleware that adds a hook to the close method of the response iterator. This is not quite as simple as the teardown_request decorator, but it's still pretty straight-forward:
import traceback
from werkzeug.wsgi import ClosingIterator

class AfterResponse:
    def __init__(self, app=None):
        self.callbacks = []
        if app:
            self.init_app(app)

    def __call__(self, callback):
        self.callbacks.append(callback)
        return callback

    def init_app(self, app):
        # install extension
        app.after_response = self
        # install middleware
        app.wsgi_app = AfterResponseMiddleware(app.wsgi_app, self)

    def flush(self):
        for fn in self.callbacks:
            try:
                fn()
            except Exception:
                traceback.print_exc()


class AfterResponseMiddleware:
    def __init__(self, application, after_response_ext):
        self.application = application
        self.after_response_ext = after_response_ext

    def __call__(self, environ, start_response):
        iterator = self.application(environ, start_response)
        try:
            return ClosingIterator(iterator, [self.after_response_ext.flush])
        except Exception:
            traceback.print_exc()
            return iterator
Which you can then use like this:
@app.after_response
def after():
    time.sleep(2)
    print("after_response")
From the shell you will see the response return immediately and then 2 seconds later the after_response will hit the logs:
127.0.0.1 - - [25/Jun/2018 15:41:51] "GET / HTTP/1.1" 200 -
after_response
This is a summary of a previous answer provided here.
I had a similar problem with my blog. I wanted to send notification emails to those subscribed to comments when a new comment was posted, but I did not want the person posting the comment to wait for all the emails to be sent before getting a response.
I used a multiprocessing.Pool for this. I started a pool of one worker (that was enough, low traffic site) and then each time I need to send an email I prepare everything in the Flask view function, but pass the final send_email call to the pool via apply_async.
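A minimal sketch of that setup, assuming hypothetical save_comment, subscribers_for, and send_email helpers; everything is prepared synchronously in the view and only the final send is handed to the one-worker pool:

from multiprocessing import Pool
from flask import Flask, request

app = Flask("comment_notifications")
pool = Pool(processes=1)  # one worker was enough for a low-traffic site

def send_email(recipients, subject, body):
    ...  # hypothetical helper that actually talks to the mail server

@app.route('/comments', methods=['POST'])
def post_comment():
    # Prepare everything synchronously in the view (hypothetical helpers)...
    comment = save_comment(request.form)
    recipients = subscribers_for(comment)
    # ...and hand only the final send to the pool; the response returns right away.
    pool.apply_async(send_email, (recipients, "New comment", comment))
    return "comment posted", 201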
You can find an example of how to use Celery from within Flask here: https://gist.github.com/jzempel/3201722
The gist of the idea (pun intended) is to define the long, book-keeping tasks as @celery.task and use apply_async or delay from within the view to start the task.
Sounds like Teardown Callbacks would support what you want. And you might want to combine it with the pattern from Per-Request After-Request Callbacks to help with organizing the code.
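A minimal sketch of that combination, assuming a Flask app; each view registers the work it wants deferred on flask.g, and a teardown handler drains the list when Flask tears the request down (register_teardown_callback is a hypothetical helper name):

from flask import Flask, g

app = Flask("after_request_callbacks")

def register_teardown_callback(func):
    # Per-request registration: each view adds the work it wants run later to flask.g.
    callbacks = getattr(g, 'teardown_callbacks', None)
    if callbacks is None:
        callbacks = g.teardown_callbacks = []
    callbacks.append(func)
    return func

@app.teardown_request
def run_teardown_callbacks(exception=None):
    # Drain whatever this request registered once Flask tears the request down.
    for callback in getattr(g, 'teardown_callbacks', ()):
        callback()

@app.route("/register", methods=["POST"])
def register():
    @register_teardown_callback
    def bookkeeping():
        ...  # the fast housekeeping from the question

    return "OK", 200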
You can do this with WSGI's close protocol, exposed from the Werkzeug Response object's call_on_close decorator. Explained in this other answer here: https://stackoverflow.com/a/63080968/78903
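A minimal sketch of that approach, assuming a Flask view; the function registered with call_on_close runs when the WSGI server closes the response iterator, which typically happens after the body has been sent:

from flask import Flask, make_response

app = Flask("call_on_close_demo")

@app.route("/", methods=["POST"])
def handler():
    response = make_response("Success!\n", 200)

    @response.call_on_close
    def on_close():
        ...  # the book-keeping from the question; runs when the server closes the response

    return response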