I have a minimal async Python server based on aiohttp.
It is very straightforward, just a websocket endpoint exposed as in:
@routes.get('/my_endpoint')
async def my_func(request):
    ws = web.WebSocketResponse()
    await ws.prepare(request)
    return ws
I want to expose as prometheus metrics the request rate (and potentially the error rate).
After a brief investigation of the topic, I realised that there seems to be a distinction in how Prometheus metrics exposure is approached for sync vs async apps.
For my case, where I want a simple request count/rate, is there a reason not to just use the plain old Prometheus Python client (e.g. by simply decorating my_func)?
Would the request count actually fail in such a case?
The following is based on my understanding of asyncio and the way the official Prometheus client describes how it exposes metrics.
aiohttp is meant to be used on top of asyncio. Now, asyncio runs something called an "event loop", which runs inside a single thread (usually the main thread).
You can look at it as an entity that decides when to suspend or execute the functions that were assigned to run in the loop, in your case my_func.
For prometheus_client to expose your metrics you will probably need to run it in a different thread
Metrics are usually exposed over HTTP, to be read by the Prometheus server. The easiest way to do this is via start_http_server, which will start a HTTP server in a daemon thread on the given port
This is outside "the control of the event loop", which might lead to performance issues and unexpected behavior. So the request count would not necessarily fail, but if for some reason the client does a blocking task (I/O), it will block the main thread as well. If you used the async approach and ran it as part of the event loop, your blocking task could be awaited, giving control back to the main thread.
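For illustration, here is a minimal sketch of that "plain old client" approach in an aiohttp app (the metric names and port 9100 are assumptions, not anything from the original setup):

import prometheus_client
from aiohttp import web

REQUESTS = prometheus_client.Counter(
    'my_endpoint_requests_total', 'Requests to /my_endpoint')
ERRORS = prometheus_client.Counter(
    'my_endpoint_errors_total', 'Errors on /my_endpoint')

routes = web.RouteTableDef()

@routes.get('/my_endpoint')
async def my_func(request):
    REQUESTS.inc()  # incrementing a counter is a cheap in-memory operation
    try:
        ws = web.WebSocketResponse()
        await ws.prepare(request)
        return ws
    except Exception:
        ERRORS.inc()
        raise

# /metrics is served from a daemon thread, outside the event loop
prometheus_client.start_http_server(9100)

app = web.Application()
app.add_routes(routes)
web.run_app(app)

The counter updates themselves are just in-memory increments, so the only part living outside the event loop is the scrape endpoint started by start_http_server.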
There are open source projects that support prometheus in async functions such as aioprometheus and prometheus-async.
Related
I see that the asyncio.to_thread() method has been added in Python 3.9+. Its description says it runs blocking code in a separate thread so that it runs concurrently with the event loop. See the example below:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())
# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
From the explanation, it seems to use a thread mechanism, not context switching or coroutines. Does this mean it is not actually async after all? Is it the same as traditional multi-threading, as in concurrent.futures.ThreadPoolExecutor? What is the benefit of using threads this way, then?
The source code of to_thread is quite simple. It boils down to awaiting run_in_executor with the default executor (the executor argument is None), which is a ThreadPoolExecutor.
In fact, yes, this is traditional multithreading: the code intended to run in a separate thread is not asynchronous, but to_thread allows you to await its result asynchronously.
Also note that the function runs in the context of the current task, so its context variable values will be available inside the func.
async def to_thread(func, /, *args, **kwargs):
    """Asynchronously run function *func* in a separate thread.

    Any *args and **kwargs supplied for this function are directly passed
    to *func*. Also, the current :class:`contextvars.Context` is propagated,
    allowing context variables from the main thread to be accessed in the
    separate thread.

    Return a coroutine that can be awaited to get the eventual result of *func*.
    """
    loop = events.get_running_loop()
    ctx = contextvars.copy_context()
    func_call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, func_call)
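Under that reading, and as a rough sketch, the following two awaits are roughly equivalent (to_thread additionally propagates context variables; blocking_io stands in for the function from the example above):

import asyncio
import time

def blocking_io():
    time.sleep(1)  # stands in for any blocking call

async def main():
    # Python 3.9+ convenience wrapper
    await asyncio.to_thread(blocking_io)

    # The near-equivalent on older versions: default ThreadPoolExecutor
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, blocking_io)

asyncio.run(main())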
You would use asyncio.to_thread whenever you need to call a blocking API from a third-party lib that either does not have an asyncio adapter/interface, or where you do not want to create one because you just need to use a limited number of functions from that lib.
A concrete example: I am currently writing an application that will eventually run as a daemon, at which point it will use asyncio for its core event loop. The event loop will involve monitoring a unix socket for notifications, which will trigger the daemon to take an action.
For rapid prototyping it is currently a CLI, but one of the dependencies/external systems the daemon will interact with is libvirt, an abstraction layer for virtual machine management written in C, with a Python wrapper called libvirt-python.
The Python bindings are blocking and communicate with the libvirt daemon over a separate unix socket using a blocking request/response protocol.
You can conceptually think of a call to the libvirt bindings as each function internally making an HTTP request to a server and waiting for the server to complete the action. The exact mechanics of how it does that are not important for this discussion, just that it is a blocking I/O operation that depends on an external process and may take some time, i.e. this is not a CPU-bound call and therefore it can be offloaded to a thread and awaited.
If I were to directly call "domains = libvirt.conn.listAllDomains()" in an async function,
that would block my asyncio event loop until I got a response from libvirt.
So if any events were received on the unix socket my main loop is monitoring,
they would not be processed while we are waiting for the libvirt daemon to look up all domains and return the list of them to us.
If I use "domains = await asyncio.to_thread(libvirt.conn.listAllDomains)"
instead, the await call will suspend my current coroutine until we get the response, yielding execution back to the asyncio event loop. That means if the daemon receives a notification while we are waiting on libvirt, it can be scheduled to run concurrently instead of being blocked.
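A minimal sketch of that pattern (the connection URI and opening the connection synchronously at startup are assumptions made for the example; the notification handling is elided):

import asyncio
import libvirt

async def list_domains(conn):
    # conn.listAllDomains() is a blocking libvirt call; running it in a
    # worker thread lets the event loop keep servicing the notification socket.
    domains = await asyncio.to_thread(conn.listAllDomains)
    return [d.name() for d in domains]

async def main():
    conn = libvirt.open("qemu:///system")  # blocking too, but acceptable at startup
    try:
        print(await list_domains(conn))
    finally:
        conn.close()

asyncio.run(main())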
In my application I will also need to read and write Linux special files in /sys. Linux has native AIO file support which can be used with asyncio via aiofile; however, Linux does not support the AIO interface for managing special files, so I would have to use blocking I/O.
One way to do that in an async application would be to wrap the function that writes to the special files in asyncio.to_thread.
I could, and might, use a decorator around run_in_executor directly, since I own the write_sysfs function, but if I did not, then to_thread is more polite than monkeypatching someone else's lib and less work than creating my own wrapper API. A sketch of such a decorator follows.
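A minimal sketch of that decorator idea (the unblock name and the write_sysfs body are made up for illustration; the run_in_executor plumbing is the point):

import asyncio
import functools

def unblock(func):
    # Wrap a blocking function so that calling it returns an awaitable
    # which runs the original function in the default ThreadPoolExecutor.
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        return await loop.run_in_executor(None, call)
    return wrapper

@unblock
def write_sysfs(path, value):
    with open(path, "w") as f:
        f.write(str(value))

# usage inside a coroutine: await write_sysfs("/sys/...", 1)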
Hopefully those are useful examples of where you might want to use to_thread. It's really just a convenience function, and you can use run_in_executor to do the same thing with no additional overhead.
If you need to support older Python releases you might also prefer run_in_executor, since it predates the introduction of to_thread, but if you can assume 3.9+ then it's a nice addition to leverage when you need to.
I have a module in Python 3.5+, providing a function that reads some data from a remote web API and returns it. The function relies on a wrapper function, which in turn uses the requests library to make the HTTP call.
Here it is (omitting on purpose all data validation logic and exception handling):
# module fetcher.py
import requests
# high-level module API
def read(some_params):
    return get_data(some_params)

# wrapper for the actual remote API call
def get_data(some_params):
    resp = requests.get('http://example.com', params=some_params)
    return resp.json()
The module is currently imported and used by multiple clients.
As of today, the call to get_data is inherently synchronous: this means that whoever uses the function fetcher.read() knows that this is going to block the thread the function is executed on.
What I would love to achieve
I want to allow fetcher.read() to be run both in a synchronous and an asynchronous fashion (e.g. via an event loop).
This is in order to keep compatibility with existing callers consuming the module, and at the same time to offer the possibility
of leveraging non-blocking calls for better throughput for callers that do want to call the function asynchronously.
This said, my legitimate wish is to modify the original code as little as possible...
As of today, the only thing I know is that Requests does not support asynchronous operations out of the box, and therefore I should switch to an asyncio-friendly HTTP client (e.g. aiohttp) in order to provide non-blocking behaviour.
How would the above code need to be modified to meet my desiderata? Which also leads me to ask: is there any best practice about enhancing sync software APIs to async contexts?
I want to allow fetcher.read() to be run both in a synchronous and an asynchronous fashion (e.g. via an event loop).
I don't think it is feasible for the same function to be usable via both sync and async API because the usage patterns are so different. Even if you could somehow make it work, it would be just too easy to mess things up, especially taking into account Python's dynamic-typing nature. (For example, users might accidentally forget to await their functions in async code, and the sync code would kick in, thus blocking their event loop.)
Instead, I would recommend the actual API to be async, and to create a trivial sync wrapper that just invokes the entry points using run_until_complete. Something along these lines:
# new module afetcher.py (or fetcher_async, or however you like it)
import aiohttp
# high-level module API
async def read(some_params):
    return await get_data(some_params)

# wrapper for the actual remote API call
async def get_data(some_params):
    async with aiohttp.request('GET', 'http://example.com', params=some_params) as resp:
        return await resp.json()
Yes, you switch from using requests to aiohttp, but the change is mechanical as the APIs are very similar in spirit.
The sync module would exist for backward compatibility and convenience, and would trivially wrap the async functionality:
# module fetcher.py
import asyncio

import afetcher

def read(some_params):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(afetcher.read(some_params))

...
This approach provides both sync and async version of the API, without code duplication because the sync version consists of trivial trampolines, whose definition can be further compressed using appropriate decorators.
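For example, one possible shape for such a decorator, as a sketch (the syncify name is made up, and it assumes the sync API is never called from inside a running event loop):

import asyncio
import functools

import afetcher

def syncify(async_func):
    # Turn a coroutine function into a blocking trampoline.
    @functools.wraps(async_func)
    def wrapper(*args, **kwargs):
        loop = asyncio.get_event_loop()
        return loop.run_until_complete(async_func(*args, **kwargs))
    return wrapper

# fetcher.py then reduces to one-liners:
read = syncify(afetcher.read)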
The async fetcher module should have a nice short name, so that the users don't feel punished for using the async functionality. It should be easy to use, and it actually provides a lot of new features compared to the sync API, most notably low-overhead parallelization and reliable cancellation.
The route that is not recommended is using run_in_executor or similar thread-based tool to run requests in a thread pool under the hood. That implementation doesn't provide the actual benefits of using asyncio, but incurs all the costs. In that case it is better to continue providing the synchronous API and leave it to the users to use concurrent.futures or similar tools for parallel execution, where they're at least aware they're using threads.
I'm trying to write a tornado web application that uses sqlalchemy in some request handlers. These handlers have two parts: one that takes a long time to complete, and another that uses sqlalchemy and is relatively fast.
I would like to make the slow part of the request asynchronous, but not the sqlalchemy part. Can I do something like the following code and be safe?
class ExampleHandler(BaseHandler):
    async def post(self):
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(...)  # very slow (no sqlalchemy here)
        with self.db_session() as s:  # sqlalchemy session
            s.add(...)
            s.commit()
        self.render(...)
The idea is to have sqlalchemy still blocking, but have the computational heavy part not blocking the application.
The Tornado web server uses asynchronous code to serve many requests from a single thread, working within the limit imposed by the Python Global Interpreter Lock. The GIL, as it is colloquially known, allows only one thread of execution to take place in the Python interpreter process at a time. Tornado is able to answer many requests simultaneously because of its use of an event loop. The event loop can perform one small task at a time. Let's take your own post handler to understand this better.
In this handler, when the python interpreter gets to the await keyword, it pauses the execution of the function and queues it for later on its event loop. It then checks the event loop to respond to other events that may have queued up there, like responding to a new connection or servicing another handler.
When you block in an asynchronous function, you freeze the entire event loop as it is unable to pause your function and service anything else. What this actually means for you is that your web server will not accept or service any requests while your async function blocks. It will appear as if your web server is hanging and indeed it is stuck.
To keep the server responsive, you have to find a way to execute your sqlalchemy query in an asynchronous non-blocking manner.
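One way to do that, sketched under assumptions (self.db_session and self.heavy_computation are placeholders standing in for the question's code, and the dedicated thread pool is illustrative), is to push the session work into an executor as well:

import asyncio
from concurrent.futures import ThreadPoolExecutor

db_executor = ThreadPoolExecutor(max_workers=4)  # small pool just for DB work

class ExampleHandler(BaseHandler):
    async def post(self):
        loop = asyncio.get_event_loop()
        await loop.run_in_executor(None, self.heavy_computation)

        def db_work():
            # blocking sqlalchemy calls now run off the event loop thread
            with self.db_session() as s:
                s.add(...)
                s.commit()

        await loop.run_in_executor(db_executor, db_work)
        self.render(...)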
I have a python server that is available through a websocket endpoint.
During serving a connection, it also communicates with some backend services. This communication is asynchronous and may trigger the send() method of the websocket.
When a single client is served, it seems to work ok. However, when multiple clients are served in parallel, some of the routines that handle the connections get stuck occasionally. More precisely, it seems to block in the recv() method.
The actual code is somewhat complex and the issue is slightly more complicated than I have described; nevertheless, I provide a minimal skeleton of code that sketches the way in which I use the websockets:
import asyncio
import logging
import random
import time

logger = logging.getLogger(__name__)

class MinimalConversation(object):
    def __init__(self, ws, worker_sck, messages, should_continue_conversation, should_continue_listen):
        self.ws = ws
        self.messages = messages
        self.worker_sck = worker_sck
        self.should_continue_conversation = should_continue_conversation
        self.should_continue_listen = should_continue_listen

    async def run_conversation(self):
        serving_future = asyncio.ensure_future(self.serve_connection())
        listening_future = asyncio.ensure_future(self.handle_worker())
        await asyncio.wait([serving_future, listening_future], return_when=asyncio.ALL_COMPLETED)

    async def serve_connection(self):
        while self.should_continue_conversation():
            await self.ws.recv()
            logger.debug("Message received")
            self.sleep_randomly(10, 5)
            await self.worker_sck.send(b"Dummy")

    async def handle_worker(self):
        while self.should_continue_listen():
            self.sleep_randomly(50, 40)
            await self.worker_sck.recv()
            await self.ws.send(self.messages.pop())

    def sleep_randomly(self, mean, dev):
        delta = random.randint(1, dev) / 1000
        if random.random() < .5:
            delta *= -1
        time.sleep(mean / 1000 + delta)
Obviously, in the real code I do not sleep for random intervals and don't use a given list of messages, but this sketches the way I handle the websockets. In the real setting, some errors may occur that are sent over the websocket too, so parallel send()s may occur in theory, but I have never encountered such a situation.
The code is run from a handler function which is passed as a parameter to websockets.serve(); it initializes the MinimalConversation object and calls the run_conversation() method.
My questions are:
Is there something fundamentally wrong with such usage of the websockets?
Are concurrent calls of the send() methods dangerous?
Can you suggest some good practices regarding usage of websockets and asyncio?
Thank you.
The recv function yields back only when a message is received, and it seems that there are 2 connections awaiting messages from each other, so there might be a situation similar to a "deadlock" where they are waiting for each other's messages and can't send anything. Maybe you should try to rethink the overall algorithm to be safer against this.
And, of course, try adding more debug output and see what really happens.
Are concurrent calls of the send() methods dangerous?
If by concurrent you mean in the same thread but in independently scheduled coroutines, then parallel send is just fine. But be careful with "parallel" recv on the same connection, because the order of coroutine scheduling might be far from obvious, and it is what decides which call to recv will get a message first.
Can you suggest some good practices regarding usage of websockets and asyncio?
In my experience, the easiest way is to create a dedicated task for incoming connections, which will repeatedly call recv on the connection until the connection is closed. You can store the connection somewhere and delete it in a finally block; then it can be used from other coroutines to send something.
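A rough sketch of that pattern with the websockets library (the handler signature is for recent versions of websockets, older ones also pass a path argument, and process() is a made-up placeholder):

import asyncio
import websockets

connections = set()  # illustrative registry of open connections

async def process(message):
    print("received:", message)  # hypothetical message handler

async def handler(ws):
    # one dedicated task per connection: only this coroutine ever calls recv()
    connections.add(ws)
    try:
        async for message in ws:
            await process(message)
    finally:
        connections.discard(ws)

async def broadcast(text):
    # other coroutines only send; they never touch recv()
    for ws in list(connections):
        await ws.send(text)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())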
I am writing an API using a Python 3 + Falcon combination.
There are a lot of places in methods where I could send a reply to a client, but because of some heavy code which does DB, I/O operations, etc., it has to wait until the heavy part ends.
For example:
class APIHandler:
    def on_get(self, req, resp):
        response = "Hello"
        # Some heavy code
        resp.body = response
I could send "Hello" at the first line of code. What I want is to run the heavy code in the background and send a response regardless of when the heavy part finishes.
Falcon does not have any built-in async capabilities, but they mention it can be used with something like gevent. I haven't found any documentation on how to combine the two.
Client libraries have varying support for async operations, so the decision often comes down to which async approach is best supported by your particular backend client(s), combined with which WSGI server you would like to use. See also below for some of the more common options...
For libraries that do not support an async interaction model, either natively or via some kind of subclassing mechanism, tasks can be delegated to a thread pool. And for especially long-running tasks (i.e., on the order of several seconds or minutes), Celery's not a bad choice.
A brief survey of some of the more common async options for WSGI (and Falcon) apps:
Twisted. Favors an explicit asynchronous style, and is probably the most mature option. For integrating with a WSGI framework like Falcon, there's twisted.web.wsgi and crochet.
asyncio. Borrows many ideas from Twisted, but takes advantage of Python 3 language features to provide a cleaner interface. Long-term, this is probably the cleanest option, but necessitates an evolution of the WSGI interface (see also pulsar's extension to PEP-3333 as one possible approach). The asyncio ecosystem is relatively young at the time of this writing; the community is still experimenting with a wide variety of approaches around interfaces, patterns and tooling.
eventlet. Favors an implicit style that seeks to make async code look synchronous. One way eventlet does this is by monkey-patching I/O modules in the standard library. Some people don't like this approach because it masks the asynchronous mechanism, making edge cases harder to debug.
gevent. Similar to eventlet, albeit a bit more modern. Both uWSGI and Gunicorn support gevent worker types that monkey-patch the standard library.
Finally, it may be possible to extend Falcon to natively support twisted.web or asyncio (à la aiohttp), but I don't think anyone's tried it yet.
I use Celery for async-related work. I don't know about gevent. Take a look at this: http://celery.readthedocs.org/en/latest/getting-started/introduction.html
I think there are two different approaches here:
A task manager (like Celery)
An async implementation (like gevent)
What you achieve with each of them is different. With Celery, what you can do is run all the code you need to compute the response synchronously, and then run any other operation (like saving to logs) in the background. This way, the response should be faster.
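A rough sketch of that Celery pattern with Falcon (the broker URL, module layout, and falcon.API vs. falcon.App naming are assumptions that depend on your versions):

# tasks.py
import time
from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def heavy_work(params):
    time.sleep(10)  # stands in for the heavy DB / I/O part
    print("Background work finished")

# api.py
import falcon
from tasks import heavy_work

class APIHandler:
    def on_get(self, req, resp):
        heavy_work.delay(dict(req.params))  # enqueue and return immediately
        resp.body = "Hello"

app = falcon.API()
app.add_route('/', APIHandler())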
With gevent, what you achieve is running different instances of your handler in parallel. So, if you have a single request, you won't see any difference in the response time, but if you have thousands of concurrent requests, the performance will be much better. The reason for this is that without gevent, when your code executes an I/O operation, it blocks the execution of that process, while with gevent the CPU can go on executing other requests while the I/O operation waits.
Setting up gevent is much easier than setting up Celery. If you're using gunicorn, you simply install gevent and change the worker type to gevent (gunicorn's --worker-class gevent option). Another advantage is that you can parallelize any operation that is required in the response (like extracting the response from a database). In Celery, you can't use the output of the Celery task in your response.
What I would recommend is to start by using gevent, and consider adding Celery later (and have both of them) if:
The output of the task you will process with Celery is not required in the response.
You have a different machine for your Celery tasks, or the usage of your server has some peaks and some idle time (if your server is at 100% the whole time, you won't get anything good from using Celery).
The amount of work that your Celery tasks will do is worth the overhead of using Celery.
You can use multiprocessing.Process with daemon=True to run a daemonic process and return a response to the caller immediately:
import time
from multiprocessing import Process

class APIHandler:
    def on_get(self, req, resp):
        heavy_process = Process(  # Create a daemonic process
            target=my_func,
            daemon=True
        )
        heavy_process.start()
        resp.body = "Quick response"

# Define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")
You can test it by sending a GET request. You will get a response immediately and, after 10s you will see a printed message in the console.