I am writing an API using the python3 + falcon combination.
There are a lot of places in my methods where I could send a reply to the client immediately, but because some heavy code (DB, I/O operations, etc.) follows, it has to wait until the heavy part ends.
For example:
class APIHandler:
    def on_get(self, req, resp):
        response = "Hello"
        # some heavy code
        resp.body = response
I could send "Hello" on the first line of code. What I want is to run the heavy code in the background and send the response regardless of when the heavy part finishes.
Falcon does not have any built-in async capabilities, but the docs mention it can be used with something like gevent. I haven't found any documentation on how to combine the two.
Client libraries have varying support for async operations, so the decision often comes down to which async approach is best supported by your particular backend client(s), combined with which WSGI server you would like to use. See also below for some of the more common options...
For libraries that do not support an async interaction model, either natively or via some kind of subclassing mechanism, tasks can be delegated to a thread pool. And for especially long-running tasks (i.e., on the order of several seconds or minutes), Celery's not a bad choice.
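For instance, a minimal sketch of the thread pool approach inside a Falcon responder (save_to_db is a hypothetical blocking function, not part of any library):

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=4)

class APIHandler:
    def on_get(self, req, resp):
        # hand the blocking work to a worker thread...
        pool.submit(save_to_db, req.params)
        # ...and respond without waiting for it to finish
        resp.body = 'Hello'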
A brief survey of some of the more common async options for WSGI (and Falcon) apps:
Twisted. Favors an explicit asynchronous style, and is probably the most mature option. For integrating with a WSGI framework like Falcon, there's twisted.web.wsgi and crochet.
asyncio. Borrows many ideas from Twisted, but takes advantage of Python 3 language features to provide a cleaner interface. Long-term, this is probably the cleanest option, but necessitates an evolution of the WSGI interface (see also pulsar's extension to PEP-3333 as one possible approach). The asyncio ecosystem is relatively young at the time of this writing; the community is still experimenting with a wide variety of approaches around interfaces, patterns and tooling.
eventlet. Favors an implicit style that seeks to make async code look synchronous. One way eventlet does this is by monkey-patching I/O modules in the standard library. Some people don't like this approach because it masks the asynchronous mechanism, making edge cases harder to debug.
gevent. Similar to eventlet, albeit a bit more modern. Both uWSGI and Gunicorn support gevent worker types that monkey-patch the standard library.
Finally, it may be possible to extend Falcon to natively support twisted.web or asyncio (ala aiohttp), but I don't think anyone's tried it yet.
I use Celery for async-related work. I don't know about gevent. Take a look at this: http://celery.readthedocs.org/en/latest/getting-started/introduction.html
I think there are two different approaches here:
A task manager (like Celery)
An async implementation (like gevent)
What you achieve with each of them is different. With Celery, you can run all the code needed to compute the response synchronously, and then run any other operation (like saving to logs) in the background. This way, the response should be faster.
With gevent, what you achieve is to run different instances of your handler in parallel. So if you have a single request, you won't see any difference in response time, but if you have thousands of concurrent requests, performance will be much better. The reason for this is that without gevent, when your code executes an I/O operation, it blocks the execution of that process, while with gevent the CPU can go on serving other requests while the I/O operation waits.
Setting up gevent is much easier than setting up Celery. If you're using gunicorn, you simply install gevent and change the worker type to gevent. Another advantage is that you can parallelize any operation that is required in the response (like extracting the response from a database). In Celery, you can't use the output of the Celery task in your response.
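For reference, a minimal sketch of serving a Falcon app on gevent directly, if you are not using gunicorn's gevent worker (the module layout and port are assumptions):

from gevent import monkey
monkey.patch_all()  # must run before importing anything that does I/O

from gevent.pywsgi import WSGIServer
import falcon

app = falcon.API()
# ... add routes here ...
WSGIServer(('127.0.0.1', 8000), app).serve_forever()

With gunicorn, none of this is needed: install gevent and start with gunicorn --worker-class gevent yourmodule:app, and the worker does the monkey-patching for you.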
What I would recommend is to start by using gevent, and consider adding Celery later (and have both of them) if:
The output of the task you will process with Celery is not required in the response
You have a different machine for your Celery tasks, or the usage of your server has some peaks and some idle time (if your server is at 100% the whole time, you won't gain anything from using Celery)
The amount of work that your Celery tasks will do is worth the overhead of using Celery
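For illustration, a minimal Celery sketch of that division of labour (the broker URL and the task name are assumptions):

from celery import Celery

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def save_to_logs(entry):
    # work that is not needed for the response runs later, in a worker
    ...

# in the handler, after computing the response:
#     save_to_logs.delay(entry)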
You can use multiprocessing.Process with daemon=True to run a daemonic process and return a response to the caller immediately:

import time
from multiprocessing import Process

class APIHandler:
    def on_get(self, req, resp):
        heavy_process = Process(  # create a daemonic process
            target=my_func,
            daemon=True
        )
        heavy_process.start()
        resp.body = "Quick response"

# define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")
You can test it by sending a GET request. You will get a response immediately, and after 10 seconds you will see a printed message in the console.
I have a module in Python 3.5+, providing a function that reads some data from a remote web API and returns it. The function relies on a wrapper function, which in turn uses the library requests to make the HTTP call.
Here it is (omitting on purpose all data validation logic and exception handling):
# module fetcher.py
import requests

# high-level module API
def read(some_params):
    return get_data(some_params)

# wrapper for the actual remote API call
def get_data(some_params):
    resp = requests.get('http://example.com', params=some_params)
    return resp.json()
The module is currently imported and used by multiple clients.
As of today, the call to get_data is inherently synchronous: this means that whoever uses the function fetcher.read() knows that this is going to block the thread the function is executed on.
What I would love to achieve
I want to allow fetcher.read() to be run both in a synchronous and an asynchronous fashion (e.g. via an event loop).
This is in order to keep compatibility with existing callers consuming the module, and at the same time to offer non-blocking calls for better throughput to callers that do want to call the function asynchronously.
This said, my legitimate wish is to modify the original code as little as possible...
As of today, the only thing I know is that Requests does not support asynchronous operations out of the box, and therefore I should switch to an asyncio-friendly HTTP client (e.g. aiohttp) in order to provide non-blocking behaviour.
How would the above code need to be modified to meet my desiderata? Which also leads me to ask: is there any best practice for enhancing sync software APIs for async contexts?
I want to allow fetcher.read() to be run both in a synchronous and an asynchronous fashion (e.g. via an event loop).
I don't think it is feasible for the same function to be usable via both sync and async API because the usage patterns are so different. Even if you could somehow make it work, it would be just too easy to mess things up, especially taking into account Python's dynamic-typing nature. (For example, users might accidentally forget to await their functions in async code, and the sync code would kick in, thus blocking their event loop.)
Instead, I would recommend the actual API to be async, and to create a trivial sync wrapper that just invokes the entry points using run_until_complete. Something along these lines:
# new module afetcher.py (or fetcher_async, or however you like it)
import aiohttp

# high-level module API
async def read(some_params):
    return await get_data(some_params)

# wrapper for the actual remote API call
async def get_data(some_params):
    async with aiohttp.request('GET', 'http://example.com', params=some_params) as resp:
        return await resp.json()
Yes, you switch from using requests to aiohttp, but the change is mechanical as the APIs are very similar in spirit.
The sync module would exist for backward compatibility and convenience, and would trivially wrap the async functionality:
# module fetcher.py
import asyncio
import afetcher

def read(some_params):
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(afetcher.read(some_params))
...
This approach provides both sync and async version of the API, without code duplication because the sync version consists of trivial trampolines, whose definition can be further compressed using appropriate decorators.
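For example, one possible shape for such a decorator (a sketch; the name syncified is hypothetical):

import asyncio
import functools

def syncified(async_fn):
    @functools.wraps(async_fn)
    def wrapper(*args, **kwargs):
        return asyncio.get_event_loop().run_until_complete(
            async_fn(*args, **kwargs))
    return wrapper

# the sync module then shrinks to:
#     read = syncified(afetcher.read)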
The async fetcher module should have a nice short name, so that the users don't feel punished for using the async functionality. It should be easy to use, and it actually provides a lot of new features compared to the sync API, most notably low-overhead parallelization and reliable cancellation.
The route that is not recommended is using run_in_executor or similar thread-based tool to run requests in a thread pool under the hood. That implementation doesn't provide the actual benefits of using asyncio, but incurs all the costs. In that case it is better to continue providing the synchronous API and leave it to the users to use concurrent.futures or similar tools for parallel execution, where they're at least aware they're using threads.
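For completeness, that caller-side approach might look like this (list_of_params is a hypothetical iterable of query parameters):

from concurrent.futures import ThreadPoolExecutor
import fetcher

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetcher.read, list_of_params))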
So which is the better method for making database calls in Tornado for a high-performance application, on highly scalable infrastructure, that makes a large number of database queries?
Method 1: I have come across async database drivers/clients like TorMySQL, Tornado-MySQL, asynctorndb, etc., which can perform async database calls.
Method 2: Use the general and common MySQLdb or mysqlclient (by PyMySQL) drivers, make the database calls to separate backend services, and load-balance the calls with nginx.
This is similar to what the original FriendFeed guys did, which they mentioned in one of the groups.
Bret Taylor, one of the original authors, writes:
groups.google.com/group/python-tornado/browse_thread/thread/9a08f5f08cdab108
We experimented with different async DB approaches, but settled on
synchronous at FriendFeed because generally if our DB queries were
backlogging our requests, our backends couldn't scale to the load
anyway. Things that were slow enough were abstracted to separate
backend services which we fetched asynchronously via the async HTTP
module.
Other supporting links for Method 2, to better understand what I am trying to say:
Stack Overflow answer
Method 3: Make use of threads and the IOLoop, using the general sync libraries.
Supporting example links to explain what I mean,
What the Tornado wiki says (https://github.com/tornadoweb/tornado/wiki/Threading-and-concurrency):
Do it synchronously and block the IOLoop. This is most appropriate for
things like memcache and database queries that are under your control
and should always be fast. If it's not fast, make it fast by adding
the appropriate indexes to the database, etc.
Another example:
For a sync database driver (MySQLdb), we can do:
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(4)
result = yield executor.submit(mysqldb_operation)  # inside a @gen.coroutine method
Method 4: Just use sync drivers like MySQLdb, start enough Tornado instances, and load-balance with nginx, so that the application remains asynchronous at a broader level: some calls block, but other requests benefit from the asynchronous nature via the large number of Tornado instances.
Explanation:
For details, follow this link - www.jjinux.com/2009/12/python-asynchronous-networking-apis-and.html - which says:
They allow MySQL queries to block the entire process. However, they
compensate in two ways. They lean heavily on their asynchronous web
client wherever possible. They also make use of multiple Python
processes. Hence, if you're handling 500 simultaneous request, you
might use nginx to split them among 10 different Tornado Web
processes. Each process is handling 50 simultaneous requests. If one
of those requests needs to make a call to the database, only 50
(instead of 500) requests are blocked.
FriendFeed used what you call "method 4": there was no separate backend service; the process was just blocked during the database call.
Using a completely asynchronous driver ("method 1") is generally best. However, this means that you can't use synchronous libraries that wrap the database operations like SQLAlchemy. In this case you may have to use threads ("method 3"), which is almost as good (and easier than "method 2").
The advantage of "method 4" is that it is easy. In the past this was enough to recommend it, because introducing callbacks everywhere was tedious; with the advent of coroutines, method 3 is almost as easy, so it is usually better to use threads than to block the process.
Is it OK to run certain pieces of code asynchronously in a Django web app? If so, how?
For example:
I have a search algorithm that returns hundreds or thousands of results. I want to record in the database that these items were the result of the search, so I can see what users are searching for most. I don't want the client to have to wait for hundreds or thousands of extra database inserts. Is there a way I can do this asynchronously? Is there any danger in doing so? Is there a better way to achieve this?
As far as Django is concerned, yes.
The bigger concern is your web server and whether it plays nicely with threading. For instance, gunicorn's sync workers are single-threaded, but there are other worker engines, such as the greenlet-based ones. I'm not sure how well those play with threads.
Combining threading and multiprocessing can be an issue if you're forking from threads:
Status of mixing multiprocessing and threading in Python
http://bugs.python.org/issue6721
That being said, I know of popular performance-analytics utilities that have been using threads to report metrics, so it seems to be an accepted practice.
In sum, it seems safest to use the threading.Thread object from the standard library, as long as whatever you do in it doesn't fork (via Python's multiprocessing library).
https://docs.python.org/2/library/threading.html
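A minimal sketch of that threading approach in a Django view (SearchHit, run_search and the template are hypothetical names, not from the question):

import threading
from django.shortcuts import render

def log_search_results(query, result_ids):
    # bulk-insert the search hits outside the request/response cycle
    SearchHit.objects.bulk_create(
        [SearchHit(query=query, item_id=pk) for pk in result_ids])

def search_view(request):
    query = request.GET.get('q', '')
    results = run_search(query)
    t = threading.Thread(target=log_search_results,
                         args=(query, [r.pk for r in results]))
    t.daemon = True  # don't block process shutdown on this thread
    t.start()
    return render(request, 'results.html', {'results': results})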
Offloading requests from the main thread is a common practice; as the end goal is to return a result to the client (browser) as quickly as possible.
As I am sure you are aware, HTTP is blocking - so until you return a response, the client cannot do anything (it is blocked, in a waiting state).
The de facto way of offloading requests is through Celery, which is a task queueing system.
I highly recommend you read the introduction to celery topic, but in summary here is what happens:
You mark certain pieces of code as "tasks". These are usually functions that you want to run asynchronously.
Celery manages workers - you can think of them as threads - that will run these tasks.
To communicate with the workers, a message queue is required. RabbitMQ is the one often recommended.
Once you have all the components running (it takes but a few minutes); your workflow goes like this:
In your view, when you want to offload some work, you call the function that does that work with .delay(). This triggers a worker to start executing the method in the background.
Your view then returns a response immediately.
You can then check for the result of the task, and take appropriate actions based on what needs to be done. There are ways to track progress as well.
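A hypothetical sketch of that workflow (generate_report and the view are assumed names, not from the question):

from celery import shared_task
from django.http import HttpResponse

@shared_task
def generate_report(keywords):
    ...  # the expensive analytics work runs in a worker

def report_view(request):
    # returns as soon as the task is queued
    generate_report.delay(request.GET.get('q', ''))
    return HttpResponse('Report scheduled')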
It is also good practice to include caching - so that you are not executing expensive tasks unnecessarily. For example, you might choose to offload a request to do some analytics on search keywords that will be placed in a report.
Once the report is generated, I would cache the results (if applicable) so that the same report can be displayed if requested later - rather than be generated again.
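A sketch of that caching idea using Django's cache framework (build_report is a hypothetical helper):

from django.core.cache import cache

def get_report(keywords):
    key = 'report:%s' % keywords
    report = cache.get(key)
    if report is None:
        report = build_report(keywords)       # expensive; done at most once
        cache.set(key, report, timeout=3600)  # keep it for an hour
    return report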
I'm working on a web application that will receive a request from a user and have to hit a number of external APIs to compose the answer to that request. This could be done directly from the main web thread using something like gevent to fan out the request.
Alternatively, I was thinking, I could put incoming requests into a queue and use workers to distribute the load. The idea would be to try to keep it real time, while splitting up the requests amongst several workers. Each of these workers would be querying only one of the many external APIs. The response they receive would then go through a series of transformations, be saved into a DB, be transformed to a common schema and saved in a common DB, and finally be composed into one big response that would be returned through the web request. The web request is most likely going to be blocking all this time, with a user waiting, so keeping the queueing and dequeueing as fast as possible is important.
The external API calls can easily be turned into individual tasks. I think the linking from one API task to a transformation to a DB-saving task could be done using a chain, etc., and the final result combining all results could be returned to the web thread using a chord.
Some questions:
Can this (and should this) be done using celery?
I'm using django. Should I try to use django-celery over plain celery?
Each one of those tasks might spawn off other tasks - such as logging what just
happened or other types of branching off. Is this possible?
Could tasks be returning the data they get - i.e. potentially Kb of data through celery (redis as underlying in this case) or should they write to the DB, and just pass pointers to that data around?
Each task is mostly I/O bound; I was initially just going to use gevent from the web thread to fan out the requests and skip the whole queuing design, but it turns out the design would be reused for a different component. Trying to keep the whole round trip through the queues real time will probably require many workers making sure the queues are mostly empty. Or will it? Would running the gevent worker pool help with this?
Do I have to write gevent specific tasks or will using the gevent pool deal with network IO automagically?
Is it possible to assign priority to certain tasks?
What about keeping them in order?
Should I skip celery and just use kombu?
It seems like celery is geared more towards "tasks" that can be deferred and are
not time sensitive. Am I nuts for trying to keep this real time?
What other technologies should I look at?
Update: Trying to hash this out a bit more. I did some reading on Kombu and it seems to be able to do what I'm thinking of, although at a much lower level than celery. Here is a diagram of what I had in mind.
What seems to be possible with raw queues, as accessible with Kombu, is the ability for a number of workers to subscribe to a broadcast message. The type and number of workers do not need to be known by the publisher if using a queue. Can something similar be achieved using Celery? It seems like if you want to make a chord, you need to know at runtime which tasks are going to be involved in the chord, whereas in this scenario you can simply add listeners to the broadcast and make sure they announce they are in the running to add responses to the final queue.
Update 2: I see there is the ability to broadcast. Can you combine this with a chord? In general, can you combine Celery with raw Kombu? This is starting to sound like a question about smoothies.
I will try to answer as many of the questions as possible.
Can this (and should this) be done using celery?
Yes, you can.
I'm using django. Should I try to use django-celery over plain celery?
Django has good support for Celery, and it will make life much easier during development.
Each one of those tasks might spawn off other tasks - such as logging
what just happened or other types of branching off. Is this possible?
You can start subtasks from within a task, with ignore_result=True for side-effect-only tasks.
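A sketch of that pattern (app, fetch_api, log_event and the helper are hypothetical names):

@app.task(ignore_result=True)
def log_event(payload):
    ...  # side effect only; no result is stored

@app.task
def fetch_api(params):
    data = call_one_external_api(params)  # hypothetical helper
    log_event.delay(data)                 # fire-and-forget subtask
    return data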
Could tasks be returning the data they get - i.e. potentially Kb of
data through celery (redis as underlying in this case) or should they
write to the DB, and just pass pointers to that data around?
I would suggest putting the results in the DB and passing the id around; that will keep your broker and workers happy. Less data transfer/pickling, etc.
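A sketch of passing ids instead of payloads between tasks (the model and helper names are hypothetical):

@app.task
def fetch_and_store(params):
    raw = call_one_external_api(params)  # hypothetical helper
    row = RawResult.objects.create(payload=raw)
    return row.id  # only the id travels through the broker

@app.task
def transform(row_id):
    raw = RawResult.objects.get(id=row_id).payload
    ...  # transform and save to the common schema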
Each task is mostly I/O bound, and was initially just going to use
gevent from the web thread to fan out the requests and skip the whole
queuing design, but it turns out that it would be reused for a
different component. Trying to keep the whole round trip through the
Qs real time will probably require many workers making sure the
queueus are mostly empty. Or is it? Would running the gevent worker
pool help with this?
Since the process is I/O bound, gevent will definitely help here. However, how much concurrency a gevent-pooled worker should have is something I'm looking for an answer to as well.
Do I have to write gevent specific tasks or will using the gevent pool
deal with network IO automagically?
Gevent does the monkey patching automatically when you use its worker pool. But the libraries you use should play well with gevent; otherwise, if you parse some data with simplejson (which is partly written in C), that would block other gevent greenlets.
Is it possible to assign priority to certain tasks?
You cannot assign specific priorities to certain tasks, but you can route them to different queues and then have those queues listened to by varying numbers of workers. The more workers a particular queue has, the higher the effective priority of the tasks on that queue.
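A sketch of the queue-routing approach (the task and queue names are assumptions; the task_routes config key is from Celery 4+, older versions use CELERY_ROUTES):

app.conf.task_routes = {
    'tasks.urgent_fetch': {'queue': 'high'},
}
# then start more workers on the 'high' queue than on the default one:
#     celery -A tasks worker -Q high -c 8
#     celery -A tasks worker -Q default -c 2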
What about keeping them in order?
A chain is one way to maintain order; a chord is a good way to summarize results. Celery takes care of it, so you don't have to worry about it. Even when using the gevent pool, it is still possible to reason about the order of task execution in the end.
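A sketch of both primitives together (the task names are hypothetical, reusing those from above):

from celery import chain, chord

# chain: run fetch -> transform -> save strictly in order
pipeline = chain(fetch_and_store.s(params), transform.s(), save_result.s())

# chord: run one pipeline per external API in parallel, then combine
result = chord(
    chain(fetch_and_store.s(p), transform.s()) for p in all_params
)(combine_results.s())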
Should I skip celery and just use kombu?
You can, if your use case will not change to something more complex over time, and if you are willing to manage your processes through celeryd + supervisord yourself. Also, only if you don't care about the task monitoring that comes with tools such as celerymon, flower, etc.
It seems like celery is geared more towards "tasks" that can be
deferred and are not time sensitive.
Celery supports scheduled tasks as well, if that is what you meant by that statement.
Am I nuts for trying to keep this real time?
I don't think so. As long as your consumers are fast enough, it will be as good as real time.
What other technologies should I look at?
Pertaining to Celery, you should choose the result store wisely. My suggestion would be Cassandra, which is good for realtime data (both write- and query-wise). You can also use Redis or MongoDB; they come with their own sets of problems as result stores, but a little tweaking of the configuration can go a long way.
If you mean something completely different from Celery, then you can look into asyncio (Python 3.5) and ZeroMQ to achieve the same. I can't comment more on that, though.
Are there any pure-WSGI implementations of background tasks?
I want to use local variables in the same context directly, not serialize/deserialize them to another daemon process via a broker.
Is it possible to make this happen under the current WSGI infrastructure? E.g., after the response is returned, run some callback functions?
This is a duplicate of a question asked on the Python WEB-SIG. I reference the same page as was provided in response to the question on the Python WEB-SIG so others can see it:
http://code.google.com/p/modwsgi/wiki/RegisteringCleanupCode
In doing this, though, it ties up the request thread, so it would not be able to handle other requests until your task has finished.
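For illustration, a minimal sketch along the lines of the technique described on that page: wrap the response iterable so a callback runs once the response has been sent:

class ExecuteOnCompletion:
    def __init__(self, application, callback):
        self.application = application
        self.callback = callback

    def __call__(self, environ, start_response):
        result = self.application(environ, start_response)
        return CallbackWrapper(result, self.callback)

class CallbackWrapper:
    def __init__(self, result, callback):
        self.result = result
        self.callback = callback

    def __iter__(self):
        for data in self.result:
            yield data

    def close(self):
        # WSGI servers call close() after the response is fully sent
        try:
            if hasattr(self.result, 'close'):
                self.result.close()
        finally:
            self.callback()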
Creating background threads at the end of a request is not a good idea unless you do it using a pooling mechanism that limits the number of worker threads for your tasks. Because the process can crash or be shut down, you lose the job, as it exists only in memory and is thus not persistent.
Better to use Celery, or if you think that is too heavyweight, have a look at Redis Queue (RQ) instead.
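A minimal RQ sketch (assumes a Redis server on localhost; my_long_task is a hypothetical function importable by the worker):

from redis import Redis
from rq import Queue

q = Queue(connection=Redis())
job = q.enqueue(my_long_task, some_arg)  # returns immediately; a worker picks it up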
You could look at Django async. It uses an in-database queue and so handles transactions much better. All arguments need to be JSON-serializable, as does the return type. In some cases this means you may need to schedule a wrapper function, but that oughtn't to cause you any headaches.
http://pypi.python.org/pypi/django-async
You don't want to be doing this sort of thing inside the web server; it's absolutely not the right place to do it. Django async provides a manage.py command for flushing the queue, which you can run in a loop, possibly on another machine from the web server.