Eventlet wsgi server and time-consuming operations in requests - python

Let's assume we have a WSGI app which is hosted on an event-driven single-threaded server:
from eventlet import wsgi
import eventlet

def app(env, start_response):
    # I/O operations here
    ...

wsgi.server(eventlet.listen(('', 8090)), app)
Within the app function, some I/O operations, such as reading files or DB access, must be performed.
Now, when we perform I/O operations in app, the server is effectively blocked and can't serve other clients.
Q: What are possible solutions to this problem? How can I get the Eventlet wsgi server to perform time-consuming operations without getting blocked?

TL;DR: use mysqldb/psycopg2, or eventlet.import_patched() for pure-Python DB drivers; tpool.execute() for files and everything else.
Try to separate your operations into those which can be converted to cooperate with Eventlet and those for which that is impossible. Cooperation here means breaking work into "execute code" and "wait for result" parts and providing a notification mechanism for when the result is ready. Eventlet's main notification mechanism is file descriptors.
So everything that waits on a file descriptor is a candidate to become green (non-blocking). Most importantly, this covers all network I/O. If your blocking function is written in pure Python, just use import_patched(module_name) to rebind its socket and other references to the Eventlet green versions. mysqldb and psycopg2 are special cases: C extension modules made cooperative thanks to explicit support from their authors. For everything else that blocks in non-Python code, your option is OS threads.
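For instance, a pure-Python driver can be greened this way. A minimal sketch, assuming pymysql as a stand-in for whatever pure-Python driver you use; the connection parameters are illustrative placeholders:
import eventlet

# import_patched() re-imports the module with socket and friends replaced
# by Eventlet's green versions, so its network waits become cooperative.
pymysql = eventlet.import_patched('pymysql')

def fetch_rows(sql):
    # Connection parameters are illustrative placeholders.
    conn = pymysql.connect(host='127.0.0.1', user='app',
                           password='secret', database='mydb')
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()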
Unfortunately, waiting on actual disk files is full of quirks, so I recommend using OS threads, and Eventlet has a built-in thread pool to support that. Convert blocking_fun(filepath, something_else) to eventlet.tpool.execute(blocking_fun, filepath, something_else) and it no longer blocks everything, as in the sketch below. Check the tpool documentation for details.
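Here is a minimal sketch of that conversion applied to the WSGI app from the question (the file path is illustrative):
import eventlet
from eventlet import wsgi, tpool

def blocking_fun(filepath):
    # Ordinary blocking disk read; it will run in an OS thread.
    with open(filepath, 'rb') as f:
        return f.read()

def app(env, start_response):
    # tpool.execute() ships the call to the thread pool and suspends
    # only this green thread; other clients keep being served.
    data = tpool.execute(blocking_fun, '/tmp/example.txt')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [data]

wsgi.server(eventlet.listen(('', 8090)), app)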
If you can, reengineer the whole application into blocking and non-blocking processes and have them communicate via sockets. This is hard from a code-rewriting point of view, but it makes for a very simple, robust, fail-safe design that is easy to run and debug.

Related

Is every flask request run on a separate thread? [duplicate]


Handle Flask requests concurrently with threaded=True

What exactly does passing threaded = True to app.run() do?
My application processes input from the user, and takes a bit of time to do so. During this time, the application is unable to handle other requests. I have tested my application with threaded=True and it allows me to handle multiple requests concurrently.
As of Flask 1.0, the WSGI server included with Flask is run in threaded mode by default.
Prior to 1.0, or if you disable threading, the server is run in single-threaded mode, and can only handle one request at a time. Any parallel requests will have to wait until they can be handled, which can lead to issues if you tried to contact your own server from a request.
With threaded=True requests are each handled in a new thread. How many threads your server can handle concurrently depends entirely on your OS and what limits it sets on the number of threads per process. The implementation uses the SocketServer.ThreadingMixIn class, which sets no limits to the number of threads it can spin up.
Note that the Flask server is designed for development only. It is not a production-ready server. Don't rely on it to run your site on the wider web. Use a proper WSGI server (like gunicorn or uWSGI) instead.
How many requests will my application be able to handle concurrently with this statement?
This depends drastically on your application. A new thread is launched for each request, so it depends on how many threads your machine can handle. I don't see an option to limit the number of threads (like uWSGI offers in a production deployment).
What are the downsides to using this? If i'm not expecting more than a few requests concurrently, can I just continue to use this?
Switching from a single thread to multi-threaded can lead to concurrency bugs... if you use this, be careful about how you handle global objects (see the g object in the documentation!) and state.
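To observe the behavior described above, here is a minimal sketch using the development server (the /slow route and the sleep are illustrative):
from flask import Flask
import time

app = Flask(__name__)

@app.route('/slow')
def slow():
    time.sleep(3)  # simulate a time-consuming operation
    return 'done'

if __name__ == '__main__':
    # threaded=True handles each request in a new thread (the default
    # since Flask 1.0); set it to False to watch requests serialize.
    app.run(threaded=True)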

Python falcon and async operations

I am writing an API using python3 + falcon combination.
There are a lot of places in my methods where I could send a reply to the client, but because of some heavy code that does DB and I/O operations, it has to wait until the heavy part ends.
For example:
class APIHandler:
    def on_get(self, req, resp):
        response = "Hello"
        # Some heavy code
        resp.body = response
I could send "Hello" at the first line of code. What I want is to run the heavy code in the background and send a response regardless of when the heavy part finishes.
Falcon does not have any built-in async capabilities, but they mention it can be used with something like gevent. I haven't found any documentation on how to combine the two.
Client libraries have varying support for async operations, so the decision often comes down to which async approach is best supported by your particular backend client(s), combined with which WSGI server you would like to use. See also below for some of the more common options...
For libraries that do not support an async interaction model, either natively or via some kind of subclassing mechanism, tasks can be delegated to a thread pool (see the sketch below). And for especially long-running tasks (i.e., on the order of several seconds or minutes), Celery's not a bad choice.
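A minimal sketch of that thread-pool delegation in a Falcon resource; the pool size and the save_results() helper are illustrative assumptions, not part of Falcon:
import falcon
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def save_results(data):
    ...  # blocking DB / file work happens in a worker thread

class APIHandler:
    def on_get(self, req, resp):
        executor.submit(save_results, {'hello': True})  # fire and forget
        resp.body = 'Hello'  # sent immediately, without waiting

api = falcon.API()
api.add_route('/', APIHandler())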
A brief survey of some of the more common async options for WSGI (and Falcon) apps:
Twisted. Favors an explicit asynchronous style, and is probably the most mature option. For integrating with a WSGI framework like Falcon, there's twisted.web.wsgi and crochet.
asyncio. Borrows many ideas from Twisted, but takes advantage of Python 3 language features to provide a cleaner interface. Long-term, this is probably the cleanest option, but necessitates an evolution of the WSGI interface (see also pulsar's extension to PEP-3333 as one possible approach). The asyncio ecosystem is relatively young at the time of this writing; the community is still experimenting with a wide variety of approaches around interfaces, patterns and tooling.
eventlet. Favors an implicit style that seeks to make async code look synchronous. One way eventlet does this is by monkey-patching I/O modules in the standard library. Some people don't like this approach because it masks the asynchronous mechanism, making edge cases harder to debug.
gevent. Similar to eventlet, albeit a bit more modern. Both uWSGI and Gunicorn support gevent worker types that monkey-patch the standard library.
Finally, it may be possible to extend Falcon to natively support twisted.web or asyncio (à la aiohttp), but I don't think anyone's tried it yet.
I use Celery for async-related work. I don't know about gevent. Take a look at this: http://celery.readthedocs.org/en/latest/getting-started/introduction.html
I think there are two different approaches here:
A task manager (like Celery)
An async implementation (like gevent)
What you achieve with each of them is different. With Celery, what you can do is run all the code you need to compute the response synchronously, and then run any other operation (like saving to logs) in the background. This way, the response should be faster.
With gevent, what you achieve is to run different instances of your handler in parallel. So for a single request you won't see any difference in response time, but if you have thousands of concurrent requests, the performance will be much better. The reason is that without gevent, when your code executes an I/O operation it blocks the execution of that process, while with gevent the CPU can go on serving other requests while the I/O operation waits.
Setting up gevent is much easier than setting up Celery. If you're using gunicorn, you simply install gevent and change the worker type to gevent (see the sketch below). Another advantage is that you can parallelize any operation that is required in the response (like fetching data from a database). With Celery, you can't use the output of a Celery task in your response.
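For reference, a minimal sketch of that setup; myapp:app is a placeholder for your own module, and the worker count is illustrative:
# With gunicorn, no code change is needed; install gevent and pick the
# gevent worker class:
#
#     gunicorn --worker-class gevent --workers 4 myapp:app
#
# Outside gunicorn, gevent's explicit monkey-patching looks like this;
# it must run before anything else imports the stdlib I/O modules.
from gevent import monkey
monkey.patch_all()

import urllib.request  # now backed by green (cooperative) sockets

def fetch(url):
    return urllib.request.urlopen(url).read()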
What I would recommend, is to start by using gevent, and consider to add Celery later (and have both of them) if:
The output of the task you will process with Celery is not required in the response
You have a different machine for your celery tasks, or the usage of your server has some peaks and some idle time (if your server is at 100% the whole time, you won't get anything good from using Celery)
The amount of work your Celery tasks will do is worth the overhead of using Celery.
You can use multiprocessing.Process with daemon=True to run a daemonic process and return a response to the caller immediately:
import time
from multiprocessing import Process

class APIHandler:
    def on_get(self, req, resp):
        heavy_process = Process(  # create a daemonic process
            target=my_func,
            daemon=True,
        )
        heavy_process.start()
        resp.body = "Quick response"

# Define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")
You can test it by sending a GET request. You will get a response immediately, and after 10 seconds you will see a printed message in the console.

python http server, multiple simultaneous requests

I have developed a rather extensive http server written in python utilizing tornado. Without setting anything special, the server blocks on requests and can only handle one at a time. The requests basically access data (mysql/redis) and print it out in json. These requests can take upwards of a second at the worst case. The problem is that when a request comes in that takes a long time (3 s), an easy 5 ms request that arrives immediately after cannot start until the first one is done. So the second request takes >3 s to be handled.
How can I make this situation better? I need that second simple request to begin executing regardless of other requests. I'm new to python, and more experienced with apache/php where there is no notion of two separate requests blocking each other. I've looked into mod_python to emulate the php example, but that seems to block as well. Can I change my tornado server to get the functionality that I want? Everywhere I read, it says that tornado is great at handling multiple simultaneous requests.
Here is the demo code I'm working with. I have a sleep command which I'm using to test if the concurrency works. Is sleep a fair way to test concurrency?
import tornado.httpserver
import tornado.ioloop
import tornado.web
import tornado.gen
import time

class MainHandler(tornado.web.RequestHandler):
    # @tornado.web.asynchronous
    # @tornado.gen.engine
    def handlePing1(self):
        time.sleep(4)  # simulating an expensive mysql call
        self.write("response to browser ....")
        self.finish()

    def get(self):
        start = time.time()
        self.handlePing1()
        # response = yield gen.Task(handlePing1)  # tutorials suggest something like this ...
        print("done with request ... %s %.3f" % (self.request.path, time.time() - start))

application = tornado.web.Application([
    (r"/.*", MainHandler),
])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    port = 8833
    http_server.listen(port)
    print("listening on %d" % port)
    tornado.ioloop.IOLoop.instance().start()
Thanks for any help!
Edit: remember that Redis is also single threaded, so even if you have concurrent requests, your bottleneck will be Redis. You won't be able to process more requests because Redis won't be able to process them.
Tornado is a single-threaded, event-loop based server.
From the documentation:
By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
Concurrency in tornado is achieved through asynchronous callbacks. The idea is to do as little as possible in the main event loop (single-threaded) to avoid blocking and defer i/o operations through callbacks.
If using asynchronous operations doesn't work for you (ex: no async driver for MySQL, or Redis), your only way of handling more concurrent requests is to run multiple processes.
The easiest way is to front your tornado processes with a reverse-proxy like HAProxy or Nginx. The tornado doc recommends Nginx: http://www.tornadoweb.org/en/stable/overview.html#running-tornado-in-production
You basically run multiple instances of your app on different ports. For example:
python app.py --port=8000
python app.py --port=8001
python app.py --port=8002
python app.py --port=8003
A good rule of thumb is to run 1 process for each core on your server.
Nginx will take care of balancing incoming requests across the different backends. So if one of the requests is slow (~3 s), you have n-1 other processes listening for incoming requests. It is possible – and very likely – that all processes will be busy processing a slow-ish request, in which case requests will be queued and processed when a process becomes free, i.e. when it finishes its current request.
I strongly recommend you start with Nginx before trying HAProxy, as the latter is a little more advanced and thus a bit more complex to set up properly (lots of switches to tweak).
Hope this helps. Key take-away: Tornado is great for async I/O, less so for CPU heavy workloads.
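As an aside, Tornado can also fork the worker processes itself instead of you launching one per port by hand. A minimal sketch, reusing the application object from the question (all forks then share one listening socket, so the Nginx upstream setup differs from the multi-port layout above):
import tornado.httpserver
import tornado.ioloop

# `application` is the tornado.web.Application defined in the question.
http_server = tornado.httpserver.HTTPServer(application)
http_server.bind(8833)
http_server.start(0)  # 0 means: fork one worker process per CPU core
tornado.ioloop.IOLoop.instance().start()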
I had the same problem, though without Tornado or MySQL.
Do you have one database connection shared by the whole server?
I created a multiprocessing.Pool. Each worker has its own database connection provided by an initializer function. I wrap the slow code in a function and map it onto the Pool, so I have no shared variables or connections.
sleep does not block other threads, but a DB transaction may block threads.
You need to set up the Pool at the top of your code.
from multiprocessing import Pool

def spawn_pool(fishes=None):
    global pool

    def init():
        # private connections: stored inside the db framework,
        # global within each worker process
        from storage import db
        db.connect()

    pool = Pool(processes=fishes, initializer=init)

if __name__ == "__main__":
    spawn_pool(8)

from storage import db  # shared connection for quick-type requests

# code here

if __name__ == "__main__":
    start_server()
Many concurrent quick requests may slow down one big request, but this concurrency will be placed on the database server only.

Is it safe to use the shove module to store data in a non-blocking program?

I'm writing a simple crawler with eventlet and I want to store all the URLs I retrieve in a simple datastore like shove. Is it safe to use it in a non-blocking environment?
Since most modules are written in the traditional synchronous/blocking style, unless your module explicitly touts that it is asynchronous, you need to handle it with a callback in your eventlet program. The shove home page doesn't mention anything about the issue, which means it's probably going to block on file I/O. You might want to ask the shove development community if there's an async variant.
It depends on whether you are OK with blocking disk I/O. A lot of people accept the blocking aspect of disk I/O even in asynchronous programs, since they regard it as 'fast enough'. If not, you'll have to move the datastore handling to another thread or worker threads (see the sketch below).
Or figure out whether your OS can do non-blocking disk I/O from a single thread and port your database library to that. But that's likely a lot of extra work.
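A hedged sketch of the thread option, using Eventlet's built-in thread pool; the file URI is illustrative, and Shove is assumed to behave like a dict as its docs describe:
from eventlet import tpool
from shove import Shove

store = Shove('file:///tmp/crawled-urls')

def save_url(url):
    store[url] = True  # blocking disk write, runs in an OS thread

def record(url):
    # Green threads keep running while the write happens.
    tpool.execute(save_url, url)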
