Python HTTP server, multiple simultaneous requests

I have developed a rather extensive HTTP server in Python using Tornado. Without setting anything special, the server blocks on requests and can only handle one at a time. The requests basically access data (MySQL/Redis) and return it as JSON, and they can take upwards of a second in the worst case. The problem: a request comes in that takes a long time (3s), and an easy request that would take 5ms to handle comes in immediately after it. Since the first request is going to take 3s, the second one doesn't start until the first is done, so the second request takes >3s to be handled.
How can I make this situation better? I need that second, simple request to begin executing regardless of other requests. I'm new to Python and more experienced with Apache/PHP, where there is no notion of two separate requests blocking each other. I've looked into mod_python to emulate the PHP example, but that seems to block as well. Can I change my Tornado server to get the functionality that I want? Everywhere I read, it says that Tornado is great at handling multiple simultaneous requests.
Here is the demo code I'm working with. I have a sleep command which I'm using to test if the concurrency works. Is sleep a fair way to test concurrency?
import tornado.httpserver
import tornado.ioloop
import tornado.web
import tornado.gen
import time

class MainHandler(tornado.web.RequestHandler):
    # @tornado.web.asynchronous
    # @tornado.gen.engine
    def handlePing1(self):
        time.sleep(4)  # simulating an expensive mysql call
        self.write("response to browser ....")
        self.finish()

    def get(self):
        start = time.time()
        self.handlePing1()
        # response = yield gen.Task(handlePing1)  # i see tutorials around that suggest using something like this ....
        print "done with request ...", self.request.path, round((time.time() - start), 3)

application = tornado.web.Application([
    (r"/.*", MainHandler),
])

if __name__ == "__main__":
    http_server = tornado.httpserver.HTTPServer(application)
    port = 8833
    http_server.listen(port)
    print "listening on " + str(port)
    tornado.ioloop.IOLoop.instance().start()
Thanks for any help!

Edit: remember that Redis is also single-threaded, so even if you have concurrent requests, your bottleneck will be Redis. You won't be able to process more requests because Redis won't be able to process them.
Tornado is a single-threaded, event-loop-based server.
From the documentation:
By using non-blocking network I/O, Tornado can scale to tens of thousands of open connections, making it ideal for long polling, WebSockets, and other applications that require a long-lived connection to each user.
Concurrency in Tornado is achieved through asynchronous callbacks. The idea is to do as little as possible in the main event loop (which is single-threaded), avoid blocking, and defer I/O operations through callbacks.
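For illustration only, here is a minimal sketch (against the old callback-style Tornado API used in this question) of the demo handler rewritten so that the 4-second delay is scheduled on the IOLoop instead of sleeping, leaving the loop free to serve other requests in the meantime:

import time
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        start = time.time()
        # Schedule the "expensive" part as a timeout instead of time.sleep(),
        # so the single-threaded IOLoop keeps accepting other requests.
        tornado.ioloop.IOLoop.instance().add_timeout(
            time.time() + 4, lambda: self.on_done(start))

    def on_done(self, start):
        self.write("response to browser ....")
        self.finish()

Note that this only demonstrates the mechanism; a genuinely blocking MySQL call cannot be made non-blocking just by scheduling it on the loop.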
If using asynchronous operations doesn't work for you (e.g. there is no async driver for MySQL or Redis), your only way of handling more concurrent requests is to run multiple processes.
The easiest way is to front your tornado processes with a reverse-proxy like HAProxy or Nginx. The tornado doc recommends Nginx: http://www.tornadoweb.org/en/stable/overview.html#running-tornado-in-production
You basically run multiple instances of your app on different ports. For example:
python app.py --port=8000
python app.py --port=8001
python app.py --port=8002
python app.py --port=8003
A good rule of thumb is to run 1 process for each core on your server.
Nginx will take care of balancing incoming requests across the different backends. So if one request is slow (~3s), you still have n-1 other processes listening for incoming requests. It is possible, and even likely, that at some point all processes will be busy with slow-ish requests; in that case new requests are queued and processed as soon as a process becomes free, i.e. finishes its current request.
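A rough sketch of the relevant Nginx bits (ports taken from the example above; this is not a complete nginx.conf, just the upstream/proxy fragment):

upstream tornado_backends {
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
    server 127.0.0.1:8002;
    server 127.0.0.1:8003;
}

server {
    listen 80;
    location / {
        proxy_pass http://tornado_backends;
    }
}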
I strongly recommend you start with Nginx before trying HAProxy, as the latter is a little more advanced and thus a bit more complex to set up properly (lots of switches to tweak).
Hope this helps. Key take-away: Tornado is great for async I/O, less so for CPU heavy workloads.

I had the same problem, but with no Tornado and no MySQL.
Do you have one database connection shared across the whole server?
I created a multiprocessing.Pool. Each worker has its own DB connection, provided by an initializer function. I wrap the slow code in a function and map it to the Pool, so there are no shared variables or connections.
sleep() does not block other threads, but a DB transaction may.
You need to set up the Pool at the top of your code.
def spawn_pool(fishes=None):
    global pool
    from multiprocessing import Pool

    def init():
        from storage import db  # private connections
        db.connect()  # connections stored in db-framework and will be global in each process

    pool = Pool(processes=fishes, initializer=init)

if __name__ == "__main__":
    spawn_pool(8)

from storage import db  # shared connection for quick-type requests.

# code here

if __name__ == "__main__":
    start_server()
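To make the idea concrete, here is a hypothetical sketch of how a slow request would then be pushed onto the pool (fetch_report and its query are made-up names, not part of the code above):

def fetch_report(user_id):
    # Runs inside a pool worker, which already has its own connection
    # thanks to the init() function above.
    from storage import db
    return db.query("SELECT * FROM reports WHERE user_id = %s", user_id)

def handle_slow_request(user_id):
    # Only this caller waits for the result; the other workers and the
    # quick requests on the shared connection keep going.
    return pool.apply_async(fetch_report, (user_id,)).get()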
Many concurrent quick requests may slow down one big request, but that concurrency is pushed down to the database server only.

Eventlet wsgi server and time-consuming operations in requests

Let's assume we have a WSGI app which is hosted on an event-driven single-threaded server:
from eventlet import wsgi
import eventlet

def app(env, start_response):
    # IO operations here
    ...

wsgi.server(eventlet.listen(('', 8090)), app)
Within the app function, some I/O operations such as reading files or DB access must be performed.
Now, when we perform I/O operations in app, the server is effectively blocked and can't serve other clients.
Q: What are possible solutions to this problem? How can I get the Eventlet wsgi server to perform time-consuming operations without getting blocked?
TL;DR: use mysqldb/psycopg, or eventlet.import_patched() with pure-Python DB drivers; use tpool.execute() for files and everything else.
Try to train your thinking to separate operations which can be converted to cooperate with Eventlet from those for which it is impossible. Cooperation here means breaking work into "execute code" / "wait for result" parts and providing a notification mechanism for when the result is ready. The main notification mechanism for Eventlet is file descriptors.
So everything that waits on a file descriptor is a candidate to be made green (non-blocking); most importantly, that covers all network I/O. If your blocking function is written in pure Python, just use import_patched(module_name) to rewrite its socket and other references to Eventlet's green versions. mysqldb and psycopg2 are special cases: C extension modules made cooperative thanks to explicit support from their authors. For everything else that blocks in non-Python code, your option is OS threads.
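For example, a one-line sketch (assuming a pure-Python driver such as pymysql is what you use):

import eventlet

# Returns a copy of the module whose socket usage is green/cooperative.
pymysql = eventlet.import_patched('pymysql')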
Unfortunately, waiting on actual disk files is full of quirks, so I recommend using OS threads; Eventlet has a built-in thread pool to support that. Convert blocking_fun(filepath, something_else) into eventlet.tpool.execute(blocking_fun, filepath, something_else) and it no longer blocks everything. Check the tpool documentation for details.
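A small self-contained sketch of that conversion (the file path is just an example):

import eventlet
from eventlet import tpool, wsgi

def blocking_read(path):
    # Plain blocking disk read; runs in one of tpool's OS threads.
    with open(path, 'rb') as f:
        return f.read()

def app(env, start_response):
    # tpool.execute() hands the call to a worker thread and lets other
    # green threads run until the result is ready.
    data = tpool.execute(blocking_read, '/etc/hostname')
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [data]

wsgi.server(eventlet.listen(('', 8090)), app)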
If you can, re-engineer the whole application into separate blocking and non-blocking processes and have them communicate via sockets. This is hard from a code-rewriting point of view, but very simple at runtime and for debugging, and it gives a robust, fault-tolerant design.

Handle Flask requests concurrently with threaded=True

What exactly does passing threaded = True to app.run() do?
My application processes input from the user, and takes a bit of time to do so. During this time, the application is unable to handle other requests. I have tested my application with threaded=True and it allows me to handle multiple requests concurrently.
As of Flask 1.0, the WSGI server included with Flask is run in threaded mode by default.
Prior to 1.0, or if you disable threading, the server is run in single-threaded mode, and can only handle one request at a time. Any parallel requests will have to wait until they can be handled, which can lead to issues if you tried to contact your own server from a request.
With threaded=True requests are each handled in a new thread. How many threads your server can handle concurrently depends entirely on your OS and what limits it sets on the number of threads per process. The implementation uses the SocketServer.ThreadingMixIn class, which sets no limits to the number of threads it can spin up.
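A minimal way to see the effect yourself (just a sketch; the routes and the sleep are only for demonstration):

from flask import Flask
import time

app = Flask(__name__)

@app.route('/slow')
def slow():
    time.sleep(3)  # simulate a slow request
    return 'slow done'

@app.route('/fast')
def fast():
    return 'fast done'

if __name__ == '__main__':
    # With threaded=True (the default since Flask 1.0), /fast responds
    # immediately even while /slow is still sleeping in another thread.
    app.run(threaded=True)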
Note that the Flask server is designed for development only. It is not a production-ready server. Don't rely on it to run your site on the wider web. Use a proper WSGI server (like gunicorn or uWSGI) instead.
How many requests will my application be able to handle concurrently with this statement?
This depends drastically on your application. Each new request gets its own thread, so it depends on how many threads your machine can handle. I don't see an option to limit the number of threads (like uwsgi offers in a production deployment).
What are the downsides to using this? If i'm not expecting more than a few requests concurrently, can I just continue to use this?
Switching from a single thread to multi-threaded can lead to concurrency bugs... if you use this, be careful about how you handle global objects (see the g object in the documentation!) and state.

Python falcon and async operations

I am writing an API using python3 + falcon combination.
There are a lot of places in my methods where I could already send a reply to the client, but because of some heavy code doing DB and I/O operations, the reply has to wait until the heavy part ends.
For example:
class APIHandler:
    def on_get(self, req, resp):
        response = "Hello"
        # Some heavy code
        resp.body = response
I could send "Hello" at the first line of code. What I want is to run the heavy code in a background and send a response regardless of when the heavy part finishes.
Falcon does not have any built-in async capabilities but they mention it can be used with something like gevent. I haven't found any documentation of how to combine those two.
Client libraries have varying support for async operations, so the decision often comes down to which async approach is best supported by your particular backend client(s), combined with which WSGI server you would like to use. See also below for some of the more common options...
For libraries that do not support an async interaction model, either natively or via some kind of subclassing mechanism, tasks can be delegated to a thread pool. And for especially long-running tasks (i.e., on the order of several seconds or minutes), Celery's not a bad choice.
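As an illustration of the thread-pool option (not Falcon-specific, just the general pattern with concurrent.futures):

from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=4)

def blocking_io():
    # Stand-in for a client library call with no async support.
    time.sleep(2)
    return 'done'

# submit() returns immediately; the blocking work happens in a worker thread.
future = executor.submit(blocking_io)
print(future.result())  # result() waits only in this caller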
A brief survey of some of the more common async options for WSGI (and Falcon) apps:
Twisted. Favors an explicit asynchronous style, and is probably the most mature option. For integrating with a WSGI framework like Falcon, there's twisted.web.wsgi and crochet.
asyncio. Borrows many ideas from Twisted, but takes advantage of Python 3 language features to provide a cleaner interface. Long-term, this is probably the cleanest option, but necessitates an evolution of the WSGI interface (see also pulsar's extension to PEP-3333 as one possible approach). The asyncio ecosystem is relatively young at the time of this writing; the community is still experimenting with a wide variety of approaches around interfaces, patterns and tooling.
eventlet. Favors an implicit style that seeks to make async code look synchronous. One way eventlet does this is by monkey-patching I/O modules in the standard library. Some people don't like this approach because it masks the asynchronous mechanism, making edge cases harder to debug.
gevent. Similar to eventlet, albeit a bit more modern. Both uWSGI and Gunicorn support gevent worker types that monkey-patch the standard library.
Finally, it may be possible to extend Falcon to natively support twisted.web or asyncio (à la aiohttp), but I don't think anyone's tried it yet.
I use Celery for async-related work. I don't know about gevent. Take a look at this: http://celery.readthedocs.org/en/latest/getting-started/introduction.html
I think there are two different approaches here:
A task manager (like Celery)
An async implementation (like gevent)
What you achieve with each of them is different. With Celery, what you can do is to run all the code you need to compute the response synchronously, and then run in the background any other operation (like saving to logs). This way, the response should be faster.
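For instance, a minimal Celery sketch of that pattern (the broker URL and the task body are placeholder assumptions):

from celery import Celery

# Placeholder broker URL; any supported broker (Redis, RabbitMQ, ...) works.
celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def save_audit_log(entry):
    # Stand-in for a slow side effect that is not needed in the response.
    print('logged:', entry)

# In the request handler: build the response synchronously, then
#     save_audit_log.delay(entry)
# queues the slow work and returns immediately.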
With gevent, what you achieve, is to run in parallel different instances of your handler. So, if you have a single request, you won't see any difference in the response time, but if you have thousands of concurrent requests, the performance will be much better. The reason for this, is that without gevent, when your code executes an IO operation, it blocks the execution of that process, while with gevent, the CPU can go on executing other requests while the IO operation waits.
Setting up gevent is much easier than setting up Celery. If you're using gunicorn, you simply install gevent and change the worker type to gevent. Another advantage is that you can parallelize any operation that is required in the response (like extracting the response from a database). In Celery, you can't use the output of the Celery task in your response.
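The gunicorn switch mentioned above boils down to something like this (module and app names are placeholders):

pip install gevent
gunicorn --worker-class gevent --workers 4 myapp:app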
What I would recommend, is to start by using gevent, and consider to add Celery later (and have both of them) if:
The output of the task you will process with Celery is not required in the response
You have a different machine for your celery tasks, or the usage of your server has some peaks and some idle time (if your server is at 100% the whole time, you won't get anything good from using Celery)
The amount of work that your Celery tasks will do is worth the overhead of using Celery
You can use multiprocessing.Process with daemon=True to run a daemonic process and return a response to the caller immediately:
from multiprocessing import Process
import time

class APIHandler:
    def on_get(self, req, resp):
        heavy_process = Process(  # Create a daemonic process
            target=my_func,
            daemon=True
        )
        heavy_process.start()
        resp.body = "Quick response"

# Define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")
You can test it by sending a GET request. You will get a response immediately and, after 10s you will see a printed message in the console.

How does CherryPy work? It handles requests well compared with Tornado when concurrency is low

I carried out a test of CherryPy (using web.py as the framework) and Tornado, retrieving web pages from the internet.
I have three test cases using siege to send requests to the server (-c is the number of concurrent users; -t is the testing time). The code is below the test results.
1. web.py (CherryPy)
   siege ip -c20 -t100s    the server handled 2747 requests
   siege ip -c200 -t30s    the server handled 1361 requests
   siege ip -c500 -t30s    the server handled 170 requests
2. Tornado synchronous
   siege ip -c20 -t100s    the server handled 600 requests
   siege ip -c200 -t30s    the server handled 200 requests
   siege ip -c500 -t30s    the server handled 116 requests
3. Tornado asynchronous
   siege ip -c20 -t100s    the server handled 3022 requests
   siege ip -c200 -t30s    the server handled 2259 requests
   siege ip -c500 -t30s    the server handled 471 requests
performance analysis:
tornado synchronous < web.py (cherrypy) < tornado asynchronous
Question 1:
I know that using an asynchronous architecture can improve the performance of a web server dramatically.
I'm curious about the difference between Tornado's asynchronous architecture and web.py (CherryPy).
I think Tornado's synchronous mode handles requests one by one, but how does CherryPy work? Does it use multiple threads? I didn't see a large increase in memory, so CherryPy might be handling multiple requests concurrently. How does it avoid one request blocking the others?
Question 2:
Can I improve the performance of tornado synchronous mode without using asynchronous techniques? I think tornado can do better.
Web.py code:
import web
import tornado.httpclient

urls = (
    '/(.*)', 'hello'
)
app = web.application(urls, globals())

class hello:
    def GET(self, name):
        client = tornado.httpclient.HTTPClient()
        response = client.fetch("http://www.baidu.com/")
        return response.body

if __name__ == "__main__":
    app.run()
Tornado synchronous:
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import tornado.httpclient
from tornado.options import define, options

define("port", default=8000, help="run on the given port", type=int)

class IndexHandler(tornado.web.RequestHandler):
    def get(self):
        client = tornado.httpclient.HTTPClient()
        response = client.fetch("http://www.baidu.com/")
        self.write(response.body)

if __name__ == '__main__':
    tornado.options.parse_command_line()
    app = tornado.web.Application(handlers=[(r'/', IndexHandler)])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()
Tornado asynchronous:
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web
import tornado.httpclient
from tornado.options import define, options

define("port", default=8001, help="run on the given port", type=int)

class IndexHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        client = tornado.httpclient.AsyncHTTPClient()
        client.fetch("http://www.baidu.com/", callback=self.on_response)

    def on_response(self, response):
        self.write(response.body)
        self.finish()

if __name__ == '__main__':
    tornado.options.parse_command_line()
    app = tornado.web.Application(handlers=[(r'/', IndexHandler)])
    http_server = tornado.httpserver.HTTPServer(app)
    http_server.listen(options.port)
    tornado.ioloop.IOLoop.instance().start()
To answer question 1...
Tornado is single threaded. If you block the main thread, as you do in your synchronous example, then that single thread cannot do anything until the blocking call returns. This limits the synchronous example to one request at a time.
I am not particularly familiar with web.py, but looking at the source for its HTTP server it appears to be using a threading mixin, which suggests that it is not limited to handling one request at a time. When the first request comes in, it is handled by a single thread. That thread will block until the HTTP client call returns, but other threads are free to handle further incoming requests. This allows for more requests to be processed at once.
I suspect if you emulated this with Tornado, eg, by handing off HTTP client requests to a thread pool, then you'd see similar throughput.
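A rough sketch of what that emulation might look like (written against roughly the Tornado 4.x coroutine API, with an arbitrary pool size):

from concurrent.futures import ThreadPoolExecutor
import tornado.gen
import tornado.httpclient
import tornado.web

def blocking_fetch(url):
    # Plain synchronous fetch, confined to one worker thread.
    client = tornado.httpclient.HTTPClient()
    try:
        return client.fetch(url)
    finally:
        client.close()

class IndexHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(10)  # arbitrary pool size

    @tornado.gen.coroutine
    def get(self):
        # The blocking fetch runs in a worker thread; the IOLoop stays free
        # to accept other requests while waiting on the Future.
        response = yield self.executor.submit(blocking_fetch, "http://www.baidu.com/")
        self.write(response.body)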
Most of the handler time in your test code is by far spent in client.fetch(...) - effectively waiting for connection and for incoming data on a socket - while not blocking potential other Python threads.
So your "performance measure" is mostly determined by the max number of effective handler threads of the framework in question and by the max number of parallel connections which the "baidu" server allows from your IP.
web.py's copy of CherryPyWSGIServer (web.wsgiserver.CherryPyWSGIServer), which is used by the default web.httpserver.runsimple(), indeed uses threads: 10 by default.
Threads do not increase memory usage much here; most memory is consumed by the libraries and the Python interpreter itself.
And CherryPyWSGIServer's (10) worker threads are all started right at the beginning.
The alternative web.httpserver.runbasic() also uses threads, via Python's built-in HTTPServer and SocketServer.ThreadingMixIn. That one starts a new thread for each request, so probably an "unlimited" number of threads, but there is thread-startup overhead on each request.
Tornado's asynchronous mode may also use more / an unlimited number of threads (?), which may explain the difference to web.py here.
Your test doesn't say much about the execution speed of the servers & handler frameworks themselves. You may simply increase the max number of threads in web.py's CherryPyWSGIServer. Parallel execution of your client.fetch(...)'s is somehow needed to get more "performance" here.
To test the mere server / framework speed (the overhead cost) simply return a string or a database query or a typical complete web page rendered from local contents.
A multithreaded CPython-based web & app server running in one process ultimately cannot use much more than one CPU core (of the maybe 8 cores typical on today's server hardware) because of the GIL in CPython, which is only released for some I/O. So if CPU load becomes a factor (rather than network or database speed), Jython or a multi-process approach could be considered.
