I am building a gevent application in which I use gevent.http.HTTPServer. The application must support CORS and properly handle HTTP OPTIONS requests. However, when an OPTIONS request arrives, HTTPServer automatically responds with 501 Not Implemented, without even dispatching anything to my connection greenlet.
What is the way to work around this? I would not want to introduce an extra framework/web server via WSGI just to be able to support HTTP OPTIONS.
Practically the only option in this situation is to switch to using WSGI. I ended up switching to pywsgi.WSGIServer, and the problem solved itself.
It's important to understand that switching to WSGI introduces very little (if any) overhead in practice, and brings so many benefits that the practical pros far outweigh the hypothetical cons.
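For reference, a minimal sketch of what that looks like (the port, paths, and header values here are illustrative, not from the original setup): pywsgi dispatches OPTIONS to your application like any other method, so you can answer CORS preflights yourself.

from gevent import pywsgi

def application(environ, start_response):
    cors_headers = [
        ('Access-Control-Allow-Origin', '*'),
        ('Access-Control-Allow-Methods', 'GET, POST, OPTIONS'),
        ('Access-Control-Allow-Headers', 'Content-Type'),
    ]
    if environ['REQUEST_METHOD'] == 'OPTIONS':
        # CORS preflight: reply with the CORS headers and an empty body.
        start_response('200 OK', cors_headers)
        return [b'']
    start_response('200 OK', [('Content-Type', 'text/plain')] + cors_headers)
    return [b'hello']

server = pywsgi.WSGIServer(('127.0.0.1', 8080), application)
server.serve_forever()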
Related
I know the GIL blocks Python from running its threads across cores. If that is so, why is Python used in webservers, and how are companies like YouTube and Instagram handling it?
PS: I know alternatives like multiprocessing can work around it, but it would be great if anyone could describe a real scenario they handled that way.
Python is used for the server-side handling in webservers, but not (usually) as the webserver itself.
In a normal setup, we have Apache or another webserver managing a lot of processes on the server side (Python usually connects via WSGI). Note that Apache usually serves "static" files directly. So we have one Apache server, many parallel Apache processes (to handle the connections and basic HTTP), and many Python processes, each handling one connection at a time.
Each of these processes is independent of the others (they just share the same resources), so you can program your server-side part easily, without worrying about deadlocks. It is mostly a trade-off between raw code performance and producing code easily and quickly without huge problems. But webservers with Python usually scale very well (also on large sites), and servers are cheaper than programmers.
Note: security is also increased by handling just one request per process.
The GIL exists in CPython (the interpreter written in C and the most widely used); other implementations such as Jython or IronPython don't have this problem, because they don't have a GIL.
Even so, using CPython you can still have real concurrency: do the heavy work in C (which can release the GIL) and then "link it" into your Python code, just like NumPy and similar libraries do.
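A rough illustration, assuming NumPy is installed: NumPy's BLAS-backed matrix multiply releases the GIL while it runs, so two threads doing it can genuinely occupy two cores, unlike a pure-Python loop.

import threading
import time
import numpy as np  # assumes NumPy is installed

a = np.random.rand(2000, 2000)

def work():
    # The BLAS call inside np.dot releases the GIL for its duration,
    # so these threads can execute on separate cores.
    np.dot(a, a)

start = time.time()
threads = [threading.Thread(target=work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('two threads finished in %.2f seconds' % (time.time() - start))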
Another thing is that even though your page uses Flask or Django, when you set it up on a production server you put Apache or Nginx etc. in front of it, which acts as a real load balancer and can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
Although a bit late, I will try to give a generic and useful answer.
@Giacomo Catenazzi's answer is a good one, but parts of it are factually incorrect.
API requests (or other forms of web request) are served by an already running process. The creation of these 'already running' processes is handled by a WSGI server like gunicorn, which on startup creates a specified number of processes running the code of your web application, continuously waiting to serve any incoming request.
Needless to say, each of these processes is limited by the GIL to running only one thread at a time. But a single process handles more than one (normally many) requests over its lifetime. It helps here to understand the flow of a request.
We will take Flask as an example, but this applies to most web frameworks. When a request comes in from Nginx, it is handed over to gunicorn, which interacts with your web application via WSGI. When the request reaches the framework, an app context is created and some variables are pushed into it. Then it follows the normal route most people are familiar with: routing, DB calls, response creation, and so on. The response is handed back to gunicorn via WSGI again, and at the time the response is handed over, the app context is torn down. So it's the app context, not the process, that is created on every new request.
Also, I have talked only about gunicorn's sync worker, but it also has async worker options that can handle multiple requests concurrently through coroutines. But that's a separate topic.
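To make the pieces concrete, a minimal sketch (the module name, route, and worker count below are illustrative, not prescribed by gunicorn or Flask):

# app.py -- a minimal Flask application
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    # An app context is pushed for this request and torn down once the
    # response is handed back to gunicorn via WSGI.
    return 'hello'

# Run with four sync worker processes, each handling one request at a time:
#   gunicorn -w 4 app:app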
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n processes at startup and also manages the pool, in the sense that if a process exits or gets stuck, it kills/recreates it and adds it back to the pool.
Each process handling 1 request at a time.
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about understanding gunicorn with Flask, and this is a great resource for understanding the Flask app context.
I am new to the server side, but I have gotten a chance to design and implement a server that will handle around 2000~3000 clients.
I am thinking of using Python and WebSockets, though I don't know whether this choice is appropriate.
At this point, I am curious about how to design the server.
I think there must be some architecture normally in use, depending on the capacity the server handles.
Otherwise, could I use a WebSocket server offered by some Python package like Tornado or Django?
I hope I can get some information on this.
Any advice?
I've had good experiences using haproxy in front of sockjs-tornado. Depending on how complex your server-side logic, routing, and persistence requirements are, you could write all your server endpoints using tornado and use SQLAlchemy to handle writes to a relational database, or use a non-SQL data store like Redis.
If your main requirement is real-time interactivity it might be worth investigating meteor as well.
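If you go the tornado route, a bare-bones websocket endpoint looks roughly like this (sockjs-tornado wraps a very similar connection class; the URL and port here are illustrative):

import tornado.ioloop
import tornado.web
import tornado.websocket

class EchoHandler(tornado.websocket.WebSocketHandler):
    def open(self):
        print('client connected')

    def on_message(self, message):
        # Echo each message straight back to the client.
        self.write_message(message)

    def on_close(self):
        print('client disconnected')

app = tornado.web.Application([(r'/ws', EchoHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.instance().start()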
One possible stack is Pyramid, sockjs, gunicorn, and gevent. Nginx is probably better suited as a frontend than Apache, but of course if you do not have any lengthy processing on the backend, any decent asynchronous Python server with websocket and sockjs support (not sure about socket.io as an alternative) will work for you out of the box.
Lengthy processing should be offloaded to queue workers anyway, so an asynchronous server will fit the bill.
Just check whether all the datastore/database adapters you use are compatible with your server solution, be it asynchronous or multi-threaded.
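For the gevent case, the usual approach is sketched below: monkey-patch the standard library before any other imports, so pure-Python drivers that use sockets become cooperative. C-extension drivers may still block a whole process and need driver-specific support, which is exactly why each adapter has to be checked.

# Monkey-patching must happen before anything else is imported.
from gevent import monkey
monkey.patch_all()

# From here on, modules built on the standard socket machinery cooperate
# with the gevent event loop instead of blocking the whole process.
import socket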
I'm struggling with some architectural choices for a scalable internet-of-things application.
I've chosen to base my project on Twisted augmented with the Cyclone framework to provide many Tornado conveniences (websockets, auth decorators, secure cookies, etc.).
Using a Twisted core has worked beautifully for me. I have numerous IP protocol and hardware interfaces, all of which turned out to have great library support inside of Twisted (and adding new protocols and interfaces to my application is the most likely direction for project scope creep), all while Twisted needs very little CPU and supports very high connection counts.
My problems are with second-order webapp functionality.
I pulled in Cyclone thinking that with its auth goodies (OpenID, OAuth, user-auth decorators and secure cookies) it wouldn't take much to implement user/session/admin functionality in my webapp. After 500+ lines spent abstracting my database (via txmongo) and just building user logins, it became clear that I both:
Didn't understand how little Cyclone/Tornado bring in the user/session/admin space, and
Didn't understand the amount of code it takes to fill in the gaps if you're trying to build a multi-user auth webapp
A friend pointed me at Flask, which at first I thought was completely redundant, until I found the Flask plugins. The combination of Flask-Login and Flask-Admin would completely cover my user, session, and user-admin needs, saving me what I would guess to be about 2k lines of code. Unfortunately, the Flask plugins are all rife with blocking code and calls to blocking libraries. I don't see them as compatible with my project even if WSGI containers are used, given that the user/session functionality runs on every page load (and I don't see any shortcuts that would let me port them to the async world with less work than roughly rewriting them).
My question is:
In the python async space (... hopefully in the Twisted space, given my protocol needs), are there any plugins or alternate frameworks that provide ready-to-go user/login/admin functionality similar to what is in Flask-Login and Flask-Admin?
P.S. I've looked at Klein as the obvious Twisted version of Flask, but it doesn't seem to have a plugin ecosystem, and I'm not finding any strong user/session/admin functionality there.
P.P.S. By the time I wrote this question I had already written my own (crappy) user-login-session system. So what I'm really after is the "Admin" capability (automated CRUD functions on user-style records, including web UI rendering, all designed in a Twisted/async way). I asked about user/login in the question in case it turns out there is an already-integrated solution (such as flask-login and flask-admin), in which case I would happily drop my code and switch to that.
Do you really need everything async? Consider async WebSockets but sync page renders. If you must, add an async downstream proxy or load balancer, which will virtually eliminate the app server's IO overhead.
I'm creating a simple web game that uses web sockets to stream updates, and HTTP AJAX requests for everything else (e.g. the login system, user profiles, etc.). Unfortunately I'm somewhat new to mod_python, but it seems that I want to use the Session class to keep track of visitors. The only problem is that a Session requires a mod_python request for some reason. Is there a way I can use these sessions within a mod_pywebsocket handler, or do I need to roll my own session mechanism?
In case anyone could use this, I've found that mod_python's sessions work quite well with mod_pywebsocket. Here are two considerations to be aware of:
Initialization: Typically, you construct a mod_python Session object with a mod_python request. Luckily, the authors of mod_pywebsocket had the forethought to make the websocket requests (the ones you get as web_socket_transfer_data arguments) compatible. That means you can instantiate your Session the same way you normally would in mod_python (see the docs for examples). This might seem obvious, but it wasn't to me. If you get an error doing this, you've done something else wrong.
Session locks: The other thing to keep in mind is that the session associated with a given ID is locked by default, and the lock persists for the lifetime of that Session object. This means that if you have two web sockets that use Sessions from the same host, one of them is in danger of blocking forever. In addition, the documentation states that these mutex locks can require non-trivial system resources. They were clearly designed for serving quick HTTP requests, not for persistent connection-oriented use.
One way to fix this is to disable the locking, but that's probably not a smart thing to do. I haven't tried it, but best of luck with those race conditions if you make the attempt. What I did instead was to create the Sessions I needed only for short periods of time and assign None to them when I was done. Apparently "with" blocks won't work with these sessions. Again, this isn't terribly obscure, but it can lead to some headaches if you don't realize what's going on under the hood.
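Roughly what that looks like inside a handler (a sketch only; the session key and the handshake no-op are illustrative):

from mod_python import Session

def web_socket_do_extra_handshake(request):
    pass  # no extra handshake checks in this sketch

def web_socket_transfer_data(request):
    # The websocket request object is compatible enough to build a
    # mod_python Session from it directly.
    session = Session.Session(request)
    user = session.get('user')  # 'user' is an illustrative key
    session.save()
    # Drop the reference right away so the session lock is released;
    # holding it for the lifetime of the connection can block other
    # websockets from the same host.
    session = None
    # ... continue the long-lived message loop using `user` ...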
Here "simple webserver" means a server that deals with simple HTTP requests, like the following one:
import BaseHTTPServer

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.end_headers()  # finish the headers so the response is well-formed
            self.do_something()
        else:
            self.send_error(404)

    def do_something(self):
        print('hello world')  # printed to the server console, not sent to the client

server = BaseHTTPServer.HTTPServer(('127.0.0.1', 8080), WebRequestHandler)
server.serve_forever()
Apart from handling requests with the POST, PUT, and DELETE methods, what is the difference between this simple server and the Apache Web Server?
Or in other words, if I want to use Python to implement a server that can be put to business use, what else should I do?
It'd be greatly appreciated if someone could show the big picture of the Apache server.
Or in other words, if I want to use Python to implement a server that can be put to business use, what else should I do?
There are already python-based web servers, such as CherryPy (which I think is intended to be a web server solution on the same stack level as Apache; it is more python-based though, and Apache has been around a lot longer).
If you wish to write a lightweight extremely simple webserver from scratch, there is probably nothing wrong with using BaseHTTPServer, other than perhaps a few outstanding design issues (I hear race conditions might permanently clog a socket until a thread dies).
Though I would not recommend it (alone) for business, some of the big boys use BaseHTTPServer with a bit of extra machinery:
http://www.cherrypy.org/browser/trunk/cherrypy/_cphttpserver.py?rev=583
To elaborate, Apache is the industry standard. It has a plethora of configuration options, a security team, vulnerability mailing lists (I think), etc. It supports modules (e.g. mod_python). Python-based web servers also support Python-based modules (which may also give you access to non-Python things) via something called a WSGI stack; a WSGI application can run on any Python-based web server (and on Apache too, which has mod_wsgi); I think they are narrower in scope than Apache modules.
Apache module examples: http://httpd.apache.org/docs/2.0/mod/
WSGI examples (not a valid comparison): http://wsgi.org/wsgi/Middleware_and_Utilities
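To illustrate how small the WSGI contract is, here's a minimal application (a sketch; the greeting and port are arbitrary). The same callable can be mounted unchanged under wsgiref, CherryPy, gunicorn, or Apache's mod_wsgi:

from wsgiref.simple_server import make_server

# A WSGI application is just a callable taking the environ and a
# start_response function; there is nothing server-specific in it.
def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello from any WSGI server']

# Serve it with the stdlib's development server just to show it running:
make_server('127.0.0.1', 8000, application).serve_forever()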
I might code my own webserver if I'm doing something extremely lightweight, or if I needed massive control over webserver internals that the module interfaces could not provide, or if I was doing a personal project. I would not code my own server for a business unless I had significant experience with how real-world web servers worked. This is especially important from a security vulnerability point-of-view.
For example, I once wrote a web-based music player. I used a BaseHTTPServer to serve music out of a sandbox I had written to ensure that people couldn't access arbitrary files. Threading was a nightmare. (I recall a bug where you needed to pass special arguments to Popen, since its implicit fork interacted badly with the threads and left dangling file descriptors that caused hangs.) There were other various issues, and the code needed a lot of refactoring. It can be very worthwhile for a personal project, but it is a significant undertaking and not worth it for a business that just needs a website.
I know two startups who have been content using Pylons (with Paste) or TurboGears (with CherryPy) in the past, if you're looking for a lightweight Python web server stack. Their default template systems are lacking, though. The choice between Apache and a leaner, more Python-based web server may also depend on the skillset of your co-developers.
Apache is written in C and designed to be scalable while BaseHTTPServer is meant for local/testing/debugging environments.
So you shouldn't use BaseHTTPServer for any production sites.
Apache web server knows and supports the entire HTTP protocol, so it can deal with all the complications having to do with headers, keeping connections open, caching content, all the different HTTP response codes and their proper treatment, etc.
You'd have to understand the entire HTTP protocol and express it in code to go beyond your simple HTTP server.