Every time I read either WSGI or CGI I cringe. I've tried reading up on them before, but nothing has really stuck.
What is it really in plain English?
Does it just pipe requests to a terminal and redirect the output?
From a totally stepped-back point of view, Blankman, here is my "Intro Page" for the Web Server Gateway Interface:
PART ONE: WEB SERVERS
Web servers serve up responses. They sit around, waiting patiently, and then, with no warning at all, suddenly:
a client process sends a request. The client process could be a web browser, a bot, a mobile app, whatever. It is simply "the client"
the web server receives this request
(deliberate mumble) various things happen (see below)
the web server sends something back to the client
the web server sits around again
Web servers (at least, the better ones) are VERY good at this. They scale up and down processing depending on demand, they reliably hold conversations with the flakiest of clients over really cruddy networks, and we never really have to worry about it. They just keep on serving.
This is my point: web servers are just that: servers. They know nothing about content, nothing about users, nothing in fact other than how to wait a lot and reply reliably.
Your choice of web server should reflect your delivery preferences, not your software. Your web server should be in charge of serving, not of processing or application logic.
PART TWO: (PYTHON) SOFTWARE
Software does not sit around. Software only exists at execution time. Software is not terribly accommodating when it comes to unexpected changes in its environment (files not being where it expects, parameters being renamed, etc.). Although optimisation should be a central tenet of your design (of course), software itself does not optimise. Developers optimise. Software executes. Software does all the stuff in the 'deliberate mumble' section above. Could be anything.
Your choice or design of software should reflect your application, your choice of functionality, and not your choice of web server.
This is where the traditional method of "compiling in" languages to web servers becomes painful. You end up putting code in your application to cope with the physical server environment or, at least, being forced to choose an appropriate 'wrapper' library to include at runtime, to give the illusion of uniformity across web servers.
SO WHAT IS WSGI?
So, at last, what is WSGI? WSGI is a set of rules, written in two halves. They are written in such a way that they can be integrated into any environment that welcomes integration.
The first part, written for the web server side, says "OK, if you want to deal with a WSGI application, here's how the software will be thinking when it loads. Here are the things you must make available to the application, and here is the interface (layout) that you can expect every application to have. Moreover, if anything goes wrong, here's how the app will be thinking and how you can expect it to behave."
The second part, written for the Python application software, says "OK, if you want to deal with a WSGI server, here's how the server will be thinking when it contacts you. Here are the things you must make available to the server, and here is the interface (layout) that you can expect every server to have. Moreover, if anything goes wrong, here's how you should behave and here's what you should tell the server."
So there you have it - servers will be servers and software will be software, and here's a way they can get along just great without one having to make any allowances for the specifics of the other. This is WSGI.
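Here, as a minimal sketch, is what the application half of that contract looks like in practice (this follows the shape laid down in PEP 3333; nothing here is specific to any one server):

def application(environ, start_response):
    # environ: a dict the server must fill with CGI-style request data
    # start_response: a callable the server supplies for sending the
    # status line and the headers
    status = '200 OK'
    headers = [('Content-Type', 'text/plain')]
    start_response(status, headers)
    # the return value is an iterable of bytes: the response body
    return [b'Hello from a WSGI application\n']

Any WSGI-compliant server - mod_wsgi, gunicorn, wsgiref - can host that callable without the application caring which one it is.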
mod_wsgi, on the other hand, is a plugin for Apache that lets it talk to WSGI-compliant software. In other words, mod_wsgi is an implementation - in Apache - of the rules of part one of the rule book above.
As for CGI.... ask someone else :-)
A WSGI server (mod_wsgi, for example) runs the Python interpreter on web server start, either as part of the web server process (embedded mode) or as a separate process (daemon mode), and loads the script into it. Each request results in a specific function in the script being called, with the request environment passed as arguments to the function.
CGI runs the script as a separate process for each request and uses environment variables, stdin, and stdout to "communicate" with it.
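For comparison, here is a minimal sketch of what that looks like from the script's side (Python 3; the server sets the environment variables and reads the response - headers, blank line, body - from stdout):

#!/usr/bin/env python3
# A minimal CGI script: request data arrives in environment variables,
# and the response is whatever we write to stdout.
import os

print("Content-Type: text/plain")
print()  # a blank line ends the headers
print("Method:", os.environ.get("REQUEST_METHOD", ""))
print("Path:  ", os.environ.get("PATH_INFO", ""))
print("Query: ", os.environ.get("QUERY_STRING", ""))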
Both CGI and WSGI define standard interfaces that programs can use to handle web requests. The CGI interface is at a lower level than WSGI, and involves the server setting up environment variables containing the data from the HTTP request, with the program returning something formatted pretty much like a bare HTTP server response.
WSGI, on the other hand, is a Python-specific, slightly higher-level interface that allows programmers to write applications that are server-agnostic and which can be wrapped in other WSGI applications (middleware).
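To illustrate that wrapping, here is a minimal middleware sketch (the class name is made up, but the pattern is the standard WSGI one: middleware is itself a WSGI application holding a reference to the next one down, so layers compose):

class LoggingMiddleware(object):
    # Wraps any WSGI application and logs each request path before
    # delegating to it; the inner app never knows it is wrapped.
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        print(environ.get('PATH_INFO', '/'))
        return self.app(environ, start_response)

# wrapped = LoggingMiddleware(application)  # compose around any WSGI app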
If you are unclear on all the terms in this space, and let's face it, it's a confusing acronym-laden one, there's also a good background reader in the form of an official Python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on. I wish I'd read it first.
Related
I know the GIL blocks Python from running its threads across cores. If that is so, why is Python used in web servers, and how do companies like YouTube and Instagram handle it?
PS: I know alternatives like multiprocessing can work around it. But it would be great if anyone could describe a scenario they actually handled that way.
Python is used for server-side handling behind web servers, but not (usually) as the web server itself.
In a normal setup we have Apache or another web server handling many processes on the server side (Python usually speaks WSGI). Note that Apache usually serves "static" files directly. So we have one Apache server, many parallel Apache processes (to handle connections and basic HTTP), and many Python processes, each handling one connection at a time.
Each of these processes is independent of the others (they just share the same resources), so you can program your server-side part easily, without worrying about deadlocks. It is mostly a trade-off: raw code performance versus producing code easily and quickly without huge problems. But web servers with Python usually scale very well (even on large sites), and servers are cheaper than programmers.
Note: security is also improved by having just one request per process.
The GIL exists in CPython (the Python interpreter written in C, and the most widely used one). Other implementations such as Jython or IronPython don't have this problem, because they don't have a GIL.
Even so, using CPython you can still get parallelism: do the heavy lifting in C and then "link it" into your Python code, just like NumPy and similar libraries do.
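As a rough sketch of that idea (assuming NumPy is installed; whether you see a real speed-up depends on your BLAS build), NumPy releases the GIL while it crunches numbers in C, so threads can genuinely run in parallel for this kind of work:

from concurrent.futures import ThreadPoolExecutor
import numpy as np

def heavy(seed):
    # The matrix product runs in C with the GIL released, so several
    # of these calls can execute at the same time despite being threads.
    a = np.random.default_rng(seed).random((2000, 2000))
    return (a @ a).sum()

with ThreadPoolExecutor(max_workers=4) as pool:
    print(list(pool.map(heavy, range(4))))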
Another thing is that even though you build your page using Flask or Django, when you set it up on a production server you put Apache or Nginx, etc. in front of it, which acts as a real load balancer and can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
Although a bit late, I will try to give a generic and useful answer.
@Giacomo Catenazzi's answer is a good one, but part of it is factually incorrect.
API requests (or other forms of web requests) are served by an already-running process. The creation of these 'already running' processes is handled by an application server like gunicorn, which on startup creates a specified number of processes that run your web application's code, continuously waiting to serve incoming requests.
Needless to say, each of these processes is limited by the GIL to running only one thread at a time. But one process, over its lifetime, handles more than one (normally many) requests. Here it helps to understand the flow of a request.
We will take the example of Flask, but this is applicable to most web frameworks. When a request comes in from Nginx, it is handed over to gunicorn, which interacts with your web application via WSGI. When the request reaches the framework, an app context is created and some variables are pushed into it. Then it follows the normal route that most people are familiar with: routing, DB calls, response creation, and so on. The response is then handed back to gunicorn via WSGI again. When the response is handed over, the app context is torn down. So it is the app context, not the process, that is created for every new request.
Also, I have talked only about the sync worker in gunicorn, but it also has an option of async workers, which can handle multiple requests in parallel through coroutines. But that's a separate topic.
So answering your question:
Nginx (capable of handling multiple requests at a time)
Gunicorn creates a pool of n processes at the start and also manages the pool, in the sense that if a process exits or gets stuck, it kills/recreates one and adds it to the pool (see the configuration sketch after this list).
Each process handling 1 request at a time.
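As a concrete sketch of that setup, gunicorn can be configured with a small Python file. The setting names below are real gunicorn settings, but the values and the module name myapp are placeholders:

# gunicorn.conf.py -- a minimal configuration for the setup described above
bind = "127.0.0.1:8000"    # nginx proxies requests to this address
workers = 4                # size of the process pool gunicorn creates and manages
worker_class = "sync"      # each sync worker handles one request at a time
timeout = 30               # a stuck worker is killed and replaced after 30s

# run with:  gunicorn -c gunicorn.conf.py myapp:app
# where myapp.py exposes a WSGI callable named `app`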
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about understanding gunicorn with Flask, and this is a great resource for understanding the Flask app context.
I have an interesting project going on at our workplace. The task before us is this:
Build a custom server using Python
It has a web server part, serving REST
It has an FTP server part, serving files
It has an SMTP part, which receives mail only
and last but not least, it has a background worker that manages low-level file I/O based on requests received from the above-mentioned services
Obviously the go-to place was the Twisted library/framework, which is an excellent networking tool. However, studying the docs further, a few things came up that I'm not sure about.
Having a Java background, I would solve the task (at least at the beginning) by spawning a separate thread for each service and going from there. Being in Python, however, I cannot do that to any reasonable effect, as Python has the GIL. I'm not sure how Twisted handles this. I would expect that a large part (if not the majority) of Twisted's code is written in C, where the GIL is not an issue, but I couldn't find docs that explained this to my satisfaction.
So the most outstanding question is: given that Twisted uses the reactor as its main design pattern, will it be able to:
Serve all those services needed
Do it in a non-blocking fashion (it should, according to docs, but if someone could elaborate, I'd be grateful)
Be able to serve a few hundred clients at once
Serve large file downloads in a reasonable way, meaning that it can serve multiple clients, using multiple services, downloading and uploading large files.
Large files being on the order of hundreds of MB, or a few GB. The size is not important; it's the time the client has to stay connected to the server that matters.
Edit: I'm actually inclined to go the way of Python multiprocessing, but I'm not sure whether that's the correct thing to do with Twisted, etc.
Serve all those services needed
Yes.
Do it in a non-blocking fashion (it should, according to docs, but if someone could elaborate, I'd be grateful)
Twisted uses the common reactor model. I/O goes through your choice of poll, select, whatever, to determine whether data is available. It handles only what is available, and passes the data along to other stages of your app. This is how it is non-blocking.
I don't think it provides non-blocking disk I/O, but I'm not sure. That feature is not what most people need when they say non-blocking.
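To make the reactor model concrete, here is a minimal sketch of a Twisted TCP echo server. The reactor multiplexes every connection in one thread and only calls dataReceived when bytes have actually arrived, which is what "non-blocking" means here:

from twisted.internet import protocol, reactor

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        # Called by the reactor only when data is ready; nothing here
        # ever blocks waiting on the network.
        self.transport.write(data)

class EchoFactory(protocol.Factory):
    def buildProtocol(self, addr):
        return Echo()

reactor.listenTCP(8000, EchoFactory())
reactor.run()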
Be able to serve a few hundred clients at once
Yes. No. Maybe. What are those clients doing? Is each hitting refresh every second on a browser making 100 requests? Is each one doing a numerical simulation of galaxy collisions? Is each sending the string "hi!" to the server, without expecting a response?
Twisted can easily handle 1000+ requests per second.
Serve large file downloads in a reasonable way, meaning that it can serve multiple clients, using multiple services, downloading and uploading large files.
Sure. For example, the original version of BitTorrent was written in Twisted.
Here 'simple web server' means a server that deals with simple HTTP requests, like the following one:
import BaseHTTPServer

class WebRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/foo':
            self.send_response(200)
            self.end_headers()  # terminate the headers before any body
            self.do_something()
        else:
            self.send_error(404)

    def do_something(self):
        print 'hello world'  # printed on the server console, not sent to the client

server = BaseHTTPServer.HTTPServer(('127.0.0.1', 8080), WebRequestHandler)
server.serve_forever()
Apart from handling POST, PUT, and DELETE requests, what is the difference between this simple server and the Apache web server?
Or in other words, if I want to use Python to implement a server that can be put to business use, what else should I do?
I'd greatly appreciate it if someone could paint the big picture of the Apache server.
Or in other words, if I want to use Python to implement a server that can be put to business use, what else should I do?
There are already Python-based web servers, such as CherryPy (which I think is intended to be a web server solution on the same stack level as Apache; it is more Python-based though, and Apache has been around a lot longer).
If you wish to write a lightweight extremely simple webserver from scratch, there is probably nothing wrong with using BaseHTTPServer, other than perhaps a few outstanding design issues (I hear race conditions might permanently clog a socket until a thread dies).
Though I would not recommend it (alone) for business, some of the big boys use BaseHTTPServer with a bit of extra machinery:
http://www.cherrypy.org/browser/trunk/cherrypy/_cphttpserver.py?rev=583
To elaborate, Apache is the industry standard. It has a plethora of configuration options, a security team, vulnerability mailing lists (I think), etc. It supports modules (e.g. mod_python). Python-based web servers also support Python-based modules (which might give you access to non-Python things) via something called a WSGI stack; a WSGI application can run on any Python-based web server (and Apache too, which also has mod_wsgi). I think they are narrower in scope than Apache modules.
Apache module examples: http://httpd.apache.org/docs/2.0/mod/
WSGI examples (not a valid comparison): http://wsgi.org/wsgi/Middleware_and_Utilities
I might code my own webserver if I'm doing something extremely lightweight, or if I needed massive control over webserver internals that the module interfaces could not provide, or if I was doing a personal project. I would not code my own server for a business unless I had significant experience with how real-world web servers worked. This is especially important from a security vulnerability point-of-view.
For example, I once wrote a web-based music player. I used a BaseHTTPServer to serve music out of a sandbox I had written to ensure that people couldn't access arbitrary files. Threading was a nightmare. (I recall a bug where you needed to pass special arguments to Popen since threading caused an implicit fork which would cause hangs on dangling file descriptors.) There were other various issues. The code needed to be refactored a lot. It can be very worthwhile for a personal project, but is a significant undertaking and not worth it for a business that just needs a website.
I know two startups who have been content using Pylons (using Paste) or Turbogears (using CherryPy) in the past, if you're looking for a lightweight python web server stack. Their default template systems are lacking though. The choice between Apache and a leaner more python-based web server may also depend on the skillset of your co-developers.
Apache is written in C and designed to be scalable while BaseHTTPServer is meant for local/testing/debugging environments.
So you shouldn't use BaseHTTPServer for any production sites.
Apache web server knows and supports the entire HTTP protocol, so it can deal with all the complications having to do with headers, keeping connections open, caching content, all the different HTTP response codes and their proper treatment, etc.
You'd have to understand the entire HTTP protocol and express it in code to go beyond your simple HTTP server.
Here is what I would like to do, and I want to know how some people with experience in this field do this:
With three POST requests I get the following from the HTTP server:
widgets and layout
app logic (minimal)
data
Or maybe it's better to combine the first two, or all three. I'm thinking of using PyQt. I think I can load .ui files. I can parse JSON data. I just think it would be rather dangerous to pass code over a network to be executed on the client. If someone can hijack the connection, or can change the app's settings to access a bogus server, that is nasty.
I want to do it this way because it keeps all the clients up-to-date. It's sort of like a webapp but simpler because of Qt. Essentially the "thin" app is just a minimal compiled python file that loads data from a server.
How can I do this without introducing security issues on the client? Is HTTPS good enough? Is there a way to get PyQt to run in a sandbox of sorts?
PS. I'm not stuck on Qt or Python. I do like the concept, though. I don't really want to use Java - server or client side.
Your desire to send "app logic" from the server to the client without sending "code" is inherently self-contradictory, though you may not realize that yet -- even if the "logic" you're sending is in some simplified ad-hoc "language" (which you don't even think of as a language;-), to all intents and purposes your Python code will be interpreting that language and thereby executing that code. You may "sandbox" things to some extent, but in the end, that's what you're doing.
To avoid hijackings and other tricks, instead, use HTTPS and validate the server's cert in your client: that will protect you from all the problems you're worrying about (if somebody can edit the app enough to defeat the HTTPS cert validation, they can edit it enough to make it run whatever code they want, without any need to send that code from a server;-).
Once you're using https, having the server send Python modules (in source form if you need to support multiple Python versions on the clients, else bytecode is fine) and the client thereby save them to disk and import / reload them, will be just fine. You'll basically be doing a variant of the classic "plugins architecture" where the "plugins" are in fact being sent from the server (instead of being found on disk in a given location).
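A rough sketch of that plugin variant (the URL and module name are hypothetical; urlopen validates the server's HTTPS certificate by default on modern Pythons, raising an SSL error on a bad cert rather than silently proceeding):

import importlib.util
import pathlib
import urllib.request

PLUGIN_URL = "https://example.com/plugins/rules.py"  # hypothetical endpoint

def fetch_plugin(url=PLUGIN_URL, name="rules"):
    # Download the module source over HTTPS and save it to disk.
    source = urllib.request.urlopen(url).read()
    path = pathlib.Path(name + ".py")
    path.write_bytes(source)
    # Import the module straight from the saved file.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module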
Use a web browser: it is a well-documented system that does everything you want. It is also relatively fast to create simple graphical applications in a browser. Examples supporting my reasoning:
The Sage math environment has built its graphical client as an application that runs in a browser, together with a local web server.
There is the Pyjamas project that compiles Python to JavaScript. This is IMHO worth a try.
Edit:
You could try PyPy's sandbox interpreter, as a secure Python interpreter for the code that was transferred over a network.
And then there is the simplest solution: simply send Python modules over the network, but sign and/or encrypt them. This is the way all Linux distributions work. You store a cryptographic token on the local computer, and the server signs/encrypts the code with a matching token before it sends it. GPG should be able to do this.
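As a minimal stand-in sketch for that signing scheme, using only the standard library (a real deployment would use GPG as suggested; the secret and the file names are hypothetical):

import hashlib
import hmac
import pathlib

SECRET = b"token-provisioned-at-install-time"  # hypothetical shared token

def verify_module(code_path, sig_path):
    # Recompute the MAC over the received code and compare it, in
    # constant time, against the signature shipped alongside it.
    code = pathlib.Path(code_path).read_bytes()
    expected = pathlib.Path(sig_path).read_bytes().strip()
    actual = hmac.new(SECRET, code, hashlib.sha256).hexdigest().encode()
    return hmac.compare_digest(actual, expected)

# Only import/exec the module when verify_module(...) returns True.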
Here is where I am at presently. I am designing a card game with the aim of utilizing major components for future work. The part that is hanging me up is creating a layer of abstraction between the server and the client(s). A server is started, and then one or more clients can connect (locally or remotely). I am designing a thick client but my friend is looking at doing a web-based client. I would like to design the server in a manner that allows a variety of different clients to call a common set of server commands.
So, for a start, I would like to create a 'server' which manages the game rules and player interactions, and a 'client' on the local CLI (I'm running Ubuntu Linux for convenience). I'm attempting to flesh out how the two pieces are supposed to interact, without mandating that future clients be CLI-based or on the local machine.
I've found the following two questions which are beneficial, but don't quite answer the above.
Client Server programming in python?
Evaluate my Python server structure
I don't require anything full-featured right away; I just want to establish the basic mechanisms for abstraction so that the resulting mock-up code reflects the relationship appropriately: there are different assumptions at play with a client/server relationship than with an all-in-one application.
Where do I start? What resources do you recommend?
Disclaimers:
I am familiar with code in a variety of languages and general programming/logic concepts, but have little real experience writing substantial amounts of code. This pet project is an attempt at rectifying this.
Also, I know the information is out there already, but I have the strong impression that I am missing the forest for the trees.
Read up on RESTful architectures.
Your fat client can use REST. It will use urllib2 to make RESTful requests of a server. It can exchange data in JSON notation.
A web client can use REST. It can make simple browser HTTP requests or a Javascript component can make more sophisticated REST requests using JSON.
Your server can be built as a simple WSGI application using any simple WSGI components. You have nice ones in the standard library, or you can use Werkzeug. Your server simply accepts REST requests and makes REST responses. Your server can work in HTML (for a browser) or JSON (for a fat client or Javascript client.)
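As a minimal sketch of such a server using only the standard library (the /state resource and its payload are invented for illustration):

import json
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # One hypothetical REST resource: GET /state returns game state as JSON.
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/state":
        body = json.dumps({"players": [], "turn": 0}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

make_server("127.0.0.1", 8000, application).serve_forever()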
I would consider basing all server / client interactions on HTTP -- probably with JSON payloads. This doesn't directly allow server-initiated interactions ("server push"), but the (newish but already traditional;-) workaround for that is AJAX-y (even though the X makes little sense as I suggest JSON payloads, not XML ones;-) -- the client initiates an async request (via a separate thread or otherwise) to a special URL on the server, and the server responds to those requests to (in practice) do "pushes". From what you say it looks like the limitations of this approach might not be a problem.
The key advantage of specifying the interactions in such terms is that they're entirely independent from the programming language -- so the web-based client in Javascript will be just as doable as your CLI one in Python, etc etc. Of course, the server can live on localhost as a special case, but there is no constraint for that as the HTTP URLs can specify whatever host is running the server; etc, etc.
First of all, regardless of the locality or type of the client, you will be communicating through an established message-based interface. All clients will be operating based on a common set of requests and responses, and the server will handle and reject these based on their validity according to game state. Whether you are dealing with local clients on the same machine or remote clients via HTTP does not matter whatsoever from an abstraction standpoint, as they will all be communicating through the same set of requests/responses.
What this comes down to is your protocol. Your protocol should be a well-defined and technically sound language between client and server that will allow clients to a) participate effectively, and b) participate fairly. This protocol should define what messages ('moves') a client can do, and when, and how the server will react.
Your protocol should be fully fleshed out and documented before you even start on game logic - the two are intrinsically connected, and you will save a lot of wasted time and effort by completely defining your protocol first.
Your protocol is the abstraction between client and server, and it will also serve as the design document and programming guide for both.
Protocol design is all about state, state transitions, and validation. Game servers usually have a set of fairly common, generic states for each game instance e.g. initialization, lobby, gameplay, pause, recap, close game, etc...
Each of these states has important state data associated with it. For example, a 'lobby' state on the server side might contain the known state of each player: how long since the last message or ping, what the player is doing (selecting an avatar, switching settings, going to the fridge, etc.). Organizing and managing state and substate data in code is important.
Managing these states, and the associated data requirements for each, is a process that should be exquisitely planned out, as they are directly related to the volume of work and the project's complexity. This is very important, and also great practice if you are using this project to step up to larger things.
Also, you must keep in mind that if you have a game, and you let people play, people will cheat. It's a fact of life. In order to minimize this, you must carefully design your protocol and state management to only ever allow valid state transitions. Never trust a single client packet.
For every permutation of client/server state, you must enforce a limited set of valid game messages, and you must be very careful in what you allow players to do, and when you allow them to do it.
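A minimal sketch of that enforcement: a table of which message types are legal in which server state, checked before any game logic runs (the states and message names are invented for illustration):

# Which client messages are valid in each server-side game state.
VALID_MESSAGES = {
    "lobby":    {"join", "leave", "ready"},
    "gameplay": {"play_card", "pass", "concede"},
    "recap":    {"rematch", "leave"},
}

def handle_message(state, message):
    # Reject anything that is not a legal move in the current state:
    # never trust a single client packet.
    if message.get("type") not in VALID_MESSAGES.get(state, set()):
        return {"error": "invalid message for state %r" % state}
    return dispatch(state, message)  # hypothetical game-logic dispatcher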
Project complexity is generally exponential and not linear - client/server game programming is usually a good/painful way to learn this. Great question. Hope this helps, and good luck!