I'm actually new to Python web development. My previous knowledge about web development came from PHP.
In PHP, there's no resource (variable etc.) preserved between two different HTTP requests (except for $_SESSION I guess?)
So if Flask is run by gunicorn, what resource is preserved between two different HTTP requests?
This question came from the document of Flask. In the document, it says we need to register database session close code in app.teardown_request. In my own test, if I didn't register the session close code, the database will get many idle connection.
There are really only two modes of handling web requests:
Spin up the entire application for each request and tear it down after each request. Everything that isn't persisted (to some other process, to disk, or to the client) is destroyed at the end of the request.
Spin up the application and then let it handle more than one request before it gets torn down. Almost everything that isn't specific to the request is preserved between requests.
Mode #1 has more work to do for each request but it ensures that all resources that the application uses are torn down (so you don't have problems with leaking database connections, even if you forget to close them).
Mode #2 has less work to do for each request but it is possible for an application to "leak" resources - as in your example where you leak database connections if you don't explicitly close them.
PHP (running in embedded mode in Apache under mod_php) uses the first mode. Flask (and frameworks in the majority of languages commonly used for web development these days) uses the second.
Related
If I develop a REST service hosted in Apache and a Python plugin which services GET, PUT, DELETE, PATCH; and this service is consumed by an Angular client (or other REST interacting browser technology). Then how do I make it scale-able with RabbitMQ (AMQP)?
Potential Solution #1
Multiple Apache's still faces off against the browser's HTTP calls.
Each Apache instance uses an AMQP plugin and then posts message to a queue
Python microservices monitor a queue and pull a message, service it and return response
Response passed back to Apache plugin, in turn Apache generates the HTTP response
Does this mean the Python microservice no longer has any HTTP server code at all. This will change that component a lot. Perhaps best to decide upfront if you want to use this pattern as it seems it would be a task to rip out any HTTP server code.
Other potential solutions? I am genuinely puzzled as to how we're supposed to take a classic REST server component and upgrade it to be scale-able with RabbitMQ/AMQP with minimal disruption.
I would recommend switching wsgi to asgi(nginx can help here), Im not sure why you think rabbitmq is the solution to your problem, as nothing you described seems like that would be solved by using this method.
asgi is not supported by apache as far as I know, but it allows the server to go do work, and while its working it can continue to service new requests that come in. (gross over simplification)
If for whatever reason you really want to use job workers (rabbitmq, etc) then I would suggest returning to the user a "token" (really just the job_id) and then they can call with that token, and it will report back either the current job status or the result
I know GIL blocks python from running its threads across cores. If it does so, why python is being used in webservers, how are the companies like youtube, instagram handling it.
PS: I know alternatives like multiprocessing can solve it. But it would be great if anyone can post it with a scenario that was handled by them.
Python is used for server-side handling in webservers, but not (usually) as webserver.
On normal setup: we have have Apache or other webserver to handles a lot of processes (server-side) (python uses usually wsgi). Note usually apache handles directly "static" files. So we have one apache server, many parallel apache processes (to handle connection and basic http) and many python processes which handles one connection per time.
Each of such process are independent each others (they just use the same resources), so you can program your server side part easily, without worrying about deadlocks. It is mostly a trade-off: performance of code, and easy and quickly to produce code without huge problems. But usually webserver with python scale very well (also on large sites), and servers are cheaper then programmers.
Note: security is also increased by having just one request in a process.
GIL exists in CPython, (Python interpreter made in C and most used), other interpreter versions such as Jython or IronPython don't have such problem, because they don't have GIL.
Even though, using CPython you can still have concurrency, just do your thing in C and then "link it" in your Python code, just like Numpy or similar do.
Other thing is, even though you have your page using Flask or Django, when you set up it in a production server, you have an Apache or Nginx, etc which has a real charge balancer (or load balancer, I can't remember the name in english now) that can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
Although a bit late, but I will try to give a generic and useful answer.
#Giacomo Catenazzi's answer is a good one but some part of it is factually incorrect.
API requests (or other form of web requests) are served from an already running process. The creation of this 'already running' process is handled by some webserver like gunicorn which on startup creates specified number of processes that are running the code in your web application continuously waiting to serve any incoming request.
Needless to say, each of these processes are limited by the GIL to only run one thread at a time. But one process in its lifetime handles more than one (normally many) request. Here it would be better if we could understand the flow of a request.
We will take an example of flask but this is applicable to most web frameworks. When a request comes from Nginx, it is handed over to gunicorn which interacts with your web application via wsgi. When the request reaches to the framework, an app context is created and some variables are pushed into the app-context. Then it follows the normal route that mostly people are familiar with: routing, db calls, response creation and so on. The response is then handed back to the gunicorn via wsgi again. At the time of handing over the response, the app context is teared down. So it's the app context, not the process that is created on every new request.
Also, I have talked only about the sync worker in gunicorn but it also has an option of async worker which can handle multiple requests in parallel through coroutines. But thats a separate topic.
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n number of processes at the start and also manages the pool in the sense that if a process exits or gets stuck, it kills/recreates ans adds that to the pool.
Each process handling 1 request at a time.
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about gunicorn with flask understanding. And this is a great resource to understand flask app context
I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but beside this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about the process completion.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like Cherrypy (which I am more confident with) and have the webapp use a messaging channel to report the processing progress. The web server returns a HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the webserves mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large scale data processing engine that my team implemented. We wanted to build our api calls in Flask, some of which can take many hours to complete, but have a way to notify the user in real time what is going on.
Basically what I came up with is was what you described as option 2. On the same machine that I am serving the flask app through apache, I created a tornado app that serves up a websocket that reports progress to the end user. Once my main page is served, it establishes the websocket connection to the tornado server, and the flask app periodically sends updates to the tornado app, and down to the end user. Even if the browser is closed during the long running application, apache keeps the request alive and processing, and upon logging back in, I can still see the current progress.
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!
I think I don't completely understand the deployment process. Here is what I know:
when we need to do hot deployment -- meaning that we need to change the code that is live -- we can do it by reloading the modules, but
imp.reload is a bad idea, and we should restart the application instead of reloading the changed modules
ideally the running code should be a clone of your code repository, and any time you need to deploy, you just pull the changes
Now, let's say I have multiple instances of wsgi app running behind a reverse proxy like nginx (on ports like 8011, 8012, ...). And, let's also assume that I get 5 requests per second.
Now in this case, how should I update my code in all the running instances of the application.
If I stop all the instances, then update all of them, then restart them all -- I will certainly lose some requests
If I update each instance one by one -- then the instances will be in inconsistent states (some will be running old code, and some new) until all of them are updated. Now if a request hits an updated instance, and then a subsequent (and related) request hits an older instance (yet to be updated) -- then I will get wrong results.
Can somebody explain thoroughly how busy applications like this are hot-deployed?
For deployment across several hot instances that are behind a load balancer like nginx I like to do rolling deployments with a tool like Fabric.
Fabric connects you to Server 1
Shut down the web-server
Deploy changes, either by using your VCS or transferring tarball with the new application
Start up the web-server
GOTO1 and connect to the next server.
That way you're never offline, and it's seamless as nginx knows when a webserver is taken down when it tries to round-robin to it and will move onto the next one instead, and as soon as the node/instance is back up it will be back into production usage.
EDIT:
You can use the ip_hash module in nginx to ensure all requests from one IP Address goes to the same server for the length of the session
This directive causes requests to be distributed between upstreams based on the IP-address of the client.
The key for the hash is the class-C network address of the client. This method guarantees that the client request will always be transferred to the same server. But if this server is considered inoperative, then the request of this client will be transferred to another server. This gives a high probability clients will always connect to the same server.
What this means to you, is that once your web-server is updated and a client has connected to the new instance, all connections for that session will continue to be forwarded to the same server.
This does leave you in the situation of
Client connects to site, gets served from Server 1
Server 1 is updated before client finishes whatever they're doing
Client potentially left in a state of limbo?
This scenario begs the question, are you removing things from your API/Site which could potentially leave the client in a state of limbo ? If all you're doing is for example updating UI elements or adding pages etc but not changing any back-end APIs you should not have any problems. If you are removing API functions, you might end up with issues.
Couldn't you take half your servers offline (say by pulling them out of the load balancing pool) and then update those. Then bring them back online while simultaneously pulling down the other half. Then update those and bring them back online.
This will ensure that you stay online while also ensuring that you never have the old and new versions of your application online at the same time. Yes, this will mean that your site would run at half its capacity during the time. But that might be ok?
Every time I read either WSGI or CGI I cringe. I've tried reading on it before but nothing really has stuck.
What is it really in plain English?
Does it just pipe requests to a terminal and redirect the output?
From a totally step-back point of view, Blankman, here is my "Intro Page" for Web Server Gateway Interface:
PART ONE: WEB SERVERS
Web servers serve up responses. They sit around, waiting patiently, and then with no warning at all, suddenly:
a client process sends a request. The client process could be a web server, a bot, a mobile app, whatever. It is simply "the client"
the web server receives this request
deliberate mumble various things happen (see below)
The web server sends back something to the client
web server sits around again
Web servers (at least, the better ones) are VERY good at this. They scale up and down processing depending on demand, they reliably hold conversations with the flakiest of clients over really cruddy networks, and we never really have to worry about it. They just keep on serving.
This is my point: web servers are just that: servers. They know nothing about content, nothing about users, nothing in fact other than how to wait a lot and reply reliably.
Your choice of web server should reflect your delivery preference, not your software. Your web server should be in charge of serving, not processing or logical stuff.
PART TWO: (PYTHON) SOFTWARE
Software does not sit around. Software only exists at execution time. Software is not terribly accommodating when it comes to unexpected changes in its environment (files not being where it expects, parameters being renamed etc). Although optimisation should be a central tenet of your design (of course), software itself does not optimise. Developers optimise. Software executes. Software does all the stuff in the 'deliberate mumble' section above. Could be anything.
Your choice or design of software should reflect your application, your choice of functionality, and not your choice of web server.
This is where the traditional method of "compiling in" languages to web servers becomes painful. You end up putting code in your application to cope with the physical server environment or, at least, being forced to choose an appropriate 'wrapper' library to include at runtime, to give the illusion of uniformity across web servers.
SO WHAT IS WSGI?
So, at last, what is WSGI? WSGI is a set of rules, written in two halves. They are written in such a way that they can be integrated into any environment that welcomes integration.
The first part, written for the web server side, says "OK, if you want to deal with a WSGI application, here's how the software will be thinking when it loads. Here are the things you must make available to the application, and here is the interface (layout) that you can expect every application to have. Moreover, if anything goes wrong, here's how the app will be thinking and how you can expect it to behave."
The second part, written for the Python application software, says "OK, if you want to deal with a WSGI server, here's how the server will be thinking when it contacts you. Here are the things you must make available to the server, and here is the interface (layout) that you can expect every server to have. Moreover, if anything goes wrong, here's how you should behave and here's what you should tell the server."
So there you have it - servers will be servers and software will be software, and here's a way they can get along just great without one having to make any allowances for the specifics of the other. This is WSGI.
mod_wsgi, on the other hand, is a plugin for Apache that lets it talk to WSGI-compliant software, in other words, mod_wsgi is an implementation - in Apache - of the rules of part one of the rule book above.
As for CGI.... ask someone else :-)
WSGI runs the Python interpreter on web server start, either as part of the web server process (embedded mode) or as a separate process (daemon mode), and loads the script into it. Each request results in a specific function in the script being called, with the request environment passed as arguments to the function.
CGI runs the script as a separate process each request and uses environment variables, stdin, and stdout to "communicate" with it.
Both CGI and WSGI define standard interfaces that programs can use to handle web requests. The CGI interface is at a lower level than WSGI, and involves the server setting up environment variables containing the data from the HTTP request, with the program returning something formatted pretty much like a bare HTTP server response.
WSGI, on the other hand, is a Python-specific, slightly higher-level interface that allows programmers to write applications that are server-agnostic and which can be wrapped in other WSGI applications (middleware).
If you are unclear on all the terms in this space, and let's face it, it's a confusing acronym-laden one, there's also a good background reader in the form of an official python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on. I wish I'd read it first.