I know the GIL prevents Python from running its threads across multiple cores. If so, why is Python used in web servers, and how do companies like YouTube and Instagram handle it?
PS: I know alternatives like multiprocessing can solve it, but it would be great if anyone could describe a scenario they actually handled.
Python is used for server-side request handling, but (usually) not as the web server itself.
In a normal setup, Apache or another web server manages many server-side processes (Python usually talks to it via WSGI). Note that Apache usually serves "static" files directly. So we have one Apache server, many parallel Apache processes (to handle connections and basic HTTP), and many Python processes, each handling one connection at a time.
These processes are independent of each other (they only share the same machine resources), so you can write the server-side part easily, without worrying about deadlocks. It is mostly a trade-off between raw code performance and being able to produce code quickly without major problems. But web servers with Python usually scale very well (even on large sites), and servers are cheaper than programmers.
Note: security is also increased by having just one request in a process.
The GIL exists in CPython (the interpreter written in C, and the most widely used one). Other implementations such as Jython or IronPython don't have this problem, because they don't have a GIL.
Even with CPython you can still run work in parallel: do the heavy lifting in C and call it from your Python code, as NumPy and similar libraries do, since C code can release the GIL while it runs.
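The same holds for blocking calls that are implemented in C: CPython releases the GIL while a thread waits in time.sleep or on a socket, which is why threads still help with I/O-bound web work. A minimal sketch, with time.sleep standing in for a network or database wait (fake_io_call and the timings are made up for illustration):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(delay):
    """Stand-in for a blocking network or database call."""
    time.sleep(delay)  # the GIL is released while this thread sleeps
    return delay


def run_concurrently(n_calls=4, delay=0.2):
    """Run n_calls blocking calls in threads; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_calls) as pool:
        results = list(pool.map(fake_io_call, [delay] * n_calls))
    return results, time.perf_counter() - start


results, elapsed = run_concurrently()
print(f"{len(results)} calls finished in {elapsed:.2f}s")
```

Run one after another, the four calls would take about 0.8 seconds; with threads the waits overlap. A CPU-bound loop written in pure Python would not speed up this way.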
Another point: even if your page is built with Flask or Django, in production you put it behind Apache, Nginx, etc., which has a real load balancer and can serve the page to many people at the same time.
Take it from the Flask docs:
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
A bit late, but I will try to give a generic and useful answer.
@Giacomo Catenazzi's answer is a good one, but part of it is factually incorrect.
API requests (or other forms of web requests) are served from an already running process. These 'already running' processes are created by a web server such as gunicorn, which on startup creates a specified number of processes that run your web application's code and continuously wait to serve incoming requests.
Needless to say, each of these processes is limited by the GIL to running only one thread at a time. But over its lifetime one process handles more than one (normally many) requests. It helps here to understand the flow of a request.
We will take Flask as an example, but this applies to most web frameworks. When a request comes from Nginx, it is handed over to gunicorn, which interacts with your web application via WSGI. When the request reaches the framework, an app context is created and some variables are pushed into it. Then it follows the normal route that most people are familiar with: routing, db calls, response creation and so on. The response is then handed back to gunicorn, again via WSGI. When the response is handed over, the app context is torn down. So it is the app context, not the process, that is created on every new request.
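The difference between per-process and per-request lifetime can be shown with a bare WSGI callable. This is only a sketch using the standard library, not Flask's actual app context: REQUESTS_SERVED, ctx and call_app are hypothetical names, and the ctx dictionary merely stands in for whatever a framework pushes onto its context.

```python
import os
from wsgiref.util import setup_testing_defaults

# Module-level state: created once when the worker process imports the app,
# and kept for as long as that process lives.
REQUESTS_SERVED = 0


def app(environ, start_response):
    """Minimal WSGI application with a hand-rolled per-request context."""
    global REQUESTS_SERVED
    REQUESTS_SERVED += 1

    # Per-request context: built for this request only.
    ctx = {"path": environ.get("PATH_INFO", "/"), "pid": os.getpid()}
    try:
        body = f"request #{REQUESTS_SERVED} for {ctx['path']} in pid {ctx['pid']}\n"
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [body.encode("utf-8")]
    finally:
        ctx.clear()  # torn down as the response is handed back


def call_app(path):
    """Call the app the way a WSGI server would, without any network."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the remaining required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

Calling call_app("/a") and then call_app("/b") in the same process reports request #1 and request #2: the counter survives between requests, while each ctx is built and discarded per request.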
Also, I have talked only about the sync worker in gunicorn, but it also has async workers, which can handle multiple requests concurrently through coroutines. But that's a separate topic.
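To give a rough idea of what handling several requests in one process through coroutines looks like, here is a sketch with the standard library's asyncio. Note the assumption: gunicorn's own async workers are built on gevent or eventlet rather than asyncio, so this only illustrates the general idea, and handle_request and serve_batch are made-up names.

```python
import asyncio
import time


async def handle_request(request_id, io_delay=0.2):
    """Hypothetical handler: the await stands in for a database or HTTP call."""
    await asyncio.sleep(io_delay)  # yields control so other requests can run
    return f"response {request_id}"


async def serve_batch(n_requests=5):
    """Handle n_requests concurrently on a single thread."""
    return await asyncio.gather(*(handle_request(i) for i in range(n_requests)))


start = time.perf_counter()
responses = asyncio.run(serve_batch())
elapsed = time.perf_counter() - start
print(responses, f"{elapsed:.2f}s")
```

All five requests wait at the same time, so the batch takes roughly one delay instead of five. Only one of them is executing Python code at any instant, which is why this helps with waiting but not with CPU-heavy work.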
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n processes at startup and also manages the pool: if a process exits or gets stuck, it kills it, recreates it and adds the new one back to the pool.
Each process handles one request at a time.
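For the layout above, a gunicorn configuration file might look roughly like this. The bind address and the file name are assumptions for illustration; the setting names (bind, worker_class, workers, timeout) are gunicorn's own.

```python
# gunicorn.conf.py -- hypothetical config for the Nginx -> gunicorn -> app layout
import multiprocessing

bind = "127.0.0.1:8000"   # assumed address that Nginx would proxy to
worker_class = "sync"     # each worker process handles one request at a time
workers = multiprocessing.cpu_count() * 2 + 1  # starting point suggested in gunicorn's docs
timeout = 30              # a worker silent for this many seconds is killed and restarted
```

It would be started with something like gunicorn -c gunicorn.conf.py myapp:app, where myapp:app is a placeholder for your own module and WSGI application object.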
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about gunicorn with flask understanding. And this is a great resource to understand flask app context
This is really troublesome for me. I have a Telegram bot that runs on Django and Python 2.7. During development I used django sslserver and everything worked fine. Today I deployed it using gunicorn behind nginx, and the code behaves very differently than it did on my localhost. I tried everything I could, since I have already started getting users, but to no avail. It seems to me that most Python objects lose their state after each request, and this might be what is causing the problems. The library I use has a class that handles the conversation with a Telegram user, and the state of the conversation is stored in a class instance. Sometimes when new requests come in, those values have already been lost. Has anyone faced this, and is there a quick way to solve the problem? I am in a critical situation and need a quick solution.
Gunicorn has a preforking worker model -- meaning that it launches several independent subprocesses, each of which is responsible for handling a subset of the load.
If you're relying on internal application state being consistent across all threads involved in offering your service, you'll want to turn the number of workers down to 1, to ensure that all those threads are within the same process.
Of course, this is a stopgap -- if you want to be able to scale your solution to run on production loads, or have multiple servers backing your application, then you'll want to modify your system to persist the relevant state to a shared store, rather than relying on content being available in-process.
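As a rough illustration of persisting state to a shared store, here is a sketch that keeps the conversation step per user in a SQLite file, so any worker process can read what another one wrote. ConversationStore, the table layout and the step names are all hypothetical; in a Django project the existing database or a cache such as Redis would be the more usual choice.

```python
import os
import sqlite3
import tempfile


class ConversationStore:
    """Keeps per-user conversation state in a SQLite file instead of in memory."""

    def __init__(self, path):
        self.path = path
        self._execute(
            "CREATE TABLE IF NOT EXISTS state (user_id TEXT PRIMARY KEY, step TEXT)"
        )

    def _execute(self, sql, params=()):
        # A fresh connection per call, so nothing is cached inside one worker.
        conn = sqlite3.connect(self.path, timeout=10)
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        finally:
            conn.close()

    def set_step(self, user_id, step):
        self._execute(
            "INSERT OR REPLACE INTO state (user_id, step) VALUES (?, ?)",
            (user_id, step),
        )

    def get_step(self, user_id, default="start"):
        rows = self._execute("SELECT step FROM state WHERE user_id = ?", (user_id,))
        return rows[0][0] if rows else default


# Two store objects on the same file play the role of two gunicorn workers.
db_path = os.path.join(tempfile.mkdtemp(), "conversations.db")
worker_a = ConversationStore(db_path)
worker_b = ConversationStore(db_path)
worker_a.set_step("user-42", "awaiting_phone_number")
print(worker_b.get_step("user-42"))
```

The request that continues the conversation can now land on any worker, because the state no longer lives in a class instance inside one process.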
I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but beside this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about the process completion.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like Cherrypy (which I am more confident with) and have the webapp use a messaging channel to report the processing progress. The web server returns a HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the web servers mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large scale data processing engine that my team implemented. We wanted to build our api calls in Flask, some of which can take many hours to complete, but have a way to notify the user in real time what is going on.
Basically what I came up with was what you described as option 2. On the same machine where I serve the Flask app through Apache, I created a Tornado app that serves a websocket that reports progress to the end user. Once my main page is served, it establishes the websocket connection to the Tornado server, and the Flask app periodically sends updates to the Tornado app, which pushes them down to the end user. Even if the browser is closed during the long-running job, Apache keeps the request alive and processing, and upon logging back in, I can still see the current progress.
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!
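The shape of option 2 (accept the job, return at once, report progress separately) can be sketched with the standard library alone. This is not the Flask/Tornado setup described above: the jobs dictionary stands in for the messaging channel, the worker thread stands in for a separate background process, and all names are made up.

```python
import queue
import threading
import time
import uuid

jobs = {}                      # job_id -> {"file": ..., "status": ..., "progress": ...}
jobs_lock = threading.Lock()
pending = queue.Queue()


def submit(file_name):
    """What the web handler would do: record the job, enqueue it, return at once."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"file": file_name, "status": "queued", "progress": 0}
    pending.put(job_id)
    return job_id


def worker(steps=5, step_seconds=0.01):
    """Background worker: pretends to process each file and publishes progress."""
    while True:
        job_id = pending.get()
        for step in range(1, steps + 1):
            time.sleep(step_seconds)   # stand-in for running an external tool
            with jobs_lock:
                jobs[job_id].update(status="running", progress=100 * step // steps)
        with jobs_lock:
            jobs[job_id]["status"] = "done"
        pending.task_done()


def status(job_id):
    """What a status endpoint or a websocket push would read."""
    with jobs_lock:
        return dict(jobs[job_id])


threading.Thread(target=worker, daemon=True).start()
job = submit("video.mp4")
pending.join()                 # only for the demo; a web handler would not wait
print(status(job))
```

In a real deployment the queue and the progress records would live outside the web process (a broker, a database, or a websocket server as in the answer above), so that several web workers and several processing workers can share them.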
I'm actually new to Python web development. My previous knowledge about web development came from PHP.
In PHP, there's no resource (variable etc.) preserved between two different HTTP requests (except for $_SESSION I guess?)
So if Flask is run by gunicorn, what resource is preserved between two different HTTP requests?
This question came from the Flask documentation. It says we need to register database session close code in app.teardown_request. In my own test, if I didn't register the session close code, the database would accumulate many idle connections.
There are really only two modes of handling web requests:
Spin up the entire application for each request and tear it down after each request. Everything that isn't persisted (to some other process, to disk, or to the client) is destroyed at the end of the request.
Spin up the application and then let it handle more than one request before it gets torn down. Almost everything that isn't specific to the request is preserved between requests.
Mode #1 has more work to do for each request but it ensures that all resources that the application uses are torn down (so you don't have problems with leaking database connections, even if you forget to close them).
Mode #2 has less work to do for each request but it is possible for an application to "leak" resources - as in your example where you leak database connections if you don't explicitly close them.
PHP (running in embedded mode in Apache under mod_php) uses the first mode. Flask (and frameworks in the majority of languages commonly used for web development these days) uses the second.
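The leak in mode #2 is easy to reproduce in a few lines. The sketch below uses an in-memory SQLite connection and a hypothetical handle_request function; the finally block plays the role that a hook like app.teardown_request plays in Flask.

```python
import sqlite3

open_connections = []   # lives as long as the worker process does


def handle_request(close_on_teardown):
    """Hypothetical request handler that opens a database connection."""
    conn = sqlite3.connect(":memory:")
    open_connections.append(conn)
    try:
        return conn.execute("SELECT 1 + 1").fetchone()[0]
    finally:
        # Runs after every request, like a teardown hook.
        if close_on_teardown:
            conn.close()
            open_connections.remove(conn)


# Without teardown: every request leaves one connection behind.
for _ in range(3):
    handle_request(close_on_teardown=False)
leaked = len(open_connections)

# Clean up the leaked connections before the second run.
for conn in open_connections:
    conn.close()
open_connections.clear()

# With teardown: nothing accumulates between requests.
for _ in range(3):
    handle_request(close_on_teardown=True)
after_teardown = len(open_connections)

print(leaked, after_teardown)   # prints: 3 0
```

In mode #1 the first loop would be harmless, because the whole process, connections included, disappears after each request.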
Every time I read either WSGI or CGI I cringe. I've tried reading on it before but nothing really has stuck.
What is it really in plain English?
Does it just pipe requests to a terminal and redirect the output?
From a totally step-back point of view, Blankman, here is my "Intro Page" for Web Server Gateway Interface:
PART ONE: WEB SERVERS
Web servers serve up responses. They sit around, waiting patiently, and then with no warning at all, suddenly:
a client process sends a request. The client process could be a web server, a bot, a mobile app, whatever. It is simply "the client"
the web server receives this request
deliberate mumble various things happen (see below)
The web server sends back something to the client
web server sits around again
Web servers (at least, the better ones) are VERY good at this. They scale up and down processing depending on demand, they reliably hold conversations with the flakiest of clients over really cruddy networks, and we never really have to worry about it. They just keep on serving.
This is my point: web servers are just that: servers. They know nothing about content, nothing about users, nothing in fact other than how to wait a lot and reply reliably.
Your choice of web server should reflect your delivery preference, not your software. Your web server should be in charge of serving, not processing or logical stuff.
PART TWO: (PYTHON) SOFTWARE
Software does not sit around. Software only exists at execution time. Software is not terribly accommodating when it comes to unexpected changes in its environment (files not being where it expects, parameters being renamed etc). Although optimisation should be a central tenet of your design (of course), software itself does not optimise. Developers optimise. Software executes. Software does all the stuff in the 'deliberate mumble' section above. Could be anything.
Your choice or design of software should reflect your application, your choice of functionality, and not your choice of web server.
This is where the traditional method of "compiling in" languages to web servers becomes painful. You end up putting code in your application to cope with the physical server environment or, at least, being forced to choose an appropriate 'wrapper' library to include at runtime, to give the illusion of uniformity across web servers.
SO WHAT IS WSGI?
So, at last, what is WSGI? WSGI is a set of rules, written in two halves. They are written in such a way that they can be integrated into any environment that welcomes integration.
The first part, written for the web server side, says "OK, if you want to deal with a WSGI application, here's how the software will be thinking when it loads. Here are the things you must make available to the application, and here is the interface (layout) that you can expect every application to have. Moreover, if anything goes wrong, here's how the app will be thinking and how you can expect it to behave."
The second part, written for the Python application software, says "OK, if you want to deal with a WSGI server, here's how the server will be thinking when it contacts you. Here are the things you must make available to the server, and here is the interface (layout) that you can expect every server to have. Moreover, if anything goes wrong, here's how you should behave and here's what you should tell the server."
So there you have it - servers will be servers and software will be software, and here's a way they can get along just great without one having to make any allowances for the specifics of the other. This is WSGI.
mod_wsgi, on the other hand, is a plugin for Apache that lets it talk to WSGI-compliant software, in other words, mod_wsgi is an implementation - in Apache - of the rules of part one of the rule book above.
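To make the two halves concrete, here is a small sketch using only the standard library. application is the software half; fake_server_call is a stripped-down stand-in for the server half (a real server such as mod_wsgi does far more), and both names are chosen for illustration.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Application half: a callable taking the request environ and a callback."""
    body = f"Hello from {environ['PATH_INFO']}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]   # an iterable of bytes


def fake_server_call(app, path):
    """Server half, reduced to the bare contract: build environ, supply
    start_response, call the app, and collect what it returns."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)   # fills in the other required keys
    response = {}

    def start_response(status, headers, exc_info=None):
        response["status"] = status
        response["headers"] = dict(headers)

    response["body"] = b"".join(app(environ, start_response))
    return response


print(fake_server_call(application, "/hello"))
```

The application never learns which server called it, and the server never learns what the application does internally; the callable signature is the whole agreement.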
As for CGI.... ask someone else :-)
WSGI runs the Python interpreter on web server start, either as part of the web server process (embedded mode) or as a separate process (daemon mode), and loads the script into it. Each request results in a specific function in the script being called, with the request environment passed as arguments to the function.
CGI runs the script as a separate process each request and uses environment variables, stdin, and stdout to "communicate" with it.
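A CGI program, by contrast, can be sketched as a function over environment variables, stdin and stdout. cgi_main and run_as_cgi are hypothetical names; the environment variable names (REQUEST_METHOD, QUERY_STRING, CONTENT_LENGTH) are the standard CGI ones.

```python
import io
import os
import sys


def cgi_main(environ, stdin, stdout):
    """Handle exactly one request, CGI-style; the process would then exit."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else ""

    # The response is a header block, a blank line, then the body.
    stdout.write("Content-Type: text/plain\r\n\r\n")
    stdout.write(f"method={method} query={query} body={body}\n")


def run_as_cgi():
    """What the script would call when the web server launches it."""
    cgi_main(os.environ, sys.stdin, sys.stdout)


# Simulate one request without a web server.
out = io.StringIO()
cgi_main(
    {"REQUEST_METHOD": "POST", "QUERY_STRING": "a=1", "CONTENT_LENGTH": "5"},
    io.StringIO("hello"),
    out,
)
print(out.getvalue())
```

Every request pays for a new interpreter start under CGI, which is the main practical difference from the WSGI model, where the function is called inside an interpreter that is already running.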
Both CGI and WSGI define standard interfaces that programs can use to handle web requests. The CGI interface is at a lower level than WSGI, and involves the server setting up environment variables containing the data from the HTTP request, with the program returning something formatted pretty much like a bare HTTP server response.
WSGI, on the other hand, is a Python-specific, slightly higher-level interface that allows programmers to write applications that are server-agnostic and which can be wrapped in other WSGI applications (middleware).
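Wrapping one WSGI application in another takes only a few lines. In this sketch, hello_app and add_header_middleware are made-up names and the X-Served-By header is arbitrary; the point is that the wrapper is itself a WSGI application.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """Innermost WSGI application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


def add_header_middleware(app, name, value):
    """Wrap any WSGI app and append one response header."""
    def wrapped(environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, list(headers) + [(name, value)], exc_info)
        return app(environ, patched_start_response)
    return wrapped


wrapped_app = add_header_middleware(hello_app, "X-Served-By", "sketch")

# Call the wrapped app directly, the way a server would.
environ = {}
setup_testing_defaults(environ)
seen = {}


def start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


body = b"".join(wrapped_app(environ, start_response))
print(seen["headers"], body)
```

Because wrapped_app has the same signature as hello_app, any WSGI server can run it, and further middleware can be stacked on top in the same way.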
If you are unclear on all the terms in this space, and let's face it, it's a confusing acronym-laden one, there's also a good background reader in the form of an official python HOWTO which discusses CGI vs. FastCGI vs. WSGI and so on. I wish I'd read it first.
I have a Pylons web application served by Apache (mod_wsgi, prefork). Because of Apache, there are multiple separate processes running my application code concurrently. I want to defer some of the application's non-critical tasks to background processing to improve "live" response times. So I'm thinking of a task queue: many Apache processes add tasks to this queue, and a single separate Python process handles them one by one and removes them from the queue.
The queue should preferably be persisted to disk so queued unprocessed tasks are not lost because of power outage, server restart etc. The question is what would be a reasonable way to implement such queue?
As for the things I've tried: I started with simple SQLite database and single table in it for storing queue items. In load testing, when increasing level of concurrency, I started getting "database locked" errors, as expected. The quick'n'dirty fix was to replace SQLite with MySQL--it handles concurrency issues well but feels like an overkill for the simple thing I need to do. Queue-related DB operations also show up prominently in my profiling reports.
A message broker like Apache's ActiveMQ is an ideal solution here.
The pipeline could be as follows:
The application process responsible for handling HTTP requests generates replies quickly and sends low-priority, heavy tasks to an AMQ queue.
One or more other processes subscribe to the AMQ queue, consume it, and do whatever these heavy tasks require.
The requirement of queue persistence is fulfilled out of the box since ActiveMQ stores messages that are not yet consumed in persistent storage. Furthermore it scales quite well since you're free to deploy multiple HTTP-apps, multiple consumer apps and AMQ itself on different machines each.
We use something like this in our project written in Python utilizing STOMP as underlying communication protocol.
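The producer/consumer shape of this pipeline can be sketched without a broker. To be clear, the code below is not ActiveMQ or STOMP: it uses a SQLite table as a stand-in persistent queue, with made-up function names, purely to show how the HTTP processes and the consumer divide the work.

```python
import os
import sqlite3
import tempfile


def connect(path):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    # timeout: wait for a lock instead of failing immediately.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def enqueue(conn, payload):
    """Producer side: called by the HTTP-handling processes."""
    conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))


def dequeue(conn):
    """Consumer side: claim the oldest unprocessed task, or return None."""
    conn.execute("BEGIN IMMEDIATE")   # take the write lock up front
    try:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE done = 0 ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE tasks SET done = 1 WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[1] if row else None


path = os.path.join(tempfile.mkdtemp(), "queue.db")
producer = connect(path)   # stands in for a web process
consumer = connect(path)   # stands in for the background process
enqueue(producer, "resize image 17")
enqueue(producer, "send welcome email")
first, second, third = dequeue(consumer), dequeue(consumer), dequeue(consumer)
print(first, second, third)
```

One simplification to be aware of: the sketch marks a task as done when it is claimed, so a consumer crash mid-task would lose it. A real broker acknowledges a message only after it has been processed, and also takes care of delivery across machines.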
A web server (any web server) is a multi-producer, single-consumer process.
A simple solution is to build a wsgiref or Werkzeug backend server to handle your backend requests.
Since this "backend" server is built using WSGI technology, it's very, very similar to the front-end web server, except that it doesn't produce HTML responses (JSON is usually simpler). Other than that, it's very straightforward.
You design RESTful transactions for this backend. You use all of the various WSGI features for URI parsing, authorization, authentication, etc. You -- generally -- don't need session management, since RESTful servers don't usually offer sessions.
If you get into serious scalability issues, you simply wrap your backend server in lighttpd or some other web engine to create a multi-threaded backend.
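A backend of this kind might look roughly as follows with wsgiref. The /tasks/<id> route, the TASKS sample data and the port are assumptions made up for the sketch, and wsgiref's server is meant for development rather than production.

```python
import json
from wsgiref.simple_server import make_server
from wsgiref.util import setup_testing_defaults

TASKS = {"1": {"name": "resize image", "state": "queued"}}   # made-up sample data


def backend_app(environ, start_response):
    """Tiny JSON backend: GET /tasks/<id> returns one task, anything else a 404."""
    parts = environ.get("PATH_INFO", "/").strip("/").split("/")
    task = None
    if environ["REQUEST_METHOD"] == "GET" and len(parts) == 2 and parts[0] == "tasks":
        task = TASKS.get(parts[1])
    status = "200 OK" if task is not None else "404 Not Found"
    result = task if task is not None else {"error": "not found"}
    payload = json.dumps(result).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(payload))),
    ])
    return [payload]


def serve(host="127.0.0.1", port=8001):
    """Run the backend with the stdlib reference server (not called here)."""
    with make_server(host, port, backend_app) as httpd:
        httpd.serve_forever()


def call(path):
    """Exercise the app directly, without opening a socket."""
    environ = {"PATH_INFO": path, "REQUEST_METHOD": "GET"}
    setup_testing_defaults(environ)
    out = {}

    def start_response(status, headers, exc_info=None):
        out["status"] = status

    body = b"".join(backend_app(environ, start_response))
    return out["status"], json.loads(body)


print(call("/tasks/1"))
```

Because backend_app is a plain WSGI callable, the same code can later be moved behind a different server without changes.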