There are a bunch of techniques I can think of for doing this:
Setting up a replica web-server on a different port and/or IP, then using DNS as load-balancer; restarting one server at a time
Utilising more explicit load-balancing (which PaaS such as Heroku and OpenShift use) with implicit replicas
Using some in-built mechanism (e.g.: in nginx)
I am working within an IaaS solution, and will be setting up git and some listeners to handle this whole setup.
What's the best method of restarting the web-server—so my latest revision of my Python web-app can go live—without noticeably affecting site visitors/users/clients?
The simpler the better, no silver bullet.
For single server, gracefully restart mechanism can be helpful. It will start new processes to accept new requests, and maintain the old processes till the old requests finished. Nginx already using this, see http://wiki.nginx.org/CommandLine#Stopping_or_Restarting_Nginx
For multiple servers, using reverse proxy is a good practice. An example structure looks like this, and it can be easily build using Nginx:
If some of backend servers broken down, the reverse proxy can dispatch requests to other healthy servers and will not affect users. You can customize the load balancing strategy to do fine-grained control. And you can also flexible add server for scaling up, or pick off server for trouble shooting or code updating.
Related
I know GIL blocks python from running its threads across cores. If it does so, why python is being used in webservers, how are the companies like youtube, instagram handling it.
PS: I know alternatives like multiprocessing can solve it. But it would be great if anyone can post it with a scenario that was handled by them.
Python is used for server-side handling in webservers, but not (usually) as webserver.
On normal setup: we have have Apache or other webserver to handles a lot of processes (server-side) (python uses usually wsgi). Note usually apache handles directly "static" files. So we have one apache server, many parallel apache processes (to handle connection and basic http) and many python processes which handles one connection per time.
Each of such process are independent each others (they just use the same resources), so you can program your server side part easily, without worrying about deadlocks. It is mostly a trade-off: performance of code, and easy and quickly to produce code without huge problems. But usually webserver with python scale very well (also on large sites), and servers are cheaper then programmers.
Note: security is also increased by having just one request in a process.
GIL exists in CPython, (Python interpreter made in C and most used), other interpreter versions such as Jython or IronPython don't have such problem, because they don't have GIL.
Even though, using CPython you can still have concurrency, just do your thing in C and then "link it" in your Python code, just like Numpy or similar do.
Other thing is, even though you have your page using Flask or Django, when you set up it in a production server, you have an Apache or Nginx, etc which has a real charge balancer (or load balancer, I can't remember the name in english now) that can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
Although a bit late, but I will try to give a generic and useful answer.
#Giacomo Catenazzi's answer is a good one but some part of it is factually incorrect.
API requests (or other form of web requests) are served from an already running process. The creation of this 'already running' process is handled by some webserver like gunicorn which on startup creates specified number of processes that are running the code in your web application continuously waiting to serve any incoming request.
Needless to say, each of these processes are limited by the GIL to only run one thread at a time. But one process in its lifetime handles more than one (normally many) request. Here it would be better if we could understand the flow of a request.
We will take an example of flask but this is applicable to most web frameworks. When a request comes from Nginx, it is handed over to gunicorn which interacts with your web application via wsgi. When the request reaches to the framework, an app context is created and some variables are pushed into the app-context. Then it follows the normal route that mostly people are familiar with: routing, db calls, response creation and so on. The response is then handed back to the gunicorn via wsgi again. At the time of handing over the response, the app context is teared down. So it's the app context, not the process that is created on every new request.
Also, I have talked only about the sync worker in gunicorn but it also has an option of async worker which can handle multiple requests in parallel through coroutines. But thats a separate topic.
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n number of processes at the start and also manages the pool in the sense that if a process exits or gets stuck, it kills/recreates ans adds that to the pool.
Each process handling 1 request at a time.
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about gunicorn with flask understanding. And this is a great resource to understand flask app context
This is really troublesome for me. I have a telegram bot that runs in django and python 2.7. During development I used django sslserver and everything worked fine. Today I deployed it using gunicorn in nginx and the code works very different than it did on my localhost. I tried everything I could since I already started getting users, but all to no avail. It seems to me that most python objects lose their state after each request and this is what might be causing the problems. The library I use has a class that handles conversation with a telegram user and the state of the conversation is stored in a class instance. Sometimes when new requests come, those values would already be lost. Please has anyone faced this? and is there a way to solve the problem quick? I am in a critical situation and need a quick solution
Gunicorn has a preforking worker model -- meaning that it launches several independent subprocesses, each of which is responsible for handling a subset of the load.
If you're relying on internal application state being consistent across all threads involved in offering your service, you'll want to turn the number of workers down to 1, to ensure that all those threads are within the same process.
Of course, this is a stopgap -- if you want to be able to scale your solution to run on production loads, or have multiple servers backing your application, then you'll want to be modify your system to persist the relevant state to a shared store, rather than relying on content being available in-process.
I want to rate limit my Flask API. I found 2 solutions.
The Flask-Limiter extension.
A snippet from the Flask website using Redis: http://flask.pocoo.org/snippets/70/
What is the significance of Redis when Flask-Limiter is able to rate limit the request on the basis of remote address without Redis?
Redis allows you to store the rate-limiting state in a persistent store.
This means you can:
Restart your web server or web application and still have the rate-limitation work. You won't lose the records of the last requests made because of the working process being destroyed and a new one being created, instead.
Use multiple web servers or web applications. This is because the rate-limitation state is stored in an external data store that also solves the issue of shared data synchronisation and data races. You can run as many web servers as you wish - the rate-limitation is shared among all of them.
Look at the rate-limitation state. Redis offers easy CLI tools that allow you to look at the current active data in an ad-hoc manner, even MONITORing the incoming commands and requests.
Let Redis manage TTL, LRU etc for rate-limitation algorithms. Redis supports this intrinsically.
I think I don't completely understand the deployment process. Here is what I know:
when we need to do hot deployment -- meaning that we need to change the code that is live -- we can do it by reloading the modules, but
imp.reload is a bad idea, and we should restart the application instead of reloading the changed modules
ideally the running code should be a clone of your code repository, and any time you need to deploy, you just pull the changes
Now, let's say I have multiple instances of wsgi app running behind a reverse proxy like nginx (on ports like 8011, 8012, ...). And, let's also assume that I get 5 requests per second.
Now in this case, how should I update my code in all the running instances of the application.
If I stop all the instances, then update all of them, then restart them all -- I will certainly lose some requests
If I update each instance one by one -- then the instances will be in inconsistent states (some will be running old code, and some new) until all of them are updated. Now if a request hits an updated instance, and then a subsequent (and related) request hits an older instance (yet to be updated) -- then I will get wrong results.
Can somebody explain thoroughly how busy applications like this are hot-deployed?
For deployment across several hot instances that are behind a load balancer like nginx I like to do rolling deployments with a tool like Fabric.
Fabric connects you to Server 1
Shut down the web-server
Deploy changes, either by using your VCS or transferring tarball with the new application
Start up the web-server
GOTO1 and connect to the next server.
That way you're never offline, and it's seamless as nginx knows when a webserver is taken down when it tries to round-robin to it and will move onto the next one instead, and as soon as the node/instance is back up it will be back into production usage.
EDIT:
You can use the ip_hash module in nginx to ensure all requests from one IP Address goes to the same server for the length of the session
This directive causes requests to be distributed between upstreams based on the IP-address of the client.
The key for the hash is the class-C network address of the client. This method guarantees that the client request will always be transferred to the same server. But if this server is considered inoperative, then the request of this client will be transferred to another server. This gives a high probability clients will always connect to the same server.
What this means to you, is that once your web-server is updated and a client has connected to the new instance, all connections for that session will continue to be forwarded to the same server.
This does leave you in the situation of
Client connects to site, gets served from Server 1
Server 1 is updated before client finishes whatever they're doing
Client potentially left in a state of limbo?
This scenario begs the question, are you removing things from your API/Site which could potentially leave the client in a state of limbo ? If all you're doing is for example updating UI elements or adding pages etc but not changing any back-end APIs you should not have any problems. If you are removing API functions, you might end up with issues.
Couldn't you take half your servers offline (say by pulling them out of the load balancing pool) and then update those. Then bring them back online while simultaneously pulling down the other half. Then update those and bring them back online.
This will ensure that you stay online while also ensuring that you never have the old and new versions of your application online at the same time. Yes, this will mean that your site would run at half its capacity during the time. But that might be ok?
I need to build a webservice that is very computationally intensive, and I'm trying to get my bearings on how best to proceed.
I expect users to connect to my service, at which point some computation is done for some amount of time, typically less than 60s. The user knows that they need to wait, so this is not really a problem. My question is, what's the best way to structure a service like this and leave me with the least amount of headache? Can I use Node.js, web.py, CherryPy, etc.? Do I need a load balancer sitting in front of these pieces if used? I don't expect huge numbers of users, perhaps hundreds or into the thousands. I'll need a number of machines to host this number of users, of course, but this is uncharted territory for me, and if someone can give me a few pointers or things to read, that would be great.
Thanks.
Can I use Node.js, web.py, CherryPy, etc.?
Yes. Pick one. Django is nice, also.
Do I need a load balancer sitting in front of these pieces if used?
Almost never.
I'll need a number of machines to host this number of users,
Doubtful.
Remember that each web transaction has several distinct (and almost unrelated) parts.
A front-end (Apache HTTPD or NGINX or similar) accepts the initial web request. It can handle serving static files (.CSS, .JS, Images, etc.) so your main web application is uncluttered by this.
A reasonably efficient middleware like mod_wsgi can manage dozens (or hundreds) of backend processes.
If you choose a clever backend processing component like celery, you should be able to distribute the "real work" to the minimal number of processors to get the job done.
The results are fed back into Apache HTTPD (or NGINX) via mod_wsgi to the user's browser.
Now the backend processes (managed by celery) are divorced from the essential web server. You achieve a great deal of parallelism with Apache HTTPD and mod_wsgi and celery allowing you to use every scrap of processor resource.
Further, you may be able to decompose your "computationally intensive" process into parallel processes -- a Unix Pipeline is remarkably efficient and makes use of all available resources. You have to decompose your problem into step1 | step2 | step3 and make celery manage those pipelines.
You may find that this kind of decomposition leads to serving a far larger workload than you might have originally imagined.
Many Python web frameworks will keep the user's session information in a single common database. This means that all of your backends can -- without any real work -- move the user's session from web server to web server, making "load balancing" seamless and automatic. Just have lots of HTTPD/NGINX front-ends that spawn Django (or web.py or whatever) which all share a common database. It works remarkably well.
I think you can build it however you like, as long as you can make it an asynchronous service so that the users don't have to wait.
Unless, of course, the users don't mind waiting in this context.
I'd recommend using nginx as it can handle rewrite/balancing/ssl etc with a minimum of fuss
If you want to make your web sevices asynchronous you can try Twisted. It is a framework oriented to asynchronous tasks and implements so many network protocols. It is so easy to offer this services via xml-rpc (just put xmlrpc_ as the prefix of your method). On the other hand it scales very well with hundreds and thousands of users.
Celery is also a good option to make the most computionally intensive tasks asynchronous. It integrates very well with Django.