Differentiate nginx, haproxy, varnish and uWSGI/Gunicorn [closed] - python

I am really new to sysadmin stuff, and have only provisioned a VPS with nginx (serving the static files) and gunicorn as the web server.
I have lately been reading about related tooling, and came to know about these tools:
nginx : high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server
haproxy : high performance load balancer
varnish : caching HTTP reverse proxy
gunicorn : Python WSGI HTTP server
uwsgi : another Python WSGI server
I have been reading about all of the above five tools and I have confused myself as to which one is used for what purpose. Could someone please explain to me, in layman's terms, what each tool is used for, how they are used together, and which specific concern each of them addresses?

Let's say you plan to host a few websites on your new VPS. Let's look at the tools you might need for each site.
HTTP Servers
Website 'Alpha' just consists of some pure HTML, CSS and JavaScript. The content is static.
When someone visits website Alpha, their browser will issue an HTTP request. You have configured (via DNS and name server configuration) that request to be directed to the IP address of your VPS. Now you need your VPS to be able to accept that HTTP request, decide what to do with it, and issue a response that the visitor's browser can understand. You need an HTTP server, such as Apache httpd or NGINX, and let's say you do some research and eventually decide on NGINX.
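To make "HTTP server" concrete: the whole job is to accept a request for a path and write back a status line, headers and a body. Here is a toy sketch using only Python's standard library (nothing like NGINX internals, just the job description):

    # Toy illustration of an HTTP server's job, not production code;
    # `python -m http.server` does the same thing from the command line.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    # SimpleHTTPRequestHandler maps request paths to files in the
    # current directory and writes back status, headers and body.
    HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()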
Application Servers
Website 'Beta' is dynamic, written using the Django Web Framework.
WSGI is a protocol that describes the interface between a Python application (the Django app) and an application server. So what you need now is a WSGI app server, which will be able to understand web requests, make appropriate 'calls' to the application's various objects, and return the results. You have many options here, including gunicorn and uWSGI. Let's say you do some research and eventually decide on uWSGI.
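The interface itself is tiny. For illustration, a complete WSGI application is just a callable; any WSGI server can host it:

    # A complete WSGI application: the server calls it once per request,
    # passing the request environment and a callback for status/headers.
    def application(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"Hello from WSGI\n"]

    # Served by either app server, e.g.:
    #   uwsgi --http :8000 --wsgi-file thisfile.py
    #   gunicorn thisfile:application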
uWSGI can accept and handle HTTP requests for static content as well, so if you wanted to you could have website Alpha served entirely by NGINX and website Beta served entirely by uWSGI. And that would be that.
Reverse Proxy Servers
But uWSGI has poor performance in dealing with static content, so you would rather use NGINX for static content like images, even on website Beta. But then something would have to distinguish between requests and send them to the right place. Is that possible?
It turns out NGINX is not just an HTTP server but also a reverse proxy server: it is capable of forwarding incoming requests somewhere else, like your uWSGI application server (or many other places), collecting the response(s) and sending them back to the original requester. Awesome! So you configure all incoming requests to go to NGINX, which will serve up static content itself or, when required, pass the request on to the app server.
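In practice you would express this in the NGINX config (one location block serving files from disk, another using proxy_pass to hand requests to uWSGI), but the mechanic is simple enough to sketch in Python; the upstream port here is made up:

    # A toy reverse proxy, only to show the mechanic: decide per request
    # whether to answer locally or forward upstream and relay the reply.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    class Proxy(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/static/"):
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"...the file from disk would go here...")
            else:
                # forward to the app server (hypothetical port 8001)
                upstream = urlopen("http://127.0.0.1:8001" + self.path)
                self.send_response(upstream.status)
                self.end_headers()
                self.wfile.write(upstream.read())

    HTTPServer(("0.0.0.0", 8000), Proxy).serve_forever()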
Load Balancing with multiple web servers
You are also hosting Website Gamma, which is a blog that is popular internationally and receives a ton of traffic.
For Gamma you decide to set up multiple web servers. All incoming requests go to your original VPS with NGINX, and you configure NGINX to forward each request to one of several other web servers in round-robin fashion, and return the response to the original requester.
HAProxy is a server that specializes in balancing loads for high-traffic sites. In this case, you were able to use NGINX to handle traffic for site Gamma. In other scenarios, one may choose to set up a high-availability cluster: e.g., send all requests to a server like HAProxy, which intelligently distributes traffic across a cluster of nginx servers similar to your original VPS.
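The round-robin idea itself fits in a few lines. In nginx it is an upstream block, in HAProxy a backend with balance roundrobin; conceptually (addresses made up):

    # Round-robin in miniature: each request gets the next backend.
    from itertools import cycle

    backends = cycle(["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"])

    def pick_backend():
        return next(backends)

    for _ in range(4):
        print(pick_backend())  # .1, .2, .3, then .1 again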
Cache Server
Website Gamma exceeded the capacity of your VPS due to the sheer volume of traffic. Let's say you instead hosted website Delta, and your web server is unable to handle Delta because of a popular feature that is very content-heavy.
A cache server is able to recognize what media content is being frequently requested and store this content differently, such that it can be more quickly served. This is achieved by reducing disk IO operations; the popular content can be stored in memory or virtual memory instead. You might decide to combine your existing NGINX stack with a technology like Varnish or Memcached to achieve this type of optimization and serve website Delta more effectively.
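What the cache buys you, in miniature: pay the slow fetch once, then answer repeats from memory. A sketch (the docroot path is hypothetical; Varnish adds TTLs, eviction and HTTP-aware invalidation on top of this idea):

    cache = {}

    def expensive_fetch(path):
        # the part you want to avoid repeating: disk IO or an app call
        with open("/var/www" + path, "rb") as f:
            return f.read()

    def serve(path):
        if path not in cache:
            cache[path] = expensive_fetch(path)  # first hit pays the cost
        return cache[path]  # every repeat is a memory lookup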

I will put a very concise (very informal) description for each one, in the order they would be hit when you make a request from your web browser:
HAProxy balances your traffic load, so if your webpage is receiving 5000 hits per second, you can't handle that with only one web server; HAProxy will balance the hits among the web servers you have behind it.
Varnish is a cache server: it sits in front of your web servers and behind HAProxy, so if a resource is already cached, Varnish will serve the request itself instead of passing it to the web servers behind it.
nginx, gunicorn and uwsgi are web servers that would sit behind Varnish and get the requests that Varnish lets pass through. These web servers use optimized designs to handle high loads (requests per second).

First, gunicorn and uwsgi are both app servers. In other words, they are responsible for running your Python code in a stable and performant manner, usually as a backend to a regular web server.
The web server would be nginx; it excels at serving static assets and passing the requests for dynamic content on to the app servers.
If the above doesn't give enough performance, you add in varnish between nginx and the client; it should speed up repeated requests for the same thing.
haproxy is a load balancer: if you have several servers for the same content, this software will attempt to distribute requests among them optimally.
so basically:
your Python code lives in the app server (uwsgi or gunicorn)
your static web assets live in nginx
haproxy and varnish are software that allow you to better serve very large amounts of requests

Related

How to interpose RabbitMQ between REST client and (Python) REST server?

If I develop a REST service hosted in Apache with a Python plugin which services GET, PUT, DELETE and PATCH, and this service is consumed by an Angular client (or other REST-interacting browser technology), then how do I make it scalable with RabbitMQ (AMQP)?
Potential Solution #1
Multiple Apache instances still face off against the browser's HTTP calls.
Each Apache instance uses an AMQP plugin and then posts a message to a queue.
Python microservices monitor the queue, pull a message, service it and return a response.
The response is passed back to the Apache plugin, and in turn Apache generates the HTTP response.
Does this mean the Python microservice no longer has any HTTP server code at all? That would change that component a lot. It is perhaps best to decide upfront if you want to use this pattern, as it seems it would be quite a task to rip out any HTTP server code.
Other potential solutions? I am genuinely puzzled as to how we're supposed to take a classic REST server component and upgrade it to be scalable with RabbitMQ/AMQP with minimal disruption.
I would recommend switching from WSGI to ASGI (nginx can help here). I'm not sure why you think RabbitMQ is the solution to your problem, as nothing you described seems like it would be solved by using this method.
ASGI is not supported by Apache as far as I know, but it allows the server to go do work, and while it's working it can continue to service new requests that come in (a gross oversimplification).
If for whatever reason you really want to use job workers (RabbitMQ, etc.) then I would suggest returning a "token" to the user (really just the job_id); they can then call back with that token, and you report back either the current job status or the result, as sketched below.
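A minimal sketch of that token pattern, assuming Flask plus Celery with RabbitMQ as the broker; the task name process_order and the module layout are hypothetical:

    from celery.result import AsyncResult
    from flask import Flask, jsonify

    from tasks import celery_app, process_order  # hypothetical Celery app/task

    app = Flask(__name__)

    @app.route("/orders", methods=["POST"])
    def submit():
        job = process_order.delay()  # enqueue via RabbitMQ, return immediately
        return jsonify({"token": job.id}), 202

    @app.route("/orders/<token>")
    def status(token):
        result = AsyncResult(token, app=celery_app)
        if result.ready():
            return jsonify({"status": "done", "result": result.get()})
        return jsonify({"status": result.state})  # e.g. PENDING / STARTED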

Can Flask run on Gunicorn alone?

I'm currently developing an HTTP REST API server using Flask and Gunicorn. For various reasons, it is not possible to put a reverse proxy server in front of Gunicorn. I don't have any static media, and all URLs are served by @app.route patterns in the Flask framework. Can Flask run on Gunicorn alone?
It could, but it is a very bad idea. Gunicorn does not work well without a proxy in front doing request and response buffering for slow clients.
Without buffering, the gunicorn worker has to wait until the client has sent the whole request, and then has to wait until the client has read the whole response.
This can be a serious problem if there are clients on a slow network, for example.
http://docs.gunicorn.org/en/latest/deploy.html?highlight=buffering
see also: http://blog.etianen.com/blog/2014/01/19/gunicorn-heroku-django/
Because Gunicorn has a relatively small (2x CPU cores) pool of workers, it can only handle a small number of concurrent requests. If all the worker processes become tied up waiting for network traffic, the entire server will become unresponsive. To the outside world, your web application will cease to exist.
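For what it's worth, running it is trivial; the danger is just what the linked posts describe. A minimal sketch (the module name app.py is assumed):

    # app.py
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/ping")
    def ping():
        return jsonify({"pong": True})

    # Run directly under Gunicorn, no reverse proxy:
    #   gunicorn --workers 4 --bind 0.0.0.0:8000 app:app
    # With 4 sync workers, 4 slow clients are enough to tie it all up.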

How do I make Django admin URLs accessible to localhost only?

What is the simplest way to make Django /admin/ urls accessible to localhost only?
Options I have thought of:
Separate the admin site out of the project (somehow) and run it as a different virtual host (in Apache2)
Use a proxy in front of the hosting (Apache2) web server
Restrict the URL in Apache within WSGI somehow.
Is there a standard approach?
Thanks!
I'd go for Apache configuration:
<Location /admin>
    # deny everyone first, then let localhost back in
    Order Deny,Allow
    Deny from all
    Allow from 127.0.0.1
</Location>
HTH.
I'd go for the Apache configuration + run a proxy in front + restrict in WSGI:
I dislike Apache for communicating with web clients when dynamic content generation is involved. Because of its execution model, a slow or disconnected client can tie up the Apache process. If you have a proxy in front (I prefer nginx, but even a vanilla Apache will do), the proxy will worry about the clients and Apache can focus on a new dynamic content request.
Depending on your Apache configuration, a process can also slurp a lot of memory and hold onto it until it hits MaxRequestsPerChild. If you have memory-intensive code in /admin (many people do), you can end up with Apache processes that grab a lot more memory than they need. If you split your Apache config into /admin and /!admin, you can tweak your Apache settings to have a larger number of /!admin servers, which require a smaller potential footprint.
I'm paranoid about server setups, so:
I want to ensure the proxy only sends /admin to a certain Apache port.
I want to ensure that Apache only receives /admin on that certain Apache port, and that it came from the proxy (with a secret header) or from localhost.
I want to ensure that WSGI is only running the /admin stuff based on certain server/client conditions, e.g. with a middleware like the sketch below.
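For that third layer, a sketch of a WSGI middleware that refuses /admin unless the request came from localhost. This assumes REMOTE_ADDR is trustworthy at this hop, i.e. the proxy in front passes the real client address or only localhost traffic reaches this port:

    def admin_localhost_only(application):
        def middleware(environ, start_response):
            path = environ.get("PATH_INFO", "")
            addr = environ.get("REMOTE_ADDR", "")
            if path.startswith("/admin") and addr != "127.0.0.1":
                start_response("403 Forbidden", [("Content-Type", "text/plain")])
                return [b"Forbidden"]
            return application(environ, start_response)
        return middleware

    # in your WSGI file, wrap the Django application, e.g.:
    #   from django.core.wsgi import get_wsgi_application
    #   application = admin_localhost_only(get_wsgi_application())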

Python framework for an extremely lightweight python webservice

I want to develop an extremely lightweight web service with a RESTful JSON API. I will do all the session management on the server side. The solution will receive several 100k (or more) API calls an hour and return (compressed) JSON as the response; it should be able to scale effortlessly.
Security is naturally important, but I want to avoid heavyweight frameworks like Django etc., and would preferably use a web server like nginx or lighttpd instead of Apache.
At the server end, this is all I need:
user session management
security (protection against at least the more common attacks such as cross-site request forgery etc.)
url routing
http utilities (e.g. compression)
I am aware of web2py, but its deployment options seem 'not well thought out' at best; so far, I have been unable to get it to work with Apache, despite following the user manuals.
Can anyone suggest a python framework (and web server) best suited for this task?
If you want to go really lightweight, you might try WSGI itself without a framework, or maybe Flask (see the sketch below). I understand WSGI runs on lighttpd; you'll get some hits on a Google search.
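A sketch of how little code that takes with Flask; the route and payload are made up:

    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/api/items/<int:item_id>")
    def get_item(item_id):
        return jsonify({"id": item_id, "status": "ok"})

    # Behind nginx or lighttpd you would run this under a WSGI server;
    # compression (gzip) is best left to the front-end web server.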
Try Pyramid. It's fast, lightweight, and has a lot of options to configure your environment as you like; a minimal app is sketched below.
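For comparison, a minimal Pyramid app, following its standard hello-world shape:

    from wsgiref.simple_server import make_server
    from pyramid.config import Configurator
    from pyramid.response import Response

    def hello(request):
        return Response("Hello")

    config = Configurator()
    config.add_route("hello", "/")
    config.add_view(hello, route_name="hello")
    app = config.make_wsgi_app()  # a plain WSGI app, ready for any WSGI server

    make_server("0.0.0.0", 6543, app).serve_forever()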

How to control Apache via Django to connect to mongoose(another HTTP server)?

I have been doing lots of searching and reading to solve this.
The main goal is to let a Django-based web management system connect to a device which runs an HTTP server as well. Django will handle the user's request, ask the device for the real data, then feed it back to the user.
Now I have a "kinda-work-in-concept" solution:
Browser -> Apache Server: the browser has jQuery and HTML/CSS to collect the user's request.
Apache Server -> Device HTTP Server:
Apache + mod_python (or, some say, Apache + mod_wsgi?), so I might control Apache to do stuff like building up a session and cookies to record logins.
But this is the issue that actually bugs me:
How do I make it work? What do I use to build up the socket connection between these two servers?
You could use httplib or urllib2 (both supplied in the Python standard library) in your Django view to send HTTP requests to the device running mongoose.
Alternatively you could use the Requests library which provides a less verbose API for generating HTTP requests - see also this blog post.
(Also, I would strongly recommend that you use mod_wsgi rather than mod_python, as mod_wsgi is actively maintained and performs better than mod_python, which was last updated in 2007.)
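A sketch of the Django-view side with the Requests library; the device address and endpoint are hypothetical:

    import requests
    from django.http import JsonResponse

    DEVICE_URL = "http://192.168.1.50:8080"  # hypothetical device address

    def device_status(request):
        # forward the user's request to the device's HTTP server...
        resp = requests.get(DEVICE_URL + "/status", timeout=5)
        resp.raise_for_status()
        # ...and feed the device's data back to the user
        return JsonResponse(resp.json())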
If you have control over what runs on the device side, consider using XML-RPC to talk from client to server.
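A sketch of that option using only the Python 3 standard library; the method name, port and address are hypothetical:

    # device side
    from xmlrpc.server import SimpleXMLRPCServer

    def get_temperature():
        return 21.5  # hypothetical device reading

    server = SimpleXMLRPCServer(("0.0.0.0", 9000))
    server.register_function(get_temperature)
    server.serve_forever()

    # On the Django side, the client is just as short:
    #   import xmlrpc.client
    #   device = xmlrpc.client.ServerProxy("http://192.168.1.50:9000/")
    #   device.get_temperature()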
