I have a web app using Django. The app has a maximum capacity of N RPS, while the client sends M RPS, where M > N. In other words, the app receives more requests than it can handle, and the number of unprocessed requests will grow linearly over time (after t seconds, the number of requests waiting to be processed is (M - N) * t).
I would like to know what will happen to these requests. Will they accumulate in memory until the memory is full? Will they get "canceled" after some conditions are met?
It's hard to answer your question directly without details about your configuration. Moreover, under extremely heavy load it's really hard to determine exactly what will happen. But one thing is sure: you can't count on all of those requests being handled correctly.
If you can measure how many requests per second your application handles and you want it to stay reliable above N requests, then a good starting point is some kind of load balancer that spreads your requests over multiple server machines.
To answer your question, I can think of a few situations in which a request won't be handled correctly:
The client cancelled its request (for example a browser, which may enforce a maximum execution time).
The request took longer than the timeout limit set in the web server configuration (because of a lack of resources, too many I/O operations, ...).
Another service timed out (for example a blocked PostgreSQL query, or a Memcached server that stopped responding).
Your server machine is overloaded and the TCP connection can't be established.
Your web server only queues as many pending requests as its configuration allows and rejects everything over that limit (in Apache, for example, the ListenBacklog directive: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#listenbacklog).
Try reading about the C10K problem; it could be useful for thinking about this more deeply.
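If you end up serving Django behind Gunicorn rather than Apache, the equivalent knobs live in its Python configuration file. A minimal sketch, with purely illustrative values:
# gunicorn.conf.py -- illustrative values, tune for your own workload
bind = "0.0.0.0:8000"
workers = 4      # concurrent worker processes
backlog = 2048   # pending connections the OS queues before refusing new ones
timeout = 30     # seconds before a silent worker is killed and its request fails
Roughly speaking, connections beyond the backlog are refused at the TCP level, and requests that exceed the timeout are killed rather than left to pile up.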
We have a lot of small services (Python based) which all access the same external resource (a REST API). The external resource is a paid service with a given set of tickets per day. Every request, no matter how small or big, decreases the available tickets by one. Even if the request fails due to an error in the request parameters or a timeout, the tickets still get used up. This is usually not a big problem and can be dealt with on a per-service level. The problem starts with the limit on parallel requests. We are allowed a certain number of parallel requests, and if we reach that limit the next requests fail and again decrease our available tickets.
So for me, a solution where each service handles this error and retries after a certain amount of time is not an option. That would be too costly in terms of tickets and also far too inefficient.
My solution now would be to have a special internal service which all the other services call. It would act as a kind of proxy or middleman that receives the requests, puts them in a queue, and processes them in a way that never exceeds the parallel request limit (see the sketch below).
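For what it's worth, here is a minimal sketch of that middleman idea, assuming asyncio and aiohttp; the URL, the limit, and the function name are placeholders rather than a finished design:
import asyncio
import aiohttp

MAX_PARALLEL = 5  # whatever the provider actually allows; illustrative value
EXTERNAL_API_URL = "https://api.example.com/endpoint"  # placeholder

semaphore = asyncio.Semaphore(MAX_PARALLEL)

async def call_external(session: aiohttp.ClientSession, payload: dict) -> dict:
    # The semaphore caps the number of in-flight requests, so the provider's
    # parallel-request limit is never exceeded and no tickets are wasted on
    # calls that would be rejected anyway.
    async with semaphore:
        async with session.post(EXTERNAL_API_URL, json=payload) as resp:
            resp.raise_for_status()
            return await resp.json()
The other services would then talk to this middleman over HTTP or a message queue instead of calling the external API directly.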
Before I start to code, I would like to know if there is a proper name for such a service and whether there are already solutions out there, because I can imagine someone else has run into these problems too. I also think that someone (probably not me) could build such a service completely independently of the actual external API.
Thank you very much, and please, Stack Overflow gods, be kind to me.
Basically I'm running a Flask web server that crunches a bunch of data and sends it back to the user. We aren't expecting many users, ~60, but I've noticed what could be an issue with concurrency. Right now, if I open a tab and send a request to have some data crunched, it takes about 30 s; for our application that's OK.
If I open another tab and send the same request at the same time, Gunicorn will handle them concurrently, which is great if we have two separate users making two separate requests. But what happens if I have one user open 4 or 8 tabs and send the same request? It backs up the server for everyone else. Is there a way I can tell Gunicorn to only accept 1 request at a time from the same IP?
A better solution than the answer by @jon would be to limit access at your web server instead of the application server. It is always good to keep the responsibilities of the different layers of your application separate. Ideally, the application server (Flask) should not carry any configuration for limiting, or care where the requests are coming from. The responsibility of the web server, in this case nginx, is to route each request to the right upstream based on certain parameters, and the limiting should be done at this layer.
Now, coming to the limiting: you can do it with the limit_req_zone directive in the http block of your nginx config:
http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    ...
    server {
        ...
        location / {
            limit_req zone=one burst=5;
            proxy_pass ...;
        }
    }
}
where $binary_remote_addr is the client's IP address, and on average no more than 1 request per second is allowed, with bursts not exceeding 5 requests.
Pro tip: since subsequent requests from the same IP are held in a queue, there is a good chance of nginx timing out. Hence, it is advisable to set a more generous proxy_read_timeout, and if the reports take longer, to also adjust Gunicorn's timeout.
Documentation of limit_req_zone
A blog post by nginx on rate limiting can be found here
This is probably NOT best handled at the Flask level. But if you had to do it there, then it turns out someone else has already designed a Flask plugin to do just this:
https://flask-limiter.readthedocs.io/en/stable/
If a request takes at least 30 s, then set your per-address limit to one request every 30 s. This will solve the issue of impatient users obsessively clicking instead of waiting for a very long process to finish.
This isn't exactly what you requested, since it means that longer/shorter requests may overlap and allow multiple requests at the same time, which doesn't fully exclude the behavior you describe with multiple tabs, etc. That said, if you are able to tell your users to wait 30 seconds for anything, it sounds like you are in the driver's seat for setting UX expectations. A good wait/progress message will probably help too, if you can build an asynchronous server interaction.
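For reference, a minimal sketch of what that might look like with Flask-Limiter; the route and the limit string are illustrative, and the Limiter constructor arguments have changed between versions, so check the docs for the one you install:
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

app = Flask(__name__)

# Key requests by client IP address.
limiter = Limiter(get_remote_address, app=app)

@app.route("/crunch")
@limiter.limit("1 per 30 seconds")  # one request per address per 30 s
def crunch():
    # ... the expensive data crunching goes here ...
    return "done"
Note that behind a proxy like nginx, get_remote_address may see the proxy's IP rather than the client's, so you may need to trust the X-Forwarded-For header first.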
I have an AWS server that handles end-user registration. It runs an EC2 Linux instance that serves our API via Apache & Python, and is connected to its data on a separate Amazon RDS instance running MySQL.
To remotely administer the system, I set states in a MySQL table to control the availability of the registration API to the public user, and also the level of logging for our Python API, which may reference up to 5 concurrent admin preferences (i.e. not a single "log level").
Because our API provides almost two dozen different functions, we need to check the state of the system's availability before any individual function is accessed. That means there's an SQL SELECT against that table (which only has one record) not just once, but for every session of user transactions, which might involve a half-dozen API calls. We need to check to see if the availability status has changed, so the user doesn't start an API call and have the database become unavailable in the middle of the process. Same for the logging preferences.
The API calls return the server's availability, and estimated downtime, back to the calling program (NOT a web browser interface) which handles that situation gracefully.
Is this a commonly accepted approach for handling this? Should I care if I'm over-polling the status table? And should I set up MySQL with my status table in such a way as to make my constant checking more efficient (e.g. cached?) when Python obtains its data?
I should note that we might have thousands of simultaneous users making API requests, not tens of thousands, or millions.
Your strategy seems off-track, here.
Polling a status table should not be a major hot spot. A small table, with proper indexes, queried outside a transaction, is a lightweight operation. With an appropriately-provisioned server, such a query should be done entirely in memory, requiring no disk access.
But that doesn't mean it's a fully viable strategy.
We need to check to see if the availability status has changed, so the user doesn't start an API call and have the database become unavailable in the middle of the process.
This will prove impossible. You need time travel capability for this strategy to succeed.
Consider this: the database becoming unavailable in the middle of a process wouldn't be detected by your approach. Only the lack of availability at the beginning would be detected. And that's easy enough to detect, anyway -- you will realize that as soon as you try to do something.
Set appropriate timeouts. The MySQL client library should have support for a connect timeout, as well as a timeout which will cause your application to see an error if a query runs longer than is acceptable or a network disruption causes the connection to be lost mid-query. I don't know whether this exists or what it's called in Python but in the C client library, this is MYSQL_OPT_READ_TIMEOUT and is very handy for preventing a hang when for whatever reason you get no response from the database within an acceptable period of time.
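In Python this depends on the client library; as one sketch, PyMySQL exposes similar knobs as constructor arguments (the connection details and values here are illustrative):
import pymysql

# Fail fast instead of hanging when the database is unreachable or silent.
conn = pymysql.connect(
    host="your-rds-endpoint",  # placeholder
    user="app",
    password="...",
    database="appdb",
    connect_timeout=5,   # seconds to wait when establishing the connection
    read_timeout=10,     # seconds to wait for a query result
    write_timeout=10,    # seconds to wait when sending data
)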
Use database transactions, so that a failure to process a request results in no net change to the database. A MySQL transaction is implicitly rolled back if the connection between the application and the database is lost.
Implementing error handling and recovery -- written into your code -- is the more viable approach; trying to prevent your code from running when the service is unavailable is not, because there is no check interval small enough to fully avoid the database becoming unavailable "in the middle" of a request.
In any event, polling a database table with each request seems like the wrong approach, not to mention the fact that an outage on the health status table's server makes your service fail unnecessarily when the service itself might have been healthy but failed to prove that.
On the other hand, I don't know your architecture, but assuming your front-end involves something like Amazon Application Load Balancer or HAProxy, the health checks against the API service endpoint can actually perform the test. If you configure your check interval for, say, 10 seconds, and making a request to the check endpoint (say GET /health-check) actually verifies end-to-end availability of the necessary components (e.g. database access) then the API service can effectively take itself offline when a problem occurs. It remains offline until it starts returning success again.
The advantage here is that the workload involved in health checking is consistent -- it happens every 10 seconds, increasing with the number of nodes providing the service, but not with actual request traffic, because you don't have to perform a check for each request. This means there is a window of a few seconds between the actual loss of availability and the detection of that loss, but the requests that get through in the meantime would have failed anyway.
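As a sketch of what such a check endpoint might do -- using Flask purely for illustration, with get_db_connection() standing in for whatever connection helper your code already has:
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health-check")
def health_check():
    # Verify end-to-end availability: a trivial query proves the database is
    # reachable; any exception marks this node as unhealthy so the load
    # balancer stops sending it traffic.
    try:
        conn = get_db_connection()  # hypothetical helper from your own code
        with conn.cursor() as cursor:
            cursor.execute("SELECT 1")
        return jsonify(status="ok"), 200
    except Exception:
        return jsonify(status="unavailable"), 503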
HAProxy -- and presumably other tools like Varnish or Nginx -- can help you handle graceful failures in other ways as well, by timing out failed requests at a layer before the API endpoint so that the caller gets a response even though the service itself didn't respond. An example from one of my environments is a shopping page where an external API call is made by the application when a site visitor is browsing items by category. If this request runs longer than it should, the proxy can interrupt the request and return a preconfigured static error page to the system making the request with an error -- say, in JSON or XML, that the requesting application will understand -- so that the hard failure becomes a softer one. This fake response can, for example in this case, return an empty JSON array of "items found."
It isn't entirely clear to me, now, whether these APIs are yours, or are external APIs that you are aggregating. If the latter, then HAProxy is a good solution here, too, but facing the other direction -- the back-end faces outward and your service contacts its front-end. You access the external service through the proxy, and the proxy checks the remote service and will immediately return an error back to your application if the target API is unhealthy. I use this solution to access an external trouble-ticketing system from one of my apps. An additional advantage, here, is that the proxy logs allow me to collect usage, performance, and reliability data about all of the many requests passed to that external service, regardless of which of dozens of internal systems may access it, with far better visibility than I could accomplish if I tried to collect it from all of the internal application servers that access that external service.
I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but besides this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about process completion.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like CherryPy (which I am more confident with) and have the web app use a messaging channel to report the processing progress (a rough sketch follows after this list). The web server returns an HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the web servers mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
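Regarding option 2, a very rough sketch of how it might look with CherryPy's bus and a background worker thread; the channel name, handler, and job fields are made up for illustration:
import threading
import queue

import cherrypy

jobs = queue.Queue()

def worker():
    # Background thread: pull files off the queue, run the external tool,
    # then announce completion on a bus channel.
    while True:
        job_id, path = jobs.get()
        # ... run the external processing tool on `path` here ...
        cherrypy.engine.publish('file-processed', job_id, path)
        jobs.task_done()

class Api:
    @cherrypy.expose
    def upload(self, job_id, path):
        # Return immediately; the worker and the bus handle the rest.
        jobs.put((job_id, path))
        return "accepted"

cherrypy.engine.subscribe('file-processed',
                          lambda job_id, path: print("done:", job_id, path))
threading.Thread(target=worker, daemon=True).start()
cherrypy.quickstart(Api())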
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large-scale data processing engine that my team implemented. We wanted to build our API calls in Flask, some of which can take many hours to complete, but have a way to notify the user in real time about what is going on.
Basically what I came up with was what you described as option 2. On the same machine where I serve the Flask app through Apache, I created a Tornado app that serves a WebSocket reporting progress to the end user. Once my main page is served, it establishes the WebSocket connection to the Tornado server, and the Flask app periodically sends updates to the Tornado app, and down to the end user. Even if the browser is closed during the long-running job, Apache keeps the request alive and processing, and upon logging back in, I can still see the current progress.
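A very stripped-down sketch of the Tornado side of that, assuming the Flask app POSTs progress updates to it; the handler names and port are made up:
import tornado.ioloop
import tornado.web
import tornado.websocket

clients = set()

class ProgressSocket(tornado.websocket.WebSocketHandler):
    # Browsers connect here and just listen for progress messages.
    def open(self):
        clients.add(self)

    def on_close(self):
        clients.discard(self)

class ProgressUpdate(tornado.web.RequestHandler):
    # The Flask app POSTs a progress message here; we fan it out to sockets.
    def post(self):
        message = self.get_body_argument("message")
        for client in clients:
            client.write_message(message)

application = tornado.web.Application([
    (r"/progress", ProgressSocket),
    (r"/update", ProgressUpdate),
])

if __name__ == "__main__":
    application.listen(8888)  # illustrative port
    tornado.ioloop.IOLoop.current().start()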
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!
Sometimes this line of a Django app (hosted using Apache/mod_wsgi) takes a lot of time to execute (e.g. 99% of a 6-second request, as measured by New Relic) when the request is submitted by some mobile clients:
raw_body = request.body
(where request is an incoming request)
The questions I have:
What could have slowed down access to request.body so much?
What would be the correct Apache configuration to wait until the client has sent the whole payload before invoking Django? Maybe the problem is in the Apache configuration.
Django's body attribute on HttpRequest is a property, so the question really comes down to what is actually being done there and how to make it happen outside of the Django app, if possible. I want Apache to wait for the full request before handing it to the Django app.
Regarding (1), Apache passes control to the mod_wsgi handler as soon as the request's headers are available, and mod_wsgi then passes control on to Python. The internal implementation of request.body then calls the read() method which eventually calls the implementation within mod_wsgi, which requests the request's body from Apache and, if it hasn't been completely received by Apache yet, blocks until it is available.
Regarding (2), this is not possible with mod_wsgi alone. At least, the hook processing incoming requests doesn't provide a mechanism to block until the full request is available. Another poster suggested using nginx as a proxy in a response to this duplicate question.
There are two ways you can fix this in Apache.
You can use mod_buffer, available in Apache >= 2.3, and change BufferSize to the maximum expected payload size. This should make Apache hold the request in memory until the body has either finished arriving or the buffer limit is reached.
For older Apache versions (< 2.3), you can use mod_proxy combined with ProxyIOBufferSize, ProxyReceiveBufferSize and a loopback vhost. This involves putting your real vhost on a loopback interface, and exposing a proxy vhost which connects back to the real vhost. The downside is that it uses twice as many sockets and can make resource calculation difficult.
However, the most ideal choice would be to enable request/response buffering at your L4/L7 load balancer. For example, HAProxy lets you add rules based on req_len, and the same goes for nginx. Most good commercial load balancers also have an option to buffer requests before forwarding them.
All three approaches rely on buffering the full request/response payload, and there are performance considerations depending on your use case and available resources. You could cache the entire payload in memory but this may dramatically decrease your maximum concurrent connections. You could choose to write the payload to local storage (preferably SSD), but you are then limited by IO capacity.
You also need to consider file uploads, because these are not a good fit for memory based payload buffering. In most cases, you would handle upload requests in your webserver, for example HttpUploadModule, then query nginx for the upload progress, rather than handling it directly in WSGI. If you are buffering at your load balancer, then you may wish to exclude file uploads from the buffering rules.
You need to understand why this is happening, and that this problem exists both when sending a response and receiving a request. It's also a good idea to have these protections in place, not just for scalability, but for security reasons.
I'm afraid the problem could be in the amount of data you are transferring and possibly a slow connection. Also note that upload bandwidth is typically much less than download bandwidth.
As already pointed out, when you use request.body Django will wait for the whole body to be fully transferred from the client and available in-memory (or on disk, according to configurations and size) on the server.
I would suggest you try the same request with the client connected to a Wi-Fi access point that is wired to the server itself, and see if it improves greatly. If that is not possible, perhaps just run a tool like speedtest.net on the client, get the request size, and do the math to see how much time it would theoretically require (I'd expect the measured time to be roughly 20% more). Be careful that network speed is usually measured in bits per second, while file size is measured in bytes.
In some cases, if a lot of processing is needed on the data, it may be convenient to read() the request and do computations on the fly, or perhaps to pass the request object directly to any function that can read from a so-called "file-like object" instead of a string.
In your specific case, however, I'm afraid this would only affect that 1% of time that is not spent in receiving the body from the network.
Edit:
Sorry, only now have I noticed the extra description in the bounty. I'm afraid I can't help you, but, may I ask, what is the point? I'd guess this would only save a tiny bit of server resources by not keeping a Python thread idle for a while, without any noticeable performance gain on the request...
Looking at the Django source, it looks like what actually happens when you call request.body is that the request body is loaded into memory by being read from a stream.
https://github.com/django/django/blob/stable/1.4.x/django/http/__init__.py#L390-L392
It's likely that if the request is large, the time being taken is simply loading it into memory. Django has methods on the request for treating the body as a stream, which, depending on exactly what content is being consumed, could allow you to process the request more efficiently.
https://docs.djangoproject.com/en/dev/ref/request-response/#django.http.HttpRequest.read
You could for example read one line at a time.
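For example, a minimal sketch of a view that consumes the body a line at a time instead of touching request.body; process_line() is a hypothetical placeholder for whatever you do with each line:
from django.http import HttpResponse

def upload_view(request):
    # Read the body from the underlying stream line by line instead of using
    # request.body, so the whole payload never sits in memory at once.
    total = 0
    while True:
        line = request.readline()
        if not line:
            break
        total += len(line)
        process_line(line)  # hypothetical per-line handler
    return HttpResponse("received %d bytes" % total)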