How to change the request queuing/stacking behaviour of Sanic - python

Good day all
I'm tagging Sanic because that is what I'm using in my web app, but I'm not sure if this behaviour is due to Sanic or something underlying (like asyncio or even the network interface).
Please let me know if I need to write a quick example showing what I mean; my application is quite large, so I can't share it here, but I think my problem is simple enough to explain.
I have a simple web application in Python using the Sanic framework. For my purposes, I actually need a server which is synchronous. As such, none of my endpoint functions are async, and I explicitly start my Sanic app with one worker.
This is because, when I send the server a number of requests, I need them to be performed in the order in which they were sent.
However this does not happen.
Some of my requests take a lot of calculation, so they're not immediate, meaning that subsequent requests arrive while an earlier one is still being processed.
In other words, imagine that I send 4 requests one after another. Request 1 takes a few seconds to calculate, meaning that requests 2-4 arrive while request 1 is still being processed. What I want is for the requests to be processed in order:
Request 1
Request 2
Request 3
Request 4
But what actually happens is that requests are processed out of order; in fact, after the first, the order seems random:
Request 1
Request 3
Request 2
Request 4
So is there a way to force it to execute requests in order? Preferably at a high level (Sanic)?
I've looked around but have not seen anyone discussing this behaviour. I suppose this is a rare case, as my server is not RESTful and not stateless. A sketch of one idea I've been considering is below.
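For concreteness, here is a minimal, untested sketch of the kind of serialization I'm after (the serialized decorator and the lock stored on app.ctx are my own invention, not Sanic APIs): every handler runs under a single asyncio.Lock, so waiting requests queue behind the one being processed.

    import asyncio
    from functools import wraps

    from sanic import Sanic, response

    app = Sanic("SerialApp")

    @app.listener("before_server_start")
    async def setup_lock(app, loop):
        # one lock shared by every handler; asyncio wakes waiters
        # in roughly the order they started waiting
        app.ctx.handler_lock = asyncio.Lock()

    def serialized(handler):
        @wraps(handler)
        async def wrapper(request, *args, **kwargs):
            async with request.app.ctx.handler_lock:
                return await handler(request, *args, **kwargs)
        return wrapper

    @app.get("/crunch")
    @serialized
    async def crunch(request):
        # stands in for my slow calculation
        result = sum(i * i for i in range(10_000_000))
        return response.json({"result": result})

Though I realize this only controls the order in which handlers run once requests have reached the event loop, not the order in which the network delivers them.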
Any help is appreciated.

Related

Performance of Singleton Controllers in fastapi

I am having trouble troubleshooting a performance bottleneck in one of my APIs, and I have a theory that I need somebody with deeper knowledge of Python to validate for me.
I have a FastAPI web service and a Node.js web service deployed on AWS. The Node.js API is performing perfectly under heavier loads, with multiple concurrent requests taking the same amount of time to be served.
My FastAPI service, however, is performing absurdly. If I make two requests concurrently, only one is served while the other has to wait for the first to finish, hence the response time for the second request is twice as long as the first one's.
My theory is that I am using the Singleton pattern to instantiate the controller after a request comes to a route, and the object already being in use and locked is causing the second request to wait until the first is resolved. Could this be it, or am I missing something very obvious here? Two concurrent requests should absolutely not be a problem for any type of web server.
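One thing worth ruling out first (a guess, not a confirmed diagnosis): in FastAPI, blocking work inside an async def endpoint runs directly on the event loop, which serializes concurrent requests in exactly the way described, whereas a plain def endpoint is run in a threadpool. A minimal sketch to compare the two:

    import time

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/blocking-async")
    async def blocking_async():
        # blocks the event loop: a second concurrent request must wait
        time.sleep(3)
        return {"ok": True}

    @app.get("/blocking-sync")
    def blocking_sync():
        # plain `def` endpoints run in a threadpool,
        # so two concurrent requests overlap
        time.sleep(3)
        return {"ok": True}

If the singleton controller is only reached from an async def route that does blocking work, the waiting you observe would happen even without the singleton.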

What is a good way to avoid making multiple requests to an external API, caching the result, when the requests are close in time?

I have a Python webapp, and it's an intermediary service that requests data from another API and caches the data.
The thing is, each request to the external API requires some paperwork, so we'd better cache the results on our side.
Suppose I get two requests to my service, at times T and T + 1, and the API takes 3 seconds to respond: both requests will see that I don't have the result stored, and both will hit the external API.
What are good mechanisms in Python to do some kind of semaphore until the first request finishes, so that the second request can then read from the cache?
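One common pattern is a per-key lock, so that only the first request performs the external call while concurrent ones wait and then read the cache. A minimal sketch, assuming an asyncio application (fetch_from_api and the module-level dicts are hypothetical names):

    import asyncio

    cache = {}
    locks = {}

    async def fetch_from_api(key):
        # stands in for the slow external API call (~3 seconds)
        await asyncio.sleep(3)
        return {"key": key}

    async def get_cached(key):
        lock = locks.setdefault(key, asyncio.Lock())
        async with lock:
            # a second, concurrent request blocks here and then
            # finds the value already cached
            if key not in cache:
                cache[key] = await fetch_from_api(key)
        return cache[key]

For a threaded app the same idea works with threading.Lock.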

Is there a way to limit the number of concurrent requests from one IP with Gunicorn?

Basically I'm running a Flask web server that crunches a bunch of data and sends it back to the user. We aren't expecting many users (~60), but I've noticed what could be an issue with concurrency. Right now, if I open a tab and send a request to have some data crunched, it takes about 30s, which is OK for our application.
If I open another tab and send the same request at the same time, Gunicorn will handle them concurrently; this is great if we have two separate users making two separate requests. But what happens if one user opens 4 or 8 tabs and sends the same request? It backs up the server for everyone else. Is there a way I can tell Gunicorn to only accept 1 request at a time from the same IP?
A better solution than the one in #jon's answer would be limiting access in your web server instead of the application server. It is good practice to keep a separation between the responsibilities of the different layers of your application. Ideally, the application server (Flask) should not carry any configuration for rate limiting, or anything to do with where requests come from. The responsibility of the web server, in this case nginx, is to route requests to the right upstream based on certain parameters; the limiting should be done at this layer.
Now, coming to the limiting: you can do it with the limit_req_zone directive in the http block of your nginx config:
http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    ...
    server {
        ...
        location / {
            limit_req zone=one burst=5;
            proxy_pass ...
        }
    }
}
where $binary_remote_addr is the client's IP address, and no more than 1 request per second on average is allowed, with bursts not exceeding 5 requests.
Pro-tip: since subsequent requests from the same IP are held in a queue, there is a good chance of nginx timing out. It is therefore advisable to set a higher proxy_read_timeout, and, if the requests take longer, to adjust Gunicorn's timeout as well.
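For example (the timeout values and upstream address here are illustrative, not from the original answer):

    location / {
        limit_req zone=one burst=5;
        proxy_read_timeout 120s;
        proxy_pass http://127.0.0.1:8000;
    }

and on the Gunicorn side something like gunicorn --timeout 120 app:app, where app:app is whatever module path your project uses.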
Documentation of limit_req_zone
A blog post by nginx on rate limiting can be found here
This is probably NOT best handled at the Flask level. But if you had to do it there, it turns out someone has already written a Flask plugin to do just this:
https://flask-limiter.readthedocs.io/en/stable/
If a request takes at least 30s, then set your per-address limit to one request every 30s. This will solve the issue of impatient users obsessively clicking instead of waiting for a very long process to finish.
This isn't exactly what you requested, since longer and shorter requests may overlap and allow multiple requests at the same time, so it doesn't fully exclude the multiple-tabs behaviour you describe. That said, if you are able to tell your users to wait 30 seconds for anything, it sounds like you are in the driver's seat for setting UX expectations. A good wait/progress message will probably help too, if you can build an asynchronous server interaction.
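A minimal Flask-Limiter sketch along those lines (the route name and limit string are illustrative; check the docs above for the exact constructor signature in your version):

    from flask import Flask
    from flask_limiter import Limiter
    from flask_limiter.util import get_remote_address

    app = Flask(__name__)
    # key requests by client IP
    limiter = Limiter(get_remote_address, app=app)

    @app.route("/crunch")
    @limiter.limit("1 per 30 seconds")
    def crunch():
        # the ~30s data crunch happens here
        return "done"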

App Engine frontend waiting for backend to finish and return data - what's the right way to do it?

I'm using a frontend built with AngularJS and a backend built with Python and webapp2 on App Engine.
The backend makes calls to a third-party API, fetches data, and returns it to the frontend.
The API request from the backend may take up to 30s or more. The problem is that the frontend can't really progress any further until it gets the data.
I tried running 3 simultaneous requests to the backend using different tabs and 2 of them failed. I'm afraid that this seems to suggest that the app only allows one user at a time.
What's the best way to handle this? One thought I have is:
Use task queues to run the API call to 3rd party in the background
Create a new handler which reads from the queue for the last task sent and let the frontend poll this one at regular intervals
Update the frontend once data is available
Is that the right way? I'm sure this is a solved problem in the frontend+backend world, but I just don't know what to search for.
Thanks!
Requests from the frontend are capped at 30 seconds; after that they time out on the server side. That is part of GAE's design. Requests originating from the task queue get 10 minutes, so your idea is viable. However, you'll want some identifier to use for polling, rather than just "the last task sent", to distinguish between concurrent tasks.
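To make that pattern concrete, here is a rough webapp2 sketch (the handler paths, the Result model, and call_third_party_api are made-up names for illustration):

    import json
    import uuid

    import webapp2
    from google.appengine.api import taskqueue
    from google.appengine.ext import ndb

    class Result(ndb.Model):
        data = ndb.TextProperty()
        done = ndb.BooleanProperty(default=False)

    def call_third_party_api():
        # placeholder for the real third-party call
        return "payload"

    class StartHandler(webapp2.RequestHandler):
        def post(self):
            task_id = uuid.uuid4().hex
            Result(id=task_id).put()
            # work queued here gets the 10-minute limit, not 30 seconds
            taskqueue.add(url="/worker", params={"task_id": task_id})
            self.response.write(json.dumps({"task_id": task_id}))

    class WorkerHandler(webapp2.RequestHandler):
        def post(self):
            task_id = self.request.get("task_id")
            result = Result.get_by_id(task_id)
            result.data = call_third_party_api()
            result.done = True
            result.put()

    class PollHandler(webapp2.RequestHandler):
        def get(self):
            result = Result.get_by_id(self.request.get("task_id"))
            self.response.write(json.dumps(
                {"done": result.done, "data": result.data}))

The frontend keeps the task_id from the first response and polls the poll handler until done is true.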

Of Tornado and blocking code

I am trying to move away from CherryPy for a web service that I am working on, and one alternative I am considering is Tornado. Most of my requests look something like this on the backend:
get POST data
see if I have it in the cache (a database access)
if not, make multiple HTTP requests to some other web service, which can take a good few seconds depending on the number of requests
I keep hearing that one should not block the Tornado main loop; I am wondering, if all of the above code is executed in the post() method of a RequestHandler, does this mean that I am blocking the loop? And if so, what's the appropriate approach to using Tornado with the above requirements?
Tornado ships with an asynchronous HTTP client, AsyncHTTPClient (actually two implementations, IIRC). Use that if you need to make additional HTTP requests.
The database lookup should also be done with an asynchronous client, in order not to block the Tornado ioloop/main loop. I know there are a couple of tailor-made Tornado database clients out there (e.g. for Redis and MongoDB). A MySQL lib is included in the Tornado distribution.
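For illustration, a minimal non-blocking handler using AsyncHTTPClient, in modern coroutine style (the URL and port are placeholders):

    import tornado.ioloop
    import tornado.web
    from tornado.httpclient import AsyncHTTPClient

    class MainHandler(tornado.web.RequestHandler):
        async def post(self):
            client = AsyncHTTPClient()
            # the IOLoop keeps serving other requests while this awaits
            response = await client.fetch("http://other-service/endpoint")
            self.write(response.body)

    if __name__ == "__main__":
        app = tornado.web.Application([(r"/", MainHandler)])
        app.listen(8888)
        tornado.ioloop.IOLoop.current().start()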
