Performance of Singleton Controllers in fastapi

I am struggling to troubleshoot a performance bottleneck in one of my APIs, and I have a theory that I need somebody with deeper knowledge of Python to validate.
I have a FastAPI web service and a Node.js web service deployed on AWS. The Node.js API performs perfectly under heavier loads: multiple concurrent requests take the same amount of time to be served.
My FastAPI service, however, performs absurdly badly. If I make two requests concurrently, only one is served while the other has to wait for the first to finish, so the response time for the second request is twice as long as for the first.
My theory is that, because I use the Singleton pattern to instantiate the controller when a request reaches a route, the object is already in use and locked, which makes the second request wait until the first is resolved. Could this be it, or am I missing something very obvious here? Two concurrent requests should absolutely not be a problem for any type of web server.
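
A Python object does not lock itself, so a singleton on its own would not normally serialize requests. One common cause of this symptom, which the question does not confirm, is a blocking call inside an `async def` handler: it stalls the single event loop, so the second request cannot start until the first finishes. The sketch below reproduces that effect with plain asyncio rather than FastAPI; the handler names and the 0.2-second delay are made up for illustration.

```python
import asyncio
import time

DELAY = 0.2  # hypothetical duration of the slow work, in seconds

async def blocking_handler():
    # time.sleep blocks the whole event loop, like a synchronous DB or HTTP call.
    time.sleep(DELAY)

async def cooperative_handler():
    # asyncio.sleep yields to the event loop, so other requests can progress.
    await asyncio.sleep(DELAY)

async def time_two_concurrent(handler):
    # Run two "requests" concurrently and return the total wall-clock time.
    start = time.perf_counter()
    await asyncio.gather(handler(), handler())
    return time.perf_counter() - start

blocking_total = asyncio.run(time_two_concurrent(blocking_handler))
cooperative_total = asyncio.run(time_two_concurrent(cooperative_handler))
print(f"blocking: {blocking_total:.2f}s, cooperative: {cooperative_total:.2f}s")
```

The blocking variant should take roughly twice the delay and the cooperative one roughly once. In FastAPI, handlers declared with plain `def` run in a thread pool, so moving blocking work there (or switching to async libraries) is usually the first thing to check.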

Related

How to change the queuing/stacking requests behaviour of Sanic

Good day all
I'm tagging Sanic because that is what I'm using in my web app, but I'm not sure if this behaviour is due to Sanic or something underlying (like asyncio or even the network interface).
Please let me know if I need to write a quick example showing what I mean, my application is quite large so I can't share that here, but I think my problem is simple enough to explain.
I have a simple web application in Python using the Sanic framework. For my purposes, I actually need a server which is synchronous. As such, none of my endpoint functions are async, and I explicitly start my Sanic app with one worker.
This is because, when I send the server a number of requests, I need them to be performed in the order which they were sent.
However this does not happen.
Some of my requests take a lot of calculation, so they're not immediate, meaning that subsequent requests arrive while the server is still processing.
In other words, imagine that I send 4 requests one after another. Request 1 takes a few seconds to calculate, meaning that requests 2-4 arrive while request 1 is still being processed. What I want to happen is that requests are processed in order:
Request 1
Request 2
Request 3
Request 4
But what happens is that requests are processed out of order, in fact, after the first, it seems random:
Request 1
Request 3
Request 2
Request 4
So is there a way to force it to execute requests in order? Preferably at a high level (Sanic) ?
I've looked around but have not seen anyone talking about this behaviour. I suppose this is a rare case as my server is not RESTful and not stateless.
Any help is appreciated.
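
One framework-independent way to force ordering is to have every handler put its job on a single FIFO queue and let exactly one consumer process the jobs. This is only a sketch with plain asyncio, not a Sanic API; the job durations are invented, and wiring the queue into real endpoints is left out.

```python
import asyncio

async def worker(queue, processed):
    # Single consumer: jobs are handled strictly in arrival order.
    while True:
        job_id, duration = await queue.get()
        await asyncio.sleep(duration)  # stands in for the real calculation
        processed.append(job_id)
        queue.task_done()

async def main():
    queue = asyncio.Queue()
    processed = []
    consumer = asyncio.create_task(worker(queue, processed))
    # Request 1 is the slowest, yet it still finishes first.
    for job_id, duration in [(1, 0.05), (2, 0.01), (3, 0.03), (4, 0.0)]:
        queue.put_nowait((job_id, duration))
    await queue.join()  # wait until every queued job is done
    consumer.cancel()
    return processed

order = asyncio.run(main())
print(order)  # [1, 2, 3, 4]
```

Because only one consumer ever takes items from the queue, completion order matches enqueue order regardless of how long each job takes.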

How to handle high response time

There are two different services. One service (Django) receives the request from the front end and then calls an API in the other service (Flask).
But the response time of the Flask service is high, and if the user navigates to another page the request is canceled.
Should this be a background task or a pub/sub pattern? If so, how do I run it in the background and then tell the user "here is your latest result"?

You have two main options:
Make an initial request to a "simple" Django view, which loads a skeleton HTML page with a spinner; some JS then triggers an XHR request to a second Django view, which makes the call to the other service (Flask). This way you can properly warn your user that loading takes time and handle leaving the page on the browser side (ask for confirmation before leaving, abort the request, and so on).
If possible, cache the result of the Flask service, so you don't need to call it on every page load.
You can combine these two solutions by calling the service in an asynchronous request and caching its result (depending on context, you may need to key the cache on the connected user, for example).
The first solution can also be implemented with pub/sub, WebSockets, or whatever you prefer, but a classic XHR seems fine for your case.

On our project, we have a couple of time-expensive endpoints. Our solution was similar to a previous answer:
Once we receive a request, we call a Celery task that does the expensive work asynchronously. We do not wait for its result and return a quick response to the user. The Celery task sends its progress and results to the user via WebSockets, and the frontend handles these WS messages. The benefit of this approach is that we do not spend the CPU of our backend; we spend the CPU of the Celery worker, which runs on another machine.
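
The "return quickly, finish later" shape described above can be sketched without Celery. The example below uses a standard-library thread pool as a stand-in for the Celery worker and an in-memory dict as a stand-in for a result backend; `start_job`, `job_status`, and `expensive_work` are hypothetical names, and a real deployment would push results over WebSockets or expose a polling endpoint instead.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=2)  # stand-in for a Celery worker
jobs = {}  # job id -> Future; a real system would use a result backend

def expensive_work(n):
    time.sleep(0.1)  # placeholder for the slow computation
    return n * n

def start_job(n):
    # Called by the request handler: schedule the work and return at once.
    job_id = str(uuid.uuid4())
    jobs[job_id] = executor.submit(expensive_work, n)
    return job_id

def job_status(job_id):
    # Called by a later poll (or pushed over a WebSocket in the answer's setup).
    future = jobs[job_id]
    if future.done():
        return {"state": "done", "result": future.result()}
    return {"state": "pending"}

job_id = start_job(7)
print(job_status(job_id))       # most likely still pending at this point
jobs[job_id].result(timeout=5)  # block here only for the demonstration
print(job_status(job_id))
executor.shutdown()
```

Unlike this sketch, a Celery worker can run on a separate machine, which is the benefit the answer highlights.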

What happens to the supernumerary requests when a Django app gets flooded?

I have a web app using Django. The app has a maximum capacity of N RPS, while the client sends M RPS, where M>N. In other words, the app receives more requests than it can handle, and the number of unprocessed requests will grow linearly over time (after t sec, the number of requests that are waiting to be processed is (M-N) * t)
I would like to know what will happen to these requests. Will they accumulate in memory until the memory is full? Will they get "canceled" after some conditions are met?

It's hard to answer your question directly without details about your configuration. Moreover, under extremely high load it is really hard to predict what will happen. What is certain is that you can't rely on all those requests being handled correctly.
If you can measure how many requests per second your application can handle and you want it to be reliable for more than N requests, then a good start is to think about some kind of load balancer, which will spread your requests over multiple server machines.
To answer your question, I can think of a few possibilities where a request can't be handled correctly:
The client cancelled its request (for example a browser, which may enforce a maximum execution time).
The execution time of the request exceeded the timeout limit set in the web server configuration (because of a lack of resources, too many I/O operations, ...).
Another service timed out (such as a blocked PostgreSQL query or a failed Memcache server).
Your server machine is overloaded and a TCP connection can't be established.
The web server of your choice can only handle the number of requests or queue length specified in its configuration and rejects those over the limit (in Apache, for example, see this setting: http://httpd.apache.org/docs/2.2/mod/mpm_common.html#listenbacklog).
Reading about the C10K problem could help you think about this more deeply.
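
As a rough illustration of the "rejects those over limit" case, here is a toy simulation with a bounded backlog. It is not a model of Django or of any particular web server; the rates, the duration, and the backlog limit are invented numbers.

```python
import queue

def simulate(arrival_rps, capacity_rps, seconds, backlog_limit):
    # Bounded backlog, similar in spirit to a server's listen/queue limit.
    backlog = queue.Queue(maxsize=backlog_limit)
    rejected = 0
    for second in range(seconds):
        for i in range(arrival_rps):
            try:
                backlog.put_nowait((second, i))
            except queue.Full:
                rejected += 1  # over the limit: the request is refused
        for _ in range(min(capacity_rps, backlog.qsize())):
            backlog.get_nowait()  # the app serves up to N requests per second
    return backlog.qsize(), rejected

waiting, rejected = simulate(arrival_rps=15, capacity_rps=10, seconds=10, backlog_limit=20)
print(waiting, rejected)  # 10 40
```

With M = 15 and N = 10, the unbounded formula from the question gives (M-N) * t = 50 unprocessed requests after 10 seconds. With the limit in place, those 50 split into 10 still waiting and 40 rejected, so memory stays bounded at the cost of refused requests.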

Handling multiple requests in Flask

My Flask application has to do quite a large calculation to fetch a certain page. While Flask is running that function, other users cannot access the website, because Flask is busy with the large calculation.
Is there any way that I can make my Flask application accept requests from multiple users?

Yes, deploy your application on a different WSGI server, see the Flask deployment options documentation.
The server component that comes with Flask is really only meant for use while you are developing your application, even though it can be configured to handle concurrent requests with app.run(threaded=True) (as of Flask 1.0 this is the default). The above document lists several options for servers that can handle concurrent requests and are far more robust and tuneable.
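
As one example of such a server, Gunicorn reads its settings from a Python config file. The fragment below is a sketch with illustrative values, not tuned recommendations, and it assumes Gunicorn is the server you pick.

```python
# gunicorn.conf.py: illustrative values, not tuned recommendations
bind = "127.0.0.1:8000"  # address and port to listen on
workers = 3              # separate processes; each can serve a request in parallel
threads = 4              # threads per worker, useful when requests wait on I/O
timeout = 60             # seconds before an unresponsive worker is restarted
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` is a hypothetical module and application name. For a CPU-heavy calculation, extra worker processes help more than extra threads, because threads in one Python process share the GIL.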

For requests that take a long time, you might want to consider starting a background job for them.

of tornado and blocking code

I am trying to move away from CherryPy for a web service that I am working on and one alternative that I am considering is Tornado. Now, most of my requests look on the backend something like:
get POST data
see if I have it in cache (database access)
if not make multiple HTTP requests to some other web service which can take even a good few seconds depending on the number of requests
I keep hearing that one should not block the Tornado main loop. If all of the above code is executed in the post() method of a RequestHandler, does this mean that I am blocking the loop? And if so, what is the appropriate way to use Tornado with the above requirements?

Tornado ships with an asynchronous HTTP client (actually two, IIRC): AsyncHTTPClient. Use that one if you need to make additional HTTP requests.
The database lookup should also be done with an asynchronous client so that it does not block the Tornado ioloop/mainloop. I know there are a couple of database clients tailor-made for Tornado (e.g. for Redis and MongoDB) out there. The mysql lib is included in the tornado distribution.
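
When no asynchronous client exists for a given backend, another option is to push the blocking call onto a thread pool so the loop stays responsive. The sketch below uses plain asyncio, which recent Tornado versions run on, rather than Tornado's own API; `blocking_fetch`, `handle_post`, and the URLs are hypothetical.

```python
import asyncio
import time

def blocking_fetch(url):
    # Placeholder for a synchronous HTTP or database call.
    time.sleep(0.1)
    return f"response from {url}"

async def handle_post(urls):
    loop = asyncio.get_running_loop()
    # Each blocking call runs in the default thread pool; the loop stays free.
    tasks = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    return await asyncio.gather(*tasks)

urls = ["http://example.invalid/a", "http://example.invalid/b"]  # hypothetical
start = time.perf_counter()
responses = asyncio.run(handle_post(urls))
elapsed = time.perf_counter() - start
print(responses, f"{elapsed:.2f}s")
```

The calls overlap instead of running one after another, and `asyncio.gather` returns the responses in the same order as the input URLs.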
