I know I can set max_buffer_size in Tornado to limit the amount of data that can be uploaded to the server. But what I am trying to do is restrict the total amount of data across all requests to my Tornado server.
For example, I have 500 simultaneous requests being sent to my Tornado server. Each request is uploading 1MB of data. I want my Tornado server to reject connections when >150MB of data has been received across all requests. So the first 150 requests will be received, but then the next 350 will be rejected by Tornado before buffering any of that data into memory.
Is it possible to do this in Tornado?
There's not currently a way to set a global limit like this (but it might be a nice thing to add).
The best thing you can do currently is to ensure that the memory used by each connection stays low: set a low default max_body_size, and for RequestHandlers that need to receive more data than that, use @stream_request_body and call self.request.connection.set_max_body_size(large_value) in prepare(). With the @stream_request_body decorator, each connection's memory usage is limited by the chunk_size parameter instead of reading the whole body at once. Then, in your data_received method, you can await an allocation from a global semaphore to control memory usage beyond the per-connection chunk size.
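A minimal sketch of that pattern (the 150 MB budget, chunk size, port, and handler name are illustrative, not part of the question):

```python
import tornado.ioloop
import tornado.web
from tornado.locks import Semaphore

CHUNK_SIZE = 64 * 1024
# Global budget of roughly 150 MB across all in-flight uploads, counted in chunks.
GLOBAL_CHUNKS = Semaphore((150 * 1024 * 1024) // CHUNK_SIZE)

@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        self._held = 0
        # Allow large bodies on this handler even though the server-wide
        # default max_body_size is kept small.
        self.request.connection.set_max_body_size(1024 * 1024 * 1024)

    async def data_received(self, chunk):
        # Reserve part of the global budget before buffering this chunk;
        # Tornado won't read the next chunk until this coroutine finishes.
        await GLOBAL_CHUNKS.acquire()
        self._held += 1
        # ... write the chunk to disk or process it here ...

    def put(self):
        self.write("ok")

    def on_finish(self):
        self._release()

    def on_connection_close(self):
        self._release()

    def _release(self):
        while self._held:
            GLOBAL_CHUNKS.release()
            self._held -= 1

if __name__ == "__main__":
    app = tornado.web.Application([(r"/upload", UploadHandler)])
    # Small server-wide defaults; stream_request_body handlers raise their own limit.
    app.listen(8888, max_body_size=1 * 1024 * 1024, chunk_size=CHUNK_SIZE)
    tornado.ioloop.IOLoop.current().start()
```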
Related
I have a question about how a gRPC server handles multiple requests in parallel. I have a gRPC server that provides an endpoint to handle client requests, and there are multiple clients sending requests to that same endpoint.
When different clients send multiple requests to the server at the same time, how does the server handle the requests it receives simultaneously? Will each request be handled by a thread in parallel, or will the requests be queued and handled one by one?
Thanks!
HTTP/2 connections have a limit on the number of maximum concurrent streams on a connection at one time. By default, most servers set this limit to 100 concurrent streams.
A gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent. Applications with high load, or long running streaming gRPC calls, could see performance issues caused by calls queuing because of this limit.
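On the server side, the answer to "will each request be handled by a thread" depends on the implementation; assuming a Python gRPC server (the question doesn't say which language), RPCs are dispatched to a thread pool you provide, roughly like this:

```python
from concurrent import futures
import grpc

# Sketch of a Python gRPC server (names and port are illustrative).
# Each incoming RPC runs on a worker thread from this pool, so up to
# max_workers requests are handled in parallel; further RPCs wait for
# a free worker rather than being processed strictly one by one.
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    options=[
        # Server-side cap on concurrent HTTP/2 streams per connection,
        # the same kind of 100-stream limit discussed above.
        ("grpc.max_concurrent_streams", 100),
    ],
)
# add_MyServiceServicer_to_server(MyServicer(), server)  # generated stub registration
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```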
But this problem has its own solution; for example, in .NET we can set the following option when creating the GrpcChannel:
SocketsHttpHandler.EnableMultipleHttp2Connections = true
which means that when the concurrent stream limit is reached, the channel creates additional HTTP/2 connections.
I'm load testing a Spring Boot API, a POC to show the team it will handle high throughput. I'm making the requests with a Python script that uses a multiprocessing pool. When I start sending more than about 10,000 records, I get a "Max retries exceeded" error, which I've determined means the endpoint is refusing the client's connections because it's making too many of them.
Is there a Tomcat setting to allow more requests from a client (temporarily) for something like load testing? I tried setting "server.tomcat.max-threads" in the application.properties file, but that doesn't seem to help.
I have an application which handles websocket and HTTP requests for some basic operations and consumes push data over sockets. Nothing is very computation-intensive: some file tailing and the occasional file read/write is the heaviest processing it currently does. I want to deploy this to Linux. I have no static files to handle.
Can a Tornado application handle 50-100 websocket and HTTP clients without needing Nginx? I don't want to use another server for this. How many clients can it handle on its own?
Everywhere I search I'm told to use Nginx, and I don't want to involve it.
Yes, Tornado can easily handle 50-100 websocket and HTTP clients without needing Nginx. You only need Nginx as a reverse proxy if you're running multiple Tornado processes on separate ports.
If you're running a single process, or multiple processes on a single port, you don't need Nginx.
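For reference, the "multiple processes on a single port" setup looks roughly like this (the port and handler are just placeholders):

```python
import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("hello")

if __name__ == "__main__":
    app = tornado.web.Application([(r"/", MainHandler)])
    server = tornado.httpserver.HTTPServer(app)
    server.bind(8888)   # bind the port once, before forking
    server.start(0)     # fork one child process per CPU core
    tornado.ioloop.IOLoop.current().start()
```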
I've seen benchmarks showing that with a single Tornado process you can serve around 5,000 connections per second if your response size is around 100 KB, and over 20,000 requests per second for a 1 KB response size. But this also depends on your CPU speed.
I think it's safe to assume that with an average CPU and around 1 GB of RAM, you can easily serve around 2,000-3,000 requests per second.
I have a Python application that uses eventlet green threads (a pool of 1,000 green threads) to make HTTP connections. Whenever the client fires more than 1,000 parallel requests, ETIMEDOUT occurs. Can anyone help me out with the possible reason?
The most likely reason in this case: DNS server request throttling. You can easily check whether that's the case by eliminating DNS resolution (request http://{ip-address}/path, and don't forget to add a proper Host: header). If you're doing web crawling, these steps are not optional; you absolutely must:
control concurrency automatically (without human action) based on aggregate (i.e. average) execution time. This applies at every level independently: back off concurrent DNS requests if DNS responses get slower; back off TCP concurrency if response speed (body size / time) drops; back off overall request concurrency if your CPU is overloaded - don't request more than you can process.
retry on temporary failures, increasing the wait-before-retry period each time (search "backoff algorithm"; see the sketch after this list). How do you decide whether an error is temporary? Mostly research, trial and error.
run a local DNS server, and find and configure many upstreams
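As a concrete illustration of the retry-with-backoff point above, here is a rough Python sketch (the HTTP client, status-code policy, and delay values are assumptions, not part of the original answer):

```python
import random
import time

import requests  # any HTTP client works; requests is just for illustration

def fetch_with_backoff(url, max_attempts=5, base_delay=0.5):
    """Retry temporary failures, waiting longer (with jitter) each time."""
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, timeout=10)
            # Treat 429 and 5xx as temporary; everything else is final.
            if resp.status_code != 429 and resp.status_code < 500:
                return resp
        except (requests.ConnectionError, requests.Timeout):
            pass  # network-level errors are usually temporary
        # Exponential backoff plus jitter to avoid synchronized retries.
        time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```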
The next popular problem you'll likely face with high concurrency is the OS limit on the number of open connections and file descriptors. Search for sysctl somaxconn and ulimit nofile to fix those.
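The file-descriptor half of that can even be raised from inside the Python process (somaxconn is a system-wide sysctl and has to be changed outside it); a small sketch:

```python
import resource

# Raise this process's soft limit on open file descriptors up to the
# hard limit (the `ulimit nofile` mentioned above). Raising the hard
# limit itself still requires root or system configuration.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print(f"open-file limit raised from {soft} to {hard}")
```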
I have a Flask API that connects to a Redis cluster for caching purposes. Should I be creating and tearing down a Redis connection on each Flask API call? Or should I try to maintain a connection across requests?
My argument against the second option is that I should really try to keep the API as stateless as possible, and I also don't know whether keeping something persistent across requests might cause thread race conditions or other side effects.
However, if I want to persist a connection, should it be saved on the session or on the application context?
This is about performance and scale. To get those 2 buzzwords buzzing you'll in fact need persistent connections.
Eventual race conditions will be no different than with a reconnect on every request, so that shouldn't be a problem. Any race conditions will depend on how you're using Redis, but if it's just caching, there's not much room for error.
I understand the desired statelessness of an API from the client side's point of view, but I'm not so sure what you mean about the server side.
I'd suggest you put the connection in the application context, not the session (sessions could become too numerous), since the app context gives you the optimal one connection per process (created immediately at startup). Scaling this way becomes easy-peasy: you'll never have to worry about hitting the max connection count on the Redis box (and the less multiplexing, the better).
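A minimal sketch of that "one client per process, created at startup" idea (host, port, and route are placeholders; redis-py's client is safe to share between threads and keeps its own connection pool internally):

```python
import redis
from flask import Flask

app = Flask(__name__)

# One Redis client per worker process, created at startup and reused by
# every request handled in that process.
cache = redis.Redis(host="localhost", port=6379, db=0)

@app.route("/cached/<key>")
def cached(key):
    value = cache.get(key)
    if value is None:
        return "miss", 404
    return value
```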
It's a good idea from a performance standpoint to keep connections to a database open between requests. The reason is that opening and closing connections is not free and takes time, which can become a problem when you have too many requests. Another issue is that a database can only handle up to a certain number of connections, and if you open more, database performance will degrade, so you need to control how many connections are open at the same time.
To solve both of these issues you can use a connection pool. A connection pool contains a number of open database connections and provides access to them. When a database operation needs to be performed, a connection is taken from the pool. When the operation is completed, the connection is returned to the pool. If a connection is requested when all connections are taken, the caller has to wait until some connections are returned to the pool. Since no new connections are opened in this process (they were all opened in advance), this ensures that the database will not be overloaded with too many parallel connections.
If the connection pool is used correctly, a single connection will be used by only one thread at any moment.
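With redis-py, for example, the pool described above might look like the following sketch (the max_connections and timeout values are just illustrations):

```python
import redis

# Created once per process. BlockingConnectionPool matches the behaviour
# described above: when all connections are in use, the caller waits
# until one is returned instead of opening a new connection.
pool = redis.BlockingConnectionPool(host="localhost", port=6379, db=0,
                                    max_connections=20, timeout=5)
client = redis.Redis(connection_pool=pool)

def get_from_cache(key):
    # Each command checks a connection out of the pool and returns it.
    return client.get(key)
```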
Despite the fact that a connection pool has state (it must track which connections are currently in use), your API will still be stateless. This is because, from the API perspective, "stateless" means it does not have state or side effects visible to an API user. Your server can perform a number of operations that change its internal state, like writing to log files or writing to a cache, but since this does not influence what data is returned in reply to API calls, it does not make the API "stateful".
You can see some examples of using Redis connection pool here.
Regarding where it should be stored, I would use the application context, since it fits that purpose better.