I have a question about how a gRPC server handles multiple requests in parallel. I have a gRPC server that exposes an endpoint to handle client requests, and multiple clients send requests to that same endpoint.
When different clients send requests to the server at the same time, how does the server handle the requests received at the same moment? Will each request be handled by a separate thread simultaneously, or will the requests be queued and handled one by one?
Thanks!
HTTP/2 connections have a limit on the maximum number of concurrent streams allowed on a connection at one time. By default, most servers set this limit to 100 concurrent streams.
A gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent. Applications with high load, or long running streaming gRPC calls, could see performance issues caused by calls queuing because of this limit.
But this problem has its own solution. For example, in .NET we can enable additional connections when defining the GrpcChannel (the address below is just a placeholder):

    var handler = new SocketsHttpHandler { EnableMultipleHttp2Connections = true };
    var channel = GrpcChannel.ForAddress("https://localhost:5001", new GrpcChannelOptions { HttpHandler = handler });

With this setting, the channel creates additional HTTP/2 connections when the concurrent stream limit is reached.
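On the server side, the question doesn't name a language, but to illustrate the threading model: in Python's grpc library, for example, the server handles each incoming call on a thread from a pool you provide; once max_workers calls are in flight, further accepted calls wait for a free thread, and maximum_concurrent_rpcs (if set) caps how many calls are accepted at all. A minimal sketch, with placeholder port and pool sizes:

    # Hedged sketch of a Python gRPC server's concurrency settings; the port and
    # pool sizes are arbitrary example values.
    from concurrent import futures
    import grpc

    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=10),  # up to 10 calls handled in parallel threads
        maximum_concurrent_rpcs=20,                  # calls beyond this are rejected, not queued
    )
    # add_YourServicer_to_server(YourServicer(), server)  # generated registration, omitted in this sketch
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

So requests that arrive at the same time are handled by threads concurrently up to the pool size, and queue (or are rejected) beyond that.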
Related
I want to implement long polling in Python using Cyclone or Tornado, with scalability of the service in mind from the beginning. Clients might stay connected to this service for hours. My concept:
Client HTTP requests will be processed by multiple Tornado/Cyclone handler threads behind an NGINX proxy (serving as a load balancer). There will be multiple data queues: one for all unprocessed requests from all clients, and the rest containing responses specific to each connected client, previously generated by worker processes. When a request reaches a Tornado/Cyclone handler thread, its data will be sent to the worker queue and processed by the workers (which connect to the database, etc.). Meanwhile, the handler thread will look into the client-specific queue and send a response with data back to the client (if there is one waiting in the queue). Please see the diagram.
Simple diagram: https://i.stack.imgur.com/9ZxcA.png
I am considering a queue system because some requests might be pretty heavy on the database, and some requests might create notifications and messages for other clients. Is this the way to go towards a scalable server, or is it just overkill?
After doing some research I have decided to go with Tornado websockets connected to ZeroMQ, inspired by this answer: Scaling WebSockets with a Message Queue.
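For reference, a minimal sketch of that direction, i.e. a Tornado websocket handler relaying messages that workers publish over ZeroMQ; the PUB/SUB pattern, the address, and the handler name are my own placeholders, not part of the linked answer:

    # Sketch only: one handler per connected client, relaying messages published
    # by workers on a ZeroMQ PUB socket (tcp://127.0.0.1:5556 is a placeholder).
    import zmq
    import zmq.asyncio
    import tornado.ioloop
    import tornado.web
    import tornado.websocket

    ZMQ_CTX = zmq.asyncio.Context()

    class PushHandler(tornado.websocket.WebSocketHandler):
        def open(self):
            self.sub = ZMQ_CTX.socket(zmq.SUB)
            self.sub.connect("tcp://127.0.0.1:5556")
            self.sub.setsockopt_string(zmq.SUBSCRIBE, "")   # no topic filtering in this sketch
            tornado.ioloop.IOLoop.current().spawn_callback(self.relay)

        async def relay(self):
            try:
                while True:
                    msg = await self.sub.recv_string()      # wait for the next worker message
                    self.write_message(msg)                 # push it down the websocket
            except (zmq.ZMQError, tornado.websocket.WebSocketClosedError):
                pass                                        # ZeroMQ socket or websocket closed

        def on_close(self):
            self.sub.close()

    app = tornado.web.Application([(r"/push", PushHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()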
I know I can set max_buffer_size in Tornado to limit the amount of data that can be uploaded to the server. But what I am trying to do is restrict the total amount of data across all requests to my Tornado server.
For example, I have 500 simultaneous requests being sent to my Tornado server. Each request is uploading 1MB of data. I want my Tornado server to reject connections when >150MB of data has been received across all requests. So the first 150 requests will be received, but then the next 350 will be rejected by Tornado before buffering any of that data into memory.
Is it possible to do this in Tornado?
There's not currently a way to set a global limit like this (but it might be a nice thing to add).
The best thing you can do currently is to ensure that the memory used by each connection stays low: set a low default max_body_size, and for RequestHandlers that need to receive more data than that, use @stream_request_body and call self.request.connection.set_max_body_size(large_value) in prepare(). With the @stream_request_body decorator, each connection's memory usage will be limited by the chunk_size parameter instead of reading the whole body at once. Then, in your data_received method, you can await an allocation from a global semaphore to control memory usage beyond the chunk size per connection.
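To make that concrete, here is a rough sketch of the pattern described above; the constants, the permit accounting, and the UploadHandler itself are illustrative choices of mine, not Tornado defaults:

    # Sketch: cap total buffered upload data across all connections with a global
    # semaphore. Constants and the UploadHandler are illustrative, not Tornado API.
    import tornado.httpserver
    import tornado.ioloop
    import tornado.locks
    import tornado.web

    CHUNK = 64 * 1024                      # chunk_size given to HTTPServer
    GLOBAL_LIMIT = 150 * 1024 * 1024       # ~150 MB budget across all in-flight uploads
    budget = tornado.locks.Semaphore(GLOBAL_LIMIT // CHUNK)   # one permit per chunk-sized slot

    @tornado.web.stream_request_body
    class UploadHandler(tornado.web.RequestHandler):
        def prepare(self):
            self.held = 0
            # Raise the per-connection limit only for this handler.
            self.request.connection.set_max_body_size(1024 * 1024 * 1024)

        async def data_received(self, chunk):
            # One permit per chunk; the stream pauses here whenever the global
            # budget is exhausted, instead of buffering more data in memory.
            await budget.acquire()
            self.held += 1
            # ... process or persist `chunk` here ...

        def put(self):
            self.finish("ok")

        def _release_permits(self):
            while self.held:
                self.held -= 1
                budget.release()

        def on_finish(self):
            self._release_permits()          # return this connection's permits

        def on_connection_close(self):
            self._release_permits()          # also return them if the client disconnects

    app = tornado.web.Application([(r"/upload", UploadHandler)])
    # Keep the server-wide default body size low; handlers opt in to larger bodies in prepare().
    server = tornado.httpserver.HTTPServer(app, max_body_size=1024 * 1024, chunk_size=CHUNK)
    server.listen(8888)
    tornado.ioloop.IOLoop.current().start()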
I have an application which handles websocket and HTTP requests for some basic operations and consumes push data over sockets. Nothing is very computation-intensive: some file tailing and occasional file reads/writes are the heaviest processing it currently does. I want to deploy this to Linux. I have no static files to serve.
Can a Tornado application handle 50-100 websocket and HTTP clients without needing Nginx? I don't want to use another server for this. How many clients can it handle on its own?
Everywhere I search I find Nginx, and I don't want to involve it.
Yes, Tornado can easily handle 50-100 websocket and HTTP clients without needing Nginx. You only need Nginx as a reverse proxy if you're running multiple Tornado processes on separate ports.
If you're running a single process, or multiple processes on a single port, you don't need Nginx.
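For reference, running several Tornado processes on a single port without Nginx can look roughly like this (the handlers and port are placeholders of mine):

    # Sketch: several Tornado processes sharing one port, no reverse proxy.
    import tornado.httpserver
    import tornado.ioloop
    import tornado.web
    import tornado.websocket

    class PingHandler(tornado.web.RequestHandler):
        def get(self):
            self.write("pong")

    class EchoSocket(tornado.websocket.WebSocketHandler):
        def on_message(self, message):
            self.write_message(message)   # echo push data back to the client

    app = tornado.web.Application([(r"/ping", PingHandler), (r"/ws", EchoSocket)])
    server = tornado.httpserver.HTTPServer(app)
    server.bind(8888)
    server.start(0)   # fork one worker per CPU core, all accepting on port 8888
    tornado.ioloop.IOLoop.current().start()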
I've seen benchmarks which show that with a single Tornado process, you can serve around 5,000 connections per second if your response message size is around 100 KB; and over 20,000 requests per second for 1 KB response size. But this also depends on your CPU speed.
I think it's safe to assume that with an average CPU and around 1 GB of RAM, you can easily serve around 2,000-3,000 requests per second.
I use pika-0.10.0 with rabbitmq-3.6.6 broker on ubuntu-16.04. I designed a Request/Reply service. There is a single Request queue where all clients push their requests. Each client creates a unique Reply queue: the server pushes replies targeting this client to this unique queue. My API can be seen as two messages: init and run.
init messages contain big images, so init is a big and slow request. run messages are lighter, and the server reuses the previously sent images. The server can serve multiple clients. Usually client#1 sends init and then run multiple times. If client#2 comes in and sends init, it will replace the images sent by client#1 on the server, and further run requests issued by client#1 would use the wrong images. So I am asking:
is it possible to limit the number of connections to a queue? E.g. the server serves one client at a time.
another option would be: the server binds images to a client, saves them, and reuses them when this client sends run. This requires more work, and will impact performance if two or more clients' requests are closely interleaved.
sending the images in each run request is not an option, would be too slow.
I think you have a problem in your design. Logically, each run corresponds to a certain init, so they have to be connected. I'd put a correlation-id field into the init and run events. When the server receives a run, it checks whether a corresponding init was processed and uses the result of that init.
Speaking of performance:
You can make init a worker queue and have multiple processing servers listen to it. There is an example in the RabbitMQ docs.
Then, when an init request comes in, one of the available servers will pick it up and store your images along with the correlation ID. If you have multiple init requests at the same time, no problem: they will be processed eventually (or simultaneously, if servers are free).
Then the server that did the processing sends a reply message to the client's queue saying the init work is done, along with the name of the queue where the run request has to be published.
When ready, the client sends its run request to the correct queue.
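A client-side sketch of that correlation-id flow with pika might look like this; the queue names, host, and payloads are placeholders of mine, not part of the original setup:

    # Client-side sketch of the correlation-id approach.
    import uuid
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="requests")                       # shared request queue
    reply_queue = channel.queue_declare(queue="", exclusive=True).method.queue

    session_id = str(uuid.uuid4())   # ties every 'run' to the 'init' that sent the images

    def send(kind, payload):
        channel.basic_publish(
            exchange="",
            routing_key="requests",
            body=payload,
            properties=pika.BasicProperties(
                type=kind,                  # 'init' or 'run'
                correlation_id=session_id,  # server stores/looks up images under this id
                reply_to=reply_queue,       # this client's reply queue
            ),
        )

    send("init", b"...image bytes...")      # big, slow request
    send("run", b"...run parameters...")    # light request; server reuses images for session_id

On the server side the images would be stored keyed by correlation_id, so a run from client#1 can never pick up client#2's images.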
To directly answer the question:
there is a common misconception that you publish to a queue. In RabbitMQ you publish to an exchange, which takes care of routing your messages to a number of queues. So your question really becomes: can I limit the number of publishing connections to an exchange? I'm pretty sure there is no way of doing so on the broker side.
Even if there were a way of limiting the number of connections, imagine this situation:

Client1 comes in and pushes its init request.
Client1 holds its connection, waiting to push run.
Client1 fails, or a network partition occurs, and its connection gets dropped.
Client2 comes in and pushes its init request.
Client2 fails.
Client1 comes back up, pushes its run, and gets Client2's images.

A connection is a transient thing and cannot be relied upon as a transaction mechanism.
I set up a Django + nginx + uwsgi server. In my Django application, I want to send several HTTP requests in parallel. I created multiple threads and sent each request in a thread. However, when I checked the timestamp at which each request was sent, I saw that all the requests were sent sequentially.
Could anyone tell me how I can send the HTTP requests in parallel?
I'm guessing this is due to Python threads and the GIL... if you used multiprocessing instead of threads, you should be able to send them truly in parallel.
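If you want to try the multiprocessing route, a minimal, self-contained sketch (outside the Django request cycle; the URLs are placeholders, and the requests library is assumed to be installed) could be:

    # Sketch: fan the requests out to worker processes instead of threads.
    from concurrent.futures import ProcessPoolExecutor
    import requests

    URLS = [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ]

    def fetch(url):
        # Each call runs in its own process, so no single GIL is shared.
        return url, requests.get(url, timeout=10).status_code

    if __name__ == "__main__":
        with ProcessPoolExecutor(max_workers=len(URLS)) as pool:
            for url, status in pool.map(fetch, URLS):
                print(url, status)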