Long polling scalable architecture in tornado/cyclone

I want to implement long polling in Python using cyclone or tornado, with the scalability of the service in mind from the beginning. Clients might stay connected to this service for hours. My concept:
Client HTTP requests will be processed by multiple tornado/cyclone handler threads behind an NGINX proxy (serving as load balancer). There will be multiple data queues: one for all unprocessed requests from all clients, and the rest holding responses specific to each connected client, previously generated by worker processes. When a request is delivered to a tornado/cyclone handler thread, the request data will be sent to the worker queue and then processed by workers (which connect to the database etc.). Meanwhile, the tornado/cyclone handler thread will look into the client-specific queue and send a response with data back to the client (if any is waiting in the queue). Please see the diagram.
Simple diagram: https://i.stack.imgur.com/9ZxcA.png
I am considering a queue system because some requests might be pretty heavy on the database and some requests might create notifications and messages for other clients. Is this the way to go for a scalable server, or is it just overkill?
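The queue layout described in the question can be sketched with the standard library alone. This is a minimal sketch, not a tornado or cyclone implementation: the class name LongPollHub, its methods, and the in-memory asyncio queues are all hypothetical stand-ins for the handler, the shared work queue, and the per-client response queues.

```python
import asyncio


class LongPollHub:
    """Hypothetical in-process model of the proposed queue layout."""

    def __init__(self):
        self.work_queue = asyncio.Queue()   # all unprocessed requests
        self.client_queues = {}             # client_id -> queue of responses

    def _queue_for(self, client_id):
        # Create the client-specific queue on first use.
        return self.client_queues.setdefault(client_id, asyncio.Queue())

    async def handle_request(self, client_id, payload, timeout=30.0):
        """Enqueue the work, then hold the request until a response or timeout."""
        await self.work_queue.put((client_id, payload))
        try:
            return await asyncio.wait_for(self._queue_for(client_id).get(), timeout)
        except asyncio.TimeoutError:
            return None  # nothing ready; the client is expected to poll again

    async def worker(self):
        """Stand-in for a worker process that talks to the database."""
        while True:
            client_id, payload = await self.work_queue.get()
            await asyncio.sleep(0.01)  # placeholder for the real processing
            await self._queue_for(client_id).put({"echo": payload})
            self.work_queue.task_done()


async def demo():
    hub = LongPollHub()
    worker = asyncio.create_task(hub.worker())
    result = await hub.handle_request("client-1", "hello", timeout=1.0)
    worker.cancel()
    return result
```

In a real deployment the queues would presumably live in a broker shared by all handler processes, so that a response produced by any worker can reach whichever handler currently holds the client's connection.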

After doing some research I have decided to go with tornado websockets connected to ZeroMQ, inspired by this answer: Scaling WebSockets with a Message Queue.

Related

grpc server handles multiple requests in parallel

I have a question about how a gRPC server handles multiple requests in parallel. I have a gRPC server that provides an endpoint to handle client requests, and multiple clients send requests to the same endpoint.
When different clients send multiple requests to the server at the same time, how does the server handle them? Will each request be handled simultaneously by its own thread, or will the requests be queued and handled one by one?
Thanks!

HTTP/2 connections have a limit on the maximum number of concurrent streams on a connection at one time. By default, most servers set this limit to 100 concurrent streams.
A gRPC channel uses a single HTTP/2 connection, and concurrent calls are multiplexed on that connection. When the number of active calls reaches the connection stream limit, additional calls are queued in the client. Queued calls wait for active calls to complete before they are sent. Applications with high load, or long running streaming gRPC calls, could see performance issues caused by calls queuing because of this limit.
This problem has a solution, though. For example, in .NET you can set the following when configuring the GrpcChannel:
SocketsHttpHandler.EnableMultipleHttp2Connections = true
With this setting, the channel creates additional HTTP/2 connections when the concurrent stream limit is reached.
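The queuing effect of a stream limit can be imitated without gRPC at all. The sketch below is only a simulation: asyncio.Semaphore plays the role of the per-connection stream limit, and the function name run_calls and its parameters are made up for illustration.

```python
import asyncio


async def run_calls(num_calls, max_concurrent_streams, call_seconds=0.01):
    """Return the peak number of simulated calls in flight at the same time."""
    limit = asyncio.Semaphore(max_concurrent_streams)
    in_flight = 0
    peak = 0

    async def one_call():
        nonlocal in_flight, peak
        # Calls beyond the limit wait here, like calls queued in the client.
        async with limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(call_seconds)  # placeholder for the RPC itself
            in_flight -= 1

    await asyncio.gather(*(one_call() for _ in range(num_calls)))
    return peak
```

With ten calls and a limit of three, no more than three are ever active at once; the rest wait for a slot, which is the latency cost the answer warns about for long-running streaming calls.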

Server Push with SocketIO from Celery Task

I have a flask application within which I have many long running asynchronous tasks (~hours). It's important that the state of these tasks is communicated with the client.
I use Celery to manage the background task queue, and I'm currently trying to broadcast updates to the client from each background thread via SocketIO. Is this possible? Is there a better-suited strategy for achieving what I would like?

You did not say, but I assume you plan on using Flask-SocketIO to handle the server-side SocketIO and not the official Node.js server, correct?
What you want to do can be done, but with the current version of Flask-SocketIO, the problem is that the process that hosts the Flask and Flask-SocketIO server owns the socket connections with the clients, so it is the only process that can communicate with them. At this time, Flask-SocketIO does not offer any help in sending data to clients from other processes such as Celery workers; you have to implement this part yourself. Specifically for Celery, you can have your long-running tasks expose progress information that the server process can pick up and send to the clients.
I am currently working on improvements to Flask-SocketIO that will enable any process to send messages to connected clients using a Redis pub/sub backend for communication to the Flask-SocketIO server. Once this work is completed you will be able to write data to any client transparently from your Celery process.
You also ask if there is another alternative. You should also consider that the client can poll the server for status. If the updates do not need to be very frequent, then this is an option that is going to be much easier to implement. The client asks the server for status for a given task, and the server in turn asks the Celery task. I showed this approach in my Flask+Celery blog article.
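The polling alternative can be sketched as follows. This is an illustration under loud assumptions: a background thread stands in for the Celery task, a plain dictionary stands in for the place where task state is kept, and start_task and get_status are hypothetical names, not Flask or Celery APIs.

```python
import threading
import time
import uuid

_progress = {}            # task_id -> {"state": ..., "percent": ...}
_lock = threading.Lock()  # protects _progress across threads


def start_task(steps=5, step_seconds=0.01):
    """Start a fake long-running task and return its id and thread."""
    task_id = uuid.uuid4().hex
    with _lock:
        _progress[task_id] = {"state": "PENDING", "percent": 0}

    def run():
        for i in range(1, steps + 1):
            time.sleep(step_seconds)  # placeholder for real work
            with _lock:
                _progress[task_id] = {"state": "PROGRESS",
                                      "percent": 100 * i // steps}
        with _lock:
            _progress[task_id] = {"state": "SUCCESS", "percent": 100}

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return task_id, thread


def get_status(task_id):
    """What a status endpoint would return when the client polls."""
    with _lock:
        return dict(_progress.get(task_id, {"state": "UNKNOWN", "percent": 0}))
```

The client would call the status endpoint on a timer with the task id it received when the task was started, and stop polling once the state is SUCCESS.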

I was able to solve this by creating an endpoint on the Flask server. See my answer here for details.

Scaling a decoupled realtime server alongside a standard webserver

Say I have a typical web server that serves standard HTML pages to clients, and a websocket server running alongside it used for realtime updates (chat, notifications, etc.).
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
My concern is, if I want to scale things up a bit, and add another realtime server, it seems my only options are:
Have the main server keep track of which realtime server the client is connected to. When that client receives a notification/chat message, the main server forwards that message along to only the realtime server the client is connected to. The downside here is code complexity, as the main server has to do some extra bookkeeping.
Or instead have the main server simply pass that message along to every realtime server; only the server the client is connected to would actually do anything with it. This would result in a number of wasted messages being passed around.
Am I missing another option here? I'm just trying to make sure I don't go too far down one of these paths and realize I'm doing things totally wrong.

If the scenario is
a) The main web server raises a message upon an action (let's say a record is inserted)
b) It notifies the appropriate real-time server
you could decouple these two steps by using an intermediate pub/sub architecture that forwards the messages to the intended recipient.
An implementation would be
1) You have a Redis pub/sub channel; when a client connects to a real-time socket, the real-time server starts listening on that channel.
2) When the main app wants to notify a user via the real-time server, it pushes a message to the channel; the real-time server gets it and forwards it to the intended user.
This way, you decouple the realtime notification from the main app and you don't have to keep track of where the user is.
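The two steps above can be sketched without a Redis server. Everything here is hypothetical: InMemoryPubSub is a stand-in for a Redis pub/sub connection, the per-user channel naming ("user:<id>") is an assumption the answer does not spell out, and appending to a list stands in for writing to the websocket.

```python
from collections import defaultdict


class InMemoryPubSub:
    """Hypothetical stand-in for a Redis pub/sub connection."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Returns how many subscribers received the message.
        callbacks = list(self._subscribers.get(channel, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)


class RealtimeServer:
    def __init__(self, name, bus):
        self.name = name
        self.bus = bus
        self.delivered = []  # (user_id, message) pairs "sent" over sockets

    def on_client_connect(self, user_id):
        # Step 1: start listening on the user's channel when the socket opens.
        self.bus.subscribe(
            f"user:{user_id}",
            lambda message, uid=user_id: self.delivered.append((uid, message)),
        )


def notify_user(bus, user_id, message):
    # Step 2: the main app publishes without knowing which server holds the socket.
    return bus.publish(f"user:{user_id}", message)
```

Because each realtime server subscribes only to the channels of its own connected users, the main app never needs to know where a user is connected, and servers without that user receive nothing.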

The problem you are describing is the common "message backplane" used for example in SignalR, also related to the "fanout message exchange" in message architectures. When having a backplane or doing fanout, every message is forwarded to every message node server, so clients can connect to any server and get the message. This approach is a reasonable pain when you have to support both long polling and websockets. However, as you noticed, it is a waste of traffic and resources.
You need to use a message infrastructure with intelligent routing, like RabbitMQ. Take a look at the topic and headers exchanges: https://www.rabbitmq.com/tutorials/amqp-concepts.html
How Topic Exchanges Route Messages
RabbitMQ for Windows: Exchange Types
There are tons of different queuing frameworks. Pick the one you like, but ensure you can have more exchange modes than just direct or fanout ;) In the end, a WebSocket is just an endpoint to connect to a message infrastructure. So if you want to scale out, it boils down to the backend you have :)
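To make the idea of topic routing concrete, here is a small sketch of AMQP-style topic matching, where words are separated by dots, "*" matches exactly one word and "#" matches zero or more words. The function topic_matches is written for illustration only; it is not RabbitMQ's implementation or part of any client library.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches the routing key."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            # Pattern exhausted: succeed only if the key is exhausted too.
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j < len(k) and (p[i] == "*" or p[i] == k[j]):
            # '*' or a literal word consumes exactly one word.
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

A realtime server could then bind with a pattern such as "user.42.#" (a hypothetical key scheme) and receive only the messages for users connected to it, instead of every message as with fanout.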

For just a few realtime servers, you could conceivably just keep a list of them in the main server and just go through them round-robin.
Another approach is to use a load balancer.
Basically, you'll have one dedicated node to receive the requests from the main server, and then have that load-balancer node take care of choosing which websocket/realtime server to forward the request to.
Of course, this just shifts the code complexity from the main server to a new component, but conceptually I think it's better and more decoupled.

Changed the answer because a reply indicated that the "main" and "realtime" servers are already load-balanced clusters and not individual hosts.
The central scalability question seems to be:
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
Emphasis on the word "related". Assume you have 10 "main" servers and 50 "realtime" servers, and an event occurs on main server #5: which of the websockets would be considered related to this event?
The worst case is that any event on any "main" server would need to propagate to all websockets. That's O(N^2) complexity, which counts as a severe scalability impairment.
This O(N^2) complexity can only be prevented if you can arrange the related connections into groups that don't grow with the cluster size or the total number of connections. Grouping requires state memory to store which group(s) a connection belongs to.
Remember that there are three ways to store state:
1) global memory (memcached / redis / DB, ...)
2) sticky routing (load balancer configuration)
3) client memory (cookies, browser local storage, link/redirect URLs)
Option 3 counts as the most scalable one because it avoids central state storage.
For passing the messages from "main" to the "realtime" servers, that traffic should by definition be much smaller than the traffic towards the clients. There are also efficient frameworks to push pub/sub traffic.
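One way to get routing state without central storage is to derive the target server from something the client already carries, such as its user id. The sketch below is an assumption added for illustration, not something the answer prescribes: server_for is a hypothetical helper that maps a user id to a realtime server by hashing, so the load balancer and the main servers can compute the same answer independently.

```python
import hashlib


def server_for(user_id, servers):
    """Pick a realtime server deterministically from the user id."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a stable integer.
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]
```

Note that plain modulo hashing remaps most users whenever the server list changes, so a growing cluster would likely want a more stable scheme.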

Python Twisted TCP application - How to prevent incoming message loss by blocking process

I have 10 messages/second (total activity) coming in over TCP from 40 clients.
I need to take each message and do a 5 second process (look up a webservice, do some DB queries and finally write the results to the DB).
How do I separate messages coming in from the slow 5 second process? Also I might receive another message from a client while already processing a message for that client. I NEVER want to lose a message.

With Twisted, the answer is to simply do what you want to do:
from twisted.python.log import err
from twisted.internet.protocol import Protocol
class YourProtocol(Protocol):
    ...
    def messageReceived(self, message):
        d = lookupWebService(message)
        d.addCallback(queryDatabase)
        d.addCallback(saveResults)
        d.addErrback(err, "Servicing %r failed" % (message,))
You can find APIs for interacting with web services in twisted.web.client (presuming "web services" are things you talk to using an HTTP client). You can find APIs for interacting with some SQL database servers in twisted.enterprise.adbapi. You can find APIs for interacting with other kinds of databases with a little googling.
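The same non-blocking chain can be illustrated with the standard library's asyncio instead of Twisted, which makes it runnable without extra packages. This is an analogue, not the Twisted code: the three step functions are hypothetical placeholders that merely sleep, and receive_all is an invented driver that simulates messages arriving.

```python
import asyncio


async def lookup_web_service(message):
    await asyncio.sleep(0.01)  # placeholder for the HTTP call
    return {"message": message, "lookup": "ok"}


async def query_database(result):
    await asyncio.sleep(0.01)  # placeholder for the DB queries
    return {**result, "rows": 1}


async def save_results(result, store):
    store.append(result)       # placeholder for the final DB write


async def service_message(message, store, errors):
    """Run the slow chain for one message; record failures instead of losing them."""
    try:
        result = await lookup_web_service(message)
        result = await query_database(result)
        await save_results(result, store)
    except Exception as exc:
        errors.append((message, exc))


async def receive_all(messages):
    store, errors, tasks = [], [], []
    for message in messages:
        # Scheduling a task returns immediately, so receiving is never
        # blocked by the slow processing of earlier messages.
        tasks.append(asyncio.create_task(service_message(message, store, errors)))
    await asyncio.gather(*tasks)
    return store, errors
```

As in the Twisted version, a second message from the same client simply starts its own chain while the first is still in progress.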

Distribute tasks in parallel using divide and conquer.
There are lots of Python examples illustrating this approach; read up here:
Task Ventilator: http://zguide.zeromq.org/py:taskvent
Task Worker: http://zguide.zeromq.org/py:taskwork
Task Sink: http://zguide.zeromq.org/py:tasksink
You can also distribute tasks using a ROUTER/DEALER proxy. Messages arriving at the proxy are fair-queued and distributed amongst downstream workers, no back chatter; this approach may better fit your needs.
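The ventilator, worker and sink roles can be sketched with threads and queue.Queue from the standard library. This is only a structural illustration: the queues stand in for ZeroMQ PUSH/PULL sockets, and run_pipeline and its parameters are hypothetical names.

```python
import queue
import threading


def run_pipeline(tasks, num_workers=3, work=lambda x: x * x):
    """Fan tasks out to workers and collect their results in a sink."""
    task_q = queue.Queue()    # ventilator -> workers
    result_q = queue.Queue()  # workers -> sink
    stop = object()           # sentinel telling a worker to exit

    def worker():
        while True:
            item = task_q.get()
            if item is stop:
                break
            result_q.put((item, work(item)))

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()

    # Ventilator: push every task, then one stop sentinel per worker.
    for task in tasks:
        task_q.put(task)
    for _ in workers:
        task_q.put(stop)

    for w in workers:
        w.join()

    # Sink: gather whatever the workers produced.
    results = {}
    while not result_q.empty():
        item, value = result_q.get()
        results[item] = value
    return results
```

Because tasks wait in the queue until a worker is free, nothing is dropped when work arrives faster than it can be processed; with ZeroMQ the same buffering happens in the sockets, subject to their high-water marks.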

Websockets behind nginx triggered by zeromq?

I'm trying to design a system that will process large amounts of data and send updates to the client about its progress. I'd like to use nginx (which, thankfully, just started supporting websockets) and uwsgi for the web server, and I'm passing messages through the system with zeromq. Ideally the solution could be written in Python, but I'm also open to a Nodejs or even a Go solution.
Here is the flow that I'd like to achieve:
Client visits a website and requests that a large amount of data be processed.
The server farms out the processing to another process/server [the worker] via zeromq, and replies to the client request explaining that processing has begun, including information about how to set up a websocket with the server.
The client sets up the websocket connection and waits for updates.
When the processing is done, the worker sends a "processing done!" message to the websocket process via zeromq, and the websocket process pushes the message down to the client.
Is what I describe possible? I guess I was thinking that I could run uwsgi in emperor mode so that it can handle one process (port) for the webserver and another for the websocket process. I'm just not sure if I can find a way to both receive zeromq message and manage websocket connections all from the same process. Maybe I have to initiate the final websocket push from the worker?
Any help/correct-direction-pointing/potential-solutions would be much appreciated. Any sample or snippet of an nginx config file with websockets properly routed would be appreciated as well.
Thanks!

Sure, that should be possible. You might want to look at zerogw.
