websocket vs rest API for real time data? [closed] - python

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I need to constantly access a server to get real-time data for financial instruments. The price is constantly changing, so I need to request new prices every 0.5 seconds. The REST APIs of the brokers let me do this; however, I have noticed there's quite some delay when connecting to the server. I just noticed that they also have a websocket API, though. From what I've read, they both have some pros and cons. But for what I want to do, and because speed is especially important here, which kind of API would you recommend? Is websocket really faster?
Thank you!

The most efficient operation for what you're describing would be to use a webSocket connection between client and server and have the server send updated price information directly to the client over the webSocket ONLY when the price changes by some meaningful amount or when some minimum amount of time has elapsed and the price has changed.
This can be much more efficient than having the client constantly ask for new prices, and the new information reaches the client in a more timely fashion.
So, if you're interested in how quickly information about a new price level gets to the client, a webSocket can get it there sooner because the server can send the new pricing information directly to the client the very moment it changes on the server. With a REST call, by contrast, the client has to poll on some fixed time interval and will only ever get new data at the point of its polling interval.
A webSocket can also be faster and easier on your networking infrastructure simply because fewer network operations are involved to simply send a packet over an already open webSocket connection versus creating a new connection for each REST/Ajax call, sending new data, then closing the connection. How much of a difference/improvement this makes in your particular application would be something you'd have to measure to really know.
But, webSockets were designed to help with your specific scenario where a client wants to know (as close to real-time as practical) when something changes on the server so I would definitely think that it would be the preferred design pattern for this type of use.
Here's a comparison of the networking operations involved in sending a price change over an already open webSocket vs. making a REST call.
webSocket
Server sees that a price has changed and immediately sends a message to each client.
Client receives the message about new price.
Rest/Ajax
Client sets up a polling interval
Upon next polling interval trigger, client creates socket connection to server
Server receives request to open new socket
When connection is made with the server, client sends request for new pricing info to server
Server receives request for new pricing info and sends reply with new data (if any).
Client receives new pricing data
Client closes socket
Server receives socket close
As you can see, there's a lot more going on in the REST/Ajax call from a networking point of view because a new connection has to be established for every call, whereas the webSocket uses an already open connection. In addition, in the webSocket case, the server just sends the client new data when new data is available - the client doesn't have to regularly request it.
If the pricing information doesn't change super often, the REST/Ajax scenario will also frequently have "do-nothing" calls where the client requests an update but there is no new data. The webSocket approach never has that waste, since the server only sends new data when it is available.
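To make the contrast concrete, here is a minimal sketch of both approaches in Python using the requests and websockets packages. The broker URLs, the subscribe message, and the JSON shape of the price updates are hypothetical placeholders, not any particular broker's API:

```python
import asyncio
import json
import time

import requests    # pip install requests
import websockets  # pip install websockets

REST_URL = "https://broker.example.com/api/price/EURUSD"  # hypothetical
WS_URL = "wss://broker.example.com/stream"                # hypothetical

def poll_rest(interval=0.5):
    """REST: make a fresh request every interval, even if nothing changed."""
    while True:
        price = requests.get(REST_URL).json()["price"]
        print("polled price:", price)
        time.sleep(interval)

async def subscribe_ws():
    """webSocket: one long-lived connection; the server pushes only on change."""
    async with websockets.connect(WS_URL) as ws:
        await ws.send(json.dumps({"subscribe": "EURUSD"}))  # hypothetical format
        async for message in ws:
            print("pushed price:", json.loads(message)["price"])

# asyncio.run(subscribe_ws())  # or poll_rest() for the polling variant
```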

Related

Scaling a decoupled realtime server alongside a standard webserver

Say I have a typical web server that serves standard HTML pages to clients, and a websocket server running alongside it used for realtime updates (chat, notifications, etc.).
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
My concern is, if I want to scale things up a bit, and add another realtime server, it seems my only options are:
Have the main server keep track of which realtime server each client is connected to. When that client receives a notification/chat message, the main server forwards the message only to the realtime server that client is connected to. The downside here is code complexity, as the main server has to do some extra bookkeeping.
Or instead have the main server simply pass the message along to every realtime server; only the server the client is connected to would actually do anything with it. This would result in a number of wasted messages being passed around.
Am I missing another option here? I'm just trying to make sure I don't go too far down one of these paths and realize I'm doing things totally wrong.
If the scenario is
a) The main web server raises a message upon an action (let's say a record is inserted)
b) It notifies the appropriate real-time server
you could decouple these two steps by using an intermediate pub/sub architecture that forwards the messages to the intended recipient.
An implementation would be
1) You have a redis pub/sub channel; when a client connects to a real-time socket, you start listening on that channel.
2) When the main app wants to notify a user via the real-time server, it publishes a message to the channel; the real-time server gets it and forwards it to the intended user.
This way, you decouple the realtime notification from the main app and you don't have to keep track of where the user is.
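A minimal sketch of that redis pub/sub wiring using the redis-py package; the channel naming scheme and the message fields are assumptions for illustration:

```python
import json
import redis  # pip install redis

r = redis.Redis()

# Main app side: publish a notification addressed to one user.
def notify_user(user_id, payload):
    r.publish(f"notifications:{user_id}", json.dumps(payload))

# Real-time server side: when a client connects, listen on their channel.
def listen_for_user(user_id, websocket_send):
    pubsub = r.pubsub()
    pubsub.subscribe(f"notifications:{user_id}")
    for message in pubsub.listen():
        if message["type"] == "message":
            # forward the payload over the user's open websocket connection
            websocket_send(json.loads(message["data"]))
```

The main app never needs to know which real-time server the user landed on; whichever server subscribed to that user's channel receives the message.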
The problem you are describing is the common "message backplane" used, for example, in SignalR; it is also related to the "fanout message exchange" in messaging architectures. With a backplane or fanout, every message is forwarded to every message node server, so clients can connect to any server and still get the message. This approach is reasonable when you have to support both long polling and websockets, since a client may end up on any node. However, as you noticed, it is a waste of traffic and resources.
You need to use a message infrastructure with intelligent routing, like RabbitMQ. Take a look at topic and header exchanges: https://www.rabbitmq.com/tutorials/amqp-concepts.html
How Topic Exchanges Route Messages
RabbitMQ for Windows: Exchange Types
There are tons of different queuing frameworks. Pick the one you like, but ensure you can have more exchange modes than just direct or fanout ;) At the end of the day, a WebSocket is just an endpoint to connect to a message infrastructure. So if you want to scale out, it boils down to the backend you have :)
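To make the routing idea concrete, here is a rough pika sketch; the exchange name and the <node>.<user> routing-key scheme are assumptions for illustration. The point is that each realtime node binds a queue only to its own keys, so it never receives traffic meant for other nodes (unlike fanout):

```python
import pika  # pip install pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="realtime", exchange_type="topic")

# Main server: publish with a routing key naming the node that holds the user.
channel.basic_publish(
    exchange="realtime",
    routing_key="node-7.user-42",  # hypothetical key: <node>.<user>
    body=b'{"event": "chat", "text": "hi"}',
)

# Realtime server "node-7": an exclusive queue bound only to its own keys.
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(exchange="realtime", queue=result.method.queue,
                   routing_key="node-7.*")
```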
For just a few realtime servers, you could conceivably just keep a list of them in the main server and just go through them round-robin.
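A minimal sketch of that round-robin idea; the server names are hypothetical:

```python
import itertools

realtime_servers = ["rt1.example.com", "rt2.example.com", "rt3.example.com"]
rotation = itertools.cycle(realtime_servers)

def pick_server():
    # each call returns the next realtime server in a fixed rotation
    return next(rotation)
```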
Another approach is to use a load balancer.
Basically, you'll have one dedicated node to receive the requests from the main server, and then have that load-balancer node take care of choosing which websocket/realtime server to forward the request to.
Of course, this just shifts the code complexity from the main server to a new component, but conceptually I think it's better and more decoupled.
Changed the answer because a reply indicated that the "main" and "realtime" servers are already load-balanced clusters and not individual hosts.
The central scalability question seems to be:
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
Emphasis on the word "related". Assume you have 10 "main" servers and 50 "realtime" servers, and an event occurs on main server #5: which of the websockets would be considered related to this event?
Worst case is that any event on any "main" server would need to propagate to all websockets. That's O(N^2) complexity, which counts as a severe scalability impairment.
This O(N^2) complexity can only be prevented if you can group the related connections into groups that don't grow with the cluster size or the total number of connections. Grouping requires state memory to record which group(s) a connection belongs to.
Remember that there are three ways to store state:
global memory (memcached / redis / DB, ...)
sticky routing (load balancer configuration)
client memory (cookies, browser local storage, link/redirect URLs)
Option 3 counts as the most scalable because it omits a central state store.
For passing the messages from the "main" to the "realtime" servers, that traffic should by definition be much smaller than the traffic towards the clients. There are also efficient frameworks for pushing pub/sub traffic.

WebSockets best practice for connecting an external application to push data [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am trying to understand how to use websockets correctly and seem to be missing some fundamental part of the puzzle.
Say I have a website with 3 different pages:
newsfeed1.html
newsfeed2.html
newsfeed3.html
When a user goes to one of those pages they get a feed specific to the page, ie newsfeed1.html = sport, newsfeed2.html = world news etc.
There is a CoreApplication.py that does all the handling of getting data and parsing etc.
Then there is a WebSocketServer.py, using say Autobahn.
All the examples I have looked at, and that is a lot, only seem to react to a message from the client (browser) within WebSocketServer.py (think chat echo examples). So a client browser sends a chat message and it is echoed back or broadcast to all connected client browsers.
What I am trying to figure out is given the following two components:
CoreApplication.py
WebSocketServer.py
How to best make CoreApplication.py communicate with WebSocketServer.py for the purpose of sending messages to connected users.
Should CoreApplication.py normally just send command messages to WebSocketServer.py as a client? For example like this:
CoreApplication.py -> connects to WebSocketServer.py as a normal client -> sends a JSON command message (like "broadcast message X to all users" or "send message Y to a specific remote client") -> WebSocketServer.py determines how to process the incoming message depending on which client is connected to which feed and sends it to the corresponding remote client browsers.
OR, should CoreApplication.py connect programmatically with WebSocketServer.py? I cannot seem to find any examples of doing this, for example with Autobahn or other simple websocket servers, because once the WebSocketServer is instantiated it seems to run in a loop and does not accept external sendMessage requests.
So to sum up the question: what is the best practice? Should CoreApplication.py interact with WebSocketServer.py as a client (with special command data), or should CoreApplication.py use an already running instance of WebSocketServer.py (both are on the same machine) through some more direct method, calling sendMessage directly without having to make a full websocket connection to the WebSocketServer.py server first?
It depends on your software design. If you decide the logic from WebSocketServer.py and CoreApplication.py belongs together, merge it.
If not, you need some kind of inter-process communication (IPC).
You can use websockets for this IPC, but I would suggest you use something simpler. For example, you can use JSON-RPC over TCP or a unix domain socket to send control messages from CoreApplication.py to WebSocketServer.py.
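For instance, a bare-bones sketch of that kind of control channel over a unix domain socket with newline-delimited JSON; the socket path and message fields are made up for illustration, and a real JSON-RPC library would add request ids and error handling:

```python
import json
import socket

SOCK_PATH = "/tmp/wsserver-control.sock"  # hypothetical path

# CoreApplication.py side: fire one control message at the websocket server.
def send_command(command):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(SOCK_PATH)
        s.sendall((json.dumps(command) + "\n").encode())

send_command({"action": "broadcast", "feed": "sport", "message": "goal!"})
```

WebSocketServer.py would listen on SOCK_PATH from within its own event loop and translate each decoded command into sendMessage calls on the matching websocket connections.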

Which web servers are compatible with gevent and how do the two relate?

I'm looking to start a web project using Flask and its SocketIO plugin, which depends on gevent (something something greenlets), but I don't understand how gevent relates to the webserver. Does using gevent restrict my server choice at all? How does it relate to the different levels of web servers that we have in python (e.g. Nginx/Apache, Gunicorn)?
Thanks for the insight.
First, let's clarify what we are talking about:
gevent is a library that makes it easy to program with event loops. It is a way to return responses immediately, without "blocking" the requester.
socket.io is a JavaScript library for creating clients that maintain permanent connections to servers, which send events. The library can then react to these events.
greenlet: think of this as a lightweight thread; a way to launch multiple workers that each do some task.
A highly simplified overview of the entire process follows:
Imagine you are creating a chat client.
You need a way to notify the users' screens when anyone types a message. For this to happen, you need some way to tell all the users that a new message is there to be displayed. That's what socket.io does. You can think of it like a radio that is tuned to a particular frequency. Whenever someone transmits on this frequency, the code does something. In the case of the chat program, it adds the message to the chat box window.
Of course, if you have a radio tuned to a frequency (your client), then you need a radio station/dj to transmit on this frequency. Here is where your flask code comes in. It will create "rooms" and then transmit messages. The clients listen for these messages.
You can also write the server-side ("radio station") code in socket.io using node, but that is out of scope here.
The problem here is that traditionally, a web server works like this:
1) A user types an address into a browser and hits enter (or go).
2) The browser reads the web address and then, using the DNS system, finds the IP address of the server.
3) It creates a connection to the server and then sends a request.
4) The webserver accepts the request.
5) It does some work, or launches some process (depending on the type of request).
6) It prepares (or receives) a response from the process.
7) It sends the response to the client.
8) It closes the connection.
Between steps 3 and 8, the client (the browser) is waiting for a response - it is blocked from doing anything else. So if there is a problem somewhere, like say some server-side script taking too long to process the request, the browser stays stuck on the white page with the loading icon spinning. It can't do anything until the entire process completes. This is just how the web was designed to work.
This kind of 'blocking' architecture works well for 1-to-1 communication. However, for multiple people to keep updated, this blocking doesn't work.
The event libraries (gevent) help with this because they accept the request without blocking the client; they send the response as soon as the process is complete.
Your application, however, still needs to notify the client - and since the connection is closed, you don't have a way to contact the client back.
In order to notify the client and to make sure the client doesn't need to "refresh", a permanent connection should be open - that's what socket.io does. It opens a permanent connection, and is always listening for messages.
1) A work request comes in from one end and is accepted.
2) The work is executed and a response is generated by something else (it could be the same program or another program).
3) A notification is sent: "hey, I'm done with your request - here is the response".
4) The person from step 1 listens for this message and then does something.
Underneath it all is WebSocket, a new full-duplex protocol that enables all this radio/DJ functionality.
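A minimal Flask-SocketIO sketch of that four-step flow; the event names and the do_the_work task are hypothetical stand-ins:

```python
from flask import Flask
from flask_socketio import SocketIO, emit  # pip install flask-socketio

app = Flask(__name__)
socketio = SocketIO(app)  # uses gevent/eventlet if installed

def do_the_work(payload):
    return {"status": "done", "input": payload}  # stand-in for the real task

@socketio.on("do_work")            # step 1: a request comes in over the socket
def handle_work(payload):
    result = do_the_work(payload)  # step 2: the work is executed
    emit("work_done", result)      # step 3: "hey, I'm done - here is the response"
    # step 4: the client's listener for "work_done" reacts to this message

if __name__ == "__main__":
    socketio.run(app)
```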
Things common between WebSockets and HTTP:
Work on the same port (80)
WebSocket requests start off as HTTP requests for the handshake (an upgrade header), but then shift over to the WebSocket protocol - at which point the connection is handed off to a websocket-compatible server.
All your traditional web server has to do is listen for this handshake request, acknowledge it, and then pass the request on to a websocket-compatible server - just like any other normal proxy request.
For Apache, you can use mod_proxy_wstunnel.
nginx versions 1.3+ have websocket support built in.

Send TCP messages at certain rate with Python

I am trying to generate some traffic to a server by sending TCP messages to it.
For this, I am using a Python script which opens a TCP socket and then sends some data over it. After receiving a reply, the TCP connection gets closed.
Question: I would like to be able to predefine the rate at which the script sends requests to the server, e.g. 5 messages per second. However, I do not have a clue how to script this in Python :(.
Anyone have an idea how to do this (a short example would be super! ;))?
Thanks in advance.
Note: I might need to add an extra difficulty: since the server has to reply, I guess I have to make the script work asynchronously. That way, I can send the requests out without having to wait for a reply to the previous request...
What you're looking for is an implementation of the token bucket algorithm. It's analogous to a bucket with a fixed capacity, where each consumer can't perform the action until it gets a token, and the bucket is refilled at a fixed rate.
The algorithm is easy to implement; the link below has an example:
http://code.activestate.com/recipes/511490-implementation-of-the-token-bucket-algorithm/
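In case the link rots, here is a minimal sketch of the algorithm tuned to the 5-messages-per-second example; send_message is a placeholder for the actual TCP send:

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def consume(self, n=1):
        # refill proportionally to the elapsed time, then spend if possible
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

def send_message():
    print("sent at", time.monotonic())  # placeholder for the real socket code

bucket = TokenBucket(rate=5, capacity=5)  # ~5 messages per second on average
for _ in range(20):
    while not bucket.consume():
        time.sleep(0.01)  # wait for the bucket to refill
    send_message()
```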

Writing a p2p client/server app [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
How to write a twisted server that is also a client?
How can I create a TCP client/server app with Twisted where the server can also send requests, not just answer them? Sort of like a p2p app, but where clients always initiate the connection. Since I don't know when the requests from the server will occur, I don't see how I can do this once the reactor is started.
The question you have to ask yourself is: why is the server sending a request?
Presumably something has happened in the world that would prompt the server to send a request; it wouldn't just do it at random. Even if it did it at random, the thing that has happened in the world would be "some random amount of time has passed". In other words, callLater(random(...), doSomething).
When you are writing a program with Twisted, you start off by setting up ways to react to events. Then you run the reactor - i.e. the "thing that reacts to events" - forever. At any time you can set up new ways to react to incoming network events (reactor.connectTCP, reactor.listenTCP, reactor.callLater) or tear down existing waiting things (protocol.loseConnection, port.stopListening, delayedCall.cancel). You don't need to re-start the reactor; in fact, really, the only thing you should do before the reactor runs is do reactor.callWhenRunning(someFunctionThatListensOrConnects), and write someFunctionThatListensOrConnects to do all your initial set-up. That set-up then happens once the reactor is already running, which demonstrates that you don't need to do anything in advance; the reactor is perfectly capable of changing its configuration as it runs.
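As a minimal sketch of that pattern (the port number and the five-second trigger are arbitrary), a plain-TCP Twisted server that pushes to its connected clients when some event fires:

```python
from twisted.internet import reactor
from twisted.internet.protocol import Factory, Protocol

class PushProtocol(Protocol):
    def connectionMade(self):
        self.factory.clients.append(self)

    def connectionLost(self, reason):
        self.factory.clients.remove(self)

class PushFactory(Factory):
    protocol = PushProtocol

    def __init__(self):
        self.clients = []

    def broadcast(self, data):
        # the "request" the server sends without being asked first
        for client in self.clients:
            client.transport.write(data)

def setup():
    factory = PushFactory()
    reactor.listenTCP(8000, factory)
    # some event in the world prompts the server to send, e.g. a timer:
    reactor.callLater(5, factory.broadcast, b"server-initiated message\r\n")

reactor.callWhenRunning(setup)
reactor.run()
```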
If the event that causes the server to send a message to client B is the fact that client A sent it a message, then your question is answered by the FAQ, "how do I make input on one connection result in output on another?"
