Determining which packet/update arrived first between two machines - python

I have two separate AWS virtual machines set up within one region (in different availability zones). Both are connected via WebSocket (in Python) to a different load balancer (CloudFront) of the same host server (also hosted on AWS) and receive frequent, small WebSocket payloads, roughly every 5 ms.
NB: I do not own the host server; I am merely on the receiving end.
Both machines receive the same updates, and I would like to measure on which machine the updates/payloads/packets arrive first.
In essence, I would like to figure out which load balancer is "closer" to the host and therefore adds the least latency overhead in transmitting the signal, since my application is highly latency-sensitive.
I have tried using the system clock to timestamp data arrival; however, it is not guaranteed that the two instances have their clocks synced to an appropriate accuracy.

Follow this approach:
Send a request to the load balancer with the body of the request containing the timestamp at which it was sent. You can easily do this using the date/time API of your favourite language.
When that packet arrives at the backend server residing on your instance (it can be a simple Node server or a Rails server), read the request and compare its timestamp to the current timestamp.
You can do this on both servers and easily compare which one was faster.
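A minimal Python sketch of that idea, assuming an HTTP endpoint you can reach through the load balancer (the URL and the use of the requests library are placeholders; any HTTP or WebSocket client works the same way). Because both timestamps are taken on the same machine, clock skew between the two instances does not matter:

```python
# Time how long a request spends going out through the load balancer and
# coming back. Both timestamps come from the same machine's monotonic clock,
# so cross-instance clock sync is not needed. ENDPOINT is a placeholder.
import time
import requests  # assumed available; any HTTP/WebSocket client would do

ENDPOINT = "https://your-load-balancer.example.com/echo"  # hypothetical

def measure_round_trip() -> float:
    sent_at = time.perf_counter()          # monotonic, immune to NTP adjustments
    requests.post(ENDPOINT, json={"sent_at": sent_at}, timeout=5)
    received_at = time.perf_counter()
    return received_at - sent_at           # seconds for transit plus handling

if __name__ == "__main__":
    samples = sorted(measure_round_trip() for _ in range(100))
    print(f"median round trip: {samples[len(samples) // 2] * 1000:.2f} ms")
```

Run the same script on both instances and compare the medians rather than single samples, since individual measurements will be noisy at the 5 ms scale.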

Related

Trying to use gRPC in Python with multiple servers or possibly multiplexing

Our use case involves one class that has to remotely initialize several instances of another class (each on a different IoT device) and has to get certain results from each of these instances. At most, we would need to receive 30 messages a second from each remote client, with each message being relatively small. What type of architecture would you all recommend to solve this?
We were thinking that each class that is located on the IoT device will serve as a server and the class that receives the results would be the client, so should we create a server, each with its own channel, for each IoT device? Or is it possible to have each IoT device use the same service on the same server (meaning there would be multiple instances of the same service on the same server but on different devices)?
The question would benefit from additional detail to help guide an answer.
gRPC (and its use of HTTP/2) is a 'heavier' protocol than e.g. MQTT. MQTT is more commonly used with IoT devices as it has a smaller footprint. REST/HTTP (even though heavier than MQTT) may also have benefits for you over gRPC/HTTP2.
If you're committed to gRPC, I wonder whether it would not be better to invert your proposed architecture and have the IoT device be the client? This seems to provide additional security in that the clients initiate communications with your servers rather than expose services. Either way (and if you decide to use MQTT), hopefully you'll be using mTLS. I assume (!?) client implementations are smaller than server implementations.
Regardless of the orientation, clients and servers can (independently) stream messages. The IoT devices (client or server) could stream the 30 messages/second. The servers could stream management|control messages.
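If you do go with gRPC and invert the architecture as suggested, a rough Python sketch of the device side might look like the following (the proto modules, the Ingest service, and its StreamReadings method are hypothetical names that would come from your own .proto):

```python
# Sketch of the "device as gRPC client" direction: the IoT device opens a
# mutually-authenticated channel and streams ~30 small readings per second
# to the central server. telemetry_pb2 / telemetry_pb2_grpc are hypothetical
# modules generated from your own .proto file.
import time
import grpc
import telemetry_pb2
import telemetry_pb2_grpc

def read_sensor() -> float:
    return 0.0  # placeholder for a real sensor read

def readings():
    # Request iterator for a client-streaming RPC.
    while True:
        yield telemetry_pb2.Reading(device_id="device-42", value=read_sensor())
        time.sleep(1 / 30)

def main():
    # mTLS, as suggested above: device certificate/key plus the server CA.
    creds = grpc.ssl_channel_credentials(
        root_certificates=open("ca.pem", "rb").read(),
        private_key=open("device.key", "rb").read(),
        certificate_chain=open("device.pem", "rb").read(),
    )
    with grpc.secure_channel("ingest.example.com:443", creds) as channel:
        stub = telemetry_pb2_grpc.IngestStub(channel)
        summary = stub.StreamReadings(readings())  # blocks until the stream ends
        print(summary)

if __name__ == "__main__":
    main()
```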
I've no experience managing fleets of IoT devices, but remote management|monitoring and over-the-air upgrades|patching are, I assume, important requirements for you. gRPC does not limit any of these capabilities, but debugging can be more challenging. With e.g. REST/HTTP, it is trivial to curl endpoints, but with gRPC (even with the excellent grpcurl) you'll be constrained to the services implemented. Yes, you can't call a non-existent REST API either, but I find remote-debugging gRPC services more challenging than REST.

What limits the number of connections to a Kubernetes service?

I've included more detail below, but the question I'm trying to answer is in the title. I'm currently trying to figure this out, but thought I'd ask here first in case anyone knows the answer off-hand.
About my setup
I have a Kubernetes service running on a Google Compute Engine cluster (started via Google Container Engine). It consists of a service (for the front-end stable IP), a replication controller, and pods running a Python server. The server is a Python gRPC server sleep-listening on a port.
There are 2 pods (2 replicas specified in the replication controller), one rc, one service, and 4 GCE instances (set to autoscale up to 5 based on CPU).
I'd like the service to be able to handle an arbitrary number of clients that want to stream information. However, I'm currently seeing that the service only talks to 16 of the clients.
I'm hypothesizing that the number of connections is either limited by the number of GCE instances I have, or by the number of pods. I'll be doing experiments to see how changing these numbers affects things.
Figured it out:
It's not the number of GCE instances: I increased the number of GCE instances with no change in the number of streaming clients.
It's the number of pods: each pod apparently can handle 8 connections. I simply scaled my replication controller with kubectl scale rc <rc-name> --replicas=3 to support 24 clients.
I'll be looking into autoscaling (with a horizontal pod scaler?) the number of pods based on incoming HTTP requests.
Update 1:
Kubernetes doesn't currently support horizontal pod scaling based on HTTP.
Update 2:
Apparently there are other things at play here, like the size of the thread pool available to the server. With N threads and P pods, I'm able to maintain P*N open channels. This works particularly well for me because my clients only need to poll the server once every few seconds, and they sleep when inactive.
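For reference, on the server side of a Python gRPC service the relevant knob is the executor handed to grpc.server(); a minimal sketch (the service registration line is commented out because the generated names depend on your .proto):

```python
# Each pod runs one of these servers; with max_workers=N threads per pod and
# P pods, roughly P*N streaming RPCs can be in flight at once.
from concurrent import futures
import grpc

def serve(max_workers: int = 8) -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=max_workers))
    # your_pb2_grpc.add_YourServiceServicer_to_server(YourServicer(), server)  # generated, hypothetical
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```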

Scaling a decoupled realtime server alongside a standard webserver

Say I have a typical web server that serves standard HTML pages to clients, and a websocket server running alongside it used for realtime updates (chat, notifications, etc.).
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
My concern is, if I want to scale things up a bit, and add another realtime server, it seems my only options are:
1) Have the main server keep track of which realtime server the client is connected to. When that client receives a notification/chat message, the main server forwards that message along to only the realtime server the client is connected to. The downside here is code complexity, as the main server has to do some extra bookkeeping.
2) Or instead have the main server simply pass that message along to every realtime server; only the server the client is connected to would actually do anything with it. This would result in a number of wasted messages being passed around.
Am I missing another option here? I'm just trying to make sure I don't go too far down one of these paths and realize I'm doing things totally wrong.
If the scenario is
a) The main web server raises a message upon an action (let's say a record is inserted)
b) It notifies the appropriate real-time server
you could decouple these two steps by using an intermediate pub/sub architecture that forwards the messages to the intended recipient.
An implementation would be
1) You have a Redis pub/sub channel; when a client connects to a real-time socket, you start listening on that channel
2) When the main app wants to notify a user via the real-time server, it pushes a message to the channel; the real-time server gets it and forwards it to the intended user.
This way, you decouple the realtime notification from the main app and you don't have to keep track of where the user is.
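A minimal sketch of that Redis hand-off in Python, using the redis-py client (the channel name, message format, and the connections dict mapping user ids to their websockets are illustrative assumptions):

```python
# Main app publishes to a Redis channel; each realtime server subscribes and
# forwards a message only if the target user is connected to it.
import json
import redis

r = redis.Redis(host="localhost", port=6379)

# Main app: publish a notification for a specific user.
def notify(user_id: str, payload: dict) -> None:
    r.publish("realtime-notifications", json.dumps({"user": user_id, **payload}))

# Realtime server: subscribe and forward to the matching websocket, if any.
def listen(connections: dict) -> None:
    pubsub = r.pubsub()
    pubsub.subscribe("realtime-notifications")
    for message in pubsub.listen():
        if message["type"] != "message":
            continue
        event = json.loads(message["data"])
        ws = connections.get(event["user"])   # only act if this user is connected here
        if ws is not None:
            ws.send(json.dumps(event))
```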
The problem you are describing is the common "message backplane" used for example in SignalR, also related to the "fanout message exchange" in messaging architectures. When having a backplane or doing fanout, every message is forwarded to every message node server, so clients can connect to any server and get the message. This approach is reasonable when you have to support both long polling and websockets. However, as you noticed, it is a waste of traffic and resources.
You need to use a message infrastructure with intelligent routing, like RabbitMQ. Take a look at topic and headers exchanges: https://www.rabbitmq.com/tutorials/amqp-concepts.html
How Topic Exchanges Route Messages
RabbitMQ for Windows: Exchange Types
There are tons of different queuing frameworks. Pick the one you like, but ensure you can have more exchange modes than just direct or fanout ;) In the end, a WebSocket is just an endpoint to connect to a message infrastructure. So if you want to scale out, it boils down to the backend you have :)
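As a rough illustration with the pika client (exchange, queue, and routing-key names are made up), a topic exchange lets the main server address a single user instead of fanning out to every realtime node:

```python
# Routed (non-fanout) delivery with a RabbitMQ topic exchange.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.exchange_declare(exchange="realtime", exchange_type="topic")

# Main server: publish only to the routing key for this user.
channel.basic_publish(
    exchange="realtime",
    routing_key="notifications.user.42",
    body=b'{"text": "hello"}',
)

# Each realtime server: bind a queue only for the users connected to it,
# so messages are not broadcast to every node.
result = channel.queue_declare(queue="", exclusive=True)
channel.queue_bind(
    exchange="realtime",
    queue=result.method.queue,
    routing_key="notifications.user.42",
)
```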
For just a few realtime servers, you could conceivably keep a list of them in the main server and go through them round-robin.
Another approach is to use a load balancer.
Basically, you'll have one dedicated node to receive the requests from the main server, and then have that load-balancer node take care of choosing which websocket/realtime server to forward the request to.
Of course, this just shifts the code complexity from the main server to a new component, but conceptually I think it's better and more decoupled.
Changed the answer because a reply indicated that the "main" and "realtime" servers are already load-balanced clusters and not individual hosts.
The central scalability question seems to be:
My general workflow is when something occurs on the main server that triggers the need for a realtime message, the main server sends that message to the realtime server (via a message queue) and the realtime server distributes it to any related connection.
Emphasis on the word "related". Assume you have 10 "main" servers and 50 "realtime" servers, and an event occurs on main server #5: which of the websockets would be considered related to this event?
Worst case is that any event on any "main" server would need to propagate to all websockets. That's O(N^2) complexity, which counts as a severe scalability impairment.
This O(N^2) complexity can only be prevented if you can group the related connections into groups that don't grow with the cluster size or the total number of connections. Grouping requires state memory to store which group(s) a connection belongs to.
Remember that there are three ways to store state:
global memory (memcached / redis / DB, ...)
sticky routing (load balancer configuration)
client memory (cookies, browser local storage, link/redirect URLs)
Option 3 counts as the most scalable one because it omits a central state store.
For passing the messages from the "main" to the "realtime" servers, that traffic should by definition be much smaller than the traffic towards the clients. There are also efficient frameworks for pushing pub/sub traffic.

How can multiple python servers on the same machine be made to interact with each other?

I need to implement "serial look-up" and "parallel look-up" across a set of local servers. My servers are simple Python HTTP server instances. The local servers must be able to send updated data to the other local servers. Each local server knows about all the remaining local servers and the ports they are bound to. I need to know if this kind of communication is possible, and if yes, how. Searching the web mostly turns up single-server, multiple-client architectures.
I am implementing a research paper and I need to compare the latency of serial and parallel look-up in such a scenario.
Edit 1: It does not need to be an HTTP server; that was simply the easiest to set up initially, which is why it was chosen. Other alternatives are welcome.
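This kind of peer-to-peer exchange is certainly possible even with plain HTTP servers: each server simply acts as an HTTP client towards its peers when it has an update to push. A minimal standard-library sketch (the ports and the /update path are arbitrary choices):

```python
# Each local server both serves HTTP requests and pushes updates to the
# other local servers it knows about.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

PEERS = [8001, 8002, 8003]   # ports of the other local servers
MY_PORT = 8000

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers["Content-Length"])
        update = json.loads(self.rfile.read(length))
        print("received update:", update)
        self.send_response(200)
        self.end_headers()

def push_update(update: dict) -> None:
    body = json.dumps(update).encode()
    for port in PEERS:                      # serial look-up: one peer at a time
        req = Request(f"http://localhost:{port}/update", data=body,
                      headers={"Content-Type": "application/json"})
        urlopen(req)

if __name__ == "__main__":
    HTTPServer(("localhost", MY_PORT), Handler).serve_forever()
```

For the parallel look-up case you could issue the peer requests from a thread pool (e.g. concurrent.futures.ThreadPoolExecutor) instead of the serial loop, and compare the two timings.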

Should I build a TCP server or use simple http messages for a back-end?

I am building a back end that will handle requests from web apps and mobile device apps.
I am trying to decide whether a TCP server is appropriate for this versus regular HTTP GET and POST requests.
Use case 1:
1. Client on mobile device executes a search on the device for the word "red". The word is sent to the server (unclear whether as JSON or over TCP somehow).
2. The word "red" goes to the server and the server pulls all rows from a MySQL DB that have red as their color (this could be ~5000 results).
Alternate step 2 (maybe TCP makes more sense here): there is a hashmap built with the word "red" as the key and the value a pointer to an array of all the objects with the word "red" (I think this will give a faster look-up time).
3. Data is sent to the phone (either as JSON or some other way, not sure). I am unclear on this step.
4. The phone parses it, etc.
There is a possibility that I may want to keep the array alive on the server until the user finishes the query (since they could continue to filter down the results).
Based on this example, what is the architecture I should be looking at?
Any different way is highly appreciated.
Thank you
In your case I would use HTTP because:
Your service is stateless.
If you use raw TCP you will have problems scaling up your service (since every request will be directed to the server that established the TCP connection); this ties back to your service being stateless. With HTTP you just add more servers behind a load balancer.
For TCP you will need to pick some port, which can be blocked by firewalls etc.; you could use port 80/8080, but I don't think that is good practice.
If your service were more like a suggestion feature that changes as the user types in their word, you might want to use a TCP/HTTP socket.
TCP is used for more long-term connections, like a security system that reports the state of the system every X seconds, which is not your case.
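As a rough sketch of that stateless HTTP approach (Flask, SQLite, and the table/column names here are illustrative stand-ins for your actual stack):

```python
# Each search is an ordinary GET request, so any server behind the load
# balancer can answer it; nothing needs to be kept alive between requests.
from flask import Flask, jsonify, request
import sqlite3   # stand-in for MySQL to keep the sketch self-contained

app = Flask(__name__)

@app.route("/search")
def search():
    color = request.args.get("q", "")
    rows = sqlite3.connect("items.db").execute(
        "SELECT id, name FROM items WHERE color = ?", (color,)
    ).fetchall()
    # Return JSON so web and mobile clients can parse it the same way.
    return jsonify([{"id": r[0], "name": r[1]} for r in rows])

if __name__ == "__main__":
    app.run(port=8080)
```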
