ZMQ pub/sub reliable/scalable design - python

I'm designing a pub/sub architecture using ZMQ. I need maximum reliability and scalability, and I'm kind of lost in the hell of possibilities it provides.
At the moment, I have a set of publishers and subscribers, linked by a broker. The broker is a simple forwarder device exposing a frontend for publishers and a backend for subscribers.
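In essence, the broker is just something like this (simplified sketch; the ports are placeholders, and with newer pyzmq zmq.proxy replaces the old FORWARDER device):

```python
import zmq

ctx = zmq.Context()

frontend = ctx.socket(zmq.XSUB)   # publishers connect here
frontend.bind("tcp://*:5559")

backend = ctx.socket(zmq.XPUB)    # subscribers connect here
backend.bind("tcp://*:5560")

zmq.proxy(frontend, backend)      # blocks forever, shuttling messages
```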
I need to handle the case when the broker crashes or disconnects, and improve the overall scalability.
Okay, so I thought of adding multiple brokers: the publishers would round-robin over the brokers when sending messages, and the subscribers would simply subscribe to all of them.
Then I needed a way to retrieve the list of available brokers, so I wrote a name service that provides it on demand. Publishers and subscribers ask this service which brokers to connect to.
I also wrote a kind of "lazy pirate" (i.e., try/retry one service after the other) reliable name service in case the main name service fails.
I'm starting to think that I'm designing it wrong, since the codebase keeps growing in size and complexity. I'm lost in the jungle of possibilities ZMQ provides.
Maybe something router/dealer-based would be usable here?
Any advice greatly appreciated!

It's not possible to answer your question directly because it's predicated on so many assumptions, many of which are probably wrong.
You're getting lost because you're using the wrong approach. Consider 0MQ as a language, one that you don't know very well yet. If you start by trying to write "maximum reliability and scalability", you're going to end up with Godzilla's vomit.
So: use the approach I use in the Guide. Start with a minimal solution to the core message flow and get that working properly. Think very carefully about the right kind of sockets to use. Then make incremental improvements, each time testing fully to make sure you understand what is actually going on. Refactor the code regularly, as you find it growing. Continue until you have a stable minimal version 1. Do not aim for "maximum" anything at the start.
Finally, when you've understood the problem better, start again from scratch and again, build up a working model in several steps.
Repeat until you have totally dominated the problem and learned the best ways to solve it.
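For instance, a minimal first iteration might be nothing more than a direct PUB/SUB pair, with no broker and no name service at all (endpoint and topic names here are just placeholders):

```python
import sys
import time
import zmq

ctx = zmq.Context()

if sys.argv[1] == "pub":                 # run: python minimal.py pub
    sock = ctx.socket(zmq.PUB)
    sock.bind("tcp://*:5556")
    while True:
        sock.send_string("weather hello")
        time.sleep(1)
else:                                    # run: python minimal.py sub
    sock = ctx.socket(zmq.SUB)
    sock.connect("tcp://localhost:5556")
    sock.setsockopt_string(zmq.SUBSCRIBE, "weather")
    while True:
        print(sock.recv_string())
```

Once that works and you understand why it works, reintroduce the broker, then redundancy, one tested step at a time.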

It seems like most of the complexity stems from trying to make the broker service persist in the event of a failure. Solving this at the application level gives you the highest degree of flexibility, but requires the most effort if you're starting from scratch.
Instead of handling this at the application level, you could handle it at the network level. Treat your brokers as you would any other simple network service and use an IP failover mechanism (e.g., Pacemaker/Corosync, UCARP, etc.) to fail a virtual IP address over to the secondary service if the primary becomes unavailable.
This greatly simplifies your publishers and subscribers, because you don't need a name service. They only need to know about the single virtual IP address. ZMQ will take care of reconnecting to the service as necessary (i.e., when a failover occurs).
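A subscriber then needs nothing more than this (sketch; the hostname stands in for whatever virtual IP you configure):

```python
import zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt_string(zmq.SUBSCRIBE, "")

# The single address everyone knows; pacemaker/UCARP moves it
# between broker hosts behind the scenes.
sub.connect("tcp://broker-vip.internal:5560")

while True:
    # If the address fails over, ZMQ re-establishes the TCP
    # connection automatically; no application logic needed.
    print(sub.recv_string())
```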

Python message cache for comet?

I can best describe what I'm looking for with an example of a simplified version. One of the demos for Tornado is a simple chat server:
https://github.com/facebook/tornado/blob/master/demos/chat/chatdemo.py
I'm interested in the MessageMixin class here. It keeps a fixed-length backlog of messages, and when new messages are available, it returns the slice of the message list that's new. Or that's what it appears to do. I know that I've implemented something like that before when writing a simple comet app.
So has anyone generalized this and added fancy things to it? I'm particularly interested in a way to manage many channels of communication and delete ones that haven't been used in a while. Persistence might also be useful.
Is this something an MQ can do?
Redis has a publish/subscribe feature, along with additional data-structure commands which you can use to persist and expire the message backlog, list the users in a given room, or store other attributes associated with them. The protocol is simple and text-based.
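As a rough sketch of how those pieces could fit together with redis-py (key names and limits here are arbitrary choices):

```python
import redis

r = redis.Redis()
BACKLOG = 50  # keep only the most recent 50 messages per room

def post_message(room, text):
    r.publish("chat:" + room, text)   # deliver to live subscribers
    key = "backlog:" + room
    r.rpush(key, text)                # append to the room's backlog...
    r.ltrim(key, -BACKLOG, -1)        # ...capped at BACKLOG entries
    r.expire(key, 3600)               # idle rooms vanish after an hour

def recent_messages(room):
    """The slice of history a newly connected client catches up on."""
    return r.lrange("backlog:" + room, 0, -1)

def listen(room):
    """Block on the live pub/sub channel and yield new messages."""
    p = r.pubsub()
    p.subscribe("chat:" + room)
    for msg in p.listen():
        if msg["type"] == "message":
            yield msg["data"]
```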
Here is a description which uses chat as an example of pub/sub, along with a Ruby example using WebSockets, and a snippet in Python which uses WebSockets, Tornado and Redis pub/sub to implement a simple chat room.
Based on the information in your question, a dedicated message queue (like RabbitMQ) may also be useful to you. It is hard to say without knowing what you need in the areas of message volume, fault-tolerance, replication, etc. Redis may also be what you're looking for, but if nothing else it is pretty simple and could help you get a prototype running quickly to further nail down your app's requirements.

What is the best Pythonic way to communicate between distributed services/Unix machines?

Mornink!
I need to design, write and implement a wide system consisting of multiple Unix servers performing different roles and running different services. The system must be bulletproof, robust and fast. Yeah, I know. ;) Since I don't know how to approach this task, I've decided to ask for your opinion before I leave the design stage. Here is how the workflow is supposed to flow:
users interact with a website, where they set up demands for a service
each demand is stored (in a database?) and some kind of message about the new demand is sent to the central (clustered) system via a database/queue
the central system picks up the demand and sends signals to various other systems (clusters) to perform their duties (parts of the demanded service setup)
when they are done, they send a message back to the central system or the website that the service is now being served
Now, what is the modern, robust, clean and efficient way of storing these requests in some kind of queue and executing them? Should I send signals, or should I let all subsystems poll the queue/db for new data? What should that queue be; should it be a database? How do I deal with the messages? I thought about opening a single TCP connection and sending data over it, along with commands triggering actions/functions on the other end, but on closer inspection there has to be another, better way. So I found Spring Python, which has been criticized for being so '90s-ish.
I know it's a very broad question, but I really hope you can help me wrap my head around the design and not make something stupid here :)
Thanks in advance!
Some general ideas for you:
You could have a master-client approach. Requests would be inserted into the master and stored in a database. The master knows the state of each client (same db). Whenever there is a request, the master redirects it to a free client. The client reports back when it has finished the task (including any answers), making it available to receive a new task from the master (this removes the need for polling).
Communication could be done using web services. An HTTP request/POST should cover every case. No need to actually go down to the TCP level.
Just general ideas, hope they're useful.
There are a number of Python-friendly message queue technologies out there which could serve you quite well. The top two that I know of are ActiveMQ and RabbitMQ, both of which play well with Python; I also found this comparison, which states that ActiveMQ (as of 18 months ago!) outperformed RabbitMQ.
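For a feel of the queue-based workflow, here is a minimal sketch with RabbitMQ via the pika library (queue name and payload are invented; producer and consumer would normally be separate processes):

```python
import pika

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
ch = conn.channel()
ch.queue_declare(queue="demands", durable=True)  # survive broker restarts

# Website side: enqueue a new demand for the central system.
ch.basic_publish(
    exchange="",
    routing_key="demands",
    body='{"demand_id": 42}',
    properties=pika.BasicProperties(delivery_mode=2),  # persist message
)

# Central-system side: consume demands and dispatch the work.
def handle(channel, method, properties, body):
    print("got demand:", body)
    channel.basic_ack(delivery_tag=method.delivery_tag)  # ack when done

ch.basic_consume(queue="demands", on_message_callback=handle)
ch.start_consuming()
```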

Text-based one-on-one chat with Flash interface: what to power the backend?

I'm building a website where I hook people up so that they can anonymously vent to strangers. You either choose to be a listener, or a talker, and then you get catapulted into a one-on-one chat room.
The reason for the app's construction is because you often can't vent to friends, because your deepest vulnerabilities can often be leveraged against you later on. (Like it or not, this is a part of human nature. Sad.)
I'm looking for some insight into how I should architect everything. I found this neat tutorial, http://giantflyingsaucer.com/blog/?p=875, which suggests using Python & Stackless + Flash. Someone else suggested I try P2P sockets, but I don't even know where to begin looking for info on that.
Any other suggestions? I'd like to keep it simple. :^)
Unless you expect super high load, this is simple enough that it doesn't really matter what you use on the backend: just pick something you're comfortable with. PHP, Python, Ruby, even a bash script using CGI: your skill level with the language is likely to make more difference than the language features themselves.
I would use an XMPP server like ejabberd or OpenFire to power the backend. XMPP contains everything you need for creating chat/real-time applications. You can use a Flex/Flash Actionscript library like Actionscript 3 XIFF to communicate with the XMPP server.
Flash is user-unfriendly for UI (forms, etc) and it is relatively easy to do what you want using HTML and Javascript on the front-end.
One possible approach for reading messages would be to regularly poll the server with an Ajax request for any new messages, then format each new message and insert it into the DOM.
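Since your tutorial already involves Python on the server, a minimal polling endpoint could look like this (handler and field names are invented for illustration; Tornado used because you mentioned it):

```python
import json

import tornado.ioloop
import tornado.web

messages = []  # in-memory backlog; fine for a prototype only

class PollHandler(tornado.web.RequestHandler):
    def get(self):
        # The client passes the index of the last message it has seen.
        since = int(self.get_argument("since", 0))
        self.write(json.dumps(messages[since:]))

class PostHandler(tornado.web.RequestHandler):
    def post(self):
        messages.append(self.get_argument("text"))

app = tornado.web.Application([
    (r"/poll", PollHandler),
    (r"/post", PostHandler),
])

if __name__ == "__main__":
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
```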
You will probably need to answer at least these questions before you continue, though:
1) Are you recreating IRC (everyone sees your posts), or is this a random one-on-one chat, like Chatroulette?
1a) Is this a way for a specific person to talk to another specific person, or is it more like Twitter?
2) What is your plan for scaling up if this idea takes off? Memcached should probably be a method of last resort ("a bandaid over a bullet hole"). What's your roadmap for eventually handling a large volume of messages?
3) Is there any way to ignore users? Talk to certain users? Hide your rants from users?
Hey Zach, I had to create a socket server for a Flash game I made. I built my server in C#, but I would use whatever language you're familiar with. If you let me know what you're most comfortable with, I could try to help find a good tutorial.
The one thing I spent many hours on was getting Flash to work from a website with a socket server. With the newer versions of Flash you need to send back a policy file. In my case this needed to be the first chunk of data sent back to the client when they connected to the socket server.
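In Python, that handshake might look something like this (a sketch only; the port and the wide-open policy are placeholders you would tighten down):

```python
import socket

POLICY = (b'<?xml version="1.0"?>'
          b'<cross-domain-policy>'
          b'<allow-access-from domain="*" to-ports="8001"/>'
          b'</cross-domain-policy>\x00')  # Flash expects a trailing NUL

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("", 8001))
srv.listen(5)

while True:
    conn, addr = srv.accept()
    data = conn.recv(1024)
    if data.startswith(b"<policy-file-request/>"):
        # Flash sends this before anything else; answer with the
        # policy file, then close so the client reconnects for real.
        conn.sendall(POLICY)
        conn.close()
    else:
        conn.close()  # real chat traffic would be handled here instead
```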
Not sure what to tell you about structuring the back end. I need to know a little bit more about your programming experience. I had an array of all user connections, and was placing them in different "Rooms" so they could play each other. So just some simple arrays and understanding how to send messages to the clients would help you here.
If you have any familiarity with C# I would have no problem sending you the source code for my socket server.

Abstraction and client/server architecture questions for Python game program

Here is where I am at presently. I am designing a card game with the aim of utilizing major components for future work. The part that is hanging me up is creating a layer of abstraction between the server and the client(s). A server is started, and then one or more clients can connect (locally or remotely). I am designing a thick client but my friend is looking at doing a web-based client. I would like to design the server in a manner that allows a variety of different clients to call a common set of server commands.
So, for a start, I would like to create a 'server' which manages the game rules and player interactions, and a 'client' on the local CLI (I'm running Ubuntu Linux for convenience). I'm attempting to flesh out how the two pieces are supposed to interact, without mandating that future clients be CLI-based or on the local machine.
I've found the following two questions which are beneficial, but don't quite answer the above.
Client Server programming in python?
Evaluate my Python server structure
I don't require anything full-featured right away; I just want to establish the basic mechanisms for abstraction so that the resulting mock-up code reflects the relationship appropriately: there are different assumptions at play with a client/server relationship than with an all-in-one application.
Where do I start? What resources do you recommend?
Disclaimers:
I am familiar with code in a variety of languages and general programming/logic concepts, but have little real experience writing substantial amounts of code. This pet project is an attempt at rectifying this.
Also, I know the information is out there already, but I have the strong impression that I am missing the forest for the trees.
Read up on RESTful architectures.
Your fat client can use REST. It will use urllib2 to make RESTful requests of a server. It can exchange data in JSON notation.
A web client can use REST. It can make simple browser HTTP requests or a Javascript component can make more sophisticated REST requests using JSON.
Your server can be built as a simple WSGI application using any simple WSGI components. You have nice ones in the standard library, or you can use Werkzeug. Your server simply accepts REST requests and makes REST responses. Your server can work in HTML (for a browser) or JSON (for a fat client or Javascript client.)
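A minimal sketch using only the standard library (the endpoint and the state dict are invented for illustration; Python 3's urllib.request plays the role of urllib2 on the client side):

```python
import json
from wsgiref.simple_server import make_server

GAME_STATE = {"players": [], "turn": 0}  # stand-in for real game state

def app(environ, start_response):
    if environ["REQUEST_METHOD"] == "GET" and environ["PATH_INFO"] == "/state":
        body = json.dumps(GAME_STATE).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]

if __name__ == "__main__":
    # Any client (CLI, browser, Javascript) just issues HTTP requests:
    #   urllib.request.urlopen("http://localhost:8000/state").read()
    make_server("", 8000, app).serve_forever()
```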
I would consider basing all server / client interactions on HTTP -- probably with JSON payloads. This doesn't directly allow server-initiated interactions ("server push"), but the (newish but already traditional;-) workaround for that is AJAX-y (even though the X makes little sense as I suggest JSON payloads, not XML ones;-) -- the client initiates an async request (via a separate thread or otherwise) to a special URL on the server, and the server responds to those requests to (in practice) do "pushes". From what you say it looks like the limitations of this approach might not be a problem.
The key advantage of specifying the interactions in such terms is that they're entirely independent from the programming language -- so the web-based client in Javascript will be just as doable as your CLI one in Python, etc etc. Of course, the server can live on localhost as a special case, but there is no constraint for that as the HTTP URLs can specify whatever host is running the server; etc, etc.
First of all, regardless of the locality or type of the client, you will be communicating through an established message-based interface. All clients will be operating based on a common set of requests and responses, and the server will handle and reject these based on their validity according to game state. Whether you are dealing with local clients on the same machine or remote clients via HTTP does not matter whatsoever from an abstraction standpoint, as they will all be communicating through the same set of requests/responses.
What this comes down to is your protocol. Your protocol should be a well-defined and technically sound language between client and server that will allow clients to a) participate effectively, and b) participate fairly. This protocol should define what messages ('moves') a client can do, and when, and how the server will react.
Your protocol should be fully fleshed out and documented before you even start on game logic. The two are intrinsically connected, and you will save a lot of wasted time and effort by completely defining your protocol first.
Your protocol is the abstraction between client and server, and it will also serve as the design document and programming guide for both.
Protocol design is all about state, state transitions, and validation. Game servers usually have a set of fairly common, generic states for each game instance e.g. initialization, lobby, gameplay, pause, recap, close game, etc...
Each one of these states has important state data related with it. For example, a 'lobby' state on the server-side might contain the known state of each player...how long since the last message or ping, what the player is doing (selecting an avatar, switching settings, going to the fridge, etc.). Organizing and managing state and substate data in code is important.
Managing these states and the associated data requirements for each is a process that should be exquisitely planned out, as they are directly related to the volume of work and project complexity. This is very important, and also great practice if you are using this project to step up into larger things.
Also, you must keep in mind that if you have a game, and you let people play, people will cheat. It's a fact of life. In order to minimize this, you must carefully design your protocol and state management to only ever allow valid state transitions. Never trust a single client packet.
For every permutation of client/server state, you must enforce a limited set of valid game messages, and you must be very careful in what you allow players to do, and when you allow them to do it.
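A sketch of that idea in Python (states and message types invented for illustration):

```python
# Which client messages the server will even consider, per game state.
VALID_MESSAGES = {
    "lobby":    {"set_avatar", "ready"},
    "gameplay": {"play_card", "pass"},
    "pause":    {"resume"},
}

class ProtocolError(Exception):
    pass

def validate(state, msg):
    """Reject any message that is not legal in the current state."""
    if msg.get("type") not in VALID_MESSAGES.get(state, set()):
        raise ProtocolError("%r not allowed in state %r"
                            % (msg.get("type"), state))

validate("gameplay", {"type": "play_card", "card": "7H"})  # fine
try:
    validate("lobby", {"type": "play_card", "card": "7H"})  # cheating
except ProtocolError as e:
    print(e)
```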
Project complexity is generally exponential and not linear - client/server game programming is usually a good/painful way to learn this. Great question. Hope this helps, and good luck!

Data Synchronization framework / algorithm for server<->device?

I'm looking to implement data synchronization between servers and distributed clients. The data source on the server is MySQL with Django on top. The clients can vary. Updates can take place on either client or server, and the connection between server and client is not reliable (e.g., changes can be made on a disconnected cell phone and should get synced when the phone has a connection again).
S. Lott suggests using a version control design pattern in this question, which makes sense. I'm wondering if there are any existing packages / implementations of this I can use. Or, should I directly make use of svn/git/etc?
Are there other alternatives? There must be synchronization frameworks or detailed descriptions of algorithms out there, but I'm not having much luck finding them. I'd appreciate it if you could point me in the right direction.
Perhaps using plain old rsync is enough.
AFAIK there isn't any generic solution to this, mainly due to the diverse requirements for synchronization.
In one of our earlier projects we implemented a Spring Batch-based sync mechanism which relies on a last-updated timestamp field on each of the tables that take part in the sync.
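In Python terms the core of that approach is tiny (table and column names invented; sqlite3 used just to keep the sketch self-contained):

```python
import sqlite3

def changed_since(conn, last_sync):
    """Rows modified since the client's last successful sync."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM items WHERE updated_at > ?",
        (last_sync,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT, updated_at TEXT)")
conn.execute("INSERT INTO items VALUES (1, 'a', '2024-01-02T10:00:00')")

# The client remembers the server's clock at the end of each sync and
# asks only for rows newer than that the next time it connects.
print(changed_since(conn, "2024-01-01T00:00:00"))
```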
I have heard about SyncML but don't have much experience with it.
If you have a single server and multiple clients, you could think of a JMS-based approach: the data is bundled and placed in queues (or topics) and pulled by clients.
In your case, since updates are bi-directional, you need to handle conflict detection as well. This brings additional complexities.
