Data Synchronization framework / algorithm for server<->device?

Data Synchronization framework / algorithm for server<->device? - python

I'm looking to implement data synchronization between servers and distributed clients. The data source on the server is mysql with django on top. The client can vary. Updates can take place on either client or server, and the connection between server and client is not reliable (eg. changes can be made on a disconnected cell phone, should get sync'd when the cell phone has a connection again).
S. Lott suggests using a version control design pattern in this question, which makes sense. I'm wondering if there are any existing packages / implementations of this I can use. Or, should I directly make use of svn/git/etc?
Are there other alternatives? There must be synchronization frameworks or detailed descriptions of algorithms out there, but I'm not having a lot of luck finding them. I'd appreciate if you point me in the right direction.

Perhaps using plain old rsync is enough.

AFAIK there isnt any generic solution to this mainly due to the diverse requirements for synchronization.
In one of our earlier projects we implemented a Spring batching based sync mechanism which relies on last updated timestamp field on each of the tables (that take part in sync).
I have heard about SyncML but dont have much experience with that.
If you have a single server and multiple clients, you could think of a JMS based approach.
The data is bundled and placed in Queues (or topics) and would be pulled by clients.
In your case, since updates are bi-directional, you need to handle conflict detection as well. This brings additional complexities.

Related

codeigniter as web and python as the web service

is it possible to make the python as the web server and the front end is codeigniter?
For some reasons:
database security - like when you are saving data. the codeigniter will pass the data to python basehttpserver / or maybe flask (but i have not yet done the flask before)
SQL Injection.
it's like for example. front end codeigniter
form - send data.
back end python web service
receives data - will serve as an API. and the python will be the one in charge of saving data in MySQLdb.

In theory I don't see why this would be impossible.
You could easily write a web application using codeigniter and have the controllers just pass data along to a python based web service. If you're interested in a fully decoupled front-end/back-end, you could also use a queuing layer (such as RabbitMQ) in between the data entry facilities in your CI program, and the persistence web services in Python.
That said, I'm not clear why you would want to. CodeIgniter is PHP, and includes some very excellent data modelling components that integrate fully into the overall framework. Long story short, if you're using CodeIgniter, just have it connect to MySQL and do the data persistence for you.
Likewise, if you'd prefer to code your persistence in Python, why not just use Django? It's a fully realized Python web framework, and also features an excellent ORM and support for MySQL.
I don't really see how either technology gives you clear benefits, provided they are both used properly, for database security. Both have built-in methodologies for "cleansing" user provided data to prevent SQL injection (notes for Django and notes for CodeIgniter)
There are a great many other posts on StackOverflow dealing with preventing SQL injection with CodeIgniter and other frameworks. Just using python, or decoupling your front-end and back-end, will not provide you any additional security or protection guarantees. The only way to do that is to carefully architect your interactions with databases, using all the tools provided you by whatever framework you are using, or creating their equivalents if none are available (or switch to a better toolset).
Edit - expansion
Based on the comments above I figured it was actually worth writing a little more about the potential advantages and real challenges of a decoupled infrastructure.
In principle, it's easy to decouple a front-end from a more isolated backend. You could leverage in either Django or likely CodeIgniter (although I haven't personally seen it done in CI, just in Django) the existing model infrastructure, but deal with model objects in memory only on the frontend, and expand on the existing ORM functionality to use your backend services to actually store and retrieve data from a persistence layer (your database).
Practically, this can become quite a bit of work to do right. To gain the security advantages you desire, your decoupled backend needs to deal with the frontend as if in principle it is "hostile", or at the least, untrustworthy. So, be sure that you implement a method for the frontend to reliably authenticate itself to the backend. Ensure that all traffic is minimally using SSL between the frontend and the backend. Consider carefully your services architecture (the SOA layer in front of your backend logic) and make sure your APIs, where possible, are MECE (Mutually exclusive, collectively exhaustive).
I'm sure I'm missing some basic principles, but having recently participated in the design and build of a system along these lines I can assure you that the complexity can very quickly explode, so careful architectural discipline and adherence to both MECE and MVP (minimum viable product) is critical. A decoupled infrastructure can be an amazing end-product if it fits the need, and in use cases I've worked with it has been extremely effective. It isn't a one-size-fits-all, though, and hopefully some of the extra description here can help you make a more informed choice.
Hopefully this helps round out the topic answer. Basic principles: Design for what you need. Don't conflate complicated with secure. Simple can be secure, as can complex, but complexity breeds room for hard-to-plug security vulnerabilities, and simple gives the illusion of security by seeming easy. No approach guarantees a positive outcome, so don't try to cut corners; spend as much time as you can in research and design to minimize your time building, refactoring, and fixing.

ZMQ pub/sub reliable/scalable design

I'm designin a pub/sub architecture using ZMQ. I need maximum reliability and scalability and am kind of lost in the hell of possibilities provided.
At the moment, I got a set a publishers and subscribers, linked by a broker. The broker is a simple forwarder device exposing a frontend for publishers, and a backend for subscribers.
I need to handle the case when the broker crashes or disconnects, and improve the overall scalability.
Okay, so i thought of adding multiple brokers, the publishers would round robin the broker to send messages to, and the subscribers would just subscribe to all these brokers.
Then i needed a way to retrieve the list of possible brokers, so i wrote a name service that provides a list of brokers on demand. Publishers and subscribers ask this service which brokers to connect to.
I also wrote a kind of "lazy pirate" (i.e. try/retry one after the other) reliable name service in case the main name service falls.
I'm starting to think that i'm designing it wrong since the codebase is non stop increasing in size and complexity. I'm lost in the jungle of possibilities provided by ZMQ.
Maybe something router/dealer based would be usable here ?
Any advice greatly appreciated !

It's not possible to answer your question directly because it's predicated on so many assumptions, many of which are probably wrong.
You're getting lost because you're using the wrong approach. Consider 0MQ as a language, one that you don't know very well yet. If you start by trying to write "maximum reliability and scalability", you're going to end up with Godzilla's vomit.
So: use the approach I use in the Guide. Start with a minimal solution to the core message flow and get that working properly. Think very carefully about the right kind of sockets to use. Then make incremental improvements, each time testing fully to make sure you understand what is actually going on. Refactor the code regularly, as you find it growing. Continue until you have a stable minimal version 1. Do not aim for "maximum" anything at the start.
Finally, when you've understood the problem better, start again from scratch and again, build up a working model in several steps.
Repeat until you have totally dominated the problem and learned the best ways to solve it.

It seems like most of the complexity stems from trying to make the broker service persist in the event of a failure. Solving this at the application level gives you the highest degree of flexibility, but requires the most effort if you're starting from scratch.
Instead of handling this at the application level, you could instead handle this at the network level. Treat your brokers as you would any other simple network service and use an IP failover mechanism (e.g., pacemaker/corosync, UCARP, etc) to fail a virtual ip address over to the secondary service if the primary becomes unavailable.
This greatly simplifies your publishers and subscribers, because you don't need a name service. They only need to know about the single virtual ip address. ZMQ will take care of reconnecting to the service as necessary (i.e., when a failover occurs).

What is best pythonic way to communicate between spreaded services/unix machines?

Mornink!
I need to design, write and implement wide system consisting of multiple unix servers performing different roles and running different services. The system must be bullet proof, robust and fast. Yeah, I know. ;) Since I dont know how to approach this task, I've decided to ask you for your opinion before I leave design stage. Here is how the workflow is about to flow:
users are interacting with website, where they set up demands for service
this demand is being stored (database?) and some kind of message to central system (clustered) is being sent about new demand in database/queue
central system picks up the demand and sends signals to various other systems (clusters) to perform their duties (parts of the demanded service setup)
when they are done, they send up message to central system or the website that the service is now being served
Now, what is the modern, robust, clean and efficient way of storing these requests in some kind of queue, and executing them? Should I send some signals, or should I let all subsystems check the queue/db of any sort for new data? What could be that queue, should it be a database? How to deal with the messages? I thought about opening single tcp connection and sending data over that, along with comands triggering actions/functions on the other end, but at closer inspection, there has to be other, better way. So I found Spring Python, that has been criticized for being so 90's-ish.
I know its a very wide question, but I really hope you can help me wrap my head around that design and not make something stupid here :)
Thanks in advance!

Some general ideas for you:
You could have a master-client approach. Requests would be inserted in the master, stored in a database. Master knows the state of each client (same db). Whenever there is a request, the master redirects it to a free client. The client reports back when has finished the task (including answers if any), making it able to receive a new task from the master (this removes the need for pooling).
Communication could be done using web-services. An HTTP request/post should solve every cases. No need to actually go down to the TCP level.
Just general ideas, hope they're useful.

There are a number of message queue technologies out there which are Python friendly which could serve quite well. The top two that I know of are ActiveMQ and RabbitMQ, which both play well with Python, plus I found this comparison which states that ActiveMQ currently (as of 18 months ago!) outperforms RabbitMQ.

Quick test if web-page offer asynchronous HTTP?

I am fetching current data from another company's web feed. It is a simple fetch of an XML file over HTTP. They haven't provided me with much documentation - just a URL.
Because I need to know as soon as possible when the data changes on their site, I need to poll frequently, which isn't a satisfactory solution for either side.
I was about to recommend to them that they set-up some sort of server push - presumably a long-term HTTP connection with asynchronous updates being sent by the server. I am not very familiar with any common protocols for this. It occurred to me that they may already offer this, and I have been too ignorant to realise.
Is there a common web-based protocol for server pushes over HTTP? If there is, is there a quick way I can check if they support it before I make myself look foolish by asking for something that is already available.
(Bonus points for a platform-independent, Python-based solution, but I will take what I can get.)

What you want is HTTP Streaming; read this page. "Comet" is what this technology is commonly called. One implementation is the Ajax Push Engine (APE); the page I just gave you has several others.
Now I don't think it's possible to automatically test if a server supports a push technology because as of now there are no standards on this and the protocols used will vary depending on the implementation.
Alternatively you can use periodic refresh ("polling"), and the advantages of this technique are: you don't need additional software on the server, and this can be done without the cooperation of the server you are polling (it is unfeasible to use Comet if the server you are querying won't install it).
For more information and tricks to reduce bandwidth usage on polling, see this page. Some of these will require some effort from the server you are polling.

I suggest you read this Wikipedia article on the subject. What you want is certainly possible, however it may not be supported by all browsers.
That said... I generally recommend against push technologies on the web, as they sap the resources of a server much faster than a request/response paradigm.
Perhaps there's another way? Polling frequently to see if the file changed is at least a small payload... why is it unsatisfactory for both sides?
Unless you can get the other company to change some of its practices -- perhaps to FTP you the new file, or call a webservice to let your company know that the file has changed -- you may be stuck with polling.

I'm not aware of any method to test if a web server support a push technology.
You should ask to that company if a Comet approach could be adopted to avoid polling.
For Comet python-based solution, have a look here.

To avoid unnecessary download I would check etags and Last-modified headers as described here
http://diveintopython3.ep.io/http-web-services.html

Abstraction and client/server architecture questions for Python game program

Here is where I am at presently. I am designing a card game with the aim of utilizing major components for future work. The part that is hanging me up is creating a layer of abstraction between the server and the client(s). A server is started, and then one or more clients can connect (locally or remotely). I am designing a thick client but my friend is looking at doing a web-based client. I would like to design the server in a manner that allows a variety of different clients to call a common set of server commands.
So, for a start, I would like to create a 'server' which manages the game rules and player interactions, and a 'client' on the local CLI (I'm running Ubuntu Linux for convenience). I'm attempting to flesh out how the two pieces are supposed to interact, without mandating that future clients be CLI-based or on the local machine.
I've found the following two questions which are beneficial, but don't quite answer the above.
Client Server programming in python?
Evaluate my Python server structure
I don't require anything full-featured right away; I just want to establish the basic mechanisms for abstraction so that the resulting mock-up code reflects the relationship appropriately: there are different assumptions at play with a client/server relationship than with an all-in-one application.
Where do I start? What resources do you recommend?
Disclaimers:
I am familiar with code in a variety of languages and general programming/logic concepts, but have little real experience writing substantial amounts of code. This pet project is an attempt at rectifying this.
Also, I know the information is out there already, but I have the strong impression that I am missing the forest for the trees.

Read up on RESTful architectures.
Your fat client can use REST. It will use urllib2 to make RESTful requests of a server. It can exchange data in JSON notation.
A web client can use REST. It can make simple browser HTTP requests or a Javascript component can make more sophisticated REST requests using JSON.
Your server can be built as a simple WSGI application using any simple WSGI components. You have nice ones in the standard library, or you can use Werkzeug. Your server simply accepts REST requests and makes REST responses. Your server can work in HTML (for a browser) or JSON (for a fat client or Javascript client.)

I would consider basing all server / client interactions on HTTP -- probably with JSON payloads. This doesn't directly allow server-initiated interactions ("server push"), but the (newish but already traditional;-) workaround for that is AJAX-y (even though the X makes little sense as I suggest JSON payloads, not XML ones;-) -- the client initiates an async request (via a separate thread or otherwise) to a special URL on the server, and the server responds to those requests to (in practice) do "pushes". From what you say it looks like the limitations of this approach might not be a problem.
The key advantage of specifying the interactions in such terms is that they're entirely independent from the programming language -- so the web-based client in Javascript will be just as doable as your CLI one in Python, etc etc. Of course, the server can live on localhost as a special case, but there is no constraint for that as the HTTP URLs can specify whatever host is running the server; etc, etc.

First of all, regardless of the locality or type of the client, you will be communicating through an established message-based interface. All clients will be operating based on a common set of requests and responses, and the server will handle and reject these based on their validity according to game state. Whether you are dealing with local clients on the same machine or remote clients via HTTP does not matter whatsoever from an abstraction standpoint, as they will all be communicating through the same set of requests/responses.
What this comes down to is your protocol. Your protocol should be a well-defined and technically sound language between client and server that will allow clients to a) participate effectively, and b) participate fairly. This protocol should define what messages ('moves') a client can do, and when, and how the server will react.
Your protocol should be fully fleshed out and documented before you even start on game logic - the two are intrinsically connected and you will save a lot of wasted time and effort by competely defining your protocol first.
You protocol is the abstraction between client and server and it will also serve as the design document and programming guide for both.
Protocol design is all about state, state transitions, and validation. Game servers usually have a set of fairly common, generic states for each game instance e.g. initialization, lobby, gameplay, pause, recap, close game, etc...
Each one of these states has important state data related with it. For example, a 'lobby' state on the server-side might contain the known state of each player...how long since the last message or ping, what the player is doing (selecting an avatar, switching settings, going to the fridge, etc.). Organizing and managing state and substate data in code is important.
Managing these states, and the associated data requirements for each is a process that should be exquisitely planned out as they are directly related to volume of work and project complexity - this is very important and also great practice if you are using this project to step up into larger things.
Also, you must keep in mind that if you have a game, and you let people play, people will cheat. It's a fact of life. In order to minimize this, you must carefully design your protocol and state management to only ever allow valid state transitions. Never trust a single client packet.
For every permutation of client/server state, you must enforce a limited set of valid game messages, and you must be very careful in what you allow players to do, and when you allow them to do it.
Project complexity is generally exponential and not linear - client/server game programming is usually a good/painful way to learn this. Great question. Hope this helps, and good luck!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.