Quick test if a web page offers asynchronous HTTP? - python

I am fetching current data from another company's web feed. It is a simple fetch of an XML file over HTTP. They haven't provided me with much documentation - just a URL.
Because I need to know as soon as possible when the data changes on their site, I need to poll frequently, which isn't a satisfactory solution for either side.
I was about to recommend to them that they set up some sort of server push - presumably a long-lived HTTP connection with asynchronous updates sent by the server. I am not very familiar with any common protocols for this. It occurred to me that they may already offer this, and I have been too ignorant to realise.
Is there a common web-based protocol for server push over HTTP? If there is, is there a quick way I can check whether they support it before I make myself look foolish by asking for something that is already available?
(Bonus points for a platform-independent, Python-based solution, but I will take what I can get.)

What you want is HTTP Streaming; read this page. "Comet" is what this technology is commonly called. One implementation is the Ajax Push Engine (APE); the page I just gave you has several others.
Now, I don't think it's possible to automatically test whether a server supports a push technology, because as of now there are no standards for this and the protocols used vary from one implementation to another.
Alternatively you can use periodic refresh ("polling"), and the advantages of this technique are: you don't need additional software on the server, and this can be done without the cooperation of the server you are polling (it is unfeasible to use Comet if the server you are querying won't install it).
For more information and tricks to reduce bandwidth usage on polling, see this page. Some of these will require some effort from the server you are polling.
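If it turns out they do offer a Comet-style endpoint, the client side can be as simple as a long-poll loop. A minimal sketch, assuming a hypothetical URL that the server holds open until the data changes:

```python
import urllib.request

FEED_URL = "http://example.com/feed.xml"   # placeholder for their real URL

def long_poll():
    """Keep one request outstanding; the server replies only when something changes."""
    while True:
        try:
            # A generous timeout lets the server hold the request open until it has news.
            with urllib.request.urlopen(FEED_URL, timeout=300) as resp:
                handle_update(resp.read())
        except OSError:
            # Timed out, connection dropped, or an HTTP error; reconnect and wait again.
            continue

def handle_update(xml_bytes):
    print("feed changed: %d bytes" % len(xml_bytes))

if __name__ == "__main__":
    long_poll()
```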

I suggest you read this Wikipedia article on the subject. What you want is certainly possible, however it may not be supported by all browsers.
That said... I generally recommend against push technologies on the web, as they sap the resources of a server much faster than a request/response paradigm.
Perhaps there's another way? Polling frequently to see if the file changed is at least a small payload... why is it unsatisfactory for both sides?
Unless you can get the other company to change some of its practices -- perhaps to FTP you the new file, or call a webservice to let your company know that the file has changed -- you may be stuck with polling.

I'm not aware of any method to test whether a web server supports a push technology.
You should ask that company whether a Comet approach could be adopted to avoid polling.
For a Python-based Comet solution, have a look here.

To avoid unnecessary downloads, I would check the ETag and Last-Modified headers as described here:
http://diveintopython3.ep.io/http-web-services.html
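A minimal sketch of that conditional-GET idea, using only the standard library (the feed URL is a placeholder):

```python
import urllib.request
import urllib.error

FEED_URL = "http://example.com/feed.xml"  # placeholder for the real feed URL

def fetch_if_changed(etag=None, last_modified=None):
    """Return (data, etag, last_modified); data is None when the feed is unchanged."""
    req = urllib.request.Request(FEED_URL)
    if etag:
        req.add_header("If-None-Match", etag)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return (resp.read(),
                    resp.headers.get("ETag"),
                    resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:            # Not Modified: nothing to download
            return None, etag, last_modified
        raise
```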

Related

Delivering Python Processed data to the web

I have developed a python program that parses a webpage and creates a new text document with the parsed data. I want to deliver this new information to the web. I have no idea where to start with something like this. Are there any free options where I can have a site automatically call this python code upon request and update the new data to its page? Or is the only feasible solution here to have my own website/server that uses my code? I'm honestly pretty overwhelmed with many of the options when I try to begin doing a web search for a solution like this. I have done a decent amount of application programming before, so I'm confident in my ability to learn new things, but web protocols are all new to me, so it's hard to find a starting point.
Ultimately I want this python code to run automatically, or per request of a user, and deliver the data to them. It could even be through an email, although that is probably less practical.
I personally have good experience using Google App Engine (and it's free for a limited number of requests). The downside is that it does not allow C extensions or Python 3.
If you want to host your own server, Tornado is a good option, I think. Tornado supports both Python 2 and Python 3.
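For what it's worth, a bare-bones Tornado app that serves your parsed text document might look roughly like this (the file name and port are arbitrary placeholders):

```python
import tornado.ioloop
import tornado.web

class ParsedDataHandler(tornado.web.RequestHandler):
    def get(self):
        # Serve whatever the parser last wrote out; the file name is made up.
        with open("parsed_output.txt") as f:
            self.write(f.read())

def make_app():
    return tornado.web.Application([(r"/", ParsedDataHandler)])

if __name__ == "__main__":
    make_app().listen(8888)
    tornado.ioloop.IOLoop.current().start()
```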
There are a great many options available, from 'traditional' virtual server or website hosts like a2hosting or GoDaddy to 'cloud application hosts' such as Amazon EC2, Heroku or OpenShift.
For your case, and without knowing more, I would suggest that application hosting is more appropriate, and that you should take a look at Heroku and OpenShift in particular.
Define carefully what you want to achieve (how the users access your application, what they see, how they interact with it, etc.) and then evaluate these options based on those requirements.
Most offer a free trial, or even free services, depending on what you need! Good luck!
If you've never worked with web technologies before, this will be an overwhelming task, since there are a lot of different technologies involved and many possible ways to combine them.
You'll probably want to start by familiarizing yourself with the very basics of the HTTP protocol.
Then you should read a bit on CGI server-side programming (the article also has a quick overview on HTTP).
Python can run under both CGI and WSGI (if the hosting provider allows such access), so you may also want to read about WSGI.
Once you grasp all these concepts, you should check this question for actual python techniques.
Also, since you seem to be under the impression you must pay to have a website/app deployed, you should know there are companies that host python apps for free
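To give a feel for how small the WSGI side can be, here is a bare-bones sketch (in a real deployment the returned body would be your parsed data, and the host plugs `application` into its own server):

```python
def application(environ, start_response):
    """Minimal WSGI app: every request gets the same plain-text body back."""
    body = b"parsed data would go here\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]

if __name__ == "__main__":
    # Standard-library test server, handy for trying the app locally.
    from wsgiref.simple_server import make_server
    make_server("", 8000, application).serve_forever()
```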

What is the best Pythonic way to communicate between distributed services/unix machines?

Mornink!
I need to design, write and implement a wide system consisting of multiple unix servers performing different roles and running different services. The system must be bulletproof, robust and fast. Yeah, I know. ;) Since I don't know how to approach this task, I've decided to ask for your opinion before I leave the design stage. Here is how the workflow is supposed to flow:
users interact with the website, where they set up demands for a service
the demand is stored (in a database?) and some kind of message about the new demand is sent to the central (clustered) system via a database/queue
the central system picks up the demand and sends signals to various other systems (clusters) to perform their duties (parts of the demanded service setup)
when they are done, they send a message back to the central system or the website that the service is now being served
Now, what is the modern, robust, clean and efficient way of storing these requests in some kind of queue and executing them? Should I send signals, or should I let all subsystems check some sort of queue/db for new data? What should that queue be - a database? How should I deal with the messages? I thought about opening a single TCP connection and sending data over it, along with commands triggering actions/functions on the other end, but on closer inspection there has to be another, better way. Then I found Spring Python, which has been criticized for being so 90's-ish.
I know it's a very broad question, but I really hope you can help me wrap my head around the design and not make something stupid here :)
Thanks in advance!
Some general ideas for you:
You could have a master-client approach. Requests would be inserted into the master and stored in a database. The master knows the state of each client (same db). Whenever there is a request, the master redirects it to a free client. The client reports back when it has finished the task (including answers, if any), making it able to receive a new task from the master (this removes the need for polling).
Communication could be done using web services. An HTTP request/POST should cover every case. No need to actually go down to the TCP level.
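A crude sketch of the 'client reports back over HTTP' step (the master URL and payload fields are invented):

```python
import json
import urllib.request

MASTER_URL = "http://master.example.com/api/task-complete"  # invented endpoint

def report_done(task_id, result):
    """Worker tells the master a task is finished via a plain HTTP POST."""
    payload = json.dumps({"task_id": task_id, "result": result}).encode("utf-8")
    req = urllib.request.Request(MASTER_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200
```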
Just general ideas, hope they're useful.
There are a number of message queue technologies out there which are Python friendly which could serve quite well. The top two that I know of are ActiveMQ and RabbitMQ, which both play well with Python, plus I found this comparison which states that ActiveMQ currently (as of 18 months ago!) outperforms RabbitMQ.
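If you go the RabbitMQ route, the Python side is only a few lines with the pika library. A rough sketch, assuming pika 1.x, a broker on localhost, and made-up queue/payload names:

```python
import pika

# Producer: the website drops a new demand onto the queue.
conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="demands", durable=True)
channel.basic_publish(exchange="", routing_key="demands",
                      body='{"demand_id": 42}',
                      properties=pika.BasicProperties(delivery_mode=2))  # persist the message
conn.close()

# Consumer: a subsystem picks demands up and acknowledges them once handled.
def on_demand(ch, method, properties, body):
    print("got demand:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = conn.channel()
channel.queue_declare(queue="demands", durable=True)
channel.basic_consume(queue="demands", on_message_callback=on_demand)
channel.start_consuming()
```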

Abstraction and client/server architecture questions for Python game program

Here is where I am at presently. I am designing a card game with the aim of utilizing major components for future work. The part that is hanging me up is creating a layer of abstraction between the server and the client(s). A server is started, and then one or more clients can connect (locally or remotely). I am designing a thick client but my friend is looking at doing a web-based client. I would like to design the server in a manner that allows a variety of different clients to call a common set of server commands.
So, for a start, I would like to create a 'server' which manages the game rules and player interactions, and a 'client' on the local CLI (I'm running Ubuntu Linux for convenience). I'm attempting to flesh out how the two pieces are supposed to interact, without mandating that future clients be CLI-based or on the local machine.
I've found the following two questions which are beneficial, but don't quite answer the above.
Client Server programming in python?
Evaluate my Python server structure
I don't require anything full-featured right away; I just want to establish the basic mechanisms for abstraction so that the resulting mock-up code reflects the relationship appropriately: there are different assumptions at play with a client/server relationship than with an all-in-one application.
Where do I start? What resources do you recommend?
Disclaimers:
I am familiar with code in a variety of languages and general programming/logic concepts, but have little real experience writing substantial amounts of code. This pet project is an attempt at rectifying this.
Also, I know the information is out there already, but I have the strong impression that I am missing the forest for the trees.
Read up on RESTful architectures.
Your fat client can use REST. It will use urllib2 to make RESTful requests of a server. It can exchange data in JSON notation.
A web client can use REST. It can make simple browser HTTP requests or a Javascript component can make more sophisticated REST requests using JSON.
Your server can be built as a simple WSGI application using any simple WSGI components. You have nice ones in the standard library, or you can use Werkzeug. Your server simply accepts REST requests and makes REST responses. Your server can work in HTML (for a browser) or JSON (for a fat client or Javascript client.)
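A sketch of the fat-client side of that exchange (the endpoint and field names are invented for illustration):

```python
import json
import urllib.request

SERVER = "http://localhost:8080"   # wherever the WSGI game server is running

def post_move(player_id, move):
    """Send one move as JSON and return the server's JSON reply as a dict."""
    payload = json.dumps({"player": player_id, "move": move}).encode("utf-8")
    req = urllib.request.Request(SERVER + "/game/move", data=payload,
                                 headers={"Content-Type": "application/json",
                                          "Accept": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```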
I would consider basing all server / client interactions on HTTP -- probably with JSON payloads. This doesn't directly allow server-initiated interactions ("server push"), but the (newish but already traditional;-) workaround for that is AJAX-y (even though the X makes little sense as I suggest JSON payloads, not XML ones;-) -- the client initiates an async request (via a separate thread or otherwise) to a special URL on the server, and the server responds to those requests to (in practice) do "pushes". From what you say it looks like the limitations of this approach might not be a problem.
The key advantage of specifying the interactions in such terms is that they're entirely independent from the programming language -- so the web-based client in Javascript will be just as doable as your CLI one in Python, etc etc. Of course, the server can live on localhost as a special case, but there is no constraint for that as the HTTP URLs can specify whatever host is running the server; etc, etc.
First of all, regardless of the locality or type of the client, you will be communicating through an established message-based interface. All clients will be operating based on a common set of requests and responses, and the server will handle and reject these based on their validity according to game state. Whether you are dealing with local clients on the same machine or remote clients via HTTP does not matter whatsoever from an abstraction standpoint, as they will all be communicating through the same set of requests/responses.
What this comes down to is your protocol. Your protocol should be a well-defined and technically sound language between client and server that will allow clients to a) participate effectively, and b) participate fairly. This protocol should define what messages ('moves') a client can do, and when, and how the server will react.
Your protocol should be fully fleshed out and documented before you even start on game logic - the two are intrinsically connected, and you will save a lot of wasted time and effort by completely defining your protocol first.
Your protocol is the abstraction between client and server, and it will also serve as the design document and programming guide for both.
Protocol design is all about state, state transitions, and validation. Game servers usually have a set of fairly common, generic states for each game instance e.g. initialization, lobby, gameplay, pause, recap, close game, etc...
Each one of these states has important state data associated with it. For example, a 'lobby' state on the server side might contain the known state of each player: how long since the last message or ping, what the player is doing (selecting an avatar, switching settings, going to the fridge, etc.). Organizing and managing state and substate data in code is important.
Managing these states, and the associated data requirements for each, is a process that should be exquisitely planned out, as they are directly related to the volume of work and project complexity - this is very important, and also great practice if you are using this project to step up into larger things.
Also, you must keep in mind that if you have a game, and you let people play, people will cheat. It's a fact of life. In order to minimize this, you must carefully design your protocol and state management to only ever allow valid state transitions. Never trust a single client packet.
For every permutation of client/server state, you must enforce a limited set of valid game messages, and you must be very careful in what you allow players to do, and when you allow them to do it.
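One cheap way to make 'only ever allow valid state transitions' concrete is a table of permitted message types per state that the server consults before it touches any game logic. A rough sketch with invented state and message names:

```python
# Which message types the server will even consider, per game state (names are made up).
ALLOWED = {
    "lobby":    {"join", "leave", "ready", "chat"},
    "gameplay": {"play_card", "draw_card", "chat"},
    "recap":    {"rematch", "leave", "chat"},
}

def accept(state, message_type):
    """Reject anything not explicitly allowed in the current state."""
    return message_type in ALLOWED.get(state, set())

# e.g. accept("lobby", "play_card") -> False: the server simply refuses the message.
```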
Project complexity is generally exponential and not linear - client/server game programming is usually a good/painful way to learn this. Great question. Hope this helps, and good luck!

REST / JSON / XML-RPC / SOAP

Sorry for being the 100000th person to ask the same question. But I guess my case is slightly distinctive.
The application is that we'd like to have an Android phone client on 3g and a light python web service server.
The phone would do most of the work and a lot of uploading - pictures, GPS data, etc. The server just has to respond with an 'ok' per upload.
I want to use the lightest method, easiest on the battery. But reading about all these protocols is a bit confusing since they all sound the same.
Are they all on the same level? Or can JSON be a RESTful thing, etc.?
So as described, the key here is uploading. Does all the input for a REST transaction have to be in a URI? i.e. http://www.server.com/upload/0x81d058f82ac13.
XML-RPC and SOAP sound decently similar from Googling too.
REST mandates the general semantics and concepts; the transport and encodings are up to you. These ideas were originally formulated around XML, but JSON is totally applicable.
XML-RPC / SOAP are different mechanisms, but mostly the same idea: how to map OO APIs on top of XML and HTTP. IMHO, they're disgusting from a design standpoint. I was so relieved when I found out about REST. In your case, I'm sure all the extra layers would mean a lot more CPU demand.
I'd say go REST, using JSON for encoding; but if your requirements really are as simple as just uploading, then you can simply use HTTP (which might be RESTful in design even without adding any specific library).
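To make the 'inputs don't all have to live in the URI' point concrete: in the REST style an upload is just a POST (or PUT) whose body carries the picture, while the URI only names the resource. A rough sketch with an invented URL:

```python
import urllib.request

UPLOAD_URL = "http://www.server.com/uploads"   # invented collection URI

def upload_photo(path):
    """POST the raw image bytes; metadata such as GPS could ride along as headers or JSON."""
    with open(path, "rb") as f:
        req = urllib.request.Request(UPLOAD_URL, data=f.read(),
                                     headers={"Content-Type": "image/jpeg"})
    with urllib.request.urlopen(req) as resp:
        return resp.status == 200   # the server's 'ok'
```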

Data Synchronization framework / algorithm for server<->device?

I'm looking to implement data synchronization between servers and distributed clients. The data source on the server is mysql with django on top. The client can vary. Updates can take place on either client or server, and the connection between server and client is not reliable (eg. changes can be made on a disconnected cell phone, should get sync'd when the cell phone has a connection again).
S. Lott suggests using a version control design pattern in this question, which makes sense. I'm wondering if there are any existing packages / implementations of this I can use. Or, should I directly make use of svn/git/etc?
Are there other alternatives? There must be synchronization frameworks or detailed descriptions of algorithms out there, but I'm not having a lot of luck finding them. I'd appreciate if you point me in the right direction.
Perhaps using plain old rsync is enough.
AFAIK there isn't any generic solution to this, mainly due to the diverse requirements for synchronization.
In one of our earlier projects we implemented a Spring-batching-based sync mechanism which relies on a last-updated timestamp field on each of the tables that take part in the sync.
I have heard about SyncML but don't have much experience with it.
If you have a single server and multiple clients, you could think of a JMS based approach.
The data is bundled and placed in Queues (or topics) and would be pulled by clients.
In your case, since updates are bi-directional, you need to handle conflict detection as well. This brings additional complexities.
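As a very rough sketch of the last-updated-timestamp approach mentioned above (table and column names are made up, and conflict detection is deliberately left out):

```python
import sqlite3   # stand-in here for the real MySQL/Django layer

def pull_changes(conn, last_sync):
    """Fetch every row touched since the client's last successful sync."""
    cur = conn.execute(
        "SELECT id, payload, updated_at FROM items WHERE updated_at > ?",
        (last_sync,))
    return cur.fetchall()

def push_changes(conn, rows):
    """Apply the client's local edits; a real implementation must also detect conflicts,
    e.g. reject rows whose server-side updated_at is newer than the client's copy."""
    for row_id, payload, updated_at in rows:
        conn.execute(
            "UPDATE items SET payload = ?, updated_at = ? WHERE id = ?",
            (payload, updated_at, row_id))
    conn.commit()
```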
