Hello, I am working on developing an RPC server based on Twisted to serve several microcontrollers which make RPC calls to a Twisted JSON-RPC server. But the application also requires that the server send information to each micro at any time, so the question is: what would be good practice to prevent the response from a remote JSON-RPC call made by a micro from being confused with a JSON-RPC request the server makes on behalf of a user?
The consequence I am seeing now is that the micros receive bad information, because they don't know whether the netstring/JSON string coming in from the socket is the response to a previous request of theirs or a new request from the server.
Here is my code:
from twisted.internet import reactor
from txjsonrpc.netstring import jsonrpc
import weakref

creds = {'user1': 'pass1', 'user2': 'pass2', 'user3': 'pass3'}

class arduinoRPC(jsonrpc.JSONRPC):

    def connectionMade(self):
        pass

    def jsonrpc_identify(self, username, password, mac):
        """ Each client must authenticate by calling this RPC right after connecting """
        if username in creds and creds[username] == password:
            authenticated = True
        else:
            authenticated = False
        if authenticated:
            self.factory.clients.append(self)
            self.factory.references[mac] = weakref.ref(self)
            return {'results': 'Authenticated as %s' % username, 'error': None}
        else:
            self.transport.loseConnection()

    def jsonrpc_sync_acq(self, data, f):
        """Save data acquired from sensors into a Django table and send an ack to the gateway"""
        if not (self in self.factory.clients):
            self.transport.loseConnection()
        print f
        return {'results': 'synced %s records' % len(data), 'error': None}

    def connectionLost(self, reason):
        """ The mac address is looked up and all references to self in the factory are erased """
        for mac in self.factory.references.keys():
            if self.factory.references[mac]() == self:
                print 'Connection closed - Mac address: %s' % mac
                del self.factory.references[mac]
                self.factory.clients.remove(self)

class rpcfactory(jsonrpc.RPCFactory):
    protocol = arduinoRPC

    def __init__(self, maxLength=1024):
        self.maxLength = maxLength
        self.subHandlers = {}
        self.clients = []
        self.references = {}

# Asynchronous remote calls to the micros, simulating random calls from the server
import threading, time, random, netstring, json

class asyncGatewayCalls(threading.Thread):
    def __init__(self, rpcfactory):
        threading.Thread.__init__(self)
        self.rpcfactory = rpcfactory
        # identifiers of each micro/client connected
        self.remoteMacList = ['12:23:23:23:23:23:23', '167:67:67:67:67:67:67', '90:90:90:90:90:90:90']

    def run(self):
        while True:
            time.sleep(10)
            while True:
                # call any of the three potential micros connected
                mac = self.remoteMacList[random.randrange(0, len(self.remoteMacList))]
                if mac in self.rpcfactory.references:
                    print 'Calling %s' % mac
                    proto = self.rpcfactory.references[mac]()
                    # request an echo from the selected micro
                    dataToSend = netstring.encode(json.dumps({'method': 'echo_from_micro', 'params': ['plop']}))
                    proto.transport.write(dataToSend)
                    break

factory = rpcfactory(arduinoRPC)

# start the caller thread
r = asyncGatewayCalls(factory)
r.start()

reactor.listenTCP(7080, factory)
print "Micros remote RPC server started"
reactor.run()
You need to add enough information to each message so that the recipient can determine how to interpret it. Your requirements sound very similar to those of AMP, so you could either use AMP instead or use the same structure as AMP to identify your messages. Specifically:
In requests, put a particular key - for example, AMP uses "_ask" to identify requests. It also gives these a unique value, which further identifies that request for the lifetime of the connection.
In responses, put a different key - for example, AMP uses "_answer" for this. The value matches up with the value from the "_ask" key in the request the response is for.
Using an approach like this, you just have to look to see whether there is an "_ask" key or an "_answer" key to determine if you've received a new request or a response to a previous request.
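For illustration, here is a minimal sketch of tagging netstring/JSON payloads this way (this is not AMP's or txjsonrpc's actual wire format, just the same idea):

import itertools, json

_counter = itertools.count(1)

def make_request(method, params):
    # Tag outgoing requests with a unique "_ask" value so the eventual
    # response can be matched back to this call.
    return json.dumps({'_ask': next(_counter), 'method': method, 'params': params})

def make_response(ask_id, result):
    # Responses echo the identifier of the request they answer under "_answer".
    return json.dumps({'_answer': ask_id, 'result': result})

def classify(raw):
    # Decide how to interpret an incoming message by which key it carries.
    msg = json.loads(raw)
    if '_ask' in msg:
        return 'request', msg
    if '_answer' in msg:
        return 'response', msg
    raise ValueError('unidentifiable message: %r' % raw)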
On a separate topic, your asyncGatewayCalls class shouldn't be thread-based. There's no apparent reason for it to use threads, and by doing so it is also misusing Twisted APIs in a way which will lead to undefined behavior. Most Twisted APIs can only be used in the thread in which you called reactor.run. The only exception is reactor.callFromThread, which you can use to send a message to the reactor thread from any other thread. asyncGatewayCalls tries to write to a transport, though, which will lead to buffer corruption or arbitrary delays in the data being sent, or perhaps worse things. Instead, you can write asyncGatewayCalls like this:
from twisted.internet.task import LoopingCall

class asyncGatewayCalls(object):
    def __init__(self, rpcfactory):
        self.rpcfactory = rpcfactory
        self.remoteMacList = [...]

    def run(self):
        self._call = LoopingCall(self._pokeMicro)
        return self._call.start(10)

    def _pokeMicro(self):
        while True:
            mac = self.remoteMacList[...]
            if mac in self.rpcfactory.references:
                proto = ...
                dataToSend = ...
                proto.transport.write(dataToSend)
                break

factory = ...
r = asyncGatewayCalls(factory)
r.run()
reactor.listenTCP(7080, factory)
reactor.run()
This gives you a single-threaded solution which should have the same behavior as you intended for the original asyncGatewayCalls class. Instead of sleeping in a loop in a thread in order to schedule the calls, though, it uses the reactor's scheduling APIs (via the higher-level LoopingCall class, which schedules things to be called repeatedly) to make sure _pokeMicro gets called every ten seconds.
App Description
So I'm trying to create an application that does real-time sentiment analysis on tweets (as close to real time as I'm able to get it), and these tweets have to be based on user input. On the main page of my application I have a simple search bar where the user can enter a topic they would like to perform sentiment analysis on; when they press enter, it takes them to another page where they see a line chart displaying all the data in real time.
Problem 1
The first problem I'm facing at the moment is that I don't know how to get tweepy to change what it is tracking when two or more people make requests. If I had one global stream that I simply disconnect and reconnect every time a user makes a new query, it would also disconnect for the other users, which I don't want. On the other hand, if I allocate a streaming object for each user that connects, that strategy should work, but it still poses a problem: Twitter does not seem to allow you to hold more than one streaming connection at a time, as discussed in this StackOverflow post:
Does Tweepy support running multiple Streams to collect data?
If I still went along with this, I would risk getting my IP banned, so neither of these solutions is any good.
Problem 2
The last problem I'm having is figuring out who a message belongs to. At the moment, I'm using RabbitMQ to store all incoming messages in one single queue called twitter_topic_feed. For every tweet that I receive from tweepy, I publish it to that queue. RabbitMQ then consumes each message and sends it to every available connection. Obviously, that behaviour is not what I'm looking for: consider two users who search for pizza and sports; both users will receive tweets pertaining to football and pizza, when one asked for sports tweets and the other asked for pizza tweets.
One idea is to create a queue with a unique identifier for each available connection. The identifier would have the form {Search Term}_{Hash ID}, as sketched below.
For generating the hash ID, I can use the uuid package that is available in Python, creating the ID when the connection opens and deleting it when it closes. Of course, when a connection closes I also need to delete the queue. I'm not sure how well this solution would scale: with 10,000 connections we would have 10,000 queues, and each queue could potentially have a lot of messages stored in it, so it seems like it would be very memory intensive.
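For illustration, a minimal sketch of that naming scheme (the helper name is hypothetical):

from uuid import uuid4

def make_queue_name(search_term):
    # One queue per open connection: "{Search Term}_{Hash ID}".
    return '%s_%s' % (search_term, uuid4().hex)

# Declared when the WebSocket opens, deleted when it closes:
queue_name = make_queue_name('pizza')  # e.g. 'pizza_1f0e...'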
Design
Tornado framework for WebSockets
Tweepy API for streaming tweets
RabbitMQ for publishing messages to the queue whenever Tweepy receives a new tweet; the consumer then takes each message and sends it to the WebSocket.
Attempt (what I currently have so far)
TweetStreamListener uses the Tweepy API to listen for tweets based on the user's input. For each tweet it gets, it calculates the polarity of that tweet and publishes it to the RabbitMQ twitter_topic_feed queue.
import logging
from tweepy import StreamListener, OAuthHandler, Stream, API
from sentiment_analyzer import calculate_polarity_score
from constants import SETTINGS

auth = OAuthHandler(
    SETTINGS["TWITTER_CONSUMER_API_KEY"], SETTINGS["TWITTER_CONSUMER_API_SECRET_KEY"])
auth.set_access_token(
    SETTINGS["TWITTER_ACCESS_KEY"], SETTINGS["TWITTER_ACCESS_SECRET_KEY"])
api = API(auth, wait_on_rate_limit=True)

class TweetStreamListener(StreamListener):
    def __init__(self):
        self.api = api
        self.stream = Stream(auth=self.api.auth, listener=self)

    def start_listening(self):
        pass

    def on_status(self, status):
        if not hasattr(status, 'retweeted_status'):
            polarity = calculate_polarity_score(status.text)
            message = {
                'polarity': polarity,
                'timestamp': status.created_at
            }
            # TODO(Luis) Need to figure out who to send this message to.
            logging.debug("Message received from Twitter: {0}".format(message))

    # limit handling
    def on_limit(self, status):
        logging.info(
            'Limit threshold exceeded. Status code: {0}'.format(status))

    def on_timeout(self, status):
        logging.error('Stream disconnected. Continuing...')
        return True  # Don't kill the stream

    def on_error(self, status_code):
        """
        Callback that executes for any error that may occur. Whenever we get a 420 error code, we
        simply stop streaming tweets, as we have reached our rate limit (too many requests).

        Returns: False if we are sending too many requests, otherwise True to keep the stream going.
        """
        if status_code == 420:
            logging.error(
                'Encountered error code 420. Disconnecting the stream')
            # returning False in on_data disconnects the stream
            return False
        else:
            logging.error('Encountered error with status code: {}'.format(
                status_code))
            return True  # Don't kill the stream
WS_Handler is in charge of maintaining a list of open connections and sending any message it receives back to every client (this is the behaviour I don't want).
import logging
import json
from uuid import uuid4
from tornado.web import RequestHandler
from tornado.websocket import WebSocketHandler

class WSHandler(WebSocketHandler):
    def check_origin(self, origin):
        return True

    @property
    def sess_id(self):
        return self._sess_id

    def open(self):
        self._sess_id = uuid4().hex
        logging.debug('Connection established.')
        self.application.pc.register_websocket(self._sess_id, self)

    # When a message arrives via RabbitMQ, write it to the websocket
    def on_message(self, message):
        logging.debug('Message received: {0}'.format(message))
        self.application.pc.redirect_incoming_message(
            self._sess_id, json.dumps(message))

    def on_close(self):
        logging.debug('Connection closed.')
        self.application.pc.unregister_websocket(self._sess_id)
The PikaClient module contains the PikaClient class, which allows us to keep track of the inbound and outbound channels as well as of the websockets that are currently running.
import logging
import pika
from constants import SETTINGS
from pika import PlainCredentials, ConnectionParameters
from pika.adapters.tornado_connection import TornadoConnection

pika.log = logging.getLogger(__name__)

class PikaClient(object):
    INPUT_QUEUE_NAME = 'in_queue'

    def __init__(self):
        self.connected = False
        self.connecting = False
        self.connection = None
        self.in_channel = None
        self.out_channels = {}
        self.websockets = {}

    def connect(self):
        if self.connecting:
            return
        self.connecting = True
        # Set up the RabbitMQ connection
        credentials = PlainCredentials(
            SETTINGS['RABBITMQ_USERNAME'], SETTINGS['RABBITMQ_PASSWORD'])
        param = ConnectionParameters(
            host=SETTINGS['RABBITMQ_HOST'], port=SETTINGS['RABBITMQ_PORT'], virtual_host='/', credentials=credentials)
        return TornadoConnection(param, on_open_callback=self.on_connected)

    def run(self):
        self.connection = self.connect()
        self.connection.ioloop.start()

    def stop(self):
        self.connected = False
        self.connecting = False
        self.connection.ioloop.stop()

    def on_connected(self, unused_Connection):
        self.connected = True
        self.in_channel = self.connection.channel(self.on_conn_open)

    def on_conn_open(self, channel):
        self.in_channel.exchange_declare(
            exchange='tornado_input', exchange_type='topic')
        channel.queue_declare(
            callback=self.on_input_queue_declare, queue=self.INPUT_QUEUE_NAME)

    def on_input_queue_declare(self, queue):
        self.in_channel.queue_bind(
            callback=None, exchange='tornado_input', queue=self.INPUT_QUEUE_NAME, routing_key="#")

    def register_websocket(self, sess_id, ws):
        self.websockets[sess_id] = ws
        self.create_out_channel(sess_id)

    def unregister_websocket(self, sess_id):
        self.websockets.pop(sess_id)
        if sess_id in self.out_channels:
            self.out_channels[sess_id].close()

    def create_out_channel(self, sess_id):
        def on_output_channel_creation(channel):
            def on_output_queue_declaration(queue):
                channel.basic_consume(self.on_message, queue=sess_id)
            self.out_channels[sess_id] = channel
            channel.queue_declare(callback=on_output_queue_declaration,
                                  queue=sess_id, auto_delete=True, exclusive=True)
        self.connection.channel(on_output_channel_creation)

    def redirect_incoming_message(self, sess_id, message):
        self.in_channel.basic_publish(
            exchange='tornado_input', routing_key=sess_id, body=message)

    def on_message(self, channel, method, header, body):
        sess_id = method.routing_key
        if sess_id in self.websockets:
            self.websockets[sess_id].write_message(body)
            channel.basic_ack(delivery_tag=method.delivery_tag)
        else:
            channel.basic_reject(delivery_tag=method.delivery_tag)
Server.py is the main entry point of the application.
import logging
import os

from tornado import web, ioloop
from tornado.options import define, options, parse_command_line

from client import PikaClient
from handlers import WSHandler, MainHandler

define("port", default=3000, help="run on the given port.", type=int)
define("debug", default=True, help="run in debug mode.", type=bool)

def main():
    parse_command_line()
    settings = {
        "debug": options.debug,
        "static_path": os.path.join(os.path.dirname(__file__), "web/static")
    }
    app = web.Application(
        [
            (r"/", MainHandler),
            (r"/stream", WSHandler),
        ],
        **settings
    )
    # Set up the PikaClient
    app.pc = PikaClient()
    app.listen(options.port)
    logging.info("Server running on http://localhost:3000")
    try:
        app.pc.run()
    except KeyboardInterrupt:
        app.pc.stop()

if __name__ == "__main__":
    main()
I'm trying to convert a simple synchronous server to an asynchronous version. The server receives POST requests and retrieves the response from an external web service (Amazon SQS). Here's the synchronous code:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    # ....other stuff

def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    self._sqs_send_queue.write(msg)
Reading the Tornado documentation and some threads here, I ended up with this code using coroutines:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    self.finish()

@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)

def write_msg(queue, msg, callback=None):
    queue.write(msg)
Comparing the performance using siege, I find that the second version is actually worse than the original one, so there's probably something about coroutines and Tornado asynchronous programming that I haven't understood at all.
Could you please help me with this?
Edit: self._sqs_send_queue is a queue object retrieved from the boto interface, and queue.write(msg) returns the message that has been written to the queue.
tornado relies on you converting all your I/O to be non-blocking. Simply sticking the same code you were using before inside of a gen.Task will not improve performance at all, because the I/O itself still blocks the event loop. Additionally, you need to make your post method a coroutine and call _send_parking_spots_request using yield for the code to behave properly. So, a "correct" solution would look something like this:
@gen.coroutine
def post(self):
    ...
    yield self._send_parking_spots_request(data)  # wait (without blocking the event loop) until the method is done
    self.finish()

@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)

def write_msg(queue, msg, callback=None):
    queue.write(msg, callback=callback)  # This has to do non-blocking I/O.
In this example, queue.write would need to be some API that sends your request using non-blocking I/O, and executes callback when a response is received. Without knowing exactly what queue in your original example is, I can't specify exactly how that can be implemented in your case.
Edit: Assuming you're using boto, you may want to check out bototornado, which implements the exact same API I described above:
def write(self, message, callback=None):
    """
    Add a single message to the queue.

    :type message: Message
    :param message: The message to be written to the queue

    :rtype: :class:`boto.sqs.message.Message`
    :return: The :class:`boto.sqs.message.Message` object that was written.
    """
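For completeness, here is a hedged, untested sketch of how the handler above might drive such a callback-style write from a coroutine (ParkingSpotsHandler is an illustrative stand-in for the real handler class):

import json
from tornado import gen
from boto.sqs.message import Message

class ParkingSpotsHandler(object):  # illustrative stand-in for the real handler
    @gen.coroutine
    def _send_parking_spots_request(self, data):
        msg = Message()
        msg.set_body(json.dumps(data))
        # gen.Task supplies the callback= argument; the coroutine resumes
        # when bototornado invokes that callback with the written message.
        written = yield gen.Task(self._sqs_send_queue.write, msg)
        raise gen.Return(written)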
I've implemented a Server-Sent Events API in my Django app to stream realtime updates from my backend to the browser. The backend is a Redis pubsub. My Django view looks like this:
def event_stream(request):
    """
    Stream worker events out to browser.
    """
    listener = events.Listener(
        settings.EVENTS_PUBSUB_URL,
        channels=[settings.EVENTS_PUBSUB_CHANNEL],
        buffer_key=settings.EVENTS_BUFFER_KEY,
        last_event_id=request.META.get('HTTP_LAST_EVENT_ID')
    )
    return http.HttpResponse(listener, mimetype='text/event-stream')
And the events.Listener class that I'm returning as an iterator looks like this:
class Listener(object):
    def __init__(self, rcon_or_url, channels, buffer_key=None,
                 last_event_id=None):
        if isinstance(rcon_or_url, redis.StrictRedis):
            self.rcon = rcon_or_url
        elif isinstance(rcon_or_url, basestring):
            self.rcon = redis.StrictRedis(**utils.parse_redis_url(rcon_or_url))
        self.channels = channels
        self.buffer_key = buffer_key
        self.last_event_id = last_event_id
        self.pubsub = self.rcon.pubsub()
        self.pubsub.subscribe(channels)

    def __iter__(self):
        # If we've been initted with a buffer key, then get all the events off
        # that and spew them out before blocking on the pubsub.
        if self.buffer_key:
            buffered_events = self.rcon.lrange(self.buffer_key, 0, -1)
            # check whether msg with last_event_id is still in buffer. If so,
            # trim buffered_events to have only newer messages.
            if self.last_event_id:
                # Note that we're looping through most recent messages first,
                # here
                counter = 0
                for msg in buffered_events:
                    if (json.loads(msg)['id'] == self.last_event_id):
                        break
                    counter += 1
                buffered_events = buffered_events[:counter]
            for msg in reversed(list(buffered_events)):
                # Stream out oldest messages first
                yield to_sse({'data': msg})
        try:
            for msg in self.pubsub.listen():
                if msg['type'] == 'message':
                    yield to_sse(msg)
        finally:
            logging.info('Closing pubsub')
            self.pubsub.close()
            self.rcon.connection_pool.disconnect()
I'm able to successfully stream events out to the browser with this setup. However, it seems that the disconnect calls in the listener's "finally" don't ever actually get called. I assume that they're still camped out waiting for messages to come from the pubsub. As clients disconnect and reconnect, I can see the number of connections to my Redis instance climbing and never going down. Once it gets to around 1000, Redis starts freaking out and consuming all the available CPU.
I would like to be able to detect when the client is no longer listening and close the Redis connection(s) at that time.
Things I've tried or thought about:
A connection pool. But as the redis-py README states, "It is not safe to pass PubSub or Pipeline objects between threads."
A middleware to handle the connections, or maybe just disconnections. This won't work because a middleware's process_response() method gets called too early (before http headers are even sent to the client). I need something called when the client disconnects while I'm in the middle of streaming content to them.
The request_finished and got_request_exception signals. The first, like process_response() in a middleware, seems to fire too soon. The second doesn't get called when a client disconnects mid-stream.
Final wrinkle: In production I'm using Gevent so I can get away with keeping a lot of connections open at once. However, this connection leak issue occurs whether I'm using plain old 'manage.py runserver', or Gevent monkeypatched runserver, or Gunicorn's gevent workers.
UPDATE: As of Django 1.5, you'll need to return a StreamingHttpResponse instance if you want to lazily stream things out as I'm doing in this question/answer.
ORIGINAL ANSWER BELOW
After a lot of banging on things and reading framework code, I've found what I think is the right answer to this question.
According to the WSGI PEP, if your application returns an iterator with a close() method, it should be called by the WSGI server once the response has finished. Django supports this too. That's a natural place to do the Redis connection cleanup that I need.
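As a minimal sketch of that contract (illustrative only, not the code from this app):

class StreamingBody(object):
    """An iterator whose close() the WSGI server must call when the response ends."""
    def __init__(self, chunks):
        self._chunks = iter(chunks)

    def __iter__(self):
        return self

    def next(self):
        return next(self._chunks)

    def close(self):
        # Cleanup lands here even if the client disconnected mid-stream,
        # at least on servers that honor the spec.
        print 'cleaning up'

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return StreamingBody(['hello\n', 'world\n'])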
There's a bug in Python's wsgiref implementation, and by extension in Django's 'runserver', that causes close() to be skipped if the client disconnects from the server mid-stream. I've submitted a patch.
Even if the server honors close(), it won't be called until a write to the client actually fails. If your iterator is blocked waiting on the pubsub and not sending anything, close() won't be called. I've worked around this by sending a no-op message into the pubsub each time a client connects. That way when a browser does a normal reconnect, the now-defunct threads will try to write to their closed connections, throw an exception, then get cleaned up when the server calls close(). The SSE spec says that any line beginning with a colon is a comment that should be ignored, so I'm just sending ":\n" as my no-op message to flush out stale clients.
Here's the new code. First the Django view:
def event_stream(request):
    """
    Stream worker events out to browser.
    """
    return events.SSEResponse(
        settings.EVENTS_PUBSUB_URL,
        channels=[settings.EVENTS_PUBSUB_CHANNEL],
        buffer_key=settings.EVENTS_BUFFER_KEY,
        last_event_id=request.META.get('HTTP_LAST_EVENT_ID')
    )
And the Listener class that does the work, along with a helper function to format the SSEs and an HTTPResponse subclass that lets the view be a little cleaner:
class Listener(object):
    def __init__(self,
                 rcon_or_url=settings.EVENTS_PUBSUB_URL,
                 channels=None,
                 buffer_key=settings.EVENTS_BUFFER_KEY,
                 last_event_id=None):
        if isinstance(rcon_or_url, redis.StrictRedis):
            self.rcon = rcon_or_url
        elif isinstance(rcon_or_url, basestring):
            self.rcon = redis.StrictRedis(**utils.parse_redis_url(rcon_or_url))
        if channels is None:
            channels = [settings.EVENTS_PUBSUB_CHANNEL]
        self.channels = channels
        self.buffer_key = buffer_key
        self.last_event_id = last_event_id
        self.pubsub = self.rcon.pubsub()
        self.pubsub.subscribe(channels)

        # Send a superfluous message down the pubsub to flush out stale
        # connections.
        for channel in self.channels:
            # Use buffer_key=None since these pings never need to be
            # remembered and replayed.
            sender = Sender(self.rcon, channel, None)
            sender.publish('_flush', tags=['hidden'])

    def __iter__(self):
        # If we've been initted with a buffer key, then get all the events off
        # that and spew them out before blocking on the pubsub.
        if self.buffer_key:
            buffered_events = self.rcon.lrange(self.buffer_key, 0, -1)
            # check whether msg with last_event_id is still in buffer. If so,
            # trim buffered_events to have only newer messages.
            if self.last_event_id:
                # Note that we're looping through most recent messages first,
                # here
                counter = 0
                for msg in buffered_events:
                    if (json.loads(msg)['id'] == self.last_event_id):
                        break
                    counter += 1
                buffered_events = buffered_events[:counter]
            for msg in reversed(list(buffered_events)):
                # Stream out oldest messages first
                yield to_sse({'data': msg})
        for msg in self.pubsub.listen():
            if msg['type'] == 'message':
                yield to_sse(msg)

    def close(self):
        self.pubsub.close()
        self.rcon.connection_pool.disconnect()

class SSEResponse(HttpResponse):
    def __init__(self, rcon_or_url, channels, buffer_key=None,
                 last_event_id=None, *args, **kwargs):
        self.listener = Listener(rcon_or_url, channels, buffer_key,
                                 last_event_id)
        super(SSEResponse, self).__init__(self.listener,
                                          mimetype='text/event-stream',
                                          *args, **kwargs)

    def close(self):
        """
        This will be called by the WSGI server at the end of the request, even
        if the client disconnects midstream. Unless you're using Django's
        runserver, in which case you should expect to see Redis connections
        build up until http://bugs.python.org/issue16220 is fixed.
        """
        self.listener.close()

def to_sse(msg):
    """
    Given a Redis pubsub message that was published by a Sender (ie, has a
    JSON body with time, message, title, tags, and id), return a
    properly-formatted SSE string.
    """
    data = json.loads(msg['data'])

    # According to the SSE spec, lines beginning with a colon should be
    # ignored. We can use that as a way to force zombie listeners to try
    # pushing something down the socket and clean up their redis connections
    # when they get an error.
    # See http://dev.w3.org/html5/eventsource/#event-stream-interpretation
    if data['message'] == '_flush':
        return ":\n"  # Administering colonic!

    if 'id' in data:
        out = "id: " + data['id'] + '\n'
    else:
        out = ''
    if 'name' in data:
        out += 'name: ' + data['name'] + '\n'

    payload = json.dumps({
        'time': data['time'],
        'message': data['message'],
        'tags': data['tags'],
        'title': data['title'],
    })
    out += 'data: ' + payload + '\n\n'
    return out
For a Django application I'm working on, I need to implement two-way RPC, so that:
the clients can call RPC methods on the platform, and
the platform can call RPC methods on each client.
As the clients will mostly be behind NATs (which means no public IPs and unpredictable, weird firewalling policies), the platform-to-client direction has to be initiated by the client.
I have a pretty good idea of how I could write this from scratch, and I also think I could work something out of Twisted's publisher/subscriber model, but I've learned that there is always a best way to do things in Python.
So I'm wondering what the best way to do it would be, one that also integrates well with Django. The code will have to cope with hundreds of clients in the short term, and (we hope) with thousands of clients in the medium/long term.
So what library/implementation would you advise me to use?
I'm mostly looking for starting points to RTFM!
WebSocket is a moving target, with new specifications appearing from time to time. Brave developers implement server-side libraries, but few implement the client side; the client for WebSocket is a web browser.
WebSocket is not the only way for a server to talk to a client; EventSource is a simple and pragmatic way to push information to a client. It's just a never-ending page. Twitter's firehose used this trick before the specification existed. The client opens an HTTP connection and waits for events. The connection is kept open, and reopened if there is any trouble (a cut connection, something like that).
There is no timeout, and you can send many events over one connection.
The difference between WebSocket and EventSource is simple: WebSocket is bidirectional and hard to implement; EventSource is unidirectional and simple to implement, on both the client and the server side.
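For concreteness, the body of such a never-ending page looks roughly like this on the wire: each event is one or more "data:" lines terminated by a blank line, and a line starting with a colon is a comment:

: a comment, ignored by the client

id: 41
data: {"message": "first event"}

data: second event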
You can use EventSource as a zombie controller. Each client connects (and reconnects) to the master and waits for instructions. When an instruction is received, the zombie acts, and if needed it can talk back to its master over a classic HTTP connection targeting the Django app.
EventSource keeps the connection open, so you need an async server, like Tornado. Django needs a sync server, so you need both, with a dispatcher like nginx in front. Django (or a cron-like job) talks to the async server, which talks to the right zombie. The zombie talks to Django, so the async server doesn't need any persistence; it's just a hub with zombies plugged in.
Gevent is able to handle such an HTTP server, but there is no decent documentation or example for this; it's a shame. I want a car, you give me a screw.
You can also use Tornado + Tornadio + Socket.io. That's what we are using right now for notifications, and the amount of code you have to write is not that much.
import os
import os.path as op
from datetime import datetime as dt

from tornado import web
from tornadio2 import SocketConnection, TornadioRouter, SocketServer

class PingConnection(SocketConnection):
    def on_open(self, info):
        print 'Ping', repr(info)

    def on_message(self, message):
        now = dt.utcnow()
        message['server'] = [now.hour, now.minute, now.second, now.microsecond / 1000]
        self.send(message)

class ChatConnection(SocketConnection):
    participants = set()
    unique_id = 0

    @classmethod
    def get_username(cls):
        cls.unique_id += 1
        return 'User%d' % cls.unique_id

    def on_open(self, info):
        print 'Chat', repr(info)
        # Give user unique ID
        self.user_name = self.get_username()
        self.participants.add(self)

    def on_message(self, message):
        pass

    def on_close(self):
        self.participants.remove(self)

    def broadcast(self, msg):
        for p in self.participants:
            p.send(msg)

class RouterConnection(SocketConnection):
    __endpoints__ = {'/chat': ChatConnection,
                     '/ping': PingConnection,
                     '/notification': NotificationConnection
                     }

    def on_open(self, info):
        print 'Router', repr(info)

MyRouter = TornadioRouter(RouterConnection)

# Create socket application
application = web.Application(
    MyRouter.apply_routes([(r"/", IndexHandler),
                           (r"/socket.io.js", SocketIOHandler)]),
    flash_policy_port=843,
    flash_policy_file=op.join(ROOT, 'flashpolicy.xml'),
    socket_io_port=3001,
    template_path=os.path.join(os.path.dirname(__file__), "templates/notification")
)
Here is a really simple solution I came up with:
import tornado.ioloop
import tornado.web
import time

class MainHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.set_header("Content-Type", "text/event-stream")
        self.set_header("Cache-Control", "no-cache")
        self.write("Hello, world")
        self.flush()
        for i in range(0, 5):
            msg = "%d<br>" % i
            self.write("%s\r\n" % msg)  # content
            self.flush()
            time.sleep(5)

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)
    tornado.ioloop.IOLoop.instance().start()
and
curl http://localhost:8888
gives output as it comes!
Now, I'll just have to implement the full event-source spec and some kind of data serialization between the server and the clients, but that's trivial. I'll post a URL to the lib I write here when it's done.
I've recently played with Django, Server-Sent Events and WebSocket, and I've written an article about it at http://curella.org/blog/2012/jul/17/django-push-using-server-sent-events-and-websocket/
Of course, this comes with the usual caveats: Django probably isn't the best fit for evented stuff, and both protocols are still drafts.
I am learning how to use Twisted AMP. I am developing a program that sends data from a client to a server and inserts the data into a SQLite3 DB. The server then sends a result back to the client indicating success or error (try/except might not be the best way to do this, but it is only a temporary solution while I work out the main problem). To do this I modified an example I found that originally computed a sum and returned the result, so I realize this might not be the most efficient way to do what I am trying to do. In particular, I am trying to time multiple insertions (i.e., send the data to the server multiple times for multiple insertions), and I have included the code I have written. It works, but it is clearly not a good way to send data for multiple insertions, since I am opening multiple connections before running the reactor.
I have tried several ways to get around this, including passing the ClientCreator to reactor.callWhenRunning(), but you cannot do this with a deferred.
Any suggestions, advice or help with how to do this would be much appreciated. Here is the code.
Server:
from twisted.protocols import amp
from twisted.internet import reactor
from twisted.internet.protocol import Factory
import sqlite3, time

class Insert(amp.Command):
    arguments = [('data', amp.Integer())]
    response = [('insert_result', amp.Integer())]

class Protocol(amp.AMP):
    def __init__(self):
        self.conn = sqlite3.connect('biomed1.db')
        self.c = self.conn.cursor()
        self.res = None

    @Insert.responder
    def dbInsert(self, data):
        self.InsertDB(data)  # call the DB inserter
        result = self.res  # send back the result of the insertion
        return {'insert_result': result}

    def InsertDB(self, data):
        tm = time.time()
        print "insert time:", tm
        chx = data
        PID = 2
        device_ID = 5
        try:
            self.c.execute("INSERT INTO btdata4(co2_data, patient_Id, sensor_Id) VALUES ('%s','%s','%s')" % (chx, PID, device_ID))
        except Exception, err:
            print err
            self.res = 0
        else:
            self.res = 1
            self.conn.commit()

pf = Factory()
pf.protocol = Protocol
reactor.listenTCP(1234, pf)
reactor.run()
Client:
from twisted.internet import reactor
from twisted.internet.protocol import ClientCreator
from twisted.protocols import amp
import time

class Insert(amp.Command):
    arguments = [('data', amp.Integer())]
    response = [('insert_result', amp.Integer())]

def connected(protocol):
    return protocol.callRemote(Insert, data=5555).addCallback(gotResult)

def gotResult(result):
    print 'insert_result:', result['insert_result']
    tm = time.time()
    print "stop", tm

def error(reason):
    print "error", reason

tm = time.time()
print "start", tm

for i in range(10):  # send data over ten times
    ClientCreator(reactor, amp.AMP).connectTCP(
        '127.0.0.1', 1234).addCallback(connected).addErrback(error)

reactor.run()
End of Code.
Thank you.
A few things will improve your server code.
First and foremost: direct, blocking database access is discouraged in Twisted, as it blocks the reactor. Twisted has a nice abstraction for database access which provides a Twisted approach to DB connections: twisted.enterprise.adbapi.
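For illustration, a minimal sketch of the adbapi pattern (assuming a local SQLite file):

from twisted.enterprise import adbapi
from twisted.internet import reactor

dbpool = adbapi.ConnectionPool("sqlite3", "biomed1.db", check_same_thread=False)

def done(rows):
    # runQuery executes in a thread pool and fires the Deferred with the
    # result rows, so the reactor is never blocked.
    print "query returned:", rows
    reactor.stop()

dbpool.runQuery("SELECT 1").addCallback(done)
reactor.run()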
Now, on to reuse of the DB connection: if you want to reuse certain assets (like a database connection) across a number of Protocol instances, you should initialize them in the constructor of the Factory. If you don't fancy initiating such things at launch time, create a resource access method which initiates the resource upon its first call, assigns it to an instance variable, and returns that on subsequent calls.
When the Factory creates a specific Protocol instance, it adds a reference to itself inside the protocol; see line 97 of twisted.internet.protocol.
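That reference is set up in Factory.buildProtocol, which does essentially this:

def buildProtocol(self, addr):
    p = self.protocol()
    p.factory = self
    return p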
Then, within your Protocol instance, you can access the shared database connection instance like:
self.factory.whatever_name_for_db_connection.doSomething()
Reworked server code (I don't have Python, Twisted, or even a decent IDE available, so this is pretty much untested; some errors are to be expected):
from twisted.protocols import amp
from twisted.internet import reactor
from twisted.internet.protocol import Factory
from twisted.enterprise import adbapi
import time

class AMPDBAccessProtocolFactory(Factory):
    def getDBConnection(self):
        if 'dbConnection' in dir(self):
            return self.dbConnection
        else:
            self.dbConnection = SQLLiteTestConnection(self.dbURL)
            return self.dbConnection

class SQLLiteTestConnection(object):
    """
    Provides abstraction for database access and some business functions.
    """
    def __init__(self, dbURL):
        self.dbPool = adbapi.ConnectionPool("sqlite3", dbURL, check_same_thread=False)

    def insertBTData4(self, data):
        # sqlite3 uses qmark-style placeholders
        query = "INSERT INTO btdata4(co2_data, patient_Id, sensor_Id) VALUES (?,?,?)"
        tm = time.time()
        print "insert time:", tm
        chx = data
        PID = 2
        device_ID = 5
        dF = self.dbPool.runQuery(query, (chx, PID, device_ID))
        dF.addCallback(self.onQuerySuccess, insert_data=data)
        return dF

    def onQuerySuccess(self, query_result, insert_data=None):
        """
        Here you can inspect query results or add any other valuable information to be parsed at the client.
        For the test's sake we will just return True to the client if the query was a success.
        The original data is available in the keyword argument insert_data.
        """
        return True

class Insert(amp.Command):
    arguments = [('data', amp.Integer())]
    response = [('insert_result', amp.Integer())]

class MyAMPProtocol(amp.AMP):
    @Insert.responder
    def dbInsert(self, data):
        db = self.factory.getDBConnection()
        dF = db.insertBTData4(data)
        # AMP responders must return (a Deferred firing with) the response
        # dict declared on the Command.
        dF.addCallback(lambda success: {'insert_result': 1 if success else 0})
        dF.addErrback(self.onInsertError, data)
        return dF

    def onInsertError(self, error, data):
        """
        Here you could do some additional error checking or inspect the data
        that was handed in for insert. For now we just propagate the same
        failure so that the client gets notified.
        """
        return error

if __name__ == '__main__':
    pf = AMPDBAccessProtocolFactory()
    pf.protocol = MyAMPProtocol
    pf.dbURL = 'biomed1.db'
    reactor.listenTCP(1234, pf)
    reactor.run()
Now on to the client. If AMP follows the overall RPC logic (can't test it currently), it should be able to reuse the same connection across a number of calls. So I have created a ServerProxy class which holds that reusable protocol instance and provides an abstraction for calls:
from twisted.internet import reactor
from twisted.internet.protocol import ClientCreator
from twisted.protocols import amp
import time

class Insert(amp.Command):
    arguments = [('data', amp.Integer())]
    response = [('insert_result', amp.Integer())]

class ServerProxy(object):
    def connected(self, protocol):
        self.serverProxy = protocol  # assign the protocol as an instance variable
        reactor.callLater(5, self.startMultipleInsert)  # after five seconds start the multiple-insert procedure

    def remote_insert(self, data):
        return self.serverProxy.callRemote(Insert, data=data)

    def startMultipleInsert(self):
        for i in range(10):  # send data over ten times
            dF = self.remote_insert(i)
            dF.addCallback(self.gotInsertResult)
            dF.addErrback(error)

    def gotInsertResult(self, result):
        print 'insert_result:', str(result)
        tm = time.time()
        print "stop", tm

def error(reason):
    print "error", reason

def main():
    tm = time.time()
    print "start", tm
    serverProxy = ServerProxy()
    ClientCreator(reactor, amp.AMP).connectTCP('127.0.0.1', 1234).addCallback(serverProxy.connected).addErrback(error)
    reactor.run()

if __name__ == '__main__':
    main()