The architecture of a real-time web chat app

The architecture of a real-time web chat app - python

I would to create a real-time web chat application using web.py in python. The problem is that I don't know how to 'architect' or design the such an app.
The way I'm thinking to implement this app is the following:
a user logs into the app.
the app connects to a controller that has a push service to push new messages and a queue service to store the new messages.
when the user sends a message, the app sends the message with an ajax call to the controller and the controller stores the message in a queue.
then the controller sends the messages in the queue to the destination user by its push service.
However I see this is a very poor design since I see a lot of ajax requests being sent here. I really don't know if there are better designs or architectures for such a service. So can you please point me toward the correct design for a real-time chat app?

Alex,
This is an understandable question, I recently thought about it when I was building my own messaging application. This is the way I broke down the app's functionality:
User registration
User authentication
Adding a new friend by username
Approving a friend
Messaging with a friend in list (Of course)
Shows online and offline users
Runs a background service in order to get messages even when the application is closed.
Uses notification area when a new message is received.
Quiting the application(kills the background service)
A few things I realized after building this application was:
The back-end architecture was a simple mixture of a simple CRUD application with pub/sub functionality. You can read up more on pub/sub systems here. Here's a simple chat application built using Ruby on Rails. You can look at it for reference, it's very well architected.
You should think about the last steps listed in the above functionality as much in the beginning of this app as you do in the end. If you architect it well in the beginning, the final steps will just fall into place! :-)
If you want to learn about concurrency and do something really cool, I suggest trying to implement some of the frameworks discussed here.
Please let me know if you have any questions!

Related

how to design rest api which handle offline data

I have multiple api which we have provided to android developers.
Like :
1) Creating Business card API
2) Creating Contacts API
So these api working fine when app is online. So our requirement is to handle to create business card and contacts when app is offline.
We are following steps but not sure:-
1) Android developer store the business card when app offline and send this data to server using separate offline business card api when app comes online.
2) Same we do for creating contacts offline using offline contact api.
My problem is I want do in one api call to send all data to server and do operation.
Is this approach will right?? Also please suggest what is the best approach to handle offline data. Also how to handle syncing data when app would come online??
Please let me know if I could provide more information.

I'm confused as to how you're approaching this. My understanding is that when the app is offline you want to "queue up" any API requests that are sent.
Your process seems fine however without knowing the terms around the app being "offline" it's hard to understand if this best.
Assuming you're meaning the server(s) holding the application are offline you're correct you want a process in the android app that will store the request until the application becomes online. However, this can be dangerous for end users. They should be receiving a message on the application being offline and to "try again later" as it were. The fear being they submit a request for x new contacts to be queued and then re-submit not realizing the application was offline.
I would suggest you have the android app built to either notify the user of the app being down or provide some very visible notification that requests are queued locally on their phone until the application becomes available and for them to view/modify/delete said locally cached requests until the application becomes available. When the API becomes available a notification can be set for users to release their queue on their device.

Can push notifications be done with an AngularJS+Flask stack?

I have a Python/Flask backend and an Angular frontend for my website. At the backend there is a process that occasionally checks SQS for messages and I want it to push a notification to the client which can then in turn update an Angular controller. What is the best way to do this my existing technologies?

To be able to push to the client, you'll have to implement web socket support in some fashion. If you want to keep it in python/flask, there is this tutorial on how to do that with gevent:
http://www.socketubs.org/2012/10/28/Websocket_with_flask_and_gevent.html
In that article, Geoffrey also mentions a SocketIO compatible library for python/gevent that may allow you to leverage the SocketIO client-side JS library, called "gevent-socketio".
That may reduce how much work you have to do in terms of cross-browser compatibility since SocketIO has done a lot of that already.
Here is a pretty good tutorial on how to use SocketIO in AngularJS so that you can notify the AngularJS model when an event comes in from SocketIO:
http://www.html5rocks.com/en/tutorials/frameworks/angular-websockets/
If you don't want to host the web socket backend, you could look to a hosted service like PubNub or Pusher and then integrate them into AngularJS as a service. You can communicate with these services through your Python app (when the SQS notification happens) and they'll notify all connected clients for you.

I know this is a bit late, but I have done pretty much exactly what you ask for (though without Angular).
I ended up having a separate process running a websocket server called Autobahn, which listens to a redis pub/sub socket, the code is here on github
This allows you to send push notifications to your clients from pretty much anything that can access redis.
So when I want to publish a message to all my connected clients I just use redis like this:
r = redis.Redis()
r.publish('broadcasts', "SOME MESSAGE")
This has worked fairly good so far. What I can't currently do is send a push notification to a specific client. But if you have a authentication system or something to identify a specific user you could tie that to the open websockets and then be able to send messages directly to a specific client :-)
You could of course use any websocket server or client (like socket.io or sock.js), but this has worked great for me :-)

rabbitmq python - notifications design pattern to scale for thousands of users

Our website has around 50,000 users and daily active traffic is pretty good. We are designing a new notifications feature for our user base.
Our requirement is as follows:
Users are part of different Groups.
A user can be part of multiple Groups.
When a user uploads a image in a group, all the members of that particular group should get a notification saying "new image uploaded" regardless of being online or offline.
We thought of creating rabbitmq exchanges for each group and queue for each user. But got confused going forward of designing the right way!!
Say, a user should receive notifications even he logs-in to site days after the notifications is generated. We ended up storing the messages in DB which is not a good thing at all for offline users.
Can someone suggest proper design pattern with explanation for this use case? We are using celery + rabbitmq + tornado. Should tornado talk directly to the celery consumer? Where do the messages get stored when the user is offline?

I have similar project. So how it works:
Put messages to your rabbit queue, from your events source ( django, celery, everywhere)
You can use pika+tornado IOLoop ( so when messages come to tornado you will receive notification via pika loop, when request comes http, or websocket connection via Tornado)
Use collection of opened in TornaodApplication websockets for messaging for users
You can check very close project with logging via torando+rabbitmq on my github: https://github.com/rmuslimov/RapidLog

Realtime options (Websockets, flash, polling) for Django?

What are some realtime "push" options for django that can install as a python package? I want to avoid having to do things like installing independent web-servers for realtime.
Essentially I am looking for something like pusher.com (cloud system) or this socket.io build for django (which has a build status:failing) for chat and other various push operations.
Ape was suggested here, but it seems it requires you to setup Ape as a server. If its not too much to ask for, are there any solutions that build right into django?

Since the time the answer was written (2012); a lot has changed.
The preferred method now to do realtime updates of the system is using websockets; which is being formalized and proposed as a standard RFC 6455. This page on MDN has a great overview of the technology.
The other emerging technology is Server Sent Events which is a W3C draft proposal.
Projects such as swampdragon and django-socketio make integrating realtime functionality easier in your project.
There are two main components to any realtime system:
A connection that remains open from the browser to a server.
A server that listens on this connection and then responds.
A system / standard to store and be notified of messages.
Okay so maybe three components.
Since django doesn't work in realtime any solution that offers realtime push/updates will require another server/service to accept messages and then notify listeners of pending messages.
Django would be the application that pushes messages (writes them) to this server on a channel (a queue/bucket). Listeners then subscribe to a channel to be notified of messages. Since the connection remains open; messages are retrieved in "realtime".
Django really has a minimal role in all of this. There are various implementations that provide the three components necessary for realtime notifications to work.
I really like juggernaut because it is super simple to set up, and uses node.js that doesn't require a lot in terms of server-side components. The other reason I prefer it is because it supports Adobe Flash Socket in addion to WebSocket (and others, see the link).
The api to access it also very simple - in fact, if you are already using redis (which you really should since its so easy to use), you don't need another API as you can drop messages to redis and juggernaut will read them, or you can use its Python API. A simple example from this flask snippet:
Send (write) a message to a channel:
>>> from juggernaut import Juggernaut
>>> jug = Juggernaut()
>>> jug.publish('channel', 'The message')
Listen to it:
<script type=text/javascript
src=http://localhost:8080/application.js></script>
<script type=text/javascript>
var jug = new Juggernaut();
jug.subscribe('channel', function(data) {
alert('Got message: ' + data);
});
</script>

Django is built to serve web pages and there is nothing out of the box to support websockets in django. The quickest/easiest option is pusher.com (I use it an really like it). You can start with something like pusher.com and if you write a quick wrapper around it, you can replace it with your own server using socket.io or any other web socket server by just changing the wrapper / interface to connect to the new server. Just make sure you write it with being able to switch out the backend at any time.
If you really want to start run your own socket server there are projects out there that will make it easy to use sockets in django:
django-websocket
django-socketio

You can actually serve up Django from Tornadio2, a working implementation of socketio in Tornado. If you want to build any degree of sophistication into your realtime app you will likely need a redis pubsub backend that maps sessions to channels and handles multicasting. For this you might like to take a look at Brukva. Also read up Yuval Adam's blog post on this subject. Finally, Tony Abou Assaleh's sample package and post will provide a useful base reference when setting up tornadio2 for django.

Need help understanding Comet in Python (with Django)

After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it, it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea to to integrate it or what else I need (guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.

You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/

I feel your pain, having had to go through the same research over the past few months. I haven't had time to deal with proper documentation yet but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb - I was hoping to set up direct communication from the Django server-side to the tornadio server process using queues (i.e., logic in a django view pushes a message onto a queue that then gets handled by tornadio which pushes a json encoded version of that message out to all interested subscribers) but haven't implemented that part fully yet. The way I've currently gotten it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io javascript and establish direct connections to the tornadio server
I looked into orbited, hookbox, and gevent but decided to go with socket.io + tornado as it seemed to allow me the cleanest javascript + python code. I could be wrong about that though, having just started to learn Python/Django over the past year.

Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of a situation where you are polling the db looking for new messages, you can subscribe to a channel, and have messages pushed out to you.
I found a working example of the type of system you describe. The magic happens in the socketio view:
def socketio(request):
"""The socket.io view."""
io = request.environ['socketio']
redis_sub = redis_client().pubsub()
user = username(request.user)
# Subscribe to incoming pubsub messages from redis.
def subscriber(io):
redis_sub.subscribe(room_channel())
redis_client().publish(room_channel(), user + ' connected.')
while io.connected():
for message in redis_sub.listen():
if message['type'] == 'message':
io.send(message['data'])
greenlet = Greenlet.spawn(subscriber, io)
# Listen to incoming messages from client.
while io.connected():
message = io.recv()
if message:
redis_client().publish(room_channel(), user + ': ' + message[0])
# Disconnected. Publish disconnect message and kill subscriber greenlet.
redis_client().publish(room_channel(), user + ' disconnected')
greenlet.throw(Greenlet.GreenletExit)
return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library, it could be considered an alternative to Gevent in this application. It's powerful and difficult to debug in my experience.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message like scanning for profanity or something. Even still, something would have to initiate the Celery task, so there would need to be some code listening for the socket.io callback.
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
Hope that helps!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.