Need help understanding Comet in Python (with Django) - python

After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it, it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea to to integrate it or what else I need (guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.

You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/

I feel your pain, having had to go through the same research over the past few months. I haven't had time to deal with proper documentation yet but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb - I was hoping to set up direct communication from the Django server-side to the tornadio server process using queues (i.e., logic in a django view pushes a message onto a queue that then gets handled by tornadio which pushes a json encoded version of that message out to all interested subscribers) but haven't implemented that part fully yet. The way I've currently gotten it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io javascript and establish direct connections to the tornadio server
I looked into orbited, hookbox, and gevent but decided to go with socket.io + tornado as it seemed to allow me the cleanest javascript + python code. I could be wrong about that though, having just started to learn Python/Django over the past year.

Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of a situation where you are polling the db looking for new messages, you can subscribe to a channel, and have messages pushed out to you.
I found a working example of the type of system you describe. The magic happens in the socketio view:
def socketio(request):
"""The socket.io view."""
io = request.environ['socketio']
redis_sub = redis_client().pubsub()
user = username(request.user)
# Subscribe to incoming pubsub messages from redis.
def subscriber(io):
redis_sub.subscribe(room_channel())
redis_client().publish(room_channel(), user + ' connected.')
while io.connected():
for message in redis_sub.listen():
if message['type'] == 'message':
io.send(message['data'])
greenlet = Greenlet.spawn(subscriber, io)
# Listen to incoming messages from client.
while io.connected():
message = io.recv()
if message:
redis_client().publish(room_channel(), user + ': ' + message[0])
# Disconnected. Publish disconnect message and kill subscriber greenlet.
redis_client().publish(room_channel(), user + ' disconnected')
greenlet.throw(Greenlet.GreenletExit)
return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library, it could be considered an alternative to Gevent in this application. It's powerful and difficult to debug in my experience.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message like scanning for profanity or something. Even still, something would have to initiate the Celery task, so there would need to be some code listening for the socket.io callback.
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
Hope that helps!

Related

Usage of RabbitMQ queues with Django

I'm trying to add some real-time features to my Django applications, for that i'm using RabbitMQ and Celery on my django project, so what i would like to do is this: i have an external Python script which sends data to RabbitMQ > from RabbitMQ it should be retrieved from the Django app.
I'm sending some muppet data, like this:
connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()
channel.queue_declare(queue='Test')
channel.basic_publish(exchange='',
routing_key='Test',
body='Hello world!')
print(" [x] Sent 'Hello World!'")
connection.close()
What i would like to do is: as soon as i send Hello World!, my Django app should receive the string, so that i can perform some operations with it, such as saving it on my database, passing it to an HTML template or simply printing it to my console.
My actual problem is that i still have no idea how to do this. I added Celery to my Django project but i don't know how to connect to RabbitMQ and receive the message. Would i have to do it with Django Channels? Is there some tutorial on this? I found various material about using RabbitMQ and Celery with Django but nothing on this particular matter.
This is not directly connected to Celery.
You could solve it like this:
create a manage command in Django that does whatever needs to be done with the incoming message/data (https://docs.djangoproject.com/en/3.0/howto/custom-management-commands/)
create a consumer which could also be a Django command that is then started in a separate process, it is not part of the regular Django process (though it can be part of your Django code). In this consumer, you listen to the queue and whenever data comes in, the manage command from (1.) is called.
Of course, 1. and 2. could also be done in one single command. I've separated it to illustrate better the different aspects. And you might have different tasks and reuse one consumer. Also, if you have already (1.) you can reuse it like this, and you can test it easily without the overhead of the consumer.
More on RabbitMQ python consumers: https://github.com/celery/celery/blob/master/celery/worker/consumer/consumer.py
Here is the Celery consumer:
https://github.com/celery/celery/blob/master/celery/worker/consumer/consumer.py
because Celery of course has it's own consumer. It looks rather generic though, a simple consumer should be less complex.
(I only ever have written NSQ python consumers as part of Django, so no direct experience with RabbitMQ consumers (only as backend for Celery).)
EDIT: What you should ask yourself is - do I want the realtime data saved and stored in my Django app, first of all?
If yes - then RabbitMQ+Consumer is a very valid approach.
If no, if this is just for the user - you could also think about directly exposing it via an API to your frontend (and there use ajax calls to fetch it).
If no but you want to buffer the data to avoid hitting the other app that generates the data - then a queue is a very nice tool. In this case though, you might change the consumer not to save the data but to expose it to your frontend. If you only have to support new browsers you could use websockets which are supported now with Django 3:
https://blog.heroku.com/in_deep_with_django_channels_the_future_of_real_time_apps_in_django

Asynchronous Socket connections in Python or Node

I am creating an application that basically has multiple connections to a third party Chat Streaming API(Socket based).
The way it works is - Every user has an account on my app and another account on the third party app. He gives me an access token for the third party chat app and I connect to the third party API to stream his chats. This happens for hundreds of users.
I need to create a socket connection pool for every user and run parallel threads. I am using a python library(for that API) and am able to achieve real time feeds for single users. How do I implement an asynchronous socket connection pool in Python or NodeJS? I have a Linux micro instance on EC2 and I need to run this application for 1000 users.
I am exploring Redis+Tornado to implement this. Are there any better alternatives?
This will be messy and also a couple of things to consider.
If you are going to use multiple threads remember that you can only run so many per CPU as the OS permits, rather go multiprocessing.
If you are going async with long polling processes it will prevent other clients from processing requests.
Solution
When your application absolutely needs to be real-time I would suggest websockets for server-client interaction.
Then from your clients request start a single process that listens\polls on your streaming API using multiprocessing in python. So you will essentially create a separate process for each client.
And now, to make your WebSocketHandler and Background API Streamer interact with each other you can use the Observer Pattern (https://en.wikipedia.org/wiki/Observer_pattern) to notify the WebSocket that you have received data from the API.
Make sure that you assign a unique ID to every client and make sure that you only post the data to the intended client when using websockets.
EDIT:
Web:
Also on your question regarding Tornado. It is a good lightweight framework for running a couple of users maybe 1000. But anything more than that I would suggest looking at Django as it will allow you to be more productive in producing code and also there are lots of tools out there that the community have developed over time.
Database:
Red.is is a good choice if you need a very fast no-sql db, also have a look at mongodb. If you require a multi-region DB I would suggest going with Cassandra or CouchDB due to the partitioned nodes. The image below might help you better decide which DB to use.

add asynchronous capabilities to django

I have a mature, production django-tastypie server with an angular client service where I need to update the client with real-time data. I was thinking about websockets for the client.
my question is which strategy is best for the server side:
use some django plugin that handles asynchronism
raise a new tornado server for the sake of handling the async part (and then make it learn my django users/authentication model)
embed tornado inside django (like this)
what is your recommendation? or maybe something else I didn't think of?
I would suggest using this:
https://github.com/jrief/django-websocket-redis
I have used it in production and it works very well; the main benefit is that it is non-blocking and integrates with Django's auth system (you can message specific users, groups or all users via the websocket).
#Randi's suggestion to use Celery if you need to run async tasks is a very good one: Celery is excellent. Combine that with django-websocket-redis and you can perform long-running async tasks and give your users real-time updates as the task runs.
An alternative, but probably less appropriate solution would be to poll the server for changes/notifications from your angular client every 2-5 seconds. However, this is the "old school" approach and will potentially add A LOT of load to your server. This approach would save the overhead of managing extra services and would probably be better choice if you have a small number of users.

Using RabbitMQ with Django to get information from internal servers

I've been trying to make a decision about my student project before going further. The main idea is get disk usage data, active linux user data, and so on from multiple internal server and publish them with Django.
Before I came to RabbitMQ I was thinking about developing a client application for each linux server and geting this data through a socket. But I want to make that student project simple. Also, I don't know how difficult it is to make a socket connection via Django.
So, I thought I could solve my problem with RabbitMQ without socket programming. Basically, I send a message to rabbit queue. Then get whatever I want from the consumer server.
On the Django side, the client will select one of the internal servers and click the "details" button. Then I want to show this information on web page.
I already read almost all documentation about rabbitmq, celery and pika. Sending messages to all internal servers(clients) and the calculation of information that I want to get is OKAY but I can't figure out how I can put this data on a webpage with Django?
How would you figure out that problem if you were me?
Thank you.
I solved my problem own my own. Solution is RabbitMQ RPC call. You can execute your python code on remote server and get result of process via RPC requests. Details can ben found here.
http://www.rabbitmq.com/tutorials/tutorial-six-python.html
Thank you guys.
Looks like you already done the hard work(celery, rabbit, etc) but missing Django basics. Go through the polls tutorial and getting started with django or the many other resources on the web, and It would be quite simple. Basically:
create the models (objects represented in db)
declare urls
setup views to pass the data from the model to the webpage template
create the templates (or do it with client side framework and create a JSON response)
EDIT: (after you clarified the question) Actually I just hit the same problem too. The answer is running another python process parallel to the Django process (in the same virtualenv) in this process you can set up a rabbit consumer (using pica, puka, kombu or whatever) and calling specific Django functions/methods to do something with the information from rabbitmq. you can also just call celery tasks from there to be executed in the Django app context.
a procfile for example (just illustrating, you can run both process in many other ways):
web: python manage.py runserver
worker: python listen_from_servers.py
Notice that you'll have to set the DJANGO_SETTIGNS_MODULE for the settings file enviroment variable for django imports to work.
You need the following two programs running at all times:
The producer, which will populate the queue. This is the program that will collect the various messages and then post them on the queue.
The consumer, which will process messages from the queue. This consumer's job is to read the message and do something with it; so that it is processed and removed from the queue. The function that this consumer does is entirely up to you, but what you want to do in this scenario is write information from the message to a database model; the same database that is part of your django app.
As the producer pushes messages and the consumer removes them from the queue, your database will get updated.
On the django side, the process is simply to filter this database and display records for a particular machine. As such, django does not need to be aware of how the records are being populated in the database - all django is doing is fetching, filtering, sending to the template and rendering the views.
The question comes how best (well actually, easily) populate the databases. You can do it the traditional way, by using Python's well documentation DB-API and write your own SQL statements; but since celery is so well integrated with django - you can use the django's ORM to do this work for you as well.
I hope this gets you going in the right direction.

Realtime options (Websockets, flash, polling) for Django?

What are some realtime "push" options for django that can install as a python package? I want to avoid having to do things like installing independent web-servers for realtime.
Essentially I am looking for something like pusher.com (cloud system) or this socket.io build for django (which has a build status:failing) for chat and other various push operations.
Ape was suggested here, but it seems it requires you to setup Ape as a server. If its not too much to ask for, are there any solutions that build right into django?
Since the time the answer was written (2012); a lot has changed.
The preferred method now to do realtime updates of the system is using websockets; which is being formalized and proposed as a standard RFC 6455. This page on MDN has a great overview of the technology.
The other emerging technology is Server Sent Events which is a W3C draft proposal.
Projects such as swampdragon and django-socketio make integrating realtime functionality easier in your project.
There are two main components to any realtime system:
A connection that remains open from the browser to a server.
A server that listens on this connection and then responds.
A system / standard to store and be notified of messages.
Okay so maybe three components.
Since django doesn't work in realtime any solution that offers realtime push/updates will require another server/service to accept messages and then notify listeners of pending messages.
Django would be the application that pushes messages (writes them) to this server on a channel (a queue/bucket). Listeners then subscribe to a channel to be notified of messages. Since the connection remains open; messages are retrieved in "realtime".
Django really has a minimal role in all of this. There are various implementations that provide the three components necessary for realtime notifications to work.
I really like juggernaut because it is super simple to set up, and uses node.js that doesn't require a lot in terms of server-side components. The other reason I prefer it is because it supports Adobe Flash Socket in addion to WebSocket (and others, see the link).
The api to access it also very simple - in fact, if you are already using redis (which you really should since its so easy to use), you don't need another API as you can drop messages to redis and juggernaut will read them, or you can use its Python API. A simple example from this flask snippet:
Send (write) a message to a channel:
>>> from juggernaut import Juggernaut
>>> jug = Juggernaut()
>>> jug.publish('channel', 'The message')
Listen to it:
<script type=text/javascript
src=http://localhost:8080/application.js></script>
<script type=text/javascript>
var jug = new Juggernaut();
jug.subscribe('channel', function(data) {
alert('Got message: ' + data);
});
</script>
Django is built to serve web pages and there is nothing out of the box to support websockets in django. The quickest/easiest option is pusher.com (I use it an really like it). You can start with something like pusher.com and if you write a quick wrapper around it, you can replace it with your own server using socket.io or any other web socket server by just changing the wrapper / interface to connect to the new server. Just make sure you write it with being able to switch out the backend at any time.
If you really want to start run your own socket server there are projects out there that will make it easy to use sockets in django:
django-websocket
django-socketio
You can actually serve up Django from Tornadio2, a working implementation of socketio in Tornado. If you want to build any degree of sophistication into your realtime app you will likely need a redis pubsub backend that maps sessions to channels and handles multicasting. For this you might like to take a look at Brukva. Also read up Yuval Adam's blog post on this subject. Finally, Tony Abou Assaleh's sample package and post will provide a useful base reference when setting up tornadio2 for django.

Categories