How can I emit socket.io messages from Redis rq-scheduler task?

How can I emit socket.io messages from Redis rq-scheduler task? - python

I'm using flask and socket.io to build an app that queries an external API every 10 seconds, and emits the response to the users.
I've set up rq-scheduler to run a function every 10 seconds to query an API. I'd like to emit a socket.io message from this job, but I can't figure out how to do that, or what best practices are.
I see in the flask-socket.io docs, emitting from an external process, but this example is very limited. Is there a clear example of how to actually emit an event from an external process? Does the code written in the docs go in the rq task, or in the flask server code?
Do I need to instantiate a socket.io server every time the task is run? How does the task know about socket.io otherwise?
I'd really appreciate if someone could point me in the right direction here. If there is a more robust example that would be awesome.

Related

Trouble with understanding how RabbitMQ can be used

i'm currently working on a Python web app that needs to implement RabbitMQ.
The app is structured like that :
The client connects to a HTTP server
His connexion is send to a message queue that is connected to the main service of my app
the main service receive the message and give the user his information
I understand how to make work RabbitMq using the documentation and tutorial on the website but I have trouble seeing how can it work with real tasks like displaying a web page or printing a file ? How does my service connected to the message queue will read the message received and say : "oh, i'm gonna display this webpage".
Sorry if this is confusing, if you need further explanations on what i'm trying to get, just tell me.
Thanks for reading!

RabbitMq can be good to send message to service which can execute long running process - ie download big file, generate complex animation. Web server can't (or shoudn't) execute long running process.
Web page sends message to RabbitMq (ie. with parameters for long running process) and get unique number. When service has free worker then it checks if there is new message in queue, get it (with unique number) and start worker. When worker finish job then service send result to RabbitMQ with the same uniqe number.
At the same time web page uses JavaScript to run loop which periodically check in RabbitMQ if there is result with this unique number. If there is no result then it may display progressbar, if there is result then it may display this result.
Example: Celery - Distributed Task Queue.
Celery can use RabbitMQ to communicate with Django or Flask.
(but it can use other modules ie. Redis)
Using Celery with Django.
Flask - Celery Background Tasks
From Celery repo
Celery is usually used with a message broker to send and receive messages.
The RabbitMQ, Redis transports are feature complete, but there's also
experimental support for a myriad of other solutions, including using
SQLite for local development.

Asynchronous Socket connections in Python or Node

I am creating an application that basically has multiple connections to a third party Chat Streaming API(Socket based).
The way it works is - Every user has an account on my app and another account on the third party app. He gives me an access token for the third party chat app and I connect to the third party API to stream his chats. This happens for hundreds of users.
I need to create a socket connection pool for every user and run parallel threads. I am using a python library(for that API) and am able to achieve real time feeds for single users. How do I implement an asynchronous socket connection pool in Python or NodeJS? I have a Linux micro instance on EC2 and I need to run this application for 1000 users.
I am exploring Redis+Tornado to implement this. Are there any better alternatives?

This will be messy and also a couple of things to consider.
If you are going to use multiple threads remember that you can only run so many per CPU as the OS permits, rather go multiprocessing.
If you are going async with long polling processes it will prevent other clients from processing requests.
Solution
When your application absolutely needs to be real-time I would suggest websockets for server-client interaction.
Then from your clients request start a single process that listens\polls on your streaming API using multiprocessing in python. So you will essentially create a separate process for each client.
And now, to make your WebSocketHandler and Background API Streamer interact with each other you can use the Observer Pattern (https://en.wikipedia.org/wiki/Observer_pattern) to notify the WebSocket that you have received data from the API.
Make sure that you assign a unique ID to every client and make sure that you only post the data to the intended client when using websockets.
EDIT:
Web:
Also on your question regarding Tornado. It is a good lightweight framework for running a couple of users maybe 1000. But anything more than that I would suggest looking at Django as it will allow you to be more productive in producing code and also there are lots of tools out there that the community have developed over time.
Database:
Red.is is a good choice if you need a very fast no-sql db, also have a look at mongodb. If you require a multi-region DB I would suggest going with Cassandra or CouchDB due to the partitioned nodes. The image below might help you better decide which DB to use.

Python Embed Web Server in Data Processing Node

I am working on a Python 2.7 project with a simple event loop that checks a variety of data sources (rabbitmq, mongodb, postgres, etc) for new data, processes the data and writes data to the next stage.
I would like to embed a web server in the application so it can receive simple REST commands, for shutting it down, diagnosis etc.
However, from reading the documentation on the available web servers it wasn’t clear if they will allow the event loop described above to function outside of the web server’s event loop. Ie. it looks like I would have to do something like launch the event loop using a REST call and have the loop live on an io thread, or similar.
Can someone explain which embedded server (cherrypy, bottle, flask, etc) / concurrency framework (tornado, gevent, twisted etc.) are best suited for this problem?
Thank you in advance!

I would recommend you use a separate process for your app that will receive REST commands (use Pyramid or Flask), and have it send messages over RabbitMQ to the real time part. I like Kombu myself for interfacing with RabbitMQ, and your message bus will nicely decouple your web/rest needs from your event driven needs. Your event driven part just gets messages off the bus, and doesn't need to know anything about REST.

Need help understanding Comet in Python (with Django)

After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it, it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea to to integrate it or what else I need (guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.

You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/

I feel your pain, having had to go through the same research over the past few months. I haven't had time to deal with proper documentation yet but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb - I was hoping to set up direct communication from the Django server-side to the tornadio server process using queues (i.e., logic in a django view pushes a message onto a queue that then gets handled by tornadio which pushes a json encoded version of that message out to all interested subscribers) but haven't implemented that part fully yet. The way I've currently gotten it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io javascript and establish direct connections to the tornadio server
I looked into orbited, hookbox, and gevent but decided to go with socket.io + tornado as it seemed to allow me the cleanest javascript + python code. I could be wrong about that though, having just started to learn Python/Django over the past year.

Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of a situation where you are polling the db looking for new messages, you can subscribe to a channel, and have messages pushed out to you.
I found a working example of the type of system you describe. The magic happens in the socketio view:
def socketio(request):
"""The socket.io view."""
io = request.environ['socketio']
redis_sub = redis_client().pubsub()
user = username(request.user)
# Subscribe to incoming pubsub messages from redis.
def subscriber(io):
redis_sub.subscribe(room_channel())
redis_client().publish(room_channel(), user + ' connected.')
while io.connected():
for message in redis_sub.listen():
if message['type'] == 'message':
io.send(message['data'])
greenlet = Greenlet.spawn(subscriber, io)
# Listen to incoming messages from client.
while io.connected():
message = io.recv()
if message:
redis_client().publish(room_channel(), user + ': ' + message[0])
# Disconnected. Publish disconnect message and kill subscriber greenlet.
redis_client().publish(room_channel(), user + ' disconnected')
greenlet.throw(Greenlet.GreenletExit)
return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library, it could be considered an alternative to Gevent in this application. It's powerful and difficult to debug in my experience.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message like scanning for profanity or something. Even still, something would have to initiate the Celery task, so there would need to be some code listening for the socket.io callback.
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
Hope that helps!

Python Threads (or their equivalent) on Google Application Engine Workaround?

I want to make a Google App Engine app that does the following:
Client makes an asynchronous http request
Server starts processing that request
Client makes ajax http requests to get progress
The problem is that the server processing (step #2) may take more than 30 seconds.
I know that you can't have threads on Google Application Engine and that all tasks must complete within 30 seconds or they get shut down. Is there some way to work around this?
Also, I'm using python-django as a backend.

You'll want to use the Task Queue API, probably via deferred tasks. The deferred API makes working with Task Queues dramatically simpler.
Essentially, you'll want to spawn a task to start the processing. That task should catch DeadlineExceeded exceptions and reschedule itself (again via the deferred API) to continue processing. This requires that your tasks be able to keep track of their own progress. They can also update their own status in memcache, which you can use to write a view that checks a task's status. That view can then be polled via Ajax.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.