Our website has around 50,000 users and daily active traffic is pretty good. We are designing a new notifications feature for our user base.
Our requirements are as follows:
Users are part of different Groups.
A user can be part of multiple Groups.
When a user uploads an image in a group, all the members of that particular group should get a notification saying "new image uploaded", regardless of whether they are online or offline.
We thought of creating a RabbitMQ exchange for each group and a queue for each user, but got confused about how to design the rest of it the right way.
Say a user should receive notifications even if he logs in to the site days after the notification was generated. We ended up storing the messages in the DB, which does not seem like a good approach at all for offline users.
Can someone suggest a proper design pattern, with an explanation, for this use case? We are using Celery + RabbitMQ + Tornado. Should Tornado talk directly to the Celery consumer? Where do the messages get stored when the user is offline?
I have a similar project. Here is how it works:
Put messages onto your Rabbit queue from your event sources (Django, Celery, anywhere).
Use pika with Tornado's IOLoop, so that you are notified via the pika loop when messages arrive, while HTTP requests and websocket connections come in via Tornado itself.
Keep a collection of the websockets open in your Tornado Application and use it to deliver messages to users (a minimal sketch follows).
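A minimal sketch of that loop, assuming pika 1.x; the queue name, message shape and the CLIENTS registry are illustrative choices of mine. A blocking pika consumer runs in a background thread and hands each message to Tornado's IOLoop, which fans it out to the open websockets kept per user:

import json
import threading

import pika
import tornado.ioloop
import tornado.web
import tornado.websocket

CLIENTS = {}  # user_id -> set of open websocket handlers

class NotificationSocket(tornado.websocket.WebSocketHandler):
    def open(self, user_id):
        self.user_id = user_id
        CLIENTS.setdefault(user_id, set()).add(self)

    def on_close(self):
        CLIENTS.get(self.user_id, set()).discard(self)

def deliver(user_id, text):
    # runs on the IOLoop thread: push to every open socket for this user
    for socket in CLIENTS.get(user_id, set()):
        socket.write_message(text)

def consume(ioloop):
    # blocking pika consumer, running in its own thread
    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="notifications", durable=True)

    def on_message(ch, method, properties, body):
        event = json.loads(body)
        # add_callback is the one thread-safe way back onto the IOLoop
        ioloop.add_callback(deliver, event["user_id"], event["text"])
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="notifications", on_message_callback=on_message)
    channel.start_consuming()

if __name__ == "__main__":
    app = tornado.web.Application([(r"/ws/(\w+)", NotificationSocket)])
    app.listen(8888)
    loop = tornado.ioloop.IOLoop.current()
    threading.Thread(target=consume, args=(loop,), daemon=True).start()
    loop.start()

For the offline case from the question, the consumer would also write each notification to the DB when CLIENTS holds no socket for that user, and the user would fetch unread rows at next login; storing undelivered notifications in a table is a normal pattern, not a design smell.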
You can check a closely related project doing logging via tornado+rabbitmq on my GitHub: https://github.com/rmuslimov/RapidLog
How would one implement a comprehensive chat system using websockets in FastAPI? Specifically, keeping the following in mind:
Multiple chat rooms, with a many-to-many relationship between users and rooms
Storing messages in a SQL or NoSQL database for persistence
Security: authentication and possibly encryption
I've looked at some libraries, but actually useful implementations are few and far between, regrettably.
Any advice or pointers towards places with more information would be a great help!
For chat rooms you could use FastAPI's built-in websocket support and add Redis pub/sub or PostgreSQL's pg_notify to it for sending messages to all participants in a room.
Storing messages in PostgreSQL is a solid choice because of its long history and stability.
Authentication can be handled by FastAPI's OAuth2 support. Authorization can be handled by OAuth2 scopes, which are covered in the Advanced Security section of the FastAPI documentation. Encryption is provided by HTTPS and the reverse proxy you put in front of your app.
There aren't any fully ready-made libraries that provide everything out of the box, but breaking the problem down into smaller pieces and then working on those will get you pretty far:
Write down what fields/data you want to store about your users, chat rooms and messages.
Implement those basic models in FastAPI, probably using SQLAlchemy.
Wire up those models to API endpoints so that you can use them in Swagger (list chat rooms, get and post messages in a chat room).
Implement a websocket endpoint in FastAPI that echoes back everything sent to it. That should let you wire up some client-side JavaScript for sending and receiving messages over the websocket.
Modify your existing message-storing endpoint to also publish the message to a Redis channel, and change your websocket endpoint to subscribe to that channel (see the sketch after this list).
Add authentication to your endpoints: at first basic user/password, later more advanced configurations.
Add a reverse proxy with HTTPS in front, and voila.
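A rough sketch of the websocket/Redis steps above, assuming redis-py's asyncio client; the "room:{name}" channel convention is my own, and auth and persistence are omitted:

import asyncio

import redis.asyncio as aioredis
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
redis_conn = aioredis.Redis()

@app.websocket("/ws/{room}")
async def chat(websocket: WebSocket, room: str):
    await websocket.accept()
    channel = f"room:{room}"
    pubsub = redis_conn.pubsub()
    await pubsub.subscribe(channel)

    async def relay():
        # forward everything published on the room channel to this client
        async for message in pubsub.listen():
            if message["type"] == "message":
                await websocket.send_text(message["data"].decode())

    relay_task = asyncio.create_task(relay())
    try:
        while True:
            text = await websocket.receive_text()
            # a real app would also persist the message to PostgreSQL here
            await redis_conn.publish(channel, text)
    except WebSocketDisconnect:
        pass
    finally:
        relay_task.cancel()
        await pubsub.unsubscribe(channel)

Because every server process subscribes to the same Redis channel, this also works when you run multiple FastAPI workers behind the reverse proxy.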
I am tasked with building a Slack slash command app in Python which will respond to incoming slash commands. However, for security reasons, I am not allowed to open the firewall for incoming webhooks from Slack. Is there instead a way to check a queue of sent slash commands?
For example, a user types "/myslashapp" in a specific channel. My app will need to do something like call an endpoint every 30 seconds and check if the "/myslashapp" command was sent. If it was, my app should trigger a Lambda function in AWS.
Based on reading the Slack API docs, I haven't found any way to do this other than perhaps the RTM API, though it seems like overkill and still requires an open socket.
No. The Slack API has no built-in support that allows you to pull requests after the fact from a queue instead of receiving them from Slack when they happen.
The RTM API might work for you, because the connection to Slack is initiated from your side; so, provided your firewall allows it, it would also work from within an intranet. However, you cannot do slash commands with the RTM API or use any of the other interesting interactive Slack features like buttons - only simple messages and events.
You could implement your own bridging solution and poll from it, but I don't think a polling solution would work, because it creates a lot of latency for your app. Users expect an immediate response to their slash command, not a delay of 30 seconds or more.
So in summary I think you only have two valid options:
Host your app internally and use a secure tunnel like ngrok to expose a public URL for your app.
Run your app on the Internet and give it a secure connection to your intranet for accessing internal data (similar to how a shopping website works: a public app on the Internet that can also transmit orders to the business applications on the company's intranet).
I would like to create a real-time web chat application using web.py in Python. The problem is that I don't know how to architect or design such an app.
The way I'm thinking to implement this app is the following:
a user logs into the app.
the app connects to a controller that has a push service to push new messages and a queue service to store the new messages.
when the user sends a message, the app sends the message with an ajax call to the controller and the controller stores the message in a queue.
then the controller sends the messages in the queue to the destination user by its push service.
However, this looks like a very poor design to me, since it involves a lot of Ajax requests. I really don't know whether there are better designs or architectures for such a service, so can you please point me toward the correct design for a real-time chat app?
Alex,
This is an understandable question; I thought about it recently when I was building my own messaging application. This is the way I broke down the app's functionality:
User registration
User authentication
Adding a new friend by username
Approving a friend
Messaging with a friend in list (Of course)
Shows online and offline users
Runs a background service in order to get messages even when the application is closed.
Uses notification area when a new message is received.
Quitting the application (kills the background service)
A few things I realized after building this application was:
The back-end architecture was a mixture of a basic CRUD application with pub/sub functionality. You can read up more on pub/sub systems here. Here's a simple chat application built using Ruby on Rails; you can look at it for reference, it's very well architected.
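If pub/sub is unfamiliar, here is a toy in-process version in Python (the class and channel names are made up); real brokers such as Redis or RabbitMQ do the same fan-out across processes and machines:

from collections import defaultdict
from typing import Callable

class PubSub:
    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel: str, callback: Callable[[str], None]):
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str):
        # fan the message out to everyone subscribed to the channel
        for callback in self._subscribers[channel]:
            callback(message)

broker = PubSub()
broker.subscribe("room-1", lambda msg: print("alice got:", msg))
broker.subscribe("room-1", lambda msg: print("bob got:", msg))
broker.publish("room-1", "hello everyone")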
You should think about the last steps in the functionality list above as much at the beginning of this app as at the end. If you architect it well in the beginning, the final steps will just fall into place! :-)
If you want to learn about concurrency and do something really cool, I suggest trying to implement some of the frameworks discussed here.
Please let me know if you have any questions!
I've been trying to make a decision about my student project before going further. The main idea is to get disk usage data, active Linux user data, and so on from multiple internal servers and publish them with Django.
Before I came to RabbitMQ I was thinking about developing a client application for each Linux server and getting this data through a socket. But I want to keep this student project simple. Also, I don't know how difficult it is to make a socket connection via Django.
So, I thought I could solve my problem with RabbitMQ without socket programming: basically, I send a message to a Rabbit queue, then get whatever I want from the consumer server.
On the Django side, the client will select one of the internal servers and click the "details" button. Then I want to show this information on a web page.
I have already read almost all the documentation about RabbitMQ, Celery and pika. Sending messages to all the internal servers (clients) and calculating the information I want to get is fine, but I can't figure out how to put this data on a web page with Django.
How would you figure out that problem if you were me?
Thank you.
I solved my problem on my own. The solution is a RabbitMQ RPC call: you can execute your Python code on a remote server and get the result of the process via RPC requests. Details can be found here:
http://www.rabbitmq.com/tutorials/tutorial-six-python.html
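For reference, a condensed version of the RPC client pattern from that tutorial; the "rpc_queue" name follows the tutorial, while the request/response format is whatever your remote worker implements:

import uuid

import pika

class RpcClient:
    def __init__(self):
        self.connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        self.channel = self.connection.channel()
        # exclusive, server-named queue for replies to this client
        result = self.channel.queue_declare(queue="", exclusive=True)
        self.callback_queue = result.method.queue
        self.channel.basic_consume(
            queue=self.callback_queue,
            on_message_callback=self.on_response,
            auto_ack=True,
        )
        self.response = None
        self.corr_id = None

    def on_response(self, channel, method, properties, body):
        if properties.correlation_id == self.corr_id:
            self.response = body

    def call(self, server_name):
        self.corr_id = str(uuid.uuid4())
        self.response = None
        self.channel.basic_publish(
            exchange="",
            routing_key="rpc_queue",
            properties=pika.BasicProperties(
                reply_to=self.callback_queue,
                correlation_id=self.corr_id,
            ),
            body=server_name,
        )
        # block until the matching reply arrives
        while self.response is None:
            self.connection.process_data_events(time_limit=1)
        return self.response

details = RpcClient().call("webserver-01")

The remote server runs a matching consumer on rpc_queue that gathers the stats and publishes the reply to properties.reply_to with the same correlation_id.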
Thank you guys.
It looks like you have already done the hard work (Celery, Rabbit, etc.) but are missing the Django basics. Go through the polls tutorial and Getting Started with Django, or the many other resources on the web, and it will be quite simple. Basically:
create the models (objects represented in the DB)
declare the URLs
set up views to pass the data from the models to the webpage templates
create the templates (or do it with a client-side framework and return a JSON response)
EDIT (after you clarified the question): Actually, I just hit the same problem too. The answer is to run another Python process in parallel with the Django process (in the same virtualenv). In this process you can set up a Rabbit consumer (using pika, puka, kombu or whatever) and call specific Django functions/methods to do something with the information from RabbitMQ. You can also just call Celery tasks from there, to be executed in the Django app context.
A Procfile, for example (just illustrating; you can run both processes in many other ways):
web: python manage.py runserver
worker: python listen_from_servers.py
Note that you'll have to set the DJANGO_SETTINGS_MODULE environment variable to point at your settings file for the Django imports to work.
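One possible shape for listen_from_servers.py, assuming pika; the monitoring app and ServerReport model are illustrative names of mine:

import json
import os

import django

# Django must be bootstrapped before any models are imported
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django.setup()

import pika
from monitoring.models import ServerReport  # hypothetical app/model

def on_message(channel, method, properties, body):
    data = json.loads(body)
    # write the report into the same DB the Django app reads from
    ServerReport.objects.create(
        hostname=data["hostname"],
        disk_usage=data["disk_usage"],
        active_users=data["active_users"],
    )
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="server_reports", durable=True)
channel.basic_consume(queue="server_reports", on_message_callback=on_message)
channel.start_consuming()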
You need the following two programs running at all times:
The producer, which will populate the queue. This is the program that will collect the various messages and then post them on the queue.
The consumer, which will process messages from the queue. This consumer's job is to read each message and do something with it, so that it is processed and removed from the queue. What the consumer does is entirely up to you, but in this scenario what you want is to write information from the message to a database model - the same database that is part of your Django app.
As the producer pushes messages and the consumer removes them from the queue, your database will get updated.
On the Django side, the process is simply to filter this database and display records for a particular machine. Django does not need to be aware of how the records got into the database - all Django is doing is fetching, filtering, passing data to the templates and rendering the views.
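The read side can be as small as this sketch (the model, field and template names are illustrative):

from django.shortcuts import render

from monitoring.models import ServerReport  # hypothetical model

def server_detail(request, hostname):
    # fetch and filter; the consumer process keeps this table up to date
    reports = ServerReport.objects.filter(hostname=hostname).order_by("-created")
    return render(request, "monitoring/detail.html", {"reports": reports})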
The question is how best (or rather, most easily) to populate the database. You can do it the traditional way, using Python's well-documented DB-API, and write your own SQL statements; but since Celery is so well integrated with Django, you can use Django's ORM to do this work for you as well.
I hope this gets you going in the right direction.
After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it; it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea how to integrate it or what else I need (I'm guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.
You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/
I feel your pain, having had to go through the same research over the past few months. I haven't had time to write proper documentation yet, but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb. I was hoping to set up direct communication from the Django server side to the tornadio server process using queues (i.e., logic in a Django view pushes a message onto a queue, which then gets handled by tornadio, which pushes a JSON-encoded version of that message out to all interested subscribers), but I haven't implemented that part fully yet. The way I've currently got it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io javascript and establish direct connections to the tornadio server
I looked into Orbited, Hookbox, and gevent but decided to go with socket.io + tornado, as it seemed to give me the cleanest JavaScript + Python code. I could be wrong about that, though, having just started to learn Python/Django over the past year.
Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of polling the DB looking for new messages, you can subscribe to a channel and have messages pushed out to you.
I found a working example of the type of system you describe. The magic happens in the socketio view:
from django.http import HttpResponse
from gevent import Greenlet

# redis_client(), room_channel() and username() are project helpers that
# return a Redis connection, the pub/sub channel name for the chat room,
# and a display name for the user.

def socketio(request):
    """The socket.io view."""
    io = request.environ['socketio']
    redis_sub = redis_client().pubsub()
    user = username(request.user)

    # Subscribe to incoming pubsub messages from redis.
    def subscriber(io):
        redis_sub.subscribe(room_channel())
        redis_client().publish(room_channel(), user + ' connected.')
        while io.connected():
            for message in redis_sub.listen():
                if message['type'] == 'message':
                    io.send(message['data'])
    greenlet = Greenlet.spawn(subscriber, io)

    # Listen to incoming messages from client.
    while io.connected():
        message = io.recv()
        if message:
            redis_client().publish(room_channel(), user + ': ' + message[0])

    # Disconnected. Publish disconnect message and kill subscriber greenlet.
    redis_client().publish(room_channel(), user + ' disconnected')
    greenlet.throw(Greenlet.GreenletExit)

    return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library; it could be considered an alternative to gevent in this application. It's powerful, but difficult to debug, in my experience.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message, like scanning for profanity. Even then, something would have to initiate the Celery task, so there would still need to be some code listening for the socket.io callback.
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
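For illustration, that profanity-scan idea might look like the following, assuming a Redis broker; the task name and wordlist are made up:

from celery import Celery

celery_app = Celery("chat", broker="redis://localhost:6379/0")

@celery_app.task
def scan_message(room, text):
    # costly per-message work happens out-of-band on a worker machine
    cleaned = text.replace("badword", "*" * 7)
    # ...store or re-publish the cleaned message here...
    return cleaned

The socket.io handler would then fire it with scan_message.delay(room, message) and return immediately, keeping the websocket path fast.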
Hope that helps!