I have a mature, production django-tastypie server with an Angular client, and I need to push real-time data to the client. I was thinking about websockets on the client side.
My question is which strategy is best for the server side:
use some Django plugin that handles the asynchronous part
stand up a separate Tornado server just to handle the async part (and then teach it my Django users/authentication model)
embed Tornado inside Django (like this)
What is your recommendation? Or maybe something else I didn't think of?
I would suggest using this:
https://github.com/jrief/django-websocket-redis
I have used it in production and it works very well; the main benefit is that it is non-blocking and integrates with Django's auth system (you can message specific users, groups or all users via the websocket).
@Randi's suggestion to use Celery if you need to run async tasks is a very good one: Celery is excellent. Combine it with django-websocket-redis and you can perform long-running async tasks and give your users real-time updates as the task runs.
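For example, a Celery task could push a progress message to a specific user's websocket roughly like this (a minimal sketch assuming ws4redis is already configured; the long_running_job task and the 'updates' facility are made-up names for illustration):

# tasks.py -- hypothetical task that reports back over django-websocket-redis
from celery import shared_task
from ws4redis.publisher import RedisPublisher
from ws4redis.redis_store import RedisMessage

@shared_task
def long_running_job(username):
    # ... do the heavy work here ...
    publisher = RedisPublisher(facility='updates', users=[username])
    publisher.publish_message(RedisMessage('job finished'))  # delivered to that user's open websocket

The Angular client would then open a websocket subscribed to the same facility.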
An alternative, but probably less appropriate, solution would be to poll the server for changes/notifications from your Angular client every 2-5 seconds. This is the "old school" approach and can add a lot of load to your server, but it saves the overhead of managing extra services and would probably be the better choice if you have a small number of users.
Related
We are working on an internet and intranet platform that serves client requests through web applications.
There are heavyweight computations on database entries and files. We want to push the state of those computations to the client as notifications, and make changes to files without the risk of race conditions. The architecture is supposed to run both in small single-server environments and in large cluster environments.
So far, we are running a Django web server with PostgreSQL, the Python library Channels, and RabbitMQ as the message broker.
Once an HTTP request from a client arrives in Django, we trigger the task via task.delay() and immediately return the task_id to the client. The client then opens a websocket to another Django route and hands over the task_ids it is interested in. Django then polls the state of the task via AsyncResult(task_id).state. Once the state changes, we read the results via AsyncResult(task_id).get() and push the task results to the client.
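In outline, that flow looks roughly like the sketch below (heavy_task, the URL wiring, and the consumer routing are hypothetical names, and the blocking result-backend calls inside the async consumer are glossed over):

# views.py -- enqueue the computation and hand the id back immediately
from celery.result import AsyncResult
from django.http import JsonResponse

def start_computation(request):
    result = heavy_task.delay(request.GET["entry_id"])   # heavy_task is a placeholder Celery task
    return JsonResponse({"task_id": result.id})

# consumers.py -- Channels consumer the client connects to with its task_ids
import asyncio
from channels.generic.websocket import AsyncJsonWebsocketConsumer

class TaskStateConsumer(AsyncJsonWebsocketConsumer):
    async def receive_json(self, content):
        result = AsyncResult(content["task_id"])
        while result.state not in ("SUCCESS", "FAILURE"):   # poll until the task settles
            await asyncio.sleep(1)
        await self.send_json({"task_id": content["task_id"],
                              "state": result.state,
                              "result": result.get(propagate=False)})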
Here is a similar sequence diagram from another project I found online.
Source (18.09.21)
Something not shown in the diagram: the Channels workers have to fetch the file they are working on from Django. Part of the result is not for the client but is used to update the file. Django locks and updates the file locally as soon as the client asks for it and Django receives the task results from Celery (the changes only add attributes and will not conflict with each other).
My thoughts about this architecture are:
Monitoring of the Celery events is poor so far.
It is only triggered by the client, which has to know about the tasks to begin with.
Django is not well suited for monitoring,
and polling is not efficient in general.
The file management seems fishy.
I would prefer proper monitoring, where events are pushed to Django and the client. The client has to be able to consume the events at any later time.
I have some thoughts about solutions, but I would like to hear your opinion first. Later I can bring them in the discussion too.
Greetings
Python
Edit 1
From other sources I got helpful information regarding a good strategy.
Instead of Django "monitoring" the Celery tasks, we can use a dedicated websocket service, e.g. built with FastAPI, that monitors task events and propagates them to the clients via websocket.
The client doesn't have to know about its running tasks per se. Instead, we can track ownership of tasks, and the client only has to authenticate itself. The whole security layer will be implemented anyway, and it is supported by Celery.
For the file management, we should use a dedicated object store like MinIO. This service can become a subscriber to task events related to files.
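The monitoring half of that idea can be built on Celery's real-time event API. A rough sketch (broadcast_to_clients is a placeholder for whatever pushes the update out over the FastAPI websocket, and the workers must be started with task events enabled, e.g. celery -A proj worker -E):

# event_monitor.py -- consumes Celery task events instead of polling AsyncResult
from celery import Celery

app = Celery(broker="amqp://guest@localhost//")   # adjust the broker URL to your RabbitMQ

def broadcast_to_clients(payload):
    # placeholder: hand the payload to the websocket service / pub-sub channel
    print(payload)

def monitor(app):
    state = app.events.State()

    def on_task_succeeded(event):
        state.event(event)
        task = state.tasks.get(event["uuid"])
        broadcast_to_clients({"task_id": task.uuid, "state": "SUCCESS", "result": task.result})

    with app.connection() as connection:
        recv = app.events.Receiver(connection, handlers={
            "task-succeeded": on_task_succeeded,
            "*": state.event,
        })
        recv.capture(limit=None, timeout=None, wakeup=True)

if __name__ == "__main__":
    monitor(app)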
We all like Python, but we don't have to re-invent the wheel whenever we want better monitoring or more control over the behavior of our systems.
That being said, I would recommend re-architecting the solution to decrease the complexity of your Django application by exploring what native cloud solutions offer in terms of microservice architecture (API Gateway), AWS SQS and SNS, computation, and storage options for your files.
Such an approach will take care of a lot of the monitoring, configuration, and file-management work, and, most importantly, your monolithic application could scale without code changes or additional configuration.
So, I am currently working on a Django project hosted on PythonAnywhere, which includes a notifications feature and also receives external data from sensors through AWS. I have been thinking about the best practice for implementing this.
I currently have a simple implementation: a view that checks all notifications and performs the required actions, with an always-on task (which simply means a script that runs independently) sending a REST request to the server every minute.
Server side:
views.py:
from django.http import HttpResponse

def checkNotifications(request):
    # flat=True so each item is a plain value rather than a 1-tuple
    notificationsObject = notifications.objects.order_by('thing').values_list('thing', flat=True).distinct()
    thingsList = list(notificationsObject)
    for thing in thingsList:
        valuesDic = returnAllField(thing)
        thingNotifications = notifications.objects.filter(thing=thing)
        # Do stuff for each notification
    return HttpResponse(status=204)
urls.py:
path('notifications/', views.checkNotifications, name="checkNotification"),
and the client just sends a GET request to my URL/notifications/, which works.
Now, while researching, I saw some other options, such as the ones discussed here with django-background-tasks and/or Celery:
How to initialize repeating tasks using Django Background Tasks?
Celery task best practices in Django/Python
as well as some other options.
My question is: is there a benefit to moving from my first implementation to one of these? The only direct benefit I can see is avoiding abuse from another service hitting my URL to check notifications too often, but I can have (and do have) required authentication to prevent that. Also, is there a certain "best practice" here? Given that this repeating task runs so often, it feels like there should be a cleaner solution. For one, I am not sure whether running a repeating task is the best option on PythonAnywhere.
(https://help.pythonanywhere.com/pages/AsyncInWebApps/ suggests using always-on tasks, but it also mentions django background tasks)
Thank you
To use django-background-tasks on PythonAnywhere you need to run it via an always-on task, so it is not an alternative, just another way of using always-on tasks.
You can also access your Django code in your always-on task directly with some kind of long-running management command, so you do not need to hit your web app with a special request.
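For illustration, a long-running management command could look roughly like this (the command name, the model import path, and the one-minute interval are assumptions mirroring the setup above):

# management/commands/check_notifications.py -- hypothetical long-running command
import time
from django.core.management.base import BaseCommand
from myapp.models import notifications  # adjust to your actual app/model

class Command(BaseCommand):
    help = "Continuously check notifications instead of hitting the web app over HTTP"

    def handle(self, *args, **options):
        while True:
            for thing in notifications.objects.order_by('thing').values_list('thing', flat=True).distinct():
                thingNotifications = notifications.objects.filter(thing=thing)
                # Do stuff for each notification, same as in the view
            time.sleep(60)  # assumed interval, matching the once-a-minute always-on task

The always-on task would then simply run python manage.py check_notifications.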
I am creating an application that basically has multiple connections to a third-party chat streaming API (socket based).
The way it works is: every user has an account on my app and another account on the third-party app. They give me an access token for the third-party chat app, and I connect to the third-party API to stream their chats. This happens for hundreds of users.
I need to create a socket connection pool for every user and run parallel threads. I am using a Python library (for that API) and am able to achieve real-time feeds for single users. How do I implement an asynchronous socket connection pool in Python or NodeJS? I have a Linux micro instance on EC2 and I need to run this application for 1000 users.
I am exploring Redis+Tornado to implement this. Are there any better alternatives?
This will be messy, and there are a couple of things to consider.
If you are going to use multiple threads, remember that the OS only permits so many per CPU; rather go with multiprocessing.
If you go async with long-polling processes, a long-running handler will block other clients' requests from being processed.
Solution
When your application absolutely needs to be real-time, I would suggest websockets for server-client interaction.
Then, from your client's request, start a single process that listens/polls on your streaming API using multiprocessing in Python. So you will essentially create a separate process for each client.
And now, to make your WebSocketHandler and background API streamer interact with each other, you can use the Observer pattern (https://en.wikipedia.org/wiki/Observer_pattern) to notify the WebSocket that you have received data from the API.
Make sure that you assign a unique ID to every client and make sure that you only post the data to the intended client when using websockets.
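In plain Python, the Observer pattern boils down to something like this (class and method names are illustrative; the websocket is assumed to be a Tornado WebSocketHandler, hence write_message):

class StreamSubject:
    """The background API streamer owns one of these and calls notify()."""
    def __init__(self):
        self._observers = []

    def subscribe(self, observer):
        self._observers.append(observer)

    def notify(self, client_id, data):
        for observer in self._observers:
            observer.on_stream_data(client_id, data)

class WebSocketObserver:
    """One per connected websocket client."""
    def __init__(self, client_id, websocket):
        self.client_id = client_id
        self.websocket = websocket

    def on_stream_data(self, client_id, data):
        if client_id == self.client_id:   # only forward data meant for this client
            self.websocket.write_message(data)

If the streamer runs in a separate process, as suggested above, you would put a multiprocessing.Queue (or Redis pub/sub) between the two instead of a direct method call.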
EDIT:
Web:
Also, on your question regarding Tornado: it is a good lightweight framework for handling up to maybe 1000 users. For anything more than that I would suggest looking at Django, as it will allow you to be more productive, and there are lots of tools that the community has developed over time.
Database:
Redis is a good choice if you need a very fast NoSQL DB; also have a look at MongoDB. If you require a multi-region DB, I would suggest going with Cassandra or CouchDB due to their partitioned nodes.
I am new to server-side development,
but I have gotten a chance to design and implement a server that will handle around 2000~3000 clients.
I am thinking of using Python and websockets, though I don't know whether this choice is appropriate.
At this point, I am curious about how to design the server.
I think there must be some architecture normally used depending on the capacity the server handles.
Otherwise, could I use a websocket server offered by some Python package like Tornado or Django?
I hope I can get some information on this.
Any advice?
I've had good experiences using haproxy in front of sockjs-tornado. Depending on how complex your server-side logic, routing, and persistence requirements are, you could write all your server endpoints using Tornado and use SQLAlchemy to handle writes to a relational database, or use a non-SQL data store like Redis.
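A minimal sockjs-tornado server, the kind of thing you would put behind haproxy, looks roughly like this (the ChatConnection name, /chat prefix, and port are arbitrary choices for the sketch):

from sockjs.tornado import SockJSConnection, SockJSRouter
from tornado import ioloop, web

class ChatConnection(SockJSConnection):
    participants = set()   # shared across all connections

    def on_open(self, info):
        self.participants.add(self)

    def on_message(self, message):
        self.broadcast(self.participants, message)   # fan the message out to everyone

    def on_close(self):
        self.participants.remove(self)

if __name__ == "__main__":
    router = SockJSRouter(ChatConnection, '/chat')
    app = web.Application(router.urls)
    app.listen(8080)
    ioloop.IOLoop.instance().start()

haproxy would then load-balance clients across several such processes.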
If your main requirement is real-time interactivity it might be worth investigating meteor as well.
One possible solution could be Pyramid, sockjs, gunicorn, and gevent. Nginx is probably better suited as a frontend than Apache, but of course, if you do not have any lengthy processing on the backend, any decent asynchronous Python server with websocket and sockjs support (not sure about socket.io as an alternative) will work for you out of the box.
Lengthy processing should be offloaded to queue workers anyway, so an asynchronous server will fit the bill.
Just check whether all the datastore/database adapters you use are compatible with your server solution, be it asynchronous or multi-threaded.
After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it, it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea how to integrate it or what else I need (guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.
You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/
I feel your pain, having had to go through the same research over the past few months. I haven't had time to write proper documentation yet, but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb - I was hoping to set up direct communication from the Django server side to the tornadio server process using queues (i.e., logic in a Django view pushes a message onto a queue that then gets handled by tornadio, which pushes a JSON-encoded version of that message out to all interested subscribers) but haven't fully implemented that part yet. The way I currently have it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io JavaScript and establish direct connections to the tornadio server.
I looked into orbited, hookbox, and gevent but decided to go with socket.io + tornado as it seemed to allow me the cleanest javascript + python code. I could be wrong about that though, having just started to learn Python/Django over the past year.
Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of a situation where you are polling the db looking for new messages, you can subscribe to a channel, and have messages pushed out to you.
I found a working example of the type of system you describe. The magic happens in the socketio view:
from django.http import HttpResponse
from gevent import Greenlet

# redis_client, username and room_channel are helper functions from the example project.

def socketio(request):
    """The socket.io view."""
    io = request.environ['socketio']
    redis_sub = redis_client().pubsub()
    user = username(request.user)

    # Subscribe to incoming pubsub messages from redis.
    def subscriber(io):
        redis_sub.subscribe(room_channel())
        redis_client().publish(room_channel(), user + ' connected.')
        while io.connected():
            for message in redis_sub.listen():
                if message['type'] == 'message':
                    io.send(message['data'])
    greenlet = Greenlet.spawn(subscriber, io)

    # Listen to incoming messages from client.
    while io.connected():
        message = io.recv()
        if message:
            redis_client().publish(room_channel(), user + ': ' + message[0])

    # Disconnected. Publish disconnect message and kill subscriber greenlet.
    redis_client().publish(room_channel(), user + ' disconnected')
    greenlet.throw(Greenlet.GreenletExit)

    return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library; it could be considered an alternative to Gevent in this application. It's powerful, and difficult to debug in my experience.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message, like scanning for profanity or something. Even so, something would have to initiate the Celery task, so there would need to be some code listening for the socket.io callback.
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
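For illustration, such a per-message task might look roughly like this (the scan_message name, the placeholder "profanity" logic, and the direct Redis connection are all assumptions, not part of the example above):

# tasks.py -- hypothetical Celery task for costly per-message work
import redis
from celery import shared_task

r = redis.StrictRedis()   # assumes the same local Redis used for the chat pub/sub

@shared_task
def scan_message(channel, user, text):
    cleaned = text.replace('badword', '***')   # stand-in for real, expensive processing
    r.publish(channel, user + ': ' + cleaned)  # push the processed message back to the room

# In the socket.io view, instead of publishing the raw message directly:
#     scan_message.delay(room_channel(), user, message[0])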
Hope that helps!