So, I am currently working on a Django project hosted on PythonAnywhere. It includes a notifications feature and also receives data externally from sensors through AWS. I have been thinking about the best practice for implementing this.
My current implementation is simple: a view that checks all notifications and performs the required actions, plus an always-on task (which simply means a script that runs independently) sending a REST request to the server every minute.
Server side:
views.py:
def checkNotifications(request):
    notificationsObject = notifications.objects.order_by('thing').values_list('thing').distinct()
    thingsList = list(notificationsObject)
    for thing in thingsList:
        valuesDic = returnAllField(thing)
        thingNotifications = notifications.objects.filter(thing=thing)
        # Do stuff for each notification
urls.py:
path('notifications/', views.checkNotifications, name="checkNotification")
The client just sends a GET request to my URL/notifications/, which works.
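For completeness, the always-on task is basically just a loop like the following sketch (the URL below is a placeholder, not my real one):

import time
import requests

NOTIFICATIONS_URL = 'https://example.pythonanywhere.com/notifications/'  # placeholder URL

while True:
    try:
        requests.get(NOTIFICATIONS_URL, timeout=30)
    except requests.RequestException as exc:
        print('notification check failed:', exc)
    time.sleep(60)  # poll once a minute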
Now, while researching, I saw some other options, such as the ones discussed here with Django Background Tasks and/or Celery:
How to initialize repeating tasks using Django Background Tasks?
Celery task best practices in Django/Python
as well as some other options.
My question is: is there a benefit to moving from my first implementation to one of these? The only benefit I can see directly is avoiding abuse from another service hitting my URL to check notifications too often, but I can (and do) require authentication to prevent that. And is there a certain "best practice" here? Considering that I am running this repeating check so often, it feels like there should be a cleaner solution. For one, I am not sure whether running a repeating task is the best option on PythonAnywhere.
(https://help.pythonanywhere.com/pages/AsyncInWebApps/ suggests using always-on tasks, but it also mentions django background tasks)
Thank you
To use Django Background Tasks on PythonAnywhere you need to run it via an always-on task, so it is not an alternative; it is just another way of using always-on tasks.
You can also access your Django code in your always-on task directly with some kind of long-running management command, so you do not need to hit your web app with a special request.
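For illustration, a minimal sketch of such a management command might look like this (the app name yourapp, the notifications model and the file location are assumptions based on the question, not tested code):

# yourapp/management/commands/check_notifications.py
# run in the always-on task as:  python manage.py check_notifications
import time

from django.core.management.base import BaseCommand

from yourapp.models import notifications  # assumed app/model names from the question


class Command(BaseCommand):
    help = 'Check notifications in a loop; run this as an always-on task.'

    def handle(self, *args, **options):
        while True:
            things = notifications.objects.values_list('thing', flat=True).distinct()
            for thing in things:
                thing_notifications = notifications.objects.filter(thing=thing)
                # Do stuff for each notification, as in the checkNotifications view
            time.sleep(60)  # same once-a-minute cadence as the REST polling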
Related
We are working on an Internet and intranet platform that serves client requests over web applications.
There are heavy-weight computations on database entries and files. We want to push the state of those computations to the client via push notifications and make changes to files without the risk of race conditions. The architecture is supposed to run both in low-scale single-server environments and in high-scale cluster environments.
So far, we are running a Django web server with PostgreSQL, the Python library Channels, and RabbitMQ as the message broker.
Once an HTTP request from a client arrives in Django, we trigger the task via task.delay() and immediately return the task_id to the client. The client then opens a websocket to another Django route and hands over the task_ids it is interested in. Django then polls the state of the task via AsyncResult(task_id).state. Once the state changes, we read the result via AsyncResult(task_id).get() and push the task result to the client.
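In code, the flow looks roughly like this (process_entry, TaskStatusConsumer and the one-second polling interval are simplified placeholders, not our exact code):

import asyncio

from celery.result import AsyncResult
from channels.generic.websocket import AsyncJsonWebsocketConsumer
from django.http import JsonResponse

from myapp.tasks import process_entry  # placeholder Celery task


def start_computation(request):
    # Trigger the heavy-weight task and return the task_id immediately.
    result = process_entry.delay(request.GET['entry_id'])
    return JsonResponse({'task_id': result.id})


class TaskStatusConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()

    async def receive_json(self, content):
        # The client hands over the task_ids it is interested in.
        for task_id in content.get('task_ids', []):
            asyncio.create_task(self.poll_task(task_id))

    async def poll_task(self, task_id):
        result = AsyncResult(task_id)
        # Poll the result backend until the task finishes (this polling is
        # the part we are unhappy with, see below).
        while result.state not in ('SUCCESS', 'FAILURE'):
            await asyncio.sleep(1)
        await self.send_json({'task_id': task_id,
                              'state': result.state,
                              'result': str(result.result)})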
Here is a similar sequence diagram from another project I found online (source, 18.09.21).
Something that is not seen in the diagram: the channels_worker processes have to fetch the file they are working on from Django. Part of the result is not for the client but is used to update the file. Django locks and updates the file locally as soon as the client asks for the results and Django receives the task results from Celery (the changes only add attributes and will not conflict with each other).
My thoughts about this architecture are:
Monitoring of the Celery events is bad so far.
It is only triggered by the client, which has to know about the tasks to begin with.
Django is not suited for monitoring, and polling is not efficient in general.
The file management seems fishy.
I would prefer proper monitoring, where events are pushed to Django and to the client. The client has to be able to consume the events at any later time.
I have some thoughts about solutions, but I would like to hear your opinions first; I can bring them into the discussion later.
Greetings
Python
Edit 1
From other sources I got helpful information regarding a good strategy.
Instead of Django "monitoring" the Celery tasks, we can use a dedicated websocket service, such as FastAPI, that monitors task events and propagates them to the clients via websocket (see the sketch below).
The client doesn't have to know about its running tasks per se. Instead, we can have ownership of tasks, and the client only has to authenticate itself. The whole security part will be implemented anyway, and it is supported by Celery.
For the file management, we should use a dedicated object storage such as MinIO. This service can subscribe to task events related to files.
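As a rough illustration, such a dedicated service can consume Celery's task events straight from the broker (the broker URL and the handler body are placeholders, and the workers must be started with -E so they emit events):

from celery import Celery

app = Celery(broker='amqp://guest@localhost//')  # placeholder broker URL


def handle_task_event(event):
    # Here the event would be forwarded to websocket subscribers
    # (e.g. by the FastAPI service mentioned above).
    print(event['type'], event.get('uuid'))


with app.connection() as connection:
    receiver = app.events.Receiver(connection, handlers={
        'task-succeeded': handle_task_event,
        'task-failed': handle_task_event,
    })
    receiver.capture(limit=None, timeout=None, wakeup=True)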
We all like Python, but we don't have to re-invent the wheel whenever we want a better monitoring or more control on the behavior of our systems.
That being said, I would recommend re-architecting the solution to decrease the complexity of your Django application by exploring what native cloud solutions offer in terms of microservice architecture (API Gateway), AWS SQS and SNS, computation, and storage options for your files.
Such an approach will take care of a lot of the monitoring, configuration, and file-management activities, and most importantly your monolithic application could scale without code changes or additional configuration.
I have a mature, production django-tastypie server with an angular client service where I need to update the client with real-time data. I was thinking about websockets for the client.
My question is which strategy is best for the server side:
use some Django plugin that handles the asynchronous part
spin up a separate Tornado server to handle the async part (and then teach it my Django users/authentication model)
embed Tornado inside Django (like this)
what is your recommendation? or maybe something else I didn't think of?
I would suggest using this:
https://github.com/jrief/django-websocket-redis
I have used it in production and it works very well; the main benefit is that it is non-blocking and integrates with Django's auth system (you can message specific users, groups or all users via the websocket).
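For example, pushing a message to a specific user from Django code looks roughly like this (the facility name and username below are made up):

from ws4redis.publisher import RedisPublisher
from ws4redis.redis_store import RedisMessage

# Send a message over the websocket to one particular user.
publisher = RedisPublisher(facility='notifications', users=['john'])
publisher.publish_message(RedisMessage('Your task has finished'))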
#Randi's suggestion to use Celery if you need to run async tasks is a very good one: Celery is excellent. Combine that with django-websocket-redis and you can perform long-running async tasks and give your users real-time updates as the task runs.
An alternative, but probably less appropriate, solution would be to poll the server for changes/notifications from your Angular client every 2-5 seconds. However, this is the "old school" approach and will potentially add a lot of load to your server. It would save the overhead of managing extra services, though, and would probably be the better choice if you have a small number of users.
I recently started playing around with django-social-auth and am looking for some help from the community to figure out the best way to move forward with an idea.
Once a user has registered, you have access to their OAuth token, which allows you to pull certain data.
In my case I want to build a nice little profile based on the users avatar, location and maybe some other information if it's available.
Would the best way be to:
build a custom task for celery and pull the information and build the profile?
or, make use of signals to build the profile?
This really comes down to synchronous vs. asynchronous. Django signals are synchronous; that is, they block the response until they are completed. Celery tasks are asynchronous.
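To make the trade-off concrete, the two options look roughly like this (build_profile is a hypothetical helper, and the signal and task shown are generic illustrations, not django-social-auth specifics):

from celery import shared_task
from django.contrib.auth.models import User
from django.db.models.signals import post_save
from django.dispatch import receiver


def build_profile(user):
    # Hypothetical helper: pull avatar/location via the user's OAuth token
    # and save a profile record.
    ...


# Option 1: a signal handler runs synchronously, inside the request/response cycle.
@receiver(post_save, sender=User)
def build_profile_on_signup(sender, instance, created, **kwargs):
    if created:
        build_profile(instance)  # blocks the response until it finishes


# Option 2: a Celery task runs asynchronously, outside the request.
@shared_task
def build_profile_task(user_id):
    build_profile(User.objects.get(pk=user_id))

# ...and after registration you would call: build_profile_task.delay(user.id)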
Which is better will depend on whether the benefits of handling the profile building asynchronously outweigh the downsides of maintaining the extra infrastructure that Celery requires.
It's basically impossible to answer this without a lot more information about the specifics of your situation.
After spending two entire days on this I'm still finding it impossible to understand all the choices and configurations for Comet in Python. I've read all the answers here as well as every blog post I could find. It feels like I'm about to hemorrhage at this point, so my utmost apologies for anything wrong with this question.
I'm entirely new to all of this, all I've done before were simple non-real-time sites with a PHP/Django backend on Apache.
My goal is to create a real-time chat application; hopefully tied to Django for users, auth, templates, etc.
Every time I read about a tool it says I need another tool on top of it, it feels like a never-ending chain.
First of all, can anybody categorize all the tools needed for this job?
I've read about different servers, networking libraries, engines, JavaScripts for the client side, and I don't know what else. I never imagined it would be this complex.
Twisted / Twisted Web seems to be popular, but I have no idea how to integrate it or what else I need (I'm guessing I need client-side JS at least).
If I understand correctly, Orbited is built on Twisted, do I need anything else with it?
Are Gevent and Eventlet in the same category as Twisted? How much else do I need with them?
Where do things like Celery, RabbitMQ, or KV stores like Redis come into this? I don't really understand the concept of a message queue. Are they essential and what service do they provide?
Are there any complete chat app tutorials I should look at?
I'll be entirely indebted to anybody who helps me past this mental roadblock, and if I left anything out please don't hesitate to ask. I know it's a pretty loaded question.
You could use Socket.IO. There are gevent and tornado handlers for it. See my blog post on gevent-socketio with Django here: http://codysoyland.com/2011/feb/6/evented-django-part-one-socketio-and-gevent/
I feel your pain, having had to go through the same research over the past few months. I haven't had time to write proper documentation yet, but I have a working example of using Django with socket.io and tornadio at http://bitbucket.org/virtualcommons/vcweb. I was hoping to set up direct communication from the Django server side to the tornadio server process using queues (i.e., logic in a Django view pushes a message onto a queue that then gets handled by tornadio, which pushes a JSON-encoded version of that message out to all interested subscribers), but I haven't implemented that part fully yet. The way I've currently got it set up involves:
An external tornado (tornadio) server, running on another port, accepting socket.io requests and working with Django models. The only writes this server process makes to the database are the chat messages that need to be stored. It has full access to all Django models, etc., and all real-time interactions need to go directly through this server process.
Django template pages that require real-time access include the socket.io javascript and establish direct connections to the tornadio server
I looked into Orbited, Hookbox, and gevent but decided to go with socket.io + Tornado as it seemed to give me the cleanest JavaScript + Python code. I could be wrong about that though, having just started learning Python/Django over the past year.
Redis is relevant as a persistence layer that also supports native publish/subscribe. So instead of a situation where you are polling the db looking for new messages, you can subscribe to a channel, and have messages pushed out to you.
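Stripped of the socket.io plumbing, the pub/sub part is just this (channel name made up, using the redis-py client):

import redis

r = redis.Redis()
pubsub = r.pubsub()
pubsub.subscribe('chat-room')

# Somewhere else, a publisher pushes a message onto the channel:
r.publish('chat-room', 'hello')

# The subscriber gets the message pushed to it -- no database polling involved.
for message in pubsub.listen():
    if message['type'] == 'message':
        print(message['data'])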
I found a working example of the type of system you describe. The magic happens in the socketio view:
# Imports: Greenlet comes from gevent, HttpResponse from Django;
# redis_client(), room_channel() and username() are helpers defined
# elsewhere in the example project.
from gevent import Greenlet
from django.http import HttpResponse

def socketio(request):
    """The socket.io view."""
    io = request.environ['socketio']
    redis_sub = redis_client().pubsub()
    user = username(request.user)

    # Subscribe to incoming pubsub messages from redis.
    def subscriber(io):
        redis_sub.subscribe(room_channel())
        redis_client().publish(room_channel(), user + ' connected.')
        while io.connected():
            for message in redis_sub.listen():
                if message['type'] == 'message':
                    io.send(message['data'])
    greenlet = Greenlet.spawn(subscriber, io)

    # Listen to incoming messages from client.
    while io.connected():
        message = io.recv()
        if message:
            redis_client().publish(room_channel(), user + ': ' + message[0])

    # Disconnected. Publish disconnect message and kill subscriber greenlet.
    redis_client().publish(room_channel(), user + ' disconnected')
    greenlet.throw(Greenlet.GreenletExit)

    return HttpResponse()
Take the view step-by-step:
Set up socket.io, get a redis client and the current user
Use Gevent to register a "subscriber" - this takes incoming messages from Redis and forwards them on to the client browser.
Run a "publisher" which takes messages from socket.io (from the user's browser) and pushes them into Redis
Repeat until the socket disconnects
The Redis Cookbook gives a little more detail on the Redis side, as well as discussing how you can persist messages.
Regarding the rest of your question: Twisted is an event-based networking library; it could be considered an alternative to gevent in this application. It's powerful but, in my experience, difficult to debug.
Celery is a "distributed task queue" - basically, it lets you spread units of work out across multiple machines. The "distributed" angle means some sort of transport is required between the machines. Celery supports several types of transport, including RabbitMQ (and Redis too).
In the context of your example, Celery would only be appropriate if you had to do some sort of costly processing on each message like scanning for profanity or something. Even still, something would have to initiate the Celery task, so there would need to be some code listening for the socket.io callback.
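If you did need that, a sketch would look something like this (the broker URL and the "processing" are placeholders):

from celery import Celery

app = Celery('chat', broker='redis://localhost:6379/0')  # placeholder broker URL


@app.task
def scan_message(text):
    # Placeholder for some costly per-message processing, e.g. a profanity filter.
    return text.replace('badword', '***')

# The socket.io callback would then hand the work off with:
#   scan_message.delay(message[0])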
(Just in case you weren't totally confused, Celery itself can be made to use Gevent as its underlying concurrency library.)
Hope that helps!
I have a set of URLs in my Django application that trigger certain actions or processes, similar to cron jobs. I have a script that polls one or more of these URLs at some regular interval, and I'm interested in adding a layer of security.
I'd like to set up an account for the script and require authentication before the processes execute. I've been reading through the Django user authentication documentation, along with Python's urllib2 library, and I'm just a bit lost. I have some ideas of how this might be done, but I don't have a lot of experience with security like this.
Any suggested reading materials?
I have a script that polls one or more of these URLs at some regular interval and I'm interested in adding a layer of security.
Have you considered using Celery? Celery works seamlessly with Django. This will let you periodically run jobs using the same authentication mechanism as the rest of the project. You can also make things a bit more uniform by avoiding urllib2.
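For example, with Celery beat you can schedule the same work your URL-polling script triggers today; a rough sketch (project name, broker URL and task path are placeholders) looks like this:

from celery import Celery
from celery.schedules import crontab

app = Celery('proj', broker='redis://localhost:6379/0')  # placeholder broker URL

app.conf.beat_schedule = {
    'run-maintenance-every-five-minutes': {
        'task': 'proj.tasks.run_maintenance',   # placeholder task path
        'schedule': crontab(minute='*/5'),
    },
}

# Then run a worker with an embedded beat scheduler, e.g.:
#   celery -A proj worker -B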