How to queue up scheduled actions - python

I am trying to set up some scheduled tasks for a Django app with Celery, hosted on Heroku. Aside from not knowing how everything should be configured, what is the best way to approach this?
Let's say users can opt to receive a daily email at a time of their choosing.
Should I have a scheduled job that runs every, say, 5 minutes, looks up every user who wants to be emailed at that time, and then fires off the emails?
OR
Should I schedule a task for each user when they set their preference? (Not sure how I would actually implement this yet.)

It depends on how much accuracy you need. Do you want users to select the time down to the minute? The second? Or will allowing them to select the hour they wish to be emailed be enough?
If on the hour is accurate enough, then use a task that polls for users to mail every hour.
If your users need the mail to go out accurate to the second, then set a task for each user timed to complete on that second.
Everything in between comes down to personal choice. What are you more comfortable doing, and even more importantly: what produces the simplest code with the fewest failure modes?
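To make the polling option concrete, here is a minimal sketch of the lookup step only, assuming each user stores a preferred time of day. The Subscriber class, its send_at field and the due_subscribers function are hypothetical names for illustration, not part of Django or Celery. The window is half-open, so a job that runs every 5 minutes never picks the same user twice.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class Subscriber:
    email: str
    send_at: time  # preferred time of day (hypothetical field)


def due_subscribers(subscribers, window_start, window_length=timedelta(minutes=5)):
    """Return subscribers whose send time falls in [window_start, window_start + window_length)."""
    window_end = window_start + window_length
    due = []
    for sub in subscribers:
        # Anchor the preferred time of day on the date the window starts.
        candidate = datetime.combine(window_start.date(), sub.send_at)
        # A window that crosses midnight also has to consider the next day.
        for moment in (candidate, candidate + timedelta(days=1)):
            if window_start <= moment < window_end:
                due.append(sub)
                break
    return due
```

The periodic job would call due_subscribers with the start of the current window and hand each result to whatever sends the mail. Time zones are ignored here; a real app would need to decide whose clock send_at refers to.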

I would suggest the first option (a scheduled job that looks up outstanding jobs) - it is easier to scale and manage. What if you have thousands of users? That is a lot of tasks JUST for sending emails.
If you use your database as the Celery broker, you can use django-celery's built-in cron-like scheduling, which would allow you to create and destroy tasks dynamically. I don't like using the DB for my broker, though.
Also, you might want to check out chronos.

Related

Fair scheduling of task execution time between web service users

Suppose we have the following web service. Its main function is taking screenshots of a given website URL. There is a REST API and a user interface for entering URLs. For each new URL, a task is created in Celery. For the frontend UI it is important that the screenshots for a URL arrive in a reasonable time, around 10 seconds.
Now a user, intentionally or through a software error, enters a few hundred URLs. This bloats the task queue, and other users must wait until all those tasks are done.
So the requirements here are:
Running tasks in some fair order. The simplest solution is to run one task per user at a time. Like: user1 task, user2 task, user1 task, user2 task, and so on.
Having priorities on tasks. For example, tasks of priority 1 are always done before tasks of priority 2.
Currently, we use our own handcrafted module. It stores tasks in Redis and pushes them to Celery in fair order. To avoid depending on Celery's ordering, it pushes only as many tasks as there are free Celery workers, and it checks the Celery queue for free workers every 100 milliseconds.
Are there any libraries or services which meet my requirements?
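For readers trying to picture the ordering being asked for, here is a minimal in-memory sketch of the two requirements: round-robin between users, with lower priority numbers served first. FairQueue is a hypothetical name. This is not the asker's Redis module and not a Celery feature, just an illustration of the ordering rule.

```python
from collections import defaultdict, deque


class FairQueue:
    """Round-robin across users within a priority level; lower number runs first."""

    def __init__(self):
        # priority -> user -> that user's pending tasks (FIFO)
        self._tasks = defaultdict(lambda: defaultdict(deque))
        # priority -> rotation order of users that currently have pending tasks
        self._turns = defaultdict(deque)

    def push(self, user, task, priority=1):
        pending = self._tasks[priority][user]
        if not pending:
            # The user had nothing queued at this priority, so they join the rotation.
            self._turns[priority].append(user)
        pending.append(task)

    def pop(self):
        """Return (user, task) for the next task to run, or None if the queue is empty."""
        for priority in sorted(self._turns):
            turns = self._turns[priority]
            if not turns:
                continue
            user = turns.popleft()
            pending = self._tasks[priority][user]
            task = pending.popleft()
            if pending:
                # The user still has work, so they go to the back of the rotation.
                turns.append(user)
            return user, task
        return None
```

With this rule, a user who submits hundreds of URLs only delays their own later tasks. Every other user with pending work still gets a turn in each round.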
How many tasks do you have?
How many users do you have?
It sounds like you need a per-user rate-limiting mechanism in your web server.
For your question, there are several options:
You can use Celery routing to assign different tasks to different queues (and then consume from those queues with different workers).
Celery supports task priorities; you can read about it here.
You can rate-limit per task in Celery - again, it depends on your usage.
EDIT:
@uhbif19 I described those features because you asked for them - you wanted a way to achieve priority, and you can send tasks with a specific priority.
In your current architecture you might want to lower the priority of abusers to avoid starving other users.
A better way to tackle this problem, IMO, is to add a rate-limiting mechanism in the gateway, so that a single user cannot abuse the system and starve everyone else.
Good luck!
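As a rough idea of what per-user rate limiting at the gateway could look like, here is a minimal sliding-window sketch. PerUserRateLimiter and its parameters are hypothetical names. It keeps state in process memory, so a multi-process deployment would need a shared store instead.

```python
import time
from collections import defaultdict, deque


class PerUserRateLimiter:
    """Allow at most max_requests per user within any per_seconds window."""

    def __init__(self, max_requests, per_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # user -> timestamps of accepted requests

    def allow(self, user):
        now = self.clock()
        hits = self._hits[user]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.per_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The web layer would call allow(user) before creating a Celery task and reject or defer the request when it returns False.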

Schedule task for one year from now

What is the best way to schedule an event for a month or a year from now?
For example, I want my program to send a notification one year after a customer's registration.
I tried using Celery with Redis and the eta option but, at some point, the task multiplies and sends the same notification to the same customer (around 600 times). I also think that using a cronjob is not the best option.
Any suggestions?
You don't want to push something in the queue that you are not going to read for 1 year.
Store the registration date in the database. Write a program that reads the database and pushes information to a topic. You can run this program every day and it will find all the people who should be notified.
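The daily scan could look something like the sketch below. It assumes registrations are available as a mapping of customer id to registration date and that you keep a record of who has already been notified. All names are hypothetical. Comparing with "on or before today" means a missed run is caught up the next day, and the already_notified check keeps a repeated run from sending duplicates.

```python
from datetime import date


def anniversary_on(registered, year):
    """Return the anniversary of `registered` in the given year."""
    try:
        return registered.replace(year=year)
    except ValueError:
        # Registered on 29 Feb and `year` is not a leap year.
        # Falling back to 28 Feb is an assumption; pick whatever rule suits you.
        return registered.replace(year=year, day=28)


def due_for_notification(registrations, today, already_notified):
    """Return ids of customers registered at least one year ago and not yet notified."""
    due = []
    for customer_id, registered in registrations.items():
        first_anniversary = anniversary_on(registered, registered.year + 1)
        if first_anniversary <= today and customer_id not in already_notified:
            due.append(customer_id)
    return due
```

After each notification goes out, the caller would add the customer id to already_notified (in practice, a column or table in the database).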
As JR ibkr pointed out, it is probably not a good idea to schedule a task one year ahead; run a daily task that scans for people to notify instead.
But regardless, what you are seeing may be related to a bug with celery + redis configuration, which is discussed here: https://github.com/celery/celery/issues/4400 .
You may try using RabbitMQ as the message broker to avoid this issue, or try one of the suggestions in that discussion.
Hope you don't need to wait for 1 year to see if it works :)

How do I run a scheduled job in Django?

I have a Django app which has invitations stored in a DB (MySQL for now, but it may move to Postgres). These invitations have expiration dates. I want each invitation removed from the database when its expiration date arrives. I want this done from the Django side rather than directly in the database, because the app handles the notifications and cleanup that need to occur. I guess I could have a cron job run every once in a while and have it hit the API, but I really wanted to keep all app components within the app and not rely on an OS facility (cron).
cron is the correct way to run scheduled jobs. That's the fundamental philosophy of unix-like systems: components that know how to do a single thing well, and cron is the thing that knows how to trigger jobs at a certain time.
In terms of what the job itself should be, the easiest thing is to write a custom management command.
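As a sketch of what the job itself might do, here is the expiry logic in plain Python with no Django imports, so it can be tested on its own. Invitation, expire_invitations and on_expire are hypothetical names. In a real project this logic would sit in the handle() method of a custom management command and query the invitation model instead of taking a list, and cron would invoke it through manage.py.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Invitation:
    code: str
    expires_at: datetime


def expire_invitations(invitations, now, on_expire):
    """Run on_expire for every expired invitation; return the ones still valid."""
    still_valid = []
    for invitation in invitations:
        if invitation.expires_at <= now:
            # on_expire stands in for the app's own notification/cleanup/delete step.
            on_expire(invitation)
        else:
            still_valid.append(invitation)
    return still_valid
```

Keeping the cleanup in a callback means the notifications the app handles still happen, which was the reason for not deleting rows directly in the database.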

Running functions automatically when certain criteria are met. Without user interaction.

I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cronjob the standard way of doing this sort of thing? Is there a better way/standard?
I would define "this sort of thing" as anything that needs to happen automatically in the app when certain criteria are met, without a user interacting with said app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kinds of services that trigger tasks on a periodic basis and offer some additional control, for instance Celery's scheduler. There seems to be a tutorial about building periodic tasks with Celery here.
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small, simple apps that use an internal timer, e.g. https://bitbucket.org/prologic/irclogger, which rotates its IRC log files once per day. Is this any better or more reliable? Not really. If the daemon/bot were to die prematurely or the system were to crash, what happens then? In this case it just gets started again and the logs continue to rotate at the next "day" interval.
I think two things are important here:
Reliability
Robustness
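One way to get both for a daily job, whether it is started by cron or by an internal timer, is to record the last day that completed and catch up on anything missed at the next run. The sketch below assumes that record is kept somewhere durable. The function names are hypothetical and bill_for_day stands in for the real billing step.

```python
from datetime import date, timedelta


def days_to_process(last_completed, today):
    """Return every day after last_completed up to and including today."""
    days = []
    current = last_completed + timedelta(days=1)
    while current <= today:
        days.append(current)
        current += timedelta(days=1)
    return days


def run_billing(last_completed, today, bill_for_day):
    """Bill each outstanding day in order and return the new last completed day."""
    for day in days_to_process(last_completed, today):
        bill_for_day(day)
        # Progress moves forward only after the day has been billed successfully.
        last_completed = day
    return last_completed
```

Running it twice on the same day does nothing the second time, and a crash that skips a morning is made up on the next run rather than silently lost.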

What is the best way of handling rankings in django?

I'm building a web service with a ranking function.
I don't have powerful servers: the whole service will be hosted on a standard PC.
There could be times when many users (here, many means about 100) refresh the ranking at once, so I want to do it in a way that won't let them crash the server.
The ranking does not need to refresh in real time: I can show users a ranking generated some time earlier.
Generating the ranking itself is not a problem for me.
I can easily do this:
User.objects.filter(...).order_by('rank')
EDIT: More details:
I have some workers doing calculations.
When a worker finishes its work, it changes the rank field of some User instance.
You can assume every user performs actions that lead to several (5-20) calculations, each causing a rank change for that user.
If updating the ranking is too long a task to do per-request, then here are a few solutions you could be using:
After something happens that updates a ranking, create an asynchronous task that updates the rankings without blocking the request. You could use celery or gearman.
Update the rankings periodically, using a celery periodic task or a cron job.
Solution 1 is better performance-wise but harder to get right. Solution 2 is easier to do, but could be less optimal.
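A lighter variant of Solution 2, if you would rather not run a separate scheduler, is a time-based cache: recompute the ranking at most once per interval and serve the stored copy to everyone else. This is a different technique from a periodic task, so treat it as a sketch. CachedRanking is a hypothetical name, and compute would wrap something like the order_by query from the question.

```python
import time


class CachedRanking:
    """Serve a stored ranking and recompute it at most once per max_age_seconds."""

    def __init__(self, compute, max_age_seconds, clock=time.monotonic):
        self._compute = compute
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value = None
        self._computed_at = None

    def get(self):
        now = self._clock()
        if self._computed_at is None or now - self._computed_at >= self._max_age:
            # Stale or never computed: refresh once, then reuse for later requests.
            self._value = self._compute()
            self._computed_at = now
        return self._value
```

With 100 users refreshing at once, only the first request after the interval pays for the query. This version is per process and not thread-safe, so a multi-worker setup would want a shared cache instead.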
