What is the best way to schedule an event for a month or a year from now?
For example, I want my program to send a notification one year after a customer's registration.
I tried using Celery with Redis and the eta option, but at some point the task multiplies and sends the same notification to the same customer (like 600 times). I also think that using a cron job is not the best option.
Any suggestions?
You don't want to push something into the queue that you are not going to read for a year.
Store the registration date in the database. Write a program that reads the database and pushes information to a topic. You can run this program every day and it will find all the people who should be notified.
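For illustration, here is a minimal sketch of that daily scan, assuming a plain SQL table; the table and column names (customers, registration_date) and the notification hook are assumptions, not from the original post:

    import datetime
    import sqlite3


    def send_notification(customer_id, email):
        # placeholder: push to your queue / email service here
        print(f"notify customer {customer_id} <{email}>")


    def notify_anniversaries(db_path="app.db"):
        # customers who registered exactly one year ago today
        # (365 days; a calendar-aware delta would handle leap years)
        target = datetime.date.today() - datetime.timedelta(days=365)
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT id, email FROM customers WHERE registration_date = ?",
                (target.isoformat(),),
            ).fetchall()
            for customer_id, email in rows:
                send_notification(customer_id, email)
        finally:
            conn.close()


    if __name__ == "__main__":
        notify_anniversaries()

Run this from cron (or a celery beat schedule) once a day, and nothing ever sits in a queue for a year.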
As JR ibkr pointed out, it may not be a good idea to schedule a task for 1 year from now; instead, run a daily task that scans for people to notify.
But regardless, what you are seeing may be related to a bug with celery + redis configuration, which is discussed here: https://github.com/celery/celery/issues/4400 .
You may try using RabbitMQ as the message broker to avoid this issue, or try one of the suggestions in that discussion.
Hope you don't need to wait for 1 year to see if it works :)
I am trying to use threads in Odoo 14 for the first time and I would like to ask about the basics and common pitfalls.
I found surprisingly little on this topic online. Even official docs basically say: don't do this unless you are 1000% sure you know what you are doing, but they don't provide any resources on how to learn it.
Key points I would like to learn:
How to read, write, create and unlink records in a new thread?
When to commit and when to rollback?
What is cr.savepoint()?
My use case: I have 2000 products for which I need to fetch the current price every day from 5 different vendor e-shops with HTTP requests. This process will be run from cron and cannot be blocking, because it might take a long time to complete.
Any help would be appreciated. Even if you answer only part of my questions, I would be glad.
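For what it's worth, the usual pattern in Odoo 14 looks roughly like the sketch below: a new thread must open its own cursor and build its own Environment, and committing is the thread's responsibility. This is only a sketch under those assumptions; the function names and the list_price write are illustrative:

    import threading

    import odoo
    from odoo import api, SUPERUSER_ID


    def _update_prices(dbname, product_ids):
        # Environment.manage() is still required in Odoo 14 (removed in 15)
        with api.Environment.manage():
            # each thread needs its own cursor; never reuse the caller's cr
            with odoo.registry(dbname).cursor() as cr:
                env = api.Environment(cr, SUPERUSER_ID, {})
                for product in env['product.product'].browse(product_ids):
                    # fetch the price over HTTP here, then write it
                    product.write({'list_price': 0.0})  # placeholder value
                # the cursor context manager commits on a clean exit and
                # rolls back if an exception escapes; cr.savepoint() gives
                # you a rollback point *inside* the transaction


    def launch_price_update(env, product_ids):
        thread = threading.Thread(
            target=_update_prices, args=(env.cr.dbname, product_ids))
        thread.start()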
Originally this was a small project: just 150 accounts, for which I wrote a Selenium program in Python to do a small task with each of them. It ran on one computer and took about 5 hours. Now, however, I am looking to scale to 1000 accounts. For obvious reasons I do not want to do this on one computer; the task needs to be done once per day, and it would take about 30-35 hours to complete on one machine. I want to use more than one machine, but I also want the option to easily scale to three, four, or more.
I have moved the data for all the accounts into an Amazon cloud database and can easily connect to it from my Python program. However, as I mentioned earlier, I want this project to be easily scalable. I do not want to hardcode values, i.e. have one computer do accounts 1-500 and the other do 501-1000 (what if I added 500 more accounts and 2 more machines? I would want each machine to do 1500/4). I'm thinking of a master-slave approach: on each machine, a program that can be called with an array of accounts, and a master program running on my machine that, once per 24 hours, sends out a command telling each machine which accounts to work on.
Then I want each slave to return its data, and when every slave is finished the master program will combine the results and update the table accordingly. OR each slave could update the table independently, but I am not sure this is possible due to table locks (if anyone could comment on this it would be helpful as well).
Thanks for reading!
Edit: If you think this is too broad, note that I'm not looking for an exact answer; I'm just trying to find someone who has done anything like this before. Even just naming a technology or method I can research would help me a lot.
I've done a similar thing before and ended up using a master-slave design.
I had a master with the database of "jobs" and the slaves queried it to get their tasks.
In my case the process was something like this:
Slave queries the master for jobs
Master sends 50 jobs and changes their status in the DB, recording the slave's name
Slave finishes its jobs and tells the master
Master changes the status in the DB to complete and sends new tasks
Repeat until the queue is completed
This way I could add more slaves as the job queue grew, and they could have different performance: some of my slaves did 3 times more work than the slowest ones, depending on internet connection and page-loading times.
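A variant of this that avoids a separate master process is to let each slave claim its own batch directly from a shared database. A minimal sketch, assuming PostgreSQL and psycopg2, with an illustrative jobs table (id, account_id, status, worker); the SKIP LOCKED clause lets several slaves claim rows concurrently without grabbing the same ones, which also speaks to the table-lock concern:

    import psycopg2


    def claim_jobs(conn, slave_name, batch_size=50):
        # atomically mark a batch of pending jobs as ours and return them
        with conn:  # commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.execute(
                    """
                    UPDATE jobs SET status = 'running', worker = %s
                    WHERE id IN (
                        SELECT id FROM jobs
                        WHERE status = 'pending'
                        ORDER BY id
                        LIMIT %s
                        FOR UPDATE SKIP LOCKED
                    )
                    RETURNING id, account_id
                    """,
                    (slave_name, batch_size),
                )
                return cur.fetchall()


    def mark_done(conn, job_ids):
        with conn:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE jobs SET status = 'done' WHERE id = ANY(%s)",
                    (list(job_ids),),
                )

Each slave just loops: claim a batch, process it with Selenium, mark it done, and repeat until claim_jobs returns nothing.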
I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cron job the standard way of doing this sort of thing? Is there a better way or standard?
I would define "this sort of thing" as: anything that needs to happen automatically in the app when certain criteria are met, without a user interacting with said app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kinds of services that trigger tasks on a periodic basis and offer some additional control, for instance Celery's scheduler (celery beat). There seems to be a tutorial about building periodic tasks with celery here.
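As a rough idea of what that looks like, here is a minimal celery beat sketch; the app name, broker URL, module name (tasks.py), and task body are assumptions for illustration:

    from celery import Celery
    from celery.schedules import crontab

    app = Celery('billing', broker='redis://localhost:6379/0')


    @app.task
    def send_bills():
        # look up users who should be billed and create the Stripe charges
        ...


    app.conf.beat_schedule = {
        'send-bills-every-morning': {
            'task': 'tasks.send_bills',
            'schedule': crontab(hour=5, minute=0),  # every day at 05:00
        },
    }

You then run a beat process (celery -A tasks beat) alongside a worker, which gives you celery's retry and monitoring machinery instead of a bare shell script.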
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small/simple apps that use an internal timer, e.g. https://bitbucket.org/prologic/irclogger, which rotates its IRC log files once per day. Is this any better or more reliable? Not really; if the daemon/bot were to die prematurely or the system were to crash, what happens then? In this case it just gets started again and the logs continue to rotate at the next "day" interval.
I think two things are important here:
Reliability
Robustness
I am trying to set up some scheduled tasks for a Django app with celery, hosted on Heroku. Aside from not knowing how everything should be configured, what is the best way to approach this?
Let's say users can opt to receive a daily email at a time of their choosing.
Should I have a scheduled job that runs every, say, 5 minutes, looks up every user who wants to be emailed at that time, and then fires off the emails?
OR
Schedule a task for each user, when they set their preference. (Not sure how I would actually implement this yet)
It depends on how much accuracy you need. Do you want users to select the time down to the minute? The second? Or is letting them select the hour they wish to be emailed accurate enough?
If on the hour is accurate enough, then use a task that polls for users to mail every hour.
If your users need the mail to go out accurate to the second, then set a task for each user timed to complete on that second.
Everything in between comes down to personal choice. What are you more comfortable doing, and even more importantly: what produces the simplest code with the fewest failure modes?
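For the polling flavour, a minimal sketch, assuming a celery beat entry that runs this task every 5 minutes and a hypothetical Preference model with a send_time TimeField (all names here are illustrative, not from the original post):

    from datetime import datetime, timedelta

    from celery import shared_task


    @shared_task
    def send_scheduled_emails():
        from myapp.models import Preference  # hypothetical model

        start = datetime.now().replace(second=0, microsecond=0)
        end = start + timedelta(minutes=5)
        # users whose chosen time falls inside this 5-minute window
        # (a window crossing midnight would need an extra OR clause)
        due = Preference.objects.filter(
            send_time__gte=start.time(), send_time__lt=end.time())
        for pref in due:
            send_email.delay(pref.user_id)


    @shared_task
    def send_email(user_id):
        ...  # look up the user and send the daily email

The window must match the beat interval exactly, or users get skipped or double-mailed; that is one of the failure modes to weigh.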
I would suggest the first option (a scheduled job that looks up outstanding work) - it is easier to scale and manage. What if you have thousands of users? That is a lot of tasks JUST for sending emails.
If you use your database as the celery broker, you can use django-celery's built-in cron-like scheduling, which would allow you to create and destroy tasks dynamically. I don't like using the DB as my broker, though.
Also, you might want to check out chronos.
I'm building a web service with a ranking function.
I don't have powerful servers: the whole service will be hosted on a standard PC.
There could be times when many users (in this case many = ~100) refresh the ranking at once, so I want to do it in a way that won't let them crash the server.
Real-time refreshing is not required: I can show users a ranking generated some time earlier.
Generating the ranking itself is not a problem.
I can easily do this:
User.objects.filter(...).order_by('rank')
EDIT: More details:
I have some workers doing calculations.
When a worker finishes its work, it changes the rank field of some User instance.
You can assume each user performs actions leading to several (5-20) calculations, each causing a rank change for that user.
If updating the ranking is too long a task to do per-request, then here are a few solutions you could be using:
After something that updates a ranking happens, create an asynchronous task that will update the rankings but not block the request. You could use celery or gearman
Update periodically the rankings, using a celery periodic task or a cron job
Solution 1 is better performance-wise but harder to get right. Solution 2 is easier to do, but could be less optimal.
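A minimal sketch of solution 2, with a periodic task recomputing the ranking into Django's cache so requests only ever do a cheap cache read; the cache key and the top-100 cutoff are assumptions for illustration (the rank field is the one from the question):

    from celery import shared_task
    from django.contrib.auth.models import User
    from django.core.cache import cache


    @shared_task
    def refresh_ranking():
        # the expensive part runs here, off the request path
        ranking = list(
            User.objects.order_by('rank').values_list('id', 'rank')[:100])
        cache.set('ranking', ranking, timeout=None)  # keep until next refresh


    def get_ranking():
        # view helper: cheap cache read instead of the ORM query
        return cache.get('ranking', [])

Schedule refresh_ranking with celery beat (or trigger it from the workers after a rank change), and ~100 simultaneous readers only ever hit the cache.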