I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cronjob the standard way of doing this sort of thing? Is there a better way/standard?
I would define "this sort of thing" as anything that needs to happen automatically in the app when certain criteria are met, without a user interacting with the app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kinds of services that trigger tasks on a periodic basis and offer some additional control, for instance Celery's scheduler. There seems to be a tutorial about building periodic tasks with Celery here.
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small/simple apps that use an internal timer, e.g. https://bitbucket.org/prologic/irclogger, which rotates its IRC log files once per day. Is this any better or more reliable? Not really. If the daemon/bot were to die prematurely or the system were to crash, what happens then? In this case it just gets started again and logs continue to rotate at the next "day" interval.
I think two things are important here:
Reliability
Robustness
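To make the reliability point a bit more concrete, here is a minimal sketch of a billing run that is safe to start again after a crash or a missed cron run. The `users` and `bills` tables, the `run_billing` name and the `bill_user` callback are all made up for illustration; `bill_user` stands in for whatever talks to Stripe and sends the email.

```python
import sqlite3
from datetime import date


def run_billing(conn, today, bill_user):
    """Bill each user at most once for `today`; re-running skips finished users.

    Assumes a hypothetical `users(id)` table already exists.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bills ("
        " user_id INTEGER, bill_date TEXT,"
        " PRIMARY KEY (user_id, bill_date))"
    )
    user_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
    billed = []
    for user_id in user_ids:
        try:
            # The primary key rejects a second bill for the same user and day.
            conn.execute(
                "INSERT INTO bills VALUES (?, ?)", (user_id, today.isoformat())
            )
        except sqlite3.IntegrityError:
            continue  # already billed on an earlier (possibly interrupted) run
        try:
            bill_user(user_id)  # hypothetical: create the charge, send the email
        except Exception:
            conn.rollback()  # forget the marker so the next run retries this user
            raise
        conn.commit()
        billed.append(user_id)
    return billed
```

With something like this, it matters less whether cron, Celery or an internal timer triggers the run, because running it twice does not bill anyone twice. A crash between `bill_user` and `commit` is still a gap in this sketch.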
Related
I have a REST API service that I want to call on at scheduled times. Currently, I call the API using a basic cronjob on a server every day at 8:00am.
I want to scale up and allow my users to schedule a time they would like to receive the notification from my API call. How could I go about doing this? I know I would need to keep a database of user requests and their associated times, however I am not sure if continuing to use cron is the best way about this... (I would prefer not to use third party services in order to keep costs down)
I am having trouble wrapping my head around this. If anybody has any advice, that would be much appreciated!
If the time frame is going to be something simple, like once-per-day, once-per-week, etc., using the cron.d folder is a fairly trivial and in my opinion appropriate solution.
The simplest way would be each user having their own file with a simple one-line cron statement that reflects their selected time. When the user selects their time, part of your service creates the correct file for that user. You can go on from there.
Whether or not you put them in a database is really a question of your own system design; given a proper file naming scheme, you could feasibly do this without having to keep that requested time in persistent storage.
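As a rough sketch of the per-user file idea, the two helpers below build the file name and the one-line cron statement for a user. The command path `/usr/local/bin/notify`, the `--user-id` flag, the `www-data` account and the `notify-user-N` naming scheme are assumptions for illustration, not anything from the question.

```python
def cron_filename(user_id):
    """File name for one user's entry in the cron.d folder.

    Letters, digits and hyphens only: some cron implementations skip
    files in cron.d whose names contain other characters such as dots.
    """
    return f"notify-user-{int(user_id)}"


def cron_entry(user_id, hour, minute, command, run_as="www-data"):
    """One-line cron.d statement that runs `command` daily at hour:minute.

    cron.d lines carry a user field between the schedule and the command.
    """
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return f"{minute} {hour} * * * {run_as} {command} --user-id {int(user_id)}\n"


if __name__ == "__main__":
    # Hypothetical user 42 asked for 08:30 every day.
    print(cron_filename(42))
    print(cron_entry(42, 8, 30, "/usr/local/bin/notify"), end="")
```

When the user changes their time, the service rewrites that one file; when they unsubscribe, it deletes it. The file name alone tells you which user it belongs to, which is what makes the "no database" variant feasible.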
I'm working on a Django web app. The app includes messages that will self-delete after a certain amount of time. I'm using timezone.now() as the sent time and the user inputs a timedelta to display the message until. I'm checking to see if the message should delete itself by checking if current time is after sent time plus the time delta. Will this place a heavy load on the server? How frequently will it automatically check? Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Thanks
How frequently will it automatically check?
Who is "it"? If you mean "the Django process", then it will NOT check anything by itself. You will have to use either a cronjob or some async queue to take care of removing "dead" messages.
Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Well yes, cf. above. Cronjobs are the simplest solution. Async queues (like Celery) are much more heavyweight, but if you have a lot of out-of-band processing (processes you want to launch from the request/response cycle BUT execute outside of it) then they are the way to go.
Will this place a heavy load on the server?
It's totally impossible to answer this. It depends on your exact models, the way you write the "check & clean" code, and, of course, data volumes. But using either a cronjob or an async queue, this won't run within the Django server process(es) itself, and can even be run on another server as long as it can access the database. IOW the load will be mostly on the database (well, on the server running the process too of course, but given your problem description a simple SQL delete query should be enough).
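To give an idea of how small the "check & clean" step can be, here is a sketch using plain `sqlite3` rather than the Django ORM. The `messages` table and its `sent_at` and `ttl_seconds` columns (both in seconds) are assumed names, not the asker's actual models.

```python
import sqlite3
import time


def delete_expired(conn, now=None):
    """Delete messages whose display period has ended; return how many were removed.

    Assumes a hypothetical table messages(sent_at REAL, ttl_seconds REAL),
    where sent_at is a Unix timestamp.
    """
    now = time.time() if now is None else now
    cur = conn.execute(
        "DELETE FROM messages WHERE sent_at + ttl_seconds <= ?", (now,)
    )
    conn.commit()
    return cur.rowcount
```

A cronjob calling this once a minute issues a single query per run, so the cost is mostly whatever the database needs to find the expired rows.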
I am trying to set up some scheduled tasks for a Django app with Celery, hosted on Heroku. Aside from not knowing how everything should be configured, what is the best way to approach this?
Let's say users can opt to receive a daily email at a time of their choosing.
Should I have a scheduled job that runs every, say, 5 minutes, looks up every user who wants to be emailed at that time and then fires off the emails?
OR
Schedule a task for each user, when they set their preference. (Not sure how I would actually implement this yet)
It depends on how much accuracy you need. Do you want users to select the time down to the minute? The second? Or will allowing them to select the hour they wish to be emailed be enough?
If on the hour is accurate enough, then use a task that polls for users to mail every hour.
If your users need the mail to go out accurate to the second, then set a task for each user timed to complete on that second.
Everything in between comes down to personal choice. What are you more comfortable doing, and even more importantly: what produces the simplest code with the fewest failure modes?
I would suggest the first option (scheduled job that looks up outstanding jobs) - easier to scale and manage. What if you have 1000s of users - that is a lot of tasks JUST for sending emails.
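A minimal sketch of that first option, assuming each user's preference is stored as a time of day and the job knows when it last ran. The `users_due` name and the `preferences` mapping are made up for illustration, and the datetimes are naive (no time zone handling).

```python
from datetime import datetime, time, timedelta


def users_due(preferences, window_start, window_end):
    """Return ids of users whose daily send time falls in [window_start, window_end).

    `preferences` maps user id -> datetime.time (hypothetical storage format).
    """
    due = []
    for user_id, send_time in preferences.items():
        # Try the send time on every calendar day the window touches,
        # so a window that crosses midnight is handled too.
        day = window_start.date()
        while day <= window_end.date():
            candidate = datetime.combine(day, send_time)
            if window_start <= candidate < window_end:
                due.append(user_id)
                break
            day += timedelta(days=1)
    return due


if __name__ == "__main__":
    prefs = {1: time(8, 0), 2: time(8, 3), 3: time(8, 5)}
    # A job that last ran at 08:00 and is running again at 08:05.
    print(users_due(prefs, datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 5)))
```

Passing the previous run time as `window_start` (instead of "now minus 5 minutes") means a late or skipped run still picks up everyone it missed, and the half-open window keeps a user from being emailed by two consecutive runs.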
If you use your database as the Celery broker, you can use django-celery's built-in cron-like scheduling, which would allow you to create and destroy tasks dynamically. I don't like using the DB for my broker, though.
Also, you might want to check out chronos
I'm developing a Python application that "talks" to the user and performs tasks based on what the user says (e.g. User: "Do I have any new Facebook messages?", answer: "Yes, you have 2 new messages. Would you like to see them?"). Functionality like integration with Facebook or Twitter is provided by plugins. Based on predefined parsing rules, my application calls the plugin with the parsed arguments and uses its response. The application needs to be able to answer multiple queries from different users at the same time (or practically the same time).
Currently, I need to call a function, "Respond", with the user input as argument. This has some disadvantages, however:
i) The application can only "speak when it is spoken to". It can't decide to query Facebook for new messages and tell the user whether there are any without being told to do that.
ii) Having a conversation with multiple users at a time is very hard, because the application can only do one thing at a time: if Alice asks the application to check her Facebook for new messages, Bob can't communicate with the application.
iii) I can't develop (and use) plugins that take a lot of time to complete, e.g. downloading a movie, because the application isn't able to do anything while the previous task isn't completed.
Multithreading seems like the obvious way to go here, but I'm worried that creating and using 500 threads at a time dramatically impacts performance, so using one thread per query (a query is a statement from the user) doesn't seem like the right option.
What would be the right way to do this? I've read a bit about Twisted, and the "reactor" approach seems quite elegant. However, I'm not sure how to implement something like that in my application.
I didn't really understand what sort of application it's going to be, but I tried to answer your questions:
create a thread that queries, and then sleeps for a while
create a thread for each user, and close it when the user is gone
create a thread that downloads and stops
After all, there aren't going to be 500 threads.
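If the thread count is still a worry, one variation on the above (not what this answer proposes, just a related technique from the standard library) is a fixed-size thread pool, so the number of threads stays bounded no matter how many queries arrive. The `respond` function below is a placeholder for the application's own "Respond" function, and `max_workers=8` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def respond(user, text):
    # Placeholder for the application's real "Respond" function.
    return f"{user}: handled {text!r}"


def handle_all(queries, max_workers=8):
    """Run respond() for each (user, text) pair on at most `max_workers` threads.

    Results come back in the same order as the queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(respond, user, text) for user, text in queries]
        return [future.result() for future in futures]


if __name__ == "__main__":
    print(handle_all([("alice", "check facebook"), ("bob", "hello")]))
```

Alice's slow query then occupies one worker while Bob's runs on another, and a long download only ties up a single thread of the pool.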
This seems like a simple question, but I am having trouble finding the answer.
I am making a web app which would require the constant running of a task.
I'll use sites like Pingdom or Twitterfeed as an analogy. As you may know, Pingdom checks uptime, so it is constantly checking websites to see if they are up, and Twitterfeed checks RSS feeds to see if they've changed and then tweets that. I too need to run a simple script to cycle through URLs in a database and perform an action on them.
My question is: how should I implement this? I am familiar with cron, currently using it to do my server backups. Would this be the way to go?
I know how to make a Python script which runs indefinitely, starting back at the beginning with the next URL in the database when I'm done. Should I just run that on the server? How will I know it is always running and doesn't crash or something?
I hope this question makes sense and I hope I am not repeating someone else or anything.
Thank you,
Sam
Edit: To be clear, I need the task to run constantly. As in, check URL 1 in the database, check URL 2 in the database, check URL 3 and, when it reaches the last one, go right back to the beginning. Thanks!
If you need a task that runs repeatedly and can be started from the command line, that is exactly what cron is ideal for.
I don't see any demerits of this approach.
Update:
Okay, I saw the issue somewhat differently. Now I see several solutions:
a) run the cron task at set intervals and let it process the data once per run; the next batch is processed on the next run; use PIDs/database/semaphores to avoid parallel processes;
b) update the processes that insert/update data in the database, so that the information is processed when it is inserted/updated;
c) write a daemon process which will reside in memory and check the data in real time.
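For the "avoid parallel processes" part of the first solution, one common approach on Unix-like systems is an advisory file lock taken at the start of each cron run. This is only a sketch: the `try_lock` name and the lock file path are made up, and `fcntl` is not available on Windows.

```python
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive lock on `path` without waiting.

    Returns an open file descriptor holding the lock, or None if another
    run already holds it. The lock is released when the descriptor is
    closed or the process exits, so a crashed run leaves no stale lock.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


if __name__ == "__main__":
    lock = try_lock("/tmp/url-checker.lock")  # hypothetical lock file path
    if lock is None:
        print("previous run still in progress, exiting")
    else:
        try:
            pass  # process one batch of URLs here
        finally:
            os.close(lock)
```

If a run takes longer than the cron interval, the next invocation simply exits instead of working on the same URLs in parallel.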
cron would definitely be a way to go with this, as well as any other task scheduler you may prefer.
The main point is found in the title to your question:
Run a repeating task for a web app
The background task and the web application should be kept separate. They can share code, they can share access to a database, but they should be separate and discrete application contexts. (Consider them as separate UIs accessing the same back-end logic.)
The main reason for this is that web applications and background processes are architecturally very different and aren't meant to be mixed. Consider the structure of a web application being held within a web server (Apache, IIS, etc.). When is the application "running"? When is it "on"? It's not really a running task. It's a service waiting for input (requests) to handle, generating output (responses) and then going back to waiting.
Web applications are for responding to requests. Scheduled tasks or daemon jobs are for running repeated processes in the background. Keeping the two separate will make your management of the two a lot easier.