I have a Django app which have invitations stored in a db (mysql for now, but may go Postgres). These invitations have expiration dates. I want the invitation removed from the database when the expiration date arrives. I want this done from the Django side as opposed to directly from the database because I need the proper notifications / cleanup to occur which the app handles. I guess I could do have a cron job run every once in a while and have it hit the API but I really wanted to have all app components within the app and not rely on OS function (cron).
cron is the correct way to run scheduled jobs. That's the fundamental philosophy of unix-like systems: components that know how to do a single thing well, and cron is the thing that knows how to trigger jobs at a certain time.
In terms of what the job itself should be, the easiest thing is to write a custom management command.
Related
I'm using Prefect to automatize my flows (python scripts). Once running, some data get persisted to a postgresql database, problem, the size of pg_data gets rapidely out of hands (~20Gb) and I was wondering if there was a way to reduce the amount of data stored to pg_data when running an agent or if there was a way to automatically clean the directory.
Thanks in advance for your help,
best,
Christian
I assume you are running Prefect Server and you want to clean up the underlying database instance to save space? If so, there are a couple of ways you can clean up the Postgres database:
you can manually delete old records, especially logs from the flow run table using DELETE FROM in SQL,
you can do the same in an automated fashion, e.g. some users have an actual flow that runs on schedule and purges old data from the database,
alternatively, you can use the open-source pg_cron job scheduler for Postgres to schedule such DB administration tasks,
you can also do the same using GraphQL: you would need to query for flow run IDs of "old" flow runs using the flow_run query, and then execute delete_flow_run mutation,
lastly, to be more proactive, you can reduce the number of logs you generate by generally logging less (only logging what's needed) and setting the log level to a lower category, e.g. instead of using DEBUG logs on your agent, switching to INFO should significantly reduce the amount of space consumed by logs in the database.
I'm working on a Django web app. The app includes messages that will self-delete after a certain amount of time. I'm using timezone.now() as the sent time and the user inputs a timedelta to display the message until. I'm checking to see if the message should delete itself by checking if current time is after sent time plus the time delta. Will this place a heavy load on the server? How frequently will it automatically check? Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Thanks
How frequently will it automatically check?
who is "it" ? If you mean "the django process", then it will NOT check anything by itself. You will have to use either a cronjob or some async queue to take care of removing "dead" messages.
Is there a way that I can tell it to check once a minute (or otherwise set the frequency)?
Well yes, cf above. cronjobs are the simplest solution, async queues (like celery) are much more heavy-weight but if you have a lot of "off-band" processing (processes you want to launch from the request/response cycle BUT execute outside of it) then it's the way to go.
Will this place a heavy load on the server?
It's totally impossible to answer this. It depends on your exact models, the way you write the "check & clean" code, and, of course, data volumes. But using either a cronjob or an async queue this won't run within the django server process(es) itself, and can even be runned on another server as long as it can access the database. IOW the load will be on the database mostly (well, on the server running the process too of course but given your problem description a simple SQL delete query should be enough so..).
I am using Flask.
I am currently using a fabfile to check which users should get a bill and I set up a cron job to run the fabfile every morning at 5am. This automatically creates bills in Stripe and in my database and sends out emails to the users to inform them. This could be used for birthday reminders or anything else similar.
Is setting up a cronjob the standard way of doing this sort of thing? Is there a better way/standard?
I would define "this sort of thing" as. Anything that needs to happen automatically in the app when certain criteria are met without a user interacting with said app.
I could not find much when I googled this.
Using cron is in effect the most straightforward way of doing it. However, there are other kind of services that trigger tasks on a periodic basis and offer some additional control. For instance, Celery's scheduler. There seems to be a tutorial about building periodic tasks with celery here.
What I think you have to ask yourself is:
Is a cron job the most reliable way of billing your customers?
I've written small/simple apps that use an internal timer. e.g: https://bitbucket.org/prologic/irclogger which roates it's irc log files once per day. Is this any better or more reliable? Not really; if the daemon/bot were to die prematurely or the system were to crash; what happens then? In this case it just gets started again and logs continue to rorate at the next "day" interval.
I think two things are important here:
Reliability
Robustness
I am trying to set up some scheduled tasks for a Django app with celery, hosted on heroku. Aside from not know how everything should be configured, what is the best way to approach this?
Let's say users can opt to receive a daily email at a time of their choosing.
Should I have a scheduled job that run every, say 5 minutes. Looks up every user who wants to be emailed at that time and then fire off the emails?
OR
Schedule a task for each user, when they set their preference. (Not sure how I would actually implement this yet)
It depends on how much accuracy you need. Do you want users to select the time down to the minute? second? or will allowing them to select the hour they wish to be emailed be enough.
If on the hour is accurate enough, then use a task that polls for users to mail every hour.
If your users need the mail to go out accurate to the second, then set a task for each user timed to complete on that second.
Everything in between comes down to personal choice. What are you more comfortable doing, and even more importantly: what produces the simplest code with the fewest failure modes?
I would suggest the first option (scheduled job that looks up outstanding jobs) - easier to scale and manage. What if you have 1000s of users - that is a lot of tasks JUST for sending emails.
If you use your database as celery broker, you can use django-celery's built in cron-like scheduling, which would allow you to create and destroy tasks dynamically. I don't like using the DB for my broker, though.
Also, you might want to check out chronos
I have a django project with various apps, which are completely independent. I'd like to make them run each one in their own process, as some of them spawn background threads to precalculate periodically some data and now they are competing for the CPU (the machine has loads of cores, but you know, the GIL and such...)
So, is there an easy way to split automatically the project into different ones, or at least to make each app live in its own process?
You can always have different settings files, but that would be like having multiple projects and even multiple endpoints. With some effort you could configure a reverse proxy to forward to the right Django server, based on the request's path and so on, but I don't think that's what you want and it would be an ugly solution to your problem.
The solution to this is to move the heavy processing to a jobs queue. A lot of people and projects prefer Celery for this.
If that seems like overkill for some reason, you can always implement your own based on simple cron jobs. You can take a look at my small project that does this.
The simplest of the simple is probably to write a custom management command that observes given model (database table) for new entries and processes them. The model is written to by e.g. Django view and the management command is launched periodically from cron (e.g. every 5 minutes).
Example: user registers on the site, but the account creation is an expensive operation (allocating some space, pinging remote services etc.). Therefore you just write a new record to AccountRequest table (AccountRequest.objects.create(...)). Then, cron periodically launches your management script (./manage.py account_creator), which checks for new AccountRequest-s (AccountRequest.objects.filter(unprocessed=True)), does its job and marks those requests as processed.