Need pythonic way to schedule one time jobs with arguments - python

The premise is that I have a script which checks a resource every morning, and retrieves times and URI of events, which will vary from day to day. I want to pass the time and URI location to a scheduler, so that a script designed to capture the event gets called at the event time, passing the location as a variable to the capture script.
At first glance crontab seems like the easiest way to do it, but every job is unique and will only run once, so it creates a lot of maintenance.

I don't have a suggestion for Python specifically, but given that you mentioned crontab as something you were considering, the "one-off" version of a crontab would be at.
'at' tutorial
'at' man page
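If the morning script is written in Python, it could hand each event to at through subprocess. This is only a minimal sketch, assuming a POSIX at that accepts -t with a [[CC]YY]MMDDhhmm timestamp; the script path, the example URI and the function names are hypothetical placeholders, not anything from the question:

```python
import shlex
import subprocess
from datetime import datetime


def build_at_job(when, script, uri):
    """Return (argv, stdin_text) for scheduling one run of `script` with `uri`."""
    # POSIX `at -t` takes a timestamp in [[CC]YY]MMDDhhmm form.
    argv = ["at", "-t", when.strftime("%Y%m%d%H%M")]
    # `at` reads the command to run from standard input. Quote the arguments
    # so characters such as ? or & in the URI are not interpreted by the shell.
    command = "python3 {} {}\n".format(shlex.quote(script), shlex.quote(uri))
    return argv, command


def schedule_capture(when, script, uri):
    """Submit the job to `at`; raises CalledProcessError if `at` rejects it."""
    argv, command = build_at_job(when, script, uri)
    subprocess.run(argv, input=command, text=True, check=True)


# Hypothetical event: capture a stream at 14:30 on 1 March 2030.
argv, command = build_at_job(
    datetime(2030, 3, 1, 14, 30),
    "/home/me/capture.py",
    "http://example.com/stream?id=1",
)
print(argv)
print(command, end="")
```

Calling schedule_capture with the same arguments would actually queue the job; each job runs once and then disappears from the queue, so there is nothing to clean up afterwards.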

Related

Run python script automatically every week (windows)

I wrote a little Python script that fetches data from a website using hard-coded credentials (I know it's bad, but that's not part of this question).
The website has new data every day, and I'm gathering a whole week of data and parsing it into a single .pdf.
I've already adjusted the script to always generate a PDF of last week's data by default (no parameters needed).
I'm kind of lazy and don't want to run the script by hand every week.
So is it possible to run the script at certain times, for example every Monday at 10am?
Sure, just use Windows' Task Scheduler. There you can create new tasks and have them run commands at whatever times or intervals you want. The Task Scheduler's GUI should be self-explanatory, but to be concrete for your example:
Configure the run time (weekly, monday, 10am) under triggers
Add a new action and give it your Python interpreter as the command and your script to be run as the argument
Configure the rest according to your needs
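The same weekly task can also be registered from the command line with schtasks instead of the GUI. Here is a rough sketch that builds the arguments in Python; the task name and script path are made-up examples, and it assumes the interpreter running this snippet is the one that should run the script:

```python
import subprocess
import sys


def build_weekly_task(name, script, day="MON", start="10:00"):
    """Return the schtasks argument list for a weekly task running `script`."""
    # /TR is the command line to run: the interpreter followed by the script.
    action = '"{}" "{}"'.format(sys.executable, script)
    return [
        "schtasks", "/Create",
        "/TN", name,       # task name shown in Task Scheduler
        "/TR", action,
        "/SC", "WEEKLY",   # schedule type
        "/D", day,         # day of the week
        "/ST", start,      # start time, 24-hour HH:mm
    ]


# Hypothetical task name and script location.
args = build_weekly_task("WeeklyPdfReport", r"C:\scripts\weekly_report.py")
print(args)

# On Windows, uncomment to actually register the task:
# subprocess.run(args, check=True)
```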

Persist Completed Pipeline in Luigi Visualiser

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.
Further, if in the visualiser I go directly to the last job's URL, it can't find any history that it ran: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I have verified that it ran successfully this morning.
One thing to note is that I have a cron that runs this pipeline every 15 minutes to check for a file on S3. If it exists, it runs, otherwise it stops. I'm not sure if that is causing the removal of tasks from the visualiser or not. I've noticed it generates a new PID every run, but I couldn't find a way to persist one PID/day in the docs.
So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?
Appreciate all the help
I'm not 100% positive if this is correct, but this is what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. I'm guessing this is how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look through luigi's source, the default is 600 seconds. For example:
luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)
If you configure the remove_delay setting in your luigi.cfg then it will keep the tasks around for longer.
[scheduler]
record_task_history = True
state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
remove_delay = 86400
Note, there is a typo in the documentation ("remove-delay" instead of "remove_delay"), which is being fixed under https://github.com/spotify/luigi/issues/2133

Writing a Python script that runs everyday till a specified date

I want to schedule a job (run a Python script) every day at a specific time until a specific date has been reached.
After researching a lot of Python schedulers, I thought that APScheduler was a good candidate for this.
This is an example snippet using APScheduler that starts a job and executes it every two hours, optionally starting after a specified date.
from datetime import datetime
from apscheduler.scheduler import Scheduler

# Start the scheduler
sched = Scheduler()
sched.start()

def job_function():
    print("Hello World")

# Schedule job_function to be called every two hours
sched.add_interval_job(job_function, hours=2)

# The same as before, but start after a certain time point
sched.add_interval_job(job_function, hours=2, start_date='2010-10-10 09:30')
How can I achieve the same and have an upper limit date after which the job should not be executed?
Any suggestions, within or outside APScheduler, are most welcome.
Thanks in advance.
Use a cron job that executes your script every two hours (cron is made specifically for things like this). In your script, you just look up the system date and check whether it's earlier than your given date. If it is, you execute the rest of your script; otherwise you quit.
You may also write additional code, so you get notified when the script is not actually executed anymore.
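The date check described here might look something like the following. It is only a sketch: the cutoff date, the function names and the placeholder job body are all invented for illustration, and cron itself would still be set up separately to call the script.

```python
from datetime import date

# Hypothetical cutoff: the job should no longer run on or after this date.
END_DATE = date(2016, 10, 10)


def should_run(today, end_date=END_DATE):
    """Return True while `today` is still earlier than the cutoff date."""
    return today < end_date


def main():
    if not should_run(date.today()):
        # This is also the place to send a notification that the job has expired.
        print("End date reached; skipping the job.")
        return 0
    print("Running the job.")  # placeholder for the real work
    return 0


main()
```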
I eventually found the interval trigger can take an end_date.
You can pass arguments for the trigger to add_job with trigger='interval':
sched.add_job(job_function, trigger='interval', hours=2, end_date='2016-10-10 09:30')
I think you may be using an older version of the software.

Automate Python Script

I'm manually running a Python script that fetches data in JSON format. How do I automate this script to run on an hourly basis?
I'm working on Windows 7. Can I use tools like Task Scheduler? If I can use it, what do I need to put in the batch file?
Can I use tools like Task scheduler?
Yes. Any tool that can run arbitrary programs can run your Python script. Pick the one you like best.
If I can use it, what do I need to put in the batch file?
What batch file? Task Scheduler takes anything that can be run, with arguments—a C program, a .NET program, even a document with a default app associated with it. So, there's no reason you need a batch file. Use C:\Python33\python.exe (or whatever the appropriate path is) as your executable, and your script's path (and its arguments, if any) as the arguments. Just as you do when running the script from the command line.
See Using the Task Scheduler in MSDN for some simple examples, and Task Scheduler Schema Elements or Task Scheduler Scripting Objects for reference (depending on whether you want to create the schedule in XML, or via the scripting interface).
You want to create an ExecAction with Path set to "C:\Python33\python.exe" and Arguments set to "C:\MyStuff\myscript.py", and a RepetitionPattern with Interval set to "PT1H". You should be able to figure out the rest from there.
As sr2222 points out in the comments, often you end up scheduling tasks frequently, and needing to programmatically control their scheduling. If you need this, you can control Task Scheduler's scripting interface from Python, or build something on top of Task Scheduler, or use a different tool that's a bit easier to get at from Python and has more helpful examples online, etc.—but when you get to that point, take a step back and look at whether you're over-using OS task scheduling. (If you start adding delays or tweaking times to make sure the daily foo1.py job never runs until 5 minutes after the most recent hourly foo0.py has finished its job, you're over-using OS task scheduling—but it's not always that obvious.)
May I suggest WinAutomation or AutoMate. These two do the exact same thing, except the UI is a little different. I prefer WinAutomation, because the scripts are a little easier to build.
Yes, you can use the Task Scheduler to run the script on an hourly basis.
To execute a python script via a Batch File, use the following code:
start path_to_python_exe path_to_python_file
Example:
start C:\Users\harshgoyal\AppData\Local\Continuum\Anaconda3\python.exe %UserProfile%\Documents\test_script.py
If Python is on the Windows PATH environment variable, then you can reduce the syntax to:
start python %UserProfile%\Documents\test_script.py
What I generally do is run the batch file once via Task Scheduler and within the python script I call a thread/timer every hour.
class threading.Timer(interval, function, args=None, kwargs=None)
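A threading.Timer fires only once, so to repeat every hour the timer has to be re-armed after each call. One possible way to do that is sketched below; RepeatingTimer and fetch_json are hypothetical names, not part of the standard library or of the question's script:

```python
import threading


class RepeatingTimer:
    """Call `function` every `interval` seconds until stop() is called."""

    def __init__(self, interval, function, args=None, kwargs=None):
        self.interval = interval
        self.function = function
        self.args = args or ()
        self.kwargs = kwargs or {}
        self._stopped = threading.Event()
        self._timer = None

    def _run(self):
        if self._stopped.is_set():
            return
        self.function(*self.args, **self.kwargs)
        self._schedule()  # re-arm for the next interval

    def _schedule(self):
        if self._stopped.is_set():
            return
        self._timer = threading.Timer(self.interval, self._run)
        self._timer.daemon = True  # do not keep the process alive on its own
        self._timer.start()

    def start(self):
        self._schedule()

    def stop(self):
        self._stopped.set()
        if self._timer is not None:
            self._timer.cancel()


# Usage (fetch_json is a placeholder for the function that does the work):
# timer = RepeatingTimer(3600, fetch_json)
# timer.start()
```

Because the timers are daemon threads, the main thread has to stay alive (for example by sleeping in a loop) for the hourly calls to keep happening. The interval is measured from the end of one call to the start of the next, so the schedule drifts by however long the function takes.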

How to use thread in Django

I want to check users' subscription dates periodically and send mail to users whose subscription is about to end (e.g. with two days remaining).
I think the best way is to use a thread and a timer to check the dates, but I have no idea how to call this function. I don't want to make a separate program or shell script; I want to integrate this procedure into my Django code. I tried to call the function in my settings.py file, but that does not seem to be a good idea: it calls the function and creates a thread every time settings is imported.
That's a case for a manage.py command called periodically from cron. Official doc about creating those commands. Here is something a bit more helpful.
If you want something simpler, then django-command-extensions has commands for managing Django jobs.
If you need more than just this one asynchronous job, have a look at Celery.
Using django-cron is much easier and simpler.
EDIT: Added a tip
from django.core.mail import send_mass_mail
from django_cron import cronScheduler, Job

class sendMail(Job):
    # Run every 300 seconds (5 minutes)
    run_every = 300

    def job(self):
        # This will be executed every 5 minutes
        datatuple = check_subscription_finishing()
        send_mass_mail(datatuple)

# and just register it
cronScheduler.register(sendMail)
