Detecting change in website Python3 - python

I am pulling data from a webpage that updates every morning, but at a different time each day. I would like to know how to get a script to run every 10 minutes or so to check whether the website has been updated. I was thinking of somehow using cron, but I don't understand it very well. Thanks for your help.

Have you tried using the package APScheduler? It makes it fairly simple to schedule tasks. Here's the documentation.
To do a scheduled task, this is all that needs to be done (for something basic):
from apscheduler.schedulers.background import BackgroundScheduler
from pytz import utc
scheduler = BackgroundScheduler()
scheduler.configure(timezone=utc)
def print_hello():
    print("Hello, this is a scheduled event!")
job = scheduler.add_job(print_hello, 'interval', minutes=1, max_instances=10)
scheduler.start()
Note, however, that I had a small bug when I first tried to use the library, but an explanation of how to fix that is here.
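The scheduled job still has to decide whether the page changed. One way to do that, sketched here with the standard library only, is to hash the downloaded body and compare it with the hash from the previous run. The names fetch, content_digest and ChangeDetector are hypothetical and not part of APScheduler, and hashing the whole body is an assumption that suits pages without rotating content such as ads or timestamps.

```python
import hashlib
import urllib.request


def fetch(url, timeout=30):
    """Download the raw bytes of the page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def content_digest(body):
    """Return a SHA-256 hex digest of the page body."""
    return hashlib.sha256(body).hexdigest()


class ChangeDetector:
    """Remembers the last digest and reports when a new body differs."""

    def __init__(self):
        self.last_digest = None

    def has_changed(self, body):
        digest = content_digest(body)
        # The first observation only sets the baseline; it is not a change.
        changed = self.last_digest is not None and digest != self.last_digest
        self.last_digest = digest
        return changed
```

A job registered with the 'interval' trigger and minutes=10 could then call detector.has_changed(fetch(url)) and act only when it returns True.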

Related

Apscheduler calling function too quickly

Here is my scheduler.py file:
from apscheduler.schedulers.background import BackgroundScheduler
from django_apscheduler.jobstores import DjangoJobStore, register_events
from django.utils import timezone
from django_apscheduler.models import DjangoJobExecution
import sys
# This is the function you want to schedule - add as many as you want and then register them in the start() function below
def hello():
    print("Hello")
def start():
    scheduler = BackgroundScheduler()
    scheduler.add_jobstore(DjangoJobStore(), "default")
    # run this job every 10 seconds
    scheduler.add_job(hello, 'interval', seconds=10, jobstore='default')
    register_events(scheduler)
    scheduler.start()
    print("Scheduler started...", file=sys.stdout)
My Django app runs fine on localhost. I'm simply attempting to print 'hello' every 10 seconds in the terminal, but it sometimes prints three or four times at once. Why is this? This was just a base template to help understand apscheduler.
The primary reason this might be happening is if you are running your development server without the --noreload flag set, which will cause the scheduler to be called twice (sometimes more).
When you run your server in development, try it like:
python manage.py runserver localhost:8000 --noreload
And see what happens. If it keeps happening, the interval may be too short: by the time your system gets around to a run, the previous one may still be executing (even though it is a very short function). django_apscheduler stores all pending, overdue, and run jobs in the database, so it has to write a record for the job after each run. Try lengthening the interval and see what happens.
If neither of those things works, please post the rest of the code you are using and I will update my answer with other options. I've had a similar issue in the past, but it was solved by setting the --noreload option. When you run it in production behind a regular web server with DEBUG=False, it should resolve itself as well.
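If passing --noreload on every run is inconvenient, a guard around start() is one possible alternative. The sketch below assumes that the development autoreloader sets the RUN_MAIN environment variable to "true" only in the child process that actually serves requests; verify that against your Django version. should_start_scheduler is a hypothetical helper, and the caller has to say whether the reloader is in use.

```python
import os


def should_start_scheduler(environ=None, reloader_active=True):
    """Return True only in the process that should own the scheduler."""
    env = os.environ if environ is None else environ
    if not reloader_active:
        # With --noreload, or behind a production server, there is one process.
        return True
    # Assumption: the reloader's serving child has RUN_MAIN set to "true",
    # while the parent file-watcher process does not.
    return env.get("RUN_MAIN") == "true"
```

Calling start() only when this returns True keeps the watcher process from registering a second copy of the job.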

How to convert Schedule to APScheduler module?

I'm a python newbie and need a little help with my code.
I need to convert my code to use APSCHEDULER and not SCHEDULE module because I need to use hours, minutes, seconds which SCHEDULE is not capable of. How can I convert this script to use hh:mm:ss ?
Note: I did read the documentation and examples, and oddly enough there isn't anything about scheduling a job to run at exactly 7:45:30 PM. All the examples appeared to be geared towards cron jobs running every seconds=3.
Here is my code for schedule-module
import subprocess
import time
import schedule
def job1():
    subprocess.call("netsh interface portproxy add v4tov4 listenaddress=192.168.0.153 listenport=1101 connectaddress=192.168.0.153 connectport=809 protocol=tcp", shell=True)
def job2():
    subprocess.call("netsh interface portproxy reset", shell=True)
schedule.every().day.at("06:00").do(job1)
schedule.every().day.at("07:00").do(job2)
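One possible direction, sketched under the assumption of APScheduler 3.x: its cron trigger accepts hour, minute and second fields, so an "HH:MM:SS" string can be split into those keyword arguments. cron_fields is a hypothetical helper, and the add_job call appears only as a comment because it needs APScheduler installed.

```python
from datetime import datetime


def cron_fields(hms):
    """Turn an 'HH:MM:SS' string into keyword arguments for a cron trigger."""
    t = datetime.strptime(hms, "%H:%M:%S").time()
    return {"hour": t.hour, "minute": t.minute, "second": t.second}


# With APScheduler 3.x, the fields would plug into the cron trigger like this:
#   scheduler.add_job(job1, "cron", **cron_fields("19:45:30"))
```

Parsing with strptime also rejects malformed times such as "25:00:00" before they reach the scheduler.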

Writing a Python script that runs everyday till a specified date

I want to schedule a job (run a python script) everyday at a specific time till a specific date has been reached.
Researching on a lot of Pythonic schedulers, I thought that APScheduler was a good candidate to get around this.
This is an example snippet using APScheduler that starts a job and executes it every two hours after a specified date.
from datetime import datetime
from apscheduler.scheduler import Scheduler
# Start the scheduler
sched = Scheduler()
sched.start()
def job_function():
    print("Hello World")
# Schedule job_function to be called every two hours
sched.add_interval_job(job_function, hours=2)
# The same as before, but start after a certain time point
sched.add_interval_job(job_function, hours=2, start_date='2010-10-10 09:30')
How can I achieve the same and have an upper limit date after which the job should not be executed?
Any suggestions that revolve within and outside the APScheduler are most welcome.
Thanks in advance.
Use a cron job that executes your script every two hours (cron is made specifically for things like this). In your script, look up the system date and check whether it is earlier than your given date. If it is, execute the rest of your script; otherwise, quit.
You may also write additional code, so you get notified when the script is not actually executed anymore.
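The date check described above could look roughly like this. It is a minimal sketch: END_DATE is a hypothetical cutoff taken from the example date in the question, and should_run and main are illustrative names.

```python
import sys
from datetime import date

# Hypothetical cutoff: the job stops running once this date is reached.
END_DATE = date(2016, 10, 10)


def should_run(today, end_date=END_DATE):
    """Return True while today is strictly earlier than the cutoff."""
    return today < end_date


def main():
    if not should_run(date.today()):
        # Past the cutoff: exit quietly so cron can keep invoking the script.
        sys.exit(0)
    print("Hello World")  # the real work would go here
```

cron keeps launching the script on schedule, and the script itself decides whether there is still anything to do.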
I eventually found the interval trigger can take an end_date.
You can pass arguments for the trigger to add_job with trigger='interval':
sched.add_job(job_function, trigger='interval', hours=2, end_date='2016-10-10 09:30')
I think you may be using an older version of the software.

How to perform periodic task with Flask in Python

I've been using Flask to provide a simple web API for my k8055 USB interface board; fairly standard getters and putters, and Flask really made my life a lot easier.
But I want to be able to register changes of state as, or near when, they happen.
For instance, if I have a button connected to the board, I can poll the API for that particular port. But if I wanted the outputs to directly reflect the inputs, whether or not someone was talking to the API, I would have something like this:
while True:
    board.read()
    board.digital_outputs = board.digital_inputs
    board.read()
    time.sleep(1)
And every second, the outputs would be updated to match the inputs.
Is there any way to do this kind of thing under Flask? I've done similar things in Twisted before but Flask is too handy for this particular application to give up on it just yet...
Thanks.
For my Flask application, I contemplated using the cron approach described by Pashka in his answer, the schedule library, and APScheduler.
I found APScheduler simple and well suited to running periodic tasks, so I went ahead with it.
Example code:
from flask import Flask
from apscheduler.schedulers.background import BackgroundScheduler
app = Flask(__name__)
def test_job():
    print('I am working...')
scheduler = BackgroundScheduler()
job = scheduler.add_job(test_job, 'interval', minutes=1)
scheduler.start()
You could use cron for simple tasks.
Create a flask view for your task.
# a separate view for the periodic task
@app.route('/task')
def task():
    board.read()
    board.digital_outputs = board.digital_inputs
    return 'OK'  # a Flask view must return a response
Then using cron, download from that url periodically
# cron task to run each minute
0-59 * * * * run_task.sh
Where run_task.sh contents are
wget http://localhost/task
Cron cannot run a job more often than once a minute. If you need a higher frequency (say, every 5 seconds, which is 12 times per minute), you must do it in run_task.sh in the following way:
# loop 12 times with a delay
for i in 1 2 3 4 5 6 7 8 9 10 11 12
do
    # download the URL in the background so it does not affect the delay interval much
    wget -b http://localhost/task
    sleep 5s
done
For some reason, Antony's code wasn't working for me. I didn't get any error messages or anything, but the test_job function wouldn't run.
I was able to get it working by installing Flask-APScheduler and then using the following code, which is a blend of Antony's code and the example from this Techcoil article.
from flask import Flask
from flask_apscheduler import APScheduler
app = Flask(__name__)
def test_job():
    print('I am working...')
scheduler = APScheduler()
scheduler.init_app(app)
scheduler.start()
scheduler.add_job(id='test-job', func=test_job, trigger='interval', seconds=1)
No, Flask has no built-in task support, but you can use flask-celery or simply run your function in a separate thread (or greenlet).
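The separate-thread option can be sketched with the standard library alone. run_periodically is a hypothetical helper; it assumes a daemon thread is acceptable and that func is safe to call from outside the request thread.

```python
import threading


def run_periodically(func, interval_seconds, stop_event):
    """Call func every interval_seconds in a daemon thread until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout and True once stop is requested,
        # so the loop doubles as an interruptible sleep.
        while not stop_event.wait(interval_seconds):
            func()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

For the board example, func would be a function that reads the board and copies the inputs to the outputs, started once before app.run(). Using an Event rather than time.sleep lets the loop stop promptly on shutdown.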

How can I schedule a Task to execute at a specific time using celery?

I've looked into PeriodicTask, but the examples only cover making it recur. I'm looking for something more like cron's ability to say "execute this task every Monday at 1 a.m."
Use
YourTask.apply_async(args=[some, args, here], eta=when)
And at the end of your task, reschedule it to the next time it should run.
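The value passed as eta (when in the snippet above) has to be computed somewhere. A minimal sketch of that calculation for "every Monday at 1 a.m." follows; next_weekday_at is a hypothetical helper, and it works on naive datetimes, so time zone handling is left as an assumption for the caller to match against the Celery configuration.

```python
from datetime import datetime, timedelta


def next_weekday_at(now, weekday, hour, minute=0):
    """Return the first moment strictly after now on weekday (Monday=0) at hour:minute."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # Move forward to the requested weekday (0 to 6 days ahead).
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # Already past that time this week: use next week's slot.
        candidate += timedelta(days=7)
    return candidate
```

At the end of the task, the rescheduling call would then use eta=next_weekday_at(datetime.now(), 0, 1).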
The recently released version 1.0.3 supports this now, thanks to Patrick Altman!
Example:
from celery.task.schedules import crontab
from celery.decorators import periodic_task
@periodic_task(run_every=crontab(hour=7, minute=30, day_of_week="mon"))
def every_monday_morning():
    print("This runs every Monday morning at 7:30a.m.")
See the changelog for more information:
http://celeryproject.org/docs/changelog.html
I have just submitted a patch to add a ScheduledTask to accomplish a small bit of time based scheduling versus period based:
https://github.com/celery/celery/commit/e8835f1052bb45a73f9404005c666f2d2b9a9228
While @asksol's answer still holds, the API has been updated. For Celery 4.1.0, I have to import crontab and periodic_task as follows:
from celery.schedules import crontab
from celery.task import periodic_task
As you can read in this tutorial, you can create a PeriodicTask. I assume that if you need to run a task at 1 a.m. on Monday morning, it is because you want to run a long CPU- or memory-intensive operation; remember that Celery uses AMQP to enqueue tasks.
