I have written a scraper that scrapes HTML and then uses an API to get some data. Since the code is very long, I haven't posted it here. I have implemented a random sleep and use it within my code to stay under the throttle limit. But I want to make sure I don't overrun this code, so my idea is to run it for 3-4 hours, take a breather, and then run it again. I haven't done anything like this in Python. I tried searching but am not really sure where to start, so it would be great to get some guidance on this. If Python has a specific module for this, a link to it would be a great help.
Also, is this relevant? I don't think I need this level of complication.
Suggestions for a Cron like scheduler in Python?
I have a function for every single scraping task, and a main method that calls all of those functions.
You can use a threading.Timer object to schedule an interrupt signal to the main thread after the time is exceeded:
import _thread  # this module is named "thread" in Python 2
import threading
def longjob():
    try:
        # do your job
        while True:
            print('*', end=' ')
    except KeyboardInterrupt:
        # do your cleanup
        print('ok, giving up')
def terminate():
    print('sorry, pal')
    # raises KeyboardInterrupt in the main thread
    _thread.interrupt_main()
time_limit = 5  # terminate in 5 seconds
threading.Timer(time_limit, terminate).start()
longjob()
Put this in your crontab and run every time_limit + 2 minutes.
You could just note the time you have started and each time you want to run something make sure you haven't exceeded the given maximum. Something like this should get you started:
from datetime import datetime
MAX_SECONDS = 3600
# note the time you have started
start = datetime.now()
while True:
    current = datetime.now()
    diff = current - start
    # total_seconds() covers the whole interval; .seconds alone wraps around after 24 hours
    if diff.total_seconds() >= MAX_SECONDS:
        # break the loop after MAX_SECONDS
        break
    # MAX_SECONDS not exceeded, run more tasks
    scrape_some_more()
Here's the link to the datetime module documentation.
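The run-then-rest cycle described in the question could build on the same start-time check. What follows is only a minimal sketch: run_in_bursts and all of its parameter names are hypothetical, and the clock and sleep arguments exist only so the timing can be swapped out (for example in a test).

```python
import time

def run_in_bursts(task, run_seconds, rest_seconds, cycles,
                  clock=time.monotonic, sleep=time.sleep):
    """Call task() repeatedly for run_seconds, rest for rest_seconds, and repeat.

    Returns the total number of task() calls. All names are illustrative.
    """
    calls = 0
    for cycle in range(cycles):
        deadline = clock() + run_seconds
        # Keep working until this burst's time budget is used up
        while clock() < deadline:
            task()
            calls += 1
        # Take a breather between bursts, but not after the last one
        if cycle < cycles - 1:
            sleep(rest_seconds)
    return calls
```

With the numbers from the question, a call might look like run_in_bursts(scrape_some_more, 3 * 3600, 3600, cycles=4), where the one-hour rest is an arbitrary choice. The deadline is only checked between calls, so a single long task() call can overshoot it.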
Related
I want to show off some of my work, but I don't want it to be kept forever.
I've tried this trial period, which checks on launch whether the date has passed.
import os
from datetime import datetime, timedelta
def check_trial():
    trial_start = "2019-09-03"
    trial_end = datetime.strptime(trial_start, '%Y-%m-%d') + timedelta(days=5)
    today = datetime.now()
    if today > trial_end:
        print("Trial Expired")
        os._exit(1)  # os._exit() requires an exit status
def main():
    print("running")
    check_trial()
main()
I'm worried that if they just change their computer's date, this check won't work. What should be done to protect against that?
Not sure if this meets your requirement, but here is an alternative method using the signal module in Python. Let us suppose that the run() function runs your application and you want it to run for 10 minutes.
import signal
signal.alarm(600)
run()
signal.alarm(0)
This lets your run() function run for up to 600 seconds. If the function has not returned by then, SIGALRM is delivered and, with no handler installed, terminates the process. Note that signal.alarm is only available on Unix.
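If cleanup is needed when the limit is hit, one option is to install a SIGALRM handler that raises an exception. This is a minimal sketch, assuming a Unix system and code running in the main thread; TimeLimitExceeded and run_with_time_limit are hypothetical names, not part of the signal module.

```python
import signal

class TimeLimitExceeded(Exception):
    """Raised in the main thread when the alarm fires (hypothetical name)."""

def _on_alarm(signum, frame):
    raise TimeLimitExceeded()

def run_with_time_limit(func, seconds):
    """Run func(); return True if it finished, False if it was cut off."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)  # whole seconds only
    try:
        func()
        return True
    except TimeLimitExceeded:
        # do your cleanup here
        return False
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
```

For the ten-minute case above, this would be called as run_with_time_limit(run, 600).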
I am trying to create a scheduled task in Python using Win32com. I am able to create a daily trigger. However, I cannot find a way to create a trigger every 5 seconds or every minute for that matter. Does anybody have any pointers on how to do that?
As said in a comment, if you want to do stuff with this frequency you are better off just having your program run forever and do its own scheduling.
In a similar fashion to @Barmak Shemirani's answer, but without spawning threads:
import time
def screenshot():
    # do your screenshot...
    pass
interval = 5.
target_time = time.monotonic() + interval
while True:
    screenshot()
    delay = target_time - time.monotonic()
    if delay > 0.:
        time.sleep(delay)
    target_time += interval
or, if your screenshot is fast enough and you don't really care about precise timing:
while True:
    screenshot()
    time.sleep(interval)
If you want this to run from the system startup, you'll have to make it a service, and change the exit condition accordingly.
pywin32 is not required to create a schedule or timer. Use the following:
import threading
def screenshot():
    # pywin32 code here
    print("Test")
def starttimer():
    # re-arm the timer first so the next run is scheduled one second from now
    threading.Timer(1.0, starttimer).start()
    screenshot()
starttimer()
Use pywin32 for taking the screenshot, etc.
How do I have part of a Python script (only one method; the whole script runs 24/7) run every day at set times, exactly on every 20th minute? Like 12:20, 12:40, 13:00 in every hour.
I cannot use cron. I tried periodic execution, but that is not as accurate as I would like, because it depends on the script's start time.
The schedule module may be useful for this. See the answer to "How do I get a Cron like scheduler in Python?" for details.
You can either call this method in a loop that sleeps for some time between runs:
from time import sleep
while True:
    sleep(1200)
    my_function()
or, so that it is triggered at set times, use datetime to compare the current timestamp with the next scheduled execution:
import datetime
import time
def next_trigger_time():
    return datetime.datetime.now() + datetime.timedelta(minutes=20)
trigger_time = datetime.datetime.now()
while True:
    # compare with >=, since an exact == match on a timestamp almost never happens
    if datetime.datetime.now() >= trigger_time:
        my_function()
        trigger_time = next_trigger_time()
    time.sleep(1)  # avoid spinning the CPU between checks
I think, however, that having the system call the script would be a nicer solution.
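Since the question asks for wall-clock slots such as 12:20, 12:40, 13:00 rather than 20 minutes after start-up, one option is to compute the next boundary explicitly and sleep until it. This is a minimal sketch; next_boundary and run_on_boundaries are hypothetical names, and step_minutes is assumed to divide 60 evenly.

```python
import datetime
import time

def next_boundary(now, step_minutes=20):
    """Return the first time strictly after `now` whose minute is a multiple of step_minutes."""
    base = now.replace(minute=0, second=0, microsecond=0)
    step = datetime.timedelta(minutes=step_minutes)
    steps_done = (now - base) // step  # whole steps already elapsed in this hour
    return base + (steps_done + 1) * step

def run_on_boundaries(job, step_minutes=20):
    """Call job() at every boundary, forever."""
    target = next_boundary(datetime.datetime.now(), step_minutes)
    while True:
        delay = (target - datetime.datetime.now()).total_seconds()
        if delay > 0:
            time.sleep(delay)
        job()
        # Schedule from whichever is later, so an early wake-up cannot repeat a slot
        target = next_boundary(max(target, datetime.datetime.now()), step_minutes)
```

Because the target is recomputed from the wall clock each time, small delays do not accumulate the way they do with a fixed sleep(1200). If job() takes longer than one step, the slots it overruns are skipped.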
You could, for example, use Redis with the rq-scheduler package, which lets you schedule tasks for a specific time. Run the script once, save its start time in a variable, and compute start time + 20 minutes. When the current run finishes, push the same task again, scheduled for that time.
I'm scraping (extracting) data from a certain website. The data contains two values that I need, namely (grid) frequency value and time.
The data on the website is updated every second. I'd like to continuously save these values (append them) to a list or a tuple using Python. To do that, I tried using the schedule library. The following schedule commands run the data scraping function (socket_freq) every second.
import schedule
schedule.every(1).seconds.do(socket_freq)
while True:
    schedule.run_pending()
I'm facing two problems:
1. I don't know how to restrict the schedule to a chosen time interval. For example, I'd like to run it for 5 or 10 minutes. How do I define that? I mean, how do I tell the schedule to stop after a certain time?
2. If I run this code and stop it after a few seconds (using break), I often get multiple entries. For example, here is one result, where the first list in the tuple holds the time values and the second list holds the frequency values:
out:
(['19:27:02','19:27:02','19:27:02','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:03','19:27:04','19:27:04','19:27:04', ...],
['50.020','50.020','50.020','50.018','50.018','50.018','50.018','50.018','50.018','50.018','50.017','50.017','50.017'...])
As you can see, the time value is appended multiple times, although I used a schedule that runs every second. What I would actually expect to retrieve is:
out:
(['19:27:02','19:27:03','19:27:04'],['50.020','50.018','50.017'])
Does anybody know how to solve these problems?
Thanks!
(I'm using python 2.7.9)
Ok, so here's how I would tackle these problems:
1. Obtain a timestamp at the start of your program, and each time the scheduled code runs, check whether the program has been running long enough.
2. Use time.sleep() to put your program to sleep for a period of time between checks.
Check my example below:
import schedule
import datetime
import time
# Obtain current time
start = datetime.datetime.now()
# Simple callable for example
class DummyClock:
    def __call__(self):
        print(datetime.datetime.now())
schedule.every(1).seconds.do(DummyClock())
while True:
    schedule.run_pending()
    # 5 minutes == 300 seconds
    if (datetime.datetime.now() - start).total_seconds() >= 300:
        break
    # And here we halt execution for a second
    time.sleep(1)
All refactoring is welcome
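For the second problem, one possible cause (an assumption, since the scraping function is not shown) is that the site's timestamp has one-second resolution and is sometimes read more than once before it changes. Under that assumption, a small guard that skips readings whose time equals the previous one would give the expected output. append_if_new is a hypothetical helper name.

```python
def append_if_new(times, values, t, v):
    """Append (t, v) unless t repeats the most recent timestamp.

    Returns True if the reading was stored, False if it was skipped.
    """
    if times and times[-1] == t:
        return False
    times.append(t)
    values.append(v)
    return True

# Example with readings like those in the question
times, values = [], []
for t, v in [('19:27:02', '50.020'), ('19:27:02', '50.020'),
             ('19:27:03', '50.018'), ('19:27:04', '50.017')]:
    append_if_new(times, values, t, v)
print((times, values))
```

This only compares against the last stored reading, which is enough when timestamps arrive in order.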
I am using this loop to run something every 5 minutes; it just creates a thread, which then completes.
import datetime
now = datetime.datetime.now()
while True:
    now_plus_5 = now + datetime.timedelta(minutes = 5)
    # busy-wait until 5 minutes have passed
    while datetime.datetime.now()<= now_plus_5:
        new=datetime.datetime.now()
        pass
    now = new
    pass
But when I check my process status, it shows 100% CPU usage while the script runs. Is this loop causing it? Is there a better way to do this?
You might rather use something like time.sleep:
import time
while True:
    # do something
    time.sleep(5*60)  # wait 5 minutes
Based on your comment above, you may find a Timer object from the threading module to better suit your needs:
from threading import Timer
def hello():
    print("hello, world")
t = Timer(300.0, hello)
t.start()  # after 5 minutes, "hello, world" will be printed
(code snippet modified from docs)
A Timer is a Thread subclass, so you can further encapsulate your logic as needed.
This allows the threading subsystem to schedule the execution of your task such that it's not entirely CPU bound like your current implementation.
I should also note that the Timer class is designed to be fired only once. As such, you'd want to design your task to start a new instance upon completion, or create your own Thread subclass with its own smarts.
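As one way to get a repeating timer, a Thread subclass can wait on an Event between calls. This is only a sketch of that idea; RepeatingTimer is a hypothetical name, not a class from the threading module.

```python
import threading

class RepeatingTimer(threading.Thread):
    """Call function() every `interval` seconds until cancel() is called."""

    def __init__(self, interval, function):
        super().__init__(daemon=True)
        self.interval = interval
        self.function = function
        self._stop_event = threading.Event()

    def run(self):
        # Event.wait returns False on timeout and True once cancel() sets the event,
        # so the thread sleeps without using CPU and stops promptly when cancelled
        while not self._stop_event.wait(self.interval):
            self.function()

    def cancel(self):
        self._stop_event.set()
```

Usage would be along the lines of t = RepeatingTimer(300.0, hello); t.start(), with t.cancel() to stop it. The interval is measured from the end of one call to the start of the next, so the period drifts by however long function() takes.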
While researching this, I noticed that there's also a sched module that provides this functionality as well, but rather than rehash the solution, check out this related question:
Python Equivalent of setInterval()?
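For reference, a small sketch of what the sched module looks like in use. run_n_times is a hypothetical helper, and the timefunc and delayfunc parameters are passed straight through to sched.scheduler so they can be replaced if needed.

```python
import sched
import time

def run_n_times(action, interval, repetitions,
                timefunc=time.monotonic, delayfunc=time.sleep):
    """Call action() `repetitions` times, `interval` seconds apart, then return."""
    scheduler = sched.scheduler(timefunc, delayfunc)
    for i in range(1, repetitions + 1):
        # enter(delay, priority, action): the delay is relative to the current time
        scheduler.enter(i * interval, 1, action)
    scheduler.run()  # blocks until the queue is empty
```

Here all events are queued up front. For an open-ended schedule, the action would instead re-enter itself each time it runs.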
timedelta accepts days, seconds, microseconds, milliseconds, minutes, hours, and weeks as keyword arguments (it has no months or years) and works accordingly:
from datetime import datetime, timedelta
end_time = datetime.now() + timedelta(minutes=5)
while end_time >= datetime.now():
    pass  # your statements go here