How to exit from a generator at a specific time? - python

I'm reading tweets from Twitter Streaming API. After connecting to the API, I'm getting a generator.
I'm looping through each tweet received, but I want to exit from the iterator at, say, 18:00. After receiving each tweet, I check whether it's later than the specified timestamp and stop.
The issue is that I'm not receiving tweets frequently enough. So I could receive one at 17:50 and the next one at 19:00. That's when I'll find out that the time has passed and I need to stop.
Is there a way to force the stop at 18:00 exactly?
Here's a high-level view of my code:
def getStream(tweet_iter):
    for tweet in tweet_iter:
        #do stuff
        if time_has_passed():
            return

tweet_iter = ConnectAndGetStream()
getStream(tweet_iter)

Create a separate thread for the producer and use a Queue to communicate. I also had to use a threading.Event for stopping the producer.
import itertools, queue, threading, time

END_TIME = time.time() + 5   # run for ~5 seconds

def time_left():
    return END_TIME - time.time()

def ConnectAndGetStream():   # stub for the real thing
    for i in itertools.count():
        time.sleep(1)
        yield "tweet {}".format(i)

def producer(tweets_queue, the_end):   # producer
    it = ConnectAndGetStream()
    while not the_end.is_set():
        tweets_queue.put(next(it))

def getStream(tweets_queue, the_end):  # consumer
    try:
        while True:
            tweet = tweets_queue.get(timeout=time_left())
            print('Got', tweet)
    except queue.Empty:
        print('THE END')
        the_end.set()

tweets_queue = queue.Queue()  # you might wanna use the maxsize parameter
the_end = threading.Event()
producer_thread = threading.Thread(target=producer,
                                   args=(tweets_queue, the_end))
producer_thread.start()
getStream(tweets_queue, the_end)
producer_thread.join()

Your problem could be resolved by splitting the functionality of your design into two separate processes:
A twitter process that acts as a wrapper to the Twitter API, and
A monitor process that is able to terminate the twitter process when the exit time is reached.
The following piece of code prototypes the functionality described above using Python's multiprocessing module:
import multiprocessing as mp
import time

EXIT_TIME = '12:21'  # '18:00'

def twitter():
    while True:
        print 'Twittttttttttt.....'
        time.sleep(5)

def get_time():
    return time.ctime().split()[3][:5]

if __name__ == '__main__':
    # Execute the function as a process
    p = mp.Process(target=twitter, args=())
    p.start()

    # Monitor the process p
    while True:
        print 'Checking the hour...'
        if get_time() == EXIT_TIME:
            p.terminate()
            print 'Current time:', time.ctime()
            print 'twitter process has been terminated...'
            break
        time.sleep(5)
Of course you can use p.join(TIMEOUT) instead of the while True loop presented in my example, as pointed out here.
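For completeness, here is a minimal sketch of that variant, assuming you compute the number of seconds remaining until the exit time yourself (the seconds_until_exit helper below is my own, hypothetical addition, not part of the original answer):

import multiprocessing as mp
import time
from datetime import datetime, timedelta

def seconds_until_exit(hour=18, minute=0):
    # hypothetical helper: seconds from now until the next occurrence of HH:MM
    now = datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += timedelta(days=1)
    return (target - now).total_seconds()

def twitter():
    while True:
        print('Twittttttttttt.....')
        time.sleep(5)

if __name__ == '__main__':
    p = mp.Process(target=twitter)
    p.start()
    p.join(seconds_until_exit())  # block at most until the exit time
    if p.is_alive():
        p.terminate()             # still running, so stop it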

Here is an example with threading and the python schedule module:
import threading
import time
import os
import schedule

def theKillingJob(message):
    print("Kenny and Cartman die!", message)
    os._exit(1)

schedule.every().day.at("18:00").do(theKillingJob, 'It is 18:00')

def getStream(tweet_iter):
    for tweet in tweet_iter:
        #do stuff
        pass

def kenny():
    while True:
        print("Kenny alive..")
        schedule.run_pending()
        time.sleep(1)

def cartman():
    while True:
        print("Cartman alive..")
        tweet_iter = ConnectAndGetStream()  # from the question
        getStream(tweet_iter)
        # You can change how often to check for tweets by changing the sleep time here
        time.sleep(1)

if __name__ == '__main__':
    daemon_kenny = threading.Thread(name='kenny', target=kenny)
    daemon_cartman = threading.Thread(name='cartman', target=cartman)
    daemon_kenny.setDaemon(True)
    daemon_cartman.setDaemon(True)
    daemon_kenny.start()
    daemon_cartman.start()
    daemon_kenny.join()
    daemon_cartman.join()

Related

Timeouts for multiprocessing?

I've searched StackOverflow and although I've found many questions on this, I haven't found an answer that fits my situation, and I'm not a strong enough python programmer to adapt their answers to fit my needs.
I've looked here to no avail:
kill a function after a certain time in windows
Python: kill or terminate subprocess when timeout
signal.alarm replacement in Windows [Python]
I am using multiprocessing to run multiple SAP windows at once to pull reports. It is set up to run on a schedule every 5 minutes. Every once in a while, one of the reports gets stalled due to the GUI interface and never ends. I don't get an error or exception, it just stalls forever. What I would like is a timeout so that, for the part of the code that is executed in SAP, if it takes longer than 4 minutes, it times out, closes SAP, skips the rest of the code, and waits for the next scheduled report time.
I am using Python 2.7 on Windows.
import multiprocessing
from multiprocessing import Manager, Process
import time
import datetime

### OPEN SAP ###
def start_SAP():
    print 'opening SAP program'

### REPORTS IN SAP ###
def report_1(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 1'

def report_2(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 2'

def report_3(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    time.sleep(60000)  # mimicking the stall for report 3 that takes longer than allotted time
    print 'running report 3'

def report_N(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report N'

### CLOSES SAP ###
def close_SAP():
    print 'closes SAP'

def format_file():
    print 'formatting files'

def multi_daily_pull():
    lock = multiprocessing.Lock()  # creating a lock in multiprocessing
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q, lock))
    report2 = Process(target=report_2, args=(q, lock))
    report3 = Process(target=report_3, args=(q, lock))
    reportN = Process(target=report_N, args=(q, lock))
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join()
    report2.join()
    report3.join()
    reportN.join()
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()

if __name__ == '__main__':
    multi_daily_pull()
One way to do what you want would be to use the optional timeout argument that the Process.join() method accepts. This makes it block the calling thread for at most that length of time.
I also set the daemon attribute of each Process instance so your main thread will be able to terminate even if one of the processes it started is still "running" (or has hung up).
One final point: you don't need a multiprocessing.Lock to control access to a multiprocessing.Queue, because it handles that aspect of things automatically, so I removed it. You may still want one for some other reason, such as controlling access to stdout so printing from the various processes doesn't overlap and mess up what is output to the screen (a short sketch of that follows the code below).
import multiprocessing
from multiprocessing import Process
import time
import datetime

def start_SAP():
    print 'opening SAP program'

### REPORTS IN SAP ###
def report_1(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 1 finished'

def report_2(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 2 finished'

def report_3(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(60000)  # Take longer than allotted time
            break
    print 'report 3 finished'

def report_N(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report N finished'

def close_SAP():
    print 'closing SAP'

def format_file():
    print 'formatting files'

def multi_daily_pull():
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q,))
    report1.daemon = True
    report2 = Process(target=report_2, args=(q,))
    report2.daemon = True
    report3 = Process(target=report_3, args=(q,))
    report3.daemon = True
    reportN = Process(target=report_N, args=(q,))
    reportN.daemon = True
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join(30)
    report2.join(30)
    report3.join(30)
    reportN.join(30)
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()

if __name__ == '__main__':
    multi_daily_pull()
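As mentioned above, if you do keep a lock around for stdout, a minimal sketch might look like this (the print_lock name and the log helper are my own illustration, not part of the original code):

import multiprocessing
from multiprocessing import Process, Lock

def log(print_lock, message):
    # serialize writes to stdout so lines from different processes don't interleave
    with print_lock:
        print(message)

def report(print_lock, n):
    log(print_lock, 'running report %d' % n)

if __name__ == '__main__':
    print_lock = Lock()
    procs = [Process(target=report, args=(print_lock, n)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()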

Python multiprocessing - check status of each process

I wonder if it is possible to check how long each process takes.
For example, there are four workers and the job should take no more than 10 seconds, but one of the workers takes more than 10 seconds. Is there a way to raise an alert after 10 seconds, before the process finishes the job?
My initial thought was using a Manager, but it seems I have to wait till the process is finished.
Many thanks.
You can check whether a process is alive after you've tried to join it. Don't forget to set a timeout, otherwise it'll wait until the job is finished.
Here is a simple example for you:
from multiprocessing import Process
import time

def task():
    import time
    time.sleep(5)

procs = []
for x in range(2):
    proc = Process(target=task)
    procs.append(proc)
    proc.start()

time.sleep(2)
for proc in procs:
    proc.join(timeout=0)
    if proc.is_alive():
        print "Job is not finished!"
I found this solution some time ago (somewhere here on StackOverflow) and I am very happy with it.
Basically, it uses signal to raise an exception if a process takes longer than expected.
All you need to do is to add this class to your code:
import signal

class Timeout:
    def __init__(self, seconds=1, error_message='TimeoutError'):
        self.seconds = seconds
        self.error_message = error_message

    def handle_timeout(self, signum, frame):
        raise TimeoutError(self.error_message)

    def __enter__(self):
        signal.signal(signal.SIGALRM, self.handle_timeout)
        signal.alarm(self.seconds)

    def __exit__(self, type, value, traceback):
        signal.alarm(0)
Here is a general example of how it works:
import time

with Timeout(seconds=3, error_message='JobX took too much time'):
    try:
        time.sleep(10)  # your job
    except TimeoutError as e:
        print(e)
In your case, I would add the with statement to the job that your worker needs to perform. Then you catch the exception and do whatever you think is best.
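For instance, a minimal sketch of that, assuming the Timeout class above is in scope, a Unix platform (SIGALRM is not available on Windows), and a hypothetical do_work function standing in for the real job:

from multiprocessing import Process
import time

def do_work(seconds):
    # hypothetical job; replace with the real work
    time.sleep(seconds)

def worker(seconds):
    try:
        with Timeout(seconds=10, error_message='worker took too much time'):
            do_work(seconds)
    except TimeoutError as e:
        print(e)  # raise an alert, log, clean up, ...

if __name__ == '__main__':
    p = Process(target=worker, args=(15,))
    p.start()
    p.join()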
Alternatively, you can periodically check if a process is alive:
import time

timeout = 3  # seconds
start = time.time()
# processes is a list of multiprocessing.Process objects you started earlier
while time.time() - start < timeout:
    if any(proc.is_alive() for proc in processes):
        time.sleep(1)
    else:
        print('All processes done')
        break
else:
    print("Timeout!")
    # do something
Use a Pipe and messages:
from multiprocessing import Process, Pipe
import numpy as np

caller, worker = Pipe()
val1 = ['der', 'die', 'das']

def worker_function(info):
    print(info.recv())
    for i in range(10):
        print(val1[np.random.choice(3, 1)[0]])
    info.send(['job finished'])
    info.close()

def request(data):
    caller.send(data)
    task = Process(target=worker_function, args=(worker,))
    if not task.is_alive():
        print("task is requested")
        task.start()
    if caller.recv() == ['job finished']:
        task.join()
        print("finished")

if __name__ == '__main__':
    data = {'input': 'here'}
    request(data)

Calling a function and each call works simultaneously and not in a queue

Hi, I would like to know what method I could use to call a function a few times where each call is processed in parallel and NOT in queue-based processing.
Something along this line:
import time
import random

def run(incoming):
    time.sleep(5)
    print incoming

while True:
    hash = random.getrandbits(128)
    run(hash)
    time.sleep(1)
import time
import random
import threading

def run(incoming):
    time.sleep(5)
    print incoming

while True:
    hash = random.getrandbits(128)
    threading.Thread(target=run, args=(hash,)).start()
    time.sleep(1)
Note that this is restricted by the GIL, which interleaves the threads ... but for your purposes you can probably call it parallel, and since your thread count keeps growing it may eventually break down.
There are much better ways to do this, so let's check one out:
import time
import random
import threading
import multiprocessing

def do_hard_work(hash):
    time.sleep(1)

def Run(data_pipe):
    while True:
        while data_pipe.poll():
            hash = data_pipe.recv()
            if hash == "QUIT":
                break
            threading.Thread(target=do_hard_work, args=(hash,)).start()
        time.sleep(1)

local, remote = multiprocessing.Pipe()
worker_process = multiprocessing.Process(target=Run, args=(local,))
worker_process.start()

while True:
    remote.send(random.getrandbits(128))
    time.sleep(1)
    if some_condition:  # whatever tells you to stop
        remote.send("QUIT")
        break

Is my HelloWorld queue working?

I'm about to put this design into use in an application, but I'm fairly new to threading and Queue stuff in python. Obviously the actual application is not for saying hello, but the design is the same - i.e. there is a process which takes some time to set-up and tear down, but I can do multiple tasks in one hit. Tasks will arrive at random times, and often in bursts.
Is this a sensible and thread safe design?
class HelloThing(object):
    def __init__(self):
        self.queue = self._create_worker()

    def _create_worker(self):
        import threading, Queue
        def worker():
            while True:
                things = [q.get()]
                while True:
                    try:
                        things.append(q.get_nowait())
                    except Queue.Empty:
                        break
                self._say_hello(things)
                [q.task_done() for task in xrange(len(things))]
        q = Queue.Queue()
        n_worker_threads = 1
        for i in xrange(n_worker_threads):
            t = threading.Thread(target=worker)
            t.daemon = True
            t.start()
        return q

    def _say_hello(self, greeting_list):
        import time, sys
        # setup stuff
        time.sleep(1)
        # do some things
        sys.stdout.write('hello {0}!\n'.format(', '.join(greeting_list)))
        # tear down stuff
        time.sleep(1)

if __name__ == '__main__':
    print 'enter __main__'
    import time
    hello = HelloThing()
    hello.queue.put('world')
    hello.queue.put('cruel world')
    hello.queue.put('stack overflow')
    time.sleep(2)
    hello.queue.put('a')
    hello.queue.put('b')
    time.sleep(2)
    for i in xrange(20):
        hello.queue.put(str(i))
    #hello.queue.join()
    print 'finish __main__'
The thread safety is handled by the Queue implementation (you must also handle thread safety in your _say_hello implementation if it is required there).
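For example, if you ever raise n_worker_threads above 1, a minimal sketch of guarding the setup/teardown section with a lock might look like this (the _hello_lock name is my own illustration, not part of the original design):

import sys
import threading
import time

_hello_lock = threading.Lock()  # assumed: shared by all worker threads

def _say_hello(greeting_list):
    # only one worker may run the setup / do / teardown sequence at a time
    with _hello_lock:
        time.sleep(1)  # setup stuff
        sys.stdout.write('hello {0}!\n'.format(', '.join(greeting_list)))
        time.sleep(1)  # tear down stuff

if __name__ == '__main__':
    threads = [threading.Thread(target=_say_hello, args=(['world', str(i)],))
               for i in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()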
Burst handling problem: a burst should be handled by a single thread only. (Example: say your process setup/teardown takes 10 seconds; at second 1 all threads are busy with the burst from second 0, and at second 5 a new task or burst arrives but no thread is available to handle it.) So a burst should be defined by a maximum number of tasks (or maybe "infinite") within a specific time window, and an entry in the queue should be a list of tasks.
How can you group tasks into a burst list?
I'll provide the solution as code, as it is easier to explain ...
from Queue import Queue
import time

producer_q = Queue()

def _burst_thread():
    while True:
        available_tasks = [producer_q.get()]
        time.sleep(BURST_TIME_WINDOW)
        available_tasks.extend(producer_q.get()  # I'm the single consumer, so there will be at least qsize elements
                               for i in range(producer_q.qsize()))
        consumer_q.put(available_tasks)  # consumer_q feeds the worker doing the actual work
If you want to have a maximum number of messages in a burst, you just need to slice available_tasks into multiple lists.
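A minimal sketch of that slicing, assuming a hypothetical MAX_BURST_SIZE limit:

MAX_BURST_SIZE = 10  # hypothetical upper bound on tasks per burst

def split_into_bursts(available_tasks, max_size=MAX_BURST_SIZE):
    # slice the collected tasks into chunks of at most max_size entries
    return [available_tasks[i:i + max_size]
            for i in range(0, len(available_tasks), max_size)]

# e.g. each chunk is then pushed to the consumer queue separately:
# for chunk in split_into_bursts(available_tasks):
#     consumer_q.put(chunk)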

kill a function after a certain time in windows

I've read a lot of posts about using threads, subprocesses, etc. A lot of it seems overcomplicated for what I'm trying to do...
All I want to do is stop executing a function after X amount of time has elapsed.
def big_loop(bob):
    x = bob
    start = time.time()
    while True:
        print time.time()-start
This function is an endless loop that never throws any errors or exceptions, period.
I"m not sure the difference between "commands, shells, subprocesses, threads, etc.." and this function, which is why I'm having trouble manipulating subprocesses.
I found this code here, and tried it but as you can see it keeps printing after 10 seconds have elapsed:
import time
import threading
import subprocess as sub
import time

class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()

    def Run(self):
        self.start()
        self.join(self.timeout)
        if self.is_alive():
            self.p.terminate()
            self.join()

def big_loop(bob):
    x = bob
    start = time.time()
    while True:
        print time.time()-start

RunCmd(big_loop('jimijojo'), 10).Run()  # supposed to quit after 10 seconds, but doesn't
x = raw_input('DONEEEEEEEEEEEE')
What's a simple way this function can be killed? As you can see in my attempt above, it doesn't terminate after the timeout and just keeps on going...
*** Oh, also, I've read about using signal, but I'm on Windows so I can't use the alarm feature. (Python 2.7)
** Assume the "infinitely running function" can't be manipulated or changed to be non-infinite; if I could change the function, well, I'd just change it to be non-infinite, wouldn't I?
Here are some similar questions, which I haven't able to port over their code to work with my simple function:
Perhaps you can?
Python: kill or terminate subprocess when timeout
signal.alarm replacement in Windows [Python]
OK, I tried an answer I received, and it works... but how can I use it if I remove the if __name__ == "__main__": statement? When I remove this statement, the loop never ends, as before...
import multiprocessing
import Queue
import time

def infinite_loop_function(bob):
    var = bob
    start = time.time()
    while True:
        time.sleep(1)
        print time.time()-start
    print 'this statement will never print'

def wrapper(queue, bob):
    result = infinite_loop_function(bob)
    queue.put(result)
    queue.close()

#if __name__ == "__main__":
queue = multiprocessing.Queue(1)  # Maximum size is 1
proc = multiprocessing.Process(target=wrapper, args=(queue, 'var'))
proc.start()

# Wait for TIMEOUT seconds
try:
    timeout = 10
    result = queue.get(True, timeout)
except Queue.Empty:
    # Deal with lack of data somehow
    result = None
finally:
    proc.terminate()

print 'running other code, now that that infinite loop has been defeated!'
print 'bla bla bla'
x = raw_input('done')
Use the building blocks in the multiprocessing module:
import multiprocessing
import Queue

TIMEOUT = 5

def big_loop(bob):
    import time
    time.sleep(4)
    return bob*2

def wrapper(queue, bob):
    result = big_loop(bob)
    queue.put(result)
    queue.close()

def run_loop_with_timeout():
    bob = 21  # Whatever sensible value you need
    queue = multiprocessing.Queue(1)  # Maximum size is 1
    proc = multiprocessing.Process(target=wrapper, args=(queue, bob))
    proc.start()

    # Wait for TIMEOUT seconds
    try:
        result = queue.get(True, TIMEOUT)
    except Queue.Empty:
        # Deal with lack of data somehow
        result = None
    finally:
        proc.terminate()

    # Process data here, not in try block above, otherwise your process keeps running
    print result

if __name__ == "__main__":
    run_loop_with_timeout()
You could also accomplish this with a Pipe/Connection pair, but I'm not familiar with their API. Change the sleep time or TIMEOUT to check the behaviour for either case.
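For reference, a minimal sketch of that Pipe/Connection variant, using Connection.poll() with a timeout (this is my own sketch, not part of the original answer):

import multiprocessing

TIMEOUT = 5

def big_loop(bob):
    import time
    time.sleep(4)
    return bob * 2

def wrapper(conn, bob):
    conn.send(big_loop(bob))
    conn.close()

def run_loop_with_timeout():
    parent_conn, child_conn = multiprocessing.Pipe()
    proc = multiprocessing.Process(target=wrapper, args=(child_conn, 21))
    proc.start()
    if parent_conn.poll(TIMEOUT):   # wait at most TIMEOUT seconds for data
        result = parent_conn.recv()
    else:
        result = None               # deal with lack of data somehow
    proc.terminate()
    print(result)

if __name__ == "__main__":
    run_loop_with_timeout()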
There is no straightforward way to kill a function after a certain amount of time without running the function in a separate process. A better approach would probably be to rewrite the function so that it returns after a specified time:
import time

def big_loop(bob, timeout):
    x = bob
    start = time.time()
    end = start + timeout
    while time.time() < end:
        print time.time() - start
        # Do more stuff here as needed
Can't you just return from the loop?
import time

def getStream():  # inside your stream-processing function
    start = time.time()
    endt = start + 30
    while True:
        now = time.time()
        if now > endt:
            return
        else:
            print now - start
import os, signal, time

cpid = os.fork()
if cpid == 0:
    while True:
        # do stuff
        pass
else:
    time.sleep(10)
    os.kill(cpid, signal.SIGKILL)
You can also check for an event in the loop of a thread, which is more portable and flexible as it allows reactions other than brute killing. However, this approach fails if the # do stuff part can take a long time (or even wait forever on some event).
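A minimal sketch of that event-based variant (my own illustration, assuming the loop body is short enough to check the flag often):

import threading
import time

stop_flag = threading.Event()

def worker():
    while not stop_flag.is_set():
        # do stuff, kept short so the flag is checked frequently
        time.sleep(0.1)

t = threading.Thread(target=worker)
t.start()
time.sleep(10)      # let it run for 10 seconds
stop_flag.set()     # ask the loop to exit
t.join()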
