Timeouts for multiprocessing?


I've searched StackOverflow, and although I've found many questions on this, I haven't found an answer that fits my situation. I'm also not a strong enough Python programmer to adapt those answers to my needs.
I've looked here to no avail:
kill a function after a certain time in windows
Python: kill or terminate subprocess when timeout
signal.alarm replacement in Windows [Python]
I am using multiprocessing to run multiple SAP windows at once to pull reports. The script is set up to run on a schedule every 5 minutes. Every once in a while, one of the reports stalls because of the GUI and never finishes. I don't get an error or exception; it just stalls forever. I would like a timeout around the part of the code that runs in SAP. If that part takes longer than 4 minutes, it should time out, close SAP, skip the rest of the code, and wait for the next scheduled run.
I am using Python 2.7 on Windows.
import multiprocessing
from multiprocessing import Manager, Process
import time
import datetime
### OPEN SAP ###
def start_SAP():
    print 'opening SAP program'
### REPORTS IN SAP ###
def report_1(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 1'
def report_2(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report 2'
def report_3(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    time.sleep(60000)  # mimicking the stall for report 3 that takes longer than allotted time
    print 'running report 3'
def report_N(q, lock):
    while True:  # logic to get shared queue
        if not q.empty():
            lock.acquire()
            k = q.get()
            time.sleep(1)
            lock.release()
            break
        else:
            time.sleep(1)
    print 'running report N'
### CLOSES SAP ###
def close_SAP():
    print 'closes SAP'
def format_file():
    print 'formatting files'
def multi_daily_pull():
    lock = multiprocessing.Lock()  # creating a lock in multiprocessing
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q, lock))
    report2 = Process(target=report_2, args=(q, lock))
    report3 = Process(target=report_3, args=(q, lock))
    reportN = Process(target=report_N, args=(q, lock))
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join()
    report2.join()
    report3.join()
    reportN.join()
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()
if __name__ == '__main__':
    multi_daily_pull()

One way to do what you want is to use the optional timeout argument that the Process.join() method accepts. With it, join() blocks the calling thread for at most that many seconds.
I also set the daemon attribute of each Process instance. That way your main process can exit even if one of the processes it started is still "running" (or has hung). When the main process exits, it terminates its daemonic children.
One final point: you don't need a multiprocessing.Lock to control access to a multiprocessing.Queue, because the queue handles that automatically, so I removed it. You may still want a lock for another reason, such as controlling access to stdout. Otherwise, output printed from the various processes can overlap and garble what appears on the screen.
import multiprocessing
from multiprocessing import Process
import time
import datetime
def start_SAP():
    print 'opening SAP program'
### REPORTS IN SAP ###
def report_1(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 1 finished'
def report_2(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report 2 finished'
def report_3(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(60000)  # Take longer than allotted time
            break
    print 'report 3 finished'
def report_N(q):
    while True:  # logic to get shared queue
        if q.empty():
            time.sleep(1)
        else:
            k = q.get()
            time.sleep(1)
            break
    print 'report N finished'
def close_SAP():
    print 'closing SAP'
def format_file():
    print 'formatting files'
def multi_daily_pull():
    shared_list = range(6)  # creating a shared list for all functions to use
    q = multiprocessing.Queue()  # creating an empty queue in multiprocessing
    for n in shared_list:  # putting list into the queue
        q.put(n)
    print 'Starting process at ', time.strftime('%m/%d/%Y %H:%M:%S')
    print 'Starting SAP Pulls at ', time.strftime('%m/%d/%Y %H:%M:%S')
    StartSAP = Process(target=start_SAP)
    StartSAP.start()
    StartSAP.join()
    report1 = Process(target=report_1, args=(q,))
    report1.daemon = True
    report2 = Process(target=report_2, args=(q,))
    report2.daemon = True
    report3 = Process(target=report_3, args=(q,))
    report3.daemon = True
    reportN = Process(target=report_N, args=(q,))
    reportN.daemon = True
    report1.start()
    report2.start()
    report3.start()
    reportN.start()
    report1.join(30)
    report2.join(30)
    report3.join(30)
    reportN.join(30)
    EndSAP = Process(target=close_SAP)
    EndSAP.start()
    EndSAP.join()
    formatfile = Process(target=format_file)
    formatfile.start()
    formatfile.join()
if __name__ == '__main__':
    multi_daily_pull()
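Each join(30) call above waits up to 30 seconds, one after another, so the total wait grows with every stalled report. A report that is still alive after its join also keeps running until the main process exits. The sketch below gives all reports one shared deadline and explicitly terminates whatever is still running. It is written for Python 3 and should also run on 2.7. `run_with_deadline`, `quick_report` and `stalled_report` are hypothetical names standing in for the real SAP report functions. Closing the hung SAP window itself is not shown and would depend on how SAP is driven.

```python
import multiprocessing
import time

def quick_report(name):
    # Hypothetical stand-in for a report that finishes normally.
    time.sleep(0.1)

def stalled_report(name):
    # Hypothetical stand-in for a report whose SAP GUI hangs.
    time.sleep(3600)

def run_with_deadline(jobs, timeout):
    """Start one process per (target, name) pair and give all of them
    `timeout` seconds in total. Returns the names of the processes that
    were still running at the deadline and had to be terminated."""
    procs = []
    for target, name in jobs:
        p = multiprocessing.Process(target=target, args=(name,), name=name)
        p.daemon = True
        p.start()
        procs.append(p)
    deadline = time.time() + timeout
    for p in procs:
        # Each join only gets whatever time is left before the shared deadline.
        p.join(max(0.0, deadline - time.time()))
    killed = []
    for p in procs:
        if p.is_alive():
            p.terminate()
            p.join()  # reap the terminated process
            killed.append(p.name)
    return killed

if __name__ == '__main__':
    jobs = [(quick_report, 'report1'), (stalled_report, 'report3')]
    # The asker would pass 4 * 60; 2 seconds keeps the demo short.
    print(run_with_deadline(jobs, 2))
```

After run_with_deadline returns, close_SAP and format_file can run as before. The list of terminated names tells the scheduled job which reports to retry on the next run.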

Related

How to exit from a generator at some specific time?

I'm reading tweets from Twitter Streaming API. After connecting to the API, I'm getting a generator.
I'm looping through each tweet received, but I want to exit from the iterator at, say, 18:00. After receiving each tweet, I check whether it's later than the specified time and stop if it is.
The issue is that I don't receive tweets frequently enough. I could receive one at 17:50 and the next one at 19:00, and only then would I find out that the time has passed and I need to stop.
Is there a way to force the stop at exactly 18:00?
Here's a high-level view of my code:
def getStream(tweet_iter):
    for tweet in tweet_iter:
        # do stuff
        if time_has_passed():
            return
tweet_iter = ConnectAndGetStream()
getStream(tweet_iter)

Create a separate thread for the producer and use a Queue to communicate. I also had to use a threading.Event for stopping the producer.
import itertools, queue, threading, time
END_TIME = time.time() + 5  # run for ~5 seconds
def time_left():
    return END_TIME - time.time()
def ConnectAndGetStream():  # stub for the real thing
    for i in itertools.count():
        time.sleep(1)
        yield "tweet {}".format(i)
def producer(tweets_queue, the_end):  # producer
    it = ConnectAndGetStream()
    while not the_end.is_set():
        tweets_queue.put(next(it))
def getStream(tweets_queue, the_end):  # consumer
    try:
        while True:
            # max() guards against a negative timeout, which Queue.get rejects with ValueError
            tweet = tweets_queue.get(timeout=max(0, time_left()))
            print('Got', tweet)
    except queue.Empty:
        print('THE END')
        the_end.set()
tweets_queue = queue.Queue()  # you might wanna use the maxsize parameter
the_end = threading.Event()
producer_thread = threading.Thread(target=producer,
                                   args=(tweets_queue, the_end))
producer_thread.start()
getStream(tweets_queue, the_end)
producer_thread.join()
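The answer above stops after a fixed five seconds. To stop at a wall-clock time such as 18:00, END_TIME can be computed from the current date instead. This is a minimal Python 3 sketch that assumes local time and treats a time already passed today as meaning tomorrow. The helper names next_occurrence and end_time_for are hypothetical.

```python
import datetime
import time

def next_occurrence(hour, minute, now=None):
    """Return the next local datetime at hour:minute (today, or tomorrow if already past)."""
    if now is None:
        now = datetime.datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        target += datetime.timedelta(days=1)
    return target

def end_time_for(hour, minute):
    """The next hour:minute as a time.time()-style timestamp (assumes local time)."""
    return time.mktime(next_occurrence(hour, minute).timetuple())

# Would replace END_TIME = time.time() + 5 in the answer above.
END_TIME = end_time_for(18, 0)
```

time_left() and the rest of the answer stay unchanged. Because the consumer's get(timeout=...) is bounded by the time left, it gives up at 18:00 even if no tweet arrives.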

Your problem could be solved by splitting your design into two separate processes:
A twitter process that acts as a wrapper around the Twitter API, and
A monitor process that terminates the twitter process when the exit time is reached.
The following piece of code prototypes the functionality described above using Python's multiprocessing module:
import multiprocessing as mp
import time
EXIT_TIME = '12:21'  # '18:00'
def twitter():
    while True:
        print 'Twittttttttttt.....'
        time.sleep(5)
def get_time():
    return time.ctime().split()[3][:5]  # current time as 'HH:MM'
if __name__ == '__main__':
    # Execute the function as a process
    p = mp.Process(target=twitter, args=())
    p.start()
    # Monitoring the process p
    while True:
        print 'Checking the hour...'
        if get_time() == EXIT_TIME:
            p.terminate()
            print 'Current time:', time.ctime()
            print 'twitter process has been terminated...'
            break
        time.sleep(5)
Of course, you can use p.join(TIMEOUT) instead of the while True loop in my example, as pointed out here.
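The p.join(TIMEOUT) variant could look like the Python 3 sketch below. The monitor process joins once with a timeout instead of polling the clock every 5 seconds. For the tweet use case, the timeout would be the number of seconds left until EXIT_TIME. twitter is the same stand-in as above, and run_until is a hypothetical helper.

```python
import multiprocessing as mp
import time

def twitter():
    # Stand-in for the real stream reader: never returns on its own.
    while True:
        time.sleep(0.5)

def run_until(target, timeout, args=()):
    """Run target(*args) in a child process for at most `timeout` seconds.
    Returns True if the process had to be terminated."""
    p = mp.Process(target=target, args=args)
    p.start()
    p.join(timeout)  # returns when the process ends or the timeout expires
    if p.is_alive():
        p.terminate()
        p.join()     # reap the terminated process
        return True
    return False

if __name__ == '__main__':
    # For the real job, pass the number of seconds left until EXIT_TIME.
    print(run_until(twitter, 2))
```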

Here is an example with threading and python scheduler:
import threading
import time
import os
import schedule  # third-party package: pip install schedule
def theKillingJob(message):
    print(message)
    print("Kenny and Cartman die!")
    os._exit(1)  # ends the whole process immediately, including all threads
schedule.every().day.at("18:00").do(theKillingJob, 'It is 18:00')
def getStream(tweet_iter):
    for tweet in tweet_iter:
        pass  # do stuff
def kenny():
    while True:
        print("Kenny alive..")
        schedule.run_pending()
        time.sleep(1)
def cartman():
    while True:
        print("Cartman alive..")
        tweet_iter = ConnectAndGetStream()  # the asker's own function
        getStream(tweet_iter)
        # You can change how often tweets are checked by changing the sleep time here
        time.sleep(1)
if __name__ == '__main__':
    daemon_kenny = threading.Thread(name='kenny', target=kenny)
    daemon_cartman = threading.Thread(name='cartman', target=cartman)
    daemon_kenny.daemon = True
    daemon_cartman.daemon = True
    daemon_kenny.start()
    daemon_cartman.start()
    daemon_kenny.join()
    daemon_cartman.join()

Strange process clone appears with python multiprocessing

I have run into some very strange behavior in Python. When I start a parallel program that uses multiprocessing and spawns 2 more processes (producer, consumer) from the main process, I see 4 processes running. I think there should be only 3: the main process, the producer, and the consumer. But after some time a 4th process appears.
I have made a minimal example of the code that reproduces the problem. It creates two processes that calculate Fibonacci numbers using recursion:
from multiprocessing import Process, Queue
import os, sys
import time
import signal
def fib(n):
    if n == 1 or n == 2:
        return 1
    result = fib(n-1) + fib(n-2)
    return result
def worker(queue, amount):
    pid = os.getpid()
    def workerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, workerProcess)
    print 'Worker', os.getpid()
    for i in range(0, amount):
        queue.put(fib(35 - i % 4))
    queue.put('end')
    print 'Worker finished'
def writer(queue):
    pid = os.getpid()
    def writerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, writerProcess)
    print 'Writer', os.getpid()
    working = True
    while working:
        if not queue.empty():
            value = queue.get()
            if value != 'end':
                fib(32 + value % 4)
            else:
                working = False
        else:
            time.sleep(1)
    print 'Writer finished'
def daemon():
    print 'Daemon', os.getpid()
    while True:
        time.sleep(1)
def useProcesses(amount):
    q = Queue()
    writer_process = Process(target=writer, args=(q,))
    worker_process = Process(target=worker, args=(q, amount))
    writer_process.daemon = True
    worker_process.daemon = True
    worker_process.start()
    writer_process.start()
def run(amount):
    print 'Main', os.getpid()
    pid = os.getpid()
    def killThisProcess(a, b):
        print a, b
        print 'Main killed by signal(', pid, ')'
        sys.exit(0)
    signal.signal(signal.SIGTERM, killThisProcess)
    useProcesses(amount)
    print 'Ready to exit main'
    while True:
        time.sleep(1)
def main():
    run(1000)
if __name__ == '__main__':
    main()
What I see in the output is:
$ python python_daemon.py
Main 13257
Ready to exit main
Worker 13258
Writer 13259
but in htop I see the following:
And it looks like the process with PID 13322 is actually a thread. The question is: what is it? Who spawns it? Why?
If I send SIGUSR1 to this PID I see in the output appears:
10 <frame object at 0x7f05c14ed5d8>
This is Writer( 13258 )
This question is slightly related with: Python multiprocessing: more processes than requested

The thread belongs to the Queue object.
Internally, the queue uses a thread to feed the data into a Pipe.
From the docs:
class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.
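A quick way to confirm this without htop is to count threads around the first put(). This is a minimal Python 3 sketch; feeder_thread_demo is an illustrative name. It relies on the documented behavior quoted above, plus close() and join_thread(), which shut the feeder thread down. No extra process is involved, only one extra thread in the process that called put().

```python
import multiprocessing
import threading

def feeder_thread_demo():
    """Return thread counts: before the first put(), right after it, and after shutdown."""
    q = multiprocessing.Queue()
    before = threading.active_count()
    q.put('item')             # the first put() starts the queue's feeder thread
    during = threading.active_count()
    assert q.get() == 'item'
    q.close()                 # this process will not put anything else
    q.join_thread()           # wait for the feeder thread to flush and exit
    after = threading.active_count()
    return before, during, after

if __name__ == '__main__':
    print(feeder_thread_demo())
```

In a fresh script, the middle count should be one higher than the other two.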

Why the threads are not released after all work is consumed from python Queue

I use a Queue to provide tasks that threads can work on. After all the work in the Queue is done, I see that the threads are still alive, although I expected them to be released. Here is my code. You can see from the console output that the number of active threads grows after each batch of tasks. How can I release the threads after a batch of work is done?
import threading
import time
from Queue import Queue
class ThreadWorker(threading.Thread):
    def __init__(self, task_queue):
        threading.Thread.__init__(self)
        self.task_queue = task_queue
    def run(self):
        while True:
            work = self.task_queue.get()
            # do some work
            # do_work(work)
            time.sleep(0.1)
            self.task_queue.task_done()
def get_batch_work_done(works):
    task_queue = Queue()
    for _ in range(5):
        t = ThreadWorker(task_queue)
        t.setDaemon(True)
        t.start()
    for work in range(works):
        task_queue.put(work)
    task_queue.join()
    print 'get batch work done'
    print 'active threads count is {}'.format(threading.activeCount())
if __name__ == '__main__':
    for work_number in range(3):
        print 'start with {}'.format(work_number)
        get_batch_work_done(work_number)

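Read with a short timeout (instead of blocking forever) in a loop, and use exception handling to end the thread once the queue stays empty:
from Queue import Empty  # Empty is defined in the Queue module, not on the Queue class
def run(self):
    try:
        while True:
            work = self.task_queue.get(True, 0.1)
            # do some work
            # do_work(work)
            self.task_queue.task_done()  # keep this, or task_queue.join() never returns
    except Empty:
        print "goodbye"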
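A different technique, not the one in the answer above, is the sentinel (or "poison pill") pattern. Once task_queue.join() returns, put one None per worker so each thread leaves its loop, then join the threads. Unlike the 0.1 second timeout, a worker never exits just because the queue was briefly empty. This is a Python 3 sketch (on 2.7 the module is named Queue). NUM_WORKERS and the sleep are placeholders for the real setup and do_work(work).

```python
import queue
import threading
import time

NUM_WORKERS = 5
STOP = None  # sentinel: a worker exits when it receives this value

def worker(task_queue):
    while True:
        work = task_queue.get()
        try:
            if work is STOP:
                return
            time.sleep(0.01)  # placeholder for do_work(work)
        finally:
            task_queue.task_done()  # runs for real tasks and for the sentinel

def get_batch_work_done(works):
    task_queue = queue.Queue()
    threads = [threading.Thread(target=worker, args=(task_queue,))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for work in range(works):
        task_queue.put(work)
    task_queue.join()         # every real task has been processed
    for _ in threads:
        task_queue.put(STOP)  # one sentinel per worker
    for t in threads:
        t.join()              # all workers have now exited
    return threading.active_count()

if __name__ == '__main__':
    for work_number in range(3):
        print('active threads after batch', work_number, get_batch_work_done(work_number))
```

The active thread count reported after each batch should stay at its starting value instead of growing by five.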

Python multiprocessing with Queue (split loads dynamically)

I am trying to use multiprocessing to process a very large number of files.
I tried putting the list of files into a queue and having 3 workers split the load through a common Queue. However, this doesn't seem to work. I am probably misunderstanding how the queue in the multiprocessing package works.
Below is the example source code:
import multiprocessing
from multiprocessing import Queue
def worker(i, qu):
    """worker function"""
    while ~qu.empty():
        val=qu.get()
        print 'Worker:',i, ' start with file:',val
        j=1
        for k in range(i*10000,(i+1)*10000): # some time consuming process
            for j in range(i*10000,(i+1)*10000):
                j=j+k
        print 'Worker:',i, ' end with file:',val
if __name__ == '__main__':
    jobs = []
    qu=Queue()
    for j in range(100,110): # files numbers are from 100 to 110
        qu.put(j)
    for i in range(3): # 3 multiprocess
        p = multiprocessing.Process(target=worker, args=(i,qu))
        jobs.append(p)
        p.start()
    p.join()
Thanks for the comments.
I have come to realize that using a Pool is the best solution.
import multiprocessing
import time
def worker(val):
    """worker function"""
    print 'Worker: start with file:',val
    time.sleep(1.1)
    print 'Worker: end with file:',val
if __name__ == '__main__':
    file_list=range(100,110)
    p = multiprocessing.Pool(2)
    p.map(worker, file_list)

Three issues:
1) you are joining only on the 3rd process
2) Why not use multiprocessing.Pool?
3) there is a race condition between qu.empty() and qu.get()
1 & 3)
import multiprocessing
from multiprocessing import Queue
from Queue import Empty  # Python 2; in Python 3: from queue import Empty
def worker(i, qu):
    """worker function"""
    while 1:
        try:
            val=qu.get(timeout=1)  # wait at most 1 second for the next item
        except Empty: break  # Yay no race condition
        print 'Worker:',i, ' start with file:',val
        j=1
        for k in range(i*10000,(i+1)*10000): # some time consuming process
            for j in range(i*10000,(i+1)*10000):
                j=j+k
        print 'Worker:',i, ' end with file:',val
if __name__ == '__main__':
    jobs = []
    qu=Queue()
    for j in range(100,110): # files numbers are from 100 to 110
        qu.put(j)
    for i in range(3): # 3 multiprocess
        p = multiprocessing.Process(target=worker, args=(i,qu))
        jobs.append(p)
        p.start()
    for p in jobs: #<--- join on all processes ...
        p.join()
2)
for how to use the Pool, see:
https://docs.python.org/2/library/multiprocessing.html
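Since this page is about timeouts, note that a Pool can enforce one too. map_async() returns an AsyncResult whose get(timeout) raises multiprocessing.TimeoutError if the results are not ready in time. This is a Python 3 sketch; process_file and map_with_timeout are hypothetical names, and the sleep stands in for the real per-file work. Leaving the with block calls the pool's terminate(), which stops any worker that is still stuck.

```python
import multiprocessing
import time

def process_file(val):
    # Hypothetical stand-in for the real per-file work.
    time.sleep(0.1)
    return val * 2

def map_with_timeout(func, items, processes=2, timeout=10.0):
    """Return [func(x) for x in items] computed in a Pool,
    or None if the whole batch takes longer than `timeout` seconds."""
    # Leaving the with block terminates the pool, including stuck workers.
    with multiprocessing.Pool(processes) as pool:
        async_result = pool.map_async(func, items)
        try:
            return async_result.get(timeout)
        except multiprocessing.TimeoutError:
            return None

if __name__ == '__main__':
    print(map_with_timeout(process_file, range(100, 110)))
```

Like map(), the results come back in the same order as the input items.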

You are joining only the last of your created processes. That means that if the first or second process is still working when the third one finishes, your main process stops waiting and carries on before they are finished.
You should join them all in order to wait until they are finished:
for p in jobs:
    p.join()
Another thing is you should consider using qu.get_nowait() in order to get rid of the race condition between qu.empty() and qu.get().
For example:
from Queue import Empty  # Python 2; in Python 3: from queue import Empty
try:
    while 1:
        message = self.queue.get_nowait()
        # do something fancy here
except Empty:
    pass
I hope that helps
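Apart from the race condition, the loop condition in the question, while ~qu.empty():, can never become false. ~ is bitwise NOT, not logical negation, and bool is a subclass of int, so ~True equals -2 and ~False equals -1. Both values are truthy, so the loop never ends. Logical negation is spelled not. The short check below uses int() explicitly, because newer Python 3 releases deprecate applying ~ directly to a bool.

```python
# bool is a subclass of int, so ~True behaves like ~1 and ~False like ~0.
for empty in (True, False):
    buggy = ~int(empty)  # what `~qu.empty()` evaluates to
    intended = not empty  # what `not qu.empty()` evaluates to
    print(empty, buggy, bool(buggy), intended)
```

Both buggy values are truthy, while only the `not` version becomes False once the queue is empty.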

kill a function after a certain time in windows

I've read a lot of posts about using threads, subprocesses, etc. A lot of it seems overcomplicated for what I'm trying to do...
All I want to do is stop executing a function after X amount of time has elapsed.
def big_loop(bob):
    x = bob
    start = time.time()
    while True:
        print time.time()-start
This function is an endless loop that never throws any errors or exceptions, period.
I'm not sure of the difference between "commands, shells, subprocesses, threads, etc." and this function, which is why I'm having trouble manipulating subprocesses.
I found this code here, and tried it but as you can see it keeps printing after 10 seconds have elapsed:
import time
import threading
import subprocess as sub
class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout
    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()
    def Run(self):
        self.start()
        self.join(self.timeout)
        if self.is_alive():
            self.p.terminate()
            self.join()
def big_loop(bob):
    x = bob
    start = time.time()
    while True:
        print time.time()-start
RunCmd(big_loop('jimijojo'), 10).Run() #supposed to quit after 10 seconds, but doesn't
x = raw_input('DONEEEEEEEEEEEE')
What's a simple way to kill this function? As you can see from my attempt above, it doesn't terminate after 20 seconds and just keeps going...
Also, I've read about using signal, but I'm on Windows, so I can't use the alarm feature (Python 2.7).
Assume the infinitely running function can't be modified to be finite. If I could change the function, I'd just make it finite, wouldn't I?
Here are some similar questions, but I haven't been able to adapt their code to my simple function:
Perhaps you can?
Python: kill or terminate subprocess when timeout
signal.alarm replacement in Windows [Python]
OK, I tried an answer I received, and it works. But how can I use it if I remove the if __name__ == "__main__": statement? When I remove that statement, the loop never ends, just as before.
import multiprocessing
import Queue
import time
def infinite_loop_function(bob):
    var = bob
    start = time.time()
    while True:
        time.sleep(1)
        print time.time()-start
    print 'this statement will never print'
def wrapper(queue, bob):
    result = infinite_loop_function(bob)
    queue.put(result)
    queue.close()
#if __name__ == "__main__":
queue = multiprocessing.Queue(1) # Maximum size is 1
proc = multiprocessing.Process(target=wrapper, args=(queue, 'var'))
proc.start()
# Wait for TIMEOUT seconds
try:
    timeout = 10
    result = queue.get(True, timeout)
except Queue.Empty:
    # Deal with lack of data somehow
    result = None
finally:
    proc.terminate()
print 'running other code, now that that infinite loop has been defeated!'
print 'bla bla bla'
x = raw_input('done')

Use the building blocks in the multiprocessing module:
import multiprocessing
import Queue
TIMEOUT = 5
def big_loop(bob):
    import time
    time.sleep(4)
    return bob*2
def wrapper(queue, bob):
    result = big_loop(bob)
    queue.put(result)
    queue.close()
def run_loop_with_timeout():
    bob = 21 # Whatever sensible value you need
    queue = multiprocessing.Queue(1) # Maximum size is 1
    proc = multiprocessing.Process(target=wrapper, args=(queue, bob))
    proc.start()
    # Wait for TIMEOUT seconds
    try:
        result = queue.get(True, TIMEOUT)
    except Queue.Empty:
        # Deal with lack of data somehow
        result = None
    finally:
        proc.terminate()
    # Process data here, not in try block above, otherwise your process keeps running
    print result
if __name__ == "__main__":
    run_loop_with_timeout()
You could also accomplish this with a Pipe/Connection pair, but I'm not familiar with their API. Change the sleep time or TIMEOUT to check the behaviour for either case.
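For the Pipe/Connection variant mentioned above, multiprocessing.Pipe(duplex=False) returns a (receiving end, sending end) pair. Connection.poll(timeout) waits up to timeout seconds for data to arrive. Below is a sketch with the same wrapper/big_loop shape as the answer; it is written for Python 3 and should also run on 2.7. The sleep inside big_loop is only a stand-in for the real work.

The if __name__ == '__main__': guard is what the follow-up in the question is missing. On Windows, the child process starts by re-importing the main module, so unguarded top-level code runs again in the child. The "Safe importing of main module" guideline in the multiprocessing docs covers this.

```python
import multiprocessing
import time

TIMEOUT = 5

def big_loop(bob):
    time.sleep(1)  # stand-in for the real work; sleep longer than TIMEOUT to see the timeout path
    return bob * 2

def wrapper(conn, bob):
    conn.send(big_loop(bob))
    conn.close()

def run_loop_with_timeout(bob, timeout=TIMEOUT):
    recv_end, send_end = multiprocessing.Pipe(duplex=False)
    proc = multiprocessing.Process(target=wrapper, args=(send_end, bob))
    proc.start()
    try:
        if recv_end.poll(timeout):  # True if data arrived within `timeout` seconds
            return recv_end.recv()
        return None                 # timed out; deal with lack of data here
    finally:
        proc.terminate()  # harmless if the process already finished
        proc.join()
        recv_end.close()

if __name__ == '__main__':
    print(run_loop_with_timeout(21))
```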

There is no straightforward way to kill a function after a certain amount of time without running the function in a separate process. A better approach would probably be to rewrite the function so that it returns after a specified time:
import time
def big_loop(bob, timeout):
    x = bob
    start = time.time()
    end = start + timeout
    while time.time() < end:
        print time.time() - start
        # Do more stuff here as needed

Can't you just return from the loop?
import time
def run_for_30_seconds():
    start = time.time()
    endt = start + 30
    while True:
        now = time.time()
        if now > endt:
            return
        else:
            print now - start

import os, signal, time
cpid = os.fork()  # POSIX only: os.fork() is not available on Windows
if cpid == 0:
    while True:
        pass  # do stuff
else:
    time.sleep(10)
    os.kill(cpid, signal.SIGKILL)
You can also have a thread's loop check an event, which is more portable and more flexible, because it allows reactions other than brute killing. However, this approach fails if # do stuff can take a long time (or even wait forever on some event).
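The event-based approach from the last paragraph could be sketched as below in Python 3. It assumes the loop can be modified to accept a threading.Event, which the question rules out. It also only helps if each iteration returns reasonably quickly. big_loop with a stop parameter and run_for are hypothetical.

```python
import threading

def big_loop(bob, stop):
    """Loop until `stop` is set; returns how many iterations ran."""
    iterations = 0
    while not stop.is_set():
        iterations += 1
        stop.wait(0.1)  # sleep, but wake up immediately if stop is set
    return iterations

def run_for(seconds):
    stop = threading.Event()
    result = []
    t = threading.Thread(target=lambda: result.append(big_loop('jimijojo', stop)))
    t.start()
    t.join(seconds)  # give the loop at most `seconds` seconds
    stop.set()       # then ask it to finish
    t.join()         # returns promptly because the loop checks the flag
    return result[0]

if __name__ == '__main__':
    print('iterations:', run_for(1))
```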

We Keep Coding
