I am building a watchdog timer that runs another Python program and, if it fails to receive a check-in from any of that program's threads, shuts the whole program down. This is so that it will eventually be able to take control of needed communication ports. The code for the timer is as follows:
from multiprocessing import Process, Queue
from time import sleep
from copy import deepcopy
PATH_TO_FILE = r'.\test_program.py'
WATCHDOG_TIMEOUT = 2
class Watchdog:
    def __init__(self, filepath, timeout):
        self.filepath = filepath
        self.timeout = timeout
        self.threadIdQ = Queue()
        self.knownThreads = {}
    def start(self):
        threadIdQ = self.threadIdQ
        process = Process(target = self._executeFile)
        process.start()
        try:
            while True:
                unaccountedThreads = deepcopy(self.knownThreads)
                # Empty queue since last wake. Add new thread IDs to knownThreads, and account for all known thread IDs
                # in queue
                while not threadIdQ.empty():
                    threadId = threadIdQ.get()
                    if threadId in self.knownThreads:
                        unaccountedThreads.pop(threadId, None)
                    else:
                        print('New threadId < {} > discovered'.format(threadId))
                        self.knownThreads[threadId] = False
                # If there is a known thread that is unaccounted for, then it has either hung or crashed.
                # Shut everything down.
                if len(unaccountedThreads) > 0:
                    print('The following threads are unaccounted for:\n')
                    for threadId in unaccountedThreads:
                        print(threadId)
                    print('\nShutting down!!!')
                    break
                else:
                    print('No unaccounted threads...')
                sleep(self.timeout)
        # Account for any exceptions thrown in the watchdog timer itself
        except:
            process.terminate()
            raise
        process.terminate()
    def _executeFile(self):
        with open(self.filepath, 'r') as f:
            exec(f.read(), {'wdQueue' : self.threadIdQ})
if __name__ == '__main__':
    wd = Watchdog(PATH_TO_FILE, WATCHDOG_TIMEOUT)
    wd.start()
I also have a small program to test the watchdog functionality:
from time import sleep
from threading import Thread
from queue import SimpleQueue
Q_TO_Q_DELAY = 0.013
class QToQ:
    def __init__(self, processQueue, threadQueue):
        self.processQueue = processQueue
        self.threadQueue = threadQueue
        Thread(name='queueToQueue', target=self._run).start()
    def _run(self):
        pQ = self.processQueue
        tQ = self.threadQueue
        while True:
            while not tQ.empty():
                sleep(Q_TO_Q_DELAY)
                pQ.put(tQ.get())
def fastThread(q):
    while True:
        print('Fast thread, checking in!')
        q.put('fastID')
        sleep(0.5)
def slowThread(q):
    while True:
        print('Slow thread, checking in...')
        q.put('slowID')
        sleep(1.5)
def hangThread(q):
    print('Hanging thread, checked in')
    q.put('hangID')
    while True:
        pass
print('Hello! I am a program that spawns threads!\n\n')
threadQ = SimpleQueue()
Thread(name='fastThread', target=fastThread, args=(threadQ,)).start()
Thread(name='slowThread', target=slowThread, args=(threadQ,)).start()
Thread(name='hangThread', target=hangThread, args=(threadQ,)).start()
QToQ(wdQueue, threadQ)
As you can see, I need to have the threads put into a queue.SimpleQueue, while a separate object slowly feeds the output of that queue into the multiprocessing queue. If instead I have the threads put directly into the multiprocessing queue, or do not have the QToQ object sleep between puts, the multiprocessing queue locks up and always appears to be empty on the watchdog side.
Now, since the multiprocessing queue is supposed to be thread- and process-safe, I can only assume I have messed something up in the implementation. My solution seems to work, but it feels hacky enough that I think I should fix it.
I am using Python 3.7.2, if it matters.
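I suspect that test_program.py exits, that is, its top-level code runs to the end (so exec() returns) while its threads are still running.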
I changed the last few lines to this:
tq = threadQ
# tq = wdQueue # option to send messages direct to WD
t1 = Thread(name='fastThread', target=fastThread, args=(tq,))
t2 = Thread(name='slowThread', target=slowThread, args=(tq,))
t3 = Thread(name='hangThread', target=hangThread, args=(tq,))
t1.start()
t2.start()
t3.start()
QToQ(wdQueue, threadQ)
print('Joining with threads...')
t1.join()
t2.join()
t3.join()
print('test_program exit')
The calls to join() mean that the test program never exits on its own, since none of the threads ever exit.
So, as is, t3 hangs; the watchdog program detects the unaccounted-for thread and stops the test program.
If t3 is removed from the above program, then the other two threads are well behaved and the watchdog program allows the test program to continue indefinitely.
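A minimal sketch of the same idea, with the threads putting their IDs straight onto a multiprocessing queue and no QToQ relay. It assumes, consistent with the answer above, that what matters is keeping the child's target function from returning while its threads are alive; the names check_in, child_main and collect_check_ins and the timings are illustrative only. The optional ctx argument just lets a caller pick a start method explicitly.

```python
import multiprocessing as mp
import threading
import time

def check_in(q, thread_id, count, period):
    # Each worker thread puts its ID straight onto the multiprocessing queue.
    for _ in range(count):
        q.put(thread_id)
        time.sleep(period)

def child_main(q):
    threads = [
        threading.Thread(target=check_in, args=(q, "fastID", 3, 0.05)),
        threading.Thread(target=check_in, args=(q, "slowID", 3, 0.10)),
    ]
    for t in threads:
        t.start()
    # Keep the process target alive until the threads finish. Returning early
    # would let the child start its multiprocessing exit cleanup while the
    # threads are still putting items.
    for t in threads:
        t.join()

def collect_check_ins(expected=6, timeout=5.0, ctx=None):
    ctx = ctx or mp.get_context()  # default start method unless one is given
    q = ctx.Queue()
    p = ctx.Process(target=child_main, args=(q,))
    p.start()
    # Blocking get() with a timeout rather than polling empty(), which the
    # multiprocessing docs describe as unreliable.
    seen = [q.get(timeout=timeout) for _ in range(expected)]
    p.join()
    return seen
```

On Windows (spawn start method), call collect_check_ins() from under an if __name__ == '__main__': guard, as the watchdog script already does.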
Related
I am new to multithreading. I want to use a thread to write to a log file every 2 seconds. I have two solutions, but I don't know which one is better.
First:
import logging
import threading
import time
def logger(msg):
    if msg is not None:
        logging.info(msg)
def main():
    last = time.time()
    while True:
        msg = get_msg_from_somewhere()
        current = time.time()
        if current - last > 2:
            t1 = threading.Thread(target=logger, args=(msg,))
            t1.start()
            last = current
Second:
msg = None
def logger():
    global msg
    while True:
        if msg is not None:
            logging.info(msg)
            msg = None
        time.sleep(2)
def main():
    t1 = threading.Thread(target=logger)
    t1.daemon = True
    t1.start()
    while True:
        update_msg_from_somewhere()
My thoughts:
I prefer the second solution, because it doesn't need to compare timestamps all the time or create endless new threads (though they will be destroyed after they finish, right?), but I don't think passing msg through a global variable is the best way.
Do you have any ideas on how to pass variables to the daemon thread when it's running? And which solution do you prefer? Why?
Thanks a lot!
There are two questions.
The first is whether to use a daemon thread or not. It depends on your requirements. If it is acceptable for the thread to be terminated abruptly, which means there is no cleanup to do, then a daemon thread is convenient.
The second is how to pass messages in. As I see it, this is a classic message-queue problem, and a better structure is to use a queue:
import logging
import threading
import time
from queue import Queue
def logger(q):
    # Block on q.get() until the None sentinel is received
    for msg in iter(q.get, None):
        logging.info(msg)
def main():
    q = Queue()
    t1 = threading.Thread(target=logger, args=(q,))
    t1.daemon = True
    t1.start()
    while True:
        q.put(get_msg_from_somewhere())
        time.sleep(2)
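If the logger thread does need to clean up (flush a file, finish the last message), one option is to make it non-daemon and stop it explicitly with the same None sentinel that iter(q.get, None) already watches for. A sketch under those assumptions; run_logger and the handle parameter are hypothetical, and handle defaults to logging.info but can be swapped for any callable (for example a list's append when testing):

```python
import logging
import queue
import threading

def logger_worker(q, handle=logging.info):
    # Handle messages until the None sentinel arrives.
    for msg in iter(q.get, None):
        handle(msg)

def run_logger(messages, handle=logging.info):
    q = queue.Queue()
    # Not a daemon: the thread is stopped explicitly, so queued messages are not lost.
    t = threading.Thread(target=logger_worker, args=(q, handle))
    t.start()
    for m in messages:
        if m is not None:  # None is reserved as the stop signal
            q.put(m)
    q.put(None)  # ask the logger thread to finish
    t.join()     # returns after every earlier message has been handled
```

The trade-off is the one described above: a daemon thread needs no shutdown code but may be cut off mid-write, while this version needs the explicit put(None) and join() at exit.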
I'm trying to run multiple API requests in parallel with multiprocessing.Process and requests. I put the URLs to fetch into a JoinableQueue instance and put the content back into a Queue instance. I've noticed that putting response.content into the Queue somehow prevents the process from terminating.
Here's a simplified example with just 1 process (Python 3.5):
import multiprocessing as mp
import queue
import requests
import time
class ChildProcess(mp.Process):
    def __init__(self, qin, qout):
        super().__init__()
        self.qin = qin
        self.qout = qout
        self.daemon = True
    def run(self):
        while True:
            try:
                url = self.qin.get(block=False)
                r = requests.get(url, verify=False)
                self.qout.put(r.content)
                self.qin.task_done()
            except queue.Empty:
                break
            except requests.exceptions.RequestException as e:
                print(self.name, e)
                self.qin.task_done()
        print("Infinite loop terminates")
if __name__ == '__main__':
    qin = mp.JoinableQueue()
    qout = mp.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    time.sleep(1)
    print(w.name, w.is_alive())
After running the code I get:
Infinite loop terminates
ChildProcess-1 True
Please help me understand why the process doesn't terminate after the run function exits.
Update: added print statement to show the loop terminates
As noted in the Pipes and Queues documentation:
if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe. This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed.
...
Note that a queue created using a manager does not have this issue.
If you switch over to a manager queue, then the process terminates successfully:
import multiprocessing as mp
import queue
import requests
import time
class ChildProcess(mp.Process):
    def __init__(self, qin, qout):
        super().__init__()
        self.qin = qin
        self.qout = qout
        self.daemon = True
    def run(self):
        while True:
            try:
                url = self.qin.get(block=False)
                r = requests.get(url, verify=False)
                self.qout.put(r.content)
                self.qin.task_done()
            except queue.Empty:
                break
            except requests.exceptions.RequestException as e:
                print(self.name, e)
                self.qin.task_done()
        print("Infinite loop terminates")
if __name__ == '__main__':
    manager = mp.Manager()
    qin = mp.JoinableQueue()
    qout = manager.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    time.sleep(1)
    print(w.name, w.is_alive())
It's a bit hard to figure this out from the Queue documentation; I struggled with the same problem.
The key concept here is that before a process that has put data into a queue can exit, it joins that queue's background feeder thread; that join blocks until the feeder thread has flushed every buffered item into the underlying pipe, and once the pipe is full, that can only happen as the other end reads from it. So basically, before your ChildProcess can exit, someone has to consume the stuff it put into the queue!
There is some documentation of the Queue.cancel_join_thread method, which is supposed to circumvent this problem, but I couldn't get it to have any effect; maybe I'm not using it correctly.
Here's an example modification you can make that should fix the issue:
if __name__ == '__main__':
    qin = mp.JoinableQueue()
    qout = mp.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    while True:
        try:
            qout.get(True, 0.1)  # Throw away remaining stuff in qout (or process it or whatever;
                                 # just get it out of the queue so the queue's background thread
                                 # can terminate, so your ChildProcess can terminate).
        except queue.Empty:
            break
    w.join()  # Wait for your ChildProcess to finish up.
    # time.sleep(1)  # Not necessary since we've joined the ChildProcess
    print(w.name, w.is_alive())
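Draining with a 0.1 s timeout relies on guessing when the child is done. A variation, sketched here as an assumption rather than something from the answer, is to have the child put a sentinel after its last result so the parent knows exactly when to stop reading. The producer function and its large fake payloads stand in for the requests calls; SENTINEL, producer and collect_results are hypothetical names.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical "no more results" marker

def producer(items, qout):
    # Stand-in for the requests loop: one large result per input item.
    for _ in items:
        qout.put(b"x" * 100_000)  # big enough to fill a typical pipe buffer
    qout.put(SENTINEL)  # tell the parent this child is finished

def collect_results(n_items=5, ctx=None):
    ctx = ctx or mp.get_context()
    qout = ctx.Queue()
    w = ctx.Process(target=producer, args=(list(range(n_items)), qout))
    w.start()
    # Read until the sentinel so everything the child put is consumed and
    # its queue feeder thread can flush and finish.
    sizes = [len(payload) for payload in iter(qout.get, SENTINEL)]
    w.join()  # safe now: nothing is left buffered in the child
    return sizes, w.exitcode
```

Because every item is consumed before w.join(), the child's feeder thread can flush its buffer and the join returns instead of hanging. With several children, the parent would count one sentinel per child.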
Add a call to w.terminate() above the print statement.
Regarding why the process doesn't terminate by itself: your function code is an infinite loop, so it never returns. Calling terminate signals the process to kill itself.
I know that the termination notice is made available via the meta-data url and that I can do something similar to
if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
in order to determine if the notice has been posted. I run a Python service on my Spot Instances that:
Loops over long-polling SQS queues
If it gets a message, it pauses polling and works on the payload.
Working on the payload can take 5-50 minutes.
Working on the payload involves spawning a thread pool of up to 50 threads to handle parallel uploading of files to S3; this is where most of the time is spent.
Finally, remove the message from the queue, rinse, repeat.
The work is idempotent, so if the same payload runs multiple times, I'm out the processing time/costs, but will not negatively impact the application workflow.
I'm searching for an elegant way to now also poll for the termination notice every five seconds in the background. As soon as the termination notice appears, I'd like to immediately release the message back to the SQS queue in order for another instance to pick it up as quickly as possible.
As a bonus, I'd like to shut down the work, kill off the thread pool, and have the service enter a stasis state. If I terminate the service, supervisord will simply start it back up again.
Even bigger bonus! Is there not a Python module available that simplifies this and just works?
I wrote this code to demonstrate how a thread can be used to poll for Spot instance termination. It first starts a polling thread, which would be responsible for checking the HTTP endpoint.
Then it creates a pool of fake workers (mimicking real work to be done) and starts running the pool. Eventually the polling thread kicks in (about 15 seconds into execution as implemented, on its third 5-second poll) and kills the whole thing.
To prevent the script from continuing to work after Supervisor restarts it, we would simply put a check at the beginning of the __main__ block, and if the termination notice is there, sleep for 2.5 minutes, which is longer than the notice lasts before the instance is shut down.
#!/usr/bin/env python3
import threading
import queue
import random
import time
import os
class Instance_Termination_Poll(threading.Thread):
    """
    Poll every 5 seconds and eventually pretend that we receive the
    termination event, i.e. that
    requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
    """
    def run(self):
        print("Polling for termination")
        while True:
            for i in range(30):
                time.sleep(5)
                if i == 2:
                    print("Received termination poll!")
                    print("Pretend we returned the message to the queue.")
                    print("Now kill the entire program.")
                    os._exit(1)
            print("Well now, this is embarrassing!")
class ThreadPool:
    """
    Pool of threads consuming tasks from a queue
    """
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.errors = queue.Queue()
        self.tasks = queue.Queue(self.num_threads)
        for _ in range(num_threads):
            Worker(self.tasks, self.errors)
    def add_task(self, func, *args, **kargs):
        """
        Add a task to the queue
        """
        self.tasks.put((func, args, kargs))
    def wait_completion(self):
        """
        Wait for completion of all the tasks in the queue
        """
        try:
            while True:
                if not self.tasks.empty():
                    time.sleep(10)
                else:
                    break
        except KeyboardInterrupt:
            print("Ctrl-C received! Kill it all with prejudice...")
            os._exit(1)
        self.tasks.join()
class Worker(threading.Thread):
    """
    Thread executing tasks from a given tasks queue
    """
    def __init__(self, tasks, error_queue):
        threading.Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.errors = error_queue
        self.start()
    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                print("Exception " + str(e))
                error = {'exception': e}
                self.errors.put(error)
            self.tasks.task_done()
def do_work(n):
    """
    Sleeps a random amount of time, then creates a little CPU usage to
    mimic some work taking place.
    """
    for z in range(100):
        time.sleep(random.randint(3, 10))
        print("Thread ID: {} working.".format(threading.current_thread()))
        for x in range(30000):
            x * n
        print("Thread ID: {} done, sleeping.".format(threading.current_thread()))
if __name__ == '__main__':
    num_threads = 30
    # Start up the termination polling thread
    term_poll = Instance_Termination_Poll()
    term_poll.start()
    # Create our thread pool
    pool = ThreadPool(num_threads)
    for y in range(num_threads * 2):
        pool.add_task(do_work, n=y)
    # Wait for the thread pool to complete
    pool.wait_completion()
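A different approach from the os._exit(1) call above, sketched only as an illustration: share a threading.Event between the polling thread and the workers, have the poller set it when the notice appears, and let each worker check it between small units of work. is_terminating is a hypothetical callable; in a real service it would wrap the metadata request from the question. Returning the SQS message and entering the stasis state are left out.

```python
import threading
import time

def poll_for_termination(is_terminating, stop, interval=5.0):
    # Check every `interval` seconds until the notice appears or `stop` is set.
    while not stop.is_set():
        if is_terminating():
            stop.set()  # tell every worker to wind down
            return
        stop.wait(interval)

def worker(stop, done, lock, step=0.01):
    # Stand-in for upload work: small units, checking `stop` between them.
    while not stop.is_set():
        time.sleep(step)
        with lock:
            done[0] += 1

def run_until_terminated(is_terminating, num_workers=4, interval=0.05):
    stop = threading.Event()
    done, lock = [0], threading.Lock()
    poller = threading.Thread(target=poll_for_termination,
                              args=(is_terminating, stop, interval))
    workers = [threading.Thread(target=worker, args=(stop, done, lock))
               for _ in range(num_workers)]
    for t in workers:
        t.start()
    poller.start()
    for t in workers:
        t.join()  # returns once the poller has set `stop`
    poller.join()
    return stop.is_set(), done[0]
```

Unlike os._exit(), this lets each worker finish its current unit (for example, one S3 upload) before stopping, so the size of a work unit bounds how quickly the pool shuts down.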
I'm using Python multiprocessing for RabbitMQ consumers.
On application start, I create 4 WorkerProcesses:
def start_workers(num=4):
    for i in range(num):
        process = WorkerProcess()
        process.start()
Below is my WorkerProcess class.
The logic works so far: I create 4 parallel consumer processes.
But the problem comes after a process gets killed: I want to create a new process. The problem with the logic below is that the new process is created as a child process of the old one, and after a while the memory runs out.
Is there any way with Python multiprocessing to start a new process and kill the old one correctly?
class WorkerProcess(multiprocessing.Process):
    def __init__(self):
        # Initialize Process first: self.name is not available before this call
        super(WorkerProcess, self).__init__()
        app.logger.info('%s: Starting new Thread!', self.name)
    def shutdown(self):
        process = WorkerProcess()
        process.start()
        return True
    def kill(self):
        start_workers(1)
        self.terminate()
    def run(self):
        try:
            # Connect to RabbitMQ
            credentials = pika.PlainCredentials(app.config.get('RABBIT_USER'), app.config.get('RABBIT_PASS'))
            connection = pika.BlockingConnection(
                pika.ConnectionParameters(host=app.config.get('RABBITMQ_SERVER'), port=5672, credentials=credentials))
            channel = connection.channel()
            # Declare the Queue
            channel.queue_declare(queue='screenshotlayer',
                                  auto_delete=False,
                                  durable=True)
            app.logger.info('%s: Start to consume from RabbitMQ.', self.name)
            channel.basic_qos(prefetch_count=1)
            channel.basic_consume(callback, queue='screenshotlayer')
            channel.start_consuming()
            app.logger.info('%s: Thread is going to sleep!', self.name)
            # do what channel.start_consuming() does but with stopping signal
            #while self.stop_working.is_set():
            #    channel.transport.connection.process_data_events()
            channel.stop_consuming()
            connection.close()
        except Exception as e:
            self.shutdown()
        return 0
Thank You
In the main process, keep track of your subprocesses (in a list) and loop over them with .join(timeout=50) (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.join).
Then check whether each one is alive (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.is_alive).
If it is not, replace it with a fresh one.
import time
def start_workers(n):
    wks = []
    for _ in range(n):
        wks.append(WorkerProcess())
        wks[-1].start()
    while True:
        # Remove all terminated processes (is_alive() also reaps them)
        wks = [p for p in wks if p.is_alive()]
        # Start new processes to replace them
        for i in range(n - len(wks)):
            wks.append(WorkerProcess())
            wks[-1].start()
        # Avoid busy-waiting between checks
        time.sleep(1)
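A bounded version of the same supervision loop, so it can actually be run, using join() with a timeout as the answer suggests: the parent alone creates replacement processes, which avoids the chain of children spawning children described in the question. flaky_worker is a hypothetical stand-in for the RabbitMQ consumer, and rounds only exists so the sketch terminates; a real supervisor would loop forever.

```python
import multiprocessing as mp
import time

def flaky_worker(lifetime):
    # Stand-in for a consumer that exits (or crashes) after a while.
    time.sleep(lifetime)

def supervise(n, rounds, lifetime=0.1, poll=0.2, ctx=None):
    ctx = ctx or mp.get_context()
    started = 0

    def spawn():
        nonlocal started
        p = ctx.Process(target=flaky_worker, args=(lifetime,))
        p.start()
        started += 1
        return p

    workers = [spawn() for _ in range(n)]
    for _ in range(rounds):
        # join() with a timeout waits briefly and reaps any child that exited.
        for p in workers:
            p.join(timeout=poll / n)
        # Keep the live workers and top the pool back up from the parent only.
        workers = [p for p in workers if p.is_alive()]
        workers += [spawn() for _ in range(n - len(workers))]
    for p in workers:
        p.join()
    return started
```

Splitting the poll interval across the workers keeps one long join() on a healthy worker from delaying the replacement of a dead one.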
I would not handle the process pool management myself. Instead, I would use the ProcessPoolExecutor from the concurrent.futures module.
There's no need for WorkerProcess to inherit from the Process class. Just write your actual code in the class and then submit it to a process pool executor. The executor keeps a pool of processes ready to execute your tasks.
This keeps things simple and saves you some headaches.
You can read more about it in my blog post: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
Example Code:
from concurrent.futures import ProcessPoolExecutor
from time import sleep
def return_after_5_secs(message):
    sleep(5)
    return message
# The guard is required on platforms that start workers with "spawn" (e.g. Windows)
if __name__ == '__main__':
    pool = ProcessPoolExecutor(3)
    future = pool.submit(return_after_5_secs, "hello")
    print(future.done())
    sleep(5)
    print(future.done())
    print("Result: " + future.result())
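A sketch of the executor approach applied to several messages at once; handle_message is a hypothetical stand-in for the consumer logic, and start_method is an optional knob added for illustration. Using the executor as a context manager shuts the pool down and waits for its worker processes when the block exits:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
from multiprocessing import get_context

def handle_message(body):
    # Hypothetical stand-in for the real consumer work.
    return body.upper()

def process_all(messages, workers=3, start_method=None):
    results = []
    # get_context(None) returns the platform's default start method.
    with ProcessPoolExecutor(max_workers=workers,
                             mp_context=get_context(start_method)) as pool:
        futures = [pool.submit(handle_message, m) for m in messages]
        # as_completed() yields futures in the order they finish.
        for fut in as_completed(futures):
            results.append(fut.result())
    return sorted(results)
```

As with the example above, call process_all() under if __name__ == '__main__': on platforms that use the spawn start method.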
I'm very new to the multiprocessing module, and I just tried to build the following: I have one process whose job is to get messages from RabbitMQ and pass them to an internal queue (multiprocessing.Queue). Then I want to spawn a process whenever a new message comes in. It works, but after the job is finished it leaves a zombie process that its parent never cleans up. Here is my code:
Main Process:
#!/usr/bin/env python
import multiprocessing
import logging
import consumer
import producer
import worker
import time
import base
conf = base.get_settings()
logger = base.logger(identity='launcher')
request_order_q = multiprocessing.Queue()
result_order_q = multiprocessing.Queue()
request_status_q = multiprocessing.Queue()
result_status_q = multiprocessing.Queue()
CONSUMER_KEYS = [{'queue':'product.order',
                  'routing_key':'product.order',
                  'internal_q':request_order_q}]
                # {'queue':'product.status',
                #  'routing_key':'product.status',
                #  'internal_q':request_status_q}]
def main():
    # Launch consumers
    for key in CONSUMER_KEYS:
        cons = consumer.RabbitConsumer(rabbit_q=key['queue'],
                                       routing_key=key['routing_key'],
                                       internal_q=key['internal_q'])
        cons.start()
    # Check request_order_q; if it is not empty, spawn a process to handle the message
    while True:
        time.sleep(0.5)
        if not request_order_q.empty():
            handler = worker.Worker(request_order_q.get())
            logger.info('Launching Worker')
            handler.start()
if __name__ == "__main__":
    main()
And here is my Worker:
import multiprocessing
import sys
import time
import base
conf = base.get_settings()
logger = base.logger(identity='worker')
class Worker(multiprocessing.Process):
    def __init__(self, msg):
        super(Worker, self).__init__()
        self.msg = msg
        self.daemon = True
    def run(self):
        logger.info('%s' % self.msg)
        time.sleep(10)
        sys.exit(1)
So after all the messages get processed, I can still see the processes with the ps aux command. I would really like them to be gone once they finish.
Thanks.
Using multiprocessing.active_children is better than Process.join. The function active_children cleans up any zombies created since the last call to active_children. The method join waits only for the selected process. During that time, other processes can terminate and become zombies, but the parent process will not notice until the awaited process has been joined. To see this in action:
import multiprocessing as mp
import time
def main():
    n = 3
    c = list()
    for i in range(n):
        d = dict(i=i)
        p = mp.Process(target=count, kwargs=d)
        p.start()
        c.append(p)
    for p in reversed(c):
        p.join()
        print('joined')
def count(i):
    print(f'{i} going to sleep')
    time.sleep(i * 10)
    print(f'{i} woke up')
if __name__ == '__main__':
    main()
The above will create 3 processes that terminate 10 seconds apart. As written, the last process is joined first, so the other two, which terminated earlier, remain zombies until it finishes (process 0 for about 20 seconds, process 1 for about 10). You can see them with:
ps aux | grep Z
There will be no zombies if the processes are awaited in the sequence that they will terminate. Remove the call to the function reversed to see this case. However, in real applications we rarely know the sequence that children will terminate, so using the method multiprocessing.Process.join will result in some zombies.
The alternative active_children does not leave any zombies.
In the above example, replace the loop for p in reversed(c): with:
while True:
    time.sleep(1)
    if not mp.active_children():
        break
and see what happens.
A couple of things:
Make sure the parent joins its children, to avoid zombies. See Python Multiprocessing Kill Processes
You can check whether a child is still running with the is_alive() member function. See http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process
Use multiprocessing.active_children().
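Applying active_children() to the question's launcher loop might look like the sketch below. It rests on stated assumptions: a plain list replaces request_order_q, the hypothetical handle function replaces Worker.run(), and the loop returns once there is no pending work and no running child so that it can be tested.

```python
import multiprocessing as mp
import time

def handle(msg):
    # Stand-in for Worker.run(): pretend to process one message.
    time.sleep(0.05)

def dispatch(messages, ctx=None, poll=0.01):
    ctx = ctx or mp.get_context()
    pending = list(messages)  # stand-in for request_order_q
    started = 0
    while True:
        # Joins every child that has already finished (so none stays a
        # zombie) and returns the ones that are still running.
        running = mp.active_children()
        if pending:
            ctx.Process(target=handle, args=(pending.pop(0),)).start()
            started += 1
        elif not running:
            return started
        time.sleep(poll)
```

In the original launcher, the same single mp.active_children() call at the top of the while True loop would be enough to stop finished workers from lingering in ps aux.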