How to kill zombie processes created by the multiprocessing module? - Python

I'm very new to the multiprocessing module. I just tried to create the following: I have one process whose job is to get messages from RabbitMQ and pass them to an internal queue (multiprocessing.Queue). Then what I want to do is spawn a process whenever a new message comes in. It works, but after the job is finished it leaves a zombie process that is not terminated by its parent. Here is my code:
Main Process:
#!/usr/bin/env python
import multiprocessing
import logging
import consumer
import producer
import worker
import time
import base
conf = base.get_settings()
logger = base.logger(identity='launcher')
request_order_q = multiprocessing.Queue()
result_order_q = multiprocessing.Queue()
request_status_q = multiprocessing.Queue()
result_status_q = multiprocessing.Queue()
CONSUMER_KEYS = [{'queue': 'product.order',
                  'routing_key': 'product.order',
                  'internal_q': request_order_q}]
                 # {'queue': 'product.status',
                 #  'routing_key': 'product.status',
                 #  'internal_q': request_status_q}]
def main():
    # Launch consumers
    for key in CONSUMER_KEYS:
        cons = consumer.RabbitConsumer(rabbit_q=key['queue'],
                                       routing_key=key['routing_key'],
                                       internal_q=key['internal_q'])
        cons.start()
    # Check request_order_q; if it is not empty, spawn a process to handle the message
    while True:
        time.sleep(0.5)
        if not request_order_q.empty():
            handler = worker.Worker(request_order_q.get())
            logger.info('Launching Worker')
            handler.start()

if __name__ == "__main__":
    main()
And here is my Worker:
import multiprocessing
import sys
import time
import base
conf = base.get_settings()
logger = base.logger(identity='worker')
class Worker(multiprocessing.Process):
    def __init__(self, msg):
        super(Worker, self).__init__()
        self.msg = msg
        self.daemon = True

    def run(self):
        logger.info('%s' % self.msg)
        time.sleep(10)
        sys.exit(1)
So after all the messages get processed I can still see the processes with the ps aux command. But I would really like them to be terminated once they finish.
Thanks.

Using multiprocessing.active_children is better than Process.join. The function active_children cleans up any zombies created since the last call to active_children. The method join awaits the selected process. During that time, other processes can terminate and become zombies, but the parent process will not notice until the awaited process is joined. To see this in action:
import multiprocessing as mp
import time
def main():
    n = 3
    c = list()
    for i in range(n):
        d = dict(i=i)
        p = mp.Process(target=count, kwargs=d)
        p.start()
        c.append(p)
    for p in reversed(c):
        p.join()
        print('joined')

def count(i):
    print(f'{i} going to sleep')
    time.sleep(i * 10)
    print(f'{i} woke up')

if __name__ == '__main__':
    main()
The above will create 3 processes that terminate 10 seconds apart each. As the code is, the last process is joined first, so the other two, which terminated earlier, will be zombies for 20 seconds. You can see them with:
ps aux | grep Z
There will be no zombies if the processes are awaited in the sequence that they will terminate. Remove the call to the function reversed to see this case. However, in real applications we rarely know the sequence that children will terminate, so using the method multiprocessing.Process.join will result in some zombies.
The alternative active_children does not leave any zombies.
In the above example, replace the loop for p in reversed(c): with:
while True:
    time.sleep(1)
    if not mp.active_children():
        break
and see what happens.
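Applied to the launcher in the question, a minimal sketch of the polling loop in main() could look like this (it assumes request_order_q, worker.Worker and logger from the question's code are in scope):
while True:
    time.sleep(0.5)
    # Reap any children that have already finished; active_children()
    # joins them as a side effect, so they never linger as zombies.
    multiprocessing.active_children()
    if not request_order_q.empty():
        handler = worker.Worker(request_order_q.get())
        logger.info('Launching Worker')
        handler.start()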

A couple of things:
Make sure the parent joins its children, to avoid zombies. See Python Multiprocessing Kill Processes
You can check whether a child is still running with the is_alive() member function. See http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process
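For instance, here is a small self-contained sketch (the crunch function and the workers list are hypothetical, not from the question) that polls is_alive() and joins children as soon as they finish:
import multiprocessing
import time

def crunch(seconds):
    # Hypothetical worker: pretend to do some work, then exit
    time.sleep(seconds)

if __name__ == '__main__':
    workers = [multiprocessing.Process(target=crunch, args=(i,)) for i in range(1, 4)]
    for w in workers:
        w.start()
    while workers:
        for w in workers[:]:          # iterate over a copy so we can remove safely
            if not w.is_alive():      # the child has finished
                w.join()              # reap it so it does not stay a zombie
                workers.remove(w)
        time.sleep(0.5)
    print('all children joined')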

Use active_children.
multiprocessing.active_children


Why does my multiprocess queue not appear to be thread safe?

I am building a watchdog timer that runs another Python program, and if it fails to find a check-in from any of the threads, shuts down the whole program. This is so it will, eventually, be able to take control of needed communication ports. The code for the timer is as follows:
from multiprocessing import Process, Queue
from time import sleep
from copy import deepcopy

PATH_TO_FILE = r'.\test_program.py'
WATCHDOG_TIMEOUT = 2

class Watchdog:
    def __init__(self, filepath, timeout):
        self.filepath = filepath
        self.timeout = timeout
        self.threadIdQ = Queue()
        self.knownThreads = {}

    def start(self):
        threadIdQ = self.threadIdQ
        process = Process(target=self._executeFile)
        process.start()
        try:
            while True:
                unaccountedThreads = deepcopy(self.knownThreads)
                # Empty queue since last wake. Add new thread IDs to knownThreads,
                # and account for all known thread IDs in queue
                while not threadIdQ.empty():
                    threadId = threadIdQ.get()
                    if threadId in self.knownThreads:
                        unaccountedThreads.pop(threadId, None)
                    else:
                        print('New threadId < {} > discovered'.format(threadId))
                        self.knownThreads[threadId] = False
                # If there is a known thread that is unaccounted for, then it has
                # either hung or crashed. Shut everything down.
                if len(unaccountedThreads) > 0:
                    print('The following threads are unaccounted for:\n')
                    for threadId in unaccountedThreads:
                        print(threadId)
                    print('\nShutting down!!!')
                    break
                else:
                    print('No unaccounted threads...')
                sleep(self.timeout)
        # Account for any exceptions thrown in the watchdog timer itself
        except:
            process.terminate()
            raise
        process.terminate()

    def _executeFile(self):
        with open(self.filepath, 'r') as f:
            exec(f.read(), {'wdQueue': self.threadIdQ})

if __name__ == '__main__':
    wd = Watchdog(PATH_TO_FILE, WATCHDOG_TIMEOUT)
    wd.start()
I also have a small program to test the watchdog functionality
from time import sleep
from threading import Thread
from queue import SimpleQueue

Q_TO_Q_DELAY = 0.013

class QToQ:
    def __init__(self, processQueue, threadQueue):
        self.processQueue = processQueue
        self.threadQueue = threadQueue
        Thread(name='queueToQueue', target=self._run).start()

    def _run(self):
        pQ = self.processQueue
        tQ = self.threadQueue
        while True:
            while not tQ.empty():
                sleep(Q_TO_Q_DELAY)
                pQ.put(tQ.get())

def fastThread(q):
    while True:
        print('Fast thread, checking in!')
        q.put('fastID')
        sleep(0.5)

def slowThread(q):
    while True:
        print('Slow thread, checking in...')
        q.put('slowID')
        sleep(1.5)

def hangThread(q):
    print('Hanging thread, checked in')
    q.put('hangID')
    while True:
        pass

print('Hello! I am a program that spawns threads!\n\n')

threadQ = SimpleQueue()
Thread(name='fastThread', target=fastThread, args=(threadQ,)).start()
Thread(name='slowThread', target=slowThread, args=(threadQ,)).start()
Thread(name='hangThread', target=hangThread, args=(threadQ,)).start()

QToQ(wdQueue, threadQ)
As you can see, I need to have the threads put into a queue.Queue, while a separate object slowly feeds the output of the queue.Queue into the multiprocessing queue. If instead I have the threads put directly into the multiprocessing queue, or do not have the QToQ object sleep in between puts, the multiprocessing queue will lock up, and will appear to always be empty on the watchdog side.
Now, as the multiprocessing queue is supposed to be thread and process safe, I can only assume I have messed something up in the implementation. My solution seems to work, but also feels hacky enough that I feel I should fix it.
I am using Python 3.7.2, if it matters.
I suspect that test_program.py exits.
I changed the last few lines to this:
tq = threadQ
# tq = wdQueue # option to send messages direct to WD
t1 = Thread(name='fastThread', target=fastThread, args=(tq,))
t2 = Thread(name='slowThread', target=slowThread, args=(tq,))
t3 = Thread(name='hangThread', target=hangThread, args=(tq,))
t1.start()
t2.start()
t3.start()
QToQ(wdQueue, threadQ)
print('Joining with threads...')
t1.join()
t2.join()
t3.join()
print('test_program exit')
The calls to join() mean that the test program never exits by itself, since none of the threads ever exit.
So, as is, t3 hangs, the watchdog program detects the unaccounted-for thread, and stops the test program.
If t3 is removed from the above program, then the other two threads are well behaved and the watchdog program allows the test program to continue indefinitely.

How to handle abnormal child process termination?

I'm using python 3.7 and following this documentation. I want to have a process, which should spawn a child process, wait for it to finish a task, and get some info back. I use the following code:
if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    print(q.get())
    p.join()
When the child process finishes correctly there is no problem and it works great, but the problem starts when my child process is terminated before it finishes.
In this case, my application hangs on the wait.
Giving a timeout to q.get() and p.join() does not completely solve the issue, because I want to know immediately that the child process died rather than wait for the timeout.
Another problem is that a timeout on q.get() raises an exception, which I prefer to avoid.
Can someone suggest a more elegant way to overcome these issues?
Queue & Signal
One possibility would be registering a signal handler and using it to pass a sentinel value.
On Unix you could handle SIGCHLD in the parent, but that's not an option in your case. According to the signal module:
On Windows, signal() can only be called with SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM, or SIGBREAK.
Not sure if killing it through Task-Manager will translate into SIGTERM but you can give it a try.
For handling SIGTERM you would need to register the signal handler in the child.
import os
import sys
import time
import signal
from functools import partial
from multiprocessing import Process, Queue

SENTINEL = None

def _sigterm_handler(signum, frame, queue):
    print("received SIGTERM")
    queue.put(SENTINEL)
    sys.exit()

def register_sigterm(queue):
    global _sigterm_handler
    _sigterm_handler = partial(_sigterm_handler, queue=queue)
    signal.signal(signal.SIGTERM, _sigterm_handler)

def some_func(q):
    register_sigterm(q)
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        q.put(f'msg_{i}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    for msg in iter(q.get, SENTINEL):
        print(msg)
    p.join()
Example Output:
12273
msg_0
msg_1
msg_2
msg_3
received SIGTERM
Process finished with exit code 0
Queue & Process.is_alive()
Even if this works with Task-Manager, your use-case sounds like you can't exclude force kills, so I think you're better off with an approach which doesn't rely on signals.
You can check in a loop whether your process is still alive with p.is_alive(), call queue.get() with a timeout specified, and handle the Empty exception:
import os
import time
from queue import Empty
from multiprocessing import Process, Queue

def some_func(q):
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        q.put(f'msg_{i}')

if __name__ == '__main__':
    q = Queue()
    p = Process(target=some_func, args=(q,))
    p.start()
    while p.is_alive():
        try:
            msg = q.get(timeout=0.1)
        except Empty:
            pass
        else:
            print(msg)
    p.join()
It would also be possible to avoid an exception, but I wouldn't recommend it because you don't spend your waiting time "on the queue", which decreases responsiveness:
while p.is_alive():
    if not q.empty():
        msg = q.get_nowait()
        print(msg)
    time.sleep(0.1)
Pipe & Process.is_alive()
If you intend to utilize one connection per-child, it would however be possible to use a pipe instead of a queue. It's more performant than a queue
(which is mounted on top of a pipe) and you can use multiprocessing.connection.wait (Python 3.3+) to await readiness of multiple objects at once.
multiprocessing.connection.wait(object_list, timeout=None)
Wait till an object in object_list is ready. Returns the list of those objects in object_list which are ready. If timeout is a float then the call blocks for at most that many seconds. If timeout is None then it will block for an unlimited period. A negative timeout is equivalent to a zero timeout.
For both Unix and Windows, an object can appear in object_list if it is a readable Connection object;
a connected and readable socket.socket object; or
the sentinel attribute of a Process object.
A connection or socket object is ready when there is data available to be read from it, or the other end has been closed.
Unix: wait(object_list, timeout) is almost equivalent to select.select(object_list, [], [], timeout). The difference is that, if select.select() is interrupted by a signal, it can raise OSError with an error number of EINTR, whereas wait() will not.
Windows: An item in object_list must either be an integer handle which is waitable (according to the definition used by the documentation of the Win32 function WaitForMultipleObjects()) or it can be an object with a fileno() method which returns a socket handle or pipe handle. (Note that pipe handles and socket handles are not waitable handles.)
You can use this to await the sentinel attribute of the process and the parental end of the pipe concurrently.
import os
import time
from multiprocessing import Process, Pipe
from multiprocessing.connection import wait

def some_func(conn_write):
    print(os.getpid())
    for i in range(30):
        time.sleep(1)
        conn_write.send(f'msg_{i}')

if __name__ == '__main__':
    conn_read, conn_write = Pipe(duplex=False)
    p = Process(target=some_func, args=(conn_write,))
    p.start()
    while p.is_alive():
        wait([p.sentinel, conn_read])  # block-wait until something gets ready
        if conn_read.poll():           # check if something can be received
            print(conn_read.recv())
    p.join()

Multiprocessing and global True/False variable

I'm struggling to get my head around multiprocessing and passing a global True/False variable into my function.
After get_data() finishes I want the analysis() function to start and process the data, while fetch() continues running. How can I make this work? TIA
import multiprocessing

ready = False

def fetch():
    global ready
    get_data()
    ready = True
    return

def analysis():
    analyse_data()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=fetch)
    p2 = multiprocessing.Process(target=analysis)
    p1.start()
    if ready:
        p2.start()
You should run the two processes and use a shared queue to exchange information between them, such as signaling the completion of an action in one of the processes.
Also, you need to have a join() statement to properly wait for completion of the processes you spawn.
from multiprocessing import Process, Queue
import time

def get_data(q):
    # Do something to get data
    time.sleep(2)
    # Put a message in the queue to signal that get_data has finished
    q.put('message from get_data to analyse_data')

def analyse_data(q):
    # Waiting for get_data to finish...
    msg = q.get()
    print(msg)  # Will print 'message from get_data to analyse_data'
    # get_data has finished

if __name__ == '__main__':
    # Create a queue for exchanging messages between processes
    q = Queue()
    # Create the processes, and pass the shared queue to them
    processes = [Process(target=get_data, args=(q,)), Process(target=analyse_data, args=(q,))]
    # Start the processes
    for p in processes:
        p.start()
    # Wait until all processes complete
    for p in processes:
        p.join()
Your example won't work for a few reasons:
Processes cannot share memory with each other (you can't change the global in one process and see the change in the other).
Even if you could change the global value, you are checking it too soon and it most likely won't have changed in time.
Read https://docs.python.org/3/library/ipc.html for more options for inter-process communication. One such option is sketched below.
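As one illustration of such a possibility, here is a minimal sketch using a multiprocessing.Event in place of the global flag; get_data and analyse_data are hypothetical stand-ins for the question's functions:
from multiprocessing import Process, Event
import time

def get_data(ready):          # hypothetical stand-in for the question's fetch()
    time.sleep(2)             # pretend to fetch data
    ready.set()               # signal that the data is available

def analyse_data(ready):      # hypothetical stand-in for analysis()
    ready.wait()              # block until get_data() sets the event
    print('analysing data...')

if __name__ == '__main__':
    ready = Event()
    p1 = Process(target=get_data, args=(ready,))
    p2 = Process(target=analyse_data, args=(ready,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()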

python multiprocessing - process hangs on join for large queue

I'm running python 2.7.3 and I noticed the following strange behavior. Consider this minimal example:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})

if __name__ == '__main__':
    import sys
    qin = Queue()
    qout = Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()
    for i in range(100000):
        print i
        sys.stdout.flush()
        qin.put(i**2)
    qin.put(None)
    worker.join()
When I loop over 10,000 or more, my script hangs on worker.join(). It works fine when the loop only goes to 1,000.
Any ideas?
The qout queue in the subprocess gets full. The data you put in it from foo() doesn't fit in the buffer of the OS's pipes used internally, so the subprocess blocks trying to fit more data. But the parent process is not reading this data: it is simply blocked too, waiting for the subprocess to finish. This is a typical deadlock.
You must keep the size of the queues bounded. Consider the following modification:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        #qout.put({'bar': bar})

if __name__ == '__main__':
    import sys
    qin = Queue()
    qout = Queue()    ## POSITION 1
    for i in range(100):
        #qout = Queue()    ## POSITION 2
        worker = Process(target=foo, args=(qin, qout))
        worker.start()
        for j in range(1000):
            x = i*100 + j
            print x
            sys.stdout.flush()
            qin.put(x**2)
        qin.put(None)
        worker.join()
    print 'Done!'
This works as-is (with the qout.put line commented out). If you try to save all 100,000 results, then qout becomes too large: if I uncomment the qout.put({'bar': bar}) in foo and leave the definition of qout at POSITION 1, the code hangs. If, however, I move the qout definition to POSITION 2, the script finishes.
So in short, you have to be careful that neither qin nor qout becomes too large. (See also: Multiprocessing Queue maxsize limit is 32767)
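As a sketch of the usual fix for the original example, the parent can drain qout before calling join(), so the child is able to flush its pipe and exit (this reuses foo, qin and qout from the question):
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})

if __name__ == '__main__':
    qin = Queue()
    qout = Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()
    n = 100000
    for i in range(n):
        qin.put(i**2)
    qin.put(None)
    # Read every result before joining: the child can only exit once the
    # parent has drained enough of qout for its pipe buffer to flush.
    results = [qout.get() for _ in range(n)]
    worker.join()
    print('got %d results' % len(results))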
I had the same problem on Python 3 when I tried to put strings into a queue with a total size of about 5000 chars.
In my project there was a host process that sets up a queue and starts a subprocess, then joins. After the join, the host process reads from the queue. When the subprocess produces too much data, the host hangs on join. I fixed this by using the following function to wait for the subprocess in the host process:
from multiprocessing import Process, Queue
from queue import Empty

def yield_from_process(q: Queue, p: Process):
    while p.is_alive():
        p.join(timeout=1)
        while True:
            try:
                yield q.get(block=False)
            except Empty:
                break
I read from the queue as soon as it fills, so it never gets very large.
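A possible way to use it, as a sketch (the producer function below is hypothetical; yield_from_process is the function defined above and is assumed to be in scope):
from multiprocessing import Process, Queue

def producer(q):
    # Hypothetical subprocess that produces more data than the pipe can buffer
    for i in range(100000):
        q.put(i**2)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    for item in yield_from_process(q, p):   # drains the queue while waiting on join
        pass                                # consume each item here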
My problem was that I was trying to .get() an async result after the pool had closed: an indentation error had left the loop outside of the with block.
I had this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
# wrong: this loop is outside the with block, so the pool is already closed
for async_result in async_results:
    yield async_result.get()
I needed this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
    # right: still inside the with block, so the pool is alive while we collect
    for async_result in async_results:
        yield async_result.get()

python3 multiprocessing example crashed my pc :(

I am new to multiprocessing
I have run the example code from two 'highly recommended' multiprocessing examples given in response to other Stack Overflow multiprocessing questions. Here is an example of one (which I dare not run again!)
test2.py (running from pydev)
import multiprocessing

class MyFancyClass(object):
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print(proc_name, self.name)

def worker(q):
    obj = q.get()
    obj.do_something()

queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
When I run this, my computer slows down immediately. It gets incrementally slower. After some time I managed to get into Task Manager, only to see MANY MANY python.exe processes under the Processes tab. After trying to end some of them, my mouse stopped moving. It was the second time I was forced to reboot.
I am too scared to attempt a third example...
Running: Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (8 CPUs), ~2.9GHz, on Win7 64-bit.
If anyone knows what the issue is and can provide a VERY SIMPLE example of multiprocessing (send a string to a worker process, alter it, and send it back for printing), I would be very grateful.
From the docs:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
Thus, on Windows, you must wrap your code inside an if __name__ == '__main__': block.
For example, this sends a string to the worker process, the string is reversed and the result is printed by the main process:
import multiprocessing as mp

def worker(inq, outq):
    obj = inq.get()
    obj = obj[::-1]
    outq.put(obj)

if __name__ == '__main__':
    inq = mp.Queue()
    outq = mp.Queue()
    p = mp.Process(target=worker, args=(inq, outq))
    p.start()
    inq.put('Fancy Dan')
    # Wait for the worker to finish
    p.join()
    result = outq.get()
    print(result)
Because of the way multiprocessing works on Windows (child processes import the __main__ module) the __main__ module cannot actually run anything when imported -- any code that should execute when run directly must be protected by the if __name__ == '__main__' idiom. Your corrected code:
import multiprocessing

class MyFancyClass(object):
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print(proc_name, self.name)

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
Might I suggest this link? It's using threads, instead of multiprocessing, but many of the principles are the same.
