Python on Windows - how to wait for multiple child processes?

How to wait for multiple child processes in Python on Windows, without active wait (polling)? Something like this almost works for me:
proc1 = subprocess.Popen(['python','mytest.py'])
proc2 = subprocess.Popen(['python','mytest.py'])
proc1.wait()
print "1 finished"
proc2.wait()
print "2 finished"
The problem is that when proc2 finishes before proc1, the parent process still sits in proc1.wait(). On Unix one would call waitpid in a loop to collect the children's return codes as they finish - how can I achieve something like this in Python on Windows?

It might seem like overkill, but here it goes:
import Queue, thread, subprocess

results = Queue.Queue()

def process_waiter(popen, description, que):
    try:
        popen.wait()
    finally:
        que.put((description, popen.returncode))

process_count = 0

proc1 = subprocess.Popen(['python', 'mytest.py'])
thread.start_new_thread(process_waiter,
    (proc1, "1 finished", results))
process_count += 1

proc2 = subprocess.Popen(['python', 'mytest.py'])
thread.start_new_thread(process_waiter,
    (proc2, "2 finished", results))
process_count += 1

# etc.

while process_count > 0:
    description, rc = results.get()
    print "job", description, "ended with rc =", rc
    process_count -= 1

Building on zseil's answer, you can do this with a mix of subprocess and win32 API calls. I used straight ctypes, because my Python doesn't happen to have win32api installed. I'm just spawning sleep.exe from MSYS here as an example, but clearly you could spawn any process you like. I use OpenProcess() to get a HANDLE from the process' PID, and then WaitForMultipleObjects to wait for any process to finish.
import ctypes, subprocess
from random import randint

SYNCHRONIZE = 0x00100000
INFINITE = -1
numprocs = 5
handles = {}

for i in xrange(numprocs):
    sleeptime = randint(5, 10)
    p = subprocess.Popen([r"c:\msys\1.0\bin\sleep.exe", str(sleeptime)],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE, shell=False)
    h = ctypes.windll.kernel32.OpenProcess(SYNCHRONIZE, False, p.pid)
    handles[h] = p.pid
    print "Spawned Process %d" % p.pid

while len(handles) > 0:
    print "Waiting for %d children..." % len(handles)
    arrtype = ctypes.c_long * len(handles)
    handle_array = arrtype(*handles.keys())
    ret = ctypes.windll.kernel32.WaitForMultipleObjects(len(handle_array), handle_array, False, INFINITE)
    h = handle_array[ret]
    ctypes.windll.kernel32.CloseHandle(h)
    print "Process %d done" % handles[h]
    del handles[h]

print "All done!"

Twisted has an asynchronous process-spawning API which works on Windows. There are actually several different implementations, many of which are not so great, but you can switch between them without changing your code.
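For reference, a minimal sketch of what that can look like with Twisted's reactor.spawnProcess and a ProcessProtocol, assuming a reasonably recent Twisted (the Waiter class name is made up; mytest.py mirrors the question):

import sys
from twisted.internet import protocol, reactor

class Waiter(protocol.ProcessProtocol):
    """Reports when its child process has ended."""
    pending = 0

    def __init__(self, name):
        self.name = name
        Waiter.pending += 1

    def processEnded(self, reason):
        # reason.value is ProcessDone (clean exit) or ProcessTerminated
        print("%s finished: %s" % (self.name, reason.value))
        Waiter.pending -= 1
        if Waiter.pending == 0:
            reactor.stop()

for name in ("proc1", "proc2"):
    # env=None inherits the parent's environment
    reactor.spawnProcess(Waiter(name), sys.executable,
                         [sys.executable, "mytest.py"], env=None)
reactor.run()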

Twisted on Windows will perform an active wait under the covers. If you don't want to use threads, you will have to use the win32 API to avoid polling. Something like this:
import win32process
import win32event

# Note: CreateProcess() args are somewhat cryptic, look them up on MSDN
proc1, thread1, pid1, tid1 = win32process.CreateProcess(...)
proc2, thread2, pid2, tid2 = win32process.CreateProcess(...)
thread1.close()
thread2.close()

processes = {proc1: "proc1", proc2: "proc2"}

while processes:
    handles = processes.keys()
    # Note: WaitForMultipleObjects() supports at most 64 processes at a time
    index = win32event.WaitForMultipleObjects(handles, False, win32event.INFINITE)
    finished = handles[index]
    exitcode = win32process.GetExitCodeProcess(finished)
    procname = processes.pop(finished)
    finished.close()
    print "Subprocess %s finished with exit code %d" % (procname, exitcode)

You can use psutil:
>>> import subprocess
>>> import psutil
>>>
>>> proc1 = subprocess.Popen(['python','mytest.py'])
>>> proc2 = subprocess.Popen(['python','mytest.py'])
>>> ls = [psutil.Process(proc1.pid), psutil.Process(proc2.pid)]
>>>
>>> gone, alive = psutil.wait_procs(ls, timeout=3)
'gone' and 'alive' are lists indicating which processes are gone and which ones are still alive.
Optionally you can specify a callback which gets invoked every time one of the watched processes terminates:
>>> def on_terminate(proc):
...     print "%s terminated" % proc
...
>>> gone, alive = psutil.wait_procs(ls, timeout=3, callback=on_terminate)
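If you want to block until every child has exited instead of polling with a timeout, timeout=None makes wait_procs wait indefinitely while still firing the callback as each process terminates (a small sketch reusing ls and on_terminate from above):
>>> gone, alive = psutil.wait_procs(ls, timeout=None, callback=on_terminate)
>>> assert not alive  # every watched process has exited at this point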

You can also use psutil's own Popen wrapper:
import psutil

with psutil.Popen(["python", "mytest.py"]) as proc1, \
     psutil.Popen(["python", "mytest.py"]) as proc2:
    gone, alive = psutil.wait_procs([proc1, proc2], timeout=3)
'gone' and 'alive' are lists indicating which processes are gone and which ones are still alive.
Optionally you can specify a callback which gets invoked every time one of the watched processes terminates:
def on_terminate(proc):
    print("%s terminated" % proc)

gone, alive = psutil.wait_procs([proc1, proc2], timeout=3, callback=on_terminate)

Related

How do I get child process PIDs when using ProcessPoolExecutor?

I'm using ProcessPoolExecutor context manager to run several Kafka consumers in parallel. I need to store the process IDs of the child processes so that later, I can cleanly terminate those processes. I have such code:
class MultiProcessConsumer:
    ...

    def run_in_parallel(self):
        parallelism_factor = 5
        with ProcessPoolExecutor() as executor:
            processes = [executor.submit(self.consume) for _ in range(parallelism_factor)]
            # It would be nice if I could write [process.pid for process in processes] to a file here.

    def consume(self):
        while True:
            for message in self.kafka_consumer:
                do_stuff(message)
I know I can use os.getpid() in the consume method to get PIDs. But handling them properly (with consumers constantly shutting down or starting up) requires some extra work.
How would you propose that I get and store PIDs of the child processes in such a context?
os.getpid() seems to be the way to go. Just pass the PIDs back through a Queue or Pipe, together with a random UUID that you hand to the process beforehand, so you can tell which PID belongs to which task.
from concurrent.futures import ProcessPoolExecutor
import os
import time
import uuid
#from multiprocessing import Process, Queue
import multiprocessing
import queue

# The Empty exception lives in queue; multiprocessing borrows it from there.
# https://stackoverflow.com/questions/9908781/sharing-a-result-queue-among-several-processes
m = multiprocessing.Manager()
q = m.Queue()

def task(n, queue, uuid):
    my_pid = os.getpid()
    print("Executing our Task on Process {}".format(my_pid))
    queue.put((uuid, my_pid))
    time.sleep(n)
    return n * n

def main():
    with ProcessPoolExecutor(max_workers=3) as executor:
        some_dict = {}
        for i in range(10):
            print(i)
            u = uuid.uuid4()
            f = executor.submit(task, i, q, u)
            some_dict[u] = [f, None]  # PID not known here
            try:
                rcv_uuid, rcv_pid = q.get(block=True, timeout=1)
                some_dict[rcv_uuid][1] = rcv_pid  # store PID
            except queue.Empty as e:
                print('handle me', e)
            print('I am', rcv_uuid, 'and my PID is', rcv_pid)

if __name__ == '__main__':
    main()
Although the field is private, you could use ProcessPoolExecutor's self._processes. The code snippet below shows how to use this variable.
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import wait

nb_processes = 100
executor = ProcessPoolExecutor(nb_processes)
futures = [executor.submit(os.getpid) for _ in range(nb_processes)]
wait(futures)

backends = list(map(lambda x: x.result(), futures))
assert len(set(backends)) == nb_processes
In the case above, an AssertionError is raised, because a later task can reuse a worker process already in the pool, so os.getpid() does not return a distinct PID for every task. You cannot learn all of the worker process IDs through the method you mentioned. Hence, you can do this instead:
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures import wait

nb_processes = 100
executor = ProcessPoolExecutor(nb_processes)
futures = [executor.submit(os.getpid) for _ in range(nb_processes)]
wait(futures)

backends = list(map(lambda x: x.result(), futures))
assert len(set(executor._processes.keys())) == nb_processes
print('all worker PIDs: %s' % list(executor._processes.keys()))
If you don't want to break the encapsulation, you could inherit from ProcessPoolExecutor and add a new property for that, as sketched below.
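For example, a minimal sketch of such a subclass (the worker_pids property name is made up here; it still relies on the private _processes dict of CPython's implementation):

from concurrent.futures import ProcessPoolExecutor

class PidAwareProcessPoolExecutor(ProcessPoolExecutor):
    @property
    def worker_pids(self):
        # self._processes maps pid -> multiprocessing.Process for the
        # current workers; note workers are started lazily, so this may
        # be empty until tasks have been submitted.
        return list(self._processes.keys())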

python multiprocessing - select-like wait on running processes to see which ones have finished

I want to run 15 commands but only run 3 at a time
testme.py
import multiprocessing
import time
import random
import subprocess

def popen_wrapper(i):
    p = subprocess.Popen(['echo', 'hi'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    print stdout
    time.sleep(random.randint(5, 20))  # pretend it's doing some work
    return p.returncode

num_to_run = 15
max_parallel = 3
running = []
for i in range(num_to_run):
    p = multiprocessing.Process(target=popen_wrapper, args=(i,))
    running.append(p)
    p.start()
    if len(running) >= max_parallel:
        # blocking wait - join on whoever finishes first, then continue
    else:
        # non-blocking wait - see if any process has finished; if so, join the finished processes
I'm not sure how to implement the comments in:
if len(running) >= max_parallel:
    # blocking wait - join on whoever finishes first, then continue
else:
    # non-blocking wait - see if any process has finished; if so, join the finished processes
I would NOT be able to do something like:
for p in running:
    p.join()
because the second process in running might have finished while I'm still blocked on the first one.
Question: how do you check whether the processes in running have finished, both blocking and non-blocking (i.e., find the first one that finishes)?
I'm looking for something similar to waitpid, maybe.
Perhaps the easiest way to arrange this is to use a multiprocessing.Pool:
pool = mp.Pool(3)
will set up a pool with 3 worker processes. Then you can send 15 tasks to the pool:
for i in range(num_to_run):
    pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
and all the machinery necessary to coordinate the 3 workers and 15 tasks is
taken care of by mp.Pool.
Using mp.Pool:
import multiprocessing as mp
import time
import random
import subprocess
import logging

logger = mp.log_to_stderr(logging.WARN)

def popen_wrapper(i):
    logger.warn('echo "hi"')
    return i

def log_result(retval):
    results.append(retval)

if __name__ == '__main__':
    num_to_run = 15
    max_parallel = 3
    results = []
    pool = mp.Pool(max_parallel)
    for i in range(num_to_run):
        pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
    pool.close()
    pool.join()
    logger.warn(results)
yields
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-2] echo "hi"
[WARNING/MainProcess] [0, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 14, 13, 1]
The logging statements show which PoolWorker handles each task, and the last logging statement shows the MainProcess has received the return values from the 15 calls to popen_wrapper.
If you'd like to do it without a Pool, you could set up one mp.Queue for tasks and another mp.Queue for return values:
Using mp.Process and mp.Queues:
import multiprocessing as mp
import time
import random
import subprocess
import logging

logger = mp.log_to_stderr(logging.WARN)
SENTINEL = None

def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)

if __name__ == '__main__':
    num_to_run = 15
    max_parallel = 3
    inqueue = mp.Queue()
    outqueue = mp.Queue()
    procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue))
             for i in range(max_parallel)]
    for p in procs:
        p.start()
    for i in range(num_to_run):
        inqueue.put(i)
    for i in range(max_parallel):
        # Put sentinels in the queue to tell `popen_wrapper` to quit
        inqueue.put(SENTINEL)
    for p in procs:
        p.join()
    results = [outqueue.get() for i in range(num_to_run)]
    logger.warn(results)
Notice that if you use
procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue))
         for i in range(max_parallel)]
then you enforce there being exactly max_parallel (e.g. 3) worker processes. You then send all 15 tasks to one Queue:
for i in range(num_to_run):
    inqueue.put(i)
and let the worker processes pull tasks out of the queue:
def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)
You may also find Doug Hellmann's multiprocessing tutorial of interest. Among the many instructive examples you'll find there is an ActivePool recipe which shows how to spawn 10 processes and yet limit them (using an mp.Semaphore) so that only 3 are active at any given time; a bare-bones version of that idea is sketched below. While that may be instructive, it may not be the best solution in your situation, since there doesn't appear to be a reason why you'd want to spawn more than 3 processes.
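For completeness, a minimal sketch of that semaphore idea (not the ActivePool recipe itself, just the limiting mechanism; the worker body is a placeholder):

import multiprocessing as mp
import time

def worker(sem, i):
    with sem:                  # at most 3 workers get past this point at once
        time.sleep(1)          # pretend to do some work
        print("task %d done" % i)

if __name__ == '__main__':
    sem = mp.Semaphore(3)
    procs = [mp.Process(target=worker, args=(sem, i)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()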

Killing a process based on how long it has been running, with psutil in Python

I am running a lot of processes called "csm.py". Each process has a command line argument for example "testarg".
"csm.py testarg".
I am using psutil to successfully check whether another process with the same name ("csm.py testarg") is running, with this code:
for process in psutil.process_iter():
    cmdline = process.cmdline
    if result[0][2] in cmdline:
        proc += 1

if proc >= 2:
    # DO SOMETHING
What I would like to do is find out if there is a process called "csm.py testarg" already running that is older than 1 hour, and if there is, kill it without killing the new process (this one) that is doing the checking. Is it possible to get a process's start time/date with psutil?
Thanks
I managed to figure it out like so:
for process in psutil.process_iter():
    pid = process.pid
    cmdline = process.cmdline
    if what_im_looking_for in cmdline:
        p = psutil.Process(pid)
        pidcreated = datetime.datetime.fromtimestamp(p.create_time)
        allowed_time = pidcreated + datetime.timedelta(0, 5400)
        now = datetime.datetime.now()
        if now > allowed_time:
            print "Killing process {} Start time: {} Current time: {}".format(pid, pidcreated, now)
            p.kill()
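Note that psutil 2.0 and later changed cmdline and create_time from attributes to methods (the interpreter session below already uses the method form), so with a current psutil the relevant lines become roughly:
cmdline = process.cmdline()          # method call in psutil >= 2.0
pidcreated = datetime.datetime.fromtimestamp(p.create_time())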
>>> import os, psutil, datetime
>>> p = psutil.Process(os.getpid())
>>> p.create_time()
1307289803.47
>>> datetime.datetime.fromtimestamp(p.create_time()).strftime("%Y-%m-%d %H:%M:%S")
'2011-03-05 18:03:52'

python - get new Windows process IDs as they start running, as an event

I'm looking for a way to get new process IDs as they start running.
Currently I can get a list of processes like this:
from ctypes import *

psapi = windll.psapi

print "[+] PID dumper by Y"
print "[+] contact : If you know me then give me a shout"

def getListOfProcesses():
    max_array = c_ulong * 4096  # define a long array to capture all the processes
    pProcessIds = max_array()   # array to store the list of processes
    pBytesReturned = c_ulong()  # the number of bytes returned in the array
    # EnumProcesses
    psapi.EnumProcesses(byref(pProcessIds),
                        sizeof(pProcessIds),
                        byref(pBytesReturned))
    # get the number of returned processes
    nReturned = pBytesReturned.value / sizeof(c_ulong())
    pidProcessArray = [i for i in pProcessIds][:nReturned]
    for processes in pidProcessArray:
        print "[+] Running Process PID %d" % processes

getListOfProcesses()
thanks in advance
You can do this with the WMI module:
import wmi
c = wmi.WMI()
process_watcher = c.Win32_Process.watch_for("creation")
while True:
    p = process_watcher()
    print "[+] Running Process PID %d" % p.ProcessId
By the way, in your getListOfProcesses function you can use pProcessIds[:nReturned] instead of the list comprehension.
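That is, roughly (nReturned is computed the same way as in your function; slicing a ctypes array already yields a plain Python list):
nReturned = pBytesReturned.value // sizeof(c_ulong())  # integer division
pidProcessArray = pProcessIds[:nReturned]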

Threading in python: retrieve return value when using target= [duplicate]

Possible Duplicate:
Return value from thread
I want to get the "free memory" of a bunch of servers like this:
def get_mem(servername):
    res = os.popen('ssh %s "grep MemFree /proc/meminfo | sed \'s/[^0-9]//g\'"' % servername)
    return res.read().strip()
Since this can be threaded, I want to do something like this:
import threading
thread1 = threading.Thread(target=get_mem, args=("server01", ))
thread1.start()
But now: how can I access the return value(s) of the get_mem functions?
Do I really need to go the full-fledged route of creating a class MemThread(threading.Thread) and overriding __init__ and run()?
You could create a synchronised queue, pass it to the thread function and have it report back by pushing the result into the queue, e.g.:
def get_mem(servername, q):
    res = os.popen('ssh %s "grep MemFree /proc/meminfo | sed \'s/[^0-9]//g\'"' % servername)
    q.put(res.read().strip())
# ...
import threading, queue
q = queue.Queue()
threading.Thread(target=get_mem, args=("server01", q)).start()
result = q.get()
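As an aside (not part of the original answers), concurrent.futures.ThreadPoolExecutor hands the return values back directly, so the one-argument get_mem from the question works unchanged; a minimal sketch, with placeholder server names:

from concurrent.futures import ThreadPoolExecutor

servers = ["server01", "server02", "server03"]
with ThreadPoolExecutor(max_workers=len(servers)) as executor:
    # executor.map() runs get_mem in worker threads and yields results in input order
    for server, free_mem in zip(servers, executor.map(get_mem, servers)):
        print("%s: %s kB free" % (server, free_mem))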
For the record, this is what I finally came up with (it deviates a bit from the multiprocessing examples):
from multiprocessing import Process, Queue

def execute_parallel(hostnames, command, max_processes=None):
    """
    Run the command on the specified hosts in parallel; returns the output of the commands as a dict.

    >>> execute_parallel(['host01', 'host02'], 'hostname')
    {'host01': 'host01', 'host02': 'host02'}
    """
    NUMBER_OF_PROCESSES = max_processes if max_processes else len(hostnames)

    def worker(jobs, results):
        for hostname, command in iter(jobs.get, 'STOP'):
            results.put((hostname, execute_host_return_output(hostname, command)))

    job_queue = Queue()
    result_queue = Queue()

    for hostname in hostnames:
        job_queue.put((hostname, command))

    for i in range(NUMBER_OF_PROCESSES):
        Process(target=worker, args=(job_queue, result_queue)).start()

    result = {}
    for i in range(len(hostnames)):
        result.update([result_queue.get()])

    # tell the worker processes to stop
    for i in range(NUMBER_OF_PROCESSES):
        job_queue.put('STOP')

    return result
