I have a number of Popen objects, each representing a long-running command I have started. In fact, I do not expect these commands to exit. If any of them do exit, I want to wait a few seconds and then restart. Is there a good, pythonic way to do this?
For example:
import random
from subprocess import Popen
procs = list()
for i in range(10):
    procs.append(Popen(["/bin/sleep", str(random.randrange(5,10))]))
A naive approach might be:
for p in procs:
    p.wait()
    print "a process has exited"
    # restart code
print "all done!"
But this waits on the processes in list order, so it will not alert me as soon as the first process exits. So I could try
for p in procs:
    p.poll()
    if p.returncode is not None:
        print "a process has exited"
        procs.remove(p)
        # restart code
print "all done!"
However, this is a tight loop and will consume a CPU. I suppose I could add a time.sleep(1) to the loop so it's not busy-waiting, but then I lose precision.
I feel like there should be some nice way to wait on a group of pids -- am I right?
The "restart crashed server" task is really common, and probably shouldn't be handled by custom code unless there's a concrete reason. See upstart and systemd and monit.
The multiprocessing.Pool object sounds like a win -- it automatically starts processes, and even restarts them if needed. Unfortunately it's not very configurable.
Here's one solution with good old Popen:
import random, time
from subprocess import Popen
def work_diligently():
    cmd = ["/bin/sleep", str(random.randrange(2,4))]
    proc = Popen(cmd)
    print '\t{}\t{}'.format(proc.pid, cmd)  # pylint: disable=E1101
    return proc

def spawn(num):
    return [work_diligently() for _ in xrange(num)]

NUM_PROCS = 3
procs = spawn(NUM_PROCS)
while True:
    print time.ctime(), 'scan'
    procs = [
        proc for proc in procs
        if proc.poll() is None
    ]
    num_exited = NUM_PROCS - len(procs)
    if num_exited:
        print 'Uhoh! Restarting {} procs'.format(num_exited)
        procs.extend(spawn(num_exited))
    time.sleep(1)
Output:
2340 ['/bin/sleep', '2']
2341 ['/bin/sleep', '2']
2342 ['/bin/sleep', '3']
Mon Jun 2 18:01:42 2014 scan
Mon Jun 2 18:01:43 2014 scan
Mon Jun 2 18:01:44 2014 scan
Uhoh! Restarting 2 procs
2343 ['/bin/sleep', '3']
2344 ['/bin/sleep', '2']
Mon Jun 2 18:01:45 2014 scan
Uhoh! Restarting 1 procs
2345 ['/bin/sleep', '2']
Mon Jun 2 18:01:46 2014 scan
Uhoh! Restarting 1 procs
2346 ['/bin/sleep', '2']
Mon Jun 2 18:01:47 2014 scan
Uhoh! Restarting 2 procs
2347 ['/bin/sleep', '3']
2349 ['/bin/sleep', '2']
If you use a POSIX operating system, you can use os.wait to wait for any child process. It returns the process ID, which you can look up among the PIDs of your processes to find the one that has terminated:
import random
from subprocess import Popen
import os
procs = {}
for i in range(10):
    proc = Popen(["/bin/sleep", str(random.randrange(5,10))])
    procs[proc.pid] = proc

while procs:
    pid, status = os.wait()
    proc = procs.pop(pid)
    print "process %d has exited" % proc.pid
    # restart code

print "all done!"
The Twisted process API lets you respond efficiently to processes finishing, along with lots of other conditions.
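For illustration, here is a minimal sketch of that approach (my own example, not the original answerer's code): a ProcessProtocol gets a processEnded callback when its child exits, and the callback can schedule a restart with reactor.callLater.
from twisted.internet import reactor, protocol

class Restarter(protocol.ProcessProtocol):
    def __init__(self, cmd):
        self.cmd = cmd

    def processEnded(self, reason):
        # Called once the child has exited; wait a few seconds, then respawn.
        print("%s exited (%s); restarting in 3 seconds" % (self.cmd, reason.value))
        reactor.callLater(3, spawn, self.cmd)

def spawn(cmd):
    # args[0] must be the executable name, as with os.exec*.
    reactor.spawnProcess(Restarter(cmd), cmd[0], args=cmd)

for _ in range(10):
    spawn(["/bin/sleep", "5"])
reactor.run()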
Related
I'm using a ThreadPool for IO-intensive work (such as loading data), and I fork a process from within a thread when a CPU-intensive calculation is needed. This is my workaround for the GIL problem (anyway, that's not my key problem here).
My problem is: when running my code, sometimes a process is forked but seems to stay in sleeping status forever (it never even runs my calculation). That blocks everything, since the code calls join to wait for the result (or an error). Note that the problem doesn't always happen, just occasionally.
My environment is Linux CentOS 7.3 with Anaconda 5.1.0 (bundled Python 3.6.4). Note that I failed to reproduce the issue with the same code on Windows.
The following is the simplified code, which reproduces the issue on Linux:
import logging
import time
import random
import os
import threading
from multiprocessing.pool import Pool, ThreadPool


class Task(object):
    def __init__(self, func, *args) -> None:
        super().__init__()
        self.func = func
        self.args = args


class ConcurrentExecutor(object):
    def __init__(self, spawn_count=1) -> None:
        super().__init__()
        self.spawn_count = spawn_count

    def fork(self, task):
        result = None
        print('Folk started: %s' % task.args)
        pool = Pool(processes=1)
        try:
            handle = pool.apply_async(task.func, task.args)
            pool.close()
            result = handle.get()
            print('Folk completed: %s' % task.args)
        except Exception as err:
            print('Fork failure: FOLK%s' % task.args)
            raise err
        finally:
            pool.terminate()
        return result

    def spawn(self, tasks):
        results = []
        try:
            print('Spawn started')
            handles = []
            pool = ThreadPool(processes=self.spawn_count)
            for task in tasks:
                handles.append(pool.apply_async(task.func, task.args))
            pool.close()
            pool.join()
            print('all done')
            for handle in handles:
                results.append(handle.get(10))
            print('Spawn completed')
        except Exception as err:
            print('Spawn failure')
            raise err
        return results


def foo_proc(i):
    print(i)
    result = i * i
    time.sleep(1)
    return result


def foo(i):
    executor = ConcurrentExecutor(2)
    try:
        result = executor.fork(Task(foo_proc, i))
    except Exception as err:
        result = 'ERROR'
    return result


if __name__ == '__main__':
    executor = ConcurrentExecutor(4)
    tasks = []
    for i in range(1, 10):
        tasks.append(Task(foo, i))

    start = time.time()
    print(executor.spawn(tasks))
    end = time.time()
    print(end - start)
The following shows example output from one run:
[appadmin@168-61-40-47 test]$ python test.py
7312
Spawn started
Folk started: 1
Folk started: 2
Folk started: 3
Folk started: 4
4
2
1
3
Folk completed: 4
Folk completed: 2
Folk completed: 1
Folk completed: 3
Folk started: 5
Folk started: 6
Folk started: 7
5
Folk started: 8
7
8
Folk completed: 5
Folk completed: 7
Folk completed: 8
Folk started: 9
9
Folk completed: 9
You can see that the code got stuck: process "6" was forked but never did its work. Meanwhile, I can see two python processes, whereas only one should be left at the end if everything ran correctly:
[user@x-x-x-x x]$ ps -aux | grep python
user 7312 0.1 0.0 1537216 13524 pts/0 Sl+ 22:23 0:02 python test.py
user 7339 0.0 0.0 1545444 10604 pts/0 S+ 22:23 0:00 python test.py
Could anyone help? Thanks in advance!
This is my code:
from multiprocessing import Pool, Lock
from datetime import datetime as dt

console_out = "/STDOUT/Console.out"
chunksize = 50
lock = Lock()

def writer(message):
    lock.acquire()
    with open(console_out, 'a') as out:
        out.write(message)
        out.flush()
    lock.release()

def conf_wrapper(state):
    import ProcessingModule as procs
    import sqlalchemy as sal
    stcd, nrows = state
    engine = sal.create_engine('postgresql://foo:bar@localhost:5432/schema')
    writer("State {s} started at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))
    with engine.connect() as conn, conn.begin():
        procs.processor(conn, stcd, nrows, chunksize)
    writer("\tState {s} finished at: {n}"
           "\n".format(s=str(stcd).zfill(2), n=dt.now()))

def main():
    nprocesses = 12
    maxproc = 1
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]

    with open(console_out, 'w') as out:
        out.write("Starting at {n}\n".format(n=dt.now()))
        out.write("Using {p} processes..."
                  "\n".format(p=nprocesses))

    with Pool(processes=int(nprocesses), maxtasksperchild=maxproc) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)

    with open(console_out, 'a') as out:
        out.write("\nAll done at {n}".format(n=dt.now()))
The file console_out never ends up with all 7 states in it; it always misses one or more. Here is the output from the latest run:
Starting at 2016-07-27 21:46:58.638587
Using 12 processes...
State 44 started at: 2016-07-27 21:47:01.482322
State 02 started at: 2016-07-27 21:47:01.497947
State 11 started at: 2016-07-27 21:47:01.529198
State 10 started at: 2016-07-27 21:47:01.497947
State 11 finished at: 2016-07-27 21:47:15.701207
State 15 finished at: 2016-07-27 21:47:24.123164
State 44 finished at: 2016-07-27 21:47:32.029489
State 50 finished at: 2016-07-27 21:47:51.203107
State 10 finished at: 2016-07-27 21:47:53.046876
State 33 finished at: 2016-07-27 21:47:58.156301
State 02 finished at: 2016-07-27 21:48:18.856979
All done at 2016-07-27 21:48:18.992277
Why?
Note, OS is Windows Server 2012 R2.
Since you're running on Windows, nothing is inherited by worker processes. Each process runs the entire main program "from scratch".
In particular, with the code as written every process has its own instance of lock, and these instances have nothing to do with each other. In short, lock isn't supplying any inter-process mutual exclusion at all.
To fix this, the Pool constructor can be changed to call a once-per-process initialization function, to which you pass an instance of Lock(). For example, like so:
def init(L):
    global lock
    lock = L
and then add these arguments to the Pool() constructor:
initializer=init, initargs=(Lock(),),
And you no longer need the:
lock = Lock()
line.
Then the inter-process mutual exclusion will work as intended.
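Put together, a minimal sketch of how main() might look with those changes (conf_wrapper and writer are the functions from the question; everything else is unchanged):
from multiprocessing import Pool, Lock

def init(L):
    # Runs once in every worker process; stores the shared Lock under the
    # module-level name that writer() already uses.
    global lock
    lock = L

def main():
    nprocesses = 12
    maxproc = 1
    state_list = [(2, 113), (10, 119), (15, 84), (50, 112), (44, 110), (11, 37), (33, 197)]
    with Pool(processes=nprocesses,
              maxtasksperchild=maxproc,
              initializer=init,
              initargs=(Lock(),)) as pool:
        pool.map(func=conf_wrapper, iterable=state_list, chunksize=1)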
WITHOUT A LOCK
If you'd like to delegate all output to a writer process, you could skip the lock and use a queue instead to feed that process (see later for a different version).
def writer_process(q):
    with open(console_out, 'w') as out:
        while True:
            message = q.get()
            if message is None:
                break
            out.write(message)
            out.flush()  # can't guess whether you really want this
and change writer() to just:
def writer(message):
    q.put(message)
You would again need to use initializer= and initargs= in the Pool constructor so that all processes use the same queue.
Only one process should run writer_process(), and that can be started on its own as an instance of multiprocessing.Process.
Finally, when it's time for writer_process() to drain the queue and return, just run
q.put(None)
in the main process.
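For concreteness, one way the wiring might look (a sketch only; conf_wrapper is the function from the question, and writer_process is the one above, with writer() reduced to q.put(message)):
from multiprocessing import Pool, Process, Queue

def init(queue):
    # Give every pool worker a reference to the single shared queue.
    global q
    q = queue

def main():
    q = Queue()
    printer = Process(target=writer_process, args=(q,))
    printer.start()

    state_list = [(2, 113), (10, 119), (15, 84)]  # etc., as in the question
    with Pool(processes=12, initializer=init, initargs=(q,)) as pool:
        pool.map(conf_wrapper, state_list, chunksize=1)

    q.put(None)      # sentinel: tell writer_process to drain and exit
    printer.join()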
LATER
The OP settled on this version instead, because they needed to open the output file in other code simultaneously:
def writer_process(q):
    while True:
        message = q.get()
        if message == 'done':
            break
        else:
            with open(console_out, 'a') as out:
                out.write(message)
I don't know why the terminating sentinel was changed to "done". Any unique value works for this; None is traditional.
I want to run 15 commands but only run 3 at a time
testme.py
import multiprocessing
import time
import random
import subprocess

def popen_wrapper(i):
    p = subprocess.Popen(['echo', 'hi'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = p.communicate()
    print stdout
    time.sleep(random.randint(5, 20))  # pretend it's doing some work
    return p.returncode

num_to_run = 15
max_parallel = 3
running = []

for i in range(num_to_run):
    p = multiprocessing.Process(target=popen_wrapper, args=(i,))
    running.append(p)
    p.start()
    if len(running) >= max_parallel:
        # blocking wait - join on whoever finishes first, then continue
    else:
        # nonblocking wait - see if any processes have finished; if so, join the finished processes
I'm not sure how to implement the comments on:
if len(running) >= max_parallel:
    # blocking wait - join on whoever finishes first, then continue
else:
    # nonblocking wait - see if any processes have finished; if so, join the finished processes
I would NOT be able to do something like:
for p in running:
    p.join()
because the second process in running might have finished while I'm still blocked on the first one.
Question: how do you check whether any of the processes in running have finished, in both a blocking and a non-blocking way (i.e., find the first one that finishes)?
I'm looking for something similar to waitpid, maybe.
Perhaps the easiest way to arrange this is to use a multiprocessing.Pool:
pool = mp.Pool(3)
will set up a pool with 3 worker processes. Then you can send 15 tasks to the pool:
for i in range(num_to_run):
    pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
and all the machinery necessary to coordinate the 3 workers and 15 tasks is taken care of by mp.Pool.
Using mp.Pool:
import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)
def popen_wrapper(i):
    logger.warn('echo "hi"')
    return i

def log_result(retval):
    results.append(retval)

if __name__ == '__main__':
    num_to_run = 15
    max_parallel = 3
    results = []
    pool = mp.Pool(max_parallel)
    for i in range(num_to_run):
        pool.apply_async(popen_wrapper, args=(i,), callback=log_result)
    pool.close()
    pool.join()
    logger.warn(results)
yields
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-1] echo "hi"
[WARNING/PoolWorker-3] echo "hi"
[WARNING/PoolWorker-2] echo "hi"
[WARNING/MainProcess] [0, 2, 3, 5, 4, 6, 7, 8, 9, 10, 11, 12, 14, 13, 1]
The logging statements show which PoolWorker handles each task, and the last logging statement shows the MainProcess has received the return values from the 15 calls to popen_wrapper.
If you'd like do it without a Pool, you could set up a mp.Queue for tasks and a mp.Queue for return values:
Using mp.Process and mp.Queues:
import multiprocessing as mp
import time
import random
import subprocess
import logging
logger = mp.log_to_stderr(logging.WARN)
SENTINEL = None
def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)

if __name__ == '__main__':
    num_to_run = 15
    max_parallel = 3

    inqueue = mp.Queue()
    outqueue = mp.Queue()
    procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue))
             for i in range(max_parallel)]
    for p in procs:
        p.start()
    for i in range(num_to_run):
        inqueue.put(i)
    for i in range(max_parallel):
        # Put sentinels in the queue to tell `popen_wrapper` to quit
        inqueue.put(SENTINEL)
    for p in procs:
        p.join()
    results = [outqueue.get() for i in range(num_to_run)]
    logger.warn(results)
Notice that if you use
procs = [mp.Process(target=popen_wrapper, args=(inqueue, outqueue))
         for i in range(max_parallel)]
then you enforce there being exactly max_parallel (e.g. 3) worker processes. You then send all 15 tasks to one Queue:
for i in range(num_to_run):
    inqueue.put(i)
and let the worker processes pull tasks out of the queue:
def popen_wrapper(inqueue, outqueue):
    for i in iter(inqueue.get, SENTINEL):
        logger.warn('echo "hi"')
        outqueue.put(i)
You may also find Doug Hellmann's multiprocessing tutorial of interest. Among the many instructive examples you'll find there is an ActivePool recipe which shows how to spawn 10 processes and yet limit them (using an mp.Semaphore) so that only 3 are active at any given time. While that may be instructive, it may not be the best solution in your situation, since there doesn't appear to be a reason why you'd want to spawn more than 3 processes.
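For reference, a rough sketch of that semaphore idea (my own simplification, not Hellmann's recipe verbatim): ten processes start, but the semaphore allows only three of them into the "work" section at once.
import multiprocessing as mp
import time

def worker(sema, i):
    with sema:                      # blocks until one of the 3 slots is free
        print('worker %d running' % i)
        time.sleep(2)               # pretend to do some work

if __name__ == '__main__':
    sema = mp.Semaphore(3)
    procs = [mp.Process(target=worker, args=(sema, i)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()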
I am running a lot of processes called "csm.py". Each process has a command line argument, for example "testarg":
"csm.py testarg".
I am using psutil to successfully check whether another process of the same name ("csm.py testarg") is running, with this code:
for process in psutil.process_iter():
    cmdline = process.cmdline
    if result[0][2] in cmdline:
        proc += 1

if proc >= 2:
    # DO SOMETHING
What I would like to do is find out if there is a process called "csm.py testarg" already running that is older than 1 hour, and if there is, kill it, without killing the new process (this one) that is checking for the old "csm.py testarg". Is it possible to get a process's start time/date with psutil?
Thanks
I managed to figure it out like so:
for process in psutil.process_iter():
    pid = process.pid
    cmdline = process.cmdline
    if what_im_looking_for in cmdline:
        p = psutil.Process(pid)
        p.create_time
        pidcreated = datetime.datetime.fromtimestamp(p.create_time)
        allowed_time = pidcreated + datetime.timedelta(0, 5400)
        now = datetime.datetime.now()
        if now > allowed_time:
            print "Killing process {} Start time: {} Current time: {}".format(pid, pidcreated, now)
            p.kill()
>>> import os, psutil, datetime
>>> p = psutil.Process(os.getpid())
>>> p.create_time()
1307289803.47
>>> datetime.datetime.fromtimestamp(p.create_time()).strftime("%Y-%m-%d %H:%M:%S")
'2011-03-05 18:03:52'
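Combining the two, a hedged sketch of the whole task (assuming a recent psutil where cmdline() and create_time() are methods; older versions expose them as attributes, as in the snippets above):
import os, time, psutil

def kill_stale(pattern, max_age_seconds=3600):
    me = os.getpid()
    now = time.time()
    for proc in psutil.process_iter():
        try:
            if proc.pid == me:
                continue                      # never kill the checking process itself
            if pattern in ' '.join(proc.cmdline()):
                if now - proc.create_time() > max_age_seconds:
                    proc.kill()
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass                              # process vanished or is off-limits

kill_stale("csm.py testarg")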
How to wait for multiple child processes in Python on Windows, without active wait (polling)? Something like this almost works for me:
proc1 = subprocess.Popen(['python','mytest.py'])
proc2 = subprocess.Popen(['python','mytest.py'])
proc1.wait()
print "1 finished"
proc2.wait()
print "2 finished"
The problem is that when proc2 finishes before proc1, the parent process will still wait for proc1. On Unix one would use waitpid(0) in a loop to get the child processes' return codes as they finish - how to achieve something like this in Python on Windows?
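For reference, the Unix-style loop alluded to above might look roughly like this (a sketch, assuming the proc1 and proc2 objects from the snippet):
import os

children = {proc1.pid: "1", proc2.pid: "2"}
while children:
    pid, status = os.waitpid(-1, 0)   # block until any child exits (POSIX only)
    print children.pop(pid), "finished"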
It might seem overkill, but here it goes:
import Queue, thread, subprocess
results= Queue.Queue()
def process_waiter(popen, description, que):
try: popen.wait()
finally: que.put( (description, popen.returncode) )
process_count= 0
proc1= subprocess.Popen( ['python', 'mytest.py'] )
thread.start_new_thread(process_waiter,
(proc1, "1 finished", results))
process_count+= 1
proc2= subprocess.Popen( ['python', 'mytest.py'] )
thread.start_new_thread(process_waiter,
(proc2, "2 finished", results))
process_count+= 1
# etc
while process_count > 0:
    description, rc = results.get()
    print "job", description, "ended with rc =", rc
    process_count -= 1
Building on zseil's answer, you can do this with a mix of subprocess and win32 API calls. I used straight ctypes, because my Python doesn't happen to have win32api installed. I'm just spawning sleep.exe from MSYS here as an example, but clearly you could spawn any process you like. I use OpenProcess() to get a HANDLE from the process' PID, and then WaitForMultipleObjects to wait for any process to finish.
import ctypes, subprocess
from random import randint
SYNCHRONIZE=0x00100000
INFINITE = -1
numprocs = 5
handles = {}
for i in xrange(numprocs):
    sleeptime = randint(5,10)
    p = subprocess.Popen([r"c:\msys\1.0\bin\sleep.exe", str(sleeptime)], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=False)
    h = ctypes.windll.kernel32.OpenProcess(SYNCHRONIZE, False, p.pid)
    handles[h] = p.pid
    print "Spawned Process %d" % p.pid

while len(handles) > 0:
    print "Waiting for %d children..." % len(handles)
    arrtype = ctypes.c_long * len(handles)
    handle_array = arrtype(*handles.keys())
    ret = ctypes.windll.kernel32.WaitForMultipleObjects(len(handle_array), handle_array, False, INFINITE)
    h = handle_array[ret]
    ctypes.windll.kernel32.CloseHandle(h)
    print "Process %d done" % handles[h]
    del handles[h]

print "All done!"
Twisted has an asynchronous process-spawning API which works on Windows. There are actually several different implementations, many of which are not so great, but you can switch between them without changing your code.
Twisted on Windows will perform an active wait under the covers. If you don't want to use threads, you will have to use the win32 API to avoid polling. Something like this:
import win32process
import win32event
# Note: CreateProcess() args are somewhat cryptic, look them up on MSDN
proc1, thread1, pid1, tid1 = win32process.CreateProcess(...)
proc2, thread2, pid2, tid2 = win32process.CreateProcess(...)
thread1.close()
thread2.close()
processes = {proc1: "proc1", proc2: "proc2"}
while processes:
    handles = processes.keys()
    # Note: WaitForMultipleObjects() supports at most 64 processes at a time
    index = win32event.WaitForMultipleObjects(handles, False, win32event.INFINITE)
    finished = handles[index]
    exitcode = win32process.GetExitCodeProcess(finished)
    procname = processes.pop(finished)
    finished.close()
    print "Subprocess %s finished with exit code %d" % (procname, exitcode)
You can use psutil:
>>> import subprocess
>>> import psutil
>>>
>>> proc1 = subprocess.Popen(['python','mytest.py'])
>>> proc2 = subprocess.Popen(['python','mytest.py'])
>>> ls = [psutil.Process(proc1.pid), psutil.Process(proc2.pid)]
>>>
>>> gone, alive = psutil.wait_procs(ls, timeout=3)
'gone' and 'alive' are lists indicating which processes are gone and which ones are still alive.
Optionally you can specify a callback which gets invoked every time one of the watched processes terminates:
>>> def on_terminate(proc):
... print "%s terminated" % proc
...
>>> gone, alive = psutil.wait_procs(ls, timeout=3, callback=on_terminate)
You can use psutil:
import psutil
with psutil.Popen(["python", "mytest.py"]) as proc1, psutil.Popen(
    ["python", "mytest.py"]
) as proc2:
    gone, alive = psutil.wait_procs([proc1, proc2], timeout=3)
'gone' and 'alive' are lists indicating which processes are gone and which ones are still alive.
Optionally you can specify a callback which gets invoked every time one of the watched processes terminates:
def on_terminate(proc):
    print("%s terminated" % proc)

gone, alive = psutil.wait_procs([proc1, proc2], timeout=3, callback=on_terminate)