I have a piece of python code that should spawn an interruptible task in a child process:
import os
import signal
import subprocess
import sys


class Script:
    '''
    Class instantiated by the parent process
    '''
    def __init__(self, *args, **kwargs):
        self.process = None
        self.start()

    def __del__(self):
        if self.process:
            if self.process.poll() is None:
                self.stop()

    def start(self):
        popen_kwargs = {
            'executable': sys.executable,
            # 0 * flag: CREATE_DEFAULT_ERROR_MODE is currently disabled;
            # change the 0 to 1 to enable it
            'creationflags': 0 * subprocess.CREATE_DEFAULT_ERROR_MODE
                             | subprocess.CREATE_NEW_PROCESS_GROUP,
        }
        self.process = subprocess.Popen(['python', os.path.realpath(__file__)],
                                        **popen_kwargs)

    def stop(self):
        if not self.process:
            return
        try:
            self.process.send_signal(signal.CTRL_C_EVENT)
            self.process.wait()
            self.process = None
        except KeyboardInterrupt:
            pass


class ScriptSubprocess:
    '''
    Class instantiated in the child process
    '''
    def __init__(self):
        self.stop = False

    def run(self):
        try:
            while not self.stop:
                pass  # ... (placeholder so the loop body is valid Python)
        except KeyboardInterrupt:
            # interrupted!
            pass
        finally:
            # make a *clean* exit if interrupted
            self.stop = True


if __name__ == '__main__':
    p = ScriptSubprocess()
    p.run()
    del p
and it works fine in a standalone python interpreter.
The problem arises when I move this code in the real application, which has an embedded Python interpreter.
In this case, it hangs when trying to stop the child process at the line self.process.wait(), which indicates that the previous line, self.process.send_signal(signal.CTRL_C_EVENT), did not work: the child process is in fact still running. If I manually terminate it via Task Manager, the call to self.process.wait() returns as if it had succeeded in stopping the child process.
I am looking for possible causes (e.g. some process flag of the parent process) that disable CTRL_C_EVENT.
The documentation of subprocess says:
Popen.send_signal(signal)
Sends the signal signal to the child.
Do nothing if the process completed.
Note On Windows, SIGTERM is an alias for terminate(). CTRL_C_EVENT and
CTRL_BREAK_EVENT can be sent to processes started with a creationflags
parameter which includes CREATE_NEW_PROCESS_GROUP.
and also:
subprocess.CREATE_NEW_PROCESS_GROUP
A Popen creationflags parameter to specify that a new process group will be created. This flag is necessary for using os.kill() on the subprocess.
This flag is ignored if CREATE_NEW_CONSOLE is specified.
So I am using creationflags with subprocess.CREATE_NEW_PROCESS_GROUP, but it is still unable to kill the subprocess with CTRL_C_EVENT in the real application (same as without this flag).
Since the real application (i.e. the parent process) also uses SetConsoleCtrlHandler to handle certain signals, I also tried passing creationflags with subprocess.CREATE_DEFAULT_ERROR_MODE to override that error mode in the child process, but I am still unable to kill the child process with CTRL_C_EVENT.
Note: CTRL_BREAK_EVENT works but does not give a clean exit (i.e. the finally: clause is not executed).
My guess is that SetConsoleCtrlHandler is the culprit, but I have no means of avoiding that call in the parent process, or of undoing its effect...
Related
How can I keep the ROS Publisher publishing the messages while calling a sub-process:
import subprocess

import rospy
from std_msgs.msg import String


class Pub():
    def __init__(self):
        pass

    def updateState(self, msg):
        cmd = ['python3', planner_path, "--alias", search_options,
               "--plan-file", plan_path, domain_path, problem_path]
        subprocess.run(cmd, shell=False, stdout=subprocess.PIPE)
        self.plan_pub.publish(msg)

    def myPub(self):
        rospy.init_node('problem_formulator', anonymous=True)
        self.plan_pub = rospy.Publisher("plan", String, queue_size=10)
        rate = rospy.Rate(10)  # 10 Hz
        rospy.Subscriber('model', String, self.updateState)
        rospy.sleep(1)
        rospy.spin()


if __name__ == "__main__":
    p_ = Pub()
    p_.myPub()
Since subprocess.run (like subprocess.call) is a blocking call, your subscription callback may take a long time:
Run the command described by args. Wait for command to complete, then return the returncode attribute.
ROS itself will not call the callback again while it is already being executed. This means you are blocking this callback, and potentially other callbacks, from being called in time.
The simplest solution would be to replace the blocking call with subprocess.Popen, which
Execute a child program in a new process
without blocking.
But keep in mind that this can start the process multiple times in quick succession.
Consider starting the process only if it is not already running. This can be achieved by checking in another thread whether the process has finished. Simple but effective: use a boolean flag. Here is a small prototype:
def updateState(self, msg):
    # Start the process if not already running
    # (self._process_running must be initialized to False in __init__)
    if not self._process_running:
        p = subprocess.Popen(...)
        self._process_running = True

        def wait_process():
            while p.poll() is None:
                time.sleep(0.1)
            self._process_running = False

        threading.Thread(target=wait_process).start()

    # Other callback code
    self.plan_pub.publish(msg)
I have an architecture where the main process can spawn children process.
The main process sends computation requests to the children via Pipe.
Here is my current code for the child process:
while True:
    try:
        # not sufficient because conn.recv() is blocking
        if self.close_event.is_set():
            break
        fun, args = self.conn.recv()
        # some heavy computation
        res = getattr(ds, fun)(*args)
        self.conn.send(res)
    except EOFError:
        # should be raised by conn.recv() if connection is closed
        # but it never happens
        break
and how it is initialized in the main process:
def init_worker(self):
    close_event = DefaultCtxEvent()
    conn_parent, conn_child = Pipe()
    process = WorkerProcess(
        i, self.nb_workers, conn_child, close_event, arguments=self.arguments)
    process.daemon = True
    process.start()
    # close the side we don't use
    conn_child.close()
    # remember the side we need
    self.conn = conn_parent
I have a clean method that should close all child like so from the main process:
def clean(self):
self.conn.close()
# waiting for the loop to break for a clean exit
self.child_process.join()
However, the call to conn.recv() blocks and never throws an error as I would expect.
Am I confusing the behaviour of conn_parent and conn_child somehow?
How to properly close the children connection?
Edit: a possible solution is to explicitly send a message with a content like "_break". The loop receives the message via conn.recv() and breaks. Is that a "normal" pattern? As a bonus, is there a way to kill a potentially long-running method without terminating the process?
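For reference, the sentinel pattern mentioned in the edit might look like this (a minimal sketch; the "_break" value is a hypothetical sentinel both sides agree on, and doubling the message stands in for the heavy computation):

```python
from multiprocessing import Pipe, Process

SENTINEL = "_break"  # hypothetical sentinel value agreed on by both sides


def worker(conn):
    while True:
        msg = conn.recv()
        if msg == SENTINEL:      # explicit shutdown request
            break
        conn.send(msg * 2)       # stand-in for the heavy computation


if __name__ == "__main__":
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    parent.send(21)
    print(parent.recv())
    parent.send(SENTINEL)        # ask the worker to stop
    p.join()
```

This is a perfectly normal pattern (often called a "poison pill") and it avoids relying on EOFError semantics entirely. Interrupting a long-running method without terminating the process is harder: Python offers no safe way to abort arbitrary code in another process, so the usual options are chunking the work and checking a flag between chunks, or accepting process termination.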
Apparently the issue is with how the worker is created on Linux: because the child forks the parent's end of the connection, that end is still open inside the child and needs to be closed explicitly on the child's side.
This is just a dummy example of how it can be done:
from multiprocessing import Pipe, Process


def worker_func(parent_conn, child_conn):
    parent_conn.close()  # close the parent connection forked into the child
    while True:
        try:
            a = child_conn.recv()
        except EOFError:
            print('child cancelled')
            break
        else:
            print(a)


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    child = Process(target=worker_func, args=(parent_conn, child_conn,))
    child.start()
    child_conn.close()
    parent_conn.send("a")
    parent_conn.close()
    child.join()
    print('child done')

Output:

a
child cancelled
child done
This is not required on Windows, or when Linux uses the "spawn" start method for creating workers, because then the child does not inherit the parent's end of the connection; but this code will work on any system with any start method.
Consider simple setup of a child process. Basically, it is a producer(parent)-consumer(child) scenario.
import multiprocessing


class Job:
    def start_process(self):
        self.queue = multiprocessing.Queue(3)
        self.process = multiprocessing.Process(target=run,
                                               args=(self.queue,))


def run(queue):
    while True:
        item = queue.get()
        ....
If I do kill -9 on the parent process, the child will hang forever. I was sure that it would receive SIGHUP, as with subprocess.Popen: when the Python process quits, the Popened child quits as well. Any idea how to fix the child cleanup?
If the daemon param doesn't work for you, you can catch a SIGINT signal and have it set a boolean value that exits the while loop in your children, i.e.:

import signal

g_run_loops = True


def signal_handler(signum, frame):
    global g_run_loops
    g_run_loops = False


signal.signal(signal.SIGINT, signal_handler)


def run(queue):
    global g_run_loops
    while g_run_loops:
        # note: queue.get() blocks, so the flag is only re-checked
        # after an item arrives
        item = queue.get()
        ....
Note that this won't work for SIGKILL (kill -9) but should work for SIGINT (kill -2).
Here is my code. It launches a subprocess, waits until it ends and returns stdout, or raises an exception if a timeout happens. Typical use: print(Run('python --version').execute())
from subprocess import Popen, PIPE, STDOUT
from threading import Thread


class Run(object):
    def __init__(self, cmd, timeout=2*60*60):
        self.cmd = cmd.split()
        self.timeout = timeout
        self._stdout = b''
        self.dt = 10
        self.p = None

    def execute(self):
        print("Execute command: {}".format(' '.join(self.cmd)))

        def target():
            self.p = Popen(self.cmd, stdout=PIPE, stderr=STDOUT)
            self._stdout = self.p.communicate()[0]

        thread = Thread(target=target)
        thread.start()
        t = 0
        while t < self.timeout:
            thread.join(self.dt)
            if thread.is_alive():
                t += self.dt
                print("Running for: {} seconds".format(t))
            else:
                ret_code = self.p.poll()
                if ret_code:
                    raise AssertionError("{} failed.\nretcode={}\nstdout:\n{}".format(
                        self.cmd, ret_code, self._stdout))
                return self._stdout
        else:
            print('Timeout {} reached, kill task, pid={}'.format(self.timeout, self.p.pid))
            self.p.terminate()
            thread.join()
            raise AssertionError("Timeout")
The problem is the following case. The process that I launch spawns more child processes. So when the timeout is reached and I kill the main process (the one I started via my class) with self.p.terminate(), the children remain and my code hangs on the line self._stdout = self.p.communicate()[0]. Execution only continues if I manually kill all the child processes.
I tried a solution where, instead of self.p.terminate(), I kill the whole process tree.
This still does not work if the main process finished by itself and its children exist on their own: I have no way to find and kill them, but they keep blocking self.p.communicate().
Is there a way to solve this effectively?
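One approach on POSIX (a sketch, not from the original code; run_with_tree_kill is a hypothetical helper name): start the child in its own session so the entire tree shares one process group, and signal the whole group on timeout. On Windows the closest equivalent is creationflags=CREATE_NEW_PROCESS_GROUP plus CTRL_BREAK_EVENT, or taskkill /T.

```python
import os
import signal
import subprocess


def run_with_tree_kill(cmd, timeout):
    # start_new_session=True puts the child (and its descendants) into a
    # new process group, so all of them can be signalled at once
    p = subprocess.Popen(cmd, start_new_session=True,
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        out, _ = p.communicate(timeout=timeout)
        return p.returncode, out
    except subprocess.TimeoutExpired:
        os.killpg(os.getpgid(p.pid), signal.SIGKILL)  # kill the whole group
        p.wait()
        raise
```

Passing a timeout to communicate() also addresses the second case: even when orphaned grandchildren keep the inherited stdout pipe open, the wait no longer blocks forever.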
You could use the ProcessWrapper from the PySys framework; it offers a lot of this functionality as an abstraction in a cross-platform way, i.e.:
import sys, os
from pysys.constants import *
from pysys.process.helper import ProcessWrapper
from pysys.exceptions import ProcessTimeout

command = sys.executable
arguments = ['--version']
try:
    process = ProcessWrapper(command, arguments=arguments, environs=os.environ,
                             workingDir=os.getcwd(), stdout='stdout.log',
                             stderr='stderr.log', state=FOREGROUND, timeout=5.0)
    process.start()
except ProcessTimeout:
    print("Process timeout")
    process.stop()
It's at SourceForge (http://sourceforge.net/projects/pysys/files/ and http://pysys.sourceforge.net/) if of interest.
I would like to repeatedly execute a subprocess as fast as possible. However, sometimes the process will take too long, so I want to kill it.
I use signal.signal(...) like below:
ppid = pipeexe.pid
signal.signal(signal.SIGALRM, stop_handler)
signal.alarm(1)
.....

def stop_handler(signum, frame):
    # parameter renamed to signum to avoid shadowing the signal module
    print('Stop test ' + testdir + ' for time out')
    if pipeexe.poll() is None and hasattr(signal, "SIGKILL"):
        os.kill(ppid, signal.SIGKILL)
        return False
but sometimes this code wrongly stops the next round from executing:
Stop test/home/lu/workspace/152/treefit/test2for time out
/bin/sh: /home/lu/workspace/153/squib_driver: not found ---this is the next execution; the program wrongly stops it.
Does anyone know how to solve this? I want to stop the process exactly when the timeout expires: time.sleep(n) always waits the full n seconds, but I want a command that finishes early to proceed in less than 1 second.
You could do something like this:
import subprocess as sub
import threading


class RunCmd(threading.Thread):
    def __init__(self, cmd, timeout):
        threading.Thread.__init__(self)
        self.cmd = cmd
        self.timeout = timeout

    def run(self):
        self.p = sub.Popen(self.cmd)
        self.p.wait()

    def Run(self):
        self.start()
        self.join(self.timeout)

        if self.is_alive():
            self.p.terminate()  # use self.p.kill() if the process needs a kill -9
            self.join()


RunCmd(["./someProg", "arg1"], 60).Run()
The idea is that you create a thread that runs the command, and you kill the process if the timeout exceeds some suitable value, in this case 60 seconds.
Here is something I wrote as a watchdog for subprocess execution. I use it now a lot, but I'm not so experienced so maybe there are some flaws in it:
import subprocess
import time


def subprocess_execute(command, time_out=60):
    """executing the command with a watchdog"""

    # launching the command
    c = subprocess.Popen(command)

    # now waiting for the command to complete
    t = 0
    while t < time_out and c.poll() is None:
        time.sleep(1)  # (comment 1)
        t += 1

    # there are two possibilities for the while to have stopped:
    if c.poll() is None:
        # in the case the process did not complete, we kill it
        c.terminate()
        # and fill the return code with some error value
        returncode = -1  # (comment 2)
    else:
        # in the case the process completed normally
        returncode = c.poll()

    return returncode
Usage:
returncode = subprocess_execute(['java', '-jar', 'some.jar'])
Comments:
here, the watchdog timeout is in seconds, but it is easy to change to whatever granularity is needed by changing the time.sleep() value; the time_out parameter will have to be documented accordingly;
depending on what is needed, it may be more suitable to raise an exception here instead of returning -1.
Documentation: I struggled a bit with the documentation of the subprocess module before understanding that subprocess.Popen is not blocking; the process is executed in parallel (maybe I am not using the correct word here, but I think it's understandable).
But as what I wrote is linear in its execution, I really have to wait for the command to complete, with a timeout to avoid bugs in the command pausing the nightly execution of the script.
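On Python 3.3 and later, the polling loop above can be replaced by Popen.wait() with a timeout (a sketch keeping the same return convention of -1 on timeout):

```python
import subprocess


def subprocess_execute(command, time_out=60):
    """run command and return its exit code, or -1 on timeout"""
    c = subprocess.Popen(command)
    try:
        return c.wait(timeout=time_out)
    except subprocess.TimeoutExpired:
        c.terminate()   # ask the process to exit
        c.wait()        # reap it so no zombie is left behind
        return -1
```

This removes the one-second granularity of the sleep loop: the call returns as soon as the process exits.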
I guess this is a common synchronization problem in event-oriented programming with threads and processes.
If you should always have only one subprocess running, make sure the current subprocess is killed before launching the next one. Otherwise the signal handler may get a reference to the last subprocess run and ignore the older one.
Suppose subprocess A is running. Before the alarm signal is handled, subprocess B is launched. Just after that, your alarm signal handler attempts to kill a subprocess. As the current PID (or the current subprocess pipe object) was set to B's when launching the subprocess, B gets killed and A keeps running.
Is my guess correct?
To make your code easier to understand, I would place the part that creates a new subprocess right after the part that kills the current subprocess. That would make it clear that there is only one subprocess running at any time. The signal handler could do both the killing and the launching, as if it were the iteration block of a loop, in this case event-driven by the alarm signal every second.
Here's what I use:
import os
import signal
import subprocess
import threading


class KillerThread(threading.Thread):
    def __init__(self, pid, timeout, event):
        threading.Thread.__init__(self)
        self.pid = pid
        self.timeout = timeout
        self.event = event
        self.daemon = True

    def run(self):
        self.event.wait(self.timeout)
        if not self.event.is_set():
            try:
                os.kill(self.pid, signal.SIGKILL)
            except OSError:
                # raised if the process has already completed
                pass


def runTimed(dt, dir, args, kwargs):
    event = threading.Event()
    cwd = os.getcwd()
    os.chdir(dir)
    proc = subprocess.Popen(args, **kwargs)
    os.chdir(cwd)
    killer = KillerThread(proc.pid, dt, event)
    killer.start()
    (stdout, stderr) = proc.communicate()
    event.set()
    return (stdout, stderr, proc.returncode)
A bit more complex: I added an answer to a similar problem that covers capturing stdout, feeding stdin, and being able to terminate after some period of inactivity and/or after some overall runtime.
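The inactivity-based variant mentioned here can be sketched as follows (run_with_inactivity_timeout is a hypothetical helper, not code from the linked answer): a reader thread timestamps every output line, and the main loop kills the process once the stream goes quiet for too long.

```python
import subprocess
import threading
import time


def run_with_inactivity_timeout(cmd, idle_timeout):
    """Kill the process if it produces no output for idle_timeout seconds."""
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    last = [time.monotonic()]  # timestamp of the most recent output line
    lines = []

    def reader():
        for line in p.stdout:
            last[0] = time.monotonic()
            lines.append(line)

    t = threading.Thread(target=reader, daemon=True)
    t.start()
    while p.poll() is None:
        if time.monotonic() - last[0] > idle_timeout:
            p.kill()  # stream went quiet: kill and stop waiting
            p.wait()
            break
        time.sleep(0.1)
    t.join(timeout=5)
    return p.returncode, b"".join(lines)
```

The same loop can be extended with a second check against an overall start time to enforce a total-runtime limit as well.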