I have two Python scripts, foo.py and bar.py; foo.py calls bar.py via os.system().
# foo.py
import os
print os.getpid()
os.system("python dir/bar.py")

# bar.py
import time
time.sleep(10)
print "over"
Say the pid of foo.py is 123. If the program terminates normally, it'll print
123
over
If I type kill 123 while it's running, I'll get the following output
123
Terminated
over
If I press Ctrl-C while it's running, I'll get something like
123
^CTraceback (most recent call last):
File "dir/bar.py", line 4, in <module>
time.sleep(10)
KeyboardInterrupt
But if I type kill -SIGINT 123 while it's running, it seems the program will just ignore the signal and exit normally.
123
over
It seems to me that,
if I type kill 123, the sub-process will not be affected.
if I type Ctrl-C, both processes will be terminated.
if I type kill -SIGINT 123 while the sub-process is running, the signal will be ignored.
Can someone please explain to me how it works?
Aren't Ctrl-C and kill -SIGINT supposed to be equivalent?
If I type kill 123 is it guaranteed that the sub-process will not be affected (if it happens to be running)?
I am on Ubuntu 14.04 by the way. Thanks!
Let's consider each case in turn:
if I type kill 123, the sub-process will not be affected.
Yes, that's how kill [pid] works: it sends a signal only to the process you name. If you want to send the signal to a whole group of processes, you have to use the negative number representing the process group.
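For illustration, a rough Python sketch of the same idea (123 stands in for the example pid from the question; this only works on Unix):

import os
import signal

pid = 123                        # the example pid from the question
pgid = os.getpgid(pid)           # find the process group that pid belongs to
os.killpg(pgid, signal.SIGTERM)  # like `kill -- -<pgid>`: signals every member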
if I type Ctrl-C, both processes will be terminated.
I assume you mean "terminated by Ctrl-C". Actually, that's not the case: only the child is terminated. If you add a line like print "I'm a little teapot" at the end of foo.py, you'll see that it gets printed. What happens is that the child gets the signal and dies; the parent then continues from os.system. Without the additional line it looks as if the parent was also affected by the Ctrl-C, but the additional line shows that it is not.
Your shell does send the signal to the whole process group associated with the tty, which includes the parent. However, os.system uses the C library's system() call, which ignores SIGINT and SIGQUIT in the calling process while the command runs. So the parent is immune.
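Roughly, system() behaves like this sketch (ignoring the SIGCHLD blocking that the real implementation also performs; the function name here is illustrative):

import signal
import subprocess

def system_like(cmd):
    # While the command runs, the calling process ignores SIGINT and
    # SIGQUIT -- exactly why foo.py survives the Ctrl-C above.
    old_int = signal.signal(signal.SIGINT, signal.SIG_IGN)
    old_quit = signal.signal(signal.SIGQUIT, signal.SIG_IGN)
    try:
        return subprocess.call(cmd, shell=True)
    finally:
        signal.signal(signal.SIGINT, old_int)
        signal.signal(signal.SIGQUIT, old_quit)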
If you do not use os.system, then your process will be affected by the SIGINT. Try this code for foo.py:
import os
import subprocess
print os.getpid()
p = subprocess.Popen(["python", "dir/bar.py"])
p.wait()
print "I'm a little teapot"
If you hit Ctrl-C while this runs, you'll get two tracebacks: one from the parent, one from the child:
$ python foo.py
29626
^CTraceback (most recent call last):
  File "dir/bar.py", line 4, in <module>
Traceback (most recent call last):
  File "foo.py", line 8, in <module>
    time.sleep(10)
KeyboardInterrupt
    p.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1389, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
if I type kill -SIGINT 123 while the sub-process is running, the signal will be ignored.
See above: the parent ignores SIGINT while os.system runs, and kill -SIGINT 123 targets only pid 123, not the child, so neither process reacts.
Aren't Ctrl-C and kill -SIGINT supposed to be equivalent?
Not quite. Ctrl-C sends SIGINT to the whole foreground process group associated with the tty in which you press it; kill -SIGINT 123 sends SIGINT only to the single process 123.
If I type kill 123 is it guaranteed that the sub-process will not be affected (if it happens to be running)?
By itself kill 123 will send the signal only to the process with pid 123. Children won't be affected.
I am using the following code to launch a subprocess:
# Run the program
subprocess_result = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    check=False,
    timeout=timeout,
    cwd=directory,
    env=env,
    preexec_fn=set_memory_limits,
)
The launched subprocess is also a Python program, with a shebang.
This subprocess may last for longer than the specified timeout.
The subprocess does heavy computations, writes its results to a file, and does not contain any signal handler.
According to the documentation https://docs.python.org/3/library/subprocess.html#subprocess.run, subprocess.run kills a child that times out:
The timeout argument is passed to Popen.communicate(). If the timeout expires, the child process will be killed and waited for. The TimeoutExpired exception will be re-raised after the child process has terminated.
When my subprocess times out, I always receive the subprocess.TimeoutExpired exception, but from time to time the subprocess is not killed, hence still consuming resources on my machine.
So my question is: am I doing something wrong here? If yes, what? And if no, why do I have this issue and how can I solve it?
Note: I am using Python 3.10 on Ubuntu 22.04.
The most likely culprit for the behaviour you see is that the subprocess you are spawning is probably using multiprocessing and spawning its own child processes. Killing the parent process does not automatically kill the whole set of descendants: the grandchildren are inherited by the init process (i.e. the process with PID 1) and will continue to run.
You can verify this from the source code of subprocess.run:
with Popen(*popenargs, **kwargs) as process:
    try:
        stdout, stderr = process.communicate(input, timeout=timeout)
    except TimeoutExpired as exc:
        process.kill()
        if _mswindows:
            # Windows accumulates the output in a single blocking
            # read() call run on child threads, with the timeout
            # being done in a join() on those threads.  communicate()
            # _after_ kill() is required to collect that and add it
            # to the exception.
            exc.stdout, exc.stderr = process.communicate()
        else:
            # POSIX _communicate already populated the output so
            # far into the TimeoutExpired exception.
            process.wait()
        raise
    except:  # Including KeyboardInterrupt, communicate handled that.
        process.kill()
        # We don't call process.wait() as .__exit__ does that for us.
        raise
Here you can see at line 550 that the timeout is set on the communicate call; if it fires, at line 552 the subprocess is .kill()ed. The kill method sends SIGKILL, which immediately terminates the subprocess without any cleanup. It's a signal that cannot be caught, so it's not possible that the child is somehow ignoring it.
The TimeoutExpired exception is then re-raised at line 564, so if your parent process sees this exception, the direct subprocess is already dead.
This, however, says nothing about grandchildren processes; those will continue to run as children of PID 1.
I don't see any way in which you can customize how subprocess.run handles subprocess termination. For example, if it used SIGTERM instead of SIGKILL you could modify your child process or write a wrapper process that will catch the signal and properly kill all its descendants. But SIGKILL doesn't give you this luxury.
So I believe that for your use case you cannot use the subprocess.run facade but you should use Popen directly. You can look at the subprocess.run implementation and take just the things that you need, maybe dropping support for platforms you don't use.
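As a sketch of what that could look like on POSIX (start_new_session puts the child in a fresh session and process group, which you can then signal as a whole; the function name and parameters here are illustrative, not part of the stdlib):

import os
import signal
import subprocess

def run_with_group_timeout(cmd, timeout):
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            start_new_session=True)  # child leads a new group
    try:
        return proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the whole group: the child and any grandchildren it spawned.
        os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
        proc.wait()
        raise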
Note: there are extremely rare situations in which a subprocess won't die immediately on SIGKILL. I believe the only case is when the subprocess is blocked in a very long system call or other uninterruptible kernel operation; if that operation is deadlocked, the process may never terminate. However, I don't think this is your case, since you did not mention that the process is stuck doing nothing; from what you said, the process simply seems to continue running.
I'm experiencing an issue where a call to proc.communicate() still hangs even after calling proc.terminate().
I create tasks to be run in the background with the call
import subprocess as sub
p = sub.Popen(command, stdout=sub.PIPE, stderr=sub.PIPE, shell=False)
At the completion of the script, I call terminate(), wait(), and communicate() to gather information from the process.
p.terminate()
errorcode = p.wait()
(pout, perr) = p.communicate()
The script hangs at the call to communicate. I'd assumed that any call to communicate that follows a call to terminate would return immediately. Is there any reason why this would fail?
Edit: I'm using this method because the command is really a tight loop that won't terminate on its own. I'd like to use p.terminate() to do that, and then see what the stdout and stderr has to offer.
You don't need the first two statements. Just call communicate(): it waits for the process to exit on its own, and the exit status is then available as p.returncode. Mixing wait() with stdout=PIPE is also the deadlock case the subprocess documentation warns about, since the child can block writing to a full pipe that nothing is reading.
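If the command really is a tight loop that never exits on its own, a pattern along these lines keeps the terminate but drops the separate wait (this assumes Python 3.3+, where communicate accepts a timeout; command is the asker's variable):

import subprocess as sub

p = sub.Popen(command, stdout=sub.PIPE, stderr=sub.PIPE, shell=False)
# ... later, when shutting down ...
p.terminate()                              # ask the child to exit
try:
    pout, perr = p.communicate(timeout=5)  # drain the pipes and reap it
except sub.TimeoutExpired:
    p.kill()                               # it ignored SIGTERM: force it
    pout, perr = p.communicate()
errorcode = p.returncode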
I'm trying to kill the notepad.exe process on windows using this function:
import thread, wmi, os
import inspect  # needed by the 'kill all' branch below

print 'CMD: Kill command called'

def kill():
    c = wmi.WMI()
    Commands = ['notepad.exe']
    if Commands[0] != 'All':
        print 'CMD: Killing: ', Commands[0]
        for process in c.Win32_Process():
            if process.Name == Commands[0]:
                process.Terminate()
    else:
        print 'CMD: trying to kill all processes'
        for process in c.Win32_Process():
            if process.executablepath != inspect.getfile(inspect.currentframe()):
                try:
                    process.Terminate()
                except:
                    print 'CMD: Unable to kill: ', process.Name

kill()                             # Works
thread.start_new_thread(kill, ())  # Not working
It works like a charm when I'm calling the function like this:
kill()
But when running the function in a new thread it crashes and I have no idea why.
import thread, wmi, os
import pythoncom

print 'CMD: Kill command called'

def kill():
    pythoncom.CoInitialize()
    . . .
Running Windows functions in threads can be tricky since it often involves COM objects. Calling pythoncom.CoInitialize() in the new thread usually allows you to do it. Also, you may want to take a look at the threading library; it's much easier to deal with than thread.
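For example, the last line of the script could become (a sketch using threading, reusing the same kill() as above):

import threading

t = threading.Thread(target=kill)
t.start()
t.join()  # wait for the kill to complete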
There are a couple of problems (EDIT: the second problem has been addressed by "MikeHunter" since I started my answer, so I will skip it):
Firstly, your program ends right after starting the thread, taking the thread with it. I will assume this is not a problem long-term, because presumably this is going to be part of something bigger. To get around it for now, you can simulate something else keeping the program going by adding a time.sleep() call at the end of the script with, say, 5 seconds as the sleep length, as in the sketch below.
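Something like this, where the 5-second length is arbitrary:

import time

thread.start_new_thread(kill, ())
time.sleep(5)  # keep the main thread alive long enough for kill() to run and fail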
This will allow the program to give us a useful error, which in your case is:
CMD: Kill command called
Unhandled exception in thread started by <function kill at 0x0223CF30>
Traceback (most recent call last):
  File "killnotepad.py", line 4, in kill
    c = wmi.WMI ()
  File "C:\Python27\lib\site-packages\wmi.py", line 1293, in connect
    raise x_wmi_uninitialised_thread ("WMI returned a syntax error: you're probably running inside a thread without first calling pythoncom.CoInitialize[Ex]")
wmi.x_wmi_uninitialised_thread: <x_wmi: WMI returned a syntax error: you're probably running inside a thread without first calling pythoncom.CoInitialize[Ex] (no underlying exception)>
As you can see, this reveals the real problem and leads us to the solution posted by MikeHunter.
I have a script that repeatedly runs an Ant buildfile and scrapes output into a parsable format. When I create the subprocess using Popen, there is a small time window where hitting Ctrl+C will kill the script, but will not kill the subprocess running Ant, leaving a zombie that is printing output to the console that can only be killed using Task Manager. Once Ant has started printing output, hitting Ctrl+C will always kill my script as well as Ant. Is there a way to make it so that hitting Ctrl+C will always kill the subprocess running Ant without leaving a zombie behind?
Also of note: I have a handler for SIGINT that performs a few cleanup operations before calling exit(0). If I manually kill the subprocess in the handler using os.kill(p.pid, signal.SIGTERM) (not SIGINT), then I can successfully kill the subprocess in situations where it would normally zombify. However, when you hit Ctrl+C once Ant has started producing output, you get a stacktrace from subprocess where it is unable to kill the subprocess itself as I have already killed it.
EDIT: My code looked something like:
import os
import signal
from subprocess import Popen

p = Popen('ls')

def handle_sig_int(signum, stack_frame):
    # perform cleanup
    os.kill(p.pid, signal.SIGTERM)
    exit(0)

signal.signal(signal.SIGINT, handle_sig_int)
p.wait()
Which would produce the following stacktrace when triggered incorrectly:
File "****.py", line ***, in run_test
p.wait()
File "/usr/lib/python2.5/subprocess.py", line 1122, in wait
pid, sts = os.waitpid(self.pid, 0)
File "****.py", line ***, in handle_sig_int
os.kill(p.pid, signal.SIGTERM)
I fixed it by catching the OSError raised by p.wait and exiting:
try:
    p.wait()
except OSError:
    exit('The operation was interrupted by the user')
This seems to work in the vast majority of my test runs. I occasionally get a uname: write error: Broken pipe, though I don't know what causes it. It seems to happen if I time the Ctrl+C just right before the child process can start displaying output.
Call p.terminate() in your SIGINT handler:
if p.poll() is None:  # Child still around?
    p.terminate()     # kill it
[EDIT] Since you're stuck with Python 2.5, use os.kill(p.pid, signal.SIGTERM) instead of p.terminate(). The check should make sure you don't get an exception (or reduce the number of times you get one).
To make it even better, you can catch the exception and check the message. If it means "child process not found", then ignore the exception. Otherwise, rethrow it with raise (no arguments).
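As a sketch of that check (ESRCH is the standard errno for "no such process"; Python 2.5 spells the except clause except OSError, e: instead of using as):

import errno
import os
import signal

try:
    os.kill(p.pid, signal.SIGTERM)
except OSError as e:
    if e.errno != errno.ESRCH:  # "child already gone" is fine to ignore...
        raise                   # ...anything else: rethrow unchanged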
I'm working on a programming project--writing a basic P2P filesharing application in Python. I'm using two threads: a main one to call select and wait for input from a list of sockets and sys.stdin (to receive typed commands) and a helper thread that takes status update messages off a queue and prints them. (It is the only thing that prints anything)
I'm also required to catch the standard SIGINT and handle it to exit gracefully. I have a quit method that does this; typing 'quit' as a command works just fine. So in the main thread I try setting this method as the handler for SIGINT. As far as I can tell, the process catches the signal and calls the quit method. The helper thread prints a message confirming that it is exiting. But then I get the following error message from the main thread:
Traceback (most recent call last):
  File "peer.py", line 226, in <module>
    main()
  File "peer.py", line 223, in main
    p.run()
  File "peer.py", line 160, in run
    readables, writables, exceptions = select(self.sockets, [], [])
select.error: (4, 'Interrupted system call')
After which the program does still exit. Whereas without the signal handler in place, sending a SIGINT gives me the following:
Traceback (most recent call last):
  File "peer.py", line 225, in <module>
    main()
  File "peer.py", line 222, in main
    p.run()
  File "peer.py", line 159, in run
    readables, writables, exceptions = select(self.sockets, [], [])
KeyboardInterrupt
Which fails to terminate the program; I have to stop and kill it. This is confusing because the SIGINT appears to interrupt the call to select only when it is caught by my custom method (which only puts a message on the print queue and sets a "done" variable). Does anyone know how this can happen? Is it just a bad idea to use signal handlers and threads simultaneously?
I'm not sure about using signal handlers to catch this case, but I've found a recipe for handling it on *nix-based systems here: http://code.activestate.com/recipes/496735-workaround-for-missed-sigint-in-multithreaded-prog/
In a nutshell (if I understand correctly):
Before you start any new threads, fork a child process (using os.fork) that runs the rest of the program, and have the parent process watch for the KeyboardInterrupt.
When the parent catches the keyboard interrupt, it can kill the child process (which by now may have started other threads) using os.kill. This will, in turn, terminate any threads of that child process. A minimal sketch follows.
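In sketch form (run_program stands for whatever starts your threads; the name is illustrative):

import os
import signal
import sys

def guarded_run(run_program):
    # Fork before any threads exist: the child runs the real program,
    # the parent only waits and watches for KeyboardInterrupt.
    child = os.fork()
    if child == 0:
        run_program()  # may start as many threads as it likes
        os._exit(0)
    try:
        os.waitpid(child, 0)  # parent: block until the child exits
    except KeyboardInterrupt:
        os.kill(child, signal.SIGKILL)  # takes the child's threads with it
        sys.exit(1)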
Yes, last night after I stopped working on it I realized that I did want it to interrupt; it was being interrupted by the execution of the signal handler, presumably. So I just catch the select.error and have it jump to the end of the loop, where it immediately exits and moves on to the cleanup code.
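Something like this sketch (sockets and done stand in for the asker's self.sockets and flag; the as syntax needs Python 2.6+):

import errno
import select

while not done:
    try:
        readables, writables, exceptions = select.select(sockets, [], [])
    except select.error as e:
        if e.args[0] != errno.EINTR:
            raise    # a real error, not our signal
        continue     # interrupted by the SIGINT handler: loop and re-check done
    # ... service the ready sockets ...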