Popen.communicate() throws OSError: "[Errno 10] No child processes" - python

I'm trying to start up a child process and get its output on Linux from Python using the subprocess module:
#!/usr/bin/python2.4
import subprocess
p = subprocess.Popen(['ls', '-l', '/etc'],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate()
However, the code is flaky: sometimes p.communicate() throws
OSError: [Errno 10] No child processes
What can cause this exception? Is there any non-determinism or race condition here that can cause flakiness?

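Are you intercepting SIGCHLD in the script? If you are, Popen may not work as expected: it reaps its child itself with waitpid(), so a SIGCHLD handler that reaps children first (or setting SIGCHLD to SIG_IGN, which makes the kernel reap them automatically) leaves it with no child to wait for, and the call fails with ECHILD ("No child processes").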
You can check for SIGCHLD handlers by commenting out the Popen call and then running the following (strace writes its trace to stderr, hence the redirection):
strace python <your_script.py> 2>&1 | grep SIGCHLD
If you see something similar to:
rt_sigaction(SIGCHLD, ...)
then you are in trouble. You need to disable the handler before calling Popen and reset it after communicate() is done (this might introduce a race condition, so beware).
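import signal

signal.signal(signal.SIGCHLD, handler)  # your existing handler
...
# Restore the default disposition so Popen can reap its own child.
signal.signal(signal.SIGCHLD, signal.SIG_DFL)
# Now you can go wild with Popen.
# WARNING: during this window, SIGCHLD will not be delivered to handler.
...
signal.signal(signal.SIGCHLD, handler)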
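A minimal sketch of the same save/restore idea as a context manager, assuming Python 3 on a Unix system and a script running in the main thread (signal.signal() only works there). The names default_sigchld and run_captured are hypothetical; the try/finally guarantees the previous handler comes back even if the child call raises, but the race window mentioned above still applies.

```python
import contextlib
import signal
import subprocess


@contextlib.contextmanager
def default_sigchld():
    """Temporarily restore the default SIGCHLD disposition."""
    previous = signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    try:
        yield
    finally:
        # Reinstall whatever was active before (a handler or SIG_IGN).
        # None means it was installed outside Python and cannot be restored.
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


def run_captured(args):
    """Run args with SIGCHLD at its default so Popen can reap the child."""
    with default_sigchld():
        p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = p.communicate()
    return p.returncode, out, err
```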
There is a Python bug report on this, and as far as I can see it hasn't been resolved yet:
http://bugs.python.org/issue9127
Hope that helps.

You might be running into the bug mentioned here: http://bugs.python.org/issue1731717

I'm not able to reproduce this on my Python (2.4.6-1ubuntu3). How are you running your script? How often does this occur?

I ran into this problem using Python 2.6.4 which I built into my home directory (because I don't want to upgrade the "built-in" Python on the machine).
I worked around it by replacing subprocess.Popen() with (the deprecated) os.popen3().

Related

Catching SIGINT (Ctrl+C) signal sent from systemd to a python daemon/service

EDIT: I narrowed down the problem from the original version. I originally assumed that all SIGINT overrides were being ignored, but it's actually just the subprocess one; I've edited the question to reflect this.
I'd like Python to shut down safely when it receives SIGINT (Ctrl+C) from systemd. However, sudo systemctl kill --signal=SIGINT myapp ignores my subprocess.Popen(args, stdout=PIPE, stderr=PIPE, preexec_fn = os.setsid) line, which is supposed to keep the SIGINT from reaching the called process (it does when NOT using systemd), and my program crashes anyway.
Here's my setup (similar to this: How can I make a python daemon handle systemd signals?):
import logging
import os
import signal

shutdown = False

def shutdown_handler(signum, frame):
    global shutdown
    is_thread = frame.f_code.co_name == "my_thread_func"
    if shutdown:
        logging.info("Force shutdown for process {0}".format(os.getpid()))
        raise KeyboardInterrupt
    else:
        shutdown = True
        if not is_thread:
            logging.info("Shutdown signal received. Waiting for sweeps to finish.")
            logging.info("Press Ctrl-C again to force shutdown.")
        return

signal.signal(signal.SIGINT, shutdown_handler)
Elsewhere:
subprocess.Popen(args, stdout=PIPE, stderr=PIPE, preexec_fn = os.setsid)
When NOT running under systemd (just python daemon.py), the Popen subprocess keeps running as desired. But sudo systemctl kill --signal=SIGINT myapp sends the signal to the parent, the child, and the Popen (command-line) processes:
systemd[1]: fi_iot.service: Sent signal SIGINT to main process 512562 (python3) on client request.
systemd[1]: fi_iot.service: Sending signal SIGINT to process 512978 (python3) on client request.
systemd[1]: fi_iot.service: Sending signal SIGINT to process 513023 (my-cli-tool) on client request.
Does anyone know why this is happening?
I'm also open to suggestions for alternative ways of implementing this (e.g. adding an ExecStop= setting to my systemd unit, or using a custom signal instead of SIGINT), though I'd rather override as little default behavior as possible. I want sudo systemctl stop myapp to do what it's supposed to do without my custom code potentially messing things up or confusing others.
EDIT: It seems this issue is specific to how the Popen function is called. I might try setting SIGINT to SIG_IGN in the child and see if that works. An earlier version of this post indicated this was a broader issue than it appears to be.
In Python 3, Ctrl+C is raised as a KeyboardInterrupt exception.
To catch it, use a try block with an except KeyboardInterrupt clause.
Your best bet is to write a main function and put the try/except around the place where you call it.
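A minimal sketch of that pattern, assuming Python 3; run and interrupted_main are hypothetical names, and the exit code 130 (128 + SIGINT's number 2) is just the usual shell convention, not something Python requires.

```python
def run(main):
    """Call main(), treating Ctrl+C (KeyboardInterrupt) as a clean shutdown."""
    try:
        main()
    except KeyboardInterrupt:
        # Put any clean-up (closing files, stopping workers) here.
        return 130  # shell convention for "terminated by SIGINT" (128 + 2)
    return 0


def interrupted_main():
    # Hypothetical stand-in that behaves as if Ctrl+C arrived mid-run.
    raise KeyboardInterrupt
```

In a script you would typically finish with sys.exit(run(main)).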
Update:
Popen has a method called send_signal(), so you can forward the systemd signal to the child.
Relevant Python docs:
https://docs.python.org/3/library/subprocess.html#subprocess.Popen.send_signal
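A rough sketch of such forwarding, assuming Python 3 on Unix and that the handlers are installed from the main thread; forward_signals is a hypothetical helper, and relaying SIGINT and SIGTERM is an assumption about which signals will arrive.

```python
import signal
import subprocess


def forward_signals(proc, signums=(signal.SIGINT, signal.SIGTERM)):
    """Install handlers that relay each signal in signums to the child proc."""
    def relay(signum, frame):
        # Only signal a child that is still running (poll() returns None).
        if proc.poll() is None:
            proc.send_signal(signum)

    for signum in signums:
        signal.signal(signum, relay)
```

After proc = subprocess.Popen(...), calling forward_signals(proc) before proc.wait() relays those signals, when delivered to the parent, to the child.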
Solution: using a different method to keep SIGINT from affecting the process started by subprocess.Popen worked:
import signal
from subprocess import Popen, PIPE

def preexec_function():
    # Used by Popen: runs in the child before exec and makes it ignore SIGINT
    signal.signal(signal.SIGINT, signal.SIG_IGN)

proc = Popen(args, stdout=PIPE, stderr=PIPE, preexec_fn=preexec_function)
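Now the subprocess ignores the signal. Copilot's preexec_fn=os.setsid didn't do what I wanted under systemd: it only moves the child into a new session and process group, which shields it from a terminal's Ctrl+C, whereas systemctl kill by default signals every process in the service's control group regardless of process group. That's what I get for using GPT-3-generated code I don't understand.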
I may look into Showierdata9978's suggestion of using send_signal(), which could let me send the interrupt when Ctrl+C is pressed a second time, allowing the child to shut down safely despite the ignore.
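A sketch of how the two ideas could fit together, assuming Python 3.3+ on Unix; spawn_shielded and stop_child are hypothetical names. One caveat: because the child starts with SIGINT ignored, a SIGINT forwarded with send_signal() would be ignored too (unless the program installs its own handler), so this sketch asks the child to stop with SIGTERM instead.

```python
import signal
import subprocess


def ignore_sigint():
    # Runs in the child between fork() and exec(); an ignored disposition
    # survives exec, so the launched program starts with SIGINT ignored.
    signal.signal(signal.SIGINT, signal.SIG_IGN)


def spawn_shielded(args):
    """Start args so that SIGINT (from systemd or a terminal) is ignored."""
    return subprocess.Popen(args, preexec_fn=ignore_sigint)


def stop_child(proc, timeout=5.0):
    """Ask the child to exit; SIGINT would be ignored, so use SIGTERM."""
    if proc.poll() is None:
        proc.send_signal(signal.SIGTERM)
    return proc.wait(timeout=timeout)
```

A parent handler could call stop_child() on the second Ctrl+C, in place of raising KeyboardInterrupt.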

Popen.wait never returning with docker-compose

I am developing a wrapper around docker-compose in Python.
However, I am struggling with Popen.
Here is how I launch it:
import signal
import subprocess as sp

argList = ['docker-compose', 'up']
env = {'HOME': '/home/me/somewhere'}
p = sp.Popen(argList, env=env)

def handler(signum, frame):
    p.send_signal(signum)

for s in (signal.SIGINT,):
    signal.signal(s, handler)  # to redirect Ctrl+C

p.wait()
Everything works fine: when I hit Ctrl+C, docker-compose gracefully stops the container. However, p.wait() never returns...
Any hints?
NOTE: While writing the question, I thought I should check whether p.wait() actually returns and whether the hang happens after it (it's the last instruction in the script). Adding a print after it results in the process exiting normally. Any further hints on this behavior?
When I run your code as written, it works as intended in that it causes docker-compose to exit and then p.wait() returns. However, I occasionally see this behavior:
Killing example_service_1 ... done
ERROR: 2
I think that your code may end up delivering SIGINT twice to docker-compose. That is, I think docker-compose receives an initial SIGINT when you type CTRL-C, because it is in the terminal's foreground process group along with your Python script, and then you explicitly deliver another SIGINT in your handler function.
I don't always see this behavior, so it's possible my explanation is incorrect.
In any case, I think the correct solution here is simply to ignore SIGINT in your Python code:
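import signal
import subprocess
argList = ["docker-compose", "up"]
p = subprocess.Popen(argList)
# Ignore Ctrl+C in Python only; docker-compose still receives it from the terminal.
# Do this after Popen, since an ignored SIGINT would be inherited by the child.
signal.signal(signal.SIGINT, signal.SIG_IGN)
p.wait()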
With this implementation, your Python code ignores the SIGINT generated by CTRL-C, but it is received and processed normally by docker-compose.
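A variant of that code that restores the previous SIGINT handler once the child exits, assuming Python 3 on Unix and a call from the main thread; run_foreground is a hypothetical name. As in the answer, SIGINT is ignored only after Popen() returns, because an ignored disposition set earlier would be inherited by docker-compose. There is still a brief window between the two calls in which Ctrl+C raises KeyboardInterrupt in Python.

```python
import signal
import subprocess


def run_foreground(args):
    """Run args, letting a terminal Ctrl+C reach only the child."""
    proc = subprocess.Popen(args)
    # Ignore SIGINT only *after* the child exists, so it is not inherited.
    previous = signal.signal(signal.SIGINT, signal.SIG_IGN)
    try:
        return proc.wait()
    finally:
        # Put back Python's usual handler (or whatever was there before).
        if previous is not None:
            signal.signal(signal.SIGINT, previous)
```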

How to pass SIGINT to child process with Python subprocess.Popen() using shell = true

I am currently trying to write a kind of wrapper for GDB (Python 2.7.3) that will allow me to switch dynamically from scripted input to interactive communication with GDB.
So far I use
self.process = subprocess.Popen(["gdb vuln"], stdin = subprocess.PIPE, shell = True)
to start gdb within my script. (vuln is the binary I want to examine)
Since a key feature of gdb is to pause execution of the attached process on receiving SIGINT (Ctrl+C) and let the user inspect registers and memory, I need some way to pass a SIGINT to it.
None of the following works for me:
self.process.send_signal(signal.SIGINT)
os.kill(self.process.pid, signal.SIGINT)
os.killpg(self.process.pid, signal.SIGINT)
When I use one of these functions there is no response. I suppose this problem arises from the use of shell=True. However, at this point I am really out of ideas.
Even my old friend Google couldn't really help me this time, so maybe you can. Thanks in advance.
Cheers, Mike
Here is what worked for me:
import signal
import subprocess
try:
    p = subprocess.Popen(...)
    p.wait()
except KeyboardInterrupt:
    p.send_signal(signal.SIGINT)
    p.wait()
I looked deeper into the problem and found some interesting things. Maybe these findings will help someone in the future.
When calling gdb vuln using subprocess.Popen() with shell=True, three processes are in fact created, and the pid returned is that of sh (5180).
ps -a
5180 pts/0 00:00:00 sh
5181 pts/0 00:00:00 gdb
5183 pts/0 00:00:00 vuln
Consequently sending a SIGINT to the process will in fact send SIGINT to sh.
In addition, I kept looking for an answer and stumbled upon this post:
https://bugzilla.kernel.org/show_bug.cgi?id=9039
To keep it short, what is mentioned there is the following:
When you press Ctrl+C while using gdb normally, SIGINT is in fact sent to the examined program (in this case vuln); ptrace then intercepts it and passes it to gdb.
This means that if I use self.process.send_signal(signal.SIGINT), the signal never reaches gdb this way.
Temporary Workaround:
I managed to work around this problem by simply calling subprocess.Popen() as follows:
subprocess.Popen("killall -s INT " + self.binary, shell = True)
This is nothing more than a first workaround. It might do serious damage when multiple applications with the same name are running. Also, it fails if shell=True is not set, because without a shell the whole string is treated as the name of the program to run.
If someone has a better fix (e.g. how to get the pid of the process started by gdb), please let me know.
Cheers, Mike
EDIT:
Thanks to Mark for suggesting that I look at the ppid of the process.
I managed to narrow down the processes to which SIGINT is sent using the following approach:
out = subprocess.check_output(['ps', '-Aefj'], universal_newlines=True)
# Get sid and pgid of the direct child process (/bin/sh)
sid = os.getsid(self.process.pid)
pgid = os.getpgid(self.process.pid)
for line in out.splitlines():
    if self.binary in line:
        # Columns of ps -efj: UID PID PPID PGID SID ...
        l = line.split()
        # Only true for the target process: same session as sh,
        # but a different process group
        if l[4] == str(sid) and l[3] != str(pgid):
            os.kill(int(l[1]), signal.SIGINT)
I have done something like the following in the past and, if I recall correctly, it seemed to work for me:
import os
import subprocess

def detach_processGroup():
    # Runs in the child before exec: put it in a new process group
    os.setpgrp()

subprocess.Popen(command,
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE,
                 preexec_fn=detach_processGroup)
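On Python 3.2+, Popen's start_new_session=True performs the setsid() call for you without a preexec_fn (which the subprocess docs warn is not safe in the presence of threads), and os.killpg() can then signal the whole group, including grandchildren the child spawns. A minimal sketch under those assumptions; spawn_group and kill_group are hypothetical names. Note that setsid() also detaches the child from the controlling terminal, a slightly bigger step than os.setpgrp(); Python 3.11 added process_group=0 as a closer equivalent of the latter.

```python
import os
import signal
import subprocess


def spawn_group(args, **kwargs):
    # start_new_session=True makes the child call setsid() before exec, so it
    # leads a new session and process group whose id equals proc.pid.
    return subprocess.Popen(args, start_new_session=True, **kwargs)


def kill_group(proc, sig=signal.SIGTERM, timeout=5.0):
    """Signal every process in the child's group, then reap the child."""
    try:
        os.killpg(proc.pid, sig)
    except ProcessLookupError:
        pass  # the whole group has already exited
    return proc.wait(timeout=timeout)
```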

python how to kill a popen process, shell false [why not working with standard methods]

I'm trying to kill a subprocess started with:
playing_long = Popen(["omxplayer", "/music.mp3"], stdout=subprocess.PIPE)
and after a while
pid = playing_long.pid
playing_long.terminate()
os.kill(pid,0)
playing_long.kill()
Which doesn't work.
The solution pointed out in How to terminate a python subprocess launched with shell=True doesn't work either.
Note that I am using threads, and using preexec_fn is not recommended when threads are involved (or at least that's what I read; in any case, it doesn't work either).
Why is it not working? There's no error message, but I have to kill -9 the process manually to stop the mp3 file from playing.
Thanks
EDIT:
From here, I have added a wait() after the kill().
Before restarting the process, I check whether it is still running, so that I don't end up with several copies of the mp3 playing at once.
Without the wait(), the system sees that the process is alive.
With the wait(), the system understands that the process is dead and starts it again.
However, the sound keeps playing. I definitely can't seem to get it killed.
EDIT2: The problem is that omxplayer starts a second process that I don't kill, and that one is responsible for the actual music.
I've tried the following code, found in several places on the internet; it seems to work for everyone but me:
playing_long.stdin.write('q')
playing_long.stdin.flush()
It fails with AttributeError: 'NoneType' object has no attribute 'write'. Even when I run this code immediately after starting the Popen process, it fails with the same message:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write('q')
playing_long.stdin.flush()
EDIT3: The problem was that I wasn't passing stdin=subprocess.PIPE to Popen. Now it is:
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
*Note that what is written to stdin must be bytes (b'q').
Final solution, then (see the edits to the question):
playing_long = subprocess.Popen(["omxplayer", "/home/pi/Motion_sounds/music.mp3"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
time.sleep(5)
playing_long.stdin.write(b'q')
playing_long.stdin.flush()
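A slightly more defensive version of the final solution, assuming Python 3.3+ (for wait(timeout=...)) and a program that quits when it reads the key from stdin, as omxplayer does with q per the question; quit_via_stdin is a hypothetical name, and the fallback to kill() after the timeout is an added assumption, not part of the original answer. kill() only reaches the direct child, so a helper process like the one described in EDIT2 could survive that fallback.

```python
import subprocess


def quit_via_stdin(proc, key=b"q", timeout=5.0):
    """Send a quit keystroke on stdin; fall back to kill() if it is ignored."""
    # proc must have been started with stdin=subprocess.PIPE, and the data
    # written must be bytes.
    proc.stdin.write(key)
    proc.stdin.flush()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # the keystroke was ignored: force the child to stop
        return proc.wait()
```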

Problems killing a process with Python on Solaris

I have a C++ program, called C, that is designed to shut down when it receives a SIGINT signal. I've written a Python program P that runs C as a subprocess. I want P to stop C. I tried 3 things and I'd like to know why some of them didn't work.
Attempt #1:
import subprocess
import signal
import os
p = subprocess.Popen(...)
...
os.killpg(p.pid, signal.SIGINT)
This code gives me the error
OSError: [Errno 3] No such process
even though p.pid matches the pid displayed by ps.
Attempt #2:
import subprocess
import signal
import os
p = subprocess.Popen(...)
...
os.system('kill -SIGINT %u' % p.pid)
This gives me the error
sh: kill: bad signal
even though kill -SIGINT <pid> works from the terminal.
Attempt #3:
import subprocess
import signal
import os
p = subprocess.Popen(...)
...
os.system('kill -2 %u' % p.pid)
This works.
My question is, why didn't #1 and #2 work?
Edit: the documentation for os.kill() says "New in version 2.7: Windows support", so I originally assumed that os.kill() was (a) first available in 2.7 and (b) worked on Windows. After reading the answers below, I ran os.kill() on Solaris (which, sorry, I should have done in the first place), and it does work in 2.4. Obviously, the documentation means that Windows support is new in 2.7. Oops.
The first fails because os.killpg signals a process group, identified by its leader's pid; your child is a simple process that stays in your Python process's group rather than leading one of its own, so there is no process group with that id. Try os.kill instead. The second fails because the kill that os.system() runs through /bin/sh on Solaris does not understand symbolic names such as -SIGINT, while your interactive shell's kill (and the kill on *BSD and Linux) does; use a numeric signal (SIGINT is 2 on Solaris), or use Python's predefined signal constants from the signal module. That said, use Popen's own interface instead, as someone else mentioned; don't reinvent the wheel, or you're liable to run into corner cases.
The Popen object has a kill() method that you can invoke as well as a terminate() method and a generic send_signal() method.
I would use one of these rather than trying any of the out of band stuff you'd use with the os interface. You've already got a handle to the process, you should use it!
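For the Solaris question, a sketch of that advice, assuming Python 3.3+ (Popen.wait(timeout=...) does not exist in the 2.4 used in the question); interrupt_and_wait is a hypothetical name. It sends SIGINT through the Popen handle and only falls back to kill() if the child has not exited within the timeout.

```python
import signal
import subprocess


def interrupt_and_wait(proc, timeout=5.0):
    """Send SIGINT via the Popen handle; escalate to kill() on timeout."""
    proc.send_signal(signal.SIGINT)  # no shell, no process-group lookup
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # the child did not react to SIGINT in time
        return proc.wait()
```

A negative return code such as -2 means the child was terminated by that signal number.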
