subprocess.Popen's stdin.buffer/stdout is never flushing - python

I am writing a Python daemon (using Python 3.7) that continuously checks whether data is available on stdin (using select) and does something with it. The data can contain non-unicode characters, so the daemon needs to read from sys.stdin.buffer instead of sys.stdin.
That program actually works. But I am writing a functional test for it, using a second Python script that starts the daemon with subprocess.Popen, sends data to its stdin and reads from its stdout. For some reason, that part doesn't work: proc.stdout.readline() blocks forever, and some proc.stdin.write calls never reach the child process.
To boil it down, I have 2 small scripts that illustrate the problem.
Right now, the issue is that the first readline() call in test.py blocks and never returns, even though test1.py has already written a full line to stdout.
# test.py
from subprocess import Popen, PIPE, TimeoutExpired
import time
import select
proc = Popen(["python", "test1.py"], stdin=PIPE, stdout=PIPE,
             bufsize=0)
print("Writing")
proc.stdin.write("blablabla\n".encode('utf-8'))
time.sleep(2)
output = proc.stdout.readline()
print("Result")
print(output)
print("Writing")
proc.stdin.write("zxczxczxczxc\n".encode('utf-8'))
time.sleep(2)
output = proc.stdout.readline()
print("Result")
print(output)
proc.terminate()
print(proc.stdout.read())
#test1.py
import select
import signal
import sys
import logging
logging.basicConfig(level=logging.DEBUG, stream=sys.stdout)
log = logging.getLogger()
STOP = False
def stop(signum, frame):
    # Signal handlers are called with (signum, frame).
    global STOP
    STOP = True
signal.signal(signal.SIGTERM, stop)
signal.signal(signal.SIGINT, stop)
def input_available():
    """Check if data is available on stdin."""
    data_available = select.select([sys.stdin], [], [], 0)
    return sys.stdin in data_available[0]
data = ""
while not STOP:
    if input_available():
        char = sys.stdin.buffer.read(1)
        try:
            char = char.decode('utf-8')
            data += char
        except UnicodeDecodeError:
            char = None
            print("skipping char")
        if char == '\n':
            s = f"Got line {data}"
            print(s)
            log.debug(s)
            data = ""
            sys.stdout.flush()
EDIT
From the comment of @charles-duffy I ran it with strace. It seems that the program in the subprocess is blocking on its read call?
This is the last bit of strace output:
write(1, "Writing\n", 8Writing
) = 8
write(4, "blablabla\n", 10) = 10
select(0, NULL, NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
read(5,
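One likely cause, offered as a hedged sketch rather than a confirmed diagnosis: select() looks at the file descriptor, while sys.stdin.buffer.read(1) goes through Python's buffered reader. The first read(1) can pull the whole line out of the pipe into Python's buffer, after which select() reports nothing to read even though nine characters are still waiting in the buffer. Reading with os.read() on the raw descriptor keeps select() and the reads in agreement. The child script below is a hypothetical stand-in for test1.py and assumes a POSIX system, where select() works on pipes.

```python
import subprocess
import sys

# Hypothetical replacement for test1.py: select() and os.read() both work on
# the raw file descriptor, so no data hides in a Python-level buffer.
CHILD = r'''
import os, select, sys
fd = sys.stdin.fileno()
data = b""
while True:
    ready, _, _ = select.select([fd], [], [], 0.1)
    if not ready:
        continue
    chunk = os.read(fd, 4096)   # returns whatever the pipe currently holds
    if not chunk:               # empty result means the parent closed stdin
        break
    data += chunk
    while b"\n" in data:
        line, data = data.split(b"\n", 1)
        sys.stdout.write("Got line %r\n" % line)
        sys.stdout.flush()
'''

proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        bufsize=0)
proc.stdin.write(b"blablabla\n")
first = proc.stdout.readline()
proc.stdin.write(b"zxczxczxczxc\n")
second = proc.stdout.readline()
proc.stdin.close()              # EOF lets the child leave its loop
proc.wait()
print(first, second)
```

Working on bytes also sidesteps decoding one byte at a time, which cannot succeed for multi-byte UTF-8 sequences.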

Related

python subprocess waiting for grandchild on Windows with stdout set

I have a script that is part of an automated test suite. It runs very slowly on Windows but not on Linux, and I have found out why. The process that we are testing ('frank') creates a child process (so a grandchild). The Python code won't return until that grandchild process also ends (on Windows; it doesn't do this on Linux). The grandchild process will kill itself off after 5 seconds if there is no parent (it hangs around in case another process talks to it).
I've found I can stop the communicate function from hanging in this way if I don't capture stdout. But I need stdout. I read somewhere that the communicate function is waiting for all pipes to be closed. I know that the stdout handle is duplicated for the grandchild but I can't change the code I'm testing.
I've been searching for a solution. I tried some creation flags (still in the code) but that didn't help.
This is the cut down test -
import os
import sys
import threading
import subprocess
def read_from_pipe(process):
    last_stdout = process.communicate()[0]
    print (last_stdout)
CREATE_NEW_PROCESS_GROUP = 0x00000200
DETACHED_PROCESS = 0x00000008
# start process
command = 'frank my arguments'
cwd = "C:\\dev\\ui_test\\frank_test\\workspace\\report183"
p = subprocess.Popen(command,
                     stdout=subprocess.PIPE,
                     cwd=cwd)
# run thread to read from output
t = threading.Thread(target=read_from_pipe, args=[p])
t.start()
t.join(30)
print('finished')
Any ideas?
Thanks.
Peter.
After tips from @eryksun and a lot of Googling, I have this rather complicated lot of code! At one point, I considered cheating by calling os.system and redirecting to a temp file, but then I realised that our test code allows for a command timing out. os.system would just block forever if the child process doesn't die.
import os
import sys
import threading
import subprocess
import time
if os.name == 'nt':
    import msvcrt
    import ctypes
    # See https://stackoverflow.com/questions/55160319/python-subprocess-waiting-for-grandchild-on-windows-with-stdout-set for details on Windows code
    # Based on https://github.com/it2school/Projects/blob/master/2017/Python/party4kids-2/CarGame/src/pygame/tests/test_utils/async_sub.py
    from ctypes.wintypes import DWORD
    if sys.version_info >= (3,):
        null_byte = '\x00'.encode('ascii')
    else:
        null_byte = '\x00'
    def ReadFile(handle, desired_bytes, ol = None):
        c_read = DWORD()
        buffer = ctypes.create_string_buffer(desired_bytes+1)
        success = ctypes.windll.kernel32.ReadFile(handle, buffer, desired_bytes, ctypes.byref(c_read), ol)
        buffer[c_read.value] = null_byte
        return ctypes.windll.kernel32.GetLastError(), buffer.value
    def PeekNamedPipe(handle):
        c_avail = DWORD()
        c_message = DWORD()
        success = ctypes.windll.kernel32.PeekNamedPipe(handle, None, 0, None, ctypes.byref(c_avail), ctypes.byref(c_message))
        return "", c_avail.value, c_message.value
    def read_available(handle):
        buffer, bytesToRead, result = PeekNamedPipe(handle)
        if bytesToRead:
            hr, data = ReadFile(handle, bytesToRead, None)
            return data
        return b''
def read_from_pipe(process):
    if os.name == 'posix':
        last_stdout = process.communicate()[0]
    else:
        handle = msvcrt.get_osfhandle(process.stdout.fileno())
        last_stdout = b''
        while process.poll() is None:
            last_stdout += read_available(handle)
            time.sleep(0.1)
        last_stdout += read_available(handle)
    print (last_stdout)
# start process
command = 'frank my arguments'
cwd = "C:\\dev\\ui_test\\frank_test\\workspace\\report183"
p = subprocess.Popen(command,
                     stdout=subprocess.PIPE,
                     cwd=cwd)
# run thread to read from output
t = threading.Thread(target=read_from_pipe, args=[p])
t.start()
t.join(30)
print('finished')
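For comparison, the temp-file idea does not need os.system. Popen accepts an open file as stdout, and wait(timeout=...) only waits for the direct child instead of waiting for a pipe to reach EOF. The following is a minimal sketch under that assumption; the command is a hypothetical stand-in for 'frank my arguments', and it has not been tried against a real grandchild that keeps the handle open.

```python
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the real 'frank my arguments' command.
command = [sys.executable, "-c", "print('report done')"]

with tempfile.TemporaryFile() as capture:
    proc = subprocess.Popen(command, stdout=capture)
    try:
        # Waits for the child process itself, not for every holder of the
        # output handle to close it.
        proc.wait(timeout=30)
        timed_out = False
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
        timed_out = True
    capture.seek(0)             # rewind before reading what the child wrote
    last_stdout = capture.read()

print(last_stdout)
```

The timeout is kept, which was the reason for rejecting os.system, and no ctypes code is needed.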

PYTHON subprocess cmd.exe closes after first command

I am working on a Python program that implements the cmd window.
I am using subprocess with PIPE.
If, for example, I write "dir" (to stdin), I use communicate() to get the response from cmd, and it does work.
The problem is that in a while True loop this doesn't work more than once; it seems like the subprocess closes itself.
Help me please
import subprocess
process = subprocess.Popen('cmd.exe', shell=False, stdin=subprocess.PIPE,stdout=subprocess.PIPE,stderr=None)
x=""
while x!="x":
    x = raw_input("insert a command \n")
    process.stdin.write(x+"\n")
    o,e=process.communicate()
    print o
process.stdin.close()
The main problem is that trying to read from subprocess.PIPE deadlocks when the program is still running but there is nothing to read from stdout. communicate() avoids this by closing stdin and waiting for the process to exit, which is why it works only once.
A solution would be to put the piece of code that reads stdout in another thread, and then access it via a Queue, which allows for reliable sharing of data between threads by timing out instead of deadlocking.
The new thread will read standard output continuously, stopping when there is no more data.
Each line will be grabbed from the queue until a timeout is reached (no more data in the Queue), then the list of lines will be displayed on the screen.
This process will work for non-interactive programs.
import subprocess
import threading
import Queue
def read_stdout(stdout, queue):
    while True:
        queue.put(stdout.readline()) #This hangs when there is no IO
process = subprocess.Popen('cmd.exe', shell=False, stdout=subprocess.PIPE, stdin=subprocess.PIPE)
q = Queue.Queue()
t = threading.Thread(target=read_stdout, args=(process.stdout, q))
t.daemon = True # t stops when the main thread stops
t.start()
while True:
    x = raw_input("insert a command \n")
    if x == "x":
        break
    process.stdin.write(x + "\n")
    o = []
    try:
        while True:
            o.append(q.get(timeout=.1))
    except Queue.Empty:
        print ''.join(o)
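The same reader-thread idea can be sketched for Python 3. This is an illustration under assumptions, not a drop-in replacement: the child is a hypothetical echo script standing in for cmd.exe so the example runs anywhere, and drain() and its timeouts are names and values chosen here. It waits generously for the first line of a reply and then stops once the queue has been idle for a short time.

```python
import queue
import subprocess
import sys
import threading

# Hypothetical stand-in for cmd.exe: echoes every input line back.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('echo: ' + line)\n"
    "    sys.stdout.flush()\n"
)

def pump(stream, q):
    """Copy lines from the child's stdout into q; None marks EOF."""
    for line in iter(stream.readline, b""):
        q.put(line)
    q.put(None)

def drain(q, first_timeout=10.0, idle_timeout=0.2):
    """Collect one reply: wait for a first line, then stop when idle."""
    lines, timeout = [], first_timeout
    while True:
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            return lines
        if item is None:        # child closed its stdout
            return lines
        lines.append(item)
        timeout = idle_timeout

proc = subprocess.Popen([sys.executable, "-c", CHILD],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE)
q = queue.Queue()
threading.Thread(target=pump, args=(proc.stdout, q), daemon=True).start()

replies = []
for command in (b"dir\n", b"ver\n"):
    proc.stdin.write(command)
    proc.stdin.flush()          # stdin is buffered by default in Python 3
    replies.append(drain(q))

proc.stdin.close()
proc.wait()
print(replies)
```

Note the explicit flush() after each write: without it the command can sit in the parent's buffer and never reach the child.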

Checking to see if there is more data to read from a file descriptor using Python's select module

I have a program that creates a subprocess within a thread, so that the thread can be constantly checking for specific output conditions (from either stdout or stderr), and call the appropriate callbacks, while the rest of the program continues. Here is a pared-down version of that code:
import select
import subprocess
import threading
def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    while True:
        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)
        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            # process the output
        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            # process the output
        if not ready and proc.poll() is not None:
            break
thread = threading.Thread(target = run_task)
thread.run()
It works reasonably well, but I would like the thread to exit once two conditions are met: the running child process has finished, and all of the data in stdout and stderr has been processed.
The difficulty I have is that if my last condition is as it is above (if not ready and proc.poll() is not None), then the thread never exits, because once stdout and stderr's file descriptors are marked as ready, they never become unready (even after all of the data has been read from them, and read() would hang or readline() would return an empty string).
If I change that condition to just if proc.poll() is not None, then the loop exits when the program exits, and I can't guarantee that it's seen all of the data that needs to be processed.
Is this just the wrong approach, or is there a way to reliably determine when you've read all of the data that will ever be written to a file descriptor? Or is this an issue specific to trying to read from the stderr/stdout of a subprocess?
I have been trying this on Python 2.5 (running on OS X) and also tried select.poll() and select.epoll()-based variants on Python 2.6 (running on Debian with a 2.6 kernel).
select module is appropriate if you want to find out whether you can read from a pipe without blocking.
To make sure that you've read all data, use a simpler condition if proc.poll() is not None: break and call rest = [pipe.read() for pipe in [p.stdout, p.stderr]] after the loop.
It is unlikely that a subprocess closes its stdout/stderr before its shutdown therefore you could skip the logic that handles EOF for simplicity.
Don't call Thread.run() directly, use Thread.start() instead. You probably don't need the separate thread here at all.
Don't call p.stdout.readline() after the select(), it may block, use os.read(p.stdout.fileno(), limit) instead. Empty bytestring indicates EOF for the corresponding pipe.
As an alternative, or in addition, you could make the pipes non-blocking using the fcntl module:
import os
from fcntl import fcntl, F_GETFL, F_SETFL
def make_nonblocking(fd):
    return fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | os.O_NONBLOCK)
and handle io/os errors while reading.
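The advice above (select() to wait, os.read() to read, an empty result as EOF) can be put together roughly as follows. This is a sketch assuming a POSIX system; drain_pipes and the one-line child are hypothetical names used only for illustration. Tracking EOF per pipe answers the "have I read everything?" question directly, so the loop does not need to poll the process at all.

```python
import os
import select
import subprocess
import sys

def drain_pipes(proc, chunk=4096, timeout=0.1):
    """Read stdout and stderr until both pipes report EOF."""
    names = {proc.stdout.fileno(): "stdout", proc.stderr.fileno(): "stderr"}
    collected = {"stdout": b"", "stderr": b""}
    pending = set(names)                 # descriptors that are still open
    while pending:
        ready, _, _ = select.select(list(pending), [], [], timeout)
        for fd in ready:
            data = os.read(fd, chunk)    # does not block: select said ready
            if data:
                collected[names[fd]] += data
            else:
                pending.discard(fd)      # empty read: this pipe hit EOF
    proc.wait()
    return collected

# Hypothetical child that writes one line to each stream.
child = "import sys; sys.stdout.write('out\\n'); sys.stderr.write('err\\n')"
proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
result = drain_pipes(proc)
proc.stdout.close()
proc.stderr.close()
print(result)
```

A pipe at EOF stays "ready" forever, which is why it has to be removed from the set passed to select() instead of being polled again.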
My eventual solution, as I mentioned above, was the following, in case this is helpful to anyone. I think it is the right approach, since I'm now 97.2% sure you can't do this with just select()/poll() and read():
import select
import subprocess
import threading
def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout = subprocess.PIPE, stderr = subprocess.PIPE)
    while True:
        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)
        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            if next_line_to_process:
                # process the output
                pass
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stdout
                ready = [x for x in ready if x is not proc.stdout]
        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            if next_line_to_process:
                # process the output
                pass
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stderr
                ready = [x for x in ready if x is not proc.stderr]
        if proc.poll() is not None and not ready:
            break
thread = threading.Thread(target = run_task)
thread.run()
You could do a raw os.read(fd, size) on the pipe's file descriptor instead of using readline(). Once select() reports the descriptor as readable, this call returns without blocking, and it can also detect EOF (in that case it returns an empty string or bytes object). You'd have to implement the line splitting and buffering yourself. Use something like this:
import os
class NonblockingReader():
    def __init__(self, pipe):
        self.fd = pipe.fileno()
        self.buffer = b""
        self.linesep = os.linesep.encode()  # os.read() returns bytes
    def readlines(self):
        data = os.read(self.fd, 2048)
        if not data:
            return None  # EOF
        self.buffer += data
        if self.linesep in self.buffer:
            lines = self.buffer.split(self.linesep)
            self.buffer = lines[-1]  # keep the incomplete last line
            return lines[:-1]
        else:
            return []

Non-blocking read on subprocess PIPE in Python, one byte at a time

I have implemented a variant on the code in this question:
A non-blocking read on a subprocess.PIPE in Python
To try and read the output in real time from this dummy program test.py:
import time, sys
print "Hello there"
for i in range(100):
    time.sleep(0.1)
    sys.stdout.write("\r%d"%i)
    sys.stdout.flush()
print
print "Go now or I shall taunt you once again!"
The variation on the other question is that the calling program must read character by character, not line by line, as the dummy program test.py outputs progress indication all on one line by use of \r. So here it is:
import sys,time
from subprocess import PIPE, Popen
from threading import Thread
try:
    from Queue import Queue, Empty
except ImportError:
    from queue import Queue, Empty # Python 3.x
ON_POSIX = 'posix' in sys.builtin_module_names
def enqueue_output(out, queue):
    while True:
        buffersize = 1
        data = out.read(buffersize)
        if not data:
            break
        queue.put(data)
    out.close()
p = Popen(sys.executable + " test.py", stdout=PIPE, bufsize=1, close_fds=ON_POSIX)
q = Queue()
t = Thread(target=enqueue_output, args=(p.stdout, q))
t.daemon = True # Thread dies with the program
t.start()
while True:
    p.poll()
    if p.returncode:
        break
    # Read line without blocking
    try:
        char = q.get_nowait()
        time.sleep(0.1)
    except Empty:
        pass
    else: # Got line
        sys.stdout.write(char)
        sys.stdout.flush()
print "left loop"
sys.exit(0)
Two problems with this:
1. It never exits - p.returncode never returns a value and the loop is not left. How can I fix it?
2. It's really slow! Is there a way to make it more efficient without increasing buffersize?
As @Markku K. pointed out, you should use bufsize=0 to read one byte at a time.
Your code doesn't require a non-blocking read. You can simplify it:
import sys
from functools import partial
from subprocess import Popen, PIPE
p = Popen([sys.executable, "test.py"], stdout=PIPE, bufsize=0)
for b in iter(partial(p.stdout.read, 1), b""):
    print b # it should print as soon as `sys.stdout.flush()` is called
            # in the test.py
p.stdout.close()
p.wait()
Note: reading 1 byte at a time is very inefficient.
Also, in general, there could be a block-buffering issue that sometimes can be solved using pexpect, pty modules or unbuffer, stdbuf, script command-line utilities.
For Python processes you could use -u flag to force unbuffering (binary layer) of stdin, stdout, stderr streams.
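On the "never exits" part of the question: a child that finishes successfully has return code 0, which is falsy, so a test like `if p.returncode:` stays false after a clean exit. Comparing against None avoids that. Below is a Python 3 sketch of the byte-at-a-time loop with that check; the inline child is a hypothetical, shortened stand-in for test.py.

```python
import sys
from functools import partial
from subprocess import PIPE, Popen

# Hypothetical short version of test.py: progress digits separated by \r.
CHILD = (
    "import sys, time\n"
    "for i in range(3):\n"
    "    sys.stdout.write('\\r%d' % i)\n"
    "    sys.stdout.flush()\n"
    "    time.sleep(0.05)\n"
)

p = Popen([sys.executable, "-c", CHILD], stdout=PIPE, bufsize=0)
received = b""
# read(1) returns b"" at EOF, which ends the iteration.
for byte in iter(partial(p.stdout.read, 1), b""):
    received += byte
p.stdout.close()
returncode = p.wait()

exited = p.poll() is not None      # correct test for "has the child exited?"
truthy = bool(returncode)          # False after a clean exit with code 0
print(repr(received), returncode, exited, truthy)
```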

Python subprocess readlines() hangs

The task I try to accomplish is to stream a ruby file and print out the output. (NOTE: I don't want to print out everything at once)
main.py
from subprocess import Popen, PIPE, STDOUT
import pty
import os
file_path = '/Users/luciano/Desktop/ruby_sleep.rb'
command = ' '.join(["ruby", file_path])
master, slave = pty.openpty()
proc = Popen(command, bufsize=0, shell=True, stdout=slave, stderr=slave, close_fds=True)
stdout = os.fdopen(master, 'r', 0)
while proc.poll() is None:
    data = stdout.readline()
    if data != "":
        print(data)
    else:
        break
print("This is never reached!")
ruby_sleep.rb
puts "hello"
sleep 2
puts "goodbye!"
Problem
Streaming the file works fine. The hello/goodbye output is printed with the 2 seconds delay. Exactly as the script should work. The problem is that readline() hangs in the end and never quits. I never reach the last print.
I know there are a lot of questions like this here on Stack Overflow, but none of them helped me solve the problem. I'm not that into the whole subprocess thing, so please give me a more hands-on/concrete answer.
Regards
edit
Fix unintended code. (nothing to do with the actual error)
I assume you use pty due to reasons outlined in Q: Why not just use a pipe (popen())? (all other answers so far ignore your "NOTE: I don't want to print out everything at once").
pty is Linux only as said in the docs:
Because pseudo-terminal handling is highly platform dependent, there
is code to do it only for Linux. (The Linux code is supposed to work
on other platforms, but hasn’t been tested yet.)
It is unclear how well it works on other OSes.
You could try pexpect:
import sys
import pexpect
pexpect.run("ruby ruby_sleep.rb", logfile=sys.stdout)
Or stdbuf to enable line-buffering in non-interactive mode:
from subprocess import Popen, PIPE, STDOUT
proc = Popen(['stdbuf', '-oL', 'ruby', 'ruby_sleep.rb'],
             bufsize=1, stdout=PIPE, stderr=STDOUT, close_fds=True)
for line in iter(proc.stdout.readline, b''):
    print line,
proc.stdout.close()
proc.wait()
Or using pty from stdlib based on @Antti Haapala's answer:
#!/usr/bin/env python
import errno
import os
import pty
from subprocess import Popen, STDOUT
master_fd, slave_fd = pty.openpty() # provide tty to enable
                                    # line-buffering on ruby's side
proc = Popen(['ruby', 'ruby_sleep.rb'],
             stdin=slave_fd, stdout=slave_fd, stderr=STDOUT, close_fds=True)
os.close(slave_fd)
try:
    while 1:
        try:
            data = os.read(master_fd, 512)
        except OSError as e:
            if e.errno != errno.EIO:
                raise
            break # EIO means EOF on some systems
        else:
            if not data: # EOF
                break
            print('got ' + repr(data))
finally:
    os.close(master_fd)
    if proc.poll() is None:
        proc.kill()
    proc.wait()
print("This is reached!")
All three code examples print 'hello' immediately (as soon as the first EOL is seen).
I leave the old, more complicated code example here because it may be referenced and discussed in other posts on SO.
Or using pty based on @Antti Haapala's answer:
import os
import pty
import select
from subprocess import Popen, STDOUT
master_fd, slave_fd = pty.openpty() # provide tty to enable
                                    # line-buffering on ruby's side
proc = Popen(['ruby', 'ruby_sleep.rb'],
             stdout=slave_fd, stderr=STDOUT, close_fds=True)
timeout = .04 # seconds
while 1:
    ready, _, _ = select.select([master_fd], [], [], timeout)
    if ready:
        data = os.read(master_fd, 512)
        if not data:
            break
        print("got " + repr(data))
    elif proc.poll() is not None: # select timeout
        assert not select.select([master_fd], [], [], 0)[0] # detect race condition
        break # proc exited
os.close(slave_fd) # can't do it sooner: it leads to errno.EIO error
os.close(master_fd)
proc.wait()
print("This is reached!")
Not sure what is wrong with your code, but the following seems to work for me:
#!/usr/bin/python
from subprocess import Popen, PIPE
import threading
p = Popen('ls', stdout=PIPE)
class ReaderThread(threading.Thread):
    def __init__(self, stream):
        threading.Thread.__init__(self)
        self.stream = stream
    def run(self):
        while True:
            line = self.stream.readline()
            if len(line) == 0:
                break
            print line,
reader = ReaderThread(p.stdout)
reader.start()
# Wait until subprocess is done
p.wait()
# Wait until we've processed all output
reader.join()
print "Done!"
Note that I don't have Ruby installed and hence cannot check with your actual problem. Works fine with ls, though.
Basically what you are looking at here is a race condition between your proc.poll() and your readline(). Since the input on the master filehandle is never closed, if the process attempts to do a readline() on it after the ruby process has finished outputting, there will never be anything to read, but the pipe will never close. The code will only work if the shell process closes before your code tries another readline().
Here is the timeline:
readline()
print-output
poll()
readline()
print-output (last line of real output)
poll() (returns false since process is not done)
readline() (waits for more output)
(process is done, but output pipe still open and no poll ever happens for it).
Easy fix is to just use the subprocess module as it suggests in the docs, not in conjunction with openpty:
http://docs.python.org/library/subprocess.html
Here is a very similar problem for further study:
Using subprocess with select and pty hangs when capturing output
Try this:
proc = Popen(command, bufsize=0, shell=True, stdout=PIPE, close_fds=True)
for line in proc.stdout:
    print line
print("This is most certainly reached!")
As others have noted, readline() will block when reading data. It will even do so when your child process has died. I am not sure why this does not happen when executing ls as in the other answer, but maybe the ruby interpreter detects that it is writing to a PIPE and therefore it will not close automatically.
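For what it's worth, with a plain PIPE the read side does see EOF once the child exits, provided no other process still holds the write end. readline() then returns an empty bytes object and iteration stops. Here is a small Python 3 sketch of that behaviour, using a hypothetical Python child in place of ruby_sleep.rb so it runs without Ruby installed.

```python
import subprocess
import sys

# Hypothetical stand-in for ruby_sleep.rb, flushing after each line.
CHILD = (
    "import time\n"
    "print('hello', flush=True)\n"
    "time.sleep(0.2)\n"
    "print('goodbye!', flush=True)\n"
)

proc = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
lines = []
for line in proc.stdout:        # stops when readline() returns b"" at EOF
    lines.append(line)
proc.stdout.close()
returncode = proc.wait()
print(lines, returncode)
```

The hang in the question comes from the pty, where the parent keeps the slave side open and therefore never sees EOF on the master side.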
