Is it possible to read multiple asyncio Streams concurrently? - python

I need to read the output of several asyncio tasks running concurrently.
These tasks are actually created using asyncio.create_subprocess_exec().
In the simplest form I would need to print stdout/stderr of a single process while accumulating lines in separate strings.
My current (working) code is:
async def run_command(*args, stdin=None, can_fail=False, echo=False):
    """
    Run command asynchronously in subprocess.
    Waits for command completion and returns return code, stdout and stderr.
    Example from:
    http://asyncio.readthedocs.io/en/latest/subprocess.html
    """
    # Create subprocess
    try:
        process = await asyncio.create_subprocess_exec(
            *args,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE
        )
    except (FileNotFoundError, OSError):
        if not can_fail:
            log.error("run_command(%s): Error FileNotFound", args)
        return -1, '', 'File "%s" NotFound' % args[0]
    # Status
    log.debug("run_command(%s): pid=%s", args, process.pid)
    # Wait for the subprocess to finish
    stdout, stderr = await process.communicate(stdin)
    # Progress
    if process.returncode == 0:
        log.debug("run_command(%s): ok: %s", process.pid, stdout.decode().strip())
    else:
        log.debug("run_command(%s): ko: %s", process.pid, stderr.decode().strip())
    # Result
    result = process.returncode, stdout.decode().strip(), stderr.decode().strip()
    return result
The problem with this code is that I see nothing until the process terminates; some of the spawned processes may take several minutes to complete and print "interesting" info while executing. How can I print (or log) the output as soon as it happens, while still capturing it? (I am aware that without capturing, the underlying process would print directly, but I also need the capture.)
I tried to do something along the lines:
_stdout = ''
while True:
    data = await process.stdout.readline()
    if not data:
        break
    print(data)
    _stdout += data.decode()
but I have no idea how to extend this to multiple streams (in this case just stdout/stderr, but potentially expanding to multiple programs). Is there something akin to a select() call?
Any hint is welcome.

Is there something akin to a select() call?
The answer to this must be yes, as asyncio is wholly built around a call to select(). However it's not always obvious how to translate that to a select on the level of streams. The thing to notice is that you shouldn't try to select the stream exactly - instead, start reading on the stream and rely on the ability to select the progress of the coroutines. The equivalent of select() would thus be to use asyncio.wait(return_when=FIRST_COMPLETED) to drive the reads in a loop.
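To make the select()-like pattern concrete, here is a minimal sketch (the helper names are mine, not from the question): one readline() task is kept in flight per stream, asyncio.wait(return_when=FIRST_COMPLETED) plays the role of select(), and a finished read is immediately re-armed until EOF.

```python
import asyncio
import sys

async def read_interleaved(process):
    # select()-like loop: arm one readline() task per stream, wait for
    # whichever finishes first, then re-arm a read on that same stream.
    tasks = {
        asyncio.create_task(process.stdout.readline()): ("stdout", process.stdout),
        asyncio.create_task(process.stderr.readline()): ("stderr", process.stderr),
    }
    collected = []
    while tasks:
        done, _ = await asyncio.wait(set(tasks), return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            name, stream = tasks.pop(task)
            line = task.result()
            if line:  # got a line: record it and keep reading this stream
                collected.append((name, line.decode().rstrip()))
                tasks[asyncio.create_task(stream.readline())] = (name, stream)
            # an empty read means EOF on that stream: simply stop re-arming it
    return collected

async def main():
    # Toy child process that writes one line to each stream.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c",
        "import sys; print('out'); print('err', file=sys.stderr)",
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE)
    lines = await read_interleaved(proc)
    await proc.wait()
    return lines

lines = asyncio.run(main())
```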
An even more elegant alternative is to spawn separate tasks where each does its thing, and just let them run in parallel. The code is easier to understand than with a select, boiling down to a single call to gather, and yet under the hood asyncio performs exactly the kind of select() that was requested:
import asyncio, sys, io

async def _read_all(stream, echo):
    # helper function to read the whole stream, optionally
    # displaying data as it arrives
    buf = io.BytesIO()  # BytesIO is preferred to +=
    while True:
        chunk = await stream.read(4096)
        if len(chunk) == 0:
            break
        buf.write(chunk)
        if echo:
            sys.stdout.buffer.write(chunk)
    return buf.getvalue()

async def run_command(*args, stdin=None, echo=False):
    process = await asyncio.create_subprocess_exec(
        *args,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE
    )
    if stdin is not None:
        process.stdin.write(stdin)
        process.stdin.close()
    stdout, stderr = await asyncio.gather(
        _read_all(process.stdout, echo),
        _read_all(process.stderr, echo)
    )
    await process.wait()  # ensure returncode is populated
    return process.returncode, stdout.decode().strip(), stderr.decode().strip()
Note that asyncio's write() is not a coroutine, it defaults to writing in the background, so we don't need to include the write among the coroutines we gather().
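That said, if you feed a lot of data to stdin the buffered writes can pile up in memory; a hedged sketch (the helper name and chunk size are mine) is to await StreamWriter.drain() between writes so the event loop applies backpressure:

```python
import asyncio
import sys

async def feed_stdin(process, data, chunk_size=65536):
    # Write input in chunks, awaiting drain() after each write so a slow
    # consumer slows us down instead of the data piling up in memory.
    for i in range(0, len(data), chunk_size):
        process.stdin.write(data[i:i + chunk_size])
        await process.stdin.drain()
    process.stdin.close()

async def main():
    # Toy child process that reports how many bytes it received on stdin.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c",
        "import sys; print(len(sys.stdin.buffer.read()))",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE)
    await feed_stdin(proc, b"x" * 200_000)
    out = await proc.stdout.readline()
    await proc.wait()
    return int(out)

received = asyncio.run(main())
```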

Related

How to run an EXE program in the background and get the output in python

I want to run an exe program in the background. Let's say the program is httpd.exe.
I can run it, but when I want to get the output it gets stuck, because there is no output if it starts successfully. If there is an error, however, it works fine.
Here is the code I'm using:
import asyncio
import os

async def run(cmd):
    proc = await asyncio.create_subprocess_exec(
        cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE)
    stdout, stderr = await proc.communicate()
    return (proc, stdout, stderr)

os.chdir('c:\\apache\\bin')
process, stdout, stderr = asyncio.run(run('httpd.exe'))
print(stdout, stderr)
I tried to make the following code as general as possible:
I make no assumptions as to whether the program being run writes its output to stdout alone or stderr alone. So I capture both outputs by starting two threads, one for each stream, which write the output to a common queue that can be read in real time. When end-of-stream is encountered on each of stdout and stderr, the threads write a special None record to the queue to indicate end of stream. So the reader of the queue knows, after seeing two such end-of-stream indicators, that no more lines will be written to the queue and that the process has effectively ended.
The call to subprocess.Popen can be made with argument shell=True so that this can also run built-in shell commands, and to make the specification of the command easier (it can now be a single string rather than a list of strings).
The function run_cmd returns the created process and the queue. You now just have to loop reading lines from the queue until two None records are seen. Once that occurs, you can then just wait for the process to complete, which should be immediate.
If you know that the process you are starting writes its output only to stdout or only to stderr (or if you only want to catch one of these outputs), then you can modify the program to start only one thread, specify the subprocess.PIPE value for only one of these outputs, and have the loop reading lines from the queue look for only one None end-of-stream indicator.
The threads are daemon threads, so if you wish to terminate based on output from the process before all the end-of-stream records have been detected, the threads will automatically be terminated along with the main process.
run_apache, which runs Apache as a subprocess, is itself a daemon thread. If it detects any output from Apache, it sets an event that has been passed to it. The main thread that starts run_apache can periodically test this event, wait on this event, wait for the run_apache thread to end (which will only occur when Apache ends), or terminate Apache via the global variable proc.
import subprocess
import sys
import threading
import queue

def read_stream(f, q):
    for line in iter(f.readline, ''):
        q.put(line)
    q.put(None)  # show no more data from stdout or stderr

def run_cmd(command, run_in_shell=True):
    """
    Run command as a subprocess. If run_in_shell is True, then
    command is a string, else it is a list of strings.
    """
    proc = subprocess.Popen(command, shell=run_in_shell, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    q = queue.Queue()
    threading.Thread(target=read_stream, args=(proc.stdout, q), daemon=True).start()
    threading.Thread(target=read_stream, args=(proc.stderr, q), daemon=True).start()
    return proc, q
import os

def run_apache(event):
    global proc
    os.chdir('c:\\apache\\bin')
    proc, q = run_cmd(['httpd.exe'], False)
    seen_None_count = 0
    while seen_None_count < 2:
        line = q.get()
        if line is None:
            # end of stream from either stdout or stderr
            seen_None_count += 1
        else:
            event.set()  # Seen an output line
            print(line, end='')
    # wait for process to terminate, which should be immediate:
    proc.wait()

# This event will be set if Apache writes output:
event = threading.Event()
t = threading.Thread(target=run_apache, args=(event,), daemon=True)
t.start()

# The main thread runs and can test the event at any time to see if Apache has produced output:
if event.is_set():
    ...

# The main thread can wait for the run_apache thread to terminate normally,
# which will occur when Apache terminates:
t.join()
# or the main thread can kill Apache via the global variable proc:
proc.terminate()  # No need to do t.join() since run_apache is a daemon thread

Continuous output with asyncio subprocess

I'm using asyncio subprocess to execute a subcommand. I want to watch the long-running process's output and at the same time save it to a buffer for later use. I also found this related question (Getting live output from asyncio subprocess), but it mainly centers around the use case for ssh.
The asyncio subprocess docs have an example for reading the output line by line, which goes in the direction of what I want to achieve. (https://docs.python.org/3/library/asyncio-subprocess.html#examples)
import asyncio
import sys

async def get_date():
    code = 'import datetime; print(datetime.datetime.now())'

    # Create the subprocess; redirect the standard output
    # into a pipe.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, '-c', code,
        stdout=asyncio.subprocess.PIPE)

    # Read one line of output.
    data = await proc.stdout.readline()
    line = data.decode('ascii').rstrip()

    # Wait for the subprocess exit.
    await proc.wait()
    return line

date = asyncio.run(get_date())
print(f"Current date: {date}")
I adapted this example to the following:
import asyncio
import shlex
import subprocess

async def subprocess_async(cmd, **kwargs):
    cmd_list = shlex.split(cmd)
    proc = await asyncio.create_subprocess_exec(
        *cmd_list,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT, **kwargs)
    full_log = ""
    while True:
        buf = await proc.stdout.readline()
        if not buf:
            break
        full_log += buf.decode()
        print(f' {buf.decode().rstrip()}')
    await proc.wait()
    res = subprocess.CompletedProcess(cmd, proc.returncode, stdout=full_log.encode(), stderr=b'')
    return res
The issue here is that proc.returncode sometimes ends up None. I guess I have a misunderstanding of how proc.wait() works and when it is safe to stop reading the output. How do I achieve continuous output using asyncio subprocess?
Your code is working fine as-is for me. What command are you trying to run that is causing your issue?
Two things I can think of that may help:
Instead of calling .wait() afterwards, set the returncode as the loop condition to keep running.
Don't wait for full line returns in case the program is like ffmpeg where it will do some tricks to paste over itself in console and not actually send newline characters.
Example code:
import asyncio, shlex, subprocess, sys

async def subprocess_async(cmd, **kwargs):
    cmd_list = shlex.split(cmd)
    proc = await asyncio.create_subprocess_exec(
        *cmd_list,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        **kwargs)
    full_log = b""
    while proc.returncode is None:
        buf = await proc.stdout.read(20)
        if not buf:
            break
        full_log += buf
        sys.stdout.write(buf.decode())
    res = subprocess.CompletedProcess(cmd, proc.returncode, stdout=full_log, stderr=b'')
    return res

if __name__ == '__main__':
    asyncio.run(subprocess_async("ffprobe -i video.mp4"))

I want to continue with my program after opening the .bat file [duplicate]

I need to run a shell command asynchronously from a Python script. By this I mean that I want my Python script to continue running while the external command goes off and does whatever it needs to do.
I read this post:
Calling an external command in Python
I then went off and did some testing, and it looks like os.system() will do the job provided that I use & at the end of the command so that I don't have to wait for it to return. What I am wondering is if this is the proper way to accomplish such a thing? I tried commands.call() but it will not work for me because it blocks on the external command.
Please let me know if using os.system() for this is advisable or if I should try some other route.
subprocess.Popen does exactly what you want.
from subprocess import Popen
p = Popen(['watch', 'ls']) # something long running
# ... do other stuff while subprocess is running
p.terminate()
(Edit to complete the answer from comments)
The Popen instance can do various other things like you can poll() it to see if it is still running, and you can communicate() with it to send it data on stdin, and wait for it to terminate.
If you want to run many processes in parallel and then handle them when they yield results, you can use polling like in the following:
from subprocess import Popen, PIPE
import time

running_procs = [
    Popen(['/usr/bin/my_cmd', '-i %s' % path], stdout=PIPE, stderr=PIPE)
    for path in '/tmp/file0 /tmp/file1 /tmp/file2'.split()]

while running_procs:
    for proc in running_procs:
        retcode = proc.poll()
        if retcode is not None:  # Process finished.
            running_procs.remove(proc)
            break
    else:  # No process is done, wait a bit and check again.
        time.sleep(.1)
        continue

    # Here, `proc` has finished with return code `retcode`
    if retcode != 0:
        """Error handling."""
    handle_results(proc.stdout)
The control flow there is a little bit convoluted because I'm trying to make it small -- you can refactor to your taste. :-)
This has the advantage of servicing the early-finishing requests first. If you call communicate on the first running process and that turns out to run the longest, the other running processes will have been sitting there idle when you could have been handling their results.
This is covered by Python 3 Subprocess Examples under "Wait for command to terminate asynchronously". Run this code using IPython or python -m asyncio:
import asyncio
proc = await asyncio.create_subprocess_exec(
'ls','-lha',
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE)
# do something else while ls is working
# if proc takes very long to complete, the CPUs are free to use cycles for
# other processes
stdout, stderr = await proc.communicate()
The process will start running as soon as the await asyncio.create_subprocess_exec(...) has completed. If it hasn't finished by the time you call await proc.communicate(), it will wait there in order to give you your output status. If it has finished, proc.communicate() will return immediately.
The gist here is similar to Terrel's answer, which I think overcomplicates things.
See asyncio.create_subprocess_exec for more information.
What I am wondering is if this [os.system()] is the proper way to accomplish such a thing?
No. os.system() is not the proper way. That's why everyone says to use subprocess.
For more information, read http://docs.python.org/library/os.html#os.system
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

Use the subprocess module. Check especially the Replacing Older Functions with the subprocess Module section.
The accepted answer is very old.
I found a better modern answer here:
https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/
and made some changes:
make it work on windows
make it work with multiple commands
import sys
import asyncio

if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

async def _read_stream(stream, cb):
    while True:
        line = await stream.readline()
        if line:
            cb(line)
        else:
            break

async def _stream_subprocess(cmd, stdout_cb, stderr_cb):
    try:
        process = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE
        )
        await asyncio.gather(
            _read_stream(process.stdout, stdout_cb),
            _read_stream(process.stderr, stderr_cb),
        )
        rc = await process.wait()
        return process.pid, rc
    except OSError as e:
        # the program will hang if we let any exception propagate
        return e

def execute(*aws):
    """Run the given coroutines in an asyncio loop.

    Returns a list containing the values returned from each coroutine.
    """
    async def _gather():
        return await asyncio.gather(*aws)
    return asyncio.run(_gather())

def printer(label):
    def pr(*args, **kw):
        print(label, *args, **kw)
    return pr

def name_it(start=0, template="s{}"):
    """A simple generator for task names."""
    while True:
        yield template.format(start)
        start += 1

def runners(cmds):
    """
    cmds is a list of commands to execute as subprocesses;
    each item is a list appropriate for use by subprocess.call.
    """
    next_name = name_it().__next__
    for cmd in cmds:
        name = next_name()
        out = printer(f"{name}.stdout")
        err = printer(f"{name}.stderr")
        yield _stream_subprocess(cmd, out, err)

if __name__ == "__main__":
    cmds = (
        [
            "sh",
            "-c",
            """echo "$SHELL"-stdout && sleep 1 && echo stderr 1>&2 && sleep 1 && echo done""",
        ],
        [
            "bash",
            "-c",
            "echo 'hello, Dave.' && sleep 1 && echo dave_err 1>&2 && sleep 1 && echo done",
        ],
        [sys.executable, "-c", 'print("hello from python");import sys;sys.exit(2)'],
    )
    print(execute(*runners(cmds)))
It is unlikely that the example commands will work perfectly on your system, and it doesn't handle weird errors, but this code does demonstrate one way to run multiple subprocesses using asyncio and stream the output.
I've had good success with the asyncproc module, which deals nicely with the output from the processes. For example:
import os
from asyncproc import Process

myProc = Process("myprogram.app")

while True:
    # check to see if process has ended
    poll = myProc.wait(os.WNOHANG)
    if poll is not None:
        break
    # print any new output
    out = myProc.read()
    if out != "":
        print(out)
Using pexpect with non-blocking readlines is another way to do this. Pexpect solves the deadlock problems, allows you to easily run the processes in the background, and gives easy ways to have callbacks when your process spits out predefined strings, and generally makes interacting with the process much easier.
Considering "I don't have to wait for it to return", one of the easiest solutions will be this:
subprocess.Popen(
    [path_to_executable, arg1, arg2, ... argN],
    creationflags=subprocess.CREATE_NEW_CONSOLE,
).pid
But... From what I read this is not "the proper way to accomplish such a thing" because of security risks created by the subprocess.CREATE_NEW_CONSOLE flag.
The key things happening here are the use of subprocess.CREATE_NEW_CONSOLE to create a new console, and .pid (which returns the process ID so that you can check on the program later if you want to), so that you don't have to wait for the program to finish its job.
I had the same problem trying to connect to a 3270 terminal using the s3270 scripting software in Python. Now I'm solving the problem with a subclass of Process that I found here:
http://code.activestate.com/recipes/440554/
And here is the sample taken from file:
def recv_some(p, t=.1, e=1, tr=5, stderr=0):
    if tr < 1:
        tr = 1
    x = time.time() + t
    y = []
    r = ''
    pr = p.recv
    if stderr:
        pr = p.recv_err
    while time.time() < x or r:
        r = pr()
        if r is None:
            if e:
                raise Exception(message)
            else:
                break
        elif r:
            y.append(r)
        else:
            time.sleep(max((x - time.time()) / tr, 0))
    return ''.join(y)

def send_all(p, data):
    while len(data):
        sent = p.send(data)
        if sent is None:
            raise Exception(message)
        data = buffer(data, sent)

if __name__ == '__main__':
    if sys.platform == 'win32':
        shell, commands, tail = ('cmd', ('dir /w', 'echo HELLO WORLD'), '\r\n')
    else:
        shell, commands, tail = ('sh', ('ls', 'echo HELLO WORLD'), '\n')

    a = Popen(shell, stdin=PIPE, stdout=PIPE)
    print recv_some(a),
    for cmd in commands:
        send_all(a, cmd + tail)
        print recv_some(a),
    send_all(a, 'exit' + tail)
    print recv_some(a, e=0)
    a.wait()
There are several answers here, but none of them satisfied the requirements below:
I don't want to wait for command to finish or pollute my terminal with subprocess outputs.
I want to run bash script with redirects.
I want to support piping within my bash script (for example find ... | tar ...).
The only combination that satisfies the above requirements is:
subprocess.Popen(['./my_script.sh "arg1" > "redirect/path/to"'],
                 stdout=subprocess.PIPE,
                 stderr=subprocess.PIPE,
                 shell=True)

Creating a minimal sandbox for running binary programs in Python3

I am trying to build a Python sandbox for running students' code in a minimal and safe environment. I intend to run it in a container and to limit its access to the resources of that container. So I am currently designing the part of the sandbox that is supposed to run inside the container and handle access to the resources.
For now, my specification is to limit the amount of time and memory used by the process. I also need to be able to communicate with the process through the stdin and to catch the retcode, stdout and stderr at the end of the execution.
Moreover, the program may enter an infinite loop and fill up the memory through stdout or stderr (I had one student's program that crashed my container because of that). So I also want to be able to limit the size of the recovered stdout and stderr (after a certain limit is reached I can just kill the process and ignore the rest of the output; I do not care about these extra data, as it is most likely a buggy program and should be discarded).
For now, my sandbox is catching almost everything, meaning that I can:
Set a timeout as I want;
Set a limit to the memory used in the process;
Feed the process through a stdin (for now a given string);
Get the final retcode, stdout and stderr.
Here is my current code (I tried to keep it small for the example):
MEMORY_LIMIT = 64 * 1024 * 1024
TIMEOUT_LIMIT = 5 * 60

__NR_FILE_NOT_FOUND = -1
__NR_TIMEOUT = -2
__NR_MEMORY_OUT = -3

def limit_memory(memory):
    import resource
    return lambda: resource.setrlimit(resource.RLIMIT_AS, (memory, memory))

def run_program(cmd, sinput='', timeout=TIMEOUT_LIMIT, memory=MEMORY_LIMIT):
    """Run the command line and output (ret, sout, serr)."""
    from subprocess import Popen, PIPE, TimeoutExpired
    try:
        proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE,
                     preexec_fn=limit_memory(memory))
    except FileNotFoundError:
        return (__NR_FILE_NOT_FOUND, "", "")
    sout, serr = "".encode("utf-8"), "".encode("utf-8")
    try:
        sout, serr = proc.communicate(sinput.encode("utf-8"), timeout=timeout)
        ret = proc.wait()
    except TimeoutExpired:
        ret = __NR_TIMEOUT
    except MemoryError:
        ret = __NR_MEMORY_OUT
    return (ret, sout.decode("utf-8"), serr.decode("utf-8"))

if __name__ == "__main__":
    ret, out, err = run_program(['./example.sh'], timeout=8)
    print("return code: %i\n" % ret)
    print("stdout:\n%s" % out)
    print("stderr:\n%s" % err)
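To illustrate what limit_memory does, here is a quick POSIX-only sketch (the 64 MB cap and the 256 MB allocation are arbitrary illustrative numbers): a child whose address space is capped with RLIMIT_AS fails its allocation and exits with a non-zero status.

```python
import resource
import subprocess
import sys

def limit_memory(memory):
    # Same helper as above: cap the child's address space before exec.
    return lambda: resource.setrlimit(resource.RLIMIT_AS, (memory, memory))

# The child tries to allocate ~256 MB while capped at 64 MB of address
# space, so the allocation fails and the process exits with an error.
proc = subprocess.Popen(
    [sys.executable, "-c", "x = bytearray(256 * 1024 * 1024)"],
    stderr=subprocess.PIPE,
    preexec_fn=limit_memory(64 * 1024 * 1024))
_, err = proc.communicate()
```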
The missing features are:
Set a limitation on the size of stdout and stderr. I looked on the Web and saw several attempts, but none really works.
Attach a function to stdin rather than just a static string. The function should connect to the stdout and stderr pipes and return bytes to stdin.
Does anyone have an idea about that?
PS: I already looked at:
Non blocking reading from a subprocess output stream in Python;
Python subprocess with timeout and large output (>64K)
As I was saying, you can create your own buffers and write the STDOUT/STDERR to them, checking the size along the way. For convenience, you can write a small io.BytesIO wrapper to do the check for you, e.g.:
from io import BytesIO

# lets first create a size-controlled BytesIO buffer for convenience
class MeasuredStream(BytesIO):
    def __init__(self, maxsize=1024):  # lets use 1 KB as a default
        super(MeasuredStream, self).__init__()
        self.maxsize = maxsize
        self.length = 0

    def write(self, b):
        if self.length + len(b) > self.maxsize:  # o-oh, max size exceeded
            # write only up to maxsize, truncate the rest
            super(MeasuredStream, self).write(b[:self.maxsize - self.length])
            raise ValueError("Max size reached, excess data is truncated")
        # plenty of space left, write the bytes and increase the length
        self.length += super(MeasuredStream, self).write(b)
        return len(b)  # convention: return the written number of bytes
Mind you, if you intend to do truncation / seek & replace you'll have to account for those in your length but this is enough for our purposes.
Anyway, now all you need to do is to handle your own streams and account for the possible ValueError from the MeasuredStream, instead of using Popen.communicate(). This, unfortunately, also means that you'll have to handle the timeout yourself. Something like:
from subprocess import Popen, PIPE, STDOUT, TimeoutExpired
import sys
import time

MEMORY_LIMIT = 64 * 1024 * 1024
TIMEOUT_LIMIT = 5 * 60
STDOUT_LIMIT = 1024 * 1024  # let's use 1 MB as a STDOUT limit

__NR_FILE_NOT_FOUND = -1
__NR_TIMEOUT = -2
__NR_MEMORY_OUT = -3
__NR_MAX_STDOUT_EXCEEDED = -4  # let's add a new return code

# a monotonic reference clock
get_timer = time.monotonic

def limit_memory(memory):
    import resource
    return lambda: resource.setrlimit(resource.RLIMIT_AS, (memory, memory))

def run_program(cmd, sinput='', timeout=TIMEOUT_LIMIT, memory=MEMORY_LIMIT):
    """Run the command line and output (ret, sout)."""
    try:
        # NB: Popen itself takes no timeout argument; the timeout is
        # enforced manually in the read loop below.
        proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=STDOUT,
                     preexec_fn=limit_memory(memory))
    except FileNotFoundError:
        return (__NR_FILE_NOT_FOUND, "")
    sout = MeasuredStream(STDOUT_LIMIT)  # store STDOUT in a measured stream
    start_time = get_timer()  # store a reference timer for our custom timeout
    try:
        proc.stdin.write(sinput.encode("utf-8"))  # write the input to STDIN
        proc.stdin.flush()  # flush the STDIN buffer
        while True:  # our main listener loop
            line = proc.stdout.readline()  # read a line from the STDOUT
            # use proc.stdout.read(buf_size) instead to handle your own buffer
            if line != b"":  # content collected...
                sout.write(line)  # write it to our stream
            elif proc.poll() is not None:  # process finished, nothing to do
                break
            # finally, check the current time progress...
            if get_timer() >= start_time + timeout:
                raise TimeoutExpired(proc.args, timeout)
        ret = proc.poll()  # get the return code
    except TimeoutExpired:
        proc.kill()  # we're no longer interested in the process, kill it
        ret = __NR_TIMEOUT
    except MemoryError:
        ret = __NR_MEMORY_OUT
    except ValueError:  # max buffer reached
        proc.kill()  # we're no longer interested in the process, kill it
        ret = __NR_MAX_STDOUT_EXCEEDED
    sout.seek(0)  # rewind the buffer
    return ret, sout.read().decode("utf-8")  # send the results back

if __name__ == "__main__":
    ret, out = run_program(['./example.sh'], timeout=8)
    print("return code: %i\n" % ret)
    print("stdout:\n%s" % out)
There are two 'issues' with this, though, the first one being quite obvious: I'm piping the subprocess's STDERR to STDOUT, so the result will be a mix of both. Since reading from the STDOUT and STDERR streams is a blocking operation, if you want to read them both separately you'll have to spawn two threads (and separately handle their ValueError exceptions when a stream size is exceeded). The second issue is that the subprocess's STDOUT can lock out the timeout check, as the check depends on STDOUT actually flushing some data. This can also be solved by a separate timer thread that forcefully kills the process if the timeout is exceeded. In fact, that's exactly what Popen.communicate() does.
The principle of operation would essentially be the same, you'll just have to outsource the checks to separate threads and join everything back in the end. That's an exercise I'll leave to you ;)
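A minimal sketch of that two-thread variant (the drain helper and the 1 MB default cap are illustrative, not part of the answer above): one reader thread per pipe copies chunks into its own buffer and kills the process if the cap is exceeded.

```python
import subprocess
import sys
import threading
from io import BytesIO

def drain(stream, buf, limit, proc):
    # One reader thread per pipe: copy chunks into a private buffer and
    # kill the process if the output cap is exceeded.
    total = 0
    for chunk in iter(lambda: stream.read(4096), b""):
        total += len(chunk)
        if total > limit:
            proc.kill()
            break
        buf.write(chunk)

def run_capped(cmd, limit=1024 * 1024):
    # Capture stdout and stderr separately, each capped at `limit` bytes.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out_buf, err_buf = BytesIO(), BytesIO()
    threads = [
        threading.Thread(target=drain, args=(proc.stdout, out_buf, limit, proc)),
        threading.Thread(target=drain, args=(proc.stderr, err_buf, limit, proc)),
    ]
    for t in threads:
        t.start()
    ret = proc.wait()
    for t in threads:
        t.join()
    return ret, out_buf.getvalue(), err_buf.getvalue()

ret, out, err = run_capped(
    [sys.executable, "-c", "import sys; print('o'); print('e', file=sys.stderr)"])
```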
As for your second missing feature, could you elaborate a bit more what you have in mind?
This problem is more complex than it seems; I had a hard time discovering solutions on the Web and understanding them all.
In fact, the complexity of the problem comes from the fact that there are several ways to solve it. I explored three ways (threading, multiprocessing and asyncio).
Finally, I chose to use a separate thread to listen to the current subprocess and capture the output of the program. It seems to me to be the simplest, the most portable and the most efficient way to proceed.
So, the basic idea behind this solution is to create a thread that will be listening to stdout and stderr and gather all the output. When you reach a limit, you just kill the process and return.
Here is a simplified version of my code:
from subprocess import Popen, PIPE, TimeoutExpired
from queue import Queue
from time import sleep
from threading import Thread

MAX_BUF = 35

def stream_reader(p, q, n):
    stdout_buf, stderr_buf = b'', b''
    while p.poll() is None:
        sleep(0.1)
        stdout_buf += p.stdout.read(n)
        stderr_buf += p.stderr.read(n)
        if (len(stdout_buf) > n) or (len(stderr_buf) > n):
            stdout_buf, stderr_buf = stdout_buf[:n], stderr_buf[:n]
            try:
                p.kill()
            except ProcessLookupError:
                pass
            break
    q.put((stdout_buf.decode('utf-8', errors="ignore"),
           stderr_buf.decode('utf-8', errors="ignore")))

# Main function
cmd = ['./example.sh']
proc = Popen(cmd, shell=False, stdin=PIPE, stdout=PIPE, stderr=PIPE)

q = Queue()
t_io = Thread(target=stream_reader, args=(proc, q, MAX_BUF,), daemon=True)
t_io.start()

# Running the process
try:
    proc.stdin.write(b'AAAAAAA')
    proc.stdin.close()
except IOError:
    pass

try:
    ret = proc.wait(timeout=20)
except TimeoutExpired:
    ret = -1  # Or whatever code you decide to give it.

t_io.join()
sout, serr = q.get()
print(ret, sout, serr)
You can attach whatever you want to the example.sh script that is run. Note that several pitfalls are avoided here to prevent deadlocks and broken code (I tested this script a bit). Still, I am not totally sure about it, so do not hesitate to mention obvious errors or improvements.

Checking to see if there is more data to read from a file descriptor using Python's select module

I have a program that creates a subprocess within a thread, so that the thread can be constantly checking for specific output conditions (from either stdout or stderr), and call the appropriate callbacks, while the rest of the program continues. Here is a pared-down version of that code:
import select
import subprocess
import threading

def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)
        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            # process the output
        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            # process the output
        if not ready and proc.poll() is not None:
            break

thread = threading.Thread(target = run_task)
thread.run()
It works reasonably well, but I would like the thread to exit once two conditions are met: the running child process has finished, and all of the data in stdout and stderr has been processed.
The difficulty I have is that if my last condition is as it is above (if not ready and proc.poll() is not None), then the thread never exits, because once stdout and stderr's file descriptors are marked as ready, they never become unready (even after all of the data has been read from them, and read() would hang or readline() would return an empty string).
If I change that condition to just if proc.poll() is not None, then the loop exits when the program exits, and I can't guarantee that it's seen all of the data that needs to be processed.
Is this just the wrong approach, or is there a way to reliably determine when you've read all of the data that will ever be written to a file descriptor? Or is this an issue specific to trying to read from the stderr/stdout of a subprocess?
I have been trying this on Python 2.5 (running on OS X) and also tried select.poll() and select.epoll()-based variants on Python 2.6 (running on Debian with a 2.6 kernel).
The select module is appropriate if you want to find out whether you can read from a pipe without blocking.
To make sure that you've read all data, use a simpler condition, if proc.poll() is not None: break, and call rest = [pipe.read() for pipe in [p.stdout, p.stderr]] after the loop.
It is unlikely that a subprocess closes its stdout/stderr before it shuts down, so you could skip the logic that handles EOF for simplicity.
Don't call Thread.run() directly; use Thread.start() instead. You probably don't need the separate thread here at all.
Don't call p.stdout.readline() after the select(); it may block. Use os.read(p.stdout.fileno(), limit) instead. An empty bytestring indicates EOF for the corresponding pipe.
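Putting those pieces together, here is a POSIX-only sketch of the suggested pattern (the helper name and the 4096 read size are mine): select() on the raw descriptors, os.read() for the data, and an empty read as the per-pipe EOF that cleanly ends the loop.

```python
import os
import select
import subprocess
import sys

def run_and_capture(cmd):
    # select() on the raw descriptors; os.read() will not block once
    # select() reported the fd ready, and an empty read is the pipe's EOF.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    bufs = {proc.stdout.fileno(): b"", proc.stderr.fileno(): b""}
    open_fds = set(bufs)
    while open_fds:
        ready, _, _ = select.select(open_fds, (), ())
        for fd in ready:
            chunk = os.read(fd, 4096)
            if chunk:
                bufs[fd] += chunk
            else:  # empty bytestring: EOF on this pipe, stop watching it
                open_fds.discard(fd)
    proc.wait()  # EOF on both pipes means the child is done (or nearly so)
    return bufs[proc.stdout.fileno()], bufs[proc.stderr.fileno()]

out, err = run_and_capture(
    [sys.executable, "-c", "import sys; sys.stdout.write('A'); sys.stderr.write('B')"])
```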
As an alternative, or in addition, you could make the pipes non-blocking using the fcntl module:
import os
from fcntl import fcntl, F_GETFL, F_SETFL
def make_nonblocking(fd):
return fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | os.O_NONBLOCK)
and handle io/os errors while reading.
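A small sketch of that fcntl approach, exercised on a plain os.pipe() instead of a subprocess (POSIX-only; the try_read helper is illustrative): once O_NONBLOCK is set, an empty pipe raises BlockingIOError instead of blocking, while an empty bytestring still means EOF.

```python
import fcntl
import os

def make_nonblocking(fd):
    # Same idea as above: OR O_NONBLOCK into the descriptor's status flags.
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def try_read(fd, limit=4096):
    # Returns None when no data is available yet, b"" at EOF, data otherwise.
    try:
        return os.read(fd, limit)
    except BlockingIOError:
        return None

# Demonstrate on a plain pipe.
r, w = os.pipe()
make_nonblocking(r)
empty = try_read(r)   # nothing written yet -> None instead of a hang
os.write(w, b"hello")
data = try_read(r)    # data available -> b"hello"
os.close(w)
eof = try_read(r)     # writer closed -> b"" (EOF)
os.close(r)
```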
My eventual solution, as I mentioned above, was the following, in case this is helpful to anyone. I think it is the right approach, since I'm now 97.2% sure you can't do this with just select()/poll() and read():
import select
import subprocess
import threading

def run_task():
    command = ['python', 'a-script-that-outputs-lines.py']
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    while True:
        ready, _, _ = select.select((proc.stdout, proc.stderr), (), (), .1)
        if proc.stdout in ready:
            next_line_to_process = proc.stdout.readline()
            if next_line_to_process:
                pass  # process the output
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stdout
                ready = filter(lambda x: x is not proc.stdout, ready)
        if proc.stderr in ready:
            next_line_to_process = proc.stderr.readline()
            if next_line_to_process:
                pass  # process the output
            elif proc.returncode is not None:
                # The program has exited, and we have read everything written to stderr
                ready = filter(lambda x: x is not proc.stderr, ready)
        if proc.poll() is not None and not ready:
            break

thread = threading.Thread(target = run_task)
thread.run()
You could do a raw os.read(fd, size) on the pipe's file descriptor instead of using readline(). If the descriptor has been made non-blocking (as with the fcntl approach above), this does not block, and it can also detect EOF (in that case it returns an empty byte string). You'd have to implement the line splitting and buffering yourself. Use something like this:
class NonblockingReader():
    def __init__(self, pipe):
        self.fd = pipe.fileno()
        self.buffer = b""

    def readlines(self):
        data = os.read(self.fd, 2048)
        if not data:
            return None
        self.buffer += data
        sep = os.linesep.encode()
        if sep in self.buffer:
            lines = self.buffer.split(sep)
            self.buffer = lines[-1]
            return lines[:-1]
        else:
            return []
