Continous output with asyncio subprocess

Continous output with asyncio subprocess - python

I'm using asyncio subprocess to execute a subcommand. I want to see the long-running process and save the content at the same time to a buffer for later use. Furthermore, I found this related question (Getting live output from asyncio subprocess), but it mainly centers around the use case for ssh.
The asyncio subprocess docs have an example for reading the output line-by-line, which goes into the direction of what I want to achieve. (https://docs.python.org/3/library/asyncio-subprocess.html#examples)
import asyncio
import sys
async def get_date():
code = 'import datetime; print(datetime.datetime.now())'
# Create the subprocess; redirect the standard output
# into a pipe.
proc = await asyncio.create_subprocess_exec(
sys.executable, '-c', code,
stdout=asyncio.subprocess.PIPE)
# Read one line of output.
data = await proc.stdout.readline()
line = data.decode('ascii').rstrip()
# Wait for the subprocess exit.
await proc.wait()
return line
date = asyncio.run(get_date())
print(f"Current date: {date}")
I adapted this example to the following:
async def subprocess_async(cmd, **kwargs):
cmd_list = shlex.split(cmd)
proc = await asyncio.create_subprocess_exec(
*cmd_list,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT, **kwargs)
full_log = ""
while True:
buf = await proc.stdout.readline()
if not buf:
break
full_log += buf.decode()
print(f' {buf.decode().rstrip()}')
await proc.wait()
res = subprocess.CompletedProcess(cmd, proc.returncode, stdout=full_log.encode(), stderr=b'')
return res
The issue here is, that the proc.returncode value sometimes becomes None. I guess, I have a misunderstanding, how proc.wait() works and when it is safe to stop reading the output. How do I achieve continuous output using asyncio subprocess?

Your code is working fine as-is for me. What command are you trying to run that is causing your issue?
Two things I can think of to help, are
Instead of calling .wait() afterwards, set the returncode as the loop condition to keep running.
Don't wait for full line returns in case the program is like ffmpeg where it will do some tricks to paste over itself in console and not actually send newline characters.
Example code:
import asyncio, shlex, subprocess, sys
async def subprocess_async(cmd, **kwargs):
cmd_list = shlex.split(cmd)
proc = await asyncio.create_subprocess_exec(
*cmd_list,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.STDOUT,
**kwargs)
full_log = b""
while proc.returncode is None:
buf = await proc.stdout.read(20)
if not buf:
break
full_log += buf
sys.stdout.write(buf.decode())
res = subprocess.CompletedProcess(cmd, proc.returncode, stdout=full_log, stderr=b'')
return res
if __name__ == '__main__':
asyncio.run(subprocess_async("ffprobe -i video.mp4"))

Related

Start subprocess shell that waits for new inputs

I'm trying to start a shell from Python that will wait for some inputs.
In real life, the other shell will have a very long start-up sequence and that's why I want to keep it open and not restart it.
I would like to pass arguments from the main process to this shell and wait for the end of execution (an output file is generated). Then, based on the results, new arguments will be passed to the shell to run the next computation.
I tried to do so with asyncio but I cannot get it to work.
import asyncio
import os
def get_cmd_to_run_in_shell(filename: str) -> str:
return f"echo Hello World > {filename}.txt"
async def start_shell_that_waits_for_inputs():
print("Long start-up sequence of shell")
proc = await asyncio.create_subprocess_shell(
cmd="CMD /K",
stdin=asyncio.subprocess.PIPE,
shell=True
)
return proc
def parse_results():
"""Dummy code to generate the next input based on the previous computation"""
obj = os.scandir("./")
entries = list(obj)
last_index = entries[-1].name[-6]
return f"file_{last_index}"
async def main_process():
proc = await start_shell_that_waits_for_inputs()
await proc.communicate()
filename_to_generate = "file_0"
for _ in range(10):
cmd = get_cmd_to_run_in_shell(filename_to_generate)
proc.stdin.write(cmd)
proc.stdin.flush()
filename_to_generate = parse_results()
await proc.wait()
if __name__ == "__main__":
asyncio.run(main_process())

Is it possible to read multiple asyncio Streams concurrently?

I need to read the output of several asyncio tasks running concurrently.
These tasks are actually created using asyncio.create_subprocess_exec().
In the simplest form I would need to print stdout/stderr of a single process while accumulating lines in separate strings.
My current (working) code is:
async def run_command(*args, stdin=None, can_fail=False, echo=False):
"""
Run command asynchronously in subprocess.
Waits for command completion and returns return code, stdout and stdin
Example from:
http://asyncio.readthedocs.io/en/latest/subprocess.html
"""
# Create subprocess
try:
process = await asyncio.create_subprocess_exec(
*args,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
except (FileNotFoundError, OSError):
if not can_fail:
log.error("run_command(%s): Error FileNotFound", args)
return -1, '', 'File "%s" NotFound' % args[0]
# Status
log.debug("run_command(%s): pid=%s", args, process.pid)
# Wait for the subprocess to finish
stdout, stderr = await process.communicate(stdin)
# Progress
if process.returncode == 0:
log.debug("run_command(%s): ok: %s", process.pid, stdout.decode().strip())
else:
log.debug("run_command(%s): ko: %s", process.pid, stderr.decode().strip())
# Result
result = process.returncode, stdout.decode().strip(), stderr.decode().strip()
# Return stdout
return result
Problem with this code is I see nothing till process terminates; some of the spawned processes may take several minutes to complete and would print "interesting" info while executing. How can I print (or log) output as soon as it happens while capturing? (I am aware that omitting capture the underlying process would print, but I also need the capture)
I tried to do something along the lines:
_stdout = ''
while True:
data = process.stdout.readline()
if not data:
break
print(data)
_stdout += data.decode()
but I have no idea how to extend this to multiple streams (in this case just stdout/stderr, but potentially expanding to multiple programs). Is there something akin to select() call?
Any hint welcome

Is there something akin to select() call?
The answer to this must be yes, as asyncio is wholly built around a call to select(). However it's not always obvious how to translate that to a select on the level of streams. The thing to notice is that you shouldn't try to select the stream exactly - instead, start reading on the stream and rely on the ability to select the progress of the coroutines. The equivalent of select() would thus be to use asyncio.wait(return_when=FIRST_COMPLETED) to drive the reads in a loop.
An even more elegant alternative is to spawn separate tasks where each does its thing, and just let them run in parallel. The code is easier to understand than with a select, boiling down to a single call to gather, and yet under the hood asyncio performs exactly the kind of select() that was requested:
import asyncio, sys, io
async def _read_all(stream, echo):
# helper function to read the whole stream, optionally
# displaying data as it arrives
buf = io.BytesIO() # BytesIO is preferred to +=
while True:
chunk = await stream.read(4096)
if len(chunk) == 0:
break
buf.write(chunk)
if echo:
sys.stdout.buffer.write(chunk)
return buf.getvalue()
async def run_command(*args, stdin=None, echo=False):
process = await asyncio.create_subprocess_exec(
*args,
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE
)
if stdin is not None:
process.stdin.write(stdin)
process.stdin.close()
stdout, stderr = await asyncio.gather(
_read_all(process.stdout, echo),
_read_all(process.stderr, echo)
)
return process.returncode, stdout.decode().strip(), stderr.decode().strip()
Note that asyncio's write() is not a coroutine, it defaults to writing in the background, so we don't need to include the write among the coroutines we gather().

call xtail by tornado.proces.Subprocess

How can I call xtail by tornado.proces.Subprocess?
import subprocess
from tornado.ioloop import IOLoop
from tornado import gen
from tornado import process
class Reader(object):
def __init__(self, xwatch_path, max_idle=600, ioloop=None):
self.xwatch_path = xwatch_path
self.ioloop = ioloop
self.max_idle = max_idle
#gen.coroutine
def call_subprocess(self, cmd, stdin_data=None, stdin_async=False):
stdin = STREAM if stdin_async else subprocess.PIPE
sub_process = process.Subprocess(
cmd, stdin=stdin, stdout=STREAM, stderr=STREAM, io_loop=self.ioloop
)
if stdin_data:
if stdin_async:
yield gen.Task(sub_process.stdin.write, stdin_data)
else:
sub_process.stdin.write(stdin_data)
if stdin_async or stdin_data:
sub_process.stdin.close()
result, error = yield [
gen.Task(sub_process.stdout.read_until, '\n'),
gen.Task(sub_process.stderr.read_until, '\n')
]
print result
raise gen.Return((result, error))
#gen.coroutine
def popen(self):
while True:
result, error = yield self.call_subprocess(['xtail', self.xwatch_path])
print result, error
def read_log(ioloop):
access_reader = AccessLogReader(
'/home/vagrant/logs')
ioloop.add_callback(access_reader.popen)
def main():
ioloop = IOLoop.instance()
read_log(ioloop)
ioloop.start()
if __name__ == '__main__':
main()
I would like to collect a few of the log changes in the log folder, ready to use xtail multiple folders to collect logs, and then I develop the environment for debugging.
I use Vim to modify the ~/log/123.txt file, but I can't see the output.

The statement
result, error = yield [
gen.Task(sub_process.stdout.read_until, '\n'),
gen.Task(sub_process.stderr.read_until, '\n')
]
reads one line of the process's standard output and one line of standard error, and blocks until it has read both lines. If xtail only writes to one of the two streams, this will never complete.
You probably want to read in a loop (note that gen.Task is not necessary):
#gen.coroutine
def read_from_stream(stream):
try:
while True:
line = yield stream.read_until('\n')
print(line)
except StreamClosedError:
return
If you care about the difference between stdout and stderr, read from them separately. This will print lines from each stream as they arrive, and stop when both streams are closed:
yield [read_from_stream(sub_process.stdout), read_from_stream(sub_process.stderr)]
If you don't, merge them by passing stdout=STREAM, stderr=subprocess.STDOUT when creating the subprocess, and only read from sub_process.stdout.

Python Asyncio subprocess never finishes

i have a simple python program that I'm using to test asyncio with subprocesses:
import sys, time
for x in range(100):
print("processing (%s/100) " % x)
sys.stdout.flush()
print("enjoy")
sys.stdout.flush()
Running this on the command line produces the desired results.
However, when called from asyncio, it never finishes
process = yield from asyncio.create_subprocess_exec(
*["python", "program.py"],
stdout=async_subprocess.PIPE,
stderr=async_subprocess.STDOUT,
cwd=working_dir
)
# this never finishes
yield from process.communicate()
ps ax shows this process is <defunct>, not sure what that means

I suspect your issue is just related to how you're calling asyncio.create_subprocess_exec and process.communiate(). This complete example works fine for me:
import asyncio
from asyncio import subprocess
#asyncio.coroutine
def do_work():
process = yield from asyncio.create_subprocess_exec(
*["python", "program.py"],
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT
)
stdout, _= yield from process.communicate()
print(stdout)
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(do_work())
You have to place code that uses yield from inside of a asyncio.coroutine, and then call it inside an event loop (using loop.run_until_complete), for it to behave the way you want it to.

How to replicate tee behavior in Python when using subprocess?

I'm looking for a Python solution that will allow me to save the output of a command in a file without hiding it from the console.
FYI: I'm asking about tee (as the Unix command line utility) and not the function with the same name from Python intertools module.
Details
Python solution (not calling tee, it is not available under Windows)
I do not need to provide any input to stdin for called process
I have no control over the called program. All I know is that it will output something to stdout and stderr and return with an exit code.
To work when calling external programs (subprocess)
To work for both stderr and stdout
Being able to differentiate between stdout and stderr because I may want to display only one of the to the console or I could try to output stderr using a different color - this means that stderr = subprocess.STDOUT will not work.
Live output (progressive) - the process can run for a long time, and I'm not able to wait for it to finish.
Python 3 compatible code (important)
References
Here are some incomplete solutions I found so far:
http://devlishgenius.blogspot.com/2008/10/logging-in-real-time-in-python.html (mkfifo works only on Unix)
http://blog.kagesenshi.org/2008/02/teeing-python-subprocesspopen-output.html (doesn't work at all)
Diagram http://blog.i18n.ro/wp-content/uploads/2010/06/Drawing_tee_py.png
Current code (second try)
#!/usr/bin/python
from __future__ import print_function
import sys, os, time, subprocess, io, threading
cmd = "python -E test_output.py"
from threading import Thread
class StreamThread ( Thread ):
def __init__(self, buffer):
Thread.__init__(self)
self.buffer = buffer
def run ( self ):
while 1:
line = self.buffer.readline()
print(line,end="")
sys.stdout.flush()
if line == '':
break
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutThread = StreamThread(io.TextIOWrapper(proc.stdout))
stderrThread = StreamThread(io.TextIOWrapper(proc.stderr))
stdoutThread.start()
stderrThread.start()
proc.communicate()
stdoutThread.join()
stderrThread.join()
print("--done--")
#### test_output.py ####
#!/usr/bin/python
from __future__ import print_function
import sys, os, time
for i in range(0, 10):
if i%2:
print("stderr %s" % i, file=sys.stderr)
else:
print("stdout %s" % i, file=sys.stdout)
time.sleep(0.1)
Real output
stderr 1
stdout 0
stderr 3
stdout 2
stderr 5
stdout 4
stderr 7
stdout 6
stderr 9
stdout 8
--done--
Expected output was to have the lines ordered. Remark, modifying the Popen to use only one PIPE is not allowed because in the real life I will want to do different things with stderr and stdout.
Also even in the second case I was not able to obtain real-time like out, in fact all the results were received when the process finished. By default, Popen should use no buffers (bufsize=0).

I see that this is a rather old post but just in case someone is still searching for a way to do this:
proc = subprocess.Popen(["ping", "localhost"],
stdout=subprocess.PIPE,
stderr=subprocess.PIPE)
with open("logfile.txt", "w") as log_file:
while proc.poll() is None:
line = proc.stderr.readline()
if line:
print "err: " + line.strip()
log_file.write(line)
line = proc.stdout.readline()
if line:
print "out: " + line.strip()
log_file.write(line)

If requiring python 3.6 isn't an issue there is now a way of doing this using asyncio. This method allows you to capture stdout and stderr separately but still have both stream to the tty without using threads. Here's a rough outline:
class RunOutput:
def __init__(self, returncode, stdout, stderr):
self.returncode = returncode
self.stdout = stdout
self.stderr = stderr
async def _read_stream(stream, callback):
while True:
line = await stream.readline()
if line:
callback(line)
else:
break
async def _stream_subprocess(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
if isWindows():
platform_settings = {"env": os.environ}
else:
platform_settings = {"executable": "/bin/bash"}
if echo:
print(cmd)
p = await asyncio.create_subprocess_shell(
cmd,
stdin=stdin,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
**platform_settings
)
out = []
err = []
def tee(line, sink, pipe, label=""):
line = line.decode("utf-8").rstrip()
sink.append(line)
if not quiet:
print(label, line, file=pipe)
await asyncio.wait(
[
_read_stream(p.stdout, lambda l: tee(l, out, sys.stdout)),
_read_stream(p.stderr, lambda l: tee(l, err, sys.stderr, label="ERR:")),
]
)
return RunOutput(await p.wait(), out, err)
def run(cmd, stdin=None, quiet=False, echo=False) -> RunOutput:
loop = asyncio.get_event_loop()
result = loop.run_until_complete(
_stream_subprocess(cmd, stdin=stdin, quiet=quiet, echo=echo)
)
return result
The code above was based on this blog post: https://kevinmccarthy.org/2016/07/25/streaming-subprocess-stdin-and-stdout-with-asyncio-in-python/

This is a straightforward port of tee(1) to Python.
import sys
sinks = sys.argv[1:]
sinks = [open(sink, "w") for sink in sinks]
sinks.append(sys.stderr)
while True:
input = sys.stdin.read(1024)
if input:
for sink in sinks:
sink.write(input)
else:
break
I'm running on Linux right now but this ought to work on most platforms.
Now for the subprocess part, I don't know how you want to 'wire' the subprocess's stdin, stdout and stderr to your stdin, stdout, stderr and file sinks, but I know you can do this:
import subprocess
callee = subprocess.Popen(
["python", "-i"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
Now you can access callee.stdin, callee.stdout and callee.stderr like normal files, enabling the above "solution" to work. If you want to get the callee.returncode, you'll need to make an extra call to callee.poll().
Be careful with writing to callee.stdin: if the process has exited when you do that, an error may be rised (on Linux, I get IOError: [Errno 32] Broken pipe).

This is how it can be done
import sys
from subprocess import Popen, PIPE
with open('log.log', 'w') as log:
proc = Popen(["ping", "google.com"], stdout=PIPE, encoding='utf-8')
while proc.poll() is None:
text = proc.stdout.readline()
log.write(text)
sys.stdout.write(text)

If you don't want to interact with the process you can use the subprocess module just fine.
Example:
tester.py
import os
import sys
for file in os.listdir('.'):
print file
sys.stderr.write("Oh noes, a shrubbery!")
sys.stderr.flush()
sys.stderr.close()
testing.py
import subprocess
p = subprocess.Popen(['python', 'tester.py'], stdout=subprocess.PIPE,
stdin=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
print stdout, stderr
In your situation you can simply write stdout/stderr to a file first. You can send arguments to your process with communicate as well, though I wasn't able to figure out how to continually interact with the subprocess.

On Linux, if you really need something like the tee(2) syscall, you can get it like this:
import os
import ctypes
ld = ctypes.CDLL(None, use_errno=True)
SPLICE_F_NONBLOCK = 0x02
def tee(fd_in, fd_out, length, flags=SPLICE_F_NONBLOCK):
result = ld.tee(
ctypes.c_int(fd_in),
ctypes.c_int(fd_out),
ctypes.c_size_t(length),
ctypes.c_uint(flags),
)
if result == -1:
errno = ctypes.get_errno()
raise OSError(errno, os.strerror(errno))
return result
To use this, you probably want to use Python 3.10 and something with os.splice (or use ctypes in the same way to get splice). See the tee(2) man page for an example.

My solution isn't elegant, but it works.
You can use powershell to gain access to "tee" under WinOS.
import subprocess
import sys
cmd = ['powershell', 'ping', 'google.com', '|', 'tee', '-a', 'log.txt']
if 'darwin' in sys.platform:
cmd.remove('powershell')
p = subprocess.Popen(cmd)
p.wait()

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Continous output with asyncio subprocess - python

Related

Start subprocess shell that waits for new inputs

Is it possible to read multiple asyncio Streams concurrently?

call xtail by tornado.proces.Subprocess

Python Asyncio subprocess never finishes

How to replicate tee behavior in Python when using subprocess?

Categories

Resources