chaining line by line writing/reading of pipes in Python with subprocess


I have the following code which appears to work, for chaining pipes together in python with subprocess while reading / writing to them line by line (without using communicate() upfront). The code just calls a Unix command (mycmd), reads its output, then writes that to the stdin of another Unix command (next_cmd), and redirects the output of that last command to a file.
import subprocess
import sys

# some unix command that uses a pipe: command "a"
# writes to stdout and "b" reads it and writes to stdout
mycmd = "a | b"
mycmd_proc = subprocess.Popen(mycmd, shell=True,
                              stdin=sys.stdin,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE)
# nextCmd reads from stdin, and I'm passing it mycmd's output
next_cmd = "nextCmd -stdin"
output_file = open(output_filename, "w")
next_proc = subprocess.Popen(next_cmd, shell=True,
                             stdin=subprocess.PIPE,
                             stdout=output_file)
for line in iter(mycmd_proc.stdout.readline, ''):
    # do something with line
    # ...
    # write it to next command
    next_proc.stdin.write(line)
### If I wanted to call another command here that passes next_proc output
### line by line to another command, would I need
### to call next_proc.communicate() first?
next_proc.communicate()
output_file.close()
This appears to work, and it only calls communicate() at the end of the command.
I'm trying to extend this code to add another command so you can do:
mycmd1 | mycmd2 | mycmd3 > some_file
meaning: line by line, read the output of mycmd1 from Python, process the line, feed it to mycmd2, read mycmd2's output, process it line by line, and feed it to mycmd3, which in turn puts its output in some_file. Is this possible, or is it bound to end in deadlock/blocking/unflushed buffers? Note that I'm not just calling three unix commands as a pipe, since I want to intervene with Python in between and post-process each command's output line by line before feeding it to the next command.
I want to avoid calling communicate and loading all the output into memory; instead I want to parse it line by line. Thanks.

This should handle an arbitrary number of commands:
import sys
import subprocess

def processFirst(out):
    return out

def processSecond(out):
    return out

def processThird(out):
    return out

commands = [("a|b", processFirst), ("nextCmd -stdin", processSecond), ("thirdCmd", processThird)]
previous_output = None
for cmd, process_func in commands:
    if previous_output is None:
        stdin = sys.stdin
    else:
        stdin = subprocess.PIPE
    proc = subprocess.Popen(cmd, shell=True,
                            stdin=stdin,
                            stdout=subprocess.PIPE)
    # communicate() writes the previous output (if any) to stdin, closes it,
    # and reads everything the command prints, so neither side can block
    out, err = proc.communicate(previous_output)
    out = process_func(out)
    previous_output = out
Just add any command you want to run to the list of commands along with the function that should process its output. The output from the last command will end up being in previous_output at the end of the loop.
To avoid any deadlocking/buffering/etc. issues, you simply run each command to completion using proc.communicate(), which returns the output (instead of reading it directly as in your example). You then feed that into the next command before letting it run to completion, and so on.
Edit: Just noticed that you don't want to use communicate() upfront and that you want to react line by line. I will edit my answer in a bit to address that
This answer provides an example on how to read line-by-line from a pipe without blocking using select.select().
Below is an example that uses it for your particular case:
import sys
import subprocess
import select
import os

class LineReader(object):
    def __init__(self, fd, process_func):
        self._fd = fd
        self._buf = ''
        self._process_func = process_func
        self.next_proc = None

    def fileno(self):
        return self._fd

    def readlines(self):
        data = os.read(self._fd, 4096)
        if not data:
            # EOF
            if self.next_proc is not None:
                self.next_proc.stdin.close()
            return None
        self._buf += data
        if '\n' not in data:
            return []
        tmp = self._buf.split('\n')
        tmp_lines, self._buf = tmp[:-1], tmp[-1]
        lines = []
        for line in tmp_lines:
            lines.append(self._process_func(line))
            if self.next_proc is not None:
                self.next_proc.stdin.write("%s\n" % lines[-1])
        return lines

def processFirst(line):
    return line

def processSecond(line):
    return line

def processThird(line):
    return line

commands = [("a|b", processFirst), ("nextCmd -stdin", processSecond), ("thirdCmd", processThird)]
readers = []
previous_reader = None
for cmd, process_func in commands:
    if previous_reader is None:
        stdin = sys.stdin
    else:
        stdin = subprocess.PIPE
    proc = subprocess.Popen(cmd, shell=True,
                            stdin=stdin,
                            stdout=subprocess.PIPE)
    if previous_reader is not None:
        previous_reader.next_proc = proc
    previous_reader = LineReader(proc.stdout.fileno(), process_func)
    readers.append(previous_reader)

while readers:
    ready, _, _ = select.select(readers, [], [], 10.0)
    for stream in ready:
        lines = stream.readlines()
        if lines is None:
            readers.remove(stream)
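For readers on Python 3, a thread-per-hop layout is another way to keep every link of the chain line-oriented without a hand-written select loop. The following is only a sketch under stated assumptions, not the method used above: `pump`, `run_chain`, and the stand-in child commands `UPPER` and `REVERSE` (small Python one-liners used in place of `a|b`, `nextCmd` and `thirdCmd`) are hypothetical names introduced here for illustration.

```python
import subprocess
import sys
import threading

# Hypothetical stand-in commands; each reads stdin line by line.
UPPER = [sys.executable, "-c",
         "import sys\nfor l in sys.stdin:\n    sys.stdout.write(l.upper())"]
REVERSE = [sys.executable, "-c",
           "import sys\nfor l in sys.stdin:\n"
           "    sys.stdout.write(l.rstrip()[::-1] + '\\n')"]


def pump(lines, dst, transform):
    """Write transform(line) to dst for each line, then close dst (child sees EOF)."""
    try:
        for line in lines:
            dst.write(transform(line))
            dst.flush()  # hand each line over immediately
    finally:
        dst.close()


def run_chain(stages, input_lines):
    """Run a list of (argv, transform) stages; transform post-processes that
    stage's output lines. Returns the transformed output of the last stage."""
    procs = [subprocess.Popen(argv, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE,
                              universal_newlines=True)
             for argv, _ in stages]
    # One thread per hop: input -> proc 0, proc 0 -> proc 1, and so on.
    sources = [input_lines] + [p.stdout for p in procs[:-1]]
    transforms = [lambda line: line] + [fn for _, fn in stages[:-1]]
    threads = [threading.Thread(target=pump, args=(src, p.stdin, fn))
               for src, p, fn in zip(sources, procs, transforms)]
    for t in threads:
        t.start()
    # The main thread drains the last command, so no pipe fills up unread.
    result = [stages[-1][1](line) for line in procs[-1].stdout]
    for t in threads:
        t.join()
    for p in procs:
        p.stdout.close()
        p.wait()
    return result
```

Because every pipe has a dedicated reader, a slow or chatty command in the middle cannot stall the others. Error handling (for example a child that exits early and breaks the pipe) is deliberately left out of the sketch.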

Related

Read both stdout and stderr concurrently, in realtime, line by line [duplicate]

I have a python subprocess that I'm trying to read output and error streams from. Currently I have it working, but I'm only able to read from stderr after I've finished reading from stdout. Here's what it looks like:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout_iterator = iter(process.stdout.readline, b"")
stderr_iterator = iter(process.stderr.readline, b"")
for line in stdout_iterator:
    # Do stuff with line
    print line
for line in stderr_iterator:
    # Do stuff with line
    print line
As you can see, the stderr for loop can't start until the stdout loop completes. How can I modify this to be able to read from both in the correct order the lines come in?
To clarify: I still need to be able to tell whether a line came from stdout or stderr because they will be treated differently in my code.

The code in your question may deadlock if the child process produces enough output on stderr (~100KB on my Linux machine).
There is a communicate() method that lets you read from both stdout and stderr separately:
from subprocess import Popen, PIPE
process = Popen(command, stdout=PIPE, stderr=PIPE)
output, err = process.communicate()
If you need to read the streams while the child process is still running then the portable solution is to use threads (not tested):
from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue  # Python 2

def reader(pipe, queue):
    try:
        with pipe:
            for line in iter(pipe.readline, b''):
                queue.put((pipe, line))
    finally:
        queue.put(None)

process = Popen(command, stdout=PIPE, stderr=PIPE, bufsize=1)
q = Queue()
Thread(target=reader, args=[process.stdout, q]).start()
Thread(target=reader, args=[process.stderr, q]).start()
for _ in range(2):
    for source, line in iter(q.get, None):
        print "%s: %s" % (source, line),
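The same thread-and-queue idea can be sketched for Python 3, where the module is named `queue` and the pipes can be opened in text mode. This is an illustrative variant, not the answer's tested code: `tagged_lines` is a hypothetical helper name, and the string tags "stdout" and "stderr" are an assumption made here so that the consumer can tell the two sources apart.

```python
import subprocess
import sys
from queue import Queue
from threading import Thread


def reader(name, pipe, q):
    """Push (name, line) for every line of pipe, then a None end marker."""
    try:
        with pipe:
            for line in pipe:
                q.put((name, line))
    finally:
        q.put(None)


def tagged_lines(argv):
    """Yield ('stdout' or 'stderr', line) pairs while the child is running."""
    process = subprocess.Popen(argv, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE,
                               universal_newlines=True)
    q = Queue()
    for name, pipe in (("stdout", process.stdout), ("stderr", process.stderr)):
        Thread(target=reader, args=(name, pipe, q), daemon=True).start()
    for _ in range(2):  # one None marker arrives per reader thread
        for item in iter(q.get, None):
            yield item
    process.wait()
```

A caller would loop with `for source, line in tagged_lines(command):` and branch on `source`. The relative order of lines from the two streams is still only approximate, since each thread enqueues independently.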
See:
Python: read streaming input from subprocess.communicate()
Non-blocking read on a subprocess.PIPE in python
Python subprocess get children's output to file and terminal?

Here's a solution based on selectors, but one that preserves order, and streams variable-length characters (even single chars).
The trick is to use read1(), instead of read().
import selectors
import subprocess
import sys
p = subprocess.Popen(
    ["python", "random_out.py"], stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)
while True:
    for key, _ in sel.select():
        data = key.fileobj.read1().decode()
        if not data:
            exit()
        if key.fileobj is p.stdout:
            print(data, end="")
        else:
            print(data, end="", file=sys.stderr)
If you want a test program, use this.
import sys
from time import sleep
for i in range(10):
    print(f" x{i} ", file=sys.stderr, end="")
    sleep(0.1)
    print(f" y{i} ", end="")
    sleep(0.1)
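One caveat with a selector loop that exits on the first empty read is that the other pipe may still hold unread data at that moment. A possible variant, sketched here under the assumption of a POSIX system (selectors cannot wait on pipes on Windows), unregisters each pipe at its own EOF and stops only when none is left. `read_both` is a hypothetical helper name, and it collects the bytes instead of printing them so the result is easy to inspect.

```python
import selectors
import subprocess
import sys


def read_both(argv):
    """Collect stdout and stderr bytes from argv until both pipes reach EOF."""
    p = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    chunks = {"stdout": b"", "stderr": b""}
    with selectors.DefaultSelector() as sel:
        # The third argument is arbitrary user data, returned as key.data.
        sel.register(p.stdout, selectors.EVENT_READ, "stdout")
        sel.register(p.stderr, selectors.EVENT_READ, "stderr")
        while sel.get_map():  # true while at least one pipe is registered
            for key, _ in sel.select():
                data = key.fileobj.read1(4096)
                if not data:  # EOF on this pipe only
                    sel.unregister(key.fileobj)
                    key.fileobj.close()
                else:
                    chunks[key.data] += data
    p.wait()
    return chunks
```

Replacing the `chunks[key.data] += data` line with a print to the matching stream gives the streaming behaviour of the example above.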

The order in which a process writes data to different pipes is lost after write.
There is no way you can tell if stdout has been written before stderr.
You can try to read data simultaneously from multiple file descriptors in a non-blocking way
as soon as data is available, but this would only minimize the probability that the order is incorrect.
This program should demonstrate this:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import select
import subprocess
testapps={
'slow': '''
import os
import time
os.write(1, 'aaa')
time.sleep(0.01)
os.write(2, 'bbb')
time.sleep(0.01)
os.write(1, 'ccc')
''',
'fast': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbb')
os.write(1, 'ccc')
''',
'fast2': '''
import os
os.write(1, 'aaa')
os.write(2, 'bbbbbbbbbbbbbbb')
os.write(1, 'ccc')
'''
}

def readfds(fds, maxread):
    while True:
        fdsin, _, _ = select.select(fds,[],[])
        for fd in fdsin:
            s = os.read(fd, maxread)
            if len(s) == 0:
                fds.remove(fd)
                continue
            yield fd, s
        if fds == []:
            break

def readfromapp(app, rounds=10, maxread=1024):
    f=open('testapp.py', 'w')
    f.write(testapps[app])
    f.close()

    results={}
    for i in range(0, rounds):
        p = subprocess.Popen(['python', 'testapp.py'], stdout=subprocess.PIPE
                                                     , stderr=subprocess.PIPE)
        data=''
        for (fd, s) in readfds([p.stdout.fileno(), p.stderr.fileno()], maxread):
            data = data + s
        results[data] = results[data] + 1 if data in results else 1

    print 'running %i rounds %s with maxread=%i' % (rounds, app, maxread)
    results = sorted(results.items(), key=lambda (k,v): k, reverse=False)
    for data, count in results:
        print '%03i x %s' % (count, data)

print
print "=> if output is produced slowly this should work as whished"
print " and should return: aaabbbccc"
readfromapp('slow', rounds=100, maxread=1024)
print
print "=> now mostly aaacccbbb is returnd, not as it should be"
readfromapp('fast', rounds=100, maxread=1024)
print
print "=> you could try to read data one by one, and return"
print " e.g. a whole line only when LF is read"
print " (b's should be finished before c's)"
readfromapp('fast', rounds=100, maxread=1)
print
print "=> but even this won't work ..."
readfromapp('fast2', rounds=100, maxread=1)
and outputs something like this:
=> if output is produced slowly this should work as whished
and should return: aaabbbccc
running 100 rounds slow with maxread=1024
100 x aaabbbccc
=> now mostly aaacccbbb is returnd, not as it should be
running 100 rounds fast with maxread=1024
006 x aaabbbccc
094 x aaacccbbb
=> you could try to read data one by one, and return
e.g. a whole line only when LF is read
(b's should be finished before c's)
running 100 rounds fast with maxread=1
003 x aaabbbccc
003 x aababcbcc
094 x abababccc
=> but even this won't work ...
running 100 rounds fast2 with maxread=1
003 x aaabbbbbbbbbbbbbbbccc
001 x aaacbcbcbbbbbbbbbbbbb
008 x aababcbcbcbbbbbbbbbbb
088 x abababcbcbcbbbbbbbbbb

This works for Python3 (3.6):
import selectors
import subprocess
import sys

p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, universal_newlines=True)
# Read both stdout and stderr simultaneously
sel = selectors.DefaultSelector()
sel.register(p.stdout, selectors.EVENT_READ)
sel.register(p.stderr, selectors.EVENT_READ)
ok = True
while ok:
    for key, val1 in sel.select():
        line = key.fileobj.readline()
        if not line:
            ok = False
            break
        if key.fileobj is p.stdout:
            print(f"STDOUT: {line}", end="")
        else:
            print(f"STDERR: {line}", end="", file=sys.stderr)

from https://docs.python.org/3/library/subprocess.html#using-the-subprocess-module
If you wish to capture and combine both streams into one, use
stdout=PIPE and stderr=STDOUT instead of capture_output.
so the easiest solution would be:
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
stdout_iterator = iter(process.stdout.readline, b"")
for line in stdout_iterator:
    # Do stuff with line
    print line
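For Python 3, the merged-stream approach might look like the sketch below. `merged_lines` is a hypothetical helper name chosen here. Note the trade-off: once stderr is redirected into stdout, the reader can no longer tell which stream a line came from, so this only fits when that distinction is not needed.

```python
import subprocess
import sys


def merged_lines(argv):
    """Yield lines from a single pipe that carries both stdout and stderr."""
    with subprocess.Popen(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT,
                          universal_newlines=True) as process:
        for line in process.stdout:
            yield line
```

Since both streams share one pipe, lines arrive in the order the child actually wrote them, subject to the child's own buffering of stdout.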

I know this question is very old, but this answer may help others who stumble upon this page in researching a solution for a similar situation, so I'm posting it anyway.
I've built a simple python snippet that will merge any number of pipes into a single one. Of course, as stated above, the order cannot be guaranteed, but this is as close as I think you can get in Python.
It spawns a thread for each of the pipes, reads them line by line and puts them into a Queue (which is FIFO). The main thread loops through the queue, yielding each line.
import threading, queue

def merge_pipes(**named_pipes):
    r'''
    Merges multiple pipes from subprocess.Popen (maybe other sources as well).
    The keyword argument keys will be used in the output to identify the source
    of the line.

    Example:
    p = subprocess.Popen(['some', 'call'],
                         stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE,
                         stderr=subprocess.PIPE)
    outputs = {'out': log.info, 'err': log.warn}
    for name, line in merge_pipes(out=p.stdout, err=p.stderr):
        outputs[name](line)

    This will output stdout to the info logger, and stderr to the warning logger
    '''

    # Constants. Could also be placed outside of the method. I just put them here
    # so the method is fully self-contained
    PIPE_OPENED=1
    PIPE_OUTPUT=2
    PIPE_CLOSED=3

    # Create a queue where the pipes will be read into
    output = queue.Queue()

    # This method is the run body for the threads that are instantiated below
    # This could be easily rewritten to be outside of the merge_pipes method,
    # but to make it fully self-contained I put it here
    def pipe_reader(name, pipe):
        r"""
        reads a single pipe into the queue
        """
        output.put( ( PIPE_OPENED, name, ) )
        try:
            # readline() returns '' (text mode) or b'' (binary mode) at EOF;
            # both are falsy, so this works for either kind of pipe
            while True:
                line = pipe.readline()
                if not line:
                    break
                output.put( ( PIPE_OUTPUT, name, line.rstrip(), ) )
        finally:
            output.put( ( PIPE_CLOSED, name, ) )

    # Start a reader for each pipe
    for name, pipe in named_pipes.items():
        t=threading.Thread(target=pipe_reader, args=(name, pipe, ))
        t.daemon = True
        t.start()

    # Use a counter to determine how many pipes are left open.
    # If all are closed, we can return
    pipe_count = 0

    # Read the queue in order, blocking if there's no data
    for data in iter(output.get,''):
        code=data[0]
        if code == PIPE_OPENED:
            pipe_count += 1
        elif code == PIPE_CLOSED:
            pipe_count -= 1
        elif code == PIPE_OUTPUT:
            yield data[1:]
        if pipe_count == 0:
            return

This works for me (on windows):
https://github.com/waszil/subpiper
from subpiper import subpiper
def my_stdout_callback(line: str):
    print(f'STDOUT: {line}')

def my_stderr_callback(line: str):
    print(f'STDERR: {line}')

my_additional_path_list = [r'c:\important_location']
retcode = subpiper(cmd='echo magic',
                   stdout_callback=my_stdout_callback,
                   stderr_callback=my_stderr_callback,
                   add_path_list=my_additional_path_list)

rsync called by subprocess Popen works when running script but does not when I generate an app with py2app

This is my code:
def uploadByRSync(host, user, passwd, src, dst, statusManager):
    try:
        os.environ["RSYNC_PASSWORD"] = passwd
        print host, user, passwd, src, dst
        parameters = ["rsync", "-azP", "--partial", src ,"{3}#{0}::{2}/{1}".format(host, dst, user, user)]
        print " ".join(parameters)
        process = subprocess.Popen(parameters, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)
        for line in unbuffered(process):
            if "%" in line:
                spl = line.split()
                statusManager.uploadSpeed = spl[2]
                statusManager.uploaded = spl[1]
        return not process.wait()
    except Exception as ex:
        print ex
        return False

newlines = ['\n', '\r\n', '\r']
def unbuffered(proc, stream='stdout'):
    stream = getattr(proc, stream)
    with contextlib.closing(stream):
        while True:
            out = []
            last = stream.read(1)
            # Don't loop forever
            if last == '' and proc.poll() is not None:
                break
            while last not in newlines:
                # Don't loop forever
                if last == '' and proc.poll() is not None:
                    break
                out.append(last)
                last = stream.read(1)
            out = ''.join(out)
            print out
            yield out
When running the py2app version I never get any output. When running as a script everything works just fine. PS: this code runs on a separate thread of a Qt app. Does anyone have any idea why this is happening?

Most likely you have a stream buffering issue. Here is how you can output all lines of your process in real time:
import subprocess
import select
p = subprocess.Popen(parameters,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     bufsize=0)
poll = [p.stdout.fileno(), p.stderr.fileno()]
while True:
    # check if process is still running and read remaining data
    if p.poll() is not None:
        for l in p.stdout.readlines():
            print(l)
        for l in p.stderr.readlines():
            print(l)
        break
    # blocks until data is received
    ret = select.select(poll, [], [])
    for fd in ret[0]:
        line = p.stdout.readline() if fd == p.stdout.fileno() else p.stderr.readline()
        print(line)

I just ran a test, replacing the Popen call with a simple 'ls', but I still cannot get the output when running the py2app version. It works just fine when running the Python script. When I kill the py2app version of the app, the output is printed.
process = subprocess.Popen(["ls"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, universal_newlines=True)

Print output of external command in realtime and have it in a string at the same time in python

For example:
#!/usr/bin/env python3
# cmd.py
import time
for i in range(10):
    print("Count %d" % i)
    time.sleep(1)

#!/usr/bin/env python3
import subprocess
# useCmd.py
p = subprocess.Popen(['./cmd.py'], stdout=subprocess.PIPE)
out, err = p.communicate()
out = out.decode()
print(out)
In useCmd.py I can print the output of cmd.py, but only after it has finished outputting. How can I print it in real time and still have it stored in a string? (Sort of like tee in bash.)

If you don't have to deal with stdin, you could avoid using communicate that is blocking, and read directly from the process stdout until your stdout ends:
p = subprocess.Popen(['python', 'cmd.py'], stdout=subprocess.PIPE)
# out, err = p.communicate()
while True:
    line = p.stdout.readline()
    # on Python 3 the pipe yields bytes, and b'' marks EOF
    if line != b'':
        print(line.decode(), end='')
    else:
        break
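To cover the second half of the question (keeping the text in a string as well), a tee-like helper could be sketched as follows. `run_and_tee` and its `sink` parameter are hypothetical names introduced here. One assumption matters in practice: the child must flush its own output promptly (for example by running it with `python3 -u` or using `print(..., flush=True)`), otherwise its lines only arrive when its buffer fills or it exits.

```python
import subprocess
import sys


def run_and_tee(argv, sink=sys.stdout):
    """Echo each output line of argv to sink as it arrives and also keep it.

    Returns (returncode, full_output_as_one_string).
    """
    captured = []
    with subprocess.Popen(argv, stdout=subprocess.PIPE,
                          universal_newlines=True) as p:
        for line in p.stdout:
            sink.write(line)
            sink.flush()
            captured.append(line)
    # Leaving the with-block waits for the child, so returncode is set here.
    return p.returncode, "".join(captured)
```

For the scripts in the question, a call such as `rc, out = run_and_tee([sys.executable, "-u", "cmd.py"])` would print each count as it is produced and leave the whole text in `out`.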

Python: Closing a for loop by reading stdout

import os
dictionaryfile = "/root/john.txt"
pgpencryptedfile = "helloworld.txt.gpg"
array = open(dictionaryfile).readlines()
for x in array:
    x = x.rstrip('\n')
    newstring = "echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile
    os.popen(newstring)
I need to create something inside the for loop that will read gpg's output. When gpg outputs this string gpg: WARNING: message was not integrity protected, I need the loop to close and print Success!
How can I do this, and what is the reasoning behind it?
Thanks Everyone!

import subprocess

def check_file(dictfile, pgpfile):
    # Command to run, constructed as a list to prevent shell-escaping accidents
    cmd = ["gpg", "--passphrase-fd", "0", pgpfile]
    # Launch process, with stdin/stdout wired up to `p.stdout` and `p.stdin`
    p = subprocess.Popen(cmd, stdin = subprocess.PIPE, stdout = subprocess.PIPE)
    # Read dictfile, and send contents to stdin
    passphrase = open(dictfile).read()
    p.stdin.write(passphrase)
    # Read stdout and check for message
    stdout, stderr = p.communicate()
    for line in stdout.splitlines():
        if line.strip() == "gpg: WARNING: message was not integrity protected":
            # Relevant line was found
            return True
    # Line not found
    return False
Then to use:
not_integrity_protected = check_file("/root/john.txt", "helloworld.txt.gpg")
if not_integrity_protected:
    print "Success!"
If the "gpg: WARNING:" message is actually on stderr (which I would suspect it is), change the subprocess.Popen line to this:
p = subprocess.Popen(cmd, stdin = subprocess.PIPE, stderr = subprocess.PIPE)
..and the for loop from stdout to stderr, like this:
for line in stderr.splitlines():
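The general pattern (run a command once per input, stop as soon as a marker shows up on stderr) can be sketched on Python 3 with `subprocess.run`. This is an illustration under assumptions, not a gpg recipe: `first_match` is a hypothetical helper name, each candidate is passed on the child's stdin followed by a newline, and whether a given tool really prints its message on stderr has to be checked separately.

```python
import subprocess
import sys


def first_match(argv, candidates, marker):
    """Return the first candidate whose run prints marker on stderr, else None."""
    for candidate in candidates:
        result = subprocess.run(argv, input=candidate + "\n",
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        if marker in result.stderr:
            return candidate  # stop looping as soon as the marker appears
    return None
```

Passing the command as a list and the input through `input=` avoids building a shell string, so odd characters in a candidate cannot be interpreted by the shell.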

Use subprocess.check_output to call gpg and break the loop based on its output.
Something like this (untested since I don't know anything about gpg):
import subprocess
dictionaryfile = "/root/john.txt"
pgpencryptedfile = "helloworld.txt.gpg"
with open(dictionaryfile, 'r') as f:
    for line in f:
        x = line.rstrip('\n')
        cmd = ["echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile]
        output = subprocess.check_output(cmd, shell=True)
        if 'gpg: WARNING: message was not integrity protected' in output:
            break

You could use the subprocess module which allows you to use:
subprocess.call(args, *, stdin, stdout, stderr, shell)
(See the Python Documentation for how to use the parameters.)
This is good because you can easily read in the exit code of whatever program you call.
For example if you change 'newstring' to:
"echo " + x + " | gpg --passphrase-fd 0 " + pgpencryptedfile + " | grep 'gpg: WARNING: message was not integrity protected'"
grep will then return 0 if there is a match and 1 if no matches are found. (Source)
This exit code from grep will be returned from the subprocess.call() function and you can easily store it in a variable and use an if statement.
Edit: As Matthew Adams mentions below, you could also read the exit code of gpg itself.

How to get the last N lines of a subprocess' stderr stream output?

I am a Python newbie writing a Python (2.7) script that needs to exec a number of external applications, one of which writes a lot of output to its stderr stream. What I am trying to figure out is a concise and succinct way (in Python) to get the last N lines from that subprocess' stderr output stream.
Currently, I am running that external application from my Python script like so:
p = subprocess.Popen('/path/to/external-app.sh', stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
if p.returncode != 0:
    print "ERROR: External app did not complete successfully (error code is " + str(p.returncode) + ")"
    print "Error/failure details: ", stderr
    status = False
else:
    status = True
I'd like to capture the last N lines of output from its stderr stream so that they can be written to a log file or emailed, etc.

N = 3 # for 3 lines of output
p = subprocess.Popen(['/path/to/external-app.sh'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdout, stderr = p.communicate()
if p.returncode != 0:
    print ("ERROR: External app did not complete successfully "
           "(error code is %s)" % p.returncode)
    print "Error/failure details: ", '\n'.join(stderr.splitlines()[-N:])
    status = False
else:
    status = True

If the whole output can't be stored in RAM then:
import sys
from collections import deque
from subprocess import Popen, PIPE
from threading import Thread

ON_POSIX = 'posix' in sys.builtin_module_names

def start_thread(func, *args):
    t = Thread(target=func, args=args)
    t.daemon = True
    t.start()
    return t

def consume(infile, output):
    for line in iter(infile.readline, ''):
        output(line)
    infile.close()

p = Popen(['cat', sys.argv[1]], stdout=PIPE, stderr=PIPE,
          bufsize=1, close_fds=ON_POSIX)

# preserve last N lines of stdout, print stderr immediately
N = 100
queue = deque(maxlen=N)
threads = [start_thread(consume, *args)
           for args in (p.stdout, queue.append), (p.stderr, sys.stdout.write)]
for t in threads: t.join() # wait for IO completion
print ''.join(queue), # print last N lines
retcode = p.wait()
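When only the tail of stderr matters and stdout can be thrown away, a shorter Python 3 sketch is possible, because a bounded `deque` can consume the pipe directly. `stderr_tail` is a hypothetical helper name, and discarding stdout via `subprocess.DEVNULL` is an assumption made here so that a single reader is enough; if stdout must be kept too, it needs its own reader thread as in the code above.

```python
import subprocess
import sys
from collections import deque


def stderr_tail(argv, n):
    """Run argv and return (returncode, last n stderr lines) in bounded memory."""
    with subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE,
                          universal_newlines=True) as p:
        # deque(maxlen=n) drops older lines as new ones are read from the pipe.
        tail = deque(p.stderr, maxlen=n)
    return p.returncode, list(tail)
```

On a non-zero return code, the caller can join the returned lines and write them to a log file or an email body.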
