Python 2 to 3 conversion: iterating over lines in subprocess stdout - python

I have the following Python 2 example code that I want to make compatible with Python 3:
call = 'for i in {1..5}; do sleep 1; echo "Hello $i"; done'
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True)
for line in iter(p.stdout.readline, ''):
    print(line, end='')
This works well in Python 2 but in Python 3 p.stdout does not allow me to specify an encoding and reading it will return byte strings, rather than Unicode, so the comparison with '' will always return false and iter won't stop. This issue seems to imply that in Python 3.6 there'll be a way to define this encoding.
For now, I have changed the iter call to stop when it finds an empty bytes string iter(p.stdout.readline, b''), which seems to work in 2 and 3. My questions are: Is this safe in both 2 and 3? Is there a better way of ensuring compatibility?
Note: I'm not using for line in p.stdout: because I need each line to be printed as it's generated, and according to this answer p.stdout has too large a buffer.

You can add universal_newlines=True.
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True, universal_newlines=True)
for line in iter(p.stdout.readline, ''):
    print(line, end='')
Instead of bytes, str will be returned so '' will work in both situations.
Here is what the docs have to say about the option:
If universal_newlines is False the file objects stdin, stdout and
stderr will be opened as binary streams, and no line ending conversion
is done.
If universal_newlines is True, these file objects will be opened as
text streams in universal newlines mode using the encoding returned by
locale.getpreferredencoding(False). For stdin, line ending characters
'\n' in the input will be converted to the default line separator
os.linesep. For stdout and stderr, all line endings in the output will
be converted to '\n'. For more information see the documentation of
the io.TextIOWrapper class when the newline argument to its
constructor is None.
It's not explicitly called out about the bytes versus str difference, but it is implied by stating that False returns a binary stream and True returns a text stream.
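On Python 3.6 and later you can also name the encoding explicitly rather than relying on locale.getpreferredencoding(False); on 3.7+, text=True is a clearer alias for universal_newlines=True. A minimal sketch, using a POSIX-portable variant of the question's shell loop (the original uses the bash-only {1..5} brace expansion):

```python
import subprocess

call = 'for i in 1 2 3; do echo "Hello $i"; done'
# encoding= (3.6+) makes the pipe a text stream with a known encoding,
# so the '' sentinel works just as with universal_newlines=True.
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True,
                     encoding='utf-8')
lines = []
for line in iter(p.stdout.readline, ''):  # '' works: stream is text, not bytes
    lines.append(line)
    print(line, end='')
p.stdout.close()
p.wait()
```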

You can use p.communicate() and then decode it if it is a bytes object:
from __future__ import print_function
import subprocess
def b(t):
    if isinstance(t, bytes):
        return t.decode("utf8")
    return t

call = 'for i in {1..5}; do sleep 1; echo "Hello $i"; done'
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True)
stdout, stderr = p.communicate()
for line in b(stdout).splitlines():
    print(line)
This works in both Python 2 and Python 3, though note that communicate() waits for the process to finish, so the lines are only printed once the subprocess has exited.

Related

Catch universal newlines but preserve original

So this is my problem,
I'm trying to do a simple program that runs another process using Python's subprocess module, and I want to catch real-time output of the process.
I know this can be done as such:
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, ""):
    line = line.rstrip()
    if line != "":
        print(line)
The issue is, the process might generate output with a carriage return \r, and I want to simulate that behavior in my program.
If I use the universal_newlines flag in Popen, then I could catch the output that is generated with a carriage return, but I wouldn't know it was as such, and I could only print it "regularly" with a newline. I want to avoid that, as this could be a lot of output.
My question is basically if I could catch the \r output like it is a \n but differentiate it from actual \n output
EDIT
Here is some simplified code of what I tried:
File download.py:
import subprocess

try:
    subprocess.check_call(
        [
            "aws",
            "s3",
            "cp",
            "S3_LINK",
            "TARGET",
        ]
    )
except subprocess.CalledProcessError as err:
    print(err)
    raise SystemExit(1)
File process_runner.py:
import os
import sys
import subprocess
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for char in iter(lambda: proc.stdout.read(1), ""):
    sys.stdout.write(char)
The code in download uses aws s3 cp, which gives carriage returns of the download progress. I want to simulate this behavior of output in my program process_runner which receives download's output.
At first I tried to iter readline instead of read(1). That did not work due to the CR being overlooked.
A possible way is to use the binary interface of Popen, by specifying neither encoding nor errors, and of course not universal_newlines. We can then wrap the binary stream in an io.TextIOWrapper with newline=''. The documentation for TextIOWrapper says:
... if newline is None... If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated
(which is conformant with PEP 3116)
Your original code could be changed to:
import io

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = io.TextIOWrapper(proc.stdout, newline='')
for line in out:
    # line is delimited with the universal newline convention and actually
    # contains the original line ending, be it a lone \r, a \n, or the pair \r\n
    ...
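A runnable sketch of the same idea, with printf standing in (hypothetically) for a progress-reporting command like aws s3 cp; printf interprets the \r and \n escapes itself, so the subprocess emits both kinds of line ending:

```python
import io
import subprocess

# Stand-in for a command that overwrites a progress line with \r.
cmd = ['printf', 'progress 10%%\rprogress 100%%\ndone\n']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = io.TextIOWrapper(proc.stdout, newline='')  # endings returned untranslated
lines = []
for line in out:
    lines.append(line)   # each line keeps its original \r or \n terminator
proc.wait()
print(lines)
```

Each element of lines ends in the exact terminator the subprocess wrote, so a \r progress update can be distinguished from a real \n line.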

How to get Popen stdout in realtime NOT line by line in Python?

In general this questions has a lot of answers but they are all limited to line by line reading. For example this code:
def execute(cmd):
    popen = subprocess.Popen(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    for stdout_line in iter(popen.stdout.readline, ""):
        yield stdout_line
    popen.stdout.close()
    return_code = popen.wait()
    if return_code:
        raise subprocess.CalledProcessError(return_code, cmd)
But there are output lines for example like this (where dots are added once in ~10s):
............................
They show the progress of a running task. I don't want to withhold output until the line is finished, printing the whole line of dots only at the end.
So I need to yield string when:
I can read a block of 1024 symbols of output (or just the whole output)
there are ANY symbols of output and more than 1 s has passed (no matter whether the line is finished or not)
But I don't know how to do this.
p.s. Maybe a dup. Didn't find.
If you're on Linux you can use stdbuf -o0 in front of the command you're executing to make its stdout become unbuffered (i.e. instantaneous).
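Beyond unbuffering the child, the size-or-time flushing the question asks for can be sketched with the selectors module (POSIX-only, and the function name execute and the thresholds are just illustrative):

```python
import os
import selectors
import subprocess
import time

def execute(cmd, chunk_size=1024, max_delay=1.0):
    """Yield raw stdout chunks: flush as soon as chunk_size bytes
    accumulate, or max_delay seconds after the first unflushed byte."""
    popen = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    sel = selectors.DefaultSelector()
    sel.register(popen.stdout, selectors.EVENT_READ)
    buf = b''
    first = None                  # arrival time of the oldest unflushed byte
    while True:
        if sel.select(timeout=0.1):
            data = os.read(popen.stdout.fileno(), chunk_size)
            if not data:          # EOF: the child closed the pipe
                break
            buf += data
            if first is None:
                first = time.monotonic()
        if buf and (len(buf) >= chunk_size
                    or time.monotonic() - first >= max_delay):
            yield buf
            buf = b''
            first = None
    if buf:
        yield buf                 # flush whatever remained at EOF
    sel.close()
    popen.wait()
```

Partial lines (the growing row of dots) are yielded after at most max_delay seconds instead of waiting for a newline.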

How to remove b and \n from variable/text file in Python3? (TypeError)

This gives me a massive headache. My code:
import subprocess
proc = subprocess.Popen("php /var/scripts/data.php", shell=True, stdout=subprocess.PIPE)
scriptresponse = proc.stdout.read()
print (scriptresponse)
Output:
b'January\n'
I tried scriptresponse.replace('\n', '') but it failed:
TypeError: 'str' does not support the buffer interface
How to remove b and \n from scriptresponse so the output will look like this:
January
Try adding universal_newlines=True as an argument to the Popen call.
As mentioned in the docs:
If universal_newlines is True, the file objects stdin, stdout and stderr are opened as text streams in universal newlines mode, as described above in Frequently Used Arguments, otherwise they are opened as binary streams.
Right now you have a binary string (indicated by the b). If you still have the trailing newline, use the rstrip() method on the string to remove the trailing characters.
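Alternatively, keep the binary stream and decode it yourself. A minimal sketch, with echo January standing in for the PHP script from the question:

```python
import subprocess

# `echo January` is a stand-in for "php /var/scripts/data.php".
proc = subprocess.Popen("echo January", shell=True, stdout=subprocess.PIPE)
raw = proc.stdout.read()                    # b'January\n' -- bytes, hence the b
scriptresponse = raw.decode("utf-8").rstrip("\n")  # bytes -> str, drop newline
print(scriptresponse)
proc.wait()
```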

Getting output of a process at runtime

I am using a python script to run a process using subprocess.Popen and simultaneously store the output in a text file as well as print it on the console. This is my code:
result = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in result.stdout.readlines():  # read and store result in log file
    openfile.write("%s\n" % line)
    print("%s" % line)
The above code works, but it first waits for the process to complete and only then stores and prints the output.
But I want the output at runtime: the process can take hours to complete, and I get no output for all those hours.
Is there another way that gives me the output dynamically (at runtime), so that as soon as the process emits its first line, it gets printed?
The problem here is that .readlines() gets the entire output before returning, as it constructs a full list. Just iterate directly:
for line in result.stdout:
    print(line)
.readlines() returns a list of all the lines the process will return while open, i.e., it doesn't return anything until all output from the subprocess is received. To read line by line in "real time":
import sys
from subprocess import Popen, PIPE
proc = Popen(cmd, shell=True, bufsize=1, stdout=PIPE)
for line in proc.stdout:
    openfile.write(line)
    sys.stdout.buffer.write(line)
    sys.stdout.buffer.flush()
proc.stdout.close()
proc.wait()
Note: if the subprocess uses block buffering when run in non-interactive mode, you might need the pexpect or pty modules, or the stdbuf, unbuffer, or script commands.
Note: on Python 2, you might also need to use iter(), to get "real time" output:
for line in iter(proc.stdout.readline, ""):
    openfile.write(line)
    print line,
You can iterate over the lines one by one by using readline on the pipe:
while True:
    line = result.stdout.readline()
    if not line:
        break
    print line.strip()
The lines contain a trailing \n which I stripped for printing.
When the process terminates, readline returns an empty string, so you know when to stop.

Output of proc.communicate() does not format newlines in django python

I have a subprocess using communicate to get the output and saving it to my database:
p = Popen([str(pre_sync), '-avu', str(src), str(dest)], stdout=PIPE)
syncoutput = p.communicate()
check.log = syncoutput
It all works fine, but the output from communicate looks like this:
('sending incremental file list\n\nsent 89 bytes received 13 bytes 204.00 bytes/sec\ntotal size is 25 speedup is 0.25\n', None)
All in a single line and with the "\n" inserted. Is there a way I can make it print each line in a new line? Thanks in advance.
syncoutput,sync_error = p.communicate()
print(syncoutput)
p.communicate() returns a 2-tuple, composed of the output from stdout and stderr. When you print the 2-tuple, you see the \n characters. When you print the string (of the new syncoutput), you will get formatted text.
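A runnable sketch of the difference, with printf standing in (hypothetically) for the rsync call; the shape of communicate()'s result is the point:

```python
from subprocess import Popen, PIPE

# Stand-in for Popen([str(pre_sync), '-avu', str(src), str(dest)], ...).
p = Popen(['printf', 'sending incremental file list\n\nsent 89 bytes\n'],
          stdout=PIPE, universal_newlines=True)
syncoutput, sync_error = p.communicate()  # unpack the (stdout, stderr) tuple
print(syncoutput)   # prints real line breaks, not a tuple repr with literal \n
```

Since stderr was not piped, sync_error is None; printing the unpacked string instead of the tuple renders each \n as an actual newline.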
