So this is my problem: I'm writing a simple program that runs another process using Python's subprocess module, and I want to capture that process's output in real time.
I know this can be done as such:
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for line in iter(proc.stdout.readline, ""):
    line = line.rstrip()
    if line != "":
        print(line)
The issue is that the process might generate output terminated with a carriage return \r, and I want to reproduce that behavior in my program.
If I use the universal_newlines flag in Popen, I could catch the output that is generated with a carriage return, but I wouldn't know it was terminated that way, and I could only print it "regularly" with a newline. I want to avoid that, as this could be a lot of output.
My question is basically: can I catch the \r-terminated output as if it were \n-terminated, but still differentiate it from actual \n output?
EDIT
Here is some simplified code of what I tried:
File download.py:
import subprocess

try:
    subprocess.check_call(
        [
            "aws",
            "s3",
            "cp",
            "S3_LINK",
            "TARGET",
        ]
    )
except subprocess.CalledProcessError as err:
    print(err)
    raise SystemExit(1)
File process_runner.py:
import sys
import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
for char in iter(lambda: proc.stdout.read(1), ""):
    sys.stdout.write(char)
The code in download.py uses aws s3 cp, which reports download progress with carriage returns. I want to reproduce this output behavior in process_runner.py, which receives download.py's output.
At first I tried iterating with readline instead of read(1); that did not work because the CRs were overlooked.
A possible way is to use the binary interface of Popen, by specifying neither encoding nor errors, and of course not universal_newlines. Then we can wrap the binary stream in a TextIOWrapper with newline='', because the documentation for TextIOWrapper says:
... if newline is None... If it is '', universal newlines mode is enabled, but line endings are returned to the caller untranslated
(which is conformant with PEP 3116)
Your original code could be changed to:
import io

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = io.TextIOWrapper(proc.stdout, newline='')
for line in out:
    # line is delimited per the universal newline convention and still contains
    # the original end of line, be it a raw \r, a \n, or the pair \r\n
    ...
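Putting it together, here is a minimal sketch (assuming cmd is defined as your command line) that treats \r-terminated lines as progress updates and \n-terminated lines as regular output:

import io
import subprocess

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)  # cmd assumed defined
out = io.TextIOWrapper(proc.stdout, newline='')
for line in out:
    if line.endswith('\r'):
        # Progress update: redraw the current console line.
        print(line.rstrip('\r'), end='\r', flush=True)
    else:
        # Regular line, terminated by \n (or \r\n).
        print(line, end='', flush=True)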
Related
I am trying to run the following code:
process = subprocess.Popen(args=cmd, shell=True, stdout=subprocess.PIPE)
while process.poll() is None:
    stdoutput = process.stdout.readline()
    print(stdoutput.decode())
    if '(Y/N)' in stdoutput.decode():
        process.communicate(input=b'Y\n')
The cmd runs for a few minutes, after which it prompts for a confirmation, but process.communicate() is not working, and neither is process.stdin.write().
How do I send the input string 'Y' to this running process when it prompts for confirmation?
Per the doc on Popen.communicate(input=None, timeout=None):
Note that if you want to send data to the process’s stdin, you need to create the Popen object with stdin=PIPE.
Please try that, and if it's not sufficient, do indicate what the symptom is.
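For instance, a minimal sketch (hedged; cmd is your command, and this assumes the prompt line is newline-terminated, see the answer below otherwise) that creates the pipe and answers the prompt with stdin.write() rather than communicate(), since communicate() waits for the process to exit:

import subprocess

process = subprocess.Popen(args=cmd, shell=True,
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE)
while process.poll() is None:
    line = process.stdout.readline()
    print(line.decode())
    if '(Y/N)' in line.decode():
        process.stdin.write(b'Y\n')
        process.stdin.flush()  # make sure 'Y' actually reaches the child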
On top of the answer from @Jerry101, if the subprocess you are calling is a Python script that uses input(), be aware that, as documented:
If the prompt argument is present, it is written to standard output without a trailing newline.
Thus if you perform readline() as in process.stdout.readline(), it will hang there waiting for the newline \n character, as documented:
f.readline() reads a single line from the file; a newline character (\n) is left at the end of the string
A quick fix is to append a newline \n when requesting the input(), e.g. input("(Y/N)\n") instead of just input("(Y/N)").
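For illustration, a hypothetical child script (call it child.py; the name is made up) showing the fix:

# child.py -- hypothetical example only
answer = input("(Y/N)\n")  # trailing \n lets the parent's readline() return
print("got:", answer)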
Related question:
Python subprocess stdout doesn't capture input prompt
This gives me a massive headache. My code:
import subprocess
proc = subprocess.Popen("php /var/scripts/data.php", shell=True, stdout=subprocess.PIPE)
scriptresponse = proc.stdout.read()
print(scriptresponse)
Output:
b'January\n'
I tried scriptresponse.replace('\n', '') but failed:
TypeError: 'str' does not support the buffer interface
How to remove b and \n from scriptresponse so the output will look like this:
January
Try adding universal_newlines=True as an argument to the Popen call.
As mentioned in the docs:
If universal_newlines is True, the file objects stdin, stdout and stderr are opened as text streams in universal newlines mode, as described above in Frequently Used Arguments, otherwise they are opened as binary streams.
Right now you have a binary string (indicated by the b). If you still have the trailing newline, use the rstrip() method on the string to remove the trailing characters.
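For example, a minimal sketch of both fixes combined (text mode plus rstrip()):

import subprocess

proc = subprocess.Popen("php /var/scripts/data.php", shell=True,
                        stdout=subprocess.PIPE, universal_newlines=True)
scriptresponse = proc.stdout.read().rstrip()  # str, trailing newline removed
print(scriptresponse)  # January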
I have the following Python 2 example code that I want to make compatible with Python 3:
call = 'for i in {1..5}; do sleep 1; echo "Hello $i"; done'
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True)
for line in iter(p.stdout.readline, ''):
    print(line, end='')
This works well in Python 2, but in Python 3 p.stdout does not allow me to specify an encoding, and reading it returns byte strings rather than Unicode, so the comparison with '' always evaluates to false and iter never stops. This issue seems to imply that in Python 3.6 there will be a way to specify this encoding.
For now, I have changed the iter call to stop when it finds an empty bytes string iter(p.stdout.readline, b''), which seems to work in 2 and 3. My questions are: Is this safe in both 2 and 3? Is there a better way of ensuring compatibility?
Note: I'm not using for line in p.stdout: because I need each line to be printed as it's generated, and according to this answer p.stdout uses too large a buffer.
You can add universal_newlines=True.
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True, universal_newlines=True)
for line in iter(p.stdout.readline, ''):
    print(line, end='')
Instead of bytes, str will be returned so '' will work in both situations.
Here is what the docs have to say about the option:
If universal_newlines is False the file objects stdin, stdout and stderr will be opened as binary streams, and no line ending conversion is done.
If universal_newlines is True, these file objects will be opened as text streams in universal newlines mode using the encoding returned by locale.getpreferredencoding(False). For stdin, line ending characters '\n' in the input will be converted to the default line separator os.linesep. For stdout and stderr, all line endings in the output will be converted to '\n'. For more information see the documentation of the io.TextIOWrapper class when the newline argument to its constructor is None.
The bytes-versus-str difference is not called out explicitly, but it is implied: False gives a binary stream and True gives a text stream.
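A trivial illustration of that difference (hedged example, assuming a POSIX shell where echo emits hi plus \n):

import subprocess

p = subprocess.Popen("echo hi", shell=True, stdout=subprocess.PIPE)
print(p.stdout.readline())  # b'hi\n' -- bytes from a binary stream

p = subprocess.Popen("echo hi", shell=True, stdout=subprocess.PIPE,
                     universal_newlines=True)
print(p.stdout.readline())  # 'hi\n' -- str from a text stream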
You can use p.communicate() and then decode it if it is a bytes object:
from __future__ import print_function
import subprocess

def b(t):
    if isinstance(t, bytes):
        return t.decode("utf8")
    return t

call = 'for i in {1..5}; do sleep 1; echo "Hello $i"; done'
p = subprocess.Popen(call, stdout=subprocess.PIPE, shell=True)
stdout, stderr = p.communicate()
for line in b(stdout).splitlines():
    print(line)
This would work in both Python 2 and Python 3, though note that communicate() waits for the process to finish, so output is not printed as it is generated.
I am using a python script to run a process using subprocess.Popen and simultaneously store the output in a text file as well as print it on the console. This is my code:
result = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)
for line in result.stdout.readlines():  # read and store result in log file
    openfile.write("%s\n" % line)
    print("%s" % line)
The above code works, but it first lets the process run to completion and collects all of its output; only then does the for loop store and print it.
But I want the output at runtime: my process can take hours to complete, and I don't get any output for all those hours.
So is there another approach that gives me the output dynamically (at runtime), meaning as soon as the process emits its first line, it gets printed?
The problem here is that .readlines() gets the entire output before returning, as it constructs a full list. Just iterate directly:
for line in result.stdout:
    print(line)
.readlines() returns a list of all the lines the process will return while open, i.e., it doesn't return anything until all output from the subprocess is received. To read line by line in "real time":
import sys
from subprocess import Popen, PIPE

proc = Popen(cmd, shell=True, bufsize=1, stdout=PIPE)
for line in proc.stdout:
    openfile.write(line)
    sys.stdout.buffer.write(line)
    sys.stdout.buffer.flush()
proc.stdout.close()
proc.wait()
Note: the subprocess may use block buffering when run in non-interactive mode; in that case you might need the pexpect or pty modules, or the stdbuf, unbuffer, or script commands.
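For example, a hedged sketch of the stdbuf approach (stdbuf is part of GNU coreutils; some_command stands in for your real command):

from subprocess import Popen, PIPE

# -oL forces the child to line-buffer its stdout
proc = Popen(["stdbuf", "-oL", "some_command"], stdout=PIPE)
for line in proc.stdout:
    print(line.decode(), end='')  # lines arrive as they are produced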
Note: on Python 2, you might also need to use iter() to get "real time" output:
for line in iter(proc.stdout.readline, ""):
    openfile.write(line)
    print line,
You can iterate over the lines one by one by using readline on the pipe:
while True:
    line = result.stdout.readline()
    if not line:
        break
    print(line.strip())
The lines contain a trailing \n which I stripped for printing.
When the process terminates, readline returns an empty string, so you know when to stop.
I'm making a call to a program from the shell using the subprocess module that outputs a binary file to STDOUT.
I use Popen() to call the program and then I want to pass the stream to a function in a Python package (called "pysam") that unfortunately cannot take Python file objects, but can read from STDIN. So what I'd like to do is have the output of the shell command go from STDOUT into STDIN.
How can this be done from within Popen/subprocess module? This is the way I'm calling the shell program:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
This will read "my_cmd"'s STDOUT output and get a stream to it in p. Since my Python module cannot read from "p" directly, I am trying to redirect STDOUT of "my_cmd" back into STDIN using:
p = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, stdin=subprocess.PIPE, shell=True).stdout
I then call my module, which uses "-" as a placeholder for STDIN:
s = pysam.Samfile("-", "rb")
The above call just means read from STDIN (denoted "-") and read it as a binary file ("rb").
When I try this, I just get binary output sent to the screen, and it doesn't look like the Samfile() function can read it. This occurs even if I remove the call to Samfile, so I think it's my call to Popen that is the problem and not downstream steps.
EDIT: In response to answers, I tried:
sys.stdin = subprocess.Popen(tagBam_cmd, stdout=subprocess.PIPE, shell=True).stdout
print "Opening SAM.."
s = pysam.Samfile("-","rb")
print "Done?"
sys.stdin = sys.__stdin__
This seems to hang. I get the output:
Opening SAM..
but it never gets past the Samfile("-", "rb") line. Any idea why?
Any idea how this can be fixed?
EDIT 2: I am adding a link to Pysam documentation in case it helps, I really cannot figure this out. The documentation page is:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html
and the specific note about streams is here:
http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/usage.html#using-streams
In particular:
"""
Pysam does not support reading and writing from true python file objects, but it does support reading and writing from stdin and stdout. The following example reads from stdin and writes to stdout:
infile = pysam.Samfile( "-", "r" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
It will also work with BAM files. The following script converts a BAM formatted file on stdin to a SAM formatted file on stdout:
infile = pysam.Samfile( "-", "rb" )
outfile = pysam.Samfile( "-", "w", template = infile )
for s in infile: outfile.write(s)
Note, only the file open mode needs to be changed from r to rb.
"""
So I simply want to take the stream coming from Popen, which reads stdout, and redirect that into stdin, so that I can use Samfile("-", "rb") as the above section states is possible.
thanks.
I'm a little confused that you see binary on stdout if you are using stdout=subprocess.PIPE; however, the overall problem is that you need to work with sys.stdin if you want to trick pysam into using it.
For instance:
sys.stdin = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True).stdout
s = pysam.Samfile("-", "rb")
sys.stdin = sys.__stdin__ # restore original stdin
UPDATE: This assumed that pysam is running in the context of the Python interpreter and thus means the Python interpreter's stdin when "-" is specified. Unfortunately, it doesn't; when "-" is specified it reads directly from file descriptor 0.
In other words, it is not using Python's concept of stdin (sys.stdin), so replacing it has no effect on pysam.Samfile(). It also is not possible to take the output from the Popen call and somehow "push" it onto file descriptor 0; that descriptor is read-only, and the other end of it is connected to your terminal.
The only real way to get that output onto file descriptor 0 is to move the work to a second script and connect the two together from the first. That ensures that the output from the Popen in the first script ends up on file descriptor 0 of the second one.
So, in this case, your best option is to split this into two scripts. The first one will invoke my_cmd and take the output of that and use it for the input to a second Popen of another Python script that invokes pysam.Samfile("-", "rb").
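A hedged sketch of that split (second_script.py is a hypothetical name for the script that calls pysam.Samfile("-", "rb")):

import subprocess
import sys

p1 = subprocess.Popen(my_cmd, stdout=subprocess.PIPE, shell=True)
# second_script.py's file descriptor 0 is the read end of p1's pipe
p2 = subprocess.Popen([sys.executable, "second_script.py"],
                      stdin=p1.stdout)
p1.stdout.close()  # allow p1 to get SIGPIPE if p2 exits first
p2.wait()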
In the specific case of dealing with pysam, I was able to work around the issue using a named pipe (http://docs.python.org/library/os.html#os.mkfifo), which is a pipe that can be accessed like a regular file. In general, you want the consumer (reader) of the pipe to listen before you start writing to the pipe, to ensure you don't miss anything. However, pysam.Samfile("-", "rb") will hang as you noted above if nothing is already registered on stdin.
Assuming you're dealing with a prior computation that takes a decent amount of time (e.g. sorting the bam before passing it into pysam), you can start that prior computation and then listen on the stream before anything gets output:
import os
import tempfile
import subprocess
import shutil
import pysam
# Create a named pipe
tmpdir = tempfile.mkdtemp()
samtools_prefix = os.path.join(tmpdir, "namedpipe")
fifo = samtools_prefix + ".bam"
os.mkfifo(fifo)
# The example below sorts the file 'input.bam',
# creates a pysam.Samfile object of the sorted data,
# and prints out the name of each record in sorted order
# Your prior process that spits out data to stdout/a file
# We pass samtools_prefix as the output prefix, knowing that its
# ending file will be named what we called the named pipe
subprocess.Popen(["samtools", "sort", "input.bam", samtools_prefix])
# Read from the named pipe
samfile = pysam.Samfile(fifo, "rb")
# Print out the names of each record
for read in samfile:
    print read.qname
# Clean up the named pipe and associated temp directory
shutil.rmtree(tmpdir)
If your system supports it, you could use /dev/fd/# filenames:
process = subprocess.Popen(args, stdout=subprocess.PIPE)
samfile = pysam.Samfile("/dev/fd/%d" % process.stdout.fileno(), "rb")