Eagerly return lines from stdin in Python

I'm making a script which has some other script's output piped into it. The other script takes a while to complete, and prints its progress to the console along with the data I want to parse.
Since I'm piping the result to my script, I want to be able to do two things: as the input arrives, echo it to the screen; and after the command completes, have a list of the lines that were passed via stdin.
My first thought was to use a simple
for line in sys.stdin:
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()
but to my surprise, the loop waits until stdin hits EOF before it starts yielding lines.
My current workaround is this:
line = sys.stdin.readline()
lines = []
while line:
    sys.stdout.write(line.strip() + '\n')
    lines.append(line.strip())
    sys.stdout.flush()
    line = sys.stdin.readline()
But this does not always wait until the whole input is used.
Is there any other way to do this? It seems strange that the for solution behaves the way it does.

Edited to answer your question regarding exiting on end of input:
The workaround you describe, or something similar like the code below, appears to be necessary:
#!/usr/bin/env python
import sys

lines = []
while True:
    line = sys.stdin.readline()
    if not line:
        break
    line = line.rstrip()
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()
This is explained in the python man page, under the -u option:
-u   Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
I created a file dummy.py containing the code above, then ran this:
for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
This is the output:
harold_mac:~ harold$ for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
1
2
3
4
5
harold_mac:~ harold$
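A more compact variant of the same workaround is the two-argument form of iter(), which keeps calling sys.stdin.readline() until it returns an empty string (EOF) and so sidesteps the iterator's internal buffering just the same; a minimal sketch:
import sys

lines = []
# readline() is called once per iteration and returns '' only at EOF,
# so each line is echoed as soon as the producing program flushes it
for line in iter(sys.stdin.readline, ''):
    line = line.rstrip()
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()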

Python uses buffered input. If you check with python --help you see:
-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
So try the unbuffered option with:
command | python -u your_script.py
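If you cannot change how the script is invoked, the PYTHONUNBUFFERED environment variable mentioned in that help text has the same effect as -u. As a hedged alternative (assuming Python 2, which these snippets use), you can also reopen stdout unbuffered from inside the script itself:
import os
import sys

# replace sys.stdout with an unbuffered file object (buffer size 0);
# this mirrors what -u / PYTHONUNBUFFERED do for the output side
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)

# readline() avoids the input-side iterator buffering described above
for line in iter(sys.stdin.readline, ''):
    sys.stdout.write(line)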

Other people have already told you about the unbuffered output. I will just add a couple of thoughts:
often it is better to print debug info to stderr, since stderr output is usually unbuffered (see the sketch after the example below);
it is simpler to delegate intermediate output to special tools. For example, there is a tee utility that lets you split the stdout of a previous command. Assuming you are in bash, you can print the intermediate output to stdout right away, and use process substitution instead of printing to a file (instead of awk you will call your python script):
$ python -c 'for i in range(5): print i+1' | tee >( awk '{print "from awk", $0**2 }')
1
2
3
4
5
from awk 1
from awk 4
from awk 9
from awk 16
from awk 25
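On the first point, a minimal sketch of what that separation could look like: progress messages go to stderr (usually unbuffered), while the data another program would parse stays on stdout. The "processed item" wording is just an illustration, not anything the original scripts print.
import sys

for i in range(5):
    value = i + 1
    # debug / progress information goes to stderr
    sys.stderr.write('processed item %d\n' % value)
    # parseable data goes to stdout
    print value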

You need to make (1) stdin in your Python program and (2) stdout on the other side of the pipe both line buffered. To get this:
1) use stdin = os.fdopen(sys.stdin.fileno(), 'r', 1) in your program;
2) use stdbuf -oL to change the buffering mode of the other program's output:
stdbuf -oL otherprogram | python yourscript.py
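Under those assumptions (Python 2 on a Unix-like system where stdbuf is available), yourscript.py might look roughly like this sketch:
import os
import sys

# reopen stdin line buffered (buffer size 1)
stdin = os.fdopen(sys.stdin.fileno(), 'r', 1)

lines = []
# readline() avoids the file-iterator read-ahead buffer, so each line
# is echoed as soon as the other program emits it
for line in iter(stdin.readline, ''):
    sys.stdout.write(line)
    sys.stdout.flush()
    lines.append(line.rstrip())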

Related

python: read data from stdin and raw_input

I want to pass some data to a python script using echo and after that prompt the user to input options. I am running into an EOFError, which I think happens because I have already read all the data from sys.stdin. How do I fix this issue? Thanks!
code.py:
import sys

x = ''
for line in sys.stdin:
    x += line
y = raw_input()
usage:
echo -e -n '1324' | ./code.py
error at raw_input():
EOFError: EOF when reading a line
Use:
{ echo -e -n '1324'; cat; } | ./code.py
First echo will write the literal string to the pipe, then cat will read from standard input and copy that to the pipe. The python script will see this all as its standard input.
You just cannot send data through stdin (that's redirection) and then get the interactive mode back.
When you run a | b, b's standard input is the pipe, not the terminal; if it keeps reading, it hits end of input as soon as a finishes and the pipe is closed.
And when a finishes, that does not mean you get hold of the terminal's stdin again.
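That said, on Linux/macOS one partial workaround is to reopen the controlling terminal explicitly after the piped data has been consumed. This only works when the script is actually started from a terminal, so treat it as a sketch rather than a general fix:
import sys

x = sys.stdin.read()          # consume all of the piped data first
sys.stdin = open('/dev/tty')  # reattach stdin to the terminal (Unix only)
y = raw_input()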
Maybe you could change the way you do things; for example:
echo -n -e '1324' | ./code.py
becomes
./code.py '1234' '5678'
and use sys.argv[] to get the value of 1234, 5678...
import sys

x = ''
for line in sys.argv[1:]:
    x += line + "\n"
y = raw_input()
if you have a lot of lines to pass, give the script an argument that is a file name and read from that file:
import sys

x = ''
for line in open(sys.argv[1], "r"):
    x += line
y = raw_input()

File following program

I am trying to build a Python program that follows a log file and checks for certain patterns (much like grep).
Part of the testing code, 'test.py', reads from stdin:
import fileinput

for line in fileinput.input():
    print line
so if I do this in one terminal
tail -f log.txt | python test.py
In another terminal
echo "hello" >> log.txt
you would expect hello to be printed on the first terminal, but it isn't. How do I change the code? I also want to use it like this
cat log.txt | python test.py
with the same test.py.
Echoing sys.stdin directly seems to work on my Mac OS laptop:
import sys

for line in sys.stdin:
    print line.rstrip()
But interestingly, this didn't work very well on my Linux box. It would print the output from tail -f eventually, but the buffering was definitely making it appear as though the program was not working (it would print out fairly large chunks after several seconds of waiting).
Instead I got more responsive behavior by reading from sys.stdin one byte at a time:
import sys

buf = ''
while True:
    buf += sys.stdin.read(1)
    if buf.endswith('\n'):
        print buf[:-1]
        buf = ''
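An alternative sketch that is usually just as responsive is to loop over readline(), which returns as soon as a complete line is available and returns an empty string only at EOF, so the same test.py works for both the tail -f and the cat invocation:
import sys

# readline() bypasses the file-iterator read-ahead buffer
for line in iter(sys.stdin.readline, ''):
    print line.rstrip()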

Storing value from a parsed ping

I'm working on some code that performs a ping operation from python and extracts only the latency by using awk. This is currently what I have:
from os import system
l = system("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'")
print l
The system() call works fine, but the output goes to the terminal rather than being stored in l. Basically, an example output I'd get from this particular block of code would be
90.3
0
Why does this happen, and how would I go about actually storing that value into l? This is part of a larger thing I'm working on, so preferably I'd like to keep it in native python.
Use subprocess.check_output if you want to store the output in a variable:
from subprocess import check_output
l = check_output("ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'", shell=True)
print l
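Since you said you would prefer to keep it in native Python, here is a hedged sketch that drops awk and extracts the latency with a regular expression instead; it assumes the usual "time=90.3 ms" field that ping prints on Linux and macOS:
import re
from subprocess import check_output

out = check_output(["ping", "-c", "1", "sitename"])
match = re.search(r"time=([\d.]+)", out)
if match:
    latency = float(match.group(1))  # latency in milliseconds
    print latency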
Related: Extra zero after executing a python script
os.system() returns the return code of the called command, not the output to stdout.
For detail on how to properly get the command's output (including pre-Python 2.7), see this: Running shell command from Python and capturing the output
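For Python versions before 2.7, where check_output is not available, a roughly equivalent sketch uses Popen with communicate():
from subprocess import Popen, PIPE

cmd = "ping -c 1 sitename | awk -F = 'FNR==2 {print substr($4,1,length($4)-3)}'"
p = Popen(cmd, shell=True, stdout=PIPE)
output, _ = p.communicate()  # output is whatever the pipeline wrote to stdout
print output.strip()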
BTW, I would use the ping package (https://pypi.python.org/pypi/ping); it looks promising.
Here is how I store the output in a variable (in the shell):
test=$(ping -c 1 google.com | awk -F"=| " 'NR==2 {print $11}')
echo "$test"
34.9

Python not printing newline

I have a script that seems to have stopped working after my latest upgrade. To find the problem, I wrote a little script:
import subprocess

hdparm = subprocess.Popen(["xargs", "echo"],
                          stdin=subprocess.PIPE)
hdparm.stdin.write("Hello\n")
hdparm.stdin.write("\n")
hdparm.stdin.close()
hdparm.wait()
quit()
This just prints "Hello" and a newline, but I expect two newlines. What's causing this? (I am using 2.7.3 at the moment.)
EDIT: Here is the problematic script (edited for clarity):
hdparm = subprocess.Popen(["hdparm", "--please-destroy-my-drive", "--trim-sector-ranges-stdin", "/dev/sda"],
                          stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
hdparm_counter = 0
for rng in ranges_to_trim:
    hdparm.stdin.write("%d:%d\n" % (rng["begin"], rng["length"]))
    hdparm_counter += 1
    if hdparm_counter > 63:
        hdparm.stdin.write("\n")
        hdparm_counter = 0
if hdparm_counter != 0:
    hdparm.stdin.write("\n")
hdparm.stdin.close()
hdparm.wait()
EDIT: I believe the problem is with my script itself. I need to send EOF to hdparm to make it do whatever it is supposed to.
From the xargs man page:
This manual page documents the GNU version of xargs. xargs reads items from the standard input, delimited by blanks (which can be protected with double or single quotes or a backslash) or newlines, and executes the command (default is /bin/echo) one or more times with any initial-arguments followed by items read from standard input. Blank lines on the standard input are ignored.
(emphasis added).
Also, to add -- the newline you see is from echo itself. xargs doesn't pass it along anyway.

Why does python keep buffering stdout even when flushing and using -u?

$ cat script.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
$ cat script.py - | python -u script.py
The output is right, but it only starts printing once I hit Ctrl-D, whereas the following starts printing right away:
$ cat script.py - | cat
which led me to think that the buffering does not come from cat.
I managed to get it working by doing:
for line in iter(sys.stdin.readline, ""):
as explained here: Streaming pipes in Python, but I don't understand why the former solution doesn't work as expected.
The Python man page reveals the answer to your question:
-u   Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
That is: file-object iterators' internal buffering is to blame (and it doesn't go away with -u).
cat does block buffering by default if output is to a pipe. So when you include - (stdin) in the cat command, it waits to get EOF (your ctrl-D closes the stdin stream) or 8K (probably) of data before outputting anything.
If you change the cat command to "cat script.py |" you'll see that it works as you expected.
Also, if you add 8K of comments to the end of script.py, it will immediately print it as well.
Edit:
The above is wrong. :-)
It turns out that file.next() (used by file iterators, i.e. for line in file) has a hidden read-ahead buffer that is not used by readline(), which simply reads a character until it sees a newline or EOF.
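In other words, the iter(sys.stdin.readline, "") form you already found is the right fix, because readline() bypasses that read-ahead buffer; a sketch of the corrected script.py:
import sys

# readline() has no hidden read-ahead buffer, so each line is echoed
# as soon as the producer writes it
for line in iter(sys.stdin.readline, ""):
    sys.stdout.write(line)
    sys.stdout.flush()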
