$ cat script.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
$ cat script.py - | python -u script.py
The output is correct, but it only starts printing once I hit Ctrl-D, whereas the following starts printing right away:
$ cat script.py - | cat
which led me to think that the buffering does not come from cat.
I managed to get it working by doing:
for line in iter(sys.stdin.readline, ""):
as explained here: Streaming pipes in Python, but I don't understand why the former solution doesn't work as expected.
The Python manpage reveals the answer to your question:
-u     Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
That is: file-object iterators' internal buffering is to blame (and it doesn't go away with -u).
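A minimal sketch of that man-page workaround, applied to the script above:
import sys

# readline() bypasses the iterator's hidden read-ahead buffer,
# so each line is echoed as soon as it arrives.
while 1:
    line = sys.stdin.readline()
    if not line:  # readline() returns '' only at EOF
        break
    sys.stdout.write(line)
    sys.stdout.flush()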
cat block-buffers by default when its output goes to a pipe. So when you include - (stdin) in the cat command, it waits for EOF (your Ctrl-D closes the stdin stream) or for 8K (probably) of data before outputting anything.
If you change the cat command to "cat script.py |", you'll see that it works as you expected.
Also, if you add 8K of comments to the end of script.py, it will immediately print it as well.
Edit:
The above is wrong. :-)
It turns out that file.next() (used by file iterators, i.e. for line in file) has a hidden read-ahead buffer that is not used by readline(), which simply reads character by character until it sees a newline or EOF.
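A quick Python 2 sketch of that hidden buffer in action (example.txt is a hypothetical multi-line file); Python 2 actually refuses to mix the two reading styles precisely because the read-ahead buffer would lose data:
# Python 2 sketch: the iterator's read-ahead buffer makes mixing
# iteration and readline() unsafe, so Python refuses to do it.
f = open('example.txt')  # hypothetical multi-line file
f.next()                 # iteration fills the hidden read-ahead buffer
f.readline()             # raises ValueError ("Mixing iteration and
                         # read methods would lose data")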
Related
I'm looking for a way to monitor a file that is written to by a program on Linux. I found the tail -F command in here, and also recommended was less +FG. I tested it by running tail -F file in one terminal, and a simple python script:
import time
for i in range(20):
    print i
    time.sleep(0.5)
in another. I redirected the output to the file:
python script.py >> file
I expected that tail would track the file contents and update the display at fixed intervals; instead, it only shows what was written to the file after the command terminates.
The same thing happens with less +FG, and also if I watch the output from cat. I've also tried the usual truncating redirect (> instead of >>). In that case less says the file was truncated, but it still does not track it in real time.
Any idea why this doesn't work? (It's suggested here that it might be due to buffered writes, but since my script runs over 10 seconds, I suspect this might not be the cause)
Edit: In case it matters, I'm running Linux Mint 18.1
Python's standard output is buffered. If you only see all the output once the script is closed or done, that's definitely a buffering issue.
You can use this instead:
import time
import sys
for i in range(20):
    sys.stdout.write('%d\n' % i)
    sys.stdout.flush()
    time.sleep(0.5)
I've tested it and it prints values in real time. To overcome the buffering issue, I call .flush() after each .write() to force the buffer to be flushed.
Additional options from the comments:
Use the original print statement with sys.stdout.flush() afterwards (see the sketch after this list)
Run the python script with python -u for unbuffered binary stdout and stderr
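For instance, the print-statement variant could look like this (a sketch in Python 2, matching the question's script):
import sys
import time

for i in range(20):
    print i             # Python 2 print statement
    sys.stdout.flush()  # push each line out immediately
    time.sleep(0.5)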
Regarding jon1467's answer (sorry, I can't comment on your answer), your understanding of redirection is wrong.
Try this:
dd if=/dev/urandom > test.txt
while watching the file size with:
ls -l test.txt
You'll see the file grow while dd is running.
Vinny's answer is correct, python standard output is buffered.
The most common way around the "buffering effect" you noticed is to flush stdout, as Vinny showed you.
You could also use the -u option to disable buffering for the whole Python process, or you could just reopen standard output with a buffer size of 0, as below (in Python 2 at least):
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
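For Python 3 (not part of the original answer), unbuffered text streams aren't allowed, but line buffering gets the same effect for line-oriented output; a sketch:
import os
import sys

# Python 3: buffering=0 only works on binary streams, so use line
# buffering (buffering=1) for text output instead.
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', buffering=1)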
I'm making a script which has some other script's output piped into it. The other script takes a while to complete and prints its progress onto the console, along with the data I want to parse.
Since I'm piping the result to my script, I want to be able to do two things. As the input comes in, I would like to echo it onto the screen. After the command completes, I would like to have a list of the lines that were passed via stdin.
My first thought was to use a simple
for line in sys.stdin:
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()
but to my surprise, the command waits until stdin hits EOF before it starts yielding lines.
My current workaround is this:
line = sys.stdin.readline()
lines = []
while line:
    sys.stdout.write(line.strip() + '\n')
    lines.append(line.strip())
    sys.stdout.flush()
    line = sys.stdin.readline()
But this does not always wait until the whole input is used.
Is there any other way to do this? It seems strange that the for solution behaves the way it does.
Edited to answer your question regarding exiting on end of input.
The workaround you describe, or something similar like this below appears to be necessary:
#!/usr/bin/env python
import sys
lines = []
while True:
    line = sys.stdin.readline()
    if not line:
        break
    line = line.rstrip()
    sys.stdout.write(line + '\n')
    lines.append(line)
    sys.stdout.flush()
This is explained in the python man page, under the -u option:
-u     Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
I created a file dummy.py containing the code above, then ran this:
for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
This is the output:
harold_mac:~ harold$ for i in 1 2 3 4 5; do sleep 5; echo $i; done | ./dummy.py
1
2
3
4
5
harold_mac:~ harold$
Python uses buffered input. If you check with python --help, you'll see:
-u : unbuffered binary stdout and stderr; also PYTHONUNBUFFERED=x
So try the unbuffered option with:
command | python -u your_script.py
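As the help text says, setting PYTHONUNBUFFERED in the environment has the same effect, e.g.:
command | PYTHONUNBUFFERED=1 python your_script.py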
Other people have already told you about the unbuffered output. I will just add a couple of thoughts:
often it is better to print debug info to stderr, and stderr output is usually unbuffered
it is simpler to delegate intermediate output to special tools. For example, there is the tee utility, which lets you split the stdout of the previous command. Assuming you are in bash, you can print the intermediate output to stdout right away and use process substitution instead of printing to a file (instead of awk you would call your Python script):
$ python -c 'for i in range(5): print i+1' | tee >( awk '{print "from awk", $0**2 }')
1
2
3
4
5
from awk 1
from awk 4
from awk 9
from awk 16
from awk 25
You need to make both 1) stdin in your Python program and 2) stdout on the other side of the pipe line-buffered. To get this:
1) use stdin = os.fdopen(sys.stdin.fileno(), 'r', 1) in your program;
2) use stdbuf -oL to change the buffering mode of the other program's output:
stdbuf -oL otherprogram | python yourscript.py
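Put together, yourscript.py might look something like this sketch (the echo loop is just a placeholder for your real processing):
import os
import sys

# Reopen stdin line-buffered (buffering=1) so lines are handed over
# as they arrive instead of in 4K/8K blocks.
stdin = os.fdopen(sys.stdin.fileno(), 'r', 1)

# iter(readline, '') avoids the file iterator's read-ahead buffer.
for line in iter(stdin.readline, ''):
    sys.stdout.write(line)
    sys.stdout.flush()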
Is it possible to get the following check_call procedure:
logPath="log.txt"
with open(logPath,"w") as log:
    subprocess.check_call(command, stdout=log, stderr=subprocess.STDOUT)
to output the stdout and stderr to a file continuously?
On my machine, the output is written to the file only after the subprocess.check_call finished.
To achieve this, perhaps we could modify the buffer length of the log filestream?
Not without some OS tricks.
That happens because output is usually line-buffered (i.e. the buffer is flushed after each newline character) when the output is a terminal, but block-buffered when the output is a file or pipe. In the block-buffered case, you won't see the output written "continuously"; rather, it will be written every 1k or 4k or whatever the block size is.
This is the default behavior of libc, so if the subprocess is written in C and uses printf()/fprintf(), it will check whether the output is a terminal or a file and change the buffering mode accordingly.
The concept of buffering is (better) explained at http://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html
This is done for performance reasons (see the answer to this question).
If you can modify the subprocess' code, you can put a call to flush() after each line or whenever needed.
Otherwise there are external tools to force line buffering mode (by tricking programs into believing the output is a terminal):
unbuffer, part of the expect package
stdbuf (see the sketch below)
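For example, applied to the check_call from the question, wrapping the command with stdbuf might look like this sketch (command here is a hypothetical placeholder):
import subprocess

logPath = "log.txt"
command = ["./long_running_program"]  # hypothetical command

with open(logPath, "w") as log:
    # stdbuf -oL forces the child's stdout into line-buffered mode,
    # so lines land in log.txt as they are printed (works for programs
    # that use default libc stdio buffering).
    subprocess.check_call(["stdbuf", "-oL"] + command,
                          stdout=log, stderr=subprocess.STDOUT)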
Possibly related:
Force line-buffering of stdout when piping to tee (suggests use of unbuffer)
java subprocess does not write its output until it terminates (a shorter explanation of mine written years ago)
How to get "instant" output of "tail -f" as input? (suggests stdbuf usage)
Piping of grep is not working with tail? (only for grep)
I have the following code in program.py:
from sys import stdin
for line in stdin:
    print line
I run it, enter lines, and then press Ctrl+D, but the program does not exit.
This does work:
$ printf "echo" | python program.py
Why does the program not exit when I press Ctrl+D?
I am using the Fedora 18 terminal.
Ctrl+D has a strange effect. It doesn't close the input stream, but only causes a C-level fread() to return an empty result. For regular files such a result means that the file is now at its end, but it's acceptable to read more, e.g. to check if someone else wrote more data to the file in the meantime.
In addition, there are issues of buffering --- three levels of them!
Python's iteration over a file does block buffering; avoid it when reading from interactive streams.
the C-level stdin file has, by default, a line buffer.
the terminal itself(!), in its default mode ("cooked mode"), reads one line of data before sending it to the process, which explains why typing Ctrl+D doesn't have any effect when typed in the middle of a line.
This example avoids the first issue, which is all you need if all you want is to detect a Ctrl+D typed on its own line:
import sys
while True:
    line = sys.stdin.readline()
    print repr(line)
You get every line with a final '\n', except when the "line" comes from a Ctrl+D, in which case you get just '' (but reading continues, unless of course we add if line == '': break).
Every time I execute my Python script, it appears to hang on this line:
lines = sys.stdin.readlines()
What should I do to fix/avoid this?
EDIT
Here's what I'm doing with lines:
lines = sys.stdin.readlines()
updates = [line.split() for line in lines]
EDIT 2
I'm running this script from a git hook, so is there any way around the EOF?
This depends a lot on what you are trying to accomplish. You might be able to do:
for line in sys.stdin:
    # do something with line
Of course, with this idiom, as well as with the readlines() method you are using, you need to somehow send the EOF character to your script so that it knows the input is complete. (On Unix, Ctrl-D usually does the trick.)
Unless you are redirecting something to stdin that would be expected behavior. That says to read input from stdin (which would be the console you are running the script from). It is waiting for your input.
See: "How to finish sys.stdin.readlines() input?"
If you're running the program in an interactive session, then this line causes Python to read from standard input (i.e. your keyboard) until you send the EOF character (Ctrl-D on Unix/Mac, Ctrl-Z on Windows).
>>> import sys
>>> a = sys.stdin.readlines()
Test
Test2
^Z
>>> a
['Test\n', 'Test2\n']
I know this isn't directly answering your question, as others have already addressed the EOF issue, but typically what I've found works best when reading live output from a long-lived subprocess or stdin is the while/if line approach:
while True:
    line = sys.stdin.readline()
    if not line:
        break
    process(line)
In this case, sys.stdin.readline() will return lines of text before an EOF is returned. Once EOF is given, an empty string will be returned, which triggers the break from the loop. A hang can still occur here, as long as an EOF isn't provided.
It's worth noting that the ability to process the "live output" while the subprocess/stdin is still running requires the writing application to flush its output.
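On the writing side, that can be as simple as this sketch:
import sys
import time

# Without the flush, these lines would sit in a block buffer until
# the process exits, and the reader would see nothing "live".
for i in range(5):
    sys.stdout.write('tick %d\n' % i)
    sys.stdout.flush()
    time.sleep(1)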