I'm looking for a way to monitor a file that is written to by a program on Linux. I found the tail -F command recommended here, along with less +FG. I tested it by running tail -F file in one terminal, and a simple Python script:
import time

for i in range(20):
    print i
    time.sleep(0.5)
in another. I redirected the output to the file:
python script.py >> file
I expected tail to track the file contents and update the display at fixed intervals; instead, it only shows what was written to the file after the command terminates.
The same thing happens with less +FG, and also if I watch the output of cat. I've also tried the usual redirect, which truncates the file (> instead of >>). In that case tail says the file was truncated, but it still does not track it in real time.
Any idea why this doesn't work? (It's suggested here that it might be due to buffered writes, but since my script runs for over 10 seconds, I suspect this isn't the cause.)
Edit: In case it matters, I'm running Linux Mint 18.1
Python's standard output is buffered. If you only see all the output once the script finishes or is closed, that's definitely a buffering issue.
You can use this instead:
import time
import sys

for i in range(20):
    sys.stdout.write('%d\n' % i)
    sys.stdout.flush()
    time.sleep(0.5)
I've tested it, and it prints values in real time. To overcome the buffering issue, I call .flush() after each .write() to force the buffer to be flushed.
Additional options from the comments:
Use the original print statement followed by sys.stdout.flush() (a sketch follows this list)
Run the script with python -u for unbuffered binary stdout and stderr
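A minimal sketch of the first option, keeping the Python 2 print statement from the question:

import sys
import time

for i in range(20):
    print i              # the original print statement
    sys.stdout.flush()   # flush right after, so tail -F sees each line immediately
    time.sleep(0.5)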
Regarding jon1467's answer (sorry, I can't comment on your answer), your understanding of redirection is wrong.
Try this:
dd if=/dev/urandom > test.txt
while watching the file size with:
ls -l test.txt
You'll see the file grow while dd is running.
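If you don't want to rerun ls by hand, watch can do the polling for you (a convenience, not a requirement):

watch -n 1 ls -l test.txt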
Vinny's answer is correct: Python's standard output is buffered.
The most common way to deal with the "buffering effect" you noticed is to flush stdout, as Vinny showed you.
You could also use the -u option to disable buffering for the whole Python process, or you could just reopen standard output with a buffer size of 0, as below (in Python 2 at least):
import os, sys
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)  # 0 = fully unbuffered
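With the -u option, the invocation from the original question simply becomes:

python -u script.py >> file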
Related
I am attempting to write a (Bash) shell script that wraps a third-party Python script, captures all of its output (errors and stdout) into a log file, and restarts the script with a new batch of data each time it completes successfully. I'm doing this on a standard Linux distribution, but hopefully the solution is platform-independent.
So here's a simplified version of the shell script, omitting everything except the logging:
#!/bin/bash
/home/me/script.py &>> /home/me/logfile
The problem is that the third-party Python script's output is mostly a single line that is refreshed periodically (roughly every 90 seconds) using a carriage return ("\r"). Here's an example of the type of output I mean:
#!/usr/bin/env python3
import time
tracker = 1
print("This line is captured in the logfile because it ends with a newline")
while tracker < 5:
    print(" This output isn't captured in the log file. Tracker = " + str(tracker), end="\r")
    tracker += 1
    time.sleep(1)
print("This line does get captured. Script is done. ")
How can I write a simple shell script to capture the output each time it is refreshed, or at least to periodically capture the current output as it would appear on the screen if I were running the script in the terminal?
Obviously I could try to modify the python script to change its output behavior, but the actual script I'm using is very complex and I think beyond my abilities to do that easily.
The program should have disabled this behavior when its output is not a tty.
The output is already captured completely, it's just that you see all the updates at once when you cat the file. Open it in a text editor and see for yourself.
To make the file easier to work with, you can just replace the carriage returns with line feeds:
/home/me/script.py | tr '\r' '\n'
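Combined with the logging from the wrapper script, that could look like this (a sketch reusing the question's paths; 2>&1 folds stderr into the pipe so both streams pass through tr):

#!/bin/bash
# Translate the progress line's carriage returns into newlines before appending to the log.
/home/me/script.py 2>&1 | tr '\r' '\n' >> /home/me/logfile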
If the process normally produces output right away, but not with this command, you can disable Python's output buffering.
The terminal automatically cut off the output as it scrolled up with each iteration.
By default, the macOS Terminal has a limited scrollback buffer of, say, 1,000 lines.
When a program's output exceeds 1,000 lines, the earliest lines are dropped from memory, like a FIFO queue.
Basically, the answer to your question
`Is there a way to store the output of the previously run command in a text file?` is no. Sorry.
You can re-run the program and preserve the output by redirecting it to another file, or increase the number of lines in the buffer (maybe make it unlimited).
You can use less to go through the output:
your_command | less
The Enter key will take you down; press q to exit.
Or you can redirect the output to a file.
In one terminal tab, run the program and redirect its output to an output.log file, like this:
python program.py > output.log
In another tab, you can run tailf on the same log file and watch the output.
tailf output.log
To see the complete output open the log file in any text editor.
You can consider increasing the scrollback buffer.
Or
If you want to see the data and also save it to a file, use tee, e.g.,
spark-shell | tee tmp.out
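Applied to the earlier Python example, tee shows the output in the terminal while also writing it to the log file (the same stdout-buffering caveats apply when output goes through a pipe):

python program.py | tee output.log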
This may not be the best wording for the question. I am trying to view two files at once on my screen.
I run:
multitail ~/path/to/somefile.err ~/path/to/somefile.out
I have a python script with the following lines:
sys.stdout = open('~/path/to/somefile.out', 'a')
sys.stderr = open('~/path/to/somefile.err', 'a')
My multitail command seems to only output my .out file, regardless of which order I put the files in the command.
I verified that my script is indeed writing to the files. What is also interesting is that when I run the following command:
echo "text" >> ~/path/to/somefile.err
All of a sudden I see all the output from the .err file in the multitail screen (including that which didn't show up before)!
What is going on here that I cannot see?
P.S. this is my first time using multitail so maybe I overlooked something simple. If it means anything, I am using CentOS 7.
You need to pass either buffering=0 (for unbuffered) or buffering=1 (for line-buffered, which is probably what you want) in your call to open.
The default is buffering=-1, which is equivalent to something like buffering=512, with the exact value depending on the system; nothing will be written to the file until 512 (or whatever) bytes have been written.
Alternatively, you could leave buffering set to its default value, and call .flush() every time you want the data to appear in the file.
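A sketch of the line-buffered variant, reusing the placeholder paths from the question (note that open() does not expand ~, so os.path.expanduser is added here):

import os
import sys

# Buffering value 1 (third positional argument) requests line buffering:
# each completed line is flushed to the file at once, so multitail picks
# it up as it is written.
sys.stdout = open(os.path.expanduser('~/path/to/somefile.out'), 'a', 1)
sys.stderr = open(os.path.expanduser('~/path/to/somefile.err'), 'a', 1)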
When you use >> in the shell, the file is closed when the command exits, and closing implies a flush. (You can defer the close by using exec >> file.txt.)
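A tiny sketch of the deferred-close variant (file.txt is just the placeholder name from the sentence above):

#!/bin/bash
exec >> file.txt    # open file.txt once; all later stdout appends to it
echo "first entry"
echo "second entry"
# the file is closed (and therefore flushed) only when the script itself exits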
Is it possible to get the following check_call procedure:
import subprocess

logPath = "log.txt"
with open(logPath, "w") as log:
    subprocess.check_call(command, stdout=log, stderr=subprocess.STDOUT)
to output stdout and stderr to a file continuously?
On my machine, the output is written to the file only after subprocess.check_call has finished.
To achieve continuous output, perhaps the buffer length of the log file stream could be modified?
Not without some OS tricks.
That happens because output is usually line-buffered (i.e. the buffer is flushed after each newline character) when it goes to a terminal, but block-buffered when it goes to a file or pipe. In the block-buffered case you won't see the output written "continuously"; rather, it will be written every 1K or 4K or whatever the block size is.
This is the default behavior of libc, so if the subprocess is written in C and uses printf()/fprintf(), it will check whether the output is a terminal or a file and change the buffering mode accordingly.
The concept of buffering is explained in more detail at http://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html
This is done for performance reasons (see the answer to this question).
If you can modify the subprocess's code, you can add a call to flush() after each line or whenever needed.
Otherwise, there are external tools that can force line-buffered mode (e.g. by tricking programs into believing the output is a terminal):
unbuffer, part of the expect package
stdbuf (see the sketch below)
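For example, the check_call from the question could be wrapped with stdbuf (a sketch; it assumes command is an argument list and that the child uses libc's default stdio buffering):

import subprocess

command = ["./some_program"]  # placeholder: the argument list from the question
logPath = "log.txt"
with open(logPath, "w") as log:
    # stdbuf -oL forces the child's stdout into line-buffered mode, so each
    # completed line reaches the log file immediately instead of waiting for
    # a full block to fill.
    subprocess.check_call(["stdbuf", "-oL"] + command,
                          stdout=log, stderr=subprocess.STDOUT)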
Possibly related:
Force line-buffering of stdout when piping to tee (suggests use of unbuffer)
java subprocess does not write its output until it terminates (a shorter explanation of mine written years ago)
How to get "instant" output of "tail -f" as input? (suggests stdbuf usage)
Piping of grep is not working with tail? (only for grep)
$ cat script.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
$ cat script.py - | python -u script.py
The output is right, but it only starts printing once I hit Ctrl-D, whereas the following starts printing right away:
$ cat script.py - | cat
which led me to think that the buffering does not come from cat.
I managed to get it working by doing:
for line in iter(sys.stdin.readline, ""):
as explained here: Streaming pipes in Python. But I don't understand why the former solution doesn't work as expected.
The Python manpage reveals the answer to your question:
-u     Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
That is: file-object iterators' internal buffering is to blame (and it doesn't go away with -u).
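A minimal version of the manpage's workaround (equivalent to the iter(sys.stdin.readline, "") form above):

import sys

while True:
    line = sys.stdin.readline()  # readline() has no hidden read-ahead buffer
    if not line:                 # empty string means EOF
        break
    sys.stdout.write(line)
    sys.stdout.flush()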
cat does block buffering by default if its output goes to a pipe. So when you include - (stdin) in the cat command, it waits for EOF (your Ctrl-D closes the stdin stream) or 8K (probably) of data before outputting anything.
If you change the cat command to "cat script.py |" you'll see that it works as you expected.
Also, if you add 8K of comments to the end of script.py, it will immediately print it as well.
Edit:
The above is wrong. :-)
It turns out that file.next() (used by file iterators, i.e. for line in file) has a hidden read-ahead buffer that is not used by readline(), which simply reads one character at a time until it sees a newline or EOF.