Is it possible to get the following check_call procedure:
import subprocess

logPath = "log.txt"
with open(logPath, "w") as log:
    subprocess.check_call(command, stdout=log, stderr=subprocess.STDOUT)
to output the stdout and stderr to a file continuously?
On my machine, the output is written to the file only after the subprocess.check_call finished.
To achieve this, could we perhaps change the buffer size of the log file stream?
Not without some OS tricks.
That happens because output is usually line-buffered (the buffer is flushed after each newline character) when it goes to a terminal, but block-buffered when it goes to a file or pipe. In the block-buffered case you won't see the output written "continuously"; instead it is written every 1 KB or 4 KB, or whatever the block size happens to be.
This is the default behavior of libc, so if the subprocess is written in C and uses printf()/fprintf(), it will check whether the output is a terminal or a file and change the buffering mode accordingly.
The concept of buffering is (better) explained at http://www.gnu.org/software/libc/manual/html_node/Buffering-Concepts.html
This is done for performance reasons (see the answer to this question).
If you can modify the subprocess's code, you can add a call to flush() after each line, or whenever needed.
Otherwise, there are external tools that force line-buffered mode (by tricking the program into believing its output is a terminal); a usage sketch follows the list:
unbuffer, part of the expect package
stdbuf
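For example, a minimal sketch of the original check_call wrapped with stdbuf (assumptions: command is a list of arguments and GNU coreutils' stdbuf is installed; the placeholder command is hypothetical):

import subprocess

logPath = "log.txt"
command = ["./long_running_tool"]  # hypothetical command

with open(logPath, "w") as log:
    # stdbuf -oL/-eL puts the child's stdout/stderr into line-buffered mode,
    # so each line reaches log.txt as soon as the child prints it.
    subprocess.check_call(["stdbuf", "-oL", "-eL"] + command,
                          stdout=log, stderr=subprocess.STDOUT)

This only helps for programs that rely on C stdio buffering; unbuffer can be used from subprocess the same way (replace the stdbuf prefix with ["unbuffer"]).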
Possibly related:
Force line-buffering of stdout when piping to tee (suggests use of unbuffer)
java subprocess does not write its output until it terminates (a shorter explanation of mine written years ago)
How to get "instant" output of "tail -f" as input? (suggests stdbuf usage)
Piping of grep is not working with tail? (only for grep)
Related
I'm looking for a way to monitor a file that is written to by a program on Linux. I found the tail -F command here, and less +FG was also recommended. I tested it by running tail -F file in one terminal, and a simple Python script:
import time

for i in range(20):
    print i
    time.sleep(0.5)
in another. I redirected the output to the file:
python script.py >> file
I expected tail to track the file contents and update the display at fixed intervals; instead, it only shows what was written to the file after the command terminates.
The same thing happens with less +FG, and also if I watch the output of cat. I've also tried the usual redirect that truncates the file (> instead of >>). Here it says the file was truncated, but it still does not track it in real time.
Any idea why this doesn't work? (It's suggested here that it might be due to buffered writes, but since my script runs over 10 seconds, I suspect this might not be the cause)
Edit: In case it matters, I'm running Linux Mint 18.1
Python's standard output is buffered. If you only see all the output once you close the script or it finishes, that's definitely a buffering issue.
You can use this instead:
import time
import sys

for i in range(20):
    sys.stdout.write('%d\n' % i)
    sys.stdout.flush()
    time.sleep(0.5)
I've tested it and it prints values in real time. To overcome the buffering issue, I call .flush() after each .write() to force the buffer to be flushed.
Additional options from the comments (a short example follows the list):
Use the original print statement with sys.stdout.flush() afterwards
Run the python script with python -u for unbuffered binary stdout and stderr
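A quick sketch of both options, assuming Python 3, where print() accepts flush=True (on Python 2 you would call sys.stdout.flush() after the print statement instead):

import time

for i in range(20):
    print(i, flush=True)   # flush=True pushes each line out immediately
    time.sleep(0.5)

Alternatively, leave the script unchanged and run it as python -u script.py to disable buffering for the whole process.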
Regarding jon1467's answer (sorry, I can't comment on your answer), your understanding of redirection is wrong.
Try this :
dd if=/dev/urandom > test.txt
while watching the file size with:
ls -l test.txt
You'll see the file grow while dd is running.
Vinny's answer is correct, python standard output is buffered.
The more common way to avoid the "buffering effect" you noticed is to flush stdout, as Vinny showed you.
You could also use the -u option to disable buffering for the whole Python process, or you could reopen standard output with a buffer size of 0, as below (in Python 2 at least):
sys.stdout = os.fdopen(sys.stdout.fileno(), 'w', 0)
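On Python 3 that fdopen() call fails (a buffer size of 0 is not allowed for text streams); a small sketch of the closest equivalent, assuming Python 3.7+ where reconfigure() is available:

import sys

# Python 3.7+: switch the existing text stream to line buffering
sys.stdout.reconfigure(line_buffering=True)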
I am working on extracting PDFs from SEC filings. They usually come like this:
SEC Filing Example
For whatever reason, when I save the raw PDF to a .txt file and then try to run
uudecode -o output_file.pdf input_file.txt
from Python's subprocess.call() function, or any other Python function that executes command-line commands, the generated PDF files are corrupted. If I run this same command directly from the command line, there is no corruption.
Taking a closer look at the PDF file output by the Python script, it looks like the file ends prematurely. Is there some sort of output limit when executing a command-line command from Python?
Thanks!
This script worked fine for me running under Python 3.4.1 on Fedora 21 x86_64 with uudecode 4.15.2:
import subprocess
subprocess.call("uudecode -o output_file.pdf input_file.txt", shell=True)
Using the linked SEC filing (length: 173,141 B; sha1: e4f7fa2cbb3422411c2f2968d954d6bb9808b884), the decoded PDF (length: 124,557 B; sha1: 1676320e1d9923e14d19451c16688198bc93ca0d) appears correct when viewed.
There may be something else in your environment causing the problem. You may want to add additional details to your question.
Is there some sort of output limit when executing a command line command from python?
If by "output limit" you mean the size of the file being written by uudecode, then no. The only type of "output limit" you need to worry about when using the subprocess module is when you pass stdout=PIPE or stderr=PIPE when creating a child process. If the child process writes enough data to either of these streams, and your script does not regularly drain them, the child process will block (see the subprocess module documentation). In my test, uudecode wrote nothing to stdout or stderr.
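For completeness, a sketch of the pattern to use when you do capture the child's output, using the same file names as the question; communicate() drains both pipes, so even a very chatty child cannot block:

import subprocess

proc = subprocess.Popen(
    ["uudecode", "-o", "output_file.pdf", "input_file.txt"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()   # reads both pipes to EOF, then waits
if proc.returncode != 0:
    print("uudecode failed:", err.decode(errors="replace"))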
I have a code with function sys.stdin.readlines().
What is the difference between that and sys.stdin.buffer.readlines()?
What exactly do they do?
If they read lines from the command line, how do I stop reading lines at a certain point and proceed with the rest of the program?
1) sys.stdin is a TextIOWrapper; its purpose is to read text from stdin, so the resulting strings are actual strs. sys.stdin.buffer is a BufferedReader; the lines you get from it are byte strings.
2) They read all the lines from stdin until hitting EOF, or until they reach the limit you give them.
3) If you're trying to read a single line, you can use .readline() (note: no s). Otherwise, when interacting with the program on the command line, you'd have to give it the EOF signal (Ctrl+D on *nix).
Is there a reason you are doing this rather than just calling input() to get one text line at a time from stdin?
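A small sketch of the difference between the two (the sample lines shown in the comments are illustrative only):

import sys

# sys.stdin yields text lines (str); sys.stdin.buffer yields raw bytes.
# Pick one up front -- once a pipe has been consumed it cannot be re-read.
lines = sys.stdin.readlines()           # e.g. ['first\n', 'second\n']
# raw = sys.stdin.buffer.readlines()    # e.g. [b'first\n', b'second\n']

print([type(line) for line in lines])   # [<class 'str'>, ...]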
From the docs
sys.stdin
sys.stdout
sys.stderr
File objects corresponding to the interpreter’s standard input, output and error streams. stdin is used for all interpreter input except for scripts but including calls to input(). stdout is used for the output of print() and expression statements and for the prompts of input(). The interpreter’s own prompts and (almost all of) its error messages go to stderr. stdout and stderr needn’t be built-in file objects: any object is acceptable as long as it has a write() method that takes a string argument. (Changing these objects doesn’t affect the standard I/O streams of processes executed by os.popen(), os.system() or the exec*() family of functions in the os module.)
Note: The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc').
So, sys.stdin.readlines() reads everything passed to stdin and splits the contents into lines (you get a list as a result).
sys.stdin.buffer.readlines() does the same, but reads from stdin's underlying binary buffer. I'd suggest using the first method, as the buffer may be empty while stdin still contains some data.
If you want to stop at some moment, then use readline() to read only one line at a time.
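For example, a sketch that stops reading on a blank line (the blank-line sentinel is just an illustration; use whatever condition fits your program):

import sys

collected = []
while True:
    line = sys.stdin.readline()
    if not line or line.strip() == "":   # EOF or blank line: stop reading
        break
    collected.append(line.rstrip("\n"))

# ...the rest of the program continues here, using collected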
$ cat script.py
import sys
for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()
$ cat script.py - | python -u script.py
The output is right, but it only starts printing once I hit Ctrl-D, whereas the following starts printing right away:
$ cat script.py - | cat
which led me to think that the buffering does not come from cat.
I managed to get it working by doing:
for line in iter(sys.stdin.readline, ""):
as explained here: Streaming pipes in Python, but I don't understand why the former solution doesn't work as expected.
The Python manpage reveals the answer to your question:
-u   Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.
That is: file-object iterators' internal buffering is to blame (and it doesn't go away with -u).
cat does block buffering by default when its output goes to a pipe. So when you include - (stdin) in the cat command, it waits until it gets EOF (your Ctrl-D closes the stdin stream) or roughly 8K of data before outputting anything.
If you change the cat command to "cat script.py |" you'll see that it works as you expected.
Also, if you add 8K of comments to the end of script.py, it will immediately print it as well.
Edit:
The above is wrong. :-)
It turns out that file.next() (used by file iterators, i.e. for line in file) has a hidden read-ahead buffer that is not used by readline(), which simply reads a character at a time until it sees a newline or EOF.
I was wondering if there is a way to run a command-line executable in Python but pass it the argument values from memory, without having to write the data to a temporary file on disk. From what I have seen, it seems that subprocess.Popen(args) is the preferred way to run programs from inside Python scripts.
For example, I have a PDF file in memory. I want to convert it to text using the command-line tool pdftotext, which is present in most Linux distros. But I would prefer not to write the in-memory PDF to a temporary file on disk.
pdfInMemory = myPdfReader.read()
convertedText = subprocess.<method>(['pdftotext', ??]) <- what is the value of ??
What is the method I should call, and how should I pipe in-memory data into its standard input and pipe its output back to another variable in memory?
I am guessing there are other pdf modules that can do the conversion in memory and information about those modules would be helpful. But for future reference, I am also interested about how to pipe input and output to the commandline from inside python.
Any help would be much appreciated.
with Popen.communicate:
import subprocess
out, err = subprocess.Popen(
    ["pdftotext", "-", "-"],
    stdin=subprocess.PIPE,    # required so communicate() can send pdf_data
    stdout=subprocess.PIPE,
).communicate(pdf_data)
os.tmpfile is useful if you need something seekable. It uses a file, but it's nearly as simple as the pipe approach, and there's no need for cleanup.
import os
import subprocess

tf = os.tmpfile()          # Python 2 only; see the Python 3 sketch below
tf.write(...)
tf.seek(0)
subprocess.Popen(..., stdin=tf)
This may not work on the POSIX-impaired OS 'Windows'.
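Since os.tmpfile() exists only in Python 2, here is a hedged Python 3 sketch of the same idea using tempfile.TemporaryFile, with pdf_data assumed to be the in-memory bytes from the question and "-" telling pdftotext to read the PDF from stdin, as in the communicate() answer above:

import subprocess
import tempfile

with tempfile.TemporaryFile() as tf:     # anonymous, auto-deleted, seekable
    tf.write(pdf_data)                   # pdf_data: bytes object (assumption)
    tf.seek(0)
    subprocess.check_call(["pdftotext", "-", "out.txt"], stdin=tf)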
Popen.communicate from subprocess takes an input parameter that is used to send data to stdin; you can use that to pass in your data. You also get the output of your program from communicate, so you don't have to write it into a file.
The documentation for communicate explicitly warns that everything is buffered in memory, which seems to be exactly what you want to achieve.