I am using a third-party Python module which is normally called through terminal commands. When called that way, it has a verbose option which prints to the terminal in real time.
I then have another python program which calls the 3rd-party program through subprocess. Unfortunately, when called through subprocess the terminal output no longer flushes, and is only returned on completion (the process takes many hours so I would like real-time progress).
I can see the source code of the third-party module, and it does not flush its printing (e.g. print('example', flush=True)). Is there a way to force the flushing from my module without editing the third-party source code? Furthermore, can I send this output to a log file (again in real time)?
Thanks for any help.
The issue is most likely that many programs work differently depending on whether they are run interactively in a terminal or as part of a pipeline (i.e. called using subprocess). It has very little to do with Python itself and more with the Unix/Linux architecture.
As you have noted, it is possible to force a program to flush stdout even when run in a pipeline, but it requires changes to the source code, by manually adding flush calls (e.g. sys.stdout.flush() or print(..., flush=True)).
Another way to get real-time output is to "trick" the program into thinking it is working with an interactive terminal, using a so-called pseudo-terminal. There is a supporting module for this in the Python standard library, namely pty. Using that, you will not explicitly call subprocess.run (or Popen or ...). Instead you have to use the pty.spawn call:
import os
import pty

def prout(fd):
    # read the child's output in chunks and echo it as it arrives
    data = os.read(fd, 1024)
    while data:
        print(data.decode(), end="")
        data = os.read(fd, 1024)

pty.spawn("./callee.py", prout)
As can be seen, this requires a special function for handling stdout. Above I just print it to the terminal, but of course it is possible to do other things with the text as well (such as logging or parsing it).
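If you also want the output in a log file in real time, the read callback is the natural place to do it. Here is a minimal sketch, assuming a log file called callee.log; this variant returns each chunk so that pty.spawn echoes it to the terminal itself while the callback appends it to the log:

import os
import pty

log = open("callee.log", "ab", buffering=0)   # example log path, opened unbuffered

def master_read(fd):
    # read a chunk, append it to the log, and return it so pty.spawn
    # writes it to the terminal as well
    data = os.read(fd, 1024)
    log.write(data)
    return data

pty.spawn("./callee.py", master_read)
log.close()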
Another way to trick the program is to use an external program called unbuffer. unbuffer runs your command and makes the program think (as with the pty approach) that it is talking to a terminal. This is arguably simpler if unbuffer is installed, or you are allowed to install it, on your system (it is part of the expect package). All you have to do then is change your subprocess call to
p = subprocess.Popen(["unbuffer", "./callee.py"], stdout=subprocess.PIPE)
and then of course handle the output as usual, e.g. with some code like
for line in p.stdout:
    print(line.decode(), end="")
print(p.communicate()[0].decode(), end="")
or similar. But this last part I think you have already covered, as you seem to be doing something with the output.
I am rather confused about the purpose of these three files. If my understanding is correct, stdin is the file into which a program writes its requests to run a task in the process, stdout is the file into which the kernel writes its output (and from which the process requesting it accesses the information), and stderr is the file into which all the exceptions are entered. On opening these files to check whether this actually happens, I found nothing to suggest so!
What I would like to know is what exactly the purpose of these files is. An absolutely dumbed-down answer with very little tech jargon, please!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
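If it helps to see the same wiring from Python, here is a rough subprocess equivalent of that command line (the program and file names are just the ones from the example above):

import subprocess

with open("inputfile", "rb") as infile, open("errorfile", "wb") as errfile:
    my_prog = subprocess.Popen(["my_prog"], stdin=infile, stderr=errfile,
                               stdout=subprocess.PIPE)
    grep = subprocess.Popen(["grep", "XYZ"], stdin=my_prog.stdout)
    my_prog.stdout.close()   # so grep sees EOF when my_prog exits
    grep.wait()
    my_prog.wait()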
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather
than files. As you've noticed, these entities do not live in the filesystem. But the
Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice,
that really means that you can use the same library functions and interfaces (printf,
scanf, read, write, select, etc.) without worrying about whether the I/O stream
is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout,
and stderr are predefined for you, as a programming convenience. This is only
a convention, and is not enforced by the operating system.
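In Python, for example, the three predefined streams are exposed as ready-made file objects in the sys module, so a program can use them without opening anything:

import sys

line = sys.stdin.readline()                 # read one line of conventional input
sys.stdout.write("conventional output\n")   # results go here
sys.stderr.write("diagnostic output\n")     # errors and diagnostics go here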
As a complement to the answers above, here is a summary graphic of redirections:
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all; it's passing "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &>; however,
ls Documents ABC > dirlist 2>&1
# does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file name to redirect to, while 2>&1 simply sends stderr to wherever stdout currently points.
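For completeness, Python's subprocess module has a direct counterpart to 2>&1; a sketch using the same ls example as above:

import subprocess

with open("dirlist", "w") as out:
    # stderr=subprocess.STDOUT is the programmatic form of 2>&1
    subprocess.run(["ls", "Documents", "ABC"], stdout=out, stderr=subprocess.STDOUT)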
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the file needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out # redirect cat's standard out to /tmp/out
cat /nonexistent 2> /tmp/err # redirect cat's standard error to /tmp/err
cat < /etc/passwd # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references into the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured I'd keep it simple and use the easier functions, but I thought you should know the alternatives all the same. :)
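Python happens to expose both levels too, which may make the distinction concrete: sys.stdout is a buffered stream object (analogous to a FILE *), while os.write works directly on the underlying file descriptor:

import os
import sys

sys.stdout.write("goes through the buffered stream, like printf\n")
sys.stdout.flush()                                        # push the buffered data out
os.write(1, b"written straight to file descriptor 1, like write(2)\n")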
I think it is misleading to say that stderr should be used only for error messages.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (e.g. if you run a shell pipeline chaining several commands, you do not want informative messages like "getting item 30 of 42424" to appear on stdout, as they will confuse the consumer, but you might still want the user to see them).
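A small Python sketch of that pattern (items and process are hypothetical placeholders for whatever the tool actually does):

import sys

items = ["a", "b", "c"]      # hypothetical work items

def process(item):           # hypothetical processing step
    return item.upper()

for i, item in enumerate(items, 1):
    print(process(item))                                           # real data -> stdout
    print(f"getting item {i} of {len(items)}", file=sys.stderr)    # progress -> stderr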
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (cat < file.txt) or from another program (ps | grep <userid>).
The destinations for stdout and stderr can also be redirected. For example, stdout can be redirected to a file: ls . > ls-output.txt, in which case the output is written to the file ls-output.txt. stderr can be redirected in the same way with 2>.
Using ps aux reveals the current processes, all of which are listed in /proc/ as /proc/(pid)/. Looking at /proc/(pid)/fd/ shows what each of that process's standard streams is connected to. The mapping is:
/proc/(pid)/fd/0 - Standard Input
/proc/(pid)/fd/1 - Standard Output
/proc/(pid)/fd/2 - Standard Error
But this only worked well for /bin/bash; other processes generally had nothing in 0, while many had errors written in 2.
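On Linux you can check this for your own process from Python, for example:

import os

# /proc/self/fd lists the open file descriptors of the current process;
# '0', '1' and '2' are stdin, stdout and stderr
print(sorted(os.listdir("/proc/self/fd")))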
For authoritative information about these files, check out the man pages by running this command in your terminal:
$ man stdout
But for a simple answer, each file is for:
stdout for the output stream
stdin for the input stream
stderr for printing errors or log messages.
Every Unix program has each of those streams.
stderr does not use I/O cache buffering, so if your application needs to print critical information (errors, exceptions) to the console or to a file, use stderr. stdout does use I/O cache buffering, so there is a chance the application exits before its messages are actually written out, which makes debugging harder; use stdout for general log info.
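A small Python illustration of the difference: when stdout is redirected to a file it is block-buffered, and an abrupt exit skips the normal flush, so the stdout line can be lost while the stderr line survives:

import os
import sys

sys.stdout.write("this line may be lost if the process dies before a flush\n")
sys.stderr.write("this line is written out immediately\n")
os._exit(1)   # abrupt exit: buffered stdout data is never flushed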
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened, the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device.
https://www.mkssoftware.com/docs/man5/stdio.5.asp
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux—like almost everything else—are treated as though
they were files. You can read text from a file, and you can write text
into a file. Both of these actions involve a stream of data. So the
concept of handling a stream of data as a file isn’t that much of a
stretch.
Each file associated with a process is allocated a unique number to
identify it. This is known as the file descriptor. Whenever an action
is required to be performed on a file, the file descriptor is used to
identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
Ironically, I found this question on Stack Overflow and the article above because I was searching for information on abnormal / non-standard streams. So my search continues.
I am trying to integrate a small Win32 C++ program which reads from stdin and writes the decoded result (~128 kbytes) to the output stream.
I read the entire input into a buffer with
while (std::cin.get(c)) { }
and afterwards I write the entire output to stdout.
Everything works fine when I run the application from the command line, e.g. test.exe < input.bin > output.bin; however, this small app is supposed to be run from Python.
I expect that Python's subprocess.communicate is supposed to be used; the docs say:
Interact with process: Send data to stdin. Read data from stdout and
stderr, until end-of-file is reached. Wait for process to terminate.
So communicate() waits until end-of-file before waiting for my app to finish - is EOF supposed to happen when my application exits? Or should I explicitly do fclose(stderr) and fclose(stdout)?
Don't close stdout
In the general case, it is actually wrong, since it is possible to register a function with atexit() which tries to write to stdout, and this will break if stdout is closed.
When the process terminates, all handles are closed by the operating system automatically. This includes stdout, so you are not responsible for closing it manually.
(Technically, the C++ runtime will normally try to flush and close all C++ streams before the OS even has a chance to get involved, but the OS absolutely must close any handles which the runtime, for whatever reason, misses.)
In specialized circumstances, it may be useful to close standard streams (for example, when daemonizing), but it should be done with great care. It's usually a good idea to redirect to or from the null device (/dev/null on Unix, nul on Windows) so that code expecting to interact with those streams will still work. On Unix, this is done with freopen(3); the same function is available on Windows, since it is part of standard C.
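In Python, a rough equivalent of that redirect-to-null idea (an illustrative sketch, not the C freopen call itself) looks like this:

import os
import sys

# point stdout at the null device instead of closing it,
# so later writes to it still succeed
devnull = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull, sys.stdout.fileno())
os.close(devnull)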
import sh
sh.vim("lalala")
does not show the vim editor in my console. Setting _bg=False kwarg makes no change (since that's already the default value)
If instead I use the subprocess module, it works:
import subprocess
subprocess.call(["vim", "lalala"])
The problem is that vim expects its stdin to be a TTY, but the pipe created by sh is not a TTY, it's a pipe.
The solution is to not try to intercept vim's standard I/O with pipes. Since intercepting stdio with pipes is the entire purpose of sh, rather than trying to find a way to fight against it, you're better off not using it. Just use the stdlib's subprocess module, which only intercepts stdio if you go out of your way to ask it to:
subprocess.check_call(['vim', 'lalala'])
But notice the TTYs section in the sh docs:
Some applications behave differently depending on whether their standard file descriptors are attached to a TTY or not. For example, git will disable features intended for humans such as colored and paged output when STDOUT is not attached to a TTY. Other programs may disable interactive input if a TTY is not attached to STDIN. Still other programs, such as SSH (without -n), expect their input to come from a TTY/terminal.
By default, sh emulates a TTY for STDOUT but not for STDIN. You can change the default behavior by passing in extra special keyword arguments…
So, if you pass _tty_in=True, then vim's input will be an emulated TTY instead of a pipe.
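In other words, something like the following, taking the _tty_in keyword straight from the docs quoted above:

import sh

# ask sh to emulate a TTY for vim's stdin as well as its stdout
sh.vim("lalala", _tty_in=True)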
But that still isn't going to do much good. It'll allow vim to run, but it'll run using the fake TTY created by sh for its input and output, which I'm pretty sure is not what you want. (If you were looking to send it control sequences and capture and process the control sequences it sends back, it would almost certainly be simpler to just script ed—or, better, sed—instead…)
So why aren't you getting some kind of error message or other sane behavior?
Really, that's down to vim. If you try the same thing with emacs, or any app that uses curses, and many other TTY apps, they'll write an error message to stderr and exit with 1, so you'll see something like this:
ErrorReturnCode_1:
RAN: '/usr/bin/emacs -nw'
STDOUT:
STDERR:
emacs: standard input is not a tty
How can I receive input from the terminal in Python?
I am using Python to interface with another program which generates output from user input.
I am using subprocess.Popen() to send input to the program, but I can't set stdout to subprocess.PIPE because the program never seems to flush, so everything gets stuck in the buffer.
The program's default behaviour is to print to the terminal, and I do see output when I do not redirect stdout. However, I need Python to read and interpret the output, which currently only appears in the terminal.
Sorry if this is a stupid question, but I can't seem to get this to work.
Buffering in child processes is a common problem. Here are four possible approaches.
First, and easiest, you could read one byte at a time from your pipe. This is what I would call a "dirty hack" and it carries a performance penalty, but it's easy and it guarantees that your read() calls will only block until the first byte comes in, rather than wait for a buffer to fill up that's never going to fill up. However, this does not force the other process to flush its write buffer, so if that is the issue this approach will not help you anyway.
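A minimal sketch of that byte-at-a-time read ("./the_program" is a placeholder for your actual command):

import subprocess
import sys

p = subprocess.Popen(["./the_program"], stdout=subprocess.PIPE)
while True:
    b = p.stdout.read(1)
    if not b:                      # empty bytes means the pipe reached EOF
        break
    sys.stdout.write(b.decode())
    sys.stdout.flush()
p.wait()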
Second, and I think next-easiest, consider using the Twisted framework which has a facility for using a virtual terminal, or pty ("pseudo-teletype" I think) to talk to your child process. However, this can affect the design of your application (possibly for the better, but this may not be in the cards for you regardless). http://twistedmatrix.com/documents/current/core/howto/process.html
If neither of the above options works for you, you're reduced to solving gritty I/O concurrency issues yourself.
Third, try setting your pipes (all of them, before fork()) to non-blocking mode using fcntl() with O_NONBLOCK. Then you can use select() to test for read/write readiness before trying the read/write; but you still have to catch IOError and test for EAGAIN because it can happen even in this case. This may, depending on the behavior of the child process, allow you to wait until the data really shows up before trying to read it in.
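A rough sketch of that approach, assuming p is a subprocess.Popen created with stdout=subprocess.PIPE (a final drain of any leftover data after the loop is omitted for brevity):

import fcntl
import os
import select

fd = p.stdout.fileno()
flags = fcntl.fcntl(fd, fcntl.F_GETFL)
fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)   # put the pipe in non-blocking mode

while p.poll() is None:
    readable, _, _ = select.select([fd], [], [], 1.0)   # wait up to 1s for data
    if not readable:
        continue
    try:
        chunk = os.read(fd, 4096)
    except BlockingIOError:     # EAGAIN can still happen, as noted above
        continue
    if chunk:
        print(chunk.decode(), end="", flush=True)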
The last resort is to implement the PTY logic yourself. If you've seen references to stuff like termio options, ioctl() calls, etc. then that's what you're up against. I have not done this before, because it's complicated and I have never really needed to. If this is your destiny, good luck.
Have you tried setting bufsize in your Popen object to 0? I'm not sure if you can force the buffer to be unbuffered from the receiving side, but I'd try it.
http://docs.python.org/library/subprocess.html#using-the-subprocess-module
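Spelled out, that suggestion looks like this ("./the_program" is a placeholder); note that bufsize=0 only disables buffering on the reading side of the pipe, it cannot force the child to flush:

import subprocess

p = subprocess.Popen(["./the_program"], bufsize=0, stdout=subprocess.PIPE)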