Should I always close stdout explicitly? - python

I am trying to integrate a small Win32 C++ program which reads from stdin and writes the decoded result (~128 kbytes) to the output stream.
I read entire input into buffer with
while (std::cin.get(c)) { }
Afterwards I write the entire output to stdout.
Everything works fine when I run the application from command line eg test.exe < input.bin > output.bin, however this small app is supposed to be run from Python.
I expect that Python subprocess.communicate is supposed to be used, the docs say:
Interact with process: Send data to stdin. Read data from stdout and
stderr, until end-of-file is reached. Wait for process to terminate.
So communicate() will wait until end-of-file before waiting for my app to finish - is EOF supposed to happen when my application exits? Or should I explicitly do fclose(stderr) and fclose(stdout)?

Don't close stdout
In the general case, it is actually wrong, since it is possible to register a function with atexit() which tries to write to stdout, and this will break if stdout is closed.
When the process terminates, all handles are closed by the operating system automatically. This includes stdout, so you are not responsible for closing it manually.
(Technically, the C++ runtime will normally try to flush and close all C++ streams before the OS even has a chance to get involved, but the OS absolutely must close any handles which the runtime, for whatever reason, misses.)
In specialized circumstances, it may be useful to close standard streams (for example, when daemonizing), but it should be done with great care. It's usually a good idea to redirect to or from the null device (/dev/null on Unix, nul on Windows) so that code expecting to interact with those streams will still work. On Unix, this is done with freopen(3); Windows has an equivalent function, but it's part of the POSIX API and may not work well with standard Windows I/O.
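For the Python side of the question, here is a minimal sketch of the communicate() pattern. It assumes a stand-in child (a short Python script held in CHILD, not the asker's test.exe) that reads stdin to EOF and writes a transformed copy to stdout without ever closing it explicitly; run_decoder is a hypothetical helper name.

```python
import subprocess
import sys

# Hypothetical stand-in for test.exe: read stdin to EOF, write the result,
# and exit without closing stdout explicitly.
CHILD = (
    "import sys\n"
    "data = sys.stdin.buffer.read()\n"
    "sys.stdout.buffer.write(data.upper())\n"
)


def run_decoder(payload: bytes) -> bytes:
    """Send payload to the child's stdin and return everything it wrote to stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the payload, closes the child's stdin (so the child
    # sees EOF), reads stdout until EOF, and then waits for the child to exit.
    out, _ = proc.communicate(payload)
    return out


result = run_decoder(b"abc" * 40000)  # about 120 kB of input
print(len(result))
```

The parent reaches EOF on the pipe once the child exits, because the operating system closes the child's end of the pipe at that point.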

Related

Where do the stderr within the return value of subprocess.Popen() originate? [duplicate]

I am rather confused about the purpose of these three files. If my understanding is correct, stdin is the file into which a program writes its requests to run a task in the process, stdout is the file into which the kernel writes its output and from which the requesting process reads it, and stderr is the file into which all the exceptions are entered. On opening these files to check whether this actually happens, I found nothing that seemed to suggest so!
What I would want to know is what exactly is the purpose of these files, absolutely dumbed down answer with very little tech jargon!
Standard input - this is the file handle that your process reads to get information from you.
Standard output - your process writes conventional output to this file handle.
Standard error - your process writes diagnostic output to this file handle.
That's about as dumbed-down as I can make it :-)
Of course, that's mostly by convention. There's nothing stopping you from writing your diagnostic information to standard output if you wish. You can even close the three file handles totally and open your own files for I/O.
When your process starts, it should already have these handles open and it can just read from and/or write to them.
By default, they're probably connected to your terminal device (e.g., /dev/tty) but shells will allow you to set up connections between these handles and specific files and/or devices (or even pipelines to other processes) before your process starts (some of the manipulations possible are rather clever).
An example being:
my_prog <inputfile 2>errorfile | grep XYZ
which will:
create a process for my_prog.
open inputfile as your standard input (file handle 0).
open errorfile as your standard error (file handle 2).
create another process for grep.
attach the standard output of my_prog to the standard input of grep.
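The same wiring can be sketched from Python with the subprocess module. This is only an illustration under assumptions: PRODUCER and FILTER are tiny Python scripts standing in for my_prog and grep, and run_pipeline is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-ins for my_prog and grep, written as tiny Python scripts.
PRODUCER = """
import sys
for line in sys.stdin:
    sys.stdout.write(line)
sys.stderr.write("my_prog: done\\n")
"""

FILTER = """
import sys
for line in sys.stdin:
    if "XYZ" in line:
        sys.stdout.write(line)
"""


def run_pipeline(input_path, error_path):
    """Rough equivalent of: my_prog <inputfile 2>errorfile | grep XYZ"""
    with open(input_path, "rb") as fin, open(error_path, "wb") as ferr:
        producer = subprocess.Popen(
            [sys.executable, "-c", PRODUCER],
            stdin=fin,               # file handle 0 comes from inputfile
            stdout=subprocess.PIPE,  # file handle 1 feeds the pipe
            stderr=ferr,             # file handle 2 goes to errorfile
        )
        consumer = subprocess.Popen(
            [sys.executable, "-c", FILTER],
            stdin=producer.stdout,   # read end of the producer's pipe
            stdout=subprocess.PIPE,
        )
        # Drop the parent's copy of the pipe so only the consumer holds it.
        producer.stdout.close()
        matched, _ = consumer.communicate()
        producer.wait()
    return matched


with tempfile.TemporaryDirectory() as workdir:
    input_path = os.path.join(workdir, "inputfile")
    error_path = os.path.join(workdir, "errorfile")
    with open(input_path, "w") as f:
        f.write("abc\nXYZ one\ndef\nXYZ two\n")
    matched = run_pipeline(input_path, error_path)
    with open(error_path) as f:
        diagnostics = f.read()

print(matched.splitlines())
```

All of the handles are set up before either child starts running, which is the same thing the shell does for the command line above.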
Re your comment:
When I open these files in /dev folder, how come I never get to see the output of a process running?
It's because they're not normal files. While UNIX presents everything as a file in a file system somewhere, that doesn't make it so at the lowest levels. Most files in the /dev hierarchy are either character or block devices, effectively a device driver. They don't have a size but they do have a major and minor device number.
When you open them, you're connected to the device driver rather than a physical file, and the device driver is smart enough to know that separate processes should be handled separately.
The same is true for the Linux /proc filesystem. Those aren't real files, just tightly controlled gateways to kernel information.
It would be more correct to say that stdin, stdout, and stderr are "I/O streams" rather
than files. As you've noticed, these entities do not live in the filesystem. But the
Unix philosophy, as far as I/O is concerned, is "everything is a file". In practice,
that really means that you can use the same library functions and interfaces (printf,
scanf, read, write, select, etc.) without worrying about whether the I/O stream
is connected to a keyboard, a disk file, a socket, a pipe, or some other I/O abstraction.
Most programs need to read input, write output, and log errors, so stdin, stdout,
and stderr are predefined for you, as a programming convenience. This is only
a convention, and is not enforced by the operating system.
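To make the "same interface for everything" point concrete, here is a small Python sketch in which one function reads from a pipe and from a regular file using the same call. The helper name read_all is an assumption made for the example.

```python
import os
import tempfile


def read_all(fd):
    """Read a descriptor until EOF, without caring what kind of object it is."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # an empty read means end-of-file
            break
        chunks.append(chunk)
    return b"".join(chunks)


# Case 1: the descriptor is the read end of a pipe.
r, w = os.pipe()
os.write(w, b"via pipe\n")
os.close(w)  # closing the write end is what produces EOF for the reader
from_pipe = read_all(r)
os.close(r)

# Case 2: the descriptor belongs to an ordinary file on disk.
with tempfile.TemporaryFile() as f:
    f.write(b"via file\n")
    f.flush()
    f.seek(0)
    from_file = read_all(f.fileno())

print(from_pipe, from_file)
```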
As a complement to the answers above, here is a summary of redirections:
EDIT: This graphic is not entirely correct.
The first example does not use stdin at all, it's passing "hello" as an argument to the echo command.
The graphic also says 2>&1 has the same effect as &> however
ls Documents ABC > dirlist 2>&1
#does not give the same output as
ls Documents ABC > dirlist &>
This is because &> requires a file to redirect to, and 2>&1 is simply sending stderr into stdout
I'm afraid your understanding is completely backwards. :)
Think of "standard in", "standard out", and "standard error" from the program's perspective, not from the kernel's perspective.
When a program needs to print output, it normally prints to "standard out". A program typically prints output to standard out with printf, which prints ONLY to standard out.
When a program needs to print error information (not necessarily exceptions, those are a programming-language construct, imposed at a much higher level), it normally prints to "standard error". It normally does so with fprintf, which accepts a file stream to use when printing. The file stream could be any file opened for writing: standard out, standard error, or any other file that has been opened with fopen or fdopen.
"standard in" is used when the program needs to read input, using fread or fgets, or getchar.
Any of these files can be easily redirected from the shell, like this:
cat /etc/passwd > /tmp/out # redirect cat's standard out to /tmp/out
cat /nonexistant 2> /tmp/err # redirect cat's standard error to /tmp/err
cat < /etc/passwd # redirect cat's standard input to /etc/passwd
Or, the whole enchilada:
cat < /etc/passwd > /tmp/out 2> /tmp/err
There are two important caveats: First, "standard in", "standard out", and "standard error" are just a convention. They are a very strong convention, but it's all just an agreement that it is very nice to be able to run programs like this: grep echo /etc/services | awk '{print $2;}' | sort and have the standard outputs of each program hooked into the standard input of the next program in the pipeline.
Second, I've given the standard ISO C functions for working with file streams (FILE * objects) -- at the kernel level, it is all file descriptors (int references to the file table) and much lower-level operations like read and write, which do not do the happy buffering of the ISO C functions. I figured to keep it simple and use the easier functions, but I thought all the same you should know the alternatives. :)
I think people saying stderr should be used only for error messages is misleading.
It should also be used for informative messages that are meant for the user running the command and not for any potential downstream consumers of the data (i.e. if you run a shell pipe chaining several commands, you do not want informative messages like "getting item 30 of 42424" to appear on stdout, as they will confuse the consumer, but you might still want the user to see them).
See this for historical rationale:
"All programs placed diagnostics on the standard output. This had
always caused trouble when the output was redirected into a file, but
became intolerable when the output was sent to an unsuspecting
process. Nevertheless, unwilling to violate the simplicity of the
standard-input-standard-output model, people tolerated this state of
affairs through v6. Shortly thereafter Dennis Ritchie cut the Gordian
knot by introducing the standard error file. That was not quite enough.
With pipelines diagnostics could come from any of several programs
running simultaneously. Diagnostics needed to identify themselves."
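The split between data on stdout and user-facing messages on stderr can be sketched in Python like this. The child script in CHILD is made up for the illustration; the parent captures the two streams separately, the way a downstream consumer would only see stdout.

```python
import subprocess
import sys

# Hypothetical child: real data goes to stdout, progress chatter to stderr.
CHILD = """
import sys
for i in range(3):
    print(f"getting item {i + 1} of 3", file=sys.stderr)
    print(i * i)
"""

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True,
    text=True,
)

data_lines = proc.stdout.splitlines()      # what a downstream consumer would read
progress_lines = proc.stderr.splitlines()  # what the user would see on the terminal

print(data_lines)
print(progress_lines)
```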
stdin
Reads input through the console (e.g. Keyboard input).
Used in C with scanf
scanf(<formatstring>,<pointer to storage> ...);
stdout
Produces output to the console.
Used in C with printf
printf(<string>, <values to print> ...);
stderr
Produces 'error' output to the console.
Used in C with fprintf
fprintf(stderr, <string>, <values to print> ...);
Redirection
The source for stdin can be redirected. For example, instead of coming from keyboard input, it can come from a file (cat < file.txt), or another program (ps | grep <userid>).
The destinations for stdout, stderr can also be redirected. For example stdout can be redirected to a file: ls . > ls-output.txt, in this case the output is written to the file ls-output.txt. Stderr can be redirected with 2>.
Using ps -aux reveals the current processes, all of which are listed in /proc/ as /proc/(pid)/. Calling cat /proc/(pid)/fd/0 reads from whatever is attached to the standard input of that process, I think, since the entries follow the usual descriptor numbering. So perhaps,
/proc/(pid)/fd/0 - Standard Input File
/proc/(pid)/fd/1 - Standard Output File
/proc/(pid)/fd/2 - Standard Error File
for example
But only worked this well for /bin/bash other processes generally had nothing in 0 but many had errors written in 2
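A quick way to see what those /proc entries point at, sketched in Python. This assumes a Linux-style /proc filesystem; describe_standard_fds is a hypothetical helper, and it returns an empty result where /proc is not available.

```python
import os


def describe_standard_fds(pid="self"):
    """Map descriptors 0, 1 and 2 to whatever they currently point at (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    if not os.path.isdir(fd_dir):
        return targets  # no /proc filesystem here (e.g. macOS or Windows)
    for fd, name in ((0, "stdin"), (1, "stdout"), (2, "stderr")):
        try:
            # Each entry is a symlink to the terminal, pipe, or file behind the descriptor.
            targets[name] = os.readlink(os.path.join(fd_dir, str(fd)))
        except OSError:
            targets[name] = None  # descriptor closed or not readable
    return targets


standard_fds = describe_standard_fds()
for name, target in standard_fds.items():
    print(f"{name}: {target}")
```

In an interactive shell all three usually point at the same terminal device, which is why reading them with cat rarely shows anything useful for other processes.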
For authoritative information about these files, check out the man pages, run the command on your terminal.
$ man stdout
But for a simple answer, each file is for:
stdout for a stream out
stdin for a stream input
stderr for printing errors or log messages.
Each unix program has each one of those streams.
stderr does not use full I/O buffering, so if your application needs to print critical messages (errors, exceptions) to the console or to a file, use stderr. Use stdout for general log output: because stdout is buffered, there is a chance the application terminates before the buffered messages are written out, which makes debugging harder.
A file with associated buffering is called a stream and is declared to be a pointer to a defined type FILE. The fopen() function creates certain descriptive data for a stream and returns a pointer to designate the stream in all further transactions. Normally there are three open streams with constant pointers declared in the header and associated with the standard open files.
At program startup three streams are predefined and need not be opened explicitly: standard input (for reading conventional input), standard output (for writing conventional output), and standard error (for writing diagnostic output). When opened the standard error stream is not fully buffered; the standard input and standard output streams are fully buffered if and only if the stream can be determined not to refer to an interactive device
https://www.mkssoftware.com/docs/man5/stdio.5.asp
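The buffering difference can be observed from Python, with the caveat that this goes through Python's own I/O layer rather than C stdio, so treat it as an analogy and not as a demonstration of the man page quoted above. The child script and the merged_output helper are assumptions made for the sketch.

```python
import os
import subprocess
import sys

# Hypothetical child: writes to stdout first, then to stderr, with no flush.
CHILD = """
import sys
sys.stdout.write("out\\n")
sys.stderr.write("err\\n")
"""


def merged_output(extra_args=()):
    """Run the child with stdout and stderr merged into a single pipe."""
    # Remove PYTHONUNBUFFERED so the default run really uses default buffering.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        env=env,
        text=True,
    )
    return result.stdout.splitlines()


default_order = merged_output()          # stdout is block-buffered on a pipe
unbuffered_order = merged_output(["-u"])  # -u disables that buffering

print(default_order)
print(unbuffered_order)
```

With -u the lines arrive in the order they were written. Without it, recent Python versions typically deliver the stderr line first, because the stdout line sits in a buffer until the child exits.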
Here is a lengthy article on stdin, stdout and stderr:
What Are stdin, stdout, and stderr on Linux?
To summarize:
Streams Are Handled Like Files
Streams in Linux—like almost everything else—are treated as though
they were files. You can read text from a file, and you can write text
into a file. Both of these actions involve a stream of data. So the
concept of handling a stream of data as a file isn’t that much of a
stretch.
Each file associated with a process is allocated a unique number to
identify it. This is known as the file descriptor. Whenever an action
is required to be performed on a file, the file descriptor is used to
identify the file.
These values are always used for stdin, stdout, and stderr:
0: stdin
1: stdout
2: stderr
Ironically I found this question on stack overflow and the article above because I was searching for information on abnormal / non-standard streams. So my search continues.

How do you trivially change a Python script to capture everything written to stdout by itself and its subprocesses?

Suppose you have a big script that writes to stdout in many places, both directly (using Python features like print() or logging that goes to stdout) and indirectly by launching subprocesses which write to stdout.
Is there a trivial way to capture all this stdout?
For example, if you want the script to send an email with all its output when it completes.
By "trivial" I mean a constant rather than linear code change. Otherwise, I believe you will have to introduce redirection parameters (and some accumulation logic) into every single subprocess call. You can capture all the output of the script itself by redirecting sys.stdout; however, I don't see a similar "catch-all" trivial solution for all the subprocess calls, or indeed whatever other types of code you may be using to launch these subprocesses.
Is there any such solution, or must one use a runner script that will call this Python script as a subprocess and capture all stdout from that subprocess?
Probably the shortest way to do so, and not Python-specific, would be to use os.dup2(), e.g.:
import os
f = open('/tmp/OUT', 'w')
os.dup2(f.fileno(), 1)
f.close()
This replaces file descriptor 1, which would normally be your stdout, with the file descriptor of f (which you can then close). After that, all writes to stdout end up in /tmp/OUT. The duplication is inherited: subprocesses have fd 1 writing to the same file.
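A slightly fuller sketch of the same idea, assuming a POSIX-like system. It saves the original descriptor 1 so it can be restored afterwards; capture_fd1 and noisy are hypothetical names introduced for the example.

```python
import os
import subprocess
import sys
import tempfile


def capture_fd1(action):
    """Run action() with file descriptor 1 pointing at a temp file; return what was written."""
    sys.stdout.flush()   # do not let earlier buffered output leak into the capture
    saved = os.dup(1)    # remember the real stdout so it can be restored
    with tempfile.TemporaryFile() as f:
        os.dup2(f.fileno(), 1)
        try:
            action()
            sys.stdout.flush()  # push Python-level buffered text down to fd 1
        finally:
            os.dup2(saved, 1)   # put the original stdout back
            os.close(saved)
        f.seek(0)
        return f.read()


def noisy():
    print("from the script itself")
    # The child inherits fd 1, so its output lands in the same file.
    subprocess.run([sys.executable, "-c", "print('from a subprocess')"], check=True)


captured = capture_fd1(noisy)
print(captured.decode())
```

The order of the two lines in the captured data is not guaranteed, since the script's own print() is buffered until the flush while the subprocess writes when it exits.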

python 2.7 Popen: what does `close_fds` do?

I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where after some number of these Popens have occurred (roughly 64), the process runs out of file descriptors and is unable to function -- it becomes completely unresponsive and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and inheritance of file descriptors so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
It sounds like it would actually close all open file descriptors (which includes active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds like it would be strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine and thus far I have been unable to construct a scenario where it actually closes any other request sockets, database requests, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess
fp = open('test.txt', 'w')
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed, close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsecho this stuff in test.txt.
So it appears that the meaning of close in close_fds does not mean that open files (sockets, etc.) in the parent process will be unusable after executing a child process.
Also worth noting: subprocess32 defaults close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.
I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file, and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open due to the subprocess. Now, let's say we want to delete the directory with the file in another thread using shutil.rmtree. On a regular filesystem, this should not be an issue. The directory is just removed as expected. However, when the file resides on NFS, the following happens: First, Python will try to delete the file. Since the file is still in use, it gets renamed to .nfsXXX instead, where XXX is a long hexadecimal number. Next, Python will try to delete the directory, but that has become impossible because the .nfsXXX file still resides in it.
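The leak into the child can be checked directly. The following is a sketch for POSIX systems on Python 3, where descriptors are non-inheritable by default, so the example marks one as inheritable to mimic the Python 2.7 behaviour discussed here. PROBE and child_sees_fd are hypothetical names.

```python
import os
import subprocess
import sys

# Hypothetical probe: reports whether the given descriptor number refers to the
# same object (same inode) as it did in the parent.
PROBE = """
import os, sys
fd, inode = int(sys.argv[1]), int(sys.argv[2])
try:
    print("inherited" if os.fstat(fd).st_ino == inode else "not inherited")
except OSError:
    print("not inherited")
"""


def child_sees_fd(fd, close_fds):
    inode = os.fstat(fd).st_ino
    result = subprocess.run(
        [sys.executable, "-c", PROBE, str(fd), str(inode)],
        close_fds=close_fds,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


r, w = os.pipe()
os.set_inheritable(r, True)  # mimic Python 2.7, where descriptors were inheritable

leaked = child_sees_fd(r, close_fds=False)  # the child gets its own copy of r
hidden = child_sees_fd(r, close_fds=True)   # r is closed in the child only

# The parent's descriptor is untouched either way.
parent_still_open = os.fstat(r).st_ino is not None
os.close(r)
os.close(w)

print(leaked, hidden, parent_still_open)
```

The closing happens in the child between fork and exec, which is why the parent's files and sockets stay usable.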

twisted reactor.spawnProcess get stdout w/o buffering on windows

I'm running an external process and I need to get the stdout immediately so I can push it to a textview, on GNU/Linux I can use "usePTY=True" to get the stdout by line, unfortunately usePTY is not available on windows.
I'm fairly new to twisted, is there a way to achieve the same result on Windows with some twisted (or python maybe) magic stuff?
on GNU/Linux I can use "usePTY=True" to get the stdout by line
Sort of! What usePTY=True actually does is create a PTY (a "pseudo-terminal" - the thing you always get when you log in to a shell on GNU/Linux unless you have a real terminal which no one does anymore :) instead of a boring old pipe. A PTY is a lot like a pipe but it has some extra features - but more importantly for you, a PTY is strongly associated with interactive sessions (ie, a user) whereas a pipe is pretty strongly associated with programmatic uses (think foo | bar - no user ever sees the output of foo).
This means that people tend to use existence of a PTY as stdout as a signal that they should produce output in a timely manner - because a human is waiting to see it. On the flip side, the existence of a regular old pipe as stdout is taken as a signal that another program is consuming the output and they should instead produce output in the most efficient way possible.
What this tends to mean in practice is that if a program has a PTY then it will line buffer its output and if it has a pipe then it will "block" buffer its output (usually gather up about 4kB of data before writing any of it) - because line buffering is less efficient.
The thing to note here is that it is the program you are running that does this buffering. Whether you pass usePTY=True or usePTY=False makes no direct difference to that buffering: it is just a hint to the program you are running what kind of output buffering it should do.
This means that you might run programs that block buffer even if you pass usePTY=True and vice versa.
However... Windows doesn't have PTYs. So programs on Windows can't consider PTYs as a hint for how to buffer their output.
I don't actually know if there is another hint that it is conventional for programs to respect on Windows. I've never come across one, at least.
If you're lucky, then the program you're running will have some way for you to request line-buffered output. If you're running Python, then it does - the PYTHONUNBUFFERED environment variable controls this, as does the -u command line option (and I think they both work on Windows).
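As a rough illustration of the PYTHONUNBUFFERED route, here is a sketch using the plain subprocess module instead of Twisted (with Twisted, the same environment variable could be passed through the env argument of reactor.spawnProcess). The child script and the stream_lines helper are made up for the example.

```python
import os
import subprocess
import sys

# Hypothetical child: prints a line, then sleeps, never flushing explicitly.
CHILD = """
import time
for i in range(3):
    print("tick", i)
    time.sleep(0.1)
"""


def stream_lines(on_line):
    """Run the child unbuffered and hand each line to on_line as it arrives."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")  # same effect as passing -u
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        env=env,
        text=True,
    )
    with proc.stdout:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.wait()


received = []
exit_code = stream_lines(received.append)
print(received, exit_code)
```

This only helps when the child is itself a Python program; other programs need their own switch for unbuffered or line-buffered output, if they offer one.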
Incidentally, if you plan to pass binary data between the two processes, then you probably also want to put stdio into binary mode in the child process as well:
import os, sys, msvcrt
msvcrt.setmode(sys.stdin.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
msvcrt.setmode(sys.stderr.fileno(), os.O_BINARY)

Python subprocess interaction, why does my process work with Popen.communicate, but not Popen.stdout.read()?

I am trying to communicate with a command-line chat bot with Python using the subprocess module. (http://howie.sourceforge.net/ using the compiled win32 binary, I have my reasons!)
This works:
proc = Popen('Howie/howie.exe', stdout=PIPE,stderr=STDOUT,stdin=PIPE)
output = proc.communicate()
But Popen.communicate waits for the process to terminate (and sends it EOF?), I want to be able to interact with it. The apparent solution for this was to read stdout / write stdin like so:
This doesn't work:
proc = Popen('Howie/howie.exe', stdout=PIPE,stderr=STDOUT,stdin=PIPE)
while True: print proc.stdout.readline()
(Note that I am actually using more complex code based on http://code.activestate.com/recipes/440554/ but the issue is the same.)
The problem is, the second approach works perfectly for communicating to cmd, but when I run the chatbot, nothing. So my question is, how is this different in capturing output to using Popen.communicate()?
i.e. I can use the second approach to use the command line as per normal, until I run the chatbot, at which point I stop receiving output. Using the first approach correctly displays the first few lines of output from the bot, but leaves me unable to interact with it.
One major difference between the two is that communicate() closes stdin after sending the data. I don't know about your particular case, but in many cases this means that if a process is waiting for the end of the user input, it will get it when communicate() is used, and will never get it when the code blocks on read() or readline().
Try adding Popen.stdin.close() first and see if it affects your case.
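A minimal sketch of that suggestion, assuming a child that only answers once it has seen end-of-input (the one-line script in CHILD is a stand-in, not the chat bot from the question):

```python
import subprocess
import sys

# Hypothetical child: produces nothing until stdin reaches EOF.
CHILD = "import sys; data = sys.stdin.read(); print(len(data.splitlines()))"

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

proc.stdin.write("hello\nworld\n")
proc.stdin.close()  # this is what delivers EOF to the child

# Without the close() above, this readline() would block forever.
reply = proc.stdout.readline().strip()
proc.stdout.close()
proc.wait()

print(reply)
```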
If you want to interact with the program after sending the EOF, rather than using Popen.stdin.close(), you can manually send the command-line End Of File character, which has the same effect but leaves stdin open.
In Python this character's escape sequence is '\x1a'.
