I am confused about a few things when it comes to the issue of stdout and stderr being buffered/unbuffered:
1)
Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming language library functions (in particular, the write() or print() functions) that I am working with?
While programming in C, I have always gone by the rule that stdout is buffered while stderr is unbuffered. I have seen this in action by calling sleep() after putchar() statements within a while loop: the individual characters appeared on stderr one by one, while only complete lines appeared on stdout. When I tried to replicate this program in Python, both stderr and stdout had the same behaviour: both produced complete lines. So I looked this up and found a post that said:
sys.stderr is line-buffered by default since Python 3.9.
Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS, but apparently code libraries are free to implement their own behaviour? Can I hypothetically write a routine that writes to stdout without a buffer?
The relevant code snippets for reference:
/* C */
int c;
while ((c = fgetc(file)) != EOF) {
    fputc(c, stdout /* or stderr */);
    usleep(800);
}
# Python
for line in file:
    for ch in line:
        print(ch, end='', file=sys.stdout)  # or sys.stderr
        time.sleep(0.08)
2)
Secondly, my understanding of the need for buffering is that, since disk access is slower than RAM access, writing individual bytes would be inefficient and thus bytes are written in blocks. But is writing to a device file like /dev/stdout or /dev/stderr the same as writing to disk? (Isn't disk supposed to be permanent? Stuff written to stdout or stderr only appears in the terminal, if connected, and is then lost, right?)
3)
Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?
Is the statement "stdout/err is buffered/unbuffered" decided by my operating system or by the programming language library functions (in particular, the write() or print() functions) that I am working with?
Mostly it is decided by the programming language implementation, and language specifications standardize this. For example, the C language specification says:
At program startup, three text streams are predefined and need not be
opened explicitly — standard input (for reading conventional input),
standard output (for writing conventional output), and standard error (for writing diagnostic output). As initially opened, the
standard error stream is not fully buffered; the standard input and
standard output streams are fully buffered if and only if the stream
can be determined not to refer to an interactive device.
(C2017, paragraph 7.21.3/7)
Similarly, the Python docs for sys.stdin, sys.stdout, and sys.stderr say:
When interactive, the stdout stream is line-buffered. Otherwise, it is
block-buffered like regular text files. The stderr stream is
line-buffered in both cases. You can make both streams unbuffered by
passing the -u command-line option or setting the PYTHONUNBUFFERED
environment variable.
Be aware, however, that both of those particular languages provide mechanisms to change the buffering of the standard streams (or in the Python case, at least stdout and stderr).
MOREOVER, the above is relevant only if you are using streams (C) or File objects (Python). In C, this is what all of the stdio functions use -- printf(), fgets(), fwrite(), etc. -- but it is not what (say) the POSIX raw I/O functions such as read() and write() use. If you use raw I/O interfaces such as the latter then there is only whatever buffering you perform manually.
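To make the distinction concrete, here is a minimal Python sketch (Python since that's what the question uses): sys.stdout.write() goes through Python's buffered file object, while os.write() on the underlying file descriptor issues the raw system call, with no buffering beyond what you do yourself:

import os
import sys

sys.stdout.write("buffered: may sit in Python's internal buffer for a while")
os.write(sys.stdout.fileno(), b"raw: handed straight to the kernel\n")
sys.stdout.flush()  # without this, the buffered text can appear *after* the raw text

(Mixing the two layers on the same descriptor, as this sketch deliberately does, is exactly the sort of bypass warned about below.)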
Hence the question - because I was under the impression that the behaviour of stderr being buffered/unbuffered was decided and fixed by the OS
No. The OS (at least Unixes (including Mac) and Windows) does not perform this kind of stream buffering on behalf of programs -- once a write() system call is made, the data is out of the program's hands. (The kernel may still cache the data internally before it reaches the device, but that is a separate matter from the stream buffering discussed here.) Programming language implementations do perform stream buffering, under some circumstances, and they are then in control of the details.
but apparently code libraries are free to implement their own behaviour?
It's a bit more nuanced than that, but basically yes.
Can I hypothetically write a routine that writes to stdout without a buffer?
Maybe. In C or Python, at least, you can exert some control over the buffering mode of the stdout stream. In C you can change it at runtime with setvbuf() (as long as you do so before the first I/O on the stream); in Python the default buffering mode is decided when the interpreter starts, though since Python 3.7 you can adjust it somewhat with sys.stdout.reconfigure() or by re-wrapping the underlying file descriptor.
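As a sketch of what that runtime control can look like in Python (one approach among several; the -u flag and PYTHONUNBUFFERED mentioned in the docs quoted above are the supported startup-time switches):

import io
import os
import sys

# buffering=0 is only valid in binary mode; this gives a raw, unbuffered layer.
raw = os.fdopen(sys.stdout.fileno(), 'wb', buffering=0)
# write_through=True makes the text layer pass every write straight down.
sys.stdout = io.TextIOWrapper(raw, write_through=True)
print("appears immediately, even without a newline", end='')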
You may also be able to bypass the buffer of a buffered stream by performing (raw) I/O on the underlying file descriptor, but this is extremely poor form, and depending on the details, it may produce undefined behavior.
Secondly, my understanding of the need for buffering is that, since disk access is slower than RAM access, writing individual bytes would be inefficient and thus bytes are written in blocks.
All I/O is slow, even I/O to a terminal. Disk I/O tends to be especially slow, but program performance generally benefits from buffering I/O to all devices.
But is writing to a device file like /dev/stdout or /dev/stderr the same as writing to disk?
Sometimes it is exactly writing to disk (look up I/O redirection). Different devices do have different performance characteristics, so buffering may improve performance more with some than with others, but again, all I/O is slow.
Finally, is there really a need for stderr to be unbuffered in C if it is less efficient?
The point of stderr being unbuffered (by default) in C is so that messages directed there are written to the underlying device (often a terminal) as soon as possible. Efficiency is not really a concern for the kinds of messages that this policy is most intended to serve.
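As an aside, the question's Python experiment can be made to behave like the C one without touching any stream's buffering mode: print() accepts flush=True, which pushes each character out as soon as it is written. A sketch:

import sys
import time

for ch in "characters appear one by one\n":
    print(ch, end='', file=sys.stderr, flush=True)  # flush defeats the line buffer
    time.sleep(0.08)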
https://linux.die.net/man/3/stderr, https://linux.die.net/man/3/setbuf, and https://linux.die.net/man/2/write are helpful resources here
If you use the raw syscall write, there won't be any user-space buffering. I'd imagine the same is true of the WinAPI equivalent, but I don't know.
Python and C want to make it easier to write things, so they wrap the raw syscalls with a file pointer (in C) / file object (in Python). This wrapper, in addition to storing the raw file descriptor used to make the syscalls, can optionally do things like buffering, to reduce the number of syscalls you're making.
You can change the buffering settings of a file or stream. (In C that's setbuf()/setvbuf(); I'm not sure for Python.)
C and Python just happen to have different default configurations of stderr's wrapper.
For 2), writing to a pipe is usually much faster than writing to disk, but it's still a relatively slow operation compared to a memcpy or the like (which is essentially all that writing into a buffer is): the processor has to jump into kernel mode and back for every write.
For 3), I'd guess that C developers decided it was more important to get errors on-time than to get performance. In general, if your program is spitting out lots of data to stderr you have bigger problems than performance.
Related
In Python, I can open a file with f = open(<filename>, <permissions>). This returns an object f which I can write to using f.write(<some data>).
If, at this point, I access the original file (e.g. with cat from a terminal), it appears empty: Python stored the data I wrote in the object f, not in the actual on-disk file. If I then call f.close(), the data in f is persisted to the on-disk file (and I can access it from other programs).
I assume data is buffered to improve performance. However, what happens if the buffered data grows a lot? Will Python initiate a write? If so, details on the internals (what influences the buffer size? is the disk I/O handled within Python or by another program/thread? is there a chance Python will just hang during the write?) would be much appreciated.
The general subject of I/O buffering has been treated many times (including in questions linked from the comments). But to answer your specific questions:
By default, when writing to a terminal (“the screen”), a newline causes the text to be flushed up through it. For all files, the buffer is flushed each time it fills. (Large single writes might flush any existing buffer contents and then bypass it.)
The buffer has a fixed size and is allocated before any data is written; Python 3 doesn’t use stdio, so it chooses its own buffer sizes. (A few kB is typical.)
The “disk I/O” (really kernel I/O, which is distinguishable only in certain special circumstances like a network or power failure) happens within whichever Python write call triggers the flush.
Yes, it can hang, if the file is a pipe to a busy process, a socket over a slow network, a special device, or even a regular file mounted from a remote machine.
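A small self-contained demonstration of the behaviour described in the question (the file name is made up):

import os

f = open('demo.txt', 'w')            # block-buffered text file
f.write('hello')                     # lands in Python's buffer, not in the file
print(os.path.getsize('demo.txt'))   # 0 -- nothing has reached the OS yet
f.flush()                            # push the buffer down to the OS
print(os.path.getsize('demo.txt'))   # 5
f.close()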
In the Python HDF5 library h5py, do I need to flush() a file before I close() it?
Or does closing the file already make sure that any data that might still be in the buffers will be written to disk?
What exactly is the point of flushing? When would flushing be necessary?
No, you do not need to flush the file before closing. Flushing is done automatically by the underlying HDF5 C library when you close the file.
As to the point of flushing: file I/O is slow compared to things like memory or cache access. If programs had to wait until data was actually on the disk each time a write was performed, that would slow things down a lot. So the actual writing to disk is buffered by at least the OS, and in many cases also by the I/O library being used (e.g., the C standard I/O library). When you ask to write data to a file, it usually just means that the OS has copied your data to its own internal buffer, and it will actually put the data on the disk when it's convenient to do so.
Flushing overrides this buffering, at whatever level the call is made. So calling h5py.File.flush() will flush the HDF5 library buffers, but not necessarily the OS buffers. The point of this is to give the program some control over when data actually leaves a buffer.
For example, writing to the standard output is usually line-buffered. But if you really want to see the output before a newline, you can call fflush(stdout). This might make sense if you are piping the standard output of one process into another: that downstream process can start consuming the input right away, without waiting for the OS to decide it's a good time.
Another good example is making a call to fork(2). This usually copies the entire address space of a process, which means the I/O buffers as well. That may result in duplicated output, unnecessary copying, etc. Flushing a stream guarantees that the buffer is empty before forking.
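A POSIX-only sketch of that fork() pitfall in Python (os.fork() does not exist on Windows): without the flush, the pending text sits in the buffers of both parent and child and can end up printed twice.

import os
import sys

sys.stdout.write("partial line, still in the buffer")
sys.stdout.flush()  # comment this out and the text may appear twice
pid = os.fork()
# ...both processes continue from here, each with its own copy of the buffer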
I learned that by default I/O in programs is buffered, i.e. it is served from temporary storage to the requesting program.
I understand that buffering improves I/O performance (perhaps by reducing system calls). I have seen examples of disabling buffering, like setvbuf in C. What is the difference between the two modes, and when should one be used over the other?
You want unbuffered output whenever you want to ensure that the output has been written before continuing. One example is standard error under a C runtime library - this is usually unbuffered by default. Since errors are (hopefully) infrequent, you want to know about them immediately. On the other hand, standard output is buffered simply because it's assumed there will be far more data going through it.
Another example is a logging library. If your log messages are held within buffers in your process, and your process dumps core, there is a very good chance that output will never be written.
In addition, it's not just system calls that are minimized but disk I/O as well. Let's say a program reads a file one byte at a time. With unbuffered input, you will go out to the (relatively very slow) disk for every byte even though it probably has to read in a whole block anyway (the disk hardware itself may have buffers but you're still going out to the disk controller which is going to be slower than in-memory access).
By buffering, the whole block is read in to the buffer at once then the individual bytes are delivered to you from the (in-memory, incredibly fast) buffer area.
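In Python terms, the contrast looks something like this (the file name is hypothetical; buffering=0 is only allowed in binary mode):

# Unbuffered: every read(1) is a separate trip to the OS.
raw = open('data.bin', 'rb', buffering=0)
while raw.read(1):   # one system call per byte
    pass

# Buffered (the default): the first read(1) pulls in a whole block,
# and later reads are served from the in-memory buffer.
buf = open('data.bin', 'rb')
while buf.read(1):   # a handful of system calls for the whole file
    pass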
Keep in mind that buffering can take many forms, such as in the following example:
+-------------------+-------------------+
| Process A | Process B |
+-------------------+-------------------+
| C runtime library | C runtime library | C RTL buffers
+-------------------+-------------------+
| OS caches | Operating system buffers
+---------------------------------------+
| Disk controller hardware cache | Disk hardware buffers
+---------------------------------------+
| Disk |
+---------------------------------------+
You want unbuffered output when you already have large sequence of bytes ready to write to disk, and want to avoid an extra copy into a second buffer in the middle.
Buffered output streams will accumulate write results into an intermediate buffer, sending it to the OS file system only when enough data has accumulated (or flush() is requested). This reduces the number of file system calls. Since file system calls can be expensive on most platforms (compared to short memcpy), buffered output is a net win when performing a large number of small writes. Unbuffered output is generally better when you already have large buffers to send -- copying to an intermediate buffer will not reduce the number of OS calls further, and introduces additional work.
Unbuffered output has nothing to do with ensuring your data reaches the physical disk. Neither buffered nor unbuffered writes guarantee that -- the OS file system is free to hold on to a copy of your data for a while before writing it out. flush() works on both buffered and unbuffered streams, but it only pushes application-level buffers down to the OS; to force the data onto the physical disk you need an explicit sync call such as fsync(2) (os.fsync in Python). (Note that close() will call flush() on your behalf, but it does not sync.)
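A short sketch of the layers in Python (the file name is made up): flush() empties the application-level buffer, and a separate sync call is what actually pushes the data toward the disk.

import os

f = open('log.txt', 'w')
f.write('important record')   # sits in the application-level buffer
f.flush()                     # buffer -> OS page cache
os.fsync(f.fileno())          # ask the OS to commit it to the physical disk
f.close()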
First question, so please be gentle.
I am using Python.
When creating a named pipe to a C++ Windows program with
PIPE = open(r'\\.\pipe\NamedPipe','rb+',0)
as a global, I can read from and write to the pipe.
def pipe_writer():
    PIPE.write(some_stuff)

def pipe_reader():
    data = struct.unpack("byte-type", PIPE.read(number_of_bytes))

pipe_writer()
pipe_reader()
This is fine for collecting data from the pipe and processing the complete data with several functions, one function after the other.
Unfortunately, I have to process the data bit by bit as I pull it from the pipe, with several functions in a serialized manner.
I thought that queueing the data would do the job, so I use the multiprocessing module.
When I try to multiprocess, I am able to create the pipe and send data once when opening it after:
if __name__ == '__main__':
    PIPE = open(r'\\.\pipe\NamedPipe','rb+',0)
    PIPE.write(some_stuff)
When I then try to .start() the functions as processes and read from the pipe, I get an error that the pipe doesn't exist or is open in the wrong mode, which can't really be, as it works just fine when reading/writing to it without using Process() on the functions AND I can write to it ... even if it's only once.
Any suggestions? Also, I think I kinda need to use multiprocessing, as threading doesn't work ... probably ... because of the GIL slowing stuff down.
If you're in control of the C++ source code too, you can save yourself a lot of code and hassle by moving on to using ZeroMQ or Nanomsg instead of the pipe, and Google Protocol Buffers instead of interpreting a byte stream yourself.
ZeroMQ and Nanomsg are like networks / pipes / IPC on steroids, and are much easier to use than raw pipes, sockets, etc. You have less source code and more functionality: win-win.
Google's Protocol Buffers allow you to define data structures (messages) in a language-neutral way, and then auto-generate source code in C++, Python, Java, or whatever. This source code defines structs, classes, etc. that represent the messages, and also converts them to a standard binary format. That binary data is what you'll send via ZeroMQ. Again, less source code for you to write, more functionality.
This is ideal for getting C++ classes into Python and vice versa.
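For a flavour of what the Python side of such a replacement might look like, here is a minimal pyzmq sketch (the endpoint and payload are made up; the C++ side would bind the matching endpoint):

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PAIR)
sock.connect("tcp://127.0.0.1:5555")  # hypothetical endpoint
sock.send(b"some_stuff")              # whole messages -- no manual framing
reply = sock.recv()                   # blocks until the C++ side answers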
A nanomsg Python wrapper is also available on GitHub at Nanomsg Python.
You can see examples at Examples. I guess this wrapper will serve your purpose. It's always better to use this in place of raw pipes. It supports IPC, between-process, and TCP communication patterns.
Moreover, it is cross-platform and its basic implementation is in C, so I guess communication between Python and C processes can also be made possible.
If you run a couple of threads but they all have to print to the same stdout, does this mean they have to wait on each other? So, say all 4 threads have something to write: do they have to pause and wait for stdout to be free so they can get on with their work?
Deep deep (deep deep deep...) down in the OS's system calls, yes. Modern OSes have thread-safe terminal printing routines which usually just lock around the critical sections that do the actual device access (or buffer, depending on what you're writing into and what its settings are). These waits are very short, however. Keep in mind that this is IO you're dealing with here, so the wait times are likely to be negligible relative to actual IO execution.
It depends. If stdout is a pipe, each pipe gets a 4KB buffer which you can override when the pipe is created. Buffers are flushed when the buffer is full or with a call to flush().
If stdout is a terminal, output is usually line buffered. So until you print a newline, all threads can write to their buffers. When the newline is written, the whole buffer is dumped on the console and all other threads that are writing newlines at the same time have to wait.
Since threads do other things than writing newlines, each thread gets some CPU. So even in the worst case, the congestion should be pretty small.
There is one exception, though: If you write a lot of data or if the console is slow (like the Linux kernel debug console which uses the serial port). When the console can't cope with the amount of data, more and more threads will hang in the write of the newline waiting for the buffers to flush.
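For completeness, a small sketch of the scenario in the question: four Python threads sharing stdout, with an explicit lock so the serialization is visible at the application level rather than buried in the OS.

import threading

lock = threading.Lock()

def worker(n):
    for i in range(3):
        with lock:                        # one whole line at a time
            print(f"thread {n}: line {i}")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()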