Curious python print behavior - python

I'm using a print statement in a python 2.7 script in which I'm creating instances of data modeling classes. They're fairly large classes which do a good number of calculations in property setters during the init, so it's not the fastest executing script. I use print statements to have some idea of progress, but what's interesting is how they're executing. The code looks something like this:
from __future__ import absolute_import, division, print_function, unicode_literals
print('Loading data...', end='\t')
data = LoadData(data_path)
first_model = FirstModel(parameters).fit(data)
print('Done.\nFitting second model...', end='\t')
# prints 'Done.' and then there's a very long pause...
# suddenly 'Fitting second model...' prints and the next model initializes almost immediately
second_model = SecondModel(parameters).fit(data)
results = second_model.forecast(future_dates)
Why would the statement print('Done.\nFitting second model...', end=\t') first print 'Done.' and then pause for a long period of time? There was one instance when I was running this code, and after the 'Done.' printed I got an error before the rest of the statement printed. The error returned was an error in SecondModel where I tried too access a method as an attribute. What's going on here? How or why is python executing this print statement in such a counterintuitive way? It's as if the interpreter views the new line character as an indication that it should start looking at later parts of the code.

By default, print calls are buffered. The buffer is flushed whenever a newline character is encountered (therefore, you see Done\n appear). However, the subsequent text is kept in the buffer until the next event that flushes it (in the absence of some subsequent newline character to print, that'll probably be Python either returning to the command prompt or exiting completely to the shell, depending on how you're running this script). Therefore, your time-consuming call to SecondModel().fit() is occurring between the display of the two lines.
To avoid this, you can flush the buffer manually by calling sys.stdout.flush() immediately after the print. Or, if you were ever to move to Python 3.3 or higher, you would be able to shortcut this by passing the additional argument flush=True into print().
Error messages can interrupt printed output, and vice versa, because by default they are handled by two separate streams: sys.stderr and sys.stdout, respectively. The two streams have separate buffers.

Related

`time.sleep()` causing previous `print()` with `end=''` to delay [duplicate]

I have a python script that performs a simulation. It takes a fairly long, varying time to run through each iteration, so I print a . after each loop as a way to monitor how fast it runs and how far it went through the for statement as the script runs. So the code has this general structure:
for step in steps:
run_simulation(step)
# Python 3.x version:
print('.', end='')
# for Python 2.x:
# print '.',
However, when I run the code, the dots do not appear one by one. Instead, all the dots are printed at once when the loop finishes, which makes the whole effort pointless. How can I print the dots inline as the code runs?
This problem can also occur when iterating over data fed from another process and trying to print results, for example to echo input from an Electron app. See Python not printing output.
The issue
By default, output from a Python program is buffered to improve performance. The terminal is a separate program from your code, and it is more efficient to store up text and communicate it all at once, rather than separately asking the terminal program to display each symbol.
Since terminal programs are usually meant to be used interactively, with input and output progressing a line at a time (for example, the user is expected to hit Enter to indicate the end of a single input item), the default is to buffer the output a line at a time.
So, if no newline is printed, the print function (in 3.x; print statement in 2.x) will simply add text to the buffer, and nothing is displayed.
Outputting in other ways
Every now and then, someone will try to output from a Python program by using the standard output stream directly:
import sys
sys.stdout.write('test')
This will have the same problem: if the output does not end with a newline, it will sit in the buffer until it is flushed.
Fixing the issue
For a single print
We can explicitly flush the output after printing.
In 3.x, the print function has a flush keyword argument, which allows for solving the problem directly:
for _ in range(10):
print('.', end=' ', flush=True)
time.sleep(.2) # or other time-consuming work
In 2.x, the print statement does not offer this functionality. Instead, flush the stream explicitly, using its .flush method. The standard output stream (where text goes when printed, by default) is made available by the sys standard library module, and is named stdout. Thus, the code will look like:
for _ in range(10):
print '.',
sys.stdout.flush()
time.sleep(.2) # or other time-consuming work
For multiple prints
Rather than flushing after every print (or deciding which ones need flushing afterwards), it is possible to disable the output line buffering completely. There are many ways to do this, so please refer to the linked question.

Why does this for loop wait until the end of the iteration to print everything? [duplicate]

I have a python script that performs a simulation. It takes a fairly long, varying time to run through each iteration, so I print a . after each loop as a way to monitor how fast it runs and how far it went through the for statement as the script runs. So the code has this general structure:
for step in steps:
run_simulation(step)
# Python 3.x version:
print('.', end='')
# for Python 2.x:
# print '.',
However, when I run the code, the dots do not appear one by one. Instead, all the dots are printed at once when the loop finishes, which makes the whole effort pointless. How can I print the dots inline as the code runs?
This problem can also occur when iterating over data fed from another process and trying to print results, for example to echo input from an Electron app. See Python not printing output.
The issue
By default, output from a Python program is buffered to improve performance. The terminal is a separate program from your code, and it is more efficient to store up text and communicate it all at once, rather than separately asking the terminal program to display each symbol.
Since terminal programs are usually meant to be used interactively, with input and output progressing a line at a time (for example, the user is expected to hit Enter to indicate the end of a single input item), the default is to buffer the output a line at a time.
So, if no newline is printed, the print function (in 3.x; print statement in 2.x) will simply add text to the buffer, and nothing is displayed.
Outputting in other ways
Every now and then, someone will try to output from a Python program by using the standard output stream directly:
import sys
sys.stdout.write('test')
This will have the same problem: if the output does not end with a newline, it will sit in the buffer until it is flushed.
Fixing the issue
For a single print
We can explicitly flush the output after printing.
In 3.x, the print function has a flush keyword argument, which allows for solving the problem directly:
for _ in range(10):
print('.', end=' ', flush=True)
time.sleep(.2) # or other time-consuming work
In 2.x, the print statement does not offer this functionality. Instead, flush the stream explicitly, using its .flush method. The standard output stream (where text goes when printed, by default) is made available by the sys standard library module, and is named stdout. Thus, the code will look like:
for _ in range(10):
print '.',
sys.stdout.flush()
time.sleep(.2) # or other time-consuming work
For multiple prints
Rather than flushing after every print (or deciding which ones need flushing afterwards), it is possible to disable the output line buffering completely. There are many ways to do this, so please refer to the linked question.

Clearing the terminal window in Python, memory/performance

I'm using a script that runs for many hours, which prints statements to verify whether or not issues might have arisen (data is downloaded from the web, which sometimes gets distorted).
I've noticed a significant drop in performance after a while. I suspect that the many thousands of lines of print statements might be the reason.
It is commonly known that the terminal can be cleared of these print statements by the following line of code:
import os
os.system('cls') # for windows
Still, I suspect that this doesn't actually improve the performance speed and that it's merely a perceived improvement due to the fact that the screen is cleared. Is that true or not?
I've also considered suppressing certain print statements with the following code:
import sys
class NullWriter(object):
def write(self, arg):
pass
nullwrite = NullWriter()
oldstdout = sys.stdout
sys.stdout = oldstdout # enable output
print("text that I want to see")
sys.stdout = nullwrite # disable output
print("text I don't want to see")
My question: How can I improve the performance (speed) of my script, given that I still want to see the most recent print statements?
If you like you can just do a line feed without a carriage return and override the last line:
sys.stdout.write("\rDoing things")
sys.stdout.flush()
Printing over time shouldn't use any extra memory within python, but you might have your terminal's buffer set to high which can use a lot of memory. Or it's just taking time to flush the buffer because you're writing so fast to stdout.
You can also use the print function
Python 2.6+
From Python 2.6 you can import the print function from Python 3:
from __future__ import print_function
This allows you to use the Python 3 solution below.
Python 3
In Python 3, the print statement has been changed into a function. In Python 3, you can instead do:
print('.', end='')

linux " close failed in file object destructor" error in python script [duplicate]

NB: I have not attempted to reproduce the problem described below under Windows, or with versions of Python other than 2.7.3.
The most reliable way to elicit the problem in question is to pipe the output of the following test script through : (under bash):
try:
for n in range(20):
print n
except:
pass
I.e.:
% python testscript.py | :
close failed in file object destructor:
sys.excepthook is missing
lost sys.stderr
My question is:
How can I modify the test script above to avoid the error message when the script is run as shown (under Unix/bash)?
(As the test script shows, the error cannot be trapped with a try-except.)
The example above is, admittedly, highly artificial, but I'm running into the same problem sometimes when the output of a script of mine is piped through some 3rd party software.
The error message is certainly harmless, but it is disconcerting to end-users, so I would like to silence it.
EDIT: The following script, which differs from the original one above only in that it redefines sys.excepthook, behaves exactly like the one given above.
import sys
STDERR = sys.stderr
def excepthook(*args):
print >> STDERR, 'caught'
print >> STDERR, args
sys.excepthook = excepthook
try:
for n in range(20):
print n
except:
pass
How can I modify the test script above to avoid the error message when the script is run as shown (under Unix/bash)?
You will need to prevent the script from writing anything to standard output. That means removing any print statements and any use of sys.stdout.write, as well as any code that calls those.
The reason this is happening is that you're piping a nonzero amount of output from your Python script to something which never reads from standard input. This is not unique to the : command; you can get the same result by piping to any command which doesn't read standard input, such as
python testscript.py | cd .
Or for a simpler example, consider a script printer.py containing nothing more than
print 'abcde'
Then
python printer.py | python printer.py
will produce the same error.
When you pipe the output of one program into another, the output produced by the writing program gets backed up in a buffer, and waits for the reading program to request that data from the buffer. As long as the buffer is nonempty, any attempt to close the writing file object is supposed to fail with an error. This is the root cause of the messages you're seeing.
The specific code that triggers the error is in the C language implementation of Python, which explains why you can't catch it with a try/except block: it runs after the contents of your script has finished processing. Basically, while Python is shutting itself down, it attempts to close stdout, but that fails because there is still buffered output waiting to be read. So Python tries to report this error as it would normally, but sys.excepthook has already been removed as part of the finalization procedure, so that fails. Python then tries to print a message to sys.stderr, but that has already been deallocated so again, it fails. The reason you see the messages on the screen is that the Python code does contain a contingency fprintf to write out some output to the file pointer directly, even if Python's output object doesn't exist.
Technical details
For those interested in the details of this procedure, let's take a look at the Python interpreter's shutdown sequence, which is implemented in the Py_Finalize function of pythonrun.c.
After invoking exit hooks and shutting down threads, the finalization code calls PyImport_Cleanup to finalize and deallocate all imported modules. The next-to-last task performed by this function is removing the sys module, which mainly consists of calling _PyModule_Clear to clear all the entries in the module's dictionary - including, in particular, the standard stream objects (the Python objects) such as stdout and stderr.
When a value is removed from a dictionary or replaced by a new value, its reference count is decremented using the Py_DECREF macro. Objects whose reference count reaches zero become eligible for deallocation. Since the sys module holds the last remaining references to the standard stream objects, when those references are unset by _PyModule_Clear, they are then ready to be deallocated.1
Deallocation of a Python file object is accomplished by the file_dealloc function in fileobject.c. This first invokes the Python file object's close method using the aptly-named close_the_file function:
ret = close_the_file(f);
For a standard file object, close_the_file(f) delegates to the C fclose function, which sets an error condition if there is still data to be written to the file pointer. file_dealloc then checks for that error condition and prints the first message you see:
if (!ret) {
PySys_WriteStderr("close failed in file object destructor:\n");
PyErr_Print();
}
else {
Py_DECREF(ret);
}
After printing that message, Python then attempts to display the exception using PyErr_Print. That delegates to PyErr_PrintEx, and as part of its functionality, PyErr_PrintEx attempts to access the Python exception printer from sys.excepthook.
hook = PySys_GetObject("excepthook");
This would be fine if done in the normal course of a Python program, but in this situation, sys.excepthook has already been cleared.2 Python checks for this error condition and prints the second message as a notification.
if (hook && hook != Py_None) {
...
} else {
PySys_WriteStderr("sys.excepthook is missing\n");
PyErr_Display(exception, v, tb);
}
After notifying us about the missing excepthook, Python then falls back to printing the exception info using PyErr_Display, which is the default method for displaying a stack trace. The very first thing this function does is try to access sys.stderr.
PyObject *f = PySys_GetObject("stderr");
In this case, that doesn't work because sys.stderr has already been cleared and is inaccessible.3 So the code invokes fprintf directly to send the third message to the C standard error stream.
if (f == NULL || f == Py_None)
fprintf(stderr, "lost sys.stderr\n");
Interestingly, the behavior is a little different in Python 3.4+ because the finalization procedure now explicitly flushes the standard output and error streams before builtin modules are cleared. This way, if you have data waiting to be written, you get an error that explicitly signals that condition, rather than an "accidental" failure in the normal finalization procedure. Also, if you run
python printer.py | python printer.py
using Python 3.4 (after putting parentheses on the print statement of course), you don't get any error at all. I suppose the second invocation of Python may be consuming standard input for some reason, but that's a whole separate issue.
1Actually, that's a lie. Python's import mechanism caches a copy of each imported module's dictionary, which is not released until _PyImport_Fini runs, later in the implementation of Py_Finalize, and that's when the last references to the standard stream objects disappear. Once the reference count reaches zero, Py_DECREF deallocates the objects immediately. But all that matters for the main answer is that the references are removed from the sys module's dictionary and then deallocated sometime later.
2Again, this is because the sys module's dictionary is cleared completely before anything is really deallocated, thanks to the attribute caching mechanism. You can run Python with the -vv option to see all the module's attributes being unset before you get the error message about closing the file pointer.
3This particular piece of behavior is the only part that doesn't make sense unless you know about the attribute caching mechanism mentioned in previous footnotes.
I ran into this sort of issue myself today and went looking for an answer. I think a simple workaround here is to ensure you flush stdio first, so python blocks instead of failing during script shutdown. For example:
--- a/testscript.py
+++ b/testscript.py
## -9,5 +9,6 ## sys.excepthook = excepthook
try:
for n in range(20):
print n
+ sys.stdout.flush()
except:
pass
Then with this script nothing happens, as the exception (IOError: [Errno 32] Broken pipe) is suppressed by the try...except.
$ python testscript.py | :
$
In your program throws an exception that can not be caught using try/except block. To catch him, override function sys.excepthook:
import sys
sys.excepthook = lambda *args: None
From documentation:
sys.excepthook(type, value, traceback)
When an exception is raised and uncaught, the interpreter calls
sys.excepthook with three arguments, the exception class, exception
instance, and a traceback object. In an interactive session this
happens just before control is returned to the prompt; in a Python
program this happens just before the program exits. The handling of
such top-level exceptions can be customized by assigning another
three-argument function to sys.excepthook.
Illustrative example:
import sys
import logging
def log_uncaught_exceptions(exception_type, exception, tb):
logging.critical(''.join(traceback.format_tb(tb)))
logging.critical('{0}: {1}'.format(exception_type, exception))
sys.excepthook = log_uncaught_exceptions
I realize that this is an old question, but I found it in a Google search for the error. In my case it was a coding error. One of my last statements was:
print "Good Bye"
The solution was simply fixing the syntax to:
print ("Good Bye")
[Raspberry Pi Zero, Python 2.7.9]

Three ways to print in Python -- when to use each?

According to Tim Peters, "There should be one-- and preferably only one --obvious way to do it." In Python, there appears to be three ways to print information:
print('Hello World', end='')
sys.stdout.write('Hello World')
os.write(1, b'Hello World')
Question: Are there best-practice policies that state when each of these three different methods of printing should be used in a program?
Note that the statement of Tim is perfectly correct: there is only one obvious way to do it: print().
The other two possibilities that you mention have different goals.
If we want to summarize the goals of the three alternatives:
print is the high-level function that allow you to write something to stdout(or an other file). It provides a simple and readable API, with some fancy options about how the single items are separated, or whether you want to add or not a terminator etc. This is what you want to do most of the time.
sys.stdout.write is just a method of the file objects. So the real point of sys.stdout is that you can pass it around as if it were any other file. This is useful when you have to deal with a function that is expecting a file and you want it to print the text directly on stdout.
In other words you shouldn't use sys.stdout.write at all. You just pass around sys.stdout to code that expects a file.
Note: in python2 there were some situations where using the print statement produced worse code than calling sys.stdout.write. However the print function allows you to define the separator and terminator and thus avoids almost all these corner cases.
os.write is a low-level call to write to a file. You must manually encode the contents and you also have to pass the file descriptor explicitly. This is meant to handle only low level code that, for some reason, cannot be implemented on top of the higher-level interfaces. You almost never want to call this directly, because it's not required and has a worse API than the rest.
Note that if you have code that should write down things on a file, it's better to do:
my_file.write(a)
# ...
my_file.write(b)
# ...
my_file.write(c)
Than:
print(a, file=my_file)
# ...
print(b, file=my_file)
# ...
print(c, file=my_file)
Because it's more DRY. Using print you have to repeat file= everytime. This is fine if you have to write only in one place of the code, but if you have 5/6 different writes is much easier to simply call the write method directly.
To me print is the right way to print to stdout, but :
There is a good reason why sys.stdout.write exists - Imagine a class which generates some text output, and you want to make it write to either stdout, and file on disk, or a string. Ideally the class really shouldn't care what output type it is writing to. The class can simple be given a file object, and so long as that object supports the write method, the class can use the write method to output the text.
Two of these methods require importing entire modules. Based on this alone, print() is the best standard use option.
sys.stdout is useful whenever stdout may change. This gives quite a bit of power for stream handling.
os.write is useful for os specific writing tasks (non blocking writes for instance)
This question has been asked a number of times on this site for sys.stdout vs. print:
Python - The difference between sys.stdout.write and print
print() vs sys.stdout.write(): which and why?
One example for using os.write (non blocking file writes demonstrated in the question below). The function may only be useful on some os's but it still must remain portable even when certain os's don't support different/special behaviors.
How to write to a file using non blocking IO?

Categories