I have a python script that performs a simulation. It takes a fairly long, varying time to run through each iteration, so I print a . after each loop as a way to monitor how fast it runs and how far it went through the for statement as the script runs. So the code has this general structure:
for step in steps:
    run_simulation(step)
    # Python 3.x version:
    print('.', end='')
    # for Python 2.x:
    # print '.',
However, when I run the code, the dots do not appear one by one. Instead, all the dots are printed at once when the loop finishes, which makes the whole effort pointless. How can I print the dots inline as the code runs?
This problem can also occur when iterating over data fed from another process and trying to print results, for example to echo input from an Electron app. See Python not printing output.
The issue
By default, output from a Python program is buffered to improve performance. The terminal is a separate program from your code, and it is more efficient to store up text and communicate it all at once, rather than separately asking the terminal program to display each symbol.
Since terminal programs are usually meant to be used interactively, with input and output progressing a line at a time (for example, the user is expected to hit Enter to indicate the end of a single input item), the default is to buffer the output a line at a time.
So, if no newline is printed, the print function (in 3.x; print statement in 2.x) will simply add text to the buffer, and nothing is displayed.
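To make the buffering visible, here is a minimal sketch that swaps the terminal for an in-memory byte sink. The names `terminal` and `stream` are made up for the illustration, and the behaviour shown is that of CPython's `io.TextIOWrapper` with `line_buffering=True`, which is how `sys.stdout` is normally set up when attached to a terminal.

```python
import io

# An in-memory byte sink stands in for the terminal (an assumption made
# so the effect can be observed without a real console).
terminal = io.BytesIO()

# line_buffering=True mimics the usual configuration of sys.stdout on a terminal.
stream = io.TextIOWrapper(terminal, encoding="utf-8", newline="\n",
                          line_buffering=True)

stream.write("...")                      # no newline: the text waits in the buffer
seen_before_newline = terminal.getvalue()

stream.write("\n")                       # the newline triggers a flush
seen_after_newline = terminal.getvalue()

print(seen_before_newline, seen_after_newline)
```

On CPython the first value should be empty and the second should contain the dots plus the newline, which matches the "all the dots at once" symptom from the question.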
Outputting in other ways
Every now and then, someone will try to output from a Python program by using the standard output stream directly:
import sys
sys.stdout.write('test')
This will have the same problem: if the output does not end with a newline, it will sit in the buffer until it is flushed.
Fixing the issue
For a single print
We can explicitly flush the output after printing.
In 3.x, the print function has a flush keyword argument, which allows for solving the problem directly:
import time
for _ in range(10):
    print('.', end=' ', flush=True)
    time.sleep(.2) # or other time-consuming work
In 2.x, the print statement does not offer this functionality. Instead, flush the stream explicitly, using its .flush method. The standard output stream (where text goes when printed, by default) is made available by the sys standard library module, and is named stdout. Thus, the code will look like:
import sys
import time
for _ in range(10):
    print '.',
    sys.stdout.flush()
    time.sleep(.2) # or other time-consuming work
For multiple prints
Rather than flushing after every print (or deciding which ones need flushing afterwards), it is possible to disable the output line buffering completely. There are many ways to do this, so please refer to the linked question.
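One lightweight option in Python 3, short of turning buffering off for the whole process, is to bind `flush=True` once and use the resulting callable everywhere. The sketch below assumes hypothetical names `print_now` and `run_steps`; the `work` callable stands in for the real simulation step. Running the script with `python -u`, or with the `PYTHONUNBUFFERED` environment variable set, is another route that needs no code changes.

```python
import functools
import time

# print_now is a hypothetical name: print with flush=True baked in.
print_now = functools.partial(print, flush=True)


def run_steps(steps, work=lambda step: time.sleep(0.01), out=None):
    """Run work(step) for each step, printing a dot as soon as each one finishes."""
    for step in steps:
        work(step)  # placeholder for the time-consuming part
        print_now(".", end="", file=out)
    print_now(file=out)  # finish the line of dots


run_steps(range(5))
```

Passing `out=None` lets `print` fall back to `sys.stdout`, so the same helper works for a terminal or for any other file-like object.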
Related
I'm a beginner at Python and coding in general, and I've been primarily using the trinket interpreter to work out some of my scripts. I was just trying to get used to defining a function with if and elif lines alongside the return command.
The code I'm trying to run is a pretty simple one, but when I run it regularly nothing shows up. However, when I run it through the console it comes out fine. What am I doing wrong?
def the_line(text):
    if text == ("just a boy"):
        phrase = ("I fly alone")
    elif text == "syndrome":
        phrase = ("darn you syndrome")
    return phrase
the_line("just a boy")
The first picture is what happens when I run it regularly and the second is through the console.
In the console, when you run a statement but don't assign it to anything, the shell will print the resulting object. You call the function but don't save it in a variable, so it is displayed. The "console" in your IDE is also called a Read, Evaluate and Print Loop (REPL).
But your code really just discarded that return value. That's what you see in the first case, the returned object wasn't assigned to anything and the object was deleted. You could assign it to a variable and print, if you want to see it.
def the_line(text):
    if text == ("just a boy"):
        phrase = ("I fly alone")
    elif text == "syndrome":
        phrase = ("darn you syndrome")
    return phrase
foo = the_line("just a boy")
print(foo)
(As a side note, 4 spaces for indents please. We are not running out of spaces).
This is very clearly explained in Think Python's section 2.4. It has everything to do with the Read-Eval-Print-Loop (REPL) concept.
Briefly, the console is a REPL, so you see the output because it Prints what it Evaluated after Reading something (and then it prompts again for you to input something, that's the loop part). When you run the way you call "regularly", you are in what is called "script mode" (as in Think Python). Script mode simply Reads and Evaluates, there is not the Print-Loop part. A REPL is also called "interactive mode".
One could say that a REPL is very useful for prototyping and testing things out, but script mode is more useful for automation.
What you need to see the output would be like
print(the_line("just a boy"))
for line number 9.
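The difference between the two modes can be reproduced from inside a single script. The sketch below is only an illustration: the helper `run` is a made-up name, and `the_line` is a slightly simplified variant of the function from the question. Python's built-in `compile` accepts a mode, where `"exec"` behaves like script mode and `"single"` behaves like one line typed at the console, assuming the default `sys.displayhook` is in place.

```python
import contextlib
import io


def the_line(text):
    # Simplified variant of the question's function, used only for this demo.
    if text == "just a boy":
        return "I fly alone"
    if text == "syndrome":
        return "darn you syndrome"
    return None


def run(source, mode):
    """Execute source compiled in the given mode and return what reached stdout."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(compile(source, "<demo>", mode), {"the_line": the_line})
    return captured.getvalue()


script_output = run('the_line("just a boy")', "exec")        # script mode: value discarded
console_output = run('the_line("just a boy")', "single")     # console: value echoed
fixed_output = run('print(the_line("just a boy"))', "exec")  # script mode with print

print(repr(script_output), repr(console_output), repr(fixed_output))
```

Note that the console echoes the repr of the value (with quotes), while `print` writes the plain string.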
I'm using a script that runs for many hours, which prints statements to verify whether or not issues might have arisen (data is downloaded from the web, which sometimes gets distorted).
I've noticed a significant drop in performance after a while. I suspect that the many thousands of lines of print statements might be the reason.
It is commonly known that the terminal can be cleared of these print statements by the following line of code:
import os
os.system('cls') # for windows
Still, I suspect that this doesn't actually improve the performance speed and that it's merely a perceived improvement due to the fact that the screen is cleared. Is that true or not?
I've also considered suppressing certain print statements with the following code:
import sys
class NullWriter(object):
    def write(self, arg):
        pass
nullwrite = NullWriter()
oldstdout = sys.stdout
sys.stdout = oldstdout # enable output
print("text that I want to see")
sys.stdout = nullwrite # disable output
print("text I don't want to see")
My question: How can I improve the performance (speed) of my script, given that I still want to see the most recent print statements?
If you like, you can emit a carriage return without a line feed and overwrite the last line:
sys.stdout.write("\rDoing things")
sys.stdout.flush()
Printing over time shouldn't use any extra memory within Python, but you might have your terminal's scrollback buffer set too high, which can use a lot of memory. Or it's just taking time to flush the buffer because you're writing so fast to stdout.
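As a rough sketch of the carriage-return approach, the helper below keeps only the most recent message on screen. `show_status` and its `width` parameter are hypothetical names chosen for the example; the padding is there so that a shorter message fully covers a longer previous one.

```python
import sys


def show_status(message, stream=None, width=40):
    """Overwrite the current terminal line with message."""
    stream = sys.stdout if stream is None else stream
    # "\r" moves the cursor to the start of the line without advancing to the
    # next one; padding to a fixed width erases leftovers from longer messages.
    stream.write("\r" + message.ljust(width))
    stream.flush()


for n in range(1, 4):
    show_status(f"Downloaded {n} of 3 files")
print()  # move to a fresh line once the loop is done
```

Because each update replaces the previous one, the terminal's scrollback does not grow with the number of status messages.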
You can also use the print function
Python 2.6+
From Python 2.6 you can import the print function from Python 3:
from __future__ import print_function
This allows you to use the Python 3 solution below.
Python 3
In Python 3, the print statement has been changed into a function, so you can instead do:
print('.', end='')
I'm using a print statement in a python 2.7 script in which I'm creating instances of data modeling classes. They're fairly large classes which do a good number of calculations in property setters during the init, so it's not the fastest executing script. I use print statements to have some idea of progress, but what's interesting is how they're executing. The code looks something like this:
from __future__ import absolute_import, division, print_function, unicode_literals
print('Loading data...', end='\t')
data = LoadData(data_path)
first_model = FirstModel(parameters).fit(data)
print('Done.\nFitting second model...', end='\t')
# prints 'Done.' and then there's a very long pause...
# suddenly 'Fitting second model...' prints and the next model initializes almost immediately
second_model = SecondModel(parameters).fit(data)
results = second_model.forecast(future_dates)
Why would the statement print('Done.\nFitting second model...', end='\t') first print 'Done.' and then pause for a long period of time? There was one instance when I was running this code, and after the 'Done.' printed I got an error before the rest of the statement printed. The error returned was an error in SecondModel where I tried to access a method as an attribute. What's going on here? How or why is python executing this print statement in such a counterintuitive way? It's as if the interpreter views the new line character as an indication that it should start looking at later parts of the code.
By default, print calls are buffered. When the output goes to a terminal, the buffer is flushed whenever a newline character is encountered (which is why you see Done.\n appear). However, the subsequent text is kept in the buffer until the next event that flushes it (in the absence of some subsequent newline character to print, that'll probably be Python either returning to the command prompt or exiting completely to the shell, depending on how you're running this script). Therefore, your time-consuming call to SecondModel().fit() is occurring between the display of the two lines.
To avoid this, you can flush the buffer manually by calling sys.stdout.flush() immediately after the print. Or, if you were ever to move to Python 3.3 or higher, you would be able to shortcut this by passing the additional argument flush=True into print().
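For the "announce, do slow work, report done" pattern from the question, the flush can be wrapped up once. This is a minimal Python 3 sketch, not the asker's code: `announce` is a hypothetical helper, and the `sum(...)` call is only a placeholder for the slow model fitting.

```python
import contextlib
import sys


@contextlib.contextmanager
def announce(label, stream=None):
    """Show label immediately, run the wrapped block, then print 'Done.'."""
    stream = sys.stdout if stream is None else stream
    # flush=True pushes the label out before the slow work starts.
    print(label, end="\t", file=stream, flush=True)
    yield
    print("Done.", file=stream, flush=True)


with announce("Loading data..."):
    data = sum(range(1000))  # placeholder for time-consuming work
```

If the wrapped block raises, "Done." is never printed, so the last label on screen points at the step that failed.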
Error messages can interrupt printed output, and vice versa, because by default they are handled by two separate streams: sys.stderr and sys.stdout, respectively. The two streams have separate buffers.
According to Tim Peters, "There should be one-- and preferably only one --obvious way to do it." In Python, there appear to be three ways to print information:
print('Hello World', end='')
sys.stdout.write('Hello World')
os.write(1, b'Hello World')
Question: Are there best-practice policies that state when each of these three different methods of printing should be used in a program?
Note that the statement of Tim is perfectly correct: there is only one obvious way to do it: print().
The other two possibilities that you mention have different goals.
If we want to summarize the goals of the three alternatives:
print is the high-level function that allows you to write something to stdout (or another file). It provides a simple and readable API, with some fancy options about how the single items are separated, or whether you want to add a terminator, etc. This is what you want to do most of the time.
sys.stdout.write is just a method of the file objects. So the real point of sys.stdout is that you can pass it around as if it were any other file. This is useful when you have to deal with a function that is expecting a file and you want it to print the text directly on stdout.
In other words you shouldn't use sys.stdout.write at all. You just pass around sys.stdout to code that expects a file.
Note: in python2 there were some situations where using the print statement produced worse code than calling sys.stdout.write. However the print function allows you to define the separator and terminator and thus avoids almost all these corner cases.
os.write is a low-level call to write to a file. You must manually encode the contents and you also have to pass the file descriptor explicitly. This is meant to handle only low level code that, for some reason, cannot be implemented on top of the higher-level interfaces. You almost never want to call this directly, because it's not required and has a worse API than the rest.
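To illustrate how low-level `os.write` is, the sketch below writes to a pipe rather than to file descriptor 1, purely so the result can be read back and inspected; the variable names are made up for the example. The text has to be encoded to bytes by hand, the descriptor is passed explicitly, and the call returns the number of bytes written.

```python
import os

# A pipe gives a pair of raw file descriptors to experiment with.
read_fd, write_fd = os.pipe()

payload = "Hello World".encode("utf-8")  # os.write accepts bytes, not str
written = os.write(write_fd, payload)    # returns the number of bytes written
os.close(write_fd)

received = os.read(read_fd, 1024)        # read back what was written
os.close(read_fd)

print(written, received)
```

None of the conveniences of `print` (separators, terminators, automatic conversion to text) are available at this level.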
Note that if you have code that should write down things on a file, it's better to do:
my_file.write(a)
# ...
my_file.write(b)
# ...
my_file.write(c)
Than:
print(a, file=my_file)
# ...
print(b, file=my_file)
# ...
print(c, file=my_file)
Because it's more DRY. Using print you have to repeat file= every time. This is fine if you have to write only in one place of the code, but if you have 5 or 6 different writes it is much easier to simply call the write method directly.
To me print is the right way to print to stdout, but:
There is a good reason why sys.stdout.write exists - Imagine a class which generates some text output, and you want to make it write to either stdout, a file on disk, or a string. Ideally the class really shouldn't care what output type it is writing to. The class can simply be given a file object, and so long as that object supports the write method, the class can use the write method to output the text.
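A small sketch of that idea, using a function instead of a class to keep it short. `write_report` and its arguments are hypothetical; the point is only that the code calls `write` on whatever file-like object it is handed, and `sys.stdout` is just one possible choice.

```python
import io
import sys


def write_report(rows, out=None):
    """Write one 'name: value' line per row to any object with a write method."""
    out = sys.stdout if out is None else out
    for name, value in rows:
        out.write(f"{name}: {value}\n")


rows = [("downloads", 3), ("errors", 0)]

write_report(rows)               # goes to the terminal via sys.stdout

buffer = io.StringIO()           # same code, now writing into a string
write_report(rows, out=buffer)
report_text = buffer.getvalue()
```

An open file from `open(path, "w")` could be passed as `out` in exactly the same way.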
Two of these methods require importing entire modules. Based on this alone, print() is the best standard use option.
sys.stdout is useful whenever stdout may change. This gives quite a bit of power for stream handling.
os.write is useful for os specific writing tasks (non blocking writes for instance)
This question has been asked a number of times on this site for sys.stdout vs. print:
Python - The difference between sys.stdout.write and print
print() vs sys.stdout.write(): which and why?
One example for using os.write (non blocking file writes demonstrated in the question below). The function may only be useful on some os's but it still must remain portable even when certain os's don't support different/special behaviors.
How to write to a file using non blocking IO?