What is the use of the "-O" flag for running Python?

What is the use of the "-O" flag for running Python? - python

Python can run scripts in optimized mode (python -O) which turns off debugs, removes assert statements, and IIRC it also removes docstrings.
However, I have not seen it used. Is python -O actually used? If so, what for?

python -O does the following currently:
completely ignores asserts
sets the special builtin name __debug__ to False (which by default is True)
and when called as python -OO
removes docstrings from the code
I don't know why everyone forgets to mention the __debug__ issue; perhaps it is because I'm the only one using it :) An if __debug__ construct creates no bytecode at all when running under -O, and I find that very useful.

It saves a small amount of memory, and a small amount of disk space if you distribute any archive form containing only the .pyo files. (If you use assert a lot, and perhaps with complicated conditions, the savings can be not trivial and can extend to running time too).
So, it's definitely not useless -- and of course it's being used (if you deploy a Python-coded server program to a huge number N of server machines, why ever would you want to waste N * X bytes to keep docstrings which nobody, ever, would anyway be able to access?!). Of course it would be better if it saved even more, but, hey -- waste not, want not!-)
So it's pretty much a no-brainer to keep this functionality (which is in any case trivially simple to provide, you know;-) in Python 3 -- why add even "epsilon" to the latter's adoption difficulties?-)

Prepacked software in different Linux distributions often comes byte-compiled with -O. For example, this if from Fedora packaging guidelines for python applications:
In the past it was common practice to %ghost .pyo files in order to save a small amount of space on the users filesystem. However, this has two issues: 1. With SELinux, if a user is running python -O [APP] it will try to write the .pyos when they don't exist. This leads to AVC denial records in the logs. 2. If the system administrator runs python -OO [APP] the .pyos will get created with no docstrings. Some programs require docstrings in order to function. On subsequent runs with python -O [APP] python will use the cached .pyos even though a different optimization level has been requested. The only way to fix this is to find out where the .pyos are and delete them.
The current method of dealing with pyo files is to include them as is, no %ghosting.

Removing assertions means a small performance benefit, so you could use this for "release" code. Anyway nobody uses it because many Python libraries are open sourced and thus the help() function should work.
So, as long as there isn't any real optimization in this mode, you can ignore it.

Related

What is the best or proper way to allow debugging of generated code?

For various reasons, in one project I generate executable code by means of generating AST from various source files the compiling that to bytecode (though the question could also work for cases where the bytecode is generated directly I guess).
From some experimentation, it looks like the debugger more or less just uses the lineno information embedded in the AST alongside the filename passed to compile in order to provide a representation for the debugger's purposes, however this assumes the code being executed comes from a single on-disk file.
That is not necessarily the case for my project, the executable code can be pieced together from multiple sources, and some or all of these sources may have been fetched over the network, or been retrieved from non-disk storage (e.g. database).
And so my Y questions, which may be the wrong ones (hence the background):
is it possible to provide a memory buffer of some sort, or is it necessary to generate a singular on-disk representation of the "virtual source"?
how well would the debugger deal with jumping around between the different bits and pieces if the virtual source can't or should not be linearised[0]
and just in case, is the assumption of Python only supporting a single contiguous source file correct or can it actually be fed multiple sources somehow?
[0] for instance a web-style literate program would be debugged in its original form, jumping between the code sections, not in the so-called "tangled" form

Some of this can be handled by the trepan3k debugger. For other things various hooks are in place.
First of all it can debug based on bytecode alone. But of course stepping instructions won't be possible if the line number table doesn't exist. And for that reason if for no other, I would add a "line number" for each logical stopping point, such as at the beginning of statements. The numbers don't have to be line numbers, they could just count from 1 or be indexes into some other table. This is more or less how go's Pos type position works.
The debugger will let you set a breakpoint on a function, but that function has to exist and when you start any python program most of the functions you define don't exist. So the typically way to do this is to modify the source to call the debugger at some point. In trepan3k the lingo for this is:
from trepan.api import debug; debug()
Do that in a place where the other functions you want to break on and that have been defined.
And the functions can be specified as methods on existing variables, e.g. self.my_function()
One of the advanced features of this debugger is that will decompile the bytecode to produce source code. There is a command called deparse which will show you the context around where you are currently stopped.
Deparsing bytecode though is a bit difficult so depending on which kind of bytecode you get the results may vary.
As for the virtual source problem, well that situation is somewhat tolerated in the debugger, since that kind of thing has to go on when there is no source. And to facilitate this and remote debugging (where the file locations locally and remotely can be different), we allow for filename remapping.
Another library pyficache is used to for this remapping; it has the ability I believe remap contiguous lines of one file into lines in another file. And I think you could use this over and over again. However so far there hasn't been need for this. And that code is pretty old. So someone would have to beef up trepan3k here.
Lastly, related to trepan3k is a trepan-xpy which is a CPython bytecode debugger which can step bytecode instructions even when the line number table is empty.

Development vs Release Python Code

I am currently developing a Python application which I continually performance test, simply by recording the runtime of various parts.
A lot of the code is related only to the testing environment and would not exist in the real world application, I have these separated into functions and at the moment I comment out these calls when testing. This requires me to remember which calls refer to test only components (they are quite interleaved so I cannot group the functionality).
I was wondering if there was a better solution to this, the only idea I have had so far is creation of a 'mode' boolean and insertion of If statements, though this feels needlessly messy. I was hoping there might be some more standardised testing method that I am naive of.
I am new to python so I may have overlooked some simple solutions.
Thank you in advance

There are libraries for testing like those in the development-section of the standard library. If you did not use such tools yet, you should start to do so - they help a lot with testing. (especially unittest).
Normally Python runs programs in debug mode with __debug__ set to True (see docs on assert) - you can switch off debug mode by setting the command-line switches -O or -OO for optimization (see docs).
There is something about using specifically assertions in the Python Wiki

I'd say if you're commenting out several parts of your code when switching between debug&release mode I think you're doing wrong. Take a look for example to the logging library, as you can see, with that library you can specify the logging level you want to use only by changing a single parameter.
Try to avoid commenting specific parts of your debug code by having one or more variables which controls the mode (debug, release, ...) your script will run. You could also use some builtin ones python already provides

Counting number of symbols in Python script

I have a Telit module which runs [Python 1.5.2+] (http://www.roundsolutions.com/techdocs/python/Easy_Script_Python_r13.pdf)!. There are certain restrictions in the number of variable, module and method names I can use (< 500), the size of each variable (16k) and amount of RAM (~ 1MB). Refer pg 113&114 for details. I would like to know how to get the number of symbols being generated, size in RAM of each variable, memory usage (stack and heap usage).
I need something similar to a map file that gets generated with gcc after the linking process which shows me each constant / variable, symbol, its address and size allocated.

Python is an interpreted and dynamically-typed language, so generating that kind of output is very difficult, if it's even possible. I'd imagine that the only reasonable way to get this information is to profile your code on the target interpreter.
If you're looking for a true memory map, I doubt such a tool exists since Python doesn't go through the same kind of compilation process as C or C++. Since everything is initialized and allocated at runtime as the program is parsed and interpreted, there's nothing to say that one interpreter will behave the same as another, especially in a case such as this where you're running on such a different architecture. As a result, there's nothing to say that your objects will be created in the same locations or even with the same overall memory structure.
If you're just trying to determine memory footprint, you can do some manual checking with sys.getsizeof(object, [default]) provided that it is supported with Telit's libs. I don't think they're using a straight implementation of CPython. Even still, this doesn't always work and with raise a TypeError when an object's size cannot be determined if you don't specify the default parameter.
You might also get some interesting results by studying the output of the dis module's bytecode disassembly, but that assumes that dis works on your interpreter, and that your interpreter is actually implemented as a VM.
If you just want a list of symbols, take a look at this recipe. It uses reflection to dump a list of symbols.
Good manual testing is key here. Your best bet is to set up the module's CMUX (COM port MUXing), and watch the console output. You'll know very quickly if you start running out of memory.

This post makes me recall my pain once with Telit GM862-GPS modules. My code was exactly at the point that the number of variables, strings, etc added up to the limit. Of course, I didn't know this fact by then. I added one innocent line and my program did not work any more. I drove me really crazy for two days until I look at the datasheet to find this fact.
What you are looking for might not have a good answer because the Python interpreter is not a full fledged version. What I did was to use the same local variable names as many as possible. Also I deleted doc strings for functions (those count too) and replace with #comments.
In the end, I want to say that this module is good for small applications. The python interpreter does not support threads or interrupts so your program must be a super loop. When your application gets bigger, each iteration will take longer. Eventually, you might want to switch to a faster platform.

What is the use of Python's basic optimizations mode? (python -O)

Python has a flag -O that you can execute the interpreter with. The option will generate "optimized" bytecode (written to .pyo files), and given twice, it will discard docstrings. From Python's man page:
-O Turn on basic optimizations. This changes the filename extension
for compiled (bytecode) files from .pyc to .pyo. Given twice,
causes docstrings to be discarded.
This option's two major features as I see it are:
Strip all assert statements. This trades defense against corrupt program state for speed. But don't you need a ton of assert statements for this to make a difference? Do you have any code where this is worthwhile (and sane?)
Strip all docstrings. In what application is the memory usage so critical, that this is a win? Why not push everything into modules written in C?
What is the use of this option?
Does it have a real-world value?

Another use for the -O flag is that the value of the __debug__ builtin variable is set to False.
So, basically, your code can have a lot of "debugging" paths like:
if __debug__:
# output all your favourite debugging information
# and then more
which, when running under -O, won't even be included as bytecode in the .pyo file; a poor man's C-ish #ifdef.
Remember that docstrings are being dropped only when the flag is -OO.

On stripping assert statements: this is a standard option in the C world, where many people believe part of the definition of ASSERT is that it doesn't run in production code. Whether stripping them out or not makes a difference depends less on how many asserts there are than on how much work those asserts do:
def foo(x):
assert x in huge_global_computation_to_check_all_possible_x_values()
# ok, go ahead and use x...
Most asserts are not like that, of course, but it's important to remember that you can do stuff like that.
As for stripping docstrings, it does seem like a quaint holdover from a simpler time, though I guess there are memory-constrained environments where it could make a difference.

If you have assertions in frequently called code (e.g. in an inner loop), stripping them can certainly make a difference. Extreme example:
$ python -c 'import timeit;print timeit.repeat("assert True")'
[0.088717937469482422, 0.088625192642211914, 0.088654994964599609]
$ python -O -c 'import timeit;print timeit.repeat("assert True")'
[0.029736995697021484, 0.029587030410766602, 0.029623985290527344]
In real scenarios, savings will usually be much less.
Stripping the docstrings might reduce the size of your code, and hence your working set.
In many cases, the performance impact will be negligible, but as always with optimizations, the only way to be sure is to measure.

I have never encountered a good reason to use -O. I have always assumed its main purpose is in case at some point in the future some meaningful optimization is added.

But don't you need a ton of assert statements for this to make a difference? Do you have any code where this is worthwhile (and sane?)
As an example, I have a piece of code that gets paths between nodes in a graph. I have an assert statement at the end of the function to check that the path doesn't contain duplicates:
assert not any(a == b for a, b in zip(path, path[1:]))
I like the peace of mind and clarity that this simple statement gives during development. In production, the code processes some big graphs and this single line can take up to 66% of the run time. Running with -O therefore gives a significant speed-up.

I imagine that the heaviest users of -O are py2exe py2app and similar.
I've personally never found a use for -O directly.

You've pretty much figured it out: It does practically nothing at all. You're almost never going to see speed or memory gains, unless you're severely hurting for RAM.

Why python compile the source to bytecode before interpreting?

Why python compile the source to bytecode before interpreting?
Why not interpret from the source directly?

Nearly no interpreter really interprets code directly, line by line – it's simply too inefficient. Almost all interpreters use some intermediate representation which can be executed easily. Also, small optimizations can be performed on this intermediate code.
Python furthermore stores this code which has a huge advantage for the next time this code gets executed: Python doesn't have to parse the code anymore; parsing is the slowest part in the compile process. Thus, a bytecode representation reduces execution overhead quite substantially.

Because you can compile to a .pyc once and interpret from it many times.
So if you're running a script many times you only have the overhead of parsing the source code once.

Because interpretting from bytecode directly is faster. It avoids the need to do lexing, for one thing.

Re-lexing and parsing the source code over and over, rather than doing it just once (most often on the first import), would obviously be a silly and pointless waste of effort.

Although there is a small efficiency aspect to it (you can store the bytecode on disk or in memory), its mostly engineering: it allows you separate parsing from interpreting. Parsers can often be nasty creatures, full of edge-cases and having to conform to esoteric rules like using just the right amount of lookahead and resolving shift-reduce problems. By contrast, interpreting is really simple: its just a big switch statement using the bytecode's opcode.

I doubt very much that the reason is performance, albeit be it a nice side effect. I would say that it's only natural to think a VM built around some high-level assembly language would be more practical than to find and replace text in some source code string.
Edit:
Okay, clearly, who ever put a -1 vote on my post without leaving a reasonable comment to explain knows very little about virtual machines (run-time environments).
http://channel9.msdn.com/shows/Going+Deep/Expert-to-Expert-Erik-Meijer-and-Lars-Bak-Inside-V8-A-Javascript-Virtual-Machine/

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.