are 'optimized' .pyo files unsafe?


I found out to my horror that python -O strips out assert statements. I use asserts anywhere and everywhere, and I think of asserts (like exceptions in general) as a form of flow control.
Python people: are python -O and .pyo files considered safe? Is it unsafe to rely on asserts?

It's not a good idea to rely on asserts. It's not a good idea to use asserts as flow control. The reason is exactly as you describe: they can be disabled. The documentation says it simply:
Assert statements are a convenient way to insert debugging assertions into a program
Asserts are for debugging, not to be relied on in production code.

Assertions are meant for catching bugs, not for flow control. It's therefore perfectly valid for an optimiser to strip them out because, by the time your code ships, those bugs should have been removed.
If you're using them as a general purpose exception raiser, I would suggest that you're using them wrongly.
There's a good page discussing this on the Python Wiki and I point you to the last bit specifically:
One important reason why assertions should only be used for self-tests of the program is that assertions can be disabled at compile time.
If Python is started with the -O option, then assertions will be stripped out and not evaluated. So if code uses assertions heavily, but is performance-critical, then there is a system for turning them off in release builds.
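To make the distinction concrete, here is a minimal sketch of the split the answers above describe: checks the program depends on use an explicit raise, and assert is kept for internal invariants only. The withdraw function and its messages are made up for illustration.

```python
def withdraw(balance, amount):
    """Return the new balance after withdrawing amount (hypothetical example)."""
    # Input validation is part of the program's behaviour, so it uses
    # explicit exceptions that still run under python -O.
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")

    new_balance = balance - amount

    # Internal sanity check: if this ever fails there is a bug above.
    # It is fine for -O to strip this line.
    assert new_balance >= 0, "internal invariant violated"
    return new_balance


if __name__ == "__main__":
    print(withdraw(100, 30))
```

With this layout, running under -O only loses the self-test, not the validation.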

Related

S101 Use of assert detected for python tests

When I wrote a unit test for a function and ran flake8 test_function().py, I received the following error:
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
My question:
How can I write unit tests without using assert keyword?
Should we ignore unit tests from the flake8 configuration?

imo B101 (from bandit) is one of the worst "error" codes to enforce -- almost no one runs with -O in python because (1) it doesn't make things faster and (2) many third party libraries use assert defensively and disabling it can change behaviour
calling assert a "security problem" is alarmist at best
that said, the error code makes no sense in tests so I would recommend disabling it there:
[flake8]
per-file-ignores =
    tests/*: S101
you can also disable it via bandit's configuration, though I'm less familiar with that
disclaimer: I'm the current flake8 maintainer
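On the first part of the question (writing tests without the assert keyword): one option, sketched below, is the standard library's unittest, whose assertEqual and assertRaises methods raise the failure exception themselves rather than relying on the assert statement. The add function, the AddTests class and the run_tests helper are made up for illustration.

```python
import unittest


def add(a, b):
    """Hypothetical function under test."""
    return a + b


class AddTests(unittest.TestCase):
    def test_small_integers(self):
        # assertEqual is a method call, not an assert statement,
        # so it is not removed by python -O.
        self.assertEqual(add(2, 3), 5)

    def test_mixed_types_fail(self):
        with self.assertRaises(TypeError):
            add(1, "x")


def run_tests():
    """Run the suite in-process and return the unittest.TestResult."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTests)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_tests()
    print(outcome.testsRun, "tests run, success:", outcome.wasSuccessful())
```

If you use pytest-style plain asserts, ignoring S101 for the test directory as shown above is probably the simpler route.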

Development vs Release Python Code

I am currently developing a Python application which I continually performance test, simply by recording the runtime of various parts.
A lot of the code is related only to the testing environment and would not exist in the real-world application. I have separated this code into functions, and at the moment I comment out the calls to them when testing. This requires me to remember which calls refer to test-only components (they are quite interleaved, so I cannot group the functionality).
I was wondering if there is a better solution. The only idea I have had so far is to create a 'mode' boolean and insert if statements, though this feels needlessly messy. I was hoping there might be some more standardised testing method that I am unaware of.
I am new to python so I may have overlooked some simple solutions.
Thank you in advance

There are libraries for testing, such as those in the development section of the standard library. If you have not used such tools yet, you should start doing so, especially unittest; they help a lot with testing.
Normally Python runs programs in debug mode with __debug__ set to True (see docs on assert). You can switch off debug mode with the command-line switches -O or -OO for optimization (see docs).
The Python Wiki also has a page specifically about using assertions.

If you're commenting out several parts of your code when switching between debug and release mode, I think you're doing it wrong. Take a look at the logging library, for example: with it you can choose the logging level you want by changing a single parameter.
Try to avoid commenting out specific parts of your debug code. Instead, have one or more variables that control the mode (debug, release, ...) your script runs in. You could also use some of the built-in ones Python already provides, such as __debug__.
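As a rough sketch of the single-parameter idea with logging: the logger name "myapp", the MYAPP_LOG_LEVEL environment variable and the configure helper are assumptions for this example, not anything from the question.

```python
import logging
import os

# Map of accepted level names; anything else falls back to WARNING.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure(level_name=None):
    """Return the app logger, with its level set from one parameter."""
    if level_name is None:
        # Hypothetical environment variable used as the mode switch.
        level_name = os.environ.get("MYAPP_LOG_LEVEL", "WARNING")
    logger = logging.getLogger("myapp")
    logger.setLevel(LEVELS.get(level_name.upper(), logging.WARNING))
    return logger


if __name__ == "__main__":
    logging.basicConfig()
    log = configure("debug")
    log.debug("timing details only shown in debug mode")
    log.warning("shown in every mode")
```

The test-only calls stay in the code; switching between development and release becomes a matter of changing the level rather than commenting lines in and out.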

What is the use of the "-O" flag for running Python?

Python can run scripts in optimized mode (python -O), which turns off debugging code, removes assert statements, and IIRC also removes docstrings.
However, I have not seen it used. Is python -O actually used? If so, what for?

python -O does the following currently:
completely ignores asserts
sets the special builtin name __debug__ to False (which by default is True)
and when called as python -OO
removes docstrings from the code
I don't know why everyone forgets to mention the __debug__ issue; perhaps it is because I'm the only one using it :) An if __debug__ construct creates no bytecode at all when running under -O, and I find that very useful.
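If you want to see the three effects listed above for yourself, a small sketch like the following runs the same program in a child interpreter with no flag, -O and -OO. The PROGRAM text and the run_with helper are made up for this example, and it assumes the PYTHONOPTIMIZE environment variable is not set.

```python
import subprocess
import sys

# Small program that reports __debug__, whether an assert fired,
# and whether a docstring survived.
PROGRAM = '''
def f():
    "docstring of f"
    return None

try:
    assert False, "this assert ran"
    assert_ran = False
except AssertionError:
    assert_ran = True

print(__debug__, assert_ran, f.__doc__)
'''


def run_with(*flags):
    """Run PROGRAM in a child interpreter with the given flags."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", PROGRAM],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for flags in ((), ("-O",), ("-OO",)):
        print(flags, "->", run_with(*flags))
```

Without flags the child prints "True True docstring of f"; with -O it prints "False False docstring of f"; with -OO the docstring becomes None as well.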

It saves a small amount of memory, and a small amount of disk space if you distribute any archive form containing only the .pyo files. (If you use assert a lot, and perhaps with complicated conditions, the savings can be not trivial and can extend to running time too).
So, it's definitely not useless -- and of course it's being used (if you deploy a Python-coded server program to a huge number N of server machines, why ever would you want to waste N * X bytes to keep docstrings which nobody, ever, would anyway be able to access?!). Of course it would be better if it saved even more, but, hey -- waste not, want not!-)
So it's pretty much a no-brainer to keep this functionality (which is in any case trivially simple to provide, you know;-) in Python 3 -- why add even "epsilon" to the latter's adoption difficulties?-)

Prepackaged software in different Linux distributions often comes byte-compiled with -O. For example, this is from the Fedora packaging guidelines for Python applications:
In the past it was common practice to %ghost .pyo files in order to save a small amount of space on the users filesystem. However, this has two issues: 1. With SELinux, if a user is running python -O [APP] it will try to write the .pyos when they don't exist. This leads to AVC denial records in the logs. 2. If the system administrator runs python -OO [APP] the .pyos will get created with no docstrings. Some programs require docstrings in order to function. On subsequent runs with python -O [APP] python will use the cached .pyos even though a different optimization level has been requested. The only way to fix this is to find out where the .pyos are and delete them.
The current method of dealing with pyo files is to include them as is, no %ghosting.

Removing assertions means a small performance benefit, so you could use this for "release" code. Anyway nobody uses it because many Python libraries are open sourced and thus the help() function should work.
So, as long as there isn't any real optimization in this mode, you can ignore it.

What is the use of Python's basic optimizations mode? (python -O)

Python has a flag -O that you can execute the interpreter with. The option will generate "optimized" bytecode (written to .pyo files), and given twice, it will discard docstrings. From Python's man page:
-O Turn on basic optimizations. This changes the filename extension
for compiled (bytecode) files from .pyc to .pyo. Given twice,
causes docstrings to be discarded.
This option's two major features as I see it are:
Strip all assert statements. This trades defense against corrupt program state for speed. But don't you need a ton of assert statements for this to make a difference? Do you have any code where this is worthwhile (and sane?)
Strip all docstrings. In what application is the memory usage so critical, that this is a win? Why not push everything into modules written in C?
What is the use of this option?
Does it have a real-world value?

Another use for the -O flag is that the value of the __debug__ builtin variable is set to False.
So, basically, your code can have a lot of "debugging" paths like:
if __debug__:
    # output all your favourite debugging information
    # and then more
    pass
which, when running under -O, won't even be included as bytecode in the .pyo file; a poor man's C-ish #ifdef.
Remember that docstrings are being dropped only when the flag is -OO.
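The "not even included as bytecode" claim can be checked without restarting the interpreter, because the built-in compile() takes an optimize argument (0 behaves like no flag, 1 like -O). The following is a sketch only; the SOURCE text and the helper names are made up for illustration.

```python
import dis

SOURCE = (
    "events = []\n"
    "if __debug__:\n"
    "    events.append('debug path ran')\n"
)


def build(optimize):
    # optimize=0 keeps __debug__ True; optimize=1 mirrors python -O.
    return compile(SOURCE, "<demo>", "exec", optimize=optimize)


def run(optimize):
    """Execute the compiled snippet and return the events it recorded."""
    namespace = {}
    exec(build(optimize), namespace)
    return namespace["events"]


def loads_debug_string(optimize):
    """True if any instruction in the bytecode loads the debug message."""
    return any(
        ins.argval == "debug path ran"
        for ins in dis.get_instructions(build(optimize))
    )


if __name__ == "__main__":
    for level in (0, 1):
        print(level, run(level), loads_debug_string(level))
```

At level 0 the debug branch runs and its string is loaded by the bytecode; at level 1 neither happens.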

On stripping assert statements: this is a standard option in the C world, where many people believe part of the definition of ASSERT is that it doesn't run in production code. Whether stripping them out or not makes a difference depends less on how many asserts there are than on how much work those asserts do:
def foo(x):
    assert x in huge_global_computation_to_check_all_possible_x_values()
    # ok, go ahead and use x...
Most asserts are not like that, of course, but it's important to remember that you can do stuff like that.
As for stripping docstrings, it does seem like a quaint holdover from a simpler time, though I guess there are memory-constrained environments where it could make a difference.

If you have assertions in frequently called code (e.g. in an inner loop), stripping them can certainly make a difference. Extreme example:
$ python -c 'import timeit;print timeit.repeat("assert True")'
[0.088717937469482422, 0.088625192642211914, 0.088654994964599609]
$ python -O -c 'import timeit;print timeit.repeat("assert True")'
[0.029736995697021484, 0.029587030410766602, 0.029623985290527344]
In real scenarios, savings will usually be much less.
Stripping the docstrings might reduce the size of your code, and hence your working set.
In many cases, the performance impact will be negligible, but as always with optimizations, the only way to be sure is to measure.
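For measuring this on your own code, here is a rough sketch that builds the same function twice in one process, once with asserts and once stripped, using compile() with its optimize argument, and times both with timeit. The total function and the input data are made up for illustration; the numbers depend entirely on your machine and on how expensive the assert is.

```python
import timeit

SOURCE = '''
def total(values):
    # Deliberately expensive check, to make the difference visible.
    assert all(isinstance(v, int) for v in values)
    return sum(values)
'''


def build(optimize):
    """Compile SOURCE at the given optimization level and return total()."""
    namespace = {}
    exec(compile(SOURCE, "<demo>", "exec", optimize=optimize), namespace)
    return namespace["total"]


checked = build(0)   # assert kept, like plain python
stripped = build(1)  # assert removed, like python -O
data = list(range(1000))


def measure(func, number=200):
    """Best of three timings for calling func(data) `number` times."""
    return min(timeit.repeat(lambda: func(data), number=number, repeat=3))


if __name__ == "__main__":
    print("with assert:   ", measure(checked))
    print("without assert:", measure(stripped))
```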

I have never encountered a good reason to use -O. I have always assumed its main purpose is in case at some point in the future some meaningful optimization is added.

But don't you need a ton of assert statements for this to make a difference? Do you have any code where this is worthwhile (and sane?)
As an example, I have a piece of code that gets paths between nodes in a graph. I have an assert statement at the end of the function to check that the path doesn't contain consecutive duplicates:
assert not any(a == b for a, b in zip(path, path[1:]))
I like the peace of mind and clarity that this simple statement gives during development. In production, the code processes some big graphs and this single line can take up to 66% of the run time. Running with -O therefore gives a significant speed-up.

I imagine that the heaviest users of -O are py2exe, py2app, and similar.
I've personally never found a use for -O directly.

You've pretty much figured it out: It does practically nothing at all. You're almost never going to see speed or memory gains, unless you're severely hurting for RAM.

python coding speed and cleanest

Python is pretty clean, and I can code neat apps quickly.
But I notice I have some minor error someplace, and I don't find the error at compile time but at run time. Then I need to change the script and run it again. Is there a way to have it break and let me modify and run?
Also, I dislike how python has no enums. If I were to write code that needs a lot of enums and types, should I be doing it in C++? It feels like I can do it quicker in C++.

"I don't find the error at compile but at run time"
Correct. True for all non-compiled interpreted languages.
"I need to change and run the script again"
Also correct. True for all non-compiled interpreted languages.
"Is there a way to have it break and let me modify and run?"
What?
If it's a run-time error, the script breaks, you fix it and run again.
If it's not a proper error, but a logic problem of some kind, then the program finishes, but doesn't work correctly. No language can anticipate what you hoped for and break for you.
Or perhaps you mean something else.
"...code that needs a lot of enums"
You'll need to provide examples of code that needs a lot of enums. I've been writing Python for years, and have no use for enums. Indeed, I've been writing C++ with no use for enums either.
You'll have to provide code that needs a lot of enums as a specific example. Perhaps in another question along the lines of "What's a Pythonic replacement for all these enums."
It's usually polymorphic class definitions, but without an example, it's hard to be sure.

With interpreted languages you have a lot of freedom. Freedom isn't free here either. While the interpreter won't torture you into dotting every i and crossing every T before it deems your code worthy of a run, it also won't try to statically analyze your code for all those problems. So you have a few choices.
1) {Pyflakes, pychecker, pylint} will do static analysis on your code. That settles the syntax issue mostly.
2) Test-driven development with nosetests or the like will help you. If you make a code change that breaks your existing code, the tests will fail and you will know about it. This is actually better than static analysis and can be as fast. If you test-first, then you will have all your code checked at test runtime instead of program runtime.
Note that with 1 & 2 in place you are a bit better off than if you had just a static-typing compiler on your side. Even so, it will not create a proof of correctness.
It is possible that your tests may miss some plumbing you need for the app to actually run. If that happens, you fix it by writing more tests usually. But you still need to fire up the app and bang on it to see what tests you should have written and didn't.

You might want to look into something like nosey, which runs your unit tests periodically when you've saved changes to a file. You could also set up a save-event trigger to run your unit tests in the background whenever you save a file (possible e.g. with Komodo Edit).
That said, what I do is bind the F7 key to run unit tests in the current directory and subdirectories, and the F6 key to run pylint on the current file. Frequent use of these allows me to spot errors pretty quickly.

Python is an interpreted language; there is no compile stage, at least not one that is visible to the user. If you get an error, go back, modify the script, and try again. If your script has a long execution time and you don't want to stop and restart, you can try a debugger like pdb, with which you can fix some of your errors at runtime.
There are many ways to implement enums; a quick Google search for "python enums" gives everything you're likely to need. However, you should consider whether you really need them, and whether there's a better, more 'pythonic' way of doing the same thing.
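One of those ways, if you are on Python 3.4 or later, is the enum module in the standard library. A minimal sketch (the Color class and describe function are made up for illustration):

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3


def describe(color):
    """Return a short label for a Color member."""
    return f"{color.name.lower()} ({color.value})"


if __name__ == "__main__":
    for member in Color:
        print(describe(member))
```

Members are singletons, so they can be compared with `is`, looked up by value with `Color(2)`, and iterated in definition order.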
