S101 Use of assert detected for python tests - python

When I wrote a unit test for a function and ran flake8 on test_function.py, I received the following error:
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
My questions:
How can I write unit tests without using the assert keyword?
Should we configure flake8 to ignore unit tests?

imo B101 (from bandit) is one of the worst "error" codes to enforce -- almost no one runs Python with -O because (1) it doesn't make things faster and (2) many third-party libraries use assert defensively, and disabling it can change behaviour
calling assert a "security problem" is alarmist at best
that said, the error code makes no sense in tests so I would recommend disabling it there:
[flake8]
per-file-ignores =
    tests/*: S101
you can also disable it via bandit's configuration, though I'm less familiar with that
disclaimer: I'm the current flake8 maintainer
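For a single line rather than a whole directory, flake8's standard noqa mechanism also works (a minimal sketch, assuming the flake8-bandit plugin that reports S101; the add function is just a placeholder):

# test_function.py -- hypothetical example
def add(a, b):
    return a + b

def test_add():
    assert add(2, 2) == 4  # noqa: S101  (silences only this line)

The per-file-ignores approach above is usually cleaner for test suites, since it avoids sprinkling noqa comments throughout the tests.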

Related

Better python unittest integration?

I'm using GNU Emacs 24.5.1 to work on Python code. I often want to run just a single unit test. I can do this, for example, by running:
test=spi.test_views.IndexViewTest.generate_select2_data_with_embedded_spaces make test
with M-x compile. My life would be simpler if I could give some command like "run the test where point is" and have Emacs figure out the full name of the test for me. Is this possible?
Update: with the following buffer, I'd like some command which runs M-x compile with:
test=spi.test_views.IndexViewTest.test_unknown_button make test
where spi is the name of the directory test_views.py is in. Well, technically, I need to construct the python path to my test function, but in practice, it'll be <directory.file.class.function>.
This seems like the kind of thing somebody would have already invented, but I don't see anything in the python mode docs.
I believe you use the "default" python mode, while the so-called elpy mode (which I strongly recommend giving a try when doing Python development within Emacs) seems to provide what you are looking for:
C-c C-t (elpy-test)
Start a test run. This uses the currently configured test runner to discover and run tests. If point is inside a test case, the test runner will run exactly that test case. Otherwise, or if a prefix argument is given, it will run all tests.
Extra details
The elpy-test function internally relies on the function (elpy-test-at-point), which appears to be very close to the feature you mentioned in the question.
See e.g. the code/help excerpt in the following screenshot:

How to find code which was never executed in coverage.py despite a 100% coverage report

Consider the following code:
import math

def dumb_sqrt(x):
    result = math.sqrt(x) if x >= 0 else math.sqrt(-x)*j
    return result

def test_dumb_sqrt():
    assert dumb_sqrt(9.) == 3.
The test can be executed like this:
$ pip install pytest pytest-cov
$ pytest test_thing.py --cov=test_thing --cov-report=html --cov-branch
The coverage report will consider all lines 100% covered, even with branch coverage enabled.
However, this code has a bug, and those of you with a keen eye may have seen it already. Should it ever go into the "else" branch, there will be an exception:
NameError: global name 'j' is not defined
It's easy to fix the bug: change the undefined j name into a literal 1j. It's also easy to add another test which will reveal the bug: assert dumb_sqrt(-9.) == 3j. Neither is what this question is asking about. I want to know how to find sections of code which were never actually executed despite a 100% code coverage report.
Using conditional expressions is one such culprit, but there are similar cases anywhere that Python can short-circuit an evaluation (x or y, x and y are other examples).
Preferably, line 4 above could be colored yellow in the report, similar to how the "if" line would have been rendered had the code avoided the conditional expression in the first place.
Does coverage.py support such a feature? If so, how can you enable "inline branch coverage" in your cov reporting? If not, are there any other approaches to identify "hidden" code that was never actually executed by your test suite?
No, coverage.py doesn't handle conditional branching within an expression. This doesn't just affect the Python conditional expression, using and or or would also be affected:
# pretending, for the sake of illustration, that x will never be 0
result = x >= 0 and math.sqrt(x) or math.sqrt(-x)*j
Ned Batchelder, the maintainer of coverage.py, calls this a hidden conditional, in an article from 2007 covering this and other cases coverage.py can't handle.
This problem extends to if statements too! Take, for example:
if condition_a and (condition_b or condtion_c):
    do_foo()
else:
    do_bar()
If condition_b is always true when condition_a is true, you'll never find the typo in condtion_c, not if you rely solely on coverage.py, because there is no support for condition coverage (let alone more advanced concepts like modified condition/decision coverage and multiple condition coverage).
One hurdle to supporting conditional coverage is technical: coverage.py relies heavily on Python's built-in tracing support, but until recently this would only let you track execution per line. Ned actually explored work-arounds for this issue.
Not that that stopped a different project, instrumental, from offering condition/decision coverage anyway. That project used AST rewriting and an import hook to add extra bytecode that would let it track the outcome of individual conditions and so give you an overview of the 'truth table' of expressions. There is a huge drawback to this approach: it is very fragile and frequently requires updating for new Python releases. As a result, the project broke with Python 3.4 and hasn't been fixed since.
However, Python 3.7 added support for tracing at the opcode level, allowing a tracer to analyse the effect of each individual bytecode without having to resort to AST hacking. And with coverage.py 5.0 having reached a stable state, it appears that the project is considering adding support for condition coverage, with possible sponsors to back development.
So your options right now are:
Run your code in Python 3.3 and use instrumental
Repair instrumental to run on more recent Python versions
Wait for coverage.py to add condition coverage
Help write the feature for coverage.py
Hack together your own version of instrumental's approach using the Python 3.7 or newer 'opcode' tracing mode (roughly sketched below).
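As a rough sketch of that last option (this is not instrumental's code; the tracer and bookkeeping names are invented for illustration), Python 3.7+ lets a trace function opt into per-opcode events via frame.f_trace_opcodes:

import dis
import math
import sys

seen = set()  # (function name, line number, bytecode offset) triples that ran

def tracer(frame, event, arg):
    if event == "call":
        frame.f_trace_opcodes = True  # request per-opcode events (Python 3.7+)
    elif event == "opcode":
        seen.add((frame.f_code.co_name, frame.f_lineno, frame.f_lasti))
    return tracer

def dumb_sqrt(x):
    return math.sqrt(x) if x >= 0 else math.sqrt(-x) * 1j

sys.settrace(tracer)
dumb_sqrt(9.0)
sys.settrace(None)

# Comparing the executed offsets against the full disassembly reveals that the
# bytecode for the "else" arm of the conditional expression never ran.
executed = {offset for name, line, offset in seen if name == "dumb_sqrt"}
every = {ins.offset for ins in dis.get_instructions(dumb_sqrt)}
print("offsets never executed:", sorted(every - executed))

A real tool would still need to map those offsets back to source-level conditions, which is the hard part instrumental solved with its AST rewriting.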
Don't use res1 if cond else res2 - it's going to be treated as a single statement. If you write this as an if/else (sketched after this comment), I think coverage.py will do its job better.
And consider using something like pylint, or at least pyflakes. I believe these would have automatically detected the problem.
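For illustration, here is a sketch of the question's function rewritten with a statement-level branch (the deliberate j bug is kept); with --cov-branch, coverage.py should now report the if line as a partial branch and the else line as missed:

import math

def dumb_sqrt(x):
    if x >= 0:
        result = math.sqrt(x)
    else:
        result = math.sqrt(-x) * j  # still buggy: `j` should be the literal 1j
    return result

def test_dumb_sqrt():
    assert dumb_sqrt(9.) == 3.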

are 'optimized' .pyo files unsafe?

I found out to my horror that python -O strips out assert statements. I use asserts anywhere and everywhere, and I think of asserts (like exceptions in general) as a form of flow control.
Python people: are python -O and .pyo files considered safe? Is it unsafe to rely on asserts?
It's not a good idea to rely on asserts. It's not a good idea to use asserts as flow control. The reason is exactly as you describe: they can be disabled. The documentation says it simply:
Assert statements are a convenient way to insert debugging assertions into a program
Asserts are for debugging, not to be relied on in production code.
Assertions are meant for catching bugs, not for flow control. It's therefore perfectly valid for an optimiser to strip them out because, by the time your code ships, those bugs should have been removed.
If you're using them as a general purpose exception raiser, I would suggest that you're using them wrongly.
There's a good page discussing this on the Python Wiki and I point you to the last bit specifically:
One important reason why assertions should only be used for self-tests of the program is that assertions can be disabled at compile time.
If Python is started with the -O option, then assertions will be stripped out and not evaluated. So if code uses assertions heavily, but is performance-critical, then there is a system for turning them off in release builds.
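A small illustration of the difference (the function names are made up): checks that must hold in production should raise explicitly, because assert statements disappear under python -O:

def set_age_with_assert(age):
    assert age >= 0, "age must be non-negative"  # silently skipped under -O
    return age

def set_age_with_raise(age):
    if age < 0:
        raise ValueError("age must be non-negative")  # enforced regardless of -O
    return age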

Unit-testing with dependencies between tests

How do you do unit testing when you have
some general unit tests
more sophisticated tests checking edge cases, depending on the general ones
To give an example, imagine testing a CSV-reader (I just made up a notation for demonstration),
def test_readCsv(): ...

@dependsOn(test_readCsv)
def test_readCsv_duplicateColumnName(): ...

@dependsOn(test_readCsv)
def test_readCsv_unicodeColumnName(): ...
I expect sub-tests to be run only if their parent test succeeds. The reason behind this is that running these tests takes time, and many failure reports that all trace back to a single cause wouldn't be informative anyway. Of course, I could shoehorn all the edge cases into the main test, but I wonder if there is a more structured way to do this.
I've found these related but different questions,
How to structure unit tests that have dependencies?
Unit Testing - Is it bad form to have unit test calling other unit tests
UPDATE:
I've found TestNG, which has great built-in support for test dependencies. You can write tests like this:
@Test(dependsOnMethods = {"test_readCsv"})
public void test_readCsv_duplicateColumnName() {
    ...
}
Personally, I wouldn't worry about creating dependencies between unit tests. This sounds like a bit of a code smell to me. A few points:
If a test fails, let the others fail too, and get a good idea of the scale of the problem that the adverse code change caused.
Test failures should be the exception rather than the norm, so why waste effort and create dependencies when the vast majority of the time (hopefully!) no benefit is derived? If failures happen often, your problem is not with unit test dependencies but with frequent test failures.
Unit tests should run really fast. If they are running slowly, then focus your efforts on increasing the speed of those tests rather than on preventing subsequent failures. Do this by decoupling your code more and using dependency injection or mocking; a sketch of the idea follows below.
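As a sketch of that last point (using the standard-library csv module as a stand-in for the CSV reader under test), injecting an in-memory file keeps every edge-case test fast and independent, so no test needs to depend on a "parent" test:

import csv
import io
import unittest


class TestReadCsv(unittest.TestCase):
    def _read(self, text):
        # Inject an in-memory file instead of touching the filesystem.
        return list(csv.DictReader(io.StringIO(text)))

    def test_read_csv(self):
        rows = self._read("name,age\nalice,30\n")
        self.assertEqual(rows, [{"name": "alice", "age": "30"}])

    def test_read_csv_unicode_column_name(self):
        rows = self._read("näme\nalice\n")
        self.assertEqual(rows[0]["näme"], "alice")


if __name__ == "__main__":
    unittest.main()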
Proboscis is a python version of TestNG (which is a Java library).
See packages.python.org/proboscis/
It supports dependencies, e.g.
@test(depends_on=[test_readCsv])
def test_readCsv_duplicateColumnName():
    ...
I'm not sure what language you're referring to (you don't specifically mention it in your question), but for something like PHPUnit there is a @depends annotation that will only run a test if the depended-upon test has already passed.
Depending on what language or unit-testing framework you use, there may be something similar available.
I have implemented a plugin for Nose (Python) which adds support for test dependencies and test prioritization.
As mentioned in the other answers/comments, this is often a bad idea; however, there can be exceptions where you would want to do this (in my case it was performance for integration tests - with a huge overhead for getting into a testable state, minutes vs. hours).
You can find it here: nosedep.
A minimal example is:
def test_a():
    pass

@depends(before=test_a)
def test_b():
    pass
To ensure that test_b is always run before test_a.
You may want to use pytest-dependency. According to its documentation, the code looks elegant:
import pytest

@pytest.mark.dependency()
@pytest.mark.xfail(reason="deliberate fail")
def test_a():
    assert False

@pytest.mark.dependency()
def test_b():
    pass

@pytest.mark.dependency(depends=["test_a"])
def test_c():
    pass

@pytest.mark.dependency(depends=["test_b"])
def test_d():
    pass

@pytest.mark.dependency(depends=["test_b", "test_c"])
def test_e():
    pass
Please note that this is a plugin for pytest, not for unittest (which is part of Python itself). So you need two more dependencies (e.g. add them to requirements.txt):
pytest==5.1.1
pytest-dependency==0.4.0
According to best practices and unit-testing principles, a unit test should not depend on other tests.
Each test case should check one concrete, isolated behavior.
Then, if some test case fails, you will know exactly what went wrong in your code.

Non-critical unittest failures

I'm using Python's built-in unittest module and I want to write a few tests that are not critical.
I mean, if my program passes such tests, that's great! However, if it doesn't pass, it's not really a problem; the program will still work.
For example, my program is designed to work with a custom type "A". If it fails to work with "A", then it's broken. However, for convenience, most of it should also work with another type "B", but that's not mandatory. If it fails to work with "B", then it's not broken (because it still works with "A", which is its main purpose). Failing to work with "B" is not critical, I will just miss a "bonus feature" I could have.
Another (hypothetical) example is when writing an OCR. The algorithm should recognize most of the images in the tests, but it's okay if some of them fail. (And no, I'm not writing an OCR.)
Is there any way to write non-critical tests in unittest (or other testing framework)?
As a practical matter, I'd probably use print statements to indicate failure in that case. A more correct solution is to use warnings:
http://docs.python.org/library/warnings.html
You could, however, use the logging facility to generate a more detailed record of your test results (i.e. set your "B" class failures to write warnings to the logs).
http://docs.python.org/library/logging.html
Edit:
The way we handle this in Django is that we have some tests we expect to fail, and we have others that we skip based on the environment. Since we can generally predict whether a test SHOULD fail or pass (i.e. if we can't import a certain module, the system doesn't have it, and so the test won't work), we can skip failing tests intelligently. This means that we still run every test that will pass, and have no tests that "might" pass. Unit tests are most useful when they do things predictably, and being able to detect whether or not a test SHOULD pass before we run it makes this possible.
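A small sketch of the warn-instead-of-fail idea (the handle function is a stand-in for the real code, where type A support is required and type B support is the optional bonus):

import unittest
import warnings


def handle(value):
    # Stand-in for the real code: type A (int) is fully supported,
    # type B (str) is the optional "bonus feature".
    if isinstance(value, str):
        raise NotImplementedError("type B not supported yet")
    return value * 2


class TestNonCritical(unittest.TestCase):
    def test_type_a_is_critical(self):
        self.assertEqual(handle(21), 42)  # a real failure if this breaks

    def test_type_b_is_nice_to_have(self):
        try:
            self.assertEqual(handle("21"), "2121")
        except Exception as exc:
            warnings.warn("non-critical: type B support is broken: %r" % exc)


if __name__ == "__main__":
    unittest.main()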
Asserts in unit tests are binary: they will work or they will fail; there's no middle ground.
Given that, to create those "non-critical" tests you should not use assertions when you don't want the tests to fail. You should do this carefully so you don't compromise the "usefulness" of the test.
My advice for your OCR example is to record the success rate somewhere in your test code and then create one assertion like "assert success_rate > 8.5", and that should give the effect you desire.
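A sketch of that pattern with made-up fixtures (the recognize stub and the 0.7 threshold are placeholders, not taken from the answer above):

def recognize(image):
    # Stand-in for the real OCR call; in practice this is the code under test.
    return image.upper()

def test_ocr_success_rate():
    samples = [("a", "A"), ("b", "B"), ("c", "C"), ("d", "X")]  # made-up fixtures
    successes = sum(1 for image, expected in samples if recognize(image) == expected)
    success_rate = successes / len(samples)
    assert success_rate > 0.7, "success rate too low: %.0f%%" % (success_rate * 100)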
Thank you for the great answers. No single answer was really complete, so I'm writing here a combination of all the answers that helped me. If you like this answer, please vote up the people who were responsible for it.
Conclusions
Unit tests (or at least unit tests in the unittest module) are binary. As Guilherme Chapiewski says: they will work or they will fail; there's no middle ground.
Thus, my conclusion is that unit tests are not exactly the right tool for this job. It seems that unit tests are more concerned with "keep everything working, no failure is expected", and thus I can't (or at least it's not easy to) have non-binary tests.
So, unit tests don't seem to be the right tool if I'm trying to improve an algorithm or an implementation, because unit tests can't tell me how much better one version is compared to the other (supposing both of them are correctly implemented, then both will pass all the unit tests).
My final solution
My final solution is based on ryber's idea and the code shown in wcoenen's answer. I'm basically extending the default TextTestRunner and making it less verbose. Then my main code calls two test suites: the critical one using the standard TextTestRunner, and the non-critical one with my own less verbose version.
import sys
import unittest


class _TerseTextTestResult(unittest._TextTestResult):
    def printErrorList(self, flavour, errors):
        for test, err in errors:
            #self.stream.writeln(self.separator1)
            self.stream.writeln("%s: %s" % (flavour, self.getDescription(test)))
            #self.stream.writeln(self.separator2)
            #self.stream.writeln("%s" % err)


class TerseTextTestRunner(unittest.TextTestRunner):
    def _makeResult(self):
        return _TerseTextTestResult(self.stream, self.descriptions, self.verbosity)


if __name__ == '__main__':
    sys.stderr.write("Running non-critical tests:\n")
    non_critical_suite = unittest.TestLoader().loadTestsFromTestCase(TestSomethingNonCritical)
    TerseTextTestRunner(verbosity=1).run(non_critical_suite)

    sys.stderr.write("\n")
    sys.stderr.write("Running CRITICAL tests:\n")
    suite = unittest.TestLoader().loadTestsFromTestCase(TestEverythingImportant)
    unittest.TextTestRunner(verbosity=1).run(suite)
Possible improvements
It would still be useful to know if there is any testing framework with non-binary tests, as Kathy Van Stone suggested. Probably I won't use it in this simple personal project, but it might be useful in future projects.
I'm not totally sure how unittest works, but most unit-testing frameworks have something akin to categories. I suppose you could just categorize such tests, mark them to be ignored, and then run them only when you're interested in them. But I know from experience that ignored tests very quickly become just that: ignored tests that nobody ever runs, making them a waste of the time and energy it took to write them.
My advice is for your app to do, or do not, there is no try.
From the unittest documentation which you link:
Instead of unittest.main(), there are other ways to run the tests with a finer level of control, less terse output, and no requirement to be run from the command line. For example, the last two lines may be replaced with:
suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)
In your case, you can create separate TestSuite instances for the critical and non-critical tests. You could control which suite is passed to the test runner with a command-line argument. Test suites can also contain other test suites, so you can create big hierarchies if you want.
Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure.
http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures
Tests marked as expected failures won't be counted as failures in a TestResult.
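A short illustration of those standard-library hooks (the test bodies are placeholders):

import unittest


class TestOptionalFeatures(unittest.TestCase):
    @unittest.expectedFailure
    def test_optional_b_support(self):
        self.assertEqual(1, 2)  # recorded as an expected failure, not a failure

    @unittest.skipUnless(hasattr(str, "casefold"), "requires str.casefold")
    def test_needs_newer_python(self):
        self.assertEqual("A".casefold(), "a")


if __name__ == "__main__":
    unittest.main()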
There are some test systems that allow warnings rather than failures, but unittest is not one of them (I don't know offhand which ones do), unless you want to extend it (which is possible).
You can make the tests so that they log warnings rather than fail.
Another way to handle this is to separate out the tests and only run them to get the pass/fail reports, without making the build depend on them (this depends on your build setup).
Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/
There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.
Another possibility is to create a "B" branch (you ARE using some sort of version control, right?) and have your unit tests for "B" in there. That way, you keep your release version's unit tests clean (Look, all dots!), but still have tests for B. If you're using a modern version control system like git or mercurial (I'm partial to mercurial), branching/cloning and merging are trivial operations, so that's what I'd recommend.
However, I think you're using tests for something they're not meant to do. The real question is, "How important is it to you that 'B' works?" Your test suite should only contain tests that you care about passing or failing: tests that, if they fail, mean the code is broken. That's why I suggested only testing "B" in the "B" branch, since that would be the branch where you are developing the "B" feature.
You could test using logger or print commands if you like. But if you don't care enough about it breaking to have it flagged in your unit tests, I'd seriously question whether you care enough to test it at all. Besides, that adds needless complexity (extra variables to set the debug level, multiple testing vectors that are completely independent of each other yet operate within the same space, causing potential collisions and errors, and so on). Unless you're developing a "Hello, World!" app, I suspect your problem set is complicated enough without adding additional, unnecessary complications.
You could write your tests so that they count the success rate.
With OCR you could throw 1000 images at the code and require that 95% are recognized successfully.
If your program must work with type A, then a failure there means the test fails. If it's not required to work with B, what is the value of such a test?
