Non-critical unittest failures

Non-critical unittest failures - python

I'm using Python's built-in unittest module and I want to write a few tests that are not critical.
I mean, if my program passes such tests, that's great! However, if it doesn't pass, it's not really a problem, the program will still work.
For example, my program is designed to work with a custom type "A". If it fails to work with "A", then it's broken. However, for convenience, most of it should also work with another type "B", but that's not mandatory. If it fails to work with "B", then it's not broken (because it still works with "A", which is its main purpose). Failing to work with "B" is not critical, I will just miss a "bonus feature" I could have.
Another (hypothetical) example is when writing an OCR. The algorithm should recognize most images from the tests, but it's okay if some of them fails. (and no, I'm not writing an OCR)
Is there any way to write non-critical tests in unittest (or other testing framework)?

As a practical matter, I'd probably use print statements to indicate failure in that case. A more correct solution is to use warnings:
http://docs.python.org/library/warnings.html
You could, however, use the logging facility to generate a more detailed record of your test results (i.e. set your "B" class failures to write warnings to the logs).
http://docs.python.org/library/logging.html
Edit:
The way we handle this in Django is that we have some tests we expect to fail, and we have others that we skip based on the environment. Since we can generally predict whether a test SHOULD fail or pass (i.e. if we can't import a certain module, the system doesn't have it, and so the test won't work), we can skip failing tests intelligently. This means that we still run every test that will pass, and have no tests that "might" pass. Unit tests are most useful when they do things predictably, and being able to detect whether or not a test SHOULD pass before we run it makes this possible.

Asserts in unit tests are binary: they will work or they will fail, there's no mid-term.
Given that, to create those "non-critical" tests you should not use assertions when you don't want the tests to fail. You should do this carefully so you don't compromise the "usefulness" of the test.
My advice to your OCR example is that you use something to record the success rate in your tests code and then create one assertion like: "assert success_rate > 8.5", and that should give the effect you desire.

Thank you for the great answers. No only one answer was really complete, so I'm writing here a combination of all answers that helped me. If you like this answer, please vote up the people who were responsible for this.
Conclusions
Unit tests (or at least unit tests in unittest module) are binary. As Guilherme Chapiewski says: they will work or they will fail, there's no mid-term.
Thus, my conclusion is that unit tests are not exactly the right tool for this job. It seems that unit tests are more concerned about "keep everything working, no failure is expected", and thus I can't (or it's not easy) to have non-binary tests.
So, unit tests don't seem the right tool if I'm trying to improve an algorithm or an implementation, because unit tests can't tell me how better is one version when compared to the other (supposing both of them are correctly implemented, then both will pass all unit tests).
My final solution
My final solution is based on ryber's idea and code shown in wcoenen answer. I'm basically extending the default TextTestRunner and making it less verbose. Then, my main code call two test suits: the critical one using the standard TextTestRunner, and the non-critical one, with my own less-verbose version.
class _TerseTextTestResult(unittest._TextTestResult):
def printErrorList(self, flavour, errors):
for test, err in errors:
#self.stream.writeln(self.separator1)
self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
#self.stream.writeln(self.separator2)
#self.stream.writeln("%s" % err)
class TerseTextTestRunner(unittest.TextTestRunner):
def _makeResult(self):
return _TerseTextTestResult(self.stream, self.descriptions, self.verbosity)
if __name__ == '__main__':
sys.stderr.write("Running non-critical tests:\n")
non_critical_suite = unittest.TestLoader().loadTestsFromTestCase(TestSomethingNonCritical)
TerseTextTestRunner(verbosity=1).run(non_critical_suite)
sys.stderr.write("\n")
sys.stderr.write("Running CRITICAL tests:\n")
suite = unittest.TestLoader().loadTestsFromTestCase(TestEverythingImportant)
unittest.TextTestRunner(verbosity=1).run(suite)
Possible improvements
It should still be useful to know if there is any testing framework with non-binary tests, like Kathy Van Stone suggested. Probably I won't use it this simple personal project, but it might be useful on future projects.

Im not totally sure how unittest works, but most unit testing frameworks have something akin to categories. I suppose you could just categorize such tests, mark them to be ignored, and then run them only when your interested in them. But I know from experience that ignored tests very quickly become...just that ignored tests that nobody ever runs and are therefore a waste of time and energy to write them.
My advice is for your app to do, or do not, there is no try.

From unittest documentation which you link:
Instead of unittest.main(), there are
other ways to run the tests with a
finer level of control, less terse
output, and no requirement to be run
from the command line. For example,
the last two lines may be replaced
with:
suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)
In your case, you can create separate TestSuite instances for the criticial and non-critical tests. You could control which suite is passed to the test runner with a command line argument. Test suites can also contain other test suites so you can create big hierarchies if you want.

Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure.
http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures
Tests marked as expected failure won't be counted as failure on a TestResult.

There are some test systems that allow warnings rather than failures, but test_unit is not one of them (I don't know which ones do, offhand) unless you want to extend it (which is possible).
You can make the tests so that they log warnings rather than fail.
Another way to handle this is to separate out the tests and only run them to get the pass/fail reports and not have any build dependencies (this depends on your build setup).

Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/
There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.

Another possibility is to create a "B" branch (you ARE using some sort of version control, right?) and have your unit tests for "B" in there. That way, you keep your release version's unit tests clean (Look, all dots!), but still have tests for B. If you're using a modern version control system like git or mercurial (I'm partial to mercurial), branching/cloning and merging are trivial operations, so that's what I'd recommend.
However, I think you're using tests for something they're not meant to do. The real question is "How important to you is it that 'B' works?" Because your test suite should only have tests in it that you care whether they pass or fail. Tests that, if they fail, it means the code is broken. That's why I suggested only testing "B" in the "B" branch, since that would be the branch where you are developing the "B" feature.
You could test using logger or print commands, if you like. But if you don't care enough that it's broken to have it flagged in your unit tests, I'd seriously question whether you care enough to test it at all. Besides, that adds needless complexity (extra variables to set debug level, multiple testing vectors that are completely independent of each other yet operate within the same space, causing potential collisions and errors, etc, etc). Unless you're developing a "Hello, World!" app, I suspect your problem set is complicated enough without adding additional, unnecessary complications.

You could write your test so that they count success rate.
With OCR you could throw at code 1000 images and require that 95% is successful.
If your program must work with type A then if this fails the test fails. If it's not required to work with B, what is the value of doing such a test ?

Related

test isolation between pytest-hypothesis runs

I just migrated a pytest test suite from quickcheck to hypothesis. This worked quite well (and immediately uncovered some hidden edge case bugs), but one major difference I see is related to test isolation between the two property managers.
quickcheck seems to simply run the test function multiple times with different parameter values, each time running my function-scoped fixtures. This also results in many more dots in pytest's output.
hypothesis however seems to run only the body of the test function multiple times, meaning for example no transaction rollbacks between individual runs. This then means I cannot reliably assert for a number of DB entries when my test inserts something into the DB for example, since all the entries from the previous run would still be hanging around.
Am I missing something obvious here or is this expected behaviour? If so, is there a way to get the number of runs hypothesis has done as a variable to use inside the test?

I'm afraid you're a bit stuck and there isn't currently any good solution to this problem.
The way Hypothesis needs to work (which is the source of a lot of its improvements over pytest-quickcheck) doesn't meet pytest's assumptions about test execution. The problem is mostly on the pytest side - the current pytest fixture system has some very baked in assumptions about how you run a test that do not play well with taking control of the test execution, and the last time I tried to work around this I ended up sinking about a week of work into it before giving up and basically saying that either something needs to change on the pytest side or someone needs to fund this work if it's going to get any better.

Is there an alternative result for Python unit tests, other than a Pass or Fail?

I'm writing unit tests that have a database dependency (so technically they're functional tests). Often these tests not only rely on the database to be live and functional, but they can also rely on certain data to be available.
For example, in one test I might query the database to retrieve sample data that I am going to use to test the update or delete functionality. If data doesn't already exist, then this isn't exactly a failure in this context. I'm only concerned about the pass/fail status of the update or delete, and in this situation we didn't even get far enough to test it. So I don't want to give a false positive or false negative.
Is there an elegant way to have the unit test return a 3rd possible result? Such as a warning?

In general I think the advice by Paul Becotte is best for most cases:
This is a failure though- your tests failed to set up the system in
the way that your test required. Saying "the data wasn't there, so it
is okay for this test to fail" is saying that you don't really care
about whether the functionality you are testing works. Make your test
reliably insert the data immediately before retrieving it, which will
give you better insight into the state of your system.
However, in my particular case, I am writing a functional test that relies on data generated and manipulated from several processes. Generating it quickly at the beginning of the test just isn't practical (at least yet).
What I ultimately found to work as I need it to use skipTest as mentioned here:
Skip unittest if some-condition in SetUpClass fails

How to approach unittesting and TDD (using python + nose)

I have been trying to get the hang of TDD and unit testing (in python, using nose) and there are a few basic concepts which I'm stuck on. I've read up a lot on the subject but nothing seems to address my issues - probably because they're so basic they're assumed to be understood.
The idea of TDD is that unit tests are written before the code they test. Unit test should test small portions of code (e.g. functions) which, for the purposes of the test, are self-contained and isolated. However, this seems to me to be highly dependent on the implementation. During implementation, or during a later bugfix it may become necessary to abstract some of the code into a new function. Should I then go through all my tests and mock out that function to keep them isolated? Surely in doing this there is a danger of introducing new bugs into the tests, and the tests will no longer test exactly the same situation?
From my limited experience in writing unit tests, it appears that completely isolating a function sometimes results in a test that is longer and more complicated than the code it is testing. So if the test fails all it tells you is that there is either a bug in the code or in the test, but its not obvious which. Not isolating it may mean a much shorter and easier to read test, but then its not a unit test...
Often, once isolated, unit tests seem to be merely repeating the function. E.g. if there is a simple function which adds two numbers, then the test would probably look something like assert add(a, b) == a + b. Since the implementation is simply return a + b, what's the point in the test? A far more useful test would be to see how the function works within the system, but this goes against unit testing because it is no longer isolated.
My conclusion is that unit tests are good in some situations, but not everywhere and that system tests are generally more useful. The approach that this implies is to write system tests first, then, if they fail, isolate portions of the system into unit tests to pinpoint the failure. The problem with this, obviously, is that its not so easy to test corner cases. It also means that the development is not fully test driven, as unit tests are only written as needed.
So my basic questions are:
Should unit tests be used everywhere, however small and simple the function?
How does one deal with changing implementations? I.e. should the implementation of the tests change continuously too, and doesn't this reduce their usefulness?
What should be done when the test gets more complicated than the code its testing?
Is it always best to start with unit tests, or is it better to start with system tests, which at the start of development are much easier to write?

Regarding your conclusion first: both unit tests and system tests (integration tests) both have their use, and are in my opinion just as useful. During development I find it easier to start with unit tests, but for testing legacy code I find your approach where you start with the integration tests easier. I don't think there's a right or wrong way of doing this, the goal is to make a safetynet that allows you to write solid and well tested code, not the method itself.
I find it useful to think about each function as an API in this context. The unit test is testing the API, not the implementation. If the implementation changes, the test should remain the same, this is the safety net that allows you to refactor your code with confidence. Even if refactoring means taking part of the implementation out to a new function, I will say it's ok to keep the test as it is without stubbing or mocking the part that was refactored out. You will probably want a new set of tests for the new function however.
Unit tests are not a holy grail! Test code should be fairly simple in my opinion, and it should be little reason for the test code itself to fail. If the test becomes more complex than the function it tests, it probably means you need to refactor the code differently. An example from my own past: I had some code that took some input and produced some output stored as XML. Parsing the XML to verifying that the output was correct caused a lot of complexity in my tests. However realizing that the XML-representation was not the point, I was able to refactor the code so that I could test the output without messing with the details of XML.
Some functions are so trivial that a separate test for them adds no value. In your example you're not really testing your code, but that the '+' operator in your language works as expected. This should be tested by the language implementer, not you. However that function won't need to get very much more complex before adding a test for it is worthwhile.
In short, I think your observations are very relevant and point towards a pragmatic approach to testing. Following some rigorous definition too closely will often get in the way, even though the definitions themselves may be necessary for the purpose of having a way to communicate about the ideas they convey. As said, the goal is not the method, but the result; which for testing is to have confidence in your code.

1) Should unit tests be used everywhere, however small and simple the function?
No. If a function has no logic in it (if, while-loops, appends, etc...) there's nothing to test.
This means that an add function implemented like:
def add(a, b):
return a + b
It doesn't have anything to test. But if you really want to build a test for it, then:
assert add(a, b) == a + b # Worst test ever!
is the worst test one could ever write. The main problem is that the tested logic must NOT be reproduced in the testing code, because:
If there's a bug in there it will be reproduced as well.
You're no more testing the function but that a + b works in the same way in two different files.
So it would make more sense something like:
assert add(1, 2) == 3
But once again, this is just an example, and this add function shouldn't even be tested.
2) How does one deal with changing implementations?
It depends on what changes. Keep in mind that:
You're testing the API (roughly speaking, that for a given input you get a specific output/effect).
You're not repeating the production code in your testing code (as explained before).
So, unless you're changing the API of your production code, the testing code will not be affacted in any way.
3) What should be done when the test gets more complicated than the code its testing?
Yell at whoever wrote those tests! (And re-write them).
Unit tests are simple and don't have any logic in them.
4a) Is it always best to start with unit tests, or is it better to start with system tests?
If we are talking about TDD than one shouldn't even have this problem, because even before writing one little tiny function the good TDD developer would've written unit tests for it.
If you have already working code without tests whatsoever, I'd say that unit tests are easier to write.
4b) Which at the start of development are much easier to write?
Unit tests! Because you don't even have the root of your code, how could you write system tests?

How do I test whether a module is imported in Python for Test-Driven Development of a game?

I am about to make a game using python and libtcod roguelike game library.
More to the point, I am using PyMock because I am just starting to learn Test-Driven Development, and I am determined not to cheat. I really want to get into the habit of doing it properly, and according to TDD I need a failing unit test before I write my first line of code.
I figure my first test of my "production" code should be that its dependency, libcotdpy, is imported.
My testing file:
#!/usr/bin/python
import pymock # for mocking and unit testing
import game # my (empty) production code file, game.py
class InitializeTest(pymock.PyMockTestCase):
def test_libtcod_is_imported(self):
# How do I test that my production file imports the libtcodpy module?
if __name__=="__main__":
import unittest
unittest.main()
Please:
1) (python people) How do I test that the module is loaded?
2) (TDD people) Should I be unit testing something this basic? If not, what is the first thing I should be testing?

1) 'your_module' in sys.modules.
Don't actually use that, though:
2)
What should your library should do?
Is it “have a dependency on libcotdpy”? I think not.
You've just made a design choice that wasn't test-driven!
Write a test that demonstrates how you want to use the library. Don't think about how you're going to implement it. For example:
player = my_lib.PlayerCharacter()
assert player.position == (0, 0) # or whatever assert syntax `pymock` uses
press_key('k')
assert player.position == (0, 1)
Or something similar. (I don't know what you want your library to do, or how much libtcod provides.)

The way I usually think about TDD (and BDD) is at two levels of development: acceptance-testing level, and unit-testing level.
First thing I would do is write stories (acceptance criteria). What is the core feature of your application? Define an end-to-end scenario that explicit one feature, and goes end-to-end with it. That's your first story. Write a test for it, using an acceptance testing (or integration testing) framework. Unfortunately, I don't know Python tools, but in Java I would use JBehave, or FITnesse. It would be something very high-level, far away from the code, that considers your application as a "black box". Something like "When my input parameters are xxx, I run my application, the expected output is yyyy".
Run this test, it will fail because the underlying application doesn't exist. Create the minimal amount of classes to make it go red (and not throw an exception anymore). That's when you need to start the second phase of TDD: unit-TDD. It's basically a "descending analysis", from top-level to details, and this phase will contain a lot of red-green-refactor cycles, bringing a lot of different units in the game.
From time to time, re-run your original acceptance test, or refine it if your growing architecture and analysis forced you to make changes to specifications (theoretically, it shouldn't happen at that stage, but in practice it does, very often). When your acceptance test is completely green, you're done with that story, rinse and repeat.
All of that brings me to my point: pure TDD (I mean unit-TDD) is not practical. I mean I really like TDD, but trying to follow it religiously will be more a hassle than a help in the long run. Sometimes you will go and spike an approach to see if that goes well with the rest of your project, without writing tests first for it, and potentially rewrite it using TDD. but as long as you have acceptance tests to cover the whole lot, you're fine.

Even if there is a way to test that, I'd recommend not doing it.
Test from the client perspective (outside-in), what behavior is provided by your SUT (Game). Your tests (or your users) don't need to know (/care) that you expose this behavior using a library. As long as the behavior isn't broken, your tests should pass.
Also like another answer says, maybe you don't need the dependency - there may be a simpler solution (e.g. a hashtable might do where you instinctively jumped on a relational database). Listen to the tests... let the tests pull in behavior.
This also leaves you free to change the dependency in the future without having to fix a bunch of tests.

Unit-testing with dependencies between tests

How do you do unit testing when you have
some general unit tests
more sophisticated tests checking edge cases, depending on the general ones
To give an example, imagine testing a CSV-reader (I just made up a notation for demonstration),
def test_readCsv(): ...
#dependsOn(test_readCsv)
def test_readCsv_duplicateColumnName(): ...
#dependsOn(test_readCsv)
def test_readCsv_unicodeColumnName(): ...
I expect sub-tests to be run only if their parent test succeeds. The reason behind this is that running these tests takes time. Many failure reports that go back to a single reason wouldn't be informative, either. Of course, I could shoehorn all edge-cases into the main test, but I wonder if there is a more structured way to do this.
I've found these related but different questions,
How to structure unit tests that have dependencies?
Unit Testing - Is it bad form to have unit test calling other unit tests
UPDATE:
I've found TestNG which has great built-in support for test dependencies. You can write tests like this,
#Test{dependsOnMethods = ("test_readCsv"))
public void test_readCsv_duplicateColumnName() {
...
}

Personally, I wouldn't worry about creating dependencies between unit tests. This sounds like a bit of a code smell to me. A few points:
If a test fails, let the others fail to and get a good idea of the scale of the problem that the adverse code change made.
Test failures should be the exception rather than the norm, so why waste effort and create dependencies when the vast majority of the time (hopefully!) no benefit is derived? If failures happen often, your problem is not with unit test dependencies but with frequent test failures.
Unit tests should run really fast. If they are running slow, then focus your efforts on increasing the speed of these tests rather than preventing subsequent failures. Do this by decoupling your code more and using dependency injection or mocking.

Proboscis is a python version of TestNG (which is a Java library).
See packages.python.org/proboscis/
It supports dependencies, e.g.
#test(depends_on=[test_readCsv])
public void test_readCsv_duplicateColumnName() {
...
}

I'm not sure what language you're referring to (as you don't specifically mention it in your question) but for something like PHPUnit there is an #depends tag that will only run a test if the depended upon test has already passed.
Depending on what language or unit testing you use there may also be something similar available

I have implemented a plugin for Nose (Python) which adds support for test dependencies and test prioritization.
As mentioned in the other answers/comments this is often a bad idea, however there can be exceptions where you would want to do this (in my case it was performance for integration tests - with a huge overhead for getting into a testable state, minutes vs hours).
You can find it here: nosedep.
A minimal example is:
def test_a:
pass
#depends(before=test_a)
def test_b:
pass
To ensure that test_b is always run before test_a.

You may want use pytest-dependency. According to theirs documentation code looks elegant:
import pytest
#pytest.mark.dependency()
#pytest.mark.xfail(reason="deliberate fail")
def test_a():
assert False
#pytest.mark.dependency()
def test_b():
pass
#pytest.mark.dependency(depends=["test_a"])
def test_c():
pass
#pytest.mark.dependency(depends=["test_b"])
def test_d():
pass
#pytest.mark.dependency(depends=["test_b", "test_c"])
def test_e():
pass
Please note, it is plugin for pytest, not unittest which is part of python itself. So, you need 2 more dependencies (f.e. add into requirements.txt):
pytest==5.1.1
pytest-dependency==0.4.0

According to best practices and unit testing principles unit test should not depend on other ones.
Each test case should check concrete isolated behavior.
Then if some test case fail you will know exactly what became wrong with our code.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Non-critical unittest failures - python

Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure. http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures Tests marked as expected failure won't be counted as failure on a TestResult.

Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/ There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.

You could write your test so that they count success rate. With OCR you could throw at code 1000 images and require that 95% is successful. If your program must work with type A then if this fails the test fails. If it's not required to work with B, what is the value of doing such a test ?

Related

test isolation between pytest-hypothesis runs

Is there an alternative result for Python unit tests, other than a Pass or Fail?

How to approach unittesting and TDD (using python + nose)

How do I test whether a module is imported in Python for Test-Driven Development of a game?

Unit-testing with dependencies between tests

Categories

Resources