Python - doctest vs. unittest [closed]

I'm trying to get started with unit testing in Python and I was wondering if someone could explain the advantages and disadvantages of doctest and unittest.
What conditions would you use each for?

Both are valuable. I use doctest, plus nose in place of unittest. I use doctest for cases where the test is giving an example of usage that is actually useful as documentation. Generally I don't make these tests comprehensive; I aim only to be informative. I'm effectively using doctest in reverse: not to verify that my code is correct based on my doctests, but to check that my documentation is correct based on the code.
The reason is that I find comprehensive doctests clutter your documentation far too much, so you end up with either unusable docstrings or incomplete testing.
For actually testing the code, the goal is to thoroughly test every case rather than illustrate what it does by example, which is a different goal that I think is better met by other frameworks.
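For illustration, a minimal sketch of this documentation-first style of doctest (the slugify function and its behaviour are invented): the example documents usage first and acts as a test second.
import re

def slugify(title):
    """Turn a title into a URL-friendly slug.

    >>> slugify("Hello, World!")
    'hello-world'
    """
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

if __name__ == "__main__":
    import doctest
    doctest.testmod()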

I use unittest almost exclusively.
Once in a while, I'll put some stuff in a docstring that's usable by doctest.
95% of the test cases are unittest.
Why? I like keeping docstrings somewhat shorter and more to the point. Sometimes test cases help clarify a docstring. Most of the time, the application's test cases are too long for a docstring.

Another advantage of doctesting is that you get to make sure your code does what your documentation says it does. After a while, software changes can make your documentation and code do different things. :-)

I work as a bioinformatician, and most of the code I write is "one time, one task" scripts: code that will be run only once or twice to execute a single specific task.
In this situation, writing big unittests may be overkill, and doctests are a useful compromise. They are quicker to write, and since they usually live in the code itself, they let you keep an eye on how the code should behave without having to have another file open. That's useful when writing small scripts.
Also, doctests are useful when you have to pass your script to a researcher who is not an expert in programming. Some people find it very difficult to understand how unittests are structured; doctests, on the other hand, are simple examples of usage, so people can just copy and paste them to see how to use the code.
So, to summarize my answer: doctests are useful when you have to write small scripts, and when you have to pass them or show them to researchers who are not computer scientists.

If you're just getting started with the idea of unit testing, I would start with doctest because it is so simple to use. It also naturally provides some level of documentation. And for more comprehensive testing with doctest, you can place tests in an external file so it doesn't clutter up your documentation.
I would suggest unittest if you're coming from a background of having used JUnit or something similar, where you want to be able to write unit tests in generally the same way as you have been elsewhere.
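As a rough sketch of the external-file option mentioned above (the file name api_examples.txt and the module mymodule are placeholders), doctest.testfile keeps the examples out of your docstrings:
# run_doc_examples.py
import doctest

if __name__ == "__main__":
    # api_examples.txt holds interactive-style examples, e.g.:
    #   >>> from mymodule import add
    #   >>> add(2, 3)
    #   5
    doctest.testfile("api_examples.txt")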

I don't use doctest as a replacement for unittest. Although they overlap a bit, the two modules don't have the same function:
I use unittest as a unit testing framework, meaning it helps me determine quickly the impact of any modification on the rest of the code.
I use doctest as a guarantee that comments (namely docstrings) are still relevant to the current version of the code.
The widely documented benefits of test driven development I get from unittest. doctest solves the far more subtle danger of having outdated comments misleading the maintenance of the code.

I use unittest exclusively; I think doctest clutters up the main module too much. This probably has to do with writing thorough tests.

Using both is a valid and rather simple option. The doctest module provides the DocTestSuite and DocFileSuite functions, which create a unittest-compatible test suite from a module or a file, respectively.
So I use both, and typically use doctest for simple tests of functions that require little or no setup (simple types for arguments). I actually think a few doctest tests help document the function, rather than detract from it.
But for more complicated cases, and for a more comprehensive set of test cases, I use unittest which provides more control and flexibility.
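A minimal sketch of wiring the two together via the load_tests protocol (mymodule and docs/usage.txt are placeholder names, not real files):
# test_all.py
import doctest
import unittest

import mymodule  # the module whose docstrings contain examples (placeholder)

class ComplicatedCaseTest(unittest.TestCase):
    def test_edge_case(self):
        self.assertEqual(mymodule.add(-1, 1), 0)  # assumes a hypothetical mymodule.add

def load_tests(loader, tests, ignore):
    # Fold the docstring examples and the external example file into the same unittest run.
    tests.addTests(doctest.DocTestSuite(mymodule))
    tests.addTests(doctest.DocFileSuite("docs/usage.txt", module_relative=False))
    return tests

if __name__ == "__main__":
    unittest.main()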

I almost never use doctests. I want my code to be self documenting, and the docstrings provide the documentation to the user. IMO adding hundreds of lines of tests to a module makes the docstrings far less readable. I also find unit tests easier to modify when needed.

Doctest can sometimes give the wrong result, especially when the output contains escape sequences. For example:
def convert():
    """
    >>> convert()
    '\xe0\xa4\x95'
    """
    a = '\xe0\xa4\x95'
    return a

import doctest
doctest.testmod()
gives
**********************************************************************
File "hindi.py", line 3, in __main__.convert
Failed example:
convert()
Expected:
'क'
Got:
'\xe0\xa4\x95'
**********************************************************************
1 items had failures:
1 of 1 in __main__.convert
***Test Failed*** 1 failures.
Doctest also doesn't check the type of the output; it just compares the output strings. For example, suppose you have made some Rational type that prints just like an integer when the value is a whole number, and you have a function that returns a Rational. A doctest won't differentiate whether the output is a whole-number Rational or a plain integer.
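A small invented illustration of that pitfall: the Rational class below is a made-up stand-in, and its doctest passes even though the function never returns an int.
from fractions import Fraction

class Rational(Fraction):
    # Hypothetical type whose repr looks like a plain int for whole values.
    def __repr__(self):
        if self.denominator == 1:
            return str(self.numerator)
        return "%d/%d" % (self.numerator, self.denominator)

def half_of(n):
    """
    >>> half_of(4)
    2
    """
    return Rational(n, 2)

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # passes, although half_of() returns a Rational, not an int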

I prefer the discovery-based systems ("nose" and "py.test"; I'm using the former currently).
doctest is nice when the test also works as documentation; otherwise it tends to clutter the code too much.

Related

What to mock for python test cases?

I want to understand what needs to be mocked and what not when writing test cases in general.
For example, we will mock I/O operations, but what about functions imported from another module? Are we supposed to mock them as well?
Mocking should be done for a reason. Good reasons are:
You can not easily make the depended-on component (DOC) behave as intended for your tests.
Calling the DOC causes non-deterministic behaviour (date/time, randomness, network connections).
The test setup is overly complex and/or maintenance-intensive (for example, a need for external files).
The original DOC brings portability problems for your test code.
Using the original DOC causes unacceptably long build / execution times.
The DOC has stability (maturity) issues that make the tests unreliable, or, worse, the DOC is not even available yet.
For example, you (typically) don't mock standard library math functions like sin or cos, because they don't have any of the above-mentioned problems.
You really have to know what you are unit testing. From there it will be clear what to mock...
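For illustration, a minimal sketch of the non-determinism case using unittest.mock (standard library from Python 3.3); quote_in_eur and fetch_price are invented names, with the network-hitting DOC passed in and replaced by a Mock in the test:
import unittest
from unittest import mock

def quote_in_eur(symbol, fetch_price, rate=0.9):
    # fetch_price is the depended-on component (DOC): in production it hits the network.
    return fetch_price(symbol) * rate

class QuoteTest(unittest.TestCase):
    def test_quote_converts_using_rate(self):
        fake_fetch = mock.Mock(return_value=100.0)  # deterministic stand-in for the DOC
        self.assertAlmostEqual(quote_in_eur("ACME", fake_fetch), 90.0)
        fake_fetch.assert_called_once_with("ACME")

if __name__ == "__main__":
    unittest.main()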

overriding the way Python unittest prints results

I am utterly confused by the unittest documentation: TestResult, TestLoader, testing framework, etc.
I just want to tweak the way the final results of a test run are printed out.
I have a specific thing I need to do: I am in fact using Jython, so when a bit of code raises an ExecutionException I need to dig down into the cause of this exception (ExecutionException.getCause()) to find the "real" exception which occurred, where it occurred, etc. At the moment I am just getting the location of the Future.get() which raises such an exception, and the message from the original exception (with no location). Useful, but could be improved.
Shouldn't it (in principle) be really simple to find out the object responsible for outputting the results of the testing and override some method like "print_result"...
There is another question here: Overriding Python Unit Test module for custom output? [code updated]
... this has no answers, and although the questioner said 9 months ago that he had "solved" it, he hasn't provided an answer. In any event it looks horribly complicated for what is a not unreasonable way of wishing to tweak things mildly... isn't there a simple way to do this?
Later, in answer to MartinBroadhurst's question about documenting during the run:
In fact I could laboriously surround all sorts of bits of code with try...except followed by a documentation function... but if I don't do that, any unexpected exceptions obviously escape, ultimately being caught by the testing framework.
In fact I have a decorator which I've made, @vigil(is_EDT) (boolean param), which I use to decorate most methods and functions, the primary purpose of which is to check that the decorated method is being called in the "right kind of thread" (i.e. either the EDT or a non-EDT thread). This could be extended to trap any kind of exception, which is something I did previously as a solution to this problem of mine. It then printed out the exception details there and then, which was fine: the stuff was obviously not printed out at the same time as the results of the unittest run, but it was useful.
But in fact I shouldn't need to resort to my vigil function in this "make-and-mend" way! It really should be possible to tweak the unittest classes to override the way an exception is handled! Ultimately, unless some unittest guru can answer this question of mine, I'm going to have to examine the unittest source code and work out a way from there.
In a previous question of mine I asked about what appear to be a couple of non-functioning methods of unittest.TestResult... and it does regrettably appear that this is not implemented as the Python documentation claims. Similarly, a little additional experimentation just now suggests more misdocumentation: on the Python documentation page for unittest they appear to have incorrectly documented TestResult.startTest(), stopTest(), etc.: the parameter "test" should not be there (the convention in this documentation appears to be to omit the self param, and each of these methods takes only the self param).
In short, the whole unittest module is surprisingly unwieldy and dodgy... I'm surprised not least because I would have thought others in more influential positions than me would have got things changed...
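As a rough sketch (not a Jython-specific answer): under CPython's unittest (2.7+), the object that prints results is a TextTestResult, and you can subclass it and hand it to TextTestRunner via the resultclass argument, for example:
import unittest

class CauseDiggingResult(unittest.TextTestResult):
    def addError(self, test, err):
        exc_type, exc_value, tb = err
        # Dig into a chained cause if there is one; under Jython you would
        # call getCause() on the wrapped ExecutionException instead.
        cause = getattr(exc_value, "__cause__", None)
        if cause is not None:
            self.stream.writeln("underlying cause: %r" % (cause,))
        super(CauseDiggingResult, self).addError(test, err)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.discover(".")
    unittest.TextTestRunner(resultclass=CauseDiggingResult, verbosity=2).run(suite)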

How to approach unittesting and TDD (using python + nose)

I have been trying to get the hang of TDD and unit testing (in python, using nose) and there are a few basic concepts which I'm stuck on. I've read up a lot on the subject but nothing seems to address my issues - probably because they're so basic they're assumed to be understood.
The idea of TDD is that unit tests are written before the code they test. Unit tests should test small portions of code (e.g. functions) which, for the purposes of the test, are self-contained and isolated. However, this seems to me to be highly dependent on the implementation. During implementation, or during a later bugfix, it may become necessary to abstract some of the code into a new function. Should I then go through all my tests and mock out that function to keep them isolated? Surely in doing this there is a danger of introducing new bugs into the tests, and the tests will no longer test exactly the same situation?
From my limited experience in writing unit tests, it appears that completely isolating a function sometimes results in a test that is longer and more complicated than the code it is testing. So if the test fails, all it tells you is that there is a bug either in the code or in the test, but it's not obvious which. Not isolating it may mean a much shorter and easier-to-read test, but then it's not a unit test...
Often, once isolated, unit tests seem to be merely repeating the function. E.g. if there is a simple function which adds two numbers, then the test would probably look something like assert add(a, b) == a + b. Since the implementation is simply return a + b, what's the point in the test? A far more useful test would be to see how the function works within the system, but this goes against unit testing because it is no longer isolated.
My conclusion is that unit tests are good in some situations, but not everywhere, and that system tests are generally more useful. The approach this implies is to write system tests first and then, if they fail, isolate portions of the system into unit tests to pinpoint the failure. The problem with this, obviously, is that it's not so easy to test corner cases. It also means that development is not fully test driven, as unit tests are only written as needed.
So my basic questions are:
Should unit tests be used everywhere, however small and simple the function?
How does one deal with changing implementations? I.e. should the implementation of the tests change continuously too, and doesn't this reduce their usefulness?
What should be done when the test gets more complicated than the code it's testing?
Is it always best to start with unit tests, or is it better to start with system tests, which at the start of development are much easier to write?
Regarding your conclusion first: unit tests and system tests (integration tests) both have their uses, and in my opinion they are equally useful. During development I find it easier to start with unit tests, but for testing legacy code I find your approach, where you start with the integration tests, easier. I don't think there's a right or wrong way of doing this; the goal is to build a safety net that allows you to write solid and well tested code, not the method itself.
I find it useful to think about each function as an API in this context. The unit test is testing the API, not the implementation. If the implementation changes, the test should remain the same; this is the safety net that allows you to refactor your code with confidence. Even if refactoring means moving part of the implementation out into a new function, I would say it's OK to keep the test as it is, without stubbing or mocking the part that was refactored out. You will probably want a new set of tests for the new function, however.
Unit tests are not a holy grail! Test code should be fairly simple in my opinion, and there should be little reason for the test code itself to fail. If the test becomes more complex than the function it tests, it probably means you need to refactor the code differently. An example from my own past: I had some code that took some input and produced output stored as XML. Parsing the XML to verify that the output was correct caused a lot of complexity in my tests. However, realizing that the XML representation was not the point, I was able to refactor the code so that I could test the output without messing with the details of XML.
Some functions are so trivial that a separate test for them adds no value. In your example you're not really testing your code, but that the '+' operator in your language works as expected. That should be tested by the language implementer, not you. However, a function doesn't need to get very much more complex before adding a test for it becomes worthwhile.
In short, I think your observations are very relevant and point towards a pragmatic approach to testing. Following some rigorous definition too closely will often get in the way, even though the definitions themselves may be necessary for the purpose of having a way to communicate about the ideas they convey. As said, the goal is not the method, but the result; which for testing is to have confidence in your code.
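To illustrate the "test the API, not the implementation" point above, a small invented example: the test exercises only normalize(), so extracting the _collapse_spaces() helper during a later refactor leaves the test untouched.
import unittest

def _collapse_spaces(text):
    # Helper that might be extracted in a later refactor; the test never mentions it.
    return " ".join(text.split())

def normalize(text):
    return _collapse_spaces(text).lower()

class NormalizeTest(unittest.TestCase):
    def test_normalize_lowercases_and_collapses_whitespace(self):
        self.assertEqual(normalize("  Hello   World "), "hello world")

if __name__ == "__main__":
    unittest.main()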
1) Should unit tests be used everywhere, however small and simple the function?
No. If a function has no logic in it (if, while-loops, appends, etc...) there's nothing to test.
This means that an add function implemented like:
def add(a, b):
    return a + b
has nothing in it to test. But if you really want to build a test for it, then:
assert add(a, b) == a + b # Worst test ever!
is the worst test one could ever write. The main problem is that the logic under test must NOT be reproduced in the testing code, because:
If there's a bug in that logic, it will be reproduced in the test as well.
You're no longer testing the function, only that a + b works the same way in two different files.
So something like this would make more sense:
assert add(1, 2) == 3
But once again, this is just an example, and this add function shouldn't even be tested.
2) How does one deal with changing implementations?
It depends on what changes. Keep in mind that:
You're testing the API (roughly speaking, that for a given input you get a specific output/effect).
You're not repeating the production code in your testing code (as explained before).
So, unless you're changing the API of your production code, the testing code will not be affected in any way.
3) What should be done when the test gets more complicated than the code its testing?
Yell at whoever wrote those tests! (And re-write them).
Unit tests are simple and don't have any logic in them.
4a) Is it always best to start with unit tests, or is it better to start with system tests?
If we are talking about TDD, then one shouldn't even have this problem, because even before writing one tiny function the good TDD developer would have written unit tests for it.
If you have already working code without tests whatsoever, I'd say that unit tests are easier to write.
4b) Which at the start of development are much easier to write?
Unit tests! Because you don't even have the root of your code, how could you write system tests?

Non-critical unittest failures

I'm using Python's built-in unittest module and I want to write a few tests that are not critical.
I mean, if my program passes such tests, that's great! However, if it doesn't pass, it's not really a problem; the program will still work.
For example, my program is designed to work with a custom type "A". If it fails to work with "A", then it's broken. However, for convenience, most of it should also work with another type "B", but that's not mandatory. If it fails to work with "B", then it's not broken (because it still works with "A", which is its main purpose). Failing to work with "B" is not critical, I will just miss a "bonus feature" I could have.
Another (hypothetical) example is when writing an OCR. The algorithm should recognize most images from the tests, but it's okay if some of them fail. (And no, I'm not writing an OCR.)
Is there any way to write non-critical tests in unittest (or other testing framework)?
As a practical matter, I'd probably use print statements to indicate failure in that case. A more correct solution is to use warnings:
http://docs.python.org/library/warnings.html
You could, however, use the logging facility to generate a more detailed record of your test results (i.e. set your "B" class failures to write warnings to the logs).
http://docs.python.org/library/logging.html
Edit:
The way we handle this in Django is that we have some tests we expect to fail, and we have others that we skip based on the environment. Since we can generally predict whether a test SHOULD fail or pass (i.e. if we can't import a certain module, the system doesn't have it, and so the test won't work), we can skip failing tests intelligently. This means that we still run every test that will pass, and have no tests that "might" pass. Unit tests are most useful when they do things predictably, and being able to detect whether or not a test SHOULD pass before we run it makes this possible.
Asserts in unit tests are binary: they will work or they will fail; there's no middle ground.
Given that, to create those "non-critical" tests you should not use assertions when you don't want the tests to fail. You should do this carefully so you don't compromise the "usefulness" of the test.
My advice for your OCR example is to use something in your test code to record the success rate and then create one assertion like "assert success_rate > 0.85"; that should give the effect you desire.
Thank you for the great answers. No single answer was really complete, so I'm writing here a combination of all the answers that helped me. If you like this answer, please vote up the people who were responsible for it.
Conclusions
Unit tests (or at least unit tests in the unittest module) are binary. As Guilherme Chapiewski says: they will work or they will fail; there's no middle ground.
Thus, my conclusion is that unit tests are not exactly the right tool for this job. It seems that unit tests are more concerned with "keep everything working, no failure is expected", and thus I can't (or it's not easy to) have non-binary tests.
So, unit tests don't seem to be the right tool if I'm trying to improve an algorithm or an implementation, because unit tests can't tell me how much better one version is compared to the other (supposing both of them are correctly implemented, then both will pass all the unit tests).
My final solution
My final solution is based on ryber's idea and the code shown in wcoenen's answer. I'm basically extending the default TextTestRunner and making it less verbose. Then my main code calls two test suites: the critical one using the standard TextTestRunner, and the non-critical one with my own less verbose version.
import sys
import unittest

class _TerseTextTestResult(unittest._TextTestResult):
    def printErrorList(self, flavour, errors):
        for test, err in errors:
            #self.stream.writeln(self.separator1)
            self.stream.writeln("%s: %s" % (flavour, self.getDescription(test)))
            #self.stream.writeln(self.separator2)
            #self.stream.writeln("%s" % err)

class TerseTextTestRunner(unittest.TextTestRunner):
    def _makeResult(self):
        return _TerseTextTestResult(self.stream, self.descriptions, self.verbosity)

if __name__ == '__main__':
    # TestSomethingNonCritical and TestEverythingImportant are the TestCase
    # classes defined elsewhere in the test module.
    sys.stderr.write("Running non-critical tests:\n")
    non_critical_suite = unittest.TestLoader().loadTestsFromTestCase(TestSomethingNonCritical)
    TerseTextTestRunner(verbosity=1).run(non_critical_suite)

    sys.stderr.write("\n")
    sys.stderr.write("Running CRITICAL tests:\n")
    suite = unittest.TestLoader().loadTestsFromTestCase(TestEverythingImportant)
    unittest.TextTestRunner(verbosity=1).run(suite)
Possible improvements
It would still be useful to know whether there is any testing framework with non-binary tests, as Kathy Van Stone suggested. I probably won't use it in this simple personal project, but it might be useful on future projects.
I'm not totally sure how unittest works, but most unit testing frameworks have something akin to categories. I suppose you could just categorize such tests, mark them to be ignored, and then run them only when you're interested in them. But I know from experience that ignored tests very quickly become just that: ignored tests that nobody ever runs, which makes them a waste of the time and energy it took to write them.
My advice is for your app to do, or do not, there is no try.
From the unittest documentation which you link:
Instead of unittest.main(), there are other ways to run the tests with a finer level of control, less terse output, and no requirement to be run from the command line. For example, the last two lines may be replaced with:
suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)
In your case, you can create separate TestSuite instances for the critical and non-critical tests. You could control which suite is passed to the test runner with a command-line argument. Test suites can also contain other test suites, so you can create big hierarchies if you want.
Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure.
http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures
Tests marked as expected failure won't be counted as failure on a TestResult.
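A small invented sketch of those decorators in use:
import sys
import unittest

class TypeBSupportTest(unittest.TestCase):

    @unittest.expectedFailure
    def test_works_with_type_b(self):
        # Nice-to-have behaviour; a failure here is reported under
        # "expected failures" rather than failing the run.
        self.assertTrue(False, "type B support not implemented yet")

    @unittest.skipIf(sys.platform == "win32", "type B temp files are not supported on Windows")
    def test_type_b_temp_files(self):
        self.assertTrue(True)

if __name__ == "__main__":
    unittest.main()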
There are some test systems that allow warnings rather than failures, but unittest is not one of them (I don't know offhand which ones do), unless you want to extend it (which is possible).
You can make the tests so that they log warnings rather than fail.
Another way to handle this is to separate out those tests and run them only to get pass/fail reports, without making the build depend on them (this depends on your build setup).
Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/
There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.
Another possibility is to create a "B" branch (you ARE using some sort of version control, right?) and have your unit tests for "B" in there. That way, you keep your release version's unit tests clean (Look, all dots!), but still have tests for B. If you're using a modern version control system like git or mercurial (I'm partial to mercurial), branching/cloning and merging are trivial operations, so that's what I'd recommend.
However, I think you're using tests for something they're not meant to do. The real question is "How important to you is it that 'B' works?" Your test suite should only contain tests you care about passing or failing: tests that, if they fail, mean the code is broken. That's why I suggested only testing "B" in the "B" branch, since that would be the branch where you are developing the "B" feature.
You could test using logger or print commands, if you like. But if you don't care enough that it's broken to have it flagged in your unit tests, I'd seriously question whether you care enough to test it at all. Besides, that adds needless complexity (extra variables to set debug level, multiple testing vectors that are completely independent of each other yet operate within the same space, causing potential collisions and errors, etc, etc). Unless you're developing a "Hello, World!" app, I suspect your problem set is complicated enough without adding additional, unnecessary complications.
You could write your tests so that they count the success rate.
With OCR you could throw 1000 images at the code and require that 95% of them are recognized successfully.
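A self-contained sketch of that idea; recognize() and SAMPLES here are stand-ins for the real OCR and its labelled images:
import unittest

SAMPLES = [("img-%d" % i, "text-%d" % i) for i in range(1000)]

def recognize(image):
    # Stand-in OCR: gets everything right except every 50th sample.
    number = int(image.split("-")[1])
    return "garbled" if number % 50 == 0 else image.replace("img", "text")

class OcrAccuracyTest(unittest.TestCase):
    def test_accuracy_is_at_least_95_percent(self):
        hits = sum(1 for image, expected in SAMPLES if recognize(image) == expected)
        self.assertGreaterEqual(hits / float(len(SAMPLES)), 0.95)

if __name__ == "__main__":
    unittest.main()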
If your program must work with type A, then a failure with A means the test fails. If it's not required to work with B, what is the value of such a test?

How should unit tests be documented? [closed]

I'm trying to improve the number and quality of tests in my Python projects. One of the difficulties I've encountered as the number of tests increases is knowing what each test does and how it's supposed to help spot problems. I know that part of keeping track of tests is better unit test names (which has been addressed elsewhere), but I'm also interested in understanding how documentation and unit testing go together.
How can unit tests be documented to improve their utility when those tests fail in the future? Specifically, what makes a good unit test docstring?
I'd appreciate both descriptive answers and examples of unit tests with excellent documentation. Though I'm working exclusively with Python, I'm open to practices from other languages.
I document most of my unit tests with the method name exclusively:
testInitializeSetsUpChessBoardCorrectly()
testSuccessfulPromotionAddsCorrectPiece()
For almost 100% of my test cases, this clearly explains what the unit test is validating and that's all I use. However, in a few of the more complicated test cases, I'll add a few comments throughout the method to explain what several lines are doing.
I've seen a tool before (I believe it was for Ruby) that would generate documentation files by parsing the names of all the test cases in a project, but I don't recall the name. If you had test cases for a chess Queen class:
testCanMoveStraightUpWhenNotBlocked()
testCanMoveStraightLeftWhenNotBlocked()
the tool would generate an HTML doc with contents something like this:
Queen requirements:
- can move straight up when not blocked.
- can move straight left when not blocked.
Perhaps the issue isn't how best to write test docstrings, but how to write the tests themselves? Refactoring tests in such a way that they're self-documenting can go a long way, and your docstring won't go stale when the code changes.
There are a few things you can do to make the tests clearer:
clear & descriptive test method names (already mentioned)
test body should be clear and concise (self documenting)
abstract away complicated setup/teardown etc. in methods
more?
For example, if you have a test like this:
def test_widget_run_returns_0():
    widget = Widget(param1, param2, "another param")
    widget.set_option(True)
    widget.set_temp_dir("/tmp/widget_tmp")
    widget.destination_ip = "10.10.10.99"
    return_value = widget.run()
    assert return_value == 0
    assert widget.response == "My expected response"
    assert widget.errors == None
You might replace the setup statements with a method call:
def test_widget_run_returns_0():
    widget = create_basic_widget()
    return_value = widget.run()
    assert return_value == 0
    assert_basic_widget(widget)

def create_basic_widget():
    widget = Widget(param1, param2, "another param")
    widget.set_option(True)
    widget.set_temp_dir("/tmp/widget_tmp")
    widget.destination_ip = "10.10.10.99"
    return widget

def assert_basic_widget(widget):
    assert widget.response == "My expected response"
    assert widget.errors == None
Note that your test method is now composed of a series of method calls with intent-revealing names, a sort of DSL specific to your tests. Does a test like that still need documentation?
Another thing to note is that your test method is mainly at one level of abstraction. Someone reading the test method will see the algorithm is:
creating a widget
calling run on the widget
asserting the code did what we expect
Their understanding of the test method is not muddied by the details of setting up the widget, which is one level of abstraction lower than the test method.
The first version of the test method follows the Inline Setup pattern. The second version follows Creation Method and Delegated Setup patterns.
Generally I'm against comments, except where they explain the "why" of the code. Reading Uncle Bob Martin's Clean Code convinced me of this. There is a chapter on comments, and there is a chapter on testing. I recommend it.
For more on automated testing best practices, do check out xUnit Patterns.
The name of the test method should describe exactly what you are testing. The documentation should say what makes the test fail.
You should use a combination of descriptive method names and comments in the docstring. A good way to do it is to include a basic procedure and the verification steps in the docstring. Then, if you run these tests from some kind of testing framework that automates running the tests and collecting results, you can have the framework log the contents of the docstring for each test method along with its stdout and stderr.
Here's a basic example:
import unittest

class SimpleTestCase(unittest.TestCase):
    def testSomething(self):
        """ Procedure:
            1. Print something
            2. Print something else
            ---------
            Verification:
            3. Verify no errors occurred
        """
        print "something"
        print "something else"
Having the procedure with the test makes it much easier to figure out what the test is doing. And if you include the docstring in the test output, it is much easier to figure out what went wrong when going through the results later. The previous place I worked at did something like this and it worked out very well when failures occurred. We ran the unit tests automatically on every check-in, using CruiseControl.
When the test fails (which should be before it ever passes) you should see the error message and be able to tell what's up. That only happens if you plan it that way.
It's entirely a matter of the naming of the test class, the test method, and the assert message. When a test fails, and you can't tell what is up from these three clues, then rename some things or break up some tests classes.
It doesn't happen if the name of the fixture is ClassXTests and the name of the test is TestMethodX and the error message is "expected true, returned false". That's a sign of sloppy test writing.
Most of the time you shouldn't have to read the test or any comments to know what has happened.
