The following doctests fail because the first affects the next.
def createNode(type):
"""Create a new node of `type`
Example:
>>> node = createNode("transform", name="myNode")
>>> node == "myNode"
True
"""
def getAttr(path):
"""Get attribute from `path`
Example:
>>> node = createNode("transform", name="myNode")
>>> node == "myNode"
True
>>> getAttr(node + ".translateX")
0.0
"""
I need to reset the external resource - in this case the scenegraph of Autodesk Maya - prior to running each individual doctest, with a function like this..
def setup():
cmds.file(new=True, force=True)
Granted I could call the above once per test, which I am until I find a solution, but for readability and maintenance of this project I'd prefer stowing the setup away into a dedicated function, for when it grows and needs to change.
Python's native doctest and nose both support calling a setup/teardown function, but only at a per-file level.
I'm happy to use any framework, I would also accept any level of hacking to get around it. It's for use with a single module of about 30-100 doctests, to be run on Travis through GitHub.
Unfortunately the easiest and most accurate -- though not the fastest -- option is to reset the scene before each test as you suggest. Apart from some application level globals you'll be resetting all of the scene content before each test.
However, even for trivial tests (create a cube, delete it, etc) you'll be spending a lot of time creating and deleting empty scenes -- a task which Maya is unfortunately rather slow at. Still, this is the best option because it would otherwise require complicated code to manage your invariants inside the tests -- a test that fails could easily leave state in the scene that "smarter" faster methods of resetting will miss, thus messing up the test run.
A lot of maya people will use mocks instead of scene tests wherever possible : guaranteeing that the same series of calls to cmds or the api are made in the same sequence each time without guaranteeing the results. That's less satisfying as a test but will run faster and catch many accidental changes.
You may want to see if the doctest constraint is negotiable -- a doctest will have to be exec'ed or something similar, so you will not be running your tests in the same scope as user code (at least, not necessarily) which also raises the possibility of code that runs fine in the tests but fails in the runtime.
If you run your tests using a maya.standalone instance you'll be able to churn through them a lot faster; the GUI lag is a significant part of the speed issue in using cmds.file() As long as you don't use GUI elements in the code you're testing -- which is a good idea to avoid anyway, since GUI testing is a much more complex business -- you should be OK
Related
A bit of a theoretical question that comes up with Python, since we can access almost anything we want even if it is underscored to sign as something "private".
def main_function():
_helper_function_()
...
_other_helper_function()
Doing it with TDD, you follow the Red-Green-Refactor cycle. A test looks like this now:
def test_main_function_for_something_only_helper_function_does():
# tedious setup
...
main_function()
assert something
The problem is that my main_function had so much setup steps that I've decided to test the helper functions for those specific cases:
from main_package import _helper_function
def test_helper_function_works_for_this_specific_input():
# no tedious setup
...
_helper_function_(some_input)
assert helper function does exactly something I expect
But this seems to be a bad practice. Should I even "know" about any inner/helper functions?
I refactored the main function to be more readable by moving out parts into these helper functions. So I've rewritten tests to actually test these smaller parts and created another test that the main function indeed calls them. This also seems counter-productive.
On the other hand I dislike the idea of a lot of lingering inner/helper functions with no dedicated unit tests to them, only happy path-like ones for the main function. I guess if I covered the original function before the refactoring, my old tests would be just as good enough.
Also if the main function breaks this would mean many additional tests for the helper ones are breaking too.
What is the better practice to follow?
The problem is that my main_function had so much setup steps that I've decided to test the helper functions for those specific cases
Excellent, that's exactly what's supposed to happen (the tests "driving" you to decompose the whole into smaller pieces that are easier to test).
Should I even "know" about any inner/helper functions?
Tradeoffs.
Yes, part of the point of modules is that they afford information hiding, allowing you to later change how the code does something without impacting clients, including test clients.
But also there are benefits to testing the internal modules directly; test design becomes simpler, with less coupling to irrelevant details. Fewer tests are coupled to each decision, which means that the blast radius is smaller when you need to change one of them.
My usual thinking goes like this: I should know that there are testable inner modules, and I can know that an outer module behaves like it is coupled to an inner module, but I shouldn't necessarily know that the outer module is coupled to the inner module.
assert X.result(A,B) == Y.sort([C,D,E])
If you squint at this, you'll see that it implies that X.result and Y.sort have some common requirement today, but it doesn't necessarily promise that X.result calls Y.sort.
So I've rewritten tests to actually test these smaller parts and created another test that the main function indeed calls them. This also seems counter-productive.
A works, and B works, and C works, and now here you are writing a test for f(A,B,C).... yeah, things go sideways.
The desired outcome of TDD is "Clean code that works" (Jeffries); and the truth of things is that you can get clean code that works without writing every test in the world.
Tests are most important in code where faults are most probable - straight line code where we are just wiring things together doesn't benefit nearly as much from the red-green-refactor cycle as code that has a lot of conditionals and branching.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies
For sections of code that are "so simple that there are obviously no deficiencies", a suite of automated programmer tests is not a great investment. Get two people to perform a manual review, and sign off on it.
Too many private/helper functions are often a sign of missing abstraction.
May be you should consider applying the 'Extract class' refactoring. This refactoring will solve your confusion, as the private members will end up becoming public members of the extracted class.
Please not, I am not suggesting here to create a class for every private member but rather to play with the model a bit to find a better design.
I have a custom framework which runs different code for different clients. I have monkeypatched certain methods in order to customize functionality for a client.
Here is the pattern simplified:
#import monkeypatches here
if self.config['client'] == 'cool_dudes':
from app.monkeypatches import Stuff
if self.config['client'] == 'cool_dudettes':
from app.monkeypatches import OtherStuff
Here is an example patch:
from app.framework.stuff import Stuff
def function_override(self):
return pass
Stuff.function = function_override
This works fine when the program executes as it is executed in a batch manner, spinning up from scratch every time. However, when running across unit tests, I find that the monkey patches persist across tests, causing unexpected behavior.
I realize that it would be far better to use an object oriented inheritance approach to these overrides, but I inherited this codebase and am not currently empowered to rearchitect it to that degree.
Barring properly re-architecting the program, how can I prevent these monkey patches from persisting across unit tests?
The modules, including app.framework.<whatever>, are not reloaded for every test. So, any changes in them you make persist. The same happens if your module is stateful (that's one of the reasons why global state is not such a good idea, you should rather keep state in objects).
Your options are to:
undo the monkey-patches when needed, or
change them into something more generic that would change (semi-)automatically depending on the test running, or
(preferred) Do not reinvent the wheel and use an existing, manageable, time-proven solution for your task (or at least, base your work on one if it doesn't meet your requirements completely). E.g. if you use them for mocking, see How can one mock/stub python module like urllib . Among the suggestions there is #mock.patch that does the patching for a specific test and undoes it upon its completion.
Anyone coming here looking for information about monkeypatching, might want to have a look at pytest's monkeypatch fixture. It avoids the problem of the OP by automatically undoing all modifications after the test function has finished.
I am about to make a game using python and libtcod roguelike game library.
More to the point, I am using PyMock because I am just starting to learn Test-Driven Development, and I am determined not to cheat. I really want to get into the habit of doing it properly, and according to TDD I need a failing unit test before I write my first line of code.
I figure my first test of my "production" code should be that its dependency, libcotdpy, is imported.
My testing file:
#!/usr/bin/python
import pymock # for mocking and unit testing
import game # my (empty) production code file, game.py
class InitializeTest(pymock.PyMockTestCase):
def test_libtcod_is_imported(self):
# How do I test that my production file imports the libtcodpy module?
if __name__=="__main__":
import unittest
unittest.main()
Please:
1) (python people) How do I test that the module is loaded?
2) (TDD people) Should I be unit testing something this basic? If not, what is the first thing I should be testing?
1) 'your_module' in sys.modules.
Don't actually use that, though:
2)
What should your library should do?
Is it “have a dependency on libcotdpy”? I think not.
You've just made a design choice that wasn't test-driven!
Write a test that demonstrates how you want to use the library. Don't think about how you're going to implement it. For example:
player = my_lib.PlayerCharacter()
assert player.position == (0, 0) # or whatever assert syntax `pymock` uses
press_key('k')
assert player.position == (0, 1)
Or something similar. (I don't know what you want your library to do, or how much libtcod provides.)
The way I usually think about TDD (and BDD) is at two levels of development: acceptance-testing level, and unit-testing level.
First thing I would do is write stories (acceptance criteria). What is the core feature of your application? Define an end-to-end scenario that explicit one feature, and goes end-to-end with it. That's your first story. Write a test for it, using an acceptance testing (or integration testing) framework. Unfortunately, I don't know Python tools, but in Java I would use JBehave, or FITnesse. It would be something very high-level, far away from the code, that considers your application as a "black box". Something like "When my input parameters are xxx, I run my application, the expected output is yyyy".
Run this test, it will fail because the underlying application doesn't exist. Create the minimal amount of classes to make it go red (and not throw an exception anymore). That's when you need to start the second phase of TDD: unit-TDD. It's basically a "descending analysis", from top-level to details, and this phase will contain a lot of red-green-refactor cycles, bringing a lot of different units in the game.
From time to time, re-run your original acceptance test, or refine it if your growing architecture and analysis forced you to make changes to specifications (theoretically, it shouldn't happen at that stage, but in practice it does, very often). When your acceptance test is completely green, you're done with that story, rinse and repeat.
All of that brings me to my point: pure TDD (I mean unit-TDD) is not practical. I mean I really like TDD, but trying to follow it religiously will be more a hassle than a help in the long run. Sometimes you will go and spike an approach to see if that goes well with the rest of your project, without writing tests first for it, and potentially rewrite it using TDD. but as long as you have acceptance tests to cover the whole lot, you're fine.
Even if there is a way to test that, I'd recommend not doing it.
Test from the client perspective (outside-in), what behavior is provided by your SUT (Game). Your tests (or your users) don't need to know (/care) that you expose this behavior using a library. As long as the behavior isn't broken, your tests should pass.
Also like another answer says, maybe you don't need the dependency - there may be a simpler solution (e.g. a hashtable might do where you instinctively jumped on a relational database). Listen to the tests... let the tests pull in behavior.
This also leaves you free to change the dependency in the future without having to fix a bunch of tests.
I'm using Python's built-in unittest module and I want to write a few tests that are not critical.
I mean, if my program passes such tests, that's great! However, if it doesn't pass, it's not really a problem, the program will still work.
For example, my program is designed to work with a custom type "A". If it fails to work with "A", then it's broken. However, for convenience, most of it should also work with another type "B", but that's not mandatory. If it fails to work with "B", then it's not broken (because it still works with "A", which is its main purpose). Failing to work with "B" is not critical, I will just miss a "bonus feature" I could have.
Another (hypothetical) example is when writing an OCR. The algorithm should recognize most images from the tests, but it's okay if some of them fails. (and no, I'm not writing an OCR)
Is there any way to write non-critical tests in unittest (or other testing framework)?
As a practical matter, I'd probably use print statements to indicate failure in that case. A more correct solution is to use warnings:
http://docs.python.org/library/warnings.html
You could, however, use the logging facility to generate a more detailed record of your test results (i.e. set your "B" class failures to write warnings to the logs).
http://docs.python.org/library/logging.html
Edit:
The way we handle this in Django is that we have some tests we expect to fail, and we have others that we skip based on the environment. Since we can generally predict whether a test SHOULD fail or pass (i.e. if we can't import a certain module, the system doesn't have it, and so the test won't work), we can skip failing tests intelligently. This means that we still run every test that will pass, and have no tests that "might" pass. Unit tests are most useful when they do things predictably, and being able to detect whether or not a test SHOULD pass before we run it makes this possible.
Asserts in unit tests are binary: they will work or they will fail, there's no mid-term.
Given that, to create those "non-critical" tests you should not use assertions when you don't want the tests to fail. You should do this carefully so you don't compromise the "usefulness" of the test.
My advice to your OCR example is that you use something to record the success rate in your tests code and then create one assertion like: "assert success_rate > 8.5", and that should give the effect you desire.
Thank you for the great answers. No only one answer was really complete, so I'm writing here a combination of all answers that helped me. If you like this answer, please vote up the people who were responsible for this.
Conclusions
Unit tests (or at least unit tests in unittest module) are binary. As Guilherme Chapiewski says: they will work or they will fail, there's no mid-term.
Thus, my conclusion is that unit tests are not exactly the right tool for this job. It seems that unit tests are more concerned about "keep everything working, no failure is expected", and thus I can't (or it's not easy) to have non-binary tests.
So, unit tests don't seem the right tool if I'm trying to improve an algorithm or an implementation, because unit tests can't tell me how better is one version when compared to the other (supposing both of them are correctly implemented, then both will pass all unit tests).
My final solution
My final solution is based on ryber's idea and code shown in wcoenen answer. I'm basically extending the default TextTestRunner and making it less verbose. Then, my main code call two test suits: the critical one using the standard TextTestRunner, and the non-critical one, with my own less-verbose version.
class _TerseTextTestResult(unittest._TextTestResult):
def printErrorList(self, flavour, errors):
for test, err in errors:
#self.stream.writeln(self.separator1)
self.stream.writeln("%s: %s" % (flavour,self.getDescription(test)))
#self.stream.writeln(self.separator2)
#self.stream.writeln("%s" % err)
class TerseTextTestRunner(unittest.TextTestRunner):
def _makeResult(self):
return _TerseTextTestResult(self.stream, self.descriptions, self.verbosity)
if __name__ == '__main__':
sys.stderr.write("Running non-critical tests:\n")
non_critical_suite = unittest.TestLoader().loadTestsFromTestCase(TestSomethingNonCritical)
TerseTextTestRunner(verbosity=1).run(non_critical_suite)
sys.stderr.write("\n")
sys.stderr.write("Running CRITICAL tests:\n")
suite = unittest.TestLoader().loadTestsFromTestCase(TestEverythingImportant)
unittest.TextTestRunner(verbosity=1).run(suite)
Possible improvements
It should still be useful to know if there is any testing framework with non-binary tests, like Kathy Van Stone suggested. Probably I won't use it this simple personal project, but it might be useful on future projects.
Im not totally sure how unittest works, but most unit testing frameworks have something akin to categories. I suppose you could just categorize such tests, mark them to be ignored, and then run them only when your interested in them. But I know from experience that ignored tests very quickly become...just that ignored tests that nobody ever runs and are therefore a waste of time and energy to write them.
My advice is for your app to do, or do not, there is no try.
From unittest documentation which you link:
Instead of unittest.main(), there are
other ways to run the tests with a
finer level of control, less terse
output, and no requirement to be run
from the command line. For example,
the last two lines may be replaced
with:
suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)
In your case, you can create separate TestSuite instances for the criticial and non-critical tests. You could control which suite is passed to the test runner with a command line argument. Test suites can also contain other test suites so you can create big hierarchies if you want.
Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure.
http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures
Tests marked as expected failure won't be counted as failure on a TestResult.
There are some test systems that allow warnings rather than failures, but test_unit is not one of them (I don't know which ones do, offhand) unless you want to extend it (which is possible).
You can make the tests so that they log warnings rather than fail.
Another way to handle this is to separate out the tests and only run them to get the pass/fail reports and not have any build dependencies (this depends on your build setup).
Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/
There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.
Another possibility is to create a "B" branch (you ARE using some sort of version control, right?) and have your unit tests for "B" in there. That way, you keep your release version's unit tests clean (Look, all dots!), but still have tests for B. If you're using a modern version control system like git or mercurial (I'm partial to mercurial), branching/cloning and merging are trivial operations, so that's what I'd recommend.
However, I think you're using tests for something they're not meant to do. The real question is "How important to you is it that 'B' works?" Because your test suite should only have tests in it that you care whether they pass or fail. Tests that, if they fail, it means the code is broken. That's why I suggested only testing "B" in the "B" branch, since that would be the branch where you are developing the "B" feature.
You could test using logger or print commands, if you like. But if you don't care enough that it's broken to have it flagged in your unit tests, I'd seriously question whether you care enough to test it at all. Besides, that adds needless complexity (extra variables to set debug level, multiple testing vectors that are completely independent of each other yet operate within the same space, causing potential collisions and errors, etc, etc). Unless you're developing a "Hello, World!" app, I suspect your problem set is complicated enough without adding additional, unnecessary complications.
You could write your test so that they count success rate.
With OCR you could throw at code 1000 images and require that 95% is successful.
If your program must work with type A then if this fails the test fails. If it's not required to work with B, what is the value of doing such a test ?
Recently I've been experimenting with TDD while developing a GUI application in Python. I find it very reassuring to have tests that verify the functionality of my code, but it's been tricky to follow some of the recommened practices of TDD. Namely, writing tests first has been hard. And I'm finding it difficult to make my tests readable (due to extensive use of a mocking library).
I chose a mocking library called mocker. I use it a lot since much of the code I'm testing makes calls to (a) other methods in my application that depend on system state or (b) ObjC/Cocoa objects that cannot exist without an event loop, etc.
Anyway, I've got a lot of tests that look like this:
def test_current_window_controller():
def test(config):
ac = AppController()
m = Mocker()
ac.iter_window_controllers = iwc = m.replace(ac.iter_window_controllers)
expect(iwc()).result(iter(config))
with m:
result = ac.current_window_controller()
assert result == (config[0] if config else None)
yield test, []
yield test, [0]
yield test, [1, 0]
Notice that this is actually three tests; all use the same parameterized test function. Here's the code that is being tested:
def current_window_controller(self):
try:
# iter_window_controllers() iterates in z-order starting
# with the controller of the top-most window
# assumption: the top-most window is the "current" one
wc = self.iter_window_controllers().next()
except StopIteration:
return None
return wc
One of the things I've noticed with using mocker is that it's easier to write the application code first and then go back and write the tests second, since most of the time I'm mocking many method calls and the syntax to write the mocked calls is much more verbose (thus harder to write) than the application code. It's easier to write the app code and then model the test code off of that.
I find that with this testing method (and a bit of discipline) I can easily write code with 100% test coverage.
I'm wondering if these tests are good tests? Will I regret doing it this way down the road when I finally discover the secret to writing good tests?
Am I violating the core principles of TDD so much that my testing is in vain?
If you are writing your tests after you've written your code and making them pass, you are not doing TDD (nor are you getting any benefits of Test-First or Test-Driven development.. check out SO questions for definitive books on TDD)
One of the things I've noticed with
using mocker is that it's easier to
write the application code first and
then go back and write the tests
second, since most of the time I'm
mocking many method calls and the
syntax to write the mocked calls is
much more verbose (thus harder to
write) than the application code. It's
easier to write the app code and then
model the test code off of that.
Of course, its easier because you are just testing that the sky is orange after you made it orange by painting it with a specific kind of brush.
This is retrofitting tests (for self-assurance). Mocks are good but you should know how and when to use them - Like the saying goes 'When you have a hammer everything looks like a nail' It's also easy to write a whole load of unreadable and not-as-helpful-as-can-be tests. The time spent understanding what the test is about is time lost that can be used to fix broken ones.
And the point is:
Read Mocks aren't stubs - Martin Fowler if you haven't already. Google out some documented instances of good ModelViewPresenter patterned GUIs (Fake/Mock out the UIs if necessary).
Study your options and choose wisely. I'll play the guy with the halo on your left shoulder in white saying 'Don't do it.' Read this question as to my reasons - St. Justin is on your right shoulder. I believe he has also something to say:)
Unit tests are really useful when you refactor your code (ie. completely rewrite or move a module). As long as you have unit tests before you do the big changes, you'll have confidence that you havent forgotten to move or include something when you finish.
Please remember that TDD is not a panaceum. It's hard, it's supposed to be hard, and it's especially hard to write mocking tests "in advance".
So I would say - do what works for you. Even it's not "certified TDD". I do basically the same thing.
You may want to provide your own API for GUI that would sit between controller code and GUI library code. That could be easier to mock, or you can even add some testing hooks to it.
Last but not least, your code doesn't look too unreadable to me. Code using mocks is generally harder to understand. Fortunately in Python mocking is much easier and cleaner than i n other languages.