The following point (in bold) is mentioned in this famous Stack Overflow question:
Unit Tests allows you to make big changes to code quickly. You know it works now because you've run the tests, when you make the changes you need to make, you need to get the tests working again. This saves hours.
In my case, I have finished writing a program in Python 2.7 and have now started writing tests for it using PyUnit. Each test will be a class (derived from "unittest.TestCase") that lives in a separate file. (At the time, I did not know that tests should be written before or during development.)
While writing the tests, I started wondering: if I modify my program code and run my tests again, the tests should still work without changes, because it is the program code that was changed and not the tests. Yet the point quoted above suggests that you need to make changes to the tests to get them working again!
I do not understand how the last sentence of the quoted point makes sense. I hope somebody can help me understand it.
Thanks
Unit tests verify contracts, and they won't change as long as the contracts are unchanged. A programmer can freely modify the implementation, protected from errors by the unit tests.
The sentence you quote is about changing contracts: a failing unit test signals a change in a contract, and the programmer should make sure that change is reasonable. In well-designed software this is easier than verifying the correctness of the implementation, hence the speed-up.
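As a rough illustration (the function and test names here are invented, not from the question), a test pinned to the contract survives a rewrite of the implementation:

import unittest

def mean(values):
    # Implementation v1: explicit loop.
    total = 0
    for v in values:
        total += v
    return total / float(len(values))

# Later the body might be rewritten as `return sum(values) / float(len(values))`.
# The test below does not change, because the contract did not change.

class MeanTest(unittest.TestCase):
    def test_mean_of_known_values(self):
        self.assertEqual(mean([2, 4, 6]), 4)

if __name__ == "__main__":
    unittest.main()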
The test should actually execute the package code, so that breaking the package will show up in tests.
I think the highlighted sentence needs a little more detail, such as whether the original 'contract' or 'requirement' of the module being tested has changed or not.
My quick read is that the original contract has not changed, but you still have to run the tests and make sure everything works. If your modification improved performance, you might adjust the tests to reflect that improvement, but the requirement stays the same and your code simply performs better.
Related
I just migrated a pytest test suite from quickcheck to hypothesis. This worked quite well (and immediately uncovered some hidden edge case bugs), but one major difference I see is related to test isolation between the two property-based testing libraries.
quickcheck seems to simply run the test function multiple times with different parameter values, each time running my function-scoped fixtures. This also results in many more dots in pytest's output.
hypothesis, however, seems to run only the body of the test function multiple times, which means, for example, no transaction rollbacks between individual runs. As a result, I cannot reliably assert on the number of DB entries when my test inserts something into the DB, since the entries from previous runs are still hanging around.
Am I missing something obvious here or is this expected behaviour? If so, is there a way to get the number of runs hypothesis has done as a variable to use inside the test?
I'm afraid you're a bit stuck and there isn't currently any good solution to this problem.
The way Hypothesis needs to work (which is the source of a lot of its improvements over pytest-quickcheck) doesn't fit pytest's assumptions about test execution. The problem is mostly on the pytest side: the current pytest fixture system has some deeply baked-in assumptions about how a test is run, and they do not play well with anything that takes control of test execution. The last time I tried to work around this, I sank about a week of work into it before giving up; basically, either something needs to change on the pytest side or someone needs to fund this work if it's going to get any better.
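To see the difference concretely, here is a minimal sketch (not from the question's code, with a plain list standing in for the database): the test body runs once per generated example, but anything set up outside the body does not, so one possible workaround is per-example cleanup inside the body itself:

from hypothesis import given, strategies as st

collected = []  # stands in for state shared across examples, e.g. rows in a DB table

@given(st.integers())
def test_each_example_sees_clean_state(x):
    # Module-level (and fixture-level) setup runs once per test function,
    # not once per Hypothesis example, so clean up here instead.
    del collected[:]
    collected.append(x)
    assert len(collected) == 1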
I have been trying to get the hang of TDD and unit testing (in python, using nose) and there are a few basic concepts which I'm stuck on. I've read up a lot on the subject but nothing seems to address my issues - probably because they're so basic they're assumed to be understood.
The idea of TDD is that unit tests are written before the code they test. Unit test should test small portions of code (e.g. functions) which, for the purposes of the test, are self-contained and isolated. However, this seems to me to be highly dependent on the implementation. During implementation, or during a later bugfix it may become necessary to abstract some of the code into a new function. Should I then go through all my tests and mock out that function to keep them isolated? Surely in doing this there is a danger of introducing new bugs into the tests, and the tests will no longer test exactly the same situation?
From my limited experience in writing unit tests, it appears that completely isolating a function sometimes results in a test that is longer and more complicated than the code it is testing. So if the test fails, all it tells you is that there is a bug either in the code or in the test, but it's not obvious which. Not isolating it may mean a much shorter and easier-to-read test, but then it's not a unit test...
Often, once isolated, unit tests seem to be merely repeating the function. E.g. if there is a simple function which adds two numbers, then the test would probably look something like assert add(a, b) == a + b. Since the implementation is simply return a + b, what's the point in the test? A far more useful test would be to see how the function works within the system, but this goes against unit testing because it is no longer isolated.
My conclusion is that unit tests are good in some situations, but not everywhere, and that system tests are generally more useful. The approach that this implies is to write system tests first and then, if they fail, isolate portions of the system into unit tests to pinpoint the failure. The problem with this, obviously, is that it's not so easy to test corner cases. It also means that the development is not fully test driven, as unit tests are only written as needed.
So my basic questions are:
Should unit tests be used everywhere, however small and simple the function?
How does one deal with changing implementations? I.e. should the implementation of the tests change continuously too, and doesn't this reduce their usefulness?
What should be done when the test gets more complicated than the code it's testing?
Is it always best to start with unit tests, or is it better to start with system tests, which at the start of development are much easier to write?
Regarding your conclusion first: unit tests and system tests (integration tests) both have their use, and in my opinion they are equally useful. During development I find it easier to start with unit tests, but for testing legacy code I find your approach, where you start with the integration tests, easier. I don't think there's a right or wrong way of doing this; the goal is to build a safety net that allows you to write solid and well-tested code, not the method itself.
I find it useful to think about each function as an API in this context. The unit test is testing the API, not the implementation. If the implementation changes, the test should remain the same; this is the safety net that allows you to refactor your code with confidence. Even if refactoring means extracting part of the implementation into a new function, I would say it's OK to keep the test as it is, without stubbing or mocking the part that was refactored out. You will probably want a new set of tests for the new function, however.
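For instance (a sketch with invented names, not from the question), a test written against the public function survives the extraction of a helper untouched:

def normalize_name(raw):
    # After refactoring, part of the original inline logic now lives in a helper.
    return _collapse_spaces(raw.strip()).title()

def _collapse_spaces(text):
    return " ".join(text.split())

def test_normalize_name():
    # Unchanged by the refactoring: it only exercises the public API.
    assert normalize_name("  ada   lovelace ") == "Ada Lovelace"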
Unit tests are not a holy grail! Test code should be fairly simple, in my opinion, and there should be little reason for the test code itself to fail. If the test becomes more complex than the function it tests, it probably means you need to refactor the code differently. An example from my own past: I had some code that took some input and produced output stored as XML. Parsing the XML to verify that the output was correct caused a lot of complexity in my tests. However, realizing that the XML representation was not the point, I was able to refactor the code so that I could test the output without messing with the details of the XML.
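That kind of refactoring might look roughly like this (the names are invented for illustration): build the result as a plain data structure and keep the XML serialization as a thin, separate step, so most tests never have to parse XML:

import xml.etree.ElementTree as ET

def build_report(measurements):
    # The interesting logic: easy to assert on directly.
    return {"count": len(measurements), "maximum": max(measurements)}

def report_to_xml(report):
    # Thin serialization layer; barely worth unit testing on its own.
    root = ET.Element("report")
    for key, value in sorted(report.items()):
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)

def test_build_report():
    assert build_report([3, 1, 2]) == {"count": 3, "maximum": 3}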
Some functions are so trivial that a separate test for them adds no value. In your example you're not really testing your code, but that the '+' operator in your language works as expected. This should be tested by the language implementer, not you. However that function won't need to get very much more complex before adding a test for it is worthwhile.
In short, I think your observations are very relevant and point towards a pragmatic approach to testing. Following some rigorous definition too closely will often get in the way, even though the definitions themselves may be necessary for having a way to communicate about the ideas they convey. As I said, the goal is not the method but the result, which for testing is to have confidence in your code.
1) Should unit tests be used everywhere, however small and simple the function?
No. If a function has no logic in it (if, while-loops, appends, etc...) there's nothing to test.
This means that an add function implemented like:
def add(a, b):
    return a + b

has nothing in it to test. But if you really want to write a test for it, then:
assert add(a, b) == a + b # Worst test ever!
is the worst test one could ever write. The main problem is that the tested logic must NOT be reproduced in the testing code, because:
If there's a bug in there it will be reproduced as well.
You're no longer testing the function, only that a + b works the same way in two different files.
So something like this would make more sense:
assert add(1, 2) == 3
But once again, this is just an example, and this add function shouldn't even be tested.
2) How does one deal with changing implementations?
It depends on what changes. Keep in mind that:
You're testing the API (roughly speaking, that for a given input you get a specific output/effect).
You're not repeating the production code in your testing code (as explained before).
So, unless you're changing the API of your production code, the testing code will not be affected in any way.
3) What should be done when the test gets more complicated than the code it's testing?
Yell at whoever wrote those tests! (And re-write them).
Unit tests are simple and don't have any logic in them.
4a) Is it always best to start with unit tests, or is it better to start with system tests?
If we are talking about TDD, then one shouldn't even have this problem, because even before writing one tiny little function, a good TDD developer would have written unit tests for it.
If you have already working code without tests whatsoever, I'd say that unit tests are easier to write.
4b) Which at the start of development are much easier to write?
Unit tests! At that point you don't even have the skeleton of your code in place, so how could you write system tests?
I maintain a Python program that provides advice on certain topics. It does this by applying a complicated algorithm to the input data.
The program code is regularly changed, both to resolve newly found bugs, and to modify the underlying algorithm.
I want to use regression tests. Trouble is, there's no way to tell what the "correct" output is for a certain input - other than by running the program (and even then, only if it has no bugs).
I describe below my current testing process. My question is whether there are tools to help automate this process (and of course, if there is any other feedback on what I'm doing).
The first time the program seemed to run correctly for all my input cases, I saved their outputs in a folder I designated for "validated" outputs. "Validated" means that the output is, to the best of my knowledge, correct for a given version of my program.
If I find a bug, I make whatever changes I think would fix it. I then rerun the program on all the input sets, and manually compare the outputs. Whenever the output changes, I do my best to informally review those changes and figure out whether:
1. the changes are exclusively due to the bug fix, or
2. the changes are due, at least in part, to a new bug I introduced
In case 1, I increment the internal version counter. I mark the output file with a suffix equal to the version counter and move it to the "validated" folder. I then commit the changes to the Mercurial repository.
If in the future, when this version is no longer current, I decide to branch off it, I'll need these validated outputs as the "correct" ones for this particular version.
In case 2, I of course try to find the newly introduced bug, and fix it. This process continues until I believe the only changes versus the previous validated version are due to the intended bug fixes.
When I modify the code to change the algorithm, I follow a similar process.
Here's the approach I'll probably use.
1. Have Mercurial manage the code, the input files, and the regression test outputs.
2. Start from a certain parent revision.
3. Make and document (preferably as few as possible) modifications.
4. Run regression tests.
5. Review the differences with the parent revision's regression test output.
6. If these differences do not match the expectations, try to see whether a new bug was introduced or whether the expectations were incorrect. Either fix the new bug and go to 3, or update the expectations and go to 4.
7. Copy the output of the regression tests to the folder designated for validated outputs.
8. Commit the changes to Mercurial (including the code, the input files, and the output files).
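The comparison step can be largely automated. Here is a minimal sketch (the folder names are assumptions, not from the question) that diffs the current outputs against the validated ones:

import difflib
import os

VALIDATED_DIR = "validated"   # assumed: one validated output file per input case
CURRENT_DIR = "output"        # assumed: the current run writes matching file names here

def diff_against_validated():
    for name in sorted(os.listdir(VALIDATED_DIR)):
        with open(os.path.join(VALIDATED_DIR, name)) as f:
            validated = f.readlines()
        with open(os.path.join(CURRENT_DIR, name)) as f:
            current = f.readlines()
        delta = list(difflib.unified_diff(
            validated, current,
            fromfile="validated/" + name, tofile="output/" + name))
        if delta:
            print("".join(delta))

if __name__ == "__main__":
    diff_against_validated()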
I am about to make a game using Python and the libtcod roguelike game library.
More to the point, I am using PyMock because I am just starting to learn Test-Driven Development, and I am determined not to cheat. I really want to get into the habit of doing it properly, and according to TDD I need a failing unit test before I write my first line of code.
I figure my first test of my "production" code should be that its dependency, libtcodpy, is imported.
My testing file:
#!/usr/bin/python
import pymock  # for mocking and unit testing
import game    # my (empty) production code file, game.py

class InitializeTest(pymock.PyMockTestCase):
    def test_libtcod_is_imported(self):
        # How do I test that my production file imports the libtcodpy module?
        pass

if __name__ == "__main__":
    import unittest
    unittest.main()
Please:
1) (python people) How do I test that the module is loaded?
2) (TDD people) Should I be unit testing something this basic? If not, what is the first thing I should be testing?
1) 'your_module' in sys.modules.
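Spelled out, that check would look something like this (a sketch; it assumes game.py does `import libtcodpy`):

import sys
import game  # importing game pulls in whatever game.py itself imports

def test_libtcodpy_is_imported():
    # True only if game.py (or something it imports) imported libtcodpy.
    assert 'libtcodpy' in sys.modules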
Don't actually use that, though:
2)
What should your library do?
Is it “have a dependency on libtcodpy”? I think not.
You've just made a design choice that wasn't test-driven!
Write a test that demonstrates how you want to use the library. Don't think about how you're going to implement it. For example:
player = my_lib.PlayerCharacter()
assert player.position == (0, 0) # or whatever assert syntax `pymock` uses
press_key('k')
assert player.position == (0, 1)
Or something similar. (I don't know what you want your library to do, or how much libtcod provides.)
The way I usually think about TDD (and BDD) is at two levels of development: acceptance-testing level, and unit-testing level.
First thing I would do is write stories (acceptance criteria). What is the core feature of your application? Define an end-to-end scenario that exercises one feature explicitly, and goes end to end with it. That's your first story. Write a test for it, using an acceptance testing (or integration testing) framework. Unfortunately, I don't know the Python tools, but in Java I would use JBehave or FitNesse. It would be something very high-level, far away from the code, that treats your application as a "black box". Something like "When my input parameters are xxx, I run my application, the expected output is yyyy".
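In Python, such a black-box acceptance test might look roughly like this (a sketch: the script name, the fixture file, and the expected text are placeholders, not from the question):

import subprocess
import unittest

class AdviceAppAcceptanceTest(unittest.TestCase):
    def test_basic_scenario_end_to_end(self):
        # Treat the application as a black box: feed it input, check its output.
        output = subprocess.check_output(
            ["python", "my_app.py", "--input", "fixtures/basic_case.txt"])
        self.assertIn(b"expected advice", output)

if __name__ == "__main__":
    unittest.main()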
Run this test; it will fail because the underlying application doesn't exist yet. Create the minimal set of classes needed so that it fails on an assertion rather than throwing an exception. That's when you start the second phase of TDD: unit-TDD. It's basically a descending analysis, from the top level down to the details, and this phase will contain a lot of red-green-refactor cycles, bringing many different units into play.
From time to time, re-run your original acceptance test, or refine it if your growing architecture and analysis forced you to make changes to specifications (theoretically, it shouldn't happen at that stage, but in practice it does, very often). When your acceptance test is completely green, you're done with that story, rinse and repeat.
All of that brings me to my point: pure TDD (I mean unit-TDD) is not practical. I really like TDD, but trying to follow it religiously will be more of a hassle than a help in the long run. Sometimes you will spike an approach to see whether it fits with the rest of your project, without writing tests first for it, and potentially rewrite it using TDD afterwards. But as long as you have acceptance tests covering the whole lot, you're fine.
Even if there is a way to test that, I'd recommend not doing it.
Test, from the client's perspective (outside-in), the behavior provided by your SUT (the Game). Your tests (or your users) don't need to know or care that you expose this behavior using a library. As long as the behavior isn't broken, your tests should pass.
Also like another answer says, maybe you don't need the dependency - there may be a simpler solution (e.g. a hashtable might do where you instinctively jumped on a relational database). Listen to the tests... let the tests pull in behavior.
This also leaves you free to change the dependency in the future without having to fix a bunch of tests.
Python is pretty clean, and I can code neat apps quickly.
But I notice that I sometimes have a minor error someplace, and I don't find the error at compile time but at run time. Then I need to change and run the script again. Is there a way to have it break and let me modify and run?
Also, I dislike that Python has no enums. If I were to write code that needs a lot of enums and types, should I be doing it in C++? It feels like I could do it quicker in C++.
"I don't find the error at compile but at run time"
Correct. True for all non-compiled interpreted languages.
"I need to change and run the script again"
Also correct. True for all non-compiled interpreted languages.
"Is there a way to have it break and let me modify and run?"
What?
If it's a run-time error, the script breaks, you fix it and run again.
If it's not a proper error, but a logic problem of some kind, then the program finishes, but doesn't work correctly. No language can anticipate what you hoped for and break for you.
Or perhaps you mean something else.
"...code that needs a lot of enums"
You'll need to provide examples of code that needs a lot of enums. I've been writing Python for years, and have no use for enums. Indeed, I've been writing C++ with no use for enums either.
You'll have to provide code that needs a lot of enums as a specific example. Perhaps in another question along the lines of "What's a Pythonic replacement for all these enums."
It's usually polymorphic class definitions, but without an example, it's hard to be sure.
With interpreted languages you have a lot of freedom. Freedom isn't free here either. While the interpreter won't torture you into dotting every i and crossing every T before it deems your code worthy of a run, it also won't try to statically analyze your code for all those problems. So you have a few choices.
1) {Pyflakes, pychecker, pylint} will do static analysis on your code. That settles the syntax issue mostly.
2) Test-driven development with nosetests or the like will help you. If you make a code change that breaks your existing code, the tests will fail and you will know about it. This is actually better than static analysis and can be as fast. If you test-first, then you will have all your code checked at test runtime instead of program runtime.
Note that with 1 & 2 in place you are a bit better off than if you had just a static-typing compiler on your side. Even so, it will not create a proof of correctness.
It is possible that your tests may miss some plumbing you need for the app to actually run. If that happens, you fix it by writing more tests usually. But you still need to fire up the app and bang on it to see what tests you should have written and didn't.
You might want to look into something like nosey, which runs your unit tests periodically when you've saved changes to a file. You could also set up a save-event trigger to run your unit tests in the background whenever you save a file (possible e.g. with Komodo Edit).
That said, what I do is bind the F7 key to run unit tests in the current directory and subdirectories, and the F6 key to run pylint on the current file. Frequent use of these allows me to spot errors pretty quickly.
Python is an interpreted language; there is no compile stage, at least not one that is visible to the user. If you get an error, go back, modify the script, and try again. If your script has a long execution time and you don't want to stop and restart, you can try a debugger like pdb, with which you can fix some of your errors at runtime.
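For example, you can run the whole script under the debugger with `python -m pdb yourscript.py`, or drop into it at a point of interest (the function below is just an illustration):

import pdb

def risky_calculation(values):
    pdb.set_trace()  # execution pauses here; inspect or change variables, then 'c' to continue
    return sum(values) / len(values)

if __name__ == "__main__":
    print(risky_calculation([1, 2, 3]))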
There are many ways in which you can implement enums; a quick Google search for "python enums" gives everything you're likely to need. However, you should look into whether or not you really need them, and whether there's a better, more 'pythonic' way of doing the same thing.
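As a sketch of the usual patterns from that era (Python 2.x; since Python 3.4 the standard library's enum module is the normal choice):

# Bare module-level constants
RED, GREEN, BLUE = range(3)

# Or a small namespacing class
class Color(object):
    RED, GREEN, BLUE = range(3)

def describe(color):
    if color == Color.RED:
        return "warm"
    return "cool"

print(describe(Color.RED))  # -> warm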