PyTest and controlled test failure for GitHub Workflow/Automation - python

Although I've been using Python for a number of years now, I realised that working predominantly on personal projects, I never needed to do Unit testing before, so apologies for the obvious questions or wrong assumptions I might make.
My goal is to understand how I can make tests and possibly combine everything with the GitHub workflow to create some automation.
I've seen that Failures and Errors (which are conceptually different) thrown locally are not treated differently once the tests run online. But before I go further, I have some doubts that I want to clarify.
From reading online, my initial understanding seems to be that a test should always SUCCEED, even if it contains errors or failures.
But if it succeeds, how can I then record a failure or an error? So I'm tempted to say I'm understanding this the wrong way.
I appreciate that in an Agile environment, some would like to say it's a controlled process, and errors can be intercepted while looking into the code. But I'm not sure this is the best approach. And this leads me to the second question.
Say I have a function accepting dates, and I know that it cannot accept anything other than dates.
Would it make sense to write a test that, say, passes in strings (and expects a failure)?
Or should I test only for the expected circumstances?
Say case 1) is the best practice; what should I do when running these tests? Should I let the test fail and get a long list of errors? Or should I decorate functions with @pytest.mark.xfail() (a sort of soft fail, where I can use a try ... except)?
And a last question (for now): would an xfail decorator let the workflow automation consider the test as "passed"? Probably not, but at this stage I have so much confusion in my head that any clarity from experienced users would help.
Thanks for your patience in reading.

The question is a bit fuzzy, but I will have a shot.
The notion that tests should always succeed even if they have errors is probably a misunderstanding. Failing tests are errors and should be shown as such (with the exception of tests known to fail, but that is a special case, see below). From the comment I guess what was actually meant is that the other tests shall continue to run even if one test fails - that certainly makes sense, especially in CI tests, where you want to get the whole picture.
If you have a function accepting dates and nothing else, it shall be tested that it indeed only accepts dates, and raises an exception (or similar) in case an invalid date is given. What I meant in the comment is: if your software already ensures that only a date can be passed to that function, and this is also ensured via tests, it would not be necessary to test this again - but in general, yes, this should be tested.
So, to give a few examples: if your function is specified to raise an exception on invalid input, this has to be tested using something like pytest.raises - it would fail if no exception is raised. If your function shall handle invalid dates by logging an error, the test shall verify that the error is logged. If an invalid input should just be ignored, the test shall ensure that no exception is raised and the state does not change.
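For illustration, here is a minimal sketch of the exception case and the happy path. The process_date function is a hypothetical stand-in for a function that only accepts datetime.date objects and raises TypeError otherwise:

import datetime

import pytest


def process_date(value):
    # Hypothetical function under test: accepts only datetime.date objects.
    if not isinstance(value, datetime.date):
        raise TypeError("process_date expects a datetime.date")
    return value


def test_process_date_rejects_strings():
    # Fails if no TypeError is raised for a non-date argument.
    with pytest.raises(TypeError):
        process_date("2023-01-01")


def test_process_date_accepts_dates():
    # The happy path: a valid date must not raise.
    assert process_date(datetime.date(2023, 1, 1)) == datetime.date(2023, 1, 1)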
For xfail, I just refer you to the pytest documentation, where this is described nicely:
An xfail means that you expect a test to fail for some reason. A common example is a test for a feature not yet implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with pytest.mark.xfail), it’s an xpass and will be reported in the test summary.
So a passing xfail test will be shown as passed indeed. You can easily test this yourself:
import pytest

@pytest.mark.xfail
def test_fails():
    assert False

@pytest.mark.xfail
def test_succeeds():
    assert True
gives something like:
============================= test session starts =============================
collecting ... collected 2 items
test_xfail.py::test_fails
test_xfail.py::test_succeeds
======================== 1 xfailed, 1 xpassed in 0.35s ========================
and the test run is considered passed (e.g. it has exit code 0).

Is there any good reason to catch exceptions in unittest transactions?

The unittest module is extremely good at detecting problems in code.
I understand the idea of isolating and testing parts of code with assertions:
self.assertEqual(web_page_view.func, web_page_url)
But besides these assertions, you might also have some logic before them, in the same test method, that could have problems.
I am wondering if manual exception handling is something to ever take into account inside methods of a TestCase subclass.
Because if I wrap a block in a try/except, if something fails, the test returns OK and does not fail:
def test_simulate_requests(self):
    """
    Simulate requests to a url
    """
    try:
        response = self.client.get('/adress/of/page/')
        self.assertEqual(response.status_code, 200)
    except Exception as e:
        print("error: ", e)
Should exception handling be always avoided in such tests?
First part of answer:
As you correctly say, there needs to be some logic before the actual test. The code belonging to a unit test can be clustered into four parts (I use Meszaros' terminology in the following): setup, exercise, verify, teardown. Often the code of a test case is structured such that the code for the four parts is cleanly separated and comes in that precise order - this is called the four-phase test pattern.
The exercise phase is the heart of the test, where the functionality is executed that shall be checked by the test. The setup ensures that this happens in a well-defined context. So, in this terminology, what you have described is the situation where something fails during setup. Which means that the preconditions required for a meaningful execution of the functionality under test are not met.
This is a common situation and it means that you in fact need to be able to distinguish three outcomes of a test: A test can pass successfully, it can fail, or it can just be meaningless.
Fortunately, there is an answer for this in Python: you can skip tests, and if a test is skipped this is recorded, but neither as a failure nor as a success. Skipping tests would probably be a better way to handle the situation that you have shown in your example. Here is a small code piece that demonstrates one way of skipping tests:
import unittest

class TestException(unittest.TestCase):
    def test_skipTest_shallSkip(self):
        self.skipTest("Skipped because skipping shall be demonstrated.")
Second part of answer:
Your test seems to have some non-deterministic elements. The self.client.get can throw exceptions (but only sometimes - sometimes it doesn't). This means you do not have the context during the test execution under control. In unit-testing this is a situation you should try to avoid. Your tests should have a deterministic behavior.
One typical way to achieve this is to isolate your code from the components that are responsible for the nondeterminism and during testing replace these components by mocks. The behaviour of the mocks is under full control of the test code. Thus, if your code uses some component for network accesses, you would mock that component. Then, in some test cases you can instruct the mock to simulate a successful network communication to see how your component handles this, and in other tests you instruct the mock to simulate a network failure to see how your component copes with this situation.
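A minimal sketch of that mocking idea, assuming the HTTP client is injected into the code under test (the PageChecker class and its client are hypothetical names, not part of the question's code):

from unittest import TestCase
from unittest.mock import Mock


class PageChecker:
    # Hypothetical component under test that depends on an injected client.
    def __init__(self, client):
        self.client = client

    def page_is_up(self, url):
        try:
            return self.client.get(url).status_code == 200
        except ConnectionError:
            return False


class TestPageChecker(TestCase):
    def test_page_is_up_on_success(self):
        client = Mock()
        client.get.return_value = Mock(status_code=200)  # simulate a successful request
        self.assertTrue(PageChecker(client).page_is_up('/adress/of/page/'))

    def test_page_is_down_on_network_failure(self):
        client = Mock()
        client.get.side_effect = ConnectionError()  # simulate a network failure
        self.assertFalse(PageChecker(client).page_is_up('/adress/of/page/'))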
There are two "bad" states of a test: Failure (when one of the assertions fails) and Error (when the test code itself raises an exception - your case).
First of all, it goes without saying that it's better to build your test in such a way that it reaches its assertions.
If you need to assert that some tested code raises an exception, you should use with self.assertRaises(ExpectedError), as sketched below.
If some code inside the test raises an exception, it's better to learn about it from an 'Error' result than to see 'OK, all tests have passed'.
If your test logic really assumes that something can fail inside the test itself and that this is normal behaviour, the test is probably wrong. Maybe you should use mocks (https://docs.python.org/3/library/unittest.mock.html) to imitate an API call or something else.
In your case, even if the request fails, you catch the exception with a broad except and say "OK, continue". Either way, the implementation is wrong.
Finally: no, there shouldn't be an except in your test cases.
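For the assertRaises suggestion above, a minimal sketch; parse_date is a hypothetical function under test that accepts only ISO-formatted date strings:

import datetime
import unittest


def parse_date(value):
    # Hypothetical code under test: accepts only ISO-formatted date strings.
    return datetime.date.fromisoformat(value)


class TestParseDate(unittest.TestCase):
    def test_invalid_input_raises(self):
        # Fails if parse_date does not raise ValueError for bad input.
        with self.assertRaises(ValueError):
            parse_date("not-a-date")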
P.S. It's better to name your test functions test_<what_you_want_to_test>; in this case test_successful_request would probably be fine.

Fail test if the assert statement is missing [duplicate]

Today I had a failing test that happily succeeded, because I forgot a rather important line at the end:
assert actual == expected
I would like to have the machine catch this mistake in the future. Is there a way to make pytest detect if a test function does not assert anything, and consider this a test failure?
Of course, this needs to be a "global" configuration setting; annotating each test function with @fail_if_nothing_is_asserted would defeat the purpose.
This is one of the reasons why it really helps to write a failing test before writing the code to make the test pass. It's that one little extra sanity check for your code.
Also, the first time your test passes without actually writing the code to make it pass is a nice double-take moment too.

test isolation between pytest-hypothesis runs

I just migrated a pytest test suite from quickcheck to hypothesis. This worked quite well (and immediately uncovered some hidden edge case bugs), but one major difference I see is related to test isolation between the two property-based testing libraries.
quickcheck seems to simply run the test function multiple times with different parameter values, each time running my function-scoped fixtures. This also results in many more dots in pytest's output.
hypothesis however seems to run only the body of the test function multiple times, meaning for example no transaction rollbacks between individual runs. This then means I cannot reliably assert for a number of DB entries when my test inserts something into the DB for example, since all the entries from the previous run would still be hanging around.
Am I missing something obvious here or is this expected behaviour? If so, is there a way to get the number of runs hypothesis has done as a variable to use inside the test?
I'm afraid you're a bit stuck and there isn't currently any good solution to this problem.
The way Hypothesis needs to work (which is the source of a lot of its improvements over pytest-quickcheck) doesn't meet pytest's assumptions about test execution. The problem is mostly on the pytest side - the current pytest fixture system has some very baked in assumptions about how you run a test that do not play well with taking control of the test execution, and the last time I tried to work around this I ended up sinking about a week of work into it before giving up and basically saying that either something needs to change on the pytest side or someone needs to fund this work if it's going to get any better.

Is there an alternative result for Python unit tests, other than a Pass or Fail?

I'm writing unit tests that have a database dependency (so technically they're functional tests). Often these tests not only rely on the database to be live and functional, but they can also rely on certain data to be available.
For example, in one test I might query the database to retrieve sample data that I am going to use to test the update or delete functionality. If data doesn't already exist, then this isn't exactly a failure in this context. I'm only concerned about the pass/fail status of the update or delete, and in this situation we didn't even get far enough to test it. So I don't want to give a false positive or false negative.
Is there an elegant way to have the unit test return a 3rd possible result? Such as a warning?
In general I think the advice by Paul Becotte is best for most cases:
This is a failure though - your tests failed to set up the system in the way that your test required. Saying "the data wasn't there, so it is okay for this test to fail" is saying that you don't really care about whether the functionality you are testing works. Make your test reliably insert the data immediately before retrieving it, which will give you better insight into the state of your system.
However, in my particular case, I am writing a functional test that relies on data generated and manipulated by several processes. Generating it quickly at the beginning of the test just isn't practical (at least yet).
What I ultimately found to work the way I need is to use skipTest, as mentioned here:
Skip unittest if some-condition in SetUpClass fails
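A minimal sketch of that approach, assuming a hypothetical get_sample_record helper that returns None when the prerequisite data is missing:

import unittest


def get_sample_record():
    # Hypothetical helper: returns a record from the database,
    # or None if the prerequisite data does not exist.
    return None


class TestUpdate(unittest.TestCase):
    def setUp(self):
        self.record = get_sample_record()
        if self.record is None:
            # Reported as "skipped" - neither a pass nor a failure.
            self.skipTest("prerequisite data not available")

    def test_update(self):
        # ... exercise the update and assert on the result ...
        self.assertIsNotNone(self.record)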

Non-critical unittest failures

I'm using Python's built-in unittest module and I want to write a few tests that are not critical.
I mean, if my program passes such tests, that's great! However, if it doesn't pass, it's not really a problem, the program will still work.
For example, my program is designed to work with a custom type "A". If it fails to work with "A", then it's broken. However, for convenience, most of it should also work with another type "B", but that's not mandatory. If it fails to work with "B", then it's not broken (because it still works with "A", which is its main purpose). Failing to work with "B" is not critical, I will just miss a "bonus feature" I could have.
Another (hypothetical) example is when writing an OCR. The algorithm should recognize most images from the tests, but it's okay if some of them fail. (and no, I'm not writing an OCR)
Is there any way to write non-critical tests in unittest (or other testing framework)?
As a practical matter, I'd probably use print statements to indicate failure in that case. A more correct solution is to use warnings:
http://docs.python.org/library/warnings.html
You could, however, use the logging facility to generate a more detailed record of your test results (i.e. set your "B" class failures to write warnings to the logs).
http://docs.python.org/library/logging.html
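A rough sketch of the warnings approach, with a hypothetical handles_type_b function standing in for the optional "B" support:

import unittest
import warnings


def handles_type_b(value):
    # Hypothetical "bonus" feature under test.
    return False


class TestOptionalTypeB(unittest.TestCase):
    def test_type_b_support(self):
        if not handles_type_b("some B value"):
            # Emit a warning instead of failing; the suite still passes.
            warnings.warn("type B is not handled; non-critical feature missing")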
Edit:
The way we handle this in Django is that we have some tests we expect to fail, and we have others that we skip based on the environment. Since we can generally predict whether a test SHOULD fail or pass (i.e. if we can't import a certain module, the system doesn't have it, and so the test won't work), we can skip failing tests intelligently. This means that we still run every test that will pass, and have no tests that "might" pass. Unit tests are most useful when they do things predictably, and being able to detect whether or not a test SHOULD pass before we run it makes this possible.
Asserts in unit tests are binary: they will work or they will fail, there's no mid-term.
Given that, to create those "non-critical" tests you should not use assertions when you don't want the tests to fail. You should do this carefully so you don't compromise the "usefulness" of the test.
My advice for your OCR example is that you use something to record the success rate in your test code and then create one assertion like "assert success_rate > 8.5", and that should give the effect you desire.
Thank you for the great answers. No single answer was really complete, so I'm writing here a combination of all the answers that helped me. If you like this answer, please upvote the people who were responsible for it.
Conclusions
Unit tests (or at least unit tests in unittest module) are binary. As Guilherme Chapiewski says: they will work or they will fail, there's no mid-term.
Thus, my conclusion is that unit tests are not exactly the right tool for this job. It seems that unit tests are more concerned with "keep everything working, no failure is expected", and thus I can't (or at least it's not easy to) have non-binary tests.
So, unit tests don't seem the right tool if I'm trying to improve an algorithm or an implementation, because unit tests can't tell me how much better one version is compared to the other (supposing both of them are correctly implemented, both will pass all the unit tests).
My final solution
My final solution is based on ryber's idea and the code shown in wcoenen's answer. I'm basically extending the default TextTestRunner and making it less verbose. Then my main code runs two test suites: the critical one using the standard TextTestRunner, and the non-critical one with my own less verbose version.
import sys
import unittest


# Note: newer Python versions expose this class publicly as unittest.TextTestResult.
class _TerseTextTestResult(unittest._TextTestResult):
    def printErrorList(self, flavour, errors):
        for test, err in errors:
            #self.stream.writeln(self.separator1)
            self.stream.writeln("%s: %s" % (flavour, self.getDescription(test)))
            #self.stream.writeln(self.separator2)
            #self.stream.writeln("%s" % err)


class TerseTextTestRunner(unittest.TextTestRunner):
    def _makeResult(self):
        return _TerseTextTestResult(self.stream, self.descriptions, self.verbosity)


if __name__ == '__main__':
    # TestSomethingNonCritical and TestEverythingImportant are the user's own TestCase classes.
    sys.stderr.write("Running non-critical tests:\n")
    non_critical_suite = unittest.TestLoader().loadTestsFromTestCase(TestSomethingNonCritical)
    TerseTextTestRunner(verbosity=1).run(non_critical_suite)

    sys.stderr.write("\n")
    sys.stderr.write("Running CRITICAL tests:\n")
    suite = unittest.TestLoader().loadTestsFromTestCase(TestEverythingImportant)
    unittest.TextTestRunner(verbosity=1).run(suite)
Possible improvements
It should still be useful to know if there is any testing framework with non-binary tests, like Kathy Van Stone suggested. Probably I won't use it in this simple personal project, but it might be useful in future projects.
I'm not totally sure how unittest works, but most unit testing frameworks have something akin to categories. I suppose you could just categorize such tests, mark them to be ignored, and then run them only when you're interested in them. But I know from experience that ignored tests very quickly become just that: ignored tests that nobody ever runs, making them a waste of the time and energy it took to write them.
My advice is for your app to do, or do not, there is no try.
From unittest documentation which you link:
Instead of unittest.main(), there are other ways to run the tests with a finer level of control, less terse output, and no requirement to be run from the command line. For example, the last two lines may be replaced with:
suite = unittest.TestLoader().loadTestsFromTestCase(TestSequenceFunctions)
unittest.TextTestRunner(verbosity=2).run(suite)
In your case, you can create separate TestSuite instances for the critical and non-critical tests. You could control which suite is passed to the test runner with a command-line argument. Test suites can also contain other test suites, so you can create big hierarchies if you want.
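A rough sketch of that suite layout, with hypothetical TestCritical and TestNonCritical test cases and a simple command-line flag:

import sys
import unittest


class TestCritical(unittest.TestCase):
    def test_works_with_a(self):
        self.assertTrue(True)  # placeholder for a real assertion


class TestNonCritical(unittest.TestCase):
    def test_works_with_b(self):
        self.assertTrue(True)  # placeholder for a real assertion


if __name__ == '__main__':
    loader = unittest.TestLoader()
    critical = loader.loadTestsFromTestCase(TestCritical)

    # Suites can contain other suites, so build the full run on top of the critical one.
    suite = unittest.TestSuite([critical])
    if '--all' in sys.argv:
        suite.addTest(loader.loadTestsFromTestCase(TestNonCritical))

    unittest.TextTestRunner(verbosity=2).run(suite)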
Python 2.7 (and 3.1) added support for skipping some test methods or test cases, as well as marking some tests as expected failure.
http://docs.python.org/library/unittest.html#skipping-tests-and-expected-failures
Tests marked as expected failure won't be counted as failure on a TestResult.
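A minimal sketch of those decorators (the skip condition and test bodies are placeholders):

import sys
import unittest


class TestOptionalFeatures(unittest.TestCase):
    @unittest.expectedFailure
    def test_type_b_support(self):
        # Known to fail; reported as an expected failure, not as a failure.
        self.assertTrue(False)

    @unittest.skipIf(sys.platform == 'win32', "not supported on Windows")
    def test_platform_specific(self):
        self.assertTrue(True)


if __name__ == '__main__':
    unittest.main()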
There are some test systems that allow warnings rather than failures, but unittest is not one of them (I don't know which ones do, offhand) unless you want to extend it (which is possible).
You can make the tests so that they log warnings rather than fail.
Another way to handle this is to separate out the tests and only run them to get the pass/fail reports and not have any build dependencies (this depends on your build setup).
Take a look at Nose : http://somethingaboutorange.com/mrl/projects/nose/0.11.1/
There are plenty of command line options for selecting tests to run, and you can keep your existing unittest tests.
Another possibility is to create a "B" branch (you ARE using some sort of version control, right?) and have your unit tests for "B" in there. That way, you keep your release version's unit tests clean (Look, all dots!), but still have tests for B. If you're using a modern version control system like git or mercurial (I'm partial to mercurial), branching/cloning and merging are trivial operations, so that's what I'd recommend.
However, I think you're using tests for something they're not meant to do. The real question is "How important to you is it that 'B' works?" Because your test suite should only have tests in it that you care whether they pass or fail. Tests that, if they fail, it means the code is broken. That's why I suggested only testing "B" in the "B" branch, since that would be the branch where you are developing the "B" feature.
You could test using logger or print commands, if you like. But if you don't care enough that it's broken to have it flagged in your unit tests, I'd seriously question whether you care enough to test it at all. Besides, that adds needless complexity (extra variables to set debug level, multiple testing vectors that are completely independent of each other yet operate within the same space, causing potential collisions and errors, etc, etc). Unless you're developing a "Hello, World!" app, I suspect your problem set is complicated enough without adding additional, unnecessary complications.
You could write your tests so that they count a success rate.
With OCR, you could throw 1000 images at the code and require that 95% of them are recognized successfully.
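A rough sketch of that idea, with a hypothetical recognize function and sample set:

import unittest


def recognize(image):
    # Hypothetical OCR function under test.
    return "expected text"


class TestRecognitionRate(unittest.TestCase):
    def test_success_rate(self):
        samples = [("image-%d" % i, "expected text") for i in range(1000)]
        successes = sum(1 for image, expected in samples if recognize(image) == expected)
        # The non-binary quality goal is folded into a single binary assertion.
        self.assertGreaterEqual(successes / len(samples), 0.95)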
If your program must work with type A, then if that fails, the test fails. If it's not required to work with B, what is the value of such a test?
