Isolating Python unittest from imports in other test modules

Isolating Python unittest from imports in other test modules - python

When running tests that target a specific method which uses reflection, I encounter the problem that the output of tests is dependent on whether I run them with PTVS ('run all tests' in Test Explorer) or with the command line Python tool (both on Windows and Linux systems):
$ python -m unittest
I assumed from the start that it has something to do with differences in how the test runners work in PTVS and Python's unittest framework (because I've noticed other differences, too).
# method to be tested
# written in Python 3
def create_line(self):
problems = []
for creator in LineCreator.__subclasses__():
item = creator(self.structure)
cls = item.get_subtype()
line = cls(self.text)
try:
line.parse()
return line
except ParseException as exc:
problems.append(exc)
raise ParseException("parsing did not succeed", problems)
""" subclasses of LineCreator are defined in separate files.
They implement get_subtype() and return the class objects of the actual types they must instantiate.
"""
I have noticed that the subclasses found in this way will vary, depending on which modules have been loaded in the code that calls this method. This is exactly what I want (for now). Given this knowledge, I am always careful to only have access to one subclass of LineCreator in any given test module, class, or method.
However, when I run the tests from the Python command line, it is clear from the ParseException.problems attribute that both are loaded at all times. It is also easy to reproduce: inserting the following code makes all tests fail on the command line, yet they succeed on PTVS.
if len(LineCreator.__subclasses__()) > 1:
raise ImportError()
I know that my tests should run independently from each other and from any contextual factors. That is actually what I'm trying to achieve here.
In case I wasn't clear, my question is why behaviors are different, and which one is correct. And if you're feeling really generous, how to change my code to make tests succeed on all platforms.

Related

Is there a way to run several time a combinaison of python code and Pytest tests automatically?

I am looking to automate the process where:
I run some python code,
then run a set of tests using pytest
then, if all tests are validated, start the process again with new data.
I am thinking of writing a script executing the python code, then calling pytest using pytest.main(), check with the help of the exit code that all tests passed and in case of success start again.
The issue is that it is stated in pytest docs (https://docs.pytest.org/en/stable/usage.html) that it is not recommended to make multiple calls to pytest.main():
Note from pytest docs:
"Calling pytest.main() will result in importing your tests and any modules that they import. Due to the caching mechanism of python’s import system, making subsequent calls to pytest.main() from the same process will not reflect changes to those files between the calls. For this reason, making multiple calls to pytest.main() from the same process (in order to re-run tests, for example) is not recommended."
I was woundering if it was ok to call pytest.main() the way I intend to or if there was any better way to achieve what I am looking for?
I've made a simple example to make it problem more clear:
A = [0]
def some_action(x):
x[0] += 1
if __name__ == '__main__':
print('Initial value of A: {}'.format(A))
for i in range(10):
if i == 5:
# one test in test_mock2 that fails
test_dir = "./tests/functional_tests/test_mock2.py"
else:
# two tests in test_mock that pass
test_dir = "./tests/functional_tests/test_mock.py"
some_action(A)
check_tests = int(pytest.main(["-q", "--tb=no", test_dir]))
if check_tests != 0:
print('Interrupted at i={} because of tests failures'.format(i))
break
if i > 5:
print('All tests validated, final value of A: {}'.format(A))
else:
print('final value of A: {}'.format(A))
In this example some_action is executed until i reaches 5, at which point the tests fail and the process of executing/testing is interrupted. It seems to work fine, I'm only concerned because of the comments in pytest docs as stated above

The warning applies to the following sequence of events:
Run pytest.main on some folder which imports a.py, directly or indirectly.
Modify a.py (manually or programatically).
Attempt to rerun pytest.main on the same directory in the same python process as #1
The second run in step #3 will not not see the changes you made to a.py in step #2. That is because python does not import a file twice. Instead, it will check if the file has an entry in sys.modules, and use that instead. It's what lets you import large libraries multiple times without incurring a huge penalty every time.
Modifying the values in imported modules is fine. Python binds names to references, so if you bind something (like a new integer value) to the right name, everyone will be able to see it. Your some_action function is a good example of this. Future tests will run with the modified value if they import your script as a module.
The reason that the caveat is there is that pytest is usually used to test code after it has been modified. The warning is simply telling you that if you modify your code, you need to start pytest.main in a new python process to see the changes.
Since you do not appear to be modifying the code of the files in your test and expecting the changes to show up, the caveat you cite does not apply to you. Keep doing what you are doing.

Nose: find test generator

I'm having a little issue with the nose testing framework.
Scenario:
I want to test my software with devices I have connected to my computer. Each device is tested with different set of configurations. Luckily, as the testing code does not change, I can simply create files containing the configurations for each device.
For now, I have used normal test methods inside my class. Meaning that inside of my test method I'm loading the configuration file for the device I want to test and then iterate over all configurations in that file and perform the test. Unfortunately, this only gives one test result per device, not one per configuration the devices was tested with.
So I stumbled over nose own test generators. The old tests were modified so they are using a generator and everything worked fine so far, I got a result for each configuration, everything worked great.
My little Issue:
However, I've seen to hit a wall now. When using --collect-only to show the available tests, I get one of two possible outcomes:
A configuration file is loaded and the test generator generates the tests according to that configuration file. This means that nose displays the test with each possible parameter configuration. However, as the configuration file may not be the right one for the device I want to test later on, I get false result.
I found out that the plug-in that offers the --collect-only functionality bypasses the fixtures of the tests. So I moved the loading of the configuration into a fixture to avoid having the generated tests spam the list of available tests. Unfortunately, this resulted in no test being generated and hence not test being displayed as the generator didn't generate anything
What I tried so far:
So, I have tried a few things to solve this issue:
Using a flag to determine if --collect-only is running. As the collect plug-in bypasses the fixtures, I have set a flag in the generator with a default value of true and set it to false in the fixture of that generator. I hoped that checking the flag and simply ignoring the test generation when --collect-only was running would solve my issue. That was when I learned that nose checks if a test method is a generator and expects it to give test method.
As my first idea failed because nose knows that the function is a generator and expects at least one generated function from it, my second idea was to call the function with an empty set of parameters. As my configuration is stored in a dict(), I simply passed an empty dict. Luckily, this was enough to generate a test. However, apparently, nose makes a little check if the test is executable and if it fails, the test is once again ignored.
After the second fail I tried the other direction. I read the source code of nose a bit, trying to figure out how it works internally. That was when I stumbled over the "isgenerator()" check. So I thought, I would scan my directories for the tests on my own, add all static tests I find to a list and when I stumble across a generator, I will not generate tests, but add the name of the generator to the lists of tests. Well, this one fails so far as I have no real experience on how nose works internally. Meaning, I find all the static tests, however not the generators. For the moment, this is my code to find the tests:
k
from nose.loader import TestLoader
from nose.util import isgenerator
import os
folder = os.getcwd()
testLoader = TestLoader()
tests = testLoader.loadTestsFromDir(folder)
for testModule in tests:
print ("Found test module %s" % testModule.id())
for testFile in testModule:
print ("Found test file %s" % testFile.id())
for test in testFile:
print("Found test %s" % test.id())
if not isgenerator(test):
for x in test:
print("Found x %s" % x.id())
else:
print ("GENERATOR FOUND")
I ran out of ideas on what to try. Maybe I ran in some sort of X-Y problem, or maybe I'm simply blind to see the obvious solution.

Ok, so, after jumping through the debugger a bit more, I came up with the following solution and will post it here as an answer because it seems to work. However, it is neither really elegant looking nor do I know how stable it is exactly.
foundTests = dict()
for testPackage in loadedTests:
#print ("Found test module %s" % testModule.id())
for testModule in testPackage:
#print ("Found test file %s" % testFile.id())
for testCase in testModule:
#print("Found test %s" % test.id())
for test in testCase:
if isinstance(test, Test):
key, value = readTestInformation(test)
foundTests[key] = value
elif hasattr(test, "test_generator"):
factory = test.factory
for suite in factory.suites:
try:
if inspect.ismethod(suite):
key, value = readGeneratedTestInformation(suite)
foundTests[key]=value
except:
pass
This function simply iterates over all packages found, inside them iterates over all modules found, inside the modules it iterates over all classes and at last iterates over all test methods inside a class.
If a test method is a generator it seems to have the "test_generator" attribute, so I check for that and then iterate over all suites inside the factory, checking each time if it is a function. If it is, I got the test generator function.
So, if anyone comes up with a better solution, I'd be happy to see it. This way seems kind of hacky.

How to sort python unittest discover tests?

I created my bunch of Python tests using the Python unittest format.
Now I'm able to run them with
python -m unittest discover -s TestDirectory -p '*.py' -v
I finds them all, and runs them.
Now there's a subtle difference whether I run the tests on Windows, or on Linux. Indeed, on Windows the tests are run in alphabetical order, whereas on Linux the tests are run in no apparent human specific discoverable order, even if always the same.
The trouble is I relied on the first two letters of the test file to sort the order of execution of the tests. Not that they have to be run in a specific order, but to have some kind of informational tests, showing version data in their output to appear first in the test run log.
Is there something I can do to run the tests also in alphabetical order on Linux?

I have not tried this, but I imagine that one thing you could do is override the TestSuite class to add a sort function. Then you can call the sort function before calling the unittest run function. So, in an 'AllTests.py' script, you could add something like this:
class SortableSuite(unittest.TestSuite):
def sort(self):
self._tests.sort(cmp=lambda x,y: x._testMethodName < y._testMethodName)
def run(self,testResult):
#or if you don't want to run a sort() function, you can override the run
#function to automatically sort.
self._tests.sort(cmp=lambda x,y: x._testMethodName < y._testMethodName)
return unittest.TestSuite.run(self,testResult)
loader = unittest.TestLoader()
loader.suiteClass = SortableSuite
suite = loader.loadTestFromTestCases(collectedTests)
suite.sort()
suite.run(defaultTestResult())

Python CLI program unit testing

I am working on a python Command-Line-Interface program, and I find it boring when doing testings, for example, here is the help information of the program:
usage: pyconv [-h] [-f ENCODING] [-t ENCODING] [-o file_path] file_path
Convert text file from one encoding to another.
positional arguments:
file_path
optional arguments:
-h, --help show this help message and exit
-f ENCODING, --from ENCODING
Encoding of source file
-t ENCODING, --to ENCODING
Encoding you want
-o file_path, --output file_path
Output file path
When I made changes on the program and want to test something, I must open a terminal,
type the command(with options and arguments), type enter, and see if any error occurs
while running. If error really occurs, I must go back to the editor and check the code
from top to end, guessing where the bug positions, make small changes, write print lines,
return to the terminal, run command again...
Recursively.
So my question is, what is the best way to do testing with CLI program, can it be as easy
as unit testing with normal python scripts?

I think it's perfectly fine to test functionally on a whole-program level. It's still possible to test one aspect/option per test. This way you can be sure that the program really works as a whole. Writing unit-tests usually means that you get to execute your tests quicker and that failures are usually easier to interpret/understand. But unit-tests are typically more tied to the program structure, requiring more refactoring effort when you internally change things.
Anyway, using py.test, here is a little example for testing a latin1 to utf8 conversion for pyconv::
# content of test_pyconv.py
import pytest
# we reuse a bit of pytest's own testing machinery, this should eventually come
# from a separatedly installable pytest-cli plugin.
pytest_plugins = ["pytester"]
#pytest.fixture
def run(testdir):
def do_run(*args):
args = ["pyconv"] + list(args)
return testdir._run(*args)
return do_run
def test_pyconv_latin1_to_utf8(tmpdir, run):
input = tmpdir.join("example.txt")
content = unicode("\xc3\xa4\xc3\xb6", "latin1")
with input.open("wb") as f:
f.write(content.encode("latin1"))
output = tmpdir.join("example.txt.utf8")
result = run("-flatin1", "-tutf8", input, "-o", output)
assert result.ret == 0
with output.open("rb") as f:
newcontent = f.read()
assert content.encode("utf8") == newcontent
After installing pytest ("pip install pytest") you can run it like this::
$ py.test test_pyconv.py
=========================== test session starts ============================
platform linux2 -- Python 2.7.3 -- pytest-2.4.5dev1
collected 1 items
test_pyconv.py .
========================= 1 passed in 0.40 seconds =========================
The example reuses some internal machinery of pytest's own testing by leveraging pytest's fixture mechanism, see http://pytest.org/latest/fixture.html. If you forget about the details for a moment, you can just work from the fact that "run" and "tmpdir" are provided for helping you to prepare and run tests. If you want to play, you can try to insert a failing assert-statement or simply "assert 0" and then look at the traceback or issue "py.test --pdb" to enter a python prompt.

Start from the user interface with functional tests and work down towards unit tests. It can feel difficult, especially when you use the argparse module or the click package, which take control of the application entry point.
The cli-test-helpers Python package has examples and helper functions (context managers) for a holistic approach on writing tests for your CLI. It's a simple idea, and one that works perfectly with TDD:
Start with functional tests (to ensure your user interface definition) and
Work towards unit tests (to ensure your implementation contracts)
Functional tests
NOTE: I assume you develop code that is deployed with a setup.py file or is run as a module (-m).
Is the entrypoint script installed? (tests the configuration in your setup.py)
Can this package be run as a Python module? (i.e. without having to be installed)
Is command XYZ available? etc. Cover your entire CLI usage here!
Those tests are simplistic: They run the shell command you would enter in the terminal, e.g.
def test_entrypoint():
exit_status = os.system('foobar --help')
assert exit_status == 0
Note the trick to use a non-destructive operation (e.g. --help or --version) as we can't mock anything with this approach.
Towards unit tests
To test single aspects inside the application you will need to mimic things like command line arguments and maybe environment variables. You will also need to catch the exiting of your script to avoid the tests to fail for SystemExit exceptions.
Example with ArgvContext to mimic command line arguments:
#patch('foobar.command.baz')
def test_cli_command(mock_command):
"""Is the correct code called when invoked via the CLI?"""
with ArgvContext('foobar', 'baz'), pytest.raises(SystemExit):
foobar.cli.main()
assert mock_command.called
Note that we mock the function that we want our CLI framework (click in this example) to call, and that we catch SystemExit that the framework naturally raises. The context managers are provided by cli-test-helpers and pytest.
Unit tests
The rest is business as usual. With the above two strategies we've overcome the control a CLI framework may have taken away from us. The rest is usual unit testing. TDD-style hopefully.
Disclosure: I am the author of the cli-test-helpers Python package.

So my question is, what is the best way to do testing with CLI program, can it be as easy as unit testing with normal python scripts?
The only difference is that when you run Python module as a script, its __name__ attribute is set to '__main__'. So generally, if you intend to run your script from command line it should have following form:
import sys
# function and class definitions, etc.
# ...
def foo(arg):
pass
def main():
"""Entry point to the script"""
# Do parsing of command line arguments and other stuff here. And then
# make calls to whatever functions and classes that are defined in your
# module. For example:
foo(sys.argv[1])
if __name__ == '__main__':
main()
Now there is no difference, how you would use it: as a script or as a module. So inside your unit-testing code you can just import foo function, call it and make any assertions you want.

Maybe too little too late,
but you can always use
import os.system
result = os.system(<'Insert your command with options here'>
assert(0 == result)
In that way, you can run your program as if it was from command line, and evaluate the exit code.
(Update after I studied pytest)
You can also use capsys.
(from running pytest --fixtures)
capsys
Enable text capturing of writes to sys.stdout and sys.stderr.
The captured output is made available via ``capsys.readouterr()`` method
calls, which return a ``(out, err)`` namedtuple.
``out`` and ``err`` will be ``text`` objects.

This isn't for Python specifically, but what I do to test command-line scripts is to run them with various predetermined inputs and options and store the correct output in a file. Then, to test them when I make changes, I simply run the new script and pipe the output into diff correct_output -. If the files are the same, it outputs nothing. If they're different, it shows you where. This will only work if you are on Linux or OS X; on Windows, you will have to get MSYS.
Example:
python mycliprogram --someoption "some input" | diff correct_output -
To make it even easier, you can add all these test runs to your 'make test' Makefile target, which I assume you already have. ;)
If you are running many of these at once, you could make it a little more obvious where each one ends by adding a fail tag:
python mycliprogram --someoption "some input" | diff correct_output - || tput setaf 1 && echo "FAILED"

The short answer is yes, you can use unit tests, and should. If your code is well structured, it should be quite easy to test each component separately, and if you need to to can always mock sys.argv to simulate running it with different arguments.

pytest-console-scripts is a Pytest plugin for testing python scripts installed via console_scripts entry point of setup.py.

For Python 3.5+, you can use the simpler subprocess.run to call your CLI command from your test.
Using pytest:
import subprocess
def test_command__works_properly():
try:
result = subprocess.run(['command', '--argument', 'value'], check=True, capture_output=True, text=True)
except subprocess.CalledProcessError as error:
print(error.stdout)
print(error.stderr)
raise error
The output can be accessed via result.stdout, result.stderr, and result.returncode if needed.
The check parameter causes an exception to be raised if an error occurs. Note Python 3.7+ is required for the capture_output and text parameters, which simplify capturing and reading stdout/stderr.

Given that you are explicitly asking about testing for a command line application, I believe that you are aware of unit-testing tools in python and that you are actually looking for a tool to automate end-to-end tests of a command line tool. There are a couple of tools out there that are specifically designed for that. If you are looking for something that's pip-installable, I would recommend cram. It integrates well with the rest of the python environment (e.g. through a pytest extension) and it's quite easy to use:
Simply write the commands you want to run prepended with $ and the expected output prepended with . For example, the following would be a valid cram test:
$ echo Hello
Hello
By having four spaces in front of expected output and two in front of the test, you can actually use these tests to also write documentation. More on that on the website.

You can use standard unittest module:
# python -m unittest <test module>
or use nose as a testing framework. Just write classic unittest files in separate directory and run:
# nosetests <test modules directory>
Writing unittests is easy. Just follow online manual for unittesting

I would not test the program as a whole this is not a good test strategy and may not actually catch the actual spot of the error. The CLI interface is just front end to an API. You test the API via your unit tests and then when you make a change to a specific part you have a test case to exercise that change.
So, restructure your application so that you test the API and not the application it self. But, you can have a functional test that actually does run the full application and checks that the output is correct.
In short, yes testing the code is the same as testing any other code, but you must test the individual parts rather than their combination as a whole to ensure that your changes do not break everything.

Run python program from another python program (with certain requirements)

Let's say I have two python scripts A.py and B.py. I'm looking for a way to run B from within A in such a way that:
B believes it is __main__ (so that code in an if __name__=="__main__" block in B will run)
B is not actually __main__ (so that it does not, e.g., overwrite the "__main__" entry in sys.modules)
Exceptions raised within B propagate to A (i.e., could be caught with an except clause in A).
Those exceptions, if not caught, generate a correct traceback referencing line numbers within B.
I've tried various techniques, but none seem to satisfy all my requirements.
using tools from the subprocess module means exceptions in B do not propagate to A.
execfile("B.py", {}) runs B, but it doesn't think it's main.
execfile("B.py", {'__name__': '__main__'}) makes B.py think it's main, but it also seems to screw up the exception traceback printing, so that the tracebacks refer to lines within A (i.e., the real __main__).
using imp.load_source with __main__ as the name almost works, except that it actually modifies sys.modules, thus stomping on the existing value of __main__
Is there any way to get what I want?
(The reason I'm doing this is because I'm doing some cleanup on an existing library. This library has no real test suite, just a set of "example" scripts that produce certain output. I'm trying to leverage these as tests to ensure that my cleanup doesn't affect the library's ability to execute these examples, so I want to run each example script from within my test suite. I'd like to be able to see exceptions from these scripts within the test script so the test script can report the type of failure, instead of just reporting a generic SubprocessError whenever an example script raises some exception.)

Your use case makes sense, but I still think you'd be better off refactoring the tests such that they can be run externally.
Do you test scripts have something like this?
def test():
pass
if __name__ == '__main__':
test()
If not, perhaps you should convert your tests to calling a function such as test. Then, from your main test script, you can just:
import test1
test1.test()
import test2
test2.test()
Provide a common interface to running tests, that the tests themselves use. Having a big block of code in a __main__ check is Not A Good Thing.
Sorry that I didn't answer the question you asked, but I feel this is the correct solution without deviating too far from the original test code.

Answering my own question because the result is kind of interesting and might be useful to others:
It turns out I was wrong: execfile("B.py", {'__name__': '__main__'} is the way to go after all. It does correctly produce the tracebacks. What I was seeing with incorrect line numbers weren't exceptions but warnings. These warnings were produced using warnings.warn("blah", stacklevel=2). The stacklevel=2 argument is supposed to allow for things like deprecation warnings to be raised where the deprecated thing is used, rather than at the warning call (see the documentation).
However, it seems that the execfile-d file doesn't count as a "stack level" for this purpose, and is invisible for stacklevel purposes. So if code at the top level of an execfile-d module causes a warning with stacklevel 2, the warning is not raised at the right line number in the execfile-d source file; instead it is raised at the corresponding line number of the file which is running the execfile.
This is unfortunate, but I can live with it, since these are only warnings, so they won't impact the actual performance of the tests. (I didn't notice at first that it was only the warnings that were affected by the line-number mismatches, because there were lots of warnings and exceptions intermixed in the test output.)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.