I'm learning PySpark, and I have a function:
import re
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def function_1(string):
    new_string = re.sub(r"!", " ", string)
    return new_string

udf_function_1 = udf(lambda s: function_1(s), StringType())

def function_2(data):
    new_data = data \
        .withColumn("column_1", udf_function_1("column_1"))
    return new_data
My question is: how do I write a unit test for function_2() in Python?
What exactly do you want to test in function_2?
Below is a simple test saved in a file called sample_test.py. I used pytest, but you can write very similar code with unittest.
# sample_test.py
from pyspark import sql

spark = sql.SparkSession.builder \
    .appName("local-spark-session") \
    .getOrCreate()

def test_create_session():
    assert isinstance(spark, sql.SparkSession)
    assert spark.sparkContext.appName == 'local-spark-session'

def test_spark_version():
    assert spark.version == '3.1.2'
Running the test:
C:\Users\user\Desktop>pytest -v sample_test.py
============================================= test session starts =============================================
platform win32 -- Python 3.6.7, pytest-6.2.5, py-1.10.0, pluggy-1.0.0 -- c:\users\user\appdata\local\programs\python\python36\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\user\Desktop
collected 2 items
sample_test.py::test_create_session PASSED [ 50%]
sample_test.py::test_spark_version PASSED [100%]
============================================== 2 passed in 4.81s ==============================================
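To exercise function_2 itself, a test can build a small input DataFrame, run it through the function, and compare the collected values. Here is a sketch, assuming the code under test is importable as my_module (adjust the import to your project layout):
# sample_test.py (continued) -- sketch; `my_module` is a placeholder name
from my_module import function_2

def test_function_2_replaces_exclamation_marks():
    data = spark.createDataFrame([("hello!world",), ("no bang",)], ["column_1"])
    result = function_2(data)
    values = [row["column_1"] for row in result.collect()]
    # sort both sides, since row order is not guaranteed after collect()
    assert sorted(values) == sorted(["hello world", "no bang"])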
I wrote the following code.
https://gitlab.com/ksaito11/click-test
$ cat commands/cmd.py
import click
from commands.hello import hello

def print_version(ctx, param, value):
    if not value or ctx.resilient_parsing:
        return
    click.echo('Version 1.0')
    ctx.exit()

@click.group()
@click.option('--opt1')
@click.option('--version', is_flag=True, callback=print_version,
              expose_value=False, is_eager=True)
@click.pass_context
def cmd(ctx, **kwargs):
    ctx.obj = kwargs

def main():
    cmd.add_command(hello)
    cmd(auto_envvar_prefix='HELLOCLI')

if __name__ == '__main__':
    main()
$ cat commands/hello.py
import click

@click.command()
def hello():
    click.echo('Hello World!')
The code works correctly.
$ export PYTHONPATH=.
$ python commands/cmd.py
Usage: cmd.py [OPTIONS] COMMAND [ARGS]...

Options:
  --opt1 TEXT
  --version
  --help       Show this message and exit.

Commands:
  hello
$ python commands/cmd.py --version
Version 1.0
$ python commands/cmd.py hello
Hello World!
I wrote the following test case.
$ cat tests/test_cmd.py
from click.testing import CliRunner
import click
import pytest
from commands.cmd import cmd, main
from commands.hello import hello
def test_version():
    runner = CliRunner()
    result = runner.invoke(cmd, ["--version"])
    assert result.exit_code == 0

def test_help():
    runner = CliRunner()
    result = runner.invoke(cmd)
    assert result.exit_code == 0

def test_hello():
    runner = CliRunner()
    result = runner.invoke(hello)
    assert result.exit_code == 0
I measured the coverage with the following command.
$ pytest --cov-branch --cov=commands
================================================================ test session starts ================================================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/ksaito/ghq/gitlab.com/ksaito11/click-test
plugins: cov-3.0.0
collected 3 items
tests/test_cmd.py ... [100%]
----------- coverage: platform linux, python 3.9.9-final-0 -----------
Name                   Stmts   Miss Branch BrPart  Cover
--------------------------------------------------------
commands/__init__.py       0      0      0      0   100%
commands/cmd.py           18      5      4      2    68%
commands/hello.py          4      0      0      0   100%
--------------------------------------------------------
TOTAL                     22      5      4      2    73%
================================================================= 3 passed in 0.15s =================================================================
I didn't know how to write tests for the part below, so I couldn't get to 100% coverage.
def cmd(ctx, **kwargs):
    ctx.obj = kwargs

def main():
    cmd.add_command(hello)
    cmd(auto_envvar_prefix='HELLOCLI')
The code below may not be needed when using "@click.group", but I couldn't tell for sure.
def print_version(ctx, param, value):
    if not value or ctx.resilient_parsing:
        return
Please give me advice.
Adding the following settings excludes the code that does not need to be counted in coverage. Note that the exclude_lines entries are regular expressions, which is why the dots in "if __name__ == .__main__.:" stand in for the quote characters.
$ cat .coveragerc
[run]
branch = True

[report]
exclude_lines =
    # Don't complain if non-runnable code isn't run:
    if 0:
    if __name__ == .__main__.:
    def main
    ctx.obj = kwargs
I deleted the code below because I thought it was unnecessary.
    if not value or ctx.resilient_parsing:
        return
The coverage is now 100%.
$ pytest --cov-branch --cov=commands
================================================================ test session starts ================================================================
platform linux -- Python 3.9.9, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
rootdir: /home/ksaito/ghq/gitlab.com/ksaito11/click-test
plugins: cov-3.0.0
collected 3 items
tests/test_cmd.py ... [100%]
----------- coverage: platform linux, python 3.9.9-final-0 -----------
Name Stmts Miss Branch BrPart Cover
--------------------------------------------------------
commands/__init__.py 0 0 0 0 100%
commands/cmd.py 10 0 0 0 100%
commands/hello.py 4 0 0 0 100%
--------------------------------------------------------
TOTAL 14 0 0 0 100%
================================================================= 3 passed in 0.22s =================================================================
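As a side note (my addition, not part of the original answer): coverage.py also honors an inline "# pragma: no cover" comment, so a single function can be excluded right where it is defined instead of via .coveragerc patterns:
# commands/cmd.py -- sketch: excluding main() inline; a pragma on the
# def line excludes the whole function body from coverage
def main():  # pragma: no cover
    cmd.add_command(hello)
    cmd(auto_envvar_prefix='HELLOCLI')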
I'm using pytest for my Python code testing. Since I use googletest for my C++ code testing, I like googletest's output format.
I'm wondering: is it possible to make pytest's output look like googletest's? pytest's output lines are too long, while googletest's are short:
// pytest example:
(base) zz#home% pytest test_rle_v2.py
================================================================================== test session starts ===================================================================================
platform linux -- Python 3.8.1, pytest-6.0.1, py-1.9.0, pluggy-0.13.1
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/home/zz/work/test/learn-hp/.hypothesis/examples')
rootdir: /home/zz/work/test/learn-hp
plugins: env-0.6.2, hypothesis-4.38.0
collected 1 item
test_rle_v2.py . [100%]
=================================================================================== 1 passed in 0.46s ====================================================================================
// googletest example
(base) zz#home% ./test_version
[==========] Running 5 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 5 tests from VERSION
[ RUN      ] VERSION.str
[       OK ] VERSION.str (0 ms)
[ RUN      ] VERSION.parts
[       OK ] VERSION.parts (0 ms)
[ RUN      ] VERSION.metadata
[       OK ] VERSION.metadata (1 ms)
[ RUN      ] VERSION.atLeast
[       OK ] VERSION.atLeast (0 ms)
[ RUN      ] VERSION.hasFeature
[       OK ] VERSION.hasFeature (0 ms)
[----------] 5 tests from VERSION (1 ms total)
[----------] Global test environment tear-down
[==========] 5 tests from 1 test suite ran. (1 ms total)
[  PASSED  ] 5 tests.
After several hours of searching and experimenting, I found that a conftest.py file was what I needed. In conftest.py, you can override default pytest behavior by providing hooks.
The following is a work-in-progress example:
# conftest.py
import os

def pytest_runtest_call(item):
    item.add_report_section("call", "custom", " [ Run      ] " + str(item))

def pytest_report_teststatus(report, config):
    # print(">>> outcome:", report.outcome)
    if report.when == 'call':
        # line = f' [ Run      ] {report.nodeid}'
        # report.sections.append(('ChrisZZ', line))
        if report.outcome == 'failed':
            line = f' [  FAILED  ] {report.nodeid}'
            report.sections.append(('failed due to', line))
    if report.when == 'teardown':
        if report.outcome == 'passed':
            line = f' [       OK ] {report.nodeid}'
            report.sections.append(('ChrisZZ', line))

def pytest_terminal_summary(terminalreporter, exitstatus, config):
    reports = terminalreporter.getreports('')
    content = os.linesep.join(text for report in reports for secname, text in report.sections)
    if content:
        terminalreporter.ensure_newline()
        # terminalreporter.section('', sep=' ', green=True, bold=True)
        # terminalreporter.section('My custom section2', sep='------]', green=True, bold=True, fullwidth=None)
        terminalreporter.line(content)
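A smaller, related trick (my addition, relying on pytest's documented pytest_report_teststatus hook contract): the hook may return a (category, short letter, verbose word) triple, which replaces the PASSED/FAILED words themselves when running with -v. A minimal sketch:
# conftest.py -- sketch: googletest-style status words under `pytest -v`
def pytest_report_teststatus(report, config):
    if report.when == "call":
        if report.passed:
            return report.outcome, ".", "[       OK ]"
        if report.failed:
            return report.outcome, "F", "[  FAILED  ]"
    # returning None (implicitly) leaves other phases to the default handling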
Is there an option to list the deselected tests in the CLI output, along with the mark that triggered their deselection?
I know that in suites with many tests this would not be good as a default, but it would be a useful option in something like API testing, where the tests are likely to be more limited.
The numeric summary
collected 21 items / 16 deselected / 5 selected
is helpful, but it is not enough when trying to organize marks and see what happened in a CI build.
pytest has a hookspec pytest_deselected for accessing the deselected tests. Example: add this code to conftest.py in your test root dir:
def pytest_deselected(items):
    if not items:
        return
    config = items[0].session.config
    reporter = config.pluginmanager.getplugin("terminalreporter")
    reporter.ensure_newline()
    for item in items:
        reporter.line(f"deselected: {item.nodeid}", yellow=True, bold=True)
Running the tests now will give you an output similar to this:
$ pytest -vv
...
plugins: cov-2.8.1, asyncio-0.10.0
collecting ...
deselected: test_spam.py::test_spam
deselected: test_spam.py::test_bacon
deselected: test_spam.py::test_ham
collected 4 items / 3 deselected / 1 selected
...
If you want a report in another format, simply store the deselected items in the config and use them for the desired output somewhere else, e.g. pytest_terminal_summary:
# conftest.py
import os

def pytest_deselected(items):
    if not items:
        return
    config = items[0].session.config
    config.deselected = items

def pytest_terminal_summary(terminalreporter, exitstatus, config):
    deselected = getattr(config, "deselected", [])
    if deselected:
        terminalreporter.ensure_newline()
        terminalreporter.section('Deselected tests', sep='-', yellow=True, bold=True)
        content = os.linesep.join(item.nodeid for item in deselected)
        terminalreporter.line(content)
gives the output:
$ pytest -vv
...
plugins: cov-2.8.1, asyncio-0.10.0
collected 4 items / 3 deselected / 1 selected
...
---------------------------------------- Deselected tests -----------------------------------------
test_spam.py::test_spam
test_spam.py::test_bacon
test_spam.py::test_ham
================================= 1 passed, 3 deselected in 0.01s =================================
In pytest, when a test case fails, the report shows the following sections:
Failure details
Captured stdout call
Captured stderr call
Captured log call
I would like to add some additional custom sections: I have a server running in parallel, and I would like to display the information logged by that server in a dedicated section.
How could I do that (if it is possible at all)?
Thanks
NOTE:
I have found the following in the source code, but I don't know whether it is the right approach:
nodes.py
class Item(Node):
    ...
    def add_report_section(self, when, key, content):
        """
        Adds a new report section, similar to what's done internally
        to add stdout and stderr captured output::
        ...
        """

reports.py
class BaseReport:
    ...
    @property
    def caplog(self):
        """Return captured log lines, if log capturing is enabled

        .. versionadded:: 3.5
        """
        return "\n".join(
            content for (prefix, content) in self.get_sections("Captured log")
        )
To add custom sections to the terminal output, you need to append to the report.sections list. This can be done directly in a pytest_report_teststatus hookimpl, or indirectly in other hooks (via a hookwrapper); the actual implementation heavily depends on your particular use case. Example:
# conftest.py
import os
import random

def pytest_report_teststatus(report, config):
    messages = (
        'Egg and bacon',
        'Egg, sausage and bacon',
        'Egg and Spam',
        'Egg, bacon and Spam'
    )
    if report.when == 'teardown':
        line = f'{report.nodeid} says:\t"{random.choice(messages)}"'
        report.sections.append(('My custom section', line))

def pytest_terminal_summary(terminalreporter, exitstatus, config):
    reports = terminalreporter.getreports('')
    content = os.linesep.join(text for report in reports for secname, text in report.sections)
    if content:
        terminalreporter.ensure_newline()
        terminalreporter.section('My custom section', sep='-', blue=True, bold=True)
        terminalreporter.line(content)
Example tests:
def test_spam():
    assert True

def test_eggs():
    assert True

def test_bacon():
    assert False
When running the tests, you should see a My custom section header at the bottom, colored blue and containing a message for every test:
collected 3 items
test_spam.py::test_spam PASSED
test_spam.py::test_eggs PASSED
test_spam.py::test_bacon FAILED
============================================= FAILURES =============================================
____________________________________________ test_bacon ____________________________________________
def test_bacon():
> assert False
E assert False
test_spam.py:9: AssertionError
---------------------------------------- My custom section -----------------------------------------
test_spam.py::test_spam says: "Egg, bacon and Spam"
test_spam.py::test_eggs says: "Egg and Spam"
test_spam.py::test_bacon says: "Egg, sausage and bacon"
================================ 1 failed, 2 passed in 0.07 seconds ================================
The other answer shows how to add a custom section to the terminal report summary, but it's not the best way to add a custom section per test.
For that goal, you can (and should) use the higher-level API add_report_section of an Item node (docs). A minimalist example is shown below; modify it to suit your needs. You can pass state from the test instance through an item node, if necessary (a sketch of this appears at the end of this answer).
In test_something.py, here are one passing test and two failing ones:
def test_good():
    assert 2 + 2 == 4

def test_bad():
    assert 2 + 2 == 5

def test_ugly():
    errorerror
In conftest.py, set up a hook wrapper:
import pytest

content = iter(["first", "second", "third"])

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    outcome = yield
    item.add_report_section("call", "custom", next(content))
The report will now display custom sections per-test:
$ pytest
============================== test session starts ===============================
platform linux -- Python 3.9.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: /tmp/example
collected 3 items
test_something.py .FF [100%]
==================================== FAILURES ====================================
____________________________________ test_bad ____________________________________
def test_bad():
> assert 2 + 2 == 5
E assert (2 + 2) == 5
test_something.py:5: AssertionError
------------------------------ Captured custom call ------------------------------
second
___________________________________ test_ugly ____________________________________
def test_ugly():
> errorerror
E NameError: name 'errorerror' is not defined
test_something.py:8: NameError
------------------------------ Captured custom call ------------------------------
third
============================ short test summary info =============================
FAILED test_something.py::test_bad - assert (2 + 2) == 5
FAILED test_something.py::test_ugly - NameError: name 'errorerror' is not defined
========================== 2 failed, 1 passed in 0.02s ===========================
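As for passing state from a test into its section: one approach (my own sketch, not from the original answer; the fixture name record_section and the attribute _section_text are hypothetical) is to stash text on the item node from a fixture and read it back in the hook wrapper:
# conftest.py
import pytest

@pytest.fixture
def record_section(request):
    # hand the test a callable that stashes text on its own item node
    def _record(text):
        request.node._section_text = text
    return _record

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    yield  # let the test body run first
    text = getattr(item, "_section_text", None)
    if text is not None:
        item.add_report_section("call", "custom", text)

# test_something.py -- usage of the hypothetical fixture
def test_good(record_section):
    record_section("computed during the test")
    assert 2 + 2 == 4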
Is there a way to parametrize a test when the test data for it is a list of multiple values?
example_test_data.json (test_one is the case where the test needs to be parametrized):
{
    "test_one": [1, 2, 3],
    "test_two": "split",
    "test_three": {"three": 3},
    "test_four": {"four": 4},
    "test_set_comparison": "1234"
}
Directory structure:
main --
    conftest.py  # conftest file for my fixtures
    testcases
        project_1
            (contains these files -- test_suite_1.py, config.json)
        project_2
            (contains these files -- test_suite_2.py, config.json)
    workflows
    libs
Using the code below in conftest.py at the top directory level, I am able to map the test data from the JSON file to a particular test case.
@pytest.yield_fixture(scope="class", autouse=True)
def test_config(request):
    f = pathlib.Path(request.node.fspath.strpath)
    print("File : %s" % f)
    config = f.with_name("config.json")
    print("Config json file : %s" % config)
    with config.open() as fd:
        testdata = json.loads(fd.read())
    print("test data :", testdata)
    yield testdata

@pytest.yield_fixture(scope="function", autouse=True)
def config_data(request, test_config):
    testdata = test_config
    test = request.function.__name__
    print("Class Name : %s" % request.cls.__name__)
    print("Testcase Name : %s" % test)
    if test in testdata:
        test_args = testdata[test]
        yield test_args
    else:
        yield {}
In my case:
@pytest.yield_fixture(scope="function", autouse=True)
def config_data(request, test_config):
    testdata = test_config
    test = request.function.__name__
    print("Class Name : %s" % request.cls.__name__)
    print("Testcase Name : %s" % test)
    if test in testdata:
        test_args = testdata[test]
        if isinstance(test_args, list):
            # How to parametrize the test?
            # yield test_args
            ...
    else:
        yield {}
I would handle the special parametrization case in pytest_generate_tests hook:
# conftest.py
import json
import pathlib
import pytest

@pytest.fixture(scope="class")
def test_config(request):
    f = pathlib.Path(request.node.fspath.strpath)
    config = f.with_name("config.json")
    with config.open() as fd:
        testdata = json.loads(fd.read())
    yield testdata

@pytest.fixture(scope="function")
def config_data(request, test_config):
    testdata = test_config
    test = request.function.__name__
    if test in testdata:
        test_args = testdata[test]
        yield test_args
    else:
        yield {}

def pytest_generate_tests(metafunc):
    if 'config_data' not in metafunc.fixturenames:
        return
    config = pathlib.Path(metafunc.module.__file__).with_name('config.json')
    testdata = json.loads(config.read_text())
    param = testdata.get(metafunc.function.__name__, None)
    if isinstance(param, list):
        metafunc.parametrize('config_data', param)
Some notes: yield_fixture is deprecated, so I replaced it with a plain fixture. Also, you don't need autouse=True in fixtures that return values; the tests request them explicitly anyway.
Example tests and configs I used:
# testcases/project_1/config.json
{
    "test_one": [1, 2, 3],
    "test_two": "split"
}

# testcases/project_1/test_suite_1.py
def test_one(config_data):
    assert config_data >= 0

def test_two(config_data):
    assert config_data == 'split'

# testcases/project_2/config.json
{
    "test_three": {"three": 3},
    "test_four": {"four": 4}
}

# testcases/project_2/test_suite_2.py
def test_three(config_data):
    assert config_data['three'] == 3

def test_four(config_data):
    assert config_data['four'] == 4
Running the tests yields:
$ pytest -vs
============================== test session starts ================================
platform linux -- Python 3.6.5, pytest-3.4.1, py-1.5.3, pluggy-0.6.0 -- /data/gentoo64/usr/bin/python3.6
cachedir: .pytest_cache
rootdir: /data/gentoo64/home/u0_a82/projects/stackoverflow/so-50815777, inifile:
plugins: mock-1.6.3, cov-2.5.1
collected 6 items
testcases/project_1/test_suite_1.py::test_one[1] PASSED
testcases/project_1/test_suite_1.py::test_one[2] PASSED
testcases/project_1/test_suite_1.py::test_one[3] PASSED
testcases/project_1/test_suite_1.py::test_two PASSED
testcases/project_2/test_suite_2.py::test_three PASSED
testcases/project_2/test_suite_2.py::test_four PASSED
============================ 6 passed in 0.12 seconds =============================
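For comparison (my addition, not from the original answer): when the parameter list is known statically rather than read from a per-directory config.json, plain @pytest.mark.parametrize produces the same per-value test IDs; the name test_one_static and the values are illustrative:
# a static equivalent of the JSON-driven parametrization above
import pytest

@pytest.mark.parametrize("value", [1, 2, 3])
def test_one_static(value):
    assert value >= 0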