How do I use pytest with Bazel?

I have a my_module.py file that implements my_module and a file test_my_module.py that does import my_module and runs some tests written with pytest on it.
Normally I run the tests by cd-ing into the directory that contains these two files and then running
pytest
Now I want to use Bazel. I've added my_module.py as a py_binary but I don't know what the right way to invoke my tests is.

If you want a reusable setup that doesn't require adding a call to pytest at the end of every Python test file, you can create a py_test target whose main is a small Python file that wraps the call to pytest and forwards all arguments, and then wrap that py_test in a macro. I explain the detailed solution in Experimentations on Bazel: Python (3), linter & pytest, with a link to the source code.
Create the Python tool (a wrapper around the call to pytest, or pylint only) in tools/pytest/pytest_wrapper.py:
import sys

import pytest

# if using 'bazel test ...'
if __name__ == "__main__":
    sys.exit(pytest.main(sys.argv[1:]))
Create the macro in tools/pytest/defs.bzl:

"""Wrap pytest"""

load("@rules_python//python:defs.bzl", "py_test")
load("@my_python_deps//:requirements.bzl", "requirement")

def pytest_test(name, srcs, deps = [], args = [], data = [], **kwargs):
    """
    Call pytest
    """
    py_test(
        name = name,
        srcs = [
            "//tools/pytest:pytest_wrapper.py",
        ] + srcs,
        main = "//tools/pytest:pytest_wrapper.py",
        args = [
            "--capture=no",
            "--black",
            "--pylint",
            "--pylint-rcfile=$(location //tools/pytest:.pylintrc)",
            # "--mypy",
        ] + args + ["$(location :%s)" % x for x in srcs],
        python_version = "PY3",
        srcs_version = "PY3",
        deps = deps + [
            requirement("pytest"),
            requirement("pytest-black"),
            requirement("pytest-pylint"),
            # requirement("pytest-mypy"),
        ],
        data = [
            "//tools/pytest:.pylintrc",
        ] + data,
        **kwargs
    )
Expose some resources from tools/pytest/BUILD.bazel:

exports_files([
    "pytest_wrapper.py",
    ".pylintrc",
])
Call it from your package's BUILD.bazel:

load("//tools/pytest:defs.bzl", "pytest_test")

...

pytest_test(
    name = "test",
    srcs = glob(["*.py"]),
    deps = [
        ...
    ],
)
Then, calling bazel test //... means that pylint, pytest and black are all part of the test flow.

Add the following code to test_my_module.py and mark the test script as a py_test instead of py_binary in your BUILD file:
if __name__ == "__main__":
    import pytest

    raise SystemExit(pytest.main([__file__]))
Then you can run your tests with bazel test test_my_module
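For reference, the BUILD file might then look roughly like this (a minimal sketch, assuming rules_python and a pip-managed pytest dependency; the @my_deps repository name is illustrative, and the module under test is exposed as a py_library rather than the OP's py_binary):

load("@rules_python//python:defs.bzl", "py_library", "py_test")
load("@my_deps//:requirements.bzl", "requirement")  # hypothetical pip repo name

py_library(
    name = "my_module",
    srcs = ["my_module.py"],
)

py_test(
    name = "test_my_module",
    srcs = ["test_my_module.py"],
    deps = [
        ":my_module",
        requirement("pytest"),
    ],
)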

Following on from @David Bernard, who wrote his answer up in an awesome series of blog posts BTW, there is a curve-ball there with pytest + Bazel + Windows...
Long story short, you'll need to add legacy_create_init = 0 to the py_test rule call.
This works around a "feature" where Bazel creates __init__.py files in the sandbox even when none were present in your repo: https://github.com/bazelbuild/rules_python/issues/55
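In terms of the pytest_test macro above (which forwards **kwargs to py_test), that would look something like this sketch:

pytest_test(
    name = "test",
    srcs = glob(["*.py"]),
    # work around rules_python issue #55: don't auto-create
    # __init__.py files in the sandbox
    legacy_create_init = 0,
)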

It seems a bunch of the suggestions here have been packaged up into https://github.com/caseyduquettesc/rules_python_pytest now.
load("#rules_python_pytest//python_pytest:defs.bzl", "py_pytest_test")
py_pytest_test(
name = "test_w_pytest",
size = "small",
srcs = ["test.py"],
deps = [
# TODO Add this for the user
requirement("pytest"),
],
)
Edit: I'm the author of the above repository

Related

Kedro cannot find run

As a part of upgrading Kedro from 0.16.2 to 0.17.3 in our organization, I've made changes to the Kedro-related files in our codebase based on the Kedro starter pyspark-iris on 0.17.3.
Now I get Error: No such command 'run' when executing kedro run.
setup.py
from setuptools import find_packages, setup

entry_point = "kedro-project = kedro-package.__main__:main"

# get the dependencies and installs
with open("requirements.txt", "r", encoding="utf-8") as f:
    # Make sure we strip all comments and options (e.g "--extra-index-url")
    # that arise from a modified pip.conf file that configure global options
    # when running kedro build-reqs
    requires = []
    for line in f:
        req = line.split("#", 1)[0].strip()
        if req and not req.startswith("--"):
            requires.append(req)

setup(
    name="kedro-package",
    version="0.1",
    packages=find_packages(exclude=["tests"]),
    entry_points={"console_scripts": [entry_point]},
    install_requires=requires,
    extras_require={
        "docs": [
            "sphinx~=3.4.3",
            "sphinx_rtd_theme==0.5.1",
            "nbsphinx==0.8.1",
            "nbstripout==0.3.3",
            "recommonmark==0.7.1",
            "sphinx-autodoc-typehints==1.11.1",
            "sphinx_copybutton==0.3.1",
            "jupyter_client>=5.1.0, <6.0",
            "tornado>=4.2, <6.0",
            "ipykernel~=5.3",
        ]
    },
)
main.py
from pathlib import Path
from kedro.framework.project import configure_project
import logging

from .cli import run

def main():
    package_name = str(Path(__file__).resolve().parent.name)
    logging.getLogger(__name__).info(f"package name is: {package_name}")
    configure_project(package_name=package_name)
    run()

if __name__ == "__main__":
    main()
cli.py is at the same level as main.py, both directly inside the package (renamed to kedro-package here for anonymity).
This only happens when performing kedro run on EMR. When we run locally we don't see that error; rather, it errors out because it can't connect to S3, which is expected.
Additionally, I've tried running

How to modify pytest arguments?

I found out that for this purpose I can use the pytest hook pytest_load_initial_conftests():
https://docs.pytest.org/en/latest/example/simple.html#dynamically-adding-command-line-options
But I can't get this example (see link) to work.
pytest_load_initial_conftests() is never even called (I checked in the debugger).
The tests run as usual without any extra parameters (single-threaded), but I expected the -n option to take effect.
I have pytest and xdist installed.
There are only two files in the project, and there is no pytest.ini.
What am I doing wrong? Please help me get it running.
conftest.py
import pytest
import os
import sys


def pytest_addoption(parser):
    parser.addoption('--some_param', action='store', help='some_param', default='')


def pytest_configure(config):
    some_param = config.getoption('--some_param')


def pytest_load_initial_conftests(args):
    if "xdist" in sys.modules:
        import multiprocessing
        num = max(multiprocessing.cpu_count() / 2, 1)
        args[:] = ["-n", str(num)] + args
test_t1.py
import inspect
from time import sleep
import os
import pytest


class Test_Run:
    def test_1(self):
        body()

    def test_2(self):
        body()

    def test_3(self):
        body()

    def test_4(self):
        body()

    def setup(self):
        pass

    def teardown(self):
        pass


def body():
    sleep(5)
According to the docs on pytest_load_initial_conftests:
Note: This hook will not be called for conftest.py files, only for setuptools plugins.
https://docs.pytest.org/en/latest/reference/reference.html#pytest.hookspec.pytest_load_initial_conftests
So it arguably shouldn't appear on the example page you found.
Edit: updated docs URL
Add extra plugin to make pytest arguments dynamic
As per the API documentation, the pytest_load_initial_conftests hook is not called for conftest.py files and can be used in plugins only.
Further, the pytest documentation describes how to write a custom plugin and make it installable.
Following this, create the following files in the root directory:
- ./setup.py
- ./plugin.py
- ./tests/conftest.py
- ./pyproject.toml
# contents of ./setup.py
from setuptools import setup

setup(
    name='my_project',
    version='0.0.1',
    entry_points={
        'console_scripts': [
        ],  # feel free to add if you have any
        "pytest11": ["custom_args = plugin"]
    },
    classifiers=["Framework :: Pytest"],
)
Notice "pytest11" here; as I have read, this entry-point group name is reserved for pytest plugins.
# contents of ./plugin.py
import sys


def pytest_load_initial_conftests(args):
    if "xdist" in sys.modules:
        import multiprocessing
        # integer division so "-n" gets a whole number of workers
        num = max(multiprocessing.cpu_count() // 2, 1)
        args[:] = ["-n", str(num)] + args
# contents of ./tests/conftest.py
pytest_plugins = ["custom_args"]  # loads the plugin when running tests
# ... other fixtures and hooks
Finally, the pyproject.toml file for the project:
# contents of ./pyproject.toml
[tool.setuptools]
py-modules = []

[build-system]
requires = [
    "setuptools",
]
build-backend = "setuptools.build_meta"

[project]
name = "my_package"
description = "My package description"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Framework :: Flask",
    "Programming Language :: Python :: 3",
]
dynamic = ["version"]
This will dynamically add the -n argument with a value based on the number of CPUs your system has, which enables parallel runs.
Hope this helps, feel free to comment.

How do you properly integrate unit tests for file parsing with pytest?

I'm trying to test file parsing with pytest. I have a directory tree that looks something like this for my project:
project/
    cool_code.py
    setup.py
    setup.cfg
    test/
        test_read_files.py
        test_files/
            data_file1.txt
            data_file2.txt
My setup.py file looks something like this:
from setuptools import setup

setup(
    name = 'project',
    description = 'The coolest project ever!',
    setup_requires = ['pytest-runner'],
    tests_require = ['pytest'],
)
My setup.cfg file looks something like this:
[aliases]
test=pytest
I've written several unit tests with pytest to verify that files are properly read. They work fine when I run pytest from within the "test" directory. However, if I execute any of the following from my project directory, the tests fail because they cannot find data files in test_files:
>> py.test
>> python setup.py pytest
The test seems to be sensitive to the directory from which pytest is executed.
How can I get pytest unit tests to discover the files in "data_files" for parsing when I call it from either the test directory or the project root directory?
One solution is to define a rootdir fixture with the path to the test directory, and reference all data files relative to this. This can be done by creating a test/conftest.py (if not already created) with some code like this:
import os
import pytest


@pytest.fixture
def rootdir():
    return os.path.dirname(os.path.abspath(__file__))
Then use os.path.join in your tests to get absolute paths to test files:
import os


def test_read_favorite_color(rootdir):
    test_file = os.path.join(rootdir, 'test_files/favorite_color.csv')
    data = read_favorite_color(test_file)
    # ...
One solution is to try multiple paths to find the files.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from coolprogram import *
import os


def test_file_locations():
    """Possible locations where test data could be found."""
    return ['./test_files',
            './tests/test_files',
            ]


def find_file(filename):
    """Searches for a data file to use in tests."""
    for location in test_file_locations():
        filepath = os.path.join(location, filename)
        if os.path.exists(filepath):
            return filepath
    raise IOError('Could not find test file.')


def test_read_favorite_color():
    """Test that favorite color is read properly."""
    filename = 'favorite_color.csv'
    test_file = find_file(filename)
    data = read_favorite_color(test_file)
    assert data['first_name'][1] == 'King'
    assert data['last_name'][1] == 'Arthur'
    assert data['correct_answers'][1] == 2
    assert data['cross_bridge'][1] == True
    assert data['favorite_color'][1] == 'green'
One way is to pass a dictionary mapping command names to custom command classes via the cmdclass argument of the setup function, as sketched below.
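For example (a minimal sketch; the PytestCommand class and the pytest command name are illustrative, not from the OP's project):

import os
import sys

from setuptools import Command, setup


class PytestCommand(Command):
    """Run pytest from the directory containing setup.py."""
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        import pytest
        # chdir to the project root so test/test_files paths resolve consistently
        os.chdir(os.path.dirname(os.path.abspath(__file__)))
        sys.exit(pytest.main(['test']))


setup(
    name='project',
    # ...
    cmdclass={'pytest': PytestCommand},
)

Invoked as python setup.py pytest, the tests then always run from the project root.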
Another way is the one described here; I've posted it below for quick reference.
pytest-runner will install itself on every invocation of setup.py. In some cases, this causes delays for invocations of setup.py that will never invoke pytest-runner. To help avoid this contingency, consider requiring pytest-runner only when pytest is invoked:
import sys

needs_pytest = {'pytest', 'test', 'ptr'}.intersection(sys.argv)
pytest_runner = ['pytest-runner'] if needs_pytest else []

# ...

setup(
    # ...
    setup_requires=[
        # ... (other setup requirements)
    ] + pytest_runner,
)
Make sure all the data you read in your test module is addressed relative to the directory containing setup.py.
In the OP's case the data file path would be test/test_files/data_file1.txt.
I made a project with the same structure, read a data_file1.txt containing some text, and it works for me.
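A minimal sketch of what that looks like in the test module (paths relative to the project root, assuming the tests are launched from there):

# contents of test/test_read_files.py (sketch)
def test_data_file_contents():
    # path is relative to the setup.py directory, i.e. the project root
    with open('test/test_files/data_file1.txt') as f:
        assert f.read().strip() != ''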

py2app picking up .git subdir of a package during build

We use py2app extensively at our facility to produce self contained .app packages for easy internal deployment without dependency issues. Something I noticed recently, and have no idea how it began, is that when building an .app, py2app started including the .git directory of our main library.
commonLib, for instance, is our root python library package, which is a git repo. Under this package are the various subpackages such as database, utility, etc.
commonLib/
|- .git/          # because commonLib is a git repo
|- __init__.py
|- database/
|  |- __init__.py
|- utility/
|  |- __init__.py
# ... etc
In a given project, say Foo, we will do imports like from commonLib import xyz to use our common packages. Building via py2app looks something like: python setup.py py2app
So the recent issue I am seeing is that when building an app for project Foo, I will see it include everything in commonLib/.git/ in the app, which is extra bloat. py2app has an excludes option, but that only seems to be for Python modules. I can't quite figure out what it would take to exclude the .git subdir, or, in fact, what is causing it to be included in the first place.
Has anyone experienced this when using a python package import that is a git repo?
Nothing has changed in our setup.py files for each project, and commonLib has always been a git repo. So the only thing I can think of being a variable is the version of py2app and its deps which have obviously been upgraded over time.
Edit
I'm using the latest py2app 0.6.4 as of right now. Also, my setup.py was first generated from py2applet a while back, but has been hand configured since and copied over as a template for every new project. I am using PyQt4/sip for every single one of these projects, so it also makes me wonder if its an issue with one of the recipes?
Update
From the first answer, I tried to fix this using various combinations of exclude_package_data settings. Nothing seems to force the .git directory to become excluded. Here is a sample of what my setup.py files generally look like:
from setuptools import setup

from myApp import VERSION

appname = 'MyApp'

APP = ['myApp.py']
DATA_FILES = []
OPTIONS = {
    'includes': 'atexit, sip, PyQt4.QtCore, PyQt4.QtGui',
    'strip': True,
    'iconfile': 'ui/myApp.icns',
    'resources': ['src/myApp.png'],
    'plist': {
        'CFBundleIconFile': 'ui/myApp.icns',
        'CFBundleIdentifier': 'com.company.myApp',
        'CFBundleGetInfoString': appname,
        'CFBundleVersion': VERSION,
        'CFBundleShortVersionString': VERSION
    }
}

setup(
    app=APP,
    data_files=DATA_FILES,
    options={'py2app': OPTIONS},
    setup_requires=['py2app'],
)
I have tried things like:
setup(
    ...
    exclude_package_data = { 'commonLib': ['.git'] },
    # exclude_package_data = { '': ['.git'] },
    # exclude_package_data = { 'commonLib/.git/': ['*'] },
    # exclude_package_data = { '.git': ['*'] },
    ...
)
Update #2
I have posted my own answer, which monkeypatches distutils. It's ugly and not preferred, but until someone can offer me a better solution, I guess this is what I have.
I am adding an answer to my own question, to document the only thing I have found to work thus far. My approach was to monkeypatch distutils to ignore certain patterns when creating a directory or copying a file. This is really not what I wanted to do, but like I said, it's the only thing that works so far.
## setup.py ##
import re

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util


def wrapper(fn):
    def wrapped(src, *args, **kwargs):
        if not re.search(r'/\.git/?', src):
            fn(src, *args, **kwargs)
    return wrapped


file_util.copy_file = wrapper(file_util.copy_file)
dir_util.mkpath = wrapper(dir_util.mkpath)

# now import setuptools so it uses the monkeypatched methods
from setuptools import setup
Hopefully someone will comment on this and tell me a higher-level approach that avoids it. But as of now, this is what I have. I will probably wrap it into a utility method like exclude_data_patterns(re_pattern) to be reused in my projects, along the lines of the sketch below.
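Such a helper could look roughly like this (a sketch generalizing the monkeypatch above; exclude_data_patterns is the name floated in the previous paragraph, not an existing API):

import re

# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util


def exclude_data_patterns(*patterns):
    """Monkeypatch distutils to skip any path matching one of the patterns."""
    compiled = [re.compile(p) for p in patterns]

    def wrapper(fn):
        def wrapped(src, *args, **kwargs):
            if not any(c.search(src) for c in compiled):
                return fn(src, *args, **kwargs)
        return wrapped

    file_util.copy_file = wrapper(file_util.copy_file)
    dir_util.mkpath = wrapper(dir_util.mkpath)


# call before importing setuptools:
exclude_data_patterns(r'/\.git/?')
from setuptools import setup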
I can see two options for excluding the .git directory.
Build the application from a 'clean' checkout of the code. When deploying a new version, we always build from a fresh svn export based on a tag to ensure we don't pick up spurious changes/files. You could try the equivalent here - although the git equivalent seems somewhat more involved.
Modify the setup.py file to massage the files included in the application. This might be done using the exclude_package_data functionality as described in the docs, or by building the list of data_files and passing it to setup (see the sketch after this answer).
As for why it has suddenly started happening, knowing the version of py2app you are using might help, as will knowing the contents of your setup.py and perhaps how this was made (by hand or using py2applet).
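For the second option, building the data_files list by hand might look roughly like this (illustrative only; adjust the root directory and the destination mapping to your layout):

import os


def collect_data_files(root):
    """Walk root and build a distutils-style data_files list, skipping .git."""
    data_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != '.git']  # prune .git subtrees
        if filenames:
            data_files.append(
                (dirpath, [os.path.join(dirpath, f) for f in filenames]))
    return data_files


# then: setup(..., data_files=collect_data_files('commonLib'), ...)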
I have a similar experience with Pyinstaller, so I'm not sure it applies directly.
Pyinstaller creates a "manifest" of all files to be included in the distribution before running the export process. You could "massage" this manifest, as per Mark's second suggestion, to exclude any files you want, including anything within .git, or .git itself.
In the end, I stuck with checking out my code before producing a binary as there was more than just .git being bloat (such as UML documents and raw resource files for Qt). A checkout guaranteed a clean result and I experienced no issues automating that process along with the process of creating the installer for the binary.
There is a good answer to this already, but I have a more elaborate solution to the problem using a white-list approach. To make the monkey patch also work for packages outside site-packages.zip, I had to monkey patch copy_tree as well (because it imports copy_file inside its own function); this helps in making a standalone application.
In addition, I create a white-list recipe to mark certain packages zip-unsafe. This approach makes it easy to add filters other than the white-list.
import pkgutil
from os.path import join, dirname, realpath

from distutils import log
# file_util has to come first because dir_util uses it
from distutils import file_util, dir_util
# noinspection PyUnresolvedReferences
from py2app import util


def keep_only_filter(base_mod, sub_mods):
    prefix = join(realpath(dirname(base_mod.filename)), '')
    all_prefix = [join(prefix, sm) for sm in sub_mods]
    log.info("Set filter for prefix %s" % prefix)

    def wrapped(mod):
        name = getattr(mod, 'filename', None)
        if name is None:
            # ignore anything that does not have file name
            return True
        name = join(realpath(dirname(name)), '')
        if not name.startswith(prefix):
            # ignore those that are not in this prefix
            return True
        for p in all_prefix:
            if name.startswith(p):
                return True
        # log.info('ignoring %s' % name)
        return False

    return wrapped


# define all the filters we need
all_filts = {
    'mypackage': (keep_only_filter, [
        'subpackage1', 'subpackage2',
    ]),
}


def keep_only_wrapper(fn, is_dir=False):
    filts = [(f, k[1]) for (f, k) in all_filts.iteritems()
             if k[0] == keep_only_filter]
    prefixes = {}
    for f, sms in filts:
        pkg = pkgutil.get_loader(f)
        assert pkg, '{f} package not found'.format(f=f)
        p = join(pkg.filename, '')
        sp = [join(p, sm, '') for sm in sms]
        prefixes[p] = sp

    def wrapped(src, *args, **kwargs):
        name = src
        if not is_dir:
            name = dirname(src)
        name = join(realpath(name), '')
        keep = True
        for prefix, sub_prefixes in prefixes.iteritems():
            if name == prefix:
                # let the root pass
                continue
            # if it is a package we have a filter for
            if name.startswith(prefix):
                keep = False
                for sub_prefix in sub_prefixes:
                    if name.startswith(sub_prefix):
                        keep = True
                        break
        if keep:
            return fn(src, *args, **kwargs)
        return []

    return wrapped


file_util.copy_file = keep_only_wrapper(file_util.copy_file)
dir_util.mkpath = keep_only_wrapper(dir_util.mkpath, is_dir=True)
util.copy_tree = keep_only_wrapper(util.copy_tree, is_dir=True)


class ZipUnsafe(object):
    def __init__(self, _module, _filt):
        self.module = _module
        self.filt = _filt

    def check(self, dist, mf):
        m = mf.findNode(self.module)
        if m is None:
            return None
        # Do not put this package in site-packages.zip
        if self.filt:
            return dict(
                packages=[self.module],
                filters=[self.filt[0](m, self.filt[1])],
            )
        return dict(
            packages=[self.module]
        )


# Any package that is zip-unsafe (uses __file__ ,... ) should be added here
# noinspection PyUnresolvedReferences
import py2app.recipes

for module in [
    'sklearn', 'mypackage',
]:
    filt = all_filts.get(module)
    setattr(py2app.recipes, module, ZipUnsafe(module, filt))

Create different distribution types with setup.py

Given the following (demonstration) project layout:
MyProject/
    README
    LICENSE
    setup.py
    myproject/
        ...  # packages
    extrastuff/
        ...  # some extra data
How (and where) do I declare different distribution types? Especially I need these two options:
A distribution containing only the source
A distribution containing the source and all data files under extrastuff/
Ideally, how do I declare the two configurations above, where the second one extends the first?
I've implemented something like this before ... the sdist command can be extended to handle additional command line arguments and to manipulate the data files based on these. If you run python setup.py sdist --help, it'll include your custom command line arguments in the help, which is nice. Use the following recipe:
from distutils import log
from distutils.core import setup
from distutils.command.sdist import sdist


class CustomSdist(sdist):
    user_options = [
        ('packaging=', None, "Some option to indicate what should be packaged")
    ] + sdist.user_options

    def __init__(self, *args, **kwargs):
        sdist.__init__(self, *args, **kwargs)
        self.packaging = "default value for this option"

    def get_file_list(self):
        log.info("Chosen packaging option: {self.packaging}".format(self=self))
        # Change the data_files list here based on the packaging option
        self.distribution.data_files = [
            ('folder', ['file1', 'file2'])
        ]
        sdist.get_file_list(self)
if __name__ == "__main__":
    setup(
        name = "name",
        version = "version",
        author = "author",
        author_email = "author_email",
        url = "url",
        py_modules = [
            # ...
        ],
        packages = [
            # ...
        ],
        # data_files = default data files for commands other than sdist if you wish
        cmdclass = {
            'sdist': CustomSdist
        }
    )
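With a custom option like this, the sdist invocation becomes, e.g., python setup.py sdist --packaging=full (the option name and value here are illustrative), and python setup.py sdist --help will list the new option alongside the standard ones.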
You could extend setup.py to additionally include some custom command line parsing. You could then catch a custom argument and strip it out so that it won't affect setuptools.
You can access the command line argument in sys.argv. As for modifying the call to setuptools.setup(), I recommend creating a dictionary of arguments to pass, modifying the dictionary based on the command line arguments, and then calling setup() using the **dict notation, like so:
from setuptools import setup
import sys

basic = {'name': 'my program'}
extra = {'bonus': 'content'}

if '--extras' in sys.argv:
    basic.update(extra)
    sys.argv.remove('--extras')

setup(**basic)
For more thorough command line parsing you could also use the getopt module, or the newer argparse module if you're only targeting Python 2.7 and higher.
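For instance, argparse's parse_known_args can peel off your own flags and hand the rest back to setuptools (a sketch reusing the --extras flag from the example above):

import argparse
import sys

from setuptools import setup

parser = argparse.ArgumentParser(add_help=False)
parser.add_argument('--extras', action='store_true')
args, remaining = parser.parse_known_args(sys.argv[1:])
sys.argv = sys.argv[:1] + remaining  # leave only the arguments setuptools understands

kwargs = {'name': 'my program'}
if args.extras:
    kwargs.update({'bonus': 'content'})

setup(**kwargs)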
EDIT: I also found a section in the distutils documentation titled Creating a new Distutils command. That may also be a helpful resource.
