I am using pytest to test code that creates a prometheus Python client to export metrics.
Between test functions the Prometheus client does not reset (it keeps internal state), which screws up my tests.
I am looking for a way to essentially get a fresh Python runtime before each test function call. That way the internal state of the Prometheus client would hopefully be reset to what it was when the Python runtime first started executing my tests.
I already tried importlib.reload() but that does not work.
If you want to start each test with a "clean" Prometheus client, then I think the best approach is to move its creation and teardown into a fixture with function scope (which is actually the default scope), like this:
@pytest.fixture(scope="function")
def prometheus_client():  # add any fixture arguments you need here
    # create your client here
    client = ...
    yield client
    # remove your client here
Then you define your tests using this fixture:
def test_number_one(prometheus_client):
    # test body
This way the client is created from scratch in each test and torn down even if the test fails.
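One concrete way to sketch this with prometheus_client, assuming your code under test can be pointed at a registry you pass in (the fixture and metric names below are just examples, not part of the question's code):
import prometheus_client
import pytest

@pytest.fixture
def registry():
    # A fresh registry per test, so no state leaks between tests.
    return prometheus_client.CollectorRegistry()

@pytest.fixture
def request_counter(registry):
    # Hypothetical metric; register it on the per-test registry
    # instead of the global prometheus_client.REGISTRY.
    return prometheus_client.Counter(
        'http_requests', 'Total HTTP requests', registry=registry
    )

def test_counter_starts_at_zero(registry, request_counter):
    assert registry.get_sample_value('http_requests_total') == 0.0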
Straightforward approach: memorize current metric values before the test
First of all, let me show how I believe you should work with metrics in tests (and how I do it in my projects). Instead of doing resets, keep track of the metric values before the test starts. After the test finishes, collect the metrics again and analyze the diff between both values. Example of a django view test with a counter:
import copy
import prometheus_client
import pytest
from django.test import Client
def find_value(metrics, metric_name):
    return next((
        sample.value
        for metric in metrics
        for sample in metric.samples
        if sample.name == metric_name
    ), None)

@pytest.fixture
def metrics_before():
    yield copy.deepcopy(list(prometheus_client.REGISTRY.collect()))

@pytest.fixture
def client():
    return Client()

def test_counter_increases_by_one(client, metrics_before):
    # record the metric value before the HTTP client requests the view
    value_before = find_value(metrics_before, 'http_requests_total') or 0.0
    # do the request
    client.get('/my-view/')
    # collect the metric value again
    metrics_after = prometheus_client.REGISTRY.collect()
    value_after = find_value(metrics_after, 'http_requests_total')
    # the value should have increased by one
    assert value_after == value_before + 1.0
Now let's see what can be done with the registry itself. Note that this uses prometheus-client internals and is fragile by definition - use at your own risk!
Messing with prometheus-client internals: unregister all metrics
If you are sure your test code will invoke metrics registration from scratch, you can unregister all metrics from the registry before the test starts:
@pytest.fixture(autouse=True)
def clear_registry():
    collectors = tuple(prometheus_client.REGISTRY._collector_to_names.keys())
    for collector in collectors:
        prometheus_client.REGISTRY.unregister(collector)
    yield
Beware that this will only work if your test code invokes metrics registration again! Otherwise, you will effectively stop the metrics collection. For example, the built-in PlatformCollector will be gone until you explicitly register it again, e.g. by creating a new instance via prometheus_client.PlatformCollector().
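If you do want the built-in collectors back afterwards, the fixture above could be extended to re-create them on teardown. A sketch, assuming the default prometheus_client built-ins and that your test code does not re-register them itself:
import prometheus_client
import pytest

@pytest.fixture(autouse=True)
def clear_registry():
    collectors = tuple(prometheus_client.REGISTRY._collector_to_names.keys())
    for collector in collectors:
        prometheus_client.REGISTRY.unregister(collector)
    yield
    # Restore the default built-in collectors for code running after the test.
    prometheus_client.PlatformCollector(registry=prometheus_client.REGISTRY)
    prometheus_client.ProcessCollector(registry=prometheus_client.REGISTRY)
    prometheus_client.GCCollector(registry=prometheus_client.REGISTRY)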
Messing with prometheus-client internals: reset (almost) all metrics
You can also reset the values of registered metrics:
@pytest.fixture(autouse=True)
def reset_registry():
    collectors = tuple(prometheus_client.REGISTRY._collector_to_names.keys())
    for collector in collectors:
        try:
            collector._metrics.clear()
            collector._metric_init()
        except AttributeError:
            pass  # built-in collectors don't inherit from MetricWrapperBase
    yield
This will reinstantiate the values and metrics of all counters/gauges/histograms etc. The test from above could now be written as
def test_counter_increases_by_one(client):
    # do the request
    client.get('/my-view/')
    # collect the metric value
    metrics_after = prometheus_client.REGISTRY.collect()
    value_after = find_value(metrics_after, 'http_requests_total')
    # the value should be one
    assert value_after == 1.0
Of course, this will not reset any metrics of built-in collectors, like PlatformCollector (since it scrapes the values only once at instantiation) or ProcessCollector (because it doesn't store any values at all, instead reading them from the OS anew each time).
I've got a Python application using pytest. Several of my tests make API calls to Elasticsearch (using elasticsearch-dsl-py) that slow them down, and I'd like to:
- prevent those calls unless a pytest marker is used;
- if the marker is used, have it execute some code before the test runs, just like a fixture would if you used yield.
This is mostly inspired by pytest-django, where you have to use the django_db marker in order to connect to the database (but they throw an error if you try to connect to the DB without it, whereas I just don't want the call to happen in the first place).
For example:
def test_unintentionally_using_es():
    """I don't want a call going to Elasticsearch. But they just happen.
    Is there a way to "mock" the call? Or even just prevent the call from happening?"""

@pytest.mark.elastic
def test_intentionally_using_es():
    """I would like for this marker to perform some tasks beforehand (i.e. clear the indices)"""
To replicate that second test, I currently use a fixture:
@pytest.fixture
def elastic():
    # Pre-test tasks
    yield something
I think that's a use case for markers, right? Mostly inspired by pytest-django.
Your initial approach of combining a fixture with a custom marker is the correct one; in the code below, I took the code from your question and filled in the gaps.
Suppose we have some dummy function to test that uses the official elasticsearch client:
# lib.py
from datetime import datetime
from elasticsearch import Elasticsearch
def f():
    es = Elasticsearch()
    es.indices.create(index='my-index', ignore=400)
    return es.index(
        index="my-index",
        id=42,
        body={"any": "data", "timestamp": datetime.now()},
    )
We add two tests: one is not marked with elastic and should operate on the fake client, the other one is marked and needs access to the real client:
# test_lib.py
import pytest

from lib import f

def test_fake():
    resp = f()
    assert resp["_id"] == "42"

@pytest.mark.elastic
def test_real():
    resp = f()
    assert resp["_id"] == "42"
Now let's write the elastic() fixture that will mock the Elasticsearch class depending on whether the elastic marker was set:
from unittest.mock import MagicMock, patch

import pytest

@pytest.fixture(autouse=True)
def elastic(request):
    should_mock = request.node.get_closest_marker("elastic") is None
    if should_mock:
        patcher = patch('lib.Elasticsearch')
        fake_es = patcher.start()
        # this is just a mock example
        fake_es.return_value.index.return_value.__getitem__.return_value = "42"
    else:
        ...  # e.g. start the real server here etc
    yield
    if should_mock:
        patcher.stop()
Notice the usage of autouse=True: the fixture will be executed on each test invocation, but it only does the patching if the test is not marked. The presence of the marker is checked via request.node.get_closest_marker("elastic") is None. If you run both tests now, test_fake will pass because elastic mocks the Elasticsearch.index() response, while test_real will fail, assuming you don't have a server running on port 9200.
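One detail not shown above: recent pytest versions warn about unknown markers, so you may also want to register the custom elastic marker. A sketch using a conftest.py hook (the description text is just an example):
# conftest.py
def pytest_configure(config):
    # Register the custom marker so pytest does not warn about it.
    config.addinivalue_line(
        "markers", "elastic: the test is allowed to talk to a real Elasticsearch server"
    )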
I have a function which is registered as an event on a SQLAlchemy model, as shown in the code snippets below (not fully functional, as I don't show the db fixture, but they should be enough to explain the problem).
root/myapp/models.py:
class MyModel:
    id = Column(UUID, primary_key=True)
    value = ''

    @classmethod
    def register_hook(cls, hook_fn):
        event.listen(cls, "after_update", hook_fn, propagate=True)
root/myapp/app.py:
from models import MyModel
def hook_fn(mapper, connection, target):
    print('fired hook!')
MyModel.register_hook(hook_fn)
root/test/conftest.py:
@pytest.fixture
def patched_hook_fn(mocker):
    with mocker.patch("root.myapp.app.hook_fn") as patched:
        yield patched
root/test/tests.py:
def test_hook_fires_on_change(db, patched_hook_fn):
    model = MyModel(value="initial")
    db.session.commit()
    model.value = "changed"
    db.session.commit()  # hook fires here
    assert patched_hook_fn.called  # assert fails
What I'd like to know is:
Why doesn't the patched function get called?
Is there a simple way in a debug session to see where I should be patching in the with mocker.patch("myapp.app.hook_fn") as patched line?
It doesn't get called because you've already registered the unpatched version with the event system. SQLAlchemy does not read the value at root.myapp.app.hook_fn every time the event is fired, so even if you later set root.myapp.app.hook_fn = some_other_function (which is what patch is doing), it has no visible effect.
The way to fix this is to simply force your app to read the value every time the event is fired, by introducing a level of indirection:
MyModel.register_hook(lambda *args, **kwargs: hook_fn(*args, **kwargs))
This takes advantage of the way Python resolves identifiers in a closure, where changing root.myapp.app.hook_fn actually changes the value of hook_fn in the closure.
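Concretely, the indirection in root/myapp/app.py might look like this (a sketch; only the registration line changes from the code in the question):
# root/myapp/app.py
from models import MyModel

def hook_fn(mapper, connection, target):
    print('fired hook!')

# Register a wrapper instead of hook_fn itself: the name hook_fn is now
# looked up at call time, so mocker.patch("root.myapp.app.hook_fn")
# replaces what the wrapper actually calls.
MyModel.register_hook(lambda *args, **kwargs: hook_fn(*args, **kwargs))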
As for your second question, there's no straightforward way to figure out what you need to patch because in order to patch it directly you need to figure out where it is stored in the internals of SQLAlchemy, and depending on that, even in your tests, is quite fragile.
I am a beginner to using pytest in Python and am trying to write test cases for the following method, which gets the user's address when a correct id is passed and otherwise raises the custom error BadId.
def get_user_info(id: str, host='127.0.0.1', port=3000) -> str:
    uri = 'http://{}:{}/users/{}'.format(host, port, id)
    result = Requests.get(uri).json()
    address = result.get('user', {}).get('address', None)
    if address:
        return address
    else:
        raise BadId
Can someone help me with this, and can you also suggest the best resources for learning pytest? TIA
Your test regimen might look something like this.
First I suggest creating a fixture to be used in your various method tests. The fixture sets up an instance of your class to be used in your tests rather than creating the instance in the test itself. Keeping tasks separated in this way helps to make your tests both more robust and easier to read.
from my_package import MyClass
import pytest
@pytest.fixture
def a_test_object():
    return MyClass()
You can pass the test object to your series of method tests:
def test_something(a_test_object):
    # do the test
However if your test object requires some resources during setup (such as a connection, a database, a file, etc etc), you can mock it instead to avoid setting up the resources for the test. See this talk for some helpful info on how to do that.
By the way: if you need to test several different states of the user defined object being created in your fixture, you'll need to parametrize your fixture. This is a bit of a complicated topic, but the documentation explains fixture parametrization very clearly.
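A rough illustration of a parametrized fixture (the state argument to MyClass is hypothetical, just to show the mechanism):
import pytest

from my_package import MyClass

@pytest.fixture(params=['with_address1', 'no_user'])
def a_test_object(request):
    # The fixture runs once per entry in params; request.param holds the
    # current value, so every test using the fixture runs once per state.
    return MyClass(state=request.param)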
The other thing you need to do is make sure any .get calls to Requests are intercepted. This is important because it allows your tests to be run without an internet connection, and ensures they do not fail as a result of a bad connection, which is not the thing you are trying to test.
You can intercept Requests.get by using the monkeypatch feature of pytest. All that is required is to include monkeypatch as an input parameter to the fixture (or test) that does the patching.
You can employ another fixture to accomplish this. It might look like this:
import Requests
import pytest

@pytest.fixture
def patched_requests(monkeypatch):
    # store a reference to the old get method
    old_get = Requests.get

    def mocked_get(uri, *args, **kwargs):
        '''A method replacing Requests.get

        Returns either a mocked response object (with json method)
        or the default response object if the uri doesn't match
        one of those that have been supplied.
        '''
        _, id = uri.split('/users/', 1)
        try:
            # attempt to get the correct mocked json method
            json = dict(
                with_address1=lambda: {'user': {'address': 123}},
                with_address2=lambda: {'user': {'address': 456}},
                no_address=lambda: {'user': {}},
                no_user=lambda: {},
            )[id]
        except KeyError:
            # fall back to default behavior
            obj = old_get(uri, *args, **kwargs)
        else:
            # create a mocked requests object
            mock = type('MockedReq', (), {})()
            # assign mocked json to requests.json
            mock.json = json
            # assign obj to mock
            obj = mock
        return obj

    # finally, patch Requests.get with patched version
    monkeypatch.setattr(Requests, 'get', mocked_get)
This looks complicated until you understand what is happening: we have simply made some mocked json objects (represented by dictionaries) with pre-determined user ids and addresses. The patched version of Requests.get simply returns an object (of type MockedReq) with the corresponding mocked .json() method when its id is requested.
Note that Requests will only be patched in tests that actually use the above fixture, e.g.:
def test_something(patched_requests):
# use patched Requests.get
Any test that does not use patched_requests as an input parameter will not use the patched version.
Also note that you could monkeypatch Requests within the test itself, but I suggest doing it separately. If you are using other parts of the requests API, you may need to monkeypatch those as well. Keeping all of this stuff separate is often going to be easier to understand than including it within your test.
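For completeness, monkeypatching inside a single test might look roughly like this (a sketch reusing the a_test_object fixture from above; the FakeResponse class is made up for the example):
import Requests

def test_get_user_info_inline(monkeypatch, a_test_object):
    class FakeResponse:
        def json(self):
            return {'user': {'address': 123}}

    # Patch only for the duration of this test; monkeypatch undoes it afterwards.
    monkeypatch.setattr(Requests, 'get', lambda uri, *args, **kwargs: FakeResponse())
    assert a_test_object.get_user_info('whatever') == 123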
Write your various method tests next. You'll need a different test for each aspect of your method. In other words, you will usually write a different test for the instance in which your method succeeds, and another one for testing when it fails.
First we test method success with a couple test cases.
@pytest.mark.parametrize('id, result', [
    ('with_address1', 123),
    ('with_address2', 456),
])
def test_get_user_info_success(patched_requests, a_test_object, id, result):
    address = a_test_object.get_user_info(id)
    assert address == result
Next we can test for raising the BadId exception using the with pytest.raises feature. Note that since an exception is raised, there is not a result input parameter for the test function.
@pytest.mark.parametrize('id', [
    'no_address',
    'no_user',
])
def test_get_user_info_failure(patched_requests, a_test_object, id):
    from my_package import BadId

    with pytest.raises(BadId):
        address = a_test_object.get_user_info(id)
As posted in my comment, for additional resources be sure to check out Brian Okken's book and Bruno Oliveira's book. They are both very helpful for learning pytest.
I'm not sure if this is an IntelliJ thing or not (using the built-in test runner), but I have a class whose logging output I'd like to appear in the test case that I am running. I hope the example code provides enough scope; if not, I can edit to include more.
Basically the log.info() call in the Matching() class never shows up in my test runner console when running. Is there something I need to configure on the class that extends TestCase ?
Here's the class in matching.py:
class Matching(object):
    """
    The main compliance matching logic.
    """

    request_data = None

    def __init__(self, matching_request):
        """
        Set matching request information.
        """
        self.request_data = matching_request

    def can_matching_run(self):
        raise Exception("Not implemented yet.")

    def run_matching(self):
        log.info("Matching started at {0}".format(datetime.now()))
Here is the test:
class MatchingServiceTest(IntegrationTestBase):
    def __do_matching(self, client_name, date_range):
        """
        Pull control records from the control table, and compare against program-generated
        matching data from the non-control table.

        The ``client_name`` dictates which model to use. Data is compared within
        a mock ``date_range``.
        """
        from matching import Matching, MatchingRequest

        # Run the actual matching service for client.
        match_request = MatchingRequest(client_name, date_range)
        matcher = Matching(match_request)
        matcher.run_matching()
Well, I do not see where you initialize the log object, but I presume you do that somewhere and add a handler to it (StreamHandler, FileHandler, etc.).
This means that during your tests this does not occur, so you would have to do that in the test. Since you did not post that part of the code, I can't give an exact solution:
import logging
log = logging.getLogger("your-logger-name")
log.addHandler(logging.StreamHandler())
log.setLevel(logging.DEBUG)
That said, tests should generally not print anything to stdout, so it's best to use a FileHandler. You should also design your tests in such a way that they fail if something goes wrong; that's the whole point of automated tests, so you won't have to manually inspect the output. If they do fail, you can then check the log to see if it contains useful debugging information.
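For example, sending the log output to a file instead might look like this (a sketch; the logger name and file path are placeholders):
import logging

log = logging.getLogger("your-logger-name")
log.setLevel(logging.DEBUG)
# Write records to a file instead of the test runner's stdout.
log.addHandler(logging.FileHandler("matching_tests.log"))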
Hope this helps.
Read more in the Python logging documentation.
I'm using the following approach to handle a FIFO queue based on Google App Engine db.Model (see this question).
from google.appengine.ext import db
from google.appengine.ext import webapp
from google.appengine.ext.webapp import run_wsgi_app
class QueueItem(db.Model):
    created = db.DateTimeProperty(required=True, auto_now_add=True)
    data = db.BlobProperty(required=True)

    @staticmethod
    def push(data):
        """Add a new queue item."""
        return QueueItem(data=data).put()

    @staticmethod
    def pop():
        """Pop the oldest item off the queue."""
        def _tx_pop(candidate_key):
            # Try and grab the candidate key for ourselves. This will fail if
            # another task beat us to it.
            task = QueueItem.get(candidate_key)
            if task:
                task.delete()
            return task

        # Grab some tasks and try getting them until we find one that hasn't been
        # taken by someone else ahead of us
        while True:
            candidate_keys = QueueItem.all(keys_only=True).order('created').fetch(10)
            if not candidate_keys:
                # No tasks in queue
                return None
            for candidate_key in candidate_keys:
                task = db.run_in_transaction(_tx_pop, candidate_key)
                if task:
                    return task
This queue works as expected (very good).
Right now my code has a method, invoked by a deferred task, that accesses this FIFO queue:
def deferred_worker():
    data = QueueItem.pop()
    do_something_with(data)
I would like to enhance this method and the queue data structure by adding a client_ID parameter representing a specific client that needs to access its own queue.
Something like:
def deferred_worker(client_ID):
    data = QueueItem_of_this_client_ID.pop()  # I need to implement this
    do_something_with(data)
How could I code the Queue to be client_ID aware?
Constraints:
- The number of clients is dynamic and not predefined
- Taskqueue is not an option (1. there is a maximum of ten queues, 2. I would like to have full control over my queue)
Do you know how I could add this behaviour using the new Namespaces API? (Remember that I'm not calling the db.Model from a webapp.RequestHandler.)
Another option: I could add a client_ID db.StringProperty to QueueItem and use it as a filter in the pop method:
QueueItem.all(keys_only=True).filter('client_ID =', an_ID).order('created').fetch(10)
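As a rough sketch of that property-based variant (the ClientQueueItem name is made up here; it just adds the client filter to the existing pop logic):
class ClientQueueItem(db.Model):
    created = db.DateTimeProperty(required=True, auto_now_add=True)
    client_ID = db.StringProperty(required=True)
    data = db.BlobProperty(required=True)

    @staticmethod
    def push(client_ID, data):
        """Add a new queue item for one client."""
        return ClientQueueItem(client_ID=client_ID, data=data).put()

    @staticmethod
    def pop(client_ID):
        """Pop the oldest item off this client's queue."""
        def _tx_pop(candidate_key):
            task = ClientQueueItem.get(candidate_key)
            if task:
                task.delete()
            return task

        while True:
            # Note: an equality filter combined with a sort order on another
            # property needs a composite index on (client_ID, created).
            candidate_keys = (ClientQueueItem.all(keys_only=True)
                              .filter('client_ID =', client_ID)
                              .order('created')
                              .fetch(10))
            if not candidate_keys:
                return None
            for candidate_key in candidate_keys:
                task = db.run_in_transaction(_tx_pop, candidate_key)
                if task:
                    return task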
Any better idea?
Assuming your "client class" is really a request handler the client calls, you could do something like this:
from google.appengine.api import users
from google.appengine.api.namespace_manager import set_namespace

class ClientClass(webapp.RequestHandler):
    def get(self):
        # For this example let's assume the user_id is your unique id.
        # You could just as easily use a parameter you are passed.
        user = users.get_current_user()
        if user:
            # If there is a user, use their queue. Otherwise the global queue.
            set_namespace(user.user_id())
        item = QueueItem.pop()
        self.response.out.write(str(item))
        QueueItem.push('The next task.')
Alternatively, you could also set the namespace app-wide.
By setting the default namespace, all calls to the datastore will be "within" that namespace unless you explicitly specify otherwise. Just note that to fetch and run tasks you'll have to know the namespace, so you probably want to maintain a list of namespaces in the default namespace for cleanup purposes.
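A minimal sketch of such bookkeeping, assuming a hypothetical ClientNamespace model kept in the default (empty) namespace:
from google.appengine.api import namespace_manager
from google.appengine.ext import db

class ClientNamespace(db.Model):
    """Record of every client namespace, stored in the default namespace."""
    name = db.StringProperty(required=True)

def register_client_namespace(name):
    # Switch to the default namespace so the bookkeeping entity
    # does not end up inside the client's own namespace.
    previous = namespace_manager.get_namespace()
    try:
        namespace_manager.set_namespace('')
        ClientNamespace.get_or_insert(name, name=name)
    finally:
        namespace_manager.set_namespace(previous)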
As I said in response to your query on my original answer, you don't need to do anything to make this work with namespaces: the datastore, on which the queue is built, already supports namespaces. Just set the namespace as desired, as described in the docs.