I am writing unit testing for a spark application. I am using pytest and I have created a fixture to load the spark session once.
When I run one test at a time, it is passing but when I run all the tests together I am getting unexpected behavior. Then, I realize, spark is not multi-threadable. Any way to fix this? Is running pytest in non-parallel mode is the only solution?
Sample code structure,
#pytest.fixture(scope="session")
def spark() -> SparkSession:
builder = SparkSession.builder.appName("pandas-on-spark")
builder = builder.config("spark.sql.execution.arrow.pyspark.enabled", "true")
return builder.getOrCreate()
def test1(spark):
df = spark.createDataFrame(dummy_rows)
# do some transformaton
# assert
def test2(spark):
df = spark.createDataFrame(dummy_rows)
# do some transformaton
# assert
def testN(spark):
df = spark.createDataFrame(dummy_rows)
# do some transformaton
# assert
pytest -s .
With scope="session", you'd have a single Spark session for all the tests, means all variables, all caches, all transformations etc. If you really need to have each transformation completely separated from each test, you should consider having a new Spark session for each test by changing lower scope into class or function. The whole test would run slower, but your logic will be secured.
I am new to Python ,
For the data driven testing, if there is 10tests , and the assertion fails [ AssertionError. ] for 4th test , then the rest of the 6 data sets are not considered for execution and the program gets stopped at that point completely. I want test to continue even if one dataset fails ? How can we achieve with fixtures ? Or is there any other way ?
We set up a fixture for Data Driven Testing using Fixtures.
========== TEST DATA ============
Test data is defined in Json File [ array of JSON Objects ]
==============. Fixture Code =========
from _pytest.fixtures import fixture
users_json_files_path=. “//Pathto_Input_Json_File”.
#fixture(params=load_jsondata(users_json_files_path))
def users_testdata(request):
user_test_data=request.param
return user_test_data
=========== Passing Fixture to Test Method . ==============
def test_one(users_testdata):
assert len(users_not_present == 0 ,"Following Users not present :: " + str(users_not_present)
Thanks,
Deepti
Not exactly sure what you are asserting. You can use this for your case:
https://pypi.org/project/pytest_check/
I am developing an application using with the Cloud Datastore Emulator (2.1.0) and the google-cloud-ndb Python library (1.6).
I find that there is an intermittent delay on entities being retrievable via a query.
For example, if I create an entity like this:
my_entity = MyEntity(foo='bar')
my_entity.put()
get_my_entity = MyEntity.query().filter(MyEntity.foo == 'bar').get()
print(get_my_entity.foo)
it will fail itermittently because the get() method returns None.
This only happens on about 1 in 10 calls.
To demonstrate, I've created this script (also available with ready to run docker-compose setup on GitHub):
import random
from google.cloud import ndb
from google.auth.credentials import AnonymousCredentials
client = ndb.Client(
credentials=AnonymousCredentials(),
project='local-dev',
)
class SampleModel(ndb.Model):
"""Sample model."""
some_val = ndb.StringProperty()
for x in range(1, 1000):
print(f'Attempt {x}')
with client.context():
random_text = str(random.randint(0, 9999999999))
new_model = SampleModel(some_val=random_text)
new_model.put()
retrieved_model = SampleModel.query().filter(
SampleModel.some_val == random_text
).get()
print(f'Model Text: {retrieved_model.some_val}')
What would be the correct way to avoid this intermittent failure? Is there a way to ensure the entity is always available after the put() call?
Update
I can confirm that this is only an issue with the datastore emulator. When testing on app engine and a Firestore in Datastore mode, entities are available immediately after calling put().
The issue turned out to be related to the emulator trying to replicate eventual consistency.
Unlike relational databases, Datastore does not gaurentee that the data will be available immediately after it's posted. This is because there are often replication and indexing delays.
For things like unit tests, this can be resolved by passing --consistency=1.0 to the datastore start command as documented here.
I am using Pytest to implement unit test in my django project which has MySql as backend.
In combination with these I am making use of SQLAlchemy for data generation.
I have a python function call_my_flow() which executes two different flows depending upon conditions. First flow uses sqlalchemy connection and second flow uses django connection for database insert.
I have written two unit tests using pytest to check both the flows.
First flow (where sqlalchemy connection is used): Commits the process flow transaction in database and pytest runs as per expectation.
Second flow (where django database connection is used): The transaction commit fails thus resulting into the failure of test.
Demo code:
import pytest
from myflow import call_my_flow
#pytest.fixture(scope="class")
#pytest.mark.django_db(transaction=False)
def setup_my_flow():
call_my_flow()
#pytest.mark.usefixtures("setup_my_flow")
class TestGenerateOrder(object):
#pytest.fixture(autouse=True)
def setuporder(self):
self.first_count = 2
self.second_count = 5
#pytest.mark.order1
#pytest.mark.django_db
def test_first_flow_count(self):
db_count = get_first_count()
assert db_count == self.first_count
#pytest.mark.order2
#pytest.mark.django_db
def test_second_flow_count(self):
db_count = get_second_count()
assert db_count == self.second_count
Please suggest a solution on the same.
I'm creating the test cases for web-tests using Jenkins, Python, Selenium2(webdriver) and Py.test frameworks.
So far I'm organizing my tests in the following structure:
each Class is the Test Case and each test_ method is a Test Step.
This setup works GREAT when everything is working fine, however when one step crashes the rest of the "Test Steps" go crazy. I'm able to contain the failure inside the Class (Test Case) with the help of teardown_class(), however I'm looking into how to improve this.
What I need is somehow skip(or xfail) the rest of the test_ methods within one class if one of them has failed, so that the rest of the test cases are not run and marked as FAILED (since that would be false positive)
Thanks!
UPDATE: I'm not looking or the answer "it's bad practice" since calling it that way is very arguable. (each Test Class is independent - and that should be enough).
UPDATE 2: Putting "if" condition in each test method is not an option - is a LOT of repeated work. What I'm looking for is (maybe) somebody knows how to use the hooks to the class methods.
I like the general "test-step" idea. I'd term it as "incremental" testing and it makes most sense in functional testing scenarios IMHO.
Here is a an implementation that doesn't depend on internal details of pytest (except for the official hook extensions). Copy this into your conftest.py:
import pytest
def pytest_runtest_makereport(item, call):
if "incremental" in item.keywords:
if call.excinfo is not None:
parent = item.parent
parent._previousfailed = item
def pytest_runtest_setup(item):
previousfailed = getattr(item.parent, "_previousfailed", None)
if previousfailed is not None:
pytest.xfail("previous test failed (%s)" % previousfailed.name)
If you now have a "test_step.py" like this:
import pytest
#pytest.mark.incremental
class TestUserHandling:
def test_login(self):
pass
def test_modification(self):
assert 0
def test_deletion(self):
pass
then running it looks like this (using -rx to report on xfail reasons):
(1)hpk#t2:~/p/pytest/doc/en/example/teststep$ py.test -rx
============================= test session starts ==============================
platform linux2 -- Python 2.7.3 -- pytest-2.3.0.dev17
plugins: xdist, bugzilla, cache, oejskit, cli, pep8, cov, timeout
collected 3 items
test_step.py .Fx
=================================== FAILURES ===================================
______________________ TestUserHandling.test_modification ______________________
self = <test_step.TestUserHandling instance at 0x1e0d9e0>
def test_modification(self):
> assert 0
E assert 0
test_step.py:8: AssertionError
=========================== short test summary info ============================
XFAIL test_step.py::TestUserHandling::()::test_deletion
reason: previous test failed (test_modification)
================ 1 failed, 1 passed, 1 xfailed in 0.02 seconds =================
I am using "xfail" here because skips are rather for wrong environments or missing dependencies, wrong interpreter versions.
Edit: Note that neither your example nor my example would directly work with distributed testing. For this, the pytest-xdist plugin needs to grow a way to define groups/classes to be sent whole-sale to one testing slave instead of the current mode which usually sends test functions of a class to different slaves.
If you'd like to stop the test execution after N failures anywhere (not in a particular test class) the command line option pytest --maxfail=N is the way to go:
https://docs.pytest.org/en/latest/usage.html#stopping-after-the-first-or-n-failures
if you instead want to stop a test that is comprised of multiple steps if any of them fails, (and continue executing the other tests) you should put all your steps in a class, and use the #pytest.mark.incremental decorator on that class and edit your conftest.py to include the code shown here
https://docs.pytest.org/en/latest/example/simple.html#incremental-testing-test-steps.
The pytest -x option will stop test after first failure:
pytest -vs -x test_sample.py
It's generally bad practice to do what are you doing. Each test should be as independent as possible from the others, while you completely depend on the results of the other tests.
Anyway, reading the docs it seems like a feature like the one you want is not implemented.(Probably because it wasn't considered useful).
A work-around could be to "fail" your tests calling a custom method which sets some condition on the class, and mark each test with the "skipIf" decorator:
class MyTestCase(unittest.TestCase):
skip_all = False
#pytest.mark.skipIf("MyTestCase.skip_all")
def test_A(self):
...
if failed:
MyTestCase.skip_all = True
#pytest.mark.skipIf("MyTestCase.skip_all")
def test_B(self):
...
if failed:
MyTestCase.skip_all = True
Or you can do this control before running each test and eventually call pytest.skip().
edit:
Marking as xfail can be done in the same way, but using the corresponding function calls.
Probably, instead of rewriting the boiler-plate code for each test, you could write a decorator(this would probably require that your methods return a "flag" stating if they failed or not).
Anyway, I'd like to point out that,as you state, if one of these tests fails then other failing tests in the same test case should be considered false positive...
but you can do this "by hand". Just check the output and spot the false positives.
Even though this might be boring./error prone.
You might want to have a look at pytest-dependency. It is a plugin that allows you to skip some tests if some other test had failed.
In your very case, it seems that the incremental tests that gbonetti discussed is more relevant.
Based on hpk42's answer, here's my slightly modified incremental mark that makes test cases xfail if the previous test failed (but not if it xfailed or it was skipped). This code has to be added to conftest.py:
import pytest
try:
pytest.skip()
except BaseException as e:
Skipped = type(e)
try:
pytest.xfail()
except BaseException as e:
XFailed = type(e)
def pytest_runtest_makereport(item, call):
if "incremental" in item.keywords:
if call.excinfo is not None:
if call.excinfo.type in {Skipped, XFailed}:
return
parent = item.parent
parent._previousfailed = item
def pytest_runtest_setup(item):
previousfailed = getattr(item.parent, "_previousfailed", None)
if previousfailed is not None:
pytest.xfail("previous test failed (%s)" % previousfailed.name)
And then a collection of test cases has to be marked with #pytest.mark.incremental:
import pytest
#pytest.mark.incremental
class TestWhatever:
def test_a(self): # this will pass
pass
def test_b(self): # this will be skipped
pytest.skip()
def test_c(self): # this will fail
assert False
def test_d(self): # this will xfail because test_c failed
pass
def test_e(self): # this will xfail because test_c failed
pass
UPDATE: Please take a look at #hpk42 answer. His answer is less intrusive.
This is what I was actually looking for:
from _pytest.runner import runtestprotocol
import pytest
from _pytest.mark import MarkInfo
def check_call_report(item, nextitem):
"""
if test method fails then mark the rest of the test methods as 'skip'
also if any of the methods is marked as 'pytest.mark.blocker' then
interrupt further testing
"""
reports = runtestprotocol(item, nextitem=nextitem)
for report in reports:
if report.when == "call":
if report.outcome == "failed":
for test_method in item.parent._collected[item.parent._collected.index(item):]:
test_method._request.applymarker(pytest.mark.skipif("True"))
if test_method.keywords.has_key('blocker') and isinstance(test_method.keywords.get('blocker'), MarkInfo):
item.session.shouldstop = "blocker issue has failed or was marked for skipping"
break
def pytest_runtest_protocol(item, nextitem):
# add to the hook
item.ihook.pytest_runtest_logstart(
nodeid=item.nodeid, location=item.location,
)
check_call_report(item, nextitem)
return True
Now adding this to conftest.py or as a plugin solves my problem.
Also it's improved to STOP testing if the blocker test has failed. (meaning that the entire further tests are useless)
Or quite simply instead of calling py.test from cmd (or tox or wherever), just call:
py.test --maxfail=1
see here for more switches:
https://pytest.org/latest/usage.html
To complement hpk42's answer, you can also use pytest-steps to perform incremental testing, this can help you in particular if you wish to share some kind of incremental state/intermediate results between the steps.
With this package you do not need to put all the steps in a class (you can, but it is not required), simply decorate your "test suite" function with #test_steps:
from pytest_steps import test_steps
def step_a():
# perform this step ...
print("step a")
assert not False # replace with your logic
def step_b():
# perform this step
print("step b")
assert not False # replace with your logic
#test_steps(step_a, step_b)
def test_suite_no_shared_results(test_step):
# Execute the step
test_step()
You can add a steps_data parameter to your test function if you wish to share a StepsDataHolder object between your steps.
import pytest
from pytest_steps import test_steps, StepsDataHolder
def step_a(steps_data):
# perform this step ...
print("step a")
assert not False # replace with your logic
# intermediate results can be stored in steps_data
steps_data.intermediate_a = 'some intermediate result created in step a'
def step_b(steps_data):
# perform this step, leveraging the previous step's results
print("step b")
# you can leverage the results from previous steps...
# ... or pytest.skip if not relevant
if len(steps_data.intermediate_a) < 5:
pytest.skip("Step b should only be executed if the text is long enough")
new_text = steps_data.intermediate_a + " ... augmented"
print(new_text)
assert len(new_text) == 56
#test_steps(step_a, step_b)
def test_suite_with_shared_results(test_step, steps_data: StepsDataHolder):
# Execute the step with access to the steps_data holder
test_step(steps_data)
Finally, you can automatically skip or fail a step if another has failed using #depends_on, check in the documentation for details.
(I'm the author of this package by the way ;) )