I developed a crawler and its unit tests (mainly to validate XPaths). I want to run specific unit tests before the script executes, to be sure that the HTML structure has not changed and the existing XPaths still work. I don't want the unit-test output, just a flag: passed or failed.
For example:
tests.py:
import unittest

class CrwTst(unittest.TestCase):
    def test_1(self):
        [..]
crawler.py:
class Crawler(object):
    def action_1(self):
        [..]
And I want it to work like:
if CrwTst.test_1() is True:
    Crawler.action_1()
You could potentially do this:
crawler.py
import unittest
from tests import CrwTst

if unittest.TextTestRunner().run(CrwTst('test_1')).wasSuccessful():
    Crawler.action_1()
Note however that you may run into an issue with circular imports, because your test presumably already depends on Crawler, and what you are looking to do will make the Crawler depend on the test. This will likely manifest itself as ImportError: cannot import name CrwTst.
To resolve that, you can import CrwTst lazily, inside the function that needs it, instead of at module level.
crawler.py
import unittest

def function_that_runs_crawler():
    from tests import CrwTst  # Dynamically import to resolve circular ref
    if unittest.TextTestRunner().run(CrwTst('test_1')).wasSuccessful():
        Crawler.action_1()
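Since the question asks for a flag only, without the test output, one possible variation is to skip TextTestRunner and collect the outcome in a plain unittest.TestResult, which prints nothing. This is a minimal sketch: make_case and single_test_passes are hypothetical names, and the test body is a stand-in for the real XPath checks.

```python
import unittest


def make_case(expected_sum):
    """Build a stand-in test case; the real one would validate XPaths."""
    class CrwTst(unittest.TestCase):
        def test_1(self):
            self.assertEqual(1 + 1, expected_sum)
    return CrwTst


def single_test_passes(test_case_class, method_name):
    """Run one test method without printing anything; return True on success."""
    result = unittest.TestResult()
    test_case_class(method_name).run(result)
    return result.wasSuccessful()


print(single_test_passes(make_case(2), "test_1"))  # assertion holds
print(single_test_passes(make_case(3), "test_1"))  # assertion fails
```

The crawler could then guard its action with something like `if single_test_passes(CrwTst, 'test_1'):`.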
PROBLEM
I need to import a function/method located in scrapy project #1 and use it in one of the spiders of scrapy project #2.
DIRECTORY STRUCTURE
For starters, here's my directory structure (assume these are all under one root directory):
/importables                        # scrapy project #1
    /importables
        /spiders
            title_collection.py     # take class functions defined from here
/alibaba                            # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py          # use them here
WHAT I WANT
As shown above, I am trying to get scrapy to:
Run alibabaPage.py
From title_collection.py, import a class method named saveTitleInTitlesCollection out of a class in that file named TitleCollectionSpider
I want to use saveTitleInTitlesCollection inside functions that are called in the alibabaPage.py spider.
HOW IT'S GOING...
Here's what I've done so far at the top of alibabaPage.py:
from importables.importables.spiders import saveTitleInTitlesCollection
Nope. It fails with builtins.ModuleNotFoundError: No module named 'importables'
How can that be? I took that approach from this answer.
sys.path.append(os.path.join(os.path.dirname(__file__), '../..'))
Then, I did this...
from importables.importables.spiders import saveTitleInTitlesCollection
Nope. It fails with the same error as the first attempt. Taken from this answer.
Re-reading the post linked from answer #1, I realized the author had put the two files in the same directory, so I tried doing that: I made a copy of title_collection.py and put it in like so:
/alibaba                            # scrapy project #2
    /alibaba
        /spiders
            alibabaPage.py          # use them here
            title_collection.py     # added this
Well, that appeared to work but didn't in the end. This threw no errors...
from alibaba.spiders.title_collection import TitleCollectionSpiderAlibaba
Leading me to assume everything worked. I added a test function named testForImport and tried importing it, ended up getting error: builtins.ModuleNotFoundError: No module named 'alibaba.spiders.title_collection.testForImport'; 'alibaba.spiders.title_collection' is not a package
Unfortunately, this wasn't actually achieving the goal of importing the class method I want to use, named saveTitleInTitlesCollection.
I have numerous scrapy projects and want to really just have one project of spiders that I can just import into every other project with ease.
This is not that solution so, the quest for a true solution to importing a bunch of class methods from one scrapy project to many continues... can this even be done I wonder...
WAIT, this actually didn't work after all, because I then got builtins.ModuleNotFoundError:
No module named 'TitleCollectionSpiderAlibaba'
from alibaba.spiders.title_collection import testForImport
nope. This failed too.
But, this time it gave me slightly different error...
builtins.ImportError:
cannot import name 'testForImport' from 'alibaba.spiders.title_collection'
(C:\Users\User\\scrapy-webscrapers\alibaba\alibaba\spiders\title_collection.py)
Consider this now solved!
Due to Umair's answer I was able to do this:
# typical scrapy spider imports...
import os
import sys
import scrapy
from ..items import AlibabaItem
# import this near the top of the page
sys.path.append(os.path.join(os.path.abspath('../')))
from importables.importables.spiders.title_collection import TitleCollectionSpider
...
# then, in parse method I did this...
def parse(self, response):
    alibaba_item = AlibabaItem()
    title_collection_spider_obj = TitleCollectionSpider()
    title_collection_spider_obj.testForImportTitlesCollection()
    # terminal showed this, proving it worked...
    # "testForImport worked if you see this!"
Inside alibabaPage.py, you can do this to import a class from outside your Scrapy project folder:
import os, sys
sys.path.append(os.path.join(os.path.abspath('../')))
from importables.importables.spiders.title_collection import TitleCollectionSpider
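This will import the TitleCollectionSpider class from title_collection.py into alibabaPage.py. Note that '../' is resolved against the current working directory, so this works only when the parent of the working directory is the root that contains the outer importables folder (for example, when scrapy is run from the outer alibaba folder).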
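If depending on the working directory is a concern, a possible variation is to derive the root from the spider file's own location. The sketch below is an assumption-laden illustration: add_ancestor_to_sys_path is a hypothetical helper, and the value 3 assumes the directory layout shown in the question (spiders, inner alibaba, outer alibaba, then the shared root).

```python
import sys
from pathlib import Path


def add_ancestor_to_sys_path(current_file, levels_up):
    """Append an ancestor directory of current_file to sys.path.

    levels_up counts directories above the one holding the file:
    0 is the file's own directory, 1 is its parent, and so on.
    """
    root = Path(current_file).resolve().parents[levels_up]
    root_str = str(root)
    if root_str not in sys.path:
        sys.path.append(root_str)
    return root


# Hypothetical location of alibabaPage.py; in the spider itself,
# __file__ would be passed instead of a literal path.
root = add_ancestor_to_sys_path(
    "/srv/demo_root/alibaba/alibaba/spiders/alibabaPage.py", 3
)
print(root.name)
```

With the root on sys.path, the import of importables.importables.spiders.title_collection no longer depends on where scrapy was started from.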
I am writing test cases for a module module.py that imports from another module legacy.py. The legacy.py reads os.environ["some_var"] at module level. When I am trying to run test cases for module.py, they are failing with KeyError for some_var in os.environ.
This is what the code looks like:
module.py
from legacy import db

def fun():
    pass
legacy.py
import os

env = os.environ["some_var"]

class db:
    def __init__(self):
        pass
When running test cases for module.py, I am getting KeyError: 'some_var'.
I tried patching os (at module level, and also placing the patch before the import from module.py in the test file), but the import statement runs before it can be patched. I looked for a similar question on StackOverflow but didn't find the exact problem I am facing here.
How can I mock it? Or any other suggestion to be able to run the test cases. Assume that I cannot modify the legacy.py file.
You could use mock.patch.dict. There is an example in the official docs.
Here is a fully functional example based on your question.
module.py
import os

def get_env_var_value():
    env_var = os.environ['PATH']
    return env_var

print(get_env_var_value())
test_module.py
from unittest.mock import patch
from unittest import main, TestCase
import module

class MyClassTestCase(TestCase):
    @patch.dict('os.environ', {'PATH': '/usr/sbin'})
    def test_get_env_var_value(self):
        self.assertEqual(module.get_env_var_value(), '/usr/sbin')

if __name__ == '__main__':
    main()
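With the decorator, the patched value is in effect for the whole test method, so any os.environ lookup that happens while the test runs sees it. It does not cover lookups that already ran at import time, such as the module-level os.environ["some_var"] in legacy.py; in that case the variable has to be present in os.environ before the import is executed, for example by importing the module inside the patched block.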
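For a module that reads os.environ at import time and cannot be modified, one option is to perform the import itself while patch.dict is active. The following is a rough sketch under stated assumptions: legacy_demo is a throwaway stand-in for legacy.py written to a temporary directory, and import_legacy_with_env is a hypothetical helper name.

```python
import importlib
import os
import sys
import tempfile
import textwrap
from unittest.mock import patch

# Stand-in for legacy.py: reads the variable at module level.
LEGACY_SOURCE = textwrap.dedent('''
    import os
    env = os.environ["some_var"]
''')


def import_legacy_with_env(value):
    """Import the stand-in module while some_var is patched into os.environ."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "legacy_demo.py"), "w") as f:
            f.write(LEGACY_SOURCE)
        sys.path.insert(0, tmp)
        try:
            with patch.dict(os.environ, {"some_var": value}):
                # Force a fresh import so the module-level read runs now.
                sys.modules.pop("legacy_demo", None)
                importlib.invalidate_caches()
                module = importlib.import_module("legacy_demo")
        finally:
            sys.path.remove(tmp)
    return module


legacy = import_legacy_with_env("dummy")
print(legacy.env)
```

In a real test file the same idea would mean moving `import module` from the top of the file into a `with patch.dict(os.environ, {...}):` block, or setting the variable in os.environ before the import line.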
I am using pytest to run test cases for a package I am developing. The tests use a small image file that I have saved as a GitHub asset. The code below works just fine, but I think that pytest is downloading the image each time it runs a new test, and that takes unnecessary time and resources. I was trying to figure out how I can download the file once and then share it across test cases.
Here is some sample code.
# -- in conftest.py --
import sys
import pytest
import os
import shutil
import requests

@pytest.fixture(scope="function")
def small_image(tmpdir):
    url = 'https://github.com/.../sample_image_small.tif'
    r = requests.get(url)
    with open(os.path.join(str(tmpdir), 'sample_image_small.tif'), 'wb') as f:
        f.write(r.content)
    return os.path.join(str(tmpdir), 'sample_image_small.tif')
Then here are some very simple test cases that should be able to share the same image.
# -- test_package.py --
import pytest
import os

@pytest.mark.usefixtures('small_image')
def test_ispath(small_image, compression):
    assert os.path.exists(small_image)

def test_isfile(small_image, compression):
    assert os.path.isfile(small_image)
Now I believe that pytest tries to isolate each test, and that this is what causes the repeated downloads. I tried using @pytest.fixture(scope="module") instead of function scope, but that generated a strange error:
ScopeMismatch: You tried to access the 'function' scoped fixture 'tmpdir' with a 'module' scoped request object, involved factories
Is there a better way to set up the tests so that I don't keep downloading the file over and over?
First, a note beforehand: a better alternative to the old tmpdir/tmpdir_factory fixtures pair is tmp_path/tmp_path_factory which deals with pathlib objects instead of the deprecated py.path, see Temporary directories and files.
Second, if you want to handle files session-scoped (or module-scoped), tmp*_factory fixtures are meant for that. Example:
@pytest.fixture(scope='session')
def small_image(tmp_path_factory):
    img = tmp_path_factory.getbasetemp() / 'sample_image_small.tif'
    img.write_bytes(b'spam')
    return img
The sample_image_small.tif will now be written once per test run.
Of course, there's nothing wrong with using tempfile as suggested by @MrBean Bremen in his answer; this is just an alternative that does the same using only standard pytest fixtures.
You can use the same code, just handle the tempfile yourself instead of using the tmpdir fixture (which cannot be used in module-scoped fixtures):
import os
import tempfile
import pytest
import requests

@pytest.fixture(scope="module")
def small_image():
    url = 'https://github.com/.../sample_image_small.tif'
    r = requests.get(url)
    # delete=False keeps the file on disk after the with block closes it
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(r.content)
    yield f.name
    os.remove(f.name)
This will create the file, return the file name, and delete the file after the tests are finished.
EDIT:
The answer by @hoefling shows a more standard way to do this; I'll leave this one for reference.
Question on unit testing
Goal: The goal is to use pyUnit in testCalc.py to unit test the simpleCalc object in calculator.py.
Problem: I cannot successfully import the simpleCalc object from calculator.py into testCalc.py when testCalc is run from a separate directory in the project.
Background: The unit test in testCalc.py runs perfectly fine when it's included in the same directory as calculator.py, but when I move it into a separate folder and try to import the simpleCalc object defined in calculator.py, I get an error. I am trying to learn how to use the pyUnit unit testing framework in a simple project and I'm clearly missing something basic about how to import modules for unit testing in a hierarchical directory structure. The basic calculator_test project described below is a simple project that I created to practice. You can see all the posts I've gone through already at the end of this post.
Ultimate Question: How can I import the simpleCalc object into testCalc.py with the directory hierarchy described below?
Github: https://github.com/jaybird4522/calculator_test/tree/unit_test
Here's my directory structure:
calculator_test/
    calculatorAlgo/
        __init__.py
        calculator.py
    test_calculatorAlgo/
        __init__.py
        testCalc.py
        testlib/
            __init__.py
            testcase.py
Here's the calculator.py file, which describes the simpleCalc object I want to unit test:
# calculator.py
class simpleCalc(object):
    def __init__(self):
        self.input1 = 0
        self.input2 = 0

    def set(self, in1, in2):
        self.input1 = in1
        self.input2 = in2

    def subtract(self, in1, in2):
        self.set(in1, in2)
        result = self.input1 - self.input2
        return result
Here's the testCalc.py file, which contains the unit tests:
# testCalc.py
import unittest
from calculatorAlgo.calculator import simpleCalc

class testCalc(unittest.TestCase):
    # set up the tests by instantiating a simpleCalc object as calc
    def setUp(self):
        self.calc = simpleCalc()

    def runTest(self):
        self.assertEqual(self.calc.subtract(7,3),4)

if __name__ == '__main__':
    unittest.main()
I have been running the unit test file with the simple command:
testCalc.py
What I've attempted so far
First Attempt
I tried simply importing the simpleCalc object based on where it's located in the directory structure:
# testCalc.py
import unittest
from .calculatorAlgo.calculator import simpleCalc
class testCalc(unittest....
And got this error:
ValueError: Attempted relative import in non-package
Second Attempt
I tried importing it without relative references:
# testCalc.py
import unittest
import simpleCalc
class testCalc(unittest....
And got this error:
ImportError: No module named simpleCalc
Third Attempt
Based on this post, http://blog.aaronboman.com/programming/testing/2016/02/11/how-to-write-tests-in-python-project-structure/, I tried creating a separate base class called testcase.py which could do the relative imports.
# testcase.py
from unittest import TestCase
from ...calculator import simpleCalc

class BaseTestCase(TestCase):
    pass
And changed my imports in testCalc.py
# testCalc.py
import unittest
from testlib.testcase import BaseTestCase
class testCalc(unittest....
And got this error:
ValueError: Attempted relative import beyond toplevel package
Other Resources
Here are some of the posts I've worked through to no avail:
Import a module from a relative path
python packaging for relative imports
How to fix "Attempted relative import in non-package" even with __init__.py
Python importing works from one folder but not another
Relative imports for the billionth time
Ultimately, I feel that I'm just missing something basic, even after a lot of research. This feels like a common setup, and I'm hoping someone can tell me what I'm doing wrong, and that it might help others avoid this problem in the future.
Inside the testCalc.py file, add the following.
import sys
import os
sys.path.append(os.path.abspath('../calculatorAlgo'))
from calculator import simpleCalc
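An alternative that avoids editing sys.path is to keep the absolute import (from calculatorAlgo.calculator import simpleCalc) and start the tests from the project root with python -m unittest, which puts the working directory on the import path. The sketch below rebuilds a reduced copy of the project in a temporary directory to show the idea. It assumes calculatorAlgo and test_calculatorAlgo are sibling folders, and run_tests_from_project_root is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

CALCULATOR_SOURCE = textwrap.dedent('''
    class simpleCalc(object):
        def subtract(self, in1, in2):
            return in1 - in2
''')

TEST_SOURCE = textwrap.dedent('''
    import unittest
    from calculatorAlgo.calculator import simpleCalc

    class testCalc(unittest.TestCase):
        def test_subtract(self):
            self.assertEqual(simpleCalc().subtract(7, 3), 4)
''')


def run_tests_from_project_root():
    """Recreate the layout, run the tests from its root, return the exit code."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for package in ("calculatorAlgo", "test_calculatorAlgo"):
            (root / package).mkdir()
            (root / package / "__init__.py").write_text("")
        (root / "calculatorAlgo" / "calculator.py").write_text(CALCULATOR_SOURCE)
        (root / "test_calculatorAlgo" / "testCalc.py").write_text(TEST_SOURCE)
        completed = subprocess.run(
            [sys.executable, "-m", "unittest", "test_calculatorAlgo.testCalc"],
            cwd=root,              # the project root becomes importable
            capture_output=True,
            text=True,
        )
        return completed.returncode


print(run_tests_from_project_root())
```

In the real project the equivalent would be running `python -m unittest test_calculatorAlgo.testCalc` from inside calculator_test/, rather than executing testCalc.py directly.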
I'm writing a simple script to run and test my code. How can I dynamically import and run my test classes?
This is the solution that I found to import and dynamically run my test classes.
import glob
import os
import imp  # note: imp is deprecated and was removed in Python 3.12
import unittest

def execute_all_tests(tests_folder):
    test_file_strings = glob.glob(os.path.join(tests_folder, 'test_*.py'))
    suites = []
    for test in test_file_strings:
        mod_name, file_ext = os.path.splitext(os.path.split(test)[-1])
        py_mod = imp.load_source(mod_name, test)
        suites.append(unittest.defaultTestLoader.loadTestsFromModule(py_mod))
    text_runner = unittest.TextTestRunner().run(unittest.TestSuite(suites))
Install pytest, and run your tests with a command like:
py.test src
That's it. pytest will load all test_*.py files, find all test_* functions inside them, and run each one for you.
The board is having trouble answering your question because it's in "why is water wet?" territory; all test rigs come with runners that automatically do what your code snippet does, so you only need to read the tutorial for one to get started.
And major props for writing auto tests at all; they put you above 75% of all programmers.
This solution is very simple and does what I want.
import unittest

def execute_all_tests(tests_folder):
    suites = unittest.TestLoader().discover(tests_folder)
    text_runner = unittest.TextTestRunner().run(suites)
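A possible extension of the discover-based version is to return the outcome so the caller can act on it, and to send the runner's report to an in-memory stream when it is not wanted. This is only a sketch: the pattern argument and the returned tuple are additions for illustration, and the sample test file is created in a temporary folder purely for the demonstration.

```python
import io
import tempfile
import unittest
from pathlib import Path


def execute_all_tests(tests_folder, pattern="test_*.py"):
    """Discover and run tests; return (number of tests run, all passed?)."""
    suite = unittest.TestLoader().discover(tests_folder, pattern=pattern)
    # Writing the report to a StringIO keeps the console quiet.
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    result = runner.run(suite)
    return result.testsRun, result.wasSuccessful()


# Demonstration with a throwaway test file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "test_sample.py").write_text(
        "import unittest\n"
        "class SampleTest(unittest.TestCase):\n"
        "    def test_ok(self):\n"
        "        self.assertTrue(True)\n"
    )
    ran, ok = execute_all_tests(folder)

print(ran, ok)
```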