What is the right approach to unittest this method in Python?

I have a scraping module on my application that uses Beautiful Soup and Selenium to get website info with this function:
def get_page(user: str) -> Optional[BeautifulSoup]:
    """Get a Beautiful Soup object that represents the user profile page in some website"""
    try:
        browser = webdriver.Chrome(options=options)
        wait = WebDriverWait(browser, 10)
        browser.get('https://somewebsite.com/' + user)
        wait.until(EC.presence_of_element_located((By.TAG_NAME, 'article')))
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return BeautifulSoup(browser.page_source, 'lxml')
I need to test this function in two ways:
if it is getting a page (the success case);
and if it is printing the warning and returning None when it's not getting any page (the failure case).
I tried to test like this:
class ScrapeTests(unittest.TestCase):
    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertEqual(isinstance(sc.get_page('myusername'), BeautifulSoup), True)

    def test_get_page_not_found(self):
        """
        Test if get_page returns None when looking for a user
        that doesn't exist
        """
        self.assertEqual(sc.get_page('iwçl9239jaçklsdjf'), None)

if __name__ == '__main__':
    unittest.main()
Doing it like that makes the tests somewhat slower, as get_page itself is slow in the success case, and in the failure case, I'm forcing a timeout error looking for a non-existing user. I have the impression that my approach for testing these functions is not the right one. Probably the best way to test it is to fake a response, so get_page won't need to connect to the server and ask for anything.
So I have two questions:
Is this "fake web response" idea the right approach to test this function?
If so, how can I achieve it for that function? Do I need to rewrite the get_page function so it can be "testable"?
EDIT:
I tried to create a test to get_page like this:
class ScrapeTests(TestCase):
    def setUp(self) -> None:
        self.driver = mock.patch(
            'scrape.webdriver.Chrome',
            autospec=True
        )
        self.driver.page_source.return_value = "<html><head></head><body><article>Yes</article></body></html>"
        self.driver.start()

    def tearDown(self) -> None:
        self.driver.stop()

    def test_get_page_success(self):
        """
        Test if get_page is getting a page
        """
        self.assertEqual(isinstance(sc.get_page('whatever'), BeautifulSoup), True)
The problem I'm facing is that the driver.page_source attribute is created only after the wait.until function call. I need wait.until because the Selenium browser has to wait for the JavaScript to create the article tags in the HTML before I can scrape them.
When I try to define a return value for page source in setUp, I get an error: AttributeError: '_patch' object has no attribute 'page_source'
I tried lots of ways to mock the webdriver attributes with mock.patch, but it seems very difficult with my limited knowledge. I think that maybe the best way to achieve what I want (testing the get_page function without connecting to a server) is to mock an entire web server connection. But this is just a guess.

I think you are on the right track with #1. Look into the mock library which allows you to mock (fake) out functions, methods and classes and control the results of method calls. This will also remove the latency of the actual calls.
In my experience it is best to focus on testing the local logic and mock out any external dependencies. If you do that you will have a lot of small unit tests that together will test the majority of your code.
Based on your update, try:
self.driver = mock.patch.object(webdriver.Chrome, 'page_source', return_value="<html><head></head><body><article>Yes</article></body></html>")
If that doesn't work, then unfortunately I am out of ideas. It's possible the Selenium code is harder to mock.
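For what it's worth, the AttributeError in the question comes from the fact that mock.patch(...) returns a patcher (a `_patch` object), not the mock itself; the mock is what start() returns. Also, page_source is an attribute, so it is set directly on the mocked instance rather than through return_value. Below is a minimal sketch of that idea. To keep it runnable without Selenium or a browser, `fake_selenium`, `TimeoutException` and the simplified `get_page` (which returns the raw page source instead of a BeautifulSoup object) are hypothetical stand-ins, not the real module. In the real project the patch targets would presumably be 'scrape.webdriver.Chrome' and 'scrape.WebDriverWait', assuming the module is called scrape and imports those names.

```python
import unittest
from unittest import mock


class TimeoutException(Exception):
    """Stand-in for selenium.common.exceptions.TimeoutException."""


class fake_selenium:
    """Hypothetical stand-in for the Selenium names used by the scrape module."""

    class Chrome:
        page_source = ""

        def get(self, url):
            raise RuntimeError("no real browser in this sketch")

    class WebDriverWait:
        def __init__(self, driver, timeout):
            self.driver = driver

        def until(self, condition):
            raise RuntimeError("no real browser in this sketch")


def get_page(user):
    """Simplified get_page: same control flow, returns the raw page source."""
    try:
        browser = fake_selenium.Chrome()
        wait = fake_selenium.WebDriverWait(browser, 10)
        browser.get('https://somewebsite.com/' + user)
        wait.until(lambda driver: True)
    except TimeoutException:
        print("User hasn't been found. Try another user.")
        return None
    return browser.page_source


HTML = "<html><head></head><body><article>Yes</article></body></html>"


class ScrapeTests(unittest.TestCase):
    def setUp(self):
        # patch.object() returns a patcher; start() returns the mock
        # that replaces the class for the duration of the test.
        chrome_patcher = mock.patch.object(fake_selenium, "Chrome")
        wait_patcher = mock.patch.object(fake_selenium, "WebDriverWait")
        self.chrome_cls = chrome_patcher.start()
        self.wait_cls = wait_patcher.start()
        self.addCleanup(chrome_patcher.stop)
        self.addCleanup(wait_patcher.stop)
        # The object get_page receives from Chrome() is the mocked
        # class's return_value, so page_source is set on that.
        self.chrome_cls.return_value.page_source = HTML

    def test_get_page_success(self):
        self.assertEqual(get_page("whatever"), HTML)
        self.chrome_cls.return_value.get.assert_called_once_with(
            "https://somewebsite.com/whatever")

    def test_get_page_not_found(self):
        # Make wait.until() raise immediately instead of waiting 10 seconds.
        self.wait_cls.return_value.until.side_effect = TimeoutException
        self.assertIsNone(get_page("missing"))


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScrapeTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Patching WebDriverWait as well as the driver is what removes the timeout in the failure case, since the mocked until() raises straight away.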

Related

Create multiple similar PyTest fixtures

I have a series of pytest fixtures which are very similar to each other in nature. The fixtures are passed to tests which verify that certain CSS selectors work properly. The code within each fixture is nearly the same; the only difference is that a different URL is passed to the page.goto() function for each fixture.
Each fixture looks something like this:
import pytest

@pytest.fixture(scope="module")
def goto_pagename():
    with sync_playwright() as play:
        browser = play.chromium.launch()
        page = browser.new_page()
        page.goto(TestScrapeWebsite.address)
        yield page
        browser.close()
I tried to use a decorator which covers all of the code except page.goto() and yield page, which is below:
from playwright.sync_api import sync_playwright

def get_page(func):
    def wrapper(*args, **kwargs):
        with sync_playwright() as play:
            browser = play.chromium.launch(*args, **kwargs)
            page = browser.new_page()
            func(page)
            browser.close()
    return wrapper
Then, the fixtures and tests would look something like this:
import pytest

@pytest.fixture(scope="module")
@get_page
def google_page(page):
    page.goto(TestScrapeGoogle.address)
    yield page

@pytest.fixture(scope="module")
@get_page
def stackoverflow_page(page):
    page.goto(TestScrapeStackOverflow.address)
    yield page

@pytest.fixture(scope="module")
@get_page
def github_page(page):
    page.goto(TestScrapeGitHub.address)
    yield page

class TestScrapeGoogle:
    address = "https://google.com/"

    def test_selectors(self, google_page):
        assert google_page.url == self.address

class TestScrapeStackOverflow:
    address = "https://stackoverflow.com/"

    def test_selectors(self, stackoverflow_page):
        assert stackoverflow_page.url == self.address

class TestScrapeGitHub:
    address = "https://github.com/"

    def test_selectors(self, github_page):
        assert github_page.url == self.address
However, when running the pytest test runner, exceptions were raised concerning the fixtures:
$ pytest test_script.py
...
============================================================================================ short test summary info =============================================================================================
FAILED test_script.py::TestScrapeGoogle::test_selectors - AttributeError: 'NoneType' object has no attribute 'url'
FAILED test_script.py::TestScrapeStackOverflow::test_selectors - AttributeError: 'NoneType' object has no attribute 'url'
FAILED test_script.py::TestScrapeGitHub::test_selectors - AttributeError: 'NoneType' object has no attribute 'url'
Is there a way to modify the approach which I have taken in order to simplify each of the fixtures? Or, is what I am asking out of the capabilities of pytest, and do I just have to fully write out each fixture?
Similar Questions:
Below I added a few Stack Overflow questions which appear when the title of my question is searched. For each one I also included the reason why its answers would not be an ideal solution for my issue.
Multiple copies of a pytest fixture: All the copies of the pytest fixture are for the same tests. In my case, I'm expecting to have a separate fixture for each test.
Run a test with two different pytest fixtures: While there are multiple fixtures, each test will (likely) only have one fixture decorating the test.
Edit: I'm not able to run this code myself, as my Chromium setup seems busted at this time.
I would opt for using pytest's parametrize functionality to do what you expect.
First, I would create my fixtures:
@pytest.fixture(scope='module')
def browser():
    with sync_playwright() as play:
        browser = play.chromium.launch()
        try:
            yield browser
        finally:
            # Ensure the browser is gracefully closed at end of test
            browser.close()

@pytest.fixture
def page(browser, url):
    # Provide the page
    page = browser.new_page()
    page.goto(url)
    return page
Note that no url fixture is defined; the page fixture receives url from the test's parametrization.
Next, I create one test with the url (and expected url) parametrized:
@pytest.mark.parametrize('url, expected_url', [
    ('https://google.com/', 'https://google.com/'),
    ('https://stackoverflow.com/', 'https://stackoverflow.com/'),
    ('https://github.com/', 'https://github.com/'),
])
def test_selectors(page, expected_url):
    assert page.url == expected_url
Alternatively, if the tests for the web sites are slightly different, you can have three separate tests, each having a single parametrized entry.
@pytest.mark.parametrize('url, expected_url', [
    ('https://google.com/', 'https://google.com/'),
])
def test_google_selector(page, expected_url):
    assert page.url == expected_url

@pytest.mark.parametrize('url, expected_url', [
    ('https://stackoverflow.com/', 'https://stackoverflow.com/'),
])
def test_stackoverflow_selector(page, expected_url):
    assert page.url == expected_url
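As for why the decorator attempt in the question ended up with 'NoneType' object has no attribute 'url': calling a generator function only creates a generator object, so func(page) inside the wrapper never runs the fixture body, and the wrapper itself returns nothing. The plain-Python sketch below illustrates this without pytest or Playwright; `resource`, `broken` and `fixed` are hypothetical names, with a dict standing in for the browser page.

```python
def broken(func):
    def wrapper():
        resource = {"open": True}
        func(resource)            # creates a generator object and discards it
        resource["open"] = False
    return wrapper                # calling wrapper() returns None


def fixed(func):
    def wrapper():
        resource = {"open": True}
        try:
            # Delegate to the wrapped generator so its yielded value
            # is passed through and its cleanup code still runs.
            yield from func(resource)
        finally:
            resource["open"] = False
    return wrapper


@broken
def broken_fixture(resource):
    yield resource


@fixed
def working_fixture(resource):
    yield resource


broken_value = broken_fixture()          # None, like the fixtures in the question

gen = working_fixture()
working_value = next(gen)                # the resource, still open
was_open_during_test = working_value["open"]
next(gen, None)                          # resume the generator to trigger cleanup
```

Whether a wrapper written like `fixed` plays well with pytest's fixture machinery in your setup is something I have not verified, so the parametrized fixtures above are probably still the simpler route.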

Scrapy with multiple Selenium instances (parallel)

I need to scrape many URLs with Selenium and Scrapy. To speed up the whole process, I'm trying to create a pool of shared Selenium instances. My idea is to have a set of parallel Selenium instances that any Request can take when needed and release when done.
I tried to create a middleware, but the problem is that the middleware is sequential: I can see the drivers (I call them browsers) loading URLs one after another. I want all drivers to work in parallel.
class ScrapySpiderDownloaderMiddleware(object):
    BROWSERS_COUNT = 10

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.free_browsers = set(
            [webdriver.Chrome(executable_path=BASE_DIR + '/chromedriver') for x in range(self.BROWSERS_COUNT)])

    def get_free_browser(self):
        while True:
            try:
                return self.free_browsers.pop()
            except KeyError:
                time.sleep(0.1)

    def release_browser(self, browser):
        self.free_browsers.add(browser)

    def process_request(self, request, spider):
        browser = self.get_free_browser()
        browser.get(request.url)
        body = str.encode(browser.page_source)
        self.release_browser(browser)
        # Expose the driver via the "meta" attribute
        request.meta.update({'browser': browser})
        return HtmlResponse(
            browser.current_url,
            body=body,
            encoding='utf-8',
            request=request
        )
I don't like solutions where you do:
driver.get(response.url)
in parse method because it causes redundant requests. Every url is being requested two times which I need to avoid.
For example this https://stackoverflow.com/a/17979285/2607447
Do you know what to do?
I suggest you look into Scrapy + Docker. You can run many instances at once.
As @Granitosaurus suggested, Splash is a good choice. I personally used Scrapy-splash: Scrapy takes care of parallel processing and Splash takes care of website rendering, including JavaScript execution.
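If you do want to stay with a pool of browsers, note that process_request in the question blocks on browser.get(), so only one request is handled at a time. The sketch below shows the general "take a free browser, use it, give it back" pattern with a blocking queue and worker threads. It is not wired into Scrapy at all, and `FakeBrowser`, `fetch` and the example.com URLs are hypothetical stand-ins so that it runs without Selenium.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class FakeBrowser:
    """Hypothetical stand-in for a Selenium driver."""

    def __init__(self, name):
        self.name = name

    def render(self, url):
        return f"<html>{url} rendered by {self.name}</html>"


POOL_SIZE = 3

# queue.Queue is thread-safe, and get() blocks until a browser is free,
# so no sleep/retry loop is needed.
pool = queue.Queue()
for i in range(POOL_SIZE):
    pool.put(FakeBrowser(f"browser-{i}"))


def fetch(url):
    browser = pool.get()
    try:
        return browser.render(url)
    finally:
        # Always hand the browser back, even if rendering raised.
        pool.put(browser)


urls = [f"https://example.com/{n}" for n in range(10)]
with ThreadPoolExecutor(max_workers=POOL_SIZE) as executor:
    # executor.map returns results in the same order as the input URLs.
    pages = list(executor.map(fetch, urls))
```

With real drivers each worker thread would spend its time waiting on the browser, which is where the parallelism comes from.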

How can I more accurately report Pytest Selenium passes and failures to SauceLabs?

Trying to find the most elegant way to inform the test fixture of a test failure. This test fixture needs to report the results of the test to saucelabs in order to mark it as pass or fail. I've tried to delete as much irrelevant code from these examples as possible.
The following test uses the fixture browser.
def test_9(browser):
    browser.get(MY_CONSTANT)
    assert "My Page" in browser.title
    browser.find_element_by_css_selector('div > img.THX_IP')
    browser.find_element_by_link_text('Some text').click()
    ... etc
The fixture browser, which currently is hard coded to mark the test as passed:
@pytest.fixture()
def browser(request):
    driver_type = request.config.getoption('driver')
    if driver_type == 'sauce':
        driver = webdriver.Remote(
            command_executor='MY_CREDENTIALS',
            desired_capabilities=caps)
    else:
        driver = webdriver.Chrome()
    driver.implicitly_wait(2)
    yield driver
    if driver_type == 'sauce':
        sauce_client.jobs.update_job(driver.session_id, passed=True)
    driver.quit()
I've discovered a few workarounds but I'd really like to know the best way to do it.
I have the following fixture in my conftest.py file which handles all of the pass/fail reporting to sauce. I am not sure if it's "the best way to do it" but it certainly works for us at the moment. We include this fixture for every test that we write.
There's considerably more to this fixture but this section contained at the bottom handles the report really well.
def quit():
    try:
        if config.host == "saucelabs":
            if request.node.result_call.failed:
                driver_.execute_script("sauce:job-result=failed")
            elif request.node.result_call.passed:
                driver_.execute_script("sauce:job-result=passed")
    finally:
        driver_.quit()
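Note that request.node.result_call is not something pytest provides out of the box, so the fixture above presumably relies on a hook elsewhere in conftest.py that stores the test report on the test item. A minimal sketch of such a hook follows, modelled on the pattern in the pytest documentation (which uses a "rep_" prefix); the "result_" prefix here is an assumption chosen to match the attribute name used above.

```python
import pytest


@pytest.hookimpl(tryfirst=True, hookwrapper=True)
def pytest_runtest_makereport(item, call):
    # Let pytest and any other plugins build the report for this phase first.
    outcome = yield
    report = outcome.get_result()
    # report.when is "setup", "call" or "teardown". Store each report on the
    # test item so fixtures can read request.node.result_call during teardown.
    setattr(item, "result_" + report.when, report)
```

If the setup phase fails, no "call" report is produced, so it may be safer to read the attribute with getattr(request.node, "result_call", None) in the fixture.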

pytest exception none type object is not callable

In test1.py I have below code
@pytest.fixture(scope="session")
def moduleSetup(request):
    module_setup = Module_Setup()
    request.addfinalizer(module_setup.teardown())
    return module_setup

def test_1(moduleSetup):
    print moduleSetup
    print '...'
    #assert 0

# def test_2(moduleSetup):
#     print moduleSetup
#     print '...'
#     #assert 0
And in conftest.py I have
class Module_Setup:
    def __init__(self):
        self.driver = webdriver.Firefox()

    def teardown(self):
        self.driver.close()
When I run it launches and closes browser.
But I also get error self = <CallInfo when='teardown' exception: 'NoneType' object is not callable>, func = <function <lambda> at 0x104580488>, when = 'teardown'
Also, if I want to run both test_1 and test_2 with the same driver object, do I need to use module or session scope?
Regarding the exception
When using request.addfinalizer(), you have to pass in a reference to a function.
Your code passes the result of calling that function (which is None) instead:
request.addfinalizer(module_setup.teardown())
You should call it this way:
request.addfinalizer(module_setup.teardown)
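The difference can be seen without pytest. In this small sketch, `addfinalizer`, `finalizers` and `ModuleSetup` are hypothetical stand-ins: the registry just stores whatever it is given and calls it later, which is roughly what happens at fixture teardown.

```python
finalizers = []


def addfinalizer(func):
    """Hypothetical stand-in for request.addfinalizer: store a callable for later."""
    finalizers.append(func)


class ModuleSetup:
    def __init__(self):
        self.closed = False

    def teardown(self):
        self.closed = True        # implicitly returns None


wrong = ModuleSetup()
addfinalizer(wrong.teardown())    # teardown runs right now; None gets registered
closed_too_early = wrong.closed

right = ModuleSetup()
addfinalizer(right.teardown)      # the bound method itself gets registered
closed_before_teardown = right.closed

# Later, at teardown time, every registered finalizer is called.
errors = []
for fin in finalizers:
    try:
        fin()
    except TypeError as exc:
        errors.append(str(exc))
```

This also explains why the browser in the question opens and closes immediately: teardown() was executed at registration time, and the later attempt to call None produced the reported error.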
Regarding fixture scope
If your fixture allows reuse across multiple test calls, use "session"
scope. If it allows reuse only for tests in one module, use "module" scope.
Alternative fixture solution
The way you use the fixtures is not really in pytest style; it rather resembles unittest.
From the code you show, it seems the only thing you need is a running Firefox with a driver you can use in your tests, which gets closed once you are done.
This can be accomplished by a single fixture:
@pytest.fixture(scope="session")
def firefox(request):
    driver = webdriver.Firefox()
    def fin():
        driver.close()
    request.addfinalizer(fin)
    return driver
or even better using @pytest.yield_fixture
@pytest.yield_fixture(scope="session")
def firefox(request):
    driver = webdriver.Firefox()
    yield driver
    driver.close()
The yield is the place where the fixture stops executing and hands the created value (driver) to the test cases.
After the tests are over (or rather, when the scope of the fixture ends), it continues with the instructions following the yield and does the cleanup work.
In all cases, you may then modify your test cases as follows:
def test_1(firefox):
    print firefox
    print '...'
and the moduleSetup fixture becomes completely obsolete.

Closing the webdriver instance automatically after the test failed

My English is very poor but I'll try my best to describe the problem I encountered.
I use Selenium WebDriver to test a web site, and my scripts are written in Python, so I use PyUnit (unittest).
I know that if my test suite raises no exceptions, the WebDriver instance is closed correctly (by the way, I use Chrome). However, once an exception is thrown, the script shuts down and I have to close Chrome manually.
How can I make sure that any remaining open WebDriver instances are also closed when the Python process quits?
By the way, I use the Page Object design pattern, and the code below is part of my script:
class personalcenter(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Chrome()
        self.page = personalCenter(self.driver,"admin","123456")

    def testAddWorkExp(self):
        blahblahblah...

    def tearDown(self):
        self.page.quit()
        self.driver.quit()

if __name__ == "__main__":
    unittest.main()
I have searched for a solution to this problem for a long time, but almost every answer depends on Java and JUnit or TestNG. How can I deal with this issue in PyUnit?
Thanks for every answer.
From the tearDown() documentation:
Method called immediately after the test method has been called and
the result recorded. This is called even if the test method raised an
exception, so the implementation in subclasses may need to be
particularly careful about checking internal state. Any exception
raised by this method will be considered an error rather than a test
failure. This method will only be called if the setUp() succeeds,
regardless of the outcome of the test method. The default
implementation does nothing.
So, the only case where tearDown will not be called is when setUp fails.
Therefore I would simply catch the exception inside setUp, close the driver, and re-raise it:
def setUp(self):
    self.driver = webdriver.Chrome()
    try:
        self.page = personalCenter(self.driver,"admin","123456")
    except Exception:
        self.driver.quit()
        raise
Hope this helps. You might need to close the driver, passing False as a boolean argument:
from selenium import webdriver
import time

class Firefoxweb(object):
    def __init__(self):
        print("this is to run the test")

    def seltest(self):
        driver=webdriver.Chrome(executable_path="C:\\Users\Admin\PycharmProjects\Drivers\chromedriver.exe")
        driver.get("URL")
        driver.find_element_by_xpath("//*[@id='btnSkip']").click()
        driver.find_element_by_xpath("//*[@id='userInput']").send_keys("ABC")
        driver.find_element_by_xpath("//*[@id='passwordInput']").send_keys("*******")
        driver.find_element_by_xpath("//*[@id='BtnLogin']").click()
        driver.close(False)

FF=Firefoxweb()
FF.seltest()
