I'm playing with OOP (the concept is totally new to me) in Python 3 and trying to access an attribute (a list) of one class from another class. Obviously I am doing something wrong, but I don't understand what.
from urllib import request
from bs4 import BeautifulSoup

class getUrl(object):
    def __init__(self):
        self.appList = []
        self.page = None

    def getPage(self, url):
        url = request.urlopen(url)
        self.page = url.read()
        url.close()

    def parsePage(self):
        soup = BeautifulSoup(self.page)
        for link in soup.find_all("a"):
            self.appList.append(link.get('href'))
        return (self.appList)

class getApp(object):
    def __init__(self):
        pass

    def selectApp(self):
        for i in getUrl.appList():
            return print(i)

a = getUrl()
a.getPage("http://somepage/page")
a.parsePage()
b = getApp()
b.selectApp()
And I get:
AttributeError: type object 'getUrl' has no attribute 'appList'
Your code seems to confuse classes with functions. Normally a function name is a verb (e.g. getUrl) because it represents an action. A class name is usually a noun, because it represents a class of objects rather than actions. For example, the following is closer to how I would expect to see classes being used:
from urllib import request
from bs4 import BeautifulSoup

class Webpage(object):
    def __init__(self, url):
        self.app_list = []
        url = request.urlopen(url)
        self.page = url.read()

    def parse(self):
        soup = BeautifulSoup(self.page)
        for link in soup.find_all("a"):
            self.app_list.append(link.get('href'))
        return self.app_list

class App(object):
    def __init__(self, webpage, number):
        self.webpage = webpage
        self.link = webpage.app_list[number]

my_webpage = Webpage("http://somepage/page")
my_webpage.parse()
selected_app = App(my_webpage, 1)
print(selected_app.link)
Note that we usually make an instance of a class (e.g. my_webpage) then access methods and properties of the instance rather than of the class itself. I don't know what you intend to do with the links found on the page, so it is not clear if these need their own class (App) or not.
You need to pass in the getUrl() instance; the attributes are not present on the class itself:
class getApp(object):
    def __init__(self):
        pass

    def selectApp(self, geturl_object):
        for i in geturl_object.appList:
            print(i)
(note the removed return as well; print() returns None and you'd exit the loop early).
and
b = getApp()
b.selectApp(a)
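The fix above can be sketched without any network access. The class names LinkCollector and AppSelector and the sample links are made up for illustration; the point is only that the list lives on the instance, so the instance has to be handed to whoever needs it.

```python
class LinkCollector:
    """Hypothetical stand-in for getUrl: the list is created per instance."""

    def __init__(self):
        self.app_list = []  # instance attribute, set when an object is created

    def add(self, href):
        self.app_list.append(href)


class AppSelector:
    """Hypothetical stand-in for getApp: receives the instance it should read."""

    def select(self, collector):
        # Read the attribute from the object that was passed in, not from the class.
        return list(collector.app_list)


collector = LinkCollector()
collector.add("/app/one")
collector.add("/app/two")

selected = AppSelector().select(collector)

# The class itself never gets the attribute, which is what the AttributeError reports.
class_has_attribute = hasattr(LinkCollector, "app_list")
```

Here selected holds the two links, while class_has_attribute is False because app_list only exists once __init__ has run on an object.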
The appList is a variable in an instance of the getUrl class. So you can only access it for each instance (object) of the getUrl class. The problem is here:
class getApp(object):
    def __init__(self):
        pass

    def selectApp(self):
        for i in getUrl.appList():
            return print(i)
Look at getUrl.appList(). Here you call the class, not an object. You might also want to look at the return print(i) statement.
Also consider using requests instead of urllib; it is more convenient.
I am building a parser in python that needs to:
Retrieve a stored HTML page from S3 based on an ID
Determine what type of parser to use based on header information in the HTML
Return some data from the HTML using the correct parser
How can I create an elegant structure where I request the data from S3 one time, determine what parser to use based on classes I have built, then return the appropriate result?
This is the structure I came up with to build the first parser:
# /parser.py
from gzip import decompress
from bs4 import BeautifulSoup
import requests

class Parser:
    def __init__(self, page_id):
        self.landing_page_endpoint = f"https://my_org.org/{page_id}"
        self.parser_name = None
        self.soup = self.get_soup()

    def get_html(self):
        r = requests.get(self.landing_page_endpoint)
        html = decompress(r.content)
        return html

    def get_soup(self):
        soup = BeautifulSoup(self.get_html(), "html.parser")
        return soup

    def parse(self):
        """Core method that returns authors and associated affiliations."""
        pass
# /parsers/gregory.py
import json
from parser import Parser

class Gregory(Parser):
    def __init__(self, doi):
        super().__init__(doi)
        self.parser_name = "gregory"

    def parse(self):
        my_parsed_info = 'asdf'
        return my_parsed_info
Then I call this with:
# views.py
from flask import jsonify, request
from parsers.gregory import Gregory
page_id = request.args.get('page_id')
g = Gregory(page_id)
result = g.parse()
return jsonify(result)
My idea is to add a method to each parser class I create such as 'detect_parser' that returns True if it is the correct parser. Then I can make a list of all the parser classes, and go through each one running that method until it is True.
The problem with my current setup is that it makes a request to S3 every time I instantiate a parser class, which is slow and unnecessary. Should I initialize the overall Parser class once, then pass it into each subclass?
You'll want to look into a way to register parsers as you create them in your code. I'd suggest learning about decorators.
Your classes would look something like:
class ParserController:
    def __init__(self):
        self.parsers = []

    def register(self, test):
        def decorated(cls):
            self.parsers.append((test, cls))
            return cls
        return decorated

    def parse(self, page_id):
        html = get_html(page_id)  # fetch the page once
        for test, cls in self.parsers:
            if test(html):
                parser = cls()
                break
        else:
            # no test matched, so there is no parser to call
            raise ValueError("no registered parser matches this page")
        return parser.parse(html)

controller = ParserController()

@controller.register(lambda html: ...)  # replace ... with the test for using the Gregory parser
class Gregory(Parser):
    def parse(self, html):
        pass  # do stuff

@controller.register(lambda html: ...)  # replace ... with the test for using the Mark parser
class Mark(Parser):
    def parse(self, html):
        pass  # do stuff
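A self-contained sketch of the registration idea, under some loud assumptions: the fetch function is a fake that reads from an in-memory dict instead of S3, the header checks are invented meta tags, and the parser classes take no constructor arguments (unlike the Parser base class in the question). It is meant to show the shape of the approach, not a finished design.

```python
class ParserController:
    def __init__(self, fetch_html):
        # fetch_html is a hypothetical callable: page_id -> HTML string
        self.fetch_html = fetch_html
        self.parsers = []

    def register(self, test):
        """Class decorator: remember (test, cls) and return the class unchanged."""
        def decorated(cls):
            self.parsers.append((test, cls))
            return cls
        return decorated

    def parse(self, page_id):
        html = self.fetch_html(page_id)  # the page is fetched exactly once
        for test, cls in self.parsers:
            if test(html):
                return cls().parse(html)
        raise ValueError(f"no parser registered for page {page_id!r}")


# Fake storage standing in for S3; the meta tags are made up for this sketch.
FAKE_PAGES = {
    "a": '<meta name="publisher" content="gregory"><p>page a</p>',
    "b": '<meta name="publisher" content="mark"><p>page b</p>',
}
fetch_calls = []


def fake_fetch(page_id):
    fetch_calls.append(page_id)
    return FAKE_PAGES[page_id]


controller = ParserController(fake_fetch)


@controller.register(lambda html: 'content="gregory"' in html)
class Gregory:
    def parse(self, html):
        return {"parser": "gregory", "length": len(html)}


@controller.register(lambda html: 'content="mark"' in html)
class Mark:
    def parse(self, html):
        return {"parser": "mark", "length": len(html)}


result_a = controller.parse("a")
result_b = controller.parse("b")
```

Each call to controller.parse fetches the page once and hands the same HTML to the detection tests and then to the chosen parser, which is the part that avoids repeated S3 requests.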
Is it possible to patch over a class instance variable and force it to return a different value each time that it's referenced? Specifically, I'm interested in doing this with the side_effect parameter.
I know that when patching over a method it is possible to assign a side_effect to a mock method. If you set the side_effect to be a list it will iterate through the list returning a different value each time it is called.
I would like to do the same thing with a class instance variable but cannot get it to work and I cannot see any documentation to suggest whether this is or is not possible
Example
from unittest.mock import patch

def run_test():
    myClass = MyClass()
    for i in range(2):
        print(myClass.member_variable)

class MyClass():
    def __init__(self):
        self.member_variable = None

@patch('test_me.MyClass.member_variable', side_effect=[1, 2], create=True)
def test_stuff(my_mock):
    run_test()
    assert False
Output
-------------- Captured stdout call ---------------------------------------------------------------------------------------------------------------------
None
None
Desired Output
-------------- Captured stdout call ---------------------------------------------------------------------------------------------------------------------
1
2
To be clear - I'm aware that I can wrap member_variable in a get_member_variable() method. That is not my question. I just want to know if you can patch a member variable with a side_effect.
side_effect can be a function, an iterable or an exception (https://docs.python.org/3/library/unittest.mock.html#unittest.mock.Mock.side_effect), and it only takes effect when the mock is called. Reading an attribute does not call anything, so I think that's the reason why it's not working.
Another way to test this would be:
>>> class Class:
...     member_variable = None
...
>>> with patch('__main__.Class') as MockClass:
...     instance = MockClass.return_value
...     instance.member_variable = 'foo'
...     assert Class() is instance
...     assert Class().member_variable == 'foo'
...
Here's the docs: https://docs.python.org/3/library/unittest.mock.html#unittest.mock.patch
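If the goal is a different value on each attribute read, one option in the standard library is unittest.mock.PropertyMock, which is called whenever the attribute is fetched. The sketch below makes an assumption that differs from the question: MyClass holds member_variable as a class-level attribute and does not assign it in __init__, because an assignment to a PropertyMock attribute also calls the mock and would consume one of the side_effect values. The helper names read_twice and demo are made up.

```python
from unittest.mock import PropertyMock, patch


class MyClass:
    # Assumption for this sketch: class-level attribute, no assignment in __init__.
    member_variable = None


def read_twice():
    obj = MyClass()
    return [obj.member_variable, obj.member_variable]


def demo():
    # Replace the class attribute with a PropertyMock for the duration of the block.
    with patch.object(MyClass, "member_variable", new_callable=PropertyMock) as mock_attr:
        mock_attr.side_effect = [1, 2]  # one value per attribute read
        return read_twice()


values = demo()
after_patch = MyClass().member_variable  # the original value is restored on exit
```

Because the mock sits on the class, it affects every instance while the patch is active.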
For the example you gave, I could not patch the attribute the way I had hoped, but your real class probably has more members, and the idea below may help.
Putting a side effect behind an attribute is not the best option, but it worked for what I needed.
PS: The example below is the code that brought me to your question.
Example:
# -*- coding: utf-8 -*-
# !/usr/bin/env python3
import requests
from src.metaclasses.singleton import Singleton
from src.services.logger import new_logger
from src.exceptions.too_many_retries import TooManyRetries
from src.exceptions.unavailable_url import UnavailableURL

LOG = new_logger(__name__)

class PostbackService(metaclass=Singleton):
    def __init__(self):
        self.session = requests.session()

    def make_request(self, method, url, headers, data=None, retry=0):
        r = self.session.request(method, url, data=data, headers=headers)
        if r.status_code != 200:
            if retry < 3:
                return self.make_request(method, url, headers, data, retry + 1)
            message = f"Error performing request for url: {url}"
            LOG.error(message)
            raise TooManyRetries(message)
        return r.json()
Test:
# -*- coding: utf-8 -*-
# !/usr/bin/env python3
from unittest import TestCase
from unittest.mock import patch, MagicMock
from src.services.postback import PostbackService
from src.exceptions.too_many_retries import TooManyRetries
from src.exceptions.unavailable_url import UnavailableURL

class TestPostbackService(TestCase):
    @patch("src.services.postback.requests")
    def setUp(self, mock_requests) -> None:
        mock_requests.return_value = MagicMock()
        self.pb = PostbackService()

    def test_make_request(self):
        self.pb.session.request.return_value = MagicMock()
        url = "http://mock.io"
        header = {"mock-header": "mock"}
        data = {"mock-data": "mock"}
        mock_json = {"mock-json": "mock"}

        def _def_mock(value):
            """
            Returns a MagicMock with the status code changed for each request, so you can test the retry behavior of the application.
            """
            mock = MagicMock()
            mock.status_code = value
            mock.json.return_value = mock_json
            return mock

        self.pb.session.request.side_effect = [
            _def_mock(403),
            _def_mock(404),
            _def_mock(200),
        ]
        self.assertEqual(self.pb.make_request("GET", url, header, data), mock_json)

        self.pb.session.request.side_effect = [
            _def_mock(403),
            _def_mock(404),
            _def_mock(404),
            _def_mock(404),
        ]
        with self.assertRaises(TooManyRetries):
            self.pb.make_request("GET", url, header, data)
As you can see, I build a new MagicMock for each response and put them in the side_effect list of the request method, so each call returns the status code I want. It is not beautiful or especially Pythonic code, but it worked as expected.
I based this MagicMock setup on the unittest documentation link that @rsarai posted.
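The same retry test can be sketched in a form that runs on its own. Everything here is a simplified stand-in: make_request is a plain function rather than a method of PostbackService, TooManyRetries is redefined locally, and the session is a bare MagicMock instead of a patched requests session.

```python
from unittest.mock import MagicMock


class TooManyRetries(Exception):
    """Local stand-in for the exception imported in the answer's code."""


def make_request(session, method, url, retry=0, max_retries=3):
    # Simplified version of the retry logic: retry on any non-200 status.
    r = session.request(method, url)
    if r.status_code != 200:
        if retry < max_retries:
            return make_request(session, method, url, retry + 1, max_retries)
        raise TooManyRetries(f"Error performing request for url: {url}")
    return r.json()


def fake_response(status_code, payload=None):
    """Build a mock response with a fixed status code and JSON payload."""
    resp = MagicMock()
    resp.status_code = status_code
    resp.json.return_value = payload
    return resp


def run_with(responses):
    session = MagicMock()
    # side_effect as a list: each call to session.request returns the next item.
    session.request.side_effect = responses
    result = make_request(session, "GET", "http://mock.io")
    return result, session.request.call_count


ok_result, ok_calls = run_with(
    [fake_response(403), fake_response(404), fake_response(200, {"ok": True})]
)
```

The list works here because session.request is called once per attempt, and a call is what advances side_effect.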
As it was unclear earlier I am posting this scenario:
class Scraper:
    def __init__(self, url):
        self.start_page = url

    def parse_html(self):
        pass

    def get_all_links(self):
        pass

    def run(self):
        # parse html, get all links, parse them and when done...
        return links
Now in a task queue like rq
from rq import Queue
from worker import conn
q = Queue(connection=conn)
result = q.enqueue(what_function, 'http://stackoverflow.com')
I want to know what this what_function would be? I remembered Django does something similar with their CBVs so I used that analogy but it wasn't so clear.
I have a class like
class A:
    def run(self, arg):
        pass  # do something
I need to pass this to a task queue, so I can do something like
a = A()
b = a.run
# q is the queue object
q.enqueue(b,some_arg)
I'd like to know what other ways there are to do this. For example, Django does it in its class-based views,
class YourListView(ListView):
    # code for your view
which is eventually passed as a function
your_view = YourListView.as_view()
How is it done?
Edit: to elaborate, Django's class-based views are converted to functions because the URL pattern expects a function as its argument. Similarly, you might have a function which accepts the following arguments
def task_queue(callback_function, *parameters):
    # add to queue and return result when done
    ...
but the functionality of callback_function might have been mostly implemented in a class, which has a run() method through which the process is run.
I think you're describing a classmethod:
class MyClass(object):
    @classmethod
    def as_view(cls):
        '''method intended to be called on the class, not an instance'''
        return cls(instantiation, args)
which could be used like this:
call_later = MyClass.as_view
and later called:
call_later()
Most frequently, class methods are used to instantiate a new instance, for example, dict's fromkeys classmethod:
dict.fromkeys(['foo', 'bar'])
returns a new dict instance:
{'foo': None, 'bar': None}
Update
In your example,
result = q.enqueue(what_function, 'http://stackoverflow.com')
you want to know what what_function could be. I saw a very similar example on the RQ home page. It has to be your own implementation: something the queue can call to run your code. It's only going to be called once with that argument, so if you want to use the Scraper class itself as the what_function replacement, your __init__ should look more like this:
class Scraper:
    def __init__(self, url):
        self.start_page = url
        self.run()
    # etc...
If you want to use a class method, that might look like this:
class Scraper:
    def __init__(self, url):
        self.start_page = url

    def parse_html(self):
        pass

    def get_all_links(self):
        pass

    @classmethod
    def run(cls, url):
        instance = cls(url)
        # parse html, get all links, parse them and when done...
        return links
And then your what_function would be Scraper.run.
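A runnable sketch of both ways of handing class-based work to something that expects a plain callable. The enqueue function here is a hypothetical stand-in that simply calls its argument right away; it is not RQ's API, and the links returned by get_all_links are placeholders rather than real scraping results.

```python
class Scraper:
    def __init__(self, url):
        self.start_page = url

    def get_all_links(self):
        # Placeholder: a real implementation would fetch and parse self.start_page.
        return [self.start_page + "/a", self.start_page + "/b"]

    @classmethod
    def run(cls, url):
        # Build the instance inside the call, so the caller only needs the class.
        instance = cls(url)
        return instance.get_all_links()


def enqueue(func, *args):
    """Hypothetical stand-in for a task queue: calls func(*args) immediately."""
    return func(*args)


# Option 1: pass the classmethod; the URL travels as an ordinary argument.
result_from_classmethod = enqueue(Scraper.run, "http://example.com")

# Option 2: pass a bound method of an instance that was created up front.
result_from_bound_method = enqueue(Scraper("http://example.com").get_all_links)
```

Both Scraper.run and a bound method are ordinary callables, so either can be stored and invoked later; whether a particular queue library can serialize them for a worker process is a separate question to check in its documentation.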
I am using the HTMLParser library of Python 2.7 to process and extract some information from HTML content that was fetched from a remote URL. I do not quite understand how to detect the exact moment when the parser instance finishes parsing the HTML data.
The basic implementation of my parser class looks like this:
class MyParser(HTMLParser.HTMLParser):
    def __init__(self, url):
        self.url = url
        self.users = set()

    def start(self):
        self.reset()
        response = urllib3.PoolManager().request('GET', self.url)
        if not str(response.status).startswith('2'):
            raise urllib3.HTTPError('HTTP error here..')
        self.feed(response.data.decode('utf-8'))

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            attrs = dict(attrs)
            if attrs.get('class') == 'js_userPictureOuterOnRide':
                user = attrs.get("data-name")
                if user:
                    self.users.add(user)

    def reset(self):
        HTMLParser.HTMLParser.reset(self)
        self.users.clear()
My question is, how can I detect that parsing process is finished?
Thanks.
HTMLParser is synchronous, that is, once it returns from feed, all data so far has been parsed and all callbacks called.
self.feed(response.data.decode('utf-8'))
print 'ready!'
(if I misunderstood your question, please let me know).
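A small sketch of the same point using Python 3's html.parser (the question uses the Python 2.7 module, so this is a port, not the original code). The HTML string is made up, and there is no network request; the results are simply available once feed() returns, and close() asks the parser to process anything still buffered.

```python
from html.parser import HTMLParser


class UserParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.users = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "js_userPictureOuterOnRide":
            user = attrs.get("data-name")
            if user:
                self.users.add(user)


def parse_users(html):
    parser = UserParser()
    parser.feed(html)   # returns only after the callbacks for this data have run
    parser.close()      # process any remaining buffered data
    return parser.users  # safe to read here: parsing is finished


# Made-up sample markup for the sketch.
SAMPLE = (
    '<div class="js_userPictureOuterOnRide" data-name="alice"></div>'
    '<div class="other" data-name="bob"></div>'
)
users = parse_users(SAMPLE)
```

So the line after feed() (or after close(), if the input might end mid-tag) is the "parsing finished" moment.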
I am writing a unit test to determine if an attribute is properly set during the instantiation of my parser object. Unfortunately, the only way that I can think of to do it is to use self.assertTrue(p.soup)
I haven't slung any Python in a while, but that doesn't seem like a very clear way to check that the instance attribute was properly set. Any ideas on how to improve it?
Here is my test class:
class ParserTest(unittest.TestCase):
    def setUp(self):
        self.uris = ctd.Url().return_urls()
        self.uri = self.uris['test']

    def test_create_soup(self):
        p = ctd.Parser(self.uri)
        self.assertTrue(p.soup)

if __name__ == '__main__':
    unittest.main()
    # suite = unittest.TestLoader().loadTestsFromTestCase(UrlTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
Here is my Parser class that I am testing
class Parser():
    def __init__(self, uri):
        self.uri = uri
        self.soup = self.createSoup()

    def createSoup(self):
        htmlPage = urlopen(self.uri)
        htmlText = htmlPage.read()
        self.soup = BeautifulSoup(htmlText)
        return BeautifulSoup(htmlText)
I got into the bad habit over the past few years of not unit testing, so I am fairly new to the topic. Any good resources for an in-depth explanation of unit testing in Python would be appreciated. I looked at the standard library unittest documentation, but that really didn't help much...
If the p.soup attribute needs to be an instance of BeautifulSoup, you can check its type explicitly:
self.assertIsInstance(p.soup, BeautifulSoup)
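To illustrate why the type check is the stronger assertion, here is a sketch that runs without bs4 or a network connection. FakeSoup is a made-up stand-in for BeautifulSoup, and this Parser takes markup and a soup_factory argument purely so the example is self-contained; neither is part of the code in the question.

```python
import unittest


class FakeSoup:
    """Hypothetical stand-in for BeautifulSoup so the sketch needs no third-party package."""

    def __init__(self, markup):
        self.markup = markup


class Parser:
    def __init__(self, markup, soup_factory=FakeSoup):
        # soup_factory is an assumption of this sketch, used to swap in a wrong type.
        self.soup = soup_factory(markup)


class ParserTest(unittest.TestCase):
    def test_soup_has_expected_type(self):
        p = Parser("<p>hi</p>")
        self.assertIsInstance(p.soup, FakeSoup)

    def test_truthiness_is_a_weaker_check(self):
        # A plain non-empty string is truthy, so assertTrue would not notice the mistake.
        p = Parser("<p>hi</p>", soup_factory=str)
        self.assertTrue(p.soup)
        with self.assertRaises(AssertionError):
            self.assertIsInstance(p.soup, FakeSoup)


def run_tests():
    suite = unittest.TestLoader().loadTestsFromTestCase(ParserTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() runs both cases; the second one shows a wrong value that passes assertTrue but is rejected by assertIsInstance.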