My question concerns a class that I am writing that may or may not be fully initialized. The basic goal is to take a match_id and open the corresponding match_url (example: http://dota2lounge.com/match?m=1899) and then grab some properties out of the webpage. The problem is some match_ids will result in 404 pages (http://dota2lounge.com/404).
When this happens, there won't be a way to determine the winner of the match, so the rest of the Match can't be initialized. I have seen this causing problems with methods of the Match, so I added the lines to initialize everything to None if self._valid_url is False. This works in principal, but then I'm adding a line each time a new attribute is added, and it seems prone to errors down the pipeline (in methods, etc.) It also doesn't alert the user that this class wasn't properly initialized. They would need to call .is_valid_match() to determine that.
tl;dr: What is the best way to handle classes that may be only partially initiated? Since this is a hobby project and I'm looking to learn, I'm open to pretty much any solutions (trying new things), including other classes or whatever. Thanks.
This is an abbreviated version of the code containing the relevant portions (Python 3.3):
from urllib.request import urlopen
from bs4 import BeautifulSoup
class Match(object):
def __init__(self, match_id):
self.match_id = match_id
self.match_url = self.__determine_match_url__()
self._soup = self.__get_match_soup__()
self._valid_match_url = self.__determine_match_404__()
if self._valid_match_url:
self.teams, self.winner = self.__get_teams_and_winner__()
# These lines were added, but I'm not sure if this is correct.
else:
self.teams, self.winner = None, None
def __determine_match_url__(self):
return 'http://dota2lounge.com/match?m=' + str(self.match_id)
def __get_match_soup__(self):
return BeautifulSoup(urlopen(self.match_url))
def __get_match_details__(self):
return self._soup.find('section', {'class': 'box'})
def __determine_match_404__(self):
try:
if self._soup.find('h1').text == '404':
return False
except AttributeError:
return True
def __get_teams_and_winner__(self):
teams = [team.getText() for team in
self._soup.find('section', {'class': 'box'}).findAll('b')]
winner = False
for number, team in enumerate(teams):
if ' (win)' in team:
teams[number] = teams[number].replace(' (win)', '')
winner = teams[number]
return teams, winner
def is_valid_match(self):
return all([self._valid_match_url, self.winner])
I would raise an exception, handle that in your creation code (wherever you call some_match = Match(match_id)), and probably don't add it to whatever list you may or may not be using...
For a better answer, you might want to include in your question the code that instantiates all your Match objects.
Related
I'm trying to extract values from JSON input using python. There are many tag that I need to extract and not all JSON files have the same structure as the sources are multiple. Sometimes there is a possibility that a tag might be missing. So, a KeyError is bound to happen. If a tag is missing then respective variable will by default be None and it will be returned as a list (members) to the main call.
I tried calling a function to pass each tags into an individual try/except. But, I get hit by an error on the function call itself where the tag is being passed. So, instead I tried the below code but it skips any subsequent lines even if the tags are present. Is there a better way to do this?
def extract(self):
try:
self.data_version = self.data['meta']['data_version']
self.created = self.data['meta']['created']
self.revision = self.data['meta']['revision']
self.gender = self.data['info']['gender']
self.season = self.data['info']['season']
self.team_type = self.data['info']['team_type']
self.venue = self.data['info']['venue']
status = True
except KeyError:
status = False
members = [attr for attr in dir(self) if
not callable(getattr(self, attr)) and not attr.startswith("__") and getattr(self, attr) is None]
return status, members
UPDATED:
Thanks Barmar & John! .get() worked really well.
I'm doing some RESTful API calls to an outside department and have written various functions (similar to the snippet below) that handle this based on what info I'm needing (e.g. "enrollment", "person", etc.). Now I'm left wondering if it wouldn't be more pythonic to put this inside of a class, which I believe would then make it easier to do processing such as "has_a_passing_grade", etc. and pass that out as an attribute or something when the class is instantiated.
Is there a standard way of doing this? Is it as easy as creating a class, somehow building the api_url as I'm doing below, call the api, parse and format the data, build a dict or something to return, and be done? And how would the call to such a class look? Does anyone have some example code similar to this that can be shared?
Thanks, in advance, for any help!
from django.utils import simplejson
try:
api_url = get_api_url(request, 'enrollment', person_id)
enrollment = call_rest_stop(key, secret, 'GET', api_url)
enrollment_raw = enrollment.read()
if enrollment_raw == '' or None:
return 'error encountered', ''
enrollment_recs = simplejson.loads(enrollment_raw)
# now put it in a dict
for enrollment in enrollment_recs:
coursework_dict = {
'enrollment_id': enrollment['id'],
...,
}
coursework_list.append(coursework_dict)
cola_enrollment.close()
except Exception, exception:
return 'Error: ' + str(exception), ''
So, let's say you want your API's users to call your API like so:
student_history, error_message = get_student_history(student_id)
You could then just wrap the above in that function:
from django.utils import simplejson
def get_student_history(person_id)
try:
api_url = get_api_url(request, 'enrollment', person_id)
enrollment = call_rest_stop(key, secret, 'GET', api_url)
enrollment_raw = enrollment.read()
if enrollment_raw == '' or None:
return [], 'Got empty enrollment response'
enrollment_recs = simplejson.loads(enrollment_raw)
# now put it in a dict
for enrollment in enrollment_recs:
coursework_dict = {
'enrollment_id': enrollment['id'],
...,
}
coursework_list.append(coursework_dict)
cola_enrollment.close()
return coursework_list, None
except Exception as e:
return [], str(exception)
You could also use a class, but keep in mind that you should only do that if there would be methods that those using your API would benefit from having. For example:
class EnrollmentFetcher(object):
def __init__(person_id):
self.person_id = person_id
def fetch_data(self):
self.coursework_list, self.error_message = get_student_history(self.person_id)
def has_coursework(self):
return len(self.coursework_list) > 0
fetcher = EnrollmentFetcher(student_id)
fetcher.fetch_data()
if fetcher.has_coursework():
# Do something
Object-oriented programming is neither a good practice nor a bad one. You should choose to use it if it serves your needs in any particular case. In this case, it could help clarify your code (has_coursework is a bit clearer than checking if a list is empty, for example), but it may very well do the opposite.
Side note: Be careful about catching such a broad exception. Are you really okay with continuing if it's an OutOfMemory error, for example?
[EDIT: I'm running Python 2.7.3]
I'm a network engineer by trade, and I've been hacking on ncclient (the version on the website is old, and this was the version I've been working off of) to make it work with Brocade's implementation of NETCONF. There are some tweaks that I had to make in order to get it to work with our Brocade equipment, but I had to fork off the package and make tweaks to the source itself. This didn't feel "clean" to me so I decided I wanted to try to do it "the right way" and override a couple of things that exist in the package*; three things specifically:
A "static method" called build() which belongs to the HelloHandler class, which itself is a subclass of SessionListener
The "._id" attribute of the RPC class (the original implementation used uuid, and Brocade boxes didn't like this very much, so in my original tweaks I just changed this to a static value that never changed).
A small tweak to a util function that builds XML filter attributes
So far I have this code in a file brcd_ncclient.py:
#!/usr/bin/env python
# hack on XML element creation and create a subclass to override HelloHandler's
# build() method to format the XML in a way that the brocades actually like
from ncclient.xml_ import *
from ncclient.transport.session import HelloHandler
from ncclient.operations.rpc import RPC, RaiseMode
from ncclient.operations import util
# register brocade namespace and create functions to create proper xml for
# hello/capabilities exchange
BROCADE_1_0 = "http://brocade.com/ns/netconf/config/netiron-config/"
register_namespace('brcd', BROCADE_1_0)
brocade_new_ele = lambda tag, ns, attrs={}, **extra: ET.Element(qualify(tag, ns), attrs, **extra)
brocade_sub_ele = lambda parent, tag, ns, attrs={}, **extra: ET.SubElement(parent, qualify(tag, ns), attrs, **extra)
# subclass RPC to override self._id to change uuid-generated message-id's;
# Brocades seem to not be able to handle the really long id's
class BrcdRPC(RPC):
def __init__(self, session, async=False, timeout=30, raise_mode=RaiseMode.NONE):
self._id = "1"
return super(BrcdRPC, self).self._id
class BrcdHelloHandler(HelloHandler):
def __init__(self):
return super(BrcdHelloHandler, self).__init__()
#staticmethod
def build(capabilities):
hello = brocade_new_ele("hello", None, {'xmlns':"urn:ietf:params:xml:ns:netconf:base:1.0"})
caps = brocade_sub_ele(hello, "capabilities", None)
def fun(uri): brocade_sub_ele(caps, "capability", None).text = uri
map(fun, capabilities)
return to_xml(hello)
#return super(BrcdHelloHandler, self).build() ???
# since there's no classes I'm assuming I can just override the function itself
# in ncclient.operations.util?
def build_filter(spec, capcheck=None):
type = None
if isinstance(spec, tuple):
type, criteria = spec
# brocades want the netconf prefix on subtree filter attribute
rep = new_ele("filter", {'nc:type':type})
if type == "xpath":
rep.attrib["select"] = criteria
elif type == "subtree":
rep.append(to_ele(criteria))
else:
raise OperationError("Invalid filter type")
else:
rep = validated_element(spec, ("filter", qualify("filter")),
attrs=("type",))
# TODO set type var here, check if select attr present in case of xpath..
if type == "xpath" and capcheck is not None:
capcheck(":xpath")
return rep
And then in my file netconftest.py I have:
#!/usr/bin/env python
from ncclient import manager
from brcd_ncclient import *
manager.logging.basicConfig(filename='ncclient.log', level=manager.logging.DEBUG)
# brocade server capabilities advertising as 1.1 compliant when they're really not
# this will stop ncclient from attempting 1.1 chunked netconf message transactions
manager.CAPABILITIES = ['urn:ietf:params:netconf:capability:writeable-running:1.0', 'urn:ietf:params:netconf:base:1.0']
# BROCADE_1_0 is the namespace defined for netiron configs in brcd_ncclient
# this maps to the 'brcd' prefix used in xml elements, ie subtree filter criteria
with manager.connect(host='hostname_or_ip', username='username', password='password') as m:
# 'get' request with no filter - for brocades just shows 'show version' data
c = m.get()
print c
# 'get-config' request with 'mpls-config' filter - if no filter is
# supplied with 'get-config', brocade returns nothing
netironcfg = brocade_new_ele('netiron-config', BROCADE_1_0)
mplsconfig = brocade_sub_ele(netironcfg, 'mpls-config', BROCADE_1_0)
filterstr = to_xml(netironcfg)
c2 = m.get_config(source='running', filter=('subtree', filterstr))
print c2
# so far it only looks like the supported filters for 'get-config'
# operations are: 'interface-config', 'vlan-config' and 'mpls-config'
Whenever I run my netconftest.py file, I get timeout errors because in the log file ncclient.log I can see that my subclass definitions (namely the one that changes the XML for hello exchange - the staticmethod build) are being ignored and the Brocade box doesn't know how to interpret the XML that the original ncclient HelloHandler.build() method is generating**. I can also see in the generated logfile that the other things I'm trying to override are also being ignored, like the message-id (static value of 1) as well as the XML filters.
So, I'm kind of at a loss here. I did find this blog post/module from my research, and it would appear to do exactly what I want, but I'd really like to be able to understand what I'm doing wrong via doing it by hand, rather than using a module that someone has already written as an excuse to not have to figure this out on my own.
*Can someone explain to me if this is "monkey patching" and is actually bad? I've seen in my research that monkey patching is not desirable, but this answer and this answer are confusing me quite a bit. To me, my desire to override these bits would prevent me from having to maintain an entire fork of my own ncclient.
**To give a little more context, this XML, which ncclient.transport.session.HelloHandler.build() generates by default, the Brocade box doesn't seem to like:
<?xml version='1.0' encoding='UTF-8'?>
<nc:hello xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
<nc:capabilities>
<nc:capability>urn:ietf:params:netconf:base:1.0</nc:capability>
<nc:capability>urn:ietf:params:netconf:capability:writeable-running:1.0</nc:capability>
</nc:capabilities>
</nc:hello>
The purpose of my overridden build() method is to turn the above XML into this (which the Brocade does like:
<?xml version="1.0" encoding="UTF-8"?>
<hello xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
<capabilities>
<capability>urn:ietf:params:netconf:base:1.0</capability>
<capability>urn:ietf:params:netconf:capability:writeable-running:1.0</capability>
</capabilities>
</hello>
So it turns out that the "meta info" should not have been so hastily removed, because again, it's difficult to find answers to what I'm after when I don't fully understand what I want to ask. What I really wanted to do was override stuff in a package at runtime.
Here's what I've changed brcd_ncclient.py to (comments removed for brevity):
#!/usr/bin/env python
from ncclient import manager
from ncclient.xml_ import *
brcd_new_ele = lambda tag, ns, attrs={}, **extra: ET.Element(qualify(tag, ns), attrs, **extra)
brcd_sub_ele = lambda parent, tag, ns, attrs={}, **extra: ET.SubElement(parent, qualify(tag, ns), attrs, **extra)
BROCADE_1_0 = "http://brocade.com/ns/netconf/config/netiron-config/"
register_namespace('brcd', BROCADE_1_0)
#staticmethod
def brcd_build(capabilities):
hello = brcd_new_ele("hello", None, {'xmlns':"urn:ietf:params:xml:ns:netconf:base:1.0"})
caps = brcd_sub_ele(hello, "capabilities", None)
def fun(uri): brcd_sub_ele(caps, "capability", None).text = uri
map(fun, capabilities)
return to_xml(hello)
def brcd_build_filter(spec, capcheck=None):
type = None
if isinstance(spec, tuple):
type, criteria = spec
# brocades want the netconf prefix on subtree filter attribute
rep = new_ele("filter", {'nc:type':type})
if type == "xpath":
rep.attrib["select"] = criteria
elif type == "subtree":
rep.append(to_ele(criteria))
else:
raise OperationError("Invalid filter type")
else:
rep = validated_element(spec, ("filter", qualify("filter")),
attrs=("type",))
if type == "xpath" and capcheck is not None:
capcheck(":xpath")
return rep
manager.transport.session.HelloHandler.build = brcd_build
manager.operations.util.build_filter = brcd_build_filter
And then in netconftest.py:
#!/usr/bin/env python
from brcd_ncclient import *
manager.logging.basicConfig(filename='ncclient.log', level=manager.logging.DEBUG)
manager.CAPABILITIES = ['urn:ietf:params:netconf:capability:writeable-running:1.0', 'urn:ietf:params:netconf:base:1.0']
with manager.connect(host='host', username='user', password='password') as m:
netironcfg = brcd_new_ele('netiron-config', BROCADE_1_0)
mplsconfig = brcd_sub_ele(netironcfg, 'mpls-config', BROCADE_1_0)
filterstr = to_xml(netironcfg)
c2 = m.get_config(source='running', filter=('subtree', filterstr))
print c2
This gets me almost to where I want to be. I still have to edit the original source code to change the message-id's from being generated with uuid1().urn because I haven't figured out or don't understand how to change an object's attributes before __init__ happens at runtime (chicken/egg problem?); here's the offending code in ncclient/operations/rpc.py:
class RPC(object):
DEPENDS = []
REPLY_CLS = RPCReply
def __init__(self, session, async=False, timeout=30, raise_mode=RaiseMode.NONE):
self._session = session
try:
for cap in self.DEPENDS:
self._assert(cap)
except AttributeError:
pass
self._async = async
self._timeout = timeout
self._raise_mode = raise_mode
self._id = uuid1().urn # Keeps things simple instead of having a class attr with running ID that has to be locked
Credit goes to this recipe on ActiveState for finally cluing me in on what I really wanted to do. The code I had originally posted I don't think was technically incorrect - if what I wanted to do was fork off my own ncclient and make changes to it and/or maintain it, which wasn't what I wanted to do at all, at least not right now.
I'll edit my question title to better reflect what I had originally wanted - if other folks have better or cleaner ideas, I'm totally open.
My script below scrapes a website and returns the data from a table. It's not finished but it works. The problem is that it has no error checking. Where should I have error handling in my script?
There are no unittests, should I write some and schedule my unittests to be run periodicaly. Or should the error handling be done in my script?
Any advice on the proper way to do this would be great.
#!/usr/bin/env python
''' Gets the Canadian Monthly Residential Bill Calculations table
from URL and saves the results to a sqllite database.
'''
import urllib2
from BeautifulSoup import BeautifulSoup
class Bills():
''' Canadian Monthly Residential Bill Calculations '''
URL = "http://www.hydro.mb.ca/regulatory_affairs/energy_rates/electricity/utility_rate_comp.shtml"
def __init__(self):
''' Initialization '''
self.url = self.URL
self.data = []
self.get_monthly_residential_bills(self.url)
def get_monthly_residential_bills(self, url):
''' Gets the Monthly Residential Bill Calculations table from URL '''
doc = urllib2.urlopen(url)
soup = BeautifulSoup(doc)
res_table = soup.table.th.findParents()[1]
results = res_table.findNextSibling()
header = self.get_column_names(res_table)
self.get_data(results)
self.save(header, self.data)
def get_data(self, results):
''' Extracts data from search result. '''
rows = results.childGenerator()
data = []
for row in rows:
if row == "\n":
continue
for td in row.contents:
if td == "\n":
continue
data.append(td.text)
self.data.append(tuple(data))
data = []
def get_column_names(self, table):
''' Gets table title, subtitle and column names '''
results = table.findAll('tr')
title = results[0].text
subtitle = results[1].text
cols = results[2].childGenerator()
column_names = []
for col in cols:
if col == "\n":
continue
column_names.append(col.text)
return title, subtitle, column_names
def save(self, header, data):
pass
if __name__ == '__main__':
a = Bills()
for td in a.data:
print td
See the documentation of all the functions and see what all exceptions do they throw.
For ex, in urllib2.urlopen(), it's written that Raises URLError on errors. It's a subclass of IOError.
So, for the urlopen(), you could do something like:
try:
doc = urllib2.urlopen(url)
except IOError:
print >> sys.stderr, 'Error opening URL'
Similary, do the same for others.
You should write unit tests and you should use exception handling. But only catch the exceptions you can handle; you do no one any favors by catching everything and throwing any useful information out.
Unit tests aren't run periodically though; they're run before and after the code changes (although it is feasible for one change's "after" to become another change's "before" if they're close enough).
A copple places you need to have them.is in importing things like tkinter
try:
import Tkinter as tk
except:
import tkinter as tk
also anywhere where the user enters something with a n intended type. A good way to figure this out is to run it abd try really hard to make it crash. Eg typing in wrong type.
The answer to "where should I have error handling in my script?" is basically "any place where something could go wrong", which depends entirely on the logic of your program.
In general, any place where your program relies on an assumption that a particular operation worked as you intended, and there's a possibility that it may not have, you should add code to check whether or not it actually did work, and take appropriate remedial action if it didn't. In some cases, the underlying code might generate an exception on failure and you may be happy to just let the program terminate with an uncaught exception without adding any error-handling code of your own, but (1) this would be, or ought to be, rare if anyone other than you is ever going to use that program; and (2) I'd say this would fall into the "works as intended" category anyway.
hard to word the question so ill go right to the point, i wrote the following template tag
def do_simple_tag(parser, token):
try:
tag_name, name = token.split_contents()
except ValueError:
raise template.TemplateSyntaxError("%r tag requires exactly one argument" % token.contents.split()[0])
if not (name[0] == name[-1] and name[0] in ('"', "'")):
raise template.TemplateSyntaxError("%r tag's argument should be in quotes" % tag_name)
return SimpleTagNode(name[1:-1])
class SimpleTagNode(template.Node):
def __init__(self, name):
self.name = name
def render(self, context):
content = get_content(context, request, name)
return content
register.tag('simple_tag', do_simple_tag)
then i wrote a function that scans for this tag in a template and gets all instances of this tag within said template in a list like so
def get_tags(template):
compiled_template = get_template(template)
simple_tag_instances = _scan_tag(compiled_template.nodelist)
def _scan_tag(nodelist, current_block=None, ignore_blocks=[]):
tags = []
for node in nodelist:
if isinstance(node, SimpleTagNode):
tags.append(node.get_name())
so, my question is why does the isinstance fail if node is infact an instance of SimpleTagNode ( or so i believe ) , i checked nodelist and saw that indeed there were instances of SimpleTagNode, but they would all return false in the isinstance condition, i have spent a long time trying to figure this one out, but found nothing, i even used the shell running the funcions above and still returned fals, any help is much a appreciated
So i finally solved it, basically in the module that contained the _scan_tag function at the top of the file i was importing the SimpleTagNode class like so
from simple_tag.templatetags.simple_tag import SimpleTagNode
simple_tag being the name of my app, and also the name of the template file, for some reason this was conflicted with isinstance , so i tried
from paulo.simple_tag.templatetags.simple_tag import SimpleTagNode
paulo being my project app, and it worked.