I'm looking at the Binary Search Trees section in the tutorial "Problem Solving with Algorithms and Data Structures" (http://interactivepython.org/runestone/static/pythonds/Trees/SearchTreeImplementation.html). On several occasions, they use "public" and "private" helper methods with the same name, e.g. for the "put" method:
def put(self,key,val):
    if self.root:
        self._put(key,val,self.root)
    else:
        self.root = TreeNode(key,val)
    self.size = self.size + 1

def _put(self,key,val,currentNode):
    if key < currentNode.key:
        if currentNode.hasLeftChild():
            self._put(key,val,currentNode.leftChild)
        else:
            currentNode.leftChild = TreeNode(key,val,parent=currentNode)
    else:
        if currentNode.hasRightChild():
            self._put(key,val,currentNode.rightChild)
        else:
            currentNode.rightChild = TreeNode(key,val,parent=currentNode)
I also have seen this approach elsewhere, but I don't really understand the motivation. What is the advantage compared to putting everything directly into one method, is it just to improve readability?
The rationale is that the user of the class should not need to know anything about a "current node". The current node only makes sense during the recursive insert process; it's not a permanent property of the tree. The user treats the tree as a whole, and only does insert/lookup operations on that.
That said, you could mix both methods into one, by using a default value currentNode=None and checking it. However, the two methods are doing significantly different things. The put method just initialises the root, while the _put does the recursive insertion, so it would probably be better to keep them separate.
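For comparison, here is a rough sketch of what the merged version could look like. The TreeNode and BinarySearchTree classes are minimal stand-ins written for this example, not the tutorial's full implementation, and it assumes None is never passed deliberately as currentNode.

```python
class TreeNode:
    """Minimal stand-in for the tutorial's TreeNode (only what put needs)."""
    def __init__(self, key, val, parent=None):
        self.key = key
        self.val = val
        self.parent = parent
        self.leftChild = None
        self.rightChild = None


class BinarySearchTree:
    def __init__(self):
        self.root = None
        self.size = 0

    def put(self, key, val, currentNode=None):
        # currentNode=None marks the external (top-level) call.
        if currentNode is None:
            if self.root is None:
                self.root = TreeNode(key, val)
            else:
                self.put(key, val, self.root)
            self.size = self.size + 1  # counted once, at the top level only
            return
        # From here on: the recursive insertion that _put performs.
        if key < currentNode.key:
            if currentNode.leftChild is not None:
                self.put(key, val, currentNode.leftChild)
            else:
                currentNode.leftChild = TreeNode(key, val, parent=currentNode)
        else:
            if currentNode.rightChild is not None:
                self.put(key, val, currentNode.rightChild)
            else:
                currentNode.rightChild = TreeNode(key, val, parent=currentNode)


tree = BinarySearchTree()
for k in (5, 3, 8):
    tree.put(k, str(k))
```

It works, but the single method now has two unrelated jobs, and callers can see (and misuse) the currentNode parameter, which is the argument for keeping the two methods separate.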
Here, the motivation is to use recursion. As you have probably noticed, the _put method calls itself, and the two method signatures are different. If you folded _put into the public method, you would have to change the public method's signature so that it can insert at a given node; that is, you would have to add a currentNode parameter. The original public method does not have this parameter, presumably because the author does not want to expose this functionality to the end user.
I am relatively new to OOP, and definitely still learning. I would like to know what is the best practice when dealing with two classes such as this:
(I have reverse engineered the statistics engine of an old computer game, but I guess the subject does not matter to my question)
class Step(object):
    def __init__(self):
        self.input = 'G'
        ...more attributes...
    def reset_input(self):
        ''' Reset input to None. '''
        self.input = None
        print '* Input reset.'
    ... more methods ...
Then I have the Player class, which is the main object to control (at least in my design):
class Player(object):
    ''' Represents a player. Accepts initial stats.'''
    def __init__(self, step= 250, off= 13, dng= 50000, dist= 128, d_inc= 113):
        self.route = []
        self.step = Step(step=step, off=off, dng=dng, dist=dist, d_inc=d_inc)
        self.original = copy.copy(self.step)
As you can see, Player contains a Step object, which represents the next Step.
I have found that I sometimes want to access a method from that Step class.
In this case, is it better to add a wrapper to Player, such as:
(If I want to access reset_input()):
class Player(object):
    ...
    def reset_input(self):
        self.step.reset_input()
Then for Player to reset the input value:
p = Player()
p.reset_input()
Or would it be better practice to access the reset_input() directly with:
p = Player()
p.step.reset_input()
It seems adding the wrapper is just duplicating code. It's also annoying as I need access to quite a few of Step's methods.
So, when using composition (I think it is the correct term), is it good practice to directly access the 'inner' object's methods?
I believe you should apply an additional layer of abstraction in OOP if:
you foresee yourself updating the code later; and
the code will be used in multiple places.
In this case, let's say you go with this method:
def reset_input(self):
    self.step.reset_input()
and then you call it in multiple places in your code. Later on, you decide that you want to do action x() before all your calls to reset_input, pass in optional parameter y to reset_input, and do action z() after that. Then it's trivial to update the method as follows:
def reset_input(self):
    self.x()
    self.step.reset_input(self.y)
    self.z()
And the code will be changed everywhere with just a few keystrokes. Imagine the nightmare you'd have on your hands if you had to update all the calls in multiple places because you weren't using a wrapper function.
You should apply a wrapper if you actually foresee yourself using the wrapper to apply changes to your code. This will make your code easier to maintain. As stated in the comments, this concept is known as encapsulation; it allows you to use an interface that hides implementation details, so that you can easily update the implementation at any time and it will change the code universally in a very simple way.
It's always a tradeoff. Look at the Law of Demeter. It describes your situation and also pros and cons of different solutions.
Using plistlib to load a plist file in Python, I have a data structure to work with wherein a given path to a key-value pair should never fail, so it's acceptable IMO to hard-code the path without .get() and other tricks -- however, it is a long and ugly path. The plist is full of dicts in arrays in dicts, so it ends up looking like this:
def input_user_data(plist, user_text):
    new_text = clean_user_data(user_text)
    plist['template_data_array'][0]['template_section']['section_fields_data'][0]['disclaimer_text'] = new_text #do not like
Apart from being way past the 79 character limit, it just looks lugubrious and sophomoric. However it seems equally silly to step through it like this:
#....
one = plist['template_data_array']
two = one[0]['template_section']['section_fields_data']
two[0]['disclaimer_text'] = new_text
...because I don't really need all those assignments, I'm just looking to sanitize the user text and toss it into the predefined section of a plist.
When dealing with a nested path that will always exist but is just tedious to access (and may indeed need to be found again by other methods), is there a shorter technique to employ, or do I just grin and bear the lousy nested structure that I have no control over?
When you see a lot of duplicated or boilerplate code, this is often a hint that you can refactor the repetitive operations into a function. Writing get_node and set_node helper functions not only makes the code that sets the values simpler, it allows you to easily define the paths as constants, which you can put all in one place in your code for easier maintenance.
def get_node(container, path):
    for node in path:
        container = container[node]
    return container

def set_node(container, path, value):
    container = get_node(container, path[:-1])
    container[path[-1]] = value

DISCLAIMER_PATH = ("template_data_array", 0, "template_section", "section_fields_data",
                   0, "disclaimer_text")

set_node(plist, DISCLAIMER_PATH, new_text)
Potentially, you could subclass plist's class to have these as methods, or even to override __getitem__ and __setitem__, which would be convenient.
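A minimal sketch of that last idea, assuming the loaded plist is an ordinary dict at the top level. PathDict is a made-up name, and the sample data below is invented to mirror the path in the question. One caveat: with this override, a tuple can no longer be used as an ordinary key on the top-level dict.

```python
class PathDict(dict):
    """Hypothetical dict subclass: a tuple key is treated as a path."""

    def __getitem__(self, key):
        if isinstance(key, tuple):
            node = self
            for part in key:
                node = node[part]  # works for nested dicts and lists alike
            return node
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        if isinstance(key, tuple):
            node = self
            for part in key[:-1]:
                node = node[part]
            node[key[-1]] = value
        else:
            dict.__setitem__(self, key, value)


DISCLAIMER_PATH = ("template_data_array", 0, "template_section",
                   "section_fields_data", 0, "disclaimer_text")

# Invented sample data with the same shape as the path in the question.
plist = PathDict({
    "template_data_array": [
        {"template_section": {"section_fields_data": [{"disclaimer_text": "old"}]}}
    ]
})

plist[DISCLAIMER_PATH] = "new text"
print(plist[DISCLAIMER_PATH])
```

Only the outer container needs to be a PathDict; the nested dicts and lists are walked with plain indexing.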
I'm fairly new to Python and have a question regarding the following class:
class Configuration:
    def __init__(self):
        parser = SafeConfigParser()
        try:
            if parser.read(CONFIG_FILE) is None:
                raise IOError('Cannot open configuration file')
        except IOError, error:
            sys.exit(error)
        else:
            self.__parser = parser
            self.fileName = CONFIG_FILE
    def get_section(self):
        p = self.__parser
        result = []
        for s in p.sections():
            result.append('{0}'.format(s))
        return result
    def get_info(self, config_section):
        p = self.__parser
        self.section = config_section
        self.url = p.get(config_section, 'url')
        self.imgexpr = p.get(config_section, 'imgexpr')
        self.imgattr1 = p.get(config_section, 'imgattr1')
        self.imgattr2 = p.get(config_section, 'imgattr2')
        self.destination = p.get(config_section, 'destination')
        self.createzip = p.get(config_section, 'createzip')
        self.pagesnumber = p.get(config_section, 'pagesnumber')
Is it OK to add more instance variables in another function, get_info in this example, or is it best practice to define all instance variables in the constructor? Couldn't it lead to spaghetti code if I define new instance variables all over the place?
EDIT: I'm using this code with a simple image scraper. Via get_section I return all sections in the config file, and then iterate through them to visit each site that I'm scraping images from. For each iteration I make a call to get_info to get the configuration settings for that section in the config file.
If anyone can come up with another approach it'll be fine! Thanks!
I would definitely declare all instance variables in __init__. To not do so leads to increased complexity and potential unexpected side effects.
To provide an alternate point of view from David Hall in terms of access, this is from the Google Python style guide.
Access Control:
If an accessor function would be trivial you should use public
variables instead of accessor functions to avoid the extra cost of
function calls in Python. When more functionality is added you can use
property to keep the syntax consistent
On the other hand, if access is more complex, or the cost of accessing
the variable is significant, you should use function calls (following
the Naming guidelines) such as get_foo() and set_foo(). If the past
behavior allowed access through a property, do not bind the new
accessor functions to the property. Any code still attempting to
access the variable by the old method should break visibly so they are
made aware of the change in complexity.
From PEP8
For simple public data attributes, it is best to expose just the
attribute name, without complicated accessor/mutator methods. Keep in
mind that Python provides an easy path to future enhancement, should
you find that a simple data attribute needs to grow functional
behavior. In that case, use properties to hide functional
implementation behind simple data attribute access syntax.
Note 1: Properties only work on new-style classes.
Note 2: Try to keep the functional behavior side-effect free, although
side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe that
access is (relatively) cheap.
Python isn't Java/C#, and it has very strong ideas about how code should look and be written. If you are coding in Python, it makes sense to make it look and feel like Python. Other people will be able to understand your code more easily and you'll be able to understand other Python code better as well.
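To illustrate the "easy path to future enhancement" that PEP 8 describes, here is a small sketch. The Site class and its validation rule are invented for this example; the point is only that callers keep using plain attribute syntax after the property is introduced.

```python
class Site:
    """Hypothetical class; 'url' started life as a plain public attribute."""

    def __init__(self, url):
        self.url = url  # goes through the property setter below

    @property
    def url(self):
        return self._url

    @url.setter
    def url(self, value):
        # Behaviour added later without changing how callers use .url
        if not value.startswith(("http://", "https://")):
            raise ValueError("url must start with http:// or https://")
        self._url = value


site = Site("http://example.com")
site.url = "https://example.com/images"  # same syntax as a plain attribute
print(site.url)
```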
I would favour setting all the instance variables in the constructor over having functions like get_info() that are required to put the class in a valid state.
With public instance variables that are only instantiated by calls to methods such as your get_info() you create a class that is a bit of a minefield to use.
If you are worried about having certain configuration values which are not always needed and are expensive to calculate (which I guess is why you have get_info(), allowing for deferred execution), then I'd either consider refactoring that subset of config into a second class or introducing properties or functions that return values.
With properties or get style functions you encourage consumers of the class to go through a defined interface and improve the encapsulation [1].
Once you have that encapsulation of the instance variables you give yourself the option to do something more than simply throw an AttributeError exception - you can perhaps call get_info() yourself, or throw a custom exception.
[1] You can't provide 100% encapsulation with Python since private instance variables denoted by a leading double underscore are only private by convention.
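One way the "second class" refactoring could look, as a sketch only. It uses Python 3's configparser and parses a string rather than CONFIG_FILE so that it runs on its own; SectionInfo, ConfigurationError, get_sections and the sample section are names made up for this example, and only a few of the original fields are shown.

```python
import configparser
from collections import namedtuple

# Hypothetical value object holding one section's settings.
SectionInfo = namedtuple("SectionInfo", "section url destination pagesnumber")


class ConfigurationError(Exception):
    """Hypothetical custom exception for a missing section."""


class Configuration:
    def __init__(self, text):
        # Every instance attribute is created here and nowhere else.
        self._parser = configparser.ConfigParser()
        self._parser.read_string(text)

    def get_sections(self):
        return self._parser.sections()

    def get_info(self, section):
        # Returns a new object instead of adding attributes to self.
        if not self._parser.has_section(section):
            raise ConfigurationError("unknown section: %s" % section)
        p = self._parser
        return SectionInfo(
            section=section,
            url=p.get(section, "url"),
            destination=p.get(section, "destination"),
            pagesnumber=p.getint(section, "pagesnumber"),
        )


# Invented sample configuration.
SAMPLE = """
[site_a]
url = http://example.com/a
destination = /tmp/a
pagesnumber = 3
"""

config = Configuration(SAMPLE)
for name in config.get_sections():
    info = config.get_info(name)
    print(info.section, info.url, info.pagesnumber)
```

With this shape a Configuration instance is always in a valid state, and the per-section values live in an object that only exists once they have been read.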
I've set up a custom namespace lookup dictionary in order to map elements in XML files to subclasses of ObjectifiedElement. Now, I want to add some data to instances of these classes. But due to the way ObjectifiedElement works, adding an attribute will result in an element being added to the element tree, which is not what I want. More importantly, this doesn't work for all Python types; for example, it is not possible to create an attribute of the list type.
This seems to be possible by subclassing ElementBase instead, but that would imply losing the features provided by ObjectifiedElement. You could say I only need the read part of ObjectifiedElement. I suppose I can add a __getattr__ to my subclasses to simulate this, but I was hoping there was another way.
I ended up with having __getattr__() simply forward to etree's find():
class SomewhatObjectifiedElement(etree.ElementBase):
    nsmap = {'ns': 'http://www.my.org/namespace'}
    def __getattr__(self, name):
        return self.find('ns:' + name, self.nsmap)
This will only return the first element if there are several matching, unlike ObjectifiedElement's behaviour, but it suffices for my application (mostly it can be only a single match, otherwise, I use findall()).
Which of the following classes would demonstrate the best way to set an instance attribute? Should they be used interchangeably based on the situation?
class Eggs(object):
    def __init__(self):
        self.load_spam()
    def load_spam(self):
        # Lots of code here
        self.spam = 5
or
class Eggs(object):
    def __init__(self):
        self.spam = self.load_spam()
    def load_spam(self):
        # Lots of code here
        return 5
I would prefer the second method.
Here's why:
Procedures with side effects tend to introduce temporal coupling. Simply put, changing the order in which you execute these procedures might break your code. Returning values and passing them to other methods in need of them makes inter-method communication explicit and thus easier to reason about and hard to forget/get in the wrong order.
Also, returning a value makes it easier to test your method. With a return value, you can treat the enclosing object as a black box and ignore the internals of the object, which is generally a good thing. It makes your test code more robust.
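A small sketch of both points, extending the second Eggs class. The load_bacon method and its doubling rule are invented here purely to show one value feeding into another.

```python
class Eggs(object):
    def __init__(self):
        self.spam = self.load_spam()
        # The dependency is explicit: load_bacon cannot run before spam
        # exists, because it takes spam as an argument.
        self.bacon = self.load_bacon(self.spam)

    def load_spam(self):
        # Lots of code here
        return 5

    def load_bacon(self, spam):  # hypothetical second loader
        return spam * 2


eggs = Eggs()

# Each loader can be checked through its return value alone,
# without inspecting the object's attributes afterwards.
assert eggs.load_spam() == 5
assert eggs.load_bacon(3) == 6
```

If both loaders instead set attributes as a side effect, swapping the two calls in __init__ would fail only at runtime, which is the temporal coupling mentioned above.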
I would indeed choose depending on the situation. If in doubt, I would choose the second version, because it's more explicit and load_spam has no (or at least fewer) side effects. Fewer side effects usually lead to code which is easier to maintain and easier to understand. As you know, there's no rule without exception. But that's how I would approach the problem.
If you are setting instance attributes the first method is more Pythonic. If you are calculating intermediate results then function calls are fine. Note that the second method is not only not Pythonic, it's misleading -- it's called load_spam, but it doesn't!