Best practice when defining instance variables

I'm fairly new to Python and have a question regarding the following class:
class Configuration:
    def __init__(self):
        parser = SafeConfigParser()
        try:
            if parser.read(CONFIG_FILE) is None:
                raise IOError('Cannot open configuration file')
        except IOError, error:
            sys.exit(error)
        else:
            self.__parser = parser
            self.fileName = CONFIG_FILE
    def get_section(self):
        p = self.__parser
        result = []
        for s in p.sections():
            result.append('{0}'.format(s))
        return result
    def get_info(self, config_section):
        p = self.__parser
        self.section = config_section
        self.url = p.get(config_section, 'url')
        self.imgexpr = p.get(config_section, 'imgexpr')
        self.imgattr1 = p.get(config_section, 'imgattr1')
        self.imgattr2 = p.get(config_section, 'imgattr2')
        self.destination = p.get(config_section, 'destination')
        self.createzip = p.get(config_section, 'createzip')
        self.pagesnumber = p.get(config_section, 'pagesnumber')
Is it OK to add more instance variables in another function, get_info in this example, or is it best practice to define all instance variables in the constructor? Couldn't it lead to spaghetti code if I define new instance variables all over the place?
EDIT: I'm using this code with a simple image scraper. Via get_section I return all sections in the config file, and then iterate through them to visit each site that I'm scraping images from. On each iteration I call get_info to load the configuration settings for that section of the config file.
If anyone can come up with another approach it'll be fine! Thanks!

I would definitely declare all instance variables in __init__. To not do so leads to increased complexity and potential unexpected side effects.
To provide an alternate point of view from David Hall in terms of access, this is from the Google Python style guide.
Access Control:
If an accessor function would be trivial you should use public
variables instead of accessor functions to avoid the extra cost of
function calls in Python. When more functionality is added you can use
property to keep the syntax consistent
On the other hand, if access is more complex, or the cost of accessing
the variable is significant, you should use function calls (following
the Naming guidelines) such as get_foo() and set_foo(). If the past
behavior allowed access through a property, do not bind the new
accessor functions to the property. Any code still attempting to
access the variable by the old method should break visibly so they are
made aware of the change in complexity.
From PEP8
For simple public data attributes, it is best to expose just the
attribute name, without complicated accessor/mutator methods. Keep in
mind that Python provides an easy path to future enhancement, should
you find that a simple data attribute needs to grow functional
behavior. In that case, use properties to hide functional
implementation behind simple data attribute access syntax.
Note 1: Properties only work on new-style classes.
Note 2: Try to keep the functional behavior side-effect free, although
side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe that
access is (relatively) cheap.
Python isn't Java or C#, and it has very strong ideas about how code should look and be written. If you are coding in Python, it makes sense to make your code look and feel like Python. Other people will be able to understand your code more easily, and you'll be able to understand other Python code better as well.
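To make the advice above concrete, here is a minimal sketch (written for Python 3; the SiteSettings class and its attribute names are invented for illustration and are not the asker's code). Every instance attribute is introduced in __init__, the simple value stays a plain public attribute, and the value that later needed validation is wrapped in a property so callers keep the same attribute syntax.

```python
class SiteSettings:
    """Hypothetical settings holder used only to illustrate the advice."""

    def __init__(self, url, pagesnumber=1):
        # All instance state is introduced here, in one place.
        self.url = url                  # simple data: plain public attribute
        self.pagesnumber = pagesnumber  # goes through the property setter below

    @property
    def pagesnumber(self):
        return self._pagesnumber

    @pagesnumber.setter
    def pagesnumber(self, value):
        # Behaviour added later without changing how callers access the value.
        value = int(value)
        if value < 1:
            raise ValueError("pagesnumber must be at least 1")
        self._pagesnumber = value


settings = SiteSettings("http://example.com", pagesnumber="3")
settings.pagesnumber = 5  # still reads like plain attribute access
```

Callers never see get_pagesnumber() or set_pagesnumber(); if the validation were removed again, their code would not have to change.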

I would favour setting all the instance variables in the constructor over having functions like get_info() that are required to put the class in a valid state.
With public instance variables that are only instantiated by calls to methods such as your get_info() you create a class that is a bit of a minefield to use.
If you are worried about having certain configuration values that are not always needed and are expensive to calculate (which I guess is why you have get_info(), allowing for deferred execution), then I'd consider either refactoring that subset of config into a second class or introducing properties or functions that return the values.
With properties or get-style functions you encourage consumers of the class to go through a defined interface, which improves encapsulation [1].
Once you have that encapsulation of the instance variables, you give yourself the option to do something more than simply let an AttributeError propagate: you can call get_info() yourself, or raise a custom exception.
[1] You can't provide 100% encapsulation in Python: instance variables with a leading double underscore are name-mangled rather than truly private, so keeping them private is still a matter of convention.
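As a rough sketch of the "raise a custom exception" option (Python 3; ConfigNotLoadedError, load_section and the dict argument are all hypothetical stand-ins, not part of the original class), a property can check whether the object is in a valid state before handing out a value:

```python
class ConfigNotLoadedError(Exception):
    """Raised when a setting is read before a section has been loaded."""


class Configuration:
    def __init__(self):
        # All attributes exist from the start; None means "not loaded yet".
        self._section = None
        self._url = None

    def load_section(self, section, values):
        # 'values' is a plain dict standing in for the parsed config file.
        self._section = section
        self._url = values["url"]

    @property
    def url(self):
        if self._section is None:
            raise ConfigNotLoadedError("call load_section() before reading url")
        return self._url


config = Configuration()
try:
    config.url
except ConfigNotLoadedError as error:
    message = str(error)

config.load_section("site1", {"url": "http://example.com"})
```

The same property could instead call a loading method itself, which is the other option mentioned above.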

Related

use of attributes in python

This is kind of a high level question. I'm not sure what you'd do with code like this:
class Object(object):
    pass
obj = Object
obj.a = lambda: None
obj.d = lambda: dict
setattr(obj.d, 'dictionary', {4,3,5})
setattr(obj.a, 'somefield', 'somevalue')
If I'm going to call obj.a.somefield, why would I use print? It feels redundant.
I simply can't see what programming strictly with setting attributes would be good for?
I could write an entire program with all of my variables in object classes.

First about your print question. Print is used more for debugging or for attributes that are an output from an object that gives you information when you create it.
For example, there might be an object that you create by passing it data and it finds all of the basic statistics information of that data. You could have it return a dictionary via a method and access the values from there or you could simply access it via an attribute, making the data more readable.
For your second part of your question about why you would want to use attributes in general, they're more for internally passing information from function to function in an object or for configuring an object. Python has different scopes that determine which information each function can access. All methods of an object can access that object's attributes, which allows you to avoid using external or global variables. That makes your object nice and self contained. Global variables are generally avoided at all costs, because they can get messy, so they are considered bad practice.
Taking that a step further, using setattr is a more sophisticated way of setting these attributes to make your code more readable. You could use a function to modify aspects of an object or you could "hide" the complexity inside your setattr so the user can use a higher level interface rather than getting bogged down in the specifics.
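The statistics example mentioned above could look roughly like this (a Python 3 sketch; BasicStats and its attribute names are made up for illustration). The results are stored as attributes, and the method reads them from the object rather than from global variables:

```python
class BasicStats:
    """Hypothetical example: summary values exposed as attributes."""

    def __init__(self, data):
        values = list(data)
        self.count = len(values)
        self.total = sum(values)
        self.mean = self.total / self.count if values else None

    def describe(self):
        # Methods read the object's own attributes instead of globals.
        return "n={0}, mean={1}".format(self.count, self.mean)


stats = BasicStats([2, 4, 6])
summary = stats.describe()  # "n=3, mean=4.0"
```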

Methods of creating syntax highlighting in textX?

As I cannot find any guidelines about syntax highlighting, I decided to prepare simple write-as-plain-text-and-then-highlight-everything-in-html-preview, which is enough for my scope at the moment.
I have added a to_source method to many of my custom meta-model classes, which effectively reimplements the whole syntax in reverse, since reverse parsing is not yet available. It works, but it ignores the user's formatting.
To retain user formatting, the only things available are _tx_position and _tx_position_end. Descending from the main textX rule to its children through the attributes stored on the custom meta-model classes works in most cases, but it fails for primitives.
# textX meta-model file
NonsenseProgram:
    "begin" foo=Foo "," count=INT "end"
;
Foo:
    "fancy" a=ID "separator" b=ID "finished"
;
# textX custom meta-model classes
class NonsenseProgram():
    def __init__(self, foo, count):
        self.foo = foo
        self.count = count
    def to_source(self):
        pass # some recursive magic that uses _tx_position and _tx_position_end
class Foo():
    def __init__(self, parent, a, b):
        self.parent = parent
        self.a = a
        self.b = b
    def to_source(self):
        pass # some recursive magic that uses _tx_position and _tx_position_end
Let's consider the given example. Since we can override the NonsenseProgram and Foo classes, we control the source they return as a whole. By accessing their _tx_* attributes we can modify the code generated for NonsenseProgram and for the NonsenseProgram.foo fragment (by overriding Foo). We can't do the same with NonsenseProgram.count, Foo.a and Foo.b, because those are primitive string or int values.
Depending on how primitives are used in our grammar, we have the following options:
1. Wrap every primitive in a rule that contains only that primitive and nothing else.
   Pros: It just works right now!
   Cons: Produces massive overhead of nested values that our grammar toolchain needs to handle. It's actually messing with the grammar only for the sake of being pretty...
2. Ignore the user's syntax and use only our reverse parsing rules.
   Pros: It just works too!
   Cons: You need to reimplement your syntax for nearly every grammar element. It forces a code reformat on every highlight attempt.
3. Use some external highlighting rules.
   Pros: It would work...
   Cons: Again, grammar reimplementation.
4. Use a language server.
   Pros: Would be the best option in the long run.
   Cons: It's only mentioned once, without any in-depth docs.
Any suggestions about any other options?

You are right. There is no information on position for primitive types. It seems that you have covered available options at the moment.
An easy-to-implement option would be to add position bookkeeping for all attributes directly to textX, as a special structure on each created object (e.g. a dict keyed by attribute name). It should be straightforward to implement, so you can register a feature request in the issue tracker if you wish.
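To show what such bookkeeping might make possible, here is a purely hypothetical sketch in plain Python 3. Nothing in it is textX API: the attr_positions dict, its (start, end) layout and the highlight helper are all invented for illustration, and the offsets are filled in by hand rather than by a parser.

```python
# Hypothetical sketch: textX does not provide this structure today.
class Foo:
    def __init__(self, a, b, attr_positions):
        self.a = a
        self.b = b
        # Maps attribute name -> (start, end) offsets in the source text.
        self.attr_positions = attr_positions


def highlight(source, obj, attr, open_tag="<b>", close_tag="</b>"):
    """Wrap the original text of one attribute in HTML tags."""
    start, end = obj.attr_positions[attr]
    return source[:start] + open_tag + source[start:end] + close_tag + source[end:]


source = "fancy first separator second finished"
# Offsets written by hand here; a parser would have to record them.
foo = Foo("first", "second", {"a": (6, 11), "b": (22, 28)})
highlighted = highlight(source, foo, "b")
```

Because the slice comes from the user's own text, the original formatting around the primitive would be preserved.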
There was some work in the past to support full language services to the textX based languages. The idea is to get all the features you would expect from a decent code editor/IDE for any language specified using textX.
The work stalled for a while but recently resumed as a full rewrite. It should be officially supported by the textX team. You can follow the progress here. Although the project doesn't mention syntax highlighting at the moment, it is on our agenda.

Python - Bad practice to store instance vars in local vars to avoid "self"?

I've been mostly programming in Java and I find Python's explicit self referencing of class members to be ugly. I really don't like how all the "self."s clutter my methods, so I find myself wanting to store instance variables in local variables just to get rid of them. For example, I would replace this:
def insert(self, data, priority):
    self.list.append(self.Node(data, priority))
    index = len(self)-1
    while self.list[index].priority < self.list[int(index/2)].priority:
        self.list[index], self.list[int(index/2)] = self.list[int(index/2)], self.list[index]
        index = int(index/2)
with this:
def insert(self, data, priority):
    l = self.list
    l.append(self.Node(data, priority))
    index = len(self)-1
    while l[index].priority < l[int(index/2)].priority:
        l[index], l[int(index/2)] = l[int(index/2)], l[index]
        index = int(index/2)
Normally I would name the local variable the same as the instance variable, but "list" is reserved so I went with "l". My question is: is this considered bad practice in the Python community?

Easier answer first. In Python, a trailing underscore is used to avoid clashes with keywords and builtins:
list_ = self.list
This will be understood by Python programmers as the right way.
As for making local variables for properties, it depends. Grepping the Plone codebase (and even the standard library) shows that x = self.x is used, in particular:
context = self.context
As pointed out in comments, it's potentially error-prone, because binding another value to local variable will not affect the property.
On the other hand, if some attribute is read-only in the method, it makes code much more readable. So, it's ok if variable use is local enough, say, like let-clauses in functional programming languages.
Sometimes properties are actually functions, so self.property is recomputed on every access. (Whether doing extensive calculations in property getters is "pythonic" is another question.) Thanks to "Python @property versus getters and setters" for a ready example:
class MyClass(object):
    ...
    @property
    def my_attr(self):
        ...
    @my_attr.setter
    def my_attr(self, value):
        ...
In summary, use sparingly, with care, do not make it a rule.
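The pitfall mentioned above (rebinding a local name does not touch the attribute) can be seen in a small Python 3 sketch; the Queue class and its method names are invented for illustration:

```python
class Queue:
    def __init__(self):
        self.items = []

    def add_in_place(self, value):
        items = self.items   # local alias to the same list object
        items.append(value)  # mutation is visible through self.items

    def reset_wrong(self):
        items = self.items
        items = []           # rebinds only the local name; self.items is unchanged

    def reset_right(self):
        self.items = []      # rebinds the attribute itself


q = Queue()
q.add_in_place(1)
q.reset_wrong()
after_wrong = list(q.items)  # still [1]
q.reset_right()
after_right = list(q.items)  # now []
```

So a local alias is safe as long as the method only reads or mutates the object it refers to, and never assigns to the alias expecting the attribute to follow.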

I agree that explicitly adding "self" (or "this" for other languages) isn't very appealing for the eye. But as people said, python follows the philosophy "explicit is better than implicit". Therefore it really wants you to express the scope of the variable you want to access.
Java won't let you use variables you didn't declare, so there are no chances for confusion. But in python if the "self" was optional, for the assignment a = 5 it would not be clear whether to create a member or local variable. So the explicit self is required at some places. Accessing would work the same though. Note that also Java requires an explicit this for name clashes.
I just counted the selfs in some spaghetti code of mine. For 1000 lines of code there's more than 500 appearances of self. Now the code indeed isn't that readable, but the problem isn't the repeated use of self. For your code example above: the 2nd version has a shorter line length, which makes it easier and/or faster to comprehend. I would say your example is an acceptable case.

Should I use a class in this: Reading a XML file using lxml

This question is in continuation to my previous question, in which I asked about passing around an ElementTree.
I need to read the XML files only and to solve this, I decided to create a global ElementTree and then parse it wherever required.
My question is:
Is this an acceptable practice? I heard global variables are bad. If I don't make it global, I was suggested to make a class. But do I really need to create a class? What benefits would I have from that approach. Note that I would be handling only one ElementTree instance per run, the operations are read-only. If I don't use a class, how and where do I declare that ElementTree so that it available globally? (Note that I would be importing this module)
Please answer this question in the respect that I am a beginner to development, and at this stage I can't figure out whether to use a class or just go with the functional style programming approach.

There are a few reasons that global variables are bad. First, it gets you in the habit of declaring global variables which is not good practice, though in some cases globals make sense -- PI, for instance. Globals also create problems when you on purpose or accidentally re-use the name locally. Or worse, when you think you're using the name locally but in reality you're assigning a new value to the global variable. This particular problem is language dependent, and python handles it differently in different cases.
class A:
    def __init__(self):
        self.name = 'hi'
x = 3
a = A()
def foo():
    a.name = 'Bedevere'
    x = 9
foo()
print x, a.name #outputs 3 Bedevere
The benefit of creating a class and passing an instance of it around is that you get well-defined, consistent behavior, especially since you should be calling the object's methods, which operate on the object itself.
class Knights:
    def __init__(self, name='Bedevere'):
        self.name = name
    def knight(self):
        self.name = 'Sir ' + self.name
    def speak(self):
        print self.name + ":", "Run away!"
class FerociousRabbit:
    def __init__(self):
        self.death = "awaits you with sharp pointy teeth!"
    def speak(self):
        print "Squeeeeeeee!"
def cave(thing):
    thing.speak()
    if isinstance(thing, Knights):
        thing.knight()
def scene():
    k = Knights()
    k2 = Knights('Launcelot')
    b = FerociousRabbit()
    for i in (b, k, k2):
        cave(i)
This example illustrates a few good principles. First, the strength of Python when calling functions: FerociousRabbit and Knights are two different classes, but they have the same method, speak(). In other languages, in order to do something like this, they would at least have to have the same base class. The reason you would want to do this is that it allows you to write a function (cave) that can operate on an instance of any class that has a speak() method. You could create any other class with that method and pass an instance of it to the cave function:
class Tim:
    def speak(self):
        print "Death awaits you with sharp pointy teeth!"
So in your case, when dealing with an elementTree, say sometime down the road you need to also start parsing an apache log. Well if you're doing purely functional program you're basically hosed. You can modify and extend your current program, but if you wrote your functions well, you could just add a new class to the mix and (technically) everything will be peachy keen.

Pragmatically, is your code expected to grow? Even though people herald OOP as the right way, I found that sometimes it's better to weigh cost:benefit(s) whenever you refactor a piece of code. If you are looking to grow this, then OOP is a better option in that you can extend and customise any future use case, while saving yourself from unnecessary time wasted in code maintenance. Otherwise, if it ain't broken, don't fix it, IMHO.

I generally find myself regretting it when I give in to the temptation to give a module, for example, a load_file() method that sets a global that the module's other functions can then use to find the file they're supposed to be talking about. It makes testing far more difficult, for example, and as soon as I need two XML files there is a problem. Plus, every single function needs to check whether the file's there and give an error if it's not.
If I want to be functional, I simply therefore have every function take the XML file as an argument.
If I want to be object oriented, I'll have a MyXMLFile class whose methods can just look at self.xmlfile or whatever.
The two approaches are more or less equivalent when there's just one single thing, like a file, to be passed around; but when the number of things in the "state" becomes larger than a few, then I find classes simpler because I can stick all of those things in the class.
(Am I answering your question? I'm still a bit vague on what kind of answer you want.)
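The two styles described in this answer can be sketched side by side (Python 3). The sketch uses the standard library's xml.etree.ElementTree so that it runs without lxml; the XML content, the site_names function and the SiteDocument class are all made up for illustration.

```python
import xml.etree.ElementTree as ET

XML_TEXT = "<sites><site>alpha</site><site>beta</site></sites>"


# Functional style: the parsed tree is passed in explicitly.
def site_names(root):
    return [node.text for node in root.findall("site")]


# Object-oriented style: the tree is stored once on the instance.
class SiteDocument:
    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def site_names(self):
        return [node.text for node in self.root.findall("site")]

    def site_count(self):
        return len(self.root.findall("site"))


root = ET.fromstring(XML_TEXT)
names_functional = site_names(root)

doc = SiteDocument(XML_TEXT)
names_from_class = doc.site_names()
```

Neither version needs a module-level global, and both can be tested with a second document simply by parsing another string.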

Bad Practice to run code in constructor thats likely to fail?

my question is rather a design question.
In Python, if code in your "constructor" fails, the object ends up not being defined. Thus:
someInstance = MyClass("test123") #lets say that constructor throws an exception
someInstance.doSomething() # will fail, name someInstance not defined.
I do have a situation, though, where a lot of code copying would occur if I removed the error-prone code from my constructor. Basically my constructor fills a few attributes (via IO, where a lot can go wrong) that can be accessed with various getters. If I removed the code from the constructor, I'd have 10 getters with copy-pasted code, something like:
is attribute really set?
do some IO actions to fill the attribute
return the contents of the variable in question
I dislike that, because all my getters would contain a lot of code. Instead, I perform my IO operations in a central location, the constructor, and fill all my attributes there.
What's a proper way of doing this?

There is a difference between a constructor in C++ and an __init__ method
in Python. In C++, the task of a constructor is to construct an object. If it fails,
no destructor is called. Therefore if any resources were acquired before an
exception was thrown, the cleanup should be done before exiting the constructor.
Thus, some prefer two-phase construction with most of the construction done
outside the constructor (ugh).
Python has a much cleaner two-phase construction (construct, then
initialize). However, many people confuse an __init__ method (initializer)
with a constructor. The actual constructor in Python is called __new__.
Unlike in C++, it does not take an instance, but
returns one. The task of __init__ is to initialize the created instance.
If an exception is raised in __init__, the destructor __del__ (if any)
will be called as expected, because the object was already created (even though it was not properly initialized) by the time __init__ was called.
Answering your question:
In Python, if code in your
"constructor" fails, the object ends
up not being defined.
That's not precisely true. If __init__ raises an exception, the object is
created but not initialized properly (e.g., some attributes are not
assigned). But at the time that it's raised, you probably don't have any references to
this object, so the fact that the attributes are not assigned doesn't matter. Only the destructor (if any) needs to check whether the attributes actually exist.
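A small Python 3 sketch can illustrate the point. The Resource class is hypothetical, and keeping every instance in a class-level list is done only so that the half-initialized object can be inspected afterwards; real code would not normally do that.

```python
class Resource:
    created = []  # demonstration only: records every instance __new__ produced

    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)
        Resource.created.append(instance)
        return instance

    def __init__(self, path):
        self.path = path
        raise IOError("cannot open {0}".format(path))


try:
    resource = Resource("missing.cfg")
except IOError:
    pass

# The assignment never happened, so the name 'resource' is unbound.
try:
    resource
except NameError:
    name_is_bound = False
else:
    name_is_bound = True

# The object itself was created and partly initialized before the error.
half_built = Resource.created[0]
```

Without the extra reference held in Resource.created, nothing would refer to the half-built object once the exception has been handled, so it would simply be discarded.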
Whats a proper way of doing this?
In Python, initialize objects in __init__ and don't worry about exceptions.
In C++, use RAII.
Update [about resource management]:
In garbage collected languages, if you are dealing with resources, especially limited ones such as database connections, it's better not to release them in the destructor.
This is because objects are destroyed in a non-deterministic way, and if you happen
to have a loop of references (which is not always easy to tell), and at least one of the objects in the loop has a destructor defined, they may never be destroyed (before Python 3.4, the cycle collector did not free such cycles).
Garbage collected languages have other means of dealing with resources. In Python, it's the with statement.
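As a minimal illustration of releasing a resource with the with statement instead of a destructor (Python 3; the Connection class is a stand-in invented for this sketch, not a real database API):

```python
class Connection:
    """Stand-in for a limited resource such as a database connection."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block exits, whether normally or via an exception.
        self.is_open = False
        return False  # do not suppress exceptions


conn = Connection("db")
try:
    with conn:
        was_open_inside = conn.is_open
        raise RuntimeError("query failed")
except RuntimeError:
    pass
```

The release happens at a known point (the end of the with block) rather than whenever the garbage collector gets around to the object.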

In C++ at least, there is nothing wrong with putting failure-prone code in the constructor: you simply throw an exception if an error occurs. If the code is needed to properly construct the object, there really is no alternative (although you can abstract the code into subfunctions, or better, into the constructors of subobjects). Worst practice is to half-construct the object and then expect the user to call other functions to complete the construction somehow.

It is not bad practice per se.
But I think you may be after something different here. In your example the doSomething() method will not be called when the MyClass constructor fails. Try the following code:
class MyClass:
    def __init__(self, s):
        print s
        raise Exception("Exception")
    def doSomething(self):
        print "doSomething"
try:
    someInstance = MyClass("test123")
    someInstance.doSomething()
except:
    print "except"
It should print:
test123
except
For your software design you could ask the following questions:
What should the scope of the someInstance variable be? Who are its users? What are their requirements?
Where and how should the error be handled for the case that one of your 10 values is not available?
Should all 10 values be cached at construction time or cached one-by-one when they are needed the first time?
Can the I/O code be refactored into a helper method, so that doing something similar 10 times does not result in code repetition?
...

I'm not a Python developer, but in general, it's best to avoid complex/error-prone operations in your constructor. One way around this would be to put a "LoadFromFile" or "Init" method in your class to populate the object from an external source. This load/init method must then be called separately after constructing the object.

One common pattern is two-phase construction, also suggested by Andy White.
First phase: Regular constructor.
Second phase: Operations that can fail.
Integration of the two: Add a factory method to do both phases and make the constructor protected/private to prevent instantiation outside the factory method.
Oh, and I'm not a Python developer either.
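In Python the same pattern might be sketched with a classmethod as the factory (Python 3; ImageSource, _load and create are hypothetical names). Note one difference from the description above: Python has no protected or private constructors, so the sketch can only signal by convention that callers should go through create().

```python
class ImageSource:
    def __init__(self, url):
        # Phase one: cheap assignments that cannot fail.
        self.url = url
        self.pages = None

    def _load(self, settings):
        # Phase two: work that may raise (here, a bad number in the settings).
        self.pages = int(settings["pagesnumber"])

    @classmethod
    def create(cls, url, settings):
        # Factory method: runs both phases and returns a fully usable object.
        instance = cls(url)
        instance._load(settings)
        return instance


source = ImageSource.create("http://example.com", {"pagesnumber": "4"})
```

If _load raises, create() never returns, so the caller does not end up holding a half-initialized object.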

If the code to initialise the various values is really extensive enough that copying it is undesirable (which it sounds like it is in your case) I would personally opt for putting the required initialisation into a private method, adding a flag to indicate whether the initialisation has taken place, and making all accessors call the initialisation method if it has not initialised yet.
In threaded scenarios you may have to add extra protection in case initialisation is only allowed to occur once for valid semantics (which may or may not be the case since you are dealing with a file).
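A sketch of that approach in Python 3, including the extra protection for threads, could look like this. LazySettings, fake_loader and the returned values are invented for illustration; the loader stands in for the real I/O.

```python
import threading


class LazySettings:
    def __init__(self, loader):
        self._loader = loader         # callable that does the slow I/O
        self._lock = threading.Lock()
        self._initialised = False
        self._values = {}

    def _ensure_loaded(self):
        if self._initialised:
            return
        with self._lock:
            # Check again: another thread may have finished loading
            # while this one was waiting for the lock.
            if not self._initialised:
                self._values = self._loader()
                self._initialised = True

    def get(self, key):
        self._ensure_loaded()
        return self._values[key]


calls = []


def fake_loader():
    # Hypothetical loader; counts how often it is invoked.
    calls.append(1)
    return {"url": "http://example.com", "destination": "/tmp/images"}


settings = LazySettings(fake_loader)
first = settings.get("url")
second = settings.get("destination")
```

All accessors share one initialisation method, so the I/O code is written once, and the loader runs only on the first access.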

Again, I've got little experience with Python; however, in C# it's better to try to avoid having a constructor that throws an exception. One example that springs to mind is when you want to call the constructor somewhere it's not possible to surround it with a try {} catch {} block, for example in the initialisation of a field in a class:
class MyClass
{
    MySecondClass mySecondClass = new MySecondClass();
    // Rest of class
}
If the constructor of MySecondClass throws an exception that you wish to handle inside MyClass then you need to refactor the above. It's certainly not the end of the world, but avoiding it is a nice-to-have.
In this case my approach would probably be to move the failure-prone initialisation logic into an initialisation method, and have the getters call that initialisation method before returning any values.
As an optimisation you should have the getter (or the initialisation method) set some sort of "IsInitialised" boolean to true, to indicate that the (potentially costly) initialisation does not need to be done again.
In pseudo-code (C# because I'll just mess up the syntax of Python):
class MyClass
{
    private bool IsInitialised = false;
    private string myString;
    public void Init()
    {
        // Put initialisation code here
        this.IsInitialised = true;
    }
    public string MyString
    {
        get
        {
            if (!this.IsInitialised)
            {
                this.Init();
            }
            return myString;
        }
    }
}
This is of course not thread-safe, but I don't think multithreading is used that commonly in Python, so this is probably a non-issue for you.

seems Neil had a good point: my friend just pointed me to this:
http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
which is basically what Neil said...
