Bad practice to run code in a constructor that's likely to fail? - python

My question is rather a design question.
In Python, if code in your "constructor" fails, the object ends up not being defined. Thus:
    someInstance = MyClass("test123")  # let's say that the constructor throws an exception
    someInstance.doSomething()  # will fail, name someInstance not defined
I do have a situation, though, where a lot of code copying would occur if I removed the error-prone code from my constructor. Basically, my constructor fills a few attributes (via IO, where a lot can go wrong) that can be accessed with various getters. If I removed the code from the constructor, I'd have 10 getters with copy-pasted code, something like:
    is attribute really set?
    do some IO actions to fill the attribute
    return the contents of the variable in question
I dislike that, because all my getters would contain a lot of code. Instead, I perform my IO operations in a central location, the constructor, and fill all my attributes.
What's a proper way of doing this?

There is a difference between a constructor in C++ and an __init__ method
in Python. In C++, the task of a constructor is to construct an object. If it fails,
no destructor is called. Therefore if any resources were acquired before an
exception was thrown, the cleanup should be done before exiting the constructor.
Thus, some prefer two-phase construction with most of the construction done
outside the constructor (ugh).
Python has a much cleaner two-phase construction (construct, then
initialize). However, many people confuse an __init__ method (initializer)
with a constructor. The actual constructor in Python is called __new__.
Unlike in C++, it does not take an instance, but
returns one. The task of __init__ is to initialize the created instance.
If an exception is raised in __init__, the destructor __del__ (if any)
will be called as expected, because the object was already created (even though it was not properly initialized) by the time __init__ was called.
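A minimal sketch of the two phases, using a hypothetical Tracked class and try_build helper that are not part of the question. It only illustrates that __new__ has already produced an object by the time __init__ raises, and that the name on the left-hand side of the assignment is never bound:

```python
created = []  # records every instance that __new__ manages to create


class Tracked:
    """Hypothetical class used only to illustrate the two phases."""

    def __new__(cls, *args, **kwargs):
        instance = super().__new__(cls)  # phase 1: the object now exists
        created.append(type(instance).__name__)
        return instance

    def __init__(self, value):
        # phase 2: initialization, which is allowed to fail
        if value is None:
            raise ValueError("value must not be None")
        self.value = value


def try_build(value):
    """Return whether the local name `instance` was bound."""
    try:
        instance = Tracked(value)
    except ValueError:
        # __new__ ran, __init__ raised, and `instance` was never assigned
        return "instance" in locals()
    return True
```

Calling try_build(None) appends to created (the object was constructed) while the local name stays unbound, which matches the behaviour described above.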
Answering your question:
In Python, if code in your
"constructor" fails, the object ends
up not being defined.
That's not precisely true. If __init__ raises an exception, the object is
created but not initialized properly (e.g., some attributes are not
assigned). But at the time that it's raised, you probably don't have any references to
this object, so the fact that the attributes are not assigned doesn't matter. Only the destructor (if any) needs to check whether the attributes actually exist.
What's a proper way of doing this?
In Python, initialize objects in __init__ and don't worry about exceptions.
In C++, use RAII.
Update [about resource management]:
In garbage collected languages, if you are dealing with resources, especially limited ones such as database connections, it's better not to release them in the destructor.
This is because objects are destroyed in a non-deterministic way, and if you happen
to have a loop of references (which is not always easy to tell), and at least one of the objects in the loop has a destructor defined, they may never be destroyed (CPython before 3.4 never collected such cycles; since PEP 442 it can, but still at an unpredictable time).
Garbage collected languages have other means of dealing with resources. In Python, it's a with statement.
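As a rough sketch of what that looks like, here is a hypothetical Connection class (a stand-in, not a real database API) that releases its resource in __exit__ rather than in __del__, so the release happens at a known point even when the body raises:

```python
class Connection:
    """Hypothetical stand-in for a limited resource such as a database connection."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True  # acquire the resource
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False  # release it deterministically, even on error
        return False  # do not swallow exceptions


def use_connection():
    """Show that the connection is closed although the body raised."""
    conn = Connection("db")
    try:
        with conn:
            raise RuntimeError("query failed")
    except RuntimeError:
        pass
    return conn.is_open
```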

In C++ at least, there is nothing wrong with putting failure-prone code in the constructor - you simply throw an exception if an error occurs. If the code is needed to properly construct the object, there really is no alternative (although you can abstract the code into subfunctions, or better into the constructors of subobjects). Worst practice is to half-construct the object and then expect the user to call other functions to complete the construction somehow.

It is not bad practice per se.
But I think you may be after something different here. In your example the doSomething() method will not be called when the MyClass constructor fails. Try the following code:
    class MyClass:
        def __init__(self, s):
            print(s)
            raise Exception("Exception")

        def doSomething(self):
            print("doSomething")

    try:
        someInstance = MyClass("test123")
        someInstance.doSomething()
    except Exception:
        print("except")
It should print:
test123
except
For your software design you could ask the following questions:
What should the scope of the someInstance variable be? Who are its users? What are their requirements?
Where and how should the error be handled for the case that one of your 10 values is not available?
Should all 10 values be cached at construction time or cached one-by-one when they are needed the first time?
Can the I/O code be refactored into a helper method, so that doing something similar 10 times does not result in code repetition?
...

I'm not a Python developer, but in general, it's best to avoid complex/error-prone operations in your constructor. One way around this would be to put a "LoadFromFile" or "Init" method in your class to populate the object from an external source. This load/init method must then be called separately after constructing the object.

One common pattern is two-phase construction, also suggested by Andy White.
First phase: Regular constructor.
Second phase: Operations that can fail.
Integration of the two: Add a factory method to do both phases and make the constructor protected/private to prevent instantiation outside the factory method.
Oh, and I'm not a Python developer either.
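In Python the factory method is usually a classmethod. The following is only a sketch with a hypothetical Report class; note that Python cannot make __init__ private, so the "use the factory" rule is a convention rather than something the language enforces:

```python
class Report:
    """Hypothetical class: __init__ only stores values and cannot fail."""

    def __init__(self, title, lines):
        self.title = title
        self.lines = lines

    @classmethod
    def from_text(cls, title, text):
        """Factory: do the failure-prone work first, then construct."""
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        if not lines:
            raise ValueError("no content for report {!r}".format(title))
        return cls(title, lines)
```

Because the failure happens before cls(...) is called, a half-built Report never exists.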

If the code to initialise the various values is really extensive enough that copying it is undesirable (which it sounds like it is in your case) I would personally opt for putting the required initialisation into a private method, adding a flag to indicate whether the initialisation has taken place, and making all accessors call the initialisation method if it has not initialised yet.
In threaded scenarios you may have to add extra protection in case initialisation is only allowed to occur once for valid semantics (which may or may not be the case since you are dealing with a file).
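A minimal sketch of that approach in Python, assuming a hypothetical LazySettings class whose loader callable stands in for the real I/O. The lock and the second check of the flag are one possible way to guarantee the loader runs only once across threads:

```python
import threading


class LazySettings:
    """Hypothetical class whose attributes are filled by one shared loader."""

    def __init__(self, loader):
        self._loader = loader  # callable doing the failure-prone I/O
        self._lock = threading.Lock()
        self._initialised = False
        self._values = {}

    def _ensure_initialised(self):
        # Fast path without the lock; re-check under the lock so that
        # only one thread ever runs the loader.
        if not self._initialised:
            with self._lock:
                if not self._initialised:
                    self._values = self._loader()
                    self._initialised = True

    @property
    def host(self):
        self._ensure_initialised()
        return self._values["host"]

    @property
    def port(self):
        self._ensure_initialised()
        return self._values["port"]
```

Each getter stays a two-liner, and if the loader raises, the flag stays unset, so the next access simply tries again.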

Again, I've got little experience with Python; however, in C# it's better to try to avoid having a constructor that throws an exception. One example that springs to mind is when you want to place your constructor call at a point where it's not possible to surround it with a try {} catch {} block, for example the initialisation of a field in a class:
    class MyClass
    {
        private MySecondClass mySecondClass = new MySecondClass();
        // Rest of class
    }
If the constructor of MySecondClass throws an exception that you wish to handle inside MyClass then you need to refactor the above - it's certainly not the end of the world, but avoiding it is a nice-to-have.
In this case my approach would probably be to move the failure-prone initialisation logic into an initialisation method, and have the getters call that initialisation method before returning any values.
As an optimisation you should have the getter (or the initialisation method) set some sort of "IsInitialised" boolean to true, to indicate that the (potentially costly) initialisation does not need to be done again.
In pseudo-code (C# because I'll just mess up the syntax of Python):
    class MyClass
    {
        private bool IsInitialised = false;
        private string myString;

        public void Init()
        {
            // Put initialisation code here
            this.IsInitialised = true;
        }

        public string MyString
        {
            get
            {
                if (!this.IsInitialised)
                {
                    this.Init();
                }
                return myString;
            }
        }
    }
This is of course not thread-safe, but I don't think multithreading is used that commonly in Python, so this is probably a non-issue for you.

Seems Neil had a good point; my friend just pointed me to this:
http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
which is basically what Neil said...

Related

Caching an instance's method indefinitely raises pylint warning

    from functools import cache

    class A:
        @cache
        def extremely_long_and_expensive_function(self) -> None:
            # series of instructions that MUST access self
            ...
Pylint complains as follows:
    'lru_cache(maxsize=None)' or 'cache' will keep all method args alive
    indefinitely, including 'self' pylint(method-cache-max-size-none)
But I could not find a satisfying solution online that actually tells me how
to cache that method without having to create some contrived rube-goldberg machine.
How do I memoize expensive_function so that the method is run exactly once and no more, no matter how many times I launch it?
Others have suggested using @cached_property, but this is not a property, so it feels wrong to write A().expensive_function. It's a function that executes initialization commands that are not always needed in every instance, and it doesn't return anything, so it should not be a property.
Surely there's some simple way to do this that I'm missing, I don't want to believe that such a simple use case requires a Frankenstein reimplementation like the answer in https://stackoverflow.com/a/33672499/11558993.
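One possible sketch, not taken from any answer in this thread: since the method returns nothing, a plain per-instance flag is enough to make it run once. The _expensive_done flag and the runs counter are hypothetical names added for illustration. The state lives on the instance, so no module-level cache holds a reference to self:

```python
class A:
    def __init__(self):
        self._expensive_done = False  # hypothetical per-instance flag
        self.runs = 0  # only here to make the effect visible

    def extremely_long_and_expensive_function(self) -> None:
        if self._expensive_done:
            return  # already executed for this instance
        self.runs += 1  # stands in for the real work, which may use self
        self._expensive_done = True
```

The flag is set after the work, so a call that raises part-way will be retried on the next call; whether that is desirable depends on the initialization being done.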

LLDB Python scripting create variable

I am using LLDB Python scripting support to add custom variable formatting for a complex C++ class type in Xcode.
This is working well for simple situations, but I have hit a wall when I need to call a method which uses a pass-by-reference parameter, which it populates with results. This would require me to create a variable to pass here, but I can't find a way to do this.
I have tried using the target's CreateValueFromData method, as below, but this doesn't seem to work.
    import lldb

    def MyClass(valobj, internal_dict):
        class2_type = valobj.target.FindFirstType('class2')
        process = valobj.process
        class2Data = [0]
        data = lldb.SBData.CreateDataFromUInt32Array(process.GetByteOrder(), process.GetAddressByteSize(), class2Data)
        valobj.target.CreateValueFromData("testClass2", data, class2_type)
        valobj.EvaluateExpression("getType(testClass2)")
        class2Val = valobj.frame.FindVariable("testClass2")
        if not class2Val.error.success:
            return class2Val.error.description
        return class2Val.GetValueAsUnsigned()
Is there some way to be able to achieve what I'm trying to do?
SBValue names are just labels for the SBValue; they aren't guaranteed to exist as symbols in the target. For instance, if the value you are formatting is an ivar of some other object, its name will be the ivar name. And lldb does not inject new SBValue names into the symbol table, since that would end up causing lots of name collisions. So they don't exist in the namespace the expression evaluator queries when looking up names.
If the variable you are formatting is a pointer, you can get the pointer value and cons up an expression that casts the pointer value to the appropriate type for your getType function, and pass that to your function. If the value is not a pointer, you can still use SBValue.AddressOf to get the memory location of the value. If the value exists only in lldb (AddressOf will return an invalid address) then you would have to push it to the target with SBProcess.AllocateMemory/WriteMemory, but that should only happen if you have another data formatter that makes these objects out of whole cloth for its own purposes.
It's better not to call functions in formatters if you can help it. But if you really must call a function in your data formatter, you should do that judiciously.
Function calls can cause performance problems: if you have an array of 100 elements of this type, your formatter will require 100 function calls in the target to render the array. That's 200 context switches between your process and the debugger, plus a bunch of memory reads and writes, for every step operation.
Also, since you can't ensure that the data in your value is correct (it might represent a variable that has not been initialized yet, or already deallocated) you either need to have your function handle bad data, or at least be prepared for the expression to crash. lldb can clean up the stack and suppress the exception from crashes, but it can't undo any side-effects the expression might have had before crashing.
For instance, if the function you called took some lock before crashing that it was expecting to release on the way out, your formatter will damage the state of the program. So you have to be careful what you call...
And by default, EvaluateExpression will allow all threads to run so that expressions don't deadlock against a lock held by another thread. You probably don't want that to happen, since that means looking at the locals of one thread will "change" the state of another thread. So you really should only call functions you are sure don't take locks, and use the version of EvaluateExpression that takes an SBExpressionOptions, on which you call SetStopOthers(True) and SetTryAllThreads(False).
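Putting the address cast and the expression options together might look roughly like the sketch below. It assumes the value being formatted lives in target memory and that getType (the function from the question) accepts it by reference; build_call_expression and class2_summary are hypothetical names. The lldb import sits inside the function because the module is only available in LLDB's embedded interpreter:

```python
def build_call_expression(function_name, type_name, address):
    """Build e.g. 'getType(*(class2 *)0x1000)' from a raw address."""
    return "{}(*({} *){:#x})".format(function_name, type_name, address)


def class2_summary(valobj, internal_dict):
    """Hypothetical summary provider; only runs inside LLDB."""
    import lldb  # provided by LLDB's embedded Python interpreter

    options = lldb.SBExpressionOptions()
    options.SetStopOthers(True)  # keep the other threads stopped
    options.SetTryAllThreads(False)  # never fall back to running all threads

    # Address of the value in the target; invalid if it exists only in lldb.
    address = valobj.AddressOf().GetValueAsUnsigned()
    expression = build_call_expression("getType", valobj.GetTypeName(), address)

    result = valobj.EvaluateExpression(expression, options)
    if not result.GetError().Success():
        return result.GetError().GetCString()
    return str(result.GetValueAsUnsigned())
```

The caveats above still apply: the called function must tolerate uninitialized data and must not take locks.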

Best practice when defining instance variables

I'm fairly new to Python and have a question regarding the following class:
    class Configuration:
        def __init__(self):
            parser = SafeConfigParser()
            try:
                if parser.read(CONFIG_FILE) is None:
                    raise IOError('Cannot open configuration file')
            except IOError, error:
                sys.exit(error)
            else:
                self.__parser = parser
                self.fileName = CONFIG_FILE

        def get_section(self):
            p = self.__parser
            result = []
            for s in p.sections():
                result.append('{0}'.format(s))
            return result

        def get_info(self, config_section):
            p = self.__parser
            self.section = config_section
            self.url = p.get(config_section, 'url')
            self.imgexpr = p.get(config_section, 'imgexpr')
            self.imgattr1 = p.get(config_section, 'imgattr1')
            self.imgattr2 = p.get(config_section, 'imgattr2')
            self.destination = p.get(config_section, 'destination')
            self.createzip = p.get(config_section, 'createzip')
            self.pagesnumber = p.get(config_section, 'pagesnumber')
Is it OK to add more instance variables in another function, get_info in this example, or is it best practice to define all instance variables in the constructor? Couldn't it lead to spaghetti code if I define new instance variables all over the place?
EDIT: I'm using this code with a simple image scraper. Via get_section I return all sections in the config file, and then iterate through them to visit each site that I'm scraping images from. For each iteration I make a call to get_info to get the configuration settings for that section of the config file.
If anyone can come up with another approach it'll be fine! Thanks!
I would definitely declare all instance variables in __init__. To not do so leads to increased complexity and potential unexpected side effects.
To provide an alternate point of view from David Hall in terms of access, this is from the Google Python style guide.
Access Control:
If an accessor function would be trivial you should use public
variables instead of accessor functions to avoid the extra cost of
function calls in Python. When more functionality is added you can use
property to keep the syntax consistent
On the other hand, if access is more complex, or the cost of accessing
the variable is significant, you should use function calls (following
the Naming guidelines) such as get_foo() and set_foo(). If the past
behavior allowed access through a property, do not bind the new
accessor functions to the property. Any code still attempting to
access the variable by the old method should break visibly so they are
made aware of the change in complexity.
From PEP8
For simple public data attributes, it is best to expose just the
attribute name, without complicated accessor/mutator methods. Keep in
mind that Python provides an easy path to future enhancement, should
you find that a simple data attribute needs to grow functional
behavior. In that case, use properties to hide functional
implementation behind simple data attribute access syntax.
Note 1: Properties only work on new-style classes.
Note 2: Try to keep the functional behavior side-effect free, although
side-effects such as caching are generally fine.
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe that
access is (relatively) cheap.
Python isn't Java/C#, and it has very strong ideas about how code should look and be written. If you are coding in Python, it makes sense to make it look and feel like Python. Other people will be able to understand your code more easily and you'll be able to understand other Python code better as well.
I would favour setting all the instance variables in the constructor over having functions like get_info() that are required to put the class in a valid state.
With public instance variables that are only instantiated by calls to methods such as your get_info() you create a class that is a bit of a minefield to use.
If you are worried about having certain configuration values which are not always needed and are expensive to calculate (which I guess is why you have get_info(), allowing for deferred execution), then I'd consider either refactoring that subset of config into a second class or introducing properties or functions that return values.
With properties or get-style functions you encourage consumers of the class to go through a defined interface and improve the encapsulation [1].
Once you have that encapsulation of the instance variables, you give yourself the option to do something more than simply let an AttributeError propagate - you can perhaps call get_info() yourself, or throw a custom exception.
[1] You can't provide 100% encapsulation with Python since private instance variables denoted by a leading double underscore are only private by convention
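The "second class" idea could be sketched as follows. SiteConfig and load_sites are hypothetical names; the option names come from the question, while the boolean and integer conversions are assumptions about what those options hold. Every attribute is assigned in __init__, so an instance is either complete or never created:

```python
import configparser


class SiteConfig:
    """Hypothetical second class: every attribute is assigned in __init__."""

    def __init__(self, parser, section):
        self.section = section
        self.url = parser.get(section, "url")
        self.destination = parser.get(section, "destination")
        # Assumed types: a yes/no flag and a whole number.
        self.createzip = parser.getboolean(section, "createzip")
        self.pagesnumber = parser.getint(section, "pagesnumber")


def load_sites(text):
    """Parse INI text and return one fully initialised SiteConfig per section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return [SiteConfig(parser, name) for name in parser.sections()]
```

The scraper can then loop over load_sites(...) without ever calling a get_info()-style method to put an object into a valid state.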

Using a global MANIFEST for object execution. Is there a better way?

I find myself repeating a common pattern when trying to execute code across multiple objects.
    Arg_list_one = ["first","second", "so on"]
    Arg_list_two = ["first","second", "so on"]
    MANIFEST = [ ]

    class connection(object):
        def __init__(self, args):
            ...
            MANIFEST.append(self)

        def Run(self):
            ...

    connection(Arg_list_one)
    connection(Arg_list_two)
    [conn.Run() for conn in MANIFEST]
Is this a pattern (or anti-pattern)? Or just something that I made up?
Are there other, better, ways of doing this?
Why would you need a list of all objects ever created? Many of those may belong to completely unrelated pieces of your application! A given piece of code shouldn't assume it's the only one to use a class. Especially since there's usually no need to:
A function creates a bunch of objects and then does something to all of them? Put the objects into a temporary, locally-scoped list.
Need to share some objects between functions? Put them in a list, pass the list to whoever should see the objects.
Instances of some class share some of these objects? Make a list of them and put them in the class.
Et cetera, you see where this is going.
A more practical and less stylistic issue is that this list would keep every single object of that class that's ever instantiated alive forever. And they say one can't create memory leaks in Python... (This can be avoided with weak references, but transparently removing dead references would make the code much more complex.)
The solution is barely more typing and saves you a whole lot of headaches later on. Next you'll use local variables to save yourself a return?
    connections = [Connection(arg_list_one), Connection(arg_list_two)]
    for connection in connections:
        connection.run()
That said, there may be situations where such a list may be useful (with a fix for said memory leak, of course). I just don't see anything close to that in your example, and I think such situations are very rare.
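For the rare case where a registry really is wanted, the weak-reference fix mentioned above can be sketched with the standard library's weakref.WeakSet, which drops entries on its own once an instance is no longer referenced elsewhere. The Connection class below is a hypothetical stand-in with a ran attribute added only to make the effect observable:

```python
import weakref


class Connection:
    # Weak references: the registry does not keep instances alive.
    _registry = weakref.WeakSet()

    def __init__(self, args):
        self.args = args
        self.ran = False
        Connection._registry.add(self)

    def run(self):
        self.ran = True

    @classmethod
    def run_all(cls):
        # Copy to a list so the set cannot change size while iterating.
        for conn in list(cls._registry):
            conn.run()
```

A WeakSet is unordered, so run_all makes no promise about the order in which connections run.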
I would suggest moving the manifest into the class. If it's really being used as a class variable, make it one.
    class connection(object):
        MANIFEST = [ ]

        def __init__(self, args):
            ...
            self.MANIFEST.append(self)

        def Run(self):
            ...

        @classmethod
        def RunAll(cls):
            for conn in cls.MANIFEST:
                conn.Run()

    connection(Arg_list_one)
    connection(Arg_list_two)
    connection.RunAll()
Also, if there are a lot of objects, your method will aggregate a long list of Nones (or whatever Run returns), so you're probably better off with a normal for loop.
Edit: The memory leak issue is a good point. If you do this, use the special __del__ method to remove the object from the list when it's deleted.
Edit 2: Actually, I think you'd need to do this in a close() method or something, because __del__ will never be called while there is a reference in the list.

Is it common/good practice to test for type values in Python?

Is it common in Python to keep testing for type values when working in an OOP fashion?
    class Foo():
        def __init__(self, barObject):
            self.setBarObject(barObject)

        def setBarObject(self, barObject):
            if isinstance(barObject, Bar):
                self.bar = barObject
            else:
                # throw exception, log, etc.
                raise TypeError("barObject must be a Bar instance")

    class Bar():
        pass
Or I can use a more loose approach, like:
    class Foo():
        def __init__(self, barObject):
            self.bar = barObject

    class Bar():
        pass
Nope, in fact it's overwhelmingly common not to test for type values, as in your second approach. The idea is that a client of your code (i.e. some other programmer who uses your class) should be able to pass any kind of object that has all the appropriate methods or properties. If it doesn't happen to be an instance of some particular class, that's fine; your code never needs to know the difference. This is called duck typing, because of the adage "If it quacks like a duck and flies like a duck, it might as well be a duck" (well, that's not the actual adage but I got the gist of it I think)
One place you'll see this a lot is in the standard library, with any functions that handle file input or output. Instead of requiring an actual file object, they'll take anything that implements the read() or readline() method (depending on the function), or write() for writing. In fact you'll often see this in the documentation, e.g. with tokenize.generate_tokens, which I just happened to be looking at earlier today:
The generate_tokens() generator requires one argument, readline, which must be a callable object which provides the same interface as the readline() method of built-in file objects (see section File Objects). Each call to the function should return one line of input as a string.
This allows you to use a StringIO object (like an in-memory file), or something wackier like a dialog box, in place of a real file.
In your own code, just access whatever properties of an object you need, and if it's the wrong kind of object, one of the properties you need won't be there and it'll throw an exception.
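A small sketch of the same idea, assuming a hypothetical count_lines function that, like generate_tokens(), asks only for a readline-style callable. An in-memory StringIO and a made-up CannedLines class both work, and neither is a real file:

```python
import io


def count_lines(readline):
    """Count lines using any callable that behaves like file.readline()."""
    count = 0
    while readline():  # an empty string signals end of input
        count += 1
    return count


class CannedLines:
    """Hypothetical non-file object that still 'quacks' like readline()."""

    def __init__(self, lines):
        self._lines = list(lines)

    def readline(self):
        return self._lines.pop(0) if self._lines else ""


in_memory = count_lines(io.StringIO("a\nb\n").readline)
canned = count_lines(CannedLines(["x\n", "y\n", "z\n"]).readline)
```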
I think that it's good practice to check input for type. It's reasonable to assume that if you asked a user to give one data type they might give you another, so you should code to defend against this.
However, it seems like a waste of time (both writing and running the program) to check the type of input that the program generates independent of input. As in a strongly-typed language, checking type isn't important to defend against programmer error.
So basically, check input but nothing else so that code can run smoothly and users don't have to wonder why they got an exception rather than a result.
If your alternative to the type check is an else containing exception handling, then you should really consider duck typing one tier up, supporting as many objects with the methods you require from the input, and working inside a try.
You can then catch the resulting exception (as specifically as possible).
The final result wouldn't be unlike what you have there, but a lot more versatile and Pythonic.
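As a sketch of "working inside a try", the hypothetical describe function below accepts anything with a read() method and catches only the one exception that signals the wrong kind of object. The attribute lookup is kept separate from the call so that an AttributeError raised inside read() itself is not misreported:

```python
def describe(source):
    """Return the length of whatever source.read() yields."""
    try:
        read = source.read  # the only step that tests for the interface
    except AttributeError:
        raise TypeError("source must provide a read() method") from None
    return len(read())
```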
Everything else that needed to be said about the actual question, whether it's common/good practice or not, I think has been answered excellently by David's answer.
I agree with some of the above answers, in that I generally never check for type from one function to another.
However, as someone else mentioned, anything accepted from a user should be checked, and for things like this I use regular expressions. The nice thing about using regular expressions to validate user input is that not only can you verify that the data is in the correct format, but you can parse the input into a more convenient form, like a string into a dictionary.
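For example, with a made-up input format of "name=number" entries (the format, the PAIR pattern and the parse_pair function are assumptions for illustration), one regular expression can both validate the entry and split it into a dictionary:

```python
import re

# Hypothetical input format: "key=value" pairs such as "width=640"
PAIR = re.compile(r"^\s*(?P<key>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+)\s*$")


def parse_pair(text):
    """Validate one 'name=number' entry and return it as a dictionary."""
    match = PAIR.match(text)
    if match is None:
        raise ValueError("expected name=number, got {!r}".format(text))
    return {"key": match.group("key"), "value": int(match.group("value"))}
```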
