How and where to compute derived instance variables in python - python

I have a class with a str instance variable. From this instance variable, I calculate a second instance variable, which is basically just the string broken up into certain 'atoms'. The second instance variable is completely determined by the first. I've made it an instance variable because I think that it is best regarded as a 'property' of the class. I'm a bit unsure about how to treat derived instance variables. In particular:
1) I think that they should be get-only properties. However, since computing the derived instance variable is quite expensive, I want it to be done when the class is instantiated, not when the variable is accessed.
2) If I make a function purely for calculating the instance variable, is there a way to mark this?
3) Also, should I pass the first instance variable as a parameter, or just read it in the method from self? (in general I'm still a bit unsure of when to pass instance variables as parameters to methods.)
4) Is there a better way to do this that I haven't mentioned?
Thanks
EDIT: Here's a simplified example of what I mean:
class Amendment:
    def __init__(self, string):
        self.string = string
        self.atoms = generate_atoms()

    def generate_atoms():
        return do_something_that_takes_long(self.string)

You forgot self in a couple of places. But here's how to make .string and .atoms get-only properties. We use a couple of "private" attributes that are created during __init__, and use @property to create the actual getters.
class Amendment:
    def __init__(self, string):
        self._string = string
        self._atoms = self.generate_atoms()

    def generate_atoms(self):
        # return do_something_that_takes_long(self.string)
        return list(self.string)

    @property
    def string(self):
        return self._string

    @property
    def atoms(self):
        return self._atoms

# Test
a = Amendment('abc')
print(a.string, a.atoms)
# This will raise an error because `.string` is a get-only property.
a.string = 'xyz'
Output:
abc ['a', 'b', 'c']
Traceback (most recent call last):
  File "./qtest.py", line 53, in <module>
    a.string = 'xyz'
AttributeError: can't set attribute
If you like, you could also mark generate_atoms as private, but there's probably no need. And nothing stops an insistent user from accessing such things anyway, as the linked docs explain.
As for your 3rd question, methods should normally access the attributes they need via self. In some cases you can use the same method on different attributes, and then it makes sense to pass the attribute as a parameter, but if that's not the case it just looks weird. ;)
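To illustrate the case where passing an attribute as a parameter does make sense, here is a minimal sketch. It assumes a hypothetical variant of Amendment with two string attributes (title and body, neither of which is in the question) that share one helper; `_split` is only a stand-in for the expensive computation.

```python
class Amendment:
    """Hypothetical variant with two string attributes sharing one helper."""

    def __init__(self, title, body):
        self._title = title
        self._body = body
        # The same helper is applied to two different attributes,
        # so the text is passed in as a parameter.
        self._title_atoms = self._split(title)
        self._body_atoms = self._split(body)

    @staticmethod
    def _split(text):
        # Stand-in for the expensive computation in the question.
        return text.split()

    @property
    def title_atoms(self):
        return self._title_atoms

    @property
    def body_atoms(self):
        return self._body_atoms


a = Amendment("Section 2", "strike the word shall")
print(a.title_atoms)  # ['Section', '2']
print(a.body_atoms)   # ['strike', 'the', 'word', 'shall']
```

With a single source attribute, as in the question, reading it from self inside the method is the simpler choice.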

Related

Is this sound software engineering practice for class construction?

Is this a plausible and sound way to write a class where a syntactic-sugar @staticmethod is what outside code uses to interact with it? Thanks.
### script1.py ###
from script2 import SampleClass

output = SampleClass.method1(input_var)

### script2.py ###
class SampleClass(object):
    def __init__(self):
        self.var1 = 'var1'
        self.var2 = 'var2'

    @staticmethod
    def method1(input_var):
        # Syntactic-sugar method that outside code uses.
        sample_class = SampleClass()
        result = sample_class._method2(input_var)
        return result

    def _method2(self, input_var):
        # Main method; executes the various steps.
        self.var4 = self._method3(input_var)
        return self._method4(self.var4)

    def _method3(self, input_var):
        pass

    def _method4(self, var4):
        pass
To answer both your question and your comment: yes, it is possible to write such code, but I see no point in doing it:
class A:
    def __new__(cls, value):
        return cls.meth1(value)

    @staticmethod
    def meth1(value):
        return value + 1

result = A(100)
print(result)
# output: 101
You can't store a reference to a class A instance because you get your method result instead of an A instance. And because of this, an existing __init__ will not be called.
So if the instance just calculates something and gets discarded right away, what you want is to write a simple function, not a class. You are not storing state anywhere.
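A minimal sketch of that last point: when __new__ returns something that is not an instance of the class, Python skips __init__ entirely. The RuntimeError below is only there to prove the method never runs.

```python
class A:
    def __new__(cls, value):
        # Returns an int, not an instance of A.
        return value + 1

    def __init__(self, value):
        # Never runs, because __new__ did not return an A instance.
        raise RuntimeError("__init__ was called")


result = A(100)
print(result, type(result).__name__)  # 101 int
```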
And if you look at it:
result = some_func(value)
looks exactly like what people expect when reading it: a function call.
So no, it is not good practice unless you come up with a good use case for it (I can't think of one right now).
Also relevant for this question is the documentation here to understand __new__ and __init__ behaviour.
Regarding your other comment below my answer:
Defining __init__ in a class to set the initial state (attribute values) of the already created instance happens all the time. __new__ has a different goal: customizing object creation. The instance does not exist yet when __new__ runs (it is the constructor function). __new__ is rarely needed in Python unless you need something like a singleton, say a class A that always returns the very same instance of A when called as A(). Normal user-defined classes return a new object on each instantiation; you can check this with the id() builtin function. Another use case is creating your own version of an immutable type by subclassing it. Because the type is immutable, its value is already set by the time __init__ runs and cannot be changed there or later, hence the need to act earlier, inside __new__. Using __new__ without returning an object of the same class (the uncommon case) has the additional problem that __init__ does not run.
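The two use cases mentioned here could be sketched roughly as follows. Both class names (Singleton, Celsius) are made up for illustration, and the singleton shown is the simplest possible form, not a thread-safe one.

```python
class Singleton:
    _instance = None

    def __new__(cls):
        # Create the object once and hand back the same one afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


class Celsius(float):
    """Hypothetical immutable subclass: the value must be fixed in __new__."""

    def __new__(cls, degrees):
        # Round before the float is created; __init__ would be too late.
        return super().__new__(cls, round(degrees, 1))


first, second = Singleton(), Singleton()
print(first is second)           # True
print(id(first) == id(second))   # True

temp = Celsius(21.456)
print(temp)                      # 21.5
print(isinstance(temp, float))   # True
```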
If you are just grouping lots of methods inside a class but there is no state to store or manage in each instance (you can also notice this by the absence of self in the method bodies), consider not using a class at all. Turn the methods into plain functions without self and organize them in a module or package for import, because it looks like you are grouping just to organize related code.
If you stick to classes because there is state involved, consider breaking the class into smaller classes with no more than five to seven methods each. Think also of giving them some more structure by grouping some of the small classes into modules or submodules and using subclasses, because a long flat list of small classes (or functions) can be mentally difficult to follow.
This has nothing to do with __new__ usage.
In summary, use call syntax either for a function call that returns a result (or None) or for an object instantiation by calling the class name. In the latter case the usual thing is to return an object of the intended type (the class called). Returning the result of a method usually means returning a different type, which can look unexpected to the class user. A closely related use case is where some coders return self from their methods to allow train-like chaining syntax:
my_font = SomeFont().italic().bold()
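A minimal sketch of how such a chainable class might be written. SomeFont's body is not given in the answer, so the `styles` list and the method bodies here are assumptions made purely for illustration.

```python
class SomeFont:
    """Hypothetical font class; each styling method returns self."""

    def __init__(self):
        self.styles = []

    def italic(self):
        self.styles.append("italic")
        return self  # returning self is what allows chaining

    def bold(self):
        self.styles.append("bold")
        return self


my_font = SomeFont().italic().bold()
print(my_font.styles)  # ['italic', 'bold']
```

Note that the call still yields a SomeFont instance, so the reader's expectation about the returned type is preserved.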
Finally if you don't like result = A().method(value), consider an alias:
func = A().method
...
result = func(value)
Note how you are left with no reference to the A() instance in your code.
If you need the reference, split the assignment further:
a = A()
func = a.method
...
result = func(value)
If the reference to A() is not needed, then you probably don't need the instance either, and the class is just grouping the methods. You can just write
func = A.method
result = func(value)
where methods that take no self should be decorated with @staticmethod because there is no instance involved. Note also how static methods could be turned into simple functions outside classes.
Edit:
I have set up an example similar to what you are trying to accomplish. It is also difficult to judge whether having methods inject results into the next method is the best choice for a multistep procedure. Because they share some state, they are coupled to each other and so can also inject errors into each other more easily. I assume you want to share some data between them that way (and that's why you are setting them up in a class).
So this is an example class where a public method builds the result by calling a chain of internal methods. All methods depend on object state, self.offset in this case, despite getting an input value for calculations.
Because of this it makes sense that every method uses self to access the state. It also makes sense that you are able to instantiate different objects holding different configurations, so I see no use here for @staticmethod or @classmethod.
Initial instance configuration is done in __init__ as usual.
# file: multistepinc.py
class MultiStepInc:
    def __init__(self, offset):
        self.offset = offset

    def result(self, value):
        return self._step1(value)

    def _step1(self, x):
        x = self._step2(x)
        return self.offset + 1 + x

    def _step2(self, x):
        x = self._step3(x)
        return self.offset + 2 + x

    def _step3(self, x):
        return self.offset + 3 + x

def get_multi_step_inc(offset):
    return MultiStepInc(offset).result
--------
# file: multistepinc_example.py
from multistepinc import get_multi_step_inc
# get the result method of a configured
# MultiStepInc instance
# with offset = 10.
# Much like an object factory, but you
# mentioned to prefer to have the result
# method of the instance
# instead of the instance itself.
inc10 = get_multi_step_inc(10)
# invoke the inc10 method
result = inc10(1)
print(result)
# creating another instance with offset=2
inc2 = get_multi_step_inc(2)
result = inc2(1)
print(result)
# if you need to manipulate the object
# instance
# you have to (on file top)
from multistepinc import MultiStepInc
# and then
inc_obj = MultiStepInc(5)
# ...
# ... do something with your obj, then
result = inc_obj.result(1)
print(result)
Outputs:
37
13
22

Function to behave differently on class vs on instance

I'd like a particular function to be callable as a classmethod, and to behave differently when it's called on an instance.
For example, if I have a class Thing, I want Thing.get_other_thing() to work, but also thing = Thing(); thing.get_other_thing() to behave differently.
I think overwriting the get_other_thing method on initialization should work (see below), but that seems a bit hacky. Is there a better way?
class Thing:
    def __init__(self):
        self.get_other_thing = self._get_other_thing_inst

    @classmethod
    def get_other_thing(cls):
        # do something...
        pass

    def _get_other_thing_inst(self):
        # do something else
        pass
Great question! What you seek can be easily done using descriptors.
Descriptors are Python objects which implement the descriptor protocol, usually starting with __get__().
They exist, mostly, to be set as a class attribute on different classes. Upon accessing them, their __get__() method is called, with the instance and owner class passed in.
class DifferentFunc:
    """Deploys a different function according to attribute access.

    I am a descriptor.
    """

    def __init__(self, clsfunc, instfunc):
        # Set our functions
        self.clsfunc = clsfunc
        self.instfunc = instfunc

    def __get__(self, inst, owner):
        # Accessed from class
        if inst is None:
            return self.clsfunc.__get__(None, owner)
        # Accessed from instance
        return self.instfunc.__get__(inst, owner)

class Test:
    @classmethod
    def _get_other_thing(cls):
        print("Accessed through class")

    def _get_other_thing_inst(inst):
        print("Accessed through instance")

    get_other_thing = DifferentFunc(_get_other_thing,
                                    _get_other_thing_inst)
And now for the result:
>>> Test.get_other_thing()
Accessed through class
>>> Test().get_other_thing()
Accessed through instance
That was easy!
By the way, did you notice me using __get__ on the class and instance function? Guess what? Functions are also descriptors, and that's the way they work!
>>> def func(self):
... pass
...
>>> func.__get__(object(), object)
<bound method func of <object object at 0x000000000046E100>>
Upon accessing a function attribute, its __get__ is called, and that's how you get function binding.
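The equivalence can be checked directly. The following is a small sketch with a made-up Greeter class: ordinary attribute access on an instance and a manual call to the function's __get__ produce equal bound methods.

```python
class Greeter:
    def hello(self):
        return "hello from " + type(self).__name__


g = Greeter()

# Attribute access on the instance triggers the function's __get__ ...
bound_via_access = g.hello
# ... which is equivalent to calling __get__ by hand on the raw function.
raw_function = Greeter.__dict__["hello"]
bound_by_hand = raw_function.__get__(g, Greeter)

print(bound_via_access())                  # hello from Greeter
print(bound_by_hand == bound_via_access)   # True
print(bound_by_hand.__self__ is g)         # True
```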
For more information, I highly suggest reading the Python manual and the "How-To" linked above. Descriptors are one of Python's most powerful features and are barely even known.
Why not set the function on instantiation?
Or Why not set self.func = self._func inside __init__?
Setting the function on instantiation comes with quite a few problems:
self.func = self._func causes a circular reference. The instance is stored inside the bound method object returned by self._func, and that object is in turn stored on the instance by the assignment. The end result is that the instance references itself and will be cleaned up in a much slower and heavier manner.
Other code interacting with your class might attempt to take the function straight out of the class, and use __get__(), which is the usual expected method, to bind it. They will receive the wrong function.
Will not work with __slots__.
Although with descriptors you need to understand the mechanism, setting it on __init__ isn't as clean and requires setting multiple functions on __init__.
Takes more memory. Instead of storing one single function, you store a bound function for each and every instance.
Will not work with properties.
There are many more that I didn't add as the list goes on and on.
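As one concrete illustration of the list above, here is a minimal sketch of the __slots__ problem, using the Thing class from the question with an empty __slots__ added (that addition is an assumption for the demo). Without a per-instance __dict__, the assignment in __init__ has nowhere to store the function.

```python
class Thing:
    __slots__ = ()  # no per-instance __dict__

    def __init__(self):
        # Attempt to shadow the classmethod with a per-instance function.
        self.get_other_thing = self._get_other_thing_inst

    @classmethod
    def get_other_thing(cls):
        return "class"

    def _get_other_thing_inst(self):
        return "instance"


try:
    Thing()
    outcome = "created"
except AttributeError as exc:
    outcome = "AttributeError"
    print(exc)

print(outcome)  # AttributeError
```

The descriptor approach is unaffected by this, because it lives on the class and stores nothing on the instance.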
Here is a slightly hacky solution:
class Thing(object):
    @staticmethod
    def get_other_thing():
        return 1

    def __getattribute__(self, name):
        if name == 'get_other_thing':
            return lambda: 2
        return super(Thing, self).__getattribute__(name)

print(Thing.get_other_thing())    # 1
print(Thing().get_other_thing())  # 2
If we are on class, staticmethod is executed. If we are on instance, __getattribute__ is first to be executed, so we can return not Thing.get_other_thing but some other function (lambda in my case)

Why it's not possible to create object attribute outside object methods?

While researching Python class attributes and instance attributes, I learned that it's not possible to create an object attribute outside object methods (or maybe class methods). For example, the code below will raise a "NameError" in Python.
class test(object):
    def __init__(self):
        self.lst = []
    self.str = 'xyz'
Why doesn't Python allow this? I'm not questioning the language creator's decision, but is there a reason behind it? For example, is it technically incorrect, or does it have some other disadvantage?
You are defining a class, so there is no instance to point to outside methods. Drop the self:
class test(object):
    def __init__(self):
        self.lst = []
    str = 'xyz'
self points to the instance, not the class. You either need to create an instance and assign directly to attributes (test().str = 'xyz') or you need to be inside a method (when self can actually refer to an instance).
self is not a special name in Python; you could use any other name, such as foo:
class test(object):
    def __init__(foo):
        foo.lst = []
if you want. Every method of a class gets the instance explicitly passed to it as the first parameter, and you can call it whatever you want. Trying to access a parameter outside the scope of the method obviously won't work.

Classes in python, how to set an attributes

When I write class in python, most of the time, I am eager to set variables I use, as properties of the object. Is there any rule or general guidelines about which variables should be used as class/instance attribute and which should not?
for example:
class simple(object):
    def __init(self):
        a=2
        b=3
        return a*b

class simple(object):
    def __init(self):
        self.a=2
        self.b=3
        return a*b
While I completely understand the attributes should be a property of the object. This is simple to understand when the class declaration is simple but as the program goes longer and longer and there are many places where the data exchange between various modules should be done, I get confused on where I should use a/b or self.a/self.b. Is there any guidelines for this?
Where you use self.a you are creating an instance attribute, so it can be accessed from outside the class and persists beyond that function. Use these for storing data about the object.
Where you use a it is a local variable, and only lasts while in the scope of that function, so should be used where you are only using it within the function (as in this case).
Note that __init is misleading, as it looks like __init__ - but isn't the constructor. If you intended them to be the constructor, then it makes no sense to return a value (as the new object is what is returned).
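A minimal sketch of what happens if the method really is named __init__ and still returns a value: Python rejects the instantiation with a TypeError. The class name Simple is just the question's class, capitalized.

```python
class Simple(object):
    def __init__(self):
        a = 2
        b = 3
        return a * b  # a real __init__ must return None


try:
    Simple()
    raised = False
except TypeError as exc:
    raised = True
    print(exc)

print(raised)  # True
```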
class Person(object):
    def __init__(self, name):
        # Introduce all instance variables on __init__
        self.name = name
        self.another = None

    def get_name(self):
        # get_name has access to the `instance` variable 'name'
        return self.name
So if you want a variable to be available in more than one method, make it an instance variable.
Notice my comment on introducing all instance vars on __init__.
Although the example below is valid Python, don't do it.
class Person(object):
    def __init__(self):
        self.a = 0

    def foo(self):
        self.b = 1  # Whoa, introduced new instance variable
Instead, initialize all your instance variables on __init__ and set them to None if no other value is appropriate for them.
I try to imagine what I want the API of my class to look like prior to implementing it. I think to myself, If I didn't write this class, would I want to read the documentation about what this particular variable does? If reading that documentation would simply waste my time, then it should probably be a local variable.
Occasionally, you need to preserve some information, but you wouldn't necessarily want that to be part of the API, which is when you use the convention of prefixing the name with an underscore, e.g. self._some_data_that_is_not_part_of_the_api.
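The three kinds of variable discussed in these answers can be seen side by side in a small sketch. The Temperature class and its attribute names are made up for illustration.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius   # public attribute: part of the API
        self._reads = 0          # internal bookkeeping, not part of the API

    def fahrenheit(self):
        self._reads += 1
        scaled = self.celsius * 9 / 5   # local: only needed in this method
        return scaled + 32


t = Temperature(100)
print(t.fahrenheit())   # 212.0
print(vars(t))          # {'celsius': 100, '_reads': 1}
```

The local variable `scaled` never appears on the instance, while both attributes persist after the method returns.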
The self parameter refers to the object itself. So if you need to use one of the object's attributes outside of the class, you access it as the instance name followed by the attribute name. I don't think there is any guideline on when to use self; it all depends on your need. When you are building a class, you should think about what you will use the variables you are creating for. If you know for sure that you will need a specific attribute in the program that imports your class, then add self.

Is there a way to instantiate a class without calling __init__?

Is there a way to circumvent the constructor __init__ of a class in python?
Example:
class A(object):
    def __init__(self):
        print("FAILURE")

    def Print(self):
        print("YEHAA")
Now I would like to create an instance of A. It could look like this, however this syntax is not correct.
a = A
a.Print()
EDIT:
An even more complex example:
Suppose I have an object C, whose purpose is to store one single parameter and do some computations with it. The parameter, however, is not passed as such but is embedded in a huge parameter file. It could look something like this:
class C(object):
    def __init__(self, ParameterFile):
        self._Parameter = self._ExtractParamterFile(ParameterFile)

    def _ExtractParamterFile(self, ParameterFile):
        # does some complex magic to extract the right parameter
        return the_extracted_parameter
Now I would like to dump and load an instance of that object C. However, when I load this object, I only have the single variable self._Parameter and I cannot call the constructor, because it is expecting the parameter file.
@staticmethod
def Load(file):
    f = open(file, "rb")
    oldObject = pickle.load(f)
    f.close()
    # somehow create newObject without calling __init__
    newObject._Parameter = oldObject._Parameter
    return newObject
In other words, it is not possible to create an instance without passing the parameter file. In my "real" case, however, it is not a parameter file but some huge chunk of data that I certainly do not want to carry around in memory or even store to disk.
And since I want to return an instance of C from the method Load I do somehow have to call the constructor.
OLD EDIT:
A more complex example, which explains why I am asking the question:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        # do something with data, but do NOT save data in a variable

    @staticmethod
    def Load(self, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        newS = B(???)
        newS._Name = newName
        return newS
As you can see, since data is not stored in a class variable I cannot pass it to __init__. Of course I could simply store it, but what if the data is a huge object, which I do not want to carry around in memory all the time or even save it to disc?
You can circumvent __init__ by calling __new__ directly. Then you can create an object of the given type and call an alternative method in place of __init__. This is something that pickle would do.
However, first I'd like to stress very much that it is something that you shouldn't do and whatever you're trying to achieve, there are better ways to do it, some of which have been mentioned in the other answers. In particular, it's a bad idea to skip calling __init__.
When objects are created, more or less this happens:
a = A.__new__(A, *args, **kwargs)
a.__init__(*args, **kwargs)
You could skip the second step.
Here's why you shouldn't do this: The purpose of __init__ is to initialize the object, fill in all the fields and ensure that the __init__ methods of the parent classes are also called. With pickle it is an exception because it tries to store all the data associated with the object (including any fields/instance variables that are set for the object), and so anything that was set by __init__ the previous time would be restored by pickle, there's no need to call it again.
If you skip __init__ and use an alternative initializer, you'd have a sort of a code duplication - there would be two places where the instance variables are filled in, and it's easy to miss one of them in one of the initializers or accidentally make the two fill the fields act differently. This gives the possibility of subtle bugs that aren't that trivial to trace (you'd have to know which initializer was called), and the code will be more difficult to maintain. Not to mention that you'd be in an even bigger mess if you're using inheritance - the problems will go up the inheritance chain, because you'd have to use this alternative initializer everywhere up the chain.
Also by doing so you'd be more or less overriding Python's instance creation and making your own. Python already does that for you pretty well, no need to go reinventing it and it will confuse people using your code.
Here's what is best to do instead: use a single __init__ method that is called for all possible instantiations of the class and that initializes all instance variables properly. For different modes of initialization, use either of the two approaches:
Support different signatures for __init__ that handle your cases by using optional arguments.
Create several class methods that serve as alternative constructors. Make sure they all create instances of the class in the normal way (i.e. calling __init__), as shown by Roman Bodnarchuk, while performing additional work or whatever. It's best if they pass all the data to the class (and __init__ handles it), but if that's impossible or inconvenient, you can set some instance variables after the instance was created and __init__ is done initializing.
If __init__ has an optional step (e.g. like processing that data argument, although you'd have to be more specific), you can either make it an optional argument or make a normal method that does the processing... or both.
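Applied to the class C from the question, the alternative-constructor approach might look roughly like the sketch below. The method names from_parameter_file and from_saved, and the "take the first line" extraction, are assumptions made for illustration only; the real extraction logic is not shown in the question.

```python
class C(object):
    def __init__(self, parameter):
        # __init__ only stores the already-extracted value.
        self._parameter = parameter

    @classmethod
    def from_parameter_file(cls, parameter_file):
        # Hypothetical stand-in for the "complex magic": take the first line.
        parameter = parameter_file.splitlines()[0]
        return cls(parameter)

    @classmethod
    def from_saved(cls, saved_parameter):
        # Rebuild from a stored value; the big file is not needed.
        return cls(saved_parameter)


original = C.from_parameter_file("alpha=1\nlots of other data")
restored = C.from_saved(original._parameter)
print(restored._parameter)  # alpha=1
```

Every path still goes through __init__, so there is only one place where instance variables are set.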
Use the classmethod decorator for your Load method:
import pickle

class B(object):
    def __init__(self, name, data):
        self._Name = name
        # store data

    @classmethod
    def Load(cls, file, newName):
        with open(file, "rb") as f:
            s = pickle.load(f)
        return cls(newName, s)
So you can do:
loaded_obj = B.Load('filename.txt', 'foo')
Edit:
Anyway, if you still want to omit __init__ method, try __new__:
>>> class A(object):
...     def __init__(self):
...         print('__init__')
...
>>> A()
__init__
<__main__.A object at 0x800f1f710>
>>> a = A.__new__(A)
>>> a
<__main__.A object at 0x800f1fd50>
Taking your question literally, I would use metaclasses:
class MetaSkipInit(type):
    def __call__(cls):
        return cls.__new__(cls)

# Python 3 syntax; Python 2 used a `__metaclass__ = MetaSkipInit` class attribute instead.
class B(object, metaclass=MetaSkipInit):
    def __init__(self):
        print("FAILURE")

    def Print(self):
        print("YEHAA")

b = B()
b.Print()
This can be useful e.g. for copying constructors without polluting the parameter list.
But to do this properly would be more work and care than my proposed hack.
Not really. The purpose of __init__ is to initialize an object, and by default it really doesn't do anything. If the __init__ method is not doing what you want, and it's not your own code to change, you can choose to switch it out though. For example, taking your class A, we could do the following to avoid calling that __init__ method:
def emptyinit(self):
    pass

A.__init__ = emptyinit
a = A()
a.Print()
This dynamically swaps out the class's __init__ method, replacing it with an empty one. Note that this is probably NOT a good thing to do, as it does not call the superclass's __init__ method.
You could also subclass it to create your own class that does everything the same, except overriding the __init__ method to do what you want it to (perhaps nothing).
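A minimal sketch of the subclassing option, using the class A from the question. The subclass name QuietA is made up; its __init__ deliberately does nothing, so the original class is left untouched for other users.

```python
class A(object):
    def __init__(self):
        print("FAILURE")

    def Print(self):
        print("YEHAA")


class QuietA(A):  # hypothetical subclass name
    def __init__(self):
        # Deliberately does not call A.__init__.
        pass


a = QuietA()
a.Print()  # YEHAA
print(isinstance(a, A))  # True
```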
Perhaps, however, you simply wish to call the method from the class without instantiating an object. If that is the case, you should look into the @classmethod and @staticmethod decorators. They allow for just that type of behavior.
In your code you have used the @staticmethod decorator, which does not take a self argument. Perhaps what would be better for this purpose is a @classmethod, which might look more like this:
@classmethod
def Load(cls, file, newName):
    # Get the data (getdata is a placeholder for however you load it)
    data = getdata()
    # Create an instance of the class with the data
    return cls(newName, data)
UPDATE: Rosh's Excellent answer pointed out that you CAN avoid calling __init__ by implementing __new__, which I was actually unaware of (although it makes perfect sense). Thanks Rosh!
I was reading the Python cookbook and there's a section talking about this: the example is given using __new__ to bypass __init__()
>>> class A:
        def __init__(self,a):
            self.a = a
>>> test = A('a')
>>> test.a
'a'
>>> test_noinit = A.__new__(A)
>>> test_noinit.a
Traceback (most recent call last):
File "", line 1, in
test_noinit.a
AttributeError: 'A' object has no attribute 'a'
>>>
However I think this only works in Python3. Below is running under 2.7
>>> class A:
        def __init__(self,a):
            self.a = a
>>> test = A.__new__(A)
Traceback (most recent call last):
File "", line 1, in
test = A.__new__(A)
AttributeError: class A has no attribute '__new__'
>>>
As I said in my comment you could change your __init__ method so that it allows creation without giving any values to its parameters:
def __init__(self, p0, p1, p2):
    # some logic
would become:
def __init__(self, p0=None, p1=None, p2=None):
    if p0 and p1 and p2:
        # some logic
or:
def __init__(self, p0=None, p1=None, p2=None, init=True):
    if init:
        # some logic
