Why must instance variables be defined inside of methods?

Why must instance variables be defined inside of methods? In other words, why must self only be used to define new variables inside of methods in a class? Why can't you define variables using self as part of the class, but outside of methods?
"Instance variables are those variables for which each class object has its own copy" - this definition doesn't say anything about methods. So, given that the definition doesn't mention methods, why can't I define an instance variable (in other words, use self to define a new variable) inside of a class, but outside of a method?

Python requires the object reference (implicit or explicit this in Java, for example) to be explicit. Inside methods -- bound functions -- the first param in the function definition is the instance. (This is conventionally called self but you can use any name.)
If you define
class C:
    x = 1
there is no self reference, unlike, e.g. Java, where this is implicit.

Because the mechanisms Python uses for OOP are very simple. There's no special syntax to define classes, really; the class keyword is a very thin layer over what amounts to creating a dict. Everything you define inside a class Foo: block basically ends up as the contents of Foo.__dict__. So there's no syntax to define attributes of the instance resulting from calling Foo(). You add instance attributes simply by attaching them to the object you get from calling Foo(), which is self in __init__ or other instance methods.
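A minimal sketch of that point, assuming Python 3 and a hypothetical class Foo with one class-level name x and one attribute y set on the instance:

```python
class Foo:
    x = 1  # stored in Foo.__dict__, shared by all instances

    def __init__(self):
        self.y = 2  # stored on the instance that Foo() produced


f = Foo()
class_level_names = "x" in Foo.__dict__ and "__init__" in Foo.__dict__
instance_dict = dict(f.__dict__)  # copy of the per-instance attributes

print(class_level_names)
print(instance_dict)
print("x" in f.__dict__)  # x lives on the class, not on the instance
```

Reading f.x still works, because attribute lookup falls back to the class when the name is not found on the instance.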

To answer that, you need to know a little about how the Python interpreter works.
In general, every class and every method definition is a separate object.
When you call a method, the class instance is passed as the first parameter to the method. That way the method knows which instance it is running on (and therefore where to put instance variables).
This, however, only applies to instance methods.
Of course you can also create class methods with @classmethod. These take the class as their first argument instead of an instance, and therefore cannot be used to create variables on self.
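The difference can be sketched with a hypothetical Counter class (the class and its method names are made up for illustration):

```python
class Counter:
    def bump(self):
        # self is the instance the method was called on
        self.count = getattr(self, "count", 0) + 1
        return self.count

    @classmethod
    def describe(cls):
        # cls is the class itself; there is no instance to attach attributes to
        return cls.__name__


c = Counter()
via_instance = c.bump()      # the instance is passed implicitly
via_class = Counter.bump(c)  # the same call with the instance passed explicitly
class_name = Counter.describe()
```

Both calls to bump act on the same object c, so the counter ends up at 2, while describe only ever sees the class.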

Why must instance variables be defined inside of methods?
They don't. You can define them from anywhere, as long as you have an instance (of a mutable type):
class Foo(object):
    pass

f = Foo()
f.bar = 42
print(f.bar)
In other words why must self only be used to define new variables inside of methods in a class. Why can't you define variables using self as part of the class, but outside of methods.
self (which is only a naming convention, there's absolutely nothing magical here) is used to represent the current instance. How could you use it at the class block's top-level where you don't have any instance at all (and not even the class itself FWIW) ?
Defining the class "members" at the class top-level is mostly a static languages thing, where "objects" are mainly (technically) structs (C style structs, or Pascal style records if you prefer) with a statically defined memory structure.
Python is a dynamic language, which instead uses dicts as supporting data structure, so someobj.attribute is usually (minus computed attributes etc) resolved as someobj.__dict__["attribute"] (and someobj.attribute = value as someobj.__dict__["attribute"] = value).
So 1/ it doesn't NEED to have a fixed, explicitly defined data structure, and 2/ yet it DOES need to have an instance in the end to set an attribute on.
Note that you can force a class to use a fixed memory structure (instead of a plain dict) using __slots__, but you will still need to set the values from within a method (canonically __init__, which exists for this very reason: initializing the instance's attributes).
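As a rough illustration of the __slots__ remark, here is a sketch with a hypothetical Point class. The attribute names are declared at class level, but the values are still set through self inside __init__:

```python
class Point:
    __slots__ = ("x", "y")  # fixed set of instance attributes, no per-instance __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
has_dict = hasattr(p, "__dict__")

try:
    p.z = 3  # not listed in __slots__
    rejected = False
except AttributeError:
    rejected = True
```

Here has_dict is False and the assignment to p.z is rejected, which is the opposite of the default dict-backed behaviour described above.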

Questions related to classes

I have a problem understanding some concepts of data structures in Python, in the following code.
class Stack(object): #1
    def __init__(self): #2
        self.items = []
    def isEmpty(self):
        return self.items == []
    def push(self, item):
        self.items.append(item)
    def pop(self):
        self.items.pop()
    def peak(self):
        return self.items[len(self.items)-1]
    def size(self):
        return len(self.items)
s = Stack()
s.push(3)
s.push(7)
print(s.peak())
print (s.size())
s.pop()
print (s.size())
print (s.isEmpty())
I don't understand what is this object argument
I replaced it with (obj) and it generated an error, why?
I tried to remove it and it worked perfectly, why?
Why do I have __init__ to set a constructor?
self is an argument, but how does it get passed? and which object does it represent, the class it self?
Thanks.

object is a class, from which class Stack inherits. There is no
class obj, hence error. However, you can define a class that does
not inherit from anything (at least, in Python 2).
self represents an object on which the method is called; for
example when you do s.pop(), self inside method pop refers to
the same object as s - it is not a class, it is an instance of the class.

1
object here is the class your new class inherits from. There is already a base class named object, but there is no class named obj, which is why replacing object with obj causes an error. Anyway, in your example code it is not needed at all, since all classes in Python 3 implicitly extend the object class.
2
__init__ is the constructor of the object and self there represents the object that you are creating itself, not the class, just like in the other methods you made.

Point 1:
Some history required here... Originally Python had two distinct kind of types, those implemented in C (whether in the stdlib or C extensions) and those implemented in Python with the class statement. Python 2.2 introduced a new object model (known as "new-style classes") to unify both, but kept the "classic" (aka "old-style") model for compatibility. This new model also introduced quite a lot of goodies like support for computed attributes, cooperative super calls via the super() object, metaclasses etc, all of which coming from the builtin object base class.
So in Python 2.2.x to 2.7.x, you can either create a new-style class by inheriting from object (or any subclass of object) or an old-style one by not inheriting from object (nor - obviously - any subclass of object).
In Python 2.7, since your example Stack class does not use any feature of the new object model, it works equally well as an 'old-style' or as a 'new-style' class, but try to add a custom metaclass or a computed attribute and it will break in one way or another.
Python 3 totally removed old-style class support, and object is the default base class if you don't explicitly specify one, so whatever you do your class WILL inherit from object and will work equally well with or without an explicit parent class.
You can read this for more details.
Point 2.1 - I'm not sure I understand the question actually, but anyway:
In Python, objects are not fixed C-struct-like structures with a fixed set of attributes, but dict-like mappings (well, there are exceptions, but let's ignore them for the moment). The set of attributes of an object is composed of the class attributes (methods mainly, but really any name defined at the class level), which are shared between all instances of the class, and the instance attributes (belonging to a single instance), which are stored in the instance's __dict__. This implies that you don't define the set of instance attributes at the class level (as in Java or C++ etc.), but set them on the instance itself.
The __init__ method is there so you can make sure each instance is initialised with the desired set of attributes. It's roughly the equivalent of a Java constructor, but instead of only being used to pass arguments at instantiation, it's also responsible for defining the set of instance attributes for your class (which you would, in Java, define at the class level).
Point 2.2 : self is the current instance of the class (the instance on which the method is called), so if s is an instance of your Stack class, s.push(42) is equivalent to Stack.push(s, 42).
Note that the argument doesn't have to be called self (which is only a convention, albeit a very strong one), the important part is that it's the first argument.
How s gets passed as self when calling s.push(42) is a bit intricate at first, but it is an interesting example of how to use a small feature set to build a larger one. You can find a detailed explanation of the whole mechanism here, so I won't bother reposting it.
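For readers without the linked explanation at hand, the mechanism can be sketched roughly as follows. This is a simplified illustration, assuming Python 3 and a cut-down Stack class: plain functions implement __get__, and looking a function up through an instance uses it to produce a bound method.

```python
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)


s = Stack()
s.push(1)         # usual spelling
Stack.push(s, 2)  # same call, instance passed explicitly

# Plain functions implement __get__, which is what turns the function stored
# on the class into a method bound to s when it is looked up through s.
raw_function = Stack.__dict__["push"]
bound = raw_function.__get__(s, Stack)
bound(3)

same_target = bound.__self__ is s
```

All three calls append to the same list, and the bound method simply remembers s and the original function.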

Instance variables in methods outside the constructor (Python) -- why and how?

My questions concern instance variables that are initialized in methods outside the class constructor. This is for Python.
I'll first state what I understand:
Classes may define a constructor, and it may also define other methods.
Instance variables are generally defined/initialized within the constructor.
But instance variables can also be defined/initialized outside the constructor, e.g. in the other methods of the same class.
An example of (2) and (3) -- see self.meow and self.roar in the Cat class below:
class Cat():
    def __init__(self):
        self.meow = "Meow!"
    def meow_bigger(self):
        self.roar = "Roar!"
My questions:
Why is it best practice to initialize the instance variable within the constructor?
What general/specific mess could arise if instance variables are regularly initialized in methods other than the constructor? (E.g. Having read Mark Lutz's Tkinter guide in his Programming Python, which I thought was excellent, I noticed that the instance variable used to hold the PhotoImage objects/references were initialized in the further methods, not in the constructor. It seemed to work without issue there, but could that practice cause issues in the long run?)
In what scenarios would it be better to initialize instance variables in the other methods, rather than in the constructor?
To my knowledge, instance variables exist not when the class object is created, but after the class object is instantiated. Proceeding upon my code above, I demonstrate this:
>>> c = Cat()
>>> c.meow
'Meow!'
>>> c.roar
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Cat' object has no attribute 'roar'
>>> c.meow_bigger()
>>> c.roar
'Roar!'
As it were:
I cannot access the instance variable (c.roar) at first.
However, after I have called the instance method c.meow_bigger() once, I am suddenly able to access the instance variable c.roar.
Why is the above behaviour so?
Thank you for helping out with my understanding.

Why is it best practice to initialize the instance variable within the
constructor?
Clarity.
Because it makes it easy to see at a glance all of the attributes of the class. If you initialize the variables in multiple methods, it becomes difficult to understand the complete data structure without reading every line of code.
Initializing within __init__ also makes documentation easier. With your example, you can't write "an instance of Cat has a roar attribute". Instead, you have to add a paragraph explaining that an instance of Cat might have a "roar" attribute, but only after calling the "meow_bigger" method.
Clarity is king. One of the smartest programmers I ever met once told me "show me your data structures, and I can tell you how your code works without seeing any of your code". While that's a tiny bit hyperbolic, there's definitely a ring of truth to it. One of the biggest hurdles to learning a code base is understanding the data that it manipulates.
What general/specific mess could arise if instance variables are
regularly initialized in methods other than the constructor?
The most obvious one is that an object may not have an attribute available during all parts of the program, leading to having to add a lot of extra code to handle the case where the attribute is undefined.
In what scenarios would it be better to initialize instance variables
in the other methods, rather than in the constructor?
I don't think there are any.
Note: you don't necessarily have to initialize an attribute with its final value. In your case it's acceptable to initialize roar to None. The mere fact that it has been initialized to something shows that it's a piece of data that the class maintains. It's fine if the value changes later.
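A short sketch of that suggestion, reusing the Cat class from the question with roar given a None placeholder in __init__:

```python
class Cat:
    def __init__(self):
        self.meow = "Meow!"
        self.roar = None  # placeholder: the attribute exists from the start

    def meow_bigger(self):
        self.roar = "Roar!"


c = Cat()
before = c.roar  # None rather than an AttributeError
c.meow_bigger()
after = c.roar
attributes = sorted(vars(c))
```

With this version, reading c.roar before calling meow_bigger() yields None, and the full set of attributes is visible in one place.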

Remember that class members in "pure" Python are just a dictionary. Members aren't added to an instance's dictionary until you run the function in which they are defined. Ideally this is the constructor, because that then guarantees that your members will all exist regardless of the order that your functions are called.
I believe your example above could be translated to:
class Cat():
    def __init__(self):
        self.__dict__['meow'] = "Meow!"
    def meow_bigger(self):
        self.__dict__['roar'] = "Roar!"

>>> c = Cat()        # c.__dict__ = { 'meow': "Meow!" }
>>> c.meow_bigger()  # c.__dict__ = { 'meow': "Meow!", 'roar': "Roar!" }

Initializing instance variables within the constructor is, as you already pointed out, only a recommendation in Python.
First of all, defining all instance variables within the constructor is a good way to document a class. Everybody, seeing the code, knows what kind of internal state an instance has.
Secondly, order matters. If one defines an instance variable V in a function A and there is another function B that also accesses V, it is important to call A before B. Otherwise B will fail, since V was never defined. Maybe A really does have to be invoked before B, but then that should be ensured by some internal state, which would itself be an instance variable.
There are many more examples. Generally it is just a good idea to define everything in the __init__ method, and set it to None if it cannot or should not be given a real value at initialization.
Of course, one could use the hasattr function to derive some information about the state. But one could equally check whether some instance variable V is, for example, None, which can convey the same thing.
So in my opinion, it is never a good idea to define an instance variable anywhere other than in the constructor.
Your examples state some basic properties of python. An object in Python is basically just a dictionary.
Lets use a dictionary: One can add functions and values to that dictionary and construct some kind of OOP. Using the class statement just brings everything into a clean syntax and provides extra stuff like magic methods.
In other languages all information about instance variables and functions are present before the object was initialized. Python does that at runtime. You can also add new methods to any object outside the class definition: Adding a Method to an Existing Object Instance
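One way to attach a method to a single existing instance is types.MethodType from the standard library. The sketch below assumes a cut-down Cat class and a hypothetical purr function:

```python
import types


class Cat:
    def __init__(self):
        self.meow = "Meow!"


def purr(self):
    return self.meow.lower()


a = Cat()
b = Cat()
a.purr = types.MethodType(purr, a)  # bind the function to this one instance

result = a.purr()
only_on_a = hasattr(a, "purr") and not hasattr(b, "purr")
```

The method ends up in a's own __dict__, so other instances of Cat are unaffected.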

3.) But instance variables can also be defined/initialized outside the constructor, e.g. in the other methods of the same class.
I'd recommend providing a default state in initialization, just so it's clear what the class should expect. In statically typed languages you'd have to do this, and it's good practice in Python.
Let's convey this by replacing the variable roar with a more meaningful variable like has_roared.
In this case, your meow_bigger() method now has a reason to set has_roared. You'd initialize it to False in __init__, as the cat has not roared yet upon instantiation.
class Cat():
    def __init__(self):
        self.meow = "Meow!"
        self.has_roared = False
    def meow_bigger(self):
        print(self.meow + "!!!")
        self.has_roared = True
Now do you see why it often makes sense to initialize attributes with default values?
All that being said, why does python not enforce that we HAVE to define our variables in the __init__ method? Well, being a dynamic language, we can now do things like this.
>>> cat1 = Cat()
>>> cat2 = Cat()
>>> cat1.name = "steve"
>>> cat2.name = "sarah"
>>> print(cat1.name)
steve
The name attribute was not defined in the __init__ method, but we're able to add it anyway. This is a more realistic use case of setting variables that aren't defaulted in __init__.

I'll try to provide a case where you would do so for:
3.) But instance variables can also be defined/initialized outside the constructor, e.g. in the other methods of the same class.
I agree it is clearer and more organized to set instance fields in the constructor, but sometimes you inherit from another class that was written by other people and has many instance fields and APIs.
If you inherit from it only for certain APIs and want your own instance fields for your own APIs, it is easier to just declare the extra instance field in a method rather than override the other author's constructor, which would mean digging into the source code. This also supports Adam Hughes's answer, because in this case your instance field will always be defined as long as you guarantee that your own API is called first.
For instance, suppose you inherit from a package's handler class for web development and want to add a new instance field called user to the handler. You would probably just declare it directly in the initialize method, without overriding the constructor; I have seen this done quite commonly.
class BlogHandler(webapp2.RequestHandler):
    def initialize(self, *a, **kw):
        webapp2.RequestHandler.initialize(self, *a, **kw)
        uid = self.read_cookie('user_id')  # get user_id by reading the browser cookie
        self.user = User.by_id(int(uid))   # query the database for the user and return it

These are very open questions.
Python is a very "free" language in the sense that it tries to never restrict you from doing anything, even if it looks silly. This is why you can do completely useless things such as replacing a class with a boolean (Yes you can).
The behaviour that you mention follows that same logic: if you wish to add an attribute to an object (or to a function - yes you can, too) dynamically, anywhere, not necessarily in the constructor, well... you can.
But just because you can does not mean you should. The main reason for initializing attributes in the constructor is readability, which is a prerequisite for maintenance. As Bryan Oakley explains in his answer, class fields are key to understanding the code, as their names and types often reveal the intent better than the methods do.
That being said, there is now a way to separate attribute definition from constructor initialization: pyfields. I wrote this library to be able to define the "contract" of a class in terms of attributes, while not requiring initialization in the constructor. This allows you in particular to create "mix-in classes" where attributes and methods relying on these attributes are defined, but no constructor is provided.
See this other answer for an example and details.

I think that, to keep things simple and understandable, it is better to initialize the instance variables in the class constructor, so they can be accessed directly without first having to call a specific method.
class Cat():
    def __init__(self, meow, roar):
        self.meow = meow
        self.roar = roar

    def meow_bigger(self):
        return self.roar

    def mix(self):
        return self.meow + self.roar

c = Cat("Meow!", "Roar!")
print(c.meow_bigger())
print(c.mix())
Output
Roar!
Meow!Roar!

Python : serialise class hierarchy

I have to serialise a dynamically created class hierarchy. And a bunch of objects - instances of the latter classes.
Python pickle is not much help; its wiki says "Classes ... cannot be pickled". Or there may be some trick that I cannot figure out.
Performance requirement:
Deserialization should be pretty fast, because the serialised stuff serves as a cache and should save me the work of creating the same class hierarchy.
Details:
classes are created dynamically using type and sometimes meta-classes.

If you provide a custom object.__reduce__() method I believe you can still use pickling.
Normally, when pickling, the class import path is stored, plus instance state. On unpickling, the class is imported, and a new instance is created using the stored state. This is why pickling cannot work with dynamic classes, there is nothing to import.
The object.__reduce__() method lets you store a different instance factory. The callable returned by this function is stored (again by import path), and called with specified arguments to produce an instance. This instance is then used to apply state to, in the same way a regular instance would be unpickled:
def class_factory(name):
    return globals()[name]()

class SomeDynamicClass(object):
    def __reduce__(self):
        return (class_factory, (type(self).__name__,), self.__dict__)
Here __reduce__ returns a function, the arguments for the function, and the instance state.
All you need to do then, is provide the right arguments to the factory function to recreate the class, and return an instance of that class. It'll be used instead of importing the class directly.

Classes are normal python objects, so, in theory, should be picklable, if you provide __reduce__ (or implement other pickle protocol methods) for them. Try to define __reduce__ on their metaclass.

python 3: class "template" (function that returns a parameterized class)

I am trying to create a function that is passed a parameter x and returns a new class C. C should be a subclass of a fixed base class A, with only one addition: a certain class attribute is added and is set to equal x.
In other words:
class C(A):
    pass
C.p = x  # x is the parameter passed to the factory function
Is this easy to do? Are there any issues I should be aware of?

First off, note that the term "class factory" is somewhat obsolete in Python. It's used in languages like C++, for a function that returns a dynamically-typed instance of a class. It has a name because it stands out in C++; it's not rare, but it's uncommon enough that it's useful to give the pattern a name. In Python, however, this is done constantly--it's such a basic operation that nobody bothers giving it a special name anymore.
Also, note that a class factory returns instances of a class--not a class itself. (Again, that's because it's from languages like C++, which have no concept of returning a class--only objects.) However, you said you want to return "a new class", not a new instance of a class.
It's trivial to create a local class and return it:
def make_class(x):
    class C(A):
        p = x
    return C
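A self-contained sketch of how this behaves, assuming Python 3. The base class A and its describe method are made up here for illustration:

```python
class A:
    def describe(self):
        return "p is {}".format(self.p)


def make_class(x):
    class C(A):
        p = x  # class attribute taken from the function argument
    return C


C1 = make_class(1)
C2 = make_class("two")

distinct = C1 is not C2
both_subclasses = issubclass(C1, A) and issubclass(C2, A)
text = C2().describe()
```

Each call builds a fresh class object, so C1 and C2 are unrelated to each other apart from sharing the base A. Both are still named "C", which can be confusing in tracebacks and reprs.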

The type function has a 3-argument version which dynamically constructs a new class. Pass the name, bases and a dict containing the attributes and methods of the class.
In your case:
def class_factory(x):
    return type("C", (A,), {"p": x})
You can obviously dynamically set the name of the class, "C", but note that in order to make the class publicly accessible, you also need to assign the result of the function to a variable. You can do that dynamically using globals()["C"] = ..., or assign the classes to a dictionary, whatever.
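A possible variation along those lines, with the class name passed in and the results kept in a dictionary. The registry dict, the extra name parameter and the empty base class A are assumptions made for this sketch:

```python
class A:
    pass


registry = {}  # hypothetical place to keep the generated classes


def class_factory(name, x):
    cls = type(name, (A,), {"p": x})  # name, bases, namespace
    registry[name] = cls
    return cls


class_factory("C1", 10)
class_factory("C2", 20)

names = sorted(registry)
values = [registry[n].p for n in names]
```

Keeping the classes in an explicit dictionary avoids writing into globals() while still making them reachable by name.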

Is there any particular reason why this syntax is used for instantiating a class?

I was wondering if anyone knew of a particular reason (other than purely stylistic) why the following languages use these syntaxes to instantiate a class?
Python:
class MyClass:
    def __init__(self):
        pass

x = MyClass()
Ruby:
class AnotherClass
  def initialize()
  end
end
x = AnotherClass.new()
I can't understand why the syntax used for the constructor and the syntax used to actually get an instance of the class are so different. Sure, I know it doesn't really make a difference but, for example, in ruby what's wrong with making the constructor "new()"?

When you are creating an object of a class, you are doing more than just initializing it. You are allocating the memory for it, then initializing it, then returning it.
Note also that in Ruby, new() is a class method, while initialize() is an instance method. If you simply overrode new(), you would have to create the object first, then operate on that object, and return it, rather than the simpler initialize() where you can just refer to self, as the object has already been created for you by the built-in new() (or in Ruby, leave self off as it's implied).
In Objective-C, you can actually see what's going on a little more clearly (but more verbosely) because you need to do the allocation and initialization separately, since Objective-C can't pass argument lists from the allocation method to the initialization one:
[[MyClass alloc] initWithFoo: 1 bar: 2];

Actually in Python the constructor is __new__(), while __init__() is instance initializer.
__new__() is a static method (special-cased so that you don't need to declare it as such), and it has to be called first. As its first parameter (usually named cls or klass) it gets the class. It creates the object instance, which is then passed to __init__() as the first parameter (usually named self), along with the rest of the arguments that were passed to __new__().
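The order of the two calls can be observed with a small sketch, assuming Python 3. The Tracked class and the calls list are hypothetical names used only for this illustration:

```python
calls = []


class Tracked:
    def __new__(cls, *args, **kwargs):
        calls.append(("__new__", cls.__name__))
        return super().__new__(cls)  # allocate the bare instance

    def __init__(self, value):
        calls.append(("__init__", value))
        self.value = value


t = Tracked(5)
```

After Tracked(5), the list records __new__ first (receiving the class) and __init__ second (receiving the new instance plus the same arguments).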

This is useful because in Python, a constructor is just another function. For example, I've done this several times:
def ClassThatShouldntBeDirectlyInstantiated():
    return _classThatShouldntBeDirectlyInstantiated()

class _classThatShouldntBeDirectlyInstantiated(object):
    ...
Of course, that's a contrived example, but you get the idea. Essentially, most people that use your class will probably think of ClassThatShouldntBeDirectlyInstantiated as your class, and there's no need to let them think otherwise. Doing things this way, all you have to do is document the factory function as the class it instantiates and not confuse anyone using the class.
In a language like C# or Java, I sometimes find it annoying to make classes like this because it can be difficult to determine whether you should use the constructor or some factory function. I'm not sure if this is also the case in Ruby though.
