I am reading up on how we ensure data encapsulation in Python. One of the blogs says:
"Data Encapsulation means, that we should only be able to access private attributes via getters and setters"
Consider the following snippets from the blog:
class Robot:
    def __init__(self, name=None, build_year=None):
        self.name = name
        self.build_year = build_year
Now, if I create an object of the class as below:
obj1 = Robot()
obj1.name = "Robo1"
obj1.build_year = "1978"
Currently, I can access the attributes directly, as I have defined them as public (without the __ notation).
Now, to ensure data encapsulation, I need to define the attributes as private
using the __ notation and access the private attributes via getters and setters.
So the new class definition is as follows:
class Robot:
    def __init__(self, name=None, build_year=2000):
        self.__name = name
        self.__build_year = build_year
    def set_name(self, name):
        self.__name = name
    def get_name(self):
        return self.__name
    def set_build_year(self, by):
        self.__build_year = by
    def get_build_year(self):
        return self.__build_year
Now I instantiate the class as below:
x = Robot("Marvin", 1979)
x.set_build_year(1993)
This way, I achieve data encapsulation, as private data members are no longer accessed directly and can only be accessed via the class methods.
Q1: Why are we doing this? Who are we protecting the code from? Who is the outside world? Anyone who has the source code can tweak it as per their requirements, so why do we add extra methods (get/set) to modify the attributes at all?
Q2:Is the above example considered data encapsulation?
Data encapsulation is slightly more general than access protection. name and build_year are encapsulated by the class Robot regardless of how you define the attributes. Python takes the position that getters and setters that do nothing more than access or assign to the underlying attribute are unnecessary.
Even using the double-underscore prefix is just advisory, and is more concerned with preventing name collisions in subclasses. If you really wanted to get to the __build_year attribute directly, you still could with
# Prefix attribute name with _Robot
x._Robot__build_year = 1993
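A minimal sketch of what the mangling does in practice, using a trimmed copy of the Robot class from the question (Python 3 syntax; the variables direct_access_failed and mangled_names are only there for illustration):

```python
class Robot:
    def __init__(self, name=None, build_year=2000):
        self.__name = name
        self.__build_year = build_year

    def get_build_year(self):
        return self.__build_year


x = Robot("Marvin", 1979)

# Outside the class body the name is not mangled, so this lookup fails.
try:
    x.__build_year
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The attributes are really stored under their mangled names.
mangled_names = sorted(vars(x))

# Nothing stops outside code from assigning to the mangled name.
x._Robot__build_year = 1993
```

So the double underscore changes the spelling of the attribute, but it does not restrict who may read or write it.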
A better design in Python is to use a property, which causes Python to invoke a defined getter and/or setter whenever an attribute is accessed directly. For example:
class Robot(object):
    def __init__(self, name, by):
        self.name = name
        self.build_year = by
    @property
    def name(self):
        return self._name
    @name.setter
    def name(self, newname):
        self._name = newname
    @property
    def build_year(self):
        return self._build_year
    @build_year.setter
    def build_year(self, newby):
        self._build_year = newby
You wouldn't actually define these property functions so simply, but a big benefit is that you can start by allowing direct access to a name attribute, and if you decide later that there should be more logic involved in getting/setting the value and you want to switch to properties, you can do so without affecting existing code. Code like
x = Robot("bob", 1993)
x.build_year = 1993
will work the same whether or not x.build_year = 1993 assigns to build_year directly or if it really triggers a call to the property setter.
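To make that point concrete, here is a small sketch with two hypothetical versions of the class (RobotPlain and RobotWithProperty are names invented for this illustration, as is the rebuild helper). The calling code is identical for both:

```python
class RobotPlain:
    """First version: build_year is an ordinary attribute."""

    def __init__(self, name, build_year):
        self.name = name
        self.build_year = build_year


class RobotWithProperty:
    """Later version: same interface, but assignment goes through a setter."""

    def __init__(self, name, build_year):
        self.name = name
        self.build_year = build_year  # this already calls the setter below

    @property
    def build_year(self):
        return self._build_year

    @build_year.setter
    def build_year(self, value):
        self._build_year = int(value)  # extra logic added after the fact


def rebuild(robot):
    # Caller code: does not care which version it was given.
    robot.build_year = 1993
    return robot.build_year
```

Neither rebuild nor any other caller has to change when the class moves from the first version to the second.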
About source code: sometimes you supply others with compiled Python files that do not expose the source, and you don't want people to make a mess with direct attribute assignments.
Now, consider data encapsulation as a safeguard, the last checkpoint before assigning or supplying values:
You may want to validate or process assignments in the setters, to make sure the assignment is valid for your needs or that the value is stored in the right format (e.g. you want to check that the attribute __build_year is higher than 1800, or that the name is a string). This is very important in dynamic languages like Python, where a variable is not declared with a specific type.
The same goes for getters. You might want to return the year as a decimal, but use it as an integer inside the class.
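The checks mentioned above could be sketched roughly like this. It uses properties rather than set_/get_ methods, and the exact rules (a str name, a year later than 1800, a float handed out by the getter) are assumptions taken from the examples in the text, not a fixed recipe:

```python
class Robot:
    def __init__(self, name, build_year):
        self.name = name              # goes through the name setter
        self.build_year = build_year  # goes through the build_year setter

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not isinstance(value, str):
            raise TypeError("name must be a string")
        self._name = value

    @property
    def build_year(self):
        # Stored as an int internally, handed out as a float.
        return float(self._build_year)

    @build_year.setter
    def build_year(self, value):
        value = int(value)
        if value <= 1800:
            raise ValueError("build_year must be later than 1800")
        self._build_year = value
```

With this in place, Robot("Marvin", 1700) raises ValueError and Robot(42, 1979) raises TypeError, while valid values are stored as before.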
Yes, your example is a basic data encapsulation.
Am I missing something, or is something like this not possible?
class Outer:
    def __init__(self, val):
        self.__val = val
    def __getVal(self):
        return self.__val
    def getInner(self):
        return self.Inner(self)
    class Inner:
        def __init__(self, outer):
            self.__outer = outer
        def getVal(self):
            return self.__outer.__getVal()
foo = Outer('foo')
inner = foo.getInner()
val = inner.getVal()
print val
I'm getting this error message:
return self.__outer.__getVal()
AttributeError: Outer instance has no attribute '_Inner__getVal'
You are trying to apply Java techniques to Python classes. Don't. Python has no privacy model like Java does. All attributes on a class and its instances are always accessible, even when using __name double-underscore names in a class (they are simply renamed to add a namespace).
As such, you don't need an inner class either, as there is no privileged access for such a class. You can just put that class outside Outer and have the exact same access levels.
You run into your error because Python renames attributes with initial double-underscore names within a class context to avoid clashing with subclasses. These are called class private because the renaming adds the class names as a namespace; this applies both to their definition and use. See the Reserved classes of identifiers section of the reference documentation:
__*
Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to help avoid name clashes between “private” attributes of base and derived classes.
All names with double underscores in Outer get renamed to _Outer prefixed, so __getVal is renamed to _Outer__getVal. The same happens to any such names in Inner, so your Inner.getVal() method will be looking for a _Inner__getVal attribute. Since Outer has no _Inner__getVal attribute, you get your error.
You could manually apply the same transformation to Inner.getVal() to 'fix' this error:
def getVal(self):
    return self.__outer._Outer__getVal()
But you are not using double-underscore names as intended anyway, so move to single underscores instead, and don't use a nested class:
class Outer:
    def __init__(self, val):
        self._val = val
    def _getVal(self):
        return self._val
    def getInner(self):
        return _Inner(self)
class _Inner:
    def __init__(self, outer):
        self._outer = outer
    def getVal(self):
        return self._outer._getVal()
I renamed Inner to _Inner to document the type is an internal implementation detail.
While we are on the subject, there really is no need to use accessors either. In Python you can switch between property objects and plain attributes at any time. There is no need to code defensively like you have to in Java, where switching between attributes and accessors carries a huge switching cost. In Python, don't use obj.getAttribute() and obj.setAttribute(val) methods. Just use obj.attribute and obj.attribute = val, and use property if you need to do more work to produce or set the value. Switch to or away from property objects at will during your development cycles.
As such, you can simplify the above further to:
class Outer(object):
    def __init__(self, val):
        self._val = val
    @property
    def inner(self):
        return _Inner(self)
class _Inner(object):
    def __init__(self, outer):
        self._outer = outer
    @property
    def val(self):
        return self._outer._val
Here outer.inner produces a new _Inner() instance as needed, and the _Inner.val property proxies to the stored self._outer reference. The user of the instance never need know either attribute is handled by a property object:
>>> outer = Outer(42)
>>> print outer.inner.val
42
Note that for property to work properly in Python 2, you must use new-style classes (inherit from object to do this); on old-style classes only property getters are supported (meaning setting is not prevented either!). New-style classes are the default in Python 3.
The leading-double-underscore naming convention in Python is supported with "name mangling." This is implemented by inserting the name of the current class in the name, as you have seen.
What this means for you is that names of the form __getVal can only be accessed from within the exact same class. If you have a nested class, it will be subject to different name mangling. Thus:
class Outer:
    def foo(self):
        print(self.__bar)
    class Inner:
        def foo2(self):
            print(self.__bar)
In the two nested classes, the names will be mangled to _Outer__bar and _Inner__bar respectively.
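One way to see this for yourself, sketched under the assumption that you are running CPython (where each function exposes its compiled code object as __code__), is to look at the attribute names the compiler recorded for each method. The variables outer_names and inner_names are just for illustration:

```python
class Outer:
    def foo(self):
        print(self.__bar)

    class Inner:
        def foo2(self):
            print(self.__bar)


# The names are rewritten when each class body is compiled, so the
# mangled spelling shows up in the functions' code objects.
outer_names = Outer.foo.__code__.co_names
inner_names = Outer.Inner.foo2.__code__.co_names
```

Here outer_names contains "_Outer__bar" and inner_names contains "_Inner__bar": the mangling always uses the innermost enclosing class.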
This is not Java's notion of private. It's "lexical privacy" (akin to "lexical scope" ;-).
If you want Inner to be able to access the Outer value, you will have to provide a non-mangled API. Perhaps a single underscore: _getVal, or perhaps a public method: getVal.
I'm trying to understand Python classes. I am a little confused about defining __init__. If I have 4 functions, all taking various input variables, do I have to assign each variable in __init__?
class Thing:
    def __init__(self, arguments, name, address, phone_number, other):
        self.arguments = arguments
        self.name = name
        self.address = address
        self.phone_number = phone_number
        self.other = other
    def First(self, name):
        print self.name
    def Arguments(self, arguments):
        print self.arguments
    def Address(self, address, phone_number):
        print self.address + str(self.phone_number)
    def Other(self, other):
        print self.other
The above is completely made up, so please excuse its pointlessness (arguably its point is to illustrate my question, so I guess it's not pointless).
No doubt I'm going to get loads of downvotes for this question for some reason, but I have been reading various books (Learning Python The Hard Way, Python For Beginners) and various tutorials online, and none of them actually confirm "You must add every variable in the init function". So any help understanding __init__ a little better would be appreciated.
Firstly: The __init__() function is special in that it is called for you while creating a new instance of a class. Apart from that, it is a function like any other function, you can even call it manually if you want.
Then: You are free to do what you want with the parameters passed to a function. Often, the parameters passed to __init__() are stored inside the object (self) that is being initialized. Sometimes, they are used in other ways and the result then stored in self, like passing a hostname and using that to create a socket - you then only store the socket. You don't have to do anything with the parameters though, you can ignore them like you do in your First() function, which receives a parameter name which is ignored. Note that this parameter name and the similar attribute self.name are different things!
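As a rough sketch of those three cases (a parameter that is transformed before being stored, a parameter that is ignored, and a method parameter that merely shares its name with an attribute), here is a made-up Temperature class. The class and all of its names are hypothetical and only serve the illustration:

```python
class Temperature:
    def __init__(self, fahrenheit, label=None):
        # The parameter is used to compute a value; only the result is stored.
        self.celsius = (fahrenheit - 32) * 5 / 9
        # `label` is accepted but deliberately ignored in this sketch.

    def describe(self, celsius):
        # The parameter `celsius` and the attribute self.celsius are
        # two different things that happen to share a name.
        return "argument=%s, attribute=%s" % (celsius, self.celsius)


t = Temperature(212)
```

After this, t has a celsius attribute but no fahrenheit or label attribute, and t.describe(5) reports both the argument and the stored attribute.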
Notes:
It is uncommon to ignore parameters passed to __init__() though, just as it is uncommon to ignore parameters in general. In some cases, they are "reserved" so they can be used by derived classes but typically they are unintended (as with your First() function, I guess).
Check out PEP8, which is a style guide. Adhering to it will make it easier for others to read your code. E.g. First() should be first() instead.
Upgrade to Python 3. I don't think there's any excuse nowadays to learn Python 2, and little excuse except maintenance to use it in general.
If the variable is logically connected to the object itself, it's good to set it in constructor:
class Student:
    def __init__(self, name):
        self.name = name
    def print_name(self):
        print(self.name)
If the variable is just a temporary parameter to some function, then just pass it as a parameter:
class Cafeteria:
    def __init__(self):
        pass  # nothing
    def process(self, student_name):
        print(student_name + " got lunch")
In Python you don't have to declare all possible object attributes like you have to do in C++, C#, Java etc. I think it's still good idea to initialize them in constructor to some null value (None, 0), but it's not necessary. This is just fine:
class Cafeteria:
    def set_today_menu(self, menu):
        self.menu = menu
    def process(self, student_name):
        print(student_name + " got " + self.menu)
You don't have to, but to access them they need to be set somewhere... You can also set defaults outside of any of the defs in a class if you want.
class Class:
    RandomVariable = 5
    def __init__(self, val):
        self.RandomVariable = val
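A short sketch of how such a class-level default interacts with an assignment in __init__; the Settings class and its retries attribute are invented for the example:

```python
class Settings:
    retries = 5  # class-level default, shared by all instances

    def __init__(self, retries=None):
        if retries is not None:
            # The instance attribute shadows the class-level default.
            self.retries = retries


default = Settings()
custom = Settings(2)
```

Reading default.retries falls back to the class attribute, while custom.retries finds the instance attribute first; Settings.retries itself is unchanged.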
This method is known as the initializer: it runs when you instantiate an object, and it is where you add attributes to it. A human object, if we were to create such a class, might have name, age, and gender attributes, all of which are bound to the object through self in the initializer. An initializer may also modify global variables. If you wanted to count the number of babies born, you could increment a global counter variable from our fictitious human class initializer.
Say I have a class that looks like
class MeasurementList:
    def __init__(self, measurement_list):
        self.__measurements = measurement_list
    @property
    def measurements(self):
        return self.__measurements
what is the most pythonic way to retrieve the value of self.measurements from inside the class; directly accessing the variable or going via the property (external accessor)? I.e.,
def do_something(self):
    # do something
    return self.measurements
or
def do_something(self):
    # do something
    return self.__measurements
Does either alternative have any speed advantages, easier refactoring, or other benefits?
The point of properties is to add additional functionality to the process of getting/setting a field, while keeping the interface of a field.
That means you start out with a simple field, and access it as a field:
class MeasurementList:
    def __init__(self, measurement_list):
        self.measurements = measurement_list
    def foo(self):
        print("there are %d measurements" % len(self.measurements))
Then if you want/have to add additional logic to the setter/getter you convert it into a property, without having changed the interface. Thus no need to refactor accessing code.
class MeasurementList:
    def __init__(self, measurement_list):
        self._count = 0
        self.measurements = measurement_list
    @property
    def measurements(self):
        return self._measurements
    @measurements.setter
    def measurements(self, value):
        self._measurements = value
        self._count = len(value)
    def foo(self):
        print("there are %d measurements" % (self._count))
    def bar(self):
        print(self.measurements)
Alternative reasons for using properties are readonly properties or properties that return computed (not directly stored in fields) values. In the case of read only properties you would access the backing field directly to write (from inside the class).
class MeasurementList:
    def __init__(self, measurement_list):
        self._measurements = measurement_list
    # readonly
    @property
    def measurements(self):
        return self._measurements
    # computed property
    @property
    def count(self):
        return len(self.measurements)
    def foo(self):
        print("there are %d measurements" % (self.count))
    def bar(self):
        print(self.measurements)
Keeping all that in mind you should not forget that there is no such thing as 'private' in python. If anyone really wants to access a private anything he can do so. It is just convention that anything starting with an underscore should be considered private and not be accessed by the caller. That is also the reason why one underscore is enough. Two underscores initiate some name mangling that is primarily used to avoid name conflicts, not prohibit access.
When you use properties in Python, you should almost always avoid accessing the attribute underneath the property unless it's necessary. Why?
Properties in Python are used to create getters, setters and deleters, as you probably know.
They are usually used when you process the property's data during those operations. I don't really have a good example for it right now, but consider the following:
import hashlib
class User:
    # _password stores the hash object built from the user's password.
    @property
    def password(self):
        return self._password.hexdigest()  # Returns the hash as a string
    @password.setter
    def password(self, val):
        # Creates the hash object; sha256 is only an illustrative choice here
        self._password = hashlib.sha256(val.encode("utf-8"))
Here, using _password and password produces quite different output. In most cases you simply need password, both inside and outside the class definition, unless you want to interact directly with the object wrapped by it.
If the getter returns the same object as the underlying attribute, you should still follow the same practice, as you may wish someday to add some checks or mechanics to it, and it will save you from refactoring every use of _attribute. Following that convention will also save you from errors when creating more complex descriptors.
Also, from your code, note that using __measurements (leading double underscore) results in name mangling of the attribute name, so if you ever inherit from MeasurementList, the subclass can only reach this attribute through its mangled name, _MeasurementList__measurements.
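The inheritance problem can be sketched as follows. FilteredList, _also_kept and the two via_* methods are hypothetical names added for the illustration:

```python
class MeasurementList:
    def __init__(self, measurement_list):
        # Stored as _MeasurementList__measurements because of mangling.
        self.__measurements = measurement_list
        # Single underscore: stored under exactly this name.
        self._also_kept = measurement_list


class FilteredList(MeasurementList):
    def via_double_underscore(self):
        # Compiled as self._FilteredList__measurements, which does not exist.
        return self.__measurements

    def via_single_underscore(self):
        return self._also_kept
```

Calling via_double_underscore() on a FilteredList raises AttributeError, while via_single_underscore() returns the list as expected.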
I presume you have seen code like this in Java. It is, however, deeply unpythonic to use methods where attribute access serves the purpose perfectly well. Your existing code would be much more simply written as
class MeasurementList:
    def __init__(self, measurement_list):
        self.measurements = measurement_list
Then no property is required.
The point is, presumably, to avoid allowing code external to the class to alter the value of the __measurements attribute. How necessary is this in practice?
Use setters and getters both inside and outside your class. It would make your code easier to maintain once you add some additional data processing into setters and getters:
class C(object):
    _p = 1
    @property
    def p(self):
        print 'getter'
        return self._p
    @p.setter
    def p(self, val):
        print 'setter'
        self._p = val
    def any_method(self):
        self.p = 5
        print '----'
        a = self.p
myObject = C()
myObject.any_method()
From the output, you see that setter and getter are invoked:
setter
----
getter
When I write a class in Python, most of the time I am eager to set the variables I use as attributes of the object. Are there any rules or general guidelines about which variables should be class/instance attributes and which should not?
For example:
class simple(object):
    def __init(self):
        a=2
        b=3
        return a*b
class simple(object):
    def __init(self):
        self.a=2
        self.b=3
        return a*b
While I completely understand that the attributes should be properties of the object, this is simple to follow only when the class declaration is simple. As the program grows longer and there are many places where data is exchanged between various modules, I get confused about where I should use a/b or self.a/self.b. Are there any guidelines for this?
Where you use self.a you are creating an instance attribute, so it can be accessed from outside the class and persists beyond that function. These should be used for storing data about the object.
Where you use a it is a local variable, and only lasts while in the scope of that function, so should be used where you are only using it within the function (as in this case).
Note that __init is misleading, as it looks like __init__ - but isn't the constructor. If you intended them to be the constructor, then it makes no sense to return a value (as the new object is what is returned).
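To illustrate the last point, here is a small sketch (Python 3 syntax) of what happens when a real __init__ returns a value. The class name Simple and the variable error_message are made up for the example:

```python
class Simple(object):
    def __init__(self):
        self.a = 2
        self.b = 3
        return self.a * self.b  # not allowed: __init__ must return None


try:
    Simple()
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

Creating the instance fails with a TypeError, so a computed value like a*b belongs in an ordinary method instead.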
class Person(object):
    def __init__(self, name):
        # Introduce all instance variables on __init__
        self.name = name
        self.another = None
    def get_name(self):
        # get_name has access to the `instance` variable 'name'
        return self.name
So if you want a variable to be available on more than one method, make
it an instance variable.
Notice my comment on introducing all instance vars on __init__.
Although the example below is valid python don't do it.
class Person(object):
    def __init__(self):
        self.a = 0
    def foo(self):
        self.b = 1  # Whoa, introduced new instance variable
Instead initialize all your instance variables on __init__ and set
them to None if no other value is appropriate for them.
I try to imagine what I want the API of my class to look like prior to implementing it. I think to myself, If I didn't write this class, would I want to read the documentation about what this particular variable does? If reading that documentation would simply waste my time, then it should probably be a local variable.
Occasionally, you need to preserve some information, but you wouldn't necessarily want that to be part of the API, which is when you use the convention of prepending an underscore, e.g. self._some_data_that_is_not_part_of_the_api.
The self parameter refers to the object itself. So if you need to use one of the class attributes outside of the class, you access it through the name of the class instance followed by the attribute name. I don't think there is any guideline on when to use self; it all depends on your needs. When you are building a class, you should try to think about what you will use the variables you are creating for. If you know for sure that you will need a specific attribute in the program that imports your class, then add self.
I have many different small classes which have a few fields each, e.g. this:
class Article:
    def __init__(self, name, available):
        self.name = name
        self.available = available
What's the easiest and/or most idiomatic way to make the name field read only, so that
a = Article("Pineapple", True)
a.name = "Banana" # <-- should not be possible
is not possible anymore?
Here's what I considered so far:
1. Use a getter (ugh!).
class Article:
    def __init__(self, name, available):
        self._name = name
        self.available = available
    def name(self):
        return self._name
Ugly, non-pythonic - and a lot of boilerplate code to write (especially if I have multiple fields to make read-only). However, it does the job and it's easy to see why that is.
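2. Use __setattr__:
class Article:
    def __init__(self, name, available):
        # Write the initial value straight into __dict__, since
        # __setattr__ below rejects any assignment to "name".
        self.__dict__["name"] = name
        self.available = available
    def __setattr__(self, name, value):
        if name == "name":
            raise Exception("%s property is read-only" % name)
        self.__dict__[name] = value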
Looks pretty on the caller side, seems to be the idiomatic way to do the job - but unfortunately I have many classes with only a few fields to make read only each. So I'd need to add a __setattr__ implementation to all of them. Or use some sort of mixin maybe? In any case, I'd need to make up my mind how to behave in case a client attempts to assign a value to a read-only field. Yield some exception, I guess - but which?
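For what it's worth, the mixin idea floated above could look roughly like the sketch below. ReadOnlyFieldsMixin and its _read_only tuple are hypothetical names, not an existing library feature. The sketch assumes that the first assignment (from __init__) should succeed and that AttributeError is a reasonable exception to raise, since that is what Python itself raises for a property without a setter:

```python
class ReadOnlyFieldsMixin(object):
    # Hypothetical helper: subclasses list their read-only field names here.
    _read_only = ()

    def __setattr__(self, name, value):
        # Allow the first assignment (from __init__), reject later ones.
        if name in self._read_only and name in self.__dict__:
            raise AttributeError("%s is read-only" % name)
        object.__setattr__(self, name, value)


class Article(ReadOnlyFieldsMixin):
    _read_only = ("name",)

    def __init__(self, name, available):
        self.name = name
        self.available = available
```

Each class then only needs to declare its _read_only tuple. Note that this is still advisory: writing to the instance's __dict__ directly bypasses the check.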
3. Use a utility function to define properties (and optionally getters) automatically. This is basically the same idea as (1), except that I don't write the getters explicitly but rather do something like
class Article:
    def __init__(self, name, available):
        # This function would somehow give a '_name' field to self
        # and a 'name()' getter to the 'Article' class object (if
        # necessary); the getter simply returns self._name
        defineField(self, "name")
        self.available = available
The downside of this is that I don't even know if this is possible (or how to implement it) since I'm not familiar with runtime code generation in Python. :-)
So far, (2) appears to be most promising to me except for the fact that I'll need __setattr__ definitions to all my classes. I wish there was a way to 'annotate' fields so that this happens automatically. Does anybody have a better idea?
For what it's worth, I'm using Python 2.6.
UPDATE:
Thanks for all the interesting responses! By now, I have this:
def ro_property(o, name, value):
    setattr(o.__class__, name, property(lambda o: o.__dict__["_" + name]))
    setattr(o, "_" + name, value)
class Article(object):
    def __init__(self, name, available):
        ro_property(self, "name", name)
        self.available = available
This seems to work quite nicely. The only changes needed to the original class are
I need to inherit object (which is not such a stupid thing anyway, I guess)
I need to change self._name = name to ro_property(self, "name", name).
This looks quite neat to me - can anybody see a downside with it?
I would use property as a decorator to manage your getter for name (see the example for the class Parrot in the documentation). Use, for example, something like:
class Article(object):
    def __init__(self, name, available):
        self._name = name
        self.available = available
    @property
    def name(self):
        return self._name
If you do not define the setter for the name property (using the decorator x.setter around a function) this throws an AttributeError when you try and reset name.
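A quick sketch of that behaviour (Python 3 syntax; the variable assignment_blocked is only there for illustration), which also shows that the protection is a convention rather than a lock:

```python
class Article(object):
    def __init__(self, name, available):
        self._name = name
        self.available = available

    @property
    def name(self):
        return self._name


a = Article("Pineapple", True)

# No setter was defined, so assigning through the property fails.
try:
    a.name = "Banana"
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

# The backing attribute is still writable by anyone who ignores the convention.
a._name = "Banana"
```

After the last line a.name returns "Banana", so the property guards against accidental assignment, not against deliberate access.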
Note: You have to use Python's new-style classes (i.e. in Python 2.6 you have to inherit from object) for properties to work correctly. This is not the case according to @SvenMarnach.
As pointed out in other answers, using a property is the way to go for read-only attributes. The solution in Chris' answer is the cleanest one: It uses the property() built-in in a straight-forward, simple way. Everyone familiar with Python will recognize this pattern, and there's no domain-specific voodoo happening.
If you don't like that every property needs three lines to define, here's another straight-forward way:
from operator import attrgetter
class Article(object):
    def __init__(self, name, available):
        self._name = name
        self.available = available
    name = property(attrgetter("_name"))
Generally, I don't like defining domain-specific functions to do something that can be done easily enough with standard tools. Reading code is so much easier if you don't have to get used to all the project-specific stuff first.
Based on Chris' answer, but arguably more pythonic:
def ro_property(field):
    return property(lambda self: self.__dict__[field])
class Article(object):
    name = ro_property('_name')
    def __init__(self):
        self._name = "banana"
If trying to modify the property it will raise an AttributeError.
a = Article()
print a.name # -> 'banana'
a.name = 'apple' # -> AttributeError: can't set attribute
UPDATE: About your updated question, the (little) problem I see is that you are modifying the definition of the property in the class every time you create an instance, and I don't think that is such a good idea. That's why I put the ro_property call outside of the __init__ function.
What about this?
def ro_property(name):
    def ro_property_decorator(c):
        setattr(c, name, property(lambda o: o.__dict__["_" + name]))
        return c
    return ro_property_decorator
@ro_property('name')
@ro_property('other')
class Article(object):
    def __init__(self, name):
        self._name = name
        self._other = "foo"
a = Article("banana")
print a.name # -> 'banana'
a.name = 'apple' # -> AttributeError: can't set attribute
Class decorators are fancy!
It should be noted that it's always possible to modify attributes of an object in Python: there are no truly private variables. It's just that some approaches make it a bit harder, but a determined coder can always look up and modify the value of an attribute. For example, I can always modify your __setattr__ if I want to...
For more information, see Section 9.6 of The Python Tutorial. Python uses name mangling when attributes are prefixed with __ so the actual name at runtime is different but you could still derive what that name at runtime is (and thus modify the attribute).
I would stick with your option 1, but refine it to use a Python property:
class Article:
    def get_name(self):
        return self.__name
    name = property(get_name)