Invariants Python

Invariants Python - python

I'm reading the python tutorial. The 3rd paragraph confuses me a bit.
"Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their data attributes."
What exactly do they mean by invariants? Do they mean data attributes that certain methods rely on? (e.g a method that returns a certain data member; i.e. a getter method)

I believe what they want to say there is that you should be careful when accessing/mutating an object’s properties. (Note that I call them “properties”, not “data attributes”)
This is because an object may put something in a property that might be necessary to maintain an object’s state, i.e. an invariant which you would normally not be given means means to mess with.
But unlike other programming languages, Python does not have any protection for object members. There are no private instance variables or methods†, so anyone can change an object in the way they want to, possibly completely destroying its functionality.
So the tutorial suggests you to avoid storing important things in properties—but to be honest, there is not really a much better way, and if someone would want to mess with it, they just could. For example you can swap out whole methods at run time, replacing them with a completely different logic.
So trying to protect yourself too much is not really worth it. Instead, I would suggest you to just document everything in a clear way. E.g. point out which properties are free to touch by users, and which are for internal use only.
† There are some means and conventions, e.g. one leading underscore for internal/protected members, and two leading underscores for private members, but those will not technically prevent you from accessing it, so it’s not a protection.

Related

Python encapsulated attributes on real world

I was doing some research about the use of encapsulation in object oriented programming using Python and I have stumbled with this topic that has mixed opinions about how encapsulated attributes work and about the usage of them.
I have programmed this piece of code that only made matters more confuse to me:
class Dog:
def __init__(self,weight):
self.weight = weight
__color =''
def set_color(self,color):
self.__color = color
def get_color(self):
print(self.__color)
rex = Dog(59)
rex.set_color('Black')
rex.get_color()
rex.color = 'White'
rex.__color = rex.color
print(rex.__color)
rex.get_color()
The result is:
>Black
>White
>Black
I understand that the reason behind this is because when we do the assignment rex.__color = rex.color, a new attribute is created that does not point to the real __color of the instanced Dog.
My questions here are:
Is this a common scenario to occur?
Are private attributes a thing used really often?

In a language that does not have properties (eg. java) this is so common that it has become a standard, and all frameworks assume that getters/setters already exist.
However, in python you can have properties, which are essentially getters/setters that can be added later without altering the code that uses the variables. So, no reason to do it in python. Use the fields as public, and add properties if something changes later.
Note: use single instead of double underscore in your "private" variables. Not only it's the common convention, but also, double underscore is handled differently by the interpreter.

Encapsulation is not about data hidding but about keeping state and behaviour together. Data hidding is meant as a way to enforce encapsulation by preventing direct access to internal state, so the client code must use (public) methods instead. The main points here are 1/ to allow the object to maintain a coherent state (check the values, eventually update some other part of the state accordingly etc) and 2/ to allow implementation changes (internal state / private methods) without breaking the client code.
Languages like Java have no support for computed attributes, so the only way to maintain encapsulation in such languages is to make all attributes protected or private and to eventally provide accessors. Alas, some people never got the "eventually" part right and insist on providing read/write accessors for all attributes, which is a complete nonsense.
Python has a strong support for computed attributes thru the descriptor protocol (mostly known via the generic property type but you can of course write your own desciptors instead), so there's no need for explicit getters/setters - if your design dictates that some class should provide a publicly accessible attribute as part of it's API, you can always start with just a public attribute and if at some point you need to change implementation you can just replace it with a computed attribute.
This doesn't mean you should make all your attributes public !!! - most of the time, you will have "implementation attributes" (attributes that support the internal state but are in no way part of the class API), and you definitly want to keep those protected (by prefixing them with a single leading underscore).
Note that Python doesn't try to technically enforce privacy, it's only a naming convention and you can't prevent client code to access internal state. Nothing to be worried about here, very few peoples stupid enough to bypass the official API without good reasons, and then they know their code might break something and assume all consequences.

Should I treat my own single underscored attributes as private?

If I use a single leading underscore for an attribute in a class, would it be wrong of me to access it from a different object? Is a single underscore saying "I will use this as I please but you the user shouldn't touch it" or should even the developer treat it as though it were private?

The single underscore indicates that it's not for public consumption; code within the same package is welcome to poke and prod it.

I think you should refrain from doing so as much as reasonably possible because it breaks encapsulation which one of the benefits of the object-oriented paradigm. While some feel it's fine as certain higher levels of scoping, say module or package, I've found avoiding doing so a useful rule-of-thumb even within the various methods of a single class -- at least if it's fairly complicated, subtle, or just an aspect I think I might want to change later.
The reason is every time you do something with one, you're creating a dependency between one object's internals and where you're coding, be it another part of the same object or module or whatever. The more of this there is, the harder it will be later to maintain or enhance things.
Another aspect to consider is the fact that whenever you feel a need to do so, it may be indicative of a need for a better design -- in which cause you can treat it a warning sign and react accordingly.
Creating an interface for doing operations that deal with internal details often means having to design and implement more code than not doing so would. So in each case, the benefits must be weighed against the costs to determine if it's worth it. Eventually experience (and education if it's ongoing) makes such decisions easier maybe even instinctual.

Python syntax reasoning (why not fall back for . the way django template syntax does?)

My karate instructor is fond of saying, "a block is a lock is a throw is a blow." What he means is this: When we come to a technique in a form, although it might seem to look like a block, a little creativity and examination shows that it can also be seen as some kind of joint lock, or some kind of throw, or some kind of blow.
So it is with the way the django template syntax uses the dot (".") character. It perceives it first as a dictionary lookup, but it will also treat it as a class attribute, a method, or list index - in that order. The assumption seems to be that, one way or another, we are looking for a piece of knowledge. Whatever means may be employed to store that knowledge, we'll treat it in such a way as to get it into the template.
Why doesn't python do the same? If there's a case where I might have assigned a dictionary term spam['eggs'], but know for sure that spam has an attribute eggs, why not let me just write spam.eggs and sort it out the way django templates do?
Otherwise, I have to except an AttributeError and add three additional lines of code.
I'm particularly interested in the philosophy that drives this setup. Is it regarded as part of strong typing?

django templates and python are two, unrelated languages. They also have different target audiences.
In django templates, the target audience is designers, who proabably don't want to learn 4 different ways of doing roughly the same thing ( a dictionary lookup ). Thus there is a single syntax in django templates that performs the lookup in several possible ways.
python has quite a different audience. developers actually make use of the many different ways of doing similar things, and overload each with distinct meaning. When one fails it should fail, because that is what the developer means for it to do.

JUST MY correct OPINION's opinion is indeed correct. I can't say why Guido did it this way but I can say why I'm glad that he did.
I can look at code and know right away if some expression is accessing the 'b' key in a dict-like object a, the 'b' attribute on the object a, a method being called on or the b index into the sequence a.
Python doesn't have to try all of the above options every time there is an attribute lookup. Imagine if every time one indexed into a list, Python had to try three other options first. List intensive programs would drag. Python is slow enough!
It means that when I'm writing code, I have to know what I'm doing. I can't just toss objects around and hope that I'll get the information somewhere somehow. I have to know that I want to lookup a key, access an attribute, index a list or call a method. I like it that way because it helps me think clearly about the code that I'm writing. I know what the identifiers are referencing and what attributes and methods I'm expecting the object of those references to support.
Of course Guido Van Rossum might have just flipped a coin for all I know (He probably didn't) so you would have to ask him yourself if you really want to know.
As for your comment about having to surround these things with try blocks, it probably means that you're not writing very robust code. Generally, you want your code to expect to get some piece of information from a dict-like object, list-like object or a regular object. You should know which way it's going to do it and let anything else raise an exception.
The exception to this is that it's OK to conflate attribute access and method calls using the property decorator and more general descriptors. This is only good if the method doesn't take arguments.

The different methods of accessing
attributes do different things. If
you have a function foo the two lines
of code
a = foo,
a = foo()
do two
very different things. Without
distinct syntax to reference and call
functions there would be no way for
python to know whether the variable
should be a reference to foo or the
result of running foo. The () syntax removes the ambiguity.
Lists and dictionaries are two very different data structures. One of the things that determine which one is appropriate in a given situation is how its contents can be accessed (key Vs index). Having separate syntax for both of them reinforces the notion that these two things are not the same and neither one is always appropriate.
It makes sense for these distinctions to be ignored in a template language, the person writing the html doesn't care, the template language doesn't have function pointers so it knows you don't want one. Programmers who write the python that drive the template however do care about these distinctions.

In addition to the points already posted, consider this. Python uses special member variables and functions to provide metadata about the object. Both the interpreter and programmers make heavy use of these. For example, both dicts and lists have a __len__ member function. Now, if a dict's data were accessed by using the . operator, a potential ambiguity arises if the dict has a key called __len__. You could special-case these, but many objects have a __dict__ attribute which is a mapping of member names and values. If that object happened to be a container, which also defined a __len__ attribute, you would end up with an utter mess.
Problems like this would end up turning Python into a mishmash of special cases that the programmer would have to constantly be aware of. This would detract from the reason why many people use Python in the first place, i.e., its elegant simplicity.
Now, consider that new users often shadow built-ins (if the code in SO questions is any indication) and having something like this starts to look like a really bad idea, since it would exacerbate the problem many-fold.

In addition to the responses above, it's not practical to merge dictionary lookup and object lookup in general because of the restrictions on object members.
What if your key has whitespace? What if it's an int, or a frozenset, etc.? Dot notation can't account for these discrepancies, so while it's an acceptable tradeoff for a templating language, it's unacceptable for a general-purpose programming language like Python.

Should properties do nontrivial initialization?

I have an object that is basically a Python implementation of an Oracle sequence. For a variety of reasons, we have to get the nextval of an Oracle sequence, count up manually when determining primary keys, then update the sequence once the records have been inserted.
So here's the steps my object does:
Construct an object, with a key_generator attribute initially set to None.
Get the first value from the database, passing it to an itertools.count.
Return keys from that generator using a property next_key.
I'm a little bit unsure about where to do step 2. I can think of three possibilities:
Skip step 1 and do step 2 in the constructor. I find this evil because I tend to dislike doing this kind of initialization in a constructor.
Make next_key get the starting key from the database the first time it is called. I find this evil because properties are typically assumed to be trivial.
Make next_key into a get_next_key method. I dislike this because properties just seem more natural here.
Which is the lesser of 3 evils? I'm leaning towards #2, because only the first call to this property will result in a database query.

I think your doubts come from PEP-8:
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe
that access is (relatively) cheap.
Adherence to a standard behavior is usually quite a good idea; and this would be a reason to scrap away solution #2.
However, if you feel the interface is better with property than a method, then I would simply document that the first call is more expensive, and go with that (solution #2).
In the end, recommendations are meant to be interpreted.

I agree that attribute access and everything that looks like it (i.e. properties in the Python context) should be fairly trivial. If a property is going to perform a potentially costly operation, use a method to make this explicit. I recommend a name like "fetch_XYZ" or "retrieve_XYZ", since "get_XYZ" is used in some languages (e.g. Java) as a convention for simple attribute access, is quite generic, and does not sound "costly" either.
A good guideline is: If your property could throw an exception that is not due to a programming error, it should be a method. For example, throwing a (hypothetical) DatabaseConnectionError from a property is bad, while throwing an ObjectStateError would be okay.
Also, when I understood you correctly, you want to return the next key, whenever the next_key property is accessed. I recommend strongly against having side-effects (apart from caching, cheap lazy initialization, etc.) in your properties. Properties (and attributes for that matter) should be idempotent.

I've decided that the key smell in the solution I'm proposing is that the property I was creating contained the word "next" in it. Thus, instead of making a next_key property, I've decided to turn my DatabaseIntrospector class into a KeyCounter class and implemented the iterator protocol (ie making a plain old next method that returns the next key).

Code organization in Python: Where is a good place to put obscure methods?

I have a class called Path for which there are defined about 10 methods, in a dedicated module Path.py. Recently I had a need to write 5 more methods for Path, however these new methods are quite obscure and technical and 90% of the time they are irrelevant.
Where would be a good place to put them so their context is clear? Of course I can just put them with the class definition, but I don't like that because I like to keep the important things separate from the obscure things.
Currently I have these methods as functions that are defined in a separate module, just to keep things separate, but it would be better to have them as bound methods. (Currently they take the Path instance as an explicit parameter.)
Does anyone have a suggestion?

If the method is relevant to the Path - no matter how obscure - I think it should reside within the class itself.
If you have multiple places where you have path-related functionality, it leads to problems. For example, if you want to check if some functionality already exists, how will a new programmer know to check the other, less obvious places?
I think a good practice might be to order functions by importance. As you may have heard, some suggest putting public members of classes first, and private/protected ones after. You could consider putting the common methods in your class higher than the obscure ones.

If you're keen to put those methods in a different source file at any cost, AND keen to have them at methods at any cost, you can achieve both goals by using the different source file to define a mixin class and having your Path class import that method and multiply-inherit from that mixin. So, technically, it's quite feasible.
However, I would not recommend this course of action: it's worth using "the big guns" (such as multiple inheritance) only to serve important goals (such as reuse and removing repetition), and separating methods out in this way is not really a particularly crucial goal.
If those "obscure methods" played no role you would not be implementing them, so they must have SOME importance, after all; so I'd just clarify in docstrings and comments what they're for, maybe explicitly mention that they're rarely needed, and leave it at that.

I would just prepend the names with an underscore _, to show that the reader shouldn't bother about them.
It's conventionally the same thing as private members in other languages.

Put them in the Path class, and document that they are "obscure" either with comments or docstrings. Separate them at the end if you like.

Oh wait, I thought of something -- I can just define them in the Path.py module, where every obscure method will be a one-liner that will call the function from the separate module that currently exists. With this compromise, the obscure methods will comprise of maybe 10 lines in the end of the file instead of 50% of its bulk.

I suggest making them accessible from a property of the Path class called something like "Utilties". For example: Path.Utilities.RazzleDazzle. This will help with auto-completion tools and general maintenance.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.