Should properties do nontrivial initialization?

Should properties do nontrivial initialization? - python

I have an object that is basically a Python implementation of an Oracle sequence. For a variety of reasons, we have to get the nextval of an Oracle sequence, count up manually when determining primary keys, then update the sequence once the records have been inserted.
So here's the steps my object does:
Construct an object, with a key_generator attribute initially set to None.
Get the first value from the database, passing it to an itertools.count.
Return keys from that generator using a property next_key.
I'm a little bit unsure about where to do step 2. I can think of three possibilities:
Skip step 1 and do step 2 in the constructor. I find this evil because I tend to dislike doing this kind of initialization in a constructor.
Make next_key get the starting key from the database the first time it is called. I find this evil because properties are typically assumed to be trivial.
Make next_key into a get_next_key method. I dislike this because properties just seem more natural here.
Which is the lesser of 3 evils? I'm leaning towards #2, because only the first call to this property will result in a database query.

I think your doubts come from PEP-8:
Note 3: Avoid using properties for computationally expensive
operations; the attribute notation makes the caller believe
that access is (relatively) cheap.
Adherence to a standard behavior is usually quite a good idea; and this would be a reason to scrap away solution #2.
However, if you feel the interface is better with property than a method, then I would simply document that the first call is more expensive, and go with that (solution #2).
In the end, recommendations are meant to be interpreted.

I agree that attribute access and everything that looks like it (i.e. properties in the Python context) should be fairly trivial. If a property is going to perform a potentially costly operation, use a method to make this explicit. I recommend a name like "fetch_XYZ" or "retrieve_XYZ", since "get_XYZ" is used in some languages (e.g. Java) as a convention for simple attribute access, is quite generic, and does not sound "costly" either.
A good guideline is: If your property could throw an exception that is not due to a programming error, it should be a method. For example, throwing a (hypothetical) DatabaseConnectionError from a property is bad, while throwing an ObjectStateError would be okay.
Also, when I understood you correctly, you want to return the next key, whenever the next_key property is accessed. I recommend strongly against having side-effects (apart from caching, cheap lazy initialization, etc.) in your properties. Properties (and attributes for that matter) should be idempotent.

I've decided that the key smell in the solution I'm proposing is that the property I was creating contained the word "next" in it. Thus, instead of making a next_key property, I've decided to turn my DatabaseIntrospector class into a KeyCounter class and implemented the iterator protocol (ie making a plain old next method that returns the next key).

Related

Is it a best practice to always use a concrete type (List, Tuple, etc.) for a return value instead of an abstract type (Sequence, Iterable)?

I've read somewhere that it's a best practice to type the return value with a concrete type, but despite all my efforts can't find the resource where I read that...
There an answer on SO at the very least. Looked at the PEPs and mypy documentation but can't find it...

Your type annotations should express your intentended commitments, and no more.
Anything which you consider an implementation detail and reserve the right to change should be annotated to maintain that encapsulation. Parameters should be annotated as the most general thing you want to commit to accepting. Return values should be annotated as the most specific thing you want to commit to your return value allowing.
Suppose that you have some code doing brute force factorisation, and this includes a class which produces some candidate prime numbers to loop through. Initially, you store the first 100 candidates in a list and expose them like so:
def primes(self):
return self._internal_primes
You could annotate that as List or as Sequence, because self._internal_primes is both of those. But you shouldn't.
The foremost reason that you shouldn't is the most philosophical. The intended purpose of this method is to supply some candidate primes to loop over. Iterable[int] is the right thing to convey that purpose, irrespective of what happens under the hood.
The second reason is encapsulation and the associated flexibility. You didn't intend to commit to exposing a private list as your implementation. Maybe you anticipate rewriting primes in future as an on-demand generator implementing a sieve. Commiting to List now means you'd have to break your interface promise to do so.
The third reason is that even if you never expect to change the implementation, you may not want to advertise the full capabilities of that implementation. For example, lists can be mutated. If you promise someone it's a list, you also promise that they can mutate it. There is simply nothing good to gain by allowing someone to write p.primes()[2] = 72. By exposing an interface which doesn't even have the mutation methods, such a mistake becomes much less likely to slip through undetected.

Invariants Python

I'm reading the python tutorial. The 3rd paragraph confuses me a bit.
"Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their data attributes."
What exactly do they mean by invariants? Do they mean data attributes that certain methods rely on? (e.g a method that returns a certain data member; i.e. a getter method)

I believe what they want to say there is that you should be careful when accessing/mutating an object’s properties. (Note that I call them “properties”, not “data attributes”)
This is because an object may put something in a property that might be necessary to maintain an object’s state, i.e. an invariant which you would normally not be given means means to mess with.
But unlike other programming languages, Python does not have any protection for object members. There are no private instance variables or methods†, so anyone can change an object in the way they want to, possibly completely destroying its functionality.
So the tutorial suggests you to avoid storing important things in properties—but to be honest, there is not really a much better way, and if someone would want to mess with it, they just could. For example you can swap out whole methods at run time, replacing them with a completely different logic.
So trying to protect yourself too much is not really worth it. Instead, I would suggest you to just document everything in a clear way. E.g. point out which properties are free to touch by users, and which are for internal use only.
† There are some means and conventions, e.g. one leading underscore for internal/protected members, and two leading underscores for private members, but those will not technically prevent you from accessing it, so it’s not a protection.

Python syntax reasoning (why not fall back for . the way django template syntax does?)

My karate instructor is fond of saying, "a block is a lock is a throw is a blow." What he means is this: When we come to a technique in a form, although it might seem to look like a block, a little creativity and examination shows that it can also be seen as some kind of joint lock, or some kind of throw, or some kind of blow.
So it is with the way the django template syntax uses the dot (".") character. It perceives it first as a dictionary lookup, but it will also treat it as a class attribute, a method, or list index - in that order. The assumption seems to be that, one way or another, we are looking for a piece of knowledge. Whatever means may be employed to store that knowledge, we'll treat it in such a way as to get it into the template.
Why doesn't python do the same? If there's a case where I might have assigned a dictionary term spam['eggs'], but know for sure that spam has an attribute eggs, why not let me just write spam.eggs and sort it out the way django templates do?
Otherwise, I have to except an AttributeError and add three additional lines of code.
I'm particularly interested in the philosophy that drives this setup. Is it regarded as part of strong typing?

django templates and python are two, unrelated languages. They also have different target audiences.
In django templates, the target audience is designers, who proabably don't want to learn 4 different ways of doing roughly the same thing ( a dictionary lookup ). Thus there is a single syntax in django templates that performs the lookup in several possible ways.
python has quite a different audience. developers actually make use of the many different ways of doing similar things, and overload each with distinct meaning. When one fails it should fail, because that is what the developer means for it to do.

JUST MY correct OPINION's opinion is indeed correct. I can't say why Guido did it this way but I can say why I'm glad that he did.
I can look at code and know right away if some expression is accessing the 'b' key in a dict-like object a, the 'b' attribute on the object a, a method being called on or the b index into the sequence a.
Python doesn't have to try all of the above options every time there is an attribute lookup. Imagine if every time one indexed into a list, Python had to try three other options first. List intensive programs would drag. Python is slow enough!
It means that when I'm writing code, I have to know what I'm doing. I can't just toss objects around and hope that I'll get the information somewhere somehow. I have to know that I want to lookup a key, access an attribute, index a list or call a method. I like it that way because it helps me think clearly about the code that I'm writing. I know what the identifiers are referencing and what attributes and methods I'm expecting the object of those references to support.
Of course Guido Van Rossum might have just flipped a coin for all I know (He probably didn't) so you would have to ask him yourself if you really want to know.
As for your comment about having to surround these things with try blocks, it probably means that you're not writing very robust code. Generally, you want your code to expect to get some piece of information from a dict-like object, list-like object or a regular object. You should know which way it's going to do it and let anything else raise an exception.
The exception to this is that it's OK to conflate attribute access and method calls using the property decorator and more general descriptors. This is only good if the method doesn't take arguments.

The different methods of accessing
attributes do different things. If
you have a function foo the two lines
of code
a = foo,
a = foo()
do two
very different things. Without
distinct syntax to reference and call
functions there would be no way for
python to know whether the variable
should be a reference to foo or the
result of running foo. The () syntax removes the ambiguity.
Lists and dictionaries are two very different data structures. One of the things that determine which one is appropriate in a given situation is how its contents can be accessed (key Vs index). Having separate syntax for both of them reinforces the notion that these two things are not the same and neither one is always appropriate.
It makes sense for these distinctions to be ignored in a template language, the person writing the html doesn't care, the template language doesn't have function pointers so it knows you don't want one. Programmers who write the python that drive the template however do care about these distinctions.

In addition to the points already posted, consider this. Python uses special member variables and functions to provide metadata about the object. Both the interpreter and programmers make heavy use of these. For example, both dicts and lists have a __len__ member function. Now, if a dict's data were accessed by using the . operator, a potential ambiguity arises if the dict has a key called __len__. You could special-case these, but many objects have a __dict__ attribute which is a mapping of member names and values. If that object happened to be a container, which also defined a __len__ attribute, you would end up with an utter mess.
Problems like this would end up turning Python into a mishmash of special cases that the programmer would have to constantly be aware of. This would detract from the reason why many people use Python in the first place, i.e., its elegant simplicity.
Now, consider that new users often shadow built-ins (if the code in SO questions is any indication) and having something like this starts to look like a really bad idea, since it would exacerbate the problem many-fold.

In addition to the responses above, it's not practical to merge dictionary lookup and object lookup in general because of the restrictions on object members.
What if your key has whitespace? What if it's an int, or a frozenset, etc.? Dot notation can't account for these discrepancies, so while it's an acceptable tradeoff for a templating language, it's unacceptable for a general-purpose programming language like Python.

Are there reasons to use get/put methods instead of item access?

I find that I have recently been implementing Mapping interfaces on classes which on the surface fit the model (they are essentially just key-value stores with no more meta-data), but underneath they are sometimes quite complex.
Here are a couple examples of increasing severity:
An object which wraps another mapping converting all objects to strings when set.
An object which uses a local database as a back-end to store the key-value pairs.
An object which makes HTTP requests to remote servers to get/set data.
Lets suppose all of these examples seamlessly implement the Mapping interface, and the only indication that there is something fishy going on is that item access potentially could take a few seconds, and an item may not be retrievable in the same form as stored (if at all). I'm perfectly content with something like the first example, pretty okay with the second, but I'm getting kinda uncomfortable with the last.
The question is, is there a line at which the API for these models should not use item access, even though the underlying structure may feel like it fits on the surface?

It sounds like you are describing the standard anydbm module semantics. And just as anydbm can raise exception anydbm.error, so too could your subclass raise derivatives like MyDbmTimeoutError as needed. Whether you implement it as dictionary operations or function calls, the caller will still have to contend with exceptions anyway (e.g. KeyError, NameError).
I think that arbitrary "tied" hashes exist in Python 2 and 3.x is decent justification for saying it is a reasonable approach. Indeed, I've been looking for (and designing in my head) more complex bindings than simple key ⇒ value mappings without a heavy ORM SQL layer in between.
added: The more I think about it, the more Pythonic tied dictionaries seem to be. A key ⇒ value collection is a dictionary. Whether it lives in core or on disk or across the network is an implementation detail that is best abstracted away. The only substantive differences are increased latency and possible unavailability; however, on a virtual memory based OS, "core" can have higher latency than RAM and in a multiprocessing OS, "core" can become unavailable too. So these are differences in degree only, not kind.

From a strictly philosophical point of view, I don't think that there is a line you can cross with this. If some tool provides the needed functionality, but its API is different, adapt away. The only time you shouldn't do this is if the adapted to API simply is not expressive enough to manipulate the adapted component in a way that is needed.
I wouldn't hesitate to adapt a database into a dict, because that's a great way to manipulate collections, and it's already compatible with a heck of a lot of other code. If I find that my particular application must make calls to the database connections begin(), commit(), and rollback() methods to work right, then a dict won't do, since dict's don't have transaction semantics.

Modifying state of other objects in a constructor: design no-no?

I'm refactoring some code and found this (simplified of course, but general idea):
class Variable:
def __init__(self):
self.__constraints = []
def addConstraint(self, c):
self.__constraints.append(c)
class Constraint:
def __init__(self, variables):
for v in variables:
v.addConstraint(self)
The fact that the constructor of Constraint modifies other object's states instead of its own smells a little funky to me. What do other people think - is this OK, or is it a prime candidate for refactoring?
Edit: My concern is not the parent/child relationship, but that it is linked up inside the constructor rather than in a separate method.

I see it as a self registration pattern. "Hello I'm new here, please allow me to join."
I might prefer to have a differently named method so that the purpose is more clear, but I do actually quite like the approach.

I entirely concur with #djna's answer that the specific use case is perfectly legit -- here, it's an example of an object needing to en-register itself with a specified set of registries "at birth".
A very sharp and extremely common subcase of that would be an observer object that exists strictly for the purpose of observing a given observable -- perfectly fine to pass the observable to the observer's initializer, and exactly the right way to ensure the class invariant "instances of this observer class are at all times connected to exactly one observable", which would be not established "at birth" if the registration was carried out only after the completion of initialization.
Other similar cases include for example a widget object that must at all time exist within a container window: it would somewhat weird to implement it otherwise than having the widget take the parent as an initializer argument and tell the parent "hi, I'm your new child!".
At least in those 1-many cases you could imagine forcing the parent or observable to have a method that both creates and enregisters the new object. In a many-many case like this one, the somewhat inside-out nature of that approach gets revealed -- since the constraint must be registered with multiple variables, it would be "against the grain" to ask any specific one of them to create the constraint. The code you supply on the other hand is perfectly natural.
Only for cases that cannot reasonably be framed as the new object "enregistering itself" would I feel some doubt (there are a few other legit ones, such as objects creating and enregistering other auxiliary ones at birth, but they're nowhere near as common).

I agree with you. That's backwards. There may be some good reason for why, but it's unclear programming and it likely to bite someone if the foot sooner or later.

This is common usage when you have two objects that are closely related (i.e. where only one of them alone doesn't make sense). Most common case: Parent child relations. When you add a child to a parent (i.e. parent.children.append(child)), you often also update the child.parent pointer.

I personally am not necessarily opposed to this, but...
I would choose one usage pattern, and stick with it. In your case, since Variable already has a clean addConstraint method, my preference would be to use it.
Otherwise, you'll need to add good checking to prevent the user from constructing a Constraint, and then adding it to the Variable class (thereby adding it twice).
That being said, with something like a Constraint, though, I would probably not do this. A Constraint seems like a conceptually independent entity from a Variable. I see no logical reason the same constraint couldn't be added to two separate variables. I would just make it so you construct your constraint, then add them manually, specifically for this reason.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.