Modifying state of other objects in a constructor: design no-no? - python

I'm refactoring some code and found this (simplified of course, but general idea):
class Variable:
    def __init__(self):
        self.__constraints = []

    def addConstraint(self, c):
        self.__constraints.append(c)

class Constraint:
    def __init__(self, variables):
        for v in variables:
            v.addConstraint(self)
The fact that the constructor of Constraint modifies other objects' states instead of its own smells a little funky to me. What do other people think - is this OK, or is it a prime candidate for refactoring?
Edit: My concern is not the parent/child relationship, but that it is linked up inside the constructor rather than in a separate method.

I see it as a self registration pattern. "Hello I'm new here, please allow me to join."
I might prefer to have a differently named method so that the purpose is more clear, but I do actually quite like the approach.

I entirely concur with djna's answer that the specific use case is perfectly legit -- here, it's an example of an object needing to en-register itself with a specified set of registries "at birth".
A very sharp and extremely common subcase of that would be an observer object that exists strictly for the purpose of observing a given observable -- perfectly fine to pass the observable to the observer's initializer, and exactly the right way to ensure the class invariant "instances of this observer class are at all times connected to exactly one observable", which would be not established "at birth" if the registration was carried out only after the completion of initialization.
Other similar cases include, for example, a widget object that must at all times exist within a container window: it would be somewhat weird to implement it otherwise than by having the widget take the parent as an initializer argument and tell the parent "hi, I'm your new child!".
At least in those 1-many cases you could imagine forcing the parent or observable to have a method that both creates and enregisters the new object. In a many-many case like this one, the somewhat inside-out nature of that approach gets revealed -- since the constraint must be registered with multiple variables, it would be "against the grain" to ask any specific one of them to create the constraint. The code you supply on the other hand is perfectly natural.
Only for cases that cannot reasonably be framed as the new object "enregistering itself" would I feel some doubt (there are a few other legit ones, such as objects creating and enregistering other auxiliary ones at birth, but they're nowhere near as common).
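A minimal sketch of that observer case (the names here are illustrative, not code from the question): the observer registers with its observable inside __init__, so the invariant holds from the moment the instance exists.

class Observable:
    def __init__(self):
        self._observers = []

    def register(self, observer):
        self._observers.append(observer)

    def notify(self, event):
        for obs in self._observers:
            obs.update(event)

class Observer:
    def __init__(self, observable):
        # registered "at birth": the invariant "always connected to exactly
        # one observable" is never violated, not even briefly
        observable.register(self)

    def update(self, event):
        print("observed:", event)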

I agree with you. That's backwards. There may be some good reason why, but it's unclear programming and it's likely to bite someone in the foot sooner or later.

This is common usage when you have two objects that are closely related (i.e. where only one of them alone doesn't make sense). Most common case: Parent child relations. When you add a child to a parent (i.e. parent.children.append(child)), you often also update the child.parent pointer.

I personally am not necessarily opposed to this, but...
I would choose one usage pattern, and stick with it. In your case, since Variable already has a clean addConstraint method, my preference would be to use it.
Otherwise, you'll need to add good checking to prevent the user from constructing a Constraint, and then adding it to the Variable class (thereby adding it twice).
That being said, with something like a Constraint, though, I would probably not do this. A Constraint seems like a conceptually independent entity from a Variable. I see no logical reason the same constraint couldn't be added to two separate variables. I would just make it so you construct your constraint, then add them manually, specifically for this reason.
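One way that explicit wiring might look, as a rough sketch based on the classes from the question (the duplicate-registration guard is an illustration of the checking mentioned above):

class Variable:
    def __init__(self):
        self.__constraints = []

    def addConstraint(self, c):
        if c not in self.__constraints:   # guard against registering twice
            self.__constraints.append(c)

class Constraint:
    def __init__(self):
        pass                              # no registration in the constructor

v1, v2 = Variable(), Variable()
c = Constraint()
v1.addConstraint(c)                       # wiring happens explicitly, outside __init__
v2.addConstraint(c)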


What, if any, are the differences between implementing one class as a child of another versus having the child employ an instance of the parent? [duplicate]

There are two schools of thought on how to best extend, enhance, and reuse code in an object-oriented system:
Inheritance: extend the functionality of a class by creating a subclass. Override superclass members in the subclasses to provide new functionality. Make methods abstract/virtual to force subclasses to "fill-in-the-blanks" when the superclass wants a particular interface but is agnostic about its implementation.
Aggregation: create new functionality by taking other classes and combining them into a new class. Attach a common interface to this new class for interoperability with other code.
What are the benefits, costs, and consequences of each? Are there other alternatives?
I see this debate come up on a regular basis, but I don't think it's been asked on Stack Overflow yet (though there is some related discussion). There's also a surprising lack of good Google results for it.
It's not a matter of which is the best, but of when to use what.
In the 'normal' cases a simple question is enough to find out if we need inheritance or aggregation.
If the new class is more or less the same as the original class, use inheritance. The new class is now a subclass of the original class.
If the new class must have the original class, use aggregation. The new class now has the original class as a member.
However, there is a big gray area. So we need several other tricks.
If we have used inheritance (or plan to use it) but we only use part of the interface, or we are forced to override a lot of functionality to keep the correlation logical, then we have a big nasty smell that indicates we should have used aggregation.
If we have used aggregation (or plan to use it) but we find out we need to copy almost all of the functionality, then we have a smell that points in the direction of inheritance.
To cut it short: we should use aggregation if part of the interface is not used or has to be changed to avoid an illogical situation. We only need inheritance if we need almost all of the functionality without major changes. And when in doubt, use aggregation.
Another possibility, for the case where a class needs part of the functionality of the original class, is to split the original class into a root class and a subclass, and let the new class inherit from the root class. But take care not to create an illogical separation.
Let's add an example. We have a class 'Dog' with methods 'Eat', 'Walk', 'Bark' and 'Play'.
class Dog
    Eat;
    Walk;
    Bark;
    Play;
end;
We now need a class 'Cat' that needs 'Eat', 'Walk', 'Purr' and 'Play'. So first we try to extend it from Dog.
class Cat is Dog
    Purr;
end;
Looks alright, but wait. This cat can Bark (cat lovers will kill me for that), and a barking cat violates the principles of the universe. So we need to override the Bark method so that it does nothing.
class Cat is Dog
    Purr;
    Bark = null;
end;
OK, this works, but it smells bad. So let's try aggregation:
class Cat
    has Dog;
    Eat = Dog.Eat;
    Walk = Dog.Walk;
    Play = Dog.Play;
    Purr;
end;
OK, this is nice. This cat does not bark anymore, not even silently. But it still has an internal dog that wants out. So let's try solution number three:
class Pet
    Eat;
    Walk;
    Play;
end;

class Dog is Pet
    Bark;
end;

class Cat is Pet
    Purr;
end;
This is much cleaner. No internal dogs. And cats and dogs are at the same level. We can even introduce other pets to extend the model. Unless it is a fish, or something else that does not walk; in that case we need to refactor again. But that is something for another time.
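For readers who'd rather see the final shape in Python than in the pseudocode above, a quick sketch (method bodies elided):

class Pet:
    def eat(self): ...
    def walk(self): ...
    def play(self): ...

class Dog(Pet):
    def bark(self): ...

class Cat(Pet):
    def purr(self): ...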
At the beginning of the GoF book, the authors state
Favor object composition over class inheritance.
This is further discussed here
The difference is typically expressed as the difference between "is a" and "has a". Inheritance, the "is a" relationship, is summed up nicely in the Liskov Substitution Principle. Aggregation, the "has a" relationship, is just that - it shows that the aggregating object has one of the aggregated objects.
Further distinctions exist as well - private inheritance in C++ indicates a "is implemented in terms of" relationship, which can also be modeled by the aggregation of (non-exposed) member objects as well.
Here's my most common argument:
In any object-oriented system, there are two parts to any class:
Its interface: the "public face" of the object. This is the set of capabilities it announces to the rest of the world. In a lot of languages, the set is well defined into a "class". Usually these are the method signatures of the object, though it varies a bit by language.
Its implementation: the "behind the scenes" work that the object does to satisfy its interface and provide functionality. This is typically the code and member data of the object.
One of the fundamental principles of OOP is that the implementation is encapsulated (i.e. hidden) within the class; the only thing that outsiders should see is the interface.
When a subclass inherits from a superclass, it typically inherits both the implementation and the interface. This, in turn, means that you're forced to accept both as constraints on your class.
With aggregation, you get to choose either implementation or interface, or both -- but you're not forced into either. The functionality of an object is left up to the object itself. It can defer to other objects as it likes, but it's ultimately responsible for itself. In my experience, this leads to a more flexible system: one that's easier to modify.
So, whenever I'm developing object-oriented software, I almost always prefer aggregation over inheritance.
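A small, purely illustrative Python sketch of that point (all names invented): the aggregating class chooses its own interface and forwards to a member object for the implementation, without inheriting methods it doesn't want to expose.

class Engine:
    def start(self):
        print("engine started")

    def diagnostics(self):
        print("pages of internal detail")

class Car:
    def __init__(self):
        self._engine = Engine()          # implementation detail, kept hidden

    def drive(self):
        # Car exposes only the interface it wants; Engine's other methods
        # (like diagnostics) never leak into Car's public face.
        self._engine.start()
        print("driving")

Car().drive()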
I gave an answer to "Is a" vs "Has a" : which one is better?.
Basically I agree with other folks: use inheritance only if your derived class truly is the type you're extending, not merely because it contains the same data. Remember that inheritance means the subclass gains the methods as well as the data.
Does it make sense for your derived class to have all the methods of the superclass? Or do you just quietly promise yourself that those methods should be ignored in the derived class? Or do you find yourself overriding methods from the superclass, making them no-ops so no one calls them inadvertently? Or giving hints to your API doc generation tool to omit the method from the doc?
Those are strong clues that aggregation is the better choice in that case.
I see a lot of "is-a vs. has-a; they're conceptually different" responses on this and the related questions.
The one thing I've found in my experience is that trying to determine whether a relationship is "is-a" or "has-a" is bound to fail. Even if you can correctly make that determination for the objects now, changing requirements mean that you'll probably be wrong at some point in the future.
Another thing I've found is that it's very hard to convert from inheritance to aggregation once there's a lot of code written around an inheritance hierarchy. Just switching from a superclass to an interface means changing nearly every subclass in the system.
And, as I mentioned elsewhere in this post, inheritance tends to be less flexible than aggregation.
So, you have a perfect storm of arguments against inheritance whenever you have to choose one or the other:
Your choice will likely be the wrong one at some point
Changing that choice is difficult once you've made it.
Inheritance tends to be a worse choice as it's more constraining.
Thus, I tend to choose aggregation -- even when there appears to be a strong is-a relationship.
The question is normally phrased as Composition vs. Inheritance, and it has been asked here before.
I wanted to make this a comment on the original question, but 300 characters bites [;<).
I think we need to be careful. First, there are more flavors than the two rather specific examples made in the question.
Also, I suggest that it is valuable not to confuse the objective with the instrument. One wants to make sure that the chosen technique or methodology supports achievement of the primary objective, but I don't think an out-of-context which-technique-is-best discussion is very useful. It does help to know the pitfalls of the different approaches along with their clear sweet spots.
For example, what are you out to accomplish, what do you have available to start with, and what are the constraints?
Are you creating a component framework, even a special purpose one? Are interfaces separable from implementations in the programming system, or is that accomplished by a practice using a different sort of technology? Can you separate the inheritance structure of interfaces (if any) from the inheritance structure of classes that implement them? Is it important to hide the class structure of an implementation from the code that relies on the interfaces the implementation delivers? Are there multiple implementations to be usable at the same time, or is the variation more over time as a consequence of maintenance and enhancement? This and more needs to be considered before you fixate on a tool or a methodology.
Finally, is it that important to lock distinctions in the abstraction and how you think of it (as in is-a versus has-a) to different features of the OO technology? Perhaps so, if it keeps the conceptual structure consistent and manageable for you and others. But it is wise not to be enslaved by that and the contortions you might end up making. Maybe it is best to stand back a level and not be so rigid (but leave good narration so others can tell what's up). [I look for what makes a particular portion of a program explainable, but sometimes I go for elegance when there is a bigger win. Not always the best idea.]
I'm an interface purist, and I am drawn to the kinds of problems and approaches where interface purism is appropriate, whether building a Java framework or organizing some COM implementations. That doesn't make it appropriate for everything, not even close to everything, even though I swear by it. (I have a couple of projects that appear to provide serious counter-examples against interface purism, so it will be interesting to see how I manage to cope.)
I'll cover the where-these-might-apply part. Here's an example of both, in a game scenario. Suppose, there's a game which has different types of soldiers. Each soldier can have a knapsack which can hold different things.
Inheritance here?
There's a marine, a green beret and a sniper. These are types of soldiers. So there's a base class Soldier, with Marine, GreenBeret and Sniper as derived classes.
Aggregation here?
The knapsack can contain grenades, guns (of different types), a knife, a medikit, etc. A soldier can be equipped with any of these at any given point in time; he can also have a bulletproof vest that acts as armor when attacked, so his injury decreases to a certain percentage. The Soldier class contains an object of the BulletproofVest class and a Knapsack object, which holds references to these items.
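A rough Python sketch of that layout (all names invented for illustration): inheritance for the soldier types, aggregation for the equipment.

class Knapsack:
    def __init__(self, items=None):
        self.items = list(items or [])       # grenades, guns, knife, medikit, ...

class BulletproofVest:
    def absorb(self, damage):
        return damage * 0.5                  # injury reduced to some percentage

class Soldier:                               # inheritance: the base class
    def __init__(self, knapsack=None, vest=None):
        self.knapsack = knapsack             # aggregation: has-a knapsack
        self.vest = vest                     # aggregation: has-a vest

    def take_hit(self, damage):
        return self.vest.absorb(damage) if self.vest else damage

class Marine(Soldier): pass
class GreenBeret(Soldier): pass
class Sniper(Soldier): pass

sniper = Sniper(knapsack=Knapsack(["knife", "medikit"]), vest=BulletproofVest())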
I think it's not an either/or debate. It's just that:
is-a (inheritance) relationships occur less often than has-a (composition) relationships.
Inheritance is harder to get right, even when it's appropriate to use it, so due diligence has to be taken because it can break encapsulation, encourage tight coupling by exposing implementation and so forth.
Both have their place, but inheritance is riskier.
Although of course it wouldn't make sense to model a Shape class as 'having-a' Point class and a Square class; here inheritance is due.
People tend to think about inheritance first when trying to design something extensible, that is what's wrong.
Favouring happens when both candidates qualify. A and B are options and you favour A. The reason is that composition offers more extension/flexibility possibilities than generalization. This extension/flexibility refers mostly to runtime/dynamic flexibility.
The benefit is not immediately visible. To see the benefit you need to wait for the next unexpected change request. So in most cases those who stick to generalization fail when compared to those who embraced composition (except the one obvious case mentioned later). Hence the rule. From a learning point of view, if you can implement dependency injection successfully then you should know which one to favour and when. The rule helps you in making a decision as well; if you are not sure, then select composition.
Summary: Composition: the coupling is reduced by just having some smaller things you plug into something bigger, and the bigger object just calls the smaller object back. Generalization: from an API point of view, defining that a method can be overridden is a stronger commitment than defining that a method can be called (there are very few occasions when generalization wins). And never forget that with composition you are using inheritance too, from an interface instead of a big class.
Both approaches are used to solve different problems. You don't always need to aggregate over two or more classes when inheriting from one class.
Sometimes you do have to aggregate a single class because that class is sealed or has otherwise non-virtual members you need to intercept. In that case you create a proxy layer, which obviously isn't valid in terms of inheritance, but so long as the class you are proxying has an interface you can subscribe to, this can work out fairly well.
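A minimal Python sketch of that proxy idea (the wrapped class and its methods are invented): wrap the object, intercept the calls you care about, and forward everything else unchanged.

class SealedThing:                       # imagine this class can't usefully be subclassed
    def fetch(self, key):
        return "value for %s" % key

class LoggingProxy:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def fetch(self, key):                # the call we want to intercept
        print("fetch(%r) called" % key)
        return self._wrapped.fetch(key)

    def __getattr__(self, name):         # everything else is forwarded unchanged
        return getattr(self._wrapped, name)

proxy = LoggingProxy(SealedThing())
proxy.fetch("answer")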

Proper usage of QValidator

I use validators to filter user input. Normally I have my validators working like this:
my_reg_ex = QRegExp(r"[1-9]\d{0,5}")
my_validator = QRegExpValidator(my_reg_ex, self.ui.lineEdit_test)
self.ui.lineEdit_test.setValidator(my_validator)
I wrote this after looking at some examples online. But I just noticed that if I remove the last part on the second line:
, self.ui.lineEdit_test
The code works exactly the same. I have a couple of these validators all around. I was wondering if it's okay to just use it without the part I mentioned. For example:
my_reg_ex = QRegExp(r"[1-9]\d{0,5}")
my_validator = QRegExpValidator(my_reg_ex)
self.ui.lineEdit_test.setValidator(my_validator)
Is there any difference between these? If there is please explain and tell me which one is the better way to go.
The QRegExpValidator class inherits QObject, so its constructor has an argument that takes a parent QObject. One general reason for setting a parent is to ensure the object doesn't get garbage-collected when it goes out of scope. This can easily happen if you don't keep any other reference to the object, and it is a very common cause of many of the problems seen in newbie questions on SO.
However, this is actually not a problem in your specific example. This is because the line-edit takes ownership of the validator when you pass it to setValidator(), so you don't need to explicitly keep a reference to it yourself. As far as the code in your question is concerned, it really makes no difference whether you set a parent or not.
Having said that, there is at least one scenario where not setting a parent may be advantageous. This could arise if you needed to regularly reset the validator at runtime. If you gave each new validator a parent, the old one would not be deleted when setValidator() is called (because the parent would still hold a reference to it). So for the purpose of simplifying object cleanup, not setting a parent in this situation might be preferable.
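A minimal sketch of that reset-at-runtime scenario (assuming PyQt5; the same classes live in slightly different modules in PyQt4). Each replacement validator is created without a parent, so once setValidator() swaps it out, nothing keeps the old one alive.

from PyQt5.QtCore import QRegExp
from PyQt5.QtGui import QRegExpValidator
from PyQt5.QtWidgets import QApplication, QLineEdit

app = QApplication([])
line_edit = QLineEdit()

def apply_pattern(widget, pattern):
    # no parent argument: only the widget holds on to the validator
    widget.setValidator(QRegExpValidator(QRegExp(pattern)))

apply_pattern(line_edit, r"[1-9]\d{0,5}")   # initial validator
apply_pattern(line_edit, r"[0-9a-fA-F]+")   # later: replace it with a new one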

Should properties do nontrivial initialization?

I have an object that is basically a Python implementation of an Oracle sequence. For a variety of reasons, we have to get the nextval of an Oracle sequence, count up manually when determining primary keys, then update the sequence once the records have been inserted.
So here are the steps my object takes:
Construct an object, with a key_generator attribute initially set to None.
Get the first value from the database, passing it to an itertools.count.
Return keys from that generator using a property next_key.
I'm a little bit unsure about where to do step 2. I can think of three possibilities:
Skip step 1 and do step 2 in the constructor. I find this evil because I tend to dislike doing this kind of initialization in a constructor.
Make next_key get the starting key from the database the first time it is called. I find this evil because properties are typically assumed to be trivial.
Make next_key into a get_next_key method. I dislike this because properties just seem more natural here.
Which is the lesser of 3 evils? I'm leaning towards #2, because only the first call to this property will result in a database query.
I think your doubts come from PEP-8:
Note 3: Avoid using properties for computationally expensive operations; the attribute notation makes the caller believe that access is (relatively) cheap.
Adherence to a standard behavior is usually quite a good idea, and this would be a reason to scrap solution #2.
However, if you feel the interface is better with property than a method, then I would simply document that the first call is more expensive, and go with that (solution #2).
In the end, recommendations are meant to be interpreted.
I agree that attribute access and everything that looks like it (i.e. properties in the Python context) should be fairly trivial. If a property is going to perform a potentially costly operation, use a method to make this explicit. I recommend a name like "fetch_XYZ" or "retrieve_XYZ", since "get_XYZ" is used in some languages (e.g. Java) as a convention for simple attribute access, is quite generic, and does not sound "costly" either.
A good guideline is: If your property could throw an exception that is not due to a programming error, it should be a method. For example, throwing a (hypothetical) DatabaseConnectionError from a property is bad, while throwing an ObjectStateError would be okay.
Also, if I understood you correctly, you want to return the next key whenever the next_key property is accessed. I strongly recommend against having side effects (apart from caching, cheap lazy initialization, etc.) in your properties. Properties (and attributes, for that matter) should be idempotent.
I've decided that the key smell in the solution I was proposing is that the property I was creating contained the word "next" in it. Thus, instead of making a next_key property, I've decided to turn my DatabaseIntrospector class into a KeyCounter class and implement the iterator protocol (i.e. making a plain old next method that returns the next key).
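A hedged sketch of what that might look like (the class name is from the post, but the query and connection handling are invented): the database is only touched on the first call, and the cost lives in an ordinary method rather than a property.

import itertools

class KeyCounter:
    """Iterate over primary keys, seeded lazily from an Oracle sequence."""

    def __init__(self, connection, sequence_name="my_seq"):
        self._connection = connection
        self._sequence_name = sequence_name
        self._counter = None                 # no database work in the constructor

    def __iter__(self):
        return self

    def __next__(self):                      # 'next' in Python 2
        if self._counter is None:
            self._counter = itertools.count(self._fetch_start())
        return next(self._counter)

    def _fetch_start(self):
        # illustrative query; the real code presumably does something similar
        cursor = self._connection.cursor()
        cursor.execute("SELECT %s.nextval FROM dual" % self._sequence_name)
        return cursor.fetchone()[0]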

Code organization in Python: Where is a good place to put obscure methods?

I have a class called Path for which there are defined about 10 methods, in a dedicated module Path.py. Recently I had a need to write 5 more methods for Path, however these new methods are quite obscure and technical and 90% of the time they are irrelevant.
Where would be a good place to put them so their context is clear? Of course I can just put them with the class definition, but I don't like that because I like to keep the important things separate from the obscure things.
Currently I have these methods as functions that are defined in a separate module, just to keep things separate, but it would be better to have them as bound methods. (Currently they take the Path instance as an explicit parameter.)
Does anyone have a suggestion?
If the method is relevant to the Path - no matter how obscure - I think it should reside within the class itself.
If you have multiple places where you have path-related functionality, it leads to problems. For example, if you want to check if some functionality already exists, how will a new programmer know to check the other, less obvious places?
I think a good practice might be to order functions by importance. As you may have heard, some suggest putting public members of classes first, and private/protected ones after. You could consider putting the common methods in your class higher than the obscure ones.
If you're keen to put those methods in a different source file at any cost, AND keen to have them as methods at any cost, you can achieve both goals by using the different source file to define a mixin class and having your Path class import that mixin and multiply-inherit from it. So, technically, it's quite feasible.
However, I would not recommend this course of action: it's worth using "the big guns" (such as multiple inheritance) only to serve important goals (such as reuse and removing repetition), and separating methods out in this way is not really a particularly crucial goal.
If those "obscure methods" played no role you would not be implementing them, so they must have SOME importance, after all; so I'd just clarify in docstrings and comments what they're for, maybe explicitly mention that they're rarely needed, and leave it at that.
I would just prepend the names with an underscore _, to show that the reader shouldn't bother about them.
It's conventionally the same thing as private members in other languages.
Put them in the Path class, and document that they are "obscure" either with comments or docstrings. Separate them at the end if you like.
Oh wait, I thought of something -- I can just define them in the Path.py module, where every obscure method will be a one-liner that calls the corresponding function from the separate module that currently exists. With this compromise, the obscure methods will take up maybe 10 lines at the end of the file instead of 50% of its bulk.
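A sketch of what that compromise might look like (the module and function names here are invented):

# path_internals.py -- hypothetical module where the obscure functions already live
def remount(path, options):
    ...

# Path.py
import path_internals

class Path:
    # ... the ten important methods ...

    def remount(self, options):
        # obscure method: a one-liner delegating to the separate module
        return path_internals.remount(self, options)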
I suggest making them accessible from a property of the Path class called something like "Utilities". For example: Path.Utilities.RazzleDazzle. This will help with auto-completion tools and general maintenance.

What's a good way to keep track of class instance variables in Python?

I'm a C++ programmer just starting to learn Python. I'd like to know how you keep track of instance variables in large Python classes. I'm used to having a .h file that gives me a neat list (complete with comments) of all the class' members. But since Python allows you to add new instance variables on the fly, how do you keep track of them all?
I'm picturing a scenario where I mistakenly add a new instance variable when I already had one - but it was 1000 lines away from where I was working. Are there standard practices for avoiding this?
Edit: It appears I created some confusion with the term "member variable." I really mean instance variable, and I've edited my question accordingly.
I would say, the standard practice to avoid this is to not write classes where you can be 1000 lines away from anything!
Seriously, that's way too much for just about any useful class, especially in a language as expressive as Python. Using more of what the standard library offers and abstracting code away into separate modules should help keep your LOC count down.
The largest classes in the standard library have well below 100 lines!
First of all: class attributes, or instance attributes? Or both? =)
Usually you just add instance attributes in __init__, and class attributes in the class definition, often before method definitions... which should probably cover 90% of use cases.
If code adds attributes on the fly, it probably (hopefully :-) has good reasons for doing so... leveraging dynamic features, introspection, etc. Other than that, adding attributes this way is probably less common than you think.
pylint can statically detect attributes that aren't defined in __init__, along with many other potential bugs.
I'd also recommend writing unit tests and running your code often to detect these types of "whoopsie" programming mistakes.
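For instance, here is a sketch of the kind of slip pylint catches (the class is made up); pylint reports attribute-defined-outside-init (W0201) for the typo:

class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.vaule = self.value + 1   # typo: silently creates a new attribute; pylint flags it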
Instance variables should be initialized in the class's __init__() method (in general).
If that's not possible, you can use __dict__ to get a dictionary of all instance variables of an object at runtime. If you really need to track this in documentation, add a list of the instance variables you are using to the docstring of the class.
It sounds like you're talking about instance variables and not class variables. Note that in the following code a is a class variable and b is an instance variable.
class foo:
    a = 0  # class variable

    def __init__(self):
        self.b = 0  # instance variable
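A quick way to see the difference, using the class above (output shown in comments):

f1 = foo()
f2 = foo()
foo.a = 42          # rebinding the class variable is visible through every instance
f1.b = 7            # rebinding the instance variable affects only f1
print(f1.a, f2.a)   # 42 42
print(f1.b, f2.b)   # 7 0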
Regarding the hypothetical where you create an unneeded instance variable because the other one was about one thousand lines away: The best solution is to not have classes that are one thousand lines long. If you can't avoid the length, then your class should have a well defined purpose and that will enable you to keep all of the complexities in your head at once.
A documentation generation system such as Epydoc can be used as a reference for what instance/class variables an object has, and if you're worried about accidentally creating new variables via typos you can use PyChecker to check your code for this.
This is a common concern I hear from many programmers who come from a C, C++, or other statically typed language where variables are pre-declared. In fact it was one of the biggest concerns we heard when we were persuading programmers at our organization to abandon C for high-level programs and use Python instead.
In theory, yes you can add instance variables to an object at any time. Yes it can happen from typos, etc. In practice, it rarely results in a bug. When it does, the bugs are generally not hard to find.
As long as your classes are not bloated (1000 lines is pretty huge!) and you have ample unit tests, you should rarely run into a real problem. In case you do, it's easy to drop into a Python console at almost any time and inspect things as much as you wish.
It seems to me that the main issue here is that you're thinking in terms of C++ when you're working in python.
Having a 1000-line class is not a very wise thing in Python anyway (I know it happens a lot in C++, though).
Learn to exploit the dynamism that Python gives you; for instance, you can combine lists and dictionaries in very creative ways and save yourself hundreds of useless lines of code.
For example, if you're mapping strings to functions (for dispatching), you can exploit the fact that functions are first-class objects and have a dictionary like this:
d = {'command1': func1, 'command2': func2, 'command3': func3}

# then somewhere else, use this dict to dispatch:
# given a string cmd
func = d[cmd]
func()  # call the function!
Something like this in C++ would take up sooo many lines of code!
The easiest way is to use an IDE. PyDev is a plugin for Eclipse.
I'm not a full-on expert in all things Pythonic, but in general I define my class members right under the class definition in Python, so if I add members, they're all in the same place.
My personal opinion is that class members should be declared in one section, for this specific reason.
Locally scoped variables, on the other hand, should be defined closest to where they are used (except in C, which I believe still requires variables to be declared at the beginning of a block).
Consider using __slots__.
For example:
class Foo:
    __slots__ = "a b c".split()

x = Foo()
x.a = 1   # ok
x.b = 1   # ok
x.c = 1   # ok
x.bb = 1  # raises AttributeError: 'Foo' object has no attribute 'bb'
It is generally a concern in any dynamic programming language -- any language that does not require variable declaration -- that a typo in a variable name will create a new variable instead of raising an exception or causing a compile-time error. Slots help with instance variables, but they don't help you with module-scope variables, globals, local variables, etc. There's no silver bullet for this; it's part of the trade-off of not having to declare variables.
