Python: Nested Class vs Inheritance [duplicate]

Python: Nested Class vs Inheritance [duplicate] - python

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
There are two schools of thought on how to best extend, enhance, and reuse code in an object-oriented system:
Inheritance: extend the functionality of a class by creating a subclass. Override superclass members in the subclasses to provide new functionality. Make methods abstract/virtual to force subclasses to "fill-in-the-blanks" when the superclass wants a particular interface but is agnostic about its implementation.
Aggregation: create new functionality by taking other classes and combining them into a new class. Attach an common interface to this new class for interoperability with other code.
What are the benefits, costs, and consequences of each? Are there other alternatives?
I see this debate come up on a regular basis, but I don't think it's been asked on
Stack Overflow yet (though there is some related discussion). There's also a surprising lack of good Google results for it.

It's not a matter of which is the best, but of when to use what.
In the 'normal' cases a simple question is enough to find out if we need inheritance or aggregation.
If The new class is more or less as the original class. Use inheritance. The new class is now a subclass of the original class.
If the new class must have the original class. Use aggregation. The new class has now the original class as a member.
However, there is a big gray area. So we need several other tricks.
If we have used inheritance (or we plan to use it) but we only use part of the interface, or we are forced to override a lot of functionality to keep the correlation logical. Then we have a big nasty smell that indicates that we had to use aggregation.
If we have used aggregation (or we plan to use it) but we find out we need to copy almost all of the functionality. Then we have a smell that points in the direction of inheritance.
To cut it short. We should use aggregation if part of the interface is not used or has to be changed to avoid an illogical situation. We only need to use inheritance, if we need almost all of the functionality without major changes. And when in doubt, use Aggregation.
An other possibility for, the case that we have an class that needs part of the functionality of the original class, is to split the original class in a root class and a sub class. And let the new class inherit from the root class. But you should take care with this, not to create an illogical separation.
Lets add an example. We have a class 'Dog' with methods: 'Eat', 'Walk', 'Bark', 'Play'.
class Dog
Eat;
Walk;
Bark;
Play;
end;
We now need a class 'Cat', that needs 'Eat', 'Walk', 'Purr', and 'Play'. So first try to extend it from a Dog.
class Cat is Dog
Purr;
end;
Looks, alright, but wait. This cat can Bark (Cat lovers will kill me for that). And a barking cat violates the principles of the universe. So we need to override the Bark method so that it does nothing.
class Cat is Dog
Purr;
Bark = null;
end;
Ok, this works, but it smells bad. So lets try an aggregation:
class Cat
has Dog;
Eat = Dog.Eat;
Walk = Dog.Walk;
Play = Dog.Play;
Purr;
end;
Ok, this is nice. This cat does not bark anymore, not even silent. But still it has an internal dog that wants out. So lets try solution number three:
class Pet
Eat;
Walk;
Play;
end;
class Dog is Pet
Bark;
end;
class Cat is Pet
Purr;
end;
This is much cleaner. No internal dogs. And cats and dogs are at the same level. We can even introduce other pets to extend the model. Unless it is a fish, or something that does not walk. In that case we again need to refactor. But that is something for an other time.

At the beginning of GOF they state
Favor object composition over class inheritance.
This is further discussed here

The difference is typically expressed as the difference between "is a" and "has a". Inheritance, the "is a" relationship, is summed up nicely in the Liskov Substitution Principle. Aggregation, the "has a" relationship, is just that - it shows that the aggregating object has one of the aggregated objects.
Further distinctions exist as well - private inheritance in C++ indicates a "is implemented in terms of" relationship, which can also be modeled by the aggregation of (non-exposed) member objects as well.

Here's my most common argument:
In any object-oriented system, there are two parts to any class:
Its interface: the "public face" of the object. This is the set of capabilities it announces to the rest of the world. In a lot of languages, the set is well defined into a "class". Usually these are the method signatures of the object, though it varies a bit by language.
Its implementation: the "behind the scenes" work that the object does to satisfy its interface and provide functionality. This is typically the code and member data of the object.
One of the fundamental principles of OOP is that the implementation is encapsulated (ie:hidden) within the class; the only thing that outsiders should see is the interface.
When a subclass inherits from a subclass, it typically inherits both the implementation and the interface. This, in turn, means that you're forced to accept both as constraints on your class.
With aggregation, you get to choose either implementation or interface, or both -- but you're not forced into either. The functionality of an object is left up to the object itself. It can defer to other objects as it likes, but it's ultimately responsible for itself. In my experience, this leads to a more flexible system: one that's easier to modify.
So, whenever I'm developing object-oriented software, I almost always prefer aggregation over inheritance.

I gave an answer to "Is a" vs "Has a" : which one is better?.
Basically I agree with other folks: use inheritance only if your derived class truly is the type you're extending, not merely because it contains the same data. Remember that inheritance means the subclass gains the methods as well as the data.
Does it make sense for your derived class to have all the methods of the superclass? Or do you just quietly promise yourself that those methods should be ignored in the derived class? Or do you find yourself overriding methods from the superclass, making them no-ops so no one calls them inadvertently? Or giving hints to your API doc generation tool to omit the method from the doc?
Those are strong clues that aggregation is the better choice in that case.

I see a lot of "is-a vs. has-a; they're conceptually different" responses on this and the related questions.
The one thing I've found in my experience is that trying to determine whether a relationship is "is-a" or "has-a" is bound to fail. Even if you can correctly make that determination for the objects now, changing requirements mean that you'll probably be wrong at some point in the future.
Another thing I've found is that it's very hard to convert from inheritance to aggregation once there's a lot of code written around an inheritance hierarchy. Just switching from a superclass to an interface means changing nearly every subclass in the system.
And, as I mentioned elsewhere in this post, aggregation tends to be less flexible than inheritance.
So, you have a perfect storm of arguments against inheritance whenever you have to choose one or the other:
Your choice will likely be the wrong one at some point
Changing that choice is difficult once you've made it.
Inheritance tends to be a worse choice as it's more constraining.
Thus, I tend to choose aggregation -- even when there appears to be a strong is-a relationship.

The question is normally phrased as Composition vs. Inheritance, and it has been asked here before.

I wanted to make this a comment on the original question, but 300 characters bites [;<).
I think we need to be careful. First, there are more flavors than the two rather specific examples made in the question.
Also, I suggest that it is valuable not to confuse the objective with the instrument. One wants to make sure that the chosen technique or methodology supports achievement of the primary objective, but I don't thing out-of-context which-technique-is-best discussion is very useful. It does help to know the pitfalls of the different approaches along with their clear sweet spots.
For example, what are you out to accomplish, what do you have available to start with, and what are the constraints?
Are you creating a component framework, even a special purpose one? Are interfaces separable from implementations in the programming system or is it accomplished by a practice using a different sort of technology? Can you separate the inheritance structure of interfaces (if any) from the inheritance structure of classes that implement them? Is it important to hide the class structure of an implementation from the code that relies on the interfaces the implementation delivers? Are there multiple implementations to be usable at the same time or is the variation more over-time as a consequence of maintenance and enhancememt? This and more needs to be considered before you fixate on a tool or a methodology.
Finally, is it that important to lock distinctions in the abstraction and how you think of it (as in is-a versus has-a) to different features of the OO technology? Perhaps so, if it keeps the conceptual structure consistent and manageable for you and others. But it is wise not to be enslaved by that and the contortions you might end up making. Maybe it is best to stand back a level and not be so rigid (but leave good narration so others can tell what's up). [I look for what makes a particular portion of a program explainable, but some times I go for elegance when there is a bigger win. Not always the best idea.]
I'm an interface purist, and I am drawn to the kinds of problems and approaches where interface purism is appropriate, whether building a Java framework or organizing some COM implementations. That doesn't make it appropriate for everything, not even close to everything, even though I swear by it. (I have a couple of projects that appear to provide serious counter-examples against interface purism, so it will be interesting to see how I manage to cope.)

I'll cover the where-these-might-apply part. Here's an example of both, in a game scenario. Suppose, there's a game which has different types of soldiers. Each soldier can have a knapsack which can hold different things.
Inheritance here?
There's a marine, green beret & a sniper. These are types of soldiers. So, there's a base class Soldier with Marine, Green Beret & Sniper as derived classes
Aggregation here?
The knapsack can contain grenades, guns (different types), knife, medikit, etc. A soldier can be equipped with any of these at any given point in time, plus he can also have a bulletproof vest which acts as armor when attacked and his injury decreases to a certain percentage. The soldier class contains an object of bulletproof vest class and the knapsack class which contains references to these items.

I think it's not an either/or debate. It's just that:
is-a (inheritance) relationships occur less often than has-a (composition) relationships.
Inheritance is harder to get right, even when it's appropriate to use it, so due diligence has to be taken because it can break encapsulation, encourage tight coupling by exposing implementation and so forth.
Both have their place, but inheritance is riskier.
Although of course it wouldn't make sense to have a class Shape 'having-a' Point and a Square classes. Here inheritance is due.
People tend to think about inheritance first when trying to design something extensible, that is what's wrong.

Favour happens when both candidate qualifies. A and B are options and you favour A. The reason is that composition offers more extension/flexiblity possiblities than generalization. This extension/flexiblity refers mostly to runtime/dynamic flexibility.
The benefit is not immediately visible. To see the benefit you need to wait for the next unexpected change request. So in most cases those sticked to generlalization fails when compared to those who embraced composition(except one obvious case mentioned later). Hence the rule. From a learning point of view if you can implement a dependency injection successfully then you should know which one to favour and when. The rule helps you in making a decision as well; if you are not sure then select composition.
Summary: Composition :The coupling is reduced by just having some smaller things you plug into something bigger, and the bigger object just calls the smaller object back. Generlization: From an API point of view defining that a method can be overridden is a stronger commitment than defining that a method can be called. (very few occassions when Generalization wins). And never forget that with composition you are using inheritance too, from a interface instead of a big class

Both approaches are used to solve different problems. You don't always need to aggregate over two or more classes when inheriting from one class.
Sometimes you do have to aggregate a single class because that class is sealed or has otherwise non-virtual members you need to intercept so you create a proxy layer that obviously isn't valid in terms of inheritance but so long as the class you are proxying has an interface you can subscribe to this can work out fairly well.

Related

Both Inheritance and composition in Python, bad practice?

I'm working a project, where the natural approach is to implement a main object with sub-components based on different classes, e.g. a PC consisting of CPU, GPU, ...
I've started with a composition structure, where the components have attributes and functions inherent to their sub-system and whenever higher level attributes are needed, they are given as arguments.
Now, as I'm adding more functionality, it would make sense to have different types of the main object, e.g. a notebook, which would extend the PC class, but still have a CPU, etc. At the moment, I'm using a separate script, which contains all the functions related to the type.
Would it be considered bad practice to combine inheritance and composition, by using child classes for different types of the main object?

In short
Preferring composition over inheritance does not exclude inheritance, and does not automatically make the combination of both a bad practice. It's about making an informed decision.
More details
The recommendation to prefer composition over inheritance is a rule of thumb. It was first coined by GoF. If you'll read their full argumentation, you'll see that it's not about composition being good and inheritance bad; it's that composition is more flexible and therefore more suitable in many cases.
But you'll need to decide case by case. And indeed, if you consider some variant of the composite pattern, specialization of the leaf and composite classes can be perfectly justified in some situations:
polymorphism could avoid a lot of if and cases,
composition could in some circumstances require additional call-forwarding overhead that might not be necessary when it's really about type specialization.
combination of composition and inheritance could be used to get the best of both worlds (caution: if applied carelessly, it could also give the worst of both worlds)
Note: If you'd provide a short overview of the context with an UML diagram, more arguments could be provided in your particular context. Meanwhile, this question on SE, could also be of interest

How do I avoid subclassing a pandas DataFrame using composition?

The pandas documentation recommends against sub-classing their data structures. One of their recommended alternatives is to use composition, but they just point readers to a Wikipedia article on composition vs. inheritance. That article and other resources I've found have not helped me understand how to extend a pandas DataFrame using composition. Can someone explain composition in this context and tell me about cases where composition might be a preferred alternative to sub-classing pd.DataFrame? A simple example or a link to information that's more instructive than Wikipedia articles would be very helpful.
In this question I'm specifically asking how composition should be used in cases where someone might be tempted to subclass pd.DataFrame. I understand there are other solutions to extending a Python object that do not involve composition, and I asked another question about extending pandas DataFrames that resulted in a different solution using a wrapper class.
I didn't understand that "wrapping" and "composition" refer to the same approach here, as noted in MaxYarmolinsky's answer below. The answer to the question I linked to above has a more complete discussion about using composition in this case, which may require handling __getattr__, __getitem__, and __setitem__ properly (I realize this is obvious to people who know what they're doing, but I had to ask my previous question because I had failed to get/set items when I tried on my own).

Just some googling show you how to create a simple class as you describe through composition.
class mydataframe():
def __init__(self,data):
self.coredataframe = pd.DataFrame(data)
self.otherattribute = None
Then you can add methods and attributes of your own...

In OOP inheriting models an "is-a" relationship where composition models "has-a."
In general you should reach for composition over inheritance unless you have a specific polymorphic design in mind as it is less tightly coupled and more modular. Inheritance is the strongest coupling you can do. And strong coupling leads to maintenance difficulties (everything is connected and hard to separate), whereas composition is much easier to refactor.
Inheritance can also lead to confusing inheritance hierarchies if care is not taken with the design or design is incremental.
That said don't be afraid to use inheritance for polymorphism. But be wary of using it for simple code reuse.

How do I design a class in Python?

I've had some really awesome help on my previous questions for detecting paws and toes within a paw, but all these solutions only work for one measurement at a time.
Now I have data that consists off:
about 30 dogs;
each has 24 measurements (divided into several subgroups);
each measurement has at least 4 contacts (one for each paw) and
each contact is divided into 5 parts and
has several parameters, like contact time, location, total force etc.
Obviously sticking everything into one big object isn't going to cut it, so I figured I needed to use classes instead of the current slew of functions. But even though I've read Learning Python's chapter about classes, I fail to apply it to my own code (GitHub link)
I also feel like it's rather strange to process all the data every time I want to get out some information. Once I know the locations of each paw, there's no reason for me to calculate this again. Furthermore, I want to compare all the paws of the same dog to determine which contact belongs to which paw (front/hind, left/right). This would become a mess if I continue using only functions.
So now I'm looking for advice on how to create classes that will let me process my data (link to the zipped data of one dog) in a sensible fashion.

How to design a class.
Write down the words. You started to do this. Some people don't and wonder why they have problems.
Expand your set of words into simple statements about what these objects will be doing. That is to say, write down the various calculations you'll be doing on these things. Your short list of 30 dogs, 24 measurements, 4 contacts, and several "parameters" per contact is interesting, but only part of the story. Your "locations of each paw" and "compare all the paws of the same dog to determine which contact belongs to which paw" are the next step in object design.
Underline the nouns. Seriously. Some folks debate the value of this, but I find that for first-time OO developers it helps. Underline the nouns.
Review the nouns. Generic nouns like "parameter" and "measurement" need to be replaced with specific, concrete nouns that apply to your problem in your problem domain. Specifics help clarify the problem. Generics simply elide details.
For each noun ("contact", "paw", "dog", etc.) write down the attributes of that noun and the actions in which that object engages. Don't short-cut this. Every attribute. "Data Set contains 30 Dogs" for example is important.
For each attribute, identify if this is a relationship to a defined noun, or some other kind of "primitive" or "atomic" data like a string or a float or something irreducible.
For each action or operation, you have to identify which noun has the responsibility, and which nouns merely participate. It's a question of "mutability". Some objects get updated, others don't. Mutable objects must own total responsibility for their mutations.
At this point, you can start to transform nouns into class definitions. Some collective nouns are lists, dictionaries, tuples, sets or namedtuples, and you don't need to do very much work. Other classes are more complex, either because of complex derived data or because of some update/mutation which is performed.
Don't forget to test each class in isolation using unittest.
Also, there's no law that says classes must be mutable. In your case, for example, you have almost no mutable data. What you have is derived data, created by transformation functions from the source dataset.

The following advices (similar to #S.Lott's advice) are from the book, Beginning Python: From Novice to Professional
Write down a description of your problem (what should the problem do?). Underline all the nouns, verbs, and adjectives.
Go through the nouns, looking for potential classes.
Go through the verbs, looking for potential methods.
Go through the adjectives, looking for potential attributes
Allocate methods and attributes to your classes
To refine the class, the book also advises we can do the following:
Write down (or dream up) a set of use cases—scenarios of how your program may be used. Try to cover all the functionally.
Think through every use case step by step, making sure that everything we need is covered.

I like the TDD approach...
So start by writing tests for what you want the behaviour to be. And write code that passes. At this point, don't worry too much about design, just get a test suite and software that passes. Don't worry if you end up with a single big ugly class, with complex methods.
Sometimes, during this initial process, you'll find a behaviour that is hard to test and needs to be decomposed, just for testability. This may be a hint that a separate class is warranted.
Then the fun part... refactoring. After you have working software you can see the complex pieces. Often little pockets of behaviour will become apparent, suggesting a new class, but if not, just look for ways to simplify the code. Extract service objects and value objects. Simplify your methods.
If you're using git properly (you are using git, aren't you?), you can very quickly experiment with some particular decomposition during refactoring, and then abandon it and revert back if it doesn't simplify things.
By writing tested working code first you should gain an intimate insight into the problem domain that you couldn't easily get with the design-first approach. Writing tests and code push you past that "where do I begin" paralysis.

The whole idea of OO design is to make your code map to your problem, so when, for example, you want the first footstep of a dog, you do something like:
dog.footstep(0)
Now, it may be that for your case you need to read in your raw data file and compute the footstep locations. All this could be hidden in the footstep() function so that it only happens once. Something like:
class Dog:
def __init__(self):
self._footsteps=None
def footstep(self,n):
if not self._footsteps:
self.readInFootsteps(...)
return self._footsteps[n]
[This is now a sort of caching pattern. The first time it goes and reads the footstep data, subsequent times it just gets it from self._footsteps.]
But yes, getting OO design right can be tricky. Think more about the things you want to do to your data, and that will inform what methods you'll need to apply to what classes.

After skimming your linked code, it seems to me that you are better off not designing a Dog class at this point. Rather, you should use Pandas and dataframes. A dataframe is a table with columns. You dataframe would have columns such as: dog_id, contact_part, contact_time, contact_location, etc.
Pandas uses Numpy arrays behind the scenes, and it has many convenience methods for you:
Select a dog by e.g. : my_measurements['dog_id']=='Charly'
save the data: my_measurements.save('filename.pickle')
Consider using pandas.read_csv() instead of manually reading the text files.

Writing out your nouns, verbs, adjectives is a great approach, but I prefer to think of class design as asking the question what data should be hidden?
Imagine you had a Query object and a Database object:
The Query object will help you create and store a query -- store, is the key here, as a function could help you create one just as easily. Maybe you could stay: Query().select('Country').from_table('User').where('Country == "Brazil"'). It doesn't matter exactly the syntax -- that is your job! -- the key is the object is helping you hide something, in this case the data necessary to store and output a query. The power of the object comes from the syntax of using it (in this case some clever chaining) and not needing to know what it stores to make it work. If done right the Query object could output queries for more then one database. It internally would store a specific format but could easily convert to other formats when outputting (Postgres, MySQL, MongoDB).
Now let's think through the Database object. What does this hide and store? Well clearly it can't store the full contents of the database, since that is why we have a database! So what is the point? The goal is to hide how the database works from people who use the Database object. Good classes will simplify reasoning when manipulating internal state. For this Database object you could hide how the networking calls work, or batch queries or updates, or provide a caching layer.
The problem is this Database object is HUGE. It represents how to access a database, so under the covers it could do anything and everything. Clearly networking, caching, and batching are quite hard to deal with depending on your system, so hiding them away would be very helpful. But, as many people will note, a database is insanely complex, and the further from the raw DB calls you get, the harder it is to tune for performance and understand how things work.
This is the fundamental tradeoff of OOP. If you pick the right abstraction it makes coding simpler (String, Array, Dictionary), if you pick an abstraction that is too big (Database, EmailManager, NetworkingManager), it may become too complex to really understand how it works, or what to expect. The goal is to hide complexity, but some complexity is necessary. A good rule of thumb is to start out avoiding Manager objects, and instead create classes that are like structs -- all they do is hold data, with some helper methods to create/manipulate the data to make your life easier. For example, in the case of EmailManager start with a function called sendEmail that takes an Email object. This is a simple starting point and the code is very easy to understand.
As for your example, think about what data needs to be together to calculate what you are looking for. If you wanted to know how far an animal was walking, for example, you could have AnimalStep and AnimalTrip (collection of AnimalSteps) classes. Now that each Trip has all the Step data, then it should be able to figure stuff out about it, perhaps AnimalTrip.calculateDistance() makes sense.

Explain polymorphism

What is polymorphism? I'm not sure I am understanding it correctly.
In the Python scope, what I am getting out of it is that I can define parameters as followed:
def blah (x, y)
without having to specify the type, as opposed to another language like Java, where it'd look more along the lines of:
public void blah (int x, string y)
Am I getting that right?

Beware that different people use different terminology; in particular there is often a rift between the object oriented community and the (rest of the) programming language theory community.
Generally speaking, polymorphism means that a method or function is able to cope with different types of input. For example the add method (or + operator) in the Integer class might perform integer addition, while the add method in the Float class performs floating-point addition, and the add method in the Bignum class performs the correct operations for an arbitrary-size number. Polymorphism is the ability to call the add method on an object without knowing what kind of a number it is.
One particular kind of polymorphism, usually called parametric polymorphism in the functional community and generic programming in the OOP community, is the ability to perform certain operations on an object without caring about its precise type. For example, to reverse a list, you don't need to care about the type of the elements of the list, you just need to know that it's a list. So you can write generic (in this sense) list reversal code: it'll work identically on lists of integers, strings, widgets, arbitrary objects, whatever. But you can't write code that adds the elements of a list in a generic way, because the way the elements are interpreted as numbers depends on what type they are.
Another kind of polymorphism, usually called ad-hoc polymorphism or (at least for some forms of it) generic programming in the functional community, and often subtyping polymorphism (though this somewhat restricts the concept) in the OOP community, it the ability to have a single method or function that behaves differently depending on the precise type of its arguments (or, for methods, the type of the object whose method is being invoked). The add example above is ad-hoc polymorphism. In dynamically typed languages this ability goes without saying; statically-typed languages tend to (but don't have to) have restrictions such as requiring that the argument be a subclass of some particular class (Addable).
Polymorphism is not about having to specify types when you define a function. That's more related to static vs. dynamic typing, though it's not an intrinsic part of the issue. Dynamically typed languages have no need for type declarations, while statically typed languages usually need some type declarations (going from quite a lot in Java to almost none in ML).

Hope from this example, you will understand what Polymorphism is. In this picture, all objects have a method Speak() but each has a different implementation. Polymorphism allows you to do this, you can declare an action for a class and its subclasses but for each subclass, you can write exactly what you want later.

The answers you've gotten are good, and explain what polymorphism is. I think it can also help to understand some of the reasons it is useful.
In some languages that lack polymorphism you find yourself in situations where you want to perform what is conceptually the same operation on different types of objects, in cases where that operation has to be implemented differently for each type. For instance, in a python-like syntax:
def dosomething(thing):
if type(thing)==suchandsuch:
#do some stuff
elif type(thing)==somesuch:
#do some other stuff
elif type(thing)==nonesuch:
#yet more stuff
There are some problems with this. The biggest is that it causes very tight coupling and a lot of repetition. You are likely to have this same set of tests in a lot of places in your code. What happens if you add a new type that has to support this operation? You have to go find every place you have this sort of conditional and add a new branch. And of course you have to have access to all the source code involved to be able to make those changes. On top of that conditional logic like this is wordy, and hard to understand in real cases.
It's nicer to be able to just write:
thing.dosomething()
On top of being a lot shorter this leads to much looser coupling. This example/explanation is geared to traditional OO languages like Python. The details are a bit different in, say, functional languages. But a lot of the general utility of polymorphism remains the same.

Polymorphism literally means "many shapes", and that's pretty good at explaining its purpose. The idea behind polymorphism is that one can use the same calls on different types and have them behave appropriately.
It is important to distinguish this from the typing system - strongly typed languages require that objects be related through an inheritance chain to be polymorphic, but with weakly typed languages, this is not necessary.
In Java (et al.), this is why interfaces are useful - they define the set of functions that can be called on objects without specifying the exact object - the objects that implement that interface are polymorphic with respect to that interface.
In Python, since things are dynamically typed, the interface is less formal, but the idea is the same - calling foo() will succeed on two objects that know how to properly implement foo(), but we don't care about what type they really are.

No, that is not polymorphism. Polymorphism is the concept that there can be many different implementations of an executable unit and the difference happen all behind the scene without the caller awareness.
For example, Bird and Plane are FlyingObject. So FlyingObject has a method call fly() and both Bird and Plane implement fly() method. Bird and Plan flies differently so the implementations are different. To the clients point of view, it does not matter how Bird or Plane fly, they can just call fly() method to a FlyingObject object does not matter if that object is Plan or Bird.
What you are describing is dynamic and static type checking which the type compatibility is done at run-time and compile-time respectively.
Hope this out.
NawaMan

Short answer: The ability to treat programmatic type instances of different types as the same for certain purposes.
Long answer:
From Ancient Greek poly (many) + morph (form) + -ism.
Polymorphism is a general technique enabling different types to be treated uniformly in some way. Examples in the programming world include:
parametric polymorphism (seen as
generics in Java)
subtyping
polymorphism, implemented in Java
using dynamic message dispatch
between object instances.
ad-hoc
polymorphism, which relates to the
ability to define functions of the
same name that vary only by the types
they deal with (overloading in Java).
The word polymorphism is also used to describe concepts in other, unrelated, domains such as genetics.

What you are talking about is auto-typing--or maybe type detection. It is something a Dynamic language tends to do--it allows the user to not know or care about the types at build time--the types are figured out at runtime and not restricted to a specific type.
Polymorphism is where you have two classes that inherit from a main class but each implement a method differently.
For instance, if you have Vehicle as a root class and Car and Bicycle as instances, then vehicle.calculateDistance() would operate based on gas available if the vehicle is an instance of Car, and would operate based on the endurance of the cyclist if it was a Bicycle type.
It's generally used like this:
getTotalDistance(Vehicle[] vehicle) {
int dist=0
for each vehicle
dist+=vehicle.calculateDistance();
Note that you are passing in the base type, but the instances will never be Vehicle itself, but always a child class of Vehicle--but you DO NOT cast it to the child type. Polymorphis means that vehicle morphs into whatever child is required.

Yes, that is an example of "type-polymorphism". However, when talking about object-oriented programming "polymorphism" typically relates to "subtype polymorphism." The example you gave is often called "typing".
Java, C, C++ and others, use static typing. In that, you have to specify the types are compile time.
Python, and Ruby use dynamic in that the typing is inferred during interpretation.
Subtype polymorphism or just "polymorphism" is the ability for a base class reference that is a derived type, to properly invoke the derived type's method.
For example (in near pseudo code):
class Base
{
virtual void Draw() { //base draw}
}
class Derived: public Base
{
void Draw() { //derived draw}
}
Base* myBasePtr = new Derived();
myBasePtr->Draw(); //invokes Derived::Draw(), even though it's a base pointer.
This is polymorphism.

Polymorphism:
One method call works on several classes, even if the classes need different implementations;
Ability to provide multiple implementations of an action, and to select the correct implementation based on the surrounding context;
Provides overloading and overriding;
Could occurs both in Runtime and Compile-Time;
Run time Polymorphism :
Run time Polymorphism also known as method overriding
Method overriding means having two or more methods with the same name , same signature but with different implementation
Compile time Polymorphism :
Compile time Polymorphism also known as method overloading
Method overloading means having two or more methods with the same name but with different signatures
In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions. A function that can evaluate to or be applied to values of different types is known as a polymorphic function. A data type that can appear to be of a generalized type (e.g., a list with elements of arbitrary type) is designated polymorphic data type like the generalized type from which such specializations are made.
Disadvantages of Polymorphism:
Polymorphism reduces readability of the program. One needs to visualize runtime behaviour of program to identify actual execution time class involved. It also becomes difficult to navigate through classes implementing this concept. Even sofasticated IDEs can not provide this navigation feature. This adds to the maintainance cost to some extent.

Polymorphism - The same object acting differently based on the scenario it is in. For example, if a 12-year old kid was in a room with a bunch of kids, the type of music they would listen to would be different than if a 12 year old kid was in a room full of adults. The 12 year old kid is the same, however the kid acting differently based on the scenario it is in (the different music).

The ability to define a function in multiple forms is called Polymorphism. In C#, Java, C++ there are two types of polymorphism: compile time polymorphism (overloading) and runtime polymorphism (overriding).
Overriding: Overriding occurs when a class method has the same name and signature as a method in parent class.
Overloading: Overloading is determined at the compile time. It occurs when several methods have same names with:
Different method signature and different number or type of
parameters.
Same method signature but different number of parameters.
Same method signature and same number of parameters but of different type

How many private variables are too many? Capsulizing classes? Class Practices?

Okay so i am currently working on an inhouse statistics package for python, its mainly geared towards a combination of working with arcgis geoprocessor, for modeling comparasion and tools.
Anyways, so i have a single class, that calculates statistics. Lets just call it Stats. Now my Stats class, is getting to the point of being very large. It uses statistics calculated by other statistics, to calculate other statistics sets, etc etc. This leads to alot of private variables, that are kept simply to prevent recalculation. however there is certain ones, while used quite frequintly they are often only used by one or two key subsections of functionality. (e.g. summation of matrix diagonals, and probabilities). However its starting to become a major eyeesore, and i feel as if i am doing this terribly wrong.
So is this bad?
I was recommended by a coworker, to simply start putting core and common functionality togther, in the main class, then simply having capsules, that take a reference to the main class, and simply do what ever functionality they need to within themselves. E.g. for calculating accuracy of model predictions, i would create a capsule, who simply takes a reference to the parent, and it will offload all of the calculations needed, for model predictions.
Is something like this really a good idea? Is there a better way? Right now i have over a dozen different sub statistics that are dumped to a text file to make a smallish report. The code base is growing, and i would just love it if i could start splitting up more and more of my python classes. I am just not sure really what the best way about doing stuff like this is.

Why not create a class for each statistic you need to compute and when of the statistics requires other, just pass an instance of the latter to the computing method? However, there is little known about your code and required functionalities. Maybe you could describe in a broader fashion, what kind of statistics you need calculate and how they depend on each other?
Anyway, if I had to count certain statistics, I would instantly turn to creating separate class for each of them. I did once, when I was writing code statistics library for python. Every statistic, like how many times class is inherited or how often function was called, was a separate class. This way each of them was simple, however I didn't need to use any of them in the other.

I can think of a couple of solutions. One would be to simply store values in an array with an enum like so:
StatisticType = enum('AveragePerDay','MedianPerDay'...)
Another would be to use a inheritance like so:
class StatisticBase
....
class AveragePerDay ( StatisticBase )
...
class MedianPerDay ( StatisticBase )
...
There is no hard and fast rule on "too many", however a guideline is that if the list of fields, properties, and methods when collapsed, is longer than a single screen full, it's probably too big.

It's a common anti-pattern for a class to become "too fat" (have too much functionality and related state), and while this is commonly observed about "base classes" (whence the "fat base class" monicker for the anti-pattern), it can really happen without any inheritance involved.
Many design patterns (DPs for short_ can help you re-factor your code to whittle down the large, untestable, unmaintainable "fat class" to a nice package of cooperating classes (which can be used through "Facade" DPs for simplicity): consider, for example, State, Strategy, Memento, Proxy.
You could attack this problem directly, but I think, especially since you mention in a comment that you're looking at it as a general class design topic, it may offer you a good opportunity to dig into the very useful field of design patterns, and especially "refactoring to patterns" (Fowler's book by that title is excellent, though it doesn't touch on Python-specific issues).
Specifically, I believe you'll be focusing mostly on a few Structural and Behavioral patterns (since I don't think you have much need for Creational ones for this use case, except maybe "lazy initialization" of some of your expensive-to-compute state that's only needed in certain cases -- see this wikipedia entry for a pretty exhaustive listing of DPs, with classification and links for further explanations of each).

Since you are asking about best practices you might want to check out pylint (http://www.logilab.org/857). It has many good suggestions about code style including ones relating to how many private variables in a class.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.