Python data model / internal methods

Python data model / internal methods - python

I would like to understand the data model in python (3.8.2), so maybe someone can explain in simple words, why every basic data type is a class-object and has so many attributes and functions.
Consider the following code:
num = 1337
type(num) # <class 'int'>
dir(num)
>>>
['__abs__',
'__add__',
'__and__',
...
'real',
'to_bytes']
So I can do:
num.__add__(1)
>>> 1338
So my question is: why are such functions part of the user defined variables and isn't there quite some overhead to this?

"everything is object in Python", so "everything" (maybe not exactly everything, I don't know) is a class instance and has potentially attributes and methods.
That said you have to know that, by convention, every method whom name begins with __ is considered as private (and is almost really protected) and you should not use it directly.
Every method whom name starts and ends by __ is called a magic method and has a pretty well defined role. They could be considered as a sort of hooks bound to particular operators or moments in the object lifecycle.
For example __add__ is the method bound to the + operator. When you do an addition, you are actually calling the __add__ method of a number, giving it another number as parameter.
As you noticed, you also can call directly the __add__ method if you want, but it is not encouraged.
Python is a high-level interpreted language. It provides many facilities and a clear syntax that allows to concentrate on real problems rather than implementation details.
Those facilities do not come from nowhere and are a benefit provided by (among others) an internal data model.
This data model is certainly not as lightweight as a custom handcrafted C data structure specialy designed for a single problem but it is pratical and usable in many situations (and still reasonably lightweight and well performing for many usages).
If you are concerned by memory optimisation and extreme-performance, You probably should look some lower level languages like C or Go.
But keep in mind that solving any problem could then probably take much more time :)

Why so many methods:
Python is an object-oriented language. It's built on objects interacting with each other. Maybe there's no specific reason why Python has to be object-oriented, but the guy who wrote it gets to decide what kind of language it is and he chose this type.
In Python, the objects get to decide how an operation is implemented. That's why you can have natural syntax like 'str1' + 'str2' or 23 + 40. Note, these are both written in the same way, but will produce different results because the + operation is carried out on different objects.
What about memory management:
If the objects didn't decide how to implement an operation, the logic would still have to be stored somewhere. It would probably take up at least some memory.
The question is how much memory does an object use? To do this, you can use the sys.getsizeof function:
import sys
print(sys.getsizeof(int)) # Prints size in bytes
>>> 416
print(sys.getsizeof(1))
>>> 28
So, the memory of the class (int) is much larger than the object (1). This means that the instances of integers are much smaller than their classes, so memory doesn't inflate as drastically.

Related

Both Inheritance and composition in Python, bad practice?

I'm working a project, where the natural approach is to implement a main object with sub-components based on different classes, e.g. a PC consisting of CPU, GPU, ...
I've started with a composition structure, where the components have attributes and functions inherent to their sub-system and whenever higher level attributes are needed, they are given as arguments.
Now, as I'm adding more functionality, it would make sense to have different types of the main object, e.g. a notebook, which would extend the PC class, but still have a CPU, etc. At the moment, I'm using a separate script, which contains all the functions related to the type.
Would it be considered bad practice to combine inheritance and composition, by using child classes for different types of the main object?

In short
Preferring composition over inheritance does not exclude inheritance, and does not automatically make the combination of both a bad practice. It's about making an informed decision.
More details
The recommendation to prefer composition over inheritance is a rule of thumb. It was first coined by GoF. If you'll read their full argumentation, you'll see that it's not about composition being good and inheritance bad; it's that composition is more flexible and therefore more suitable in many cases.
But you'll need to decide case by case. And indeed, if you consider some variant of the composite pattern, specialization of the leaf and composite classes can be perfectly justified in some situations:
polymorphism could avoid a lot of if and cases,
composition could in some circumstances require additional call-forwarding overhead that might not be necessary when it's really about type specialization.
combination of composition and inheritance could be used to get the best of both worlds (caution: if applied carelessly, it could also give the worst of both worlds)
Note: If you'd provide a short overview of the context with an UML diagram, more arguments could be provided in your particular context. Meanwhile, this question on SE, could also be of interest

copy.deepcopy or create a new object?

I'm developing a real-time application and sometimes I need to create instances to new objects always with the same data.
First, I did it just instantiating them, but then I realised maybe with copy.deepcopy it would be faster. Now, I find people who say deepcopy is horribly slow.
I can't do a simply copy.copy because my object has lists.
My question is, do you know a faster way or I just need to give up and instantiate them again?
Thank you for your time

I believe copy.deepcopy() is still pure Python, so it's unlikely to give you any speed boost.
It sounds to me a little like a classic case of early optimisation. I would suggest writing your code to be intuitive which, in my opinion, is simply instantiating each object. Then you can profile it and see where savings need to be made, if anywhere. It may well be in your real-world use-case that some completely different piece of code will be a bottleneck.
EDIT: One thing I forgot to mention in my original answer - if you're copying a list, make sure you use slice notation (new_list = old_list[:]) rather than iterating through it in Python, which will be slower. This won't do a deep copy, however, so if your lists have other lists or dictionaries you'll need to use deepcopy(). For dict objects, use the copy() method.
If you still find constructing your objects is what's taking the time, you can then consider how to speed it up. You could experiment with __slots__, although they're typically about saving memory more than CPU time so I doubt they'll buy you much. In the extreme case, you can push your object out to a C extension module, which is likely to be a lot faster at the expense of increased complexity. This is always the approach I've taken in the past, where I use native C data structures under the hood and use Python's special methods to wrap a "list-like" or "dict-like" interface on top. This is does rely on you being happy with coding in C, of course.
(As an aside I would avoid C++ unless you have a compelling reason, C++ Python extensions are slightly more fiddly to get building than plain C - it's entirely possible if you have a good motivation, however)
If you object has, for example, very long lists then you might get some mileage out of a sort of copy-on-write approach, where clones of objects just keep the same references instead of copying the lists. Every time you access them you could use sys.getrefcount() to see if it's safe to update in-place or whether you need to take a copy. This approach is likely to be error-prone and overly complex, but I thought I'd mention it for interest.
You could also look at your object hierarchy and see whether you can break objects up such that parts which don't need to be duplicated can be shared between other objects. Again, you'd need to take care when modifying such shared objects.
The important point, however, is that you first want to get your code correct and then make your code fast once you understand the best ways to do that from real world usage.

Python 3.2: Information on class, class attributes, and values of objects

I'm new to Python, and I'm learning about the class function. Does anyone know of any good sources/examples for this? Here's an example I wrote up. I'm trying to read more about self and init
class Example:
def __init__(a, b, c, d):
self.a = a
self.b = b
self.c = c
self.d = d
test = Example(1, 1, 1, 1)
I've been to python.org, as well as this site. I've also been reading beginners Python books, but I'd like more information.
Thanks.

A couple of generic clarifications here:
Python's "class" keyword is not a function, it's a statement which signals to the language that the following code describes a class (a user-defined data type and its associated behavior). "class" takes a name (and a possibly empty list of "parent" classes) ... and introduces a "suite" (an indented block of code).
The "def" keyboard is a similar statement which defines a function. In your example, which should have read: *def _init_(self, a, b, c)*) you're defining a special type of function which is "part of" (associated with, bound to) the Example class. It's also possible (and fairly common) to create unbound functions. Commonly, in Python, unbound functions are simple called "functions" while those which are part of a class are called "methods" ... or "instance functions."
classes are templates for instantiating objects. The terms "instance" and "object" are synonymous in this context. Your example "test" is an instance ... and the Python interpreter "instantiates" and initializes that object according to the class description.
A class is also a "type", that is to say that it's a user definition of a type of data and its associated methods. "class" and "type" are somewhat synonymous in Python though they are conventionally used in different ways. The core "types" of Python data (integers, real numbers, imaginary/complex numbers, strings, lists, tuples, and dictionaries) are all referred to as "types" while the more complex data/operational structures are called classes. Early versions of Python were implemented with constraints that made the distinction between "type" and "class" more than merely a matter of terminological difference. However, the last several versions of Python have eliminated those underlying technical distinctions. Those distinctions related to "subclassing" (inheritance).
classes can be described as a set of additions and modifications to another class. This is called "inheritance" and the class which is derived from another in this manner is referred to as a "subclass." It's common for programmers to create hierarchies of classes ... with specific variations all deriving from more common bases. It's also common to define related functionality within the same files or sets of files. These are "class libraries" and sometimes they are built as "packages."
_init_() is a method; in particular it's the initializer for Python objects (instances of a class).
Python generally uses _..._ (prefixing and suffixing pairs of underscore characters around selected keywords) for "special" method or attribute names ... which is intended to reduce the likelihood that its naming choices will conflict with the meaningful names that you might wish to give to your own methods. While you can name your other methods and attributes with this _XXXX_ --- Python will not inherently treat that as an error --- it's an extremely bad idea to do so. Even if you don't pick any of the currently defined special names there's no guarantee that some future version of Python won't conflict with your usage later.
"methods" are functions ... but they are a type of function which is bound (associated with) a particular instance of a particular class. There are also "class methods" which are associated with the class rather than with a specific instance of the class.
In your example self.b, self.c and so on are "attributes" or "members" of the object (instance). (The terms are synonymous).
In general the purpose of object orient programming is to provide ways of describing types of data and operations on those types of data in a way that's amenable to both human comprehension and computerized interpretation and execution. (All programming languages are intended to strike some balance between human readability/comprehension and machine parsing/execution; but object-oriented languages do so specifically with a focus on the description and definition of "types," and the instantiation of those types into "objects" and finally the various interactions among those objects.
"self" is a Python specific convention for naming the special (and generally required) first argument to any bound method. It's a "self" reference (a way for the code in a method to refer to the object/instance's own attributes and other methods without any ambiguity). While you can call your first argument to your bound methods "a" (as you've unwittingly done in your Example) it's an extremely bad idea to do so. Not only will it likely confuse you later ... it will make no sense to anyone else trying to read your Python code).
The term "object-oriented" is confusing unless one is aware of the comparisons to other forms of programming language. It's an evolution from "procedural" programming. The simplest gist of that comparison is when you consider the sorts of functions one would define in a procedural language were one might have to define and separately name different functions to perform analogous operations on different types of data: print_student_record(this_student) vs. print_teacher_report(some_report) --- a programming model which necessitates a fair amount of overhead on the part of the programmer, to keep track of which functions work on which types. This sort of problem is eliminated in OO (object oriented) programming where one can, conceivably, call on this.print_() ... and, assuming one has created compatible classes, this will "print" regardless of whether "this" is a student (record/object) or a teacher (report/record/object). That's greatly oversimplified but useful for understanding the pressures which led to the development and adoption of OO based programming.
In Python it's possible to create classes with little or no functionality. Your example does nothing, yet, but transfer a set of arguments into "attributes" (members) during initialization (instantiation). After that you could use these attributes in programming statements like: test.a += 1 and print (test.a). This is possible because Python is a "multi-paradigm" language. It supports procedural as well as object-orient programming styles. Objects used this way are very similar to "structs" from the C programming language (predecessor to C++) and to the "records" in Pascal, etc. That style of programming is largely considered to be obsolete (particularly when using a modern, "very high level" language such as Python).
The gist of what I'm getting at is this ... you'll want to learn how to think of your data as the combination of it's "parts" (attributes) and the functionality that changes, manipulates, validates, and handles input, output, and possibly storage, of those attributes.
For example if you were writing a "code breaker" program to solve simply ciphers you might implement a "Histogram" object which counts the letter frequencies of a given coded message. That would have attributes (one integer for every letter) and behavior (feeding ports of the coded message(s) into the instance, splitting the strings into individual characters, filtering out all the non-letter characters, converting all the letters to upper or lower case, and counting them --- that is incrementing the integer corresponding to each letter). Additionally you'd need to have some way of querying the histogram ... for example getting list of the letters sorted by their frequency in the cipher text.
Once you had such a "histogram" class then you could think of ways to use that for your solver. For example to solve a cryptogram puzzle you might computer the histogram then try substituting "etaon" for the five most common ciphered letters ... then check how many of the "partial" strings (t.e for "the") match words, trying permutations, and so on. Each of these might be it's own class. A key point of programming is that your histogram class might be useful for counting all sorts of other things (even in a simple voting system or popularity context). A particular subclass or instantiation might make it a histogram of letters while others could be re-used for other types of "things" you want counted. Similarly the code that iterates over permutions of some list might be used in any number of simulation, optimization, and related programs. (In fact Python's standard libraries already including "counters" and "permutations" functions in the "collections" and "itertools" modules, respectively).
Of course you're going to hear of all of these concepts repeatedly as you study programming. This has been a rather rambling attempt to kickstart that process. I know I've been a bit repetitious in a few points here --- part of that is because I'm typing this a 4am after having started work at 7am yesterday; but part of it serves a pedagogical purpose as well.

There's an error in your class definition. Always include the variable self in your __init__ method. It represents the instance of the object itself and should be included as the first parameter to all of your methods.
What do you want to accomplish with this class? Up until now, it just stores a few variables. Try adding a few methods to spice things up a little bit! There is a plethora of available resources on classes in Python. To start, you may wanna give this one a try:
Python Programming - Classes

I'm learning python now also, and there is an intro class that's pretty good on codecademy.com
http://www.codecademy.com/tracks/python
It has a section that goes through an exercise on classes. Hope this helps

Explain polymorphism

What is polymorphism? I'm not sure I am understanding it correctly.
In the Python scope, what I am getting out of it is that I can define parameters as followed:
def blah (x, y)
without having to specify the type, as opposed to another language like Java, where it'd look more along the lines of:
public void blah (int x, string y)
Am I getting that right?

Beware that different people use different terminology; in particular there is often a rift between the object oriented community and the (rest of the) programming language theory community.
Generally speaking, polymorphism means that a method or function is able to cope with different types of input. For example the add method (or + operator) in the Integer class might perform integer addition, while the add method in the Float class performs floating-point addition, and the add method in the Bignum class performs the correct operations for an arbitrary-size number. Polymorphism is the ability to call the add method on an object without knowing what kind of a number it is.
One particular kind of polymorphism, usually called parametric polymorphism in the functional community and generic programming in the OOP community, is the ability to perform certain operations on an object without caring about its precise type. For example, to reverse a list, you don't need to care about the type of the elements of the list, you just need to know that it's a list. So you can write generic (in this sense) list reversal code: it'll work identically on lists of integers, strings, widgets, arbitrary objects, whatever. But you can't write code that adds the elements of a list in a generic way, because the way the elements are interpreted as numbers depends on what type they are.
Another kind of polymorphism, usually called ad-hoc polymorphism or (at least for some forms of it) generic programming in the functional community, and often subtyping polymorphism (though this somewhat restricts the concept) in the OOP community, it the ability to have a single method or function that behaves differently depending on the precise type of its arguments (or, for methods, the type of the object whose method is being invoked). The add example above is ad-hoc polymorphism. In dynamically typed languages this ability goes without saying; statically-typed languages tend to (but don't have to) have restrictions such as requiring that the argument be a subclass of some particular class (Addable).
Polymorphism is not about having to specify types when you define a function. That's more related to static vs. dynamic typing, though it's not an intrinsic part of the issue. Dynamically typed languages have no need for type declarations, while statically typed languages usually need some type declarations (going from quite a lot in Java to almost none in ML).

Hope from this example, you will understand what Polymorphism is. In this picture, all objects have a method Speak() but each has a different implementation. Polymorphism allows you to do this, you can declare an action for a class and its subclasses but for each subclass, you can write exactly what you want later.

The answers you've gotten are good, and explain what polymorphism is. I think it can also help to understand some of the reasons it is useful.
In some languages that lack polymorphism you find yourself in situations where you want to perform what is conceptually the same operation on different types of objects, in cases where that operation has to be implemented differently for each type. For instance, in a python-like syntax:
def dosomething(thing):
if type(thing)==suchandsuch:
#do some stuff
elif type(thing)==somesuch:
#do some other stuff
elif type(thing)==nonesuch:
#yet more stuff
There are some problems with this. The biggest is that it causes very tight coupling and a lot of repetition. You are likely to have this same set of tests in a lot of places in your code. What happens if you add a new type that has to support this operation? You have to go find every place you have this sort of conditional and add a new branch. And of course you have to have access to all the source code involved to be able to make those changes. On top of that conditional logic like this is wordy, and hard to understand in real cases.
It's nicer to be able to just write:
thing.dosomething()
On top of being a lot shorter this leads to much looser coupling. This example/explanation is geared to traditional OO languages like Python. The details are a bit different in, say, functional languages. But a lot of the general utility of polymorphism remains the same.

Polymorphism literally means "many shapes", and that's pretty good at explaining its purpose. The idea behind polymorphism is that one can use the same calls on different types and have them behave appropriately.
It is important to distinguish this from the typing system - strongly typed languages require that objects be related through an inheritance chain to be polymorphic, but with weakly typed languages, this is not necessary.
In Java (et al.), this is why interfaces are useful - they define the set of functions that can be called on objects without specifying the exact object - the objects that implement that interface are polymorphic with respect to that interface.
In Python, since things are dynamically typed, the interface is less formal, but the idea is the same - calling foo() will succeed on two objects that know how to properly implement foo(), but we don't care about what type they really are.

No, that is not polymorphism. Polymorphism is the concept that there can be many different implementations of an executable unit and the difference happen all behind the scene without the caller awareness.
For example, Bird and Plane are FlyingObject. So FlyingObject has a method call fly() and both Bird and Plane implement fly() method. Bird and Plan flies differently so the implementations are different. To the clients point of view, it does not matter how Bird or Plane fly, they can just call fly() method to a FlyingObject object does not matter if that object is Plan or Bird.
What you are describing is dynamic and static type checking which the type compatibility is done at run-time and compile-time respectively.
Hope this out.
NawaMan

Short answer: The ability to treat programmatic type instances of different types as the same for certain purposes.
Long answer:
From Ancient Greek poly (many) + morph (form) + -ism.
Polymorphism is a general technique enabling different types to be treated uniformly in some way. Examples in the programming world include:
parametric polymorphism (seen as
generics in Java)
subtyping
polymorphism, implemented in Java
using dynamic message dispatch
between object instances.
ad-hoc
polymorphism, which relates to the
ability to define functions of the
same name that vary only by the types
they deal with (overloading in Java).
The word polymorphism is also used to describe concepts in other, unrelated, domains such as genetics.

What you are talking about is auto-typing--or maybe type detection. It is something a Dynamic language tends to do--it allows the user to not know or care about the types at build time--the types are figured out at runtime and not restricted to a specific type.
Polymorphism is where you have two classes that inherit from a main class but each implement a method differently.
For instance, if you have Vehicle as a root class and Car and Bicycle as instances, then vehicle.calculateDistance() would operate based on gas available if the vehicle is an instance of Car, and would operate based on the endurance of the cyclist if it was a Bicycle type.
It's generally used like this:
getTotalDistance(Vehicle[] vehicle) {
int dist=0
for each vehicle
dist+=vehicle.calculateDistance();
Note that you are passing in the base type, but the instances will never be Vehicle itself, but always a child class of Vehicle--but you DO NOT cast it to the child type. Polymorphis means that vehicle morphs into whatever child is required.

Yes, that is an example of "type-polymorphism". However, when talking about object-oriented programming "polymorphism" typically relates to "subtype polymorphism." The example you gave is often called "typing".
Java, C, C++ and others, use static typing. In that, you have to specify the types are compile time.
Python, and Ruby use dynamic in that the typing is inferred during interpretation.
Subtype polymorphism or just "polymorphism" is the ability for a base class reference that is a derived type, to properly invoke the derived type's method.
For example (in near pseudo code):
class Base
{
virtual void Draw() { //base draw}
}
class Derived: public Base
{
void Draw() { //derived draw}
}
Base* myBasePtr = new Derived();
myBasePtr->Draw(); //invokes Derived::Draw(), even though it's a base pointer.
This is polymorphism.

Polymorphism:
One method call works on several classes, even if the classes need different implementations;
Ability to provide multiple implementations of an action, and to select the correct implementation based on the surrounding context;
Provides overloading and overriding;
Could occurs both in Runtime and Compile-Time;
Run time Polymorphism :
Run time Polymorphism also known as method overriding
Method overriding means having two or more methods with the same name , same signature but with different implementation
Compile time Polymorphism :
Compile time Polymorphism also known as method overloading
Method overloading means having two or more methods with the same name but with different signatures
In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions. A function that can evaluate to or be applied to values of different types is known as a polymorphic function. A data type that can appear to be of a generalized type (e.g., a list with elements of arbitrary type) is designated polymorphic data type like the generalized type from which such specializations are made.
Disadvantages of Polymorphism:
Polymorphism reduces readability of the program. One needs to visualize runtime behaviour of program to identify actual execution time class involved. It also becomes difficult to navigate through classes implementing this concept. Even sofasticated IDEs can not provide this navigation feature. This adds to the maintainance cost to some extent.

Polymorphism - The same object acting differently based on the scenario it is in. For example, if a 12-year old kid was in a room with a bunch of kids, the type of music they would listen to would be different than if a 12 year old kid was in a room full of adults. The 12 year old kid is the same, however the kid acting differently based on the scenario it is in (the different music).

The ability to define a function in multiple forms is called Polymorphism. In C#, Java, C++ there are two types of polymorphism: compile time polymorphism (overloading) and runtime polymorphism (overriding).
Overriding: Overriding occurs when a class method has the same name and signature as a method in parent class.
Overloading: Overloading is determined at the compile time. It occurs when several methods have same names with:
Different method signature and different number or type of
parameters.
Same method signature but different number of parameters.
Same method signature and same number of parameters but of different type

How bad is it to override a method from a third-party module?

How bad is it to redefine a class method from another, third-party module, in Python?
In fact, users can create NumPy matrices that contain numbers with uncertainty; ideally, I would like their code to run unmodified (compared to when the code manipulates float matrices); in particular, it would be great if the inverse of matrix m could still be obtained with m.I, despite the fact that m.I has to be calculated with my own code (the original I method does not work, in general).
How bad is it to redefine numpy.matrix.I? For one thing, it does tamper with third-party code, which I don't like, as it may not be robust (what if other modules do the same?…). Another problem is that the new numpy.matrix.I is a wrapper that involves a small overhead when the original numpy.matrix.I can actually be applied in order to obtain the inverse matrix.
Is subclassing NumPy matrices and only changing their I method better? this would force users to update their code and create matrices of numbers with uncertainty with m = matrix_with_uncert(…) (instead of keeping using numpy.matrix(…), as for a matrix of floats), but maybe this is an inconvenience that should be accepted for the sake of robustness? Matrix inversions could still be performed with m.I, which is good… On the other hand, it would be nice if users could build all their matrices (of floats or of numbers with uncertainties) with numpy.matrix() directly, without having to bother checking for data types.
Any comment, or additional approach would be welcome!

Subclassing (which does involve overriding, as the term is generally used) is generally much preferable to "monkey-patching" (stuffing altered methods into existing classes or modules), even when the latter is available (built-in types, meaning ones implemented in C, can protect themselves against monkey-patching, and most of them do).
For example, if your functionality relies on monkey-patching, it will break and stop upgrades if at any time the class you're monkey patching gets upgraded to be implemented in C (for speed or specifically to defend against monkey patching). Maintainers of third party packages hate monkey-patching because it means they get bogus bug reports from hapless users who (unbeknownst to them) are in fact using a buggy monkey-patch which breaks the third party package, where the latter (unless broken monkey-wise) is flawless. You've already remarked on the possible performance hit.
Conceptually, a "matrix of numbers with uncertainty" is a different concept from a "matrix of numbers". Subclassing expresses this cleanly, monkey-patching tries to hide it. That's really the root of what's wrong with monkey-patching in general: a covert channel operating through global, hidden means, without clarity and transparency. All the many practical problems descend in a sense from this root conceptual problem.
I strongly urge you to reject monkey-patching in favor of clean solutions such as subclassing.

In general, it's perfectly acceptable to override methods that are ...
Intentionally permit overrides
In a way they document (satisfying LSP won't hurt)
If both conditions are met, then overriding should be safe.

Depends on what you mean with "redefine". Obviously you can use your own version of it, no problem at all. Also you can redefine it by subclassing if it's a method.
You can also make a new method, and patch it into the class, a practice known as monkey_patching. Like so:
from amodule import aclass
def newfunction(self, param):
do_something()
aclass.oldfunction = newfunction
This will make all instances of aclass use your new function instead of the old one, including instances in any "fourth-party" modules. This works and is highly useful, but it's regarded as very ugly and a last resort option. This is because there is nothing in the aclass code to suggest that you have overridden the method, so it's hard to debug. And even worse it gets when two modules monkeypatch the same thing. Then you really get confused.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.