python: what is this funny notation? [0,1,3].__len__() - python

Why would anyone use double underscores?
Why not just do len([1,2,3])?
My question is specifically: what do the underscores mean?

__len__() is the special Python method that is called when you use len().
It's pretty much like str() uses __str__(), repr() uses __repr__(), etc. You can override it in your classes to give len() custom behaviour when used on an instance of your class.
See here: http://docs.python.org/release/2.5.2/ref/sequence-types.html:
__len__(self)
Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn't define a __nonzero__() method and whose __len__() method returns zero is considered to be false in a Boolean context.
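As a rough sketch of the above in Python 3 (where the truth-value hook is spelled __bool__ rather than __nonzero__), the hypothetical Playlist class below is made up purely for illustration. It shows len(), str() and repr() each dispatching to the matching special method, and an empty instance being falsy because __len__() returns zero.

```python
class Playlist:
    """Hypothetical container, used only for illustration."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        # len(playlist) ends up here
        return len(self._tracks)

    def __str__(self):
        # str(playlist) and print(playlist) end up here
        return ", ".join(self._tracks)

    def __repr__(self):
        # repr(playlist) ends up here
        return "Playlist(%r)" % (self._tracks,)


playlist = Playlist(["intro", "theme"])
empty = Playlist([])

print(len(playlist))   # 2
print(str(playlist))   # intro, theme
print(repr(playlist))  # Playlist(['intro', 'theme'])
print(bool(empty))     # False: no __bool__ defined and __len__() returns 0
```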

The one reason I've had to use x.__len__() rather than len(x) is due to the limitation that len can only return an int, whereas __len__ can return anything.
You might quite reasonably argue that only an integer should be returned, but if the length of your object is greater than sys.maxsize then you have no choice except to return a long instead:
>>> class A(object):
...     def __len__(self):
...         return 2**80
...
>>> a = A()
>>> len(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to int
>>> a.__len__()
1208925819614629174706176L
This isn't just a theoretical limitation: I've done a fair bit of work with bit containers where (with 32-bit Python) the len function stops working when the object reaches just 256MB.
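The session above is from Python 2. A minimal sketch of the same limitation on CPython 3 might look like the following; the BitBucket class is hypothetical, and the overflow behaviour shown is CPython-specific.

```python
import sys


class BitBucket:
    """Hypothetical container that claims more items than len() can report."""

    def __len__(self):
        return sys.maxsize + 1


bucket = BitBucket()

# Calling the special method directly performs no range check.
direct = bucket.__len__()

# Going through the built-in triggers the check and fails.
try:
    via_builtin = len(bucket)
except OverflowError as exc:
    via_builtin = exc

print(direct == sys.maxsize + 1)         # True
print(type(via_builtin).__name__)        # OverflowError
```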

If you "absolutely must" call it as a method (rather than a function) then the double underscores are needed. But the only motivation I can think of for the code you show is that somebody's gone so OO-happy that they don't understand the key Python principle that you don't directly call special methods (except you may need to call your superclass's version if you're overriding it, of course) -- you use the appropriate builtin or operator, and it calls (internally) whatever special methods are needed.
Actually, in the code you show the only sensible thing is to just use the constant 3, so maybe you've oversimplified your example a little...?-)

You should read the docs on special method names.

Function and method calls in Python [duplicate]

I know that python has a len() function that is used to determine the size of a string, but I was wondering why it's not a method of the string object?
Strings do have a length method: __len__()
The protocol in Python is to implement this method on objects which have a length and use the built-in len() function, which calls it for you, similar to the way you would implement __iter__() and use the built-in iter() function (or have the method called behind the scenes for you) on objects which are iterable.
See Emulating container types for more information.
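To make the parallel between len()/__len__() and iter()/__iter__() concrete, here is a small Python 3 sketch. The Countdown class is a made-up example, not something from the standard library.

```python
class Countdown:
    """Hypothetical iterable with a known length."""

    def __init__(self, start):
        self.start = start

    def __len__(self):
        # Called by the built-in len()
        return self.start

    def __iter__(self):
        # Called by the built-in iter(), and implicitly by for loops
        return iter(range(self.start, 0, -1))


c = Countdown(3)
print(len(c))            # 3
print(list(iter(c)))     # [3, 2, 1]
print([n for n in c])    # [3, 2, 1], __iter__ is called behind the scenes
```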
Here's a good read on the subject of protocols in Python: Python and the Principle of Least Astonishment
Jim's answer to this question may help; I copy it here. Quoting Guido van Rossum:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
Python is a pragmatic programming language, and the reasons for len() being a function and not a method of str, list, dict etc. are pragmatic.
The len() built-in function deals directly with built-in types: the CPython implementation of len() actually returns the value of the ob_size field in the PyVarObject C struct that represents any variable-sized built-in object in memory. This is much faster than calling a method -- no attribute lookup needs to happen. Getting the number of items in a collection is a common operation and must work efficiently for such basic and diverse types as str, list, array.array etc.
However, to promote consistency, when applying len(o) to a user-defined type, Python calls o.__len__() as a fallback. __len__, __abs__ and all the other special methods documented in the Python Data Model make it easy to create objects that behave like the built-ins, enabling the expressive and highly consistent APIs we call "Pythonic".
By implementing special methods your objects can support iteration, overload infix operators, manage contexts in with blocks etc. You can think of the Data Model as a way of using the Python language itself as a framework where the objects you create can be integrated seamlessly.
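A minimal sketch of that idea, assuming Python 3: the hypothetical Ledger class below (all names invented for illustration) supports len(), iteration, the + operator and the with statement purely by defining special methods.

```python
class Ledger:
    """Hypothetical class; every name here is made up for illustration."""

    def __init__(self):
        self.entries = []
        self.closed = False

    def __len__(self):             # len(ledger)
        return len(self.entries)

    def __iter__(self):            # for entry in ledger
        return iter(self.entries)

    def __add__(self, other):      # ledger + other_ledger
        combined = Ledger()
        combined.entries = self.entries + other.entries
        return combined

    def __enter__(self):           # with ledger as name:
        return self

    def __exit__(self, exc_type, exc, tb):
        self.closed = True
        return False               # do not swallow exceptions


with Ledger() as first:
    first.entries.append(10)

second = Ledger()
second.entries.extend([20, 30])

total = first + second
print(len(total), list(total), first.closed)  # 3 [10, 20, 30] True
```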
A second reason, supported by quotes from Guido van Rossum like this one, is that it is easier to read and write len(s) than s.len().
The notation len(s) is consistent with unary operators with prefix notation, like abs(n). len() is used way more often than abs(), and it deserves to be as easy to write.
There may also be a historical reason: in the ABC language which preceded Python (and was very influential in its design), there was a unary operator written as #s which meant len(s).
There is a len method:
>>> a = 'a string of some length'
>>> a.__len__()
23
>>> a.__len__
<method-wrapper '__len__' of str object at 0x02005650>
met% python -c 'import this' | grep 'only one'
There should be one-- and preferably only one --obvious way to do it.
There are some great answers here, and so before I give my own I'd like to highlight a few of the gems (no ruby pun intended) I've read here.
Python is not a pure OOP language -- it's a general purpose, multi-paradigm language that allows the programmer to use the paradigm they are most comfortable with and/or the paradigm that is best suited for their solution.
Python has first-class functions, so len is actually an object, and that function object has its own attributes and methods that you can inspect by running dir(len). Ruby, on the other hand, doesn't have first-class functions.
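One practical consequence of len being an ordinary object is that it can be passed around like any other value. A short Python 3 sketch, with made-up sample data:

```python
words = ["kiwi", "fig", "banana"]

# len is passed as a value, not called here
by_length = sorted(words, key=len)
lengths = list(map(len, words))

# It can also be bound to another name
measure = len

print(by_length)               # ['fig', 'kiwi', 'banana']
print(lengths)                 # [4, 3, 6]
print(measure("abc"))          # 3
print("__call__" in dir(len))  # True
```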
If you don't like the way this works in your own code, it's trivial for you to re-implement the containers using your preferred method (see example below).
>>> class List(list):
...     def len(self):
...         return len(self)
...
>>> class Dict(dict):
...     def len(self):
...         return len(self)
...
>>> class Tuple(tuple):
...     def len(self):
...         return len(self)
...
>>> class Set(set):
...     def len(self):
...         return len(self)
...
>>> my_list = List([1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'])
>>> my_dict = Dict({'key': 'value', 'site': 'stackoverflow'})
>>> my_set = Set({1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'})
>>> my_tuple = Tuple((1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'))
>>> my_containers = Tuple((my_list, my_dict, my_set, my_tuple))
>>>
>>> for container in my_containers:
...     print(container.len())
...
15
2
15
15
Something missing from the rest of the answers here: the len function checks that the __len__ method returns a non-negative int. The fact that len is a function means that classes cannot override this behaviour to avoid the check. As such, len(obj) gives a level of safety that obj.len() cannot.
Example:
>>> class A:
...     def __len__(self):
...         return 'foo'
...
>>> len(A())
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    len(A())
TypeError: 'str' object cannot be interpreted as an integer
>>> class B:
...     def __len__(self):
...         return -1
...
>>> len(B())
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    len(B())
ValueError: __len__() should return >= 0
Of course, it is possible to "override" the len function by reassigning it as a global variable, but code which does this is much more obviously suspicious than code which overrides a method in a class.
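For illustration only, this is roughly what such a reassignment could look like in Python 3. The unchecked_len helper and the Broken class are hypothetical, and the shadowing is confined to one function so the real built-in stays untouched.

```python
import builtins


class Broken:
    def __len__(self):
        return -1


def unchecked_len(obj):
    # Hypothetical replacement that skips the checks done by the real len()
    return obj.__len__()


def demo():
    len = unchecked_len  # shadows the built-in inside this function only
    return len(Broken())


print(demo())  # -1

try:
    builtins.len(Broken())
except ValueError as exc:
    print("built-in len() still refuses:", exc)
```

The assignment `len = ...` is visible right there in the calling code, which is what makes it easier to spot than a method override buried in a class.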

Why can't I extend bool in Python?

>>> class BOOL(bool):
...     print "why?"
...
why?
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    type 'bool' is not an acceptable base type
I thought Python trusted the programmer.
Guido's take on it:
I thought about this last night, and realized that you shouldn't be allowed to subclass bool at all! A subclass would only be useful when it has instances, but the mere existence of an instance of a subclass of bool would break the invariant that True and False are the only instances of bool! (An instance of a subclass of C is also an instance of C.) I think it's important not to provide a backdoor to create additional bool instances, so I think bool should not be subclassable.
Reference: http://mail.python.org/pipermail/python-dev/2002-March/020822.html
Since the OP mentions in a comment:
I want 1 and 2 to return an instance of my class.
I think it's important to point out that this is entirely impossible: Python does not let you alter built-in types (and, in particular, their special methods). Literal 1 will always be an instance of built-in type int, and in any case the basic semantics of the and operator are not overridable anyway -- a and b is always identical to b if a else a for any a and b (no bool coercion involved, even though the OP appears to mistakenly believe one is happening).
Restating this crucial point: the value of a and b is always, unchangeably either a or b -- there is no way to break this semantic constraint (even if a and b were instances of your own peculiar classes -- even less so of course when they're constrained to be instances of Python's built-in int!-).
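A quick way to convince yourself of this, sketched in Python 3 with a throwaway Thing class and a hypothetical and_equivalent helper that spells out the conditional form:

```python
class Thing:
    """Plain object; instances are truthy by default."""


def and_equivalent(a, b):
    # Same result as `a and b`: one of the two operands, never a new bool
    return b if a else a


a, b = Thing(), Thing()
empty = []

print((a and b) is b)          # True: a is truthy, so the result is b
print((empty and b) is empty)  # True: empty is falsy, so the result is empty
print(1 and 2)                 # 2
print(0 and 2)                 # 0
print(type(1 and 2).__name__)  # int, not bool
```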
If you are using Python 3, and you want to have a class that can be evaluated as a boolean, but also contain other functionality, implement __bool__ in your class.
In Python 2 the same effect can be achieved by implementing __nonzero__ or __len__ (if your class is a container).
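A minimal sketch of such a class follows. The Result class is hypothetical, and the `__nonzero__ = __bool__` alias is one common way to cover both Python versions with a single definition.

```python
class Result:
    """Hypothetical value that is truthy or falsy but carries extra data."""

    def __init__(self, ok, message):
        self.ok = ok
        self.message = message

    def __bool__(self):          # Python 3 hook
        return self.ok

    __nonzero__ = __bool__       # Python 2 spelling of the same hook


success = Result(True, "saved")
failure = Result(False, "disk full")

if not failure:
    print(failure.message)       # disk full

print(bool(success), bool(failure))  # True False
```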
Here is a post that explains the reasoning behind the decision: http://mail.python.org/pipermail/python-dev/2004-February/042537.html
The idea is that bool has a specific purpose - to be True or to be False, and adding to that would only serve to complicate your code elsewhere.
Because bool is supposed to only have two values -- True and False. If you were able to subclass bool, you could define arbitrary numbers of values for it, and that's definitely not what you want to happen.
A better question is: why do you want to extend bool?

Is there any case where len(someObj) does not call someObj's __len__ function?

Is there any case where len(someObj) does not call someObj's __len__ function?
I recently replaced the former with the latter in a (successful) effort to speed up some code. I want to make sure there's not some edge case somewhere where len(someObj) is not the same as someObj.__len__().
If __len__ returns a length over sys.maxsize, len() will raise an exception. This isn't true of calling __len__ directly. (In fact you could return any object from __len__ which won't be caught unless it goes through len().)
What kind of speedup did you see? I can't imagine it was noticeable, was it?
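One way to check is to measure it. The sketch below uses the standard timeit module in Python 3; the list size and iteration count are arbitrary choices, and the timings will vary by machine and interpreter version, so no expected numbers are shown.

```python
import timeit

data = list(range(1000))

# Time 100,000 calls through the built-in and through the special method.
via_builtin = timeit.timeit(lambda: len(data), number=100000)
via_dunder = timeit.timeit(lambda: data.__len__(), number=100000)

print("len(data):       %.4f s" % via_builtin)
print("data.__len__():  %.4f s" % via_dunder)
```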
From http://mail.python.org/pipermail/python-list/2002-May/147079.html
in certain situations there is no difference, but using len() is preferred for a couple reasons.
first, it's not recommended to go calling the __methods__ yourself, they are meant to be used by other parts of python.
len() will work on any type of sequence object (lists, tuples, and all).
__len__ will only work on class instances with a __len__ method.
len() will return a more appropriate exception on objects without length.
I think the answer is that it will always work -- according to the Python docs:
__len__(self):
Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn't define a __nonzero__() method and whose __len__() method returns zero is considered to be false in a Boolean context.
There are cases where len(someObj) is not the same as someObj.__len__() since len() validates __len__()'s return value. Here are the possible errors in Python 3.6.9:
Too low, i.e. less than 0:
    ValueError: __len__() should return >= 0
Too high, i.e. greater than sys.maxsize (CPython-specific, per the docs):
    OverflowError: cannot fit 'int' into an index-sized integer
An invalid type, e.g. float:
    TypeError: 'float' object cannot be interpreted as an integer
Missing, e.g. len(object):
    TypeError: object of type 'type' has no len()
I mention this because object.__len__() raises a different exception, AttributeError.
It's also worth noting that range(sys.maxsize+1) is valid, but its __len__() raises an exception:
OverflowError: Python int too large to convert to C ssize_t
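The too-low, invalid-type and missing cases can be reproduced with a short script. This is a sketch for CPython 3; the Negative and Fractional classes and the outcome helper are invented for the demonstration.

```python
class Negative:
    def __len__(self):
        return -1


class Fractional:
    def __len__(self):
        return 1.5


def outcome(func, *args):
    """Return func(*args), or the name of the exception it raised."""
    try:
        return func(*args)
    except Exception as exc:
        return type(exc).__name__


print(outcome(len, Negative()))         # ValueError
print(outcome(len, Fractional()))       # TypeError
print(outcome(len, object))             # TypeError
print(outcome(lambda: object.__len__))  # AttributeError
print(outcome(Negative().__len__))      # -1, no validation on the direct call
```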
According to Mark Pilgrim, it looks like the answer is no: len(someObj) is the same as someObj.__len__().
Cheers!

Why isn't the 'len' function inherited by dictionaries and lists in Python

example:
a_list = [1, 2, 3]
a_list.len() # doesn't work
len(a_list) # works
Python being (very) object oriented, I don't understand why the 'len' function isn't inherited by the object.
Plus I keep trying the wrong solution since it appears as the logical one to me
Guido's explanation is here:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
The short answer: 1) backwards compatibility and 2) there's not enough of a difference for it to really matter. For a more detailed explanation, read on.
The idiomatic Python approach to such operations is special methods which aren't intended to be called directly. For example, to make x + y work for your own class, you write an __add__ method. To make sure that int(spam) properly converts your custom class, write an __int__ method. To make sure that len(foo) does something sensible, write a __len__ method.
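Put together, the three hooks named above might look like this in Python 3. The Spam class and its cans attribute are hypothetical names chosen to match the paragraph.

```python
class Spam:
    """Hypothetical class whose 'size' is a number of cans."""

    def __init__(self, cans):
        self.cans = cans

    def __add__(self, other):   # x + y
        return Spam(self.cans + other.cans)

    def __int__(self):          # int(spam)
        return self.cans

    def __len__(self):          # len(spam)
        return self.cans


x, y = Spam(2), Spam(3)
combined = x + y

print(int(combined))  # 5
print(len(combined))  # 5
```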
This is how things have always been with Python, and I think it makes a lot of sense for some things. In particular, this seems like a sensible way to implement operator overloading. As for the rest, different languages disagree; in Ruby you'd convert something to an integer by calling spam.to_i directly instead of saying int(spam).
You're right that Python is an extremely object-oriented language and that having to call an external function on an object to get its length seems odd. On the other hand, len(silly_walks) isn't any more onerous than silly_walks.len(), and Guido has said that he actually prefers it (http://mail.python.org/pipermail/python-3000/2006-November/004643.html).
It just isn't.
You can, however, do:
>>> [1,2,3].__len__()
3
Adding a __len__() method to a class is what makes the len() magic work.
This way fits in better with the rest of the language. The convention in python is that you add __foo__ special methods to objects to make them have certain capabilities (rather than e.g. deriving from a specific base class). For example, an object is
callable if it has a __call__ method,
iterable if it has an __iter__ method,
accessible with [] if it has __getitem__ and __setitem__,
...
One of these special methods is __len__ which makes it have a length accessible with len().
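The capabilities listed above can be sketched in a single Python 3 class. Squares is a made-up example; callable() and collections.abc.Iterable are standard tools used here only to confirm that each capability was picked up without inheriting from any special base class.

```python
from collections.abc import Iterable


class Squares:
    """Hypothetical class that gains abilities purely from special methods."""

    def __init__(self, n):
        self._values = [i * i for i in range(n)]

    def __call__(self, i):            # squares(i)
        return self._values[i]

    def __iter__(self):               # for v in squares
        return iter(self._values)

    def __getitem__(self, i):         # squares[i]
        return self._values[i]

    def __setitem__(self, i, value):  # squares[i] = value
        self._values[i] = value

    def __len__(self):                # len(squares)
        return len(self._values)


sq = Squares(4)
sq[0] = 100

print(callable(sq))              # True
print(isinstance(sq, Iterable))  # True
print(sq(2), sq[3], len(sq))     # 4 9 4
print(list(sq))                  # [100, 1, 4, 9]
```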
Maybe you're looking for __len__. If that method exists, then len(a) calls it:
>>> class Spam:
...     def __len__(self): return 3
...
>>> s = Spam()
>>> len(s)
3
Well, there actually is a length method, it is just hidden:
>>> a_list = [1, 2, 3]
>>> a_list.__len__()
3
The len() built-in function appears to be simply a wrapper for a call to the hidden __len__() method of the object.
Not sure why they made the decision to implement things this way though.
There is some good info at the link below on why certain things are functions and others are methods. It does indeed cause some inconsistencies in the language.
http://mail.python.org/pipermail/python-dev/2008-January/076612.html
