I know that Python has a len() function that is used to determine the size of a string, but I was wondering why it's not a method of the string object?
Strings do have a length method: __len__()
The protocol in Python is to implement this method on objects which have a length and use the built-in len() function, which calls it for you, similar to the way you would implement __iter__() and use the built-in iter() function (or have the method called behind the scenes for you) on objects which are iterable.
See Emulating container types for more information.
Here's a good read on the subject of protocols in Python: Python and the Principle of Least Astonishment
Jim's answer to this question may help; I copy it here. Quoting Guido van Rossum:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
Python is a pragmatic programming language, and the reasons for len() being a function and not a method of str, list, dict etc. are pragmatic.
The len() built-in function deals directly with built-in types: the CPython implementation of len() actually returns the value of the ob_size field in the PyVarObject C struct that represents any variable-sized built-in object in memory. This is much faster than calling a method -- no attribute lookup needs to happen. Getting the number of items in a collection is a common operation and must work efficiently for such basic and diverse types as str, list, array.array etc.
However, to promote consistency, when applying len(o) to a user-defined type, Python calls o.__len__() as a fallback. __len__, __abs__ and all the other special methods documented in the Python Data Model make it easy to create objects that behave like the built-ins, enabling the expressive and highly consistent APIs we call "Pythonic".
By implementing special methods your objects can support iteration, overload infix operators, manage contexts in with blocks etc. You can think of the Data Model as a way of using the Python language itself as a framework where the objects you create can be integrated seamlessly.
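As a rough illustration of the paragraph above, here is a minimal sketch of a user-defined container that plugs into len(), iteration, the + operator and the with statement purely through special methods. The class name Playlist and its attributes are made up for this example and are not taken from any library.

```python
class Playlist:
    """Hypothetical container used only to illustrate special methods."""

    def __init__(self, tracks=()):
        self._tracks = list(tracks)
        self.is_open = False

    def __len__(self):
        # Called by the built-in len()
        return len(self._tracks)

    def __iter__(self):
        # Called by iter() and by for loops
        return iter(self._tracks)

    def __add__(self, other):
        # Called for the infix expression: playlist + other
        return Playlist(self._tracks + list(other))

    def __enter__(self):
        # Called on entry to a with block
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc, tb):
        # Called on exit from the with block; returning False re-raises errors
        self.is_open = False
        return False


a = Playlist(["intro", "verse"])
b = Playlist(["chorus"])
combined = a + b

with combined as p:
    names = [track.upper() for track in p]
    opened_inside = p.is_open
```

None of the calling code mentions Playlist-specific method names; it only uses the built-ins and syntax that the special methods hook into.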
A second reason, supported by quotes from Guido van Rossum like this one, is that it is easier to read and write len(s) than s.len().
The notation len(s) is consistent with unary operators with prefix notation, like abs(n). len() is used way more often than abs(), and it deserves to be as easy to write.
There may also be a historical reason: in the ABC language which preceded Python (and was very influential in its design), there was a unary operator written as #s which meant len(s).
There is a len method:
>>> a = 'a string of some length'
>>> a.__len__()
23
>>> a.__len__
<method-wrapper '__len__' of str object at 0x02005650>
met% python -c 'import this' | grep 'only one'
There should be one-- and preferably only one --obvious way to do it.
There are some great answers here, and so before I give my own I'd like to highlight a few of the gems (no ruby pun intended) I've read here.
Python is not a pure OOP language -- it's a general purpose, multi-paradigm language that allows the programmer to use the paradigm they are most comfortable with and/or the paradigm that is best suited for their solution.
Python has first-class functions, so len is actually an object with its own methods, which you can inspect by running dir(len). Ruby, on the other hand, doesn't have first-class functions.
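To make the "len is an object" point concrete, here is a small sketch (the variable names are illustrative) that passes len around like any other value:

```python
words = ["pear", "fig", "banana"]

# len can be handed to other functions as an argument
by_length = sorted(words, key=len)
lengths = list(map(len, words))

# and it can be bound to another name like any other object
size_of = len
```

This kind of use would need a wrapper such as lambda s: s.len() if length were only available as a method.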
If you don't like the way this works in your own code, it's trivial for you to re-implement the containers using your preferred method (see example below).
>>> class List(list):
...     def len(self):
...         return len(self)
...
>>> class Dict(dict):
...     def len(self):
...         return len(self)
...
>>> class Tuple(tuple):
...     def len(self):
...         return len(self)
...
>>> class Set(set):
...     def len(self):
...         return len(self)
...
>>> my_list = List([1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'])
>>> my_dict = Dict({'key': 'value', 'site': 'stackoverflow'})
>>> my_set = Set({1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'})
>>> my_tuple = Tuple((1,2,3,4,5,6,7,8,9,'A','B','C','D','E','F'))
>>> my_containers = Tuple((my_list, my_dict, my_set, my_tuple))
>>>
>>> for container in my_containers:
...     print(container.len())
...
15
2
15
15
Something missing from the rest of the answers here: the len function checks that the __len__ method returns a non-negative int. The fact that len is a function means that classes cannot override this behaviour to avoid the check. As such, len(obj) gives a level of safety that obj.len() cannot.
Example:
>>> class A:
...     def __len__(self):
...         return 'foo'
...
>>> len(A())
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    len(A())
TypeError: 'str' object cannot be interpreted as an integer
>>> class B:
...     def __len__(self):
...         return -1
...
>>> len(B())
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    len(B())
ValueError: __len__() should return >= 0
Of course, it is possible to "override" the len function by reassigning it as a global variable, but code which does this is much more obviously suspicious than code which overrides a method in a class.
In Python, len is a function to get the length of a collection by calling an object's __len__ method:
def len(x):
    return x.__len__()
So I would expect a direct call of __len__() to be at least as fast as len().
import timeit
setup = '''
'''
print (timeit.Timer('a="12345"; x=a.__len__()', setup=setup).repeat(10))
print (timeit.Timer('a="12345"; x=len(a)', setup=setup).repeat(10))
But the results of testing with the above code show len() to be faster. Why?
The builtin len() function does not look up the .__len__ attribute. It looks up the tp_as_sequence pointer, which in turn has a sq_length attribute.
The .__len__ attribute on built-in objects is indirectly mapped to the same slot, and it is that indirection (plus the attribute lookup) that takes more time.
For Python-defined classes, the type object looks up the .__len__ method when the sq_length is requested.
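One way to observe that len() goes through the type rather than through ordinary attribute lookup is the sketch below. The class name Box is hypothetical; the behaviour shown is how CPython 3 treats implicit special-method lookup.

```python
class Box:
    def __len__(self):
        return 3


box = Box()
# Attach a different __len__ to the instance only (not to the class)
box.__len__ = lambda: 99

via_builtin = len(box)          # resolved through type(box), so Box.__len__ runs
via_attribute = box.__len__()   # ordinary lookup finds the instance attribute
```

The two results differ, which is only possible because len(box) never consults the instance's own attributes.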
From the excellent book Python Object-Oriented Programming: Build robust and maintainable object-oriented Python applications and libraries, 4th Edition, by Steven F. Lott and Dusty Phillips:
You may wonder why these objects don't have a length property instead of having to call a function on them. Technically, they do. Most objects that len() will apply to have a method called __len__() that returns the same value. So len(myobj) seems to call myobj.__len__().
Why should we use the len() function instead of the __len__() method? Obviously, __len__() is a special double-underscore method, suggesting that we shouldn't call it directly. There must be an explanation for this. The Python developers don't make such design decisions lightly.
The main reason is efficiency. When we call the __len__() method of an object, the object has to look the method up in its namespace, and, if the special __getattribute__() method (which is called every time an attribute or method on an object is accessed) is defined on that object, it has to be called as well. Furthermore, the __getattribute__() method may have been written to do something clever, for example, refusing to give us access to special methods such as __len__()! The len() function doesn't encounter any of this. It actually calls the __len__() method on the underlying class, so len(myobj) maps to MyObj.__len__(myobj).
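The __getattribute__() scenario from the quote can be tried out with a short sketch. Guarded is a made-up class that refuses ordinary access to __len__; the example assumes Python 3.

```python
class Guarded:
    """Hypothetical class that refuses ordinary access to __len__."""

    def __len__(self):
        return 5

    def __getattribute__(self, name):
        if name == "__len__":
            raise AttributeError("access to __len__ refused")
        return object.__getattribute__(self, name)


g = Guarded()
via_builtin = len(g)  # looks __len__ up on the class, skipping __getattribute__

try:
    g.__len__()
    explicit_call_failed = False
except AttributeError:
    explicit_call_failed = True
```

len(g) still works, while the explicit g.__len__() call is intercepted and fails.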
__len__ is slower than len(), because __len__ involves a dict lookup.
Why would anyone use double underscores?
Why not just do len([1,2,3])?
My question is specifically: what do the underscores mean?
__len__() is the special Python method that is called when you use len().
It's pretty much like str() uses __str__(), repr uses __repr__(), etc. You can overload it in your classes to give len() a custom behaviour when used on an instance of your class.
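A minimal sketch of that correspondence, using a hypothetical Deck class (the names and string formats are illustrative only):

```python
class Deck:
    """Hypothetical class showing which built-in calls which special method."""

    def __init__(self, cards):
        self.cards = list(cards)

    def __len__(self):
        # Used by len(deck)
        return len(self.cards)

    def __str__(self):
        # Used by str(deck) and print(deck)
        return "Deck of %d cards" % len(self.cards)

    def __repr__(self):
        # Used by repr(deck) and by the interactive prompt
        return "Deck(%r)" % (self.cards,)


d = Deck(["A", "K"])
```

Here len(d), str(d) and repr(d) each dispatch to the matching double-underscore method without the caller ever naming it.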
See here: http://docs.python.org/release/2.5.2/ref/sequence-types.html:
__len__(self)
Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn't define a __nonzero__() method and whose __len__() method returns zero is considered to be false in a Boolean context.
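The truth-testing rule in that last sentence can be seen in a short sketch. Queue is a hypothetical class; note that in Python 3 the method is spelled __bool__() rather than __nonzero__(), and this class defines neither.

```python
class Queue:
    """Hypothetical container that defines __len__ but no truth-value method."""

    def __init__(self, items=()):
        self.items = list(items)

    def __len__(self):
        return len(self.items)


empty_is_truthy = bool(Queue())        # __len__ returns 0, so this is False
full_is_truthy = bool(Queue([1, 2]))   # __len__ returns 2, so this is True
```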
The one reason I've had to use x.__len__() rather than len(x) is due to the limitation that len can only return an int, whereas __len__ can return anything.
You might quite reasonably argue that only an integer should be returned, but if the length of your object is greater than sys.maxsize then you have no choice except to return a long instead:
>>> class A(object):
...     def __len__(self):
...         return 2**80
...
>>> a = A()
>>> len(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: long int too large to convert to int
>>> a.__len__()
1208925819614629174706176L
This isn't just a theoretical limitation, I've done a fair bit of work with bit containers where (with 32-bit Python) the len function stops working when the object reaches just 256MB.
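The transcript above is from Python 2. As far as I can tell the same limit applies in CPython 3, although the error message is worded differently, so the sketch below only checks the exception type. Huge is a hypothetical class name.

```python
import sys


class Huge:
    def __len__(self):
        return 2 ** 80  # larger than sys.maxsize on 32- and 64-bit builds


h = Huge()

try:
    len(h)
    len_overflowed = False
except OverflowError:
    len_overflowed = True

direct = h.__len__()  # the plain method call is not range-checked
```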
If you "absolutely must" call it as a method (rather than a function) then the double underscores are needed. But the only motivation I can think of for the code you show is that somebody's gone so OO-happy that they don't understand the key Python principle that you don't directly call special methods (except you may need to call your superclass's version if you're overriding it, of course) -- you use the appropriate builtin or operator, and it calls (internally) whatever special methods are needed.
Actually, in the code you show the only sensible thing is to just use the constant 3, so maybe you've oversimplified your example a little...?-)
You should read the docs on special method names.
example:
a_list = [1, 2, 3]
a_list.len() # doesn't work
len(a_list) # works
Python being (very) object oriented, I don't understand why the 'len' function isn't inherited by the object.
Plus, I keep trying the wrong solution, since it appears to be the logical one to me.
Guido's explanation is here:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
The short answer: 1) backwards compatibility and 2) there's not enough of a difference for it to really matter. For a more detailed explanation, read on.
The idiomatic Python approach to such operations is special methods which aren't intended to be called directly. For example, to make x + y work for your own class, you write an __add__ method. To make sure that int(spam) properly converts your custom class, write an __int__ method. To make sure that len(foo) does something sensible, write a __len__ method.
This is how things have always been with Python, and I think it makes a lot of sense for some things. In particular, this seems like a sensible way to implement operator overloading. As for the rest, different languages disagree; in Ruby you'd convert something to an integer by calling spam.to_i directly instead of saying int(spam).
You're right that Python is an extremely object-oriented language and that having to call an external function on an object to get its length seems odd. On the other hand, len(silly_walks) isn't any more onerous than silly_walks.len(), and Guido has said that he actually prefers it (http://mail.python.org/pipermail/python-3000/2006-November/004643.html).
It just isn't.
You can, however, do:
>>> [1,2,3].__len__()
3
Adding a __len__() method to a class is what makes the len() magic work.
This way fits in better with the rest of the language. The convention in python is that you add __foo__ special methods to objects to make them have certain capabilities (rather than e.g. deriving from a specific base class). For example, an object is
callable if it has a __call__ method
iterable if it has an __iter__ method,
supports access with [] if it has __getitem__ and __setitem__.
...
One of these special methods is __len__ which makes it have a length accessible with len().
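A compact sketch of the capabilities listed above, all on one hypothetical class (Shelf and its behaviour are invented for illustration):

```python
class Shelf:
    """Hypothetical class gaining capabilities purely via special methods."""

    def __init__(self):
        self._slots = {}

    def __call__(self, label):
        # Makes instances callable: shelf("x")
        return "shelf says: " + label

    def __iter__(self):
        # Makes instances iterable: for key in shelf
        return iter(sorted(self._slots))

    def __getitem__(self, key):
        # Supports reading with shelf[key]
        return self._slots[key]

    def __setitem__(self, key, value):
        # Supports assignment with shelf[key] = value
        self._slots[key] = value

    def __len__(self):
        # Supports len(shelf)
        return len(self._slots)


shelf = Shelf()
shelf["b"] = 2
shelf["a"] = 1
```

Shelf does not derive from any particular base class; each capability comes from the presence of the corresponding method.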
Maybe you're looking for __len__. If that method exists, then len(a) calls it:
>>> class Spam:
...     def __len__(self): return 3
...
>>> s = Spam()
>>> len(s)
3
Well, there actually is a length method; it is just hidden:
>>> a_list = [1, 2, 3]
>>> a_list.__len__()
3
The len() built-in function appears to be simply a wrapper for a call to the hidden __len__() method of the object.
Not sure why they made the decision to implement things this way though.
There is some good info at the link below on why certain things are functions and others are methods. It does indeed cause some inconsistencies in the language.
http://mail.python.org/pipermail/python-dev/2008-January/076612.html
I read somewhere that functions should always return only one type, so the following code is considered bad code:
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return None
I guess the better solution would be
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return ()
Wouldn't it be cheaper memory-wise to return None than to create a new empty tuple, or is this difference too small to notice even in larger projects?
Why should functions return values of a consistent type? To meet the following two rules.
Rule 1 -- a function has a "type" -- inputs mapped to outputs. It must return a consistent type of result, or it isn't a function. It's a mess.
Mathematically, we say some function, F, is a mapping from domain, D, to range, R. F: D -> R. The domain and range form the "type" of the function. The input types and the result type are as essential to the definition of the function as is the name or the body.
Rule 2 -- when you have a "problem" or can't return a proper result, raise an exception.
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    raise Exception("oh, dear me.")
You can break the above rules, but the cost of long-term maintainability and comprehensibility is astronomical.
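A sketch of how Rule 2 looks from the caller's side. Using ValueError instead of the bare Exception shown above is my own choice for the illustration, not something the answer prescribes.

```python
def x(foo):
    """Return (foo, 'bar') or raise; this version never returns None."""
    if 'bar' in foo:
        return (foo, 'bar')
    # ValueError is an assumption for this sketch; any specific exception works
    raise ValueError("'bar' not found in %r" % (foo,))


found = x(['foo', 'bar'])

try:
    x(['foo'])
    raised = False
except ValueError:
    raised = True
```

The success path always yields a two-element tuple, and the failure path is handled separately in the except clause.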
"Wouldn't it be cheaper memory wise to return a None?" Wrong question.
The point is not to optimize memory at the cost of clear, readable, obvious code.
It's not so clear that a function must always return objects of a limited type, or that returning None is wrong. For instance, re.search can return a _sre.SRE_Match object or a NoneType object:
import re
match=re.search('a','a')
type(match)
# <type '_sre.SRE_Match'>
match=re.search('a','b')
type(match)
# <type 'NoneType'>
Designed this way, you can test for a match with the idiom
if match:
    # do xyz
If the developers had required re.search to return a _sre.SRE_Match object, then the idiom would have to change to
if match.group(1) is None:
    # do xyz
There would not be any major gain by requiring re.search to always return a _sre.SRE_Match object.
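For reference, a runnable sketch of the if match: idiom on Python 3. The helper name first_number and the pattern are hypothetical.

```python
import re


def first_number(text):
    """Hypothetical helper: return the first run of digits, or None."""
    match = re.search(r'\d+', text)
    if match:                # a match object is truthy, None is falsy
        return match.group(0)
    return None


hit = first_number('order 66 confirmed')
miss = first_number('no digits here')
```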
So I think how you design the function must depend on the situation and in particular, how you plan to use the function.
Also note that both _sre.SRE_Match and NoneType are instances of object, so in a broad sense they are of the same type. So the rule that "functions should always return only one type" is rather meaningless.
Having said that, there is a beautiful simplicity to functions that return objects which all share the same properties. (Duck typing, not static typing, is the Python way!) It can allow you to chain together functions, as in foo(bar(baz)), and know with certainty the type of object you'll receive at the other end.
This can help you check the correctness of your code. By requiring that a function returns only objects of a certain limited type, there are fewer cases to check. "foo always returns an integer, so as long as an integer is expected everywhere I use foo, I'm golden..."
Best practice in what a function should return varies greatly from language to language, and even between different Python projects.
For Python in general, I agree with the premise that returning None is bad if your function generally returns an iterable, because iterating without testing becomes impossible. Just return an empty iterable in this case; it will still test False if you use Python's standard truth testing:
ret_val = x()
if ret_val:
    do_stuff(ret_val)
and still allow you to iterate over it without testing:
for child in x():
    do_other_stuff(child)
For functions that are likely to return a single value, I think returning None is perfectly acceptable; just document in your docstring that this might happen.
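A self-contained sketch of the empty-iterable approach for the iterable case. The function children_of and the tree dictionary are hypothetical.

```python
def children_of(node, tree):
    """Hypothetical lookup: always returns a tuple, possibly empty."""
    return tuple(tree.get(node, ()))


tree = {'root': ['a', 'b']}

visited = []
for child in children_of('root', tree):
    visited.append(child)
for child in children_of('a', tree):   # no children: the loop body is skipped
    visited.append(child)

has_children = bool(children_of('a', tree))
```

Callers can loop over the result or truth-test it without a separate None check.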
Here are my thoughts on all that and I'll try to also explain why I think that the accepted answer is mostly incorrect.
First of all programming functions != mathematical functions. The closest you can get to mathematical functions is if you do functional programming but even then there are plenty of examples that say otherwise.
Functions do not have to have input
Functions do not have to have output
Functions do not have to map input to output (because of the previous two bullet points)
A function in terms of programming is to be viewed simply as a block of memory with a start (the function's entry point), a body (empty or otherwise) and exit point (one or multiple depending on the implementation) all of which are there for the purpose of reusing code that you've written. Even if you don't see it a function always "returns" something. This something is actually the address of next statement right after the function call. This is something you will see in all of its glory if you do some really low-level programming with an Assembly language (I dare you to go the extra mile and do some machine code by hand like Linus Torvalds who ever so often mentions this during his seminars and interviews :D). In addition you can also take some input and also spit out some output. That is why
def foo():
    pass
is a perfectly correct piece of code.
So why would returning multiple types be bad? Well...It isn't at all unless you abuse it. This is of course a matter of poor programming skills and/or not knowing what the language you're using can do.
Wouldn't it be cheaper memory-wise to return None than to create a new empty tuple, or is this difference too small to notice even in larger projects?
As far as I know - yes, returning a NoneType object would be much cheaper memory-wise. Here is a small experiment (returned values are bytes):
>>> import sys
>>> sys.getsizeof(None)
16
>>> sys.getsizeof(())
48
Based on the type of object you are using as your return value (numeric type, list, dictionary, tuple etc.) Python manages the memory in different ways including the initially reserved storage.
However you have to also consider the code that is around the function call and how it handles whatever your function returns. Do you check for NoneType? Or do you simply check if the returned tuple has length of 0? This propagation of the returned value and its type (NoneType vs. empty tuple in your case) might actually be more tedious to handle and blow up in your face. Don't forget - the code itself is loaded into memory so if handling the NoneType requires too much code (even small pieces of code but in a large quantity) better leave the empty tuple, which will also avoid confusion in the minds of people using your function and forgetting that it actually returns 2 types of values.
Speaking of returning multiple types of value, this is the part where I agree with the accepted answer (but only partially): returning a single type makes the code more maintainable without a doubt. It's much easier to check only for type A than for A, B, C, etc.
However, Python is an object-oriented language, and as such inheritance, abstract classes and the rest of the OOP shenanigans come into play. It can go as far as generating classes on the fly, which I discovered a few months ago and was stunned by (I had never seen that in C/C++).
Side note: You can read a little bit about metaclasses and dynamic classes in this nice overview article with plenty of examples.
There are in fact multiple design patterns and techniques that wouldn't even exist without the so-called polymorphic functions. Below I give you two very popular topics (can't find a better way to summarize both in a single term):
Duck typing - often part of the dynamic typing languages which Python is a representative of
Factory method design pattern - basically it's a function that returns various objects based on the input it receives.
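Both ideas can be sketched together. Everything here (JsonWriter, LinesWriter, make_writer, render) is invented for illustration and is only one of many ways to arrange a factory.

```python
import io
import json


class JsonWriter:
    def write(self, data, stream):
        stream.write(json.dumps(data, sort_keys=True))


class LinesWriter:
    def write(self, data, stream):
        for key in sorted(data):
            stream.write("%s=%s\n" % (key, data[key]))


def make_writer(fmt):
    """Factory: the concrete return type depends on the argument."""
    writers = {'json': JsonWriter, 'lines': LinesWriter}
    return writers[fmt]()


def render(fmt, data):
    # Duck typing: any object with a compatible write() method will do
    stream = io.StringIO()
    make_writer(fmt).write(data, stream)
    return stream.getvalue()
```

make_writer returns objects of different classes, yet render never checks which one it received; it relies only on the shared write() method.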
Finally whether your function returns one or multiple types is totally based on the problem you have to solve. Can this polymorphic behaviour be abused? Sure, like everything else.
I personally think it is perfectly fine for a function to return a tuple or None. However, a function should return at most 2 different types and the second one should be a None. A function should never return a string and list for example.
If x is called like this
foo, bar = x(foo)
returning None would result in a
TypeError: 'NoneType' object is not iterable
if 'bar' is not in foo.
Example
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return None

foo, bar = x(["foo", "bar", "baz"])
print foo, bar

foo, bar = x(["foo", "NOT THERE", "baz"])
print foo, bar
This results in:
['foo', 'bar', 'baz'] bar
Traceback (most recent call last):
  File "f.py", line 9, in <module>
    foo, bar = x(["foo", "NOT THERE", "baz"])
TypeError: 'NoneType' object is not iterable
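If the function does keep returning None, the caller has to test the result before unpacking it. A minimal sketch, with describe as a hypothetical caller:

```python
def x(foo):
    if 'bar' in foo:
        return (foo, 'bar')
    return None


def describe(items):
    result = x(items)
    if result is None:          # must be checked before unpacking
        return 'no bar'
    foo, bar = result
    return '%d items, found %s' % (len(foo), bar)
```

Every call site that unpacks the result needs the same guard, which is the maintenance cost of the mixed return type.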
Premature optimization is the root of all evil. The minuscule efficiency gains might be important, but not until you've proven that you need them.
Whatever your language: a function is defined once, but tends to be used at any number of places. Having a consistent return type (not to mention documented pre- and postconditions) means you have to spend more effort defining the function, but you simplify the usage of the function enormously. Guess whether the one-time costs tend to outweigh the repeated savings...?