Why is Python's 'len' function faster than the __len__ method?

In Python, len is a function to get the length of a collection by calling an object's __len__ method:
def len(x):
    return x.__len__()
So I would expect a direct call to __len__() to be at least as fast as len().
import timeit

print(timeit.Timer('a="12345"; x=a.__len__()').repeat(10))
print(timeit.Timer('a="12345"; x=len(a)').repeat(10))
But the results of testing with the above code show len() to be faster. Why?

The builtin len() function does not look up the .__len__ attribute. It looks up the type's tp_as_sequence pointer, which in turn holds a sq_length slot.
The .__len__ attribute on built-in objects is indirectly mapped to that same slot, and it is that indirection (plus the attribute lookup) that takes more time.
For Python-defined classes, the type object looks up the .__len__ method when sq_length is requested.
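A quick way to see the slot behavior (Sized is a made-up class for illustration): len() consults the type, so an instance-level __len__ attribute is ignored, while ordinary attribute access finds it.
class Sized:
    def __len__(self):
        return 3

s = Sized()
s.__len__ = lambda: 99   # instance attribute; only ordinary lookup sees it
print(s.__len__())       # 99: attribute lookup finds the instance attribute
print(len(s))            # 3:  len() goes through the slot on type(s)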

From the excellent book Python Object-Oriented Programming: Build robust and maintainable object-oriented Python applications and libraries, 4th Edition, by Steven F. Lott and Dusty Phillips:
You may wonder why these objects don't have a length property instead of having to call a function on them. Technically, they do. Most objects that len() will apply to have a method called __len__() that returns the same value. So len(myobj) seems to call myobj.__len__().
Why should we use the len() function instead of the __len__() method? Obviously, __len__() is a special double-underscore method, suggesting that we shouldn't call it directly. There must be an explanation for this. The Python developers don't make such design decisions lightly.
The main reason is efficiency. When we call the __len__() method of an object, the object has to look the method up in its namespace, and, if the special __getattribute__() method (which is called every time an attribute or method on an object is accessed) is defined on that object, it has to be called as well. Furthermore, the __getattribute__() method may have been written to do something clever, for example, refusing to give us access to special methods such as __len__()! The len() function doesn't encounter any of this. It actually calls the __len__() method on the underlying class, so len(myobj) maps to MyObj.__len__(myobj).
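As a small demonstration of that point (not from the book), here is a class whose __getattribute__ logs every lookup: the explicit method call passes through it, while len() does not.
class Loud(list):
    def __getattribute__(self, name):
        print(f"looking up {name!r}")
        return super().__getattribute__(name)

x = Loud([1, 2, 3])
print(x.__len__())   # prints "looking up '__len__'", then 3
print(len(x))        # just 3: len() goes straight to the type's slot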

__len__ is slower than len(), because __len__ involves a dict lookup.

Related

Why does eval() not find the function?

def __remove_client(self, parameters):
    try:
        client = self.__client_service.remove_client_by_id(int(parameters[0]))
        FunctionsManager.add_undo_operation([self.__client_service, self.__rental_service],
                                            UndoHandler.delete_client_entry, [client[0], client[1]])
        FunctionsManager.add_redo_operation(eval('self.__add_new_client(client[0].id,client[0].name)'))
And this gives me: 'UI' object has no attribute '__add_new_client'
What should I do? Or is there another way of adding that function to my redo stack without calling it at the same time?
According to the docs on Private methods:
Notice that code passed to exec() or eval() does not consider the classname of the invoking class to be the current class; this is similar to the effect of the global statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to getattr(), setattr() and delattr(), as well as when referencing __dict__ directly.
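The mechanism behind this is name mangling: inside a class body the compiler rewrites __name into _ClassName__name, and code compiled outside the class (including strings passed to eval()) does not get that rewrite. A minimal illustration, with UI as a stand-in for the asker's class:
class UI:
    def __add_new_client(self, client_id, name):
        return (client_id, name)

ui = UI()
# ui.__add_new_client(1, 'Ann')          # AttributeError: no attribute '__add_new_client'
print(ui._UI__add_new_client(1, 'Ann'))  # the mangled name works: (1, 'Ann')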
As for why your eval() is pointless, this:
eval('self.__add_new_client(client[0].id,client[0].name)')
is exactly equivalent to running the code:
self.__add_new_client(client[0].id,client[0].name)
directly. It seems like maybe you were hoping for some kind of delayed, lazy evaluation, but that's not how it works. Perhaps you wanted to pass a partial evaluation of that method, such as:
from functools import partial
FunctionsManager.add_redo_operation(partial(self.__add_new_client, client[0].id, client[0].name))
If this is your own code, you shouldn't actually use the double-underscore methods unless you know exactly what you're doing. There is generally no good reason to use this (I think Guido has even regretted the feature in the past). It's mostly just useful in the special case described in the docs, where you might intend a subclass to override a special method and you want to keep a "private" copy of that method that cannot be overridden.
Otherwise just use the single _ convention for internal attributes and methods.

Python intercept method call

Let me start by saying what I would like to do. I want to create a lazy wrapper for a variable, as in I record all the method calls and operator calls and evaluate them later when I specify the variable to call it on.
As such, I want to be able to intercept all the method calls, operator calls, and special methods so that I can work on them. However, __getattr__ doesn't intercept operator calls or __str__ and such, so I want to know whether there is a generic way to overload all method calls, or whether I should just dynamically create a class and duplicate the code for all of it (which I already did, but it's ugly).
It can be done, but yes, it becomes "ugly" - I wrote a lazy decorator once that turns any function into a "lazily computed function".
Basically, I found out that the only moment an object's value is actually used in Python is when one of the special "dunder" methods is called. For example, when you have a number, its value is only used when you are either using it in another operation or converting it to a string for IO (which also uses a "dunder" method).
So my wrapper annotates the parameters to a function call and returns a special object, which has potentially all of the "dunder" methods. Only when one of those methods is called is the original function called - and its return value is then cached for further use.
The implementation is here:
https://bitbucket.org/jsbueno/metapython/src/510a7d125b24/lazy_decorator.py
Sorry for the text and most of the presentation being in Portuguese.
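Here is a minimal sketch of the technique (independent of the linked code; all names are invented): since Python looks special methods up on the type rather than the instance, the proxy has to grow its dunder methods at class level.
class Lazy:
    """Defers calling func(*args, **kwargs) until a dunder method needs the value."""
    def __init__(self, func, *args, **kwargs):
        self._func, self._args, self._kwargs = func, args, kwargs
        self._cache, self._done = None, False

    def _force(self):
        if not self._done:
            self._cache = self._func(*self._args, **self._kwargs)
            self._done = True
        return self._cache

def _delegate(name):
    def method(self, *args, **kwargs):
        return getattr(self._force(), name)(*args, **kwargs)
    return method

# Special methods are looked up on the type, never via __getattr__,
# so they must be created on the class itself.
for _name in ('__str__', '__len__', '__add__', '__mul__', '__getitem__'):
    setattr(Lazy, _name, _delegate(_name))

numbers = Lazy(list, range(5))   # nothing computed yet
print(len(numbers))              # forces list(range(5)): prints 5
print(numbers[2])                # cached value is reused: prints 2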

Is it good form to have an __init__ method that checks the type of its input?

I have a class that wants to be initialized from a few possible inputs. However a combination of no function overloading and my relative inexperience with the language makes me unsure of how to proceed. Any advice?
Check out this question asked earlier.
In short, the recommendation is that you use classmethods or isinstance(), with classmethods being heavily favored.
With Python, you should use duck typing. Wikipedia has a good section on its use in Python at http://en.wikipedia.org/wiki/Duck_typing#In_Python
Contrary to what others have answered, it's not rare to check for types in __init__. For example, the array.array class in the Python standard library accepts an optional initializer argument, which may be a list, string, or iterable. The documentation explicitly states that different actions take place based on the type. For another example of the same treatment by argument type, see decimal.Decimal. Or see zipfile.ZipFile, which accepts a file argument "where file can be either a path to a file (a string) or a file-like object." (Here we see both explicit type checking (a string) and duck typing (a file-like object) all in one!)
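As a rough sketch of that pattern (ByteBuffer is invented, loosely echoing array.array's initializer handling):
class ByteBuffer:
    def __init__(self, initializer=None):
        if initializer is None:
            self.data = []
        elif isinstance(initializer, str):
            self.data = list(initializer.encode('utf-8'))   # explicit type check
        elif isinstance(initializer, (bytes, bytearray)):
            self.data = list(initializer)
        else:
            self.data = [int(x) for x in initializer]       # duck typing: any iterable

print(ByteBuffer('hi').data)       # [104, 105]
print(ByteBuffer([1, 2, 3]).data)  # [1, 2, 3]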
If you find explicit type checking in __init__ is getting messy, try a different approach. Use factory functions instead. For example, let's say you have a triangle module with a Triangle class. There are many ways to construct a triangle. Rather than having __init__ handle all these ways, you could add factory methods to your module:
triangle.from_sas(side1, angle, side2)
triangle.from_asa(angle1, side, angle2)
triangle.from_sss(side1, side2, side3)
triangle.from_aas(angle1, angle2, side)
These factory methods could also be rolled into the Triangle class, using the @classmethod decorator. For an excellent example of this technique, see Thomas Wouters's fine answer to the Stack Overflow question overloading __init__ in python.
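A small sketch of how those factories might look as classmethods (Triangle is illustrative, and the angle is assumed to be in radians):
import math

class Triangle:
    def __init__(self, a, b, c):
        self.sides = (a, b, c)

    @classmethod
    def from_sss(cls, a, b, c):
        return cls(a, b, c)

    @classmethod
    def from_sas(cls, a, angle, b):
        # law of cosines: c**2 = a**2 + b**2 - 2*a*b*cos(angle)
        c = math.sqrt(a*a + b*b - 2*a*b*math.cos(angle))
        return cls(a, b, c)

t = Triangle.from_sas(3.0, math.pi / 2, 4.0)
print(t.sides)   # roughly (3.0, 4.0, 5.0) -- a 3-4-5 right triangle, up to float rounding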
No, don't check for types explicitly. Python is a duck-typed language. If the wrong type is passed, a TypeError will be raised. That's it. You need not bother about the type; that is the responsibility of the programmer.

What is the difference between the __int__ and __index__ methods in Python 3?

The Data Model section of the Python 3.2 documentation provides the following descriptions for the __int__ and __index__ methods:
object.__int__(self)
Called to implement the built-in function int(). Should return an integer.
object.__index__(self)
Called to implement operator.index(). Also called whenever Python needs an integer object (such as in slicing, or in the built-in bin(), hex() and oct() functions). Must return an integer.
I understand that they're used for different purposes, but I've been unable to figure out why two different methods are necessary. What is the difference between these methods? Is it safe to just alias __index__ = __int__ in my classes?
See PEP 357: Allowing Any Object to be Used for Slicing.
The nb_int method is used for coercion and so means something fundamentally different than what is requested here. This PEP proposes a method for something that can already be thought of as an integer communicate that information to Python when it needs an integer. The biggest example of why using nb_int would be a bad thing is that float objects already define the nb_int method, but float objects should not be used as indexes in a sequence.
Edit: It seems that it was implemented in Python 2.5.
I believe you'll find the answer in PEP 357, which has this abstract:
This PEP proposes adding an nb_index slot in PyNumberMethods and an __index__ special method so that arbitrary objects can be used whenever integers are explicitly needed in Python, such as in slice syntax (from which the slot gets its name).
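To make the distinction concrete, here is a small illustration (Chunks is a made-up class):
class Chunks:
    """An object that is conceptually an integer count."""
    def __init__(self, n):
        self.n = n
    def __index__(self):
        return self.n

data = list(range(10))
print(data[Chunks(2):Chunks(5)])  # slicing uses __index__: [2, 3, 4]
print(hex(Chunks(255)))           # hex() uses __index__ too: '0xff'

# float defines __int__ but deliberately not __index__:
print(int(2.9))   # 2 -- lossy conversion is acceptable for int()
# data[2.9]       # TypeError: list indices must be integers or slices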

Why isn't the 'len' function inherited by dictionaries and lists in Python

example:
a_list = [1, 2, 3]
a_list.len() # doesn't work
len(a_list) # works
Python being (very) object-oriented, I don't understand why the 'len' function isn't inherited by the object.
Plus, I keep trying the wrong solution, since it seems like the logical one to me.
Guido's explanation is here:
First of all, I chose len(x) over x.len() for HCI reasons (def __len__() came much later). There are two intertwined reasons actually, both HCI:
(a) For some operations, prefix notation just reads better than postfix — prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the ease with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation.
(b) When I read code that says len(x) I know that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method.
Saying the same thing in another way, I see ‘len‘ as a built-in operation. I’d hate to lose that. /…/
The short answer: 1) backwards compatibility and 2) there's not enough of a difference for it to really matter. For a more detailed explanation, read on.
The idiomatic Python approach to such operations is special methods which aren't intended to be called directly. For example, to make x + y work for your own class, you write a __add__ method. To make sure that int(spam) properly converts your custom class, write a __int__ method. To make sure that len(foo) does something sensible, write a __len__ method.
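For instance (a toy class, not from the original answer):
class Vector:
    def __init__(self, *components):
        self.components = components

    def __add__(self, other):
        # makes v1 + v2 work
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __len__(self):
        # makes len(v) work
        return len(self.components)

v = Vector(1, 2) + Vector(3, 4)
print(v.components)  # (4, 6)
print(len(v))        # 2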
This is how things have always been with Python, and I think it makes a lot of sense for some things. In particular, this seems like a sensible way to implement operator overloading. As for the rest, different languages disagree; in Ruby you'd convert something to an integer by calling spam.to_i directly instead of saying int(spam).
You're right that Python is an extremely object-oriented language and that having to call an external function on an object to get its length seems odd. On the other hand, len(silly_walks) isn't any more onerous than silly_walks.len(), and Guido has said that he actually prefers it (http://mail.python.org/pipermail/python-3000/2006-November/004643.html).
It just isn't.
You can, however, do:
>>> [1,2,3].__len__()
3
Adding a __len__() method to a class is what makes the len() magic work.
This way fits in better with the rest of the language. The convention in python is that you add __foo__ special methods to objects to make them have certain capabilities (rather than e.g. deriving from a specific base class). For example, an object is
callable if it has a __call__ method
iterable if it has an __iter__ method,
supports access with [] if it has __getitem__ and __setitem__.
...
One of these special methods is __len__ which makes it have a length accessible with len().
Maybe you're looking for __len__. If that method exists, then len(a) calls it:
>>> class Spam:
... def __len__(self): return 3
...
>>> s = Spam()
>>> len(s)
3
Well, there actually is a length method; it is just hidden:
>>> a_list = [1, 2, 3]
>>> a_list.__len__()
3
The len() built-in function appears to be simply a wrapper for a call to the object's hidden __len__() method.
Not sure why they made the decision to implement things this way, though.
There is some good info in the link below on why certain things are functions and others are methods. It does indeed cause some inconsistencies in the language.
http://mail.python.org/pipermail/python-dev/2008-January/076612.html
