It's used as a function, but why:
>>> help(map)
Help on class map in module builtins:
class map(object)
| map(func, *iterables) --> map object
|
| Make an iterator that computes the function using arguments from
| each of the iterables. Stops when the shortest iterable is exhausted.
|
| Methods defined here:
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
...
How to understand the above output, where it shows a class and some methods?
Thanks.
The misunderstanding comes either from documentation that didn't keep up with a major change in the specification, or from the CPython implementation, which dares to write a class for what the specification lists as a built-in function.
In Python 2, it is a function that returns a list. In the online documentation of Python 2, it is listed under Built-in Functions. The first line of help(map) on CPython 2.7.10 reads
Help on built-in function map in module __builtin__
correctly calling it a function.
In Python 3, the specification changed so that map returns an iterator instead of a list. As @RafaelC noted, this has the advantage of lazy evaluation. Although it is still listed under "Built-in Functions", the CPython implementation decided to make it a class. This change is reflected in help(map), which you have seen and quoted in the question.
When you call map() in CPython 3, you are creating an object of class map from the arguments you pass. This becomes clear when you print what map() returns.
CPython 2.7.10:
>>> map(int, "12345")
[1, 2, 3, 4, 5]
CPython 3.7.2:
>>> map(int, "12345")
<map object at 0x1023454e0>
So you are clearly creating an object of class map, which is entirely consistent with what you saw in help(map).
So it seems that, to the CPython core developers, a class can count as a "function" under some definition of the word. This is clearly misleading. In any case, it implements the methods needed for it to be used as an iterator (as the docs say, if you ignore that it's listed under built-in functions).
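A quick way to check this in CPython 3 is sketched below; the variable name m is just for illustration. It shows that map is a class, that calling it returns a map instance, and that the instance is its own lazy iterator:

```python
import inspect

# map is a class, so calling it creates a map instance
m = map(int, "12345")
print(inspect.isclass(map))   # True
print(type(m) is map)         # True
# map objects are their own iterators and compute values lazily
print(iter(m) is m)           # True
print(next(m), next(m))       # 1 2
print(list(m))                # [3, 4, 5]
```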
It's used as a function
That's because the syntax for calling a function and getting its return value is identical to the syntax for creating an instance of a class (by calling the class) and getting that instance.
For example, calling a function as in return_value = my_function() is syntactically no different from creating an instance of my_class as in my_object = my_class(). When you call map() in CPython 3, you are creating an object of class map, but you would write exactly the same code if map were a function. That's why you're confused.
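To make the point concrete, here is a small sketch with two hypothetical names, make_pair (a function) and Pair (a class). Both are called with the same syntax; only their types differ:

```python
# Hypothetical names: make_pair is a function, Pair is a class.
def make_pair(x):
    return (x, x)

class Pair:
    def __init__(self, x):
        self.first = self.second = x

p1 = make_pair(1)   # calling a function...
p2 = Pair(1)        # ...looks exactly like calling a class
print(type(make_pair).__name__, type(Pair).__name__)  # function type
print(callable(make_pair), callable(Pair))            # True True
```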
So in short,
map was a function in CPython 2, but is a class in CPython 3. That is clear from help(map) on both versions, and that's what the CPython implementation does.
The documentation keeps it under "Built-in functions", while the CPython implementation takes the liberty of writing a class for it, which causes confusion.
It's a shame that the two aren't clearly distinguished in the docs.
Related
While trying to write an object-oriented priority queue in Python that is as efficient as possible, I came across some interesting behavior. The following code works fine:
from heapq import heappush
class PriorityQueue(list):
    __slots__ = ()
    def push(self, item):
        heappush(self, item)
However, I didn't want to write a wrapper method just to call heappush, since that adds the overhead of an extra function call. I reasoned that because heappush takes a list as its first argument, aliasing the push class attribute to the heappush function would turn it into a full-fledged instance method. My assumption turned out to be false, and the following code raises an error:
from heapq import heappush
class PriorityQueue(list):
    __slots__ = ()
    push = heappush
PriorityQueue().push(0)
# TypeError: heappush expected 2 arguments, got 1
But if I copy the heappush implementation from the CPython heapq source code into my own module and apply the same aliasing, it works fine:
from heapq import _siftdown
def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap) - 1)
class PriorityQueue(list):
    __slots__ = ()
    push = heappush
pq = PriorityQueue()
pq.push(0)
pq.push(-1)
pq.push(3)
print(pq)
# [-1, 0, 3]
The first question: Why does it happen? How does Python decide which function is appropriate for binding as an instance method and which is not?
The second question: What is the difference between heappush in the cpython/Lib/heapq.py and the actual heappush from the heapq module? They are actually different since the following code gives an error
from dis import dis
from heapq import heappush
dis(heappush)
# TypeError: don't know how to disassemble builtin_function_or_method objects
The third question: How can one force Python to bind native heappush as an instance method? Some metaclass magic?
Thank you!
What happens is that the standard library offers pure-Python implementations of many algorithms even where it also contains accelerated native-code implementations of the same algorithms.
The heapq library is one of those: if you open the file you linked to and scroll close to the end, you will see a code snippet that checks whether the native version is available and, if so, overwrites the Python version (which contains the code you copied and pasted) - https://github.com/python/cpython/blob/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4/Lib/heapq.py#L580
try:
    from _heapq import *
except ImportError:
    pass
...
The native version of heappush is loaded into the module, and there is no easy way of getting a reference to the original Python function, short of getting to the actual file source code.
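You can see the replacement from the interpreter. The sketch below assumes CPython, where the _heapq accelerator is available; _siftdown is a private helper, used here only because the native import leaves it as a pure-Python function:

```python
import heapq
import types

# The public function was replaced by the C version from _heapq...
print(type(heapq.heappush).__name__)                          # builtin_function_or_method
print(isinstance(heapq.heappush, types.BuiltinFunctionType))  # True
# ...while private pure-Python helpers such as _siftdown are left untouched.
print(isinstance(heapq._siftdown, types.FunctionType))        # True
```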
Now, to the point: why don't native functions work as methods?
heappush's type is builtin_function_or_method, in contrast with function for pure-Python functions, and one of the major differences is that the latter type (function) has a __get__ method. This __get__ makes Python-defined functions work as "descriptors": the __get__ method is called when the attribute is retrieved from an instance. For ordinary functions, this call records the instance as the self argument and injects it when the function is actually called. Built-in functions have no __get__, so they are returned unchanged and never receive self.
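The descriptor mechanism can be observed directly. In this sketch, py_push is a hypothetical pure-Python stand-in; the explicit __get__ call is roughly what attribute access on an instance does behind the scenes:

```python
import heapq

def py_push(heap, item):   # hypothetical pure-Python stand-in
    heap.append(item)

print(hasattr(py_push, "__get__"))         # True: functions are descriptors
print(hasattr(heapq.heappush, "__get__"))  # False: built-ins are not

class Heap(list):
    push = py_push

h = Heap()
# Roughly what h.push does: call the descriptor's __get__ with the instance
bound = Heap.__dict__["push"].__get__(h, Heap)
print(bound.__self__ is h)   # True: h is stored as the first argument
bound(5)
print(h)                     # [5]
```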
Thus, it is easy to write an "instancemethod" decorator that makes built-in functions behave like Python functions and therefore usable as methods. However, the overhead of creating a partial or a lambda function on every access is likely to exceed the overhead of the extra function call you are trying to eliminate, so you should not expect any speed gain from it, although it may still read as more elegant:
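class instancemethod:
    """Make any callable (including built-ins) bind like a method."""
    def __init__(self, func):
        self.func = func
    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself: return the raw callable
            return self.func
        # Bind the instance as the first positional argument.
        return lambda *args, **kwargs: self.func(instance, *args, **kwargs)
import heapq
class MyHeap(list):
    push = instancemethod(heapq.heappush)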
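A standard-library alternative, not mentioned in the answer above, is functools.partialmethod: when the wrapped callable has no __get__, it builds an ordinary method that passes the instance as the first argument. Its per-call overhead should be comparable to the lambda approach, so this is about readability rather than speed. A sketch:

```python
import functools
import heapq

class PriorityQueue(list):
    __slots__ = ()
    # heappush/heappop are built-ins without __get__; partialmethod wraps
    # them so that pq.push(x) calls heapq.heappush(pq, x).
    push = functools.partialmethod(heapq.heappush)
    pop = functools.partialmethod(heapq.heappop)

pq = PriorityQueue()
for x in (3, -1, 0):
    pq.push(x)
print(pq.pop())  # -1
```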
Maybe it's the way Python calls a function. If you try print(type(heappush)), you will notice the difference.
For question 1: the decorators used to mark what kind of method a function is (e.g. staticmethod, classmethod) take the function, process it, and bind the processed result back to the same name. So the data that determines this should live in some attribute of the function. Once I find where it is, question 3 may be solved.
For question 2: when you import the built-in function, its type is builtin_function_or_method. But if you copy and paste the source, the function is defined in your own code, so it is just a function. That difference may cause the interpreter to treat the built-in like a static method instead of an instance method.
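The "static method" behavior of the built-in can be seen in a short sketch (the class name PQ is hypothetical): attribute access hands back the very same built-in object, so the heap has to be passed explicitly:

```python
import heapq

class PQ(list):
    push = heapq.heappush   # built-in: no __get__, so no binding happens

pq = PQ()
# The attribute comes back unchanged, which is how a staticmethod behaves...
print(pq.push is heapq.heappush)   # True
# ...so the heap has to be passed explicitly.
pq.push(pq, 1)
print(pq)                          # [1]
```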
Consider the following snippet.
class A:
    def __next__(self):
        return 2
a = A()
print(next(a), a.__next__())  # prints "2 2" as expected
a.__next__ = lambda: 4
print(next(a), a.__next__())  # prints "2 4". I expected "4 4"
Clearly, the __next__ attribute is updated by the patching, but the built-in next function does not pick it up.
The Python 3 documentation on the data model says:
For instance, if a class defines a method named __getitem__(), and x is an instance of this class, then x[i] is roughly equivalent to type(x).__getitem__(x, i).
From this, I came up with a hack as below
class A:
    def next_(self):
        return 2
    def __next__(self):
        return self.next_()
a = A()
print(next(a), a.__next__())  # 2 2
a.next_ = lambda: 4
print(next(a), a.__next__())  # 4 4
The code works, but at the expense of another layer of indirection via another next_-method.
My question is: What is the proper way to monkey-patch the __next__ instance method? What is the rationale behind this design in python?
You can't. Special methods are special: they cannot be overridden at the instance level. Period. If you want to "customize" an instance's behavior, the correct way is to write a proper implementation instead of a bogus one that you swap at runtime. Change the value instead of the method.
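A minimal sketch of "change the value instead of the method", assuming the varying behavior can be expressed as data (the value attribute is hypothetical):

```python
class A:
    def __init__(self, value=2):
        self.value = value   # behavior is driven by data...

    def __next__(self):
        return self.value    # ...so __next__ itself never needs patching

a = A()
print(next(a))   # 2
a.value = 4      # change the value, not the method
print(next(a))   # 4
```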
The rationale can be found in The History of Python - Adding Support for User-defined Classes, at the end of the following section:
Special Methods
As briefly mentioned in the last section, one of my main goals was to keep the implementation of classes simple. In most object oriented languages, there are a variety of special operators and methods that only apply to classes. For example, in C++, there is a special syntax for defining constructors and destructors that is different than the normal syntax used to define ordinary function and methods.
I really didn't want to introduce additional syntax to handle special operations for objects. So instead, I handled this by simply mapping special operators to a predefined set of "special method" names such as __init__ and __del__. By defining methods with these names, users could supply code related to the construction and destruction of objects.
I also used this technique to allow user classes to redefine the behavior of Python's operators. As previously noted, Python is implemented in C and uses tables of function pointers to implement various capabilities of built-in objects (e.g., “get attribute”, “add” and “call”). To allow these capabilities to be defined in user-defined classes, I mapped the various function pointers to special method names such as __getattr__, __add__, and __call__. There is a direct correspondence between these names and the tables of function pointers one has to define when implementing new Python objects in C.
In summary: types defined in C have a structure that contains pointers to special methods. Guido wanted to keep consistency with types defined in Python and so their special methods end up being used at the class level.
Could the implementation always follow the lookup order? Yes... at a huge cost in speed, since even the C code would then have to first perform a dictionary lookup on the instance to check whether a special method is defined there and call it. Given that special methods are called often, especially for built-in types, it makes sense to just keep a direct pointer to the function in the class. The behavior on the Python side is simply consistent with this.
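Because the lookup goes through the type, an instance attribute is skipped, while assigning the method on the class is seen immediately (and affects every instance, so it is not a per-instance monkey-patch). A sketch:

```python
class A:
    def __next__(self):
        return 2

a = A()
a.__next__ = lambda: 4        # instance attribute: ignored by next()
print(next(a))                # 2
print(type(a).__next__(a))    # 2: roughly what next(a) does
A.__next__ = lambda self: 4   # assigning on the class updates the type's slot
print(next(a))                # 4, for every instance of A
```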
Python has never been strong on performance. Your suggested implementation would run extremely slowly, especially 20 years ago, when Python was designed on far less powerful machines and when JITs were extremely rare and much less well understood than they are today.
How are you supposed to access the 10 in this? I've been informed we're returning a function in this function, but how does this make sense?
function([1, 2, 3, 4])(10)
I'm assuming a lot based on the limited information you've provided in your question.
But it looks like you are trying to understand a closure. Here's a totally contrived example:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> function([1,2,3,4])(10)
True
>>> eq = function([1,2,3,4])
>>> eq(10)
True
>>> eq(11)
False
In your expression function([1, 2, 3, 4])(10), there are two calls, one with the argument [1, 2, 3, 4] and the other with the argument 10. For this to work, function must be a callable that returns a callable. Python relies heavily on objects having types which define their behaviour, and callability is one of those behaviours, recursively defined by objects having a __call__ method (which is a type of callable). Because of this dynamic behaviour, we can't tell from the expression what type function is.
We can provide examples that would make the expression valid, though. For instance:
function = lambda x: x.__contains__
This creates an anonymous function using a lambda expression, which is a callable. That function returns a bound method (assuming its argument has the __contains__ method) which in turn is callable, and the expression would evaluate to False.
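Running the lambda version confirms this; the sketch below just evaluates the expression from the question and one extra membership test:

```python
function = lambda x: x.__contains__

checker = function([1, 2, 3, 4])   # a bound __contains__ method of that list
print(callable(checker))           # True
print(function([1, 2, 3, 4])(10))  # False: 10 is not in the list
print(function([1, 2, 3, 4])(3))   # True
```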
class function:
    def __init__(self,a):
        "Method called during object initialization"
        # Note that the return value doesn't come from this method.
        # self is created before it is called and returned after.
    def __call__(self,b):
        "Method called when the object is called"
        return "Well, the first one wasn't quite a function."
This makes a class named function, and classes are callable, which is how we instantiate them. So the first call became an object instantiation and the second call calls an object. In this example, we don't actually have a function, though we do have two methods that are called within the two calls.
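As a variation (not from the answer), the class-based version can store the first argument and reproduce the behavior of the closure example: the first call builds an instance, the second call runs __call__:

```python
class function:
    def __init__(self, a):
        self.a = a               # keep the first call's argument on the instance

    def __call__(self, b):
        return sum(self.a) == b  # same test as the closure-based example

print(function([1, 2, 3, 4])(10))  # True
print(function([1, 2, 3, 4])(11))  # False
```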
AChampion's example uses two normal function definitions, one of which occurs inside another creating a closure over that call's a value. That is a more traditional approach, though we can still muddle the waters using mutable values:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> l = [1,2,3,4]
>>> eq = function(l)
>>> eq(10)
True
>>> eq(15)
False
>>> l.append(5)
>>> eq(15)
True
>>> eq(10)
False
We see here that this isn't a pure function in the mathematical sense, as its result is affected by state other than its arguments. We frequently try to avoid such side effects, or at least make them visible by prominently displaying the state container, as with method calls.
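The shared state can be inspected through the closure cell. This sketch also shows one way (not from the answer) to remove the hidden dependency: pass an immutable snapshot such as a tuple copy:

```python
def function(a):
    def inner(b):
        return sum(a) == b
    return inner

l = [1, 2, 3, 4]
eq = function(l)
# The closure cell holds a reference to the list object itself, not a copy
print(eq.__closure__[0].cell_contents is l)   # True
# Passing a snapshot (a tuple copy) removes the hidden dependency
frozen = function(tuple(l))
l.append(5)
print(eq(15), frozen(15))                     # True False
```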
Lastly, depending on the context, the expression could fail in a variety of ways including NameError if function simply isn't defined, or TypeError if one of the calls was attempted on a non-callable object. It's still syntactically correct Python, and both of those exceptions are possible to handle, although doing so is likely a bit of a perversion. An example might be a spreadsheet program in which the cell formulae are Python expressions; you'd evaluate them with specific namespaces (globals), and catch any error to account for mistyped formulae.
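For the spreadsheet scenario, a minimal sketch could look like the following; evaluate is a hypothetical helper, and the empty __builtins__ only limits the names a formula can see. It is not a security sandbox, so untrusted input should not go through eval:

```python
def evaluate(formula, namespace):
    """Evaluate a cell formula, reporting errors instead of raising (hypothetical helper)."""
    try:
        # Empty __builtins__ limits the available names; this is NOT a security sandbox.
        return eval(formula, {"__builtins__": {}}, dict(namespace))
    except (NameError, TypeError) as exc:
        return "#ERROR: " + type(exc).__name__

ns = {"function": lambda x: x.__contains__, "x": 5}
print(evaluate("function([1, 2, 3, 4])(10)", ns))  # False
print(evaluate("undefined([1])(10)", ns))          # #ERROR: NameError
print(evaluate("x(10)", ns))                       # #ERROR: TypeError
```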
We all know that functions are objects as well. But how do function objects compare with functions? What are the differences between function objects and functions?
By function object I mean g so defined:
class G(object):
    def __call__(self, a):
        pass
g = G()
By function I mean this:
def f(a):
    pass
Python creates function objects for you when you use a def statement, or you use a lambda expression:
>>> def foo(): pass
...
>>> foo
<function foo at 0x106aafd70>
>>> lambda: None
<function <lambda> at 0x106d90668>
So whenever you define a function in Python, an object like this is created. There is no way to define a function that is not an object.
TL;DR: any object whose type implements __call__ can be called, e.g. functions, instances of custom classes that define __call__, etc.
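A short sketch comparing the two objects from the question (Python 3 syntax): both are callable with the same syntax, but only f is a real function object with its own code object:

```python
import types

def f(a):
    return a

class G:
    def __call__(self, a):
        return a

g = G()
print(type(f) is types.FunctionType)      # True: def creates a function object
print(isinstance(g, types.FunctionType))  # False: g is an instance of G
print(callable(f), callable(g))           # True True
print(f(1) == g(1))                       # True: same call syntax and result
print(hasattr(f, "__code__"), hasattr(g, "__code__"))  # True False
```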
Slightly longer version: (walloftext)
The full answer to your question about the difference lies in the implementation of the Python virtual machine, so we must look at Python under the hood. First comes the concept of a code object. Python compiles whatever you give it into its own internal language, bytecode, which is the same across all platforms. A very visible representation of this is the .pyc file you get after importing a custom library you wrote: it contains the raw instructions for the Python VM.
Ignoring how these instructions are created from your source code, they are then executed by PyEval_EvalFrameEx in Python/ceval.c. The source code is a bit of a beast, but it ultimately works like a simple processor with some of the complicated bits abstracted away; the bytecode is the assembly language for this processor. In particular, one of the "opcodes" for this "processor" is (aptly named) CALL_FUNCTION. The call goes through a number of functions, eventually reaching PyObject_Call(). This function takes a pointer to a PyObject, extracts the tp_call slot from its type, and calls it directly (technically it checks that it's there first):
...
call = func->ob_type->tp_call; // func is an argument of PyObject_Call() and a pointer to a PyObject
...
result = (*call)(func, arg, kw);
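The call opcode can be seen with the dis module. In this sketch the exact opcode name depends on the Python version (CALL_FUNCTION up to 3.10, CALL from 3.11), so the code only collects opcodes whose names start with CALL:

```python
import dis

def caller(func):
    return func(1)

dis.dis(caller)   # the call shows up as CALL_FUNCTION (<= 3.10) or CALL (3.11+)
call_ops = [ins.opname for ins in dis.get_instructions(caller)
            if ins.opname.startswith("CALL")]
print(call_ops)
```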
Any type that implements __call__ gets a tp_call slot pointing to the actual function. I believe this is handled by the slotdefs[] definition in Objects/typeobject.c:
FLSLOT("__call__", tp_call, slot_tp_call, (wrapperfunc)wrap_call,
"__call__($self, /, *args, **kwargs)\n--\n\nCall self as a function.",
PyWrapperFlag_KEYWORDS)
The __call__ method for functions is defined in the CPython implementation; it defines how the Python VM should start executing the function's bytecode and how data should be returned (if any). When you give an arbitrary class a __call__ method, that attribute is a function object, which in turn refers back to the CPython implementation of __call__. Therefore, when you call a "normal" function foo, foo.__call__ is referenced. When you call an instance of a callable class, self.__call__ plays the role of foo, and the actual CPython reference called is self.__call__.im_func.__call__ (self.__call__.__func__.__call__ in Python 3).
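The routing described above can be checked from Python 3 (a sketch; __func__ is the Python 3 name for Python 2's im_func):

```python
class G:
    def __call__(self, a):
        return a * 2

g = G()
print(g(21))                    # 42
print(type(g).__call__(g, 21))  # 42: the call is routed through the type
# The bound method's underlying function (im_func in Python 2, __func__ in Python 3)
print(g.__call__.__func__ is G.__dict__["__call__"])  # True

def foo(a):
    return a * 2

print(foo.__call__(21))         # 42: plain functions expose __call__ as well
```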
disclaimer
This has been a journey into somewhat uncharted waters for me, and it's entirely possible I have misrepresented some of the finer points of the implementation. I mainly drew on this blog post on how Python callables work under the hood, and on some digging of my own through the Python source code.
I know that you can't call object.__setattr__ on objects not inherited from object, but is there anything else that is different between the two? I'm working in Python 2.6, if this matters.
Reading this question again, I see I misunderstood what @paper.cut was asking about: the difference between classic classes and new-style classes (not an issue in Python 3+). I do not know the answer to that.
Original Answer*
setattr(instance, name, value) is syntactic sugar for instance.__setattr__(name, value)**.
You would only need to call object.__setattr__(...) inside a class definition, and then only if directly subclassing object. If you are subclassing something else, Spam for example, then you should either use super() to get the next class in the hierarchy or call Spam.__setattr__(...); this way you don't risk skipping behavior that superclasses have defined by jumping directly to object.
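A sketch of the difference, in Python 3 syntax (in Python 2.6 you would write class Spam(object) and super(Eggs, self)); Eggs and the log list are hypothetical:

```python
log = []

class Spam:
    def __setattr__(self, name, value):
        log.append(name)                       # behavior defined by the superclass
        object.__setattr__(self, name, value)  # Spam subclasses object directly

class Eggs(Spam):
    def __setattr__(self, name, value):
        # super() reaches Spam.__setattr__; calling object.__setattr__ here
        # would silently skip the logging above.
        super().__setattr__(name, value * 2)

e = Eggs()
e.x = 21
print(e.x, log)   # 42 ['x']
```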
* applies to Python 3.0+ classes and 2.x new-style classes
**There are two instances where setattr(x, ...) and x.__setattr__(...) are not the same:
x itself has a __setattr__ in its instance dictionary (i.e. x.__dict__['__setattr__'] = ...; this is almost certainly an error)
x.__class__ has a __getattribute__ method -- because __getattribute__ intercepts every lookup, even when the method/attribute exists
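The first caveat can be reproduced in Python 3 with a plain class (a sketch; X and calls are hypothetical names):

```python
class X:
    pass

x = X()
calls = []
# Almost certainly a mistake: a callable named __setattr__ in the instance dict
x.__dict__["__setattr__"] = lambda name, value: calls.append(name)

x.__setattr__("a", 1)   # explicit lookup finds the instance-dict entry
setattr(x, "b", 2)      # setattr() uses type(x).__setattr__, i.e. object.__setattr__
print(calls)            # ['a']
print(hasattr(x, "a"), x.b)  # False 2
```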
NB
These two caveats apply to every syntactic sugar shortcut:
setattr
getattr
len
bool
hash
etc