We all know that functions are objects as well. But how do function objects compare with functions? What are the differences between function objects and functions?
By function object I mean g, defined like so:
class G(object):
    def __call__(self, a):
        pass

g = G()
By function I mean this:
def f(a):
    pass
Python creates function objects for you when you use a def statement or a lambda expression:
>>> def foo(): pass
...
>>> foo
<function foo at 0x106aafd70>
>>> lambda: None
<function <lambda> at 0x106d90668>
So whenever you define a function in Python, a function object like this is created. There is no way to define a "plain" function that is not an object.
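A quick way to see this (the names foo and bar here are illustrative): both a def-defined function and a lambda are instances of the same built-in function type, and both are callable.

```python
import types

def foo():
    pass

bar = lambda: None

# Both are instances of the built-in function type.
print(type(foo) is types.FunctionType)  # True
print(type(bar) is types.FunctionType)  # True
print(callable(foo), callable(bar))     # True True
```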
TL;DR: any object that implements __call__ can be called, e.g. functions, instances of classes that define __call__, etc.
Slightly longer version (wall of text):
The full answer to your question about the difference sits within the implementation of the Python virtual machine, so we must take a look at Python under the hood. First comes the concept of a code object: Python parses whatever you throw at it into its own internal language, the same across all platforms, known as bytecode. A very visible representation of this is the .pyc file you get after importing a custom library you wrote. These are the raw instructions for the Python VM.
Ignoring how these instructions are created from your source code, they are executed by PyEval_EvalFrameEx in Python/ceval.c. The source code is a bit of a beast, but it ultimately works like a simple processor with some of the complicated bits abstracted away; the bytecode is the assembly language for this processor. In particular, one of the "opcodes" for this "processor" is the (aptly named) CALL_FUNCTION. The callback goes through a number of calls, eventually getting to PyObject_Call(). This function takes a pointer to a PyObject, extracts the tp_call attribute from its type, and calls it directly (technically, it first checks that it is there):
...
...
call = func->ob_type->tp_call;  /* func is an arg of PyObject_Call() and is a pointer to a PyObject */
...
result = (*call)(func, arg, kw);
Any object that implements __call__ is given a tp_call slot with a pointer to the actual function. I believe that is handled by the slotdefs[] definition in Objects/typeobject.c:
FLSLOT("__call__", tp_call, slot_tp_call, (wrapperfunc)wrap_call,
       "__call__($self, /, *args, **kwargs)\n--\n\nCall self as a function.",
       PyWrapperFlag_KEYWORDS)
The __call__ method for functions is itself defined in the CPython implementation; it defines how the Python VM should start executing the bytecode for that function and how data should be returned (if any). When you give an arbitrary class a __call__ method, that attribute is a function object which again refers back to the CPython implementation of __call__. Therefore, when you call a "normal" function, foo.__call__ is referenced. When you call a callable class instance, self.__call__ is the equivalent of foo, and the actual CPython reference invoked is self.__call__.im_func.__call__.
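A rough Python-level sketch of what the C code above amounts to: calling any object is the same as invoking __call__ looked up on its type, which is what the tp_call slot implements (the class G here mirrors the one from the question):

```python
class G:
    def __call__(self, a):
        return a * 2

g = G()

def f(a):
    return a * 2

# Both calls go through the type's __call__ slot (tp_call in C):
print(g(21))                    # 42
print(type(g).__call__(g, 21))  # 42 -- the same call spelled out
print(f(21))                    # 42
print(type(f).__call__(f, 21))  # 42 -- works for plain functions too
```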
Disclaimer
This has been a journey into somewhat uncharted waters for me, and it's entirely possible I have misrepresented some of the finer points of the implementation. I mainly drew from this blog post on how Python callables work under the hood, and from some digging of my own through the Python source code.
Related
While trying to code as efficient as possible an object-oriented implementation of a priority queue in Python, I came across some interesting behavior. The following code works fine:
from heapq import heappush

class PriorityQueue(list):
    __slots__ = ()

    def push(self, item):
        heappush(self, item)
However, I really didn't want to write a wrapper method just to call heappush, since it incurs the overhead of an extra function call. I reasoned that since heappush takes a list as its first argument, aliasing the push class attribute to the heappush function should turn the latter into a full-fledged instance method. My assumption turned out to be false, however, and the following code gives an error:
from heapq import heappush

class PriorityQueue(list):
    __slots__ = ()
    push = heappush

PriorityQueue().push(0)
# TypeError: heappush expected 2 arguments, got 1
But if I go to the CPython heapq source code, copy the heappush implementation into my own scope, and apply the same logic, it works fine:
from heapq import _siftdown

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap) - 1)

class PriorityQueue(list):
    __slots__ = ()
    push = heappush

pq = PriorityQueue()
pq.push(0)
pq.push(-1)
pq.push(3)
print(pq)
# [-1, 0, 3]
The first question: Why does it happen? How does Python decide which function is appropriate for binding as an instance method and which is not?
The second question: What is the difference between heappush in the cpython/Lib/heapq.py and the actual heappush from the heapq module? They are actually different since the following code gives an error
from dis import dis
from heapq import heappush
dis(heappush)
# TypeError: don't know how to disassemble builtin_function_or_method objects
The third question: How can one force Python to bind native heappush as an instance method? Some metaclass magic?
Thank you!
What happens is that Python ships pure-Python implementations of many of its standard-library algorithms even when it also contains accelerated native-code implementations of the same algorithms.
The heapq library is one of those. If you look at the file you link to, close to the end you will see the snippet that checks whether the native version is available and, if so, overwrites the Python version whose code you copied and pasted - https://github.com/python/cpython/blob/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4/Lib/heapq.py#L580
try:
    from _heapq import *
except ImportError:
    pass
...
The native version of heappush is thus loaded into the module, and there is no easy way of getting a reference to the original Python function, short of going to the actual source file.
Now, to the point: why don't native functions work as class methods?
heappush's type is builtin_function_or_method, in contrast with function for pure-Python functions - and one of the major differences is that the second object type features a __get__ method. This __get__ makes Python-defined functions work as "descriptors": the __get__ method is called when the attribute is retrieved from an instance. For ordinary functions, this call records the self parameter and injects it when the actual function is called.
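The difference is easy to observe (py_heappush is a hypothetical pure-Python stand-in, only its wrapper type matters here): the pure-Python function is a descriptor, while the native one is not, so only the former can bind as a method.

```python
import heapq

def py_heappush(heap, item):
    heapq.heappush(heap, item)  # delegate; only the wrapper type matters here

print(type(heapq.heappush).__name__)       # builtin_function_or_method
print(type(py_heappush).__name__)          # function
print(hasattr(py_heappush, '__get__'))     # True  -- a descriptor, so it binds
print(hasattr(heapq.heappush, '__get__'))  # False -- no binding happens
```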
Thus, it is easy to write an "instancemethod" decorator that makes built-in functions work like Python functions, usable as methods. However, the overhead of creating a partial or a lambda will likely surpass the overhead of the extra function call you are trying to eliminate - so you should expect no speed gains from it, although it may still read as more elegant:
class instancemethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        return lambda *args, **kwargs: self.func(instance, *args, **kwargs)

import heapq

class MyHeap(list):
    push = instancemethod(heapq.heappush)
Maybe it is the way Python calls the function. If you try print(type(heappush)) you will notice the difference.
For question 1: the decorators used to mark what kind of function something is (i.e. staticmethod, classmethod) are called with the function, process it, and return the processed version under that name. So the data that determines this should live in some attribute of the function. Once I find where it is, question 3 may be solvable.
For question 2: when you import the built-in function, its type is builtin_function_or_method. But if you copy and paste it, it was defined in your code, so it is just a function. That may cause the interpreter to treat it as a static method instead of an instance method.
I am a beginner in Python and I am trying to wrap my head around function decorators. I cannot figure out how functions return functions.
I mean, in what order does the interpreter interpret this function:
def decorator(another_func):
    def wrapper():
        print('before actual function')
        return another_func()

    print('pos')
    return wrapper
And what is the difference between these two statements:
return wrapper
AND
return wrapper()
I am using Head First Python, but this topic I feel is not described very well in there, please suggest any video or a good resource so that I can understand it.
The key to understanding the difference is understanding that everything is an object in python, including functions. When you use the name of the function without parenthesis (return wrapper), you are returning the actual function itself. When you use parenthesis, you're calling the function. Take a look at the following example code:
def foo():
    return 2

bar = foo
baz = foo()
qux = bar()
If you print baz or qux, it will print 2. If you print bar, it will give you the memory address that references the function, not a number. But if you call bar(), you are now printing the result of the function call.
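Applying this to the decorator from the question (greet is an illustrative function), returning wrapper hands back the function object itself, which you can then call later:

```python
def decorator(another_func):
    def wrapper():
        print('before actual function')
        return another_func()

    print('pos')
    return wrapper

def greet():
    return 'hello'

g = decorator(greet)   # prints 'pos'; g is the wrapper function itself
print(g is not greet)  # True -- we got a function back, not its result
print(g())             # prints 'before actual function', then 'hello'
```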
I cannot figure out how functions return functions.
As already explained by LTheriault, in Python everything is an object. Not only that, but everything happens at runtime - the def statement is an executable statement which creates a function object from the code within the def block and binds this object to the function's name in the current namespace. IOW, it's mostly syntactic sugar for operations you could code manually (a very welcome syntactic sugar, though - building a function object "by hand" is quite a lot of work).
Note that having functions as "first-class citizens" is not Python specific - that's the basis of functional programming.
I mean, in what order does the interpreter interpret this function:
def decorator(another_func):
    def wrapper():
        print('before actual function')
        return another_func()

    print('pos')
    return wrapper
Assuming the decorator function is declared at the module top-level: the runtime first takes the code block that follows the def statement, compiles it into a code object, creates a function object (instance of type 'function') from this code object and a couple other things (the arguments list etc), and finally binds this function object to the declared name (nb: 'binds' => "assigns to").
The inner def statement is actually only executed when the outer function is called, and it's executed anew each time the outer function is called - IOW, each call to decorator returns a new function instance.
The above explanation is of course quite simplified (hence partially inexact), but it's enough to understand the basic principle.
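This can be checked directly: since the inner def runs on every call, each call to decorator yields a distinct function object, even though the two share the same compiled code object underneath:

```python
def decorator(another_func):
    def wrapper():
        return another_func()
    return wrapper

def f():
    return 1

w1 = decorator(f)
w2 = decorator(f)
print(w1 is w2)                    # False -- a new function object each time
print(w1.__code__ is w2.__code__)  # True  -- but they share one code object
```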
It's used as a function, but why:
>>> help(map)
Help on class map in module builtins:
class map(object)
| map(func, *iterables) --> map object
|
| Make an iterator that computes the function using arguments from
| each of the iterables. Stops when the shortest iterable is exhausted.
|
| Methods defined here:
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
...
How to understand the above output, where it shows a class and some methods?
Thanks.
The misunderstanding is due to poor documentation that doesn't reflect a major change in behavior, or to the CPython implementation, which dares to provide a class for what the specs list as a built-in function.
In Python 2, it is a function that returns a list. In the online documentation of Python 2, it is listed under Built-in Functions. The first line of help(map) on CPython 2.7.10 reads
Help on built-in function map in module __builtin__
correctly calling it a function.
In Python 3, the specs changed so that it returns an iterator instead of a list. As @RafaelC noted, this has the advantage of lazy evaluation. Although it is still listed under "Built-in Functions", the CPython implementation decided to make it a class. This change is reflected in help(map), which you have seen and quoted in the question.
What you are doing when you call map() in CPython 3 is creating an object of class map with the parameters you pass. This is clearly shown when you print what map() returns.
CPython 2.7.10:
>>> map(int, "12345")
[1, 2, 3, 4, 5]
CPython 3.7.2:
>>> map(int, "12345")
<map object at 0x1023454e0>
So you are clearly creating an object of class map, which makes what you've seen in help(map) sound very fine.
So it seems that, to the CPython core developers, a class can be a "function" under some definition of "function". This is clearly misleading. Anyway, it implements the necessary methods to be usable as an iterator (as the docs say, if you ignore that it's listed under built-in functions).
It's used as a function
That's because the syntax of calling a function and fetching its return value is identical to the syntax of instantiating a class (by calling its initializer) and fetching the resulting object.
For example, using a function my_function() as in return_value = my_function() is syntactically no different from creating a class object of my_class() as in my_object = my_class(). When you call map() in CPython 3, you are creating an object of class map. But you would write the same even if map were a function. That's why you're confused.
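This is easy to verify in Python 3 (m is just an illustrative name):

```python
m = map(int, "123")
print(type(m))             # <class 'map'>
print(isinstance(m, map))  # True -- calling map() constructs a map instance
print(next(m))             # 1 -- values are produced lazily, one at a time
print(list(m))             # [2, 3] -- the rest of the iterator
```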
So in short,
map was a function in CPython 2, but is a class in CPython 3. That is clear from help(map) on both versions, and that's what the CPython implementation does.
The documentation keeps it under "Built-in Functions" while the CPython implementation takes the liberty of writing a class for it, causing confusion.
It's a shame that the two aren't clearly distinguished in the docs.
I came across a peculiar behaviour of functools.update_wrapper: it overwrites the __dict__ of the wrapper object by that of the wrapped object - which may hinder its use when nesting decorators.
As a simple example, assume that we are writing a decorator class that caches data in memory and another decorator class that caches data to a file. The following example demonstrates this (I kept the example brief and omitted all the caching logic, but I hope it illustrates the question):
import functools

class cached:
    cache_type = 'memory'

    def __init__(self, fcn):
        super().__init__()
        self.fcn = fcn
        functools.update_wrapper(self, fcn, updated=())

    def __call__(self, *args):
        print("Retrieving from", type(self).cache_type)
        return self.fcn(*args)

class diskcached(cached):
    cache_type = 'disk'

@cached
@diskcached
def expensive_function(what):
    print("expensive_function working on", what)

expensive_function("Expensive Calculation")
This example works as intended - its output is
Retrieving from memory
Retrieving from disk
expensive_function working on Expensive Calculation
However, it took me a long time to make this work - at first, I had not included the updated=() argument in the functools.update_wrapper call. When it is left out, nesting the decorators does not work - in that case, the output is
Retrieving from memory
expensive_function working on Expensive Calculation
I.e. the outer decorator directly calls the innermost wrapped function. The reason (which took me a while to understand) is that functools.update_wrapper updates the __dict__ attribute of the wrapper with the __dict__ of the wrapped object - which overwrites the wrapper's own fcn attribute and thus short-circuits the inner decorator, unless one adds the updated=() argument.
My question: is this behaviour intended - and why? (Python 3.7.1)
Making a wrapper function look like the function it wraps is the point of update_wrapper, and that includes __dict__ entries. It doesn't replace the __dict__; it calls update.
If update_wrapper didn't do this, then if one decorator set attributes on a function and another decorator wrapped the modified function:
@decorator_with_update_wrapper
@decorator_that_sets_attributes
def f(...):
    ...
the wrapper function wouldn't have the attributes set, rendering it incompatible with the code that looks for those attributes.
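This is easy to check (marker and own are illustrative attribute names): update_wrapper calls wrapper.__dict__.update(wrapped.__dict__), so the wrapper's existing entries survive while the wrapped function's entries are copied over.

```python
import functools

def wrapped():
    pass
wrapped.marker = 'set on the wrapped function'  # e.g. by an inner decorator

def wrapper():
    pass
wrapper.own = 'pre-existing attribute'

functools.update_wrapper(wrapper, wrapped)
print(wrapper.marker)  # copied over from wrapped.__dict__
print(wrapper.own)     # still there -- update(), not replacement
```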
I have this simple function:
def f():
    print("heh")
When I call f, I am really calling its __call__ method. But when I call the __call__ method of f, I am really calling the __call__ method of the __call__ method of f. And so on, and so on.
How far does Python go when I call f()? It clearly must stop somewhere.
I was wondering whether this can go to infinity and it turns out that 100 000 is enough to crash Python.
>>> exec('f'+100000*'.__call__'+'()')
========= RESTART ==========
What's the reason of this crash?
A 'call' on an object causes the interpreter to look for a way to call it. When that is resolved by locating a __call__ method, that method is invoked, and then something real happens. The __call__ method can't just invoke the same mechanism on itself.
In the case of a function object, I believe there is an internal method table which is directly consulted first to see if there's a defined (C language) call handler, and that is invoked. There may also be a __call__ attribute which does the same thing, but I think the engine checks the table first (some of this may have been reworked in Py 3).
The C-language call handler for functions is handed a reference to the function object and a package of parameters. The function object contains a reference to a code object, and another to the proper global namespace. The code object contains a description of what parameters are expected, and all the information needed to actually set up the call on the Python stack.
When you call a method of a class, there's a little binder object with its own call method (containing a pointer to the self and to the actual method).
I guess the main point is that some objects have __call__ methods coded in Python, but for many types the interpreter can go straight to C code after looking in the object's internal type descriptor. Another example is calling a type object, such as str, where the C-language constructor will be invoked.
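For a plain function this is easy to poke at: each .__call__ lookup produces a new wrapper object, but calling any of them ends up running the same underlying C call handler, so the chain terminates rather than recursing in Python.

```python
def f():
    return "heh"

# Each attribute lookup yields a new method-wrapper object; invoking any of
# them dispatches to the same C-level handler for the function.
print(f())                        # heh
print(f.__call__())               # heh
print(f.__call__.__call__())      # heh
print(type(f.__call__).__name__)  # method-wrapper
```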