While trying to write as efficient an object-oriented priority queue implementation in Python as I could, I came across some interesting behavior. The following code works fine:
from heapq import heappush
class PriorityQueue(list):
    __slots__ = ()

    def push(self, item):
        heappush(self, item)
However, I really didn't want to write a wrapper method just to call heappush, since it incurs the overhead of an extra function call. I reasoned that since heappush's signature takes a list as its first argument, aliasing the push class attribute to the heappush function would turn the latter into a full-fledged instance method. But my assumption turned out to be false, and the following code raises an error:
from heapq import heappush
class PriorityQueue(list):
    __slots__ = ()
    push = heappush

PriorityQueue().push(0)
# TypeError: heappush expected 2 arguments, got 1
But if I go to the CPython heapq source code, copy the heappush implementation into my own scope, and apply the same logic, everything works fine:
from heapq import _siftdown
def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap) - 1)

class PriorityQueue(list):
    __slots__ = ()
    push = heappush

pq = PriorityQueue()
pq.push(0)
pq.push(-1)
pq.push(3)
print(pq)
# [-1, 0, 3]
The first question: why does this happen? How does Python decide which functions are suitable for binding as instance methods and which are not?
The second question: what is the difference between heappush in cpython/Lib/heapq.py and the actual heappush from the heapq module? They are evidently different, since the following code raises an error:
from dis import dis
from heapq import heappush
dis(heappush)
# TypeError: don't know how to disassemble builtin_function_or_method objects
The third question: how can one force Python to bind the native heappush as an instance method? Some metaclass magic?
Thank you!
What takes place is that Python ships pure-Python implementations of a lot of its standard-library algorithms even when it also contains accelerated native-code implementations of the same algorithms.
The heapq library is one of those - if you open the file you link to and look close to the end, you will see the snippet that checks whether the native version is available and overwrites the pure-Python version (the code you copied and pasted) - https://github.com/python/cpython/blob/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4/Lib/heapq.py#L580
try:
    from _heapq import *
except ImportError:
    pass
...
The native version of heappush is loaded into the module, and there is no easy way of getting a reference to the original Python function, short of going to the actual source file.
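One workaround, used by CPython's own test suite (it relies on the unsupported test.support internals, so treat this as a sketch rather than a public API), is to re-import heapq with the C accelerator blocked:

from test.support import import_fresh_module  # on 3.10+ it lives in test.support.import_helper

# Re-import heapq with the _heapq C module blocked, so the module keeps
# its pure-Python definitions.
py_heapq = import_fresh_module('heapq', blocked=['_heapq'])
print(type(py_heapq.heappush))  # <class 'function'> - a plain Python function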
Now, to the point: why don't native functions work as instance methods?
heappush's type is builtin_function_or_method, in contrast with function for pure-Python functions - and one of the major differences is that the latter object type features a __get__ method. This __get__ makes Python-defined functions work as "descriptors": the __get__ method is called when one retrieves the attribute from an instance. For ordinary functions, this call records the self parameter and injects it when the actual function is called.
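You can see this difference directly (a minimal sketch; the exact reprs vary by Python version and by whether the C accelerator is in use):

from heapq import heappush

def pure(heap, item):
    pass

print(type(heappush))                # <class 'builtin_function_or_method'>
print(type(pure))                    # <class 'function'>
print(hasattr(pure, '__get__'))      # True - Python functions are descriptors
print(hasattr(heappush, '__get__'))  # False - built-ins don't bind as methods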
Thus, it is easy to write an "instancemethod" decorator that makes built-in functions behave like Python functions and become usable as methods. However, the overhead of creating a partial or a lambda should surpass the overhead of the extra function call you are trying to eliminate - so you should get no speed gains from it, although it might still read as more elegant:
import heapq

class instancemethod:
    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        return lambda *args, **kwargs: self.func(instance, *args, **kwargs)

class MyHeap(list):
    push = instancemethod(heapq.heappush)
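A quick usage check of the sketch above:

h = MyHeap()
h.push(3)
h.push(1)
h.push(2)
print(h)  # [1, 3, 2] - a valid heap, smallest element first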
Maybe it's the way Python calls a function. If you try print(type(heappush)) you will notice the difference.
For question 1: the decorators used to mark a function as one type or another (i.e. staticmethod, classmethod) work by processing the function and binding the processed object to the name, so the data that determines this should live in some attribute of the function. Once I find where it is, question 3 may be solved.
For question 2: when you import the built-in function, it has the type builtin_function_or_method. But if you copy and paste its source, it is defined in your own code, so it is just a function. That may cause the interpreter to treat it as a static method instead of an instance method.
I came across a peculiar behaviour of functools.update_wrapper: it overwrites the __dict__ of the wrapper object with that of the wrapped object - which may hinder its use when nesting decorators.
As a simple example, assume that we are writing a decorator class that caches data in memory and another decorator class that caches data to a file. The following example demonstrates this (I kept the example brief and omitted all caching logic, but I hope it demonstrates the question):
import functools

class cached:
    cache_type = 'memory'

    def __init__(self, fcn):
        super().__init__()
        self.fcn = fcn
        functools.update_wrapper(self, fcn, updated=())

    def __call__(self, *args):
        print("Retrieving from", type(self).cache_type)
        return self.fcn(*args)

class diskcached(cached):
    cache_type = 'disk'

@cached
@diskcached
def expensive_function(what):
    print("expensive_function working on", what)

expensive_function("Expensive Calculation")
This example works as intended - its output is
Retrieving from memory
Retrieving from disk
expensive_function working on Expensive Calculation
However, it took me a long time to make this work - at first, I had not included the updated=() argument in the functools.update_wrapper call. When it is left out, nesting the decorators does not work - in that case, the output is
Retrieving from memory
expensive_function working on Expensive Calculation
I.e. the outer decorator directly calls the innermost wrapped function. The reason for this (which took me a while to understand) is that functools.update_wrapper updates the __dict__ attribute of the wrapper with the __dict__ attribute of the wrapped argument - which short-circuits the inner decorator, unless one adds the updated=() argument.
My question: is this behaviour intended - and why? (Python 3.7.1)
Making a wrapper function look like the function it wraps is the point of update_wrapper, and that includes __dict__ entries. It doesn't replace the __dict__; it calls update.
If update_wrapper didn't do this, then if one decorator set attributes on a function and another decorator wrapped the modified function:
@decorator_with_update_wrapper
@decorator_that_sets_attributes
def f(...):
    ...
the wrapper function wouldn't have the attributes set, rendering it incompatible with the code that looks for those attributes.
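Here is a minimal sketch of that scenario (the decorator names and the is_tagged attribute are made up for illustration):

import functools

def decorator_that_sets_attributes(fcn):
    # Inner decorator: tags the function with an attribute.
    fcn.is_tagged = True
    return fcn

def decorator_with_update_wrapper(fcn):
    def wrapper(*args, **kwargs):
        return fcn(*args, **kwargs)
    functools.update_wrapper(wrapper, fcn)
    return wrapper

@decorator_with_update_wrapper
@decorator_that_sets_attributes
def f():
    pass

print(f.is_tagged)  # True - update_wrapper copied the __dict__ entry onto the wrapper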
I am new to Django. It would be really helpful if someone could explain the difference between the @cached_property and @lru_cache decorators in Django.
Also, when should I use which decorator in Django? Use cases would be really helpful.
Thanks.
First and foremost, lru_cache is a decorator provided by the Python language itself as of version 3.2; cached_property is a decorator provided by Django for many years, while only being added to the Python language in version 3.8, in October 2019. That being said, they are similar.
lru_cache is specifically useful in functional programming. What it does is save the results of function calls made with a given set of parameters. When a function decorated with lru_cache is called multiple times with the same parameters, the decorator just returns a cached result. This employs a technique called memoization, a form of dynamic programming. Using it, you can drastically speed up code that repeatedly calls computationally expensive functions.
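The classic illustration of memoization (a minimal sketch):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Without the cache this recomputes subproblems exponentially often;
    # with it, each fib(k) is computed exactly once.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # 354224848179261915075, computed near-instantly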
A related replacement policy is LFU (least frequently used), though Python's standard library does not provide an lfu_cache. Both policies accomplish memoization but differ in how entries are evicted. lru_cache (least recently used) will fill its cache and then have to kick something out on the next decorated function call; its policy dictates that the least recently used entry gets replaced by the new data. An LFU policy instead evicts the entry that is used least often.
cached_property is similar to lru_cache in the sense that it caches the result of expensive function calls. The difference here is that it can only be used on methods, i.e. functions that belong to an object, and only on methods that take no parameters aside from self. You would specifically want to use it during Django development for a method on a class that hits the database. The Django docs mention its usage on a model class which has a property method friends. This method presumably hits the database to gather a set of people who are friends of that instance of Person. Because calls to the database are expensive, we'd want to cache that result for later use.
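A sketch along the lines of the Django docs' example (the exact query shown here is an assumption, not the docs' code):

from django.db import models
from django.utils.functional import cached_property

class Person(models.Model):
    name = models.CharField(max_length=100)

    @cached_property
    def friends(self):
        # Hypothetical expensive query; the result is computed on first
        # access, stored on the instance, and reused afterwards.
        return list(Person.objects.exclude(pk=self.pk))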
A major difference is that lru_cache keeps the objects in its cache alive, which might lead to a memory leak, especially if the instances on which lru_cache is applied are big (see: https://bugs.python.org/issue19859):
import functools

class A:
    @property
    @functools.lru_cache(maxsize=None)
    def x(self):
        return 123

for _ in range(100):
    A().x  # Call lru_cache on 100 different `A` instances

# The instances of `A()` are never garbage-collected:
assert A.x.fget.cache_info().currsize == 100
With cached_property, there is no shared cache (the result is stored on the instance itself), so there is no memory leak:
import functools

class B:
    @functools.cached_property
    def x(self):
        return 123

b = B()
print(vars(b))  # {}
b.x
print(vars(b))  # {'x': 123}
del b  # b is garbage-collected
Another difference is that @property attributes are read-only while @cached_property attributes are not: cached_property allows writes to the attribute (refer to the Python docs).
A().x = 123
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# AttributeError: can't set attribute

B().x = 123  # Works
This is due to the fact that @cached_property replaces the attribute with the computed value, so the second access to b.x bypasses the B.x descriptor's __get__ call.
Another difference, which likely doesn't matter in most cases, is that cached_property is more performant if you access the same attribute multiple times, while lru_cache has overhead for the function call and attribute lookup. Note the difference is only visible with huge numbers of accesses:
import timeit

[A().x for _ in range(10_000)]
[B().x for _ in range(10_000)]

a = A()
b = B()
print(timeit.timeit(lambda: a.x, number=1_000_000))  # ~0.83
print(timeit.timeit(lambda: b.x, number=1_000_000))  # ~0.57
They serve different purposes.
lru_cache saves the results of the most recent calls - you should specify maxsize, which determines how many results of your function can be saved. Once you surpass this number, the 'oldest' result is discarded and the new one is saved.
cached_property just computes the result once and saves it. It doesn't take arguments, unlike lru_cache (you can think of it as an lru_cache on a method with no arguments and maxsize=1).
We all know that functions are objects as well. But how do function objects compare with functions? What are the differences between function objects and functions?
By function object I mean g so defined:
class G(object):
    def __call__(self, a):
        pass

g = G()
By function I mean this:
def f(a):
    pass
Python creates function objects for you when you use a def statement, or you use a lambda expression:
>>> def foo(): pass
...
>>> foo
<function foo at 0x106aafd70>
>>> lambda: None
<function <lambda> at 0x106d90668>
So whenever you define a function in Python, an object like this is created. There is no way to define a function without creating a function object.
TL;DR: any object that implements __call__ can be called, e.g. functions, instances of custom classes, etc.
Slightly longer version:
The full answer to your question about what's different sits within the implementation of the Python virtual machine, so we must take a look at Python under the hood.
First comes the concept of a code object. Python parses whatever you throw at it into its own internal language, the same across all platforms, known as bytecode. A very visible manifestation of this is the .pyc file you get after importing a custom library you wrote. These are the raw instructions for the Python VM. Ignoring how these instructions are created from your source code, they are executed by PyEval_EvalFrameEx in Python/ceval.c. The source code is a bit of a beast, but it ultimately works like a simple processor with some of the complicated bits abstracted away. The bytecode is the assembly language for this processor.
In particular, one of the "opcodes" for this "processor" is (aptly named) CALL_FUNCTION. The callback goes through a number of calls, eventually getting to PyObject_Call(). This function takes a pointer to a PyObject, extracts the tp_call attribute from its type, and calls it directly (technically, it checks whether it's there first):
...
call = func->ob_type->tp_call //func is an arg of PyObject_Call() and is a pointer to a PyObject
...
result = (*call)(func, arg, kw);
Any object that implements __call__ is given a tp_call attribute with a pointer to the actual function. I believe that is handled by the slotdefs[] definition in Objects/typeobject.c:
FLSLOT("__call__", tp_call, slot_tp_call, (wrapperfunc)wrap_call,
"__call__($self, /, *args, **kwargs)\n--\n\nCall self as a function.",
PyWrapperFlag_KEYWORDS)
The __call__ method itself for functions is defined in the CPython implementation, and it defines how the Python VM should start executing the bytecode for that function and how data should be returned (if any). When you give an arbitrary class a __call__ method, the attribute is a function object that again refers back to the CPython implementation of __call__. Therefore, when you call a "normal" function, foo.__call__ is referenced. When you call a callable class instance, self.__call__ is equivalent to foo, and the actual CPython reference called is self.__call__.im_func.__call__.
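A small illustration of that lookup (reprs abbreviated; the exact text varies by Python version):

def foo():
    pass

class Bar:
    def __call__(self):
        pass

bar = Bar()
print(foo.__call__)  # <method-wrapper '__call__' of function object at ...>
print(bar.__call__)  # <bound method Bar.__call__ of <__main__.Bar object at ...>>

foo()  # dispatches through type(foo)'s tp_call
bar()  # dispatches through type(bar)'s tp_call, which finds Bar.__call__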
Disclaimer: this has been a journey into somewhat uncharted waters for me, and it's entirely possible I have misrepresented some of the finer points of the implementation. I mainly took from this blog post on how Python callables work under the hood, and from some digging of my own through the Python source code.
I am learning Python, step by step. Today is about object-oriented programming. I know how to create and use simple classes, but something bugs me: most of the objects I use in Python do not require me to call a constructor.
How can this work? Or is the constructor called implicitly? Example:
>>> import xml.etree.ElementTree as etree
>>> tree = etree.parse('examples/feed.xml')
>>> root = tree.getroot()
>>> root
<Element {http://www.w3.org/2005/Atom}feed at cd1eb0>
(from http://www.diveinto.org/python3/xml.html#xml-parse)
I would have gone this way (which actually works):
>>> import xml.etree.ElementTree as etree
>>> tree = etree.ElementTree() # instantiate object
>>> tree.parse('examples/feed.xml')
I'd like to use this way of programming (not calling the constructor, or at least calling it implicitly) in my own project, but I can't figure out how it really works.
Thanks
etree.parse is a factory function. Its purpose is mainly to be a convenient way of constructing objects (instances). As you can easily verify by looking at the source, the parse function does almost exactly what you do in your second example, except that it omits a line of code or two.
In this case, what's happening is that the etree.parse() function is creating the ElementTree object for you, and returning it. That's why you don't have to call a constructor yourself; it's wrapped up in the parse function. That function creates an ElementTree instance, parses the data and modifies the new object to correctly represent the parsed information. Then it returns the object, so you can use it (in fact, if you look at the source, it does essentially what you wrote in your second example).
This is a pretty common idiom in object-oriented programming. Broadly speaking, it's called a factory function. Basically, especially for complex objects, a lot of work is required to create a useful instance of the object. So, rather than pack a lot of logic into the object's constructor, it's cleaner to make one or more factory functions to create the object and configure it as needed. This means that someone developing with the library may have several clean, simple ways to instantiate the class, even if "under the hood" that instantiation may be complex.
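A sketch of what such a factory looks like (this approximates the stdlib's etree.parse; the real source differs slightly):

from xml.etree.ElementTree import ElementTree

def parse(source):
    # Factory function: construct, configure, and return the instance.
    tree = ElementTree()
    tree.parse(source)
    return tree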
In the first case, you are calling a helper function from the module. It's not a class method (though internally it might create an object and then call its method). In the second case, you are instantiating an object and then calling its method.
For a class named ClassName, calling ClassName() implicitly calls __init__() and returns a new instance of ClassName.
If __init__ is not defined in ClassName, the super's __init__ will be called.
In your case, this is all inside a function:
def name(foo):
    return ClassName(foo)

n = name("bar")  # a function call returns a new instance
In my code I'm trying to take copies of instances of a class using copy.deepcopy. The problem is that under some circumstances it errors with the following error:
TypeError: 'object.__new__(NotImplementedType) is not safe, use NotImplementedType.__new__()'
After much digging I have found that I am able to reproduce the error using the following code:
import copy
copy.deepcopy(__builtins__)
The problem appears to be that at some point it is trying to copy the NotImplementedType built-in. The question is: why is it doing this? I have not overridden __deepcopy__ in my class, and it doesn't happen all the time. Does anyone have any tips for tracking down where the request to make a copy of this type comes from?
I've put some debugging code in the copy module itself to ensure that this is what's happening, but the point at which the problem occurs is so far down a recursive stack it's very hard to make much of what I'm seeing.
In the end I did some digging in the copy source code and came up with the following solution:
from copy import deepcopy, _deepcopy_dispatch
from types import ModuleType
class MyType(object):
    def __init__(self):
        self.module = __builtins__

    def copy(self):
        ''' Patch the deepcopy dispatcher to pass modules back unchanged '''
        _deepcopy_dispatch[ModuleType] = lambda x, m: x
        result = deepcopy(self)
        del _deepcopy_dispatch[ModuleType]
        return result

MyType().copy()
I realise this uses a private API, but I couldn't find another clean way of achieving the same thing. I did a quick search on the web and found that other people had used the same API without any bother. If it changes in the future, I'll take the hit.
I'm also aware that this is not thread-safe (if a thread needed the old behaviour while I was doing a copy on another thread, I'd be in trouble), but again it's not a problem for me right now.
Hope that helps someone else out at some point.
You could override the __deepcopy__ method (from the Python documentation):
In order for a class to define its own copy implementation, it can define special methods __copy__() and __deepcopy__(). The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the memo dictionary. If the __deepcopy__() implementation needs to make a deep copy of a component, it should call the deepcopy() function with the component as first argument and the memo dictionary as second argument.
Otherwise you could save the modules in a global list or something else.
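A minimal sketch of such an override (the attribute names are assumptions; note that it registers itself in the memo dictionary to handle cycles):

import copy
from types import ModuleType

class MyType:
    def __init__(self):
        self.module = __builtins__  # in a script, this is the builtins module
        self.data = [1, 2, 3]

    def __deepcopy__(self, memo):
        new = type(self).__new__(type(self))
        memo[id(self)] = new
        for name, value in self.__dict__.items():
            if isinstance(value, ModuleType):
                new.__dict__[name] = value  # share modules instead of copying them
            else:
                new.__dict__[name] = copy.deepcopy(value, memo)
        return new

clone = copy.deepcopy(MyType())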
You can override the deepcopy behavior of the class that holds a reference to a module by using the pickle protocol, which is supported by the copy module, as stated here. In particular, you can define __getstate__ and __setstate__ for that class. E.g.:
>>> class MyClass:
... def __getstate__(self):
... state = self.__dict__.copy()
... del state['some_module']
... return state
... def __setstate__(self, state):
... self.__dict__.update(state)
... self.some_module = some_module
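Hypothetical usage of the sketch above (some_module stands in for whatever module attribute the instance holds):

>>> import copy
>>> import os as some_module  # stand-in for the module attribute
>>> obj = MyClass()
>>> obj.some_module = some_module
>>> obj.value = 42
>>> clone = copy.deepcopy(obj)  # no longer tries to deepcopy the module
>>> clone.value
42
>>> clone.some_module is some_module
True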