Difference between #cached_property and #lru_cache decorator - python

I am new to Django. Would be really helpful if someone can tell the difference between #cached_property and #lru_cache decorator in Django.
Also when should I use which decorator in Django. Use cases would be really helpful.
Thanks.

First and foremost, lru_cache is a decorator provided by the Python language itself as of version 3.4; cached_property is a decorator provided by Django for many years, while only being added to the Python language in version 3.8 in October 2019. That being said, they are similar.
lru_cache is specifically useful in functional programming. What it does is saves the results of function calls with a certain set of parameters. When a function decorated with lru_cache is called multiple times with the same parameters, the decorator will just return a cached result of the function result. This employs a method of programming called dynamic programming, and more specifically, memoization. Using these methods, you can drastically speed up code which repeatedly calls functions that are computationally expensive.
Python also provides another similar decorator called lfu_cache. Both of these decorators accomplish memoization, however with different replacement policies. lru_cache (least recently used) will fill it's cache and have to kick something out during the next decorated function call. This replacement policy dictates that the least recently used entry gets replaced by the new data. lfu_cache (least frequently used) dictates that replacements happen based on which entries are used the least.
cached_property is similar to lru_cache in the sense that it caches the result of expensive function calls. The only difference here is that it can only be used on methods, meaning the functions belong to an object. Furthermore, they can only be used on methods that have no other parameters aside from self. You would specifically want to use this during django development for a method on a class that hits the database. The Django docs mention its usage on a model class which has a property method friends. This method presumably hits the database to gather a set of people who are friends of that instance of Person. Because calls to the database are expensive, we'd want to cache that result for later use.

A major difference is that lru_cache will keep alive the objects in the cache, which might lead to memory leak, especially if the instance in which the lru_cache is applied is big (see: https://bugs.python.org/issue19859)
class A:
#property
#functools.lru_cache(maxsize=None)
def x(self):
return 123
for _ in range(100):
A().x # Call lru_cache on 100 different `A` instances
# The instances of `A()` are never garbage-collected:
assert A.x.fget.cache_info().currsize == 100
With cached_property, there is no cache, so no memory leak.
class B:
#functools.cached_property
def x(self):
return 123
b = B()
print(vars(b)) # {}
b.x
print(vars(b)) # {'x': 123}
del b # b is garbage-collected
Another difference is that #property are read-only while #cached_property are not. cache_property allows writes to the attributes Refer Python docs
A().x = 123
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
B().x = 123 # Works
This is due to the fact that #cached_property are replacing the attribute, so the second call to b.x bypass the B.x.get descriptor call.
Another difference which likely don't matter in most cases is that cached_property is more performant if you access the same attribute multiple times, while lru_cache has overhead for the function call and attribute lookup. Note the difference is only visible with huge numbers.
[A().x for _ in range(10_000)]
[B().x for _ in range(10_000)]
a = A()
b = B()
print(timeit.timeit(lambda: a.x, number=1_000_000)) # ~0.83
print(timeit.timeit(lambda: b.x, number=1_000_000)) # ~0.57

They serve different purposes.
lru_cache saves the least recent uses - you should specify maxsize which distinguishes how many computations of your function you can save. Once you surpass this number, the 'oldest' result is discarded and the new one is saved.
cached_property just computes the result and saves it. It doesn't take arguments unlike lru_cache (you can think of it as a lru_cache on an object type with maxsize = 1 with no arguments).

Related

Python LRU cache's global and per instance behavioral differences

I'm going through the implementation details of Python's LRU cache decorator. To understand the behavior of the lru_cache decorator in different scenarios properly, I've also gone through the following SO answers:
Python LRU Cache Decorator Per Instance
Python: building an LRU cache
Python LRU cache in a class disregards maxsize limit when decorated with a staticmethod or classmethod decorator
So far, I can tell that the caching behaviors are different in these 3 scenarios:
Decorating a function in the global namespace.
Decorating an instance method in a class.
Decorating a method in a class that is later on decorated with a staticmethod or classmethod decorator.
The first case is the happy path where each function decorated with the lru_cache decorator has its own cache. This is already well documented. In the second case, the cache is shared among multiple instances of the class where each instance will have different keys for the same argument of the instance method. This is explained quite well in the last question that I've listed. In the third case, the cache is also shared among multiple instances of the encapsulating class. However, since static method or class method don't take self as their first argument, the instances of the class won't create separate cache entries for the same arguments.
My question is—what implementation detail defines this behavior? In the implementation of lru_cache function, I can only see that a local cache dictionary inside the _lru_cache_wrapper function, is saving up the cache entries. Here's the snippet:
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
sentinel = object()
make_key = _make_key
PREV, NEXT, KEY, RESULT = 0, 1, 2, 3
cache = {} # This is a local dict,
# then how come the instance cache entries are shared?
What I don't understand is how is this local cache dictionary is shared among the instances of a class when the lru_cache decorator is applied to a method that resides in the class? I expected it to act the same as the first case where every entity has its own cache and nothing is shared.

Instance method aliases to builtin-functions in Python

Trying to code as efficient as possible object-oriented implementation of priority queue in Python, I faced an interesting behavior. The following code works fine
from heapq import heappush
class PriorityQueue(list):
__slots__ = ()
def push(self, item):
heappush(self, item)
However, I really didn’t want to write a wrapper method for calling heappush, as it incurs additional overhead for calling the function. I reasoned that since the heappush signature uses list as the first argument, while aliasing the push class attribute with the heappush function, the latter becomes a full-fledged class instance method. However, my assumption turned out to be false, and the following code gives an error.
from heapq import heappush
class PriorityQueue(list):
__slots__ = ()
push = heappush
PriorityQueue().push(0)
# TypeError: heappush expected 2 arguments, got 1
But going to cpython heapq source code, just copying heappush implementation into the scope and applying the same logic works fine.
from heapq import _siftdown
def heappush(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
_siftdown(heap, 0, len(heap) - 1)
class PriorityQueue(list):
__slots__ = ()
push = heappush
pq = PriorityQueue()
pq.push(0)
pq.push(-1)
pq.push(3)
print(pq)
# [-1, 0, 3]
The first question: Why does it happen? How does Python decide which function is appropriate for binding as an instance method and which is not?
The second question: What is the difference between heappush in the cpython/Lib/heapq.py and the actual heappush from the heapq module? They are actually different since the following code gives an error
from dis import dis
from heapq import heappush
dis(heappush)
# TypeError: don't know how to disassemble builtin_function_or_method objects
The third question: How can one force Python to bind native heappush as an instance method? Some metaclass magic?
Thank you!
What takes place is that Python offers pure Python implementations of a lot of its algorithms in the standard library even when it contains acceletated native code implementations of the same algorithms.
The heapq library is one of those - if you pick the file you link to, but close to the end, you will see the code snippet which looks if the native version is available, and overwrites the Python version, which has the code you copy and pasted - https://github.com/python/cpython/blob/76cd81d60310d65d01f9d7b48a8985d8ab89c8b4/Lib/heapq.py#L580
try:
from _heapq import *
except ImportError:
pass
...
The native version of heappush is loaded into the module, and there is no easy way of getting a reference to the original Python function, short of getting to the actual file source code.
Now, the point: why do native functions do not work as class methods?
heappush's type is builtin_function_or_method, in constrast with function for pure Python functions - and one of the major diference is that the second object type features a __get__ method. This __get__ makes Python defined functions work as "descriptors": the __get__ method is called when one retrieves the attribute from an instance. For ordinary functions, this call records the self parameter and injects it when the actual function is called.
Thus, it is easy to write an "instancemethod" decorator that will make built-in functions work as Python functions and usable as methods. However, the overhead of creating a partial or a lambda function should surpass the overhead of the extra function call you are trying to eliminate - so you should get no speed gains from it, although it might still read as more elegant:
class instancemethod:
def __init__(self, func):
self.func = func
def __get__(self, instance, owner):
return lambda *args, **kwargs: self.func(instance, *args, **kwargs)
import heapq
class MyHeap(list):
push = instancemethod(heapq.heappush)
Maybe it the way python calls a function. When you try print(type(heappush)) you will notice the difference.
For question 1, the decorator that use to identify which function is which type (i.e. staticmethod, classmethod) is like call and process the function and return the processed one to that name. So the data that determine that should be in some attribute of the function. After I find where it is, question 3 may be solved.
For question 2. When you import the built-in function, it will be in the type of builtin_function_or_method. But if you copy and paste it, it was defined in your code so it's just function. That may cause the interpreter to call it a static method instead of an instance method.

Should I use python magic methods directly?

I heard from one guy that you should not use magic methods directly. and I think in some use cases I would have to use magic methods directly. So experienced devs, should I use python magic methods directly?
I intended to show some benefits of not using magic methods directly:
1- Readability:
Using built-in functions like len() is much more readable than its relevant magic/special method __len__(). Imagine a source code full of only magic methods instead of built-in function... thousands of underscores...
2- Comparison operators:
class C:
def __lt__(self, other):
print('__lt__ called')
class D:
pass
c = C()
d = D()
d > c
d.__gt__(c)
I haven't implemented __gt__ for neither of those classes, but in d > c when Python sees that class D doesn't have __gt__, it checks to see if class C implements __lt__. It does, so we get '__lt__ called' in output which isn't the case with d.__gt__(c).
3- Extra checks:
class C:
def __len__(self):
return 'boo'
obj = C()
print(obj.__len__()) # fine
print(len(obj)) # error
or:
class C:
def __str__(self):
return 10
obj = C()
print(obj.__str__()) # fine
print(str(obj)) # error
As you see, when Python calls that magic methods implicitly, it does some extra checks as well.
4- This is the least important but using let's say len() on built-in data types such as str gives a little bit of speed as compared to __len__():
from timeit import timeit
string = 'abcdefghijklmn'
print(timeit("len(string)", globals=globals(), number=10_000_000))
print(timeit("string.__len__()", globals=globals(), number=10_000_000))
output:
0.5442426
0.8312854999999999
It's because of the lookup process(__len__ in the namespace), If you create a bound method before timing, it's gonna be faster.
bound_method = string.__len__
print(timeit("bound_method()", globals=globals(), number=10_000_000))
I'm not a senior developer, but my experience says that you shouldn't call magic methods directly.
Magic methods should be used to override a behavior on your object. For example, if you want to define how does your object is built, you override __init__. Afterwards when you want to initialize it, you use MyNewObject() instead of MyNewObject.__init__().
For me, I tend to appreciate the answer given by Alex Martelli here:
When you see a call to the len built-in, you're sure that, if the program continues after that rather than raising an exception, the call has returned an integer, non-negative, and less than 2**31 -- when you see a call to xxx.__len__(), you have no certainty (except that the code's author is either unfamiliar with Python or up to no good;-).
If you want to know more about Python's magic methods, I strongly recommend taking a look on this documentation made by Rafe Kettler: https://rszalski.github.io/magicmethods/
No you shouldn't.
it's ok to be used in quick code problems like in hackerrank but not in production code. when I asked this question I used them as first class functions. what I mean is, I used xlen = x.__mod__ instead of xlen = lamda y: x % y which was more convenient. it's ok to use these kinda snippets in simple programs but not in any other case.

Why does functools.update_wrapper update the __dict__ in the wrapper object?

I came across a peculiar behaviour of functools.update_wrapper: it overwrites the __dict__ of the wrapper object by that of the wrapped object - which may hinder its use when nesting decorators.
As a simple example, assume that we are writing a decorator class that caches data in memory and another decorator class that caches data to a file. The following example demonstrates this (I made the example brief and omitted all cacheing logic, but I hope that it demonstrates the question):
import functools
class cached:
cache_type = 'memory'
def __init__(self, fcn):
super().__init__()
self.fcn = fcn
functools.update_wrapper(self, fcn, updated=())
def __call__(self, *args):
print("Retrieving from", type(self).cache_type)
return self.fcn(*args)
class diskcached(cached):
cache_type = 'disk'
#cached
#diskcached
def expensive_function(what):
print("expensive_function working on", what)
expensive_function("Expensive Calculation")
This example works as intended - its output is
Retrieving from memory
Retrieving from disk
expensive_function working on Expensive Calculation
However, it took me long to make this work - at first, I hat not included the 'updated=()' argument in the functools.update_wrapper call. But when this is left out, then nesting the decorators does not work - in this case, the output is
Retrieving from memory
expensive_function working on Expensive Calculation
I.e. the outer decorator directly calls the innermost wrapped function. The reason (which took me a while to understand) for this is that functools.update_wrapper updates the __dict__ attribute of the wrapper to the __dict__ attribute of the wrapped argument - which short-circuits the inner decorator, unless one adds the updated=() argument.
My question: is this behaviour intended - and why? (Python 3.7.1)
Making a wrapper function look like the function it wraps is the point of update_wrapper, and that includes __dict__ entries. It doesn't replace the __dict__; it calls update.
If update_wrapper didn't do this, then if one decorator set attributes on a function and another decorator wrapped the modified function:
#decorator_with_update_wrapper
#decorator_that_sets_attributes
def f(...):
...
the wrapper function wouldn't have the attributes set, rendering it incompatible with the code that looks for those attributes.

Python's equivalent of .Net's sealed class

Does python have anything similar to a sealed class? I believe it's also known as final class, in java.
In other words, in python, can we mark a class so it can never be inherited or expanded upon? Did python ever considered having such a feature? Why?
Disclaimers
Actually trying to understand why sealed classes even exist. Answer here (and in many, many, many, many, many, really many other places) did not satisfy me at all, so I'm trying to look from a different angle. Please, avoid theoretical answers to this question, and focus on the title! Or, if you'd insist, at least please give one very good and practical example of a sealed class in csharp, pointing what would break big time if it was unsealed.
I'm no expert in either language, but I do know a bit of both. Just yesterday while coding on csharp I got to know about the existence of sealed classes. And now I'm wondering if python has anything equivalent to that. I believe there is a very good reason for its existence, but I'm really not getting it.
You can use a metaclass to prevent subclassing:
class Final(type):
def __new__(cls, name, bases, classdict):
for b in bases:
if isinstance(b, Final):
raise TypeError("type '{0}' is not an acceptable base type".format(b.__name__))
return type.__new__(cls, name, bases, dict(classdict))
class Foo:
__metaclass__ = Final
class Bar(Foo):
pass
gives:
>>> class Bar(Foo):
... pass
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 5, in __new__
TypeError: type 'Foo' is not an acceptable base type
The __metaclass__ = Final line makes the Foo class 'sealed'.
Note that you'd use a sealed class in .NET as a performance measure; since there won't be any subclassing methods can be addressed directly. Python method lookups work very differently, and there is no advantage or disadvantage, when it comes to method lookups, to using a metaclass like the above example.
Before we talk Python, let's talk "sealed":
I, too, have heard that the advantage of .Net sealed / Java final / C++ entirely-nonvirtual classes is performance. I heard it from a .Net dev at Microsoft, so maybe it's true. If you're building a heavy-use, highly-performance-sensitive app or framework, you may want to seal a handful of classes at or near the real, profiled bottleneck. Particularly classes that you are using within your own code.
For most applications of software, sealing a class that other teams consume as part of a framework/library/API is kinda...weird.
Mostly because there's a simple work-around for any sealed class, anyway.
I teach "Essential Test-Driven Development" courses, and in those three languages, I suggest consumers of such a sealed class wrap it in a delegating proxy that has the exact same method signatures, but they're override-able (virtual), so devs can create test-doubles for these slow, nondeterministic, or side-effect-inducing external dependencies.
[Warning: below snark intended as humor. Please read with your sense of humor subroutines activated. I do realize that there are cases where sealed/final are necessary.]
The proxy (which is not test code) effectively unseals (re-virtualizes) the class, resulting in v-table look-ups and possibly less efficient code (unless the compiler optimizer is competent enough to in-line the delegation). The advantages are that you can test your own code efficiently, saving living, breathing humans weeks of debugging time (in contrast to saving your app a few million microseconds) per month... [Disclaimer: that's just a WAG. Yeah, I know, your app is special. ;-]
So, my recommendations: (1) trust your compiler's optimizer, (2) stop creating unnecessary sealed/final/non-virtual classes that you built in order to either (a) eke out every microsecond of performance at a place that is likely not your bottleneck anyway (the keyboard, the Internet...), or (b) create some sort of misguided compile-time constraint on the "junior developers" on your team (yeah...I've seen that, too).
Oh, and (3) write the test first. ;-)
Okay, yes, there's always link-time mocking, too (e.g. TypeMock). You got me. Go ahead, seal your class. Whatevs.
Back to Python: The fact that there's a hack rather than a keyword is probably a reflection of the pure-virtual nature of Python. It's just not "natural."
By the way, I came to this question because I had the exact same question. Working on the Python port of my ever-so-challenging and realistic legacy-code lab, and I wanted to know if Python had such an abominable keyword as sealed or final (I use them in the Java, C#, and C++ courses as a challenge to unit testing). Apparently it doesn't. Now I have to find something equally challenging about untested Python code. Hmmm...
Python does have classes that can't be extended, such as bool or NoneType:
>>> class ExtendedBool(bool):
... pass
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: type 'bool' is not an acceptable base type
However, such classes cannot be created from Python code. (In the CPython C API, they are created by not setting the Py_TPFLAGS_BASETYPE flag.)
Python 3.6 will introduce the __init_subclass__ special method; raising an error from it will prevent creating subclasses. For older versions, a metaclass can be used.
Still, the most “Pythonic” way to limit usage of a class is to document how it should not be used.
Similar in purpose to a sealed class and useful to reduce memory usage (Usage of __slots__?) is the __slots__ attribute which prevents monkey patching a class. Because when the metaclass __new__ is called, it is too late to put a __slots__ into the class, we have to put it into the namespace at the first possible timepoint, i.e. during __prepare__. Additionally, this throws the TypeError a little bit earlier. Using mcs for the isinstance comparison removes the necessity to hardcode the metaclass name in itself. The disadvantage is that all unslotted attributes are read-only. Therefore, if we want to set specific attributes during initialization or later, they have to slotted specifically. This is feasible e.g. by using a dynamic metaclass taking slots as an argument.
def Final(slots=[]):
if "__dict__" in slots:
raise ValueError("Having __dict__ in __slots__ breaks the purpose")
class _Final(type):
#classmethod
def __prepare__(mcs, name, bases, **kwargs):
for b in bases:
if isinstance(b, mcs):
msg = "type '{0}' is not an acceptable base type"
raise TypeError(msg.format(b.__name__))
namespace = {"__slots__":slots}
return namespace
return _Final
class Foo(metaclass=Final(slots=["_z"])):
y = 1
def __init__(self, z=1):
self.z = 1
#property
def z(self):
return self._z
#z.setter
def z(self, val:int):
if not isinstance(val, int):
raise TypeError("Value must be an integer")
else:
self._z = val
def foo(self):
print("I am sealed against monkey patching")
where the attempt of overwriting foo.foo will throw AttributeError: 'Foo' object attribute 'foo' is read-only and attempting to add foo.x will throw AttributeError: 'Foo' object has no attribute 'x'. The limiting power of __slots__ would be broken when inheriting, but because Foo has the metaclass Final, you can't inherit from it. It would also be broken when dict is in slots, so we throw a ValueError in case. To conclude, defining setters and getters for slotted properties allows to limit how the user can overwrite them.
foo = Foo()
# attributes are accessible
foo.foo()
print(foo.y)
# changing slotted attributes is possible
foo.z = 2
# %%
# overwriting unslotted attributes won't work
foo.foo = lambda:print("Guerilla patching attempt")
# overwriting a accordingly defined property won't work
foo.z = foo.foo
# expanding won't work
foo.x = 1
# %% inheriting won't work
class Bar(Foo):
pass
In that regard, Foo could not be inherited or expanded upon. The disadvantage is that all attributes have to be explicitly slotted, or are limited to a read-only class variable.
Python 3.8 has that feature in the form of the typing.final decorator:
class Base:
#final
def done(self) -> None:
...
class Sub(Base):
def done(self) -> None: # Error reported by type checker
...
#final
class Leaf:
...
class Other(Leaf): # Error reported by type checker
See https://docs.python.org/3/library/typing.html#typing.final

Categories