I'm going through the implementation details of Python's LRU cache decorator. To understand the behavior of the lru_cache decorator in different scenarios properly, I've also gone through the following SO answers:
Python LRU Cache Decorator Per Instance
Python: building an LRU cache
Python LRU cache in a class disregards maxsize limit when decorated with a staticmethod or classmethod decorator
So far, I can tell that the caching behaviors are different in these 3 scenarios:
Decorating a function in the global namespace.
Decorating an instance method in a class.
Decorating a method in a class that is later on decorated with a staticmethod or classmethod decorator.
The first case is the happy path where each function decorated with the lru_cache decorator has its own cache. This is already well documented. In the second case, the cache is shared among multiple instances of the class, where each instance produces different keys for the same argument of the instance method. This is explained quite well in the last question that I've listed. In the third case, the cache is also shared among multiple instances of the encapsulating class. However, since static methods and class methods don't take self as their first argument, the instances of the class won't create separate cache entries for the same arguments.
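For example, a minimal sketch of the second case (the Worker class and compute method here are just placeholders, not code from the linked questions) behaves like this:

from functools import lru_cache

class Worker:
    @lru_cache(maxsize=None)
    def compute(self, n):
        # All Worker instances share this one wrapper and its cache,
        # but `self` is part of every cache key.
        return n * n

a, b = Worker(), Worker()
a.compute(3)
b.compute(3)
print(Worker.compute.cache_info().currsize)  # 2: one entry per instance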
My question is: what implementation detail defines this behavior? In the implementation of the lru_cache function, I can only see a local cache dictionary inside the _lru_cache_wrapper function saving up the cache entries. Here's the snippet:
def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    sentinel = object()
    make_key = _make_key
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3
    cache = {}  # This is a local dict,
                # then how come the instance cache entries are shared?
What I don't understand is how this local cache dictionary is shared among the instances of a class when the lru_cache decorator is applied to a method that resides in the class. I expected it to act the same as the first case, where every entity has its own cache and nothing is shared.
Related
I am trying to make a class that wraps a value that will be used across multiple other objects. For computational reasons, the aim is for this wrapped value to only be calculated once and the reference to the value passed around to its users. I don't believe this is possible in vanilla python due to its object container model. Instead, my approach is a wrapper class that is passed around, defined as follows:
from typing import Any

class DynamicProperty():
    def __init__(self, value=None):
        # Value of the property
        self.value: Any = value

    def __repr__(self):
        # Use value's repr instead
        return repr(self.value)

    def __getattr__(self, attr):
        # Doesn't exist in wrapper, get it from the value
        # instead
        return getattr(self.value, attr)
The following works as expected:
wrappedString = DynamicProperty("foo")
wrappedString.upper() # 'FOO'
wrappedFloat = DynamicProperty(1.5)
wrappedFloat.__add__(2) # 3.5
However, implicitly calling __add__ through normal syntax fails:
wrappedFloat + 2 # TypeError: unsupported operand type(s) for
                 # +: 'DynamicProperty' and 'int'
Is there a way to intercept these implicit method calls without explicitly defining magic methods for DynamicProperty to call the method on its value attribute?
Talking about "passing by reference" will only confuse you. Keep that terminology to languages where you can have a choice on that, and where it makes a difference. In Python you always pass objects around - and this passing is the equivalent of "passing by reference" - for all objects - from None to int to a live asyncio network connection pool instance.
With that out of the way: the algorithm the language follows to retrieve attributes from an object is complicated and has many details - implementing __getattr__ is just the tip of the iceberg. Reading the "Data Model" document in its entirety will give you a better grasp of all the mechanisms involved in retrieving attributes.
That said, here is how it works for "magic" or "dunder" methods (special methods with two underscores before and two after the name): when you use an operator that requires the existence of the method that implements it (like __add__ for +), the language checks the class of your object for the __add__ method - not the instance. And __getattr__ defined on a class only creates attributes dynamically for instances of that class, not for the class itself.
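A quick sketch of that distinction (the Wrapper class here is just an illustration, not the exact code from the question):

class Wrapper:
    def __init__(self, value):
        self.value = value

    def __getattr__(self, attr):
        # Only fires for normal attribute access that failed elsewhere
        return getattr(self.value, attr)

w = Wrapper(1.5)
print(w.__add__(2))  # 3.5 -- explicit lookup goes through __getattr__
try:
    w + 2            # the + operator looks for __add__ on type(w) directly
except TypeError as exc:
    print(exc)       # unsupported operand type(s) for +: 'Wrapper' and 'int'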
But that is not the only problem: you could create a metaclass (inheriting from type) and put a __getattr__ method on that metaclass. For all querying you would do from Python, it would look like your object had __add__ (or any other dunder method) in its class. However, for dunder methods Python does not go through the normal attribute lookup mechanism - it looks directly at the class to see whether the dunder method is "physically" there. The memory structure that holds a class has slots for each of the possible dunder methods, and each slot either refers to the corresponding method or is null (this is visible when coding in C; on the Python side, the default dir() will show these methods when they exist and omit them otherwise). If the slot is not filled, Python simply says the object does not implement that operation, period.
The way to work around that with a proxy object like you want is to create a proxy class that either features the dunder methods of the class you want to wrap, or features all possible dunder methods and, upon being called, checks whether the underlying object actually implements the called method.
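A minimal sketch of the first variant, forwarding only __add__ and __radd__ (a real proxy would cover every dunder it needs):

class Proxy:
    def __init__(self, value):
        self._value = value

    def __getattr__(self, attr):
        return getattr(self._value, attr)

    # Dunder methods must live on the class for the operator machinery to find them
    def __add__(self, other):
        return self._value + other

    def __radd__(self, other):
        return other + self._value

print(Proxy(1.5) + 2)  # 3.5
print(2 + Proxy(1.5))  # 3.5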
That is why "serious" code will rarely, if ever, offer true "transparent" proxy objects. There are exceptions, but from "Weakrefs", to "super()", to concurrent.futures, just to mention a few in the core language and stdlib, no one attempts a "fully working transparent proxy" - instead, the api is more like you call a ".value()" or ".result()" method on the wrapper to get to the original object itself.
However, it can be done, as I described above. I even have a small (long unmaintained) package on pypi that does that, wrapping a proxy for a future.
The code is at https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py
The + operator in your case does not work, because DynamicProperty does not inherit from float. See:
>>> class Foo(float):
...     pass
...
>>> Foo(1.5) + 2
3.5
So, you'll need to do some kind of dynamic inheritance:
def get_dynamic_property(instance):
    base = type(instance)
    class DynamicProperty(base):
        pass
    return DynamicProperty(instance)
wrapped_string = get_dynamic_property("foo")
print(wrapped_string.upper())
wrapped_float = get_dynamic_property(1.5)
print(wrapped_float + 2)
Output:
FOO
3.5
I came across a peculiar behaviour of functools.update_wrapper: it overwrites the __dict__ of the wrapper object with that of the wrapped object - which may hinder its use when nesting decorators.
As a simple example, assume that we are writing a decorator class that caches data in memory and another decorator class that caches data to a file. The following example demonstrates this (I made the example brief and omitted all caching logic, but I hope that it demonstrates the question):
import functools

class cached:
    cache_type = 'memory'

    def __init__(self, fcn):
        super().__init__()
        self.fcn = fcn
        functools.update_wrapper(self, fcn, updated=())

    def __call__(self, *args):
        print("Retrieving from", type(self).cache_type)
        return self.fcn(*args)

class diskcached(cached):
    cache_type = 'disk'

@cached
@diskcached
def expensive_function(what):
    print("expensive_function working on", what)

expensive_function("Expensive Calculation")
This example works as intended - its output is
Retrieving from memory
Retrieving from disk
expensive_function working on Expensive Calculation
However, it took me long to make this work - at first, I had not included the updated=() argument in the functools.update_wrapper call. But when this is left out, nesting the decorators does not work - in this case, the output is
Retrieving from memory
expensive_function working on Expensive Calculation
I.e. the outer decorator directly calls the innermost wrapped function. The reason for this (which took me a while to understand) is that functools.update_wrapper updates the __dict__ attribute of the wrapper with the __dict__ attribute of the wrapped argument - which short-circuits the inner decorator, unless one adds the updated=() argument.
My question: is this behaviour intended - and why? (Python 3.7.1)
Making a wrapper function look like the function it wraps is the point of update_wrapper, and that includes __dict__ entries. It doesn't replace the __dict__; it calls update.
If update_wrapper didn't do this, then if one decorator set attributes on a function and another decorator wrapped the modified function:
@decorator_with_update_wrapper
@decorator_that_sets_attributes
def f(...):
    ...
the wrapper function wouldn't have the attributes set, rendering it incompatible with the code that looks for those attributes.
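A sketch of that scenario (the is_special attribute is made up for illustration): with update_wrapper's default behaviour, the attribute set by the inner decorator is still visible on the outer wrapper:

import functools

def decorator_that_sets_attributes(fcn):
    fcn.is_special = True  # hypothetical attribute some other code looks for
    return fcn

def decorator_with_update_wrapper(fcn):
    def wrapper(*args, **kwargs):
        return fcn(*args, **kwargs)
    functools.update_wrapper(wrapper, fcn)  # copies __dict__ entries by default
    return wrapper

@decorator_with_update_wrapper
@decorator_that_sets_attributes
def f():
    pass

print(f.is_special)  # True -- the attribute survives the outer wrapping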
please review the following code…
class Client(MqC):
    def __init__(self, tmpl=None):
        self.ConfigSetBgError(self.BgError)
        MqC.__init__(self)

    def BgError(self):
        ...  # do some stuff…
I can add the callback BgError as a class callback or as an object callback…
(1) class:  self.ConfigSetBgError(Client.BgError)
(2) object: self.ConfigSetBgError(self.BgError)
both cases work because the callback code can handle this
the problem is the refCount of the self object… case (2) increments the refCount by ONE… so the following code shows the difference…
cl = Client()
print("refCount=" + str(sys.getrefcount(cl)))
cl = None
case (1): .tp_dealloc is called… because refCount=2
case (2): .tp_dealloc is NOT called… because refCount=3
→ so… question… how to solve this refCount cleanup issue?
If you are worried about the callback keeping the instance alive, then don't pass in a bound method. self.BgError creates a method object (through the descriptor protocol), which references the instance object because it needs to have access to that instance when you call it; that's how the self parameter is passed in in the first place.
If you don't need to reference the instance state and the callback API can handle plain functions, class methods, or static methods, then pass one of those in instead.
For example, you can make BgError a class method or static method:
@classmethod
def BgError(cls):
    # ...
Now both Client.BgError and self.BgError (instance_of_Client.BgError) produce a method object bound to the class instead of the instance, giving you consistent behaviour. No additional reference to the instance is made.
If you do need the instance state, pass in a wrapper function with a weak reference to your instance. When invoked, check if the weak reference is still available before using the instance. Also see using python WeakSet to enable a callback functionality for a more in-depth post about callbacks and weak references. There the callback registry takes care of producing and storing weak references, but the same principles apply.
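A rough sketch of that wrapper approach (make_weak_callback is a made-up helper name; ConfigSetBgError is the API from the question):

import weakref

def make_weak_callback(instance, method_name):
    ref = weakref.ref(instance)  # does not increase the instance's refcount
    def callback(*args, **kwargs):
        obj = ref()
        if obj is not None:      # the instance may already have been collected
            return getattr(obj, method_name)(*args, **kwargs)
    return callback

# Hypothetical usage inside Client.__init__:
#     self.ConfigSetBgError(make_weak_callback(self, "BgError"))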
I am new to Django. Would be really helpful if someone can tell the difference between #cached_property and #lru_cache decorator in Django.
Also when should I use which decorator in Django. Use cases would be really helpful.
Thanks.
First and foremost, lru_cache is a decorator provided by the Python standard library itself as of version 3.2; cached_property has been a decorator provided by Django for many years, and was only added to the Python language in version 3.8 in October 2019. That being said, they are similar.
lru_cache is specifically useful in functional programming. What it does is save the results of function calls made with a certain set of parameters. When a function decorated with lru_cache is called multiple times with the same parameters, the decorator just returns a cached result of the earlier call. This is a technique called memoization, a form of dynamic programming. Using it, you can drastically speed up code which repeatedly calls computationally expensive functions.
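A classic illustration (the usual Fibonacci example, nothing Django-specific):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(100)                 # fast, because intermediate results are memoized
print(fib.cache_info())  # shows hits, misses and current cache size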
A related replacement policy is LFU (least frequently used); note that the standard library only ships lru_cache, and LFU caches come from third-party packages. Both approaches accomplish memoization but differ in their replacement policies: an LRU cache (least recently used) fills its cache and, on the next decorated function call, evicts the entry that was used least recently, whereas an LFU cache evicts the entry that is used least frequently.
cached_property is similar to lru_cache in the sense that it caches the result of expensive function calls. The difference is that it can only be used on methods, i.e. functions that belong to an object, and only on methods that take no parameters other than self. You would specifically want to use it during Django development for a method on a class that hits the database. The Django docs mention its usage on a model class which has a property method friends. This method presumably hits the database to gather the set of people who are friends of that instance of Person. Because calls to the database are expensive, we'd want to cache that result for later use.
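A small sketch in the spirit of that docs example (the Person class and the print standing in for the database query are placeholders):

from functools import cached_property  # Django's django.utils.functional.cached_property behaves similarly

class Person:
    @cached_property
    def friends(self):
        print("hitting the database...")  # stands in for an expensive query
        return ["alice", "bob"]

p = Person()
p.friends  # runs the "query" once
p.friends  # served from the instance's __dict__, no second query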
A major difference is that lru_cache will keep the objects in its cache alive, which might lead to a memory leak, especially if the instances on which lru_cache is applied are big (see: https://bugs.python.org/issue19859)
import functools

class A:
    @property
    @functools.lru_cache(maxsize=None)
    def x(self):
        return 123

for _ in range(100):
    A().x  # Call lru_cache on 100 different `A` instances

# The instances of `A()` are never garbage-collected:
assert A.x.fget.cache_info().currsize == 100
With cached_property, the result is stored on the instance itself, so there is no shared cache keeping instances alive and no memory leak.
import functools

class B:
    @functools.cached_property
    def x(self):
        return 123

b = B()
print(vars(b))  # {}
b.x
print(vars(b))  # {'x': 123}
del b  # b is garbage-collected
Another difference is that @property (as used above on A) is read-only while @cached_property is not: cached_property allows writes to the attribute (refer to the Python docs).
>>> A().x = 123
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: can't set attribute
>>> B().x = 123  # Works
This is due to the fact that @cached_property replaces the attribute on the instance, so the second access to b.x bypasses the B.x.__get__ descriptor call.
Another difference, which likely doesn't matter in most cases, is that cached_property is more performant if you access the same attribute multiple times, while lru_cache has overhead for the function call and attribute lookup. Note the difference is only visible with huge numbers of accesses.
import timeit

[A().x for _ in range(10_000)]
[B().x for _ in range(10_000)]

a = A()
b = B()
print(timeit.timeit(lambda: a.x, number=1_000_000))  # ~0.83
print(timeit.timeit(lambda: b.x, number=1_000_000))  # ~0.57
They serve different purposes.
lru_cache discards the least recently used entries - you should specify maxsize, which determines how many results of your function can be saved. Once you surpass this number, the 'oldest' (least recently used) result is discarded and the new one is saved.
cached_property just computes the result and saves it. Unlike lru_cache, it doesn't take arguments (you can think of it as an lru_cache on a method with maxsize=1 and no arguments).
I want to implement pickling support for objects belonging to my extension library. There is a global instance of class Service initialized at startup. All these objects are produced as a result of some Service method invocations and essentially belong to it. Service knows how to serialize them into binary buffers and how deserialize buffers back into objects.
It appeared that Python's __reduce__ should serve my purpose of implementing pickling support. I started implementing it and realized that there is an issue with the unpickler (the first element of the tuple expected to be returned by __reduce__). This unpickle function needs an instance of Service to be able to convert the input buffer into an Object. Here is a bit of pseudo code to illustrate the issue:
class Service(object):
    ...
    def pickleObject(self, obj):
        # do serialization here and return buffer
        ...
    def unpickleObject(self, buffer):
        # do deserialization here and return new Object
        ...

class Object(object):
    ...
    def __reduce__(self):
        return self.service().unpickleObject, (self.service().pickleObject(self),)
Note the first element in the tuple. The Python pickler does not like it: it says it is an instancemethod and can't be pickled. Obviously the pickler is trying to store the routine into the output and wants the Service instance along with the function name, but this is not what I want to happen. I do not want (and really can't: Service is not picklable) to store the service along with all the objects. I want the service instance to be created before pickle.load is invoked, and somehow have that instance get used during unpickling.
This is where I came across the copy_reg module. Again, it appeared as if it should solve my problems. This module allows registering pickler and unpickler routines per type dynamically, and these are supposed to be used later on for objects of that type. So I added this registration to the Service constructor:
class Service(object):
    ...
    def __init__(self):
        ...
        import copy_reg
        copy_reg.pickle(mymodule.Object, self.pickleObject, self.unpickleObject)
self.unpickleObject is now a bound method taking the service as its first parameter and a buffer as the second. self.pickleObject is also a bound method taking the service and the object to pickle. copy_reg requires that the pickleObject routine follow reducer semantics and return a similar tuple as before. And here the problem arose again: what should I return as the first tuple element?
class Service(object):
    ...
    def pickleObject(self, obj):
        ...
        return self.unpickleObject, (self.serialize(obj),)
In this form pickle again complains that it can't pickle an instancemethod. I tried None - it does not like that either. I put in some dummy function. This works - meaning the serialization phase went through fine, but during unpickling it calls this dummy function instead of the unpickler I registered for the type mymodule.Object in the Service constructor.
So now I am at a loss. Sorry for the long explanation: I did not know how to ask this question in a few lines. I can summarize my questions like this:
Why does the copy_reg semantic require me to return an unpickler routine from pickleObject, if I am expected to register one independently?
Is there any reason to prefer the copy_reg.constructor interface for registering an unpickler routine?
How do I make pickle use the unpickler I registered instead of the one inside the stream?
What should I return as the first element in the tuple from pickleObject? Is there a "correct" value?
Do I approach this whole thing correctly? Is there a different/simpler solution?
First of all, the copy_reg module is unlikely to help you much here: it is primarily a way to add __reduce__-like features to classes that don't have that method, rather than offering any special abilities (e.g. if you want to pickle objects from some library that doesn't natively support it).
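For context, this is roughly what copy_reg is meant for (a sketch with a made-up Point class; the module is called copyreg on Python 3):

import copy_reg  # "copyreg" on Python 3
import pickle

class Point(object):  # imagine a class you cannot modify
    def __init__(self, x, y):
        self.x, self.y = x, y

def pickle_point(p):
    # Same shape of tuple that __reduce__ would return
    return Point, (p.x, p.y)

copy_reg.pickle(Point, pickle_point)
data = pickle.dumps(Point(1, 2))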
The callable returned by __reduce__ needs to be locatable in the environment where the object is to be unpickled, so an instance method isn't really appropriate. As mentioned in the Pickle documentation:
In the unpickling environment this object must be either a class, a callable registered as a
“safe constructor” (see below), or it must have an attribute
__safe_for_unpickling__ with a true value.
So if you defined a function (not method) as follows:
def _unpickle_service_object(buffer):
# Grab the global service object, however that is accomplished
service = get_global_service_object()
return service.unpickleObject(buffer)
_unpickle_service_object.__safe_for_unpickling__ = True
You could now use this _unpickle_service_object function in the return value of your __reduce__ methods so that your objects are linked to the new environment's global Service object when unpickled.
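Tying it back to the pseudo code from the question, the __reduce__ method could then look roughly like this (still assuming the self.service() accessor and a pickleObject that returns a buffer):

class Object(object):
    ...
    def __reduce__(self):
        # Reference the module-level function, not a bound method,
        # so no Service instance ends up in the pickle stream.
        buffer = self.service().pickleObject(self)
        return _unpickle_service_object, (buffer,)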