OK, here is the real-world scenario: I'm writing an application, and I have a class that represents a certain type of file (in my case photographs, but that detail is irrelevant to the problem). Each instance of the Photograph class should be unique to the photo's filename.
The problem is, when a user tells my application to load a file, I need to be able to identify when files are already loaded, and use the existing instance for that filename, rather than create duplicate instances on the same filename.
To me this seems like a good situation to use memoization, and there are a lot of examples of that out there, but in this case I'm not just memoizing an ordinary function; I need to be memoizing __init__(). This poses a problem, because by the time __init__() gets called it's already too late: a new instance has already been created.
In my research I found Python's __new__() method, and I was actually able to write a working trivial example, but it fell apart when I tried to use it on my real-world objects, and I'm not sure why (the only thing I can think of is that my real world objects were subclasses of other objects that I can't really control, and so there were some incompatibilities with this approach). This is what I had:
class Flub(object):
    instances = {}

    def __new__(cls, flubid):
        try:
            self = Flub.instances[flubid]
        except KeyError:
            self = Flub.instances[flubid] = super(Flub, cls).__new__(cls)
            print 'making a new one!'
        self.flubid = flubid
        print id(self)
        return self

    @staticmethod
    def destroy_all():
        for flub in Flub.instances.values():
            print 'killing', flub

a = Flub('foo')
b = Flub('foo')
c = Flub('bar')
print a
print b
print c
print a is b, b is c
Flub.destroy_all()
Which output this:
making a new one!
139958663753808
139958663753808
making a new one!
139958663753872
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb050>
<__main__.Flub object at 0x7f4aaa6fb090>
True False
killing <__main__.Flub object at 0x7f4aaa6fb050>
killing <__main__.Flub object at 0x7f4aaa6fb090>
It's perfect! Only two instances were made for the two unique ids given, and Flub.instances clearly only has two listed.
But when I tried to take this approach with the objects I was using, I got all kinds of nonsensical errors about how __init__() took only 0 arguments, not 2. So I'd change some things around and then it would tell me that __init__() needed an argument. Totally bizarre.
After a while of fighting with it, I basically just gave up and moved all the __new__() black magic into a staticmethod called get, such that I could call Photograph.get(filename) and it would only call Photograph(filename) if filename wasn't already in Photograph.instances.
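A minimal sketch of that get() workaround, written for Python 3. The Photograph class here is a hypothetical stand-in for the real one, and it uses a classmethod rather than the staticmethod described above so that subclasses construct their own type; the shared instances dict is an assumption carried over from the Flub example.

```python
class Photograph:
    """Hypothetical stand-in for the real class: one instance per filename."""

    instances = {}  # maps filename -> Photograph

    def __init__(self, filename):
        self.filename = filename

    @classmethod
    def get(cls, filename):
        # Construct only on a cache miss; otherwise reuse the stored instance.
        if filename not in cls.instances:
            cls.instances[filename] = cls(filename)
        return cls.instances[filename]


a = Photograph.get("foo.jpg")
b = Photograph.get("foo.jpg")
c = Photograph.get("bar.jpg")
print(a is b, b is c)
```

Calling Photograph("foo.jpg") directly still bypasses the cache, which is the main weakness of this approach.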
Does anybody know where I went wrong here? Is there some better way to do this?
Another way of thinking about it is that it's similar to a singleton, except it's not globally singleton, just singleton-per-filename.
Here's my real-world code using the staticmethod get if you want to see it all together.
Let us see two points about your question.
Using memoize
You can use memoization, but you should decorate the class, not the __init__ method. Suppose we have this memoizer:
def get_id_tuple(f, args, kwargs, mark=object()):
    """
    Some quick'n'dirty way to generate a unique key for a specific call.
    """
    l = [id(f)]
    for arg in args:
        l.append(id(arg))
    l.append(id(mark))
    # Iterate over items(); iterating over the dict itself yields only keys
    for k, v in kwargs.items():
        l.append(k)
        l.append(id(v))
    return tuple(l)
_memoized = {}

def memoize(f):
    """
    Some basic memoizer
    """
    def memoized(*args, **kwargs):
        key = get_id_tuple(f, args, kwargs)
        if key not in _memoized:
            _memoized[key] = f(*args, **kwargs)
        return _memoized[key]
    return memoized
Now you just need to decorate the class:
@memoize
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue
Let us see a test:
tests = [Test(1), Test(2), Test(3), Test(2), Test(4)]
for test in tests:
    print test.somevalue, id(test)
The output is below. Note that the same parameters yield the same id of the returned object:
1 3072319660
2 3072319692
3 3072319724
2 3072319692
4 3072319756
Anyway, I would prefer to create a function to generate the objects and memoize it. Seems cleaner to me, but it may be some irrelevant pet peeve:
class Test(object):
    def __init__(self, somevalue):
        self.somevalue = somevalue

@memoize
def get_test_from_value(somevalue):
    return Test(somevalue)
Using __new__
Or, of course, you can override __new__. Some days ago I posted an answer about the ins, outs and best practices of overriding __new__ that can be helpful. Basically, it says to always pass *args, **kwargs to your __new__ method.
I, for one, would prefer to memoize a function which creates the objects, or even write a specific function which would take care of never recreating an object for the same parameter. Of course, this is mostly an opinion of mine, not a rule.
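For reference, a minimal Python 3 sketch of the __new__ route with *args, **kwargs accepted in both __new__ and __init__. The _instances and _initialised names are assumptions made for this illustration. The guard in __init__ matters because Python calls __init__ on whatever __new__ returns, including an instance taken from the cache.

```python
class Flub:
    _instances = {}  # flubid -> Flub

    def __new__(cls, flubid, *args, **kwargs):
        try:
            # Reuse the cached instance for this id if there is one.
            return cls._instances[flubid]
        except KeyError:
            # object.__new__ must not be handed the extra arguments.
            self = super().__new__(cls)
            cls._instances[flubid] = self
            return self

    def __init__(self, flubid, *args, **kwargs):
        # __init__ runs on every Flub(...) call, even for cached instances,
        # so skip re-initialisation to keep existing state intact.
        if getattr(self, "_initialised", False):
            return
        self._initialised = True
        self.flubid = flubid
        self.tags = []


a = Flub("foo")
a.tags.append("seen")
b = Flub("foo")
print(a is b, b.tags)
```

Without the guard, the second Flub("foo") call would reset tags to an empty list on the shared instance.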
The solution that I ended up using is this:
class memoize(object):
    def __init__(self, cls):
        self.cls = cls
        self.__dict__.update(cls.__dict__)

        # This bit allows staticmethods to work as you would expect.
        for attr, val in cls.__dict__.items():
            if type(val) is staticmethod:
                self.__dict__[attr] = val.__func__

    def __call__(self, *args):
        key = '//'.join(map(str, args))
        if key not in self.cls.instances:
            self.cls.instances[key] = self.cls(*args)
        return self.cls.instances[key]
And then you decorate the class with this, not __init__. Although brandizzi provided me with that key piece of information, his example decorator didn't function as desired.
I found this concept quite subtle, but basically when you're using decorators in Python, you need to understand that the thing that gets decorated (whether it's a method or a class) is actually replaced by the decorator itself. So for example when I'd try to access Photograph.instances or Camera.generate_id() (a staticmethod), I couldn't actually access them because Photograph doesn't actually refer to the original Photograph class, it refers to the memoized function (from brandizzi's example).
To get around this, I had to create a decorator class that actually took all the attributes and static methods from the decorated class and exposed them as its own. Almost like a subclass, except that the decorator class doesn't know ahead of time what classes it will be decorating, so it has to copy the attributes over after the fact.
The end result is that any instance of the memoize class becomes an almost transparent wrapper around the actual class that it has decorated, with the exception that attempting to instantiate it (but really calling it) will provide you with cached copies when they're available.
The parameters to __new__ also get passed to __init__, so:
def __init__(self, flubid):
    ...
You need to accept the flubid argument there, even if you don't use it in __init__.
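A small Python 3 sketch of that behaviour. The calls list and the Broken class are hypothetical names added for the demonstration: Flub accepts the argument in both methods, while Broken omits it from __init__ and fails with the kind of TypeError described in the question.

```python
calls = []


class Flub:
    def __new__(cls, flubid):
        calls.append(("__new__", flubid))
        return super().__new__(cls)

    def __init__(self, flubid):
        # Receives the same arguments that were passed to Flub(...).
        calls.append(("__init__", flubid))
        self.flubid = flubid


class Broken:
    def __new__(cls, flubid):
        return super().__new__(cls)

    def __init__(self):  # does not accept flubid
        pass


Flub("foo")

try:
    Broken("foo")
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```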
Here is the relevant comment, taken from typeobject.c in Python 2.7.3:
/* You may wonder why object.__new__() only complains about arguments
when object.__init__() is not overridden, and vice versa.
Consider the use cases:
1. When neither is overridden, we want to hear complaints about
excess (i.e., any) arguments, since their presence could
indicate there's a bug.
2. When defining an Immutable type, we are likely to override only
__new__(), since __init__() is called too late to initialize an
Immutable object. Since __new__() defines the signature for the
type, it would be a pain to have to override __init__() just to
stop it from complaining about excess arguments.
3. When defining a Mutable type, we are likely to override only
__init__(). So here the converse reasoning applies: we don't
want to have to override __new__() just to stop it from
complaining.
4. When __init__() is overridden, and the subclass __init__() calls
object.__init__(), the latter should complain about excess
arguments; ditto for __new__().
Use cases 2 and 3 make it unattractive to unconditionally check for
excess arguments. The best solution that addresses all four use
cases is as follows: __init__() complains about excess arguments
unless __new__() is overridden and __init__() is not overridden
(IOW, if __init__() is overridden or __new__() is not overridden);
symmetrically, __new__() complains about excess arguments unless
__init__() is overridden and __new__() is not overridden
(IOW, if __new__() is overridden or __init__() is not overridden).
However, for backwards compatibility, this breaks too much code.
Therefore, in 2.6, we'll *warn* about excess arguments when both
methods are overridden; for all other cases we'll use the above
rules.
*/
I was trying to figure this out as well, and I put together a solution that combines some tips from other StackOverflow questions (links in the code comments).
If anyone still needs it, try this out:
import functools

def memoize(f):
    class Memoized:
        # Sentinel separating positional from keyword arguments in the key.
        # It must be shared between calls, otherwise keyword calls never hit.
        _kwd_mark = object()

        def __init__(self, func):
            self._f = func
            self._cache = {}
            # Make the Memoized class masquerade as the object we are memoizing.
            # Preserve class attributes
            functools.update_wrapper(self, func)
            # Preserve static methods
            # From https://stackoverflow.com/questions/11174362
            for k, v in func.__dict__.items():
                self.__dict__[k] = v.__func__ if type(v) is staticmethod else v

        def __call__(self, *args, **kwargs):
            # Generate key (tuples need a trailing comma to be concatenated)
            key = args
            if kwargs:
                key += (self._kwd_mark,)
                for k, v in kwargs.items():
                    key += (hash(k), hash(v))
            key = hash(key)
            if key in self._cache:
                return self._cache[key]
            else:
                self._cache[key] = self._f(*args, **kwargs)
                return self._cache[key]

        def __get__(self, instance, owner):
            """
            From https://stackoverflow.com/questions/30104047/how-can-i-decorate-an-instance-method-with-a-decorator-class
            """
            return functools.partial(self.__call__, instance)

        def __instancecheck__(self, other):
            """Make isinstance() work"""
            return isinstance(other, self._f)

    return Memoized(f)
Then you can use it like so:
@memoize
class Test:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value
Uploaded the full thing with documentation to: https://github.com/spoorn/nemoize
I'd like to write a decorator that does somewhat different things when it gets a function or a method.
For example, I'd like to write a cache decorator, but I don't want to have self as part of the key if it's a method.
def cached(f):
    def _internal(*args, **kwargs):
        if ismethod(f):
            key = create_key(*args[1:], **kwargs)  # ignore self from args
        else:  # this is a regular function
            key = create_key(*args, **kwargs)
        return actual_cache_mechanism(key, f, *args, **kwargs)
    return _internal

class A:
    @cached
    def b(self, something):
        ...

@cached
def c(something):
    ...
The problem is that when @cached is called, it cannot distinguish between methods and functions, as both are of type function.
Can that even be done? Thinking about it, I feel that methods actually have no idea about the context in which they are being defined...
Thanks!
This is kind of an ugly hack, but you can use obj.__qualname__ to see if obj was defined in a class, by checking whether it contains a period:
if "." in obj.__qualname__:
    # obj is a member of an object, so it is a method
    ...
I'm not sure if it will work nicely for decorators though, since for this to work the method would need to be defined in the class.
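A minimal sketch of how this could look inside a decorator. __qualname__ is already set when the decorator runs, even though the class object does not exist yet. The looks_like_method helper and the is_method attribute are hypothetical names for this illustration, and the check stays a heuristic: it also treats staticmethods defined in a class as methods.

```python
def looks_like_method(func):
    """Heuristic: True if func was defined directly inside a class body."""
    parts = func.__qualname__.split(".")
    # "A.b" -> ["A", "b"]; a nested function gives [..., "<locals>", "inner"],
    # which contains a period without being a method.
    return len(parts) > 1 and parts[-2] != "<locals>"


def cached(f):
    # A real cache would branch on this flag when building its key.
    f.is_method = looks_like_method(f)
    return f


class A:
    @cached
    def b(self, something):
        return something


@cached
def c(something):
    return something


def outer():
    @cached
    def inner(something):
        return something
    return inner


print(A.b.is_method, c.is_method, outer().is_method)
```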
I think it is desirable to avoid such introspecting decorator in the name of good pythonic style.
You can always factor out the function to be cached to accept just the required arguments:
@cached
def func(something):
    return ...

class A:
    def b(self, something):
        self.bvalue = func(something)
For the case mentioned in comments (an object is needed to get the result, but its value does not affect it, e.g. a socket), please refer to these questions: "How to ignore a parameter in functools.lru_cache?" and "Make @lru_cache ignore some of the function arguments".
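As a concrete sketch of the factoring-out approach, here is the same shape with functools.lru_cache standing in for the hypothetical cached decorator. The expensive function and the calls list are assumptions added only to make the effect visible.

```python
from functools import lru_cache

calls = []  # records every real computation


@lru_cache(maxsize=None)
def expensive(something):
    calls.append(something)
    return something * 2


class A:
    def b(self, something):
        # self never reaches the cached function, so it is not in the key.
        return expensive(something)


first = A().b(21)
second = A().b(21)  # different instance, same cached result
print(first, second, len(calls))
```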
I'd like a particular function to be callable as a classmethod, and to behave differently when it's called on an instance.
For example, if I have a class Thing, I want Thing.get_other_thing() to work, but also thing = Thing(); thing.get_other_thing() to behave differently.
I think overwriting the get_other_thing method on initialization should work (see below), but that seems a bit hacky. Is there a better way?
class Thing:
    def __init__(self):
        # Assign the bound method itself; calling it here would store its result
        self.get_other_thing = self._get_other_thing_inst

    @classmethod
    def get_other_thing(cls):
        # do something...
        pass

    def _get_other_thing_inst(self):
        # do something else
        pass
Great question! What you seek can be easily done using descriptors.
Descriptors are Python objects which implement the descriptor protocol, usually starting with __get__().
They exist, mostly, to be set as a class attribute on different classes. Upon accessing them, their __get__() method is called, with the instance and owner class passed in.
class DifferentFunc:
    """Deploys a different function according to attribute access

    I am a descriptor.
    """

    def __init__(self, clsfunc, instfunc):
        # Set our functions
        self.clsfunc = clsfunc
        self.instfunc = instfunc

    def __get__(self, inst, owner):
        # Accessed from class
        if inst is None:
            return self.clsfunc.__get__(None, owner)

        # Accessed from instance
        return self.instfunc.__get__(inst, owner)


class Test:
    @classmethod
    def _get_other_thing(cls):
        print("Accessed through class")

    def _get_other_thing_inst(inst):
        print("Accessed through instance")

    get_other_thing = DifferentFunc(_get_other_thing,
                                    _get_other_thing_inst)
And now for the result:
>>> Test.get_other_thing()
Accessed through class
>>> Test().get_other_thing()
Accessed through instance
That was easy!
By the way, did you notice me using __get__ on the class and instance function? Guess what? Functions are also descriptors, and that's the way they work!
>>> def func(self):
...     pass
...
>>> func.__get__(object(), object)
<bound method func of <object object at 0x000000000046E100>>
Upon accessing a function attribute, its __get__ is called, and that's how you get function binding.
For more information, I highly suggest reading the Python manual and the "How-To" linked above. Descriptors are one of Python's most powerful features and are barely even known.
Why not set the function on instantiation?
Or Why not set self.func = self._func inside __init__?
Setting the function on instantiation comes with quite a few problems:
self.func = self._func causes a circular reference. The instance is stored inside the bound method object returned by self._func, which in turn is stored on the instance by the assignment. The end result is that the instance references itself and will be cleaned up in a much slower and heavier manner.
Other code interacting with your class might attempt to take the function straight out of the class, and use __get__(), which is the usual expected method, to bind it. They will receive the wrong function.
Will not work with __slots__.
Although with descriptors you need to understand the mechanism, setting it on __init__ isn't as clean and requires setting multiple functions on __init__.
Takes more memory. Instead of storing one single function, you store a bound function for each and every instance.
Will not work with properties.
There are many more that I didn't add as the list goes on and on.
Here is a slightly hacky solution:
class Thing(object):
    @staticmethod
    def get_other_thing():
        return 1

    def __getattribute__(self, name):
        if name == 'get_other_thing':
            return lambda: 2
        return super(Thing, self).__getattribute__(name)

print Thing.get_other_thing()  # 1
print Thing().get_other_thing()  # 2
If we are on the class, the staticmethod is executed. If we are on an instance, __getattribute__ is executed first, so we can return not Thing.get_other_thing but some other function (a lambda in my case).
I have defined the following class-method to define my object from a pandas.DataFrame instead of from a list like so:
class Container(object):
    @classmethod
    def from_df(cls, df):
        rows = [i for _, i in df.iterrows()]
        return cls(rows)
and pylint complains at the return line with the E1120 'code-smell':
No value for argument 'cls' in constructor call
I can't see anything wrong with it, and it seems to work. Does anybody else maybe have an idea what could be wrong with it?
Update: Ugh, user rogalski got it (I think): I confused myself by using the same variable name for a class that comes in as argument:
def __init__(self, iterable, cls):
    self.content = [cls(item) for item in iterable]
I do this because I have different kind of objects coming in and this Container class is the abstract version of this daughter:
class FanContainer(Container):
    def __init__(self, iterable):
        super().__init__(iterable, Fan)
with Fan being one of several classes that need to be 'contained'.
Rogalski, want to write up an answer along the lines of saying that the error might reference a name of the __init__ constructor? Cheers! (Now I have to dig why my code isn't stumbling over this...)
Update2
Only realizing now how feebly I have coded this. I am basically using it like so:
fancontainer = FanContainer.from_df(df)
and because I am overwriting the __init__ in the FanContainer class, I guess that's why my code still worked? So, the abstract __init__ is never being called directly, because I never call Container.from_df(df) but only the daughter classes' classmethods. Guess that can be done prettier a different way.
Typically this error is related to non-compliant function signatures.
Given your code:
class Container(object):
    def __init__(self, iterable, cls):
        self.content = [cls(item) for item in iterable]

    @classmethod
    def from_df(cls, df):
        rows = [i for _, i in df.iterrows()]
        return cls(rows)
Pylint resolves cls in the scope of from_df to be Container. Class objects are callables (like functions), and they return a new instance of the given class. Pylint inspects the constructor's interface and checks whether the passed arguments are correct.
In your case the passed arguments are incorrect: the second required argument (which happens to have the same name, cls, but exists in a different scope) is missing. That's why Pylint reports an error.
Following up on your edits:
Pylint does not run your code; it analyzes it statically. Since it's possible to call it like Container.from_df, Pylint will warn about possible misuse.
If the constructor is never intended to be used with both arguments outside of your subclasses, you may give the second argument a default value and explicitly raise an exception:
class Container(object):
    def __init__(self, iterable, cls=None):
        if cls is None:
            raise NotImplementedError()
        self.content = [cls(item) for item in iterable]

    @classmethod
    def from_df(cls, df):
        rows = [i for _, i in df.iterrows()]
        return cls(rows)
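Another option, sketched here as an assumption rather than something taken from the question's real code, is to move the contained class out of the constructor signature and into a class attribute. Every class in the hierarchy then shares one __init__ signature, so cls(rows) is valid wherever it is called. The item_cls attribute, the from_rows helper (used to avoid a pandas dependency) and this Fan class are hypothetical.

```python
class Container:
    item_cls = None  # subclasses set this to the class they contain

    def __init__(self, iterable):
        if self.item_cls is None:
            raise NotImplementedError("subclasses must set item_cls")
        self.content = [self.item_cls(item) for item in iterable]

    @classmethod
    def from_rows(cls, rows):
        # Stand-in for from_df: any iterable of rows will do here.
        return cls(list(rows))


class Fan:
    def __init__(self, row):
        self.row = row


class FanContainer(Container):
    item_cls = Fan


fans = FanContainer.from_rows([1, 2, 3])
print([fan.row for fan in fans.content])
```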
I am new to Python decorators (wow, great feature!), and I have trouble getting the following to work because the self argument gets sort of mixed up.
#this is the decorator
class cacher(object):
    def __init__(self, f):
        self.f = f
        self.cache = {}

    def __call__(self, *args):
        fname = self.f.__name__
        if (fname not in self.cache):
            self.cache[fname] = self.f(self,*args)
        else:
            print "using cache"
        return self.cache[fname]

class Session(p.Session):
    def __init__(self, user, passw):
        self.pl = p.Session(user, passw)

    @cacher
    def get_something(self):
        print "get_something called with self = %s "% self
        return self.pl.get_something()

s = Session(u,p)
s.get_something()
When I run this, I get:
get_something called with self = <__main__.cacher object at 0x020870F0>
Traceback:
...
AttributeError: 'cacher' object has no attribute 'pl'
for the line where I do self.cache[fname] = self.f(self,*args)
The problem - Obviously, the problem is that self is the cacher object instead of a Session instance, which indeed doesn't have a pl attribute. However I can't find how to solve this.
Solutions I've considered, but can't use - I thought of making the decorator class return a function instead of a value (like in section 2.1 of this article) so that self is evaluated in the right context, but that isn't possible since my decorator is implemented as a class and uses the built-in __call__ method. Then I thought of not using a class for my decorator, so that I don't need the __call__ method, but I can't do that because I need to keep state between decorator calls (i.e. for keeping track of what is in the self.cache attribute).
Question - So, apart from using a global cache dictionary variable (which I didn't try, but assume will work), is there any other way to make this decorator work?
Edit: this SO question seems similar Decorating python class methods, how do I pass the instance to the decorator?
Use the descriptor protocol like this:
import functools

class cacher(object):
    def __init__(self, f):
        self.f = f
        self.cache = {}

    def __call__(self, *args):
        fname = self.f.__name__
        if (fname not in self.cache):
            # args already starts with the instance supplied by __get__,
            # so do not pass the cacher object itself as well
            self.cache[fname] = self.f(*args)
        else:
            print "using cache"
        return self.cache[fname]

    def __get__(self, instance, instancetype):
        """Implement the descriptor protocol to make decorating instance
        method possible.
        """
        # Return a partial function whose first argument is the instance
        # of the decorated class.
        return functools.partial(self.__call__, instance)
Edit:
How does it work?
Using the descriptor protocol in the decorator lets us call the decorated method with the correct instance as self. Maybe some code will explain it better.
When we write:
class Session(p.Session):
    ...

    @cacher
    def get_something(self):
        print "get_something called with self = %s "% self
        return self.pl.get_something()
it is equivalent to:
class Session(p.Session):
    ...

    def get_something(self):
        print "get_something called with self = %s "% self
        return self.pl.get_something()

    get_something = cacher(get_something)
So now get_something is an instance of cacher. When we access the method get_something on an instance, it is translated to this (because of the descriptor protocol):
session = Session()
session.get_something
# <==>
Session.__dict__['get_something'].__get__(session, Session)
# N.B.: Session.__dict__['get_something'] is an instance of the cacher class.
and because:
Session.__dict__['get_something'].__get__(session, Session)
# returns
functools.partial(get_something.__call__, session)  # the partial function
so
session.get_something(*args)
# <==>
get_something.__call__(session, *args)
Hopefully this explains how it works :)
Closures are often a better way to go, since you don't have to muck about with the descriptor protocol. Saving mutable state across calls is even easier than with a class, since you just stick the mutable object in the containing scope (references to immutable objects can be handled either via the nonlocal keyword, or by stashing them in a mutable object like a single-entry list).
#this is the decorator
from functools import wraps

def cacher(f):
    # No point using a dict, since we only ever cache one value
    # If you meant to create cache entries for different arguments
    # check the memoise decorator linked in other answers
    print("cacher called")
    cache = []

    @wraps(f)
    def wrapped(*args, **kwds):
        print("wrapped called")
        if not cache:
            print("calculating and caching result")
            cache.append(f(*args, **kwds))
        return cache[0]
    return wrapped

class C:
    @cacher
    def get_something(self):
        print("get_something called with self = %s " % self)

C().get_something()
C().get_something()
If you aren't completely familiar with the way closures work, adding more print statements (as I have above) can be illustrative. You will see that cacher is only called as the function is defined, but wrapped is called each time the method is called.
This does highlight how you need to be careful with memoisation techniques and instance methods though - if you aren't careful to account for changes in the value of self, you will end up sharing cached answers across instances, which may not be what you want.
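One way to avoid sharing results across instances, sketched here under the assumption that the instances are hashable and weak-referenceable (true for ordinary classes), is to key the closure's cache on self. The WeakKeyDictionary is a choice made for this illustration so that cached entries do not keep instances alive.

```python
from functools import wraps
from weakref import WeakKeyDictionary


def cacher(f):
    cache = WeakKeyDictionary()  # instance -> cached result

    @wraps(f)
    def wrapped(self):
        if self not in cache:
            cache[self] = f(self)
        return cache[self]
    return wrapped


class C:
    def __init__(self, name):
        self.name = name
        self.calls = 0

    @cacher
    def get_something(self):
        self.calls += 1
        return "something for " + self.name


x, y = C("x"), C("y")
x.get_something()
x.get_something()  # served from the cache; the method body is not run again
print(x.calls, y.get_something())
```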
First, you explicitly pass cacher object as first argument in the following line:
self.cache[fname] = self.f(self,*args)
Python automatically adds self to the list of arguments for methods only. It converts functions defined in a class namespace to methods, but not other callables such as your cacher object. I see two ways to get that behavior:
- Change your decorator to return a function by using closures.
- Implement the descriptor protocol to pass the self argument yourself, as is done in the memoize decorator recipe.
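A short sketch of the difference between plain functions and other callables stored in a class namespace. The class and attribute names are hypothetical. The function is bound to the instance on attribute access because functions implement __get__; the callable object does not, so it receives no instance.

```python
class CallableObject:
    def __call__(self, *args):
        return args


def plain_function(*args):
    return args


class Session:
    as_function = plain_function   # a function: becomes a bound method
    as_object = CallableObject()   # a callable object: left untouched


s = Session()
print(s.as_function() == (s,))  # the instance was passed in automatically
print(s.as_object() == ())      # no instance was passed
```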
Many times I have member functions that copy parameters into an object's fields. For example:
class NouveauRiches(object):
    def __init__(self, car, mansion, jet, bling):
        self.car = car
        self.mansion = mansion
        self.jet = jet
        self.bling = bling
Is there a python language construct that would make the above code less tedious?
One could use *args:
def __init__(self, *args):
    self.car, self.mansion, self.jet, self.bling = args
+: less tedious
-: function signature not revealing enough; you need to dive into the function code to know how to use it
-: does not raise a TypeError on a call with the wrong number of parameters (but does raise a ValueError)
Any other ideas? (Whatever your suggestion, make sure the code calling the function stays simple.)
You could do this with a helper method, something like this:
import inspect

def setargs(func):
    f = inspect.currentframe(1)
    argspec = inspect.getargspec(func)
    for arg in argspec.args:
        setattr(f.f_locals["self"], arg, f.f_locals[arg])
Usage:
class Foo(object):
    def __init__(self, bar, baz=4711):
        setargs(self.__init__)

        print self.bar  # Now defined
        print self.baz  # Now defined
This is not pretty, and it should probably only be used when prototyping. Please use explicit assignment if you plan to have others read it.
It could probably be improved not to need to take the function as an argument, but that would require even more ugly hacks and trickery :)
I would go for this; you could also override already defined properties.
class D:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)
But I personally would just go the long way.
Think of those:
- Explicit is better than implicit.
- Flat is better than nested.
(The Zen of Python)
Try something like
d = dict(locals())
del d['self']
self.__dict__.update(d)
Of course, it returns all local variables, not just function arguments.
I am not sure this is such a good idea, but it can be done:
import inspect

class NouveauRiches(object):
    def __init__(self, car, mansion, jet, bling):
        # frame was undefined in the original; use this call's own frame
        frame = inspect.currentframe()
        arguments = inspect.getargvalues(frame)[0]
        values = inspect.getargvalues(frame)[3]
        for name in arguments:
            if name != 'self':
                self.__dict__[name] = values[name]
It does not read great either, though I suppose you could put this in a utility method that is reused.
You could try something like this:
class C(object):
    def __init__(self, **kwargs):
        for k in kwargs:
            d = {k: kwargs[k]}
            self.__dict__.update(d)
Or using setattr you can do:
class D(object):
    def __init__(self, **kwargs):
        for k in kwargs:
            setattr(self, k, kwargs[k])
Both can then be called like:
myclass = C(test=1, test2=2)
So you have to use **kwargs, rather than *args.
I sometimes do this for classes that act "bunch-like", that is, they have a bunch of customizable attributes:
class SuperClass(object):
    def __init__(self, **kw):
        for name, value in kw.iteritems():
            if not hasattr(self, name):
                raise TypeError('Unexpected argument: %s' % name)
            setattr(self, name, value)

class SubClass(SuperClass):
    instance_var = None  # default value

class SubClass2(SubClass):
    other_instance_var = True

    @property
    def something_dynamic(self):
        return self._internal_var

    @something_dynamic.setter  # new Python 2.6 feature of properties
    def something_dynamic(self, value):
        assert value is None or isinstance(value, str)
        self._internal_var = value
Then you can call SubClass2(instance_var=[], other_instance_var=False) and it'll work without defining __init__ in either of them. You can use any property as well. Though this allows you to overwrite methods, which you probably wouldn't intend (as they return True for hasattr() just like an instance variable).
If you add any property or other descriptor it will work fine. You can use that to do type checking; unlike type checking in __init__, it'll be applied any time that value is updated. Note you can't use any positional arguments for these unless you override __init__, so sometimes what would be a natural positional argument won't work. formencode.declarative covers this and other issues, probably with a thoroughness I would not suggest you attempt (in retrospect I don't think it's worth it).
Note that any recipe that uses self.__dict__ won't respect properties and descriptors, and if you use those together you'll just get weird and unexpected results. I only recommend using setattr() to set attributes, never self.__dict__.
Also this recipe doesn't give a very helpful signature, while some of the ones that do frame and function introspection do. With some work it is possible to dynamically generate a __doc__ that clarifies the arguments... but again I'm not sure the payoff is worth the addition of more moving parts.
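To illustrate the note about self.__dict__ versus setattr(), here is a small Python 3 sketch with a hypothetical Thing class whose property validates its value. Writing through __dict__ silently bypasses the setter, and the stored entry is then shadowed by the property on reads; setattr() goes through the setter and is checked.

```python
class Thing:
    def __init__(self):
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        if not isinstance(new, int):
            raise TypeError("value must be an int")
        self._value = new


t = Thing()

# Bypasses the setter: no validation, and the property still wins on reads.
t.__dict__.update(value="not an int")
print(t.value)

# Goes through the setter, so the bad value is rejected.
try:
    setattr(t, "value", "not an int")
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
else:
    rejected = False
```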
I am a fan of the following
import inspect

def args_to_attrs(otherself):
    frame = inspect.currentframe(1)
    argvalues = inspect.getargvalues(frame)
    for arg in argvalues.args:
        if arg == 'self':
            continue
        value = argvalues.locals[arg]
        setattr(otherself, arg, value)

class MyClass:
    def __init__(self, arga="baf", argb="lek", argc=None):
        args_to_attrs(self)
Arguments to __init__ are explicitly named, so it is clear what attributes are being set. Additionally, it is a little bit streamlined over the currently accepted answer.
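In Python 3, inspect.currentframe() takes no depth argument, so the same idea has to reach the caller's frame through f_back. The following is a sketch under that assumption, and it relies on frame support as found in CPython; the explicit del of the frame follows the inspect documentation's advice on avoiding reference cycles.

```python
import inspect


def args_to_attrs(obj):
    # The frame of whoever called us, i.e. the __init__ being run.
    frame = inspect.currentframe().f_back
    try:
        argvalues = inspect.getargvalues(frame)
        for name in argvalues.args:
            if name == "self":
                continue
            setattr(obj, name, argvalues.locals[name])
    finally:
        del frame  # do not keep the frame alive longer than needed


class MyClass:
    def __init__(self, arga="baf", argb="lek", argc=None):
        args_to_attrs(self)


m = MyClass(argb="custom")
print(m.arga, m.argb, m.argc)
```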