I have a class like this:
class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    @property
    def thing(self):
        if self._cached_thing:
            return self._cached_thing
        self._cached_thing = Thing.objects.get(id=self._thing_id)
        return self._cached_thing
When pickling objects like this, I'd like to prevent pickling of the _cached_thing field, as it's volatile and specifically an in-memory-only implementation detail.
Is there a way to suggest to Pickle that I only want a subset of my fields to be pickled?
Pickle can be customized in three ways, as described in the docs.
Provide __getstate__ and __setstate__ methods.
Provide __getnewargs__/__getnewargs_ex__ (and a constructor that takes those args).
Provide __reduce__ (and a function to give to __reduce__ to reverse it).
The first is usually the simplest:
class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getstate__(self):
        return self._thing_id

    def __setstate__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None
    # etc.
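For contrast, a minimal sketch of the third option, __reduce__, under the assumption that a Something can be rebuilt from just its thing_id:

def __reduce__(self):
    # Recreate via Something(thing_id); the cache is simply never saved.
    return (Something, (self._thing_id,))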
If you want something more generic, that will pickle all values (including those set by a subclass, or dynamically after creation, etc.) except your blacklist, note that the default is "the instance's __dict__ is pickled", so just filter that:
_blacklist = ['_cached_thing']

def __getstate__(self):
    return {k: v for k, v in self.__dict__.items()
            if k not in self._blacklist}

def __setstate__(self, state):
    self.__dict__.update(state)
And please see gnibbler's comment on the question: if you're doing something generic, you should seriously consider coming up with some kind of naming convention instead of putting a blacklist in each class. Any reader who knows or learns the convention will immediately know which properties are "cache" values rather than part of the "real" value; it'll be more obvious how things work, there's less work for you to do in each class, and there are fewer places to screw things up with a typo…
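As a sketch of that convention idea (the _cached prefix is my assumption here, not something from the question), a single mixin can then strip every cache attribute in one place:

class SkipCachesMixin(object):
    # Assumed convention: any attribute named "_cached*" is transient.
    def __getstate__(self):
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith('_cached')}

    def __setstate__(self, state):
        self.__dict__.update(state)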
Yes, you can use the special methods __getstate__ and __setstate__ to have pickle save customized data for your objects.
http://docs.python.org/2/library/pickle.html#object.__getstate__
This should get you started:
class Something(object):
    def __init__(self):
        self._thing_id = 0
        self._cached_thing = None

    def __getstate__(self):
        return {
            '_thing_id': self._thing_id,
        }

    def __setstate__(self, state):
        self._thing_id = state['_thing_id']
        self._cached_thing = None
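A quick round trip shows the cache being dropped (the values here are made up for illustration):

import pickle

s = Something()
s._thing_id = 42
s._cached_thing = object()      # volatile; must not survive pickling
s2 = pickle.loads(pickle.dumps(s))
assert s2._thing_id == 42
assert s2._cached_thing is None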
Task:
Implement a class that accepts at least one argument and can be initialized either with the original data or with an existing instance of itself.
Minimal example of usage:
arg = {} # whatever necessary for the real object
instance1 = NewClass(arg)
instance2 = NewClass(instance1)
assert instance2 is instance1 # or at least, ==
More complex example of usage:
from typing import Mapping, Union

class NewClass:
    """
    Incomplete
    Should somehow act like described in the task
    """
    def __init__(self, data: Mapping):
        self.data = data

    def cool_method(self):
        assert isinstance(self.data, Mapping)
        # do smth with self.data
        return ...

...

class AnotherClass:
    """
    Accepts both mappings and NewClass instances,
    but needs NewClass internally
    """
    def __init__(self, obj: Union[Mapping, NewClass]):
        self.cool = NewClass(obj).cool_method()

...
One just has to make use of the __new__ method on the class, instead of __init__, to be able to change what is instantiated.
In this case, all you need is to write your NewClass like this:
from typing import Union, Mapping, Self

class NewClass:
    """
    acts like described in the task
    """
    # typing.Self is available in Python 3.11.
    # For previous versions, just put the class name quoted
    # in a string: `"NewClass"` instead of `Self`
    def __new__(cls, data: Union[Mapping, Self]):
        if isinstance(data, NewClass):
            return data
        self = super().__new__(cls)
        self.data = data
        return self

    def cool_method(self):
        assert isinstance(self.data, Mapping)
        # do smth with self.data
        return ...
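A quick check against the task's minimal example:

arg = {}
instance1 = NewClass(arg)
instance2 = NewClass(instance1)
assert instance2 is instance1
assert instance1.data is arg

Note that NewClass deliberately defines no __init__: if it did, Python would call it again on the instance returned by __new__, re-running initialization on the already-built object.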
Avoiding a metaclass is interesting because it avoids metaclass conflicts in larger projects, and it is an abstraction level most projects simply do not need. In fact, static type checkers such as Mypy can't even figure out behavior changes coded into metaclasses.
On the other hand, __new__ is an ordinary special method, a sibling to __init__ and readily available; it just sees less use because Python also provides __init__, which suffices whenever the default behavior of __new__, always creating a new instance, is the desired one.
For some reason I do not know, using a metaclass to create a "singleton" got wildly popular in tutorials and answers. It is a design pattern far less important and less used in Python than in languages which do not allow "stand alone" functions. Metaclasses are not needed for singletons either, by the way: one can just create a top-level instance of whatever class should have a single instance, and use that instance from that point on instead of creating new ones. Other languages restrict the existence of top-level, importable instances, which turned the pattern into a need there; that need was then artificially imported into Python.
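A sketch of that plain-instance approach (all names here are illustrative):

class _Settings:
    def __init__(self):
        self.debug = False

# The module itself is the "singleton" holder:
# every importer shares this one instance.
settings = _Settings()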
Metaclass solution:
Valid for Python 3.8+ (the positional-only `/` marker requires 3.8):
class SelfWrapperMeta(type):
    """
    Metaclass that returns the previously created user class instance
    if the user class init receives one as the first positional argument.
    Other arguments are simply ignored in that self-wrapping case;
    otherwise, the user class is instantiated normally.
    """
    def __call__(cls, arg, /, *args, **kwargs):
        if isinstance(arg, cls):
            return arg
        return super().__call__(arg, *args, **kwargs)
Example of usage:
class A(metaclass=SelfWrapperMeta):
    def __init__(self, data):
        self.data = data

example = {}
a = A(example)
b = A(a)
c = A(example)
assert a is b
assert c is not a
I have a class (Bar) which effectively has its own state and callback(s) and is used by another class (Foo):
class Foo(object):
    def __init__(self):
        self._bar = Bar(self.say, 10)
        self._bar.work()

    def say(self, msg):
        print(msg)

class Bar(object):
    def __init__(self, callback, value):
        self._callback = callback
        self._value = value
        self._more = {'foo': 1, 'bar': 3, 'baz': 'fubar'}

    def work(self):
        # Do some work
        self._more['foo'] = 5
        self._value = 10
        self._callback('FooBarBaz')

Foo()
Obviously I can't pickle the class Foo, since Bar holds an instance method. So I'm left with implementing __getstate__ & __setstate__ in Bar to save self._value and self._more, but I also have to restore the self._callback method (i.e. have the outer class Foo pass the callback in again via __init__()).
But I cannot figure out how to achieve this.
Any help is much appreciated.
Thanks.
I think if you need to serialize something like this you need to be able to define your callback as a string. For example, you might say that callback = 'myproject.callbacks.foo_callback'.
Basically in __getstate__ you'd replace the _callback function with something you could use to look up the function later like self._callback.__name__.
In __setstate__ you'd replace _callback with a function.
This depends on your functions all having real names so you couldn't use a lambda as a callback and expect it to be serialized. You'd also need a reasonable mechanism for looking up your functions by name.
You could potentially use __import__ (something like 'myproject.somemodule.somefunc' dotted-name syntax could be supported that way; see http://code.google.com/p/mock/source/browse/mock.py#1076) or just define a lookup table in your code.
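A minimal sketch of the dotted-name lookup using importlib (the helper name is mine):

import importlib

def resolve(dotted_name):
    # 'myproject.somemodule.somefunc' -> the function object
    module_name, _, func_name = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, func_name)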
Just a quick (untested, sorry!) example assuming you have a small set of possible callbacks defined in a lookup table:
def a():
    pass

callbacks_to_name = {a: 'a',
                     # ...
                     }
callbacks_by_name = {'a': a,
                     # ...
                     }

class C:
    def __init__(self, cb):
        self._callback = cb

    def __getstate__(self):
        # Copy the dict so the live object keeps its real callback.
        state = dict(self.__dict__)
        state['_callback'] = callbacks_to_name[self._callback]
        return state

    def __setstate__(self, state):
        state['_callback'] = callbacks_by_name[state['_callback']]
        self.__dict__.update(state)
I'm not sure what your use case is but I'd recommend doing this by serializing your work items to JSON or XML and writing a simple set of functions to serialize and deserialize them yourself.
The benefit is that the serialized format can be read and understood by humans and modified when you upgrade your software. Pickle is tempting because it seems close enough, but by the time you have a serious pile of __getstate__ and __setstate__ you haven't really saved yourself much effort or headache over building your own scheme specifically for your application.
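For instance, a hand-rolled JSON scheme for the Bar above might look like this (a sketch, assuming the callback is mapped to a name as in the lookup-table answer):

import json

def bar_to_json(bar):
    return json.dumps({'callback': callbacks_to_name[bar._callback],
                       'value': bar._value,
                       'more': bar._more})

def bar_from_json(text):
    data = json.loads(text)
    bar = Bar(callbacks_by_name[data['callback']], data['value'])
    bar._more = data['more']
    return bar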
For example, I have a
class BaseHandler(object):
    def prepare(self):
        self.prepped = 1
I do not want everyone that subclasses BaseHandler and also wants to implement prepare to have to remember to call
super(SubBaseHandler, self).prepare()
Is there a way to ensure the superclass method is run even if the subclass also implements prepare?
I have solved this problem using a metaclass.
Using a metaclass allows the implementer of the BaseHandler to be sure that all subclasses will call the superclass's prepare() with no adjustment to any existing code.
The metaclass looks for an implementation of prepare on both classes and then overwrites the subclass prepare with one that calls superclass.prepare followed by subclass.prepare.
class MetaHandler(type):
    def __new__(cls, name, bases, attrs):
        instance = type.__new__(cls, name, bases, attrs)
        super_instance = super(instance, instance)
        if hasattr(super_instance, 'prepare') and hasattr(instance, 'prepare'):
            super_prepare = getattr(super_instance, 'prepare')
            sub_prepare = getattr(instance, 'prepare')
            def new_prepare(self):
                super_prepare(self)
                sub_prepare(self)
            setattr(instance, 'prepare', new_prepare)
        return instance

class BaseHandler(metaclass=MetaHandler):
    def prepare(self):
        print('BaseHandler.prepare')

class SubHandler(BaseHandler):
    def prepare(self):
        print('SubHandler.prepare')
Using it looks like this:
>>> sh = SubHandler()
>>> sh.prepare()
BaseHandler.prepare
SubHandler.prepare
Tell your developers to define prepare_hook instead of prepare, but
tell the users to call prepare:
class BaseHandler(object):
    def prepare(self):
        self.prepped = 1
        self.prepare_hook()

    def prepare_hook(self):
        pass

class SubBaseHandler(BaseHandler):
    def prepare_hook(self):
        pass

foo = SubBaseHandler()
foo.prepare()
If you want more complex chaining of prepare calls from multiple subclasses, then your developers should really use super as that's what it was intended for.
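For reference, a sketch of that cooperative version:

class BaseHandler(object):
    def prepare(self):
        self.prepped = 1

class SubHandler(BaseHandler):
    def prepare(self):
        super().prepare()   # explicit, but chains through any hierarchy
        self.sub_prepped = 1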
Just accept that you have to tell people subclassing your class to call the base method when overriding it. Every other solution either requires you to explain something else to them, or involves some un-pythonic hacks which could be circumvented too.
Python's object inheritance model was designed to be open, and any attempt to go another way will just overcomplicate a problem which does not really exist anyway. Just tell everybody using your stuff to either follow your "rules", or the program will mess up.
One explicit solution without too much magic going on would be to maintain a list of prepare call-backs:
class BaseHandler(object):
    def __init__(self):
        self.prepare_callbacks = []

    def register_prepare_callback(self, callback):
        self.prepare_callbacks.append(callback)

    def prepare(self):
        # Do BaseHandler preparation
        for callback in self.prepare_callbacks:
            callback()

class MyHandler(BaseHandler):
    def __init__(self):
        BaseHandler.__init__(self)
        self.register_prepare_callback(self._prepare)

    def _prepare(self):
        # whatever
        pass
In general you can try using __getattribute__ to achieve something like this (until the moment someone overwrites that method too), but it is against the Python ideas. There is a reason you are able to access private object members in Python; the reason is mentioned in import this.
Let's say I have a class:
class Thing(object):
    cachedBar = None

    def __init__(self, foo):
        self.foo = foo

    def bar(self):
        if not self.cachedBar:
            self.cachedBar = doSomeIntenseCalculation()
        return self.cachedBar
Getting bar requires some intense calculation, so I cache it in memory to speed things up.
However, when I pickle one of these classes I don't want cachedBar to be pickled.
Can I mark cachedBar as volatile / transient / not picklable?
According to the Pickle documentation, you can provide a method called __getstate__(), which returns something representing the state you want to have pickled (if it isn't provided, pickle uses thing.__dict__). So, you can do something like this:
class Thing:
    def __getstate__(self):
        state = dict(self.__dict__)
        # pop() instead of del: 'cachedBar' may still be the class default
        # and thus absent from the instance dict.
        state.pop('cachedBar', None)
        return state
This doesn't have to be a dict, but if it is something else, you need to also implement __setstate__(state).
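A round trip to check the behavior, assuming the question's Thing gains the __getstate__ above (the cached value is set by hand since doSomeIntenseCalculation is not defined):

import pickle

t = Thing('foo')
t.cachedBar = 'expensive result'      # simulate a computed cache
t2 = pickle.loads(pickle.dumps(t))
assert t2.foo == 'foo'
assert t2.cachedBar is None           # falls back to the class default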
Implement __getstate__ to return only the parts of the object that should be pickled.
I wonder if there is a reasonably easy way to allow this code (with minor modifications) to work.
class Info(object):
    @attr("Version")
    def version(self):
        return 3

info = Info()
assert info.version == 3
assert info["Version"] == 3
Ideally, the code would do some caching/memoising as well, e.g. employ lazy attributes, but I hope to figure that out myself.
Additional information:
The reason why I want to provide two interfaces for accessing the same information is as follows.
I’d like to have a dict-like class which uses lazy keys. E.g. info["Version"] should call and cache another method and transparently return the result.
I don’t think that works with dicts alone, therefore I need to create new methods.
Methods alone won’t do either, because there are some attributes which are easier to define with pure dictionary syntax.
It probably is not the best idea anyway…
If the attribute name (version) is always a lowercase version of the dict key ("Version"), then you could set it up this way:
class Info(object):
    @property
    def version(self):
        return 3

    def __getitem__(self, key):
        if hasattr(self, key.lower()):
            return getattr(self, key.lower())
        raise KeyError(key)
If you wish the dict key to be arbitrary, then it's still possible, though more complicated:
def attrcls(cls):
    # Map each decorated method's key to the method name, once, at class
    # creation time.
    cls._attrdict = {}
    for methodname in cls.__dict__:
        method = cls.__dict__[methodname]
        if hasattr(method, '_attr'):
            cls._attrdict[getattr(method, '_attr')] = methodname
    return cls

def attr(key):
    def wrapper(func):
        class Property(object):
            def __get__(self, inst, instcls):
                return func(inst)
            def __init__(self):
                self._attr = key
        return Property()
    return wrapper

@attrcls
class Info(object):
    @attr("Version")
    def version(self):
        return 3

    def __getitem__(self, key):
        if key in self._attrdict:
            return getattr(self, self._attrdict[key])
        raise KeyError(key)
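With those definitions in place, the original assertions pass:

info = Info()
assert info.version == 3
assert info["Version"] == 3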
I guess the larger question is, Is it a good interface? Why provide two syntaxes (with two different names) for the same thing?
Not trivially. You could use a metaclass to detect decorated methods and wrap __*attr__() and __*item__() appropriately.