Hash a Python new-style class instance?

Given a custom, new-style python class instance, what is a good way to hash it and get a unique ID-like value from it to use for various purposes? Think md5sum or sha1sum of a given class instance.
The approach I am currently using pickles the instance and runs the result through hexdigest, storing the resulting hash string in a class property (this property is never part of the pickle/unpickle procedures, fyi). Except now I've run into a case where a third-party module uses nested classes, and there is no really good way to pickle those without some hacks. I figure I am missing some clever little Python trick to accomplish this.
Edit:
Example code, because it seems to be a requirement around here to get any traction on a question. The class below can be initialized and the self._uniq_id property is properly set up.
#!/usr/bin/env python
import hashlib

# cPickle or pickle.
try:
    import cPickle as pickle
except:
    import pickle
# END try

# Single class, pickles fine.
class FooBar(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = "bar"
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo': self._foo, 'bar': self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
This next class, however, cannot be initialized because of Bar being nested in Foo:
#!/usr/bin/env python
import hashlib

# cPickle or pickle.
try:
    import cPickle as pickle
except:
    import pickle
# END try

# Nested class, can't pickle for hexdigest.
class Foo(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    class Bar(object):
        pass

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = self.Bar()
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo': self._foo, 'bar': self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
The error I receive is:
Traceback (most recent call last):
  File "./nest_test.py", line 70, in <module>
    foobar2 = Foo()
  File "./nest_test.py", line 49, in __init__
    self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
(nest_test.py has both classes in it, hence the line number offset.)
I found out that pickling requires the __getstate__() method, so I also implemented __setstate__() for completeness. But given the existing warnings about pickle and security, there has to be a better way to do this.
Based on what I have read so far, the error stems from Python not being able to resolve the nested class. It tries to look up the attribute __main__.Bar, which doesn't exist; it would need to find __main__.Foo.Bar instead, and there is no really good way to make it do that. I bumped into another SO answer that provides a "hack" to trick Python, but it came with a stern warning that such an approach is not advisable, and to either use something other than pickling or to move the nested class definition outside the enclosing class.
However, the original question of that SO answer, I believe, was about pickling and unpickling to a file. I only need to pickle in order to feed the result to the hashlib functions, which operate on a byte string (much like I am used to in .NET), and pickling (especially cPickle) is fast and optimized compared to writing my own byte-serialization routine.

That depends entirely on what properties the ID should have.
For instance, you can use id(foo) to get an ID which is guaranteed to be unique as long as foo is active in memory, or you could use repr(instance.__dict__) if all of the fields have sensible repr values.
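A minimal sketch of the repr-based idea, assuming the object exposes a plain __dict__ (unlike the __slots__ classes in the question) and every field has a stable, deterministic repr; the Point class is just for illustration:
import hashlib

def instance_digest(obj):
    # sort the items so the digest does not depend on dict ordering
    state = repr(sorted(vars(obj).items()))
    return hashlib.sha1(state.encode("utf-8")).hexdigest()[:16]

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

print(instance_digest(Point(1, 2)))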
What specifically do you need it for?

While you're using hexdigests of pickles at the moment, you make it sound like the ID doesn't actually need to be derived from the object; it just needs to be unique. Why not simply use the uuid module, specifically uuid.uuid4(), to generate unique IDs and assign them to a uuid field in the object?
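A minimal sketch of that suggestion (the class name Tagged is invented here; the identifier is random, so it is unique but says nothing about the object's contents):
import uuid

class Tagged(object):
    def __init__(self):
        self.uniq_id = uuid.uuid4().hex   # 32 hex characters, e.g. '9f1c...'

print(Tagged().uniq_id)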

Related

How can I specialise instances of objects when I don't have access to the instantiation code?

Let's assume I am using a library which gives me instances of classes defined in that library when calling its functions:
>>> from library import find_objects
>>> result = find_objects("name = any")
[SomeObject(name="foo"), SomeObject(name="bar")]
Let's further assume that I want to attach new attributes to these instances. For example a classifier to avoid running this code every time I want to classify the instance:
>>> from library import find_objects
>>> result = find_objects("name = any")
>>> for row in result:
... row.item_class = my_classifier(row)
Note that this is contrived but illustrates the problem: I now have instances of the class SomeObject but the attribute item_class is not defined in that class and trips up the type-checker.
So when I now write:
print(result[0].item_class)
I get a typing error. It also trips up auto-completion in editors as the editor does not know that this attribute exists.
Not to mention that this way of implementing it is quite ugly and hacky.
One thing I could do is create a subclass of SomeObject:
class ExtendedObject(SomeObject):
    item_class = None

    def classify(self):
        cls = do_something_with(self)
        self.item_class = cls
This now makes everything explicit, I get a chance to properly document the new attributes and give it proper type-hints. Everything is clean. However, as mentioned before, the actual instances are created inside library and I don't have control over the instantiation.
Side note: I ran into this issue in flask for the Response class. I noticed that flask actually offers a way to customise the instantiation using Flask.response_class. But I am still interested how this could be achieved in libraries that don't offer this injection seam.
One thing I could do is write a wrapper that does something like this:
class WrappedObject(SomeObject):
    item_class = None
    wrapped = None

    @staticmethod
    def from_original(wrapped):
        instance = WrappedObject()
        instance.wrapped = wrapped
        instance.item_class = do_something_with(wrapped)
        return instance

    def __getattribute__(self, key):
        # forward everything to the wrapped instance
        wrapped = object.__getattribute__(self, "wrapped")
        return getattr(wrapped, key)
But this seems rather hacky and will not work in other programming languages.
Or try to copy the data:
from copy import deepcopy

class CopiedObject(SomeObject):
    item_class = None

    @staticmethod
    def from_original(wrapped):
        instance = CopiedObject()
        for key, value in vars(wrapped).items():
            setattr(instance, key, deepcopy(value))
        instance.item_class = do_something_with(wrapped)
        return instance
but this feels equally hacky, and is risky when the objects use properties and/or descriptors.
Are there any known "clean" patterns for something like this?
I would go with a variant of your WrappedObject approach, with the following adjustments:
I would not extend SomeObject: this is a case where composition feels more appropriate than inheritance
With that in mind, from_original is unnecessary: you can have a proper __init__ method
item_class should be an instance variable and not a class variable. It should be initialized in your WrappedObject class constructor
Think twice before implementing __getattribute__ and forwarding everything to the wrapped object. If you need only a few methods and attributes of the original SomeObject class, it might be better to implement them explicitly as methods and properties:
class WrappedObject:
    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.item_class = do_something_with(wrapped)

    def a_method(self):
        return self.wrapped.a_method()

    @property
    def a_property(self):
        return self.wrapped.a_property

(Un)Pickle Class having Instancemethod Objects

I have a class (Bar) which effectively has its own state and callback(s) and is used by another class (Foo):
class Foo(object):
    def __init__(self):
        self._bar = Bar(self.say, 10)
        self._bar.work()

    def say(self, msg):
        print msg

class Bar(object):
    def __init__(self, callback, value):
        self._callback = callback
        self._value = value
        self._more = {'foo': 1, 'bar': 3, 'baz': 'fubar'}

    def work(self):
        # Do some work
        self._more['foo'] = 5
        self._value = 10
        self._callback('FooBarBaz')

Foo()
Obviously I can't pickle the class Foo since Bar holds an instancemethod. I'm left with implementing __getstate__ and __setstate__ in Bar to save self._value and self._more, but I also have to re-establish the self._callback method (i.e. call __init__() from the outer class Foo, passing the callback function).
But I cannot figure out how to achieve this.
Any help is much appreciated.
Thanks.
I think if you need to serialize something like this you need to be able to define your callback as a string. For example, you might say that callback = 'myproject.callbacks.foo_callback'.
Basically in __getstate__ you'd replace the _callback function with something you could use to look up the function later like self._callback.__name__.
In __setstate__ you'd replace _callback with a function.
This depends on your functions all having real names so you couldn't use a lambda as a callback and expect it to be serialized. You'd also need a reasonable mechanism for looking up your functions by name.
You could potentially use __import__ (dotted-name syntax like 'myproject.somemodule.somefunc' could be supported that way; see http://code.google.com/p/mock/source/browse/mock.py#1076) or just define a lookup table in your code.
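For the dotted-name idea, a rough sketch (using importlib rather than raw __import__; 'myproject.callbacks.foo_callback' is just the hypothetical path from above):
import importlib

def resolve_callback(dotted_name):
    # split 'package.module.func' into the module path and the attribute name
    module_name, _, func_name = dotted_name.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# callback = resolve_callback('myproject.callbacks.foo_callback')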
Just a quick (untested, sorry!) example assuming you have a small set of possible callbacks defined in a lookup table:
def a():
    pass

callbacks_to_name = {a: 'a',
                     # ...
                     }
callbacks_by_name = {'a': a,
                     # ...
                     }

class C:
    def __init__(self, cb):
        self._callback = cb

    def __getstate__(self):
        # replace the function with its registered name in a copy of the state
        state = self.__dict__.copy()
        state['_callback'] = callbacks_to_name[self._callback]
        return state

    def __setstate__(self, state):
        # look the function back up by name and restore the state
        state['_callback'] = callbacks_by_name[state['_callback']]
        self.__dict__.update(state)
I'm not sure what your use case is but I'd recommend doing this by serializing your work items to JSON or XML and writing a simple set of functions to serialize and deserialize them yourself.
The benefit is that the serialized format can be read and understood by humans and modified when you upgrade your software. Pickle is tempting because it seems close enough, but by the time you have a serious pile of __getstate__ and __setstate__ you haven't really saved yourself much effort or headache over building your own scheme specifically for your application.
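As a rough illustration of that suggestion (the WorkItem class and its fields are made up here), the hand-rolled scheme can be as small as a pair of methods around json:
import json

class WorkItem(object):
    def __init__(self, name, payload):
        self.name = name
        self.payload = payload

    def to_json(self):
        # only plain data goes into the file, nothing pickle-specific
        return json.dumps({"name": self.name, "payload": self.payload})

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["name"], data["payload"])

item = WorkItem("job-1", {"foo": 1})
print(WorkItem.from_json(item.to_json()).name)   # job-1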

Is there a way to instantiate a class without calling __init__?

Is there a way to circumvent the constructor __init__ of a class in python?
Example:
class A(object):
    def __init__(self):
        print "FAILURE"

    def Print(self):
        print "YEHAA"
Now I would like to create an instance of A. It could look like this; however, this only binds a to the class itself rather than creating an instance:
a = A
a.Print()
EDIT:
An even more complex example:
Suppose I have an object C, which purpose it is to store one single parameter and do some computations with it. The parameter, however, is not passed as such but it is embedded in a huge parameter file. It could look something like this:
class C(object):
    def __init__(self, ParameterFile):
        self._Parameter = self._ExtractParamterFile(ParameterFile)

    def _ExtractParamterFile(self, ParameterFile):
        # does some complex magic to extract the right parameter
        return the_extracted_parameter
Now I would like to dump and load an instance of that object C. However, when I load this object, I only have the single variable self._Parameter and I cannot call the constructor, because it is expecting the parameter file.
@staticmethod
def Load(file):
    f = open(file, "rb")
    oldObject = pickle.load(f)
    f.close()
    # somehow create newObject without calling __init__
    newObject._Parameter = oldObject._Parameter
    return newObject
In other words, it is not possible to create an instance without passing the parameter file. In my "real" case, however, it is not a parameter file but some huge chunk of data that I certainly do not want to carry around in memory or even store to disc.
And since I want to return an instance of C from the method Load I do somehow have to call the constructor.
OLD EDIT:
A more complex example, which explains why I am asking the question:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        # do something with data, but do NOT save data in a variable

    @staticmethod
    def Load(self, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        newS = B(???)
        newS._Name = newName
        return newS
As you can see, since data is not stored in a class variable I cannot pass it to __init__. Of course I could simply store it, but what if the data is a huge object which I do not want to carry around in memory all the time or even save to disc?
You can circumvent __init__ by calling __new__ directly. Then you can create an object of the given type and call an alternative method instead of __init__. This is the kind of thing that pickle would do.
However, first I'd like to stress very much that it is something that you shouldn't do and whatever you're trying to achieve, there are better ways to do it, some of which have been mentioned in the other answers. In particular, it's a bad idea to skip calling __init__.
When objects are created, more or less this happens:
a = A.__new__(A, *args, **kwargs)
a.__init__(*args, **kwargs)
You could skip the second step.
Here's why you shouldn't do this: The purpose of __init__ is to initialize the object, fill in all the fields and ensure that the __init__ methods of the parent classes are also called. With pickle it is an exception because it tries to store all the data associated with the object (including any fields/instance variables that are set for the object), and so anything that was set by __init__ the previous time would be restored by pickle, there's no need to call it again.
If you skip __init__ and use an alternative initializer, you'd have a sort of code duplication - there would be two places where the instance variables are filled in, and it's easy to miss one of them in one of the initializers or to accidentally make the two initializers fill the fields differently. This gives the possibility of subtle bugs that aren't trivial to trace (you'd have to know which initializer was called), and the code becomes more difficult to maintain. Not to mention that you'd be in an even bigger mess if you're using inheritance - the problems go up the inheritance chain, because you'd have to use this alternative initializer everywhere up the chain.
Also by doing so you'd be more or less overriding Python's instance creation and making your own. Python already does that for you pretty well, no need to go reinventing it and it will confuse people using your code.
Here's what is best to do instead: use a single __init__ method that is called for all possible instantiations of the class and initializes all instance variables properly. For different modes of initialization use either of the two approaches:
Support different signatures for __init__ that handle your cases by using optional arguments.
Create several class methods that serve as alternative constructors. Make sure they all create instances of the class in the normal way (i.e. calling __init__), as shown by Roman Bodnarchuk, while performing additional work or whatever. It's best if they pass all the data to the class (and __init__ handles it), but if that's impossible or inconvenient, you can set some instance variables after the instance was created and __init__ is done initializing.
If __init__ has an optional step (e.g. like processing that data argument, although you'd have to be more specific), you can either make it an optional argument or make a normal method that does the processing... or both.
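A small sketch of those two patterns combined (the Config class and the file format are invented for illustration):
class Config(object):
    def __init__(self, parameter=None):
        # the single place where instance state is set up
        self._parameter = parameter

    @classmethod
    def from_parameter_file(cls, path):
        # alternative constructor: do the heavy extraction here, then go
        # through the normal __init__ with the extracted value
        with open(path) as f:
            parameter = f.readline().strip()
        return cls(parameter)

c1 = Config("direct value")
# c2 = Config.from_parameter_file("params.txt")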
Use classmethod decorator for your Load method:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        # store data

    @classmethod
    def Load(cls, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        return cls(newName, s)
So you can do:
loaded_obj = B.Load('filename.txt', 'foo')
Edit:
Anyway, if you still want to omit __init__ method, try __new__:
>>> class A(object):
...     def __init__(self):
...         print '__init__'
...
>>> A()
__init__
<__main__.A object at 0x800f1f710>
>>> a = A.__new__(A)
>>> a
<__main__.A object at 0x800f1fd50>
Taking your question literally, I would use metaclasses:
class MetaSkipInit(type):
    def __call__(cls):
        return cls.__new__(cls)

class B(object):
    __metaclass__ = MetaSkipInit

    def __init__(self):
        print "FAILURE"

    def Print(self):
        print "YEHAA"

b = B()
b.Print()
This can be useful e.g. for copying constructors without polluting the parameter list.
But doing this properly would require more work and care than my proposed hack.
Not really. The purpose of __init__ is to initialize an object after it has been created, and by default it doesn't really do anything. If the __init__ method is not doing what you want, and it's not your own code to change, you can choose to switch it out though. For example, taking your class A, we could do the following to avoid calling that __init__ method:
def emptyinit(self):
    pass

A.__init__ = emptyinit
a = A()
a.Print()
This dynamically switches out the class's __init__ method, replacing it with an empty one. Note that this is probably NOT a good thing to do, as it does not call the superclass's __init__ method.
You could also subclass it to create your own class that does everything the same, except overriding the __init__ method to do what you want it to (perhaps nothing).
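A minimal sketch of that subclassing idea, reusing the A class from the question (the no-op __init__ deliberately skips A.__init__, with all the caveats above):
class QuietA(A):
    def __init__(self):
        pass   # intentionally do nothing, so A.__init__ never runs

qa = QuietA()
qa.Print()   # prints YEHAA, without ever printing FAILURE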
Perhaps, however, you simply wish to call the method from the class without instantiating an object. If that is the case, you should look into the @classmethod and @staticmethod decorators. They allow for just that type of behavior.
In your code you have used the @staticmethod decorator, which does not take a self argument. Perhaps better suited for the purpose would be a @classmethod, which might look more like this:
@classmethod
def Load(cls, file, newName):
    # Get the data
    data = getdata()
    # Create an instance of B with the data
    return cls(newName, data)
UPDATE: Rosh's excellent answer pointed out that you CAN avoid calling __init__ by implementing __new__, which I was actually unaware of (although it makes perfect sense). Thanks Rosh!
I was reading the Python Cookbook and there's a section about this; the example uses __new__ to bypass __init__():
>>> class A:
...     def __init__(self, a):
...         self.a = a
...
>>> test = A('a')
>>> test.a
'a'
>>> test_noinit = A.__new__(A)
>>> test_noinit.a
Traceback (most recent call last):
  File "", line 1, in
    test_noinit.a
AttributeError: 'A' object has no attribute 'a'
>>>
However, I think this only works in Python 3. Below is the same thing running under 2.7:
>>> class A:
...     def __init__(self, a):
...         self.a = a
...
>>> test = A.__new__(A)
Traceback (most recent call last):
  File "", line 1, in
    test = A.__new__(A)
AttributeError: class A has no attribute '__new__'
>>>
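(The failure under 2.7 comes from class A: being an old-style class there, which has no __new__; with a new-style class the same bypass also works on 2.7. A quick sketch:)
class A(object):
    def __init__(self, a):
        self.a = a

test_noinit = A.__new__(A)           # instance created, __init__ never runs
print(hasattr(test_noinit, 'a'))     # False: 'a' was never set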
As I said in my comment you could change your __init__ method so that it allows creation without giving any values to its parameters:
def __init__(self, p0, p1, p2):
    # some logic
would become:
def __init__(self, p0=None, p1=None, p2=None):
    if p0 and p1 and p2:
        # some logic
or:
def __init__(self, p0=None, p1=None, p2=None, init=True):
    if init:
        # some logic

Pickle a dynamically parameterized sub-class

I have a system which commonly stores pickled class types.
I want to be able to save dynamically-parameterized classes in the same way, but I can't because I get a PicklingError on trying to pickle a class which is not globally found (not defined in simple code).
My problem can be modeled as the following example code:
class Base(object):
    def m(self):
        return self.__class__.PARAM

def make_parameterized(param_value):
    class AutoSubClass(Base):
        PARAM = param_value
    return AutoSubClass

cls = make_parameterized(input("param value?"))
When I try to pickle the class, I get the following error:
# pickle.PicklingError: Can't pickle <class '__main__.AutoSubClass'>: it's not found as __main__.AutoSubClass
import pickle
print pickle.dumps(cls)
I am looking for some method to declare Base as a ParameterizableBaseClass which should define the params needed (PARAM in above example). A dynamic parameterized subclass (cls above) should then be picklable by saving the "ParameterizableBaseClass" type and the different param-values (dynamic param_value above).
I am sure that in many cases, this can be avoided altogether... And I can avoid this in my code as well if I really (really) have to. I was playing with __metaclass__, copyreg and even __builtin__.issubclass at some point (don't ask), but was unable to crack this one.
I feel like I wouldn't be true to the python spirit if I wasn't to ask: how can this be achieved, in a relatively clean way?
I know this is a very old question, but I think it is worth sharing a better means of pickling the parameterised classes than the one that is the currently accepted solution (making the parameterised class a global).
Using the __reduce__ method, we can provide a callable which will return an uninitialised instance of our desired class.
class Base(object):
    def m(self):
        return self.__class__.PARAM

    def __reduce__(self):
        return (_InitializeParameterized(), (self.PARAM, ), self.__dict__)

def make_parameterized(param_value):
    class AutoSub(Base):
        PARAM = param_value
    return AutoSub

class _InitializeParameterized(object):
    """
    When called with the param value as the only argument, returns an
    un-initialized instance of the parameterized class. Subsequent __setstate__
    will be called by pickle.
    """
    def __call__(self, param_value):
        # make a simple object which has no complex __init__ (this one will do)
        obj = _InitializeParameterized()
        obj.__class__ = make_parameterized(param_value)
        return obj

if __name__ == "__main__":
    from pickle import dumps, loads
    a = make_parameterized("a")()
    b = make_parameterized("b")()
    print a.PARAM, b.PARAM, type(a) is type(b)
    a_p = dumps(a)
    b_p = dumps(b)
    del a, b
    a = loads(a_p)
    b = loads(b_p)
    print a.PARAM, b.PARAM, type(a) is type(b)
It is worth reading the __reduce__ docs a couple of times to see exactly what is going on here.
Hope somebody finds this useful.
Yes, it is possible.
Whenever you want to customize the pickle and unpickle behavior of your objects, you just have to set the __getstate__ and __setstate__ methods on the class itself.
In this case it is a bit trickier:
There needs to exist, as you observed, a class in the global namespace that is the class of the object currently being pickled: it has to be the same class, with the same name. The deal is that this class in the global namespace can be created at pickle time.
At unpickle time a class with the same name has to exist - but it does not have to be the same object, it just has to behave like it. Since __setstate__ is called during the unpickling process, it can recreate the parameterized class of the original object and set the object's class to it, by setting the __class__ attribute.
Setting the __class__ attribute of an object may seem objectionable, but it is how OO works in Python and it is officially documented; it even works across implementations. (I tested this snippet in both Python 2.6 and PyPy.)
class Base(object):
    def m(self):
        return self.__class__.PARAM

    def __getstate__(self):
        global AutoSub
        AutoSub = self.__class__
        return (self.__dict__, self.__class__.PARAM)

    def __setstate__(self, state):
        self.__class__ = make_parameterized(state[1])
        self.__dict__.update(state[0])

def make_parameterized(param_value):
    class AutoSub(Base):
        PARAM = param_value
    return AutoSub

class AutoSub(Base):
    pass

if __name__ == "__main__":
    from pickle import dumps, loads
    a = make_parameterized("a")()
    b = make_parameterized("b")()
    print a.PARAM, b.PARAM, type(a) is type(b)
    a_p = dumps(a)
    b_p = dumps(b)
    del a, b
    a = loads(a_p)
    b = loads(b_p)
    print a.PARAM, b.PARAM, type(a) is type(b)
I guess it's too late now, but pickle is a module I'd rather avoid for anything complex, because it has problems like this one and many more.
Anyway, since pickle wants the class to be available as a global, it can have it:
import cPickle

class Base(object):
    def m(self):
        return self.__class__.PARAM

    @classmethod
    def make_parameterized(cls, param):
        clsname = "AutoSubClass.%s" % param
        # create a class, assign it as a global under the same name
        typ = globals()[clsname] = type(clsname, (cls,), dict(PARAM=param))
        return typ

cls = Base.make_parameterized('asd')

import pickle
s = pickle.dumps(cls)
cls = pickle.loads(s)
print cls, cls.PARAM
# <class '__main__.AutoSubClass.asd'> asd
But yeah, you're probably overcomplicating things.
Classes that are not created in the top level of a module cannot be pickled, as shown in the Python documentation.
Furthermore, even for an instance of a top-level class, the class attributes are not stored. So in your example PARAM wouldn't be stored anyway. (This is explained in the Python documentation section linked above as well.)
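A short demonstration of that last point (Top and PARAM are just example names): the pickle stores only the instance's __dict__ plus a reference to the class, so a class attribute is re-read from the current class on unpickling rather than restored from the pickle.
import pickle

class Top(object):
    PARAM = 1

data = pickle.dumps(Top())
Top.PARAM = 2                     # change the class attribute after pickling
restored = pickle.loads(data)
print(restored.PARAM)             # prints 2: PARAM was never in the pickle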

How can I pickle a dynamically created nested class in python?

I have a nested class:
class WidgetType(object):

    class FloatType(object):
        pass

    class TextType(object):
        pass
... and an object that refers to the nested class type (not an instance of it) like this:
class ObjectToPickle(object):
    def __init__(self):
        self.type = WidgetType.TextType
Trying to serialize an instance of the ObjectToPickle class results in:
PicklingError: Can't pickle <class 'setmanager.app.site.widget_data_types.TextType'>
Is there a way to pickle nested classes in python?
I know this is a very old question, but I have never explicitly seen a satisfactory solution to this question other than the obvious, and most likely correct, answer to re-structure your code.
Unfortunately, it is not always practical to do such a thing, in which case as a very last resort, it is possible to pickle instances of classes which are defined inside another class.
The python documentation for the __reduce__ function states that you can return
A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable.
Therefore, all you need is an object which can return an instance of the appropriate class. This class must itself be picklable (hence, must live on the __main__ level), and could be as simple as:
class _NestedClassGetter(object):
    """
    When called with the containing class as the first argument,
    and the name of the nested class as the second argument,
    returns an instance of the nested class.
    """
    def __call__(self, containing_class, class_name):
        nested_class = getattr(containing_class, class_name)
        # return an instance of a nested_class. Some more intelligence could be
        # applied for class construction if necessary.
        return nested_class()
All that is left therefore, is to return the appropriate arguments in a __reduce__ method on FloatType:
class WidgetType(object):

    class FloatType(object):
        def __reduce__(self):
            # return a class which can return this class when called with the
            # appropriate tuple of arguments
            return (_NestedClassGetter(), (WidgetType, self.__class__.__name__, ))
The result is a class which is nested but instances can be pickled (further work is needed to dump/load the __state__ information, but this is relatively straightforward as per the __reduce__ documentation).
This same technique (with slight code modifications) can be applied for deeply nested classes.
A fully worked example:
import pickle

class ParentClass(object):

    class NestedClass(object):
        def __init__(self, var1):
            self.var1 = var1

        def __reduce__(self):
            state = self.__dict__.copy()
            return (_NestedClassGetter(),
                    (ParentClass, self.__class__.__name__, ),
                    state,
                    )

class _NestedClassGetter(object):
    """
    When called with the containing class as the first argument,
    and the name of the nested class as the second argument,
    returns an instance of the nested class.
    """
    def __call__(self, containing_class, class_name):
        nested_class = getattr(containing_class, class_name)

        # make an instance of a simple object (this one will do), for which we
        # can change the __class__ later on.
        nested_instance = _NestedClassGetter()

        # set the class of the instance; the __init__ will never be called on
        # the class, but the original state will be set later on by pickle.
        nested_instance.__class__ = nested_class
        return nested_instance

if __name__ == '__main__':
    orig = ParentClass.NestedClass(var1=['hello', 'world'])
    pickle.dump(orig, open('simple.pickle', 'w'))
    pickled = pickle.load(open('simple.pickle', 'r'))
    print type(pickled)
    print pickled.var1
My final note on this is to remember what the other answers have said:
If you are in a position to do so, consider re-factoring your code to
avoid the nested classes in the first place.
The pickle module is trying to get the TextType class from the module. But since the class is nested it doesn't work. jasonjs's suggestion will work.
Here are the lines in pickle.py responsible for the error message:
try:
    __import__(module)
    mod = sys.modules[module]
    klass = getattr(mod, name)
except (ImportError, KeyError, AttributeError):
    raise PicklingError(
        "Can't pickle %r: it's not found as %s.%s" %
        (obj, module, name))
klass = getattr(mod, name) will not work in the nested-class case, of course. To demonstrate what is going on, try adding these lines before pickling the instance:
import sys
setattr(sys.modules[__name__], 'TextType', WidgetType.TextType)
This code adds TextType as an attribute to the module, and the pickling should then work just fine. I don't advise you to use this hack, though.
If you use dill instead of pickle, it works.
>>> import dill
>>>
>>> class WidgetType(object):
...     class FloatType(object):
...         pass
...     class TextType(object):
...         pass
...
>>> class ObjectToPickle(object):
...     def __init__(self):
...         self.type = WidgetType.TextType
...
>>> x = ObjectToPickle()
>>>
>>> _x = dill.dumps(x)
>>> x_ = dill.loads(_x)
>>> x_
<__main__.ObjectToPickle object at 0x10b20a250>
>>> x_.type
<class '__main__.TextType'>
Get dill here: https://github.com/uqfoundation/dill
In Sage (www.sagemath.org), we have many instances of this pickling issue. The way we decided to solve it systematically is to give the outer class a specific metaclass whose goal is to implement and hide the hack. Note that this automatically propagates through nested classes if there are several levels of nesting.
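A rough sketch of that idea (this is not Sage's actual implementation; the metaclass name and the alias scheme are invented here, and the Python 2 __metaclass__ syntax is used to match the examples above):
import sys

class NestedClassMeta(type):
    """Re-expose nested classes at module level so pickle can find them."""
    def __init__(cls, name, bases, namespace):
        super(NestedClassMeta, cls).__init__(name, bases, namespace)
        module = sys.modules[cls.__module__]
        for attr, value in namespace.items():
            if isinstance(value, type) and not attr.startswith('__'):
                alias = "%s_%s" % (name, attr)   # e.g. "WidgetType_TextType"
                value.__name__ = alias           # the name pickle will look up
                setattr(module, alias, value)    # make it a module-level global

class WidgetType(object):
    __metaclass__ = NestedClassMeta

    class TextType(object):
        pass

# WidgetType.TextType instances can now be pickled with the stock pickle module.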
Pickle only works with classes defined in module scope (top level). In this case, it looks like you could define the nested classes in module scope and then set them as attributes on WidgetType, assuming there's a reason not to just reference TextType and FloatType in your code. Or, import the module they're in and use widget_type.TextType and widget_type.FloatType.
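A minimal sketch of that restructuring (keeping the WidgetType.TextType spelling working for existing callers):
class FloatType(object):
    pass

class TextType(object):
    pass

class WidgetType(object):
    # plain aliases so existing code can keep saying WidgetType.TextType
    FloatType = FloatType
    TextType = TextType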
Nadia's answer is pretty complete - it is practically not something you want to be doing; are you sure you can't use inheritance in WidgetTypes instead of nested classes?
The only reason to use nested classes is to encapsulate classes that work together closely; your specific example looks like an immediate inheritance candidate to me - there is no benefit in nesting WidgetType classes together; put them in a module and inherit from the base WidgetType instead.
This seems to work fine in newer versions of Python. I tried it in v3.8 and it was able to pickle and unpickle the nested class.
