Preserve custom attributes when pickling a subclass of numpy ndarray

I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.
I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the scope is essentially 'copied' to the worker processes by pickling.
The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions among the dill developers suggest that I should be focusing on the __reduce__ method, which is called at pickling time.
Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
Now here is the problem:
import pickle
obj = RealisticInfoArray([1, 2, 3], info='foo')
print obj.info # 'foo'
pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print new_obj.info # raises AttributeError
Thanks.

np.ndarray uses __reduce__ to pickle itself. We can take a look at what it actually returns when you call that function to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:
When a tuple is returned, it must be between two and five elements
long. Optional elements can either be omitted, or None can be provided
as their value. The contents of this tuple are pickled as normal and
used to reconstruct the object at unpickling time. The semantics of
each element are:
A callable object that will be called to create the initial version of
the object. The next element of the tuple will provide arguments for
this callable, and later elements provide additional state information
that will subsequently be used to fully reconstruct the pickled data.
In the unpickling environment this object must be either a class, a
callable registered as a “safe constructor” (see below), or it must
have an attribute __safe_for_unpickling__ with a true value.
Otherwise, an UnpicklingError will be raised in the unpickling
environment. Note that as usual, the callable itself is pickled by
name.
A tuple of arguments for the callable object.
Optionally, the object’s state, which will be passed to the object’s
__setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method,
then, as above, the value must be a dictionary and it will be added to
the object’s __dict__.
So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ to append our custom attribute to the state tuple, and additionally override __setstate__ to restore it when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:
class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'

I'm the dill (and pathos) author. dill was pickling a numpy.array before numpy could do it itself. @dano's explanation is pretty accurate. Personally, I'd just use dill and let it do the job for you. With dill, you don't need __reduce__, as dill has several ways that it grabs subclassed attributes… one of which is storing the __dict__ for any class object. pickle doesn't do this, because it usually works with classes by name reference and doesn't store the class object itself… so you have to work with __reduce__ to make pickle work for you. No need, in most cases, with dill.
>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print obj.info  # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print new_obj.info
foo
dill can extend itself into pickle (essentially by copy_reg everything it knows), so you can then use all dill types in anything that uses pickle. Now, if you are going to use multiprocessing, you are a bit screwed, since it uses cPickle. There is, however, the pathos fork of multiprocessing (called pathos.multiprocessing), whose only substantive change is that it uses dill instead of cPickle… and thus can serialize a heck of a lot more in a Pool.map. I think (currently) if you want to work with your subclass of a numpy.array in multiprocessing (or pathos.multiprocessing), you might have to do something like @dano suggests -- but I'm not sure, as I couldn't think of a good case off the top of my head to test your subclass.
If you are interested, get pathos here: https://github.com/uqfoundation
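For illustration, here is a minimal sketch of the pathos pool API (an assumption on my part: that the pool is exposed as pathos.multiprocessing.ProcessingPool, as in recent releases; names have shifted between versions):
import pathos.multiprocessing

pool = pathos.multiprocessing.ProcessingPool(nodes=4)
# dill-based serialization means even a lambda can be shipped to the
# workers, which stdlib multiprocessing + cPickle would reject
print(pool.map(lambda x: x * 2, [1, 2, 3]))  # [2, 4, 6]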

Here is a slight improvement to @dano's answer and @Gabriel's comment. Leveraging the __dict__ attribute for serialization works for me even with subclasses.
def __reduce__(self):
    # Get the parent's __reduce__ tuple
    pickled_state = super(RealisticInfoArray, self).__reduce__()
    # Create our own tuple to pass to __setstate__, but append the
    # __dict__ rather than individual members.
    new_state = pickled_state[2] + (self.__dict__,)
    # Return a tuple that replaces the parent's __setstate__ tuple with our own
    return (pickled_state[0], pickled_state[1], new_state)

def __setstate__(self, state):
    self.__dict__.update(state[-1])  # Update the internal dict from state
    # Call the parent's __setstate__ with the other tuple elements.
    super(RealisticInfoArray, self).__setstate__(state[0:-1])
Here is a full example: https://onlinegdb.com/SJ88d5DLB


Pickle and decorated classes (PicklingError: not the same object)

The following minimal example uses a dummy decorator that just prints some message when an object of the decorated class is constructed.
import pickle

def decorate(message):
    def call_decorator(func):
        def wrapper(*args, **kwargs):
            print(message)
            return func(*args, **kwargs)
        return wrapper
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()
dump = pickle.dumps(foo)  # Fails already here.
foo = pickle.loads(dump)
Using it however makes pickle raise the following exception:
_pickle.PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo
Is there anything I can do to fix this?
Pickle requires that the __class__ attribute of instances can be loaded via importing.
Pickling instances only stores the instance data, and the __qualname__ and __module__ attributes of the class are used to later on re-create the instance by importing the class again and creating a new instance for the class.
Pickle validates that the class can actually be imported first. The __module__ and __qualname__ pair are used to find the correct module and then access the object named by __qualname__ on that module, and if the __class__ object and the object found on the module don't match, the error you see is raised.
Here, foo.__class__ points to a class object with __qualname__ set to 'Foo' and __module__ set to '__main__', but sys.modules['__main__'].Foo doesn't point to a class, it points to a function instead, the wrapper nested function your decorator returned.
There are two possible solutions:
Don't return a function; return the original class, and perhaps instrument the class object to do the work the wrapper does (a sketch of this option follows after this list). If you are acting on the arguments for the class constructor, add or wrap a __new__ or __init__ method on the decorated class.
Take into account that unpickling usually calls __new__ on the class to create a new empty instance, before restoring the instance state (unless pickling has been customised).
Store the class under a new location. Alter the __qualname__ and perhaps the __module__ attributes of the class to point to a location where the original class can be found by pickle. On unpickling the right type of instance will be created again, just like the original Foo() call would have.
Another option is to customise pickling for the produced class. You can give the class new __reduce_ex__ and new __reduce__ methods that point to the wrapper function or a custom reduce function, instead. This can get complex, as the class may already have customised pickling, and object.__reduce_ex__ provides a default, and the return value can differ by pickle version.
If you don't want to alter the class, you can also use the copyreg.pickle() function to register a custom __reduce__ handler for the class.
Either way, the return value of the reducer should still avoid referencing the class and should reference the new constructor instead, by the name that it can be imported with. This can be problematic if you use the decorator directly with new_name = decorator()(classobj). Pickle itself would not deal with such situations either (as classobj.__name__ would not match new_name).
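As a hedged sketch of the first option above (returning the class itself and wrapping its __init__, rather than returning a wrapper function; the wrapping scheme here is illustrative, not the only way to do it):
import pickle

def decorate(message):
    def call_decorator(cls):
        orig_init = cls.__init__
        def __init__(self, *args, **kwargs):
            print(message)
            orig_init(self, *args, **kwargs)
        cls.__init__ = __init__
        return cls  # return the class, so __main__.Foo still names it
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()               # prints 'hi'
dump = pickle.dumps(foo)  # works: pickle can import __main__.Foo again
foo = pickle.loads(dump)
Note that unpickling does not call __init__, so the message is printed on construction but not on load.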
Using dill instead of pickle raises no errors.
import dill

def decorate(message):
    def call_decorator(func):
        def wrapper(*args, **kwargs):
            print(message)
            return func(*args, **kwargs)
        return wrapper
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()
dump = dill.dumps(foo)  # No error here, unlike with pickle.
foo = dill.loads(dump)
output -> hi

How is types.MethodType used?

What arguments does types.MethodType expect, and what does it return?
https://docs.python.org/3.6/library/types.html doesn't say more about it:
types.MethodType
The type of methods of user-defined class instances.
For an example, from https://docs.python.org/3.6/howto/descriptor.html
To support method calls, functions include the __get__() method for
binding methods during attribute access. This means that all functions
are non-data descriptors which return bound or unbound methods
depending whether they are invoked from an object or a class. In pure
python, it works like this:
class Function(object):
    . . .
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return types.MethodType(self, obj)
Must the first argument self of types.MethodType be a callable object? In other words, must the class Function be a callable type, i.e. must Function have a method __call__?
If self is a callable object, does it take at least one argument?
Does types.MethodType(self, obj) mean giving obj as the first argument to the callable object self, i.e. currying self with obj?
How does types.MethodType(self, obj) create and return an instance of types.MethodType?
Thanks.
Usually you don't need to create instance of types.MethodType yourself. Instead, you'll get one automatically when you access a method on an instance of a class.
For example, if we make a class, create an instance of it, then access a method on the instance (without calling it), we'll get an instance of types.MethodType back:
import types

class Foo:
    def bar(self):
        pass

foo = Foo()
method = foo.bar
print(type(method) == types.MethodType)  # prints True
The code you excerpt in your question is trying to show how this normally happens. It's not something you usually have to do yourself, though you can if you really want to. For instance, to create another instance of types.MethodType equivalent to method above, we could do:
method_manual = types.MethodType(Foo.bar, foo)
The first argument to MethodType is a callable object (normally a function, but it can be something else, like an instance of the Function class in the example you were reading). The second argument is what we're binding the function to. When you call the method object (with e.g. method()), the bound object will be passed into the function as the first argument.
Usually the object the method gets bound to is an instance, though it can be something else. For instance, a classmethod decorated function will bind to the class it is called on, rather than an instance. Here's an example of that (both getting a method bound to a class automatically, and doing it manually ourselves):
class Foo2:
    @classmethod
    def baz(cls):
        pass

foo2 = Foo2()
method2 = Foo2.baz
method2_via_an_instance = foo2.baz
method2_manual = types.MethodType(method2.__func__, Foo2)
All three of the method2-prefixed variables work exactly the same way (when you call them, they'll all call baz with Foo2 as the cls argument). The only wonky thing about the manual approach this time is that it's hard to get at the original baz function without getting a bound method instead, so I fished it out of one of the other bound method objects.
A final note: The name types.MethodType is an alias for the internal type used for bound methods, which doesn't otherwise have an accessible name. Unlike many classes, the repr of an instance is not an expression to recreate it (it will be something like "<bound method Foo.bar of <__main__.Foo object at 0x0000...>>"). Nor is the repr of the type a valid name to access the type by (the repr is "method").
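A quick check of both points (Python 3 shown; the address in the repr will vary):
import types

class Foo:
    def bar(self):
        pass

method = Foo().bar
print(repr(method))            # <bound method Foo.bar of <__main__.Foo object at 0x...>>
print(repr(types.MethodType))  # <class 'method'> -- but 'method' is not an accessible name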
Short Answer:
Must the first argument self of types.MethodType be a callable object?
In other words, must the class Function be a callable type, i.e. must
Function have a method __call__?
Yes
If self is a callable object, does it take at least one argument?
Depends
Does types.MethodType(self, obj) mean giving obj as the first argument
to the callable object self, i.e. currying self with obj?
Yes
How does types.MethodType(self, obj) create and return an instance of
types.MethodType?
It doesn't work like that.
Long Answer:
Consider the code from the question again:
class Function(object):
    . . .
    def __get__(self, obj, objtype=None):
        "Simulate func_descr_get() in Objects/funcobject.c"
        if obj is None:
            return self
        return types.MethodType(self, obj)
As Daniel explained, this is mainly a demonstration of the following passage:
To support method calls, functions include the __get__() method for
binding methods during attribute access. This means that all functions
are non-data descriptors which return bound or unbound methods
depending whether they are invoked from an object or a class. In pure
python, it works like this:
types.MethodType() comes into play when the Function is accessed through an instance: obj is then not None, so if obj is None is False, and the function is returned as a method bound to that object (a bound method).
This mirrors the two ways a callable can be invoked:
some_func() or some_object.some_func()
The invoking-descriptors section of the HOWTO, https://docs.python.org/3.6/howto/descriptor.html#invoking-descriptors, explains the machinery behind the second form:
For objects, the machinery is in object.__getattribute__() which
transforms b.x into type(b).__dict__['x'].__get__(b, type(b)). The
implementation works through a precedence chain that gives data
descriptors priority over instance variables, instance variables
priority over non-data descriptors, and assigns lowest priority to
__getattr__() if provided.
Here is some demonstration code (Python 2):
>>> import types
>>> types.MethodType
<type 'instancemethod'>
>>> def a(self):
...     print(1)
...
>>> class B:
...     pass
...
>>> types.MethodType(a, B)
<bound method ?.a of <class __main__.B at 0x7f4d3d5aa598>>
>>> B.t = types.MethodType(a, B)
>>> B.t()
1
>>> def s():
...     print(3)
...
>>> B.r = types.MethodType(s, B)
>>> B.r
<bound method ?.s of <class __main__.B at 0x7f4d3d5aa598>>
>>> B.r()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: s() takes no arguments (1 given)
See also dynamically adding callable to class as instance "method"
Documentation doesn't say much, but you can always check its source code. The signature of MethodType constructor is:
def __init__(self, func: Callable[..., Any], obj: object) -> None: ...
It accepts a callable and an object to bind it to.
MethodType can be used to add an instance method to an object, instead of a function; here's an example:
from types import MethodType

class MyClass:
    language = 'Python'

# a function is bound to obj1
obj1 = MyClass()
obj1.say_hello = lambda: 'Hello World!'
print(type(obj1.say_hello))  # type is class 'function'
obj1.say_hello()

# a method is bound to obj2
obj2 = MyClass()
# this is used to bind a "method" to a specific object obj2, rather than a function
obj2.say_hello = MethodType(lambda self: f'Hello {self.language}!', obj2)
print(type(obj2.say_hello))  # type is class 'method'
obj2.say_hello()
It's not something you would ever call. Like most of the classes in the types module, it's more for comparing with existing objects (for example in isinstance).
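A minimal sketch of that comparison use (Python 3, where functions accessed on the class are plain functions):
import types

class A:
    def m(self):
        pass

a = A()
print(isinstance(a.m, types.MethodType))  # True: bound method
print(isinstance(A.m, types.MethodType))  # False: plain function
print(isinstance(len, types.MethodType))  # False: builtin function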

Why isn't Pickle calling __new__ like the documentation says?

The documentation for Pickle specifically says:
Instances of a new-style class C are created using:
obj = C.__new__(C, *args)
Attempting to take advantage of this, I created a singleton with no instance attributes or methods:
class ZeroResultSentinel(object):
    instance = None
    def __new__(cls, *args):
        if not cls.instance:
            cls.instance = super(ZeroResultSentinel, cls).__new__(cls, *args)
        return cls.instance
(This class is used in a caching layer to differentiate a no-result result from nothing in the cache.)
The singleton works great (every call to ZeroResultSentinel() results in the same instance in memory, and ZeroResultSentinel() == ZeroResultSentinel() evaluates to True). And I can pickle and unpickle the instance without errors. However, when I unpickle it, I get a different instance. So I placed a breakpoint within __new__. I hit the breakpoint every time I call ZeroResultSentinel(), but I do not hit a breakpoint when I unpickle a pickled ZeroResultSentinel. This is in direct contradiction to the documentation. So am I doing something wrong, or is the documentation incorrect?
The documentation doesn't really make it clear, but your __new__ method will only be used for pickle protocol 2 and up:
>>> class Foo(object):
...     def __new__(cls):
...         print "New"
...         return object.__new__(cls)
...
>>> foo = Foo()
New
>>> pickle.loads(pickle.dumps(foo, protocol=0))
<__main__.Foo object at 0x00000000025E9A20>
>>> pickle.loads(pickle.dumps(foo, protocol=2))
New
<__main__.Foo object at 0x00000000022A3F60>
On Python 2, the default protocol is 0, so if you're using the default, you'll have to change that.
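Applied to the sentinel above, a minimal sketch (assuming pickling and unpickling happen in the same process, so cls.instance is already populated):
import pickle

class ZeroResultSentinel(object):
    instance = None
    def __new__(cls, *args):
        if not cls.instance:
            cls.instance = super(ZeroResultSentinel, cls).__new__(cls, *args)
        return cls.instance

sentinel = ZeroResultSentinel()
copy = pickle.loads(pickle.dumps(sentinel, protocol=2))
print(copy is sentinel)  # True: protocol 2 routes creation through __new__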

How to get the object for a given class name in Python?

Is there any way to get the objects of a class when the class name is known? If there are multiple objects of that class, they all need to be printed.
class A():
    pass
Assume that someone has created objects of class A in some other files. I want to look up all instances of class A.
If you are the one creating the class you can simply store weak-references when instantiating the class:
import weakref

class A(object):
    instances = []
    def __init__(self):
        A.instances.append(weakref.ref(self))

a, b, c = A(), A(), A()
instances = [ref() for ref in A.instances if ref() is not None]
Using weak references allows the instances to be deallocated before the class.
See the weakref module for details on what it does.
Note that you may be able to use this technique even with classes that you didn't write. You simply have to monkey-patch the class.
For example:
import weakref

def track_instances(cls):
    cls._old_init = cls.__init__
    cls.instances = []
    def init(self, *args, **kwargs):
        cls.instances.append(weakref.ref(self))
        cls._old_init(self, *args, **kwargs)
    cls.__init__ = init
    return cls
Then you can do:
track_instances(ExternalClass)
And all instances created after the execution of this statement will be found in ExternalClass.instances.
Depending on the class you may have to replace __new__ instead of __init__.
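For completeness, a hypothetical variant that hooks __new__ instead; classes whose __new__ takes extra arguments may need more care:
import weakref

def track_instances_new(cls):
    cls.instances = []
    old_new = cls.__new__
    def new(klass, *args, **kwargs):
        inst = old_new(klass)  # object.__new__ ignores the constructor args here
        klass.instances.append(weakref.ref(inst))
        return inst
    cls.__new__ = staticmethod(new)  # staticmethod avoids unwanted binding
    return cls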
You can do this even without any special code in the class, simply using the garbage collector:
import gc
candidates = gc.get_referrers(cls_object)
instances = [candidate for candidate in candidates if isinstance(candidate, cls_object)]
And you can always obtain the class object since you can find it using object.__subclasses__ method:
cls_object = next(cls for cls in object.__subclasses__() if cls.__name__ == cls_name)
(assuming there is only a class with that name, otherwise you should try all of them)
However I cannot think of a situation where this is the right thing to do, so avoid this code in real applications.
I've done some testing and I believe that this solution may not work for built-in classes or classes defined in C extensions.
If you are in this case the last resort is to use gc.get_objects() to retrieve all tracked objects. However this will work only if the object support cyclic garbage collection, so there isn't a method that works in every possible situation.
Here is a version that gets the instances from memory. I wouldn't recommend using this in live code, but it can be convenient for debugging:
import weakref

class SomeClass(object):
    register = []
    def __init__(self):
        self.register.append(weakref.ref(self))

a = SomeClass()
b = SomeClass()
c = SomeClass()

# Now the magic :)
import gc

def get_instances(class_name):
    # Get the objects from memory
    for instance in gc.get_objects():
        # Try and get the actual class
        class_ = getattr(instance, '__class__', None)
        # Only yield if the class has the name we want
        if class_ and getattr(class_, '__name__', None) == class_name:
            yield instance

print list(get_instances('SomeClass'))
Python provides the types module, which defines classes for built-in types, and the locals() and globals() functions, which return dictionaries of the local and global variables in the application.
One quick way to find objects by type is this (Python 2; types.InstanceType only covers old-style class instances):
import types

for varname, var_instance in locals().items():
    if type(var_instance) == types.InstanceType and var_instance.__class__.__name__ == 'CLASS_NAME_YOU_ARE_LOOKING_FOR':
        print "This instance was found:", varname, var_instance
It's worth going through the Python library documentation and reading the docs for modules that work with code directly. Some of these are inspect, gc, types, codeop, code, imp, ast, bdb, and pdb. The IDLE source code is also very informative.
Instances are created within a namespace:
def some_function():
    some_object = MyClass()
In this case, some_object is a name inside the "namespace" of the function that points at a MyClass instance. Once you leave the namespace (i.e., the function ends), Python's garbage collection cleans up the name and the instance.
If some other location also held a reference to the object, the cleanup wouldn't happen.
So: no, there's no place where a list of instances is maintained.
It would be a different case if you were to use a database with an ORM (object-relational mapper). In Django's ORM you can do MyClass.objects.all() if MyClass is a database object. Something to look into if you really need the functionality.
Update: See Bakuriu's answer. The garbage collector (which I mentioned) knows about all the instances :-) And he suggests the "weakref" module that prevents my won't-be-cleaned-up problem.
You can't get names for all the instances, as they may not all have names, or the names they do have may be out of scope. You may, however, be able to get the instances themselves.
If you are willing to keep track of the instances yourself, use a WeakSet:
import weakref

class SomeClass(object):
    instances = weakref.WeakSet()
    def __init__(self):
        self.instances.add(self)
>>> instances = [SomeClass(), SomeClass(), SomeClass()]
>>> other = SomeClass()
>>> SomeClass.instances
<_weakrefset.WeakSet object at 0x0291F6F0>
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
Note that just deleting a name may not destroy the instance. other still exists until it is garbage collected:
>>> del other
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
>>> import gc
>>> gc.collect()
0
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x0291F210>]
If you don't want to track them manually, then it is possible to use gc.get_objects() and filter out the instances you want, but that means you have to filter through all the objects in your program every time you do this. Even in the above example that means processing nearly 12,000 objects to find the 3 instances you want.
>>> [g for g in gc.get_objects() if isinstance(g, SomeClass)]
[<__main__.SomeClass object at 0x0291F210>, <__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>]
>>> class TestClass:
...     pass
...
>>> foo = TestClass()
>>> for i in dir():
...     if isinstance(eval(i), TestClass):
...         print(i)
...
foo
>>>
Finally found a way to get through.
As I know the class name, I would search for the objects created for that class in the garbage collector (gc) like this:
import gc

def find_sample_key():
    for instance in gc.get_objects():
        if str(type(instance)).find("dict") != -1:
            for k in instance.keys():
                if str(k).find("Sample") != -1:
                    return k
The above code returns something like the value below. Unfortunately, it's a string, which doesn't suit the requirement; an actual object reference is needed.
<mod_example.Sample object at 0x6f55250>
From the above value, parse the id (0x6f55250) and get the object reference based on the id:
import ast
import gc

obj_id = '0x6f55250'
for obj in gc.get_objects():
    # literal_eval converts the hex string to the integer id
    if id(obj) == ast.literal_eval(obj_id):
        required_obj = obj
Hence required_obj will hold the object reference exactly in the 'obj' format.
:-)

Dumping a subclass of gtk.ListStore using pickle

I am trying to dump a custom class using pickle. The class was subclassed from gtk.ListStore, since that made it easier to store particular data and then display it using gtk. This can be reproduced as shown here.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    pass

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
The solution I tried was to add a __getstate__ method to my class. As far as I understand the documentation, this should take precedence for pickle so that it no longer tries to serialize the ListStore, which it is unable to do. However, I still get an identical error from pickle.dump when I try to pickle my object. The error can be reproduced as follows.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    def __getstate__(self):
        return 'bar'

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
In each case, pickle.dump raises a TypeError, "can't pickle ListStore objects". Using print statements, I have verified that the __getstate__ function is run when using pickle.dump. I don't see any hints as to what to do next from the documentation, and so I'm in a bit of a bind. Any suggestions?
Here is a quick working example to show you the steps you need to employ to pickle "unpicklable types" like gtk.ListStore. With this method you can even use json instead of pickle for your purpose (see the sketch at the end). Essentially you need to do a few things:
Define __reduce__ which returns a function and arguments needed to reconstruct the instance.
Determine the column types for your ListStore. The method self.get_column_type(0) returns a GType, so you will need to map this back to the corresponding Python type. I've left that as an exercise (a hedged sketch of one possible mapping follows the example output below) - in my example I've employed a hack to get the column types from the first row of values.
Your _new_foo function will need to rebuild the instance.
Example:
import gtk, os, pickle

def _new_foo(cls, coltypes, rows):
    inst = cls.__new__(cls)
    inst.__init__(*coltypes)
    for row in rows:
        inst.append(row)
    return inst

class foo(gtk.ListStore):
    def __reduce__(self):
        rows = [list(row) for row in self]
        # hack - to be correct you'll really need to use
        # `self.get_column_type` and map it back to Python's
        # corresponding type.
        coltypes = [type(c) for c in rows[0]]
        return _new_foo, (self.__class__, coltypes, rows)

x = foo(str, int)
x.append(['foo', 1])
x.append(['bar', 2])

s = pickle.dumps(x)
y = pickle.loads(s)
print list(y[0])
print list(y[1])
Output:
['foo', 1]
['bar', 2]
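For the column-type mapping left as an exercise above, here is one hedged possibility (assuming the PyGTK-era gobject constants; extend the table to match your columns):
import gobject

# Hypothetical lookup table from GType constants back to Python types
_GTYPE_TO_PY = [
    (gobject.TYPE_STRING, str),
    (gobject.TYPE_INT, int),
    (gobject.TYPE_DOUBLE, float),
    (gobject.TYPE_BOOLEAN, bool),
]

def column_types(store):
    result = []
    for i in range(store.get_n_columns()):
        gtype = store.get_column_type(i)
        result.append(next(py for gt, py in _GTYPE_TO_PY if gt == gtype))
    return result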
When you subclass object, object.__reduce__ takes care of calling __getstate__. It would seem that since this is a subclass of gtk.ListStore, the default implementation of __reduce__ tries to pickle the data for reconstructing a gtk.ListStore object first, then calls your __getstate__, but since the gtk.ListStore can't be pickled, it refuses to pickle your class. The problem should go away if you try to implement __reduce__ and __reduce_ex__ instead of __getstate__.
>>> class Foo(gtk.ListStore):
...     def __init__(self, *args):
...         super(Foo, self).__init__(*args)
...         self._args = args
...     def __reduce_ex__(self, proto=None):
...         return type(self), self._args, self.__getstate__()
...     def __getstate__(self):
...         return 'foo'
...     def __setstate__(self, state):
...         print state
...
>>> x = Foo(str)
>>> pickle.loads(pickle.dumps(x))
foo
<Foo object at 0x18be1e0 (__main__+Foo-v3 at 0x194bd90)>
As an addition, you may want to consider other serializers, such as json. There you take full control of the serialization process by defining yourself how custom classes are to be serialized. Plus, by default they come without the security issues of pickle.
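As a hedged sketch of that json route, reusing the append/iteration interface from the examples above (the function names here are illustrative):
import json

def store_dumps(store):
    # Serialize only the row data; the column types travel separately.
    return json.dumps([list(row) for row in store])

def store_loads(cls, coltypes, text):
    inst = cls(*coltypes)
    for row in json.loads(text):
        inst.append(row)
    return inst

# Usage, with the foo(str, int) store from the earlier example:
#   text = store_dumps(x)
#   y = store_loads(foo, [str, int], text)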
