Python 3.6 pickling custom procedure - python

I have some objects of class A, which has its own method to be pickled; call it custom_module.customPickle(A), which takes an instance of A and returns a serialization string.
I also have a list of objects, each of class B, that contain A.
I need to pickle the list, but pickling A gives errors that are difficult to solve. However, A has its own method to be pickled.
I can implement the __reduce__() method in class B so that it calls custom_module.customPickle(A). But how can I do this so that pickle is able to serialize B efficiently?
Object A is a music21.stream and object B is a custom object. The custom serialization function is music21.converter.freezeStr(streamObj, fmt=None) and the unpickle function is music21.converter.thawStr(strData).

You can use the copyreg module to register custom functions for pickling and unpickling; the function you register acts like a __reduce__ method on the class.
If you return a tuple of (unpickle_function, state), then the registered unpickle_function callable will be called to unpickle it again, with state as the argument, so you can use your music21.converter.thawStr() function there:
import copyreg
import music21.converter
import music21.stream

def pickle_music21_stream(stream_obj):
    return music21.converter.thawStr, (music21.converter.freezeStr(stream_obj),)

copyreg.pickle(music21.stream.Stream, pickle_music21_stream)
(the constructor argument to copyreg.pickle() is ignored in recent Python versions)
This registers a global handler for those objects. You can also use a dispatch table per pickler; see Dispatch Tables in the pickle documentation for how you'd register one.
Now, when pickling, whenever an instance of Stream is encountered the pickle_music21_stream() function is used to produce a serialisation, and the thawStr() function will be used to unpickle that data again.
However, the music21.converter functions use pickle themselves. They effectively pack and clean up the stream, and then pickle the resulting Stream instance. This will then call the custom handler, and you have an infinite loop.
The work-around is to use a custom dispatch table to handle pickling and unpickling. Avoid using copyreg in this case, as it sets a global hook that'll be called recursively each time a Stream object is being pickled.
Your own pickle infrastructure needs to use a custom pickler:
import copyreg
import io
import pickle

import music21.converter
import music21.stream

def pickle_music21_stream(stream_obj):
    return music21.converter.thawStr, (music21.converter.freezeStr(stream_obj),)

def dumps(obj):
    f = io.BytesIO()
    p = pickle.Pickler(f)
    p.dispatch_table = copyreg.dispatch_table.copy()
    p.dispatch_table[music21.stream.Stream] = pickle_music21_stream
    p.dump(obj)
    return f.getvalue()

def loads(data):
    return pickle.loads(data)  # the reconstruction function is referenced in the pickle data
Here the custom function is only called when a Stream instance is found in your own data structure. The music21 routines use the global pickle.dumps() and pickle.loads() functions and won't use the same hook.
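The same per-pickler dispatch-table pattern can be exercised without music21 installed. Here is a minimal sketch in which a toy Stream class and freeze/thaw helpers (all hypothetical stand-ins for the music21 objects and converter functions) play the same roles:

```python
import copyreg
import io
import pickle

class Stream:
    """Toy stand-in for music21.stream.Stream (hypothetical)."""
    def __init__(self, notes):
        self.notes = notes

def freeze(stream_obj):
    # stand-in for music21.converter.freezeStr
    return ",".join(stream_obj.notes)

def thaw(data):
    # stand-in for music21.converter.thawStr
    return Stream(data.split(","))

def reduce_stream(stream_obj):
    return thaw, (freeze(stream_obj),)

def dumps(obj):
    f = io.BytesIO()
    p = pickle.Pickler(f)
    # per-pickler dispatch table: only this Pickler uses the custom reducer,
    # so any pickling done inside freeze()/thaw() would not recurse into it
    p.dispatch_table = copyreg.dispatch_table.copy()
    p.dispatch_table[Stream] = reduce_stream
    p.dump(obj)
    return f.getvalue()

data = dumps([Stream(["C4", "E4"]), Stream(["G4"])])
restored = pickle.loads(data)
```

Because the reducer is attached to one Pickler instance rather than registered globally via copyreg, plain pickle.dumps() elsewhere in the process is unaffected.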

Related

Pickling Cython decorated function results in PicklingError

I have the following code:
import functools

def decorator(func):
    @functools.wraps(func)
    def other_func():
        print('other func')
    return other_func

@decorator
def func():
    pass
If I try to pickle func everything works. However if I compile the module as a Cython extension it fails.
Here is the error:
>>> pickle.dumps(module.func)
PicklingError: Can't pickle <cyfunction decorator.<locals>.other_func at 0x102a45a58>: attribute lookup other_func on module failed
The same happens if I use dill instead of pickle.
Do you know how to fix it?
I don't think there is anything you can really do here. It looks like a possible bug in Cython. But there might be a good reason for why Cython does what it does that I don't know about.
The problem arises because Cython functions are exposed as builtin functions in Python land (e.g. map, all, etc.). These functions cannot have their name attributes changed. However, Cython attempts to make its functions more like pure Python functions, and so provides the ability for several of their attributes to be modified. However, the Cython functions also implement __reduce__, which customises how objects are serialised by pickle. It looks like this function doesn't think the name of the function object can be changed, and so it ignores these values and uses the name of the internal PyCFunction struct that is being wrapped (see the relevant Cython source on GitHub).
The best thing you can do is file a bug report. You might be able to create a thin wrapper that enables your function to be serialised, but this will add overhead when the function is called.
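A minimal sketch of such a thin wrapper, assuming the compiled function can be located by module and qualified name (here _target is a plain Python function standing in for the Cython function, and PicklableWrapper is a made-up helper name):

```python
import importlib
import pickle

def _target():
    # stands in for the compiled Cython function (hypothetical)
    return 'other func'

class PicklableWrapper:
    """Thin picklable wrapper around a function that pickle cannot handle."""
    def __init__(self, module, qualname):
        self._module = module
        self._qualname = qualname

    def _resolve(self):
        # look the real function up by name at call time
        obj = importlib.import_module(self._module)
        for part in self._qualname.split('.'):
            obj = getattr(obj, part)
        return obj

    def __call__(self, *args, **kwargs):
        # this indirection is the call-time overhead mentioned above
        return self._resolve()(*args, **kwargs)

    def __reduce__(self):
        # pickle the wrapper as just the (module, qualname) pair
        return type(self), (self._module, self._qualname)

func = PicklableWrapper(__name__, '_target')
restored = pickle.loads(pickle.dumps(func))
```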
Customising Pickle
You can use the persistent_id feature of the Pickler and Unpickler to override the custom implementation that Cython has provided. Below is how to customise pickling for specific types/objects. It's done with a pure python function, but you can easily change it to deal with Cython functions.
import pickle
from importlib import import_module
from io import BytesIO
# example using pure python
class NoPickle:
    def __init__(self, name):
        # emulating the set of attributes needed to pickle a function
        self.__module__ = __name__
        self.__qualname__ = name

    def __reduce__(self):
        # cannot pickle this object
        raise Exception

my_object = NoPickle('my_object')
# pickle.dumps(my_object)  # error!

# use persistent_id/persistent_load to help dump/load cython functions
class CustomPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, NoPickle):
            # replace NoPickle with type(module.func) to get the correct type;
            # alternatively you might want to include a simple cython function
            # in the same module to make it easier to get the right type
            return "CythonFunc", obj.__module__, obj.__qualname__
        else:
            # return None to pickle the object as normal
            return None

class CustomUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        if pid[0] == "CythonFunc":
            _, mod_name, func_name = pid
            return getattr(import_module(mod_name), func_name)
        else:
            raise pickle.UnpicklingError('unsupported persistent id')

bytes_ = BytesIO()
CustomPickler(bytes_).dump(my_object)
bytes_.seek(0)
obj = CustomUnpickler(bytes_).load()
assert obj is my_object

Pickle and decorated classes (PicklingError: not the same object)

The following minimal example uses a dummy decorator that just prints some message when an object of the decorated class is constructed.
import pickle

def decorate(message):
    def call_decorator(func):
        def wrapper(*args, **kwargs):
            print(message)
            return func(*args, **kwargs)
        return wrapper
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()
dump = pickle.dumps(foo)  # Fails already here.
foo = pickle.loads(dump)
Using it, however, makes pickle raise the following exception:
_pickle.PicklingError: Can't pickle <class '__main__.Foo'>: it's not the same object as __main__.Foo
Is there anything I can do to fix this?
Pickle requires that the __class__ attribute of instances can be loaded via importing.
Pickling instances only stores the instance data, and the __qualname__ and __module__ attributes of the class are used to later on re-create the instance by importing the class again and creating a new instance for the class.
Pickle validates that the class can actually be imported first. The __module__ and __qualname__ pair are used to find the correct module and then access the object named by __qualname__ on that module, and if the __class__ object and the object found on the module don't match, the error you see is raised.
Here, foo.__class__ points to a class object with __qualname__ set to 'Foo' and __module__ set to '__main__', but sys.modules['__main__'].Foo doesn't point to a class, it points to a function instead, the wrapper nested function your decorator returned.
There are two possible solutions:

1. Don't return a function; return the original class, and perhaps instrument the class object to do the work the wrapper does. If you are acting on the arguments for the class constructor, add or wrap a __new__ or __init__ method on the decorated class. Take into account that unpickling usually calls __new__ on the class to create a new empty instance, before restoring the instance state (unless pickling has been customised).

2. Store the class under a new location. Alter the __qualname__ and perhaps the __module__ attributes of the class to point to a location where the original class can be found by pickle. On unpickling the right type of instance will be created again, just like the original Foo() call would have.
Another option is to customise pickling for the produced class. You can give the class new __reduce_ex__ and new __reduce__ methods that point to the wrapper function or a custom reduce function, instead. This can get complex, as the class may already have customised pickling, and object.__reduce_ex__ provides a default, and the return value can differ by pickle version.
If you don't want to alter the class, you can also use the copyreg.pickle() function to register a custom __reduce__ handler for the class.
Either way, the return value of the reducer should still avoid referencing the class and should reference the new constructor instead, by the name it can be imported with. This can be problematic if you use the decorator directly with new_name = decorator()(classobj). Pickle itself would not deal with such situations either (as classobj.__name__ would not match new_name).
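As a sketch of the copyreg.pickle() route, the handler below reconstructs instances by calling the still-importable wrapper name (Foo) and restoring the instance __dict__; the helper names _make_foo and _reduce_foo are made up for illustration:

```python
import copyreg
import pickle

def decorate(message):
    def call_decorator(func):
        def wrapper(*args, **kwargs):
            print(message)
            return func(*args, **kwargs)
        return wrapper
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()              # goes through the wrapper, prints the message
hidden_cls = type(foo)   # the real class object, no longer importable by name

def _make_foo(state):
    obj = Foo()          # goes through the wrapper again on unpickling
    obj.__dict__.update(state)
    return obj

def _reduce_foo(obj):
    # reference the importable module-level constructor, not the hidden class
    return _make_foo, (obj.__dict__,)

copyreg.pickle(hidden_cls, _reduce_foo)

restored = pickle.loads(pickle.dumps(foo))
```

Note that _make_foo calls the wrapper, so the decorator's side effect runs once per unpickled instance; that matches how a plain Foo() call would behave.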
Using dill instead of pickle raises no errors.
import dill

def decorate(message):
    def call_decorator(func):
        def wrapper(*args, **kwargs):
            print(message)
            return func(*args, **kwargs)
        return wrapper
    return call_decorator

@decorate('hi')
class Foo:
    pass

foo = Foo()
dump = dill.dumps(foo)  # works with dill
foo = dill.loads(dump)
output -> hi

boost python enable_pickling expectation

Hi, I am using Python to instantiate a C++ class wrapped with the Boost.Python library. At the same time, I have a requirement to pickle the Python classes that use this wrapped C++ class.
So what I did is add enable_pickling() to an example class definition like this:
class_<pform::base::Price>("Price", init<double>())
    .def(self == self)
    .def(self_ns::str(self_ns::self)) // __str__
    .def("get_value", &pform::base::Price::get_value)
This makes the class pickleable. However, I get this error when unpickling it:
Boost.Python.ArgumentError: Python argument types in
Price.__init__(Price)
did not match C++ signature:
__init__(_object*, double)
So what is missing here?
A bit late, but I found the relevant boost documentation for this:
http://www.boost.org/doc/libs/1_64_0/libs/python/doc/html/reference/topics/pickle_support.html
The Pickle Interface

At the user level, the Boost.Python pickle interface involves three special methods:

__getinitargs__: When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple (it is most convenient to use a boost::python::tuple). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor. If __getinitargs__ is not defined, pickle.load will call the constructor (__init__) without arguments; i.e., the object must be default-constructible.

__getstate__: When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.

__setstate__: When an instance of a Boost.Python extension class is restored by the unpickler (pickle.load), it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.

The three special methods described above may be .def()'ed individually by the user. However, Boost.Python provides an easy-to-use high-level interface via the boost::python::pickle_suite class that also enforces consistency: __getstate__ and __setstate__ must be defined as pairs. Use of this interface is demonstrated by the following examples.
In your particular example the class is not default constructible as it needs a double argument (which I assume is the "value"). To wrap it for Python you would also need to define:
.def("__getinitargs__", +[](pform::base::Price const& self) {
    return boost::python::make_tuple(self.get_value());
})
Now Boost.Python will initialize your class using "value" instead of calling the default constructor (pform::base::Price()).

Can top level classes be pickled and unpickled (documentation wrong?)

The documentation linked below seems to say that top-level classes can be pickled, as well as their instances. But based on the answers to my previous question it seems not to be correct. In the script I posted, pickle accepts the class object and writes a file, but this is not useful.
THIS IS MY QUESTION: Is this documentation wrong, or is there something more subtle I don't understand? Also, should pickle be generating some kind of error message in this case?
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
The following types can be pickled:
None, True, and False
integers, long integers, floating point numbers, complex numbers
normal and Unicode strings
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module
built-in functions defined at the top level of a module
classes that are defined at the top level of a module ( my bold )
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).
Make a class that is defined at the top level of a module:
foo.py:
class Foo(object): pass
Then running a separate script,
script.py:
import pickle
import foo

with open('/tmp/out.pkl', 'wb') as f:
    pickle.dump(foo.Foo, f)

del foo

with open('/tmp/out.pkl', 'rb') as f:
    cls = pickle.load(f)

print(cls)
prints
<class 'foo.Foo'>
Note that the pickle file, out.pkl, merely contains strings which name the defining module and the name of the class. It does not store the definition of the class:
cfoo
Foo
p0
.
Therefore, at the time of unpickling the defining module, foo, must contain the definition of the class. If you delete the class from the defining module
del foo.Foo
then you'll get the error
AttributeError: 'module' object has no attribute 'Foo'
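A short sketch makes the reference-only behaviour visible: pickling a top-level class stores just its module and name, and unpickling resolves that name back to the very same class object:

```python
import pickle

class Foo:
    pass

data = pickle.dumps(Foo)
# the payload names the class; it does not embed the class definition
restored = pickle.loads(data)
```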
It's totally possible to pickle a class instance in python… while also saving the code to reconstruct the class and the instance's state. If you want to hack together a solution on top of pickle, or use a "trojan horse" exec based method here's how to do it:
How to unpickle an object whose class exists in a different namespace (python)?
Or, if you use dill, you have a dump function that already knows how to store a class instance, the class code, and the instance state:
How to recover a pickled class and its instances
Pickle python class instance plus definition
I'm the dill author, and I created dill in part to be able to ship class instances and class methods across multiprocessing.
Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()

Python's __reduce__/copy_reg semantic and stateful unpickler

I want to implement pickling support for objects belonging to my extension library. There is a global instance of class Service initialized at startup. All these objects are produced as a result of some Service method invocations and essentially belong to it. Service knows how to serialize them into binary buffers and how deserialize buffers back into objects.
It appeared that Python's __reduce__ should serve my purpose - implement pickling support. I started implementing one and realized that there is an issue with the unpickler (the first element of the tuple expected to be returned by __reduce__). This unpickle function needs an instance of a Service to be able to convert an input buffer into an Object. Here is a bit of pseudo code to illustrate the issue:
class Service(object):
    ...
    def pickleObject(self, obj):
        # do serialization here and return a buffer
        ...
    def unpickleObject(self, buffer):
        # do deserialization here and return a new Object
        ...

class Object(object):
    ...
    def __reduce__(self):
        return self.service().unpickleObject, (self.service().pickleObject(self),)
Note the first element in the tuple. The Python pickler does not like it: it says it is an instancemethod and it can't be pickled. Obviously the pickler is trying to store the routine into the output and wants the Service instance along with the function name, but this is not what I want to happen. I do not want (and really can't: Service is not picklable) to store the service along with all the objects. I want the service instance to be created before pickle.load is invoked and somehow have that instance get used during unpickling.
This is where I came across the copy_reg module. Again, it appeared as if it should solve my problems. This module allows registering pickler and unpickler routines per type dynamically, and these are supposed to be used later on for objects of that type. So I added this registration to the Service construction:
class Service(object):
    ...
    def __init__(self):
        ...
        import copy_reg
        copy_reg.pickle(mymodule.Object, self.pickleObject, self.unpickleObject)
self.unpickleObject is now a bound method taking the service as the first parameter and a buffer as the second. self.pickleObject is also a bound method taking the service and the object to pickle. copy_reg requires that the pickleObject routine follow reducer semantics and return a similar tuple as before. And here the problem arose again: what should I return as the first tuple element?
class Service(object):
    ...
    def pickleObject(self, obj):
        ...
        return self.unpickleObject, (self.serialize(obj),)
In this form pickle again complains that it can't pickle an instancemethod. I tried None - it does not like it either. I put some dummy function there. This works - meaning the serialization phase went through fine - but during unpickling it calls this dummy function instead of the unpickler I registered for the type mymodule.Object in the Service constructor.
So now I am at a loss. Sorry for the long explanation: I did not know how to ask this question in a few lines. I can summarize my questions like this:
Why do copy_reg semantics require me to return an unpickler routine from pickleObject, if I am expected to register one independently?
Is there any reason to prefer copy_reg.constructor interface to register unpickler routine?
How do I make pickle to use the unpickler I registered instead of one inside the stream?
What should I return as first element in a tuple as pickleObject result value? Is there a "correct" value?
Do I approach this whole thing correctly? Is there different/simpler solution?
First of all, the copy_reg module is unlikely to help you much here: it is primarily a way to add __reduce__-like features to classes that don't have that method, rather than offering any special abilities (e.g. if you want to pickle objects from some library that doesn't natively support it).
The callable returned by __reduce__ needs to be locatable in the environment where the object is to be unpickled, so an instance method isn't really appropriate. As mentioned in the Pickle documentation:
In the unpickling environment this object must be either a class, a callable registered as a
“safe constructor” (see below), or it must have an attribute
__safe_for_unpickling__ with a true value.
So if you defined a function (not method) as follows:
def _unpickle_service_object(buffer):
# Grab the global service object, however that is accomplished
service = get_global_service_object()
return service.unpickleObject(buffer)
_unpickle_service_object.__safe_for_unpickling__ = True
You could now use this _unpickle_service_object function in the return value of your __reduce__ methods, so that your objects link to the new environment's global Service object when unpickled.
