serialize instances of scipy rv_continuous and rv_discrete subclasses - python

I am using the distribution classes in scipy.stats.distributions and need to serialize instances for storage and transfer. These are quite complex objects, and they don't pickle. I am trying to develop a mixin class that makes objects pickle-able, so that I can work with remixed subclasses that otherwise behave just like the objects from scipy.stats. The more I investigate the problem, the more confused I become, and I wonder if I am missing an obvious way to do this.
I have read a related question on how to pickle instance methods, but this is only part of the overall solution that I need and may not even be necessary. I have experimented with writing pickle support functions that closely follow the __init__ method and serialize the object as arguments to __init__, but this seems brittle, especially when subclasses can define arbitrary subclass-specific behavior in __init__.
Does someone have an elegant solution to share?
Update: I found a Python bug report with an example of registering pickle support functions with the copy_reg module to pickle instance methods. For my case, the instance method attributes were the only blockers. However, I would still like to know if there is a way to use a mixin class to solve this problem, because copy_reg has global effects which may not be desirable in all situations.
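One possible shape for such a mixin is sketched below. It is only an illustration under stated assumptions, not a tested solution for scipy.stats: the names MethodAttrPickleMixin and Dist are hypothetical, and the sketch assumes the only blockers are attributes that hold methods bound to the instance itself. It is written in Python 3 syntax, where recent versions can often pickle bound methods on their own, so treat it as a demonstration of the __getstate__/__setstate__ idea rather than a required workaround.

```python
import pickle
import types


class MethodAttrPickleMixin:
    """Hypothetical mixin: stores bound-method attributes by name when pickling."""

    _MARKER = "__bound_method__"  # assumption: no real attribute value looks like this

    def __getstate__(self):
        state = {}
        for key, value in self.__dict__.items():
            # Replace methods bound to this very instance with a (marker, name) pair.
            if isinstance(value, types.MethodType) and value.__self__ is self:
                state[key] = (self._MARKER, value.__func__.__name__)
            else:
                state[key] = value
        return state

    def __setstate__(self, state):
        for key, value in state.items():
            is_marker = (
                isinstance(value, tuple) and len(value) == 2 and value[0] == self._MARKER
            )
            # Rebind the method by looking its name up on the restored instance.
            self.__dict__[key] = getattr(self, value[1]) if is_marker else value


class Dist(MethodAttrPickleMixin):
    """Hypothetical stand-in for a distribution class with a method attribute."""

    def __init__(self, scale):
        self.scale = scale
        self.pdf_impl = self._pdf  # instance-method attribute

    def _pdf(self, x):
        return x / self.scale


restored = pickle.loads(pickle.dumps(Dist(2.0)))
```

Unlike copy_reg registration, this only affects classes that inherit from the mixin. It does not help with lambdas, nested functions, or methods bound to other objects.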

Related

When does pickle fail to pickle an instance?

I have a python class which I can instantiate and then pickle. But then I have a second class, inheriting from the first, whose instances I cannot pickle. Pickle gives me the error "can't pickle instancemethod". Both instances have plenty of methods. So, does anyone have a guess as to why the first class would pickle OK, but not the second? I'm sure that you will want to see the code, but it's pretty lengthy and I really have no idea what the "offending" parts of the second class might be. So I can't show the whole thing and I don't really know what the relevant parts might be.
There's a pretty extensive list of what can and can't be pickled here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It lists the objects from roughly the first 15 sections of the Python standard library. It is not exhaustive, but it covers all objects of primary importance and many of secondary importance in the standard library.
Also, if you decide to use dill instead of pickle, I'm going to guess that you probably won't have a pickling issue, as dill can pretty much serialize anything in python.
More directly addressing your question… pickle pickles classes by reference, while dill pickles classes either by code or by reference, depending on the setting you choose (the default is to pickle the code). This can bypass the "lookup" issues that pickle has with class references.
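To make "by reference" concrete, here is a small sketch using only the standard library. Pickling a class stores its module and name rather than its code, and objects that cannot be looked up by name (such as a lambda) are rejected. The variable names are illustrative only.

```python
import pickle
from collections import OrderedDict

# Pickling a class writes only its module and qualified name, not its code.
payload = pickle.dumps(OrderedDict)
by_reference = b"collections" in payload and b"OrderedDict" in payload

# Unpickling looks the name up again and returns the very same class object.
same_object = pickle.loads(payload) is OrderedDict

# A lambda has no importable name, so the lookup (and the pickling) fails.
try:
    pickle.dumps(lambda x: x + 1)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
```

This is why a class that is created dynamically, or that is missing on the receiving end, cannot be restored by plain pickle.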
Pickle simply doesn't pickle the code of your classes; it only works on data. Methods are not written to the pickle, so if an object carries method objects as part of its data, pickling may fail or give you a broken result.
Source: Learning Python by Mark Lutz

In Python, what is a method_descriptor?

In Python, what is a method_descriptor (in plain English)?
I had this error, and I can't really find any information on it:
*** TypeError: can't pickle method_descriptor objects
Switch to dill.
I am not interested in debugging this error...
You should be. If you're uninterested in debugging errors, you're in the wrong field. For the sake of polite argumentation, let's charitably assume you authored that comment under the duress of an unreasonable deadline. (It happens.)
The standard pickle module is incapable of serializing so-called "exotic types," including but presumably not limited to: functions with yields, nested functions, lambdas, cells, methods, unbound methods, modules, ranges, slices, code objects, methodwrapper objects, dictproxy objects, getsetdescriptor objects, memberdescriptor objects, wrapperdescriptor objects, notimplemented objects, ellipsis objects, quit objects, and (...wait for it!) method_descriptor objects.
All is not lost, however. The third-party dill package is capable of serializing all of these types and substantially more. Since dill is a drop-in replacement for pickle, globally replacing all calls across your codebase to the pickle.dump() function with the equivalent dill.dump() function should suffice to pickle the problematic method descriptors in question.
I just want to know what a method_descriptor is, in plain English.
No, you don't. There is no plain-English explanation of method descriptors, because the descriptor protocol underlying method descriptors is deliciously dark voodoo.
It's voodoo, because it has to be; it's the fundamental basis for Python's core implementation of functions, properties, static methods, and class methods. It's dark, because only a dwindling cabal of secretive Pythonistas are actually capable of correctly implementing a descriptor in the wild. It's delicious, because the power that data descriptors in particular provide is nonpareil in the Python ecosystem.
Fortunately, you don't need to know what method descriptors are to pickle them. You only need to switch to dill.
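method_descriptor is an ordinary class that implements the descriptor protocol through a __get__ method. (Data descriptors additionally define __set__ and __delete__; note that the protocol method is __delete__, not __del__.)
You can check this link for more info:
Static vs instance methods of str in Python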
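A quick way to see one in action, assuming CPython (the type name is an implementation detail): the methods of built-in types such as str.join are method_descriptor objects when accessed on the class, and calling __get__ binds them to an instance.

```python
# Accessing a built-in method on the class (not an instance) yields the descriptor.
descriptor = str.join
kind = type(descriptor).__name__  # 'method_descriptor' on CPython

# It is a non-data descriptor: it has __get__ but no __set__.
has_get = hasattr(descriptor, "__get__")
has_set = hasattr(descriptor, "__set__")

# __get__ binds the descriptor to an instance, just like ", ".join does.
bound = descriptor.__get__(", ", str)
joined = bound(["a", "b"])
```

Here joined is the same string that ", ".join(["a", "b"]) would return.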

Unpickling classes not defined on the receiving end

As stated in the pickle documentation, classes are normally pickled in such a way that they require the exact same class to be present in a module on the receiving end. However, I note that there are also __getstate__() and __setstate__() methods for classes, which affect how their instances are pickled...
How feasible would it be to create a metaclass that would allow pickling and unpickling of the classes created from that metaclass (in other words, the instances of that metaclass) even without the classes being present on the receiving end? (Though I think the metaclass would probably have to be present.)
Would utilizing a __reduce__() method in the class or metaclass also be something to look into?
The classes have to be somehow present on the receiving end, because methods are not stored with the objects. So, I think that using a specific metaclass unfortunately can't help, here…
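As for __reduce__(): it cannot ship the methods either, but it can make an instance unpickle as something the receiving end is guaranteed to have. The sketch below uses a hypothetical Settings class and reduces each instance to a plain dict, so only the built-in dict is needed on load. The price is that the restored object is just data and has none of the original class's behaviour.

```python
import pickle


class Settings:
    """Hypothetical class whose instances pickle as plain dicts."""

    def __init__(self, **values):
        self.values = dict(values)

    def __reduce__(self):
        # Tell pickle to rebuild this object by calling dict(self.values).
        # Only the built-in dict is referenced, never the Settings class.
        return (dict, (self.values,))


payload = pickle.dumps(Settings(host="example", port=8080))
restored = pickle.loads(payload)  # a plain dict, no Settings class required
```

The receiver could then wrap the dict in whatever local class it does have.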

How is __slots__ implemented in Python?

How is __slots__ implemented in Python?
Is this exposed in the C interface?
How do I get __slots__ behaviour when defining a Python class in C via PyTypeObject?
Python classes by default give their instances a __dict__, so you can set any attribute on them. The point of slots is to not create a __dict__, which saves space.
In the C interface it's the other way around: an extension class has no __dict__ by default, and you would instead have to add one explicitly and add getattr/setattr support to handle it (although luckily there are functions for this already, PyObject_GenericGetAttr and PyObject_GenericSetAttr, so you don't have to implement them, just use them. (Funnily, there is no PyObject_GenericDelAttr, though; I'm not sure what that is about. (I should probably stop nesting parentheses like this (or not)))).
Slots are therefore neither needed nor meaningful for extension types. By default you just let your getattr/setattr methods access only those attributes that the class has.
As for how it's implemented, the code is in typeobject.c, and it's pretty much just a question of "If the object has a __slots__ attribute, don't create a __dict__." Quite unexciting. :-)
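The Python-level effect is easy to check. In this small sketch (class names are made up, and the member_descriptor type name is a CPython detail), the slotted class has no per-instance __dict__ and rejects attributes that are not listed in __slots__.

```python
class Plain:
    pass


class Slotted:
    __slots__ = ("x", "y")


plain, slotted = Plain(), Slotted()

plain.anything = 1  # stored in plain.__dict__
slotted.x = 1       # stored in a fixed slot, no __dict__ involved

# Attributes outside __slots__ have nowhere to go.
try:
    slotted.z = 3
    rejected = False
except AttributeError:
    rejected = True

# Each slot is exposed on the class as a descriptor.
slot_kind = type(Slotted.__dict__["x"]).__name__  # 'member_descriptor' on CPython
```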

How can I pickle suds results?

To avoid repeatedly accessing a SOAP server during development, I'm trying to cache the results so I can run the rest of my code without querying the server each time.
With the code below I get a PicklingError: Can't pickle <class suds.sudsobject.AdvertiserSearchResponse at 0x03424060>: it's not found as suds.sudsobject.AdvertiserSearchResponse when I try to pickle a suds result. I guess this is because the classes are dynamically created.
import pickle
from suds.client import Client

client = Client(...)
result = client.service.search(...)

# -1 selects the highest pickle protocol available
with open('test_pickle.dat', 'wb') as file:
    pickle.dump(result, file, -1)
If I drop the -1 protocol version from pickle.dump(result, file, -1), I get a different error:
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
Is pickling the right thing to do? Can I make it work? Is there a better way?
As the error message you're currently getting is trying to tell you, you're trying to pickle instances that are not picklable (in the ancient legacy pickle protocol you're now using) because their class defines __slots__ but not a __getstate__ method.
However, even altering their class would not help because then you'd run into the other problem -- which you already correctly identified as being likely due to dynamically generated classes. All pickle protocols serialize classes (and functions) "by name", essentially constraining them to be at top-level names in their modules. And, serializing an instance absolutely does require serializing the class (how else could you possibly reconstruct the instance later if the class was not around?!).
So you'll need to save and reload your data in some other way, breaking your current direct dependence on concrete classes in suds.sudsobject in favor of depending on an interface (either formalized or just defined by duck typing) that can be implemented both by such concrete classes when you are in fact accessing the SOAP server, and by simpler "homemade" ones when you're loading the data from a file. The data representing instance state can no doubt be represented as a dict, so you can force it through pickle if you really want, e.g. via the copy_reg module, which lets you customize the serialize/deserialize protocol for objects that you're forced to treat non-invasively (so you can't go around adding __getstate__ or the like to their classes). The problem will come only if there's a rich mesh of mutual references among such objects.
You are pickling the class object itself, and not instance objects of the class. This won't work if the class object is recreated. However, pickling instances of the class will work as long as the class object exists.
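One way to follow the "represent the state as a dict" advice is to convert the result into plain containers before caching it. The sketch below is an assumption-laden illustration: to_plain is a hypothetical helper, and SimpleNamespace objects stand in for a suds response, since a real result depends on your service. It relies only on instance __dict__ contents and skips underscore-prefixed attributes; it does not handle cyclic references.

```python
import pickle
from types import SimpleNamespace


def to_plain(obj):
    """Recursively convert attribute-style objects into dicts and lists."""
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if isinstance(obj, dict):
        return {key: to_plain(value) for key, value in obj.items()}
    if hasattr(obj, "__dict__"):
        # Keep public attributes only; private bookkeeping is dropped.
        return {
            key: to_plain(value)
            for key, value in vars(obj).items()
            if not key.startswith("_")
        }
    return obj  # numbers, strings, None, and other leaf values


# Stand-in for a dynamically generated response object (hypothetical data).
fake_result = SimpleNamespace(
    total=2,
    advertisers=[SimpleNamespace(id=1, name="A"), SimpleNamespace(id=2, name="B")],
)

plain = to_plain(fake_result)
cached = pickle.loads(pickle.dumps(plain, -1))  # only built-in types are pickled
```

The rest of the code then has to work against the dict form (or against a small wrapper class of your own) rather than the suds classes.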
