When does pickle fail to pickle an instance? - python

I have a python class which I can instantiate and then pickle. But then I have a second class, inheriting from the first, whose instances I cannot pickle. Pickle gives me the error "can't pickle instancemethod". Both instances have plenty of methods. So, does anyone have a guess as to why the first class would pickle OK, but not the second? I'm sure that you will want to see the code, but it's pretty lengthy and I really have no idea what the "offending" parts of the second class might be. So I can't show the whole thing and I don't really know what the relevant parts might be.

There's a pretty extensive list of what can and can't be pickled here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It lists objects from roughly the first 15 sections of the python standard library. It's not everything, but it covers all of the objects of primary importance and many of secondary importance.
Also, if you decide to use dill instead of pickle, I'm going to guess that you probably won't have a pickling issue, as dill can pretty much serialize anything in python.
More directly addressing your question: pickle pickles classes by reference, while dill can pickle a class either by its code or by reference, depending on the setting you choose (the default is to pickle the code). This can bypass the "lookup" issues that pickle has with class references.
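Since pickle stores an instance's attributes and only a reference to its class, one way to narrow down what the second class adds is to try pickling each instance attribute on its own. The following is a minimal sketch under assumptions, not something from the original answer: `unpicklable_attributes` is a hypothetical helper, and `types.SimpleNamespace` merely stands in for the real instance.

```python
import pickle
import types


def unpicklable_attributes(obj):
    """Return {attribute name: exception} for each instance attribute pickle rejects."""
    problems = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several different exception types
            problems[name] = exc
    return problems


# Stand-in for the real instance: one plain attribute, one lambda
# (a lambda cannot be found by name, so pickle rejects it).
suspect = types.SimpleNamespace(count=3, callback=lambda x: x + 1)
report = unpicklable_attributes(suspect)

for name, error in report.items():
    print(name, "->", type(error).__name__)
```

Running the same helper on an instance of each of the two classes and comparing the reports should point at the attribute that only the subclass sets.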

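Pickling simply doesn't pickle your classes; pickle only works on data. For an instance it stores the instance's attributes plus a reference to the class, so ordinary methods defined on the class are never pickled and don't cause trouble by themselves. What breaks is data that pickle can't handle, for example (in Python 2) a bound method stored as an instance attribute, which is what "can't pickle instancemethod" points to.
Source: Learning Python by Mark Lutz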

Related

How to write test cases for pickle backwards compatibility

You're writing a library, and you know your users pickle your object. Sometimes you add new fields, and this creates a BC problem because old pickles of the objects don't have the needed fields. You'd like to add some tests for this case.
The obvious way to do this is to save some actual pickles from your old version, shove them in your test suite somehow, and make sure you can keep unpickling them. But having hard-coded binary test data is pretty uncool. Is there a way to write tests without having to maintain binary data by hand? I've tried playing around with "faking" the __module__ and __qualname__ fields on a locally declared class (which is intended to simulate the "old" version), but I get errors like "_pickle.PicklingError: Can't pickle : it's not the same object as torch.nn.modules.conv.Conv2d". Is there a good way to do this?
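One possible way around the "not the same object" error is to make the name lookup actually resolve to the simulated class: pickle checks that module.qualname returns the very object being pickled. The sketch below installs an "old" class in a throwaway module, pickles an instance, then swaps in the "new" class before unpickling. Everything named here (`fake_mylib`, `Widget`, `install`, `make_old`, `make_new`) is hypothetical. For a class that lives in a real library module, the equivalent would presumably be to temporarily replace the attribute on that module for the duration of the test.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_mylib"  # hypothetical module name used only by this test


def install(cls):
    """Expose cls as fake_mylib.<ClassName> so pickle can find it by reference."""
    module = types.ModuleType(MODULE_NAME)
    cls.__module__ = MODULE_NAME
    cls.__qualname__ = cls.__name__  # drop the "<locals>" part of the qualname
    setattr(module, cls.__name__, cls)
    sys.modules[MODULE_NAME] = module
    return cls


def make_old():
    class Widget:  # simulated old version: no "color" field
        def __init__(self, size):
            self.size = size
    return Widget


def make_new():
    class Widget:  # simulated new version: adds "color"
        def __init__(self, size, color="red"):
            self.size = size
            self.color = color

        def __setstate__(self, state):
            # Fill in fields that old pickles do not carry.
            self.__dict__.update(state)
            self.__dict__.setdefault("color", "red")
    return Widget


old_cls = install(make_old())
old_bytes = pickle.dumps(old_cls(3))   # produced by the "old" library

new_cls = install(make_new())
restored = pickle.loads(old_bytes)     # loaded by the "new" library
```

The pickle bytes are generated inside the test run, so nothing binary has to be checked in. The trade-off is that the "old" class is a hand-written approximation of what the earlier release really looked like.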

python: pickle misbehaving in django shell as opposed to python shell?

As an additional question stemming from my previous question it turns out that pickle behaves differently in django shell compared to python shell...
this script:
import pickle
class TestObj(object): pass
testobj = TestObj()
pickled = pickle.dumps(testobj, pickle.HIGHEST_PROTOCOL)
will work fine in the python shell, but in the django shell it will raise a PicklingError along the lines of PicklingError: Can't pickle <class 'TestObj'>: attribute lookup __builtin__.TestObj failed
Is anyone able to explain the issue here? And, if possible, link it back to my previous question?
pickle will make sure it can re-import a class, as only the data on the instance itself is pickled, plus the import location of the class. As such, pickle looks for the __module__ attribute on the class to determine where it came from.
It appears the Django interactive environment does not set this __module__ attribute; as a result TestObj.__module__ is inherited from the object base class instead, and that's __builtin__. Perhaps no __name__ global is set. As a result, the pickle module ends up looking in the wrong place for your class. There is no __builtin__.TestObj after all.
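The lookup described above can be reproduced outside Django by giving a class the wrong __module__ on purpose. This is only an illustrative sketch for Python 3, where the module is called builtins rather than __builtin__. It does not show what the Django shell actually does internally.

```python
import pickle


class TestObj(object):
    pass


# Simulate the reported situation: the class claims to live in the builtins module.
TestObj.__module__ = "builtins"  # "__builtin__" on Python 2

try:
    pickle.dumps(TestObj(), pickle.HIGHEST_PROTOCOL)
    failed = False
except (pickle.PicklingError, AttributeError) as exc:
    # pickle looked for builtins.TestObj and could not find it.
    failed = True
    print(exc)
```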
From the comments, I gather that you are trying to store mocked objects in the Django cache. That won't work, as mock objects are not pickleable. That makes sense, as on unpickling (which could be in an entirely new Python process), how would pickle know what original class was being mocked?
Could you not just use a serializer that doesn't pickle classes by reference… and then pre-serialize what you pass into the django cache? This would store not only the instance, but also the class definition itself -- which should allow you to reconstruct the instance anywhere.
See my answer on your original question:
Pickle can't store an object in django locmem cache during tests?
I don't know anything about django Mock objects, or if there's something especially unpicklable about them… But as long as they are built from python code (as opposed to built in C, and have a thin python wrapper layer), then the above should likely work.

Python Pickle, avoiding module dependencies

Is there a particular way to pickle objects so that pickle.load() has no dependencies on any modules? I read that while unpickling objects, pickle tries to load the module containing the class definition of the object. Is there a way to avoid this, so that pickle.load() doesn't try to load any modules?
This may be a bit unrelated, but I would still quote from the documentation:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
You need to write a custom unpickler that avoids loading extra modules. A general approach will be:
1. Derive your custom unpickler by subclassing pickle.Unpickler.
2. Override find_class(..).
3. Inside find_class(..), check the module and the class that need to be loaded, and avoid loading them by raising an error.
4. Use this custom class to unpickle from the string.
Here is an excellent article about dangers of using pickle. You would also find the code that has the above approach.
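The steps above can be sketched roughly as follows, along the lines of the "Restricting Globals" example in the Python 3 pickle documentation. The whitelist contents and the names `SAFE_BUILTINS`, `RestrictedUnpickler`, and `restricted_loads` are illustrative choices, not something taken from the linked article.

```python
import builtins
import collections
import io
import pickle

# Illustrative whitelist: only these builtins may be looked up while unpickling.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream refers to a class or function by name.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Anything else is refused, so no other module's classes are resolved.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but only resolves whitelisted globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


allowed = restricted_loads(pickle.dumps([1, 2, range(3)]))

try:
    restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Plain data such as lists, numbers, and strings never goes through find_class, so it still loads. Objects whose class is not whitelisted are rejected rather than reconstructed.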
What you are asking does not make much sense, since serializing and deserializing objects is the primary purpose of pickle. If you want something different, serialize and deserialize your objects to XML or JSON (or any other suitable format).
There is e.g. lxml.objectify, or you can google for "Python serialize json" or "Python serialize xml". But you cannot deserialize an object from a pickle without its class definition, at least not without further coding.
http://docs.python.org/library/pickle.html
documents how to write a custom unpickler. Perhaps that's a good way to start, but this looks like the wrong way to do it.

serialize instances of scipy rv_continuous and rv_discrete subclasses

I am using the distribution classes in scipy.stats.distributions and need to serialize instances for storage and transfer. These are quite complex objects, and they don't pickle. I am trying to develop a mixin class that makes objects pickle-able, so that I can work with remixed subclasses that otherwise behave just like the objects from scipy.stats. The more I investigate the problem, the more confused I become, and I wonder if I am missing an obvious way to do this.
I have read a related question on how to pickle instance methods, but this is only part of the overall solution that I need and may not even be necessary. I have experimented with writing pickle support functions that closely follow the __init__ method and serialize the object as arguments to __init__, but this seems brittle, especially when subclasses can define arbitrary subclass-specific behavior in __init__.
Does someone have an elegant solution to share?
Update: I found a Python bug report with an example of registering pickle support functions with the copy_reg module to pickle instance methods. For my case, the instance method attributes were the only blockers. However, I would still like to know if there is a way to use a mixin class to solve this problem, because copy_reg has global effects which may not be desirable in all situations.
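On the concern about global effects: in Python 3, where copy_reg is named copyreg, a Pickler subclass can carry its own dispatch_table, so a reduction function applies only to objects dumped through that pickler. The sketch below is an assumption-laden illustration rather than a solution for the scipy classes. memoryview is just a stand-in for a type that pickle rejects by default, and `reduce_memoryview`, `LocalPickler`, and `local_dumps` are hypothetical names.

```python
import copyreg
import io
import pickle


def reduce_memoryview(view):
    """Tell pickle to rebuild the object by calling memoryview(<bytes>)."""
    return memoryview, (view.tobytes(),)


class LocalPickler(pickle.Pickler):
    # Private copy of the table: registering here leaves copyreg.dispatch_table alone.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[memoryview] = reduce_memoryview


def local_dumps(obj):
    """Serialize obj with LocalPickler and return the bytes."""
    buffer = io.BytesIO()
    LocalPickler(buffer).dump(obj)
    return buffer.getvalue()


restored = pickle.loads(local_dumps(memoryview(b"abc")))
```

The stock pickle.dumps() still refuses the same object, which is the point: the extra support is scoped to code that opts in by using the subclass.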

How can I pickle suds results?

To avoid repeatedly accessing a SOAP server during development, I'm trying to cache the results so I can run the rest of my code without querying the server each time.
With the code below I get a PicklingError: Can't pickle <class suds.sudsobject.AdvertiserSearchResponse at 0x03424060>: it's not found as suds.sudsobject.AdvertiserSearchResponse when I try to pickle a suds result. I guess this is because the classes are dynamically created.
import pickle
from suds.client import Client
client = Client(...)
result = client.service.search(...)
file = open('test_pickle.dat', 'wb')
pickle.dump(result, file, -1)
file.close()
If I drop the -1 protocol version from pickle.dump(result, file, -1), I get a different error:
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
Is pickling the right thing to do? Can I make it work? Is there a better way?
As the error message you're currently getting is trying to tell you, you're trying to pickle instances that are not picklable (in the ancient legacy pickle protocol you're now using) because their class defines __slots__ but not a __getstate__ method.
However, even altering their class would not help because then you'd run into the other problem -- which you already correctly identified as being likely due to dynamically generated classes. All pickle protocols serialize classes (and functions) "by name", essentially constraining them to be at top-level names in their modules. And, serializing an instance absolutely does require serializing the class (how else could you possibly reconstruct the instance later if the class was not around?!).
So you'll need to save and reload your data in some other way, breaking your current direct dependence on concrete classes in suds.sudsobject in favor of depending on an interface (either formalized or just defined by duck typing) that can be implemented both by such concrete classes when you are in fact accessing the SOAP server, or simpler "homemade" ones when you're loading the data from a file. (The data representing instance state can no doubt be represented as a dict, so you can force it through pickle if you really want, e.g. via the copy_reg module which allows you to customize serialize/deserialize protocols for objects that you're forced to treat non-invasively [[so you can't go around adding __getstate__ or the like to their classes]] -- the problem will come only if there's a rich mesh of mutual references among such objects).
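As a rough illustration of representing the instance state as a dict, the sketch below converts an object tree into plain dicts and lists before pickling. It assumes the state is reachable through vars(), which may not hold for every suds object, and leaf values of library-specific types might need their own conversion. `to_plain` is a hypothetical helper, and the class built with type() only stands in for a dynamically generated suds class.

```python
import pickle


def to_plain(value):
    """Recursively convert objects that have a __dict__ into plain dicts and lists."""
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if hasattr(value, "__dict__"):
        return {key: to_plain(item) for key, item in vars(value).items()}
    return value


# Stand-in for a dynamically generated class: it is bound to the name
# "Response", so pickle cannot find it under its own class name.
Response = type("AdvertiserSearchResponse", (object,), {})
result = Response()
result.total = 2
result.names = ["a", "b"]

try:
    pickle.dumps(result)
    direct_ok = True
except (pickle.PicklingError, AttributeError):
    direct_ok = False

cached = pickle.dumps(to_plain(result), pickle.HIGHEST_PROTOCOL)
loaded = pickle.loads(cached)
```

The loaded data is a dict rather than an object with attributes, so the rest of the code would need to accept either form, which is the interface point made above.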
You are pickling the class object itself, and not instance objects of the class. This won't work if the class object is recreated. However, pickling instances of the class will work as long as the class object exists.
