Python Pickle, avoiding module dependencies - python

Is there a particular way to pickle objects so that pickle.load() has no dependencies on any modules? I read that while unpickling objects, Pickle tries to load the module containing the class definition of the object. Is there a way to avoid this, so that pickle.load() doesnt try to load any modules?

May be a bit unrelated but still I would quote form the documentation:
Warning The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
You need to write a custom unpickler that avoids loading extra modules. A general approach will be:
Derive your custom unpickler by subclassing pickle.Unpickler
Override find_class(..)
Inside find_class(..) Check for module and the class that needs to be loaded. Avoid loading it by raising errors.
Use this custom class to unpickle from the string.
Here is an excellent article about dangers of using pickle. You would also find the code that has the above approach.

Does not make much sense what you are asking since the serialization and deserialization of objects is the primary purpose of the pickle functionality. If you want something different: serialize or deserialize your objects to XML or JSON (or any other suitable format).
There is e.g. lxml.objectify or you google for "Python serialize json" or "Python serialize xml"...but you can not deserialize an object from a pickle without its class definition - at least not without further coding.
http://docs.python.org/library/pickle.html
documents how to write a custom unpickler...perhaps that a good way to start - but this appears like the wrong way to do it.

Related

How to write test cases for pickle backwards compatibility

You're writing a library, and you know your users pickle your object. Sometimes you add new fields, and this creates a BC problem because old pickles of the objects don't have the needed fields. You'd like to add some tests for this case.
The obvious way to do this is to save some actual pickles from your old version, shove them in your test suite somehow, and make sure you can keep unpickling them. But having hard-coded binary test data is pretty uncool. Is there a way to write tests without having to do some manual binary? I've tried playing around with "faking" the __module__ and __qualname__ fields on a locally declared class (which is intended to simulate the "old" version) but I get errors like "_pickle.PicklingError: Can't pickle : it's not the same object as torch.nn.modules.conv.Conv2d" Is there a good way to do this?

Do Pickle and Dill have similar levels of risk of containing malicious script?

Dill is obviously a very useful module, and it seems as long as you manage the files carefully it is relatively safe. But I was put off by the statement:
Thus dill is not intended to be secure against erroneously or maliciously constructed data. It is left to the user to decide whether the data they unpickle is from a trustworthy source.
I read in in https://pypi.python.org/pypi/dill. It's left to the user to decide how to manage their files.
If I understand correctly, once it has been pickled by dill, you can not easily find out what the original script will do without some special skill.
MY QUESTION IS: although I don't see a warning, does a similar situation also exist for pickle?
Dill is built on top of pickle, and the warnings apply just as much to pickle as they do to dill.
Pickle uses a stack language to effectively execute arbitrary Python code. An attacker can sneak in instructions to open up a backport to your machine, for example. Don't ever use pickled data from untrusted sources.
The documentation includes an explicit warning:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
Yes
Because Pickle allows you to override the object serialization and deserialization, via
object.__getstate__()
Classes can further influence how their instances are pickled; if the
class defines the method __getstate__(), it is called and the returned
object is pickled as the contents for the instance, instead of the
contents of the instance’s dictionary. If the __getstate__() method is
absent, the instance’s __dict__ is pickled as usual.
object.__setstate__(state)
Upon unpickling, if the class defines __setstate__(), it is called
with the unpickled state. In that case, there is no requirement for
the state object to be a dictionary. Otherwise, the pickled state must
be a dictionary and its items are assigned to the new instance’s
dictionary.
Because these functions can execute arbitrary code at the user's permission level, it is relatively easy to write a malicious deserializer -- e.g. one that deletes all the files on your hard disk.
Although I don't see a warning, does a similar situation also exist for pickle?
Always, always assume that just because someone doesn't state it's dangerous it is not safe to use something.
That being said, Pickle docs do say the same:
Warning The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
So yes, that security risk exists on pickle, too.
To explain the background: pickle and dill restore the state of python objects. In CPython, the default python implementation, this means restoring PyObjects structs, which contain a length field. Modification of that, as an example, leads to funky effects and might have arbitrary effects on your python process' memory.
By the way, even assuming that data is not malicious doesn't mean you can un-pickle or un-dill just about anything that comes e.g. from a different python version. So, to me, that question is a bit of theoretical one: If you need portable objects, you will have to implement a rock-solid serialization/deserialization mechanism that transports the data you need transported, and nothing more or less.

When does pickle fail to pickle an instance?

I have a python class which I can instantiate and then pickle. But then I have a second class, inheriting from the first, whose instances I cannot pickle. Pickle gives me the error "can't pickle instancemethod". Both instances have plenty of methods. So, does anyone have a guess as to why the first class would pickle OK, but not the second? I'm sure that you will want to see the code, but it's pretty lengthy and I really have no idea what the "offending" parts of the second class might be. So I can't show the whole thing and I don't really know what the relevant parts might be.
There's a pretty extensive list of what can and can't be pickled here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It lists all objects through the first 15 or so sections in the python standard library, and while it's not everything, it also covers all of the objects of primary and many of the secondary importance in the standard library.
Also, if you decide to use dill instead of pickle, I'm going to guess that you probably won't have a pickling issue, as dill can pretty much serialize anything in python.
More directly addressing your question… pickle pickles classes by reference, while dill pickles classes code or by reference, depending on the setting you choose (default is to pickle the code). This can bypass "lookup" issues for class references that pickle has.
Pickling simply doesnt pickle your classes, pickle only works on data, if you try to pickle a class with built in methods it simply will not work. it will come out glitchy and broken.
source: learning python by Mark Lutz

python: pickle misbehaving in django shell as opposed to python shell?

As an additional question stemming from my previous question it turns out that pickle behaves differently in django shell compared to python shell...
this script:
import pickle
class TestObj(object): pass
testobj = TestObj()
pickled = pickle.dumps(testobj, pickle.HIGHEST_PROTOCOL)
will work fine in python shell, but in django shell will raise a PickleError along the lines of PicklingError: Can't pickle <class 'TestObj'>: attribute lookup __builtin__.TestObj failed
Is anyone able to explain the issue here? and if possible, link it back to my previous question?
pickle will make sure it can re-import a class, as only the data on the instance itself is pickled, plus the import location of the class. As such, pickle looks for the __module__ attribute on the class to determine where it came from.
It appears the Django interactive environment does not set this __module__ attribute; as a result TestObj.__module__ is inherited from the object base class instead, and that's __builtin__. Perhaps no __name__ global is set. As a result, the pickle module ends up looking in the wrong place for your class. There is no __builtin__.TestObj after all.
From the comments, I gather that you are trying to store mocked objects in the Django cache. That won't work, as mock objects are not pickleable. That makes sense, as on unpickling (which could be in an entirely new Python process), how would pickle know what original class was being mocked?
Could you not just use a serializer that doesn't pickle classes by reference… and then pre-serialize what you pass into the django cache? This would store not only the instance, but also the class definition itself -- which should allow you to reconstruct the instance anywhere.
See my answer on your original question:
Pickle can't store an object in django locmem cache during tests?
I don't know anything about django Mock objects, or if there's something especially unpicklable about them… But as long as they are built from python code (as opposed to built in C, and have a thin python wrapper layer), then the above should likely work.

How can I pickle suds results?

To avoid repeatedly accessing a SOAP server during development, I'm trying to cache the results so I can run the rest of my code without querying the server each time.
With the code below I get a PicklingError: Can't pickle <class suds.sudsobject.AdvertiserSearchResponse at 0x03424060>: it's not found as suds.sudsobject.AdvertiserSearchResponse when I try to pickle a suds result. I guess this is because the classes are dynamically created.
import pickle
from suds.client import Client
client = Client(...)
result = client.service.search(...)
file = open('test_pickle.dat', 'wb')
pickle.dump(result, file, -1)
file.close()
If I drop the -1 protocol version from pickle.dump(result, file, -1), I get a different error:
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
Is pickling the right thing to do? Can I make it work? Is there a better way?
As the error message you're currently getting is trying to tell you, you're trying to pickle instances that are not picklable (in the ancient legacy pickle protocol you're now using) because their class defines __slots__ but not a __getstate__ method.
However, even altering their class would not help because then you'd run into the other problem -- which you already correctly identified as being likely due to dynamically generated classes. All pickle protocols serialize classes (and functions) "by name", essentially constraining them to be at top-level names in their modules. And, serializing an instance absolutely does require serializing the class (how else could you possibly reconstruct the instance later if the class was not around?!).
So you'll need to save and reload your data in some other way, breaking your current direct dependence on concrete classes in suds.sudsobject in favor of depending on an interface (either formalized or just defined by duck typing) that can be implemented both by such concrete classes when you are in fact accessing the SOAP server, or simpler "homemade" ones when you're loading the data from a file. (The data representing instance state can no doubt be represented as a dict, so you can force it through pickle if you really want, e.g. via the copy_reg module which allows you to customize serialize/deserialize protocols for objects that you're forced to treat non-invasively [[so you can't go around adding __getstate__ or the like to their classes]] -- the problem will come only if there's a rich mesh of mutual references among such objects).
You are pickling the class object itself, and not instance objects of the class. This won't work if the class object is recreated. However, pickling instances of the class will work as long as the class object exists.

Categories