You're writing a library, and you know your users pickle your objects. Sometimes you add new fields, and this creates a backward-compatibility (BC) problem because old pickles of the objects don't have the needed fields. You'd like to add some tests for this case.
The obvious way to do this is to save some actual pickles from your old version, shove them in your test suite somehow, and make sure you can keep unpickling them. But having hard-coded binary test data is pretty uncool. Is there a way to write tests without managing binary fixtures by hand? I've tried playing around with "faking" the __module__ and __qualname__ fields on a locally declared class (which is intended to simulate the "old" version), but I get errors like "_pickle.PicklingError: Can't pickle : it's not the same object as torch.nn.modules.conv.Conv2d". Is there a good way to do this?
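One possible approach, sketched here under assumptions rather than offered as the definitive answer: instead of faking a second class, build an instance of the real class, strip the newly added fields from its __dict__, and pickle that. This assumes the class relies on default __dict__-based pickling and that the only change between versions is added fields. The names Layer, groups, and make_old_pickle are hypothetical and exist only for illustration.

```python
import pickle


class Layer:
    """Stand-in for the library class; `groups` plays the newly added field."""

    def __init__(self, channels):
        self.channels = channels
        self.groups = 1  # field added in the "new" version

    def __setstate__(self, state):
        # Old pickles lack `groups`, so fill in the default before restoring.
        state.setdefault("groups", 1)
        self.__dict__.update(state)


def make_old_pickle(obj, new_fields):
    """Pickle `obj` roughly as an older version would, without `new_fields`."""
    state = dict(vars(obj))
    for name in new_fields:
        state.pop(name, None)
    old = object.__new__(type(obj))  # skip __init__ so no new fields appear
    old.__dict__.update(state)
    return pickle.dumps(old)


def test_loads_old_pickle():
    restored = pickle.loads(make_old_pickle(Layer(8), ["groups"]))
    assert restored.channels == 8
    assert restored.groups == 1  # default supplied by __setstate__
```

Because the pickle still refers to the real class by name, this avoids the "not the same object" error, at the cost of not covering renamed classes or moved modules.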
I'm pickling an object, then doing things to it, then pickling it again, and checking whether the object is still the same. Maybe not the best way to ensure equality, but a very strict and simple one.
However, I get unexpected differences in the pickle file on some machines running Python 2.7.
I used pickletools to analyze the resulting files.
The difference is additional PUT opcodes, even though the memo slots they write are never read back. When are PUT opcodes generated, and why are they generated differently when pickling the same object twice?
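A possible workaround, sketched with the standard pickletools module: pickletools.optimize() rewrites a pickle without the PUT opcodes that no GET ever reads, so comparing optimized pickles ignores that kind of difference. One plausible cause of the variation (an assumption, not verified against the poster's machines) is that the C pickler decides whether to emit a PUT based on the object's reference count at that moment. The helper names count_puts and canonical_dumps are made up for this example.

```python
import pickle
import pickletools


def count_puts(data):
    """Count memo-writing opcodes (PUT, BINPUT, LONG_BINPUT, MEMOIZE)."""
    return sum(
        1
        for opcode, _arg, _pos in pickletools.genops(data)
        if "PUT" in opcode.name or opcode.name == "MEMOIZE"
    )


def canonical_dumps(obj, protocol=2):
    """Pickle `obj`, then strip memo writes that no GET opcode ever reads."""
    return pickletools.optimize(pickle.dumps(obj, protocol))


raw = pickle.dumps([1, 2, 3], 2)
slim = canonical_dumps([1, 2, 3])
print(count_puts(raw), count_puts(slim))
```

Comparing canonical_dumps(before) with canonical_dumps(after) is still a strict check, but it no longer depends on which unused memo entries happened to be written.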
Dill is obviously a very useful module, and it seems as long as you manage the files carefully it is relatively safe. But I was put off by the statement:
Thus dill is not intended to be secure against erroneously or maliciously constructed data. It is left to the user to decide whether the data they unpickle is from a trustworthy source.
I read that at https://pypi.python.org/pypi/dill. It's left to the user to decide how to manage their files.
If I understand correctly, once it has been pickled by dill, you cannot easily find out what the original script will do without some special skill.
MY QUESTION IS: although I don't see a warning, does a similar situation also exist for pickle?
Dill is built on top of pickle, and the warnings apply just as much to pickle as they do to dill.
Pickle uses a stack language that can effectively execute arbitrary Python code. An attacker can sneak in instructions to open up a backdoor to your machine, for example. Don't ever unpickle data from untrusted sources.
The documentation includes an explicit warning:
Warning: The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
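To make the risk concrete, here is a deliberately harmless sketch. The Payload class is hypothetical; its __reduce__ method tells the unpickler to call eval on a string, which stands in for whatever an attacker would actually run.

```python
import pickle


class Payload:
    """Harmless stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Tells the unpickler: "call eval('6 * 7') to rebuild me".
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())
result = pickle.loads(data)  # eval runs here, inside pickle.loads
print(result)  # 42
```

Note that the loading side never needs the Payload class: the pickle only names builtins.eval and its argument, so any process that calls pickle.loads on these bytes runs the expression.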
Yes
Because pickle allows you to override object serialization and deserialization via:
object.__getstate__()
Classes can further influence how their instances are pickled; if the
class defines the method __getstate__(), it is called and the returned
object is pickled as the contents for the instance, instead of the
contents of the instance’s dictionary. If the __getstate__() method is
absent, the instance’s __dict__ is pickled as usual.
object.__setstate__(state)
Upon unpickling, if the class defines __setstate__(), it is called
with the unpickled state. In that case, there is no requirement for
the state object to be a dictionary. Otherwise, the pickled state must
be a dictionary and its items are assigned to the new instance’s
dictionary.
Because these functions can execute arbitrary code at the user's permission level, it is relatively easy to write a malicious deserializer -- e.g. one that deletes all the files on your hard disk.
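For reference, a benign sketch of the two hooks quoted above. The Counter class and its lock attribute are hypothetical; the point is only that both methods are ordinary Python code that runs during dumps and loads.

```python
import pickle
import threading


class Counter:
    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Called by pickle.dumps: return the state without the lock.
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Called by pickle.loads: restore the state and make a fresh lock.
        self.__dict__.update(state)
        self.lock = threading.Lock()


original = Counter()
original.count = 5
restored = pickle.loads(pickle.dumps(original))
print(restored.count)  # 5
```

Anything could be written in place of the body of __setstate__, which is why loading a pickle means trusting the code of every class it refers to.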
Although I don't see a warning, does a similar situation also exist for pickle?
Never assume that something is safe to use just because nobody has stated that it is dangerous.
That being said, the pickle docs do say the same:
Warning The pickle module is not secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
So yes, that security risk exists on pickle, too.
To explain the background: pickle and dill restore the state of Python objects. In CPython, the default Python implementation, this means restoring PyObject structs, which contain a length field. Tampering with that field, for example, leads to funky effects and might have arbitrary effects on your Python process's memory.
By the way, even if you assume the data is not malicious, that doesn't mean you can un-pickle or un-dill just about anything, e.g. data that comes from a different Python version. So, to me, the question is a bit of a theoretical one: if you need portable objects, you will have to implement a rock-solid serialization/deserialization mechanism that transports the data you need transported, and nothing more or less.
I have a python class which I can instantiate and then pickle. But then I have a second class, inheriting from the first, whose instances I cannot pickle. Pickle gives me the error "can't pickle instancemethod". Both instances have plenty of methods. So, does anyone have a guess as to why the first class would pickle OK, but not the second? I'm sure that you will want to see the code, but it's pretty lengthy and I really have no idea what the "offending" parts of the second class might be. So I can't show the whole thing and I don't really know what the relevant parts might be.
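One way to narrow down the "offending" part without posting the whole class is to try pickling each instance attribute on its own. This is only a diagnostic sketch: find_unpicklable is a made-up helper, and the SimpleNamespace object below is a hypothetical stand-in for an instance of the second class.

```python
import pickle
import threading
from types import SimpleNamespace


def find_unpicklable(obj):
    """Map attribute name -> exception for attributes pickle rejects."""
    failures = {}
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several exception types
            failures[name] = exc
    return failures


# Hypothetical stand-in for an instance of the second class.
suspect = SimpleNamespace(size=3, lock=threading.Lock())
print(sorted(find_unpicklable(suspect)))  # ['lock']
```

If an attribute of the subclass holds a bound method (for example a stored callback), it would show up here, which would match the "can't pickle instancemethod" message.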
There's a pretty extensive list of what can and can't be pickled here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It lists the objects from roughly the first 15 sections of the Python standard library documentation. It's not everything, but it covers all of the objects of primary importance and many of secondary importance in the standard library.
Also, if you decide to use dill instead of pickle, I'm going to guess that you probably won't have a pickling issue, as dill can pretty much serialize anything in python.
More directly addressing your question… pickle pickles classes by reference, while dill pickles classes either by code or by reference, depending on the setting you choose (the default is to pickle the code). This can bypass the "lookup" issues that pickle has with class references.
Pickle doesn't serialize your class's code; it only works on data. An instance is stored as its state plus a reference to its class by name, so if an attribute holds a method or other code, pickling may simply not work, or the result may come out broken.
Source: Learning Python by Mark Lutz
Is there a particular way to pickle objects so that pickle.load() has no dependencies on any modules? I read that while unpickling objects, pickle tries to load the module containing the class definition of the object. Is there a way to avoid this, so that pickle.load() doesn't try to load any modules?
This may be a bit unrelated, but I would still quote from the documentation:
Warning The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
You need to write a custom unpickler that avoids loading extra modules. A general approach will be:
1. Derive your custom unpickler by subclassing pickle.Unpickler.
2. Override find_class(..).
3. Inside find_class(..), check the module and the class that need to be loaded. Avoid loading them by raising errors.
4. Use this custom class to unpickle from the string.
Here is an excellent article about the dangers of using pickle. It also contains code that takes the above approach.
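A minimal sketch of the steps listed above, modeled on the restricted-unpickler pattern from the pickle documentation. The allow-list SAFE_BUILTINS and the helper restricted_loads are choices made for this example, not part of the pickle API.

```python
import builtins
import io
import pickle

# Assumption for this sketch: only these builtins may be named by a pickle.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle names a global (class or function).
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Refuse everything else, so no other module gets imported.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name)
        )


def restricted_loads(data):
    """Like pickle.loads(), but limited to the allow-list above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


print(restricted_loads(pickle.dumps([1, 2, range(3)])))
```

Plain containers, numbers, and strings never reach find_class, so they load as usual; a pickle that names os.system or one of your own classes raises UnpicklingError instead of importing anything.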
What you are asking does not make much sense, since serializing and deserializing objects is the primary purpose of pickle. If you want something different, serialize or deserialize your objects to XML or JSON (or any other suitable format).
There is e.g. lxml.objectify, or you can google for "Python serialize json" or "Python serialize xml"... but you cannot deserialize an object from a pickle without its class definition, at least not without further coding.
http://docs.python.org/library/pickle.html
documents how to write a custom unpickler... perhaps that's a good way to start, but it looks like the wrong way to do it.
I've got a Python program with about a dozen classes, with several classes possessing instances of other classes, e.g. ObjectA has a list of ObjectB's, and a dictionary of (ObjectC, ObjectD) pairs.
My goal is to put the program's functionality on a website.
I've written and tested JSON encode and decode methods for each class. The problem as I see it now is that I need to choose between starting over and writing the models and logic afresh from a database perspective, or simply storing the python objects (encoded as JSON) in the database, and pulling out the saved states for changes.
Can someone confirm that these are both valid approaches, and that I'm not missing any other simple options?
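For comparison, the second option (storing JSON-encoded objects in the database) might look roughly like this. Everything here is an assumption for illustration: the to_dict/from_dict methods stand in for the existing JSON encode and decode methods, the saved_state table is made up, and sqlite3 is used only to keep the sketch self-contained.

```python
import json
import sqlite3


class ObjectB:
    def __init__(self, label):
        self.label = label

    def to_dict(self):
        return {"label": self.label}

    @classmethod
    def from_dict(cls, data):
        return cls(data["label"])


class ObjectA:
    def __init__(self, items):
        self.items = items  # list of ObjectB

    def to_dict(self):
        return {"items": [b.to_dict() for b in self.items]}

    @classmethod
    def from_dict(cls, data):
        return cls([ObjectB.from_dict(d) for d in data["items"]])


def save(conn, key, obj):
    """Store the JSON-encoded object under `key`, replacing any old row."""
    conn.execute(
        "INSERT OR REPLACE INTO saved_state (key, body) VALUES (?, ?)",
        (key, json.dumps(obj.to_dict())),
    )


def load(conn, key):
    """Fetch the row for `key` and rebuild the ObjectA it encodes."""
    row = conn.execute(
        "SELECT body FROM saved_state WHERE key = ?", (key,)
    ).fetchone()
    return ObjectA.from_dict(json.loads(row[0]))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE saved_state (key TEXT PRIMARY KEY, body TEXT)")
save(conn, "demo", ObjectA([ObjectB("x"), ObjectB("y")]))
restored = load(conn, "demo")
```

The trade-off is that the database sees only an opaque text blob, so it cannot query or index individual fields of the stored objects.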
Man, what I think you can do is convert the classes you have already made into Django model classes. Of course, only the ones that need to be saved to a database. As for the other classes and the rest of the code, I recommend encapsulating them for use as helper functions. That way you don't have to change your code too much and it's going to work fine. ;D
Or, another choice that may be easier to implement: put everything in a helper, the classes, the functions and everything else.
So you'll just need to call the functions in your views and define models to save your data into the database.
Your idea of saving the objects as JSON in the database works, but it's ugly. ;)
Anyway, if you are in a hurry to deliver the website, anything is valid. Just remember that things made this way always give us lots of problems in the future.
I hope this is useful! :D