python: pickle misbehaving in django shell as opposed to python shell?

As a follow-up to my previous question: it turns out that pickle behaves differently in the Django shell than in the plain Python shell.
This script:
import pickle
class TestObj(object): pass
testobj = TestObj()
pickled = pickle.dumps(testobj, pickle.HIGHEST_PROTOCOL)
works fine in the Python shell, but in the Django shell it raises a PicklingError along the lines of PicklingError: Can't pickle <class 'TestObj'>: attribute lookup __builtin__.TestObj failed
Is anyone able to explain the issue here? And, if possible, link it back to my previous question?

pickle will make sure it can re-import a class, as only the data on the instance itself is pickled, plus the import location of the class. As such, pickle looks for the __module__ attribute on the class to determine where it came from.
It appears the Django interactive environment does not set this __module__ attribute; as a result TestObj.__module__ is inherited from the object base class instead, and that's __builtin__. Perhaps no __name__ global is set. As a result, the pickle module ends up looking in the wrong place for your class. There is no __builtin__.TestObj after all.
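The effect of a missing __name__ global can be reproduced outside Django. The following is a minimal sketch, not a description of what the Django shell actually does: it assumes Python 3 (where __builtin__ is called builtins) and uses exec() with two hand-made namespaces to stand in for the two shells.

```python
import pickle

source = "class TestObj(object): pass"

# Namespace that defines __name__, as a normal module or the plain shell does.
with_name = {"__name__": "__main__"}
exec(source, with_name)

# Namespace without __name__ (an assumption standing in for the Django shell).
without_name = {}
exec(source, without_name)

# The class body copies __name__ into __module__; without a global __name__
# the lookup falls through to the builtins module.
module_with_name = with_name["TestObj"].__module__
module_without_name = without_name["TestObj"].__module__

try:
    pickle.dumps(without_name["TestObj"](), pickle.HIGHEST_PROTOCOL)
    lookup_failed = False
except (pickle.PicklingError, AttributeError):
    # pickle looked for builtins.TestObj, which does not exist.
    lookup_failed = True

print(module_with_name, module_without_name, lookup_failed)
```

If this is indeed the cause, setting TestObj.__module__ to an importable module that really contains the class should make the lookup succeed again.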
From the comments, I gather that you are trying to store mocked objects in the Django cache. That won't work, as mock objects are not pickleable. That makes sense, as on unpickling (which could be in an entirely new Python process), how would pickle know what original class was being mocked?

Could you not just use a serializer that doesn't pickle classes by reference… and then pre-serialize what you pass into the django cache? This would store not only the instance, but also the class definition itself -- which should allow you to reconstruct the instance anywhere.
See my answer on your original question:
Pickle can't store an object in django locmem cache during tests?
I don't know anything about django Mock objects, or if there's something especially unpicklable about them… But as long as they are built from python code (as opposed to built in C, and have a thin python wrapper layer), then the above should likely work.

Related

When does pickle fail to pickle an instance?

I have a python class which I can instantiate and then pickle. But then I have a second class, inheriting from the first, whose instances I cannot pickle. Pickle gives me the error "can't pickle instancemethod". Both instances have plenty of methods. So, does anyone have a guess as to why the first class would pickle OK, but not the second? I'm sure that you will want to see the code, but it's pretty lengthy and I really have no idea what the "offending" parts of the second class might be. So I can't show the whole thing and I don't really know what the relevant parts might be.
There's a pretty extensive list of what can and can't be pickled here:
https://github.com/uqfoundation/dill/blob/master/dill/_objects.py
It lists all objects through the first 15 or so sections in the python standard library, and while it's not everything, it also covers all of the objects of primary and many of the secondary importance in the standard library.
Also, if you decide to use dill instead of pickle, I'm going to guess that you probably won't have a pickling issue, as dill can pretty much serialize anything in python.
More directly addressing your question… pickle pickles classes by reference, while dill pickles either the class's code or a reference to it, depending on the setting you choose (the default is to pickle the code). This can bypass the "lookup" issues that pickle has with class references.
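What "by reference" means for pickle can be seen by looking at the bytes it produces. This is a small sketch using fractions.Fraction from the standard library as an arbitrary example class; it only inspects pickle output and does not involve dill.

```python
import pickle
from fractions import Fraction

payload = pickle.dumps(Fraction(1, 3), pickle.HIGHEST_PROTOCOL)

# The pickle stores the module and class names needed to re-import the class...
has_module_name = b"fractions" in payload
has_class_name = b"Fraction" in payload

# ...but none of the class's code, such as its method names.
has_method_name = b"limit_denominator" in payload

# Unpickling works only because fractions.Fraction can be imported again.
restored = pickle.loads(payload)
print(has_module_name, has_class_name, has_method_name, restored)
```

This is why a class that cannot be found again under its recorded module and name cannot be pickled or unpickled with plain pickle.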
Pickling simply doesn't pickle your classes; pickle only works on data. If you try to pickle a class together with its built-in methods, it simply will not work. It will come out glitchy and broken.
Source: Learning Python by Mark Lutz

Python Pickle, avoiding module dependencies

Is there a particular way to pickle objects so that pickle.load() has no dependencies on any modules? I read that while unpickling objects, pickle tries to load the module containing the class definition of the object. Is there a way to avoid this, so that pickle.load() doesn't try to load any modules?
This may be a bit unrelated, but I would still quote from the documentation:
Warning: The pickle module is not intended to be secure against erroneous or maliciously constructed data. Never unpickle data received from an untrusted or unauthenticated source.
You need to write a custom unpickler that avoids loading extra modules. A general approach will be:
Derive your custom unpickler by subclassing pickle.Unpickler
Override find_class(..)
Inside find_class(..), check the module and the class that need to be loaded. Avoid loading anything unexpected by raising an error.
Use this custom class to unpickle from the string.
Here is an excellent article about the dangers of using pickle. You will also find code there that takes the above approach.
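The steps above can be sketched as follows, assuming Python 3. The names RestrictedUnpickler, SAFE_BUILTINS and restricted_loads are illustrative, and the whitelist is an arbitrary choice; adapt it to whatever your data legitimately needs.

```python
import builtins
import io
import pickle
from fractions import Fraction

# Illustrative whitelist: the only globals this unpickler agrees to load.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a class or function.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        # Everything else is refused, so no other lookup is attempted.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but only whitelisted globals are allowed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data structures need no class lookup and load as usual.
plain = restricted_loads(pickle.dumps({"a": [1, 2, range(3)]}))

# An instance of any other class is rejected.
try:
    restricted_loads(pickle.dumps(Fraction(1, 3)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True

print(plain, rejected)
```

Note that this only restricts what gets loaded; an object whose class is refused still cannot be reconstructed.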
What you are asking does not make much sense, since serializing and deserializing objects is the primary purpose of pickle. If you want something different, serialize and deserialize your objects to XML or JSON (or any other suitable format).
There is, for example, lxml.objectify, or you can search for "Python serialize json" or "Python serialize xml". But you cannot deserialize an object from a pickle without its class definition, at least not without further coding.
http://docs.python.org/library/pickle.html
documents how to write a custom unpickler. Perhaps that is a good place to start, but this looks like the wrong way to do it.

Tool to inspect Python objects without changing them

While trying to track down a resource leak in a Python program this evening, it occurred to me that modern ORMs make the job quite difficult. An object which is, in fact, sitting alone in memory with no children will suddenly appear to have a dozen associated objects as you start checking its attributes because, of course, each attribute dereference invokes a descriptor that pulls in additional information on-the-fly.
I even noticed that doing a simple print of one particular object wound up doing a database query and pulling more linked objects into memory — ruining the careful reference counts that I had been computing — because its __repr__() built the displayed name out of a few associated objects.
There are, it happens, a few techniques that allow objects to be inspected without affecting them — operations like type(obj) and id(obj) and obj.__dict__. (But not printing the __dict__, since that invokes __repr__() on every single value in the dictionary!) Has anyone ever combined these few “safe” inspection methods to support, at a prompt like the Python debugger, convenient inspection and exploration of a Python object graph so that I can see where these files are being held open, running me out of file descriptors?
I need, essentially, an anti-Heisenberg tool, that prevents my acts of inspection from having any side effects!
The “inspect” module:
One answer suggests that I try the inspect module, but it looks like it dereferences every attribute on the object you supply:
import inspect
class Thing(object):
    @property
    def one(self):
        print('one() got called!')
        return 1
t = Thing()
inspect.getmembers(t)
This outputs:
one() got called!
[('__class__', <class '__main__.Thing'>),
('__delattr__', <method-wrapper '__delattr__'…),
…
('one', 1)]
Python 3.2 now provides inspect.getattr_static() for precisely this kind of use case:
http://docs.python.org/py3k/library/inspect#fetching-attributes-statically
The source code link from the top of the docs page should make it fairly easy to backport that functionality to earlier versions (although keep in mind that as 3.x stdlib code, it isn't built to handle old-style classes).
I'm not aware of any existing tools that combine that kind of technique with inspection of obj.__dict__ to navigate a whole object graph without invoking descriptors, though.
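As a rough illustration of the difference, here is a sketch that assumes Python 3.2 or later. Thing mirrors the class from the question; the calls counter is added only to make the side effect observable.

```python
import inspect

class Thing(object):
    calls = 0  # counts how often the property getter actually runs

    @property
    def one(self):
        Thing.calls += 1
        return 1

t = Thing()

# getattr_static() returns the descriptor itself and never calls the getter.
static_attr = inspect.getattr_static(t, "one")
calls_after_static = Thing.calls

# A normal attribute access runs the getter, side effects included.
value = t.one
calls_after_normal = Thing.calls

print(type(static_attr).__name__, calls_after_static, value, calls_after_normal)
```

The price is that you get the property object back rather than the value, so this is useful for seeing what is there, not for reading computed attributes.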
I have no clue how safe the various methods are (it seems fairly dependent on your particular situation) but the inspect module provides a tremendous number of inspection tools.

organising classes and modules in python

I'm getting a bit of a headache trying to figure out how to organise modules and classes together. Coming from C++, I'm used to classes encapsulating all the data and methods required to process that data. In Python, however, there are modules, and from the code I have looked at, some people keep a lot of loose functions in modules, whereas others almost always bind their functions to classes as methods.
For example say I have a data structure and would like to write it to disk.
One way would be to implement a save method for that object so that I could just type
MyObject.save(filename)
or something like that. Another method I have seen in equal proportion is to have something like
from myutils import readwrite
readwrite.save(MyObject,filename)
This is a small example, and I'm not sure how python specific this problem is at all, but my general question is what is the best pythonic practice in terms of functions vs methods organisation?
It seems like loose functions bother you. This is the python way. It makes sense because a module in python is really just an object on the same footing as any other object. It does have language level support for loading it from a file but other than that, it's just an object.
So if I have a module foo.py:
from pprint import pprint
def show(obj):
    pprint(obj)
Then when I import it from bar.py:
import foo
class fubar(object):
    # code
    def method(self, obj):
        # more stuff
        foo.show(obj)
I am essentially accessing a method on the foo object. The data attributes of the foo module are just the globals that are defined in foo. A module is the language-level implementation of a singleton, without the need to prepend self to every method's argument list.
I try to write as many module level functions as possible. If some function will only work with an instance of a particular class, I will make it a method on the class. Otherwise, I try to make it work on instances of every class that is defined in the module for which it would make sense.
The rationale behind the exact example that you gave is that if each class has a save method, then if you later change how you are saving data (from, say, the filesystem to a database or a remote XML file), you have to change every class. If each class implements an interface to yield the data that it wants saved, then you can write one function to save instances of every class and only change that function once. This is known as the Single Responsibility Principle: each class should have only one reason to change.
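A minimal sketch of that arrangement might look like this. Point, Label, to_record and save are hypothetical names, and JSON lines written to a file-like object are just one possible storage choice.

```python
import io
import json

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def to_record(self):
        # The class only describes the data it wants saved.
        return {"type": "point", "x": self.x, "y": self.y}

class Label(object):
    def __init__(self, text):
        self.text = text

    def to_record(self):
        return {"type": "label", "text": self.text}

def save(obj, stream):
    """The one place that knows how and where data is written."""
    json.dump(obj.to_record(), stream, sort_keys=True)
    stream.write("\n")

buffer = io.StringIO()  # stands in for an open file
save(Point(1, 2), buffer)
save(Label("origin"), buffer)
saved_lines = buffer.getvalue().splitlines()
print(saved_lines)
```

Switching from files to a database would then mean rewriting save() only; Point and Label stay untouched.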
If you have a regular old class you want to save to disk, I would just make it an instance method. If it were a serialization library that could handle different types of objects I would do the second way.

How can I pickle suds results?

To avoid repeatedly accessing a SOAP server during development, I'm trying to cache the results so I can run the rest of my code without querying the server each time.
With the code below I get a PicklingError: Can't pickle <class suds.sudsobject.AdvertiserSearchResponse at 0x03424060>: it's not found as suds.sudsobject.AdvertiserSearchResponse when I try to pickle a suds result. I guess this is because the classes are dynamically created.
import pickle
from suds.client import Client
client = Client(...)
result = client.service.search(...)
file = open('test_pickle.dat', 'wb')
pickle.dump(result, file, -1)
file.close()
If I drop the -1 protocol version from pickle.dump(result, file, -1), I get a different error:
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
Is pickling the right thing to do? Can I make it work? Is there a better way?
As the error message you're currently getting is trying to tell you, you're trying to pickle instances that are not picklable (in the ancient legacy pickle protocol you're now using) because their class defines __slots__ but not a __getstate__ method.
However, even altering their class would not help because then you'd run into the other problem -- which you already correctly identified as being likely due to dynamically generated classes. All pickle protocols serialize classes (and functions) "by name", essentially constraining them to be at top-level names in their modules. And, serializing an instance absolutely does require serializing the class (how else could you possibly reconstruct the instance later if the class was not around?!).
So you'll need to save and reload your data in some other way, breaking your current direct dependence on concrete classes in suds.sudsobject in favor of depending on an interface (either formalized or just defined by duck typing) that can be implemented both by such concrete classes when you are in fact accessing the SOAP server, or simpler "homemade" ones when you're loading the data from a file. (The data representing instance state can no doubt be represented as a dict, so you can force it through pickle if you really want, e.g. via the copy_reg module which allows you to customize serialize/deserialize protocols for objects that you're forced to treat non-invasively [[so you can't go around adding __getstate__ or the like to their classes]] -- the problem will come only if there's a rich mesh of mutual references among such objects).
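One way to read the suggestion of representing the state as a dict is to flatten the result into plain built-in containers before pickling, so that no class reference ends up in the pickle at all. The sketch below uses a hypothetical Record class as a stand-in for a dynamically generated result class; it makes no claim about how suds objects actually store their state, and to_plain is an illustrative helper that does not handle mutual references between objects.

```python
import pickle

class Record(object):
    """Hypothetical stand-in for a dynamically generated result class."""
    __slots__ = ("name", "children")

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def to_plain(value):
    """Recursively convert objects into dicts, lists and primitives."""
    if isinstance(value, (str, bytes, int, float, bool, type(None))):
        return value
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    slots = getattr(type(value), "__slots__", None)
    if slots is not None:
        if isinstance(slots, str):
            slots = (slots,)
        return {name: to_plain(getattr(value, name))
                for name in slots if hasattr(value, name)}
    if hasattr(value, "__dict__"):
        return {key: to_plain(item) for key, item in vars(value).items()}
    return value

result = Record("root", [Record("leaf")])
plain = to_plain(result)

# Only built-in types are pickled, so no class lookup happens on load.
restored = pickle.loads(pickle.dumps(plain, pickle.HIGHEST_PROTOCOL))
print(restored)
```

The rest of the code then has to work against the dict shape (or a simple homemade wrapper around it) instead of the original classes, which is the interface change described above.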
You are pickling the class object itself, and not instance objects of the class. This won't work if the class object is recreated. However, pickling instances of the class will work as long as the class object exists.
