Can top level classes be pickled and unpickled (documentation wrong?) - python

The documentation linked below seems to say that top level classes can be pickled, as well as their instances. But based on the answers to my previous question, this seems not to be correct. In the script I posted, pickle accepts the class object and writes a file, but this is not useful.
THIS IS MY QUESTION: Is this documentation wrong, or is there something more subtle I don't understand? Also, should pickle be generating some kind of error message in this case?
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
The following types can be pickled:
None, True, and False
integers, long integers, floating point numbers, complex numbers
normal and Unicode strings
tuples, lists, sets, and dictionaries containing only picklable objects
functions defined at the top level of a module
built-in functions defined at the top level of a module
classes that are defined at the top level of a module ( my bold )
instances of such classes whose __dict__ or the result of calling __getstate__() is picklable (see section The pickle protocol for details).

Make a class that is defined at the top level of a module:
foo.py:
class Foo(object): pass
Then running a separate script,
script.py:
import pickle
import foo
with open('/tmp/out.pkl', 'w') as f:
    pickle.dump(foo.Foo, f)
del foo
with open('/tmp/out.pkl', 'r') as f:
    cls = pickle.load(f)
print(cls)
prints
<class 'foo.Foo'>
Note that the pickle file, out.pkl, merely contains strings which name the defining module and the name of the class. It does not store the definition of the class:
cfoo
Foo
p0
.
Therefore, at the time of unpickling the defining module, foo, must contain the definition of the class. If you delete the class from the defining module
del foo.Foo
then you'll get the error
AttributeError: 'module' object has no attribute 'Foo'
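The behaviour described above can be reproduced without any files on disk. The following is a minimal sketch for Python 3, not the answerer's original script: the module name foo_demo is a hypothetical stand-in for foo.py, built in memory so the example is self-contained.

```python
import pickle
import sys
import types

# Build a throwaway in-memory module; "foo_demo" is a hypothetical name
# standing in for foo.py so the sketch needs no files on disk.
foo_demo = types.ModuleType("foo_demo")
exec("class Foo(object):\n    pass\n", foo_demo.__dict__)
sys.modules["foo_demo"] = foo_demo

# Protocol 0 is the text protocol whose output is shown in the dump above.
payload = pickle.dumps(foo_demo.Foo, protocol=0)

# The payload names the module and the class; it carries no class body.
names_only = b"foo_demo" in payload and b"Foo" in payload

# While the class is still importable, unpickling returns the very same object.
same_object = pickle.loads(payload) is foo_demo.Foo

# Once the class is removed from its module, unpickling can no longer find it.
del foo_demo.Foo
try:
    pickle.loads(payload)
    load_error = None
except AttributeError as exc:
    load_error = exc
```

So the documentation is accurate in the sense that the class is pickled by reference: the pickle succeeds, but it is only meaningful where the defining module can still supply the class.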

It's totally possible to pickle a class instance in python… while also saving the code to reconstruct the class and the instance's state. If you want to hack together a solution on top of pickle, or use a "trojan horse" exec based method here's how to do it:
How to unpickle an object whose class exists in a different namespace (python)?
Or, if you use dill, you have a dump function that already knows how to store a class instance, the class code, and the instance state:
How to recover a pickled class and its instances
Pickle python class instance plus definition
I'm the dill author, and I created dill in part to be able to ship class instances and class methods across multiprocessing.
Can't pickle <type 'instancemethod'> when using python's multiprocessing Pool.map()

Related

Python's __reduce__/copy_reg semantic and stateful unpickler

I want to implement pickling support for objects belonging to my extension library. There is a global instance of class Service initialized at startup. All these objects are produced as a result of some Service method invocations and essentially belong to it. Service knows how to serialize them into binary buffers and how deserialize buffers back into objects.
It appeared that Python's __reduce__ should serve my purpose of implementing pickling support. I started implementing one and realized that there is an issue with the unpickler (the first element of the tuple that __reduce__ is expected to return). This unpickle function needs an instance of Service to be able to convert the input buffer into an Object. Here is a bit of pseudo code to illustrate the issue:
class Service(object):
    ...
    def pickleObject(self, obj):
        # do serialization here and return buffer
        ...
    def unpickleObject(self, buffer):
        # do deserialization here and return new Object
        ...
class Object(object):
    ...
    def __reduce__(self):
        return self.service().unpickleObject, (self.service().pickleObject(self),)
Note the first element in the tuple. The Python pickler does not like it: it says it is an instancemethod and it can't be pickled. Obviously the pickler is trying to store the routine in the output and wants the Service instance along with the function name, but this is not what I want to happen. I do not want (and really can't: Service is not picklable) to store the service along with all the objects. I want the service instance to be created before pickle.load is invoked and somehow have that instance used during unpickling.
This is where I came across the copy_reg module. Again, it appeared that it should solve my problems. This module allows registering pickler and unpickler routines per type dynamically, and these are supposed to be used later on for objects of this type. So I added this registration to the Service construction:
class Service(object):
    ...
    def __init__(self):
        ...
        import copy_reg
        # copy_reg.pickle(type, reducer, constructor) registers the routines
        copy_reg.pickle(mymodule.Object, self.pickleObject, self.unpickleObject)
self.unpickleObject is now a bound method taking the service as its first parameter and the buffer as its second. self.pickleObject is also a bound method, taking the service and the object to pickle. copy_reg requires that the pickleObject routine follow reducer semantics and return a tuple similar to the one before. And here the problem arose again: what should I return as the first tuple element?
class Service(object):
    ...
    def pickleObject(self, obj):
        ...
        return self.unpickleObject, (self.serialize(obj),)
In this form pickle again complains that it can't pickle instancemethod. I tried None - it does not like it either. I put there some dummy function. This works - meaning serialization phase went through fine, but during unpickling it calls this dummy function instead of unpickler I registered for the type mymodule.Object in Service constructor.
So now I am at a loss. Sorry for the long explanation: I did not know how to ask this question in a few lines. I can summarize my questions like this:
Why do copy_reg semantics require me to return the unpickler routine from pickleObject, if I am expected to register one independently?
Is there any reason to prefer the copy_reg.constructor interface to register the unpickler routine?
How do I make pickle use the unpickler I registered instead of the one inside the stream?
What should I return as the first element in the tuple as the pickleObject result value? Is there a "correct" value?
Am I approaching this whole thing correctly? Is there a different/simpler solution?
First of all, the copy_reg module is unlikely to help you much here: it is primarily a way to add __reduce__ like features to classes that don't have that method rather than offering any special abilities (e.g. if you want to pickle objects from some library that doesn't natively support it).
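To illustrate that role, here is a minimal sketch of registering a reducer for a class that has no __reduce__ of its own. It is written for Python 3, where copy_reg is named copyreg. Point, _rebuild_point and _reduce_point are hypothetical names invented for the example, and reducer_calls exists only to show that the registered reducer is the one pickle uses.

```python
import copyreg
import pickle

reducer_calls = []  # records every object handed to the registered reducer


class Point(object):
    """Stand-in for a class from a library you cannot edit."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def _rebuild_point(x, y):
    # Module-level function: pickle stores it by name, so it must be
    # importable wherever the data is unpickled.
    return Point(x, y)


def _reduce_point(point):
    # Same contract as __reduce__: return (callable, argument tuple).
    reducer_calls.append(point)
    return _rebuild_point, (point.x, point.y)


# Register the reducer for the type; no change to Point itself is needed.
copyreg.pickle(Point, _reduce_point)

clone = pickle.loads(pickle.dumps(Point(1, 2)))
```

Note that the callable returned by the reducer is still what ends up in the stream, which is why the reducer has to return one even when a constructor is registered separately.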
The callable returned by __reduce__ needs to be locatable in the environment where the object is to be unpickled, so an instance method isn't really appropriate. As mentioned in the Pickle documentation:
In the unpickling environment this object must be either a class, a callable registered as a
“safe constructor” (see below), or it must have an attribute
__safe_for_unpickling__ with a true value.
So if you defined a function (not method) as follows:
def _unpickle_service_object(buffer):
    # Grab the global service object, however that is accomplished
    service = get_global_service_object()
    return service.unpickleObject(buffer)
_unpickle_service_object.__safe_for_unpickling__ = True
You could now use this _unpickle_service_object function in the return value of your __reduce__ methods, so that your objects are linked to the new environment's global Service object when unpickled.

Classes How I understand them. Correct me if Im wrong please

I really hope this is not a question posed by millions of newbies, but my search didn't really give me a satisfying answer.
So my question is fairly simple. Are classes basically a container for functions with its own namespace? What other functions do they have besides providing a separate namespace and holding functions while making them callable as class attributes? I'm asking in a Python context.
Oh and thanks for the great help most of you have been!
More importantly than functions, class instances hold data attributes, allowing you to define new data types beyond what is built into the language; and they support inheritance and duck typing.
For example, here's a moderately useful class. Since Python files (created with open) don't remember their own name, let's make a file class that does.
class NamedFile(object):
    def __init__(self, name):
        self._f = open(name)
        self.name = name
    def readline(self):
        return self._f.readline()
Had Python not had classes, you'd probably be working with dicts instead:
def open_file(name):
    return {"name": name, "f": open(name)}
Needless to say, calling myfile["f"].readline() all the time will cause your fingers to hurt at some point. You could of course introduce a function readline in a NamedFile module (namespace), but then you'd always have to use that exact function. By contrast, NamedFile instances can be used anywhere you need an object with a readline method, so it would be a plug-in replacement for file in many situations. That's called polymorphism, one of the biggest benefits of OO/class-based programming.
(Also, dict is a class, so using it violates the assumption that there are no classes :)
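The plug-in replacement point can be sketched as follows (Python 3). The helper first_line, the close method and the temporary file are hypothetical additions for the demonstration; the only thing first_line relies on is that its argument has a readline method.

```python
import io
import os
import tempfile


class NamedFile(object):
    def __init__(self, name):
        self._f = open(name)
        self.name = name

    def readline(self):
        return self._f.readline()

    def close(self):
        self._f.close()


def first_line(source):
    # Works with any object that offers readline(); no type check involved.
    return source.readline().rstrip("\n")


# Create a small temporary file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as handle:
    handle.write("from disk\n")

named = NamedFile(path)
from_named_file = first_line(named)
from_string_io = first_line(io.StringIO("from memory\n"))

named.close()
os.remove(path)
```

The same function serves a NamedFile and an in-memory io.StringIO, which is the duck typing the answer refers to.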
In most languages, classes are just pieces of code that describe how to produce an object. That's kinda true in Python too:
>>> class ObjectCreator(object):
...     pass
...
>>> my_object = ObjectCreator()
>>> print my_object
<__main__.ObjectCreator object at 0x8974f2c>
But classes are more than that in Python. Classes are objects too.
Yes, objects.
As soon as you use the keyword class, Python executes it and creates an OBJECT. The instruction:
>>> class ObjectCreator(object):
...     pass
...
creates in memory an object with the name ObjectCreator.
This object (the class) is itself capable of creating objects (the instances), and this is why it's a class.
But still, it's an object, and therefore:
you can assign it to a variable
you can copy it
you can add attributes to it
you can pass it as a function parameter
e.g.:
>>> print ObjectCreator # you can print a class because it's an object
<class '__main__.ObjectCreator'>
>>> def echo(o):
...     print o
...
>>> echo(ObjectCreator) # you can pass a class as a parameter
<class '__main__.ObjectCreator'>
>>> print hasattr(ObjectCreator, 'new_attribute')
False
>>> ObjectCreator.new_attribute = 'foo' # you can add attributes to a class
>>> print hasattr(ObjectCreator, 'new_attribute')
True
>>> print ObjectCreator.new_attribute
foo
>>> ObjectCreatorMirror = ObjectCreator # you can assign a class to a variable
>>> print ObjectCreatorMirror.new_attribute
foo
>>> print ObjectCreatorMirror()
<__main__.ObjectCreator object at 0x8997b4c>
Classes (or objects) are used to provide encapsulation of data and operations that can be performed on that data.
They don't provide namespacing in Python per se; module imports provide the same type of stuff and a module can be entirely functional rather than object oriented.
You might gain some benefit from looking at OOP With Python, Dive into Python, Chapter 5. Objects and Object Oriented Programming or even just the Wikipedia article on object oriented programming
A class is the definition of an object. In this sense, the class provides a namespace of sorts, but that is not the true purpose of a class. The true purpose is to define what the object will 'look like' - what the object is capable of doing (methods) and what it will know (properties).
Note that my answer is intended to provide a sense of understanding on a relatively non-technical level, which is what my initial trouble was with understanding classes. I'm sure there will be many other great answers to this question; I hope this one adds to your overall understanding.

Python Suite, Package, Module, TestCase and TestSuite differences

Best Guess:
method - def(self, maybeSomeVariables); lines of code which achieve some purpose
Function - same as method but returns something
Class - group of methods/functions
Module - a script, OR one or more classes. Basically a .py file.
Package - a folder which has modules in, and also a __init__.py file in there.
Suite - Just a word that gets thrown around a lot, by convention
TestCase - unittest's equivalent of a function
TestSuite - unittest's equivalent of a Class (or Module?)
My question is: Is this completely correct, and did I miss any hierarchical building blocks from that list?
I feel that you're putting in differences that don't actually exist. There isn't really a hierarchy as such. In python everything is an object. This isn't some abstract notion, but quite fundamental to how you should think about constructs you create when using python. An object is just a bunch of other objects. There is a slight subtlety in whether you're using new-style classes or not, but in the absence of a good reason otherwise, just use and assume new-style classes. Everything below is assuming new-style classes.
If an object is callable, you can call it using the calling syntax of a pair of parentheses, with the arguments inside them: my_callable(arg1, arg2). To be callable, an object needs to implement the __call__ method (or else have the correct field set in its C level type definition).
In python an object has a type associated with it. The type describes how the object was constructed. So, for example, a list object is of type list and a function object is of type function. The types themselves are of type type. You can find the type by using the built-in function type(). A list of all the built-in types can be found in the python documentation. Types are actually callable objects, and are used to create instances of a given type.
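A few lines make these relationships concrete. This is only an illustrative sketch; greet is a hypothetical function defined for the example.

```python
def greet():
    return "hi"


type_of_list = type([1, 2])    # the type of a list object
type_of_function = type(greet)  # the type of a function object
type_of_type = type(list)       # types are themselves objects of type `type`

# Types are callable: calling one creates an instance of that type.
list_is_callable = callable(list)
built = list("ab")
```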
Right, now that's established, the nature of a given object is defined by its type. This describes the objects it comprises. Coming back to your questions then:
Firstly, the bunch of objects that make up some object are called the attributes of that object. These attributes can be anything, but they typically consist of methods and some way of storing state (which might be types such as int or list).
A function is an object of type function. Crucially, that means it has the __call__ method as an attribute which makes it a callable (the __call__ method is also an object that itself has the __call__ method. It's __call__ all the way down ;)
A class, in the python world, can be considered as a type, but typically is used to refer to types that are not built-in. These objects are used to create other objects. You can define your own classes with the class keyword, and to create a class which is new-style you must inherit from object (or some other new-style class). When you inherit, you create a type that acquires all the characteristics of the parent type, and then you can overwrite the bits you want to (and you can overwrite any bits you want!). When you instantiate a class (or more generally, a type) by calling it, another object is returned which is created by that class (how the returned object is created can be changed in weird and crazy ways by modifying the class object).
A method is a special type of function that is called using the attribute notation. That is, when it is created, 2 extra attributes are added to the method (remember it's an object!) called im_self and im_func. im_self I will describe in a few sentences. im_func is a function that implements the method. When the method is called, like, for example, foo.my_method(10), this is equivalent to calling foo.my_method.im_func(foo.my_method.im_self, 10). This is why, when you define a method, you define it with the extra first argument which you apparently don't seem to use (as self).
When you write a bunch of methods when defining a class, these become unbound methods. When you create an instance of that class, those methods become bound. When you call a bound method, the im_self argument is added for you as the object in which the bound method resides. You can still call the unbound method of the class, but you need to explicitly add the class instance as the first argument:
class Foo(object):
    def bar(self):
        print self
        print self.bar
        print self.bar.im_self # prints the same as self
We can show what happens when we call the various manifestations of the bar method:
>>> a = Foo()
>>> a.bar()
<__main__.Foo object at 0x179b610>
<bound method Foo.bar of <__main__.Foo object at 0x179b610>>
<__main__.Foo object at 0x179b610>
>>> Foo.bar()
TypeError: unbound method bar() must be called with Foo instance as first argument (got nothing instead)
>>> Foo.bar(a)
<__main__.Foo object at 0x179b610>
<bound method Foo.bar of <__main__.Foo object at 0x179b610>>
<__main__.Foo object at 0x179b610>
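The session above is Python 2. For readers on Python 3, the same idea can be sketched with the attribute names __self__ and __func__, which play the roles of im_self and im_func there (Python 3 also has no separate unbound method type: Foo.bar is a plain function). The extra argument n is a hypothetical addition so that the calls return something comparable.

```python
class Foo(object):
    def bar(self, n):
        return (self, n)


a = Foo()
bound = a.bar

# The bound method remembers the instance it was looked up on.
remembers_instance = bound.__self__ is a

# Three equivalent ways of making the same call.
via_bound = bound(10)
via_func = bound.__func__(bound.__self__, 10)
via_class = Foo.bar(a, 10)
```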
Bringing all the above together, we can define a class as follows:
class MyFoo(object):
    a = 10
    def bar(self):
        print self.a
This generates a class with 2 attributes: a (which is an integer of value 10) and bar, which is an unbound method. We can see that MyFoo.a is just 10.
We can create extra attributes at run time, both within the class methods, and outside. Consider the following:
class MyFoo(object):
    a = 10
    def __init__(self):
        self.b = 20
    def bar(self):
        print self.a
        print self.b
    def eep(self):
        print self.c
__init__ is just the method that is called immediately after an object has been created from a class.
>>> foo = MyFoo()
>>> foo.bar()
10
20
>>> foo.eep()
AttributeError: 'MyFoo' object has no attribute 'c'
>>> foo.c = 30
>>> foo.eep()
30
This example shows 2 ways of adding an attribute to a class instance at run time (that is, after the object has been created from its class).
I hope you can see then, that TestCase and TestSuite are just classes that are used to create test objects. There's nothing special about them except that they happen to have some useful features for writing tests. You can subclass and overwrite them to your heart's content!
Regarding your specific point, both methods and functions can return anything they want.
Your description of module, package and suite seems pretty sound. Note that modules are also objects!
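As a rough illustration that TestCase and TestSuite are ordinary classes, the sketch below (Python 3) subclasses one, instantiates both, and runs the result. ArithmeticTest and its two test methods are hypothetical names; the runner writes to an in-memory stream only to keep the example quiet.

```python
import io
import unittest


class ArithmeticTest(unittest.TestCase):
    # Each test method is an ordinary method on an ordinary class.
    def test_add(self):
        self.assertEqual(1 + 1, 2)

    def test_sub(self):
        self.assertEqual(3 - 1, 2)


# A TestCase instance wraps a single test method, selected by name.
suite = unittest.TestSuite()
suite.addTest(ArithmeticTest("test_add"))
suite.addTest(ArithmeticTest("test_sub"))

# Run the suite, sending the report to an in-memory stream.
runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
result = runner.run(suite)
```

In that sense a TestSuite is closer to a container of TestCase instances than to a class or module.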

How to obtain all instances of a class within the current module

I have a module foo that defines a class Foo, and instantiates a number of instances of this class.
From other modules, I can import foo and obtain a list of the Foo objects instantiated by:
[getattr(foo,f) for f in dir(foo) if isinstance(getattr(foo,f), foo.Foo)]
It would be convenient if I could do this from within the module foo. But as written, the name foo is meaningless inside foo, and changing to self doesn't help.
Is there a way to use introspection within this module to find all instances of this class? I was hoping there was a way to avoid making a list and appending each instance.
goal: "import foo and obtain a list of the Foo objects instantiated by..."
You don't need introspection at all. Imagine this is your class:
class Registered(object):
    _all = set()
    def __init__(self):
        self.__class__._all.add(self)
Demo; instantiate a bunch of objects and refer to them:
>>> Registered()
>>> Registered()
>>> Registered()
>>> Registered._all
{<__main__.Registered object at 0xcd3210>, <__main__.Registered object at 0xcd3150>, <__main__.Registered object at 0xcd31d0>}
Since you are using dir, if I understand you correctly, you are trying to find all the objects in the current module's global variables. However, it doesn't really get you all the objects in the current module, only those bound to variables. If that's what you really want, you can do {var for name, var in globals().items() if isinstance(var, Foo)}
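A minimal sketch of that scan from inside the module itself (Python 3) is shown here. The instance names first and second and the label attribute are hypothetical; list(...) takes a snapshot of the values so the scan is not affected by names bound while it runs.

```python
class Foo(object):
    def __init__(self, label):
        self.label = label


first = Foo("first")
second = Foo("second")
unrelated = 42  # ignored by the scan because it is not a Foo

# Scan this module's own global names; no reference to the module is needed.
foo_instances = [
    value for value in list(globals().values())
    if isinstance(value, Foo)
]
labels = sorted(obj.label for obj in foo_instances)
```

Only instances bound to a module-level name are found; a Foo stored inside a list or created and discarded is invisible to this approach.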
If that's not what you really want, you can modify the above to keep track of the module the object was defined in via inspect.currentframe().f_back.f_globals or something, but this would prevent you from using factory functions defined in foo.py or in other modules.
If you are actually trying to get all instances of something created in a module, this is a sign of a serious coding issue however; consider perhaps writing a factory function wrapper in that module, which does nothing except pass along the initialization args and return the result, keeping track of initialized objects.

Masquerading real module of a class

Suppose you have the following layout for a python package
./a
./a/__init__.py
./a/_b.py
inside __init__.py you have
from _b import *
and inside _b.py you have
class B(object): pass
If you import from interactive prompt
>>> import a
>>> a.B
<class 'a._b.B'>
>>>
How can I completely hide the existence of _b ?
The problem I am trying to solve is the following: I want a facade package importing "hidden" modules and classes. The classes available from the facade (in my case a) are kept stable and guaranteed for the future. I want, however, freedom to relocate classes "under the hood", hence the hidden modules. This is all nice, but if some client code pickles an object provided by the facade, this pickled data will refer to the hidden module nesting, not to the facade nesting. In other words, if I reposition the B class in a module _c.py, client codes will not be able to unpickle because the pickled classes are referring to a._b.B, which has been moved. If they referred to a.B, I could relocate the B class as much as I want under the hood, without ruining pickled data.
Try:
B.__module__= 'a'
Incidentally you probably want an absolute import:
from a._b import *
as relative imports without the new explicit dot syntax are going away (see PEP 328).
ETA re comment:
I would have to set the module explicitly for every class
Yes, I don't think there's a way around that but you could at least automate it, at the end of __init__:
import inspect
for value in list(globals().values()):
    if inspect.isclass(value) and value.__module__.startswith('a.'):
        value.__module__ = 'a'
(For new-style classes only you could get away with isinstance(value, type) instead of inspect. If the module doesn't have to run as __main__ you could use __name__ instead of hard-coding 'a'.)
You could set the __module__ variable for Class B
class B(object): pass
B.__module__ = 'a'
For classes, functions, and methods, this attribute contains the name of the module in which the object was defined.
Or define it once in your __init__.py:
from a._b import B # change this line, when required, e.g. from a._c import B
B.__module__ = 'a'
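The effect on pickled data can be checked with a small sketch (Python 3). The package is simulated in memory, and facade_demo and facade_demo._b are hypothetical stand-ins for a and a._b, so nothing here depends on files on disk.

```python
import pickle
import sys
import types

# Simulate the package layout in memory.
facade = types.ModuleType("facade_demo")
hidden = types.ModuleType("facade_demo._b")
sys.modules["facade_demo"] = facade
sys.modules["facade_demo._b"] = hidden

# Define B inside the hidden module, then re-export it from the facade.
exec("class B(object):\n    pass\n", hidden.__dict__)
facade.B = hidden.B
original_module = facade.B.__module__

# Point the class at the facade, as suggested above.
facade.B.__module__ = "facade_demo"

payload = pickle.dumps(facade.B())
mentions_hidden = b"facade_demo._b" in payload

# Simulate relocating the class: the hidden module disappears entirely.
del sys.modules["facade_demo._b"]
restored = pickle.loads(payload)
```

Because the pickle records only the facade name, the data still loads after the hidden module is gone, as long as the facade keeps exporting B.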
