I am trying to dump a custom class using pickle. The class subclasses gtk.ListStore, since that made it easier to store particular data and then display it using gtk. The problem can be reproduced as shown here.
import gtk
import pickle
import os
class foo(gtk.ListStore):
    pass

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
The solution I have tried was to add a __getstate__ method to my class. As far as I understand the documentation, this should take precedence for pickle so that it no longer tries to serialize the ListStore, which it is unable to do. However, I still get an identical error from pickle.dump when I try to pickle my object. The error can be reproduced as follows.
import gtk
import pickle
import os
class foo(gtk.ListStore):
    def __getstate__(self):
        return 'bar'

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
In each case, pickle.dump raises a TypeError, "can't pickle ListStore objects". Using print statements, I have verified that the __getstate__ function is run when using pickle.dump. I don't see any hints as to what to do next in the documentation, so I'm in a bit of a bind. Any suggestions?
Here is a quick working example to show you the steps you need to employ to pickle "unpicklable types" like gtk.ListStore. (With this method you could even use json instead of pickle for your purpose.) Essentially you need to do a few things:
Define __reduce__ which returns a function and arguments needed to reconstruct the instance.
Determine the column types for your ListStore. The method self.get_column_type(0) returns a GType, so you will need to map this back to the corresponding Python type. I've left a proper mapping as an exercise (a hedged sketch follows the example output below); in my example I've employed a hack to get the column types from the first row of values.
Your _new_foo function will need to rebuild the instance.
Example:
import gtk, os, pickle

def _new_foo(cls, coltypes, rows):
    inst = cls.__new__(cls)
    inst.__init__(*coltypes)
    for row in rows:
        inst.append(row)
    return inst

class foo(gtk.ListStore):
    def __reduce__(self):
        rows = [list(row) for row in self]
        # hack - to be correct you'll really need to use
        # `self.get_column_type` and map it back to Python's
        # corresponding type.
        coltypes = [type(c) for c in rows[0]]
        return _new_foo, (self.__class__, coltypes, rows)

x = foo(str, int)
x.append(['foo', 1])
x.append(['bar', 2])

s = pickle.dumps(x)
y = pickle.loads(s)
print list(y[0])
print list(y[1])
Output:
['foo', 1]
['bar', 2]
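For step 2, here is a minimal sketch of what that GType mapping could look like, assuming PyGTK's gobject module and that only string/int/float/bool columns are in use (the mapping table itself is my illustration, not part of the original answer):

import gobject

# Assumed mapping from GType constants back to Python types; extend as needed.
_GTYPE_TO_PYTHON = {
    gobject.TYPE_STRING: str,
    gobject.TYPE_INT: int,
    gobject.TYPE_DOUBLE: float,
    gobject.TYPE_BOOLEAN: bool,
}

def column_types(store):
    # get_n_columns and get_column_type are standard gtk.TreeModel methods
    return [_GTYPE_TO_PYTHON[store.get_column_type(i)]
            for i in range(store.get_n_columns())]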
When you subclass object, object.__reduce__ takes care of calling __getstate__. It would seem that since this is a subclass of gtk.ListStore, the default implementation of __reduce__ tries to pickle the data needed to reconstruct a gtk.ListStore object first, and only then calls your __getstate__; since the gtk.ListStore itself can't be pickled, it refuses to pickle your class. The problem should go away if you implement __reduce__ and __reduce_ex__ instead of __getstate__.
>>> class Foo(gtk.ListStore):
...     def __init__(self, *args):
...         super(Foo, self).__init__(*args)
...         self._args = args
...     def __reduce_ex__(self, proto=None):
...         return type(self), self._args, self.__getstate__()
...     def __getstate__(self):
...         return 'foo'
...     def __setstate__(self, state):
...         print state
...
>>> x = Foo(str)
>>> pickle.loads(pickle.dumps(x))
foo
<Foo object at 0x18be1e0 (__main__+Foo-v3 at 0x194bd90)>
In addition, you may want to consider other serializers, such as json. There you take full control of the serialization process by defining yourself how custom classes are to be serialized. Plus, by default they come without the security issues of pickle.
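For instance, a minimal json sketch for the Foo class from the session above (the rows-as-lists scheme is my own illustration, not a fixed API):

import json

class FooEncoder(json.JSONEncoder):
    def default(self, obj):
        # Serialize a Foo as a plain dict of its rows; anything else
        # falls through to the base implementation.
        if isinstance(obj, Foo):
            return {'rows': [list(row) for row in obj]}
        return json.JSONEncoder.default(self, obj)

s = json.dumps(x, cls=FooEncoder)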
Let's say I have an object already defined in my Python script that serves as a container for some random items. Each attribute of the container corresponds to an item. In this simple example, I have an ITEMS object that has a BALL attribute which points to a Ball instance.
Now, I need to load some content in YAML, but I want that content to be able to reference the existing ITEMS variable that is already defined. Is this possible? Maybe something along the lines of...
ITEMS = Items()
setattr(Items, 'BALL', Ball())
yaml_text = "item1: !!python/object:ITEMS.BALL"
yaml_items = yaml.load(yaml_text)
My goal, after loading the YAML, is for yaml_items['item1'] to be the Ball instance from the ITEMS object.
Here's a way of doing it that uses the di() function defined in the answer to another question. It takes the integer value returned by the built-in id() function and converts it to a string. The yaml.load() function will call a custom constructor which then does the reverse of that process to determine the object returned.
Caveat: This takes advantage of the fact that, with CPython at least, the id() function returns the address of the Python object in memory—so it may not work with other implementations of the interpreter.
import _ctypes
import yaml

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass
def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object " + str(id(ITEMS.BALL))
yaml_items = yaml.load(yaml_text)
print(yaml_items['item1'])  # -> 42
If you're OK with using eval(), you could formalize this and make it easier to use by monkey-patching the yaml module's load() function to do some preprocessing of the yaml stream:
import _ctypes
import re
import yaml
#### Monkey-patch yaml module.
def _my_load(yaml_text, *args, **kwargs):
    REGEX = r'##(.+)##'
    match = re.search(REGEX, yaml_text)
    if match:
        obj = eval(match.group(1))
        yaml_text = re.sub(REGEX, str(id(obj)), yaml_text)
    return _yaml_load(yaml_text, *args, **kwargs)

_yaml_load = yaml.load  # Save original function.
yaml.load = _my_load    # Change it to custom version.
#### End monkey-patch yaml module.

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass
def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object ##ITEMS.BALL##"
yaml_items = yaml.load(yaml_text)
print(yaml_items['item1'])  # -> 42
@martineau quoted the documentation:
[…] provides Python-specific tags that allow to represent an arbitrary Python object.
represent, not construct. It means that you can dump any Python object to YAML, but you cannot reference an existing Python object inside YAML.
That being said, you can of course add your own constructor to do it:
import yaml

def eval_constructor(loader, node):
    return eval(loader.construct_scalar(node))

yaml.add_constructor(u'!eval', eval_constructor)

some_value = '123'
yaml_text = "item1: !eval some_value"
yaml_items = yaml.load(yaml_text)
Be aware of the security implications of evaling configuration data. Arbitrary Python code can be executed by writing it into the YAML file!
Mostly copied from this answer
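If eval is too risky for your configuration data, a more restrained variant (my own sketch, not from the answer above) is to resolve tags against an explicit whitelist of allowed names:

import yaml

# Hypothetical whitelist of the only names the YAML file may reference.
ALLOWED = {'some_value': '123'}

def lookup_constructor(loader, node):
    return ALLOWED[loader.construct_scalar(node)]

yaml.add_constructor(u'!lookup', lookup_constructor)

yaml_items = yaml.load("item1: !lookup some_value")
print(yaml_items['item1'])  # -> '123'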
I've created a subclass of numpy ndarray following the numpy documentation. In particular, I have added a custom attribute by modifying the code provided.
I'm manipulating instances of this class within a parallel loop, using Python multiprocessing. As I understand it, the way the scope is essentially 'copied' to the worker processes is by pickling.
The problem I am now coming up against relates to the way that numpy arrays are pickled. I can't find any comprehensive documentation about this, but some discussions between the dill developers suggest that I should be focusing on the __reduce__ method, which is being called upon pickling.
Can anyone shed any more light on this? The minimal working example is really just the numpy example code I linked to above, copied here for completeness:
import numpy as np

class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        # Input array is an already formed ndarray instance
        # We first cast to be our class type
        obj = np.asarray(input_array).view(cls)
        # add the new attribute to the created instance
        obj.info = info
        # Finally, we must return the newly created object:
        return obj

    def __array_finalize__(self, obj):
        # see InfoArray.__array_finalize__ for comments
        if obj is None: return
        self.info = getattr(obj, 'info', None)
Now here is the problem:
import pickle
obj = RealisticInfoArray([1, 2, 3], info='foo')
print obj.info # 'foo'
pickle_str = pickle.dumps(obj)
new_obj = pickle.loads(pickle_str)
print new_obj.info # raises AttributeError
Thanks.
np.ndarray uses __reduce__ to pickle itself. We can take a look at what it actually returns when you call that function to get an idea of what's going on:
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> obj.__reduce__()
(<built-in function _reconstruct>, (<class 'pick.RealisticInfoArray'>, (0,), 'b'), (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'))
So, we get a 3-tuple back. The docs for __reduce__ describe what each element is doing:
When a tuple is returned, it must be between two and five elements long. Optional elements can either be omitted, or None can be provided as their value. The contents of this tuple are pickled as normal and used to reconstruct the object at unpickling time. The semantics of each element are:

A callable object that will be called to create the initial version of the object. The next element of the tuple will provide arguments for this callable, and later elements provide additional state information that will subsequently be used to fully reconstruct the pickled data. In the unpickling environment this object must be either a class, a callable registered as a “safe constructor” (see below), or it must have an attribute __safe_for_unpickling__ with a true value. Otherwise, an UnpicklingError will be raised in the unpickling environment. Note that as usual, the callable itself is pickled by name.

A tuple of arguments for the callable object.

Optionally, the object’s state, which will be passed to the object’s __setstate__() method as described in section Pickling and unpickling normal class instances. If the object has no __setstate__() method, then, as above, the value must be a dictionary and it will be added to the object’s __dict__.
So, _reconstruct is the function called to rebuild the object, (<class 'pick.RealisticInfoArray'>, (0,), 'b') are the arguments passed to that function, and (1, (3,), dtype('int64'), False, '\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00') gets passed to the class's __setstate__. This gives us an opportunity: we can override __reduce__ and provide our own tuple to __setstate__, and then additionally override __setstate__ to set our custom attribute when we unpickle. We just need to make sure we preserve all the data the parent class needs, and call the parent's __setstate__, too:
class RealisticInfoArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Get the parent's __reduce__ tuple
        pickled_state = super(RealisticInfoArray, self).__reduce__()
        # Create our own tuple to pass to __setstate__
        new_state = pickled_state[2] + (self.info,)
        # Return a tuple that replaces the parent's __setstate__ tuple with our own
        return (pickled_state[0], pickled_state[1], new_state)

    def __setstate__(self, state):
        self.info = state[-1]  # Set the info attribute
        # Call the parent's __setstate__ with the other tuple elements.
        super(RealisticInfoArray, self).__setstate__(state[0:-1])
Usage:
>>> obj = pick.RealisticInfoArray([1, 2, 3], info='foo')
>>> pickle_str = pickle.dumps(obj)
>>> pickle_str
"cnumpy.core.multiarray\n_reconstruct\np0\n(cpick\nRealisticInfoArray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\ntp6\ncnumpy\ndtype\np7\n(S'i8'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x01\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x03\\x00\\x00\\x00\\x00\\x00\\x00\\x00'\np13\nS'foo'\np14\ntp15\nb."
>>> new_obj = pickle.loads(pickle_str)
>>> new_obj.info
'foo'
I'm the dill (and pathos) author. dill was pickling a numpy.array before numpy could do it itself. @dano's explanation is pretty accurate. Personally, I'd just use dill and let it do the job for you. With dill, you don't need __reduce__, as dill has several ways that it grabs subclassed attributes… one of which is storing the __dict__ for any class object. pickle doesn't do this, because it usually works with classes by name reference, without storing the class object itself… so you have to work with __reduce__ to make pickle work for you. No need, in most cases, with dill.
>>> import numpy as np
>>>
>>> class RealisticInfoArray(np.ndarray):
...     def __new__(cls, input_array, info=None):
...         # Input array is an already formed ndarray instance
...         # We first cast to be our class type
...         obj = np.asarray(input_array).view(cls)
...         # add the new attribute to the created instance
...         obj.info = info
...         # Finally, we must return the newly created object:
...         return obj
...     def __array_finalize__(self, obj):
...         # see InfoArray.__array_finalize__ for comments
...         if obj is None: return
...         self.info = getattr(obj, 'info', None)
...
>>> import dill as pickle
>>> obj = RealisticInfoArray([1, 2, 3], info='foo')
>>> print obj.info # 'foo'
foo
>>>
>>> pickle_str = pickle.dumps(obj)
>>> new_obj = pickle.loads(pickle_str)
>>> print new_obj.info
foo
dill can extend itself into pickle (essentially by copy_reg everything it knows), so you can then use all dill types in anything that uses pickle. Now, if you are going to use multiprocessing, you are a bit screwed, since it uses cPickle. There is, however, the pathos fork of multiprocessing (called pathos.multiprocessing), in which basically the only change is that it uses dill instead of cPickle… and thus can serialize a heck of a lot more in a Pool.map. I think (currently) if you want to work with your subclass of a numpy.array in multiprocessing (or pathos.multiprocessing), you might have to do something like @dano suggests -- but I'm not sure, as I couldn't think of a good case off the top of my head to test your subclass.
If you are interested, get pathos here: https://github.com/uqfoundation
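If you want to try it, here is a hedged sketch of a pathos Pool.map with the array subclass (it assumes pathos is installed; the double function is my own illustration, and per the caveat above, the info attribute may still need @dano's __reduce__ treatment to survive the round trip):

from pathos.multiprocessing import ProcessingPool

def double(arr):
    # the workers receive dill-serialized copies of the subclass instances
    return RealisticInfoArray(np.asarray(arr) * 2, info=arr.info)

pool = ProcessingPool(nodes=2)
results = pool.map(double, [obj, obj])
print [r.info for r in results]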
Here is a slight improvement to @dano's answer and @Gabriel's comment. Leveraging the __dict__ attribute for serialization works for me even with subclasses.
def __reduce__(self):
    # Get the parent's __reduce__ tuple
    pickled_state = super(RealisticInfoArray, self).__reduce__()
    # Create our own tuple to pass to __setstate__, but append the
    # __dict__ rather than individual members.
    new_state = pickled_state[2] + (self.__dict__,)
    # Return a tuple that replaces the parent's __setstate__ tuple with our own
    return (pickled_state[0], pickled_state[1], new_state)

def __setstate__(self, state):
    self.__dict__.update(state[-1])  # Update the internal dict from state
    # Call the parent's __setstate__ with the other tuple elements.
    super(RealisticInfoArray, self).__setstate__(state[0:-1])
Here is a full example: https://onlinegdb.com/SJ88d5DLB
Is there any way to get the object name when the class name is known? If there are multiple objects of the class, they also need to be printed.

class A:
    pass

Assume that someone has created objects of class A in some other files. I want to look up all instances of class A.
If you are the one creating the class you can simply store weak-references when instantiating the class:
import weakref

class A(object):
    instances = []
    def __init__(self):
        A.instances.append(weakref.ref(self))

a, b, c = A(), A(), A()
instances = [ref() for ref in A.instances if ref() is not None]
Using weak references allows the instances to be deallocated before the class is.
See the weakref module for details on what it does.
Note that you may be able to use this technique even with classes that you didn't write. You simply have to monkey-patch the class.
For example:
def track_instances(cls):
    def init(self, *args, **kwargs):
        cls.instances.append(weakref.ref(self))
        cls._old_init(self, *args, **kwargs)
    cls.instances = []           # create the registry on the class
    cls._old_init = cls.__init__
    cls.__init__ = init
    return cls
Then you can do:
track_instances(ExternalClass)
And all instances created after the execution of this statement will be found in ExternalClass.instances.
Depending on the class you may have to replace __new__ instead of __init__.
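A hedged sketch of that __new__ variant (my own illustration; it assumes a new-style class whose original __new__ takes no extra arguments):

import weakref

def track_instances_new(cls):
    old_new = cls.__new__
    def new(klass, *args, **kwargs):
        obj = old_new(klass)
        klass.instances.append(weakref.ref(obj))
        return obj
    cls.instances = []
    # staticmethod is needed when assigning __new__ after class creation
    cls.__new__ = staticmethod(new)
    return cls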
You can do this even without any special code in the class, simply using the garbage collector:
import gc
candidates = gc.get_referrers(cls_object)
instances = [candidate for candidate in candidates if isinstance(candidate, cls_object)]
And you can always obtain the class object, since you can find it using the object.__subclasses__() method:
cls_object = next(cls for cls in object.__subclasses__() if cls.__name__ == cls_name)
(assuming there is only a class with that name, otherwise you should try all of them)
However I cannot think of a situation where this is the right thing to do, so avoid this code in real applications.
I've done some testing and I believe that this solution may not work for built-in classes or classes defined in C extensions.
If you are in this case, the last resort is to use gc.get_objects() to retrieve all tracked objects. However, this works only for objects that support cyclic garbage collection, so there isn't a method that works in every possible situation.
Here is a version that gets the instances from memory. I wouldn't recommend using this in live code, but it can be convenient for debugging:
import weakref

class SomeClass(object):
    register = []
    def __init__(self):
        self.register.append(weakref.ref(self))

a = SomeClass()
b = SomeClass()
c = SomeClass()

# Now the magic :)
import gc

def get_instances(class_name):
    # Get the objects from memory
    for instance in gc.get_objects():
        # Try and get the actual class
        class_ = getattr(instance, '__class__', None)
        # Only return if the class has the name we want
        if class_ and getattr(class_, '__name__', None) == class_name:
            yield instance

print list(get_instances('SomeClass'))
Python provides the types module, which defines classes for built-in types, and the locals() and globals() functions, which return the local and global variables in the application.
One quick way to find objects by type is to do this.
import types

for varname, var_instance in locals().items():
    # note: types.InstanceType only matches old-style (Python 2) class instances
    if type(var_instance) == types.InstanceType and var_instance.__class__.__name__ == 'CLASS_NAME_YOU_ARE_LOOKING_FOR':
        print "This instance was found:", varname, var_instance
It's worth going through the Python library documentation and reading the docs for modules that work with code directly. Some of these are inspect, gc, types, codeop, code, imp, ast, bdb, and pdb. The IDLE source code is also very informative.
Instances are created within a namespace:
def some_function():
    some_object = MyClass()
In this case, some_object is a name inside the "namespace" of the function that points at a MyClass instance. Once you leave the namespace (i.e., the function ends), Python's garbage collection cleans up the name and the instance.
If there would be some other location that also has a pointer to the object, the cleanup wouldn't happen.
So: no, there's no place where a list of instances is maintained.
It would be a different case if you were to use a database with an ORM (object-relational mapper). In Django's ORM you can do MyClass.objects.all() if MyClass is a database object. Something to look into if you really need the functionality.
Update: See Bakuriu's answer. The garbage collector (which I mentioned) knows about all the instances :-) And he suggests the weakref module, which avoids my won't-be-cleaned-up problem.
You can't get names for all the instances, since they may not all have names, and the names they do have may not be in scope. You may, however, be able to get the instances themselves.
If you are willing to keep track of the instances yourself, use a WeakSet:
import weakref

class SomeClass(object):
    instances = weakref.WeakSet()
    def __init__(self):
        self.instances.add(self)
>>> instances = [SomeClass(), SomeClass(), SomeClass()]
>>> other = SomeClass()
>>> SomeClass.instances
<_weakrefset.WeakSet object at 0x0291F6F0>
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
Note that just deleting a name may not destroy the instance; other still exists until it is garbage collected:
>>> del other
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x028F0150>, <__main__.SomeClass object at 0x0291F210>]
>>> import gc
>>> gc.collect()
0
>>> list(SomeClass.instances)
[<__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>, <__main__.SomeClass object at 0x0291F210>]
If you don't want to track them manually, then it is possible to use gc.get_objects() and filter out the instances you want, but that means you have to filter through all the objects in your program every time you do this. Even in the above example that means processing nearly 12,000 objects to find the 3 instances you want.
>>> [g for g in gc.get_objects() if isinstance(g, SomeClass)]
[<__main__.SomeClass object at 0x0291F210>, <__main__.SomeClass object at 0x0291F710>, <__main__.SomeClass object at 0x0291F730>]
>>> class TestClass:
...     pass
...
>>> foo = TestClass()
>>> for i in dir():
...     if isinstance(eval(i), TestClass):
...         print(i)
...
foo
>>>
Finally found a way to get through. As I know the class name, I would search for objects created from that class in the garbage collector (gc) like this:
for instance in gc.get_objects():
    if str(type(instance)).find("dict") != -1:
        for k in instance.keys():
            if str(k).find("Sample") != -1:
                return k
The above code returns a representation of the class instance, which looks like this. Unfortunately, it's in string format, which doesn't suit the requirement; it should be an actual object reference.
<mod_example.Sample object at 0x6f55250>
From the above value, parse out the id (0x6f55250) and get the object reference based on it:

import ast
import gc

obj_id = '0x6f55250'
for obj in gc.get_objects():
    # ast.literal_eval turns the hex string into an int for comparison with id(obj)
    if id(obj) == ast.literal_eval(obj_id):
        required_obj = obj
Hence required_obj will hold the actual object reference, exactly in the required 'obj' form.
:-)
I'm trying to use pickle to save a custom class; something very much like the code below (though with a few methods defined on the class, and several more dicts and such for data). However, often when I run this, pickle, and then unpickle, I lose whatever data was in the class, and it's as if I created a new blank instance.
import pickle

class MyClass:
    VERSION = 1
    some_data = {}
    more_data = set()

    def save(self, filename):
        with open(filename, 'wb') as f:
            p = pickle.Pickler(f)
            p.dump(self)

def load(filename):
    with open(filename, 'rb') as ifile:
        u = pickle.Unpickler(ifile)
        obj = u.load()
        return obj
I was wondering if this had something to do with the memo of the pickle class, but I don't feel like it should. When it doesn't work, I look at my generated file and it looks something like this (obviously not meant to be readable, but it clearly contains no data):
€c__main__
MyClass
q
Anyway, I hope this is enough for someone to understand what might be going on here, or what to look at.
The problem you're having is that you're using mutable class variables to hold your data, rather than putting the data into instance variables.
The pickle module only saves the data stored directly on the instance, not class variables that can also be accessed via self. When you find that your unpickled instances have no data, what that probably means is that the class no longer holds the data from the previous run, so the instances can't access it any more.
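A quick way to see this with the class from the question (my own illustration): mutating some_data changes the shared class attribute, while the instance's own __dict__, which is all pickle saves, stays empty.

m = MyClass()
m.some_data['key'] = 'value'  # mutates the *class* attribute
print m.__dict__              # {} - the instance itself holds no data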
Using class variables that way will probably cause you other problems too, as the data will be shared by all instances of the class! Here's a Python console session that illustrates the issue:

>>> class Foo(object):
...     class_var = []
...     def __init__(self, value):
...         self.class_var.append(value)
...
>>> f1 = Foo(1)
>>> f1.class_var
[1]
>>> f2 = Foo(2)
>>> f2.class_var
[1, 2]
That's probably not what you wanted. But it gets worse!
>>> f1.class_var
[1, 2]
The data you thought belonged to f1 has been changed by the creation of f2. In fact, f1.class_var is the very same object as f2.class_var (it is also available via Foo.class_var directly, without going through any instances at all).
So, using a class variable is almost certainly not what you want. Instead, write an __init__ method for the class that creates a new value and saves it as an instance variable:
>>> class Bar(object):
...     def __init__(self, value):
...         self.instance_var = []  # creates a separate list for each instance!
...         self.instance_var.append(value)
...
>>> b1 = Bar(1)
>>> b1.instance_var
[1]
>>> b2 = Bar(2)
>>> b2.instance_var  # doesn't include value from b1
[2]
>>> b1.instance_var  # b1's data is unchanged
[1]
Pickle will handle this class as you expect. All of its data is in the instances, so you should never end up with an empty instance when you unpickle.
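For instance, a hedged rewrite of the question's class along those lines might look like this:

import pickle

class MyClass:
    VERSION = 1

    def __init__(self):
        self.some_data = {}     # instance variables: one per object,
        self.more_data = set()  # stored in __dict__, so pickle saves them

    def save(self, filename):
        with open(filename, 'wb') as f:
            pickle.dump(self, f)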
I have a class whose instances need to format output as instructed by the user. There's a default format, which can be overridden. I implemented it like this:
class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __str__(self):
        # uses self.format_functions to format output
        # ...

a = A(params)
print(a)  # uses default output formatting

# overriding default output formatting:
# float printed as percentages with 3 decimal digits; bool printed as Y / N
a.format_functions = {float: lambda x: '{:.3%}'.format(x),
                      bool: lambda x: 'Y' if x else 'N'}
print(a)
Is it ok? Let me know if there is a better way to design this.
Unfortunately, I need to pickle instances of this class. But only functions defined at the top level of the module can be pickled; lambda functions are unpicklable, so my format_functions instance attribute breaks the pickling.
I tried rewriting this to use a class method instead of lambda functions, but still no luck for the same reason:
class A:
    @classmethod
    def default_float_format(cls, x):
        return '{:.2%}'.format(x)

    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: self.default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        # ...

a = A(params)
pickle.dump(a)  # Can't pickle <class 'method'>: attribute lookup builtins.method failed
Note that pickling here doesn't work even if I don't override the defaults; just the fact that I assigned self.format_functions = {float : self.default_float_format} breaks it.
What to do? I'd rather not pollute the namespace and break encapsulation by defining default_float_format at the module level.
Incidentally, why in the world does pickle create this restriction? It certainly feels like a gratuitous and substantial pain to the end user.
For pickling of class instances or functions (and therefore methods), Python's pickle depends on their name being available as a global variable in the module namespace; the reference to the method in your dictionary points to a name that is not available there.

You could circumvent that by customizing the pickling of your class with the __setstate__ and __getstate__ methods, but I think you'd be better off defining the function outside the class scope, since the formatting function does not depend on any information from the object or the class itself (and even if some formatting function did, you could pass that in as a parameter).
This does work (Python 3.2):
import pickle

def default_float_format(x):
    return '{:.2%}'.format(x)

class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        pass

a = A(1)
pickle.dumps(a)
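Alternatively, here is a sketch of the __getstate__/__setstate__ route mentioned above (my own illustration; note that any custom overrides are simply reset to the defaults on unpickling):

class A:
    def __init__(self, params):
        self.params = params
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['format_functions']  # the lambdas can't be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # rebuild the defaults; custom format functions are lost on load
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}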
If you use the dill module, either of your two approaches will just "work" as is. dill can pickle lambda as well as instances of classes and also class methods.
No need to pollute the namespace and break encapsulation, as you said you didn't want to do… but the other answer does.
dill is basically ten years or so worth of finding the right copy_reg function that registers how to serialize the majority of objects in standard python. Nothing special or tricky, it just takes time. So why doesn't pickle do this for us? Why does pickle have this restriction?
Well, if you look at the pickle docs, the answer is there:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
Basically: Functions and classes are pickled by reference.
This means pickle does not work on objects defined in __main__, and it also doesn't work on many dynamically modified objects. dill registers __main__ as a module, so it has a valid namespace. dill also gives you the option to not pickle by reference, so you can serialize dynamically modified objects… and class instances, class methods (bound and unbound), and so on.
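For example, a small sketch of the by-value option (byref is a real dill setting, though its default behavior can vary across versions):

import dill

class C(object):
    def method(self):
        return 42

c = C()
# byref=False serializes the class definition itself rather than a name
# reference, so the pickle stays loadable even where C isn't importable.
s = dill.dumps(c, byref=False)
print(dill.loads(s).method())  # -> 42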