(Un)Pickle Class having Instancemethod Objects - python

I have a class (Bar) which effectively has its own state and callback(s) and is used by another class (Foo):
class Foo(object):
    def __init__(self):
        self._bar = Bar(self.say, 10)
        self._bar.work()

    def say(self, msg):
        print msg

class Bar(object):
    def __init__(self, callback, value):
        self._callback = callback
        self._value = value
        self._more = { 'foo' : 1, 'bar': 3, 'baz': 'fubar'}

    def work(self):
        # Do some work
        self._more['foo'] = 5
        self._value = 10
        self._callback('FooBarBaz')

Foo()
Obviously I can't pickle Foo, since Bar holds an instancemethod. That leaves me with implementing __getstate__ and __setstate__ in Bar to save self._value and self._more, but I also have to restore self._callback when unpickling (i.e. call __init__() from the outer class Foo, passing the callback function).
But I cannot figure out how to achieve this.
Any help is much appreciated.
Thanks.

I think if you need to serialize something like this you need to be able to define your callback as a string. For example, you might say that callback = 'myproject.callbacks.foo_callback'.
Basically in __getstate__ you'd replace the _callback function with something you could use to look up the function later like self._callback.__name__.
In __setstate__ you'd replace _callback with a function.
This depends on your functions all having real names so you couldn't use a lambda as a callback and expect it to be serialized. You'd also need a reasonable mechanism for looking up your functions by name.
You could potentially use __import__ (something like: 'myproject.somemodule.somefunc' dotted name syntax could be supported that way, see http://code.google.com/p/mock/source/browse/mock.py#1076) or just define a lookup table in your code.
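As a rough illustration of the dotted-name idea, the sketch below uses importlib.import_module from the standard library instead of calling __import__ directly. The helper names resolve_dotted_name and dotted_name_of are made up for this example, and it assumes the callback is a plain module-level function (lambdas, nested functions and bound methods will not resolve this way). It is written for Python 3.

```python
import importlib


def dotted_name_of(func):
    """Return 'package.module.func' for a module-level function."""
    return func.__module__ + "." + func.__name__


def resolve_dotted_name(dotted):
    """Import the module part of a dotted name and return the named attribute."""
    module_name, _, attr = dotted.rpartition(".")
    if not module_name:
        raise ValueError("expected 'module.attribute', got %r" % (dotted,))
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```

In __getstate__ you would store dotted_name_of(self._callback), and in __setstate__ you would pass the stored string to resolve_dotted_name.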
Just a quick (untested, sorry!) example assuming you have a small set of possible callbacks defined in a lookup table:
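def a():
    pass

# Two-way lookup tables between callbacks and their names.
callbacks_to_name = {a: 'a',
                     # ...
                     }
callbacks_by_name = {'a': a,
                     # ...
                     }

class C:
    def __init__(self, cb):
        self._callback = cb

    def __getstate__(self):
        # Copy the dict so the live object keeps its real callback.
        state = dict(self.__dict__)
        state['_callback'] = callbacks_to_name[self._callback]
        return state

    def __setstate__(self, state):
        # Swap the stored name back for the function it refers to.
        state['_callback'] = callbacks_by_name[state['_callback']]
        self.__dict__.update(state)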
I'm not sure what your use case is but I'd recommend doing this by serializing your work items to JSON or XML and writing a simple set of functions to serialize and deserialize them yourself.
The benefit is that the serialized format can be read and understood by humans and modified when you upgrade your software. Pickle is tempting because it seems close enough, but by the time you have a serious pile of __getstate__ and __setstate__ you haven't really saved yourself much effort or headache over building your own scheme specifically for your application.
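To make the JSON suggestion a little more concrete, here is a minimal sketch of what such a hand-written scheme could look like for the Bar class from the question. Everything beyond the attribute names is an assumption for illustration: the CALLBACKS registry, the say function, and the to_json/from_json methods are hypothetical, and the callback is stored by name rather than as a bound method. It is written for Python 3.

```python
import json


def say(msg):
    """Hypothetical module-level callback used for the example."""
    return "said: " + msg


# Hypothetical registry mapping stable names to callables.
CALLBACKS = {"say": say}


class Bar(object):
    def __init__(self, callback_name, value, more=None):
        self._callback_name = callback_name
        self._value = value
        self._more = dict(more or {})

    def to_json(self):
        # Only plain data goes into the serialized form.
        return json.dumps(
            {"callback": self._callback_name,
             "value": self._value,
             "more": self._more},
            sort_keys=True)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        return cls(data["callback"], data["value"], data["more"])

    def work(self):
        # Look the callable up at call time, so it never has to be serialized.
        return CALLBACKS[self._callback_name]("FooBarBaz")
```

The serialized text is readable and can be edited or migrated by hand, which is the trade-off described above.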

Related

how to construct methods dynamically?

I have designed a class. It is pretty standard, with some method attributes
class foo:
    def f1(self):
        print 'f1'
    def f2(self):
        print 'f2'
    ....
    def fn(self):
        print 'fn'
Now I would like to create a class which contains a set of foo instances.
class bar:
    def __init__(self):
        self.myfoos = [foo(), foo(), foo()]
I would then like to call the f1..fn methods on all the foo instances.
I could do:
class bar:
    ...
    def f1(self):
        for foo_ in self.myfoos:
            foo_.f1()
However, my list of f1..fn is quite long, so how could I obtain this behavior in a succinct way? Maybe with an alternative design completely?
You could just implement __getattr__ and delegate that call to list of foos. I'm sure there is a more elegant way to do this:
class foo:
    def f1(self):
        print('f1')
    def f2(self):
        print('f2')

class bar:
    def __init__(self):
        self.foos = [foo() for _ in range(3)]

    def __getattr__(self, fn):
        def fns(*args, **kwargs):
            for f in self.foos:
                getattr(f, fn)(*args, **kwargs)
        return fns
In []:
b = bar()
b.f1()
Out[]
f1
f1
f1
In []:
b.f2()
Out[]:
f2
f2
f2
You're looking for a way to construct a bunch of methods dynamically. This is often not a good idea—but sometimes it is. (For example, consider libraries like PyObjC and pythoncom that build dynamic proxies to ObjC and COM classes that you don't even know about until runtime. How else could you do that?)
So, you should definitely think through whether you actually want and need this—but, if you do, there are two basic approaches.
Building a static class dynamically
If you're only trying to wrap up a collection of foo objects, you can create all the methods in a loop. Methods aren't anything too magical; you just define them the same as any other function, and assign them to the class.
The only tricky bit there is that you can't just write bar.f1 = …, because f1 is only available as a string. So we have to use setattr to do it:
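class bar:
    # your existing stuff

for name in 'f1 f2 f3 f4 f5 f6 f7 f8'.split():
    foometh = getattr(foo, name)
    # Bind foometh as a default argument; a plain closure would see
    # only the last value of foometh once the loop has finished.
    def f(self, foometh=foometh):
        for foo_ in self.myfoos:
            foometh(foo_)
    f.__name__ = name
    setattr(bar, name, f)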
If there's some kind of rule that specifies which methods you want to forward, instead of a list of a bunch of method names, you'd do something like:
for name, foometh in inspect.getmembers(foo):
    if name.startswith('_') or not isinstance(foometh, types.FunctionType) or <rest of your rule>:
        continue
    def f(self):
        # from here it's the same as above
Building a dynamic class statically
If you're trying to wrap up anything that meets some basic qualifications, rather than some specific list of methods of some specific class, you won't know what you want to wrap up, or how you want to wrap it, until someone tries to call those methods. So you have to catch the attempt to look up an unknown method, and build the wrapper on the fly. For this, we override __getattr__:
class bar:
    # your existing stuff
    def __getattr__(self, attr):
        if attr.startswith('_') or <other rules here>:
            raise AttributeError(attr)
        def f():
            for foo_ in self.myfoos:
                # Look the method up on each instance by name.
                getattr(foo_, attr)()
        f.__name__ = attr
        return f
This version returns functions that act like bound methods if you don't look too closely, rather than actual bound methods that can be introspected. If you want the latter, bind a method explicitly, by adding self as a parameter to f, and then calling __get__ on f and returning the result. (And if you don't know what that means, you don't want to write this part…)
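For readers who do want the explicit binding, the following is one way it might look. It is a sketch under a few assumptions: it targets Python 3, it returns the collected results in a list rather than printing (purely to make the behaviour easy to check), and the extra guard on 'myfoos' is added here to avoid infinite recursion if the attribute is looked up before __init__ has run.

```python
class foo(object):
    def f1(self):
        return 'f1'

    def f2(self):
        return 'f2'


class bar(object):
    def __init__(self):
        self.myfoos = [foo(), foo(), foo()]

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup has already failed.
        if attr.startswith('_') or attr == 'myfoos':
            raise AttributeError(attr)

        def f(self):
            return [getattr(foo_, attr)() for foo_ in self.myfoos]

        f.__name__ = attr
        # Functions are descriptors: __get__ turns f into a real bound method.
        return f.__get__(self, type(self))
```

The returned object is a genuine bound method, so b.f1.__self__ is b and tools that introspect methods see what they expect.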

Hash a python new-style class instance?

Given a custom, new-style python class instance, what is a good way to hash it and get a unique ID-like value from it to use for various purposes? Think md5sum or sha1sum of a given class instance.
The approach I am currently using pickles the class and runs that through hexdigest, storing the resultant hash string into a class property (this property is never part of the pickle/unpickle procedures, fyi). Except now I've run into a case where a third-party module uses nested classes, and there is no really good way to pickle those without some hacks. I figure that I am missing out on some clever little Python trick somewhere to accomplish this.
Edit:
Example code because it seems to be a requirement around here to get any traction on a question. The below class can be initialized and the self._uniq_id property can be properly setup.
#!/usr/bin/env python
import hashlib
# cPickle or pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try

# Single class, pickles fine.
class FooBar(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = "bar"
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
This next class, however, cannot be initialized because of Bar being nested in Foo:
#!/usr/bin/env python
import hashlib
# cPickle or pickle.
try:
    import cPickle as pickle
except ImportError:
    import pickle
# END try

# Nested class, can't pickle for hexdigest.
class Foo(object):
    __slots__ = ("_foo", "_bar", "_uniq_id")

    class Bar(object):
        pass

    def __init__(self, eth=None, ts=None, pkt=None):
        self._foo = "bar"
        self._bar = self.Bar()
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def __getstate__(self):
        return {'foo':self._foo, 'bar':self._bar}

    def __setstate__(self, state):
        self._foo = state['foo']
        self._bar = state['bar']
        self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]

    def _get_foo(self): return self._foo
    def _get_bar(self): return self._bar
    def _get_uniq_id(self): return self._uniq_id

    foo = property(_get_foo)
    bar = property(_get_bar)
    uniq_id = property(_get_uniq_id)
# End
The error I receive is:
Traceback (most recent call last):
  File "./nest_test.py", line 70, in <module>
    foobar2 = Foo()
  File "./nest_test.py", line 49, in __init__
    self._uniq_id = hashlib.sha1(pickle.dumps(self, -1)).hexdigest()[0:16]
cPickle.PicklingError: Can't pickle <class '__main__.Bar'>: attribute lookup __main__.Bar failed
(nest_test.py has both classes in it, hence the line number offset.)
Pickling requires the __getstate__() method I found out, so I also implemented __setstate__() for completeness as well. But given the already existing warnings about security and pickle, there's got to be a better way to do this.
Based on what I have read so far, the error stems from Python not being able to resolve the nested classes. It tries to look up the attribute __main__.Bar, which doesn't exist. It really needs to be able to find __main__.Foo.Bar instead, but there is no really good way to do this. I bumped into another SO answer here that provides a "hack" to trick Python, but it came with a stern warning that such an approach is not advisable, and to either use something other than pickling or to move the nested class definition to the outside versus the inside.
However, the original question of that SO answer, I believe, was for pickling and unpickling to a file. I only need to pickle in order to use the requisite hashlib functions, which seem to operate on a bytearray (much like I am used to in .NET), and pickling (Especially cPickle) is fast and optimized versus writing my own bytearray routine.
That depends entirely on what properties the ID should have.
For instance, you can use id(foo) to get an ID which is guaranteed to be unique as long as foo is active in memory, or you could use repr(instance.__dict__) if all of the fields have sensible repr values.
What specifically do you need it for?
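If a content-derived ID is what is wanted, the repr idea can be combined with hashlib without going through pickle at all. The sketch below is only an illustration: state_digest and Point are hypothetical names, it assumes the object stores its attributes in an instance __dict__ (so it would not work as-is with the __slots__ classes in the question), and it assumes every attribute has a stable, meaningful repr. It is written for Python 3.

```python
import hashlib


def state_digest(obj, length=16):
    """Hex digest derived from the repr of an object's instance attributes."""
    # Sort by attribute name so the digest does not depend on dict ordering.
    items = sorted(vars(obj).items())
    payload = repr(items).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:length]


class Point(object):  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y
```

Two objects with equal attribute values get the same digest, which may or may not be the property you are after.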
While you're using hexdigests of pickles at the moment, you make it sound like the id doesn't actually need to be related to the object, it just needs to be unique. Why not simply use the uuid module, specifically uuid.uuid4 to generate unique IDs and assign them to a uuid field in the object...
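A minimal sketch of the uuid approach, assuming the ID only has to be unique and does not need to reflect the object's contents. The class below reuses the FooBar name from the question, but its body is simplified for illustration.

```python
import uuid


class FooBar(object):
    def __init__(self):
        self._foo = "bar"
        self._bar = "bar"
        # Random 128-bit identifier, rendered as 32 hex characters.
        self._uniq_id = uuid.uuid4().hex

    @property
    def uniq_id(self):
        return self._uniq_id
```

Because the ID is random, two instances with identical fields still get different IDs, and no pickling is involved.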

How can I ensure that one of my class's methods is always called even if a subclass overrides it?

For example, I have a
class BaseHandler(object):
    def prepare(self):
        self.prepped = 1
I do not want everyone that subclasses BaseHandler and also wants to implement prepare to have to remember to call
super(SubBaseHandler, self).prepare()
Is there a way to ensure the superclass method is run even if the subclass also implements prepare?
I have solved this problem using a metaclass.
Using a metaclass allows the implementer of BaseHandler to be sure that all subclasses will call the superclass's prepare(), with no adjustment to any existing code.
The metaclass looks for an implementation of prepare on both classes and then overwrites the subclass prepare with one that calls superclass.prepare followed by subclass.prepare.
class MetaHandler(type):
    def __new__(cls, name, bases, attrs):
        instance = type.__new__(cls, name, bases, attrs)
        super_instance = super(instance, instance)
        if hasattr(super_instance, 'prepare') and hasattr(instance, 'prepare'):
            super_prepare = getattr(super_instance, 'prepare')
            sub_prepare = getattr(instance, 'prepare')
            def new_prepare(self):
                super_prepare(self)
                sub_prepare(self)
            setattr(instance, 'prepare', new_prepare)
        return instance

class BaseHandler(object):
    __metaclass__ = MetaHandler
    def prepare(self):
        print 'BaseHandler.prepare'

class SubHandler(BaseHandler):
    def prepare(self):
        print 'SubHandler.prepare'
Using it looks like this:
>>> sh = SubHandler()
>>> sh.prepare()
BaseHandler.prepare
SubHandler.prepare
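The listing above relies on the Python 2 __metaclass__ attribute, which Python 3 ignores. A rough Python 3 counterpart might look like the sketch below. It is not the original author's code: it only wraps prepare when the class body itself defines one, and it records calls in a list (self.calls, an invention for this example) instead of printing, so the ordering is easy to check.

```python
class MetaHandler(type):
    def __new__(mcls, name, bases, attrs):
        cls = super().__new__(mcls, name, bases, attrs)
        # prepare as seen by the parents of the new class, if any.
        parent_prepare = getattr(super(cls, cls), 'prepare', None)
        # prepare as written in this class body, if any.
        own_prepare = attrs.get('prepare')
        if parent_prepare is not None and own_prepare is not None:
            def new_prepare(self):
                parent_prepare(self)
                own_prepare(self)
            cls.prepare = new_prepare
        return cls


class BaseHandler(metaclass=MetaHandler):
    def __init__(self):
        self.calls = []

    def prepare(self):
        self.calls.append('BaseHandler.prepare')


class SubHandler(BaseHandler):
    def prepare(self):
        self.calls.append('SubHandler.prepare')
```

Calling SubHandler().prepare() runs the base implementation first and the subclass implementation second, without the subclass mentioning super.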
Tell your developers to define prepare_hook instead of prepare, but
tell the users to call prepare:
class BaseHandler(object):
    def prepare(self):
        self.prepped = 1
        self.prepare_hook()

    def prepare_hook(self):
        pass

class SubBaseHandler(BaseHandler):
    def prepare_hook(self):
        pass

foo = SubBaseHandler()
foo.prepare()
If you want more complex chaining of prepare calls from multiple subclasses, then your developers should really use super as that's what it was intended for.
Just accept that you have to tell people subclassing your class to call the base method when overriding it. Every other solution either requires you to explain to them that they must do something else, or involves some un-pythonic hacks which could be circumvented too.
Python's object inheritance model was designed to be open, and any attempt to go another way will just overcomplicate a problem which does not really exist anyway. Just tell everybody using your stuff to follow your "rules", or the program will mess up.
One explicit solution without too much magic going on would be to maintain a list of prepare call-backs:
class BaseHandler(object):
    def __init__(self):
        self.prepare_callbacks = []

    def register_prepare_callback(self, callback):
        self.prepare_callbacks.append(callback)

    def prepare(self):
        # Do BaseHandler preparation
        for callback in self.prepare_callbacks:
            callback()

class MyHandler(BaseHandler):
    def __init__(self):
        BaseHandler.__init__(self)
        self.register_prepare_callback(self._prepare)

    def _prepare(self):
        # whatever
        pass
In general you can try using __getattribute__ to achieve something like this (until the moment someone overrides that method too), but it goes against Python's ideas. There is a reason you are able to access private object members in Python. The reason is mentioned in import this.

Can I mark variables as transient so they won't be pickled?

Let's say I have a class:
class Thing(object):
    cachedBar = None

    def __init__(self, foo):
        self.foo = foo

    def bar(self):
        if not self.cachedBar:
            self.cachedBar = doSomeIntenseCalculation()
        return self.cachedBar
Getting bar takes some intense calculation, so I cache it in memory to speed things up.
However, when I pickle one of these classes I don't want cachedBar to be pickled.
Can I mark cachedBar as volatile / transient / not picklable?
According to the Pickle documentation, you can provide a method called __getstate__(), which returns something representing the state you want to have pickled (if it isn't provided, pickle uses thing.__dict__). So, you can do something like this:
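class Thing:
    def __getstate__(self):
        state = dict(self.__dict__)
        # cachedBar is only in the instance dict once bar() has been called,
        # so pop with a default rather than del.
        state.pop('cachedBar', None)
        return state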
This doesn't have to be a dict, but if it is something else, you need to also implement __setstate__(state).
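A slightly fuller sketch of the same idea follows, with a __setstate__ added so the cache is explicitly reset after unpickling. The arithmetic in bar() is a stand-in for doSomeIntenseCalculation(), which is not shown in the question, and the check uses "is None" so that falsy results are cached too. Both choices are assumptions made for this example.

```python
class Thing(object):
    def __init__(self, foo):
        self.foo = foo
        self.cachedBar = None

    def bar(self):
        if self.cachedBar is None:
            # Stand-in for the expensive calculation in the question.
            self.cachedBar = self.foo * 2
        return self.cachedBar

    def __getstate__(self):
        # Copy the instance dict and leave the cache out of the pickle.
        state = dict(self.__dict__)
        state.pop('cachedBar', None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Start with an empty cache; bar() recomputes on first use.
        self.cachedBar = None
```

With this in place, pickle.loads(pickle.dumps(thing)) yields an object whose cache is empty and is refilled the first time bar() is called.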
Implement __getstate__ to return only the parts of the object that should be pickled.

Is there a way to instantiate a class without calling __init__?

Is there a way to circumvent the constructor __init__ of a class in python?
Example:
class A(object):
    def __init__(self):
        print "FAILURE"

    def Print(self):
        print "YEHAA"
Now I would like to create an instance of A. It could look like this, however this syntax is not correct.
a = A
a.Print()
EDIT:
An even more complex example:
Suppose I have an object C, whose purpose is to store one single parameter and do some computations with it. The parameter, however, is not passed as such but is embedded in a huge parameter file. It could look something like this:
class C(object):
    def __init__(self, ParameterFile):
        self._Parameter = self._ExtractParamterFile(ParameterFile)

    def _ExtractParamterFile(self, ParameterFile):
        #does some complex magic to extract the right parameter
        return the_extracted_parameter
Now I would like to dump and load an instance of that object C. However, when I load this object, I only have the single variable self._Parameter and I cannot call the constructor, because it is expecting the parameter file.
    @staticmethod
    def Load(file):
        f = open(file, "rb")
        oldObject = pickle.load(f)
        f.close()
        #somehow create newObject without calling __init__
        newObject._Parameter = oldObject._Parameter
        return newObject
In other words, it is not possible to create an instance without passing the parameter file. In my "real" case, however, it is not a parameter file but some huge chunk of data that I certainly do not want to carry around in memory or even store to disc.
And since I want to return an instance of C from the method Load I do somehow have to call the constructor.
OLD EDIT:
A more complex example, which explains why I am asking the question:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        #do something with data, but do NOT save data in a variable

    @staticmethod
    def Load(self, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        newS = B(???)
        newS._Name = newName
        return newS
As you can see, since data is not stored in a class variable I cannot pass it to __init__. Of course I could simply store it, but what if the data is a huge object, which I do not want to carry around in memory all the time or even save it to disc?
You can circumvent __init__ by calling __new__ directly. Then you can create an object of the given type and call an alternative method in place of __init__. This is something that pickle would do.
However, first I'd like to stress very much that it is something that you shouldn't do and whatever you're trying to achieve, there are better ways to do it, some of which have been mentioned in the other answers. In particular, it's a bad idea to skip calling __init__.
When objects are created, more or less this happens:
a = A.__new__(A, *args, **kwargs)
a.__init__(*args, **kwargs)
You could skip the second step.
Here's why you shouldn't do this: The purpose of __init__ is to initialize the object, fill in all the fields and ensure that the __init__ methods of the parent classes are also called. With pickle it is an exception because it tries to store all the data associated with the object (including any fields/instance variables that are set for the object), and so anything that was set by __init__ the previous time would be restored by pickle, there's no need to call it again.
If you skip __init__ and use an alternative initializer, you'd have a sort of a code duplication - there would be two places where the instance variables are filled in, and it's easy to miss one of them in one of the initializers or accidentally make the two fill the fields act differently. This gives the possibility of subtle bugs that aren't that trivial to trace (you'd have to know which initializer was called), and the code will be more difficult to maintain. Not to mention that you'd be in an even bigger mess if you're using inheritance - the problems will go up the inheritance chain, because you'd have to use this alternative initializer everywhere up the chain.
Also by doing so you'd be more or less overriding Python's instance creation and making your own. Python already does that for you pretty well, no need to go reinventing it and it will confuse people using your code.
Here's what to do instead: Use a single __init__ method that is called for all possible instantiations of the class and that initializes all instance variables properly. For different modes of initialization use either of these two approaches:
Support different signatures for __init__ that handle your cases by using optional arguments.
Create several class methods that serve as alternative constructors. Make sure they all create instances of the class in the normal way (i.e. calling __init__), as shown by Roman Bodnarchuk, while performing additional work or whatever. It's best if they pass all the data to the class (and __init__ handles it), but if that's impossible or inconvenient, you can set some instance variables after the instance was created and __init__ is done initializing.
If __init__ has an optional step (e.g. like processing that data argument, although you'd have to be more specific), you can either make it an optional argument or make a normal method that does the processing... or both.
Use classmethod decorator for your Load method:
class B(object):
    def __init__(self, name, data):
        self._Name = name
        #store data

    @classmethod
    def Load(cls, file, newName):
        f = open(file, "rb")
        s = pickle.load(f)
        f.close()
        return cls(newName, s)
So you can do:
loaded_obj = B.Load('filename.txt', 'foo')
Edit:
Anyway, if you still want to omit __init__ method, try __new__:
>>> class A(object):
...     def __init__(self):
...         print '__init__'
...
>>> A()
__init__
<__main__.A object at 0x800f1f710>
>>> a = A.__new__(A)
>>> a
<__main__.A object at 0x800f1fd50>
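Applied to the class C from the question, the __new__ route could be wrapped in an alternative constructor, roughly as sketched below. The method name from_parameter is made up for this example, and the body of _ExtractParamterFile is a placeholder that simply raises, since the real extraction logic is not shown in the question. The caveats about skipping __init__ discussed above still apply.

```python
class C(object):
    def __init__(self, ParameterFile):
        self._Parameter = self._ExtractParamterFile(ParameterFile)

    def _ExtractParamterFile(self, ParameterFile):
        # Placeholder: the real extraction logic is not part of the question.
        raise NotImplementedError("extraction is application specific")

    @classmethod
    def from_parameter(cls, parameter):
        # Allocate an instance without running __init__ ...
        obj = cls.__new__(cls)
        # ... and fill in the one attribute __init__ would have set.
        obj._Parameter = parameter
        return obj
```

A Load function could then unpickle the old object and return C.from_parameter(oldObject._Parameter) without ever needing the parameter file.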
Taking your question literally I would use meta classes :
class MetaSkipInit(type):
    def __call__(cls):
        return cls.__new__(cls)

class B(object):
    __metaclass__ = MetaSkipInit

    def __init__(self):
        print "FAILURE"

    def Print(self):
        print "YEHAA"

b = B()
b.Print()
This can be useful e.g. for copying constructors without polluting the parameter list.
But to do this properly would be more work and care than my proposed hack.
Not really. The purpose of __init__ is to initialize an object, and by default it really doesn't do anything. If the __init__ method is not doing what you want, and it's not your own code to change, you can choose to switch it out though. For example, taking your class A, we could do the following to avoid calling that __init__ method:
def emptyinit(self):
    pass

A.__init__ = emptyinit
a = A()
a.Print()
This dynamically swaps out the class's __init__ method, replacing it with an empty one. Note that this is probably NOT a good thing to do, as it does not call the super class's __init__ method.
You could also subclass it to create your own class that does everything the same, except overriding the __init__ method to do what you want it to (perhaps nothing).
Perhaps, however, you simply wish to call the method from the class without instantiating an object. If that is the case, you should look into the @classmethod and @staticmethod decorators. They allow for just that type of behavior.
In your code you have used the @staticmethod decorator, which does not take a self argument. Perhaps what may be better for the purpose would be a @classmethod, which might look more like this:
@classmethod
def Load(cls, file, newName):
    # Get the data
    data = getdata()
    # Create an instance of B with the data (cls is B itself here)
    return cls(newName, data)
UPDATE: Rosh's Excellent answer pointed out that you CAN avoid calling __init__ by implementing __new__, which I was actually unaware of (although it makes perfect sense). Thanks Rosh!
I was reading the Python cookbook and there's a section talking about this: the example is given using __new__ to bypass __init__()
>>> class A:
        def __init__(self,a):
            self.a = a

>>> test = A('a')
>>> test.a
'a'
>>> test_noinit = A.__new__(A)
>>> test_noinit.a
Traceback (most recent call last):
  File "", line 1, in
    test_noinit.a
AttributeError: 'A' object has no attribute 'a'
>>>
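However, as written this only works in Python 3, where every class is new-style. Under 2.7, class A: creates an old-style class, which has no __new__ (deriving from object, as in class A(object):, avoids the error). Below is the same code running under 2.7: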
>>> class A:
        def __init__(self,a):
            self.a = a

>>> test = A.__new__(A)
Traceback (most recent call last):
  File "", line 1, in
    test = A.__new__(A)
AttributeError: class A has no attribute '__new__'
>>>
As I said in my comment you could change your __init__ method so that it allows creation without giving any values to its parameters:
def __init__(self, p0, p1, p2):
    # some logic
would become:
def __init__(self, p0=None, p1=None, p2=None):
    if p0 and p1 and p2:
        # some logic
or:
def __init__(self, p0=None, p1=None, p2=None, init=True):
    if init:
        # some logic
