Python's __reduce__/copy_reg semantics and stateful unpickler

I want to implement pickling support for objects belonging to my extension library. There is a global instance of class Service initialized at startup. All these objects are produced as a result of some Service method invocations and essentially belong to it. Service knows how to serialize them into binary buffers and how to deserialize buffers back into objects.
It appeared that Python's __reduce__ should serve my purpose - implementing pickling support. I started implementing it and realized that there is an issue with the unpickler (the first element of the tuple expected to be returned by __reduce__). This unpickle function needs an instance of a Service to be able to convert an input buffer into an Object. Here is a bit of pseudo code to illustrate the issue:
class Service(object):
    ...
    def pickleObject(self, obj):
        # do serialization here and return buffer
        ...
    def unpickleObject(self, buffer):
        # do deserialization here and return new Object
        ...

class Object(object):
    ...
    def __reduce__(self):
        return self.service().unpickleObject, (self.service().pickleObject(self),)
Note the first element of the tuple. Python's pickler does not like it: it says it is an instancemethod and can't be pickled. Obviously the pickler is trying to store the routine in the output and wants the Service instance along with the function name, but this is not what I want to happen. I do not want (and really can't: Service is not picklable) to store the service along with all the objects. I want a service instance to be created before pickle.load is invoked and somehow have that instance used during unpickling.
This is where I came across the copy_reg module. Again, it appeared that it should solve my problems. This module allows registering pickler and unpickler routines per type dynamically, and these are supposed to be used later for objects of that type. So I added this registration to the Service constructor:
class Service(object):
    ...
    def __init__(self):
        ...
        import copy_reg
        copy_reg.pickle(mymodule.Object, self.pickleObject, self.unpickleObject)
self.unpickleObject is now a bound method taking the service as the first parameter and a buffer as the second. self.pickleObject is also a bound method taking the service and the object to pickle. copy_reg requires that the pickleObject routine follow reducer semantics and return a tuple similar to the one before. And here the problem arose again: what should I return as the first tuple element?
class Service(object):
    ...
    def pickleObject(self, obj):
        ...
        return self.unpickleObject, (self.serialize(obj),)
In this form pickle again complains that it can't pickle an instancemethod. I tried None - it does not like that either. I put a dummy function there. This works - meaning the serialization phase went through fine - but during unpickling it calls this dummy function instead of the unpickler I registered for the type mymodule.Object in the Service constructor.
So now I am at a loss. Sorry for the long explanation: I did not know how to ask this question in a few lines. I can summarize my questions like this:
Why do copy_reg semantics require me to return an unpickler routine from pickleObject, if I am expected to register one independently?
Is there any reason to prefer the copy_reg.constructor interface for registering the unpickler routine?
How do I make pickle use the unpickler I registered instead of the one stored inside the stream?
What should I return as the first element of the tuple in pickleObject's result? Is there a "correct" value?
Am I approaching this whole thing correctly? Is there a different/simpler solution?

First of all, the copy_reg module is unlikely to help you much here: it is primarily a way to add __reduce__-like features to classes that don't have that method, rather than offering any special abilities (e.g. if you want to pickle objects from some library that doesn't natively support it).
The callable returned by __reduce__ needs to be locatable in the environment where the object is to be unpickled, so an instance method isn't really appropriate. As mentioned in the Pickle documentation:
In the unpickling environment this object must be either a class, a callable registered as a "safe constructor" (see below), or it must have an attribute __safe_for_unpickling__ with a true value.
So if you defined a function (not method) as follows:
def _unpickle_service_object(buffer):
    # Grab the global service object, however that is accomplished
    service = get_global_service_object()
    return service.unpickleObject(buffer)

_unpickle_service_object.__safe_for_unpickling__ = True
You could now use this _unpickle_service_object function in the return value of your __reduce__ methods so that your objects link to the new environment's global Service object when unpickled.
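For example, a minimal sketch of the __reduce__ side (assuming, as in the question's pseudo code, that an Object can reach the global Service through some self.service() accessor):

class Object(object):
    def service(self):
        # however your library reaches the global Service instance
        return get_global_service_object()

    def __reduce__(self):
        # Store only the serialised buffer; _unpickle_service_object will
        # re-acquire whatever Service is live when the data is loaded.
        return _unpickle_service_object, (self.service().pickleObject(self),)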

Related

Intercept magic method calls in python class

I am trying to make a class that wraps a value that will be used across multiple other objects. For computational reasons, the aim is for this wrapped value to only be calculated once and the reference to the value passed around to its users. I don't believe this is possible in vanilla python due to its object container model. Instead, my approach is a wrapper class that is passed around, defined as follows:
from typing import Any

class DynamicProperty():
    def __init__(self, value=None):
        # Value of the property
        self.value: Any = value

    def __repr__(self):
        # Use value's repr instead
        return repr(self.value)

    def __getattr__(self, attr):
        # Doesn't exist in wrapper, get it from the value instead
        return getattr(self.value, attr)
The following works as expected:
wrappedString = DynamicProperty("foo")
wrappedString.upper()  # 'FOO'

wrappedFloat = DynamicProperty(1.5)
wrappedFloat.__add__(2)  # 3.5
However, implicitly calling __add__ through normal syntax fails:
wrappedFloat + 2  # TypeError: unsupported operand type(s) for +: 'DynamicProperty' and 'float'
Is there a way to intercept these implicit method calls without explicitly defining magic methods for DynamicProperty to call the method on its value attribute?
Talking about "passing by reference" will only confuse you. Keep that terminology for languages where you have a choice about it, and where it makes a difference. In Python you always pass objects around - and this passing is the equivalent of "passing by reference" - for all objects, from None to int to a live asyncio network connection pool instance.
With that out of the way: the algorithm the language follows to retrieve attributes from an object is complicated and has many details - implementing __getattr__ is just the tip of the iceberg. Reading the document called "Data Model" in its entirety will give you a better grasp of all the mechanisms involved in retrieving attributes.
That said, here is how it works for "magic" or "dunder" methods (special methods with two underscores before and two after the name): when you use an operator that requires the existence of the method that implements it (like __add__ for +), the language checks the class of your object for the __add__ method - not the instance. And __getattr__ on the class can only dynamically create attributes for instances of that class.
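A quick illustration of that rule with a throwaway class (not from the question): an __add__ attached to an instance is ignored by the + operator, because Python only consults the type:

class C(object):
    pass

c = C()
c.__add__ = lambda other: 42   # instance attribute: the + operator never sees it

try:
    c + 1
except TypeError as exc:
    print(exc)   # unsupported operand type(s) for +: 'C' and 'int'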
But that is not the only problem: you could create a metaclass (inheriting from type) and put a __getattr__ method on that metaclass. For all querying you do from Python, it would look like your object had __add__ (or any other dunder method) in its class. However, for dunder methods Python does not go through the normal attribute lookup mechanism - it "looks" directly at the class to see whether the dunder method is "physically" there. There are slots in the memory structure that holds each class, one for each of the possible dunder methods - and they either refer to the corresponding method or are "null" (this is visible when coding in C; on the Python side, the default dir will show these methods when they exist, or omit them if not). If the slot is not filled, Python will simply say the object does not implement that operation, period.
The way to work around that with a proxy object like you want is to create a proxy class that either features the dunder methods from the class you want to wrap, or features all possible methods, and upon being called, check if the underlying object actually implements the called method.
That is why "serious" code will rarely, if ever, offer true "transparent" proxy objects. There are exceptions, but from "Weakrefs", to "super()", to concurrent.futures, just to mention a few in the core language and stdlib, no one attempts a "fully working transparent proxy" - instead, the api is more like you call a ".value()" or ".result()" method on the wrapper to get to the original object itself.
However, it can be done, as I described above. I even have a small (long unmaintained) package on pypi that does that, wrapping a proxy for a future.
The code is at https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py
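For the question's case, a minimal sketch of that approach (only + is handled here, and the names are illustrative rather than a finished API): the proxy defines __add__ and __radd__ on the class itself and forwards to the wrapped value:

class DynamicProperty(object):
    def __init__(self, value=None):
        self.value = value

    def __getattr__(self, attr):
        # ordinary attributes still fall through to the wrapped value
        return getattr(self.value, attr)

    def __add__(self, other):
        # defined on the class, so the + operator can find it
        return self.value + other

    def __radd__(self, other):
        return other + self.value

print(DynamicProperty(1.5) + 2)   # 3.5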
The + operator in your case does not work, because DynamicProperty does not inherit from float. See:
>>> class Foo(float):
...     pass
...
>>> Foo(1.5) + 2
3.5
So, you'll need to do some kind of dynamic inheritance:
def get_dynamic_property(instance):
    base = type(instance)
    class DynamicProperty(base):
        pass
    return DynamicProperty(instance)

wrapped_string = get_dynamic_property("foo")
print(wrapped_string.upper())

wrapped_float = get_dynamic_property(1.5)
print(wrapped_float + 2)
Output:
FOO
3.5

Python 3.6 pickling custom procedure

I have some objects of class A which has its own method to be pickled; call it custom_module.customPickle(A). It takes an instance of A and returns a serialization string.
I also have a list of objects, each of class B, which contains A.
I need to pickle the list, but pickling A gives an error that is difficult to solve. However, A has its own method to be pickled.
I can implement the __reduce__() method in class B so that it calls custom_module.customPickle(A). But how can I do this so that pickle is able to serialize B efficiently?
Object A is a music21.stream and object B is a custom object. The custom serialization function is music21.converter.freezeStr(streamObj, fmt=None) and the unpickle function should be music21.converter.thawStr(strData)
You can use the copyreg module to register custom functions for pickling and unpickling; the function you register acts like a __reduce__ method on the class.
If you return a tuple of (unpickle_function, state), then the registered unpickle_function callable will be called to unpickle it again, with state as the argument, so you can use your music21.converter.thawStr() function there:
import copyreg
import music21.converter
import music21.stream

def pickle_music21_stream(stream_obj):
    return music21.converter.thawStr, (music21.converter.freezeStr(stream_obj),)

copyreg.pickle(music21.stream.Stream, pickle_music21_stream)
(the constructor argument to copyreg.pickle() is ignored in recent Python versions)
This registers a global handler for those objects. You can also use a dispatch table per pickler; see Dispatch Tables in the pickle documentation on how you'd register one.
Now, when pickling, whenever an instance of Stream is encountered the pickle_music21_stream() function is used to produce the serialisation, and the thawStr() function will be used to unpickle that data again.
However, the music21.converter functions use pickle themselves. They effectively pack and clean up the stream, and then pickle the resulting Stream instance. This will then call the custom handler, and you have an infinite loop.
The work-around is to use a custom dispatch table to handle pickling and unpickling. Avoid using copyreg in this case, as it sets a global hook that'll be called recursively each time a Stream object is being pickled.
Your own pickle infrastructure needs to use a custom pickler:
import copyreg
import io
import pickle
import music21.converter
import music21.stream

def pickle_music21_stream(stream_obj):
    return music21.converter.thawStr, (music21.converter.freezeStr(stream_obj),)

def dumps(obj):
    f = io.BytesIO()
    p = pickle.Pickler(f)
    p.dispatch_table = copyreg.dispatch_table.copy()
    p.dispatch_table[music21.stream.Stream] = pickle_music21_stream
    p.dump(obj)
    return f.getvalue()

def loads(data):
    return pickle.loads(data)  # hook is registered in the pickle data
Here the custom function is only called when a Stream instance is found in your own data structure. The music21 routines use the global pickle.dumps() and pickle.loads() functions and won't use the same hook.
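A hedged usage sketch (the container class B below is hypothetical, standing in for your own class that holds a Stream):

class B(object):
    def __init__(self, stream):
        self.stream = stream

data = dumps(B(music21.stream.Stream()))   # the Stream goes through freezeStr
restored = loads(data)                     # and comes back through thawStr
print(type(restored.stream))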

boost python enable_pickling expectation

Hi, I am using Python to instantiate a C++ class that is exposed to Python via the Boost.Python library. At the same time, I have a requirement to pickle the Python classes that use this Python-enabled C++ class.
So what I did was add enable_pickling() to an example class definition like this:
class_<pform::base::Price>("Price", init<double>())
    .def(self == self)
    .def(self_ns::str(self_ns::self))  // __str__
    .def("get_value", &pform::base::Price::get_value)
It makes the class pickleable. However, I get this error when unpickling it.
Boost.Python.ArgumentError: Python argument types in
    Price.__init__(Price)
did not match C++ signature:
    __init__(_object*, double)
So what is missing here?
A bit late, but I found the relevant boost documentation for this:
http://www.boost.org/doc/libs/1_64_0/libs/python/doc/html/reference/topics/pickle_support.html
The Pickle Interface

At the user level, the Boost.Python pickle interface involves three special methods:

__getinitargs__: When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getinitargs__ method. This method must return a Python tuple (it is most convenient to use a boost::python::tuple). When the instance is restored by the unpickler, the contents of this tuple are used as the arguments for the class constructor. If __getinitargs__ is not defined, pickle.load will call the constructor (__init__) without arguments; i.e., the object must be default-constructible.

__getstate__: When an instance of a Boost.Python extension class is pickled, the pickler tests if the instance has a __getstate__ method. This method should return a Python object representing the state of the instance.

__setstate__: When an instance of a Boost.Python extension class is restored by the unpickler (pickle.load), it is first constructed using the result of __getinitargs__ as arguments (see above). Subsequently the unpickler tests if the new instance has a __setstate__ method. If so, this method is called with the result of __getstate__ (a Python object) as the argument.

The three special methods described above may be .def()'ed individually by the user. However, Boost.Python provides an easy to use high-level interface via the boost::python::pickle_suite class that also enforces consistency: __getstate__ and __setstate__ must be defined as pairs. Use of this interface is demonstrated by the following examples.
In your particular example the class is not default constructible as it needs a double argument (which I assume is the "value"). To wrap it for Python you would also need to define:
.def("__getinitargs__", +[](pform::base::Price const& self){
return boost::python::make_tuple(self.get_value());
})
Now Boost.Python will initialize your class using "value" instead of calling the default constructor (pform::base::Price()).
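With that in place, the round trip from Python should work; a rough sketch, where the extension module name price_module is purely hypothetical:

import pickle
import price_module   # hypothetical name of your Boost.Python extension module

p = price_module.Price(4.2)
data = pickle.dumps(p)
restored = pickle.loads(data)   # __getinitargs__ feeds (4.2,) back into __init__
print(restored.get_value())     # 4.2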

Overriding the default type() metaclass before Python runs

Here be dragons. You've been warned.
I'm thinking about creating a new library that will attempt to help write a better test suite.
One of its features verifies that any object being used which isn't the test runner or the system under test has a test double (a mock object, a stub, a fake or a dummy). If the tester wants the live object, and thus reduced test isolation, they have to say so explicitly.
The only way I see to do this is to override the builtin type() function which is the default metaclass.
The new default metaclass will check the test double registry dictionary to see if it has been replaced with a test double or if the live object was specified.
Of course this is not possible through Python itself:
TypeError: can't set attributes of built-in/extension type 'type'
Is there a way to intervene in Python's metaclass lookup before the test suite runs (and probably before Python itself runs)?
Maybe using bytecode manipulation? But how exactly?
The following is not advisable, and you'll hit plenty of problems and cornercases implementing your idea, but on Python 3.1 and onwards, you can hook into the custom class creation process by overriding the __build_class__ built-in hook:
import builtins

_orig_build_class = builtins.__build_class__

class SomeMockingMeta(type):
    pass  # whatever mocking behaviour you need

def my_build_class(func, name, *bases, **kwargs):
    if not any(isinstance(b, type) for b in bases):
        # a 'regular' class, not a metaclass
        if 'metaclass' in kwargs:
            if not isinstance(kwargs['metaclass'], type):
                # the metaclass is a callable, but not a class
                orig_meta = kwargs.pop('metaclass')
                class HookedMeta(SomeMockingMeta):
                    def __new__(meta, name, bases, attrs):
                        return orig_meta(name, bases, attrs)
                kwargs['metaclass'] = HookedMeta
            else:
                # There already is a metaclass, insert ours and hope for the best
                class SubclassedMeta(SomeMockingMeta, kwargs['metaclass']):
                    pass
                kwargs['metaclass'] = SubclassedMeta
        else:
            kwargs['metaclass'] = SomeMockingMeta
    return _orig_build_class(func, name, *bases, **kwargs)

builtins.__build_class__ = my_build_class
This is limited to custom classes only, but does give you an all-powerful hook.
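A small usage sketch, assuming SomeMockingMeta above has a concrete body: once the hook is installed, an ordinary class statement is built by the mocking metaclass:

class Foo:
    pass

print(type(Foo) is SomeMockingMeta)   # True - Foo was created via my_build_class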
For Python versions before 3.1, you can forget hooking class creation. The C build_class function directly uses the C-level type() value if no metaclass has been defined; it never looks it up from the __builtin__ module, so you cannot override it.
I like your idea, but I think you're going slightly off course. What if the code calls a library function instead of a class? Your fake type() would never be called and you would never be advised that you failed to mock that library function. There are plenty of utility functions both in Django and in any real codebase.
I would advise you to write the interpreter-level support you need in the form of a patch to the Python sources. Or you might find it easier to add such a hook to PyPy's codebase, which is written in Python itself, instead of messing with Python's C sources.
I just realized that the Python interpreter includes a comprehensive set of tools to enable any piece of Python code to step through the execution of any other piece of code, checking what it does down to each function call, or even to each single Python line being executed, if needed.
sys.setprofile should be enough for your needs. With it you can install a hook (a callback) that will be notified of every function call being made by the target program. You cannot use it to change the behavior of the target program, but you can collect statistics about it, including your "mock coverage" metric.
Python's documentation about the Profilers introduces a number of modules built upon sys.setprofile. You can study their sources to see how to use it effectively.
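A minimal sketch of such a hook (the target function is just a stand-in):

import sys

calls = []

def profiler(frame, event, arg):
    # 'call' fires for every Python-level function call made while the hook is set
    if event == 'call':
        calls.append(frame.f_code.co_name)

def target():
    return sorted([3, 1, 2])

sys.setprofile(profiler)
try:
    target()
finally:
    sys.setprofile(None)   # always uninstall the hook

print(calls)   # e.g. ['target']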
If that turns out not to be enough, there is still sys.settrace, a heavy-handed approach that allows you to step through every line of the target program, inspect its variables and modify its execution. The standard module bdb.py is built upon sys.settrace and implements the standard set of debugging tools (breakpoints, step into, step over, etc.) It is used by pdb.py which is the commandline debugger, and by other graphical debuggers.
With these two hooks, you should be all right.

Overcoming Python's limitations regarding instance methods

It seems that Python has some limitations regarding instance methods.
Instance methods can't be copied.
Instance methods can't be pickled.
This is problematic for me, because I work on a very object-oriented project in which I reference instance methods, and there's use of both deepcopying and pickling. The pickling thing is done mostly by the multiprocessing mechanism.
What would be a good way to solve this? I did some ugly workaround for the copying issue, but I'm looking for a nicer solution to both problems.
Does anyone have any suggestions?
Update:
My use case: I have a tiny event system. Each event has an .action attribute that points to a function it's supposed to trigger, and sometimes that function is an instance method of some object.
You might be able to do this using copy_reg.pickle. In Python 2.6:
import copy_reg
import types

def reduce_method(m):
    return (getattr, (m.__self__, m.__func__.__name__))

copy_reg.pickle(types.MethodType, reduce_method)
This does not store the code of the method, just its name; but that will work correctly in the common case.
This makes both pickling and copying work!
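A small self-contained sketch of how that registration behaves (the Greeter class is just illustrative; Python 2, since copy_reg only exists there):

import copy
import copy_reg
import pickle
import types

def reduce_method(m):
    return (getattr, (m.__self__, m.__func__.__name__))

copy_reg.pickle(types.MethodType, reduce_method)

class Greeter(object):
    def hello(self):
        return 'hi'

bound = Greeter().hello
restored = pickle.loads(pickle.dumps(bound))   # re-fetched as getattr(instance, 'hello')
copied = copy.deepcopy(bound)                  # deep copying goes through the same reducer
print restored(), copied()                     # hi hi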
REST - Representational State Transfer. Just send state, not methods.
To transfer an object X from A to B, we do this:

1. A encodes the state of X in some handy, easy-to-parse notation. JSON is popular.
2. A sends the JSON text to B.
3. B decodes the state of X from the JSON notation, reconstructing X.

B must have the class definition for X's class for this to work. B must also have all functions and other class definitions on which X's class depends. In short, both A and B have all the definitions. Only a representation of the object's state gets moved around.
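A minimal sketch of that idea, with a hypothetical Point class standing in for X:

import json

class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def to_json(self):
        # A side: encode only the state
        return json.dumps({'x': self.x, 'y': self.y})

    @classmethod
    def from_json(cls, text):
        # B side: the class definition already exists here; rebuild from state
        state = json.loads(text)
        return cls(state['x'], state['y'])

wire = Point(1, 2).to_json()       # this JSON text is what travels from A to B
restored = Point.from_json(wire)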
See any article on REST.
http://en.wikipedia.org/wiki/Representational_State_Transfer
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
Pickle the instance and then access the method after unpickling it. Pickling a method of an instance doesn't make sense because it relies on the instance. If it doesn't, then write it as an independent function.
import pickle

class A:
    def f(self):
        print 'hi'

x = A()

f = open('tmp', 'w')
r = pickle.dump(x, f)
f.close()

f = open('tmp', 'r')
pickled_x = pickle.load(f)
pickled_x.f()
