Pickling Cython decorated function results in PicklingError - python

I have the following code:
def decorator(func):
    # functools.wraps(func)
    def other_func():
        print('other func')
    return other_func

@decorator
def func():
    pass
If I try to pickle func everything works. However if I compile the module as a Cython extension it fails.
Here is the error:
>>> pickle.dumps(module.func)
PicklingError: Can't pickle <cyfunction decorator.<locals>.other_func at 0x102a45a58>: attribute lookup other_func on module failed
The same happens if I use dill instead of pickle.
Do you know how to fix it?

I don't think there is anything you can really do here. It looks like a possible bug in Cython. But there might be a good reason for why Cython does what it does that I don't know about.
The problem arises because Cython functions are exposed as builtin functions in Python land (e.g. map, all, etc.). Builtin functions cannot normally have their name attributes changed, but Cython attempts to make its functions behave more like pure Python functions, and so it allows several of these attributes to be modified. However, Cython functions also implement __reduce__, which customises how objects are serialised by pickle. It looks like this __reduce__ implementation does not expect the name of the function object to have been changed, so it ignores those values and uses the name of the internal PyCFunction struct that is being wrapped (github blob).
The best thing you can do is file a bug report. You might be able to create a thin wrapper that enables your function to be serialised, but this will add overhead when the function is called.
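One way to sketch that thin-wrapper idea (hypothetical: `_cyfunc` below stands in for the compiled `module.func`; any callable that pickle cannot serialise by name behaves the same way):

```python
import pickle

# Stand-in for the compiled Cython function: a lambda, like a
# cyfunction with a broken __reduce__, cannot be pickled by name.
_cyfunc = lambda x: x * 2

def func(x):
    # Module-level pure-Python wrapper: pickle stores only the
    # reference "module_name.func" and never touches _cyfunc itself.
    return _cyfunc(x)

# The wrapper round-trips fine, at the cost of one extra call frame.
restored = pickle.loads(pickle.dumps(func))
assert restored(3) == 6
```

The trade-off mentioned above is visible here: every call to func pays for one extra Python-level call into the wrapped function.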
Customising Pickle
You can use the persistent_id feature of the Pickler and Unpickler to override the custom implementation that Cython has provided. Below is how to customise pickling for specific types/objects. It's done with a pure python function, but you can easily change it to deal with Cython functions.
import pickle
from importlib import import_module
from io import BytesIO

# example using pure python
class NoPickle:
    def __init__(self, name):
        # emulating the set of function attributes needed to pickle
        self.__module__ = __name__
        self.__qualname__ = name

    def __reduce__(self):
        # cannot pickle this object
        raise Exception

my_object = NoPickle('my_object')
# pickle.dumps(my_object)  # error!

# use persistent_id/load to help dump/load cython functions
class CustomPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, NoPickle):
            # replace NoPickle with type(module.func) to get the correct type
            # alternatively you might want to include a simple cython function
            # in the same module to make it easier to get the right type
            return "CythonFunc", obj.__module__, obj.__qualname__
        else:
            # return None to pickle the object as normal
            return None

class CustomUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        if pid[0] == "CythonFunc":
            _, mod_name, func_name = pid
            return getattr(import_module(mod_name), func_name)
        else:
            raise pickle.UnpicklingError('unsupported pid')

bytes_ = BytesIO()
CustomPickler(bytes_).dump(my_object)
bytes_.seek(0)
obj = CustomUnpickler(bytes_).load()
assert obj is my_object

Related

Python multiprocessing - mapping private method

Generally I'm aware of the pickle mechanism, but can't understand why this example:
from multiprocessing import Pool

class Foo:
    attr = 'a class attr'

    def __test(self, x):
        print(x, self.attr)

    def test2(self):
        with Pool(4) as p:
            p.map(self.__test, [1, 2, 3, 4])

if __name__ == '__main__':
    f = Foo()
    f.test2()
complains about the __test method?
return _ForkingPickler.loads(res)
AttributeError: 'Foo' object has no attribute '__test'
After changing def __test to def _test (one underscore) everything works fine. Am I missing some basic knowledge of pickling or "private" methods?
This appears to be a flaw in the name mangling magic. The actual name of a name-mangled private function incorporates the class name, so Foo.__test is actually named Foo._Foo__test, and other methods in the class just implicitly look up that name when they use self.__test.
Problem is, the magic extends to preserving the __name__ unmangled; Foo._Foo__test.__name__ is "__test". And pickle uses the __name__ to serialize the method. When it tries to deserialize it on the other end, it tries to look up plain __test, without applying the name mangling, so it can't find _Foo__test (the real name).
I don't think there is any immediate solution here aside from not using a private method directly (using it indirectly via another non-private method or global function would be fine); even if you try to pass self._Foo__test, it'll still pickle the unmangled name from __name__.
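The mismatch described above is easy to see directly (a minimal sketch; `Foo` and `__test` here are illustrative names, not the asker's real code):

```python
class Foo:
    def __test(self):
        return 'private'

    def run_test(self):
        # indirection through a public method: the mangled lookup
        # happens inside the class, so the private name never reaches pickle
        return self.__test()

f = Foo()
# the attribute really lives under the mangled name...
assert hasattr(f, '_Foo__test')
# ...but __name__ keeps the friendly unmangled spelling, which is
# exactly the name pickle later fails to look up
assert Foo._Foo__test.__name__ == '__test'
```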
The longer term solution would be to file a bug on the Python bug tracker; there may be a clever way to preserve the "friendly" __name__ while still allowing pickle to seamlessly mangle as needed.

PicklingError: Can't pickle <type 'function'> with python process pool executor

util.py
def exec_multiprocessing(self, method, args):
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = executor.map(method, args)
        return results
clone.py
def clone_vm(self, name, first_run, host, ip):
    # clone stuff
invoke.py
exec_args = [(name, first_run, host, ip) for host, ip in zip(hosts, ips)]
results = self.util.exec_multiprocessing(self.clone.clone_vm, exec_args)
The above code gives the pickling error. I found that it is because we are passing an instance method, so we should unwrap the instance method. But I am not able to make it work.
Note: I can not create top level method to avoid this. I have to use instance methods.
Let's start with an overview - why the error came up in the first place:
Multiprocessing must pickle (serialize) data to pass it between processes or threads. To be specific, the pool methods rely on a queue at the lower level to hand tasks to the workers, and everything that goes through the queue must be picklable.
The problem is that not all objects are picklable (see the list of picklable types), and when you try to pickle an unpicklable object you get the PicklingError exception - which is exactly what happened in your case: you passed an instance method, which is not picklable.
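The divide can be demonstrated without multiprocessing at all (a sketch; a lambda stands in for the unpicklable instance method, since both lack an importable module-level name):

```python
import pickle

def top_level(x):
    # module-level functions pickle by reference: only the name is stored
    return x + 1

round_tripped = pickle.loads(pickle.dumps(top_level))
assert round_tripped(1) == 2

# a lambda has no importable module-level name, so pickling it fails
try:
    pickle.dumps(lambda x: x)
    unpicklable = False
except (pickle.PicklingError, AttributeError):
    unpicklable = True
assert unpicklable
```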
There can be various workarounds (as is the case with every problem) - the solution that worked for me, from this answer by Dano, is to make pickle handle the methods by registering them with copy_reg.
Add the following lines at the start of your module clone.py to make clone_vm picklable (do import copy_reg and types):
def _pickle_method(m):
    if m.im_self is None:
        return getattr, (m.im_class, m.im_func.func_name)
    else:
        return getattr, (m.im_self, m.im_func.func_name)

copy_reg.pickle(types.MethodType, _pickle_method)
Other useful answers - by Alex Martelli, mrule, by unutbu
You need to add support for pickling functions and methods for that to work as pointed out by Nabeel Ahmed. But his solution won't work with name-mangled methods -
import copy_reg
import types

def _pickle_method(method):
    attached_object = method.im_self or method.im_class
    func_name = method.im_func.func_name

    if func_name.startswith('__'):
        func_name = filter(lambda method_name:
                               method_name.startswith('_') and
                               method_name.endswith(func_name),
                           dir(attached_object))[0]

    return (getattr, (attached_object, func_name))

copy_reg.pickle(types.MethodType, _pickle_method)
This would work for name-mangled methods as well. For it to work, you need to ensure this code always runs before any pickling happens. An ideal place is the settings file (if you are using Django) or some package that is always imported before other code is executed.
Credits:- Steven Bethard (https://bethard.cis.uab.edu/)

Dumping a subclass of gtk.ListStore using pickle

I am trying to dump a custom class using pickle. The class was subclassed from gtk.ListStore, since that made it easier to store particular data and then display it using gtk. This can be reproduced as shown here.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    pass

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
The solution I have tried is to add a __getstate__ method to my class. As far as I understand the documentation, this should take precedence for pickle so that it no longer tries to serialize the ListStore, which it is unable to do. However, I still get an identical error from pickle.dump when I try to pickle my object. The error can be reproduced as follows.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    def __getstate__(self):
        return 'bar'

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
In each case, pickle.dump raises a TypeError, "can't pickle ListStore objects". Using print statements, I have verified that the __getstate__ function is run when calling pickle.dump. I don't see any hints as to what to do next in the documentation, so I'm in a bit of a bind. Any suggestions?
With this method you can even use json instead of pickle for your purpose.
Here is a quick working example to show you the steps you need to employ to pickle "unpicklable types" like gtk.ListStore. Essentially you need to do a few things:

1. Define __reduce__, which returns a function and the arguments needed to reconstruct the instance.
2. Determine the column types for your ListStore. The method self.get_column_type(0) returns a GType, so you will need to map this back to the corresponding Python type. I've left that as an exercise - in my example I've employed a hack to get the column types from the first row of values.
3. Have your _new_foo function rebuild the instance.
Example:
import gtk, os, pickle

def _new_foo(cls, coltypes, rows):
    inst = cls.__new__(cls)
    inst.__init__(*coltypes)
    for row in rows:
        inst.append(row)
    return inst

class foo(gtk.ListStore):
    def __reduce__(self):
        rows = [list(row) for row in self]
        # hack - to be correct you'll really need to use
        # `self.get_column_type` and map it back to Python's
        # corresponding type.
        coltypes = [type(c) for c in rows[0]]
        return _new_foo, (self.__class__, coltypes, rows)

x = foo(str, int)
x.append(['foo', 1])
x.append(['bar', 2])

s = pickle.dumps(x)
y = pickle.loads(s)
print list(y[0])
print list(y[1])
Output:
['foo', 1]
['bar', 2]
When you subclass object, object.__reduce__ takes care of calling __getstate__. It would seem that since this is a subclass of gtk.ListStore, the default implementation of __reduce__ tries to pickle the data for reconstructing a gtk.ListStore object first, then calls your __getstate__, but since the gtk.ListStore can't be pickled, it refuses to pickle your class. The problem should go away if you try to implement __reduce__ and __reduce_ex__ instead of __getstate__.
>>> class Foo(gtk.ListStore):
...     def __init__(self, *args):
...         super(Foo, self).__init__(*args)
...         self._args = args
...     def __reduce_ex__(self, proto=None):
...         return type(self), self._args, self.__getstate__()
...     def __getstate__(self):
...         return 'foo'
...     def __setstate__(self, state):
...         print state
...
>>> x = Foo(str)
>>> pickle.loads(pickle.dumps(x))
foo
<Foo object at 0x18be1e0 (__main__+Foo-v3 at 0x194bd90)>
As an addition, you may want to consider other serializers, such as json. There you take full control of the serialization process by defining yourself how custom classes are to be serialized. Plus, by default they come without the security issues of pickle.

python dll swig help

First, I have never used SWIG and I don't know what it does...
We have a python library that, as far as I can tell, uses SWIG; when I want to use this library I have to put this in my python code:
import pylib
Now if I go open this vendor's pylib.py I see some classes, functions and this header:
# This file was automatically generated by SWIG (http://www.swig.org).
# Version 1.3.33
#
# Don't modify this file, modify the SWIG interface instead.
# This file is compatible with both classic and new-style classes.
import _pylib
import new
new_instancemethod = new.instancemethod
Next, in the same directory as pylib.py, there is a file called _pylib.pyd, which I think is a DLL.
My problem is the following:
Many classes in pylib.py look like this:
class PersistentCache(_object):
    __swig_setmethods__ = {}
    __setattr__ = lambda self, name, value: _swig_setattr(self, PersistentCache, name, value)
    __swig_getmethods__ = {}
    __getattr__ = lambda self, name: _swig_getattr(self, PersistentCache, name)
    __repr__ = _swig_repr
    def __init__(self, *args):
        this = _pylib.new_PersistentCache(*args)
        try: self.this.append(this)
        except: self.this = this
    __swig_destroy__ = _pylib.delete_PersistentCache
    __del__ = lambda self: None
    def setProperty(*args): return _pylib.PersistentCache_setProperty(*args)
    def getProperty(*args): return _pylib.PersistentCache_getProperty(*args)
    def clear(*args): return _pylib.PersistentCache_clear(*args)
    def entries(*args): return _pylib.PersistentCache_entries(*args)

PersistentCache_swigregister = _pylib.PersistentCache_swigregister
PersistentCache_swigregister(PersistentCache)
Say I want to use this class or its methods. With things like *args as parameters, I can't know how many parameters I should pass nor what they should be. With what I have, is it possible to find this out so that I can use the library?
SWIG is a method of automatically wrapping up a C/C++ library so it can be accessed from Python. The library is actually a C library compiled as a DLL. The Python code is just pass-through code, all autogenerated by SWIG, and you're right that it's not very helpful.
If you want to know what arguments to pass, you should not look at the Python code, you should look at the C code it was generated from -- if you have it, or the documentation if not. If you don't have any code or documentation for that library, then I think you're going to have a very difficult time figuring it out... you should contact the vendor for documentation.
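One more avenue worth trying before contacting the vendor: C-level callables expose their docstrings even when their Python signature is just *args, and SWIG can be configured to copy the C prototype into those docstrings. A sketch using a stdlib C extension module as a stand-in for the vendor's _pylib (which obviously isn't available here):

```python
import math  # a C extension module, standing in for the vendor's _pylib

# The Python-level signature of a C function is opaque, but its
# docstring (when the wrapper generator emitted one) usually states
# the expected arguments:
print(math.pow.__doc__)
```

If the vendor ran SWIG with docstring generation enabled, help(pylib.PersistentCache.setProperty) will show something useful; if the docstrings are empty, the C headers or vendor documentation are the only reliable source.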

Python serialize lexical closures?

Is there a way to serialize a lexical closure in Python using the standard library? pickle and marshal appear not to work with lexical closures. I don't really care about the details of binary vs. string serialization, etc., it just has to work. For example:
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure
I'd like to just be able to dump instances of closure to a file and read them back.
Edit:
One relatively obvious way that this could be solved is with some reflection hacks to convert lexical closures into class objects and vice-versa. One could then convert to classes, serialize, unserialize, convert back to closures. Heck, given that Python is duck typed, if you overloaded the function call operator of the class to make it look like a function, you wouldn't even really need to convert it back to a closure and the code using it wouldn't know the difference. If any Python reflection API gurus are out there, please speak up.
PiCloud has released an open-source (LGPL) pickler which can handle function closure and a whole lot more useful stuff. It can be used independently of their cloud computing infrastructure - it's just a normal pickler. The whole shebang is documented here, and you can download the code via 'pip install cloud'. Anyway, it does what you want. Let's demonstrate that by pickling a closure:
import pickle
from StringIO import StringIO
import cloud

# generate a closure
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

closey = foo(3, 5)

# use the picloud pickler to pickle to a string
f = StringIO()
pickler = cloud.serialization.cloudpickle.CloudPickler(f)
pickler.dump(closey)

# rewind the virtual file and reload
f.seek(0)
closey2 = pickle.load(f)
Now we have closey, the original closure, and closey2, the one that has been restored from a string serialisation. Let's test 'em.
>>> closey(4)
20
>>> closey2(4)
20
Beautiful. The module is pure Python - you can open it up and easily see what makes the magic work. (The answer is: a lot of code.)
If you simply use a class with a __call__ method to begin with, it should all work smoothly with pickle.
class foo(object):
    def __init__(self, bar, baz):
        self.baz = baz

    def __call__(self, waldo):
        return self.baz * waldo
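A quick round trip with that class (reproduced here so the snippet stands alone) shows the "closure state" surviving, because instances pickle as a class reference plus their __dict__:

```python
import pickle

class foo(object):
    def __init__(self, bar, baz):
        self.baz = baz

    def __call__(self, waldo):
        return self.baz * waldo

closey = foo(3, 5)
# the instance's __dict__ ({'baz': 5}) is what actually gets stored
closey2 = pickle.loads(pickle.dumps(closey))
assert closey2(4) == 20
```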
On the other hand, a hack which converted a closure into an instance of a new class created at runtime would not work, because of the way pickle deals with classes and instances. pickle doesn't store classes; only a module name and class name. When reading back an instance or class it tries to import the module and find the required class in it. If you used a class created on-the-fly, you're out of luck.
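The "out of luck" case is just as easy to demonstrate (a sketch: the class's recorded name deliberately does not match any module-level variable, so pickle's import-and-look-up step cannot find it):

```python
import pickle

# pickle records only "module.qualname" for a class; this class's
# recorded name ("NotTheGlobalName") is not findable in the module
Hidden = type('NotTheGlobalName', (), {})
try:
    pickle.dumps(Hidden)
    raised = False
except (pickle.PicklingError, AttributeError):
    raised = True
assert raised
```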
Yes! I got it (at least I think) - that is, the more generic problem of pickling a function. Python is so wonderful :) - I found out most of it through the dir() function and a couple of web searches. It's also wonderful to have it [hopefully] solved; I needed it too.
I haven't done a lot of testing on how robust this co_code approach is (nested functions, etc.), and it would be nice if someone could look up how to hook Python so functions can be pickled automatically (e.g. they might sometimes be closure args).
Cython module _pickle_fcn.pyx
# -*- coding: utf-8 -*-
cdef extern from "Python.h":
    object PyCell_New(object value)

def recreate_cell(value):
    return PyCell_New(value)
Python file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author gatoatigrado [ntung.com]
import cPickle, marshal, types
import pyximport; pyximport.install()
import _pickle_fcn

def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

# really this problem is more about pickling arbitrary functions
# thanks so much to the original question poster for mentioning marshal
# I probably wouldn't have found out how to serialize func_code without it.
fcn_instance = foo("unused?", -1)
code_str = marshal.dumps(fcn_instance.func_code)
name = fcn_instance.func_name
defaults = fcn_instance.func_defaults
closure_values = [v.cell_contents for v in fcn_instance.func_closure]
serialized = cPickle.dumps((code_str, name, defaults, closure_values),
                           protocol=cPickle.HIGHEST_PROTOCOL)

code_str_, name_, defaults_, closure_values_ = cPickle.loads(serialized)
code_ = marshal.loads(code_str_)
closure_ = tuple([_pickle_fcn.recreate_cell(v) for v in closure_values_])

# reconstructing the globals is like pickling everything :)
# for most functions, it's likely not necessary
# it probably wouldn't be too much work to detect whether a global used by
# fcn_instance is of type module, and handle that in some custom way
# (have the reconstruction reinstantiate the module)
reconstructed = types.FunctionType(code_, globals(),
                                   name_, defaults_, closure_)
print(reconstructed(3))
cheers,
Nicholas
EDIT - more robust global handling is necessary for real-world cases. fcn.func_code.co_names lists global names.
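For example, co_names is where those global references show up, separate from the free variables the closure captures (a sketch reusing the shape of the closures above; some_global only needs to exist at call time, not at definition time):

```python
def get_closure(x):
    def the_closure(a, b=1):
        return a * x + b, some_global
    return the_closure

f = get_closure(10)
# x is captured as a free variable; some_global stays a global lookup
assert f.__code__.co_freevars == ('x',)
assert 'some_global' in f.__code__.co_names
```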
#!python
import marshal, pickle, new

def dump_func(f):
    if f.func_closure:
        closure = tuple(c.cell_contents for c in f.func_closure)
    else:
        closure = None
    return marshal.dumps(f.func_code), f.func_defaults, closure

def load_func(code, defaults, closure, globs):
    if closure is not None:
        closure = reconstruct_closure(closure)
    code = marshal.loads(code)
    return new.function(code, globs, code.co_name, defaults, closure)

def reconstruct_closure(values):
    ns = range(len(values))
    src = ["def f(arg):"]
    src += ["  _%d = arg[%d]" % (n, n) for n in ns]
    src += ["  return lambda:(%s)" % ','.join("_%d" % n for n in ns), '']
    src = '\n'.join(src)
    try:
        exec src
    except:
        raise SyntaxError(src)
    return f(values).func_closure

if __name__ == '__main__':

    def get_closure(x):
        def the_closure(a, b=1):
            return a * x + b, some_global
        return the_closure

    f = get_closure(10)
    code, defaults, closure = dump_func(f)
    dump = pickle.dumps((code, defaults, closure))
    code, defaults, closure = pickle.loads(dump)
    f = load_func(code, defaults, closure, globals())

    some_global = 'some global'
    print f(2)
Recipe 500261: Named Tuples contains a function that defines a class on the fly, and this class supports pickling.
Here's the essence:
result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
Combined with @Greg Ball's suggestion to create a new class at runtime, it might answer your question.
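A minimal sketch of how that one line makes a runtime-built class picklable (assuming, as in the recipe, that the caller binds the class under its own name at module level):

```python
import pickle
import sys

def make_class():
    cls = type('Point', (), {})
    # the recipe's essence: adopt the *caller's* module name so pickle
    # can later find the class where it was actually created
    cls.__module__ = sys._getframe(1).f_globals.get('__name__', '__main__')
    return cls

Point = make_class()  # bound under the matching name "Point"
p = Point()
p.x = 1
restored = pickle.loads(pickle.dumps(p))
assert restored.x == 1
```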
