Python: serialize lexical closures?

Is there a way to serialize a lexical closure in Python using the standard library? pickle and marshal appear not to work with lexical closures. I don't really care about the details of binary vs. string serialization, etc., it just has to work. For example:
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure
I'd like to just be able to dump instances of closure to a file and read them back.
Edit:
One relatively obvious way that this could be solved is with some reflection hacks to convert lexical closures into class objects and vice-versa. One could then convert to classes, serialize, unserialize, convert back to closures. Heck, given that Python is duck typed, if you overloaded the function call operator of the class to make it look like a function, you wouldn't even really need to convert it back to a closure and the code using it wouldn't know the difference. If any Python reflection API gurus are out there, please speak up.

PiCloud has released an open-source (LGPL) pickler which can handle function closures and a whole lot of other useful stuff. It can be used independently of their cloud computing infrastructure - it's just a normal pickler. The whole shebang is documented here, and you can download the code via 'pip install cloud'. Anyway, it does what you want. Let's demonstrate that by pickling a closure:
import pickle
from StringIO import StringIO
import cloud

# generate a closure
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

closey = foo(3, 5)

# use the picloud pickler to pickle to a string
f = StringIO()
pickler = cloud.serialization.cloudpickle.CloudPickler(f)
pickler.dump(closey)

# rewind the virtual file and reload
f.seek(0)
closey2 = pickle.load(f)
Now we have closey, the original closure, and closey2, the one that has been restored from a string serialisation. Let's test 'em.
>>> closey(4)
20
>>> closey2(4)
20
Beautiful. The module is pure python—you can open it up and easily see what makes the magic work. (The answer is a lot of code.)
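For reference, PiCloud's pickler lives on as the standalone cloudpickle package on PyPI. A minimal Python 3 sketch of the same round trip, assuming 'pip install cloudpickle':
import pickle
import cloudpickle

def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

closey = foo(3, 5)
data = cloudpickle.dumps(closey)  # cloudpickle serializes the closure to bytes...
closey2 = pickle.loads(data)      # ...and plain pickle can load it back
assert closey2(4) == 20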

If you simply use a class with a __call__ method to begin with, it should all work smoothly with pickle.
class foo(object):
    def __init__(self, bar, baz):
        self.baz = baz
    def __call__(self, waldo):
        return self.baz * waldo
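For example, a quick round trip (a sketch; this works as long as foo is reachable by pickle under its module and name, e.g. defined at module level):
import pickle

f = foo(3, 5)        # behaves like the closure: f(4) == 20
f2 = pickle.loads(pickle.dumps(f))
assert f2(4) == 20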
On the other hand, a hack which converted a closure into an instance of a new class created at runtime would not work, because of the way pickle deals with classes and instances. pickle doesn't store classes; only a module name and class name. When reading back an instance or class it tries to import the module and find the required class in it. If you used a class created on-the-fly, you're out of luck.
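A quick demonstration of that limitation ('Dyn' is an illustrative name; the class is created at runtime and never bound in the module, so pickle's lookup fails):
import pickle

cls = type('Dyn', (object,), {})  # class created on-the-fly
try:
    pickle.dumps(cls())
except pickle.PicklingError as e:
    print(e)  # attribute lookup for 'Dyn' in this module fails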

Yes! I got it (at least I think) -- that is, the more generic problem of pickling a function. Python is so wonderful :), I found out most of it through the dir() function and a couple of web searches. It's also wonderful to have it [hopefully] solved; I needed it too.
I haven't done a lot of testing on how robust this co_code thing is (nested fcns, etc.), and it would be nice if someone could look up how to hook Python so functions can be pickled automatically (e.g. they might sometimes be closure args).
Cython module _pickle_fcn.pyx
# -*- coding: utf-8 -*-
cdef extern from "Python.h":
    object PyCell_New(object value)

def recreate_cell(value):
    return PyCell_New(value)
Python file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author gatoatigrado [ntung.com]
import cPickle, marshal, types
import pyximport; pyximport.install()
import _pickle_fcn

def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

# really this problem is more about pickling arbitrary functions
# thanks so much to the original question poster for mentioning marshal
# I probably wouldn't have found out how to serialize func_code without it.
fcn_instance = foo("unused?", -1)
code_str = marshal.dumps(fcn_instance.func_code)
name = fcn_instance.func_name
defaults = fcn_instance.func_defaults
closure_values = [v.cell_contents for v in fcn_instance.func_closure]
serialized = cPickle.dumps((code_str, name, defaults, closure_values),
                           protocol=cPickle.HIGHEST_PROTOCOL)

code_str_, name_, defaults_, closure_values_ = cPickle.loads(serialized)
code_ = marshal.loads(code_str_)
closure_ = tuple([_pickle_fcn.recreate_cell(v) for v in closure_values_])
# reconstructing the globals is like pickling everything :)
# for most functions, it's likely not necessary
# it probably wouldn't be too much work to detect whether a global referenced
# by the function is a module, and handle that in some custom way
# (have the reconstruction reinstantiate the module)
reconstructed = types.FunctionType(code_, globals(),
                                   name_, defaults_, closure_)
print(reconstructed(3))
cheers,
Nicholas
EDIT - more robust global handling is necessary for real-world cases. fcn.func_code.co_names lists global names.
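For example, a small helper along those lines (a hypothetical sketch, using the Python 3 attribute names):
def referenced_globals(fn):
    # co_names lists the global (and attribute) names the bytecode uses;
    # keep only those actually present in the function's globals
    return {name: fn.__globals__[name]
            for name in fn.__code__.co_names
            if name in fn.__globals__}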

#!python
import marshal, pickle, new

def dump_func(f):
    if f.func_closure:
        closure = tuple(c.cell_contents for c in f.func_closure)
    else:
        closure = None
    return marshal.dumps(f.func_code), f.func_defaults, closure

def load_func(code, defaults, closure, globs):
    if closure is not None:
        closure = reconstruct_closure(closure)
    code = marshal.loads(code)
    return new.function(code, globs, code.co_name, defaults, closure)

def reconstruct_closure(values):
    ns = range(len(values))
    src = ["def f(arg):"]
    src += ["    _%d = arg[%d]" % (n, n) for n in ns]
    src += ["    return lambda: (%s)" % ','.join("_%d" % n for n in ns), '']
    src = '\n'.join(src)
    try:
        exec src
    except:
        raise SyntaxError(src)
    return f(values).func_closure

if __name__ == '__main__':
    def get_closure(x):
        def the_closure(a, b=1):
            return a * x + b, some_global
        return the_closure

    f = get_closure(10)
    code, defaults, closure = dump_func(f)
    dump = pickle.dumps((code, defaults, closure))
    code, defaults, closure = pickle.loads(dump)
    f = load_func(code, defaults, closure, globals())
    some_global = 'some global'
    print f(2)
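For anyone on Python 3: the same idea ports over with the renamed function attributes and types.CellType (available since 3.8). A hedged sketch, not a drop-in replacement:
import marshal, pickle, types

def dump_func(f):
    closure = (tuple(c.cell_contents for c in f.__closure__)
               if f.__closure__ else None)
    return marshal.dumps(f.__code__), f.__defaults__, closure

def load_func(code, defaults, closure, globs):
    if closure is not None:
        # types.CellType builds closure cells directly (Python 3.8+)
        closure = tuple(types.CellType(v) for v in closure)
    code = marshal.loads(code)
    return types.FunctionType(code, globs, code.co_name, defaults, closure)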

Recipe 500261: Named Tuples contains a function that defines a class on-the-fly. And this class supports pickling.
Here's the essence:
result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
Combined with @Greg Ball's suggestion to create a new class at runtime it might answer your question.
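A hedged sketch of how the recipe's trick applies here (make_class is a hypothetical name; the key is stamping the caller's module onto the class and binding it under its own name so pickle's lookup succeeds):
import sys

def make_class(name):
    cls = type(name, (object,), {})
    caller_globals = sys._getframe(1).f_globals
    cls.__module__ = caller_globals.get('__name__', '__main__')
    caller_globals[name] = cls  # pickle finds the class via module + name
    return cls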

Related

Check if class name (a string) is used for some class in Python

How do I get the exact class variable which is (in current scope) available under a given name? I want to write a function like this:
from my_module import ClassA # A subclass of my_other_module.BenevolentClass
from my_other_module import my_function
a = 'ClassA'
cls_var = my_function(a)
o = cls_var()
So that I could supply any string to my_function, and as long as that string is available in the caller's namespace as a class name, it would produce the correct class, much as if I had copy-pasted the string directly into the code. The reason is that I need to supply class names to a complex object creation routine, but avoid eval when possible. My current implementation is like this:
def my_function(name):
    if name in globals():
        c = globals()[name]
        # Actually a complex class whitelist
        if issubclass(c, BenevolentClass):
            return c
        else:
            raise ValueError(f'Potentially malicious class {name}')
But that apparently produces globals() from my_other_module, which is not what I want. I want all classes that are available at the exact line of code where my_function is called (which may be inside completely different module which is called from yet another one).
You can pass the global dict to my_function.
def my_function(name, g):
    if name in g:
        ...

cls_var = my_function(a, globals())
I think I've found a solution. Not sure it is perfect, so I'd rather hear some comments before accepting my own answer.
import inspect

def my_function(name):
    c = None
    for frame in inspect.getouterframes(inspect.currentframe()):
        if name in frame.frame.f_globals:
            c = frame.frame.f_globals[name]
            break
    if c and issubclass(c, BenevolentClass):
        return c
    else:
        raise ValueError(f'Potentially malicious class {name}')
As far as I understand it, inspect.getouterframes walks outwards along the call stack, and I can check globals on every step to see what's available. Neither local variables nor builtins are available in frame.f_globals, so it doesn't seem to have much potential for injecting malicious data.

Pickling Cython decorated function results in PicklingError

I have the following code:
import functools

def decorator(func):
    @functools.wraps(func)
    def other_func():
        print('other func')
    return other_func

@decorator
def func():
    pass
If I try to pickle func everything works. However if I compile the module as a Cython extension it fails.
Here is the error:
>>> pickle.dumps(module.func)
PicklingError: Can't pickle <cyfunction decorator.<locals>.other_func at 0x102a45a58>: attribute lookup other_func on module failed
The same happens if I use dill instead of pickle.
Do you know how to fix it?
I don't think there is anything you can really do here. It looks like a possible bug in Cython. But there might be a good reason for why Cython does what it does that I don't know about.
The problem arises because Cython functions are exposed as builtin functions in Python land (e.g. map, all, etc.). These functions cannot have their name attributes changed. However, Cython attempts to make its functions more like pure Python functions, and so provides the ability for several of their attributes to be modified. However, Cython functions also implement __reduce__, which customises how objects are serialised by pickle. It looks like this function doesn't account for the possibility that the name of the function object has been changed, and so ignores those attributes and uses the name of the internal PyCFunction struct that is being wrapped (github blob).
The best thing you can do is file a bug report. You might be able to create a thin wrapper that enables your function to be serialised, but this will add overhead when the function is called.
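For illustration, a hedged sketch of that "thin wrapper" idea: a pure-Python callable that pickles itself by the module-level name under which the wrapped function is reachable, then delegates calls to it (PicklableWrapper and _restore are illustrative names):
from importlib import import_module

def _restore(mod_name, attr_name):
    return PicklableWrapper(mod_name, attr_name)

class PicklableWrapper:
    def __init__(self, mod_name, attr_name):
        self._mod_name = mod_name
        self._attr_name = attr_name
        self._func = getattr(import_module(mod_name), attr_name)
    def __call__(self, *args, **kwargs):
        # one extra indirection per call -- the overhead mentioned above
        return self._func(*args, **kwargs)
    def __reduce__(self):
        return (_restore, (self._mod_name, self._attr_name))

# usage: wrapped = PicklableWrapper('module', 'func'); pickle.dumps(wrapped)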
Customising Pickle
You can use the persistent_id feature of the Pickler and Unpickler to override the custom implementation that Cython has provided. Below is how to customise pickling for specific types/objects. It's done with a pure python function, but you can easily change it to deal with Cython functions.
import pickle
from importlib import import_module
from io import BytesIO

# example using pure python
class NoPickle:
    def __init__(self, name):
        # emulating the set of function attributes needed to pickle
        self.__module__ = __name__
        self.__qualname__ = name
    def __reduce__(self):
        # cannot pickle this object
        raise Exception

my_object = NoPickle('my_object')
# pickle.dumps(my_object) # error!

# use persistent_id/persistent_load to help dump/load cython functions
class CustomPickler(pickle.Pickler):
    def persistent_id(self, obj):
        if isinstance(obj, NoPickle):
            # replace NoPickle with type(module.func) to get the correct type
            # alternatively you might want to include a simple cython function
            # in the same module to make it easier to get the right type
            return "CythonFunc", obj.__module__, obj.__qualname__
        else:
            # return None to pickle the object as normal
            return None

class CustomUnpickler(pickle.Unpickler):
    def persistent_load(self, pid):
        if pid[0] == "CythonFunc":
            _, mod_name, func_name = pid
            return getattr(import_module(mod_name), func_name)
        else:
            raise pickle.UnpicklingError('unsupported pid')

bytes_ = BytesIO()
CustomPickler(bytes_).dump(my_object)
bytes_.seek(0)
obj = CustomUnpickler(bytes_).load()
assert obj is my_object

Python using function like variable

I've got a Python module which has several variables with hard-coded values which are used throughout the project. I'd like to bind the variables somehow to a function to redirect to a config file. Is this possible?
# hardcoded_values.py
foo = 'abc'
bar = 1234
# usage somewhere else in another module
from hardcoded_values import *
print foo
print bar
What I want to do is change only hardcoded_values.py, so that print foo transparently calls a function.
# hardcoded_values.py
import config
foo = SomeWrapper(config.get_value, 'foo') # or whatever you can think of to call config.get_value('foo')
...
config.get_value would be a function that is called with parameter 'foo' when using variable foo (as in print foo).
I'm pretty sure that you can't do what you want to do if you import like from hardcoded_values import *.
What you want to do is to set foo to some function, and then apply the property decorator (or equivalent) so that you can call foo as foo rather than foo(). You cannot apply the property decorator to modules for reasons detailed here: Why Is The property Decorator Only Defined For Classes?
Now, if you were to import hardcoded_values then I think there is a way to do what you want to hardcoded_values.foo. I have a pretty good feeling that what I am about to describe is a BAD IDEA that should never be used, but I think it is interesting.
BAD IDEA???
So say you wanted to replace a constant like os.EX_USAGE, which on my system is 64, with some function, and then call it as os.EX_USAGE rather than os.EX_USAGE(). We need to be able to use the property decorator, but for that we need a type other than module.
So what can be done is to create a new type on the fly, dumping a module's __dict__ into it, with a type factory function that takes the module as an argument:
def module_class_factory(module):
    ModuleClass = type('ModuleClass' + module.__name__,
                       (object,), module.__dict__)
    return ModuleClass
Now I will import os:
>>> import os
>>> os.EX_USAGE
64
>>> os.getcwd()
'/Users/Eric'
Now I will make a class OsClass, and bind the name os to an instance of this class:
>>> OsClass = module_class_factory(os)
>>> os = OsClass()
>>> os.EX_USAGE
64
>>> os.getcwd()
'/Users/Eric'
Everything still seems to work. Now define a function to replace os.EX_USAGE, noting that it will need to take a dummy self argument:
>>> def foo(self):
...     return 42
...
...and bind the class attribute OsClass.EX_USAGE to the function:
>>> OsClass.EX_USAGE = foo
>>> os.EX_USAGE()
42
It works when called by the os object! Now just apply the property decorator:
>>> OsClass.EX_USAGE = property(OsClass.EX_USAGE)
>>> os.EX_USAGE
42
now the constant defined in the module has been transparently replaced by a function call.
You definitely cannot do this if the client code of your module uses from hardcoded_variables import *. That makes references to the contents of hardcoded_variables in the other module's namespace, and you can't do anything with them after that.
If the client code can be changed to just import the module (with e.g. import hardcoded_variables) and then access its attributes (with hardcoded_variables.foo) you do have a chance, but it's a bit awkward.
Python caches modules that have been imported in sys.modules (which is a dictionary). You can replace a module in that dictionary with some other object, such as an instance of a custom class, and use property objects or other descriptors to implement special behavior when you access the object's attributes.
Try making your new hardcoded_variables.py look like this (and consider renaming it, too!):
import sys

class DummyModule(object):
    def __init__(self):
        self._bar = 1233

    @property
    def foo(self):
        print("foo called!")
        return "abc"

    @property
    def bar(self):
        self._bar += 1
        return self._bar

if __name__ != "__main__":  # Note, this is the opposite of the usual boilerplate
    sys.modules[__name__] = DummyModule()
If I understand correctly, you want your hardcoded_variables module evaluated every time you try to access a variable.
I would probably have hardcoded_variables in a document (e.g. json?) and a custom wrapper function like:
import json

def getSettings(var):
    with open('path/to/variables.json') as infl:
        data = json.load(infl)
    return data[var]
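Usage would then look something like this (assuming, for illustration, that variables.json holds the values from the question):
# variables.json: {"foo": "abc", "bar": 1234}
print(getSettings('foo'))  # 'abc'
print(getSettings('bar'))  # 1234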

pickling class method

I have a class whose instances need to format output as instructed by the user. There's a default format, which can be overridden. I implemented it like this:
class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __str__(self):
        # uses self.format_functions to format output
        ...

a = A(params)
print(a)  # uses default output formatting

# overriding default output formatting:
# float printed as percentages with 3 decimal digits; bool printed as Y / N
a.format_functions = {float: lambda x: '{:.3%}'.format(x),
                      bool: lambda x: 'Y' if x else 'N'}
print(a)
Is it ok? Let me know if there is a better way to design this.
Unfortunately, I need to pickle instances of this class. But only functions defined at the top level of the module can be pickled; lambda functions are unpicklable, so my format_functions instance attribute breaks the pickling.
I tried rewriting this to use a class method instead of lambda functions, but still no luck for the same reason:
class A:
    @classmethod
    def default_float_format(cls, x):
        return '{:.2%}'.format(x)

    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: self.default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        ...

a = A(params)
pickle.dump(a)  # Can't pickle <class 'method'>: attribute lookup builtins.method failed
Note that pickling here doesn't work even if I don't override the defaults; just the fact that I assigned self.format_functions = {float : self.default_float_format} breaks it.
What to do? I'd rather not pollute the namespace and break encapsulation by defining default_float_format at the module level.
Incidentally, why in the world does pickle create this restriction? It certainly feels like a gratuitous and substantial pain to the end user.
For pickling of class instances or functions (and therefore methods), Python's pickle depends on their name being available as a global variable - the reference to the method in the dictionary points to a name that is not available in the global namespace (or, better said, the module namespace).
You could circumvent that by customising the pickling of your class with the __setstate__ and __getstate__ methods - but I think you'd be better off defining the function outside of the class scope, since the formatting function doesn't depend on any information from the object or the class itself (and even if some formatting function did, you could pass that in as parameters).
This does work (Python 3.2):
def default_float_format(x):
    return '{:.2%}'.format(x)

class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        pass

a = A(1)
pickle.dumps(a)
If you use the dill module, either of your two approaches will just "work" as is. dill can pickle lambda as well as instances of classes and also class methods.
No need to pollute the namespace and break encapsulation, as you said you didn't want to do… but the other answer does.
dill is basically ten years or so worth of finding the right copy_reg function that registers how to serialize the majority of objects in standard python. Nothing special or tricky, it just takes time. So why doesn't pickle do this for us? Why does pickle have this restriction?
Well, if you look at the pickle docs, the answer is there:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
Basically: Functions and classes are pickled by reference.
This means pickle does not work on objects defined in __main__, and it also doesn't work on many dynamically modified objects. dill registers __main__ as a module, so it has a valid namespace. dill also gives you the option to not pickle by reference, so you can serialize dynamically modified objects… and class instances, class methods (bound and unbound), and so on.
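To make that concrete, a small sketch of the question's first approach under dill (assuming 'pip install dill'):
import dill

class A:
    def __init__(self):
        # the lambda that plain pickle chokes on
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

a = dill.loads(dill.dumps(A()))  # round-trips fine, lambda included
assert a.format_functions[float](0.5) == '50.00%'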

Use a class in the context of a different module

I want to modify some classes in the standard library to use a different set of globals the ones that other classes in that module use.
Example
This example is an example only:
# module_a.py
my_global = []

class A:
    def __init__(self):
        my_global.append(self)

class B:
    def __init__(self):
        my_global.append(self)
In this example, if I create an instance of A via A(), it will call append on the object named by my_global. But now I wish to create a new module, import B into it, and have B use my_global from the module it's been imported into, instead of the my_global from the module where B was originally defined.
# module_b.py
from module_a import B
my_global = []
Related
I'm struggling to explain my problem; here is my previous attempt, which in fact asked something completely different:
Clone a module and make changes to the copy
Update0
The example above is only for illustration of what I'm trying to achieve.
Since there is no variable scope for classes (unlike, say, C++), I think a reference to a globals mapping is not stored in a class, but instead is attached to every function when defined.
Update1
An example was requested from the standard library:
Many (maybe all?) of the classes in the threading module make use of globals such as _allocate_lock, get_ident, and _active, defined here and here. One cannot change these globals without changing it for all the classes in that module.
You can't change the globals without affecting all other users of the module, but what you sort of can do is create a private copy of the whole module.
I trust you are familiar with sys.modules, and that if you remove a module from there, Python forgets it was imported, but old objects referencing it will continue to do so. When imported again, a new copy of the module will be made.
A hacky solution to your problem would be something like this:
import sys
import threading
# Remove the original module, but keep it around
main_threading = sys.modules.pop('threading')
# Get a private copy of the module
import threading as private_threading
# Cover up evidence by restoring the original
sys.modules['threading'] = main_threading
# Modify the private copy
private_threading._allocate_lock = my_allocate_lock()
And now, private_threading.Lock has globals entirely separate from threading.Lock!
Needless to say, the module wasn't written with this in mind, and especially with a system module such as threading you might run into problems. For example, threading._active is supposed to contain all running threads, but with this solution neither copy's _active will have them all. The code may also eat your socks and set your house on fire, etc. Test rigorously.
Okay, here's a proof-of-concept that shows how to do it. Note that it only goes one level deep -- properties and nested functions are not adjusted. To implement that, as well as make this more robust, each function's globals() should be compared to the globals() that should be replaced, and only make the substitution if they are the same.
from types import FunctionType

def migrate_class(cls, globals):
    """Recreates a class substituting the passed-in globals for the
    globals already in the existing class. This proof-of-concept
    version only goes one level deep (i.e. properties and other nested
    functions are not changed)."""
    cls_name = cls.__name__
    bases = cls.__bases__
    new_dict = dict()
    if hasattr(cls, '__slots__'):
        new_dict['__slots__'] = cls.__slots__
        for name in cls.__slots__:
            if hasattr(cls, name):
                attr = getattr(cls, name)
                if callable(attr):
                    func_code = attr.__code__
                    attr = FunctionType(func_code, globals)
                new_dict[name] = attr
    if hasattr(cls, '__dict__'):
        od = getattr(cls, '__dict__')
        for name, attr in od.items():
            if callable(attr):
                closure = attr.__closure__
                defaults = attr.__defaults__
                kwdefaults = attr.__kwdefaults__
                func_code = attr.__code__
                attr = FunctionType(func_code, globals, name, defaults, closure)
                if kwdefaults:
                    attr.__kwdefaults__ = kwdefaults
            new_dict[name] = attr
    return type(cls_name, bases, new_dict)
After having gone through this exercise, I am really curious as to why you need to do this?
"One cannot change these globals without changing it for all the classes in that module." That's the root of the problem isn't it, and a good explanation of the problem with global variables in general. The use of globals in threading tethers its classes to those global objects.
By the time you jerry-rig something to find and monkey patch each use of a global variable within an individual class from the module, are you any further ahead of just reimplementing the code for your own use?
The only work around that "might" be of use in your situation is something like mock. Mock's patch decorators/context managers (or something similar) could be used to swap out a global variable for the life-time of a given object. It works well within the very controlled context of unit testing, but in any other circumstances I wouldn't recommend it and would think about just reimplementing the code to suit my needs.
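For illustration, a sketch of what that mock-based swap might look like with the question's example module (module_a and my_global are the names from above; unittest.mock is in the standard library):
from unittest import mock
import module_a

with mock.patch.object(module_a, 'my_global', []):
    module_a.A()               # appends to the temporary list
    print(module_a.my_global)  # the patched list, holding one instance
print(module_a.my_global)      # the original list, untouched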
Globals are bad for exactly this reason, as I am sure you know well enough.
I'd try to reimplement A and B (maybe by subclassing them) in my own module, with all references to my_global replaced by an injected dependency of A and B, which I'll call registry here.
class A(orig.A):
    def __init__(self, registry):
        self.registry = registry
        self.registry.append(self)
    # more updated methods
If you are creating all instances of A yourself you are pretty much done. You might want to create a factory which hides away the new init parameter.
my_registry = []

def A_in_my_registry():
    return A(my_registry)
If foreign code creates orig.A instances for you, and you would rather have new A instances, you have to hope the foreign code is customizable with factories. If not, derive from the foreign classes and update them to use (newly injected) A factories instead... and rinse and repeat for the creation of those updated classes. I realize this can be tedious to almost impossible depending on the complexity of the foreign code, but most std libs are quite flat.
--
Edit: Monkey patch std lib code.
If you don't mind monkey patching std libs, you could also try to modify the original classes to work with a redirection level which defaults to the original globals, but is customizable per instance:
import orig

class A(orig.A):
    def __init__(self, registry=orig.my_globals):
        self.registry = registry
        self.registry.append(self)
    # more updated methods

orig.A = A
As before, you will need to control the creation of A instances which should use non-"standard" globals, but you won't have different A classes around as long as you monkey patch early enough.
If you use Python 3, you can subclass B and redefine the __globals__ attribute of the __init__ method like this:
from module_a import B

function = type(lambda: 0)  # similar to 'from types import FunctionType as function', but faster

my_global = []

class My_B(B):
    __init__ = function(B.__init__.__code__, globals(), '__init__',
                        B.__init__.__defaults__, B.__init__.__closure__)
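A quick check of the effect (sketch): instances of My_B now land in this module's my_global, not module_a's:
My_B()
assert len(my_global) == 1  # appended here, not in module_a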
IMHO it is not possible to override global variables...
Globals are rarely a good idea.
Implicit variables are rarely a good idea.
An implicitly-used global is easy to indict as also "rarely good".
Additionally, you don't want A.__init__() doing anything "class-level" like updating some mysterious collection that exists for the class as a whole. That's often a bad idea.
Rather than mess with an implicit class-level collection, you want a Factory in module_a that (1) creates A or B instances and (2) updates an explicit collection.
You can then use this factory in module_b, except with a different collection.
This can promote testability by exposing an implicit dependency.
module_a.py
class Factory(object):
    def __init__(self, collection):
        self.collection = collection
    def make(self, name, *args, **kw):
        obj = eval(name)(*args, **kw)
        self.collection.append(obj)
        return obj

module_collection = []
factory = Factory(module_collection)
module_b.py
import module_a

module_collection = []
factory = module_a.Factory(module_collection)
Now a client can do this
import module_b

a = module_b.factory.make("A")
b = module_b.factory.make("B")
print(module_b.module_collection)
You can make the API a bit more fluent by making the factory "callable" (implementing __call__ instead of make).
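A sketch of that callable variant (CallableFactory is an illustrative name; passing the class object instead of its name string also avoids the eval):
class CallableFactory(object):
    def __init__(self, collection):
        self.collection = collection
    def __call__(self, cls, *args, **kw):
        obj = cls(*args, **kw)
        self.collection.append(obj)
        return obj

# usage: factory = CallableFactory([]); a = factory(module_a.A)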
The point is to make the collection explicit via a factory class.
