How to reference an existing variable in YAML? - python

Let's say I have an object already defined in my Python script that serves as a container for some random items. Each attribute of the container corresponds to an item. In this simple example, I have an ITEMS object that has a BALL attribute which points to a Ball instance.
Now, I need to load some content in YAML, but I want that content to be able to reference the existing ITEMS variable that is already defined. Is this possible? Maybe something along the lines of...
ITEMS = Items()
setattr(Items, 'BALL', Ball())
yaml_text = "item1: !!python/object:ITEMS.BALL"
yaml_items = yaml.load(yaml_text)
My goal, after loading the YAML, is for yaml_items['item1'] to be the Ball instance from the ITEMS object.

Here's a way of doing it that uses the di() function defined in the answer to another question. It takes the integer value returned by the built-in id() function and converts it to a string. The yaml.load() function then calls a custom constructor which reverses that process to determine which object to return.
Caveat: This takes advantage of the fact that, with CPython at least, the id() function returns the address of the Python object in memory—so it may not work with other implementations of the interpreter.
import _ctypes
import yaml

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass
def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object " + str(id(ITEMS.BALL))
yaml_items = yaml.load(yaml_text)
print(yaml_items['item1'])  # -> 42
If you're OK with using eval(), you could formalize this and make it easier to use by monkey-patching the yaml module's load() function to do some preprocessing of the yaml stream:
import _ctypes
import re
import yaml

#### Monkey-patch yaml module.
def _my_load(yaml_text, *args, **kwargs):
    REGEX = r'##(.+)##'
    match = re.search(REGEX, yaml_text)
    if match:
        obj = eval(match.group(1))
        yaml_text = re.sub(REGEX, str(id(obj)), yaml_text)
    return _yaml_load(yaml_text, *args, **kwargs)

_yaml_load = yaml.load  # Save original function.
yaml.load = _my_load    # Change it to custom version.
#### End monkey-patch yaml module.

def di(obj_id):
    """ Reverse of id() function. """
    return _ctypes.PyObj_FromPtr(obj_id)

def py_object_constructor(loader, node):
    return di(int(node.value))

yaml.add_constructor(u'!py_object', py_object_constructor)

class Items(object): pass
def Ball(): return 42

ITEMS = Items()
setattr(Items, 'BALL', Ball())  # Set attribute to result of calling Ball().

yaml_text = "item1: !py_object ##ITEMS.BALL##"
yaml_items = yaml.load(yaml_text)
print(yaml_items['item1'])  # -> 42

@martineau quoted the documentation:
[…] provides Python-specific tags that allow to represent an arbitrary Python object.
represent, not construct. It means that you can dump any Python object to YAML, but you can not reference an existing Python object inside YAML.
That being said, you can of course add your own constructor to do it:
import yaml

def eval_constructor(loader, node):
    return eval(loader.construct_scalar(node))

yaml.add_constructor(u'!eval', eval_constructor)

some_value = '123'
yaml_text = "item1: !eval some_value"
yaml_items = yaml.load(yaml_text)
Be aware of the security implications of evaling configuration data. Arbitrary Python code can be executed by writing it into the YAML file!
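If eval() is too risky for the configuration you expect, a common alternative is to resolve references against an explicit whitelist instead. The sketch below shows just the lookup step, without the yaml wiring; the REGISTRY dict and resolve_ref function are hypothetical names for this illustration:

```python
# Hypothetical whitelist: only these names may be referenced from config.
REGISTRY = {'ITEMS.BALL': 42, 'ITEMS.BAT': 'wood'}

def resolve_ref(dotted_name):
    # Look the name up instead of eval()-ing it, so arbitrary
    # expressions in the config file cannot execute code.
    try:
        return REGISTRY[dotted_name]
    except KeyError:
        raise ValueError('unknown reference: %s' % dotted_name)

print(resolve_ref('ITEMS.BALL'))  # -> 42
```

A custom yaml constructor could call resolve_ref() on the scalar value instead of eval(), rejecting anything not registered up front.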
Mostly copied from this answer

Related

Is there a way to change a class variable without adding 'foo = '?

I have a class, and would like to change an object of it in place (similar to the pop method of lists), without writing foo = foo.bar().
In simpler terms, I'd like to do foo.bar() instead of foo = foo.bar(). Is this possible in Python?
Here's some code that i have, which hopefully furthers understanding:
class mystr(str):
    def pop(self, num):
        self = list(self)
        changed = self.pop(num)  # The particular character that was removed
        self = ''.join(self)     # The rest of the string
        # Somewhere in here I need to be able to change the actual variable that pop() was called on
        return changed  # Emulates the way Python lists return the removed element.

my_var = mystr("Hello World!")
print(my_var.pop(4))  # Prints 'o', as you would expect
print(my_var)  # But this still prints 'Hello World!', instead of 'Hell World!'
               # It isn't modified, which is what I want it to do
You can, but not with str.
What you're looking for is a way to mutate your object. For most classes you write yourself, doing that is straightforward:
class Foo:
    def __init__(self):
        self.stuff = 0
    def example(self):
        self.stuff += 1
Here, calling example on a Foo instance mutates it, by changing its stuff instance attribute.
str, however, is immutable. It stores its data in C-level data structures and provides no mutator methods, so there's no way to modify its data. Even if you used ctypes to bypass the protection, you'd just get a bunch of memory corruption bugs.
You can add your own attributes in a subclass, and those will be mutable, but if you do that to fake a mutable string, you might as well just not inherit from str. Inheriting from str in that case will only cause bugs, with some code looking at your "fake" data and other code looking at the "real" underlying str data.
Most likely, the way to go will be one of two options. The first is to just use regular strings without your subclass or the methods you want to add. The second is to write a class that doesn't inherit from str.
You could achieve that by encapsulating a string, rather than inheriting from it:
class mystr:
    def __init__(self, string):
        self._str = string

    def pop(self, num):
        string_list = list(self._str)
        changed = string_list.pop(num)   # The particular character that was removed
        self._str = ''.join(string_list) # The rest of the string
        return changed  # Emulates the way Python lists return the removed element.

    def __repr__(self):
        return self._str
Running the same code with this class instead will print:
o
Hell World!
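As an aside, the standard library already provides a wrapper built for exactly this: collections.UserString stores the wrapped string in a mutable data attribute and implements the str interface on top of it. A rough sketch of the same pop using it (the class name here is my own):

```python
from collections import UserString

class PoppableString(UserString):
    def pop(self, num):
        chars = list(self.data)
        changed = chars.pop(num)    # the removed character
        self.data = ''.join(chars)  # mutate the wrapped string in place
        return changed

s = PoppableString("Hello World!")
print(s.pop(4))  # -> o
print(s)         # -> Hell World!
```

Unlike subclassing str directly, mutating self.data is fully supported, and all the usual string methods keep working against the updated value.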

How to get variable names of function call

I am going to write a decorator which evaluates the actual names (not their values) of the variables that are passed to the function call.
Below, you find a skeleton of the code which makes it a bit clearer what I want to do.
import functools

def check_func(func):
    # How to get the variable names of the function call,
    # s.t. a call like func(arg1, arg2, arg3)
    # returns a dictionary {'a': 'arg1', 'b': 'arg2', 'c': 'arg3'}?
    pass

def my_decorator(func):
    @functools.wraps(func)
    def call_func(*args, **kwargs):
        check_func(func)
        return func(*args, **kwargs)
    return call_func

@my_decorator
def my_function(a, b, c):
    pass

arg1 = 'foo'
arg2 = 1
arg3 = [1, 2, 3]
my_function(arg1, arg2, arg3)
You can't really have what you are asking for.
There are many ways of calling a function where you won't even get variable names for individual values. For example, what would the names be when you use literal values in the call:
my_function('foo', 10 - 9, [1] + [2, 3])
or when you use a list with values for argument expansion with *:
args = ['foo', 1, [1, 2, 3]]
my_function(*args)
Or when you use a functools.partial() object to bind some argument values to a callable object:
from functools import partial
func_partial = partial(my_function, arg1, arg2)
func_partial(arg3)
Functions are passed objects (values), not variables. Expressions consisting of just names may have been used to produce the objects, but those objects are independent of the variables.
Python objects can have many different references, so just because the call used arg1, doesn't mean that there won't be other references to the object elsewhere that would be more interesting to your code.
You could try to analyse the code that called the function (the inspect module can give you access to the call stack), but that presumes that the source code is available. The calling code could be using a C extension, or the interpreter may only have access to .pyc bytecode files, not the original source code. You would still have to trace back and analyse the call expression (not always straightforward; functions are objects too and can be stored in containers and retrieved later to be called dynamically) and from there find the variables involved, if there are any at all.
For the trivial case, where only direct positional argument names were used for the call and the whole call was limited to a single line of source code, you could use a combination of inspect.stack() and the ast module to parse the source into something useful enough to analyse:
import inspect, ast

class CallArgumentNameFinder(ast.NodeVisitor):
    def __init__(self, functionname):
        self.name = functionname
        self.params = []
        self.kwargs = {}

    def visit_Call(self, node):
        if not isinstance(node.func, ast.Name):
            return  # not a name(...) call
        if node.func.id != self.name:
            return  # different name being called
        self.params = [n.id for n in node.args if isinstance(n, ast.Name)]
        self.kwargs = {
            kw.arg: kw.value.id for kw in node.keywords
            if isinstance(kw.value, ast.Name)
        }
def check_func(func):
    caller = inspect.stack()[2]  # caller of our caller
    try:
        tree = ast.parse(caller.code_context[0])
    except SyntaxError:
        # not a complete Python statement
        return None
    visitor = CallArgumentNameFinder(func.__name__)
    visitor.visit(tree)
    return inspect.signature(func).bind_partial(
        *visitor.params, **visitor.kwargs)
Again, for emphasis: this only works with the most basic of calls, where the call consists of a single line only, and the called name matches the function name. It can be expanded upon but this takes a lot of work.
For your specific example, this produces <BoundArguments (a='arg1', b='arg2', c='arg3')>, an inspect.BoundArguments instance. Use its .arguments attribute to get an OrderedDict mapping with the name-value pairs, or pass that to dict() to turn it into a regular dictionary.
You'll have to think about your specific problem differently instead. Decorators are not meant to be acting upon the code calling, they act upon the decorated object. There are many other powerful features in the language that can help you deal with the calling context, decorators are not it.
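If what you actually need is a mapping from the function's parameter names to the values passed in (rather than the caller's variable names), inspect.signature makes that reliable and works for every calling style. A minimal sketch of such a decorator; log_params and the last_call attribute are my own names for illustration:

```python
import functools
import inspect

def log_params(func):
    # Map *parameter* names (not caller variable names) to the values
    # each call received, including defaults.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        wrapper.last_call = dict(bound.arguments)
        return func(*args, **kwargs)
    return wrapper

@log_params
def my_function(a, b, c=3):
    return a

my_function('foo', 1)
print(my_function.last_call)  # -> {'a': 'foo', 'b': 1, 'c': 3}
```

This handles literals, *args expansion, and partial objects equally well, because it never needs to look at the caller's source.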

pickling class method

I have a class whose instances need to format output as instructed by the user. There's a default format, which can be overridden. I implemented it like this:
class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __str__(self):
        # uses self.format_functions to format output
        # ...

a = A(params)
print(a)  # uses default output formatting

# Overriding default output formatting:
# floats printed as percentages with 3 decimal digits; bools printed as Y / N
a.format_functions = {float: lambda x: '{:.3%}'.format(x),
                      bool: lambda x: 'Y' if x else 'N'}
print(a)
Is it ok? Let me know if there is a better way to design this.
Unfortunately, I need to pickle instances of this class. But only functions defined at the top level of the module can be pickled; lambda functions are unpicklable, so my format_functions instance attribute breaks the pickling.
I tried rewriting this to use a class method instead of lambda functions, but still no luck for the same reason:
class A:
    @classmethod
    def default_float_format(cls, x):
        return '{:.2%}'.format(x)

    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: self.default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        # ...

a = A(params)
pickle.dump(a)  # Can't pickle <class 'method'>: attribute lookup builtins.method failed
Note that pickling here doesn't work even if I don't override the defaults; just the fact that I assigned self.format_functions = {float : self.default_float_format} breaks it.
What to do? I'd rather not pollute the namespace and break encapsulation by defining default_float_format at the module level.
Incidentally, why in the world does pickle create this restriction? It certainly feels like a gratuitous and substantial pain to the end user.
For pickling class instances or functions (and therefore methods), Python's pickle relies on their names being available as global variables: the reference to the method in the dictionary points to a name that is not available in the global namespace (or, better said, the module namespace).
You could circumvent that by customizing the pickling of your class with the __getstate__ and __setstate__ methods, but I think you'd be better off defining the function outside the class scope, since the formatting function does not depend on any information from the object or the class itself (and even if some formatting function did, you could pass that in as parameters).
This does work (Python 3.2):
import pickle

def default_float_format(x):
    return '{:.2%}'.format(x)

class A:
    def __init__(self, params):
        # ...
        # by default printing all float values as percentages with 2 decimals
        self.format_functions = {float: default_float_format}

    def __str__(self):
        # uses self.format_functions to format output
        pass

a = A(1)
pickle.dumps(a)
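For completeness, the __getstate__/__setstate__ customization mentioned above could look roughly like this: drop the unpicklable lambdas from the pickled state and rebuild them on load. This is a sketch, not the asker's real class:

```python
import pickle

class A:
    def __init__(self):
        # Lambdas can't be pickled, so they must not end up in the state.
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['format_functions']  # drop the unpicklable entries
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Rebuild the default formatters after unpickling.
        self.format_functions = {float: lambda x: '{:.2%}'.format(x)}

a = pickle.loads(pickle.dumps(A()))
print(a.format_functions[float](0.5))  # -> 50.00%
```

The trade-off is that any user overrides of format_functions are lost across a pickle round-trip; defaults are restored instead.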
If you use the dill module, either of your two approaches will just "work" as is. dill can pickle lambda as well as instances of classes and also class methods.
No need to pollute the namespace and break encapsulation, as you said you didn't want to do… but the other answer does.
dill is basically ten years or so worth of finding the right copy_reg function that registers how to serialize the majority of objects in standard python. Nothing special or tricky, it just takes time. So why doesn't pickle do this for us? Why does pickle have this restriction?
Well, if you look at the pickle docs, the answer is there:
https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled
Basically: Functions and classes are pickled by reference.
This means pickle does not work on objects defined in __main__, and it also doesn't work on many dynamically modified objects. dill registers __main__ as a module, so it has a valid namespace. dill also gives you the option to not pickle by reference, so you can serialize dynamically modified objects… and class instances, class methods (bound and unbound), and so on.

Dumping a subclass of gtk.ListStore using pickle

I am trying to dump a custom class using pickle. The class was subclassed from gtk.ListStore, since that made it easier to store particular data and then display it using gtk. This can be reproduced as shown here.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    pass

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
The solution that I have tried was to add a __getstate__ function into my class. As far as I understand the documentation, this should take precedence for pickle so that it no longer tries to serialize the ListStore, which it is unable to do. However, I still get an identical error from pickle.dump when I try to pickle my object. The error can be reproduced as follows.
import gtk
import pickle
import os

class foo(gtk.ListStore):
    def __getstate__(self):
        return 'bar'

if __name__ == '__main__':
    x = foo(str)
    with open(os.path.expandvars('%userprofile%\\temp.txt'), 'w') as f:
        pickle.dump(x, f)
In each case, pickle.dump raises a TypeError, "can't pickle ListStore objects". Using print statements, I have verified that the __getstate__ function is run when using pickle.dump. I don't see any hints as to what to do next from the documentation, and so I'm in a bit of a bind. Any suggestions?
Here is a quick working example showing the steps you need to employ to pickle "unpicklable types" like gtk.ListStore. (With this method you could even use json instead of pickle for your purpose.) Essentially you need to do a few things:
Define __reduce__ which returns a function and arguments needed to reconstruct the instance.
Determine the column types for your ListStore. The method self.get_column_type(0) returns a Gtype, so you will need to map this back to the corresponding Python type. I've left that as an exercise - in my example I've employed a hack to get the column types from the first row of values.
Your _new_foo function will need to rebuild the instance.
Example:
import gtk, os, pickle

def _new_foo(cls, coltypes, rows):
    inst = cls.__new__(cls)
    inst.__init__(*coltypes)
    for row in rows:
        inst.append(row)
    return inst

class foo(gtk.ListStore):
    def __reduce__(self):
        rows = [list(row) for row in self]
        # hack - to be correct you'll really need to use
        # `self.get_column_type` and map it back to Python's
        # corresponding type.
        coltypes = [type(c) for c in rows[0]]
        return _new_foo, (self.__class__, coltypes, rows)

x = foo(str, int)
x.append(['foo', 1])
x.append(['bar', 2])

s = pickle.dumps(x)
y = pickle.loads(s)

print list(y[0])
print list(y[1])
Output:
['foo', 1]
['bar', 2]
When you subclass object, object.__reduce__ takes care of calling __getstate__. It would seem that since this is a subclass of gtk.ListStore, the default implementation of __reduce__ tries to pickle the data for reconstructing a gtk.ListStore object first, then calls your __getstate__, but since the gtk.ListStore can't be pickled, it refuses to pickle your class. The problem should go away if you try to implement __reduce__ and __reduce_ex__ instead of __getstate__.
>>> class Foo(gtk.ListStore):
...     def __init__(self, *args):
...         super(Foo, self).__init__(*args)
...         self._args = args
...     def __reduce_ex__(self, proto=None):
...         return type(self), self._args, self.__getstate__()
...     def __getstate__(self):
...         return 'foo'
...     def __setstate__(self, state):
...         print state
...
>>> x = Foo(str)
>>> pickle.loads(pickle.dumps(x))
foo
<Foo object at 0x18be1e0 (__main__+Foo-v3 at 0x194bd90)>
As an addition, you may want to consider other serializers, such as json. There you take full control of the serialization process by defining yourself how custom classes are to be serialized. Plus, by default, they come without the security issues of pickle.
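To illustrate the json route: the default hook decides how to encode a custom class, and object_hook rebuilds it on load. Everything here (the Point class and the "__point__" tag) is made up for the sketch:

```python
import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def encode(obj):
    # Tag instances of our class so the decoder can recognize them.
    if isinstance(obj, Point):
        return {'__point__': [obj.x, obj.y]}
    raise TypeError('not JSON serializable: %r' % obj)

def decode(d):
    # Rebuild tagged dicts back into Point instances.
    if '__point__' in d:
        return Point(*d['__point__'])
    return d

s = json.dumps(Point(1, 2), default=encode)
p = json.loads(s, object_hook=decode)
print((p.x, p.y))  # -> (1, 2)
```

Because decoding only ever constructs the types you explicitly handle, a malicious document cannot trigger arbitrary code the way a crafted pickle can.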

Python serialize lexical closures?

Is there a way to serialize a lexical closure in Python using the standard library? pickle and marshal appear not to work with lexical closures. I don't really care about the details of binary vs. string serialization, etc., it just has to work. For example:
def foo(bar, baz) :
def closure(waldo) :
return baz * waldo
return closure
I'd like to just be able to dump instances of closure to a file and read them back.
Edit:
One relatively obvious way that this could be solved is with some reflection hacks to convert lexical closures into class objects and vice-versa. One could then convert to classes, serialize, unserialize, convert back to closures. Heck, given that Python is duck typed, if you overloaded the function call operator of the class to make it look like a function, you wouldn't even really need to convert it back to a closure and the code using it wouldn't know the difference. If any Python reflection API gurus are out there, please speak up.
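For reference, the reflection this would need is readily available: a function's free variables live in cell objects on its __closure__ attribute (func_closure in Python 2), with the matching names in __code__.co_freevars. A quick Python 3 sketch using the question's foo:

```python
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure

c = foo(3, 5)
# Pair each free-variable name with the value stored in its cell.
cells = dict(zip(c.__code__.co_freevars,
                 (cell.cell_contents for cell in c.__closure__)))
print(cells)  # -> {'baz': 5}
```

Note that only baz appears: bar is never referenced inside closure, so it is not captured as a free variable.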
PiCloud has released an open-source (LGPL) pickler which can handle function closure and a whole lot more useful stuff. It can be used independently of their cloud computing infrastructure - it's just a normal pickler. The whole shebang is documented here, and you can download the code via 'pip install cloud'. Anyway, it does what you want. Let's demonstrate that by pickling a closure:
import pickle
from StringIO import StringIO
import cloud
# generate a closure
def foo(bar, baz):
def closure(waldo):
return baz * waldo
return closure
closey = foo(3, 5)
# use the picloud pickler to pickle to a string
f = StringIO()
pickler = cloud.serialization.cloudpickle.CloudPickler(f)
pickler.dump(closey)
#rewind the virtual file and reload
f.seek(0)
closey2 = pickle.load(f)
Now we have closey, the original closure, and closey2, the one that has been restored from a string serialisation. Let's test 'em.
>>> closey(4)
20
>>> closey2(4)
20
Beautiful. The module is pure python—you can open it up and easily see what makes the magic work. (The answer is a lot of code.)
If you simply use a class with a __call__ method to begin with, it should all work smoothly with pickle.
class foo(object):
    def __init__(self, bar, baz):
        self.baz = baz
    def __call__(self, waldo):
        return self.baz * waldo
On the other hand, a hack which converted a closure into an instance of a new class created at runtime would not work, because of the way pickle deals with classes and instances. pickle doesn't store classes; only a module name and class name. When reading back an instance or class it tries to import the module and find the required class in it. If you used a class created on-the-fly, you're out of luck.
Yes! I got it (at least I think) -- that is, the more general problem of pickling a function. Python is so wonderful :). I found out most of it through the dir() function and a couple of web searches. Also wonderful to have it [hopefully] solved, since I needed it too.
I haven't done a lot of testing on how robust this co_code thing is (nested functions, etc.), and it would be nice if someone could look up how to hook into Python so functions can be pickled automatically (e.g. they might sometimes be closure arguments).
Cython module _pickle_fcn.pyx
# -*- coding: utf-8 -*-
cdef extern from "Python.h":
    object PyCell_New(object value)

def recreate_cell(value):
    return PyCell_New(value)
Python file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# author gatoatigrado [ntung.com]
import cPickle, marshal, types
import pyximport; pyximport.install()
import _pickle_fcn
def foo(bar, baz):
    def closure(waldo):
        return baz * waldo
    return closure
# really this problem is more about pickling arbitrary functions
# thanks so much to the original question poster for mentioning marshal
# I probably wouldn't have found out how to serialize func_code without it.
fcn_instance = foo("unused?", -1)
code_str = marshal.dumps(fcn_instance.func_code)
name = fcn_instance.func_name
defaults = fcn_instance.func_defaults
closure_values = [v.cell_contents for v in fcn_instance.func_closure]
serialized = cPickle.dumps((code_str, name, defaults, closure_values),
                           protocol=cPickle.HIGHEST_PROTOCOL)
code_str_, name_, defaults_, closure_values_ = cPickle.loads(serialized)
code_ = marshal.loads(code_str_)
closure_ = tuple([_pickle_fcn.recreate_cell(v) for v in closure_values_])
# reconstructing the globals is like pickling everything :)
# for most functions, it's likely not necessary
# it probably wouldn't be too much work to detect if fcn_instance global element is of type
# module, and handle that in some custom way
# (have the reconstruction reinstantiate the module)
reconstructed = types.FunctionType(code_, globals(),
                                   name_, defaults_, closure_)
print(reconstructed(3))
cheers,
Nicholas
EDIT - more robust global handling is necessary for real-world cases. fcn.func_code.co_names lists global names.
#!python
import marshal, pickle, new

def dump_func(f):
    if f.func_closure:
        closure = tuple(c.cell_contents for c in f.func_closure)
    else:
        closure = None
    return marshal.dumps(f.func_code), f.func_defaults, closure

def load_func(code, defaults, closure, globs):
    if closure is not None:
        closure = reconstruct_closure(closure)
    code = marshal.loads(code)
    return new.function(code, globs, code.co_name, defaults, closure)

def reconstruct_closure(values):
    ns = range(len(values))
    src = ["def f(arg):"]
    src += ["    _%d = arg[%d]" % (n, n) for n in ns]
    src += ["    return lambda: (%s)" % ','.join("_%d" % n for n in ns), '']
    src = '\n'.join(src)
    try:
        exec src
    except:
        raise SyntaxError(src)
    return f(values).func_closure

if __name__ == '__main__':
    def get_closure(x):
        def the_closure(a, b=1):
            return a * x + b, some_global
        return the_closure

    f = get_closure(10)
    code, defaults, closure = dump_func(f)
    dump = pickle.dumps((code, defaults, closure))
    code, defaults, closure = pickle.loads(dump)
    f = load_func(code, defaults, closure, globals())

    some_global = 'some global'
    print f(2)
Recipe 500261: Named Tuples contains a function that defines a class on-the-fly. And this class supports pickling.
Here's the essence:
result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
Combined with @Greg Ball's suggestion to create a new class at runtime it might answer your question.
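The essence of that recipe is making the dynamically created class findable by pickle: give it the right __module__ and publish it under that module's namespace, so pickle's by-reference lookup succeeds. A minimal Python 3 sketch (make_class is my own helper name):

```python
import pickle
import sys

def make_class(name):
    # Build a class at runtime, then publish it in this module's
    # namespace so pickle can find it by module + name when unpickling.
    cls = type(name, (object,), {'__module__': __name__})
    setattr(sys.modules[__name__], name, cls)
    return cls

Point = make_class('Point')
p = pickle.loads(pickle.dumps(Point()))
print(type(p).__name__)  # -> Point
```

Without the setattr step, pickle.dumps would raise a PicklingError, because the class it tries to reference would not exist as an attribute of its claimed module.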
