pyyaml map dict to dict of objects - python

I'm struggling with PyYAML docs to understand a probably easy thing.
I have a dictionary that maps string names to python objects:
lut = {'bar_one': my_bar_one_obj,
       'bar_two': my_bar_two_obj}
and I'd like to load a YAML file like this and map all "foo" nodes to my dictionary objects (the inverse, dumping, is not really necessary):
node1:
  # ...
  foo: "bar_one"
node2:
  # ...
  foo: "bar_two"
My first thought was to use add_constructor, but I couldn't find a way to give it an extra kwarg. Maybe a custom loader?
The PyYAML docs aren't really helpful, or perhaps I'm looking for the wrong keywords...
I could accept using a custom tag like
node1:
  # ...
  foo: !footag "bar_one"
node2:
  # ...
  foo: !footag "bar_two"
But detecting just the foo nodes would be nicer.

You are not looking for the wrong keywords; this is not something any of the YAML parsers I know of was made to do. YAML parsers load a, possibly complex, data structure that is self-contained. What you want to do is merge that self-contained structure, during one of the parsing steps, into an already existing structure (lut). The parser is built to allow tweaking by providing alternative routines, not by providing routines + data.
There is no option for that built into PyYAML, i.e. there is no built-in way to tell the loader about lut that makes PyYAML do something with it, and certainly not to attach key-value pairs (assuming that is what you mean by the nodes) as values to its keys.
Probably the easiest way of getting what you want is a post-processing step that takes lut and the data loaded from your YAML file (which is also a dict) and combines the two.
If you want to try and do this with add_constructor, then you need to construct a class with a __call__ method, create an instance of that class with lut as argument, and then pass that instance in as an alternative constructor:
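class ConstructorWithLut:
    def __init__(self, lut):
        self._lut = lut
    def __call__(self, loader, node):
        # the actual constructor routine added by add_constructor;
        # PyYAML calls it as constructor(loader, node), and self._lut is available here
        ...
constructor_with_lut = ConstructorWithLut(lut)
SomeConstructor.add_constructor('your_tag', constructor_with_lut)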
Here you can replace 'your_tag' with u'tag:yaml.org,2002:map' if you want your constructor to handle (all) normal dicts.
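A runnable sketch of the callable-constructor idea, applied to the !footag variant from the question. It assumes PyYAML is installed; LutLoader and the object() stand-ins for my_bar_one_obj/my_bar_two_obj are hypothetical names for illustration. Registering on a SafeLoader subclass keeps the extra constructor out of yaml.SafeLoader itself.

```python
import yaml


class ConstructorWithLut:
    """Callable constructor that resolves a scalar node through a lookup table."""

    def __init__(self, lut):
        self._lut = lut

    def __call__(self, loader, node):
        # PyYAML invokes registered constructors as constructor(loader, node).
        name = loader.construct_scalar(node)
        return self._lut[name]


class LutLoader(yaml.SafeLoader):
    """Hypothetical loader subclass; add_constructor on it leaves SafeLoader untouched."""


# Stand-ins for the question's my_bar_one_obj / my_bar_two_obj.
my_bar_one_obj, my_bar_two_obj = object(), object()
lut = {'bar_one': my_bar_one_obj, 'bar_two': my_bar_two_obj}

LutLoader.add_constructor('!footag', ConstructorWithLut(lut))

text = """
node1:
  foo: !footag "bar_one"
node2:
  foo: !footag "bar_two"
"""
data = yaml.load(text, Loader=LutLoader)
```

The tag version is the easy case because the constructor only ever sees the tagged scalar; detecting untagged foo keys needs the Loader-level approach instead.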
Another option is to do this during YAML loading, but once again you cannot just tweak the Loader, or one of its constituent components (the Constructor), because you normally hand in the class, not an object. You need an object to be able to attach lut. So what you would need to do is create your own constructor, your own loader that uses that constructor, and then a load() replacement that instantiates your loader and attaches lut (either by just adding it as a unique attribute, or by passing it in as a parameter and handing it on to your constructor).
Your constructor, which should be a subclass of one of the existing constructors, then needs its own construct_mapping() that first calls the parent class's construct_mapping() and, before returning the result, checks whether it can update the attribute to which lut has been assigned. You cannot do this by looking for foo among the keys of the dict, because when you find such a key you don't have access to the parent node that you need to assign to lut. Instead, check whether any of the values of the mapping is a dict that has a key named foo; if so, that dict can be used to update lut based on the value associated with foo.
I would certainly implement the post-processing stage first, using two routines:
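A sketch of the Loader-based plumbing, assuming PyYAML is installed; FooLoader and load_with_lut are hypothetical names. What carries over from the description is the wiring: the loader is instantiated by hand so lut can live on the instance, and construct_mapping() calls the parent first. As an assumption, the body does the simpler thing the question literally asks for (replace each foo string with the matching lut object) rather than updating lut from parent nodes.

```python
import yaml


class FooLoader(yaml.SafeLoader):
    """Hypothetical loader that carries its own lut and resolves 'foo' values."""

    def __init__(self, stream, lut):
        super().__init__(stream)
        self.lut = lut  # attached per instance, so no global state is needed

    def construct_mapping(self, node, deep=False):
        mapping = super().construct_mapping(node, deep=deep)
        value = mapping.get('foo')
        # Only plain strings can be looked up; anything else is left alone.
        if isinstance(value, str) and value in self.lut:
            mapping['foo'] = self.lut[value]
        return mapping


def load_with_lut(stream, lut):
    """load() replacement: build the loader ourselves so lut can be attached."""
    loader = FooLoader(stream, lut)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()


bar_one, bar_two = object(), object()
data = load_with_lut('node1:\n  foo: "bar_one"\nnode2:\n  foo: "bar_two"\n',
                     {'bar_one': bar_one, 'bar_two': bar_two})
```

Because SafeLoader builds nested dicts lazily, the override runs once per mapping as it is filled in, so every foo at any depth is seen.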
def update_from_yaml(your_dict, yaml_data):
    for node_key, node_value in yaml_data.items():
        map_value(your_dict, node_key, node_value)
def map_value(your_dict, key, value):
    if not isinstance(value, dict):  # only mappings can contain foo
        return
    foo_val = value.get('foo')
    if foo_val is None:  # key foo not found
        return
    your_dict[foo_val] = value  # or = {key: value}
I am not sure what you really mean by "assigning all foo nodes": the YAML data has no nodes at the top level, it only has keys and values. So you either assign that key-value pair or only its value (a dict).
Once those two routines work satisfactorily, you can try to implement the add_constructor- or Loader-based alternatives, in which you should be able to reuse at least map_value.
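If the goal is instead the one the question's wording suggests (replace each foo string with the matching object from lut), the same post-processing idea can be sketched as a recursive walk over what yaml.safe_load() returns. resolve_foo is a hypothetical helper, and the loaded dict below is written out by hand rather than parsed.

```python
def resolve_foo(data, lut, key='foo'):
    """Walk loaded YAML data in place, swapping data[...][key] strings for lut objects."""
    if isinstance(data, dict):
        for k, v in data.items():
            if k == key and isinstance(v, str) and v in lut:
                # Assigning to an existing key doesn't change the dict's size,
                # so iterating over items() is unaffected.
                data[k] = lut[v]
            else:
                resolve_foo(v, lut, key)
    elif isinstance(data, list):
        for item in data:
            resolve_foo(item, lut, key)
    return data


bar_one, bar_two = object(), object()
lut = {'bar_one': bar_one, 'bar_two': bar_two}
# What yaml.safe_load() would return for the question's first YAML example.
loaded = {'node1': {'foo': 'bar_one'}, 'node2': {'foo': 'bar_two'}}
resolve_foo(loaded, lut)
```

Keeping this separate from loading means no loader subclassing at all, at the cost of a second pass over the data.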

Related

Safe way to create an "out of band" alternative value for a function argument in Python

I have a function like the following:
def do_something(thing):
    pass
def foo(everything, which_things):
    """Do stuff to things.
    Args:
        everything: Dict of things indexed by thing identifiers.
        which_things: Iterable of thing identifiers. Something is only
            done to these things.
    """
    for thing_identifier in which_things:
        do_something(everything[thing_identifier])
But I want to extend it so that a caller can do_something with everything passed in everything without having to provide a list of identifiers. (One motivation: everything might be an opaque container whose keys aren't accessible to library users, only to library internals; in that case foo can access the keys but the caller can't. Another motivation is error prevention: a constant with obvious semantics keeps a caller from mistakenly passing in the wrong set of identifiers.) So one thought is to have a constant USE_EVERYTHING that can be passed in, like so:
def foo(everything, which_things):
    """Do stuff to things.
    Args:
        everything: Dict of things indexed by thing identifiers.
        which_things: Iterable of thing identifiers. Something is only
            done to these things. Alternatively pass USE_EVERYTHING to
            do something to everything.
    """
    if which_things == USE_EVERYTHING:
        which_things = everything.keys()
    for thing_identifier in which_things:
        do_something(everything[thing_identifier])
What are some advantages and limitations of this approach? How can I define a USE_EVERYTHING constant so that it is unique and specific to this function?
My first thought is to give it its own type, like so:
class UseEverythingType:
    pass
USE_EVERYTHING = UseEverythingType()
This would be in a package only exporting USE_EVERYTHING, discouraging creating any other UseEverythingType objects. But I worry that I'm not considering all aspects of Python's object model -- could two instances of USE_EVERYTHING somehow compare unequal?

Python - how to create graph of variable assignment?

In a sample Python class method, I have one or more member objects of arbitrary type and constructor signature; each of them returns a single value and takes one or more of the function's original parameters. Additionally, the output of one member object may be used as the input to another member object:
class Blah(...):
    def __init__(...):
        ...
    def myfunc(self, param1, param2, ..., param_n):
        r1 = self.obj1(param1, ...)
        ...
        r_n = self.obj_n(param1, r1, ...)
What I need to know is: is there a way to instrument Python to track edges between the inputs and output of each invocation of a given set of tracked objects?
For example, as in the above, the result would be a graph: (param1...) -> r1, and (param1,r1...) -> r_n
The actual edge direction doesn't matter so long as the input-output relationship is consistent.
You could trace the function, and create a mapping of every function call.
An example of this is PyTorch's ONNX export capability, which uses this technique. If that's not enough, you could resort to the Python debugger API, or simply instrument all items within a module by using the inspect module:
import inspect
inspect.getmembers(your_module, inspect.isfunction)
By creating a class and defining __call__ with the *args/**kwargs convention, you can match the signature of any object or function that you wrap with it. Then, when you iterate over the members of a module, you can wrap and re-assign each member with an instance of that class, which reads the function metadata or dynamic type information (f.__name__ or otherwise). You can then track the arguments (keeping names via some unique-ID generation scheme) and function names, and build a graph directly from them.
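A minimal sketch of that wrapping approach, assuming edges are recorded as (input labels) -> output label and labels are assigned by object identity. Tracer, TracedCall and instrument_module are hypothetical names, and a throwaway types.ModuleType stands in for a real module.

```python
import inspect
import itertools
import types


class Tracer:
    """Hypothetical recorder of (input labels) -> output label edges."""

    def __init__(self):
        self.edges = []      # (input_labels, output_label, callee_name)
        self._labels = {}    # id(obj) -> label
        self._alive = []     # keep traced objects alive so their ids stay unique
        self._counter = itertools.count()

    def label(self, obj):
        key = id(obj)
        if key not in self._labels:
            self._labels[key] = f"v{next(self._counter)}"
            self._alive.append(obj)
        return self._labels[key]

    def wrap(self, func):
        return TracedCall(func, self)


class TracedCall:
    """Wraps any callable; *args/**kwargs lets it match every signature."""

    def __init__(self, func, tracer):
        self.func = func
        self.tracer = tracer

    def __call__(self, *args, **kwargs):
        inputs = tuple(self.tracer.label(a) for a in (*args, *kwargs.values()))
        result = self.func(*args, **kwargs)
        name = getattr(self.func, "__name__", repr(self.func))
        self.tracer.edges.append((inputs, self.tracer.label(result), name))
        return result


def instrument_module(module, tracer):
    """Replace every plain function in `module` with a traced wrapper."""
    for name, func in inspect.getmembers(module, inspect.isfunction):
        setattr(module, name, tracer.wrap(func))


def add(a, b):
    return a + b


def mul(a, b):
    return a * b


calc = types.ModuleType("calc")  # stand-in for a real module
calc.add, calc.mul = add, mul

tracer = Tracer()
instrument_module(calc, tracer)
p1, p2 = 2, 3
r1 = calc.add(p1, p2)
r2 = calc.mul(p1, r1)
```

Because labels are keyed on object identity, independent values that happen to be the same object (CPython caches small ints and interns some strings) will share a label; a real tracer would need a stricter ID scheme.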

PERL-like autovivification with default value in Python, and returns a default value from non-existing arbitrary nesting?

Suppose I want Perl-like autovivification in Python, i.e.:
>>> d = Autovivifier()
>>> d['nested']['key']['value'] = 10
>>> d
{'nested': {'key': {'value': 10}}}
There are a couple of dominant ways to do that:
Use a recursive default dict
Use a __missing__ hook to return the nested structure
OK -- easy.
Now suppose I want to return a default value from a dict with a missing key. Once again, there are a few ways to do that:
For a non-nested path, you can use a __missing__ hook
try/except block wrapping the access to potentially missing key path
Use {}.get(key, default) (does not easily work with a nested dict), i.e., there is no version of autoviv.get(['nested']['key']['no key of this value'], default)
The two goals seem in irreconcilable conflict (based on me trying to work this out the last couple hours.)
Here is the question:
Suppose I want to have an Autovivifying dict that 1) creates the nested structure for d['arbitrary']['nested']['path']; AND 2) returns a default value from a non-existing arbitrary nesting without wrapping that in try/except?
Here are the issues:
The call d['nested']['key']['no key of this value'] is equivalent to (d['nested'])['key']['no key of this value']. Overriding __getitem__ does not work without returning an object that ALSO overrides __getitem__.
Both methods for creating an Autovivifier will create a dict entry if you test that path for existence, i.e., I do not want if d['p1']['sp2']['etc.']: to create that whole path just because it was tested in the if.
How can I provide a dict in Python that will:
Create an access path of the type d['p1']['p2'][etc]=val (autovivification);
NOT create that same path if you test for existence;
Return a default value (like {}.get(key, default)) without wrapping in try/except
I do not need the FULL set of dict operations. Really only d['nested']['key']['value'] = val, and d['nested']['key']['no key of this value'] evaluating to a default value. I would prefer that testing d['nested']['key']['no key of this value'] does not create it, but would accept that.
To create a recursive tree of dictionaries, use defaultdict with a trick:
from collections import defaultdict
tree = lambda: defaultdict(tree)
Then you can create your x with x = tree().
above from #BrenBarn -- defaultdict of defaultdict, nested
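A short sketch of the defaultdict recipe (def tree() is equivalent to the lambda above), which also shows the drawback raised in the question: merely testing a path with if creates it.

```python
from collections import defaultdict


def tree():
    """A defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)


x = tree()
x['nested']['key']['value'] = 10

# Drawback relevant to the question: reading a path also creates it.
if x['p1']['p2']:
    pass
```

After the if, x contains an empty 'p1' -> 'p2' branch, which is exactly what the question wants to avoid.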
Don't do this. It could be solved much more easily by just writing a class that has the operations you want, and even in Perl it's not a universally praised feature.
But, well, it is possible, with a custom autoviv class. You'd need a __getitem__ that returns an empty autoviv dict but doesn't store it. The new autoviv dict would remember the autoviv dict and key that created it, then insert itself into its parent only when a "real" value is stored in it.
Since an empty dict tests as falsey, you could then test for existence Perl-style, without ever actually creating the intermediate dicts.
But I'm not going to write the code out, because I'm pretty sure this is a terrible idea.
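The answer deliberately leaves this unwritten; for readers curious what the described mechanism looks like, here is a rough sketch (not the answerer's code). AutoViv is a hypothetical name. It only hooks __setitem__, so update(), setdefault() and similar methods bypass the attach step.

```python
class AutoViv(dict):
    """Lazy autovivifying dict: a missing key yields a detached, empty child
    that inserts itself into its parent only once a real value is stored."""

    def __init__(self, parent=None, key=None):
        super().__init__()
        self._parent = parent
        self._key = key

    def __missing__(self, key):
        # Do NOT store the child yet; just remember where it would go.
        return AutoViv(parent=self, key=key)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self._attach()

    def _attach(self):
        parent, key = self._parent, self._key
        if parent is None:
            return  # root, or already attached
        self._parent = None
        if key not in parent:
            dict.__setitem__(parent, key, self)
        parent._attach()  # the parent may itself still be detached


d = AutoViv()
d['nested']['key']['value'] = 10
missing = d['p1']['sp2']['etc.']  # empty, falsy, and not stored
value = d['nested']['key']['no key of this value'] or 'default'
```

Since an empty child is falsy, "or default" gives the get()-like behaviour without try/except, though it also replaces other falsy values such as 0.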
While it does not precisely match the dictionary protocol in Python, you could achieve reasonable results by implementing your own auto-vivification dictionary that uses variable __getitem__ arguments. Something like (2.x):
class ExampleVivifier(object):
    """ Small example class to show how to use varargs in __getitem__. """
    def __getitem__(self, *args):
        print(args)
Example usage would be:
>>> v = ExampleVivifier()
>>> v["nested", "dictionary", "path"]
(('nested', 'dictionary', 'path'),)
You can fill in the blanks to see how you can achieve your desired behaviour here.
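Filling in the blanks might look like the sketch below (PathDict is a hypothetical name). Python actually passes d['a', 'b', 'c'] to __getitem__ as a single tuple argument, which is why the output above shows a nested tuple; the sketch therefore takes one key and treats a tuple as a path. Writes create intermediate dicts; reads never do and return a default instead.

```python
class PathDict:
    """Nested dict addressed by tuple paths, e.g. d['a', 'b', 'c']."""

    def __init__(self, default=None):
        self.data = {}
        self.default = default

    @staticmethod
    def _as_path(key):
        return key if isinstance(key, tuple) else (key,)

    def __setitem__(self, key, value):
        *parents, last = self._as_path(key)
        node = self.data
        for part in parents:
            # Autovivify on write only. Assumes no non-dict value sits on the path.
            node = node.setdefault(part, {})
        node[last] = value

    def __getitem__(self, key):
        node = self.data
        for part in self._as_path(key):
            if not isinstance(node, dict) or part not in node:
                return self.default  # reads never create anything
            node = node[part]
        return node


d = PathDict(default=0)
d['nested', 'key', 'value'] = 10
```

The trade-off is syntax: paths use commas instead of chained brackets, which is what makes it possible to see the whole path at once.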

Use python dict to lookup mutable objects

I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around that fact: I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work, of course, because when the object's contents changed its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity although its contents change. I tried various things, like making __hash__ return id(self), overriding __eq__, and so on, but never found a satisfying solution. One complication is that the whole construction should be picklable, which means I'd have to store the id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super(dodgy, self).__init__()
        # a fresh class object per instance; its identity-based hash never changes
        self.store = type("store", (object,), {"blanket": self})
    def __hash__(self):
        return hash(self.store)
innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (that took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But it turns out that I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolical laughter)! The default implementation of __hash__ returns a value derived from the object's identity (in CPython, from id(), i.e. its address) that remains constant for the object's lifetime. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor, and use that for equality and deriving a hash to overcome this.
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure Python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)
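A sketch of the "unique ID generated in the constructor" idea from the previous paragraph, assuming a uuid4 hex string is an acceptable identity. This File class is hypothetical; it defines equality and hashing purely from _uid, so mutating other attributes never moves the object in a dict.

```python
import copy
import uuid


class File:
    """Hypothetical File whose identity is a UUID fixed at construction."""

    def __init__(self, title):
        self.title = title
        self._uid = uuid.uuid4().hex  # stored in __dict__, so it is pickled too

    def __eq__(self, other):
        if not isinstance(other, File):
            return NotImplemented
        return self._uid == other._uid

    def __hash__(self):
        # Depends only on _uid, never on mutable state like title.
        return hash(self._uid)


f = File("report.txt")
folder_lookup = {f: "folder_a"}
f.title = "renamed.txt"     # mutating state does not affect the hash
clone = copy.deepcopy(f)    # a new object carrying the same __dict__
```

pickle.loads(pickle.dumps(f)) behaves like the deepcopy here, because unpickling restores the instance __dict__, _uid included, so the round-tripped object still finds its entry in folder_lookup.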

Is it possible to add arbitrary data to an ObjectifiedElement instance?

I've set up a custom namespace lookup dictionary in order to map elements in XML files to subclasses of ObjectifiedElement. Now, I want to add some data to instances of these classes. But due to the way ObjectifiedElement works, adding an attribute will result in an element being added to the element tree, which is not what I want. More importantly, this doesn't work for all Python types; for example, it is not possible to create an attribute of the list type.
This seems to be possible by subclassing ElementBase instead, but that would imply losing the features provided by ObjectifiedElement. You could say I only need the read part of ObjectifiedElement. I suppose I can add a __getattr__ to my subclasses to simulate this, but I was hoping there was another way.
I ended up having __getattr__() simply forward to etree's find():
from lxml import etree
class SomewhatObjectifiedElement(etree.ElementBase):
    nsmap = {'ns': 'http://www.my.org/namespace'}
    def __getattr__(self, name):
        return self.find('ns:' + name, self.nsmap)
This will only return the first element if there are several matches, unlike ObjectifiedElement's behaviour, but it suffices for my application (usually there can be only a single match; otherwise I use findall()).
