I've set up a custom namespace lookup dictionary in order to map elements in XML files to subclasses of ObjectifiedElement. Now, I want to add some data to instances of these classes. But due to the way ObjectifiedElement works, adding an attribute will result in an element being added to the element tree, which is not what I want. More importantly, this doesn't work for all Python types; for example, it is not possible to create an attribute of the list type.
This seems to be possible by subclassing ElementBase instead, but that would imply losing the features provided by ObjectifiedElement. You could say I only need the read part of ObjectifiedElement. I suppose I can add a __getattr__ to my subclasses to simulate this, but I was hoping there was another way.
I ended up having __getattr__() simply forward to etree's find():
from lxml import etree

class SomewhatObjectifiedElement(etree.ElementBase):
    nsmap = {'ns': 'http://www.my.org/namespace'}

    def __getattr__(self, name):
        return self.find('ns:' + name, self.nsmap)
This will only return the first matching element if there are several, unlike ObjectifiedElement's behaviour, but it suffices for my application (usually there can only be a single match; otherwise I use findall()).
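For context, a sketch of how such a class can be hooked up through lxml's namespace class lookup (the namespace URI and the XML snippet are illustrative):

from lxml import etree

parser = etree.XMLParser()
lookup = etree.ElementNamespaceClassLookup()
parser.set_element_class_lookup(lookup)
# register the class as the default for all elements in this namespace
lookup.get_namespace('http://www.my.org/namespace')[None] = SomewhatObjectifiedElement

root = etree.fromstring(
    '<root xmlns="http://www.my.org/namespace"><child>hi</child></root>',
    parser)
print(root.child.text)  # "hi", via __getattr__ forwarding to find()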
I am just getting started with OOP, so I apologise in advance if my question is as obvious as 2+2. :)
Basically I created a class that adds attributes and methods to a pandas DataFrame. That's because I sometimes need to do complex but repetitive tasks like merging with a bunch of other tables, dropping duplicates, etc. So it's pleasant to be able to do that in just one go with a predefined method. For example, I can create an object like this:
mysupertable = MySupperTable(original_dataframe)
And then do:
mysupertable.complex_operation()
Where original_dataframe is the original pandas DataFrame (or object) that is stored as an attribute of the class. Now, this is all good and well, but if I want to print (or just access) that original DataFrame I have to do something like
print(mysupertable.original_dataframe)
Is there a way to have that happening "by default" so if I just do:
print(mysupertable)
it will print the original data frame, rather than the memory location?
I know there are the __str__ and __repr__ methods that can be implemented in a class and return default string representations for an object. I was just wondering if there was a similar magic method (or something else) to default to showing a particular attribute. I tried looking this up, but I think I am somehow not finding the right words to describe what I want to do, because I can't seem to find an answer.
Thank you!
Cheers
In your MySupperTable class, do:
class MySupperTable:
    # ... other stuff in the class

    def __str__(self) -> str:
        return str(self.original_dataframe)
That will make it so that when a MySupperTable is converted to a str, it will convert its original_dataframe to a str and return that.
When you pass an object to print(), it prints the object's string representation, which under the hood is retrieved by calling object.__str__(). You can give a custom definition to this method the way that you would define any other method.
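Putting it together, a minimal runnable sketch (the constructor here is an assumption about how MySupperTable stores the frame):

import pandas as pd

class MySupperTable:
    def __init__(self, original_dataframe: pd.DataFrame):
        self.original_dataframe = original_dataframe

    def __str__(self) -> str:
        return str(self.original_dataframe)

    # optional: make a bare expression in the REPL show the same thing
    __repr__ = __str__

df = pd.DataFrame({'a': [1, 2]})
print(MySupperTable(df))  # prints the DataFrame, not a memory location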
This is kind of a high level question. I'm not sure what you'd do with code like this:
class Object(object):
    pass

obj = Object
obj.a = lambda: None
obj.d = lambda: dict
setattr(obj.d, 'dictionary', {4, 3, 5})
setattr(obj.a, 'somefield', 'somevalue')
If I'm going to call obj.a.somefield, why would I use print? It feels redundant.
I simply can't see what programming strictly by setting attributes would be good for.
I could write an entire program with all of my variables in object classes.
First, about your print question: print is used more for debugging, or for showing attributes that are an output from an object, i.e. information the object computed for you when you created it.
For example, there might be an object that you create by passing it data and it finds all of the basic statistics information of that data. You could have it return a dictionary via a method and access the values from there or you could simply access it via an attribute, making the data more readable.
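A sketch of that idea (the class name and the particular statistics are made up for illustration):

import statistics

class DataSummary:
    def __init__(self, data):
        # compute the basic statistics once, expose them as attributes
        self.mean = statistics.mean(data)
        self.stdev = statistics.stdev(data)

summary = DataSummary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary.mean)  # more readable than digging the value out of a returned dict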
For the second part of your question, about why you would want to use attributes in general: they're mostly for passing information internally from method to method in an object, or for configuring an object. Python has different scopes that determine which information each function can access. All methods of an object can access that object's attributes, which lets you avoid external or global variables and keeps your object nice and self-contained. Global variables are generally avoided because they get messy, and they are considered bad practice.
Taking that a step further, setattr is a more sophisticated way of setting these attributes and can make your code more readable. You could use a function to modify aspects of an object, or you could "hide" the complexity behind setattr so the user works with a higher-level interface rather than getting bogged down in the specifics.
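As a sketch of that pattern (configure() and the option names are hypothetical):

class Config:
    pass

def configure(obj, **options):
    # hide the attribute plumbing behind a higher-level interface
    for name, value in options.items():
        setattr(obj, name, value)

cfg = Config()
configure(cfg, host='localhost', port=8080)
print(cfg.host, cfg.port)  # localhost 8080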
I'm struggling with PyYAML docs to understand a probably easy thing.
I have a dictionary that maps string names to python objects:
lut = {'bar_one': my_bar_one_obj,
       'bar_two': my_bar_two_obj}
and I'd like to load a YAML file like this and map all "foo" nodes to my dictionary objects (the inverse, dumping, is not really necessary)
node1:
  # ...
  foo: "bar_one"
node2:
  # ...
  foo: "bar_two"
My first thought was to use add_constructor but I couldn't find a way to give it an extra kwarg. Maybe a custom loader?
The PyYAML docs aren't really helpful here, or perhaps I'm looking for the wrong keywords...
I could accept using a custom tag like
node1:
  # ...
  foo: !footag "bar_one"
node2:
  # ...
  foo: !footag "bar_two"
But detecting just the foo nodes would be nicer.
You are not looking for the wrong keywords; this is not something any of the YAML parsers I know of was made to do. YAML parsers load a, possibly complex, data structure that is self-contained. What you want to do is merge that self-contained structure, during one of the parsing steps, into an already existing structure (lut). The parser is built to allow tweaking by providing alternative routines, not by providing routines plus data.
There is no option for that built into PyYAML, i.e. there is no built-in way to tell the loader about lut that makes PyYAML do something with it, and certainly not to attach key-value pairs (assuming that is what you mean by the nodes) as values to its keys.
Probably the easiest way of getting what you want is a post-processing step that takes lut and the data loaded from your YAML file (which is also a dict) and combines the two.
If you want to try and do this with add_constructor, then what you need to do is construct a class with a __call__ method, create an instance of the class with lut as argument, and then pass that instance in as the alternative constructor:
class ConstructorWithLut:
    def __init__(self, lut):
        self._lut = lut

    def __call__(self, loader, node):
        # the actual constructor routine added by add_constructor:
        # resolve the scalar against the lookup table
        return self._lut[loader.construct_scalar(node)]

constructor_with_lut = ConstructorWithLut(lut)
SomeConstructor.add_constructor('your_tag', constructor_with_lut)
In which you can replace 'your_tag' with u'tag:yaml.org,2002:map' if you want your constructor to handle (all) normal dicts.
Another option is to do this during YAML loading, but once again you cannot just tweak the Loader, or one of its constituent components (the Constructor), as you normally hand in the class, not an object, and you need an object to be able to attach lut. So what you would do is create your own constructor, your own loader that uses that constructor, and then a load() replacement that instantiates your loader and attaches lut (by just adding it as a unique attribute, or by passing it in as a parameter and handing it on to your constructor).
Your constructor, which should be a subclass of one of the existing constructors, then has to have its own construct_mapping() that first calls the parent class' construct_mapping() and, before returning the result, inspects whether it can update the attribute to which lut has been assigned. You cannot do this by looking at the keys of the dict for foo, because when you have such a key you don't have access to the parent node that you need to assign to lut. What you need to do instead is see whether any of the values of the mapping is a dict that has a key named foo; if it does, that dictionary can be used to update lut based on the value associated with foo.
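A sketch of what that could look like, assuming a constructor subclassed from SafeConstructor and a loader composed from PyYAML's usual parts (class names are made up; deep=True forces nested mappings to be fully constructed before they are inspected):

import yaml

class LutConstructor(yaml.constructor.SafeConstructor):
    def construct_mapping(self, node, deep=False):
        # construct nested values fully so their keys can be inspected
        mapping = super().construct_mapping(node, deep=True)
        for value in mapping.values():
            if isinstance(value, dict) and 'foo' in value:
                self.lut[value['foo']] = value
        return mapping

class LutLoader(yaml.reader.Reader, yaml.scanner.Scanner, yaml.parser.Parser,
                yaml.composer.Composer, LutConstructor, yaml.resolver.Resolver):
    def __init__(self, stream, lut):
        yaml.reader.Reader.__init__(self, stream)
        yaml.scanner.Scanner.__init__(self)
        yaml.parser.Parser.__init__(self)
        yaml.composer.Composer.__init__(self)
        LutConstructor.__init__(self)
        yaml.resolver.Resolver.__init__(self)
        self.lut = lut  # attach lut to the loader object

def load_with_lut(stream, lut):
    # load() replacement that instantiates the loader and hands it lut
    loader = LutLoader(stream, lut)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()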
I would certainly first implement the post process stage using two routines:
def update_from_yaml(your_dict, yaml_data):
    for node_key in yaml_data:
        node_value = yaml_data[node_key]
        map_value(your_dict, node_key, node_value)

def map_value(your_dict, key, value):
    foo_val = value.get('foo')
    if foo_val is None:  # key foo not found
        return
    your_dict[foo_val] = value  # or = {key: value}
I am not sure what you really mean by "assigning all foo nodes": the YAML data has no nodes at the top level, it only has keys and values. So you either assign that pair, or only its value (a dict).
Once those two routines work satisfactorily, you can try to implement the add_constructor or Loader based alternatives, in which you should be able to re-use at least map_value.
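For illustration, the post-processing route could be driven like this (the YAML text and the placeholder objects are made up):

import yaml

yaml_text = """
node1:
  foo: "bar_one"
node2:
  foo: "bar_two"
"""

lut = {'bar_one': object(), 'bar_two': object()}  # stand-ins for the real objects
data = yaml.safe_load(yaml_text)
update_from_yaml(lut, data)
print(lut['bar_one'])  # now the node1 mapping: {'foo': 'bar_one'}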
I have a bunch of File objects, and a bunch of Folder objects. Each folder has a list of files. Now, sometimes I'd like to lookup which folder a certain file is in. I don't want to traverse over all folders and files, so I create a lookup dict file -> folder.
folder = Folder()
myfile = File()
folder_lookup = {}
# This is pseudocode, I don't actually reach into the Folder
# object, but have an appropriate method
folder.files.append(myfile)
folder_lookup[myfile] = folder
Now, the problem is, the files are mutable objects. My application is built around that fact: I change properties on them, and the GUI is notified and updated accordingly. Of course you can't put mutable objects in dicts. So what I tried first is to generate a hash based on the current content, basically:
def __hash__(self):
    return hash((self.title, ...))
This didn't work, of course, because when the object's contents changed, its hash (and thus its identity) changed, and everything got messed up. What I need is an object that keeps its identity although its contents change. I tried various things, like making __hash__ return id(self) and overriding __eq__, but never found a satisfying solution. One complication is that the whole construction should be picklable, which means I'd have to store the id on creation, since it could change when pickling, I guess.
So I basically want to use the identity of an object (not its state) to quickly look up data related to the object. I've actually found a really nice pythonic workaround for my problem, which I might post shortly, but I'd like to see if someone else comes up with a solution.
I felt dirty writing this. Just put folder as an attribute on the file.
class dodgy(list):
    def __init__(self, title):
        self.title = title
        super().__init__()
        # a per-instance helper class whose hash is stable over time
        self.store = type("store", (object,), {"blanket": self})

    def __hash__(self):
        return hash(self.store)

innocent_d = {}
dodge_1 = dodgy("dodge_1")
dodge_2 = dodgy("dodge_2")
innocent_d[dodge_1] = dodge_1.title
innocent_d[dodge_2] = dodge_2.title
print(innocent_d[dodge_1])
dodge_1.extend(range(5))
dodge_1.title = "oh no"
print(innocent_d[dodge_1])
OK, everybody noticed the extremely obvious workaround (which took me some days to come up with): just put an attribute on File that tells you which folder it is in. (Don't worry, that is also what I did.)
But, it turns out, I was working under wrong assumptions. You are not supposed to use mutable objects as keys, but that doesn't mean you can't (diabolical laughter)! The default implementation of __hash__ returns a unique value, probably derived from the object's address, that remains constant in time. And the default __eq__ follows the same notion of object identity.
So you can put mutable objects in a dict, and they work as expected (if you expect equality based on instance, not on value).
See also: I'm able to use a mutable object as a dictionary key in python. Is this not disallowed?
I was having problems because I was pickling/unpickling the objects, which of course changed the hashes. One could generate a unique ID in the constructor and use that for equality and for deriving a hash, to overcome this.
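A minimal sketch of that idea, using uuid4 as the stable identity (sketched for the File class from the question):

import uuid

class File:
    def __init__(self):
        # survives pickling, unlike id(self) or the default hash
        self._uid = uuid.uuid4()

    def __eq__(self, other):
        return isinstance(other, File) and self._uid == other._uid

    def __hash__(self):
        return hash(self._uid)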
(For the curious, as to why such a "lookup based on instance identity" dict might be necessary: I've been experimenting with a kind of "object database". You have pure Python objects, put them in lists/containers, and can define indexes on attributes for faster lookup, complex queries and so on. For foreign keys (1:n relationships) I can just use containers, but for the backlink I have to come up with something clever if I don't want to modify the objects on the n side.)
I have a class definition with a __hash__ function that uses the object's properties to create a unique key for comparison in Python sets.
The hash method looks like this:
def __hash__(self):
    return int('%d%s' % (self.id, self.create_key))
In a module responsible for implementing this class, several queries are run that could conceivably construct duplicate instances of this class. The queue that is created in the function responsible for doing this is represented as a set, to make sure that the dupes can be omitted:
in_set = set()
out_set = set()
for inid in inids:
    ps = Perceptron.getwherelinked(inid, self.in_ents)
    for p in ps:
        in_set.add(p)
for poolid in poolids:
    ps = Perceptron.getwherelinked(poolid, self.out_ents)
    for p in ps:
        out_set.add(p)
return in_set.union(out_set)
Somehow, despite calling the union method, I am still getting two duplicate instances. When printed out (with a __str__ method in the Perceptron class that just calls __hash__) the two hashes are identical, which theoretically shouldn't be possible:
set([1630, 1630])
Any guidance would be appreciated.
"If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either" (source).
Define __eq__().
You also need to implement __eq__() to match your __hash__() implementation. Without it, equality falls back to object identity, so two distinct instances with identical hashes are still unequal, and the set keeps both.
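For instance, a sketch for the Perceptron class above (the constructor is an assumption; the point is that __eq__ must agree with __hash__):

class Perceptron:
    def __init__(self, id, create_key):
        self.id = id
        self.create_key = create_key

    def __hash__(self):
        return int('%d%s' % (self.id, self.create_key))

    def __eq__(self, other):
        # objects that compare equal must also hash equal
        return (isinstance(other, Perceptron) and
                (self.id, self.create_key) == (other.id, other.create_key))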