python dict of class objects that inherits attributes from class objects

How can I create a dictionary class 'D' that stores instances of some class 'A' and inherits wrapped versions of the methods of 'A'?
For example, suppose a = A() and a.first() is a function that does something to 'a' and returns some object. Then I want d = D({'k0': a0, 'k1': a1}) to have an attribute d.first() that returns {'k0': a0.first(), 'k1': a1.first()} and changes the underlying data in a0 and a1 as would calls to first().
Additionally, I want to only expose methods (and possibly attributes) that do not already exist in the dict class (no stomping) AND I want to do this dynamically (no hard coding of the methods of A, I already know how to type all the methods out by hand). Maybe consider dealing only with methods that don't start with '_' if that is easier somehow.
Also, I would like 'D' to do some handling of outputs from the calls depending on the return types (post-processing, wrapping whatever).
It seems like this might be kind of a bad idea, but I just want to understand what approaches there are. From my reading so far, it seems like I could use multiple inheritance and write my own __new__ function, or do something else with metaclasses. But the lore is that you should generally not mess around with __new__ unless you are a guru, so this is making me hesitate. Is there some decorator trick for doing this? I think I could also use the inspect module to traverse the class 'A', but this feels like a hack.
Update: As a very specific example, imagine I have a dictionary of pandas.DataFrames and I want the collection to appear and behave as much as possible like a single DataFrame, with even tab completion of methods working in IPython (and I'm not using a pandas.Panel). For simplicity, consider only read-only behaviour (indexing and selecting): things like d.ix[0:10] and d.mean() should return something from applying the function to all the frames in the dictionary, whilst d['k0'] should return the value in the dictionary corresponding to key 'k0'. A combination of hard-coding and getattr does what I want, but I am exploring to see how this could be done with less hard-coding.
I think this is a common design problem where you want to dynamically generate an iterable that behaves somewhat like the objects it contains.
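As a rough sketch of the getattr-plus-hard-coding combination mentioned above (pandas is assumed; the FrameDict name and the sample frames are made up for illustration, and the no-stomping and tab-completion requirements are ignored for now):
import pandas as pd

class FrameDict(dict):
    """Dict of DataFrames that forwards unknown attribute calls to every frame."""
    def __getattr__(self, name):
        # Only called when the dict itself has no such attribute, so
        # d['k0'], d.keys(), d.items(), ... keep their normal dict meaning.
        def broadcast(*args, **kwargs):
            return {k: getattr(v, name)(*args, **kwargs)
                    for k, v in self.items()}
        return broadcast

d = FrameDict(k0=pd.DataFrame({'x': [1, 2]}),
              k1=pd.DataFrame({'x': [3, 4]}))
print(d.mean())   # roughly {'k0': <Series>, 'k1': <Series>}
print(d['k0'])    # plain dict indexing still works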

Each of these works in limited testing:
#!/usr/bin/python3

class D(dict):
    def __getattr__(self, name):
        return lambda *args, **kw: {
            k: getattr(v, name)(*args, **kw) for k, v in self.items()}

# Alternative, with room for post-processing:
class D(dict):
    def __getattr__(self, name):
        def proxy(*args, **kw):
            d2 = {k: getattr(v, name)(*args, **kw) for k, v in self.items()}
            # Post-process if required
            return d2
        return proxy

d = D()
d['k1'] = 'a1'
d['k2'] = 'a2'
d['k3'] = '33'
print(d.capitalize())            # No-arg form passed to str, not dict
print(d.replace('a', 'Hello'))   # Arguments
print(d.get('k1'))               # .get() satisfied by dict, not str
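If the element class is known up front, the same idea can be pushed further so that only its public methods are proxied and nothing that dict already defines gets stomped on. A sketch using the inspect module mentioned in the question (make_proxy_dict is a made-up helper name):
import inspect

def make_proxy_dict(element_cls, name='ProxyDict'):
    """Build a dict subclass whose public methods broadcast to every value."""
    def make_method(method_name):
        def method(self, *args, **kw):
            return {k: getattr(v, method_name)(*args, **kw)
                    for k, v in self.items()}
        method.__name__ = method_name
        return method

    namespace = {}
    for attr_name, attr in inspect.getmembers(element_cls, callable):
        # Skip private names and anything dict already provides (no stomping).
        if attr_name.startswith('_') or hasattr(dict, attr_name):
            continue
        namespace[attr_name] = make_method(attr_name)
    return type(name, (dict,), namespace)

Because the proxied names end up as real attributes of the generated class rather than being fabricated by __getattr__, IPython tab completion can see them.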

This is a bit of an odd question, but it very much depends on what 'D' is structurally.
If D is just a data structure, then it is perfectly valid to pack the dictionary with the objects, subscript and call the functions like so:
class A(object):
    def __init__(self, x):
        self.x = x
    def first(self):
        return self.x

a = A(7)
b = A(3)
d = {'X': a, 'Y': b}
print(d['X'].first())
Alongside this, if you want all "firsts" then you can do something like this:
firsts = {}
for i, j in d.items():
    firsts[i] = j.first()
However, it seems like your challenge is a little more complex, plus unsafe: what if an object in the dictionary has a keys() method? Which takes precedence, and if it's the dictionary's, how do I access the keys() method of the contents?
What I'd suggest is reframing, or reasking, the question with some more specifics. Firstly, what do the objects in the dictionary represent? And more importantly what does the dictionary represent?
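On the precedence question: with the __getattr__ approach shown in the other answer, the dictionary always wins, because __getattr__ is only consulted after normal attribute lookup fails. A small illustration (the values are dicts on purpose, so both sides have a keys() method):
class D(dict):
    def __getattr__(self, name):
        # Never reached for keys(), items(), get(), ... because dict has them.
        def proxy(*args, **kw):
            return {k: getattr(v, name)(*args, **kw) for k, v in self.items()}
        return proxy

d = D(a={'x': 1}, b={'y': 2})
print(d.keys())     # dict_keys(['a', 'b']) -- the dictionary's own keys()
print(d.values())   # the stored dicts; their keys() is unreachable this way

The contents' keys() can then only be reached explicitly, e.g. {k: v.keys() for k, v in d.items()}.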

Related

Is there a way to get class constructor arguments by self inspection?

I have a set of classes that I want to serialize to/from both JSON and a mongodb database. The most efficient way I see to do that is to write methods to serialize to dicts, then use built-in methods to get to/from storage. (Q1: Is this conclusion actually valid, or is there a better way?)
So, to export an instance of my class to a dict, I can use self.__dict__. In my case these classes get nested, so it has to be recursive, fine. Now I want to read it back... but now I'm stuck if my class has a non-trivial constructor. Consider:
class MyClass(object):
    def __init__(self, name, value=None):
        self.name = name
        self._value = value

    @property
    def value(self):
        return self._value

a = MyClass('spam', 42)
d = a.__dict__  # has {'name': 'spam', '_value': 42}
# now how to unserialize?
b = MyClass(**d)  # nope, because '_value' is not a valid argument
c = MyClass(); c.__dict__.update(d)  # nope, can't construct 'empty' MyClass
I don't want to write a constructor that ignores unknown parameters, because I hate wasting hours trying to figure out why a class is ignoring some parameter to find that there was a typo in the name. And I don't want to remove the required parameters, because that may cause problems elsewhere.
So how do I get around this mess I've made for myself?
If there's a way to bypass the class's constructor and create an empty
object, that might work, but if there is any useful work done in __init__, I lose that. (E.g. type/range checking of parameters).
In my case, these classes don't change after construction (they define lots of useful methods, and have some caching). So if I could extract just the constructor arguments from the dict, I'd be doing good. Is there a way to do that that doesn't involve repeating all the constructor arguments??
I don't know if it's just a trivial example or not, but I can't see the value of the value property (no pun intended) if you're directly assigning the value argument from __init__. In that case just using a simple attribute will solve your problem.
But if for some reason you really need a property, just strip the leading underscore from the key:
class MyClass(object):
    def __init__(self, name, value=None):
        self.name = name
        self._value = value

    @property
    def value(self):
        return self._value

a = MyClass('spam', 42)
d = {k.strip('_'): v for k, v in a.__dict__.items()}
b = MyClass(**d)
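An alternative that avoids relying on a naming convention in the dict comprehension is to ask the constructor which parameters it accepts. A sketch using inspect.signature (from_dict is a made-up helper; matching attributes to parameters by an optional leading underscore is an assumption about your naming scheme):
import inspect

def from_dict(cls, data):
    """Rebuild an instance, passing only the constructor's own parameters."""
    params = inspect.signature(cls.__init__).parameters
    kwargs = {}
    for name in params:
        if name == 'self':
            continue
        if name in data:
            kwargs[name] = data[name]
        elif '_' + name in data:
            kwargs[name] = data['_' + name]
    return cls(**kwargs)

a = MyClass('spam', 42)
b = from_dict(MyClass, a.__dict__)

A missing required argument still raises a TypeError from the real constructor, so typos are not silently swallowed.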

Is the way my class inherits list class methods pythonicly correct?

A little example will help clarify my question:
I define two classes: Security and Universe, which I would like to behave as a list of Security objects.
Here is my example code:
class Security(object):
    def __init__(self, name):
        self.name = name

class Universe(object):
    def __init__(self, securities):
        self.securities = securities

s1 = Security('name1')
s2 = Security('name2')
u = Universe([s1, s2])
I would like my Universe Class to be able to use usual list features such as enumerate(), len(), __getitem__()... :
enumerate(u)
len(u)
u[0]
So I defined my Class as:
class Universe(list, object):
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        self.securities = securities
It seems to work, but is it the appropriate pythonic way to do it ?
[EDIT]
The above solution does not work as I wish when I subset the list:
>>> s1 = Security('name1')
>>> s2 = Security('name2')
>>> s3 = Security('name3')
>>> u = Universe([s1, s2, s3])
>>> sub_u = u[0:2]
>>> type(u)
<class '__main__.Universe'>
>>> type(sub_u)
<type 'list'>
I would like my variable sub_u to remain of type Universe.
You don't have to actually be a list to use those features. That's the whole point of duck typing. Anything that defines __getitem__(self, i) automatically handles x[i], for i in x, iter(x), enumerate(x), and various other things. If you also define __len__(self), then len(x), list(x), etc. work as well. Or you can define __iter__ instead of __getitem__. Or both. It depends on exactly how list-y you want to be.
The documentation on Python's special methods explains what each one is for, and organizes them pretty nicely.
For example:
class FakeList(object):
    def __getitem__(self, i):
        return -i

fl = FakeList()
print(fl[20])
for i, e in enumerate(fl):
    print(i)
    if e < -2: break
No list in sight.
If you actually have a real list and want to represent its data as your own, there are two ways to do that: delegation, and inheritance. Both work, and both are appropriate in different cases.
If your object really is a list plus some extra stuff, use inheritance. If you find yourself stepping on the base class's behavior, you may want to switch to delegation anyway, but at least start with inheritance. This is easy:
class Universe(list):  # don't add object also, just list
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        # don't also store `securities`--you already have `self`!
You may also want to override __new__, which allows you to get the iter(securities) into the list at creation time rather than initialization time, but this doesn't usually matter for a list. (It's more important for immutable types like str.)
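For an immutable type the contents have to be fixed before __init__ ever runs, which is why __new__ matters there. A minimal sketch with str (the Ticker class is made up for illustration, not part of the original answer):
class Ticker(str):
    """A str subclass that normalizes its value at creation time."""
    def __new__(cls, value):
        # str is immutable, so the upper-casing must happen in __new__;
        # by the time __init__ runs, the string's contents are already set.
        return super(Ticker, cls).__new__(cls, value.upper())

print(Ticker('aapl'))  # AAPL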
If the fact that your object owns a list rather than being one is inherent in its design, use delegation.
The simplest way to delegate is explicitly. Define the exact same methods you'd define to fake being a list, and make them all just forward to the list you own:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)
    def __getitem__(self, index):
        return self.securities[index]  # or .__getitem__(index) if you prefer
    # ... etc.
You can also do delegation through __getattr__:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)
    # no __getitem__, __len__, etc.
    def __getattr__(self, name):
        if name in ('__getitem__', '__len__',
                    # and so on
                    ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
Note that many of list's methods will return a new list. If you want them to return a new Universe instead, you need to wrap those methods. But keep in mind that some of those methods are binary operators—for example, should a + b return a Universe only if a is one, or only if both are, or if either are?
Also, __getitem__ is a little tricky, because it can return either a list or a single object, and you only want to wrap the former in a Universe. You can do that by checking the return value with isinstance(ret, list), or by checking the index with isinstance(index, slice); which one is appropriate depends on whether you can have lists as elements of a Universe, and whether they should be treated as a list or as a Universe when extracted. Plus, if you're using inheritance, in Python 2 you also need to wrap the deprecated __getslice__ and friends, because list does support them (although __getslice__ always returns a sub-list, not an element, so it's pretty easy).
Once you decide those things, the implementations are easy, if a bit tedious. Here are examples for all three versions, using __getitem__ because it's tricky, and the one you asked about in a comment. I'll show a way to use generic helpers for wrapping, even though in this case you may only need it for one method, so it may be overkill.
Inheritance:
class Universe(list):  # don't add object also, just list
    @classmethod
    def _wrap_if_needed(cls, value):
        if isinstance(value, list):
            return cls(value)
        else:
            return value
    def __getitem__(self, index):
        ret = super(Universe, self).__getitem__(index)
        return self._wrap_if_needed(ret)
Explicit delegation:
class Universe(object):
    # same _wrap_if_needed
    def __getitem__(self, index):
        ret = self.securities.__getitem__(index)
        return self._wrap_if_needed(ret)
Dynamic delegation:
import functools

class Universe(object):
    # same _wrap_if_needed
    @classmethod
    def _wrap_func(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return cls._wrap_if_needed(func(*args, **kwargs))
        return wrapper
    def __getattr__(self, name):
        if name in ('__getitem__',):
            return self._wrap_func(getattr(self.securities, name))
        elif name in ('__len__',
                      # and so on
                      ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
As I said, this may be overkill in this case, especially for the __getattr__ version. If you just want to override one method, like __getitem__, and delegate everything else, you can always define __getitem__ explicitly, and let __getattr__ handle everything else.
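A sketch of that mixed approach, assuming the only wrapping rule is "slices come back as a Universe" (note that __len__ is still defined explicitly, because implicit special-method lookup bypasses __getattr__):
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)

    def __getitem__(self, index):
        # Explicit, because the result may need wrapping.
        ret = self.securities[index]
        return Universe(ret) if isinstance(ret, list) else ret

    def __len__(self):
        return len(self.securities)

    def __getattr__(self, name):
        # Everything else is looked up on the underlying list.
        return getattr(self.securities, name)

u = Universe(['s1', 's2', 's3'])
print(type(u[0:2]).__name__)   # Universe
u.append('s4')                 # delegated to the list
print(len(u))                  # 4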
If you find yourself doing this kind of wrapping a lot, you can write a function that generates wrapper classes, or a class decorator that lets you write skeleton wrappers and fills in the details, etc. Because the details depend on your use case (all those issues I mentioned above that can go one way or the other), there's no one-size-fits-all library that just magically does what you want, but there are a number of recipes on ActiveState that show more complete details—and there are even a few wrappers in the standard library source.
That is a reasonable way to do it, although you don't need to inherit from both list and object. list alone is enough. Also, if your class is a list, you don't need to store self.securities; it will be stored as the contents of the list.
However, depending on what you want to use your class for, you may find it easier to define a class that stores a list internally (as you were storing self.securities), and then define methods on your class that (sometimes) pass through to the methods of this stored list, instead of inheriting from list. The Python builtin types don't define a rigorous interface in terms of which methods depend on which other ones (e.g., whether append depends on insert), so you can run into confusing behavior if you try to do any nontrivial manipulations of the contents of your list-class.
Edit: As you discovered, any operation that returns a new list falls into this category. If you subclass list without overriding its methods, then when you call methods on your object (explicitly or implicitly), the underlying list methods will be called. These methods are hardcoded to return a plain Python list and do not check the actual class of the object, so they will return a plain Python list.

Python idiom for dict-able classes?

I want to do something like this:
class Dictable:
    def dict(self):
        raise NotImplementedError

class Foo(Dictable):
    def dict(self):
        return {'bar1': self.bar1, 'bar2': self.bar2}
Is there a more pythonic way to do this? For example, is it possible to overload the built-in conversion dict(...)? Note that I don't necessarily want to return all the member variables of Foo, I'd rather have each class decide what to return.
Thanks.
The Pythonic way depends on what you want to do. If your objects shouldn't be regarded as mappings in their own right, then a dict method is perfectly fine, but you shouldn't "overload" dict to handle dictables. Whether or not you need the base class depends on whether you want to do isinstance(x, Dictable); note that hasattr(x, "dict") would serve pretty much the same purpose.
If the classes are conceptually mappings of keys to values, then implementing the Mapping protocol seems appropriate. I.e., you'd implement
__getitem__
__iter__
__len__
and inherit from collections.Mapping to get the other methods. Then you get dict(Foo()) for free. Example:
from collections.abc import Mapping  # collections.Mapping on Python 2

class Foo(Mapping):
    def __getitem__(self, key):
        if key not in ("bar1", "bar2"):
            raise KeyError("{} not found".format(repr(key)))
        return getattr(self, key)
    def __iter__(self):
        yield "bar1"
        yield "bar2"
    def __len__(self):
        return 2
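A quick usage check of the class above (bar1 and bar2 are set by hand here, since the example never defines them):
f = Foo()
f.bar1, f.bar2 = 1, 2

print(dict(f))       # {'bar1': 1, 'bar2': 2}
print(list(f))       # ['bar1', 'bar2'] -- iterating a Mapping yields keys
print('bar1' in f)   # True; __contains__ comes free from Mapping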
Firstly, look at collections.abc, which describes the Python abstract base class protocol for containers (equivalent to interfaces in static languages).
Then, decide if you want to write your own ABC or make use of an existing one; in this case, Mapping might be what you want.
Note that although the dict constructor (i.e. dict(my_object)) is not overridable, if it encounters an iterable object that yields a sequence of key-value pairs, it will construct a dict from that; i.e. (Python 2; for Python 3 replace iteritems with items):
def __iter__(self):
    return {'bar1': self.bar1, 'bar2': self.bar2}.iteritems()
However, if your classes are intended to behave like a dict you shouldn't do this as it's different from the expected behaviour of a Mapping instance, which is to iterate over keys, not key-value pairs. In particular it would cause for .. in to behave incorrectly.
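To see the difference, compare the dict constructor with plain iteration for a hypothetical Foo that defines only that __iter__ (written for Python 3, so items() instead of iteritems()):
class Foo(object):
    def __init__(self, bar1, bar2):
        self.bar1 = bar1
        self.bar2 = bar2
    def __iter__(self):
        return iter({'bar1': self.bar1, 'bar2': self.bar2}.items())

f = Foo('a', 'b')
print(dict(f))   # {'bar1': 'a', 'bar2': 'b'} -- the constructor is happy
for key in f:    # but a real Mapping would yield keys here, not pairs
    print(key)   # ('bar1', 'a') then ('bar2', 'b')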
Most of the answers here are about making your class behave like a dict, which isn't actually what you asked. If you want to express the idea, "I am a class that can be turned into a dict," I would simply define a bunch of classes and have them each implement .dict(). Python favors duck-typing (what an object can do) over what an object is. The ABC doesn't add much. Documentation serves the same purpose.
You can certainly overload dict() but you almost never want to! Too many aspects of the standard library depend upon dict being available and you will break most of its functionality. You can probably do something like this though:
class Dictable:
    def dict(self):
        return self.__dict__

what is the dict class used for

Can someone explain what the dict class is used for? This snippet is from Dive Into Python
class FileInfo(dict):
    "store file metadata"
    def __init__(self, filename=None):
        self["name"] = filename
I understand the assignment of key=value pairs with self['name'] = filename but what does inheriting the dict class have to do with this? Please help me understand.
If you're not familiar with the inheritance concept of object-oriented programming, have a look at least at this wiki article (though that's only an introduction and may not be the best one).
In Python we use this syntax to define class A as a subclass of class B:
class A(B):
    pass  # empty class
In your example, since the FileInfo class inherits from the standard dict type, you can use instances of that class as dictionaries (they have all the methods that a regular dict object has). Among other things, that allows you to assign values by key like this (dict provides the method for handling this operation):
self['name'] = filename
Is that the explanation you wanted, or is there something else you don't understand?
It's for creating your own customized Dictionary type.
You can override __init__, __getitem__ and __setitem__ methods for your own special purposes to extend dictionary's usage.
Read the next section in the Dive into Python text: we use such inheritance to be able to work with file information just the way we do using a normal dictionary.
# From the example on the next section
>>> f = fileinfo.FileInfo("/music/_singles/kairo.mp3")
>>> f["name"]
'/music/_singles/kairo.mp3'
The fileinfo class is designed in a way that it receives a file name in its constructor, then lets the user get file information just the way you get the values from an ordinary dictionary.
Another usage of such a class is to create dictionaries that control their data. For example, you want a dictionary that does something special when values are assigned to, or read from, its 'sensor' key. You could define your own __setitem__ method, which is sensitive to the key name (this snippet assumes the data is stored in a self.data attribute, as in a UserDict-style class):
def __setitem__(self, key, item):
    self.data[key] = item
    if key == "sensor":
        print("Sensor activated!")
Or, for example, you want to return a special value each time the user reads the 'temperature' key. For this you override the __getitem__ method:
def __getitem__(self, key):
    if key == "temperature":
        return CurrentWeatherTemperature()
    else:
        return self.data[key]
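As a complete, runnable variant of those two snippets, here is a sketch that subclasses dict directly, so the data lives in the dict itself rather than in self.data (SensorDict and the canned temperature value are made up for the example):
class SensorDict(dict):
    """A dict that reacts to reads and writes of particular keys."""
    def __setitem__(self, key, item):
        super(SensorDict, self).__setitem__(key, item)
        if key == "sensor":
            print("Sensor activated!")

    def __getitem__(self, key):
        if key == "temperature":
            return 21.5  # stand-in for CurrentWeatherTemperature()
        return super(SensorDict, self).__getitem__(key)

d = SensorDict()
d["sensor"] = True        # prints "Sensor activated!"
print(d["temperature"])   # 21.5, computed rather than stored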
When a class in Python inherits from another class, it means that any of the methods defined on the inherited class are, by nature, defined on the newly created class.
So when FileInfo inherits from dict, all of the functionality of the dict class is available to FileInfo, in addition to anything that FileInfo may declare or, more importantly, override by re-defining the method or attribute.
Since the dict Object in Python allows for key/value name pairs, this enables FileInfo to have access to that same mechanism.

Lookup table for unhashable in Python

I need to create a mapping from objects of my own custom class (derived from dict) to objects of another custom class. As I see it there are two ways of doing this:
I can make the objects hashable. I'm not sure how I would do this. I know I can implement __hash__() but I'm unsure how to actually calculate the hash (which should be an integer).
Since my objects can be compared I can make a list [(myobj, myotherobj)] and then implement a lookup which finds the tuple where the first item in the tuple is the same as the lookup key. Implementing this is trivial (the number of objects is small) but I want to avoid reinventing the wheel if something like this already exists in the standard library.
It seems to me that wanting to look up unhashables would be a common problem, so I assume someone has already solved it. Any suggestions on how to implement __hash__() for a dict-like object, or is there some other standard way of making lookup tables of unhashables?
Mappings with mutable objects as keys are generally difficult. Is that really what you want? If you consider your objects to be immutable (there is no way to really enforce immutability in Python), or you know they will not be changed while they are used as keys in a mapping, you can implement your own hash function for them in several ways. For instance, if your object only has hashable data members, you can return the hash of a tuple of all data members as the object's hash.
If your object is dict-like, you can use the hash of a frozenset of all key-value pairs.
def __hash__(self):
    return hash(frozenset(self.iteritems()))  # self.items() in Python 3
This only works if all values are hashable. In order to save recalculations of the hashes (which would be done on every lookup), you can cache the hash-value and just recalculate it if some dirty-flag is set.
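A sketch of that caching idea, with a _dirty flag that the class has to set in every mutating method (only __setitem__ is shown; update(), __delitem__() and friends would need the same treatment):
class HashableDict(dict):
    """Dict that caches its hash and recomputes it only after mutation."""
    def __init__(self, *args, **kwargs):
        super(HashableDict, self).__init__(*args, **kwargs)
        self._dirty = True
        self._cached_hash = None

    def __setitem__(self, key, value):
        super(HashableDict, self).__setitem__(key, value)
        self._dirty = True

    def __hash__(self):
        if self._dirty:
            self._cached_hash = hash(frozenset(self.items()))
            self._dirty = False
        return self._cached_hash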
A simple solution seems to be to do lookup[id(myobj)] = myotherobj instead of lookup[myobj] = myotherobj. Any comments on this approach?
The following should work if you're not storing any additional unhashable objects in your custom class:
def __hash__(self):
    return hash(frozenset(self.items()))  # items() alone isn't hashable
Here is an implementation of a frozendict, taken from http://code.activestate.com/recipes/414283/:
class frozendict(dict):
    def _blocked_attribute(obj):
        raise AttributeError("A frozendict cannot be modified.")
    _blocked_attribute = property(_blocked_attribute)

    __delitem__ = __setitem__ = clear = _blocked_attribute
    pop = popitem = setdefault = update = _blocked_attribute

    def __new__(cls, *args):
        new = dict.__new__(cls)
        dict.__init__(new, *args)
        return new

    def __init__(self, *args):
        pass

    def __hash__(self):
        try:
            return self._cached_hash
        except AttributeError:
            h = self._cached_hash = hash(tuple(sorted(self.items())))
            return h

    def __repr__(self):
        return "frozendict(%s)" % dict.__repr__(self)
I would replace tuple(sorted(self.items())) by frozenset(self.iteritems()) as in Spacecowboy's answer. And consider adding __slots__ = ("_cached_hash",) to the class.
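A quick usage example of the recipe above, using a frozendict as a key in an ordinary lookup dict:
key = frozendict({'name': 'sensor-1', 'unit': 'C'})
lookup = {key: 'handler-for-sensor-1'}

# Equal contents give an equal, equally-hashing key, so the lookup succeeds:
print(lookup[frozendict({'unit': 'C', 'name': 'sensor-1'})])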
