Python idiom for dict-able classes? - python

I want to do something like this:
class Dictable:
    def dict(self):
        raise NotImplementedError

class Foo(Dictable):
    def dict(self):
        return {'bar1': self.bar1, 'bar2': self.bar2}
Is there a more pythonic way to do this? For example, is it possible to overload the built-in conversion dict(...)? Note that I don't necessarily want to return all the member variables of Foo, I'd rather have each class decide what to return.
Thanks.

The Pythonic way depends on what you want to do. If your objects shouldn't be regarded as mappings in their own right, then a dict method is perfectly fine, but you shouldn't "overload" dict to handle dictables. Whether or not you need the base class depends on whether you want to do isinstance(x, Dictable); note that hasattr(x, "dict") would serve pretty much the same purpose.
If the classes are conceptually mappings of keys to values, then implementing the Mapping protocol seems appropriate. I.e., you'd implement
__getitem__
__iter__
__len__
and inherit from collections.abc.Mapping (collections.Mapping before Python 3.3) to get the other methods. Then you get dict(Foo()) for free. Example:
from collections.abc import Mapping

class Foo(Mapping):
    def __getitem__(self, key):
        if key not in ("bar1", "bar2"):
            raise KeyError("{} not found".format(repr(key)))
        return getattr(self, key)

    def __iter__(self):
        yield "bar1"
        yield "bar2"

    def __len__(self):
        return 2
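To see what "for free" means in practice, here is a minimal sketch of the same idea with a constructor added. The _exposed tuple and the secret attribute are names made up for this illustration; the point is that only the attributes the class chooses to list end up in the dict.

```python
from collections.abc import Mapping


class Foo(Mapping):
    # Hypothetical: only these attributes are exposed through the mapping.
    _exposed = ("bar1", "bar2")

    def __init__(self, bar1, bar2, secret):
        self.bar1 = bar1
        self.bar2 = bar2
        self.secret = secret  # deliberately not part of the mapping

    def __getitem__(self, key):
        if key not in self._exposed:
            raise KeyError(key)
        return getattr(self, key)

    def __iter__(self):
        return iter(self._exposed)

    def __len__(self):
        return len(self._exposed)


foo = Foo(1, 2, "hidden")
as_dict = dict(foo)          # built from keys() and __getitem__
has_bar1 = "bar1" in foo     # __contains__ is supplied by Mapping
pairs = list(foo.items())    # so are keys(), values(), items() and get()
```

Only the three abstract methods are written by hand; membership tests, get() and the view methods come from the Mapping mixin.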

Firstly, look at the collections.abc module, which provides Python's abstract base classes for containers (roughly equivalent to interfaces in statically typed languages).
Then, decide if you want to write your own ABC or make use of an existing one; in this case, Mapping might be what you want.
Note that although the dict constructor (i.e. dict(my_object)) cannot be overridden, if it is given an iterable object that has no keys() method and yields key-value pairs, it will construct a dict from those pairs; i.e. (Python 2; for Python 3 replace iteritems() with items() and wrap the result in iter(), since __iter__ must return an iterator):
def __iter__(self):
    return {'bar1': self.bar1, 'bar2': self.bar2}.iteritems()
However, if your classes are intended to behave like a dict you shouldn't do this as it's different from the expected behaviour of a Mapping instance, which is to iterate over keys, not key-value pairs. In particular it would cause for .. in to behave incorrectly.
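The two routes into dict() can be compared side by side. This is a sketch with made-up class names (PairIterable, KeyedObject), written for Python 3: the first relies on iteration yielding pairs, the second on the keys() plus __getitem__ path that dict() prefers when keys() is present.

```python
class PairIterable:
    """Not a mapping: iterating yields (key, value) pairs."""

    def __init__(self, bar1, bar2):
        self.bar1 = bar1
        self.bar2 = bar2

    def __iter__(self):
        return iter([("bar1", self.bar1), ("bar2", self.bar2)])


class KeyedObject:
    """Mapping-like: dict() calls keys() and then looks up each key."""

    def __init__(self, bar1, bar2):
        self.bar1 = bar1
        self.bar2 = bar2

    def keys(self):
        return ["bar1", "bar2"]

    def __getitem__(self, key):
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)


from_pairs = dict(PairIterable(1, 2))
from_keys = dict(KeyedObject(1, 2))

# The cost of the first approach: a plain loop sees tuples, not keys.
loop_items = [item for item in PairIterable(1, 2)]
```

Both calls produce the same dict, but only the pair-yielding class surprises code that iterates over it expecting keys.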

Most of the answers here are about making your class behave like a dict, which isn't actually what you asked. If you want to express the idea, "I am a class that can be turned into a dict," I would simply define a bunch of classes and have them each implement .dict(). Python favors duck-typing (what an object can do) over what an object is. The ABC doesn't add much. Documentation serves the same purpose.

You can certainly overload dict(), but you almost never want to! Too many aspects of the standard library depend upon dict being available, and you will break most of its functionality. You can probably do something like this though:
class Dictable:
    def dict(self):
        return self.__dict__
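One thing to watch with this approach: self.__dict__ is the live attribute dictionary, so a caller who modifies the result also modifies the object. A minimal sketch of a variant that returns a shallow copy instead (the Foo class and its attributes are only illustrative):

```python
class Dictable:
    def dict(self):
        # Shallow copy, so callers cannot alter the instance by accident.
        return dict(vars(self))


class Foo(Dictable):
    def __init__(self, bar1, bar2):
        self.bar1 = bar1
        self.bar2 = bar2


foo = Foo(1, 2)
snapshot = foo.dict()
snapshot["bar1"] = 99   # changes the copy only
```

This still returns every instance attribute, so a class that wants to choose what to expose would override dict() itself.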

Related

MutableSequence to pass as a list in isinstance() check

I built a custom list-like class based on collections.MutableSequence:
class MyList(collections.MutableSequence):
    ...  # etc... behave mostly like a list...

value = MyList([1,2,3])
Before processing list data, a third-party library runs a check:
def check_correct_type(value):
    assert isinstance(value, list)
I do not wish to convert my custom list-like object to a built-in list before passing it to the third-party library.
Is there an elegant way to make an instance of MyList appear as though it was an instance of list in the isinstance(MyList([1,2,3]), list) check?
No, there is no way instances of your class can pass that test without inheriting from list. You have to subclass list to pass that test.
You can try inheriting from both MutableSequence and list; any method or attribute not implemented by your class or by MutableSequence will then be looked up on list, so you may get extra methods that you don't want this way and those may behave unexpectedly:
class MyList(collections.MutableSequence, list):
You could also monkeypatch the check_correct_type() function, provided it really is a stand-alone function like that:
def my_check_correct_type(value):
    assert isinstance(value, collections.MutableSequence)

third_party_library.check_correct_type = my_check_correct_type
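A self-contained sketch of the monkeypatching route, written for Python 3. The third_party_library object here is a stand-in built with types.SimpleNamespace purely so the example runs; with a real library you would patch the attribute on the imported module instead.

```python
import types
from collections.abc import MutableSequence

# Hypothetical stand-in for the real third-party module.
third_party_library = types.SimpleNamespace()


def check_correct_type(value):
    assert isinstance(value, list)


third_party_library.check_correct_type = check_correct_type


class MyList(MutableSequence):
    """List-like wrapper that stores its items in a private list."""

    def __init__(self, items=()):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __setitem__(self, index, value):
        self._items[index] = value

    def __delitem__(self, index):
        del self._items[index]

    def __len__(self):
        return len(self._items)

    def insert(self, index, value):
        self._items.insert(index, value)


def my_check_correct_type(value):
    assert isinstance(value, MutableSequence)


value = MyList([1, 2, 3])

# The original check rejects the custom class...
try:
    check_correct_type(value)
    rejected_by_original = False
except AssertionError:
    rejected_by_original = True

# ...while the patched one accepts it.
third_party_library.check_correct_type = my_check_correct_type
third_party_library.check_correct_type(value)
```

Patching only affects callers that look the function up on the module at call time; code that bound the original function earlier (for example via from ... import) keeps using the old check.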

python dict of class objects that inherits attributes from class objects

How can I create a dictionary class 'D' that stores instances of some class 'A' and inherits wrapped versions of the methods of 'A'?
For example, suppose a = A() and a.first() is a function that does something to 'a' and returns some object. Then I want d = D({'k0': a0, 'k1': a1}) to have an attribute d.first() that returns {'k0': a0.first(), 'k1': a1.first()} and changes the underlying data in a0 and a1 as would calls to first().
Additionally, I want to only expose methods (and possibly attributes) that do not already exist in the dict class (no stomping) AND I want to do this dynamically (no hard coding of the methods of A, I already know how to type all the methods out by hand). Maybe consider dealing only with methods that don't start with '_' if that is easier somehow.
Also, I would like 'D' to do some handling of outputs from the calls depending on the return types (post-processing, wrapping whatever).
It seems like this might be kind of a bad idea but I just want to understand what approaches there are. From my reading so far, it seems like I could use multiple inheritance and write my own __new__ function or do something else with metaclassing. But the lore is that you should generally not mess around with __new__ unless you are a guru so this is making me hesitate. Is there some decorator trick for doing this? I think I could also use the inspect module to traverse the class 'A' but this feels like hack.
Update: As a very specific example. Imagine I have a dictionary of pandas.DataFrames and that I want the collection to appear and behave as much as possible like a single DataFrame with even tab completion of methods working in iPython (and I'm not using a pandas.Panel). For simplicity, consider only read-only behaviour (indexing and selecting), things like d.ix[0:10] and d.mean() should return something from applying the function to all the frames in the dictionary whilst d['k0'] should return the value in the dictionary corresponding to key 'k0'. A combination of hard-coding and getattr does what I want, but I am exploring to see how this could be done with less hard-coding.
I think this is a common design problem where you want to dynamically generate an iterable that behaves somewhat like the objects it contains.
Each of these works in limited testing:
#! /usr/bin/python3
class D(dict):
    def __getattr__(self, name):
        return lambda *args, **kw: {
            k: getattr(v, name)(*args, **kw) for k, v in self.items()}

# Alternative, with room for post-processing:
class D(dict):
    def __getattr__(self, name):
        def proxy(*args, **kw):
            d2 = {k: getattr(v, name)(*args, **kw) for k, v in self.items()}
            # Post-process if required
            return d2
        return proxy

d = D()
d['k1'] = 'a1'
d['k2'] = 'a2'
d['k3'] = '33'
print(d.capitalize())           # No-arg form passed to str, not dict
print(d.replace('a', 'Hello'))  # Arguments
print(d.get('k1'))              # .get() satisfied by dict, not str
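Building on the proxy version, here is a sketch that also covers two of the wishes in the question: skipping names that start with an underscore, and post-processing the result (here simply by wrapping it in another D so calls can be chained). Both choices are assumptions made for this illustration, not the only reasonable policy.

```python
class D(dict):
    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so dict's own
        # methods (get, keys, items, ...) are never shadowed.
        if name.startswith('_'):
            raise AttributeError(name)

        def proxy(*args, **kw):
            result = {k: getattr(v, name)(*args, **kw)
                      for k, v in self.items()}
            return D(result)  # post-processing: wrap so calls can be chained
        return proxy


d = D(k1='a1', k2='a2')
upper = d.upper()
chained = d.upper().replace('A', 'b')
```

Because the proxy is created before the contained objects are consulted, a misspelled method name only fails when the proxy is actually called.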
This is a bit of an odd question, but it very much depends on what 'D' is structurally.
If D is just a data structure, then it is perfectly valid to pack the dictionary with the objects, subscript and call the functions like so:
class A(object):
    def __init__(self, x):
        self.x = x

    def first(self):
        return self.x

a = A(7)
b = A(3)
d = {'X': a, 'Y': b}
print(d['X'].first())
Alongside this, if you want all "firsts" then you can do something like this:
firsts = {}
for i, j in d.items():  # iterate over (key, value) pairs, not just keys
    firsts[i] = j.first()
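The same collection step can be written as a dict comprehension. A minimal, self-contained sketch using the A class from this answer:

```python
class A(object):
    def __init__(self, x):
        self.x = x

    def first(self):
        return self.x


d = {'X': A(7), 'Y': A(3)}

# One result per key, computed by calling first() on each stored object.
firsts = {key: obj.first() for key, obj in d.items()}
```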
However, it seems like your challenge is a little more complex, plus unsafe: what if an object in the dictionary has a keys() method? Which takes precedence, and if it's the dictionary's, how do I access the keys() method of the contents?
What I'd suggest is reframing, or reasking, the question with some more specifics. Firstly, what do the objects in the dictionary represent? And more importantly what does the dictionary represent?

Is the way my class inherits list class methods pythonicly correct?

A little example will help clarify my question:
I define two classes: Security and Universe which I would like to behave as a list of Security objects.
Here is my example code:
class Security(object):
    def __init__(self, name):
        self.name = name

class Universe(object):
    def __init__(self, securities):
        self.securities = securities

s1 = Security('name1')
s2 = Security('name2')
u = Universe([s1, s2])
I would like my Universe Class to be able to use usual list features such as enumerate(), len(), __getitem__()... :
enumerate(u)
len(u)
u[0]
So I defined my Class as:
class Universe(list, object):
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        self.securities = securities
It seems to work, but is it the appropriate pythonic way to do it ?
[EDIT]
The above solution does not work as I wish when I subset the list:
>>> s1 = Security('name1')
>>> s2 = Security('name2')
>>> s3 = Security('name3')
>>> u = Universe([s1, s2, s3])
>>> sub_u = u[0:2]
>>> type(u)
<class '__main__.Universe'>
>>> type(sub_u)
<type 'list'>
I would like my variable sub_u to remain of type Universe.
You don't have to actually be a list to use those features. That's the whole point of duck typing. Anything that defines __getitem__(self, i) automatically handles x[i], for i in x, iter(x), enumerate(x), and various other things (iteration calls it with 0, 1, 2, ... until it raises IndexError). Define __len__(self) as well, and len(x), list(x), etc. also work. Or you can define __iter__ instead of __getitem__. Or both. It depends on exactly how list-y you want to be.
The documentation on Python's special methods explains what each one is for, and organizes them pretty nicely.
For example:
class FakeList(object):
    def __getitem__(self, i):
        return -i

fl = FakeList()
print(fl[20])
for i, e in enumerate(fl):
    print(i)
    if e < -2: break
No list in sight.
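A bounded variant makes the protocol a little clearer. In this sketch (the Squares class is invented for illustration), __getitem__ raises IndexError past the end, which is what tells iteration to stop, and __len__ is added so that len() and reversed() work too.

```python
class Squares(object):
    """Sequence-like object: item i is i squared, for 0 <= i < n."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)  # signals the end to for-loops
        return i * i


sq = Squares(4)
as_list = list(sq)              # iteration via __getitem__
backwards = list(reversed(sq))  # needs __len__ and __getitem__
contains_nine = 9 in sq         # membership falls back to iteration
```

Slices and negative indexes are not handled here; a real class would need to decide what to do with them.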
If you actually have a real list and want to represent its data as your own, there are two ways to do that: delegation, and inheritance. Both work, and both are appropriate in different cases.
If your object really is a list plus some extra stuff, use inheritance. If you find yourself stepping on the base class's behavior, you may want to switch to delegation anyway, but at least start with inheritance. This is easy:
class Universe(list): # don't add object also, just list
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        # don't also store `securities`--you already have `self`!
You may also want to override __new__, which allows you to get the iter(securities) into the list at creation time rather than initialization time, but this doesn't usually matter for a list. (It's more important for immutable types like str.)
If the fact that your object owns a list rather than being one is inherent in its design, use delegation.
The simplest way to delegate is explicitly. Define the exact same methods you'd define to fake being a list, and make them all just forward to the list you own:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)

    def __getitem__(self, index):
        return self.securities[index] # or .__getitem__(index) if you prefer

    # ... etc.
You can also do delegation through __getattr__:
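class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)

    # no __getitem__, __len__, etc.
    # Caveat: on new-style classes (all classes in Python 3), implicit
    # special-method lookups such as u[0] or len(u) bypass __getattr__;
    # this only serves explicit attribute access such as u.__len__().
    def __getattr__(self, name):
        if name in ('__getitem__', '__len__',
                    # and so on
                    ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))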
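One limitation of __getattr__ delegation is worth checking before relying on it: for new-style classes (every class in Python 3), the interpreter looks up special methods on the type, not the instance, so len(x) and x[i] never reach __getattr__. The sketch below uses a made-up Proxy class to show which calls are forwarded and which are not.

```python
class Proxy:
    def __init__(self, items):
        self._items = list(items)

    def __getattr__(self, name):
        if name == '_items':
            # Guard against infinite recursion if _items is not set yet.
            raise AttributeError(name)
        return getattr(self._items, name)


p = Proxy([1, 2, 2])

count = p.count(2)       # ordinary method: forwarded to the list
explicit = p.__len__()   # explicit attribute access: forwarded too

try:
    len(p)               # implicit lookup on type(p): not forwarded
    implicit_ok = True
except TypeError:
    implicit_ok = False
```

So __getattr__ works well for ordinary methods such as append or count, while __getitem__, __len__, __iter__ and similar need to be defined explicitly on the class.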
Note that many of list's methods will return a new list. If you want them to return a new Universe instead, you need to wrap those methods. But keep in mind that some of those methods are binary operators—for example, should a + b return a Universe only if a is one, or only if both are, or if either are?
Also, __getitem__ is a little tricky, because it can return either a list or a single object, and you only want to wrap the former in a Universe. You can do that by checking the return value for isinstance(ret, list), or by checking the index for isinstance(index, slice); which one is appropriate depends on whether you can have lists as elements of a Universe, and whether they should be treated as a list or as a Universe when extracted. Plus, if you're using inheritance, in Python 2, you also need to wrap the deprecated __getslice__ and friends, because list does support them (although __getslice__ always returns a sub-list, not an element, so it's pretty easy).
Once you decide those things, the implementations are easy, if a bit tedious. Here are examples for all three versions, using __getitem__ because it's tricky, and the one you asked about in a comment. I'll show a way to use generic helpers for wrapping, even though in this case you may only need it for one method, so it may be overkill.
Inheritance:
class Universe(list): # don't add object also, just list
    @classmethod
    def _wrap_if_needed(cls, value):
        if isinstance(value, list):
            return cls(value)
        else:
            return value

    def __getitem__(self, index):
        ret = super(Universe, self).__getitem__(index)
        return self._wrap_if_needed(ret)
Explicit delegation:
class Universe(object):
    # same _wrap_if_needed
    def __getitem__(self, index):
        ret = self.securities.__getitem__(index)
        return self._wrap_if_needed(ret)
Dynamic delegation:
import functools

class Universe(object):
    # same _wrap_if_needed
    @classmethod
    def _wrap_func(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return cls._wrap_if_needed(func(*args, **kwargs))
        return wrapper

    def __getattr__(self, name):
        if name in ('__getitem__',):
            return self._wrap_func(getattr(self.securities, name))
        elif name in ('__len__',
                      # and so on
                      ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
As I said, this may be overkill in this case, especially for the __getattr__ version. If you just want to override one method, like __getitem__, and delegate everything else, you can always define __getitem__ explicitly, and let __getattr__ handle everything else.
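For the specific case raised in the question's edit (slices should stay Universe objects), a minimal Python 3 sketch of the inheritance route might look like this. It checks the index rather than the return value, and it also makes a + b return a Universe whenever the left operand is one; both are policy choices assumed for the example.

```python
class Security:
    def __init__(self, name):
        self.name = name


class Universe(list):
    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            # Slices come back as plain lists; re-wrap them.
            return type(self)(result)
        return result

    def __add__(self, other):
        # Assumed policy: the left operand decides the result type.
        return type(self)(super().__add__(other))


s1, s2, s3 = Security('name1'), Security('name2'), Security('name3')
u = Universe([s1, s2, s3])
sub_u = u[0:2]
combined = u + [s1]
```

Other list methods that build new lists (for example copy() or multiplication) would still return plain lists unless they are wrapped in the same way.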
If you find yourself doing this kind of wrapping a lot, you can write a function that generates wrapper classes, or a class decorator that lets you write skeleton wrappers and fills in the details, etc. Because the details depend on your use case (all those issues I mentioned above that can go one way or the other), there's no one-size-fits-all library that just magically does what you want, but there are a number of recipes on ActiveState that show more complete details—and there are even a few wrappers in the standard library source.
That is a reasonable way to do it, although you don't need to inherit from both list and object. list alone is enough. Also, if your class is a list, you don't need to store self.securities; it will be stored as the contents of the list.
However, depending on what you want to use your class for, you may find it easier to define a class that stores a list internally (as you were storing self.securities), and then define methods on your class that (sometimes) pass through to the methods of this stored list, instead of inheriting from list. The Python builtin types don't define a rigorous interface in terms of which methods depend on which other ones (e.g., whether append depends on insert), so you can run into confusing behavior if you try to do any nontrivial manipulations of the contents of your list-class.
Edit: As you discovered, any operation that returns a new list falls into this category. If you subclass list without overriding its methods and then call methods on your object (explicitly or implicitly), the underlying list methods will be called. These methods are hardcoded to return a plain Python list and do not check what the actual class of the object is, so they will return a plain Python list.

Python Overriding String __hash__

I'm trying to create a custom hashing function for strings. I want to hash strings by their character frequency by weight. So that hi and ih will yield the same hash. Can I override __hash__?
Or is creating a wrapper class that holds the string and overriding __hash__ and __eq__ the only way?
You want a derived type with different equality semantics. Usually the approach taken will be to define how equality works, then build the hash method from the structures derived there, since it's necessary that the hash agree with equality. That might be:
import collections

class FrequencyString(str):
    @property
    def normalized(self):
        try:
            return self._normalized
        except AttributeError:
            self._normalized = normalized = ''.join(sorted(collections.Counter(self).elements()))
            return normalized

    def __eq__(self, other):
        return self.normalized == other.normalized

    def __hash__(self):
        return hash(self.normalized)
Your assumption is right: you cannot override the methods of the built-in base classes in Python. Although you can, of course, override what str() will do, it won't work for string literals.
If you are writing code for pre-python 2.2 look at the UserString class if you want to create your own: http://docs.python.org/2/library/userdict.html#module-UserString
Otherwise you can simply inherit str or unicode
In your case, if you want to use it as a dict key, override both __hash__ and __eq__: a dict compares keys with == once their hashes match, so __hash__ alone is not enough. For other comparisons you would also have to override the rich comparison methods (or __cmp__ in Python 2).
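The wrapper-class alternative mentioned in the question can be sketched as follows. FrequencyKey is a hypothetical name, and "character frequency" is interpreted here as the multiset of characters, so that 'hi' and 'ih' compare equal while 'hii' does not.

```python
from collections import Counter


class FrequencyKey:
    """Wraps a string; equality and hash depend only on character counts."""

    def __init__(self, text):
        self.text = text
        # frozenset of (char, count) pairs: hashable and order-independent
        self._freq = frozenset(Counter(text).items())

    def __eq__(self, other):
        if not isinstance(other, FrequencyKey):
            return NotImplemented
        return self._freq == other._freq

    def __hash__(self):
        return hash(self._freq)


table = {FrequencyKey("hi"): "found"}
hit = table.get(FrequencyKey("ih"))    # same characters, same counts
miss = table.get(FrequencyKey("hii"))  # different counts
```

Keeping __eq__ and __hash__ derived from the same value (_freq) is what guarantees that equal keys always hash alike.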
You can inherit from str, but since those are immutable you have to subclass them in a slightly different way. Most likely you will want to create new ones from existing strings, so you must also override the __new__ method. You may also have to put in extra special methods to defeat the optimizations that Python does.
Here is an example of subclassing built-in str, the mapstr object that allows easy substitution placeholders in forms.

Object directing to a property when accessed as an iterable

I'm trying to figure out if there's an elegant and concise way to have a class accessing one of its own properties when "used" as a dictionary, basically redirecting all the methods that'd be implemented in an ordered dictionary to one of its properties.
Currently I'm inheriting from IterableUserDict and explicitly setting its data to another property, and it seems to be working, but I know that UserDict is considered sort of old, and I'm concerned I might be overlooking something.
What I have:
class ConnectionInterface(IterableUserDict):
    def __init__(self, hostObject):
        self._hostObject = hostObject
        self.ports = odict.OrderedDict()
        self.inputPorts = odict.OrderedDict()
        self.outputPorts = odict.OrderedDict()
        self.data = self.ports
This way I expect the object to behave and respond (and be used) the way I mean it to, except I want to get a freebie ordered dictionary behaviour on its property "ports" when it's iterated, items are gotten by key, something is looked up ala if this in myObject, and so on.
Any advice welcome, the above seems to be working fine, but I have an odd itch that I might be missing something.
Thanks in advance.
In the end inheriting IterableUserDict and setting self.data explicitly worked out to what I needed and hasn't had any unforeseen consequences or added dodgyness when serialising and deserialising.
Sticking to my original solution I guess and can recommend it if anybody needs a simple and full fledged dict like behaviour on a selected subset of data in their own objects.
It's fairly simple and doesn't have particularly strict scalability or complexity requirements stressing it though.
Sure, you can do this. Dictionary-style access (obj[key]) goes through the magic methods __getitem__ and __setitem__, so you can implement them something like this:
def __getitem__(self, key):
    return self.ports[key]

def __setitem__(self, key, value):
    self.ports[key] = value
If you want implementation for .keys() and .values() and stuff, just write them in this style:
def keys(self):
    return self.ports.keys()
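Rather than writing keys(), values() and the rest by hand, one option in Python 3 is to inherit from collections.abc.MutableMapping and forward only the five core methods to self.ports. This is a sketch under that assumption, using the standard library's OrderedDict in place of the odict module from the question and snake_case attribute names chosen for the example.

```python
from collections import OrderedDict
from collections.abc import MutableMapping


class ConnectionInterface(MutableMapping):
    def __init__(self, host_object):
        self._host_object = host_object
        self.ports = OrderedDict()
        self.input_ports = OrderedDict()
        self.output_ports = OrderedDict()

    # The five methods below forward to self.ports; MutableMapping
    # derives keys(), values(), items(), get(), update(), `in`, etc.
    def __getitem__(self, key):
        return self.ports[key]

    def __setitem__(self, key, value):
        self.ports[key] = value

    def __delitem__(self, key):
        del self.ports[key]

    def __iter__(self):
        return iter(self.ports)

    def __len__(self):
        return len(self.ports)


ci = ConnectionInterface(host_object=None)
ci['in1'] = 'port-a'
ci['out1'] = 'port-b'
order = list(ci)             # insertion order, as kept by OrderedDict
has_in1 = 'in1' in ci
as_plain_dict = dict(ci)
```

Items set through the object land in ports directly, so the two views never drift apart.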
