Lookup table for unhashables in Python

I need to create a mapping from objects of my own custom class (derived from dict) to objects of another custom class. As I see it there are two ways of doing this:
I can make the objects hashable. I'm not sure how I would do this. I know I can implement __hash__() but I'm unsure how to actually calculate the hash (which should be an integer).
Since my objects can be compared I can make a list [(myobj, myotherobj)] and then implement a lookup which finds the tuple where the first item in the tuple is the same as the lookup key. Implementing this is trivial (the number of objects is small) but I want to avoid reinventing the wheel if something like this already exists in the standard library.
It seems to me that wanting to look up unhashables would be a common problem, so I assume someone has already solved it. Any suggestions on how to implement __hash__() for a dict-like object, or is there some other standard way of making lookup tables of unhashables?
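For reference, the second option from the question (a list of pairs searched by equality) can be sketched as below. The class name AssocList is made up for illustration and nothing like it is taken from the standard library; lookups are linear scans, so this only suits a small number of entries.

```python
class AssocList(object):
    """Tiny mapping for unhashable keys: a list of (key, value) pairs.

    Keys are compared with ==, so they only need to support equality.
    """

    def __init__(self):
        self._pairs = []

    def __setitem__(self, key, value):
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)  # overwrite the existing entry
                return
        self._pairs.append((key, value))

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return len(self._pairs)


lookup = AssocList()
lookup[{"a": 1}] = "first"     # plain dicts are unhashable but comparable
lookup[{"a": 1}] = "replaced"  # an equal key overwrites the old value
lookup[{"b": 2}] = "second"
```

Here lookup[{"a": 1}] finds the entry even though the key passed in is a different object from the one stored.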

Mappings with mutable objects as keys are generally difficult. Is that really what you want? If you consider your objects to be immutable (there is no way to really enforce immutability in Python), or you know they will not be changed while they are used as keys in a mapping, you can implement your own hash function for them in several ways. For instance, if your object only has hashable data members, you can return the hash of a tuple of all data members as the object's hash.
If your object is a dict-like, you can use the hash of a frozenset of all key-value-pairs.
def __hash__(self):
    # Python 2; in Python 3 use self.items() instead of self.iteritems()
    return hash(frozenset(self.iteritems()))
This only works if all values are hashable. In order to save recalculations of the hashes (which would be done on every lookup), you can cache the hash-value and just recalculate it if some dirty-flag is set.
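A minimal sketch of the caching idea, assuming a dict subclass whose values are all hashable. The name HashableDict and the use of None as the dirty flag are choices made for this example only. Only __setitem__ and __delitem__ reset the cache here; update, pop, clear and the other mutators would need the same treatment in real code.

```python
class HashableDict(dict):
    """dict subclass with a cached hash (hypothetical name).

    Assumes all values are hashable and that the object is not mutated
    while it is in use as a key.
    """

    def __init__(self, *args, **kwargs):
        super(HashableDict, self).__init__(*args, **kwargs)
        self._cached_hash = None  # None doubles as the dirty flag

    def __hash__(self):
        if self._cached_hash is None:
            # Order-independent hash over all key-value pairs
            self._cached_hash = hash(frozenset(self.items()))
        return self._cached_hash

    def __setitem__(self, key, value):
        super(HashableDict, self).__setitem__(key, value)
        self._cached_hash = None  # contents changed, recompute on next use

    def __delitem__(self, key):
        super(HashableDict, self).__delitem__(key)
        self._cached_hash = None


a = HashableDict(x=1, y=2)
b = HashableDict(y=2, x=1)
table = {a: "value"}
```

Because a and b compare equal and hash equal, table[b] finds the entry stored under a.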

A simple solution seems to be to do lookup[id(myobj)] = myotherobj instead of lookup[myobj] = myotherobj. Any comments on this approach?
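One thing to keep in mind with the id() approach is that it looks objects up by identity, not by equality. The sketch below uses a stand-in class MyDict (hypothetical, in place of the custom dict-derived class) to show the difference. It also stores the key object next to the value, since an id() value is only guaranteed unique while the object is alive.

```python
class MyDict(dict):
    pass  # stand-in for the custom dict-derived class


key = MyDict(a=1)
lookup = {}
# Keep a reference to the key object so it stays alive; id() values may be
# reused after an object has been garbage collected.
lookup[id(key)] = (key, "payload")

same_object = key
equal_copy = MyDict(a=1)  # equal to key, but a different object

found = lookup[id(same_object)][1]
missing = id(equal_copy) in lookup
```

So this works if you always look up with the very same object, and fails silently if you look up with an equal copy.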

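The following should work if you're not storing any additional unhashable objects in your custom class. Note that the result of items() is itself unhashable, so it has to be converted to a frozenset first:
def __hash__(self):
    return hash(frozenset(self.items()))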

Here is an implementation of a frozendict, taken from http://code.activestate.com/recipes/414283/:
class frozendict(dict):
    def _blocked_attribute(obj):
        raise AttributeError("A frozendict cannot be modified.")
    _blocked_attribute = property(_blocked_attribute)
    __delitem__ = __setitem__ = clear = _blocked_attribute
    pop = popitem = setdefault = update = _blocked_attribute
    def __new__(cls, *args):
        new = dict.__new__(cls)
        dict.__init__(new, *args)
        return new
    def __init__(self, *args):
        pass
    def __hash__(self):
        try:
            return self._cached_hash
        except AttributeError:
            h = self._cached_hash = hash(tuple(sorted(self.items())))
            return h
    def __repr__(self):
        return "frozendict(%s)" % dict.__repr__(self)
I would replace tuple(sorted(self.items())) by frozenset(self.iteritems()) as in Spacecowboy's answer. And consider adding __slots__ = ("_cached_hash",) to the class.


Best way to create a set of named objects in Python

I find myself frequently creating sets of named objects where each object has a unique name. I implement these as dicts whose keys are derived from myObject.name. But it feels a bit clunky to keep the name in two places.
My typical approach looks like this:
class NamedObject(object):
    ITEMS = {}
    def __init__(self, name, ...other arguments...):
        self.name = name
        ...more initialization...
    @classmethod
    def create_named_object(cls, name, ...other arguments...):
        obj = cls(name, ...other arguments...)
        cls.ITEMS[name] = obj
    @classmethod
    def find_object_by_name(cls, name):
        return cls.ITEMS.get(name, None)
    @classmethod
    def filter_objects(cls, predicate):
        return [e for e in cls.ITEMS.values() if predicate(e)]
I know I could create a generalized class to handle this, but is there a more naturally Pythonic way to do this?
There is no more generalised support in the standard library, no, nor is there any more 'Pythonic' way to achieve this than using a dictionary.
What you are doing is providing a lookup table index, and indices generally require some duplication of data. You are trading memory for access speed. But indices are use-case specific and either trivially implemented with a mapping, or too specific to the application to be generalisable to the level where adding them to the standard library makes sense.
At least in Python, the string value for the name is not actually duplicated; you just add more references to it; once from the instance and another time from the ITEMS dictionary.
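The point about references can be checked directly. The sketch below is a cut-down version of the class from the question (the extra constructor arguments are left out, and a return value is added to the factory for convenience); it shows that the dictionary key and the instance attribute are the same string object rather than two copies.

```python
class NamedObject(object):
    ITEMS = {}

    def __init__(self, name):
        self.name = name

    @classmethod
    def create_named_object(cls, name):
        obj = cls(name)
        cls.ITEMS[name] = obj
        return obj


# Build the string at runtime so the check does not depend on literal interning
name = "".join(["wid", "get"])
obj = NamedObject.create_named_object(name)
key = next(iter(NamedObject.ITEMS))
shared = key is obj.name  # same object, referenced from two places
```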

Pool of hashable objects

I've made a highly recursive, hashable (assumed immutable) datastructure. Thus it would be nice to have only one instance of each object (if objectA == objectB, then there is no reason not to have objectA is objectB).
I have tried solving it by defining a custom __new__(). It creates the requested object, then checks if it is in a dictionary (stored as a class variable). The object is added to the dict if necessary and then returned. If it is already in the dict, the version in the dict is returned and the newly created instance passes out of scope.
This solution works, but
I have to have a dict where the value at each key is the same object. What I really need is to extract an object from a set when I "show" the set an equal object. Is there a more elegant way of doing this?
Is there a builtin/canonical solution to my problem in Python? Such as a class I can inherit from or something....
My current implementation is along these lines:
class NoDuplicates(object):
    pool = dict()
    def __new__(cls, *args):
        new_instance = object.__new__(cls)
        new_instance.__init__(*args)
        if new_instance in cls.pool:
            return cls.pool[new_instance]
        else:
            cls.pool[new_instance] = new_instance
            return new_instance
I am not a programmer by profession, so I suspect this corresponds to some well known technique or concept. The most similar concepts that come to mind are memoization and singleton.
One subtle problem with the above implementation is that __init__ is always called on the return value from __new__. I made a metaclass to modify this behaviour. But that ended up causing a lot of trouble since NoDuplicates also inherits from dict.
First, I would use a factory instead of overriding __new__. See Python's use of __new__ and __init__?.
Second, you can use tuples of arguments needed to create an object as dictionary keys (if same arguments produce same objects, of course), so you won't need to create an actual (expensive to create) object instance.

Is the way my class inherits list class methods pythonicly correct?

A little example will help clarify my question:
I define two classes: Security and Universe, which I would like to behave as a list of Security objects.
Here is my example code:
class Security(object):
    def __init__(self, name):
        self.name = name
class Universe(object):
    def __init__(self, securities):
        self.securities = securities
s1 = Security('name1')
s2 = Security('name2')
u = Universe([s1, s2])
I would like my Universe Class to be able to use usual list features such as enumerate(), len(), __getitem__()... :
enumerate(u)
len(u)
u[0]
So I defined my Class as:
class Universe(list, object):
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        self.securities = securities
It seems to work, but is it the appropriate pythonic way to do it ?
[EDIT]
The above solution does not work as I wish when I subset the list:
>>> s1 = Security('name1')
>>> s2 = Security('name2')
>>> s3 = Security('name3')
>>> u = Universe([s1, s2, s3])
>>> sub_u = u[0:2]
>>> type(u)
<class '__main__.Universe'>
>>> type(sub_u)
<type 'list'>
I would like my variable sub_u to remain of type Universe.
You don't have to actually be a list to use those features. That's the whole point of duck typing. Anything that defines __getitem__(self, i) automatically handles x[i], for i in x, iter(x), enumerate(x), and various other things. Also define __len__(self) and len(x), list(x), etc. also work. Or you can define __iter__ instead of __getitem__. Or both. It depends on exactly how list-y you want to be.
The documentation on Python's special methods explains what each one is for, and organizes them pretty nicely.
For example:
class FakeList(object):
    def __getitem__(self, i):
        return -i
fl = FakeList()
print(fl[20])
for i, e in enumerate(fl):
    print(i)
    if e < -2: break
No list in sight.
If you actually have a real list and want to represent its data as your own, there are two ways to do that: delegation, and inheritance. Both work, and both are appropriate in different cases.
If your object really is a list plus some extra stuff, use inheritance. If you find yourself stepping on the base class's behavior, you may want to switch to delegation anyway, but at least start with inheritance. This is easy:
class Universe(list): # don't add object also, just list
    def __init__(self, securities):
        super(Universe, self).__init__(iter(securities))
        # don't also store `securities`--you already have `self`!
You may also want to override __new__, which allows you to get the iter(securities) into the list at creation time rather than initialization time, but this doesn't usually matter for a list. (It's more important for immutable types like str.)
If the fact that your object owns a list rather than being one is inherent in its design, use delegation.
The simplest way to delegate is explicitly. Define the exact same methods you'd define to fake being a list, and make them all just forward to the list you own:
class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)
    def __getitem__(self, index):
        return self.securities[index] # or .__getitem__(index) if you prefer
    # ... etc.
You can also do delegation through __getattr__:
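class Universe(object):
    def __init__(self, securities):
        self.securities = list(securities)
    # no __getitem__, __len__, etc.
    # Note: __getattr__ only sees explicit attribute access (u.append,
    # u.__len__()); on new-style classes, implicit special-method calls
    # such as u[0] or len(u) bypass it.
    def __getattr__(self, name):
        if name in ('__getitem__', '__len__',
                    # and so on
                    ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))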
Note that many of list's methods will return a new list. If you want them to return a new Universe instead, you need to wrap those methods. But keep in mind that some of those methods are binary operators—for example, should a + b return a Universe only if a is one, or only if both are, or if either are?
Also, __getitem__ is a little tricky, because it can return either a list or a single object, and you only want to wrap the former in a Universe. You can do that by checking the return value for isinstance(ret, list), or by checking the index for isinstance(index, slice); which one is appropriate depends on whether you can have lists as elements of a Universe, and whether they should be treated as a list or as a Universe when extracted. Plus, if you're using inheritance, in Python 2, you also need to wrap the deprecated __getslice__ and friends, because list does support them (although __getslice__ always returns a sub-list, not an element, so it's pretty easy).
Once you decide those things, the implementations are easy, if a bit tedious. Here are examples for all three versions, using __getitem__ because it's tricky, and the one you asked about in a comment. I'll show a way to use generic helpers for wrapping, even though in this case you may only need it for one method, so it may be overkill.
Inheritance:
class Universe(list): # don't add object also, just list
    @classmethod
    def _wrap_if_needed(cls, value):
        if isinstance(value, list):
            return cls(value)
        else:
            return value
    def __getitem__(self, index):
        ret = super(Universe, self).__getitem__(index)
        return self._wrap_if_needed(ret)
Explicit delegation:
class Universe(object):
    # same _wrap_if_needed
    def __getitem__(self, index):
        ret = self.securities.__getitem__(index)
        return self._wrap_if_needed(ret)
Dynamic delegation:
import functools

class Universe(object):
    # same _wrap_if_needed
    @classmethod
    def _wrap_func(cls, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return cls._wrap_if_needed(func(*args, **kwargs))
        return wrapper
    def __getattr__(self, name):
        if name in ('__getitem__',):
            return self._wrap_func(getattr(self.securities, name))
        elif name in ('__len__',
                      # and so on
                      ):
            return getattr(self.securities, name)
        raise AttributeError("'{}' object has no attribute '{}'"
                             .format(self.__class__.__name__, name))
As I said, this may be overkill in this case, especially for the __getattr__ version. If you just want to override one method, like __getitem__, and delegate everything else, you can always define __getitem__ explicitly, and let __getattr__ handle everything else.
If you find yourself doing this kind of wrapping a lot, you can write a function that generates wrapper classes, or a class decorator that lets you write skeleton wrappers and fills in the details, etc. Because the details depend on your use case (all those issues I mentioned above that can go one way or the other), there's no one-size-fits-all library that just magically does what you want, but there are a number of recipes on ActiveState that show more complete details—and there are even a few wrappers in the standard library source.
That is a reasonable way to do it, although you don't need to inherit from both list and object. list alone is enough. Also, if your class is a list, you don't need to store self.securities; it will be stored as the contents of the list.
However, depending on what you want to use your class for, you may find it easier to define a class that stores a list internally (as you were storing self.securities), and then define methods on your class that (sometimes) pass through to the methods of this stored list, instead of inheriting from list. The Python builtin types don't define a rigorous interface in terms of which methods depend on which other ones (e.g., whether append depends on insert), so you can run into confusing behavior if you try to do any nontrivial manipulations of the contents of your list-class.
Edit: As you discovered, any operation that returns a new list falls into this category. If you subclass list without overriding its methods and then call methods on your object (explicitly or implicitly), the underlying list methods will be called. These methods are hardcoded to return a plain Python list and do not check what the actual class of the object is, so they will return a plain Python list.
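To make the slicing case concrete, here is a minimal sketch of the inheritance variant that checks the index for isinstance(index, slice). It assumes Python 3, where slicing always goes through __getitem__ (Python 2 would also need __getslice__, as mentioned above). Wrapping __add__ so that it returns a Universe whenever the left operand is one is just one of the possible choices for the binary operators.

```python
class Security(object):
    def __init__(self, name):
        self.name = name


class Universe(list):
    def __getitem__(self, index):
        result = super(Universe, self).__getitem__(index)
        if isinstance(index, slice):
            # Slices produce a plain list; rewrap it in our own type
            return type(self)(result)
        return result  # a single element is returned unchanged

    def __add__(self, other):
        # Design choice for this sketch: the left operand decides the type
        return type(self)(super(Universe, self).__add__(other))


u = Universe([Security("name1"), Security("name2"), Security("name3")])
sub_u = u[0:2]
first = u[0]
combined = u + [Security("name4")]
```

With this, type(sub_u) is Universe while u[0] is still a plain Security.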

Python idiom for dict-able classes?

I want to do something like this:
class Dictable:
    def dict(self):
        raise NotImplementedError
class Foo(Dictable):
    def dict(self):
        return {'bar1': self.bar1, 'bar2': self.bar2}
Is there a more pythonic way to do this? For example, is it possible to overload the built-in conversion dict(...)? Note that I don't necessarily want to return all the member variables of Foo, I'd rather have each class decide what to return.
Thanks.
The Pythonic way depends on what you want to do. If your objects shouldn't be regarded as mappings in their own right, then a dict method is perfectly fine, but you shouldn't "overload" dict to handle dictables. Whether or not you need the base class depends on whether you want to do isinstance(x, Dictable); note that hasattr(x, "dict") would serve pretty much the same purpose.
If the classes are conceptually mappings of keys to values, then implementing the Mapping protocol seems appropriate. I.e., you'd implement
__getitem__
__iter__
__len__
and inherit from collections.Mapping (collections.abc.Mapping in Python 3) to get the other methods. Then you get dict(Foo()) for free. Example:
from collections import Mapping  # Python 3: from collections.abc import Mapping

class Foo(Mapping):
    def __getitem__(self, key):
        if key not in ("bar1", "bar2"):
            raise KeyError("{} not found".format(repr(key)))
        return getattr(self, key)
    def __iter__(self):
        yield "bar1"
        yield "bar2"
    def __len__(self):
        return 2
Firstly, look at the abstract base classes in collections (collections.abc in Python 3), which document Python's abstract base class protocols (roughly equivalent to interfaces in static languages).
Then, decide if you want to write your own ABC or make use of an existing one; in this case, Mapping might be what you want.
Note that although the dict constructor (i.e. dict(my_object)) is not overrideable, if it encounters an iterable object that yields a sequence of key-value pairs, it will construct a dict from that; i.e. (Python 2; for Python 3 replace iteritems with items):
def __iter__(self):
    return {'bar1': self.bar1, 'bar2': self.bar2}.iteritems()
However, if your classes are intended to behave like a dict you shouldn't do this as it's different from the expected behaviour of a Mapping instance, which is to iterate over keys, not key-value pairs. In particular it would cause for .. in to behave incorrectly.
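As an alternative that avoids redefining __iter__, the dict constructor also accepts any object that has a keys() method and supports obj[key]. The sketch below relies on that behaviour; the class and attribute names are hypothetical, and the object is deliberately not a full Mapping, it only offers enough for dict() to work.

```python
class Foo(object):
    def __init__(self, bar1, bar2, secret):
        self.bar1 = bar1
        self.bar2 = bar2
        self.secret = secret  # deliberately not exported

    def keys(self):
        # Each class decides which names it wants to expose
        return ["bar1", "bar2"]

    def __getitem__(self, key):
        if key not in self.keys():
            raise KeyError(key)
        return getattr(self, key)


as_dict = dict(Foo(1, 2, "hidden"))
```

Only the names returned by keys() end up in as_dict, so each class stays in control of what it exports.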
Most of the answers here are about making your class behave like a dict, which isn't actually what you asked. If you want to express the idea, "I am a class that can be turned into a dict," I would simply define a bunch of classes and have them each implement .dict(). Python favors duck-typing (what an object can do) over what an object is. The ABC doesn't add much. Documentation serves the same purpose.
You can certainly overload dict() but you almost never want to! Too many aspects of the standard library depend upon dict being available and you will break most of its functionality. You can probably do something like this though:
class Dictable:
    def dict(self):
        return self.__dict__

Override reversed(...) in Python 2.5

I need a custom __reversed__ method for my class that I am deploying on App Engine, so it needs to work with Python 2.5. Is there a __future__ import or a workaround I could use?
Subclassing list won't work, as I need my class to be a subclass of dict.
EDIT:
Using OrderedDict will not solve the problem, because the dict keys are not the same as the list items.
This is the object I'm trying to create:
My object needs to provide the same interface as a list, i.e. support iter(obj) and reversed(obj).
The elements must be instances of a special third party class.
Each element is associated with a key.
Internally, I need to access these objects using their keys. That's why I'd put them in a mapping.
I've revised my implementation to be a list subclass instead of a dict subclass, so here's what I have now:
class Foo(list):
    pat = {}
    def __init__(self):
        for app in APPS: # these are strings
            obj = SpecialClass(app)
            self.append(obj)
            self.pat[app] = obj
    def __getitem__(self, item):
        # Use object as a list
        if isinstance(item, int):
            return super(Foo, self).__getitem__(item)
        # Use object as a dict
        if item not in self.pat:
            # Never raise a KeyError
            self.pat[item] = SpecialClass(None)
        return self.pat[item]
    def __setitem__(self, item, value):
        # Integer indices go to the list, everything else to the mapping
        if isinstance(item, int):
            return super(Foo, self).__setitem__(item, value)
        return self.pat.__setitem__(item, value)
EDIT 2:
Now that my class is a subclass of list, my problem is resolved.
__reversed__ isn't supported in 2.5, so your only option if you really need to customize the reversed order of your collection, is to modify the places that you call reversed to use something else.
But I'm curious: if you are subclassing dict, then the order of items is arbitrary anyway, so what does reversed mean in this case?
Creating a custom __reversed__ is only possible since 2.6, so you can't simply implement that and have reversed work in 2.5. In 2.5 and below, however, you can still make your custom class work with reversed by implementing the sequence protocol (i.e. implement both __len__ and __getitem__).
A different possibility would be to replace the built-in function reversed with a custom function that treats your custom class differently. This could work like this:
originalReversed = reversed
def myReversed(seq):
    if isinstance(seq, MyCustomClass):
        pass  # do something special
    else:
        return originalReversed(seq)
reversed = myReversed
However, I wouldn't recommend that, as it changes the normal behaviour of a built-in function (obviously) and might confuse other users. So you should rather implement the sequence protocol to make reversed work.
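A minimal sketch of the sequence-protocol route, assuming a simple container that wraps a list. Playlist is a made-up name for illustration; only __len__ and __getitem__ are defined, and no __reversed__ is needed.

```python
class Playlist(object):
    """Hypothetical container; only __len__ and __getitem__ are defined."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]


tracks = Playlist(["a", "b", "c"])
# reversed() falls back to indexing from len(tracks) - 1 down to 0
backwards = list(reversed(tracks))
```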
