Comparing instances of a dict subclass - python

I have subclassed dict to add an extra method (so no overriding).
Now I try to compare two instances of that subclass, and I get something weird:
>>> d1.items() == d2.items()
True
>>> d1.values() == d2.values()
True
>>> d1.keys() == d2.keys()
True
>>> d1 == d2
False
EDIT
That's damn weird... I don't understand it at all! Does anybody have insight into how dict.__eq__ is implemented?
Following is all the code :
# ------ Below is my dict subclass (with no overriding):
class ClassSetDict(dict):

    def subsetget(self, klass, default=None):
        class_sets = set(filter(lambda cs: klass <= cs, self))
        # Eliminate supersets
        for cs1 in class_sets.copy():
            for cs2 in class_sets.copy():
                if cs1 <= cs2 and not cs1 is cs2:
                    class_sets.discard(cs2)
        try:
            best_match = list(class_sets)[0]
        except IndexError:
            return default
        return self[best_match]
# ------ Then an implementation of class sets
class ClassSet(object):
    # Set of classes, allowing to easily calculate inclusions
    # with comparison operators: `A < B` <=> "A strictly included in B"

    def __init__(self, klass):
        self.klass = klass

    def __ne__(self, other):
        return not self == other

    def __gt__(self, other):
        other = self._default_to_singleton(other)
        return not self == other and other < self

    def __le__(self, other):
        return self < other or self == other

    def __ge__(self, other):
        return self > other or self == other

    def _default_to_singleton(self, klass):
        if not isinstance(klass, ClassSet):
            return Singleton(klass)
        else:
            return klass

class Singleton(ClassSet):

    def __eq__(self, other):
        other = self._default_to_singleton(other)
        return self.klass == other.klass

    def __lt__(self, other):
        if isinstance(other, AllSubSetsOf):
            return issubclass(self.klass, other.klass)
        else:
            return False

class AllSubSetsOf(ClassSet):

    def __eq__(self, other):
        if isinstance(other, AllSubSetsOf):
            return self.klass == other.klass
        else:
            return False

    def __lt__(self, other):
        if isinstance(other, AllSubSetsOf):
            return issubclass(self.klass, other.klass) and not other == self
        else:
            return False
# ------ and finally the 2 dicts that don't want to be equal !!!
d1 = ClassSetDict({AllSubSetsOf(object): (int,)})
d2 = ClassSetDict({AllSubSetsOf(object): (int,)})

The problem you're seeing has nothing at all to do with subclassing dict. In fact, this behavior can be seen using a regular dict. The problem is how you have defined the keys you're using. A simple class like:
>>> class Foo(object):
...     def __init__(self, value):
...         self.value = value
...
...     def __eq__(self, other):
...         return self.value == other.value
...
is enough to demonstrate the problem:
>>> f1 = Foo(5)
>>> f2 = Foo(5)
>>> f1 == f2
True
>>> d1 = {f1: 6}
>>> d2 = {f2: 6}
>>> d1.items() == d2.items()
True
>>> d1 == d2
False
What's missing is that you forgot to define __hash__. Every time you change the equality semantics of a class, you should make sure that the __hash__ method agrees with it: when two objects are equal, they must have equal hashes. dict behavior depends strongly on the hash value of keys.
When you inherit from object, you automatically get both __eq__ and __hash__: the former compares object identity, and the latter is derived from the address of the object, so they agree. When you override only __eq__, the class keeps the inherited __hash__, which no longer agrees with it, and dict gets lost.
Simply provide a __hash__ method that combines the hash values of its attributes in a stable way.
>>> class Bar(object):
...     def __init__(self, value):
...         self.value = value
...
...     def __eq__(self, other):
...         return self.value == other.value
...
...     def __hash__(self):
...         return hash((Bar, self.value))
...
>>> b1 = Bar(5)
>>> b2 = Bar(5)
>>> {b1: 6} == {b2: 6}
True
>>>
When using __hash__ in this way, it's also a good idea to make sure that the attributes do not (or better, cannot) change after the object is created. If the hash value changes while the object is stored as a key in a dict, the key will be "lost", and all sorts of weird things can happen (even weirder than the issue you initially asked about).
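The sessions above reflect Python 2 behaviour. In Python 3, a class that defines __eq__ without also defining __hash__ gets its __hash__ set to None, so the same mistake shows up as a TypeError when the object is used as a key, rather than as a silent lookup failure. A minimal sketch (the class name Foo3 is made up for illustration):

```python
class Foo3:
    """Defines __eq__ only; Python 3 then sets __hash__ to None."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Foo3):
            return NotImplemented
        return self.value == other.value


try:
    d = {Foo3(5): 6}       # building the dict needs hash(Foo3(5))
except TypeError as exc:
    message = str(exc)     # reports that Foo3 is unhashable
else:
    message = None

print(Foo3.__hash__)
print(message)
```

Adding a __hash__ like the one in Bar makes the class usable as a key again.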

This most probably depends on some implementation details; in fact, a basic subclass doesn't show this problem:
>>> class D(dict):
...     def my_method(self):
...         pass
...
>>> d1 = D(alpha=123)
>>> d1
{'alpha': 123}
>>> d2 = D(alpha=123)
>>> d1.items() == d2.items()
True
>>> d1.values() == d2.values()
True
>>> d1.keys() == d2.keys()
True
>>> d1 == d2
True

Your instances of AllSubSetsOf are used as dict keys, so they should have a __hash__ method.
Try adding a
    def __hash__(self):
        return hash(self.klass)
method to either ClassSet or AllSubSetsOf.

I do so hate it when people say things like "The dicts contain funky stuff, so it wouldn't help much to show" since it is precisely the nature of the funky stuff that matters here.
The first thing to note is that if you had exactly the opposite result it wouldn't be surprising at all: i.e. if d1.items(), d1.values(), d1.keys() were not equal to d2.items(), d2.values(), d2.keys() you could quite happily have d1 == d2. That's because dictionaries don't compare by comparing items or keys, they use a different technique which (I think) is the source of your problem.
Effectively comparing two dictionaries first checks they are the same length, then goes through all the keys in the first dictionary to find the smallest one that doesn't match the key/value from the second dictionary. So what we are actually looking for is a case where d1.keys()==d2.keys() but for some k either k not in d1 or k not in d2 or d1[k] != d2[k].
I think the clue may be in the objects you are using as dictionary keys. If they are mutable you can store an object in the dictionary but then mutate it and it becomes inaccessible through normal means. The keys() method may still find it though and in that case you could get what you are seeing.
Now that you've updated the question with the AllSubSetsOf class: the missing __hash__() method is the problem. Two different instances can compare equal, AllSubSetsOf(object) == AllSubSetsOf(object), but the default hash is based on the object's address, so their hash values will differ.
>>> class AllSubSetsOf(object):
    def __init__(self, klass):
        self.klass = klass
    def __eq__(self, other):
        if isinstance(other, AllSubSetsOf):
            return self.klass == other.klass
        else:
            return False
    def __lt__(self, other):
        if isinstance(other, AllSubSetsOf):
            return issubclass(self.klass, other.klass) and not other == self
        else:
            return False
>>> a = AllSubSetsOf(object)
>>> b = AllSubSetsOf(object)
>>> a==b
True
>>> hash(a), hash(b)
(2400161, 2401895)
>>>
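The comparison procedure described in this answer can be sketched in pure Python. This is only an illustration of the idea, not CPython's actual implementation; dicts_equal and Key are hypothetical names, and Key keeps the identity-based hash on purpose (spelled out explicitly so that the sketch also runs on Python 3).

```python
def dicts_equal(a, b):
    """Rough model of dict equality: same length, then per-key lookups."""
    if len(a) != len(b):
        return False
    missing = object()
    for key, value in a.items():
        other = b.get(key, missing)   # the lookup goes through hash(key) first
        if other is missing or value != other:
            return False
    return True


class Key(object):
    def __init__(self, klass):
        self.klass = klass

    def __eq__(self, other):
        return isinstance(other, Key) and self.klass == other.klass

    # Deliberately inconsistent with __eq__: hash by identity.
    __hash__ = object.__hash__


d1 = {Key(object): (int,)}
d2 = {Key(object): (int,)}

print(list(d1) == list(d2))   # the keys compare equal pairwise
print(dicts_equal(d1, d2))    # but the hash-based lookup misses
print(d1 == d2)
```

The key lists compare equal because list comparison only uses __eq__, while the dict lookup never gets as far as __eq__ once the hashes differ.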

Related

Comparing two objects using __dict__

Is there ever a reason not to do this to compare two objects:
def __eq__(self, other):
    return self.__dict__ == other.__dict__
as opposed to checking each individual attribute:
def __eq__(self, other):
    return self.get_a() == other.get_a() and self.get_b() == other.get_b() and ...
Initially I had the latter, but figured the former was the cleaner solution.
You can be explicit and concise:
import operator

def __eq__(self, other):
    fetcher = operator.attrgetter("a", "b", "c", "d")
    try:
        return self is other or fetcher(self) == fetcher(other)
    except AttributeError:
        return False
Just comparing the __dict__ attribute (which might not exist if __slots__ is used) leaves you open to the risk that an unexpected attribute exists on the object:
class A:
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return self.__dict__ == other.__dict__

a1 = A(5)
a2 = A(5)
a1.b = 3
assert a1 == a2 # Fails
Some comments:
You should include a self is other check, otherwise, under certain conditions, the same object in memory can compare unequal to itself. Here is a demonstration. The instance-check chrisz mentioned in the comments is a good idea as well.
The dicts of self and other probably contain many more items than the attributes you are manually checking for in the second version. Therefore, the first one will be slower.
(Lastly, but not related to the question, we don't write getters and setters in Python. Access attributes directly with the dot-notation, and if something special needs to happen when getting/setting an attribute, use a property.)
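Putting these comments together, an __eq__ with an identity shortcut, an instance check, and an explicit attribute list might look like the sketch below. The class Point and its attributes x and y are invented for the example, and returning NotImplemented for foreign types is one common convention rather than the only option.

```python
import operator

_fields = operator.attrgetter("x", "y")   # the attributes that define equality


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if self is other:                  # same object: always equal
            return True
        if not isinstance(other, Point):   # let Python try the other operand
            return NotImplemented
        return _fields(self) == _fields(other)

    def __hash__(self):                    # keep the hash consistent with __eq__
        return hash(_fields(self))


p1, p2 = Point(1, 2), Point(1, 2)
p1.note = "extra attribute"   # ignored here, unlike in a __dict__ comparison
print(p1 == p2)
print(p1 == "not a point")
```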

Python: Is it possible to do a "deep class override"?

I want to create a subclass of dict that includes a custom comparison function that applies to all nested dicts. This example class ignores all dict values with the key 'j' at the top level, but doesn't replace lower level dicts when a copy is made:
import copy

p = {'a': 1, 'j': 2, 'c': [{'j':'cat','k':'dog'}]}

class udict(dict):
    def __init__(self, x):
        dict.__init__(self, copy.deepcopy(x))
    def __eq__(self, other):
        return all([self[k]==other[k] for k in set(self.keys())-set('j')])

a = udict(p)
b = udict(p)
a==b # True
b['j'] = 5
a==b # True - 'j' keys are imaginary and invisible
b['a'] = 5
a==b # False
b = udict(p)
b['c'][0]['j'] = 'bird'
a==b # False (should be True, but list contains dicts, not udicts)
I could manually tree-walk arbitrarily deep data structures replacing each dict with a udict, but if I have to walk the data structure anyway, I'll just do the comparison in the recursion without defining a custom class.
So is there a way to define a custom subclass that automatically replaces all embedded instances of the base class?
You can implement the __deepcopy__ method on your custom class (see https://docs.python.org/2/library/copy.html).
You will still have to use recursion, but it still seems easier than anything else you would have to do there:
from copy import deepcopy

def custom_deepcopier(dct, memo=None):
    result = MD()
    for key, value in dct.items():
        if isinstance(value, dict):
            result[key] = MD(value)
        else:
            result[key] = deepcopy(value, memo)
    return result

class MD(dict):
    def __init__(self, x=None):
        if x:
            dict.__init__(self, custom_deepcopier(x))
    def __eq__(self, other):
        ...
    __deepcopy__ = custom_deepcopier
Declared this way, custom_deepcopier serves as the __deepcopy__ method, called automatically when one of your custom dicts is deep-copied, but it can also be bootstrapped with a plain dictionary by calling it as a stand-alone function.
And finally, not directly related to the answer you need: in your real code, consider inheriting from collections.UserDict instead of dict. There are some shortcuts in the native code for dicts that might bring bad surprises in your inherited classes (including in the inherent recursion used for __eq__).
A simpler approach requires no copying of data, and the recursion that replaces selected dicts with a subclass is short, explicit, and easily understandable. The subclass overrides only the equality test, it doesn't need __init__ or __copy__ methods:
class MyDict(dict):
    def __eq__(self, other):
        return <custom equality test result>

def replaceable(var):
    if <dict instance should be replaced by subclass instance>:
        return <dict of instances to be replaced>
    return {}

def replacedict(var):
    if isinstance(var, list):
        for i, v in enumerate(var):
            var[i] = replacedict(v)
    elif isinstance(var, dict):
        for k, v in var.items():
            var[k] = replacedict(v)
        rep = replaceable(var)
        for k, v in rep.items():
            rep[k] = MyDict(v)
    return var
For the specific case of testing JSON Schemas to test if multiple properties can be merged into a patternProperties:
def replaceable(var):
    if 'type' in var and var['type'] == 'object' and \
       'properties' in var and isinstance(var['properties'],dict):
        return var['properties']
    return {}
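For the udict example from the question, the recursive replacement can be written as a small runnable sketch. UDict, IGNORED and to_udict are illustrative names. Unlike the in-place version above, this variant builds a converted copy, which is an assumption about what is wanted. Note that __ne__ is defined explicitly, because dict supplies its own __ne__ that would otherwise ignore the custom __eq__.

```python
class UDict(dict):
    """dict whose equality ignores the keys listed in IGNORED."""
    IGNORED = frozenset({'j'})

    def __eq__(self, other):
        if not isinstance(other, dict):
            return NotImplemented
        keys = set(self) - self.IGNORED
        if keys != set(other) - self.IGNORED:
            return False
        return all(self[k] == other[k] for k in keys)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result


def to_udict(value):
    """Return a copy of value with every nested dict replaced by a UDict."""
    if isinstance(value, dict):
        return UDict((k, to_udict(v)) for k, v in value.items())
    if isinstance(value, list):
        return [to_udict(v) for v in value]
    return value


p = {'a': 1, 'j': 2, 'c': [{'j': 'cat', 'k': 'dog'}]}
a = to_udict(p)
b = to_udict(p)
b['c'][0]['j'] = 'bird'
nested_j_ignored = (a == b)    # the nested 'j' is ignored as well
b['c'][0]['k'] = 'fish'
other_key_detected = (a != b)  # a change under any other key still counts
print(nested_j_ignored, other_key_detected)
```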

Check if object is in list (not "by value", but by id)

Consider the following code:
>>> class A(object):
...     def __init__(self, a):
...         self.a = a
...     def __eq__(self, other):
...         return self.a==other.a
...
>>> a=A(1)
>>> b=A(1)
>>> c=A(2)
>>> a==b
True # because __eq__ says so
>>> a==c
False # because __eq__ says so
>>> a is b
False # because they're different objects
>>> l = [b,c]
>>> a in l
True # seems to use __eq__ under the hood
So, in seems to use __eq__ to determine whether or not something is in a container.
Where can one find documentation on this behavior?
Is it possible to make in use object identity, a.k.a. a in somelist if the object a is in somelist, and not some other object that compares equal to a?
Use the any() function and a generator expression:
any(o is a for o in l)
The behaviour of in is documented in the Common Sequence Operators section:
x in s
True if an item of s is equal to x, else False
Bold emphasis mine.
If you must use in, use a wrapper object with a custom __eq__ method that uses is, or build your own container where a custom __contains__ method uses is to test against each contained element.
The wrapper could look like this:
class IdentityWrapper(object):
    def __init__(self, ob):
        self.ob = ob
    def __eq__(self, other):
        return other is self.ob
Demo:
>>> IdentityWrapper(a) in l
False
>>> IdentityWrapper(a) in (l + [a])
True
The container could just use the same any() function outlined above:
class IdentityList(list):
    def __contains__(self, other):
        return any(o is other for o in self)
Demo:
>>> il = IdentityList(l)
>>> a in il
False
>>> a in IdentityList(l + [a])
True
If you do not want to change A's behaviour, you can write a thin wrapper around the container you use. To change how the in operator behaves, the magic method __contains__ needs to be overridden. Quoting the docs:
Called to implement membership test operators. Should return true if
item is in self, false otherwise. For mapping objects, this should
consider the keys of the mapping rather than the values or the
key-item pairs.
Sample code:
class A(object):
    def __init__(self, a):
        self.a = a
    def __eq__(self, other):
        return self.a == other.a

class IdentityList(list):
    def __contains__(self, obj):
        return any(o is obj for o in self)

a = A(1)
b = A(1)
c = A(2)
container = [b, c]
identity_container = IdentityList(container)
assert a in container # not desired output (described in question)
assert a not in identity_container # desired output

Unmutable dictionary in python [duplicate]

A frozen set is a frozenset.
A frozen list could be a tuple.
What would a frozen dict be? An immutable, hashable dict.
I guess it could be something like collections.namedtuple, but that is more like a frozen-keys dict (a half-frozen dict). Isn't it?
A "frozendict" should be a frozen dictionary, it should have keys, values, get, etc., and support in, for, etc.
update :
* there it is : https://www.python.org/dev/peps/pep-0603
Python doesn't have a builtin frozendict type. It turns out this wouldn't be useful too often (though it would still probably be useful more often than frozenset is).
The most common reason to want such a type is when memoizing function calls for functions with unknown arguments. The most common solution to store a hashable equivalent of a dict (where the values are hashable) is something like tuple(sorted(kwargs.items())).
This depends on the sorting not being a bit insane. Python cannot positively promise sorting will result in something reasonable here. (But it can't promise much else, so don't sweat it too much.)
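A minimal memoizing decorator built on that key might look like the sketch below. memoize and area are hypothetical names, and the sketch assumes that all argument values are hashable. Sorting kwargs.items() is safe here, since keyword names are unique strings and the values never need to be compared.

```python
import functools


def memoize(func):
    """Cache results keyed on positional and keyword arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Hashable stand-in for the kwargs dict, independent of call order.
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


calls = []

@memoize
def area(width=1, height=1):
    calls.append((width, height))   # record real invocations
    return width * height


first = area(width=2, height=3)
second = area(height=3, width=2)    # same arguments, different order
print(first, second, len(calls))
```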
You could easily enough make some sort of wrapper that works much like a dict. It might look something like
import collections.abc

class FrozenDict(collections.abc.Mapping):
    """Don't forget the docstrings!!"""

    def __init__(self, *args, **kwargs):
        self._d = dict(*args, **kwargs)
        self._hash = None

    def __iter__(self):
        return iter(self._d)

    def __len__(self):
        return len(self._d)

    def __getitem__(self, key):
        return self._d[key]

    def __hash__(self):
        # It would have been simpler and maybe more obvious to
        # use hash(tuple(sorted(self._d.items()))) from this discussion
        # so far, but this solution is O(n). I don't know what kind of
        # n we are going to run into, but sometimes it's hard to resist the
        # urge to optimize when it will gain improved algorithmic performance.
        if self._hash is None:
            hash_ = 0
            for pair in self.items():
                hash_ ^= hash(pair)
            self._hash = hash_
        return self._hash
It should work great:
>>> x = FrozenDict(a=1, b=2)
>>> y = FrozenDict(a=1, b=2)
>>> x is y
False
>>> x == y
True
>>> x == {'a': 1, 'b': 2}
True
>>> d = {x: 'foo'}
>>> d[y]
'foo'
Curiously, although we have the seldom useful frozenset, there's still no frozen mapping. The idea was rejected in PEP 416 -- Add a frozendict builtin type. This idea may be revisited in a later Python release, see PEP 603 -- Adding a frozenmap type to collections.
So the Python 2 solution to this:
def foo(config={'a': 1}):
    ...
Still seems to be the usual:
def foo(config=None):
    if config is None:
        config = {'a': 1} # default config
    ...
In Python 3 you have the option of this:
from types import MappingProxyType

default_config = {'a': 1}
DEFAULTS = MappingProxyType(default_config)

def foo(config=DEFAULTS):
    ...
Now the default config can be updated dynamically, but remain immutable where you want it to be immutable by passing around the proxy instead.
So changes in the default_config will update DEFAULTS as expected, but you can't write to the mapping proxy object itself.
Admittedly it's not really the same thing as an "immutable, hashable dict", but it might be a decent substitute for some use cases of a frozendict.
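A short sketch of that behaviour, reusing the names default_config and DEFAULTS from the snippet above: the proxy is a read-only live view, so changes to the underlying dict show through, while writes through the proxy are refused.

```python
from types import MappingProxyType

default_config = {'a': 1}
DEFAULTS = MappingProxyType(default_config)

default_config['b'] = 2      # change the underlying dict...
seen_through_proxy = DEFAULTS['b']   # ...and the proxy sees it

try:
    DEFAULTS['a'] = 99       # writing through the proxy is refused
except TypeError as exc:
    write_error = exc
else:
    write_error = None

print(seen_through_proxy)
print(write_error)
print(dict(DEFAULTS))        # a plain, independent copy
```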
Assuming the keys and values of the dictionary are themselves immutable (e.g. strings) then:
>>> d
{'forever': 'atones', 'minks': 'cards', 'overhands': 'warranted',
'hardhearted': 'tartly', 'gradations': 'snorkeled'}
>>> t = tuple((k, d[k]) for k in sorted(d.keys()))
>>> hash(t)
1524953596
There is no frozendict, but you can use MappingProxyType, which was added to the standard library in Python 3.3:
>>> from types import MappingProxyType
>>> foo = MappingProxyType({'a': 1})
>>> foo
mappingproxy({'a': 1})
>>> foo['a'] = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
>>> foo
mappingproxy({'a': 1})
I think of frozendict everytime I write a function like this:
def do_something(blah, optional_dict_parm=None):
    if optional_dict_parm is None:
        optional_dict_parm = {}
Install frozendict
pip install frozendict
Use it!
from frozendict import frozendict
def smth(param = frozendict({})):
    pass
Here is the code I've been using. I subclassed frozenset. The advantages of this are the following.
This is a truly immutable object. No relying on the good behavior of future users and developers.
It's easy to convert back and forth between a regular dictionary and a frozen dictionary. FrozenDict(orig_dict) --> frozen dictionary. dict(frozen_dict) --> regular dict.
Update Jan 21 2015: The original piece of code I posted in 2014 used a for-loop to find a key that matched. That was incredibly slow. Now I've put together an implementation which takes advantage of frozenset's hashing features. Key-value pairs are stored in special containers where the __hash__ and __eq__ functions are based on the key only. This code has also been formally unit-tested, unlike what I posted here in August 2014.
MIT-style license.
if 3 / 2 == 1:
    version = 2
elif 3 / 2 == 1.5:
    version = 3

def col(i):
    ''' For binding named attributes to spots inside subclasses of tuple.'''
    g = tuple.__getitem__
    @property
    def _col(self):
        return g(self,i)
    return _col
class Item(tuple):
    ''' Designed for storing key-value pairs inside
    a FrozenDict, which itself is a subclass of frozenset.
    The __hash__ is overloaded to return the hash of only the key.
    __eq__ is overloaded so that normally it only checks whether the Item's
    key is equal to the other object, HOWEVER, if the other object itself
    is an instance of Item, it checks BOTH the key and value for equality.

    WARNING: Do not use this class for any purpose other than to contain
    key value pairs inside FrozenDict!!!!

    The __eq__ operator is overloaded in such a way that it violates a
    fundamental property of mathematics. That property, which says that
    a == b and b == c implies a == c, does not hold for this object.
    Here's a demonstration:
        [in] >>> x = Item(('a',4))
        [in] >>> y = Item(('a',5))
        [in] >>> hash('a')
        [out] >>> 194817700
        [in] >>> hash(x)
        [out] >>> 194817700
        [in] >>> hash(y)
        [out] >>> 194817700
        [in] >>> 'a' == x
        [out] >>> True
        [in] >>> 'a' == y
        [out] >>> True
        [in] >>> x == y
        [out] >>> False
    '''

    __slots__ = ()
    key, value = col(0), col(1)

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        if isinstance(other, Item):
            return tuple.__eq__(self, other)
        return self.key == other

    def __ne__(self, other):
        return not self.__eq__(other)

    def __str__(self):
        return '%r: %r' % self

    def __repr__(self):
        return 'Item((%r, %r))' % self
class FrozenDict(frozenset):
    ''' Behaves in most ways like a regular dictionary, except that it's immutable.
    It differs from other implementations because it doesn't subclass "dict".
    Instead it subclasses "frozenset" which guarantees immutability.
    FrozenDict instances are created with the same arguments used to initialize
    regular dictionaries, and has all the same methods.
        [in] >>> f = FrozenDict(x=3,y=4,z=5)
        [in] >>> f['x']
        [out] >>> 3
        [in] >>> f['a'] = 0
        [out] >>> TypeError: 'FrozenDict' object does not support item assignment

    FrozenDict can accept un-hashable values, but FrozenDict is only hashable if its values are hashable.
        [in] >>> f = FrozenDict(x=3,y=4,z=5)
        [in] >>> hash(f)
        [out] >>> 646626455
        [in] >>> g = FrozenDict(x=3,y=4,z=[])
        [in] >>> hash(g)
        [out] >>> TypeError: unhashable type: 'list'

    FrozenDict interacts with dictionary objects as though it were a dict itself.
        [in] >>> original = dict(x=3,y=4,z=5)
        [in] >>> frozen = FrozenDict(x=3,y=4,z=5)
        [in] >>> original == frozen
        [out] >>> True

    FrozenDict supports bi-directional conversions with regular dictionaries.
        [in] >>> original = {'x': 3, 'y': 4, 'z': 5}
        [in] >>> FrozenDict(original)
        [out] >>> FrozenDict({'x': 3, 'y': 4, 'z': 5})
        [in] >>> dict(FrozenDict(original))
        [out] >>> {'x': 3, 'y': 4, 'z': 5} '''

    __slots__ = ()

    def __new__(cls, orig={}, **kw):
        if kw:
            d = dict(orig, **kw)
            items = map(Item, d.items())
        else:
            try:
                items = map(Item, orig.items())
            except AttributeError:
                items = map(Item, orig)
        return frozenset.__new__(cls, items)

    def __repr__(self):
        cls = self.__class__.__name__
        items = frozenset.__iter__(self)
        _repr = ', '.join(map(str,items))
        return '%s({%s})' % (cls, _repr)

    def __getitem__(self, key):
        if key not in self:
            raise KeyError(key)
        diff = self.difference
        item = diff(diff({key}))
        key, value = set(item).pop()
        return value

    def get(self, key, default=None):
        if key not in self:
            return default
        return self[key]

    def __iter__(self):
        items = frozenset.__iter__(self)
        return map(lambda i: i.key, items)

    def keys(self):
        items = frozenset.__iter__(self)
        return map(lambda i: i.key, items)

    def values(self):
        items = frozenset.__iter__(self)
        return map(lambda i: i.value, items)

    def items(self):
        items = frozenset.__iter__(self)
        return map(tuple, items)

    def copy(self):
        cls = self.__class__
        items = frozenset.copy(self)
        dupl = frozenset.__new__(cls, items)
        return dupl

    @classmethod
    def fromkeys(cls, keys, value):
        d = dict.fromkeys(keys,value)
        return cls(d)

    def __hash__(self):
        kv = tuple.__hash__
        items = frozenset.__iter__(self)
        return hash(frozenset(map(kv, items)))

    def __eq__(self, other):
        if not isinstance(other, FrozenDict):
            try:
                other = FrozenDict(other)
            except Exception:
                return False
        return frozenset.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)
if version == 2:
    #Here are the Python2 modifications
    class Python2(FrozenDict):
        def __iter__(self):
            items = frozenset.__iter__(self)
            for i in items:
                yield i.key

        def iterkeys(self):
            items = frozenset.__iter__(self)
            for i in items:
                yield i.key

        def itervalues(self):
            items = frozenset.__iter__(self)
            for i in items:
                yield i.value

        def iteritems(self):
            items = frozenset.__iter__(self)
            for i in items:
                yield (i.key, i.value)

        def has_key(self, key):
            return key in self

        def viewkeys(self):
            return dict(self).viewkeys()

        def viewvalues(self):
            return dict(self).viewvalues()

        def viewitems(self):
            return dict(self).viewitems()

    #If this is Python2, rebuild the class
    #from scratch rather than use a subclass
    py3 = FrozenDict.__dict__
    py3 = {k: py3[k] for k in py3}
    py2 = {}
    py2.update(py3)
    dct = Python2.__dict__
    py2.update({k: dct[k] for k in dct})
    FrozenDict = type('FrozenDict', (frozenset,), py2)
You may use frozendict from utilspie package as:
>>> from utilspie.collectionsutils import frozendict
>>> my_dict = frozendict({1: 3, 4: 5})
>>> my_dict # object of `frozendict` type
frozendict({1: 3, 4: 5})
# Hashable
>>> {my_dict: 4}
{frozendict({1: 3, 4: 5}): 4}
# Immutable
>>> my_dict[1] = 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/mquadri/workspace/utilspie/utilspie/collectionsutils/collections_utils.py", line 44, in __setitem__
self.__setitem__.__name__, type(self).__name__))
AttributeError: You can not call '__setitem__()' for 'frozendict' object
As per the document:
frozendict(dict_obj): Accepts obj of dict type and returns a hashable and immutable dict
Subclassing dict
I see this pattern in the wild (GitHub) and wanted to mention it:
class FrozenDict(dict):

    def __init__(self, *args, **kwargs):
        self._hash = None
        super(FrozenDict, self).__init__(*args, **kwargs)

    def __hash__(self):
        if self._hash is None:
            self._hash = hash(tuple(sorted(self.items()))) # iteritems() on py2
        return self._hash

    def _immutable(self, *args, **kws):
        raise TypeError('cannot change object - object is immutable')

    # makes (deep)copy a lot more efficient
    def __copy__(self):
        return self

    def __deepcopy__(self, memo=None):
        if memo is not None:
            memo[id(self)] = self
        return self

    __setitem__ = _immutable
    __delitem__ = _immutable
    pop = _immutable
    popitem = _immutable
    clear = _immutable
    update = _immutable
    setdefault = _immutable
example usage:
d1 = FrozenDict({'a': 1, 'b': 2})
d2 = FrozenDict({'a': 1, 'b': 2})
d1.keys()
assert isinstance(d1, dict)
assert len(set([d1, d2])) == 1 # hashable
Pros
Support for get(), keys(), items() (iteritems() on py2) and all the other goodies from dict out of the box, without explicitly implementing them.
Uses dict internally, which means good performance (dict is written in C in CPython).
Elegant, simple, and no black magic.
isinstance(my_frozen_dict, dict) returns True. Although Python encourages duck typing, many packages use isinstance(), so this can save many tweaks and customizations.
Cons
Any subclass can override this or access it internally (you can't really protect something 100% in Python; you should trust your users and provide good documentation).
If you care about speed, you might want to make __hash__ a bit faster.
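One way to act on the last point is to hash a frozenset of the items instead of a sorted tuple. This avoids the sort, so it may be faster for large dicts, and it also works when the keys cannot be ordered against each other. A hedged sketch showing only the hashing part (the class name FrozenDictFS is made up, and the mutator blocking from the class above is left out for brevity):

```python
class FrozenDictFS(dict):
    """Sketch: hashing via frozenset. Mutators are NOT blocked here."""

    def __init__(self, *args, **kwargs):
        self._hash = None
        super(FrozenDictFS, self).__init__(*args, **kwargs)

    def __hash__(self):
        if self._hash is None:
            # Order-independent, and no sorting of the keys is needed.
            self._hash = hash(frozenset(self.items()))
        return self._hash


# Mixed key types: sorting these items would fail on Python 3.
d1 = FrozenDictFS({1: 'a', 'b': 2})
d2 = FrozenDictFS({'b': 2, 1: 'a'})
print(hash(d1) == hash(d2))
print(len({d1, d2}))
```

As with the sorted-tuple version, this only works when all values are hashable.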
Yes, this is my second answer, but it is a completely different approach. The first implementation was in pure python. This one is in Cython. If you know how to use and compile Cython modules, this is just as fast as a regular dictionary. Roughly .04 to .06 micro-sec to retrieve a single value.
This is the file "frozen_dict.pyx"
import cython
from collections import Mapping

cdef class dict_wrapper:
    cdef object d
    cdef int h

    def __init__(self, *args, **kw):
        self.d = dict(*args, **kw)
        self.h = -1

    def __len__(self):
        return len(self.d)

    def __iter__(self):
        return iter(self.d)

    def __getitem__(self, key):
        return self.d[key]

    def __hash__(self):
        if self.h == -1:
            self.h = hash(frozenset(self.d.iteritems()))
        return self.h

class FrozenDict(dict_wrapper, Mapping):
    def __repr__(self):
        c = type(self).__name__
        r = ', '.join('%r: %r' % (k,self[k]) for k in self)
        return '%s({%s})' % (c, r)

__all__ = ['FrozenDict']
Here's the file "setup.py"
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules = cythonize('frozen_dict.pyx')
)
If you have Cython installed, save the two files above into the same directory. Move to that directory in the command line.
python setup.py build_ext --inplace
python setup.py install
And you should be done.
The main disadvantage of namedtuple is that it needs to be specified before it is used, so it's less convenient for single-use cases.
However, there is a practical workaround that can be used to handle many such cases. Let's say that you want to have an immutable equivalent of the following dict:
MY_CONSTANT = {
    'something': 123,
    'something_else': 456
}
This can be emulated like this:
from collections import namedtuple
MY_CONSTANT = namedtuple('MyConstant', 'something something_else')(123, 456)
It's even possible to write an auxiliary function to automate this:
def freeze_dict(data):
    from collections import namedtuple
    keys = sorted(data.keys())
    frozen_type = namedtuple(''.join(keys), keys)
    return frozen_type(**data)

a = {'foo':'bar', 'x':'y'}
fa = freeze_dict(a)
assert a['foo'] == fa.foo
Of course this works only for flat dicts, but it shouldn't be too difficult to implement a recursive version.
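A recursive version along those lines could look like the sketch below. It assumes that every key, at every level, is a string that is a valid Python identifier (a namedtuple requirement); the function name freeze_nested and the type name Frozen are arbitrary choices for this example.

```python
from collections import namedtuple


def freeze_nested(data):
    """Recursively convert nested dicts into namedtuples."""
    keys = sorted(data)
    frozen_type = namedtuple('Frozen', keys)
    values = [
        freeze_nested(data[k]) if isinstance(data[k], dict) else data[k]
        for k in keys
    ]
    return frozen_type(*values)


config = {'foo': 'bar', 'inner': {'x': 1, 'y': 2}}
frozen = freeze_nested(config)
print(frozen.foo, frozen.inner.x)

try:
    frozen.foo = 'changed'      # namedtuple fields are read-only
except AttributeError as exc:
    print(exc)
```

Values such as lists are stored as they are, so the result is hashable only if every leaf value is hashable.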
freeze implements frozen collections (dict, list and set) that are hashable, type-hinted and will recursively freeze the data you give them (when possible) for you.
pip install frz
Usage:
from freeze import FDict
a_mutable_dict = {
    "list": [1, 2],
    "set": {3, 4},
}
a_frozen_dict = FDict(a_mutable_dict)
print(repr(a_frozen_dict))
# FDict: {'list': FList: (1, 2), 'set': FSet: {3, 4}}
In the absence of native language support, you can either do it yourself or use an existing solution. Fortunately Python makes it dead simple to extend off of their base implementations.
class frozen_dict(dict):
    def __setitem__(self, key, value):
        raise Exception('Frozen dictionaries cannot be mutated')

frozen = frozen_dict({'foo': 'FOO'})
print(frozen['foo']) # FOO
frozen['foo'] = 'NEWFOO' # Exception: Frozen dictionaries cannot be mutated

# OR

from types import MappingProxyType

frozen_dict = MappingProxyType({'foo': 'FOO'})
print(frozen_dict['foo']) # FOO
frozen_dict['foo'] = 'NEWFOO' # TypeError: 'mappingproxy' object does not support item assignment
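Note that the dict subclass variant blocks only item assignment. Other mutating methods inherited from dict, such as update(), do not go through __setitem__, so they still work. A short demonstration (the class name HalfFrozen is invented here):

```python
class HalfFrozen(dict):
    """Blocks d[key] = value only, like the subclass above."""

    def __setitem__(self, key, value):
        raise Exception('Frozen dictionaries cannot be mutated')


frozen = HalfFrozen({'foo': 'FOO'})

try:
    frozen['foo'] = 'NEWFOO'
except Exception as exc:
    print(exc)

frozen.update(foo='CHANGED')    # bypasses __setitem__
after_update = frozen['foo']
del frozen['foo']               # __delitem__ is not blocked either
print(after_update, frozen)
```

A subclass meant to be immutable has to block every mutating method, not just __setitem__.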
At one point I needed fixed keys for something that was a sort of global constant, and I settled on something like this:
class MyFrozenDict:
    def __getitem__(self, key):
        if key == 'mykey1':
            return 0
        if key == 'mykey2':
            return "another value"
        raise KeyError(key)
Use it like
a = MyFrozenDict()
print(a['mykey1'])
WARNING: I don't recommend this for most use cases as it makes some pretty severe tradeoffs.

Immutable dictionary, only use as a key for another dictionary

I had the need to implement a hashable dict so I could use a dictionary as a key for another dictionary.
A few months ago I used this implementation: Python hashable dicts
However I got a notice from a colleague saying 'it is not really immutable, thus it is not safe. You can use it, but it does make me feel like a sad Panda'.
So I started looking around to create one that is immutable. I have no need to compare the 'key-dict' to another 'key-dict'. Its only use is as a key for another dictionary.
I have come up with the following:
class HashableDict(dict):
    """Hashable dict that can be used as a key in other dictionaries"""

    def __new__(self, *args, **kwargs):
        # create a new local dict, that will be used by the HashableDictBase closure class
        immutableDict = dict(*args, **kwargs)

        class HashableDictBase(object):
            """Hashable dict that can be used as a key in other dictionaries. This is now immutable"""

            def __key(self):
                """Return a tuple of the current keys"""
                return tuple((k, immutableDict[k]) for k in sorted(immutableDict))

            def __hash__(self):
                """Return a hash of __key"""
                return hash(self.__key())

            def __eq__(self, other):
                """Compare two __keys"""
                return self.__key() == other.__key() # pylint: disable-msg=W0212

            def __repr__(self):
                """@see: dict.__repr__"""
                return immutableDict.__repr__()

            def __str__(self):
                """@see: dict.__str__"""
                return immutableDict.__str__()

            def __setattr__(self, *args):
                raise TypeError("can't modify immutable instance")
            __delattr__ = __setattr__

        return HashableDictBase()
I used the following to test the functionality:
d = {"a" : 1}
a = HashableDict(d)
b = HashableDict({"b" : 2})
print a
d["b"] = 2
print a
c = HashableDict({"a" : 1})
test = {a : "value with a dict as key (key a)",
b : "value with a dict as key (key b)"}
print test[a]
print test[b]
print test[c]
which gives:
{'a': 1}
{'a': 1}
value with a dict as key (key a)
value with a dict as key (key b)
value with a dict as key (key a)
as output
Is this the 'best possible' immutable dictionary that I can use that satisfies my requirements? If not, what would be a better solution?
If you are only using it as a key for another dict, you could go for frozenset(mutabledict.items()). If you need to access the underlying mappings, you could then use that as the parameter to dict.
mutabledict = dict(zip('abc', range(3)))
immutable = frozenset(mutabledict.items())
read_frozen = dict(immutable)
read_frozen['a'] # => 1
Note that you could also combine this with a class derived from dict, using the frozenset as the source of the hash and disabling __setitem__, as suggested in another answer (see @RaymondHettinger's answer for code that does just that).
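As a small sketch of the frozenset approach used as an actual key: the helper name `freeze`, the `cache` dict and the sample data are made up for illustration, and it assumes every value in the dict is itself hashable. Two dicts with the same items freeze to equal keys regardless of insertion order.

```python
def freeze(mapping):
    """Return a hashable snapshot of a dict whose values are all hashable."""
    return frozenset(mapping.items())

first = {"a": 1, "b": 2}
second = {"b": 2, "a": 1}   # same items, different insertion order

# Use the frozen snapshot as a key in another dictionary.
cache = {freeze(first): "value stored under a dict-like key"}

# Equal items give equal frozensets, so this lookup succeeds.
found = cache[freeze(second)]

# Mutating the original dict afterwards does not touch the stored key.
first["c"] = 3
still_there = freeze(second) in cache
changed_key_present = freeze(first) in cache
```

The snapshot is taken at the moment `freeze` is called, which is why later changes to `first` cannot invalidate the key already stored in `cache`.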
The Mapping abstract base class makes this easy to implement:
import collections.abc

# On Python 3.3+ the ABC lives in collections.abc;
# the old collections.Mapping alias was removed in Python 3.10.
class ImmutableDict(collections.abc.Mapping):
    def __init__(self, somedict):
        self._dict = dict(somedict) # make a copy
        self._hash = None
    def __getitem__(self, key):
        return self._dict[key]
    def __len__(self):
        return len(self._dict)
    def __iter__(self):
        return iter(self._dict)
    def __hash__(self):
        if self._hash is None:
            self._hash = hash(frozenset(self._dict.items()))
        return self._hash
    def __eq__(self, other):
        return self._dict == other._dict
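A usage sketch for the Mapping-based class: it is repeated here in compact form so the snippet runs on its own under Python 3, and the variable names are illustrative. One deliberate deviation, which is my assumption rather than part of the answer, is the `isinstance` check in `__eq__` so that comparing against an unrelated object returns False instead of raising.

```python
from collections.abc import Mapping

class ImmutableDict(Mapping):
    def __init__(self, somedict):
        self._dict = dict(somedict)  # private copy
        self._hash = None
    def __getitem__(self, key):
        return self._dict[key]
    def __len__(self):
        return len(self._dict)
    def __iter__(self):
        return iter(self._dict)
    def __hash__(self):
        if self._hash is None:
            self._hash = hash(frozenset(self._dict.items()))
        return self._hash
    def __eq__(self, other):
        # Assumption: guard against non-ImmutableDict operands.
        return isinstance(other, ImmutableDict) and self._dict == other._dict

source = {"a": 1}
key = ImmutableDict(source)
source["b"] = 2                  # the copy held by `key` is unaffected

table = {key: "found"}
lookup = table[ImmutableDict({"a": 1})]   # equal content finds the same entry

# Mapping defines no __setitem__, so item assignment raises TypeError.
try:
    key["a"] = 0
    assignment_blocked = False
except TypeError:
    assignment_blocked = True
```

Because the class only inherits the read-only Mapping interface, there is no mutating method to disable in the first place.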
I realize this has already been answered, but types.MappingProxyType, available since Python 3.3, is an analogous built-in: it gives a read-only view of a dict. Regarding the original question of safety, PEP 416 (Add a frozendict builtin type) discusses why the idea of a frozendict was rejected.
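To illustrate what types.MappingProxyType does and does not give you, here is a minimal sketch (the names `backing` and `view` are illustrative). The proxy blocks writes, but it is a live view of the underlying dict, and a proxy over a plain dict is not hashable, so on its own it cannot serve as a dictionary key.

```python
from types import MappingProxyType

backing = {"a": 1}
view = MappingProxyType(backing)

# Writing through the proxy is rejected.
try:
    view["a"] = 2
    write_blocked = False
except TypeError:
    write_blocked = True

# The proxy is a view, not a copy: changes to the backing dict show through.
backing["b"] = 2
sees_change = "b" in view

# A proxy over a plain dict cannot be hashed.
try:
    hash(view)
    hashable = True
except TypeError:
    hashable = False
```

So the proxy is only as immutable as access to the backing dict is restricted; anyone still holding `backing` can change what `view` shows.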
In order for your immutable dictionary to be safe, all it needs to do is never change its hash. Why don't you just disable __setitem__ as follows:
class ImmutableDict(dict):
    def __setitem__(self, key, value):
        raise Exception("Can't touch this")
    def __hash__(self):
        return hash(tuple(sorted(self.items())))
a = ImmutableDict({'a':1})
b = {a:1}
print(b)
print(b[a])
a['a'] = 0
The output of the script is:
{{'a': 1}: 1}
1
Traceback (most recent call last):
  File "ex.py", line 11, in <module>
    a['a'] = 0
  File "ex.py", line 3, in __setitem__
    raise Exception("Can't touch this")
Exception: Can't touch this
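One caveat worth checking before relying on this, shown as a small sketch: in CPython, dict methods such as update() do not go through an overridden __setitem__, so disabling __setitem__ alone leaves other ways to change the contents, and with them the hash. The class is copied from above; the variable names are illustrative.

```python
class ImmutableDict(dict):
    def __setitem__(self, key, value):
        raise Exception("Can't touch this")
    def __hash__(self):
        return hash(tuple(sorted(self.items())))

a = ImmutableDict({'a': 1})

# Direct item assignment is blocked, as intended.
try:
    a['a'] = 0
    setitem_blocked = False
except Exception:
    setitem_blocked = True

# update() is not routed through __setitem__, so it still mutates the dict.
a.update({'a': 0})
value_after_update = a['a']
```

If the instance is already stored as a key when this happens, its hash no longer matches the slot it was filed under, which is why the other mutating methods (update, pop, popitem, clear, setdefault, __delitem__) would need to be disabled as well.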
Here is a link to a pip-installable implementation of @RaymondHettinger's answer: https://github.com/pcattori/icicle
Simply pip install icicle and you can from icicle import FrozenDict!
Update: icicle has been deprecated in favor of maps: https://github.com/pcattori/maps (documentation, PyPI).
It appears I am late to post, and I am not sure whether anyone else has come up with this idea, but here is my take on it. The dict is immutable and hashable. I made it immutable by overriding the mutating methods, magic and otherwise, with a custom '_readonly' function that raises an exception. Because attribute assignment is blocked as well, I compute the hash in '__new__' and then override '__hash__' to return it. That's it!
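class ImmutableDict(dict):
    def __new__(cls, *args, **kwargs):
        instance = super(ImmutableDict, cls).__new__(cls)
        # Store the hash on the instance, not on the class: a class-level
        # attribute would be overwritten by every new ImmutableDict.
        # object.__setattr__ bypasses the read-only __setattr__ below.
        frozen_items = frozenset(dict(*args, **kwargs).items())
        object.__setattr__(instance, "_hash", hash(frozen_items))
        return instance
    def __hash__(self):
        return self._hash
    def _readonly(self, *args, **kwargs):
        raise TypeError("Cannot modify Immutable Instance")
    # __delitem__ is included so that `del d[key]` is blocked too
    __delattr__ = __setattr__ = __setitem__ = __delitem__ = _readonly
    pop = update = setdefault = clear = popitem = _readonly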
Test:
>>> immutabled1 = ImmutableDict({"This": "That", "Cheese": "Blarg"})
>>> dict1 = {immutabled1: "Yay"}
>>> dict1[immutabled1]
'Yay'
>>> dict1
{{'Cheese': 'Blarg', 'This': 'That'}: 'Yay'}
A variation on Raymond Hettinger's answer that wraps self._dict in types.MappingProxyType:
from collections.abc import Mapping  # collections.Mapping was removed in Python 3.10
from types import MappingProxyType

class ImmutableDict(Mapping):
    """
    Copies a dict and proxies it via types.MappingProxyType to make it immutable.
    """
    def __init__(self, somedict):
        dictcopy = dict(somedict) # make a copy
        self._dict = MappingProxyType(dictcopy) # lock it
        self._hash = None
    def __getitem__(self, key):
        return self._dict[key]
    def __len__(self):
        return len(self._dict)
    def __iter__(self):
        return iter(self._dict)
    def __hash__(self):
        if self._hash is None:
            self._hash = hash(frozenset(self._dict.items()))
        return self._hash
    def __eq__(self, other):
        return self._dict == other._dict
    def __repr__(self):
        return str(self._dict)
You can use an enum:
import enum
KeyDict1 = enum.Enum('KeyDict1', {'InnerDictKey1': 'bla', 'InnerDictKey2': 2})
d = { KeyDict1: 'whatever', KeyDict2: 1, ...}
You can access the enums like you would a dictionary:
KeyDict1['InnerDictKey2'].value # This is 2
You can iterate over the names, and get their values... It does everything you'd expect.
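A runnable sketch of the enum idea, with a second enum `KeyDict2` made up here purely for illustration. One thing to be aware of: an Enum class is hashed by identity, so two enums with identical members are still distinct keys, unlike two frozen dicts with equal contents.

```python
import enum

# Functional API: a mapping of member names to values.
KeyDict1 = enum.Enum('KeyDict1', {'InnerDictKey1': 'bla', 'InnerDictKey2': 2})
# Hypothetical second key with the same members, to show identity-based hashing.
KeyDict2 = enum.Enum('KeyDict2', {'InnerDictKey1': 'bla', 'InnerDictKey2': 2})

# Enum classes are hashable, so they can be used as dictionary keys.
d = {KeyDict1: 'whatever', KeyDict2: 1}

# Members are looked up by name, like dictionary keys.
value = KeyDict1['InnerDictKey2'].value

# Iterating over the class yields its members in definition order.
names = [member.name for member in KeyDict1]
```

This fits when the set of "key dicts" is known up front; it does not let you look up an entry by building an equal mapping at runtime.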
You can try using https://github.com/Lightricks/freeze.
It provides recursively immutable and hashable dictionaries:
from freeze import FDict
a_mutable_dict = {
    "list": [1, 2],
    "set": {3, 4},
}
a_frozen_dict = FDict(a_mutable_dict)
print(a_frozen_dict)
print(hash(a_frozen_dict))
# FDict: {'list': FList: (1, 2), 'set': FSet: {3, 4}}
# -4855611361973338606
