I have two objects that represent the same value. I even ensured they had the same hash, but I still got an error from a dictionary:
>>> hash(one)
1098414562
>>> hash(one+zero)
1098414562
>>> a={one:1}
>>> a[one+zero]
Traceback (most recent call last):
File "<pyshell#25>", line 1, in <module>
a[one+zero]
KeyError: {{|}|}
What else do I have to do to ensure the dictionary recognizes them as the same key?
To be usable as dict keys, the objects must also define __eq__() or __cmp__(). They must compare equal to be recognized as the same key.
If the objects have the same hash but do not compare equal, a hash collision is assumed and they are stored separately in the same hash bucket.
When an object is looked up by hash, every object in the matching hash bucket is compared to it; if none compares equal, the lookup raises a KeyError.
If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since hashable collection implementations require that an object's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket).
source
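For example, here is a minimal sketch (with a hypothetical Number class, not the asker's actual code) showing that defining both __hash__ and a matching __eq__ is what lets two distinct instances act as the same dict key:

class Number:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, Number) and self.value == other.value

d = {Number(1): "one"}
print(d[Number(1)])  # "one" -- found, because the keys hash equal AND compare equal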
This is called a hash-collision. The hash is just a relatively efficient way to distribute the keys across the dictionary. It doesn't guarantee uniqueness. When you look up a key, all the keys with a matching hash need to be considered, and will use the __eq__ method to determine if they really match or not.
Aside:
It's possible to have a class that always returns the same hash for any of its instances. However, this destroys the usual O(1) lookup complexity.
You can easily see the linear-time lookups by experimenting with different values of n here:
class MyInt(int):
    def __hash__(self):
        return hash(1)

n = 10000  # try different values of n
d = {MyInt(i): i for i in range(n)}
d[MyInt(n - 1)]  # O(n): every key has the same hash, so the lookup must probe past all the colliding entries
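If you want to see the slowdown concretely, here is a rough timing sketch (numbers are illustrative and machine-dependent):

import timeit

class MyInt(int):
    def __hash__(self):
        return hash(1)  # force every instance into the same bucket

for n in (1000, 2000, 4000):
    d = {MyInt(i): i for i in range(n)}
    last = MyInt(n - 1)
    print(n, timeit.timeit(lambda: d[last], number=100))  # grows roughly linearly with n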
Related
Below, when I try to hash a list it gives me an error, but it works with a tuple. I guess it has something to do with immutability. Can someone explain this in detail?
List
x = [1,2,3]
y = {x: 9}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Tuple
z = (5,6)
y = {z: 89}
print(y)
{(5, 6): 89}
Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happens "under the covers" - you as the programmer don't need to do anything and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.
Immutable objects, or objects that can't be altered, are hashable. They have a single unique value that never changes, so Python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, tuples, integers and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assign it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.
You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)
Mutable objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other mutable types don't have a __hash__() method. Because of this, you can't use mutable objects as dictionary keys:
>>> hash([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
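If you do have list-like data that you want to use as a key, a common workaround (sketched here with made-up names) is to convert it to a tuple first:

coords = [1, 2, 3]               # hypothetical data
lookup = {tuple(coords): "origin-ish"}
print(lookup[tuple([1, 2, 3])])  # "origin-ish" -- tuples compare by content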
Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys.
Here are some examples of why it might not be a good idea to allow mutable types as keys. This behaviour might be useful in some cases (e.g. using the state of the object as a key rather than the object itself), but it can also lead to surprising results or bugs.
Python
It's possible to use a numeric list as a key by defining __hash__ on a subclass of list:
class MyList(list):
    def __hash__(self):
        return sum(self)
my_list = MyList([1, 2, 3])
my_dict = {my_list: 'a'}
print(my_dict.get(my_list))
# a
my_list[2] = 4 # __hash__() becomes 7
print(next(iter(my_dict)))
# [1, 2, 4]
print(my_dict.get(my_list))
# None
print(my_dict.get(MyList([1,2,3])))
# None
my_list[0] = 0 # __hash__() is 6 again, but for different elements
print(next(iter(my_dict)))
# [0, 2, 4]
print(my_dict.get(my_list))
# 'a'
Ruby
In Ruby, it's allowed to use a list as a key. A Ruby list is called an Array and a dict is a Hash, but the syntax is very similar to Python's:
my_list = [1]
my_hash = { my_list => 'a'}
puts my_hash[my_list]
#=> 'a'
But if this list is modified, the dict doesn't find the corresponding value any more, even if the key is still in the dict:
my_list << 2
puts my_list
#=> [1,2]
puts my_hash.keys.first
#=> [1,2]
puts my_hash[my_list]
#=> nil
It's possible to force the dict to calculate the key hashes again:
my_hash.rehash
puts my_hash[my_list]
#=> 'a'
A hash-based collection calculates the hash of an object and, based on that hash, stores the object in its internal structure for fast lookup. As a result, by contract, once an object has been added to the dictionary its hash is not allowed to change. Most good hash functions depend on the number of elements and on the elements themselves.
A tuple is immutable, so after construction, the values cannot change and therefore the hash cannot change either (or at least a good implementation should not let the hash change).
A list, on the other hand, is mutable: one can later add, remove or alter elements. As a result, the hash could change, violating the contract.
So any object that cannot guarantee a hash value that remains stable after it has been added violates the contract and is thus a poor candidate for a key. For a lookup, the dictionary first calculates the hash of the key to determine the correct bucket. If the key has been changed in the meantime, this can result in false negatives: the object is in the dictionary, but it can no longer be retrieved, because its hash is now different and a different bucket will be searched than the one the object was originally added to.
I would like to add the following aspect, since it's not covered by the other answers.
There's nothing wrong with making mutable objects hashable; it's just ambiguous, and this is why it needs to be defined and implemented consistently by the programmer (not by the programming language).
Note that you can implement the __hash__ method for any custom class which allows its instances to be stored in contexts where hashable types are required (such as dict keys or sets).
Hash values are usually used to decide if two objects represent the same thing. So consider the following example. You have a list with two items: l = [1, 2]. Now you add an item to the list: l.append(3). And now you must answer the following question: Is it still the same thing? Both answers, yes and no, are valid. "Yes", it is still the same list, and "no", it no longer has the same content.
So the answer to this question depends on you as the programmer and so it is up to you to manually implement hash methods for your mutable types.
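For example, one consistent choice (just a sketch, and only one of several reasonable designs) is to hash a mutable object by identity, so "the same thing" means "the same instance" no matter how its contents change:

class Bag:
    def __init__(self, items):
        self.items = list(items)

    def __hash__(self):
        return id(self)       # identity-based: never changes during the object's lifetime

    def __eq__(self, other):
        return self is other  # consistent with __hash__

b = Bag([1, 2])
d = {b: "found"}
b.items.append(3)             # mutate after insertion
print(d[b])                   # still "found" -- the key is "the same thing" by identity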
Based on the Python Glossary:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.
Because a list is mutable, while a tuple is not. When you use an object as a dict key, the dict stores the entry under the hash the key had at insertion time. If the object changes afterwards, the stored entry stays where it is, while a later lookup computes the new hash and searches a different bucket, so the entry is no longer found.
To prevent that, Python does not allow you to hash its mutable built-in containers.
What exactly is meant by unhashable?
>>> a={1,2,3}
>>> b={4,5,6}
>>> set([a,b])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
>>>
Can anyone tell me what exactly the error means? Also, can I add a set to another set in Python?
Objects that don't have a __hash__() method are called unhashable. The Python documentation describes the reason very well:
If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
As Kasramvd explained, objects in Python that are mutable and implement the __eq__() method are unhashable.
Since sets, lists and dicts are mutable (i.e. they can be changed; for instance, you can add and remove items from all of them), they cannot be hashed.
Since a set of sets is not possible, perhaps a set of tuples might work, though you will need to do additional bookkeeping (e.g. ensuring unique values) in order to achieve exactly what you described.
a = (1,2,3)
b = (4,5,6)
c = set([a,b])
Or even better, a set of frozensets. Similar to sets, but immutable (you cannot add or remove elements from them).
a = frozenset(a)
b = frozenset(b)
c = set([a,b])
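Because frozensets compare by content, membership tests on the outer set then behave as you would expect:

a = frozenset([1, 2, 3])
b = frozenset([4, 5, 6])
c = set([a, b])
print(frozenset([1, 2, 3]) in c)  # True -- equal contents, equal frozenset
print(frozenset([1, 2]) in c)     # False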
A hash function is any function that can be used to map data of arbitrary size to data of fixed size. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.
The dictionary in python is just a hash map.
And sets can only contain hashable objects such as strings, numbers and tuples, but not dicts or other sets.
You might want to look at: https://docs.python.org/2/tutorial/datastructures.html#sets
I have two user-defined objects, say a and b.
Both these objects have the same hash values.
However, the id(a) and id(b) are unequal.
Moreover,
>>> a is b
False
>>> a == b
True
From this observation, can I infer the following?
Unequal objects may have the same hash values.
Equal objects need to have the same id values.
Whenever obj1 is obj2 is evaluated, the id values of both objects are compared, not their hash values.
There are three concepts to grasp when trying to understand id, hash and the == and is operators: identity, value and hash value. Not all objects have all three.
All objects have an identity, though even this can be a little slippery in some cases. The id function returns a number corresponding to an object's identity (in cpython, it returns the memory address of the object, but other interpreters may return something else). If two objects (that exist at the same time) have the same identity, they're actually two references to the same object. The is operator compares items by identity, a is b is equivalent to id(a) == id(b).
Identity can get a little confusing when you deal with objects that are cached somewhere in their implementation. For instance, the objects for small integers and strings in cpython are not remade each time they're used. Instead, existing objects are returned any time they're needed. You should not rely on this in your code though, because it's an implementation detail of cpython (other interpreters may do it differently or not at all).
All objects also have a value, though this is a bit more complicated. Some objects do not have a meaningful value other than their identity (so value and identity may be synonymous, in some cases). Value can be defined as what the == operator compares, so any time a == b, you can say that a and b have the same value. Container objects (like lists) have a value that is defined by their contents, while some other kinds of objects will have values based on their attributes. Objects of different types can sometimes have the same values, as with numbers: 0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False (yep, bools are numbers in Python, for historic reasons).
If a class doesn't define an __eq__ method (to implement the == operator), it will inherit the default version from object and its instances will be compared solely by their identities. This is appropriate when otherwise identical instances may have important semantic differences. For instance, two different sockets connected to the same port of the same host need to be treated differently if one is fetching an HTML webpage and the other is getting an image linked from that page, so they don't have the same value.
In addition to a value, some objects have a hash value, which means they can be used as dictionary keys (and stored in sets). The function hash(a) returns the object a's hash value, a number based on the object's value. The hash of an object must remain the same for the lifetime of the object, so it only makes sense for an object to be hashable if its value is immutable (either because it's based on the object's identity, or because it's based on contents of the object that are themselves immutable).
Multiple different objects may have the same hash value, though well designed hash functions will avoid this as much as possible. Storing objects with the same hash in a dictionary is much less efficient than storing objects with distinct hashes (each hash collision requires more work). Objects are hashable by default (since their default value is their identity, which is immutable). If you write an __eq__ method in a custom class, Python will disable this default hash implementation, since your __eq__ function will define a new meaning of value for its instances. You'll need to write a __hash__ method as well, if you want your class to still be hashable. If you inherit from a hashable class but don't want to be hashable yourself, you can set __hash__ = None in the class body.
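Here is a small sketch of those rules with hypothetical classes: defining __eq__ alone removes the default hash, adding a matching __hash__ restores hashability, and setting __hash__ = None opts back out:

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

# hash(Point(1, 2))  -> TypeError: unhashable type: 'Point' (defining __eq__ disabled the default hash)

class HashablePoint(Point):
    def __hash__(self):
        return hash((self.x, self.y))  # consistent with __eq__

class UnhashablePoint(HashablePoint):
    __hash__ = None                    # opt back out of hashability

print(hash(HashablePoint(1, 2)) == hash(HashablePoint(1, 2)))  # True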
Unequal objects may have the same hash values.
Yes this is true. A simple example is hash(-1) == hash(-2) in CPython.
Equal objects need to have the same id values.
No, this is false in general. A simple counterexample, noted by chepner, is that 5 == 5.0 but id(5) != id(5.0).
Whenever obj1 is obj2 is evaluated, the id values of both objects are compared, not their hash values.
Yes this is true. is compares the id of the objects for equality (in CPython it is the memory address of the object). Generally, this has nothing to do with the object's hash value (the object need not even be hashable).
The hash function is used to:
quickly compare dictionary keys during a dictionary lookup
the ID function is used to:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
I just ran an experiment in which I assigned the same whole number to two separate variables, but when I used the is operator to compare them, it returned True. I didn't expect that; I thought the variables had to be the exact same object, like this:
a = 10
b = a
a is b # output == True
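What happens here is that b = a simply binds a second name to the very same object, so is is trivially True. Separately, CPython caches small integers, so even two independent assignments of 10 can yield the same object; that is an implementation detail you shouldn't rely on:

a = 10
b = a          # b is bound to the very same object as a
print(a is b)  # True -- one object, two names

c = 10
print(a is c)  # True in CPython because small ints are cached -- an implementation detail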
I want to subclass dict in Python such that all instances of the subclass are immutable.
I don't understand how __hash__ affects immutability, since in my understanding it only signifies the equality or non-equality of objects.
So, can __hash__ be used to implement immutability? How?
Update:
The objective is that a common response from an API is available as a dict, which has to be shared as a global variable, so it needs to stay intact no matter what.
I found an official reference: a suggestion contained in a rejected PEP.
class imdict(dict):
    def __hash__(self):
        return id(self)

    def _immutable(self, *args, **kws):
        raise TypeError('object is immutable')

    __setitem__ = _immutable
    __delitem__ = _immutable
    clear       = _immutable
    update      = _immutable
    setdefault  = _immutable
    pop         = _immutable
    popitem     = _immutable
Attribution : http://www.python.org/dev/peps/pep-0351/
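A quick usage sketch of the class above:

d = imdict({"a": 1, "b": 2})
print(d["a"])          # 1 -- reading works as usual
print({d: "usable"})   # it can be a dict key, because __hash__ is defined (by identity)

try:
    d["c"] = 3         # every mutating method raises
except TypeError as exc:
    print(exc)         # object is immutable

Note that because it hashes by id(), two imdicts with equal contents are still distinct keys.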
It is possible to create an immutable dict using just the standard library.
from types import MappingProxyType

power_levels = MappingProxyType(
    {
        "Kevin": 9001,
        "Benny": 8000,
    }
)
See the source of the idea for a more detailed explanation.
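One caveat worth knowing: the proxy is a read-only view, not a copy, so changes made to the underlying dict are still visible through it. A small sketch:

from types import MappingProxyType

underlying = {"Kevin": 9001}
view = MappingProxyType(underlying)

underlying["Benny"] = 8000  # mutating the wrapped dict directly...
print(view["Benny"])        # 8000 -- ...is visible through the proxy

# view["Benny"] = 0         # TypeError: 'mappingproxy' object does not support item assignment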
In frozendict, __hash__ is implemented simply, following Victor Stinner's rejected PEP 416:
def __hash__(self):
    try:
        fs = frozenset(self.items())
    except TypeError:
        hash_ = -1
    else:
        hash_ = hash(fs)  # use a separate name so the built-in hash() isn't shadowed

    if hash_ == -1:
        raise TypeError("Not all values are hashable.")

    return hash_
PS: I'm the new maintainer of the package.
So, can __hash__ be used to implement immutability?
No, it can't. The object can be made mutable (or not) irrespective of what its __hash__ method does.
The relationship between immutable objects and __hash__ is that, since an immutable object cannot be changed, the value returned by __hash__ remains constant post-construction. For mutable objects, this may or may not be the case (the recommended practice is that such objects simply fail to hash).
For further discussion, see Issue 13707: Clarify hash() constancy period.
Regarding the relationship between hashability and mutability:
To be useful, a hash implementation needs to fulfil the following properties:
The hash value of two objects that compare equal using == must be equal.
The hash value may not change over time.
These two properties imply that hashable classes cannot take mutable properties into account when comparing instances, and by contraposition that classes which do take mutable properties into account when comparing instances are not hashable. Immutable classes can be made hashable without any implications for comparison.
None of the built-in mutable types are hashable, and all of the immutable built-in types are hashable. This is mainly a consequence of the above observations.
User-defined classes by default define comparison based on object identity, and use the id() as hash. They are mutable, but the mutable data is not taken into account when comparing instances, so they can be made hashable.
Making a class hashable does not make it immutable in some magic way. On the contrary, to make a dictionary hashable in a reasonable way while keeping the original comparison operator, you will first need to make it immutable.
Edit: Regarding your update:
There are several ways to provide the equivalent of global immutable dictionary:
Use a collections.namedtuple() instance instead.
Use a user-defined class with read-only properties.
I'd usually go with something like this:
_my_global_dict = {"a": 42, "b": 7}

def request_value(key):
    return _my_global_dict[key]
By the leading underscore, you make clear that _my_global_dict is an implementation detail not to be touched by application code. Note that this code would still allow to modify dictionary values if they happen to be mutable objects. You could solve this problem by returning copy.copy()s or copy.deepcopy()s of the values if necessary.
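If you want to guard against that as well, a sketch of the copy-returning variant (same hypothetical names as above) could look like this:

import copy

_my_global_dict = {"a": [1, 2], "b": 7}

def request_value(key):
    # Return a deep copy so callers cannot mutate the shared values in place.
    return copy.deepcopy(_my_global_dict[key])

value = request_value("a")
value.append(99)             # only the caller's copy changes
print(_my_global_dict["a"])  # [1, 2]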
Since Python 3.3, it's possible to use MappingProxyType to create an immutable mapping:
>>> from types import MappingProxyType
>>> MappingProxyType({'a': 1})
mappingproxy({'a': 1})
>>> immutable_mapping = MappingProxyType({'a': 1})
>>> immutable_mapping['a']
1
>>> immutable_mapping['b'] = 2
Traceback (most recent call last):
(...)
TypeError: 'mappingproxy' object does not support item assignment
It's not hashable so you can't use it as a dictionary key (and it's "final", so you can't subclass it to override __hash__), but it's good enough if you want an immutable mapping to prevent accidental modification of a global value (like a class default attribute).
Be careful not to add mutable values that could themselves be modified.
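For example, a list stored as a value can still be mutated in place through the proxy (a small sketch with made-up keys):

from types import MappingProxyType

config = MappingProxyType({"hosts": ["db1"]})
config["hosts"].append("db2")  # allowed: the proxy only blocks assigning or deleting keys
print(config["hosts"])         # ['db1', 'db2']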
I stumbled across a blog post detailing how to implement a powerset function in Python. So I went about trying my own way of doing it, and discovered that Python apparently cannot have a set of sets, since set is not hashable. This is irksome, since the definition of a powerset is that it is a set of sets, and I wanted to implement it using actual set operations.
>>> set([ set() ])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'set'
Is there a good reason Python sets are not hashable?
Generally, only immutable objects are hashable in Python. The immutable variant of set() -- frozenset() -- is hashable.
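That also gives a direct way to build the powerset from the question: make it a set of frozensets. A sketch using itertools:

from itertools import chain, combinations

def powerset(iterable):
    items = list(iterable)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return {frozenset(subset) for subset in subsets}

print(powerset({1, 2}))
# e.g. {frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})} (set order varies)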
Because they're mutable.
If they were hashable, a hash could silently become "invalid", and that would pretty much make hashing pointless.
From the Python docs:
hashable
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().
In case this helps... if you really need to convert unhashable things into hashable equivalents for some reason you might do something like this:
from collections.abc import Hashable, MutableSet, MutableSequence, MutableMapping
def make_hashdict(value):
    """
    Inspired by https://stackoverflow.com/questions/1151658/python-hashable-dicts
    - with the added bonus that it inherits from the dict type of value
      so OrderedDicts maintain their order and other subclasses of dict() maintain their attributes
    """
    map_type = type(value)

    class HashableDict(map_type):
        def __init__(self, *args, **kwargs):
            super(HashableDict, self).__init__(*args, **kwargs)

        def __hash__(self):
            return hash(tuple(sorted(self.items())))

    hashDict = HashableDict(value)
    return hashDict
def make_hashable(value):
    if not isinstance(value, Hashable):
        if isinstance(value, MutableSet):
            value = frozenset(value)
        elif isinstance(value, MutableSequence):
            value = tuple(value)
        elif isinstance(value, MutableMapping):
            value = make_hashdict(value)
    return value
my_set = set()
my_set.add(make_hashable(['a', 'list']))
my_set.add(make_hashable({'a': 1, 'dict': 2}))
my_set.add(make_hashable({'a', 'new', 'set'}))
print(my_set)
My HashableDict implementation is the simplest and least rigorous example from here. If you need a more advanced HashableDict that supports pickling and other things, check the many other implementations. In my version above I wanted to preserve the original dict class, thus preserving the order of OrderedDicts. I also use AttrDict from here for attribute-like access.
My example above is not in any way authoritative, just my solution to a similar problem where I needed to store some things in a set and needed to "hashify" them first.