when compare by id is used in Python? Dictionary key comparison? - python

What does this mean?
The only types of values not acceptable as dictionary keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by object identity, the reason being that the efficient implementation of dictionaries requires a key’s hash value to remain constant.
I think even for tuples, comparison will happen by value.

The problem with a mutable object as a key is that when we use a dictionary, we rarely want to check identity. For example, when we use a dictionary like this:
a = "bob"
test = {a: 30}
print(test["bob"])
We expect it to work - the second string "bob" may not be the same as a, but it is the same value, which is what we care about. This works as any two strings that equate will have the same hash, meaning that the dict (implemented as a hashmap) can find those strings very efficiently.
The issue comes into play when we have a list as a key, imagine this case:
a = ["bob"]
test = {a: 30}
print(test[["bob"]])
We can't do this any more - the comparison won't work as the hash of a list is not based on it's value, but rather the instance of the list (aka (id(a) != id(["bob"))).
Python has the choice of making the list's hash change (undermining the efficiency of a hashmap) or simply comparing on identity (which is useless in most cases). Python disallows these specific mutable keys to avoid subtle but common bugs where people expect the values to be equated on value, rather than identity.

The documentation mixes together two different things: mutability, and value-comparable. Let's separate them out.
Immutable objects that compare by identity are fine. The identity can
never change, for any object.
Immutable objects that compare by value are fine. The value can never
change for an immutable object. This includes tuples.
Mutable objects that compare by identity are fine. The identity can
never change, for any object.
Mutable objects that compare by value are not acceptable. The value
can change for a mutable object, which would make the dictionary
invalid.
Meanwhile, your wording isn't quite the same as Mapping Types (4.10 in Python 3.3 or 5.8 in Python 2.7, both of which say:
A dictionary’s keys are almost arbitrary values. Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.
Anyway, the key point here is that the rule is "not hashable"; "mutable types (that are compared by value rather than by object identity)" is just to explain things a little further. It isn't strictly true that comparing by object identity and hashing by object identity are always the same (the only thing that's required is that if id is equal, the hash is equal).
The part about "efficient implementation of dictionaries" from the version you posted just adds to the confusion (which is probably why it's not in the reference documentation). Even if someone came up with an efficient way to deal with storing lists as dict keys tomorrow, the language doesn't allow it.

A hash is way of calculating an unique code for an object, this code always the same for the same object. hash('test') for example is 2314058222102390712, so is a = 'test'; hash(a) = 2314058222102390712.
Internally a dictionary value is searched by the hash, not by the variable you specify. A list is mutable, a hash for a list, if it where defined, would be changing whenever the list changes. Therefore python's design does not hash lists. Lists therefore can not be used as dictionary keys.
Tuples are immutable, therefore tubles have hashes e.G. hash((1,2)) = 3713081631934410656. one could compare whether a tuple a is equal to the tuple (1,2) by comparing the hash, rather than the value. This would be more efficient as we have to compare only one value instead of two.

Related

can i use dictionary as a value of set?

as you can see that i am trying to use dictionary in set as a value but it is showing error i want to know that why it is not possible to use dictionary as a value of set ? i want to know why ? is there any reason ? it is not working at all and so many errors are comming So please help me what is the problem ? why can not i use dictionary as a value of set sequence ? but it is working with list and tuple but it is not working only with set why?
s={1,2,4,{1:'fc',2:'tw'},'co-operator'}
print(s)
A set requires that all elements in it are hashable. A dictionary is not hashable.
An object is hashable if it has a hash value which never
changes during its lifetime (it needs a hash() method), and can be
compared to other objects (it needs an eq() method). Hashable
objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set
member, because these data structures use the hash value internally.
Most of Python’s immutable built-in objects are hashable; mutable
containers (such as lists or dictionaries) are not; immutable
containers (such as tuples and frozensets) are only hashable if their
elements are hashable. Objects which are instances of user-defined
classes are hashable by default. They all compare unequal (except with
themselves), and their hash value is derived from their id().

In Python, why is a tuple hashable but not a list?

Here below when I try to hash a list, it gives me an error but works with a tuple. Guess it has something to do with immutability. Can someone explain this in detail ?
List
x = [1,2,3]
y = {x: 9}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Tuple
z = (5,6)
y = {z: 89}
print(y)
{(5, 6): 89}
Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happens "under the covers" - you as the programmer don't need to do anything and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.
Immutable objects, or objects that can't be altered, are hashable. They have a single unique value that never changes, so python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, tuples, integers and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assigns it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.
You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)
Mutable objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other mutable types don't have a __hash__() method. Because of this, you can't use mutable objects as dictionary keys:
>>> hash([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys
Here are examples why it might not be a good idea to allow mutable types as keys. This behaviour might be useful in some cases (e.g. using the state of the object as a key rather than the object itself) but it also might lead to suprising results or bugs.
Python
It's possible to use a numeric list as a key by defining __hash__ on a subclass of list :
class MyList(list):
def __hash__(self):
return sum(self)
my_list = MyList([1, 2, 3])
my_dict = {my_list: 'a'}
print(my_dict.get(my_list))
# a
my_list[2] = 4 # __hash__() becomes 7
print(next(iter(my_dict)))
# [1, 2, 4]
print(my_dict.get(my_list))
# None
print(my_dict.get(MyList([1,2,3])))
# None
my_list[0] = 0 # __hash_() is 6 again, but for different elements
print(next(iter(my_dict)))
# [0, 2, 4]
print(my_dict.get(my_list))
# 'a'
Ruby
In Ruby, it's allowed to use a list as a key. A Ruby list is called an Array and a dict is a Hash, but the syntax is very similar to Python's :
my_list = [1]
my_hash = { my_list => 'a'}
puts my_hash[my_list]
#=> 'a'
But if this list is modified, the dict doesn't find the corresponding value any more, even if the key is still in the dict :
my_list << 2
puts my_list
#=> [1,2]
puts my_hash.keys.first
#=> [1,2]
puts my_hash[my_list]
#=> nil
It's possible to force the dict to calculate the key hashes again :
my_hash.rehash
puts my_hash[my_list]
#=> 'a'
A hashset calculates the hash of an object and based on that hash, stores the object in the structure for fast lookup. As a result, by contract once an object is added to the dictionary, the hash is not allowed to change. Most good hash functions will depend on the number of elements and the elements itself.
A tuple is immutable, so after construction, the values cannot change and therefore the hash cannot change either (or at least a good implementation should not let the hash change).
A list on the other hand is mutable: one can later add/remove/alter elements. As a result the hash can change violating the contract.
So all objects that cannot guarantee a hash function that remains stable after the object is added, violate the contract and thus are no good candidates. Because for a lookup, the dictionary will first calculate the hash of the key, and determine the correct bucket. If the key is meanwhile changed, this could result in false negatives: the object is in the dictionary, but it can no longer be retrieved because the hash is different so a different bucket will be searched than the one where the object was originally added to.
I would like to add the following aspect as it's not covered by other answers already.
There's nothing wrong about making mutable objects hashable, it's just not unambiguous and this is why it needs to be defined and implemented consistently by the programmer himself (not by the programming language).
Note that you can implement the __hash__ method for any custom class which allows its instances to be stored in contexts where hashable types are required (such as dict keys or sets).
Hash values are usually used to decide if two objects represent the same thing. So consider the following example. You have a list with two items: l = [1, 2]. Now you add an item to the list: l.append(3). And now you must answer the following question: Is it still the same thing? Both - yes and no - are valid answers. "Yes", it is still the same list and "no", it has not the same content anymore.
So the answer to this question depends on you as the programmer and so it is up to you to manually implement hash methods for your mutable types.
Based on Python Glossary
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.
Because a list is mutable, while a tuple is not. When you store the hash of a value in, for example, a dict, if the object changes, the stored hash value won't find out, so it will remain the same. The next time you look up the object, the dictionary will try to look it up by the old hash value, which is not relevant anymore.
To prevent that, python does not allow you to has mutable items.

dictionary keys: custom objects vs lists

I have read that lists cannot be dictionary keys because mutable objects cannot be hashed.
However, custom objects appear to be mutable as well:
# custom object
class Vertex(object):
def __init__(self, key):
self.key = key
v = Vertex(1)
v.color = 'grey' # this line suggests the custom object is mutable
But, unlike lists, they can be used as dictionary keys; why is this? Couldn't we simply hash some sort of id (such as the address of the object in memory) in both cases?
as noted in Why Lists can't be Dictionary Keys:
Lists as Dictionary Keys
That said, the simple answer to why lists cannot be used as dictionary keys is that lists do not provide a valid hash method. Of course, the obvious question is, "Why not?"
Consider what kinds of hash functions could be provided for lists.
If lists hashed by id, this would certainly be valid given Python's definition of a hash function -- lists with different hash values would have different ids. But lists are containers, and most other operations on them deal with them as such. So hashing lists by their id instead would produce unexpected behavior such as:
Looking up different lists with the same contents would produce different results, even though comparing lists with the same contents would indicate them as equivalent.
Using a list literal in a dictionary lookup would be pointless -- it would always produce a KeyError.
User Defined Types as Dictionary Keys
What about instances of user defined types?
By default, all user defined types are usable as dictionary keys with hash(object) defaulting to id(object), and cmp(object1, object2) defaulting to cmp(id(object1), id(object2)). This same suggestion was discussed above for lists and found unsatisfactory. Why are user defined types different?
In the cases where an object must be placed in a mapping, object identity is often much more important than object contents.
In the cases where object content really is important, the default settings can be redefined by overridding __hash__ and __cmp__ or __eq__.
Note that it is often better practice, when an object is to be associated with a value, to simply assign that value as one of the object's attributes.

Why are integers immutable in Python?

I understand the differences between mutable and immutable objects in Python. I have read many posts discussing the differences. However, I have not read anything regarding WHY integers are immutable objects.
Does there exist a reason for this? Or is the answer "that's just how it is"?
Edit: I am getting prompted to 'differentiate' this question from other questions as it seems to be a previously asked question. However, I believe what I'm asking is more of a philosophical Python question rather than a technical Python question.
It appears that 'primitive' objects in Python (i.e. strings, booleans, numbers, etc.) are immutable. I've also noticed that derived data types that are made up of primitives (i.e. dicts, lists, classes) are mutable.
Is that where the line is drawn whether or not an object is mutable? Primitive vs derived?
Making integers mutable would be very counter-intuitive to the way we are used to working with them.
Consider this code fragment:
a = 1 # assign 1 to a
b = a+2 # assign 3 to b, leave a at 1
After these assignments are executed we expect a to have the value 1 and b to have the value 3. The addition operation is creating a new integer value from the integer stored in a and an instance of the integer 2.
If the addition operation just took the integer at a and just mutated it then both a and b would have the value 3.
So we expect arithmetic operations to create new values for their results - not to mutate their input parameters.
However, there are cases where mutating a data structure is more convenient and more efficient. Let's suppose for the moment that list.append(x) did not modify list but returned a new copy of list with x appended.
Then a function like this:
def foo():
nums = []
for x in range(0,10):
nums.append(x)
return nums
would just return the empty list. (Remember - here nums.append(x) doesn't alter nums - it returns a new list with x appended. But this new list isn't saved anywhere.)
We would have to write the foo routine like this:
def foo():
nums = []
for x in range(0,10):
nums = nums.append(x)
return nums
(This, in fact, is very similar to the situation with Python strings up until about 2.6 or perhaps 2.5.)
Moreover, every time we assign nums = nums.append(x) we would be copying a list that is increasing in size resulting in quadratic behavior.
For those reasons we make lists mutable objects.
A consequence to making lists mutable is that after these statements:
a = [1,2,3]
b = a
a.append(4)
the list b has changed to [1,2,3,4]. This is something that we live with even though it still trips us up now and then.
What are the design decisions to make numbers immutable in Python?
There are several reasons for immutability, let's see first what are the reasons for immutability?
1- Memory
Saves memory. If it's well known that an object is immutable, it can be easily copied creating a new reference to the same object.
Performance. Python can allocate space for an immutable object at creation time, and the storage requirements are fixed and unchanging.
2- Fast execution.
It doesn't have to copy each part of the object, only a simple reference.
Easy to be compared, comparing equality by reference is faster than comparing values.
3- Security:
In Multi-threading apps Different threads can interact with data contained inside the immutable objects, without to worry about data consistency.
The internal state of your program will be consistent even if you have exceptions.
Classes should be immutable unless there's a very good reason to make them mutable....If a class cannot be made immutable, limit its mutability as much as possible
4- Ease to use
Is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways.
Immutable objects are easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce.
5- Keys must be immutable. Which means you can use strings, numbers or tuples as dictionary key. This is something that you want to use.
The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different.
Going back to the integers:
Security (3), Easy to use (4) and capacity of using numbers as keys in dictionaries (5) are reasons for taken the decision of making numbers immutable.
Has fixed memory requirements since creation time (1).
All in Python is an object, the numbers (like strings) are "elemental" objects. No amount of activity will change the value 8 to anything else, and no amount of activity will change the string “eight” to anything else. This is because a decision in the design too.

Difference between hash() and id()

I have two user-defined objects, say a and b.
Both these objects have the same hash values.
However, the id(a) and id(b) are unequal.
Moreover,
>>> a is b
False
>>> a == b
True
From this observation, can I infer the following?
Unequal objects may have the same hash values.
Equal objects need to have the same id values.
Whenever obj1 is obj2 is called, the id values of both objects is compared, not their hash values.
There are three concepts to grasp when trying to understand id, hash and the == and is operators: identity, value and hash value. Not all objects have all three.
All objects have an identity, though even this can be a little slippery in some cases. The id function returns a number corresponding to an object's identity (in cpython, it returns the memory address of the object, but other interpreters may return something else). If two objects (that exist at the same time) have the same identity, they're actually two references to the same object. The is operator compares items by identity, a is b is equivalent to id(a) == id(b).
Identity can get a little confusing when you deal with objects that are cached somewhere in their implementation. For instance, the objects for small integers and strings in cpython are not remade each time they're used. Instead, existing objects are returned any time they're needed. You should not rely on this in your code though, because it's an implementation detail of cpython (other interpreters may do it differently or not at all).
All objects also have a value, though this is a bit more complicated. Some objects do not have a meaningful value other than their identity (so value an identity may be synonymous, in some cases). Value can be defined as what the == operator compares, so any time a == b, you can say that a and b have the same value. Container objects (like lists) have a value that is defined by their contents, while some other kinds of objects will have values based on their attributes. Objects of different types can sometimes have the same values, as with numbers: 0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False (yep, bools are numbers in Python, for historic reasons).
If a class doesn't define an __eq__ method (to implement the == operator), it will inherit the default version from object and its instances will be compared solely by their identities. This is appropriate when otherwise identical instances may have important semantic differences. For instance, two different sockets connected to the same port of the same host need to be treated differently if one is fetching an HTML webpage and the other is getting an image linked from that page, so they don't have the same value.
In addition to a value, some objects have a hash value, which means they can be used as dictionary keys (and stored in sets). The function hash(a) returns the object a's hash value, a number based on the object's value. The hash of an object must remain the same for the lifetime of the object, so it only makes sense for an object to be hashable if its value is immutable (either because it's based on the object's identity, or because it's based on contents of the object that are themselves immutable).
Multiple different objects may have the same hash value, though well designed hash functions will avoid this as much as possible. Storing objects with the same hash in a dictionary is much less efficient than storing objects with distinct hashes (each hash collision requires more work). Objects are hashable by default (since their default value is their identity, which is immutable). If you write an __eq__ method in a custom class, Python will disable this default hash implementation, since your __eq__ function will define a new meaning of value for its instances. You'll need to write a __hash__ method as well, if you want your class to still be hashable. If you inherit from a hashable class but don't want to be hashable yourself, you can set __hash__ = None in the class body.
Unequal objects may have the same hash values.
Yes this is true. A simple example is hash(-1) == hash(-2) in CPython.
Equal objects need to have the same id values.
No this is false in general. A simple counterexample noted by #chepner is that 5 == 5.0 but id(5) != id(5.0).
Whenever, obj1 is obj2 is called, the id values of both the objects is compared, not their hash values.
Yes this is true. is compares the id of the objects for equality (in CPython it is the memory address of the object). Generally, this has nothing to do with the object's hash value (the object need not even be hashable).
The hash function is used to:
quickly compare dictionary keys during a dictionary lookup
the ID function is used to:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
I just had an experiment , which assigned same whole number to 2 separated variables , but when I used ' is ' operator to compare them , then in returned True ; I didn't expect such a thing , Thought that the variables must be exact same like==>
a = 10
b = a
a is b # output == True

Categories