Difference between hash() and id() - python

I have two user-defined objects, say a and b.
Both these objects have the same hash values.
However, the id(a) and id(b) are unequal.
Moreover,
>>> a is b
False
>>> a == b
True
From this observation, can I infer the following?
Unequal objects may have the same hash values.
Equal objects need to have the same id values.
Whenever obj1 is obj2 is evaluated, the id values of both objects are compared, not their hash values.

There are three concepts to grasp when trying to understand id, hash and the == and is operators: identity, value and hash value. Not all objects have all three.
All objects have an identity, though even this can be a little slippery in some cases. The id function returns a number corresponding to an object's identity (in CPython, it returns the memory address of the object, but other interpreters may return something else). If two objects (that exist at the same time) have the same identity, they're actually two references to the same object. The is operator compares items by identity: for objects that are alive at the same time, a is b is equivalent to id(a) == id(b).
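A minimal sketch of the identity rules above. The variable names are made up for illustration; the point is only that two references to one object share an id, while an equal copy does not.

```python
# One list with two names bound to it, plus a separate list with equal contents.
first = [1, 2, 3]
alias = first                # a second reference to the same object
copy_of_first = list(first)  # a new object with the same contents

same_object = first is alias                   # identity comparison
same_ids = id(first) == id(alias)              # equivalent while both are alive
different_object = first is not copy_of_first  # the copy has its own identity
equal_values = first == copy_of_first          # but it compares equal by value

print(same_object, same_ids, different_object, equal_values)
```

All four names end up True: identity and value are separate questions.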
Identity can get a little confusing when you deal with objects that are cached somewhere in their implementation. For instance, the objects for small integers and some strings in CPython are not remade each time they're used. Instead, existing objects are returned any time they're needed. You should not rely on this in your code though, because it's an implementation detail of CPython (other interpreters may do it differently or not at all).
All objects also have a value, though this is a bit more complicated. Some objects do not have a meaningful value other than their identity (so value and identity may be synonymous, in some cases). Value can be defined as what the == operator compares, so any time a == b, you can say that a and b have the same value. Container objects (like lists) have a value that is defined by their contents, while some other kinds of objects will have values based on their attributes. Objects of different types can sometimes have the same values, as with numbers: 0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False (yep, bools are numbers in Python, for historical reasons).
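The numeric chain above can be checked directly. This sketch also shows a consequence that matters for dictionaries: values that compare equal hash equally, even across types. The names zeros, chain_equal and hashes are just illustrative.

```python
import decimal
import fractions

# Six objects of six different types that all represent zero.
zeros = [0, 0.0, 0j, decimal.Decimal("0"), fractions.Fraction(0), False]

# The same chained comparison as in the text.
chain_equal = (
    0 == 0.0 == 0j == decimal.Decimal("0") == fractions.Fraction(0) == False
)

# Equal values must hash equally, so the set of hashes has one element.
hashes = {hash(z) for z in zeros}
type_count = len({type(z) for z in zeros})

print(chain_equal, len(hashes), type_count)
```

Because of this, {0: "int"}[0.0] finds the entry stored under the integer key.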
If a class doesn't define an __eq__ method (to implement the == operator), it will inherit the default version from object and its instances will be compared solely by their identities. This is appropriate when otherwise identical instances may have important semantic differences. For instance, two different sockets connected to the same port of the same host need to be treated differently if one is fetching an HTML webpage and the other is getting an image linked from that page, so they don't have the same value.
In addition to a value, some objects have a hash value, which means they can be used as dictionary keys (and stored in sets). The function hash(a) returns the object a's hash value, a number based on the object's value. The hash of an object must remain the same for the lifetime of the object, so it only makes sense for an object to be hashable if its value is immutable (either because it's based on the object's identity, or because it's based on contents of the object that are themselves immutable).
Multiple different objects may have the same hash value, though well designed hash functions will avoid this as much as possible. Storing objects with the same hash in a dictionary is much less efficient than storing objects with distinct hashes (each hash collision requires more work). Objects are hashable by default (since their default value is their identity, which is immutable). If you write an __eq__ method in a custom class, Python will disable this default hash implementation, since your __eq__ function will define a new meaning of value for its instances. You'll need to write a __hash__ method as well, if you want your class to still be hashable. If you inherit from a hashable class but don't want to be hashable yourself, you can set __hash__ = None in the class body.
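Here is a small sketch of the __eq__/__hash__ interaction described above, assuming Python 3. PointNoHash and Point are hypothetical classes written for this example.

```python
class PointNoHash:
    """Defines __eq__ only, so Python 3 sets __hash__ to None for the class."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, PointNoHash):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)


class Point:
    """Defines both methods, hashing the same fields that __eq__ compares."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))


try:
    hash(PointNoHash(1, 2))
    no_hash_is_hashable = True
except TypeError:
    no_hash_is_hashable = False

# An equal but distinct Point finds the entry, because hash and == agree.
lookup = {Point(1, 2): "found"}
result = lookup[Point(1, 2)]
print(no_hash_is_hashable, result)
```

Note that Point should be treated as immutable after it is used as a key; changing x or y afterwards would change its hash.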

Unequal objects may have the same hash values.
Yes, this is true. A simple example is hash(-1) == hash(-2) in CPython.
Equal objects need to have the same id values.
No, this is false in general. A simple counterexample noted by @chepner is that 5 == 5.0 but id(5) != id(5.0).
Whenever obj1 is obj2 is evaluated, the id values of both objects are compared, not their hash values.
Yes, this is true. is compares the id of the objects for equality (in CPython it is the memory address of the object). Generally, this has nothing to do with the object's hash value (the object need not even be hashable).
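The three answers can be checked in a short session. This is only a sketch: the hash(-1) collision is a CPython detail, so it is guarded by an implementation check, and the variable names are illustrative.

```python
import platform

five_int, five_float = 5, 5.0

# Equal objects need not be the same object...
equal_but_distinct = (five_int == five_float) and (five_int is not five_float)
# ...but equal objects must have equal hashes.
equal_implies_same_hash = hash(five_int) == hash(five_float)

# Unequal objects sharing a hash: true on CPython, not promised elsewhere.
if platform.python_implementation() == "CPython":
    collision = (hash(-1) == hash(-2)) and (-1 != -2)
else:
    collision = None  # no claim made for other interpreters

# "is" works even when the object has no hash at all.
unhashable = []
identity_without_hash = unhashable is unhashable

print(equal_but_distinct, equal_implies_same_hash, collision, identity_without_hash)
```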

The hash() function is used to:
Quickly compare dictionary keys during a dictionary lookup.
The id() function is used to:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

I ran an experiment in which I assigned the same whole number to two separate variables, and when I compared them with the is operator, it returned True. I didn't expect that; I thought the variables had to be bound to exactly the same object, like this:
a = 10
b = a
a is b # output == True
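The surprising case (two separate assignments of the same number) can be reproduced with the sketch below. The integers are built at run time so the compiler cannot merge two identical literals. Whether the identity checks come out True or False is an implementation detail, so no output is promised here; on current CPython builds small integers are typically shared and large ones are not.

```python
# Two separately created small integers with the same value.
a = int("10")
b = int("10")
same_value = a == b    # guaranteed True
same_object = a is b   # implementation detail (small-integer caching)

# Two separately created large integers with the same value.
big_a = int("10" * 20)
big_b = int("10" * 20)
big_same_value = big_a == big_b    # guaranteed True
big_same_object = big_a is big_b   # implementation detail

print(same_value, same_object, big_same_value, big_same_object)
```

Only the == results are guaranteed by the language, which is why numbers should be compared with ==, not is.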

Related

Comparing falsy values in Python

Why does the expression below evaluate to False? Both the values in the comparison are Falsy.
print('' == [])
According to the documentation:
An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type.
Furthermore, it also states:
== compares the values of two objects
However, since the value of an object in Python is pretty much abstract, in that there is no canonical method to access it, the default behavior of == compares the identity of the two objects (which can be thought of as the object’s address in memory) but is defined as:
An integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same identity.
It's also important to mention that many built-in types have customized comparison methods within their respective types that compare based on their 'values'.
While it's true that both of your objects are falsy, they are of different types (str and list), and neither type knows how to compare itself with the other, so the default behavior of comparing identities is used. Since the two objects are alive at the same time, their identities must be different, and so '' == [] evaluates to False.
In order to compare the truth values of two objects, first convert them to booleans as Waket suggested in the comments:
bool('') == bool([])
taking advantage of Python's truth value testing.
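The fallback described above can be observed by calling the comparison methods by hand. This is a sketch for Python 3; calling __eq__ directly is done here only to expose what == does internally.

```python
empty_str, empty_list = "", []

# Each type declines to compare itself with the other type.
str_side = empty_str.__eq__(empty_list)    # NotImplemented
list_side = empty_list.__eq__(empty_str)   # NotImplemented

# With both sides declining, == falls back to identity, which differs.
values_equal = empty_str == empty_list

# Truth values, on the other hand, are both False and therefore equal.
truthiness_equal = bool(empty_str) == bool(empty_list)

print(str_side, list_side, values_equal, truthiness_equal)
```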

Distinct python classes instances returning the same class object [duplicate]

How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same Python interpreter instance, but you need to really try to make that become false).
Which is exactly what is does -- which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
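As a small illustration of operator.is_, here is a sketch that uses it where a function is needed instead of the is syntax. The target and candidates names are made up for the example.

```python
import operator

target = object()
candidates = [object(), target, object()]

# operator.is_(a, b) is the function form of "a is b".
matches = [operator.is_(candidate, target) for candidate in candidates]

# Because every candidate is still referenced by the list, all of them are
# alive, so comparing ids here would give the same answer as "is".
id_matches = [id(candidate) == id(target) for candidate in candidates]

print(matches, id_matches)
```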
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
@property
def page(self):
    if check_for_new_version():
        self._page = get_new_version()
    return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,[1] and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful because you have to save a reference to the object itself anyway -- to ensure that it stays alive. Neither is there any performance gain: the implementation of is is as simple as comparing pointers.
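If an id-keyed registry is wanted anyway, one possible shape is sketched below. IdentityRegistry is a hypothetical helper, not a standard class; it stores the object next to its id so the object stays alive and the id cannot be reused by something else.

```python
class IdentityRegistry:
    """Hypothetical registry that maps objects to data by identity."""

    def __init__(self):
        # id -> (object, data); holding the object keeps its id valid.
        self._entries = {}

    def register(self, obj, data):
        self._entries[id(obj)] = (obj, data)

    def lookup(self, obj):
        entry = self._entries.get(id(obj))
        # Double-check with "is" so the stored reference is what decides.
        if entry is not None and entry[0] is obj:
            return entry[1]
        raise KeyError("object is not registered")


registry = IdentityRegistry()
first = [1]    # lists are unhashable, but identity lookup still works
second = [1]   # equal to first, yet a different object

registry.register(first, "first list")
found = registry.lookup(first)

try:
    registry.lookup(second)
    second_missing = False
except KeyError:
    second_missing = True

print(found, second_missing)
```

The trade-off is that registered objects are never garbage collected until they are removed from the registry.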
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal because whether two variables point to the same object or not is only practical to know if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them, it doesn't matter if two variables point to two identical objects or to the same one.
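The difference between sharing a mutable and an immutable object can be seen in a few lines. This is a minimal sketch with made-up names.

```python
# Two names for one mutable list: a change through one name shows in the other.
shared = [1, 2]
other_name = shared
other_name.append(3)

# Two names for one immutable tuple: "+=" cannot change the tuple in place,
# so it builds a new tuple and rebinds only the name on the left.
original = (1, 2)
rebound = original
rebound += (3,)

print(shared, original, rebound)
```

After this runs, shared is [1, 2, 3], while original is still (1, 2) and only rebound is (1, 2, 3).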
[1] Sometimes, this is called an "unnamed expression".

In Python, a class object is an immutable object, but it can be modified. Why?

I am not familiar with Python. I recently found that class objects and instance objects can be used as dictionary keys, so I assumed that classes and instances are immutable objects. As we all know, a dictionary's key must be an immutable object, and a key like a tuple must contain only immutable objects. In other words, if I use a tuple as a dictionary key, it can't contain a list object, etc. But a class object can be a key, even though its attributes can be modified. This has confused me for a long time. Please enlighten me. I also don't understand the concepts of class namespace and instance namespace, and how they relate to each other. Could you explain this for me? Thanks in advance. The following is my test:
class Student(object):
    name = 'tests'
dic = {Student: 'test'}  # no error
print(id(Student))
Student.name = 'modified'
print(id(Student))
You need to be careful here. You're mixing 2 separate (but closely related) concepts.
The first concept is immutability. Immutable objects cannot change once you've created them.
The second concept is hashability -- the ability to construct a consistent integer value from an object that does not change over its lifetime and to define a consistent equality function[1]. Note, these constraints are very important (as we'll see).
The latter concept determines what can be used as a dictionary key (or set item). By default, class instances have a well defined equality function (two objects are equal iff they have the same id()). Class instances also have an integer value that does not change over their lifetime (their id()). Because the default hash() return value is derived from the id(), instances of classes (and classes themselves, which are instances of type) are hashable by default.
class Foo(object):
    def __init__(self, a):
        self.a = a
f1 = Foo(1)
f2 = Foo(1)
d = {
    f1: 1,
    f2: 2,
}
Here we have 2 separate Foo instances in our dictionary. Even though they look the same (both have a == 1), they aren't equal and they have different hash values.
f1 == f2 # False -- They do not have the same id()
hash(f1) == hash(f2) # False. By default, the hash() is derived from the id()
OK, but not all things are hashable, e.g. list and set instances aren't hashable. At some point, reference equality isn't so useful anymore. Suppose lists were hashed by their id() and I wrote:
d = {[1, 2, 3]: 6}
print(d[[1, 2, 3]])
I would get a KeyError. Why? Because my two lists aren't the same list; they just happen to have the same values. In other words, they are equal, but they don't have reference equality. That gets really confusing. To avoid all that confusion, the Python devs decided not to expose the list's id() as the list's hash(). Instead, hashing a list raises a TypeError with a (hopefully) more helpful error message.
hash([]) # TypeError: unhashable type: 'list'
Note that equality is overridden to do the natural thing rather than compare by id():
l1 = [1]
l2 = [1]
l1 == l2 # True. Nice.
Alright, so far we've basically said that to put something in a dictionary, we need to have well-behaved __hash__ and __eq__ methods and that objects have those by default. Some objects choose to remove them to avoid confusing situations. Where does immutability come into this?
So far, our world consists of being able to store things in a table and look them up solely by the object's id(). That's super useful sometimes, but it's still really restrictive. I wouldn't be able to use integers in a lookup table naturally if all I can rely on is their id() (what if I store it using a literal but then do a lookup using the result of a computation?). Fortunately, we live in a world that lets us get around that problem -- immutability aids in the construction of a hash() value that isn't tied to the object's id() and isn't in danger of changing during the object's lifetime. This can be super useful because now I can do:
d = {(1, 2, 3): 4}
d[(1, 2) + (3,)] # 4!
Now the two tuples that I used were not the same tuple (they didn't have the same id()), but they are equal and because they're immutable, we can construct a hash() function that uses the contents of the tuple rather than its id(). This is super useful! Note that if the tuple was mutable and we tried to play this trick, we'd (potentially) violate the condition that hash() should not change over the lifetime of the object.
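To make the last point concrete, here is a sketch of what goes wrong when a mutable object hashes its contents. MutableKey is a hypothetical class, and its hash (the sum of its items) is deliberately simple so the numbers are easy to follow.

```python
class MutableKey:
    """Hypothetical key that (unwisely) hashes contents that can change."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if not isinstance(other, MutableKey):
            return NotImplemented
        return self.items == other.items

    def __hash__(self):
        return sum(self.items)  # depends on mutable state


key = MutableKey([1, 2, 3])   # hash is 6 when it is stored
table = {key: "value"}
found_before = MutableKey([1, 2, 3]) in table

key.items.append(4)           # the key's hash would now be 10

# Old contents: the hash matches the stored 6, but == fails.
found_by_old_contents = MutableKey([1, 2, 3]) in table
# New contents: == would succeed, but the hash no longer matches.
found_by_new_contents = MutableKey([1, 2, 3, 4]) in table

print(found_before, found_by_old_contents, found_by_new_contents, len(table))
```

The entry is still in the dictionary, yet no equal key can reach it any more.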
[1] Consistent here means that if two objects are equal, then they also must have the same hash. This is necessary for resolving hash collisions which I won't discuss here in detail...

When is comparison by id used in Python? Dictionary key comparison?

What does this mean?
The only types of values not acceptable as dictionary keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by object identity, the reason being that the efficient implementation of dictionaries requires a key’s hash value to remain constant.
I think even for tuples, comparison will happen by value.
The problem with a mutable object as a key is that when we use a dictionary, we rarely want to check identity. For example, when we use a dictionary like this:
a = "bob"
test = {a: 30}
print(test["bob"])
We expect it to work - the second string "bob" may not be the same as a, but it is the same value, which is what we care about. This works as any two strings that equate will have the same hash, meaning that the dict (implemented as a hashmap) can find those strings very efficiently.
The issue comes into play when we have a list as a key, imagine this case:
a = ["bob"]
test = {a: 30}
print(test[["bob"]])
We can't do this any more. If the list were hashable at all, its hash could not safely be based on its value, only on the particular list instance, and the two lists here are different instances (i.e. id(a) != id(["bob"])), so the lookup would fail.
Python has the choice of making the list's hash change (undermining the efficiency of a hashmap) or simply comparing on identity (which is useless in most cases). Python disallows these specific mutable keys to avoid subtle but common bugs where people expect the values to be equated on value, rather than identity.
The documentation mixes together two different things: mutability, and value-comparable. Let's separate them out.
Immutable objects that compare by identity are fine. The identity can never change, for any object.
Immutable objects that compare by value are fine. The value can never change for an immutable object. This includes tuples.
Mutable objects that compare by identity are fine. The identity can never change, for any object.
Mutable objects that compare by value are not acceptable. The value can change for a mutable object, which would make the dictionary invalid.
Meanwhile, your wording isn't quite the same as Mapping Types (4.10 in Python 3.3 or 5.8 in Python 2.7), both of which say:
A dictionary’s keys are almost arbitrary values. Values that are not hashable, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys.
Anyway, the key point here is that the rule is "not hashable"; "mutable types (that are compared by value rather than by object identity)" is just to explain things a little further. It isn't strictly true that comparing by object identity and hashing by object identity are always the same (the only thing that's required is that if id is equal, the hash is equal).
The part about "efficient implementation of dictionaries" from the version you posted just adds to the confusion (which is probably why it's not in the reference documentation). Even if someone came up with an efficient way to deal with storing lists as dict keys tomorrow, the language doesn't allow it.
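The "not hashable" rule is easy to probe. The helper below, is_hashable, is a made-up name for this sketch; it just reports whether hash() accepts an object.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


results = {
    "tuple of ints": is_hashable((1, 2, 3)),
    "list": is_hashable([1, 2, 3]),
    "dict": is_hashable({"a": 1}),
    # A tuple is only hashable if everything inside it is hashable.
    "tuple containing a list": is_hashable((1, [2, 3])),
    # Plain instances are mutable but compare by identity, so they are fine.
    "plain object": is_hashable(object()),
}

for description, hashable in results.items():
    print(description, hashable)
```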
A hash is a way of calculating an integer code for an object; this code is always the same for the same object (within one run of the interpreter), though it is not guaranteed to be unique. For example, if hash('test') is 2314058222102390712, then after a = 'test', hash(a) is also 2314058222102390712.
Internally a dictionary value is searched for by the hash of the key, not by the variable you specify. A list is mutable, so a hash for a list, if it were defined, would change whenever the list changes. Therefore Python's design does not hash lists, and lists cannot be used as dictionary keys.
Tuples are immutable, therefore tuples have hashes, e.g. hash((1,2)) = 3713081631934410656 (the exact number depends on the Python version and build). One could quickly check whether a tuple a might be equal to the tuple (1,2) by comparing the hashes first: different hashes mean the tuples are definitely unequal, while equal hashes still need a comparison by value, since unequal objects can share a hash.
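The idea of using the hash as a quick first check can be sketched as follows. fast_equal and ConstantHash are hypothetical names; ConstantHash forces a collision so the example does not depend on any particular hash values.

```python
def fast_equal(a, b):
    """Reject on a hash mismatch, otherwise confirm with a real comparison."""
    if hash(a) != hash(b):
        return False   # different hashes: the objects cannot be equal
    return a == b      # same hash: still has to be checked by value


class ConstantHash:
    """Hypothetical class whose instances all share one hash value."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, ConstantHash):
            return NotImplemented
        return self.value == other.value

    def __hash__(self):
        return 42


same_tuples = fast_equal((1, 2), (1, 2))
different_tuples = fast_equal((1, 2), (2, 1))
colliding_but_unequal = fast_equal(ConstantHash(1), ConstantHash(2))

print(same_tuples, different_tuples, colliding_but_unequal)
```

This is roughly the order of checks a hash table performs, which is why a hash alone is never enough to decide equality.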

Is there an object unique identifier in Python

This would be similar to the java.lang.Object.hashcode() method.
I need to store objects I have no control over in a set, and make sure that only if two objects are actually the same object (not contain the same values) will the values be overwritten.
id(x)
will do the trick for you. But I'm curious, what's wrong about the set of objects (which does combine objects by value)?
For your particular problem I would probably keep the set of ids or of wrapper objects. A wrapper object will contain one reference and compare by x==y <==> x.ref is y.ref.
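The wrapper idea could look something like this. IdentityRef is a hypothetical name; the sketch assumes the wrapped objects should count as the same set member only when they are the very same object.

```python
class IdentityRef:
    """Hypothetical wrapper that compares and hashes by the wrapped identity."""

    def __init__(self, ref):
        self.ref = ref   # holding the reference keeps the id valid

    def __eq__(self, other):
        if not isinstance(other, IdentityRef):
            return NotImplemented
        return self.ref is other.ref

    def __hash__(self):
        return id(self.ref)


a = [1, 2]   # lists are unhashable and compare by value...
b = [1, 2]   # ...so a == b even though they are separate objects

seen = {IdentityRef(a), IdentityRef(b), IdentityRef(a)}
distinct_objects = len(seen)
contains_a = IdentityRef(a) in seen
contains_equal_copy = IdentityRef([1, 2]) in seen

print(distinct_objects, contains_a, contains_equal_copy)
```

A set of wrappers is safer than a set of bare ids, because the wrapper keeps each object alive for as long as it is in the set.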
It's also worth noting that Python objects have a hash function as well. This function is necessary to put an object into a set or dictionary. It is allowed to sometimes collide for different objects, though good implementations of hash try to make it less likely.
That's what "is" is for.
Instead of testing "if a == b", which tests for the same value,
test "if a is b", which will test for the same identifier.
As ilya n mentions, id(x) produces a unique identifier for an object.
But your question is confusing, since Java's hashCode method doesn't give a unique identifier. Java's hashCode works like most hash functions: it always returns the same value for the same object, two objects that are equal always get equal codes, and unequal hash codes imply unequal objects. In particular, two different and unequal objects can get the same value.
This is confusing because cryptographic hash functions are quite different from this, and more like (though not exactly) the "unique id" that you asked for.
The Python equivalent of Java's hashCode method is hash(x).
You don't have to compare objects before placing them in a set. set() semantics already takes care of this.
class A(object):
    a = 10
    b = 20
    def __hash__(self):
        return hash((self.a, self.b))
a1 = A()
a2 = A()
a3 = A()
a4 = a1
s = set([a1, a2, a3, a4])
s
=> set([<__main__.A object at 0x222a8c>, <__main__.A object at 0x220684>, <__main__.A object at 0x22045c>])
Note: You really don't have to override hash to prove this behaviour :-)
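For comparison, here is the same experiment without overriding __hash__, as a minimal sketch. Plain is a made-up class name; it relies entirely on the defaults inherited from object.

```python
class Plain:
    a = 10
    b = 20


p1 = Plain()
p2 = Plain()
p3 = Plain()
p4 = p1   # a second name for the first instance, not a new object

# Default equality is by identity, so only the alias is a duplicate.
s = {p1, p2, p3, p4}
member_count = len(s)

print(member_count, p4 in s, Plain() in s)
```

The set keeps three members: the alias collapses into p1, and a brand-new instance is not considered present.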
