Why don't functions preserve identity? [closed] - python

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
I was wondering why Python 3.7 functions behave in a rather strange way. I think it's kinda weird and contradictory to the whole notion of hashability. Let me clarify what I encounter with a simple example code. Knowing that tuples are hashable, consider the following:
a = (-1, 20, 8)
b = (-1, 20, 8)
def f(x):
    return min(x), max(x)
Now let us examine:
>>> print(a is b, a.__hash__() == b.__hash__())
False True
>>> print((-1, 20, 8) is (-1, 20, 8))
True
This is odd enough, but I guess "naming" hashable objects makes them something different (their id()s change during variable definition). How about functions? Functions are hashable, right? Let's see:
>>> print(f(a) is f(b))
False
>>> print(id(f(a)) == id(f(b)), f(a).__hash__() == f(b).__hash__())
True True
Now this is the climax of my confusion. You should be surprised that even f(a) is f(a) is False. But how so? Don't you think this kind of behavior is incorrect and should be addressed and fixed by the Python community?

You can't guarantee that two identical calls return the same object: functions are objects in Python too, so they can maintain state. Even setting state aside, you shouldn't rely on is evaluating to True just because the contents of two objects are the same.
There are cases in which Python will optimize the code to use the same object as a singleton, but you shouldn't assume anything about this.
CPython caches small integers (-5 through 256) as an implementation detail, so is can return True for equal small integers computed at run time, while larger equal integers are usually distinct objects. If you only care about value equality, use ==; is is designed for identity checks.
c = 40
def f(x):
    return c + x
a = 1
f(a)
# 41
c += 1
f(a)
# 42
f(a) is f(a)
# True
c += 500
f(a) is f(a)
# False
f(a) is f(a) can yield the same object. For instance, CPython caches small integers (-5 through 256) as singletons, so the first test returns True. Once we are outside that range (after c += 500), each call creates its own object to return, and f(a) is f(a) returns False.
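To see the difference between == and is without the small-integer cache getting in the way, here is a minimal sketch that builds tuples at run time. The helper name make_tuple is made up for this illustration, and the identity result describes CPython.

```python
def make_tuple():
    # Building the tuple from a list at run time creates a fresh object on every call.
    return tuple([-1, 20, 8])

a = make_tuple()
b = make_tuple()

same_value = (a == b)    # compares contents
same_object = (a is b)   # compares identity

print(same_value, same_object)  # True False on CPython
```

Whenever the question is "do these hold the same data?", == is the operator to reach for.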

The is keyword in Python checks whether both operands point to the same object. Python provides the id() function, which returns an identifier that is unique among the objects alive at that moment. So a is b does not compare whether the objects contain the same value; it just returns whether a and b are the same object.
For a hashable value type such as a tuple, __hash__() returns a value based on the content/value of the object.
>>> a = (-1, 20, 8)
>>> b = (-1, 20, 8)
>>> id(a)
2347044252768
>>> id(b)
2347044252336
>>> hash(a)
-3789721413161926883
>>> hash(b)
-3789721413161926883
Now the last question: f(a) is f(b) checks whether the results returned by f(a) and f(b) point to the same object in memory.
In your function, return min(x), max(x) returns a new tuple containing the min and max of x. Therefore, print(f(a) is f(b)) is False.
f(a).__hash__() == f(b).__hash__() is True because it compares the hashes of the resulting values, not the hash of the function as you think.
If you want the hash of the function, use f.__hash__() or hash(f), since a function in Python is just a callable object.
The only interesting part is that print(id(f(a)) == id(f(b))) shows True. This is most likely an artifact of object lifetime rather than of an optimizer: the tuple returned by f(a) is discarded as soon as id() returns, so CPython can reuse the same memory for the tuple returned by f(b), and id() is only unique among objects that exist at the same time.
If you keep both results alive in variables, it returns False.
>>> c = f(a)
>>> d = f(b)
>>> print(id(f(a)) == id(f(b)))
True
>>> print(id(c) == id(d))
False
I'm not sure whether this is a bug that should be fixed, but it is an odd inconsistency. By the way, I'm using Python 3.7.2 on 64-bit Windows. The behavior might differ on other Python versions or implementations.
If you replace the integer values with strings, the behavior also changes due to Python's string interning optimization.
Therefore, the lesson here is the same as the general guideline in other languages: avoid comparing object references/pointers where possible, because you may be looking at implementation details of how objects are referenced and optimized, and possibly of how the GC works.
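As a small sketch of the points above, reusing f, a and b from the question: keeping both results in variables means both objects are alive at once, so their ids must differ, while their hashes still match because the values are equal. The variable names r1 and r2 are only for illustration.

```python
def f(x):
    return min(x), max(x)

a = (-1, 20, 8)
b = (-1, 20, 8)

# Keep both results alive: two objects that exist at the same time never share an id.
r1 = f(a)
r2 = f(b)
ids_differ = id(r1) != id(r2)

# Equal values give equal hashes, regardless of identity.
hashes_match = hash(r1) == hash(r2)

# hash(f) hashes the function object itself, not anything it returns.
function_hash_is_int = isinstance(hash(f), int)

print(ids_differ, hashes_match, function_hash_is_int)  # True True True
```

By contrast, id(f(a)) == id(f(b)) compares the ids of objects whose lifetimes do not overlap, so its result depends on how the interpreter happens to reuse memory.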
Here's an interesting related article: Python Optimization: How it Can Make You a Better Programmer

Related

I don't understand how these recursive calls could finish

Here is my simple Python function:
def f(b):
    return b and f(b)
To me this must be an infinite loop, no matter the value of b, because b is always the same. But:
With f(True) I get an overflow, which is what I expected
When I run f(False) I get False!
I'm very curious about that.
Recursion stops when b is falsey, i.e. False (or an empty string, or zero).
In some more detail,
x and y
does not evaluate y if x is False (or, as explicated above, another value which is interpreted as that when converted into a bool value, which Python calls "falsey"), as it is then impossible for y to affect the outcome of the expression. This is often called "short-circuiting".
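Short-circuiting can be observed directly with a sketch like the following. The helper noisy and the list calls are hypothetical names used only to record which operands actually get evaluated.

```python
calls = []

def noisy(value):
    # Record that this operand was actually evaluated.
    calls.append(value)
    return value

result_falsy = noisy(0) and noisy("right")    # left is falsey: right side is skipped
result_truthy = noisy(1) and noisy("right")   # left is truthy: right side is evaluated

def f(b):
    return b and f(b)

print(result_falsy, result_truthy, calls)  # 0 right [0, 1, 'right']
print(f(False), f(0), repr(f("")))         # False 0 ''
```

Note that and returns the deciding operand itself rather than a bool, which is why f(0) gives 0 and not False.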

What is the body of __hash__ function for the type int in cython python3 [duplicate]

When hash() is called in Python 3, I noticed that it returns a long integer for a string but not for an int.
Is it supposed to work this way? If so, won't such short hash values for the int type cause collisions?
for i in [i for i in range(5)]:
    print(hash(i))
print(hash("abc"))
The Result:
0
1
2
3
4
4714025963994714141
In CPython, the default Python interpreter implementation, the built-in hash works this way:
For numeric types, the hash of a number x is based on the reduction
of x modulo the prime P = 2**_PyHASH_BITS - 1. It's designed so that
hash(x) == hash(y) whenever x and y are numerically equal, even if
x and y have different types
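_PyHASH_BITS is 61 (64-bit systems) or 31 (32-bit systems) (defined here).
So on a 64-bit system, the built-in hash of a non-negative int behaves roughly like this function:
def hash(number):
    # Sketch for non-negative ints only; shadows the built-in for illustration.
    return number % (2 ** 61 - 1)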
That's why for small ints you got the same values, while for example hash(2305843009213693950) returns 2305843009213693950 and hash(2305843009213693951) returns 0
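The modulus does not have to be hard-coded: sys.hash_info.modulus exposes it. The sketch below checks the reduction for a few non-negative ints; int_hash_sketch is a made-up name, and negative values are deliberately left out because CPython treats them separately.

```python
import sys

P = sys.hash_info.modulus   # 2**61 - 1 on typical 64-bit CPython builds

def int_hash_sketch(n):
    # Only models non-negative ints; negative values are handled separately by CPython.
    return n % P

samples = [0, 1, 4, P - 1, P, P + 5, 10**30]
matches = all(hash(n) == int_hash_sketch(n) for n in samples)

# Numerically equal values of different types hash alike.
cross_type = hash(1) == hash(1.0) == hash(True)

print(matches, cross_type)  # True True
```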
The only purpose of the hash function is to produce an integer value that can be used to insert an object into a dict. The only thing hash guarantees is that if a == b, then hash(a) == hash(b). For a user-defined class Foo, it is the user's responsibility to ensure that Foo.__eq__ and Foo.__hash__ enforce this guarantee.
Anything else is implementation-dependent, and you shouldn't read anything into the value of hash(x) for any value x. Specifically, hash(a) == hash(b) is allowed for a != b, and hash(x) == x is not required for any particular x.
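One common way to keep that guarantee in a user-defined class is to derive both __eq__ and __hash__ from the same tuple of fields. This is only a sketch: the fields name and version and the helper _key are invented for the example.

```python
class Foo:
    """Hypothetical value type: equality and hash both derive from the same fields."""

    def __init__(self, name, version):
        self.name = name
        self.version = version

    def _key(self):
        return (self.name, self.version)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())

x = Foo("pkg", 1)
y = Foo("pkg", 1)
lookup = {x: "found"}

print(x == y, hash(x) == hash(y), lookup[y])  # True True found
```

Because x == y and hash(x) == hash(y), y works as a dict key wherever x was stored, even though the two are distinct objects.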
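If you need a long, fixed-length digest, you should use the hashlib module (sha256 below is just one possible choice):
>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b'abc')
>>> m.hexdigest()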

Python set __contains__ is not finding objects contained in the set [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 4 years ago.
(python 3.7.1 on linux)
I am observing some strange behavior storing user-defined objects in a set. The objects are highly complex so a minimal example is not in the cards-- but I am hoping the observed behavior will elicit an explanation from someone wiser than myself. Here it is:
>>> from mycode import MyObject
>>> a = MyObject(*args1)
>>> b = MyObject(*args2)
>>> a == b
False
>>> z = {a, b}
>>> len(z)
2
>>> a in z
False
My understanding was that an object is "in" a set if (1) its hash matches the hash of an object in the set and (2) it equals that object. But those expectations are violated here:
>>> [hash(t) for t in z]
[1013724486348463466, -1852733432963649245]
>>> hash(a)
1013724486348463466
>>> [(hash(t) == hash(a), t == a) for t in z]
[(True, True), (False, False)]
>>> [t is a for t in z]
[True, False]
And the strangest (syntactically) of all:
>>> [t in z for t in z]
[False, False]
What might be up with MyObject to cause it to behave this way? To recap: it has a sane __hash__ and __eq__ function, set is just a stock python set.
Here they are specifically:
class MyObject(object):
    ...
    def __hash__(self):
        return hash(self.link)

    def __eq__(self, other):
        """
        two entities are equal if their types, origins, and external references are the same.
        internal refs do not need to be equal; reference entities do not need to be equal
        :return:
        """
        if other is None:
            return False
        try:
            is_eq = (self.external_ref == other.external_ref
                     and self.origin == other.origin
                     and self.entity_type == other.entity_type)
        except AttributeError:
            is_eq = False
        return is_eq
All of those properties are defined on these objects. As demonstrated above, a == t evaluates to True for one of the objects in the set. Thanks for any suggestions.
I was mutating the objects after adding them to the set. The hash function as defined was not static.
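The effect can be reproduced with a much smaller class. The sketch below uses a hypothetical Node whose hash depends on a mutable link attribute, as in MyObject; integer links are used so the hashes are predictable, and the results describe CPython's set.

```python
class Node:
    """Hypothetical stand-in for MyObject: the hash depends on a mutable attribute."""

    def __init__(self, link):
        self.link = link

    def __hash__(self):
        return hash(self.link)

    def __eq__(self, other):
        return isinstance(other, Node) and self.link == other.link

n = Node(1)
s = {n}
before = n in s                        # found: the stored hash matches hash(n)

n.link = 2                             # mutate after insertion; hash(n) changes
after = n in s                         # lookup uses the new hash and misses
still_there = any(t is n for t in s)   # the object never left the set

n.link = 1                             # restoring the attribute restores the lookup
restored = n in s

print(before, after, still_there, restored)  # True False True True
```

The set records each element's hash at insertion time, so anything that feeds __hash__ should stay fixed for as long as the object is in a set or used as a dict key.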

Python 3 == operator

I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
Take the following example:
class Potato:
    def __eq__(self, other):
        print("In Potato's __eq__")
        return True
>>> p = Potato()
>>> p == "hello"
In Potato's __eq__ # As expected, p.__eq__("hello") is called
True
>>> "hello" == p
In Potato's __eq__ # Hmm, I expected this to be false because
True # this should call "hello".__eq__(p)
>>> "hello".__eq__(p)
NotImplemented # Not implemented? How does == work for strings then?
AFAIK, the docs only talk about the == -> __eq__ mapping, but don't say anything about what happens when either one of the arguments is not an object (e.g. 1 == p), or when the first object's __eq__ returns NotImplemented, like we saw with "hello".__eq__(p).
I'm looking for the general algorithm that is employed for equality... Most, if not all other SO answers, refer to Python 2's coercion rules, which don't apply anymore in Python 3.
You're mixing up the functions in the operator module and the methods used to implement those operators. operator.eq(a, b) is equivalent to a == b or operator.__eq__(a, b), but not to a.__eq__(b).
In terms of the __eq__ method, == and operator.eq work as follows:
def eq(a, b):
    if type(a) is not type(b) and issubclass(type(b), type(a)):
        # Give type(b) priority
        a, b = b, a
    result = a.__eq__(b)
    if result is NotImplemented:
        result = b.__eq__(a)
    if result is NotImplemented:
        result = a is b
    return result
with the caveat that the real code performs method lookup for __eq__ in a way that bypasses instance dicts and custom __getattribute__/__getattr__ methods.
When you do this:
"hello" == potato
Python first calls "hello".__eq__(potato). That returns NotImplemented, so Python tries it the other way: potato.__eq__("hello").
Returning NotImplemented doesn't mean there's no implementation of .__eq__ on that object. It means that the implementation didn't know how to compare to the value that was passed in. From https://docs.python.org/3/library/constants.html#NotImplemented:
Note When a binary (or in-place) method returns NotImplemented the
interpreter will try the reflected operation on the other type (or
some other fallback, depending on the operator). If all attempts
return NotImplemented, the interpreter will raise an appropriate
exception. Incorrectly returning NotImplemented will result in a
misleading error message or the NotImplemented value being returned to
Python code. See Implementing the arithmetic operations for examples.
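Here is a small sketch of how NotImplemented behaves in practice for ==. The class Meters is hypothetical; it only knows how to compare itself with other Meters instances and declines everything else.

```python
class Meters:
    """Hypothetical class that only knows how to compare with other Meters."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Meters):
            return NotImplemented   # "I can't decide", not "unequal"
        return self.value == other.value

m = Meters(3)

direct = m.__eq__("3")         # the raw method result
via_operator = (m == "3")      # both sides decline, so == falls back to identity
same_type = (m == Meters(3))   # Meters can decide this one itself

print(direct, via_operator, same_type)  # NotImplemented False True
```

For ==, the last-resort fallback is an identity comparison rather than an exception, which is why m == "3" quietly gives False.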
I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
No, that is only the case in the operator module. The operator module lets you pass == around as a function, for instance. But operator has little to do with vanilla Python itself.
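A minimal sketch of that use of the operator module, with made-up sample lists: operator.eq is == in function form, so it can be handed to anything that expects a callable.

```python
import operator

left = [1, "a", 3.0]
right = [1, "b", 3]

# operator.eq(x, y) gives the same result as x == y.
pairwise = list(map(operator.eq, left, right))

print(pairwise)  # [True, False, True]
```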
AFAIK, the docs only talk about the == -> eq mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's.
In Python everything is an object: an int is an object, a "class" is an object, None is an object, etc. We can for instance get the __eq__ of 0:
>>> (0).__eq__
<method-wrapper '__eq__' of int object at 0x55a81fd3a480>
So the equality is implemented in the "int class". As specified in the documentation on the data model, __eq__ can return several values: True, False, or any other object (whose truthiness will be evaluated in a boolean context). If, on the other hand, NotImplemented is returned, Python will fall back and call the __eq__ method of the object on the other side of the comparison.
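Both behaviours can be checked with a short sketch. The classes AlwaysEqual and Chatty are invented for illustration: one accepts any operand, the other returns a non-bool object from __eq__.

```python
same = (0).__eq__(0)            # int knows how to compare with int
declined = (0).__eq__("zero")   # int declines to compare with str

class AlwaysEqual:
    """Hypothetical class whose __eq__ accepts any operand."""
    def __eq__(self, other):
        return True

class Chatty:
    """Hypothetical class whose __eq__ returns a non-bool object."""
    def __eq__(self, other):
        return "equal-ish"

reflected = (0 == AlwaysEqual())   # int declines, so the right operand is asked
raw = (Chatty() == 0)              # == hands back whatever __eq__ returned
as_bool = bool(raw)                # an if statement would see this truthiness

print(same, declined, reflected, raw, as_bool)
# True NotImplemented True equal-ish True
```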

Does comparing using `==` compare identities before comparing values?

If I compare two variables using ==, does Python compare the identities, and, if they're not the same, then compare the values?
For example, I have two strings which point to the same string object:
>>> a = 'a sequence of chars'
>>> b = a
Does this compare the values, or just the ids?:
>>> b == a
True
It would make sense to compare identity first, and I guess that is the case, but I haven't yet found anything in the documentation to support this. The closest I've got is this:
x==y calls x.__eq__(y)
which doesn't tell me whether anything is done before calling x.__eq__(y).
For user-defined class instances, is is used as a fallback - where the default __eq__ isn't overridden, a == b is evaluated as a is b. This ensures that the comparison will always have a result (except in the NotImplemented case, where comparison is explicitly forbidden).
This is (somewhat obliquely - good spot Sven Marnach) referred to in the data model documentation (emphasis mine):
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such
that x == y implies both that x is y and hash(x) == hash(y).
You can demonstrate it as follows:
>>> class Unequal(object):
...     def __eq__(self, other):
...         return False
...
>>> ue = Unequal()
>>> ue is ue
True
>>> ue == ue
False
so __eq__ must be called before id, but:
>>> class NoEqual(object):
...     pass
...
>>> ne = NoEqual()
>>> ne is ne
True
>>> ne == ne
True
so id must be invoked where __eq__ isn't defined.
You can see this in the CPython implementation, which notes:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
The "sensible default" implemented is a C-level equality comparison of the pointers v and w, which will return whether or not they point to the same object.
In addition to the answer by @jonrsharpe: if the objects being compared implement __eq__, it would be wrong for Python to check for identity first.
Look at the following example:
>>> x = float('nan')
>>> x is x
True
>>> x == x
False
NaN is a specific thing that should never compare equal to itself; however, even in this case x is x should return True, because of the semantics of is.
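One related wrinkle, shown here as a sketch of CPython's behaviour: while == on the objects themselves always goes through __eq__, built-in containers do test identity before equality when comparing or searching their elements, so a NaN can be "found" in a list that holds that very object.

```python
x = float("nan")

plain = (x == x)                    # float.__eq__ is called; NaN never equals itself
in_list = (x in [x])                # containers test identity before equality
lists_equal = ([x] == [x])          # same shortcut, element by element
other_nan = (float("nan") in [x])   # a different NaN object: no identity match, no equality

print(plain, in_list, lists_equal, other_nan)  # False True True False
```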
