Set "in" operator: uses equality or identity? - python

class A(object):
def __cmp__(self):
print '__cmp__'
return object.__cmp__(self)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Why does first line prints True, but second prints False? And neither enters operator eq?
I am using Python 2.6

Set __contains__ makes checks in the following order:
'Match' if hash(a) == hash(b) and (a is b or a==b) else 'No Match'
The relevant C source code is in Objects/setobject.c::set_lookkey() and in Objects/object.c::PyObject_RichCompareBool().

You need to define __hash__ too. For example
class A(object):
def __hash__(self):
print '__hash__'
return 42
def __cmp__(self, other):
print '__cmp__'
return object.__cmp__(self, other)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Will work as expected.
As a general rule, any time you implement __cmp__ you should implement a __hash__ such that for all x and y such that x == y, x.__hash__() == y.__hash__().

Sets and dictionaries gain their speed by using hashing as a fast approximation of full equality checking. If you want to redefine equality, you usually need to redefine the hash algorithm so that it is consistent.
The default hash function uses the identity of the object, which is pretty useless as a fast approximation of full equality, but at least allows you to use an arbitrary class instance as a dictionary key and retrieve the value stored with it if you pass exactly the same object as a key. But it means if you redefine equality and don't redefine the hash function, your objects will go into a dictionary/set without complaining about not being hashable, but still won't actually work the way you expect them to.
See the official python docs on __hash__ for more details.

A tangential answer, but your question and my testing made me curious. If you ignore the set operator which is the source of your __hash__ problem, it turns out your question is still interesting.
Thanks to the help I got on this SO question, I was able to chase the in operator through the source code to it's root. Near the bottom I found the PyObject_RichCompareBool function which indeed tests for identity (see the comment about "Quick result") before testing for equality.
So unless I misunderstand the way things work, the technical answer to your question is first identity and then equality, through the equality test itself. Just to reiterate, that is not the source of the behavior you were seeing but just the technical answer to your question.
If I misunderstood the source, somebody please set me straight.
int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
PyObject *res;
int ok;
/* Quick result when objects are the same.
Guarantees that identity implies equality. */
if (v == w) {
if (op == Py_EQ)
return 1;
else if (op == Py_NE)
return 0;
}
res = PyObject_RichCompare(v, w, op);
if (res == NULL)
return -1;
if (PyBool_Check(res))
ok = (res == Py_True);
else
ok = PyObject_IsTrue(res);
Py_DECREF(res);
return ok;
}

Sets seem to use hash codes, then identity, before comparing for equality. The following code:
class A(object):
def __eq__(self, rhs):
print '__eq__'
return True
def __hash__(self):
print '__hash__'
return 1
a1 = A()
a2 = A()
print 'set1'
set1 = set([a1])
print 'set2'
set2 = set([a2])
print 'a1 in set1'
print a1 in set1
print 'a1 in set2'
print a1 in set2
outputs:
set1
__hash__
set2
__hash__
a1 in set1
__hash__
True
a1 in set2
__hash__
__eq__
True
What happens seems to be:
The hash code is computed when an element is inserted into a hash. (To compare with the existing elements.)
The hash code for the object you're checking with the in operator is computed.
Elements of the set with the same hash code are inspected by first checking whether they're the same object as the one you're looking for, or if they're logically equal to it.

Related

Python 3 == operator

I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
Take the following example:
class Potato:
def __eq__(self, other):
print("In Potato's __eq__")
return True
>> p = Potato()
>> p == "hello"
In Potato's __eq__ # As expected, p.__eq__("hello") is called
True
>> "hello" == p
In Potato's __eq__ # Hmm, I expected this to be false because
True # this should call "hello".__eq__(p)
>> "hello".__eq__(p)
NotImplemented # Not implemented? How does == work for strings then?
AFAIK, the docs only talk about the == -> __eq__ mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's __eq__ is NotImplemented, like we saw with "hello".__eq(p).
I'm looking for the general algorithm that is employed for equality... Most, if not all other SO answers, refer to Python 2's coercion rules, which don't apply anymore in Python 3.
You're mixing up the functions in the operator module and the methods used to implement those operators. operator.eq(a, b) is equivalent to a == b or operator.__eq__(a, b), but not to a.__eq__(b).
In terms of the __eq__ method, == and operator.eq work as follows:
def eq(a, b):
if type(a) is not type(b) and issubclass(type(b), type(a)):
# Give type(b) priority
a, b = b, a
result = a.__eq__(b)
if result is NotImplemented:
result = b.__eq__(a)
if result is NotImplemented:
result = a is b
return result
with the caveat that the real code performs method lookup for __eq__ in a way that bypasses instance dicts and custom __getattribute__/__getattr__ methods.
When you do this:
"hello" == potato
Python first calls "hello".__eq__(potato). That return NotImplemented, so Python tries it the other way: potato.__eq__("hello").
Returning NotImplemented doesn't mean there's no implementation of .__eq__ on that object. It means that the implementation didn't know how to compare to the value that was passed in. From https://docs.python.org/3/library/constants.html#NotImplemented:
Note When a binary (or in-place) method returns NotImplemented the
interpreter will try the reflected operation on the other type (or
some other fallback, depending on the operator). If all attempts
return NotImplemented, the interpreter will raise an appropriate
exception. Incorrectly returning NotImplemented will result in a
misleading error message or the NotImplemented value being returned to
Python code. See Implementing the arithmetic operations for examples.
I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
No that is only the case in the operator module. The operator module is used to pass an == as a function for instance. But operator has not much to do with vanilla Python itself.
AFAIK, the docs only talk about the == -> eq mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's.
In Python everything is an object: an int is an object, a "class" is an object", a None is an object, etc. We can for instance get the __eq__ of 0:
>>> (0).__eq__
<method-wrapper '__eq__' of int object at 0x55a81fd3a480>
So the equality is implemented in the "int class". As specified in the documentation on the datamodel __eq__ can return several values: True, False but any other object (for which the truthiness will be calculated). If on the other hand NotImplemented is returned, Python will fallback and call the __eq__ object on the object on the other side of the equation.

Does comparing using `==` compare identities before comparing values?

If I compare two variables using ==, does Python compare the identities, and, if they're not the same, then compare the values?
For example, I have two strings which point to the same string object:
>>> a = 'a sequence of chars'
>>> b = a
Does this compare the values, or just the ids?:
>>> b == a
True
It would make sense to compare identity first, and I guess that is the case, but I haven't yet found anything in the documentation to support this. The closest I've got is this:
x==y calls x.__eq__(y)
which doesn't tell me whether anything is done before calling x.__eq__(y).
For user-defined class instances, is is used as a fallback - where the default __eq__ isn't overridden, a == b is evaluated as a is b. This ensures that the comparison will always have a result (except in the NotImplemented case, where comparison is explicitly forbidden).
This is (somewhat obliquely - good spot Sven Marnach) referred to in the data model documentation (emphasis mine):
User-defined classes have __eq__() and __hash__() methods by
default; with them, all objects compare unequal (except with
themselves) and x.__hash__() returns an appropriate value such
that x == y implies both that x is y and hash(x) == hash(y).
You can demonstrate it as follows:
>>> class Unequal(object):
def __eq__(self, other):
return False
>>> ue = Unequal()
>>> ue is ue
True
>>> ue == ue
False
so __eq__ must be called before id, but:
>>> class NoEqual(object):
pass
>>> ne = NoEqual()
>>> ne is ne
True
>>> ne == ne
True
so id must be invoked where __eq__ isn't defined.
You can see this in the CPython implementation, which notes:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
The "sensible default" implemented is a C-level equality comparison of the pointers v and w, which will return whether or not they point to the same object.
In addition to the answer by #jonrsharpe: if the objects being compared implement __eq__, it would be wrong for Python to check for identity first.
Look at the following example:
>>> x = float('nan')
>>> x is x
True
>>> x == x
False
NaN is a specific thing that should never compare equal to itself; however, even in this case x is x should return True, because of the semantics of is.

Why is my object properly removed from a list when __eq__ isn't being called?

I have the following code, which is making me scratch my head -
class Element:
def __init__(self, name):
self.name = name
def __repr__(self):
return self.name
def eq(self, other):
print('comparing {} to {} ({})'.format(self.name,
other.name,
self.name == other.name))
return self.name == other.name
Element.__eq__ = eq
elements = [
Element('a'),
Element('b'),
Element('c'),
Element('d')
]
print('before {}'.format(elements))
elements.remove(elements[3])
print('after {}'.format(elements))
Which yields the following output -
before [a, b, c, d]
comparing a to d (False)
comparing b to d (False)
comparing c to d (False)
after [a, b, c]
Why isn't eq() outputting comparing d to d (True)?
The reason I'm monkey patching __eq__ instead of simply implementing it in my Element class is because I'm testing how monkey patching works before I implement it with one of the libraries I'm using.
The fourth element is the exactly same object with the object the code is passing (elements[3]).
In other word,
>>> elements[3] is elements[3]
True
>>> elements[3] == elements[3]
True
So, no need to check the equality because they(?) are identical (same) one.
Equality check will happen if they are not identical. For example, __eq__ will be called if the code passes another object with the same value:
elements.remove(Element('d'))
Python's list.remove() method first checks whether the both objects are identical otherwise falls back to regular comparison methods like __eq__ in this case. So, in this case as both objects are identical it is removed from the list.
listremove(PyListObject *self, PyObject *v)
{
Py_ssize_t i;
for (i = 0; i < Py_SIZE(self); i++) {
int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
...
Here PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid) is being used for comparison, and from its docs:
If o1 and o2 are the same object, PyObject_RichCompareBool() will
always return 1 for Py_EQ and 0 for Py_NE.

Check if an object is order-able in python?

How can I check if an object is orderable/sortable in Python?
I'm trying to implement basic type checking for the __init__ method of my binary tree class, and I want to be able to check if the value of the node is orderable, and throw an error if it isn't. It's similar to checking for hashability in the implementation of a hashtable.
I'm trying to accomplish something similar to Haskell's (Ord a) => etc. qualifiers. Is there a similar check in Python?
If you want to know if an object is sortable, you must check if it implements the necessary methods of comparison.
In Python 2.X there were two different ways to implement those methods:
cmp method (equivalent of compareTo in Java per example)
__cmp__(self, other): returns >0, 0 or <0 wether self is more, equal or less than other
rich comparison methods
__lt__, __gt__, __eq__, __le__, __ge__, __ne__
The sort() functions call this method to make the necessary comparisons between instances (actually sort only needs the __lt__ or __gt__ methods but it's recommended to implement all of them)
In Python 3.X the __cmp__ was removed in favor of the rich comparison methods as having more than one way to do the same thing is really against Python's "laws".
So, you basically need a function that check if these methods are implemented by a class:
# Python 2.X
def is_sortable(obj):
return hasattr(obj, "__cmp__") or \
hasattr(obj, "__lt__") or \
hasattr(obj, "__gt__")
# Python 3.X
def is_sortable(obj):
cls = obj.__class__
return cls.__lt__ != object.__lt__ or \
cls.__gt__ != object.__gt__
Different functions are needed for Python 2 and 3 because a lot of other things also change about unbound methods, method-wrappers and other internal things in Python 3.
Read this links you want better understanding of the sortable objects in Python:
http://python3porting.com/problems.html#unorderable-types-cmp-and-cmp
http://docs.python.org/2/howto/sorting.html#the-old-way-using-the-cmp-parameter
PS: this was a complete re-edit of my first answer, but it was needed as I investigated the problem better and had a cleaner idea about it :)
While the explanations in answers already here address runtime type inspection, here's how the static types are annotated by typeshed. They start by defining a collection of comparison Protocols, e.g.
class SupportsDunderLT(Protocol):
def __lt__(self, __other: Any) -> bool: ...
which are then collected into rich comparison sum types, such as
SupportsRichComparison = Union[SupportsDunderLT, SupportsDunderGT]
SupportsRichComparisonT = TypeVar("SupportsRichComparisonT", bound=SupportsRichComparison)
then finally these are used to type e.g. the key functions of list.sort:
#overload
def sort(self: list[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...) -> None: ...
#overload
def sort(self, *, key: Callable[[_T], SupportsRichComparison], reverse: bool = ...) -> None: ...
and sorted:
#overload
def sorted(
__iterable: Iterable[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...
) -> list[SupportsRichComparisonT]: ...
#overload
def sorted(__iterable: Iterable[_T], *, key: Callable[[_T], SupportsRichComparison], reverse: bool = ...) -> list[_T]: ...
Regrettably it is not enough to check that your object implements lt.
numpy uses the '<' operator to return an array of Booleans, which has no truth value. SQL Alchemy uses it to return a query filter, which again no truth value.
Ordinary sets uses it to check for a subset relationship, so that
set1 = {1,2}
set2 = {2,3}
set1 == set2
False
set1 < set2
False
set1 > set2
False
The best partial solution I could think of (starting from a single object of unknown type) is this, but with rich comparisons it seems to be officially impossible to determine orderability:
if hasattr(x, '__lt__'):
try:
isOrderable = ( ((x == x) is True) and ((x > x) is False)
and not isinstance(x, (set, frozenset)) )
except:
isOrderable = False
else:
isOrderable = False
Edited
As far as I know, all lists are sortable, so if you want to know if a list is "sortable", the answer is yes, no mather what elements it has.
class C:
def __init__(self):
self.a = 5
self.b = "asd"
c = C()
d = True
list1 = ["abc", "aad", c, 1, "b", 2, d]
list1.sort()
print list1
>>> [<__main__.C instance at 0x0000000002B7DF08>, 1, True, 2, 'aad', 'abc', 'b']
You could determine what types you consider "sortable" and implement a method to verify if all elements in the list are "sortable", something like this:
def isSortable(list1):
types = [int, float, str]
res = True
for e in list1:
res = res and (type(e) in types)
return res
print isSortable([1,2,3.0, "asd", [1,2,3]])

How is __eq__ handled in Python and in what order?

Since Python does not provide left/right versions of its comparison operators, how does it decide which function to call?
class A(object):
def __eq__(self, other):
print "A __eq__ called"
return self.value == other
class B(object):
def __eq__(self, other):
print "B __eq__ called"
return self.value == other
>>> a = A()
>>> a.value = 3
>>> b = B()
>>> b.value = 4
>>> a == b
"A __eq__ called"
"B __eq__ called"
False
This seems to call both __eq__ functions.
I am looking for the official decision tree.
The a == b expression invokes A.__eq__, since it exists. Its code includes self.value == other. Since int's don't know how to compare themselves to B's, Python tries invoking B.__eq__ to see if it knows how to compare itself to an int.
If you amend your code to show what values are being compared:
class A(object):
def __eq__(self, other):
print("A __eq__ called: %r == %r ?" % (self, other))
return self.value == other
class B(object):
def __eq__(self, other):
print("B __eq__ called: %r == %r ?" % (self, other))
return self.value == other
a = A()
a.value = 3
b = B()
b.value = 4
a == b
it will print:
A __eq__ called: <__main__.A object at 0x013BA070> == <__main__.B object at 0x013BA090> ?
B __eq__ called: <__main__.B object at 0x013BA090> == 3 ?
When Python2.x sees a == b, it tries the following.
If type(b) is a new-style class, and type(b) is a subclass of type(a), and type(b) has overridden __eq__, then the result is b.__eq__(a).
If type(a) has overridden __eq__ (that is, type(a).__eq__ isn't object.__eq__), then the result is a.__eq__(b).
If type(b) has overridden __eq__, then the result is b.__eq__(a).
If none of the above are the case, Python repeats the process looking for __cmp__. If it exists, the objects are equal iff it returns zero.
As a final fallback, Python calls object.__eq__(a, b), which is True iff a and b are the same object.
If any of the special methods return NotImplemented, Python acts as though the method didn't exist.
Note that last step carefully: if neither a nor b overloads ==, then a == b is the same as a is b.
From https://eev.ee/blog/2012/03/24/python-faq-equality/
Python 3 Changes/Updates for this algorithm
How is __eq__ handled in Python and in what order?
a == b
It is generally understood, but not always the case, that a == b invokes a.__eq__(b), or type(a).__eq__(a, b).
Explicitly, the order of evaluation is:
if b's type is a strict subclass (not the same type) of a's type and has an __eq__, call it and return the value if the comparison is implemented,
else, if a has __eq__, call it and return it if the comparison is implemented,
else, see if we didn't call b's __eq__ and it has it, then call and return it if the comparison is implemented,
else, finally, do the comparison for identity, the same comparison as is.
We know if a comparison isn't implemented if the method returns NotImplemented.
(In Python 2, there was a __cmp__ method that was looked for, but it was deprecated and removed in Python 3.)
Let's test the first check's behavior for ourselves by letting B subclass A, which shows that the accepted answer is wrong on this count:
class A:
value = 3
def __eq__(self, other):
print('A __eq__ called')
return self.value == other.value
class B(A):
value = 4
def __eq__(self, other):
print('B __eq__ called')
return self.value == other.value
a, b = A(), B()
a == b
which only prints B __eq__ called before returning False.
Note that I also correct a small error in the question where self.value is compared to other instead of other.value - in this comparison, we get two objects (self and other), usually of the same type since we are doing no type-checking here (but they can be of different types), and we need to know if they are equal. Our measure of whether or not they are equal is to check the value attribute, which must be done on both objects.
How do we know this full algorithm?
The other answers here seem incomplete and out of date, so I'm going to update the information and show you how how you could look this up for yourself.
This is handled at the C level.
We need to look at two different bits of code here - the default __eq__ for objects of class object, and the code that looks up and calls the __eq__ method regardless of whether it uses the default __eq__ or a custom one.
Default __eq__
Looking __eq__ up in the relevant C api docs shows us that __eq__ is handled by tp_richcompare - which in the "object" type definition in cpython/Objects/typeobject.c is defined in object_richcompare for case Py_EQ:.
case Py_EQ:
/* Return NotImplemented instead of False, so if two
objects are compared, both get a chance at the
comparison. See issue #1393. */
res = (self == other) ? Py_True : Py_NotImplemented;
Py_INCREF(res);
break;
So here, if self == other we return True, else we return the NotImplemented object. This is the default behavior for any subclass of object that does not implement its own __eq__ method.
How __eq__ gets called
Then we find the C API docs, the PyObject_RichCompare function, which calls do_richcompare.
Then we see that the tp_richcompare function, created for the "object" C definition is called by do_richcompare, so let's look at that a little more closely.
The first check in this function is for the conditions the objects being compared:
are not the same type, but
the second's type is a subclass of the first's type, and
the second's type has an __eq__ method,
then call the other's method with the arguments swapped, returning the value if implemented. If that method isn't implemented, we continue...
if (!Py_IS_TYPE(v, Py_TYPE(w)) &&
PyType_IsSubtype(Py_TYPE(w), Py_TYPE(v)) &&
(f = Py_TYPE(w)->tp_richcompare) != NULL) {
checked_reverse_op = 1;
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
Next we see if we can lookup the __eq__ method from the first type and call it.
As long as the result is not NotImplemented, that is, it is implemented, we return it.
if ((f = Py_TYPE(v)->tp_richcompare) != NULL) {
res = (*f)(v, w, op);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
Else if we didn't try the other type's method and it's there, we then try it, and if the comparison is implemented, we return it.
if (!checked_reverse_op && (f = Py_TYPE(w)->tp_richcompare) != NULL) {
res = (*f)(w, v, _Py_SwappedOp[op]);
if (res != Py_NotImplemented)
return res;
Py_DECREF(res);
}
Finally, we get a fallback in case it isn't implemented for either one's type.
The fallback checks for the identity of the object, that is, whether it is the same object at the same place in memory - this is the same check as for self is other:
/* If neither object implements it, provide a sensible default
for == and !=, but raise an exception for ordering. */
switch (op) {
case Py_EQ:
res = (v == w) ? Py_True : Py_False;
break;
Conclusion
In a comparison, we respect the subclass implementation of comparison first.
Then we attempt the comparison with the first object's implementation, then with the second's if it wasn't called.
Finally we use a test for identity for comparison for equality.

Categories