Python 2 assertItemsEqual incorrect result - python

I ran into an interesting situation with the unittest.TestCase.assertItemsEqual function under Python 2; posting my findings here for posterity.
The following unit test breaks under Python 2 when it should succeed:
import unittest
class Foo(object):
def __init__(self, a=1, b=2):
self.a = a
self.b = b
def __repr__(self):
return '({},{})'.format(self.a, self.b)
def __eq__(self, other):
return self.a == other.a and self.b == other.b
def __lt__(self, other):
return (self.a, self.b) < (other.a, other.b)
class Test(unittest.TestCase):
def test_foo_eq(self):
self.assertEqual(sorted([Foo()]), sorted([Foo()]))
self.assertItemsEqual([Foo()], [Foo()])
unittest.main()
Here is the output:
======================================================================
FAIL: test_foo_eq (__main__.Test)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/tsanders/scripts/one_offs/test_unittest_assert_items_equal2.py", line 17, in test_foo_eq
self.assertItemsEqual([Foo()], [Foo()])
AssertionError: Element counts were not equal:
First has 1, Second has 0: (1,2)
First has 0, Second has 1: (1,2)
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (failures=1)
This is pretty confusing since the docs state:
It [assertItemsEqual] is the equivalent of assertEqual(sorted(expected), sorted(actual)) but it works with sequences of unhashable objects as well.
The same test passes under Python 3 (after swapping self.assertItemsEqual for self.assertCountEqual, since the name has changed).
EDIT: After posting this question, I did find this other question that covers the situation where neither __eq__ nor __hash__ is defined.

To get the test passing under both Python 2 and 3, I had to add the line __hash__ = None to Foo.
The assertItemsEqual/assertCountEqual function takes different code paths depending on whether or not the items in each list are hashable. And according to the docs:
If a class does not define a __cmp__() or __eq__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable in hashed collections.
With that in mind, Python 2 and 3 have different behavior with regard to __hash__ when __eq__ is defined:
In Python 2, defining __eq__ does not affect the default-provided __hash__, but trying to use the item in a hashed container results in implementation-defined behavior (e.g. keys may be duplicated because they have different hashes, even if __eq__ would return True).
In Python 3, defining __eq__ sets __hash__ to None.
As of Python 2.6, __hash__ can be explicitly set to None to make a class unhashable. In my case, this was required to get assertItemsEqual to use the correct comparison algorithm that relies on __eq__ instead of __hash__.

Related

What is the order of execution of __eq__ if one side inherits from the other? [duplicate]

This question already has an answer here:
python equality precedence
(1 answer)
Closed 3 years ago.
I recently stumbled upon a seemingly strange behavior regarding the order in which __eq__ methods are executed, if one side of the comparison is an object which inherits from the other.
I've tried this in Python 3.7.2 in an admittedly academic example. Usually, given an equality comparison a == b, I expect a.__eq__ to be called first, followed by b.__eq__, if the first call returned NotImplemented. However, this doesn't seem to be the case, if a and b are part of the same class hierarchy. Consider the following example:
class A(object):
def __eq__(self, other):
print("Calling A({}).__eq__".format(self))
return NotImplemented
class B(A):
def __eq__(self, other):
print("Calling B({}).__eq__".format(self))
return True
class C(object):
def __eq__(self, other):
print("Calling C({}).__eq__".format(self))
return False
a = A()
b = B()
c = C()
print("a: {}".format(a)) # output "a: <__main__.A object at 0x7f8fda95f860>"
print("b: {}".format(b)) # output "b: <__main__.B object at 0x7f8fda8bcfd0>"
print("c: {}".format(c)) # output "c: <__main__.C object at 0x7f8fda8bcef0>"
a == a # case 1
a == b # case 2.1
b == a # case 2.2
a == c # case 3.1
c == a # case 3.2
In case 1, I expect a.__eq__ to be called twice and this is also what I get:
Calling A(<__main__.A object at 0x7f8fda95f860>).__eq__
Calling A(<__main__.A object at 0x7f8fda95f860>).__eq__
However, in cases 2.1 and 2.2, b.__eq__ is always executed first, no matter on which side of the comparison it stands:
Calling B(<__main__.B object at 0x7f8fda8bcfd0>).__eq__ # case 2.1
Calling B(<__main__.B object at 0x7f8fda8bcfd0>).__eq__ # case 2.2
In cases 3.1 and 3.2 then, the left-hand side is again evaluated first, as I expected:
Calling A(<__main__.A object at 0x7f8fda95f860>).__eq__ # case 3.1
Calling C(<__main__.C object at 0x7f8fda8bcef0>).__eq__ # case 3.1
Calling C(<__main__.C object at 0x7f8fda8bcef0>).__eq__ # case 3.2
It seems that, if the compared objects are related to each other, __eq__ of the object of the child class is always evaluated first. Is there a deeper reasoning behind this behavior? If so, is this documented somewhere? PEP 207 doesn't mention this case, as far as I can see. Or am I maybe missing something obvious here?
From the official documentation for __eq__ :
If the operands are of different types, and right operand’s type is a direct or indirect subclass of the left operand’s type, the reflected method of the right operand has priority, otherwise the left operand’s method has priority.

python set() membership and hashable objects

I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() function, along with an __eq__ and a __lt__, and is decorated with functools.total_ordering
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting none, or at the least, 2, indicating a complete failure in my understanding of how sets work. But one?
for a in set_a:
print(a, a.__hash__())
for b in set_b:
print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar)>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
edit: updating with some more info. I figured that simply showing that the two objects' hash() results where the same meant that they where indeed the same, so I guess that's where my mistake probably comes from.
#total_ordering
class Thing(object):
def __init__(self, i):
self.i = i
def __eq__(self, other):
return self.i == other.i
def __lt__(self, other):
return self.i < other.i
def __repr__(self):
return "<Thing {}>".format(self.i)
def __hash__(self):
return hash(self.i)
I figured it out thanks to some of the questions in the comments- the problem was due to the fact that I had believed that ultimately, the hash function decides if two objects are the same, or not. The __eq__ also needs to match, which it always did in my tests and attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus, the x == y was failing in prod. Argh.

Removing objects that have changed from a Python set

Given this program:
class Obj:
def __init__(self, a, b):
self.a = a
self.b = b
def __hash__(self):
return hash((self.a, self.b))
class Collection:
def __init__(self):
self.objs = set()
def add(self, obj):
self.objs.add(obj)
def find(self, a, b):
objs = []
for obj in self.objs:
if obj.b == b and obj.a == a:
objs.append(obj)
return objs
def remove(self, a, b):
for obj in self.find(a, b):
print('removing', obj)
self.objs.remove(obj)
o1 = Obj('a1', 'b1')
o2 = Obj('a2', 'b2')
o3 = Obj('a3', 'b3')
o4 = Obj('a4', 'b4')
o5 = Obj('a5', 'b5')
objs = Collection()
for o in (o1, o2, o3, o4, o5):
objs.add(o)
objs.remove('a1', 'b1')
o2.a = 'a1'
o2.b = 'b1'
objs.remove('a1', 'b1')
o3.a = 'a1'
o3.b = 'b1'
objs.remove('a1', 'b1')
o4.a = 'a1'
o4.b = 'b1'
objs.remove('a1', 'b1')
o5.a = 'a1'
o5.b = 'b1'
If I run this a few times with Python 3.4.2, sometimes it will succeed, other times it throws a KeyError after removing 2 or 3 objects:
$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f3648035828>
removing <__main__.Obj object at 0x7f3648035860>
removing <__main__.Obj object at 0x7f3648035898>
removing <__main__.Obj object at 0x7f36480358d0>
$ python3 py_set_obj_remove_test.py
removing <__main__.Obj object at 0x7f156170b828>
removing <__main__.Obj object at 0x7f156170b860>
Traceback (most recent call last):
File "py_set_obj_remove_test.py", line 42, in <module>
objs.remove('a1', 'b1')
File "py_set_obj_remove_test.py", line 27, in remove
self.objs.remove(obj)
KeyError: <__main__.Obj object at 0x7f156170b860>
Is this a bug in Python? Or something about the implementation of sets I don't know about?
Interestingly, it seems to always fail at the second objs.remove() call in Python 2.7.9.
This is not a bug in Python, your code is violating a principle of sets: that the hash value must not change. By mutating your object attributes, the hash changes and the set can no longer reliably locate the object in the set.
From the __hash__ method documentation:
If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Custom Python classes define a default __eq__ method that returns True when both operands reference the same object (obj1 is obj2 is true).
That it sometimes works in Python 3 is a property of hash randomisation for strings. Because the hash value for a string changes between Python interpreter runs, and because the modulus of a hash against the size of the hash table is used, you can end up with the right hash slot anyway, purely by accident, and then the == equality test will still be true because you didn't implement a custom __eq__ method.
Python 2 has hash randomisation too but it is disabled by default, but you could make your test 'pass' anyway by carefully picking the 'right' values for the a and b attributes.
Instead, you could make your code work by basing your hash on the id() of your instance; that makes the hash value not change and would match the default __eq__ implementation:
def __hash__(self):
return hash(id(self))
You could also just remove your __hash__ implementation for the same effect, as the default implementation does basically the above (with the id() value rotated by 4 bits to evade memory alignment patterns). Again, from the __hash__ documentation:
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
Alternatively, implement an __eq__ method that bases equality on equality of the attributes of the instance, and don't mutate the attributes.
You are changing the objects (ie changing the objects' hash) after they were added to the set.
When remove is called, it can't find that hash in the set because it was changed after it was calculated (when the objects were originally added to the set).

What is assertEqual in Python UnitTest supposed to do?

assertEqual(a, b) checks if a == b and return True or False,
The documentation says,
Test that first and second are equal. If the values do not compare
equal, the test will fail.
I'm running three tests with assertEqual on a simple class,
The class on test
class Car:
def __init__(self, name):
self.name = name
The TestCase
class CarTest(unittest.TestCase):
def test_diff_equal(self):
car1 = Car('Ford')
car2 = Car('Hyundai')
self.assertEqual(car1, car2)
def test_name_equal(self):
car1 = Car('Ford')
car2 = Car('Ford')
self.assertEqual(car1, car2)
def test_instance_equal(self):
car1 = Car('Ford')
self.assertEqual(car1, car1)
The results are
F.F
======================================================================
FAIL: test_diff_equal (cartest.CarTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "cartest.py", line 10, in test_diff_equal
self.assertEqual(car1, car2)
AssertionError: <car.Car instance at 0x7f499ec12ef0> != <car.Car instance at 0x7f499ec12f38>
======================================================================
FAIL: test_name_equal (cartest.CarTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "cartest.py", line 15, in test_name_equal
self.assertEqual(car1, car2)
AssertionError: <car.Car instance at 0x7f499ec12fc8> != <car.Car instance at 0x7f499ec12f38>
----------------------------------------------------------------------
Ran 3 tests in 0.000s
FAILED (failures=2)
Is assertEqual used to check if both the instances are same? Or is anything wrong in my setup? Why did test_name_equal() fail?
Your test is working absolutely fine, and it's found a bug. Hurray!
Your two Car objects may have the same name, but why would that mean that they are the same car? Nothing in your code makes that so.
If you want that to be the case, implement __eq__ on the Car class:
def __eq__(self, other):
"""Return True if other is also a car and has the same name as
this one."""
return isinstance(other, Car) and self.name == other.name
Then that test should pass.
The whole question is reducible to "How Python compare objects" what is precisely defined in Section 5.9: Comparisons of official documentation.
Quoting from the official documentation (emphasis mine) to clarify on aspects you're asking of.
Most other objects of built-in types compare unequal unless they are
the same object; the choice whether one object is considered smaller
or larger than another one is made arbitrarily but consistently within
one execution of a program.
That's what's covered by test_instance_equal and what essentially is:
o1 = object()
o1 == o1 # will always be True
The operators <, >, ==, >=, <=, and != compare the values of two
objects. The objects need not have the same type. If both are numbers,
they are converted to a common type. Otherwise, objects of different
types always compare unequal, and are ordered consistently but
arbitrarily. You can control comparison behavior of objects of
non-built-in types by defining a cmp method or rich comparison
methods like gt, described in section Special method names.*
Quoting from special method names:
object.__lt__(self, other)
object.__le__(self, other)
object.__eq__(self, other)
object.__ne__(self, other)
object.__gt__(self, other)
object.__ge__(self, other)
New in version 2.1.
These are the so-called “rich comparison” methods, and are called for
comparison operators in preference to __cmp__() below. The
correspondence between operator symbols and method names is as
follows: (...) x==y calls x.__eq__(y), (...)
That's what test_diff_equal and test_name_equal shows. There isn't any __eq__ magic method defined, and therefore it falls back to the default implementation (they compare unequal unless they are the same object).
The question has nothing to do with unit testing a module.
Adding to what's been already said: For the example given, you'll need to directly compare the attributes of the objects, and not the objects themselves, for unittest.TestCase.assertEqual to work.
class CarTest(unittest.TestCase):
def test_diff_equal(self):
car1 = Car('Ford')
car2 = Car('Hyundai')
self.assertEqual(car1.name, car2.name)
def test_name_equal(self):
car1 = Car('Ford')
car2 = Car('Ford')
self.assertEqual(car1.name, car2.name)
def test_instance_equal(self):
car1 = Car('Ford')
self.assertEqual(car1.name, car1.name)
This should now work (and fail) as expected.

Python - __eq__ method not being called

I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
class Bar(Foo):
def __init__(self, a, b, c, d, e):
self.a = a
self.b = b
self.c = c
self.d = d
self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv .e should equal 5, but they are not. Like I mentioned, it looks like the __CaptureEq __eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the __CaptureEq __eq__ method is not being called? Hopefully this is not too obscure of a question.
Brandon's answer is informative, but incorrect. There are actually two problems, one with
the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
'Object wrapper that remembers "other" for successful equality tests.'
def __init__(self, obj):
self.obj = obj
self.match = obj
# If running on Python 3, this will be a new-style class, and
# new-style classes must delegate hash explicitly in order to populate
# the underlying special method slot correctly.
# On Python 2, it will be an old-style class, so the explicit delegation
# isn't needed (__getattr__ will cover it), but it also won't do any harm.
def __hash__(self):
return hash(self.obj)
def __eq__(self, other):
result = (self.obj == other)
if result:
self.match = other
return result
def __getattr__(self, name): # support anything else needed by __contains__
return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
if not isinstance(other, Foo):
# Don't recognise "other", so let *it* decide if we're equal
return NotImplemented
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
print '__eq__: I am', self.d, self.e, 'and he is', other.d, other.e
return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)

Categories