python set() membership and hashable objects

I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() method, along with an __eq__ and a __lt__, and is decorated with functools.total_ordering.
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting none, or at the least, 2, indicating a complete failure in my understanding of how sets work. But one?
for a in set_a:
    print(a, a.__hash__())
for b in set_b:
    print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
Edit: updating with some more info. I figured that simply showing that the two objects' hash() results were the same meant that they were indeed the same, so I guess that's where my mistake probably comes from.
from functools import total_ordering
@total_ordering
class Thing(object):
    def __init__(self, i):
        self.i = i
    def __eq__(self, other):
        return self.i == other.i
    def __lt__(self, other):
        return self.i < other.i
    def __repr__(self):
        return "<Thing {}>".format(self.i)
    def __hash__(self):
        return hash(self.i)

I figured it out thanks to some of the questions in the comments. The problem was that I had believed the hash function ultimately decides whether two objects are the same. The __eq__ result also needs to match, which it always did in my tests and in my attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus the x == y check was failing in prod. Argh.
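A minimal sketch of that failure mode, using a hypothetical Reading class (not the asker's real class) that hashes on one field but compares on two. It is only meant to illustrate that a matching hash is necessary for set membership but not sufficient: __eq__ must also return True.

```python
class Reading:
    """Hypothetical class: hashes on `name` only, compares on `name` and `value`."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Reading):
            return NotImplemented
        return (self.name, self.value) == (other.name, other.value)

    def __hash__(self):
        # Equal objects still get equal hashes, so this is a valid hash,
        # but unequal objects can share a hash too.
        return hash(self.name)


stored = Reading("foo", 0.1 + 0.2)  # floating-point sum, not exactly 0.3
fetched = Reading("foo", 0.3)       # stands in for a value rounded elsewhere

same_hash = hash(stored) == hash(fetched)
is_member = fetched in {stored}
print(same_hash, is_member)
```

Here the hashes agree but the membership test fails, because the set falls back to __eq__ once it finds an entry with the same hash.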

Related

How to have different objects from different classes in one dict in Python?

I had a question about dictionaries with custom objects. In a dict, I know that the key has to be immutable, so if I want to use a custom class, I have to define its hash function. The Python docs on __hash__ recommend hashing a tuple of the values that the __eq__ method compares. So, for example, I defined the custom class temp as such:
class temp():
    def __init__(self,value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self,other):
        return self.value == other.value
    def __lt__(self,other):
        return self.value < other.value
This way I can have a key:value pair such as temp(1):1. So to my question. In Python, you can have different types in the same dict. So I declared this dict:
myDict={ temp(1):1, temp(2):2, 'a':1,1:1, (1,2):1, True:1 }
The problem I am facing is that I would get an error for the int:int and bool:int pairing telling me the error:
'bool' object has no attribute 'value'
or
'int' object has no attribute 'value'
Can someone explain to me why this is the case? The same issue would happen if I have a different class in the dict as well. So an object from a cars class would give this error:
'cars' object has no attribute 'value'
Strangely enough in my tests, I found that if the key is a tuple or a float, it works fine.
Any help would be greatly appreciated. I wanted to know why the error is happening and how I can fix it. My main goal is to learn how to make one dict that has various objects from different classes.

Your eq method needs to check if the other object is the same type:
def __eq__(self,other):
    if not isinstance(other, temp):
        return NotImplemented
    return self.value==other.value
That said, I highly recommend using dataclasses for cases like this. They define init, eq, and (if frozen=True) hash for you, which helps avoid this sort of issue.
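A minimal sketch of the dataclass suggestion, assuming a class named Temp with a single value field (the name and field type are illustrative). With frozen=True and the default eq=True, the generated __eq__ returns NotImplemented for objects of a different class, so mixed key types no longer raise AttributeError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Temp:
    value: int


# Mixed key types in one dict; the generated __eq__ only compares
# Temp with Temp and defers for everything else.
my_dict = {
    Temp(1): "temp one",
    Temp(2): "temp two",
    "a": 1,
    1: "int one",
    (1, 2): 1,
}

print(my_dict[Temp(1)], my_dict[1], Temp(1) == 1)
```

The original dict also used True as a key; since True == 1 and both hash to 1, True and 1 would occupy the same slot in any dict, regardless of how the custom class is written.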

You can define your __eq__ method like this:
def __eq__(self, other):
    if other is None:
        return False
    if self.__class__ != other.__class__:
        return False
    return self.value == other.value
"Strangely enough in my tests, I found that if the key is a tuple or a float, it works fine."
As for the second question, this has to do with how a dict works.
When a key is inserted or looked up, the dict first checks whether an existing key has the same hash. Only if one does is that existing key compared for equality with the new key; if the equality check fails, the keys are deemed different.
If no existing key has the same hash, no equality check is done. Hence, when you used a tuple as a key, say (1, 2), hash((1, 2)) is 3713081631934410656 here (the exact value depends on the Python build), which doesn't yet exist in the dict, so __eq__ is never called. Hence no error. The keys 1 and True, by contrast, both hash to 1, the same as temp(1), so temp.__eq__ runs against them and fails.
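A small sketch that makes this visible by counting __eq__ calls. The Key class and its eq_calls counter are hypothetical helpers for illustration, and the behaviour shown is that of CPython's dict.

```python
class Key:
    """Hypothetical key type that counts how often its __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        if not isinstance(other, Key):
            return NotImplemented
        return self.value == other.value


d = {Key(1): "custom key"}
Key.eq_calls = 0

d[(1, 2)] = "tuple"            # hash differs from hash(Key(1)): no __eq__ call
calls_after_tuple = Key.eq_calls

d[1] = "int"                   # hash(1) == hash(Key(1)): __eq__ is consulted
calls_after_int = Key.eq_calls

print(calls_after_tuple, calls_after_int, len(d))
```

Because Key.__eq__ returns NotImplemented for non-Key objects, the colliding int key is simply treated as a different key instead of raising an error.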

The issue happens when running the __eq__ and __lt__ dunder methods. You can reproduce the same by running:
temp(1) == 1
The issue happens because __eq__ receives other as 1, and the value 1 does not have a .value, but you're trying to use it here:
return self.value == other.value
If you just use other for comparisons it should work:
class temp():
    def __init__(self,value):
        self.value = value
    def __hash__(self):
        return hash(self.value)
    def __eq__(self,other):
        return self.value == other
    def __lt__(self,other):
        return self.value < other

Decorator to alter function behavior

I've found that I have two unrelated functions that implement identical behavior in different ways. I'm now wondering if there's a way, via decorators probably, to deal with this efficiently, to avoid writing the same logic over and over if the behavior is added elsewhere.
Essentially I have two functions in two different classes that have a flag called exact_match. Both functions check for some type of equivalence in the objects that they are members of. The exact_match flag forces the function to compare floats exactly instead of with a tolerance. You can see how I do this below.
def is_close(a, b, rel_tol=1e-09, abs_tol=0.0):
    return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
def _equal(val_a, val_b):
    """Wrapper for equality test to send in place of is_close."""
    return val_a == val_b
@staticmethod
def get_equivalence(obj_a, obj_b, check_name=True, exact_match=False):
    equivalence_func = is_close
    if exact_match:
        # If we're looking for an exact match, changing the function we use to the equality tester.
        equivalence_func = _equal
    if check_name:
        return obj_a.name == obj_b.name
    # Check minimum resolutions if they are specified
    if 'min_res' in obj_a and 'min_res' in obj_b and not equivalence_func(obj_a['min_res'], obj_b['min_res']):
        return False
    return False
As you can see, standard procedure has us use the function is_close when we don't need an exact match, but we swap out the function call when we do. Now another function needs this same logic, swapping out the function. Is there a way to use decorators or something similar to handle this type of logic when I know a specific function call may need to be swapped out?

No decorator needed; just pass the desired function as an argument to get_equivalence (which is now little more than a wrapper that applies the argument).
def make_eq_with_tolerance(rel_tol=1e-09, abs_tol=0.0):
    def _(a, b):
        return abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
    return _
# This is just operator.eq, by the way
def _equal(val_a, val_b):
    return val_a == val_b
def same_name(a, b):
    return a.name == b.name
Now get_equivalence takes three arguments: the two objects to compare and a function that gets called on those two arguments.
@staticmethod
def get_equivalence(obj_a, obj_b, equivalence_func):
    return equivalence_func(obj_a, obj_b)
Some example calls:
get_equivalence(a, b, make_eq_with_tolerance())
get_equivalence(a, b, make_eq_with_tolerance(rel_tol=1e-12)) # Really tight tolerance
get_equivalence(a, b, _equal)
get_equivalence(a, b, same_name)
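A self-contained sketch of the same idea using only the standard library: math.isclose stands in for the hand-written tolerance check, operator.eq for _equal, and functools.partial for make_eq_with_tolerance. The default argument and the name loose are assumptions made for this illustration, not part of the answer above.

```python
import math
import operator
from functools import partial


def get_equivalence(obj_a, obj_b, equivalence_func=math.isclose):
    """Apply whichever comparison the caller passes in."""
    return equivalence_func(obj_a, obj_b)


# A comparator with a looser relative tolerance, built without a closure.
loose = partial(math.isclose, rel_tol=1e-3)

default_result = get_equivalence(1.0, 1.0 + 1e-12)             # tolerance-based
exact_result = get_equivalence(1.0, 1.0 + 1e-12, operator.eq)  # exact equality
loose_result = get_equivalence(1.0, 1.0005, loose)             # looser tolerance

print(default_result, exact_result, loose_result)
```

The caller decides the comparison policy, so a second function that needs the same switch can accept the same kind of argument instead of duplicating the flag logic.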

I came up with an alternative solution that is perhaps less correct but lets me solve the problem as I originally wanted to.
My solution uses a utility class that can be used as a member of a class or as a mixin for the class to provide the utility functions in a convenient way. Below, the functions _equals and is_close are defined elsewhere, as their implementations are beside the point.
class EquivalenceUtil(object):
    def __init__(self, equal_comparator=_equals, inexact_comparator=is_close):
        self.equals = equal_comparator
        self.default_comparator = inexact_comparator
    def check_equivalence(self, obj_a, obj_b, exact_match=False, **kwargs):
        return self.equals(obj_a, obj_b, **kwargs) if exact_match else self.default_comparator(obj_a, obj_b, **kwargs)
It's a simple class that can be used like so:
class BBOX(object):
    _equivalence = EquivalenceUtil()
    def __init__(self, **kwargs):
        ...
    @classmethod
    def are_equivalent(cls, bbox_a, bbox_b, exact_match=False):
        """Test for equivalence between two BBOX's."""
        bbox_list = bbox_a.as_list
        other_list = bbox_b.as_list
        for _index in range(0, 3):
            if not cls._equivalence.check_equivalence(bbox_list[_index],
                                                      other_list[_index],
                                                      exact_match=exact_match):
                return False
        return True
This solution is more opaque to the user about how things are checked behind the scenes, which is important for my project. Additionally it is pretty flexible and can be reused within a class in multiple places and ways, and easily added to a new class.
In my original example the code can turn into this:
class TileGrid(object):
    _equivalence = EquivalenceUtil()
    def __init__(self, **kwargs):
        ...
    # A classmethod (not a staticmethod) so that cls._equivalence resolves
    @classmethod
    def are_equivalent(cls, grid1, grid2, check_name=False, exact_match=False):
        if check_name:
            return grid1.name == grid2.name
        # Check minimum resolutions if they are specified
        if 'min_res' in grid1 and 'min_res' in grid2 and not cls._equivalence.check_equivalence(grid1['min_res'], grid2['min_res'], exact_match=exact_match):
            return False
        # Compare the bounding boxes of the two grids if they exist in the grid
        if 'bbox' in grid1 and 'bbox' in grid2:
            return BBOX.are_equivalent(grid1.bbox, grid2.bbox, exact_match=exact_match)
        return False
I can't recommend this approach in the general case, because I can't help but feel there's some code smell to it, but it does exactly what I need it to and will solve a great many problems for my current codebase. We have specific requirements, this is a specific solution. The solution by chepner is probably best for the general case of letting the user decide how a function should test equivalence.

How to calculate hash of a python class object

In python3, I have a class. Like below:
class Foo:
    def __init__(self):
        self.x = 3
    def fcn(self, val):
        self.x += val
Then I instantiate objects of that class, like so:
new_obj = Foo()
new_obj2 = Foo()
Now when I hash these objects, I get different hash values. I need them to return the same hash, as they are the same objects (in theory).
Any idea how I can do this?

Thank you to all who answered. You're right that a new instance of the same class is not actually the same object, as it occupies a different place in memory. What I ended up doing is similar to what @nosklo suggested.
I created a 'get_hashables' function that returned a dictionary with all the properties of the class that would constitute a unique class object, like so:
def get_hashables(self):
    return {'data': self.data, 'result': self.result}
Then my main method would take these 'hashable' variables, and hash them to produce the hash itself.

class Foo:
    def __init__(self):
        self.x = 3
    def fcn(self, val):
        self.x += val
    def __hash__(self):
        return hash(self.x)
This will calculate the hash using self.x; that means the hash will be the same when self.x is the same. You can return any integer from __hash__, but to prevent consistency bugs you should return the same hash if the objects compare equal. More about that in the docs.
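A sketch of pairing __hash__ with a matching __eq__, which is what sets and dicts need in order to treat two instances as the same key. The x parameter on __init__ is an assumption added so the example can build unequal instances too.

```python
class Foo:
    def __init__(self, x=3):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        # Hash exactly the data that __eq__ compares.
        return hash(self.x)


a, b, c = Foo(), Foo(), Foo(4)

print(a == b, hash(a) == hash(b), len({a, b, c}))
```

One caveat with this design: if x is mutated while the object sits in a set or is used as a dict key, its hash changes and lookups for it may fail, so hash only attributes that stay fixed while the object is stored.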

They are not the same object. The expression Foo() creates a new, unique instance of the class on each call and initialises it with Foo.__init__. Your two calls return two independent objects, residing in different memory locations, each containing its own, private instance of the x attribute.
You might want to read up on Python class and instance theory.

Is an instance of object (but not a subclass) guaranteed to compare unequal to any other object?

Apologies if this is a dupe. (I couldn't find it but I'm not very good with google.)
I just stumbled over some code where they use
x = object()
in a place where they probably want x to compare not equal to anything that's already there. Is that guaranteed by the language?

If you compare it using x == other_object, this might return True, since a custom class can override __eq__ and, for instance, make its instances compare equal to every other object.
But we can use is to check whether the two operands refer to the same object. So we can for instance use it like:
dummy = object()
lookup = somedict.get(somekey, dummy)
if lookup is dummy:
    # we did *not* find the key in the dictionary
    pass
else:
    pass
Since we just created the dummy object, there is no way that object can be in the somedict (unless it is of course something like locals()), so as a result we know for sure that if we find the key in the dictionary, then it will not return dummy. So we can use is safely to determine that.
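A runnable sketch of the sentinel pattern described above. The names _MISSING and describe are illustrative; the point is that an identity check distinguishes "key absent" from "key present with a value such as None".

```python
_MISSING = object()  # unique sentinel; nothing else can be this exact object


def describe(mapping, key):
    """Report whether `key` is in `mapping`, even if its value is None."""
    value = mapping.get(key, _MISSING)
    if value is _MISSING:  # identity check, so a custom __eq__ cannot fool it
        return "absent"
    return "present: {!r}".format(value)


present = describe({"a": None}, "a")
absent = describe({}, "a")
print(present, absent)
```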

Nothing is guaranteed. You can make anything equal to anything else by implementing __eq__.
Unless you know what x is, nothing is guaranteed and nothing can be assumed.
For example:
class A:
    def __eq__(self, other):
        return True
print(A() == object())
# True
And the contrary:
class A:
    def __eq__(self, other):
        return False
print(A() == object())
# False

Python - __eq__ method not being called

I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv.e should equal 5, but they do not. Like I mentioned, it looks like the _CaptureEq.__eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the _CaptureEq.__eq__ method is not being called? Hopefully this is not too obscure of a question.

Brandon's answer is informative, but incorrect. There are actually two problems, one with the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj
    # If running on Python 3, this will be a new-style class, and
    # new-style classes must delegate hash explicitly in order to populate
    # the underlying special method slot correctly.
    # On Python 2, it will be an old-style class, so the explicit delegation
    # isn't needed (__getattr__ will cover it), but it also won't do any harm.
    def __hash__(self):
        return hash(self.obj)
    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result
    def __getattr__(self, name): # support anything else needed by __contains__
        return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Don't recognise "other", so let *it* decide if we're equal
            return NotImplemented
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
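For convenience, here is a single-file Python 3 sketch that combines both fixes so the behaviour can be reproduced without a separate capture_eq module. Some details are assumptions made for this illustration: the get_equivalent signature is inferred from the fragment quoted elsewhere on this page, __getattr__ is omitted because nothing here needs it, and the key helper is named _key rather than the name-mangled __key.

```python
class _CaptureEq:
    """Wrapper that remembers the "other" object of a successful == test."""

    def __init__(self, obj):
        self.obj = obj
        self.match = obj

    def __hash__(self):
        return hash(self.obj)

    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result


def get_equivalent(container, item, default=None):
    # Assumed signature; the body follows the quoted recipe fragment.
    t = _CaptureEq(item)
    if t in container:
        return t.match
    return default


class Foo:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def _key(self):
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Unknown type: let the other operand (the wrapper) decide.
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        super().__init__(a, b, c)
        self.d, self.e = d, e


bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)
summary = {bar_1}
bar_equiv = get_equivalent(summary, bar_2)
print(bar_equiv.d, bar_equiv.e)
```

The set asks the stored bar_1 whether it equals the wrapper; Foo.__eq__ returns NotImplemented, so Python falls back to the wrapper's reflected __eq__, which records bar_1 as the match.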

The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
    print('__eq__: I am', self.d, self.e, 'and he is', other.d, other.e)
    return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
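The dictionary approach suggested in the first update can be sketched with dict.setdefault. The helper names get_or_register and abc_key are hypothetical, and SimpleNamespace objects stand in for the asker's Bar instances.

```python
from types import SimpleNamespace


def get_or_register(registry, obj, key_func):
    """Return the object already stored under obj's key, storing obj if absent."""
    return registry.setdefault(key_func(obj), obj)


def abc_key(obj):
    # The key is the subset of attributes that defines equivalence.
    return (obj.a, obj.b, obj.c)


registry = {}
bar_1 = SimpleNamespace(a=1, b=2, c=3, d=4, e=5)
bar_2 = SimpleNamespace(a=1, b=2, c=3, d=10, e=11)

first = get_or_register(registry, bar_1, abc_key)   # stored
second = get_or_register(registry, bar_2, abc_key)  # returns the stored bar_1

print(second.d, second.e)
```

This keeps the notion of a key separate from the object, so no custom __eq__ or __hash__ is needed on the stored objects.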

There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
    return t.match
return default
doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing a __getattr__ will solve this. It won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups. This is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
    return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. Sets are based on hash tables, and only do an equality test against an existing entry whose hash matches that of the item being looked up. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)
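The earlier point about special method lookups bypassing __getattr__ can be checked with a short sketch. The Wrapper class here is a hypothetical stand-in for _CaptureEq, and len() is used instead of hash() only because the failure is easier to observe.

```python
class Wrapper:
    """Hypothetical delegating wrapper, similar in spirit to _CaptureEq."""

    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # Only consulted for ordinary attribute access that fails normally.
        return getattr(self.obj, name)


w = Wrapper("abc")

via_attribute = w.upper()  # ordinary lookup: delegated through __getattr__

try:
    len(w)                 # implicit special-method lookup uses the type only
    len_worked = True
except TypeError:
    len_worked = False

print(via_attribute, len_worked)
```

The explicit call works because it is a normal attribute lookup, while len() looks for __len__ on the class itself and never consults __getattr__.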

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.
