What makes a user-defined class unhashable?

The docs say that a class is hashable as long as it defines __hash__ method and __eq__ method. However:
class X(list):
    # read-only interface of `tuple` and `list` should be the same, so reuse tuple.__hash__
    __hash__ = tuple.__hash__
x1 = X()
s = {x1} # TypeError: unhashable type: 'X'
What makes X unhashable?
Note that I must have identical lists (in terms of regular equality) to be hashed to the same value; otherwise, I will violate this requirement on hash functions:
The only required property is that objects which compare equal have
the same hash value
The docs do warn that a hashable object shouldn't be modified during its lifetime, and of course I don't modify instances of X after creation. Of course, the interpreter won't check that anyway.

Simply setting __hash__ to that of the tuple class is not enough: tuple.__hash__ only knows how to hash tuple instances, so you haven't actually told Python how to hash an X. Tuples are hashable because they are immutable. If you really wanted to make your specific example work, it might look like this:
class X2(list):
    def __hash__(self):
        return hash(tuple(self))
In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:
    def __hash__(self):
        return hash("foobar" * len(self))
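To see that the tuple-based version satisfies the "equal objects have equal hashes" requirement, here is a minimal sketch. It reuses the X2 name from the answer above and assumes the instances are never mutated after being hashed; the variable names are only for illustration.

```python
class X2(list):
    """A list subclass hashed by its contents (never mutate it once hashed)."""

    def __hash__(self):
        # Convert to a tuple so the built-in tuple hashing does the work.
        return hash(tuple(self))


a = X2([1, 2, 3])
b = X2([1, 2, 3])

same_hash = hash(a) == hash(b)              # equal lists hash alike
matches_tuple = hash(a) == hash((1, 2, 3))  # same value as the tuple's hash
found = b in {a}                            # list.__eq__ compares contents

print(same_hash, matches_tuple, found)
```

Equality is still inherited from list, so two X2 instances with the same items land in the same bucket and compare equal there.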

From the Python 3 docs:
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Ref: object.__hash__(self)
Sample code:
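class Hashable:
    pass
class Unhashable:
    # Defining __eq__ without __hash__ sets __hash__ to None
    def __eq__(self, other):
        # `self == other` here would recurse forever, so compare identity
        return self is other
class HashableAgain:
    def __eq__(self, other):
        return self is other
    def __hash__(self):
        return id(self)
def main():
    # OK
    print(hash(Hashable()))
    # Raises TypeError: unhashable type: 'Unhashable'
    try:
        print(hash(Unhashable()))
    except TypeError as exc:
        print(exc)
    # OK
    print(hash(HashableAgain()))
main()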

What you could and should do, based on your other question, is this: don't subclass anything, just encapsulate a tuple. It's perfectly fine to do so in __init__.
class X(object):
    def __init__(self, *args):
        self.tpl = args
    def __hash__(self):
        return hash(self.tpl)
    def __eq__(self, other):
        return self.tpl == other
    def __repr__(self):
        return repr(self.tpl)
x1 = X()
s = {x1}
which yields:
>>> s
set([()])
>>> x1
()

An addition to the above answers: for the specific case of a dataclass in Python 3.7+, you can make it hashable by using
@dataclass(frozen=True)
class YourClass:
    pass
as the decorator instead of
@dataclass
class YourClass:
    pass
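A small sketch of the difference, using two made-up classes (Point and MutablePoint are illustrative names, not from the question). With the default eq=True, a plain dataclass gets __hash__ set to None, while frozen=True generates a hash from the fields.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass
class MutablePoint:
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)
unique_points = {p, q}          # equal fields, so only one element survives

try:
    p.x = 10                    # frozen instances reject attribute assignment
    was_mutated = True
except FrozenInstanceError:
    was_mutated = False

try:
    hash(MutablePoint(1, 2))    # eq=True without frozen=True means no hash
    plain_is_hashable = True
except TypeError:
    plain_is_hashable = False

print(len(unique_points), was_mutated, plain_is_hashable)
```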

If you don't modify instances of X after creation, why aren't you subclassing tuple?
But I'll point out that this actually doesn't throw an error, at least in Python 2.6.
>>> class X(list):
...     __hash__ = tuple.__hash__
...     __eq__ = tuple.__eq__
...
>>> x = X()
>>> s = set((x,))
>>> s
set([[]])
I hesitate to say "works" because this doesn't do what you think it does.
>>> a = X()
>>> b = X((5,))
>>> hash(a)
4299954584
>>> hash(b)
4299954672
>>> id(a)
4299954584
>>> id(b)
4299954672
It's just using the object id as a hash. When you actually call __hash__ you still get an error; likewise for __eq__.
>>> a.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' for 'tuple' objects doesn't apply to 'X' object
>>> X().__eq__(X())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' for 'tuple' objects doesn't apply to 'X' object
I gather that the python internals, for some reason, are detecting that X has a __hash__ and an __eq__ method, but aren't calling them.
The moral of all this is: just write a real hash function. Since this is a sequence object, converting it to a tuple and hashing that is the most obvious approach.
    def __hash__(self):
        return hash(tuple(self))

Related

Inventory system adds new object instead of increasing amount by 1 [duplicate]

I have a class MyClass, which contains two member variables foo and bar:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
I have two instances of this class, each of which has identical values for foo and bar:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False:
>>> x == y
False
How can I make python consider these two objects equal?

You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
    def __eq__(self, other):
        if not isinstance(other, MyClass):
            # don't attempt to compare against unrelated types
            return NotImplemented
        return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__:
class MyClass:
    ...
    def __hash__(self):
        # necessary for instances to behave sanely in dicts and sets.
        return hash((self.foo, self.bar))
A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
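Putting the two hooks together, a minimal sketch could look like the following. EqOnly is a made-up class added only to show what happens when __eq__ is defined without __hash__; it assumes Python 3 and that foo and bar are never changed after construction.

```python
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return (self.foo, self.bar) == (other.foo, other.bar)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares.
        return hash((self.foo, self.bar))


class EqOnly:
    """Defines __eq__ only, so Python 3 sets __hash__ to None."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
unique = {x, y}                     # x and y collapse into one element
eq_only_hash = EqOnly.__hash__      # None: instances are unhashable

print(x == y, len(unique), eq_only_hash)
```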

You override the rich comparison operators in your object.
class MyClass:
    def __lt__(self, other):
        ...  # return comparison
    def __le__(self, other):
        ...  # return comparison
    def __eq__(self, other):
        ...  # return comparison
    def __ne__(self, other):
        ...  # return comparison
    def __gt__(self, other):
        ...  # return comparison
    def __ge__(self, other):
        ...  # return comparison
Like this:
    def __eq__(self, other):
        return self._id == other._id
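If writing all six methods feels like too much, one option (not part of the answer above, just a sketch) is functools.total_ordering from the standard library, which fills in the rest from __eq__ and one ordering method. Item is a hypothetical class that borrows the _id attribute from the snippet above.

```python
from functools import total_ordering


@total_ordering
class Item:
    def __init__(self, _id):
        self._id = _id

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self._id == other._id

    def __lt__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return self._id < other._id


a, b = Item(1), Item(2)
# __le__, __gt__ and __ge__ are generated by the decorator;
# __ne__ is derived from __eq__ by Python 3 itself.
checks = (a < b, a <= b, b > a, b >= a, a != b, a == Item(1))
ordered_ids = [item._id for item in sorted([b, a])]

print(checks, ordered_ids)
```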

If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
import pickle
pickle.dumps(a) == pickle.dumps(b)
pickle is a very common serialization lib for Python objects, and can serialize most of them. In the above snippet, I'm comparing the bytes of the serialized a with those of the serialized b. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable
BASE_TYPES = [str, int, float, bool, type(None)]
def base_typed(obj):
    """Recursive reflection method to convert any object property into a comparable form.
    """
    T = type(obj)
    from_numpy = T.__module__ == 'numpy'
    if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(obj, Iterable)):
        return obj
    # dicts are iterable too, so keep them out of the generic sequence branch
    if isinstance(obj, Iterable) and T is not dict:
        base_items = [base_typed(item) for item in obj]
        return base_items if from_numpy else T(base_items)
    d = obj if T is dict else obj.__dict__
    return {k: base_typed(v) for k, v in d.items()}
def deep_equals(*args):
    return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now deep equality should work for most objects, whatever their type:
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of objects being compared doesn't matter either:
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__ like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__ implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>. Treat these situations on a case by case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__ method). For such, the second-to-last row in base_typed became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj). Here, dict_from is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
    return not callable(getattr(obj, prop)) and not prop.startswith('_')
def dict_from(obj):
    """Converts dict-like objects into dicts
    """
    if isinstance(obj, dict):
        # Dict and subtypes are directly converted
        d = dict(obj)
    elif '__dict__' in dir(obj):
        # Use standard dict representation when available
        d = obj.__dict__
    elif f'{type(obj).__module__}.{type(obj).__name__}' == 'sklearn.tree._tree.Tree':
        # Replaces sklearn trees with their state metadata
        # (str(type(obj)) would be "<class '...'>" and never match)
        d = obj.__getstate__()
    else:
        # Extract non-callable, non-private attributes with reflection
        kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
        d = {k: v for k, v in kv}
    return {k: base_typed(v) for k, v in d.items()}
Note that the pickle comparison does not yield True for dicts holding the same key-value pairs in a different order, as in
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
If you need an order-insensitive comparison, you could sort the items with Python's built-in sorted function beforehand.
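A minimal sketch of that sorting idea, assuming Python 3.7+ (where dicts keep insertion order) and keys that can be sorted. It only normalises the top level; nested dicts would need the same treatment applied recursively.

```python
import pickle

a = {'foo': [], 'bar': {}}
b = {'bar': {}, 'foo': []}

# The dicts are equal, but their pickles follow insertion order.
raw_equal = pickle.dumps(a) == pickle.dumps(b)

# Sorting the items first gives both sides the same layout.
sorted_equal = pickle.dumps(sorted(a.items())) == pickle.dumps(sorted(b.items()))

print(a == b, raw_equal, sorted_equal)
```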

With Dataclasses in Python 3.7 (and above), a comparison of object instances for equality is an inbuilt feature.
A backport for Dataclasses is available for Python 3.6.
(Py37) nsc@nsc-vbox:~$ python
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass
... class MyClass():
...     foo: str
...     bar: str
...
>>> x = MyClass(foo="foo", bar="bar")
>>> y = MyClass(foo="foo", bar="bar")
>>> x == y
True

Implement the __eq__ method in your class; something like this:
    def __eq__(self, other):
        return self.path == other.path and self.title == other.title
Edit: if you want your objects to compare equal if and only if they have equal instance dictionaries:
    def __eq__(self, other):
        return self.__dict__ == other.__dict__

As a summary:
It's advised to implement __eq__ rather than __cmp__, except if you run Python <= 2.0 (__eq__ was added in 2.1).
In Python 2, don't forget to also implement __ne__ (it should be something like return not self.__eq__(other) or return not self == other, except in very special cases). Python 3 derives __ne__ from __eq__ automatically.
Don't forget that the operator must be implemented in each custom class you want to compare (see example below).
If you want to compare with an object that can be None, you must implement it. The interpreter cannot guess it... (see example below)
class B(object):
    def __init__(self):
        self.name = "toto"
    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name
class A(object):
    def __init__(self):
        self.toto = "titi"
        self.b_inst = B()
    def __eq__(self, other):
        if other is None:
            return False
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)

Depending on your specific case, you could do:
>>> vars(x) == vars(y)
True
See Python dictionary from an object's fields

You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar, name):
        self.foo = foo
        self.bar = bar
        self.name = name
    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        # names of the instance attributes of each of these objects
        prop_names1 = list(self.__dict__)
        prop_names2 = list(other.__dict__)
        # different attribute sets can never be equal
        if set(prop_names1) != set(prop_names2):
            return False
        for name in prop_names1:
            if getattr(self, name) != getattr(other, name):
                return False
        return True

In Python 2, the __cmp__ function is called when comparing instances of objects.
If the == operator is not working for you by default, you can always redefine the __cmp__ function for the object.
Edit:
As has been pointed out, the __cmp__ function was removed in Python 3.0.
Instead you should use the “rich comparison” methods.

I wrote this and placed it in a test/utils module in my project. For cases when it's not a class, just a plain ol' dict, this will traverse both objects and ensure that
every attribute is equal to its counterpart
no dangling attributes exist (attrs that only exist on one object)
It's big... it's not sexy... but oh boi does it work!
def assertObjectsEqual(obj_a, obj_b):
    def _assert(a, b):
        if a == b:
            return
        raise AssertionError(f'{a} !== {b} inside assertObjectsEqual')
    def _check(a, b):
        if a is None or b is None:
            _assert(a, b)
            # nothing left to traverse when both are None
            return
        for k, v in a.items():
            # a key missing from b surfaces as a KeyError here
            if isinstance(v, dict):
                assertObjectsEqual(v, b[k])
            else:
                _assert(v, b[k])
    # Asserting both directions is more work
    # but it ensures no dangling values
    # on either object
    _check(obj_a, obj_b)
    _check(obj_b, obj_a)
You can clean it up a little by removing the _assert and just using plain ol' assert, but then the message you get when it fails is very unhelpful.

The code below works (in my limited testing) by doing a deep compare between two object hierarchies. It handles various cases, including the cases when the objects themselves or their attributes are dictionaries.
from typing import Any
def deep_comp(o1: Any, o2: Any) -> bool:
    # NOTE: dict don't have __dict__
    o1d = getattr(o1, '__dict__', None)
    o2d = getattr(o2, '__dict__', None)
    # if both are objects
    if o1d is not None and o2d is not None:
        # we will compare their dictionaries
        o1, o2 = o1.__dict__, o2.__dict__
    if o1 is not None and o2 is not None:
        # if both are dictionaries, we will compare each key
        if isinstance(o1, dict) and isinstance(o2, dict):
            for k in set().union(o1.keys(), o2.keys()):
                if k in o1 and k in o2:
                    if not deep_comp(o1[k], o2[k]):
                        return False
                else:
                    return False # some key missing
            return True
    # mismatched object types or both are scalars, or one or both None
    return o1 == o2
This is very tricky code, so please add any cases that might not work for you in the comments.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
    def __repr__(self):
        return str(self.value)
    def __eq__(self, other):
        return self.value == other.value
node1 = Node(1)
node2 = Node(1)
print(f'node1 id:{id(node1)}')
print(f'node2 id:{id(node2)}')
print(node1 == node2)
Output:
node1 id:4396696848
node2 id:4396698000
True

Use the setattr function. You might want to use this when you can't add something inside the class itself, say, when you are importing the class.
setattr(MyClass, "__eq__", lambda x, y: x.foo == y.foo and x.bar == y.bar)
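Here is a slightly longer sketch of the same idea with a named function instead of a lambda, so it can return NotImplemented for unrelated types. Point is a made-up stand-in for a class you imported and cannot edit.

```python
class Point:
    """Stand-in for a class that lives in someone else's module."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def _point_eq(self, other):
    if not isinstance(other, Point):
        return NotImplemented
    return (self.x, self.y) == (other.x, other.y)


before = Point(1, 2) == Point(1, 2)          # identity comparison by default

setattr(Point, "__eq__", _point_eq)          # same as: Point.__eq__ = _point_eq

after = Point(1, 2) == Point(1, 2)
different = Point(1, 2) == Point(1, 3)
other_type = Point(1, 2) == "not a point"

print(before, after, different, other_type)
```

Keep in mind that this patches the class for every piece of code that uses it, not just yours.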

If you want to get an attribute-by-attribute comparison, and see if and where it fails, you can use the following list comprehension:
[i for i,j in
zip([getattr(obj_1, attr) for attr in dir(obj_1)],
[getattr(obj_2, attr) for attr in dir(obj_2)])
if not i==j]
The extra advantage here is that you can squeeze it into one line and enter it in the "Evaluate Expression" window when debugging in PyCharm.

I tried the initial example (see 7 above) and it did not work in ipython. Note that cmp(obj1,obj2) returns a "1" when implemented using two identical object instances. Oddly enough when I modify one of the attribute values and recompare, using cmp(obj1,obj2) the object continues to return a "1". (sigh...)
Ok, so what you need to do is iterate two objects and compare each attribute using the == sign.

Instances of a class compare as non-equal with == by default. The best way is to add the __cmp__ function to your class, which will do the comparison (Python 2 only; cmp and __cmp__ were removed in Python 3).
If you want to compare by content you can then simply use cmp(obj1, obj2).
In your case, once __cmp__ is defined, cmp(doc1, doc2) will return 0 if they are the same content-wise.

Set contains for user defined classes using hash function

Given:
class T:
    def __hash__(self):
        return 1234
t1 = T()
t2 = T()
my_set = { t1 }
I would expect the following to print True:
print(t2 in my_set)
Isn't this supposed to print True, because t1 and t2 have the same hash value? How can I make the in operator of the set use the given hash function?

You need to define an __eq__ method, because only instances that are identical (a is b) or equal (a == b), besides having the same hash, will be recognized as equal by set and dict:
class T:
    def __hash__(self):
        return 1234
    def __eq__(self, other):
        return True
t1 = T()
t2 = T()
my_set = { t1 }
print(t2 in my_set) # True
The data model on __hash__ (and the same documentation page for Python 2) explains this:
__hash__
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to mix together the hash values of the components of the object that also play a part in comparison of objects by packing them into a tuple and hashing the tuple.
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
(Emphasis mine)
Note: In Python 2 you can also implement a __cmp__ method instead of __eq__.

In pseudocode, the logic for set.__contains__() when called by x in s is roughly:
h = hash(x) # This uses your class's __hash__()
i = h % table_size # This logic is internal to the hash table
if table[i] is empty: return False # Nothing found in the set
if table[i] is x: return True # Identity implies equality
if hash(table[i]) != h: return False # Hash mismatch implies inequality
return table[i] == x # This needs __eq__() in your class
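The three outcomes in that pseudocode can be observed directly. This is only a sketch; HashOnly and HashAndEq are made-up class names, both returning the constant hash from the question.

```python
class HashOnly:
    def __hash__(self):
        return 1234


class HashAndEq:
    def __hash__(self):
        return 1234

    def __eq__(self, other):
        return isinstance(other, HashAndEq)


t1, t2 = HashOnly(), HashOnly()
u1, u2 = HashAndEq(), HashAndEq()

identity_hit = t1 in {t1}     # same object: found via the identity check
hash_only_hit = t2 in {t1}    # same hash, but default __eq__ says "not equal"
eq_hit = u2 in {u1}           # same hash and __eq__ returns True

print(identity_hit, hash_only_hit, eq_hit)
```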

Is it possible to modify the behavior of len()?

I'm aware of creating a custom __repr__ or __add__ method (and so on), to modify the behavior of operators and functions. Is there a method override for len?
For example:
class Foo:
    def __repr__(self):
        return "A wild Foo Class in its natural habitat."
foo = Foo()
print(foo) # A wild Foo Class in its natural habitat.
print(repr(foo)) # A wild Foo Class in its natural habitat.
Could this be done for len, with a list? Normally, it would look like this:
foo = []
print(len(foo)) # 0
foo = [1, 2, 3]
print(len(foo)) # 3
What if I want to leave certain types out of the count? Like this:
class Bar(list):
    pass
foo = [Bar(), 1, '']
print(len(foo)) # 3
count = 0
for item in foo:
    if not isinstance(item, Bar):
        count += 1
print(count) # 2
Is there a way to do this from within a list subclass?

Yes, implement the __len__ method:
    def __len__(self):
        return 42
Demo:
>>> class Foo(object):
...     def __len__(self):
...         return 42
...
>>> len(Foo())
42
From the documentation:
Called to implement the built-in function len(). Should return the length of the object, an integer >= 0. Also, an object that doesn’t define a __bool__() method and whose __len__() method returns zero is considered to be false in a Boolean context.
For your specific case:
>>> class Bar(list):
...     def __len__(self):
...         return sum(1 for ob in self if not isinstance(ob, Bar))
...
>>> len(Bar([1, 2, 3]))
3
>>> len(Bar([1, 2, 3, Bar()]))
3
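One consequence of the documentation quote above is worth a quick sketch: since list defines no __bool__, a Bar whose filtered length is zero becomes falsy even though it still holds items. The variable names are only for illustration.

```python
class Bar(list):
    def __len__(self):
        # Count only the items that are not themselves Bar instances.
        return sum(1 for ob in self if not isinstance(ob, Bar))


mixed = Bar([1, 2, 3, Bar()])
only_bars = Bar([Bar(), Bar()])

reported = len(mixed)            # filtered count
stored = list.__len__(mixed)     # what the underlying list really holds
truthy = bool(only_bars)         # falls back to __len__, which returns 0

print(reported, stored, truthy)
```

So code like `if my_bar:` may take the "empty" branch for a Bar that is not actually empty, which is something to weigh before overriding __len__ this way.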

Yes, just as you have already discovered that you can override the behaviour of a repr() function call by implementing the __repr__ magic method, you can specify the behaviour of a len() function call by implementing (surprise surprise) the __len__ magic method:
>>> class Thing:
...     def __len__(self):
...         return 123
...
>>> len(Thing())
123
A pedant might mention that you are not modifying the behaviour of len(), you are modifying the behaviour of your class. len just does the same thing it always does, which includes checking for a __len__ attribute on the argument's type.
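To make the pedant's point concrete, here is a small sketch: len() looks up __len__ on the class, so an attribute of the same name set on a single instance is ignored.

```python
class Thing:
    def __len__(self):
        return 123


thing = Thing()
thing.__len__ = lambda: 5          # instance attribute, not a class method

via_builtin = len(thing)           # uses Thing.__len__
via_attribute = thing.__len__()    # plain attribute access finds the lambda

print(via_builtin, via_attribute)
```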

Remember: Python is a dynamically typed, duck-typed language.
If it acts like something that might have a length:
class MyCollection(object):
    def __len__(self):
        return 1234
Example:
>>> obj = MyCollection()
>>> len(obj)
1234
If it doesn't act like it has a length: KABOOM!
class Foo(object):
    def __repr__(self):
        return "<Foo>"
Example:
>>> try:
...     obj = Foo()
...     len(obj)
... except:
...     raise
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
TypeError: object of type 'Foo' has no len()
From Typing:
Python uses duck typing and has typed objects but untyped variable
names. Type constraints are not checked at compile time; rather,
operations on an object may fail, signifying that the given object is
not of a suitable type. Despite being dynamically typed, Python is
strongly typed, forbidding operations that are not well-defined (for
example, adding a number to a string) rather than silently attempting
to make sense of them.
Example:
>>> x = 1234
>>> s = "1234"
>>> x + s
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'

You can just add a __len__ method to your class.
class Test:
    def __len__(self):
        return 2
a = Test()
len(a) # --> 2

Does a Python object which doesn't override comparison operators equals itself?

class A(object):
    def __init__(self, value):
        self.value = value
x = A(1)
y = A(2)
q = [x, y]
q.remove(y)
I want to remove from the list a specific object which was added before to it and to which I still have a reference. I do not want an equality test. I want an identity test. This code seems to work in both CPython and IronPython, but does the language guarantee this behavior or is it just a fluke?
The list.remove method documentation is this: same as del s[s.index(x)], which implies that an equality test is performed.
So will an object be equal to itself if you don't override __cmp__, __eq__ or __ne__?

Yes. In your example q.remove(y) would remove the first occurrence of an object which compares equal with y. However, the way the class A is defined, you shouldn't† ever have a variable compare equal with y - with the exception of any other names which are also bound to the same y instance.
The relevant section of the docs is here:
If no __cmp__(), __eq__() or __ne__() operation is defined, class
instances are compared by object identity ("address").
So comparison for A instances is by identity (implemented as memory address in CPython). No other object can have an identity equal to id(y) within y's lifetime, i.e. for as long as you hold a reference to y (which you must, if you're going to remove it from a list!)
† Technically, it is still possible to have objects at other memory locations which are comparing equal - mock.ANY is one such example. But these objects need to override their comparison operators to force the result.

In Python, by default an object is always equal to itself (the only exception I can think of is float("nan")). An object of a user-defined class will not be equal to any other object unless you define a comparison function.
See also http://docs.python.org/reference/expressions.html#notin
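A short sketch of both points. It assumes CPython, where list.remove checks identity before falling back to ==, which is why even a NaN can be removed through the same reference.

```python
class A(object):
    def __init__(self, value):
        self.value = value


x, y = A(1), A(1)          # same value on purpose: only identity matters
q = [x, y]
q.remove(y)                # x == y is False by default, so x is skipped
removed_right_one = len(q) == 1 and q[0] is x

nan = float("nan")
nan_equals_itself = nan == nan     # the one well-known exception
values = [nan]
values.remove(nan)                 # still works: identity is checked first
nan_removed = values == []

print(removed_right_one, nan_equals_itself, nan_removed)
```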

The answer is yes and no.
Consider the following example
>>> class A(object):
        def __init__(self, value):
            self.value = value
>>> x = A(1)
>>> y = A(2)
>>> z = A(3)
>>> w = A(3)
>>> q = [x, y, z]
>>> id(y) # y and the second element in the list are the same object
46167248
>>> id(q[1]) # y and the second element in the list are the same object
46167248
>>> q.remove(y) # So it just compares the id and removes it
>>> q
[<__main__.A object at 0x02C19AB0>, <__main__.A object at 0x02C19B50>]
>>> q.remove(w) # Fails because, though z and w contain the same value, they are different objects
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    q.remove(w)
ValueError: list.remove(x): x not in list
It will remove the item from the list if and only if they are the same object. If they are different objects with the same value, it won't remove it.

Compare object instances for equality by their attributes

I have a class MyClass, which contains two member variables foo and bar:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
I have two instances of this class, each of which has identical values for foo and bar:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False:
>>> x == y
False
How can I make python consider these two objects equal?

You should implement the method __eq__:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
def __eq__(self, other):
if not isinstance(other, MyClass):
# don't attempt to compare against unrelated types
return NotImplemented
return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__:
class MyClass:
...
def __hash__(self):
# necessary for instances to behave sanely in dicts and sets.
return hash((self.foo, self.bar))
A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.

You override the rich comparison operators in your object.
class MyClass:
def __lt__(self, other):
# return comparison
def __le__(self, other):
# return comparison
def __eq__(self, other):
# return comparison
def __ne__(self, other):
# return comparison
def __gt__(self, other):
# return comparison
def __ge__(self, other):
# return comparison
Like this:
def __eq__(self, other):
return self._id == other._id

If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
pickle.dumps(a) == pickle.dumps(b)
pickle is a very common serialization lib for Python objects, and will thus be able to serialize pretty much anything, really. In the above snippet, I'm comparing the str from serialized a with the one from b. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable
BASE_TYPES = [str, int, float, bool, type(None)]
def base_typed(obj):
"""Recursive reflection method to convert any object property into a comparable form.
"""
T = type(obj)
from_numpy = T.__module__ == 'numpy'
if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(T, Iterable)):
return obj
if isinstance(obj, Iterable):
base_items = [base_typed(item) for item in obj]
return base_items if from_numpy else T(base_items)
d = obj if T is dict else obj.__dict__
return {k: base_typed(v) for k, v in d.items()}
def deep_equals(*args):
return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now it doesn't matter what your objects are, deep equality is assured to work
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of comparables doesn't matter as well
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__ like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__ implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>. Treat these situations on a case by case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__ method). For such, the second-to-last row in base_typed became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj). Here, dict_from is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
return not callable(getattr(obj, prop)) and not prop.startswith('_')
def dict_from(obj):
"""Converts dict-like objects into dicts
"""
if isinstance(obj, dict):
# Dict and subtypes are directly converted
d = dict(obj)
elif '__dict__' in dir(obj):
# Use standard dict representation when available
d = obj.__dict__
elif str(type(obj)) == 'sklearn.tree._tree.Tree':
# Replaces sklearn trees with their state metadata
d = obj.__getstate__()
else:
# Extract non-callable, non-private attributes with reflection
kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
d = {k: v for k, v in kv}
return {k: base_typed(v) for k, v in d.items()}
Note that none of the above methods yields True for objects with the same key-value pairs in a different order, as in
>>> import pickle
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
But if you want that, you could sort the items with Python's built-in sorted function beforehand.

With Dataclasses in Python 3.7 (and above), a comparison of object instances for equality is an inbuilt feature.
A backport for Dataclasses is available for Python 3.6.
(Py37) nsc@nsc-vbox:~$ python
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass
... class MyClass():
...     foo: str
...     bar: str
...
>>> x = MyClass(foo="foo", bar="bar")
>>> y = MyClass(foo="foo", bar="bar")
>>> x == y
True
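Written as a script rather than a REPL session, the same behaviour might look like the sketch below. The class name Document and its fields are hypothetical, chosen only for illustration; field(compare=False) is the standard dataclasses way to leave a field out of the generated __eq__.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    path: str
    title: str
    # Hypothetical bookkeeping field, excluded from the generated __eq__
    access_count: int = field(default=0, compare=False)


doc1 = Document("/tmp/a.txt", "A")
doc2 = Document("/tmp/a.txt", "A", access_count=5)
doc3 = Document("/tmp/b.txt", "A")

print(doc1 == doc2)  # compared field by field, ignoring access_count
print(doc1 == doc3)  # the paths differ
```

The first comparison is equal because only path and title take part in it; the second is not.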

Implement the __eq__ method in your class; something like this:
def __eq__(self, other):
    return self.path == other.path and self.title == other.title
Edit: if you want your objects to compare equal if and only if they have equal instance dictionaries:
def __eq__(self, other):
    return self.__dict__ == other.__dict__
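A slightly more defensive variant of the __dict__ comparison is sketched here, assuming a hypothetical class Doc with path and title attributes. It returns NotImplemented for unrelated types instead of raising AttributeError.

```python
class Doc:
    def __init__(self, path, title):
        self.path = path
        self.title = title

    def __eq__(self, other):
        # Let Python fall back to its default handling for unrelated types
        if not isinstance(other, Doc):
            return NotImplemented
        return self.__dict__ == other.__dict__


a = Doc("/tmp/x", "X")
b = Doc("/tmp/x", "X")

print(a == b)       # same instance dictionaries
print(a == "text")  # unrelated type, no exception raised
```

Keep in mind that a class defining __eq__ without __hash__ gets __hash__ set to None, so these instances cannot be put in a set or used as dict keys unless __hash__ is defined as well.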

As a summary:
It's advised to implement __eq__ rather than __cmp__, unless you run Python <= 2.0 (__eq__ was added in 2.1)
Don't forget to also implement __ne__ (it should be something like return not self.__eq__(other) or return not self == other, except in very special cases)
Don't forget that the operator must be implemented in each custom class you want to compare (see example below).
If you want to compare with an object that can be None, you must handle that case yourself. The interpreter cannot guess it (see example below)
class B(object):
    def __init__(self):
        self.name = "toto"

    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name

class A(object):
    def __init__(self):
        self.toto = "titi"
        self.b_inst = B()

    def __eq__(self, other):
        if other is None:
            return False
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)
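For Python 3 specifically, a minimal sketch of the same idea follows. The class Named is hypothetical. In Python 3 the default __ne__ delegates to __eq__ and inverts the result, so only __eq__ is written here, and an isinstance check covers the None case along with any other unrelated type.

```python
class Named:
    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Covers None as well as any other unrelated type
        if not isinstance(other, Named):
            return NotImplemented
        return self.name == other.name


x = Named("toto")

print(x == Named("toto"))  # equal names
print(x != Named("toto"))  # derived from __eq__ in Python 3
print(x == None)           # no AttributeError on None
```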

Depending on your specific case, you could do:
>>> vars(x) == vars(y)
True
See Python dictionary from an object's fields
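As a rough illustration of when this works and when it does not: vars(obj) returns the instance __dict__, so it raises TypeError for objects that have none, such as instances of a class that only declares __slots__. The classes Point and SlottedPoint are hypothetical.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint:
    __slots__ = ("x", "y")  # instances have no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y


print(vars(Point(1, 2)) == vars(Point(1, 2)))  # compares {'x': 1, 'y': 2} dicts

try:
    vars(SlottedPoint(1, 2))
    has_dict = True
except TypeError:
    has_dict = False

print(has_dict)
```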

You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar, name):
        self.foo = foo
        self.bar = bar
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        # Names of the instance attributes of each object
        prop_names1 = list(self.__dict__)
        prop_names2 = list(other.__dict__)
        # Objects with different sets of attributes cannot be equal
        if set(prop_names1) != set(prop_names2):
            return False
        # Compare attribute by attribute, matching on name rather than position
        for name in prop_names1:
            if getattr(self, name) != getattr(other, name):
                return False
        return True

When comparing instances of objects in Python 2, the __cmp__ function is called.
If the == operator is not working for you by default, you can always redefine the __cmp__ function for the object.
Edit:
As has been pointed out, the __cmp__ function is no longer supported as of Python 3.0.
Instead you should use the “rich comparison” methods (__eq__, __ne__, __lt__, __le__, __gt__, __ge__).
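A small sketch of the rich comparison approach, assuming a hypothetical Version class: functools.total_ordering fills in the remaining ordering methods once __eq__ and one of __lt__, __le__, __gt__, __ge__ are defined.

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        # Tuple used for both equality and ordering
        return (self.major, self.minor)

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() == other._key()

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self._key() < other._key()


print(Version(1, 2) == Version(1, 2))
print(Version(1, 2) < Version(1, 10))
print(Version(2, 0) >= Version(1, 10))  # __ge__ supplied by total_ordering
```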

I wrote this and placed it in a test/utils module in my project. For cases when it's not a class, just a plain ol' dict, this will traverse both objects and ensure that:
every attribute is equal to its counterpart
no dangling attributes exist (attrs that only exist on one object)
It's big... it's not sexy... but oh boy does it work!
def assertObjectsEqual(obj_a, obj_b):
    def _assert(a, b):
        if a == b:
            return
        raise AssertionError(f'{a} !== {b} inside assertObjectsEqual')

    def _check(a, b):
        if a is None or b is None:
            _assert(a, b)
            # Both are None here, so there is nothing left to traverse
            return
        for k, v in a.items():
            if isinstance(v, dict):
                assertObjectsEqual(v, b[k])
            else:
                _assert(v, b[k])

    # Asserting both directions is more work
    # but it ensures no dangling values
    # on either object
    _check(obj_a, obj_b)
    _check(obj_b, obj_a)
You can clean it up a little by removing the _assert and just using plain ol' assert but then the message you get when it fails is very unhelpful.

The code below works (in my limited testing) by doing a deep comparison between two object hierarchies. It handles various cases, including those where the objects themselves or their attributes are dictionaries.
from typing import Any

def deep_comp(o1: Any, o2: Any) -> bool:
    # NOTE: dicts don't have __dict__
    o1d = getattr(o1, '__dict__', None)
    o2d = getattr(o2, '__dict__', None)
    # if both are objects
    if o1d is not None and o2d is not None:
        # we will compare their dictionaries
        o1, o2 = o1.__dict__, o2.__dict__
    if o1 is not None and o2 is not None:
        # if both are dictionaries, we will compare each key
        if isinstance(o1, dict) and isinstance(o2, dict):
            for k in set().union(o1.keys(), o2.keys()):
                if k in o1 and k in o2:
                    if not deep_comp(o1[k], o2[k]):
                        return False
                else:
                    return False  # some key missing
            return True
    # mismatched object types, or both are scalars, or one or both are None
    return o1 == o2
This is very tricky code, so please add any cases that might not work for you in the comments.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

    def __repr__(self):
        return str(self.value)

    def __eq__(self, other):
        return self.value == other.value

node1 = Node(1)
node2 = Node(1)
print(f'node1 id:{id(node1)}')
print(f'node2 id:{id(node2)}')
print(node1 == node2)
>>> node1 id:4396696848
>>> node2 id:4396698000
>>> True

Use the setattr function. You might want to use this when you can't add something inside the class itself, say, when you are importing the class.
setattr(MyClass, "__eq__", lambda x, y: x.foo == y.foo and x.bar == y.bar)
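The sketch below spells the same patching idea out with a named function instead of a lambda. MyClass here is a stand-in defined locally for a class that would normally be imported, and _eq is a hypothetical helper name.

```python
class MyClass:  # stands in for a class imported from elsewhere
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar


def _eq(self, other):
    # Hypothetical replacement for __eq__, attached to the class below
    if not isinstance(other, MyClass):
        return NotImplemented
    return self.foo == other.foo and self.bar == other.bar


print(MyClass(1, 2) == MyClass(1, 2))  # default identity comparison

setattr(MyClass, "__eq__", _eq)

print(MyClass(1, 2) == MyClass(1, 2))  # attribute comparison after patching
```

Because the method is attached to the class, it applies to every existing and future instance, including those created by other code, so this is best kept to tests or tightly scoped scripts.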

If you want to get an attribute-by-attribute comparison, and see if and where it fails, you can use the following list comprehension:
[i for i, j in
    zip([getattr(obj_1, attr) for attr in dir(obj_1)],
        [getattr(obj_2, attr) for attr in dir(obj_2)])
    if not i == j]
The extra advantage here is that you can squeeze it into one line and enter it in the "Evaluate Expression" window when debugging in PyCharm.
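Since dir() also lists methods, and bound methods of two different instances generally do not compare equal, the one-liner tends to report more than just the data attributes. A narrower variant, sketched under the assumption that both objects have a __dict__, reports the names of differing instance attributes. The helper attr_diff and the class Thing are hypothetical.

```python
def attr_diff(obj_1, obj_2):
    """Return {name: (value_1, value_2)} for instance attributes that differ."""
    missing = object()  # sentinel for attributes present on only one side
    d1, d2 = vars(obj_1), vars(obj_2)
    return {
        name: (d1.get(name, missing), d2.get(name, missing))
        for name in sorted(set(d1) | set(d2))
        if d1.get(name, missing) != d2.get(name, missing)
    }


class Thing:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)


print(attr_diff(Thing(a=1, b=2), Thing(a=1, b=3)))  # {'b': (2, 3)}
```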

I tried the initial example (see 7 above) and it did not work in ipython. Note that cmp(obj1,obj2) returns a "1" when implemented using two identical object instances. Oddly enough when I modify one of the attribute values and recompare, using cmp(obj1,obj2) the object continues to return a "1". (sigh...)
Ok, so what you need to do is iterate two objects and compare each attribute using the == sign.

Instances of a class compared with == come out as non-equal by default. The best way (in Python 2) is to add a __cmp__ method to your class that does the comparison.
If you want to compare by content, you can then simply use cmp(obj1, obj2).
In your case, cmp(doc1, doc2). It will return 0 if they are the same content-wise.


What makes a user-defined class unhashable? - python

An addition to the above answers: for the specific case of a dataclass in Python 3.7+, you can make it hashable by using @dataclass(frozen=True) class YourClass: pass as the decoration instead of @dataclass class YourClass: pass
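A short sketch of the difference, with hypothetical class names Mutable and Frozen: a plain dataclass generates __eq__ and therefore ends up with __hash__ set to None, while frozen=True (with the default eq=True) also generates a field-based __hash__.

```python
from dataclasses import dataclass


@dataclass
class Mutable:
    name: str


@dataclass(frozen=True)
class Frozen:
    name: str


try:
    hash(Mutable("a"))
    mutable_hashable = True
except TypeError:  # unhashable type: 'Mutable'
    mutable_hashable = False

print(mutable_hashable)
print(hash(Frozen("a")) == hash(Frozen("a")))  # equal objects, equal hashes
print(len({Frozen("a"), Frozen("a")}))         # duplicates collapse in a set
```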



Set contains for user defined classes using hash function