How can I check if an object is orderable/sortable in Python?
I'm trying to implement basic type checking for the __init__ method of my binary tree class, and I want to be able to check if the value of the node is orderable, and throw an error if it isn't. It's similar to checking for hashability in the implementation of a hashtable.
I'm trying to accomplish something similar to Haskell's (Ord a) => etc. qualifiers. Is there a similar check in Python?
If you want to know whether an object is sortable, you must check whether it implements the necessary comparison methods.
In Python 2.X there were two different ways to implement those methods:
cmp method (the equivalent of compareTo in Java, for example)
__cmp__(self, other): returns >0, 0 or <0 depending on whether self is greater than, equal to or less than other
rich comparison methods
__lt__, __gt__, __eq__, __le__, __ge__, __ne__
The sort() functions call these methods to make the necessary comparisons between instances (actually sort only needs the __lt__ or __gt__ methods, but it's recommended to implement all of them)
In Python 3.X __cmp__ was removed in favor of the rich comparison methods, as having more than one way to do the same thing goes against Python's "laws".
So, you basically need a function that checks whether these methods are implemented by a class:
# Python 2.X
def is_sortable(obj):
    return hasattr(obj, "__cmp__") or \
           hasattr(obj, "__lt__") or \
           hasattr(obj, "__gt__")
# Python 3.X
def is_sortable(obj):
    cls = obj.__class__
    return cls.__lt__ != object.__lt__ or \
           cls.__gt__ != object.__gt__
Different functions are needed for Python 2 and 3 because a lot of other things also change about unbound methods, method-wrappers and other internal things in Python 3.
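To see how the Python 3 check behaves, here is a small self-contained sketch. The function is the Python 3 version from this answer; the classes Plain and Ranked are made up purely for illustration.

```python
def is_sortable(obj):
    # Same idea as the Python 3 version in this answer: a class counts as
    # sortable when it overrides __lt__ or __gt__ instead of inheriting
    # the defaults from object.
    cls = obj.__class__
    return cls.__lt__ != object.__lt__ or cls.__gt__ != object.__gt__


class Plain:
    """Hypothetical class that defines no comparison methods."""


class Ranked:
    """Hypothetical class that defines only __lt__."""

    def __init__(self, rank):
        self.rank = rank

    def __lt__(self, other):
        return self.rank < other.rank


print(is_sortable(1))          # int overrides __lt__
print(is_sortable(Plain()))    # inherits object's __lt__
print(is_sortable(Ranked(3)))  # defines its own __lt__
```

Note that this only tells you the methods exist. It also reports True for a set, whose < means "proper subset" rather than a total order.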
Read these links if you want a better understanding of sortable objects in Python:
http://python3porting.com/problems.html#unorderable-types-cmp-and-cmp
http://docs.python.org/2/howto/sorting.html#the-old-way-using-the-cmp-parameter
PS: this was a complete re-edit of my first answer, but it was needed as I investigated the problem better and had a cleaner idea about it :)
While the explanations in answers already here address runtime type inspection, here's how the static types are annotated by typeshed. They start by defining a collection of comparison Protocols, e.g.
class SupportsDunderLT(Protocol):
    def __lt__(self, __other: Any) -> bool: ...
which are then collected into rich comparison sum types, such as
SupportsRichComparison = Union[SupportsDunderLT, SupportsDunderGT]
SupportsRichComparisonT = TypeVar("SupportsRichComparisonT", bound=SupportsRichComparison)
then finally these are used to type e.g. the key functions of list.sort:
@overload
def sort(self: list[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...) -> None: ...
@overload
def sort(self, *, key: Callable[[_T], SupportsRichComparison], reverse: bool = ...) -> None: ...
and sorted:
@overload
def sorted(
    __iterable: Iterable[SupportsRichComparisonT], *, key: None = ..., reverse: bool = ...
) -> list[SupportsRichComparisonT]: ...
@overload
def sorted(__iterable: Iterable[_T], *, key: Callable[[_T], SupportsRichComparison], reverse: bool = ...) -> list[_T]: ...
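The same pattern can be applied to your own functions. The following is a minimal sketch, assuming Python 3.8+ for typing.Protocol; the names SupportsLessThan, OrderableT and smallest are hypothetical and not part of typeshed.

```python
from typing import Any, Protocol, Sequence, TypeVar


class SupportsLessThan(Protocol):
    """Hypothetical protocol: anything that defines __lt__."""

    def __lt__(self, other: Any) -> bool: ...


# TypeVar bound to the protocol, mirroring SupportsRichComparisonT.
OrderableT = TypeVar("OrderableT", bound=SupportsLessThan)


def smallest(items: Sequence[OrderableT]) -> OrderableT:
    """Return the smallest item, using only the < operator."""
    result = items[0]
    for item in items[1:]:
        if item < result:
            result = item
    return result


print(smallest([3, 1, 2]))
print(smallest(["pear", "apple"]))
```

These annotations are only checked by a static type checker. Nothing is enforced at runtime, so a runtime check still needs one of the inspection approaches from the other answers.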
Regrettably, it is not enough to check that your object implements __lt__.
numpy uses the '<' operator to return an array of Booleans, which has no truth value. SQLAlchemy uses it to return a query filter, which again has no truth value.
Ordinary sets use it to check for a subset relationship, so that
>>> set1 = {1,2}
>>> set2 = {2,3}
>>> set1 == set2
False
>>> set1 < set2
False
>>> set1 > set2
False
The best partial solution I could think of (starting from a single object of unknown type) is this, but with rich comparisons it seems to be officially impossible to determine orderability:
if hasattr(x, '__lt__'):
    try:
        isOrderable = ( ((x == x) is True) and ((x > x) is False)
                        and not isinstance(x, (set, frozenset)) )
    except:
        isOrderable = False
else:
    isOrderable = False
Edited
As far as I know, in Python 2 all lists are sortable, so if you want to know whether a list is "sortable", the answer is yes, no matter what elements it has. (In Python 3 this no longer holds: sorting a list of mixed types such as the one below raises a TypeError.)
class C:
    def __init__(self):
        self.a = 5
        self.b = "asd"
c = C()
d = True
list1 = ["abc", "aad", c, 1, "b", 2, d]
list1.sort()
print list1
>>> [<__main__.C instance at 0x0000000002B7DF08>, 1, True, 2, 'aad', 'abc', 'b']
You could determine what types you consider "sortable" and implement a method to verify if all elements in the list are "sortable", something like this:
def isSortable(list1):
    types = [int, float, str]
    res = True
    for e in list1:
        res = res and (type(e) in types)
    return res
print isSortable([1,2,3.0, "asd", [1,2,3]])
Related
I have a class called Transaction, which contains multiple attributes. If any of these attributes match, then I want those transactions to be treated as duplicates, and hence do not want to store duplicates in a set.
class Transaction:
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __eq__(self, other):
        if not isinstance(other, Transaction):
            return NotImplemented
        return self.a == other.a or self.b == other.b
    def __hash__(self):
        ...  # TODO
I learnt that it is important to implement both __eq__ and __hash__ if we want to avoid duplicates while inserting into a set. Also, if A == B, then their hashes should also match as per the contract.
How can I implement __hash__ in this case, so that if I try to insert a transaction into the set, it is rejected if it repeats the value of either attribute 'a' or 'b'?
Thanks in advance!
I'm not sure it's possible to compress an or condition like this into a single hash value. I tried experimenting with applying DeMorgan's law (not nand instead of or) but came up empty.
Your best bet for making the type hashable might just be to return a constant value (such that all instances have the same hash), and rely on the hashtable's collision behavior.
This is implicitly allowed by the standards, because the rule is
a == b implies hash(a) == hash(b)
and not
hash(a) == hash(b) implies a == b
which has never been the case (after all, hash collisions are expected to happen occasionally, since a hash is only 32 or 64 bits wide).
A set will accommodate this with its natural collision-handling behavior, and while this will not be performant at all, it will at least allow you to use the set data structure in the first place.
>>> class A:
...     def __init__(self, prop):
...         self.prop = prop
...     def __repr__(self):
...         return f'A({self.prop})'
...     def __eq__(self, other):
...         return self.prop == other.prop
...     def __hash__(self):
...         return 0
...
>>> {A(1), A(2), A(3), A(1)}
{A(1), A(2), A(3)}
Admittedly, this kind of defeats the purpose of using a set, though there might be more point to it if you were using your objects as keys in a dict.
I think it's not possible and you shouldn't do that. Whatever you use in your __eq__ should also be present in __hash__. Otherwise:
Let's say you only use the hash of a in your __hash__. You would end up with a scenario where two equal objects have different hashes (because their bs are equal but their as differ), which contradicts the actual rule:
if obj1 == obj2 -> True then hash(obj1) == hash(obj2) "must" be True
The same goes for using only b in __hash__.
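There is a second problem that no choice of hash can fix: an or-based __eq__ is not transitive. The sketch below reuses the Transaction class from the question with a constant hash (an assumption made only so the objects can go into a set) and shows that the resulting set depends on insertion order.

```python
class Transaction:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __eq__(self, other):
        if not isinstance(other, Transaction):
            return NotImplemented
        return self.a == other.a or self.b == other.b

    def __hash__(self):
        # Constant hash: trivially satisfies
        # "a == b implies hash(a) == hash(b)".
        return 0


t1 = Transaction(1, 2)
t2 = Transaction(1, 3)  # shares a with t1
t3 = Transaction(4, 3)  # shares b with t2, nothing with t1

# t1 == t2 and t2 == t3 hold, but t1 == t3 does not.
print(t1 == t2, t2 == t3, t1 == t3)

# Same three objects, different insertion order, different set sizes.
print(len({t1, t3, t2}), len({t2, t1, t3}))
```

So even when the hash contract is met, which transactions count as "duplicates" depends on what happens to be in the set already.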
I have a class MyClass, which contains two member variables foo and bar:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
I have two instances of this class, each of which has identical values for foo and bar:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False:
>>> x == y
False
How can I make python consider these two objects equal?
You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar
    def __eq__(self, other):
        if not isinstance(other, MyClass):
            # don't attempt to compare against unrelated types
            return NotImplemented
        return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__:
class MyClass:
    ...
    def __hash__(self):
        # necessary for instances to behave sanely in dicts and sets.
        return hash((self.foo, self.bar))
A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
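Putting __eq__ and __hash__ together, a short sketch of the resulting behaviour in Python 3 follows. EqOnly is a hypothetical class added only to show what happens when __hash__ is left out.

```python
class MyClass:
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        return self.foo == other.foo and self.bar == other.bar

    def __hash__(self):
        # Hash the same fields that __eq__ compares.
        return hash((self.foo, self.bar))


class EqOnly:
    """Hypothetical class that defines __eq__ but not __hash__."""

    def __eq__(self, other):
        return isinstance(other, EqOnly)


x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
print(x == y)                    # equal by value
print(len({x, y}))               # the set keeps only one of them
print(EqOnly.__hash__ is None)   # Python 3 sets __hash__ to None here
```

Calling hash() on an EqOnly instance raises a TypeError, which is the "unhashable" behaviour described above.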
You override the rich comparison operators in your object.
class MyClass:
    def __lt__(self, other):
        ...  # return comparison
    def __le__(self, other):
        ...  # return comparison
    def __eq__(self, other):
        ...  # return comparison
    def __ne__(self, other):
        ...  # return comparison
    def __gt__(self, other):
        ...  # return comparison
    def __ge__(self, other):
        ...  # return comparison
Like this:
    def __eq__(self, other):
        return self._id == other._id
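If writing all six methods by hand feels repetitive, the standard library's functools.total_ordering can derive the missing ordering methods from __eq__ plus one of __lt__, __le__, __gt__ or __ge__. A minimal sketch, using a made-up Version class:

```python
from functools import total_ordering


@total_ordering
class Version:
    """Hypothetical class ordered by a single number."""

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number == other.number

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number


print(Version(1) < Version(2))
print(Version(2) >= Version(1))   # __ge__ is derived by total_ordering
print([v.number for v in sorted([Version(3), Version(1), Version(2)])])
```

The derived methods are somewhat slower than hand-written ones, so this is a convenience rather than a requirement.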
If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
pickle.dumps(a) == pickle.dumps(b)
pickle is a very common serialization lib for Python objects, and can serialize most of them. In the above snippet, I'm comparing the serialized form of a with that of b. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable
BASE_TYPES = [str, int, float, bool, type(None)]
def base_typed(obj):
    """Recursive reflection method to convert any object property into a comparable form.
    """
    T = type(obj)
    from_numpy = T.__module__ == 'numpy'
    # numpy scalars are returned as-is; numpy arrays fall through to the Iterable branch
    if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(obj, Iterable)):
        return obj
    # dicts are iterable too, so keep them out of this branch
    if isinstance(obj, Iterable) and T is not dict:
        base_items = [base_typed(item) for item in obj]
        return base_items if from_numpy else T(base_items)
    d = obj if T is dict else obj.__dict__
    return {k: base_typed(v) for k, v in d.items()}
def deep_equals(*args):
    return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now deep equality works for a wide range of objects:
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of objects being compared doesn't matter either:
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__ like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__ implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>. Treat these situations on a case by case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__ method). For such, the second-to-last row in base_typed became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj). Here, dict_from is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
    return not callable(getattr(obj, prop)) and not prop.startswith('_')
def dict_from(obj):
    """Converts dict-like objects into dicts
    """
    if isinstance(obj, dict):
        # Dict and subtypes are directly converted
        d = dict(obj)
    elif '__dict__' in dir(obj):
        # Use standard dict representation when available
        d = obj.__dict__
    elif 'sklearn.tree._tree.Tree' in str(type(obj)):
        # Replaces sklearn trees with their state metadata
        # (str(type(obj)) looks like "<class '...'>", so test for containment)
        d = obj.__getstate__()
    else:
        # Extract non-callable, non-private attributes with reflection
        kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
        d = {k: v for k, v in kv}
    return {k: base_typed(v) for k, v in d.items()}
Do mind none of the above methods yield True for objects with the same key-value pairs in a differing order, as in
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
But if you want that, you could sort the items with Python's built-in sorted function beforehand.
With Dataclasses in Python 3.7 (and above), a comparison of object instances for equality is an inbuilt feature.
A backport for Dataclasses is available for Python 3.6.
(Py37) nsc@nsc-vbox:~$ python
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> @dataclass
... class MyClass():
...     foo: str
...     bar: str
...
>>> x = MyClass(foo="foo", bar="bar")
>>> y = MyClass(foo="foo", bar="bar")
>>> x == y
True
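The dataclass decorator can also generate hashing and ordering. A small sketch, assuming Python 3.7+; the Point class is made up for illustration, while frozen and order are real parameters of dataclasses.dataclass.

```python
from dataclasses import dataclass


# frozen=True makes instances immutable and hashable;
# order=True generates __lt__, __le__, __gt__ and __ge__,
# comparing the fields as a tuple in declaration order.
@dataclass(frozen=True, order=True)
class Point:
    """Hypothetical example class."""
    x: int
    y: int


p = Point(1, 2)
q = Point(1, 2)
print(p == q)
print(len({p, q}))
print(sorted([Point(2, 0), Point(1, 5), Point(1, 2)]))
```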
Implement the __eq__ method in your class; something like this:
    def __eq__(self, other):
        return self.path == other.path and self.title == other.title
Edit: if you want your objects to compare equal if and only if they have equal instance dictionaries:
    def __eq__(self, other):
        return self.__dict__ == other.__dict__
In summary:
It's advised to implement __eq__ rather than __cmp__, unless you run Python <= 2.0 (__eq__ was added in 2.1)
Don't forget to also implement __ne__ (it should be something like return not self.__eq__(other) or return not self == other, except in very special cases)
Don't forget that the operator must be implemented in each custom class you want to compare (see example below).
If you want to compare with an object that can be None, you must implement that yourself. The interpreter cannot guess it ... (see example below)
class B(object):
    def __init__(self):
        self.name = "toto"
    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name
class A(object):
    def __init__(self):
        self.toto = "titi"
        self.b_inst = B()
    def __eq__(self, other):
        if other is None:
            return False
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)
Depending on your specific case, you could do:
>>> vars(x) == vars(y)
True
See Python dictionary from an object's fields
You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar, name):
        self.foo = foo
        self.bar = bar
        self.name = name
    def __eq__(self,other):
        if not isinstance(other,MyClass):
            return NotImplemented
        else:
            #string lists of all method names and properties of each of these objects
            prop_names1 = list(self.__dict__)
            prop_names2 = list(other.__dict__)
            n = len(prop_names1) #number of properties
            for i in range(n):
                if getattr(self,prop_names1[i]) != getattr(other,prop_names2[i]):
                    return False
            return True
In Python 2, when comparing instances of objects, the __cmp__ function is called.
If the == operator is not working for you by default, you can always redefine the __cmp__ function for the object.
Edit:
As has been pointed out, the __cmp__ function was removed in Python 3.0.
Instead you should use the “rich comparison” methods.
I wrote this and placed it in a test/utils module in my project. For cases when it's not a class, just a plain ol' dict, this will traverse both objects and ensure
every attribute is equal to its counterpart
no dangling attributes exist (attrs that only exist on one object)
It's big... it's not sexy... but oh boi does it work!
def assertObjectsEqual(obj_a, obj_b):
    def _assert(a, b):
        if a == b:
            return
        raise AssertionError(f'{a} !== {b} inside assertObjectsEqual')
    def _check(a, b):
        if a is None or b is None:
            _assert(a, b)
            return  # both are None here, so there is nothing to traverse
        for k,v in a.items():
            if isinstance(v, dict):
                assertObjectsEqual(v, b[k])
            else:
                _assert(v, b[k])
    # Asserting both directions is more work
    # but it ensures no dangling values
    # on either object
    _check(obj_a, obj_b)
    _check(obj_b, obj_a)
You can clean it up a little by removing the _assert and just using plain ol' assert but then the message you get when it fails is very unhelpful.
The code below works (in my limited testing) by doing a deep comparison between two object hierarchies. It handles various cases, including the cases when the objects themselves or their attributes are dictionaries.
from typing import Any
def deep_comp(o1:Any, o2:Any)->bool:
    # NOTE: dict don't have __dict__
    o1d = getattr(o1, '__dict__', None)
    o2d = getattr(o2, '__dict__', None)
    # if both are objects
    if o1d is not None and o2d is not None:
        # we will compare their dictionaries
        o1, o2 = o1.__dict__, o2.__dict__
    if o1 is not None and o2 is not None:
        # if both are dictionaries, we will compare each key
        if isinstance(o1, dict) and isinstance(o2, dict):
            for k in set().union(o1.keys() ,o2.keys()):
                if k in o1 and k in o2:
                    if not deep_comp(o1[k], o2[k]):
                        return False
                else:
                    return False # some key missing
            return True
    # mismatched object types or both are scalars, or one or both None
    return o1 == o2
This is very tricky code, so please add any cases that might not work for you in the comments.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
    def __repr__(self):
        return str(self.value)
    def __eq__(self,other):
        return self.value == other.value
node1 = Node(1)
node2 = Node(1)
print(f'node1 id:{id(node1)}')
print(f'node2 id:{id(node2)}')
print(node1 == node2)
>>> node1 id:4396696848
>>> node2 id:4396698000
>>> True
Use the setattr function. You might want to use this when you can't add something inside the class itself, say, when you are importing the class.
setattr(MyClass, "__eq__", lambda x, y: x.foo == y.foo and x.bar == y.bar)
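A runnable sketch of this patching approach follows. MyClass here is only a stand-in for a class you would normally import from somewhere else.

```python
class MyClass:
    """Stand-in for a class imported from elsewhere."""

    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar


x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
print(x == y)   # default identity comparison before patching

# Attach a value-based __eq__ to the class after the fact.
setattr(MyClass, "__eq__", lambda a, b: a.foo == b.foo and a.bar == b.bar)
print(x == y)   # value comparison after patching
```

Keep in mind that this changes the class for every user of it in the process, so it is best kept to tests or small scripts.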
If you want to get an attribute-by-attribute comparison, and see if and where it fails, you can use the following list comprehension:
[i for i,j in
 zip([getattr(obj_1, attr) for attr in dir(obj_1)],
     [getattr(obj_2, attr) for attr in dir(obj_2)])
 if not i==j]
The extra advantage here is that you can squeeze it into one line and enter it in the "Evaluate Expression" window when debugging in PyCharm.
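A variation on the same idea that also reports which attribute differs is sketched below. It assumes both objects keep their state in __dict__ and uses vars() rather than dir(), because dir() also lists methods and bound methods of two different instances generally do not compare equal. The names differing_attributes and Doc are hypothetical.

```python
def differing_attributes(obj_1, obj_2):
    """Return {name: (value_1, value_2)} for instance attributes that differ."""
    missing = object()  # sentinel for attributes present on only one object
    d1, d2 = vars(obj_1), vars(obj_2)
    return {
        name: (d1.get(name, missing), d2.get(name, missing))
        for name in sorted(set(d1) | set(d2))
        if d1.get(name, missing) != d2.get(name, missing)
    }


class Doc:
    """Hypothetical example class."""

    def __init__(self, title, path):
        self.title = title
        self.path = path


print(differing_attributes(Doc("a", "/x"), Doc("a", "/y")))
```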
I tried the initial example (see 7 above) and it did not work in ipython. Note that cmp(obj1,obj2) returns a "1" when implemented using two identical object instances. Oddly enough when I modify one of the attribute values and recompare, using cmp(obj1,obj2) the object continues to return a "1". (sigh...)
Ok, so what you need to do is iterate two objects and compare each attribute using the == sign.
Instances of a class compare as non-equal with == by default. The best way is to add the __cmp__ function to your class, which will do the comparison (Python 2 only).
If you want to compare by content, you can simply use cmp(obj1,obj2)
In your case, cmp(doc1,doc2). It returns 0 if the two are the same content-wise.
if object in lst:
    #do something
As far as I can tell, when you execute this statement it is internally checking == between object and every element in lst, which will refer to the __eq__ methods of these two objects. This can have the implication of two distinct objects being "equal", which is usually desired if all of their attributes are the same.
However, is there a way to Pythonically achieve a predicate such as in where the underlying equality check is is - i.e. we're actually checking if the two references are to the same object?
List membership in Python is dictated by the __contains__ dunder method. You can choose to override this for a custom implementation if you want to use the normal "in" syntax:
class my_list(list):
    def __contains__(self, x):
        for y in self:
            if x is y:
                return True
        return False
4 in my_list([4, [3,2,1]])
>>> True
[3,2,1] in my_list([4, [3,2,1]]) # Because while the lists are "==" equal, they have different pointers.
>>> False
Otherwise, I'd suggest kaya3's answer of using a generator check.
Use the any function:
if any(x is object for x in lst):
    # ...
If you want to specifically use is, then just use filter like:
filtered_list = filter(lambda n: n is object, lst)
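The difference between equality-based and identity-based membership can be seen in a short sketch. The helper name contains_identity is made up for this example.

```python
def contains_identity(items, target):
    """Return True if target itself (not merely an equal object) is in items."""
    return any(item is target for item in items)


inner = [3, 2, 1]
lst = [4, inner]

print([3, 2, 1] in lst)                    # equality-based membership
print(contains_identity(lst, [3, 2, 1]))   # an equal but different list object
print(contains_identity(lst, inner))       # the very same object
```

Note that in Python 3 filter returns a lazy iterator, so wrap it in list() or any() if you need a concrete result.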
I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
Take the following example:
class Potato:
    def __eq__(self, other):
        print("In Potato's __eq__")
        return True
>> p = Potato()
>> p == "hello"
In Potato's __eq__ # As expected, p.__eq__("hello") is called
True
>> "hello" == p
In Potato's __eq__ # Hmm, I expected this to be false because
True # this should call "hello".__eq__(p)
>> "hello".__eq__(p)
NotImplemented # Not implemented? How does == work for strings then?
AFAIK, the docs only talk about the == -> __eq__ mapping, but don't say anything about what happens when either one of the arguments is not an object (e.g. 1 == p), or when the first object's __eq__ returns NotImplemented, like we saw with "hello".__eq__(p).
I'm looking for the general algorithm that is employed for equality... Most, if not all other SO answers, refer to Python 2's coercion rules, which don't apply anymore in Python 3.
You're mixing up the functions in the operator module and the methods used to implement those operators. operator.eq(a, b) is equivalent to a == b or operator.__eq__(a, b), but not to a.__eq__(b).
In terms of the __eq__ method, == and operator.eq work as follows:
def eq(a, b):
    if type(a) is not type(b) and issubclass(type(b), type(a)):
        # Give type(b) priority
        a, b = b, a
    result = a.__eq__(b)
    if result is NotImplemented:
        result = b.__eq__(a)
    if result is NotImplemented:
        result = a is b
    return result
with the caveat that the real code performs method lookup for __eq__ in a way that bypasses instance dicts and custom __getattribute__/__getattr__ methods.
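The subclass-priority rule and the identity fallback can be observed directly. In this sketch, Base and Derived are made-up classes whose __eq__ methods record that they were called and then return NotImplemented.

```python
calls = []


class Base:
    def __eq__(self, other):
        calls.append("Base")
        return NotImplemented


class Derived(Base):
    def __eq__(self, other):
        calls.append("Derived")
        return NotImplemented


base, derived = Base(), Derived()

result = (base == derived)
first_calls = list(calls)   # snapshot of the call order for this comparison

print(first_calls)    # the subclass on the right-hand side is tried first
print(result)         # identity fallback: two distinct objects
print(base == base)   # identity fallback: the same object
```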
When you do this:
"hello" == potato
Python first calls "hello".__eq__(potato). That returns NotImplemented, so Python tries it the other way: potato.__eq__("hello").
Returning NotImplemented doesn't mean there's no implementation of .__eq__ on that object. It means that the implementation didn't know how to compare to the value that was passed in. From https://docs.python.org/3/library/constants.html#NotImplemented:
Note When a binary (or in-place) method returns NotImplemented the
interpreter will try the reflected operation on the other type (or
some other fallback, depending on the operator). If all attempts
return NotImplemented, the interpreter will raise an appropriate
exception. Incorrectly returning NotImplemented will result in a
misleading error message or the NotImplemented value being returned to
Python code. See Implementing the arithmetic operations for examples.
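For == specifically, the fallback can be checked with a few lines. This sketch reuses the Potato idea from the question (without the print) and adds a hypothetical Opaque class whose __eq__ always gives up.

```python
class Potato:
    def __eq__(self, other):
        return True


p = Potato()

print("hello".__eq__(p))   # str does not know how to compare with a Potato
print("hello" == p)        # so == falls back to p.__eq__("hello")


class Opaque:
    """Hypothetical class whose __eq__ never knows the answer."""

    def __eq__(self, other):
        return NotImplemented


o = Opaque()
print(o == "hello")   # both sides return NotImplemented: identity comparison
print(o == o)         # same object, so the identity fallback gives True
```

For == and != no exception is raised when both sides return NotImplemented. The TypeError mentioned in the note applies to operators such as < or +.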
I'm confused as to how the == operator works in Python 3. From the docs, eq(a, b) is equivalent to a == b. Also eq and __eq__ are equivalent.
No, that is only the case in the operator module. The operator module is used, for instance, to pass == around as a function. But operator does not have much to do with vanilla Python itself.
AFAIK, the docs only talk about the == -> eq mapping, but don't say anything about what happens either one of the arguments is not an object (e.g. 1 == p), or when the first object's.
In Python everything is an object: an int is an object, a "class" is an object, None is an object, etc. We can for instance get the __eq__ of 0:
>>> (0).__eq__
<method-wrapper '__eq__' of int object at 0x55a81fd3a480>
So the equality is implemented in the "int class". As specified in the documentation on the data model, __eq__ can return several values: True and False, but also any other object (for which the truthiness will be calculated). If, on the other hand, NotImplemented is returned, Python will fall back and call the __eq__ method of the object on the other side of the equation.
I have a class MyClass, which contains two member variables foo and bar:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
I have two instances of this class, each of which has identical values for foo and bar:
x = MyClass('foo', 'bar')
y = MyClass('foo', 'bar')
However, when I compare them for equality, Python returns False:
>>> x == y
False
How can I make python consider these two objects equal?
You should implement the method __eq__:
class MyClass:
def __init__(self, foo, bar):
self.foo = foo
self.bar = bar
def __eq__(self, other):
if not isinstance(other, MyClass):
# don't attempt to compare against unrelated types
return NotImplemented
return self.foo == other.foo and self.bar == other.bar
Now it outputs:
>>> x == y
True
Note that implementing __eq__ will automatically make instances of your class unhashable, which means they can't be stored in sets and dicts. If you're not modelling an immutable type (i.e. if the attributes foo and bar may change the value within the lifetime of your object), then it's recommended to just leave your instances as unhashable.
If you are modelling an immutable type, you should also implement the data model hook __hash__:
class MyClass:
...
def __hash__(self):
# necessary for instances to behave sanely in dicts and sets.
return hash((self.foo, self.bar))
A general solution, like the idea of looping through __dict__ and comparing values, is not advisable - it can never be truly general because the __dict__ may have uncomparable or unhashable types contained within.
N.B.: be aware that before Python 3, you may need to use __cmp__ instead of __eq__. Python 2 users may also want to implement __ne__, since a sensible default behaviour for inequality (i.e. inverting the equality result) will not be automatically created in Python 2.
You override the rich comparison operators in your object.
class MyClass:
def __lt__(self, other):
# return comparison
def __le__(self, other):
# return comparison
def __eq__(self, other):
# return comparison
def __ne__(self, other):
# return comparison
def __gt__(self, other):
# return comparison
def __ge__(self, other):
# return comparison
Like this:
def __eq__(self, other):
return self._id == other._id
If you're dealing with one or more classes that you can't change from the inside, there are generic and simple ways to do this that also don't depend on a diff-specific library:
Easiest, unsafe-for-very-complex-objects method
pickle.dumps(a) == pickle.dumps(b)
pickle is a very common serialization lib for Python objects, and will thus be able to serialize pretty much anything, really. In the above snippet, I'm comparing the str from serialized a with the one from b. Unlike the next method, this one has the advantage of also type checking custom classes.
The biggest hassle: due to specific ordering and [de/en]coding methods, pickle may not yield the same result for equal objects, especially when dealing with more complex ones (e.g. lists of nested custom-class instances) like you'll frequently find in some third-party libs. For those cases, I'd recommend a different approach:
Thorough, safe-for-any-object method
You could write a recursive reflection that'll give you serializable objects, and then compare results
from collections.abc import Iterable
BASE_TYPES = [str, int, float, bool, type(None)]
def base_typed(obj):
"""Recursive reflection method to convert any object property into a comparable form.
"""
T = type(obj)
from_numpy = T.__module__ == 'numpy'
if T in BASE_TYPES or callable(obj) or (from_numpy and not isinstance(T, Iterable)):
return obj
if isinstance(obj, Iterable):
base_items = [base_typed(item) for item in obj]
return base_items if from_numpy else T(base_items)
d = obj if T is dict else obj.__dict__
return {k: base_typed(v) for k, v in d.items()}
def deep_equals(*args):
return all(base_typed(args[0]) == base_typed(other) for other in args[1:])
Now it doesn't matter what your objects are, deep equality is assured to work
>>> from sklearn.ensemble import RandomForestClassifier
>>>
>>> a = RandomForestClassifier(max_depth=2, random_state=42)
>>> b = RandomForestClassifier(max_depth=2, random_state=42)
>>>
>>> deep_equals(a, b)
True
The number of comparables doesn't matter as well
>>> c = RandomForestClassifier(max_depth=2, random_state=1000)
>>> deep_equals(a, b, c)
False
My use case for this was checking deep equality among a diverse set of already trained Machine Learning models inside BDD tests. The models belonged to a diverse set of third-party libs. Certainly implementing __eq__ like other answers here suggest wasn't an option for me.
Covering all the bases
You may be in a scenario where one or more of the custom classes being compared do not have a __dict__ implementation. That's not common by any means, but it is the case of a subtype within sklearn's Random Forest classifier: <type 'sklearn.tree._tree.Tree'>. Treat these situations on a case by case basis - e.g. specifically, I decided to replace the content of the afflicted type with the content of a method that gives me representative information on the instance (in this case, the __getstate__ method). For such, the second-to-last row in base_typed became
d = obj if T is dict else obj.__dict__ if '__dict__' in dir(obj) else obj.__getstate__()
Edit: for the sake of organization, I replaced the hideous oneliner above with return dict_from(obj). Here, dict_from is a really generic reflection made to accommodate more obscure libs (I'm looking at you, Doc2Vec)
def isproperty(prop, obj):
return not callable(getattr(obj, prop)) and not prop.startswith('_')
def dict_from(obj):
"""Converts dict-like objects into dicts
"""
if isinstance(obj, dict):
# Dict and subtypes are directly converted
d = dict(obj)
elif '__dict__' in dir(obj):
# Use standard dict representation when available
d = obj.__dict__
elif str(type(obj)) == 'sklearn.tree._tree.Tree':
# Replaces sklearn trees with their state metadata
d = obj.__getstate__()
else:
# Extract non-callable, non-private attributes with reflection
kv = [(p, getattr(obj, p)) for p in dir(obj) if isproperty(p, obj)]
d = {k: v for k, v in kv}
return {k: base_typed(v) for k, v in d.items()}
Do mind none of the above methods yield True for objects with the same key-value pairs in a differing order, as in
>>> a = {'foo':[], 'bar':{}}
>>> b = {'bar':{}, 'foo':[]}
>>> pickle.dumps(a) == pickle.dumps(b)
False
But if you want that you could use Python's built-in sorted method beforehand anyway.
With Dataclasses in Python 3.7 (and above), a comparison of object instances for equality is an inbuilt feature.
A backport for Dataclasses is available for Python 3.6.
(Py37) nsc#nsc-vbox:~$ python
Python 3.7.5 (default, Nov 7 2019, 10:50:52)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dataclasses import dataclass
>>> #dataclass
... class MyClass():
... foo: str
... bar: str
...
>>> x = MyClass(foo="foo", bar="bar")
>>> y = MyClass(foo="foo", bar="bar")
>>> x == y
True
Implement the __eq__ method in your class; something like this:
def __eq__(self, other):
return self.path == other.path and self.title == other.title
Edit: if you want your objects to compare equal if and only if they have equal instance dictionaries:
def __eq__(self, other):
return self.__dict__ == other.__dict__
To summarize:
It's advised to implement __eq__ rather than __cmp__, unless you run Python <= 2.0 (__eq__ was added in 2.1).
In Python 2, don't forget to also implement __ne__ (it should be something like return not self.__eq__(other) or return not self == other, except in very special cases). Python 3 derives __ne__ from __eq__ by default.
Don't forget that the operator must be implemented in each custom class you want to compare (see example below).
If you want to compare with an object that can be None, you must handle that case yourself. The interpreter cannot guess it (see example below).
class B(object):
    def __init__(self):
        self.name = "toto"
    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name
class A(object):
    def __init__(self):
        self.toto = "titi"
        self.b_inst = B()
    def __eq__(self, other):
        if other is None:
            return False
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)
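To see how the nested comparison behaves, here is a small usage sketch of the same two classes. The constructor parameters name and b_name are additions made only so that the instances can differ; they are not part of the classes as posted:

```python
class B(object):
    def __init__(self, name="toto"):  # 'name' parameter added for the demo
        self.name = name
    def __eq__(self, other):
        if other is None:
            return False
        return self.name == other.name

class A(object):
    def __init__(self, b_name="toto"):  # 'b_name' parameter added for the demo
        self.toto = "titi"
        self.b_inst = B(b_name)
    def __eq__(self, other):
        if other is None:
            return False
        # The tuple comparison calls B.__eq__ for the b_inst elements
        return (self.toto, self.b_inst) == (other.toto, other.b_inst)

print(A() == A())        # True
print(A() == A("tata"))  # False: the nested B instances differ
print(A() == None)       # False, thanks to the explicit None check
```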
Depending on your specific case, you could do:
>>> vars(x) == vars(y)
True
See Python dictionary from an object's fields
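A short sketch of when this works and when it does not. vars() needs the object to have a __dict__, so it raises TypeError for instances of classes that only define __slots__. The Point and SlottedPoint classes are made up for illustration:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class SlottedPoint:
    # No per-instance __dict__ is created for this class
    __slots__ = ("x", "y")
    def __init__(self, x, y):
        self.x = x
        self.y = y

print(vars(Point(1, 2)) == vars(Point(1, 2)))  # True

try:
    vars(SlottedPoint(1, 2))
except TypeError:
    slotted_has_dict = False
else:
    slotted_has_dict = True
print(slotted_has_dict)  # False
```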
You should implement the method __eq__:
class MyClass:
    def __init__(self, foo, bar, name):
        self.foo = foo
        self.bar = bar
        self.name = name
    def __eq__(self, other):
        if not isinstance(other, MyClass):
            return NotImplemented
        # names of the instance attributes of each of these objects
        prop_names1 = list(self.__dict__)
        prop_names2 = list(other.__dict__)
        # objects with different attribute sets are never equal
        if set(prop_names1) != set(prop_names2):
            return False
        # compare attribute by attribute, matching on the name
        for name in prop_names1:
            if getattr(self, name) != getattr(other, name):
                return False
        return True
In Python 2, when instances of objects are compared and no rich comparison method is defined, the __cmp__ function is called.
If the == operator is not working for you by default, you can always redefine the __cmp__ function for the object.
Edit:
As has been pointed out, the __cmp__ function was removed in Python 3.0.
Instead you should use the “rich comparison” methods.
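A minimal sketch of the rich comparison approach, using functools.total_ordering from the standard library to derive the remaining operators from __eq__ and __lt__. The Version class is hypothetical:

```python
from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number == other.number

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.number < other.number

print(Version(1) == Version(1))  # True
print(Version(1) < Version(2))   # True
print(Version(3) >= Version(2))  # True, derived by total_ordering
```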
I wrote this and placed it in a test/utils module in my project. For cases when it's not a class, just a plain ol' dict, this will traverse both objects and ensure
every attribute is equal to its counterpart
No dangling attributes exist (attrs that only exist on one object)
It's big... it's not sexy... but oh boi does it work!
def assertObjectsEqual(obj_a, obj_b):
    def _assert(a, b):
        if a == b:
            return
        raise AssertionError(f'{a} !== {b} inside assertObjectsEqual')
    def _check(a, b):
        if a is None or b is None:
            _assert(a, b)
            # Both are None here, so there is nothing left to traverse
            return
        for k, v in a.items():
            if isinstance(v, dict):
                assertObjectsEqual(v, b[k])
            else:
                _assert(v, b[k])
    # Asserting both directions is more work
    # but it ensures no dangling values
    # on either object
    _check(obj_a, obj_b)
    _check(obj_b, obj_a)
You can clean it up a little by removing the _assert and just using plain ol' assert but then the message you get when it fails is very unhelpful.
The code below works (in my limited testing) by doing a deep compare between two object hierarchies. It handles various cases, including those where the objects themselves or their attributes are dictionaries.
from typing import Any
def deep_comp(o1: Any, o2: Any) -> bool:
    # NOTE: dict don't have __dict__
    o1d = getattr(o1, '__dict__', None)
    o2d = getattr(o2, '__dict__', None)
    # if both are objects
    if o1d is not None and o2d is not None:
        # we will compare their dictionaries
        o1, o2 = o1.__dict__, o2.__dict__
    if o1 is not None and o2 is not None:
        # if both are dictionaries, we will compare each key
        if isinstance(o1, dict) and isinstance(o2, dict):
            for k in set().union(o1.keys(), o2.keys()):
                if k in o1 and k in o2:
                    if not deep_comp(o1[k], o2[k]):
                        return False
                else:
                    return False  # some key missing
            return True
    # mismatched object types or both are scalars, or one or both None
    return o1 == o2
This is very tricky code, so please add any cases that might not work for you in the comments.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
    def __repr__(self):
        return str(self.value)
    def __eq__(self, other):
        return self.value == other.value
node1 = Node(1)
node2 = Node(1)
print(f'node1 id:{id(node1)}')
print(f'node2 id:{id(node2)}')
print(node1 == node2)
>>> node1 id:4396696848
>>> node2 id:4396698000
>>> True
Use the setattr function. You might want to use this when you can't add something inside the class itself, say, when you are importing the class.
setattr(MyClass, "__eq__", lambda x, y: x.foo == y.foo and x.bar == y.bar)
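A slightly fuller sketch of the same idea, with a named function instead of a lambda so it can return NotImplemented for unrelated types. ImportedThing is a stand-in for a class you cannot edit, and _eq is a hypothetical helper name:

```python
class ImportedThing:
    # Stand-in for a class defined in a module you cannot change
    def __init__(self, foo, bar):
        self.foo = foo
        self.bar = bar

def _eq(self, other):
    if not isinstance(other, ImportedThing):
        return NotImplemented
    return self.foo == other.foo and self.bar == other.bar

# Default equality is identity-based, so two distinct instances differ
before = ImportedThing(1, 2) == ImportedThing(1, 2)

# Attach the comparison to the class itself, not to an instance
setattr(ImportedThing, "__eq__", _eq)
after = ImportedThing(1, 2) == ImportedThing(1, 2)

print(before, after)  # False True
```

The attribute has to be set on the class: Python looks up special methods such as __eq__ on the type, so setting it on a single instance has no effect on the == operator.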
If you want to get an attribute-by-attribute comparison, and see if and where it fails, you can use the following list comprehension:
[i for i,j in
zip([getattr(obj_1, attr) for attr in dir(obj_1)],
[getattr(obj_2, attr) for attr in dir(obj_2)])
if not i==j]
The extra advantage here is that you can squeeze it into one line and enter it in the "Evaluate Expression" window when debugging in PyCharm.
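If you also want to know which attributes differ, not just their values, a variant along these lines may help. It is only a sketch: diff_attrs and Thing are hypothetical names, and it looks at instance attributes via vars() rather than everything dir() returns:

```python
def diff_attrs(obj_1, obj_2):
    """Map each differing attribute name to its pair of values."""
    d1, d2 = vars(obj_1), vars(obj_2)
    missing = object()  # sentinel for attributes present on one side only
    return {
        name: (d1.get(name, missing), d2.get(name, missing))
        for name in sorted(set(d1) | set(d2))
        if d1.get(name, missing) != d2.get(name, missing)
    }

class Thing:
    # Throwaway class for the demo: stores keyword arguments as attributes
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

print(diff_attrs(Thing(a=1, b=2), Thing(a=1, b=3)))  # {'b': (2, 3)}
```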
I tried the initial example (see 7 above) and it did not work in ipython. Note that cmp(obj1,obj2) returns a "1" when implemented using two identical object instances. Oddly enough when I modify one of the attribute values and recompare, using cmp(obj1,obj2) the object continues to return a "1". (sigh...)
Ok, so what you need to do is iterate two objects and compare each attribute using the == sign.
Instances of a class, when compared with ==, come out as non-equal by default. The best way is to add the __cmp__ function to your class, which will do the comparison (Python 2 only).
If you want to compare by content, you can then simply use cmp(obj1,obj2)
In your case cmp(doc1,doc2). It will return 0 if they are the same content-wise.