I ran into an interesting problem - here is a float subclass that compares inexactly:
class Rounder(float):
    """Float wrapper used for inexact comparison."""
    __slots__ = ()

    def __hash__(self):
        raise NotImplementedError

    def __eq__(self, b, rel_tol=1e-06, abs_tol=1e-12):
        """Check if the two floats are equal to the sixth place (relatively)
        or to the twelfth place (absolutely)."""
        try:
            return abs(self - b) <= max(rel_tol * max(abs(self), abs(b)),
                                        abs_tol)  # could use math.isclose
        except TypeError:
            return NotImplemented
    ...
The requirement is that equal objects have equal hashes - but I can't seem to come up with a formula that maps all Rounder(float) instances that would compare equal to the same hash value. Most of the advice on the web is about how to define hash/equals for classes that compare based on some (immutable) attributes - that does not apply to this case.
There is no valid way to hash these objects. Hashing requires a transitive definition of ==, which your objects do not have. Even if you did something like def __hash__(self): return 0, intransitivity of == would still make your objects unsafe to use as dict keys.
Non-transitivity is one of the big reasons not to define == this way. If you want to do a closeness check, do that explicitly with math.isclose. Don't make that operation ==.
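To see the intransitivity concretely (the values below are chosen purely for illustration), three floats can each be within a relative tolerance of 1e-6 of their neighbour while the endpoints are not; an explicit math.isclose call keeps the closeness check visible at the call site instead of hiding it in ==:

import math

a = 1.0
b = 1.0 + 8e-7
c = 1.0 + 1.6e-6

print(math.isclose(a, b, rel_tol=1e-6))  # True
print(math.isclose(b, c, rel_tol=1e-6))  # True
print(math.isclose(a, c, rel_tol=1e-6))  # False - "closeness" is not transitive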
Related
I have two lists of objects. Let's call the lists a and b. The objects (for our purposes) are defined as below:
from fuzzywuzzy import fuzz  # fuzzy string matching

class MyObj:
    def __init__(self, string: str, integer: int):
        self.string = string
        self.integer = integer

    def __eq__(self, other):
        if self.integer != other.integer:
            return False
        # fuzzywuzzy checks if the strings are "similar enough"
        return fuzz.ratio(self.string, other.string) > 90
Now what I want to achieve is to check which objects in list a are "in" list b (return true against == when compared to some object in list b).
Currently I'm just looping through them as follows:
for obj in a:
    for other_obj in b:
        if obj == other_obj:
            <do something>
            break
I strongly suspect that there is a faster way of implementing this. The lists are long. Up to like 100 000 objects each. So this is a big bottleneck in my code.
I looked at this answer Fastest way to search a list in python and it suggests that sets work much better. I'm a bit confused by this though:
How significant is the "removal of duplicates" speedup? I don't expect to have many duplicates in my lists.
Can sets remove duplicates and hash properly when I have defined __eq__ the way I have?
How would this compare with pre-ordering the list, and using something like binary search? A set is unordered...
So what is the best approach here? Please provide implementation guidelines in the answer as well.
TL;DR, when using fuzzy comparison techniques, sets and sorting can be very difficult to work with without some normalization method. You can try to be smart about reducing search spaces as much as possible, but care should be taken to do it consistently.
If a class defines __eq__ and not __hash__, it is not hashable.
For instance, consider the following class
class Name:
    def __init__(self, first, last):
        self.first = first
        self.last = last

    def __repr__(self):
        return f'{self.first} {self.last}'

    def __eq__(self, other):
        return (self.first == other.first) and (self.last == other.last)
Now, if you were to try to create a set with these elements
>>> {Name('Neil', 'Stackoverflow-user')}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'Name'
So, in the case of Name, you would simply define a __hash__ method. However, in your case, this is more difficult since you have fuzzy equality semantics. The only way I can think of to get around this is to have a normalization function that you can prove will be consistent, and use the normalized string instead of the actual string as part of your hash. Take Floats as dictionary keys as an example of needing to normalize in order to use a "fuzzy" type like floats as keys.
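As a rough sketch of that normalization idea (the normalization shown here - case-folding and collapsing whitespace - is only an illustration; it keeps the hash/equality contract only if __eq__ is also defined in terms of the same normalized form, not in terms of fuzz.ratio):

class MyObj:
    def __init__(self, string: str, integer: int):
        self.string = string
        self.integer = integer

    def _normalized(self):
        # Illustrative normalization: case-fold and collapse whitespace.
        return ' '.join(self.string.casefold().split())

    def __eq__(self, other):
        return (self.integer == other.integer
                and self._normalized() == other._normalized())

    def __hash__(self):
        # Hash exactly the data that __eq__ compares, so equal objects
        # are guaranteed to have equal hashes.
        return hash((self.integer, self._normalized()))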
For sorting and binary searching, since you are fuzzy-searching, you still need to be careful with things like binary search. As an example, assume equality is determined by being within a certain Levenshtein distance. Then book and hook will be similar to each other (distance = 1), but hack, at distance 2 from hook and distance 3 from book, will be closer to hook than to book. So how would you define a good sort order for fuzzy searching in this case?
One thing to try would be to use some form of group-by/bucketing, like a dictionary of the type Dict[int, List[MyObj]], where instances of MyObj are classified by their one constant, the self.integer field. Then you can try comparing smaller sub-lists. This would at least reduce search spaces by clustering.
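A minimal sketch of that bucketing idea (the helper name is made up; it groups b by the exact self.integer field and only runs the fuzzy comparison within the matching bucket):

from collections import defaultdict

def find_fuzzy_matches(a, b):
    # Group the objects in b by their exact integer field.
    buckets = defaultdict(list)
    for obj in b:
        buckets[obj.integer].append(obj)

    # For each object in a, only fuzzy-compare against its own bucket.
    for obj in a:
        for candidate in buckets.get(obj.integer, []):
            if obj == candidate:  # falls back to the fuzzy __eq__
                yield obj, candidate
                break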
Hi, I just started learning classes in Python and I'm trying to implement an array-based list. This is my class and the init constructor.
class List:
    def __init__(self, max_capacity=50):
        self.array = build_array(max_capacity)
        self.count = 0
I wrote an __eq__ method that should return True if the list equals another. However, it always returns False. And yes, my append method is working.
def __eq__(self, other):
    result = False
    if self.array == other:
        result = True
    else:
        result = False
    return result
This is how I tested it, but it returns False:
a_list=List()
b_list=[3,2,1]
a_list.append(3)
a_list.append(2)
a_list.append(1)
print(a_list==b_list)
Any help would be appreciated!
EDIT:
After all the helpful suggestions, I figured out I have to iterate through other and a_list and check the elements.
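For reference, a minimal sketch of that element-wise check (assuming self.count tracks how many slots of self.array are in use, as in the constructor above, and that other is a plain Python sequence such as [3, 2, 1]):

def __eq__(self, other):
    # Only the first self.count slots of self.array hold real data.
    if self.count != len(other):
        return False
    for i in range(self.count):
        if self.array[i] != other[i]:
            return False
    return True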
__eq__, for any class, should handle three cases:
self and other are the same object
self and other are compatible instances (up to duck-typing: they don't need to be instances of the same class, but should support the same interface as necessary)
self and other are not comparable.
Keeping these three points in mind, define __eq__ as
def __eq__(self, other):
    if self is other:
        return True
    try:
        return self.array == other.array
    except AttributeError:
        # other doesn't have an array attribute,
        # meaning they can't be equal
        return False
Note this assumes that a List instance should compare as equal to another object as long as both objects have equal array attributes (whatever that happens to mean). If that isn't what you want, you'll have to be more specific in your question.
One final option is to fall back to other == self to see if type of other knows how to compare itself to your List class. Equality should be symmetric, so self == other and other == self should produce the same value if, indeed, the two values can be compared for equality.
    except AttributeError:
        return other == self
Of course, you need to be careful that this doesn't lead to an infinite loop of List and type(other) repeatedly deferring to the other.
You are comparing the array embedded in your instance (self.array) to the entirety of the other object. This will not work well.
You need to compare self.array to other.array and/or convert both objects to the same type before comparing. You probably also need to specify what it means to compare two arrays (i.e., you want a single boolean value that indicates whether all elements are equal, not an array of boolean values for each element).
For the code below, I assume you are using a numpy ndarray for self.array. If not, you could write your own array_equal that will convert other to an array, then compare the lengths of the arrays, then return (self.array==other_as_array).all().
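For example, a hand-rolled version along those lines might look like this (a sketch, assuming self.array really is a numpy ndarray; the function name is made up):

import numpy as np

def my_array_equal(arr, other):
    # Convert the other operand to an array, compare shapes first,
    # then collapse the element-wise comparison into a single bool.
    other_as_array = np.asarray(other)
    if arr.shape != other_as_array.shape:
        return False
    return bool((arr == other_as_array).all())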
If you want to test for strict equality between the objects (same types, same values), you could use this:
from numpy import array_equal
import numpy as np

class List:
    ...
    def __eq__(self, other):
        return isinstance(other, List) and array_equal(self.array, other.array)
If you just want to check for equality of the items in the list, regardless of the object type, then you could do this:
def __eq__(self, other):
    if isinstance(other, List):
        return array_equal(self.array, other.array)
    else:
        return array_equal(self.array, other)
I'm trying to understand what Python dictionaries must be doing internally to locate a key. It seems to me that __hash__ would be evaluated first, and if there's a collision, Python would iterate through the keys till it finds one for which __eq__ returns True. Which makes me wonder why the following code works (test code, only for understanding the internals):
class MyClass(object):
    def __eq__(self, other):
        return False

    def __hash__(self):
        return 42

if __name__ == '__main__':
    o1 = MyClass()
    o2 = MyClass()
    d = {o1: 'o1', o2: 'o2'}
    assert(o1 in d)        # 1
    assert(d[o1] == 'o1')  # 2
    assert(o2 in d)        # 3
    assert(d[o2] == 'o2')  # 4
Shouldn't the dictionary be unable to find the correct key (returning either 'o1' or 'o2' in both cases #2 and #4, or throwing an error, depending on the internal implementation)? How is it able to land on the correct key in both cases, when it should never be able to correctly 'equate' the keys (since __eq__ returns False)?
All the documentation I've seen on hashing always mentions __hash__ and __eq__ together, never __cmp__, __ne__, etc., which makes me think these two are the only ones that play a role in this scenario.
Anything you use as a dict key has to satisfy the invariant that bool(x == x) is True. (I would have just said x == x, but there are reasonable objects for which that isn't even a boolean.)
The dict assumes that this will hold, so the routine it uses to check key equality actually checks object identity first before using ==. This preliminary check is an implementation detail; you should not rely on it happening or not happening.
Reasonable objects for which (x == x) is not True include float('nan') and numpy.array([1, 2]).
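You can see the identity shortcut at work (in CPython, as an implementation detail) with a NaN key, which never compares equal to anything, itself included:

nan = float('nan')
d = {nan: 'found'}

print(nan == nan)          # False - NaN is never equal to itself
print(nan in d)            # True  - the dict matched the key by identity
print(d[nan])              # 'found'
print(float('nan') in d)   # False - a different NaN object, and == fails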
I wrote a Python script to create two sets. I was trying to override __eq__ in the str class, so the equality logic is that if string a is in string b, then a "equals" b. I subclassed str, and the two sets contain instances of the new class.
Then I tried to use set.intersection to get the result, but the result always shows 0. My code is like this:
# override str class method __eq__
class newString(str):
    def __new__(self, origial):
        self.value = origial
        return str.__new__(self, origial)

    def __eq__(self, other):
        return other.value in self.value or self.value in other.value

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return 1
def get_rows():
    lines = set([])
    for line in file_handler:
        lines.add(newString(line.upper()))
    unique_new_set = lines.intersection(columb)
    intersection_new_set = lines.intersection(columa)

# open file1 and file2 in append mode
A = open(mailfile, 'r+U')
B = open(suppfile, 'r+U')
get_rows(intersection, unique, A, AB, CLEAN)
A.close()
B.close()
AB.close()
CLEAN.close()
You cannot use sets to do this, because you also need to produce the same hash values for the two strings. You cannot do that, because you'd have to know up front what containment equalities might exist.
From the object.__hash__ documentation:
Called by built-in function hash() and for operations on members of hashed collections including set, frozenset, and dict. __hash__() should return an integer. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.
Emphasis mine.
You cannot set the return value of __hash__ to a constant, because then you map all values to the same hash table slot, removing any and all advantages a set might have over other data structures. Instead, you'll get an endless series of hash collisions for any object you try to add to the set, turning an O(1) lookup into O(N).
Sets are the wrong approach because your equality test does not allow the data to be partitioned into proper subsets. If you have the lines The quick brown fox jumps over the lazy dog, quick brown fox and lazy dog, then depending on the order in which you add them to a set you end up with anywhere between 1 and 3 "unique" values; for a set to work, membership must not depend on the order in which you add the values.
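To make the order-dependence concrete, here is a small demonstration with a containment-based __eq__ and a constant __hash__ (a sketch of exactly the kind of class this answer advises against):

class FuzzyStr(str):
    def __eq__(self, other):
        return self in other or other in self

    def __hash__(self):
        return 1  # constant hash: every value lands in the same slot

lines = ['The quick brown fox jumps over the lazy dog',
         'quick brown fox',
         'lazy dog']

print(len({FuzzyStr(s) for s in lines}))            # 1 - both short lines collapse into the long one
print(len({FuzzyStr(s) for s in reversed(lines)}))  # 2 - the short lines get in before the long one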
I want to implement the Fast Marching Method for Inpainting in Python. In the literature, this has been implemented using a min-heap, since it involves adding, removing and reordering the data structure many times, each time extracting the smallest element, so these operations need to be as cheap as possible.
I know there is a built-in heapq module in Python. It accepts a single float value. However, I need to store 3 different pieces of information corresponding to a pixel. Is there a way I can tweak heapq to accept a list, perhaps?
Alternatively, is there a different data structure with this functionality?
heapq takes any type, as long as the items are orderable. The items must either support the < (less than) operator or the <= (less than or equal) operator (heapq will use the latter if the former isn't available).
For example, you could use tuples ((priority, your_data_structure)); tuples have a relative order based on their contents, starting with the first item.
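For example, a quick sketch of the tuple approach (the payload contents here are made up for illustration):

import heapq

heap = []
# Each entry is (priority, payload); here the payload is an (x, y, value) triple.
heapq.heappush(heap, (3.5, (10, 12, 0.7)))
heapq.heappush(heap, (1.2, (4, 4, 0.1)))
heapq.heappush(heap, (2.8, (7, 9, 0.4)))

priority, pixel = heapq.heappop(heap)  # (1.2, (4, 4, 0.1)) - smallest priority first

If two priorities can be equal and the payloads aren't themselves orderable, insert a tie-breaking counter as the second tuple element so the comparison never reaches the payload.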
Or you can use custom objects that implement at least one of __lt__, __le__, __gt__ or __ge__ to implement comparisons between them and thus define an ordering (and, preferably, include an __eq__ equality method too). The functools.total_ordering() decorator would then supply your class with the remaining methods:
from functools import total_ordering

@total_ordering
class PixelInfo(object):
    def __init__(self, r, g, b):
        self.r, self.g, self.b = r, g, b

    def __eq__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        return all(getattr(self, c) == getattr(other, c) for c in 'rgb')

    def __lt__(self, other):
        if not isinstance(other, type(self)):
            return NotImplemented
        return self.r + self.g + self.b < other.r + other.g + other.b
would be an orderable custom class, which heapq would be happy to handle for you.
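For instance, continuing the sketch above, the heap orders the objects by their __lt__ (the colour values are arbitrary):

import heapq

heap = []
heapq.heappush(heap, PixelInfo(200, 10, 10))
heapq.heappush(heap, PixelInfo(5, 5, 5))
heapq.heappush(heap, PixelInfo(90, 90, 90))

smallest = heapq.heappop(heap)  # the PixelInfo(5, 5, 5) instance (smallest r + g + b)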