I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and bar_equiv.e should equal 5, but they don't. As I mentioned, it looks like the _CaptureEq.__eq__ method is not called when the statement bar_2 in summary is executed.
Is there some reason why _CaptureEq.__eq__ is not being called? Hopefully this is not too obscure a question.
Brandon's answer is informative, but incorrect. There are actually two problems: one with the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj
    # If running on Python 3, this will be a new-style class, and
    # new-style classes must delegate hash explicitly in order to populate
    # the underlying special method slot correctly.
    # On Python 2, it will be an old-style class, so the explicit delegation
    # isn't needed (__getattr__ will cover it), but it also won't do any harm.
    def __hash__(self):
        return hash(self.obj)
    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result
    def __getattr__(self, name):  # support anything else needed by __contains__
        return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Don't recognise "other", so let *it* decide if we're equal
            return NotImplemented
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
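For anyone who wants to run this without the capture_eq module used in the session above, here is a self-contained sketch for Python 3 that puts both fixes together. _CaptureEq and get_equivalent follow the ActiveState recipe as quoted in this thread; Bar calls super().__init__ instead of repeating the assignments, which is a small liberty on my part.

```python
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj

    def __hash__(self):
        # Fix 1: on Python 3, a class defining __eq__ without __hash__ is
        # unhashable, and hash() never consults __getattr__.
        return hash(self.obj)

    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other  # remember the container's element
        return result

    def __getattr__(self, name):
        return getattr(self.obj, name)


def get_equivalent(container, item, default=None):
    t = _CaptureEq(item)
    if t in container:
        return t.match
    return default


class Foo:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __key(self):
        return (self.a, self.b, self.c)

    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Fix 2: let the wrapper's __eq__ answer instead
            return NotImplemented
        return self.__key() == other.__key()

    def __hash__(self):
        return hash(self.__key())


class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        super().__init__(a, b, c)
        self.d, self.e = d, e


bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)
summary = {bar_1}
bar_equiv = get_equivalent(summary, bar_2)
print(bar_equiv.d, bar_equiv.e)  # 4 5
```

The set asks bar_1 first; because Foo.__eq__ returns NotImplemented for the unfamiliar wrapper, Python falls back to _CaptureEq.__eq__, which is where the match gets recorded.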
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
    print('__eq__: I am', self.d, self.e, 'and he is', other.d, other.e)
    return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
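The same observation can be reproduced on Python 3 without the recipe at all. This is a sketch with a hypothetical Probe class whose instances all share one hash, so the set is forced to call __eq__; it records which object is self. The ordering shown is what CPython's set implementation does, not something the language specification promises.

```python
calls = []

class Probe:
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        # Every Probe hashes the same, so lookups must fall back to __eq__
        return 0

    def __eq__(self, other):
        calls.append((self.name, other.name))  # record (self, other)
        return self.name == other.name

container = {Probe("already-in-set")}
found = Probe("lookup-key") in container
print(found, calls)  # False [('already-in-set', 'lookup-key')]
```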
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks it up in the dictionary. If the key is present, it returns the object already stored there; otherwise it inserts the new object under that key.
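A minimal sketch of that dictionary approach, assuming the key is the (a, b, c) tuple from the question. key_of and get_or_insert are hypothetical helper names, and SimpleNamespace just stands in for Bar; dict.setdefault does the "return existing or insert" step in one call.

```python
from types import SimpleNamespace

def key_of(obj):
    # Hypothetical: the attributes that define "equivalence"
    return (obj.a, obj.b, obj.c)

def get_or_insert(registry, obj):
    # setdefault returns the value already stored under the key if present,
    # otherwise stores obj under that key and returns obj.
    return registry.setdefault(key_of(obj), obj)

first = SimpleNamespace(a=1, b=2, c=3, d=4, e=5)
second = SimpleNamespace(a=1, b=2, c=3, d=10, e=11)
registry = {}
get_or_insert(registry, first)
existing = get_or_insert(registry, second)
print(existing.d, existing.e)  # 4 5
```

The objects themselves never need __eq__ or __hash__ here; only the key tuples are hashed.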
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
def get_equivalent(container, item, default=None):
    t = _CaptureEq(item)
    if t in container:
        return t.match
    return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to assume that providing a __getattr__ will solve this. It won't: Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups. This is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
    return hash(self.obj)
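Here is a small illustration of the special-method point, using a hypothetical Wrapper class and __len__ because its failure is easy to see. An explicit w.__len__ lookup goes through __getattr__, but the built-in len() looks the method up on the type and never reaches __getattr__. hash() behaves the same way, which is why the explicit __hash__ above is needed.

```python
class Wrapper:
    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails
        return getattr(self.obj, name)

w = Wrapper([1, 2, 3])
print(w.__len__())  # explicit lookup is forwarded by __getattr__: 3
try:
    len(w)          # implicit lookup checks type(w) and bypasses __getattr__
except TypeError as exc:
    print("len() failed:", exc)
```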
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. Sets are based on hash tables, and only do an equality test when the lookup reaches a stored entry with a matching hash that isn't the very same object. You can't (and don't want to) force items that hash differently to collide, since that's all an implementation detail of set. The easiest way around this issue, which also neatly sidesteps the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)
This MWE in the past would cause a stack overflow, as x references y that references x:
class Ref:
    def __init__(self, name):
        self.name = name
        self.value = None
    def __repr__(self):
        if self.value is None:
            return self.name
        return f"{self.name}={self.value!r}"
if __name__ == '__main__':
    x, y = Ref("x"), Ref("y")
    x.value = (1, y)
    y.value = (2, x)
    print(x)
    print(y)
But as I test it with CPython 3.10.4, it works out of the box!
x=(1, y=(2, x=(...)))
y=(2, x=(1, y=(...)))
I can't find when this behavior changed. I've seen several questions, some as recent as 2020, asking how to handle mutually or self-recursive data structures. I also found out about the reprlib standard library module, which produces similar output, so I suspect some language developer decided to use it by default.
Note: I also tested it with __str__ and it also works, so it's not specific to repr().
It actually never really did, and still doesn't as of today (version 3.12.0 alpha 0).
The case you show is the simplest possible one: recursive repr with instances of the same class. In such case, it's pretty easy for the interpreter to detect that the repr is going to cause infinite recursion and therefore stop and produce ... instead: it just needs to check whether the .__repr__() method for the current class is asking for the .__repr__() of an instance of the same class.
This has been supported since Python 1.5.1 (1998!) as can be seen in Misc/HISTORY:
========================================
==> Release 1.5.1 (October 31, 1998) <==
========================================
[...]
- No longer a core dump when attempting to print (or repr(), or str())
a list or dictionary that contains an instance of itself; instead, the
recursive entry is printed as [...] or {...}. See Py_ReprEnter() and
Py_ReprLeave() below. Comparisons of such objects still go beserk,
since this requires a different kind of fix; fortunately, this is a
less common scenario in practice.
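The changelog entry is about built-in containers. Here is a quick sketch of that guard in action: a list and a dict that each contain themselves print a placeholder instead of recursing forever.

```python
lst = [1]
lst.append(lst)      # the list now contains itself
d = {}
d["self"] = d        # the dict now contains itself
print(repr(lst))     # [1, [...]]
print(repr(d))       # {'self': {...}}
```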
Any slightly more complex case will still cause trouble even on the latest CPython version:
class A:
    def __init__(self):
        self.value = None
    def __repr__(self):
        return f"{self.value!r}"
class B:
    def __init__(self):
        self.value = None
    def __repr__(self):
        return f"{self.value!r}"
a, b = A(), B()
a.value = b
b.value = a
print(a)
# RecursionError: maximum recursion depth exceeded while getting the repr of an object
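For user-defined classes, the standard library offers reprlib.recursive_repr, a decorator that returns a fill value (by default '...') when the same object's __repr__ is re-entered in the same thread. Here is a sketch applying it to the A/B example above; the "A(...)"/"B(...)" formatting is my own choice so the output is readable.

```python
import reprlib

class A:
    def __init__(self):
        self.value = None

    @reprlib.recursive_repr()  # returns '...' on re-entry for the same object
    def __repr__(self):
        return f"A({self.value!r})"

class B:
    def __init__(self):
        self.value = None

    @reprlib.recursive_repr()
    def __repr__(self):
        return f"B({self.value!r})"

a, b = A(), B()
a.value = b
b.value = a
print(a)  # A(B(...))
print(b)  # B(A(...))
```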
In Python 3, I have a class like the one below:
class Foo:
    def __init__(self):
        self.x = 3
    def fcn(self, val):
        self.x += val
Then I instantiate objects of that class, like so:
new_obj = Foo()
new_obj2 = Foo()
Now when I hash these objects, I get different hash values. I need them to return the same hash, as they are the same objects (in theory).
Any idea how I can do this?
Thank you to all who answered. You're right that two instances of the same class are not actually the same object, since each occupies a different place in memory. What I ended up doing is similar to what @nosklo suggested.
I created a 'get_hashables' function that returned a dictionary with all the properties of the class that would constitute a unique class object, like so:
def get_hashables(self):
    return {'data': self.data, 'result': self.result}
Then my main method would take these 'hashable' variables, and hash them to produce the hash itself.
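One detail to watch: a dict itself is unhashable, so hash(self.get_hashables()) raises TypeError. Here is a sketch of one way to do it, hashing a frozenset of the dict's items and keeping __eq__ consistent with it. The data/result constructor is hypothetical, since the question's Foo only has x, and every value returned by get_hashables must itself be hashable.

```python
class Foo:
    def __init__(self, data, result):
        # Hypothetical attributes, mirroring get_hashables() above
        self.data = data
        self.result = result

    def get_hashables(self):
        return {'data': self.data, 'result': self.result}

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return self.get_hashables() == other.get_hashables()

    def __hash__(self):
        # A dict can't be hashed, so hash an immutable snapshot of its items
        return hash(frozenset(self.get_hashables().items()))

print(len({Foo(3, "ok"), Foo(3, "ok")}))  # 1
```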
class Foo:
    def __init__(self):
        self.x = 3
    def fcn(self, val):
        self.x += val
    def __hash__(self):
        return hash(self.x)
This will calculate the hash using self.x, so the hash will be the same whenever self.x is the same. You can return any integer from __hash__, but to prevent consistency bugs, objects that compare equal must return the same hash. More about that in the docs.
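Here is a sketch of the pairing the docs ask for: __eq__ and __hash__ both derived from x. It also shows the usual caveat with a mutable key field. Calling fcn() after the object is in a set changes its hash, so the set can no longer find it.

```python
class Foo:
    def __init__(self):
        self.x = 3

    def fcn(self, val):
        self.x += val

    def __eq__(self, other):
        if not isinstance(other, Foo):
            return NotImplemented
        return self.x == other.x

    def __hash__(self):
        # Must agree with __eq__: equal objects produce equal hashes
        return hash(self.x)

new_obj, new_obj2 = Foo(), Foo()
print(hash(new_obj) == hash(new_obj2), new_obj == new_obj2)  # True True

s = {new_obj}
new_obj.fcn(1)       # mutates a field that __hash__ depends on
print(new_obj in s)  # False: stored under hash(3), now looked up by hash(4)
```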
They are not the same object. The expression Foo() calls the class, which creates a new, unique instance (via __new__) and then initializes it with Foo.__init__. Your two calls return two independent objects, residing in different memory locations, each containing its own, private instance of the x attribute.
You might want to read up on Python class and instance theory.
Apologies if this is a dupe. (I couldn't find it but I'm not very good with google.)
I just stumbled over some code where they use
x = object()
in a place where they probably want x to compare not equal to anything that's already there. Is that guaranteed by the language?
If you compare it using x == other_object, this might return True, since a custom class can override __eq__ and, for instance, make its instances equal to every other object.
But we can use is to check whether the two operands refer to the same object. So we can for instance use it like:
somedict = {'spam': 1}  # example data
somekey = 'eggs'
dummy = object()
lookup = somedict.get(somekey, dummy)
if lookup is dummy:
    # we did *not* find the key in the dictionary
    pass
else:
    # we found the key; lookup holds its value
    pass
Since we just created the dummy object, there is no way that object can be in the somedict (unless it is of course something like locals()), so as a result we know for sure that if we find the key in the dictionary, then it will not return dummy. So we can use is safely to determine that.
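Here is a short sketch of why is, and not ==, is the safe check. AlwaysEqual is a hypothetical class whose __eq__ claims equality with everything. If such an object is stored in the dictionary, an == test against the sentinel gives the wrong answer, while the identity test cannot be overridden.

```python
class AlwaysEqual:
    def __eq__(self, other):
        return True  # claims to equal everything

missing = object()
liar = AlwaysEqual()
somedict = {"key": liar}

found = somedict.get("key", missing)
print(found == missing)  # True, even though the key exists
print(found is missing)  # False: correctly reports that the key was found
```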
Nothing is guaranteed. You can make anything equal to anything else by implementing __eq__.
Unless you know what x is, nothing is guaranteed and nothing can be assumed.
For example:
class A:
    def __eq__(self, other):
        return True
print(A() == object())
# True
And the contrary:
class A:
    def __eq__(self, other):
        return False
print(A() == object())
# False
I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() method, along with __eq__ and __lt__, and is decorated with functools.total_ordering.
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting none, or at the least, 2, indicating a complete failure in my understanding of how sets work. But one?
for a in set_a:
    print(a, a.__hash__())
for b in set_b:
    print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
Edit: updating with some more info. I assumed that showing the two objects' hash() results were the same meant the objects were equal, so I guess that's where my mistake comes from.
from functools import total_ordering

@total_ordering
class Thing(object):
    def __init__(self, i):
        self.i = i
    def __eq__(self, other):
        return self.i == other.i
    def __lt__(self, other):
        return self.i < other.i
    def __repr__(self):
        return "<Thing {}>".format(self.i)
    def __hash__(self):
        return hash(self.i)
I figured it out thanks to some of the questions in the comments. The problem was that I had believed the hash function alone decides whether two objects are the same. __eq__ also needs to match, which it always did in my tests and in my attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus, the x == y was failing in prod. Argh.
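A minimal sketch of that lesson: equal hashes only make a set consider an element, and __eq__ decides membership. The hash here is deliberately coarse, rounding to the nearest integer, purely to force two slightly different floats to collide. It is a contrivance for illustration, not the question's actual __hash__.

```python
class Thing:
    def __init__(self, i):
        self.i = i

    def __eq__(self, other):
        if not isinstance(other, Thing):
            return NotImplemented
        return self.i == other.i

    def __hash__(self):
        # Deliberately coarse: values that round to the same int collide
        return hash(round(self.i))

a = Thing(1.0)
b = Thing(1.0000001)       # e.g. a value that came back slightly off from a DB
print(hash(a) == hash(b))  # True
print(b in {a})            # False: hashes match, but __eq__ says they differ
print(Thing(1.0) in {a})   # True
```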
Suppose I have the following classes:
class base(object):
    def __init__(self, name):
        self.name = name
        self.last_x = 0.0
    def calc(self, x):
        return x
class A(base):
    def calc(self, x):
        return f_A(x)
class B(base):
    def calc(self, x):
        return f_B(x)
...
Each of the lettered classes is basically a wrapper for a corresponding lettered function f_A, f_B. The class instances include a state variable self.last_x, and the lettered functions are assumed to be state-dependent (i.e. a Markov chain type process).
What I would like to do is to define dependency chains between instances of these classes in order to try out different functional convolutions. For example, if we wanted to calculate a chain [a, b] on a numerical input value x we would have to do
a = A('firstnode')
b = B('secondnode')
res = b.calc(a.calc(x))
The goal is to do this with arbitrarily long chains, while also being able to access results from each intermediate calculation. I.e. if the chain is [a, b, c] I would like to make accessible results of [a] and [a, b] as well (which is why I included a name string for each node in my current implementation).
What would be the right way to setup my classes and data structures for this use case?
So far I have a fairly heavy-handed solution involving multiple dictionaries to keep track of things, but it feels inelegant and I think I might be missing out on something obvious.
Unfortunately you're improperly reusing names (thus hiding their previous values). E.g., after:
a = A('firstnode')
calling a.calc will try to call this instance (since the assignment has rebound the name a, which previously referred to a function) and fail. It would be best to use more sensible naming. If for some reason that's not practical, you need to bind the function names internally at class definition time:
class A(base):
    def calc(self, x, a=a):
        return a(x)
where the a=a does the trick, and so forth.
Having passed that hurdle, the second one is that you want the last result of each instance to be saved, but you don't save it. So, change the code to, e.g.:
class A(base):
    def calc(self, x, a=a):
        self.last_result = a(x)
        return self.last_result
Once that is done, performing your desired operation on a list of class instances is the least of your problems. E.g.:
def doit(instances, x):
    curr = x
    for inst in instances:
        curr = inst.calc(curr)
    return curr
and after this
[inst.last_result for inst in instances]
will give you the intermediate results you're looking for.
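As an alternative that doesn't store anything on the instances, itertools.accumulate (with its initial argument, Python 3.8+) can fold a value through the chain and yield every intermediate result in order. This is a sketch: f_A and f_B are hypothetical stand-ins for the question's functions, and the per-node last_x state is left out.

```python
from itertools import accumulate

# Hypothetical stand-ins for the question's f_A and f_B
def f_A(x):
    return x + 1

def f_B(x):
    return x * 2

class base:
    def __init__(self, name):
        self.name = name
        self.last_x = 0.0

    def calc(self, x):
        return x

class A(base):
    def calc(self, x):
        return f_A(x)

class B(base):
    def calc(self, x):
        return f_B(x)

chain = [A("firstnode"), B("secondnode"), A("thirdnode")]
# Yields the input first, then the result after each node in turn
results = list(accumulate(chain, lambda acc, node: node.calc(acc), initial=3))
by_name = dict(zip((node.name for node in chain), results[1:]))
print(results)  # [3, 4, 8, 9]
print(by_name)  # {'firstnode': 4, 'secondnode': 8, 'thirdnode': 9}
```

Each prefix of the chain ([a], [a, b], ...) corresponds to one entry of results, so no extra bookkeeping dictionaries are needed.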