How to calculate hash of a python class object - python

In python3, I have a class. Like below:
class Foo:
def __init__(self):
self.x = 3
def fcn(self, val):
self.x += val
Then I instantiate objects of that class, like so:
new_obj = Foo()
new_obj2 = Foo()
Now when I hash these objects, I get different hash values. I need them to return the same hash, as they are the same objects (in theory).
Any idea how I can do this?

Thank you to all who answered. You're right that instantiating a new instance of the same class object is not actually the same, as it occupies a different place in memory. What I ended up doing is similar to what #nosklo suggested.
I created a 'get_hashables' function that returned a dictionary with all the properties of the class that would constitute a unique class object, like so:
def get_hashables(self):
return {'data': self.data, 'result': self.result}
Then my main method would take these 'hashable' variables, and hash them to produce the hash itself.

class Foo:
def __init__(self):
self.x = 3
def fcn(self, val):
self.x += val
def __hash__(self):
return hash(self.x)
This will calculate the hash using self.x; That means the hash will be the same when self.x is the same. You can return anything from __hash__, but to prevent consistency bugs you should return the same hash if the objects compare equal. More about that in the docs.

They are not the same object. The expression Foo() invokes the class constructor, Foo.__init__, which returns a new, unique instance of the object on each call. Your two calls return two independent objects, residing in different memory locations, each containing its own, private instance of the x attribute.
You might want to read up on Python class and instance theory.

Related

Best practice for defining a class that computes attributes in order when initialized

I would like to define a class that does something like:
Class computer():
def __init__(self, x):
# compute first the 'helper' properties
self.prop1 = self.compute_prop1(x)
self.prop2 = self.compute_prop2(x)
# then compute the property that depends on 'helpers'
self.prop3 = self.compute_prop3(x)
def compute_prop1(self, x):
return x
def compute_prop2(self, x):
return x*x
def compute_prop3(self, x):
return self.prop1 + self.prop2
Then, when I initialize an instance, I get all properties computed in order (first helpers, then everything depending on helpers later):
>>> computer = Computer(3)
>>> computer.__dict__
{'prop1': 3, 'prop2': 9, 'prop3': 12}
However, I think there is a better practice of writing this code, for example using decorators. Could you please give me some hints? Thank you!
Here's your class using properties instead (with an added method for returning each property):
Class PropertyComputer:
def __init__(self, x):
self._x = x
#property
def prop1(self):
return self._x
#property
def prop2(self):
return self._x * self._x
#property
def prop3(self):
return self.prop1 + self.prop2
def get_props(self):
return self.prop1, self.prop2, self.prop3
Design-wise, I believe this is better because:
storing x as an instance variable makes more sense: the point of using objects is to avoid having to pass variables around, especially those that the object itself can keep track of;
the attribute assignment and its corresponding calculation are bundled together in each property-decorated method: we'll never have to think whether the problem is in the init method (where you define the attribute) or in the compute method (where the logic for the attribute's calculation is laid out).
Note that the concept of "first calculate helpers, then the properties depending on them" does not really apply to this code: we only need to evaluate prop3 if/when we actually need it. If we never access it, we never need to compute it.
A "bad" side-effect of using properties, compared to your example, is that these properties are not "stored" anywhere (hence why I added the last method):
c = PropertyComputer(x=2)
c.__dict__ # outputs {'_x': 2}
Also note that, using decorators, the attributes are calculated on-the-fly whenever you access them, instead of just once in the init method. In this manner, property-decorated methods work like methods, but are accessed like attributes (it's the whole point of using them):
c = PropertyComputer(x=2)
c.prop1 # outputs 2
c._x = 10
c.prop1 # outputs 10
As a side note, you can use functools.cached_property to cache the evaluation of one of these properties, in case it's computationally expensive.
I think the following would be the easiest way to avoid redundancy
class computer():
def __init__(self, x):
self.prop_dict = self.compute_prop_dict(x)
def compute_prop_dict(self, x):
prop1 = x
prop2 = x*x
return {'prop1': prop1, 'prop2': prop2, 'prop3': prop1 + prop2}
So anything that would come after instantiation could have access to these helpers via the prop_dict
But as said by Brian as a comment this order is just a language specification for Python 3.7

Non-iterable Object After \__init__ Array in Python Class

Say I would like to create a python class that behave as array of another class. While the __init__ is called, it recognizes itself as an array (iterable); however, when I call it again through some other method, or even call by the index, the object becomes non-iterable. I wonder which part I got it wrong, or perhaps, there's DO and DON'T for python class?
Last but not least, this is an attempt to simplify one object type to another (trying to cast from one class to another). Perhaps the code below will give a better clarification.
The example is below:
Say I have an object FOO
FOO.name = "john"
FOO.records[0].a = 1
FOO.records[0].b = 2
FOO.records[1].a = 4
FOO.records[1].b = 5
And I create a python class
class BAR:
__init__(self, record):
self.a = int(record.a)
self.b = int(record.b)
and another class which would like to store BAR class as array
class BARS:
__init__(self,bars):
self = numpy.array([]) # regardless the array type whether python native or Numpy it does not work
for item in bars:
self = numpy.append(self, BAR(item))
so what I would expect this code to perform would be that if I call
A = BARS(FOO.records)
I would get an iterable A. But this does not work, though if I call SELF in BARS __init__, it would see SELF as iterable object.
If one should not expect python class to behave in this manner, at least I hope you could help pointing me out, what would be the alternative logical and pythonic way to achieve it.
Perhaps answering my own question after a hint from comment above would be good.
It turns out that assining self in class as itself is a DON'T (silly me trying to get a shortcut).
To achieve an iterable class, one would require __iter__ method alongside with __next__, and __getitem__ to fulfill (maybe some others methods as well, but let's stick to these three for now).
So, the code above should look like this
class BARS:
def __init__(self, records):
self.records = [] # Use list for simplicity
for record in records:
self.records.append(BAR(record))
def __iter__(self):
self.n = 0
return self
def __next__(self):
if self.n < len(self.records):
result = self.records[self.n]
self.n += 1
return result
else:
raise StopIteration
def __getitem__(self, key):
return self.records[key]
Eventually, this will yield a iteration and index accessible object.

python set() membership and hashable objects

I wanted to store instances of a class in a set, so I could use the set methods to find intersections, etc. My class has a __hash__() function, along with an __eq__ and a __lt__, and is decorated with functools.total_ordering
When I create two sets, each containing the same two objects, and do a set_a.difference(set_b), I get a result with a single object, and I have no idea why. I was expecting none, or at the least, 2, indicating a complete failure in my understanding of how sets work. But one?
for a in set_a:
print(a, a.__hash__())
for b in set_b:
print(b, b.__hash__(), b in set_a)
(<foo>, -5267863171333807568)
(<bar>, -8020339072063373731)
(<foo>, -5267863171333807568, False)
(<bar)>, -8020339072063373731, True)
Why is the <foo> object in set_b not considered to be in set_a? What other properties does an object require in order to be considered a member of a set? And why is bar considered to be a part of set_a, but not foo?
edit: updating with some more info. I figured that simply showing that the two objects' hash() results where the same meant that they where indeed the same, so I guess that's where my mistake probably comes from.
#total_ordering
class Thing(object):
def __init__(self, i):
self.i = i
def __eq__(self, other):
return self.i == other.i
def __lt__(self, other):
return self.i < other.i
def __repr__(self):
return "<Thing {}>".format(self.i)
def __hash__(self):
return hash(self.i)
I figured it out thanks to some of the questions in the comments- the problem was due to the fact that I had believed that ultimately, the hash function decides if two objects are the same, or not. The __eq__ also needs to match, which it always did in my tests and attempts to create a minimal example here.
However, when pulling data from a DB in prod, a certain float was being rounded down, and thus, the x == y was failing in prod. Argh.

are user defined classes mutable

Say I want to create a class for car, tractor and boat. All these classes have an instance of engine and I want to keep track of all the engines in a single list. If I understand correctly if the motor object is mutable i can store it as an attribute of car and also the same instance in a list.
I cant track down any solid info on whether user defined classes are mutable and if there is a choice to choose when you define them, can anybody shed some light?
User classes are considered mutable. Python doesn't have (absolutely) private attributes, so you can always change a class by reaching into the internals.
For using your class as a key in a dict or storing them in a set, you can define a .__hash__() method and a .__eq__() method, making a promise that your class is immutable. You generally design your class API to not mutate the internal state after creation in such cases.
For example, if your engines are uniquely defined by their id, you can use that as the basis of your hash:
class Engine(object):
def __init__(self, id):
self.id = id
def __hash__(self):
return hash(self.id)
def __eq__(self, other):
if isinstance(other, self.__class__):
return self.id == other.id
return NotImplemented
Now you can use instances of class Engine in sets:
>>> eng1 = Engine(1)
>>> eng2 = Engine(2)
>>> eng1 == eng2
False
>>> eng1 == eng1
True
>>> eng1 == Engine(1)
True
>>> engines = set([eng1, eng2])
>>> engines
set([<__main__.Engine object at 0x105ebef10>, <__main__.Engine object at 0x105ebef90>])
>>> engines.add(Engine(1))
>>> engines
set([<__main__.Engine object at 0x105ebef10>, <__main__.Engine object at 0x105ebef90>])
In the above sample I add another Engine(1) instance to the set, but it is recognized as already present and the set didn't change.
Note that as far as lists are concerned, the .__eq__() implementation is the important one; lists don't care if an object is mutable or not, but with the .__eq__() method in place you can test if a given engine is already in a list:
>>> Engine(1) in [eng1, eng2]
True
All objects (with the exception of a few in the standard library, some that implement special access mechanisms using things like descriptors and decorators, or some implemented in C) are mutable. This includes instances of user defined classes, classes themselves, and even the type objects that define the classes. You can even mutate a class object at runtime and have the modifications manifest in instances of the class created before the modification. By and large, things are only immutable by convention in Python if you dig deep enough.
I think you're confusing mutability with how python keeps references -- Consider:
class Foo(object):
pass
t = (1,2,Foo()) # t is a tuple, :. t is immutable
b = a[2] # b is an instance of Foo
b.foo = "Hello" # b is mutable. (I just changed it)
print (hash(b)) # b is hashable -- although the default hash isn't very useful
d = {b : 3} # since b is hashable, it can be used as a key in a dictionary (or set).
c = t # even though t is immutable, we can create multiple references to it.
a = [t] # here we add another reference to t in a list.
Now to your question about getting/storing a list of engines globally -- There are a few different ways to do this, here's one:
class Engine(object):
def __init__(self, make, model):
self.make = make
self.model = model
class EngineFactory(object):
def __init__(self,**kwargs):
self._engines = kwargs
def all_engines(self):
return self._engines.values()
def __call__(self,make, model):
""" Return the same object every for each make,model combination requested """
if (make,model) in _engines:
return self._engines[(make,model)]
else:
a = self._engines[(make,model)] = Engine(make,model)
return a
engine_factory = EngineFactory()
engine1 = engine_factory('cool_engine',1.0)
engine2 = engine_factory('cool_engine',1.0)
engine1 is engine2 #True !!! They're the same engine. Changing engine1 changes engine2
The example above could be improved a little bit by having the EngineFactory._engines dict store weakref.ref objects instead of actually storing real references to the objects. In that case, you'd check to make sure the reference is still alive (hasn't been garbage collected) before you return a new reference to the object.
EDIT: This is conceptually wrong, The immutable object in python can shed some light as to why.
class Engine():
def __init__(self, sn):
self.sn = sn
a = Engine(42)
b = a
print (a is b)
prints True.

Python - __eq__ method not being called

I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
class Bar(Foo):
def __init__(self, a, b, c, d, e):
self.a = a
self.b = b
self.c = c
self.d = d
self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv .e should equal 5, but they are not. Like I mentioned, it looks like the __CaptureEq __eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the __CaptureEq __eq__ method is not being called? Hopefully this is not too obscure of a question.
Brandon's answer is informative, but incorrect. There are actually two problems, one with
the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
'Object wrapper that remembers "other" for successful equality tests.'
def __init__(self, obj):
self.obj = obj
self.match = obj
# If running on Python 3, this will be a new-style class, and
# new-style classes must delegate hash explicitly in order to populate
# the underlying special method slot correctly.
# On Python 2, it will be an old-style class, so the explicit delegation
# isn't needed (__getattr__ will cover it), but it also won't do any harm.
def __hash__(self):
return hash(self.obj)
def __eq__(self, other):
result = (self.obj == other)
if result:
self.match = other
return result
def __getattr__(self, name): # support anything else needed by __contains__
return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
if not isinstance(other, Foo):
# Don't recognise "other", so let *it* decide if we're equal
return NotImplemented
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
print '__eq__: I am', self.d, self.e, 'and he is', other.d, other.e
return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)

Categories