I had a strange bug when porting a feature to the Python 3.1 fork of my program. I narrowed it down to the following hypothesis:
In contrast to Python 2.x, in Python 3.x if an object has an __eq__ method it is automatically unhashable.
Is this true?
Here's what happens in Python 3.1:
>>> class O(object):
...     def __eq__(self, other):
...         return 'whatever'
...
>>> o = O()
>>> d = {o: 0}
Traceback (most recent call last):
  File "<pyshell#16>", line 1, in <module>
    d = {o: 0}
TypeError: unhashable type: 'O'
The follow-up question is, how do I solve my personal problem? I have an object ChangeTracker which stores a WeakKeyDictionary that points to several objects, giving for each the value of its pickle dump at a certain point in the past. Whenever an existing object is checked in, the change tracker says whether its new pickle is identical to its old one, and therefore whether the object has changed in the meantime. The problem is that now I can't even check whether the given object is in the library, because the lookup raises an exception about the object being unhashable (because it has an __eq__ method). How can I work around this?
Yes, if you define __eq__, the default __hash__ (namely, hashing the address of the object in memory) goes away. This is important because hashing needs to be consistent with equality: equal objects need to hash the same.
The solution is simple: just define __hash__ along with defining __eq__.
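As a minimal sketch of what "define __hash__ along with __eq__" can look like (the Point class and its x/y fields are hypothetical, made up for illustration only), hash the same fields that the equality test compares:

```python
class Point:
    """Hypothetical value-like class, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares,
        # so that equal objects always hash the same.
        return hash((self.x, self.y))


lookup = {Point(1, 2): "found"}
# An equal but distinct instance finds the same dictionary entry.
result = lookup[Point(1, 2)]
```

This only stays safe if x and y are not changed while the object is being used as a key.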
This paragraph from http://docs.python.org/3.1/reference/datamodel.html#object.hash explains it:
If a class that overrides __eq__() needs to retain the implementation of __hash__() from a parent class, the interpreter must be told this explicitly by setting __hash__ = <ParentClass>.__hash__. Otherwise the inheritance of __hash__() will be blocked, just as if __hash__ had been explicitly set to None.
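Here is a rough sketch of that approach applied to something like the WeakKeyDictionary case from the question. The Record class and the snapshots name are assumptions for illustration, not the asker's actual code:

```python
import weakref


class Record:
    """Hypothetical tracked object with a value-based __eq__."""

    def __init__(self, payload):
        self.payload = payload

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self.payload == other.payload

    # Explicitly keep the identity-based hash from the parent class,
    # which defining __eq__ would otherwise block in Python 3.
    __hash__ = object.__hash__


snapshots = weakref.WeakKeyDictionary()
a = Record([1, 2])
snapshots[a] = "old pickle of a"

is_tracked = a in snapshots  # works again, no TypeError
```

Be aware that this keeps hashing by identity while __eq__ compares by value, so two equal Record objects will generally not find each other's entries. That may be acceptable when you only ever look up the very same object, as a change tracker does, but it does bend the "equal objects hash equal" rule.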
Check the Python 3 manual on object.__hash__:
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections.
Emphasis is mine.
If you want to be lazy, it sounds like you can just define __hash__(self) to return id(self):
User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns id(x).
I'm no Python expert, but wouldn't it make sense that, when you define an __eq__ method, you also have to define a hash method (which calculates the hash value for an object)? Otherwise, the hashing mechanism wouldn't know if it hit the same object, or a different object with just the same hash value. Actually, it's the other way around: it would probably end up computing different hash values for objects considered equal by your __eq__ method.
I have no idea what that hash function is called though, __hash__ perhaps? :)
So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
    pass

s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
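A quick way to see this for yourself (just a sketch; the variable names are mine) is to check for __dict__ directly:

```python
class Sub(object):
    pass


plain = object()
s = Sub()

plain_has_dict = hasattr(plain, "__dict__")  # bare object: no __dict__
sub_has_dict = hasattr(s, "__dict__")        # subclass instance: has one

s.attr = "hello"
stored = dict(s.__dict__)  # the new attribute lives in the instance dict

try:
    plain.attr = "hello"
except AttributeError:
    plain_rejected = True
else:
    plain_rejected = False
```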
An instance of object does not carry around a __dict__ -- if it did, before the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project, we can take some measurements:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
...     __slots__ = 'foobar',
...     def __init__(self, x): self.foobar=x+100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you save a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but there are use cases for that).
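To illustrate the "finite, rigid set of slots" part, here is a small sketch with a made-up Slotted class (not from the measurements above): the named slot can be assigned and reassigned, but anything else is refused because there is no __dict__ to put it in.

```python
class Slotted:
    """Hypothetical class with exactly one slot."""

    __slots__ = ("foobar",)

    def __init__(self, value):
        self.foobar = value


p = Slotted(1)
p.foobar = 2  # fine: 'foobar' is a declared slot and can be reassigned

try:
    p.other = 3  # not a declared slot, and there is no __dict__
except AttributeError:
    rejected = True
else:
    rejected = False

has_dict = hasattr(p, "__dict__")
```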
As other answerers have said, an object does not have a __dict__. object is the base class of all types, including int and str, so whatever object provides becomes a burden on them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste an additional 4-8 bytes of memory for each object in the system, for very limited utility.
Instead of instantiating a dummy class, in Python 3.3+ you can (and should) use types.SimpleNamespace for this.
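For instance, a minimal sketch (the attribute names are arbitrary):

```python
from types import SimpleNamespace

# Attributes can be given up front as keyword arguments...
ns = SimpleNamespace(attr="hello")
# ...and arbitrary new ones can be added later.
ns.other = 42

contents = vars(ns)  # the namespace's attributes as a dict
```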
It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not have a dict for optimization.
If you look at the source code you will see that there are many checks to see if the object has a dict or not.
So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print(x)
0
>>> x.hello = 4
>>> print(x.hello)
4
>>> x = x + 1
>>> print(x)
1
>>> print(x.hello)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because the add function returns an int, so I'd have to override functions like __add__ and such in order to retain my custom attributes. But this all now makes sense to me (I think), when I think of "object" like "int".
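For what it's worth, here is a rough sketch of what such an override could look like in Python 3. TaggedInt is a name I made up, and this only handles addition with the instance on the left-hand side:

```python
class TaggedInt(int):
    """Hypothetical int subclass that keeps its extra attributes across +."""

    def __add__(self, other):
        total = super().__add__(other)
        if total is NotImplemented:
            # Let Python try the other operand's reflected method.
            return total
        result = TaggedInt(total)
        # Carry the custom attributes over to the new instance.
        result.__dict__.update(self.__dict__)
        return result


x = TaggedInt()
x.hello = 4
x = x + 1
```

After this, x is still a TaggedInt and x.hello is still there. Every other operator (and 1 + x, which goes through the plain int on the left) would still hand back a plain int unless it is overridden in the same way.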
https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
It's because object is a "type", not a class. In general, all classes that are defined in C extensions (like all the built in datatypes, and stuff like numpy arrays) do not allow addition of arbitrary attributes.
This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.
I would like to write a class that can be used as a key in a hashable collections (e.g. in a dict). I know that user classes are by default hashable, but using id(self) would be the wrong thing here.
My class holds a tuple as member variable. Deriving from tuple doesn't seem like an option because in my constructor I don't get the same kind of arguments as a tuple constructor. But perhaps that's not a limitation?
What I need is basically the hash of a tuple the way a real tuple would give it.
hash(self.member_tuple) does just that.
The idea here is that two tuples can be equal without their id being equal.
If I implement my __cmp__() as follows:
def __cmp__(self, other):
    return cmp(self, other)
will this automatically resort to hash(self) for the comparison? ... or should I implement it as follows:
def __cmp__(self, other):
    return cmp(self.member_tuple, other)
My __hash__() function is implemented to return the hash of the held tuple, i.e.:
def __hash__(self):
    return hash(self.member_tuple)
Basically, how do __cmp__() and __hash__() interact? I don't know whether in __cmp__() the other will already be a hash or not and whether I should compare against "my" hash (which would be the one of the held tuple) or against self.
So which one is the right one?
Can anyone shed any light on this and possibly point me to documentation?
I'd not use __cmp__ and stick to using __eq__ instead. For hashing that is enough and you don't want to extend to being sortable here. Moreover, __cmp__ has been removed from Python 3 in favour of the rich comparison methods (__eq__, __lt__, __gt__, etc.).
Next, your __eq__ should return True when the member tuples are equal:
def __eq__(self, other):
    if not isinstance(other, ThisClass):
        return NotImplemented
    return self.member_tuple == other.member_tuple
Returning the NotImplemented singleton when the type of the other object is not the same is good practice because that'll delegate the equality test to the other object; if it doesn't implement __eq__ or also returns NotImplemented Python will fall back to the standard id() test.
Your __hash__ implementation is spot-on.
Because a hash is not meant to be unique (it is just a means to pick a slot in the hash table), equality is then used to determine if a matching key is already present or if a hash collision has taken place. As such __eq__ (or __cmp__ if __eq__ is missing) is not called if the slot to which the object is being hashed is empty.
This does mean that if two objects are considered equal (a.__eq__(b) returns True), then their hash values must be equal too. Otherwise you could end up with a corrupted dictionary, as Python will no longer be able to determine if a key is already present in the hash table.
If both your __eq__ and __hash__ methods are delegating their duties to the self.member_tuple attribute, you are maintaining that property; you can trust the basic tuple type to have implemented this correctly.
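Putting the pieces together, here is a sketch of how hashing and equality interact during a dict lookup. The constructor signature and the eq_calls list are assumptions added purely so the calls can be observed, and the behaviour shown is what I would expect from CPython:

```python
eq_calls = []  # records every __eq__ invocation, for illustration only


class ThisClass:
    def __init__(self, *parts):
        # Hypothetical constructor; the real one takes other arguments.
        self.member_tuple = tuple(parts)

    def __eq__(self, other):
        if not isinstance(other, ThisClass):
            return NotImplemented
        eq_calls.append((self.member_tuple, other.member_tuple))
        return self.member_tuple == other.member_tuple

    def __hash__(self):
        return hash(self.member_tuple)


table = {}
first = ThisClass(1, 2)
table[first] = "stored"
# The table was empty, so there was nothing to compare against.
calls_after_insert = len(eq_calls)

twin = ThisClass(1, 2)
# Same hash, different object: __eq__ decides whether it is the same key.
found = table[twin]
calls_after_lookup = len(eq_calls)
```

The insert triggers no __eq__ call at all, while the lookup with an equal but distinct object has to call __eq__ to confirm the match.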
See the glossary definition of hashable and the object.__hash__() documentation. If you are curious, I've written about how the dict and set types work internally:
Why is the order in dictionaries and sets arbitrary?
Overriding Python's Hashing Function in Dictionary
Although the title can be interpreted as three questions, the actual problem is simple to describe. On a Linux system I have python 2.7.3 installed, and want to be warned about python 3 incompatibilities. Therefore, my code snippet (tester.py) looks like:
#!/usr/bin/python -3
class MyClass(object):
    def __eq__(self, other):
        return False
When I execute this code snippet (thought to be only to show the problem, not an actual piece of code I am using in my project) as
./tester.py
I get the following deprecation warning:
./tester.py:3: DeprecationWarning: Overriding __eq__ blocks inheritance of __hash__ in 3.x
class MyClass(object):
My question: How do I change this code snippet to get rid of the warning, i.e. to make it compatible to version 3? I want to implement the equality operator in the correct way, not just suppressing the warning or something similar.
From the documentation page for Python 3.4:
If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).
Basically, you need to define a __hash__() function.
The problem is that for user-defined classes, the __eq__() and __hash__() functions are automatically defined.
x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
If you define just __eq__(), then __hash__ is set to None, so you will hit the wall.
The simpler way out, if you don't want to bother implementing __hash__() and you know for certain that your object will never be hashed, is to explicitly declare __hash__ = None, which takes care of the warning.
Alex: python's -3 option is warning you about a potential problem; it doesn't know that you aren't using instances of MyClass in sets or as keys in mappings, so it warns that something that you might have been relying on wouldn't work, if you were. If you aren't using MyClass that way, just ignore the warning. It's a dumb tool to help you catch potential problems; in the end, you're expected to be the one with the actual intelligence to work out which warnings actually matter.
If you really care about suppressing the warning - or, indeed, if a class is mutable and you want to make sure it isn't used in sets or as the key in any mapping - the simple assignment __hash__ = None (as Sudipta pointed out) in the class body shall do that for you. Since None isn't callable, this makes instances non-hashable.
class MyClass(object):
    def __eq__(self, other): return self is other
    __hash__ = None
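A small sketch of what that assignment does at runtime in Python 3 (the variable names below are mine):

```python
from collections.abc import Hashable


class MyClass(object):
    def __eq__(self, other):
        return self is other

    __hash__ = None  # explicitly mark instances as unhashable


m = MyClass()

try:
    hash(m)
except TypeError as exc:
    message = str(exc)

# The Hashable ABC also treats __hash__ = None as "not hashable".
is_hashable = isinstance(m, Hashable)
```

If instances do need to work in sets or as dict keys, define a real __hash__ that is consistent with __eq__ instead.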
I know that you can't call object.__setattr__ on objects not inherited from object, but is there anything else that is different between the two? I'm working in Python 2.6, if this matters.
Reading this question again I misunderstood what #paper.cut was asking about: the difference between classic classes and new-style classes (not an issue in Python 3+). I do not know the answer to that.
Original Answer*
setattr(instance, name, value) is syntactic sugar for instance.__setattr__(name, value)**.
You would only need to call object.__setattr__(...) inside a class definition, and then only if directly subclassing object -- if you were subclassing something else, Spam for example, then you should either use super() to get the next item in the hierarchy, or call Spam.__setattr__(...) -- this way you don't risk missing behavior that super-classes have defined by skipping over them directly to object.
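As a sketch of why that matters (Python 3 syntax; the behaviour of Spam and the Eggs subclass are invented for illustration), suppose Spam refuses underscore-prefixed names:

```python
class Spam:
    """Hypothetical base class that rejects underscore-prefixed names."""

    def __setattr__(self, name, value):
        if name.startswith("_"):
            raise AttributeError("private names are not allowed: " + name)
        super().__setattr__(name, value)


class Eggs(Spam):
    log = []  # class-level record of every attempted assignment

    def __setattr__(self, name, value):
        Eggs.log.append(name)
        # Delegate to the next class in the MRO (Spam), not straight to
        # object, so that Spam's check still runs.
        super().__setattr__(name, value)


e = Eggs()
e.size = 3

try:
    e._secret = 1
except AttributeError:
    blocked = True
else:
    blocked = False
```

Had Eggs called object.__setattr__(self, name, value) instead, the assignment to _secret would have slipped past Spam's check.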
* applies to Python 3.0+ classes and 2.x new-style classes
**There are two instances where setattr(x, ...) and x.__setattr__(...) are not the same:
x itself has a __setattr__ in its private dictionary (so x.__dict__['__setattr__'] = ...); this is almost certainly an error
x.__class__ has a __getattribute__ method -- because __getattribute__ intercepts every lookup, even when the method/attribute exists
NB
These two caveats apply to every syntactic sugar shortcut:
setattr
getattr
len
bool
hash
etc
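The first caveat can be sketched like this in Python 3 (the class A and the calls list are made up for the demonstration): setattr() looks the special method up on the type, while the explicit x.__setattr__(...) spelling is an ordinary attribute lookup and so finds the entry in the instance dictionary.

```python
class A:
    pass


calls = []  # records what the instance-level hook receives

a = A()
# Plant a __setattr__ directly in the instance's private dictionary.
a.__dict__["__setattr__"] = lambda name, value: calls.append((name, value))

# setattr() uses type(a).__setattr__, so the planted hook is ignored.
setattr(a, "x", 1)

# The explicit call finds the instance attribute and runs the hook,
# which only records the request and never stores 'y'.
a.__setattr__("y", 2)
```

So after this runs, a.x exists, a.y does not, and calls holds the single ("y", 2) request.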