It's a thing that bugged me for a while. Why can't I do:
>>> a = ""
>>> a.foo = 2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'foo'
...while I can do the following?
>>> class Bar():
... pass
...
>>> a = Bar()
>>> a.foo = 10 #ok!
What's the rule here? Could you please point me to some description?
You can add attributes to any object that has a __dict__.
x = object() doesn't have it, for example.
Strings and other simple built-in objects also don't have it.
Instances of classes that use __slots__ also don't have it.
Instances of classes defined with a class statement do have it, unless one of the previous points applies.
If an object is using __slots__ / doesn't have a __dict__, it's usually to save space. For example, in a str it would be overkill to have a dict - imagine the amount of bloat for a very short string.
If you want to test if a given object has a __dict__, you can use hasattr(obj, '__dict__').
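For example, a quick check in the interpreter (the class names Plain and Slotted here are made up for illustration) shows which objects carry a __dict__:
>>> class Plain:
...     pass
...
>>> class Slotted:
...     __slots__ = ('x',)
...
>>> hasattr(Plain(), '__dict__')
True
>>> hasattr(Slotted(), '__dict__')
False
>>> hasattr(object(), '__dict__')
False
>>> hasattr("", '__dict__')
False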
This might also be interesting to read:
Some objects, such as built-in types and their instances (lists, tuples, etc.) do not have a __dict__. Consequently user-defined attributes cannot be set on them.
Another interesting article about Python's data model including __dict__, __slots__, etc. is this from the python reference.
I have difficulty understanding the last part (in bold) from Python in a Nutshell
Per-Instance Methods
An instance can have instance-specific bindings for all attributes,
including callable attributes (methods). For a method, just like for
any other attribute (except those bound to overriding descriptors),
an instance-specific binding hides a class-level binding:
attribute lookup does not consider the class when it finds a
binding directly in the instance. An instance-specific binding for a
callable attribute does not perform any of the transformations
detailed in “Bound and Unbound Methods” on page 110: the attribute
reference returns exactly the same callable object that was earlier
bound directly to the instance attribute.
However, this does not work as you might expect
for per-instance bindings of the special methods that Python calls
implicitly as a result of various operations, as covered in “Special
Methods” on page 123. Such implicit uses of special methods always
rely on the class-level binding of the special method, if any. For
example:
def fake_get_item(idx): return idx
class MyClass(object): pass
n = MyClass()
n.__getitem__ = fake_get_item
print(n[23]) # results in:
# Traceback (most recent call last):
# File "<stdin>", line 1, in ?
# TypeError: unindexable object
What does it mean specifically?
Why does the example raise that error?
Thanks.
Neglecting all the fine details, it basically says that special methods (as defined in Python's data model - generally, methods that start and end with two underscores, and are rarely, if ever, called directly) are never looked up implicitly on the instance, even if they are defined there:
n[whatever] # will always call type(n).__getitem__(n, whatever)
This differs from attribute look-up which checks the instance first:
def fake_get_item(idx):
    return idx

class MyClass(object):
    pass

n = MyClass()
n.__getitem__ = fake_get_item
print(n.__getitem__(23))  # works because attribute lookup checks the instance first
There is a whole section in the documentation about this (including rationale): "Special method lookup":
3.3.9. Special method lookup
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behaviour is the reason why the following code raises an exception:
>>> class C:
... pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()
The rationale behind this behaviour lies with a number of special methods such as __hash__() and __repr__() that are implemented by all objects, including type objects. If the implicit lookup of these methods used the conventional lookup process, they would fail when invoked on the type object itself:
>>> 1 .__hash__() == hash(1)
True
>>> int.__hash__() == hash(int)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' of 'int' object needs an argument
[...]
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
To put it even more plainly, it means that you can't override dunder methods per instance; they can only be (re)defined on the class. As a consequence, ==, +, and the rest of the operators always mean the same thing for all objects of a given type T.
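By contrast, assigning the special method on the class itself does take effect immediately, even for existing instances. A minimal sketch reusing the names from the example above (note the added self parameter, since a class-level method receives the instance):
class MyClass(object):
    pass

def fake_get_item(self, idx):
    return idx

n = MyClass()
MyClass.__getitem__ = fake_get_item  # set on the class, not on the instance
print(n[23])  # prints 23: implicit lookup now finds type(n).__getitem__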
I'll try to summarize what the extract says and in particular the part in bold.
Generally speaking, when Python tries to find the value of an attribute (including a method), it first checks the instance (i.e. the actual object you created), then the class.
The code below illustrates the generic behavior.
class MyClass(object):
    def a(self):
        print("howdy from the class")

n = MyClass()

# here the class method is called
n.a()
# 'howdy from the class'

def new_a():
    print("hello from new a")

n.a = new_a

# the new instance binding hides the class binding
n.a()
# 'hello from new a'
What the part in bold states is that this behavior does not apply to "Special Methods" such as __getitem__. In other words, overriding __getitem__ at the instance level (n.__getitem__ = fake_get_item in your example) does nothing: when the method is invoked through the n[23] syntax, an error is raised because the class does not implement the method.
(If the generic behavior also held in this case, print(n[23]) would have printed 23, i.e. it would have executed the fake_get_item method.)
Another example of the same behavior:
class MyClass(object):
    def __getitem__(self, idx):
        return idx

n = MyClass()
fake_get_item = lambda x: "fake"
print(fake_get_item(23))
# fake
n.__getitem__ = fake_get_item
print(n[23])
# 23
In this example, the class's __getitem__ (which returns the index) is called instead of the instance binding (which would return 'fake').
>>> a = object()
>>> a.x = 5
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'x'
>>> b = lambda:0
>>> b.x = 5
>>> b.x
5
Why do instances of the object class not have a __dict__, causing them to behave as semantically immutable? What were the reasons for choosing this design?
Specifically, why:
instances of types defined in C don't have a __dict__ attribute by default.
As noted in this question.
The documentation for Python 2 is not very helpful in giving an explanation as to why you cannot assign attributes to an object(), but the documentation for Python 3 provides a bit more information:
Return a new featureless object. object is a base for all classes. It has the methods that are common to all instances of Python classes. This function does not accept any arguments.
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
Thus, the reason you cannot add arbitrary attributes to your object() appears to be that object() instances have no __dict__ attribute, not that object() instances are immutable:
>>> hasattr(object(), '__dict__')
False
Another interesting thing, but perhaps not relevant to the discussion at hand, is that while an instance of object may not have a __dict__ implementation, the object class itself does:
>>> hasattr(object, '__dict__')
True
As for the why part of the question, I cannot find any exact reasons why object() doesn't have a __dict__. It is probably - as @tdelany has already mentioned in the comments - an implementation detail. If you really want a definitive answer, you should ask Guido himself.
So, I was playing around with Python while answering this question, and I discovered that this is not valid:
o = object()
o.attr = 'hello'
due to an AttributeError: 'object' object has no attribute 'attr'. However, with any class inherited from object, it is valid:
class Sub(object):
    pass

s = Sub()
s.attr = 'hello'
Printing s.attr displays 'hello' as expected. Why is this the case? What in the Python language specification specifies that you can't assign attributes to vanilla objects?
For other workarounds, see How can I create an object and add attributes to it?.
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
An instance of object does not carry around a __dict__ -- if it did, before the horrible circular dependence problem (since dict, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
For example, using the excellent pympler project (you can get it via svn from here), we can do some measurements...:
>>> from pympler import asizeof
>>> asizeof.asizeof({})
144
>>> asizeof.asizeof(23)
16
You wouldn't want every int to take up 144 bytes instead of just 16, right?-)
Now, when you make a class (inheriting from whatever), things change...:
>>> class dint(int): pass
...
>>> asizeof.asizeof(dint(23))
184
...the __dict__ is now added (plus, a little more overhead) -- so a dint instance can have arbitrary attributes, but you pay quite a space cost for that flexibility.
So what if you wanted ints with just one extra attribute foobar...? It's a rare need, but Python does offer a special mechanism for the purpose...
>>> class fint(int):
...     __slots__ = 'foobar',
...     def __init__(self, x): self.foobar = x + 100
...
>>> asizeof.asizeof(fint(23))
80
...not quite as tiny as an int, mind you! (or even the two ints, one the self and one the self.foobar -- the second one can be reassigned), but surely much better than a dint.
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
In exchange for the lost flexibility, you gain a lot of bytes per instance (probably meaningful only if you have zillions of instances gallivanting around, but, there are use cases for that).
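The flip side is that __slots__ also forbids any attribute not named in it. Continuing with the fint class from above (the exact error wording varies across Python versions):
>>> f = fint(23)
>>> f.foobar
123
>>> f.other = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'fint' object has no attribute 'other'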
As other answerers have said, an object does not have a __dict__. object is the base class of all types, including int and str. Thus whatever is provided by object would be a burden to them as well. Even something as simple as an optional __dict__ would need an extra pointer for each value; this would waste an additional 4-8 bytes of memory for each object in the system, for very limited utility.
Instead of doing an instance of a dummy class, in Python 3.3+, you can (and should) use types.SimpleNamespace for this.
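For example:
>>> import types
>>> ns = types.SimpleNamespace()
>>> ns.anything = 123
>>> ns.anything
123
>>> types.SimpleNamespace(a=1, b=2)
namespace(a=1, b=2)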
It is simply due to optimization.
Dicts are relatively large.
>>> import sys
>>> sys.getsizeof((lambda:1).__dict__)
140
Most (maybe all) classes that are defined in C do not have a dict, for optimization.
If you look at the CPython source code you will see that there are many checks for whether the object has a dict or not.
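You can observe this from Python itself: most built-in instances lack a __dict__, while function objects have one (Python 3 output shown):
>>> for obj in (object(), 1, "s", [], {}, lambda: 0):
...     print(type(obj).__name__, hasattr(obj, '__dict__'))
...
object False
int False
str False
list False
dict False
function True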
So, investigating my own question, I discovered this about the Python language: you can inherit from things like int, and you see the same behaviour:
>>> class MyInt(int):
...     pass
...
>>> x = MyInt()
>>> print x
0
>>> x.hello = 4
>>> print x.hello
4
>>> x = x + 1
>>> print x
1
>>> print x.hello
Traceback (most recent call last):
File "<interactive input>", line 1, in <module>
AttributeError: 'int' object has no attribute 'hello'
I assume the error at the end is because addition returns a plain int, so I'd have to override methods like __add__ in order to retain my custom attributes. But this all now makes sense to me (I think) when I think of "object" like "int".
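That assumption is right: int.__add__ returns a plain int. A minimal sketch of how overriding __add__ could carry the custom attribute along (the attribute name hello is just for illustration):
class MyInt(int):
    def __add__(self, other):
        result = MyInt(int(self) + int(other))
        # carry the custom attribute over to the new instance
        result.hello = getattr(self, 'hello', None)
        return result

x = MyInt()
x.hello = 4
x = x + 1
print(x)        # 1
print(x.hello)  # 4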
https://docs.python.org/3/library/functions.html#object :
Note: object does not have a __dict__, so you can’t assign arbitrary attributes to an instance of the object class.
It's because object is a "type", not a class. In general, all classes that are defined in C extensions (like all the built in datatypes, and stuff like numpy arrays) do not allow addition of arbitrary attributes.
This is (IMO) one of the fundamental limitations with Python - you can't re-open classes. I believe the actual problem, though, is caused by the fact that classes implemented in C can't be modified at runtime... subclasses can, but not the base classes.
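You can see the restriction directly by trying to add an attribute to a built-in type (the exact error wording varies across Python versions):
>>> int.foo = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'int'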
I find the following example mildly surprising:
>>> class Foo:
...     def blah(self):
...         pass
...
>>> f = Foo()
>>> def bar(self):
...     pass
...
>>> Foo.bar = bar
>>> f.bar
<bound method Foo.bar of <__main__.Foo object at 0x02D18FB0>>
I expected the bound method to be associated with each particular instance, and to be placed in it at construction. It seems logical that the bound method would have to be different for each instance, so that it knows which instance to pass in to the underlying function - and, indeed:
>>> g = Foo()
>>> g.blah is f.blah
False
But my understanding of the process is clearly flawed, since I would not have expected that assigning a function to a class attribute would put it in instances that had already been created by then.
So, my question is twofold:
Why does assigning a function to a class apply retroactively to existing instances? What are the actual lookup rules and processes that make this so?
Is this something guaranteed by the language, or just something that happens to happen?
You want to blow your mind, try this:
f.blah is f.blah
That's right, the instance method wrapper is different each time you access it.
In fact an instance method is a descriptor. In other words, f.blah is actually:
Foo.blah.__get__(f, type(f))
Methods are not actually stored on the instance; they are stored on the class, and a method wrapper is generated on the fly to bind the method to the instance.
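You can check this in the interpreter (Python 3 syntax): both spellings yield equivalent bound methods wrapping the same underlying function:
>>> f.blah == Foo.blah.__get__(f, type(f))
True
>>> f.blah.__func__ is Foo.blah
True
>>> f.blah is f.blah  # a fresh method wrapper is created on each access
False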
The instances do not "contain" the method. The lookup process happens dynamically at the time you access foo.bar. Python checks whether the instance has an attribute of that name; since it doesn't, it looks on the class, whereupon it finds whatever attribute the class has at that time. Note that methods are not special in this regard. You'll see the same effect if you set Foo.bar = 2; after that, foo.bar will evaluate to 2.
What is guaranteed by the language is that attribute lookup proceeds in this fashion: first the instance, then the class if the attribute is not found on the instance. (Lookup rules are different for special methods implicitly invoked via operator overloading, etc..)
Only if you directly assign an attribute to the instance will it mask the class attribute.
>>> foo = Foo()
>>> foo.bar
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
foo.bar
AttributeError: 'Foo' object has no attribute 'bar'
>>> foo.bar = 2
>>> Foo.bar = 88
>>> foo.bar
2
All of the above is a separate matter from bound/unbound methods. The class machinery in Python uses the descriptor protocol, so that when you access foo.bar, a new bound method instance is created on the fly. That's why you're seeing different bound method instances on your different objects. But note that, under the hood, these bound methods rely on the same code object, as defined by the method you wrote in the class:
>>> foo = Foo()
>>> foo2 = Foo()
>>> foo.blah.__func__.__code__ is foo2.blah.__func__.__code__
True
In Python, it is illegal to create a new attribute on an object instance like this:
>>> a = object()
>>> a.hhh = 1
throws
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute 'hhh'
However, for a function object, it is OK.
>>> def f():
... return 1
...
>>> f.hhh = 1
What is the rationale behind this difference?
The reason function objects support arbitrary attributes is that, before we added that feature, several frameworks (e.g. parser generator ones) were abusing function docstrings (and other attributes of function objects) to stash away per-function information that was crucial to them. The need for such association of arbitrary named attributes to function objects having been proven by example, supporting them directly in the language, rather than punting and letting (e.g.) docstrings be abused, was pretty obvious.
To support arbitrary instance attributes a type must supply every one of its instances with a __dict__ -- that's no big deal for functions (which are never tiny objects anyway), but it might well be for other objects intended to be tiny. By making the object type as light as we could, and also supplying __slots__ to allow avoiding per-instance __dict__ in subtypes of object, we supported small, specialized "value" types to the best of our ability.
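A small sketch of the kind of per-function stashing that motivated the feature (the decorator and attribute names here are hypothetical):
def route(path):
    """Tag a handler function with the path it serves."""
    def decorator(func):
        func.route_path = path  # stash per-function info on the function itself
        return func
    return decorator

@route('/hello')
def hello():
    return 'hello world'

print(hello.route_path)  # '/hello'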
Alex Martelli posted an awesome answer to your question. For anyone who is looking for a good way to accomplish arbitrary attributes on an empty object, do this:
class myobject(object):
    pass

o = myobject()
o.anything = 123
Or more efficient (and better documented) if you know the attributes:
class myobject(object):
    __slots__ = ('anything', 'anythingelse')

o = myobject()
o.anything = 123
o.anythingelse = 456
The rationale is that an instance of object is a degenerate special case. It "is" an object, but it isn't designed to be useful by itself.
Think of object as a temporary hack, bridging old-style types and classes. In Python 3.0 it will fade into obscurity because it will no longer be used as part of
class Foo(object):
    pass

f = Foo()
f.randomAttribute = 3.1415926
Here's another alternative, as short as I could make it:
>>> dummy = type('', (), {})()
>>> dummy.foo = 5
>>> dummy.foo
5
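This works because the three-argument form type(name, bases, dict) builds a new class on the fly: here an empty name, an empty bases tuple (so it implicitly inherits from object), and an empty attribute dict. A slightly more explicit spelling of the same trick:
>>> Dummy = type('Dummy', (object,), {})
>>> d = Dummy()
>>> d.foo = 5
>>> d.foo
5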