I have methods that accept dicts or other objects and the names of "fields" to fetch from those objects. If the object is a dict then the method uses __getitem__ to retrieve the named key, or else it uses getattr to retrieve the named attribute. This is pretty common in web templating languages. For example, in a Chameleon template you might have:
<p tal:content="foo.keyname">Stuff goes here</p>
If you pass in foo as a dict like {'keyname':'bar'}, then foo.keyname fetches the 'keyname' key to get 'bar'. If foo is an instance of a class like:
class Foo(object):
    keyname = 'baz'
then foo.keyname fetches the value from the keyname attribute. Chameleon itself implements that function (in the chameleon.py26 module) like this:
def lookup_attr(obj, key):
    try:
        return getattr(obj, key)
    except AttributeError as exc:
        try:
            get = obj.__getitem__
        except AttributeError:
            raise exc
        try:
            return get(key)
        except KeyError:
            raise exc
I've implemented it in my own package like:
try:
    value = obj[attribute]
except (KeyError, TypeError):
    value = getattr(obj, attribute)
The thing is, that's a pretty common pattern. I've seen that method, or one awfully similar to it, in a lot of modules. So why isn't something like it in the core of the language, or at least in one of the core modules? Failing that, is there a definitive way that code like this should be written?
I sort of half-read your question, wrote the below, and then reread your question and realized I had answered a subtly different question. But I think the below actually still provides an answer after a sort. If you don't think so, pretend instead that you had asked this more general question, which I think includes yours as a sub-question:
"Why doesn't Python provide any built-in way to treat attributes and items as interchangable?"
I've given a fair bit of thought to this question, and I think the answer is very simple. When you create a container type, it's very important to distinguish between attributes and items. Any reasonably well-developed container type will have a number of attributes -- often though not always methods -- that enable it to manage its contents in graceful ways. So for example, a dict has items, values, keys, iterkeys and so on. These attributes are all accessed using . notation. Items, on the other hand, are accessed using [] notation. So there can be no collisions.
What happens when you enable item access using . notation? Suddenly you have overlapping namespaces. How do you handle collisions now? If you subclass a dict and give it this functionality, either you can't use keys like items as a rule, or you have to create some kind of namespace hierarchy. The first option creates a rule that is onerous, hard to follow, and hard to enforce. The second option creates an annoying amount of complexity, without fully resolving the collision problem, since you still have to have an alternative interface to specify whether you want items the item or items the attribute.
Now, for certain kinds of very primitive types, this is acceptable. That's probably why there's namedtuple in the standard library, for example. (But note that namedtuple is subject to these very problems, which is probably why it was implemented as a factory function (prevents inheritance) and uses weird, private method names like _asdict.)
It's also very, very, very easy to create a subclass of object with no (public) attributes and use setattr on it. It's even pretty easy to override __getitem__, __setitem__, and __delitem__ to invoke __getattribute__, __setattr__ and __delattr__, so that item access just becomes syntactic sugar for getattr(), setattr(), etc. (Though that's a bit more questionable since it creates somewhat unexpected behavior.)
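For what it's worth, a minimal sketch of that item-as-attribute-sugar idea (my illustration, not from the original; note the caveat about unexpected behavior):

import builtins  # not actually needed; kept minimal on purpose

class AttrBag(object):
    # item access becomes sugar for attribute access; the slightly
    # unexpected behavior is that a missing key raises AttributeError,
    # not KeyError
    def __getitem__(self, key):
        return getattr(self, key)
    def __setitem__(self, key, value):
        setattr(self, key, value)
    def __delitem__(self, key):
        delattr(self, key)

bag = AttrBag()
bag['x'] = 1    # equivalent to bag.x = 1
print(bag.x)    # 1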
But for any kind of well-developed container class that you want to be able to expand and inherit from, adding new, useful attributes, a __getattr__ + __getitem__ hybrid would be, frankly, an enormous PITA.
The closest thing in the Python standard library is namedtuple: http://docs.python.org/dev/library/collections.html#collections.namedtuple
Foo = namedtuple('Foo', ['key', 'attribute'])
foo = Foo(5, attribute=13)
print foo[1]
print foo.key
Or you can easily define your own type that always actually stores into its dict but allows the appearance of attribute setting and getting:
class MyDict(dict):
    def __getattr__(self, attr):
        return self[attr]
    def __setattr__(self, attr, value):
        self[attr] = value
d = MyDict()
d.a = 3
d[3] = 'a'
print(d['a']) # 3
print(d[3]) # 'a'
print(d['b']) # raises a KeyError
But don't try d.3, because that's a syntax error. There are, of course, more complicated ways out there of making a hybrid storage type like this; search the web for many examples.
As far as how to check both, the Chameleon way looks thorough. As for why there isn't a way to do both in the standard library: ambiguity is BAD. Yes, we have duck typing and all other kinds of masquerading in Python, and classes are really just dictionaries anyway, but at some point we want different functionality from a container like a dict or list than we want from a class, with its method resolution order, overriding, etc.
You can pretty easily write your own dict subclass that natively behaves this way. A minimal implementation, which I like to call a "pile" of attributes, is like so:
class Pile(dict):
    # raise AttributeError for a missing key here to fulfill the API
    def __getattr__(self, key):
        if key in self:
            return self[key]
        else:
            raise AttributeError(key)
    def __setattr__(self, key, value):
        self[key] = value
Unfortunately if you need to be able to deal with either dictionaries or attribute-laden objects passed to you, rather than having control of the object from the beginning, this won't help.
In your situation I would probably use something very much like what you have, except break it out into a function so I don't have to repeat it all the time.
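For example, a minimal sketch of such a function (it's just your two-line pattern with a name):

def lookup(obj, name):
    # try item access first, then fall back to attribute access
    try:
        return obj[name]
    except (KeyError, TypeError):
        return getattr(obj, name)

Then port = lookup(config, 'port') works whether config is a dict or an object with attributes.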
Related
This question is meant to be more about __dir__ than about numpy.
I have a subclass of numpy.recarray (in Python 2.7, numpy 1.6.2), and I noticed recarray's field names are not listed when calling dir() on the object (and therefore IPython's autocomplete doesn't work).
Trying to fix it, I tried overriding __dir__ in my subclass, like this:
def __dir__(self):
    return sorted(set(
        super(MyRecArray, self).__dir__() +
        self.__dict__.keys() + self.dtype.fields.keys()))
which resulted in: AttributeError: 'super' object has no attribute '__dir__'.
(I found here this should actually work in python 3.3...)
As a workaround, I tried:
def __dir__(self):
    return sorted(set(
        dir(type(self)) +
        self.__dict__.keys() + self.dtype.fields.keys()))
As far as I can tell, this one works, but of course, not as elegantly.
Questions:
Is the latter solution correct in my case, i.e. for a subclass of recarray?
Is there a way to make it work in the general case? It seems to me it wouldn't work with multiple inheritance (breaking the super-call chain), and of course, for objects with no __dict__...
Do you know why recarray does not support listing its field names to begin with? Mere oversight?
A Python 2.7+ / 3.3+ class mixin that simplifies implementing the __dir__ method in subclasses. Hope it will help. (Gist.)
import six

class DirMixIn:
    """ Mix-in to make implementing __dir__ method in subclasses simpler
    """
    def __dir__(self):
        if six.PY3:
            return super(DirMixIn, self).__dir__()
        else:
            # code is based on
            # http://www.quora.com/How-dir-is-implemented-Is-there-any-PEP-related-to-that
            def get_attrs(obj):
                import types
                if not hasattr(obj, '__dict__'):
                    return []  # slots only
                if not isinstance(obj.__dict__, (dict, types.DictProxyType)):
                    raise TypeError("%s.__dict__ is not a dictionary"
                                    "" % obj.__name__)
                return obj.__dict__.keys()

            def dir2(obj):
                attrs = set()
                if not hasattr(obj, '__bases__'):
                    # obj is an instance
                    if not hasattr(obj, '__class__'):
                        # slots
                        return sorted(get_attrs(obj))
                    klass = obj.__class__
                    attrs.update(get_attrs(klass))
                else:
                    # obj is a class
                    klass = obj

                for cls in klass.__bases__:
                    attrs.update(get_attrs(cls))
                    attrs.update(dir2(cls))
                attrs.update(get_attrs(obj))
                return list(attrs)

            return dir2(self)
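And a hypothetical usage sketch for the recarray case above (my illustration; it assumes numpy is importable and that the mixin comes first in the bases so its __dir__ is found):

import numpy as np

class MyRecArray(DirMixIn, np.recarray):
    def __dir__(self):
        # DirMixIn makes the super() call work on both 2.7 and 3.3+
        return sorted(set(
            list(super(MyRecArray, self).__dir__())
            + list(self.__dict__.keys())
            + list(self.dtype.fields.keys())))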
Have you tried:
def __dir__(self):
    return sorted(set(
        dir(super(MyRecArray, self)) +
        self.__dict__.keys() + self.dtype.fields.keys()))
1 and 3: Yes, your solution is correct. recarray does not define __dir__ simply because the default implementation was okay, so they didn't bother implementing it; and numpy's devs did not design the class to be subclassed, so I don't see why they should have bothered.
It's often a bad idea to subclass built-in types or classes that are not specifically designed for inheritance, so I'd suggest you use delegation/composition instead of inheritance, except if there is a particular reason (e.g. you want to pass it to a numpy function that explicitly checks with isinstance).
No. As you pointed out, in Python 3 they changed the implementation so that there is an object.__dir__, but on other Python versions I can't see anything that you can do. Also, again, using recarray with multiple inheritance is simply crazy; things will break. Multiple inheritance should be carefully designed, and usually classes are specifically designed to be used with it (e.g. mix-ins). So I wouldn't bother treating this case, since whoever tries it will be bitten by other problems.
I don't see why you should care about classes that do not have __dict__... since your subclass has it, how would it break? When you change the subclass implementation, e.g. using __slots__, you can easily change __dir__ as well. If you want to avoid redefining __dir__ you can simply define a function that checks for __dict__, then for __slots__, etc., as sketched below. Note however that attributes can be generated in subtle ways with __getattr__ and __getattribute__, and thus you simply can't reliably catch all of them.
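A possible sketch of such a helper (my illustration; it handles the case where __slots__ is a single string, but makes no attempt at __getattr__-generated attributes):

def attr_names(obj):
    # best-effort attribute names: instance __dict__ plus any __slots__
    # declared along the MRO
    names = set()
    names.update(getattr(obj, '__dict__', {}))
    for cls in type(obj).__mro__:
        slots = getattr(cls, '__slots__', ())
        if isinstance(slots, str):
            slots = (slots,)
        names.update(slots)
    return sorted(names)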
Pardon the incompetence of style from a Python novice here.
I have a class that takes one parameter for establishing the initial data. There are two ways how the initial data can come in: either a list of strings, or a dictionary with string keys and integer values.
Right now I implement only one version of the constructor, the one that takes a dictionary as its parameter, with {} as the default value. The list-based init is implemented as a method, i.e.
myClass = MyClass()
myClass.initList(listVar)
I can surely live with this, but it certainly isn't perfect. So I decided to turn here for some Pythonic wisdom: how should such polymorphic constructors be implemented? Should I try and fail to read initData.keys() in order to sniff whether this is a dictionary? Or is sniffing parameter types to implement lousy polymorphism, where it's not welcome by design, considered non-Pythonic?
In an ideal world you'd write one constructor that could take either a list or dict without knowing the difference (i.e. duck typed). Of course, this isn't very realistic since these are pretty different ducks.
Understandably, too, you have a little heartburn about the idea of checking the actual instance types, because it breaks with the idea of duck typing. But, in python 2.6 an interesting module called abc was introduced which allows the definition of "abstract base classes". To be considered an instance of an abstract base class, one doesn't actually have to inherit from it, but rather only has to implement all its abstract methods.
The collections module includes some abstract base classes that would be of interest here, namely collections.Sequence and collections.Mapping. Thus you could write your __init__ functions like:
def __init__(self, somedata):
    if isinstance(somedata, collections.Sequence):
        # somedata is a list or some other Sequence
        pass
    elif isinstance(somedata, collections.Mapping):
        # somedata is a dict or some other Mapping
        pass
http://docs.python.org/2/library/collections.html#collections-abstract-base-classes contains the specifics of which methods are provided by each ABC. If you stick to these, then your code can now accept any object which fits one of these abstract base classes. And, as far as taking the builtin dict and list types, you can see that:
>>> isinstance([], collections.Sequence)
True
>>> isinstance([], collections.Mapping)
False
>>> isinstance({}, collections.Sequence)
False
>>> isinstance({}, collections.Mapping)
True
And, almost by accident, you just made it work for tuple too. You probably didn't care if it was really a list, just that you can read the elements out of it. But, if you had checked isinstance(somedata, list) you would have ruled out tuple. This is what using an ABC buys you.
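To make the branches concrete, here is a fleshed-out sketch (the class name and branch bodies are my illustration, not from the question):

import collections  # on Python 3.3+ these ABCs live in collections.abc

class Tally(object):
    def __init__(self, somedata):
        if isinstance(somedata, collections.Mapping):
            # a dict (or other mapping) of string keys to integer values
            self.counts = dict(somedata)
        elif isinstance(somedata, collections.Sequence):
            # a list (or other sequence) of strings, all starting at zero
            self.counts = dict.fromkeys(somedata, 0)
        else:
            raise TypeError("expected a Sequence or a Mapping")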
As @Jan-PhilipGehrcke notes, pythonic can be hard to quantify. To me it means:
easy to read
easy to maintain
simple is better than complex is better than complicated
etcetera, etcetera, and so forth (see the Zen of Python for the complete list, which you get by typing import this in the interpreter)
So, the most pythonic solution depends on what you have to do for each supported initializer, and how many of them you have. I would say if you have only a handful, and each one can be handled by only a few lines of code, then use isinstance and __init__:
class MyClass(object):
    def __init__(self, initializer):
        """
        initialize internal data structures with 'initializer'
        """
        if isinstance(initializer, dict):
            for k, v in initializer.items():
                # do something with k & v
                setattr(self, k, v)
        elif isinstance(initializer, (list, tuple)):
            for item in initializer:
                setattr(self, item, None)
On the other hand, if you have many possible initializers, or if any one of them requires a lot of code to handle, then you'll want to have one classmethod constructor for each possible init type, with the most common usage being in __init__:
class MyClass(object):
    def __init__(self, init_dict={}):
        """
        initialize internal data structures with 'init_dict'
        """
        for k, v in init_dict.items():
            # do something with k & v
            setattr(self, k, v)

    @classmethod
    def from_sequence(cls, init_list):
        """
        initialize internal data structures with 'init_list'
        """
        result = cls()
        for item in init_list:
            setattr(result, item, None)
        return result
This keeps each possible constructor simple, clean, and easy to understand.
As a side note: using mutable objects as defaults (like I do in the above __init__) needs to be done with care; the reason is that defaults are only evaluated once, and then whatever the result is will be used for every subsequent invocation. This is only a problem when you modify that object in your function, because those modifications will then be seen by every subsequent invocation -- and unless you wanted to create a cache, that's probably not the behavior you were looking for. This is not a problem in my example because I am not modifying init_dict, just iterating over it (which is a no-op if the caller hasn't replaced the default, since it's empty).
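The caching behavior is easy to demonstrate in isolation (my example):

def append_to(item, seq=[]):    # the classic mutable-default gotcha
    seq.append(item)
    return seq

print(append_to(1))    # [1]
print(append_to(2))    # [1, 2] -- the same list object is reused across calls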
There is no function overloading in Python. Using if isinstance(...) in your __init__() method would be very simple to read and understand:
class Foo(object):
    def __init__(self, arg):
        if isinstance(arg, dict):
            ...
        elif isinstance(arg, list):
            ...
        else:
            raise Exception("big bang")
You can use *args and **kwargs to do it. But if you want to know what type a parameter is, you should use type() or isinstance().
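Presumably something along these lines (my sketch of what that answer means, not code from it):

class MyClass(object):
    def __init__(self, *args, **kwargs):
        # sniff the type of the first positional argument, if any
        if args and isinstance(args[0], dict):
            self.data = dict(args[0])
        elif args and isinstance(args[0], (list, tuple)):
            self.data = dict.fromkeys(args[0], None)
        else:
            self.data = dict(kwargs)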
In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
The docs mention special methods such as __hash__, __repr__ and __len__, and I know from experience it also includes __iter__ for Python 2.7.
To quote an answer to a related question:
"Magic __methods__() are treated specially: They are internally assigned to "slots" in the type data structure to speed up their look-up, and they are only looked up in these slots."
In a quest to improve my answer to another question, I need to know: Which methods, specifically, are we talking about?
You can find an answer in the python3 documentation for object.__getattribute__, which states:
Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError. This method should return the (computed) attribute value or raise an AttributeError exception. In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).

Note: This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
Also, this page explains exactly how this "machinery" works. Fundamentally, __getattribute__ is called only when you access an attribute with the . (dot) operator (and also by hasattr, as Zagorulkin pointed out).
Note that the page does not specify which special methods are implicitly looked up, so I deem that this holds for all of them (which you may find here).
Checked in 2.7.9
Couldn't find any way to bypass the call to __getattribute__, with any of the magical methods that are found on object or type:
# Preparation step: did this from the console
# magics = set(dir(object) + dir(type))
# got 38 names, for each of the names, wrote a.<that_name> to a file
# Ended up with this:
a.__module__
a.__base__
#...
Put this at the beginning of that file, which I renamed into a proper Python module (asdf.py):
global_counter = 0

class Counter(object):
    def __getattribute__(self, name):
        # this will count how many times the method was called
        global global_counter
        global_counter += 1
        return super(Counter, self).__getattribute__(name)
a = Counter()
# after this comes the list of 38 attribute accesses
a.__module__
#...
a.__repr__
#...
print global_counter  # you're not gonna like it... it printed 38
Then I also tried to get each of those names by getattr and hasattr -> same result: __getattribute__ was called every time.
So if anyone has other ideas... I was too lazy to look inside the C code for this, but I'm sure the answer lies somewhere there.
So either there's something that I'm not getting right, or the docs are lying.
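One thing the experiment above never exercises is implicit invocation: a.__repr__ (dotted access) always goes through __getattribute__, but repr(a) does not. A minimal demonstration of the difference (my example, not from the original answer):

class C(object):
    def __getattribute__(self, name):
        print('__getattribute__: %s' % name)
        return super(C, self).__getattribute__(name)
    def __repr__(self):
        return 'a C instance'

c = C()
c.__repr__()   # explicit dotted access: prints '__getattribute__: __repr__'
repr(c)        # implicit special-method lookup: __getattribute__ is NOT called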
super().method will also bypass __getattribute__. This atrocious code will run just fine (Python 3.11).
class Base:
    def print(self):
        print("whatever")

    def __getattribute__(self, item):
        raise Exception("Don't access this with a dot!")

class Sub(Base):
    def __init__(self):
        super().print()

a = Sub()    # prints 'whatever'
a.print()    # Exception: Don't access this with a dot!
tl;dr: How come property decorators work with class-level function definitions, but not with module-level definitions?
I was applying property decorators to some module-level functions, thinking they would allow me to invoke the methods by mere attribute lookup.
This was particularly tempting because I was defining a set of configuration functions, like get_port, get_hostname, etc., all of which could have been replaced with their simpler, more terse property counterparts: port, hostname, etc.
Thus, config.get_port() would just be the much nicer config.port
I was surprised when I found the following traceback, proving that this was not a viable option:
TypeError: int() argument must be a string or a number, not 'property'
I knew I had seen some precedent for property-like functionality at module level, as I had used it for scripting shell commands using the elegant but hacky pbs library.
The interesting hack below can be found in the pbs library source code. It enables the ability to do property-like attribute lookups at module-level, but it's horribly, horribly hackish.
# this is a thin wrapper around THIS module (we patch sys.modules[__name__]).
# this is in the case that the user does a "from pbs import whatever"
# in other words, they only want to import certain programs, not the whole
# system PATH worth of commands. in this case, we just proxy the
# import lookup to our Environment class
class SelfWrapper(ModuleType):
    def __init__(self, self_module):
        # this is super ugly to have to copy attributes like this,
        # but it seems to be the only way to make reload() behave
        # nicely. if i make these attributes dynamic lookups in
        # __getattr__, reload sometimes chokes in weird ways...
        for attr in ["__builtins__", "__doc__", "__name__", "__package__"]:
            setattr(self, attr, getattr(self_module, attr))

        self.self_module = self_module
        self.env = Environment(globals())

    def __getattr__(self, name):
        return self.env[name]
Below is the code for inserting this class into the import namespace. It actually patches sys.modules directly!
# we're being run as a stand-alone script, fire up a REPL
if __name__ == "__main__":
    globs = globals()
    f_globals = {}
    for k in ["__builtins__", "__doc__", "__name__", "__package__"]:
        f_globals[k] = globs[k]
    env = Environment(f_globals)
    run_repl(env)

# we're being imported from somewhere
else:
    self = sys.modules[__name__]
    sys.modules[__name__] = SelfWrapper(self)
Now that I've seen what lengths pbs has to go through, I'm left wondering why this facility of Python isn't built into the language directly. The property decorator in particular seems like a natural place to add such functionality.
Is there any particular reason or motivation why this isn't built in directly?
This is related to a combination of two factors: first, that properties are implemented using the descriptor protocol, and second that modules are always instances of a particular class rather than being instantiable classes.
This part of the descriptor protocol is implemented in object.__getattribute__ (the relevant code is PyObject_GenericGetAttr starting at line 1319). The lookup rules go like this:
1. Search through the class mro for a type dictionary that has name
2. If the first matching item is a data descriptor, call its __get__ and return its result
3. If name is in the instance dictionary, return its associated value
4. If there was a matching item from the class dictionaries and it was a non-data descriptor, call its __get__ and return the result
5. If there was a matching item from the class dictionaries, return it
6. Raise AttributeError
The key to this is at number 3 - if name is found in the instance dictionary (as it will be with modules), then its value will just be returned - it won't be tested for descriptorness, and its __get__ won't be called. This leads to this situation (using Python 3):
>>> class F:
...     def __getattribute__(self, attr):
...         print('hi')
...         return object.__getattribute__(self, attr)
...
>>> f = F()
>>> f.blah = property(lambda: 5)
>>> f.blah
hi
<property object at 0xbfa1b0>
You can see that .__getattribute__ is being invoked, but isn't treating f.blah as a descriptor.
It is likely that the reason for the rules being structured this way is an explicit tradeoff between the usefulness of allowing descriptors on instances (and, therefore, in modules) and the extra code complexity that this would lead to.
Properties are a feature specific to classes (new-style classes specifically), so by extension the property decorator can only be applied to methods of new-style classes.
A new-style class is one that derives from object, i.e. class Foo(object):
Further info: Can modules have properties the same way that objects can?
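On Python 3.5 and later, the pbs-style hack can be reduced to reassigning the module's __class__. A minimal sketch, where config.py and its port property are hypothetical:

# config.py
import sys
from types import ModuleType

class _ConfigModule(ModuleType):
    @property
    def port(self):
        return 8080  # stand-in for whatever computation get_port() did

sys.modules[__name__].__class__ = _ConfigModule  # requires Python 3.5+

After this, import config; config.port invokes the property, because the attribute lookup now happens against the module's new type.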
I'd like to serialize Python objects to and from the plist format (this can be done with plistlib). My idea was to write a class PlistObject which wraps other objects:
def __init__(self, anObject):
    self.theObject = anObject
and provides a "write" method:
def write(self, pathOrFile):
    plistlib.writePlist(self.theObject.__dict__, pathOrFile)
Now it would be nice if the PlistObject behaved just like wrapped object itself, meaning that all attributes and methods are somehow "forwarded" to the wrapped object. I realize that the methods __getattr__ and __setattr__ can be used for complex attribute operations:
def __getattr__(self, name):
    return self.theObject.__getattr__(name)
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
How can I avoid this? If the whole idea seems like a bad one, tell me too.
Unless I'm missing something, this will work just fine:
def __getattr__(self, name):
    return getattr(self.theObject, name)
Edit: for those thinking that the lookup of self.theObject will result in an infinite recursive call to __getattr__, let me show you:
>>> class Test:
...     a = "a"
...     def __init__(self):
...         self.b = "b"
...     def __getattr__(self, name):
...         return 'Custom: %s' % name
...
>>> Test.a
'a'
>>> Test().a
'a'
>>> Test().b
'b'
>>> Test().c
'Custom: c'
__getattr__ is only called as a last resort. Since theObject can be found in __dict__, no issues arise.
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
That's why the manual suggests that you do this for all "real" attribute accesses.
theobj = object.__getattribute__(self, "theObject")
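For completeness, a sketch of how that plays out if the wrapper forwards both gets and sets (my illustration of the pattern the manual suggests, not code from the question):

class PlistObject(object):
    def __init__(self, anObject):
        # bypass our own __setattr__, which would otherwise recurse
        object.__setattr__(self, "theObject", anObject)
    def __getattr__(self, name):
        theobj = object.__getattribute__(self, "theObject")
        return getattr(theobj, name)
    def __setattr__(self, name, value):
        theobj = object.__getattribute__(self, "theObject")
        setattr(theobj, name, value)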
I'm glad to see others have been able to help you with the recursive call to __getattr__. Since you've asked for comments on the general approach of serializing to plist, I just wanted to chime in with a few thoughts.
Python's plist implementation handles basic types only, and provides no extension mechanism for you to instruct it on serializing/deserializing complex types. If you define a custom class, for example, writePlist won't be able to help, as you've discovered since you're passing the instance's __dict__ for serialization.
This has a couple implications:
You won't be able to use this to serialize any objects that contain other objects of non-basic type without converting them to a __dict__, and so-on recursively for the entire network graph.
If you roll your own network graph walker to serialize all non-basic objects that can be reached, you'll have to worry about cycles in the graph, where one object holds another in a property, which in turn holds a reference back to the first, etc.
Given that, you may wish to look at pickle instead, as it can handle all of these and more. If you need the plist format for other reasons, and you're sure you can stick to "simple" object dicts, then you may wish to just use a simple function... trying to have the PlistObject mock every possible function in the contained object is an onion with potentially many layers, as you need to handle all the possibilities of the wrapped instance.
Something as simple as this may be more pythonic, and keep the usability of the wrapped object simpler by not wrapping it in the first place:
import plistlib

def to_plist(obj, f_handle):
    plistlib.writePlist(obj.__dict__, f_handle)
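If you also need the reverse direction, a sketch along the same lines (my illustration: readPlist is the old plistlib API, replaced by plistlib.load on Python 3.4+, and cls.__new__ assumes the class tolerates skipping __init__):

def from_plist(cls, f_handle):
    obj = cls.__new__(cls)    # create an instance without running __init__
    obj.__dict__.update(plistlib.readPlist(f_handle))
    return obj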
I know that doesn't seem very sexy, but it is a lot more maintainable in my opinion than a wrapper given the severe limits of the plist format, and certainly better than artificially forcing all objects in your application to inherit from a common base class when there's nothing in your business domain that actually indicates those disparate objects are related.