I'm re-implementing __getattribute__ for a class.
I want to notice any incorrect failures to provide an attribute (some failures are expected, of course), because the __getattribute__ implementation turned out quite complex. For that, I log a warning just before raising an AttributeError whenever my code was unable to find/provide the attribute.
I'm aware:
__getattribute__ implementations are encouraged to be as small and simple as possible.
It is considered wrong for a __getattribute__ implementation to behave differently based on how/why it was called.
Code accessing the attribute can just as well try/except instead of using hasattr.
TL;DR: Nevertheless, I'd like to detect whether a call to __getattribute__ was made due to hasattr (versus a "genuine" attempt at accessing the attribute).
This is not possible, even through stack inspection. hasattr produces no frame object in the Python call stack, as it is written in C, and trying to inspect the last Python frame to guess whether it's suspended in the middle of a hasattr call is prone to all kinds of false negatives and false positives.
If you're absolutely determined to make your best shot at it anyway, the most reliable (but still fragile) kludge I can think of is to monkey-patch builtins.hasattr with a Python function that does produce a Python stack frame:
import builtins
import inspect
import types

_builtin_hasattr = builtins.hasattr
if not isinstance(_builtin_hasattr, types.BuiltinFunctionType):
    raise Exception('hasattr already patched by someone else!')

def hasattr(obj, name):
    return _builtin_hasattr(obj, name)

builtins.hasattr = hasattr

def probably_called_from_hasattr():
    # Caller's caller's frame.
    frame = inspect.currentframe().f_back.f_back
    return frame.f_code is hasattr.__code__
Calling probably_called_from_hasattr inside __getattribute__ will then test if your __getattribute__ was probably called from hasattr. This avoids any need to assume that the calling code used the name "hasattr", or that use of the name "hasattr" corresponds to this particular __getattribute__ call, or that the hasattr call originated inside Python-level code instead of C.
The primary sources of fragility here are if someone saved a reference to the real hasattr before the monkey-patch went through, or if someone else monkey-patches hasattr (such as if someone copy-pastes this code into another file in the same program). The isinstance check attempts to catch most cases of someone else monkey-patching hasattr before us, but it's not perfect.
Additionally, if hasattr on an object written in C triggers attribute access on your object, that will look like your __getattribute__ was called from hasattr. This is the most likely way to get false positives; everything in the previous paragraph would give false negatives. You can protect against that by checking that the entry for obj in the hasattr frame's f_locals is the object it should be.
Finally, if your __getattribute__ was called from a decorator-created wrapper, subclass __getattribute__, or something similar, that will not count as a call from hasattr, even if the wrapper or override was called from hasattr, even if you want it to count.
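For illustration, a minimal sketch of using the helper inside __getattribute__ (MyClass and the logging call are placeholders, not part of the patch above):

import logging

class MyClass:
    def __getattribute__(self, name):
        try:
            return object.__getattribute__(self, name)
        except AttributeError:
            if not probably_called_from_hasattr():
                # Warn only for "genuine" accesses, not hasattr probes.
                logging.warning('could not provide attribute %r', name)
            raise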
You can use sys._getframe to get the caller frame and use inspect.getframeinfo to get the line of code that makes the call, and then use some sort of parsing mechanism such as regex (you can't use ast.parse since the one line of code is often an incomplete statement) to see if hasattr is the caller. It isn't very robust but it should work in most reasonable cases:
import inspect
import sys
import re

class A:
    def __getattribute__(self, item):
        if re.search(r'\bhasattr\b', inspect.getframeinfo(sys._getframe(1)).code_context[0]):
            print('called by hasattr')
        else:
            print('called by something else')

hasattr(A(), 'foo')
getattr(A(), 'foo')
This outputs:
called by hasattr
called by something else
I read a bit on python's object attribute lookup (here: https://blog.ionelmc.ro/2015/02/09/understanding-python-metaclasses/#object-attribute-lookup).
Seems pretty straightforward, so I tried it out (Python 3):
class A:
    def __getattr__(self, attr):
        return (1, 2, 3)

a = A()
a.foobar  # returns (1, 2, 3) as expected
a.__getattribute__('foobar')  # raises AttributeError
My question is, aren't the two supposed to be identical?
Why does the second one raise an attribute error?
So apparently the answer is that the logic for a.foobar IS different from the logic for a.__getattribute__('foobar'). According to the data model: a.foobar calls a.__getattribute__('foobar'), and if that raises an AttributeError, it calls a.__getattr__('foobar').
So it seems the article has a mistake in their diagram. Is this correct?
And another question: Where does the real logic for a.foobar sit? I thought it was in __getattribute__ but apparently not entirely.
Edit:
Not a duplicate of
Difference between __getattr__ vs __getattribute__.
I am asking here what the difference is between object.foo and object.__getattribute__("foo"). This is different from __getattr__ vs __getattribute__, which is trivial...
It's easy to get the impression that __getattribute__ is responsible for more than it really is. thing.attr doesn't directly translate to thing.__getattribute__('attr'), and __getattribute__ is not responsible for calling __getattr__.
The fallback to __getattr__ happens in the part of the attribute access machinery that lies outside __getattribute__. The attribute lookup process works like this:
1. Find the __getattribute__ method through a direct search of the object's type's MRO, bypassing the regular attribute lookup process.
2. Try __getattribute__.
3. If __getattribute__ returned something, the attribute lookup process is complete, and that's the attribute value.
4. If __getattribute__ raised a non-AttributeError, the attribute lookup process is complete, and the exception propagates out of the lookup.
5. Otherwise, __getattribute__ raised an AttributeError. The lookup continues.
6. Find the __getattr__ method the same way we found __getattribute__.
7. If there is no __getattr__, the attribute lookup process is complete, and the AttributeError from __getattribute__ propagates.
8. Try __getattr__, and return or raise whatever __getattr__ returns or raises.
At least, in terms of the language semantics, it works like that. In terms of the low-level implementation, some of these steps may be optimized out in cases where they're unnecessary, and there are C hooks like tp_getattro that I haven't described. You don't need to worry about that kind of thing unless you want to dive into the CPython interpreter source code.
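To make the division of labor concrete, here is a rough Python-level sketch of those steps (illustrative only; the real machinery lives in C, and emulate_attribute_lookup is a made-up name):

def emulate_attribute_lookup(obj, name):
    tp = type(obj)
    try:
        # Steps 1-2: find __getattribute__ on the type and try it.
        return tp.__getattribute__(obj, name)
    except AttributeError:
        # Step 6: find __getattr__ the same way, on the type.
        hook = getattr(tp, '__getattr__', None)
        if hook is None:
            raise  # Step 7: no __getattr__, so the AttributeError propagates.
        # Step 8: the fallback happens here, outside __getattribute__.
        return hook(obj, name)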
The classify_class_attrs function from the inspect module can be used to determine what kind of object each of a class's attributes is, including whether a function is an instance method, a class method, or a static method. Here is an example:
from inspect import classify_class_attrs

class Example(object):
    @classmethod
    def my_class_method(cls):
        pass

    @staticmethod
    def my_static_method():
        pass

    def my_instance_method(self):
        pass

print classify_class_attrs(Example)
This will output a list of Attribute objects for each attribute on Example, with metadata about the attribute. The relevant ones in this case are:
Attribute(name='my_class_method', kind='class method', defining_class=<class '__main__.Example'>, object=<classmethod object at 0x100535398>)
Attribute(name='my_instance_method', kind='method', defining_class=<class '__main__.Example'>, object=<unbound method Example.my_instance_method>)
Attribute(name='my_static_method', kind='static method', defining_class=<class '__main__.Example'>, object=<staticmethod object at 0x100535558>)
However, it seems that many objects in Python's standard library can't be introspected this way. I'm guessing this has something to do with the fact that many of them are implemented in C. For example, datetime.datetime.now is described with this Attribute object by inspect.classify_class_attrs:
Attribute(name='now', kind='method', defining_class=<type 'datetime.datetime'>, object=<method 'now' of 'datetime.datetime' objects>)
If we compare this to the metadata returned about the attributes on Example, you'd probably draw the conclusion that datetime.datetime.now is an instance method. But it actually behaves as a class method!
from datetime import datetime
print datetime.now() # called from the class: 2014-09-12 16:13:33.890742
print datetime.now().now() # called from a datetime instance: 2014-09-12 16:13:33.891161
Is there a reliable way to determine whether a method on a stdlib class is a static, class, or instance method?
I think you can get much of what you want, distinguishing five kinds, without relying on anything that isn't documented by inspect:
Python instance methods
Python class methods
Python static methods
Builtin instance methods
Builtin class methods or static methods
But you can't distinguish those last two from each other without using CPython-specific implementation details.
(As far as I know, only 3.x has any builtin static methods in the stdlib… but of course even in 2.x, someone could always define one in an extension module.)
The details of what's available in inspect and even what it means are a little different in each version of Python, partly because things have changed between 2.x and 3.x, partly because inspect is basically a bunch of heuristics that have gradually improved over time.
But at least for CPython 2.6 and 2.7 and 3.3-3.5, the simplest way to distinguish builtin instance methods from the other two types is isbuiltin on the method looked up from the class. For a static method or class method, this will be True; for an instance method, False. For example:
>>> inspect.isbuiltin(str.maketrans)
True
>>> inspect.isbuiltin(datetime.datetime.now)
True
>>> inspect.isbuiltin(datetime.datetime.ctime)
False
Why does this work? Well, isbuiltin will:
Return true if the object is a built-in function or a bound built-in method.
When looked up on an instance, either a regular method or a classmethod-like method is bound. But when looked up on the class, a regular method is unbound, while a classmethod-like method is bound (to the class). And of course a staticmethod-like method ends up as a plain-old function when looked up either way. So, it's a bit indirect, but it will always be correct.*
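You can see the binding difference directly in the reprs (from a CPython 3 session; addresses elided):

>>> str.maketrans
<built-in function maketrans>
>>> datetime.datetime.now
<built-in method now of type object at 0x...>
>>> datetime.datetime.ctime
<method 'ctime' of 'datetime.datetime' objects>

The static method looks like a plain function either way, the class method shows up bound to the class, and the instance method is unbound when looked up on the class.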
What about class methods vs. static methods?
In CPython 3.x, builtin static and class method descriptors both return the exact same type when looked up on their class, and none of the documented attributes can be used to distinguish them either. And even if this weren't true, I think the way the reference is written, it's guaranteed that no functions in inspect would be able to distinguish them.
What if we turn to the descriptors themselves? Yes, there are ways we can distinguish them… but I don't think it's something guaranteed by the language:
>>> callable(str.__dict__['maketrans'])
False
>>> callable(datetime.datetime.__dict__['now'])
True
Why does this work? Well, static methods just use a staticmethod descriptor, exactly like in Python (but wrapping a builtin function instead of a function). But class and instance methods use a special descriptor type, instead of using classmethod wrapping a (builtin) function and the (builtin) function itself, as Python class and instance methods do. These special descriptor types, classmethod_descriptor and method_descriptor, are unbound (class and instance) methods, as well as being the descriptors that bind them. There are historical/implementation reasons for this to be true, but I don't think there's anything in the language reference that requires it to be true, or even implies it.
And if you're willing to rely on implementation artifacts, isinstance(m, staticmethod) seems a lot simpler…
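Putting the pieces together, here's a sketch of how you might classify a method named name on a builtin class cls (classify_builtin_method is a made-up helper, and the static-vs-class split relies on the CPython-specific details discussed above):

import inspect

def classify_builtin_method(cls, name):
    # Find the raw descriptor by searching the MRO directly.
    for klass in cls.__mro__:
        if name in klass.__dict__:
            descriptor = klass.__dict__[name]
            break
    else:
        raise AttributeError(name)
    # Unbound when looked up on the class -> builtin instance method.
    if not inspect.isbuiltin(getattr(cls, name)):
        return 'instance method'
    # CPython detail: builtin static methods use a real staticmethod wrapper.
    if isinstance(descriptor, staticmethod):
        return 'static method'
    return 'class method'

For example, classify_builtin_method(datetime.datetime, 'now') gives 'class method', while classify_builtin_method(str, 'maketrans') gives 'static method' on the CPython versions discussed here.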
All that being said, are there any implementations besides CPython that have both builtin staticmethods and classmethods? If not, remember that practicality beats purity…
* What it's really testing for is whether the thing is callable without an extra argument, but that's basically the same thing as the documented "function or bound method"; either way, it's what you want.
I've gotten myself in trouble a few times now by accidentally referencing global variables in a function or method definition.
My question is: is there any way to disallow python from letting me reference a global variable? Or at least warn me that I am referencing a global variable?
x = 123

def myfunc():
    print x  # throw a warning or something!!!
Let me add that the typical situation where this arises for me is using IPython as an interactive shell. I use 'execfile' to execute a script that defines a class. In the interpreter, I access the class variable directly to do something useful, then decide I want to add that as a method in my class. When I was in the interpreter, I was referencing the class variable. However, when it becomes a method, it needs to reference 'self'. Here's an example.
class MyClass:
    a = 1
    b = 2

    def add(self):
        return a + b

m = MyClass()
Now in my interpreter I run the script with execfile('script.py'), inspect my class, and type m.a * m.b. I decide that would be a useful method to have, so I modify my code to be, with the unintentional copy/paste error:
class MyClass:
    a = 1
    b = 2

    def add(self):
        return a + b

    def mult(self):
        return m.a * m.b  # I really meant this to be self.a * self.b
This of course still executes in IPython, but it can really confuse me since it is now referencing the previously defined global variable!
Maybe someone has a suggestion given my typical IPython workflow.
First, you probably don't want to do this. As Martijn Pieters points out, many things, like top-level functions and classes, are globals.
You could filter this for only non-callable globals. Functions, classes, builtin-function-or-methods that you import from a C extension module, etc. are callable. You might also want to filter out modules (anything you import is a global). That still won't catch cases where you, say, assign a function to another name after the def. You could add some kind of whitelisting for that (which would also allow you to create global "constants" that you can use without warnings). Really, anything you come up with will be a very rough guide at best, not something you want to treat as an absolute warning.
Also, no matter how you do it, trying to detect implicit global access, but not explicit access (with a global statement) is going to be very hard, so hopefully that isn't important.
There is no obvious way to detect all implicit uses of global variables at the source level.
However, it's pretty easy to do with reflection from inside the interpreter.
The documentation for the inspect module has a nice chart that shows you the standard members of various types. Note that some of them have different names in Python 2.x and Python 3.x.
This function will get you a list of all the global names accessed by a bound method, unbound method, function, or code object in both versions:
def get_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    thing = getattr(thing, '__func__', thing)
    thing = getattr(thing, 'func_code', thing)
    thing = getattr(thing, '__code__', thing)
    return thing.co_names
If you want to only handle non-callables, you can filter it:
def get_callable_globals(thing):
    thing = getattr(thing, 'im_func', thing)
    func_globals = getattr(thing, 'func_globals', {})
    thing = getattr(thing, 'func_code', thing)
    return [name for name in thing.co_names
            if callable(func_globals.get(name))]
This isn't perfect (e.g., if a function's globals have a custom builtins replacement, we won't look it up properly), but it's probably good enough.
A simple example of using it:
>>> def foo(myparam):
...     myglobal
...     mylocal = 1
...
>>> print get_globals(foo)
('myglobal',)
And you can pretty easily import a module and recursively walk its callables and call get_globals() on each one, which will work for the major cases (top-level functions, and methods of top-level and nested classes), although it won't work for anything defined dynamically (e.g., functions or classes defined inside functions).
If you only care about CPython, another option is to use the dis module to scan all the bytecode in a module, or .pyc file (or class, or whatever), and log each LOAD_GLOBAL op.
One major advantage of this over the inspect method is that it will find functions that have been compiled, even if they haven't been created yet.
The disadvantage is that there is no way to look up the names (how could there be, if some of them haven't even been created yet?), so you can't easily filter out callables. You can try to do something fancy, like connecting up LOAD_GLOBAL ops to corresponding CALL_FUNCTION (and related) ops, but… that's starting to get pretty complicated.
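For instance, a minimal Python 3 sketch of the simple per-function scan (find_global_loads is a made-up name, and the fancier CALL_FUNCTION matching is left out):

import dis

def find_global_loads(func):
    # Report the name used by every LOAD_GLOBAL op in the compiled bytecode.
    return [ins.argval for ins in dis.get_instructions(func)
            if ins.opname == 'LOAD_GLOBAL']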
Finally, if you want to hook things dynamically, you can always replace globals with a wrapper that warns every time you access it. For example:
import collections
import sys

class GlobalsWrapper(collections.MutableMapping):
    def __init__(self, globaldict):
        self.globaldict = globaldict
    # ... implement at least __setitem__, __delitem__, __iter__, __len__
    # in the obvious way, by delegating to self.globaldict
    def __getitem__(self, key):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
        return self.globaldict[key]

globals_wrapper = GlobalsWrapper(globals())
Again, you can filter on non-callables pretty easily:
def __getitem__(self, key):
    value = self.globaldict[key]
    if not callable(value):
        print >>sys.stderr, 'Warning: accessing global "{}"'.format(key)
    return value
Obviously for Python 3 you'd need to change the print statement to a print function call.
You can also raise an exception instead of warning pretty easily. Or you might want to consider using the warnings module.
You can hook this into your code in various ways. The most obvious one is an import hook that gives each new module a GlobalsWrapper around its normally-built globals. I'm not sure how that will interact with C extension modules, but my guess is that it will either work or be harmlessly ignored, either of which is probably fine. The only problem is that this won't affect your top-level script. If that's important, you can write a wrapper script that execfiles the main script with a GlobalsWrapper, or something like that.
I've been struggling with a similar challenge (especially in Jupyter notebooks) and created a small package to limit the scope of functions.
>>> from localscope import localscope
>>> a = 'hello world'
>>> @localscope
... def print_a():
...     print(a)
Traceback (most recent call last):
  ...
ValueError: `a` is not a permitted global
The @localscope decorator uses Python's disassembler to find every LOAD_GLOBAL (global variable access) or LOAD_DEREF (closure access) instruction in the decorated function. If the variable to be loaded is a builtin function, is explicitly listed as an exception, or satisfies a predicate, the variable is permitted. Otherwise, an exception is raised.
Note that the decorator analyses the code statically. Consequently, it does not have access to the values of variables accessed by closure.
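For comparison, a minimal sketch of the same idea (this is not the actual localscope implementation; no_globals and its permit-builtins-only rule are simplifying assumptions):

import builtins
import dis

def no_globals(func):
    # Statically reject any global load that doesn't resolve to a builtin.
    for ins in dis.get_instructions(func):
        if ins.opname == 'LOAD_GLOBAL' and not hasattr(builtins, ins.argval):
            raise ValueError('`%s` is not a permitted global' % ins.argval)
    return func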
In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
The docs mention special methods such as __hash__, __repr__ and __len__, and I know from experience it also includes __iter__ for Python 2.7.
To quote an answer to a related question:
"Magic __methods__() are treated specially: They are internally assigned to "slots" in the type data structure to speed up their look-up, and they are only looked up in these slots."
In a quest to improve my answer to another question, I need to know: Which methods, specifically, are we talking about?
You can find an answer in the python3 documentation for object.__getattribute__, which states:
Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the latter will not be called unless __getattribute__() either calls it explicitly or raises an AttributeError. This method should return the (computed) attribute value or raise an AttributeError exception. In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, object.__getattribute__(self, name).
Note
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
Also, this page explains exactly how this "machinery" works. Fundamentally, __getattribute__ is called only when you access an attribute with the . (dot) operator (and also by hasattr, as Zagorulkin pointed out).
Note that the page does not specify which special methods are implicitly looked up, so I deem that this holds for all of them (which you may find here).
Checked in 2.7.9
Couldn't find any way to bypass the call to __getattribute__, with any of the magical methods that are found on object or type:
# Preparation step: did this from the console
# magics = set(dir(object) + dir(type))
# got 38 names, for each of the names, wrote a.<that_name> to a file
# Ended up with this:
a.__module__
a.__base__
#...
Put this at the beginning of that file, which I renamed into a proper Python module (asdf.py):
global_counter = 0

class Counter(object):
    def __getattribute__(self, name):
        # this will count how many times the method was called
        global global_counter
        global_counter += 1
        return super(Counter, self).__getattribute__(name)

a = Counter()
# after this comes the list of 38 attribute accesses
a.__module__
# ...
a.__repr__
# ...
print global_counter  # you're not gonna like it... it printed 38
Then I also tried to get each of those names by getattr and hasattr -> same result. __getattribute__ was called every time.
So if anyone has other ideas... I was too lazy to look inside the C code for this, but I'm sure the answer lies somewhere there.
So either there's something that I'm not getting right, or the docs are lying.
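The resolution suggested by the docs quoted above is that the bypass applies only to implicit invocation via language syntax or built-in functions, not to explicit dotted access like a.__repr__. A minimal sketch of the contrast, using a counter like the one above:

calls = 0

class C(object):
    def __getattribute__(self, name):
        global calls
        calls += 1
        return super(C, self).__getattribute__(name)

c = C()
c.__repr__()  # explicit dotted access: __getattribute__ IS called (calls > 0)
calls = 0
repr(c)       # implicit invocation: __getattribute__ is bypassed (calls stays 0)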
super().method will also bypass __getattribute__. This atrocious code will run just fine (Python 3.11).
class Base:
    def print(self):
        print("whatever")

    def __getattribute__(self, item):
        raise Exception("Don't access this with a dot!")

class Sub(Base):
    def __init__(self):
        super().print()
a = Sub()
# prints 'whatever'
a.print()
# Exception Don't access this with a dot!
I'd like to serialize Python objects to and from the plist format (this can be done with plistlib). My idea was to write a class PlistObject which wraps other objects:
def __init__(self, anObject):
    self.theObject = anObject
and provides a "write" method:
def write(self, pathOrFile):
    plistlib.writePlist(self.theObject.__dict__, pathOrFile)
Now it would be nice if the PlistObject behaved just like the wrapped object itself, meaning that all attributes and methods are somehow "forwarded" to the wrapped object. I realize that the methods __getattr__ and __setattr__ can be used for complex attribute operations:
def __getattr__(self, name):
    return self.theObject.__getattr__(name)
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
How can I avoid this? If the whole idea seems like a bad one, tell me too.
Unless I'm missing something, this will work just fine:
def __getattr__(self, name):
    return getattr(self.theObject, name)
Edit: for those thinking that the lookup of self.theObject will result in an infinite recursive call to __getattr__, let me show you:
>>> class Test:
...     a = "a"
...     def __init__(self):
...         self.b = "b"
...     def __getattr__(self, name):
...         return 'Custom: %s' % name
...
>>> Test.a
'a'
>>> Test().a
'a'
>>> Test().b
'b'
>>> Test().c
'Custom: c'
__getattr__ is only called as a last resort. Since theObject can be found in __dict__, no issues arise.
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
That's why the manual suggests that you do this for all "real" attribute accesses.
theobj = object.__getattribute__(self, "theObject")
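For instance, a forwarding wrapper built on that pattern might look like this (a sketch, not code from the question; note it forwards every lookup, so the wrapper's own methods such as write would need similar special-casing):

class PlistObject(object):
    def __init__(self, anObject):
        self.theObject = anObject  # plain assignment; __setattr__ isn't overridden

    def __getattribute__(self, name):
        # Fetch our own attribute without re-entering __getattribute__.
        theobj = object.__getattribute__(self, "theObject")
        if name == "theObject":
            return theobj
        return getattr(theobj, name)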
I'm glad to see others have been able to help you with the recursive call to __getattr__. Since you've asked for comments on the general approach of serializing to plist, I just wanted to chime in with a few thoughts.
Python's plist implementation handles basic types only, and provides no extension mechanism for you to instruct it on serializing/deserializing complex types. If you define a custom class, for example, writePlist won't be able to help, as you've discovered since you're passing the instance's __dict__ for serialization.
This has a couple implications:
You won't be able to use this to serialize any objects that contain other objects of non-basic type without converting them to a __dict__, and so on recursively for the entire object graph.
If you roll your own object graph walker to serialize all non-basic objects that can be reached, you'll have to worry about cycles in the graph, where one object has another in a property, which in turn holds a reference back to the first, etc.
Given then, you may wish to look at pickle instead as it can handle all of these and more. If you need the plist format for other reasons, and you're sure you can stick to "simple" object dicts, then you may wish to just use a simple function... trying to have the PlistObject mock every possible function in the contained object is an onion with potentially many layers as you need to handle all the possibilities of the wrapped instance.
Something as simple as this may be more pythonic, and keep the usability of the wrapped object simpler by not wrapping it in the first place:
from plistlib import writePlist

def to_plist(obj, f_handle):
    writePlist(obj.__dict__, f_handle)
I know that doesn't seem very sexy, but it is a lot more maintainable in my opinion than a wrapper given the severe limits of the plist format, and certainly better than artificially forcing all objects in your application to inherit from a common base class when there's nothing in your business domain that actually indicates those disparate objects are related.