Tool to inspect Python objects without changing them - python

While trying to track down a resource leak in a Python program this evening, it occurred to me that modern ORMs make the job quite difficult. An object which is, in fact, sitting alone in memory with no children will suddenly appear to have a dozen associated objects as you start checking its attributes because, of course, each attribute dereference invokes a descriptor that pulls in additional information on-the-fly.
I even noticed that doing a simple print of one particular object wound up doing a database query and pulling more linked objects into memory — ruining the careful reference counts that I had been computing — because its __repr__() built the displayed name out of a few associated objects.
There are, it happens, a few techniques that allow objects to be inspected without affecting them — operations like type(obj) and id(obj) and obj.__dict__. (But not printing the __dict__, since that invokes __repr__() on every single value in the dictionary!) Has anyone ever combined these few “safe” inspection methods to support, at a prompt like the Python debugger, convenient inspection and exploration of a Python object graph so that I can see where these files are being held open, running me out of file descriptors?
I need, essentially, an anti-Heisenberg tool, that prevents my acts of inspection from having any side effects!
The “inspect” module:
One answer suggests that I try the inspect() module, but it looks like it dereferences every attribute on the object you supply:
import inspect
class Thing(object):
#property
def one(self):
print 'one() got called!'
return 1
t = Thing()
inspect.getmembers(t)
This outputs:
one() got called!
[('__class__', <class '__main__.Thing'>),
('__delattr__', <method-wrapper '__delattr__'…),
…
('one', 1)]

Python 3.2 now provides inspect.getattr_static() for precisely this kind of use case:
http://docs.python.org/py3k/library/inspect#fetching-attributes-statically
The source code link from the top of the docs page should make it fairly easy to backport that functionality to earlier versions (although keep in mind that as 3.x stdlib code, it isn't built to handle old-style classes).
I'm not aware of any existing tools that combine that kind of technique with inspection of obj.__dict__ to navigate a whole object graph without invoking descriptors, though.

I have no clue how safe the various methods are (it seems fairly dependent on your particular situation) but the inspect module provides a tremendous number of inspection tools.

Related

Can you safely change a Python object's type in a C extension?

Question
Suppose that I have implemented two Python types using the C extension API and that the types are identical (same data layouts/C struct) with the exception of their names and a few methods. Assuming that all methods respect the data layout, can you safely change the type of an object from one of these types into the other in a C function?
Notably, as of Python 3.9, there appears to be a function Py_SET_TYPE, but the documentation is not clear as to whether/when this is safe to do. I'm interested in knowing both how to use this function safely and whether types can be safely changed prior to version 3.9.
Motivation
I'm writing a Python C extension to implement a Persistent Hash Array Mapped Trie (PHAMT); in case it's useful, the source code is here (as of writing, it is at this commit). A feature I would like to add is the ability to create a Transient Hash Array Mapped Trie (THAMT) from a PHAMT. THAMTs can be created from PHAMTs in O(1) time and can be mutated in-place efficiently. Critically, THAMTs have the exact same underlying C data-structure as PHAMTs—the only real difference between a PHAMT and a THAMT is a few methods encapsulated by their Python types. This common structure allows one to very efficiently turn a THAMT back into a PHAMT once one has finished performing a set of edits. (This pattern typically reduces the number of memory allocations when performing a large number of updates to a PHAMT).
A very convenient way to implement the conversion from THAMT to PHAMT would be to simply change the type pointers of the THAMT objects from the THAMT type to the PHAMT type. I am confident that I can write code that safely navigates this change, but I can imagine that doing so might, for example, break the Python garbage collector.
(To be clear: the motivation is just context as to how the question arose. I'm not looking for help implementing the structures described in the Motivation, I'm looking for an answer to the Question, above.)
The supported way
It is officially possible to change an object's type in Python, as long as the memory layouts are compatible... but this is mostly limited to types not implemented in C. With some restrictions, it is possible to do
# Python attribute assignment, not C struct member assignment
obj.__class__ = some_new_class
to change an object's class, with one of the restrictions being that both the old and new classes must be "heap types", which all classes implemented in Python are and most classes implemented in C are not. (types.ModuleType and subclasses of that type are also specifically permitted, despite types.ModuleType not being a heap type. See the source for exact restrictions.)
If you want to create a heap type from C, you can, but the interface is pretty different from the normal way of defining Python types from C. Plus, for __class__ assignment to work, you have to not set the Py_TPFLAGS_IMMUTABLETYPE flag, and that means that people will be able to monkey-patch your classes in ways you might not like (or maybe you see that as an upside).
If you want to go that route, I suggest looking at the CPython 3.10 _functools module source code for an example. (They set the Py_TPFLAGS_IMMUTABLETYPE flag, which you'll have to make sure not to do.)
The unsupported way
There was an attempt at one point to allow __class__ assignment for non-heap types, as long as the memory layouts worked. It got abandoned because it caused problems with some built-in immutable types, where the interpreter likes to reuse instances. For example, allowing (1).__class__ = SomethingElse would have caused a lot of problems. You can read more in the big comment in the source code for the __class__ setter. (The comment is slightly out of date, particularly regarding the Py_TPFLAGS_IMMUTABLETYPE flag, which was added after the comment was written.)
As far as I know, this was the only problem, and I don't think any more problems have been added since then. The interpreter isn't going to aggressively reuse instances of your classes, so as long as you're not doing anything like that, and the memory layouts are compatible, I think changing the type of your objects should work for now, even for non-heap-types. However, it is not officially supported, so even if I'm right about this working for now, there's no guarantee it'll keep working.
Py_SET_TYPE only sets an object's type pointer. It doesn't do any refcount fixing that might be needed. It's a very low-level operation. If neither the old class nor the new class are heap types, no extra refcount fixing is needed, but if the old class is a heap type, you will have to decref the old class, and if the new class is a heap type, you will have to incref the new class.
If you need to decref the old class, make sure to do it after changing the object's class and possibly incref'ing the new class.
According to the language reference, chapter 3 "Data model" (see here):
An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The type() function returns an object’s type (which is an object itself). Like its identity, an object’s type is also unchangeable.[1]
which, to my mind states that the type must never change, and changing it would be illegal as it would break the language specification. The footnote however states that
[1] It is possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly.
I don't know of any method to change the type of an object from within python itself, so the "possible" may indeed refer to the CPython function.
As far as I can see a PyObject is defined internally as a
struct _object {
_PyObject_HEAD_EXTRA
Py_ssize_t ob_refcnt;
PyTypeObject *ob_type;
};
So the reference counting should still work. On the other hand you will segfault the interpreter if you set the type to something that is not a PyTypeObject, or if the pointer is free()d, so the usual caveats.
Apart from that I agree that the specification is a little ambiguous, but the question of "legality" may not have a good answer. The long and short of it seems to me to be "do not change types unless you know what your are doing, and if you are not hacking on CPython itself you do not know what you are doing".
Edit: The Py_SET_TYPE function was added in Python 3.9 based on this commit. Apparently, people used to just set the type using
Py_TYPE(obj) = typeobj;
So the inclusion (without being formerly announced as far as I can see) is more akin to adding a convenience function.

subclassing dict; dict.update returns incorrrect value - python bug?

I needed to make a class that extended dict and ran into an interesting problem illustrated by the dumb example in the image below.
Why is d.update() ignoring the class's __getitem__?
EDIT: This is in python2.7 which does not appear to contain collections.UserDict
Thinking UserDict.UserDict is the equivalent I tried this, and it gets closer, but still behaves interestingly.
This is an example of the open-closed-principle (the class is open for extension but closed for modification). It is good thing to have because it allows subclassers to extend or override a method without unintentionally triggering behavior changes in others and without breaking the classes's invariants.
We even do this in pure python code as well; for example, inside the pure python ordered dict code, the class local call from __init__() to update() is done using name mangling. This allows a subclasser to override update() without accidentally breaking __init__().
Sometimes, this is inconvenient. It means that a subclasser has to override every method whose behavior they want to change including get(), update(), and others. However, there are offsetting benefits (protection of internal invariants, preventing implementation details from leaking from the abstraction, and allowing users to assume the methods are independent of one another).
This style (chosen by Guido from the outset) is the default for the builtin types (otherwise we would forever be fighting segfaulting invariant violations) and for some pure python classes.
We do document when there is a departure from the default. For example, the cmd module uses the framework design pattern, letting the user define various do_action() methods. Also, some of the http modules do the same, specifically documenting that a user's do_GET() method is called and that is how you attach customized HTTP event handlers.
In the absence of specifically documented method hooks (i.e. those listed above or methods like dict.__missing__(), a subclasser should presume method independence. Otherwise, how are you to know whether __getitem__() calls get() under the hood or vice-versa?
FWIW, this isn't unique to Python. It comes up quite a bit in object oriented programming. Correctly designed classes either document root methods that affect the behavior of other methods or they are presumed to be independent.
There may need to be a FAQ for this, but nothing is broken or wrong here (other than Python having way too many dict variants to chose from). If someone mistakenly assumes or believes that __getitem__() must be called by the other accessor methods, they find out very quickly that assumption is wrong (that is if they run even minimal tests on the code).

Python: is there a use case for changing an instance's class?

Related: Python object conversion
I recently learned that Python allows you to change an instance's class like so:
class Robe:
pass
class Dress:
pass
r = Robe()
r.__class__ = Dress
I'm trying to figure out whether there is a case where 'transmuting' an object like this can be useful. I've messed around with this in IDLE, and one thing I've noticed is that assigning a different class doesn't call the new class's __init__ method, though this can be done explicitly if needed.
Virtually every use case I can think of would be better served by composition, but I'm a coding newb so what do I know. ;)
There is rarely a good reason to do this for unrelated classes, like Robe and Dress in your example. Without a bit of work, it's hard to ensure that the object you get in the end is in a sane state.
However, it can be useful when inheriting from a base class, if you want to use a non-standard factory function or constructor to build the base object. Here's an example:
class Base(object):
pass
def base_factory():
return Base() # in real code, this would probably be something opaque
def Derived(Base):
def __new__(cls):
self = base_factory() # get an instance of Base
self.__class__ = Derived # and turn it into an instance of Derived
return self
In this example, the Derived class's __new__ method wants to construct its object using the base_factory method which returns an instance of the Base class. Often this sort of factory is in a library somewhere, and you can't know for certain how it's making the object (you can't just call Base() or super(Derived, cls).__new__(cls) yourself to get the same result).
The instance's __class__ attribute is rewritten so that the result of calling Derived.__new__ will be an instance of the Derived class, which ensures that it will have the Derived.__init__ method called on it (if such a method exists).
I remember using this technique ages ago to “upgrade” existing objects after recognizing what kind of data they hold. It was a part of an experimental XMPP client. XMPP uses many short XML messages (“stanzas”) for communication.
When the application received a stanza, it was parsed into a DOM tree. Then the application needed to recognize what kind of stanza it is (a presence stanza, message, automated query etc.). If, for example, it was recognized as a message stanza, the DOM object was “upgraded” to a subclass that provided methods like “get_author”, “get_body” etc.
I could of course just make a new class to represent a parsed message, make new object of that class and copy the relevant data from the original XML DOM object. There were two benefits of changing object's class in-place, though. Firstly, XMPP is a very extensible standard, and it was useful to still have an easy access to the original DOM object in case some other part of the code found something useful there, or while debugging. Secondly, profiling the code told me that creating a new object and explicitly copying data is much slower than just reusing the object that would be quickly destroyed anyway—the difference was enough to matter in XMPP, which uses many short messages.
I don't think any of these reasons justifies the use of this technique in production code, unless maybe you really need the (not that big) speedup in CPython. It's just a hack which I found useful to make code a bit shorter and faster in the experimental application. Note also that this technique will easily break JIT engines in non-CPython implementations, making the code much slower!

Can I get a Python object from its memory address?

I'm learning how to use Qt with PyQt, and I have a QTabelView with a StandardItemModel I've populated the model successfully and hooked up the itemChanged signal to a slot. I'd l'd like to mess around with whatever object is returned in IPython, so currently I have the line:
def itemChangedSlot(epw, item):
new_data = item.data()
print new_data
print item
which prints
<PyQt4.QtGui.QStandardItem object at 0x07C5F930>
<PyQt4.QtCore.QVariant object at 0x07D331F0>
In the IPython session is it possible to get the object using this memory address? I'm not seeing anything on Google, maybe I don't have my terminology right?
You need to hold a reference to an object (i.e. assign it to a variable or store it in a list).
There is no language support for going from an object address directly to an object (i.e. pointer dereferencing).
You're almost certainly asking the wrong question, and Raymond Hettinger's answer is almost certainly what you really want.
Something like this might be useful trying to dig into the internals of the CPython interpreter for learning purposes or auditing it for security holes or something… But even then, you're probably better off embedding the Python interpreter into a program and writing functions that expose whatever you want into the Python interpreter, or at least writing a C extension module that lets you manipulate CPython objects.
But, on the off chance that you really do need to do this…
First, there is no reliable way to even get the address from the repr. Most objects with a useful eval-able representation will give you that instead. For example, the repr of ('1', 1) is "('1', 1)", not <tuple at 0x10ed51908>. Also, even for objects that have no useful representation, returning <TYPE at ADDR> is just an unstated convention that many types follow (and a default for user-defined classes), not something you can rely on.
However, since you presumably only care about CPython, you can rely on id:
CPython implementation detail: This is the address of the object in memory.
(Of course if you have the object to call id (or repr) on, you don't need to dereference it via pointer, and if you don't have the object, it's probably been garbage collected so there's nothing to dereference, but maybe you still have it and just can't remember where you put it…)
Next, what do you do with this address? Well, Python doesn't expose any functions to do the opposite of id. But the Python C API is well documented—and, if your Python is built around a shared library, that C API can be accessed via ctypes, just by loading it up. In fact, ctypes provides a special variable that automatically loads the right shared library to call the C API on, ctypes.pythonapi.
In very old versions of ctypes, you may have to find and load it explicitly, like pydll = ctypes.cdll.LoadLibrary('/usr/lib/libpython2.5.so') (This is for linux with Python 2.5 installed into /usr/lib; obviously if any of those details differ, the exact command line will differ.)
Of course it's much easier to crash the Python interpreter doing this than to do anything useful, but it's not impossible to do anything useful, and you may have fun experimenting with it.
I think the answer is here. This way, I accessed an object using its address:
import ctypes
myObj = ctypes.cast(0x249c8c6cac0, ctypes.py_object).value

How is introspection useful?

I have been programming mainly in PHP, and I am trying to make a switch to python. I am skilled with PHP, and I have never needed to use introspection / introspection like capabilities. What good is code introspection, and in what situations would I find it indispensable?
Here is the only way I find it useful:
From the examples I saw in 'Dive into Python', introspection basically means that you can list all of the functions and attributes of an object. To me it seems that introspection is just there as a "user's manual" to an object. It lets you look at the object and its functionality from the python shell.
I just do not see why or in what situation you would take an arbitrary object, introspect upon it, and do something useful.
Suppose you are given a custom object and you want to know if the object has certain attributes or has as a certain method, then the introspection function such as hasattr can be used to find that out.
Also like the DiveintoPython book already illustrates, suppose you are building a GUI Editor with Auto-Completion feature, you want to get the public methods of the object which are callable at the run-time, then you can use the introspection methods like getattr for each for the methods got via dir and check if it is callable and then display it in your auto-completion list.
One example where I have used introspection on a real project:
We had a service that was managing background tasks called TaskService. Each task was actually implemented as a class that was implementing the Start() Stop() methods of a given interface. We had a config file, in which we were matching each task with its class. So when running TaskService, it just browsed the config file, and for each task it took the name of the class and instanciated it (during runtime) through reflection (introspection is a subpart of reflection).
Another example of where introspection can be useful is in the use of annotations in your programming language. Annotations are used to give metainformation about your classes to other third party programs (like ORMs), for instance you can use annotations to tell whether a class is an entity class (it is the case in Java, I don't know about Python sorry), or about the type of association of certain attributs etc.
Code Completion is another example of the usefulness of introspection.
And by the way, as you mentionned, introspection helps a lot to program documentation tools.
I wrote a documentation validator that runs tests on PDF files to check for various problems with them. The tests are methods of special classes that represent Subversion branches, products, manuals, and arbitrary groupings of various types. The validator engine uses introspection to find these special classes, instantiate them, and run their methods.
I could have written the validator so that you have to write boilerplate code to instantiate each class, call each method, etc. But that is repeating yourself and it is prone to maintenance problems (failure to update both places when adding/removing tests, in this case). By taking advantage of the fact that you want to apply the same operation to all the special classes, the computer can essentially do the boilerplate stuff for you, and it won't make mistakes. That way, you have to declare the structure of the documentation in only one place.
A little bit late on this, but another example of introspection in use is with Active Record (a Ruby library that maps objects to database tables). Active Record uses introspection to look at the name of the class, determine the associated table, and defines methods that access object attributes after inferring their names from the database schema. It reads the schema at runtime (metaprogramming & introspection in play here). So for example, if you have a class Person, you don't have to write accessor methods e.g nameoremail. As long as columns by the same name (in this case nameandemail) exist in the associated table (in this case people`), Active Record defines accessor methods for these attributes of the same name at runtime.

Categories