Determine whether object has been finalized - python

One of the effects of the GC changes that happened in Python 3.4 is that a gc-tracked object will only have its __del__ method called once, even if the first __del__ call resurrects the object:
>>> class Foo(object):
...     def __del__(self):
...         print('__del__')
...         global x
...         x = self
...
>>> x = Foo()
>>> del x
__del__
>>> del x
>>>
(Untracked objects currently behave differently, since they don't have the flag that indicates already-finalized status. You can see this by inserting __slots__ = () in the above class definition. I'm not sure whether this is a bug or a known and accepted behavior difference.)
For debugging purposes, it would be useful to be able to determine if an object has had its __del__ method called. One option would be to insert a line in __del__ that sets an indicator flag, but that requires advance preparation, and it may not be possible for objects with __del__ written in C, such as generators.
Is it possible to determine whether an object has been finalized, without modifying its __del__ method?

In Python 3.9, this can be tested using gc.is_finalized(obj).
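A minimal sketch (Python 3.9+, mirroring the resurrection example above): the already-finalized flag survives resurrection, so gc.is_finalized reports True afterwards:
import gc

class Lazarus:
    def __del__(self):
        global x
        x = self  # resurrect on first finalization

x = Lazarus()
print(gc.is_finalized(x))  # False: not finalized yet
del x                      # __del__ runs once and resurrects the object
print(gc.is_finalized(x))  # True: __del__ will not be called again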


Why doesn't Python have an instancemethod function?

Why doesn't Python have an instancemethod function analogous to staticmethod and classmethod?
Here is how this arose for me. Suppose I have an object which I know will be hashed frequently and whose hash is expensive to calculate. Under this assumption, it is reasonable to compute the hash value once and cache it, as in the following toy example:
class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)
    def __hash__(self):
        return self._hash_cache
The __hash__ function in this class does very little, just an attribute lookup and a return. Naively, it seems it ought to be equivalent to instead write:
class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)
    __hash__ = operator.attrgetter('_hash_cache')
According to the documentation, operator.attrgetter returns a callable object that fetches the given attribute from its operand. If its operand is self, then it will return self._hash_cache, which is the desired result. Unfortunately this does not work:
>>> hash(A(1))
1
>>> hash(B(1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: attrgetter expected 1 arguments, got 0
The reason for this is as follows. If one reads the descriptor HOWTO, one finds that class dictionaries store methods as functions; functions are non-data descriptors whose __get__ method returns a bound method. But operator.attrgetter does not return a function; it returns a callable object. And in fact, it is a callable object with no __get__ method:
>>> hasattr(operator.attrgetter('_hash_cache'), '__get__')
False
Lacking a __get__ method, this of course will not automatically be turned into a bound method. We can make a bound method from it using types.MethodType, but using it in our class B would require creating a bound method for every object instance and assigning it to __hash__.
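As a sketch of that workaround (reusing class B from above): types.MethodType will happily bind the attrgetter, but only to the one instance you hand it, so it has to be repeated per object:
import operator
import types

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

b = B(1)
# Build a bound method around the attrgetter for this one instance.
bound = types.MethodType(operator.attrgetter('_hash_cache'), b)
print(bound())  # 1, i.e. hash(1) -- works, but only for b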
We can see the fact that operator.attrgetter has no __get__ directly if we browse the CPython source. I'm not very familiar with the CPython API, but I believe that what's going on is as follows. The definition of the attrgetter_type is in Modules/_operator.c, at line 1439 as I write this. This type sets tp_descr_get to 0. And according to the type object documentation, that means an object whose type is attrgetter_type will not have a __get__.
Of course, if we give ourselves a __get__ method, then everything works. This is the case in the first example above, where __hash__ is actually a function and not just a callable. It's also true in some other cases. For example, if we want to lookup a class attribute, we could write the following:
class C:
    y = 'spam'
    get_y = classmethod(operator.attrgetter('y'))
As written this is terribly un-Pythonic (though it might be defensible if there were a strange custom __getattr__ for which we wanted to provide convenience functions). But at least it gives the desired result:
>>> C.get_y()
'spam'
I can't think of any reason why it would be bad for attrgetter_type to implement __get__. But on the other hand, even if it did, there would be other situations where we run into trouble. For example, suppose we have a class whose instances are callable:
class D:
    def __call__(self, other):
        ...
We can't use an instance of this class as a class attribute and expect instance lookups to generate bound methods. For instance,
d = D()
class E:
    apply_d = d
When D.__call__ is called, it will receive self but not other, and that generates a TypeError. This example might be a little far-fetched, but I'd be a little surprised if nobody had ever encountered something like this in practice. It could be fixed by giving D a __get__ method; but if D is from a third-party library that could be inconvenient.
It seems that the easiest solution would be to have an instancemethod function. Then we could write __hash__ = instancemethod(operator.attrgetter('_hash_cache')) and apply_d = instancemethod(d) and they would both work as intended. Yet, as far as I know, no such function exists. Hence my question: Why is there no instancemethod function?
EDIT: Just to be clear, the functionality of instancemethod would be equivalent to:
import functools

def instancemethod(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
This could be applied as in the original question above. One could also imagine writing a class decorator that could be applied to D that would give it a __get__ method; but this code doesn't do this.
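For example, assuming the instancemethod definition above, the failing class B from the question now hashes correctly, because wrapper is an ordinary function and therefore a descriptor:
import operator

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)
    __hash__ = instancemethod(operator.attrgetter('_hash_cache'))

print(hash(B(1)))  # 1 on CPython, since hash(1) == 1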
So I'm not talking about adding a new feature to Python. Really the question is one of language design: Why not provide it as, say, functools.instancemethod? If the answer is simply, "The use cases are so obscure that nobody's bothered," that's okay. But I would be happy to learn about other reasons, if there are any.
There is no instancemethod decorator because this is the default behaviour for functions declared inside a class.
class A:
    ...
    # This is an instance method
    def __hash__(self):
        return self._hash_cache
Any callable which does not have a __get__ method can thus be wrapped into an instance method like so.
class A:
    def instance_method(*args):
        return any_callable(*args)
Thus creating an instancemethod decorator would just add another syntax for a feature which already exists. This would go against the saying that there should be one-- and preferably only one --obvious way to do it.
Side note
If it is so expensive to hash your instances, you might want to avoid calling your hash function on instantiation and delay it until the object is hashed.
One way to do that is to set the attribute _hash_cache in __hash__ instead of __init__. Although, let me suggest a slightly more self-contained method which relies on caching your hash.
from weakref import finalize

class CachedHash:
    def __init__(self, x):
        self.x = x
    def __hash__(self, _cache={}):
        if id(self) not in _cache:
            finalize(self, _cache.pop, id(self))
            _cache[id(self)] = hash(self.x)  # or some complex hash function
        return _cache[id(self)]
The use of finalize ensures the cache is cleared of an id when its instance is garbage collected.
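A quick usage sketch of the class above; the second call hits the cache:
h = CachedHash('some expensive value')
print(hash(h))  # computed on first use and stored in _cache
print(hash(h))  # same value, served from _cache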
I have a satisfying answer to my question. Python does have the internal interface necessary for an instancemethod function, but it's not exposed by default.
import ctypes
import operator

instancemethod = ctypes.pythonapi.PyInstanceMethod_New
instancemethod.argtypes = (ctypes.py_object,)
instancemethod.restype = ctypes.py_object

class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)
    __hash__ = instancemethod(operator.attrgetter('_hash_cache'))

a = A(1)
print(hash(a))
The instancemethod function this creates works in essentially the same way as classmethod and staticmethod. These three functions return new objects of types instancemethod, classmethod, and staticmethod, respectively. We can see how they work by looking at Objects/funcobject.c. These objects all have __func__ members which store a callable object. They also have a __get__. For a staticmethod object, __get__ returns __func__ unchanged. For a classmethod object, __get__ returns a bound method object, where the binding is to the class object. And for an instancemethod object, __get__ returns a bound method object, where the binding is to the object instance. This is precisely the same behavior as __get__ for a function object and is exactly what we want.
The only documentation on these objects seems to be in the Python C API here. My guess is that they're not exposed because they're so rarely needed. I think it would be nice to have PyInstanceMethod_New available as functools.instancemethod.

Delete an instance from its class' dict in destructor?

I'm trying to create a class that saves all of its instances in a dictionary:
>>> class X:
...     def __new__(cls, index):
...         if index in cls._instances:
...             return cls._instances[index]
...         self = object.__new__(cls)
...         self.index = index
...         cls._instances[index] = self
...         return self
...     def __del__(self):
...         del type(self)._instances[self.index]
...     _instances = {}
However, the __del__ doesn't seem to work:
>>> x = X(1)
>>> del x
>>> X._instances
{1: <__main__.X object at 0x00000000035166D8>}
>>>
What am I doing wrong?
Building on Kirk Strauser's answer, I'd like to point out that, when you del x, the class' _instances still holds another reference to x, and thus it can't be garbage collected (and __del__ won't run).
Instead of doing this kind of low-level magic, you probably should be using weakrefs, which were implemented especially for this purpose.
WeakValueDictionary, in particular, suits your needs perfectly, and you can fill it in __init__ instead of fiddling with __new__ and __del__.
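A minimal sketch of that approach, keeping the X naming from the question (on CPython the entry disappears as soon as the last strong reference is dropped):
import weakref

class X:
    _instances = weakref.WeakValueDictionary()
    def __init__(self, index):
        self.index = index
        type(self)._instances[index] = self

x = X(1)
print(dict(X._instances))  # {1: <__main__.X object at ...>}
del x
print(dict(X._instances))  # {} -- the entry vanished with the instance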
You're not doing anything wrong, but __del__ isn't quite what you think. From the docs on it:
Note: del x doesn't directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x's reference count reaches zero.
Running this from the interpreter is particularly tricky because command history or other mechanisms may hold references to x for an indeterminate amount of time.
By the way, your code looks an awful lot like a defaultdict with X as the factory. It may be more straightforward to use something like that to be more explicit (ergo more Pythonic) about what you're trying to do.
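As a hedged sketch of that idea: defaultdict's factory takes no arguments, so a dict subclass with __missing__ expresses the intent more directly (X here is simplified to a plain class):
class X(object):
    def __init__(self, index):
        self.index = index

class Instances(dict):
    def __missing__(self, index):
        self[index] = X(index)
        return self[index]

instances = Instances()
first = instances[1]          # builds X(1) on first access
assert instances[1] is first  # later lookups reuse the cached instance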

How to get all instances of a certain class in python?

Someone asked a similar question: Printing all instances of a class.
While I am less concerned about printing them, I'd rather know how many instances are currently "live".
The reason for this instance capture is more like setting up a scheduled job: every hour, check these "live" unprocessed instances and enrich the data. After that, either a flag in the instance is set or the instance is deleted.
Torsten Marek's answer to Printing all instances of a class, which uses weakrefs, needs a call to the base class constructor for every class of this type. Is it possible to automate this? Or can we get all instances with some other method?
You can either track it on your own (see the other answers) or ask the garbage collector:
import gc

class Foo(object):
    pass

foo1, foo2 = Foo(), Foo()
foocount = sum(1 for o in gc.get_referrers(Foo) if o.__class__ is Foo)
This can be kinda slow if you have a lot of objects, but it's generally not too bad, and it has the advantage of being something you can easily use with someone else's code.
Note: Used o.__class__ rather than type(o) so it works with old-style classes.
If you only want this to work for CPython, and your definition of "live" can be a little lax, there's another way to do this that may be useful for debugging/introspection purposes:
>>> import gc
>>> class Foo(object): pass
>>> spam, eggs = Foo(), Foo()
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>, <__main__.Foo at 0x1153f0210>]
>>> del spam
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>, <__main__.Foo at 0x1153f0210>]
>>> del foos
>>> foos = [obj for obj in gc.get_objects() if isinstance(obj, Foo)]
>>> foos
[<__main__.Foo at 0x1153f0190>]
Note that deleting spam didn't actually make it non-live, because we still have a reference to the same object in foos. And reassigning foos didn't help at first, because apparently the call to get_objects happened before the old list was released. But eventually the object went away once we stopped referring to it.
And the only way around this problem is to use weakrefs.
Of course this will be horribly slow in a large system, with or without weakrefs.
Sure, store the count in a class attribute:
class CountedMixin(object):
    count = 0
    def __init__(self, *args, **kwargs):
        type(self).count += 1
        super().__init__(*args, **kwargs)
    def __del__(self):
        type(self).count -= 1
        try:
            super().__del__()
        except AttributeError:
            pass
You could make this slightly more magical with a decorator or a metaclass than with a base class, or simpler if it can be a bit less general (I've attempted to make this fit in anywhere in any reasonable multiple-inheritance hierarchy, which you usually don't need to worry about…), but basically, this is all there is to it.
If you want to have the instances themselves (or, better, weakrefs to them), rather than just a count of them, just replace count=0 with instances=set(), then do instances.add(self) instead of count += 1, etc. (Again, though, you probably want a weakref to self, rather than self.)
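A sketch of that weakref variant (names are illustrative): a WeakSet registry doesn't keep instances alive, so no __del__ is needed at all:
import gc
import weakref

class TrackedMixin(object):
    instances = weakref.WeakSet()
    def __init__(self, *args, **kwargs):
        type(self).instances.add(self)
        super().__init__(*args, **kwargs)

class Foo(TrackedMixin):
    pass

a, b = Foo(), Foo()
print(len(Foo.instances))  # 2
del a
gc.collect()               # make sure the dropped instance is really gone
print(len(Foo.instances))  # 1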
I cannot comment on kindall's answer, so I write my comment as an answer:
The solution with gc.get_referrers(<ClassName>) does not work with inherited classes in python 3. The method gc.get_referrers(<ClassName>) does not return any instances of a class that was inherited from <ClassName>.
Instead you need to use gc.get_objects() which is much slower, since it returns a full list of objects. But in case of unit-tests, where you simply want to ensure your objects get deleted after the test (no circular references) it should be sufficient and fast enough.
Also do not forget to call gc.collect() before checking the number of your instances, to ensure all unreferenced instances are really deleted.
I also saw an issue with weak references, which are counted this way too. The problem with weak references is that the referenced object might not exist any more, so isinstance(Instance, Class) might fail with an error about non-existing weak references.
Here is a simple code example:
import gc

def getInstances(Class):
    gc.collect()
    Number = 0
    InstanceList = gc.get_objects()
    for Instance in InstanceList:
        if 'weakproxy' not in str(type(Instance)):  # avoid weak references
            if isinstance(Instance, Class):
                Number += 1
    return Number
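A short usage sketch, showing that the gc.get_objects() based count, unlike gc.get_referrers(Base), also sees subclass instances:
class Base(object):
    pass

class Child(Base):
    pass

c = Child()
print(getInstances(Base))  # 1 -- the Child instance counts as a Base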

Weak reference to Python class method

Python 2.7 docs for weakref module say this:
Not all objects can be weakly referenced; those objects which can
include class instances, functions written in Python (but not in C),
methods (both bound and unbound), ...
And Python 3.3 docs for weakref module say this:
Not all objects can be weakly referenced; those objects which can
include class instances, functions written in Python (but not in C),
instance methods, ...
To me, these indicate that weakrefs to bound methods (in all versions Python 2.7 - 3.3) should be good, and that weakrefs to unbound methods should be good in Python 2.7.
Yet in Python 2.7, creating a weakref to a method (bound or unbound) results in a dead weakref:
>>> def isDead(wr): print 'dead!'
...
>>> class Foo:
...     def bar(self): pass
...
>>> wr = weakref.ref(Foo.bar, isDead)
dead!
>>> wr() is None
True
>>> foo = Foo()
>>> wr = weakref.ref(foo.bar, isDead)
dead!
>>> wr() is None
True
Not what I would have expected based on the docs.
Similarly, in Python 3.3, a weakref to a bound method dies on creation:
>>> wr = weakref.ref(Foo.bar, isDead)
>>> wr() is None
False
>>> foo = Foo()
>>> wr = weakref.ref(foo.bar, isDead)
dead!
>>> wr() is None
True
Again not what I would have expected based on the docs.
Since this wording has been around since 2.7, it's surely not an oversight. Can anyone explain how the statements and the observed behavior are in fact not in contradiction?
Edit/Clarification: In other words, the statement for 3.3 says "instance methods can be weak referenced"; doesn't this mean that it is reasonable to expect that weakref.ref(an instance method)() is not None? And if it is None, then "instance methods" should not be listed among the types of objects that can be weak referenced?
Foo.bar produces a new unbound method object every time you access it, due to some gory details about descriptors and how methods happen to be implemented in Python.
The class doesn't own unbound methods; it owns functions. (Check out Foo.__dict__['bar'].) Those functions just happen to have a __get__ which returns an unbound-method object. Since nothing else holds a reference, it vanishes as soon as you're done creating the weakref. (In Python 3, the rather unnecessary extra layer goes away, and an "unbound method" is just the underlying function.)
Bound methods work pretty much the same way: the function's __get__ returns a bound-method object, which is really just partial(function, self). You get a new one every time, so you see the same phenomenon.
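You can watch that machinery directly by invoking the descriptor protocol by hand (a sketch; Foo is as in the question):
class Foo(object):
    def bar(self): pass

foo = Foo()
func = Foo.__dict__['bar']      # the plain function the class really owns
bound = func.__get__(foo, Foo)  # what attribute access does behind the scenes
print(bound.__func__ is func)   # True: the method object wraps the function
print(foo.bar is foo.bar)       # False: a fresh method object per access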
You can store a method object and keep a reference to that, of course:
>>> def is_dead(wr): print "blech"
...
>>> class Foo(object):
...     def bar(self): pass
...
>>> method = Foo.bar
>>> wr = weakref.ref(method, is_dead)
>>> 1 + 1
2
>>> method = None
blech
This all seems of dubious use, though :)
Note that if Python didn't spit out a new method instance on every attribute access, that'd mean that classes refer to their methods and methods refer to their classes. Having such cycles for every single method on every single instance in the entire program would make garbage collection way more expensive—and before 2.1, Python didn't even have cycle collection, so they would've stuck around forever.
@Eevee's answer is correct, but there is a subtlety that is important.
The Python docs state that instance methods (py3k) and un/bound methods (py2.4+) can be weak referenced. You'd expect (naively, as I did) that weakref.ref(foo.bar)() would therefore be non-None, yet it is None, making the weak ref "dead on arrival" (DOA). This led to my question: if the weakref to an instance method is DOA, why do the docs say you can weak-reference a method?
So as @Eevee showed, you can create a non-dead weak reference to an instance method by creating a strong reference to the method object, which you then give to weakref:
m = foo.bar  # creates a *new* instance method "Foo.bar" and strong-refs it
wr = weakref.ref(m)
assert wr() is not None  # success
The subtlety (to me, anyways) is that a new instance method object is created every time you use Foo.bar, so even after the above code is run, the following will fail:
wr = weakref.ref(foo.bar)
assert wr() is not None # fails
because foo.bar is a new instance-method object for foo's "bar" method, different from m, and there is no strong ref to this new instance, so it is immediately gc'd, even though you created a strong reference to an earlier one (it is not the same strong ref). To be clear,
>>> d1 = foo.bla  # assume bla is a data member
>>> d2 = foo.bla
>>> d1 is d2
True   # which is what you expect
>>> m1 = foo.bar  # assume bar is an instance method
>>> m2 = foo.bar
>>> m1 is m2
False  # !!! counter-intuitive
This takes many people by surprise since no one expects access to an instance member to be creating a new instance of anything. For example, if foo.bla is a data member of foo, then using foo.bla in your code does not create a new instance of the object referenced by foo.bla. Now if bla is a "function", foo.bla does create a new instance of type "instance method" representing the bound function.
Why the weakref docs (since python 2.4!) don't point that out is very strange, but that's a separate issue.
While I see that there's an accepted answer as to why this should be so, from a simple use-case situation wherein one would like an object that acts as a weakref to a bound method, I believe that one might be able to sneak by with an object as such. It's kind of a runt compared to some of the 'codier' things out there, but it works.
from weakref import proxy
class WeakMethod(object):
    """A callable object. Takes one argument to init: 'object.method'.
    Once created, call this object -- MyWeakMethod() --
    and pass args/kwargs as you normally would.
    """
    def __init__(self, object_dot_method):
        # Older versions of Python can use 'im_self' and 'im_func'
        # in place of '__self__' and '__func__' respectively.
        self.target = proxy(object_dot_method.__self__)
        self.method = proxy(object_dot_method.__func__)

    def __call__(self, *args, **kwargs):
        """Call the method with args and kwargs as needed."""
        return self.method(self.target, *args, **kwargs)
As an example of its ease of use:
class A(object):
    def __init__(self, name):
        self.name = name

    def foo(self):
        return "My name is {}".format(self.name)
>>> Stick = A("Stick")
>>> WeakFoo = WeakMethod(Stick.foo)
>>> WeakFoo()
'My name is Stick'
>>> Stick.name = "Dave"
>>> WeakFoo()
'My name is Dave'
Note that evil trickery will cause this to blow up, so depending on how you'd prefer it to work this may not be the best solution.
>>> A.foo = lambda self: "My eyes, aww my eyes! {}".format(self.name)
>>> Stick.foo()
'My eyes, aww my eyes! Dave'
>>> WeakFoo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __call__
ReferenceError: weakly-referenced object no longer exists
>>>
If you were going to be replacing methods on-the-fly you might need to use a getattr(weakref.proxy(object), 'name_of_attribute_as_string') approach instead. getattr is a fairly fast look-up so that isn't the literal worst thing in the world, but depending on what you're doing, YMMV.
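A hedged sketch of that getattr-based variant (the class name is illustrative): it stores only the method's name and re-resolves it on every call, so on-the-fly replacement of A.foo is picked up instead of blowing up. (On Python 3.4+, the standard library's weakref.WeakMethod is also worth a look for this use case.)
import weakref

class LateBoundWeakMethod(object):
    """Weakly holds the instance and looks the method up by name per call."""
    def __init__(self, bound_method):
        self._target = weakref.proxy(bound_method.__self__)
        self._name = bound_method.__func__.__name__

    def __call__(self, *args, **kwargs):
        # getattr re-runs normal attribute lookup, so monkey-patched
        # methods are found; a dead proxy still raises ReferenceError.
        return getattr(self._target, self._name)(*args, **kwargs)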

setattr, object deletion and cyclic garbage collection

I would like to understand how object deletion works on python. Here is a very simple bunch of code.
class A(object):
    def __init__(self):
        setattr(self, "test", self._test)

    def _test(self):
        print "Hello, World!"

    def __del__(self):
        print "I'm dying!"

class B(object):
    def test(self):
        print "Hello, World!"

    def __del__(self):
        print "I'm dying"

print "----------Test on A"
A().test()
print "----------Test on B"
B().test()
Pythonistas will recognize that I'm running a Python 2.x version. More specifically, this code runs on a Python 2.7.1 setup.
This code outputs the following:
----------Test on A
Hello, World!
----------Test on B
Hello, World!
I'm dying
Surprisingly, the A object is not deleted. I can understand why, since the setattr call in __init__ produces a circular reference. But this one seems easy to resolve.
Finally, this page in the Python documentation (Supporting Cyclic Garbage Collection) shows that it's possible to deal with this kind of circular reference.
I would like to know:
why do I never go through my __del__ method in class A?
if my diagnosis about the circular reference is right, why does my object subclass not support cyclic garbage collection?
finally, how do I deal with this kind of setattr if I really want to go through __del__?
Note: In A if the setattr points to another method of my module, there's no problem.
Fact 1
Instance methods are normally stored on the class. The interpreter first looks them up in the instance __dict__, which fails, and then looks on the class, which succeeds.
When you dynamically set the instance method of A in __init__, you create a reference to it in the instance dictionary. This reference is circular, so the refcount will never go to zero and the reference counter will not clean A up.
>>> class A(object):
...     def _test(self): pass
...     def __init__(self):
...         self.test = self._test
...
>>> a = A()
>>> a.__dict__['test'].im_self
Fact 2
The garbage collector is what Python uses to deal with circular references. Unfortunately, it can't handle objects with __del__ methods, since in general it can't determine a safe order to call them. Instead, it just puts all such objects in gc.garbage. You can then go look there to break cycles, so they can be freed. From the docs
gc.garbage
A list of objects which the collector found to be unreachable but could
not be freed (uncollectable objects). By default, this list contains only
objects with __del__() methods. Objects that have __del__() methods
and are part of a reference cycle cause the entire reference cycle
to be uncollectable, including objects not necessarily
in the cycle but reachable only from it. Python doesn’t collect such
cycles automatically because, in general, it isn’t possible for Python
to guess a safe order in which to run the __del__() methods. If you
know a safe order, you can force the issue by examining the garbage
list, and explicitly breaking cycles due to your objects within the
list. Note that these objects are kept alive even so by virtue of
being in the garbage list, so they should be removed from garbage too.
For example, after breaking cycles, do del gc.garbage[:] to empty the
list. It’s generally better to avoid the issue by not creating cycles
containing objects with __del__() methods, and garbage can be examined
in that case to verify that no such cycles are being created.
Therefore
Don't make cyclic references on objects with __del__ methods if you want them to be garbage collected.
You should read the documentation on the __del__ method rather carefully - specifically, the part where objects with __del__ methods change the way the collector works.
The gc module provides some hooks where you can clean this up yourself.
I suspect that simply not having a __del__ method here would result in your object being properly cleaned up. You can verify this by looking through gc.garbage and seeing if your instance of A is present.
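A minimal sketch of that check (Python 2, to match the question): drop __del__ from A and confirm nothing is left in gc.garbage:
import gc

class A(object):
    def __init__(self):
        setattr(self, "test", self._test)
    def _test(self):
        print "Hello, World!"
    # no __del__: the collector can now break the self-reference cycle

A().test()
gc.collect()
print gc.garbage  # [] -- the cyclic instance was collected, not stranded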
