Browsing the internet (here) I found that the garbage collector has problems collecting objects that define a __del__ method.
My question is simple: why?
According to the documentation:
Objects that have __del__() methods and are part of a reference cycle cause the entire reference cycle to be uncollectable, including objects not necessarily in the cycle but reachable only from it. Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods.
Why is the __del__ method so problematic? What's the difference between an object that implements it and one which doesn't? It only destroys an instance.
__del__ doesn't destroy an instance; the instance is destroyed automatically by the Python runtime once its reference count reaches zero. __del__ allows you to hook into that process and perform additional actions, such as freeing external resources associated with the object.
The danger is that the additional action may even resurrect the object - for example, by storing it in a global container. In that case the destruction is effectively cancelled (until the next time the object's reference count drops to zero). It is this scenario that causes the mere presence of __del__ to exclude the object from those governed by the cycle breaker (also known as the garbage collector). If the collector invoked __del__ on all objects in the cycle and one of them resurrected its object, it would need to resurrect the whole cycle - which is impossible, since the __del__ methods of the other cycle members have already been invoked, possibly causing permanent damage to their objects (e.g. by freeing external resources, as mentioned above).
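A minimal sketch of such a resurrection (the class and container names are made up for illustration):

```python
resurrected = []  # a global container (illustrative name)

class Phoenix:
    def __del__(self):
        # Storing self creates a fresh reference, effectively
        # cancelling the destruction that triggered this call.
        resurrected.append(self)

p = Phoenix()
del p  # refcount drops to zero and __del__ runs...
# ...but the object now lives on inside the global list.
print(len(resurrected))  # -> 1
```

Note that since Python 3.4 (PEP 442), CPython calls __del__ at most once per object, so a resurrected object's later death will not trigger the finalizer again.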
If you only need to be notified of an object's destruction, use weakref.ref. If your object is associated with external resources that need freeing, implement a close method and/or a context-manager interface. There is almost never a legitimate reason to use __del__.
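Both alternatives can be sketched like this (Resource is a made-up class; weakref.finalize is a convenient wrapper around weakref.ref with a callback):

```python
import weakref

events = []

class Resource:
    def close(self):
        # Explicit, idempotent cleanup instead of __del__.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # don't swallow exceptions

# Deterministic cleanup via the context-manager interface:
with Resource() as r:
    pass
print(r.closed)  # -> True

# Death notification via a weak reference, with no __del__ in sight:
obj = Resource()
weakref.finalize(obj, events.append, "obj is gone")
del obj  # under CPython the finalizer fires right here
print(events)  # -> ['obj is gone']
```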
Related
If I instantiate an object in the main thread and then send one of its member methods to a ThreadPoolExecutor, does Python somehow create a copy-by-value of the object and send it to the subthread, so that the object's member method will have access to its own copy of self?
Or is it indeed accessing self from the object in the main thread, meaning that every method running in a subthread is modifying / overwriting the same properties (living in the main thread)?
Threads share a memory space. There is no magic going on behind the scenes, so code in different threads accesses the same objects. Thread switches can occur at any time, although most simple Python operations are atomic. It is up to you to avoid race conditions. Normal Python scoping rules apply.
You might want to read about thread-local variables (threading.local) if you want to find out about workarounds to the default behavior.
Processes are quite different. Each process has its own memory space and its own copy of all the objects it references.
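A small sketch (Holder is a made-up class) showing that a bound method submitted to a ThreadPoolExecutor still operates on the original object, not on a copy:

```python
from concurrent.futures import ThreadPoolExecutor

class Holder:
    def __init__(self):
        self.seen = []

    def record(self):
        # This runs in a worker thread, but self is the very object
        # created in the main thread -- no copy is made.
        self.seen.append(id(self))

h = Holder()
with ThreadPoolExecutor(max_workers=2) as pool:
    for f in [pool.submit(h.record) for _ in range(5)]:
        f.result()

# Every call saw the same identity: one shared object.
print(set(h.seen) == {id(h)})  # -> True
```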
In a CPython multithreaded environment, consider the following code:

class Container:
    def __del__(self):
        # Some code that fails when run in a different thread
        # than the thread that initialized this object
        pass

def use_container():
    c = Container()
    # Some code

def thread():
    use_container()
We are running the function thread in its own thread. Under normal circumstances, when we call use_container and it returns, the reference count for c drops to zero and its __del__ method is called. For a project I am working on, I suspect that sometimes the __del__ method is not called, or is called by a different thread.
I know we should not rely on __del__ being called. But under CPython, and when we are sure there are no reference cycles, are there any cases where __del__ might not be called when the object's reference count gets to 0?
One possible case I'm considering is: right after use_container returns, the garbage collector kicks in on another thread before the interpreter releases the object, and the collector then releases the object and calls its __del__ method on that other thread. Is such a case possible? And if so, what would happen in the original thread once it resumes execution?
An object's reference count is shared across the whole interpreter, so there is no way for one thread to hold a reference while a different thread deallocates the object: as long as any thread holds a reference, the count is at least 1. (The cyclic collector itself runs in whichever thread happens to trigger a collection, but it only touches unreachable objects.)
What could happen is the object is moved to a different generation within the collector.
Say you have a reference to an object; that object gets put in generation 0. The collector can run but not collect the object, since it is still referenced somewhere. If it survives a collection it gets moved to generation 1, which is collected less often, and if it survives a second round it moves to generation 2, which is collected least often.
In other words, depending on when the collector runs, an object that has become unreachable (e.g. as part of a reference cycle) might well survive in memory for a while, because it lives in an older generation. It will stay in memory until that generation is collected, and only then will its __del__ be called. Further reading about generations here: https://devguide.python.org/garbage_collector/#optimization-generations
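A quick way to watch an unreachable cycle linger until an explicit collection (Node is a made-up class):

```python
import gc

class Node:
    pass

def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle: refcounts never hit 0

gc.disable()
make_cycle()
# The two nodes are now unreachable, but refcounting alone can't free
# them; they sit in a collector generation until a collection runs.
before = sum(isinstance(o, Node) for o in gc.get_objects())
gc.collect()  # an explicit collection works even while auto-GC is off
after = sum(isinstance(o, Node) for o in gc.get_objects())
gc.enable()
print(before, after)  # -> 2 0
```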
One of the use cases python's docs propose for weak references is to keep references to large cached objects without increasing their reference count, thus not preventing them from being garbage collected when their time comes.
However, garbage collection is not guaranteed to happen immediately after an object's refcount reaches zero, and a weakref is only invalidated when the GC collects its target. So essentially one can be left holding a valid (not dead) weakref to an invalid object - PyPy's broken WeakSet is one example of such a scenario.
So, assuming an adversarially-minded garbage collector, is there a scenario (apart from finalizers) where weak references provide deterministic and useful behavior to the user?
It’s really not about making an object get garbage collected as soon as the references are gone, and making the weak reference invalid in that case. It’s really just about allowing the object to be garbage collected when nothing else references it.
A common use case is the observer pattern, where you add an observer (or listener) to an observable. This is often used for event systems. Let's say you have a button with a click event; when you register a handler for that click event, you need to make sure to unregister it properly or you will run into memory leaks: the observable keeps a reference to its listeners, so those objects will never be garbage collected even when they are no longer used (aside from their job as handlers).
Using weak references here prevents listener registrations from counting as references when determining whether to garbage collect an object. So you remove the need to explicitly unregister the event handler, making it easier to use. You can just register the handler with a weak reference, and delete the listener whenever you want.
There are other legitimate use cases, Wikipedia has some, but in general, weak references are used to prevent objects from being kept in the memory when there are no other strong references. But that says nothing about when the object actually gets garbage collected.
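A sketch of that weakly-referencing observer pattern (Button and Handler are made-up classes). Note that the WeakSet holds listener objects, not bound methods, since a bound method stored weakly would die immediately:

```python
import weakref

class Button:
    def __init__(self):
        # A WeakSet does not keep its members alive.
        self._listeners = weakref.WeakSet()

    def subscribe(self, listener):
        self._listeners.add(listener)

    def click(self):
        for listener in list(self._listeners):
            listener.on_click()

class Handler:
    def __init__(self):
        self.clicks = 0
    def on_click(self):
        self.clicks += 1

button = Button()
handler = Handler()
button.subscribe(handler)
button.click()
print(handler.clicks)  # -> 1

count_before = len(button._listeners)
del handler  # no explicit unsubscribe needed
# Under CPython the dead listener is dropped from the WeakSet right away.
print(count_before, len(button._listeners))  # -> 1 0
```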
At first glance, it seems like Python's __del__ special method offers much the same advantages a destructor has in C++. But according to the Python documentation (https://docs.python.org/3.4/reference/datamodel.html), there is no guarantee that your object's __del__ method ever gets called at all!
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
So in other words, the method is useless! Isn't it? A hook function that may or may not get called really doesn't do much good, so __del__ offers nothing with regard to RAII. If I have some essential cleanup, I don't need it to run some of the time, whenever the GC feels like it; I need it to run reliably, deterministically and 100% of the time.
I know that Python provides context managers, which are far more useful for that task, but why was __del__ kept around at all? What's the point?
__del__ is a finalizer. It is not a destructor. Finalizers and destructors are entirely different animals.
Destructors are called reliably, and only exist in languages with deterministic memory management (such as C++). Python's context managers (the with statement) can achieve similar effects in certain circumstances. These are reliable because the lifespan of an object is precisely fixed; in C++, objects die when they are explicitly deleted or when some scope is exited (or when a smart pointer deletes them in response to its own destruction). And that's when destructors run.
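The destructor-like determinism of the with statement can be sketched as follows (ScopedResource is an invented class); cleanup runs at the precise moment the block is left, even when an exception escapes:

```python
class ScopedResource:
    def __init__(self):
        self.open = True
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc, tb):
        self.open = False  # always runs when the block is left
        return False       # don't swallow the exception

try:
    with ScopedResource() as res:
        raise RuntimeError("boom")
except RuntimeError:
    pass

print(res.open)  # -> False: cleanup happened at a well-defined point
```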
Finalizers are not called reliably. The only valid use of a finalizer is as an emergency safety net (NB: this article is written from a .NET perspective, but the concepts translate reasonably well). For instance, the file objects returned by open() automatically close themselves when finalized. But you're still supposed to close them yourself (e.g. using the with statement). This is because the objects are destroyed dynamically by the garbage collector, which may or may not run right away, and with generational garbage collection, it may or may not collect some objects in any given pass. Since nobody knows what kinds of optimizations we might invent in the future, it's safest to assume that you just can't know when the garbage collector will get around to collecting your objects. That means you cannot rely on finalizers.
In the specific case of CPython, you get slightly stronger guarantees, thanks to the use of reference counting (which is far simpler and more predictable than garbage collection). If you can ensure that you never create a reference cycle involving a given object, that object's finalizer will be called at a predictable point (when the last reference dies). This is only true of CPython, the reference implementation, and not of PyPy, IronPython, Jython, or any other implementations.
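That CPython-specific guarantee can be demonstrated with a small sketch (the names are invented); without cycles, the finalizer runs exactly when the last reference dies:

```python
log = []

class Tracked:
    def __del__(self):
        log.append("finalized")

def scope():
    t = Tracked()
    log.append("inside")
    # t's last reference dies when this function returns

scope()
# Under CPython's refcounting the finalizer has already run by now;
# on PyPy or Jython it might not have.
print(log)  # -> ['inside', 'finalized']
```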
Because __del__ does get called. It's just that it's unclear when it will, because in CPython if you have circular references, the refcount mechanism can't take care of the object reclamation (and thus its finalization via __del__) and must delegate it to the garbage collector.
The garbage collector then has a problem: it cannot know in which order to break the circular references, because breaking them in the wrong order may trigger additional problems (e.g. freeing memory that is still needed by the finalizer of another object in the collected cycle, triggering a segfault).
The caveat you quote exists because the interpreter may exit in ways that prevent it from performing the cleanup (e.g. it segfaults, or some C module impolitely calls exit()).
There's PEP 442 for safe object finalization, which was implemented in Python 3.4. I suggest you take a look at it.
https://www.python.org/dev/peps/pep-0442/
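A quick demonstration of the PEP 442 behavior on Python 3.4+ (Cyclic is a made-up class): a cycle whose members define __del__ is now finalized by the collector instead of being left uncollectable:

```python
import gc

finalized = []

class Cyclic:
    def __del__(self):
        finalized.append("del ran")

def make_cycle():
    a, b = Cyclic(), Cyclic()
    a.other, b.other = b, a  # reference cycle with finalizers

gc.disable()
make_cycle()
print(finalized)  # -> []: refcounting alone can't break the cycle
gc.collect()      # since 3.4 (PEP 442) the collector finalizes cycles too
print(finalized)  # -> ['del ran', 'del ran']
gc.enable()
```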
I'm trying to understand the internals of the CPython garbage collector, specifically when the destructor is called. So far, the behavior is intuitive, but the following case trips me up:
Disable the GC.
Create an object, then remove a reference to it.
The object is destroyed and the __del__ method is called.
I thought this would only happen if the garbage collector was enabled. Can someone explain why this happens? Is there a way to defer calling the destructor?
import gc
import unittest

_destroyed = False

class MyClass(object):
    def __del__(self):
        global _destroyed
        _destroyed = True

class GarbageCollectionTest(unittest.TestCase):
    def testExplicitGarbageCollection(self):
        gc.disable()
        ref = MyClass()
        ref = None
        # The next assertion fails: the object is automatically
        # destroyed even with the collector turned off.
        self.assertFalse(_destroyed)
        gc.collect()
        self.assertTrue(_destroyed)

if __name__ == '__main__':
    unittest.main()
Disclaimer: this code is not meant for production -- I've already noted that this is very implementation-specific and does not work on Jython.
Python has both reference counting garbage collection and cyclic garbage collection, and it's the latter that the gc module controls. Reference counting can't be disabled, and hence still happens when the cyclic garbage collector is switched off.
Since there are no references left to your object after ref = None, its __del__ method is called as a result of its reference count going to zero.
There's a clue in the documentation: "Since the collector supplements the reference counting already used in Python..." (my emphasis).
You can stop the first assertion from firing by making the object refer to itself, so that its reference count doesn't go to zero, for instance by giving it this constructor:
def __init__(self):
    self.myself = self
But if you do that, the second assertion will fire on Python versions before 3.4, because garbage cycles with __del__ methods did not get collected there - see the documentation for gc.garbage. Since Python 3.4 (PEP 442), such cycles are collected and their finalizers do run.
The docs (in a section that lived at one location up to Python 3.5 and was later relocated) explain how what's called "the optional garbage collector" is actually a collector of cyclic garbage - the kind that reference counting wouldn't catch. Reference counting is explained there too, with a nod to its interplay with the cyclic gc:
While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself.
Depending on your definition of garbage collector, CPython has two garbage collectors, the reference counting one, and the other one.
The reference counter is always working and cannot be turned off; it is fast and lightweight and does not significantly affect the runtime of the system.
The other one (some variant of mark and sweep, I think) runs every so often and can be disabled. This is because it requires the interpreter to be paused while it runs, which can happen at the wrong moment and consume quite a lot of CPU time.
The ability to disable it is there for those times when you expect to be doing something time-critical and the lack of this GC won't cause you any problems.
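The usual pattern for that looks roughly like this (the workload here is just a placeholder):

```python
import gc

gc.disable()  # avoid a collection pause in a latency-sensitive section
try:
    # ... time-critical work; refcounting still reclaims most garbage ...
    data = [list(range(10)) for _ in range(1000)]
finally:
    gc.enable()
    gc.collect()  # catch up on any cycles created in the meantime

print(gc.isenabled())  # -> True
```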