Python: flush a buffer before program termination via a finalizer

I keep a cache of transactions to flush (to persistent storage) on the event of a watermark or object finalization. Since __del__ is no longer guaranteed to be called on every object, is the appropriate approach to hook a similar function (or __del__ itself) into atexit.register (during initialization)?
If I'm not mistaken, this will cause the object to which the method is bound to hang around until program termination. This isn't likely to be a problem, but maybe there's a more elegant solution?
Note: I know using __del__ is non-ideal because it can cause uncatchable exceptions, but I can't think of another way to do this short of cascading finalize() calls all the way through my program. TIA!

If you have to handle resources, the preferred way is to have an explicit call to a close() or finalize() method. Have a look at the with statement to abstract that. In your case the weakref module might be an option: the cached objects can be garbage collected by the system and have their __del__() method called, or you can finalize them yourself if they are still alive.
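A minimal sketch of that explicit-close approach, assuming a hypothetical TransactionCache whose flush() writes to persistent storage; contextlib.closing turns anything with a close() method into a context manager:

import contextlib

class TransactionCache:
    def __init__(self):
        self._pending = []

    def add(self, txn):
        self._pending.append(txn)

    def flush(self):
        # write self._pending to persistent storage here
        self._pending = []

    def close(self):
        self.flush()

with contextlib.closing(TransactionCache()) as cache:
    cache.add('txn-1')
# close(), and therefore flush(), has been called here, even on error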

I would say atexit, or try to see if you can refactor the code so it can be expressed with the with statement, which is in __future__ in 2.5 and on by default in 2.6. 2.5 includes the contextlib module to simplify things a bit. I've done something like this when using Canonical's Storm ORM.
from __future__ import with_statement  # only needed on Python 2.5
import contextlib

@contextlib.contextmanager
def start_transaction(db):
    db.start()
    yield db  # bind the db to `as transaction`
    db.end()

with start_transaction(db) as transaction:
    ...
For a non-db case, you could just register the objects to be flushed with a global and then use something similar. The benefit of this approach is that it keeps things explicit.
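A sketch of that non-db variant, with hypothetical names (FLUSHABLES, flush_all, Buffer) standing in for whatever your program actually caches:

from contextlib import contextmanager

FLUSHABLES = []

@contextmanager
def flush_all():
    try:
        yield FLUSHABLES
    finally:
        for obj in FLUSHABLES:
            obj.flush()

class Buffer:
    def __init__(self):
        self.items = []
        FLUSHABLES.append(self)

    def flush(self):
        print('flushing', len(self.items), 'items')
        self.items.clear()

with flush_all():
    b = Buffer()
    b.items.append('record')
# every registered Buffer has been flushed at this point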

If you don't need your object to be alive at the time you perform the flush, you could use weak references.
This is similar to your proposed solution, but rather than using a real reference, store a list of weak references, with a callback function to perform the flush. This way, the references aren't going to keep those objects alive, and you won't run into any circular garbage problems with __del__ methods.
You can run through the list of weak references on termination to manually flush any that are still alive, if this needs to be guaranteed to happen at a certain point.
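A sketch of that idea using weakref.finalize (which wraps the weak reference plus callback for you), under the assumption that the data needed for the flush can be handed to the finalizer separately from the owning object, so the object itself stays collectable:

import weakref

_finalizers = []

def _flush(buffer):
    print('flushing', buffer)  # write the buffer to persistent storage here

class Cache:
    def __init__(self):
        self.buffer = []
        # the finalizer keeps the buffer alive, not the Cache instance
        _finalizers.append(weakref.finalize(self, _flush, self.buffer))

def flush_remaining():
    # run at termination to flush anything not yet collected
    for f in _finalizers:
        if f.alive:
            f()

c = Cache()
c.buffer.append('txn')
del c              # the finalizer fires here and flushes the buffer
flush_remaining()  # flushes whatever is still alive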

Put the following in a file called destructor.py
import atexit

objects = []

def _destructor():
    global objects
    for obj in objects:
        obj.destroy()
    del objects

atexit.register(_destructor)
now use it this way:
import destructor

class MyObj(object):
    def __init__(self):
        destructor.objects.append(self)
        # ... other init stuff
    def destroy(self):
        # clean up resources here
        pass

I think atexit is the way to go here.


Which objects are not destroyed upon Python interpreter exit?

According to the Python documentation:
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
I know that in older versions of Python cyclic referencing would be one of the examples of this behaviour; however, as I understand it, in Python 3 such cycles are successfully destroyed upon interpreter exit.
I'm wondering what are the cases (as close to exhaustive list as possible) when the interpreter would not destroy an object upon exit.
All examples are implementation details - Python does not promise whether or not it will call __del__ for any particular objects on interpreter exit. That said, one of the simplest examples is with daemon threads:
import threading
import time

def target():
    time.sleep(1000)

class HasADel:
    def __del__(self):
        print('del')

x = HasADel()
threading.Thread(target=target, daemon=True).start()
Here, the daemon thread prevents the HasADel instance from being garbage collected on interpreter shutdown. The daemon thread doesn't actually do anything with that object, but Python can't clean up references the daemon thread owns, and x is reachable from references the daemon thread owns.
When the interpreter exits normally, such as when the program ends or sys.exit is called, not all objects are guaranteed to be destroyed. There is presumably some logic to which ones are, but it isn't simple. After all, the __del__ method is for freeing memory resources, not other resources (like network connections) - that's what __enter__ and __exit__ are for.
Having said that, there are situations in which __del__ will most certainly not be called. The parallel here is atexit functions; they are usually run at exit. However:
Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.
atexit documentation
So, there are situations in which clean-up functions, like __del__, __exit__, and functions registered with atexit will not be called:
The program is killed by a signal not handled by Python - if a program receives a signal to stop, like SIGINT or SIGQUIT, and it doesn't handle the signal, then it will be stopped.
A Python fatal interpreter error occurs.
os._exit() is called - the documentation says:
Exit the process with status n, without calling cleanup handlers, flushing stdio buffers, etc.
So it is pretty clear that __del__ should not be called.
In conclusion, the interpreter does not guarantee __del__ being called, but there are situations in which it will definitely not be called.
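For the first of those cases, a process that expects to be stopped by a signal can convert it into a normal exit so that atexit handlers (and the rest of the usual cleanup) still run. A minimal sketch, using SIGTERM as the example:

import atexit
import signal
import sys

def cleanup():
    print('flushing buffers before exit')

atexit.register(cleanup)

def handle_sigterm(signum, frame):
    sys.exit(0)  # raises SystemExit, so atexit handlers get to run

signal.signal(signal.SIGTERM, handle_sigterm)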
After comparing the quoted sentence from the documentation with your title, I think you have misunderstood what __del__ is and what it does.
You used the word "destroyed", and the documentation says __del__ may not get called in some situations. The thing is, all objects are deleted after the interpreter's process finishes. __del__ is not a destructor and has nothing to do with the destruction of objects. Even if a memory leak occurs in a process, operating systems (at least the ones I know: Linux, Windows, ...) will eventually reclaim that memory once the process finishes. So everything is destroyed/deleted!
In normal cases, when these objects are about to be destroyed, __del__ (better known as a finalizer) is called as the very last step of destruction. In the other cases mentioned by the other answers, it doesn't get called.
That's why people say not to count on the __del__ method for cleaning up vital stuff and to use a context manager instead. In some scenarios, __del__ may even revive the object by passing a reference around.

How does one expose a memory managed item that must be closed/deleted as an importable resource in python3?

Suppose I have an item that interfaces with a C library. The interface allocates memory.
The __del__ method takes care of everything, but there is no guarantee the __del__ method will be called in an imperative python3 runtime.
So, I have overloaded the context manager functions and can declare my item 'with':
with Foo(**kwargs) as foo:
    foo.doSomething()
# no memory leaks
However, I am now exposing my foo in an __init__.py, and am curious how I could possibly expose the context manager object in a way that allows a user to use it without using it inside of a 'with' block.
Is there a way I can open/construct my Foo inside my module such that there is a guarantee __del__ will be called (or the context manager exit function) so that it is exposed for use, but doesn't expose a daemon or other long term process to risk of memory loss?
Or is deletion implied when an object is constructed implicitly via import, even though it ~may or may not~ occur when the object is constructed in the runtime scope?
Although this should probably not be relied upon, or at least might not hold from version to version...
...Python 3.9 does indeed call __del__ on objects initialized at import time.
I can't accept this answer because I have no way of directly proving that Python 3 will always call __del__; it just hasn't failed to call it so far, whereas I get a memory leak every time I do not dispose of the object properly after creating it during runtime flow.
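One option consistent with the answers elsewhere on this page (not something confirmed in this thread) is to avoid relying on __del__ at import time altogether: construct the object inside a module-level contextlib.ExitStack and register the stack's close with atexit, so __exit__ runs on any normal interpreter exit. The import path for Foo below is illustrative:

# __init__.py
import atexit
from contextlib import ExitStack

from ._wrapper import Foo  # hypothetical location of the C-backed wrapper

_stack = ExitStack()
atexit.register(_stack.close)

# users can `from package import foo` and never touch a with block
foo = _stack.enter_context(Foo())

Like any atexit-based scheme, this still won't run if the process dies from os._exit() or an unhandled signal.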

Proper finalization in Python

I have a bunch of instances, each having a unique tempfile for its use (save data from memory to disk and retrieve them later).
I want to be sure that at the end of the day, all these files are removed. However, I want to leave room for fine-grained control of their deletion. That is, some files may be removed earlier, if needed (e.g. they are too big and not important any more).
What is the best / recommended way to achieve this?
My thoughts on that:
The try-finally blocks or with statements are not an option, as we have many files whose lifetimes may overlap each other. Also, they hardly admit the kind of finer-grained control I want.
From what I have read, __del__ is also not a feasible option, as it is not even guaranteed that it will eventually run (although it is not entirely clear to me what the "risky" cases are). Also (if it is still the case), libraries may no longer be available when __del__ runs.
The tempfile library seems promising. However, the file is gone right after closing it, which is definitely a bummer, as I want the files to be closed (when they perform no operation) to limit the number of open files.
The library promises that the file "will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."
How do they achieve the implicit close? E.g. in C# I would use a (reliable) finalizer, which __del__ is not.
The atexit library seems to be the best candidate; it can work as a reliable finalizer instead of __del__ to implement a safe disposable pattern. The only problem, compared to object finalizers, is that it runs truly at exit, which is rather inconvenient (what if the object is eligible to be garbage collected earlier?).
Here, the question still stands: how does the library achieve that the registered methods always run? (Except in really unexpected cases, with which it is hard to do anything.)
In the ideal case, it seems that a combination of __del__ and the atexit library may perform best. That is, the clean-up happens both in __del__ and in the method registered with atexit, while repeated clean-up is forbidden: if __del__ was called, the registered method is removed.
The only (yet crucial) problem is that __del__ won't run if a method is registered with atexit, because a reference to the object then exists forever.
Thus, any suggestion, advice, useful link and so on is welcomed.
I suggest considering the built-in weakref module for this task, more specifically weakref.finalize. A simple example:
import weakref

class MyClass:
    pass

def clean_up(*args):
    print('clean_up', args)

my_obj = MyClass()
weakref.finalize(my_obj, clean_up, 'arg1', 'arg2', 'arg3')
del my_obj  # optional
when run it will output
clean_up ('arg1', 'arg2', 'arg3')
Note that clean_up will be executed even without del-ing my_obj (you may delete the last line of code and the behavior will not change). clean_up is called after all strong references to my_obj are gone, or at interpreter exit (much like when using the atexit module).
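Applied to the temp-file question, a sketch might look like this: NamedTemporaryFile(delete=False) keeps the file across close(), the finalizer removes it when the owner is collected or at interpreter exit, and calling the finalizer explicitly gives the fine-grained early deletion asked about (it runs at most once). The class name is illustrative:

import os
import tempfile
import weakref

class DiskBackedBlob:
    def __init__(self):
        f = tempfile.NamedTemporaryFile(delete=False)
        f.close()  # keep the file on disk but don't hold an open handle
        self.path = f.name
        self._finalizer = weakref.finalize(self, os.unlink, self.path)

    def remove(self):
        # fine-grained control: delete this file now; safe to call twice
        self._finalizer()

blob = DiskBackedBlob()
print(os.path.exists(blob.path))  # True
blob.remove()
print(os.path.exists(blob.path))  # False, and the exit-time cleanup is now a no-op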

Should I tear down all resources constructed in __init__?

Quite often I create classes which internally use some resources like requests sessions, IMAP connections. I initialize such resources in __init__.
For example:
class SomeClass:
    def __init__(self, login, password):
        self.session = requests.Session()
        self.imap_connection = IMAPLib.connect(...)
        # ... and so on ...
So the main question: should I manually free such resources, like sessions and IMAP connections, or is it quite safe to let them die when the GC runs? Or is it not safe, and if so, what is the best solution?
As I understand it, besides implementing some free_resources method and invoking it explicitly, the alternative is to implement an __enter__ method which returns self and an __exit__ method which tears all these resources down, and then instantiate the class using a with block.
Context manager support (support for with blocks) is the most portable, consistent approach. Otherwise, assuming the resources define their own __del__ cleanup, then, barring reference loops, in CPython they'll be cleaned up when the owner instance loses its last reference. Unfortunately, if there is a cycle, prior to CPython 3.4, the presence of a __del__ finalizer in the cycle will prevent the cyclic GC from cleaning the cycle at all, so memory leaks and no finalizers are invoked. In 3.4+, it will probably, eventually be cleaned, but the timing won't be deterministic thanks to the need for cyclic collection to occur.
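A sketch of that context-manager support for the class from the question; imaplib.IMAP4_SSL and the host name below stand in for the IMAPLib.connect(...) pseudo-code:

import imaplib
import requests

class SomeClass:
    def __init__(self, login, password):
        self.session = requests.Session()
        self.imap_connection = imaplib.IMAP4_SSL('imap.example.com')
        self.imap_connection.login(login, password)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # deterministic teardown, even if the block raised
        self.imap_connection.logout()
        self.session.close()

with SomeClass('user', 'secret') as client:
    ...  # use client.session and client.imap_connection here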

Finding where a python object is hiding

I have a problem in which there is a python object that is hiding somewhere. The object is a wrapper around a C library and I need to call the deallocation routine at exit, otherwise the (behind the scenes) thread will hang (it's a cython object, so the deallocation is put in the __dealloc__ method).
The problem is I can't for the life of me work out where the object is hiding. I haven't intentionally introduced any global state. Is there some way to work out where an object is lingering? Could it just be a lingering object cycle, so gc should pick it up? That said, I'd really like to work out the cause of the problem if possible.
Edit: I solved the problem, which was down to pyglet event handlers not being cleanly removed. They were in the __del__ method, but the object wasn't being deleted because the event dispatcher had hold of an object method. This is fine logically, but it seems odd to me that the object is never deleted, even at exit. Does anyone know why the __del__ is not called at interpreter exit? Actually, this question has been asked - though the answers aren't brilliant.
Anyway, the basic question still stands - how do I reliably find these lingering references?
One possible place is gc.garbage. It is a list of objects that have been found unreachable, but cannot be deleted because they include __del__ methods in a cycle.
In Python versions before 3.4, if you have a cycle with several __del__ methods, the interpreter doesn't know in which order they should be executed, as they could have mutual references. So instead it doesn't execute any of them, and moves the objects to this list.
If you find your object there, the documentation recommends doing del gc.garbage[:].
The solution to avoid this in the first place is to use weakrefs where possible to avoid cycles.
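For the "how do I find these lingering references" part, one approach is gc.get_referrers, which lists the containers that still point at the object. A small self-contained sketch, where the registry dict stands in for the hidden reference (an event dispatcher, module-level cache, etc.):

import gc

class Wrapper:
    pass

leaked = Wrapper()
registry = {'handler': leaked}      # stands in for the hidden reference

gc.collect()                        # clear out any collectable cycles first
for referrer in gc.get_referrers(leaked):
    # the current frame's locals show up here too; the interesting referrers
    # are containers such as dicts, lists, and instance __dict__s
    print(type(referrer), referrer is registry)

print(gc.garbage)                   # uncollectable __del__ cycles (pre-3.4)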
