Should I tear down all resources constructed in __init__? - python

Quite often I create classes that internally use resources such as requests sessions or IMAP connections. I initialize these resources in __init__.
For example:
class SomeClass:
    def __init__(self, login, password):
        self.session = requests.Session()
        self.imap_connection = IMAPLib.connect(...)
        # ... and so on ...
So the main question: should I manually free such resources (sessions, IMAP connections), or is it safe to let them die when the GC runs? And if it is not safe, what is the best solution?
As I understand it, besides implementing some free_resources method and invoking it explicitly, the alternative is to implement __enter__ (which returns self) and an __exit__ method that tears all these resources down, and then instantiate the class using a with block.

Context-manager support (support for with blocks) is the most portable, consistent approach. Otherwise, assuming the resources define their own __del__ cleanup, then, barring reference cycles, in CPython they'll be cleaned up when the owning instance loses its last reference. Unfortunately, if there is a cycle, then prior to CPython 3.4 the presence of a __del__ finalizer in the cycle prevents the cyclic GC from cleaning the cycle at all, so the memory leaks and no finalizers are invoked. In 3.4+ the cycle will eventually be cleaned, but the timing is non-deterministic because a cyclic collection must occur first.
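A minimal sketch of the with-block approach for the questioner's class, using stand-in resources in place of real requests sessions and IMAP connections (the _Resource class and close() method are illustrative, not from the question):

```python
class _Resource:
    """Stand-in for a real resource (requests.Session, IMAP connection)."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class SomeClass:
    def __init__(self, login, password):
        # Acquire resources eagerly, as in the question.
        self.session = _Resource('session')
        self.imap_connection = _Resource('imap')

    def close(self):
        # Explicit teardown; also usable without a with block.
        self.session.close()
        self.imap_connection.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # don't swallow exceptions raised in the block


with SomeClass('user', 'secret') as obj:
    pass  # use obj.session, obj.imap_connection here

print(obj.session.closed)  # True: resources were torn down on exit
```

Making __exit__ delegate to a plain close() method gives callers both options: deterministic cleanup via with, or an explicit call when the object's lifetime doesn't fit a single block.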

Related

How does one expose a memory managed item that must be closed/deleted as an importable resource in python3?

Suppose I have an item that interfaces with a C library. The interface allocates memory.
The __del__ method takes care of everything, but there is no guarantee that __del__ will ever be called by the Python 3 runtime.
So, I have overloaded the context-manager functions and can use my item in a 'with' block:
with Foo(**kwargs) as foo:
    foo.doSomething()
    # no memory leaks
However, I am now exposing my Foo in an __init__.py, and am curious how I could expose the context-manager object in a way that allows a user to use it without a 'with' block.
Is there a way I can open/construct my Foo inside my module such that __del__ (or the context manager's exit function) is guaranteed to be called, so that it is exposed for use without a daemon or other long-running process risking memory loss?
Or is deletion implied when an object is constructed implicitly via import, even though it may or may not occur when the object is constructed in runtime scope?
Although this should probably not be the case, or at least might not always be the case from version to version, Python 3.9 does indeed call __del__ on objects initialized prior to importation.
I can't fully accept this as an answer, because I have no way of directly proving that Python 3 will always call __del__. It has not failed to call it so far, whereas I get a memory leak every time I fail to dispose of the object properly after creating it during normal runtime flow.
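One common way to get a stronger guarantee than __del__ for a module-level instance is to construct it at import time and register its cleanup with atexit, which runs on any normal interpreter shutdown. This is a sketch under assumptions: Foo here is a stand-in for the questioner's C-backed class, and release() is a hypothetical cleanup method equivalent to its __exit__:

```python
import atexit


class Foo:
    """Stand-in for a C-backed resource with context-manager support."""
    def __init__(self):
        self.released = False

    def release(self):
        # Hypothetical cleanup, equivalent to what __exit__ would do.
        self.released = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.release()
        return False


# Module-level instance, as exposed via __init__.py. atexit guarantees
# release() runs at normal interpreter exit, with no 'with' block needed.
foo = Foo()
atexit.register(foo.release)
```

Note the trade-off the thread discusses: the atexit registration itself keeps foo alive until shutdown, so this suits one long-lived module singleton, not many short-lived objects. It also does not fire on a hard crash or os._exit().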

Proper finalization in Python

I have a bunch of instances, each with its own unique temp file (used to save data from memory to disk and retrieve it later).
I want to be sure that, at the end of the day, all these files are removed. However, I want to leave room for fine-grained control of their deletion: some files may be removed earlier if needed (e.g. they are too big and no longer important).
What is the best / recommended way to achieve this?
My thoughts on that:
Try-finally blocks or with statements are not an option, as we have many files whose lifetimes may overlap each other. They also hardly admit finer-grained control.
From what I have read, __del__ is also not feasible, as it is not even guaranteed to run at all (although it is not entirely clear to me what the "risky" cases are). Also (if it is still the case), other libraries may no longer be available by the time __del__ runs.
The tempfile library seems promising. However, the file is gone right after closing it, which is definitely a bummer, as I want files closed (while they perform no operation) to limit the number of open files.
The library promises that the file "will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."
How do they achieve the implicit close? In C#, for example, I would use a (reliable) finalizer, which __del__ is not.
The atexit library seems to be the best candidate: it can work as a reliable finalizer in place of __del__ to implement a safe disposable pattern. The only problem, compared to object finalizers, is that it runs truly at exit, which is rather inconvenient (what if the object is eligible to be garbage-collected earlier?).
Here, the question still stands: how does the library ensure that the registered methods always run (except in truly unexpected cases that are hard to do anything about)?
Ideally, a combination of __del__ and the atexit library might perform best: clean up both in __del__ and in a method registered with atexit, with repeated clean-up forbidden; if __del__ is called, the registered handler is removed.
The only (yet crucial) problem is that __del__ won't run if a bound method is registered with atexit, because the registration keeps a reference to the object alive forever.
Thus, any suggestion, advice, useful link and so on is welcomed.
I suggest considering the built-in weakref module for this task, more specifically weakref.finalize. A simple example:
import weakref

class MyClass:
    pass

def clean_up(*args):
    print('clean_up', args)

my_obj = MyClass()
weakref.finalize(my_obj, clean_up, 'arg1', 'arg2', 'arg3')
del my_obj  # optional
When run, it outputs:
clean_up ('arg1', 'arg2', 'arg3')
Note that clean_up will be executed even without del-ing my_obj (you can delete the last line of code and the behavior will not change). clean_up is called once all strong references to my_obj are gone, or at interpreter exit (weakref.finalize installs an exit handler for finalizers that are still pending, much like the atexit module).
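Applied to the temp-file scenario from the question, a sketch might look like this (the DiskBackedCache class and remove() method are mine, not from the question; the key point is that a finalizer is both callable early for fine-grained control and guaranteed at collection or exit):

```python
import os
import tempfile
import weakref


class DiskBackedCache:
    """Each instance owns one temp file. weakref.finalize unlinks it when
    the instance is garbage collected (or at interpreter exit), while
    remove() allows explicit early deletion."""

    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)  # keep the file on disk, but don't hold it open
        # The finalizer's arguments must not reference self, otherwise
        # they would keep the instance alive forever.
        self._finalizer = weakref.finalize(self, os.unlink, self.path)

    def remove(self):
        # Early, explicit deletion. Calling the finalizer marks it dead,
        # so it will not run a second time at collection/exit.
        self._finalizer()


obj = DiskBackedCache()
path = obj.path
obj.remove()
print(os.path.exists(path))  # False: file removed on demand
```

This combines the two mechanisms the question wants: per-file early removal, plus a safety net that needs no cooperation from __del__ and none of its reference-cycle caveats.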

RAII in Python: What's the point of __del__?

At first glance, it seems like Python's __del__ special method offers much the same advantages a destructor has in C++. But according to the Python documentation (https://docs.python.org/3.4/reference/datamodel.html), there is no guarantee that your object's __del__ method ever gets called at all!
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
So in other words, the method is useless, isn't it? A hook function that may or may not get called really doesn't do much good, so __del__ offers nothing with regard to RAII. If I have some essential cleanup, I don't need it to run some of the time, whenever the GC feels like it; I need it to run reliably, deterministically, 100% of the time.
I know that Python provides context managers, which are far more useful for that task, but why was __del__ kept around at all? What's the point?
__del__ is a finalizer. It is not a destructor. Finalizers and destructors are entirely different animals.
Destructors are called reliably, and only exist in languages with deterministic memory management (such as C++). Python's context managers (the with statement) can achieve similar effects in certain circumstances. These are reliable because the lifespan of an object is precisely fixed; in C++, objects die when they are explicitly deleted or when some scope is exited (or when a smart pointer deletes them in response to its own destruction). And that's when destructors run.
Finalizers are not called reliably. The only valid use of a finalizer is as an emergency safety net (NB: this article is written from a .NET perspective, but the concepts translate reasonably well). For instance, the file objects returned by open() automatically close themselves when finalized. But you're still supposed to close them yourself (e.g. using the with statement). This is because the objects are destroyed dynamically by the garbage collector, which may or may not run right away, and with generational garbage collection, it may or may not collect some objects in any given pass. Since nobody knows what kinds of optimizations we might invent in the future, it's safest to assume that you just can't know when the garbage collector will get around to collecting your objects. That means you cannot rely on finalizers.
In the specific case of CPython, you get slightly stronger guarantees, thanks to the use of reference counting (which is far simpler and more predictable than garbage collection). If you can ensure that you never create a reference cycle involving a given object, that object's finalizer will be called at a predictable point (when the last reference dies). This is only true of CPython, the reference implementation, and not of PyPy, IronPython, Jython, or any other implementations.
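The CPython-specific distinction above can be demonstrated directly. This toy example (class and variable names are mine) contrasts the cycle-free case, where the finalizer fires at a predictable point, with a reference cycle, where it waits for the cyclic collector:

```python
import gc


class Tracked:
    def __init__(self, log, name):
        self.log = log
        self.name = name

    def __del__(self):
        self.log.append(self.name)


log = []

# No cycle: in CPython the finalizer runs as soon as the
# last reference dies, thanks to reference counting.
a = Tracked(log, 'a')
del a
print(log)  # ['a'] immediately

# Cycle: the refcount never reaches zero, so cleanup is deferred
# to the cyclic garbage collector (safe since 3.4 / PEP 442).
b = Tracked(log, 'b')
b.self_ref = b
del b
print('b' in log)  # False: not collected yet
gc.collect()
print('b' in log)  # True: finalized during cyclic collection
```

On PyPy or other implementations without reference counting, even the first case gives no timing guarantee, which is exactly why the answer recommends treating finalizers only as a safety net.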
Because __del__ does get called. It's just unclear when, because in CPython, if you have circular references, the refcounting mechanism can't take care of the object reclamation (and thus its finalization via __del__) and must delegate it to the garbage collector.
The garbage collector then has a problem: it cannot know in which order to break the circular references, because breaking them in the wrong order may trigger additional problems (e.g. freeing memory that is still needed in the finalization of another object in the collected cycle, triggering a segfault).
The point you stress arises because the interpreter may exit for reasons that prevent it from performing the cleanup (e.g. it segfaults, or some C module impolitely calls exit()).
There's PEP 442 (Safe Object Finalization), implemented in Python 3.4. I suggest you take a look at it:
https://www.python.org/dev/peps/pep-0442/

Way to know what objects a python multiprocessing manager is sharing?

In the Python multiprocessing module, in order to obtain an object from a remote Manager, most recipes tell us that we need to register a getter for each object:
class QueueMgr(multiprocessing.managers.SyncManager):
    pass

datos = Queue()
resultados = Queue()
topList = list(top)

QueueMgr.register('get_datos', callable=lambda: datos)
QueueMgr.register('get_resultados', callable=lambda: resultados)
QueueMgr.register('get_top', callable=lambda: topList)

def Cola_run():
    queueMgr = QueueMgr(address=('172.2.0.1', 25555), authkey="foo")
    queueMgr.get_server().serve_forever()

Cola = Thread(target=Cola_run)
Cola.daemon = True
Cola.start()
and then the same getters must be declared in the client program:
class QueueMgr(multiprocessing.managers.SyncManager):
    pass

QueueMgr.register('get_datos')
QueueMgr.register('get_resultados')
QueueMgr.register('get_top')

queueMgr = QueueMgr(address=('172.22.0.4', 25555), authkey="foo")
queueMgr.connect()

datos = queueMgr.get_datos()
resultados = queueMgr.get_resultados()
top = queueMgr.get_top()._getvalue()
OK, it covers most usage cases. But I find the code ugly. Perhaps I am not using the right recipe, but if I am, then at least I could write nicer client code, perhaps automagically declaring the getters, if I were able to know in advance which objects the Manager is sharing. Is there a way to do it?
It is particularly troubling when you consider that the SyncManager instances provided by multiprocessing.Manager() allow you to create sophisticated Proxy objects, yet any client connecting to such a SyncManager seems to need to obtain the references to those proxies from elsewhere.
There's nothing stopping you from introspecting into the class and, for each shared attribute, generating the getter and calling register.
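A sketch of that idea, under assumptions: the shared dict, the get_exposed bootstrap getter, and all names below are mine, not from the question. The server generates its getters in a loop, and additionally exposes the list of shared names so a client can discover them instead of hard-coding each register() call:

```python
import multiprocessing.managers

# Hypothetical registry of the objects the server wants to share; in the
# question these would be the two Queues and topList.
shared = {
    'datos': [],
    'resultados': [],
    'top': [1, 2, 3],
}


class QueueMgr(multiprocessing.managers.SyncManager):
    pass


# Server side: generate one getter per shared name automatically.
# (obj=obj binds the current value; a plain closure would capture the
# loop variable and return the last object for every getter.)
for name, obj in shared.items():
    QueueMgr.register('get_' + name, callable=lambda obj=obj: obj)

# Expose the set of names itself, so clients can discover the getters.
QueueMgr.register('get_exposed', callable=lambda: sorted(shared))

# Client side (sketch; requires a running server):
#   QueueMgr.register('get_exposed')
#   mgr = QueueMgr(address=('127.0.0.1', 25555), authkey=b'foo')
#   mgr.connect()
#   for name in mgr.get_exposed()._getvalue():
#       QueueMgr.register('get_' + name)
```

The client then only needs to know about one bootstrap getter; everything else is registered from the server's own answer.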

Python: flush a buffer before program termination via a finalizer

I keep a cache of transactions to flush (to persistent storage) on the event of a watermark or object finalization. Since __del__ is no longer guaranteed to be called on every object, is the appropriate approach to hook a similar function (or __del__ itself) into atexit.register (during initialization)?
If I'm not mistaken, this will cause the object to which the method is bound to hang around until program termination. This isn't likely to be a problem, but maybe there's a more elegant solution?
Note: I know using __del__ is non-ideal because it can cause uncatchable exceptions, but I can't think of another way to do this short of cascading finalize() calls all the way through my program. TIA!
If you have to handle resources, the preferred way is an explicit call to a close() or finalize() method. Have a look at the with statement to abstract that. In your case the weakref module might also be an option: the cached objects can be garbage collected by the system and have their __del__() method called, or you finalize them yourself if they are still alive.
I would say atexit, or try to see if you can refactor the code so it can be expressed using a with statement, which is in __future__ in 2.5 and on by default in 2.6. 2.5 includes the contextlib module to simplify things a bit. I've done something like this when using Canonical's Storm ORM.
from __future__ import with_statement

import contextlib

@contextlib.contextmanager
def start_transaction(db):
    db.start()
    yield
    db.end()

with start_transaction(db) as transaction:
    ...
For a non-db case, you could just register the objects to be flushed with a global and then use something similar. The benefit of this approach is that it keeps things explicit.
If you don't need your object to be alive at the time you perform the flush, you could use weak references
This is similar to your proposed solution, but rather than using a real reference, store a list of weak references, with a callback function to perform the flush. This way, the references aren't going to keep those objects alive, and you won't run into any circular garbage problems with __del__ methods.
You can run through the list of weak references on termination to manually flush any still alive if this needs to be guaranteed done at a certain point.
Put the following in a file called destructor.py
import atexit

objects = []

def _destructor():
    global objects
    for obj in objects:
        obj.destroy()
    del objects

atexit.register(_destructor)
now use it this way:
import destructor

class MyObj(object):
    def __init__(self):
        destructor.objects.append(self)
        # ... other init stuff

    def destroy(self):
        # clean up resources here
        pass
I think atexit is the way to go here.
