RAII in Python - How to manage the lifetime of a chain of resources

I am a bit of a Python newbie, but I am implementing a benchmarking tool in Python that will, for example, create several sets of resources which depend on each other. When the program finishes, I want to clean up the resources in the correct order.
I'm from a C++ background, in C++ I know I can do this with RAII (constructors, destructors).
What is an equivalent pattern in Python for this problem? Is there a way to do RAII in Python, or is there a better way to solve this problem?

You are probably looking for a context manager, which is an object that can be used in a with statement:
with context() as c:
    do_something(c)
When the with statement is entered, the expression (in this case, context()) will be evaluated and should return a context manager. __enter__() will be called on the context manager, and the result (which may or may not be the same object as the context manager) is assigned to the variable specified with as. No matter how control exits the with body, __exit__() will be called on the context manager, with arguments that specify whether an exception was thrown or not.
As an example: the builtin open() should be used in this way in order to close the opened file after interacting with it.
A new context manager type can easily be defined with contextlib.
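For instance, a minimal sketch using contextlib.contextmanager and contextlib.ExitStack (the resource names and the print calls are placeholders for real setup and teardown):

from contextlib import ExitStack, contextmanager

@contextmanager
def managed_resource(name):
    print(f"acquiring {name}")        # stand-in for real setup
    try:
        yield name
    finally:
        print(f"releasing {name}")    # stand-in for real teardown

# a single resource
with managed_resource("database") as db:
    print(f"using {db}")

# a chain of dependent resources: ExitStack releases them in reverse
# order of acquisition when the with block exits
with ExitStack() as stack:
    resources = [stack.enter_context(managed_resource(n)) for n in ("a", "b", "c")]
    print(f"using {resources}")

Because ExitStack unwinds in reverse order of acquisition, resources that depend on earlier ones are released first, which matches the destruction order RAII would give you in C++.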
For a more one-off solution, you can use try/finally: the finally block is executed after the try block, no matter how control exits the try block:
try:
    do_something()
finally:
    cleanup()

Related

Are generators with context managers an anti-pattern?

I'm wondering about code like this:
def all_lines(filename):
    with open(filename) as infile:
        yield from infile
The point of a context manager is to have explicit control over the lifetime of some form of state, e.g. a file handle. A generator, on the other hand, keeps its state until it is exhausted or deleted.
I do know that both cases work in practice. But I'm worried about whether it is a good idea. Consider for example this:
def all_first_lines(filenames):
    return [next(all_lines(filename), None) for filename in filenames]
I never exhaust the generators. Instead, their state is destroyed when the generator object is deleted. This works fine in reference-counted implementations like CPython, but what about garbage-collected implementations? I'm practically relying on the reference counter for managing state, something that context managers were explicitly designed to avoid!
And even in CPython it shouldn't be too hard to construct cases where a generator is part of a reference cycle and needs the garbage collector to be destroyed.
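For instance, one way to construct such a case (the file name is made up, and the file is created here just so the snippet runs):

from pathlib import Path

Path("data.txt").write_text("line 1\nline 2\n")   # hypothetical input file

def lines(holder):
    with open(holder["path"]) as infile:
        yield from infile

holder = {"path": "data.txt"}
holder["gen"] = lines(holder)   # holder -> generator -> frame -> holder: a reference cycle
next(holder["gen"])             # the generator is now suspended inside the with block
del holder                      # the cycle keeps the suspended generator, and with it the
                                # open file, alive until the cycle collector runs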
To summarize: Would you consider it prudent to avoid context managers in generators, for example by refactoring the above code into something like this?
def all_lines(filename):
    with open(filename) as infile:
        return infile.readlines()

def first_line(filename):
    with open(filename) as infile:
        return next(infile, None)

def all_first_lines(filenames):
    return [first_line(filename) for filename in filenames]
While it does indeed extend the lifetime of the resource until the generator exits or is destroyed, it can also make the generators clearer to work with.
Consider creating the generators under an outer with and passing the file in as an argument instead of having them open it. Now the file is invalid for use after the context manager exits, even though the generators can still look usable.
If limiting how long the handles are held is important, you can explicitly close the generators using their close() method after you are done with them, as in the sketch below.
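For example, a short sketch of both options (the file name is a placeholder, and the file is created just so the snippet runs):

from pathlib import Path

Path("data.txt").write_text("first\nsecond\n")   # hypothetical input file

# Option 1: the caller owns the file and passes it in.
def all_lines(infile):
    yield from infile

with open("data.txt") as infile:
    first = next(all_lines(infile), None)
# the file is closed here even though the generator was never exhausted

# Option 2: the generator owns the file but is closed explicitly.
def all_lines_owning(filename):
    with open(filename) as infile:
        yield from infile

gen = all_lines_owning("data.txt")
first = next(gen, None)
gen.close()   # raises GeneratorExit inside the generator, running the with cleanup now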
This is a similar problem to what trio tries to solve with its nurseries for asynchronous tasks, where the nursery context manager waits for every task spawned from that nursery to exit before proceeding; the tutorial example illustrates this. This blog post by the author provides some reasoning for the way it's done in trio, and is an interesting read that's somewhat related to the problem.
There are two answers to your question:
the absolutist: indeed, the context managers will not serve their role, and the GC will have to clean up the mess that should not have happened
the pragmatic: true, but is it actually a problem? Your file handle will get released a few milliseconds later; what's the bother? Does it have a measurable impact in production, or is it just bikeshedding?
I'm not an expert on the differences between Python's alternative implementations (see this page for PyPy's example), but I posit that this lifetime problem will not occur in 99% of cases. If you happen to hit it in prod, then yes, you should address it (either with your proposal, or with a mix of generator and context manager); otherwise, why bother? I mean it in a kind way: your point is strictly valid, but irrelevant to most cases.

How does one expose a memory managed item that must be closed/deleted as an importable resource in python3?

Suppose I have an item that interfaces with a C library. The interface allocates memory.
The __del__ method takes care of everything, but there is no guarantee the __del__ method will be called in an imperative python3 runtime.
So, I have implemented the context manager methods and can use my item in a with statement:
with Foo(**kwargs) as foo:
    foo.doSomething()
    # no memory leaks
However, I am now exposing my foo in an __init__.py, and am curious how I could possibly expose the context manager object in a way that allows a user to use it without using it inside of a 'with' block.
Is there a way I can open/construct my Foo inside my module such that there is a guarantee that __del__ (or the context manager exit function) will be called, so that it is exposed for use but doesn't expose a daemon or other long-term process to the risk of a memory leak?
Or is deletion implied when an object is constructed implicitly via import, even though it ~may or may not~ occur when the object is constructed in the runtime scope?
Although this should probably not be the case, or at least might not always be the case from version to version...
...Python 3.9 does indeed call __del__ on objects initialized at import time.
I can't accept this answer because I have no way of directly proving that Python 3 will always call __del__, but it has not failed to call it so far, whereas I get a memory leak every time I do not dispose of the object properly after creating it during the normal runtime flow.

Proper finalization in Python

I have a bunch of instances, each having a unique tempfile for its use (save data from memory to disk and retrieve them later).
I want to be sure that at the end of the day, all these files are removed. However, I want to leave room for fine-grained control of their deletion. That is, some files may be removed earlier, if needed (e.g. they are too big and not important any more).
What is the best / recommended way to achieve this?
My thoughts on that:
The try-finally blocks or with statements are not an option, as we have many files whose lifetimes may overlap each other. Also, this hardly admits the option of finer control.
From what I have read, __del__ is also not a feasible option, as it is not even guaranteed that it will eventually run (although, it is not entirely clear to me, what are the "risky" cases). Also (if it is still the case), the libraries may not be available when __del__ runs.
The tempfile library seems promising. However, the file is gone as soon as it is closed, which is definitely a bummer, as I want the files to be closed (when they perform no operation) to limit the number of open files.
The library promises that the file "will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."
How do they achieve the implicit close? E.g. in C# I would use a (reliable) finalizer, which __del__ is not.
The atexit library seems to be the best candidate; it can work as a reliable finalizer instead of __del__ to implement a safe disposable pattern. The only problem, compared to object finalizers, is that it runs truly at exit, which is rather inconvenient (what if the object is eligible to be garbage-collected earlier?).
Here, the question still stands: how does the library achieve that the registered methods always run? (Except in really unexpected cases, with which it is hard to do anything.)
In the ideal case, it seems that a combination of __del__ and the atexit library may perform best. That is, the clean-up happens both in __del__ and in the method registered with atexit, while repeated clean-up would be forbidden. If __del__ were called, the registered handler would be removed.
The only (yet crucial) problem is that __del__ won't run if a bound method is registered with atexit, because the registration keeps a reference to the object alive forever, as the small sketch below shows.
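A small sketch of that problem (the class and method names are made up for illustration):

import atexit

class Temp:
    def cleanup(self):
        print("cleanup")

    def __del__(self):
        print("__del__")

t = Temp()
atexit.register(t.cleanup)   # the registered bound method holds a reference to t...
del t                        # ...so __del__ does not run here; cleanup only runs at exit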
Thus, any suggestion, advice, useful link and so on is welcomed.
I suggest considering the built-in weakref module for this task, more specifically weakref.finalize. A simple example:
import weakref

class MyClass:
    pass

def clean_up(*args):
    print('clean_up', args)

my_obj = MyClass()
weakref.finalize(my_obj, clean_up, 'arg1', 'arg2', 'arg3')
del my_obj  # optional
When run, it will output:
clean_up ('arg1', 'arg2', 'arg3')
Note that clean_up will be executed even without del-ing my_obj (you can delete the last line of code and the behavior will not change). clean_up is called after all strong references to my_obj are gone, or at interpreter exit (like with the atexit module).
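Applied to the temp-file scenario in the question, a hedged sketch might look like this (the class name is made up; the key point is that the finalizer must not hold a reference to self, so only the path is passed to it):

import os
import tempfile
import weakref

class Spool:
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)   # keep the number of open handles down; reopen on demand
        self._finalizer = weakref.finalize(self, os.remove, self.path)

    def remove(self):
        # fine-grained control: delete the file early if it is no longer needed;
        # calling a finalizer a second time is a no-op
        self._finalizer()

spool = Spool()
spool.remove()   # explicit early deletion; otherwise the file is removed when
                 # spool is garbage collected or at interpreter exit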

What is the Python "with" statement used for?

I am trying to understand the with statement in Python. Everywhere I look it talks of opening and closing a file, and says it is meant to replace the try-finally block. Could someone post some other examples too? I am just trying out Flask and there are with statements galore in it. I would really appreciate some clarity on it.
There's a very nice explanation here. Basically, the with statement calls two special methods on the associated object: __enter__ and __exit__. The __enter__ method returns the value that gets bound to the variable in the as clause, while the __exit__ method is called after the block executes to handle any cleanup (such as closing a file handle).
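A minimal sketch of those two methods (the class is a toy example, not from any library):

import time

class Timer:
    def __enter__(self):
        self.start = time.perf_counter()
        return self                   # this is what gets bound after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False                  # returning False lets exceptions propagate

with Timer() as t:
    total = sum(range(1_000_000))
print(t.elapsed)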
The idea of the with statement is to make "doing the right thing" the path of least resistance. While the file example is the simplest, threading locks actually provide a more classic example of non-obviously buggy code:
try:
    lock.acquire()
    # do stuff
finally:
    lock.release()
This code is broken - if the lock acquisition ever fails, either the wrong exception will be thrown (since the code will attempt to release a lock that it never acquired), or, worse, if this is a recursive lock, it will be released early. The correct code looks like this:
lock.acquire()
try:
    # do stuff
finally:
    # If lock.acquire() fails, this *doesn't* run
    lock.release()
By using a with statement, it becomes impossible to get this wrong, since it is built into the context manager:
with lock:  # The lock *knows* how to correctly handle acquisition and release
    # do stuff
The other place where the with statement helps greatly is similar to the major benefit of function and class decorators: it takes "two piece" code, which may be separated by an arbitrary number of lines of code (the function definition for decorators, the try block in the current case) and turns it into "one piece" code where the programmer simply declares up front what they're trying to do.
For short examples, this doesn't look like a big gain, but it actually makes a huge difference when reviewing code. When I see lock.acquire() in a piece of code, I need to scroll down and check for a corresponding lock.release(). When I see with lock:, though, no such check is needed - I can see immediately that the lock will be released correctly.
There are twelve examples of using with in PEP 343, including the file-open example:
1. A template for ensuring that a lock, acquired at the start of a block, is released when the block is left
2. A template for opening a file that ensures the file is closed when the block is left
3. A template for committing or rolling back a database transaction
4. Example 1 rewritten without a generator
5. Redirect stdout temporarily
6. A variant on opened() that also returns an error condition
7. Another useful example would be an operation that blocks signals
8. Another use for this feature is the Decimal context
9. Here's a simple context manager for the decimal module
10. A generic "object-closing" context manager
11. A released() context to temporarily release a previously acquired lock by swapping the acquire() and release() calls
12. A "nested" context manager that automatically nests the supplied contexts from left-to-right to avoid excessive indentation
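As a concrete illustration of one of these, temporarily redirecting stdout can be done with contextlib.redirect_stdout; a short sketch:

import io
from contextlib import redirect_stdout

buffer = io.StringIO()
with redirect_stdout(buffer):
    print("captured")             # written to buffer instead of the terminal
print(buffer.getvalue(), end="")  # prints 'captured'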

Does an application-wide exception handler make sense?

Long story short, I have a substantial Python application that, among other things, makes calls out to "losetup", "mount", etc. on Linux, essentially consuming system resources that must be released when it is done.
If my application crashes, I want to ensure these system resources are properly released.
Does it make sense to do something like the following?
def main():
    # TODO: main application entry point
    pass

def cleanup():
    # TODO: release system resources here
    pass
if __name__ == "__main__":
    try:
        main()
    except:
        cleanup()
        raise
Is this something that is typically done? Is there a better way? Perhaps the destructor in a singleton class?
I like top-level exception handlers in general (regardless of language). They're a great place to cleanup resources that may not be immediately related to resources consumed inside the method that throws the exception.
It's also a fantastic place to log those exceptions if you have such a framework in place. Top-level handlers will catch those bizarre exceptions you didn't plan on and let you correct them in the future, otherwise, you may never know about them at all.
Just be careful that your top-level handler doesn't throw exceptions!
A destructor (as in a __del__ method) is a bad idea, as these are not guaranteed to be called. The atexit module is a safer approach, although these will still not fire if the Python interpreter crashes (rather than the Python application), or if os._exit() is used, or the process is killed aggressively, or the machine reboots. (Of course, the last item isn't an issue in your case.) If your process is crash-prone (it uses fickle third-party extension modules, for instance) you may want to do the cleanup in a simple parent process for more isolation.
If you aren't really worried, use the atexit module.
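A minimal sketch of the atexit approach (the cleanup body is a placeholder for the real losetup/mount teardown):

import atexit

def cleanup():
    # placeholder: detach loop devices, unmount filesystems, etc.
    print("releasing system resources")

atexit.register(cleanup)   # runs at normal interpreter exit, including after an
                           # unhandled exception, but not after os._exit() or a hard crash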
An application-wide handler is fine. They are great for logging. Just make sure that the application-wide one is robust and unlikely to crash itself.
If you use classes, you should free the resources they allocate in their destructors instead, of course. Use the try: around the entire application only if you want to free resources that aren't already released by your classes' destructors.
And instead of using a catch-all except:, you should use the following block:
try:
    main()
finally:
    cleanup()
That will ensure cleanup in a more pythonic way.
That seems like a reasonable approach, and more straightforward and reliable than a destructor on a singleton class. You might also look at the "atexit" module. (Pronounced "at exit", not "a tex it" or something like that. I confused that for a long while.)
Consider writing a context manager and using the with statement.
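For example, a hedged sketch of what that might look like for the mount case (the exact commands, device, and mount point are illustrative; adapt them to how the application actually invokes mount/umount):

import subprocess
from contextlib import contextmanager

@contextmanager
def mounted(device, mountpoint):
    subprocess.run(["mount", device, mountpoint], check=True)
    try:
        yield mountpoint
    finally:
        subprocess.run(["umount", mountpoint], check=True)

# usage: the unmount runs even if the body raises
# with mounted("/dev/loop0", "/mnt/scratch") as mnt:
#     do_work(mnt)   # do_work is hypothetical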
