I have a bunch of instances, each having a unique tempfile for its use (save data from memory to disk and retrieve them later).
I want to be sure that, at the end of the day, all these files are removed. However, I want to leave room for fine-grained control of their deletion. That is, some files may be removed earlier if needed (e.g. they are too big and no longer important).
What is the best / recommended way to achieve this?
My thoughts on that
The try-finally blocks or with statements are not an option, as we have many files whose lifetimes may overlap. They also hardly admit finer control.
From what I have read, __del__ is also not a feasible option, as it is not even guaranteed to run eventually (although it is not entirely clear to me what the "risky" cases are). Also (if it is still the case), libraries may no longer be available when __del__ runs.
The tempfile library seems promising. However, the file is gone as soon as it is closed, which is definitely a bummer, as I want files to be closed (when they perform no operation) to limit the number of open files.
The library promises that the file "will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected)."
How do they achieve the implicit close? In C#, for example, I would use a (reliable) finalizer, which __del__ is not.
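(Aside: one way to decouple closing from deletion with tempfile is the delete=False flag, which keeps the file on disk after close so deletion stays under your control. A minimal sketch:)

```python
import os
import tempfile

# delete=False: closing the handle does NOT remove the file from disk
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.write(b'payload')
tmp.close()  # the file survives; reopen it later via tmp.name

with open(tmp.name, 'rb') as f:
    assert f.read() == b'payload'

os.remove(tmp.name)  # deletion happens exactly when you decide
```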
The atexit library seems to be the best candidate; it can work as a reliable finalizer instead of __del__ to implement a safe disposable pattern. The only problem, compared to object finalizers, is that it runs truly at exit, which is rather inconvenient (what if the object is eligible to be garbage-collected earlier?).
Here, the question still stands: how does the library achieve that the registered methods always run? (Except in really unexpected cases, which are hard to do anything about.)
In the ideal case, it seems that a combination of __del__ and the atexit library may perform best. That is, the clean-up runs both in __del__ and in the method registered with atexit, while repeated clean-up is forbidden: if __del__ was called, the registered method is removed.
The only (yet crucial) problem is that __del__ won't run if a method is registered with atexit, because a reference to the object then exists forever.
Thus, any suggestion, advice, useful link and so on is welcomed.
I suggest considering the built-in weakref module for this task, more specifically weakref.finalize. A simple example:
import weakref

class MyClass:
    pass

def clean_up(*args):
    print('clean_up', args)

my_obj = MyClass()
weakref.finalize(my_obj, clean_up, 'arg1', 'arg2', 'arg3')
del my_obj  # optional
When run, it will output:
clean_up ('arg1', 'arg2', 'arg3')
Note that clean_up will be executed even without del-ing my_obj (you may delete the last line of code and the behavior will not change). clean_up is called once all strong references to my_obj are gone, or at interpreter exit (like the atexit module).
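Applied to the temp-file question above, a minimal sketch might look like the following (the class and method names are my own invention; the key point is that weakref.finalize runs at garbage collection or at exit, whichever comes first, and never twice):

```python
import os
import tempfile
import weakref

class DiskCache:
    """Hypothetical per-instance temp file with guaranteed cleanup."""
    def __init__(self):
        fd, self.path = tempfile.mkstemp()
        os.close(fd)  # keep the open-file count low; reopen on demand
        # Runs when the instance is collected OR at interpreter exit,
        # whichever comes first; calling it again is a no-op.
        self._finalizer = weakref.finalize(self, os.remove, self.path)

    def delete_early(self):
        # Fine-grained control: remove the file right now
        self._finalizer()

c = DiskCache()
path = c.path
c.delete_early()
assert not os.path.exists(path)
del c  # the finalizer is already dead; nothing runs twice
```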
Related
In the comments of this question about a python one-liner, it occurred to me I have no idea how python handles anonymous file objects. From the question:
open(to_file, 'w').write(open(from_file).read())
There are two calls to open without using the with keyword (which is usually how I handle files). I have, in the past, used this kind of unnamed file. IIRC, it seemed there was a leftover OS-level lock on the file that would expire after a minute or two.
So what happens to these file handles? Are they cleaned up by garbage collection? By the OS? What happens to the Python machine and file when close() is called, and will it all happen anyway when the script finishes and some time passes?
Monitoring the file descriptor on Linux (by checking /proc/$$/fd) and the file handle on Windows (using SysInternals tools), it appears that the file is closed immediately after the statement.
This cannot be guaranteed, however, since the garbage collector has to execute. In the testing I have done, it does get closed at once every time.
The with statement is recommended for use with open; however, the occasions when it is actually needed are rare. It is difficult to demonstrate a scenario where you must use with, but it is probably a good idea to be safe.
So your one-liner becomes:
with open(to_file, 'w') as tof, open(from_file) as fof:
    tof.write(fof.read())
The advantage of with is that the special method (in the io class) called __exit__() is guaranteed* to be called.
* Unless you do something like os._exit().
The files will get closed after the garbage collector collects them, CPython will collect them immediately because it uses reference counting, but this is not a guaranteed behavior.
If you use files without closing them in a loop, you might run out of file descriptors; that's why it's recommended to use the with statement (if you're using Python 2.5, you can use from __future__ import with_statement).
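The close-on-last-reference behavior is easy to observe in CPython (a sketch that relies on reference counting, so it is CPython-specific; PyPy and other implementations may close the file later):

```python
import os
import tempfile
import weakref

fd, path = tempfile.mkstemp()
os.close(fd)

f = open(path, 'w')
ref = weakref.ref(f)  # watch the file object without keeping it alive
f.write('hi')
del f  # last reference gone: CPython flushes and closes immediately

assert ref() is None            # the file object was reclaimed at once
with open(path) as g:
    assert g.read() == 'hi'     # the buffer was flushed by that implicit close
os.remove(path)
```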
At first glance, it seems like Python's __del__ special method offers much the same advantages a destructor has in C++. But according to the Python documentation (https://docs.python.org/3.4/reference/datamodel.html), there is no guarantee that your object's __del__ method ever gets called at all!
It is not guaranteed that __del__() methods are called for objects that still exist when the interpreter exits.
So in other words, the method is useless, isn't it? A hook function that may or may not get called really doesn't do much good, so __del__ offers nothing with regard to RAII. If I have some essential cleanup, I don't need it to run some of the time, whenever the GC feels like it; I need it to run reliably, deterministically and 100% of the time.
I know that Python provides context managers, which are far more useful for that task, but why was __del__ kept around at all? What's the point?
__del__ is a finalizer. It is not a destructor. Finalizers and destructors are entirely different animals.
Destructors are called reliably, and only exist in languages with deterministic memory management (such as C++). Python's context managers (the with statement) can achieve similar effects in certain circumstances. These are reliable because the lifespan of an object is precisely fixed; in C++, objects die when they are explicitly deleted or when some scope is exited (or when a smart pointer deletes them in response to its own destruction). And that's when destructors run.
Finalizers are not called reliably. The only valid use of a finalizer is as an emergency safety net (NB: this article is written from a .NET perspective, but the concepts translate reasonably well). For instance, the file objects returned by open() automatically close themselves when finalized. But you're still supposed to close them yourself (e.g. using the with statement). This is because the objects are destroyed dynamically by the garbage collector, which may or may not run right away, and with generational garbage collection, it may or may not collect some objects in any given pass. Since nobody knows what kinds of optimizations we might invent in the future, it's safest to assume that you just can't know when the garbage collector will get around to collecting your objects. That means you cannot rely on finalizers.
In the specific case of CPython, you get slightly stronger guarantees, thanks to the use of reference counting (which is far simpler and more predictable than garbage collection). If you can ensure that you never create a reference cycle involving a given object, that object's finalizer will be called at a predictable point (when the last reference dies). This is only true of CPython, the reference implementation, and not of PyPy, IronPython, Jython, or any other implementations.
Because __del__ does get called. It's just that it's unclear when it will, because in CPython if you have circular references, the refcount mechanism can't take care of the object reclamation (and thus its finalization via __del__) and must delegate it to the garbage collector.
The garbage collector then has a problem: it cannot know in which order to break the circular references, because this may trigger additional problems (e.g. freeing memory that is going to be needed in the finalization of another object that is part of the collected loop, triggering a segfault).
The point you stress arises because the interpreter may exit for reasons that prevent it from performing the cleanup (e.g. it segfaults, or some C module impolitely calls exit()).
There's PEP 442 for safe object finalization that has been finalized in 3.4. I suggest you take a look at it.
https://www.python.org/dev/peps/pep-0442/
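A small sketch of the difference (the timing is CPython-specific, and the cycle-with-finalizer behavior shown is the PEP 442 semantics from Python 3.4 onward):

```python
import gc

log = []

class Node:
    def __del__(self):
        log.append('finalized')

a = Node()
del a                       # no cycle: refcount hits zero, __del__ runs now
assert log == ['finalized']

b = Node()
b.partner = b               # reference cycle: refcounting can't reclaim it
del b
assert log == ['finalized']           # __del__ has NOT run yet

gc.collect()                # the cycle collector finds it; since PEP 442,
                            # __del__ still runs for objects in cycles
assert log == ['finalized', 'finalized']
```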
I have a problem in which there is a python object that is hiding somewhere. The object is a wrapper around a C library and I need to call the deallocation routine at exit, otherwise the (behind the scenes) thread will hang (it's a cython object, so the deallocation is put in the __dealloc__ method).
The problem is I can't for the life of me work out where the object is hiding. I haven't intentionally introduced any global state. Is there some way to work out where an object is lingering? Could it just be a lingering object cycle, so gc should pick it up? That said, I'd really like to work out the cause of the problem if possible.
Edit: I solved the problem, which was down to pyglet event handlers not being cleanly removed. They were removed in the __del__ method, but the object was never deleted because the event dispatcher held a reference to one of the object's bound methods. This is fine logically, but it seems odd to me that the object is never deleted, even at exit. Does anyone know why __del__ is not called at interpreter exit? Actually, this question has been asked before, though the answers aren't brilliant.
Anyway, the basic question still stands - how do I reliably find these lingering references?
One possible place is gc.garbage It is a list of objects that have been found unreachable, but cannot be deleted because they include __del__ methods in a cycle.
In Python versions before 3.4, if you have a cycle with several __del__ methods, the interpreter doesn't know in which order they should be executed, as they could have mutual references. So instead it executes none of them, and moves the objects to this list.
If you find your object there, the documentation recommends doing del gc.garbage[:].
The solution to avoid this in the first place is to use weakrefs where possible to avoid cycles.
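For the "where is my object hiding?" part of the question, gc.get_referrers lists every container that still holds a strong reference. A debugging sketch (the Suspect class and the hidden dict are stand-ins for the real object and the event dispatcher):

```python
import gc

class Suspect:
    pass

obj = Suspect()
hidden = {'handler': obj}   # stands in for e.g. an event dispatcher's registry

# Every container still referencing obj shows up among the referrers
referrers = [r for r in gc.get_referrers(obj) if isinstance(r, dict)]
assert any(r is hidden for r in referrers)
```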
Do open files (and other resources) get automatically closed when the script exits due to an exception?
I'm wondering if I need to be closing my resources during my exception handling.
EDIT: to be more specific, I am creating a simple log file in my script. I want to know if I need to be concerned about closing the log file explicitly in the case of exceptions.
Since my script has complex, nested try/except blocks, doing so is somewhat complicated, so if Python, the C library, or the OS is going to close my text file when the script crashes/errors out, I don't want to waste too much time on making sure the file gets closed.
If there is a part in Python manual that talks about this, please refer me to it, but I could not find it.
A fairly straightforward question.
Two answers.
One saying, “Yes.”
The other saying, “No!”
Both with significant upvotes.
Who to believe? Let me attempt to clarify.
Both answers have some truth to them, and it depends on what you mean by a
file being closed.
First, consider what is meant by closing a file from the operating system’s
perspective.
When a process exits, the operating system clears up all the resources
that only that process had open. Otherwise badly-behaved programs that
crash but didn’t free up their resources could consume all the system
resources.
If Python was the only process that had that file open, then the file will
be closed. Similarly the operating system will clear up memory allocated by
the process, any networking ports that were still open, and most other
things. There are a few exceptional functions like shmat that create
objects that persist beyond the process, but for the most part the
operating system takes care of everything.
Now, what about closing files from Python’s perspective? If any program
written in any programming language exits, most resources will get cleaned
up—but how does Python handle cleanup inside standard Python programs?
The standard CPython implementation of Python—as opposed to other Python
implementations like Jython—uses reference counting to do most of its
garbage collection. An object has a reference count field. Every time
something in Python gets a reference to some other object, the reference
count field in the referred-to object is incremented. When a reference is
lost, e.g., because a variable is no longer in scope, the reference count is
decremented. When the reference count hits zero, no Python code can reach
the object anymore, so the object gets deallocated. And when it gets
deallocated, Python calls the __del__() destructor.
Python’s __del__() method for files flushes the buffers and closes the
file from the operating system’s point of view. Because of reference
counting, in CPython, if you open a file in a function and don’t return the
file object, then the reference count on the file goes down to zero when
the function exits, and the file is automatically flushed and closed. When
the program ends, CPython dereferences all objects, and all objects have
their destructors called, even if the program ends due to an unhandled
exception. (This does technically fail for the pathological case where you have a cycle
of objects with destructors,
at least in Python versions before 3.4.)
But that’s just the CPython implementation. Python the language is defined
in the Python language reference, which is what all Python
implementations are required to follow in order to call themselves
Python-compatible.
The language reference explains resource management in its data model
section:
Some objects contain references to “external” resources such as open
files or windows. It is understood that these resources are freed when
the object is garbage-collected, but since garbage collection is not
guaranteed to happen, such objects also provide an explicit way to
release the external resource, usually a close() method. Programs are
strongly recommended to explicitly close such objects. The
‘try...finally‘ statement and the ‘with‘ statement provide convenient
ways to do this.
That is, CPython will usually immediately close the object, but that may
change in a future release, and other Python implementations aren’t even
required to close the object at all.
So, for portability and because explicit is better than implicit,
it’s highly recommended to call close() on everything that can be
close()d, and to do that in a finally block if there is code between
the object creation and close() that might raise an exception. Or to use
the with syntactic sugar that accomplishes the same thing. If you do
that, then buffers on files will be flushed, even if an exception is
raised.
However, even with the with statement, the same underlying mechanisms are
at work. If the program crashes in a way that doesn’t give Python’s
__del__() method a chance to run, you can still end up with a corrupt
file on disk:
#!/usr/bin/env python3.3
import ctypes

# Cast the memory address 0x0001 to the C function int f()
prototype = ctypes.CFUNCTYPE(ctypes.c_int)
f = prototype(1)

with open('foo.txt', 'w') as x:
    x.write('hi')
    # Segfault before the with block ends
    print(f())
This program produces a zero-length file. It’s an abnormal case, but it
shows that even with the with statement resources won’t always
necessarily be cleaned up the way you expect. Python tells the operating
system to open a file for writing, which creates it on disk; Python writes hi
into the C library’s stdio buffers; and then it crashes before the with
statement ends, and because of the apparent memory corruption, it’s not safe
for the operating system to try to read the remains of the buffer and flush them to disk. So the program fails to clean up properly even though there’s a with statement. Whoops. Despite this, close() and with almost always work, and your program is always better off having them than not having them.
So the answer is neither yes nor no. The with statement and close() are technically not
necessary for most ordinary CPython programs. But not using them results in
non-portable code that will look wrong. And while they are extremely
helpful, it is still possible for them to fail in pathological cases.
No, they don't.
Use with statement if you want your files to be closed even if an exception occurs.
From the docs:
The with statement is used to wrap the execution of a block with
methods defined by a context manager. This allows common
try...except...finally usage patterns to be encapsulated for convenient reuse.
From docs:
The with statement allows objects like files to be used in a way that ensures they are always cleaned up promptly and correctly.
with open("myfile.txt") as f:
    for line in f:
        print line,
After the statement is executed, the file f is always closed, even if a problem was encountered while processing the lines. Other objects which provide predefined clean-up actions will indicate this in their documentation.
Yes they do.
This is a C-library (at least in CPython) and operating-system thing. When the script exits, the C library will flush and close all file objects. Even if it doesn't (e.g., Python itself crashes), the operating system closes its resources just like for any other process. It doesn't matter whether it was an exception or a normal exit, or even whether it's Python or any other program.
Here's a script that writes a file and raises an exception before the file contents have been flushed to disk. Works fine:
~/tmp/so$ cat xyz.txt
cat: xyz.txt: No such file or directory
~/tmp/so$ cat exits.py
f = open("xyz.txt", "w")
f.write("hello")
print("file is", open("xyz.txt").read())
assert False
~/tmp/so$ python exits.py
('file is', '')
Traceback (most recent call last):
File "exits.py", line 4, in <module>
assert False
AssertionError
~/tmp/so$ cat xyz.txt
hello
I, as well as other persons in this thread, am left with the question: well, what is finally true?
Now, supposing that files are left open on premature program termination (and there are a lot of such cases besides exceptions due to file handling), the only safe way to avoid this is to read the whole file (or the needed part of it) into a buffer and close it. Then handle the contents in the buffer as needed. This is especially the case for global searches, changes, etc. that have to be done on the file. After the changes are done, one can write the whole buffer to the same or another file at once, avoiding the risk of leaving the newly created file open by doing a lot of reads and writes, which is the worst case of all!
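A minimal sketch of that buffer-based pattern (the data.txt file and the replacement are made up for illustration):

```python
import os

# Hypothetical input file for the demo
with open('data.txt', 'w') as f:
    f.write('old value')

# Read everything into a buffer and close the file immediately
with open('data.txt') as f:
    text = f.read()

text = text.replace('old', 'new')   # all edits happen in memory

# A single short write at the end, instead of many interleaved reads/writes
with open('data.txt', 'w') as f:
    f.write(text)

with open('data.txt') as f:
    assert f.read() == 'new value'

os.remove('data.txt')  # demo cleanup
```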
I keep a cache of transactions to flush (to persistent storage) on the event of a watermark or object finalization. Since __del__ is no longer guaranteed to be called on every object, is the appropriate approach to hook a similar function (or __del__ itself) into atexit.register (during initialization)?
If I'm not mistaken, this will cause the object to which the method is bound to hang around until program termination. This isn't likely to be a problem, but maybe there's a more elegant solution?
Note: I know using __del__ is non-ideal because it can cause uncatchable exceptions, but I can't think of another way to do this short of cascading finalize() calls all the way through my program. TIA!
If you have to handle resources, the preferred way is to have an explicit call to a close() or finalize() method. Have a look at the with statement to abstract that. In your case the weakref module might be an option: the cached objects can be garbage collected by the system and have their __del__() method called, or you finalize them if they are still alive.
I would say atexit, or try and see if you can refactor the code into being expressible with a with statement, which is in __future__ in 2.5 and on by default in 2.6. 2.5 includes a module, contextlib, to simplify things a bit. I've done something like this when using Canonical's Storm ORM.
from __future__ import with_statement
import contextlib

@contextlib.contextmanager
def start_transaction(db):
    db.start()
    yield
    db.end()

with start_transaction(db) as transaction:
    ...
For a non-db case, you could just register the objects to be flushed with a global and then use something similar. The benefit of this approach is that it keeps things explicit.
If you don't need your object to be alive at the time you perform the flush, you could use weak references
This is similar to your proposed solution, but rather than using a real reference, store a list of weak references, with a callback function to perform the flush. This way, the references aren't going to keep those objects alive, and you won't run into any circular garbage problems with __del__ methods.
You can run through the list of weak references on termination to manually flush any still alive if this needs to be guaranteed done at a certain point.
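A sketch of that idea (TxnCache and the registry are my own names; the flush logic is stubbed out): registration stores only weak references, so it never keeps a cache alive, and an atexit sweep flushes whatever survived:

```python
import atexit
import weakref

_registry = []  # weakrefs only: registration does not keep caches alive

class TxnCache:
    """Hypothetical transaction cache flushed to persistent storage."""
    def __init__(self):
        self.pending = []
        _registry.append(weakref.ref(self))

    def flush(self):
        # write self.pending to persistent storage here (omitted)
        self.pending.clear()

@atexit.register
def _flush_survivors():
    for ref in _registry:
        cache = ref()
        if cache is not None:   # dead references are simply skipped
            cache.flush()

c = TxnCache()
c.pending.append('txn-1')
_flush_survivors()          # what atexit would do at interpreter exit
assert c.pending == []
```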
Put the following in a file called destructor.py
import atexit

objects = []

def _destructor():
    global objects
    for obj in objects:
        obj.destroy()
    del objects

atexit.register(_destructor)
Now use it this way:

import destructor

class MyObj(object):
    def __init__(self):
        destructor.objects.append(self)
        # ... other init stuff

    def destroy(self):
        # clean up resources here
        pass
I think atexit is the way to go here.