The Python reference on the data model notes that
catching an exception with a ‘try…except’ statement may keep objects alive.
It seems rather obvious that exceptions change control flow, potentially leading to different objects remaining referenced. Why is it explicitly mentioned? Is there a potential for memory leaks here?
An exception stores a traceback, which stores all child frames ("function calls") between raising and excepting. Frames reference all local names and their values, preventing the garbage collection of local names and values.
This means that an exception handler should promptly finish handling exceptions to allow child locals to be cleaned up. Still, a function cannot rely on its locals being collectable immediately after the function ends.
As a result, patterns such as RAII are not reliable to be prompt even on reference counted implementations. When prompt cleanup is required, objects should provide a means for explicit cleanup (for use in finally blocks) or preferably automatic cleanup (for use in with blocks).
Objects, values and types
[…]
Programs are strongly recommended to explicitly close such objects. The ‘try…finally’ statement and the ‘with’ statement provide convenient ways to do this.
One can observe this with a class that marks when it is garbage collected.
class Collectible:
    def __init__(self, name):
        self.name = name

    # Bind print as a default argument so it is still reachable
    # even if __del__ runs during interpreter shutdown.
    def __del__(self, print=print):
        print("Collecting", self.name)

def inner():
    local_name = Collectible("inner local value")
    raise RuntimeError("This is a drill")

def outer():
    local_name = Collectible("outer local value")
    inner()

try:
    outer()
except RuntimeError as e:
    print(f"handling a {type(e).__name__}: {e}")
On CPython, the output shows that the handler runs before the locals are collected:
handling a RuntimeError: This is a drill
Collecting inner local value
Collecting outer local value
Note that CPython uses reference counting, which cleans objects up as soon as they become unreferenced. Other implementations may delay cleanup further and arbitrarily.
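If the handler needs the locals released even sooner, the traceback's frames can be cleared explicitly. A minimal sketch, assuming Python 3.4+ for traceback.clear_frames:

import traceback

try:
    outer()
except RuntimeError as e:
    print(f"handling a {type(e).__name__}: {e}")
    # Drop the locals held by the traceback's frames right away; both
    # "Collecting ..." lines are printed here, before the handler ends.
    traceback.clear_frames(e.__traceback__)

(Python 3 also deletes the name e automatically when the except block ends, dropping the handler's own reference to the exception and its traceback.)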
Well, AFAIK, if the exception references some object or another, those won't be collected until the exception itself is collected; also, if the except block happens to reference some object, that would postpone its collection until after the block is over. I wonder if there are other, less obvious ways in which catching an exception could affect garbage collection.
One of the basic changes from Python 2 to Python 3 was making print a function - which, to me, makes perfect sense given its structure. Why aren't the raise and del statements also functions? Especially in the case of raise it seems like it is taking an argument and doing something with it, just like a function does.
raise and del are definitely distinct from functions, each for different reasons:
raise exits the current flow of execution; the normal flow of byte-code interpretation is interrupted and the stack is unwound until the next exception handler is found. Functions can't do this; they create a new stack frame instead.
del can't be a function because you must specify a specific target; you can't use just any expression, and what is deleted depends on the syntax given: with subscription, deletion removes a given element from a container; otherwise, a name is removed from the current namespace. The right namespace to delete from also depends on the scope of the name being deleted. See the del statement grammar definition:
del_stmt ::= "del" target_list
A function can't remove items from a parent namespace, nor can it distinguish between the result of a subscription expression and a direct reference. You pass objects to a function, but to a del statement you pass a name and a context (supplied by the interpreter when deleting a local or global name).
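A minimal sketch of the distinction:

d = {"k": 1}
x = 42

del d["k"]  # subscription target: removes the key from the container
del x       # name target: unbinds the name from the current namespace

def remove(obj):
    del obj  # only unbinds the function's *local* name

y = [1, 2, 3]
remove(y)
print(y)    # [1, 2, 3]; a function cannot delete its caller's binding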
print, on the other hand, requires no special relationship with the current namespace or stack frame, and needs no special syntax constraints to do its work. It is purely application-level functionality. The global sys.stdout reference can be accessed by functions just as easily as by the interpreter. As such it didn't need to be a statement, and moving it to a function brought additional benefits, such as being able to override its behaviour and to evolve it more quickly across Python releases.
Do note that part of the raise statement was moved to application-level code instead; in Python 2 you can attach a traceback to the raised exception with:
raise ExceptionClass, exception_value, traceback_object
In Python 3, attaching a traceback to an exception has been moved to the exception itself:
raise Exception("foo occurred").with_traceback(tracebackobj)
https://www.python.org/dev/peps/pep-3105/ has a list of rationales for why print was made a function. Of the five reasons, (IMO) the most relevant one is:
print is the only application-level functionality that has a statement dedicated to it.
As explained by Alex Martelli here https://stackoverflow.com/a/1054062:
Python statements are things the Python compiler must be specifically aware of -- they may alter the binding of names, may alter control flow, and/or may need to be entirely removed from the generated bytecode in certain conditions (the latter applies to assert). print was the only exception to this assertion in Python 2; by removing it from the roster of statements, Python 3 removes an exception, makes the general assertion "just hold", and therefore is a more regular language.
del and raise obviously alter the binding of names or alter control flow, so both are fine as statements.
I would like to handle a NameError exception by injecting the desired missing variable into the frame and then continue the execution from last attempted instruction.
The following pseudo-code should illustrate my needs.
def function():
    return missing_var

try:
    print function()
except NameError:
    frame = inspect.trace()[-1][0]
    # inject missing variable
    frame.f_globals["missing_var"] = ...
    # continue frame execution from last attempted instruction
    exec frame.f_code from frame.f_lasti
Read the whole unittest on repl.it
Notes
As pointed out by ivan_pozdeev in his answer, this is known as resumption.
After more research, I found Veedrac's answer to the question Resuming program at line number in the context before an exception using a custom sys.excepthook posted by lc2817 very interesting. It relies on Richie Hindle's work.
Background
The code runs in a slave process, which is controlled by a parent. Tasks (functions, really) are written in the parent and later passed to the slave using dill. I expect some tasks (running in the slave process) to try to access variables from outer scopes in the parent, and I'd like the slave to request those variables from the parent on the fly.
p.s.: I don't expect this magic to run in a production environment.
Contrary to what various commenters are saying, "resume-on-error" exception handling is possible in Python. The library fuckit.py implements said strategy. It steamrollers errors by rewriting the source code of your module at import time, inserting try...except blocks around every statement and swallowing all exceptions. So perhaps you could try a similar sort of tactic?
It goes without saying: that library is intended as a joke. Don't ever use it in production code.
You mentioned that your use case is to trap references to missing names. Have you thought about using metaprogramming to run your code in the context of a "smart" namespace such as a defaultdict? (This is perhaps only marginally less of a bad idea than fuckit.py.)
from collections import defaultdict

class NoMissingNamesMeta(type):
    @classmethod
    def __prepare__(meta, name, bases):
        return defaultdict(lambda: "foo")

class MyClass(metaclass=NoMissingNamesMeta):
    x = y + "bar"  # y doesn't exist

>>> MyClass.x
'foobar'
NoMissingNamesMeta is a metaclass - a language construct for customising the behaviour of the class statement. Here we're using the __prepare__ method to customise the dictionary which will be used as the class's namespace during creation of the class. Thus, because we're using a defaultdict instead of a regular dictionary, a class whose metaclass is NoMissingNamesMeta will never get a NameError. Any names referred to during the creation of the class will be auto-initialised to "foo".
This approach is similar to @AndréFratelli's idea of manually requesting the lazily-initialised data from a Scope object. In production I'd do that, not this. The metaclass version requires less typing to write the client code, but at the expense of a lot more magic. (Imagine yourself debugging this code in two years, trying to understand why non-existent variables are dynamically being brought into scope!)
The "resumption" exception-handling technique has proven problematic; that's why it's missing from C++ and later languages.
Your best bet is to use a while loop: rather than resuming where the exception was thrown, repeat from a predetermined place:
while True:
    try:
        do_something()
    except NameError as e:
        handle_error()
    else:
        break
You really can't unwind the stack after an exception is thrown, so you'd have to deal with the issue beforehand. If your requirement is to generate these variables on the fly (which wouldn't be recommended, but you seem to understand that), then you'd have to actually request them. You can implement a mechanism for that (such as having a global custom Scope class instance and overriding __getitem__, or using something like the __dir__ function), but not as you are asking for it.
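A minimal sketch of such a Scope mechanism, assuming a fetch callback that produces missing values on demand (all names here are hypothetical):

class Scope:
    def __init__(self, fetch):
        self._fetch = fetch  # callback producing a value for a missing name
        self._values = {}

    def __getitem__(self, name):
        if name not in self._values:
            # e.g. request the value from the parent process on the fly
            self._values[name] = self._fetch(name)
        return self._values[name]

scope = Scope(fetch=lambda name: "value of %s" % name)

def function():
    return scope["missing_var"]  # an explicit request instead of a bare name

print(function())  # value of missing_var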
I've been testing a dirty hack inspired by this: http://docs.python.org/2/library/contextlib.html
The main idea is to bring the try/finally idea to the class level and get a reliable and simple class destructor.
class Foo():
    def __init__(self):
        self.__res_mgr__ = self.__acquire_resources__()
        self.__res_mgr__.next()

    def __acquire_resources__(self):
        try:
            # Acquire some resources here
            print "Initialize"
            self.f = 1
            yield
        finally:
            # Release the resources here
            print "Releasing Resources"
            self.f = 0

f = Foo()
print "testing resources"
print f.f
But it always gives me:
Initialize
testing resources
1
and never "Releasing Resources". I'm basing my hope on:
As of Python version 2.5, the yield statement is now allowed in the try clause of a try ... finally construct. If the generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), the generator-iterator's close() method will be called, allowing any pending finally clauses to execute. (Source link)
But it seems that when the class member is garbage collected together with the class, their reference counts don't decrease, so the generator's close() is never called and thus finally never runs. As for the second part of the quote, "or by being garbage collected", I just don't know why it's not true. Any chance to make this utopia work? :)
BTW this works on module level:
def f():
    try:
        print "ack"
        yield
    finally:
        print "release"

a = f()
a.next()
print "testing"
Output will be as I expect:
ack
testing
release
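For completeness, explicitly closing the generator also runs the finally clause deterministically; a sketch continuing the module-level example above (Python 2 syntax):

a = f()
a.next()
print "testing"
a.close()   # raises GeneratorExit at the paused yield; finally runs here
print "after close"

This prints ack, testing, release, after close, in that order.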
NOTE: In my task I'm not able to use a with statement because I release the resource inside the thread's end_callback (which runs outside any with block). So I wanted a reliable destructor for cases where the callback isn't called for some reason.
The problem you are having is caused by a reference cycle and an implicit __del__ defined on your generator (it's so implicit that CPython doesn't actually show __del__ when you introspect; only the C-level tp_del exists, and no Python-visible __del__ is created). Basically, when a generator has a yield inside:
A try block, or equivalently
A with block
it has an implicit __del__-like implementation. On Python 3.3 and earlier, if a reference cycle contains an object whose class implements __del__ (technically, has tp_del in CPython), unless the cycle is manually broken, the cyclic garbage collector cannot clean it up, and just sticks it in gc.garbage (import gc to gain access), because it doesn't know which objects (if any) must be collected first to clean up "nicely".
Because your class's __acquire_resources__(self) contains a reference to the instance's self, you form a reference cycle:
self -> self.__res_mgr__ (generator object) -> generator frame (referencing locals which includes) -> self
Because of this reference cycle, and the fact that the generator has a try/finally in it (creating tp_del equivalent to __del__), the cycle is uncollectable, and your finally block never gets executed unless you manually advance self.__res_mgr__ (which defeats the whole purpose).
Your experiment happens to display this problem automatically because the reference cycle is implicit/automatic, but any accidental reference cycle where an object in the cycle has a class with __del__ will trigger the same problem. So even if you just did:
class Foo():
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def __del__(self):
        # Release the resources here
        print "Releasing Resources"
        self.f = 0
if the "resources" involved could conceivably lead to a reference cycle with an instance of Foo, you'd have the same problem.
The solution here is one or both of:
Make your class a context manager so users provide the information necessary for deterministic finalization (by using with blocks) as well as providing an explicit cleanup method (e.g. close) for when with blocks aren't feasible (part of another object's state that is cleaned up through its own resource management). This is also the only way to provide deterministic cleanup on most non-CPython interpreters where reference counting semantics have never been used (so all finalizers are called non-deterministically, if at all)
Move to Python 3.4 or higher, where PEP 442 resolves the issue with uncollectable cyclic garbage (it's technically still possible to produce such cycles on CPython, but only via third party extensions that continue to use tp_del instead of updating to use the tp_finalize slot that allows cyclic garbage to be cleaned properly). It's still non-deterministic cleanup (if a reference cycle exists, you're waiting on the cyclic gc to run, sometime), but it's possible, where pre-3.4, cyclic garbage of this sort could not be cleaned up at all.
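A minimal sketch of the first option, keeping the Python 2 style of the question (the close method name is a conventional choice, not a requirement):

class Foo(object):
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def close(self):
        if self.f:  # make cleanup idempotent
            # Release the resources here
            print "Releasing Resources"
            self.f = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

with Foo() as foo:           # deterministic: cleanup runs when the block exits
    print "testing resources"
    print foo.f

bar = Foo()
try:
    print bar.f
finally:
    bar.close()              # explicit cleanup when a with block isn't feasible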
Background: I'm doing COM programming of National Instruments' TestStand in Python. TestStand complains if objects aren't "released" properly (it pops up an "objects not released properly" debug dialog box). The way to release the TestStand COM objects in Python is to ensure no variables still contain the object (e.g. del them, or set them to None). Or, as long as the variables are function-local variables, the object is released as soon as the variable goes out of scope when the function ends.
Well, I've followed this rule in my program, and my program releases objects properly as long as there are no exceptions. But if I get an exception, then I'm getting the "objects not released" message from TestStand. This seems to indicate that function local variables aren't going out of scope normally when an exception happens.
Here is a simplified code example:
class TestObject(object):
    def __init__(self, name):
        self.name = name
        print("Init " + self.name)

    def __del__(self):
        print("Del " + self.name)

def test_func(parameter):
    local_variable = parameter
    try:
        pass
        # raise Exception("Test exception")
    finally:
        pass
        # local_variable = None
        # parameter = None

outer_object = TestObject('outer_object')
try:
    inner_object = TestObject('inner_object')
    try:
        test_func(inner_object)
    finally:
        inner_object = None
finally:
    outer_object = None
When this runs as shown, it shows what I expect:
Init outer_object
Init inner_object
Del inner_object
Del outer_object
But if I uncomment the raise Exception... line, instead I get:
Init outer_object
Init inner_object
Del outer_object
Traceback (most recent call last):
...
Exception: Test exception
Del inner_object
The inner_object is deleted late due to the exception.
If I uncomment the lines that set both parameter and local_variable to None, then I get what I expect:
Init outer_object
Init inner_object
Del inner_object
Del outer_object
Traceback (most recent call last):
...
Exception: Test exception
So when exceptions happen in Python, what exactly happens to function local variables? Are they being saved somewhere so they don't go out of scope as normal? What is "the right way" to control this behaviour?
Your exception-handling is probably creating reference loops by keeping references to frames. As the docs put it:
Note: Keeping references to frame objects, as found in the first element of the frame records these functions return [[NB: "these functions" here refers to some in module inspect, but the rest of the paragraph applies more widely!]], can cause your program to create reference cycles. Once a reference cycle has been created, the lifespan of all objects which can be accessed from the objects which form the cycle can become much longer even if Python's optional cycle detector is enabled. If such cycles must be created, it is important to ensure they are explicitly broken to avoid the delayed destruction of objects and increased memory consumption which occurs. Though the cycle detector will catch these, destruction of the frames (and local variables) can be made deterministic by removing the cycle in a finally clause. This is also important if the cycle detector was disabled when Python was compiled or using gc.disable(). For example:

def handle_stackframe_without_leak():
    frame = inspect.currentframe()
    try:
        # do something with the frame
        pass
    finally:
        del frame
A function's local scope lasts for the entire function call. Handle this in a finally clause.
According to this answer for another question, it is possible to inspect local variables on the frame in an exception traceback via tb_frame.f_locals. So it does look as though the objects are kept "alive" for the duration of the exception handling.
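A minimal sketch demonstrating this (the names are illustrative):

import sys

def failing():
    secret = "still reachable"
    raise RuntimeError("boom")

try:
    failing()
except RuntimeError:
    tb = sys.exc_info()[2]
    while tb.tb_next is not None:  # walk down to the frame that raised
        tb = tb.tb_next
    print(tb.tb_frame.f_locals["secret"])  # prints: still reachable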
Let's say I want to be able to log to file every time any exception is raised, anywhere in my program. I don't want to modify any existing code.
Of course, this could be generalized to being able to insert a hook every time an exception is raised.
Would the following code be considered safe for doing such a thing?
class MyException(Exception):
    def my_hook(self):
        print('---> my_hook() was called');

    def __init__(self, *args, **kwargs):
        global BackupException;
        self.my_hook();
        return BackupException.__init__(self, *args, **kwargs);

def main():
    global BackupException;
    global Exception;
    BackupException = Exception;
    Exception = MyException;
    raise Exception('Contrived Exception');

if __name__ == '__main__':
    main();
If you want to log uncaught exceptions, just use sys.excepthook.
I'm not sure I see the value of logging all raised exceptions, since lots of libraries will raise/catch exceptions internally for things you probably won't care about.
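A minimal sketch of such a hook, assuming a log file named errors.log:

import logging
import sys

logging.basicConfig(filename="errors.log", level=logging.ERROR)

def log_uncaught(exc_type, exc_value, exc_tb):
    # Log the full traceback, then fall through to the default report.
    logging.error("Uncaught exception", exc_info=(exc_type, exc_value, exc_tb))
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = log_uncaught

raise RuntimeError("this is logged, then reported as usual")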
Your code as far as I can tell would not work.
__init__ has to return None; you are returning the result of BackupException.__init__ (which happens to be None, so it works, but the pattern is misleading). In general, if you would like to change what instance is returned when instantiating a class, you should override __new__.
Unfortunately you can't change any of the attributes on the Exception class. If that was an option you could have changed Exception.__new__ and placed your hook there.
the "global Exception" trick will only work for code in the current module. Exception is a builtin and if you really want to change it globally you need to import __builtin__; __builtin__.Exception = MyException
Even if you changed __builtin__.Exception it will only affect future uses of Exception, subclasses that have already been defined will use the original Exception class and will be unaffected by your changes. You could loop over Exception.__subclasses__ and change the __bases__ for each one of them to insert your Exception subclass there.
There are subclasses of Exception that are also built-in types, which you also cannot modify, although I'm not sure you would want to hook any of them (think StopIteration).
I think that the only decent way to do what you want is to patch the Python sources.
This code will not affect any exception classes that were created before the start of main, and most of the exceptions that happen will be of such kinds (KeyError, AttributeError, and so forth). And you can't really affect those "built-in exceptions" in the most important sense: if anywhere in your code there is e.g. a 1/0, the real ZeroDivisionError will be raised (by Python's own internals), not whatever else you may have bound to that exception's name.
So, I don't think your code can do what you want (despite all the semicolons, it's still supposed to be Python, right?). It could be done by patching the C sources for the Python runtime, essentially (e.g. by providing a hook invoked on any exception, even one that's later caught). Such a hook does not currently exist because the use cases for it would be pretty rare: for example, a StopIteration is raised at the normal end of every for loop (and caught, too); why on Earth would one want to trace that, and the many other routine uses of caught exceptions in the Python internals and standard library?!
Download pypy and instrument it.