What are the use cases in Python 3 for writing a custom __del__ method, or for relying on one from the stdlib¹? That is, in what scenario is it reasonably safe, and able to do something that's hard to do without it?
For many good reasons (1 2 3 4 5 6), the usual recommendation is to avoid __del__ and instead use context managers or perform the cleanup manually:
1. __del__ is not guaranteed to be called if objects are alive on interpreter exit².
2. At the point one expects the object can be destroyed, the ref count may actually be non-zero (e.g., a reference may survive through a traceback frame held onto by a calling function). This makes the destruction time far more uncertain than the mere unpredictability of gc implies.
3. The garbage collector cannot get rid of cycles if they include more than one object with __del__.
4. The code inside __del__ must be written super carefully:
   - object attributes set in __init__ may not be present, since __init__ might have raised an exception;
   - exceptions raised in __del__ are ignored (only printed to stderr);
   - globals may no longer be available.
Update:
PEP 442 has made significant improvements in the behavior of __del__. It seems though that my points 1-4 are still valid?
Update 2:
Some of the top Python libraries embrace the use of __del__ in post-PEP 442 Python (i.e., Python 3.4+). I guess my point 3 is no longer valid after PEP 442, and the other points are accepted as unavoidable complexity of object finalization.
¹ I expanded the question from just writing a custom __del__ method to include relying on __del__ from the stdlib.
² It seems that __del__ is always called on interpreter exit in the more recent versions of CPython (does anyone have a counter-example?). However, it doesn't matter for the purpose of __del__'s usability: the docs explicitly provide no guarantee about this behavior, so one cannot rely on it (it may change in future versions, and it may be different in non-CPython interpreters).
Context managers (and try/finally blocks) are somewhat more restrictive than __del__. In general they require you to structure your code in such a way that the lifetime of the resource you need to free doesn't extend beyond a single function call at some level in the call stack, rather than, say, binding it to the lifetime of a class instance that could be destroyed at unpredictable times and places. It's usually a good thing to restrict the lifetime of resources to one scope, but there are sometimes edge cases where this pattern is an awkward fit.
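A minimal sketch of that structural constraint (the "resource" here is a stand-in dict rather than a real handle):

```python
from contextlib import contextmanager

@contextmanager
def managed_resource():
    resource = {"open": True}      # stand-in for a real handle
    try:
        yield resource
    finally:
        resource["open"] = False   # cleanup runs exactly when the block ends

with managed_resource() as r:
    assert r["open"]               # usable only inside this block
# by here the resource has been "closed", no matter how the block exited
```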
The only case where I've used __del__ (aside from debugging, cf. MSeifert's answer) is for freeing memory allocated outside of Python by an external library. Because of the design of the library I was wrapping, it was difficult to avoid having a large number of objects that held pointers to heap-allocated memory. Using a __del__ method to free the pointers was the easiest way to do cleanup, since it would have been impractical to enclose the lifespan of each instance inside a context manager.
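A hedged sketch of that pattern, using ctypes and the C runtime's malloc/free as stand-ins for the actual library (which was different; find_library may also need adjusting per platform):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))  # Unix-flavored; adjust on Windows
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]

class Buffer:
    def __init__(self, size):
        # memory that Python's GC knows nothing about
        self._ptr = libc.malloc(size)

    def __del__(self):
        # guard with getattr: __init__ may have raised before setting _ptr
        if getattr(self, "_ptr", None):
            libc.free(self._ptr)
            self._ptr = None

buf = Buffer(1024)
del buf   # the foreign memory is freed when the instance is destroyed
```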
One use-case is debugging. If you want to track the lifetime of a specific object it's convenient to write a temporary __del__ method. It can be used to do some logging or just to print something. I have used it a few times especially when I was interested in when and how instances are deleted. It's sometimes good to know when you create and discard a lot of temporary instances. But as I said I only ever used this to satisfy my curiosity or when debugging.
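A throwaway version of that debugging trick might look like this:

```python
class Tracked:
    def __init__(self, name):
        self.name = name
        print(f"created {self.name}")

    def __del__(self):
        # temporary instrumentation; delete once the investigation is over
        print(f"finalizing {self.name}")

t = Tracked("t1")
del t   # with no other references left, "finalizing t1" prints immediately
```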
Another use-case is subclassing a class that defines a __del__ method. Sometimes you find a class that you want to subclass but the internals require you to actually override __del__ to control the order in which the instance is cleaned up. That's very rare too, because you need to find a class with __del__, you need to subclass it, and you need to introduce some internals that actually require calling the superclass __del__ at exactly the right time. I actually did that once but I don't remember where and why it was important (maybe I didn't even know about alternatives then, so treat this as a possible use-case).
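A hypothetical shape of that situation (Base and Child are made up; the point is only the ordering around super().__del__()):

```python
class Base:
    def __del__(self):
        print("Base: releasing shared state")

class Child(Base):
    def __del__(self):
        print("Child: tearing down my own state first")
        super().__del__()   # delegate at exactly the right moment

c = Child()
del c   # prints the Child line, then the Base line
```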
When you wrap an external object (for example a c object that isn't tracked by Python) that really, really needs to be deallocated even if someone "forgets" (I suspect a lot of people just omit them on purpose!) to use the context manager that you provided.
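One common compromise for that case is sketched below: the context manager is the primary API, and __del__ is only a safety net. acquire()/release() are hypothetical stand-ins for the foreign library's calls:

```python
def acquire():              # stand-in for the foreign allocation call
    return object()

def release(raw):           # stand-in for the foreign deallocation call
    print("released", raw)

class Handle:
    def __init__(self):
        self._raw = acquire()
        self._closed = False

    def close(self):
        if not self._closed:
            release(self._raw)
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.close()

    def __del__(self):
        self.close()        # last-ditch cleanup for callers who forgot

with Handle():              # the intended usage...
    pass
h = Handle()                # ...but even a forgetful caller gets cleanup
del h
```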
However all these cases are (or should be) very, very rare. Actually it's a bit like with metaclasses: They are fun and it's really cool to understand the concepts because you can probe the "fun parts" of python. But in practice:
If you wonder whether you need them [metaclasses], you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).
Citation (probably) from Tim Peters (I haven't found the original reference).
One case where I always use __del__ is for closing an aiohttp.ClientSession object.
When you don't, aiohttp will print warnings about the unclosed client session.
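A hedged sketch of what that can look like (ApiClient is hypothetical; note that __del__ cannot await, so this is strictly best-effort):

```python
import asyncio
import aiohttp

class ApiClient:
    def __init__(self, session: aiohttp.ClientSession):
        self._session = session

    def __del__(self):
        # best effort only: schedule the close if a loop is still running;
        # otherwise aiohttp will print its "Unclosed client session" warning
        if not self._session.closed:
            try:
                loop = asyncio.get_running_loop()
            except RuntimeError:
                return
            loop.create_task(self._session.close())
```

The reliable path is still `async with aiohttp.ClientSession() as session:`; the __del__ is only there to cut down the warning noise when someone forgets.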
I needed to make a class that extended dict and ran into an interesting problem, illustrated by the dumb example below.
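The original example was posted as a screenshot; a minimal reconstruction of the kind of code it showed (class and variable names here are guesses):

```python
class MyDict(dict):
    def __getitem__(self, key):
        return 42          # override direct item access

md = MyDict(a=1, b=2)
print(md["a"])   # 42 -- the override is honored for direct lookups

d = {}
d.update(md)     # but update() reads md's underlying dict storage directly
print(d)         # {'a': 1, 'b': 2} -- __getitem__ was never consulted
```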
Why is d.update() ignoring the class's __getitem__?
EDIT: This is in Python 2.7, which does not appear to contain collections.UserDict.
Thinking UserDict.UserDict is the equivalent, I tried this, and it gets closer, but still behaves interestingly.
This is an example of the open-closed principle (the class is open for extension but closed for modification). It is a good thing to have because it allows subclassers to extend or override a method without unintentionally triggering behavior changes in other methods and without breaking the class's invariants.
We even do this in pure Python code as well; for example, inside the pure-Python ordered dict code, the class-local call from __init__() to update() is done using name mangling. This allows a subclasser to override update() without accidentally breaking __init__().
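A simplified sketch of that name-mangling trick (the real OrderedDict code is more involved; this only shows the mechanism):

```python
class Mapping:
    def __init__(self, pairs=()):
        self.__update(pairs)        # mangled: always calls Mapping.update

    def update(self, pairs):
        for key, value in pairs:
            print("storing", key, value)

    __update = update               # private copy, immune to overriding

class LoudMapping(Mapping):
    def update(self, pairs):        # overriding update() ...
        raise RuntimeError("never called from Mapping.__init__")

LoudMapping([("a", 1)])             # prints "storing a 1" -- __init__ intact
```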
Sometimes, this is inconvenient. It means that a subclasser has to override every method whose behavior they want to change including get(), update(), and others. However, there are offsetting benefits (protection of internal invariants, preventing implementation details from leaking from the abstraction, and allowing users to assume the methods are independent of one another).
This style (chosen by Guido from the outset) is the default for the builtin types (otherwise we would forever be fighting segfaulting invariant violations) and for some pure python classes.
We do document when there is a departure from the default. For example, the cmd module uses the framework design pattern, letting the user define various do_action() methods. Also, some of the http modules do the same, specifically documenting that a user's do_GET() method is called and that is how you attach customized HTTP event handlers.
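For instance, the cmd module's documented hook style looks like this (a toy shell; the do_* names are the documented extension points):

```python
import cmd

class Shell(cmd.Cmd):
    prompt = "(demo) "

    def do_greet(self, arg):
        """greet [name] -- cmd dispatches 'greet bob' here by name."""
        print("hello,", arg or "world")

    def do_quit(self, arg):
        return True     # a true return value ends the command loop

# Shell().cmdloop()  # uncomment to run interactively
```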
In the absence of specifically documented method hooks (i.e., those listed above, or methods like dict.__missing__()), a subclasser should presume method independence. Otherwise, how are you to know whether __getitem__() calls get() under the hood or vice versa?
FWIW, this isn't unique to Python. It comes up quite a bit in object-oriented programming. Correctly designed classes either document root methods that affect the behavior of other methods, or the methods are presumed to be independent.
There may need to be a FAQ for this, but nothing is broken or wrong here (other than Python having way too many dict variants to choose from). If someone mistakenly assumes or believes that __getitem__() must be called by the other accessor methods, they find out very quickly that that assumption is wrong (that is, if they run even minimal tests on the code).
I'm reading the Python tutorial. The 3rd paragraph confuses me a bit.
"Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their data attributes."
What exactly do they mean by invariants? Do they mean data attributes that certain methods rely on? (e.g., a method that returns a certain data member, i.e., a getter method)
I believe what they want to say there is that you should be careful when accessing/mutating an object’s properties. (Note that I call them “properties”, not “data attributes”)
This is because an object may put something in a property that might be necessary to maintain the object's state, i.e., an invariant which you would normally not be given the means to mess with.
But unlike other programming languages, Python does not have any protection for object members. There are no private instance variables or methods†, so anyone can change an object in the way they want to, possibly completely destroying its functionality.
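A made-up illustration of "stamping on" an invariant: Basket assumes _total always equals the sum of _items, and nothing enforces that assumption:

```python
class Basket:
    def __init__(self):
        self._items = []
        self._total = 0          # invariant: _total == sum(_items)

    def add(self, price):
        self._items.append(price)
        self._total += price

b = Basket()
b.add(10)
b._total = 999                   # a client "stamping on" the attribute
b.add(5)
print(b._total, sum(b._items))   # 1004 vs 15 -- the invariant is broken
```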
So the tutorial suggests you avoid storing important things in properties, but to be honest there is not really a much better way, and if someone wanted to mess with them, they could. For example, you can swap out whole methods at run time, replacing them with completely different logic.
So trying to protect yourself too much is not really worth it. Instead, I would suggest just documenting everything in a clear way, e.g., pointing out which properties are free for users to touch and which are for internal use only.
† There are some means and conventions, e.g. one leading underscore for internal/protected members and two leading underscores for private members, but those will not technically prevent you from accessing them, so it's not real protection.
Possible Duplicate:
Python (and Python C API): new versus init
I'm at college just now and the lecturer was using the terms constructors and initializers interchangeably. I'm pretty sure that this is wrong though.
I've tried googling the answer but not found the answer I'm looking for.
In most OO languages they are the same step, so he's not wrong for languages like Java, C++, etc. In Python they are done in two steps: __new__ is the constructor; __init__ is the initializer.
Here is another answer that goes into more detail about the differences between them.
In almost all usual cases, Python does not have constructors in the same sense used by other OO languages because manually managing memory is generally discouraged. Instead, what you should usually do is define an __init__ method on the class. This method is called to initialize the new instance object automatically, first thing after it is constructed. Thus, it is not really a constructor, and talking about it as a constructor might confuse some people.
Of course some people want to call it a constructor because it is used a little bit like a constructor - fundamentally you can call it whatever you want as long as everyone understands what you are actually referring to. But in general, to be explicit and make yourself understood, call it an init method or something other than a constructor. Fundamentally, different languages just come with somewhat different terminology and speaking very clearly will always require adjustment to your subject matter and audience.
In Python it is possible to manage instance creation and destruction at a finer granularity, though you won't want to unless you know what you're doing. This is done by defining __new__ and __del__ methods to hook object instantiation and destruction (note that __del__ runs when the instance is about to be destroyed, not on every del statement). Whether these qualify as constructors and destructors precisely is a little more debatable (the Python docs call the __del__ method a destructor, but tend to be vaguer on what constitutes a constructor, e.g. including many functions which return object instances). I'd still encourage you to use the specific terminology for the language at hand, and in comparative discussions to define your terms up front. As always, your choice of terms while speaking involves trade-offs between the audience being able to easily follow you and the audience potentially being led into confusion: if you are talking about memory management, be as specific as possible; if you are talking loosely, just use some word your audience understands and be ready to clarify.
Your instructor is being unclear at worst; I'm not aware of any single canonical definition of these terms, but they might cause confusion for people who have learned very specific definitions from other languages.
http://docs.python.org/reference/datamodel.html#basic-customization
__new__ - constructor.
__init__ - initializer.
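A minimal sketch of the two steps in action (Point is just an illustration):

```python
class Point:
    def __new__(cls, x, y):
        print("constructing")        # step 1: create the instance
        return super().__new__(cls)

    def __init__(self, x, y):
        print("initializing")        # step 2: populate it
        self.x, self.y = x, y

p = Point(1, 2)   # prints "constructing", then "initializing"
```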
Will it cause a memory leak if they cannot be cleaned up by the GC?
It's a standard issue with garbage collection.
It's not about memory leaks, but about the circular references themselves, and about other kinds of resources managed by those objects that may need cleanup. The references create a dependency - you can't delete the referrer until all objects it references are deleted, because it may need to do something with those referred-to objects during its cleanup.
As a contrived example, two objects may each have log files, and during their cleanups may need to write log messages both to their own log file and to the other one. You can't clean up either object first, as by doing so you leave the other object unable to perform its cleanup.
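A sketch of that dilemma in code (Node stands in for the two log-writing objects):

```python
import gc

class Node:
    def __init__(self, name):
        self.name = name
        self.peer = None

    def __del__(self):
        # needs the peer alive during cleanup -- but the peer has the
        # same requirement, so no destruction order is safe
        if self.peer is not None:
            print(f"{self.name}: notifying {self.peer.name}")

a, b = Node("a"), Node("b")
a.peer, b.peer = b, a
del a, b
gc.collect()   # before PEP 442, this cycle was parked in gc.garbage instead
```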
The basic rule is that you can have either reliable destructors (as in C++) or garbage collection (as in Python, Java...), but not both. Though in principle, a static analysis of code (or even a visual inspection in most cases) can tell you which classes might have this circular reference problem.
From the docs for gc.garbage:
Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list.
It depends on what you are doing in __del__. If you are using it to handle references to other objects, it may be so.
Some discussion is in the docs. A more appropriate question is what you are trying to do in __del__ and whether it should not be done explicitly somewhere else in the code.
Just for the sheer heck of it, I've decided to create a Scheme binding to libpython so you can embed Python in Scheme programs. I'm already able to call into Python's C API, but I haven't really thought about memory management.
The way mzscheme's FFI works is that I can call a function, and if that function returns a pointer to a PyObject, then I can have it automatically increment the reference count. Then, I can register a finalizer that will decrement the reference count when the Scheme object gets garbage collected. I've looked at the documentation for reference counting, and don't see any problems with this at first glance (although it may be sub-optimal in some cases). Are there any gotchas I'm missing?
Also, I'm having trouble making heads or tails of the cyclic garbage collector documentation. What things will I need to bear in mind here? In particular, how do I make Python aware that I have a reference to something so it doesn't collect it while I'm still using it?
Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The Extending and Embedding and Python/C API sections of the documentation are the ones that will explain how to use the C API.
Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus must not forget to DECREF it, or must give it to something that will steal it) or are borrowing it (and must INCREF it to keep it, and possibly even to use it during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for longer than you actually are.
You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting and doesn't require direct access.
The biggest gotcha I know with ref counting and the C API is the __del__ thing. When you have a borrowed reference to something, you think you can get away without INCREF'ing because you don't give up the GIL while you use that reference. But, if you end up deleting an object (by, for example, removing it from a list), it's possible that you trigger a __del__ call, which might remove the reference you're borrowing from under your feet. Very tricky.
If you INCREF (and then DECREF, of course) all borrowed references as soon as you get them, there shouldn't be any problem.