Will it cause a memory leak if they cannot be cleaned up by the GC?
It's a standard issue with garbage collection.
It's not about memory leaks, but about the circular references themselves, and about other kinds of resources managed by those objects that may need cleanup. The references create a dependency - you can't delete the referrer until all objects it references are deleted, because it may need to do something with those referred-to objects during its cleanup.
As a contrived example, two objects may each have log files, and during their cleanups may need to write log messages both to their own log file and to the other one. You can't clean up either object first, as by doing so you leave the other object unable to perform its cleanup.
The basic rule is that you can have either reliable destructors (as in C++) or garbage collection (as in Python, Java...), but not both. Though in principle, a static analysis of code (or even a visual inspection in most cases) can tell you which classes might have this circular reference problem.
From the docs for gc.garbage:
Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list.
It depends on what you are doing in __del__. If you are using it to handle references to other objects, it may be so.
Some discussion is in the docs. A more appropriate question is what you are trying to do in __del__, and whether it should be done explicitly somewhere else in the code.
I'm writing a small application in Python 3 where objects (players) hold sets of connected objects of the same type (other players) together with data that belong to the pair of objects (the result of a played game). From time to time it is necessary to delete objects (in my example when they drop out of a tournament). If this is the case, the to-be-deleted object also must be removed from all sets of all connected objects.
So, when an object detects that it shall be deleted, it shall walk through all connected objects in its own set and call the method .remove(self) on all connected objects. When this is done, it is ready to be destroyed.
Is it possible to have this done by simply calling del player42? I've read What is the __del__ method and how do I call it? and from there (and from other resources) I learned that the method __del__ is not reliable, because it will be called only when the to-be-deleted object is garbage collected, but this can happen much later than it really should be performed.
Is there another "magic" method in Python 3 objects that will be called immediately when the del command is performed on the object?
The del command deletes a specific reference to an object, not the object itself. There may still be many other references to that object (especially in parent/child relationships) that will prevent the object from being garbage collected, and __del__ from being called. If you have time-dependent and order-specific teardown requirements, you should implement a method on your objects like destroy() that you can call explicitly.
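A minimal sketch of the explicit destroy() approach, assuming a hypothetical Player class that stores one result per opponent in a dict (the class and method names are illustrative, not taken from the question's code):

```python
class Player:
    """Hypothetical player that tracks game results per opponent."""

    def __init__(self, name):
        self.name = name
        self.results = {}  # opponent -> result of the game played

    def record_game(self, other, result):
        # Store the pairing on both sides so each player knows the other.
        self.results[other] = result
        other.results[self] = result

    def remove(self, other):
        # Forget a single opponent; safe to call if already forgotten.
        self.results.pop(other, None)

    def destroy(self):
        # Explicit teardown: detach from every connected player first.
        for opponent in list(self.results):
            opponent.remove(self)
        self.results.clear()


alice, bob, carol = Player("alice"), Player("bob"), Player("carol")
alice.record_game(bob, "1-0")
alice.record_game(carol, "0-1")

alice.destroy()  # call this instead of relying on `del alice`
del alice        # now only removes the last name bound to the object
```

After destroy() returns, no other player refers to the dropped one, so the teardown order no longer depends on when the interpreter frees the object.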
If you really want to use __del__ for some reason, you could make heavy use of weakrefs to prevent hard references that would prevent garbage collection, but it would be less deterministic with the potential for race conditions and other hard to diagnose bugs.
I guess this is not about memory management, but about getting rid of the back references from the objects to their containers. As garbage collection can be assumed to be highly non-deterministic, you should not rely on it to perform runtime-relevant operations such as removing objects from a collection.
Instead, design your system in a different, less coupled way. For example, don't keep the items in collections associated with players -- store them in separate inventories, and access them only through them. You can then just delete objects from the inventory. This is a bit similar to certain forms of database normalization.
To achieve this kind of design (updating things referred to from different places), games tend to use their own special design patterns, for example entity component systems.
Pretty simple question:
I have some code to show some graphs, and it prepares data for the graphs, and I don't want to waste memory (limited)... is there a way to have a "local scope" so when we get to the end, everything inside is freed?
I come from C++ where you can define code inside { ... } so at the end everything is freed, and you don't have to care about anything
Anything like that in python?
The only thing I can think of is:
def tmp():
    ... code ...
tmp()
but it is very ugly, and I certainly don't want to list every del x at the end
If anything holds a reference to your object, it cannot be freed. By default, anything at the global scope is going to be held in the global namespace (globals()), and as far as the interpreter knows, the very next line of source code could reference it (or, another module could import it from this current module), so globals cannot be implicitly freed, ever.
This forces your hand to either explicitly delete references to objects with del, or to put them within the local scope of a function. This may seem ugly, but if you follow the philosophy that a function should do one thing and one thing well (thanks Unix!), you will already be segmenting your code into functions. In the one-off exceptions where you allocate a lot of memory early on in your function, and no longer need it midway through, you can del the reference to it.
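To illustrate the function-scope point, here is a small sketch that assumes CPython's reference counting. Dataset and draw_graph are hypothetical names, and the weak reference is only there to observe when the object goes away:

```python
import weakref


class Dataset:
    """Hypothetical stand-in for a large block of graph data."""

    def __init__(self, size):
        self.values = list(range(size))


def draw_graph():
    data = Dataset(1000)
    probe = weakref.ref(data)  # lets the caller observe the lifetime
    total = sum(data.values)   # placeholder for the real plotting work
    return total, probe


total, probe = draw_graph()
# In CPython the local `data` is released as soon as the function returns,
# so the weak reference is already dead at this point.
is_freed = probe() is None
```

Other interpreters may release the object later, so treat the immediate release as CPython behaviour rather than a language guarantee.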
I know this isn't the answer you want to hear, but it's the reality of Python. You could accomplish something similar by nesting function defs or classes inside, but this is kinda hacky (or in the class case, which wouldn't require calling/instantiating, extremely hacky).
I will also mention that there is a built-in gc module for interacting with the garbage collector. With it, you can trigger an immediate garbage collection (otherwise Python will eventually get around to collecting the things you del refs to), as well as inspect which objects refer to a given object.
If you're curious where the allocations are happening, you can also use the built in tracemalloc module to trace said allocations.
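A short sketch of those tools used together. Note that the reference count itself comes from sys.getrefcount, while gc offers gc.collect() and gc.get_referrers(); the payload size is an arbitrary choice for illustration:

```python
import gc
import sys
import tracemalloc

tracemalloc.start()

payload = [bytes(1000) for _ in range(1000)]  # roughly 1 MB of data
current_before, _peak = tracemalloc.get_traced_memory()

# sys.getrefcount includes the temporary reference held by its own argument.
refs = sys.getrefcount(payload)

del payload                 # drop the only reference
unreachable = gc.collect()  # force a full collection pass right now
current_after, _peak = tracemalloc.get_traced_memory()

tracemalloc.stop()
```

Comparing current_before with current_after shows how much traced memory the del released; tracemalloc.take_snapshot() can be used instead when you need per-line statistics.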
The mechanism that handles freeing memory in Python is called the garbage collector, which means there's no reason to use del in the overwhelming majority of Python code.
When programming in Python, you are "not supposed" to care about such low level things as allocating and freeing memory for your variables.
That being said, putting your code into functions (although preferably called something clearer than tmp()) is most definitely a good idea, as it will make your code much more readable and "Pythonic".
Coming from C++, you have already stumbled on one of the main differences (drawbacks) of Python: memory management. The Python garbage collector deletes objects once they fall out of scope and nothing else references them. Freeing an object's memory, however, does not guarantee that the memory is actually returned to the system; a rather large portion may stay reserved by the Python program even when unused. If you face a memory problem and want the memory handed back to the system, the only safe method is to run the memory-intensive function in a separate process. Every Python process has its own interpreter, and any memory consumed by that process is returned to the system when the process exits.
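One way to sketch the separate-process idea is with subprocess and a child interpreter; the multiprocessing module is the other common route. CHILD_CODE is a hypothetical stand-in for the real memory-intensive job:

```python
import subprocess
import sys

# Hypothetical memory-hungry job, run in a child interpreter.
CHILD_CODE = """
data = list(range(1_000_000))  # large allocation lives only in the child
print(sum(data))
"""

completed = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,  # collect the child's stdout instead of printing it
    text=True,
    check=True,           # raise if the child exits with an error
)
result = int(completed.stdout)
```

Only the small result crosses the process boundary; everything the child allocated goes back to the operating system when it exits.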
What are the use cases in Python 3 for writing a custom __del__ method or relying on one from the stdlib¹? That is, in what scenario is it reasonably safe, and can it do something that's hard to do without it?
For many good reasons (1 2 3 4 5 6), the usual recommendation is to avoid __del__ and instead use context managers or perform the cleanup manually:
1. __del__ is not guaranteed to be called if objects are alive on interpreter exit².
2. At the point one expects the object can be destroyed, the ref count may actually be non-zero (e.g., a reference may survive through a traceback frame held onto by a calling function). This makes the destruction time far more uncertain than the mere unpredictability of gc implies.
3. The garbage collector cannot get rid of cycles if they include more than one object with __del__.
4. The code inside __del__ must be written super carefully:
   - object attributes set in __init__ may not be present, since __init__ might have raised an exception;
   - exceptions are ignored (only printed to stderr);
   - globals may no longer be available.
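The caveat about attributes missing after a failed __init__ can be sketched as follows. Fragile and the events list are hypothetical, and the getattr default is one defensive pattern among several:

```python
events = []


class Fragile:
    def __init__(self, fail):
        if fail:
            raise ValueError("init failed")  # self.handle is never set
        self.handle = "open"

    def __del__(self):
        # __del__ still runs for a half-initialised instance, so look the
        # attribute up defensively instead of assuming it exists.
        handle = getattr(self, "handle", None)
        events.append("cleanup" if handle else "nothing to clean")


try:
    Fragile(fail=True)
except ValueError:
    pass
```

Accessing self.handle directly in __del__ would raise AttributeError here, which Python would only report on stderr rather than propagate.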
Update:
PEP 442 has made significant improvements in the behavior of __del__. It seems though that my points 1-4 are still valid?
Update 2:
Some of the top python libraries embrace the use of __del__ in the post-PEP 442 python (i.e., python 3.4+). I guess my point 3 is no longer valid after PEP 442, and the other points are accepted as unavoidable complexity of object finalization.
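A quick way to check the post-PEP 442 behaviour on CPython 3.4+ is to build a cycle in which both objects define __del__ and see whether anything is left in gc.garbage. Node is a hypothetical class used only for this check:

```python
import gc

finalized = []


class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        finalized.append(self.name)


a, b = Node("a"), Node("b")
a.other, b.other = b, a  # reference cycle; both objects define __del__

del a, b                 # refcounts stay above zero because of the cycle
gc.collect()             # the cycle collector finalizes and frees both

leftover = [obj for obj in gc.garbage if isinstance(obj, Node)]
```

Both finalizers run and leftover stays empty, although the order in which the two __del__ methods are called within the cycle is not something to rely on.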
¹ I expanded the question from just writing a custom __del__ method to include relying on __del__ from the stdlib.
² It seems that __del__ is always called on interpreter exit in the more recent versions of CPython (does anyone have a counter-example?). However, it doesn't matter for the purpose of __del__'s usability: the docs explicitly provide no guarantee about this behavior, so one cannot rely on it (it may change in future versions, and it may be different in non-CPython interpreters).
Context managers (and try/finally blocks) are somewhat more restrictive than __del__. In general they require you to structure your code in such a way that the lifetime of the resource you need to free doesn't extend beyond a single function call at some level in the call stack, rather than, say, binding it to the lifetime of a class instance that could be destroyed at unpredictable times and places. It's usually a good thing to restrict the lifetime of resources to one scope, but there are sometimes edge cases where this pattern is an awkward fit.
The only case where I've used __del__ (aside from debugging, cf. @MSeifert's answer) is for freeing memory allocated outside of Python by an external library. Because of the design of the library I was wrapping, it was difficult to avoid having a large number of objects that held pointers to heap-allocated memory. Using a __del__ method to free the pointers was the easiest way to do cleanup, since it would have been impractical to enclose the lifespan of each instance inside a context manager.
One use-case is debugging. If you want to track the lifetime of a specific object it's convenient to write a temporary __del__ method. It can be used to do some logging or just to print something. I have used it a few times especially when I was interested in when and how instances are deleted. It's sometimes good to know when you create and discard a lot of temporary instances. But as I said I only ever used this to satisfy my curiosity or when debugging.
Another use-case is subclassing a class that defines a __del__ method. Sometimes you find a class that you want to subclass but the internals require you to actually override __del__ to control the order in which the instance is cleaned up. That's very rare too, because you need to find a class with __del__, you need to subclass it, and you need to introduce some internals that actually require calling the superclass __del__ at exactly the right time. I actually did that once but I don't remember where and why it was important (maybe I didn't even know about alternatives then, so treat this as a possible use-case).
A third use-case is wrapping an external object (for example a C object that isn't tracked by Python) that really, really needs to be deallocated even if someone "forgets" (I suspect a lot of people just omit them on purpose!) to use the context manager that you provided.
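A sketch of that wrapper pattern, with the context manager as the primary path and __del__ as a safety net. FakeNativeLib is a hypothetical pure-Python stand-in for a real C library, used here so the example runs anywhere:

```python
class FakeNativeLib:
    """Hypothetical stand-in for a C library that hands out handles."""

    def __init__(self):
        self.live = set()
        self._next = 1

    def alloc(self):
        handle = self._next
        self._next += 1
        self.live.add(handle)
        return handle

    def free(self, handle):
        self.live.discard(handle)


LIB = FakeNativeLib()


class Buffer:
    def __init__(self):
        self._handle = None          # set first so __del__ is always safe
        self._handle = LIB.alloc()

    def close(self):
        # Idempotent: safe to call from both __exit__ and __del__.
        if self._handle is not None:
            LIB.free(self._handle)
            self._handle = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()

    def __del__(self):
        self.close()                 # safety net if close() was forgotten


with Buffer() as managed:
    pass                             # freed deterministically on exit

forgotten = Buffer()
del forgotten                        # CPython runs __del__ right away here
```

Making close() idempotent is the design choice that lets both cleanup paths coexist without double-freeing the handle.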
However all these cases are (or should be) very, very rare. Actually it's a bit like with metaclasses: They are fun and it's really cool to understand the concepts because you can probe the "fun parts" of python. But in practice:
If you wonder whether you need them [metaclasses], you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).
Citation (probably) from Tim Peters (I haven't found the original reference).
One case where I always use __del__ is for closing an aiohttp.ClientSession object.
When you don't, aiohttp will print warnings about the unclosed client session.
My design is as follows:
__main__ references a
a references b
b references a
a is created and then disposed of from __main__
Thus a and b have circular references. However upon del a I would prefer both a and b disposed of.
I see in many places advice to use Context Managers, and specifically the with statement instead of __del__(). However all the examples I see of with start and end in local scope (e.g. of a certain method)
Can this be elegantly performed with with?
What is the alternative?
I recommend either:
Using weakref - which is sometimes applicable when circular references are involved
or ... just manually disposing of stuff in the order you need - not in __del__ but in an explicit dispose method you call at the right time(s)
In general, when you know you have circular references, relying on automatic __del__ disposal is not a good idea. It's brittle - even if you manage to make it work in some case, small changes in dependencies can break it again.
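A minimal sketch of the weakref option for the a/b layout from the question, assuming b only needs to look its owner up and never has to keep it alive. The class names A and B are illustrative:

```python
import weakref


class B:
    def __init__(self, owner):
        # Weak back-reference: it does not keep the owner alive.
        self._owner = weakref.ref(owner)

    @property
    def owner(self):
        return self._owner()  # None once the owner is gone


class A:
    def __init__(self):
        self.b = B(self)      # strong reference from a to b only


a = A()
b_probe = weakref.ref(a.b)
owner_ok = a.b.owner is a

del a  # no cycle, so CPython's reference counting frees a, then b
both_gone = b_probe() is None
```

Because only one direction is a strong reference, there is no cycle left for the garbage collector to untangle.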
What is the alternative?
Do nothing. Until you create millions of circular references like this -- and can prove that this (and only this) is breaking your program -- it doesn't actually matter.
The garbage collector is supposed to handle this.
Just for the sheer heck of it, I've decided to create a Scheme binding to libpython so you can embed Python in Scheme programs. I'm already able to call into Python's C API, but I haven't really thought about memory management.
The way mzscheme's FFI works is that I can call a function, and if that function returns a pointer to a PyObject, then I can have it automatically increment the reference count. Then, I can register a finalizer that will decrement the reference count when the Scheme object gets garbage collected. I've looked at the documentation for reference counting, and don't see any problems with this at first glance (although it may be sub-optimal in some cases). Are there any gotchas I'm missing?
Also, I'm having trouble making heads or tails of the cyclic garbage collector documentation. What things will I need to bear in mind here? In particular, how do I make Python aware that I have a reference to something so it doesn't collect it while I'm still using it?
Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The Extending and Embedding and Python/C API sections of the documentation are the ones that will explain how to use the C API.
Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: Depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus cannot forget to DECREF it or give it to something that will steal it) or are borrowing it (and must INCREF it to keep it and possibly to use it during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for a longer time than you are.
You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting and doesn't require direct access.
The biggest gotcha I know with ref counting and the C API is the __del__ thing. When you have a borrowed reference to something, you think you can get away without INCREF'ing because you don't give up the GIL while you use that reference. But, if you end up deleting an object (by, for example, removing it from a list), it's possible that you trigger a __del__ call, which might remove the reference you're borrowing from under your feet. Very tricky.
If you INCREF (and then DECREF, of course) all borrowed references as soon as you get them, there shouldn't be any problem.