I'm writing a small application in Python 3 where objects (players) hold sets of connected objects of the same type (other players) together with data that belong to the pair of objects (the result of a played game). From time to time it is necessary to delete objects (in my example when they drop out of a tournament). If this is the case, the to-be-deleted object also must be removed from all sets of all connected objects.
So, when an object detects that it is about to be deleted, it should walk through all connected objects in its own set and call .remove(self) on each of them. Once that is done, it is ready to be destroyed.
Is it possible to have this done by simply calling del player42? I've read What is the __del__ method and how do I call it? and from there (and from other resources) I learned that the method __del__ is not reliable, because it will be called only when the to-be-deleted object is garbage collected, but this can happen much later than it really should be performed.
Is there another "magic" method in Python 3 objects that will be called immediately when the del command is performed on the object?
The del command deletes a specific reference to an object, not the object itself. There may still be many other references to that object (especially in parent/child relationships) that will prevent the object from being garbage collected, and __del__ from being called. If you have time-dependent and order-specific teardown requirements, you should implement a method on your objects like destroy() that you can call explicitly.
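A minimal sketch of that approach for the tournament example, assuming a Player class whose results dict maps each connected player to the stored game result (all names here are illustrative, not from the question's code):

class Player:
    def __init__(self, name):
        self.name = name
        self.results = {}  # connected Player -> result of the game played against them

    def add_result(self, opponent, result):
        # store the pairing on both sides
        self.results[opponent] = result
        opponent.results[self] = result

    def destroy(self):
        # remove the back reference from every connected player,
        # then drop our own references so nothing is kept alive
        for opponent in list(self.results):
            opponent.results.pop(self, None)
        self.results.clear()

A call site would then run player.destroy() just before dropping its last reference to the player (e.g. when removing it from the tournament), giving you deterministic teardown that doesn't depend on the garbage collector.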
If you really want to use __del__ for some reason, you could make heavy use of weakrefs to prevent hard references that would block garbage collection, but it would be less deterministic, with the potential for race conditions and other hard-to-diagnose bugs.
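If you do experiment with the weakref route, the standard library's weakref.WeakSet is the natural container; a rough sketch, using the same illustrative Player naming as above:

import weakref

class Player:
    def __init__(self, name):
        self.name = name
        # connections are held weakly: a player with no strong references
        # left anywhere else silently disappears from these sets
        self.connected = weakref.WeakSet()

    def connect(self, other):
        self.connected.add(other)
        other.connected.add(self)

The catch is exactly the non-determinism mentioned above: on a reference-counting implementation the deleted player disappears from the sets immediately, but on other implementations it lingers until the collector runs.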
I guess this is not about memory management, but about getting rid of the back references from the objects to their containers. As garbage collection can be assumed to be highly non-deterministic, you should not rely on it to perform runtime-relevant operations such as removing objects from a collection.
Instead, design your system in a different, less coupled way. For example, don't keep the items in collections associated with the players -- store them in separate inventories, and access them only through those. You can then simply delete objects from the inventory. This is somewhat similar to certain forms of database normalization.
To achieve this kind of design (updating things referred to from different places), games tend to use specialized design patterns, for example entity component systems.
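A rough sketch of what such a decoupled design could look like, with the results held in one central registry instead of on the players (all class and method names here are invented for illustration):

class Tournament:
    def __init__(self):
        self.players = set()
        # results live in a single registry, keyed by the unordered pair of players
        self.results = {}  # frozenset({player_a, player_b}) -> game result

    def record_result(self, a, b, result):
        self.results[frozenset((a, b))] = result

    def remove_player(self, player):
        # dropping a player never touches other Player objects:
        # delete it from the roster and filter its results out of the registry
        self.players.discard(player)
        self.results = {pair: res for pair, res in self.results.items()
                        if player not in pair}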
Related
I'm doing some things in Python (3.3.3), and I came across something that is confusing me, since to my understanding objects get a new id each time the class is called.
Let's say you have this in some .py file:
class someClass: pass
print(someClass())
print(someClass())
The above prints the same id, which is confusing me, since I'm calling the class twice, so the ids shouldn't be the same, right? Is this how Python works when the same class is called twice in a row, or not? It gives a different id when I wait a few seconds, but if I do it back to back like the example above, it doesn't seem to work that way, which is confusing me.
>>> print(someClass());print(someClass())
<__main__.someClass object at 0x0000000002D96F98>
<__main__.someClass object at 0x0000000002D96F98>
It returns the same thing, but why? I also notice it with ranges, for example:
for i in range(10):
    print(someClass())
Is there any particular reason for Python doing this when the class is called in quick succession? I didn't even know Python did this, or is it possibly a bug? If it is not a bug, can someone explain to me how to fix it, or a method so that it generates a different id each time the class is called? I'm pretty puzzled about how it is doing this, because if I wait, the id does change, but not if I call the same class two or more times in a row.
The id of an object is only guaranteed to be unique during that object's lifetime, not over the entire lifetime of a program. The two someClass objects you create only exist for the duration of the call to print - after that, they are available for garbage collection (and, in CPython, deallocated immediately). Since their lifetimes don't overlap, it is valid for them to share an id.
It is also unsurprising in this case, because of a combination of two CPython implementation details: first, it does garbage collection by reference counting (with some extra magic to avoid problems with circular references), and second, the id of an object is related to the value of the underlying pointer for the variable (i.e., its memory location). So, the first object, which was the most recent object allocated, is immediately freed - it isn't too surprising that the next object allocated will end up in the same spot (although this potentially also depends on details of how the interpreter was compiled).
If you are relying on several objects having distinct ids, you might keep them around - say, in a list - so that their lifetimes overlap. Otherwise, you might implement a class-specific id that has different guarantees - e.g.:
class SomeClass:
    next_id = 0

    def __init__(self):
        self.id = SomeClass.next_id
        SomeClass.next_id += 1
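With the counter above in place, each instance gets a distinct id attribute even when the objects' memory is reused, for example:
print(SomeClass().id, SomeClass().id)  # prints "0 1" -- always distinct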
If you read the documentation for id, it says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
And that's exactly what's happening: you have two objects with non-overlapping lifetimes, because the first one is already out of scope before the second one is ever created.
But don't trust that this will always happen, either. Especially if you need to deal with other Python implementations, or with more complicated classes. All that the language says is that these two objects may have the same id() value, not that they will. And the fact that they do depends on two implementation details:
The garbage collector has to clean up the first object before your code even starts to allocate the second object—which is guaranteed to happen with CPython or any other ref-counting implementation (when there are no circular references), but pretty unlikely with a generational garbage collector as in Jython or IronPython.
The allocator under the covers has to have a very strong preference for reusing recently-freed objects of the same type. This is true in CPython, which has multiple layers of fancy allocators on top of basic C malloc, but most of the other implementations leave a lot more to the underlying virtual machine.
One last thing: the fact that the object.__repr__ happens to contain a substring that is the same as the id in hexadecimal is just an implementation artifact of CPython that isn't guaranteed anywhere. According to the docs:
If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form <...some useful description…> should be returned.
The fact that CPython's object happens to put hex(id(self)) there (actually, I believe it's doing the equivalent of sprintf-ing its pointer through %p, but since CPython's id just returns that same pointer cast to an integer, it ends up being the same thing) isn't guaranteed anywhere, even though it has been true since… before object even existed in the early 2.x days. You're safe to rely on it for this kind of simple "what's going on here" debugging at the interactive prompt, but don't try to use it beyond that.
I sense a deeper problem here. You should not be relying on id to track unique instances over the lifetime of your program. You should simply see it as a non-guaranteed memory location indicator for the duration of each object instance. If you immediately create and release instances then you may very well create consecutive instances in the same memory location.
Perhaps what you need to do is track a class static counter that assigns each new instance with a unique id, and increments the class static counter for the next instance.
Python releases the first instance since it wasn't retained; then, since nothing has happened to that memory in the meantime, the second instantiation ends up at the same location.
Try calling the following:
a = someClass()
for i in range(0, 44):
    print(someClass())
print(a)
You'll see something different. Why? Because the memory released by the objects inside the for loop gets reused. On the other hand, a's memory is not reused, since a is retained.
An example where the memory location (and id) is not released is:
print([someClass() for i in range(10)])
Now the ids are all unique.
What are the use cases in Python 3 for writing a custom __del__ method, or for relying on one from the stdlib[1]? That is, in what scenario is it reasonably safe, and can it do something that's hard to do without it?
For many good reasons (1 2 3 4 5 6), the usual recommendation is to avoid __del__ and instead use context managers or perform the cleanup manually:
__del__ is not guaranteed to be called for objects that are still alive on interpreter exit[2].
At the point one expects the object to be destroyed, the ref count may actually be non-zero (e.g., a reference may survive through a traceback frame held onto by a calling function). This makes the destruction time far more uncertain than the mere unpredictability of the gc implies.
The garbage collector cannot get rid of cycles if they include more than one object with __del__.
The code inside __del__ must be written super carefully:
object attributes set in __init__ may not be present since __init__ might have raised an exception;
exceptions are ignored (only printed to stderr);
globals may no longer be available.
Update:
PEP 442 has made significant improvements in the behavior of __del__. It seems though that my points 1-4 are still valid?
Update 2:
Some of the top Python libraries embrace the use of __del__ in post-PEP 442 Python (i.e., Python 3.4+). I guess my point 3 is no longer valid after PEP 442, and the other points are accepted as unavoidable complexity of object finalization.
[1] I expanded the question from just writing a custom __del__ method to include relying on __del__ from the stdlib.
[2] It seems that __del__ is always called on interpreter exit in more recent versions of CPython (does anyone have a counter-example?). However, it doesn't matter for the purpose of __del__'s usability: the docs explicitly provide no guarantee about this behavior, so one cannot rely on it (it may change in future versions, and it may be different in non-CPython interpreters).
Context managers (and try/finally blocks) are somewhat more restrictive than __del__. In general they require you to structure your code in such a way that the lifetime of the resource you need to free doesn't extend beyond a single function call at some level in the call stack, rather than, say, binding it to the lifetime of a class instance that could be destroyed at unpredictable times and places. It's usually a good thing to restrict the lifetime of resources to one scope, but there are sometimes edge cases where this pattern is an awkward fit.
The only case where I've used __del__ (aside from debugging, cf. MSeifert's answer) is for freeing memory allocated outside of Python by an external library. Because of the design of the library I was wrapping, it was difficult to avoid having a large number of objects that held pointers to heap-allocated memory. Using a __del__ method to free the pointers was the easiest way to do cleanup, since it would have been impractical to enclose the lifespan of each instance inside a context manager.
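A hedged sketch of that pattern using ctypes; the library name and the allocate_buffer/free_buffer functions are placeholders, not the actual library I was wrapping:

import ctypes

_lib = ctypes.CDLL("libexample.so")            # hypothetical C library
_lib.allocate_buffer.argtypes = [ctypes.c_size_t]
_lib.allocate_buffer.restype = ctypes.c_void_p
_lib.free_buffer.argtypes = [ctypes.c_void_p]

class NativeBuffer:
    def __init__(self, size):
        self._ptr = _lib.allocate_buffer(size)

    def __del__(self):
        # free the C-allocated memory even if the caller forgot to do so;
        # guard against __init__ having failed before _ptr was assigned
        ptr = getattr(self, "_ptr", None)
        if ptr:
            _lib.free_buffer(ptr)
            self._ptr = None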
One use-case is debugging. If you want to track the lifetime of a specific object it's convenient to write a temporary __del__ method. It can be used to do some logging or just to print something. I have used it a few times especially when I was interested in when and how instances are deleted. It's sometimes good to know when you create and discard a lot of temporary instances. But as I said I only ever used this to satisfy my curiosity or when debugging.
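A minimal version of such a debugging __del__ might be nothing more than this (purely a tracing aid, not production code):

class Tracked:
    def __init__(self, name):
        self.name = name
        print(f"created {self.name}")

    def __del__(self):
        # fires when the instance is finalized -- handy for spotting
        # when (and whether) temporary objects actually go away
        print(f"deleted {self.name}")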
Another use-case is subclassing a class that defines a __del__ method. Sometimes you find a class that you want to subclass, but the internals require you to actually override __del__ to control the order in which the instance is cleaned up. That's very rare too, because you need to find a class with __del__, you need to subclass it, and you need to introduce some internals that actually require calling the superclass's __del__ at exactly the right time. I actually did that once, but I don't remember where and why it was important (maybe I didn't even know about alternatives then, so treat this as a possible use-case).
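In that situation the override usually just does its own teardown and then delegates; a bare sketch (BaseWithDel stands in for whatever class you happen to be subclassing):

class BaseWithDel:
    def __del__(self):
        print("base cleanup")

class Sub(BaseWithDel):
    def __del__(self):
        # run our own teardown first, then let the base class finish its part
        print("sub cleanup")
        super().__del__()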
When you wrap an external object (for example a C object that isn't tracked by Python) that really, really needs to be deallocated even if someone "forgets" (I suspect a lot of people just omit them on purpose!) to use the context manager that you provided.
However all these cases are (or should be) very, very rare. Actually it's a bit like with metaclasses: They are fun and it's really cool to understand the concepts because you can probe the "fun parts" of python. But in practice:
If you wonder whether you need them [metaclasses], you don’t (the people who actually need them know with certainty that they need them, and don’t need an explanation about why).
Citation (probably) from Tim Peters (I haven't found the original reference).
One case where I always use __del__ is for closing an aiohttp.ClientSession object.
When you don't, aiohttp will print warnings about the unclosed client session.
In other languages (e.g. Java), object references can be Strong, Weak, Soft or Phantom (http://weblogs.java.net/blog/enicholas/archive/2006/05/understanding_w.html).
In Python, references are strong by default, and the weakref module allows weak references.
Is it possible to have "soft references" in Python?
In my particular case, I have a cache of objects that are time-consuming to create. Sometimes there may be no references to a cached object, but I don't want to throw the cached object away if I don't have to (i.e. if memory is plentiful).
Python doesn't natively offer any flavors of references besides hard (aka strong) & weak.
That said, here is a softref implementation I whipped up a year or so ago, which I've been using in a few places where I needed one. What it provides aren't quite actual soft references, but it comes close for most use cases. It's a little rough around the edges, but it is fully functional... though it relies on some reference counting internally, which means it'll probably break on anything except CPython.
In particular, I wrote it precisely for a cache of expensive-to-create long-lived objects... the SoftValueDictionary should be exactly what you're looking for.
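If you would rather stay with the standard library, one common approximation (only a sketch, and not equivalent to real soft references) is a weakref.WeakValueDictionary combined with a small deque of strong references that keeps the most recently used values alive:

import weakref
from collections import deque

class SoftishCache:
    def __init__(self, keep_alive=100):
        self._weak = weakref.WeakValueDictionary()
        self._recent = deque(maxlen=keep_alive)  # strong refs to recent values

    def put(self, key, value):
        self._weak[key] = value
        self._recent.append(value)

    def get(self, key, default=None):
        value = self._weak.get(key, default)
        if value is not None:
            self._recent.append(value)  # refresh its strong reference
        return value

Note that the cached values must be weak-referenceable (most user-defined classes are; plain ints and strings are not), and eviction is driven by recency rather than by actual memory pressure.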
Another option is to use a cache that maintains a certain number of objects (e.g. 100) rather than explicitly calculating their memory consumption. When an object is accessed, it is put to the top of the cache if it exists, or the object on the bottom of the cache is replaced with the new object.
Untested, but it should work in theory.
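A sketch of that idea with collections.OrderedDict (class and method names invented for illustration):

from collections import OrderedDict

class BoundedCache:
    def __init__(self, capacity=100):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        value = self._items.pop(key)         # raises KeyError if missing
        self._items[key] = value             # re-insert to move it to the "top"
        return value

    def put(self, key, value):
        if key in self._items:
            self._items.pop(key)
        elif len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the oldest entry
        self._items[key] = value

For caching function results specifically, functools.lru_cache(maxsize=100) already gives you this behavior out of the box.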
Will it cause a memory leak if they cannot be cleaned up by the GC?
It's a standard issue with garbage collection.
It's not about memory leaks, but about the circular references themselves, and about other kinds of resources managed by those objects that may need cleanup. The references create a dependency: you can't delete an object that is still referred to until the referrer has finished its cleanup, because the referrer may need to do something with those referred-to objects during that cleanup.
As a contrived example, two objects may each have log files, and during their cleanups may need to write log messages both to their own log file and to the other one. You can't clean up either object first, as by doing so you leave the other object unable to perform its cleanup.
The basic rule is that you can have either reliable destructors (as in C++) or garbage collection (as in Python, Java...), but not both. Though in principle, a static analysis of code (or even a visual inspection in most cases) can tell you which classes might have this circular reference problem.
From the docs for gc.garbage:
Python doesn’t collect such cycles automatically because, in general, it isn’t possible for Python to guess a safe order in which to run the __del__() methods. If you know a safe order, you can force the issue by examining the garbage list, and explicitly breaking cycles due to your objects within the list.
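A small sketch of what examining that list looks like; note that since PEP 442 (Python 3.4), cycles containing objects with __del__ are collected normally, so gc.garbage usually stays empty unless gc.DEBUG_SAVEALL is set:

import gc

class Node:
    def __del__(self):
        pass

a, b = Node(), Node()
a.other, b.other = b, a      # create a reference cycle between the two nodes
del a, b

gc.collect()
for obj in gc.garbage:       # empty on Python 3.4+; populated on older versions
    obj.other = None         # break the cycle by hand in a safe order...
gc.garbage.clear()           # ...then let the next collection reclaim the objects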
It depends on what you are doing in __del__. If you are using it to handle references to other objects, it may be so.
There is some discussion in the docs. A more appropriate question is what you are trying to do in __del__, and whether it should not be done explicitly somewhere else in the code.
Just for the sheer heck of it, I've decided to create a Scheme binding to libpython so you can embed Python in Scheme programs. I'm already able to call into Python's C API, but I haven't really thought about memory management.
The way mzscheme's FFI works is that I can call a function, and if that function returns a pointer to a PyObject, then I can have it automatically increment the reference count. Then, I can register a finalizer that will decrement the reference count when the Scheme object gets garbage collected. I've looked at the documentation for reference counting, and don't see any problems with this at first glance (although it may be sub-optimal in some cases). Are there any gotchas I'm missing?
Also, I'm having trouble making heads or tails of the cyclic garbage collector documentation. What things will I need to bear in mind here? In particular, how do I make Python aware that I have a reference to something so it doesn't collect it while I'm still using it?
Your link to http://docs.python.org/extending/extending.html#reference-counts is the right place. The Extending and Embedding and Python/C API sections of the documentation are the ones that will explain how to use the C API.
Reference counting is one of the annoying parts of using the C API. The main gotcha is keeping everything straight: depending on the API function you call, you may or may not own the reference to the object you get. Be careful to understand whether you own it (and thus must remember to DECREF it, or hand it to something that will steal the reference) or are borrowing it (and must INCREF it if you want to keep it, and possibly even to use it safely during your function). The most common bugs involving this are 1) remembering incorrectly whether you own a reference returned by a particular function and 2) believing you're safe to borrow a reference for longer than you actually are.
You do not have to do anything special for the cyclic garbage collector. It's just there to patch up a flaw in reference counting and doesn't require direct access.
The biggest gotcha I know with ref counting and the C API is the __del__ thing. When you have a borrowed reference to something, you think you can get away without INCREF'ing because you don't give up the GIL while you use that reference. But, if you end up deleting an object (by, for example, removing it from a list), it's possible that you trigger a __del__ call, which might remove the reference you're borrowing from under your feet. Very tricky.
If you INCREF (and then DECREF, of course) all borrowed references as soon as you get them, there shouldn't be any problem.