If I do this:
def foo():
    a = SomeObject()
Is 'a' destroyed immediately after leaving foo? Or does it wait for some GC to happen?
Yes and no. The object will get destroyed after you leave foo (as long as nothing else has a reference to it), but whether it is immediate or not is an implementation detail, and will vary.
In CPython (the standard Python implementation), reference counting is used, so the object will be destroyed immediately. There are some exceptions to this, such as when the object is part of a reference cycle, or when references are held to the enclosing frame (e.g. an exception is raised that retains a reference to the frame's variables).
In implementations like Jython or IronPython, however, the object won't be finalised until the garbage collector kicks in.
As such, you shouldn't rely on timely finalisation of objects; assume only that the object will be destroyed at some point after the last reference to it goes away. When you do need cleanup to happen based on lexical scope, either call a cleanup method explicitly, or look at the with statement, new in Python 2.6 (available in 2.5 with "from __future__ import with_statement").
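For scope-based cleanup, here is a minimal sketch of a context manager (the Resource class below is illustrative, not from the question):

class Resource(object):
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs deterministically when the with block exits,
        # on every Python implementation.
        print('cleaning up')

def foo():
    with Resource() as r:
        pass  # use r here
    # Cleanup has already run by this point.

foo()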
Python works with reference counting: when there are no more references to a value, the memory for that value is reclaimed. Put the other way round, as long as at least one reference remains, the object is not deleted and the memory is not released.
Let's consider the following example:
def myfn():
    result = work_with(BigObj())  # reference 1 to BigObj is on the stack frame
                                  # (not yet counting any reference inside the
                                  # work_with function). After work_with
                                  # returns, the stack frame and reference 1
                                  # are deleted, and the memory of BigObj is
                                  # released.
    return result

def work_with(big_obj):  # here we have another reference to BigObj
    big_obj = None  # let's assume that we need more memory and we don't
                    # need big_obj anymore. The reference inside work_with
                    # is deleted. However, there is still the reference on
                    # the stack, so the memory is not released until
                    # work_with returns.
    other_big_obj = BigObj()  # we need the memory for another BigObj -> we
                              # may run out of memory here
So my question is:
Why does CPython keep an additional reference, on the caller's stack, to values passed to functions? Is there any special purpose behind this, or is it just an "unlucky" implementation detail?
My first thought on this is:
To prevent the reference count from dropping to zero. However, there is still a live reference inside the called function, so this does not make any sense to me.
It is simply the way CPython passes parameters to a function: the calling frame holds a reference to its arguments, which is what allows temporary objects to be passed at all. The frame is destroyed only when the function returns, so every parameter carries an additional reference for the duration of the call.
This is the reason why the doc for sys.getrefcount says:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
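You can observe that extra reference directly (a small sketch, not part of the quoted documentation):

import sys

def f(x):
    # The count includes the local name x, the temporary reference held
    # by the calling frame for the duration of the call, and the
    # argument to getrefcount() itself.
    print(sys.getrefcount(x))

f(object())  # prints 3 in CPython 2.7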
In fact, in the callee, the references to the arguments are known to be borrowed references, meaning that the callee never has to decrement them. So when you set the parameter to None, it does not destroy the object.
A different implementation would be possible, where the callee decrements the references to its arguments. The benefit would be that it would allow immediate destruction of temporaries; the drawback is that the callee would have to explicitly decrement the reference count of every one of its parameters. At the C level, reference counting is already tedious, and I assume the Python implementers made this choice for simplicity.
By the way, it only matters when you pass a large temporary object to a function, which is not the most common use case.
TL;DR: IMHO there is no real rationale for preventing a function from immediately destroying a temporary; it is just a consequence of the general implementation of functions in CPython.
I'm writing a function which takes a huge argument and runs for a long time. It needs the argument only until about halfway through. Is there a way for the function to delete the value the argument points to once there are no more references to it?
I was able to get it deleted as soon as the function returns, like this:
def f(m):
    print 'S1'
    m = None
    #__import__('gc').collect()  # Uncommenting this doesn't help.
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f(M())
This prints:
S1
S2
__del__
I need:
S1
__del__
S2
I also tried def f(*args): and def f(**kwargs):, but it didn't help; I still get __del__ last.
Please note that my code relies on the fact that Python has reference counting, and that __del__ gets called as soon as an object's reference count drops to zero. I want the reference count of a function argument to drop to zero in the middle of the function. Is this possible?
Please note that I know of a workaround: passing a list of arguments:
def f(ms):
    print 'S1'
    del ms[:]
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f([M()])
This prints:
S1
__del__
S2
Is there a way to get the early deletion without changing the API (e.g. introducing lists to the arguments)?
If it's hard to get a portable solution which works in many Python implementations, I need something which works in the most recent CPython 2.7. It doesn't have to be documented.
From the documentation:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
Short of modifying the interpreter yourself, you cannot achieve what you want. __del__ will be called when the interpreter decides to do it.
It looks like it's not possible to do the early deletion in CPython 2.7 without changing the API of the f function.
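You can see why m = None is not enough with sys.getrefcount (a small diagnostic sketch, assuming CPython):

import sys

def f(m):
    # Besides the local name m, the caller's frame still holds a
    # reference to the temporary it created for the call (and
    # getrefcount's own argument adds one more).
    print(sys.getrefcount(m))  # 3 in CPython 2.7
    m = None
    # Only the local reference is gone; the caller's temporary keeps
    # the object alive until f returns.

f(object())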
Read in the Python CFFI documentation:
The interface is based on LuaJIT’s FFI (...)
Read on the LuaJIT website (about ffi.gc()):
This function allows safe integration of unmanaged resources into the automatic memory management of the LuaJIT garbage collector. Typical usage:
local p = ffi.gc(ffi.C.malloc(n), ffi.C.free)
...
p = nil -- Last reference to p is gone.
-- GC will eventually run finalizer: ffi.C.free(p)
So, using Python-CFFI, do you have to trigger the destruction of the last reference to a variable created with ffi.gc (i.e. one that needs a special function for deallocation because parts of it are dynamically allocated) by setting it to ffi.NULL, for example?
Python is designed so that objects are garbage collected as soon as there are no more references to them (or soon afterwards), like any other garbage-collected language (including Lua). The trick of setting p = None explicitly (or del p) merely makes sure that the local variable p does not keep the object alive. It is pointless (barring special cases) if, for example, it is one of the last things done in the function. You don't need it any more than you need it to free, say, a variable containing a regular string object.
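So the Python-CFFI equivalent of the LuaJIT snippet needs no assignment to ffi.NULL at all. A sketch (assuming a Unix-like system, where ffi.dlopen(None) exposes the standard C library):

from cffi import FFI

ffi = FFI()
ffi.cdef("void *malloc(size_t n); void free(void *p);")
C = ffi.dlopen(None)  # the standard C library on Unix-like systems

p = ffi.gc(C.malloc(1024), C.free)  # free() runs when p is collected
# ... use p ...
p = None  # in CPython this drops the last reference immediately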
I have this code, saved as so.py:
import gc
gc.set_debug(gc.DEBUG_STATS|gc.DEBUG_LEAK)

class GUI():
    #########################################
    def set_func(self):
        self.functions = {}
        self.functions[100] = self.userInput

    #########################################
    def userInput(self):
        a = 1

g = GUI()
g.set_func()
print gc.collect()
print gc.garbage
And this is the output:
I have two questions:
Why does gc.collect() not report anything unreachable the first time the module is imported? It reports unreachable objects only on reload().
Is there any quick way to fix this function-mapping reference cycle, i.e. self.functions[100] = self.userInput? My old project has a lot of these function mappings, and I'm looking for a quick, one-line way to change this code. Currently what I do is "del g.functions" for all these functions at the end.
The first time you import the module, nothing is collected because you have a reference to the so module and all the other objects are referenced by it, so they are all alive and the garbage collector has nothing to collect.
When you reload(so), the module is re-executed, rebinding all the previous references, so the old values no longer have any reference to them.
You do have a reference cycle in:
self.functions[100] = self.userInput
since self.userInput is a bound method, it has a reference to self. So self has a reference to the functions dictionary, which has a reference to the userInput bound method, which has a reference back to self, and the gc will collect those objects.
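You can check that a bound method does keep a reference to its instance (a small sketch):

class A(object):
    def m(self):
        pass

a = A()
bm = a.m
print(bm.__self__ is a)  # True: the bound method references the instance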
It depends on what you are trying to do. From your code it is not clear how you are using the self.functions dictionary, and depending on that, different options may be viable.
The simplest way to break the cycle is to simply not create the self.functions attribute, but pass the dictionary around explicitly.
If self.functions only references bound methods, you could store the names of the methods instead of the methods themselves:
self.functions[100] = self.userInput.__name__
and then you can call the method like this:
getattr(self, self.functions[100])()
or you can do:
from operator import methodcaller
call_method = methodcaller(self.functions[100])
call_method(self) # calls self.userInput()
I don't really understand what you mean by "Currently what I do is del g.functions for all these functions at the end." Which functions are you talking about?
Also, is this really a problem? Are you experiencing a real memory leak?
Note that the garbage collector reports the objects as unreachable not as uncollectable. This means that the objects are freed even if they are part of a reference cycle. So no memory leak should happen.
In fact, adding del g.functions is useless, because the objects are going to be freed anyway; the one-line fix is simply to remove all those del statements, since they don't do anything at all.
The fact that they are put into gc.garbage is because gc.DEBUG_LEAK implies the gc.DEBUG_SAVEALL flag, which makes the collector put all unreachable objects into gc.garbage, not just the uncollectable ones.
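A minimal demonstration of that flag's effect, separate from the question's code (Node is just an illustrative class):

import gc

class Node(object):
    pass

a = Node()
b = Node()
a.other = b
b.other = a  # create a reference cycle
del a, b

gc.set_debug(gc.DEBUG_SAVEALL)
gc.collect()
print(gc.garbage)  # the cycle's objects end up here even though they
                   # are collectable and would normally just be freed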
The nature of reload is that the module is re-executed. The new definitions supersede the old ones, so the old values become unreachable. By contrast, on the first import, there are no superseded definitions, so naturally there is nothing to become unreachable.
One way is to pass the functions object as a parameter to set_func, and do not assign it as an instance attribute. This will break the cycle while still allowing you to pass the functions object to where it's needed.
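For example, a sketch of that approach (not the original code):

class GUI(object):
    def set_func(self, functions):
        # Fill an externally owned dict instead of creating
        # self.functions, so there is no
        # self -> functions -> bound method -> self cycle.
        functions[100] = self.userInput

    def userInput(self):
        a = 1

functions = {}
g = GUI()
g.set_func(functions)
functions[100]()  # calls g.userInput()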
In C/C++, local variables created inside a function live on the stack.
From http://effbot.org/zone/call-by-object.htm:
CLU objects exist independently of procedure activations. Space for objects is allocated from a dynamic storage area /.../ In theory, all objects continue to exist forever. In practice, the space used by an object may be reclaimed when the object is no longer accessible to any CLU program.
Does this mean that objects in Python are created on the heap (as with malloc in C/C++), and that the objects are deallocated when there is no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result

foo("hello")
myList = foo("bye")
So the first result ([]) was created on the heap and got deallocated because there's no name associated with it?
Yes, all Python objects live on the heap (at least in CPython). They are reference-counted: they are deallocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython your first list disappears as soon as the function returns, since you did not bind the return value to a name and the reference count dropped to zero. In other implementations the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
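For example, with files, instead of relying on deallocation to close the handle, close it explicitly; the filename below is just a placeholder:

# Relies on refcounting; the file is closed promptly in CPython only.
data = open('somefile.txt').read()

# Explicit and portable across implementations.
with open('somefile.txt') as f:
    data = f.read()
# f is guaranteed to be closed here.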
Yes, all values in CPython are allocated on the heap, and reference counting is used to know when to deallocate them. Unlike in C, there is in most cases no way to know whether a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "deleting {0}".format(self.name)

print "discarded instance creation"
Test("hello")

print "saved instance creation"
myList = Test("bye")

print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.