gc unreachable when reload()

I have this code, saved as so.py:
import gc
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)

class GUI():
    #########################################
    def set_func(self):
        self.functions = {}
        self.functions[100] = self.userInput
    #########################################
    def userInput(self):
        a = 1

g = GUI()
g.set_func()
print gc.collect()
print gc.garbage
And this is the output:
I have two questions:
Why does gc.collect() not report any unreachable objects the first time the module is imported? It reports unreachable objects only after reload().
Is there a quick way to fix this function-mapping circular reference, i.e. self.functions[100] = self.userInput? My old project has a lot of these function mappings, and I'm looking for a quick, one-line way to change the code. Currently what I do is "del g.functions" for all these functions at the end.

The first time you import the module, nothing is collected because you have a reference to the so module and all other objects are referenced by it, so they are all alive and the garbage collector has nothing to collect.
When you reload(so), the module is re-executed, overriding all previous references, so the old values no longer have any references.
You do have a reference cycle in:
self.functions[100] = self.userInput
since self.userInput is a bound method, it has a reference to self. So self has a reference to the functions dictionary, which has a reference to the userInput bound method, which has a reference back to self, and the gc will collect those objects.
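You can see the back-reference that closes the cycle directly; in Python 2, a bound method exposes its instance as im_self (a minimal sketch, reusing the GUI class from the question):
g = GUI()
g.set_func()
bound = g.functions[100]
print bound.im_self is g  # True: the bound method holds a reference back
                          # to g, closing the cycle g -> functions -> method -> g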
It depends on what you are trying to do. From your code it is not clear how you are using the self.functions dictionary, and depending on that, different options may be viable.
The simplest way to break the cycle is to simply not create the self.functions attribute, but pass the dictionary around explicitly.
If self.functions only references bound methods, you could store the names of the methods instead of the methods themselves:
self.functions[100] = self.userInput.__name__
and then you can call the method doing:
getattr(self, self.functions[100])()
or you can do:
from operator import methodcaller
call_method = methodcaller(self.functions[100])
call_method(self) # calls self.userInput()
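Putting these pieces together, a minimal sketch of the name-based mapping (call_mapped is a hypothetical helper, not part of the question's code):
class GUI(object):
    def set_func(self):
        # store the method *name* (a plain string); the dict no longer
        # holds a bound method, so there is no reference back to self
        self.functions = {}
        self.functions[100] = self.userInput.__name__

    def userInput(self):
        a = 1

    def call_mapped(self, key):
        # resolve the stored name on self at call time
        return getattr(self, self.functions[key])()

g = GUI()
g.set_func()
g.call_mapped(100)  # calls g.userInput()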
I don't really understand what you mean by "Currently what I do is del g.functions for all these functions at the end." Which functions are you talking about?
Also, is this really a problem? Are you experiencing a real memory leak?
Note that the garbage collector reports the objects as unreachable, not as uncollectable. This means that the objects are freed even if they are part of a reference cycle, so no memory leak should happen.
In fact, adding del g.functions is useless because the objects are going to be freed anyway, so the one-line fix is to simply remove all those del statements, since they don't do anything at all.
The fact that they are put into gc.garbage is because gc.DEBUG_LEAK implies the flag gc.DEBUG_SAVEALL, which makes the collector put all unreachable objects into gc.garbage, not just the uncollectable ones.
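If you only want genuinely uncollectable objects to end up in gc.garbage, one option is to drop the save-all behaviour and use a narrower flag (a minimal sketch using the standard gc.DEBUG_UNCOLLECTABLE flag):
import gc
# Unlike gc.DEBUG_LEAK, this does not include gc.DEBUG_SAVEALL, so
# collectable cycles are freed normally instead of being saved in gc.garbage
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_UNCOLLECTABLE)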

The nature of reload is that the module is re-executed. The new definitions supersede the old ones, so the old values become unreachable. By contrast, on the first import, there are no superseded definitions, so naturally there is nothing to become unreachable.
One way is to pass the functions object as a parameter to set_func instead of assigning it as an instance attribute. This breaks the cycle while still allowing you to pass the functions object to wherever it's needed.
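For instance, a minimal sketch of that approach (build_functions is a hypothetical helper, not from the question):
class GUI(object):
    def userInput(self):
        a = 1

def build_functions(gui):
    # the mapping references gui, but gui does not reference the mapping,
    # so no cycle is created
    return {100: gui.userInput}

g = GUI()
functions = build_functions(g)
functions[100]()  # calls g.userInput()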

Related

Why are references to python values, that are function parameters, stored on the stack(frame) in CPython?

Python works with reference counting. That means that when there are no more references to a value, its memory is recycled. In other words: as long as there is at least one remaining reference, the object is not deleted and its memory is not released.
Let's consider the following example:
def myfn():
    result = work_with(BigObj())  # reference 1 to BigObj is on the stack frame,
                                  # not yet counting any reference inside the
                                  # work_with function. After work_with returns,
                                  # the stack frame and reference 1 are deleted
                                  # and the memory of BigObj is released.
    return result

def work_with(big_obj):  # here we have another reference to BigObj
    big_obj = None  # let's assume that we need more memory and we don't
                    # need big_obj any more. The reference inside work_with
                    # is deleted; however, there is still the reference on
                    # the stack, so the memory is not released until
                    # work_with returns.
    other_big_obj = BigObj()  # we need the memory for another BigObj -> we
                              # may run out of memory here
So my question is:
Why does CPython hold an additional reference, on the stack frame, to values that are passed to functions? Is there a special purpose behind this, or is it just an "unlucky" implementation detail?
My first thought on this is:
To prevent the reference count from dropping to zero. However, we still have a live reference inside the called function, so this does not make any sense to me.
It is the way CPython passes parameters to a function. The frame holds a reference to its arguments to allow passing temporary objects, and the frame is destroyed only when the function returns, so all parameters get an additional reference for the duration of the call.
This is the reason why the doc for sys.getrefcount says:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
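You can observe this extra per-call reference directly (a minimal sketch; the exact counts may vary between interpreter versions):
import sys

def probe(obj):
    # obj is also referenced by this frame's argument slot, plus the
    # temporary argument reference inside getrefcount itself
    print(sys.getrefcount(obj))

x = []
print(sys.getrefcount(x))  # typically 2: the name x + getrefcount's argument
probe(x)                   # typically 3: x + probe's argument + getrefcount's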
In fact, in the callee, the reference to the arguments is known to be a borrowed reference, meaning that the callee never has to decrement it. So when you set the parameter to None, that will not destroy the object.
A different implementation would be possible, where the callee decrements the reference count of its arguments. The benefit would be that it would allow immediate destruction of temporaries. The drawback is that the callee would have to explicitly decrement the reference count of all its parameters. At the C level, reference counting is already tedious, and I assume that the Python implementers made this choice for simplicity.
By the way, it only matters when you pass a large temporary object to a function, which is not the most common use case.
TL;DR: IMHO there is no real rationale for preventing a function from immediately destroying a temporary; it is just a consequence of the general implementation of functions in CPython.

How to delete a function argument early?

I'm writing a function which takes a huge argument and runs for a long time. It needs the argument only for the first half of its run. Is there a way for the function to delete the value pointed to by the argument once there are no more references to it?
I was able to get it deleted as soon as the function returns, like this:
def f(m):
    print 'S1'
    m = None
    #__import__('gc').collect()  # Uncommenting this doesn't help.
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f(M())
This prints:
S1
S2
__del__
I need:
S1
__del__
S2
I also tried def f(*args): and def f(**kwargs):, but it didn't help; I still get __del__ last.
Please note that my code relies on the fact that Python has reference counting, and __del__ gets called as soon as an object's reference count drops to zero. I want the reference count of a function argument to drop to zero in the middle of the function. Is this possible?
Please note that I know of a workaround: passing a list of arguments:
def f(ms):
    print 'S1'
    del ms[:]
    print 'S2'

class M(object):
    def __del__(self):
        print '__del__'

f([M()])
This prints:
S1
__del__
S2
Is there a way to get the early deletion without changing the API (e.g. introducing lists to the arguments)?
If it's hard to get a portable solution which works in many Python implementations, I need something which works in the most recent CPython 2.7. It doesn't have to be documented.
From the documentation:
CPython implementation detail: CPython currently uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the gc module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (ex: always close files).
Short of modifying the interpreter yourself, you cannot achieve what you want. __del__ will be called when the interpreter decides to do it.
It looks like it's not possible to do the early deletion in CPython 2.7 without changing the API of the f function.

How to create dangling pointer (in stack or heap) in python

I was wondering: is there any way to create a dangling pointer in Python? I guess we would have to manually delete an object, for example, and then a reference to that object would point at a location that has no meaning for the program.
I found this example here:
import weakref

class Object:
    pass

o = Object()  # new instance
print("o id is:", id(o))
r = weakref.ref(o)
print("r id is:", id(r))
o2 = r()
print("o2 id is:", id(o2))
print("r() id is:", id(r()))
print(o is o2)
del o, o2
print(r(), r)  # if the referent no longer exists, calling the reference object returns None
o = r()  # r is a weak reference object
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
In this example, which one is the dangling pointer/reference? Is it o or r? I am confused.
Is it also possible to create dangling pointers on the stack? Please give me some simple examples so I can understand how it works.
Thanks in advance.
All Python objects live on the heap. The stack is only used for function calls.
Calling a weakref object dereferences it and gives you a strong reference to the object, if the object is still around. Otherwise, you get None. In the latter case, you might call the weakref "dangling" (r in your example).
However, Python does not have any notion of a "dangling pointer" in the same way that C does. It's not possible (barring a bug in Python, a buggy extension module or misuse of a module like ctypes) to create a name (strong reference) that refers to a deleted object, because by definition strong references keep their referents alive. On the other hand, weak references are not really dangling pointers, since they are automatically resolved to None if their referents are deleted.
Note that with ctypes abuse it is possible to create a "real" dangling pointer:
import ctypes
a = (1, 2, 3)
ctypes.pythonapi.Py_DecRef(ctypes.py_object(a))
print a
What happens when you print a is now undefined. It might crash the interpreter, print (1, 2, 3), print other tuples, or execute a random function. Of course, this is only possible because you abused ctypes; it's not something that you should ever do.
Barring a bug in Python or an extension, there is no way to refer to a deallocated object. Weak references refer to the object as long as it is alive, while not contributing to keeping it alive. The moment the object is deallocated, the weak reference evaluates to None, so you never get the dangling object. (Even the callback of the weak reference is called after the object has already been deallocated and the weakref dereferences to None, so you cannot resurrect it, either.)
If you could refer to a real deallocated object, Python would most likely crash on first access, because the memory previously held by the object would be reused and the object's type and other slots would contain garbage. Python objects are never allocated on the stack.
If you have a use case why you need to make use of a dangling object, you should present the use case in the form of a question.
If you create a weak reference, it becomes "dangling" when the referenced object is deleted (when its reference count reaches zero, or it is part of a closed cycle of objects not referenced by anything else). This is possible because a weakref doesn't increase the reference count itself (that's the whole point of a weak reference).
When this happens, every time you try to "dereference" the weakref object (by calling it), it returns None.
It is important to remember that in Python variables are actually names pointing at objects. They are actually "strong references". Example:
import weakref

class A:
    pass

# a new object is created, and the name "x" is set to reference the object,
# giving a reference count of 1
x = A()

# a weak reference is created referencing the object that the name x
# references; the reference count is still 1, though, because x is still
# the only strong reference
weak_reference = weakref.ref(x)

# the only strong reference to the object (x) is deleted, reducing the
# reference count to 0; this means that the object is destroyed, and at
# this point "weak_reference" becomes dangling, and calling it returns None
del x
assert weak_reference() is None

python creates everything from heap?

In C/C++, you have variables on the stack when you create a local variable inside a function.
http://effbot.org/zone/call-by-object.htm
CLU objects exist independently of procedure activations. Space for objects is allocated from a dynamic storage area /.../ In theory, all objects continue to exist forever. In practice, the space used by an object may be reclaimed when the object is no longer accessible to any CLU program.
Does this mean that objects in Python are created on the heap (as with malloc in C/C++)? And are objects deallocated when there's no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result

foo("hello")
myList = foo("bye")
So the first result ([]) was created on the heap and got deallocated because there's no name associated with it?
Yes, all Python objects live on the heap (at least in CPython). They are reference-counted: they are deallocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython, your first list disappears as soon as the function returns, since you did not bind the return value to a name and the reference count dropped to zero. In other implementations, the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
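For example, a minimal sketch of explicit, deterministic cleanup ("data.txt" is a hypothetical file):
fh = open("data.txt")
try:
    contents = fh.read()
finally:
    fh.close()  # the file handle is released here, on every implementation,
                # regardless of when the file object itself is collected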
Yes, all values in CPython are allocated on the heap and reference-counted to know when to deallocate them. Unlike in C, there is no way to know in most cases if a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "deleting {0}".format(self.name)

print "discarded instance creation"
Test("hello")
print "saved instance creation"
myList = Test("bye")
print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.

Are there stack based variables in Python?

If I do this:
def foo():
    a = SomeObject()
Is 'a' destroyed immediately after leaving foo? Or does it wait for some GC to happen?
Yes and no. The object will get destroyed after you leave foo (as long as nothing else has a reference to it), but whether it is immediate or not is an implementation detail, and will vary.
In CPython (the standard Python implementation), refcounting is used, so the item will be destroyed immediately. There are some exceptions to this, such as when the object contains cyclical references, or when references are held to the enclosing frame (e.g. an exception is raised that retains a reference to the frame's variables).
In implementations like Jython or IronPython, however, the object won't be finalised until the garbage collector kicks in.
As such, you shouldn't rely on timely finalisation of objects; only assume that an object will be destroyed at some point after the last reference to it goes away. When you do need cleanup based on lexical scope, either explicitly call a cleanup method, or look at the new with statement in Python 2.6 (available in 2.5 with "from __future__ import with_statement").
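A minimal sketch of the with-statement approach (the filename is hypothetical):
from __future__ import with_statement  # only needed on Python 2.5

with open("data.txt") as fh:
    contents = fh.read()
# fh.close() has already been called at this point, deterministically,
# on any Python implementation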
