I was reading the documentation when the following phrase left me in doubt:
Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.
What does this mean? If I disable the garbage collector (gc.disable()) and I do something like this:
a = 'hi'
a = 'hello'
will 'hi' remain in memory? Do I need to free the memory by myself?
What I understood from that sentence is that the gc is an extra tool designed especially to catch reference cycles, and that if it is disabled, memory is still automatically cleaned up using the objects' reference counts, but reference cycles will not be handled. Is that right?
In CPython, objects are cleared from memory immediately when their reference count drops to 0.
The moment you rebind a to 'hello', the reference count for the 'hi' string object is decremented. If it reaches 0, it'll be removed from memory.
As such, the garbage collector only needs to deal with objects that (indirectly or directly) reference one another, and thus keep the reference count from ever dropping to 0.
Strings cannot reference other objects, so they are of no interest to the garbage collector. But anything that can reference something else (container types such as lists or dictionaries, or any Python class or instance) can produce a circular reference:
a = [] # Ref count is 1
a.append(a) # A circular reference! Ref count is now 2
del a # Ref count is decremented to 1
The garbage collector detects these circular references; nothing else references a, so eventually the gc process breaks the circle, letting the reference counts drop to 0 naturally.
Incidentally, the Python compiler bundles string literals such as 'hi' and 'hello' as constants with the bytecode it produces, so there is always at least one reference to such objects. In addition, string literals used in source code that consist solely of characters matching [a-zA-Z0-9_] are interned, i.e. made into singletons to reduce the memory footprint, so other code blocks that use the same string literal will hold a reference to the same shared string.
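For example (a minimal sketch; whether two literals share an object is a CPython implementation detail, so the exact results may vary between versions):
a = 'hello'
b = 'hello'
print(a is b)        # True in CPython: identifier-like literals are interned and shared
c = 'hello world!'
d = 'hello world!'
print(c is d)        # not guaranteed; literals containing spaces or punctuation
                     # are not interned (though the compiler may still fold them)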
Your understanding of the docs is correct (but see caveat below).
Reference counting still works when GC is disabled. In other words, circular references will not be resolved, but if the reference count for an object drops to zero, the object will still be deallocated.
Caveat: note that this doesn't apply to small strings (and integers) that are treated differently from other objects in Python (they're not really GC'd) — see Martijn Pieters' answer for more detail.
Consider the following code
import weakref
import gc

class Test(object):
    pass

class Cycle(object):
    def __init__(self):
        self.other = None

if __name__ == '__main__':
    gc.disable()

    print "-- No Cycle"
    t = Test()
    r_t = weakref.ref(t)  # Weak refs don't increment refcount
    print "Before re-assign"
    print r_t()
    t = None
    print "After re-assign"
    print r_t()
    print

    print "-- Cycle"
    c1 = Cycle()
    c2 = Cycle()
    c1.other = c2
    c2.other = c1
    r_c1 = weakref.ref(c1)
    r_c2 = weakref.ref(c2)
    c1 = None
    c2 = None
    print "After re-assign"
    print r_c1()
    print r_c2()
    print "After run GC"
    gc.collect()
    print r_c1()
    print r_c2()
Its output is:
-- No Cycle
Before re-assign
<__main__.Test object at 0x101387e90> # The object exists
After re-assign
None # The object was GC'd
-- Cycle
After re-assign
<__main__.Cycle object at 0x101387e90> # The object wasn't GC'd due to the circular reference
<__main__.Cycle object at 0x101387f10>
After run GC
None # The GC was able to resolve the circular reference, and deleted the object
None
In your example, "hi" does not remain in memory. The garbage collector detects circular references.
Here is a simple example of a circular reference in python:
a = []
b = [a]
a.append(b)
Here a contains b and b contains a. If you disable the garbage collector these two objects will remain in memory.
Note that some of the built-in modules create circular references, so it's usually not worth disabling the collector.
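A small sketch of that behaviour, for illustration (gc.collect() still works manually even after gc.disable()):
import gc

gc.disable()          # only disables *automatic* collection

a = []
b = [a]
a.append(b)

del a
del b
# Both lists are now unreachable, but reference counting alone cannot free them
# because each still holds a reference to the other.

print(gc.collect())   # a manual pass still finds and frees the cycle; prints a positive number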
Related
What happens to a Python list object in memory when it has been dereferenced from a Variable?
Example
A = ["A", "B"]
print(A)
A = []
A = None
print(A)
What happens to the ["A", "B"]?
Objects that no longer have any references pointing to them are deleted.
CPython achieves this by reference counting; assigning adds to the reference count, removing a reference decreases the reference count. When the count reaches 0 the object is automatically deleted.
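You can watch the count change with sys.getrefcount (a small sketch; note that getrefcount reports one extra reference for its own argument):
import sys

x = ["A", "B"]
print(sys.getrefcount(x))   # 2: the name x plus getrefcount's temporary argument
y = x                       # add a second reference
print(sys.getrefcount(x))   # 3
del y                       # remove it again
print(sys.getrefcount(x))   # 2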
The edge case is when an object directly or indirectly points to itself; for example by adding a list to itself:
>>> a = []
>>> a.append(a)
>>> a
[[...]]
The ellipsis there indicates a reference cycle. Python uses a garbage collector to detect and break such cycles; it runs periodically to find object graphs that are not referenced by anything else.
I have this code, saved as so.py:
import gc
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)

class GUI():
    #########################################
    def set_func(self):
        self.functions = {}
        self.functions[100] = self.userInput

    #########################################
    def userInput(self):
        a = 1

g = GUI()
g.set_func()
print gc.collect()
print gc.garbage
And this is the output:
I have two questions:
Why does gc.collect() not report anything unreachable on the first import? It reports unreachable objects only after a reload().
Is there any quick way to fix this function-mapping circular reference, i.e. self.functions[100] = self.userInput? My old project has a lot of these function-mapping circular references and I'm looking for a quick, one-line way to change the code. Currently what I do is "del g.functions" for all of these at the end.
The first time you import the module, nothing is collected because you hold a reference to the so module and all other objects are referenced by it, so they are all alive and the garbage collector has nothing to collect.
When you reload(so), the module is re-executed, overriding all the previous references, so the old values no longer have any references.
You do have a reference cycle in:
self.functions[100] = self.userInput
since self.userInput is a bound method, it holds a reference to self. So self has a reference to the functions dictionary, which has a reference to the userInput bound method, which has a reference back to self, and the gc will have to collect those objects.
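To make the cycle visible, here is a tiny sketch (using an illustrative Example class standing in for your GUI):
class Example(object):
    def method(self):
        pass

obj = Example()
bound = obj.method
print(bound.__self__ is obj)     # True: the bound method holds a reference to obj

obj.handlers = {1: obj.method}   # obj -> dict -> bound method -> obj: a cycle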
It depends on what you are trying to do. From your code it is not clear how you are using that self.functions dictionary, and depending on that, different options may be viable.
The simplest way to break the cycle is to simply not create the self.functions attribute, but pass the dictionary around explicitly.
If self.functions only references bound methods you could store the name of the methods instead of the method itself:
self.functions[100] = self.userInput.__name__
and then you can call the method doing:
getattr(self, self.functions[100])()
or you can do:
from operator import methodcaller
call_method = methodcaller(self.functions[100])
call_method(self) # calls self.userInput()
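Put together, a rough sketch of the name-based variant could look like this (the dispatch helper is just an illustration, not something from your code):
class GUI(object):
    def set_func(self):
        # store method *names*, not bound methods, so no cycle through self
        self.functions = {}
        self.functions[100] = self.userInput.__name__

    def userInput(self):
        print("userInput called")

    def dispatch(self, key):
        # look the method up by name at call time
        getattr(self, self.functions[key])()

g = GUI()
g.set_func()
g.dispatch(100)   # prints "userInput called"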
I don't really understand what you mean by "Currently what I do is del g.functions for all these functions at the end." Which functions are you talking about?
Also, is this really a problem? Are you experiencing a real memory leak?
Note that the garbage collector reports the objects as unreachable not as uncollectable. This means that the objects are freed even if they are part of a reference cycle. So no memory leak should happen.
In fact adding del g.functions is useless because the objects are going to be freed anyway, so the one line fix is to simply remove all those del statements, since they don't do anything at all.
The fact that they are put into gc.garbage is because gc.DEBUG_LEAK implies the gc.DEBUG_SAVEALL flag, which makes the collector put all unreachable objects into gc.garbage, not just the uncollectable ones.
The nature of reload is that the module is re-executed. The new definitions supersede the old ones, so the old values become unreachable. By contrast, on the first import, there are no superseded definitions, so naturally there is nothing to become unreachable.
One way is to pass the functions object as a parameter to set_func, and do not assign it as an instance attribute. This will break the cycle while still allowing you to pass the functions object to where it's needed.
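Roughly, it could look like this (a sketch; the exact signature depends on how the rest of your code uses the mapping):
class GUI(object):
    def set_func(self, functions):
        # the dictionary is passed in and never stored on self, so there is
        # no self -> dict -> bound method -> self cycle
        functions[100] = self.userInput

    def userInput(self):
        pass

functions = {}
g = GUI()
g.set_func(functions)
functions[100]()   # call the mapped method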
I was wondering: is there any way to create a dangling pointer in Python? I guess we would have to manually delete an object, for example, and then a reference to that object would point at a location that has no meaning for the program.
I found this example here:
import weakref

class Object:
    pass

o = Object()  # new instance
print("o id is:", id(o))
r = weakref.ref(o)
print("r id is:", id(r))
o2 = r()
print("o2 id is:", id(o2))
print("r() id is:", id(r()))
print(o is o2)
del o, o2
print(r(), r)  # If the referent no longer exists, calling the reference object returns None

o = r()  # r is a weak reference object
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
In this example, which one is the dangling pointer/reference? Is it o or r? I am confused.
Is it also possible to create dangling pointers on the stack? Please give me some simple examples so I can understand how it works.
Thanks in advance.
All Python objects live on the heap. The stack is only used for function calls.
Calling a weakref object dereferences it and gives you a strong reference to the object, if the object is still around. Otherwise, you get None. In the latter case, you might call the weakref "dangling" (r in your example).
However, Python does not have any notion of a "dangling pointer" in the same way that C does. It's not possible (barring a bug in Python, a buggy extension module or misuse of a module like ctypes) to create a name (strong reference) that refers to a deleted object, because by definition strong references keep their referents alive. On the other hand, weak references are not really dangling pointers, since they are automatically resolved to None if their referents are deleted.
Note that with ctypes abuse it is possible to create a "real" dangling pointer:
import ctypes
a = (1, 2, 3)
ctypes.pythonapi.Py_DecRef(ctypes.py_object(a))
print a
What happens when you print a is now undefined. It might crash the interpreter, print (1, 2, 3), print other tuples, or execute a random function. Of course, this is only possible because you abused ctypes; it's not something that you should ever do.
Barring a bug in Python or an extension, there is no way to refer to a deallocated object. Weak references refer to the object as long as it is alive, while not contributing to keeping it alive. The moment the object is deallocated, the weak reference evaluates to None, so you never get the dangling object. (Even the callback of the weak reference is called after the object has already been deallocated and the weakref dereferences to None, so you cannot resurrect it, either.)
If you could refer to a real deallocated object, Python would most likely crash on first access, because the memory previously held by the object would be reused and the object's type and other slots would contain garbage. Python objects are never allocated on the stack.
If you have a use case why you need to make use of a dangling object, you should present the use case in the form of a question.
If you create a weak reference, it becomes "dangling" when the referenced object is deleted (when its reference count reaches zero, or it is part of a closed cycle of objects not referenced by anything else). This is possible because the weakref doesn't increase the reference count itself (that's the whole point of a weak reference).
When this happens, every time you try to "dereference" the weakref object (call it), it returns None.
It is important to remember that in Python variables are actually names, pointing at objects. They are actually "strong references". Example:
import weakref

class A:
    pass

# a new object is created, and the name "x" is set to reference the object,
# giving a reference count of 1
x = A()

# a weak reference is created referencing the object that the name x references;
# the reference count is still 1, though, because x is still the only strong
# reference
weak_reference = weakref.ref(x)

# the only strong reference to the object is deleted (x), reducing the reference
# count to 0; this means that the object is destroyed, and at this point
# "weak_reference" becomes dangling, and calls return None
del x
assert weak_reference() is None
In many cases, you are sure you definitely won't use the list again, so you want the memory to be released right now.
a = [11,22,34,567,9999]
del a
I'm not sure if the above really releases the memory. You can use:
del a[:]
that actually removes all the elements in list a.
Is that the best way to release the memory?
def release_list(a):
    del a[:]
    del a
I have the same question about tuples and sets.
def release_list(a):
    del a[:]
    del a
Do not ever do this. Python automatically frees all objects that are not referenced any more, so a simple del a ensures that the list's memory will be released if the list isn't referenced anywhere else. If that's the case, then the individual list items will also be released (and any objects referenced only from them, and so on and so on), unless some of the individual items were also still referenced.
That means the only time when del a[:]; del a will release more than del a on its own is when the list is referenced somewhere else. This is precisely when you shouldn't be emptying out the list: someone else is still using it!!!
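A small sketch of why that matters:
a = [1, 2, 3]
b = a            # someone else holds a reference to the same list
del a[:]         # empties the list in place...
print(b)         # ...so b now sees [] too: their data is gone
# whereas `del a` alone would only have unbound the name a, leaving b intact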
Basically, you shouldn't be thinking about managing pieces of memory. Instead, think about managing references to objects. In 99% of all Python code, Python cleans up everything you don't need pretty soon after the last time you needed it, and there's no problem. Every time a function finishes all the local variables in that function "die", and if they were pointing to objects that are not referenced anywhere else they'll be deleted, and that will cascade to everything contained within those objects.
The only time you need to think about it is when you have a large object (say a huge list), you do something with it, and then you begin a long-running (or memory intensive) sub-computation, where the large object isn't needed for the sub-computation. Because you have a reference to it, the large object won't be released until the sub-computation finishes and then you return. In that sort of case (and only that sort of case), you can explicitly del your reference to the large object before you begin the sub-computation, so that the large object can be freed earlier (if no-one else is using it; if a caller passed the object in to you and the caller does still need it after you return, you'll be very glad that it doesn't get released).
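Schematically, with stand-in functions invented just for the example:
def summarize(data):
    # stand-in for some cheap reduction over the large object
    return sum(data)

def long_running_computation(summary):
    # stand-in for a slow or memory-hungry sub-computation
    return summary * 2

def process(n):
    big_list = list(range(n))   # stand-in for a huge object
    summary = summarize(big_list)

    # Drop our reference before the long sub-computation so the big list can
    # be freed now (assuming nothing else references it) rather than staying
    # alive until process() returns.
    del big_list

    return long_running_computation(summary)

print(process(10))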
Python uses reference counting to manage its resources.
import sys

class foo:
    pass

b = foo()
a = [b, 1]
sys.getrefcount(b)  # gives 3
sys.getrefcount(a)  # gives 2
a = None            # delete the list
sys.getrefcount(b)  # gives 2
In the above example, b's reference count is incremented when you put it into the list, and as you can see, when you delete the list, the reference count of b gets decremented too. So in your code
def release_list(a):
    del a[:]
    del a
was redundant.
In summary, all you need to do is assign None to the name or use the del keyword to remove the name from the namespace (i.e., unbind the name from the actual object). For example,
a = None # or
del a
When the reference count of an object goes to zero, python will free the memory for you. To make sure the object gets deleted, you have to make sure no other places reference the object by name, or by container.
sys.getrefcount(b) # gives 2
If sys.getrefcount gives you 2, that means yours is the only reference to the object (the other count comes from passing it to getrefcount itself), and when you do
b = None
it will get freed from the memory.
As #monkut notes, you probably shouldn't worry too much about memory management in most situations. If you do have a giant list that you're sure you're done with now and it won't go out of the current function's scope for a while, though:
del a simply removes your name a for that chunk of memory. If some other function or structure or whatever has a reference to it still, it won't be deleted; if this code has the only reference to that list under the name a and you're using CPython, the reference counter will immediately free that memory. Other implementations (PyPy, Jython, IronPython) might not kill it right away because they have different garbage collectors.
Because of this, the del a statement in your release_list function doesn't actually do anything, because the caller still has a reference!
del a[:] will, as you note, remove the elements from the list and thus probably most of its memory usage.
You can do the_set.clear() for similar behavior with sets.
All you can do with a tuple, because they're immutable, is del the_tuple and hope nobody else has a reference to it -- but you probably shouldn't have enormous tuples!
If you're worried about memory management and performance of your data types, why not use something like a doubly linked queue (a deque)?
First, its memory footprint is scattered throughout memory, so you won't have to allocate a large chunk of contiguous memory right off the bat.
Second, you will see faster enqueue and dequeue times because, unlike a standard list, removing (say) a middle element doesn't require sliding the rest of the list over by one index, which takes time for large lists.
I should also note that if you are using just integers, I would suggest looking into a binary heap, as you will see O(log n) access times compared to mostly O(n) with lists.
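In Python terms that roughly maps to collections.deque (fast appends and pops at both ends, no single huge contiguous block) and heapq (a binary heap built on a list); a quick sketch:
from collections import deque
import heapq

# deque: O(1) appends and pops at either end
d = deque([1, 2, 3])
d.appendleft(0)            # O(1), unlike list.insert(0, x) which is O(n)
d.pop()                    # O(1)
print(d)                   # deque([0, 1, 2])

# heapq: O(log n) push and pop of the smallest item
h = []
for value in (5, 1, 4, 2):
    heapq.heappush(h, value)
print(heapq.heappop(h))    # 1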
If you need to release the list's memory while keeping the list's name, you can simply write a = []
In C/C++, local variables created inside a function live on the stack.
http://effbot.org/zone/call-by-object.htm
CLU objects exist independently of procedure activations. Space
for objects is allocated from a dynamic storage area /.../ In
theory, all objects continue to exist forever. In practice, the
space used by an object may be reclaimed when the object is no
longer accessible to any CLU program.
Does this mean that objects in Python are created on the heap (as with malloc in C/C++), and that objects are deallocated when there's no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result

foo("hello")
myList = foo("bye")
So the first result list ([]) was created on the heap and got deallocated because there's no name associated with it?
Yes, all Python objects live on the heap (at least in CPython). They are reference-counted: they are deallocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython, your first list disappears as soon as the function returns, since you did not bind the return value to a name and its reference count dropped to zero. In other implementations the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
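For example, rather than letting deallocation close a file for you, close it explicitly or use a with block (assuming an example.txt file exists):
# Relying on refcounting: the file is closed whenever the object happens to be
# deallocated, which is implementation-dependent (CPython closes it promptly,
# other implementations may not).
data = open('example.txt').read()

# Explicit and portable: the file is closed as soon as the block exits.
with open('example.txt') as f:
    data = f.read()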
Yes, all values in CPython are allocated on the heap and reference-counted to know when to deallocate them. Unlike in C, there is no way to know in most cases if a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "deleting {0}".format(self.name)

print "discarded instance creation"
Test("hello")

print "saved instance creation"
myList = Test("bye")

print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.