I was wondering: is there any way to create a dangling pointer in Python? I guess we would have to manually delete an object, for example, so that the reference to that object points at a location that has no meaning for the program.
I found this example here:
import weakref

class Object:
    pass

o = Object()  # new instance
print("o id is:", id(o))
r = weakref.ref(o)
print("r id is:", id(r))
o2 = r()
print("o2 id is:", id(o2))
print("r() id is:", id(r()))
print(o is o2)
del o, o2
print(r(), r)  # if the referent no longer exists, calling the reference object returns None

o = r()  # r is a weak reference object
if o is None:
    # referent has been garbage collected
    print("Object has been deallocated; can't frobnicate.")
else:
    print("Object is still live!")
    o.do_something_useful()
In this example which one is the dangling pointer/reference? Is it o or r? I am confused.
Is it also possible to create dangling pointers on the stack? Please give me some simple examples so I can understand how it works.
Thanks in advance.
All Python objects live on the heap. The stack is only used for function calls.
Calling a weakref object dereferences it and gives you a strong reference to the object, if the object is still around. Otherwise, you get None. In the latter case, you might call the weakref "dangling" (r in your example).
However, Python does not have any notion of a "dangling pointer" in the same way that C does. It's not possible (barring a bug in Python, a buggy extension module or misuse of a module like ctypes) to create a name (strong reference) that refers to a deleted object, because by definition strong references keep their referents alive. On the other hand, weak references are not really dangling pointers, since they are automatically resolved to None if their referents are deleted.
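A minimal sketch of that distinction, assuming CPython's immediate refcount-based destruction (so the object dies as soon as its last strong reference goes away):

```python
import weakref

class Obj:
    pass

strong = Obj()
weak = weakref.ref(strong)
assert weak() is strong  # referent alive: dereferencing yields the object

del strong               # last strong reference gone; object is destroyed
assert weak() is None    # the weakref resolves to None, never to garbage
```

At no point can `weak()` hand you a reference to freed memory; it either returns the live object or None.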
Note that with ctypes abuse it is possible to create a "real" dangling pointer:
import ctypes
a = (1, 2, 3)
ctypes.pythonapi.Py_DecRef(ctypes.py_object(a))
print a
What happens when you print a is now undefined. It might crash the interpreter, print (1, 2, 3), print other tuples, or execute a random function. Of course, this is only possible because you abused ctypes; it's not something that you should ever do.
Barring a bug in Python or an extension, there is no way to refer to a deallocated object. Weak references refer to the object as long as it is alive, while not contributing to keeping it alive. The moment the object is deallocated, the weak reference evaluates to None, so you never get the dangling object. (Even the callback of the weak reference is called after the object has already been deallocated and the weakref dereferences to None, so you cannot resurrect it, either.)
If you could refer to a real deallocated object, Python would most likely crash on first access, because the memory previously held by the object would be reused and the object's type and other slots would contain garbage. Python objects are never allocated on the stack.
If you have a use case why you need to make use of a dangling object, you should present the use case in the form of a question.
If you create a weak reference, it becomes "dangling" when the referenced object is deleted (when its reference count reaches zero, or it is part of a closed cycle of objects not referenced by anything else). This is possible because a weakref doesn't increase the reference count itself (that's the whole point of a weak reference).
When this happens, every time you try to "dereference" the weakref object (call it), it returns None.
It is important to remember that in Python variables are actually names, pointing at objects. They are actually "strong references". Example:
import weakref

class A:
    pass

# a new object is created, and the name "x" is set to reference the object,
# giving a reference count of 1
x = A()

# a weak reference is created referencing the object that the name x references;
# the reference count is still 1 though, because x is still the only strong
# reference
weak_reference = weakref.ref(x)

# the only strong reference to the object (x) is deleted, reducing the reference
# count to 0; this means the object is destroyed, and at this point
# "weak_reference" becomes dangling, and calls to it return None
del x
assert weak_reference() is None
tl;dr
Does Python reuse ids? How likely it is that two objects with non overlapping lifetime will get the same id?
Background:
I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for the root cause. After some analysis, my suspicion was that when the test suite is run as a whole (it's orchestrated and run by a dedicated dispatcher), it reuses some mocked methods instead of instantiating new objects with their original methods. To check whether the interpreter reuses objects, I used id().
Problem:
id() usually works: it shows the object identifier and lets me tell when a call is creating a new instance rather than reusing one. But what happens if the ids of two objects are the same? The documentation says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
The questions:
When can the interpreter reuse id() values? Is it just when it randomly selects the same memory area? If it's just random, it seems extremely unlikely but it's still not guaranteed.
Is there any other method to check what object I am actually referencing? I encountered a situation where I had an object with a mocked method. The object was no longer used, so the garbage collector destroyed it. After that I created a new object of the same class; it got a new id(), but the method got the same id as when it was mocked, and it actually was still just a mock.
Is there a way to force Python to destroy the given object instance? From the reading I did it appears that no and that it is up to a garbage collector when it sees no references to the object but I thought it's worth asking anyway.
Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.
This is clearly documented:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Bold emphasis mine. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.
Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.
To address your specific questions:
In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:
>>> id(1234)
4546982768
>>> id(4321)
4546982768
The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value (although a garbage collection run breaking cyclic references could free up more memory in between, in which case you'd see a different id()).
So it's not random, but in CPython it is a function of the memory allocation algorithms.
If you need to check specific objects, keep your own reference to it. That can be a weakref weak reference if all you need to assure is that the object is still 'alive'.
For example, recording an object reference first, then later checking it:
import weakref
# record
object_ref = weakref.ref(some_object)
# check if it's the same object still
some_other_reference is object_ref() # only true if they are the same object
The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).
You could use such a mechanism to generate really unique identifiers, see below.
All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.
The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.
Garbage collection only is needed to break cyclic references, objects that reference one another only, with no further references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.
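For instance, a cycle that reference counting alone can never free is reclaimed by an explicit collector run; this is a sketch relying on gc.collect() returning the number of unreachable objects it found:

```python
import gc

a = []
a.append(a)        # self-referencing list: a one-object cycle
del a              # the cycle keeps the refcount at 1, so nothing is freed yet

found = gc.collect()   # the collector detects the cycle and reclaims it
assert found >= 1      # at least the cyclic list was found unreachable
```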
So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but take into account that doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
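A small illustration of gc.get_referrers() returning container objects rather than names (the class name Node here is just for the example):

```python
import gc

class Node:
    pass

target = Node()      # instances of user-defined classes are gc-tracked
holder = [target]    # a list that references the target

# get_referrers returns the *objects* holding references: the list shows
# up, and so may internal structures such as the current stack frame
assert holder in gc.get_referrers(target)
```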
If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:
from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    def __init__(self, dict=None):
        super().__init__()
        # replace data with a defaultdict to generate uuids
        self.data = defaultdict(uuid4)
        if dict is not None:
            self.update(dict)

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable*. The id is the integer value of a UUID, so it
    should be unique across Python invocations.
    """
    return uniqueidmap[obj].int
This still produces integers, but as they are UUIDs they are not quite guaranteed to be unique. However, the likelihood that you'll ever encounter the same ID during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?
This then gives you unique ids even for objects with non-overlapping lifetimes:
>>> class Foo:
... pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671
The id is unique among currently existing objects. If an object is removed by the garbage collector, a future object can have the same id (and most probably will). You have to use your own unique value (e.g. some uuid) to be sure that you are referring to a specific object. You can't do the garbage collection manually either.
It can reuse the id value as soon as the object which had it is no longer in any scope. It is in fact likely to reuse it if you create a similar object immediately after destroying the first.
If you're holding a reference (as opposed to a weak reference), the id is not reused because the object is still alive. If you're just holding the id value, you're probably doing something wrong.
No, but you could delete your reference and request the garbage collector to run. It's possible for the garbage collection to fail to collect that object even if there are no really live references.
I am trying to figure out what's happening here. I want to keep a map (aka dict) of string keys and class values, in order to create new instances at runtime. I omitted the Farm class definition which is not important here.
Well, given the following code:
d = dict(farm = Farm)
# Dynamic instantiation with assignment
f1 = d["farm"]()
f2 = d["farm"]()
print(f1)
print(f2)
# Dynamic instantiation without assignment
print(d["farm"]())
print(d["farm"]())
I get the next output:
C:\Python3\python.exe E:/Programacion/Python/PythonGame/Prueba.py
<BuildingManager.Farm object at 0x00F7B330>
<BuildingManager.Farm object at 0x00F7B730>
<BuildingManager.Farm object at 0x00F7BAD0>
<BuildingManager.Farm object at 0x00F7BAD0>
Process finished with exit code 0
Note that when I print them without being assigned, the ref is the same (0x00F7BAD0).
Why does instantiation in Python always return the same object?
Why does instantiation in Python always return the same object?
It doesn't. Look again at the IDs in your output: only the last one is recycled. And even then it's not the same object.
So why are two those last IDs the same, but the first two are different?
In the first two cases you assign to a variable. That variable is kept around for the full execution of your program. Thus, each of the two objects is unique, and remains unique.
Then, there is the third instantiation (the first print statement). This object is created, printed, but never assigned to any variable. Thus, after printing, Python can forget about it. And it does.
In the last instantiation (the second print statement), Python creates a new Farm instance, but it happens to get the same ID as the one that was not kept around (number 3). That is pure coincidence, and under the hood it is efficient as well (the memory space was available).
Thus, you see a recycled ID, even though it is in fact a new instance.
Python didn't return the same object; it returned a new object that just happened to be created at the same address as the previous one. When print(d["farm"]()) is executed, a new object is created and its address is printed. Since there are no references to it, it is available for garbage collection as soon as print returns. When the second print(d["farm"]()) is executed, it just happens to create the object at the same address. Note that this won't happen when you assign the return value to a variable, since the object can't be garbage collected as long as there are references to it.
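The same effect can be reproduced with any class (the Farm definition below is a hypothetical stand-in, since the question omitted it): ids of temporaries may be recycled, while ids of objects that are alive at the same time are guaranteed distinct.

```python
class Farm:
    pass

# temporaries: the first instance is freed before the second is created,
# so CPython may (but need not) hand out the same address again
id_a = id(Farm())
id_b = id(Farm())

# kept references: both objects are alive at once, so their ids must differ
f1, f2 = Farm(), Farm()
assert id(f1) != id(f2)
```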
I have this code, save as so.py:
import gc
gc.set_debug(gc.DEBUG_STATS | gc.DEBUG_LEAK)

class GUI():
    #########################################
    def set_func(self):
        self.functions = {}
        self.functions[100] = self.userInput

    #########################################
    def userInput(self):
        a = 1

g = GUI()
g.set_func()

print gc.collect()
print gc.garbage
And this is the output:
I have two questions:
Why does gc.collect() not report anything unreachable on the first import? It reports unreachable objects only on reload().
Is there any quick way to fix this function-mapping circular reference, i.e. self.functions[100] = self.userInput? My old project has a lot of these function-mapping circular references and I'm looking for a quick, one-line way to change the code. Currently what I do is "del g.functions" for all these functions at the end.
The first time you import the module nothing is being collected because you have a reference to the so module and all other objects are referenced by it, so they are all alive and the garbage collector has nothing to collect.
When you reload(so), the module is re-executed, overwriting all the previous references, so the old values no longer have any references to them.
You do have a reference cycle in:
self.functions[100] = self.userInput
since self.userInput is a bound method, it holds a reference to self. So self has a reference to the functions dictionary, which has a reference to the userInput bound method, which has a reference back to self, and the gc will have to collect those objects.
It depends on what you are trying to do. From your code it is not clear how you are using the self.functions dictionary, and depending on that, different options may be viable.
The simplest way to break the cycle is to simply not create the self.functions attribute, but pass the dictionary around explicitly.
If self.functions only references bound methods you could store the name of the methods instead of the method itself:
self.functions[100] = self.userInput.__name__
and then you can call the method doing:
getattr(self, self.functions[100])()
or you can do:
from operator import methodcaller
call_method = methodcaller(self.functions[100])
call_method(self) # calls self.userInput()
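Put together, a runnable sketch of the name-based variant (the class and method names follow the question's code; the return value is only for demonstration):

```python
from operator import methodcaller

class GUI(object):
    def set_func(self):
        # store the method *name*, not the bound method, so the mapping
        # holds no reference back to self and no cycle is created
        self.functions = {100: self.userInput.__name__}

    def userInput(self):
        return "user input handled"

g = GUI()
g.set_func()

# look the method up by name at call time
result = getattr(g, g.functions[100])()

# or, equivalently, with operator.methodcaller
result2 = methodcaller(g.functions[100])(g)
```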
I don't really understand what you mean by "Currently what I do is del g.functions for all these functions at the end." Which functions are you talking about?
Also, is this really a problem? Are you experiencing a real memory leak?
Note that the garbage collector reports the objects as unreachable not as uncollectable. This means that the objects are freed even if they are part of a reference cycle. So no memory leak should happen.
In fact adding del g.functions is useless because the objects are going to be freed anyway, so the one line fix is to simply remove all those del statements, since they don't do anything at all.
The fact that they are put into gc.garbage is because gc.DEBUG_LEAK implies the gc.DEBUG_SAVEALL flag, which makes the collector put all unreachable objects into gc.garbage, not just the uncollectable ones.
The nature of reload is that the module is re-executed. The new definitions supersede the old ones, so the old values become unreachable. By contrast, on the first import, there are no superseded definitions, so naturally there is nothing to become unreachable.
One way is to pass the functions object as a parameter to set_func, and do not assign it as an instance attribute. This will break the cycle while still allowing you to pass the functions object to where it's needed.
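A sketch of that approach, assuming the mapping is only needed by the caller (make_functions is a hypothetical helper name):

```python
class GUI(object):
    def userInput(self):
        return "handled"

def make_functions(gui):
    # the dict still references a bound method, but since it is never
    # stored on the instance there is no cycle instance -> dict -> instance
    return {100: gui.userInput}

g = GUI()
functions = make_functions(g)
result = functions[100]()  # dispatch through the external mapping
```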
I was reading the documentation when I came in doubt with the following phrase:
Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.
What does this mean? If I disable the garbage collector (gc.disable()) and I do something like this:
a = 'hi'
a = 'hello'
will 'hi' remain in memory? Do I need to free the memory by myself?
What I understood from that sentence is that the gc is an extra tool made especially to catch reference cycles, and that if it is disabled, memory is still automatically cleaned up using the objects' reference counts, but reference cycles will not be handled. Is that right?
In CPython, objects are cleared from memory immediately when their reference count drops to 0.
The moment you rebind a to 'hello', the reference count for the 'hi' string object is decremented. If it reaches 0, it'll be removed from memory.
As such, the garbage collector only needs to deal with objects that (indirectly or directly) reference one another, and thus keep the reference count from ever dropping to 0.
Strings cannot reference other objects, so they are of no interest to the garbage collector. But anything that can reference something else (container types such as lists and dictionaries, or any Python class or instance) can produce a circular reference:
a = [] # Ref count is 1
a.append(a) # A circular reference! Ref count is now 2
del a # Ref count is decremented to 1
The garbage collector detects these circular references; nothing else references the list, so the gc process eventually breaks the circle, letting the reference counts drop to 0 naturally.
Incidentally, the Python compiler bundles string literals such as 'hi' and 'hello' as constants with the bytecode it produces, so there is always at least one reference to such objects. In addition, string literals that consist solely of characters matching [a-zA-Z0-9_] are interned: made into singletons to reduce the memory footprint, so other code blocks using the same string literal will hold a reference to the same shared string.
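A quick demonstration of that CPython implementation detail (it is not a language guarantee, so code should never rely on it):

```python
a = 'hello'
b = 'hello'
# identifier-like literals are interned in CPython, so both names
# refer to the very same string object
assert a is b

# equality, not identity, is the relationship the language guarantees
assert a == b
```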
Your understanding of the docs is correct (but see the caveat below).
Reference counting still works when GC is disabled. In other words, circular references will not be resolved, but if the reference count for an object drops to zero, the object will be GC'd.
Caveat: note that this doesn't apply to small strings (and integers) that are treated differently from other objects in Python (they're not really GC'd) — see Martijn Pieters' answer for more detail.
Consider the following code
import weakref
import gc

class Test(object):
    pass

class Cycle(object):
    def __init__(self):
        self.other = None

if __name__ == '__main__':
    gc.disable()

    print "-- No Cycle"
    t = Test()
    r_t = weakref.ref(t)  # weak refs don't increment the refcount
    print "Before re-assign"
    print r_t()
    t = None
    print "After re-assign"
    print r_t()
    print

    print "-- Cycle"
    c1 = Cycle()
    c2 = Cycle()
    c1.other = c2
    c2.other = c1
    r_c1 = weakref.ref(c1)
    r_c2 = weakref.ref(c2)
    c1 = None
    c2 = None
    print "After re-assign"
    print r_c1()
    print r_c2()
    print "After run GC"
    gc.collect()
    print r_c1()
    print r_c2()
Its output is :
-- No Cycle
Before re-assign
<__main__.Test object at 0x101387e90> # The object exists
After re-assign
None # The object was GC'd
-- Cycle
After re-assign
<__main__.Cycle object at 0x101387e90> # The object wasn't GC'd due to the circular reference
<__main__.Cycle object at 0x101387f10>
After run GC
None # The GC was able to resolve the circular reference, and deleted the object
None
In your example "hi" does not remain in memory. The garbage collector detects circular references.
Here is a simple example of a circular reference in python:
a = []
b = [a]
a.append(b)
Here a contains b and b contains a. If you delete the names a and b with the garbage collector disabled, the two list objects will remain in memory.
Note that some of the built-in modules create circular references, so it's usually not worth disabling the collector.
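The effect can be observed directly with a weak reference (a sketch using a small helper class, since plain lists don't support weak references):

```python
import gc
import weakref

class Node:
    pass

gc.disable()
a, b = Node(), Node()
a.other, b.other = b, a   # two-object reference cycle
r = weakref.ref(a)

del a, b                  # refcounts never hit 0: the cycle survives
assert r() is not None    # still in memory with the collector off

gc.enable()
gc.collect()              # an explicit run reclaims the cycle
assert r() is None
```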
In C/C++, you get variables on the stack when you create a local variable inside a function.
http://effbot.org/zone/call-by-object.htm
CLU objects exist independently of procedure activations. Space for objects is allocated from a dynamic storage area /.../ In theory, all objects continue to exist forever. In practice, the space used by an object may be reclaimed when the object is no longer accessible to any CLU program.
Does this mean objects in Python are created on the heap (as with malloc in C/C++)? And are the objects deallocated when there is no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result

foo("hello")
myList = foo("bye")
So the first result ([]) was created on the heap and got deallocated because there's no name associated with it?
Yes, all Python objects live on the heap (at least in CPython). They are reference-counted: they are deallocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython your first list disappears as soon as the function returns, since you did not bind the return value to a name and the reference count dropped to zero. In other implementations the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
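The idiomatic way to do that in Python is a with statement, which releases the resource deterministically instead of waiting for deallocation (the temporary file here exists only for the example):

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# the file is closed as soon as the block exits, regardless of when the
# file object itself is deallocated
with open(path, 'w') as f:
    f.write('data')
assert f.closed

os.remove(path)
```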
Yes, all values in CPython are allocated on the heap and reference-counted to know when to deallocate them. Unlike in C, there is no way to know in most cases if a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "deleting {0}".format(self.name)

print "discarded instance creation"
Test("hello")

print "saved instance creation"
myList = Test("bye")

print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.