How unique is Python's id()? - python

tl;dr
Does Python reuse ids? How likely is it that two objects with non-overlapping lifetimes will get the same id?
Background:
I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for a root cause. After some analysis, my suspicion was that when the testing is run as a whole (it's orchestrated and run by a dedicated dispatcher) it reuses some mocked methods instead of instantiating new objects with their original methods. To check whether the interpreter was reusing objects, I used id().
Problem:
id() usually works: it shows the object identifier and lets me tell when a call creates a new instance rather than reusing one. But what does it mean if the ids of two objects are the same? The documentation says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
The questions:
When can the interpreter reuse id() values? Is it just when it happens to select the same memory area? If so, a collision seems extremely unlikely, but it's still not guaranteed.
Is there any other method to check which object I am actually referencing? I encountered a situation where I had an object with a mocked method. The object was no longer used, so the garbage collector destroyed it. After that, I created a new object of the same class; it got a new id(), but its method got the same id as the mocked one, and it actually was still just a mock.
Is there a way to force Python to destroy a given object instance? From the reading I did, it appears not: it is up to the garbage collector, which acts when it sees no references to the object. But I thought it was worth asking anyway.

Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.
This is clearly documented:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Bold emphasis mine. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.
Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.
To address your specific questions:
In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:
>>> id(1234)
4546982768
>>> id(4321)
4546982768
The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value (although a garbage collection run breaking cyclic references could free up more memory, so you could also not see the same id() again).
So it's not random, but in CPython it is a function of the memory allocation algorithms.
If you need to check specific objects, keep your own reference to them. That can be a weakref weak reference if all you need to ensure is that the object is still 'alive'.
For example, recording an object reference first, then later checking it:
import weakref
# record
object_ref = weakref.ref(some_object)
# check if it's the same object still
some_other_reference is object_ref() # only true if they are the same object
The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).
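Here is a runnable sketch of that pattern (the `Widget` class is just a hypothetical stand-in for your own object; immediate freeing on `del` is CPython behaviour):

```python
import weakref

class Widget:
    pass

some_object = Widget()
object_ref = weakref.ref(some_object)

# While the object is alive, the weak reference resolves to it.
assert object_ref() is some_object

# Drop the last strong reference; in CPython the object is freed at once.
del some_object

# The weak reference now returns None instead of a stale object.
assert object_ref() is None
```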
You could use such a mechanism to generate really unique identifiers, see below.
All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.
The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.
Garbage collection only is needed to break cyclic references, objects that reference one another only, with no further references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.
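A small demonstration of such a cycle, using a weak reference to observe when the collector actually frees it (the `Node` class is hypothetical):

```python
import gc
import weakref

class Node:
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a            # a <-> b form a reference cycle

probe = weakref.ref(a)   # watch the object without keeping it alive
del a, b                 # no outside references remain...
assert probe() is not None   # ...but the cycle keeps the refcounts above 0

gc.collect()             # the cycle detector breaks and frees the cycle
assert probe() is None
```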
So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but take into account that doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
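For example, gc.get_referrers() hands back the container objects themselves, not variable names (a minimal sketch; `Payload` and `holder` are made-up names):

```python
import gc

class Payload:
    pass

target = Payload()
holder = {"slot": target}   # a dict keeps a reference to target

referrers = gc.get_referrers(target)
# You get back the dict object itself, never the name "holder".
assert any(r is holder for r in referrers)
```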
If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:
from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    def __init__(self, dict=None):
        super().__init__(self)
        # replace data with a defaultdict to generate uuids
        self.data = defaultdict(uuid4)
        if dict is not None:
            self.update(dict)

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable*. Id is a UUID and should be unique
    across Python invocations.
    """
    return uniqueidmap[obj].int
This still produces integers, but as they are UUIDs they are not quite guaranteed to be unique. However, the likelihood that you'll ever encounter the same ID during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?
This then gives you unique ids even for objects with non-overlapping lifetimes:
>>> class Foo:
... pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671

The id is unique among currently existing objects. If an object is removed by the garbage collector, a future object can have the same id (and most probably will). You have to use your own unique value (e.g. some uuid) to be sure that you are referring to a specific object. You can't do the garbage collection manually either.

It can reuse the id value as soon as the object which had it is no longer in any scope. It is in fact likely to reuse it if you create a similar object immediately after destroying the first.
If you're holding a reference (as opposed to a weak reference), the id is not reused because the object is still alive. If you're just holding the id value, you're probably doing something wrong.
No, but you could delete your reference and request the garbage collector to run. It's possible for the garbage collector to fail to collect an object even when there are no really live references.

Related

Distinct python classes instances returning the same class object [duplicate]

How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same Python interpreter instance, though you'd have to go out of your way to make that false).
Ensuring both objects are alive at the time of comparison is exactly what is does -- which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
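For instance, operator.is_ is simply the functional form of the is operator:

```python
import operator

a = object()
b = a                                  # a second name for the same object
assert operator.is_(a, b)              # equivalent to: a is b
assert not operator.is_(a, object())   # a brand-new object is not a
```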
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
@property
def page(self):
    if check_for_new_version():
        self._page = get_new_version()
    return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,¹ and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements, saving an id instead of the object is not very useful: you have to keep a reference to the object itself anyway, to ensure that it stays alive. Nor is there any performance gain: the is implementation is as simple as comparing pointers.
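If you do want a registry, one option is to store weak references to the objects themselves rather than bare ids, e.g. with weakref.WeakValueDictionary (a sketch, not part of the original answer; `Session` is a hypothetical class, and prompt removal of the entry on `del` is CPython behaviour):

```python
import weakref

class Session:
    pass

registry = weakref.WeakValueDictionary()

s = Session()
registry["current"] = s           # the entry does not keep the session alive

assert registry["current"] is s   # safe: we compare objects, not ids

del s                             # last strong reference gone
assert "current" not in registry  # the entry vanished with the object
```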
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
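The small-integer cache is easy to observe (this is an implementation detail of current CPython, not a language guarantee; the `make` helper is just for illustration):

```python
def make(n):
    # build the int at runtime so the compiler cannot fold constants together
    return int(str(n))

small_a, small_b = make(7), make(7)
assert small_a is small_b   # small ints (currently -5..256) are cached

big_a, big_b = make(10**6), make(10**6)
# whether big_a is big_b is unspecified; in current CPython it is False
```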
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal, because whether two variables point to the same object only matters in practice if the object is mutable: if two variables point to the same mutable object, mutating it through one will (unexpectedly) change it through the other, too. Immutable types don't have that problem, so for them it doesn't matter whether two variables point to two identical objects or to the same one.
¹ Sometimes, this is called an "unnamed expression".

Do multiple immutable objects having the same value point to a single object in memory?

Let's say a = 10000000000 and b = 10000000000, i.e. both a and b have the same value.
When I print the id() of a and b, it always remains the same, no matter how many times I run the code.
It also remains the same for float, string, boolean and tuple values, but it does not for lists, sets and dictionaries.
Does that mean that when multiple variables (of immutable types) have the exact same value, they always point to a single object in memory, and hence a is b will always return True, whereas multiple variables of a mutable type with the same value each point to their own unique object in memory, and hence a is b will always return False?
...it always point...
In general yes, but it is not guaranteed. It is a form of Python internal optimization known as interning.
You should look at it as something that does not matter for immutables, something transparent to the language user. If an object's value cannot change, it does not matter which instance of that type (with that value) you are reading. That is why the interpreter can get away with keeping only one.
As for tuples, note that the contained objects can themselves change (if they are mutable); only the tuple itself cannot change, that is, which objects it contains and how many.
So for immutables you do not have to worry.
For mutables, you should be careful, not with Python internal optimizations but with the code you write. Because you can have many names referring to the same instance (that now can be changed through any one of these references) and one change will be reflected in all of them. This is more tricky when passing mutables as arguments, because far away code can change the object (what was passed was a copy of the reference to the object, not a copy of the object itself).
It is your responsibility to manage things with mutables. You can create new instances with the same values (copies) or share the objects. You can even pass copies as arguments to protect yourself from unintended side effects of calls.
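The aliasing pitfall described above fits in a few lines (the `append_item` function is a made-up example):

```python
def append_item(bucket, item):
    bucket.append(item)        # mutates the caller's object: only the
    return bucket              # reference was copied, not the list

shared = [1, 2]
alias = shared                 # two names, one object
append_item(shared, 3)
assert alias == [1, 2, 3]      # the change shows through the other name

protected = list(shared)       # pass a copy to avoid the side effect
append_item(protected, 4)
assert shared == [1, 2, 3]     # the original is untouched
assert protected == [1, 2, 3, 4]
```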

How unique is Python's id()?


Python: will id() always be nonzero?

I'm wondering if there is anything about Python object ids that will prevent them from ever equaling zero. I'm asking because I'm using zero as a stand-in for a special case in my code.
From the docs
CPython implementation detail: This is the address of the object in memory.
0 is an invalid memory location (NULL in C). So no C object will ever have this memory location, and no object in the CPython implementation will ever have an id of zero.
I'm not sure about other Python implementations, though.
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
There's nothing that says it cannot be zero (zero is an integer). If you rely on it not being zero, you're relying on a current implementation detail, which is not smart.
What you instead should do is to use for example None to indicate that it isn't an id of an object.
This isn't as strong an answer as I'd like, but doing help(id) on Python 2.7.5 gives:
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
Assuming you don't have an object pointing to NULL, you should be safe there.
If you want an object that is different than any other, you can create one:
special = object()
As long as you don't delete it, special will be unique over the runtime of your program. This might achieve the same thing you intended with your id()-is-zero check.
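A common use of such a sentinel is distinguishing "no argument given" from "None was given" (a generic sketch; `get_setting` is a hypothetical function):

```python
_MISSING = object()   # unique for the lifetime of the program

def get_setting(settings, key, default=_MISSING):
    value = settings.get(key, _MISSING)
    if value is not _MISSING:
        return value
    if default is _MISSING:
        raise KeyError(key)
    return default

config = {"debug": None}
assert get_setting(config, "debug") is None    # a stored None is a real value
assert get_setting(config, "retries", 3) == 3  # missing key falls back
```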

python creates everything from heap?

In C/C++, local variables created inside a function are allocated on the stack.
http://effbot.org/zone/call-by-object.htm
CLU objects exist independently of procedure activations. Space
for objects is allocated from a dynamic storage area /.../ In
theory, all objects continue to exist forever. In practice, the
space used by an object may be reclaimed when the object is no
longer accessible to any CLU program.
Does this mean objects in Python are created on the heap (as with malloc in C/C++)? And are the objects deallocated when there's no name associated with them (like smart pointers)?
Example:
def foo(a):
    result = []
    result.append(a)
    return result

foo("hello")
myList = foo("bye")
So the first result ([]) was created on the heap and got deallocated because there was no name associated with it?
Yes, all Python objects live on the heap (at least in CPython). They are reference-counted: they are de-allocated when the last reference to the object disappears. (CPython also has a garbage collector to break cycles.)
In CPython your first list disappears as soon as the function returns, since you did not bind the return value to a name and the reference count dropped to zero. In other implementations the object may live longer, until the garbage collector kicks in.
Some objects (like open files) have resources attached that are automatically freed when the object is deallocated, but because of the above it is not recommended to rely on this. Resources should be closed explicitly when you are done with them.
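Explicit cleanup is most easily done with a with block, which closes the resource deterministically instead of waiting for the object to be deallocated (a generic sketch using the tempfile module):

```python
import os
import tempfile

with tempfile.NamedTemporaryFile("w", delete=False, suffix=".txt") as f:
    f.write("hello")
    path = f.name
# f is closed here, even if an exception was raised inside the block

assert os.path.exists(path)
os.remove(path)   # remove the temporary file explicitly as well
```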
Yes, all values in CPython are allocated on the heap and reference-counted to know when to deallocate them. Unlike in C, there is no way to know in most cases if a value will outlive its function, so the only safe thing to do is to heap-allocate everything.
Certainly you could do some analysis and determine that certain values are never passed to functions and thus couldn't escape, but that's of limited use in Python and the extra overhead probably wouldn't be worth it.
As a supplement to the other answers, here's one way to track when garbage-collection happens, using the special method __del__:
class Test(object):
    def __init__(self, name):
        self.name = name

    def __del__(self):
        print "deleting {0}".format(self.name)

print "discarded instance creation"
Test("hello")
print "saved instance creation"
myList = Test("bye")
print "program done"
Output:
discarded instance creation
deleting hello
saved instance creation
program done
deleting bye
For more in-depth data, see the gc module.
