Python: will id() always be nonzero? - python

I’m wondering if there is anything about python object IDs that will prevent them from ever equaling zero? I’m asking because I’m using zero as a stand-in for a special case in my code.

From the docs
CPython implementation detail: This is the address of the object in memory.
0 is an invalid memory location. So no object in C will ever have this memory location and no object in the CPython implementation will ever have an id of zero.
Not sure about other python implementations though

Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
There's nothing that says that it cannot be zero (zero is an integer). If you rely on it not being zero then you're relying on a current implementation detail which is not smart.
What you instead should do is to use for example None to indicate that it isn't an id of an object.

This isn't as strong an answer as I'd like, but doing help(id) on python 2.7.5
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
Assuming you don't have an object that is pointing to NULL, you should be safe there.

If you want an object that is different than any other, you can create one:
special = object()
As long as you don't delete it, special will be unique over the run time of your program. This might achieve the same thing you intend with checking id() being zero.

Related

Distinct python classes instances returning the same class object [duplicate]

How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same Python interpreter instance, but you need to really try to make that become false).
Which is exactly what is does -- which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
#property
def page(self):
if check_for_new_version():
self._page=get_new_version()
return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,1 and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful because you have to save a reference to the object itself anyway -- to ensure that it stays alive. Neither is there any performance gain: is implementation is as simple as comparing pointers.
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal because whether two variables point to the same object or not is only practical to know if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them, it doesn't matter if two variables point to two identical objects or to the same one.
1Sometimes, this is called "unnamed expression".

The basic memory address question about array slicing [duplicate]

tl;dr
Does Python reuse ids? How likely it is that two objects with non overlapping lifetime will get the same id?
Background:
I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for a root cause. After some analysis, my suspicion was that when the testing is being run as a whole (it's orchestrated and being run by a dedicated dispatcher) it's reusing some mocked methods instead of instatiating new objects with their original methods. To check if the interpreter is reusing I used id().
Problem:
id() usually works and shows the object identifier and lets me tell when my call is creating a new instance and not reusing. But what happens when ids if two objects are the same? The documentation says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
The questions:
When can the interpreter reuse id() values? Is it just when it randomly selects the same memory area? If it's just random, it seems extremely unlikely but it's still not guaranteed.
Is there any other method to check what object I am actually referencing? I encountered a situation where I had the object, it had a mocked method. The object was no longer used, garbage collector destroyed it. After that I create a new object of the same class, it got a new id() but the method got the same id as when it was mocked and it actually was just a mock.
Is there a way to force Python to destroy the given object instance? From the reading I did it appears that no and that it is up to a garbage collector when it sees no references to the object but I thought it's worth asking anyway.
Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.
This is clearly documented:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Bold emphasis mine. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.
Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.
To address your specific questions:
In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:
>>> id(1234)
4546982768
>>> id(4321)
4546982768
The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. But executing the same expression again with a different integer literal, and chances are you'll see the same id() value (a garbage collection run breaking cyclic references could free up more memory, so you could also not see the same id() again.
So it's not random, but in CPython it is a function of the memory allocation algorithms.
If you need to check specific objects, keep your own reference to it. That can be a weakref weak reference if all you need to assure is that the object is still 'alive'.
For example, recording an object reference first, then later checking it:
import weakref
# record
object_ref = weakref.ref(some_object)
# check if it's the same object still
some_other_reference is object_ref() # only true if they are the same object
The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).
You could use such a mechanism to generate really unique identifiers, see below.
All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.
The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.
Garbage collection only is needed to break cyclic references, objects that reference one another only, with no further references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.
So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but take into account that doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:
from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4
class UniqueIdMap(WeakKeyDictionary):
def __init__(self, dict=None):
super().__init__(self)
# replace data with a defaultdict to generate uuids
self.data = defaultdict(uuid4)
if dict is not None:
self.update(dict)
uniqueidmap = UniqueIdMap()
def uniqueid(obj):
"""Produce a unique integer id for the object.
Object must me *hashable*. Id is a UUID and should be unique
across Python invocations.
"""
return uniqueidmap[obj].int
This still produces integers, but as they are UUIDs they are not quite guaranteed to be unique, but the likelihood you'll ever encounter the same ID during your lifetime are smaller than being hit by a meteorite. See How unique is UUID?
This then gives you unique ids even for objects with non-overlapping lifetimes:
>>> class Foo:
... pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671
The id is unique among currently existing objects. If an object is removed by the garbage collector, a future object can have the same id (and most probably will). You have to use your own unique value (eg. some uuid) to be sure that you are refering to a specific object. You can't do the garbage collection manually either.
It can reuse the id value as soon as the object which had it is no longer in any scope. It is in fact likely to reuse it if you create a similar object immediately after destroying the first.
If you're holding a reference (as opposed to a weak reference), the id is not reused because the object is still alive. If you're just holding the id value, you're probably doing something wrong.
No, but you could delete your reference and request the garbage collector to run. It's possible for the garbage collection to fail to collect that object even if there are no really live references.

How unique is Python's id()?

tl;dr
Does Python reuse ids? How likely it is that two objects with non overlapping lifetime will get the same id?
Background:
I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for a root cause. After some analysis, my suspicion was that when the testing is being run as a whole (it's orchestrated and being run by a dedicated dispatcher) it's reusing some mocked methods instead of instatiating new objects with their original methods. To check if the interpreter is reusing I used id().
Problem:
id() usually works and shows the object identifier and lets me tell when my call is creating a new instance and not reusing. But what happens when ids if two objects are the same? The documentation says:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
The questions:
When can the interpreter reuse id() values? Is it just when it randomly selects the same memory area? If it's just random, it seems extremely unlikely but it's still not guaranteed.
Is there any other method to check what object I am actually referencing? I encountered a situation where I had the object, it had a mocked method. The object was no longer used, garbage collector destroyed it. After that I create a new object of the same class, it got a new id() but the method got the same id as when it was mocked and it actually was just a mock.
Is there a way to force Python to destroy the given object instance? From the reading I did it appears that no and that it is up to a garbage collector when it sees no references to the object but I thought it's worth asking anyway.
Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.
This is clearly documented:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
Bold emphasis mine. The id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the non-overlapping lifetimes wording.
Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.
To address your specific questions:
In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:
>>> id(1234)
4546982768
>>> id(4321)
4546982768
The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. But executing the same expression again with a different integer literal, and chances are you'll see the same id() value (a garbage collection run breaking cyclic references could free up more memory, so you could also not see the same id() again.
So it's not random, but in CPython it is a function of the memory allocation algorithms.
If you need to check specific objects, keep your own reference to it. That can be a weakref weak reference if all you need to assure is that the object is still 'alive'.
For example, recording an object reference first, then later checking it:
import weakref
# record
object_ref = weakref.ref(some_object)
# check if it's the same object still
some_other_reference is object_ref() # only true if they are the same object
The weak reference won't keep the object alive, but if it is alive then the object_ref() will return it (it'll return None otherwise).
You could use such a mechanism to generate really unique identifiers, see below.
All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.
The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.
Garbage collection only is needed to break cyclic references, objects that reference one another only, with no further references to the cycle. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.
So you can cause any object to be deleted from memory (freed), by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter to tell you what objects are referencing a given object with the gc.get_referrers() function, but take into account that doesn't give you variable names. It gives you objects, such as the dictionary object that is the __dict__ attribute of a module that references the object as a global, etc. For code fully under your control, at most use gc.get_referrers() as a tool to remind yourself what places the object is referenced from as you write the code to remove those.
If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:
from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4
class UniqueIdMap(WeakKeyDictionary):
def __init__(self, dict=None):
super().__init__(self)
# replace data with a defaultdict to generate uuids
self.data = defaultdict(uuid4)
if dict is not None:
self.update(dict)
uniqueidmap = UniqueIdMap()
def uniqueid(obj):
"""Produce a unique integer id for the object.
Object must me *hashable*. Id is a UUID and should be unique
across Python invocations.
"""
return uniqueidmap[obj].int
This still produces integers, but as they are UUIDs they are not quite guaranteed to be unique, but the likelihood you'll ever encounter the same ID during your lifetime are smaller than being hit by a meteorite. See How unique is UUID?
This then gives you unique ids even for objects with non-overlapping lifetimes:
>>> class Foo:
... pass
...
>>> id(Foo())
4547149104
>>> id(Foo()) # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo()) # but you still get a unique UUID
188632072566395632221804340107821543671
The id is unique among currently existing objects. If an object is removed by the garbage collector, a future object can have the same id (and most probably will). You have to use your own unique value (eg. some uuid) to be sure that you are refering to a specific object. You can't do the garbage collection manually either.
It can reuse the id value as soon as the object which had it is no longer in any scope. It is in fact likely to reuse it if you create a similar object immediately after destroying the first.
If you're holding a reference (as opposed to a weak reference), the id is not reused because the object is still alive. If you're just holding the id value, you're probably doing something wrong.
No, but you could delete your reference and request the garbage collector to run. It's possible for the garbage collection to fail to collect that object even if there are no really live references.

Integers v/s Floats in python:Cannot understand the behavior

I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is because unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the values of the integers from -5 to 256 for efficiency (ensuring that calls to id() will always be the same), since these are commonly used and can be effectively cached, however nothing about the language requires this to be the case, and other implementations may chose not to do so.
Whenever you write a double literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an object exits already, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing x = 5.0 being reused for the value of x += 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress however, this is an implementation detail; it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range
you actually just get back a reference to the existing object. So it
should be possible to change the value of 1. I suspect the behaviour
of Python in this case is undefined. :-)
Note This is true only for CPython and may not apply for other Python Distribution.

Assign a value equal only to itself

I wish to assign to a variable (a "constant"), a value that will allow that variable to only ever return True in is and == comparisons against itself.
I want to avoid assigning an arbitary value such as an int or some other type on the off chance that the value I choose clashes with some other.
I'm considering generating an instance of a class that uses the uniqueness of CPython's id() values in any comparisons the value might support.
From here:
If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).
Would suggest that:
MY_CONSTANT = object()
Will only ever return true in a comparison with MY_CONSTANT on a CPython implementation if MY_CONSTANT was somehow garbage collected, and something else allocated in it's place during the comparison (I would assume this is probably never going to happen).
Yes. This is a good way to define unique constants. There is of course, the minimal risk of whatever object you are comparing it to being defined as equal to everything, but if everyone is playing reasonably nicely, this should work. Also, the garbage collection issue won't be a problem, because if that should ever happen, your constant has already gone away, and isn't around to compare equal to the new object with the same id.
As long as MY_CONSTANT stays in scope the object it references can not be garbage collected.
You really should always use is for comparison, as the behaviour of == can be overridden

Categories