I wish to assign to a variable (a "constant") a value that will only ever compare True, in both is and == comparisons, against itself.
I want to avoid assigning an arbitrary value such as an int or some other type, on the off chance that the value I choose clashes with some other value in use.
I'm considering generating an instance of a class that uses the uniqueness of CPython's id() values in any comparisons the value might support.
From here:
If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).
Would suggest that:
MY_CONSTANT = object()
Will only ever compare True against MY_CONSTANT itself on a CPython implementation, unless MY_CONSTANT were somehow garbage collected and something else was allocated in its place before the comparison (which I assume is practically never going to happen).
Yes. This is a good way to define unique constants. There is, of course, the minimal risk of whatever object you are comparing it to being defined as equal to everything, but if everyone is playing reasonably nicely, this should work. Also, the garbage collection issue won't be a problem, because if that should ever happen, your constant has already gone away and isn't around to compare equal to the new object with the same id.
As long as MY_CONSTANT stays in scope, the object it references cannot be garbage collected.
You really should always use is for comparison, as the behaviour of == can be overridden.
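For illustration, here is one common pattern for such a sentinel: a default-argument marker that no caller could pass by accident. The names MISSING and lookup below are invented purely for this sketch:

# A unique sentinel: only ever identical to itself.
MISSING = object()

def lookup(mapping, key, default=MISSING):
    """Return mapping[key], or `default` if one was given; otherwise re-raise KeyError."""
    try:
        return mapping[key]
    except KeyError:
        # Compare with `is`, so any user-supplied default works -- even one
        # whose __eq__ compares equal to everything.
        if default is MISSING:
            raise
        return default

print(lookup({"a": 1}, "a"))        # 1
print(lookup({"a": 1}, "b", None))  # None (None is a perfectly valid default here)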
Related
How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same interpreter instance, though you would have to go out of your way to make that untrue).
Which is exactly what is does -- which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
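As a rough sketch of that point (the names here are arbitrary): while you hold references to both objects, comparing ids agrees with is, and operator.is_ is simply the functional form of the operator:

import operator

a = [1, 2]
b = a          # a second reference to the same list
c = [1, 2]     # an equal but distinct list

# While both objects are alive, comparing their ids agrees with `is`:
assert id(a) == id(b) and a is b
assert id(a) != id(c) and a is not c

# operator.is_ behaves exactly like the `is` operator:
assert operator.is_(a, b)
assert not operator.is_(a, c)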
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
@property
def page(self):
    if check_for_new_version():
        self._page = get_new_version()
    return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,1 and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
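Putting the two pitfalls together in a self-contained sketch: the is comparison of bound methods is reliably False, but comparing their ids may or may not coincide, because each temporary bound method is discarded before the next one is created:

class C(object):
    def method(self):
        pass

c = C()

print(c.method is c.method)          # False: each access builds a new bound method
print(id(c.method) == id(c.method))  # may print True or False -- never rely on it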
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful because you have to save a reference to the object itself anyway -- to ensure that it stays alive. Neither is there any performance gain: the is operator is implemented as a simple pointer comparison.
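A hypothetical registry illustrating this: if you key a lookup table by id(obj), storing obj itself as the value is what keeps it alive and keeps the id meaningful:

# Hypothetical registry keyed by id(); the stored value keeps the object alive.
_registry = {}

def register(obj):
    _registry[id(obj)] = obj
    return id(obj)

def resolve(obj_id):
    return _registry[obj_id]

token = register(["some", "payload"])
assert resolve(token) == ["some", "payload"]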
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal because whether two variables point to the same object or not only matters in practice if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them, it doesn't matter if two variables point to two identical objects or to the same one.
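For example (a CPython implementation detail; the cached range and behaviour may differ between versions and implementations, and int("...") is used here only to avoid compile-time constant folding):

a = 256
b = int("256")
print(a is b)            # True in CPython: both names refer to the cached 256

x = 1000
y = int("1000")
print(x == y, x is y)    # typically "True False": equal values, distinct objects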
1 Sometimes this is called an "unnamed expression".
Let's say a = 10000000000 and b = 10000000000, i.e. both a and b have the same value.
When I print the id() of a and b, it is always the same, no matter how many times I run the code.
It is also the same for floats, strings, booleans and tuples, but not for lists, sets and dictionaries.
Does that mean that when multiple variables (of immutable types) have the exact same value, they always point to a single object in memory, and hence a is b will always return True, whereas multiple variables of a mutable type with the same value each point to their own object in memory, and hence a is b will always return False?
...they always point...
In general yes, but it is not guaranteed. It is a form of internal Python optimization known as interning.
You should look at it as something that does not matter for immutables, something transparent to the language user. If an object's value cannot change, it does not matter which instance of that type (with that value) you are reading; that is why the interpreter can get away with keeping only one.
As for tuples, note that the contained objects may themselves change; only the tuple itself cannot (it cannot gain, lose or rebind its elements).
So for immutables you do not have to worry.
For mutables, you should be careful, not with Python's internal optimizations but with the code you write, because you can have many names referring to the same instance (which can now be changed through any one of those references) and a change made through one will be reflected in all of them. This is trickier when passing mutables as arguments, because far-away code can change the object (what was passed was a copy of the reference to the object, not a copy of the object itself).
It is your responsibility to manage things with mutables. You can create new instances with the same values (copies) or share the objects. You can even pass copies as arguments to protect yourself from unintended side effects of calls.
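A brief sketch of the aliasing issue just described, including the "far-away code" case of a function mutating a list passed to it (the names are made up):

def append_marker(items):
    # `items` is another reference to the caller's list, not a copy of it.
    items.append("marker")

original = [1, 2]
alias = original              # two names, one list object
append_marker(original)
print(alias)                  # [1, 2, 'marker'] -- the change is visible via every name

safe_copy = list(original)    # an explicit copy breaks the sharing
append_marker(safe_copy)
print(original)               # [1, 2, 'marker'] -- unchanged by the second call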
In Python in a Nutshell
Assignment statements can be plain or augmented. Plain assignment to a variable (e.g., name=value) is how you create a new variable or rebind an existing variable to a new value. Plain assignment to an object attribute (e.g., x.attr=value) is a request to object x to create or rebind attribute 'attr'. Plain assignment to an item in a container (e.g., x[k]=value) is a request to container x to create or rebind the item with index or key k.
Augmented assignment (e.g., name+=value) cannot, per se, create new references. Augmented assignment can rebind a variable, ask an object to rebind one of its existing attributes or items, or request the target object to modify itself. When you make a request to an object, it is up to the object to decide whether and how to honor the request, and whether to raise an exception.
...
In an augmented assignment, just as in a plain one, Python first evaluates the RHS expression. Then, when the LHS refers to an object that has a special method for the appropriate in-place version of the operator, Python calls the method with the RHS value as its argument. It is up to the method to modify the LHS object appropriately and return the modified object (“Special Methods” on page 123 covers special methods). When the LHS object has no appropriate in-place special method, Python applies the corresponding binary operator to the LHS and RHS objects, then rebinds the target reference to the operator’s result. For example, x+=y is like x=x.__iadd__(y) when x has special method __iadd__ for in-place addition. Otherwise, x+=y is like x=x+y.
Augmented assignment never creates its target reference; the target must already be bound when augmented assignment executes. Augmented assignment can rebind the target reference to a new object, or modify the same object to which the target reference was already bound. Plain assignment, in contrast, can create or rebind the LHS target reference, but it never modifies the object, if any, to which the target reference was previously bound. The distinction between objects and references to objects is crucial here. For example, x=x+y does not modify the object to which name x was originally bound. Rather, it rebinds the name x to refer to a new object. x+=y, in contrast, modifies the object to which the name x is bound, when that object has special method __iadd__; otherwise, x+=y rebinds the name x to a new object, just like x=x+y.
Is the difference between whether an assignment performs in-place modification or not-in-place assignment (e.g. as used by augmented assignment versus plain assignment) an implementation detail of Python that programmers don't have to know, or something belonging to the semantics that programmers need to know? Note: by in-place modification I mean changing the value in an existing memory region, while not-in-place assignment allocates a new memory region.
If the answer is no, why do programmers in Python need to know the difference? Is there any situation where programmers in Python need to be aware of the difference?
I suspect that the difference is implementation details, and programmers in Python don't need to know the difference but only need to know the semantics of assignment.
Thanks.
The docs say, regarding the __i<method>__ special methods:
These methods are called to implement the augmented arithmetic assignments (+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if x is an instance of a class with an __iadd__() method, x += y is equivalent to x = x.__iadd__(y). Otherwise, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y. In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += ['item'] raise an exception when the addition works?), but this behavior is in fact part of the data model.
To answer the question you pose:
Is the difference between in-place modification and not-in-place assignment (e.g. as used by augmented and plain assignment) an implementation detail of Python that programmers don't have to know?
Yes, you need to be aware of this when implementing data model for custom objects.
As a user of such objects, you would also better understand what you're doing when using augmented assignment if you understand this.
Why? If you don't implement in-place behavior, then when performing augmented assignment, the name or lookup gets reassigned to an object that is the result of the standard implementation of the operation.
As an implementer, and as a user, you'll need to know this.
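As a hedged sketch of both paths from the implementer's side (the class names and behaviour below are invented purely for illustration):

class InPlaceCounter(object):
    """Defines __iadd__, so += mutates the existing object."""
    def __init__(self, value=0):
        self.value = value
    def __iadd__(self, other):
        self.value += other   # modify self in place...
        return self           # ...and hand back the same object

class RebindCounter(object):
    """Only __add__, so += falls back to building a new object."""
    def __init__(self, value=0):
        self.value = value
    def __add__(self, other):
        return RebindCounter(self.value + other)

a = InPlaceCounter()
before = id(a)
a += 5
print(id(a) == before)        # True: same object, mutated in place

b = RebindCounter()
before = id(b)
b += 5
print(id(b) == before)        # False: the name b was rebound to a new object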
follow-on question from asker:
My question is about whether an assignment performs in-place modification or not-in-place assignment, not about which method is invoked. In-place modification means changing the value in an existing memory region, while not-in-place assignment creates a new memory region. I was wondering whether this difference is an implementation detail which programmers don't need to know, or something belonging to the semantics which programmers need to know.
In Python everything is an object. Every object has a header with some details. Any object that contains other objects does not actually contain them; rather, it holds a pointer, or reference, to the location in memory of each object it refers to. Mutating such a container replaces one of those stored references with a reference to another object.
The old object's memory only gets reclaimed once the count of non-weak references to it goes to zero. You can think of this as an implementation detail, but knowledge of it helps one be a more confident user of the language.
Users rarely need to be concerned with these details, but when you do, you'll be glad you understand it.
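If you are curious, CPython exposes the reference count directly (again an implementation detail; the reported number includes the temporary reference created by the getrefcount call itself):

import sys

payload = object()
print(sys.getrefcount(payload))   # e.g. 2: the name `payload` plus getrefcount's argument

alias = payload
print(sys.getrefcount(payload))   # one higher: a second name now refers to the object

del alias
print(sys.getrefcount(payload))   # back down; the object is freed only when it hits zero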
Again, your question:
My question is about whether an assignment performs in-place modification or not-in-place assignment, not which method is invoked.
Which method is invoked determines the behavior. Therefore you need to know which method is being invoked - either from the semantics of the documentation of the objects you are using, or from your own knowledge of the Python data model - to answer your question.
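From the user's side, the observable difference is whether other references to the same object see the change; a small sketch with built-in types (ids shown only to make the rebinding visible):

nums = [1, 2]
other = nums                 # second reference to the same list
before = id(nums)
nums += [3]                  # list defines __iadd__: mutated in place
print(id(nums) == before)    # True: still the same list object
print(other)                 # [1, 2, 3]: the other reference sees the change

text = "ab"
other = text
before = id(text)
text += "c"                  # str has no __iadd__: falls back to text = text + "c"
print(id(text) == before)    # False: text now names a new string
print(other)                 # 'ab': the other reference still names the old object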
Is there any situation where programmers in Python need to be aware of the difference?
Python is a dynamic language that gives you lots of polymorphism for free (duck typing). If a function is written to work on lists, it likely will work on many list-like things. Augmented assignment throws a wrench into that. Suppose a function adds data to a collection:
>>> def add_data(collection):
...     collection += ('tuple',)
...
>>> l = []
>>> add_data(l)
>>> l
['tuple']
>>> t = tuple()
>>> add_data(t)
>>> t
()
It fails silently in the second case. This is a risk generally when you have multiple references to an object and an augmented assignment is applied to one of them. It's like a box of chocolates, but in a bad way.
I’m wondering whether there is anything about Python object IDs that will prevent them from ever equaling zero. I’m asking because I’m using zero as a stand-in for a special case in my code.
From the docs
CPython implementation detail: This is the address of the object in memory.
Address 0 (the NULL pointer) is never a valid object location in C, so no object in the CPython implementation will ever have an id of zero.
Not sure about other Python implementations, though.
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
There's nothing that says that it cannot be zero (zero is an integer). If you rely on it not being zero then you're relying on a current implementation detail which is not smart.
What you instead should do is to use for example None to indicate that it isn't an id of an object.
This isn't as strong an answer as I'd like, but doing help(id) on Python 2.7.5 gives:
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)
Assuming no object ever lives at the NULL address, you should be safe there.
If you want an object that is different than any other, you can create one:
special = object()
As long as you don't delete it, special will be unique over the run time of your program. This might achieve the same thing you intend with checking id() being zero.
I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is because unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the values of the integers from -5 to 256 for efficiency (ensuring that calls to id() will always be the same for them), since these are commonly used and can be effectively cached; however, nothing about the language requires this to be the case, and other implementations may choose not to do so.
Whenever you write a float literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an object exists already, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing 5.0 being reused for the value of x = x + 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress, however, that this is an implementation detail; it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
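By contrast, the one thing you can rely on is the guarantee quoted earlier: as long as you keep references to both values, their ids cannot coincide:

x = 5.0
y = x + 3.0              # keep both x and y referenced

print(x, y)              # 5.0 8.0
print(id(x) != id(y))    # True: two live objects can never share an id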
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Note: This is true only for CPython and may not apply to other Python distributions.