Related
How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are taken are still alive at the time of comparison (and are associated with the same Python interpreter instance, but you need to really try to make that become false).
Which is exactly what is does -- which makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
#property
def page(self):
if check_for_new_version():
self._page=get_new_version()
return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,1 and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful because you have to save a reference to the object itself anyway -- to ensure that it stays alive. Neither is there any performance gain: is implementation is as simple as comparing pointers.
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal because whether two variables point to the same object or not is only practical to know if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them, it doesn't matter if two variables point to two identical objects or to the same one.
1Sometimes, this is called "unnamed expression".
This question already has answers here:
How can two Python objects have same id but 'is' operator returns False?
(2 answers)
Why is the id of a Python class not unique when called quickly?
(6 answers)
Unnamed Python objects have the same id
(2 answers)
Closed 4 years ago.
Note that this question might be (is?) specific to CPython.
Say you have some list, and check copies of the list for identity against each other:
>>> a=list(range(10))
>>> b,c=a[:],a[:]
>>> b is c
False
>>> id(b), id(c)
(3157888272304, 3157888272256)
No great shakes there. But if we do this in a more ephemeral way, things might seem a bit weird at first:
>>> a[:] is a[:]
False # <- two ephemeral copies not the same object (duh)
>>> id(a[:]),id(a[:])
(3157888272544, 3157888272544) # <- but two other ephemerals share the same id..? hmm....
...until we recognize what is probably going on here. I have not confirmed it by looking at the CPython implementation (I can barely read c++ so it would be a waste of time, to be honest), but it at least seems obvious that even though two objects have the same id, CPython is smart enough to know that they aren't the same object.
Assuming this is correct, my question is: what criteria is CPython using to determine whether the two ephemeral objects are the not the same object, given that they have the same id (presumably for efficiency reasons- see below)? Is it perhaps looking at the time it was marked to be garbage collected? The time it was created? Or something else...?
My theory on why they have the same id is that, likely, CPython knows an ephemeral copy of the list was already made and is waiting to be garbage collected, and it just efficiently re-uses the same memory location. It would be great if an answer could clarify/confirm this as well.
Two unmutable objects, sharing the same address, would, as you are concerned, be indistinguishable from each other.
The thing is that when you do a[:] is a[:] both objetcts are not at the same address - in order for the identity operator is to compare both objects, both operands have to exist - so, there is still a reference to the object at the left hand side when the native code for is is actually run.
On the other hand, when you do id(a[:]),id(a[:]) the object inside the parentheses on the first call is left without any references as soon as the id function call is done, and is destroyed, freeing the memory block to be used by the second a[:].
I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is because unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the values of the integers from -5 to 256 for efficiency (ensuring that calls to id() will always be the same), since these are commonly used and can be effectively cached, however nothing about the language requires this to be the case, and other implementations may chose not to do so.
Whenever you write a double literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an object exits already, it will simply create a new one.
This is not to say that numbers in Python are mutable - they aren't. Any instance of a number, such as 5.0, in Python cannot be changed by the user after being created. However there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing x = 5.0 being reused for the value of x += 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress however, this is an implementation detail; it's entirely possible certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range
you actually just get back a reference to the existing object. So it
should be possible to change the value of 1. I suspect the behaviour
of Python in this case is undefined. :-)
Note This is true only for CPython and may not apply for other Python Distribution.
This question already has answers here:
Why variable = object doesn't work like variable = number
(10 answers)
Closed 4 years ago.
There is this code:
# assignment behaviour for integer
a = b = 0
print a, b # prints 0 0
a = 4
print a, b # prints 4 0 - different!
# assignment behaviour for class object
class Klasa:
def __init__(self, num):
self.num = num
a = Klasa(2)
b = a
print a.num, b.num # prints 2 2
a.num = 3
print a.num, b.num # prints 3 3 - the same!
Questions:
Why assignment operator works differently for fundamental type and
class object (for fundamental types it copies by value, for class object it copies by reference)?
How to copy class objects only by value?
How to make references for fundamental types like in C++ int& b = a?
This is a stumbling block for many Python users. The object reference semantics are different from what C programmers are used to.
Let's take the first case. When you say a = b = 0, a new int object is created with value 0 and two references to it are created (one is a and another is b). These two variables point to the same object (the integer which we created). Now, we run a = 4. A new int object of value 4 is created and a is made to point to that. This means, that the number of references to 4 is one and the number of references to 0 has been reduced by one.
Compare this with a = 4 in C where the area of memory which a "points" to is written to. a = b = 4 in C means that 4 is written to two pieces of memory - one for a and another for b.
Now the second case, a = Klass(2) creates an object of type Klass, increments its reference count by one and makes a point to it. b = a simply takes what a points to , makes b point to the same thing and increments the reference count of the thing by one. It's the same as what would happen if you did a = b = Klass(2). Trying to print a.num and b.num are the same since you're dereferencing the same object and printing an attribute value. You can use the id builtin function to see that the object is the same (id(a) and id(b) will return the same identifier). Now, you change the object by assigning a value to one of it's attributes. Since a and b point to the same object, you'd expect the change in value to be visible when the object is accessed via a or b. And that's exactly how it is.
Now, for the answers to your questions.
The assignment operator doesn't work differently for these two. All it does is add a reference to the RValue and makes the LValue point to it. It's always "by reference" (although this term makes more sense in the context of parameter passing than simple assignments).
If you want copies of objects, use the copy module.
As I said in point 1, when you do an assignment, you always shift references. Copying is never done unless you ask for it.
Quoting from Data Model
Objects are Python’s abstraction for data. All data in a Python
program is represented by objects or by relations between objects. (In
a sense, and in conformance to Von Neumann’s model of a “stored
program computer,” code is also represented by objects.)
From Python's point of view, Fundamental data type is fundamentally different from C/C++. It is used to map C/C++ data types to Python. And so let's leave it from the discussion for the time being and consider the fact that all data are object and are manifestation of some class. Every object has an ID (somewhat like address), Value, and a Type.
All objects are copied by reference. For ex
>>> x=20
>>> y=x
>>> id(x)==id(y)
True
>>>
The only way to have a new instance is by creating one.
>>> x=3
>>> id(x)==id(y)
False
>>> x==y
False
This may sound complicated at first instance but to simplify a bit, Python made some types immutable. For example you can't change a string. You have to slice it and create a new string object.
Often copying by reference gives unexpected results for ex.
x=[[0]*8]*8 may give you a feeling that it creates a two dimensional list of 0s. But in fact it creates a list of the reference of the same list object [0]s. So doing x[1][1] would end up changing all the duplicate instance at the same time.
The Copy module provides a method called deepcopy to create a new instance of the object rather than a shallow instance. This is beneficial when you intend to have two distinct object and manipulate it separately just as you intended in your second example.
To extend your example
>>> class Klasa:
def __init__(self, num):
self.num = num
>>> a = Klasa(2)
>>> b = copy.deepcopy(a)
>>> print a.num, b.num # prints 2 2
2 2
>>> a.num = 3
>>> print a.num, b.num # prints 3 3 - different!
3 2
It doesn't work differently. In your first example, you changed a so that a and b reference different objects. In your second example, you did not, so a and b still reference the same object.
Integers, by the way, are immutable. You can't modify their value. All you can do is make a new integer and rebind your reference. (like you did in your first example)
Suppose you and I have a common friend. If I decide that I no longer like her, she is still your friend. On the other hand, if I give her a gift, your friend received a gift.
Assignment doesn't copy anything in Python, and "copy by reference" is somewhere between awkward and meaningless (as you actually point out in one of your comments). Assignment causes a variable to begin referring to a value. There aren't separate "fundamental types" in Python; while some of them are built-in, int is still a class.
In both cases, assignment causes the variable to refer to whatever it is that the right-hand-side evaluates to. The behaviour you're seeing is exactly what you should expect in that environment, per the metaphor. Whether your "friend" is an int or a Klasa, assigning to an attribute is fundamentally different from reassigning the variable to a completely other instance, with the correspondingly different behaviour.
The only real difference is that the int doesn't happen to have any attributes you can assign to. (That's the part where the implementation actually has to do a little magic to restrict you.)
You are confusing two different concepts of a "reference". The C++ T& is a magical thing that, when assigned to, updates the referred-to object in-place, and not the reference itself; that can never be "reseated" once the reference is initialized. This is useful in a language where most things are values. In Python, everything is a reference to begin with. The Pythonic reference is more like an always-valid, never-null, not-usable-for-arithmetic, automatically-dereferenced pointer. Assignment causes the reference to start referring to a different thing completely. You can't "update the referred-to object in-place" by replacing it wholesale, because Python's objects just don't work like that. You can, of course, update its internal state by playing with its attributes (if there are any accessible ones), but those attributes are, themselves, also all references.
I've read several python tutorials (Dive Into Python, for one), and the language reference on Python.org - I don't see why the language needs tuples.
Tuples have no methods compared to a list or set, and if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
Immutability?
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
In C/C++ if I allocate a pointer and point to some valid memory, I don't care where the address is located as long as it's not null before I use it.
Whenever I reference that variable, I don't need to know if the pointer is still pointing to the original address or not. I just check for null and use it (or not).
In Python, when I allocate a string (or tuple) assign it to x, then modify the string, why do I care if it's the original object? As long as the variable points to my data, that's all that matters.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
x still references the data I want, why does anyone need to care if its id is the same or different?
immutable objects can allow substantial optimization; this is presumably why strings are also immutable in Java, developed quite separately but about the same time as Python, and just about everything is immutable in truly-functional languages.
in Python in particular, only immutables can be hashable (and, therefore, members of sets, or keys in dictionaries). Again, this afford optimization, but far more than just "substantial" (designing decent hash tables storing completely mutable objects is a nightmare -- either you take copies of everything as soon as you hash it, or the nightmare of checking whether the object's hash has changed since you last took a reference to it rears its ugly head).
Example of optimization issue:
$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop
None of the answers above point out the real issue of tuples vs lists, which many new to Python seem to not fully understand.
Tuples and lists serve different purposes. Lists store homogenous data. You can and should have a list like this:
["Bob", "Joe", "John", "Sam"]
The reason that is a correct use of lists is because those are all homogenous types of data, specifically, people's names. But take a list like this:
["Billy", "Bob", "Joe", 42]
That list is one person's full name, and their age. That isn't one type of data. The correct way to store that information is either in a tuple, or in an object. Lets say we have a few :
[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]
The immutability and mutability of Tuples and Lists is not the main difference. A list is a list of the same kind of items: files, names, objects. Tuples are a grouping of different types of objects. They have different uses, and many Python coders abuse lists for what tuples are meant for.
Please don't.
Edit:
I think this blog post explains why I think this better than I did:
Understanding tuples vs. lists in Python - E-Scribe
if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
In this particular case, there probably isn't a point. This is a non-issue, because this isn't one of the cases where you'd consider using a tuple.
As you point out, tuples are immutable. The reasons for having immutable types apply to tuples:
copy efficiency: rather than copying an immutable object, you can alias it (bind a variable to a reference)
comparison efficiency: when you're using copy-by-reference, you can compare two variables by comparing location, rather than content
interning: you need to store at most one copy of any immutable value
there's no need to synchronize access to immutable objects in concurrent code
const correctness: some values shouldn't be allowed to change. This (to me) is the main reason for immutable types.
Note that a particular Python implementation may not make use of all of the above features.
Dictionary keys must be immutable, otherwise changing the properties of a key-object can invalidate invariants of the underlying data structure. Tuples can thus potentially be used as keys. This is a consequence of const correctness.
See also "Introducing tuples", from Dive Into Python.
Sometimes we like to use objects as dictionary keys
For what it's worth, tuples recently (2.6+) grew index() and count() methods
I've always found having two completely separate types for the same basic data structure (arrays) to be an awkward design, but not a real problem in practice. (Every language has its warts, Python included, but this isn't an important one.)
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
These are different things. Mutability isn't related to the place it's stored in memory; it means the stuff it points to can't change.
Python objects can't change location after they're created, mutable or not. (More accurately, the value of id() can't change--same thing, in practice.) The internal storage of mutable objects can change, but that's a hidden implementation detail.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
This isn't modifying ("mutating") the variable; it's creating a new variable with the same name, and discarding the old one. Compare to a mutating operation:
>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L
As others have pointed out, this allows using arrays as keys to dictionaries, and other data structures that need immutability.
Note that keys for dictionaries do not have to be completely immutable. Only the part of it used as a key needs to be immutable; for some uses, this is an important distinction. For example, you could have a class representing a user, which compares equality and a hash by the unique username. You could then hang other mutable data on the class--"user is logged in", etc. Since this doesn't affect equality or the hash, it's possible and perfectly valid to use this as a key in a dictionary. This isn't too commonly needed in Python; I just point it out since several people have claimed that keys need to be "immutable", which is only partially correct. I've used this many times with C++ maps and sets, though.
As gnibbler offered in a comment, Guido had an opinion that is not fully accepted/appreciated: “lists are for homogeneous data, tuples are for heterogeneous data”. Of course, many of the opposers interpreted this as meaning that all elements of a list should be of the same type.
I like to see it differently, not unlike others also have in the past:
blue= 0, 0, 255
alist= ["red", "green", blue]
Note that I consider alist to be homogeneous, even if type(alist[1]) != type(alist[2]).
If I can change the order of the elements and I won't have issues in my code (apart from assumptions, e.g. “it should be sorted”), then a list should be used. If not (like in the tuple blue above), then I should use a tuple.
They are important since they guarantee the caller that the object they pass won't be mutated.
If you do this:
a = [1,1,1]
doWork(a)
The caller has no guarantee of the value of a after the call.
However,
a = (1,1,1)
doWorK(a)
Now you as the caller or as a reader of this code know that a is the same.
You could always for this scenario make a copy of the list and pass that but now you are wasting cycles instead of using a language construct that makes more semantic sense.
you can see here for some discussion on this
Your question (and follow-up comments) focus on whether the id() changes during an assignment. Focusing on this follow-on effect of the difference between immutable object replacement and mutable object modification rather than the difference itself is perhaps not the best approach.
Before we continue, make sure that the behavior demonstrated below is what you expect from Python.
>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2
In this case, the contents of a2 was changed, even though only a1 had a new value assigned. Contrast to the following:
>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1
In this latter case, we replaced the entire list, rather than updating its contents. With immutable types such as tuples, this is the only behavior allowed.
Why does this matter? Let's say you have a dict:
>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1,2): 'three'}
>>> t1[0] = 0 ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1 ## as seen here, the dict is still intact
{(1,2): 'three'}
Using a tuple, the dictionary is safe from having its keys changed "out from under it" to items which hash to a different value. This is critical to allow efficient implementation.