Normally, if I assign a value to a variable and then check the ids of the variable and of the value, I expect them to be the same, because Python is essentially just giving my object a "name". This can be seen in the code below:
>>> a = 3
>>> id(a)
19845928
>>> id(3)
19845928
The problem arises when I do the same with __name__:
>>> __name__
'__main__'
>>> id(__name__)
19652416
>>> id('__main__')
19652448
How can their ids be different? Shouldn't they be the same, since __name__ should also be just a reference?
id() essentially gives the memory address of the data (in CPython). Although strings are immutable, they are not guaranteed to be interned, which means that some strings with equal values have different addresses.
For integers (especially small ones), the pointers will be the same, so your 3 example works fine.
@KartikAnand: The way you're checking for 'same object' is valid, although the usual way is to use x is y. The problem is that they are not the same object, and are not guaranteed to be; they simply have the same value. Note that when you write the literal '__main__' you're creating a new object. Sometimes Python performs a nice optimization and reuses a previously-created string of the same value, but it doesn't have to.
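For example, here is a minimal sketch of that behavior (assuming CPython 3, where forced interning lives in sys.intern; in Python 2 it is the builtin intern()):
>>> import sys
>>> a = 'hello world!'       # not identifier-like, so not auto-interned
>>> b = 'hello world!'
>>> a == b, a is b           # equal value, but (typically) distinct objects
(True, False)
>>> c = sys.intern('hello world!')
>>> d = sys.intern('hello world!')
>>> c is d                   # interning forces a single shared object
True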
Kartik's goal is to "verify that assignment is in a way reference and objects are not created on the fly". To do this, avoid creating new objects (no string literals).
>>> __name__
'__main__'
>>> x = __name__
>>> id(__name__)
3078339808L
>>> id(x)
3078339808L
>>> __name__ is x
True
Just because two strings have the same value, it does not follow that they are the same object. This is completely expected behaviour.
In Python, small integers are "pooled", so that all small integer values point to the same object. This is not necessarily true for strings.
At any rate, this is an implementation detail that should not be relied upon.
What you are running into here is the fact that primitives are pseudo (or real) singletons in Python. Furthermore, looking at strings clouds the issue, because when strings are interned, value and id become synonymous as a side effect, so some strings with matching values will have matching ids and others won't. Try looking at hand-built objects instead, since then you control when a new instance is created, and the distinction between id and value becomes more explicit.
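For instance, a small sketch with a hand-built class (Point is just an illustrative name):
>>> class Point(object):
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...     def __eq__(self, other):
...         return (self.x, self.y) == (other.x, other.y)
...
>>> p = Point(1, 2)
>>> q = Point(1, 2)
>>> p == q        # same value
True
>>> p is q        # but two distinct instances, with two distinct ids
False
>>> r = p
>>> r is p        # assignment adds a name, not a new object
True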
Related
How much can I rely on the object's id() and its uniqueness in practice? E.g.:
Does id(a) == id(b) mean a is b or vice versa? What about the opposite?
How safe is it to save an id somewhere to be used later (e.g. into some registry instead of the object itself)?
(Written as a proposed canonical in response to Canonicals for Python: are objects with the same id() the same object, `is` operator, unbound method objects)
According to the id() documentation, an id is only guaranteed to be unique
for the lifetime of the specific object, and
within a specific interpreter instance
As such, comparing ids is not safe unless you also somehow ensure that both objects whose ids are compared are still alive at the time of comparison (and are associated with the same Python interpreter instance, although you would have to really try to make that become false).
Which is exactly what the is operator does -- and that makes comparing ids redundant. If you cannot use the is syntax for whatever reason, there's always operator.is_.
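A quick sketch of operator.is_, for the rare cases where the operator itself can't be used (e.g. as a callback):
>>> import operator
>>> a = object()
>>> b = a
>>> operator.is_(a, b)         # equivalent to: a is b
True
>>> operator.is_(a, object())
False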
Now, whether an object is still alive at the time of comparison is not always obvious (and sometimes is grossly non-obvious):
Accessing some attributes (e.g. bound methods of an object) creates a new object each time. So, the result's id may or may not be the same on each attribute access.
Example:
>>> class C(object): pass
>>> c=C()
>>> c.a=1
>>> c.a is c.a
True # same object each time
>>> c.__init__ is c.__init__
False # a different object each time
# The above two are not the only possible cases.
# An attribute may be implemented to sometimes return the same object
# and sometimes a different one:
@property
def page(self):
    if check_for_new_version():
        self._page = get_new_version()
    return self._page
If an object is created as a result of calculating an expression and not saved anywhere, it's immediately discarded,1 and any object created after that can take up its id.
This is even true within the same code line. E.g. the result of id(create_foo()) == id(create_bar()) is undefined.
Example:
>>> id([]) #the list object is discarded when id() returns
39733320L
>>> id([]) #a new, unrelated object is created (and discarded, too)
39733320L #its id can happen to be the same
>>> id([[]])
39733640L #or not
>>> id([])
39733640L #you never really know
Due to the above safety requirements when comparing ids, saving an id instead of the object is not very useful, because you have to save a reference to the object itself anyway -- to ensure that it stays alive. Nor is there any performance gain: the implementation of is is as simple as comparing pointers.
Finally, as an internal optimization (and implementation detail, so this may differ between implementations and releases), CPython reuses some often-used simple objects of immutable types. As of this writing, that includes small integers and some strings. So even if you got them from different places, their ids might coincide.
This does not (technically) violate the above id() documentation's uniqueness promises: the reused object stays alive through all the reuses.
This is also not a big deal, because whether two variables point to the same object only matters in practice if the object is mutable: if two variables point to the same mutable object, mutating one will (unexpectedly) change the other, too. Immutable types don't have that problem, so for them it doesn't matter whether two variables point to two identical objects or to the same one.
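To illustrate the point about mutable objects, a small sketch:
>>> m = [1, 2, 3]
>>> alias = m           # both names point to the same list object
>>> alias.append(4)     # mutating through one name...
>>> m                   # ...is visible through the other
[1, 2, 3, 4]
>>> t = (1, 2, 3)       # with an immutable tuple, no such surprise is possible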
1 Sometimes this is called an "unnamed expression".
I was playing a bit in my python shell while learning about mutability of objects.
I found something strange:
>>> x=5.0
>>> id(x)
48840312
>>> id(5.0)
48840296
>>> x=x+3.0
>>> id(x) # why did x (now 8.0) keep the same id as 5.0?
48840296
>>> id(5.0)
36582128
>>> id(5.0)
48840344
Why is the id of 5.0 reused after the statement x=x+3.0?
Fundamentally, the answer to your question is "calling id() on numbers will give you unpredictable results". The reason for this is because unlike languages like Java, where primitives literally are their value in memory, "primitives" in Python are still objects, and no guarantee is provided that exactly the same object will be used every time, merely that a functionally equivalent one will be.
CPython caches the integer values from -5 to 256 for efficiency (ensuring that calls to id() on them always give the same result), since these are commonly used and can be cached effectively; however, nothing about the language requires this to be the case, and other implementations may choose not to do so.
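For example, in an interactive CPython session (a script compiled as one unit may behave differently due to constant folding):
>>> a = 256
>>> b = 256
>>> a is b          # inside the small-integer cache
True
>>> a = 257
>>> b = 257
>>> a is b          # outside the cache: two separate objects
False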
Whenever you write a float literal in Python, you're asking the interpreter to convert the string into a valid numerical object. If it can, Python will reuse existing objects, but if it cannot easily determine whether an equal object already exists, it will simply create a new one.
This is not to say that numbers in Python are mutable -- they aren't. Any instance of a number, such as 5.0, cannot be changed by the user after being created. However, there's nothing wrong, as far as the interpreter is concerned, with constructing more than one instance of the same number.
Your specific example of the object representing x = 5.0 being reused for the result of x = x + 3.0 is an implementation detail. Under the covers, CPython may, if it sees fit, reuse numerical objects, both integers and floats, to avoid the costly activity of constructing a whole new object. I stress, however, that this is an implementation detail: it's entirely possible that certain cases will not display this behavior, and CPython could at any time change its number-handling logic to no longer behave this way. You should avoid writing any code that relies on this quirk.
The alternative, as eryksun points out, is simply that you stumbled on an object being garbage collected and replaced in the same location. From the user's perspective, there's no difference between the two cases, and this serves to stress that id() should not be used on "primitives".
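A hedged sketch of that scenario; the last result is unpredictable and may differ from run to run:
>>> x = 5.0
>>> saved = id(x)
>>> del x               # the 5.0 object may be freed here
>>> y = 6.0             # a new float may land at the same address
>>> id(y) == saved      # may print True or False; never rely on it
True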
The Devil is in the details
PyObject* PyInt_FromLong(long ival)
Return value: New reference.
Create a new integer object with a value of ival.
The current implementation keeps an array of integer objects for all integers between -5 and 256; when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Note: this is true only for CPython and may not apply to other Python implementations.
I tried the following code, and it gave me different outputs:
>>> foo1 = 4
>>> foo2 = 2+2
>>> id(foo1)
37740064L
>>> id(foo2)
37740064L
>>> foo1 = 4.3
>>> foo2 = 1.3+3.0
>>> id(foo1)
37801304L
>>> id(foo2)
37801232L
>>>
I am using Python 2.7.2. Why does the id function return different values for floats but the same value for integers?
That is because the result of id() for numeric constants is implementation-defined.
In your case (Python 2.7.2), the issue is that the compiler builds a few useful integer constants as singletons (from -5 to 256). The rationale is that these numbers are used so frequently that it makes no sense to allocate them dynamically each time they are needed; they are simply reused.
But that constant-singleton optimization is not useful for float values; other than maybe 0.0, there are simply too many of them! So each time a new float value is needed, it is allocated, and it gets a different id.
For deeper insight, read the source! The linked file is from Python 3, but the idea is the same: look for the small_ints array.
id is never really predictable, not even for integers. With the very low integers 2 and 4, you just happen to hit the small-integer cache. Try this:
>>> a = 12345
>>> b = 12345
>>> id(a)
33525888
>>> id(b)
33525852
>>>
For small integers and strings that are expected to be used frequently, Python applies an internal memory optimization. Since every variable in Python is a reference to a memory object, Python puts such small values into memory only once. Then, whenever the same value is assigned to another variable, that variable is made to point to the object already in memory. This works for strings and integers because they are immutable: if the variable's value changes, it is effectively the reference held by the variable that changes; the object in memory holding the original value is not itself affected.
That's why variables foo1 and foo2 in first case hold reference to the same integer object in memory with value 4 and therefore the ids are the same.
First of all, floating-point numbers are not 'small', and second, there are far too many distinct float values for such pooling to pay off. A value that displays as 4.3 is actually stored with full binary precision (something closer to 4.2999999999999998), and two separately computed values that both display as 4.3 are still two different objects. So reusing the same floating-point object in memory is problematic and simply not worth it.
That's why, in the second case, foo1 and foo2 reference different floating-point objects in memory and therefore have different ids.
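For example, in an interactive session (within a single compilation unit, constant folding may merge the two literals):
>>> a = 4.3
>>> b = 4.3
>>> a == b          # same value
True
>>> a is b          # typically two distinct objects in the REPL
False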
See more details on how floating point numbers are kept in memory:
http://floating-point-gui.de/
http://docs.python.org/2/tutorial/floatingpoint.html
Also, there is a long article on floating-point numbers in the Oracle docs.
Besides the mentioned (and very true) reasons, did you verify that foo1 == foo2 in the first place? As you are dealing with floating-point values, there can very easily be differences...
For floating-point numbers, 1.0 + 3.3 == 4.3 is not always True (in Python or in other languages), and objects with the same value can have different ids, too.
In that case, the variables may appear to contain the same data, but the reason behind the different ids is mathematical: only the first couple of digits after the decimal point are displayed, while the objects underneath keep full precision, so two variables that print the same can still be two distinct objects with two different addresses. To compare the values themselves rather than the ids, we can use the round() method:
y = 3.1254
x = 3.1254
print(round(x, 2))
print(round(y, 2))
print(id(x) == id(y))
Here round(x, 2) and round(y, 2) print the same result, because only two digits after the point are kept; whether id(x) == id(y) prints True still depends on the implementation, since equal floats are not guaranteed to be the same object.
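If the goal is to compare values rather than identities, a safer sketch uses rounding or math.isclose (available from Python 3.5):
import math

x = 3.1254
y = 3.1254
print(round(x, 2) == round(y, 2))  # True: compares values rounded to 2 digits
print(math.isclose(x, y))          # True: compares within a tolerance
print(x is y)                      # unreliable: identity is not value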
I read somewhere (an SO post, I think, and probably somewhere else, too) that Python automatically interns single-character strings, so that not only does 'a' == 'a', but also 'a' is 'a'.
However, I can't remember reading if this is guaranteed behavior in Python, or is it just implementation specific?
Bonus points for official sources.
It's implementation-specific. It's difficult to tell, because (as the reference says):
... for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.
The interpreter's pretty good about ensuring they're identical, but it doesn't always work:
x = u'a'
y = u'abc'[:1]
print x == y, x is y
Run on CPython 2.6, this gives True False.
It is all implementation-defined.
The documentation for intern says: "Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys."
That means that anything that could be a name and which is known at compile time is likely (but not guaranteed) to be the same as any other occurrences of the same name.
Other strings aren't stated to be interned. Constant strings appearing in the same compilation unit are folded together (but that is also just an implementation detail) so you get:
>>> a = '!'
>>> a is '!'
False
>>> a = 'a'
>>> a is 'a'
True
>>>
A string that matches an identifier is interned, so even across different compilations you get the same string object. A string that is not an identifier is only shared within the same compilation unit:
>>> '!' is '!'
True
I've read several python tutorials (Dive Into Python, for one), and the language reference on Python.org - I don't see why the language needs tuples.
Tuples have no methods compared to a list or set, and if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
Immutability?
Why does anyone care whether a variable lives at a different place in memory than where it was originally allocated? This whole business of immutability in Python seems to be overemphasized.
In C/C++ if I allocate a pointer and point to some valid memory, I don't care where the address is located as long as it's not null before I use it.
Whenever I reference that variable, I don't need to know if the pointer is still pointing to the original address or not. I just check for null and use it (or not).
In Python, when I allocate a string (or tuple), assign it to x, and then modify the string, why do I care whether it's the original object? As long as the variable points to my data, that's all that matters.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
x still references the data I want, why does anyone need to care if its id is the same or different?
Immutable objects can allow substantial optimization; this is presumably why strings are also immutable in Java, which was developed quite separately from, but at about the same time as, Python, and why just about everything is immutable in truly-functional languages.
In Python in particular, only immutables can be hashable (and, therefore, members of sets or keys in dictionaries). Again, this affords optimization, but far more than just "substantial": designing decent hash tables that store completely mutable objects is a nightmare -- either you copy everything as soon as you hash it, or the nightmare of checking whether the object's hash has changed since you last took a reference to it rears its ugly head.
Example of optimization issue:
$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop
None of the answers above point out the real issue of tuples vs. lists, which many people new to Python seem not to fully understand.
Tuples and lists serve different purposes. Lists store homogeneous data. You can and should have a list like this:
["Bob", "Joe", "John", "Sam"]
That is a correct use of a list, because those items are all homogeneous data: specifically, people's names. But take a list like this:
["Billy", "Bob", "Joe", 42]
That list is one person's full name and their age. That isn't one type of data. The correct way to store that information is either in a tuple or in an object. Let's say we have a few:
[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]
Immutability is not the main difference between tuples and lists. A list is a list of the same kind of items: files, names, objects. Tuples are a grouping of different types of objects. They have different uses, and many Python coders abuse lists for what tuples are meant for.
Please don't.
Edit:
I think this blog post explains why I think this better than I did:
Understanding tuples vs. lists in Python - E-Scribe
if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
In this particular case, there probably isn't a point. This is a non-issue, because this isn't one of the cases where you'd consider using a tuple.
As you point out, tuples are immutable. The reasons for having immutable types apply to tuples:
copy efficiency: rather than copying an immutable object, you can alias it (bind a variable to a reference)
comparison efficiency: when you're using copy-by-reference, you can compare two variables by comparing location, rather than content
interning: you need to store at most one copy of any immutable value
there's no need to synchronize access to immutable objects in concurrent code
const correctness: some values shouldn't be allowed to change. This (to me) is the main reason for immutable types.
Note that a particular Python implementation may not make use of all of the above features.
Dictionary keys must be immutable, otherwise changing the properties of a key-object can invalidate invariants of the underlying data structure. Tuples can thus potentially be used as keys. This is a consequence of const correctness.
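A quick sketch of the consequence:
>>> d = {}
>>> d[(1, 2)] = 'tuples are hashable, so this works'
>>> d[[1, 2]] = 'lists are not'
Traceback (most recent call last):
  ...
TypeError: unhashable type: 'list'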
See also "Introducing tuples", from Dive Into Python.
Sometimes we like to use objects as dictionary keys
For what it's worth, tuples recently (2.6+) grew index() and count() methods
I've always found having two completely separate types for the same basic data structure (arrays) to be an awkward design, but not a real problem in practice. (Every language has its warts, Python included, but this isn't an important one.)
Why does anyone care whether a variable lives at a different place in memory than where it was originally allocated? This whole business of immutability in Python seems to be overemphasized.
These are different things. Mutability isn't about where an object lives in memory; it's about whether the object's contents can change.
Python objects can't change location after they're created, mutable or not. (More accurately, the value of id() can't change--same thing, in practice.) The internal storage of mutable objects can change, but that's a hidden implementation detail.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
This isn't modifying ("mutating") the string; it's binding the name to a new object and discarding the old one. Compare with a mutating operation:
>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L
As others have pointed out, this allows using tuples as keys in dictionaries and in other data structures that need immutability.
Note that keys for dictionaries do not have to be completely immutable. Only the part used as the key needs to be immutable; for some uses, this is an important distinction. For example, you could have a class representing a user, whose equality and hash are based on the unique username. You could then hang other mutable data on the class -- "user is logged in", etc. Since this doesn't affect equality or the hash, it's possible and perfectly valid to use such an object as a key in a dictionary. This isn't too commonly needed in Python; I just point it out because several people have claimed that keys need to be "immutable", which is only partially correct. I've used this many times with C++ maps and sets, though.
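A minimal sketch of that pattern (User, username, and logged_in are illustrative names, not a real API):
>>> class User(object):
...     def __init__(self, username):
...         self.username = username     # the immutable "key" part
...         self.logged_in = False       # mutable state, not part of the hash
...     def __eq__(self, other):
...         return isinstance(other, User) and self.username == other.username
...     def __hash__(self):
...         return hash(self.username)
...
>>> sessions = { User('alice'): 'session-data' }
>>> u = User('alice')
>>> u.logged_in = True        # mutating unrelated state is safe
>>> sessions[u]               # lookup still works: equality and hash unchanged
'session-data'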
As gnibbler offered in a comment, Guido had an opinion that is not fully accepted/appreciated: "lists are for homogeneous data, tuples are for heterogeneous data". Of course, many of those who disagreed interpreted this as meaning that all elements of a list should be of the same type.
I like to see it differently, not unlike others also have in the past:
blue= 0, 0, 255
alist= ["red", "green", blue]
Note that I consider alist to be homogeneous, even if type(alist[1]) != type(alist[2]).
If I can change the order of the elements and I won't have issues in my code (apart from assumptions, e.g. “it should be sorted”), then a list should be used. If not (like in the tuple blue above), then I should use a tuple.
They are important because they guarantee the caller that the object they pass won't be mutated.
If you do this:
a = [1,1,1]
doWork(a)
The caller has no guarantee of the value of a after the call.
However,
a = (1,1,1)
doWork(a)
Now you, as the caller or as a reader of this code, know that a is unchanged.
For this scenario you could always make a copy of the list and pass that, but then you are wasting cycles instead of using a language construct that makes more semantic sense.
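For the copy alternative, a hedged sketch (doWork is the hypothetical function from above):
def doWork(seq):
    # hypothetical worker that only reads its argument
    return sum(seq)

a = [1, 1, 1]
doWork(tuple(a))   # pass an immutable snapshot: the callee cannot mutate a
doWork(list(a))    # or pass a copy of the list, at the cost of extra cycles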
You can see here for some discussion on this.
Your question (and follow-up comments) focuses on whether the id() changes during an assignment. Focusing on this downstream effect of the difference between replacing an immutable object and modifying a mutable one, rather than on the difference itself, is perhaps not the best approach.
Before we continue, make sure that the behavior demonstrated below is what you expect from Python.
>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2
In this case, the contents of a2 were changed, even though only a1 had a new value assigned. Contrast with the following:
>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1
In this latter case, we rebound a1 to an entirely new tuple rather than updating the contents of the existing object. With immutable types such as tuples, this is the only behavior allowed.
Why does this matter? Let's say you have a dict:
>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1,2): 'three'}
>>> t1[0] = 0 ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1 ## as seen here, the dict is still intact
{(1,2): 'three'}
Using a tuple, the dictionary is safe from having its keys changed "out from under it" to items which hash to a different value. This is critical to allow efficient implementation.