We have a dict like:
a = dict()
Then we insert items, like:
for i in range(10**10):
    a[i] = 0
Why does id(a) remain unchanged even when the dict grows so large that its memory has to be reallocated?
Let's start by quoting what the Python docs state about the built-in id function:
id(object)
Return the “identity” of an object. This is an integer
which is guaranteed to be unique and constant for this object during
its lifetime. Two objects with non-overlapping lifetimes may have the
same id() value.
CPython implementation detail: This is the address of the object in
memory.
To understand your snippet:
a = dict()
for i in range(10**10):
    a[i] = 0
You first need to understand what happens at the line a = dict(). It creates a new Python dictionary object and assigns it to the variable a. From that point on, the part of the docs saying that the id stays unique and constant for the whole lifetime of the object explains everything. For instance, let's say we've got this:
a = dict()
print(id(a))
a = dict() # New object
print(id(a))
The two printed ids differ, because the second assignment binds a to a new, different object. Another example:
a = dict() # id1
b = dict() # id2
a = b # id2
Same thing: you've created two dict objects, and after the third assignment, a = b, id(a) will be id2, which remains unique during the lifetime of the object referenced by both a and b.
Now, the interesting part of your question is why the id of a does not change even though you keep inserting new items into the dictionary. When __setitem__ is called (underlying cpython implementation here), the dict's internal storage is eventually resized at a particular growth rate and may move in memory, but the dict object itself stays where it is. So the statement in the id docs still holds: the id remains constant and unique through the object's lifetime.
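A minimal sketch of this, assuming CPython (where sys.getsizeof reflects the dict's current storage) and a much smaller loop than in the question so that it finishes quickly. The names initial_id and sizes are illustrative only:

```python
import sys

a = dict()
initial_id = id(a)            # identity of the dict object itself
sizes = {sys.getsizeof(a)}    # distinct storage sizes observed so far

for i in range(10_000):
    a[i] = 0
    sizes.add(sys.getsizeof(a))
    # The identity never changes, however often the storage is resized.
    assert id(a) == initial_id

print("id unchanged:", id(a) == initial_id)
print("distinct storage sizes observed:", len(sizes))
```

The reported size grows in steps as the internal table is resized, while id(a) stays the same throughout.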
Related
I read the Python 2 docs and noticed the id() function:
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
So, I experimented by using id() with a list:
>>> list = [1,2,3]
>>> id(list[0])
31186196
>>> id(list[1])
31907092 # increased by 720896
>>> id(list[2])
31907080 # decreased by 12
What is the integer returned from the function? Is it synonymous to memory addresses in C? If so, why doesn't the integer correspond to the size of the data type?
When is id() used in practice?
Your post asks several questions:
What is the number returned from the function?
It is "an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime." (Python Standard Library - Built-in Functions) A unique number. Nothing more, and nothing less. Think of it as a social-security number or employee id number for Python objects.
Is it the same with memory addresses in C?
Conceptually, yes, in that they are both guaranteed to be unique in their universe during their lifetime. And in one particular implementation of Python, it actually is the memory address of the corresponding C object.
If yes, why doesn't the number increase instantly by the size of the data type (I assume that it would be int)?
Because a list is not an array, and a list element is a reference, not an object.
When do we really use the id() function?
Hardly ever. You can test whether two references are the same by comparing their ids, but the is operator has always been the recommended way of doing that. id() is only really useful in debugging situations.
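As a small illustration of that last point (the variable names are arbitrary), comparing ids and using is give the same answer for live objects, while == compares values instead:

```python
a = [1, 2, 3]
b = a          # second reference to the same list
c = list(a)    # new list with equal contents

print(a is b, id(a) == id(b))   # True True
print(a is c, a == c)           # False True
```

In everyday code the is form is the one to reach for; the id comparison is shown only to make the equivalence visible.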
That's the identity of the location of the object in memory...
This example might help you understand the concept a little more.
foo = 1
bar = foo
baz = bar
fii = 1
print(id(foo))
print(id(bar))
print(id(baz))
print(id(fii))
> 1532352
> 1532352
> 1532352
> 1532352
These all point to the same location in memory, which is why their ids are the same. In this example (in CPython, which caches small integers), 1 is only stored once, and any other name bound to 1 references that same memory location.
Rob's answer (most voted above) is correct. I would like to add that in some situations using IDs is useful as it allows for comparison of objects and finding which objects refer to your objects.
The latter usually helps you, for example, to debug strange bugs where mutable objects are passed as parameters to, say, classes and are assigned to local vars in a class. Mutating those objects will mutate the vars in the class. This manifests itself in strange behavior where multiple things change at the same time.
Recently I had this problem with a Python/Tkinter app where editing text in one text entry field changed the text in another as I typed :)
Here is an example of how you might use the id() function to trace where those references are. This is by no means a solution covering all possible cases, but you get the idea. Again, IDs are used in the background and the user does not see them:
class democlass:
    classvar = 24

    def __init__(self, var):
        self.instancevar1 = var
        self.instancevar2 = 42

    def whoreferencesmylocalvars(self, fromwhere):
        return {__l__: {__g__
                        for __g__ in fromwhere
                        if not callable(__g__) and id(eval(__g__)) == id(getattr(self, __l__))
                        }
                for __l__ in dir(self)
                if not callable(getattr(self, __l__)) and __l__[-1] != '_'
                }

    def whoreferencesthisclassinstance(self, fromwhere):
        return {__g__
                for __g__ in fromwhere
                if not callable(__g__) and id(eval(__g__)) == id(self)
                }

a = [1,2,3,4]
b = a
c = b
democlassinstance = democlass(a)
d = democlassinstance
e = d
f = democlassinstance.classvar
g = democlassinstance.instancevar2
print( 'My class instance is of', type(democlassinstance), 'type.')
print( 'My instance vars are referenced by:', democlassinstance.whoreferencesmylocalvars(globals()) )
print( 'My class instance is referenced by:', democlassinstance.whoreferencesthisclassinstance(globals()) )
OUTPUT:
My class instance is of <class '__main__.democlass'> type.
My instance vars are referenced by: {'instancevar2': {'g'}, 'classvar': {'f'}, 'instancevar1': {'a', 'c', 'b'}}
My class instance is referenced by: {'e', 'd', 'democlassinstance'}
Underscores in variable names are used to prevent name collisions. The functions take a "fromwhere" argument so that you can tell them where to start searching for references. This argument is filled by a function that lists all names in a given namespace; globals() is one such function.
id() does return the address of the object being referenced (in CPython), but your confusion comes from the fact that python lists are very different from C arrays. In a python list, every element is a reference. So what you are doing is much more similar to this C code:
int *arr[3];
arr[0] = malloc(sizeof(int));
*arr[0] = 1;
arr[1] = malloc(sizeof(int));
*arr[1] = 2;
arr[2] = malloc(sizeof(int));
*arr[2] = 3;
printf("%p %p %p", arr[0], arr[1], arr[2]);
In other words, you are printing the address from the reference and not an address relative to where your list is stored.
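The same point can be sketched in Python itself. In this illustrative snippet (all names are made up), one object is stored in two different lists at different positions, and its id is the same wherever it is reached from, because the lists only hold references to it:

```python
big = 10 ** 20            # a single int object
first = [big, "x"]
second = ["y", big]

# Both list slots refer to the very same object.
print(first[0] is second[1])        # True
print(id(first[0]) == id(big))      # True
# The lists themselves are two separate objects.
print(first is second)              # False
```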
In my case, I have found the id() function handy for creating opaque handles to return to C code when calling python from C. Doing that, you can easily use a dictionary to look up the object from its handle and it's guaranteed to be unique.
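A rough sketch of that handle idea on the Python side only. The names registry, make_handle, resolve and release are hypothetical and not part of any real C API; the dictionary also keeps each object alive, which matters because an id may be reused once its object has been destroyed:

```python
registry = {}  # handle (an int) -> live object

def make_handle(obj):
    """Return an opaque integer handle and remember the object."""
    handle = id(obj)
    registry[handle] = obj   # holding a reference keeps the id from being reused
    return handle

def resolve(handle):
    """Look the object up again from its handle."""
    return registry[handle]

def release(handle):
    """Forget the object once the caller is done with the handle."""
    registry.pop(handle, None)

payload = {"name": "example"}
h = make_handle(payload)
print(resolve(h) is payload)   # True
release(h)
print(h in registry)           # False
```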
I am starting out with python and I use id when I use the interactive shell to see whether my variables are assigned to the same thing or if they just look the same.
Every value has an id, which is a unique number related to where it is stored in the memory of the computer.
If you're using python 3.4.1 then you get a different answer to your question.
list = [1,2,3]
id(list[0])
id(list[1])
id(list[2])
returns:
1705950792
1705950808 # increased by 16
1705950824 # increased by 16
The integers from -5 to 256 each have a constant id: however many times you encounter one, its id does not change, unlike numbers outside that range, which can have a different id each time they are created.
In this run, the ids of the numbers from -5 to 256 are in increasing order and differ by 16.
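A small sketch of that caching behaviour, assuming CPython (it is an implementation detail, not a language guarantee). The integers are built with int() from strings so that the compiler cannot merge equal literals, and the variable names are illustrative:

```python
small_a = int("256")
small_b = int("256")
large_a = int("257")
large_b = int("257")

# On CPython, 256 is inside the cached range, so both names share one object.
print(small_a is small_b)
# 257 is outside the cached range, so CPython normally creates two objects.
print(large_a is large_b)
# Equality is unaffected either way.
print(large_a == large_b)   # True
```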
The number returned by the id() function is a unique id given to each object stored in memory, and it is analogous to a memory location in C.
The is operator uses it to check whether two objects are identical (as opposed to equal). The actual value that is returned from id() is pretty much never used for anything because it doesn't really have a meaning, and it's platform-dependent.
The answer is pretty much never. IDs are mainly used internally to Python.
The average Python programmer will probably never need to use id() in their code.
It is the address of the object in memory, exactly as the doc says. However, it has metadata attached to it, properties of the object and location in the memory is needed to store the metadata. So, when you create your variable called list, you also create metadata for the list and its elements.
So, unless you are an absolute guru in the language, you can't determine the id of the next element of your list based on the previous element, because you don't know what the language allocates along with the elements.
I have an idea to use value of id() in logging.
It's cheap to get and it's quite short.
In my case I use Tornado, and I would like to use id() as an anchor to group log messages, which are scattered and interleaved in the file, by web socket.
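One way this could look, as a sketch only: FakeSocket is a hypothetical stand-in for a real web socket handler, the logger name is arbitrary, and output goes to an in-memory stream rather than a file. Keep in mind that an id can be reused after its object is destroyed, so the tag is only unique among connections that are alive at the same time:

```python
import io
import logging

class FakeSocket:
    """Hypothetical stand-in for a web socket connection object."""

stream = io.StringIO()
logger = logging.getLogger("ws_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))

s1, s2 = FakeSocket(), FakeSocket()

# Prefix every message with the id of the connection it belongs to.
logger.info("[conn %x] %s", id(s1), "opened")
logger.info("[conn %x] %s", id(s2), "opened")
logger.info("[conn %x] %s", id(s1), "message received")

# Later, the interleaved lines can be grouped by that prefix.
lines = stream.getvalue().splitlines()
tag = "[conn %x]" % id(s1)
s1_lines = [line for line in lines if line.startswith(tag)]
print(len(lines), len(s1_lines))   # 3 2
```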
I'm a little bit late, and I will talk about Python 3. To understand what id() is and how it (and Python) works, consider the following example:
>>> x=1000
>>> y=1000
>>> id(x)==id(y)
False
>>> id(x)
4312240944
>>> id(y)
4312240912
>>> id(1000)
4312241104
>>> x=1000
>>> id(x)
4312241104
>>> y=1000
>>> id(y)
4312241200
Think of everything on the right-hand side as objects. Here, each assignment of 1000 creates a new object, and that means a new id. In the middle you can see a "wild" object that is created only for the function call id(1000), so its lifetime is only that line of code. On the next line, when we create the new variable x, it has the same id as that wild object, presumably because the freed memory was reused. Pretty much, it works like a memory address.
In Python 3, an id belongs to a value (an object), not to a variable. This means that if you create two functions as below, all three ids are the same.
>>> def xyz():
...     q=123
...     print(id(q))
...
>>> def iop():
...     w=123
...     print(id(w))
...
>>> xyz()
1650376736
>>> iop()
1650376736
>>> id(123)
1650376736
Be careful (concerning the answer just below)... That's only true because 123 is between -5 and 256...
In [111]: q = 257
In [112]: id(q)
Out[112]: 140020248465168
In [113]: w = 257
In [114]: id(w)
Out[114]: 140020274622544
In [115]: id(257)
Out[115]: 140020274622768
I have been reading the Python Data Model. The following text is taken from here:
Types affect almost all aspects of object behavior. Even the
importance of object identity is affected in some sense: for immutable
types, operations that compute new values may actually return a
reference to any existing object with the same type and value, while
for mutable objects this is not allowed. E.g., after a = 1; b = 1, a
and b may or may not refer to the same object with the value one,
depending on the implementation, but after c = []; d = [], c and d are
guaranteed to refer to two different, unique, newly created empty
lists. (Note that c = d = [] assigns the same object to both c and d.)
So, it mentions that, for immutable types, operations that compute new values may actually return a reference to an existing object with same type and value. So, I wanted to test this. Following is my code:
a = (1,2,3)
b = (1,2)
c = (3,)
k = b + c
print(id(a))
>>> 2169349869720
print(id(k))
>>> 2169342802424
Here, I performed an operation that computes a new tuple with the same value and type as a, but I got an object with a different id. This means the result refers to different memory than a. Why is this?
Answering the question based on comments from #jonrsharpe
Note "may actually return" - it's not guaranteed, it would likely be
less efficient for Python to look through the existing tuples to find
out if one that's the same as the one your operation creates already
exists and reuse it than it would to just create a new one.
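To make the "may" concrete, here is a short sketch based on the code from the question. Only the equality check has a guaranteed result; whether the two names end up sharing one object is left to the implementation, so no particular output is claimed for the identity check:

```python
a = (1, 2, 3)
b = (1, 2)
c = (3,)
k = b + c

print(k == a)   # True: same type and value
# Identity is implementation-dependent: the interpreter is allowed,
# but not required, to hand back an existing equal tuple.
print(k is a)
```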
If two variables have identical values, they are said to share the same memory...
So does Python follow a shared-memory concept? And if I change one value, will it change the other?
See Python data model described here
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
Why does multiple assignment make distinct references for ints, but not lists or other objects?
>>> a = b = 1
>>> a += 1
>>> a is b
False
>>> a = b = [1]
>>> a.append(1)
>>> a is b
True
In the int example, you first assign the same object to both a and b, but then reassign a with another object (the result of a+1). a now refers to a different object.
In the list example, you assign the same object to both a and b, but then you don't do anything to change that. append only changes the internal state of the list object, not its identity. Thus they remain the same.
If you replace a.append(1) with a = a + [1], you end up with different object, because, again, you assign a new object (the result of a+[1]) to a.
Note that a+=[1] will behave differently, but that's a whole other question.
Types such as int are immutable. When a += 1 runs, a no longer refers to the same memory location as b:
https://docs.python.org/2/library/functions.html#id
CPython implementation detail: This is the address of the object in memory.
In [1]: a = b = 100000000000000000000000000000
print id(a), id(b)
print a is b
Out [1]: 4400387016 4400387016
True
In [2]: a += 1
print id(a), id(b)
print a is b
Out [2]: 4395695296 4400387016
False
Python works differently when changing the values of mutable and immutable objects.
Immutable objects:
These are objects whose values cannot change after initialization,
e.g. int, str, tuple
Mutable objects:
These are objects whose values can change after initialization,
e.g. dict, list and user-defined objects
Changing the value of a mutable object does not create a new memory space and move the object there; it just changes the memory space where the object was created.
For immutable objects it is exactly the opposite: a new object is created in a new memory space, and the name is made to refer to it.
For example:
s="awe"
s[0]="e"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-19-9f16ce5bbc72> in <module>()
----> 1 s[0]="e"
TypeError: 'str' object does not support item assignment
This is telling you that you cannot change the value of the string in memory.
You could do this instead:
"e"+s[1:]
Out[20]: 'ewe'
This creates a new memory space and allocates the new string there.
Likewise, after A = B = 1, changing A with A = 2 creates a new memory space and makes the variable A refer to that location, which is why B's value does not change when A's value is changed.
But this is not the case for a list: since it is a mutable object, changing its value does not move it to a new memory location; it just expands the memory it uses.
For example:
a=b=[]
a.append(1)
print(a)
[1]
print(b)
[1]
Both give the same value, since they reference the same memory space.
The difference is not in the multiple assignment, but in what you subsequently do with the objects. With the int, you do +=, and with the list you do .append.
However, even if you do += for both, you won't necessarily see the same result, because what += does depends on what type you use it on.
So that is the basic answer: operations like += may work differently on different types. Whether += returns a new object or modifies the existing object is behavior that is defined by that object. To know what the behavior is, you need to know what kind of object it is and what behavior it defines (i.e., the documentation). Likewise, you cannot assume that using an operation like += will have the same result as using a method like .append. What a method like .append does is defined by the object you call it on.
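A brief sketch of that difference using only built-in types (the variable names are arbitrary). For a list, += modifies the object in place, whereas + followed by assignment builds a new list; for an int, += can only rebind the name:

```python
a = b = [1]
a += [1]               # in-place: the shared list is extended
print(a is b, b)       # True [1, 1]

a = a + [1]            # builds a new list and rebinds a
print(a is b, a, b)    # False [1, 1, 1] [1, 1]

x = y = 1
x += 1                 # ints are immutable, so x is rebound to a new value
print(x, y)            # 2 1
```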
On p.35 of "Python Essential Reference" by David Beazley, he first states:
For immutable data such as strings, the interpreter aggressively
shares objects between different parts of the program.
However, later on the same page, he states
For immutable objects such as numbers and strings, this assignment
effectively creates a copy.
But isn't this a contradiction? On one hand he is saying that they are shared, but then he says they are copied.
An assignment in python never ever creates a copy (it is technically possible only if the assignment for a class member is redefined for example by using __setattr__, properties or descriptors).
So after
a = foo()
b = a
whatever was returned from foo has not been copied, and instead you have two variables a and b pointing to the same object. No matter if the object is immutable or not.
With immutable objects however it's hard to tell if this is the case (because you cannot mutate the object using one variable and check if the change is visible using the other) so you are free to think that indeed a and b cannot influence each other.
For some immutable objects, Python is also free to reuse old objects instead of creating new ones, so after
a = x + y
b = x + y
where both x and y are numbers (so the sum is a number and is immutable), it may be that both a and b point to the same object. Note that there is no such guarantee: they may instead point to different objects with the same value.
The important thing to remember is that Python never ever makes a copy unless specifically instructed to using e.g. copy or deepcopy. This is very important with mutable objects to avoid surprises.
One common idiom you can see is for example:
class Polygon:
    def __init__(self, pts):
        self.pts = pts[:]
        ...
In this case self.pts = pts[:] is used instead of self.pts = pts to make a copy of the whole list of points. This ensures that the polygon's point list will not change unexpectedly if, after the object is created, changes are applied to the list that was passed to the constructor.
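The effect of that idiom can be checked with a short sketch. The sample points are invented for illustration, and note that pts[:] is a shallow copy: the list is new, but the point objects inside it are still shared:

```python
class Polygon:
    def __init__(self, pts):
        self.pts = pts[:]      # shallow copy of the caller's list

points = [(0, 0), (1, 0), (1, 1)]
poly = Polygon(points)

points.append((0, 1))          # the caller keeps modifying its own list

print(poly.pts is points)      # False: the polygon holds its own list
print(len(poly.pts))           # 3: the later append did not leak in
print(poly.pts[0] is points[0])  # True: the elements themselves are shared
```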
It effectively creates a copy. It doesn't actually create a copy. The main difference between having two copies and having two names share the same value is that, in the latter case, modifications via one name affect the value of the other name. If the value can't be mutated, this difference disappears, so for immutable objects there is little practical consequence to whether the value is copied or not.
There are some corner cases where you can tell the difference between copies and different objects even for immutable types (e.g., by using the id function or the is operator), but these are not useful for Python builtin immutable types (like strings and numbers).
No, assigning a pre-existing str variable to a new variable name does not create an independent copy of the value in memory.
The existence of unique objects in memory can be checked using the id() function. For example, using the interactive Python prompt, try:
>>> str1 = 'ATG'
>>> str2 = str1
Both str1 and str2 have the same value:
>>> str1
'ATG'
>>> str2
'ATG'
This is because str1 and str2 both point to the same object, evidenced by the fact that they share the same unique object ID:
>>> id(str1)
140439014052080
>>> id(str2)
140439014052080
>>> id(str1) == id(str2)
True
Now suppose you modify str1:
>>> str1 += 'TAG' # same as str1 = str1 + 'TAG'
>>> str1
'ATGTAG'
Because str objects are immutable, the above assignment created a new unique object with its own ID:
>>> id(str1)
140439016777456
>>> id(str1) == id(str2)
False
However, str2 maintains the same ID it had earlier:
>>> id(str2)
140439014052080
Thus, execution of str1 += 'TAG' assigned a brand new str object with its own unique ID to the variable str1, while str2 continues to point to the original str object.
This implies that assigning an existing str variable to another variable name does not create a copy of its value in memory.