Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Related
I'm looking for some clarification regarding mutability and class objects. From what I understand, variables in Python are about assigning a variable name to an object.
If that object is immutable then when we set two variables to the same object, it'll be two separate copies (e.g. a = b = 3 so a changing to 4 will not affect b because 3 is a number, an example of an immutable object).
However, if an object is mutable, then changing the value in one variable assignment will naturally change the value in the other (e.g. a = b = [] -> a.append(1) so now both a and b will refer to "[1]")
Working with classes, it seems even more fluid than I believed. I wrote a quick example below to show the differences. The first class is a typical Node class with a next pointer and a value. Setting two variables, "slow" and "fast", to the same instance of the Node object ("head"), and then changing the values of both "slow" and "fast" won't affect the other. That is, "slow", "fast", and "head" all refer to different objects (verified by checking their id() as well).
The second example class doesn't have a next pointer and only has a self.val attribute. This time changing one of the two variables, "p1" and "p2", both of which are set to the same instance, "start", will affect the other. This is despite that self.val in the "start" instance is an immutable number.
'''
The below will have two variable names (slow, fast) assigned to a head Node.
Changing one of them will NOT change the other reference as well.
'''
class Node:
def __init__(self, x, next=None):
self.x = x
self.next = next
def __str__(self):
return str(self.x)
n3 = Node(3)
n2 = Node(2, n3)
n1 = Node(1, n2)
head = n1
slow = fast = head
print(f"Printing before moving...{head}, {slow}, {fast}") # 1, 1, 1
while fast and fast.next:
fast = fast.next.next
slow = slow.next
print(f"Printing after moving...{head}, {slow}, {fast}") # 1, 2, 3
print(f"Checking the ids of each variable {id(head)}, {id(slow)}, {id(fast)}") # all different
'''
The below will have two variable names (p1, p2) assigned to a start Dummy.
Changing one of them will change the other reference as well.
'''
class Dummy:
def __init__(self, val):
self.val = val
def __str__(self):
return str(self.val)
start = Dummy(100)
p1 = p2 = start
print(f"Printing before changing {p1}, {p2}") # 100, 100
p1.val = 42
print(f"Printing after changing {p1}, {p2}") # 42, 42
This is a bit murky for me to understand what is actually going on under the hood and I'm seeking clarification so I can feel confident in setting multiple variable assignments to the same object expecting a true copy (without resorting to "import copy; copy.deepcopy(x);")
Thank you for your help
This isn't a matter of immutability vs mutability. This is a matter of mutating an object vs reassigning a reference.
If that object is immutable then when we set two variables to the same object, it'll be two separate copies
This isn't true. A copy won't be made. If you have:
a = 1
b = a
You have two references to the same object, not a copy of the object. This is fine though because integers are immutable. You can't mutate 1, so the fact that a and b are pointing to the same object won't hurt anything.
Python will never make implicit copies for you. If you want a copy, you need to copy it yourself explicitly (using copy.copy, or some other method like slicing on lists). If you write this:
a = b = some_obj
a and b will point to the same object, regardless of the type of some_obj and whether or not it's mutable.
So what's the difference between your examples?
In your first Node example, you never actually alter any Node objects. They may as well be immutable.
slow = fast = head
That initial assignment makes both slow an fast point to the same object: head. Right after that though, you do:
fast = fast.next.next
This reassigns the fast reference, but never actually mutates the object fast is looking at. All you've done is change what object the fast reference is looking at.
In your second example however, you directly mutate the object:
p1.val = 42
While this looks like reassignment, it isn't. This is actually:
p1.__setattr__("val", 42)
And __setattr__ alters the internal state of the object.
So, reassignment changes what object is being looked at. It will always take the form:
a = b # Maybe chained as well.
Contrast with these that look like reassignment, but are actually calls to mutating methods of the object:
l = [0]
l[0] = 5 # Actually l.__setitem__(0, 5)
d = Dummy()
d.val = 42 # Actually d.__setattr__("val", 42)
You overcomplicate things. The fundamental, simple rule is: each time you use = to assign an object to a variable, you make the variable name refer to that object, that's all. The object being mutable or not makes no difference.
With a = b = 3, you make the names a and b refer to the object 3. If you then make a = 4, you make the name a refer to the object 4, and the name b still refers to 3.
With a = b = [], you've created two names a and b that refer to the same list object. When doing a.append(1), you append 1 to this list. You haven't assigned anything to a or b in the process (you didn't write any a = ... or b = ...). So, whether you access the list through the name a or b, it's still the same list that you manipulate. It can just be called by two different names.
The same happens in your example with classes: when you write fast = fast.next.next, you make the name fast refer to a new object.
When you do p1.val = 42, you don't make p1 refer to a new different instance, but you change the val attribute of this instance. p1 and p2are still two names for this unique instance, so using either name lets you refer to the same instance.
Mutable and Immutable Objects
When a program is run, data objects in the program are stored in the computer’s
memory for processing. While some of these objects can be modified at that memory
location, other data objects can’t be modified once they are stored in the memory. The
property of whether or not data objects can be modified in the same memory location
where they are stored is called mutability. We can check the mutability of an object by checking its memory location before and
after it is modified. If the memory location remains the same when the data object is
modified, it means it is mutable. To check the memory location of where a data object is stored, we use the function, id(). Consider the following example
a=[5, 10, 15]
id(a)
#1906292064
a[1]=20
id(a)
#1906292064
#Assigning values to the list a. The ID of the memory location where a is stored.
#Replacing the second item in the list,10 with a new item, 20.
#print(a) Using the print() function to verify the new value of a.# Using the function #id() to get the memory location of a.
#The ID of the memory location where a is stored.
the memory location has not changed as the ID remains (1906292064)
remains the same before and after the variable is modified. This indicates that the list
is mutable, i.e., it can be modified at the same memory location where it is stored
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))