I'm looking for some clarification regarding mutability and class objects. From what I understand, variables in Python are about assigning a variable name to an object.
If that object is immutable then when we set two variables to the same object, it'll be two separate copies (e.g. a = b = 3 so a changing to 4 will not affect b because 3 is a number, an example of an immutable object).
However, if an object is mutable, then changing the value in one variable assignment will naturally change the value in the other (e.g. a = b = [] -> a.append(1) so now both a and b will refer to "[1]")
Working with classes, it seems even more fluid than I believed. I wrote a quick example below to show the differences. The first class is a typical Node class with a next pointer and a value. Setting two variables, "slow" and "fast", to the same instance of the Node object ("head"), and then changing the values of both "slow" and "fast" won't affect the other. That is, "slow", "fast", and "head" all refer to different objects (verified by checking their id() as well).
The second example class doesn't have a next pointer and only has a self.val attribute. This time, changing one of the two variables, "p1" and "p2", both of which are set to the same instance, "start", will affect the other. This is despite the fact that self.val in the "start" instance is an immutable number.
'''
The below will have two variable names (slow, fast) assigned to a head Node.
Changing one of them will NOT change the other reference as well.
'''
class Node:
    def __init__(self, x, next=None):
        self.x = x
        self.next = next
    def __str__(self):
        return str(self.x)
n3 = Node(3)
n2 = Node(2, n3)
n1 = Node(1, n2)
head = n1
slow = fast = head
print(f"Printing before moving...{head}, {slow}, {fast}") # 1, 1, 1
while fast and fast.next:
    fast = fast.next.next
    slow = slow.next
print(f"Printing after moving...{head}, {slow}, {fast}") # 1, 2, 3
print(f"Checking the ids of each variable {id(head)}, {id(slow)}, {id(fast)}") # all different
'''
The below will have two variable names (p1, p2) assigned to a start Dummy.
Changing one of them will change the other reference as well.
'''
class Dummy:
    def __init__(self, val):
        self.val = val
    def __str__(self):
        return str(self.val)
start = Dummy(100)
p1 = p2 = start
print(f"Printing before changing {p1}, {p2}") # 100, 100
p1.val = 42
print(f"Printing after changing {p1}, {p2}") # 42, 42
This is a bit murky for me to understand what is actually going on under the hood and I'm seeking clarification so I can feel confident in setting multiple variable assignments to the same object expecting a true copy (without resorting to "import copy; copy.deepcopy(x);")
Thank you for your help
This isn't a matter of immutability vs mutability. This is a matter of mutating an object vs reassigning a reference.
"If that object is immutable then when we set two variables to the same object, it'll be two separate copies"
This isn't true. A copy won't be made. If you have:
a = 1
b = a
You have two references to the same object, not a copy of the object. This is fine though because integers are immutable. You can't mutate 1, so the fact that a and b are pointing to the same object won't hurt anything.
Python will never make implicit copies for you. If you want a copy, you need to copy it yourself explicitly (using copy.copy, or some other method like slicing on lists). If you write this:
a = b = some_obj
a and b will point to the same object, regardless of the type of some_obj and whether or not it's mutable.
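A quick way to check this is the is operator, which compares object identity rather than value. The following is a minimal sketch; the names x, y, shallow and also_shallow are illustrative only:

```python
import copy

# Chained assignment binds both names to one object, whatever its type.
a = b = 1000
assert a is b            # same int object, no copy was made

x = y = [1, 2]
assert x is y            # same list object

# Copies only happen when you ask for them explicitly.
shallow = x[:]                # slicing builds a new list
also_shallow = copy.copy(x)   # so does copy.copy
assert shallow == x and shallow is not x
assert also_shallow == x and also_shallow is not x

x.append(3)              # mutate through one name...
print(y)                 # ...the alias sees it: [1, 2, 3]
print(shallow)           # ...the explicit copy does not: [1, 2]
```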
So what's the difference between your examples?
In your first Node example, you never actually alter any Node objects. They may as well be immutable.
slow = fast = head
That initial assignment makes both slow and fast point to the same object as head. Right after that, though, you do:
fast = fast.next.next
This reassigns the fast reference, but never actually mutates the object fast is looking at. All you've done is change what object the fast reference is looking at.
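One way to confirm that the traversal only moves names around is to check the list afterwards. This sketch rebuilds a three-node list with a trimmed-down copy of the question's Node class:

```python
class Node:
    def __init__(self, x, next=None):
        self.x = x
        self.next = next

head = Node(1, Node(2, Node(3)))
slow = fast = head

while fast and fast.next:
    fast = fast.next.next   # rebinding: fast now names another node
    slow = slow.next        # rebinding: slow now names another node

# No node was mutated; only the names moved along the list.
assert head.x == 1 and head.next.x == 2 and head.next.next.x == 3
assert slow is head.next and fast is head.next.next
```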
In your second example however, you directly mutate the object:
p1.val = 42
While this looks like reassignment, it isn't. This is actually:
p1.__setattr__("val", 42)
And __setattr__ alters the internal state of the object.
So, reassignment changes what object is being looked at. It will always take the form:
a = b # Maybe chained as well.
Contrast with these that look like reassignment, but are actually calls to mutating methods of the object:
l = [0]
l[0] = 5 # Actually l.__setitem__(0, 5)
d = Dummy(100) # Dummy.__init__ requires a val argument
d.val = 42 # Actually d.__setattr__("val", 42)
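To make the distinction concrete, here is a small sketch reusing the question's Dummy class. It calls setattr and __setitem__ explicitly only to mirror the equivalences listed above; normal code would just write p1.val = 42 and l[0] = 5:

```python
class Dummy:
    def __init__(self, val):
        self.val = val

start = Dummy(100)
p1 = p2 = start

# Attribute assignment mutates the shared instance.
setattr(p1, "val", 42)       # same effect as p1.val = 42
assert p2.val == 42

# Item assignment mutates the shared list.
l = m = [0]
l.__setitem__(0, 5)          # same effect as l[0] = 5
assert m == [5]

# Plain reassignment only rebinds the name; the old instance is untouched.
p1 = Dummy(7)
assert p2.val == 42 and p2 is start
```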
You overcomplicate things. The fundamental, simple rule is: each time you use = to assign an object to a variable, you make the variable name refer to that object, that's all. The object being mutable or not makes no difference.
With a = b = 3, you make the names a and b refer to the object 3. If you then make a = 4, you make the name a refer to the object 4, and the name b still refers to 3.
With a = b = [], you've created two names a and b that refer to the same list object. When doing a.append(1), you append 1 to this list. You haven't assigned anything to a or b in the process (you didn't write any a = ... or b = ...). So, whether you access the list through the name a or b, it's still the same list that you manipulate. It can just be called by two different names.
The same happens in your example with classes: when you write fast = fast.next.next, you make the name fast refer to a new object.
When you do p1.val = 42, you don't make p1 refer to a new, different instance; you change the val attribute of this instance. p1 and p2 are still two names for this single instance, so using either name lets you refer to the same instance.
Mutable and Immutable Objects
When a program is run, data objects in the program are stored in the computer’s memory for processing. While some of these objects can be modified at that memory location, other data objects can’t be modified once they are stored in memory. The property of whether or not data objects can be modified in the same memory location where they are stored is called mutability.
We can check the mutability of an object by checking its identity before and after it is modified. If the identity remains the same when the data object is modified, the object is mutable. To get the identity of an object, we use the function id(), which in CPython returns the memory address where the object is stored. Consider the following example:
a = [5, 10, 15]  # assigning values to the list a
id(a)            # the ID of the memory location where a is stored
# 1906292064
a[1] = 20        # replacing the second item in the list, 10, with a new item, 20
print(a)         # verify the new value of a: [5, 20, 15]
id(a)            # the ID of the memory location where a is stored
# 1906292064
The memory location has not changed: the ID (1906292064) is the same before and after the list is modified. This indicates that the list is mutable, i.e., it can be modified at the same memory location where it is stored.
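The same check can be written so it runs anywhere, without hard-coding an address. This sketch compares the id against a saved value for the list, and uses the is operator for the int case, since is is what id comparisons approximate:

```python
a = [5, 10, 15]
before = id(a)
a[1] = 20                 # in-place change to the list
assert id(a) == before    # same object, so same id
print(a)                  # [5, 20, 15]

n = 1000
old = n                   # keep a second reference to the original int
n += 1                    # ints are immutable: n is bound to a new object
assert n is not old
print(old, n)             # 1000 1001
```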
I created a linked list in python, and I created a pointer variable pointing to the same node.
However, when I try to update the pointer, the linked list doesn't update. It only updates when I use the original notation. Here are the three lines of concern, followed by the code snippet:
pointer = self.storage[index]
pointer = pointer.next #does not work
self.storage[index] = self.storage[index].next #DOES work.
def remove(self, key):
    pair = LinkedPair(key, None)
    index = self._hash_mod(pair)
    pointer = self.storage[index]
    prev = None
    while pointer:
        if pointer.key == pair.key:
            if prev is None: # at the beginning of the linked list, set the head to equal the next value
                print(self.storage[index] == pointer) # True
                self.storage[index] = self.storage[index].next
                pointer = pointer.next
                # pointer = pointer.next
                break
            # self.display(pointer, prev, ' pointer == pair')
            prev.next = pointer.next
            del pointer
            break
        else:
            prev = pointer
            pointer = pointer.next
            # self.display(pointer, prev, ' post shifting')
    # self.storage[index] = self.storage[index].next
    return -1
Variables in Python are just names for objects, and assigning to a variable just changes which object that name designates.
However, when you assign to an element of a list, you change which object that position in the list refers to.
Thus:
pointer = self.storage[index] # 1
pointer = pointer.next # 2
self.storage[index] = self.storage[index].next # 3
(1) Makes pointer a name for the object referenced at storage[index]
(2) On the right side of the assignment =, you look up the next attribute of the object referenced by pointer. This is self.storage[index].next. The assignment makes the pointer variable refer to self.storage[index].next. To update the self.storage list, you need to operate on the list object itself. By contrast, self.storage[index], as referenced by pointer, is just "some object". You don't even have a way to get back to the list from there, so there's no way to change the list here.
(3) Here, however, you are changing the list self.storage, replacing the element at index. You could have done self.storage[index] = pointer at this point, with the same effect.
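The three steps can be reproduced with a toy bucket list. In this sketch the Node class is a hypothetical stand-in for the question's LinkedPair, and storage is a plain list standing in for self.storage:

```python
class Node:
    """Hypothetical stand-in for the question's LinkedPair."""
    def __init__(self, key, next=None):
        self.key = key
        self.next = next

storage = [Node("a", Node("b")), None]
index = 0

pointer = storage[index]           # (1) pointer names the head node
pointer = pointer.next             # (2) only the name pointer moves
assert storage[index].key == "a"   # the list slot is unchanged

storage[index] = pointer           # (3) now the list slot itself is updated
assert storage[index].key == "b"
```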
Of course python works with references (or pointers) under the hood. my_list[1] = obj doesn't allocate space for obj, it will store a reference to it. But local and global variables are not references, they are names.
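In the end, global and local namespaces just map variable names to objects. A module's global namespace is a normal dictionary, which you can see by calling globals(); locals() gives a similar view of the local namespace (though inside a function in CPython it is only a snapshot, and writing to it does not rebind local variables). At module level you can even "implement" variable assignment just by changing that dictionary: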
foo = 1
globals()["foo"] = 2
print(foo) # --> 2
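A short sketch of both cases. The function f is hypothetical, and the behavior of writing to locals() shown here is CPython's; other implementations are not guaranteed to match:

```python
foo = 1
globals()["foo"] = 2      # module globals are a real dict
assert foo == 2

def f():
    x = 1
    locals()["x"] = 2     # CPython: writes to the snapshot, not to x
    return x

print(f())                # 1
```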
Python assignment to variables is different than assignment to mutable objects like lists or dictionaries. Let's consider the following statements:
a = 1 # 1
b = [1] # 2
b[0] = 2 # 3
The first two assignments do the same thing: they create (or update) the global variables a and b and bind them to the int object 1 and to the list object [1], respectively. You'll have:
globals()
{
...
"a": 1,
"b": [1]
...
}
The third assignment is completely different. The assignment of this type is actually implemented by the object on the left. From the docs:
When a target is part of a mutable object (an attribute reference, subscription or slicing), the mutable object must ultimately perform the assignment and decide about its validity, and may raise an exception if the assignment is unacceptable.
This means that the third statement is implemented by the list object itself. A Python list keeps references to objects (it does not copy them). b[0], when used in an expression, is implemented by a special method of the list object (__getitem__) and returns the object referenced by the list at position 0. b[0] = 2 actually calls another special method of the list object (__setitem__), with the index (0) and the new value (2) as arguments, and the list implementation decides how it updates itself: it changes the reference at index 0 to the new object.
This difference is embedded in Python's assignment statement. I've linked you the full reference, too.
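To watch the list "perform the assignment", you can override __setitem__ in a subclass. LoggingList below is a hypothetical class made up for this illustration; it records each call and then delegates to list:

```python
class LoggingList(list):
    """Hypothetical list subclass that records item assignments."""
    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __setitem__(self, index, value):
        self.calls.append((index, value))   # record what was requested
        super().__setitem__(index, value)   # let list do the real work

b = LoggingList([1])
b[0] = 2                     # dispatched to LoggingList.__setitem__
assert b.calls == [(0, 2)]
assert b == [2]
```

A plain name assignment such as a = 1 involves no such method call; it just updates the namespace.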
More details:
https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/
https://docs.python.org/3/tutorial/classes.html#a-word-about-names-and-objects
https://docs.python.org/2.0/ref/assignment.html
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function returns its id (in CPython, its memory address), and since nothing else references the dict, it is destroyed right afterwards. When you do it twice in quick succession (without any other dicts being created in the meantime), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the ids of the two objects are the same. This doesn't happen if you assign the dicts to variables and then get their id(), because then the dicts are alive at the same time, so their ids have to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython frees an object as soon as its reference count drops to zero, so the first [] is already destroyed by the time the second [] is created. So, most of the time the second one ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
    def __init__(self): print("a")
    def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
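The practical rule that follows: compare identities only between objects that are alive at the same time. A minimal sketch; the temporary-id comparison is printed without an expected value because it depends on the allocator:

```python
x = []
y = []
# Both lists are alive at once, so they are distinct objects with distinct ids.
assert x is not y
assert id(x) != id(y)

# Each temporary [] may be freed before the next is created, so this can be
# True or False depending on the implementation and allocator.
print(id([]) == id([]))
```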
I understand that in python every thing, be it a number, string, dict or anything is an object. The variable name simply points to the object in the memory. Now according to this question,
>> a_dict = b_dict = c_dict = {}
This creates an empty dictionary and all the variables point to this dict object. So, changing any one would be reflected in the other variables.
>> a_dict["key"] = "value" #say
>> print(a_dict)
>> print(b_dict)
>> print(c_dict)
would give
{'key': 'value'}
{'key': 'value'}
{'key': 'value'}
I had understood the concept of variables pointing to objects, so this seems fair enough.
Now, even though it might be weird since it's such a basic statement, why does this happen?
>> a = b = c = 1
>> a += 1
>> print(a, b, c)
2 1 1 # and not 2 2 2
First part of question: Why isn't the same concept applied here ?
Actually this doubt came up when I was trying to search for a solution for this:
>> a_dict = {}
>> some_var = "old_value"
>> a_dict['key'] = some_var
>> some_var = "new_value"
>> print(a_dict)
{'key': 'old_value'} # and not {'key': 'new_value'}
This seemed counter-intuitive since I had always assumed that I am telling the dictionary to hold the variable, and changing the object that the variable was pointing to would obviously reflect in the dictionary. But this seems to me as if the value is being copied, not referenced. This was the second thing I didn't understand.
Moving on, i tried something else
>> class some_class(object):
..     def __init__(self):
..         self.var = "old_value"
>> some_object = some_class()
>> a_dict = {}
>> a_dict['key'] = some_object
>> some_object.var = "new_value"
>> print(a_dict['key'].var)
new_value # even though this was what I wanted and expected, it conflicts with the output in the previous code
Now, over here, obviously it was being referenced. These contradictions have left me squawking at the unpredictable nature of Python, even though I still love it, owing to the fact I don't know any other language well enough :p . Even though I'd always imagined that assignments lead to references to the object, these 2 cases are conflicting. So this is my final doubt. I understand that it might be one of those Python gotchas. Please educate me.
You're wrestling with 2 different things here. The first is the idea of mutability vs. immutability. In Python, str, int and tuple are some of the built-in immutable types, compared to list, dict (and others), which are mutable types. Immutable objects are ones which cannot be changed once they are created. So, in your example:
a = b = c = 1
After that line, all of a, b and c refer to the same integer in memory (you can check by printing their respective ids and noting that they are the same). However, when you do:
a += 1
a now refers to a new (different) integer at a different memory location. Note that as a convention, += should return a new instance of something if the type is immutable. If the type is mutable, it should change the object in place and return it. I explain some of the more gory detail in this answer.
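The convention can be seen side by side for an immutable int, a mutable list and an immutable tuple. This is a minimal sketch; the names are arbitrary:

```python
a = b = c = 1
a += 1                 # int is immutable: a is rebound to a new int
assert (a, b, c) == (2, 1, 1)

x = y = [1]
x += [2]               # list is mutable: extended in place, no new object
assert x is y
assert y == [1, 2]

s = t = (1,)
s += (2,)              # tuple is immutable: s is rebound to a new tuple
assert t == (1,) and s == (1, 2)
```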
For the second part, you're trying to figure out how python's identifiers work. The way that I think of it is this... when you write a statement:
name = something
The right-hand side is evaluated into some object (an integer, string, ...). That object is then given the name on the left-hand side (1). When a name is on the right-hand side, the corresponding object is automatically "looked up" and substituted for the name in the calculation. Note that in this framework, assignment doesn't care if anything had that name before: it simply rebinds the name to the new object. Objects which were previously constructed using that name don't see any changes either. They've already been created, keeping references to the objects themselves, not the names. So:
a = "foo" # `a` is the name of the string "foo"
b = {"bar": a} # evaluate the new dictionary and name it `b`. `a` is looked up and returns "foo" in this calculation
a = "bar" # give the object "bar" the name `a` irrespective of what previously had that name
(1) I'm glossing over a few details here for simplicity, e.g. what happens when you assign to a list element: lst[idx] = some_value * some_other_value.
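Both halves of the question's puzzle fit in a few lines. In this sketch, Box is a hypothetical stand-in for the question's some_class:

```python
a = "foo"
b = {"bar": a}          # the dict stores a reference to "foo", not the name a
a = "bar"               # rebinds the name a; the dict is unaffected
assert b == {"bar": "foo"}

class Box:
    """Hypothetical mutable holder, standing in for some_class."""
    def __init__(self, var):
        self.var = var

obj = Box("old_value")
d = {"key": obj}
obj.var = "new_value"   # mutates the object that the dict also refers to
assert d["key"].var == "new_value"
```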
This is because a += 1 can be interpreted as a = a + 1 (for immutable types such as int), which rebinds the variable a to the value a + 1, that is, 2.
Similarly, some_var = "new_value" rebinds the variable and the object is not changed, so the key, value pair in the dictionary still points to that object.
In your last example, you are not rebinding, but mutating the object, so the value is changed in the dictionary.
class test:
    def __init__(self):
        self.see = 0
        self.dic = {"1": self.see}
examine = test()
examine.see += 1
print(examine.dic["1"])
print(examine.see)
This gives 0 and 1 as a result, and it makes no sense to me why.
print(id(examine.dic["1"]))
print(id(examine.see))
They also have different memory addresses.
However, if you use the same example but with a list instead of an integer in see, you get the expected output.
Any explanations?
This gives the expected output:
class test:
    def __init__(self):
        self.see = [0]
        self.dic = {"1": self.see}
examine = test()
examine.see[0] += 1
print(examine.dic["1"][0])
print(examine.see[0])
Short answer:
Arrays/lists are mutable whereas integers/ints are not.
Lists are mutable (they can be changed in place): when you change a list, the same object gets updated (the id doesn't change, because a new object is not needed).
Integers are immutable: this means that to change the value of something, you have to create a new object, which will have a different id. Strings work the same way, and you would have had the same "problem" if you set self.see = 'a' and then did examine.see += ' b'.
>>> a = 'a'
>>> id(a)
3075861968L
>>> z = a
>>> id(z)
3075861968L
>>> a += ' b'
>>> id(a)
3075385776L
>>> id(z)
3075861968L
>>> z
'a'
>>> a
'a b'
In Python, names point to values, and values are managed by Python. The id() function returns a unique identifier of the value, not of the name.
Any number of names can point to the same value. This means, you can have multiple names that are all linked to the same id.
When you first create your class instance, the name see points to an integer object whose value is 0. Then, when you create the dict dic, the "1" key points to the same object that see was pointing to, which is 0.
Since 0 (an object of type int) is immutable, examine.see += 1 cannot change it in place: a new integer object, 1, is created and examine.see is rebound to it. This is why the return value of id() changes.
Python is smart enough to know that there are some other names pointing to the "old" value, and so it keeps that around in memory.
However, now you have two objects; and the dictionary is still pointing to the "old" one, and see is now pointing to the new one.
When you use a list, Python doesn't need to create a new object because it can modify a list without destroying it; because lists are mutable. Now when you create a list and point two names to it, both the names are pointing to the same object. When you update this object (by adding a value, or deleting a value or changing its value) the same object is updated - and so everything pointing to it will get the "updated" value.
After the +=, examine.dic["1"] and examine.see do indeed refer to objects at different locations, even though the former initially referred to the very same object as the latter.
With your case of using an array, you're not changing the value of examine.see: you're instead changing examine.see[0], which is changing the content of the array it points to (which is aliased to examine.dic["1"]).
When you do self.dic={"1":self.see}, the dict value is set to the value of self.see at that moment. When you later do examine.see += 1, you set examine.see to a new value. This has no effect on the dict because the dict was set to the value of self.see; it does not know to "keep watching" the name self.see to see if it is pointing to a different value.
If you set self.see to a list, and then do examine.see += [1], you are not setting examine.see to a new value, but are changing the existing value. This will be visible in the dict, because, again, the dict is set to the value, and that value can change.
The thing is that sometimes a += b sets a to a new value, and sometimes it changes the existing value. Which one happens depends on the type of a; you need to know what examine.see is to know what examine.see += something does.
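Both variants of the question's class can be compared directly. IntHolder and ListHolder are hypothetical names for the two versions of test:

```python
class IntHolder:
    def __init__(self):
        self.see = 0
        self.dic = {"1": self.see}

class ListHolder:
    def __init__(self):
        self.see = [0]
        self.dic = {"1": self.see}

i = IntHolder()
i.see += 1                 # int: rebinds i.see; the dict keeps the old 0
assert i.dic["1"] == 0 and i.see == 1

l = ListHolder()
l.see += [1]               # list: extended in place; the dict sees the change
assert l.dic["1"] == [0, 1]
assert l.dic["1"] is l.see
```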
Others have addressed the mutability/boxing question. What you seem to be asking for is late binding. This is possible, but a little counterintuitive and there's probably a better solution to your underlying problem… if we knew what it was.
class test:
    @property
    def dic(self):
        self._dic.update({'1': self.see})
        return self._dic
    def __init__(self):
        self.see = 0
        self._dic = {}
>>> ex=test()
>>> ex.see
0
>>> ex.see+=1
>>> ex.see
1
>>> ex.dic
{'1': 1}
>>> ex.see+=1
>>> ex.dic
{'1': 2}
In fact, in this contrived example it's even a little dangerous: because self._dic is returned, the caller could modify the dict directly. But that's OK, because you don't need to do this in real life. If you want the value of self.see, just get the value of self.see.
In fact, it looks like this is what you want:
class test:
    _see = 0
    @property
    def see(self):
        self._see += 1
        return self._see
or, you know, just itertools.count() :P
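A sketch of the itertools.count() variant, assuming the goal is a property that returns 1, 2, 3, ... on successive reads. The class name Test is hypothetical:

```python
import itertools

class Test:
    """Hypothetical rewrite of the class above using itertools.count."""
    def __init__(self):
        self._counter = itertools.count(1)   # yields 1, 2, 3, ...

    @property
    def see(self):
        return next(self._counter)

t = Test()
print(t.see, t.see, t.see)   # 1 2 3
```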
This solution worked for me. Feel free to use it.
class integer:
    def __init__(self, integer):
        self.value = integer
    def plus(self):
        self.value = self.value + 1
    def output(self):
        return self.value
The solution replaces the immutable type int with an instance of a class, which is mutable; the dictionary stores a reference to that instance.
Furthermore, you can make changes to the class object, and the changes apply to what the dictionary points to. It works somewhat like a pointer.
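A usage sketch of that wrapper, following the names see and dic from the question. The class is repeated so the block runs on its own:

```python
class integer:
    def __init__(self, integer):
        self.value = integer
    def plus(self):
        self.value = self.value + 1
    def output(self):
        return self.value

see = integer(0)
dic = {"1": see}        # the dict holds a reference to the same instance
see.plus()              # mutates that instance in place
assert dic["1"].output() == 1
assert dic["1"] is see
```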