I know that when Python handles an object, it uses a reference to the object, not the value itself.
But the behavior of the code below seems very odd to me.
Five dummy object references were copied from the prev list to the now list, then prev was cleared.
Shouldn't the references stored in now be invalidated too?
class dummy:
    pass
prev = [dummy() for _ in range(5)]
now = []
for d in prev:
    now.append(d)
for idx in range(5):
    print(prev[idx] is now[idx])  # all True - so, same reference copied
prev.clear()
print(prev)  # empty
print(now)  # 5 'dummy' objects survived - How??
I found a possible duplicate of this question, and would like to ask whether it explains why the references in now aren't invalidated after prev.clear():
Python: Delete object referenced from tuple
Maybe I'm too used to the reference concept in C++, and Python is different.
By adding each object in prev to now, you increment its reference count by 1, so each count becomes 2.
When you clear prev, the list drops its references to the objects, so the reference count of each object is decremented by 1 and becomes 1.
Since the reference counts did not reach 0, the objects are not destroyed.
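The reference-count explanation above can be observed with a weak reference, which watches an object without keeping it alive. This is a minimal sketch, assuming CPython; the names Dummy, probe, and the two alive_* variables are made up for illustration.

```python
import gc
import weakref


class Dummy:
    pass


prev = [Dummy() for _ in range(5)]
now = list(prev)               # copies the five references, not the objects

# A weak reference observes an object without keeping it alive.
probe = weakref.ref(prev[0])

prev.clear()                   # prev drops its references
alive_after_prev_clear = probe() is not None   # now still refers to the object

now.clear()                    # the last strong references are dropped
gc.collect()                   # only matters on implementations without reference counting
alive_after_now_clear = probe() is not None

print(alive_after_prev_clear, alive_after_now_clear)
```

The object survives clearing prev because now still holds a reference, and it goes away only once now is cleared as well.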
Related
I'm looking for some clarification regarding mutability and class objects. From what I understand, variables in Python are about assigning a variable name to an object.
If that object is immutable then when we set two variables to the same object, it'll be two separate copies (e.g. a = b = 3 so a changing to 4 will not affect b because 3 is a number, an example of an immutable object).
However, if an object is mutable, then changing the value in one variable assignment will naturally change the value in the other (e.g. a = b = [] -> a.append(1) so now both a and b will refer to "[1]")
Working with classes, it seems even more fluid than I believed. I wrote a quick example below to show the differences. The first class is a typical Node class with a next pointer and a value. Setting two variables, "slow" and "fast", to the same instance of the Node object ("head"), and then changing the values of both "slow" and "fast" won't affect the other. That is, "slow", "fast", and "head" all refer to different objects (verified by checking their id() as well).
The second example class doesn't have a next pointer and only has a self.val attribute. This time changing one of the two variables, "p1" and "p2", both of which are set to the same instance, "start", will affect the other. This is despite that self.val in the "start" instance is an immutable number.
'''
The below will have two variable names (slow, fast) assigned to a head Node.
Changing one of them will NOT change the other reference as well.
'''
class Node:
    def __init__(self, x, next=None):
        self.x = x
        self.next = next
    def __str__(self):
        return str(self.x)
n3 = Node(3)
n2 = Node(2, n3)
n1 = Node(1, n2)
head = n1
slow = fast = head
print(f"Printing before moving...{head}, {slow}, {fast}") # 1, 1, 1
while fast and fast.next:
    fast = fast.next.next
    slow = slow.next
print(f"Printing after moving...{head}, {slow}, {fast}") # 1, 2, 3
print(f"Checking the ids of each variable {id(head)}, {id(slow)}, {id(fast)}") # all different
'''
The below will have two variable names (p1, p2) assigned to a start Dummy.
Changing one of them will change the other reference as well.
'''
class Dummy:
    def __init__(self, val):
        self.val = val
    def __str__(self):
        return str(self.val)
start = Dummy(100)
p1 = p2 = start
print(f"Printing before changing {p1}, {p2}") # 100, 100
p1.val = 42
print(f"Printing after changing {p1}, {p2}") # 42, 42
This is a bit murky for me to understand what is actually going on under the hood and I'm seeking clarification so I can feel confident in setting multiple variable assignments to the same object expecting a true copy (without resorting to "import copy; copy.deepcopy(x);")
Thank you for your help
This isn't a matter of immutability vs mutability. This is a matter of mutating an object vs reassigning a reference.
If that object is immutable then when we set two variables to the same object, it'll be two separate copies
This isn't true. A copy won't be made. If you have:
a = 1
b = a
You have two references to the same object, not a copy of the object. This is fine though because integers are immutable. You can't mutate 1, so the fact that a and b are pointing to the same object won't hurt anything.
Python will never make implicit copies for you. If you want a copy, you need to copy it yourself explicitly (using copy.copy, or some other method like slicing on lists). If you write this:
a = b = some_obj
a and b will point to the same object, regardless of the type of some_obj and whether or not it's mutable.
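A small sketch of the difference between sharing a reference and making an explicit copy. The nested list and the names shallow and deep are illustrative assumptions, not part of the question.

```python
import copy

some_obj = [1, [2, 3]]
a = b = some_obj
same_object = a is b            # both names refer to one list

shallow = copy.copy(a)          # new outer list, but the inner list is shared
deep = copy.deepcopy(a)         # new outer list and a new inner list

a[1].append(4)                  # mutate the inner list through the name a

print(b[1], shallow[1], deep[1])
```

Only the deep copy is unaffected by the mutation, because it is the only one that does not share the inner list.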
So what's the difference between your examples?
In your first Node example, you never actually alter any Node objects. They may as well be immutable.
slow = fast = head
That initial assignment makes both slow and fast point to the same object: head. Right after that though, you do:
fast = fast.next.next
This reassigns the fast reference, but never actually mutates the object fast is looking at. All you've done is change what object the fast reference is looking at.
In your second example however, you directly mutate the object:
p1.val = 42
While this looks like reassignment, it isn't. This is actually:
p1.__setattr__("val", 42)
And __setattr__ alters the internal state of the object.
So, reassignment changes what object is being looked at. It will always take the form:
a = b # Maybe chained as well.
Contrast with these that look like reassignment, but are actually calls to mutating methods of the object:
l = [0]
l[0] = 5 # Actually l.__setitem__(0, 5)
d = Dummy(100) # Dummy requires an initial val
d.val = 42 # Actually d.__setattr__("val", 42)
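To see that attribute assignment is a method call while plain assignment is not, one can record calls to __setattr__. This is a sketch only; the Traced class and its log attribute are hypothetical names introduced for the demonstration.

```python
class Traced:
    """Hypothetical class that records every attribute assignment."""

    def __init__(self):
        # Bypass our own __setattr__ so that creating the log is not recorded.
        object.__setattr__(self, "log", [])

    def __setattr__(self, name, value):
        self.log.append(name)
        object.__setattr__(self, name, value)


t = Traced()
alias = t
t.val = 42       # attribute assignment: calls Traced.__setattr__
t = None         # plain rebinding of the name t: no method is called

print(alias.log, alias.val)
```

The log contains only "val": rebinding t left the object untouched, and the mutation is still visible through alias.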
You overcomplicate things. The fundamental, simple rule is: each time you use = to assign an object to a variable, you make the variable name refer to that object, that's all. The object being mutable or not makes no difference.
With a = b = 3, you make the names a and b refer to the object 3. If you then make a = 4, you make the name a refer to the object 4, and the name b still refers to 3.
With a = b = [], you've created two names a and b that refer to the same list object. When doing a.append(1), you append 1 to this list. You haven't assigned anything to a or b in the process (you didn't write any a = ... or b = ...). So, whether you access the list through the name a or b, it's still the same list that you manipulate. It can just be called by two different names.
The same happens in your example with classes: when you write fast = fast.next.next, you make the name fast refer to a new object.
When you do p1.val = 42, you don't make p1 refer to a new, different instance; you change the val attribute of the existing instance. p1 and p2 are still two names for this unique instance, so using either name lets you refer to the same instance.
Mutable and Immutable Objects
When a program runs, its data objects are stored in the computer's memory for processing. Some of these objects can be modified in place at their memory location, while others cannot be modified once they are stored. Whether or not an object can be modified at the memory location where it is stored is called mutability.
We can check the mutability of an object by checking its memory location before and after it is modified. If the memory location remains the same when the object is modified, the object is mutable. To check where an object is stored, we use the function id() (in CPython, the value it returns is the object's memory address). Consider the following example:
a = [5, 10, 15]  # Assign a list to a.
id(a)            # Get the ID of the memory location where a is stored.
#1906292064
a[1] = 20        # Replace the second item in the list, 10, with a new item, 20.
print(a)         # Use the print() function to verify the new value of a.
#[5, 20, 15]
id(a)            # Get the ID of the memory location where a is stored again.
#1906292064
The ID (1906292064) is the same before and after the list is modified. This indicates that the list is mutable, i.e., it can be modified at the same memory location where it is stored.
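For contrast, here is a hedged sketch that puts the list case next to a tuple. A tuple cannot be changed in place, so "changing" it means building a new object and rebinding the name. The variable names are illustrative.

```python
nums = [5, 10, 15]
list_id_before = id(nums)
nums[1] = 20                               # in-place modification
list_same_object = id(nums) == list_id_before

point = (5, 10, 15)
original = point                           # keep the old tuple alive for comparison
point = point[:1] + (20,) + point[2:]      # builds a new tuple and rebinds the name
tuple_same_object = point is original

print(list_same_object, tuple_same_object)
```

The old tuple is kept alive through original on purpose: comparing ids is only meaningful while both objects exist at the same time.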
I created a linked list in python, and I created a pointer variable pointing to the same node.
However, when I try to update the pointer, the linked list doesn't update. It only updates when I use the original notation. Here are the three lines of concern, followed by the code snippet:
pointer = self.storage[index]
pointer = pointer.next #does not work
self.storage[index] = self.storage[index].next #DOES work.
def remove(self, key):
    pair = LinkedPair(key,None)
    index = self._hash_mod(pair)
    pointer = self.storage[index]
    prev = None
    while pointer:
        if pointer.key == pair.key:
            if prev is None: #at the beginning of the linked list, set the head to equal the next value
                print(self.storage[index] == pointer) #true
                self.storage[index] = self.storage[index].next
                pointer = pointer.next
                # pointer = pointer.next
                break
            # self.display(pointer,prev,' pointer == pair')
            prev.next = pointer.next
            del pointer
            break
        else:
            prev = pointer
            pointer = pointer.next
            # self.display(pointer,prev,' post shifting')
    # self.storage[index] = self.storage[index].next
    return -1
Variables in Python are just names for objects, and assigning to a variable just changes which object that name designates.
However, when you assign to an element of a list, you change which object is referenced at that position in the list.
Thus:
pointer = self.storage[index] # 1
pointer = pointer.next # 2
self.storage[index] = self.storage[index].next # 3
(1) Makes pointer a name for the object referenced at storage[index]
(2) On the right side of the assignment =, you look up the next attribute of the object referenced by pointer. This is self.storage[index].next. The assignment makes the pointer variable refer to self.storage[index].next. To update the self.storage list, you need to operate on the list object itself. By contrast, the object that pointer refers to is just "some object". You don't even have a way to get back to the list from there, so there's no way to change the list here.
(3) Here, however, you are changing the list self.storage, replacing the element at index. You could have done self.storage[index] = pointer at this point, with the same effect.
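The three steps can be reproduced outside the hash table. This is a minimal sketch; the LinkedPair class below is a simplified stand-in for the asker's class, and storage is a plain list with one bucket.

```python
class LinkedPair:
    """Simplified stand-in for the asker's node class."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


storage = [LinkedPair("a", 1, LinkedPair("b", 2))]
index = 0

pointer = storage[index]         # (1) pointer names the head node
pointer = pointer.next           # (2) rebinds the local name only
key_after_rebinding = storage[index].key

storage[index] = storage[index].next   # (3) changes the list itself
key_after_item_assignment = storage[index].key

print(key_after_rebinding, key_after_item_assignment)
```

After step (2) the bucket still starts at "a"; only the item assignment in step (3) moves the head to "b".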
Of course python works with references (or pointers) under the hood. my_list[1] = obj doesn't allocate space for obj, it will store a reference to it. But local and global variables are not references, they are names.
In the end, global and local namespaces are just mapping variable names to objects. They are usually just normal dictionaries. You can see these by calling globals() and locals(). Or you can "implement" variable assignment just by changing that dictionary:
foo = 1
globals()["foo"] = 2
print(foo) # --> 2
Python assignment to variables is different than assignment to mutable objects like lists or dictionaries. Let's consider the following statements:
a = 1 # 1
b = [1] # 2
b[0] = 2 # 3
The first two assignments are doing the same. They are creating (or updating) global variables a and b, and map them to the int object 1 and to the list object [1], respectively. You'll have:
globals()
{
...
"a": 1,
"b": [1]
...
}
The third assignment is completely different. The assignment of this type is actually implemented by the object on the left. From the docs:
When a target is part of a mutable object (an attribute reference, subscription or slicing), the mutable object must ultimately perform the assignment and decide about its validity, and may raise an exception if the assignment is unacceptable.
This means that the third statement is implemented by the list object itself. Now the list in python keeps references to objects (it does not copy the objects). b[0], when used in an expression, is implemented by a special method of the list object and will return the object referenced by the list at position 0. b[0] = 2 is actually calling a special method of the list object, with the index (0) and the target object (2) as arguments, and the list implementation decides how it updates itself: it will change the reference at index 0 to the new object.
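One way to watch the list "perform the assignment" is to subclass list and record calls to __setitem__. This is an illustrative sketch; RecordingList and its calls attribute are hypothetical names, not anything from the standard library.

```python
class RecordingList(list):
    """Hypothetical list subclass that records calls to __setitem__."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __setitem__(self, index, value):
        self.calls.append((index, value))
        super().__setitem__(index, value)


b = RecordingList([1])
alias = b
b[0] = 2         # handled by the list object: RecordingList.__setitem__(0, 2)
b = [99]         # rebinding the name b: the list object is not involved

print(alias.calls, list(alias))
```

Only the subscript assignment shows up in calls; rebinding b never reaches the object.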
This difference is embedded in Python's assignment statement. I've linked the full reference, too.
More details:
https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/
https://docs.python.org/3/tutorial/classes.html#a-word-about-names-and-objects
https://docs.python.org/2.0/ref/assignment.html
I am working with large nested dictionaries, and am trying to delete nested subdictionaries. I am wondering why the following behavior occurs.
When I set a reference to dictionary d (called ref), then I change ref and print d, it shows an updated version of d with the third element added.
input:
d={"a":1,"b":2}
ref=d
ref["c"]=3
print(d)
output:
{'a': 1, 'b': 2, 'c': 3}
Given this behavior, I was expecting to be able to delete the dictionary by deleting the reference:
input:
d={"a":1,"b":2}
ref=d
del ref
print(d)
output:
{'a': 1, 'b': 2}
I am wondering if there is a way to delete the original object when I delete the reference (meaning that the output of the second program would be an error because d was deleted).
del doesn't actually handle any de-allocation of memory, it merely unbinds a value from a name, and then decrements the reference count of that object by one. There is no way to systematically unbind all names from an object given a single reference.
An object is not reclaimed until its reference count drops to 0 (in CPython this happens as soon as the count reaches 0, while objects caught in reference cycles are left to the cyclic garbage collector). You can see an object's reference count by using the sys.getrefcount function (the result is typically one higher than you might expect because of the temporary reference held by the function's argument).
We can demonstrate del in practice using this method and the __del__ method (which is called only when the reference count for the object is decremented to 0):
>>> import sys
>>> # print something when refcount == 0 and object is about to be collected
>>> class Deleted:
...     def __del__(self):
...         print("actually deleted")
...
>>> a = Deleted()
>>> # just a
>>> sys.getrefcount(a) - 1
1
>>> b = a
>>> # a and b
>>> sys.getrefcount(a) - 1
2
>>> del a
>>> # now it's just b
>>> sys.getrefcount(b) - 1
1
>>> del b
actually deleted
If you're curious to read more about how all of this works internally, check out the C API documentation on the internal calls for reference counting, and check out the gc module, which is the high level python interface for introspecting the garbage collection sub-system.
Given your specific problem, since you are working with dictionaries which are mutable types, you could just clear the dictionary:
>>> a = {"a": 1}
>>> b = a
>>> # will clear the dict that both a and b are referencing
>>> b.clear()
>>> a
{}
Note that the slice syntax del a[:] empties a list in place, but it does not work on dictionaries, which do not support slicing; for a dictionary, use clear().
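A short sketch of clearing an object in place through one of two names. Dictionaries offer clear(), and slice deletion applies to lists. The variable names are illustrative.

```python
settings = {"a": 1}
other_name = settings
other_name.clear()            # empties the one dict both names refer to

items = [1, 2, 3]
items_alias = items
del items_alias[:]            # slice deletion empties the list in place

print(settings, items)
```

In both cases the object itself still exists and both names still refer to it; only its contents are gone.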
The del statement behaves differently depending on what is being deleted. Paraphrasing slightly:
Deletion of a name removes the binding of that name from the local or global namespace
That is the second case presented. You've got two references to the same object. The name ref has been deleted, but the name d still exists and points to the same object it always did.
However, attributes, subscriptions, and slicings have different behaviour:
Deletion of attribute references, subscriptions and slicings is passed to the primary object involved
That is more like the first case - deleting an element from either name will be reflected in the other:
input:
d = {"a":1, "b":2}
ref = d
del ref["a"]
print(d)
output:
{'b': 2}
So, wrapping the references inside a dictionary (or other container), will allow deletion by any reference.
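The container idea can be sketched as follows, assuming a hypothetical registry dict that holds the nested dictionary under the key "config". Deleting the entry through any name for the container removes it for every name, while deleting a name only unbinds that name.

```python
registry = {"config": {"a": 1, "b": 2}}
view = registry               # second name for the same container

del view["config"]            # passed to the dict: removes the entry itself
entry_gone = "config" not in registry

del view                      # removes only the name view
container_survives = registry == {}

print(entry_gone, container_survives)
```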
class test:
    def __init__(self):
        self.see=0
        self.dic={"1":self.see}
examine=test()
examine.see+=1
print examine.dic["1"]
print examine.see
This prints 0 and 1, and I don't understand why.
print id(examine.dic["1"])
print id(examine.see)
They also have different memory addresses.
However, if you use the same example with an array (list) in see instead of a plain variable, you get the expected output.
Any explanations?
This gives the expected output:
class test:
    def __init__(self):
        self.see=[0]
        self.dic={"1":self.see}
examine=test()
examine.see[0]+=1
print examine.dic["1"][0]
print examine.see[0]
Short answer:
Arrays/lists are mutable whereas integers/ints are not.
Lists are mutable (they can be changed in place): when you change a list, the same object gets updated (the id doesn't change, because a new object is not needed).
Integers are immutable. This means that to change the value of something, you have to create a new object, which will have a different id. Strings work the same way, and you would have had the same "problem" if you set self.see = 'a' and then did examine.see += ' b'
>>> a = 'a'
>>> id(a)
3075861968L
>>> z = a
>>> id(z)
3075861968L
>>> a += ' b'
>>> id(a)
3075385776L
>>> id(z)
3075861968L
>>> z
'a'
>>> a
'a b'
In Python, names point to values, and values are managed by Python. The id() function returns an identifier of the value, not the name; it is unique among the objects that exist at the same time.
Any number of names can point to the same value. This means, you can have multiple names that are all linked to the same id.
When you first create your class object, the name see points to an integer object whose value is 0. Then, when you create dic, the "1" key points to the same object that see points to, which is 0.
Since 0 (an object of type integer) is immutable, it cannot be updated in place: examine.see += 1 produces a different integer object and rebinds see to it. This is why the return value of id() changes.
Python is smart enough to know that there are some other names pointing to the "old" value, and so it keeps that around in memory.
However, now you have two objects; and the dictionary is still pointing to the "old" one, and see is now pointing to the new one.
When you use a list, Python doesn't need to create a new object because it can modify a list without destroying it; because lists are mutable. Now when you create a list and point two names to it, both the names are pointing to the same object. When you update this object (by adding a value, or deleting a value or changing its value) the same object is updated - and so everything pointing to it will get the "updated" value.
examine.dic["1"] and examine.see do indeed have different locations, even if the former's initial value is copied from the latter.
With your case of using an array, you're not changing the value of examine.see: you're instead changing examine.see[0], which is changing the content of the array it points to (which is aliased to examine.dic["1"]).
When you do self.dic={"1":self.see}, the dict value is set to the value of self.see at that moment. When you later do examine.see += 1, you set examine.see to a new value. This has no effect on the dict because the dict was set to the value of self.see; it does not know to "keep watching" the name self.see to see if it is pointing to a different value.
If you set self.see to a list, and then do examine.see += [1], you are not setting examine.see to a new value, but are changing the existing value. This will be visible in the dict, because, again, the dict is set to the value, and that value can change.
The thing is that sometimes a += b sets a to a new value, and sometimes it changes the existing value. Which one happens depends on the type of a; you need to know what examine.see is to know what examine.see += something does.
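A minimal sketch of the two behaviours of +=, using an int and a list. The alias names are illustrative.

```python
n = 0
n_alias = n
n += 1                        # int is immutable: n is rebound to a new object

items = [0]
items_alias = items
items += [1]                  # list implements in-place +=: same object is extended

print(n_alias, items_alias, items is items_alias)
```

The alias of the int still sees 0, while the alias of the list sees the appended element because both names still refer to one list.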
Others have addressed the mutability/boxing question. What you seem to be asking for is late binding. This is possible, but a little counterintuitive and there's probably a better solution to your underlying problem… if we knew what it was.
class test:
    @property
    def dic(self):
        self._dic.update({'1': self.see})
        return self._dic
    def __init__(self):
        self.see = 0
        self._dic = {}
>>> ex=test()
>>> ex.see
0
>>> ex.see+=1
>>> ex.see
1
>>> ex.dic
{'1': 1}
>>> ex.see+=1
>>> ex.dic
{'1': 2}
In fact, in this contrived example it's even a little dangerous because returning self._dic the consumer could modify the dict directly. But that's OK, because you don't need to do this in real life. If you want the value of self.see, just get the value of self.see.
In fact, it looks like this is what you want:
class test:
    _see = 0
    @property
    def see(self):
        self._see+=1
        return self._see
or, you know, just itertools.count() :P
This solution worked for me. Feel free to use it.
class integer:
    def __init__(self, integer):
        self.value=integer
    def plus(self):
        self.value=self.value+1
    def output(self):
        return self.value
The solution replaces the immutable type int with an instance of a mutable class, and that instance is what the dictionary refers to.
Furthermore, you can make changes to the object, and those changes are visible through the dictionary. It acts somewhat like a pointer.
Today I learned that Python caches the expression {}, and replaces it with a new empty dict when it's assigned to a variable:
print id({})
# 40357936
print id({})
# 40357936
x = {}
print id(x)
# 40357936
print id({})
# 40356432
I haven't looked at the source code, but I have an idea as to how this might be implemented. (Maybe when the reference count to the global {} is incremented, the global {} gets replaced.)
But consider this bit:
def f(x):
    x['a'] = 1
    print(id(x), x)
print(id(x))
# 34076544
f({})
# (34076544, {'a': 1})
print(id({}), {})
# (34076544, {})
print(id({}))
# 34076544
f modifies the global dict without causing it to be replaced, and it prints out the modified dict. But outside of f, despite the id being the same, the global dict is now empty!
What is happening??
It's not being cached -- if you don't assign the result of {} anywhere, its reference count is 0 and it's cleaned up right away. It just happened that the next one you allocated reused the memory from the old one. When you assign it to x you keep it alive, and then the next one has a different address.
In your function example, once f returns there are no remaining references to your dict, so it gets cleaned up too, and the same thing applies.
Python isn't doing any caching here. There are two possibilities when id() gives the same return value at different points in a program:
id() was called on the same object twice
The first object that id() was called on was garbage collected before the second object was created, and the second object was created in the same memory location as the original
In this case, it was the second one. This means that even though print id({}); print id({}) may print the same value twice, each call is on a distinct object.
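A hedged sketch of the rule behind this: two objects that are alive at the same time always have different ids, whereas an id may be reused once the first object is gone. Whether reuse actually happens is an implementation detail, so the second result below is deliberately left without an expected value.

```python
first = {}
second = {}
distinct_while_alive = id(first) != id(second)   # guaranteed: both dicts exist

# Each {} here is a temporary that may be freed before the next one is created,
# so the two ids may or may not be equal.
temporaries_share_id = id({}) == id({})

print(distinct_while_alive)
```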