After reading about this in a few places, including here: Understanding dict.copy() - shallow or deep?
It claims that dict.copy() creates a shallow copy, otherwise known as a reference to the same values. However, when I play with it myself in the Python 3 REPL, I only seem to get a copy by value:
a = {'one': 1, 'two': 2, 'three': 3}
b = a.copy()
print(a is b) # False
print(a == b) # True
a['one'] = 5
print(a) # {'one': 5, 'two': 2, 'three': 3}
print(b) # {'one': 1, 'two': 2, 'three': 3}
Does this mean that shallow and deep copies do not necessarily affect immutable values?
Integers are immutable; the difference shows up when the values are mutable objects. Check this similar example:
import copy
a = {'one': [], 'two': 2, 'three': 3}
b = a.copy()
c = copy.deepcopy(a)
print(a is b) # False
print(a == b) # True
a['one'].append(5)
print(a) # {'one': [5], 'two': 2, 'three': 3}
print(b) # {'one': [5], 'two': 2, 'three': 3}
print(c) # {'one': [], 'two': 2, 'three': 3}
What you are observing has nothing to do with dictionaries at all. You are getting confused by the difference between binding and mutation.
Let's forget dictionaries at first, and demonstrate the issue with simple variables. Once we understand the fundamental point, we can then go back to the dictionary example.
a = 1
b = a
a = 2
print(b) # prints 1
On the first line you create a binding between the name a and the object 1.
On the second line you create a binding between the name b and the value of the expression a ... which is the very same object 1 which was bound to the name a on the previous line.
On the third line you create a binding between the name a and the object 2, in the process forgetting that there ever was a binding between a and the 1.
It is vital to note that this last step cannot in any way affect b!
The situation is completely symmetric, so if line 3 were b = 2 this would have absolutely no effect on a.
Now, people often mistakenly claim that this is somehow a result of the immutability of integers. Integers are immutable in Python, but that is completely irrelevant. If we do something similar with some mutable objects, say lists, then we get equivalent results.
a = [1]
b = a
a = [2]
print(b) # prints [1]
Once again
a is bound to some object
b is bound to the same object
a is now rebound to some different object
This cannot affect b or the object to which it is bound [*] in any way! No attempt has been made anywhere to mutate any object, so mutability is completely irrelevant to this situation.
[*] actually, it does change the reference count of the object (at least in CPython) but that's not really an observable property of the object.
However, if, instead of rebinding a, we
Use a to access the object to which it is bound
Mutate that object
then we will affect b, because the object to which b is bound will be mutated:
a = [1]
b = a
a[0] = 2
print(b) # prints [2]
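As a minimal sketch (the function names rebind_demo and mutate_demo are illustrative, not from the answer), the two cases above can be run side by side; each function returns the final values of a and b:

```python
def rebind_demo():
    a = [1]
    b = a          # b and a now refer to the same list object
    a = [2]        # rebinding: a now refers to a brand-new list
    return a, b

def mutate_demo():
    a = [1]
    b = a          # the same list object again
    a[0] = 2       # mutation: changes the shared list itself
    return a, b

print(rebind_demo())  # ([2], [1])
print(mutate_demo())  # ([2], [2])
```

Only the second function touches the object itself, so only there does b observe the change.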
In summary, you have to understand:
The difference between binding and mutation. The former affects a variable (or more generally a location) while the latter affects an object. Therein lies the key difference.
Rebinding a name (or location in general) cannot affect the object to which that name was previously bound (beyond changing its reference count).
Now, in your example you create something that looks (conceptually) like this:
a ---> { 'three' ----------------------> 3
         'two'   -------------> 2        ^
         'one'   ---> 1 }       ^        |
                      ^         |        |
                      |         |        |
b ---> { 'one'   -----+         |        |
         'two'   ---------------+        |
         'three' ------------------------+ }
and then a['one'] = 5 simply rebinds the location a['one'] so that it is no longer bound to the 1 but to 5. In other words, the arrow coming out of the 'one' in a's dictionary now points somewhere else.
It is important to remember that this has absolutely nothing to do with the immutability of integers. If you make every integer in your example mutable (for example, by replacing it with a list that contains it: replace every occurrence of 1 with [1], and similarly for 2 and 3), you will still observe essentially the same behaviour: a['one'] = [5] will not affect the value of b['one'].
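A short sketch of that all-lists variant, assuming the values 1, 2 and 3 are wrapped in lists as suggested; it contrasts rebinding a slot with mutating a shared list:

```python
a = {'one': [1], 'two': [2], 'three': [3]}
b = a.copy()          # shallow copy: new dict, same list objects inside

a['one'] = [5]        # rebinding the slot a['one']: b['one'] is untouched
print(b['one'])       # [1]

a['two'].append(20)   # mutating the list object shared by both dicts
print(b['two'])       # [2, 20]
```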
Now, in this latest example, where the values stored in your dictionary are lists and therefore structured, it becomes possible to distinguish between shallow and deep copy:
b = a will not copy the dictionary at all: it will simply make b a new binding to the same single dictionary.
b = copy.copy(a) will create a new dictionary with internal bindings to the same lists. The dictionary is copied, but its contents (below the top level) are simply referenced by the new dictionary.
b = copy.deepcopy(a) will also create a new dictionary, but it will also create new objects to populate that dictionary, rather than referencing the original ones.
Consequently, if you mutate (rather than rebind) something in the shallow copy case, the other dictionary will 'see' the mutation, because the two dictionaries share objects. This does not happen with the deep copy.
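The three options can be compared in one script. This is a sketch with illustrative names (original, alias, shallow, deep) showing which of them 'see' a mutation and which see a rebinding:

```python
import copy

original = {'one': [1], 'two': [2]}
alias = original                  # no copy: the same dict object
shallow = copy.copy(original)     # new dict, same inner lists
deep = copy.deepcopy(original)    # new dict, new inner lists

original['one'].append(99)        # mutate a list reached through original
print(alias['one'])    # [1, 99]  same dict
print(shallow['one'])  # [1, 99]  different dict, shared list
print(deep['one'])     # [1]      independent list

original['two'] = [0]             # rebind a slot in the original dict
print(alias['two'])    # [0]  same dict, so the rebinding is visible
print(shallow['two'])  # [2]  rebinding a slot does not touch the copy
```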
Consider the following situation; once it is explained, the referencing behaviour and the copy() method should be easy to understand.
dic = {'data1': 100, 'data2': -54, 'data3': 247}
dict1 = dic
dict2 = dic.copy()
print(dict2 is dic)
# False
print(dict1 is dic)
# True
The first print statement prints False because dict2 and dic are two separate dictionaries occupying separate memory, even though they have the same contents. This is what the copy() method does.
Second, assigning dic to dict1 does not create a separate dictionary; instead, dict1 becomes another reference to the same dictionary as dic, which is why the second print statement prints True.
A shallow copy of some container means that a new identical object is returned, but that its values are the same objects.
This means that mutating a value reached through the copy also mutates the value seen through the original, because it is the same object. In your example, you are not mutating a value; you are instead rebinding the key 'one' in the original to a different value.
Here is an example of value mutation.
d = {'a': []}
d_copy = d.copy()
print(d is d_copy) # False
print(d['a'] is d_copy['a']) # True
d['a'].append(1)
print(d_copy) # {'a': [1]}
On the other side, a deepcopy of a container returns a new identical object, but where the values have been recursively copied as well.
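One detail worth knowing about copy.deepcopy: it keeps a memo of the objects it has already copied, so if the same object appears twice in the original, the copy contains one new object twice as well. A small sketch, with illustrative names:

```python
import copy

shared = []
d = {'a': shared, 'b': shared}      # both keys refer to one list object
d_deep = copy.deepcopy(d)

print(d_deep == d)                  # True: equal contents
print(d_deep['a'] is d['a'])        # False: the inner list was copied
print(d_deep['a'] is d_deep['b'])   # True: the sharing is reproduced in the copy
```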
List reference append code
a = [1,2,3,4,5]
b = a
b.append(6)
print(a)
print(b)
#ans:
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
Integer reference in int
a = 1
b = a
b +=1
print(a)
print(b)
#ans:
1
2
How do references work in Python for integers vs. lists? In the list example both values are the same, so why is a not 2 in the integer example?
In Python, everything is an object, and every name refers to (points at) an object, per the docs.
On that page you can scroll down and find the following:
Numeric objects are immutable; once created their value never changes
Under that you'll see the int type defined, so it makes perfect sense that your second example behaves the way it does.
On the top of the same page, you'll find the following:
Every object has an identity, a type and a value. An object’s identity never changes once it has been created; you may think of it as the object’s address in memory.
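Like Java, Python does not let you make one name an alias for another name: assigning to a name only changes which object that name points at. Python, like Java, is also pass-by-value (where the values are references to objects) and has no pass-by-reference semantics.
Looking at your integer example (written here as b = a + 1, which has the same effect on an int as b += 1):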
>>> a = 1
>>> hex(id(a))
'0x7ffdc64cd420'
>>> b = a + 1
>>> hex(id(b))
'0x7ffdc64cd440'
>>> print(a)
1
>>> print(b)
2
This shows that b = a + 1 leaves a at 1 and makes b equal to 2. Because int is immutable, the addition cannot change the object a points to; b is bound to a different int object instead. In CPython, names bound to the value 1 also all point to the same address:
>>> a = 1
>>> b = 2
>>> c = 1
>>> hex(id(a))
'0x7ffdc64cd420'
>>> hex(id(b))
'0x7ffdc64cd440'
>>> hex(id(c))
'0x7ffdc64cd420'
This only holds true for the values -5 to 256 in CPython (the C implementation), so beyond that range you may get new addresses, but the immutability shown above still holds. I've shown you the sharing of memory addresses for a reason. On the same page you'll find the following:
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after a = 1; b = 1, a and b may or may not refer to the same object with the value one, depending on the implementation, but after c = []; d = [], c and d are guaranteed to refer to two different, unique, newly created empty lists. (Note that c = d = [] assigns the same object to both c and d.)
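The guarantee at the end of that quote can be checked directly; unlike the int case, the outcome here does not depend on the implementation (the names are illustrative):

```python
c1 = []
d1 = []
print(c1 is d1)   # False: two distinct, newly created empty lists

c2 = d2 = []      # one list object bound to two names
print(c2 is d2)   # True
c2.append(1)
print(d2)         # [1]: the mutation is visible through either name
```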
So your example:
>>> a = [1, 2, 3, 4, 5]
>>> hex(id(a))
'0x17292e1cbc8'
>>> b = a
>>> hex(id(b))
'0x17292e1cbc8'
I could stop right here: it's obvious that both a and b refer to the same object in memory at address 0x17292e1cbc8. That's because the above is like saying:
# Let's pretend the address of `[1, 2, 3, 4, 5]` is stored in a (an analogy, not real Python semantics)
>>> a = 0x17292e1cbc8
>>> b = a
>>> hex(b)
'0x17292e1cbc8'
Long story short: you're simply assigning a pointer to a new name, but both names point to the same object in memory! Note: this is not the same as a shallow copy, because no new outer compound object is created.
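Tying this back to the original question, here is a sketch that contrasts b += 1 on an int with += on a list, and makes a real (shallow) copy with slicing for comparison:

```python
# Integers: b += 1 cannot change the int object, so b is rebound to a new one.
a = 1
b = a
b += 1
print(a, b)        # 1 2

# Lists: += (like append) mutates the shared list in place.
x = [1, 2, 3, 4, 5]
y = x
y += [6]
print(x)           # [1, 2, 3, 4, 5, 6]
print(y is x)      # True: still one list

# An actual shallow copy: a new outer list holding the same items.
z = x[:]
z.append(7)
print(x)           # [1, 2, 3, 4, 5, 6]  the original is unchanged
print(z is x)      # False
```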
I want to understand why a = [], del a, and del a[:] behave so differently.
I ran a test for each to illustrate the differences I witnessed:
>>> # Test 1: Reset with a = []
...
>>> a = [1,2,3]
>>> b = a
>>> a = []
>>> a
[]
>>> b
[1, 2, 3]
>>>
>>> # Test 2: Reset with del a
...
>>> a = [1,2,3]
>>> b = a
>>> del a
>>> a
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> b
[1, 2, 3]
>>>
>>> # Test 3: Reset with del a[:]
...
>>> a = [1,2,3]
>>> b = a
>>> del a[:]
>>> a
[]
>>> b
[]
I did find Different ways of clearing lists, but I didn't find an explanation for the differences in behaviour. Can anyone clarify this?
Test 1
>>> a = [1,2,3] # set a to point to a list [1, 2, 3]
>>> b = a # set b to what a is currently pointing at
>>> a = [] # now you set a to point to an empty list
# Step 1: A --> [1 2 3]
# Step 2: A --> [1 2 3] <-- B
# Step 3: A --> [ ] [1 2 3] <-- B
# at this point a points to a new empty list
# whereas b points to the original list of a
Test 2
>>> a = [1,2,3] # set a to point to a list [1, 2, 3]
>>> b = a # set b to what a is currently pointing at
>>> del a # delete the reference from a to the list
# Step 1: A --> [1 2 3]
# Step 2: A --> [1 2 3] <-- B
# Step 3: [1 2 3] <-- B
# so a no longer exists because the reference
# was destroyed but b is not affected because
# b still points to the original list
Test 3
>>> a = [1,2,3] # set a to point to a list [1, 2, 3]
>>> b = a # set b to what a is currently pointing at
>>> del a[:] # delete the contents of the original
# Step 1: A --> [1 2 3]
# Step 2: A --> [1 2 3] <-- B
# Step 3: A --> [ ] <-- B
# both a and b are empty because they were pointing
# to the same list whose elements were just removed
Of your three "ways of deleting Python lists", only one actually alters the original list object; the other two only affect the name.
a = [] creates a new list object, and assigns it to the name a.
del a deletes the name, not the object it refers to.
del a[:] deletes all references from the list referenced by the name a (although, similarly, it doesn't directly affect the objects that were referenced from the list).
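The three tests can be wrapped in functions to compare them side by side (the function names are hypothetical). Inside a function, referencing a deleted local raises UnboundLocalError, which is a subclass of NameError:

```python
def reset_by_rebinding():
    a = [1, 2, 3]
    b = a
    a = []              # a now names a new, empty list; b keeps the old one
    return a, b

def reset_by_del_name():
    a = [1, 2, 3]
    b = a
    del a               # removes the name a; the list survives through b
    try:
        a               # referencing the deleted name fails
        name_gone = False
    except NameError:   # inside a function this is UnboundLocalError, a NameError subclass
        name_gone = True
    return b, name_gone

def reset_by_del_slice():
    a = [1, 2, 3]
    b = a
    del a[:]            # empties the shared list object (same effect as a.clear())
    return a, b

print(reset_by_rebinding())   # ([], [1, 2, 3])
print(reset_by_del_name())    # ([1, 2, 3], True)
print(reset_by_del_slice())   # ([], [])
```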
It's probably worth reading this article on Python names and values to better understand what's going on here.
Test 1: rebinds a to a new object, while b still holds a reference to the original object. a is just a name, and rebinding a to a new object does not change the original object that b points to.
Test 2: you del the name a so it no longer exists but again you still have a reference to the object in memory with b.
Test 3: a[:], just as when you copy a list or change all the elements of a list, refers to the references stored in the list, not to the name a. b gets cleared too because it refers to the same list object as a, so changes to the contents of that list affect b.
The behaviour is documented:
There is a way to remove an item from a list given its index instead
of its value: the del statement. This differs from the pop()
method which returns a value. The del statement can also be used to
remove slices from a list or clear the entire list (which we did
earlier by assignment of an empty list to the slice). For example:
>>> a = [-1, 1, 66.25, 333, 333, 1234.5]
>>> del a[0]
>>> a
[1, 66.25, 333, 333, 1234.5]
>>> del a[2:4]
>>> a
[1, 66.25, 1234.5]
>>> del a[:]
>>> a
[]
del can also be used to delete entire variables:
>>> del a
Referencing the name a hereafter is an error (at least until another
value is assigned to it). We'll find other uses for del later.
So only del a actually deletes a; a = [] rebinds a to a new object, and del a[:] clears the list a refers to. In your second test, if b did not hold a reference to the list, the list would be garbage collected.
del a
is removing the variable a from the scope. Quoting from the Python docs:
Deletion of a name removes the binding of that name from the local or
global namespace, depending on whether the name occurs in a global
statement in the same code block.
del a[:]
is simply removing the contents of a, since the deletion is passed to the object a refers to instead of being applied to the name. Again from the docs:
Deletion of attribute references, subscriptions and slicings is passed
to the primary object involved; deletion of a slicing is in general
equivalent to assignment of an empty slice of the right type (but even
this is determined by the sliced object).
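To see that del a[:] really is handed to the object, here is a sketch with a hypothetical Recorder class that only records the key its __delitem__ method receives; for a slice like [:], that key is a slice object:

```python
class Recorder:
    """Hypothetical container that only records the keys passed to __delitem__."""

    def __init__(self):
        self.calls = []

    def __delitem__(self, key):
        self.calls.append(key)

r = Recorder()
del r[:]         # calls r.__delitem__(slice(None, None, None))
del r[2:4]       # calls r.__delitem__(slice(2, 4, None))
print(r.calls)   # [slice(None, None, None), slice(2, 4, None)]
```

A real list implements __delitem__ by removing the selected elements, which is why del a[:] empties the list while leaving the name a bound to it.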
Of those three methods, only the third actually changes the list that 'a' points to (it empties it). Let's do a quick overview.
When you write a = [1, 2, 3], it creates a list in memory with the items [1, 2, 3] and then makes 'a' point to it. When you write b = a, no copy is made at all (not even a shallow copy): it just makes 'b' point to the same block of memory as 'a'. A shallow copy would create a new list holding the same items, and a deep copy would also recursively copy the items themselves into new memory.
Now, when you write a = [], you are creating a new list with no items in it and making 'a' point to it. The original list still exists, and 'b' is still pointing to it.
In the second case, del a deletes the name (pointer) 'a', not the list itself. This means b can still point to it.
Lastly, del a[:] goes through the data 'a' is pointing to and empties its contents. 'a' still exists, so you can use it. 'b' also exists, but it points to the same, now empty, list as 'a', which is why it gives the same output.
To understand the difference between the various ways of deleting lists, let us look at each of them one by one.
>>> a1 = [1,2,3]
A new list object is created and assigned to a1.
>>> a2 = a1
We assign a1 to a2, so a2 now points to the same list object that a1 points to.
DIFFERENT METHODS EXPLAINED BELOW (each one starts from the state above):
Method-1 Using [] :
>>> a1 = []
On assigning an empty list to a1, there is no effect on a2. a2 still refers to the same list object but a1 now refers to an empty list.
Method-2 Using del a1[:]
>>> del a1[:]
This deletes all the contents of the list object which a1 points to, so a1 still refers to that same list, which is now empty. Since a2 refers to the same list object, a2 is empty as well.
Method-3 Using del a1
>>> del a1
>>> a1
NameError: name 'a1' is not defined
This deletes the variable a1 from the scope. Here, just the variable a1 is removed, the original list is still present in the memory. a2 still points to that original list which a1 used to point to. If we now try to access a1, we will get a NameError.
I have found some Python behavior that confuses me.
>>> A = {1:1}
>>> B = A
>>> A[2] = 2
>>> A
{1: 1, 2: 2}
>>> B
{1: 1, 2: 2}
So far, everything is behaving as expected. A and B both reference the same, mutable, dictionary and altering one alters the other.
>>> A = {}
>>> A
{} # As expected
>>> B
{1: 1, 2: 2} # Why is this not an empty dict?
Why do A and B no longer reference the same object?
I have seen this question: Python empty dict not being passed by reference? and it verifies this behavior, but the answers explain how to fix the provided script not why this behavior occurs.
Here is a pictorial representation *:
A = {1: 1}
# A -> {1: 1}
B = A
# A -> {1: 1} <- B
A[2] = 2
# A -> {1: 1, 2: 2} <- B
A = {}
# {1: 1, 2: 2} <- B
# A -> {}
A = {} creates a completely new object and reassigns the identifier A to it, but does not affect B or the dictionary A previously referenced. You should read this article, it covers this sort of thing pretty well.
Note that, as an alternative, you can use the dict.clear method to empty the dictionary in-place:
>>> A = {1: 1}
>>> B = A
>>> A[2] = 2
>>> A.clear()
>>> B
{}
As A and B are still references to the same object, both now "see" the empty version.
* To a first approximation - similar referencing behaviour is going on within the dictionary too, but as the values are immutable it's less relevant.
Remember, variables in Python act like labels. In the first example, you have a dictionary {1: 1, 2: 2} that stays in memory. A points to that dictionary, and you say B points to what A is pointing to (it won't point to the label A, but rather to what the label A is pointing to).
In the second example, A and B are both pointing to this dictionary, but you point A to a new dictionary ({}). B stays pointing to the old dictionary in memory from the first example.
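You are changing which dictionary A points to when you say A = {}; you are not destroying the old dictionary. This sample should demonstrate it for you:
A = {1: 1}
print(id(A))
B = A
print(id(B))   # same id as A: one dictionary, two names
B[2] = 5
print(id(B))   # still the same id: the dictionary was modified in place
print(A)       # {1: 1, 2: 5}
print(id(A))
A = {}
print(id(A))   # a different id: A now names a new dictionary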
It's about the difference between creating a new dictionary and changing an existing dictionary.
A[2] = 2
modifies the dictionary by adding a new key; the existing contents are still part of that dictionary.
A = {}
This creates a totally new empty dictionary.
Think about it like this: A is the name of one object, and then you make B a different name for that same object. That's the first part. In the second code, you make a new object and say: OK, that old object isn't called A anymore; this new object is called A.
B isn't pointing at A. B and A are first both names for the same object, and then names for two different objects.
Can someone tell me why, when you copy dictionaries, they both point to the same object, so that a change to one affects the other, but this is not the case for lists?
I am interested in the logic behind why they would set up the dictionary one way, and lists another. It's confusing and if I know the reason behind it I will probably remember.
dict = {'Dog' : 'der Hund' , 'Cat' : 'die Katze' , 'Bird' : 'der Vogel'}
otherdict = dict
dict.clear()
print(otherdict)
This results in otherdict being {}, so both names point to the same dictionary. But this isn't the case for lists.
list = ['one' , 'two' , 'three']
newlist = list
list = list + ['four']
print(newlist)
newlist still holds on to the old list, so they are not pointing to the same object. What is the rationale behind this difference?
Some code with similar intent to yours will show that changes to one list do affect other references.
>>> list = ['one' , 'two' , 'three']
>>> newlist = list
>>> list.append('four')
>>> print(newlist)
['one', 'two', 'three', 'four']
That is the closest analogy to your dictionary code. You call a method on the original object.
The difference is that with your code you used a separate plus and assignment operator
list = list + ['four']
This is two separate operations. First the interpreter evaluates the expression list + ['four']. It must put the result of that computation in a new list object, because it does not anticipate that you will assign the result back to list. If you had said other_list = list + ['four'], you would have been very annoyed if list were modified.
Now there is a new object, containing the result of list + ['four']. That new object is assigned to list. list is now a reference to the new object, whereas newlist remains a reference to the old object.
Even this behaves differently:
list += ['four']
For a mutable object such as a list, += modifies the object in place, so newlist would see the change.
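A sketch of both variants, using the names words1, alias1, words2 and alias2 (hypothetical, and chosen so as not to shadow the built-in list):

```python
words1 = ['one', 'two', 'three']
alias1 = words1
words1 = words1 + ['four']   # + builds a new list, then words1 is rebound to it
print(alias1)                # ['one', 'two', 'three']

words2 = ['one', 'two', 'three']
alias2 = words2
words2 += ['four']           # += on a list extends the existing object in place
print(alias2)                # ['one', 'two', 'three', 'four']
print(alias2 is words2)      # True
```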
Your two cases are doing different things to the objects you're copying, that's why you're seeing different results.
First off, you're not really copying them. You're simply making new "references" or (in more Pythonic terms) binding new names to the same objects.
With the dictionary, you're calling dict.clear, which discards all the contents. This modifies the existing object, so you see the results through both of the references you have to it.
With the list, you're rebinding one of the names to a new list. This new list is not the same as the old list, which remains unmodified.
You could recreate the behavior of your dictionary code with the lists if you want. A slice assignment is one way to modify a whole list at once:
old_list[:] = [] # empties the list in place
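A short sketch of slice assignment (the name other is illustrative). The right-hand side does not need to be empty, and the list keeps its identity throughout:

```python
old_list = ['one', 'two', 'three']
other = old_list                    # second name for the same list
original_id = id(old_list)

old_list[:] = ['x', 'y']            # replace every element in place
print(other)                        # ['x', 'y']

old_list[:] = []                    # empty it in place (same effect as old_list.clear())
print(other)                        # []
print(id(old_list) == original_id)  # True: still the same object
```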
One addendum, unrelated to the main issue above: It's a very bad idea to use names like dict and list as variables in your own code. That's because those are the names of the builtin Python dictionary and list types. By using the same names, you shadow the built in ones, which can lead to confusing bugs.
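A minimal sketch of the kind of bug shadowing causes. The function name is hypothetical, and the shadowing is kept local to the function so the built-in is unaffected outside it:

```python
def shadowing_breaks_list():
    list = ['one', 'two', 'three']   # shadows the built-in list type in this scope
    try:
        return list('abc')           # calling a list object raises TypeError
    except TypeError as exc:
        return str(exc)

print(shadowing_breaks_list())  # 'list' object is not callable
print(list('abc'))              # ['a', 'b', 'c']: the built-in still works out here
```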
In your dictionary example, you've created a dictionary and stored it in dict. You then stored the same reference in otherdict. Now both dict and otherdict point to the same dictionary. Then you call dict.clear(), which clears the dictionary that both dict and otherdict point to.
In your list example, you've created a list and stored it in list. You then stored the same reference in newlist. Then you created a new list consisting of the elements of list plus another element, and stored the new list in list. You did not modify the original list; you created a new list and changed what list points to.
You can get your list example to show the same behavior as the dictionary example by using list.append('four') rather than list = list + ['four'].
Do you mean this?
>>> d = {'test1': 1, 'test2': 2}
>>> new_d = d
>>> new_d['test3'] = 3
>>> new_d
{'test1': 1, 'test2': 2, 'test3': 3}
>>> d # same object, so the change shows up here too
{'test1': 1, 'test2': 2, 'test3': 3}
>>> lst = [1, 2, 3]
>>> new_lst = lst
>>> new_lst.append(5)
>>> new_lst
[1, 2, 3, 5]
>>> lst # same object
[1, 2, 3, 5]
>>> new_lst += [5]
>>> lst # += extended the same list in place
[1, 2, 3, 5, 5]
>>> my_tuple = (1, 2, 3)
>>> new_my_tuple = my_tuple
>>> new_my_tuple += (5,)
>>> new_my_tuple
(1, 2, 3, 5)
>>> my_tuple # immutable, so it is not affected by new_my_tuple
(1, 2, 3)
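Assignment never copies: new_d, new_lst and new_my_tuple all start out as references to the very same objects as d, lst and my_tuple. The difference lies in the operations: for mutable objects such as lists and dictionaries, item assignment, append and += modify the shared object in place, whereas for immutable objects such as tuples, += has to build a new object and rebind the name, so the original is unaffected.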
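A sketch that checks identity explicitly, reusing the tuple names from the session above (shared_before is an illustrative name):

```python
my_tuple = (1, 2, 3)
new_my_tuple = my_tuple
shared_before = new_my_tuple is my_tuple
print(shared_before)             # True: assignment did not copy the tuple

new_my_tuple += (5,)             # tuples are immutable, so += builds a new tuple
print(new_my_tuple is my_tuple)  # False: only new_my_tuple was rebound
print(my_tuple)                  # (1, 2, 3)
print(new_my_tuple)              # (1, 2, 3, 5)
```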
I saw an article about immutable objects.
It says when:
variable = immutable
As assign the immutable to a variable.
for example
a = b # b is an immutable
It says in this case a refers to a copy of b, not reference to b.
If b is mutable, the a wiil be a reference to b
so:
a = 10
b = a
a =20
print (b) #b still is 10
but in this case:
a = 10
b = 10
a is b # return True
print(id(10))
print(id(a))
print(id(b)) # id(a) == id(b) == id(10)
If a is a copy of 10, and b is also a copy of 10, why is id(a) == id(b) == id(10)?
"Simple" immutable literals (in CPython, in particular integers between -5 and 256) are interned or cached, which means that even when bound to different names, they will still be the same object.
>>> a = 'foo'
>>> b = 'foo'
>>> a is b
True
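Whether two equal strings are the same object is normally an implementation detail. The standard library's sys.intern() makes it explicit: equal strings passed through it come back as one canonical object. A small sketch:

```python
import sys

a = ''.join(['fo', 'o'])   # built at run time; may or may not be the same object as 'foo'
b = 'foo'
print(a == b)              # True: equal values

a = sys.intern(a)
b = sys.intern(b)
print(a is b)              # True: both names now refer to the one interned string
```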
While that article may be correct for some languages, it's wrong for Python.
When you do any normal assignment in Python:
some_name = some_name_or_object
You aren't making a copy of anything. You're just pointing the name at the object on the right side of the assignment.
Mutability is irrelevant.
More specifically, the reason:
a = 10
b = 10
a is b
is True, is that 10 is interned -- meaning Python keeps one 10 in memory, and anything that is set to 10 points to that same 10.
If you do
a = object()
b = object()
a is b
You'll get False, but
a = object()
b = a
a is b
will still be True.
Because interning has already been explained, I'll only address the mutable/immutable stuff:
As assign the immutable to a variable.
When talking about what is actually happening, I wouldn't choose this wording.
We have objects (stuff that lives in memory) and means of accessing those objects: names (or variables), which are "bound" to an object by reference. (You could say they point to the objects.)
The names/variables are independent of each other; they can happen to be bound to the same object, or to different ones. Rebinding one such variable doesn't affect any others.
There is no such thing as passing by value or passing by reference. In Python, you always pass/assign "by object". When assigning or passing a variable to a function, Python never creates a copy, it always passes/assigns the very same object you already have.
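A minimal sketch of passing "by object" (the function names are hypothetical): the function receives the very object the caller has, so mutating it is visible to the caller, while rebinding the parameter is not:

```python
def append_item(items):
    items.append(4)        # mutates the object the caller passed in

def rebind(items):
    items = [0]            # rebinds the local parameter name only
    return items

def returns_argument(obj):
    return obj             # hands back exactly the object it received

numbers = [1, 2, 3]
append_item(numbers)
print(numbers)                               # [1, 2, 3, 4]
rebind(numbers)
print(numbers)                               # [1, 2, 3, 4]: unchanged by the rebinding
print(returns_argument(numbers) is numbers)  # True: no copy was made
```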
Now, when you try to modify an immutable object, what happens? As already said, the object is immutable, so what happens instead is the following: Python creates a modified copy.
As for your example:
a = 10
b = a
a =20
print (b) #b still is 10
This is not related to mutability. On the first line, you bind the int object with the value 10 to the name a. On the second line, you bind the object referred to by a to the name b.
On the third line, you bind the int object with the value 20 to the name a, that does not change what the name b is bound to!
It says in this case a refers to a copy of b, not reference to b. If b
is mutable, the a wiil be a reference to b
As already mentioned before, there is no such thing as references in Python. Names in Python are bound to objects. Different names (or variables) can be bound to the very same object, but there is no connection between the different names themselves. When you modify things, you modify objects, that's why all other names that are bound to that object "see the changes", well they're bound to the same object that you've modified, right?
If you bind a name to a different object, that's just what happens. There's no magic done to the other names, they stay just the way they are.
As for the example with lists:
In [1]: smalllist = [0, 1, 2]
In [2]: biglist = [smalllist]
In [3]: biglist
Out[3]: [[0, 1, 2]]
Instead of In[1] and In[2], I might have written:
In [1]: biglist = [[0, 1, 2]]
In [2]: smalllist = biglist[0]
This is equivalent.
The important thing to see here, is that biglist is a list with one item. This one item is, of course, an object. The fact that it is a list does not conjure up some magic, it's just a simple object that happens to be a list, that we have attached to the name smalllist.
So, accessing biglist[i] is exactly the same as accessing smalllist, because they are the same object. We never made a copy, we passed the object.
In [14]: smalllist is biglist[0]
Out[14]: True
Because lists are mutable, we can change smalllist, and see the change reflected in biglist. Why? Because we actually modified the object referred to by smalllist. We still have the same object (apart from the fact that it's changed). But biglist will "see" that change because as its first item, it references that very same object.
In [4]: smalllist[0] = 3
In [5]: biglist
Out[5]: [[3, 1, 2]]
The same is true when we "double" the list (starting again from smalllist = [0, 1, 2] and biglist = [smalllist]):
In [11]: biglist *= 2
In [12]: biglist
Out[12]: [[0, 1, 2], [0, 1, 2]]
What happens is this: We have a list: [object1, object2, object3] (this is a general example)
What we get is: [object1, object2, object3, object1, object2, object3]: It will just insert (i.e. modify "biglist") all of the items at the end of the list. Again, we insert objects, we do not magically create copies.
So when we now change an item inside the first item of biglist:
In [20]: biglist[0][0]=3
In [21]: biglist
Out[21]: [[3, 1, 2], [3, 1, 2]]
We could also just have changed smalllist, because for all intents and purposes, biglist could be represented as: [smalllist, smalllist] -- it contains the very same object twice.
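The same sharing explains a common pitfall when building nested lists with *. A sketch (names are illustrative) contrasting repetition with a list comprehension that creates a new inner list on each iteration:

```python
row = [0, 1, 2]
grid = [row] * 2                        # two references to the same inner list
grid[0][0] = 3
print(grid)                             # [[3, 1, 2], [3, 1, 2]]
print(grid[0] is grid[1])               # True

grid2 = [[0, 1, 2] for _ in range(2)]   # a new inner list on each iteration
grid2[0][0] = 3
print(grid2)                            # [[3, 1, 2], [0, 1, 2]]
print(grid2[0] is grid2[1])             # False
```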