Difference between cloning and deepcopy? - python

I've just started programming, and am working my way through "How to think like a Computer Scientist" for Python. I haven't had any problems until I came to an exercise in Chapter 9:
def add_column(matrix):
    """
    >>> m = [[0, 0], [0, 0]]
    >>> add_column(m)
    [[0, 0, 0], [0, 0, 0]]
    >>> n = [[3, 2], [5, 1], [4, 7]]
    >>> add_column(n)
    [[3, 2, 0], [5, 1, 0], [4, 7, 0]]
    >>> n
    [[3, 2], [5, 1], [4, 7]]
    """
The code should make the above doctest pass. I was getting stuck on the last test: getting the original list to stay unaffected. I looked up the solution, which is the following:
    x = len(matrix)
    matrix2 = [d[:] for d in matrix]
    for z in range(x):
        matrix2[z] += [0]
    return matrix2
My question is this: why can't the second line be:
matrix2 = matrix[:]
When this line is in place, the original list gets edited to include the additional elements. The "How to Think..." guide makes it sound like cloning creates a new list that can be edited without affecting the original list. If that were true, what's going on here? If I use:
matrix2 = copy.deepcopy(matrix)
Everything works fine, but I wasn't under the impression that cloning would fail.
Any help would be greatly appreciated!

In your case, matrix contains other lists, so when you do matrix[:] you are cloning only the outer list. The clone holds references to the same inner lists, which are not cloned. So when you edit those inner lists, the change is visible through the original matrix too. However, if you append an item to the copy itself (matrix[:]), it will not be appended to the original list.
To visualize this, you can use the built-in id() function, which returns a number that is unique to each object for as long as that object exists.
a = [[1,2], [3,4], 5]
print 'id(a)', id(a)
print '>>', [id(i) for i in a]
not_deep = a[:]
# Notice that the ids of a and not_deep are different, so it's not the same list
print 'id(not_deep)', id(not_deep)
# but the lists inside of it have the same id, because they were not cloned!
print '>>', [id(i) for i in not_deep]
# Just to prove that a and not_deep are two different lists
not_deep.append([6, 7])
print 'a items:', len(a), 'not_deep items:', len(not_deep)
import copy
deep = copy.deepcopy(a)
# Again, a different list
print 'id(deep)', id(deep)
# And this time the nested lists are copied too (as would be any other mutable objects, not shown here)
# Notice the different ids; only the id of the immutable 5 is unchanged
print '>>', [id(i) for i in deep]
And the output:
id(a) 36169160
>> [36168904L, 35564872L, 31578344L]
id(not_deep) 35651784
>> [36168904L, 35564872L, 31578344L]
a items: 3 not_deep items: 4
id(deep) 36169864
>> [36168776L, 36209544L, 31578344L]
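Applied to the exercise from the question, the same effect can be checked directly with the `is` operator. This is a minimal sketch written for Python 3; the function names `add_column_shallow` and `add_column_rows` are made up for illustration and are not from the book.

```python
def add_column_shallow(matrix):
    """Buggy variant: matrix[:] copies only the outer list."""
    matrix2 = matrix[:]
    for row in matrix2:
        row += [0]  # in-place extend: mutates the row shared with `matrix`
    return matrix2


def add_column_rows(matrix):
    """Working variant: every row is sliced, so each row is a new list."""
    matrix2 = [row[:] for row in matrix]
    for row in matrix2:
        row += [0]
    return matrix2


n = [[3, 2], [5, 1], [4, 7]]
result = add_column_rows(n)
print(result)  # [[3, 2, 0], [5, 1, 0], [4, 7, 0]]
print(n)       # [[3, 2], [5, 1], [4, 7]]

m = [[0, 0], [0, 0]]
shallow = add_column_shallow(m)
print(shallow is m)        # False: the outer list is new
print(shallow[0] is m[0])  # True: the rows are shared
print(m)                   # [[0, 0, 0], [0, 0, 0]]
```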

If you have nested lists, a shallow copy only copies the references to those nested lists.
>>> a = [1]
>>> b = [2]
>>> c = [a, b]
>>> c
[[1], [2]]
>>> d = c[:]
>>> d
[[1], [2]]
>>> d[1].append(2)
>>> d
[[1], [2, 2]]
>>> c
[[1], [2, 2]]
Whereas with copy.deepcopy() (after import copy, and starting again from c = [[1], [2]]):
>>> d = copy.deepcopy(c)
>>> d[1].append(2)
>>> c
[[1], [2]]
>>> d
[[1], [2, 2]]
This is true of any mutable items. copy.deepcopy() will attempt to make sure that they are copied too.
It's also worth noting that d = c[:] isn't a very clear way to copy a list. A more readable alternative is d = list(c) (list() returns a new list from any iterable, including another list). Clearer still is copy.copy(). All three make shallow copies.
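As a quick check, the three spellings mentioned above (slicing, list() and copy.copy()) behave the same way, and only copy.deepcopy() duplicates the nested lists. A minimal Python 3 sketch, with variable names chosen purely for illustration:

```python
import copy

c = [[1], [2]]

by_slice = c[:]
by_list = list(c)
by_copy = copy.copy(c)
deep = copy.deepcopy(c)

# All three shallow copies are new outer lists that share the inner lists.
for shallow in (by_slice, by_list, by_copy):
    print(shallow is c, shallow[0] is c[0])  # False True

# The deep copy has its own inner lists.
print(deep is c, deep[0] is c[0])  # False False

c[1].append(2)
print(by_list)  # [[1], [2, 2]]
print(deep)     # [[1], [2]]
```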

Related

Why does Python react differently on assigning matrix element depending on how you build the matrix? [duplicate]

I create a list of lists and want to append items to the individual lists, but when I try to append to one of the lists (a[0].append(2)), the item gets added to all lists.
a = []
b = [1]
a.append(b)
a.append(b)
a[0].append(2)
a[1].append(3)
print(a)
Gives: [[1, 2, 3], [1, 2, 3]]
Whereas I would expect: [[1, 2], [1, 3]]
Changing the way I construct the initial list of lists, making b an integer instead of a list and putting the brackets inside .append(), gives me the desired output:
a = []
b = 1
a.append([b])
a.append([b])
a[0].append(2)
a[1].append(3)
print(a)
Gives: [[1, 2], [1, 3]]
But why? It is not intuitive that the result should be different. I know this has to do with there being multiple references to the same list, but I don't see where that is happening.
It is because the list contains references to objects. Your list doesn't contain [[1 2 3] [1 2 3]], it is [<reference to b> <reference to b>].
When you change the object (by appending something to b), you are changing the object itself, not the list that contains the object.
To get the effect you desire, your list a must contain copies of b rather than references to b. To copy a list you can use the slice [:]. For example:
>>> a = []
>>> b = [1]
>>> a.append(b[:])
>>> a.append(b[:])
>>> a[0].append(2)
>>> a[1].append(3)
>>> print(a)
[[1, 2], [1, 3]]
The key is this part:
a.append(b)
a.append(b)
You are appending the same list twice, so both a[0] and a[1] are references to the same list.
In your second example, you are creating a new list each time you call append with a.append([b]), so they are separate objects that are initialized with the same integer value.
In order to make a shallow copy of a list, the idiom is
a.append(b[:])
Calling it twice puts two separate copies of b into a, which avoids the aliasing bug you report.
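The aliasing can be confirmed with the `is` operator, which compares object identity rather than value. A small Python 3 sketch; the names `aliased` and `independent` are illustrative only:

```python
b = [1]

aliased = []
aliased.append(b)
aliased.append(b)
print(aliased[0] is aliased[1])  # True: both slots hold the same list

independent = []
independent.append(b[:])
independent.append(b[:])
print(independent[0] is independent[1])  # False: two separate copies

independent[0].append(2)
independent[1].append(3)
print(independent)  # [[1, 2], [1, 3]]
print(b)            # [1]
```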

List Multiplication: repeat paired elements and de-reference

I've found numerous solutions on how to repeat a List of Lists while flattening the repeated elements, like so:
[[1,2] for x in range(3)]
: [[1, 2], [1, 2], [1, 2]]
However, I actually have a list of objects that need to be repeated in-line in the List, like so:
mylist = [var1, var2, var1, var2, var1, var2]
without the extra inner list, and importantly, var1 should be copied, not referenced, such that changes to mylist[0] will not also change mylist[2] & mylist[4] etc.
Other solutions regarding lists only still leave the list elements as “pointers”, meaning editing one element actually alters all the repeats of that element in the list.
Is there a readable one-liner for using multiplication/comprehension to do this while removing the “pointers” in the list?
This is an operation users may have to do often with my class, and a for loop will make construction of their list much less readable. I'm already sad that we can't just use [var1, var2] * 30 without overloading __mul__, which I'm trying to avoid. That would have made their code so readable.
For example, the following use of deepcopy() is not good enough, as the objects are referenced and thus unexpectedly altered
>>> from copy import deepcopy
>>> obj1=[1]; obj2=[200]
>>> a=deepcopy( 3*[obj1, obj2] )
>>> a
[[1], [200], [1], [200], [1], [200]]
>>> a[0][0]=50
>>> a
[[50], [200], [50], [200], [50], [200]]
The 50 propagated throughout the list, rather than changing only the first element.
You need
import copy
[copy.copy(y) for x in range(3) for y in [var1, var2]]
Or, if a shallow copy is not enough:
[copy.deepcopy(y) for x in range(3) for y in [var1, var2]]
I'm not quite sure what behavior you're looking for. Depending on what you want, one of these two options might be right (shown here with var1 = [1, 2] and var2 = [3, 4]):
In [0]: from itertools import chain
In [1]: from copy import deepcopy
In [2]: [deepcopy(x) for x in 3*[var1, var2]]
Out[2]: [[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]]
In [3]: list( chain( *(3*[var1, var2]) ) )
Out[3]: [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
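If this pattern comes up often, the comprehension could be wrapped in a small helper so callers keep a readable one-liner. `repeat_copies` is a hypothetical name, not part of the standard library, and the sketch assumes Python 3 and that copy.deepcopy works for the objects being repeated.

```python
import copy


def repeat_copies(items, times):
    """Return `items` repeated `times` times in-line, deep-copying each element separately."""
    return [copy.deepcopy(item) for _ in range(times) for item in items]


obj1, obj2 = [1], [200]
a = repeat_copies([obj1, obj2], 3)
a[0][0] = 50
print(a)     # [[50], [200], [1], [200], [1], [200]]
print(obj1)  # [1]
```

Copying each element in its own deepcopy call is what matters here: a single deepcopy of the whole list remembers which objects it has already copied, so repeated references to the same object stay shared in the result.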

Different behaviour when operating on equivalent multidimensional lists [duplicate]

When I operate on two multidimensional lists that I think are equivalent, I get different outcomes. The only difference between the lists is how they are created. I'm using Python 3.4.3.
>>> b = [[1,2],[1,2]]
>>> b[0][0] += 1
>>> b
[[2, 2], [1, 2]]
>>> b = [[1,2]] * 2
>>> b
[[1, 2], [1, 2]]
>>> b[0][0] += 1
>>> b
[[2, 2], [2, 2]]
As you can see, both b's and the operations on them are the same, but the outcome is not. I'm guessing that it has something to do with the way they are created since that is the only difference, but I don't see how.
It's the same with Python 2.7.6:
>>> b = [[1,2],[1,2]]
>>> b
[[1, 2], [1, 2]]
>>> c = [[1,2]] * 2
>>> c
[[1, 2], [1, 2]]
>>> c == b
True
>>> b[0][0] += 1
>>> b
[[2, 2], [1, 2]]
>>> c[0][0] += 1
>>> c
[[2, 2], [2, 2]]
>>> c == b
False
>>>
b = [[1,2],[1,2]]
print(id(b[0])) # 139948012160968
print(id(b[1])) # 139948011731400
b = [[1,2]]*2
print(id(b[0])) # 139948012161032
print(id(b[1])) # 139948012161032
id() shows the object's identity (in CPython, its memory address).
When you do b = [[1,2]]*2 you are basically saying: point to the same object twice and store both references in a list called b.
When you do b = [[1,2],[1,2]] you are basically saying: create two different objects, put them in a list, and let b reference that list.
So for the [[1,2]]*2 example, of course you are going to get that output, since it is the same object you are changing. You can think of it as me giving you the address of a house while keeping that same address myself. We end up at the same place, and whatever changes we make to the house, we both see.
Edited for comment:
Correct! They are changing how the memory is handled but the values are the same.
== tests if the values are the same. is tests if the objects are the same. So in our case:
# First case: b = [[1,2],[1,2]]
print(b[0] == b[1]) # True
print(b[0] is b[1]) # False
# Second case: b = [[1,2]]*2
print(b[0] == b[1]) # True
print(b[0] is b[1]) # True
Edited a second time for the second comment:
import copy
x = [1,2]
b = [copy.copy(x) for i in range(3)]
print(id(b[0])) #140133864442248
print(id(b[1])) #140133864586120
print(id(b[2])) #140133864568008
print(b) #[[1, 2], [1, 2], [1, 2]] you can extend range to 256.
If you want a unique object and want to copy it from another object, try using copy. It makes a new object with the same values.
Edited again using one of my favorite functions, sum:
This is more or less redundant and it might confuse you some more, but sum works too.
x = [1,2]
b = [sum([x],[]) for i in range(3)]
print(id(b[0])) #140692560200008
print(id(b[1])) #140692559012744
print(b) #[[1, 2], [1, 2], [1, 2]]
This returns a different list object on each iteration. I only point this out in case you don't want to import copy or anything else.
In the second case, list multiplication repeats the reference to the [1,2] list rather than copying the list itself. Essentially what that means is that somewhere in memory you have the list [1,2], and when you write [[1,2]]*2 you're saying you want two references to that same list. Thus, when you change one of the lists, you're actually changing the list that both items in b are referring to.
This is well-understood behavior in Python.
a = [[], []]   # two separate references to two separate lists
b = [[]] * 2   # two references to the same list object
a[0].append(1) # does not affect a[1]
b[0].append(1) # affects b[0] and b[1]
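The same distinction applies when building a grid. A sketch in Python 3 (the names `shared` and `separate` are illustrative): the comprehension rebuilds the inner list on every iteration, while multiplication repeats one reference.

```python
shared = [[0] * 3] * 2                  # the one inner list is referenced twice
separate = [[0] * 3 for _ in range(2)]  # a fresh inner list per iteration

print(shared == separate)          # True: same values
print(shared[0] is shared[1])      # True
print(separate[0] is separate[1])  # False

shared[0][0] += 1
separate[0][0] += 1
print(shared)    # [[1, 0, 0], [1, 0, 0]]
print(separate)  # [[1, 0, 0], [0, 0, 0]]
```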

What's the difference between a[] and a[:] when assigning values?

I happened to see this snippet of code:
a = []
a = [a, a, None]
# a now prints as [[], [], None]
a = []
a[:] = [a, a, None]
# a now prints as [[...], [...], None]
It seems the a[:] assignment assigns a pointer, but I can't find documentation about that. Could anyone give me an explicit explanation?
In Python, a is a name - it points to an object, in this case, a list.
In your first example, a initially points to the empty list, then to a new list.
In your second example, a points to an empty list, then that list is updated in place to contain the values from the new list. This does not change which list a references.
The difference in the end result arises because the right-hand side of an assignment is evaluated first, so in both cases the new list [a, a, None] holds references to the original empty list. In the first case, a is then rebound to that new list, whose first two elements are the list that used to be a. In the second case, the original list's contents are replaced in place, so its first two elements now refer to the list itself, making a recursive structure.
If you are having trouble understanding this, I recommend taking a look at it visualized.
The first will point a to a new object, the second will mutate a, so the list referenced by a is still the same.
For example:
a = [1, 2, 3]
b = a
print(b) # [1, 2, 3]
a[:] = [3, 2, 1]
print(b) # [3, 2, 1]
a = [1, 2, 3]
# b still references the old list
print(b) # [3, 2, 1]
A clearer example, from @pythonm's response:
>>> a=[1,2,3,4]
>>> b=a
>>> c=a[:]
>>> a.pop()
4
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> c
[1, 2, 3, 4]
>>>
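The recursive structure from the question can be inspected the same way. A Python 3 sketch; `first`, `second` and `old` are illustrative names:

```python
first = []
old = first
first = [first, first, None]  # rebinding: `first` now names a new list
print(first)                  # [[], [], None]
print(first[0] is old)        # True: the elements are the original empty list
print(first[0] is first)      # False

second = []
second[:] = [second, second, None]  # slice assignment: the same list is mutated
print(second)                       # [[...], [...], None]
print(second[0] is second)          # True: the list contains itself
```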

