Copy 2d array in Python: Confusion - python

My understanding is that:
def copy_2d(p):
    return list(p)
would make a full copy of p and return it. list(p) seems to do this when I try it in the REPL. However, when I call the function like this:
b = [[1,2,3],[3,4,5]]
a = copy_2d(b)
a[0][0] = 0
if (b[0][0] == 0): print "Huh??"
It prints "Huh??", that is, it appears that b is just a reference to a. I double checked but I might be blind. Can someone clarify please?

Your current code for copy_2d returns a shallow copy of the list you pass as its argument. That is, you're creating a new outer list, but the inner values (which may be lists themselves) are not copied. The new list references the same inner lists, so when you mutate one of them, you'll see the same change in the shallow copy as you do in the original list.
You can fix the issue by copying the inner lists as well as creating a new outer list. Try:
def copy_2d(p):
    return map(list, p) # warning: this only works as intended in Python 2
In Python 2, this works because map returns a list. In Python 3, the map function returns an iterator, so a list comprehension like [list(inner) for inner in p] would be better.
Of course, if you don't need to write your own code to solve this problem, you should just use copy.deepcopy from the standard library.
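To make the difference concrete, here is a minimal sketch comparing the three approaches mentioned above. The variable names (original, shallow, per_row, deep) are made up for illustration.

```python
import copy

original = [[1, 2, 3], [3, 4, 5]]

shallow = list(original)                    # new outer list, same inner lists
per_row = [list(row) for row in original]   # new outer list, new inner lists
deep = copy.deepcopy(original)              # recursively copies every level

shallow[0][0] = 0    # mutates the inner list that `original` also holds
per_row[1][0] = 99   # only touches per_row's own inner list
deep[1][1] = 77      # only touches deep's own inner list

print(original)                    # [[0, 2, 3], [3, 4, 5]]
print(shallow[0] is original[0])   # True
print(per_row[0] is original[0])   # False
```

Only the write through the shallow copy shows up in the original, because that is the only copy that still shares its inner lists.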

import copy
def copy_2d(p):
    return copy.deepcopy(p)
or
def copy_2d(p):
    return [list(p2) for p2 in p]
What you did copied the outer list along with all the values inside it. But those values were themselves lists, and your function copied only the references to the inner lists, not the inner lists themselves.
The second solution still copies references, but one layer below.
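A small sketch of what "one layer below" means, using a made-up three-level list (nested, one_level and full are illustrative names only):

```python
import copy

nested = [[[1, 2], [3]], [[4]]]

# Copies the outer list and the second-level lists, but not the third level.
one_level = [list(inner) for inner in nested]
one_level[0][0].append(99)    # the third-level list is still shared

# deepcopy is taken after the change above and shares nothing with `nested`.
full = copy.deepcopy(nested)
full[0][0].append(-1)

print(nested)   # [[[1, 2, 99], [3]], [[4]]]
print(full)     # [[[1, 2, 99, -1], [3]], [[4]]]
```

For a strictly two-dimensional list of numbers the comprehension is enough; deeper nesting is where copy.deepcopy starts to matter.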

A shallow copy is made. The same logic is explained here: what-does-the-list-function-do-in-python
"list() converts the iterable passed to it to a list. If the iterable is already a list then a shallow copy is returned, i.e only the outermost container is new rest of the objects are still the same."

Related

Updating a list in Python: Why is the scope of my for-loop within a function apparently global?

I'm an absolute Python newbie and I'm having some trouble with the following function. I hope you can help me. Thank you very much for your help in advance!
I have created a list of zip-files in a directory via a list-comprehension:
zips_in_folder = [file for file in os.listdir(my_path) if file.endswith('.zip')]
I then wanted to define a function that replaces a certain character at a certain index in every element of the list with "-":
print(zips_in_folder)
def replacer_zip_names(r_index, replacer, zips_in_folder=zips_in_folder):
    for index, element in enumerate(zips_in_folder):
        x = list(element)
        x[r_index] = replacer
        zips_in_folder[index]=''.join(x)
replacer_zip_names(5,"-")
print(zips_in_folder)
Output:
['12345#6', '22345#6']
['12345-6', '22345-6']
The function worked, but what I cannot wrap my head around is this: why does my function update the actual list "zips_in_folder"? I thought the "zips_in_folder" list within the function would only be a "shadow" of the actual list outside the function. Is the scope of the for-loop global instead of local in this case?
In other functions I wrote the scope of the variables was always local...
I was searching for an answer for hours now, I hope my question isn't too obvious!
Thanks again!
Best
Felix
This is a rather intermediate topic. In one line: Python is pass-by-object-reference.
What this means
zips_in_folder is an object. An object has a reference (think of it like an address) that points to its location in memory. To access an object, you need to use its reference.
Now, here's the key part:
For objects Python passes their reference as value
This means that a copy of the reference of the object is created but again, the new reference is pointing to the same location in the memory.
As a consequence, if you use the reference's copy to access the object, then the original object will be modified.
In your function, zips_in_folder is a variable storing a new copy of the reference.
The following line is using the new copy to access the original object:
zips_in_folder[index]=''.join(x)
However, if you decide to reassign the variable that is storing the reference, nothing will be done to the object, or its original reference, because you just reassigned the variable storing the copy of the reference, you did not modify the original object. Meaning that:
def reassign(a):
    a = []

a = [1,0]
reassign(a)
print(a) # output: [1,0]
A simple way to think about it is that lists are mutable, this means that the following will be true:
a = [1, 2, 3]
b = a # a, b are referring to the same object
a[1] = 20 # b now is [1, 20, 3]
That is because lists are objects in python, not primitive variables, so the function changes the "original" list i.e. it doesn't make a local copy of it.
The same is true for any class, user-defined or otherwise: a function manipulating an object will not make a copy of the object, it will change the "original" object passed to it.
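If you have knowledge of C++ or another lower-level language, it is similar to passing a pointer by value: changes made through the pointer are visible to the caller, but reassigning the parameter inside the function is not. It is not the same as C++ pass-by-reference, where reassignment would also affect the caller.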
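If the goal is to leave the caller's list alone, one option is to build and return a new list instead of assigning into the one that was passed in. This is only a sketch: replace_char and its parameter names are hypothetical, and the sample file names are taken from the output shown in the question.

```python
def replace_char(names, index, replacement):
    """Return a new list of strings; `names` itself is never modified."""
    return [name[:index] + replacement + name[index + 1:] for name in names]

zips_in_folder = ['12345#6', '22345#6']
renamed = replace_char(zips_in_folder, 5, '-')

print(zips_in_folder)   # ['12345#6', '22345#6']
print(renamed)          # ['12345-6', '22345-6']
```

Strings are immutable, so slicing and concatenating always produces new string objects, and the comprehension collects them in a new list.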

Why does Python return None on list.reverse()?

Was solving an algorithms problem and had to reverse a list.
When done, this is what my code looked like:
def construct_path_using_dict(previous_nodes, end_node):
    constructed_path = []
    current_node = end_node
    while current_node:
        constructed_path.append(current_node)
        current_node = previous_nodes[current_node]
    constructed_path = reverse(constructed_path)
    return constructed_path
But, along the way, I tried return constructed_path.reverse() and I realized it wasn't returning a list...
Why was it made this way?
Shouldn't it make sense that I should be able to return a reversed list directly, without first doing list.reverse() or list = reverse(list) ?
What I'm about to write was already said here, but I'll write it anyway because I think it will perhaps add some clarity.
You're asking why the reverse method doesn't return a (reference to the) result, and instead modifies the list in-place. In the official python tutorial, it says this on the matter:
You might have noticed that methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python.
In other words (or at least, this is the way I think about it): Python tries to mutate in place wherever possible (that is, when dealing with a mutable data structure), and when it mutates in place, it doesn't also return a reference to the list, because then it would appear that it is returning a new list, when it is really returning the old list.
To be clear, this is only true for object methods, not functions that take a list, for example, because the function has no way of knowing whether or not it can mutate the iterable that was passed in. Are you passing a list or a tuple? The function has no way of knowing, unlike an object method.
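The split between in-place methods and value-returning functions can be seen with the built-ins themselves. A short sketch (the variable names are illustrative):

```python
numbers = [3, 1, 2]
result = numbers.sort()                   # method: sorts in place, returns None
print(result, numbers)                    # None [1, 2, 3]

point = (3, 1, 2)                         # a tuple has no sort() or reverse() method
sorted_copy = sorted(point)               # function: accepts any iterable, returns a new list
reversed_copy = list(reversed(point))     # reversed() returns an iterator, so wrap it in list()

print(sorted_copy, reversed_copy, point)  # [1, 2, 3] [2, 1, 3] (3, 1, 2)
```

The functions cannot assume their argument is mutable, so they always hand back a new object and leave the input alone.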
list.reverse reverses in place, modifying the list it was called on. Generally, Python methods that operate in place don’t return what they operated on to avoid confusion over whether the returned value is a copy.
You can reverse and return the original list:
constructed_path.reverse()
return constructed_path
Or return a reverse iterator over the original list, which isn’t a list but doesn’t involve creating a second list just as big as the first:
return reversed(constructed_path)
Or return a new list containing the reversed elements of the original list:
return constructed_path[::-1]
# equivalent: return list(reversed(constructed_path))
If you’re not concerned about performance, just pick the option you find most readable.
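Putting the first option into the function from the question gives something like the sketch below. The sample previous_nodes dictionary is invented for illustration, and it assumes the start node maps to None so the loop can terminate.

```python
def construct_path_using_dict(previous_nodes, end_node):
    constructed_path = []
    current_node = end_node
    while current_node is not None:
        constructed_path.append(current_node)
        current_node = previous_nodes[current_node]
    constructed_path.reverse()   # reverses in place and returns None
    return constructed_path      # so return the list, not the result of reverse()

# Hypothetical predecessor map: a -> b -> c -> d, with "a" as the start node.
previous_nodes = {"d": "c", "c": "b", "b": "a", "a": None}
print(construct_path_using_dict(previous_nodes, "d"))   # ['a', 'b', 'c', 'd']
```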
"Methods like insert, remove or sort that only modify the list have no return value printed – they return the default None. This is a design principle for all mutable data structures in Python."
(PyDocs 5.1)
As I understand it, you can see the distinction quickly by comparing what happens when you modify a list (mutable) in place, e.g. with list.reverse(), and when you mutate a list that is an element of a tuple (immutable), while calling
id(list)
id(tuple_with_list)
before and after the mutations. Having mutating methods return None is part of what lets mutable objects be changed, expanded, or referenced by multiple names without allocating a new object.
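A sketch of that experiment, with made-up names (path, holder); the identity of both objects stays the same across the mutations:

```python
path = [1, 2, 3]
id_before = id(path)
returned = path.reverse()          # in place; the method returns None
same_list = id(path) == id_before

holder = (path, "label")           # hypothetical tuple that contains the list
holder_id_before = id(holder)
holder[0].append(4)                # the tuple is immutable, the list inside is not
same_tuple = id(holder) == holder_id_before

print(returned, same_list, same_tuple)   # None True True
print(holder)                            # ([3, 2, 1, 4], 'label')
```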

Pointer addresses of Python Lists and Matrix when reversing

I am trying to understand how Python matrices are implemented as compared to Java/C style 2D arrays.
Specifically the problem I am facing is this:
Given a matrix (list of lists), I am asked to reverse the individual lists in the matrix in-place. I came up with the following code:
CODE 1
------
def flip(matrix):
    for list in matrix:
        list=list[::-1]

matrix=[[1,0,0],[0,0,1]]
flip(matrix)
print(matrix) # Outputs "[[1,0,0],[0,0,1]]" i.e. does not reverse
If I modify the code a bit,
CODE 2
------
def flip(matrix):
    for list in matrix:
        list.reverse()

matrix=[[1,0,0],[0,0,1]]
flip(matrix)
print(matrix) # Outputs "[[0,0,1],[1,0,0]]" i.e. works correctly this time
I know that list.reverse() does in-place operation and list[::-1] creates a shallow copy. However in CODE 1, I am assigning the address of the shallow copy to the same variable (list) only. So the variable matrix should effectively get changed. Because the variable matrix[i] is the variable list. So if list gets modified, so should matrix.
To illustrate my previous point, the following code is provided:
CODE 3
------
def test(matrix):
    for i, list in enumerate(matrix):
        print(matrix[i] is list)

matrix=[[1,0,0],[0,0,1]]
test(matrix) # Outputs "True True"
If matrix[i] is list, then changing list means changing matrix[i] and changing matrix[i] means changing matrix.
If I modify CODE 1 so that instead of list being assigned the address of the newly created reversed list, matrix[i] be assigned that address, then surprisingly it works!
CODE 4
------
def flip(matrix):
    for i, list in enumerate(matrix):
        # Instead of list=list[::-1]
        matrix[i]=list[::-1]

matrix=[[1,0,0],[0,0,1]]
flip(matrix)
print(matrix) # Correctly Outputs [[0,0,1], [1,0,0]]
I would like an explanation why CODE 1 does not work and why CODE 4 works.
The first time through the loop, list is just a name for matrix[0].
Mutating the object that list names, as in CODE 2, obviously mutates the object that matrix[0] names, because they're naming the same object.
But just rebinding list to some different object, as in CODE 1, doesn't change matrix[0] in any way. If you think about it, that makes sense. After all, the next time through the loop, list is going to get rebound to matrix[1], and you certainly wouldn't want that to change what's in matrix[0], right?
In C terms (and this is literally true, if you're using the normal CPython implementation), being names for the same object means being pointers to the same object. If list is a List *, assigning to list doesn't do anything to whatever was in *list.
So, why does CODE 4 work? Well, in code 4, you're still not mutating the list—but you're rebinding matrix[0], instead of list, and of course that rebinds matrix[0].
I'm guessing that, despite talking about "Java/C", you're really thinking in C++ terms. In C++, = is an operator, which can be overloaded. Plus, you don't just have pointers, but references, which sort of magically work without needing to explicitly dereference them. So, if list is a reference to a list object, rather than a pointer, list = isn't changing it into a reference to another list object, it's calling a special method, ListType::operator=. That's actually pretty weird. There's nothing like that in Java or C, any more than there is in Python.
For more detail on what happens under the covers:
If you want to think of it in C terms, the actual C API used by the main (CPython) implementation may make things clear here.
Your function's locals are just an array of pointers to Python objects. That is, matrix is locals[0], list is locals[1], etc.
What's in *locals[0] is a PyListObject struct, which contains, among other things, a pointer to an array of pointers to Python objects, each of which points to another PyListObject struct. Inside those inner lists' arrays are pointers to PyLongObject structs, which just hold numbers.
The for loop is a bit more complicated than this, but pretend it's just doing locals[1] = (*locals[0]).elements[0], then locals[1] = (*locals[0]).elements[1], etc.
So, assigning to list is just changing locals[1], not *locals[1], and therefore it's not changing (*locals[0]).elements[0].
But assigning to (*locals[0]).elements[0] is a different story.
And so is calling the reverse method. When you do that, self just ends up as yet another pointer to the same object, but its implementation mutates things on *self.
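Slice assignment is one more way to see the difference between rebinding a name and mutating an object. The sketch below uses row instead of list as the loop variable (to avoid shadowing the built-in), and the function names are made up for the comparison.

```python
def flip_by_rebinding(matrix):
    for row in matrix:
        row = row[::-1]       # rebinds the local name only; matrix is untouched

def flip_by_slice_assignment(matrix):
    for row in matrix:
        row[:] = row[::-1]    # replaces the contents of the existing list object

m1 = [[1, 0, 0], [0, 0, 1]]
flip_by_rebinding(m1)
print(m1)                     # [[1, 0, 0], [0, 0, 1]]

m2 = [[1, 0, 0], [0, 0, 1]]
first_row = m2[0]
flip_by_slice_assignment(m2)
print(m2)                     # [[0, 0, 1], [1, 0, 0]]
print(m2[0] is first_row)     # True: same list object, new contents
```

Unlike CODE 4, which stores a new list in matrix[i], slice assignment keeps the original inner list objects, so it is a true in-place reversal like list.reverse().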

Variable overwritten inside a function

I have the following piece of code in Python 2.7
abc=[[1,2,3],[0,1,2]]
def make_change(some_list):
    for i in range(len(some_list)):
        if some_list[i][0]==0:
            some_list[i]=[]
    return some_list
new_list=make_change(abc)
print abc
print new_list
My understanding was that it should produce the following output.
[[1,2,3],[0,1,2]]
[[1,2,3],[]]
But the python actually produces
[[1,2,3],[]]
[[1,2,3],[]]
Am I missing something?
You can prevent this issue by copying the list while passing it to the function:
abc=[[1,2,3],[0,1,2]]
def make_change(some_list):
    for i in range(len(some_list)):
        if some_list[i][0]==0:
            some_list[i]=[]
    return some_list
new_list=make_change(abc[:])
print abc
print new_list
The changed part:
new_list=make_change(abc[:])
The reason this happens is that Python passes a reference to the list rather than a copy, so changes made inside the function affect the original as well. Using [:] creates a shallow copy, which is enough here because the function only replaces elements of the outer list.
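A sketch of where the shallow copy stops being enough. clear_rows mirrors make_change from the question; zero_first is a hypothetical second function that writes into the inner lists. The print calls take a single argument, so this should run on Python 2.7 as well as Python 3.

```python
def clear_rows(rows):
    for i in range(len(rows)):
        if rows[i][0] == 0:
            rows[i] = []      # replaces an element of the outer list
    return rows

def zero_first(rows):
    for row in rows:
        row[0] = 0            # mutates an inner list
    return rows

abc = [[1, 2, 3], [0, 1, 2]]

cleared = clear_rows(abc[:])
unchanged_after_clear = (abc == [[1, 2, 3], [0, 1, 2]])
print(unchanged_after_clear)  # True: the slice copy protected the outer list

zeroed = zero_first(abc[:])
print(abc)                    # [[0, 2, 3], [0, 1, 2]]: inner lists were shared
```

When a function writes into the inner lists, copy.deepcopy (or a per-row copy) is needed instead of [:].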
Do not change the list you pass to a function unless you specifically want the function to change a list that's passed to it. Make a copy of the list (or any other mutable object) and work on the copy. If you're using compound objects (objects with objects in them) use copy.deepcopy to make sure everything is a copy.
Since functions exist to encapsulate the weird stuff you have to do, doing weird stuff to the objects you're passing to a function rarely makes sense to me. The other answer has you pass a slice copy of the list to the function. Why not make the whole thing more readable by passing your mutable object to the function and having the function create the copy? Better encapsulation = less annoying code.
abc=[[1,2,3],[0,1,2]]
def make_change(some_list):
    from copy import deepcopy
    new_list = deepcopy(some_list)
    for i in range(len(new_list)):
        if new_list[i][0]==0:
            new_list[i]=[]
    return new_list
new_list=make_change(abc)
print abc
print new_list
List Comprehensions to the rescue!
abc=[[1,2,3],[0,1,2]]
def make_change(some_list):
    return [[] if a[0]==0 else a for a in some_list]
make_change(abc)  # returns [[1, 2, 3], []]

Python list append issue

One of my functions appends an item to a list. The list is then passed as an argument to two recursive calls. But when one of the calls updates what I expected to be its own local list, the original list is modified, so the second recursive call receives the modified list as input.
Basically, this is the sort of thing I am trying to do. I want
[2]
[2,3]
[2,4]
as output, but I am getting only [2] and [2,3], because the original list is being modified. Is there any way that I can send the same list to both functions?
def Growtree(llst, x):
    if len(llst) == 2:
        return
    llst.append(x)
    print(llst)
    Growtree(llst,3)
    Growtree(llst,4)
When Growtree(llst, 4) is called, llst already contains 2 and 3, so the call returns without appending a new element because of your if.
What you need is to make a copy of the list, either before calling Growtree or inside it, depending on whether you want the original list to be modified.
To copy a list, see https://stackoverflow.com/a/2612815/3410584
Use copy.deepcopy().
The reason the original list got modified in your functions is that when you pass a mutable object in Python, a reference to it is what gets passed.
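A minimal sketch of the copy-based fix. The name grow_tree, the out parameter used to collect results, and the initial call with an empty list and 2 are assumptions, since the question does not show how Growtree is first called.

```python
def grow_tree(items, x, out):
    if len(items) == 2:
        return
    branch = items + [x]        # builds a new list; the caller's list is untouched
    out.append(branch)
    grow_tree(branch, 3, out)   # both recursive calls start from the same,
    grow_tree(branch, 4, out)   # unmodified `branch`

paths = []
grow_tree([], 2, paths)
for path in paths:
    print(path)
# [2]
# [2, 3]
# [2, 4]
```

Because the elements are plain integers, a one-level copy such as items + [x] or list(items) is enough; copy.deepcopy would only be needed if the list held mutable objects that the branches also modify.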
