Why is Python modifying the source in this function?

Why is Python modifying the source in this function? - python

If I use a list as a parameter for a function, I would expect that the original list would be unmodified. Why is it when I use the code below, x == z is True when it should be x == range(10) is True?
In [83]: def pop_three(collection):
....: new_collection = []
....: new_collection.append(collection.pop())
....: new_collection.append(collection.pop())
....: new_collection.append(collection.pop())
....: return new_collection, collection
....:
In [84]: x = range(10)
In [85]: x
Out[85]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [86]: y, z = pop_three(x)
In [87]: y
Out[87]: [9, 8, 7]
In [88]: z
Out[88]: [0, 1, 2, 3, 4, 5, 6]
In [89]: x
Out[89]: [0, 1, 2, 3, 4, 5, 6]

If I use a list as a parameter for a function, I would expect that the original list would be unmodified.
No. I think what you're missing here is that Python never automatically copies anything.
If you're used to a language like, say, C++, Python is very different. Ned Batchelder, as always, explains this brilliantly, but let me summarize:
In C++, a function parameter, or a variable, or an array element, is a location in memory, with a type. When you write a = b, or call f(b), that copies the value in memory location b to memory location a (possibly calling a casting operator or conversion constructor or copy constructor or copy assignment operator—or moving versions of half of those… but we're not here to learn about the intricacies of C++).
In Python, a function parameter, or a variable, or a list element, is just a name. The value has its own typed memory location, somewhere else. You can have as many names as you want for the same object. When you write a = b or call f(b), that just makes another name for the same object.
In particular, in your case, collection and x are names for the same object. If you added print(id(collection)) before the call and print(id(x)) inside the function, they would both print out the same number.
So, in Python, if you want a copy, you have to ask for it explicitly—e.g., pop_three(x[:]) or pop_three(copy.copy(x)).
Or, of course, you could do that inside pop_three.
Or, best of all, you could just avoid using mutating methods like pop in the first place:
def pop_three(collection):
return collection[-3:][::-1], collection[:-3]
I was confused since strings or integers passed into a function can be treated in this fashion.
Well, they can be treated in this fashion exactly as far as lists can. If you never mutate anything, the distinction between separate copies and shared references is irrelevant.* Because Python strings and integers are immutable, the distinction can't come up in any code with strings or integers are arguments. Lists, on the other hand, are mutable, so the distinction can arise—but it's perfectly possible, and in fact often best, to write code that uses them without mutating anything, as in the final example above.
Notice that this assumes the first point above: that, unlike C++ assignment, Python assignment is not a form of mutation. But it can get tricky at the edges. For example, is x[0] = 1 a mutation? Well, it's not a mutation of x[0], but it is a mutation of x. What about x += y? The language leaves that up to the type of x, and for mutable types it usually (but not always) is a mutation, while for immutable types of course it isn't.
* This isn't quite true; you can write code that relies on identity even though it isn't relevant, e.g., by using the id function or the is operator. Such code is almost never Pythonic, or a good idea… but it can be helpful to learning how things work. Try to print(id(x)) suggestion above with x = "a" + "b". However, this can be misleading, because the Python interpreter is allowed to "intern" objects that it knows are guaranteed to be immutable, and it does so with, e.g., small integers, and the Python compiler is allowed to fold constants, and it does so with, e.g., literal strings.

Because when you pass x to the pop_tree() and you called collection.pop(). This effectively did x.pop()
Therefore when you turned from the funciton, x only got 7 elements

Lists are mutable. When you pop from a list, you modify the list.
Any variable that originally pointed to the list will continue to point to the same modified list. Any new variable that you point at the list will point to the same modified list as well.

z = pop_three(x) means you are passing in the list x and then mutating that list x using collection.pop() x so x is being mutated therefore could not be == range(10)
y becomes new_collection and z is collection which is the mutated returned value of x.
In [2]: x = range(10)
In [3]: y, z = pop_three(x)
In [4]: z is x # z is the object x
Out[4]: True

Related

Modifying Python lists via slices and for-loops?

I was trying to modify the values in lists via slices and for-loops, and ran into some pretty interesting behavior. I would appreciate if someone could explain what's happening internally here.
>>> x = [1,2,3,4,5]
>>> x[:2] = [6,7] #slices can be modified
>>> x
[6, 7, 3, 4, 5]
>>> x[:2][0] = 8 #indices of slices cannot be modified
>>> x
[6, 7, 3, 4, 5]
>>> x[:2][:1] = [8] #slices of slices cannot be modified
>>> x
[6, 7, 3, 4, 5]
>>> for z in x: #this version of a for-loop cannot modify lists
... z += 1
...
>>> x
[6, 7, 3, 4, 5]
>>> for i in range(len(x)): #this version of a for-loop can modify lists
... x[i] += 1
...
>>> x
[7, 8, 4, 5, 6]
>>> y = x[:2] #if I assign a slice to a var, it can be modified...
>>> y[0] = 1
>>> y
[1, 8]
>>> x #...but it has no impact on the original list
[7, 8, 4, 5, 6]

Let's break down your comments 1 by 1:
1.) x[:2] = [6, 7] slices can be modified:
See these answers here. It's calling the __setitem__ method from the list object and assigning the slice to it. Each time you reference x[:2] a new slice object is created (you can simple do id(x[:2]) and it's apparent, not once will it be the same id).
2.) indices of slices cannot be modified:
That's not true. It couldn't be modified because you're performing the assignment on the slice instance, not the list, so it doesn't trigger the __setitem__ to be performed on the list. Also, int are immutable so it cannot be changed either way.
3.) slices of slices cannot be modified:
See above. Same reason - you are assigning to an instance of the slice and not modifying the list directly.
4.) this version of a for-loop cannot modify lists:
z being referenced here is the actual objects in the elements of x. If you ran the for loop with id(z) you'll note that they're identical to id(6), id(7), id(3), id(4), id(5). Even though list contains all 5 identical references, when you do z = ... you are only assigning the new value to the object z, not the object that is stored in list. If you want to modify the list, you'll need to assign it by index, for the same reason you can't expect 1 = 6 will turn x into [6, 2, 3, 4, 5].
5.) this version of a for-loop can modify lists:
See my answer above. Now you are directly performing item assignment on the list instead of its representation.
6.) if I assign a slice to a var, it can be modified:
If you've been following so far, you'll realize now you are assigning the instance of x[:2] to the object y, which is now a list. The story follows - you perform an item assignment by index on y, of course it will be updated.
7.) ...but it has no impact on the original list:
Of course. x and y are two different objects. id(x) != id(y), therefore any operation performed on x will not affect y whatsoever. if you however assigned y = x and then made a change to y, then yes, x will be affected as well.
To expand a bit on for z in x:, say you have a class foo() and assigned two instances of such to the list f:
f1 = foo()
f2 = foo()
f = [f1, f2]
f
# [<__main__.foo at 0x15f4b898>, <__main__.foo at 0x15f4d3c8>]
Note that the reference in question is the actual foo instance, not the object f1 and f2. So even if I did the following:
f1 = 'hello'
f
# [<__main__.foo at 0x15f4b898>, <__main__.foo at 0x15f4d3c8>]
f still remains unchanged since the foo instances remains the same even though object f1 now is assigned to a different value. For the same reason, whenever you make changes to z in for z in x:, you are only affecting the object z, but nothing in the list is changed until you update x by index.
If however the object have attribute or is mutable, you can directly update the referenced object in the loop:
x = ['foo']
y = ['foo']
lst = [x,y]
lst
# [['foo'], ['foo']]
for z in lst:
z.append('bar')
lst
# [['foo', 'bar'], ['foo', 'bar']]
x.append('something')
lst
# [['foo', 'bar', 'something'], ['foo', 'bar']]
That is because you are directly updating the object in reference instead of assigning to object z. If you however assigned x or y to a new object, lst will not be affected.

There is nothing odd happening here. Any slice that you obtain from a list is a new object containing copies of your original list. The same is true for tuples.
When you iterate through your list, you get the object which the iteration yields. Since ints are immutable in Python you can't change the state of int objects. Each time you add two ints a new int object is created. So your "version of a for-loop [which] cannot modify lists" is not really trying to modify anything because it will not assign the result of the addition back to the list.
Maybe you can guess now why your second approach is different. It uses a special slicing syntax which is not really creating a slice of your list and allows you to assign to the list (documentation). The newly created object created by the addition operation is stored in the list through this method.
For understanding your last (and your first) examples, it is important to know that slicing creates (at least for lists and tuples, technically you could override this in your own classes) a partial copy of your list. Any change to this new object will, as you already found out, not change anything in your original list.

Why does concatenating a list to another one creates another object in memory, whereas other manipulations induce mutation? [duplicate]

This question already has answers here:
Object id in Python
(4 answers)
Closed 5 years ago.
Case A:
list1=[0, 1, 2, 3]
list2=list1
list1=list1+[4]
print(list1)
print(list2)
Output:
[0, 1, 2, 3, 4]
[0, 1, 2, 3]
(This non-mutating behavior also happens when concatenating a list of more than a single entry, and when 'multiplying' the list, e.g. list1=list1*2, in fact any type of "re-assignment" that performs an operation with an infix operator to the list and then assigns the result of that operation to the same list name using "=" )
In this case the original list object that list1 pointed to has not been altered in memory and list2 still points to it, another object has simply been created in memory for the result of the concatenation that list1 now points to (there are now two distinct, different list objects in memory)
Case B:
list1=[0, 1, 2, 3]
list2=list1
list1.append(4)
print(list1)
print(list2)
---
Output :
[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]
Case C:
list1=[0, 1, 2, 3]
list2=list1
list1[-1]="foo"
print(list1)
print(list2)
Outputs:
[0, 1, 2, 'foo']
[0, 1, 2, 'foo']
In case B and C the original list object that list1 points to has mutated, list2 still points to that same object, and as a result the value of list2 has changed. (there is still one single list object in memory and it has mutated).
This behavior seems inconsistent to me as a noob. Is there a good reason/utility for this?
EDIT :
I changed the list variables names from "list" and "list_copy" to "list1" and "list2" as this was clearly a very poor and confusing choice of names.
I chose Kabanus' answer as I liked how he pointed out that mutating operations are always(?) explicit in python.
In fact a short and simple answer to my question can be done summarizing Kabanus' answer into two of his statements :
-"In python mutating operations are explicit"
-"The addition[or multiplication] operator [performed on list objects] creates a new object, and doesn't change x[the list object] implicitly."
I could also add:
-"Every time you use square brackets to describe a list, that creates a new list object"
[this is from : http://www-inst.eecs.berkeley.edu/~selfpace/cs9honline/Q2/mutation.html , great explanations there on this topic]
Also I realized after Kabanus' answer how careful one must be tracking mutations in a program :
l=[1,2]
z=l
l+=[3]
z=z+[3]
and
l=[1,2]
z=l
z=z+[3]
l+=[3]
Will yield completely different values for z. This must be a frequent source of errors isn't it?
I'm only in the beginning of my learning and haven't delved deeply into OOP concepts just yet, but I think I'm already starting to understand what the fuss around functional paradigm is about...

l += [4]
is equivalent to:
l.append(4)
and won't create a copy.
l = l + [4]
is an assignment to l, it first evaluates the right side of the assignment, then assigns the resulting object to name l. There is no way for this operation to be mutating l.
Update: I guess I haven't made myself clear enough. Of course operations on the RHS of the assignment may involve mutating the object that is the current value of LHS; but finally, the result of computing RHS is assigned to the LHS, thus overwriting any previous mutations. Example:
def increment_first(x):
x[0] += 1
return []
l = [ 1 ]
l = increment_first(l)
While the call to increment_first will increment l[0] as its side effect, the mutated list object will be lost anyway as soon as the value of RHS (in this case - an empty list) is assigned to l.

This is by design. The point is python does not like non explicit side effects. Suppose this valid line in your file:
x=[1,2]
print(x+[3,4])
Note there is no assignment, but it's still a valid line. Do you expect x to have changed after that second line? For me it doesn't make sense.
That's what you're seeing - the addition operator creates a new object, and doesn't change x implicitly. If you feel it should, then what about:
[3,4]+x
Of course, addition does not change behavior in assignment, to avoid confusion.
In python mutating operations are explicit:
x+=[3,4]
Or your example:
x[0]=1
Here you are explicitly asking to change a cell, i.e. explicit mutation. These things are consistent - an operation is always a mutation or it isn't, and it won't be both. Usually it makes sense as well, such as concatenating lists on the fly.

for n in a list: del n

When I loop over a list, the name that I give within the loop to the elements of the list apparently refers directly to each element in turn, as evidenced by:
>>> a = [1, 2, 3]
>>> for n in a:
... print n is a[a.index(n)]
True
True
True
So why doesn't this seem to do anything?
>>> for n in a: del n
>>> a
[1, 2, 3]
If I try del a[a.index(n)], I get wonky behavior, but at least it's behavior I can understand - every time I delete an element, I shorten the list, changing the indices of the other elements, so I end up deleting every other element of the list:
>>> a = range(10)
>>> a
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> for n in a: del a[a.index(n)]
>>> a
[1, 3, 5, 7, 9]
Clearly I'm allowed to delete from the list while iterating. So what's going on when I try to del n inside the loop? Is anything being deleted?

Inside the block of any for n in X statement, n refers to the variable named n itself, not to any notion of "the place in the list of the last value you iterated over". Therefore, what your loop is doing is repeatedly binding a variable n to a value fetched from the list and then immediately unbinding that same variable again. The del n statement only affects your local variable bindings, rather than the list.

Because you're doing this:
some_reference = a[0]
del some_reference
#do you expect this to delete a[0]? it doesn't.
You're operating on a variable bound to the value of a[0] (then one bound to a[1], then...). You can delete it, but it won't do anything to a.

Others have explained the idea of deleting a reference pretty well, I just wanted to touch on item deletion for completeness. Even though you use similar syntax, del a[1], each type will handle item deletion a little differently. As expected, deleting items from most containers just removes them, and some types do not support item deleting at all, tuples for example. Just as a fun exerciser:
class A(object):
def __delitem__(self, index):
print 'I will NOT delete item {}!'.format(index)
a = A()
del a[3]
# I will NOT delete item 3!

Dolda2000's answer covers the main question very nicely, but there are three other issues here.
Using index(n) within a for n in a loop is almost always a bad idea.
It's incorrect:
a = [1, 2, 1, 2]
for n in a:
print(a.index(n))
This will print 0, then 1, then 0 again. Why? Well, the third value is 1. a.index(1) is the index of the first 1 in the list. And that's 0, not 2.
It's also slow: To find the ith value, you have to check the first i elements in the list. This turns a simple linear (fast) algorithm into a quadratic (slow) one.
Fortunately, Python has a nice tool to do exactly what you want, enumerate:
for i, n in enumerate(a):
print(i)
Or, if you don't need the values at all, just the indices:
for i in len(range(a)):
print(i)
(This appears as a hint in the docs at least twice, conveniently buried in places no novice would ever look. But it's also spelled out early in the tutorial.)
Next, it looks like you were attempting to test for exactly the case Dolda2000 explained was happening, with this:
n is a[a.index(n)]
Why didn't that work? You proved that they are the same object, so why didn't deleting it do anything?
Unlike C-family languages, where variables are addresses where values (including references to other addresses) get stored, Python variables are names that you bind to values that exist on their own somewhere else. So variables can't reference other variables, but they can be names for the same value. The is expression tests whether two expressions name the same value. So, you proved that you had two names for the same value, you deleted one of those names, but the other name, and the value, are still there.
An example is worth 1000 words, so run this:
a = object() # this guarantees us a completely unique value
b = a
print a, b, id(a), id(b), a is b
del b
print a, id(a)
(Of course if you also del a, then at some point Python will delete the value, but you can't see that, because by definition you no longer have any names to look at it with.)
Clearly I'm allowed to delete from the list while iterating.
Well, sort of. Python leaves it undefined what happens when you mutate an iterable while iterating over it—but it does have special language that describes what happens for builtin mutable sequences (which means list) in the for docs), and for dicts (I can't remember where, but it says somewhere that it can't guarantee to raise a RuntimeError, which implies that it should raise a RuntimeError).
So, if you know that a is a list, rather than some subclass of list or third-party sequence class, you are allowed to delete from it while iterating, and to expect the "skipping" behavior if the elements you're deleting are at or to the left of the iterator. But it's pretty hard to think of a realistic good use for that knowledge.

This is what happens when you do
for n in a: del a[a.index(n)]
Try this:
a = [0, 1, 2, 3, 4]
for n in a: print(n, a); del a[a.index(n)]
This is what you get:
(0, [0, 1, 2, 3, 4])
(2, [1, 2, 3, 4])
(4, [1, 3, 4])
Thus n is just a tracker of the index, and you can think this way. Every time the function iterates, n moves on to the next relative position in the iterable.
In this case, n refers to a[0] for the first time, a[1] for the second time, a[2] for the third time. After that there is no nextItem in the list, thus the iterations stops.

Python: .append(0)

I would like to ask what the following does in Python.
It was taken from http://danieljlewis.org/files/2010/06/Jenks.pdf
I have entered comments telling what I think is happening there.
# Seems to be a function that returns a float vector
# dataList seems to be a vector of flat.
# numClass seems to an int
def getJenksBreaks( dataList, numClass ):
# dataList seems to be a vector of float. "Sort" seems to sort it ascendingly
dataList.sort()
# create a 1-dimensional vector
mat1 = []
# "in range" seems to be something like "for i = 0 to len(dataList)+1)
for i in range(0,len(dataList)+1):
# create a 1-dimensional-vector?
temp = []
for j in range(0,numClass+1):
# append a zero to the vector?
temp.append(0)
# append the vector to a vector??
mat1.append(temp)
(...)
I am a little confused because in the pdf there are no explicit variable declarations. However I think and hope I could guess the variables.

Yes, the method append() adds elements to the end of the list. I think your interpretation of the code is correct.
But note the following:
x =[1,2,3,4]
x.append(5)
print(x)
[1, 2, 3, 4, 5]
while
x.append([6,7])
print(x)
[1, 2, 3, 4, 5, [6, 7]]
If you want something like
[1, 2, 3, 4, 5, 6, 7]
you may use extend()
x.extend([6,7])
print(x)
[1, 2, 3, 4, 5, 6, 7]

Python doesn't have explicit variable declarations. It's dynamically typed, variables are whatever type they get assigned to.
Your assessment of the code is pretty much correct.
One detail: The range function goes up to, but does not include, the last element. So the +1 in the second argument to range causes the last iterated value to be len(dataList) and numClass, respectively. This looks suspicious, because the range is zero-indexed, which means it will perform a total of len(dataList) + 1 iterations (which seems suspicious).
Presumably dataList.sort() modifies the original value of dataList, which is the traditional behavior of the .sort() method.
It is indeed appending the new vector to the initial one, if you look at the full source code there are several blocks that continue to concatenate more vectors to mat1.

append is a list function used to append a value at the end of the list
mat1 and temp together are creating a 2D array (eg = [[], [], []]) or matrix of (m x n)
where m = len(dataList)+1 and n = numClass
the resultant matrix is a zero martix as all its value is 0.

In Python, variables are implicitely declared. When you type this:
i = 1
i is set to a value of 1, which happens to be an integer. So we will talk of i as being an integer, although i is only a reference to an integer value. The consequence of that is that you don't need type declarations as in C++ or Java.
Your understanding is mostly correct, as for the comments. [] refers to a list. You can think of it as a linked-list (although its actual implementation is closer to std::vectors for instance).
As Python variables are only references to objects in general, lists are effectively lists of references, and can potentially hold any kind of values. This is valid Python:
# A vector of numbers
vect = [1.0, 2.0, 3.0, 4.0]
But this is perfectly valid code as well:
# The list of my objects:
list = [1, [2,"a"], True, 'foo', object()]
This list contains an integer, another list, a boolean... In Python, you usually rely on duck typing for your variable types, so this is not a problem.
Finally, one of the methods of list is sort, which sorts it in-place, as you correctly guessed, and the range function generates a range of numbers.
The syntax for x in L: ... iterates over the content of L (assuming it is iterable) and sets the variable x to each of the successive values in that context. For example:
>>> for x in ['a', 'b', 'c']:
... print x
a
b
c
Since range generates a range of numbers, this is effectively the idiomatic way to generate a for i = 0; i < N; i += 1 type of loop:
>>> for i in range(4): # range(4) == [0,1,2,3]
... print i
0
1
2
3

Is a Python list guaranteed to have its elements stay in the order they are inserted in?

If I have the following Python code
>>> x = []
>>> x = x + [1]
>>> x = x + [2]
>>> x = x + [3]
>>> x
[1, 2, 3]
Will x be guaranteed to always be [1,2,3], or are other orderings of the interim elements possible?

Yes, the order of elements in a python list is persistent.

In short, yes, the order is preserved. In long:
In general the following definitions will always apply to objects like lists:
A list is a collection of elements that can contain duplicate elements and has a defined order that generally does not change unless explicitly made to do so. stacks and queues are both types of lists that provide specific (often limited) behavior for adding and removing elements (stacks being LIFO, queues being FIFO). Lists are practical representations of, well, lists of things. A string can be thought of as a list of characters, as the order is important ("abc" != "bca") and duplicates in the content of the string are certainly permitted ("aaa" can exist and != "a").
A set is a collection of elements that cannot contain duplicates and has a non-definite order that may or may not change over time. Sets do not represent lists of things so much as they describe the extent of a certain selection of things. The internal structure of set, how its elements are stored relative to each other, is usually not meant to convey useful information. In some implementations, sets are always internally sorted; in others the ordering is simply undefined (usually depending on a hash function).
Collection is a generic term referring to any object used to store a (usually variable) number of other objects. Both lists and sets are a type of collection. Tuples and Arrays are normally not considered to be collections. Some languages consider maps (containers that describe associations between different objects) to be a type of collection as well.
This naming scheme holds true for all programming languages that I know of, including Python, C++, Java, C#, and Lisp (in which lists not keeping their order would be particularly catastrophic). If anyone knows of any where this is not the case, please just say so and I'll edit my answer. Note that specific implementations may use other names for these objects, such as vector in C++ and flex in ALGOL 68 (both lists; flex is technically just a re-sizable array).
If there is any confusion left in your case due to the specifics of how the + sign works here, just know that order is important for lists and unless there is very good reason to believe otherwise you can pretty much always safely assume that list operations preserve order. In this case, the + sign behaves much like it does for strings (which are really just lists of characters anyway): it takes the content of a list and places it behind the content of another.
If we have
list1 = [0, 1, 2, 3, 4]
list2 = [5, 6, 7, 8, 9]
Then
list1 + list2
Is the same as
[0, 1, 2, 3, 4] + [5, 6, 7, 8, 9]
Which evaluates to
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Much like
"abdcde" + "fghijk"
Produces
"abdcdefghijk"

You are confusing 'sets' and 'lists'. A set does not guarantee order, but lists do.
Sets are declared using curly brackets: {}. In contrast, lists are declared using square brackets: [].
mySet = {a, b, c, c}
Does not guarantee order, but list does:
myList = [a, b, c]

I suppose one thing that may be concerning you is whether or not the entries could change, so that the 2 becomes a different number, for instance. You can put your mind at ease here, because in Python, integers are immutable, meaning they cannot change after they are created.
Not everything in Python is immutable, though. For example, lists are mutable---they can change after being created. So for example, if you had a list of lists
>>> a = [[1], [2], [3]]
>>> a[0].append(7)
>>> a
[[1, 7], [2], [3]]
Here, I changed the first entry of a (I added 7 to it). One could imagine shuffling things around, and getting unexpected things here if you are not careful (and indeed, this does happen to everyone when they start programming in Python in some way or another; just search this site for "modifying a list while looping through it" to see dozens of examples).
It's also worth pointing out that x = x + [a] and x.append(a) are not the same thing. The second one mutates x, and the first one creates a new list and assigns it to x. To see the difference, try setting y = x before adding anything to x and trying each one, and look at the difference the two make to y.

Yes the list will remain as [1,2,3] unless you perform some other operation on it.

aList=[1,2,3]
i=0
for item in aList:
if i<2:
aList.remove(item)
i+=1
aList
[2]
The moral is when modifying a list in a loop driven by the list, takes two steps:
aList=[1,2,3]
i=0
for item in aList:
if i<2:
aList[i]="del"
i+=1
aList
['del', 'del', 3]
for i in range(2):
del aList[0]
aList
[3]

Yes lists and tuples are always ordered while dictionaries are not

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.