I'm trying to clip the values in arrays. I found the function np.clip() and it does what I need. However, the way it modifies the values of the arrays in a list of arrays confuses me. Here is the code:
import numpy as np
a = np.arange(5)
b = np.arange(5)
for x in [a,b]:
    np.clip(x, 1, 3, out=x)
Result:
>>> a
array([1, 1, 2, 3, 3])
>>> b
array([1, 1, 2, 3, 3])
The values of a and b have been changed even though I never assigned to them; np.clip() only ever operates on x.
There are some related questions, but they use the list index, e.g. Modifying list elements in a for loop, Changing iteration variable inside for loop in Python.
Can somebody explain how np.clip() can directly modify the arrays stored in the list?
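It's not because of the np.clip function itself. It's because you loop over a list of mutable objects: on each iteration x is bound to the same array object that a (or b) refers to, and out=x tells np.clip to write its result into that existing array, so the change is visible through a and b. You can see Immutable vs Mutable types for more information.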
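A minimal sketch of what happens, using the same a and b as the question: the loop variable is the existing array, and out=x makes np.clip write into it. For contrast, it also shows that calling np.clip without out= and assigning the result to x leaves the original (here a hypothetical third array c) untouched.

```python
import numpy as np

a = np.arange(5)
b = np.arange(5)

for x in [a, b]:
    # x is bound to the very same array object as a (then b); no copy is made
    assert x is a or x is b
    # out=x makes np.clip write the clipped values into that existing array
    np.clip(x, 1, 3, out=x)

print(a)  # [1 1 2 3 3]
print(b)  # [1 1 2 3 3]

# Contrast: without out=, np.clip returns a new array, and assigning it
# to x only rebinds the loop variable; c itself is left unchanged
c = np.arange(5)
for x in [c]:
    x = np.clip(x, 1, 3)

print(c)  # [0 1 2 3 4]
```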
Consider the following two snippets:
import numpy as np
mainlist = [np.array([0,0,0, 1]), np.array([0,0,0,1])]
for i in range(len(mainlist)):
    mainlist[i] = mainlist[i][0:2]
print(mainlist) # [array([0, 0]), array([0, 0])] => OK!
and:
import numpy as np
mainlist = [np.array([0,0,0, 1]), np.array([0,0,0,1])]
for element in mainlist:
    element = element[0:2]
print(mainlist) # [array([0, 0, 0, 1]), array([0, 0, 0, 1])] => WTF?
I was wondering why, in the second case, the arrays remain unchanged. It does not even throw an error about mutability problems. Could you explain exactly what is going on regarding the behavior of the second code? What would be the right way of doing it instead?
Variable name a --->(point to) np.array([0,0,0, 1])
Variable name b --->(point to) np.array([0,0,0, 1])
Variable name mainlist --->(point to) [a, b]
When you use for element in mainlist:,
Variable name element --->(point to) np.array([0,0,0, 1])
When you assign another value to element by element = np.array([]),
element --->(point to) np.array([])
But mainlist still points to [a, b].
When you use mainlist[0] = np.array([]), you really do put np.array([]) in the first slot of mainlist, but a still points to np.array([0,0,0, 1]).
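The picture above can be checked with a small sketch that builds mainlist from named arrays a and b (as in the diagram) and uses `is` to compare object identity:

```python
import numpy as np

a = np.array([0, 0, 0, 1])
b = np.array([0, 0, 0, 1])
mainlist = [a, b]

for element in mainlist:
    # Rebinds the name element only; the list slots are not touched
    element = np.array([])

print(mainlist[0] is a)  # True

# Assigning to an index replaces the reference stored in the list
mainlist[0] = np.array([])
print(mainlist[0] is a)  # False
print(a)  # [0 0 0 1]: a still points to the original array
```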
element holds a reference to the array (the list you are iterating over also just stores references to the arrays), not a reference to a slot in the list. element = element[0:2] only changes the reference stored in element, leaving the one in the list unchanged. You can check the identity of the referenced object using id():
import numpy as np
mainlist = [np.array([0,0,0, 1]), np.array([0,0,0,1])]
ref_0 = id(mainlist[0])
for element in mainlist:
    element = element[0:2]
print(mainlist) # [array([0, 0, 0, 1]), array([0, 0, 0, 1])] => WTF?
# True: the referenced object did not change.
print(ref_0 == id(mainlist[0]))
By assigning to mainlist[i], you are actively changing the reference stored at position i of the list to the new view into the array:
import numpy as np
mainlist = [np.array([0,0,0, 1]), np.array([0,0,0,1])]
ref_0 = id(mainlist[0])
for i in range(len(mainlist)):
    mainlist[i] = mainlist[i][0:2]
print(mainlist) # [array([0, 0]), array([0, 0])] => OK!
# False: referenced object changed.
print(ref_0 == id(mainlist[0]))
In the first case, you modify the elements of the list through their index positions, so the list is updated. In the second case, you only take each element out of the list and bind it to the loop variable element; reassigning element updates that variable, not the values stored in the list. So the original list remains unchanged.
If you want to use the second approach and still get the updated values, you can create a second list and append each new element to it.
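A minimal sketch of that approach, using the question's mainlist; the names newlist and newlist2 are illustrative, and the list comprehension is shown only as an equivalent, more compact form:

```python
import numpy as np

mainlist = [np.array([0, 0, 0, 1]), np.array([0, 0, 0, 1])]

# Collect the sliced arrays in a second list
newlist = []
for element in mainlist:
    newlist.append(element[0:2])

# Equivalent list comprehension
newlist2 = [element[0:2] for element in mainlist]

# Rebind the name mainlist to the new list if you want to keep that name
mainlist = newlist
print(mainlist)  # [array([0, 0]), array([0, 0])]
```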
Python works with references. In the second example, element is a loop variable referencing a list item (i.e. a Numpy array). element = element[0:2] makes element reference another object: a newly created view object. Once element is rebound on the next iteration, the temporary view is no longer referenced and is implicitly deleted. This does not mutate the initial mainlist list, only the element reference. In the first example, mainlist[i] = mainlist[i][0:2] replaces the item at index i, and so mutates the original mainlist list: the item now references a newly created view. mainlist[i] is not overwritten later, so you see the change.
Another example that shows this kind of behavior:
l = [[None]*2]*2 # [[None, None], [None, None]]
l[0][0] = 42 # [[42, None], [42, None]]
Two items are modified since they reference the same object (i.e. the list [42, None]). You can see that using id: id(l[0]) == id(l[1]).
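To see the shared reference directly, and one common way to avoid it (a list comprehension that builds a separate inner list on each iteration), a short sketch:

```python
# Multiplying the outer list copies the reference to one inner list
l = [[None] * 2] * 2
print(l[0] is l[1])  # True: both slots hold the same list
l[0][0] = 42
print(l)  # [[42, None], [42, None]]

# A list comprehension creates a new inner list on every iteration
m = [[None] * 2 for _ in range(2)]
print(m[0] is m[1])  # False
m[0][0] = 42
print(m)  # [[42, None], [None, None]]
```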
I have a numpy array that is changed by a function.
After calling the function, I want to continue with the initial value of the array (its value before calling the modifying function).
import numpy as np
# Init of the array
array = np.array([1, 2, 3])
# Function that modifies array
func(array)
# Print the init value [1,2,3]
print(array)
Is there a way to pass the array by value or am I obligated to make a deep copy?
np.ndarray objects are mutable data structures. This means that any variables that refer to a particular object will all reflect changes when a change is made to the object.
However, keep in mind that most numpy functions that transform arrays return new array objects, leaving the original unchanged.
What you need to do in this scenario depends on exactly what you're doing. If your function modifies the same array in place, then you'll need to pass a copy to the function. You can do this with np.ndarray.copy.
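A minimal sketch, where func is a hypothetical stand-in for any function that modifies its argument in place. For a numeric array, ndarray.copy() copies the data buffer, so no separate deep-copy step should be needed; note that for object-dtype arrays the contained Python objects are not themselves copied.

```python
import numpy as np

def func(arr):
    # Hypothetical stand-in for a function that modifies its argument in place
    arr *= 2

array = np.array([1, 2, 3])

# Pass a copy so that func only modifies the copy
func(array.copy())
print(array)  # [1 2 3]

# Passing the original lets func modify it
func(array)
print(array)  # [2 4 6]
```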
You could use the following library (https://pypi.python.org/pypi/pynverse), which computes the inverse of your function, and then call the inverse, like so:
from pynverse import inversefunc
cube = (lambda x: x**3)
invcube = inversefunc(cube)
arr = func(arr)
In [0]: arr
Out[0]: array([1, 8, 27], dtype=int32)
In [1]: invcube(arr)
Out[1]: array([1, 2, 3])
What is the difference between the following snippets?
Built-in list:
>>> a = [1,2,3,4]
>>> b = a[1:3]
>>> b[1] = 0
>>> a
[1, 2, 3, 4]
>>> b
[2, 0]
NumPy array:
>>> import numpy
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3]
>>> d[1] = 0
>>> c
array([1, 2, 0, 4])
>>> d
array([2, 0])
As you can see, the NumPy array c is affected directly. I think that with built-in lists, new memory is allocated for b, whereas in NumPy d is probably assigned a reference to c[1:3], but I am not clear about this.
How does this work for NumPy arrays and for built-in lists?
The key point to understand is that every assignment in Python associates a name with an object in memory. Python never copies on assignment. It now becomes important to understand when new objects are created and how they behave.
In your first example, the slicing in the list creates a new list object. In this case, both of the lists reference some of the same objects (the int 2 and the int 3). The fact that these references are copied is what is called a "shallow" copy. In other words, the references are copied, but the objects they refer to are still the same. Keep in mind that this will be true regardless of the type of thing that is stored in the list.
Now, we create a new object (the int 0) and assign b[1] = 0. Because a and b are separate lists, it should not surprise us that they now show different elements.
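A short sketch that checks these claims with `is`: the sliced list is a new object, but its slots refer to the same element objects as the original. The second part (with illustrative names outer and part) uses mutable inner lists to show that the shallow copy still shares them:

```python
a = [1, 2, 3, 4]
b = a[1:3]            # slicing builds a new list object
print(b is a)         # False: two separate lists
print(b[0] is a[1])   # True: both slots reference the same int object
b[1] = 0              # rebinds slot 1 of b only
print(a, b)           # [1, 2, 3, 4] [2, 0]

# With mutable elements, the shallow copy still shares the inner objects
outer = [[1], [2], [3]]
part = outer[0:2]
part[0].append(99)    # mutates the inner list that both lists reference
print(outer)          # [[1, 99], [2], [3]]
```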
I like the pythontutor visualisation of this situation.
In the array case, "All arrays generated by basic slicing are always views of the original array.".
This new object shares data with the original, and indexed assignment is handled in such a way that any updates to the view will update the shared data.
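One way to confirm that a basic slice is a view, sketched with the question's c and d: the view's base attribute points back at the original array, and np.shares_memory reports the shared buffer.

```python
import numpy as np

c = np.array([1, 2, 3, 4])
d = c[1:3]                     # basic slice: a view onto c's data buffer
print(d.base is c)             # True: d does not own its data
print(np.shares_memory(c, d))  # True
d[1] = 0                       # writes through to the shared buffer
print(c)                       # [1 2 0 4]
```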
This has been covered a lot, but finding a good duplicate is too much work. :(
Let's see if I can quickly describe things with your examples:
>>> a = [1,2,3,4] # a list contains pointers to numbers elsewhere
>>> b = a[1:3] # a new list, with copies of those pointers
>>> b[1] = 0 # change one pointer in b
>>> a
[1, 2, 3, 4] # does not change any pointers in a
>>> b
[2, 0]
An array has a different structure - it has a data buffer with 'raw' numbers (or other byte values).
numpy array
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3] # a view; a new array but uses same data buffer
>>> d[1] = 0 # change a value in d;
>>> c
array([1, 2, 0, 4]) # we see the change in the corresponding slot of c
>>> d
array([2, 0])
The key point with lists is that they contain pointers to objects. You can copy the pointers without copying the objects; and you can change pointers without changing other copies of the pointers.
To save memory and time, numpy implements the concept of a view. It can make a new array without copying values from the original, because it can share the data buffer. But it is also possible to make a copy, e.g.
e = c[1:3].copy()
e[0] = 10
# no change in c
View vs. copy is a big and fundamental topic in numpy, especially when dealing with different kinds of indexing (slices, basic, advanced). We can help with questions, but you should also read the numpy docs. There's no substitute for understanding the basics of how a numpy array is stored.
http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
http://www.scipy-lectures.org/advanced/advanced_numpy/ (may be more advanced than what you need now)
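As a small illustration of how the kind of indexing decides view vs. copy, a sketch using np.shares_memory as the check: basic slicing shares the buffer, while advanced (integer-array or boolean-mask) indexing returns a new array.

```python
import numpy as np

c = np.array([1, 2, 3, 4])

view = c[1:3]        # basic slicing returns a view
fancy = c[[1, 2]]    # advanced (integer-array) indexing returns a copy
mask = c[c > 1]      # boolean-mask indexing also returns a copy

print(np.shares_memory(c, view))   # True
print(np.shares_memory(c, fancy))  # False
print(np.shares_memory(c, mask))   # False

fancy[0] = 100
print(c)  # [1 2 3 4]: unchanged
```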
For example, for a 1D array A with n elements, in Matlab I can do:
A(end+1) = 1
This assigns the value 1 to a new last element of A, which is now n+1 elements long.
Is there an equivalent in Python/Numpy?
You can just append a value to the end of an array/list using append or numpy.append:
# Python list
a = [1, 2, 3]
a.append(1)
# => [1, 2, 3, 1]
# Numpy array
import numpy as np
a = np.array([1, 2, 3])
a = np.append(a, 1)
# => [1, 2, 3, 1]
Note, as pointed out by @BrenBarn, that the numpy.append approach creates a whole new array each time it is executed, which makes it inefficient.
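A sketch contrasting repeated np.append with two common alternatives, preallocating the array or collecting values in a Python list and converting once. N is an arbitrary size chosen for illustration; no timings are claimed here.

```python
import numpy as np

N = 5  # arbitrary size for illustration

# np.append allocates and fills a brand-new array on every call
grown = np.array([], dtype=int)
for i in range(N):
    grown = np.append(grown, i)

# Preallocate once, then assign by index
pre = np.empty(N, dtype=int)
for i in range(N):
    pre[i] = i

# Or append to a Python list (amortized cheap) and convert once at the end
collected = []
for i in range(N):
    collected.append(i)
arr = np.array(collected)

print(grown, pre, arr)  # [0 1 2 3 4] [0 1 2 3 4] [0 1 2 3 4]
```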
I bet the Matlab/Octave operation does the same - create a new object. But I don't know if there is something like the Python id(a) to verify that.
A crude timing test in Octave supports this: creating a large array by appending is slower than filling a preallocated array element by element. Both are much slower than direct assignment like A=1:N.
octave:36> t=time; N=1000000; A=[]; A(N)=1; for i=1:N A(i)=i; end; t-time
ans = -4.0374
octave:37> t=time; N=1000000; A=[]; for i=1:N A(end+1)=i; end; t-time
ans = -15.218
Extending an array with (end+1) is more idiomatic in JavaScript than in Matlab.
I would like to ask what the following does in Python.
It was taken from http://danieljlewis.org/files/2010/06/Jenks.pdf
I have entered comments telling what I think is happening there.
# Seems to be a function that returns a float vector
# dataList seems to be a vector of floats.
# numClass seems to be an int
def getJenksBreaks( dataList, numClass ):
    # dataList seems to be a vector of floats. "Sort" seems to sort it in ascending order
    dataList.sort()
    # create a 1-dimensional vector
    mat1 = []
    # "in range" seems to be something like "for i = 0 to len(dataList)+1"
    for i in range(0,len(dataList)+1):
        # create a 1-dimensional vector?
        temp = []
        for j in range(0,numClass+1):
            # append a zero to the vector?
            temp.append(0)
        # append the vector to a vector??
        mat1.append(temp)
    (...)
I am a little confused because in the pdf there are no explicit variable declarations. However, I think (and hope) I have guessed the variable types correctly.
Yes, the method append() adds elements to the end of the list. I think your interpretation of the code is correct.
But note the following:
x =[1,2,3,4]
x.append(5)
print(x)
[1, 2, 3, 4, 5]
while
x.append([6,7])
print(x)
[1, 2, 3, 4, 5, [6, 7]]
If you want something like
[1, 2, 3, 4, 5, 6, 7]
you may use extend()
x.extend([6,7])
print(x)
[1, 2, 3, 4, 5, 6, 7]
Python doesn't have explicit variable declarations. It's dynamically typed, variables are whatever type they get assigned to.
Your assessment of the code is pretty much correct.
One detail: the range function goes up to, but does not include, its stop value. So the +1 in the second argument to range makes the last iterated values len(dataList) and numClass, respectively. Because the range starts at 0, the outer loop performs a total of len(dataList) + 1 iterations, which looks suspicious.
Presumably dataList is a list, in which case dataList.sort() sorts it in place (and returns None), modifying the original value of dataList; that is the traditional behavior of the .sort() method.
It is indeed appending the new vector to the initial one. If you look at the full source code, there are several blocks that continue to append more vectors to mat1.
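A quick sketch of the iteration counts, with hypothetical inputs data_list and num_class standing in for the function's arguments:

```python
data_list = [3.0, 1.0, 2.0]  # hypothetical input
num_class = 2                # hypothetical input

outer = list(range(0, len(data_list) + 1))
inner = list(range(0, num_class + 1))
print(outer)  # [0, 1, 2, 3]: len(data_list) + 1 == 4 iterations
print(inner)  # [0, 1, 2]: the last value is num_class
```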
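The difference between sorting in place and getting a sorted copy can be sketched with list.sort() and the built-in sorted(); the sample data is made up:

```python
data = [3.0, 1.0, 2.0]
result = data.sort()          # list.sort() sorts the list in place
print(result)                 # None
print(data)                   # [1.0, 2.0, 3.0]

original = [3.0, 1.0, 2.0]
new_list = sorted(original)   # sorted() returns a new list
print(original)               # [3.0, 1.0, 2.0]
print(new_list)               # [1.0, 2.0, 3.0]
```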
append is a list method that adds a value to the end of the list.
mat1 and temp together create a 2D array (a list of lists, e.g. [[0, 0], [0, 0], [0, 0]]), i.e. an m x n matrix,
where m = len(dataList)+1 and n = numClass+1.
The resulting matrix is a zero matrix, as all its values are 0.
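A sketch that reproduces the loop from the question with hypothetical inputs data_list and num_class, checks the resulting dimensions, and compares it with an equivalent list comprehension (an alternative spelling, not the original author's code):

```python
data_list = [0.5, 0.1, 0.3]  # hypothetical input
num_class = 2                # hypothetical input

# The loop from the question
mat1 = []
for i in range(0, len(data_list) + 1):
    temp = []
    for j in range(0, num_class + 1):
        temp.append(0)
    mat1.append(temp)

# Equivalent comprehension; each row is a separate list
mat2 = [[0] * (num_class + 1) for _ in range(len(data_list) + 1)]

print(mat1 == mat2)             # True
print(len(mat1), len(mat1[0]))  # 4 3
```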
In Python, variables are implicitly declared. When you type this:
i = 1
i is set to a value of 1, which happens to be an integer. So we will talk of i as being an integer, although i is only a reference to an integer value. The consequence of that is that you don't need type declarations as in C++ or Java.
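A tiny sketch of the same name being rebound to objects of different types, with no declaration involved:

```python
i = 1
print(type(i).__name__)  # int
i = "one"                # the same name is simply rebound to a str object
print(type(i).__name__)  # str
```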
As for the comments, your understanding is mostly correct. [] creates a list. You can think of it as a linked list, although its actual implementation is closer to a C++ std::vector (a dynamic array).
As Python variables are only references to objects in general, lists are effectively lists of references, and can potentially hold any kind of values. This is valid Python:
# A vector of numbers
vect = [1.0, 2.0, 3.0, 4.0]
But this is perfectly valid code as well:
# The list of my objects:
my_list = [1, [2,"a"], True, 'foo', object()]
This list contains an integer, another list, a boolean... In Python, you usually rely on duck typing for your variable types, so this is not a problem.
Finally, one of the methods of list is sort, which sorts it in-place, as you correctly guessed, and the range function generates a range of numbers.
The syntax for x in L: ... iterates over the contents of L (assuming it is iterable) and binds the variable x to each successive value in turn. For example:
>>> for x in ['a', 'b', 'c']:
...     print(x)
a
b
c
Since range generates a range of numbers, this is effectively the idiomatic way to generate a for i = 0; i < N; i += 1 type of loop:
>>> for i in range(4): # list(range(4)) == [0, 1, 2, 3]
...     print(i)
0
1
2
3