What is the difference between the two code snippets below?
built-in list code
>>> a = [1,2,3,4]
>>> b = a[1:3]
>>> b[1] = 0
>>> a
[1, 2, 3, 4]
>>> b
[2, 0]
numpy array
>>> import numpy
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3]
>>> d[1] = 0
>>> c
array([1, 2, 0, 4])
>>> d
array([2, 0])
As seen above, in the numpy case c is affected directly. I think that for built-in lists, new memory is allocated for the variable b. Presumably in numpy, a reference to c[1:3] is assigned to d, but I am not clear on this.
How does this work for numpy arrays and built-in lists?
The key point to understand is that every assignment in Python associates a name with an object in memory. Python never copies on assignment. It now becomes important to understand when new objects are created and how they behave.
In your first example, the slicing in the list creates a new list object. In this case, both of the lists reference some of the same objects (the int 2 and the int 3). The fact that these references are copied is what is called a "shallow" copy. In other words, the references are copied, but the objects they refer to are still the same. Keep in mind that this will be true regardless of the type of thing that is stored in the list.
Now, we create a new object (the int 0) and assign b[1] = 0. Because a and b are separate lists, it should not surprise us that they now show different elements.
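To make the "shallow copy" point concrete, here is a small sketch (illustrative values, Python 3) using a list of lists, where the sharing of inner objects becomes observable:

```python
# A slice makes a new outer list, but the copy is shallow:
# the inner objects are shared, not duplicated.
a = [[1, 2], [3, 4]]
b = a[:]              # new outer list, same inner lists
assert b is not a
assert b[0] is a[0]   # the references were copied, not the objects

b[0].append(99)       # mutate a shared inner list
assert a[0] == [1, 2, 99]   # visible through a as well

b[1] = "replaced"     # rebinding a slot in b leaves a untouched
assert a[1] == [3, 4]
```

With ints in the original example the sharing is invisible, because ints are immutable; with mutable inner objects it shows up immediately.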
I like the pythontutor visualisation of this situation.
In the array case, "All arrays generated by basic slicing are always views of the original array.".
This new object shares data with the original, and indexed assignment is handled in such a way that any updates to the view will update the shared data.
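One way to check this sharing directly (assuming NumPy is available) is via a view's `.base` attribute and `np.shares_memory`:

```python
import numpy as np

c = np.array([1, 2, 3, 4])
d = c[1:3]                     # basic slicing -> a view
assert d.base is c             # a view remembers the array it borrows from
assert np.shares_memory(c, d)  # same underlying data buffer

d[1] = 0
assert c[2] == 0               # the write lands in the shared data
```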
This has been covered a lot, but finding a good duplicate is too much work. :(
Let's see if I can quickly describe things with your examples:
>>> a = [1,2,3,4] # a list contains pointers to numbers elsewhere
>>> b = a[1:3] # a new list, with copies of those pointers
>>> b[1] = 0 # change one pointer in b
>>> a
[1, 2, 3, 4] # does not change any pointers in a
>>> b
[2, 0]
An array has a different structure - it has a data buffer with 'raw' numbers (or other byte values).
numpy array
>>> c = numpy.array([1,2,3,4])
>>> d = c[1:3] # a view; a new array but uses same data buffer
>>> d[1] = 0 # change a value in d;
>>> c
array([1, 2, 0, 4]) # we see the change in the corresponding slot of c
>>> d
array([2, 0])
The key point with lists is that they contain pointers to objects. You can copy the pointers without copying the objects; and you can change pointers without changing other copies of the pointers.
To save memory and speed, numpy has implemented the concept of a view. It can make a new array without copying values from the original, because it can share the data buffer. But it is also possible to make a copy, e.g.
e = c[1:3].copy()
e[0] = 10
# no change in c
View vs. copy is a big and fundamental topic in numpy, especially when dealing with the different kinds of indexing (slices, basic, advanced). We can help with questions, but you should also read the numpy docs. There's no substitute for understanding the basics of how a numpy array is stored.
http://scipy-cookbook.readthedocs.io/items/ViewsVsCopies.html
http://www.scipy-lectures.org/advanced/advanced_numpy/ (may be more advanced than what you need now)
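As a quick sketch of the indexing distinction mentioned above (basic slicing gives views, advanced indexing gives copies), with `np.shares_memory` used to verify:

```python
import numpy as np

a = np.arange(10)
view = a[2:5]         # basic slicing -> view
fancy = a[[2, 3, 4]]  # advanced (integer-array) indexing -> copy
mask = a[a > 6]       # boolean indexing -> also a copy

assert np.shares_memory(a, view)
assert not np.shares_memory(a, fancy)
assert not np.shares_memory(a, mask)

view[0] = -1
assert a[2] == -1               # writing through the view is visible in a
fancy[0] = -1
assert a[2] == -1 and a[3] == 3 # the copy is independent of a
```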
Related
An array.array object has a smaller memory footprint than a list object in Python. Is it still memory-efficient when you create the array.array like below?
from array import array
array('l', [1, 2, 3, 4, 5])
We're still creating a list object there (as the argument value) just to create that array. Doesn't that make array.array not worth using?
Update: It looks like I should reconsider the claim that arrays occupy less space than lists. It seems their behaviour differs across different Python versions:
Python 3.5.2
>>> import array, sys
>>> mylist = [1, 2, 3]
>>> myarray = array.array('i', [1, 2, 3])
>>> sys.getsizeof(mylist)
44
>>> sys.getsizeof(myarray)
44
Python 3.6.3
>>> import array, sys
>>> mylist = [1, 2, 3]
>>> myarray = array.array('i', [1, 2, 3])
>>> sys.getsizeof(mylist)
88
>>> sys.getsizeof(myarray)
76
However, my original question still remains (for Python 3.6). myarray used a list to be constructed. How is using an array more memory-efficient?
sys.getsizeof is not recursive, so it shows you how much memory the list object itself takes, but not the list's contents. Try this:
mylist = [ 5000, 5001, 5002 ]
sys.getsizeof(mylist) + sum(sys.getsizeof(q) for q in mylist)
outputs 172.
array.array certainly uses less memory; it also packs more densely, so it's more cache-friendly.
In your example, the list used to create the array.array is destroyed immediately afterwards, so there's no big issue there.
By using array.array you are storing adjacent C-like primitive values, instead of objects. If the arrays are so big that the size of the initializer list worries you, use an iterator for the initializer. That should reduce memory usage to a single element during array initialization.
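As a sketch of the iterator approach (Python 3; the exact byte counts vary by platform and interpreter), initialising from a generator and comparing the recursive footprint:

```python
from array import array
import sys

# Building from a generator avoids materialising a full list first;
# array.array accepts any iterable, so only one Python int object
# needs to exist at a time during initialisation.
big = array('l', (i for i in range(100_000)))

# Rough footprint comparison: for the list we must also count the
# int objects it points to, since sys.getsizeof is not recursive.
as_list = list(range(100_000))
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(q) for q in as_list)
array_bytes = sys.getsizeof(big)
assert array_bytes < list_bytes
```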
Sorry, this question came up before here: Setting two arrays equal.
But the solution did not work and I don't know why.
import numpy as np
zero_matrix = np.zeros((3,3)) # 3x3 zero matrix
test_matrix = zero_matrix[:] # test_matrix is a view of zero_matrix; without [:] it would be the same object
print (zero_matrix)
print ()
print (test_matrix)
print ()
print(id(test_matrix))
print ()
print(id(zero_matrix))
print ()
test_matrix[1] = 42
print (test_matrix)
print ()
print (zero_matrix)
zero_matrix is also changed when I set test_matrix[1] = 42, and I don't get why, since both have different object ids.
This is what is meant by the comment in your code that says test_matrix is a "view". A view does not have its own copy of the data; rather, it shares the underlying data of the original array. Views do not have to be of the entire array, but can be of small sub-sections of the array. These sub-sections do not even need to be contiguous if the view is strided, e.g.
a = np.arange(10)
b = a[::2] # create a view of every other element starting with the 0-th
assert list(b) == [0, 2, 4, 6, 8]
assert a[4] == 4
b[2] = -1
assert a[4] == -1
Views are powerful as they allow more complex operations without having to copy large amounts of data. Not needing to copy data all the time can mean some operations are faster than they otherwise would be.
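One everyday example of this power (a small sketch, assuming NumPy is available): transposing an array creates a view with swapped strides, so no data is moved at all.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)
t = m.T                        # transpose is just a view with swapped strides
assert np.shares_memory(m, t)  # no data was copied
assert m.strides != t.strides  # only the stride metadata differs

t[0, 1] = 99                   # t[0, 1] aliases m[1, 0]
assert m[1, 0] == 99
```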
Beware, not all index operations create views. eg.
a = np.arange(10, 20)
b = a[[1,2,5]]
assert list(b) == [11, 12, 15]
b[0] = -1  # b is a copy here, so this does not touch a
assert a[1] != -1
Use copy to copy your numpy arrays:
zero_matrix = np.zeros((3,3))
test_matrix = zero_matrix.copy()
test_matrix[1] = 42
print(zero_matrix)
print(test_matrix)
Numpy arrays and python lists behave differently in this regard.
They indeed have both different object IDs, but, as you write yourself: test_matrix is a view of zero_matrix.
An object is usually called a "view object" when it provides a way to access another object (be it by reading or by writing). In this case, accesses to this view object are deflected to the other object both by reading and writing.
That's a speciality of numpy objects opposed to "normal" python objects.
But even plain Python has such objects; it just doesn't use them unless explicitly requested.
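Two Python built-ins that behave like views (a small sketch; `memoryview` over a writable buffer, and dict key views):

```python
# memoryview is a view over an existing buffer: writes go through.
ba = bytearray(b"hello")
mv = memoryview(ba)
mv[0] = ord("j")
assert ba == bytearray(b"jello")

# dict.keys() returns a dynamic view that tracks the dict.
d = {"a": 1}
keys = d.keys()
d["b"] = 2
assert list(keys) == ["a", "b"]
```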
This question already has answers here:
How do I clone a list so that it doesn't change unexpectedly after assignment?
I have a list of the form
v = [0,0,0,0,0,0,0,0,0]
Somewhere in the code I do
vec=v
vec[5]=5
and this changes both v and vec:
>>> print vec
[0, 0, 0, 0, 0, 5, 0, 0, 0]
>>> print v
[0, 0, 0, 0, 0, 5, 0, 0, 0]
Why does v change at all?
Why does v change at all?
vec and v are both references.
When you write vec = v, you bind vec to the same object that v refers to.
Therefore changing the data through vec will also "change" v.
If you want to have two different arrays use:
vec = list(v)
Because v points to the same list in memory as vec does.
If you do not want that, you have to make a copy:
from copy import deepcopy
vec = deepcopy(v)
or
vec = v[:]
Python points both lists in vec = v to the same spot of memory.
To copy a list use vec = v[:]
This might all seem counter-intuitive. Why not make copying the list the default behavior? Consider the situation
def foo():
    my_list = some_function()
    # Do stuff with my_list
Wouldn't you want my_list to contain the exact same list that was created in some_function, rather than have the computer spend extra time creating a copy? For large lists, copying the data can take some time. For this reason, Python does not copy a list upon assignment.
Misc Notes:
If you're familiar with languages that use pointers: internally, in the resulting machine code, vec and v are just pointers that reference the address in memory where the list starts.
Other languages have been able to overcome the obstacles I mentioned through the use of copy on write which allows objects to share memory until they are modified. Unfortunately, Python never implemented this.
For other ways of copying a list, or to do a deep copy, see List changes unexpectedly after assignment. Why is this and how can I prevent it?
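A short sketch pulling the copy options from the answers above together, plus the nested-list case where a shallow copy is not enough:

```python
import copy

v = [0] * 9
# Any of these produce an independent (shallow) copy:
for vec in (v[:], list(v), v.copy(), copy.copy(v)):
    vec[5] = 5
    assert v[5] == 0   # the original is never touched

# For nested lists a shallow copy is not enough:
nested = [[0], [0]]
shallow = nested[:]
shallow[0][0] = 5
assert nested[0][0] == 5   # the inner list is still shared

deep = copy.deepcopy(nested)
deep[1][0] = 7
assert nested[1][0] == 0   # a deep copy is fully independent
```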
Run this code and you will understand why variable v changes.
a = [7, 3, 4]
b = a
c = a[:]
b[0] = 10
print 'a: ', a, id(a)
print 'b: ', b, id(b)
print 'c: ', c, id(c)
This code prints the following output on my interpreter:
a: [10, 3, 4] 140619073542552
b: [10, 3, 4] 140619073542552
c: [7, 3, 4] 140619073604136
As you can see, lists a and b point to the same memory location, whereas list c is at a different memory location altogether. You can say that variables a and b are aliases for the same list. Thus, any change made through either a or b will be reflected in the other, but not in list c.
Hope this helps! :)
you could use
vec = v[:] # but
Alex Martelli's opinion (at least back in 2007) about this is that it is a weird syntax and it does not make sense to use it ever. ;) (In his opinion, the next one is more readable):
vec = list(v)
I mean, it was Erez's link... "How to clone or copy a list in Python?"
Trying to understand the following
Why is it that the ID's assigned by Python are different for the same lists?
>>> x = [1, 2, 3]
>>> y = [1, 2, 3]
>>> id(x) != id(y)
True
>>> id(x)
11428848
>>> id(y)
12943768
Every distinct object in Python has its own ID. It's not related to the contents -- it's related to the location where the information that describes the object is stored. Any distinct object stored in a distinct place will have a distinct id. (It's sometimes, but not always, the memory address of the object.)
This is especially important to understand for mutable objects -- that is, objects that can be changed, like lists. If an object can be changed, then you can create two different objects with the same contents. They will have different IDs, and if you change one later, the second will not change.
For immutable objects like integers and strings, this is less important, because the contents can never change. Even if two immutable objects have different IDs, they are essentially identical if they have identical contents.
This set of ideas goes pretty deep. You can think of a variable name as a tag assigned to an ID number, which in turn uniquely identifies an object. Multiple variable names can be used to tag the same object. Observe:
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> id(a)
4532949432
>>> id(b)
4533024888
That, you've already discovered. Now let's create a new variable name:
>>> c = b
>>> id(c)
4533024888
No new object has been created. The object tagged with b is now tagged with c as well. What happens when we change a?
>>> a[1] = 1000
>>> a
[1, 1000, 3]
>>> b
[1, 2, 3]
a and b are different, as we know because they have different IDs. So a change to one doesn't affect the other. But b and c are the same object -- remember? So...
>>> b[1] = 2000
>>> b
[1, 2000, 3]
>>> c
[1, 2000, 3]
Now, if I assign a new value to b, it doesn't change anything about the objects themselves -- just the way they're tagged:
>>> b = a
>>> a
[1, 1000, 3]
>>> b
[1, 1000, 3]
>>> c
[1, 2000, 3]
The reason why is that if you do this:
l = [1, 2, 3]
m = [1, 2, 3]
l.append(4)
the ids should not be the same, and an object's id must never change, since it identifies the object.
All mutable objects work this way. But it is also the case for tuples (which are immutable).
Edit:
As commented below, the ids may refer to memory address in some python implementation but not in all.
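A short sketch of the identity-vs-equality distinction that underlies all of the above (equal contents, distinct objects, and aliasing):

```python
l = [1, 2, 3]
m = [1, 2, 3]
assert l == m          # equal contents...
assert l is not m      # ...but two distinct objects
assert id(l) != id(m)

n = l                  # assignment creates an alias, not a new object
assert n is l
l.append(4)
assert n == [1, 2, 3, 4]
assert id(l) == id(n)  # the id never changed; l was mutated in place
```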
Those aren't the same lists. They may contain identical information, but they are not the same. If you made y = x, you'd find that actually the id is the same.
Python keeps mutable objects with different IDs; that's why.
You can check it with immutable object ids too; a tuple, for example.
I have come across the fact that numpy arrays are passed by reference in multiple places, but then when I execute the following code, why is there a difference between the behaviour of foo and bar?
import numpy as np
def foo(arr):
    arr = arr - 3

def bar(arr):
    arr -= 3
a = np.array([3, 4, 5])
foo(a)
print a # prints [3, 4, 5]
bar(a)
print a # prints [0, 1, 2]
I'm using python 2.7 and numpy version 1.6.1
In Python, all variable names are references to values.
When Python evaluates an assignment, the right-hand side is evaluated before the left-hand side. arr - 3 creates a new array; it does not modify arr in-place.
arr = arr - 3 makes the local variable arr reference this new array. It does not modify the value originally referenced by arr which was passed to foo. The variable name arr simply gets bound to the new array, arr - 3. Moreover, arr is local variable name in the scope of the foo function. Once the foo function completes, there is no more reference to arr and Python is free to garbage collect the value it references. As Reti43 points out, in order for arr's value to affect a, foo must return arr and a must be assigned to that value:
def foo(arr):
    arr = arr - 3
    return arr  # or simply combine both lines into `return arr - 3`
a = foo(a)
In contrast, arr -= 3, which Python translates into a call to the __isub__ special method, does modify the array referenced by arr in place.
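The difference is easy to observe with id() (a small sketch, assuming NumPy is available):

```python
import numpy as np

arr = np.array([3, 4, 5])
before = id(arr)

arr -= 3                 # in place: ndarray.__isub__ mutates the buffer
assert id(arr) == before # still the same object

arr = arr - 3            # builds a new array, then rebinds the name
assert id(arr) != before # a different object now
```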
The first function calculates (arr - 3), then assigns the local name arr to it, which doesn't affect the array data passed in. My guess is that in the second function, np.ndarray overrides the -= operator and operates in place on the array data.
Python passes the array by reference:
$:python
...python startup message
>>> import numpy as np
>>> x = np.zeros((2,2))
>>> x
array([[0.,0.],[0.,0.]])
>>> def setx(x):
... x[0,0] = 1
...
>>> setx(x)
>>> x
array([[1.,0.],[0.,0.]])
The top answer is referring to a phenomenon that occurs even in compiled C code: any BLAS events will involve a "read-onto" step where either a new array is formed, which the user (the code writer in this case) is aware of, or a new array is formed "under the hood" in a temporary variable which the user is unaware of (you might see this as a .eval() call).
However, I can clearly access the memory of the array as if it is in a more global scope than the function called (i.e., setx(...)); which is exactly what "passing by reference" is, in terms of writing code.
And let's do a few more tests to check the validity of the accepted answer:
(continuing the session above)
>>> def minus2(x):
... x[:,:] -= 2
...
>>> minus2(x)
>>> x
array([[-1.,-2.],[-2.,-2.]])
Seems to be passed by reference. Let us do a calculation which will definitely compute an intermediate array under the hood, and see if x is modified as if it is passed by reference:
>>> def pow2(x):
... x = x * x
...
>>> pow2(x)
>>> x
array([[-1.,-2.],[-2.,-2.]])
Huh, I thought x was passed by reference, but maybe it is not? No: here we have shadowed x with a brand-new assignment (which is hidden via interpretation in Python), and Python will not propagate this "shadowing" back to the global scope (which would violate Python's use case: namely, to be a beginner-level coding language which can still be used effectively by an expert).
However, I can very easily perform this operation in a "pass-by-reference" manner by forcing the memory (which is not copied when I submit x to the function) to be modified instead:
>>> def refpow2(x):
... x *= x
...
>>> refpow2(x)
>>> x
array([[1., 4.],[4., 4.]])
And so you see that python can be finessed a bit to do what you are trying to do.
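Summing up the whole thread in one sketch: a function mutating its argument in place affects the caller, while one that rebinds the local name does not.

```python
import numpy as np

x = np.arange(4.0)

def times_two_inplace(a):
    a *= 2          # writes into the caller's buffer

def times_two_rebind(a):
    a = a * 2       # new array bound to a local name; caller unaffected

times_two_rebind(x)
assert list(x) == [0.0, 1.0, 2.0, 3.0]   # unchanged

times_two_inplace(x)
assert list(x) == [0.0, 2.0, 4.0, 6.0]   # mutated in place
```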