arr = np.arange(0,11)
slice_of_arr = arr[0:6]
slice_of_arr[:]=99
# slice_of_arr returns
array([99, 99, 99, 99, 99, 99])
# arr returns
array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])
As the example above shows, you cannot change the values of slice_of_arr without also changing arr, because it's a view of arr, not an independent copy.
My questions are:
Why is NumPy designed like this? Wouldn't it be tedious to have to call .copy every time before assigning values?
Is there anything I can do to get rid of the .copy? How can I change this default behavior of NumPy?
I think you have the answers in the other comments, but more specifically:
1.a. Why is NumPy designed like this?
Because it's much faster (constant time) to create a view than to create a whole new array (linear time).
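A rough way to see the difference (a sketch; exact numbers depend on your machine and array size):

import timeit
import numpy as np

big = np.arange(1_000_000)

# Slicing only builds a small view object: (almost) independent of array size.
print(timeit.timeit(lambda: big[:-1], number=100))
# Copying has to allocate and fill a new buffer: scales with array size.
print(timeit.timeit(lambda: big[:-1].copy(), number=100))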
1.b. Wouldn't it be tedious to have to call .copy every time before assigning values?
Actually, it's not that common to need a copy, so no, it's not tedious. Even if it can be surprising at first, this design is very good.
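For reference, the explicit copy in the toy example above looks like this:

import numpy as np

arr = np.arange(0, 11)
slice_copy = arr[0:6].copy()  # a new array, detached from arr
slice_copy[:] = 99
print(arr)  # unchanged: [ 0  1  2  3  4  5  6  7  8  9 10]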
2.a. Is there anything I can do to get rid of the .copy?
I can't really tell without seeing real code. In the toy example you give, you can't avoid creating a copy, but in real code you usually apply functions to the data, which return another array, so a copy isn't needed.
Can you give an example of real code where you need to call .copy repeatedly?
2.b. How can I change this default behavior of NumPy?
You can't. Try to get used to it; you'll see how powerful it is.
What does (numpy) __array_wrap__ do?
talks about ndarray subclasses and hooks like __array_wrap__. np.array takes a copy parameter, forcing the result to be a copy even if it isn't required by other considerations. ravel returns a view (when it can), flatten a copy. So it is probably possible, and maybe not too difficult, to construct an ndarray subclass that forces a copy. It may involve modifying a hook like __array_wrap__.
Or maybe modifying the __getitem__ method. Indexing as in slice_of_arr = arr[0:6] involves a call to __getitem__. For ndarray this is compiled, but for a masked array it is Python code that you could use as an example:
/usr/lib/python3/dist-packages/numpy/ma/core.py
It may be something as simple as
def __getitem__(self, indx):
    """x.__getitem__(y) <==> x[y]"""
    # _data = ndarray.view(self, ndarray)  # original line; change to:
    _data = ndarray.copy(self)  # force a copy instead of a view
    dout = ndarray.__getitem__(_data, indx)
    return dout
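As a proof of concept, here is a minimal sketch of such a subclass (CopyArray is a hypothetical name; it forces the copy with np.array rather than through __array_wrap__, and is untested beyond the basics):

import numpy as np

class CopyArray(np.ndarray):
    """Hypothetical ndarray subclass whose indexing always returns a copy."""
    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(result, np.ndarray):
            return np.array(result)  # np.array copies by default
        return result                # scalars need no copying

arr = np.arange(0, 11).view(CopyArray)
slice_of_arr = arr[0:6]  # a copy, not a view
slice_of_arr[:] = 99
print(arr[0:6])          # original unchanged: [0 1 2 3 4 5]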
But I suspect that by the time you develop and fully test such a subclass, you might fall in love with the default no-copy approach. While this view-vs-copy business bites many newcomers (especially those coming from MATLAB), I haven't seen complaints from experienced users. Look at other numpy SO questions; you won't see a lot of copy() calls.
Even regular Python users are used to asking themselves whether a reference or slice is a copy or not, and whether something is mutable or not.
For example, with lists:
In [754]: ll=[1,2,[3,4,5],6]
In [755]: llslice=ll[1:-1]
In [756]: llslice[1][1:2]=[10,11,12]
In [757]: ll
Out[757]: [1, 2, [3, 10, 11, 12, 5], 6]
Modifying an item inside a slice modifies that same item in the original list. In contrast to numpy, a list slice is a copy, but a shallow one: the nested list object is shared. You have to take extra effort to make a deep copy (import copy).
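A short sketch of the shallow/deep distinction:

import copy

ll = [1, 2, [3, 4, 5], 6]
shallow = ll[1:-1]        # new outer list, but the inner list is shared
deep = copy.deepcopy(ll)  # recursively copies nested objects too

deep[2][1] = 99
print(ll[2])              # [3, 4, 5]: the deep copy is fully independent
shallow[1][1] = 99
print(ll[2])              # [3, 99, 5]: the shallow slice shares the inner list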
/usr/lib/python3/dist-packages/numpy/lib/index_tricks.py contains some indexing functions aimed at making certain indexing operations more convenient. Several are actually classes, or class instances, with custom __getitem__ methods. They may also serve as models of how to customize your slicing and indexing.
Related
This is just a question, purely out of interest.
I was learning numpy when I stumbled upon the difference between assigning an existing array to a variable and using numpy.copy().
import numpy as np
original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
new = original[:, 0]
original[2, 2] = 69
print(new)
[ 1 4 7 10]
I know that if changes are made to the existing array (original), new will also have the same changes. But if I change something in original that is out of bounds for new, nothing changes in new. My question is: are there any hidden changes in new if this happens? Should I be concerned?
original[:, 0] is a view on original. This means that both share the same memory buffer containing the array values, so modifying one will affect the other. Technically, original is also a view: in NumPy you never really manipulate arrays directly, only views on arrays, and NumPy takes care of the memory management of the underlying arrays for you. A view is a reference to an array, some offsets, some strides, a size, and some low-level flags; not much more. Modifying values through one view does not affect the other view objects themselves, though the content seen through those views can change, since they all reference the same underlying array (technically, a view does not contain the values, only the "hidden" referenced array).
Put shortly: no, there is no hidden change other than the modified cell in the shared array.
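You can check this yourself through the base attribute (a small sketch):

import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
new = original[:, 0]

print(new.base is original)  # True: new shares original's memory buffer
original[1, 0] = 42          # inside new's column: visible through new
print(new)                   # [ 1 42  7 10]
original[2, 2] = 69          # outside new's column: new is unaffected
print(new)                   # [ 1 42  7 10]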
It might depend on what you mean by "hidden" changes. What you did when you assigned a part of original to new is make a view. Only what is within the "scope" of new changes; under the hood, nothing else does. As soon as a modification to original falls inside the "scope" of new, you will see it reflected when you use new. Note, however, that new itself is not changed under the hood, because you are looking at a view.
So, I know that you can do this by doing
>>> arr[mask] = value
However, if I want to make the code shorter (and not recompute the mask and index each time), I'd like to do something like this:
>>> sub = arr[mask]
>>> sub[...] = value # This works in other cases, but not this one.
My understanding is that Ellipsis indexing should let you specify that you're not rebinding the variable, but are instead broadcasting the assignment into the actual array.
So, here's the question: why doesn't it work?
My thinking is that it's related to the fact that:
>>> arr[mask] is arr[mask]
False
But surely since the mask indexed versions are just views (not copies of the underlying structure), this shouldn't break assignment.
The reason why this doesn't work is that indexing with masks will create a copy, not a view:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
arr[mask] is a copy. arr[mask] = ... looks the same, but it is actually a different operation: an assignment. Elsewhere I've explained this in terms of calls to __getitem__ and __setitem__.
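A small sketch of the distinction:

import numpy as np

arr = np.arange(6)
mask = arr % 2 == 0

arr[mask] = 99        # arr.__setitem__(mask, 99): writes into arr itself
print(arr)            # [99  1 99  3 99  5]

sub = arr[mask]       # arr.__getitem__(mask): advanced indexing returns a copy
sub[...] = 0          # mutates only the copy
print(arr)            # still [99  1 99  3 99  5]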
I want to clean up some code I've written, in order to scale up what I'm trying to do. Ideally, I'd like to create a list of references to objects, so that I can systematically set the objects in a loop, without actually having to put the objects in a list. I've read about the way Python handles references and argument passing, but haven't quite found a way to do this effectively.
To better demonstrate what I'm trying to do:
I'm using bokeh, and would like to set up a large number of select boxes. Each box looks like this
select_one_name = Select(
    title='test',
    value='first_value',
    options=['first_value', 'second_value', 'etc']
)
Setting up each select is fine when I only have a few, but when I have 20, my code gets very long and unwieldy. What I'd like is to have a list, sample_list = [select_one_name, select_two_name, etc], that I can then loop through to set the values of each select_one_name, select_two_name, etc. However, I want my reference select_one_name to still point to the correct value, rather than necessarily referring to the value as sample_list[0].
I'm not sure if this is doable. If there's a better way to do this than creating a list of references, please let me know. I know that I could just create a list of objects, but I'm trying to avoid that.
For reference, I'm on Python 2.7, Anaconda distribution, Windows 7. Thanks!
To follow up on Alex Martelli's post below:
The reason I thought this might not work is that when I tried a mini-test with a list of lists, I didn't get the results I wanted. To demonstrate:
x = [1, 2, 3]
y = [4, 5, 6]
test = [x, y]
test[0].append(1)
This results in x = [1, 2, 3, 1]. But if, instead, I use test[0] = [1, 2], then x remains [1, 2, 3], although test itself reflects the change.
Drawing a parallel back to my original example, I thought assignment would give the same results as the mutating call. Is this not true?
Every Python list is internally an array of references (in CPython, which is no doubt what you're using, at the C level it's an array of PyObject*, "pointers to Python objects").
No copies of the objects get made implicitly: rather (again, in CPython) each object's reference count gets incremented when you add "the object" (actually a reference to it) to the list. In fact, when you do want an object's copy, you need to specifically ask for one (with the copy module in general, or sometimes with type-specific copy methods).
Multiple references to the same object are internally pointers to exactly the same memory. If an object is mutable, then mutating it gets reflected through all the references to it. Of course, there are immutable objects (strings, numbers, tuples, ...) to which such mutation cannot apply.
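You can observe this directly (a sketch; note that sys.getrefcount includes one temporary reference for its own argument):

import sys

x = [1, 2, 3]
print(sys.getrefcount(x))  # 2: the name x, plus getrefcount's own argument

lst = [x, x]               # the list stores two more references, no copies
print(sys.getrefcount(x))  # 4
print(lst[0] is x)         # True: same object, same memory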
So when you do, e.g.,
sample_list = [select_one_name, select_two_name, etc]
each of the names (as long as it's in scope) still refers to exactly the same object as the corresponding item in sample_list.
In other words, using sample_list[0] and select_one_name is totally equivalent as long as both references to the same object exist.
IOW squared, your stated purpose is already accomplished by Python's most fundamental semantics. Now, please edit the Q to clarify which behavior you're observing that seems to contradict this, versus which behavior you think you should be observing (and desire), and we may be able to help further, because to this point all the above observations amount to "you're getting exactly the semantics you ask for", so "steady as she goes" is all I can practically suggest!-)
Added (better here in the answer than just below in the comments:-): note the focus on mutating operations. The OP tried test[0] = somelist followed by test[0].append and saw somelist mutated accordingly; then tried test[0] = [1, 2] and was surprised to see somelist unchanged. But that's because assignment to a reference is not a mutating operation on the object that said reference used to indicate! It just re-seats the reference, decrements the previously-referred-to object's reference count, and that's it.
If you want to mutate an existing object (which needs to be a mutable one in the first place, but a list satisfies that), you need to perform mutating operations on it (through whatever reference; it doesn't matter which). For example, besides append and many other named methods, one mutating operation on a list is assignment to a slice, including the whole-list slice denoted [:]. So test[0][:] = [1, 2] would in fact mutate somelist, which is very different from test[0] = [1, 2], which assigns to a reference, not to a slice.
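To make the contrast concrete:

somelist = [1, 2, 3]
test = [somelist]

test[0] = [1, 2]     # re-seats the reference inside test; somelist untouched
print(somelist)      # [1, 2, 3]

test = [somelist]
test[0][:] = [1, 2]  # whole-list slice assignment mutates the object itself
print(somelist)      # [1, 2]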
This is not recommended, but it works.
sample_list = ["select_one_name", "select_two_name", "select_three_name"]
for select in sample_list:
locals()[select] = Select(
title = 'test',value = 'first_value',
options = ['first_value', 'second_value', 'etc']
)
You can then use select_one_name, select_two_name, etc. directly, because they're bound in the local scope through the special locals() dictionary. (Note that writing to locals() is only reliable at module scope; inside a function, CPython does not propagate such writes back to the real local variables.)
A cleaner approach is to use a dictionary, e.g.
selects = {
'select_one_name': Select(...),
'select_two_name': Select(...),
'select_three_name': Select(...)
}
Then reference selects['select_one_name'] in your code, and you can iterate over selects.keys() or selects.items().
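For example, to set every select's value in one pass (assuming the Select objects were constructed as above; value is a writable Select property):

# reset every select box in a single loop
for name, sel in selects.items():
    sel.value = 'first_value'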
Which of the following should I use and why?
import numpy as np
a = np.zeros([2, 3])
b = np.zeros((2, 3))
There are many cases where you can pass arguments either way; I just wonder if one is more Pythonic, or if there are other reasons why one should be preferred over the other.
I looked at this question where people tried to explain the difference between a tuple and a list. That's not what I'm interested in, unless there are reasons to care that I'm unaware of, of course!
UPDATE:
Although numpy was used as an example this pertains generally to python. A non numpy example is as follows:
a = max([1, 2, 3, 5, 4])
b = max((1, 2, 3, 5, 4))
I'm not editing the above, because some answers use numpy in their explanation.
I'm answering this in the context of passing a literal iterable to a constructor or function, where beyond that call the type does not matter. If you need to pass in a hashable argument, you need a tuple. If you'll need it mutated, pass in a list (so that you don't add tuples to tuples, thereby multiplying the creation of objects).
The answer to your question is that the better option varies situationally. Here's the tradeoffs.
Starting with the list type, which is mutable: it preallocates memory for future extension:
a = np.zeros([2, 3])
Pro: It's easily readable.
Con: It wastes memory, and it's less performant.
Next, the tuple type, which is immutable. It doesn't need to preallocate memory for future extension, because it can't be extended.
b = np.zeros((2, 3))
Pro: It uses minimal memory, and it's more performant.
Con: It's a little less readable.
My preference is to pass tuple literals where memory is a consideration, for example, in long-running scripts that will be used by lots of people. On the other hand, when I'm using an interactive interpreter, I prefer to pass lists, because they're a bit more readable: the contrast between the square brackets and the parentheses makes for easy visual parsing.
You should only care about the performance difference inside a function, where the code is compiled to bytecode; there, a tuple of constants can be stored as a single prebuilt constant, while a list must be rebuilt on every call:
>>> min(timeit.repeat('foo()', 'def foo(): return (0, 1)'))
0.080030765042010898
>>> min(timeit.repeat('foo()', 'def foo(): return [0, 1]'))
0.17389221549683498
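You can see where the difference comes from with the dis module (CPython-specific; the exact bytecode varies by version):

import dis

def make_tuple():
    return (0, 1)

def make_list():
    return [0, 1]

dis.dis(make_tuple)  # LOAD_CONST (0, 1): the tuple is a prebuilt constant
dis.dis(make_list)   # BUILD_LIST: a fresh list object on every call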
Finally, note that the performance consideration is dwarfed by other considerations. You use Python for speed of development, not for speed of algorithmic implementation; a bad algorithm will hurt your performance far more. Python is also quite performant in many respects. I consider this only important insomuch as it may scale, if it can keep heavily used processes from dying a death of a thousand cuts.
If the number of items is known at design time (e.g. coordinates, colour systems), then I would go with tuples; otherwise, go with lists.
If I am writing an interface, my code will tend to just check whether the argument is iterable or is a sequence (rather than checking for a specific type, unless the interface needs a specific type). I use the abstract base classes in collections.abc to do these checks; it feels cleaner than checking for particular attributes.
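A sketch of what such a check can look like (describe is a hypothetical helper):

from collections.abc import Iterable, Sequence

def describe(arg):
    # duck-type check against ABCs instead of concrete types
    if isinstance(arg, Sequence):
        return "sequence: ordered, indexable, sized"
    if isinstance(arg, Iterable):
        return "iterable: can be looped over"
    raise TypeError("expected an iterable")

print(describe((2, 3)))  # sequence: ordered, indexable, sized
print(describe({2, 3}))  # iterable: can be looped over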
I know that "variable assignment" in python is in fact a binding / re-bindign of a name (the variable) to an object.
This brings up the question: is it possible to have proper assignment in Python, e.g. make an object equal to another object?
I guess there is no need for that in python:
Immutable objects cannot be 'assigned to', since they can't be changed.
Mutable objects could potentially be assigned to, since they can change, and this could be useful: you may want to manipulate a copy of a dictionary separately from the original one. However, in these cases the Python philosophy is to offer a cloning method on the mutable object, so you can bind a copy rather than the original.
So I guess the answer is that there is no assignment in Python; the best way to mimic it is binding to a cloned object.
I simply wanted to share the question in case I'm missing something important here
Thanks
EDIT:
Both Lie Ryan and Sven Marnach answers are good, I guess the overall answer is a mix of both:
For user-defined types, use the idiom:
a.__dict__ = dict(b.__dict__)
(I guess this has problems as well if the assigned class has redefined attribute access methods, but let's not be fussy :))
For mutable built-ins (lists and dicts), use the cloning/copying methods they provide (e.g. slices, update).
Finally, immutable built-ins can't be changed, so they can't be assigned to.
I'll choose Lie Ryan's answer because it's an elegant idiom that I hadn't thought of.
Thanks!
I think you are right in your characterization of assignment in Python. I'd just like to add a different method of cloning, and ways of assigning in special cases.
"Copy-constructing" a mutable built-in Python object will yield a (shallow) copy of that object:
l = [2, 3]
m = list(l)
l is m
--> False
[Edit: As pointed out by Paul McGuire in the comments, the behaviour of a "copy constructor" (forgive me the C++ terminology) for an immutable built-in Python object is implementation-dependent: you might get a copy, or just the same object. But because the object is immutable anyway, you shouldn't care.]
The copy constructor could be called generically by y = type(x)(x), but this seems a bit cryptic. And of course, there is the copy module which allows for shallow and deep copies.
Some Python objects allow assignment. For example, you can assign to a list without creating a new object:
l = [2, 3]
m = l
l[:] = [3, 4, 5]
m
--> [3, 4, 5]
For dictionaries, you could use the clear() method followed by update(otherdict) to assign to a dictionary without creating a new object. For a set s, you can use
s.clear()
s |= otherset
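A sketch of both in-place "assignments":

d = {'a': 1}
alias = d           # second reference to the same dict

d.clear()
d.update({'b': 2})  # mutate in place: every reference sees the change
print(alias)        # {'b': 2}

s = {1, 2}
t = s
s.clear()
s |= {3, 4}         # in-place union mutates the existing set
print(t)            # {3, 4}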
This brings up the question: is it possible to have proper assignment in Python, e.g. make an object equal to another object?
Yes you can:
a.__dict__ = dict(b.__dict__)
will emulate the default assignment semantics of C/C++ (i.e. a shallow, member-wise assignment).
The problem with such generalized assignment is that it never works for everybody. In C++, you can override the assignment operator, since you always have to pick whether you want a fully shallow assignment, a fully deep assignment, or any shade between the two.
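A minimal sketch of what that looks like (Point is a hypothetical example class):

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

a = Point(1, 2)
b = Point(3, 4)
alias = a                      # another reference to the same object

a.__dict__ = dict(b.__dict__)  # shallow "assignment" of b's state into a
print(alias.x, alias.y)        # 3 4: every reference to a sees the new state
print(alias is a)              # True: no rebinding happened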
I don't think you are missing anything.
I like to picture variables in Python as names written on labels that are attached to boxes; assignment moves a label onto a different box, whereas in other languages, assignment changes the box's contents (and the assignment operator can be overloaded).
Beginners can write quite complex applications without being aware of this, but those are usually messy programs.