So, I know that you can do this by doing
>>> arr[mask] = value
However, if I want to make the code shorter (and not recompute the mask and index each time), I'd like to do something like this:
>>> sub = arr[mask]
>>> sub[...] = value # This works in other cases, but not this one.
My understanding was that Ellipsis indexing lets you specify that you're not rebinding the variable, but are instead broadcasting the value into the underlying array.
So, here's the question: why doesn't it work?
My thinking is that it's related to the fact that:
>>> arr[mask] is arr[mask]
False
But surely since the mask indexed versions are just views (not copies of the underlying structure), this shouldn't break assignment.
The reason this doesn't work is that indexing with a mask creates a copy, not a view:
Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view).
arr[mask] is a copy. arr[mask] = ... looks the same, but is actually a different operation: the expression arr[mask] is a call to __getitem__, which with advanced indexing returns a copy, while the assignment arr[mask] = value is a call to __setitem__, which writes directly into arr.
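A small demonstration of the two operations, using a toy arr and mask for illustration:

import numpy as np

arr = np.zeros(5)
mask = arr < 1   # boolean mask selecting every element

arr[mask] = 7    # calls arr.__setitem__: writes into arr itself
print(arr)       # [7. 7. 7. 7. 7.]

sub = arr[mask]  # calls arr.__getitem__: advanced indexing -> copy
sub[...] = 0     # writes into the copy only
print(arr)       # [7. 7. 7. 7. 7.] - unchanged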
Related
I saw the following code here, which iterates over a numpy array arr and modifies its elements. However, I do not quite understand the purpose of using the ellipsis (...) here. If I remove the ellipsis (i.e. use x = x*2), the elements of arr are not modified. Hope to get some hints from you! Thanks.
for x in np.nditer(arr,op_flags='readwrite'):
    x[...] = x*2;
Ellipsis ... is a built-in Python symbol that is used to specify slices in N-dimensional Numpy arrays; for example, a[0,...,0] is equivalent to a[0,:,:,:,0] for a 5-D array a.
The nditer object uses this syntax to make the iteration writable.
Assigning directly to x (as in x = x*2) merely rebinds the Python name x, instead of writing to the array location the iterator currently refers to. So x[...] = ... is the syntax for dereferencing the iterator and writing through it.
You could also use this syntax to read the value, but that would be redundant.
Also note that your syntax for op_flags is incorrect. It should be a list or a tuple.
for x in np.nditer(arr, op_flags=['readwrite']):
    x[...] = x * 2
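Putting it together into a runnable example:

import numpy as np

arr = np.arange(5)
for x in np.nditer(arr, op_flags=['readwrite']):
    x[...] = x * 2  # dereference the iterator and write back into arr

print(arr)  # [0 2 4 6 8]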
arr = np.arange(0,11)
slice_of_arr = arr[0:6]
slice_of_arr[:]=99
# slice_of_arr returns
array([99, 99, 99, 99, 99, 99])
# arr returns
array([99, 99, 99, 99, 99, 99, 6, 7, 8, 9, 10])
As the example above shows, you cannot change slice_of_arr without also changing arr, because the slice is a view of arr, not an independent copy.
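By contrast, an explicit .copy() decouples the data from arr:

copy_of_arr = arr[0:6].copy()  # an independent copy, not a view
copy_of_arr[:] = 0
# arr is unchanged by this assignment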
My questions are:
Why is NumPy designed like this? Wouldn't it be tedious to have to .copy and then assign values every time?
Is there anything I can do to get rid of the .copy? How can I change this default behavior of NumPy?
I think you have the answers in the other comments, but more specifically:
1.a. Why is NumPy designed like this?
Because it's way faster (constant time) to create a view rather than creating a whole array (linear time).
1.b. Wouldn't it be tedious to have to .copy and then assign values every time?
Actually, needing to create a copy is not that common, so no, it's not tedious. Even though it can be surprising at first, this design is very good.
2.a. Is there anything I can do to get rid of the .copy?
I can't really tell without seeing real code. In the toy example you give, you can't avoid creating a copy, but in real code you usually apply functions to the data, and those return new arrays, so an explicit copy isn't needed.
Can you give an example of real code where you need to call .copy repeatedly?
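For instance, a typical computation chain never needs an explicit copy, because each step already returns a new array:

import numpy as np

data = np.arange(5.0)
result = np.sqrt(data * 2 + 1)  # each step allocates a fresh array
print(data)    # [0. 1. 2. 3. 4.] - the original is untouched
print(result)  # [1. 1.73205081 2.23606798 2.64575131 3.]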
2.b. How can I change this default behavior of NumPy?
You can't. Try to get used to it; you'll see how powerful it is.
What does (numpy) __array_wrap__ do?
talks about ndarray subclasses and hooks like __array_wrap__. np.array takes a copy parameter, which forces the result to be a copy even when one isn't otherwise required. ravel returns a view when it can; flatten always returns a copy. So it is probably possible, and maybe not too difficult, to construct an ndarray subclass that forces a copy. It may involve modifying a hook like __array_wrap__.
Or maybe modifying the .__getitem__ method. Indexing as in slice_of_arr = arr[0:6] involves a call to __getitem__. For ndarray this is compiled, but for a masked array, it is python code that you could use as an example:
/usr/lib/python3/dist-packages/numpy/ma/core.py
It may be something as simple as
def __getitem__(self, indx):
    """x.__getitem__(y) <==> x[y]
    """
    # _data = ndarray.view(self, ndarray)  # the original line; change to:
    _data = ndarray.copy(self)  # note: copy() takes no type argument, unlike view()
    dout = ndarray.__getitem__(_data, indx)
    return dout
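To see the idea end to end, here is a rough, self-contained sketch of an ndarray subclass whose __getitem__ returns copies (the class name is illustrative, and this isn't tested against every indexing path):

import numpy as np

class CopyArray(np.ndarray):
    """Illustrative subclass: indexing returns copies instead of views."""
    def __getitem__(self, indx):
        out = super().__getitem__(indx)
        if isinstance(out, np.ndarray):
            out = out.copy()  # detach the result from self's buffer
        return out

a = np.arange(10).view(CopyArray)
s = a[0:6]
s[:] = 99
print(np.asarray(a))  # [0 1 2 3 4 5 6 7 8 9] - unchanged; the slice was a copy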
But I suspect that by the time you develop and fully test such a subclass, you might fall in love with the default no-copy approach. While this view-vs-copy business bites many newcomers (especially those coming from MATLAB), I haven't seen complaints from experienced users. Look at other numpy SO questions; you won't see a lot of copy() calls.
Even regular Python users are used to asking themselves whether a reference or slice is a copy or not, and whether something is mutable or not.
For example, with lists:
In [754]: ll=[1,2,[3,4,5],6]
In [755]: llslice=ll[1:-1]
In [756]: llslice[1][1:2]=[10,11,12]
In [757]: ll
Out[757]: [1, 2, [3, 10, 11, 12, 5], 6]
Modifying an item inside a slice modifies that same item in the original list. In contrast to numpy, a list slice is a copy, but only a shallow one; you have to take extra effort to make a deep copy (import copy).
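For example:

import copy

ll = [1, 2, [3, 4, 5], 6]
deep = copy.deepcopy(ll)  # recursively copies the nested list as well
deep[2][1] = 99
print(ll)  # [1, 2, [3, 4, 5], 6] - the original is untouched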
/usr/lib/python3/dist-packages/numpy/lib/index_tricks.py contains some indexing functions aimed at making certain indexing operations more convenient. Several are actually classes, or class instances, with custom __getitem__ methods. They may also serve as models of how to customize your slicing and indexing.
When doing repeated operations in numpy/scipy, there's a lot of overhead, because most operations return a new object.
For example
for i in range(100):
    x = A*x
I would like to avoid this by passing a reference to the operation, like you would in C:
for i in range(100):
    np.dot(A, x, x_new)  # x_new would now store the result of the multiplication
    x, x_new = x_new, x
Is there any way to do this? I would like this not just for multiplication but for all operations that return a matrix or a vector.
See Learning to avoid unnecessary array copies in IPython Books. From there, note e.g. these guidelines:
a *= b
will not produce a copy, whereas:
a = a * b
will produce a copy. Also, flatten() always copies, while ravel() returns a view when possible and only copies if necessary (and thus should in general be preferred). reshape() likewise returns a view when it can, copying only when the data cannot be reinterpreted in the requested shape without rearranging it.
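A quick way to see the difference is to keep a second reference to the same array:

import numpy as np

a = np.ones(3)
alias = a     # second reference to the same buffer
a *= 2        # in place: the shared buffer is updated
print(alias)  # [2. 2. 2.]
a = a * 2     # rebinds a to a brand-new array
print(alias)  # [2. 2. 2.] - alias still points at the old buffer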
Furthermore, as @hpaulj and @ali_m noted in their comments, many numpy functions support an out parameter, so have a look at the docs. From the numpy.dot() docs:
out : ndarray, optional
Output argument.
This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
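Applied to the loop from the question, a minimal sketch with two preallocated buffers:

import numpy as np

n = 4
A = np.random.rand(n, n)
x = np.random.rand(n)
x_new = np.empty_like(x)     # allocated once, reused every iteration

for _ in range(100):
    np.dot(A, x, out=x_new)  # writes the product into x_new, no new array
    x, x_new = x_new, x      # swap the two buffers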
I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed to a matrix float, and still getting used to checking input types.
In many of my functions, I require the input S to be a numpy.ndarray of size (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list/tuple/int/etc.; another is that the input could be an array of shape (), i.e. S.ndim == 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim == 0 problem, and then get my dimensions like this:
# first simultaneously test for ndarray and get proper dimensions
try:
    if (S.ndim == 0):
        S = S.copy(); S.shape = (1,1);
    # define dimensions p, and p2
    (p,p2) = numpy.shape(S);
except AttributeError:  # got here because input is not something array-like
    raise AttributeError("blah blah blah");
Though it works, I'm wondering whether this is a valid thing to do. The docstring for ndim says
If it is not already an ndarray, a conversion is
attempted.
and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused about why an AttributeError is raised for these types of inputs, when numpy should just be doing
numpy.array(S).ndim;
which should work.
When doing input validation for NumPy code, I always use np.asarray:
>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()
This function has the nice property that it only copies data when necessary; if the input is already an ndarray, the data is left in place (only the type may be changed, because it also gets rid of that pesky np.matrix).
The docstring for ndim says
That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:
S = np.asarray(S)
(p, p2) = S.shape
This will raise a ValueError if S.ndim != 2.
[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]
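Putting the pieces together, a minimal validation helper could look like this (the name validate_2d is just for illustration):

import numpy as np

def validate_2d(S):
    """Coerce S to a 2-D ndarray, promoting 0-d inputs to shape (1, 1)."""
    S = np.asarray(S)  # no copy if S is already an ndarray
    if S.ndim == 0:
        S = S.reshape(1, 1)
    p, p2 = S.shape    # raises ValueError unless S.ndim == 2
    return S, p, p2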
Given the comments to @larsmans' answer, you could try:
if not isinstance(S, np.ndarray):
    raise TypeError("Input not a ndarray")
if S.ndim == 0:
    S = np.reshape(S, (1,1))
(p, p2) = S.shape
First, you check explicitly whether S is a (subclass of) ndarray. Then, you use np.reshape to reshape your data if needed (note that np.reshape returns a view when it can, so this doesn't necessarily copy). Finally, you get the dimensions.
Note that in most cases, the np functions will first try to access the corresponding method of an ndarray, then attempt to convert the input to an ndarray (sometimes keeping it a subclass, as np.asanyarray does, sometimes not, as np.asarray does). In other words, it's generally more efficient to use the method rather than the function: that's why we're using S.shape and not np.shape(S).
Another point: np.asarray, np.asanyarray, np.atleast_1d, ... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False, asanyarray does the same and also sets subok=True, atleast_1d sets ndmin=1, atleast_2d sets ndmin=2... In other words, you can always use np.array with the appropriate arguments instead. But as mentioned in some comments, it's a matter of style: shortcuts can often improve readability, which is always an objective to keep in mind.
In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing a list([....]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French), you could for example change the order from row-first C to column-first F. But anyway, you get the copy you wanted.
With np.array(input, copy=False), no data is copied when input is already an ndarray with a suitable dtype: the result points to the same block of memory as input (that is, no waste of memory). Only if input wasn't an ndarray is a new array created from scratch. The interesting case is of course when input was already an ndarray.
Using this new array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should tell explicitly what happens, if it doesn't or it's unclear, please mention it.
np.array(...) may raise some exceptions if something goes awry. For example, trying to use dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions are raised in which cases; please edit this post accordingly.
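A quick check of the copy behaviour, using np.shares_memory:

import numpy as np

a = np.arange(3)
b = np.array(a, copy=False)    # no data copied
print(np.shares_memory(a, b))  # True
c = np.array(a, copy=True)     # explicit copy requested
print(np.shares_memory(a, c))  # False
# np.array(["STRING", 1], dtype=float) would raise ValueError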
Welcome to Stack Overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. Numpy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more: numpy.atleast_1d is similar to asarray, but reshapes 0-d arrays of shape () to shape (1,); numpy.atleast_2d does the same but also reshapes 0-d and 1-d arrays to be 2-d, i.e. (3,) to (1, 3). The reason we convert "array_like" inputs to arrays is partly just because we're lazy (for example, sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3]))), but this is also the design choice made within numpy itself. Notice that the following works:
>>> numpy.mean([1., 2., 3.])
2.0
In the docs for numpy.mean we can see that the argument a should be "array_like".
Parameters
----------
a : array_like
Array containing numbers whose mean is desired. If `a` is not an
array, a conversion is attempted.
That being said, there are situations when you want to only accept arrays as arguments and not all "array_like" types.
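For example, the atleast_* helpers mentioned above:

import numpy as np

print(np.atleast_1d(5).shape)                # () -> (1,)
print(np.atleast_2d([1, 2, 3]).shape)        # (3,) -> (1, 3)
print(np.atleast_2d(np.ones((2, 2))).shape)  # (2, 2) -> unchanged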
Is there a simple way to create an immutable NumPy array?
If one has to derive a class from ndarray to do this, what's the minimum set of methods that one has to override to achieve immutability?
You can make a numpy array unwriteable:
a = np.arange(10)
a.flags.writeable = False
a[0] = 1
# Gives: ValueError: assignment destination is read-only
Also see the discussion in this thread:
http://mail.scipy.org/pipermail/numpy-discussion/2008-December/039274.html
and the documentation:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.flags.html
I have a subclass of ndarray at this gist: https://gist.github.com/sfaleron/9791418d7023a9985bb803170c5d93d8
It makes a copy of its argument and marks that as read-only, so you should only be able to shoot yourself in the foot if you are very deliberate about it. My immediate need was for it to be hashable, so I could use them in sets, so that works too. It isn't a lot of code, but about 70% of the lines are for testing, so I won't post it directly.
Note that it's not a drop-in replacement; it won't accept any keyword args like the normal array constructor does. Instances will behave like ndarrays, though.
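For reference, a minimal sketch of the same idea (this is not the gist's code; the class name and the __eq__ choice are illustrative):

import numpy as np

class FrozenArray(np.ndarray):
    """Read-only, hashable array: copies its argument, then locks it."""
    def __new__(cls, data):
        obj = np.array(data).view(cls)  # np.array copies the input
        obj.flags.writeable = False
        return obj

    def __hash__(self):
        return hash((self.shape, self.tobytes()))

    def __eq__(self, other):
        # sets need a boolean __eq__; note this replaces ndarray's
        # elementwise comparison for this subclass
        return np.array_equal(self, other)

s = {FrozenArray([1, 2]), FrozenArray([1, 2])}
print(len(s))  # 1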
Setting the flag directly didn't work for me, but using ndarray.setflags did work:
a = np.arange(10)
a.setflags(write=False)
a[0] = 1 # ValueError