Indexing with lists and arrays in NumPy appears inconsistent (Python)

Inspired by this other question, I'm trying to wrap my mind around advanced indexing in NumPy and build up more intuitive understanding of how it works.
I've found an interesting case. Here's an array:
>>> y = np.arange(10)
>>> y
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
If I index it with a scalar, I get a scalar, of course:
>>> y[4]
4
With a 1D list of integers, I get a 1D array:
>>> idx = [4, 3, 2, 1]
>>> y[idx]
array([4, 3, 2, 1])
so if I index it with a 2D (nested) list of integers, I get... what do I get?
>>> idx = [[4, 3], [2, 1]]
>>> y[idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: too many indices for array
Oh no! The symmetry is broken. I have to index with a triply nested list to get a 2D array!
>>> idx = [[[4, 3], [2, 1]]]
>>> y[idx]
array([[4, 3],
       [2, 1]])
What makes numpy behave this way?
To make this more interesting, I noticed that indexing with numpy arrays (instead of lists) behaves how I'd intuitively expect, and 2D gives me 2D:
>>> idx = np.array([[4, 3], [2, 1]])
>>> y[idx]
array([[4, 3],
       [2, 1]])
This looks inconsistent from where I'm at. What's the rule here?

The reason is how lists are interpreted when used as indices for NumPy arrays: lists are interpreted like tuples, and indexing with a tuple means multidimensional indexing.
Just as arr[1, 2] returns the element arr[1][2], arr[[[4, 3], [2, 1]]] is treated as arr[[4, 3], [2, 1]] and, by the rules of multidimensional indexing, returns the elements arr[4, 2] and arr[3, 1].
By nesting one more list you tell NumPy that you want fancy indexing along the first dimension, because the outermost list is effectively interpreted as if you had passed in only one "list of indices for the first dimension": arr[[[[4, 3], [2, 1]]]].
(Note that this list-as-tuple interpretation was deprecated in NumPy 1.15 and has since been removed; recent NumPy versions treat a nested list as a single array index, so it behaves like indexing with an array.)
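To see the multidimensional-indexing rule in isolation, here is a minimal sketch on an assumed 5x5 array, where the tuple-of-lists form is written explicitly (this tuple form works the same way in all NumPy versions):

```python
import numpy as np

arr = np.arange(25).reshape(5, 5)

# a tuple of two index lists pairs them up elementwise:
# it selects arr[4, 2] and arr[3, 1]
picked = arr[[4, 3], [2, 1]]   # array([22, 16])
```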
From the documentation:
Example
From each row, a specific element should be selected. The row index is just [0, 1, 2] and the column index specifies the element to choose for the corresponding row, here [0, 1, 0]. Using both together the task can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
and:
Warning
The definition of advanced indexing means that x[(1,2,3),] is fundamentally different than x[(1,2,3)]. The latter is equivalent to x[1,2,3] which will trigger basic selection while the former will trigger advanced indexing. Be sure to understand why this occurs.
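The warning can be reproduced on a small 1-D array; a minimal sketch of the difference the trailing comma makes:

```python
import numpy as np

x = np.arange(10)

# trailing comma: a tuple containing one sequence -> advanced indexing
adv = x[(1, 2, 3),]       # array([1, 2, 3])

# no trailing comma: a plain tuple -> three indices into a 1-D array
try:
    x[(1, 2, 3)]          # same as x[1, 2, 3]
except IndexError:
    pass                  # too many indices for array
```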
In such cases it's probably better to use np.take:
>>> y.take([[4, 3], [2, 1]]) # 2D array
array([[4, 3],
       [2, 1]])
This function [np.take] does the same thing as “fancy” indexing (indexing arrays using arrays); however, it can be easier to use if you need elements along a given axis.
Or convert the indices to an array first. That way NumPy interprets them (arrays are special-cased!) as fancy indexing instead of as "multidimensional indexing":
>>> y[np.asarray([[4, 3], [2, 1]])]
array([[4, 3],
       [2, 1]])

Related

Using values in 2d matrix as lookups

I have a 2-D numpy array that I would like to use as a series of indexes. The values in the array are all integer values:
array([[3, 3, 3, 2],
       [1, 5, 2, 3],
       [4, 2, 3, 2],
       [2, 3, 1, 3]])
I also have a 1D array with 5 values
array([x,y,z,q,p])
I'd like to use the cells of the 2D array as lookup values into the 1D array. In other words, where the 2D array is equal to 1, return x. Where it's equal to 2, return y, and so on.
This is simple enough to do in matlab, but I'd like a numpy solution that doesn't involve looping.
Thoughts?
Assuming x, y, z, q, p are variables (and not characters).
If you define the first array as (say, src):
src = np.array([[3, 3, 3, 2],
                [1, 5, 2, 3],
                [4, 2, 3, 2],
                [2, 3, 1, 3]])
and the second as (say, lookup):
lookup = np.array([x,y,z,q,p])
You can get the desired output using:
lookup[src - 1]
Or if you want individual outputs:
lookup[src[i, j] - 1]
where (i, j) is the 2-D index of the value that you want to look up.
Please note that the -1 accounts for the offset between the 1-based values in src and NumPy's 0-based indexing (as mentioned in the comment by slothrop).
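A runnable sketch of the whole answer, with concrete numbers substituted for x, y, z, q, p (the values 10..50 are assumptions for illustration only):

```python
import numpy as np

src = np.array([[3, 3, 3, 2],
                [1, 5, 2, 3],
                [4, 2, 3, 2],
                [2, 3, 1, 3]])
lookup = np.array([10, 20, 30, 40, 50])  # stands in for [x, y, z, q, p]

# src holds 1-based positions, so shift by -1 before indexing
result = lookup[src - 1]
```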

Check shape of numpy array

I want to write a function that takes a numpy array and I want to check if it meets the requirements. One thing that confuses me is that:
np.array([1,2,3]).shape == np.array([[1,2,3],[2,3],[2,43,32]]).shape == (3,)
[1,2,3] should be allowed, while [[1,2,3],[2,3],[2,43,32]] shouldn't.
Allowed shapes:
[0, 1, 2, 3, 4]
[0, 1, 2]
[[1],[2]]
[[1, 2], [2, 3], [3, 4]]
Not Allowed:
[] (empty array is not allowed)
[[0], [1, 2]] (inner dimensions must have same size 1!=2)
[[[4,5,6],[4,3,2]],[[2,3,2],[2,3,4]]] (more than 2 dimensions)
You should start by defining what you want in terms of shape. I tried to infer it from the question; please add more details if this is not correct.
So here we have: (1) an empty array is not allowed, and (2) no more than two dimensions are allowed. That translates as follows:
def is_allowed(arr):
    return arr.shape != (0, ) and len(arr.shape) <= 2
The first condition compares your array's shape with the shape of an empty array; the second checks that the array has no more than two dimensions.
The inner dimensions are the real problem. Some of the lists you provided as examples are not proper numpy arrays. If you cast np.array([[1,2,3],[2,3],[2,43,32]]), you just get an object array where each element is a list. It is not a "real" numpy array with direct access to all the elements. See this example:
>>> np.array([[1,2,3],[2,3],[2,43,32]])
array([list([1, 2, 3]), list([2, 3]), list([2, 43, 32])], dtype=object)
>>> np.array([[1,2,3],[2,3, None],[2,43,32]])
array([[1, 2, 3],
       [2, 3, None],
       [2, 43, 32]], dtype=object)
So I would recommend (if you are operating on plain lists) checking, without NumPy, that all the inner lists have the same length.
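Putting both checks together, one possible sketch (the object-dtype test for catching ragged input is an assumption about how such lists arrive as arrays; recent NumPy versions require dtype=object explicitly to build them at all):

```python
import numpy as np

def is_allowed(arr):
    # non-empty, not a ragged object array, and at most 2 dimensions
    return arr.size > 0 and arr.dtype != object and arr.ndim <= 2
```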

Is there an `np.repeat` which acts on an existing array?

I have a large NumPy array which I want to fill with new data on each iteration of a loop. The array is filled with data repeated along axis 0, for example:
[[1, 5],
 [1, 5],
 [1, 5],
 [1, 5]]
I know how to create this array from scratch in each iteration:
x = np.repeat([[1, 5]], 4, axis=0)
However, I don't want to create a new array every time, because it's a very large array (much larger than 4x2). Instead, I want to create the array in advance using the above code, and then just fill the array with new data on each iteration.
But np.repeat() returns a new array, rather than acting on an existing array. Is there an equivalent of np.repeat() for filling an existing array?
As we noted in comments, you can use a broadcasting assignment to fill your 2d array with a 1d array-like of the appropriate size:
x[...] = [1, 5]
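In context, that looks something like this (the shape and the loop are assumptions for illustration):

```python
import numpy as np

x = np.empty((4, 2), dtype=int)  # allocated once, before the loop
for _ in range(3):               # each iteration refills x in place
    x[...] = [1, 5]              # broadcast the 1d row into every row
```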
If by any chance your large array always contains the same items in each row (i.e. you won't change these preset values later), you can almost certainly use broadcasting in later parts of your code and just work with an initial x such as
x = np.array([[1, 5]])
This array has shape (1, 2) which is broadcast-compatible with other arrays of shape (4, 2) you might have in the above example.
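For instance, a (1, 2) array broadcasts against a (4, 2) array without ever materializing the repeated rows; a minimal sketch:

```python
import numpy as np

x = np.array([[1, 5]])   # shape (1, 2)
other = np.ones((4, 2))  # shape (4, 2)
total = other + x        # x is broadcast along axis 0
```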
If you always need the same values in each row and for some reason you can't use broadcasting (both cases are highly unlikely), you can use broadcast_to to create an array with an explicit 2d shape without copying memory:
x_bc = np.broadcast_to([1, 5], (4, 2)) # broadcast 1d [1, 5] to shape (4, 2)
This might work because it has the right shape with only 2 unique elements in memory:
>>> x_bc
array([[1, 5],
       [1, 5],
       [1, 5],
       [1, 5]])
>>> x_bc.strides
(0, 8)
However you can't mutate it, because it's a read-only view:
>>> x_bc[0, :] = [2, 4]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-35-ae12ecfe3c5e> in <module>
----> 1 x_bc[0, :] = [2, 4]
ValueError: assignment destination is read-only
So, if you only need the same values in each row and you can't use broadcasting and you want to mutate those same rows later, you can use stride tricks to map the same 1d data to a 2d array:
>>> x_in = np.array([1, 5])
>>> x_strided = np.lib.stride_tricks.as_strided(
...     x_in, shape=(4,) + x_in.shape, strides=(0,) + x_in.strides[-1:])
>>> x_strided
array([[1, 5],
       [1, 5],
       [1, 5],
       [1, 5]])
>>> x_strided[0, :] = [2, 4]
>>> x_strided
array([[2, 4],
       [2, 4],
       [2, 4],
       [2, 4]])
Which gives you a 2d array of fixed shape that always contains one unique row, and mutating any of the rows mutates the rest (since the underlying data corresponds to only a single row). Handle with care, because if you ever want to have two different rows you'll have to do something else.

Python 2D numpy.ndarray slicing without comma

Recently someone told me to extract the first two columns of a 2D numpy.ndarray by
firstTwoCols = some2dMatrix[:2]
Where is this notation from and how does it work?
I'm only familiar with the comma separated slicing like
twoCols = some2dMatrix[:,:2]
The : before the comma says to get all rows, and the :2 after the comma says for columns 0 up to but not including 2.
firstTwoCols = some2dMatrix[:2]
This will just extract the first 2 rows with all the columns.
twoCols = some2dMatrix[:,:2] is the one that will extract your first 2 columns for all the rows.
The syntax you describe does not extract the first two columns; it extracts the first two rows. If you specify fewer slices than the array has dimensions, NumPy treats all the missing slices as :, so
arr[:2]
is equivalent to
arr[:2, :]
for a 2D array.
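A quick sketch of the equivalence (the 3x4 array is assumed for illustration):

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
rows = arr[:2]      # same as arr[:2, :]: the first two rows
cols = arr[:, :2]   # the explicit comma is needed for the first two columns
```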
Not sure I understand the question but...
If you do:
>>> Matrix = [[x for x in range(1,5)] for x in range(5)]
>>> Matrix
[[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
doing Matrix[:2] will select the first two lists in Matrix: [[1, 2, 3, 4], [1, 2, 3, 4]]. But if you do:
>>> Matrix[:,:2]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: list indices must be integers, not tuple
But if you work with a NumPy array:
>>> Matrix = np.array(Matrix)
>>> Matrix[:, :2]
array([[1, 2],
       [1, 2],
       [1, 2],
       [1, 2],
       [1, 2]])

NumPy min/max in-place assignment

Is it possible to perform min/max in-place assignment with NumPy multi-dimensional arrays without an extra copy?
Say, a and b are two 2D numpy arrays and I would like to have a[i,j] = min(a[i,j], b[i,j]) for all i and j.
One way to do this is:
a = numpy.minimum(a, b)
But according to the documentation, numpy.minimum creates and returns a new array:
numpy.minimum(x1, x2[, out])
Element-wise minimum of array elements.
Compare two arrays and returns a new array containing the element-wise minima.
So in the code above, it will create a new temporary array (min of a and b), then assign it to a and dispose it, right?
Is there any way to do something like a.min_with(b) so that the min-result is assigned back to a in-place?
numpy.minimum() takes an optional third argument, which is the output array. You can specify a there to have it modified in place:
In [9]: a = np.array([[1, 2, 3], [2, 2, 2], [3, 2, 1]])
In [10]: b = np.array([[3, 2, 1], [1, 2, 1], [1, 2, 1]])
In [11]: np.minimum(a, b, a)
Out[11]:
array([[1, 2, 1],
       [1, 2, 1],
       [1, 2, 1]])
In [12]: a
Out[12]:
array([[1, 2, 1],
       [1, 2, 1],
       [1, 2, 1]])
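The same call can also be written with the keyword form of the output argument, which may read more clearly; a sketch of the example above:

```python
import numpy as np

a = np.array([[1, 2, 3], [2, 2, 2], [3, 2, 1]])
b = np.array([[3, 2, 1], [1, 2, 1], [1, 2, 1]])

res = np.minimum(a, b, out=a)  # result is written directly into a
assert res is a                # no separate output array is returned
```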
