numpy array indexing with lists and arrays

numpy array indexing with lists and arrays - python

I have:
>>> a
array([[1, 2],
[3, 4]])
>>> type(l), l # list of scalers
(<type 'list'>, [0, 1])
>>> type(i), i # a numpy array
(<type 'numpy.ndarray'>, array([0, 1]))
>>> type(j), j # list of numpy arrays
(<type 'list'>, [array([0, 1]), array([0, 1])])
When I do
>>> a[l] # Case 1, l is a list of scalers
I get
array([[1, 2],
[3, 4]])
which means indexing happened only on 0th axis.
But when I do
>>> a[j] # Case 2, j is a list of numpy arrays
I get
array([1, 4])
which means indexing happened along axis 0 and axis 1.
Q1: When used for indexing, why is there a difference in treatment of list of scalers and list of numpy arrays ? (Case 1 vs Case 2). In Case 2, I was hoping to see indexing happen only along axis 0 and get
array( [[[1,2],
[3,4]],
[[1,2],
[3,4]]])
Now, when using numpy array of arrays instead
>>> j1 = np.array(j) # numpy array of arrays
The result below indicates that indexing happened only along axis 0 (as expected)
>>> a[j1] Case 3, j1 is a numpy array of numpy arrays
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
Q2: When used for indexing, why is there a difference in treatment of list of numpy arrays and numpy array of numpy arrays? (Case 2 vs Case 3)

Case1, a[l] is actually a[(l,)] which expands to a[(l, slice(None))]. That is, indexing the first dimension with the list l, and an automatic trailing : slice. Indices are passed as a tuple to the array __getitem__, and extra () may be added without confusion.
Case2, a[j] is treated as a[array([0, 1]), array([0, 1]] or a[(array(([0, 1]), array([0, 1])]. In other words, as a tuple of indexing objects, one per dimension. It ends up returning a[0,0] and a[1,1].
Case3, a[j1] is a[(j1, slice(None))], applying the j1 index to just the first dimension.
Case2 is a bit of any anomaly. Your intuition is valid, but for historical reasons, this list of arrays (or list of lists) is interpreted as a tuple of arrays.
This has been discussed in other SO questions, and I think it is documented. But off hand I can't find those references.
So it's safer to use either a tuple of indexing objects, or an array. Indexing with a list has a potential ambiguity.
numpy array indexing: list index and np.array index give different result
This SO question touches on the same issue, though the clearest statement of what is happening is buried in a code link in a comment by #user2357112.
Another way of forcing the Case3 like indexing, make the 2nd dimension slice explicit, a[j,:]
In [166]: a[j]
Out[166]: array([1, 4])
In [167]: a[j,:]
Out[167]:
array([[[1, 2],
[3, 4]],
[[1, 2],
[3, 4]]])
(I often include the trailing : even if it isn't needed. It makes it clear to me, and readers, how many dimensions we are working with.)

A1: The structure of l is not the same as j.
l is just one-dimension while j is two-dimension. If you change one of them:
# l = [0, 1] # just one dimension!
l = [[0, 1], [0, 1]] # two dimensions
j = [np.array([0,1]), np.array([0, 1])] # two dimensions
They have the same behave.
A2: The same, the structure of arrays in Case 2 and Case 3 are not the same.

Related

Numpy array concatenation

In some special cases, array can be concatenated without explicitly calling concatenate function. For example, given a 2D array A, the following code will yield an identical array B:
B = np.array([A[ii,:] for ii in range(A.shape[0])])
I know this method works, but do not quite understand the underlying mechanism. Can anyone demystify the code above a little bit?

A[ii,:] is ii-th row of array A.
The list comprehension [A[ii,:] for ii in range(A.shape[0])] basically makes a list of rows in A (A.shape[0] is number of rows in A).
Finally, B is an array, that its content is a list of A's rows, which is essentially the same as A itself.

By now you should be familiar with making an array from a list of lists:
In [178]: np.array([[1,2],[3,4]])
Out[178]:
array([[1, 2],
[3, 4]])
but that works just as well if it's a list of arrays:
In [179]: np.array([np.array([1,2]),np.array([3,4])])
Out[179]:
array([[1, 2],
[3, 4]])
stack also does this, by adding a dimension to the arrays and calling concatenate (read its code):
In [180]: np.stack([np.array([1,2]),np.array([3,4])])
Out[180]:
array([[1, 2],
[3, 4]])
concatenate joins the arrays - on an existing axis:
In [181]: np.concatenate([np.array([1,2]),np.array([3,4])])
Out[181]: array([1, 2, 3, 4])
stack adds a dimension first, as in:
In [182]: np.concatenate([np.array([[1,2]]),np.array([[3,4]])])
Out[182]:
array([[1, 2],
[3, 4]])
np.array and concatenate aren't identical, but there's a lot of overlap in their functionality.

Check shape of numpy array

I want to write a function that takes a numpy array and I want to check if it meets the requirements. One thing that confuses me is that:
np.array([1,2,3]).shape = np.array([[1,2,3],[2,3],[2,43,32]]) = (3,)
[1,2,3] should be allowed, while [[1,2,3],[2,3],[2,43,32]] shouldn't.
Allowed shapes:
[0, 1, 2, 3, 4]
[0, 1, 2]
[[1],[2]]
[[1, 2], [2, 3], [3, 4]]
Not Allowed:
[] (empty array is not allowed)
[[0], [1, 2]] (inner dimensions must have same size 1!=2)
[[[4,5,6],[4,3,2][[2,3,2],[2,3,4]]] (more than 2 dimension)

You should start with defining what you want in terms of shape. I tried to understand it from the question, please add more details if it is not correct.
So here we have (1) empty array is not allowed and (2) no more than two dimensions. It translates the following way:
def is_allowed(arr):
return arr.shape != (0, ) and len(arr.shape) <= 2
The first condition just compares you array's shape with the shape of an empty array. the second condition checks that an array has no more than two dimensions.
With inner dimensions there is a problem. Some of the lists you provided as an example are not numpy arrays. If you cast np.array([[1,2,3],[2,3],[2,43,32]]), you get just an array where each element is the list. It is not a "real" numpy array with direct access to all the elements. See example:
>>> np.array([[1,2,3],[2,3],[2,43,32]])
array([list([1, 2, 3]), list([2, 3]), list([2, 43, 32])], dtype=object)
>>> np.array([[1,2,3],[2,3, None],[2,43,32]])
array([[1, 2, 3],
[2, 3, None],
[2, 43, 32]], dtype=object)
So I would recommend (if you are operating with usual lists) check that all arrays have the same length without numpy.

How to access numpy array with a set of indices stored in another numpy array?

I have a numpy array which stores a set of indices I need to access another numpy array.
I tried to use a for loop but it doesn't work as I expected.
The situation is like this:
>>> a
array([[1, 2],
[3, 4]])
>>> c
array([[0, 0],
[0, 1]])
>>> a[c[0]]
array([[1, 2],
[1, 2]])
>>> a[0,0] # the result I want
1
Above is a simplified version of my actual code, where the c array is much larger so I have to use a for loop to get every index.

Convert it to a tuple:
>>> a[tuple(c[0])]
1
Because list and array indices trigger advanced indexing. tuples are (mostly) basic slicing.

Index a with columns of c by passing the first column as row's index and second one as column index:
In [23]: a[c[:,0], c[:,1]]
Out[23]: array([1, 2])

finding indices that would sort numpy column returns zeros

I am trying to get the indices that would sort each column of an array using the function argsort. However, it keeps simply returning zeros instead of the true indices. For example:
x = np.matrix([[5, 2, 6], [3, 4, 1]])
print(x)
print(x[:,0])
print(x[:,1])
print(x[:,2])
print(x[:,0].argsort())
print(x[:,1].argsort())
print(x[:,2].argsort())
I am expecting this to return three arrays. [1 0], [0 1] and [1 0] denoting the indices of each column if it were sorted, however, instead I get three arrays that all contain zeros.
Any help much appreciated!

Indexing a matrix with a slice always returns another 2-d matrix. (This behavior is not the same as for a regular numpy array.) See, for example, the output of x[:,0]:
In [133]: x[:,0]
Out[133]:
matrix([[5],
[3]])
x[:,0] is a matrix with shape (2, 1).
To argsort the first (and only) column of that matrix, you have to tell argsort to use the first axis:
In [135]: x[:,0].argsort(axis=0)
Out[135]:
matrix([[1],
[0]])
The default (axis=-1) is to use the last axis, and since the rows in that matrix have length 1, the result when axis is not given is a column of zeros.
By the way, you can do all the columns at once:
In [138]: x
Out[138]:
matrix([[5, 2, 6],
[3, 4, 1]])
In [139]: x.argsort(axis=0)
Out[139]:
matrix([[1, 0, 1],
[0, 1, 0]])

NumPy indexing using List?

SOF,
I noticed an interesting NumPy demo in this URL:
http://cs231n.github.io/python-numpy-tutorial/
I see this:
import numpy as np
a = np.array([[1,2], [3, 4], [5, 6]])
# An example of integer array indexing.
# The returned array will have shape (3,) and
print( a[[0, 1, 2], [0, 1, 0]] )
# Prints "[1 4 5]"
I understand using integers as index arguments:
a[1,1]
and this syntax:
a[0:2,:]
Generally,
If I use a list as index syntax, what does that mean?
Specifically,
I do not understand why:
print( a[[0, 1, 2], [0, 1, 0]] )
# Prints "[1 4 5]"

The last statement will print (in matrix notation) a(0,0), a(1,1) and a(2,0). In python notation that's a[0][0], a[1][1] and a[2][0].
The first index list contains the indices for the first axis (matrix notation: row index), the second list contains the indices for the second axis (column index).

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

numpy array indexing with lists and arrays - python

Related

Numpy array concatenation

Check shape of numpy array

How to access numpy array with a set of indices stored in another numpy array?

finding indices that would sort numpy column returns zeros

NumPy indexing using List?

Categories

Resources