Numpy Basics - How to Interpret [:,] in array access - python

I have an nd-array A
A.shape
(2, 500, 3)
What's the difference between A[:] and A[:,2]?
Coming from plain Python lists, the ',' in the array access is confusing me a lot.

The commas separate the subscripts for each dimension. So, for example, if the matrix M is defined as
M = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
then M[2, 1] would be 8 (third row, second column).
The subscript for each dimension can also be a slice, where : represents a full slice, like a slice in normal Python sequences. For example, M[:, 2] would select from every row the third column, which would be [3, 6, 9].
Any additional dimensions for which a subscript is not provided are implicitly full slices. In your example, A[:,2] is equivalent to A[:, 2, :]. If you consider the (2, 500, 3) shaped array to be two stacked matrices with 500 rows and 3 columns, then A[:, 2, :] would select from both matrices the third row (and every column of the third row), which should have a shape of (2, 3).
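A quick way to see this, using a small array of the same shape as yours (a sketch; the values are arbitrary):
import numpy as np
A = np.arange(2 * 500 * 3).reshape(2, 500, 3)
A[:, 2].shape                          # (2, 3)
np.array_equal(A[:, 2], A[:, 2, :])    # True: the missing subscript is a full slice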

When you have multidimensional NumPy arrays, indexing with [] works if you provide a tuple of slice() objects. If the tuple has fewer elements than the array has dimensions, this is equivalent to having a slice(None) (which abbreviates to :) in all the remaining dimensions. Note that NumPy also accepts ..., which means "fill the rest of the dimensions with :" - this is especially useful if you want to "fill" the initial dimensions.
So, to recapitulate, the following expressions give identical results on your A array with A.ndim == 3:
A[:, 2]
A[:, 2, :]
A[:, 2, ...]
A[slice(None), 2]
A[slice(None), 2, slice(None)]
A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))]
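If you want to convince yourself, here is a small check (a sketch; A is filled with arbitrary values of the right shape):
import numpy as np
A = np.arange(2 * 500 * 3).reshape(2, 500, 3)
ref = A[:, 2]
all(np.array_equal(ref, other) for other in (
    A[:, 2, :],
    A[:, 2, ...],
    A[slice(None), 2],
    A[slice(None), 2, slice(None)],
    A[(slice(None), 2) + tuple(slice(None) for _ in range(A.ndim - 2))],
))                                     # True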

Related

Numpy indexing oddity: How to subselect from multidimensional array and keep all axes

I have a multi-dimensional array, and have two lists of integers, L_i and L_j, corresponding to the elements of axis-i and axis-j I want to keep. I also want to satisfy the following:
Keep the original dimensionality of the array, even if L_i or L_j consists of just one element (in other words, I don't want a singleton axis to be collapsed)
Preserve the order of the axes
What is the cleanest way to do this?
Here is a reproducible example that shows some of the unexpected behavior I've been getting:
import numpy as np
aa = np.arange(120).reshape(5,4,3,2)
aa.shape
### (5,4,3,2) as expected
aa[:,:,:,[0,1]].shape
### (5, 4, 3, 2) as expected
aa[:,:,:,[0]].shape
### (5,4,3,1) as desired. Notice that even though the [0] is one element,
### that last axis is preserved, which is what I want
aa[:,[1,3],:,[0]].shape
### (2, 5, 3) NOT WHAT I EXPECTED!!
### I was expecting (5, 2, 3, 1)
Curious as to why numpy is collapsing and reordering axes, and also best way to do my subsetting correctly.
Regarding the answers to your questions...
Why is numpy collapsing the axes?
Because advanced indices [1,3] and [0] are broadcast together to form a shape (2,) subspace which replaces the subspace they index (i.e. the axes with size 4 and 2 respectively).
Why is numpy reordering the axes?
Because the advanced indices are separated by a slice, there is no unambiguous place to drop the new shape (2,) subspace. As a result, numpy places it at the front of the array, with the sliced dimensions trailing afterward (shape (5, 3)).
... and thus you are left with a shape (2, 5, 3) array.
For more info, see the section in the numpy guide on combining basic and advanced indexing.
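To see the contrast directly (a small demonstration, reusing the aa array from the question): when the advanced indices are adjacent, the broadcast subspace stays in place; when they are separated by a slice, it is moved to the front.
import numpy as np
aa = np.arange(120).reshape(5, 4, 3, 2)
# Separated by a slice: the broadcast (2,) subspace goes to the front.
aa[:, [1, 3], :, [0]].shape   # (2, 5, 3)
# Adjacent advanced indices: the (2,) subspace replaces them in place.
aa[:, [1, 3], [0], :].shape   # (5, 2, 2)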
PS: It is still possible to get your desired shape using just a single indexing call, but you'll have to part ways with the slices and instead define indices that broadcast to shape (5, 2, 3, 1), for instance using np.ix_:
>>> aa[ np.ix_([0, 1, 2, 3, 4], [1, 3], [0, 1, 2], [0]) ].shape
(5, 2, 3, 1)
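For the curious, np.ix_ essentially builds index arrays that broadcast against each other; here is a hand-rolled sketch of the same idea, reusing aa from above (the reshaping via None is just one way to do it):
i = np.arange(5)[:, None, None, None]      # shape (5, 1, 1, 1)
j = np.array([1, 3])[None, :, None, None]  # shape (1, 2, 1, 1)
k = np.arange(3)[None, None, :, None]      # shape (1, 1, 3, 1)
l = np.array([0])[None, None, None, :]     # shape (1, 1, 1, 1)
aa[i, j, k, l].shape                       # (5, 2, 3, 1)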
Reduce the axes one at a time:
aa[:, [1, 3], :, :][..., [0]].shape
(5, 2, 3, 1)

numpy's transpose method can't convert 1D row ndarray to a column one [duplicate]

This question already has answers here:
Transposing a 1D NumPy array
Let's consider a as a 1D row/horizontal array:
import numpy as np
N = 10
a = np.arange(N) # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
a.shape # (10,)
now I want b to be a 1D column/vertical array, the transpose of a:
b = a.transpose() # array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
b.shape # (10,)
but the .transpose() method returns an identical ndarray with the exact same shape!
What I expected to see was
np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
which can be achieved by
c = a.reshape(a.shape[0], 1) # or c = a; c.shape = (c.shape[0], 1)
c.shape # (10, 1)
and to my surprise, it has a shape of (10, 1) instead of (1, 10).
In Octave/Scilab I could do:
N = 10
b = 0:(N-1)
a = b'
size(b) % ans = 1 10
size(a) % ans = 10 1
I understand that numpy ndarrays are not matrices (as discussed here), but the behavior of the numpy's transpose function just doesn't make sense to me! I would appreciate it if you could help me understand how this behavior makes sense and what am I missing here.
P.S. So what I have understood so far is that b = a.transpose() is the equivalent of b = a; b.shape = b.shape[::-1] which if you had a "2D array" of (N, 1) would return a (1, N) shaped array, as you would expect from a transpose operator. However, numpy seems to treat the "1D array" of (N,) as a 0D scalar. I think they should have named this method something else, as this is very misleading/confusing IMHO.
To understand the numpy array better, you should take a look at this review paper: The NumPy array: a structure for efficient numerical computation
In short, numpy ndarrays have this attribute called the stride, which is
the number of bytes to skip in memory to proceed to the next element.
For a (10, 10) array of bytes, for example, the strides may be (10, 1), in other words: proceed one byte to get to the next column and ten bytes to locate the next row.
For your ndarray a, a.strides == (8,), which shows that it is only 1-dimensional, and that to get to the next element along this single dimension, you need to advance 8 bytes in memory (each int is 64-bit).
Strides are useful for representing transposes:
By modifying strides, for example, an array can be transposed or
reshaped at zero cost (no memory needs to be copied).
So if there were a 2-dimensional ndarray, say b = np.ones((3,5)) for example, then b.strides == (40, 8), while b.transpose().strides == (8, 40). As you see, a transposed 2D ndarray is simply the exact same array whose strides have been reordered. And since your 1D ndarray has only one dimension, swapping the values of its strides (i.e. taking its transpose) doesn't do anything.
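A quick sketch to see the strides in action (the exact byte counts assume the default 8-byte dtypes on a 64-bit build):
import numpy as np
a = np.arange(10)
a.strides                            # (8,)
b = np.ones((3, 5))
b.strides                            # (40, 8)
b.transpose().strides                # (8, 40) -- same data, strides swapped
np.shares_memory(b, b.transpose())   # True: the transpose is a view, no copy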
As you already mentioned, numpy arrays are not matrices. The definition of the transpose function is:
Permute the dimensions of an array.
This means that numpy's transpose method reorders the axes of an array. Since a 1D array has only one dimension, there is no other axis to swap it with, so you need to add a dimension before transpose has any effect. This behavior also keeps it consistent with higher-dimensional (3D, 4D, ...) arrays.
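For example (a minimal sketch), once a second dimension exists, transpose behaves the way you expected:
import numpy as np
a = np.arange(10)
a.transpose().shape                  # (10,) -- only one axis, nothing to permute
a.reshape(1, -1).transpose().shape   # (10, 1) -- two axes, so they get swapped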
There is a clean way to achieve what you want:
N = 10
a = np.arange(N)
a[:, np.newaxis]
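For reference, the resulting shapes (and the row-vector counterpart):
a[:, np.newaxis].shape   # (10, 1) -- column
a[np.newaxis, :].shape   # (1, 10) -- row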

Swapping the dimensions of a numpy array using Ellipsis?

This code swaps the first and last channels of an RGB image which is loaded into a Numpy array:
img = imread('image1.jpg')
# Convert from RGB -> BGR
img = img[..., [2, 1, 0]]
While I understand the use of Ellipsis for slicing Numpy arrays, I couldn't understand its use here. Could anybody explain what exactly is happening?
tl;dr
img[..., [2, 1, 0]] produces the same result as taking the slices img[:, :, i] for each i in the index array [2, 1, 0], and then stacking the results along the last dimension of img. In other words:
img[..., [2,1,0]]
will produce the same output as:
np.stack([img[:,:,2], img[:,:,1], img[:,:,0]], axis=2)
The ellipsis ... is a placeholder that tells numpy which axis to apply the index array to. Without the ... the index array will be applied to the first axis of img instead of the last. Thus, without ..., the index statement:
img[[2,1,0]]
will produce the same output as:
np.stack([img[2,:,:], img[1,:,:], img[0,:,:]], axis=0)
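A quick check that the two forms agree, using a small random array as a stand-in for a real image (an assumption purely for illustration):
import numpy as np
img = np.random.rand(4, 5, 3)   # stand-in for an (H, W, 3) image
np.array_equal(img[..., [2, 1, 0]],
               np.stack([img[:, :, 2], img[:, :, 1], img[:, :, 0]], axis=2))   # True
np.array_equal(img[[2, 1, 0]],
               np.stack([img[2, :, :], img[1, :, :], img[0, :, :]], axis=0))   # True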
What the docs say
This is an example of what the docs call "Combining advanced and basic indexing":
When there is at least one slice (:), ellipsis (...) or np.newaxis in the index (or the array has more dimensions than there are advanced indexes), then the behaviour can be more complicated. It is like concatenating the indexing result for each advanced index element.
It goes on to describe that in this case, the dimensions from the advanced indexing operations [in your example [2, 1, 0]] are inserted into the result array at the same spot as they were in the initial array (the latter logic is what makes simple advanced indexing behave just like slicing).
The 2D case
The docs aren't the easiest to understand, but in this case it's not too hard to pick apart. Start with a simpler 2D case:
arr = np.arange(12).reshape(4,3)
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
Using the same kind of advanced indexing with a single index value yields:
arr[:, [1]]
array([[ 1],
[ 4],
[ 7],
[10]])
which is column 1 of arr (its second column). In other words, it's like you yielded all possible values from arr while holding the index of the last axis fixed. Like #hpaulj said in his comment, the ellipsis is there to act as a placeholder. It effectively tells numpy to iterate freely over all of the axes except for the last, to which the indexing array is applied.
You can also use this indexing syntax to shuffle the columns of arr around however you'd like:
arr[..., [1,0,2]]
array([[ 1, 0, 2],
[ 4, 3, 5],
[ 7, 6, 8],
[10, 9, 11]])
This is essentially the same operation as in your example, but on a 2D array instead of a 3D one.
You can explain what's going on with arr[..., [1,0,2]] by breaking it down to simpler indexing ops. It's kind of like you first take the return value of arr[..., [1]]:
array([[ 1],
[ 4],
[ 7],
[10]])
then the return value of arr[..., [0]]:
array([[0],
[3],
[6],
[9]])
then the return value of arr[..., [2]]:
array([[ 2],
[ 5],
[ 8],
[11]])
and then finally concatenated all of those results into a single array of shape (*arr.shape[:-1], len(ix)), where ix = [1, 0, 2] is the index array. The data along the last axis are ordered according to their order in ix.
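You can check this concatenation view directly with the arr from above:
np.array_equal(
    arr[..., [1, 0, 2]],
    np.concatenate([arr[..., [1]], arr[..., [0]], arr[..., [2]]], axis=-1),
)   # True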
One good way to understand exactly what the ellipsis is doing is to perform the same op without it:
arr[[1,0,2]]
array([[6, 7, 8],
[0, 1, 2],
[3, 4, 5]])
In this case, the index array is applied to the first axis of arr, so the output is an array containing the [1,0,2] rows of arr. Adding an ... before the index array tells numpy to apply the index array to the last axis of arr instead.
Your 3D case
The case you asked about is the 3D equivalent of the 2D arr[..., [1,0,2]] example above. Say that img.shape is (480, 640, 3). You can think about img[..., [2, 1, 0]] as looping over each value i in ix=[2, 1, 0]. For every i, the indexing operation will gather the slab of shape (480, 640, 1) that lies along the ith index of the last axis of img. Once all three slabs are collected, the final result will be the equivalent of concatenating along their last axis (and in the order they were found).
notes
The only difference between arr[..., [1]] and arr[:, 1] is that arr[..., [1]] preserves the dimensionality of the original array (the result has shape (4, 1) rather than (4,)).
For a 2D array, arr[:, [1]] is equivalent to arr[..., [1]]. : acts as a placeholder just like ..., but only for a single dimension.
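Concretely, with the same arr as above:
arr[:, 1].shape      # (4,)   -- basic indexing drops the axis
arr[..., [1]].shape  # (4, 1) -- advanced indexing with a list keeps it
arr[:, [1]].shape    # (4, 1) -- same thing, with : as the single-dimension placeholder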

How does Ellipsis interact with index elements other than ints and slice objects?

The NumPy indexing docs say that
Ellipsis expand to the number of : objects needed to make a selection
tuple of the same length as x.ndim.
However, this seems to hold only when the other indexing arguments are ints and slice objects. For example, None doesn't seem to count towards the selection tuple length for the purposes of Ellipsis:
>>> import numpy
>>> numpy.zeros([2, 2]).shape
(2, 2)
>>> numpy.zeros([2, 2])[..., None].shape
(2, 2, 1)
>>> numpy.zeros([2, 2])[:, None].shape
(2, 1, 2)
>>> numpy.zeros([2, 2])[:, :, None].shape
(2, 2, 1)
Similar odd effects can be observed with boolean indexes, which may count as multiple tuple elements or none at all.
How does NumPy expand Ellipsis in the general case?
Ellipsis does expand to be equivalent to a number of :s, but that number is not always whatever makes the selection tuple length match the array's ndim. Rather, it expands to enough :s for the selection tuple to use every dimension of the array.
In most NumPy indexing, each element of the selection tuple matches up to some dimension of the original array. For example, in
>>> x = numpy.arange(9).reshape([3, 3])
>>> x[1, :]
array([3, 4, 5])
the 1 matches up to the first dimension of x, and the : matches up to the second dimension. The 1 and the : use those dimensions.
Indexing elements don't always use exactly one array dimension, though. If an indexing element corresponds to no input dimensions, or multiple input dimensions, that indexing element will use that many dimensions of the input. For example, None creates a new dimension in the output not corresponding to any dimension of the input. None doesn't use an input dimension, which is why
numpy.zeros([2, 2])[..., None]
expands to
numpy.zeros([2, 2])[:, :, None]
instead of numpy.zeros([2, 2])[:, None].
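A sketch with a 3-dimensional array (shape chosen arbitrarily) makes the "use every dimension" rule easier to see:
>>> y = numpy.zeros([2, 3, 4])
>>> y[..., None].shape      # ... expands to three :s, None adds a new axis
(2, 3, 4, 1)
>>> y[None, ...].shape
(1, 2, 3, 4)
>>> y[:, None, ...].shape   # ... expands to the two remaining :s
(2, 1, 3, 4)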
Similarly, a boolean index uses a number of dimensions corresponding to the number of dimensions of the boolean index itself. For example, a boolean scalar index uses none:
>>> x[..., False].shape
(3, 3, 0)
>>> x[:, False].shape
(3, 0, 3)
>>> x[:, :, False].shape
(3, 3, 0)
And in the common case of a boolean array index with the same shape as the array it's indexing, the boolean array will use every dimension of the other array, and inserting a ... will do nothing:
>>> x.shape
(3, 3)
>>> (x < 5).shape
(3, 3)
>>> x[x<5]
array([0, 1, 2, 3, 4])
>>> x[..., x<5]
array([0, 1, 2, 3, 4])
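A 1-dimensional boolean index on the same x shows the in-between case: it uses exactly one dimension, so ... still has one dimension left to fill (mask values chosen arbitrarily):
>>> m = numpy.array([True, False, True])
>>> x[m].shape
(2, 3)
>>> x[:, m].shape
(3, 2)
>>> x[..., m].shape         # ... expands to a single :
(3, 2)
>>> x[m, ...].shape
(2, 3)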
If you want to see the source code that handles ... expansion and used dimension calculation, it's in the NumPy github repository in the prepare_index function under numpy/core/src/multiarray/mapping.c. Look for the used_ndim variable.

How to multiply two vector and get a matrix?

In numpy, I have two vectors: vector A is 4x1 and vector B is 1x5. If I do AxB, it should result in a matrix of size 4x5.
But I have tried many times, with all kinds of reshapes and transposes, and they all either raise an error saying "not aligned" or return a single value.
How should I get the matrix product I want?
Normal matrix multiplication works as long as the vectors have the right shape. Remember that * in Numpy is elementwise multiplication, and matrix multiplication is available with numpy.dot() (or with the @ operator, in Python 3.5+).
>>> numpy.dot(numpy.array([[1], [2]]), numpy.array([[3, 4]]))
array([[3, 4],
[6, 8]])
This is called an "outer product." You can get it using plain vectors using numpy.outer():
>>> numpy.outer(numpy.array([1, 2]), numpy.array([3, 4]))
array([[3, 4],
[6, 8]])
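The same outer product can also be written with broadcasting, or with the @ operator on reshaped 1D vectors (a small sketch):
>>> u = numpy.array([1, 2])
>>> v = numpy.array([3, 4])
>>> u[:, None] * v[None, :]       # broadcasting: (2, 1) * (1, 2) -> (2, 2)
array([[3, 4],
       [6, 8]])
>>> u[:, None] @ v[None, :]       # matrix product of a column by a row
array([[3, 4],
       [6, 8]])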
If you are using numpy:
First, make sure you have two vectors. For example, vec1.shape = (10,) and vec2.shape = (26,); in numpy, a 1D array makes no distinction between a row vector and a column vector.
Second, do res_matrix = vec1.reshape(10, 1) @ vec2.reshape(1, 26).
Finally, you should have res_matrix.shape = (10, 26).
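A minimal runnable version of those steps (the lengths 10 and 26 are just the ones from the example; any 1D vectors work):
import numpy as np
vec1 = np.arange(10)    # shape (10,)
vec2 = np.arange(26)    # shape (26,)
res_matrix = vec1.reshape(10, 1) @ vec2.reshape(1, 26)
res_matrix.shape        # (10, 26)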
The numpy documentation says np.matrix will be deprecated, so it's better not to use it.
The matmul function (available since numpy 1.10.1) works fine:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array([[1,1,1,1,1],])
ab = np.matmul(a, b)
print (ab)
print(ab.shape)
You have to declare your vectors correctly: the first has to be a column vector (a list of one-element lists, shape (4, 1)), and the second a row vector (a single list nested in a list, shape (1, 5)), as in the example above.
Output:
[[1 1 1 1 1]
[2 2 2 2 2]
[3 3 3 3 3]
[4 4 4 4 4]]
(4, 5)
