indexing rows and columns in numpy - python

import numpy as np
a = np.array(list(range(16))).reshape((4,4))
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
Say I want the middle square. It'd seem reasonable to do this:
a[[1,2],[1,2]]
but I get this:
array([5, 10])
This works, but seems inelegant:
a[[1,2],:][:,[1,2]]
array([[5, 6],
[9, 10]])
So my questions are:
Why is it this way? What premises are required to make the implemented way sensible?
Is there a canonical way to select along more than one index at once?

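I think you can read more details in the NumPy documentation on advanced indexing. Basically, when you index the array with lists/arrays, the index arrays are broadcast against each other and iterated together, so a[[1,2],[1,2]] selects the coordinate pairs (1, 1) and (2, 2) rather than a 2x2 block.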
In your case, you can do:
idx = np.array([1,3])
a[idx[:,None], idx]
Or, as described in the advanced indexing docs, with np.ix_:
a[np.ix_(idx, idx)]
Output (the same for both):
array([[ 5, 7],
[13, 15]])
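A minimal sketch of the difference, assuming only NumPy: index arrays of the same shape are paired element by element, while a column-shaped row index (or np.ix_) broadcasts into every row/column combination. The names rows and cols are just for illustration.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))
rows = np.array([1, 2])
cols = np.array([1, 2])

# Same-shape index arrays are zipped: this picks a[1, 1] and a[2, 2].
paired = a[rows, cols]

# A (2, 1) row index broadcasts against a (2,) column index to shape (2, 2),
# so every requested row is combined with every requested column.
block = a[rows[:, None], cols]

# np.ix_ builds the same open mesh of indices for you.
block_ix = a[np.ix_(rows, cols)]

print(paired)  # [ 5 10]
print(block)   # [[ 5  6]
               #  [ 9 10]]
print(np.array_equal(block, block_ix))  # True
```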

You can do both indexing operations at once instead of creating an intermediate array (advanced indexing returns a copy) and indexing that again:
import numpy as np
a = np.arange(16).reshape((4, 4))
# preferred if possible
print(a[1:3, 1:3])
# [[ 5 6]
# [ 9 10]]
# otherwise add a second dimension to the first index to make it broadcastable
index1 = np.asarray([1, 2])
index2 = np.asarray([1, 2])
print(a[index1[:, None], index2])
# [[ 5 6]
# [ 9 10]]
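One practical difference between the two forms above is memory: a basic slice is a view into a, while advanced (array) indexing produces a new array. A small check using np.shares_memory, as a sketch:

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

# Basic slicing returns a view that shares memory with a.
view = a[1:3, 1:3]

# Advanced (array) indexing returns a new array, i.e. a copy.
copy = a[[1, 2], :][:, [1, 2]]

print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False

# Writing through the view changes a; writing to the copy does not.
view[0, 0] = -1
copy[1, 1] = -2
print(a[1, 1], a[2, 2])  # -1 10
```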

You could chain np.take calls to select indices along multiple axes:
a = np.arange(16).reshape((4, 4))
idx = np.array([1,2])
np.take(np.take(a, idx, axis=1), idx, axis=0)
Or (slightly more readable)
a.take(idx, axis=1).take(idx, axis=0)
Output:
array([[ 5, 6],
[ 9, 10]])
np.take also accepts a mode argument ('raise', 'wrap' or 'clip') that controls how out-of-bounds indices are handled, for example wrapping them around.
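A short sketch of the three mode values on the 4x4 array; the out-of-range indices 4 and 5 are chosen only for illustration.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))
idx = np.array([3, 4, 5])  # 4 and 5 are out of bounds for an axis of length 4

# mode='wrap' takes indices modulo the axis length: 3, 0, 1
wrapped = a.take(idx, axis=1, mode='wrap')

# mode='clip' clamps indices into [0, 3]: 3, 3, 3
clipped = a.take(idx, axis=1, mode='clip')

# The default mode='raise' rejects out-of-bounds indices with an IndexError.
try:
    a.take(idx, axis=1)
except IndexError as err:
    print("raise:", err)

print(wrapped[0])  # [3 0 1]
print(clipped[0])  # [3 3 3]
```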

Related

How to slice 2D Torch tensor individually per row?

I have a 2D tensor in Pytorch that I would like to slice:
x = torch.rand((3, 5))
In this example, the tensor has 3 rows and I want to slice x, creating a new tensor y that also has 3 rows and num_cols columns.
What's challenging for me is that I want to slice different columns per row. All I have is x, num_cols, and idx, which is a tensor holding the start index from where to slice.
Example:
What I have is num_cols=2, idx=[1,2,3] and
x=torch.arange(15).reshape((3,-1)) =
tensor([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I want is
y=
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
What's the "torch"-way of doing this? I know, I can slice if I get a boolean mask somehow, but I don't know how to construct that with idx and num_cols without normal Python loops.
You could use fancy indexing together with broadcasting. Another solution is torch.gather, which is similar to NumPy's take_along_axis. Your idx tensor would need to be expanded to hold one column index per output column (for num_cols=2, that is idx and idx+1):
import torch
x = torch.arange(15).reshape(3,-1)
idx = torch.tensor([1,2,3])
idx = torch.column_stack([idx, idx+1])
torch.gather(x, 1, idx)
output:
tensor([[ 1, 2],
[ 7, 8],
[13, 14]])
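The index matrix does not have to be hard-coded as idx and idx+1: broadcasting the start indices against an offset range works for any num_cols. Since the answer compares torch.gather to NumPy's take_along_axis, here is a minimal NumPy sketch of that construction; the helper name row_windows is hypothetical, and it assumes every window fits inside its row. The same index construction should carry over to torch with torch.arange, but that variant is not shown here.

```python
import numpy as np

def row_windows(x, idx, num_cols):
    """Hypothetical helper: return x[r, idx[r]:idx[r] + num_cols] for every row r."""
    # (rows, 1) + (num_cols,) broadcasts to a (rows, num_cols) matrix of column indices
    cols = idx[:, None] + np.arange(num_cols)
    # take_along_axis picks, in each row, the columns listed in that row of cols
    return np.take_along_axis(x, cols, axis=1)

x = np.arange(15).reshape(3, -1)
idx = np.array([1, 2, 3])
y = row_windows(x, idx, num_cols=2)

# Equivalent fancy indexing: a (3, 1) row index broadcasts against the column matrix.
rows = np.arange(x.shape[0])
y_fancy = x[rows[:, None], idx[:, None] + np.arange(2)]

print(y)
# [[ 1  2]
#  [ 7  8]
#  [13 14]]
```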

Why does the axis argument in NumPy change?

I am very confused when it comes to the logic of the NumPy axis argument. In some cases it affects the row when axis = 0 and in some cases it affects the columns when axis = 0. Example:
a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
array([[1,3,6,7,4],
[3,2,5,9,1]])
np.sort(a, axis = 0) #This sorts the columns
array([[1, 2, 5, 7, 1],
[3, 3, 6, 9, 4]])
np.sort(a, axis=1) #This sorts the rows
array([[1, 3, 4, 6, 7],
[1, 2, 3, 5, 9]])
#####################################################################
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 0) # This deletes the row
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 1) #This deletes the column
array([[ 1, 3, 4],
[ 5, 7, 8],
[ 9, 11, 12]])
If there is some logic here that I am missing I would love to learn it.
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
Use axis=0 to apply a method down each column, or to the row labels (the index).
Use axis=1 to apply a method across each row, or to the column labels.
Here's a picture to show the parts of a DataFrame that each axis refers to:
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So, for the method in the question, np.sort(axis=1) behaves as defined: it sorts the entries horizontally across columns, that is, along each individual row. On the other hand, np.sort(axis=0) acts vertically downwards across rows, sorting each column.
Similarly, np.delete(arr, obj, axis=1) removes columns, because column positions run along the horizontal axis. Specifying axis=0 makes it remove rows instead.
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
arr has 2 dimensions; the empty slice : selects everything along an axis, so arr[:,:] selects all of the first and second axes. From the documentation of np.delete regarding the second parameter obj:
obj : slice, int or array of ints
Indicate indices of sub-arrays to remove along the specified axis.
If we want to delete obj=1 from axis=0 we are effectively removing arr[[1],:] from arr
arr[[1],:] # array([[5, 6, 7, 8]])
With the same intuition, we can remove obj=1 from axis=1
arr[:,[1]] # array([[ 2],
# [ 6],
# [10]])
When sorting the array arr above along axis=0, we are comparing the following elements:
# array([[1, 2, 3, 4]])
# array([[5, 6, 7, 8]])
# array([[ 9, 10, 11, 12]])
The array is already sorted in this case, but the comparison is done between rows, element by element: for example, array([[5, 6, 7, 8]]) is compared with array([[ 9, 10, 11, 12]]) position by position (5 with 9, 6 with 10, and so on), so each column ends up sorted independently.
Sorting the array on axis=1 we are comparing the following elements
# array([[1], array([[ 2], array([[ 3], array([[ 4],
# [5], [ 6], [ 7], [ 8],
# [9]]) [10]]) [11]]) [12]])
Notice the difference of axis usage between np.delete and np.sort. np.delete will remove the complete row/column while np.sort will use the complete row/column for comparison.
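Another way to reconcile the two behaviours is to watch shapes: axis=k names the dimension the operation works along, and functions that consume or remove entries shrink exactly that dimension. A short sketch using the arrays from the question:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])  # shape (3, 4)

# Reductions remove the axis they run along.
print(arr.sum(axis=0).shape)  # (4,)  one sum per column (ran down axis 0)
print(arr.sum(axis=1).shape)  # (3,)  one sum per row (ran across axis 1)

# np.sort keeps the shape; axis=0 sorts each column independently.
b = np.array([[1, 3, 6, 7, 4], [3, 2, 5, 9, 1]])
print(np.sort(b, axis=0))
# [[1 2 5 7 1]
#  [3 3 6 9 4]]

# np.delete shrinks the chosen axis by the number of removed indices.
print(np.delete(arr, 1, axis=0).shape)  # (2, 4)  a row was removed
print(np.delete(arr, 1, axis=1).shape)  # (3, 3)  a column was removed
```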

Concatenate NumPy 2D array with column (1D array)

Suppose I have a 2D NumPy array values. I want to add a new column to it. The new column should be values[:, 19] but lagged by one sample (the first element equals zero). It could be built as np.append([0], values[0:-2:1, 19]). Following Numpy concatenate 2D arrays with 1D array, I tried:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.append(dataset.values, temp[:, None], axis=1)
but I get:
ValueError: all the input array dimensions except for the concatenation axis
must match exactly
I tried using c_ too as:
temp = np.append([0], [values[1:-2:1, 19]])
values = np.c_[values, temp]
but the effect is the same. How can this concatenation be done? I think the problem is the orientation of temp: it is treated as a row instead of a column, so the dimensions don't match. In Octave, ' (the transpose operator) would do the trick. Is there a similar solution in NumPy?
Anyway, thank you for your time.
Best regards,
Max
In [76]: values = np.arange(16).reshape(4,4)
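In [77]: temp = np.concatenate(([0], values[:-1,-1]))
In [78]: values
Out[78]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [79]: temp
Out[79]: array([ 0, 3, 7, 11])
This use of concatenate to make temp is similar to your use of append (which actually uses concatenate). Note that lagging by one sample means dropping only the last element (values[:-1, ...]), so temp has exactly as many entries as values has rows. Your values[1:-2, 19] drops three elements, which leaves temp two entries short; that length mismatch, not the orientation, is what triggers the ValueError, since temp[:, None] is already a column.
Sounds like you want to join values and temp in this way:
In [80]: np.concatenate((values, temp[:,None]),axis=1)
Out[80]:
array([[ 0, 1, 2, 3, 0],
[ 4, 5, 6, 7, 3],
[ 8, 9, 10, 11, 7],
[12, 13, 14, 15, 11]])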
Again I prefer using concatenate directly.
You need to convert the 1D array to 2D first (below, b is already written as a 2D row, [[7, 8, 9]]). You can then use vstack, or hstack followed by a reshape, to get the final array you want:
import numpy as np
a = np.array([[1, 2, 3],[4, 5, 6]])
b = np.array([[7, 8, 9]])
c = np.vstack([a, b])
print(c)
c = np.hstack([arr.reshape(1,-1) for arr in [a,b]]).reshape(-1,3)
print(c)
Either way, the output is:
[[1 2 3]
[4 5 6]
[7 8 9]]
Hope I understood the question correctly
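For the lagged column specifically, a minimal sketch: drop the last sample, prepend the fill value, and append the result as a column. It uses np.column_stack, which treats a 1D array as a column, in place of concatenate with [:, None]. The helper name append_lagged_column is hypothetical, and col=3 stands in for the question's column 19 so the 4x4 example works.

```python
import numpy as np

def append_lagged_column(values, col, fill=0):
    """Hypothetical helper: append values[:, col] shifted down one row, with fill on top."""
    # Drop the last sample and prepend fill, so the result has len(values) entries.
    lagged = np.concatenate(([fill], values[:-1, col]))
    # column_stack treats a 1D array as a column, so no manual [:, None] is needed.
    return np.column_stack((values, lagged))

values = np.arange(16).reshape(4, 4)
out = append_lagged_column(values, col=3)
print(out)
# [[ 0  1  2  3  0]
#  [ 4  5  6  7  3]
#  [ 8  9 10 11  7]
#  [12 13 14 15 11]]
```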

Flip or reverse columns in numpy array

I want to swap the first and second values of each row in an array. A naive solution is to loop through the array. What is the right way to do this?
import numpy as np
contour = np.array([[1, 4],
[3, 2]])
flipped_contour = np.empty((0,2))
for point in contour:
x_y_flipped = np.array([point[1], point[0]])
flipped_contour = np.vstack((flipped_contour, x_y_flipped))
print(flipped_contour)
[[4. 1.]
[2. 3.]]
Use the aptly named np.flip:
np.flip(contour, axis=1)
Or,
np.fliplr(contour)
array([[4, 1],
[2, 3]])
You can use numpy indexing:
contour[:, ::-1]
In addition to COLDSPEED's answer, if we only want to swap the first and second columns, not flip the entire array:
contour[:, :2] = contour[:, 1::-1]
Here contour[:, 1::-1] is the array formed by the first two columns of contour, in reverse order. It is then assigned to the first two columns (contour[:, :2]), so the first two columns are swapped.
In general, to swap the ith and jth columns, do the following:
contour[:, [i, j]] = contour[:, [j, i]]
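A short sketch of the general i/j swap on a hypothetical three-column array, plus a non-in-place variant that permutes the column indices instead of copying slices:

```python
import numpy as np

contour = np.array([[1, 4, 7],
                    [3, 2, 8]])

# In place: the right-hand fancy index makes a copy first, so the swap is safe.
i, j = 0, 2
contour[:, [i, j]] = contour[:, [j, i]]
print(contour)
# [[7 4 1]
#  [8 2 3]]

# Not in place: build a permutation of column indices and index with it.
perm = np.arange(contour.shape[1])
perm[[i, j]] = perm[[j, i]]
restored = contour[:, perm]
print(restored)
# [[1 4 7]
#  [3 2 8]]
```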
Here are two non-inplace ways of swapping the first two columns:
>>> a = np.arange(15).reshape(3, 5)
>>> a[:, np.r_[1:-1:-1, 2:5]]
array([[ 1, 0, 2, 3, 4],
[ 6, 5, 7, 8, 9],
[11, 10, 12, 13, 14]])
or
>>> np.c_[a[:, 1::-1], a[:, 2:]]
array([[ 1, 0, 2, 3, 4],
[ 6, 5, 7, 8, 9],
[11, 10, 12, 13, 14]])
To flip only some rows, select them, flip them along axis 1 and assign them back:
>>> your_array[indices_to_flip] = np.flip(your_array[indices_to_flip], axis=1)
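A runnable sketch of that row-selective flip, with a hypothetical boolean mask standing in for indices_to_flip (an integer index array works the same way):

```python
import numpy as np

your_array = np.array([[1, 4], [3, 2], [5, 6]])
indices_to_flip = np.array([True, False, True])  # hypothetical: flip rows 0 and 2 only

# The selection on the right is a copy; assigning it back writes into your_array.
your_array[indices_to_flip] = np.flip(your_array[indices_to_flip], axis=1)
print(your_array)
# [[4 1]
#  [3 2]
#  [6 5]]
```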

Numpy array slicing using colons

I am trying to learn numpy array slicing.
But there is one piece of syntax I can't seem to understand.
What does
a[:, 1] do?
I ran it in Python:
a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
a = a.reshape(2,2,2,2)
a[:, 1]
Output:
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
Can someone explain to me the slicing and how it works. The documentation doesn't seem to answer this question.
Another question: is there a way to generate the array a with something like
np.array(1:16), the way Python has
x = [x for x in range(16)]
The commas in slicing are to separate the various dimensions you may have. In your first example you are reshaping the data to have 4 dimensions each of length 2. This may be a little difficult to visualize so if you start with a 2D structure it might make more sense:
>>> a = np.arange(16).reshape((4, 4))
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a[0] # access the first "row" of data
array([0, 1, 2, 3])
>>> a[0, 2] # access the 3rd column (index 2) in the first row of the data
2
If you want to access multiple values using slicing you can use the colon to express a range:
>>> a[:, 1] # get the entire 2nd (index 1) column
array([ 1, 5, 9, 13])
>>> a[1:3, -1] # get the second and third elements from the last column
array([ 7, 11])
>>> a[1:3, 1:3] # get the data in the second and third rows and columns
array([[ 5, 6],
[ 9, 10]])
You can do steps too:
>>> a[::2, ::2] # get every other element (column-wise and row-wise)
array([[ 0, 2],
[ 8, 10]])
Hope that helps. Once that makes more sense, you can look into things like adding dimensions with None or np.newaxis, or using the ... ellipsis:
>>> a[:, None].shape
(4, 1, 4)
You can find more here: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
It might pay to explore the shape and individual entries as we go along.
Let's start with
>>> a = np.array([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16])
>>> a.shape
(16,)
This is a one-dimensional array of length 16.
Now let's try
>>> a = a.reshape(2,2,2,2)
>>> a.shape
(2, 2, 2, 2)
It's a multi-dimensional array with 4 dimensions.
Let's see the 0, 1 element:
>>> a[0, 1]
array([[5, 6],
[7, 8]])
Since two dimensions remain, the result is a 2D array (a matrix).
Now a[:, 1] says: take a[i, 1] for all possible values of i:
>>> a[:, 1]
array([[[ 5, 6],
[ 7, 8]],
[[13, 14],
[15, 16]]])
It gives you an array where the first item is a[0, 1], and the second item is a[1, 1].
To answer the second part of your question (generating arrays of sequential values) you can use np.arange(start, stop, step) or np.linspace(start, stop, num_elements). Both of these return a numpy array with the corresponding range of values.
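For the array in the question, note that np.arange excludes its stop value, so 1 through 16 needs stop=17. A short sketch that also contrasts a[:, 1] with a[:1], since the comma decides which axis the 1 applies to:

```python
import numpy as np

# np.arange excludes the stop value, so 1..16 needs stop=17.
a = np.arange(1, 17).reshape(2, 2, 2, 2)

# a[:, 1]: every index on axis 0, index 1 on axis 1, so axis 1 is dropped
print(a[:, 1].shape)  # (2, 2, 2)

# a[:1]: a slice on axis 0 only (no comma), so all 4 dimensions are kept
print(a[:1].shape)    # (1, 2, 2, 2)

# np.linspace includes the end point by default and returns floats
print(np.linspace(1, 16, 16)[:3])  # [1. 2. 3.]
```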
