Indexing in NumPy: Access every other group of values

The [::n] indexing option in numpy provides a very useful way to index every nth item in a list. However, is it possible to use this feature to extract multiple values, e.g. every other pair of values?
For example:
a = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
And I want to extract every other pair of values, i.e. I want to return
a[[0, 1, 4, 5, 8, 9]]
Of course the index could be built with loops, but I wonder if there's a faster way to use ::-style indexing in NumPy while also specifying the width of the group to take at every nth step.
Thanks

When the length of the array is a multiple of the window size -
In [29]: W = 2 # window-size
In [30]: a.reshape(-1,W)[::2].ravel()
Out[30]: array([0, 1, 4, 5, 8, 9])
Step-by-step explanation -
# Reshape to split into W-sized groups
In [43]: a.reshape(-1,W)
Out[43]:
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11]])
# Use stepsize to select every other pair starting from the first one
In [44]: a.reshape(-1,W)[::2]
Out[44]:
array([[0, 1],
[4, 5],
[8, 9]])
# Flatten for desired output
In [45]: a.reshape(-1,W)[::2].ravel()
Out[45]: array([0, 1, 4, 5, 8, 9])
If you are okay with a 2D output, skip the last step: the result is then still a view into the input and virtually free at runtime. Let's verify the view part -
In [47]: np.shares_memory(a,a.reshape(-1,W)[::2])
Out[47]: True
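As a quick check of the view/copy distinction, here is a minimal sketch (the names pairs and flat are only illustrative): the strided 2D selection shares memory with a, while ravel() on that non-contiguous selection has to make a copy.

```python
import numpy as np

a = np.arange(12)
W = 2  # window size

pairs = a.reshape(-1, W)[::2]  # every other pair, still a view into a
flat = pairs.ravel()           # pairs is not contiguous, so ravel() copies

print(np.shares_memory(a, pairs))  # True
print(np.shares_memory(a, flat))   # False
```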
For the generic case, where the length is not necessarily a multiple of the window size, we can use a mask-based approach -
In [64]: a[(np.arange(len(a))%(2*W))<W]
Out[64]: array([0, 1, 4, 5, 8, 9])
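To illustrate why the mask is the safer choice, here is a small sketch with an array of length 11 (chosen only because it is not a multiple of W): the reshape raises, while the mask still selects every other pair.

```python
import numpy as np

W = 2
a = np.arange(11)  # length 11 is not a multiple of W

# reshape cannot split 11 elements into rows of 2
try:
    a.reshape(-1, W)
    reshape_ok = True
except ValueError:
    reshape_ok = False

# position within each block of 2*W elements; keep the first W of them
mask = (np.arange(len(a)) % (2 * W)) < W
result = a[mask]

print(reshape_ok)  # False
print(result)      # [0 1 4 5 8 9]
```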

You can do that by reshaping the array into an n×4 matrix (two values to keep plus two to skip), then slicing the first two elements of each row and finally flattening the result:
a.reshape((-1,4))[:,:2].flatten()
resulting in:
array([0, 1, 4, 5, 8, 9])
With rows of 3 instead of 4, the same expression keeps two values and skips only one.

Related

Numpy Array: Slice several values at every step

I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior.
I know I could obtain the expected result by combining several slicing operations (similarly to Numpy Array Slicing) but I was wondering if I was not missing some numpy feature.
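For reference, the "combining several slices" idea can be written compactly with np.r_, which concatenates slice ranges into one index array. This is only a sketch of that baseline; the slice bounds are written out by hand for this particular array.

```python
import numpy as np

a = np.arange(10)

# np.r_ turns the three slices into the index array [0, 1, 4, 5, 8, 9]
idx = np.r_[0:2, 4:6, 8:10]
out = a[idx]

print(out)  # [0 1 4 5 8 9]
```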
If you want to avoid creating copies and allocating new memory, you could use a sliding_window_view of two elements:
win = np.lib.stride_tricks.sliding_window_view(a, 2)
array([[0, 1],
[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9]])
And then only take every 4th window view:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
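A hedged generalisation of the window approach to "keep x, skip y" (x and y are illustrative names; sliding_window_view needs NumPy 1.20 or later). Note that, unlike a mask, this drops a trailing group that has fewer than x elements, and that only the 2D selection is a view: ravel() still copies.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
x, y = 2, 2  # keep x values, then skip y values

win = sliding_window_view(a, x)  # shape (9, 2), read-only view
picked = win[::x + y]            # one window per block of x + y elements
flat = picked.ravel()            # flattening a strided view makes a copy

print(picked.tolist())                # [[0, 1], [4, 5], [8, 9]]
print(np.shares_memory(a, picked))    # True
print(np.shares_memory(a, flat))      # False
```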
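Or go directly with the more dangerous as_strided, but heed the warnings in the documentation. Strides are given in bytes, so derive them from the item size rather than hard-coding 32 and 8 (which is only right for 8-byte integers):
np.lib.stride_tricks.as_strided(a, shape=(3,2), strides=(4*a.itemsize, a.itemsize))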
You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2 ; y = 3:
array([0, 1, 5, 6])
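The same modulo mask also works along one axis of a 2D array. A minimal sketch, where keep_skip_mask is a hypothetical helper name and b is made-up sample data:

```python
import numpy as np

def keep_skip_mask(n, x, y):
    """Boolean mask of length n: True for x positions, then False for y, repeated."""
    return np.arange(n) % (x + y) < x

b = np.arange(20).reshape(2, 10)

# apply the pattern to the columns of every row
mask = keep_skip_mask(b.shape[1], 2, 2)
out = b[:, mask]

print(out)
# [[ 0  1  4  5  8  9]
#  [10 11 14 15 18 19]]
```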

Why does the axis argument in NumPy change?

I am very confused when it comes to the logic of the NumPy axis argument. In some cases it affects the row when axis = 0 and in some cases it affects the columns when axis = 0. Example:
a = np.array([[1,3,6,7,4],[3,2,5,9,1]])
array([[1,3,6,7,4],
[3,2,5,9,1]])
np.sort(a, axis = 0) #This sorts the columns
array([[1, 2, 5, 7, 1],
[3, 3, 6, 9, 4]])
np.sort(a, axis=1) #This sorts the rows
array([[1, 3, 4, 6, 7],
[1, 2, 3, 5, 9]])
#####################################################################
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 0) # This deletes the row
array([[ 1, 2, 3, 4],
[ 9, 10, 11, 12]])
np.delete(arr,obj = 1, axis = 1) #This deletes the column
array([[ 1, 3, 4],
[ 5, 7, 8],
[ 9, 11, 12]])
If there is some logic here that I am missing I would love to learn it.
It's perhaps simplest to remember it as 0=down and 1=across.
This means:
Use axis=0 to apply a method down each column, or to the row labels (the index).
Use axis=1 to apply a method across each row, or to the column labels.
[Figure: the parts of a DataFrame that each axis refers to]
It's also useful to remember that Pandas follows NumPy's use of the word axis. The usage is explained in NumPy's glossary of terms:
Axes are defined for arrays with more than one dimension. A 2-dimensional array has two corresponding axes: the first running vertically downwards across rows (axis 0), and the second running horizontally across columns (axis 1). [my emphasis]
So, concerning the function in the question, np.sort(a, axis=1) seems to be correctly defined. It sorts the entries horizontally across columns, that is, within each individual row. On the other hand, np.sort(a, axis=0) is an operation acting vertically downwards across rows, so it sorts within each column.
Similarly, np.delete(arr, obj, axis=1) removes columns, because column positions are counted along the horizontal axis. Specifying axis=0 makes the function remove rows instead.
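One way to see the common rule is to look at result shapes: axis always names the dimension the function works along. A small sketch using the array from the question (the sum calls are added only for illustration):

```python
import numpy as np

a = np.array([[1, 3, 6, 7, 4],
              [3, 2, 5, 9, 1]])  # shape (2, 5): axis 0 has length 2, axis 1 has length 5

# A reduction collapses the named axis
print(a.sum(axis=0).shape)  # (5,)  one result per column
print(a.sum(axis=1).shape)  # (2,)  one result per row

# np.delete shortens the named axis by the number of removed entries
print(np.delete(a, 1, axis=0).shape)  # (1, 5)  a row is gone
print(np.delete(a, 1, axis=1).shape)  # (2, 4)  a column is gone

# np.sort keeps the shape and reorders values along the named axis
print(np.sort(a, axis=0).shape)  # (2, 5)
```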
arr = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])
arr
# array([[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12]])
arr has 2 dimensions; use the empty slice : to select everything along the first and second axes, as in arr[:,:]. From the documentation of np.delete regarding the second parameter obj:
obj : slice, int or array of ints
Indicate indices of sub-arrays to remove along the specified axis.
If we want to delete obj=1 from axis=0 we are effectively removing arr[[1],:] from arr
arr[[1],:] # array([[5, 6, 7, 8]])
With the same intuition, we can remove obj=1 from axis=1
arr[:,[1]] # array([[ 2],
# [ 6],
# [10]])
When sorting the array arr above along axis=0 we are comparing the following elements:
# array([[1, 2, 3, 4]])
# array([[5, 6, 7, 8]])
# array([[ 9, 10, 11, 12]])
The array is already sorted in this case but the comparison is done between two rows. For example array([[5, 6, 7, 8]]) is compared with array([[ 9, 10, 11, 12]]) by doing an element-wise comparison.
Sorting the array on axis=1 we are comparing the following elements
# array([[1], array([[ 2], array([[ 3], array([[ 4],
# [5], [ 6], [ 7], [ 8],
# [9]]) [10]]) [11]]) [12]])
Notice the difference of axis usage between np.delete and np.sort. np.delete will remove the complete row/column while np.sort will use the complete row/column for comparison.
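Both statements can be checked directly. The sketch below uses a made-up unsorted array (so that sorting actually changes something) and compares each call with an explicit equivalent:

```python
import numpy as np

arr = np.array([[9, 2, 3, 4],
                [5, 6, 7, 8],
                [1, 10, 11, 0]])

# np.delete along axis 0 is the same as keeping rows 0 and 2
deleted = np.delete(arr, 1, axis=0)
same_as_indexing = np.array_equal(deleted, arr[[0, 2], :])

# np.sort along axis 0 is the same as sorting every column on its own
by_column = np.column_stack([np.sort(arr[:, j]) for j in range(arr.shape[1])])
same_as_columns = np.array_equal(np.sort(arr, axis=0), by_column)

print(same_as_indexing, same_as_columns)  # True True
```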

Sorting 2D array by the first n rows

How can I sort an array in NumPy by the two first rows?
For example,
A=array([[9, 2, 2],
[4, 5, 6],
[7, 0, 5]])
And I'd like to sort columns by the first two rows, such that I get back:
A=array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Thank you!
One approach is to collapse the rows we want to sort by into a single 1D key that argsort can handle. One idea is to normalize each of the first k rows, multiply them by successively decreasing powers of 10 so that earlier rows weigh more, sum them, and then use argsort (note: this method will be numerically unstable for high values of k; it is meant for values up to ~20):
def sort_on_first_k_rows(x, k):
    # normalize each row so that its max value is 1
    a = (x[:k,:]/x[:k,:,None].max(1)).astype('float64')
    # multiply each row by the seq 10^n, for n=k-1,k-2...0
    # Ensures that the contribution of each row in the sorting is
    # captured in the final sum
    a_pow = (a*10**np.arange(a.shape[0]-1,-1,-1)[:,None])
    # Sort with the argsort on the resulting sum
    return x[:,a_pow.sum(0).argsort()]
Checking with the shared example:
sort_on_first_k_rows(A, 2)
array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Or with another example:
A=np.array([[9, 2, 2, 1, 5, 2, 9],
[4, 7, 6, 0, 9, 3, 3],
[7, 0, 5, 0, 2, 1, 2]])
sort_on_first_k_rows(A, 2)
array([[1, 2, 2, 2, 5, 9, 9],
[0, 3, 6, 7, 9, 3, 4],
[0, 1, 5, 0, 2, 2, 7]])
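As an alternative to the weighted-sum key, which may misorder columns when values in a key row are close together relative to that row's maximum, np.lexsort sorts by several keys exactly. A minimal sketch (the function name is illustrative, not from the answer above):

```python
import numpy as np

def sort_columns_by_first_k_rows(x, k):
    """Reorder the columns of x, sorting by row 0, then row 1, ... up to row k-1."""
    # np.lexsort uses the LAST key as the primary one, so reverse the first k rows
    order = np.lexsort(x[:k][::-1])
    return x[:, order]

A = np.array([[9, 2, 2, 1, 5, 2, 9],
              [4, 7, 6, 0, 9, 3, 3],
              [7, 0, 5, 0, 2, 1, 2]])

result = sort_columns_by_first_k_rows(A, 2)
print(result)
# [[1 2 2 2 5 9 9]
#  [0 3 6 7 9 3 4]
#  [0 1 5 0 2 2 7]]
```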
The pandas library is very flexible for sorting DataFrames, but only by columns. So I suggest transposing your array and converting it to a DataFrame like this (note that you need to specify column names so that you can define the sorting criteria later):
df = pd.DataFrame(A.transpose(), columns=['col'+str(i) for i in range(len(A))])
Then sort it and convert it back like this:
A_new = df.sort_values(['col0', 'col1'], ascending=[True, True]).to_numpy().transpose()

Find value indexes in a mother array with a filter array

I have two arrays: one is a mother array and the other is a "filtering array". The mother array is a 2D array (about 65 rowsx147 cols in size). The filtering array is an array that has the max value of each column of the mother array (1 row x 147 cols). I need to get the matching row values for the max values.
I tried using
for index,k in np.ndenumerate(MotherArr):
    for val in FiltArr:
        if k == val:
            print(index)
But for some reason, I am basically getting a print of val with the very last index printed afterwards.
Any ideas on how I could get this working?
You can just take the argmax of your array along an axis:
np.random.seed(0)
A = np.random.randint(0, 10, (5, 5))
# array([[5, 0, 3, 3, 7],
# [9, 3, 5, 2, 4],
# [7, 6, 8, 8, 1],
# [6, 7, 7, 8, 1],
# [5, 9, 8, 9, 4]])
maxima = A.max(1)
# array([7, 9, 8, 8, 9])
maxima_args = A.argmax(1)
# array([4, 0, 2, 3, 1], dtype=int64)
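The example above works along axis 1 (one maximum per row). Since the question asks for the maximum of each column, here is a sketch of the same idea with axis=0, reusing the sample array. The variable names are illustrative, and np.nonzero is added as one way to list every matching position when a maximum occurs more than once.

```python
import numpy as np

A = np.array([[5, 0, 3, 3, 7],
              [9, 3, 5, 2, 4],
              [7, 6, 8, 8, 1],
              [6, 7, 7, 8, 1],
              [5, 9, 8, 9, 4]])

col_max = A.max(axis=0)     # plays the role of the "filtering array"
row_idx = A.argmax(axis=0)  # row of the first occurrence of each column maximum

# all (row, col) positions equal to their column maximum, including ties
rows, cols = np.nonzero(A == col_max)

print(col_max)  # [9 9 8 9 7]
print(row_idx)  # [1 4 2 4 0]
print(list(zip(rows.tolist(), cols.tolist())))
# [(0, 4), (1, 0), (2, 2), (4, 1), (4, 2), (4, 3)]
```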

Apply same permutation for every row in a 2D numpy array

To permute a 1D array A I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation to every row of the array. Is there a way to make NumPy do that?
Generate random permutations for the number of columns in A and index into the columns of A, like so -
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
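The same idea with the newer Generator API, as a sketch: the seed is arbitrary and only there for reproducibility, and perm is kept in a variable so the permutation can be inspected or reused.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

A = np.array([[3, 5, 7, 4, 7],
              [2, 5, 2, 0, 3],
              [1, 4, 3, 8, 8]])

perm = rng.permutation(A.shape[1])  # one permutation of the column indices
shuffled = A[:, perm]               # every row is reordered by the same perm

# column j of the result is column perm[j] of the input, for all rows at once
print(perm)
print(shuffled)
```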
Actually you do not need to do this, from the documentation:
If x is a multi-dimensional array, it is only shuffled along its first
index.
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
P.S. if you need to perform column permutations - just do np.random.permutation(a.T).T. Similar things apply to multi-dim arrays.
It depends on what you mean by "every row".
If you want to permute all values (regardless of row and column), reshape your array to 1D, permute, and reshape back to 2D.
If you want to permute each row on its own, without mixing elements between different rows, you need to loop through one axis and call permutation:
for i in range(len(A)):
    A[i] = np.random.permutation(A[i])
It can probably be done more concisely, but that is how it can be done.
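One shorter option, assuming NumPy 1.20 or later, is Generator.permuted, which shuffles each slice along the given axis independently. A minimal sketch with made-up data and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

A = np.arange(15).reshape(3, 5)

# each row gets its own independent permutation; A itself is left unchanged
B = rng.permuted(A, axis=1)

print(B)
```

Every row of B contains the same values as the matching row of A, only in a different order.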
