Sorting 2D array by the first n rows


How can I sort an array in NumPy by its first two rows?
For example,
A=array([[9, 2, 2],
[4, 5, 6],
[7, 0, 5]])
And I'd like to sort columns by the first two rows, such that I get back:
A=array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Thank you!

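One approach is to transform the 2D array over which we want to take the argsort into an easier-to-handle 1D array. One way to do that is to normalize each of the first k rows, weight them by successively decreasing powers of 10 (10^(k-1), ..., 10^0), sum them column-wise and then call argsort on the sums. Note: this is a heuristic rather than an exact lexicographic sort. It orders the columns correctly only when a difference in an earlier row, after normalization and weighting, outweighs the combined contribution of all later rows. That holds for small non-negative integers like the ones below, but not in general. It also assumes each row's maximum is positive, and it becomes numerically unstable for high values of k (meant for values up to ~20):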
import numpy as np

def sort_on_first_k_rows(x, k):
    # normalize each of the first k rows so that its max value is 1
    a = (x[:k, :] / x[:k, :].max(axis=1, keepdims=True)).astype('float64')
    # multiply each row by 10^n, for n = k-1, k-2, ..., 0, so that earlier
    # rows dominate the final sum (float powers avoid the integer overflow
    # that an integer 10**n can hit for large n)
    a_pow = a * 10.0 ** np.arange(a.shape[0] - 1, -1, -1)[:, None]
    # sort the columns with argsort on the resulting column sums
    return x[:, a_pow.sum(axis=0).argsort()]
Checking with the shared example:
sort_on_first_k_rows(A, 2)
array([[2, 2, 9],
[5, 6, 4],
[0, 5, 7]])
Or with another example:
A=np.array([[9, 2, 2, 1, 5, 2, 9],
[4, 7, 6, 0, 9, 3, 3],
[7, 0, 5, 0, 2, 1, 2]])
sort_on_first_k_rows(A, 2)
array([[1, 2, 2, 2, 5, 9, 9],
[0, 3, 6, 7, 9, 3, 4],
[0, 1, 5, 0, 2, 2, 7]])
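If an exact lexicographic order is needed, np.lexsort can replace the weighted sum. This is a swapped-in technique, not part of the answer above, and sort_columns_lexsorted is a hypothetical helper name. np.lexsort treats its last key as the primary one, so the first k rows are passed in reverse order. Unlike the normalized weighted sum, it does not depend on the value range: for the columns (100, 0) and (99, 9), the weighted sums are 10.0 and 10.9, so the heuristic keeps 100 before 99, while lexsort puts 99 first.

```python
import numpy as np

def sort_columns_lexsorted(x, k):
    """Reorder the columns of x lexicographically by its first k rows.

    Hypothetical helper: np.lexsort uses the *last* key as the primary
    key, so rows k-1, ..., 0 are passed in that order to make row 0 primary.
    """
    keys = x[:k][::-1]          # shape (k, n): one sort key per row
    order = np.lexsort(keys)    # exact comparison, no normalization needed
    return x[:, order]

A = np.array([[9, 2, 2],
              [4, 5, 6],
              [7, 0, 5]])
print(sort_columns_lexsorted(A, 2))  # columns reordered to [[2, 2, 9], [5, 6, 4], [0, 5, 7]]
```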

The pandas library is very flexible for sorting DataFrames, but only by columns. So I suggest transposing your array and converting it to a DataFrame like this (note that you need to specify column names so that you can define the sorting criteria later):
import numpy as np
import pandas as pd
df = pd.DataFrame(A.transpose(), columns=['col'+str(i) for i in range(len(A))])
Then sort it and convert it back like this:
A_new = df.sort_values(['col0', 'col1'], ascending=[True, True]).to_numpy().transpose()

Related

Numpy Array: Slice several values at every step

I am trying to extract several values at once from an array but I can't seem to find a way to do it in a one-liner in Numpy.
Simply put, considering an array:
a = numpy.arange(10)
> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
I would like to be able to extract, say, 2 values, skip the next 2, extract the 2 following values etc. This would result in:
array([0, 1, 4, 5, 8, 9])
This is an example but I am ideally looking for a way to extract x values and skip y others.
I thought this could be done with slicing, doing something like:
a[:2:2]
but it only returns 0, which is the expected behavior.
I know I could get the expected result by combining several slicing operations (similar to Numpy Array Slicing), but I was wondering whether I was missing some NumPy feature.

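If you want to avoid intermediate copies and extra memory allocation, you could use a sliding_window_view of two elements (available since NumPy 1.20). The windows are a view of a; only the final ravel() copies the selected values: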
win = np.lib.stride_tricks.sliding_window_view(a, 2)
array([[0, 1],
[1, 2],
[2, 3],
[3, 4],
[4, 5],
[5, 6],
[6, 7],
[7, 8],
[8, 9]])
Then take only every 4th window:
win[::4].ravel()
array([0, 1, 4, 5, 8, 9])
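Or go directly with the more dangerous as_strided, but heed the warnings in its documentation. Deriving the strides from a.strides[0] (instead of hard-coding 32 and 8, which assumes 8-byte integers) keeps it correct for other dtypes:
np.lib.stride_tricks.as_strided(a, shape=(3, 2), strides=(4 * a.strides[0], a.strides[0])).ravel()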
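The sliding-window trick generalizes to "keep x, skip y": take windows of length x and keep every (x + y)-th one. A minimal sketch; keep_skip_windows is a hypothetical name, and it assumes NumPy 1.20 or later. A trailing group shorter than x is dropped, because no full window starts there.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def keep_skip_windows(a, keep, skip):
    """Return the first `keep` values of every (keep + skip)-long block of a 1D array."""
    win = sliding_window_view(a, keep)   # view: row i is a[i:i + keep]
    return win[::keep + skip].ravel()    # every (keep + skip)-th window; ravel copies

a = np.arange(10)
print(keep_skip_windows(a, 2, 2))  # [0 1 4 5 8 9]
print(keep_skip_windows(a, 2, 3))  # [0 1 5 6]
```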

You can use a modulo operator:
x = 2 # keep
y = 2 # skip
out = a[np.arange(a.shape[0])%(x+y)<x]
Output: array([0, 1, 4, 5, 8, 9])
Output with x = 2, y = 3:
array([0, 1, 5, 6])
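The mask works because i % (x + y) is an element's position within its (x + y)-long cycle, and only the first x positions of each cycle pass the < x test. A small check, assuming a 1D array and positive integers x and y (keep_skip_mask is a hypothetical name). Unlike the sliding_window_view approach, this keeps a partial group at the end of the array:

```python
import numpy as np

def keep_skip_mask(a, keep, skip):
    """Keep `keep` values, skip `skip` values, repeating along a 1D array."""
    pos = np.arange(a.shape[0]) % (keep + skip)   # position inside each cycle
    return a[pos < keep]                          # boolean mask selects a copy

print(keep_skip_mask(np.arange(10), 2, 2))  # [0 1 4 5 8 9]
print(keep_skip_mask(np.arange(9), 2, 2))   # [0 1 4 5 8]  (partial last group kept)
```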

Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array

Suppose I have two NumPy arrays
x = [[5, 2, 8],
[4, 9, 1],
[7, 8, 9],
[1, 3, 5],
[1, 2, 3],
[1, 2, 4]]
y = [0, 0, 1, 1, 1, 2]
I want to efficiently split the array x into sub-arrays according to the values in y.
My desired outputs would be
z_0 = [[5, 2, 8],
[4, 9, 1]]
z_1 = [[7, 8, 9],
[1, 3, 5],
[1, 2, 3]]
z_2 = [[1, 2, 4]]
Assuming that y starts with zero and is sorted in ascending order, what is the most efficient way to do this?
Note: This question is the sorted version of this question:
Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

If y is grouped (doesn't have to be sorted), you can use diff to get the split points:
indices = np.flatnonzero(np.diff(y)) + 1
You can pass those directly to np.split:
z = np.split(x, indices, axis=0)
If you want to know the labels too (np.asarray makes the indexing work when y is a plain list, as in the question):
labels = np.asarray(y)[np.r_[0, indices]]
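Putting the steps together on the question's data, as a sketch (converting both inputs with np.asarray is an addition, since the question writes them as lists):

```python
import numpy as np

x = np.asarray([[5, 2, 8],
                [4, 9, 1],
                [7, 8, 9],
                [1, 3, 5],
                [1, 2, 3],
                [1, 2, 4]])
y = np.asarray([0, 0, 1, 1, 1, 2])

# positions where the label changes; +1 turns diff positions into row indices
indices = np.flatnonzero(np.diff(y)) + 1     # array([2, 5])
z = np.split(x, indices, axis=0)             # sub-arrays x[:2], x[2:5], x[5:]
labels = y[np.r_[0, indices]]                # label of each group: [0, 1, 2]

for label, group in zip(labels, z):
    print(label, group.tolist())
```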

Dropping array rows that DUPLICATE defined column elements of other array rows

Consider the np array sample below:
import numpy as np
arr = np.array([[1,2,5, 4,2,7, 5,2,9],
[4,4,1, 4,2,0, 3,6,4],
[1,2,1, 4,2,2, 5,2,0],
[1,2,7, 2,4,1, 5,2,8],
[1,2,9, 4,2,8, 5,2,1],
[4,2,0, 4,4,1, 5,2,4],
[4,4,0, 4,2,6, 3,6,6],
[1,2,1, 4,2,2, 5,2,0]])
PROBLEM: We are concerned only with the first TWO columns of each element triplet. I want to remove array rows that duplicate these two elements of each triplet (in the same order).
In the example above, the rows with indices 0, 2, 4 and 7 all have the form [1,2,_, 4,2,_, 5,2,_]. So we should keep arr[0] and drop the other three. Similarly, arr[6] is dropped because it has the same pattern as arr[1], namely [4,4,_, 4,2,_, 3,6,_].
In the example given, the output should look like:
[[1,2,5, 4,2,7, 5,2,9],
[4,4,1, 4,2,0, 3,6,4],
[1,2,7, 2,4,1, 5,2,8],
[4,2,0, 4,4,1, 5,2,4]]
The part I'm struggling with most is that the solution should be general enough to handle arrays of 3, 6, 9, 12... columns (always a multiple of 3, and we are always interested in duplicates of the first two columns of each triplet).

If you can create an array with only the values you are interested in, you can pass it to np.unique(), which has a return_index option.
One way to get the groups you want is to delete every third column. Pass that to np.unique() and get the indices:
import numpy as np
arr = np.array([[1,2,5, 4,2,7, 5,2,9],
[4,4,1, 4,2,0, 3,6,4],
[1,2,1, 4,2,2, 5,2,0],
[1,2,7, 2,4,1, 5,2,8],
[1,2,9, 4,2,8, 5,2,1],
[4,2,0, 4,4,1, 5,2,4],
[4,4,0, 4,2,6, 3,6,6],
[1,2,1, 4,2,2, 5,2,0]])
unique_cols = np.delete(arr, slice(2, None, 3), axis=1)
vals, indices = np.unique(unique_cols, axis=0, return_index=True)
arr[sorted(indices)]
output:
array([[1, 2, 5, 4, 2, 7, 5, 2, 9],
[4, 4, 1, 4, 2, 0, 3, 6, 4],
[1, 2, 7, 2, 4, 1, 5, 2, 8],
[4, 2, 0, 4, 4, 1, 5, 2, 4]])
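The slice(2, None, 3) object selects columns 2, 5, 8, ... for any width, so the same two lines cover 3, 6, 9, 12, ... columns. Wrapped as a hypothetical helper drop_duplicate_pairs and checked on a 3-column and a 6-column input (made-up data):

```python
import numpy as np

def drop_duplicate_pairs(arr):
    """Keep the first row for each distinct pattern of the first two columns
    of every triplet; assumes arr.shape[1] is a multiple of 3."""
    keys = np.delete(arr, slice(2, None, 3), axis=1)        # drop every third column
    _, first = np.unique(keys, axis=0, return_index=True)   # first occurrence of each key
    return arr[np.sort(first)]                              # restore original row order

small = np.array([[1, 2, 3],
                  [1, 2, 9],
                  [4, 5, 6]])
print(drop_duplicate_pairs(small))   # rows 0 and 2 remain

wide = np.array([[1, 2, 0, 3, 4, 0],
                 [1, 2, 7, 3, 4, 8],
                 [1, 2, 7, 4, 3, 8]])
print(drop_duplicate_pairs(wide))    # rows 0 and 2 remain
```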

Array sort, slice and reverse returns empty

I have an array sorted by the last column, and I want to use/show the top 3 sorted rows.
Slicing works and reversing the 2nd dimension (2nd example) works as well.
However when I want to reverse the 1st dimension (3rd example) I get an empty print out.
I replicated these examples but when I enter a slice and a -1 for reverse I get an empty output as well.
It's probably really obvious what I'm missing...
import numpy as np
arr = np.array([[8, 2, 4, 6],
[8, 3, 1, 8],
[3, 7, 6, 1],
[9, 4, 2, 4],
[4, 7, 5, 8],
[1, 9, 3, 5],
[1, 3, 9, 111],
[3, 6, 7, 111],
[2, 8, 2, 111],
[4, 5, 9, 3]])
print(arr[0:10,:])
print("###")
# Sort by column 4, then by column 3
lexsorted_index = np.lexsort((arr[:, 2], arr[:, 3]))
a = arr[lexsorted_index]
print(a[0:10:1,::]) #0-10th row each step, all columns
print("###")
print(a[0:10:1,::-1]) #0-10th row each step, all columns reversed
print("###")
print(a[0:3:-1,::]) #0-3rd row reversed, all columns

Python slice syntax is start:stop:step, not low:high:step. If your step is counting down, the start is the high endpoint, not the low endpoint.
Also, slices are start-inclusive and stop-exclusive, not low-inclusive and high-exclusive. So if you want the first 3 rows in reverse order, your start value should be 2 (the highest index you want) instead of 3, and your stop value should simply be omitted so the slice runs as far as it can go. A stop of -1 doesn't mean "before row 0": it means the last row, so a[2:-1:-1] is empty too.
print(a[2::-1])
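A quick way to see the cases side by side, on a small made-up array (np.arange) rather than the question's data. The last line assumes "top 3" means the three rows with the largest sort key, which sit at the end after an ascending sort:

```python
import numpy as np

a = np.arange(12).reshape(6, 2)   # rows [0, 1], [2, 3], ..., [10, 11]

print(a[0:3:-1].shape)   # (0, 2): stepping down from 0 never reaches 3
print(a[2:-1:-1].shape)  # (0, 2): stop -1 means the last row, index 5
print(a[2::-1])          # rows 2, 1, 0
print(a[:3][::-1])       # same rows: slice first, then reverse
print(a[-3:][::-1])      # last three rows, largest index first
```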

Apply same permutation for every row in a 2D numpy array

To permute a 1D array A, I know that you can run the following code:
import numpy as np
A = np.random.permutation(A)
I have a 2D array and want to apply exactly the same permutation to every row of the array. Is there any way to tell NumPy to do that for me?

Generate a random permutation of the column indices of A and use it to index the columns of A, like so:
A[:,np.random.permutation(A.shape[1])]
Sample run -
In [100]: A
Out[100]:
array([[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]])
In [101]: A[:,np.random.permutation(A.shape[1])]
Out[101]:
array([[7, 5, 7, 4, 3],
[3, 5, 2, 0, 2],
[8, 4, 3, 8, 1]])
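The same idea with NumPy's newer Generator API, as a sketch (the seed 0 is arbitrary and only makes runs repeatable). Because a single index array reorders every row, the rows stay aligned with each other:

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed for reproducibility
A = np.arange(15).reshape(3, 5)         # row i holds 5*i ... 5*i + 4

perm = rng.permutation(A.shape[1])      # one permutation of the column indices
B = A[:, perm]                          # applied identically to every row
print(perm)
print(B)
```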

Actually you do not need to do this. From the documentation:
"If x is a multi-dimensional array, it is only shuffled along its first index."
So, taking Divakar's array:
a = np.array([
[3, 5, 7, 4, 7],
[2, 5, 2, 0, 3],
[1, 4, 3, 8, 8]
])
you can just do: np.random.permutation(a) and get something like:
array([[2, 5, 2, 0, 3],
[3, 5, 7, 4, 7],
[1, 4, 3, 8, 8]])
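P.S. Note that this reorders whole rows, i.e. it applies the same permutation down every column. To apply the same permutation to every row (permuting the columns, as the question asks), use np.random.permutation(a.T).T. The same idea applies to arrays with more dimensions: move the axis you want to shuffle to the front.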
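The difference between the two calls is easy to check on an array whose rows and columns are recognizable. This sketch uses Generator.permutation, which, unlike the legacy np.random.permutation quoted above, also accepts an axis argument; the seed is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(1)       # arbitrary seed
a = np.arange(15).reshape(3, 5)

rows_shuffled = rng.permutation(a)           # axis=0: whole rows move, each row intact
cols_shuffled = rng.permutation(a, axis=1)   # one column order applied to every row
cols_via_T = rng.permutation(a.T).T          # transpose trick: same kind of reordering

print(rows_shuffled)
print(cols_shuffled)
print(cols_via_T)
```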

It depends on what you mean by "every row".
If you want to permute all values (regardless of row and column), reshape your array to 1D, permute it, and reshape it back to 2D.
If you want to permute the values within each row, without moving any value to a different row, you need to loop through the first axis and call permutation on each row. Note that this draws a different permutation for each row:
for i in range(len(A)):
    A[i] = np.random.permutation(A[i])
It can probably be done more concisely, but that is one way to do it.
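If independent per-row shuffles are what is wanted, NumPy 1.20+ offers Generator.permuted with an axis argument, which shuffles each slice along that axis independently. A sketch comparing it with the loop (the seed is arbitrary; both produce the same kind of result, not identical arrays):

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
A = np.arange(15).reshape(3, 5)

# loop version from the answer: each row gets its own permutation
looped = A.copy()
for i in range(len(looped)):
    looped[i] = rng.permutation(looped[i])

# vectorized alternative: shuffle along axis 1, independently per row
permuted = rng.permuted(A, axis=1)

print(looped)
print(permuted)
```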
