Sorting all rows in numpy matrix by target-column

This seems to have been asked many times, but the answers I found do not work for me now. To keep it simple, here is a numpy matrix:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
I want to sort the rows by the second column, which should give:
[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]
I tried a lot of examples, such as "Python Matrix sorting via one column", but none of them worked.
I wonder whether this is because the answers were posted years ago and no longer work with the newest Python. My Python is 3.5.1.
An example of a failed attempt:
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
temp = data.view(np.ndarray)
np.lexsort((temp[:, 1], ))
print(temp)
print(data)

You are a moving target.
Sort each column independently:
In [151]: np.sort(data,axis=0)
Out[151]:
matrix([[1, 0],
[3, 2],
[5, 6],
[7, 7],
[9, 8]])
Sort on the values of the second column
In [160]: ind=np.argsort(data[:,1],axis=0)
In [161]: ind
Out[161]:
matrix([[4],
[3],
[1],
[2],
[0]], dtype=int32)
In [162]: data[ind.ravel(),:] # ravel needed because of matrix
Out[162]:
matrix([[[1, 0],
[3, 2],
[7, 6],
[5, 7],
[9, 8]]])
Another way to get a valid ind array:
In [163]: ind=np.argsort(data.A[:,1],axis=0)
In [164]: ind
Out[164]: array([4, 3, 1, 2, 0], dtype=int32)
In [165]: data[ind,:]
To use lexsort you need something like
In [175]: np.lexsort([data.A[:,0],data.A[:,1]])
Out[175]: array([4, 3, 1, 2, 0], dtype=int32)
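or your 'failed' case, which isn't a failure: lexsort returns the sorting indices instead of sorting in place, so the result still has to be used to index the rows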
In [178]: np.lexsort((data.A[:,1],))
Out[178]: array([4, 3, 1, 2, 0], dtype=int32)
In the two-key call, data[:,1] is the primary key and data[:,0] is the tie breaker (not applicable in your example), because lexsort treats the last key in the sequence as the primary one. I'm just working from the docs.
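To illustrate the lexsort key order with an actual tie, here is a minimal sketch on a plain ndarray. The array `rows` is made up for this example (it is not the data from the question), so treat it as an illustration rather than a definitive recipe.

```python
import numpy as np

# Hypothetical data with ties in the second column (not from the question).
rows = np.array([[9, 8],
                 [7, 6],
                 [5, 8],
                 [3, 6],
                 [1, 0]])

# lexsort uses the LAST key as the primary key, so column 1 is sorted first
# and column 0 only decides the order among rows with equal column-1 values.
order = np.lexsort((rows[:, 0], rows[:, 1]))

# lexsort only returns indices; index the array to get the sorted rows.
sorted_rows = rows[order]
print(sorted_rows)
```

Rows sharing the same second-column value come out ordered by their first column.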

The approach from your link works:
import numpy as np
data = np.matrix([[9, 8],
[7, 6],
[5, 7],
[3, 2],
[1, 0]])
print(data[np.argsort(data.A[:, 1])])
[[1 0]
[3 2]
[7 6]
[5 7]
[9 8]]
And now an example where the effect is easier to see (same print call, different data):
data = np.matrix([[1, 9],
[2, 8],
[3, 7],
[4, 6],
[0, 5]])
[[0 5]
[4 6]
[3 7]
[2 8]
[1 9]]
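The argsort approach can be wrapped in a small function. The sketch below assumes it is acceptable to get a plain ndarray back instead of a matrix; the name `sort_rows_by_column` is hypothetical and chosen only for this illustration.

```python
import numpy as np

def sort_rows_by_column(a, col):
    """Return the rows of `a` ordered by the values in column `col`.

    Hypothetical helper: accepts an ndarray or np.matrix, returns an ndarray.
    """
    arr = np.asarray(a)                  # np.matrix -> plain 2-D ndarray
    # arr[:, col] is 1-D for an ndarray, so argsort yields a flat index array.
    order = np.argsort(arr[:, col], kind="stable")
    return arr[order]

data = np.matrix([[1, 9],
                  [2, 8],
                  [3, 7],
                  [4, 6],
                  [0, 5]])
result = sort_rows_by_column(data, 1)
print(result)
```

Converting with np.asarray first avoids the 2-D index that argsort produces on a matrix column, which is why no ravel is needed here. The `kind="stable"` argument keeps rows with equal keys in their original relative order.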

Related

numpy fancy indexing axis order

I have a problem using numpy fancy indexing which I somehow can't get my head around.
I know that I can get an array of submatrices of rows like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
B = A[np.array([[0,1],[1,2]])]
This gives:
array([[[1, 2, 3],
[4, 5, 6]],
[[4, 5, 6],
[7, 8, 9]]])
a three-dimensional numpy array containing matrices comprising the first and second rows and the second and third rows of A, respectively.
What I want now is basically the same operation for the columns of A, which should give
array([[[1, 2],
[4, 5],
[7, 8]],
[[2, 3],
[5, 6],
[8, 9]]])
But
B = A[:,np.array([[0,1],[1,2]])]
does not work (probably because of the order of the index evaluations). It gives
array([[[1, 2],
[2, 3]],
[[4, 5],
[5, 6]],
[[7, 8],
[8, 9]]])
How can I accomplish this in the best way? Should I work with transposed matrices?

You get a (3,2,2) array:
In [417]: B
Out[417]:
array([[[1, 2],
[2, 3]],
[[4, 5],
[5, 6]],
[[7, 8],
[8, 9]]])
The 3 is from the first axis of A. The (2,2) is from the shape of the index array.
Swap the first 2 axes:
In [418]: B.transpose(1,0,2)
Out[418]:
array([[[1, 2],
[4, 5],
[7, 8]],
[[2, 3],
[5, 6],
[8, 9]]])
A (2,3,2) array
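The shape reasoning can be checked directly. This is a small sketch under the assumption that the index array is stored in a variable, here called `idx` (a name introduced only for this example).

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
idx = np.array([[0, 1], [1, 2]])   # hypothetical name for the index array

# The slice keeps axis 0 of A (length 3); the index array contributes (2, 2).
B = A[:, idx]
print(B.shape)                     # (3, 2, 2)

# Move the row axis of A into the middle: result[j, i, k] == A[i, idx[j, k]].
C = B.transpose(1, 0, 2)
print(C.shape)                     # (2, 3, 2)
print(C)
```

`transpose` returns a view where possible, so no data is copied by the axis swap itself.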

Try this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
B = A[np.array([[0,1,2],[0,1,2]])]
C = [list(), list()]
for i in range(2):
    for j in range(3):
        C[i].append(list(B[i][j][:2]) if i==0 else list(B[i][j][1:3]))
C = np.array(C)
C
output:
array([[[1, 2],
[4, 5],
[7, 8]],
[[2, 3],
[5, 6],
[8, 9]]])

One way could be to create B from A.T and then swapaxes:
import numpy as np
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
B = A.T[np.array([[0,1],[1,2]])]
C = B.swapaxes(-2,-1)
To check intermediate step and result:
B
array([[[1, 4, 7],
[2, 5, 8]],
[[2, 5, 8],
[3, 6, 9]]])
C
array([[[1, 2],
[4, 5],
[7, 8]],
[[2, 3],
[5, 6],
[8, 9]]])

Insert item and change the array's dimension

I want to add dimensions to an array, but expand_dims always adds a dimension of size 1.
Input:
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
What expand_dims does:
[[[1], [2], [3]], [[4], [5], [6]], [[7], [8], [9]]]
What I want:
[[[1, 1], [1, 2], [1, 3]], [[1, 4], [1, 5], [1, 6]], [[1, 7], [1, 8], [1, 9]]]
Basically I want to replace each scalar in the matrix by a vector [1, x] where x is the original scalar.

Here's one way using broadcasting and np.insert() function:
In [32]: a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
In [33]: np.insert(a[:,:,None], 0, 1, 2)
Out[33]:
array([[[1, 1],
[1, 2],
[1, 3]],
[[1, 4],
[1, 5],
[1, 6]],
[[1, 7],
[1, 8],
[1, 9]]])

There are lots of ways of constructing the new array.
You could initialize an array of the right shape filled with ones, and then copy the values in:
In [402]: arr = np.arange(1,10).reshape(3,3)
In [403]: arr
Out[403]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [404]: res = np.ones((3,3,2),int)
In [405]: res[:,:,1] = arr
In [406]: res
Out[406]:
array([[[1, 1],
[1, 2],
[1, 3]],
[[1, 4],
[1, 5],
[1, 6]],
[[1, 7],
[1, 8],
[1, 9]]])
You could join the array with an array of 1s of the same size. concatenate is the basic joining function:
In [407]: np.concatenate((np.ones((3,3,1),int), arr[:,:,None]), axis=2)
Out[407]:
array([[[1, 1],
[1, 2],
[1, 3]],
[[1, 4],
[1, 5],
[1, 6]],
[[1, 7],
[1, 8],
[1, 9]]])
np.stack((np.ones((3,3),int), arr), axis=2) does the same thing under the covers. np.dstack ('d' for depth) does it as well. The insert in the other answer also does this.
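As a quick check that these routes agree, the following sketch builds the result with np.stack, np.dstack and np.insert and compares them. The variable names are chosen for this example only.

```python
import numpy as np

arr = np.arange(1, 10).reshape(3, 3)
ones = np.ones_like(arr)                 # same shape and dtype as arr

# Three ways to pair every element x with a leading 1, giving [1, x].
via_stack = np.stack((ones, arr), axis=2)
via_dstack = np.dstack((ones, arr))
via_insert = np.insert(arr[:, :, None], 0, 1, axis=2)

print(via_stack.shape)                   # (3, 3, 2)
print(np.array_equal(via_stack, via_dstack),
      np.array_equal(via_stack, via_insert))
```

All three produce a new (3, 3, 2) array, so which one to use is mostly a matter of readability.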

why is reshape(4,2) different from reshape(2,-1).T for a 2x2x2 matrix

I tried the following snippet
a = np.array([[[1, 2], [3, 4]],[[5, 6], [7, 8]]])
b = a.reshape(2,-1).T
c = a.reshape(4,2)
I thought b and c would be the same, since in both cases a ends up as a 4x2 matrix. But they are not. Here are b and c:
[[1 5]
[2 6]
[3 7]
[4 8]]
[[1 2]
[3 4]
[5 6]
[7 8]]
Why did the arrangement change?

It has to do with the operations you are performing.
>>> a = np.array([[[1, 2], [3, 4]],[[5, 6], [7, 8]]])
>>> a
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
This gets the 4,2 shape you wanted
>>> a.reshape(4,2)
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
This gets you 2 rows and 4 columns
>>> a.reshape(2,-1)
array([[1, 2, 3, 4],
[5, 6, 7, 8]])
Applying the .T operator switches your rows to columns:
>>> a.reshape(2,-1).T
array([[1, 5],
[2, 6],
[3, 7],
[4, 8]])
To get a (4, 2) array while using -1 for one of the dimensions, the -1 has to be in the first position:
>>> a.reshape(-1,2)
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
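One way to see the difference is to look at the order in which each result lists the elements. The sketch below assumes the default C (row-major) order throughout: reshape keeps the flattened element order, while .T changes which axis varies fastest.

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

c = a.reshape(4, 2)        # same element order as a, only regrouped
b = a.reshape(2, -1).T     # regrouped into 2 rows, then rows become columns

flat_a = a.ravel().tolist()
flat_c = c.ravel().tolist()
flat_b = b.ravel().tolist()

print(flat_a)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(flat_c)  # [1, 2, 3, 4, 5, 6, 7, 8]
print(flat_b)  # [1, 5, 2, 6, 3, 7, 4, 8]
```

So c reads the eight values in their original order, two per row, while b interleaves the two halves of a.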

How to split array by indices where the splitted sub-arrays include the split point

I have a 2D array of values and a 1D array of indices at which I would like to split the 2D array, such that the resulting sub-arrays include the 'split point'.
I know I can use the numpy.split function to split by indices, and I know I can use stride_tricks to create consecutive overlapping subset views of an array.
But it seems that stride_tricks only applies if we want to split an array into equal-sized sub-arrays.
Minimal example, I can do the following:
>>> import numpy as np
>>> array = np.random.randint(0,10, (10,2))
>>> indices = np.array([2,3,8])
>>> array
array([[8, 1],
[1, 0],
[2, 0],
[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7],
[6, 4]])
>>> split_array = np.split(array, indices, axis=0)
>>> split_array
[array([[8, 1],
[1, 0]]),
array([[2, 0]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4]]),
array([[6, 7],
[6, 4]])]
What I'm really looking for is an option in the split function, something like include_split_point=True, which would give me a result like this:
[array([[8, 1],
[1, 0],
[2, 0]]),
array([[2, 0],
[8, 8]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7]]),
array([[6, 7],
[6, 4]])]

Create a new array with the index elements repeated
new_indices = np.zeros(array.shape[0], dtype = int)
new_indices[indices] = 1
new_indices += 1
new_array = np.repeat(array, new_indices, axis = 0)
Update indices to account for the changed array
indices = indices + np.arange(1, len(indices)+1)
Split using the indices as usual
np.split(new_array, indices, axis = 0)
output:
[array([[8, 1],
[1, 0],
[2, 0]]),
array([[2, 0],
[8, 8]]),
array([[8, 8],
[1, 6],
[7, 8],
[4, 4],
[9, 4],
[6, 7]]),
array([[6, 7],
[6, 4]])]
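An alternative sketch that avoids repeating rows is to slice the array directly, with each piece ending one row past its split index. This is a different technique from the np.repeat approach, and the function name `split_including_points` is hypothetical. It assumes the indices are sorted and within bounds.

```python
import numpy as np

def split_including_points(arr, indices):
    """Split `arr` along axis 0 so each split row ends one piece and starts the next.

    Hypothetical helper; assumes `indices` is sorted and within bounds.
    """
    indices = np.asarray(indices)
    starts = np.concatenate(([0], indices))             # each piece starts at a split row
    stops = np.concatenate((indices + 1, [len(arr)]))   # and ends just after the next one
    return [arr[start:stop] for start, stop in zip(starts, stops)]

array = np.array([[8, 1], [1, 0], [2, 0], [8, 8], [1, 6],
                  [7, 8], [4, 4], [9, 4], [6, 7], [6, 4]])
parts = split_including_points(array, [2, 3, 8])
for part in parts:
    print(part)
```

The pieces returned here are views into the original array rather than copies, which may matter if they are modified afterwards.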

numpy array max min from pixelpoints of open cv [duplicate]

I have a large n x 2 numpy array that is formatted as (x, y) coordinates. I would like to filter this array so as to:
Identify coordinate pairs with duplicated x-values.
Keep only the coordinate pair of those duplicates with the highest y-value.
For example, in the following array:
arr = [[1, 4]
[1, 8]
[2, 3]
[4, 6]
[4, 2]
[5, 1]
[5, 2]
[5, 6]]
I would like the result to be:
arr = [[1, 8]
[2, 3]
[4, 6]
[5, 6]]
I've explored np.unique and np.where but cannot figure out how to leverage them to solve this problem. Thanks so much!

Here's one way based on np.maximum.reduceat -
def grouby_maxY(a):
    b = a[a[:,0].argsort()] # if first col is already sorted, skip this
    grp_idx = np.flatnonzero(np.r_[True,(b[:-1,0] != b[1:,0])])
    grp_maxY = np.maximum.reduceat(b[:,1], grp_idx)
    return np.c_[b[grp_idx,0], grp_maxY]
Alternatively, if you want to bring in np.unique, we can use it to find grp_idx with np.unique(b[:,0], return_index=1)[1].
Sample run -
In [453]: np.random.seed(0)
In [454]: arr = np.random.randint(0,5,(10,2))
In [455]: arr
Out[455]:
array([[4, 0],
[3, 3],
[3, 1],
[3, 2],
[4, 0],
[0, 4],
[2, 1],
[0, 1],
[1, 0],
[1, 4]])
In [456]: grouby_maxY(arr)
Out[456]:
array([[0, 4],
[1, 4],
[2, 1],
[3, 3],
[4, 0]])
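For comparison, here is a sketch of a different technique applied to the example array from the question: sort by x and then y with np.lexsort, and keep the last row of each run of equal x-values. It is not the reduceat method, and the variable names are made up for this illustration.

```python
import numpy as np

arr = np.array([[1, 4], [1, 8], [2, 3], [4, 6],
                [4, 2], [5, 1], [5, 2], [5, 6]])

# Sort by x (primary, last key) and by y within each x (secondary).
order = np.lexsort((arr[:, 1], arr[:, 0]))
sorted_arr = arr[order]

# A row is the last of its x-group if the next row has a different x;
# the final row always closes a group.
is_last = np.r_[sorted_arr[1:, 0] != sorted_arr[:-1, 0], True]

result = sorted_arr[is_last]
print(result)
```

Because y is ascending inside each x-group, the last row of a group carries the highest y-value for that x.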
