Rank within columns of 2d array


>>> a = array([[10, 50, 20, 30, 40],
... [50, 30, 40, 20, 10],
... [30, 20, 20, 10, 50]])
>>> some_np_expression(a)
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
What is some_np_expression? I don't care how ties are broken, as long as the ranks are distinct and sequential.

Double argsort is a standard (but inefficient!) way to do this:
In [120]: a
Out[120]:
array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
In [121]: a.argsort(axis=0).argsort(axis=0) + 1
Out[121]:
array([[1, 3, 1, 3, 2],
[3, 2, 3, 2, 1],
[2, 1, 2, 1, 3]])
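A minimal sketch of why sorting twice yields ranks, on an arbitrary example column (the values are made up for illustration): argsort returns the permutation that puts the data in order, and argsorting a permutation inverts it, which maps each original position to its 0-based rank. The same inverse can be built with a single scatter assignment, which is the idea behind avoiding the second sort.

```python
import numpy as np

col = np.array([20, 30, 10])     # illustrative values, not from the question
order = np.argsort(col)          # indices that sort col: [2, 0, 1]
ranks = np.argsort(order)        # inverse permutation = 0-based ranks: [1, 2, 0]

# Same inverse without a second sort: scatter 0..n-1 into the sorted positions.
inverse = np.empty_like(order)
inverse[order] = np.arange(len(order))

print(order, ranks, inverse)
```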
With some more code, you can avoid sorting twice. Note that the following uses a different array a:
In [262]: a
Out[262]:
array([[30, 30, 10, 10],
[10, 20, 20, 30],
[20, 10, 30, 20]])
Call argsort once:
In [263]: s = a.argsort(axis=0)
Use s to construct the array of rankings:
In [264]: i = np.arange(a.shape[0]).reshape(-1, 1)
In [265]: j = np.arange(a.shape[1])
In [266]: ranked = np.empty_like(a, dtype=int)
In [267]: ranked[s, j] = i + 1
In [268]: ranked
Out[268]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
Here's the less efficient (but more concise) version:
In [269]: a.argsort(axis=0).argsort(axis=0) + 1
Out[269]:
array([[3, 3, 1, 1],
[1, 2, 2, 3],
[2, 1, 3, 2]])
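The single-argsort idea can be wrapped in a small helper that works along any axis. This is a sketch assuming NumPy 1.15 or later (for np.put_along_axis and kind="stable"); the name ranks_along_axis is hypothetical. A stable sort is requested so that tied values receive distinct ranks in order of appearance, which happens to reproduce the output in the question.

```python
import numpy as np


def ranks_along_axis(a, axis=0):
    """1-based ranks of `a` along `axis` (hypothetical helper, not a NumPy API).

    The stable sort gives tied values distinct ranks in order of appearance.
    """
    a = np.asarray(a)
    order = np.argsort(a, axis=axis, kind="stable")
    # Rank values 1..n laid out along `axis`, e.g. shape (n, 1) for a 2-D array and axis=0.
    shape = [1] * a.ndim
    shape[axis] = a.shape[axis]
    rank_values = np.arange(1, a.shape[axis] + 1).reshape(shape)
    ranks = np.empty_like(order)
    # Scatter: the element found at sorted position k along `axis` receives rank k + 1.
    np.put_along_axis(ranks, order, np.broadcast_to(rank_values, order.shape), axis=axis)
    return ranks


a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
print(ranks_along_axis(a, axis=0))
```

With axis=1 the same call ranks within rows instead of columns.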

SciPy now offers a rank function with an axis argument, so you can choose the axis along which to rank the data:
import numpy as np
from scipy.stats.mstats import rankdata
a = np.array([[10, 50, 20, 30, 40],
              [50, 30, 40, 20, 10],
              [30, 20, 20, 10, 50]])
ranked_vertical = rankdata(a, axis=0)  # 1-based float ranks; tied values share their average rank

from scipy.stats.mstats import rankdata
import numpy as np
a = np.array([[10, 50, 20, 30, 40],
[50, 30, 40, 20, 10],
[30, 20, 20, 10, 50]])
rank = (rankdata(a, axis=0)-1).astype(int)
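The output is:
array([[0, 2, 0, 2, 1],
       [2, 1, 2, 1, 0],
       [1, 0, 0, 0, 2]])
Note that rankdata averages the ranks of tied values: the two 20s in the third column both get rank 1.5, which becomes 0 after subtracting 1 and truncating, so tied entries do not receive distinct ranks as the question asks. For distinct ranks, use the argsort approach above, or scipy.stats.rankdata(a, method='ordinal', axis=0) (SciPy 1.4 or later), which breaks ties by order of appearance.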

Related

Multiplying arrays with broadcasting

I have an m x n matrix A and an n x r matrix B that I want to multiply in a specific way to get an m x r matrix: I want to multiply each element in the ith column of A, as a scalar, by the ith row of B, and then sum the n resulting matrices.
For example:
a = [[0, 1, 2],
     [3, 4, 5]]
b = [[0, 1, 2, 3],
     [4, 5, 6, 7],
     [8, 9, 10, 11]]
The product would be:
a*b = [[0, 0, 0, 0],  +  [[4, 5, 6, 7],      +  [[16, 18, 20, 22],  =  [[20, 23, 26, 29],
       [0, 3, 6, 9]]      [16, 20, 24, 28]]      [40, 45, 50, 55]]      [56, 68, 80, 92]]
I can't use any loops so I'm pretty sure I have to use broadcasting but I don't know how. Any help is appreciated

Your input matrices are of shape (2, 3) and (3, 4) respectively and the result you want is of shape (2, 4).
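Summing, over all i, the products of column i of A with row i of B is exactly the matrix product A·B, so what you need is just the dot product of your two matrices:
import numpy as np
a = np.array([[0, 1, 2],
              [3, 4, 5]])
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])
print(np.dot(a, b))  # or a @ b
# [[20 23 26 29]
#  [56 68 80 92]]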
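If you want to see the broadcasting version the question asks about, here is a sketch: inserting new axes turns a into shape (2, 3, 1) and b into (1, 3, 4), so their product holds all three column-times-row matrices from the question, and summing over the middle axis reproduces a @ b. It allocates an (m, n, r) intermediate, so np.dot (or a @ b) is the better choice in practice.

```python
import numpy as np

a = np.array([[0, 1, 2],
              [3, 4, 5]])            # shape (m, n) = (2, 3)
b = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [8, 9, 10, 11]])       # shape (n, r) = (3, 4)

# (m, n, 1) * (1, n, r) -> (m, n, r); slice [:, k, :] is column k of a times row k of b.
outer_products = a[:, :, None] * b[None, :, :]

# Summing the n matrices over the shared axis gives the same result as a @ b.
result = outer_products.sum(axis=1)
print(result)
```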

Select different columns for each row

Suppose I have the following array:
>>> a = np.arange(25).reshape((5, 5))
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
Now I want to select different columns for each row based on the following index array:
>>> i = np.array([0, 1, 2, 1, 0])
This index array gives the start column for each row, and every selection should have the same width, e.g. 3. Thus I want to obtain the following result:
>>> ???
array([[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14],
[16, 17, 18],
[20, 21, 22]])
I know that I can select a single column per row via
>>> a[np.arange(a.shape[0]), i]
but how about multiple columns?

Use advanced indexing with two index arrays that broadcast together to the 2-D shape of the result:
a[np.arange(a.shape[0])[:,None], i[:,None] + np.arange(3)]
#array([[ 0, 1, 2],
# [ 6, 7, 8],
# [12, 13, 14],
# [16, 17, 18],
# [20, 21, 22]])
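Broken into steps: the row index has shape (5, 1) and the column index has shape (5, 3), so they broadcast to (5, 3) and result[r, c] is a[idx_row[r, 0], idx_col[r, c]]:
idx_row = np.arange(a.shape[0])[:,None]
idx_col = i[:,None] + np.arange(3)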
idx_row
#array([[0],
# [1],
# [2],
# [3],
# [4]])
idx_col
#array([[0, 1, 2],
# [1, 2, 3],
# [2, 3, 4],
# [1, 2, 3],
# [0, 1, 2]])
a[idx_row, idx_col]
#array([[ 0, 1, 2],
# [ 6, 7, 8],
# [12, 13, 14],
# [16, 17, 18],
# [20, 21, 22]])
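Since NumPy 1.15 the same selection can also be written with np.take_along_axis, which pairs each row of the index array with the corresponding row of a, so the explicit row index is not needed. A sketch, assuming every start column leaves room for the full width (otherwise indexing raises an IndexError):

```python
import numpy as np

a = np.arange(25).reshape((5, 5))
i = np.array([0, 1, 2, 1, 0])   # start column for each row
width = 3

# Column indices for every row: shape (5, 3)
cols = i[:, None] + np.arange(width)

# Row r of `cols` selects columns from row r of `a`.
result = np.take_along_axis(a, cols, axis=1)
print(result)
```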

Numpy Multiply Size

Given a numpy 3-d array
[[[1][4]][[7][10]]]
Let's say the first row is 1 4 and the second row is 7 10. With a multiplier of 3, rows 1 through 3 would become 1 1 1 4 4 4 and rows 4 through 6 would become 7 7 7 10 10 10, that is:
[[[1][1][1][4][4][4]][[1][1][1][4][4][4]][[1][1][1][4][4][4]][[7][7][7][10][10][10]][[7][7][7][10][10][10]][[7][7][7][10][10][10]]]
Is there a quick way to do this in numpy? The actual array I'm using has 3 or 4 elements instead of 1 at the bottom level so [1][1][1] could be [1,8,7][1,8,7][1,8,7] instead, but I simplified it here.

numpy.repeat sounds like what you want.
Here are some examples:
>>> import numpy
>>> a = numpy.array( [[[1,2,3],[4,5,6]], [[10,20,30],[40,50,60]]] )
>>> a
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[10, 20, 30],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=0 )
array([[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 1, 2, 3],
[ 4, 5, 6]],
[[ 1, 2, 3],
[ 4, 5, 6]],
[[10, 20, 30],
[40, 50, 60]],
[[10, 20, 30],
[40, 50, 60]],
[[10, 20, 30],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=1 )
array([[[ 1, 2, 3],
[ 1, 2, 3],
[ 1, 2, 3],
[ 4, 5, 6],
[ 4, 5, 6],
[ 4, 5, 6]],
[[10, 20, 30],
[10, 20, 30],
[10, 20, 30],
[40, 50, 60],
[40, 50, 60],
[40, 50, 60]]])
>>>
>>> a.repeat( 3, axis=2 )
array([[[ 1, 1, 1, 2, 2, 2, 3, 3, 3],
[ 4, 4, 4, 5, 5, 5, 6, 6, 6]],
[[10, 10, 10, 20, 20, 20, 30, 30, 30],
[40, 40, 40, 50, 50, 50, 60, 60, 60]]])
Depending on the desired shape of your output, you may wish to chain multiple calls to .repeat() with different axis values.
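For the shape in the question, where each row is repeated 3 times and each element within a row is repeated 3 times as well, chaining two .repeat() calls should give the desired (6, 6, 1) array. A sketch using the question's simplified values; it works the same when the innermost level has 3 or 4 elements, since axis 2 is left untouched:

```python
import numpy as np

# The question's example: shape (2, 2, 1), rows [1, 4] and [7, 10]
a = np.array([[[1], [4]],
              [[7], [10]]])
k = 3  # multiplier

# Repeat each row k times (axis 0), then each element within a row k times (axis 1).
result = a.repeat(k, axis=0).repeat(k, axis=1)
print(result.shape)      # (6, 6, 1)
print(result[..., 0])
```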

Using NumPy arrays as indices to NumPy arrays

I have a 3x3x3 NumPy array:
>>> x = np.arange(27).reshape((3, 3, 3))
>>> x
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
Now I create an ordinary list of indices:
>>> i = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 0, 1]]
As expected, I get four values using this list as the index:
>>> x[i]
array([ 7, 14, 18, 13])
But if I now convert i into a NumPy array, I won't get the same answer.
>>> j = np.asarray(i)
>>> x[j]
array([[[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]],
...,
[[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]],
[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]]])
Why is this so? Why can't I use NumPy arrays as indices to NumPy array?

x[j] is equivalent to x[j, :, :]:
In [163]: j.shape
Out[163]: (3, 4)
In [164]: x[j].shape
Out[164]: (3, 4, 3, 3)
The resulting shape is the shape of j joined with the last 2 dimensions of x. j just selects from the 1st dimension of x.
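x[i], on the other hand, is equivalent to x[tuple(i)], that is:
In [168]: x[[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 0, 1]]
Out[168]: array([ 7, 14, 18, 13])
In fact, x[tuple(j)] produces the same 4-item array.
This list-of-lists behaviour is from older NumPy versions. Since NumPy 1.23, a nested list used as an index is treated like an array, so x[i] now behaves like x[j]; write x[tuple(i)] explicitly to get one index per axis.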
The different ways of indexing numpy arrays can be confusing.
Another example of how the shape of the index array or lists affects the output:
In [170]: x[[[0, 1], [2, 1]], [[2, 1], [0, 1]], [[1, 2], [0, 1]]]
Out[170]:
array([[ 7, 14],
[18, 13]])
Same items, but in a 2d array.

Check the NumPy docs on "Integer array indexing": to pick individual elements, you need to pass one index array per dimension, grouped in a tuple (the loop variable is renamed so it does not shadow the array x):
j = tuple(np.array(idx) for idx in i)
x[j]
# array([ 7, 14, 18, 13])
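A short sketch contrasting the two cases on the same x: a single index array only indexes the first axis, while a tuple of index arrays supplies one index per axis. Unpacking the rows of j into three separate arrays is another way to spell the tuple form.

```python
import numpy as np

x = np.arange(27).reshape((3, 3, 3))
i = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 2, 0, 1]]
j = np.asarray(i)

# One index array selects along axis 0 only: result shape is j.shape + x.shape[1:].
print(x[j].shape)            # (3, 4, 3, 3)

# A tuple gives one index array per axis; position k picks x[j[0][k], j[1][k], j[2][k]].
print(x[tuple(j)])           # [ 7 14 18 13]

# Equivalent explicit form: unpack the rows of j into separate index arrays.
rows, cols, depth = j
print(x[rows, cols, depth])
```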

Simple python code about double loop

I tested the following Python code in the Spyder IDE. I expected it to fill the 2-D list q with increasing numbers 0..31, from q[0][0] to q[3][7], but it actually produces:
[[24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31], [24, 25, 26, 27, 28, 29, 30, 31]].
The code:
q = [[0]*8]*4
for i in range(4):
    for j in range(8):
        q[i][j] = 8*i + j
print(q)
Any idea what's happening here? I debugged it step by step, and an update to any row is mirrored in all the other rows, which is quite different from my experience with other programming languages.

q=[somelist]*4
creates a list whose four items are all the same object, the list somelist. So, for example, q[0] and q[1] reference the same object.
Thus, in the nested for loop q[i] is referencing the same list regardless of the value of i.
To fix:
q = [[0]*8 for _ in range(4)]
The list comprehension evaluates [0]*8 4 distinct times, resulting in 4 distinct lists.
Here is a quick demonstration of this pitfall:
In [14]: q=[[0]*8]*4
You might think you are updating only the first element in the second row:
In [15]: q[1][0] = 100
But you really end up altering the first element in every row:
In [16]: q
Out[16]:
[[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0],
[100, 0, 0, 0, 0, 0, 0, 0]]
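One way to confirm the aliasing yourself is to check object identity with is (or compare id() values) for the two constructions discussed above:

```python
# Rows built with * are the same object; rows built by a comprehension are not.
shared = [[0] * 8] * 4
separate = [[0] * 8 for _ in range(4)]

print(shared[0] is shared[3])        # True: one list, four references
print(separate[0] is separate[3])    # False: four distinct lists
print(len({id(row) for row in shared}), len({id(row) for row in separate}))  # 1 4
```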

As explained, the problem is caused by the * operation on lists, which creates more references to the same object. Instead, build each row as a new list with append:
q = []
for i in range(4):
    q.append([])
    for j in range(8):
        q[i].append(8*i + j)
print(q)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31]]
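A nested list comprehension is a more compact alternative to the append loop; each evaluation of the inner comprehension creates a new row, so there is no sharing. A sketch:

```python
# Each evaluation of the inner comprehension creates a new row list.
q = [[8 * i + j for j in range(8)] for i in range(4)]
print(q)
```

If NumPy is available, np.arange(32).reshape(4, 8) produces the same values as a 4 x 8 array.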

When you do something like l = [x]*8 you are actually creating 8 references to the same list, not 8 copies.
To get 8 independent copies, copy x inside a list comprehension: l = [list(x) for _ in range(8)]. (Writing [[x] for i in range(8)] instead builds 8 new one-element lists, but each of them still holds a reference to the same x.)
>>> x=[1,2,3]
>>> l=[x]*8
>>> l
[[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
>>> l[0][0]=10
>>> l
[[10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
>>> l = [list(x) for _ in range(8)]
>>> l
[[10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
>>> l[0][0] = 1
>>> l
[[1, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3], [10, 2, 3]]
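A related caveat, sketched below: copying with list(x) (or x[:]) is a shallow copy, so if x itself contains lists, those inner lists are still shared between the copies. copy.deepcopy from the standard library copies them too:

```python
import copy

# list(x) copies only the outer list; nested lists inside x are still shared.
x = [[1, 2], [3, 4]]
shallow = [list(x) for _ in range(2)]
deep = [copy.deepcopy(x) for _ in range(2)]

shallow[0][0][0] = 99       # mutates the inner list shared with x and shallow[1]
print(x[0], shallow[1][0])  # [99, 2] [99, 2]
print(deep[0][0])           # [1, 2]: deepcopy made independent inner lists
```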
