Numpy argsort - what is happening? - python

I have a numpy array called arr1 defined as follows.
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort())
array([ 0, 1, 2, 3, 4, 5, 6, 7, 9, 8, 10, 11, 12, 13, 14, 15, 16,
17], dtype=int64)
I expected all the indices of the array to be in numeric order, but indices 8 and 9 seem to have flipped.
Can someone explain why this is happening?

np.argsort by default uses the quicksort algorithm which is not stable. You can specify kind = "stable" to perform a stable sort, which will preserve the order of equal elements:
import numpy as np
arr1 = np.array([1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9])
print(arr1.argsort(kind="stable"))
It gives:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17]
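To see what stability means in practice, here is a small sketch. The array vals and the variable names are illustrative and not taken from the question. With kind="stable" the indices of equal elements keep their original relative order; with the default kind only the sorted values are guaranteed, not the order of ties.

```python
import numpy as np

# Illustrative data with many ties (not from the original question)
vals = np.repeat(np.arange(1, 10), 2)   # 1, 1, 2, 2, ..., 9, 9

stable_idx = np.argsort(vals, kind="stable")

# vals is already sorted, so a stable sort must leave the indices as 0..n-1
print(np.array_equal(stable_idx, np.arange(vals.size)))

# Any argsort result, stable or not, still puts the values in sorted order
default_idx = np.argsort(vals)
print(np.array_equal(vals[default_idx], np.sort(vals)))
```

If the order of ties matters downstream, passing kind="stable" explicitly is the safer choice.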

By default argsort uses quicksort, which is not stable, so equal elements can end up in a different relative order. If you trace the steps of the algorithm you will see why indices 8 and 9 are flipped. See https://numpy.org/doc/stable/reference/generated/numpy.argsort.html

Related

From a 2D array, create 2nd 2D array of Unique(non-repeated) random selected values from 1st array (values not shared among rows) without using a loop

This is a follow up on this question.
From a 2d array, create another 2d array composed of randomly selected values from original array (values not shared among rows) without using a loop
I am looking for a way to create a 2D array in which each row consists of unique (non-repeating) values randomly selected from the corresponding row of another array, without using a loop.
Here is a way to do it using a loop.
import numpy as np
pool = np.random.randint(0, 30, size=[4,5])
seln = np.empty([4,3], int)
for i in range(0, pool.shape[0]):
    # draw 3 distinct entries from row i
    seln[i] = np.random.choice(pool[i], 3, replace=False)
print('pool = ', pool)
print('seln = ', seln)
>pool = [[ 1 11 29 4 13]
[29 1 2 3 24]
[ 0 25 17 2 14]
[20 22 18 9 29]]
seln = [[ 8 12 0]
[ 4 19 13]
[ 8 15 24]
[12 12 19]]
Here is a method that does not use a loop; however, it can select the same value multiple times in each row.
pool = np.random.randint(0, 30, size=[4,5])
print(pool)
array([[ 4, 18, 0, 15, 9],
[ 0, 9, 21, 26, 9],
[16, 28, 11, 19, 24],
[20, 6, 13, 2, 27]])
# New array shape
new_shape = (pool.shape[0],3)
# Indices where to randomly choose from
ix = np.random.choice(pool.shape[1], new_shape)
array([[0, 3, 3],
[1, 1, 4],
[2, 4, 4],
[1, 2, 1]])
ixs = (ix.T + range(0,np.prod(pool.shape),pool.shape[1])).T
array([[ 0, 3, 3],
[ 6, 6, 9],
[12, 14, 14],
[16, 17, 16]])
pool.flatten()[ixs].reshape(new_shape)
array([[ 4, 15, 15],
[ 9, 9, 9],
[11, 24, 24],
[ 6, 13, 6]])
I am looking for a method that does not use a loop and in which a value that has been selected from a row cannot be selected again.
Here is a way without explicit looping. However, it requires generating an array of random numbers of the size of the original array. That said, the generation is done using compiled code so it should be pretty fast. It can fail if you happen to generate two identical numbers, but the chance of that happening is essentially zero.
m,n = 4,5
pool = np.random.randint(0, 30, size=[m,n])
new_width = 3
mask = np.argsort(np.random.rand(m,n))<new_width
pool[mask].reshape(m, new_width)
How it works:
We generate a random array of floats and argsort it. By default, argsort works along the last axis (axis 1 for a 2D array), so each row of the result holds the column indices that would sort that row, which is a random permutation of 0,...,n-1.
We then mark the entries of this array that are less than new_width. Each row contains the numbers 0,...,n-1 in a random order, so exactly new_width of them are less than new_width. This means each row of mask has exactly new_width entries that are True, and the rest are False (a comparison between an ndarray and a scalar is applied element-wise).
Finally, the boolean mask is applied to the original data to grab new_width many entries from each row.
You could also use np.vectorize for your loop solution, although that is just shorthand for a loop.
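A variant of the same idea, sketched under some assumptions: it uses np.random.default_rng and np.take_along_axis (both real NumPy APIs), while the seed and the names order, cols and seln are illustrative. Instead of a boolean mask it keeps the first new_width column indices of each random permutation, so the selected values also come out in random order.

```python
import numpy as np

rng = np.random.default_rng(0)          # seed chosen only for reproducibility
m, n, new_width = 4, 5, 3
pool = rng.integers(0, 30, size=(m, n))

# Each row of `order` is a random permutation of the column indices 0..n-1
order = np.argsort(rng.random((m, n)), axis=1)

# Keep the first new_width column indices of each permutation
cols = order[:, :new_width]

# Gather pool[i, cols[i, k]] for every row i without a Python loop
seln = np.take_along_axis(pool, cols, axis=1)
print(seln)
```

No column position is picked twice within a row. If a row of pool itself contains duplicate values, those values can still appear more than once in the selection.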

Pythonic way to get both diagonals passing through a matrix entry (i,j)

What is the Pythonic way to get a list of diagonal elements in a matrix passing through entry (i,j)?
For example, given a matrix like:
[1 2 3 4 5]
[6 7 8 9 10]
[11 12 13 14 15]
[16 17 18 19 20]
[21 22 23 24 25]
and an entry, say (1,3) (representing element 9), how can I get the elements on the diagonals passing through 9 in a Pythonic way? Basically, both [3,9,15] and [5,9,13,17,21].
Using np.diagonal with a little offset logic.
import numpy as np
lst = np.array([[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
i, j = 1, 3
major = np.diagonal(lst, offset=(j - i))
print(major)
array([ 3, 9, 15])
minor = np.diagonal(np.rot90(lst), offset=-lst.shape[1] + (j + i) + 1)
print(minor)
array([ 5, 9, 13, 17, 21])
The indices i and j are the row and column. By specifying the offset, numpy knows from where to begin selecting elements for the diagonal.
For the major diagonal, you want to start collecting from 3 in the first row. So you subtract the current row index from the current column index to get the column at which the diagonal crosses row 0. The minor diagonal works the same way after the array has been rotated by 90° with np.rot90.
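The offset logic can be wrapped in a small helper. This is a sketch, not part of the answer above: the function name diagonals_through is hypothetical, and it uses np.fliplr instead of np.rot90, which gives the anti-diagonal in the same top-to-bottom order and should also work for non-square arrays.

```python
import numpy as np

def diagonals_through(a, i, j):
    """Return (major, minor) diagonals of the 2D array `a` passing through a[i, j]."""
    major = np.diagonal(a, offset=j - i)
    # After a left-right flip, a[i, j] sits in column ncols-1-j of row i
    minor = np.diagonal(np.fliplr(a), offset=(a.shape[1] - 1 - j) - i)
    return major, minor

lst = np.arange(1, 26).reshape(5, 5)
major, minor = diagonals_through(lst, 1, 3)
print(major.tolist())  # [3, 9, 15]
print(minor.tolist())  # [5, 9, 13, 17, 21]
```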
An alternative for a square matrix of shape (n, n) is to slice the raveled (flattened) array:
array = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25]])
x, y = 1, 3
a_mod = array.ravel()
size = array.shape[0]
# Main diagonal direction: neighbours are size+1 apart in the flat array
if y >= x:
    diag = a_mod[y-x:(x+size-y)*size:size+1]
else:
    diag = a_mod[(x-y)*size::size+1]
# Anti-diagonal direction
if x-(size-1-y) >= 0:
    reverse_diag = array[:, ::-1].ravel()[(x-(size-1-y))*size::size+1]
else:
    # starts in row 0 at column x+y and ends in column 0 at row x+y
    reverse_diag = a_mod[x+y:(x+y)*size+1:size-1]
# diag --> [ 3 9 15]
# reverse_diag --> [ 5 9 13 17 21]
The correctness of the resulting arrays should be checked further. The approach could be extended to handle non-square matrices, e.g. of shape (n, m).
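One way to do such a check is to compare flat-array slices against np.diagonal for every entry. The sketch below is an assumption-laden illustration: diagonals_by_ravel and check are hypothetical names, the anti-diagonal slice is formulated directly on the flat array (so it is not line-for-line the code above), and it only covers square matrices with n >= 2.

```python
import numpy as np

def diagonals_by_ravel(array, x, y):
    """Slice both diagonals through (x, y) out of the flattened square array."""
    size = array.shape[0]
    flat = array.ravel()
    # Main direction: neighbours are size + 1 apart in the flat array
    if y >= x:
        diag = flat[y - x:(x + size - y) * size:size + 1]
    else:
        diag = flat[(x - y) * size::size + 1]
    # Anti direction: neighbours are size - 1 apart and x + y stays constant
    s = x + y
    if s <= size - 1:
        anti = flat[s:s * size + 1:size - 1]
    else:
        anti = flat[(s - size + 1) * (size - 1) + s::size - 1]
    return diag, anti

def check(size):
    """Compare the slices with np.diagonal for every (x, y) of a size x size array."""
    a = np.arange(1, size * size + 1).reshape(size, size)
    for x in range(size):
        for y in range(size):
            diag, anti = diagonals_by_ravel(a, x, y)
            if not np.array_equal(diag, np.diagonal(a, offset=y - x)):
                return False
            ref = np.diagonal(np.fliplr(a), offset=size - 1 - y - x)
            if not np.array_equal(anti, ref):
                return False
    return True

print(all(check(n) for n in range(2, 8)))
```

Using np.diagonal as the reference keeps the check independent of the slicing arithmetic being tested.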

convert all rows to columns and columns to rows in Arrays [duplicate]

I'm trying to write a Python program that converts all rows to columns and all columns to rows. To be more specific, the first input line contains two numbers, N and M, where N is the total number of rows and M the total number of columns. I read them with b=map(int, raw_input().split()). Then, based on b[0], each of the next N lines contains M space-separated integers. For example:
Input:
3 5
13 4 8 14 1
9 6 3 7 21
5 12 17 9 3
Now the program will store it in a 2D array:
arr=[[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
What's required for the output is to print M lines each containing N space separated integers. For example:
Output:
13 9 5
4 6 12
8 3 17
14 7 9
1 21 3
This is what I've tried so far:
#Getting N and M from input
NM=map(int, raw_input().split())
arr=[]
for i in xrange(NM[0]):
    c=map(int, raw_input().split())
    arr.append(c)
I've created a 2D array and got the values from input but I don't know the rest. Let me make this clear that I'm definitely NOT asking for code. Just exactly what to do to convert rows to columns and in reverse.
Thanks in advance!
You can use zip to transpose the data:
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]
new_arr = list(zip(*arr))  # list() is needed on Python 3, where zip returns an iterator
# [(13, 9, 5), (4, 6, 12), (8, 3, 17), (14, 7, 9), (1, 21, 3)]
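To produce the requested output format (M lines of N space-separated integers), the transposed rows can be joined into strings. This is a minimal Python 3 sketch; the names transposed and lines are illustrative.

```python
arr = [[13, 4, 8, 14, 1], [9, 6, 3, 7, 21], [5, 12, 17, 9, 3]]

# zip(*arr) pairs up the k-th element of every row, i.e. yields the columns
transposed = [list(col) for col in zip(*arr)]

# One output line per original column
lines = [" ".join(map(str, row)) for row in transposed]
print("\n".join(lines))
```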

To Generate a split indices for n-fold

I have a requirement to generate a split for cross validation, say s is an index of records
s = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]
Now I want to randomly shuffle and split the data with 5 folds, typically I want output something like this
s = [[1 5 4 6], [2,3, 19,20], [... ], [... ], [.. ]]
Note: In each array numbers should be unique, it should not repeat
I know I can use chunk() but in chunk you can do only sequence wise like 1-4, 5-8,....
Can anyone help me on this ?
Shuffle your array using random.shuffle and split it into 5 pieces:
For Python2 use
import random
s = range(1, 21)
random.shuffle(s)
s = [s[i::5] for i in range(5)]
or for Python3:
import random
s = list(range(1, 21))
random.shuffle(s)
s = [s[i::5] for i in range(5)]
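Another option (Python 2) draws each group with random.sample. Values are unique within a group, but because the groups are drawn independently, the same value can appear in more than one group:
import random
s = [1 ,2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
print [random.sample(s,5) for i in xrange(len(s)/5)]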
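If NumPy is available, shuffling once and then cutting the result into parts gives folds that do not overlap. A minimal sketch, assuming np.random.default_rng and np.array_split; the seed and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed is arbitrary, for reproducibility only
s = np.arange(1, 21)

# Shuffle once, then cut the shuffled array into 5 nearly equal parts
folds = np.array_split(rng.permutation(s), 5)
for fold in folds:
    print(fold.tolist())
```

np.array_split also copes with lengths that are not divisible by the number of folds, in which case the first folds get one extra element.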

Matlab vs Python: Reshape

So I found this:
When converting MATLAB code it might be necessary to first reshape a
matrix to a linear sequence, perform some indexing operations and then
reshape back. As reshape (usually) produces views onto the same
storage, it should be possible to do this fairly efficiently.
Note that the scan order used by reshape in Numpy defaults to the 'C'
order, whereas MATLAB uses the Fortran order. If you are simply
converting to a linear sequence and back this doesn't matter. But if
you are converting reshapes from MATLAB code which relies on the scan
order, then this MATLAB code:
z = reshape(x,3,4);
should become
z = x.reshape(3,4,order='F').copy()
in Numpy.
I have a 16×2 array called mafs. When I do this in MATLAB:
mafs2 = reshape(mafs,[4,4,2])
I get something different from what I get in Python with:
mafs2 = reshape(mafs,(4,4,2))
or even
mafs2 = mafs.reshape((4,4,2),order='F').copy()
Can anyone help with this? Thank you all.
Example:
MATLAB:
>> mafs = [(1:16)' (17:32)']
mafs =
1 17
2 18
3 19
4 20
5 21
6 22
7 23
8 24
9 25
10 26
11 27
12 28
13 29
14 30
15 31
16 32
>> reshape(mafs,[4 4 2])
ans(:,:,1) =
1 5 9 13
2 6 10 14
3 7 11 15
4 8 12 16
ans(:,:,2) =
17 21 25 29
18 22 26 30
19 23 27 31
20 24 28 32
Python:
>>> import numpy as np
>>> mafs = np.c_[np.arange(1,17), np.arange(17,33)]
>>> mafs.shape
(16, 2)
>>> mafs[:,0]
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16])
>>> mafs[:,1]
array([17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32])
>>> r = np.reshape(mafs, (4,4,2), order="F")
>>> r.shape
(4, 4, 2)
>>> r[:,:,0]
array([[ 1, 5, 9, 13],
[ 2, 6, 10, 14],
[ 3, 7, 11, 15],
[ 4, 8, 12, 16]])
>>> r[:,:,1]
array([[17, 21, 25, 29],
[18, 22, 26, 30],
[19, 23, 27, 31],
[20, 24, 28, 32]])
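For contrast, here is a short sketch of what the default C order does with the same data. The names r_c and r_f are illustrative. In C order the last index varies fastest, so the two columns of mafs get interleaved; in Fortran order the first index varies fastest, which matches MATLAB.

```python
import numpy as np

mafs = np.c_[np.arange(1, 17), np.arange(17, 33)]   # shape (16, 2)

r_c = mafs.reshape((4, 4, 2))              # default order='C': last index varies fastest
r_f = mafs.reshape((4, 4, 2), order="F")   # MATLAB-like: first index varies fastest

print(r_c[0, 0, :].tolist())   # [1, 17]
print(r_f[:, 0, 0].tolist())   # [1, 2, 3, 4]
```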
I was having a similar issue myself, as I am also trying to make the transition from MATLAB to Python. I was finally able to convert a numpy array, given in (depth, row, col) format, to a single sheet of column vectors (one per image).
In MATLAB I would have done something like:
output = reshape(imStack,[row*col,depth])
In Python this seems to translate to:
import numpy as np
output=np.transpose(imStack)
output=output.reshape((row*col, depth), order='F')
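To see what this transpose-then-reshape actually produces, the sketch below compares it with a plain C-order formulation on a small array. The sizes and the name alt are illustrative assumptions. On this data each column of output is one image flattened row by row.

```python
import numpy as np

depth, row, col = 2, 3, 4                     # small illustrative sizes
imStack = np.arange(depth * row * col).reshape(depth, row, col)

# The approach above: reverse the axes, then reshape in Fortran order
output = np.transpose(imStack).reshape((row * col, depth), order="F")

# C-order formulation: flatten each image row by row, one column per image
alt = imStack.reshape(depth, row * col).T

print(np.array_equal(output, alt))
print(output[:, 0].tolist() == imStack[0].ravel().tolist())
```

If the MATLAB code relies on column-major order within each image, the element order inside each column should be checked against the MATLAB result before relying on this translation.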
