Weird / wrong output of np.argsort() - python

I was working with numpy and ran into some strange (?) behavior from argsort:
>>> array = [[0, 1, 2, 3, 4, 5],
[444, 4, 8, 3, 1, 10],
[2, 5, 8, 999, 1, 4]]
>>> np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
[2, 1, 1, 1, 2, 0],
[1, 2, 2, 2, 0, 1]], dtype=int64)
The first 4 values of each row are pretty clear to me: argsort is doing its job right. But the last 2 values confuse me, as they seem to sort the values incorrectly.
Shouldn't the output of argsort be:
array([[0, 0, 0, 0, 2, 1],
[2, 1, 1, 1, 0, 2],
[1, 2, 2, 2, 1, 0]], dtype=int64)

I think the issue is with what you think argsort is outputting. Let's focus on a simpler 1D example:
arr = np.array([5, 10, 4])
np.argsort returns the indices into the original array that would put its elements in sorted order:
[2, 0, 1]
Let's take a look at what the actual sorted values are to understand why:
[
4, # at index 2 in the original array
5, # at index 0 in the original array
10, # at index 1 in the original array
]
It seems like you are imagining the inverse operation, where argsort will tell you what index in the output each element will move to. You can obtain those indices by applying argsort to the result of argsort.
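As a small sketch of that last point, using the same 1D array as above (the variable names order and ranks are only illustrative):

```python
import numpy as np

arr = np.array([5, 10, 4])

# Indices that would sort arr: element 2 comes first, then 0, then 1.
order = np.argsort(arr)

# Applying argsort again inverts that permutation, giving each
# element's position (rank) in the sorted output.
ranks = np.argsort(order)

print(order)  # [2 0 1]
print(ranks)  # [1 2 0]
```

Here ranks[0] == 1 says that arr[0] (the value 5) ends up at position 1 of the sorted array, which is the "where does each element move to" reading.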

The output is correct. With axis=0, np.argsort sorts along the first axis, so it compares the elements within each column. For the array
array = [[0, 1, 2, 3, 4, 5],
[444, 4, 8, 3, 1, 10],
[2, 5, 8, 999, 1, 4]]
axis=0 compares the column elements (0, 444, 2), (1, 4, 5), (2, 8, 8), (3, 3, 999), (4, 1, 1), (5, 10, 4), which gives the array of indices:
np.argsort(array, axis=0)
array([[0, 0, 0, 0, 1, 2],
[2, 1, 1, 1, 2, 0],
[1, 2, 2, 2, 0, 1]])
So, for your question, the last 2 columns come from the elements (4, 1, 1), which give the indices (1, 2, 0), and (5, 10, 4), which give (2, 0, 1).
See the documentation for np.argsort.
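One way to convince yourself, sketched below, is to apply the returned indices back to the array with np.take_along_axis and check that every column comes out sorted. The kind="stable" argument is an addition here so that tied values such as (4, 1, 1) are ordered deterministically:

```python
import numpy as np

array = np.array([[0, 1, 2, 3, 4, 5],
                  [444, 4, 8, 3, 1, 10],
                  [2, 5, 8, 999, 1, 4]])

# kind="stable" keeps tied values in their original order.
idx = np.argsort(array, axis=0, kind="stable")

# Picking elements by these indices along axis 0 sorts every column.
sorted_cols = np.take_along_axis(array, idx, axis=0)

print(idx[:, 4])          # indices for the column (4, 1, 1)
print(sorted_cols[:, 4])  # that column in ascending order
```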

Related

Python: Multiply 2D array with each row of another 2D array [duplicate]

Suppose I have the following data:
mask = [[0, 1, 1, 0, 1]] # 2D mask
ip_array = [[7, 4, 5, 2, 3]
[3, 2, 1, 9, 0]
[1, 8, 6, 3, 1]] # 2D array
I want to multiply the mask with each row of ip_array. So the output should be like:
[[0, 4, 5, 0, 3]
[0, 2, 1, 0, 0]
[0, 8, 6, 0, 1]]
I am new to numpy functions and I am looking for an efficient way to do this. Any help is appreciated!
You can use:
np.multiply(mask, ip_array)
Giving you:
array([[0, 4, 5, 0, 3],
[0, 2, 1, 0, 0],
[0, 8, 6, 0, 1]])
Also, as a heads-up, you're missing two commas in your definition of ip_array. It should look like this:
ip_array = [[7, 4, 5, 2, 3],
[3, 2, 1, 9, 0],
[1, 8, 6, 3, 1]] # 2D array
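For reference, this works because of broadcasting: the mask has shape (1, 5) and is stretched across the three rows of ip_array. A minimal sketch with the corrected definition (the name result is only illustrative):

```python
import numpy as np

mask = np.array([[0, 1, 1, 0, 1]])       # shape (1, 5)
ip_array = np.array([[7, 4, 5, 2, 3],
                     [3, 2, 1, 9, 0],
                     [1, 8, 6, 3, 1]])   # shape (3, 5)

# The size-1 first axis of mask is stretched to match the 3 rows.
result = mask * ip_array

print(result.shape)  # (3, 5)
```

The * operator on arrays is element-wise, so it gives the same result as np.multiply(mask, ip_array).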

Major vote by column?

I have a 20x20 2D array, and for every column I want the value that occurs most often (excluding zeros), i.e. the value that wins the majority vote.
I can do that for a single column like this:
np.unique(p[:,0][p[:,0] != 0],return_counts=True)
(array([ 3, 21], dtype=int16), array([1, 3]))
nums, cnts = np.unique(p[:,0][ p[:,0] != 0 ],return_counts=True)
nums[cnts.argmax()]
21
Just for completeness, we can extend the earlier proposed method to a loop-based solution for 2D arrays -
# p is 2D input array
for i in range(p.shape[1]):
    # count the non-zero values of column i and pick the most frequent one
    nums, cnts = np.unique(p[:,i][ p[:,i] != 0 ],return_counts=True)
    output_per_col = nums[cnts.argmax()]
How do I do that for all columns without using a for loop?
We can use bincount2D_vectorized to get binned counts per column, where each integer value is its own bin. Then, simply slice from the second count onwards (as the first count is for 0), take the argmax, and add 1 (to compensate for the slicing). That's our desired output.
Hence, the solution shown as a sample case run -
In [116]: p # input array
Out[116]:
array([[4, 3, 4, 1, 1, 0, 2, 0],
[4, 0, 0, 0, 0, 0, 4, 0],
[3, 1, 3, 4, 3, 1, 4, 3],
[4, 4, 3, 3, 1, 1, 3, 2],
[3, 0, 3, 0, 4, 4, 4, 0],
[3, 0, 0, 3, 2, 0, 1, 4],
[4, 0, 3, 1, 3, 3, 2, 0],
[3, 3, 0, 0, 2, 1, 3, 1],
[2, 4, 0, 0, 2, 3, 4, 2],
[0, 2, 4, 2, 0, 2, 2, 4]])
In [117]: bincount2D_vectorized(p.T)[:,1:].argmax(1)+1
Out[117]: array([3, 3, 3, 1, 2, 1, 4, 2])
That transpose is needed because bincount2D_vectorized gets us 2D bincounts per row. Thus, for the alternative problem of getting the majority vote per row, simply skip that transpose.
Also, feel free to explore other options in that linked Q&A to get 2D-bincounts.
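The helper from the linked Q&A is not shown here. As a rough sketch of how such a per-row bincount could be written (bincount_2d_rows is a hypothetical name, and the version in the linked Q&A may differ), assuming a 2D array of non-negative integers:

```python
import numpy as np

def bincount_2d_rows(a):
    """Count occurrences of each integer per row (hypothetical helper).

    Assumes a is a 2D array of non-negative integers.
    """
    n_bins = a.max() + 1
    # Shift each row into its own range of bins so that a single 1D
    # bincount can count all rows at once.
    offsets = np.arange(a.shape[0])[:, None] * n_bins
    counts = np.bincount((a + offsets).ravel(),
                         minlength=a.shape[0] * n_bins)
    return counts.reshape(-1, n_bins)

p = np.array([[4, 3, 4, 1, 1, 0, 2, 0],
              [4, 0, 0, 0, 0, 0, 4, 0],
              [3, 1, 3, 4, 3, 1, 4, 3],
              [4, 4, 3, 3, 1, 1, 3, 2],
              [3, 0, 3, 0, 4, 4, 4, 0],
              [3, 0, 0, 3, 2, 0, 1, 4],
              [4, 0, 3, 1, 3, 3, 2, 0],
              [3, 3, 0, 0, 2, 1, 3, 1],
              [2, 4, 0, 0, 2, 3, 4, 2],
              [0, 2, 4, 2, 0, 2, 2, 4]])

# Drop the count for 0, then take the most frequent remaining value
# per column (ties go to the smaller value).
votes = bincount_2d_rows(p.T)[:, 1:].argmax(1) + 1
print(votes)  # [3 3 3 1 2 1 4 2]
```

Note that a column containing only zeros would report 1 with this approach, since argmax over all-zero counts returns index 0, so such columns would need separate handling.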

Most efficient way to get sorted indices based on two numpy arrays

How can I get the sorted indices of a numpy array (distance), considering only the positions selected by another numpy array (val)?
For example, consider the two numpy arrays val and distance below:
val = np.array([[10, 0, 0, 0, 0],
[0, 0, 10, 0, 10],
[0, 10, 10, 0, 0],
[0, 0, 0, 10, 0],
[0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
[3, 2, 1, 2, 3],
[2, 1, 0, 1, 2],
[3, 2, 1, 2, 3],
[4, 3, 2, 3, 4]])
The distances where val == 10 are 4, 1, 3, 1, 0, 2. I would like to get these sorted to be 0, 1, 1, 2, 3, 4 and return the respective indices from the distance array.
Returning something like:
(array([2, 1, 2, 3, 1, 0], dtype=int64), array([2, 2, 1, 3, 4, 0], dtype=int64))
or:
(array([2, 2, 1, 3, 1, 0], dtype=int64), array([2, 1, 2, 3, 4, 0], dtype=int64))
since the second and third elements both have distance '1', so I guess their indices are interchangeable.
I tried using combinations of np.where, np.argsort, np.argpartition and np.unravel_index, but can't seem to get it working right.
Here's one way with masking -
In [20]: mask = val==10
In [21]: np.argwhere(mask)[distance[mask].argsort()]
Out[21]:
array([[2, 2],
[1, 2],
[2, 1],
[3, 3],
[1, 4],
[0, 0]])
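If you want the result as a tuple of row and column index arrays, as in the question, the same idea can be sketched with np.nonzero. The kind="stable" argument is an addition here so that the two positions with distance 1 keep their row-major order:

```python
import numpy as np

val = np.array([[10, 0, 0, 0, 0],
                [0, 0, 10, 0, 10],
                [0, 10, 10, 0, 0],
                [0, 0, 0, 10, 0],
                [0, 0, 0, 0, 0]])
distance = np.array([[4, 3, 2, 3, 4],
                     [3, 2, 1, 2, 3],
                     [2, 1, 0, 1, 2],
                     [3, 2, 1, 2, 3],
                     [4, 3, 2, 3, 4]])

mask = val == 10

# Row and column indices of the selected positions, in row-major order.
rows, cols = np.nonzero(mask)

# Order of those positions by increasing distance; ties keep their order.
order = np.argsort(distance[mask], kind="stable")

sorted_idx = (rows[order], cols[order])
print(sorted_idx)
print(distance[sorted_idx])  # [0 1 1 2 3 4]
```

The tuple sorted_idx can be used directly to index distance or val.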

Numpy lexicographic ordering

I'd like to lexicographically sort the following array a (get index positions), but, I'm having problems understanding the numpy results:
>>> a = np.asarray([[1, 1, 1, 2, 1, 2], [2, 1, 2, 3, 1, 0], [1, 2, 3, 3, 2, 2]])
>>> a
array([[1, 1, 1, 2, 1, 2],
[2, 1, 2, 3, 1, 0],
[1, 2, 3, 3, 2, 2]])
>>> np.lexsort(a)
array([0, 5, 1, 4, 2, 3])
For instance, I don't understand why [1, 2, 1] (a[:,0]) is sort-index 0 while [1, 1, 2] (a[:,1]) is index 5, even though it should be smaller than [1, 2, 1].
The order of significance for keys is opposite to what you expected.
To get the expected result, just flip the matrix upside down:
>>> np.lexsort(np.flipud(a))
array([1, 4, 0, 2, 5, 3])
np.lexsort gives you the indices of the columns in lexicographic order, but the last element of each column (the last row) is the primary sort key, the one before it is the secondary key, and so on. That's why in your example column 5 comes before column 1:
[2,0,2] sorts before [1,1,2] because the last elements tie (2 = 2) and then 0 < 1.
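A short sketch of the key order, using the array from the question; reversing the rows with a[::-1] is equivalent to np.flipud(a) and makes the first row the most significant key:

```python
import numpy as np

a = np.asarray([[1, 1, 1, 2, 1, 2],
                [2, 1, 2, 3, 1, 0],
                [1, 2, 3, 3, 2, 2]])

# lexsort treats the LAST key as the primary one, so reverse the
# rows to make the first row the most significant key.
order = np.lexsort(a[::-1])

# Columns rearranged into lexicographic order, printed one per line.
print(a[:, order].T)
```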

making an array of n columns where each successive row increases by one

In numpy, I would like to be able to input n for rows and m for columns and end with the array that looks like:
[(0,0,0,0),
(1,1,1,1),
(2,2,2,2)]
So that would be a 3x4. Each column is just a copy of the previous one and the row increases by one each time. As an example:
input would be 4, then 6, and the output would be an array
[(0,0,0,0,0,0),
(1,1,1,1,1,1),
(2,2,2,2,2,2),
(3,3,3,3,3,3)]
4 rows and 6 columns where the row increases by one each time. Thanks for your time.
So many possibilities...
In [51]: n = 4
In [52]: m = 6
In [53]: np.tile(np.arange(n), (m, 1)).T
Out[53]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [54]: np.repeat(np.arange(n).reshape(-1,1), m, axis=1)
Out[54]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
In [55]: np.outer(np.arange(n), np.ones(m, dtype=int))
Out[55]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Here's one more. The neat trick here is that the values are not duplicated--only memory for the single sequence [0, 1, 2, ..., n-1] is allocated.
In [67]: from numpy.lib.stride_tricks import as_strided
In [68]: seq = np.arange(n)
In [69]: rep = as_strided(seq, shape=(n,m), strides=(seq.strides[0],0))
In [70]: rep
Out[70]:
array([[0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1],
[2, 2, 2, 2, 2, 2],
[3, 3, 3, 3, 3, 3]])
Be careful with the as_strided function. If you don't get the arguments right, you can crash Python.
To see that seq has not been copied, change seq in place, and then check rep:
In [71]: seq[1] = 99
In [72]: rep
Out[72]:
array([[ 0, 0, 0, 0, 0, 0],
[99, 99, 99, 99, 99, 99],
[ 2, 2, 2, 2, 2, 2],
[ 3, 3, 3, 3, 3, 3]])
import numpy as np
def foo(n, m):
    # stack m copies of [0, ..., n-1] as rows, then transpose to (n, m)
    return np.array([np.arange(n)] * m).T
Natively (no Python lists):
import numpy
rows, columns = 4, 6
numpy.arange(rows).reshape(-1, 1).repeat(columns, axis=1)
#>>> array([[0, 0, 0, 0, 0, 0],
#>>> [1, 1, 1, 1, 1, 1],
#>>> [2, 2, 2, 2, 2, 2],
#>>> [3, 3, 3, 3, 3, 3]])
You can easily do this using built-in Python functions. The program counts from 0 to 3, converting each number to a string and repeating the string 6 times.
print([6*str(n) for n in range(0,4)])
Here is the output.
ks-MacBook-Pro:~ kyle$ pbpaste | python
['000000', '111111', '222222', '333333']
One more for fun:
np.zeros((n, m), dtype=int) + np.arange(n, dtype=int)[:,None]
As has been mentioned, there are many ways to do this.
Here's what I'd do:
import numpy as np
def makearray(m, n):
    A = np.empty((m,n))
    # A.T has shape (n, m), so each of its rows is filled with 0..m-1
    A.T[:] = np.arange(m)
    return A
Here's an amusing alternative that will work if you aren't going to be changing the contents of the array.
It should save some memory.
Be careful though: this doesn't allocate a full array, so multiple entries point to the same memory address.
import numpy as np
from numpy.lib.stride_tricks import as_strided
def makearray(m, n):
    A = np.arange(m)
    # a stride of 0 along the second axis repeats each value n times
    return as_strided(A, strides=(A.strides[0],0), shape=(m,n))
In either case, as I have written them, a 3x4 array can be created by makearray(3, 4)
Using count from the built-in module itertools:
>>> from itertools import count
>>> rows = 4
>>> columns = 6
>>> cnt = count()
>>> [[next(cnt)]*columns for i in range(rows)]
[[0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2], [3, 3, 3, 3, 3, 3]]
You can simply use a list comprehension:
>>> nc=5
>>> nr=4
>>> [[k]*nc for k in range(nr)]
[[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3]]
Several other possibilities using an (n,1) array:
a = np.arange(n)[:,None]  # or np.arange(n).reshape(-1,1)
a*np.ones((m),dtype=int)
a[:,np.zeros((m),dtype=int)]
If the result is going to be combined with an (m,) array anyway, just leave a as (n,1) and let broadcasting expand it for you.
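A brief sketch of that broadcasting behavior. np.broadcast_to is not used in the answers above; it is shown here as one way to get the expanded (n, m) shape as a read-only view without copying, and the names view, other and total are only illustrative:

```python
import numpy as np

n, m = 4, 6
a = np.arange(n)[:, None]          # shape (n, 1)

# Read-only view with shape (n, m); no data is copied.
view = np.broadcast_to(a, (n, m))

# Combining the (n, 1) column with an (m,) array broadcasts implicitly.
other = np.arange(m)
total = a + other                  # shape (n, m)

print(view)
print(total.shape)  # (4, 6)
```

Because the view is read-only, call view.copy() if you need to modify the values afterwards.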
