I am surprised this specific question hasn't been asked before, but I really couldn't find it on SO or in the documentation of np.sort.
Say I have a random numpy array holding integers, e.g.:
> temp = np.random.randint(1,10, 10)
> temp
array([2, 4, 7, 4, 2, 2, 7, 6, 4, 4])
If I sort it, I get ascending order by default:
> np.sort(temp)
array([2, 2, 2, 4, 4, 4, 4, 6, 7, 7])
but I want the solution to be sorted in descending order.
Now, I know I can always do:
reverse_order = np.sort(temp)[::-1]
but is this last statement efficient? Doesn't it create a copy in ascending order and then reverse that copy to get the result in reverse order? If so, is there an efficient alternative? It doesn't look like np.sort accepts parameters to flip the sign of the comparisons and produce a reverse-ordered sort.
temp[::-1].sort() sorts the array in place (it sorts a reversed view of temp ascending, which leaves temp itself in descending order), whereas np.sort(temp)[::-1] creates a new array.
In [25]: temp = np.random.randint(1,10, 10)
In [26]: temp
Out[26]: array([5, 2, 7, 4, 4, 2, 8, 6, 4, 4])
In [27]: id(temp)
Out[27]: 139962713524944
In [28]: temp[::-1].sort()
In [29]: temp
Out[29]: array([8, 7, 6, 5, 4, 4, 4, 4, 2, 2])
In [30]: id(temp)
Out[30]: 139962713524944
Another option: negate the array, sort it ascending, and negate the result (compare with a plain ascending sort):
>>> a = np.array([5, 2, 7, 4, 4, 2, 8, 6, 4, 4])
>>> np.sort(a)
array([2, 2, 4, 4, 4, 4, 5, 6, 7, 8])
>>> -np.sort(-a)
array([8, 7, 6, 5, 4, 4, 4, 4, 2, 2])
For short arrays I suggest using np.argsort() on the negated array to find the indices of the descending order, which is slightly faster than reversing the sorted array:
In [37]: temp = np.random.randint(1,10, 10)
In [38]: %timeit np.sort(temp)[::-1]
100000 loops, best of 3: 4.65 µs per loop
In [39]: %timeit temp[np.argsort(-temp)]
100000 loops, best of 3: 3.91 µs per loop
Be careful with dimensions.
Let
x   # initial numpy array
I = np.argsort(x)   # or I = x.argsort()
y = np.sort(x)      # note: y = x.sort() would not work, since x.sort() sorts in place and returns None
z   # reverse-sorted array
Full Reverse
z = x[I[::-1]]
z = -np.sort(-x)
z = np.flip(y)
np.flip changed in NumPy 1.15; earlier versions (1.14 and below) required an explicit axis argument. Solution: pip install --upgrade numpy. If upgrading isn't an option, see the compatibility sketch below.
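A minimal compatibility sketch, assuming only that older np.flip raises a TypeError when axis is omitted:
import numpy as np

def flip_all(y):
    try:
        return np.flip(y)           # NumPy >= 1.15 flips all axes by default
    except TypeError:               # older NumPy: the axis argument is mandatory
        for ax in range(y.ndim):
            y = np.flip(y, axis=ax)
        return y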
First Dimension Reversed
z = y[::-1]
z = np.flipud(y)
z = np.flip(y, axis=0)
Second Dimension Reversed
z = y[:, ::-1]
z = np.fliplr(y)
z = np.flip(y, axis=1)
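A worked 2-D example of the variants above (a minimal sketch with made-up data):
import numpy as np
y = np.sort(np.array([[3, 1, 2],
                      [9, 7, 8]]))   # row-wise ascending: [[1, 2, 3], [7, 8, 9]]
np.flip(y)     # all axes reversed:    [[9, 8, 7], [3, 2, 1]]
np.flipud(y)   # first axis reversed:  [[7, 8, 9], [1, 2, 3]]
np.fliplr(y)   # second axis reversed: [[3, 2, 1], [9, 8, 7]]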
Testing
Testing on a 100×10×10 array 1000 times.
Method | Time (ms)
-------------+----------
y[::-1] | 0.126659 # only in first dimension
-np.sort(-x) | 0.133152
np.flip(y) | 0.121711
x[I[::-1]] | 4.611778
x.sort() | 0.024961
x.argsort() | 0.041830
np.flip(x) | 0.002026
The slowness of x[I[::-1]] is mainly due to the fancy reindexing step rather than argsort itself.
# Timing code
import time
import numpy as np

def timeit(fun, xs):
    t = time.time()
    for i in range(len(xs)):  # inlining and map gave much worse results for x[I[::-1]], ~5x slower
        fun(xs[i])
    t = time.time() - t
    print(np.round(t, 6))

I, N = 1000, (100, 10, 10)
xs = np.random.rand(I, *N)
timeit(lambda x: np.sort(x)[::-1], xs)
timeit(lambda x: -np.sort(-x), xs)
timeit(lambda x: np.flip(np.sort(x)), xs)  # np.sort, not x.sort(): x.sort() returns None
timeit(lambda x: x[x.argsort()[::-1]], xs)
timeit(lambda x: x.sort(), xs)
timeit(lambda x: x.argsort(), xs)
timeit(lambda x: np.flip(x), xs)
np.flip() and reversed slice indexing are basically the same speed. Below is a benchmark using three different methods; np.flip() comes out slightly faster. Negation is slower because it is applied twice, so simply reversing the array is faster than that.
Note: np.flip() was also faster than np.fliplr() in my tests.
import numpy as np

def sort_reverse(x):
    return np.sort(x)[::-1]

def sort_negative(x):
    return -np.sort(-x)

def sort_flip(x):
    return np.flip(np.sort(x))

arr = np.random.randint(1, 10000, size=(1, 100000))
%timeit sort_reverse(arr)
%timeit sort_negative(arr)
%timeit sort_flip(arr)
and the results are:
sort_reverse:  6.61 ms ± 67.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sort_negative: 6.69 ms ± 64.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
sort_flip:     6.57 ms ± 58.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
I was searching for a way to reverse-sort a two-dimensional numpy array and couldn't find anything that worked, but I think I have stumbled on a solution which I am uploading just in case anyone is in the same boat.
x = np.sort(array)
y = np.fliplr(x)
np.sort sorts ascending, which is not what you want, but fliplr flips each row left to right, which is. Seems to work!
Hope it helps you out!
I guess it's similar to the -np.sort(-a) suggestion above, but I was put off going for that by a comment saying it doesn't always work. Perhaps my solution won't always work either; however, I have tested it with a few arrays and it seems to be OK.
Unfortunately, when you have a complex-valued array, only np.sort(temp)[::-1] works properly; the other two methods mentioned here are not effective.
You could sort the array first (ascending by default) and then apply np.flip():
(https://docs.scipy.org/doc/numpy/reference/generated/numpy.flip.html)
FYI It works with datetime objects as well.
Example:
x = np.array([2, 3, 1, 0])
x_sort_asc = np.sort(x)
print(x_sort_asc)
>>> [0 1 2 3]
x_sort_desc = np.flip(x_sort_asc)
print(x_sort_desc)
>>> [3 2 1 0]
Here is a quick trick
In[3]: import numpy as np
In[4]: temp = np.random.randint(1,10, 10)
In[5]: temp
Out[5]: array([5, 4, 2, 9, 2, 3, 4, 7, 5, 8])
In[6]: sorted = np.sort(temp)
In[7]: rsorted = list(reversed(sorted))
In[8]: sorted
Out[8]: array([2, 2, 3, 4, 4, 5, 5, 7, 8, 9])
In[9]: rsorted
Out[9]: [9, 8, 7, 5, 5, 4, 4, 3, 2, 2]
I suggest using this...
np.arange(start_index, end_index, intervals)[::-1]
For example:
np.arange(10, 20, 0.5)
np.arange(10, 20, 0.5)[::-1]
Then your result:
[ 19.5, 19. , 18.5, 18. , 17.5, 17. , 16.5, 16. , 15.5,
15. , 14.5, 14. , 13.5, 13. , 12.5, 12. , 11.5, 11. ,
10.5, 10. ]
Related
I want to shift the values in matrix_a according to the values in matrix_b. If the value in matrix_b at position 0,0 is 1, then the element at 0,0 in result_matrix should be the element at 1,1 in matrix_a. I already have this working using the following code:
import numpy as np

matrix_a = np.matrix([[1, 2, 3],
                      [4, 5, 6],
                      [7, 8, 9]])
matrix_b = np.matrix([[1, 1, 0],
                      [0, -1, 0],
                      [0, 0, -1]])

result_matrix = np.zeros((3, 3))
for x in range(matrix_b.shape[0]):
    for y in range(matrix_b.shape[1]):
        value = matrix_b.item(x, y)
        result_matrix[x][y] = matrix_a.item(x + value, y + value)

print(result_matrix)
which results in:
[[5. 6. 3.]
[4. 1. 6.]
[7. 8. 5.]]
Right now this is quite slow on large matrices, and I have the feeling that this can be optimized using one of numpy or scipy's functions. Can someone tell me how this can be done more efficiently?
Using np.indices
ix = np.indices(matrix_a.shape)
matrix_a[tuple(ix + np.array(matrix_b))]
Out[]:
matrix([[5, 6, 3],
[4, 1, 6],
[7, 8, 5]])
As a word of advice, try to avoid using np.matrix - it's only really for compatibility with old MATLAB code, and breaks a lot of numpy functions. np.array works just as well 99% of the time, and the rest of the time np.matrix will be confusing for core numpy users.
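To illustrate, here is the same solution with plain ndarrays instead of np.matrix (a sketch using the question's sample data):
import numpy as np
matrix_a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_b = np.array([[1, 1, 0], [0, -1, 0], [0, 0, -1]])
ix = np.indices(matrix_a.shape)
matrix_a[tuple(ix + matrix_b)]   # adds the offsets to both the row and column indices
# array([[5, 6, 3],
#        [4, 1, 6],
#        [7, 8, 5]])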
Here's one way with integer indexing, using open grids from np.ogrid to generate row and column indices for all elements -
I,J = np.ogrid[:matrix_b.shape[0],:matrix_b.shape[1]]
out = matrix_a[I+matrix_b, J+matrix_b]
Output for given sample -
In [152]: out
Out[152]:
matrix([[5, 6, 3],
[4, 1, 6],
[7, 8, 5]])
Timings on a large dataset 5000x5000 -
In [142]: np.random.seed(0)
...: N = 5000 # matrix size
...: matrix_a = np.random.rand(N,N)
...: matrix_b = np.random.randint(0,N,matrix_a.shape)-matrix_a.shape[1]
# #Daniel F's soln
In [143]: %%timeit
...: ix = np.indices(matrix_a.shape)
...: matrix_a[tuple(ix + np.array(matrix_b))]
1.37 s ± 99.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Solution from this post
In [149]: %%timeit
...: I,J = np.ogrid[:matrix_b.shape[0],:matrix_b.shape[1]]
...: out = matrix_a[I+matrix_b, J+matrix_b]
1.17 s ± 3.21 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
I want to get the rank of each element, so I use argsort in numpy:
np.argsort(np.array((1,1,1,2,2,3,3,3,3)))
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
it gives equal elements different ranks. Can I make equal elements share the same rank, like:
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't mind a dependency on scipy, you can use scipy.stats.rankdata, with method='min':
In [14]: a
Out[14]: array([1, 1, 1, 2, 2, 3, 3, 3, 3])
In [15]: from scipy.stats import rankdata
In [16]: rankdata(a, method='min')
Out[16]: array([1, 1, 1, 4, 4, 6, 6, 6, 6])
Note that rankdata starts the ranks at 1. To start at 0, subtract 1 from the result:
In [17]: rankdata(a, method='min') - 1
Out[17]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't want the scipy dependency, you can use numpy.unique to compute the ranking. Here's a function that computes the same result as rankdata(x, method='min') - 1:
import numpy as np

def rankmin(x):
    u, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    csum = np.zeros_like(counts)
    csum[1:] = counts[:-1].cumsum()
    return csum[inv]
For example,
In [137]: x = np.array([60, 10, 0, 30, 20, 40, 50])
In [138]: rankdata(x, method='min') - 1
Out[138]: array([6, 1, 0, 3, 2, 4, 5])
In [139]: rankmin(x)
Out[139]: array([6, 1, 0, 3, 2, 4, 5])
In [140]: a = np.array([1,1,1,2,2,3,3,3,3])
In [141]: rankdata(a, method='min') - 1
Out[141]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
In [142]: rankmin(a)
Out[142]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
By the way, a single call to argsort() does not give ranks. You can find an assortment of approaches to ranking in the question Rank items in an array using Python/NumPy, including how to do it using argsort().
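For reference, a common argsort-based idiom (a minimal sketch): applying argsort twice yields ordinal ranks, where ties are broken by position instead of shared.
import numpy as np
x = np.array([60, 10, 0, 30, 20, 40, 50])
x.argsort().argsort()
# array([6, 1, 0, 3, 2, 4, 5])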
Alternatively, a pandas Series has a rank method which does what you need with method="min":
import pandas as pd
pd.Series((1,1,1,2,2,3,3,3,3)).rank(method="min")
# 0 1
# 1 1
# 2 1
# 3 4
# 4 4
# 5 6
# 6 6
# 7 6
# 8 6
# dtype: float64
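To match the 0-based integer array the question asks for, shift the ranks and convert back (a sketch; .to_numpy() assumes a reasonably recent pandas):
import pandas as pd
ranks = pd.Series((1, 1, 1, 2, 2, 3, 3, 3, 3)).rank(method="min") - 1
ranks.astype(int).to_numpy()
# array([0, 0, 0, 3, 3, 5, 5, 5, 5])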
With focus on performance, here's an approach -
def rank_repeat_based(arr):
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr)) + 1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))
For a generic case with the elements in the input array not already sorted, we would need to use argsort() to keep track of the positions. So, we would have a modified version, like so -
def rank_repeat_based_generic(arr):
    sidx = np.argsort(arr, kind='mergesort')
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr[sidx])) + 1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))[sidx.argsort()]
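A quick sanity check of the generic version on the question's data (a sketch):
a = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
rank_repeat_based_generic(a)
# array([0, 0, 0, 3, 3, 5, 5, 5, 5])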
Runtime test
Testing out all the approaches listed thus far to solve the problem on a large dataset.
Sorted array case :
In [96]: arr = np.sort(np.random.randint(1,100,(10000)))
In [97]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 635 µs per loop
In [98]: %timeit rankmin(arr)
1000 loops, best of 3: 495 µs per loop
In [99]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 826 µs per loop
In [100]: %timeit rank_repeat_based(arr)
10000 loops, best of 3: 200 µs per loop
Unsorted case :
In [106]: arr = np.random.randint(1,100,(10000))
In [107]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 963 µs per loop
In [108]: %timeit rankmin(arr)
1000 loops, best of 3: 869 µs per loop
In [109]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 1.17 ms per loop
In [110]: %timeit rank_repeat_based_generic(arr)
1000 loops, best of 3: 1.76 ms per loop
I've written a function for the same purpose, using pure Python and numpy only. Please have a look; I've put comments in as well.
def my_argsort(array):
    # this type conversion lets us work with python lists and pandas series
    array = np.array(array)
    # create a mapping for unique values:
    # a dictionary where keys are values from the array and
    # values are the desired ranks
    unique_values = list(set(array))
    # double argsort converts sorting order into ranks, independent of the
    # arbitrary iteration order of the set above
    mapping = dict(zip(unique_values, np.argsort(np.argsort(unique_values))))
    # apply the mapping to our array;
    # np.vectorize works similarly to map() and can work with a dict's .get
    array = np.vectorize(mapping.get)(array)
    return array
Hope that helps.
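A usage sketch; note this yields dense ranks (no gaps after ties), which differs from the 'min' ranking in the other answers:
my_argsort([1, 1, 1, 2, 2, 3, 3, 3, 3])
# array([0, 0, 0, 1, 1, 2, 2, 2, 2])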
Complex solutions are unnecessary for this problem.
> ary = np.sort([1, 1, 1, 2, 2, 3, 3, 3, 3])  # or anything; must be sorted.
> a = np.diff(ary).cumsum(); a
array([0, 0, 1, 1, 2, 2, 2, 2])
> b = np.r_[0, a]; b  # dense ranks: ties share a rank, with no gaps
array([0, 0, 0, 1, 1, 2, 2, 2, 2])
> c = np.flatnonzero(ary[1:] != ary[:-1])
> np.r_[0, 1 + c][b]  # min ranks: ties get the first open rank
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
In order to find the index of the smallest value, I can use argmin:
import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print(A.argmin())  # 4 because A[4] = 0.1
But how can I find the indices of the k-smallest values?
I'm looking for something like:
print(A.argmin(numberofvalues=3))
# [4, 0, 7] because A[4] <= A[0] <= A[7] <= all other A[i]
Note: in my use case A has between ~ 10 000 and 100 000 values, and I'm interested for only the indices of the k=10 smallest values. k will never be > 10.
Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.
import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3
idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]
This returns the k-smallest values. Note that these may not be in sorted order.
print(A[idx[:k]])
# [ 0.1 1. 1.5]
To obtain the k-largest values use
idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]
A[idx[-k:]]
# [ 9. 17. 17.]
WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest.
That won't always work. For example, these are NOT the 3 largest values in x:
x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70, 80, 100])
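The safe way for that example is to re-partition with -k (a sketch; the order within the last three slots is not guaranteed):
idx = np.argpartition(x, -3)
x[idx[-3:]]
# the three largest, in some order, e.g. array([ 80,  90, 100])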
Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.
In [2]: x = np.random.randn(100000)
In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop
In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop
In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
You can use numpy.argsort with slicing
>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)
For n-dimensional arrays, this function works well. The indices are returned in a form that can be used directly to index the array. If you want a list of the indices, transpose the result before making a list.
To retrieve the k largest, simply pass in -k.
def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . .
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()
Example:
r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)
indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
# array([3, 2, 0, 1], dtype=int64),
# array([3, 0, 3, 3], dtype=int64))
arr[indices]
# array([ 4, 31, 54, 77])
%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy.partition(your_array, k) is an alternative if you want the k smallest values rather than their indices: the first k entries of its result are the k smallest elements (not themselves sorted), with the kth element in its final sorted position.
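A minimal sketch of the difference, using the question's data:
import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
np.partition(A, 3)[:3]   # the 3 smallest values, not necessarily sorted
# e.g. array([0.1, 1. , 1.5])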
a = np.array([1, 3, 5, 7, 9])
b = np.array([2, 4, 6, 8, 10])
I want to interleave this pair of arrays element by element.
Example: using a and b, the result should be
c = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
I need to do this with pairs of long arrays (more than a hundred elements each) over thousands of sequences. Any smarter ideas than picking element by element from each array?
thanks
c = np.empty(len(a)+len(b), dtype=a.dtype)
c[::2] = a
c[1::2] = b
(That assumes a and b have the same dtype.)
You asked for the fastest, so here's a timing comparison (vstack, ravel and empty are all numpy functions):
In [40]: a = np.random.randint(0, 10, size=150)
In [41]: b = np.random.randint(0, 10, size=150)
In [42]: %timeit vstack((a,b)).T.flatten()
100000 loops, best of 3: 5.6 µs per loop
In [43]: %timeit ravel([a, b], order='F')
100000 loops, best of 3: 3.1 µs per loop
In [44]: %timeit c = empty(len(a)+len(b), dtype=a.dtype); c[::2] = a; c[1::2] = b
1000000 loops, best of 3: 1.94 µs per loop
With vstack((a,b)).T.flatten(), a and b are copied to create vstack((a,b)), and then the data is copied again by the flatten() method.
ravel([a, b], order='F') is implemented as asarray([a, b]).ravel(order), which requires copying a and b, and then copying the result to create an array with order='F'. (If you do just ravel([a, b]), it is about the same speed as my answer, because it doesn't have to copy the data again. Unfortunately, order='F' is needed to get the alternating pattern.)
So the other two methods copy the data twice. In my version, each array is copied once.
This'll do it:
vstack((a,b)).T.flatten()
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Using numpy.ravel:
>>> np.ravel([a, b], order='F')
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Say I have an array d of size (N,T), out of which I need to select elements using index of shape (N,), where the first element corresponds to the index in the first row, etc... how would I do that?
For example
>>> d
Out[748]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
>>> index
Out[752]: array([5, 6, 1], dtype=int64)
Expected Output:
array([[5],
       [6],
       [2]])
This is an array containing the element at index 5 of the first row, the element at index 6 of the second row, and the element at index 1 of the third row.
Update
Since I will have a considerably larger N, I was interested in the speed of the different methods for higher N. With N = 30000:
>>> %timeit np.diag(e.take(index2, axis=1)).reshape(N*3, 1)
1 loops, best of 3: 3.9 s per loop
>>> %timeit e.ravel()[np.arange(e.shape[0])*e.shape[1]+index2].reshape(N*3, 1)
1000 loops, best of 3: 287 µs per loop
Finally, you suggest reshape(). As I want to leave it as general as possible (without knowing N), I instead use [:,np.newaxis] - it seems to increase duration from 287µs to 288µs, which I'll take :)
This might be ugly but more efficient:
>>> d.ravel()[np.arange(d.shape[0])*d.shape[1]+index]
array([5, 6, 2])
Edit
As pointed out by @deinonychusaur, the statement above can be written just as cleanly as:
d[np.arange(index.size), index]
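An end-to-end sketch with the sample data, using [:, np.newaxis] to keep the column shape (as the asker prefers, since it doesn't hard-code N):
import numpy as np
d = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
              [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]])
index = np.array([5, 6, 1])
d[np.arange(index.size), index][:, np.newaxis]
# array([[5],
#        [6],
#        [2]])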
There might be nicer ways, but a combo of take, diag and reshape would do:
In [137]: np.diag(d.take(index, axis=1)).reshape(3, 1)
Out[137]:
array([[5],
[6],
[2]])
EDIT
Comparisons with @Emanuele Paolini's alternative, adding reshape to it to match the sought output:
In [142]: %timeit d.reshape(d.size)[np.arange(d.shape[0])*d.shape[1]+index].reshape(3, 1)
100000 loops, best of 3: 9.51 µs per loop
In [143]: %timeit np.diag(d.take(index, axis=1)).reshape(3, 1)
100000 loops, best of 3: 3.81 µs per loop
In [146]: %timeit d.ravel()[np.arange(d.shape[0])*d.shape[1]+index].reshape(3, 1)
100000 loops, best of 3: 8.56 µs per loop
This method is about twice as fast as both proposed alternatives.
EDIT 2: An even better method
Based on @Emanuele Paolini's version but with a reduced number of operations, this outperforms all of the above on large arrays (10k rows by 100 columns):
In [199]: %timeit d[(np.arange(index.size), index)].reshape(index.size, 1)
1000 loops, best of 3: 364 µs per loop
In [200]: %timeit d.ravel()[np.arange(d.shape[0])*d.shape[1]+index].reshape(index.size, 1)
100 loops, best of 3: 5.22 ms per loop
So if speed is of essence:
d[(np.arange(index.size), index)].reshape(index.size, 1)