Argsort.argsort - What is it doing? - python

Why is numpy giving this result:
x = numpy.array([-1,5,-2,0])
print(x.argsort().argsort())
[1,3,0,2]

The first x.argsort() returns the indices in the order that will sort the array x. When you call argsort again on this result, what you get is the array of indices that will sort the previous array of indices:
x.argsort()
# array([2, 0, 3, 1])
x.argsort().argsort()
# array([1, 3, 0, 2])
# To sort the array x.argsort(), you start with index 1, which holds the value 0, and so on
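One way to check this reading is to apply the permutations explicitly. The sketch below uses the array from the question; the names `order` and `ranks` are illustrative and do not come from the original post. It shows that indexing x with the first argsort sorts it, and that the second argsort is the inverse permutation of the first.

```python
import numpy as np

x = np.array([-1, 5, -2, 0])
order = x.argsort()        # indices that sort x
ranks = order.argsort()    # indices that sort `order`

print(x[order])            # [-2 -1  0  5]
print(order[ranks])        # [0 1 2 3]
print(ranks[order])        # [0 1 2 3]
```

Because `order[ranks]` and `ranks[order]` both give 0..3, each array undoes the other.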

argsort() gives the indices that would sort the array.
The first argsort() result is: [2, 0, 3, 1]
The second argsort() result is: [1, 3, 0, 2]
You can find details in the NumPy documentation:
https://numpy.org/doc/stable/reference/generated/numpy.argsort.html

The numpy.argsort function
Returns the indices that would sort an array.
x = numpy.array([-1, 5, -2, 0])
print(x.argsort()) # [2 0 3 1]
To get a sorted array, you need to use the order [2 0 3 1]
index  [ 2  0  3  1]
         |  |  |  |
         v  v  v  v
value   -2 -1  0  5   << is sorted
Calling x.argsort().argsort() is the same as numpy.array([2, 0, 3, 1]).argsort(). It gives
index  [ 1  3  0  2]
         |  |  |  |
         v  v  v  v
value    0  1  2  3   << is sorted

Actually, this is a good question: applying argsort twice gives the rank of each element in the original array.
For example, with
data = np.array([0.9, 0.5, 0.3, 0.6])
the output of data.argsort().argsort() is array([3, 1, 0, 2]).
The default order is ascending.
3 is the rank of 0.9.
1 is the rank of 0.5.
0 is the rank of 0.3.
2 is the rank of 0.6.
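The same ranks can be obtained with a single sort by writing the positions 0..n-1 back through the sort order. This is a minimal sketch, not part of the original answer; the helper name `ranks_via_inverse` is hypothetical, and `kind="stable"` is chosen here only so that tied values are ranked in order of appearance.

```python
import numpy as np

def ranks_via_inverse(values):
    """Return the 0-based ascending rank of each element of `values`."""
    order = np.argsort(values, kind="stable")   # indices that sort the input
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))        # element order[k] gets rank k
    return ranks

data = np.array([0.9, 0.5, 0.3, 0.6])
print(ranks_via_inverse(data))                  # [3 1 0 2]
```

Note that either approach gives tied values distinct ranks rather than a shared one.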


How do we extract arrays from a nested numpy array based on conditions on the subarrays?

I have an output which is a nested numpy array. Each subarray has 10 float values, from this 'larger' array I want to extract those subarrays which have the maximum value at a specific index.
Edit: (Edited for clarity)
Example of nested array -
[[1 0 0 0] [1 0 0 0] [0 0 1 0] [1 0 0 0] [0 1 0 0] [0 0.99 0 0]]
Required output
[[0 1 0 0] [0 0.99 0 0]] (We take the index as 1 in this example)
I want to extract those subarrays whose maximum value occurs at a given index (0, 1, 2, etc.). So the condition here is: extract all the subarrays where the value at index 1 (for example) is the maximum of that subarray.
Here is a way to do it:
import numpy as np
example = [[8, 5], [8, 7], [5.6, 1], [7, 9]]
# Choose the index at which the row's maximum value must occur
max_value_index = 0
# Use a list comprehension to select the matching rows
result = [x for x in example if x[max_value_index] == np.max(x)]
Output:
[[8, 5], [8, 7], [5.6, 1]]
So from my understanding you have some max_index parameter, and you want to get all rows whose maximum element falls at that index. To do this you can say "return all rows of my array for which the argmax of this row is equal to the max index", which in numpy is a one-liner:
import numpy as np
arr = np.random.randn(100, 10)
max_index = 2
# Keep the rows whose largest element sits at column max_index
rows_with_max_at_max_index = arr[np.argmax(arr, axis=1) == max_index]  # shape (N, 10)
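Applied to the small array from the question, the argmax approach can be checked by hand. This is an illustrative sketch; the names `row_argmax` and `selected` are not from the original answer.

```python
import numpy as np

arr = np.array([
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0.99, 0, 0],
])
max_index = 1

row_argmax = np.argmax(arr, axis=1)       # column of the maximum in each row
selected = arr[row_argmax == max_index]   # boolean mask keeps matching rows

print(row_argmax)   # [0 0 2 0 1 1]
print(selected)
```

One caveat: when a row contains its maximum more than once, np.argmax reports only the first occurrence, so such a row is selected only if max_index is that first position.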

compute density map D

You are given two integers n and r such that 1 <= r < n, and
a two-dimensional array W of size n x n.
Each element of this array is either 0 or 1.
Your goal is to compute the density map D for array W, using a radius of r.
The output density map is also a two-dimensional array,
where each value represents the number of 1's in matrix W within the specified radius.
Given the following input array W of size 5 and radius 1 (n = 5, r = 1):
1 0 0 0 1
1 1 1 0 0
1 0 0 0 0
0 0 0 1 1
0 1 0 0 0
Output (using Python):
3 4 2 2 1
4 5 2 2 1
3 4 3 3 2
2 2 2 2 2
1 1 2 2 2
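Logic: The input value in the first row, first column is 1 and r is 1. For each element we look at the element itself and at every neighbour within distance 1 (left, right, top, bottom and the four diagonals), skip positions that fall outside the array, and sum the values. For the first element that is 1 + 0 + 1 + 1 = 3.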
Should not use any 3rd party library.
I did it using a for loop with an inner for loop, checking the neighbours of each element. Is there a better approach?
Optimization: for each 1 in W, increment the count of every location whose neighborhood contains it.
For W of size n x n this still takes O(n^2) steps. However, if W is sparse, i.e. the number of 1s (say k) is much smaller than n x n, then instead of roughly r x r x n x n steps for the approach described in the question, it takes about n x n + r x r x k steps, which is much lower when k << n x n.
Given r and W stored as
[[1, 0, 0, 0, 1],
[1, 1, 1, 0, 0],
[1, 0, 0, 0, 0],
[0, 0, 0, 1, 1],
[0, 1, 0, 0, 0]]
the following
output = [[0 for _ in range(len(W[0]))] for _ in range(len(W))]
for i in range(len(W)):
    for j in range(len(W[0])):
        if W[i][j] == 1:
            # This 1 contributes to every cell within radius r of (i, j)
            for off_i in range(-r, r + 1):
                for off_j in range(-r, r + 1):
                    if (0 <= i + off_i < len(W)) and (0 <= j + off_j < len(W[0])):
                        output[i + off_i][j + off_j] += 1
stores the required values in output.
For r = 1, the output is as required:
[[3, 4, 2, 2, 1],
[4, 5, 2, 2, 1],
[3, 4, 3, 3, 2],
[2, 2, 2, 2, 2],
[1, 1, 2, 2, 2]]
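When W is not sparse, another option that avoids third-party libraries is a two-dimensional prefix-sum table (summed-area table). This technique is not the one used in the answer above; the sketch below, with the hypothetical function name `density_map`, is offered as an alternative whose cost is about n x n regardless of r.

```python
def density_map(W, r):
    """Count the 1s within radius r of every cell using 2D prefix sums."""
    n_rows, n_cols = len(W), len(W[0])

    # P[i][j] holds the sum of W over rows 0..i-1 and columns 0..j-1.
    P = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for i in range(n_rows):
        for j in range(n_cols):
            P[i + 1][j + 1] = W[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]

    D = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        top, bottom = max(0, i - r), min(n_rows, i + r + 1)
        for j in range(n_cols):
            left, right = max(0, j - r), min(n_cols, j + r + 1)
            # Sum of the clipped window rows top..bottom-1, cols left..right-1
            D[i][j] = (P[bottom][right] - P[top][right]
                       - P[bottom][left] + P[top][left])
    return D

W = [[1, 0, 0, 0, 1],
     [1, 1, 1, 0, 0],
     [1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1],
     [0, 1, 0, 0, 0]]

for row in density_map(W, 1):
    print(row)
```

Each window sum is obtained from four table lookups, so a larger radius does not add work per cell.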

can't understand scipy.sparse.csr_matrix example

I can't wrap my head around csr_matrix examples in scipy documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html
Can someone explain how this example works?
>>> row = np.array([0, 0, 1, 2, 2, 2])
>>> col = np.array([0, 2, 2, 0, 1, 2])
>>> data = np.array([1, 2, 3, 4, 5, 6])
>>> csr_matrix((data, (row, col)), shape=(3, 3)).toarray()
array([[1, 0, 2],
       [0, 0, 3],
       [4, 5, 6]])
I believe this is following this format.
csr_matrix((data, (row_ind, col_ind)), [shape=(M, N)])
where data, row_ind and col_ind satisfy the relationship a[row_ind[k], col_ind[k]] = data[k].
What is a here?
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
From the above arrays, for k in 0..5:
a[row_ind[k], col_ind[k]] = data[k]
Here a is the matrix being built, so:
a[row[0], col[0]] = a[0, 0] = 1 (from data[0])
a[row[1], col[1]] = a[0, 2] = 2 (from data[1])
a[row[2], col[2]] = a[1, 2] = 3 (from data[2])
a[row[3], col[3]] = a[2, 0] = 4 (from data[3])
a[row[4], col[4]] = a[2, 1] = 5 (from data[4])
a[row[5], col[5]] = a[2, 2] = 6 (from data[5])
Arranging these entries in a 3 x 3 matrix a (every other entry is 0) gives:
    0  1  2
0 [1, 0, 2]
1 [0, 0, 3]
2 [4, 5, 6]
This is a sparse matrix, so it stores explicit indices and the values at those indices. For example, row=0 and col=0 correspond to 1 (the first entries of all three arrays in your example), hence the [0,0] entry of the matrix is 1, and so on.
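The relationship a[row_ind[k], col_ind[k]] = data[k] can be reproduced with plain lists. This is only an illustration of the rule, not how scipy stores the matrix internally.

```python
row = [0, 0, 1, 2, 2, 2]
col = [0, 2, 2, 0, 1, 2]
data = [1, 2, 3, 4, 5, 6]

# Start from an all-zero 3 x 3 matrix and fill in the listed entries.
a = [[0] * 3 for _ in range(3)]
for r, c, value in zip(row, col, data):
    a[r][c] = value

for line in a:
    print(line)
# [1, 0, 2]
# [0, 0, 3]
# [4, 5, 6]
```

Positions that never appear in (row, col) keep their initial 0, which is why a[0, 1], a[1, 0] and a[1, 1] are zero in the result.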
Representing the "data" in a 4 x 4 matrix:
data = np.array([10,0,5,99,25,9,3,90,12,87,20,38,1,8])
indices = np.array([0,1,2,3,0,2,3,0,1,2,3,1,2,3])
indptr = np.array([0,4,7,11,14])
'indptr' (index pointers) is an array of offsets into 'indices' (the column
indices) and 'data'.
The slice indptr[i]:indptr[i+1] gives the positions in 'indices' and 'data' that belong to row i.
The last entry, 14, is the length of the data, len(data), so another way of writing 'indptr' is
indptr = np.array([0,4,7,11,len(data)])
0,4 --> indices[0:4] gives the columns 0,1,2,3 of row 0
4,7 --> indices[4:7] gives the columns 0,2,3 of row 1
7,11 --> indices[7:11] gives the columns 0,1,2,3 of row 2
11,14 --> indices[11:14] gives the columns 1,2,3 of row 3
# Representing the data in a 4 x 4 matrix
a = csr_matrix((data, indices, indptr), shape=(4, 4), dtype=int)
a.todense()
matrix([[10,  0,  5, 99],
        [25,  0,  9,  3],
        [90, 12, 87, 20],
        [ 0, 38,  1,  8]])
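To make the role of indptr concrete, the three arrays can be decoded by hand. The sketch below is a plain-Python illustration with a hypothetical helper name, `csr_to_dense`; it does not use scipy and is not how scipy implements the conversion.

```python
def csr_to_dense(data, indices, indptr, n_cols):
    """Expand CSR arrays into a dense list of rows."""
    dense = []
    for i in range(len(indptr) - 1):
        row = [0] * n_cols
        # Entries indptr[i] .. indptr[i+1]-1 of data/indices belong to row i.
        for k in range(indptr[i], indptr[i + 1]):
            row[indices[k]] = data[k]
        dense.append(row)
    return dense

data = [10, 0, 5, 99, 25, 9, 3, 90, 12, 87, 20, 38, 1, 8]
indices = [0, 1, 2, 3, 0, 2, 3, 0, 1, 2, 3, 1, 2, 3]
indptr = [0, 4, 7, 11, 14]

for row in csr_to_dense(data, indices, indptr, n_cols=4):
    print(row)
# [10, 0, 5, 99]
# [25, 0, 9, 3]
# [90, 12, 87, 20]
# [0, 38, 1, 8]
```

Note that the 0 at position (0, 1) is stored explicitly in data, while the zeros at (1, 1) and (3, 0) are implied by the absence of an entry.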
Another Stackoverflow explanation
As far as I understand, the row and col arrays hold the indices that correspond to the non-zero values in the matrix: a[0, 0] = 1, a[0, 2] = 2, a[1, 2] = 3 and so on. As we have no indices for a[0, 1], a[1, 0] and a[1, 1], the corresponding values in the matrix are equal to 0.
Also, maybe this little intro will be helpful for you:
https://www.youtube.com/watch?v=Lhef_jxzqCg
@Rohit Pandey stated it correctly; I just want to add an example.
When most of the elements of a matrix are 0, we call it a sparse matrix. The sparse representation drops the zero elements, which saves memory and computing time. We only store the non-zero items together with their respective row and column indices. For example:
0 3 0 4
0 5 7 0
0 0 0 0
0 2 6 0
We build the sparse representation by listing, for each non-zero item, its row index, then its column index, and finally its value:
Row  Column  Value
0    1       3
0    3       4
1    1       5
1    2       7
3    1       2
3    2       6
By reversing the process we get the simple matrix form from the sparse form.
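Both directions can be sketched in a few lines of plain Python using the 4 x 4 matrix from this answer. The variable names are illustrative only.

```python
dense = [[0, 3, 0, 4],
         [0, 5, 7, 0],
         [0, 0, 0, 0],
         [0, 2, 6, 0]]

# Dense -> (row, column, value) triplets for the non-zero items.
triplets = [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != 0]
rows, cols, values = zip(*triplets)
print(rows)     # (0, 0, 1, 1, 3, 3)
print(cols)     # (1, 3, 1, 2, 1, 2)
print(values)   # (3, 4, 5, 7, 2, 6)

# Triplets -> dense: start from zeros and write each stored value back.
rebuilt = [[0] * 4 for _ in range(4)]
for i, j, v in triplets:
    rebuilt[i][j] = v
print(rebuilt == dense)   # True
```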

how to do this operation in numpy (chaining of tiling operation)?

I'm trying to generate a numpy array quickly, ideally without going through a Python loop.
I want to build a 1D index array that takes as input
[2,3] (the range lengths) and [2,4] (the repeat counts) and returns this:
[0,1,0,1,0,1,2,0,1,2,0,1,2,0,1,2]
Explanation:
I iterate from 0 to 2 (so [0,1] array) and repeat it 2 times : [0,1,0,1]
Then I iterate from 0 to 3 (so [0,1,2] array) and repeat it 4 times : [0,1,2,0,1,2,0,1,2,0,1,2]
Then I flattened everything.
Is there a way to do this fully in numpy?
For now I'm building each table separately with np.tile() and flattening everything afterwards, but I feel there should be a more efficient way that only translates to C function calls, with no Python loop.
Here is a vectorized solution:
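import numpy as np
def cycles(spec):
    # spec = (lengths, repeats): steps gets one entry per individual cycle
    steps = np.repeat(*spec)
    ps = steps.cumsum()  # end offset of each cycle in the output
    # At the start of every cycle but the first, store the previous cycle's length
    psj = np.zeros(ps[-1], int)
    psj[ps[:-1]] = steps[:-1]
    # The running sum of psj is the start offset of the current cycle
    return np.arange(ps[-1]) - psj.cumsum()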
Demo:
>>> cycles(((2,3),(2,4)))
array([0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
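A quick way to gain confidence in the vectorized version is to compare it with the straightforward tile-and-concatenate construction described in the question. In this sketch `cycles` is repeated from the answer so the block runs on its own, and `cycles_reference` is a hypothetical helper added for the comparison.

```python
import numpy as np

def cycles(spec):
    # Vectorized version from the answer above.
    steps = np.repeat(*spec)
    ps = steps.cumsum()
    psj = np.zeros(ps[-1], int)
    psj[ps[:-1]] = steps[:-1]
    return np.arange(ps[-1]) - psj.cumsum()

def cycles_reference(lengths, repeats):
    # Loop-based baseline: tile each range, then concatenate the pieces.
    parts = [np.tile(np.arange(n), k) for n, k in zip(lengths, repeats)]
    return np.concatenate(parts)

lengths, repeats = (2, 3), (2, 4)
fast = cycles((lengths, repeats))
slow = cycles_reference(lengths, repeats)
print(np.array_equal(fast, slow))   # True
```

The comparison assumes every length is at least 1; a length of 0 is not handled by this check.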
I am not entirely sure if this is what you want; here each tuple passed to func() contains first the range length and then the repeat count.
import numpy

def func(tups):
    # Preallocate the output: each tuple contributes length * repeats entries
    Arr = numpy.empty(numpy.sum([ele[0] * ele[1] for ele in tups]), dtype=int)
    i = 0
    for ele in tups:
        Arr[i:i + ele[0] * ele[1]] = numpy.tile(numpy.arange(ele[0]), ele[1])
        i += ele[0] * ele[1]
    return Arr

arr = func([(2, 3), (3, 4)])
print(arr)
# [0 1 0 1 0 1 0 1 2 0 1 2 0 1 2 0 1 2]

Efficient way of re-numbering elements in an array

I am reasonably new to python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.
I have formulated the problem this way:
each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I
After applying the crossover operation I can potentially generate children which violate one or more of these constraints, so I need to find a way to re-number the elements so that they retain their properties but fit the constraints.
for example:
parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]
*** crossover applied at "|" ***
child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]
child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
The problem lies with child 2 - if we use the max(child_2) way of calculating N we get a value of 5, but if we count the number of unique values then N = 4, which is what the value for N should be. What I am asking (in a very long winded way, granted) is what is a good, pythonic way of doing this:
child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2': [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]
child_2'' is there to illustrate that the values themselves don't matter: so long as all elements that share a value are mapped to the same new value, the constraints are satisfied.
Here is what I have tried so far:
value_map = []
for el in child:
    if el not in value_map:
        value_map.append(el)
for ii in range(0, len(child)):
    child[ii] = value_map.index(child[ii])
This approach works and returns a result similar to child_2'', but I can't imagine that it is very efficient, since it iterates over the string twice, so I was wondering if anyone has suggestions for how to make it better.
Thanks, and sorry for such a long post for such a simple question!
You will need to iterate over the list more than once; I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might end up with O(n^2) due to the repeated calls to index and not in, which are O(n) on a list.
Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list (O(1) on average), so this way the overall complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.
value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]
Or change the child in-place using a for loop:
for i, el in enumerate(child):
    child[i] = value_map[el]
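One detail worth knowing about the dict approach: iterating over a set does not follow the order in which values first appear in child, so the new labels can come out in any order. If labels should follow first appearance (as in child_2''), a possible variant is to deduplicate with dict.fromkeys, which keeps insertion order in Python 3.7+. This is a sketch with a hypothetical function name, not part of the original answer.

```python
def renumber_in_order(child):
    """Relabel values as 0, 1, 2, ... in order of first appearance."""
    # dict.fromkeys drops duplicates while preserving first-seen order.
    value_map = {el: i for i, el in enumerate(dict.fromkeys(child))}
    return [value_map[el] for el in child]

child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
print(renumber_in_order(child))   # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
```

Both passes use dictionary lookups, so the work stays proportional to the length of child.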
You can do it with a single loop like this:
value_map = []
result = []
for el in child:
    if el not in value_map:
        value_map.append(el)
    result.append(value_map.index(el))
One solution I can think of is:
Determine the value of N and the unused integers (this forces you to iterate over the array once).
Go through the array and, each time you meet a number greater than N, map it to an unused integer.
This forces you to go through the array twice, but it should be faster than your example (which searches value_map for every element of the array).
child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
used = set(child)
N = len(used) - 1
# Labels in 0..N that do not occur in child yet
unused = set(range(N + 1)) - used
value_map = dict()
for i, e in enumerate(child):
    if e <= N:
        continue
    if e not in value_map:
        value_map[e] = unused.pop()
    child[i] = value_map[e]
print(child) # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
I like @Selçuk Cihan's answer. It can also be done in place.
>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
...     el = child[i]
...     if el not in value_map:
...         value_map.append(el)
...     child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
I believe that this works, although I didn't test it for more than the single case that is given in the question.
The only thing that bothers me is that value_map appears three times in the code...
def renumber(individual):
    """
    >>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
    [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
    """
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]
Here is a fast solution, which iterates over the list only once.
a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
# b[v] is the new label for old value v (-1 = not seen yet); assumes every value is < len(a)
b = [-1]*len(a)
j = 0
for i in range(len(a)):
    if b[a[i]] == -1:
        b[a[i]] = j
        a[i] = j
        j += 1
    else:
        a[i] = b[a[i]]
print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
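If the individuals are held as NumPy arrays, which is an assumption since the question uses plain lists, np.unique with return_inverse=True performs the re-numbering in one call. The labels then follow the sorted order of the original values, which yields the child_2' form from the question.

```python
import numpy as np

child = np.array([2, 0, 1, 3, 0, 1, 0, 0, 5, 2])

# labels: sorted distinct values; renumbered: index of each element in labels
labels, renumbered = np.unique(child, return_inverse=True)
N = len(labels) - 1

print(labels)       # [0 1 2 3 5]
print(renumbered)   # [2 0 1 3 0 1 0 0 4 2]
print(N)            # 4
```

Because the labels are assigned by sorted value rather than by first appearance, values that are already in range keep their relative order.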
