Use index of maximum value on a defined axis - python

I want to extract the values of an array "B" at the same index of the maximum of each line of the matrix "A". to find the index I use the numpy function "numpy.argmax" like this:
>>> A=numpy.asarray([[0,1,6],[3,2,4]]);A
array([[0, 1, 6],
[3, 2, 4]])
>>> argA=numpy.argmax(A,axis=1);argA
array([2, 2])
The problem is that I don't know how to use "argA" to extract the values in the array "B"

Each entry in argA corresponds to the index position of a maximum value within a corresponding row. The rows are not explicit (due to using axis=1), but correspond to the index for each entry. So you need to add them in to get at the elements you are after.
>>> A[[0,1], argA]
array([6, 4])
So:
>>> B
array([[ 9, 8, 2],
[ 3, 4, 5]])
>>> B[[0,1], argA] = 84,89
>>> B
array([[ 9, 8, 84],
[ 3, 4, 89]])
to generalise use:
>>> B[np.arange(A.shape[0]),argA]

You can use an array directly as an index - this is actually as simple as:
B[:, arga]
eg:
>>> A[:,argA]
array([[6, 6],
[4, 4]])

This may appear a little inefficient, but you could use:
A.take(argA, axis=1).diagonal()
Instead of A.take you can use A[:, argA] but take is more explicit about the axis.
You could use ravel_multi_index to convert to flat indices:
A.take(np.ravel_multi_index((np.arange(len(argA)), argA), A.shape))

Related

Given the indexes corresponding to each row, get the corresponding elements from a matrix

Given indexes for each row, how to return the corresponding elements in a 2-d matrix?
For instance, In array of np.array([[1,2,3,4],[4,5,6,7]]) I expect to see the output [[1,2],[4,5]] given indxs = np.array([[0,1],[0,1]]). Below is what I've tried:
a= np.array([[1,2,3,4],[4,5,6,7]])
indxs = np.array([[0,1],[0,1]]) #means return the elements located at 0 and 1 for each row
#I tried this, but it returns an array with shape (2, 2, 4)
a[idxs]
The reason you are getting two times your array is that when you do a[[0,1]] you are selecting the rows 0 and 1 from your array a, which are indeed your entire array.
In[]: a[[0,1]]
Out[]: array([[1, 2, 3, 4],
[4, 5, 6, 7]])
You can get the desired output using slides. That would be the easiest way.
a = np.array([[1,2,3,4],[4,5,6,7]])
a[:,0:2]
Out []: array([[1, 2],
[4, 5]])
In case you are still interested on indexing, you could also get your output doing:
In[]: [list(a[[0],[0,1]]),list(a[[1],[0,1]])]
Out[]: [[1, 2], [4, 5]]
The NumPy documentation gives you a really nice overview on how indexes work.
In [120]: indxs = np.array([[0,1],[0,1]])
In [121]: a= np.array([[1,2,3,4],[4,5,6,7]])
...: indxs = np.array([[0,1],[0,1]]) #
You need to provide an index for the first dimension, one that broadcasts with with indxs.
In [122]: a[np.arange(2)[:,None], indxs]
Out[122]:
array([[1, 2],
[4, 5]])
indxs is (2,n), so you need a (2,1) array to give a (2,n) result

Numpy increment array indexed array? [duplicate]

I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However b can contain duplicates which are ignored whereas I would like to be taken into account. I would like to avoid for looping b. To illustrate it:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 3, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?
To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore to update the first row of A, the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The documents for the ufunc.at method can be found here.
One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]

Increase numpy array elements using array as index

I am trying to efficiently update some elements of a numpy array A, using another array b to indicate the indexes of the elements of A to be updated. However b can contain duplicates which are ignored whereas I would like to be taken into account. I would like to avoid for looping b. To illustrate it:
>>> A = np.arange(10).reshape(2,5)
>>> A[0, np.array([1,1,1,2])] += 1
>>> A
array([[0, 2, 3, 3, 4],
[5, 6, 7, 8, 9]])
whereas I would like the output to be:
array([[0, 3, 3, 3, 4],
[5, 6, 7, 8, 9]])
Any ideas?
To correctly handle the duplicate indices, you'll need to use np.add.at instead of +=. Therefore to update the first row of A, the simplest way would probably be to do the following:
>>> np.add.at(A[0], [1,1,1,2], 1)
>>> A
array([[0, 4, 3, 3, 4],
[5, 6, 7, 8, 9]])
The documents for the ufunc.at method can be found here.
One approach is to use numpy.histogram to find out how many values there are at each index, then add the result to A:
A[0, :] += np.histogram(np.array([1,1,1,2]), bins=np.arange(A.shape[1]+1))[0]

Index a numpy array using a list in a pythonic way

Imagine that I have a python array like
array = [[2,3,4],[5,6,7],[8,9,10]]
And a list
list = [0,2,1]
I basically want a one liner to extract the indexed elements from the array given by the list
For example, with the given array and list:
result = [2,7,9]
My kneejerk option was
result = array[:, list]
But that did not work
I know that a for cycle should do it, I just want to know if there is some indexing that might do the trick
Something like this?
In [24]: a
Out[24]:
array([[ 2, 3, 4],
[ 5, 6, 7],
[ 8, 9, 10]])
In [25]: lis
Out[25]: [0, 2, 1]
In [26]: a[np.arange(len(a)), lis]
Out[26]: array([2, 7, 9])
Use enumerate to create row indices and unzip (zip(*...)) this collection to get the row indices (the range [0, len(list))) and the column indices (lis) :
a[zip(*enumerate(lis))]

removing columns from an array in Python

I have a 2D Python array, from which I would like to remove certain columns, but I don't know how many I would like to remove until the code runs.
I want to loop over the columns in the original array, and if the sum of the rows in any one column is about a certain value I want to remove the whole column.
I started to do this the following way:
for i in range(original_number_of_columns)
if sum(original_array[:,i]) < certain_value:
new_array[:,new_index] = original_array[:,i]
new_index+=1
But then I realised that I was going to have to define new_array first, and tell Python what size it is. But I don't know what size it is going to be beforehand.
I have got around it before by firstly looping over the columns to find out how many I will lose, then defining the new_array, and then lastly running the loop above - but obviously there will be much more efficient ways to do such things!
Thank you.
You can use the following:
import numpy as np
a = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
)
print a.compress(a.sum(0) > 15, 1)
[[3]
[6]
[9]]
without numpy
my_2d_table = [[...],[...],...]
only_cols_that_sum_lt_x = [col for col in zip(*my_2d_table) if sum(col) < some_threshold]
new_table = map(list,zip(*only_cols_that_sum_lt_x))
with numpy
a = np.array(my_2d_table)
a[:,np.sum(a,0) < some_target]
I suggest using numpy.compress.
>>> import numpy as np
>>> a = np.array([[1, 2, 3], [1, -3, 2], [4, 5, 7]])
>>> a
array([[ 1, 2, 3],
[ 1, -3, 2],
[ 4, 5, 7]])
>>> a.sum(axis=0) # sums each column
array([ 6, 4, 12])
>>> a.sum(0) < 5
array([ False, True, False], dtype=bool)
>>> a.compress(a.sum(0) < 5, axis=1) # applies the condition to the elements of each row so that only those elements in the rows whose column indices correspond to True values in the condition array will be kept
array([[ 2],
[-3],
[ 5]])

Categories