The use of numpy.argmax - python

I'm here to inquire about the use of numpy.argmax
For instance, consider this array:
import numpy as np
a = np.arange(6).reshape(2,3)
b = np.argmax(a, axis = 0)
c = np.argmax(a, axis = 1)
print(a)
print(np.argmax(a))
print(b)
print(c)
Here's the output:
[[0 1 2]
 [3 4 5]]
5
[1 1 1]
[2 2]
I'm confused about the axis parameter of numpy.argmax. What does it do? Why does it return [1 1 1] when axis=0 and [2 2] when axis=1?

numpy.argmax() returns the position of the largest element in an array, optionally per column or per row (the axis argument). Without an axis, the array is flattened first, which is why plain np.argmax(a) returns 5. With axis=0 you get [1 1 1]: the row index of the largest element in each column. Since every element in row 1 is larger than the element above it in row 0, you get an array of three ones. Analogously, axis=1 gives the column index of the largest element in each row.

argmax returns the index of the maximum value along the axis you specify.
The exact comparisons it makes to get there:
axis=0 compares down each column: 3 > 0, 4 > 1, 5 > 2, giving [1 1 1].
axis=1 picks the largest in each row: 2 is the largest of [0 1 2] and 5 is the largest of [3 4 5], giving [2 2].
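Here is a small sketch (using the same array as the question) that restates both cases and also shows how np.take_along_axis can recover the winning values from the indices:
import numpy as np

a = np.arange(6).reshape(2, 3)

# axis=0 compares down each column; axis=1 compares across each row.
rows = np.argmax(a, axis=0)   # [1 1 1]: row 1 wins in every column
cols = np.argmax(a, axis=1)   # [2 2]:   column 2 wins in every row

# np.take_along_axis pairs each index with its axis to recover the values.
print(np.take_along_axis(a, cols[:, None], axis=1).ravel())  # [2 5]
print(np.take_along_axis(a, rows[None, :], axis=0).ravel())  # [3 4 5]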

Related

Is np.argpartition giving me the wrong results?

Take the following code:
import numpy as np
one_dim = np.array([2, 3, 1, 5, 4])
partitioned = np.argpartition(one_dim, 0)
print(f'Unpartitioned array: {one_dim}')
print(f'Partitioned array index: {partitioned}')
print(f'Partitioned array: {one_dim[partitioned]}')
The following output results:
Unpartitioned array: [2 3 1 5 4]
Partitioned array index: [2 1 0 3 4]
Partitioned array: [1 3 2 5 4]
The output for the partitioned array should be [1 2 3 5 4]. How is three on the left side of two? It seems to me the function is making an error, or am I missing something?
The second argument (kth) tells argpartition which index should end up in its sorted position after partitioning. With kth=0, the only guarantee is that index 0 of the partition (the value 1) is in its sorted position and that everything to its right is greater; the rest of the array is in no particular order.
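To see the guarantee with a less degenerate kth, here is a minimal sketch with kth=2; note that the order within each side is unspecified, so the commented result is just one possible outcome:
import numpy as np

one_dim = np.array([2, 3, 1, 5, 4])

# kth=2: after partitioning, the element at index 2 is the one that would
# sit there in a full sort (the value 3); everything to its left is smaller
# and everything to its right is larger, but neither side is itself sorted.
partitioned = np.argpartition(one_dim, 2)
print(one_dim[partitioned])  # one possible result: [1 2 3 5 4]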

I have a 2d array. I need to make a loop that replaces the first 2 rows, then the next 2 rows, and so on with ones, printing the result each time until the loop ends

b = np.random.randint(0,10, (6,3))
I tried this code, but it gives `ValueError: operands could not be broadcast together with shapes (2,3) () (6,3)`:
step = 2
r1 = 0
r2 = 2
while r2 <= len(b):
    c = np.where(b[r1:r2] >= 0, 1, b)
    print(c)
    r1+ = step
    r2+ = step
I think the problem is in the condition of np.where: it creates an array with a shape that is incompatible with the shape of b.
What I need is for the code to receive array b and return 3 arrays of the same size as b, but with two rows substituted by 1s each time, like this:
[[1 1 1]
 [1 1 1]
 [6 3 4]
 [2 9 3]
 [6 9 2]
 [8 1 0]]
[[3 2 8]
 [3 8 5]
 [1 1 1]
 [1 1 1]
 [6 9 2]
 [8 1 0]]
[[3 2 8]
 [3 8 5]
 [6 3 4]
 [2 9 3]
 [1 1 1]
 [1 1 1]]
My tutor told me to try it with the np.where function, but it seems that this function doesn't support the kind of condition I'm trying to feed it. Maybe there is another way to get the desired output. All examples I googled work with random values of the array, not with whole rows. In pandas it's easier, but I need NumPy code to feed the output to a neural network. The ones will be treated by it as empty values, but the size of the array will always be the same, thus not producing errors.
You are getting a ValueError because the size of b[0:2] is not the same as the size of b.
print(b.shape)
# (6, 3)
print(b[0:2].shape)
# (2, 3)
The documentation for numpy.where states that the way the condition works is "Where True, yield x, otherwise yield y." Thus, you need to be able to broadcast x and y onto the size of your condition. In your example, you can't broadcast (6,3) onto (2,3) and hence the error.
You need things to be the same size. For example, c = np.where(b[0:2] >= 0, 1, b[0:2]) would not give you an error.
However, if you want to step through your array b, then you need something other than b[0:2]; otherwise it will just keep repeating that first part of your array. I think you probably want b[r1:r2].
Also, I notice that you have r1+ = step instead of r1 += step, which will also spit out an error. Note that you don't actually need both r1 and r2 since their offset is step.
Putting all this together, we can adjust your code to give you something that works:
import numpy as np
b = np.random.randint(0,5, (6,3))
step = 2
r1 = 0
while r1 <= len(b) - step:
    c = np.copy(b)
    c[r1:r1+step] = np.where(b[r1:r1+step] >= 0, 1, b[r1:r1+step])
    print(c)
    r1 += step
Or you could do it with a for loop instead of a while loop:
import numpy as np
b = np.random.randint(0,5, (6,3))
step = 2
for r1 in range(0, len(b), step):
    c = np.copy(b)
    c[r1:r1+step] = np.where(b[r1:r1+step] >= 0, 1, b[r1:r1+step])
    print(c)
Resulting output:
[[1 1 1]
 [1 1 1]
 [3 2 2]
 [1 1 2]
 [3 3 0]
 [3 2 2]]
[[4 0 2]
 [4 0 0]
 [1 1 1]
 [1 1 1]
 [3 3 0]
 [3 2 2]]
[[4 0 2]
 [4 0 0]
 [3 2 2]
 [1 1 2]
 [1 1 1]
 [1 1 1]]
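As a side note, the condition b[r1:r1+step] >= 0 is always true for these non-negative random integers, so np.where isn't strictly needed; a plain slice assignment (a sketch under that assumption) gives the same output:
import numpy as np

b = np.random.randint(0, 5, (6, 3))
step = 2

for r1 in range(0, len(b), step):
    c = np.copy(b)        # keep b intact; modify a copy each iteration
    c[r1:r1 + step] = 1   # overwrite the current block of rows with ones
    print(c)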

Getting the Highest Sum of the columns and rows of a 4x4 matrix

I currently have to figure out which row and which column has the highest sum of integers in my 4x4 matrix. The issue is that the matrix has to be randomly generated each time. Here is my code:
import random

def nativeSolution():
    array = []
    for x in range(4):
        array.append([])
        for i in range(4):
            array[x].append(random.randint(0,1))
            print(array[x][i], end='')
        print()
    for row in array:
        rowArray = []
        rowArray[row].append(sum(row))
        print(rowArray)
My task, as I said, is to take the randomly generated rows and find the one with the most 1s, and then do the same with the columns. Thank you!
As I said in the comment, NumPy will be very helpful in this situation.
If you don't want to use it, you have to reimplement every method I used here.
import numpy as np
arr = np.random.randint(0, 2, (4, 4))
print(arr)
print('Highest row (index={}, sum={}): {}'.format(
    arr.sum(axis=1).argmax(),
    arr.sum(axis=1).max(),
    arr[arr.sum(axis=1).argmax()])
)
print('Highest column (index={}, sum={}): {}'.format(
    arr.sum(axis=0).argmax(),
    arr.sum(axis=0).max(),
    arr[:, arr.sum(axis=0).argmax()])
)
# [[1 1 1 0]
#  [1 0 1 1]
#  [0 0 0 1]
#  [1 1 0 1]]
# Highest row (index=0, sum=3): [1 1 1 0]
# Highest column (index=0, sum=3): [1 1 0 1]
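If you do want to avoid NumPy, a minimal pure-Python sketch of the same idea (row sums via sum, column sums via zip(*arr) to transpose) could look like this:
import random

arr = [[random.randint(0, 1) for _ in range(4)] for _ in range(4)]

row_sums = [sum(row) for row in arr]
col_sums = [sum(col) for col in zip(*arr)]   # zip(*arr) transposes the matrix

best_row = row_sums.index(max(row_sums))
best_col = col_sums.index(max(col_sums))
print('Highest row (index={}, sum={}): {}'.format(best_row, row_sums[best_row], arr[best_row]))
print('Highest column (index={}, sum={}): {}'.format(best_col, col_sums[best_col], [r[best_col] for r in arr]))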
There's a pretty good chance you'll have two or more rows or columns tied for the same maximum number of 1s. You can use argwhere to filter the array by that maximum and get back all of the top rows/columns.
a = np.random.randint(2, size=(4,4))
print(a,'\n')
print(f'Max Col(s) {np.argwhere(a.sum(axis=0) == a.sum(axis=0).max()).flatten()}')
print(f'Max Row(s) {np.argwhere(a.sum(axis=1) == a.sum(axis=1).max()).flatten()}')
Output
[[0 0 1 1]
 [0 0 0 0]
 [0 1 1 1]
 [1 1 0 0]]
Max Col(s) [1 2 3]
Max Row(s) [2]

How to get the indexes of the top 2 values of each row in a 2-D numpy array, with a specific area excluded?

I have a 2-D array for example:
p = np.array([[21,2,3,1,12,13],
              [4,5,6,14,15,16],
              [7,8,9,17,18,19]])
b = np.argpartition(p, np.argmin(p, axis=1))[:, -2:]
com = np.ones([3,6], dtype=int)
com[np.arange(com.shape[0])[:,None],b] = 0
print(com)
b holds the indices of the top 2 values of each row in p:
b = [[0 5]
     [4 5]
     [4 5]]
com is a matrix of ones, the same size as p; the elements at the indices in b are set to 0.
So the result is:
com = [[0 1 1 1 1 0]
       [1 1 1 1 0 0]
       [1 1 1 1 0 0]]
Now I have one more constraint: the numbers in the area p[0:2, 0:2], i.e.
[[21 2]
 [4 5]]
should not be considered, so the result should be:
b = [[4 5]
     [4 5]
     [4 5]]
com = [[1 1 1 1 0 0]
       [1 1 1 1 0 0]
       [1 1 1 1 0 0]]
How can I do this? Should I use a mask or something similar?
Thanks in advance!
Just set the values in that slice to a value low enough that it cannot be among the two largest, and then use argpartition. Since p holds integers, cast to float first so the slice can hold -np.inf:
out = p.astype(float)
out[0:2, 0:2] = -np.inf
np.argpartition(out, [-2, -1])[:, -2:]
array([[4, 5],
       [4, 5],
       [4, 5]])
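From there, b can be fed into the masking step from the question to build com; a sketch of the full pipeline:
import numpy as np

p = np.array([[21, 2, 3, 1, 12, 13],
              [4, 5, 6, 14, 15, 16],
              [7, 8, 9, 17, 18, 19]])

out = p.astype(float)
out[0:2, 0:2] = -np.inf                      # exclude the top-left 2x2 block
b = np.argpartition(out, [-2, -1])[:, -2:]   # top-2 column indices per row

com = np.ones(p.shape, dtype=int)
com[np.arange(p.shape[0])[:, None], b] = 0   # zero out the top-2 positions
print(com)
# [[1 1 1 1 0 0]
#  [1 1 1 1 0 0]
#  [1 1 1 1 0 0]]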

Performance of NumPy for algorithms concerning individual elements of an array

I'm interested in the performance of NumPy when it comes to algorithms that check whether a condition is true for an element and its associated elements (e.g. its neighbouring elements) and assign a value according to the condition.
An example might be (I'm making this up now):
I generate a 2d array of 1s and 0s, randomly.
Then I check whether the first element of the array is the same as its neighbors.
If the matching ones are the majority, I switch (0 -> 1 or 1 -> 0) that particular element.
And I proceed to the next element.
I guess that this kind of element-wise condition and element-wise operation is pretty slow with NumPy; is there a way I can make the performance better?
For example, would creating the array with dtype bool and adjusting the code help?
Thanks in advance.
Maybe http://www.scipy.org/Cookbook/GameOfLifeStrides helps you.
It looks like you are doing some kind of image processing; you can try scipy.ndimage.
from scipy.ndimage import convolve
import numpy as np
np.random.seed(0)
x = np.random.randint(0, 2, (5, 5))
print(x)
w = np.ones((3, 3), dtype=np.int8)
w[1, 1] = 0
y = convolve(x, w, mode="constant")
print(y)
the outputs are:
[[0 1 1 0 1]
 [1 1 1 1 1]
 [1 0 0 1 0]
 [0 0 0 0 1]
 [0 1 1 0 0]]
[[3 4 4 5 2]
 [3 5 5 5 3]
 [2 4 4 4 4]
 [2 3 3 3 1]
 [1 1 1 2 1]]
y is the sum of the neighbors of every element. Do the same convolution with an array of all ones, and you get the number of neighbors of every element:
>>> n = convolve(np.ones((5, 5), np.int8), w, mode="constant")
>>> n
[[3 5 5 5 3]
 [5 8 8 8 5]
 [5 8 8 8 5]
 [5 8 8 8 5]
 [3 5 5 5 3]]
Then you can do element-wise operations with x, y and n to get your result.
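For instance, here is a sketch that finishes the made-up rule from the question (flip a cell when a strict majority of its neighbors share its value). Note it applies the rule to every cell at once, which is what makes it fast but is a slightly different update order than the sequential one the question describes:
from scipy.ndimage import convolve
import numpy as np

np.random.seed(0)
x = np.random.randint(0, 2, (5, 5))

w = np.ones((3, 3), dtype=np.int8)
w[1, 1] = 0
y = convolve(x, w, mode="constant")                          # neighbor sums
n = convolve(np.ones((5, 5), np.int8), w, mode="constant")   # neighbor counts

# A cell of 1 has y matching neighbors; a cell of 0 has n - y.
same = np.where(x == 1, y, n - y)

# Flip cells whose matching neighbors form a strict majority.
result = np.where(same > n / 2, 1 - x, x)
print(result)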
