How does the function 'where' generate this array? [duplicate] - python

This question already has answers here:
How do I use numpy.where()? What should I pass, and what does the result mean? [closed]
(2 answers)
Closed 4 years ago.
>>> x = np.arange(9.).reshape(3, 3)
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))
What exactly does x > 5 mean? The resulting array seems mysterious.

It's a tuple with row and column indices. x > 5 returns a boolean array of the same shape as x, with elements set to True where the condition is fulfilled and False otherwise. According to the documentation, np.where falls back on condition.nonzero() when given no other arguments. In your example, all elements greater than 5 happen to be in row 2, and all three columns of that row fulfill the condition, hence [2, 2, 2] (rows) and [0, 1, 2] (columns). Note that you can use this result to index the original array:
>>> x[np.where(x > 5)]
array([6., 7., 8.])
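To make the last point concrete: indexing with the tuple returned by np.where picks the same elements as indexing with the boolean mask directly. A minimal sketch, reusing the x from the question (the names mask, via_where and via_mask are just illustrative):

```python
import numpy as np

x = np.arange(9.).reshape(3, 3)
mask = x > 5                    # boolean array with the same shape as x
via_where = x[np.where(mask)]   # index with the (row indices, column indices) tuple
via_mask = x[mask]              # index directly with the boolean mask
print(via_where)                # [6. 7. 8.]
print(np.array_equal(via_where, via_mask))  # True
```

When you only need the values, x[x > 5] is the shorter spelling; np.where is useful when you also want the positions.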

The usual syntax is np.where(condition, res_if_true, res_if_false). With only the first argument, this is a special case described in the docs:
When only condition is provided, this function is a shorthand for
np.asarray(condition).nonzero().
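For contrast, a small sketch of both forms on the x from the question: the three-argument form returns values with the same shape as x (no indices), while the one-argument form matches nonzero(). The fill value -1.0 is an arbitrary choice for illustration.

```python
import numpy as np

x = np.arange(9.).reshape(3, 3)

# Three-argument form: take x where the condition holds, otherwise -1.0.
# The result has the same shape as x; no indices are returned.
replaced = np.where(x > 5, x, -1.0)
print(replaced)

# One-argument form: the same as calling nonzero() on the condition.
rows, cols = np.where(x > 5)
rows2, cols2 = np.asarray(x > 5).nonzero()
print(np.array_equal(rows, rows2) and np.array_equal(cols, cols2))  # True
```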
So first calculate x > 5:
arr = x > 5
print(arr)
# [[False False False]
#  [False False False]
#  [ True  True  True]]
Since it's already an array, calculate arr.nonzero():
print(arr.nonzero())
# (array([2, 2, 2], dtype=int64), array([0, 1, 2], dtype=int64))
This returns the indices of the elements that are non-zero (True). The first element of the tuple holds the coordinates along axis=0 and the second the coordinates along axis=1, i.e. all values in row index 2 (the third and final row) are greater than 5.
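If you want the coordinates paired up as (row, column), you can zip the two arrays, or use np.argwhere, which returns the same information transposed: one row per match. A short sketch on the same x:

```python
import numpy as np

x = np.arange(9.).reshape(3, 3)
rows, cols = np.where(x > 5)                   # axis-0 indices, axis-1 indices
pairs = list(zip(rows.tolist(), cols.tolist()))
print(pairs)                                   # [(2, 0), (2, 1), (2, 2)]

# np.argwhere gives one (row, col) pair per line instead of two separate arrays.
coords = np.argwhere(x > 5)
print(coords.tolist())                         # [[2, 0], [2, 1], [2, 2]]
```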

Related

How do I find the index of the row in an array whose first column has a specific value and whose second column has the max value?

Suppose I have an array a with shape (n, 2) like this:
a = np.array([
[6, 185.153],
[6, 9.50864],
[1, 9.31425],
[1, 16.4629],
[6, 19.6971],
[1, 2.02113],
[1, 14.0193],
[5, 2.92495],
[3, 56.0731],
[3, 77.6965],
])
Now I need to find the index of the row whose first column equals a specific value M (for example 3) and whose second column is the maximum among all rows with first column equal to M. For example, in the above array the index would be 9. I used the following code, but it does not work and the output is wrong. Do you know what the problem is?
indx_nonremoved=np.where([minimum_merge.max(axis=1) ==3 ])[1]
One option is to use a list comprehension and the key parameter of the built-in max() function:
a = [[6, 185.153],
     [6, 9.50864],
     [1, 9.31425],
     [1, 16.4629],
     [6, 19.6971],
     [1, 2.02113],
     [1, 14.0193],
     [5, 2.92495],
     [3, 56.0731],
     [3, 77.6965]]
print(a.index(max([i for i in a if i[0] == 3], key=lambda x: x[1])))
# If a is a numpy.ndarray, it has no .index method; compare whole rows instead:
# best = max([i for i in a if i[0] == 3], key=lambda x: x[1])
# print(np.where((a == best).all(axis=1))[0])
Output:
9
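A fully vectorized alternative (a different technique from the answers here, swapped in for illustration) is the three-argument np.where: replace the second-column value with -inf in every row whose first column is not M, then take the argmax. The helper name row_of_max is hypothetical, and the sketch assumes at least one row matches M (it raises otherwise, because argmax over all -inf would silently return 0).

```python
import numpy as np

a = np.array([
    [6, 185.153],
    [6, 9.50864],
    [1, 9.31425],
    [1, 16.4629],
    [6, 19.6971],
    [1, 2.02113],
    [1, 14.0193],
    [5, 2.92495],
    [3, 56.0731],
    [3, 77.6965],
])

def row_of_max(a, m):
    """Hypothetical helper: row whose first column equals m and whose second column is largest."""
    selected = a[:, 0] == m
    if not selected.any():
        raise ValueError(f"no row has first column equal to {m}")
    # Non-matching rows get -inf, so argmax can only land on a matching row.
    candidates = np.where(selected, a[:, 1], -np.inf)
    return int(np.argmax(candidates))

print(row_of_max(a, 3))  # 9
```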
List comprehensions are certainly one approach. If you already use numpy, and you have a lot of data, then numpy methods will be faster...
I'll make a mask for the first column, to select those rows.
Then I'll take the argmax of just those rows, using the values from the second column.
Finally, I'll map the index from that selection back to the whole array.
The result is the row of a that contains the maximum.
# a = np.asarray(a)
mask = (a[:,0] == 3)
# array([False, False, False, False, False, False, False, False, True, True])
(indices,) = np.nonzero(mask)
# array([8, 9], dtype=int64)
maxindex = np.argmax(a[mask, 1])
# 1
indices[maxindex]
# 9
So row 9 is the row that fits your criteria.
This still works if you reorder the rows of a in any way, because the mask and the index mapping always refer to the current row order.
The same can be done using np.argsort to get all rows sorted, instead of getting just the max.
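A minimal sketch of that argsort variant, assuming the same a and M = 3: argsort orders the selected rows by their second column, and the same index mapping turns those positions back into row numbers of a.

```python
import numpy as np

a = np.array([
    [6, 185.153],
    [6, 9.50864],
    [1, 9.31425],
    [1, 16.4629],
    [6, 19.6971],
    [1, 2.02113],
    [1, 14.0193],
    [5, 2.92495],
    [3, 56.0731],
    [3, 77.6965],
])

mask = a[:, 0] == 3
(indices,) = np.nonzero(mask)     # row numbers whose first column is 3
order = np.argsort(a[mask, 1])    # ascending order of their second-column values
sorted_rows = indices[order]      # map positions in the selection back to rows of a
print(sorted_rows)                # [8 9]
print(sorted_rows[-1])            # 9: the last entry holds the maximum
```

Reversing with sorted_rows[::-1] gives the rows in descending order instead.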

Arranging an array without loops, list comprehensions, or recursion

Definition: Arranged array
An arranged array is a 2-D array of square shape (N x N) such that for every cell of the matrix: A[I,J] > A[I,J+1] AND A[I,J] > A[I+1,J]
I have an assignment to write a func that:
gets a numpy array and returns
True if - the given array is an arranged array
False - otherwise
note: We CANNOT use loops, list comps OR recursion. the point of the task is to use numpy things.
Assumptions: we can assume that the array isn't empty and has no missing (NA) values, and that all of the cells are numeric.
My code isn't very numpy oriented.. :
def is_square_ordered_matrix(A):
    # Checking if the dimension is 2
    if A.ndim != 2:
        return False
    # Checking if it is a squared matrix
    if A.shape[0] != A.shape[1]:
        return False
    # Saving the original shape to reshape later
    originalDim = A.shape
    # Making it a dim of 1 to use it as a list
    arrayAsList = list((A.reshape((1,originalDim[0]**2)))[0])
    # Keeping original order before sorting
    originalArray = arrayAsList[:]
    # Using the values of the list as keys to see if there are doubles
    valuesDictionary = dict.fromkeys(arrayAsList, 1)
    # If len is different, means there are doubles and i should return False
    if len(arrayAsList) != len(valuesDictionary):
        return False
    # If sorted list is equal to original list it means the original is already ordered and i should return True
    arrayAsList.sort(reverse=True)
    if originalArray == arrayAsList:
        return True
    else:
        return False
True example:
is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
False example:
is_square_ordered_matrix(np.arange(9).reshape((3,3)))
is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))
Simple comparison:
>>> def is_square_ordered_matrix(a):
...     return a.shape[0] == a.shape[1] and np.all(a[:-1] > a[1:]) and np.all(a[:, :-1] > a[:, 1:])
...
>>> is_square_ordered_matrix(np.arange(8,-1,-1).reshape((3,3)))
True
>>> is_square_ordered_matrix(np.arange(9).reshape((3,3)))
False
>>> is_square_ordered_matrix(np.arange(5,-1,-1).reshape((3,2)))
False
First, compare a[:-1] with a[1:], which compares the elements of each row with the elements of the next row, and then use np.all to check that every comparison is True:
>>> a = np.arange(8,-1,-1).reshape((3,3))
>>> a[:-1] # First and second rows
array([[8, 7, 6],
[5, 4, 3]])
>>> a[1:] # Second and third rows
array([[5, 4, 3],
[2, 1, 0]])
>>> a[:-1] > a[1:]
array([[ True, True, True],
[ True, True, True]])
>>> np.all(a[:-1] > a[1:])
True
Then compare a[:,:-1] with a[:, 1:], which will compare the columns:
>>> a[:, :-1] # First and second columns
array([[8, 7],
[5, 4],
[2, 1]])
>>> a[:, 1:] # Second and third columns
array([[7, 6],
[4, 3],
[1, 0]])
>>> a[:, :-1] > a[:, 1:]
array([[ True, True],
[ True, True],
[ True, True]])
The array is arranged exactly when both the row comparison and the column comparison are all True, which is what the function above checks (together with the square shape).
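One gap in the one-liner: a 1-D input makes a.shape[1] raise IndexError instead of returning False. A minimal sketch, assuming you want any non-2-D input rejected, that adds an ndim guard and otherwise uses the same slice comparisons (still no loops, list comprehensions, or recursion):

```python
import numpy as np

def is_square_ordered_matrix(a):
    a = np.asarray(a)
    # Reject anything that is not a 2-D square array before touching a.shape[1].
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    rows_decrease = np.all(a[:-1] > a[1:])        # each row > the row below it
    cols_decrease = np.all(a[:, :-1] > a[:, 1:])  # each column > the column to its right
    return bool(rows_decrease and cols_decrease)

print(is_square_ordered_matrix(np.arange(8, -1, -1).reshape((3, 3))))  # True
print(is_square_ordered_matrix(np.arange(9).reshape((3, 3))))          # False
print(is_square_ordered_matrix(np.arange(5, -1, -1).reshape((3, 2))))  # False
print(is_square_ordered_matrix(np.arange(3)))                          # False
```

The bool() call turns NumPy's np.bool_ into a plain Python True/False, which is what the assignment asks to return.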

Using np.where for nested lists

I am trying to use np.where() function with nested lists.
I would like to find an index with a given condition of the first layer of the nested list.
For example, if I have the following code
arr = [[1,1], [2,2],[3,3]]
a = np.where(arr == [2,2])
then ideally I would like the code to return 'a' as 1,
since [2,2] is at index 1 of the nested list.
However, I just get an empty array back as a result.
Of course, I can make it work easily by implementing external for loop such as
for n in range(len(arr)):
    if arr[n] == [2,2]:
        a = n
but I would like to implement this simply within the function np.where(write the entire code here).
Is there a way to do this?
Well, you can write your own function to do so. You'll need to find every row equal to what you are looking for, and then get the indices of the found rows (you can use where for this).
numpy comparison
You can use a comparison operator to check whether each element satisfies the condition. For example:
np_arr = np.array(
[1, 2, 3, 4, 5]
)
print(np_arr < 3)
This returns a boolean array in which each element is True where the condition is satisfied and False otherwise:
[ True True False False False]
For a 2D array you'll get a 2D boolean array:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print(np_arr == to_find)
The result is:
[[False False]
[ True True]
[False False]
[ True True]]
Now we are looking for rows in which all values are True, so we can use the all method of ndarray. We also pass the axis to reduce over (axis=0, axis=1, or both if axis is omitted). We want to check each row across its columns, which is axis=1:
to_find = np.array([2, 2])
np_arr = np.array(
[
[1, 1],
[2, 2],
[3, 3],
[2, 2]
]
)
print((np_arr == to_find).all(axis=1))
The result is:
[False True False True]
Get indices of Trues
At the end you are looking for indices where the values are True:
np.where((np_arr == to_find).all(axis=1))
The result would be:
(array([1, 3]),)
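The two steps can be wrapped in a small function. A minimal sketch; the name find_row_indices is hypothetical, and it assumes arr is 2-D and target has one value per column:

```python
import numpy as np

def find_row_indices(arr, target):
    """Hypothetical helper: indices of the rows of a 2-D array-like that equal target."""
    arr = np.asarray(arr)
    target = np.asarray(target)
    matches = (arr == target).all(axis=1)  # one True/False per row
    return np.flatnonzero(matches)         # same as np.where(matches)[0]

print(find_row_indices([[1, 1], [2, 2], [3, 3]], [2, 2]))          # [1]
print(find_row_indices([[1, 1], [2, 2], [3, 3], [2, 2]], [2, 2]))  # [1 3]
```

np.flatnonzero returns the index array directly, so you don't need the [0] that unpacks the tuple from np.where.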
The best solution is the one mentioned by @Michael Szczesny, but you can do this with np.where too:
a = np.where(np.array(arr) == [2, 2])[0]
resulted_ind = np.where(np.bincount(a) == 2)[0] # --> [1]
numpy runs in Python, so you can use both the basic Python lists and numpy arrays (which are more like MATLAB matrices)
A list of lists:
In [43]: alist = [[1,1], [2,2],[3,3]]
A list has an index method, which tests against each element of the list (elements here are 2 element lists):
In [44]: alist.index([2,2])
Out[44]: 1
In [45]: alist.index([2,3])
Traceback (most recent call last):
Input In [45] in <cell line: 1>
alist.index([2,3])
ValueError: [2, 3] is not in list
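alist == [2,2] returns False, because == on lists compares the whole list of lists with [2,2], and the two lists are not equal. That single False is why the question's np.where(arr == [2,2]) does not find index 1.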
If we make an array from that list:
In [46]: arr = np.array(alist)
In [47]: arr
Out[47]:
array([[1, 1],
[2, 2],
[3, 3]])
we can do an == test, but it compares the numeric elements one by one.
In [48]: arr == np.array([2,2])
Out[48]:
array([[False, False],
[ True, True],
[False, False]])
Underlying this comparison is the concept of broadcasting, which allows it to compare a (3,2) array with a (2,) array (a 2-D with a 1-D). Here it's trivial, but it can be much more complicated.
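A small sketch of what broadcasting does here: the (2,) shape is aligned with the trailing axis of the (3, 2) shape, and the comparison behaves as if target were copied once per row. It assumes NumPy 1.20 or newer, where np.broadcast_shapes is available.

```python
import numpy as np

arr = np.array([[1, 1], [2, 2], [3, 3]])
target = np.array([2, 2])

# The shape the comparison result will have.
print(np.broadcast_shapes(arr.shape, target.shape))    # (3, 2)

# Broadcasting acts as if target had been repeated once per row of arr.
expanded = np.broadcast_to(target, arr.shape)
print(expanded.tolist())                               # [[2, 2], [2, 2], [2, 2]]
print(np.array_equal(arr == target, arr == expanded))  # True
```

np.broadcast_to returns a read-only view, so no data is actually copied.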
To find rows where all values are True, use:
In [50]: (arr == np.array([2,2])).all(axis=1)
Out[50]: array([False, True, False])
and where finds the True in that array (the result is a tuple with 1 array):
In [51]: np.where(_)
Out[51]: (array([1]),)
In Octave the equivalent is:
>> arr = [[1,1];[2,2];[3,3]]
arr =
1 1
2 2
3 3
>> all(arr == [2,2],2)
ans =
0
1
0
>> find(all(arr == [2,2],2))
ans = 2

Numpy getting row indices of last two elements of each column in mask

I have a boolean mask shaped (M, N). Each column in the mask may have a different number of True elements, but is guaranteed to have at least two. I want to find the row index of the last two such elements as efficiently as possible.
If I only wanted one element, I could do something like (M - 1) - np.argmax(mask[::-1, :], axis=0). However, that won't help me get the second-to-last index.
I've come up with an iterative solution using np.where or np.nonzero:
import numpy as np

M = 4
N = 3
mask = np.array([
    [False, True, True],
    [True, False, True],
    [True, False, True],
    [False, True, False]
])
result = np.zeros((2, N), dtype=np.intp)
for col in range(N):
    result[:, col] = np.flatnonzero(mask[:, col])[-2:]
This creates the expected result:
array([[1, 0, 1],
[2, 3, 2]], dtype=int64)
I would like to avoid the final loop. Is there a reasonably vectorized form of the above? I am looking for specifically two rows, which are always guaranteed to exist. A general solution for arbitrary element counts is not required.
An argsort does it -
In [9]: np.argsort(mask,axis=0,kind='stable')[-2:]
Out[9]:
array([[1, 0, 1],
[2, 3, 2]])
Another with cumsum -
c = mask.cumsum(0)
out = np.where((mask & (c>=c[-1]-1)).T)[1].reshape(-1,2).T
Specifically for exactly two rows, one way with argmax -
c = mask.copy()
idx = len(c)-c[::-1].argmax(0)-1
c[idx,np.arange(len(idx))] = 0
idx2 = len(c)-c[::-1].argmax(0)-1
out = np.vstack((idx2,idx))
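To check that the three vectorized answers agree with the loop from the question, they can be wrapped in functions and compared on the example mask and on a random mask. The function names, the random seed, and the 8 x 5 shape are arbitrary assumptions for this sketch; forcing the first two rows to True is only there to guarantee at least two True values per column, as the question requires.

```python
import numpy as np

def last_two_loop(mask):
    # Reference: the per-column loop from the question.
    return np.stack([np.flatnonzero(mask[:, j])[-2:] for j in range(mask.shape[1])], axis=1)

def last_two_argsort(mask):
    # Stable sort puts False before True and keeps the True rows in original order.
    return np.argsort(mask, axis=0, kind='stable')[-2:]

def last_two_cumsum(mask):
    c = mask.cumsum(0)
    return np.where((mask & (c >= c[-1] - 1)).T)[1].reshape(-1, 2).T

def last_two_argmax(mask):
    c = mask.copy()
    idx = len(c) - c[::-1].argmax(0) - 1     # last True in each column
    c[idx, np.arange(len(idx))] = 0          # clear it
    idx2 = len(c) - c[::-1].argmax(0) - 1    # now the second-to-last True
    return np.vstack((idx2, idx))

question_mask = np.array([
    [False, True, True],
    [True, False, True],
    [True, False, True],
    [False, True, False],
])

rng = np.random.default_rng(0)
random_mask = rng.random((8, 5)) < 0.4
random_mask[:2] = True  # guarantee at least two True values per column

for m in (question_mask, random_mask):
    expected = last_two_loop(m)
    for f in (last_two_argsort, last_two_cumsum, last_two_argmax):
        assert np.array_equal(f(m), expected), f.__name__

print(last_two_argsort(question_mask))
```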

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
# ALL THE COORDINATES WITHIN x -> 0 to 4 AND y -> 0 to 4 SHOULD
# BE PUT IN b (x and y ranges might not be equal)
b = ...  # DO SOME OPERATION
>>> b
[[3, 4],
 [0, 0]]
If the range is the same in both directions (x and y), just compare against it and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using numpy.ndarray.all, you get whether all values are truthy or not, similarly to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output has the same shape as the array, except that every dimension listed in axis has been reduced away (contracted to a single True/False).
When you index an array with a 1-D sequence of True/False values, the mask applies to the first axis. Here the (7, 2) array is indexed with a (7,) mask, so each True selects the corresponding row of the original array, with all of its columns.
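A short sketch of both points, using the a, mins and maxs from above: combining the two bounds element-wise before reducing gives the same row mask with a single .all call, and indexing with that (7,) mask is the same as indexing with the integer positions of its True values.

```python
import numpy as np

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
mins = (0, 1)   # x_min, y_min
maxs = (4, 10)  # x_max, y_max

# Combine both bounds per element, then require every coordinate of a row to pass.
inside = ((a >= mins) & (a <= maxs)).all(axis=1)
two_calls = (a >= mins).all(axis=1) & (a <= maxs).all(axis=1)
print(np.array_equal(inside, two_calls))     # True
print(inside.shape)                          # (7,)

# A 1-D boolean index on a 2-D array acts on axis 0: it keeps whole rows.
by_mask = a[inside]
by_position = a[np.flatnonzero(inside)]      # same rows, selected by integer position
print(by_mask.tolist())                      # [[1, 5], [3, 4]]
print(np.array_equal(by_mask, by_position))  # True
```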
