Apply numpy 'where' along one of axes - python

I have an array like that:
array = np.array([
[True, False],
[True, False],
[True, False],
[True, True],
])
I would like to find the last occurrence of True for each row of the array.
If it was 1d array I would do it in this way:
np.where(array)[0][-1]
How do I do something similar in 2D? Kind of like:
np.where(array, axis = 1)[0][:,-1]
but there is no axis argument in np.where.

Since True is greater than False, find the position of the largest element in each row. Unfortunately, argmax finds the first largest element, not the last one. So, reverse the array sideways, find the first True from the end, and recalculate the indexes:
(array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)
# array([0, 0, 0, 1])
The method fails if a row has no True values. You can check for that case by dividing by array.max(axis=1): a row with no True values will report its last True at infinity :)
array[0, 0] = False
((array.shape[1] - 1) - array[:, ::-1].argmax(axis=1)) / array.max(axis=1)
#array([inf, 0., 0., 1.])
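If infinity is awkward as a marker, the same reversed-argmax idea can be combined with np.where to return a sentinel instead. This is only a sketch: the helper name last_true_index and the choice of -1 for "no True in this row" are assumptions made for illustration, not part of the answer above.

```python
import numpy as np

def last_true_index(mask, missing=-1):
    """Column index of the last True in each row, or `missing` if the row has none."""
    mask = np.asarray(mask, dtype=bool)
    # argmax on the column-reversed array finds the first True from the right.
    idx = (mask.shape[1] - 1) - mask[:, ::-1].argmax(axis=1)
    # Rows without any True get the sentinel instead of a misleading index.
    return np.where(mask.any(axis=1), idx, missing)

array = np.array([
    [False, False],
    [True, False],
    [True, False],
    [True, True],
])
print(last_true_index(array))  # [-1  0  0  1]
```

This avoids the division by zero (and the RuntimeWarning that comes with it) while keeping an integer result.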

I found an older answer but didn't like that it returns 0 for both a True in the first position, and for a row of False.
So here's a way to solve that problem, if it's important to you:
import numpy as np
arr = np.array([[False, False, False], # -1
[False, False, True], # 2
[True, False, False], # 0
[True, False, True], # 2
[True, True, False], # 1
[True, True, True], # 2
])
# Make an adjustment for no Trues at all.
adj = np.sum(arr, axis=1) == 0
# Get the position and adjust.
x = np.argmax(np.cumsum(arr, axis=1), axis=1) - adj
# Compare to expected result:
assert np.all(x == np.array([-1, 2, 0, 2, 1, 2]))
print(x)
Gives [-1 2 0 2 1 2].

Related

Efficiently selecting rows that end with zeros in numpy

I have a tensor / array of shape N x M, where M is less than 10 but N can potentially be > 2000. All entries are greater than or equal to zero. I want to keep only the rows that either
Do not contain any zeros
End with zeros only, i.e. [1,2,0,0] would be valid but not [1,0,2,0] or [0,0,1,2]. Put differently, once a zero appears, all following entries of that row must also be zero; otherwise the row should be ignored.
as efficiently as possible. Consider the following example
Example:
[[35, 25, 17], # no zeros -> valid
[12, 0, 0], # ends with zeros -> valid
[36, 2, 0], # ends with zeros -> valid
[8, 0, 9]] # contains zeros and does not end with zeros -> invalid
should yield [True, True, True, False]. The straightforward implementation I came up with is:
import numpy as np
T = np.array([[35,25,17], [12,0,0], [36,2,0], [8,0,9]])
N,M = T.shape
valid = [i*[True,] + (M-i)*[False,] for i in range(1, M+1)]
mask = [((row > 0).tolist() in valid) for row in T]
Is there a more elegant and efficient solution to this? Any help is greatly appreciated!
Here's one way:
x[np.all((x == 0) == (x.cumprod(axis=1) == 0), axis=1)]
This calculates the row-wise cumulative product, compares the original array's zeros with the zeros of the cumprod array, and keeps only the rows where the two agree in every position (that is, it drops any row containing a False).
Workings:
In [3]: x
Out[3]:
array([[35, 25, 17],
[12, 0, 0],
[36, 2, 0],
[ 8, 0, 9]])
In [4]: x == 0
Out[4]:
array([[False, False, False],
[False, True, True],
[False, False, True],
[False, True, False]])
In [5]: x.cumprod(axis=1) == 0
Out[5]:
array([[False, False, False],
[False, True, True],
[False, False, True],
[False, True, True]])
In [6]: (x == 0) == (x.cumprod(axis=1) == 0)
Out[6]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, False]]) # bad row!
In [7]: np.all((x == 0) == (x.cumprod(axis=1) == 0), axis=1)
Out[7]: array([ True, True, True, False])
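One caveat worth keeping in mind: with fixed-width integer dtypes, a cumulative product can overflow when the entries are large, and a wrapped product may even land on 0. A possible alternative that avoids multiplication is to check that the "nonzero" mask never switches from False back to True along a row. This is a sketch of that idea, not the answerer's method:

```python
import numpy as np

x = np.array([[35, 25, 17],
              [12, 0, 0],
              [36, 2, 0],
              [8, 0, 9]])

nonzero = x != 0
# A row is valid when no zero is followed by a nonzero entry, i.e. the
# nonzero mask never goes from False back to True along the row.
valid = np.all(nonzero[:, :-1] >= nonzero[:, 1:], axis=1)

print(valid)     # [ True  True  True False]
print(x[valid])  # the three valid rows
```

Like the cumprod version, this treats a row consisting only of zeros as valid.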

Count number of transitions in each row of a Numpy array

I have a 2D boolean array
a = np.array([[True, False, True, False, True],
[True, True, True, True, True],
[True, True, False, False, False],
[False, True, True, False, False],
[True, True, False, True, False]])
I would like to create a new array, providing count of True-False transitions in each row of this array.
The desired result is count=[2, 0, 1, 1, 2]
I work with a large numpy array, so I want to avoid looping over the rows.
I tried to adapt available solutions to a 2D array, counting for each row separately, but did not succeed.
Here is a possible solution:
b = a.astype(int)
c = (b[:, :-1] - b[:, 1:])
count = (c == 1).sum(axis=1)
Result:
>>> count
array([2, 0, 1, 1, 2])
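The same count can be obtained without the integer cast by comparing each element with its right-hand neighbour directly on the boolean array. A small sketch (the variable names true_to_false and any_change are just illustrative):

```python
import numpy as np

a = np.array([[True, False, True, False, True],
              [True, True, True, True, True],
              [True, True, False, False, False],
              [False, True, True, False, False],
              [True, True, False, True, False]])

# True where an element is True and its right-hand neighbour is False.
true_to_false = a[:, :-1] & ~a[:, 1:]
count = true_to_false.sum(axis=1)
print(count)  # [2 0 1 1 2]

# Transitions in either direction, for comparison.
any_change = (a[:, :-1] != a[:, 1:]).sum(axis=1)
print(any_change)  # [4 0 1 2 3]
```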

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
# ALL THE COORDINATES WITHIN x -> 0 to 4 AND y -> 0 to 4 SHOULD
# BE PUT IN b (x and y ranges might not be equal)
b = ...  # DO SOME OPERATION
>>> b
>>> [[3,4],
[0,0]]
If the range is the same for both directions, x, and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using numpy.ndarray.all, you can check whether all values are truthy, similarly to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output of this has the same shape as the array, except that all dimensions mentioned with axis have been contracted to a single True/False.
When you index an array with a boolean array, the True entries select the matching positions. Here the array has shape (7, 2) and the mask has shape (7,), so the mask is applied along the first axis and each True selects a whole row of the original array.
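Putting the steps together, the row selection can be wrapped in a small helper. This is a sketch assuming inclusive bounds; the function name within_bounds is made up for illustration.

```python
import numpy as np

def within_bounds(points, mins, maxs):
    """Return the rows of `points` whose coordinates all lie in [mins, maxs]."""
    points = np.asarray(points)
    # mins/maxs broadcast against the last axis, one bound per coordinate.
    inside = ((points >= mins) & (points <= maxs)).all(axis=1)
    # `inside` is 1-D with one entry per row, so it selects whole rows.
    return points[inside]

a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])
print(within_bounds(a, mins=(0, 0), maxs=(4, 4)))
# [[3 4]
#  [0 0]]
print(within_bounds(a, mins=(0, 1), maxs=(4, 10)))
# [[1 5]
#  [3 4]]
```

Combining both comparisons before calling all means only one reduction along axis 1 is needed.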

Delete rows from a multidimensional array in Python

I'm trying to delete specific rows in my numpy array that meet certain conditions.
This is an example:
a = np.array ([[1,1,0,0,1],
[0,0,1,1,1],
[0,1,0,1,1],
[1,0,1,0,1],
[0,0,1,0,1],
[1,0,1,0,0]])
I want to be able to delete all rows where specific columns are zero; this array could be a lot bigger.
In this example, if the first two elements are zero, or if the last two elements are zero, the row will be deleted.
It could be any combination, not only the first elements or the last ones.
This should be the final:
a = np.array ([[1,1,0,0,1],
[0,1,0,1,1],
[1,0,1,0,1]])
I have read:
Remove lines with empty values from multidimensional-array in php
and this question: How to delete specific rows from a numpy array using a condition?
But they don't seem to apply to my case, or I'm probably misunderstanding something, as nothing works in my case.
For example, if I try:
a[:,0:2] == 0
this gives me True, True for all rows where the first two columns are zero:
array([[False, False],
[ True, True],
[ True, False],
[False, True],
[ True, True],
[False, True]])
And since its last two columns are zero, the last row should be deleted too. So at the end I will be left with only 3 rows.
a[:,3:5] == 0
array([[ True, False],
[False, False],
[False, False],
[ True, False],
[ True, False],
[ True, True]])
I'm trying something like this, but I don't understand how to tell it to give me only the rows that satisfy the condition:
(a[a[:,0:2]] == 0).all(axis=1)
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False]])
(a[((a[:,0])& (a[:,1])) ] == 0).all(axis=1)
and this shows everything as False
could you please guide me a bit?
thank you
Just adding to the question: it won't always be the first 2 or the last 2 columns. If my matrix has 35 columns, it could be columns 6 to 10, and then columns 20 and 25. A user will be able to decide which columns are used for the deletion.
Try this
idx0 = (a[:,0:2] == 0).all(axis=1)
idx1 = (a[:,-2:] == 0).all(axis=1)
a[~(idx0 | idx1)]
The first two steps build boolean masks marking the rows that match your filtering criteria. Then combine them with an or (|) operation and negate the result with the not (~) operation to obtain the mask of the rows you want to keep.
If I understood correctly you could do something like this:
import numpy as np
a = np.array([[1, 1, 0, 0, 1],
[0, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[0, 0, 1, 0, 1],
[1, 0, 1, 0, 0]])
left = np.count_nonzero(a[:, :2], axis=1) != 0
a = a[left]
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[right]
print(a)
Output
[[1 1 0 0 1]
[0 1 0 1 1]
[1 0 1 0 1]]
Or, a shorter version:
left = np.count_nonzero(a[:, :2], axis=1) != 0
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[(left & right)]
Use the following mask:
mask = np.any(a[:, :2], axis=1) & np.any(a[:, -2:], axis=1)
To get the filtered rows by boolean indexing (this returns a copy, not a view):
a[mask]
Or, equivalently, with np.delete, which also returns a new array:
np.delete(a, np.where(~mask)[0], axis=0)
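The follow-up in the question asks for arbitrary, user-chosen column groups rather than just the first and last two columns. One way to generalise the masks is to loop over the groups and OR the per-group results together. This is a sketch; the function name drop_rows_with_zero_groups and the way groups are passed (slices or index lists) are assumptions for illustration.

```python
import numpy as np

def drop_rows_with_zero_groups(a, column_groups):
    """Drop every row in which all columns of at least one group are zero."""
    a = np.asarray(a)
    drop = np.zeros(a.shape[0], dtype=bool)
    for cols in column_groups:
        # `cols` may be a slice or a list of column indices.
        drop |= (a[:, cols] == 0).all(axis=1)
    return a[~drop]

a = np.array([[1, 1, 0, 0, 1],
              [0, 0, 1, 1, 1],
              [0, 1, 0, 1, 1],
              [1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1],
              [1, 0, 1, 0, 0]])

# First two columns and last two columns, as in the question.
print(drop_rows_with_zero_groups(a, [slice(0, 2), [3, 4]]))
# [[1 1 0 0 1]
#  [0 1 0 1 1]
#  [1 0 1 0 1]]
```

The loop runs once per column group, not once per row, so it stays cheap even for large arrays.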

How to create multiple column list of booleans from given list of integers in Python?

I am new to Python. I want to do following.
Input: A list of integers of size n. Each integer is in a range of 0 to 3.
Output: A multi-column (4 column in this case as integer range in 0-3 = 4) numpy list of size n. Each row of the new list will have the column corresponding to the integer value of Input list as True and rest of the columns as False.
E.g. Input list : [0, 3, 2, 1, 1, 2], size = 6, Each integer is in range of 0-3
Output list :
Row 0: True False False False
Row 1: False False False True
Row 2: False False True False
Row 3: False True False False
Row 4: False True False False
Row 5: False False True False
Now, I can start with 4 columns. Traverse through the input list and create this as follows,
output_columns[].
for i in Input list:
output_column[i] = True
Create an output numpy list with output columns
Is this the best way to do this in Python? Especially for creating numpy list as an output.
If yes, How do I merge output_columns[] at the end to create numpy multidimensional list with each dimension as a column of output_columns.
If not, what would be the best (most time efficient way) to do this in Python?
Thank you,
Is this the best way to do this in Python?
No, a more Pythonic and probably the best way is to use a simple broadcasting comparison as follows:
In [196]: a = np.array([0, 3, 2, 1, 1, 2])
In [197]: r = list(range(0, 4))
In [198]: a[:,None] == r
Out[198]:
array([[ True, False, False, False],
[False, False, False, True],
[False, False, True, False],
[False, True, False, False],
[False, True, False, False],
[False, False, True, False]])
You are creating a so-called one-hot encoding (each row in the matrix is a one-hot vector, meaning that only one value is True).
mylist = [0, 3, 2, 1, 1, 2]
one_hot = np.zeros((len(mylist), 4), dtype=bool)
for i, v in enumerate(mylist):
one_hot[i, v] = True
Output
array([[ True, False, False, False],
[False, False, False, True],
[False, False, True, False],
[False, True, False, False],
[False, True, False, False],
[False, False, True, False]], dtype=bool)
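The explicit loop can also be replaced by integer-array indexing, which sets all the True cells in one step. A sketch, assuming (as in the question) that every value lies in range(4); n_classes and also_one_hot are names introduced here for illustration.

```python
import numpy as np

mylist = [0, 3, 2, 1, 1, 2]
n_classes = 4  # assumed: values lie in range(4), as in the question

idx = np.asarray(mylist)
one_hot = np.zeros((idx.size, n_classes), dtype=bool)
# Pair each row number with its class index and set those cells in one step.
one_hot[np.arange(idx.size), idx] = True

# Same result by picking rows of a boolean identity matrix.
also_one_hot = np.eye(n_classes, dtype=bool)[idx]
print(one_hot)
```

Both variants give the same array as the loop above.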
