Detect if any value is above zero and change it - python

I have the following array:
[(True,False,True), (False,False,False), (False,False,True)]
If any element contains a True then they should all be true. So the above should become:
[(True,True,True), (False,False,False), (True,True,True)]
My below code attempts to do that but it simply converts all elements to True:
a = np.array([(True,False,True), (False,False,False), (False,True,False)], dtype='bool')
aint = a.astype('int')
print(aint)
aint[aint.sum() > 0] = (1,1,1)
print(aint.astype('bool'))
The output is:
[[1 0 1]
[0 0 0]
[0 1 0]]
[[ True True True]
[ True True True]
[ True True True]]

You could try np.any, which tests whether any array element along a given axis evaluates to True.
Here's a quick line of code that uses a list comprehension to get your intended result.
lst = [(True,False,True), (False,False,False), (False,False,True)]
result = [(np.any(x),) * len(x) for x in lst]
# result is [(True, True, True), (False, False, False), (True, True, True)]

I'm no numpy wizard but this should return what you want.
import numpy as np
def switch(arr):
    # arr is one row of `a`; if it holds any True, return an all-True row
    if np.any(arr):
        return np.ones(arr.shape, dtype=bool)
    return arr.astype(bool)
np.apply_along_axis(switch, 1, a)
array([[ True, True, True],
[False, False, False],
[ True, True, True]])

ndarray.any along axis=1 combined with np.tile will get the job done:
np.tile(a.any(1)[:,None], a.shape[1])
array([[ True, True, True],
[False, False, False],
[ True, True, True]])

Create an array of True's based on the original array's second dimension and assign it to all rows that have a True in it.
>>> a
array([[ True, False, True],
[False, False, False],
[False, True, False]])
>>> a[a.any(1)] = np.ones(a.shape[1], dtype=bool)
>>> a
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
>>>
Relies on Broadcasting.
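For what it's worth, here is a minimal sketch of why the code in the question turns everything to True, and how the broadcasting assignment avoids that. It assumes you want a new array rather than modifying `a` in place; the names `row_mask` and `fixed` are just illustrative.

```python
import numpy as np

a = np.array([(True, False, True),
              (False, False, False),
              (False, True, False)], dtype=bool)

# Without an axis, sum() collapses the whole array to a single number,
# so the comparison yields one scalar True that selects every row.
print(a.sum() > 0)

# Reducing along axis=1 gives one flag per row instead.
row_mask = a.any(axis=1)

fixed = a.copy()
# The scalar True is broadcast across every selected row.
fixed[row_mask] = True
print(fixed)
```

The only real change compared with the question's code is reducing along axis=1, so the mask has one entry per row.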

Related

2d index to select elements from 1d array

I'm trying to use a 2d boolean array (ix) to pick elements from a 1d array (c) to create a 2d array (r). The resulting 2d array is also a boolean array. Each column stands for the unique value in c.
Example:
>>> ix
array([[ True, True, False, False, False, False, False],
[False, False, True, False, False, False, True],
[False, False, False, True, False, False, False]])
>>> c
array([1, 2, 3, 4, 8, 2, 4])
Expected result
1, 2, 3, 4, 8
r = [
[ True, True, False, False, False], # c[ix[0][0]] == 1 and c[ix[0][1]] == 2; it doesn't matter that ix[0][5] (pointing to `2` in `c`) is False as ix[0][1] was already True which is sufficient.
[False, False, True, True, False], # [3]
[False, False, False, True, False] # [4] as ix[2][3] is True
]
Can this be done in a vectorised way?
Let us try:
# unique values
uniques = np.unique(c)
# boolean index into each row
vals = np.tile(c, len(ix))[ix.ravel()]
# search within the unique values
idx = np.searchsorted(uniques, vals)
# pre-populate output
out = np.full((len(ix), len(uniques)), False)
# index into the output:
out[np.repeat(np.arange(len(ix)), ix.sum(1)), idx ] = True
Output:
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, True, False]])
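For small inputs, the same result can probably also be had with plain broadcasting. Note this is a different technique from the searchsorted approach above, and it builds an intermediate array of shape (rows, len(c), number of uniques), so it assumes memory is not a concern. The name `matches` is only illustrative.

```python
import numpy as np

ix = np.array([[True, True, False, False, False, False, False],
               [False, False, True, False, False, False, True],
               [False, False, False, True, False, False, False]])
c = np.array([1, 2, 3, 4, 8, 2, 4])

uniques = np.unique(c)

# matches[j, k] is True where c[j] equals the k-th unique value; shape (7, 5)
matches = c[:, None] == uniques

# For each row of ix, OR together the match rows of the selected elements.
# (3, 7, 1) & (7, 5) broadcasts to (3, 7, 5); any(axis=1) reduces to (3, 5).
out = (ix[:, :, None] & matches).any(axis=1)
print(out)
```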

Efficient way to find indices of topmost True values in 2d boolean array (Python)

Suppose I have a 2d boolean array with shape (nrows,ncols). I'm trying to efficiently extract the indices of the topmost True value for each column in the array. If the column has all False values, then no indices are returned for that column. Below is an example of a boolean array with shape (4,6) where the indices of the bold Trues would be the desired output.
False False False False False False
True  False False True  False False
True  False True  False False True
True  False True  True  False False
Desired output of indices (row,col): [(1,0),(2,2),(1,3),(2,5)]
I tried using numpy.where and also an implementation of the skyline algorithm but both options are slow. Is there a more efficient way to solve this problem?
Thank you in advance for your help.
You can use np.argmax to detect the first True values.
Prepare the example array.
import numpy as np
a = np.array(
    [[0,0,0,0,0,0],
     [1,0,0,1,0,0],
     [1,0,1,0,0,1],
     [1,0,1,1,0,0]]).astype('bool')
a
Output
array([[False, False, False, False, False, False],
[ True, False, False, True, False, False],
[ True, False, True, False, False, True],
[ True, False, True, True, False, False]])
Stack one row of False on top to deal with columns without a True. Find the first True in every column with np.argmax and pair it with an arange for the column indices. You have to adjust the row indices by -1 because we added one row to the array. Then select the columns where the index of the first True was greater than 0:
b = np.vstack([np.zeros_like(a[0]),a])
t = b.argmax(axis=0)
np.vstack([t - 1, np.arange(len(a[0]))]).T[t > 0]
Output
array([[1, 0],
[2, 2],
[1, 3],
[2, 5]])
Translating @HenryYik's answer to numpy gives a one-line solution:
np.vstack([a.argmax(axis=0), np.arange(len(a[0]))]).T[a.sum(0) > 0]
Output
array([[1, 0],
[2, 2],
[1, 3],
[2, 5]])
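If you want the result as a list of (row, col) tuples like in the question, the argmax idea can be wrapped up roughly as follows. This is only a sketch; the function name `topmost_true` is made up for illustration.

```python
import numpy as np

def topmost_true(a):
    """Return (row, col) of the first True in each column that has one."""
    a = np.asarray(a, dtype=bool)
    # Columns that contain at least one True
    cols = np.flatnonzero(a.any(axis=0))
    # argmax on booleans returns the index of the first True in each column
    rows = a[:, cols].argmax(axis=0)
    return list(zip(rows.tolist(), cols.tolist()))

example = np.array([[0, 0, 0, 0, 0, 0],
                    [1, 0, 0, 1, 0, 0],
                    [1, 0, 1, 0, 0, 1],
                    [1, 0, 1, 1, 0, 0]], dtype=bool)
print(topmost_true(example))
```

Filtering the columns first means argmax never sees an all-False column, so no padding row is needed.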
If you are open to using pandas, you can construct a df, drop columns with False only and then idxmax:
import pandas as pd
arr = [[False, False, False, False, False, False],
       [True, False, False, True, False, False],
       [True, False, True, False, False, True],
       [True, False, True, True, False, False]]
df = pd.DataFrame(arr, columns=range(len(arr[0])))
s = df.loc[:, df.sum()>0].idxmax()
print(s)
Result:
0 1
2 2
3 1
5 2
dtype: int64
Which is col value vs row value. You can convert it back to your desired form:
print (list(zip(s, s.index)))
[(1, 0), (2, 2), (1, 3), (2, 5)]
I suggest you try this:
def get_topmost(ar: np.ndarray):
    return [(row.index(True), i) for i, row in enumerate(ar.T.tolist()) if True in row]
Example (should work as is):
>>> test = np.array([
[False, False, False, False, False, False],
[True, False, False, True, False, False],
[True, False, True, False, False, True],
[True, False, True, True, False, False],
])
>>> print(get_topmost(test))
[(1, 0), (2, 2), (1, 3), (2, 5)]

numpy slicing the matrix based condition on the column python

X = np.arange(1, 26).reshape(5, 5)
X[:,1:2] % 2 == 0
The conditions should only be applied to the second column
I want the whole matrix where the condition is true like
[array([[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False],
[ False, False, False, False, False],
[False, True, False, False, False]])]
It's giving the error
IndexError: boolean index did not match indexed array along dimension 1; dimension is 5 but corresponding boolean dimension is 1
Is this what you want?
import numpy as np
X = np.arange(1, 26).reshape(5, 5)
X=[X[::] % 2 == 0]
print(X)
Output
[array([[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False],
[ True, False, True, False, True],
[False, True, False, True, False]])]
If you want the condition evaluated over the whole matrix, you can simply do this:
X % 2 == 0
If you want the condition evaluated only on the second column (index 1), then:
X[:, 1:2] % 2 == 0
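If the goal is a full-size mask that is only ever True in the second column, as in the expected output from the question, something like this sketch should work. The IndexError appears when the (5, 1) boolean result is used to index the (5, 5) array directly, because the shapes do not match. The names `mask` and `rows` are illustrative.

```python
import numpy as np

X = np.arange(1, 26).reshape(5, 5)

# Start from an all-False mask with the same shape as X
mask = np.zeros(X.shape, dtype=bool)
# Evaluate the condition on the second column only (index 1)
mask[:, 1] = X[:, 1] % 2 == 0
print(mask)

# A 1-D boolean of length 5 can index the rows of X directly
rows = X[X[:, 1] % 2 == 0]
print(rows)
```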

Python: Get last n Trues from Boolean array

I have a boolean array and I want to convert it to an array where only the last last_n_trues True values are still True. A simple example:
>>> boolean_array = [False, False, True, True, True, False, False]
>>> last_n_trues = 2
>>> desired_output = [False, False, False, True, True, False, False]
My approach:
>>> import numpy as np
>>> idxs_of_trues = np.where(boolean_array)[0]
array([2, 3, 4], dtype=int64)
>>> idxs_of_trues_last_n = idxs_of_trues[-last_n_trues:]
array([3, 4], dtype=int64)
>>> [x in idxs_of_trues_last_n for x in range(0, len(boolean_array))]
[False, False, False, True, True, False, False]
Is there a faster way to do so? Especially the list comprehension seems pretty complicated to me...
You should just be able to simply use np.where
In [116]: x
Out[116]: array([False, False, True, True, True, False, False], dtype=bool)
In [117]: x[np.where(x)[0][:-2]] = False
In [118]: x
Out[118]: array([False, False, False, True, True, False, False], dtype=bool)
This just replaces all True that aren't the last 2 with False
This will only work if x is a np.array, so verify that before you try this.
Approach #1 : Here's one with cumsum -
def keep_lastNTrue_cumsum(a, n):
    c = np.count_nonzero(a) # or a.sum()
    a[c - a.cumsum() >= n] = 0
    return a
Approach #2 : Two more with argpartition -
def keep_lastNTrue_argpartition1(a, n):
    c = np.count_nonzero(a) # or a.sum()
    a[np.sort(np.argpartition(a,-n)[-c:])[:-n]] = 0
    return a
def keep_lastNTrue_argpartition2(a, n):
    c = np.count_nonzero(a) # or a.sum()
    p = np.argpartition(a,-n)[-a.sum():]
    cn = c-n
    idx = np.argpartition(p,cn)
    a[p[idx[:cn]]] = 0
    return a
Approach #3 : Another with a bit more of mask usage -
def keep_lastNTrue_allmask(a, n):
    c = a.sum()
    set_mask = np.ones(c, dtype=bool)
    set_mask[:-n] = False
    a[a] = set_mask
    return a
Sample runs -
In [141]: boolean_array = np.array([False, False, True, True, True, False, False])
In [142]: keep_lastNTrue_cumsum(boolean_array, n=2)
Out[142]: array([False, False, False, True, True, False, False])
In [143]: boolean_array = np.array([False, False, True, True, True, False, False])
In [144]: keep_lastNTrue_argpartition1(boolean_array, n=2)
Out[144]: array([False, False, False, True, True, False, False])
In [145]: boolean_array = np.array([False, False, True, True, True, False, False])
In [146]: keep_lastNTrue_argpartition2(boolean_array, n=2)
Out[146]: array([False, False, False, True, True, False, False])
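Note that the functions above modify the input array in place and return that same array. If you would rather keep the original untouched, a copy-based variant could look like the sketch below. It uses np.flatnonzero instead of the cumsum or argpartition tricks, and the name `keep_last_n_true` is made up for illustration.

```python
import numpy as np

def keep_last_n_true(values, n):
    """Return a new boolean array keeping only the last n True values."""
    a = np.array(values, dtype=bool)      # np.array makes a copy
    true_idx = np.flatnonzero(a)          # positions of all True values
    # Number of leading Trues to clear; never negative when n exceeds the count
    n_drop = max(len(true_idx) - n, 0)
    a[true_idx[:n_drop]] = False
    return a

boolean_array = np.array([False, False, True, True, True, False, False])
print(keep_last_n_true(boolean_array, 2))
```

Computing `n_drop` explicitly avoids the `[:-n]` slice, which would select nothing when n is 0.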
The fastest way without libraries is going to be to clone the list and iterate through it in reverse:
def foo(bools, last_n_trues):
    result = bools[:]
    count = 0
    for i in range(len(bools) - 1, -1, -1):
        if count < last_n_trues:
            if result[i]:
                count += 1
        else:
            result[i] = False
    return result

Combining 3 boolean masks in Python

I have 3 lists:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
When I type
a or b or c
I want to get back a list that's
[True, True, True]
but I'm getting back
[True, False, True]
Any ideas on why? And how can I combine these masks?
Your or operators are comparing the lists as entire objects, not their elements. Since a is not an empty list, it evaluates as true, and becomes the result of the or. b and c are not even evaluated.
To produce the logical OR of the three lists position-wise, you have to iterate over their contents and OR the values at each position. To convert a bunch of iterables into a list of their grouped elements, use zip(). To check if any element in an iterable is true (the OR of its entire contents), use any(). Do these two at once with a list comprehension:
mask = [any(tup) for tup in zip(a, b, c)]
How about this:
from numpy import asarray as ar
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
Try:
>>> ar(a) | ar(b) | ar(c) #note also the use `|` instead of `or`
array([ True, True, True], dtype=bool)
So no need for zip etc.
or returns the first operand if it evaluates as true, and a non-empty list evaluates as true; so, a or b or c will always return a if it's a non-empty list.
Probably you want
[any(t) for t in zip(a, b, c)]
(this works also for element-wise and if you replace any with all)
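To make the any/all swap concrete, here is a small sketch using the lists from the question. The names `or_mask` and `and_mask` are only illustrative.

```python
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]

# `or` on whole lists hands back the first non-empty operand unchanged
print((a or b or c) is a)

# zip groups the values position by position; any/all reduce each group
or_mask = [any(t) for t in zip(a, b, c)]
and_mask = [all(t) for t in zip(a, b, c)]
print(or_mask)
print(and_mask)
```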
a is treated as true because it contains values; b and c are not evaluated.
>>> bool([])
False
>>> bool([True])
True
>>> bool([False])
True
>>> [False] or [True]
[False]
According to Boolean Operations:
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
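How about numpy.logical_or.reduce? (Calling numpy.logical_or(a, b, c) directly does not work, because the third positional argument is the output array, not a third operand.)
import numpy
result = numpy.logical_or.reduce([a, b, c])
print(result)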
Try this:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
res = [a[i] or b[i] or c[i] for i in range(len(a))]
print(res)
Here is how to do it fast (large arrays) using numpy:
import numpy as np
a = [True, False, True,...]
b = [False, False, True,...]
c = [True, True, False,...]
res = np.column_stack((a,b,c)).any(axis=1)
print(res)
Note that a becomes the first column, b the second, and so on when using np.column_stack(). Then do a np.any() (logical OR) on that array along axis=1 which will compare the first elements of a,b, and c, and so on, and so on; resulting in a boolean vector that is the same length as the vectors you want to compare.
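A runnable version of the same idea with the three short lists from the question, in case it helps to see the intermediate array. This is just a sketch; `stacked` is an illustrative name.

```python
import numpy as np

a = [True, False, True]
b = [False, False, True]
c = [True, True, False]

# Each input list becomes one column, so row i holds (a[i], b[i], c[i])
stacked = np.column_stack((a, b, c))
print(stacked)

# Logical OR across each row
res = stacked.any(axis=1)
print(res)
```

Swapping `any` for `all` along the same axis would give the element-wise AND instead.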
This is a good place to use Python's argument unpacking (the * operator, as in *args), which will allow you to pass 3 or 300000 lists.
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
data_lists = [a,b,c]
Here you use * to expand that list as arguments, zip to rearrange your data into columns (as lists), and any to check if any cell in that column is True.
[any(l) for l in zip(*data_lists)]
[True, True, True]
If you're working with a lot of lists it's the same, e.g.
import numpy as np
data_lists = [[True if (i == j) else False for i in range(7)] for j in range(7)]
np.matrix(data_lists)
matrix([[ True, False, False, False, False, False, False],
[False, True, False, False, False, False, False],
[False, False, True, False, False, False, False],
[False, False, False, True, False, False, False],
[False, False, False, False, True, False, False],
[False, False, False, False, False, True, False],
[False, False, False, False, False, False, True]])
[any(l) for l in zip(*data_lists)]
[True, True, True, True, True, True, True]
