Python: Get last n Trues from Boolean array

I have a boolean array and I want to convert it to an array where only the last last_n_trues True values are still True. A simple example:
>>> boolean_array = [False, False, True, True, True, False, False]
>>> last_n_trues = 2
>>> desired_output = [False, False, False, True, True, False, False]
My approach:
>>> import numpy as np
>>> idxs_of_trues = np.where(boolean_array)[0]
array([2, 3, 4], dtype=int64)
>>> idxs_of_trues_last_n = idxs_of_trues[-last_n_trues:]
array([3, 4], dtype=int64)
>>> [x in idxs_of_trues_last_n for x in range(0, len(boolean_array))]
[False, False, False, True, True, False, False]
Is there a faster way to do so? Especially the list comprehension seems pretty complicated to me...

You should be able to simply use np.where:
In [116]: x
Out[116]: array([False, False, True, True, True, False, False], dtype=bool)
In [117]: x[np.where(x)[0][:-2]] = False
In [118]: x
Out[118]: array([False, False, False, True, True, False, False], dtype=bool)
This just replaces every True that isn't among the last 2 with False.
It only works if x is a NumPy array, so verify that (or convert with np.asarray) before you try it.
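A minimal sketch of the same np.where idea wrapped in a function. The helper name keep_last_n_true is hypothetical (not from the answer above). It copies the input so plain lists work too, and it computes the slice end explicitly because a slice like [:-0] would select nothing:

```python
import numpy as np

def keep_last_n_true(values, n):
    """Return a new boolean array in which only the last n True values stay True."""
    mask = np.array(values, dtype=bool)   # copy, so the caller's data is untouched
    true_idx = np.flatnonzero(mask)       # positions of all True values
    # Clear everything except the last n True positions.
    # len(true_idx) - n is used instead of [:-n] because [:-0] would be empty.
    drop = true_idx[:max(len(true_idx) - n, 0)]
    mask[drop] = False
    return mask

result = keep_last_n_true([False, False, True, True, True, False, False], 2)
print(result.tolist())  # [False, False, False, True, True, False, False]
```

With n=0 this clears every True, and with n larger than the number of True values it leaves the input unchanged.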

Approach #1 : Here's one with cumsum -
def keep_lastNTrue_cumsum(a, n):
    c = np.count_nonzero(a) # or a.sum()
    a[c - a.cumsum() >= n] = 0
    return a
Approach #2 : Two more with argpartition -
def keep_lastNTrue_argpartition1(a, n):
    c = np.count_nonzero(a) # or a.sum()
    a[np.sort(np.argpartition(a,-n)[-c:])[:-n]] = 0
    return a
def keep_lastNTrue_argpartition2(a, n):
    c = np.count_nonzero(a) # or a.sum()
    p = np.argpartition(a,-n)[-a.sum():]
    cn = c-n
    idx = np.argpartition(p,cn)
    a[p[idx[:cn]]] = 0
    return a
Approach #3 : Another with a bit more of mask usage -
def keep_lastNTrue_allmask(a, n):
    c = a.sum()
    set_mask = np.ones(c, dtype=bool)
    set_mask[:-n] = False
    a[a] = set_mask
    return a
Sample runs -
In [141]: boolean_array = np.array([False, False, True, True, True, False, False])
In [142]: keep_lastNTrue_cumsum(boolean_array, n=2)
Out[142]: array([False, False, False, True, True, False, False])
In [143]: boolean_array = np.array([False, False, True, True, True, False, False])
In [144]: keep_lastNTrue_argpartition1(boolean_array, n=2)
Out[144]: array([False, False, False, True, True, False, False])
In [145]: boolean_array = np.array([False, False, True, True, True, False, False])
In [146]: keep_lastNTrue_argpartition2(boolean_array, n=2)
Out[146]: array([False, False, False, True, True, False, False])
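To make the cumsum idea easier to follow, here is a non-mutating sketch. The function name keep_last_n_true_cumsum and the intermediate variable trues_after are illustrative assumptions. A True value is among the last n exactly when fewer than n True values come after it:

```python
import numpy as np

def keep_last_n_true_cumsum(a, n):
    """Return a new mask keeping only the last n True values of a."""
    a = np.asarray(a, dtype=bool)
    total = np.count_nonzero(a)
    # Number of True values strictly after each position.
    trues_after = total - np.cumsum(a)
    # Keep a True only if fewer than n True values follow it.
    return a & (trues_after < n)

sample = np.array([False, False, True, True, True, False, False])
kept = keep_last_n_true_cumsum(sample, 2)
print(kept.tolist())  # [False, False, False, True, True, False, False]
```

Unlike the functions above, this version leaves the input array unchanged.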

Without libraries, the fastest way is probably to copy the list and iterate through it in reverse:
def foo(bools, last_n_trues):
    result = bools[:]
    count = 0
    for i in range(len(bools) - 1, -1, -1):
        if count < last_n_trues:
            if result[i]:
                count += 1
        else:
            result[i] = False
    return result

Related

2d index to select elements from 1d array

I'm trying to use a 2d boolean array (ix) to pick elements from a 1d array (c) to create a 2d array (r). The resulting 2d array is also a boolean array. Each column stands for one unique value in c.
Example:
>>> ix
array([[ True, True, False, False, False, False, False],
[False, False, True, False, False, False, True],
[False, False, False, True, False, False, False]])
>>> c
array([1, 2, 3, 4, 8, 2, 4])
Expected result
1, 2, 3, 4, 8
r = [
[ True, True, False, False, False], # c[ix[0][0]] == 1 and c[ix[0][1]] == 2; it doesn't matter that ix[0][5] (pointing to `2` in `c`) is False as ix[0][1] was already True which is sufficient.
[False, False, True, True, False], # [3]
[False, False, False, True, False] # [4] as ix[2][3] is True
]
Can this be done in a vectorised way?
Let us try:
# unique values
uniques = np.unique(c)
# boolean index into each row (repeat c once per row of ix)
vals = np.tile(c, len(ix))[ix.ravel()]
# search within the unique values
idx = np.searchsorted(uniques, vals)
# pre-populate output
out = np.full((len(ix), len(uniques)), False)
# index into the output:
out[np.repeat(np.arange(len(ix)), ix.sum(1)), idx ] = True
Output:
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, True, False]])
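An alternative sketch that uses broadcasting instead of searchsorted. The variable name membership is an assumption made for illustration. It builds an intermediate array of shape (rows, len(c), len(uniques)), so it is probably only suitable for moderately sized inputs:

```python
import numpy as np

ix = np.array([[True, True, False, False, False, False, False],
               [False, False, True, False, False, False, True],
               [False, False, False, True, False, False, False]])
c = np.array([1, 2, 3, 4, 8, 2, 4])

uniques = np.unique(c)
# membership[j, u] is True when c[j] equals uniques[u]; shape (len(c), len(uniques)).
membership = c[:, None] == uniques
# For each row of ix, OR together the membership rows that the row selects.
r = (ix[:, :, None] & membership).any(axis=1)
print(r)
```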

Detect if any value is above zero and change it

I have the following array:
[(True,False,True), (False,False,False), (False,False,True)]
If any row contains a True, then all values in that row should become True. So the above should become:
[(True,True,True), (False,False,False), (True,True,True)]
My code below attempts to do that, but it simply converts all elements to True:
a = np.array([(True,False,True), (False,False,False), (False,True,False)], dtype='bool')
aint = a.astype('int')
print(aint)
aint[aint.sum() > 0] = (1,1,1)
print(aint.astype('bool'))
The output is:
[[1 0 1]
[0 0 0]
[0 1 0]]
[[ True True True]
[ True True True]
[ True True True]]
You could try np.any, which tests whether any array element along a given axis evaluates to True.
Here's a quick line of code that uses a list comprehension to get your intended result.
lst = [(True,False,True), (False,False,False), (False,False,True)]
result = [(np.any(x),) * len(x) for x in lst]
# result is [(True, True, True), (False, False, False), (True, True, True)]
I'm no numpy wizard but this should return what you want.
import numpy as np
def switch(arr):
    if np.any(arr):
        return np.ones(arr.shape, dtype=bool)
    return arr.astype(bool)
np.apply_along_axis(switch, 1, a)
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
ndarray.any along axis=1 and np.tile will get the job done:
np.tile(a.any(1)[:,None], a.shape[1])
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
Create an array of Trues whose length matches the original array's second dimension, and assign it to all rows that contain a True.
>>> a
array([[ True, False, True],
[False, False, False],
[False, True, False]])
>>> a[a.any(1)] = np.ones(a.shape[1], dtype=bool)
>>> a
array([[ True, True, True],
[False, False, False],
[ True, True, True]])
>>>
Relies on Broadcasting.
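A short sketch of why the code in the question turns everything True, plus a one-line variant of the broadcasting approach. The variable names whole_array_flag and row_flags are assumptions for illustration:

```python
import numpy as np

a = np.array([(True, False, True), (False, False, False), (False, True, False)])

# Without an axis, sum() collapses the whole array to one number,
# so the comparison yields a single boolean rather than one flag per row.
whole_array_flag = a.sum() > 0
# With axis=1 there is one flag per row.
row_flags = a.any(axis=1)

# keepdims=True keeps the shape (3, 1), which broadcasts against (3, 3):
# rows containing a True are OR-ed with True, other rows stay all False.
result = a | a.any(axis=1, keepdims=True)
print(result)
```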

What is best way to find first half of True's in boolean numpy array?

Here is the problem:
Take a numpy boolean array:
a = np.array([False, False, True, True, True, False, True, False])
I am using it as an index into a pandas DataFrame, but I need to create 2 new arrays that each contain half of the True values of the original array. Note that the example arrays are much shorter than the actual data set.
So like:
first_half = array([False, False, True, True, False, False, False, False])
second_half = array([False, False, False, False, True, False, True, False])
Does anyone have a good way to do this? Thanks.
First find the indices of the True values and pick the middle one as the cutoff:
inds = np.where(a)[0]
cutoff = inds[inds.shape[0]//2]
Then copy the values before and after the cutoff into two new arrays:
b = np.zeros(a.shape,dtype=bool)
c = np.zeros(a.shape,dtype=bool)
c[cutoff:] = a[cutoff:]
b[:cutoff] = a[:cutoff]
Results:
b
Out[65]: array([False, False, True, True, False, False, False, False], dtype=bool)
c
Out[64]: array([False, False, False, False, True, False, True, False], dtype=bool)
There are numerous ways to handle the indexing.
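One of those other ways, sketched with a running count of True values instead of an index cutoff. The names running, first_half and second_half are assumptions for illustration. Unlike the cutoff version, this should also work when the array contains no True at all:

```python
import numpy as np

a = np.array([False, False, True, True, True, False, True, False])

running = np.cumsum(a)             # number of True values up to and including each position
half = np.count_nonzero(a) // 2    # size of the first half (rounded down)

first_half = a & (running <= half)
second_half = a & (running > half)
print(first_half.tolist())
print(second_half.tolist())
```

When the number of True values is odd, the extra one lands in the second half, which matches the behaviour of the cutoff approach above.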

Combining 3 boolean masks in Python

I have 3 lists:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
When I type
a or b or c
I want to get back a list that's
[True, True, True]
but I'm getting back
[True, False, True]
Any ideas on why? And how can I combine these masks?
Your or operators are comparing the lists as entire objects, not their elements. Since a is not an empty list, it evaluates as true, and becomes the result of the or. b and c are not even evaluated.
To produce the logical OR of the three lists position-wise, you have to iterate over their contents and OR the values at each position. To convert a bunch of iterables into a list of their grouped elements, use zip(). To check if any element in an iterable is true (the OR of its entire contents), use any(). Do these two at once with a list comprehension:
mask = [any(tup) for tup in zip(a, b, c)]
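A small sketch that puts the two behaviours side by side, using the lists from the question. The variable names first_truthy, mask_or and mask_and are illustrative:

```python
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]

# `or` returns the first truthy operand, and a non-empty list is truthy,
# so this is simply the list a itself.
first_truthy = a or b or c

# zip groups the values position by position; any/all reduce each group.
mask_or = [any(group) for group in zip(a, b, c)]
mask_and = [all(group) for group in zip(a, b, c)]

print(first_truthy is a)  # True
print(mask_or)            # [True, True, True]
print(mask_and)           # [False, False, False]
```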
How about this:
from numpy import asarray as ar
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
Try:
>>> ar(a) | ar(b) | ar(c) #note also the use `|` instead of `or`
array([ True, True, True], dtype=bool)
So no need for zip etc.
or returns the first operand if it evaluates as true, and a non-empty list evaluates as true; so, a or b or c will always return a if it's a non-empty list.
Probably you want
[any(t) for t in zip(a, b, c)]
(this works also for element-wise and if you replace any with all)
a is treated as true because it is non-empty; b and c are not evaluated.
>>> bool([])
False
>>> bool([True])
True
>>> bool([False])
True
>>> [False] or [True]
[False]
According to Boolean Operations:
The expression x or y first evaluates x; if x is true, its value is returned; otherwise, y is evaluated and the resulting value is returned.
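How about
import numpy
# numpy.logical_or(a, b, c) would treat c as the output array, so reduce over all three instead
result = numpy.logical_or.reduce([a, b, c])
print(result)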
Try this:
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
res = [a[i] or b[i] or c[i] for i in range(len(a))]
print(res)
Here is how to do it fast (large arrays) using numpy:
import numpy as np
a = [True, False, True,...]
b = [False, False, True,...]
c = [True, True, False,...]
res = np.column_stack((a,b,c)).any(axis=1)
print(res)
Note that a becomes the first column, b the second, and so on when using np.column_stack(). Then do a np.any() (logical OR) on that array along axis=1 which will compare the first elements of a,b, and c, and so on, and so on; resulting in a boolean vector that is the same length as the vectors you want to compare.
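A runnable sketch of the column_stack approach with the short lists from the question, alongside np.logical_or.reduce as an alternative. The variable names stacked and res_reduce are assumptions for illustration:

```python
import numpy as np

a = [True, False, True]
b = [False, False, True]
c = [True, True, False]

stacked = np.column_stack((a, b, c))   # shape (3, 3): one column per input list
res = stacked.any(axis=1)              # OR across each row

# np.logical_or.reduce ORs the inputs together along the first axis.
res_reduce = np.logical_or.reduce([a, b, c])

print(res.tolist())         # [True, True, True]
print(res_reduce.tolist())  # [True, True, True]
```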
This is a good place to use Python's argument unpacking (the * operator, as in *args), which will allow you to pass 3 or 300000 lists.
a = [True, False, True]
b = [False, False, True]
c = [True, True, False]
data_lists = [a,b,c]
Here you use * to expand that list as arguments, zip to rearrange your data into columns (as lists), and any to check if any cell in that column is True.
[any(l) for l in zip(*data_lists)]
[True, True, True]
If you're working with a lot of lists it's the same, e.g.
import numpy as np
data_lists = [[i == j for i in range(7)] for j in range(7)]
np.matrix(data_lists)
matrix([[ True, False, False, False, False, False, False],
[False, True, False, False, False, False, False],
[False, False, True, False, False, False, False],
[False, False, False, True, False, False, False],
[False, False, False, False, True, False, False],
[False, False, False, False, False, True, False],
[False, False, False, False, False, False, True]])
[any(l) for l in zip(*data_lists)]
[True, True, True, True, True, True, True]

numpy: to exclude single and double saltuses in an array

I use Python with numpy.
I have numpy array b:
b = np.array([True,True,True,False,False,True,True,False,False,False,True,False,True])
I need to replace the runs [False] and [False,False] with [True] and [True,True] respectively.
(I need to exclude single and double saltuses, i.e. gaps of False values, from the array.)
For this example:
out= np.array([True,True,True,True, True, True,True,False,False,False,True,True,True])
Can someone please suggest how I can get out?
P.S.: What if I need to replace the runs [False], [False,False], [False,False,False] and [False,False,False,False] with [True], [True,True], [True,True,True] and [True,True,True,True] respectively?
How about using scipy.ndimage.binary_dilation and scipy.ndimage.binary_erosion:
import numpy as np
from scipy import ndimage
b = np.array([True,True,True,False,False,True,True,False,False,False,True,False,True])
ndimage.binary_erosion(ndimage.binary_dilation(b), border_value=1)
This may not be the best way to solve this, but have a look at the following (it needs import itertools):
In [115]: b
Out[115]:
array([ True, True, True, False, False, True, True, False, False,
False, True, False, True], dtype=bool)
In [116]: l = [(k,len(list(g))) for k, g in itertools.groupby(b)]
In [117]: l
Out[117]:
[(True, 3),
(False, 2),
(True, 2),
(False, 3),
(True, 1),
(False, 1),
(True, 1)]
In [118]: l2 = [(True, x[1]) if x[1] in [1,2] else x for x in l]
In [119]: l2
Out[119]: [(True, 3), (True, 2), (True, 2), (False, 3), (True, 1), (True, 1), (True, 1)]
In [120]: l3 = [[x[0]] * x[1] for x in l2]
In [121]: l3
Out[121]:
[[True, True, True],
[True, True],
[True, True],
[False, False, False],
[True],
[True],
[True]]
In [122]: l4 = [x for x in itertools.chain(*l3)]
In [123]: l4
Out[123]:
[True,
True,
True,
True,
True,
True,
True,
False,
False,
False,
True,
True,
True]
In [124]: out = np.array(l4)
In [125]: out
Out[125]:
array([ True, True, True, True, True, True, True, False, False,
False, True, True, True], dtype=bool)
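The groupby steps above can be condensed into one function. This is a minimal sketch; the name fill_short_false_runs and the max_gap parameter are assumptions added to cover the P.S. in the question (longer gaps). It only touches False runs, and it also fills short False runs at the very start or end of the input:

```python
from itertools import groupby

def fill_short_false_runs(values, max_gap=2):
    """Return a list where every run of at most max_gap False values becomes True."""
    out = []
    for key, group in groupby(bool(v) for v in values):
        run = list(group)
        if not key and len(run) <= max_gap:
            out.extend([True] * len(run))   # short False run: fill it
        else:
            out.extend(run)                 # True run or long False run: keep as is
    return out

b = [True, True, True, False, False, True, True,
     False, False, False, True, False, True]
print(fill_short_false_runs(b))
```

Calling fill_short_false_runs(b, max_gap=4) would additionally fill the run of three False values. Wrap the result in np.array(...) if an array is needed.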
I don't have numpy installed, but I think the following code will give you an idea of how to do something similar in numpy, if I understood correctly what you want:
b = [True,True,True,False,False,True,True,False,False,False,True,False,True,
     False,False,False,False,True,False,False,True]
print(b, '\n')
def grignote(X):
    # one shared iterator: the inner search advances it past the gap
    it = iter(range(len(X)))
    for i in it:
        print('i == %d %s' % (i, X[i]))
        if X[i] == False:
            # index of the next True after position i
            j = next(k for k in it if X[k] == True)
            print(' j == %d %s X[%d:%d]==%r' % (j, X[j], i, j, X[i:j]))
            if j - i < 3:
                print(' executing X[%d:%d]==%r' % (i, j, [True for m in range(j - i)]))
                X[i:j] = [True for m in range(j - i)]
            else:
                print(' --- no execution --- too long')
grignote(b)
print('\n', b, sep='')
result
[True, True, True, False, False, True, True, False, False, False, True, False, True, False, False, False, False, True, False, False, True]
i == 0 True
i == 1 True
i == 2 True
i == 3 False
j == 5 True X[3:5]==[False, False]
executing X[3:5]==[True, True]
i == 6 True
i == 7 False
j == 10 True X[7:10]==[False, False, False]
--- no execution --- too long
i == 11 False
j == 12 True X[11:12]==[False]
executing X[11:12]==[True]
i == 13 False
j == 17 True X[13:17]==[False, False, False, False]
--- no execution --- too long
i == 18 False
j == 20 True X[18:20]==[False, False]
executing X[18:20]==[True, True]
[True, True, True, True, True, True, True, False, False, False, True, True, True, False, False, False, False, True, True, True, True]
