numpy conditional list membership element wise

numpy conditional list membership element wise - python

I have a 2D numpy array:
a = np.array([[0,1],
[2,3]])
I have a list of values to keep:
vals_keep = [1,2]
I want to test for list membership for each element in the array. Something like:
mask = a in vals_keep
The result I want:
array([[False, True],
[True, False]])

You can use isin
isin is an element-wise function version of the python keyword in
np.isin(a, vals_keep)
array([[False, True],
[ True, False]])
An added benefit of isin is that it's flexible with arrays of different dimensions:
a = np.arange(4).reshape(1,2,2,1)
np.isin(a, vals_keep)
array([[[[False],
[ True]],
[[ True],
[False]]]])

Here is one way using broadcasting:
In [35]: (a[:, :, None] == vals_keep).any(2)
Out[35]:
array([[False, True],
[ True, False]])
Which is faster than isin for small arrays (less than 100 rows):
In [37]: %timeit np.isin(a, vals_keep)
22 µs ± 728 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [38]: %timeit (a[:, :, None] == vals_keep).any(2)
12.6 µs ± 95.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
For large arrays it's better to use isin because broadcasting in 3D is not very efficient for large arrays/matrices.

Related

How to obtain only the first True value from each row of a numpy array?

I have a 4x3 boolean numpy array, and I'm trying to return a same-sized array which is all False, except for the location of the first True value on each row of the original. So if I have a starting array of
all_bools = np.array([[False, True, True],[True, True, True],[False, False, True],[False,False,False]])
all_bools
array([[False, True, True], # First true value = index 1
[ True, True, True], # First true value = index 0
[False, False, True], # First true value = index 2
[False, False, False]]) # No True Values
then I'd like to return
[[False, True, False],
[True, False, False],
[False, False, True],
[False, False, False]]
so indices 1, 0 and 2 on the first three rows have been set to True and nothing else. Essentially any True value (beyond the first on each row) from the original way have been set to False.
I've been fiddling around with this with np.where and np.argmax and I haven't yet found a good solution - any help gratefully received. This needs to run many, many times so I'd like to avoid iterating.

You can use cumsum, and find the first bool by comparing the result with 1.
all_bools.cumsum(axis=1).cumsum(axis=1) == 1
array([[False, True, False],
[ True, False, False],
[False, False, True],
[False, False, False]])
This also accounts for the issue #a_guest pointed out. The second cumsum call is needed to avoid matching all False values between the first and second True value.
If performance is important, use argmax and set values:
y = np.zeros_like(all_bools, dtype=bool)
idx = np.arange(len(x)), x.argmax(axis=1)
y[idx] = x[idx]
y
array([[False, True, False],
[ True, False, False],
[False, False, True],
[False, False, False]])
Perfplot Performance Timings
I'll take this opportunity to show off perfplot, with some timings, since it is good to see how our solutions vary with different sized inputs.
import numpy as np
import perfplot
def cs1(x):
return x.cumsum(axis=1).cumsum(axis=1) == 1
def cs2(x):
y = np.zeros_like(x, dtype=bool)
idx = np.arange(len(x)), x.argmax(axis=1)
y[idx] = x[idx]
return y
def a_guest(x):
b = np.zeros_like(x, dtype=bool)
i = np.argmax(x, axis=1)
b[np.arange(i.size), i] = np.logical_or.reduce(x, axis=1)
return b
perfplot.show(
setup=lambda n: np.random.randint(0, 2, size=(n, n)).astype(bool),
kernels=[cs1, cs2, a_guest],
labels=['cs1', 'cs2', 'a_guest'],
n_range=[2**k for k in range(1, 8)],
xlabel='N'
)
The trend carries forward to larger N. cumsum is very expensive, while there is a constant time difference between my second solution, and #a_guest's.

You can use the following approach using np.argmax and a product with np.logical_or.reduce for dealing with rows that are all False:
b = np.zeros_like(a, dtype=bool)
i = np.argmax(a, axis=1)
b[np.arange(i.size), i] = np.logical_or.reduce(a, axis=1)
Timing results
Different versions in increasing performance, i.e. fastest approach comes last:
In [1]: import numpy as np
In [2]: def f(a):
...: return a.cumsum(axis=1).cumsum(axis=1) == 1
...:
...:
In [3]: def g(a):
...: b = np.zeros_like(a, dtype=bool)
...: i = np.argmax(a, axis=1)
...: b[np.arange(i.size), i] = np.logical_or.reduce(a, axis=1)
...: return b
...:
...:
In [4]: x = np.random.randint(0, 2, size=(1000, 1000)).astype(bool)
In [5]: %timeit f(x)
10.4 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [6]: %timeit g(x)
120 µs ± 184 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: def h(a):
...: y = np.zeros_like(x)
...: idx = np.arange(len(x)), x.argmax(axis=1)
...: y[idx] += x[idx]
...: return y
...:
...:
In [8]: %timeit h(x)
92.1 µs ± 3.51 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: def h2(a):
...: y = np.zeros_like(x)
...: idx = np.arange(len(x)), x.argmax(axis=1)
...: y[idx] = x[idx]
...: return y
...:
...:
In [10]: %timeit h2(x)
78.5 µs ± 353 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Search indexes where values in my array match a value in a different array (python) [duplicate]

I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9

You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])

Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])

Just for completeness: If the values in A are non negative and reasonably small:
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]

I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.
I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit
In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])
In [5]: np.where(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [6]: np.nonzero(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [7]: np.searchsorted(a, b)
...: array([0, 6, 9])
In [8]: np.searchsorted(a, np.intersect1d(a, b))
...: array([0, 6, 9])
In [9]: [i for i, x in enumerate(a) if x in b]
...: [0, 6, 9]
In [10]: #njit
...: def func(a, b):
...: return [i for i, x in enumerate(a) if x in b]
In [11]: func(a, b)
...: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Intersection between two multi-dimensional arrays with tolerance - NumPy / Python

i am stuck at a problem. I have two 2-D numpy arrays, filled with x and y coordinates. Those arrays might look like:
array1([[(1.22, 5.64)],
[(2.31, 7.63)],
[(4.94, 4.15)]],
array2([[(1.23, 5.63)],
[(6.31, 10.63)],
[(2.32, 7.65)]],
Now I have to find "duplicate nodes". However, i also have to consider nodes as equal within a given tolerance of the coordinates, therefore, i can't use solutions like this . Since my arrays are quite big (~200.000 lines each) two simple for loops are not an option as well. My final output should look like this:
output([[(1.23, 5.63)],
[(2.32, 7.65)]],
I would appreciate some hints.
Cheers,

In order to compare to nodes with a giving tolerance I recommend to use numpy.isclose(), where you can set a relative and absolute tolerance.
numpy.isclose(1.24, 1.25, atol=1e-1)
# [True]
numpy.isclose([1.24, 2.31], [1.25, 2.32], atol=1e-1)
# [True, True]
Instead of using a two for loops, you can make use of itertools.product() package, to go through all pairs. The following code does what you want:
array1 = np.array([[1.22, 5.64],
[2.31, 7.63],
[4.94, 4.15]])
array2 = np.array([[1.23, 5.63],
[6.31, 10.63],
[2.32, 7.64]])
output = np.empty((0,2))
for i0, i1 in itertools.product(np.arange(array1.shape[0]),
np.arange(array2.shape[0])):
if np.all(np.isclose(array1[i0], array2[i1], atol=1e-1)):
output = np.concatenate((output, [array2[i1]]), axis=0)
# output = [[ 1.23 5.63]
# [ 2.32 7.64]]

Defining a isclose function similar to numpy.isclose, but a bit faster (mostly due to not checking any input and not supporting both relative and absolute tolerance):
import numpy as np
array1 = np.array([[(1.22, 5.64)],
[(2.31, 7.63)],
[(4.94, 4.15)]])
array2 = np.array([[(1.23, 5.63)],
[(6.31, 10.63)],
[(2.32, 7.65)]])
def isclose(x, y, atol):
return np.abs(x - y) < atol
Now comes the hard part. We need to calculate if any two values are close within the inner most dimension. For this I reshape the arrays in such a way that the first array has its values along the second dimension, replicated across the first and the second array has its values along the first dimension, replicated along the second (note the 1, 3 and 3, 1):
In [92]: isclose(array1.reshape(1,3,2), array2.reshape(3,1,2), 0.03)
Out[92]:
array([[[ True, True],
[False, False],
[False, False]],
[[False, False],
[False, False],
[False, False]],
[[False, False],
[ True, True],
[False, False]]], dtype=bool)
Now we want all entries where the value is close to any other value (along the same dimension):
In [93]: isclose(array1.reshape(1,3,2), array2.reshape(3,1,2), 0.03).any(axis=0)
Out[93]:
array([[ True, True],
[ True, True],
[False, False]], dtype=bool)
Then we want only those where both values of the tuple are close:
In [111]: isclose(array1.reshape(1,3,2), array2.reshape(3,1,2), 0.03).any(axis=0).all(axis=-1)
Out[111]: array([ True, True, False], dtype=bool)
And finally, we can use this to index array1:
In [112]: array1[isclose(array1.reshape(1,3,2), array2.reshape(3,1,2), 0.03).any(axis=0).all(axis=-1)]
Out[112]:
array([[[ 1.22, 5.64]],
[[ 2.31, 7.63]]])
If you want to, you can swap the any and all calls. One might be faster than the other in your case.
The 3 in the reshape calls needs to be substituted for the actual length of your data.
This algorithm will have the same bad runtime of the other answer using itertools.product, but at least the actual looping is done implicitly by numpy and is implemented in C. This is visible in the timings:
In [122]: %timeit array1[isclose(array1.reshape(1,len(array1),2), array2.reshape(len(array2),1,2), 0.03).any(axis=0).all(axis=-1)]
11.6 µs ± 493 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [126]: %timeit pares(array1_pares, array2_pares)
267 µs ± 8.72 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Where the pares function is the code defined by #Ferran Parés in another answer and the arrays as already reshaped there.
And for larger arrays it becomes more obvious:
array1 = np.random.normal(0, 0.1, size=(1000, 1, 2))
array2 = np.random.normal(0, 0.1, size=(1000, 1, 2))
array1_pares = array1.reshape(1000, 2)
array2_pares = arra2.reshape(1000, 2)
In [149]: %timeit array1[isclose(array1.reshape(1,len(array1),2), array2.reshape(len(array2),1,2), 0.03).any(axis=0).all(axis=-1)]
135 µs ± 5.34 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [157]: %timeit pares(array1_pares, array2_pares)
1min 36s ± 6.85 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
In the end this is limited by the available system memory. My machine (16GB RAM) can still handle arrays of length 20000, but that pushes it almost to 100%. It also takes about 12s:
In [14]: array1 = np.random.normal(0, 0.1, size=(20000, 1, 2))
In [15]: array2 = np.random.normal(0, 0.1, size=(20000, 1, 2))
In [16]: %timeit array1[isclose(array1.reshape(1,len(array1),2), array2.reshape(len(array2),1,2), 0.03).any(axis=0).all(axis=-1)]
12.3 s ± 514 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

There are many possible ways to define that tolerance. Since, we are talking about XY coordinates, most probably we are talking about euclidean distances to set that tolerance value. So, we can use Cython-powered kd-tree for quick nearest-neighbor lookup, which is very efficient both memory-wise and with performance. The implementation would look something like this -
from scipy.spatial import cKDTree
# Assuming a default tolerance value of 1 here
def intersect_close(a, b, tol=1):
# Get closest distances for each pt in b
dist = cKDTree(a).query(b, k=1)[0] # k=1 selects closest one neighbor
# Check the distances against the given tolerance value and
# thus filter out rows off b for the final output
return b[dist <= tol]
Sample step-by-step run -
# Input 2D arrays
In [68]: a
Out[68]:
array([[1.22, 5.64],
[2.31, 7.63],
[4.94, 4.15]])
In [69]: b
Out[69]:
array([[ 1.23, 5.63],
[ 6.31, 10.63],
[ 2.32, 7.65]])
# Get closest distances for each pt in b
In [70]: dist = cKDTree(a).query(b, k=1)[0]
In [71]: dist
Out[71]: array([0.01414214, 5. , 0.02236068])
# Mask of distances within the given tolerance
In [72]: tol = 1
In [73]: dist <= tol
Out[73]: array([ True, False, True])
# Finally filter out valid ones off b
In [74]: b[dist <= tol]
Out[74]:
array([[1.23, 5.63],
[2.32, 7.65]])
Timings on 200,000 pts -
In [20]: N = 200000
...: np.random.seed(0)
...: a = np.random.rand(N,2)
...: b = np.random.rand(N,2)
In [21]: %timeit intersect_close(a, b)
1 loop, best of 3: 1.37 s per loop

As commented, scaling and rounding your numbers might allow you to use intersect1d or the equivalent.
And if you have just 2 columns, it might work to turn it into a 1d array of complex dtype.
But you might also want to keep in mind what intersect1d does:
if not assume_unique:
# Might be faster than unique( intersect1d( ar1, ar2 ) )?
ar1 = unique(ar1)
ar2 = unique(ar2)
aux = np.concatenate((ar1, ar2))
aux.sort()
return aux[:-1][aux[1:] == aux[:-1]]
unique has been enhanced to handle rows (axis parameters), but intersect has not. In any case it uses argsort to put similar elements next to each other, and then skips the duplicates.
Notice that intersect concatenenates the unique arrays, sorts, and again finds the duplicates.
I know you didn't want a loop version, but to promote conceptualization of the problem here's one anyways:
In [581]: a = np.array([(1.22, 5.64),
...: (2.31, 7.63),
...: (4.94, 4.15)])
...:
...: b = np.array([(1.23, 5.63),
...: (6.31, 10.63),
...: (2.32, 7.65)])
...:
I removed a layer of nesting in your arrays.
In [582]: c = []
In [583]: for a1 in a:
...: for b1 in b:
...: if np.allclose(a1,b1, atol=0.5): c.append((a1,b1))
or as list comprehension
In [586]: [(a1,b1) for a1 in a for b1 in b if np.allclose(a1,b1,atol=0.5)]
Out[586]:
[(array([1.22, 5.64]), array([1.23, 5.63])),
(array([2.31, 7.63]), array([2.32, 7.65]))]
complex approximation
In [604]: aa = (a*10).astype(int)
In [605]: aa
Out[605]:
array([[12, 56],
[23, 76],
[49, 41]])
In [606]: ac=aa[:,0]+1j*aa[:,1]
In [607]: bb = (b*10).astype(int)
In [608]: bc=bb[:,0]+1j*bb[:,1]
In [609]: np.intersect1d(ac,bc)
Out[609]: array([12.+56.j, 23.+76.j])
intersect inspired
Concatenate the arrays, sort them, take difference, and find the small differences:
In [616]: ab = np.concatenate((a,b),axis=0)
In [618]: np.lexsort(ab.T)
Out[618]: array([2, 3, 0, 1, 5, 4], dtype=int32)
In [619]: ab1 = ab[_,:]
In [620]: ab1
Out[620]:
array([[ 4.94, 4.15],
[ 1.23, 5.63],
[ 1.22, 5.64],
[ 2.31, 7.63],
[ 2.32, 7.65],
[ 6.31, 10.63]])
In [621]: ab1[1:]-ab1[:-1]
Out[621]:
array([[-3.71, 1.48],
[-0.01, 0.01],
[ 1.09, 1.99],
[ 0.01, 0.02],
[ 3.99, 2.98]])
In [623]: ((ab1[1:]-ab1[:-1])<.1).all(axis=1) # refine with abs
Out[623]: array([False, True, False, True, False])
In [626]: np.where(Out[623])
Out[626]: (array([1, 3], dtype=int32),)
In [627]: ab[_]
Out[627]:
array([[2.31, 7.63],
[1.23, 5.63]])

May be you could try this using pure NP and self defined function:
import numpy as np
#Your Example
xDA=np.array([[1.22, 5.64],[2.31, 7.63],[4.94, 4.15],[6.1,6.2]])
yDA=np.array([[1.23, 5.63],[6.31, 10.63],[2.32, 7.65],[3.1,9.2]])
###Try this large sample###
#xDA=np.round(np.random.uniform(1,2, size=(5000, 2)),2)
#yDA=np.round(np.random.uniform(1,2, size=(5000, 2)),2)
print(xDA)
print(yDA)
#Match x to y
def np_matrix(myx,myy,calp=0.2):
Xxx = np.transpose(np.repeat(myx[:, np.newaxis], myy.size, axis=1))
Yyy = np.repeat(myy[:, np.newaxis], myx.size, axis=1)
# define a caliper
matches = {}
dist = np.abs(Xxx - Yyy)
for m in range(0, myx.size):
if (np.min(dist[:, m]) <= calp) or not calp:
matches[m] = np.argmin(dist[:, m])
return matches
alwd_dist=0.1
xc1=xDA[:,1]
yc1=yDA[:,1]
m1=np_matrix(xc1,yc1,alwd_dist)
xc0=xDA[:,0]
yc0=yDA[:,0]
m0=np_matrix(xc0,yc0,alwd_dist)
shared_items = set(m1.items()) & set(m0.items())
if (int(len(shared_items))==0):
print("No Matched Items based on given allowed distance:",alwd_dist)
else:
print("Matched:")
for ke in shared_items:
print(xDA[ke[0]],yDA[ke[1]])

Finding indices of matches of one array in another array

I have two numpy arrays, A and B. A conatains unique values and B is a sub-array of A.
Now I am looking for a way to get the index of B's values within A.
For example:
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
# I need a function fun() that:
fun(A,B)
>> 0,6,9

You can use np.in1d with np.nonzero -
np.nonzero(np.in1d(A,B))[0]
You can also use np.searchsorted, if you care about maintaining the order -
np.searchsorted(A,B)
For a generic case, when A & B are unsorted arrays, you can bring in the sorter option in np.searchsorted, like so -
sort_idx = A.argsort()
out = sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
I would add in my favorite broadcasting too in the mix to solve a generic case -
np.nonzero(B[:,None] == A)[1]
Sample run -
In [125]: A
Out[125]: array([ 7, 5, 1, 6, 10, 9, 8])
In [126]: B
Out[126]: array([ 1, 10, 7])
In [127]: sort_idx = A.argsort()
In [128]: sort_idx[np.searchsorted(A,B,sorter = sort_idx)]
Out[128]: array([2, 4, 0])
In [129]: np.nonzero(B[:,None] == A)[1]
Out[129]: array([2, 4, 0])

Have you tried searchsorted?
A = np.array([1,2,3,4,5,6,7,8,9,10])
B = np.array([1,7,10])
A.searchsorted(B)
# array([0, 6, 9])

Just for completeness: If the values in A are non negative and reasonably small:
lookup = np.empty((np.max(A) + 1), dtype=int)
lookup[A] = np.arange(len(A))
indices = lookup[B]

I had the same question these days. However, the timing performance is very critical for me. Therefore, I guess the timing comparison of different solutions may be useful for others.
As Divakar mentioned, you can use np.in1d(A, B) with np.where, np.nonzero. Moreover, you can use the np.in1d(A, B) with np.intersect1d (based on this page). Also, you can use np.searchsorted as another useful approach for sorted arrays.
I want to add another simple solution. You can use the comprehension list. It may take longer that the previous ones. However, if you take the advantage of Numba python package, it is much less time-consuming.
In [1]: import numpy as np
In [2]: from numba import njit
In [3]: a = np.array([1,2,3,4,5,6,7,8,9,10])
In [4]: b = np.array([1,7,10])
In [5]: np.where(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [6]: np.nonzero(np.in1d(a, b))[0]
...: array([0, 6, 9])
In [7]: np.searchsorted(a, b)
...: array([0, 6, 9])
In [8]: np.searchsorted(a, np.intersect1d(a, b))
...: array([0, 6, 9])
In [9]: [i for i, x in enumerate(a) if x in b]
...: [0, 6, 9]
In [10]: #njit
...: def func(a, b):
...: return [i for i, x in enumerate(a) if x in b]
In [11]: func(a, b)
...: [0, 6, 9]
Now, let's compare the timing performance of these solutions.
In [12]: %timeit np.where(np.in1d(a, b))[0]
4.26 µs ± 6.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [13]: %timeit np.nonzero(np.in1d(a, b))[0]
4.39 µs ± 14.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [14]: %timeit np.searchsorted(a, b)
800 ns ± 6.04 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [15]: %timeit np.searchsorted(a, np.intersect1d(a, b))
8.8 µs ± 73.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [16]: %timeit [i for i, x in enumerate(a) if x in b]
15.4 µs ± 18.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [17]: %timeit func(a, b)
336 ns ± 0.579 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Getting indices of True values in a boolean list

I have a piece of my code where I'm supposed to create a switchboard. I want to return a list of all the switches that are on. Here "on" will equal True and "off" equal False. So now I just want to return a list of all the True values and their position. This is all I have but it only return the position of the first occurrence of True (this is just a portion of my code):
self.states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
def which_switch(self):
x = [self.states.index(i) for i in self.states if i == True]
This only returns "4"

Use enumerate, list.index returns the index of first match found.
>>> t = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> [i for i, x in enumerate(t) if x]
[4, 5, 7]
For huge lists, it'd be better to use itertools.compress:
>>> from itertools import compress
>>> list(compress(xrange(len(t)), t))
[4, 5, 7]
>>> t = t*1000
>>> %timeit [i for i, x in enumerate(t) if x]
100 loops, best of 3: 2.55 ms per loop
>>> %timeit list(compress(xrange(len(t)), t))
1000 loops, best of 3: 696 µs per loop

If you have numpy available:
>>> import numpy as np
>>> states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> np.where(states)[0]
array([4, 5, 7])

TL; DR: use np.where as it is the fastest option. Your options are np.where, itertools.compress, and list comprehension.
See the detailed comparison below, where it can be seen np.where outperforms both itertools.compress and also list comprehension.
>>> from itertools import compress
>>> import numpy as np
>>> t = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]`
>>> t = 1000*t
Method 1: Using list comprehension
>>> %timeit [i for i, x in enumerate(t) if x]
457 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Method 2: Using itertools.compress
>>> %timeit list(compress(range(len(t)), t))
210 µs ± 704 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Method 3 (the fastest method): Using numpy.where
>>> %timeit np.where(t)
179 µs ± 593 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Using element-wise multiplication and a set:
>>> states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
>>> set(multiply(states,range(1,len(states)+1))-1).difference({-1})
Output:
{4, 5, 7}

You can use filter for it:
filter(lambda x: self.states[x], range(len(self.states)))
The range here enumerates elements of your list and since we want only those where self.states is True, we are applying a filter based on this condition.
For Python > 3.0:
list(filter(lambda x: self.states[x], range(len(self.states))))

Use dictionary comprehension way,
x = {k:v for k,v in enumerate(states) if v == True}
Input:
states = [False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
Output:
{4: True, 5: True, 7: True}

Simply do this:
def which_index(self):
return [
i for i in range(len(self.states))
if self.states[i] == True
]

I got different benchmark result compared to #meysham answer. In this test, compress seems the fastest (python 3.7).
from itertools import compress
import numpy as np
t = [True, False, False, False, False, True, True, False, True, False, False, False, False, False, False, False, False]
%timeit [i for i, x in enumerate(t) if x]
%timeit list(compress(range(len(t)), t))
%timeit list(filter(lambda x: t[x], range(len(t))))
%timeit np.where(t)[0]
# 2.54 µs ± 400 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 2.67 µs ± 600 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 6.22 µs ± 624 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# 6.52 µs ± 768 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
t = 1000*t
%timeit [i for i, x in enumerate(t) if x]
%timeit list(compress(range(len(t)), t))
%timeit list(filter(lambda x: t[x], range(len(t))))
%timeit np.where(t)[0]
# 1.68 ms ± 112 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 947 µs ± 105 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# 3.96 ms ± 97 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# 2.14 ms ± 45.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

You can filter by using boolean mask array with square bracket, it's faster than np.where
>>> states = [True, False, False, True]
>>> np.arange(len(states))[states]
array([0, 3])
>>> size = 1_000_000
>>> states = np.arange(size) % 2 == 0
>>> states
array([ True, False, True, ..., False, True, False])
>>> true_index = np.arange(size)[states]
>>> len(true_index)
500000
>>> true_index
array([ 0, 2, 4, ..., 999994, 999996, 999998])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

numpy conditional list membership element wise - python

I have a 2D numpy array: a = np.array([[0,1], [2,3]]) I have a list of values to keep: vals_keep = [1,2] I want to test for list membership for each element in the array. Something like: mask = a in vals_keep The result I want: array([[False, True], [True, False]])

Related

How to obtain only the first True value from each row of a numpy array?

Search indexes where values in my array match a value in a different array (python) [duplicate]

Intersection between two multi-dimensional arrays with tolerance - NumPy / Python

Finding indices of matches of one array in another array

Getting indices of True values in a boolean list

Categories

Resources