Using entrywise sum of boolean arrays as inclusive `or` - python

I would like to compare many m-by-n boolean numpy arrays and get an array of the same shape whose entries are True if the corresponding entry in at least one of the inputs is True.
The easiest way I've found to do this is:
In [5]: import numpy as np
In [6]: a = np.array([True, False, True])
In [7]: b = np.array([True, True, False])
In [8]: a + b
Out[8]: array([ True, True, True])
But I can also use
In [11]: np.stack([a, b]).sum(axis=0) > 0
Out[11]: array([ True, True, True])
Are these equivalent operations? Are there any gotchas I should be aware of? Is one method preferable to the other?

You can use np.logical_or
a = np.array([True, False, True])
b = np.array([True, True, False])
np.logical_or(a,b)
It also works for (m, n) arrays:
a = np.random.rand(3,4) < 0.5
b = np.random.rand(3,4) < 0.5
print('a\n',a)
print('b\n',b)
np.logical_or(a,b)
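Since the question mentions many arrays, here is a minimal sketch of how the approaches compare when more than two inputs are involved. The third array `c` and the variable names are made up for illustration. For boolean inputs all four forms should agree. The main gotcha is that `+` only behaves like `or` when the dtype is bool: on integer 0/1 arrays it adds.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])
c = np.array([False, False, False])  # hypothetical third input

# Elementwise OR across any number of same-shaped arrays.
via_reduce = np.logical_or.reduce([a, b, c])

# The two forms from the question, extended to three arrays.
via_plus = a + b + c
via_sum = np.stack([a, b, c]).sum(axis=0) > 0

# any() along the stacking axis expresses the intent directly.
via_any = np.stack([a, b, c]).any(axis=0)

# Gotcha: with integer arrays, + is arithmetic addition, not OR.
int_sum = a.astype(int) + b.astype(int)
print(via_reduce, int_sum)
```

`np.logical_or.reduce` and `.any(axis=0)` also accept non-boolean inputs and treat any nonzero value as True, so they are the safer choice when the dtype is not guaranteed.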

Related

How to return a numpy array of the indices of the first element in each row of a numpy array with a given value?

Given a numpy array of shape (2, 4):
input = np.array([[False, True, False, True], [False, False, True, True]])
I want to return an array of shape (N,) where each element of the array is the index of the first True value:
expected = np.array([1, 2])
Is there an easy way to do this using numpy functions and without resorting to standard loops?
np.max with axis finds the max along the dimension; argmax finds the first max index:
In [42]: arr = np.array([[False, True, False, True], [False, False, True, True]])
In [43]: np.argmax(arr, axis=1)
Out[43]: array([1, 2])
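One thing to watch with argmax: a row that contains no True at all also yields index 0, which is indistinguishable from a row whose first element is True. A minimal sketch of one way to flag such rows follows. The extra all-False row and the `-1` sentinel are assumptions made for illustration, not part of the original question.

```python
import numpy as np

arr = np.array([[False, True, False, True],
                [False, False, True, True],
                [False, False, False, False]])  # hypothetical row with no True

# argmax returns the first maximum; for an all-False row that is index 0.
first = np.argmax(arr, axis=1)

# Mark rows that contain no True with a sentinel value (-1 here).
has_true = arr.any(axis=1)
first_or_missing = np.where(has_true, first, -1)
print(first, first_or_missing)
```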
This worked for me:
nonzeros = np.nonzero(input)
u, indices = np.unique(nonzeros[0], return_index=True)
expected = nonzeros[1][indices]

Numpy: sums 1D array indexed by rows of 2D boolean array

Assume that I have an (m,)-array a and an (n, m)-array b of booleans. For each row b[i] of b, I want to take np.sum(a, where=b[i]), which should return an (n,)-array. I could do the following:
a = np.array([1,2,3])
b = np.array([[True, False, False], [True, True, False], [False, True, True]])
c = np.array([np.sum(a, where=r) for r in b])
# c is [1,3,5]
but this seems quite inelegant to me. I would have hoped that broadcasting magic makes something like
c = np.sum(a, where=b)
# c is 9
work, but apparently, np.sum then sums over the rows of b, which I do not want. Is there a numpy-inherent way of achieving the desired behaviour with np.sum (or any ufunc.reduce)?
How about:
a = np.array([1,2,3])
b = np.array([[True, False, False], [True, True, False], [False, True, True]])
c = np.sum(a*b, axis = 1)
output:
array([1, 3, 5])
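For comparison, a few other ways to get the same per-row sums are sketched below. The variable names are illustrative. The first form stays closest to the `where=` idea from the question by broadcasting `a` to the shape of `b` before reducing along axis 1. The other two are plain alternatives to `a*b`.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[True, False, False],
              [True, True, False],
              [False, True, True]])

# Broadcast a to b's shape (a read-only view, no copy), then reduce per row.
c_where = np.sum(np.broadcast_to(a, b.shape), axis=1, where=b)

# Select a where b is True and 0 elsewhere, then sum each row.
c_select = np.where(b, a, 0).sum(axis=1)

# A boolean matrix times a vector gives the same masked row sums.
c_matmul = b @ a
print(c_where, c_select, c_matmul)
```

If `a` can contain NaN or inf, the `where=` and `np.where` forms skip masked-out entries, whereas multiplying by the mask turns them into NaN.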

Merging Two Numpy Boolean Arrays after an Index

In my problem I have 2 boolean numpy arrays that I would like to merge after a given index. Currently I am using np.logical_or(arr1, arr2), but it operates on the entire array. I only want to apply the operation from a given index onward.
Below I would like to use arr1 as the master and merge arr2 after any index.
For example, take the arrays and index below
arr1 = np.array([ True, False, False, True, False])
arr2 = np.array([False, True, True, True, False])
index = 2
Returns
# array([True, False, True, True, False])
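You can use array slicing and np.concatenate to implement this. Here arr3 takes the elements of arr1 at indices 0 up to (but not including) index, and the remaining elements from arr2. Note that this replaces the tail of arr1 with arr2 rather than OR-ing the two, which happens to give the same result for this example.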
arr1 = np.array([ True, False, False, True, False])
arr2 = np.array([False, True, True, True, False])
index = 2
arr3= np.concatenate((arr1[:index], arr2[index:]), axis = 0)
print(arr3)
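If the goal is a true logical OR from the index onward, rather than taking arr2's values there, an in-place OR on a slice of a copy is one option. This is a minimal sketch. The second pair of arrays (`p`, `q`) is a made-up case in which the two approaches give different results.

```python
import numpy as np

arr1 = np.array([True, False, False, True, False])
arr2 = np.array([False, True, True, True, False])
index = 2

# Copy arr1 so the original is untouched, then OR only the tail with arr2.
merged = arr1.copy()
merged[index:] |= arr2[index:]

# Hypothetical inputs where OR and concatenation disagree.
p = np.array([True, False, True, False])
q = np.array([False, False, False, True])
or_tail = p.copy()
or_tail[index:] |= q[index:]
concat_tail = np.concatenate((p[:index], q[index:]))
print(merged, or_tail, concat_tail)
```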

Filling a numpy array with x random values

I have a numpy array of size x, in which I need to set exactly 700 entries to True.
For example:
a = np.zeros(5956)
If I want to fill this with 70 % True, I can write this:
msk = np.random.rand(len(a)) < 0.7
b = spam_df[msk]
But what if I need exactly 700 true, and the rest false?
import numpy as np
x = 5956
a = np.zeros((x), dtype=bool)
random_places = np.random.choice(x, 700, replace=False)
a[random_places] = True
import numpy as np
zeros = np.zeros(5956 - 700, dtype=bool)
ones = np.ones(700, dtype=bool)
arr = np.concatenate((ones, zeros), axis=0)
np.random.shuffle(arr)  # Now 'arr' is shuffled, with 700 Trues and the rest False
Example - there should be 5 elements in an array with 3 True and rest False.
ones= np.ones(3, dtype=bool) #array([True, True, True])
zeros= np.zeros(5-3, dtype=bool) #array([False, False])
arr=np.concatenate((ones,zeros), axis=0, out=None) #arr - array([ True, True, True, False, False])
np.random.shuffle(arr) # now arr - array([False, True, True, True, False])
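The same idea can be written with NumPy's newer Generator interface, which makes the result reproducible when a seed is given. This is a sketch only: the seed value and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

size, n_true = 5956, 700
rng = np.random.default_rng(seed=0)  # arbitrary seed, for reproducibility only

# Pick n_true distinct positions and set them to True.
mask = np.zeros(size, dtype=bool)
mask[rng.choice(size, size=n_true, replace=False)] = True

print(mask.sum())
```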

pandas build matrix of row by row comparisons

I have two dataframes, a (10,2) and a (3,2), and I am looking for a faster/more pythonic way to compare them row by row.
x = pd.DataFrame([range(10),range(2,12)])
x = x.transpose()
y = pd.DataFrame([[5,8],[2,3],[5,5]])
I'd like to build a comparison matrix (10,3) that shows which of the rows in the first dataframe fit the following requirements in the second dataframe. The x[1] value must be >= the y[0] value and the x[0] value must be <= the y[1] value. In reality, the data are dates, but for simplicity I have just used integers to make this example easier to follow. We're testing for overlap in time periods, so the logic shows that there must be some overlap in the periods of the respective tables.
arr = np.zeros((len(x),len(y)), dtype=bool)
for xrow in x.index:
    for yrow in y.index:
        if x.loc[xrow,1] >= y.loc[yrow,0] and x.loc[xrow,0] <= y.loc[yrow,1]:
            arr[xrow,yrow] = True
arr
The brute force approach above is too slow. Any suggestions for how I could vectorize this or do some sort of transposed matrix comparisons?
You can convert x, y to NumPy arrays and then extend dimensions with np.newaxis/None, which would bring in NumPy's broadcasting when performing the same operations. Thus, all those comparisons and the output boolean array would be created in a vectorized fashion. The implementation would look like this -
X = np.asarray(x)
Y = np.asarray(y)
arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
Sample run -
In [207]: x = pd.DataFrame([range(10),range(2,12)])
...: x = x.transpose()
...: y = pd.DataFrame([[5,8],[2,3],[5,5]])
...:
In [208]: X = np.asarray(x)
...: Y = np.asarray(y)
...: arr = (X[:,None,1] >= Y[:,0]) & (X[:,None,0] <= Y[:,1])
...:
In [209]: arr
Out[209]:
array([[False,  True, False],
       [False,  True, False],
       [False,  True, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True, False, False],
       [ True, False, False],
       [ True, False, False],
       [False, False, False]], dtype=bool)
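To see why the broadcast works, it can help to check it against the explicit double loop. The sketch below uses plain NumPy arrays as stand-ins for the two DataFrames, which is an assumption made to keep the example self-contained. `X[:, None, 1]` has shape (10, 1) and `Y[:, 0]` has shape (3,), so each comparison broadcasts to (10, 3).

```python
import numpy as np

# Stand-ins for the DataFrames: columns are (start, end) of each period.
X = np.column_stack([np.arange(10), np.arange(2, 12)])  # shape (10, 2)
Y = np.array([[5, 8], [2, 3], [5, 5]])                  # shape (3, 2)

# Vectorized overlap test: x_end >= y_start and x_start <= y_end.
vectorized = (X[:, None, 1] >= Y[:, 0]) & (X[:, None, 0] <= Y[:, 1])

# Reference implementation with explicit loops.
looped = np.zeros((len(X), len(Y)), dtype=bool)
for i in range(len(X)):
    for j in range(len(Y)):
        looped[i, j] = X[i, 1] >= Y[j, 0] and X[i, 0] <= Y[j, 1]

print(np.array_equal(vectorized, looped))
```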
