How do I search for indices that satisfy a condition in numpy? - python

I have columns corresponding to a given day, month, and year in a numpy array called 'a', and I am comparing all three of these values to the day, month, and year columns of another array called 'b' to find the index of 'a' that is equal to 'b'. So far I have tried:
a[:,3:6,1] == b[1,3:6]
array([[False,  True,  True],
       [ True,  True,  True],
       [False,  True,  True],
       ...,
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
which works fine, but I need the row that corresponds to [True, True, True].
I've also tried:
np.where(a[:,3:6,1] == b[1,3:6], a[:,3:6,1])
ValueError: either both or neither of x and y should be given
and
a[:,:,1].all(a[:,3:6,1] == b[1,3:6])
TypeError: only length-1 arrays can be converted to Python scalars
What is a quick and easy way to do this?

You can use np.all() along the last axis:
rows = np.where((a[:,3:6,1]==b[1,3:6]).all(axis=1))[0]
This stores in rows the indices of the rows where all three comparisons are True.
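For example, here is a minimal self-contained sketch of the same idea on made-up data, with dates_a standing in for the day/month/year columns of a and target for the matching row of b:
import numpy as np
# hypothetical stand-ins for the day/month/year columns of 'a' and the row of 'b'
dates_a = np.array([[1, 2, 2020],
                    [5, 3, 2021],
                    [7, 3, 2021]])
target = np.array([5, 3, 2021])
rows = np.where((dates_a == target).all(axis=1))[0]
print(rows)  # [1] -- the only row whose day, month and year all match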

Related

At least one True value per column in numpy boolean array

Suppose I have a very big 2D boolean array (for the sake of the example, let's take dimensions 4 rows x 3 columns):
toto = np.array([[True, True, False],
                 [False, True, False],
                 [True, False, False],
                 [False, True, False]])
I want to transform toto so that it contains at least one True value per column, leaving the other columns untouched.
EDIT: The rule is just this: if a column is all False, I want to introduce a True in a random row.
So in this example, one of the False values in the 3rd column should become True.
How would you do that efficiently?
Thank you in advance
You can do it like this:
col_mask = ~np.any(toto, axis=0)
row_idx = np.random.randint(toto.shape[0], size=np.sum(col_mask))
toto[row_idx, col_mask] = True
col_mask is array([False, False,  True]): the columns that contain no True value.
row_idx is an array of randomly chosen row indices, one for each such column.
import numpy as np
toto = np.array([[False, True, False],
                 [False, True, False],
                 [False, False, False],
                 [False, True, False]])
# First we get a boolean array indicating columns that have at least one True value
mask = np.any(toto, axis=0)
# Now we invert the mask to get columns indexes (as boolean array) with no True value
mask = np.logical_not(mask)
# Notice that if we index with this mask on the column dimension we get elements
# in all rows only in the columns containing no True value. The dimension is
# "num_rows x num_columns_without_true"
toto[:, mask]
# Now we need random indexes for rows in the columns containing only false. That
# means an array of integers from zero to `num_rows - 1` with
# `num_columns_without_true` elements
row_indexes = np.random.randint(toto.shape[0], size=np.sum(mask))
# Now we can use the row indexes together with the column mask to select one
# False element in each all-False column and set it to True
toto[row_indexes, mask] = True
Disclaimer: mathfux was faster with essentially the same solution as the one I was writing (accept his answer then if this is what you were looking for), but since I was writing with more comments I decided to post anyway.
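For completeness, a quick sanity check you can run after either snippet above (assuming toto is the array just modified): every column should now contain at least one True.
assert toto.any(axis=0).all()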

Check numpy array if a row contains at least one false

I have a 2*n array of True and False values.
array([[False,  True],
       [False,  True],
       [False,  True],
       ...,
       [False,  True],
       [False,  True],
       [False,  True]])
What I want is a new vector (it can be in another array) that has False if either of the two values in the row is False.
I can create a loop, check each value in the row, and build a new vector, but I'm guessing that's slow:
boidx = np.empty(len(minindex), dtype=bool)
for idx in range(len(minindex)):
    if minindex[idx,0] and minindex[idx,1]:
        boidx[idx] = True
    else:
        boidx[idx] = False
but this is long and not pythonic.
The array is either 2*n or 4*n, so the solution should cover both cases (my for loop does not); if needed, two solutions with an if on the size would be fine.
I also tried the numpy.isin() command, but it works per cell; I need it per row.
As an answer already pointed out, you can use numpy.all() to solve it.
A simpler formulation without any loop would be:
np.all(minindex, axis=1)
If I understand correctly, a pythonic solution could use numpy.all:
import numpy as np
minindex = np.array([[False, True],
                     [False, True],
                     [True, True],
                     [True, False],
                     [False, True],
                     [False, False],
                     [False, False],
                     [True, True]])
boidx = np.array([np.all(i) for i in minindex])
and you get:
[False False True False False False False True]
Another solution could be the use of prod:
boidx = np.array([bool(i.prod()) for i in minindex])
and you get the same result.
As suggested by #Jianyu, this way should be definitely faster:
boidx = np.all(minindex, axis=1)
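For what it's worth, np.all(..., axis=1) also covers the 4-column case mentioned in the question, since axis=1 reduces over however many columns each row has; a small made-up example:
minindex4 = np.array([[True, True, True, True],
                      [True, False, True, True]])
print(np.all(minindex4, axis=1))  # [ True False]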

Appending one truth table to another

So I need to generate a truth table for a bunch of different functions (like implies, not p and q, not p and q, and, or, etc.)
I have a recursive method that generates the first two terms of each index correctly ([False, False], [False, True], [True, False], [True, True]).
However, what I need to do is take those two terms and then append the result of applying one of the different functions to them onto the end of each index.
make_tt_ins(n): My recursive table builder with n rows (in this case two)
and callf2(f, p, q): a given function that generates the True / False term I'll need to append onto each index.
my_list = PA1.make_tt_ins(2)
p = True;
q = True;
val = [callf2(f, p, q)]
returnVal = [i + val for i in my_list]
return returnVal
Obviously, all I'm getting is True after my initial two values in each index. I just don't know how to correctly append the callf2 function result onto my first two values in each index.
For the function implies (p <-> q), I'm getting:
[[False, False, True], [False, True, True], [True, False, True], [True, True, True]]
It should look something like:
[[False, False, True], [False, True, False], [True, False, False], [True, True, True]]
Figured it out. To anyone wondering, I decided to use one big while loop with a counter, where at each step I set p/q to different True/False values and then ran them through the callf2 function. I then turned those values into a list, which I appended onto my first partial list.
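For reference, here is a minimal sketch of the per-row idea; PA1.make_tt_ins and callf2 are the asker's own helpers and are only assumed to behave as described above. The key point is to evaluate callf2 once for every [p, q] row instead of once up front:
def build_table(f):
    rows = PA1.make_tt_ins(2)                  # [[False, False], [False, True], ...]
    table = []
    for p, q in rows:
        table.append([p, q, callf2(f, p, q)])  # evaluate f with this row's p and q
    return table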

too many indices for array when using np.where

I have the code:
a=b=np.arange(9).reshape(3,3)
c=np.zeros(3)
for x in range(3):
    c[x]=np.average(b[np.where(a<x+3)])
The output of c is
>>>array([ 1. , 1.5, 2. ])
Instead of the for loop, I want to use arrays (vectorization), so I wrote the following code:
a=b=np.arange(9).reshape(3,3)
c=np.zeros(3)
i=np.arange(3)
c[i]=np.average(b[np.where(a<i[:,None,None]+3)])
But it shows IndexError: too many indices for array
As for a<i[:,None,None]+3
it correctly shows
array([[[ True,  True,  True],
        [False, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True,  True, False],
        [False, False, False]]], dtype=bool)
But when I use b[np.where(a<i[:,None,None]+3)], it again shows IndexError: too many indices for array. I cannot get the correct output of c.
It seems you are trying to vectorize things here, though you don't mention it explicitly. I don't think you can index like that in a vectorized manner. To solve your question in a vectorized manner, I would suggest a more efficient way to get the sum-reduction with matrix multiplication using np.tensordot, with help from the broadcasting you had already set up in your trials.
Thus, one solution would be -
from __future__ import division
i = np.arange(3)
mask = a<i[:,None,None]+3
c = np.tensordot(b,mask,axes=((0,1),(1,2)))/mask.sum((1,2))
Related post to understand tensordot.
Possible improvements on performance:
Convert the mask to float dtype before feeding it to the matrix multiplication, as BLAS-based matrix multiplication is faster with float data.
Use np.count_nonzero instead of np.sum for counting booleans, i.e. use it to replace the mask.sum((1,2)) part.
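Putting both suggestions together, a rough sketch (not benchmarked here) might look like this:
import numpy as np

a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)
mask = a < i[:, None, None] + 3

# float mask so the reduction goes through BLAS matrix multiplication
num = np.tensordot(b, mask.astype(np.float64), axes=((0, 1), (1, 2)))
# count the True values per slice without summing booleans
den = np.count_nonzero(mask.reshape(mask.shape[0], -1), axis=1)
c = num / den   # array([ 1. ,  1.5,  2. ]), same as the loop version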

Count of specific case per row in matrix

I am fairly new to numpy and scientific computing, and I have been struggling with a problem for several days, so I decided to post it here.
I am trying to get a count of a specific occurrence of a condition in a numpy array.
In [233]: import numpy as np
In [234]: a= np.random.random([5,5])
In [235]: a >.7
Out[235]:
array([[False,  True,  True, False, False],
       [ True, False, False, False,  True],
       [ True, False,  True,  True, False],
       [False, False, False, False, False],
       [False, False,  True, False, False]], dtype=bool)
What I would like is to count the number of occurrences of True in each row and keep the rows where this count reaches a certain threshold:
ex:
results=[]
threshold = 2
for i,row in enumerate(a>.7):
    if len([value for value in row if value==True]) > threshold:
        results.append(i) # keep ids for each row that have more than 'threshold' times True
This is the non-optimized version of the code, but I would love to achieve the same thing with numpy (I have a very large matrix to process).
I have been trying all sorts of things with np.where, but I can only get flattened results; I need the row numbers.
Thanks in advance!
To make results reproducible, use some seed:
>>> np.random.seed(100)
Then for a sample matrix
>>> a = np.random.random([5,5])
Count the number of occurrences along the axis with sum:
>>> (a >.7).sum(axis=1)
array([1, 0, 3, 1, 2])
You can get row numbers with np.where:
>>> np.where((a > .7).sum(axis=1) >= 2)
(array([2, 4]),)
To filter result, just use boolean indexing:
>>> a[(a > .7).sum(axis=1) >= 2]
array([[ 0.89041156,  0.98092086,  0.05994199,  0.89054594,  0.5769015 ],
       [ 0.54468488,  0.76911517,  0.25069523,  0.28589569,  0.85239509]])
You can sum the boolean array over axis 1 with sum, then use np.where on the resulting vector:
results = np.where((a > .7).sum(axis=1) > threshold)
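Equivalently, a compact sketch of the same per-row count, written here with np.count_nonzero and np.flatnonzero:
import numpy as np

np.random.seed(100)
a = np.random.random([5, 5])
threshold = 2

rows = np.flatnonzero(np.count_nonzero(a > .7, axis=1) > threshold)
kept = a[rows]
# with this seed the per-row counts are [1, 0, 3, 1, 2], so only row 2 exceeds the threshold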
