NumPy chained comparison with two predicates - python

In NumPy, I can generate a boolean array like this:
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> arr > 2
array([False, False, False, False, True, True, True], dtype=bool)
How can we chain comparisons together? For example:
>>> 6 > arr > 2
array([False, False, False, False, True, False, False], dtype=bool)
Attempting to do so results in the error message
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

AFAIK the closest you can get is to use &, |, and ^:
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> (2 < arr) & (arr < 6)
array([False, False, False, False, True, False, False], dtype=bool)
>>> (2 < arr) | (arr < 6)
array([ True, True, True, True, True, True, True], dtype=bool)
>>> (2 < arr) ^ (arr < 6)
array([ True, True, True, True, False, True, True], dtype=bool)
I don't think you'll be able to get a < b < c-style chaining to work.

You can use the numpy logical operators to do something similar.
>>> arr = np.array([1, 2, 1, 2, 3, 6, 9])
>>> arr > 2
array([False, False, False, False, True, True, True], dtype=bool)
>>> np.logical_and(arr > 2, arr < 6)
array([False, False, False, False, True, False, False], dtype=bool)

Chained comparisons are not supported for NumPy arrays. You need to write the left and right comparisons separately and combine them with bitwise operators. You also need to parenthesise both expressions because of operator precedence (|, & and ^ bind more tightly than the comparison operators). In this case, since you want both conditions to be satisfied, you need a bitwise AND (&):
(2<arr) & (arr<6)
# array([False, False, False, False, True, False, False])
Making this possible was actually proposed in PEP 535, though the PEP remains deferred. It includes an explanation of why this occurs. As posed in the question, chaining the comparisons this way yields:
2<arr<6
ValueError: The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()
The problem here is that Python internally expands the above to:
2<arr and arr<6
That is what causes the error: and implicitly calls bool on its first operand, and NumPy only permits implicit coercion to a boolean for single-element arrays (not arrays with size > 1), because a boolean array with many values is neither True nor False as a whole. Because of this ambiguity, evaluating such an array in a boolean context always raises a ValueError.
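As a minimal sketch of the behaviour described above (the variable names chained_failed and mask are only illustrative), the following reproduces the failure of the chained form and shows the elementwise replacement on the array from the question:

```python
import numpy as np

arr = np.array([1, 2, 1, 2, 3, 6, 9])

# The chained form is evaluated as `(2 < arr) and (arr < 6)`; `and` asks for
# the truth value of the whole array, which raises ValueError.
try:
    2 < arr < 6
    chained_failed = False
except ValueError:
    chained_failed = True

# The elementwise equivalent: two separate comparisons joined with `&`.
mask = (2 < arr) & (arr < 6)

print(chained_failed)  # True
print(arr[mask])       # [3]
```

The parentheses matter: without them, & would be applied to arr and arr before the comparisons are evaluated.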

Related

Can someone please explain np.less_equal.outer(range(1,18),range(1,13))

I was debugging a code written by someone who has left the organization and came across a line, which uses np.less_equal.outer & np.greater_equal.outer functions. I know that np.outer creates a Cartesian cross product of two 1-dimensional arrays and creates two arrays, and np.less_equal compares the element of two arrays and returns true or false. Can someone please explain how this combined form works.
Thanks!
less_equal and greater_equal are a special kind of NumPy function called ufuncs (universal functions). Ufuncs carry extra methods, including accumulate, at, and outer.
In this case ufunc.outer applies the function to every pair of elements, in the same way as the outer product. The actual outer product would be multiply.outer; here the operation is the comparison instead.
So you get a 2D array of booleans with one row per element of the first array and one column per element of the second, telling you whether each element of the first array is less than or equal to (or greater than or equal to) each element of the second.
np.less_equal.outer(range(1,18),range(1,13))
Out[]:
array([[ True, True, True, ..., True, True, True],
[False, True, True, ..., True, True, True],
[False, False, True, ..., True, True, True],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], dtype=bool)
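One way to check this reading of ufunc.outer is to compare it with plain broadcasting. The sketch below uses the same two ranges; the names A, B, via_outer and via_broadcast are only illustrative:

```python
import numpy as np

A = np.arange(1, 18)   # 17 values: 1..17
B = np.arange(1, 13)   # 12 values: 1..12

via_outer = np.less_equal.outer(A, B)
# Broadcasting a column of A against a row of B builds the same comparison table.
via_broadcast = A[:, None] <= B[None, :]

print(via_outer.shape)                           # (17, 12)
print(np.array_equal(via_outer, via_broadcast))  # True
```

The result has one row per element of the first argument and one column per element of the second.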
EDIT: a much more Pythonic way of doing this would be:
np.triu(np.ones((17, 12), dtype=bool), 0)
That is, the upper triangle of a boolean array of shape (17, 12), since range(1,18) has 17 elements and range(1,13) has 12.
From the documentation, we have that for one-dimensional arrays A and B, the operation np.less_equal.outer(A, B) is equivalent to:
import numpy as np
m = len(A)
n = len(B)
r = np.empty((m, n), dtype=bool)  # one row per element of A, one column per element of B
for i in range(m):
    for j in range(n):
        r[i, j] = (A[i] <= B[j])
Mathematically, entry (i, j) of the result is True exactly when A[i] <= B[j].
Here is an example:
np.less_equal([4, 2, 1], [2, 2, 2])
array([False, True, True])
np.greater_equal([4, 2, 1], [2, 2, 2])
array([ True, True, False], dtype=bool)
And, for comparison, the plain outer function:
np.outer(range(1, 3), range(1, 4))
array([[1, 2, 3],
       [2, 4, 6]])
hope that helps.

How to create a multi-column list of booleans from a given list of integers in Python?

I am new to Python. I want to do following.
Input: A list of integers of size n. Each integer is in a range of 0 to 3.
Output: A multi-column (4 column in this case as integer range in 0-3 = 4) numpy list of size n. Each row of the new list will have the column corresponding to the integer value of Input list as True and rest of the columns as False.
E.g. Input list : [0, 3, 2, 1, 1, 2], size = 6, Each integer is in range of 0-3
Output list :
Row 0: True False False False
Row 1: False False False True
Row 2: False False True False
Row 3: False True False False
Row 4: False True False False
Row 5: False False True False
Now, I can start with 4 columns. Traverse through the input list and create this as follows,
output_columns[].
for i in Input list:
output_column[i] = True
Create an output numpy list with output columns
Is this the best way to do this in Python? Especially for creating numpy list as an output.
If yes, How do I merge output_columns[] at the end to create numpy multidimensional list with each dimension as a column of output_columns.
If not, what would be the best (most time efficient way) to do this in Python?
Thank you,
Is this the best way to do this in Python?
No, a more Pythonic and probably the best way is to use a simple broadcasting comparison, as follows:
In [196]: a = np.array([0, 3, 2, 1, 1, 2])
In [197]: r = list(range(0, 4))
In [198]: a[:,None] == r
Out[198]:
array([[ True, False, False, False],
[False, False, False, True],
[False, False, True, False],
[False, True, False, False],
[False, True, False, False],
[False, False, True, False]])
You are creating a so-called one-hot encoding (each row of the matrix is a one-hot vector, meaning that exactly one value is True).
import numpy as np
mylist = [0, 3, 2, 1, 1, 2]
# use the builtin bool: the np.bool alias is not available in every NumPy release
one_hot = np.zeros((len(mylist), 4), dtype=bool)
for i, v in enumerate(mylist):
    one_hot[i, v] = True
Output
array([[ True, False, False, False],
[False, False, False, True],
[False, False, True, False],
[False, True, False, False],
[False, True, False, False],
[False, False, True, False]], dtype=bool)
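If the explicit loop is a concern for long inputs, one loop-free variant is to index a boolean identity matrix with the values. This is only a sketch, not taken from the answers above, and it assumes every value lies in 0..3 (n_classes is an illustrative name):

```python
import numpy as np

mylist = [0, 3, 2, 1, 1, 2]
n_classes = 4  # assumed: every value in mylist lies in 0..3

# Row k of the boolean identity matrix is the one-hot vector for value k,
# so indexing with the values selects one row per input element.
one_hot = np.eye(n_classes, dtype=bool)[mylist]

print(one_hot.shape)  # (6, 4)
```

A value outside the assumed range raises an IndexError here, whereas the broadcasting comparison would silently return an all-False row.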

numpy where operation on 2D array

I have a numpy array 'A' of size 571x24 and I am trying to find the index of zeros in it so I do:
>>>A.shape
(571L, 24L)
import numpy as np
z1 = np.where(A==0)
z1 is a tuple with following size:
>>> len(z1)
2
>>> len(z1[0])
29
>>> len(z1[1])
29
I was hoping to create a z1 of same size as A. How do I achieve that?
Edit: I want to create array z1 of booleans for presence of zero in A such that:
>>>z1.shape
(571L, 24L)
You can just check this with the equality operator in python with numpy. Example:
>>> A = np.array([[0,2,2,1],[2,0,0,3]])
>>> A == 0
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
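np.where() with a single argument does something else: it returns the indices of the elements where the condition holds, as one index array per dimension (see the documentation). It is still possible to get a boolean array out of np.where() by using its three-argument form, np.where(condition, x, y), which picks x where the condition is True and y elsewhere: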
>>> np.where(A == 0, True, False)
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
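To make the difference concrete, here is a small sketch reusing the array A from the example above. It shows the boolean mask next to the index arrays that the one-argument np.where returns (mask, rows and cols are illustrative names):

```python
import numpy as np

A = np.array([[0, 2, 2, 1], [2, 0, 0, 3]])

mask = A == 0                 # boolean array with the same shape as A
rows, cols = np.where(mask)   # one index array per dimension

print(mask.shape)   # (2, 4)
print(rows, cols)   # [0 1 1] [0 1 2]
print(mask.sum())   # 3
```

This matches the question: len(z1) was 2 because A has two dimensions, and each index array had 29 entries because A contained 29 zeros.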
Try this:
import numpy as np
myarray = np.array([[0,3,4,5],[9,4,0,4],[1,2,3,4]])
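# np.in1d is deprecated in newer NumPy releases; np.isin(myarray, 0) gives the same mask without ravel/reshape
ix = np.in1d(myarray.ravel(), 0).reshape(myarray.shape)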
Output of ix:
array([[ True, False, False, False],
[False, False, True, False],
[False, False, False, False]], dtype=bool)

What is best way to find first half of True's in boolean numpy array?

Here is the problem:
Take a numpy boolean array:
a = np.array([False, False, True, True, True, False, True, False])
I am using it as an index into a pandas DataFrame, but I need to create two new arrays that each contain half of the Trues of the original array. Note that the example arrays are much shorter than the actual data.
So like:
first_half = array([False, False, True, True, False, False, False, False])
second_half = array([False, False, False, False, True, False, True, False])
Does anyone have a good way to do this? Thanks.
First, find the indices of the True values and take the middle one as the cutoff:
inds = np.where(a)[0]
cutoff = inds[inds.shape[0]//2]
Then copy the values of a from the cutoff onward into one array and the values before the cutoff into the other:
b = np.zeros(a.shape,dtype=bool)
c = np.zeros(a.shape,dtype=bool)
c[cutoff:] = a[cutoff:]
b[:cutoff] = a[:cutoff]
Results:
b
Out[65]: array([False, False, True, True, False, False, False, False], dtype=bool)
c
Out[64]: array([False, False, False, False, True, False, True, False], dtype=bool)
There are numerous ways to handle the indexing.
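As one such alternative, here is a sketch based on a running count of the True values rather than on index positions (running and half are illustrative names). With an odd number of Trues the extra one lands in the second half, which matches the cutoff approach above:

```python
import numpy as np

a = np.array([False, False, True, True, True, False, True, False])

running = np.cumsum(a)   # running count of True values seen so far
half = a.sum() // 2      # number of True values kept in the first half

first_half = a & (running <= half)
second_half = a & ~first_half

print(first_half.sum(), second_half.sum())  # 2 2
```

Unlike indexing with inds[...], this version also works when a contains no True values at all.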

Numpy.where: very slow with conditions from two different arrays

I have three arrays of type numpy.ndarray with dimensions (n by 1), named amplitude, distance and weight. I would like to use selected entries of the amplitude array, based on their respective distance- and weight-values. For example I would like to find the indices of the entries within a certain distance range, so I write:
index = np.where( (distance<10) & (distance>=5) )
and I would then proceed by using the values from amplitude[index].
This works perfectly well as long as I only use one array for specifying the conditions. When I try for example
index = np.where( (distance<10) & (distance>=5) & (weight>0.8) )
the operation becomes super-slow. Why is that, and is there a better way for this task? In fact, I eventually want to use many conditions from something like 6 different arrays.
This is just a guess, but perhaps numpy is broadcasting your arrays? If the arrays are the exact same shape, then numpy won't broadcast them:
>>> distance = numpy.arange(5) > 2
>>> weight = numpy.arange(5) < 4
>>> distance.shape, weight.shape
((5,), (5,))
>>> distance & weight
array([False, False, False, True, False], dtype=bool)
But if they have different shapes, and the shapes are broadcastable, then it will. (n,), (n, 1), and (1, n) are all arguably "n by 1" arrays, but they aren't all the same:
>>> distance[None,:].shape, weight[:,None].shape
((1, 5), (5, 1))
>>> distance[None,:]
array([[False, False, False, True, True]], dtype=bool)
>>> weight[:,None]
array([[ True],
[ True],
[ True],
[ True],
[False]], dtype=bool)
>>> distance[None,:] & weight[:,None]
array([[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, False, False]], dtype=bool)
In addition to returning undesired results, this could be causing a big slowdown if the arrays are even moderately large:
>>> distance = numpy.arange(5000) > 500
>>> weight = numpy.arange(5000) < 4500
>>> %timeit distance & weight
100000 loops, best of 3: 8.17 us per loop
>>> %timeit distance[:,None] & weight[None,:]
10 loops, best of 3: 48.6 ms per loop
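If mismatched shapes do turn out to be the cause, one possible fix is to flatten every array to shape (n,) before combining the conditions. The sketch below uses made-up random data with a hypothetical mix of (n, 1) and (n,) shapes. It also uses np.logical_and.reduce to combine a list of conditions, which may be convenient once there are many of them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical data: shapes chosen to mimic a mix of (n, 1) and (n,) arrays.
amplitude = rng.random((n, 1))
distance = rng.random((n, 1)) * 20
weight = rng.random(n)

# Flatten everything to shape (n,) so the conditions combine elementwise
# instead of broadcasting to an (n, n) array.
amplitude, distance, weight = (np.ravel(x) for x in (amplitude, distance, weight))

conditions = [distance < 10, distance >= 5, weight > 0.8]
mask = np.logical_and.reduce(conditions)

selected = amplitude[mask]   # a boolean mask can index directly, no np.where needed
print(mask.shape)            # (1000,)
```

Checking the shape of the combined mask before indexing is a quick way to spot accidental broadcasting.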
