Inverting boolean array using np.invert - python

I have two boolean arrays, a and b. I want a resulting boolean array c in which each element of a is inverted where the corresponding element of b is True and kept unchanged where it is False.
a = np.array([True, False, True, True, False])
b = np.array([True, False, False, False, True])
c = np.invert(a, where=b)
Expected output:
c = np.array([False, False, True, True, True])
However this is the output I'm getting:
c = np.array([False, False, False, False, True])
Why is this so?

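You need to pass an out array that supplies the values for the elements where b is False. Without it, those elements are uninitialized and therefore unpredictable. Note that out=a writes the result into a itself: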
In [242]: np.invert(a,where=b, out=a)
Out[242]: array([False, False, True, True, True])
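If a should stay untouched, one option is to pass a copy of a as the output buffer. This is a minimal sketch using the arrays from the question; the use of a.copy() is an illustration, not part of the answer above.

```python
import numpy as np

a = np.array([True, False, True, True, False])
b = np.array([True, False, False, False, True])

# Start the output buffer as a copy of a, so positions where b is False
# keep the original value; only positions where b is True are overwritten.
c = np.invert(a, where=b, out=a.copy())

print(c)
print(a)  # a itself is unchanged
```

Because the buffer starts out as a copy of a, every position that where=b skips already holds the value it should keep.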

Passing where=b to numpy.invert doesn't mean "keep the original a values for cells not selected by b". It means "don't write anything to the output array for cells not selected by b". Since you didn't pass an initialized out array, the unselected cells are filled with whatever garbage happened to be in that memory when it was allocated.
Since NumPy has some free lists for small array buffers, we can demonstrate that the output is uninitialized garbage by getting NumPy to reuse an allocation filled with whatever we want:
import numpy
a = numpy.zeros(4, dtype=bool)
numpy.array([True, False, True, False])
print(repr(numpy.invert(a, where=a)))
Output:
array([ True, False, True, False])
In this example, we can see that NumPy reused the buffer from the array we created but didn't save. Since where=a selected no cells, numpy.invert didn't write anything to the buffer, and the result is exactly the contents of the discarded array.
As for the operation you wanted to perform, that's just XOR: c = a ^ b
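As a quick check of the XOR suggestion, the sketch below compares a ^ b with an explicit np.where selection on the arrays from the question. The np.where form is added here only for comparison.

```python
import numpy as np

a = np.array([True, False, True, True, False])
b = np.array([True, False, False, False, True])

c_xor = a ^ b                 # flips a exactly where b is True
c_where = np.where(b, ~a, a)  # the same selection written out explicitly

print(c_xor)
print(np.array_equal(c_xor, c_where))
```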

Related

Python: comparing numpy array with sub-numpy array without loop

My problem is quite simple but I cannot figure how to solve it without a loop.
I have a first numpy array:
FullArray = np.array([0,1,2,3,4,5,6,7,8,9])
and a sub array (not necessarily ordered in the same way):
SubArray = np.array([8, 3, 5])
I would like to create a bool array that has the same size as the full array and that is True where the value of FullArray is present in SubArray and False otherwise.
For example here I expect to get:
BoolArray = np.array([False, False, False, True, False, True, False, False, True, False])
Is there a way to do this without using a loop?
You can use np.isin:
np.isin(FullArray, SubArray)
# array([False, False, False, True, False, True, False, False, True, False])
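For completeness, here is a self-contained version of that call. The names FullArray and SubArray come from the question; the two extra lines (mask indexing and invert=True) are illustrative additions.

```python
import numpy as np

FullArray = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
SubArray = np.array([8, 3, 5])

BoolArray = np.isin(FullArray, SubArray)
print(BoolArray)

# The mask can be used directly as an index; matches come back in FullArray order.
matched = FullArray[BoolArray]
print(matched)

# invert=True gives the complementary mask (values NOT present in SubArray).
not_in_sub = np.isin(FullArray, SubArray, invert=True)
print(not_in_sub.sum())
```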

At least one True value per column in numpy boolean array

Suppose I have a very big 2D boolean array (for the sake of the example, let's take dimensions 4 lines x 3 columns):
toto = np.array([[True, True, False],
                 [False, True, False],
                 [True, False, False],
                 [False, True, False]])
I want to transform toto so that it contains at least one True value per column, leaving the other columns untouched.
EDIT : The rule is just this : If a column is all False, I want to introduce a True in a random line.
So in this example, one of the False in the 3rd column should become True.
How would you do that efficiently?
Thank you in advance
You can do it like this:
col_mask = ~np.any(toto, axis=0)
row_idx = np.random.randint(toto.shape[0], size=np.sum(col_mask))
toto[row_idx, col_mask]=True
col_mask is array([False, False, True]): it marks the columns that contain no True and therefore need to change.
row_idx is an array holding one randomly chosen row index for each of those columns.
import numpy as np
toto = np.array([[False, True, False],
                 [False, True, False],
                 [False, False, False],
                 [False, True, False]])
# First we get a boolean array indicating columns that have at least one True value
mask = np.any(toto, axis=0)
# Now we invert the mask to get columns indexes (as boolean array) with no True value
mask = np.logical_not(mask)
# Notice that if we index with this mask on the column dimension we get elements
# in all rows only in the columns containing no True value. The dimension is
# "num_rows x num_columns_without_true"
toto[:, mask]
# Now we need random indexes for rows in the columns containing only false. That
# means an array of integers from zero to `num_rows - 1` with
# `num_columns_without_true` elements
row_indexes = np.random.randint(toto.shape[0], size=np.sum(mask))
# Now we can combine the row indexes with the column mask to select one False element in each column containing only False elements and set them to True
toto[row_indexes, mask] = True
Disclaimer: mathfux was faster with essentially the same solution as the one I was writing (accept his answer if this is what you were looking for), but since I was writing it with more comments I decided to post anyway.
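Both answers can be wrapped into a small function that works on a copy. The sketch below is one possible packaging: the name ensure_true_per_column and the rng parameter are hypothetical, and it uses np.random.default_rng instead of np.random.randint only so the run can be seeded.

```python
import numpy as np

def ensure_true_per_column(arr, rng=None):
    """Return a copy of arr where each all-False column gets one True
    in a randomly chosen row. Hypothetical helper, not from the answers."""
    rng = np.random.default_rng() if rng is None else rng
    out = arr.copy()
    # Integer indices of the columns that contain no True at all.
    empty_cols = np.flatnonzero(~out.any(axis=0))
    # One random row index per such column.
    rows = rng.integers(out.shape[0], size=empty_cols.size)
    out[rows, empty_cols] = True
    return out

toto = np.array([[True, True, False],
                 [False, True, False],
                 [True, False, False],
                 [False, True, False]])

fixed = ensure_true_per_column(toto, rng=np.random.default_rng(0))
print(fixed.any(axis=0))      # every column now has at least one True
print((fixed != toto).sum())  # number of cells that changed
```

Columns that already contained a True are never touched, because only the indices in empty_cols are written to.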

Python: How to pass subarrays of array into array function

The ultimate goal of my question is that I want to generate a new array 'output' by passing the subarrays of an array into a function, where the return of the function for each subarray generates a new element into 'output'.
My input array was generated as follows:
aggregate_predictors = np.random.rand(100, 5)
input = np.split(aggregate_predictors, 1, axis=1)[0]
So now input appears as follows:
print(input[0:2])
>>[[ 0.61521025 0.07407679 0.92888063 0.66066605 0.95023826]
>> [ 0.0666379 0.20007622 0.84123138 0.94585421 0.81627862]]
Next, I want to pass each element of input (so the array of 5 floats) through my function 'condition' and I want the return of each function call to fill in a new array 'output'. Basically, I want 'output' to contain 100 values.
def condition(array):
    return array[4] < 0.5
How do I pass each element of input into condition without using any nasty loops?
========
Basically, I want to do this, but optimized:
lister = []
for i in range(100):
    lister.append(condition(input[i]))
output = np.array(lister)
That initial split and index does nothing. It just wraps the array in a list and then takes it out again:
In [76]: x=np.random.rand(100,5)
In [77]: y = np.split(x,1,axis=1)
In [78]: len(y)
Out[78]: 1
In [79]: y[0].shape
Out[79]: (100, 5)
The rest just tests whether the fifth element (index 4) of each row is < .5:
In [81]: def condition(array):
    ...:     return array[4] < 0.5
    ...:
In [82]: lister = []
    ...: for i in range(100):
    ...:     lister.append(condition(x[i]))
    ...: output = np.array(lister)
    ...:
In [83]: output
Out[83]:
array([ True, False, False, True, False, True, True, False, False,
True, False, True, False, False, True, False, False, True,
False, True, False, True, False, False, False, True, False,
...], dtype=bool)
We can do this just as easily with column indexing:
In [84]: x[:,4]<.5
Out[84]:
array([ True, False, False, True, False, True, True, False, False,
True, False, True, False, False, True, False, False, True,
False, True, False, True, False, False, False, True, False,
...], dtype=bool)
In other words, operate on the whole fifth column (index 4) of the array.
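To confirm that the column expression matches the row-by-row loop, here is a small self-contained check. The seeded generator is an assumption added only to make the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = rng.random((100, 5))

def condition(row):
    # Same test as in the question: is the element at index 4 below 0.5?
    return row[4] < 0.5

looped = np.array([condition(row) for row in x])  # one Python call per row
vectorized = x[:, 4] < 0.5                        # one comparison on the whole column

print(np.array_equal(looped, vectorized))
print(vectorized.shape)
```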
You are trying to make a very simple indexing expression very convoluted. If you read the docs for np.split very carefully, you will see that passing a second argument of 1 does absolutely nothing: it splits the array into one chunk. The following line is literally a no-op and should be removed:
input = np.split(aggregate_predictors, 1, axis=1)[0]
You have a 2D numpy array of shape (100, 5) (you can check that with aggregate_predictors.shape). Your function returns, for one row, whether the value in the fifth column is less than 0.5. You can do this for all rows with a single vectorized expression:
output = aggregate_predictors[:, 4] < 0.5
If you want the last column, whatever the number of columns, use index -1 instead:
output = aggregate_predictors[:, -1] < 0.5
The important thing to remember here is that all the comparison operators are vectorized element-wise in numpy. Usually, vectorizing an operation like this involves finding the correct index in the array. You should never have to convert anything to a list: numpy arrays are iterable as it is, and there are more complex iterators available.
That being said, your original intent was probably to do something like
input = np.split(aggregate_predictors, len(aggregate_predictors), axis=0)
OR
input = np.split(aggregate_predictors, aggregate_predictors.shape[0])
Both expressions are equivalent. They split aggregate_predictors into a list of 100 single-row matrices.
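The difference between the two kinds of split is easiest to see from the shapes. This is a minimal sketch; the names one_chunk and rows are illustrative.

```python
import numpy as np

aggregate_predictors = np.random.rand(100, 5)

# Splitting into 1 section returns a list holding the whole array.
one_chunk = np.split(aggregate_predictors, 1, axis=1)

# Splitting into as many sections as there are rows (default axis=0)
# returns a list of 100 arrays, each with a single row.
rows = np.split(aggregate_predictors, aggregate_predictors.shape[0])

print(len(one_chunk), one_chunk[0].shape)
print(len(rows), rows[0].shape)
```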

Python: numpy array larger and smaller than a value

How can I find the numbers that fall within a range?
c = np.array([2, 3, 4, 5, 6])
>>> c > 3
array([False, False, True, True, True])
However, when I test c against two bounds at once, it raises an error:
>>> 2 < c < 5
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The desired output is
array([False, True, True, False, False])
Try this,
(c > 2) & (c < 5)
Result
array([False, True, True, False, False], dtype=bool)
Python evaluates 2<c<5 as (2<c) and (c<5) which would be valid, except the and keyword doesn't work as we would want with numpy arrays. (It attempts to cast each array to a single boolean, and that behavior can't be overridden, as discussed here.) So for a vectorized and operation with numpy arrays you need to do this:
(2<c) & (c<5)
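The sketch below puts the pieces together: the element-wise form, the equivalent np.logical_and call, and the error raised by the chained comparison. The variable names are illustrative.

```python
import numpy as np

c = np.array([2, 3, 4, 5, 6])

mask = (c > 2) & (c < 5)             # element-wise AND of two boolean arrays
same = np.logical_and(c > 2, c < 5)  # function form of the same operation

try:
    2 < c < 5                        # expands to (2 < c) and (c < 5)
except ValueError as exc:
    error = str(exc)                 # "and" needs a single bool, which an array cannot give

print(mask)
print(c[mask])                       # the values inside the range
```

The parentheses around each comparison matter, because & binds more tightly than < and >.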
You can do something like this :
import numpy as np
c = np.array([2,3,4,5,6])
output = [(i and j) for i, j in zip(c>2, c<5)]
Output :
[False, True, True, False, False]

Numpy array update command explanation

What is this operation called, technically, and what else does it allow?
Z[1:-1,1:-1][birth|survive]=1, where Z is a 4x4 array and birth and survive are Boolean arrays of the same size. I understand what this code does, but I would like to know what the operation is called and what else I can do with it (I mean the latter part, [birth|survive]).
The pipe | is the bitwise or operator. Therefore, birth|survive is the equivalent to np.bitwise_or(birth, survive). Presumably birth and survive are boolean arrays, so the output is a boolean array with the straightforward or behavior:
a = np.array([True, True, False, False])
b = np.array([True, False, False, True])
a|b
# array([ True, True, False, True], dtype=bool)
For integer arrays, the operator works bit by bit: each bit of the result is the OR of the corresponding bits in the binary representations of the two operands, and an integer array is returned. The documentation page has a fuller explanation of this behavior and some examples.
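A small integer example may make the bit-by-bit behavior concrete. The values are chosen only for illustration.

```python
import numpy as np

x = np.array([1, 2, 4])  # binary 001, 010, 100
y = np.array([1, 1, 3])  # binary 001, 001, 011

result = x | y           # binary 001, 011, 111
print(result)
print(np.array_equal(result, np.bitwise_or(x, y)))
```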
Once you've created the boolean array from birth|survive, you are using it to do a boolean index into the Z array. Most simply, this can be shown with:
a = np.array([1,2,3])
b = np.array([True, False, True])
a[b] # the elements of a where b is True
# array([1, 3])
Since it's on the left side of the assignment =, python will assign the value 1 to every point in Z where birth or survive is True:
a[b] = 99
a
# array([99, 2, 99])
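Applied to the expression from the question, a minimal sketch could look like this. It assumes birth and survive have the shape of the 2x2 interior of Z, since the mask has to match the slice it indexes; the mask values are made up.

```python
import numpy as np

Z = np.zeros((4, 4), dtype=int)

# Assumption: birth and survive match the 2x2 interior Z[1:-1, 1:-1].
birth = np.array([[True, False],
                  [False, False]])
survive = np.array([[False, False],
                    [False, True]])

# Z[1:-1, 1:-1] is basic slicing and returns a view of Z, so the
# boolean-mask assignment on that view writes through to Z itself.
Z[1:-1, 1:-1][birth | survive] = 1
print(Z)
```

The chained form works here only because the first index is a plain slice. If the first index were itself a boolean or integer array, it would produce a copy and the assignment would not reach Z.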
