Sum over variable-size windows defined by a custom boolean array - python

I can solve this problem using a loop structure, so I am specifically wondering whether there is any numpy/vectorized method of accomplishing this.
I have two arrays of values of the same length, e.g.
a = [1,2,3,4,5,6]
b = [False, True, True, False, False, True]
I would like to sum all the elements in a that correspond to each window of True positions in b, adding them to the immediately preceding False value (again in a). So, to complete the example above, I would like to get the output
c = [6,4,11]
In this case we get:
6 (the sum of 1, 2, 3: the False at index 0 plus the two following True positions in b), 4 (a False with no True positions immediately following in b), and 11 (the False at index 4 plus the True at index 5).
I realize that may be hard to follow, so please let me know if another example/more explanation would be helpful.

I will piggyback on @Divakar's comment. It is a great idea to use np.add.reduceat(). However, it seems to me that @Divakar's treatment of indices is too simplistic for this problem, and a more careful analysis is required. I think the following may produce what you are looking for:
idx = np.insert(
    np.flatnonzero(
        np.ediff1d(np.pad(b.astype(int), 1, 'constant', constant_values=0)) == -1
    ), 0, 0
)
wsum = np.add.reduceat(np.append(a, 0), idx)[:-1]
Test
With
>>> a = np.arange(1, 11)
>>> b = np.array([True, True, False, True, False, True, True, False, False, True])
I get:
>>> print(wsum)
[ 3 7 18 27]
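As a simpler sketch that reproduces the [6, 4, 11] output from the original question, every False position can be treated as the start of a new window and fed straight to np.add.reduceat (note this keeps consecutive False values in separate windows, unlike the transition-based indexing above, so the two approaches group runs of False differently):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([False, True, True, False, False, True])

# Each False position starts a new window; reduceat sums each window
# up to the start of the next one.
idx = np.flatnonzero(~b)
if idx.size == 0 or idx[0] != 0:
    idx = np.insert(idx, 0, 0)  # ensure the first window starts at index 0

c = np.add.reduceat(a, idx)
print(c.tolist())  # [6, 4, 11]
```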

Related

Inverting boolean array using np.invert

I have two boolean arrays a and b. I want a resulting boolean array c in which each element of a is inverted where the corresponding element of b is True, and keeps its original value where b is False.
a = np.array([True, False, True, True, False])
b = np.array([True, False, False, False, True])
c = np.invert(a, where=b)
Expected output:
c = np.array([False, False, True, True, True])
However this is the output I'm getting:
c = np.array([False, False, False, False, True])
Why is this so?
You need to pass an out array to specify the values for the not-where elements. Otherwise they are unpredictable.
In [242]: np.invert(a,where=b, out=a)
Out[242]: array([False, False, True, True, True])
Passing where=b to numpy.invert doesn't mean "keep the original a values for cells not selected by b". It means "don't write anything to the output array for cells not selected by b". Since you didn't pass an initialized out array, the unselected cells are filled with whatever garbage happened to be in that memory when it was allocated.
Since NumPy has some free lists for small array buffers, we can demonstrate that the output is uninitialized garbage by getting NumPy to reuse an allocation filled with whatever we want:
import numpy
a = numpy.zeros(4, dtype=bool)
numpy.array([True, False, True, False])  # created and immediately discarded
print(repr(numpy.invert(a, where=a)))   # where=a selects no cells at all
Output:
array([ True, False, True, False])
In this example, we can see that NumPy reused the buffer from the array we created but didn't save. Since where=a selected no cells, numpy.invert didn't write anything to the buffer, and the result is exactly the contents of the discarded array.
As for the operation you wanted to perform, that's just XOR: c = a ^ b
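XOR flips a value exactly where the other operand is True, which is precisely "invert a where b" while keeping a elsewhere; a quick check against the arrays from the question:

```python
import numpy as np

a = np.array([True, False, True, True, False])
b = np.array([True, False, False, False, True])

# XOR: flip a exactly where b is True, keep a where b is False
c = a ^ b
print(c.tolist())  # [False, False, True, True, True]
```

This matches the expected output from the question without needing out or where at all.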

Appending one truth table to another

So I need to generate a truth table for a bunch of different functions (like implies, not p and q, and, or, etc.)
I have a recursive method that generates the first two terms of each index correctly ([False, False], [False, True], [True, False], [True, True]).
However what I need to do is take those two terms and then append the result of those two from one of the different functions to the end of the indices.
make_tt_ins(n): my recursive table builder for n variables (in this case two),
and callf2(f, p, q): a given function that generates the True / False term I'll need to append onto each index.
my_list = PA1.make_tt_ins(2)
p = True
q = True
val = [callf2(f, p, q)]
returnVal = [i + val for i in my_list]
return returnVal
Obviously, all I'm getting is True after my initial two values in each row, since p and q never change. I just don't know how to correctly append the callf2 result onto the first two values in each row.
For the biconditional (p <-> q), I'm getting:
[[False, False, True], [False, True, True], [True, False, True], [True, True, True]]
It should look something like:
[[False, False, True], [False, True, False], [True, False, False], [True, True, True]]
Figured it out. To anyone wondering: I used one large while loop with a counter, where at each step I set p / q to different True/False values and ran them through the callf2 function. I then put those results into a list, which I appended onto each partial row.
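Since make_tt_ins and callf2 aren't shown, here is a sketch of the same idea without a manual counter loop, using itertools.product to enumerate the variable assignments (truth_table is a hypothetical stand-in for the poster's helpers):

```python
from itertools import product

def truth_table(f):
    # One row per assignment of (p, q), in the usual FF, FT, TF, TT order,
    # with the function's result appended as the third column.
    return [[p, q, f(p, q)] for p, q in product([False, True], repeat=2)]

# biconditional: True exactly when p and q agree
print(truth_table(lambda p, q: p == q))
```

Any two-argument boolean function can be passed in the same way, e.g. `truth_table(lambda p, q: (not p) or q)` for implies.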

Python: How to pass subarrays of array into array function

The ultimate goal of my question is that I want to generate a new array 'output' by passing the subarrays of an array into a function, where the return of the function for each subarray generates a new element into 'output'.
My input array was generated as follows:
aggregate_predictors = np.random.rand(100, 5)
input = np.split(aggregate_predictors, 1, axis=1)[0]
So now input appears as follows:
print(input[0:2])
>>[[ 0.61521025 0.07407679 0.92888063 0.66066605 0.95023826]
>> [ 0.0666379 0.20007622 0.84123138 0.94585421 0.81627862]]
Next, I want to pass each element of input (so the array of 5 floats) through my function 'condition' and I want the return of each function call to fill in a new array 'output'. Basically, I want 'output' to contain 100 values.
def condition(array):
return array[4] < 0.5
How do I pass each element of input into condition without using any nasty loops?
========
Basically, I want to do this, but optimized:
lister = []
for i in range(100):
lister.append(condition(input[i]))
output = np.array(lister)
That initial split and index does nothing. It just wraps the array in a list and then takes it out again:
In [76]: x=np.random.rand(100,5)
In [77]: y = np.split(x,1,axis=1)
In [78]: len(y)
Out[78]: 1
In [79]: y[0].shape
Out[79]: (100, 5)
The rest just tests whether the element at index 4 of each row is < .5:
In [81]: def condition(array):
...:
...: return array[4] < 0.5
...:
In [82]: lister = []
...:
...: for i in range(100):
...: lister.append(condition(x[i]))
...:
...: output = np.array(lister)
...:
In [83]: output
Out[83]:
array([ True, False, False, True, False, True, True, False, False,
True, False, True, False, False, True, False, False, True,
False, True, False, True, False, False, False, True, False,
...], dtype=bool)
We can do just as easily with column indexing
In [84]: x[:,4]<.5
Out[84]:
array([ True, False, False, True, False, True, True, False, False,
True, False, True, False, False, True, False, False, True,
False, True, False, True, False, False, False, True, False,
...], dtype=bool)
In other words, operate on the whole 4th column of the array.
You are trying to make a very simple indexing expression very convoluted. If you read the docs for np.split very carefully, you will see that passing a second argument of 1 does absolutely nothing: it splits the array into one chunk. The following line is literally a no-op and should be removed:
input = np.split(aggregate_predictors, 1, axis=1)[0]
You have a 2D numpy array of shape (100, 5) (you can check that with aggregate_predictors.shape). Your function tests whether the element at index 4 of a row (the fifth column) is less than 0.5. You can do this for all rows at once with a single vectorized expression:
output = aggregate_predictors[:, 4] < 0.5
If you want to find the last column instead of the fifth, use index -1 instead:
output = aggregate_predictors[:, -1] < 0.5
The important thing to remember here is that all the comparison operators are vectorized element-wise in numpy. Usually, vectorizing an operation like this involves finding the correct index in the array. You should never have to convert anything to a list: numpy arrays are iterable as it is, and there are more complex iterators available.
That being said, your original intent was probably to do something like
input = np.split(aggregate_predictors, len(aggregate_predictors), axis=0)
OR
input = np.split(aggregate_predictors, aggregate_predictors.shape[0])
Both expressions are equivalent. They split aggregate_predictors into a list of 100 single-row matrices.
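To confirm the two approaches agree, a quick sketch comparing the loop from the question with the vectorized column comparison (the random data here is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((100, 5))

# looped version, as in the question
looped = np.array([row[4] < 0.5 for row in x])

# vectorized version: compare the whole column at once
vectorized = x[:, 4] < 0.5

print(np.array_equal(looped, vectorized))  # True
```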

Numpy array update command explanation

What is this operation technically called, and what other functionality does it allow for:
Z[1:-1,1:-1][birth|survive] = 1, where Z is a 4x4 array and birth and survive are boolean arrays the same size as the Z[1:-1,1:-1] view. I understand what this code does, but I would like to know what this operation is called and what else I can do with it (specifically the latter part, [birth|survive]).
The pipe | is the bitwise or operator. Therefore, birth|survive is the equivalent to np.bitwise_or(birth, survive). Presumably birth and survive are boolean arrays, so the output is a boolean array with the straightforward or behavior:
a = np.array([True, True, False, False])
b = np.array([True, False, False, True])
a|b
# array([ True, True, False, True], dtype=bool)
For integers, the operation is applied bitwise: an integer array is returned in which the corresponding bits of the binary representations have been or'ed. There is a fuller explanation of its behavior, with examples, on the documentation page.
Once you've created the boolean array from birth|survive, you are using it to do a boolean index into the Z array. Most simply, this can be shown with:
a = np.array([1,2,3])
b = np.array([True, False, True])
a[b] # the elements of a where b is True
# array([1, 3])
Since it's on the left side of the assignment =, python will assign the value 1 to every point in Z where birth or survive is True:
a[b] = 99
a
# array([99, 2, 99])
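Putting the two pieces together on a 4x4 Z, as in the question (birth and survive here are made-up 2x2 masks matching the inner slice): the key point is that Z[1:-1, 1:-1] is a view, so assigning through the boolean mask writes back into Z itself.

```python
import numpy as np

Z = np.zeros((4, 4), dtype=int)
birth = np.array([[True, False], [False, False]])
survive = np.array([[False, False], [False, True]])

# Z[1:-1, 1:-1] is a *view* of the inner 2x2 block, so boolean-mask
# assignment through it modifies Z in place.
Z[1:-1, 1:-1][birth | survive] = 1
print(Z)
```

After this, Z has a 1 at (1, 1) (from birth) and at (2, 2) (from survive), with zeros everywhere else.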

Fill scipy / numpy matrix based on indices and values

I have a graph of nodes which each represent about 100 voxels in the brain. I partitioned the graph into communities, but now I need to make a correlation matrix where every voxel in a node is connected to every voxel in the nodes that are in the same community. In other words, if nodes 1 and 2 are in the same community, I need a 1 in the matrix between every voxel in node 1 and every voxel in node 2. This takes a very long time with the code below. Does anyone know how to speed this up?
for edge in combinations(graph.nodes(), 2):
    # if the nodes are in the same community
    if partition.get_node_community(edge[0]) == partition.get_node_community(edge[1]):
        # find the voxels in each node: these give the matrix indices to fill
        voxels1 = np.argwhere(flat_parcel == edge[0] + 1)
        voxels2 = np.argwhere(flat_parcel == edge[1] + 1)
        for voxel1 in voxels1:
            voxel_matrix[voxel1, voxels2] = 1
Thanks for the responses, I think the easiest and fastest solution is to replace the last loop with
voxel_matrix[np.ix_(voxels1, voxels2)] = 1
Here's an approach that I expect to work for you. It's a stretch on my machine -- even storing two copies of the voxel adjacency matrix (using dtype=bool) pushes my (somewhat old) desktop right to the edge of its memory capacity. But I'm assuming that you have a machine capable of handling at least two (300 * 100) ** 2 = 900 MB arrays -- otherwise, you would probably have run into problems before this stage. It takes my desktop about 30 minutes to process 30000 voxels.
This assumes that voxel_communities is a simple array containing a community label for each voxel at index i. It sounds like you can generate that pretty quickly. It also assumes that voxels are present in only one node.
def voxel_adjacency(voxel_communities):
    n_voxels = voxel_communities.size
    comm_labels = sorted(set(voxel_communities))
    comm_counts = [(voxel_communities == l).sum() for l in comm_labels]

    blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
    start = 0
    for c in comm_counts:
        blocks[start:start + c, start:start + c] = 1
        start += c

    ix = numpy.empty_like(voxel_communities)
    ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
    blocks[:] = blocks[ix, :]
    blocks[:] = blocks[:, ix]
    return blocks
Here's a quick explanation. This uses an inverse indexing trick to reorder the columns and rows of an array of diagonal blocks into the desired matrix.
n_voxels = voxel_communities.size
comm_labels = sorted(set(voxel_communities))
comm_counts = [(voxel_communities == l).sum() for l in comm_labels]
blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
start = 0
for c in comm_counts:
blocks[start:start + c, start:start + c] = 1
start += c
These lines are used to construct the initial block matrix. So for example, say you have six voxels and three communities, and each community contains two voxels. Then the initial block matrix will look like this:
array([[ True, True, False, False, False, False],
[ True, True, False, False, False, False],
[False, False, True, True, False, False],
[False, False, True, True, False, False],
[False, False, False, False, True, True],
[False, False, False, False, True, True]], dtype=bool)
This is essentially the same as the desired adjacency matrix after the voxels have been sorted by community membership. So we need to reverse that sorting. We do so by constructing an inverse argsort array.
ix = numpy.empty_like(voxel_communities)
ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
Now ix will reverse the sorting process when used as an index. And since this is a symmetric matrix, we can perform the reverse sorting operation separately on columns and then on rows:
blocks[:] = blocks[ix,:]
blocks[:] = blocks[:,ix]
return blocks
Here's an example of the result it generates for a small input:
>>> voxel_adjacency(numpy.array([0, 3, 1, 1, 0, 2]))
array([[ True, False, False, False, True, False],
[False, True, False, False, False, False],
[False, False, True, True, False, False],
[False, False, True, True, False, False],
[ True, False, False, False, True, False],
[False, False, False, False, False, True]], dtype=bool)
It seems to me that this does something quite similar to voxel_matrix[np.ix_(voxels1, voxels2)] = 1 as suggested by pv., except it does it all at once, instead of tracking each possible combination of nodes.
There may be a better solution, but this should at least be an improvement.
Also, note that if you can simply accept the new ordering of voxels as canonical, then this solution becomes as simple as creating the block array! That takes all of about 300 milliseconds.
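As an aside, under the same assumptions (one community label per voxel), the whole adjacency matrix can also be sketched in one line with broadcasting, which reproduces the example output above; note it still allocates the full n_voxels x n_voxels boolean result, so memory use is comparable to the block approach:

```python
import numpy as np

voxel_communities = np.array([0, 3, 1, 1, 0, 2])

# Broadcasting a column vector against a row vector compares every pair
# of labels at once: True exactly where two voxels share a community.
adjacency = voxel_communities[:, None] == voxel_communities[None, :]
print(adjacency.astype(int))
```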
