too many indices for array when using np.where - python

I have the code:
import numpy as np
a=b=np.arange(9).reshape(3,3)
c=np.zeros(3)
for x in range(3):
    c[x]=np.average(b[np.where(a<x+3)])
The output of c is
>>>array([ 1. , 1.5, 2. ])
Instead of the for loop, I want to use array operations (vectorization), so I tried the following code:
a=b=np.arange(9).reshape(3,3)
c=np.zeros(3)
i=np.arange(3)
c[i]=np.average(b[np.where(a<i[:,None,None]+3)])
But it raises IndexError: too many indices for array.
The mask a<i[:,None,None]+3 on its own is computed correctly:
array([[[ True,  True,  True],
        [False, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True, False, False],
        [False, False, False]],

       [[ True,  True,  True],
        [ True,  True, False],
        [False, False, False]]], dtype=bool)
But when I use b[np.where(a<i[:,None,None]+3)], it again shows IndexError: too many indices for array. I cannot get the correct output of c.

I sense you are trying to vectorize things here. I don't think you can index like that in a vectorized manner, because np.where on the 3-D mask returns three index arrays while b is only 2-D. To solve your question in a vectorized manner, I would suggest a more efficient approach: perform the sum-reduction as a matrix multiplication with np.tensordot, using broadcasting as you had already set out in your trials.
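A minimal reproduction with the question's a, b and i, showing the shapes involved and the failing index operation (a sketch for illustration only):

```python
import numpy as np

a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)

# Broadcasting the thresholds gives a 3-D mask of shape (3, 3, 3)
mask = a < i[:, None, None] + 3

# np.where returns one index array per dimension of the mask: three here
idx = np.where(mask)
print(mask.shape, len(idx), b.ndim)  # (3, 3, 3) 3 2

# Indexing the 2-D array b with three index arrays is what raises the error
try:
    b[idx]
    failed = False
except IndexError:
    failed = True
print(failed)  # True
```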
Thus, one solution would be -
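from __future__ import division  # only needed on Python 2
i = np.arange(3)
# mask[k] marks the entries of a below k+3; shape (3, 3, 3)
mask = a<i[:,None,None]+3
# per-slice sum of b under the mask, divided by the per-slice count of True entries
c = np.tensordot(b,mask,axes=((0,1),(1,2)))/mask.sum((1,2))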
Related post to understand tensordot.
Possible improvements on performance
Convert the mask to a float dtype before feeding it to np.tensordot (which calls np.dot internally), since BLAS-based matrix multiplication is faster with floats.
Use np.count_nonzero instead of np.sum for counting booleans, i.e. replace the mask.sum((1,2)) part with np.count_nonzero(mask, axis=(1,2)).
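Putting the answer's steps together, here is a self-contained sketch that applies both suggested improvements (a float mask for the BLAS path, and np.count_nonzero with an axis tuple, which assumes NumPy 1.12 or later) and checks the result against the original loop:

```python
import numpy as np

a = b = np.arange(9).reshape(3, 3)
i = np.arange(3)

# Loop version from the question, kept for comparison
c_loop = np.zeros(3)
for x in range(3):
    c_loop[x] = np.average(b[np.where(a < x + 3)])

# mask[k] is True where a < k + 3; shape (3, 3, 3)
mask = a < i[:, None, None] + 3

# Float copy of the mask so the underlying np.dot can use BLAS
mask_f = mask.astype(float)

# Contract b's two axes with the last two axes of the mask -> one sum per k
sums = np.tensordot(b, mask_f, axes=((0, 1), (1, 2)))

# Number of True entries in each mask slice
counts = np.count_nonzero(mask, axis=(1, 2))

c = sums / counts
print(np.allclose(c, c_loop))  # True
```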

Related

Is there a fast way to create a bool matrix from another matrix in python?

I would like to know if there is a faster way than O(n^2) to create a bool matrix out of an integer n x n matrix.
Example:
given is the matrix:
matrix_int = [[-5,-8,6],[4,6,-9],[7,8,9]]
after transformation i want this:
matrix_bool = [[False,False,True],[True,True,False],[True,True,True]]
So all negative values should be False and all positive values should be True.
The brute-force way is O(n^2), and this is too slow for me. Do you have any ideas how to make this faster?
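Convert the list to a NumPy array and compare it with 0. Every element still has to be checked, so this is also O(n^2), but the loop runs in compiled code rather than in Python:
import numpy as np
matrix_int = [[-5,-8,6],[4,6,-9],[7,8,9]]
matrix_int = np.array(matrix_int)
bool_mat = matrix_int > 0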
result:
array([[False, False, True],
[ True, True, False],
[ True, True, True]])
Without NumPy, a nested list comprehension gives the same result:
matrix_int = [[-5,-8,6],[4,6,-9],[7,8,9]]
matrix_bool = [[num > 0 for num in row] for row in matrix_int]
# [[False, False, True], [True, True, False], [True, True, True]]
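The question only specifies negative and positive values. A small sketch of how zero is treated by each choice of comparison (the 0 in the last row is made up for illustration): > 0 maps zero to False, while >= 0 maps it to True.

```python
import numpy as np

# Same matrix as in the question, except the last entry is 0 (illustrative)
matrix_int = np.array([[-5, -8, 6], [4, 6, -9], [7, 8, 0]])

strict = matrix_int > 0      # zero -> False
non_neg = matrix_int >= 0    # zero -> True

print(strict.dtype)          # bool
print(strict[2].tolist())    # [True, True, False]
print(non_neg[2].tolist())   # [True, True, True]
```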

Check numpy array if a row contains at least one false

I have a two-column (n x 2) array of True and False booleans:
array([[False, True],
[False, True],
[False, True],
...,
[False, True],
[False, True],
[False, True]])
What I want is a new vector (it can be in another array) that is False if either of the two values in a row is False.
I could write a loop that checks each value in the row and builds a new vector, but I'm guessing it's slow:
boidx = np.empty(len(minindex), dtype=bool)
for idx in range(len(minindex)):
    if minindex[idx,0] and minindex[idx,1]:
        boidx[idx]=True
    else:
        boidx[idx]=False
But this is long and not pythonic.
The array has either 2 or 4 columns, so the solution should cover both options (my for loop does not), although two solutions chosen by an if on the size would also be acceptable.
I also tried numpy.isin(), but it works per cell, and I need a result per row.
As an answer already pointed out, you can use numpy.all() to solve it.
A simpler formulation without any loop would be:
np.all(minindex, axis=1)
If I understand correctly, a pythonic solution could use numpy.all:
import numpy as np
minindex = np.array([[False, True],
                     [False, True],
                     [True, True],
                     [True, False],
                     [False, True],
                     [False, False],
                     [False, False],
                     [True, True]])
boidx = np.array([np.all(i) for i in minindex])
and you get:
[False False True False False False False True]
Another solution could be the use of prod:
boidx = np.array([bool(i.prod()) for i in minindex])
and you get the same result.
As suggested by @Jianyu, this way should definitely be faster, since it avoids the Python-level loop:
boidx = np.all(minindex, axis=1)
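Because np.all reduces along axis=1 regardless of how many columns there are, the same call covers both the 2-column and 4-column cases mentioned in the question. A quick sketch with made-up arrays (rows_all_true is a hypothetical helper name):

```python
import numpy as np

def rows_all_true(mask):
    """True for each row whose entries are all True, for any number of columns."""
    return np.all(mask, axis=1)

two_cols = np.array([[False, True], [True, True], [True, False]])
four_cols = np.array([[True, True, True, True], [True, False, True, True]])

print(rows_all_true(two_cols).tolist())   # [False, True, False]
print(rows_all_true(four_cols).tolist())  # [True, False]
```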

Fill scipy / numpy matrix based on indices and values

I have a graph of nodes which each represent about 100 voxels in the brain. I partitioned the graph into communities, but now I need to make a correlation matrix where every voxel in a node is connected to every voxel in the nodes that are in the same community. In other words, if nodes 1 and 2 are in the same community, I need a 1 in the matrix between every voxel in node 1 and every voxel in node 2. This takes a very long time with the code below. Does anyone know how to speed this up?
for edge in combinations(graph.nodes(),2):
    if partition.get_node_community(edge[0]) == partition.get_node_community(edge[1]): # if nodes are in same community
        voxels1 = np.argwhere(flat_parcel==edge[0]+1) # this is where I find the voxels in each node, and I get the indices for the matrix where I want them.
        voxels2 = np.argwhere(flat_parcel==edge[1]+1)
        for voxel1 in voxels1:
            voxel_matrix[voxel1,voxels2] = 1
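Thanks for the responses. I think the easiest and fastest solution is to replace the last loop with
voxel_matrix[np.ix_(voxels1.ravel(), voxels2.ravel())] = 1
(np.ix_ requires 1-D index arrays, while np.argwhere returns a column array, hence the .ravel() calls.)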
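To see what np.ix_ does here, a toy sketch with a hypothetical flat_parcel of six voxels spread over three nodes (labels 1 to 3). np.ix_ turns two 1-D index arrays into an open mesh, so a single assignment fills every (voxel1, voxel2) pair. np.flatnonzero is used to get 1-D indices directly instead of np.argwhere.

```python
import numpy as np

# Hypothetical parcellation: the (1-based) node label of each voxel
flat_parcel = np.array([1, 2, 1, 3, 2, 3])
voxel_matrix = np.zeros((flat_parcel.size, flat_parcel.size), dtype=bool)

# 1-D index arrays of the voxels in node label 1 and node label 2
voxels1 = np.flatnonzero(flat_parcel == 1)  # [0, 2]
voxels2 = np.flatnonzero(flat_parcel == 2)  # [1, 4]

# Open mesh: sets all rows in voxels1 x all columns in voxels2 at once
voxel_matrix[np.ix_(voxels1, voxels2)] = True
print(np.argwhere(voxel_matrix).tolist())  # [[0, 1], [0, 4], [2, 1], [2, 4]]
```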
Here's an approach that I expect to work for you. It's a stretch on my machine -- even storing two copies of the voxel adjacency matrix (using dtype=bool) pushes my (somewhat old) desktop right to the edge of its memory capacity. But I'm assuming that you have a machine capable of handling at least two (300 * 100) ** 2 = 900 MB arrays -- otherwise, you would probably have run into problems before this stage. It takes my desktop about 30 minutes to process 30000 voxels.
This assumes that voxel_communities is a simple array containing a community label for each voxel at index i. It sounds like you can generate that pretty quickly. It also assumes that voxels are present in only one node.
import numpy

def voxel_adjacency(voxel_communities):
    n_voxels = voxel_communities.size
    comm_labels = sorted(set(voxel_communities))
    comm_counts = [(voxel_communities == l).sum() for l in comm_labels]
    # block-diagonal matrix: one all-True block per community, in sorted label order
    blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
    start = 0
    for c in comm_counts:
        blocks[start:start + c, start:start + c] = 1
        start += c
    # inverse of the permutation that sorts voxels by community label
    ix = numpy.empty_like(voxel_communities)
    ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
    # undo the sort on rows, then on columns
    blocks[:] = blocks[ix,:]
    blocks[:] = blocks[:,ix]
    return blocks
Here's a quick explanation. This uses an inverse indexing trick to reorder the columns and rows of an array of diagonal blocks into the desired matrix.
n_voxels = voxel_communities.size
comm_labels = sorted(set(voxel_communities))
comm_counts = [(voxel_communities == l).sum() for l in comm_labels]
blocks = numpy.zeros((n_voxels, n_voxels), dtype=bool)
start = 0
for c in comm_counts:
    blocks[start:start + c, start:start + c] = 1
    start += c
These lines are used to construct the initial block matrix. So for example, say you have six voxels and three communities, and each community contains two voxels. Then the initial block matrix will look like this:
array([[ True, True, False, False, False, False],
[ True, True, False, False, False, False],
[False, False, True, True, False, False],
[False, False, True, True, False, False],
[False, False, False, False, True, True],
[False, False, False, False, True, True]], dtype=bool)
This is essentially the same as the desired adjacency matrix after the voxels have been sorted by community membership. So we need to reverse that sorting. We do so by constructing an inverse argsort array.
ix = numpy.empty_like(voxel_communities)
ix[voxel_communities.argsort()] = numpy.arange(n_voxels)
Now ix will reverse the sorting process when used as an index. Because the same permutation applies to both axes, we can perform the reverse sorting separately, first on rows and then on columns:
blocks[:] = blocks[ix,:]
blocks[:] = blocks[:,ix]
return blocks
Here's an example of the result it generates for a small input:
>>> voxel_adjacency(numpy.array([0, 3, 1, 1, 0, 2]))
array([[ True, False, False, False, True, False],
[False, True, False, False, False, False],
[False, False, True, True, False, False],
[False, False, True, True, False, False],
[ True, False, False, False, True, False],
[False, False, False, False, False, True]], dtype=bool)
It seems to me that this does something quite similar to voxel_matrix[np.ix_(voxels1, voxels2)] = 1 as suggested by pv., except it does it all at once, instead of tracking each possible combination of nodes.
There may be a better solution, but this should at least be an improvement.
Also, note that if you can simply accept the new ordering of voxels as canonical, then this solution becomes as simple as creating the block array! That takes all of about 300 milliseconds.
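For comparison, here is a different technique that is not from either answer: compare the label vector with itself via broadcasting. This builds the same adjacency matrix in one expression with no reordering step, though the n_voxels x n_voxels boolean result still needs as much memory as one copy of blocks. The sketch (voxel_adjacency_broadcast is a hypothetical name) is checked against the small example above.

```python
import numpy as np

def voxel_adjacency_broadcast(voxel_communities):
    """Boolean matrix that is True where two voxels share a community label."""
    labels = np.asarray(voxel_communities)
    # (n, 1) == (1, n) broadcasts to an (n, n) boolean matrix
    return labels[:, None] == labels[None, :]

result = voxel_adjacency_broadcast(np.array([0, 3, 1, 1, 0, 2]))
print(result.astype(int))
```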

How do I search for indices that satisfy condition in numpy?

I have columns corresponding to a given day, month, and year in a numpy array called 'a'. I am comparing these three values to the columns of another array called 'b', which also correspond to day, month, and year, to find the index of 'a' that is equal to 'b'. So far I have tried:
a[:,3:6,1] == b[1,3:6]
array([[False, True, True],
[ True, True, True],
[False, True, True],
...,
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
which works fine, but I need the row that corresponds to [True, True, True].
I've also tried:
np.where(a[:,3:6,1] == b[1,3:6], a[:,3:6,1])
ValueError: either both or neither of x and y should be given
and
a[:,:,1].all(a[:,3:6,1] == b[1,3:6])
TypeError: only length-1 arrays can be converted to Python scalars
What is a quick and easy way to do this?
You can use np.all() along the last axis:
rows = np.where((a[:,3:6,1]==b[1,3:6]).all(axis=1))[0]
rows will then hold the indices of the rows in which every value is True.
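A self-contained sketch of the same idea on a simplified 2-D array (the data and the names dates and target are made up; in the question the columns are a[:,3:6,1] and the target is b[1,3:6]):

```python
import numpy as np

# Hypothetical data: each row is (day, month, year)
dates = np.array([[1, 2, 2020],
                  [5, 6, 2021],
                  [1, 2, 2020],
                  [7, 2, 2020]])
target = np.array([1, 2, 2020])

# Compare every row with the target, then keep rows where all three columns match
matches = (dates == target).all(axis=1)
rows = np.where(matches)[0]
print(rows.tolist())  # [0, 2]
```

np.flatnonzero(matches) returns the same indices without the trailing [0].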

Find where a NumPy array is equal to any value in a list of values

I have an array of integers and want to find where that array is equal to any value in a list of multiple values.
This can easily be done by treating each value individually, or by using multiple "or" statements in a loop, but I feel like there must be a better/faster way to do it. I'm actually dealing with arrays of size 4000 x 2000, but here is a simplified edition of the problem:
fake = np.arange(9).reshape((3,3))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
want = (fake==0) + (fake==2) + (fake==6) + (fake==8)
print(want)
array([[ True, False, True],
[False, False, False],
[ True, False, True]], dtype=bool)
What I would like is a way to get want from a single command involving fake and the list of values [0, 2, 6, 8].
I'm assuming there is a package that has this included already that would be significantly faster than if I just wrote a function with a loop in Python.
The function numpy.in1d seems to do what you want. The only problem is that it only works on 1-D arrays, so you should use it like this:
In [9]: np.in1d(fake, [0,2,6,8]).reshape(fake.shape)
Out[9]:
array([[ True, False, True],
[False, False, False],
[ True, False, True]], dtype=bool)
I have no clue why this is limited to 1d arrays only. Looking at its source code, it first seems to flatten the two arrays, after which it does some clever sorting tricks. But nothing would stop it from unflattening the result at the end again, like I had to do by hand here.
NumPy 1.13+
As of NumPy v1.13, you can use np.isin, which works on multi-dimensional arrays:
>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
[4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[ False, True],
[ True, False]])
NumPy pre-1.13
The accepted answer with np.in1d works only with 1-D arrays and requires reshaping for the desired result. This is good for versions of NumPy before v1.13 (np.in1d has since been deprecated in NumPy 2.0 in favor of np.isin).
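Applied to the fake array from the question, np.isin produces want in one call (this assumes NumPy 1.13 or later); its invert=True option gives the complement:

```python
import numpy as np

fake = np.arange(9).reshape((3, 3))
values = [0, 2, 6, 8]

# Element-wise membership test that keeps the input shape
want = np.isin(fake, values)

# Same as OR-ing the individual comparisons from the question
manual = (fake == 0) | (fake == 2) | (fake == 6) | (fake == 8)
print(np.array_equal(want, manual))  # True

# invert=True marks elements that are NOT in values
print(np.isin(fake, values, invert=True).sum())  # 5
```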
@Bas's answer is the one you're probably looking for. But here's another way to do it, using NumPy's vectorize trick:
import numpy as np
S = set([0,2,6,8])
@np.vectorize
def contained(x):
    return x in S
contained(fake)
=> array([[ True, False, True],
[False, False, False],
[ True, False, True]], dtype=bool)
The downside of this solution is that contained() is called for each element (i.e. in Python space), which makes it much slower than a pure-NumPy solution.
