How to invert numpy.where (np.where) function - python

I frequently use the numpy.where function to gather a tuple of indices of a matrix having some property. For example:
>>> import numpy as np
>>> X = np.random.rand(3,3)
>>> X
array([[ 0.51035326, 0.41536004, 0.37821622],
[ 0.32285063, 0.29847402, 0.82969935],
[ 0.74340225, 0.51553363, 0.22528989]])
>>> ix = np.where(X > 0.5)
>>> ix
(array([0, 1, 2, 2]), array([0, 2, 0, 1]))
ix is now a tuple of ndarray objects that contain the row and column indices, whereas the sub-expression X > 0.5 evaluates to a single boolean matrix indicating which cells have the > 0.5 property. Each representation has its own advantages.
What is the best way to take the ix object and convert it back to the boolean form later, when it is needed? For example:
>>> G = np.zeros(X.shape, dtype=bool)
>>> G[ix] = True
Is there a one-liner that accomplishes the same thing?

Something like this maybe?
mask = np.zeros(X.shape, dtype='bool')
mask[ix] = True
but if it's something simple like X > 0, you're probably better off doing mask = X > 0 unless mask is very sparse or you no longer have a reference to X.
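If the conversion is needed in several places, the two lines above can be wrapped in a small helper. `where_to_mask` is a hypothetical name used only for this sketch, not a NumPy function, and it assumes `ix` came from `np.where` on an array of the given shape. The fixed array replaces `np.random.rand` so the round trip can be checked.

```python
import numpy as np

def where_to_mask(ix, shape):
    """Rebuild a boolean mask from the index tuple returned by np.where.

    Hypothetical helper for illustration; not part of NumPy.
    """
    mask = np.zeros(shape, dtype=bool)  # start with every cell False
    mask[ix] = True                     # fancy-index assignment marks the hits
    return mask

X = np.array([[0.51, 0.41, 0.37],
              [0.32, 0.29, 0.82],
              [0.74, 0.51, 0.22]])
ix = np.where(X > 0.5)
G = where_to_mask(ix, X.shape)
print(np.array_equal(G, X > 0.5))  # True
```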

For example:
mask = X > 0
imask = np.logical_not(mask)
Edit: Sorry for being so concise before. Shouldn't be answering things on the phone :P
As I noted in the example, it's better to just invert the boolean mask. It's much more efficient, and easier, than going back from the result of where.
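To make the distinction concrete: `~mask` gives the same result as `np.logical_not(mask)`, and passing the negated mask to `np.where` yields the complementary set of indices. A small sketch with a fixed 2x2 array chosen for illustration:

```python
import numpy as np

X = np.array([[0.51, 0.41],
              [0.32, 0.82]])
mask = X > 0.5
imask = ~mask                        # same values as np.logical_not(mask)
assert np.array_equal(imask, np.logical_not(mask))

rows, cols = np.where(imask)         # indices of the cells where X <= 0.5
print(rows, cols)                    # [0 1] [1 0]
```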

The bottom of the np.where docstring suggests using np.in1d for this.
>>> x = np.array([1, 3, 4, 1, 2, 7, 6])
>>> indices = np.where(x % 3 == 1)[0]
>>> indices
array([0, 2, 3, 5])
>>> np.in1d(np.arange(len(x)), indices)
array([ True, False, True, True, False, True, False], dtype=bool)
(While this is a nice one-liner, it is a lot slower than @Bi Rico's solution.)
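`np.in1d` was deprecated in NumPy 2.0 in favor of `np.isin`, which gives the same result for 1-D input. For the 2-D case in the original question, one way to keep it a one-liner is to flatten the index tuple with `np.ravel_multi_index` first. This is a sketch, assuming `ix` came from `np.where` on an array of shape `X.shape`; no timing claims are made here.

```python
import numpy as np

# 1-D: same data as the docstring-based example
x = np.array([1, 3, 4, 1, 2, 7, 6])
indices = np.where(x % 3 == 1)[0]
mask_1d = np.isin(np.arange(len(x)), indices)

# 2-D: turn the (rows, cols) tuple into flat indices, test membership,
# then reshape back to the original shape
X = np.array([[0.51, 0.41, 0.37],
              [0.32, 0.29, 0.82],
              [0.74, 0.51, 0.22]])
ix = np.where(X > 0.5)
flat = np.ravel_multi_index(ix, X.shape)
G = np.isin(np.arange(X.size), flat).reshape(X.shape)
print(np.array_equal(G, X > 0.5))  # True
```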

Related

How do I remove separate elements in a vector without using the range function?

I've created vector x and I need to create a vector z by removing the 3rd and 6th elements of x. I cannot just create a vector by simply typing in the elements that should be in z. I have to index them or use a separate function.
import numpy as np
x = [5,2,0,6,-10,12]
x = np.array(x)
print(x)
z = np.delete(x,)
I am not sure if using np.delete is best or if there is a better approach. Help?
You can index and concatenate pieces of the list, excluding the ones you want to "delete":
x = [5,2,0,6,-10,12]
print(x[0:2] + x[3:5])
[5, 2, 6, -10]
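If x is a NumPy array, first convert it to a list:
x = list(x)
If it is already a list, go straight to:
z = [x.pop(2), x.pop(-1)]
This removes the 3rd and 6th elements from x and places them in z, leaving x as [5, 2, 6, -10] and z as [0, 12]. (x.pop(-1) works here because the 6th element is the last one.) Then convert to a NumPy array if needed.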
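A runnable version of the list-based answers on the question's data. Note that they behave differently: slicing builds the list of kept elements, while `pop` moves the removed elements into `z` and leaves the kept ones in the list. The last variant, an `enumerate`-based comprehension, is an alternative not taken from the answers; it works for any set of positions, not just ones that line up with slice boundaries.

```python
x = [5, 2, 0, 6, -10, 12]

# Slicing: keep everything except positions 2 and 5 (the 3rd and 6th elements)
kept = x[0:2] + x[3:5]
print(kept)        # [5, 2, 6, -10]

# pop: removes in place; index 2 first, then the last element (index -1)
y = list(x)
z = [y.pop(2), y.pop(-1)]
print(y, z)        # [5, 2, 6, -10] [0, 12]

# Comprehension over the 0-based positions to drop
drop = {2, 5}
kept2 = [v for i, v in enumerate(x) if i not in drop]
print(kept2)       # [5, 2, 6, -10]
```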
In [69]: x = np.array([5,2,0,6,-10,12])
Using delete is straightforward:
In [70]: np.delete(x,[2,5])
Out[70]: array([ 5, 2, 6, -10])
delete is a general function that takes various approaches based on the delete object, but in a case like this it uses a boolean mask:
In [71]: mask = np.ones(x.shape, bool); mask[[2,5]] = False; mask
Out[71]: array([ True, True, False, True, True, False])
In [72]: x[mask]
Out[72]: array([ 5, 2, 6, -10])
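The masking approach can be wrapped in a small function and checked against `np.delete`. `drop_positions` is a hypothetical name used only for this sketch, and it assumes the positions are valid indices into a 1-D array.

```python
import numpy as np

def drop_positions(arr, positions):
    """Return a copy of 1-D `arr` without the given positions.

    Hypothetical helper for illustration; assumes the positions are valid indices.
    """
    keep = np.ones(arr.shape, dtype=bool)  # start by keeping every element
    keep[positions] = False                # switch off the unwanted positions
    return arr[keep]                       # boolean indexing returns a new array

x = np.array([5, 2, 0, 6, -10, 12])
z = drop_positions(x, [2, 5])
print(z.tolist())                               # [5, 2, 6, -10]
print(np.array_equal(z, np.delete(x, [2, 5])))  # True
```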

What is the proper way to specify the numpy masked array masked value?

I basically want to run something like the following:
import numpy as np
import numpy.ma as ma
x = np.array([1,2,3,4,5])
a = ma.masked_array(x, mask=[0, 0, 0, 1, 0])
for i in range(5):
    if (a[i] == "--"):
        print("a[{0:d}] is masked value".format(i))
I am not sure how I should specify the -- value of the masked array in the if (a[i] == "--") part, where "--" stands in for the thing I could not figure out. I know there are a few other ways of doing it by processing the entire masked array into boolean values, but I don't want that.
Edit.
The array a is a masked array, and when I print it out I get
masked_array(data=[1, 2, 3, --, 5],
mask=[False, False, False, True, False],
fill_value=999999)
What I want to do is to skip the -- values in that output using the if statement.
A masked array has two key attributes, data and mask.
In [63]: a.mask
Out[63]: array([False, False, False, True, False])
In [64]: a.data
Out[64]: array([1, 2, 3, 4, 5])
The getmask docs say it's equivalent to getting the attribute:
In [65]: np.ma.getmask(a)
Out[65]: array([False, False, False, True, False])
That mask can then be used to select values from data:
In [66]: a.data[a.mask]
Out[66]: array([4])
More commonly we are interested in the unmasked values:
In [67]: a.compressed()
Out[67]: array([1, 2, 3, 5])
After all, if we're using masking, we aren't "supposed" to care about the masked values. Reductions such as sum skip the masked values, giving the same total as summing the compressed values:
In [68]: a.sum()
Out[68]: 11
Alternatively, the masked values can be filled with something innocuous:
In [69]: a.filled()
Out[69]: array([ 1, 2, 3, 999999, 5])
In [70]: a.filled(0)
Out[70]: array([1, 2, 3, 0, 5])
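Putting these pieces together for the question's array: `np.flatnonzero` on the mask lists the masked positions directly, so no per-element comparison with `"--"` is needed. The sketch uses `np.ma.getmaskarray` because it always returns a full boolean array, whereas `getmask` can return the `nomask` scalar when the array has no mask.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
a = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])

mask = np.ma.getmaskarray(a)           # full boolean array, never the nomask scalar
masked_pos = np.flatnonzero(mask)
print(masked_pos)                      # [3]
print(a.compressed())                  # [1 2 3 5]
print(a.sum(), a.compressed().sum())   # 11 11
print(a.filled(0))                     # [1 2 3 0 5]
```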
The proper way should be:
mask_a = numpy.ma.getmask(a)
which following your example returns the mask array:
array([False, False, False, True, False])
If I understand correctly how numpy works internally, this does not "process" the masked array to get booleans out of it. The mask is already there; you are just getting it as a proper array that can be used in your for loop, so if you are worried about performance... don't worry.
for i in range(5):
    if mask_a[i]:
        print("a[{0:d}] is masked value".format(i))
However, if for whatever reason you don't want to use the getmask function, you can get the string representation of a.
str_a = str(a)
which in your example is: '[1 2 3 -- 5]'
Then you can strip the square brackets and split the string on white spaces:
str_a = str(a)[1:-1].split()
which in your example is ['1', '2', '3', '--', '5'].
Then you have a list where you can filter out the "--" values with your for loop:
for i in range(5):
    if str_a[i] == "--":
        print("a[{0:d}] is masked value".format(i))
But honestly, using the getmask function should be the way to go: I didn't profile it, but I expect it to be faster.
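An option that stays even closer to the loop in the question: indexing a single masked element of a masked array returns the special constant `np.ma.masked`, so the string comparison with `"--"` can be replaced by an identity test. A minimal sketch on the question's data:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
a = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])

masked_positions = []
for i in range(len(a)):
    if a[i] is np.ma.masked:   # a masked element comes back as np.ma.masked
        print("a[{0:d}] is masked value".format(i))
        masked_positions.append(i)
```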

Count the occurrences of a specific value and remove them at the same time

I want to count the occurrences of a specific value (in my case -1) in a numpy array and delete them at the same time.
I can already do it; here is what I've done:
import numpy as np
a = np.array([1, 2, 0, -1, 3, -1, -1])
b = a[a==-1]
a = np.delete(a, np.where(a==-1))
print("a -> ", a) # a -> [1 2 0 3]
print("b -> ", len(b)) # b -> 3
Is there a more optimised way to do it?
Something like this?
a = [x for x in a if x != -1]
Using numpy like you did is probably more optimized, though.
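The comprehension alone drops the `-1` values but doesn't report how many there were; the count can be recovered from the length difference. A sketch on the question's data (converting with `tolist()` first so the result holds plain Python ints):

```python
import numpy as np

a = np.array([1, 2, 0, -1, 3, -1, -1])
kept = [v for v in a.tolist() if v != -1]   # new list without the -1 entries
count = len(a) - len(kept)                  # how many entries were removed
print(kept, count)                          # [1, 2, 0, 3] 3
```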
First, a list in-place count and delete operation:
In [100]: al=a.tolist(); cnt=0
In [101]: for i in range(len(a)-1,-1,-1):
     ...:     if al[i]==-1:
     ...:         del al[i]
     ...:         cnt += 1
In [102]: al
Out[102]: [1, 2, 0, 3]
In [103]: cnt
Out[103]: 3
It operates in place, but has to work from the end. The list comprehension alternative makes a new list, but often is easier to write and read.
The cleanest array operation uses a boolean mask.
In [104]: idx = a==-1
In [105]: idx
Out[105]: array([False, False, False, True, False, True, True], dtype=bool)
In [106]: np.sum(idx) # or np.count_nonzero(idx)
Out[106]: 3
In [107]: a[~idx]
Out[107]: array([1, 2, 0, 3])
You have to identify, one way or another, all elements that match the target. The count is a trivial operation, and masking is also easy.
np.delete has to be told which items to delete, and one way or another it constructs a new array that contains all but the 'deleted' ones. Because of its generality it will almost always be slower than a direct action like this masking.
np.where with a single argument (equivalent to np.nonzero) uses count_nonzero to determine how many values it will return.
So I'm proposing the same actions you are doing, just in a more direct way.
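The mask-based approach as a single function; `count_and_remove` is a hypothetical name used only for this sketch. It builds the mask once and uses it for both the count and the filtering. The last lines check that single-argument `np.where` returns the same indices as `np.nonzero`.

```python
import numpy as np

def count_and_remove(arr, value):
    """Return (arr without `value`, number of occurrences). Hypothetical helper."""
    idx = arr == value                      # one pass to build the boolean mask
    return arr[~idx], int(np.count_nonzero(idx))

a = np.array([1, 2, 0, -1, 3, -1, -1])
remaining, n = count_and_remove(a, -1)
print(remaining, n)                         # [1 2 0 3] 3

# With a single argument, np.where gives the same indices as np.nonzero
idx = a == -1
print(np.array_equal(np.where(idx)[0], np.nonzero(idx)[0]))  # True
```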

numpy.where for 2+ specific values

Can the numpy.where function be used for more than one specific value?
I can specify a specific value:
>>> x = numpy.arange(5)
>>> numpy.where(x == 2)[0][0]
2
But I would like to do something like the following and get [3, 4] back. Of course, it raises an error:
>>> numpy.where(x in [3,4])[0][0]
Is there a way to do this without iterating through the list and combining the resulting arrays?
EDIT: I also have a list of lists of unknown lengths and unknown values, so I cannot easily form the parameters of np.where() to search for multiple items. It would be much easier to pass a list.
You can use the numpy.in1d function with numpy.where:
import numpy
numpy.where(numpy.in1d(x, [2,3]))
# (array([2, 3]),)
I guess np.in1d might help you, instead:
>>> x = np.arange(5)
>>> np.in1d(x, [3,4])
array([False, False, False, True, True], dtype=bool)
>>> np.argwhere(_)
array([[3],
[4]])
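Regarding the edit about a list of lists of unknown length: `np.isin` (the function NumPy now recommends over `np.in1d`) accepts any array-like of test values, so each inner list can be passed as-is without building separate conditions. A sketch with made-up inner lists:

```python
import numpy as np

x = np.arange(5)
lists_of_values = [[3, 4], [0], [1, 2, 7]]   # illustrative data only

results = []
for values in lists_of_values:
    # np.isin marks elements of x that appear anywhere in `values`;
    # np.flatnonzero turns that mask into index positions
    positions = np.flatnonzero(np.isin(x, values))
    results.append(positions)
    print(values, positions)
```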
If you only need to check for a few values you can:
import numpy as np
x = np.arange(4)
ret_arr = np.where([x == 1, x == 2, x == 4, x == 0])[1]
print("Ret arr =", ret_arr)
Output:
Ret arr = [1 2 0]
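The same idea can be written with broadcasting instead of listing each comparison by hand, which works for a list of values of any length. As in the answer above, the positions come out grouped by value in the order the values are listed (here 1, 2, 4, 0), not sorted by position, and a value that doesn't occur (4) contributes nothing.

```python
import numpy as np

x = np.arange(4)
values = np.array([1, 2, 4, 0])

# values[:, None] has shape (4, 1); comparing with x (shape (4,)) broadcasts
# to a (4, 4) boolean table where row k marks the positions with x == values[k]
hits = x == values[:, None]
ret_arr = np.nonzero(hits)[1]    # column indices = positions in x
print("Ret arr =", ret_arr)      # Ret arr = [1 2 0]
```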

Printing 'True' index/es after comparing numpy arrays with numpy.logical_and(...)

Is there a more elegant way to return the index of a True value when comparing arrays using numpy.logical_and(...) than just looping through the boolean result vector and printing/saving the index?
Currently I have something like:
import numpy
array1 = numpy.array([1,2,3])
array2 = numpy.array([0.5,1.2,2])
comp = numpy.logical_and(numpy.logical_and(array1 != 0, array2 != 0), array1 > (3*array2))
if True in comp:
    # basically just loop and find the True entries
    for i in range(len(comp)):
        if comp[i]:
            print(i)
I would prefer something that just returns the locations that have True values, so I can access them more easily and faster in the original arrays.
You can use numpy.where(), more precisely numpy.where(comp)[0] here.
As a remark, your MCVE is not very well chosen, since comp does not contain True.
If instead I use
comp = numpy.logical_and(numpy.logical_and(array1 != 0, array2 != 0), array1 > (0.6+array2))
Then I get
>>> comp
array([False, True, True], dtype=bool)
>>> numpy.where(comp)
(array([1, 2]),)
>>> numpy.where(comp)[0]
array([1, 2])
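Once you have the indices, they can be used directly to pull the matching entries out of the original arrays. Boolean indexing with `comp` itself does the same thing without the intermediate index array. A sketch using the modified comparison (`0.6+array2`) from this answer:

```python
import numpy as np

array1 = np.array([1, 2, 3])
array2 = np.array([0.5, 1.2, 2])
comp = np.logical_and(np.logical_and(array1 != 0, array2 != 0),
                      array1 > (0.6 + array2))

idx = np.flatnonzero(comp)            # same as np.where(comp)[0] for 1-D input
print(idx)                            # [1 2]
print(array1[idx], array2[idx])       # matching entries from both arrays
print(np.array_equal(array1[idx], array1[comp]))  # True
```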
