Unexpected masked element in numpy's isin() with masked arrays. Bug? - python

Using NumPy and the following masked arrays:
import numpy.ma as ma
a = ma.MaskedArray([[1,2,3],[4,5,6]], [[True,False,False],[False,False,False]])
ta = ma.array([1,4,5])
>>> a
masked_array(
data=[[--, 2, 3],
[4, 5, 6]],
mask=[[ True, False, False],
[False, False, False]],
fill_value=999999)
>>> ta
masked_array(data=[1, 4, 5],
mask=False,
fill_value=999999)
To check whether each element in a is in ta, I use
ma.isin(a, ta)
This command gives
masked_array(
data=[[False, False, False],
[True, True, --]],
mask=[[False, False, False],
[False, False, True]],
fill_value=True)
Why is the last element in the result masked? Neither of the input arrays is masked at this point.
Using the standard NumPy version produces the expected results:
>>> import numpy as np
>>> np.isin(a, ta)
array([[ True, False, False],
[ True, True, False]])
Here, however, the very first element is True because the mask of a was ignored.
Tested with Python 3.9.4 and numpy 1.20.3.
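One possible workaround, sketched here under the assumption that masked entries of a should stay masked in the result and that only the unmasked entries of ta should count as lookup values, is to run the plain np.isin on the raw data and then reattach a's mask. This sidesteps ma.isin entirely rather than explaining why it masks the last element.

```python
import numpy as np
import numpy.ma as ma

a = ma.MaskedArray([[1, 2, 3], [4, 5, 6]],
                   [[True, False, False], [False, False, False]])
ta = ma.array([1, 4, 5])

# Test the underlying data of `a` against only the unmasked values of `ta`.
hits = np.isin(a.data, ta.compressed())

# Reattach a's mask so the masked input element stays masked in the output.
result = ma.masked_array(hits, mask=ma.getmaskarray(a))
print(result)
```

Here the first element stays masked and the last element comes out as an ordinary False.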

Related

How to mask a 2D array by whether there are at least 2 neighbors greater than a value?

For a 2d numpy array, e.g.,
import numpy as np
a = np.random.rand(5, 4)
a
a looks like
array([[0.92576936, 0.41860519, 0.26446948, 0.31691141],
[0.31797497, 0.2044637 , 0.20939504, 0.54034017],
[0.85781227, 0.40367301, 0.40215265, 0.95902499],
[0.15700837, 0.10680368, 0.61971475, 0.35586694],
[0.25211967, 0.98171005, 0.60740472, 0.89452886]])
Each element has neighbors: corner elements have 3, other border elements have 5, and interior elements have 8. Is there an efficient and elegant way to mask a so that only elements greater than 0.5 that also have at least one neighbor greater than 0.5 are selected? In other words, isolated elements larger than 0.5 whose neighbors are all smaller than 0.5 should not be selected.
For a, the expected output mask would be
array([[False, False, False, False],
[False, False, False, True],
[False, False, False, True],
[False, False, True, False],
[False, True, True, True]])
You can use a 2D convolution:
from scipy.signal import convolve2d
kernel = np.array([[1, 1, 1],
[1, 10, 1],
[1, 1, 1]])
out = convolve2d(a>0.5, kernel, mode='same') > 10
The kernel is designed to count 10 for a center value > 0.5 and 1 for each surrounding value > 0.5, and the convolution computes the sum. Since an element has at most 8 neighbors, the neighbors alone can contribute at most 8. Thus if the total sum is > 10, you know that the value itself is > 0.5 and so is at least one of its neighbors.
Output:
array([[False, False, False, False],
[False, False, False, True],
[False, False, False, True],
[False, False, True, False],
[False, True, True, True]])
A more classical alternative:
from scipy.signal import convolve2d
m = a>0.5
kernel = np.array([[1, 1, 1],
[1, 0, 1],
[1, 1, 1]])
out = m & (convolve2d(m, kernel, mode='same') > 0)
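If SciPy is not available, the same neighbor test can be done in plain NumPy. This is a swapped-in technique, not one of the answers above: pad the boolean mask with False and OR together the 8 shifted copies of it. The function name mask_with_hot_neighbor is hypothetical; the sketch assumes 8-connected neighbors and a strict > threshold, as in the question.

```python
import numpy as np

def mask_with_hot_neighbor(a, thresh=0.5):
    """Keep elements > thresh that have at least one 8-connected neighbor > thresh."""
    m = a > thresh
    # Pad with False so border elements simply see fewer neighbors.
    p = np.pad(m, 1, mode="constant", constant_values=False)
    n_rows, n_cols = m.shape
    any_neighbor = np.zeros_like(m)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue  # skip the center element itself
            # Window of the padded mask shifted by (di, dj) relative to `m`.
            any_neighbor |= p[1 + di:1 + di + n_rows, 1 + dj:1 + dj + n_cols]
    return m & any_neighbor

a = np.array([[0.92576936, 0.41860519, 0.26446948, 0.31691141],
              [0.31797497, 0.2044637 , 0.20939504, 0.54034017],
              [0.85781227, 0.40367301, 0.40215265, 0.95902499],
              [0.15700837, 0.10680368, 0.61971475, 0.35586694],
              [0.25211967, 0.98171005, 0.60740472, 0.89452886]])
print(mask_with_hot_neighbor(a))
```

For the sample array this reproduces the expected mask from the question; the loop runs only 8 times regardless of the array size.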

NumPy 2D array boolean indexing with each axis

I created a 2D array and indexed it with two boolean index arrays: the first for axis 0, the second for axis 1.
I expected the values at the intersections of the True positions on each axis to be selected, as in pandas, but that is not what happens.
How does the code below work?
I would also like a link to the official NumPy documentation that describes this behavior.
Thanks in advance.
>>> a = np.arange(9).reshape(3,3)
>>> a
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
>>> a[ [True, False, True], [True, False, True] ]
array([0, 8])
My expectation is [0, 6, 2, 8].
(I know how to get the result that I expect.)
In [20]: a = np.arange(9).reshape(3,3)
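In your code, each boolean list is converted to the integer positions of its True values, [0, 2], and the two index arrays are paired element-wise, so you get a[0, 0] and a[2, 2]. If the lists are instead passed to np.ix_, the result is two arrays that, with broadcasting, index the desired block: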
In [21]: np.ix_([True, False, True], [True, False, True] )
Out[21]:
(array([[0],
[2]]),
array([[0, 2]]))
In [22]: a[_]
Out[22]:
array([[0, 2],
[6, 8]])
This isn't 1d, but can be easily raveled.
Trying to make equivalent boolean arrays does not work:
In [23]: a[[[True], [False], [True]], [True, False, True]]
Traceback (most recent call last):
File "<ipython-input-23-26bc93cfc53a>", line 1, in <module>
a[[[True], [False], [True]], [True, False, True]]
IndexError: too many indices for array: array is 2-dimensional, but 3 were indexed
Boolean indexes must be either 1D or n-D with a shape matching the target, here (3,3). A full (3,3) mask can be built from the two 1D ones with an outer &:
In [26]: np.array([True, False, True])[:,None]& np.array([True, False, True])
Out[26]:
array([[ True, False, True],
[False, False, False],
[ True, False, True]])
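Applying that full mask directly returns the selected values in row-major order, [0, 2, 6, 8]. Because the selection is a rectangular block, it can be reshaped and then raveled column-first to get [0, 6, 2, 8]. A short sketch comparing this with np.ix_ (the variable names are illustrative):

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
rows = np.array([True, False, True])   # selection along axis 0
cols = np.array([True, False, True])   # selection along axis 1

# Outer AND: mask[i, j] is True only where both rows[i] and cols[j] are True.
mask = rows[:, None] & cols

# A full-shape boolean mask returns the selected values in row-major (C) order.
flat = a[mask]

# The selection is a rectangular block, so it can be reshaped back into it.
block = flat.reshape(rows.sum(), cols.sum())

# Column-major ravel gives the order asked for in the question.
col_major = block.ravel(order="F")

# Same block via np.ix_, for comparison.
same_block = a[np.ix_(rows, cols)]
```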
What you want is two consecutive indexing steps, first along the rows and then along the columns: a[[True, False, True]][:,[True, False, True]]
a = np.arange(9).reshape(3,3)
x = [True, False, True]
y = [True, False, True]
a[x][:,y]
As a flat array:
a[[True, False, True]][:,[True, False, True]].flatten(order='F')
output: array([0, 6, 2, 8])
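Alternative:
NB: this requires NumPy arrays rather than lists, since it uses & and [:,None]; indexing the transpose makes the result come out in column-major order.
a = np.arange(9).reshape(3,3)
x = np.array([True, False, True])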
y = np.array([True, False, True])
a.T[x&y[:,None]]
output: array([0, 6, 2, 8])

How do I create a boolean mask of length n from two breakpoints without a for loop?

I am trying to create an array of boolean masks from an array of breakpoint pairs. Each resulting mask should have length n, with True values between the two breakpoints (start inclusive, end exclusive, as in a slice). I could solve the problem iteratively with a for loop, but I want to find the vectorized NumPy equivalent.
mask = np.array([[False, False, False, False, False],
[False, False, False, False, False]])
breakpoints = np.array([[1, 3],
[2, 4]])
for i, bp in enumerate(breakpoints):
    mask[i, bp[0]:bp[1]] = True
Output:
array([[False, True, True, False, False],
[False, False, True, True, False]])
Optimally, I would like to solve this with indexing and array operations in numpy but I can't get my head around the correct way of doing it.
I hope this example is clear and thank you for any help!
You can use the following broadcasting trick:
>>> breakpoints = np.array([[1, 3],
... [2, 4]])
>>> output_width = 5
>>> idx = np.arange(output_width)
>>> (breakpoints[:,[0]] <= idx) & (idx < breakpoints[:,[1]])
array([[False, True, True, False, False],
[False, False, True, True, False]])
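The trick relies on broadcasting: indexing with a list, breakpoints[:,[0]], keeps a column of shape (k, 1) instead of a flat (k,) vector, and comparing it with idx of shape (n,) produces a (k, n) boolean array. A small reusable sketch of the same idea; the function name masks_from_breakpoints is hypothetical, and it assumes half-open ranges, matching the slice in the question's loop:

```python
import numpy as np

def masks_from_breakpoints(breakpoints, n):
    """Row i is True on the half-open range [start_i, end_i), like mask[i, start:end] = True."""
    breakpoints = np.asarray(breakpoints)
    idx = np.arange(n)                  # shape (n,)
    starts = breakpoints[:, [0]]        # shape (k, 1); list index keeps the column axis
    ends = breakpoints[:, [1]]          # shape (k, 1)
    # Broadcasting (k, 1) against (n,) produces a (k, n) boolean array.
    return (starts <= idx) & (idx < ends)

print(masks_from_breakpoints([[1, 3], [2, 4]], 5))
```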

Is there a faster alternative to np.where for determining indices?

I have an array like this:
arrayElements = [[1, 4, 6],[2, 4, 6],[3, 5, 6],...,[2, 5, 6]]
I need to know, for example, the indices where an arrayElements is equal to 1.
Right now, I am doing:
rows, columns = np.where(arrayElements == 1)
This works, but I am doing this in a loop over all possible element values, which in my case is 1 to 500,000+. This takes 30-40 minutes to run depending on how big my array is. Can anyone suggest a better way of going about this? (Additional information: I don't care about the column the value is in, just the row; not sure if that's useful.)
Edit: I need the result for every element value separately. That is, I need the row indices for each value that arrayElements contains.
So, using a small sample arr = np.array([[1, 4, 6], [2, 4, 6], [3, 5, 6], [2, 5, 6]]), you are generating thousands of arrays like this:
In [271]: [(i,np.where(arr==i)[0]) for i in range(1,7)]
Out[271]:
[(1, array([0])),
(2, array([1, 3])),
(3, array([2])),
(4, array([0, 1])),
(5, array([2, 3])),
(6, array([0, 1, 2, 3]))]
I could do the == test for all values at once with a bit of broadcasting:
In [281]: arr==np.arange(1,7)[:,None,None]
Out[281]:
array([[[ True, False, False],
[False, False, False],
[False, False, False],
[False, False, False]],
[[False, False, False],
[ True, False, False],
[False, False, False],
[ True, False, False]],
[[False, False, False],
[False, False, False],
[ True, False, False],
[False, False, False]],
[[False, True, False],
[False, True, False],
[False, False, False],
[False, False, False]],
[[False, False, False],
[False, False, False],
[False, True, False],
[False, True, False]],
[[False, False, True],
[False, False, True],
[False, False, True],
[False, False, True]]])
And since you only care about rows, apply any along the last axis:
In [282]: (arr==np.arange(1,7)[:,None,None]).any(axis=2)
Out[282]:
array([[ True, False, False, False],
[False, True, False, True],
[False, False, True, False],
[ True, True, False, False],
[False, False, True, True],
[ True, True, True, True]])
Applying where to this gives the same values as in Out[271], but grouped differently:
In [283]: np.where((arr==np.arange(1,7)[:,None,None]).any(axis=2))
Out[283]:
(array([0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5, 5]),
array([0, 1, 3, 2, 0, 1, 2, 3, 0, 1, 2, 3]))
It can be split up with:
In [284]: from collections import defaultdict
In [285]: dd = defaultdict(list)
In [287]: for i,j in zip(*Out[283]): dd[i].append(j)
In [288]: dd
Out[288]:
defaultdict(list,
{0: [0], 1: [1, 3], 2: [2], 3: [0, 1], 4: [2, 3], 5: [0, 1, 2, 3]})
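This second approach may be faster for some arrays, but the intermediate boolean array has shape (number of values, rows, columns), so with 500,000+ values it may need too much memory to scale to your full problem.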
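A different technique, not taken from the answers here, avoids both the per-value scan and the large broadcast: sort the flattened values once and split the corresponding row indices into groups, which costs one O(N log N) sort instead of one full pass per value. A sketch, assuming arr is a 2D integer array; rows_by_value is a hypothetical helper name:

```python
import numpy as np

def rows_by_value(arr):
    """Map each value in a 2D array to the sorted, unique row indices containing it."""
    arr = np.asarray(arr)
    n_rows, n_cols = arr.shape
    vals = arr.ravel()                                # row-major flattening
    row_of = np.repeat(np.arange(n_rows), n_cols)     # row index of each flattened element
    order = np.argsort(vals, kind="stable")           # one sort instead of one scan per value
    sorted_vals = vals[order]
    sorted_rows = row_of[order]
    # Start position of each distinct value in the sorted order.
    uniq, starts = np.unique(sorted_vals, return_index=True)
    groups = np.split(sorted_rows, starts[1:])
    # np.unique drops duplicates when a value appears twice in the same row.
    return {v.item(): np.unique(g) for v, g in zip(uniq, groups)}

arr = np.array([[1, 4, 6], [2, 4, 6], [3, 5, 6], [2, 5, 6]])
result = rows_by_value(arr)
```

For the sample arr this gives the same row groups as Out[288] (keyed by the values 1 to 6 rather than by position). The final dictionary comprehension still loops in Python once per distinct value, but each step only touches that value's own group.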
By using np.isin (see documentation), you can test for multiple element values.
For example:
import numpy as np
a = np.array([1,2,3,4])
check_for = np.array([1,2])
locs = np.isin(a, check_for)
# [True, True, False, False]
np.where(locs)
# (array([0, 1]),)
Note: This assumes that you do not need to know the indices for every element value separately.
If you need to track every element value separately, use a defaultdict and iterate through the matrix:
import numpy as np
from collections import defaultdict
tracker = defaultdict(set)
for (row, column), value in np.ndenumerate(arrayElements):
    tracker[value].add(row)
You could try looping over the values and indices using numpy.ndenumerate and using Counter, defaultdict, or dict where the keys are the values in the array.

numpy.equal with nested lists

I want to search for a rectangle in a picture. The picture comes from PIL, so I get a 2D array where each item is a list of three color entries.
To find where the rectangle with the searched color is, I'm using np.equal. Here is a shrunk-down example:
>>> l = np.array([[1,1], [2,1], [2,2], [1,0]])
>>> np.equal(l, [2,1]) # where [2,1] is the searched color
array([[False, True],
[ True, True],
[ True, False],
[False, False]], dtype=bool)
But I've expected:
array([False, True, False, False], dtype=bool)
or
array([[False, False],
[ True, True],
[ False, False],
[False, False]], dtype=bool)
How can I achieve a nested list comparison with numpy?
Note: I then want to use np.where to extract the indices of the rectangle from the result of np.equal.
You could use the all method along the second axis:
>>> result = numpy.array([[1, 1], [2, 1], [2, 2], [1, 0]]) == [2, 1]
>>> result.all(axis=1)
array([False, True, False, False], dtype=bool)
And to get the indices:
>>> result.all(axis=1).nonzero()
(array([1]),)
I prefer nonzero to where for this, because where does two very different things depending on how many arguments are passed to it. I use where when I need its unique functionality; when I need the behavior of nonzero, I use nonzero explicitly.
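Applied to the original problem, the same all reduction works on a full image. A sketch, assuming the picture has already been turned into a NumPy array of shape (height, width, 3), for example with np.asarray(img) on an RGB PIL image; the tiny synthetic image below is made up for illustration:

```python
import numpy as np

# Hypothetical 5x6 RGB image with a 2x3 red rectangle at rows 1-2, columns 2-4.
img = np.zeros((5, 6, 3), dtype=np.uint8)
img[1:3, 2:5] = [255, 0, 0]

color = np.array([255, 0, 0], dtype=np.uint8)

# Compare every pixel to the color, then require all three channels to match.
match = (img == color).all(axis=-1)      # shape (height, width)

# Row and column indices of matching pixels, and their bounding box.
rows, cols = match.nonzero()
top, bottom = rows.min(), rows.max()
left, right = cols.min(), cols.max()
print(top, bottom, left, right)
```

If no pixel matches, rows is empty and min() raises a ValueError, so in real code check match.any() first.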
