Using numpy and the following masked arrays
import numpy.ma as ma
a = ma.MaskedArray([[1,2,3],[4,5,6]], [[True,False,False],[False,False,False]])
ta = ma.array([1,4,5])
>>> a
masked_array(
data=[[--, 2, 3],
[4, 5, 6]],
mask=[[ True, False, False],
[False, False, False]],
fill_value=999999)
>>> ta
masked_array(data=[1, 4, 5],
mask=False,
fill_value=999999)
to check for each element in a if it is in ta, I use
ma.isin(a, ta)
This command gives
masked_array(
data=[[False, False, False],
[True, True, --]],
mask=[[False, False, False],
[False, False, True]],
fill_value=True)
Why is the last element in the result masked? Neither of the input arrays is masked at this point.
Using the standard numpy version produces the expected results:
>>> import numpy as np
>>> np.isin(a, ta)
array([[ True, False, False],
[ True, True, False]])
Here, however, the very first element is True because the mask of a was ignored.
Tested with Python 3.9.4 and numpy 1.20.3.
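One possible workaround, offered as a sketch rather than an explanation of the ma.isin behaviour: run np.isin on the underlying data and re-attach the mask of a yourself. The names found and result are only illustrative, and the sketch assumes that masked entries of ta should not count as candidates.

```python
import numpy as np
import numpy.ma as ma

a = ma.MaskedArray([[1, 2, 3], [4, 5, 6]],
                   [[True, False, False], [False, False, False]])
ta = ma.array([1, 4, 5])

# Test membership on the raw data; compressed() drops any masked entries of ta.
found = np.isin(a.data, ta.compressed())

# Re-apply the mask of a so that masked inputs stay masked in the result.
result = ma.array(found, mask=ma.getmaskarray(a))
print(result)
```

With this approach the first element stays masked instead of being reported as True, and no other element picks up a mask.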
I am trying to create an array of boolean masks from an array of breakpoint-pairs. So the result should be boolean masks of length n with true values in between the two breakpoints. I could solve the problem iteratively by writing a for loop but I want to find out the vectorized numpy equivalent for it.
mask = np.array([[False, False, False, False, False],
[False, False, False, False, False]])
breakpoints = np.array([[1, 3],
[2, 4]])
for i, bp in enumerate(breakpoints):
mask[i, bp[0]:bp[1]] = True
Output:
array([[False, True, True, False, False],
[False, False, True, True, False]])
Optimally, I would like to solve this with indexing and array operations in numpy but I can't get my head around the correct way of doing it.
I hope this example is clear and thank you for any help!
You can use the following trick:
>>> breakpoints = np.array([[1, 3],
... [2, 4]])
>>> output_width = 5
>>> idx = np.arange(output_width)
>>> (breakpoints[:,[0]] <= idx) & (idx < breakpoints[:,[1]])
array([[False, True, True, False, False],
[False, False, True, True, False]])
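The same broadcasting idea can be wrapped in a small helper. This is only a sketch: the function name range_masks and its parameters are made up for illustration, and it assumes each pair is a half-open interval [start, stop), as in the loop from the question.

```python
import numpy as np

def range_masks(breakpoints, width):
    """Row i is True on the half-open interval [start_i, stop_i)."""
    breakpoints = np.asarray(breakpoints)
    idx = np.arange(width)                 # shape (width,)
    start = breakpoints[:, 0, None]        # shape (n, 1)
    stop = breakpoints[:, 1, None]         # shape (n, 1)
    return (start <= idx) & (idx < stop)   # broadcasts to (n, width)

masks = range_masks([[1, 3], [2, 4]], 5)
print(masks)
```

Comparing a column of shape (n, 1) against a row of shape (width,) is what produces the (n, width) result without a Python loop.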
I have an array like this:
arrayElements = [[1, 4, 6],[2, 4, 6],[3, 5, 6],...,[2, 5, 6]]
I need to know, for example, the indices where an element of arrayElements is equal to 1.
Right now, I am doing:
rows, columns = np.where(arrayElements == 1)
This works, but I am doing this in a loop that runs through all possible element values, in my case 1 to 500,000+. This takes 30-40 minutes to run, depending on how big my array is. Can anyone suggest a better way of going about this? (Additional information: I don't care about the column that the value is in, just the row, not sure if that's useful.)
Edit: I need to know the value of every element separately. That is, I need the values of rows for each value that elements contains.
So you are generating thousands of arrays like this:
In [271]: [(i,np.where(arr==i)[0]) for i in range(1,7)]
Out[271]:
[(1, array([0])),
(2, array([1, 3])),
(3, array([2])),
(4, array([0, 1])),
(5, array([2, 3])),
(6, array([0, 1, 2, 3]))]
I could do the == test for all values at once with a bit of broadcasting:
In [281]: arr==np.arange(1,7)[:,None,None]
Out[281]:
array([[[ True, False, False],
[False, False, False],
[False, False, False],
[False, False, False]],
[[False, False, False],
[ True, False, False],
[False, False, False],
[ True, False, False]],
[[False, False, False],
[False, False, False],
[ True, False, False],
[False, False, False]],
[[False, True, False],
[False, True, False],
[False, False, False],
[False, False, False]],
[[False, False, False],
[False, False, False],
[False, True, False],
[False, True, False]],
[[False, False, True],
[False, False, True],
[False, False, True],
[False, False, True]]])
and since you only care about rows, apply an any:
In [282]: (arr==np.arange(1,7)[:,None,None]).any(axis=2)
Out[282]:
array([[ True, False, False, False],
[False, True, False, True],
[False, False, True, False],
[ True, True, False, False],
[False, False, True, True],
[ True, True, True, True]])
The where on this gives the same values as in Out[271], but grouped differently:
In [283]: np.where((arr==np.arange(1,7)[:,None,None]).any(axis=2))
Out[283]:
(array([0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 5, 5]),
array([0, 1, 3, 2, 0, 1, 2, 3, 0, 1, 2, 3]))
It can be split up with a defaultdict (the keys are positions in np.arange(1,7), so key 0 corresponds to value 1):
In [284]: from collections import defaultdict
In [285]: dd = defaultdict(list)
In [287]: for i,j in zip(*Out[283]): dd[i].append(j)
In [288]: dd
Out[288]:
defaultdict(list,
{0: [0], 1: [1, 3], 2: [2], 3: [0, 1], 4: [2, 3], 5: [0, 1, 2, 3]})
This 2nd approach may be faster for some arrays, though it may not scale well to your full problem.
By using np.isin (see documentation), you can test for multiple element values.
For example:
import numpy as np
a = np.array([1,2,3,4])
check_for = np.array([1,2])
locs = np.isin(a, check_for)
# [True, True, False, False]
np.where(locs)
# (array([0, 1]),)
Note: This assumes that you do not need to know the indices for every element value separately.
In the case that you need to track every element value separately, use a default dictionary and iterate through the matrix.
from collections import defaultdict
tracker = defaultdict(set)
for (row, column), value in np.ndenumerate(arrayElements):
tracker[value].add(row)
You could try looping over the values and indices using numpy.ndenumerate and using Counter, defaultdict, or dict where the keys are the values in the array.
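If the Python-level loop is still too slow, a different technique from the ones above is to sort all values once and split the matching row indices into groups. The following is a minimal sketch, assuming the data is a 2-D integer array; the sample array and the name rows_by_value are illustrative only.

```python
import numpy as np

arr = np.array([[1, 4, 6], [2, 4, 6], [3, 5, 6], [2, 5, 6]])

values = arr.ravel()
# Row index of every flattened element: 0,0,0,1,1,1,...
rows = np.repeat(np.arange(arr.shape[0]), arr.shape[1])

# A stable sort keeps the row indices ascending within each value.
order = np.argsort(values, kind="stable")
sorted_values = values[order]
sorted_rows = rows[order]

# starts[i] is where the i-th distinct value begins in the sorted data.
unique_values, starts = np.unique(sorted_values, return_index=True)
groups = np.split(sorted_rows, starts[1:])

rows_by_value = {int(v): g for v, g in zip(unique_values, groups)}
print(rows_by_value)
```

This does one sort over the whole array instead of one full scan per value. If a value can occur more than once in the same row, that row shows up repeatedly in its group; np.unique on each group would remove the duplicates.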
I have the following numpy array:
a = np.array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
I can use extended slicing to select e.g. columns:
>>> a[:,0::2]
array([[1, 3],
[4, 6],
[7, 9]])
>>> a[:,1::2]
array([[2],
[5],
[8]])
But I want to produce the following:
array([[True, False, True],
[True, False, True],
[True, False, True]])
array([[False, True, False],
[False, True, False],
[False, True, False]])
import numpy as np
bools = np.array([[False, False, False],
[False, False, False],
[False, False, False]])
bools[:, 0::2] = True
print(bools)
Output:
[[ True False True]
[ True False True]
[ True False True]]
np.array([[True if y%2==0 else False for y,z in enumerate(x)] for x in bools])
np.array([[False if y%2==0 else True for y,z in enumerate(x)] for x in bools])
Explanation:
The outer list comprehension iterates over the rows of bools with the variable 'x', and the inner comprehension iterates over the elements of each row. In your desired output, the True and False values depend on the column index of each element rather than on its value. enumerate() therefore yields the index in 'y' and the value in 'z', and the condition on 'y' decides whether the entry becomes True or False.
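Because the pattern depends only on the column index, it can also be built without a Python-level loop. A minimal sketch, with even_cols, even_mask and odd_mask as illustrative names:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# True for columns 0, 2, 4, ... and False otherwise; shape (ncols,)
even_cols = np.arange(a.shape[1]) % 2 == 0

# Repeat the column pattern for every row; copy() makes the result writable.
even_mask = np.broadcast_to(even_cols, a.shape).copy()
odd_mask = ~even_mask

print(even_mask)
print(odd_mask)
```

np.broadcast_to on its own returns a read-only view, so the copy() can be dropped if the mask is only used for indexing.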
Input example:
I have a numpy array, e.g.
a=np.array([[0,1], [2, 1], [4, 8]])
Desired output:
I would like to produce a mask array with the max value along a given axis, in my case axis 1, being True and all others being False. e.g. in this case
mask = np.array([[False, True], [True, False], [False, True]])
Attempt:
I have tried approaches using np.amax but this returns the max values in a flattened list:
>>> np.amax(a, axis=1)
array([1, 2, 8])
and np.argmax similarly returns the indices of the max values along that axis.
>>> np.argmax(a, axis=1)
array([1, 0, 1])
I could iterate over this in some way but once these arrays become bigger I want the solution to remain something native in numpy.
Method #1
Using broadcasting, we can use comparison against the max values, while keeping dims to facilitate broadcasting -
a.max(axis=1,keepdims=1) == a
Sample run -
In [83]: a
Out[83]:
array([[0, 1],
[2, 1],
[4, 8]])
In [84]: a.max(axis=1,keepdims=1) == a
Out[84]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Method #2
Alternatively, compare the argmax indices against the range of column indices, again using broadcasting -
In [92]: a.argmax(axis=1)[:,None] == range(a.shape[1])
Out[92]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Method #3
To finish off the set, and if we are looking for performance, use initialization and then advanced-indexing -
out = np.zeros(a.shape, dtype=bool)
out[np.arange(len(a)), a.argmax(axis=1)] = 1
Create an identity matrix and select from its rows using argmax on your array:
np.identity(a.shape[1], bool)[a.argmax(axis=1)]
# array([[False, True],
# [ True, False],
# [False, True]], dtype=bool)
Please note that this ignores ties: it just goes with the index returned by argmax, which is the first maximum in each row.
You're already halfway in the answer. Once you compute the max along an axis, you can compare it with the input array and you'll have the required binary mask!
In [7]: maxx = np.amax(a, axis=1)
In [8]: maxx
Out[8]: array([1, 2, 8])
In [12]: a >= maxx[:, None]
Out[12]:
array([[False, True],
[ True, False],
[False, True]], dtype=bool)
Note: This uses NumPy broadcasting when doing the comparison between a and maxx
In one line: np.equal(a.max(1)[:,None],a) or np.equal(a.max(1),a.T).T .
But when a row contains ties, this marks several entries of that row as True.
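The difference between the comparison-based and the argmax-based masks only shows up when a row contains its maximum more than once. A small sketch with a made-up array that has a tie in its first row:

```python
import numpy as np

a = np.array([[3, 3],
              [2, 1]])

# Comparison against the row maximum: every tied entry becomes True.
mask_eq = a == a.max(axis=1, keepdims=True)

# argmax-based mask: exactly one True per row, at the first maximum.
mask_arg = np.zeros(a.shape, dtype=bool)
mask_arg[np.arange(a.shape[0]), a.argmax(axis=1)] = True

print(mask_eq)   # first row has two True values
print(mask_arg)  # first row has one True value
```

Which one is appropriate depends on whether tied maxima should all be selected or only the first.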
In a multi-dimensional case you can also use np.indices. Let's suppose you have an array:
a = np.array([[
[0, 1, 2],
[3, 8, 5],
[6, 7, -1],
[9, 5, 8]],[
[5, 2, 8],
[7, 6, -3],
[-1, 2, 1],
[3, 5, 6]]
])
you can access argmax values calculated for axis 0 like so:
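ind = np.indices(a.shape[1:])  # index grids for the two remaining axes
k = np.zeros(a.shape, dtype=bool)  # np.bool was removed in newer NumPy; use bool
k[a.argmax(0), ind[0], ind[1]] = True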
The output would be:
array([[[False, False, False],
[False, True, True],
[ True, True, False],
[ True, True, True]],
[[ True, True, True],
[ True, False, False],
[False, False, True],
[False, False, False]]])
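For an arbitrary axis, the same mask can also be built with np.put_along_axis (available since NumPy 1.15) instead of np.indices. This is a sketch of an alternative, not the method used above; the helper name argmax_mask is made up for illustration.

```python
import numpy as np

def argmax_mask(a, axis):
    """Boolean mask that is True at the first maximum along `axis`."""
    a = np.asarray(a)
    # Keep `axis` as a length-1 dimension so the indices line up with `a`.
    idx = np.expand_dims(a.argmax(axis=axis), axis)
    mask = np.zeros(a.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=axis)
    return mask

a = np.array([[[0, 1, 2], [3, 8, 5], [6, 7, -1], [9, 5, 8]],
              [[5, 2, 8], [7, 6, -3], [-1, 2, 1], [3, 5, 6]]])
k = argmax_mask(a, 0)
print(k)
```

Like argmax itself, this picks only the first maximum when there are ties along the chosen axis.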