I have an array:
rp = np.array(
[
[[True, False], [True, True]],
[[True, True], [False, True]],
[[True, True], [True, True]],
[[True, True], [True, False]],
[[True, True], [True, True]],
]
)
I want to zero it out in the z axis above the first False point, i.e. set subsequent z values to False. Thus I have a test:
def test_f():
desired = np.array(
[
[[True, False], [True, True]],
[[True, False], [False, True]],
[[True, False], [False, True]],
[[True, False], [False, False]],
[[True, False], [False, False]],
]
)
assert (f(rp) == desired).all()
And I have some non vectorised code which passes the test:
def f(a: np.ndarray) -> np.ndarray:
lowest = np.argmin(a, axis=0)
for x, row in enumerate(lowest):
for y, z in enumerate(row):
if not a[z, x, y]: # catch cases with no False at all
a[z:, x, y] = False
return a
How do I vectorise this function?
What about this?
def f(a: np.ndarray) -> np.ndarray:
return a.cumprod(axis=0, dtype=np.bool)
I know that Numpy provides logical_and() which allows us to intersect two boolean arrays for True values only (True and True would yield True while True and False would yield False). For example,
a = np.array([True, False, False, True, False], dtype=bool)
b = np.array([False, True, True, True, False], dtype=bool)
np.logical_and(a, b)
> array([False, False, False, True, False], dtype=bool)
However, I'm wondering how I can apply this to two subarrays in an overall array? For example, consider the array:
[[[ True, True], [ True, False]], [[ True, False], [False, True]]]
The two subarrays I'm looking to intersect are:
[[ True, True], [ True, False]]
and
[[ True, False], [False, True]]
which should yield:
[[ True, False], [False, False]]
Is there a way to specify that I want to apply logical_and() to the outermost subarrays to combine the two?
You can use .reduce() along the first axis:
>>> a = np.array([[[ True, True], [ True, False]], [[ True, False], [False, True]]])
>>> np.logical_and.reduce(a, axis=0)
array([[ True, False],
[False, False]])
This works even when you have more than two "sub-arrays" in your outer array. I prefer this over the unpacking approach because it allows you to apply your function (np.logical_and) over any axis of your array.
If I understand your question correctly, you are looking to do:
import numpy as np
output = np.logical_and(a[:, 0], a[:, 1])
This simply slices your arrays so that you can use logical_and the way your results suggest.
Suppose I have an all-zero mask tensor like this:
mask = torch.zeros(5,3, dtype=torch.bool)
Now I want to set the value of mask at the intersection of the following rows and cols indices to True:
rows = torch.tensor([0,2,4])
cols = torch.tensor([1,2])
I would like to produce the following result:
tensor([[False, True, True ],
[False, False, False],
[False, True, True ],
[False, False, False],
[False, True, True ]])
When I try the following code, I receive an error:
mask[rows, cols] = True
IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [3], [2]
How can I do that efficiently in PyTorch?
You need proper shape for that you can use torch.unsqueeze
mask = torch.zeros(5,3, dtype=torch.bool)
mask[rows, cols.unsqueeze(1)] = True
mask
tensor([[False, True, True],
[False, False, False],
[False, True, True],
[False, False, False],
[False, True, True]])
or torch.reshape
mask[rows, cols.reshape(-1,1)] = True
mask
tensor([[False, True, True],
[False, False, False],
[False, True, True],
[False, False, False],
[False, True, True]])
A = np.array([5,1,5,8])
B = np.array([2,5])
I want to compare the A array to each element of B. In other words I'm lookin for a function which do the following computations :
A>2
A>5
(array([ True, False, True, True]), array([False, False, False, True]))
Not particularly fancy but a list comprehension will work:
[A > b for b in B]
[array([ True, False, True, True], dtype=bool),
array([False, False, False, True], dtype=bool)]
You can also use np.greater(), which requires the dimension-adding trick that Brenlla uses in the comments:
np.greater(A, B[:,np.newaxis])
array([[ True, False, True, True],
[False, False, False, True]], dtype=bool)
Consider the matrix quantiles that's a subset [:8,:3,0] of a 3D matrix with shape (10,355,8).
quantiles = np.array([
[ 1. , 1. , 1. ],
[ 0.63763978, 0.61848863, 0.75348137],
[ 0.43439645, 0.42485407, 0.5341457 ],
[ 0.22682343, 0.18878366, 0.25253915],
[ 0.16229408, 0.12541476, 0.15263742],
[ 0.12306046, 0.10372971, 0.09832783],
[ 0.09271845, 0.08209844, 0.05982584],
[ 0.06363636, 0.05471266, 0.03855727]])
I want a boolean output of the same shape as the quantiles matrix where True marks the row in which the median is located:
In [21]: medians
Out[21]:
array([[False, False, False],
[ True, True, False],
[False, False, True],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
To achieve this, I have the following algorithm in mind:
1) Identify the entries that are greater than .5:
In [22]: quantiles>.5
Out[22]:
array([[ True, True, True],
[ True, True, True],
[False, False, True],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
2) Considering only the values subset by the quantiles>.5 operation, mark the row that minimizes the np.abs distance between the entry and .5. Torturing the terminology a bit, I wish to intersect the two matrices of np.argmin(np.abs(quantiles-.5),axis=0) and quantiles>.5 to get the above result. However, I cannot for my life figure out a way to perform the np.argmin on the subset and retain the shape of the quantile matrix.
PS. Yes, there is a similar question here but it doesn't implement my algorithm which could be, I think, more efficient on a larger scale
Bumping into the old mask operation in Numpy, I found the following solution
#mask quantities that are less than .5
masked_quantiles = ma.masked_where(quantiles<.5,quantiles)
#identify the minimum in column of the masked array
median_idx = np.where(masked_quantiles == masked_quantiles.min(axis=0))
#make a matrix of all False values
median_mat = np.zeros(quantiles.shape, dtype=bool)
#assign True value to corresponding rows
In [86]: median_mat[medians] = True
In [87]: median_mat
Out[87]:
array([[False, False, False],
[ True, True, False],
[False, False, True],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
Update: comparison of my answer to that of Divakar's:
I ran two comparisons, one on the sample 2D matrix provided for this question and one on my 3D (10,380,8) dataset (not large data by any means).
Sample dataset:
My code
%%timeit
masked_quantiles = ma.masked_where(quantiles<=.5,quantiles)
median_idx = masked_quantiles.argmin(0)
10000 loops, best of 3: 65.1 µs per loop
Divakar's code
%%timeit
mask1 = quantiles<=0.5
min_idx = (quantiles+mask1).argmin(0)
The slowest run took 17.49 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.92 µs per loop
Full dataset
My code:
%%timeit
masked_quantiles = ma.masked_where(quantiles<=.5,quantiles)
median_idx = masked_quantiles.argmin(0)
1000 loops, best of 3: 490 µs per loop
Divakar's code:
%%timeit
mask1 = quantiles<=0.5
min_idx = (quantiles+mask1).argmin(0)
10000 loops, best of 3: 172 µs per loop
Conclusion:
Divakar's answer seems about 3-12 times faster than mine. I presume that the np.ma.where masking operation takes longer than matrix addition. However, the addition operation needs to be stored whereas masking may be more efficient on larger datasets. I wonder how it would compare on something that doesn't or nearly doesn't fit into memory.
Approach #1
Here's an approach using broadcasting and some masking trick -
# Mask of quantiles lesser than or equal to 0.5 to select the invalid ones
mask1 = quantiles<=0.5
# Since we are dealing with quantiles, the elems won't be > 1,
# which can be leveraged here as we will add 1s to invalid elems, and
# then look for argmin across each col
min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)
# Let some broadcasting magic happen here!
out = min_idx == np.arange(quantiles.shape[0])[:,None]
Step-by-step run
1) Input :
In [37]: quantiles
Out[37]:
array([[ 1. , 1. , 1. ],
[ 0.63763978, 0.61848863, 0.75348137],
[ 0.43439645, 0.42485407, 0.5341457 ],
[ 0.22682343, 0.18878366, 0.25253915],
[ 0.16229408, 0.12541476, 0.15263742],
[ 0.12306046, 0.10372971, 0.09832783],
[ 0.09271845, 0.08209844, 0.05982584],
[ 0.06363636, 0.05471266, 0.03855727]])
2) Run the code :
In [38]: mask1 = quantiles<=0.5
...: min_idx = (np.abs(quantiles-0.5)+mask1).argmin(0)
...: out = min_idx == np.arange(quantiles.shape[0])[:,None]
...:
3) Analyze output at each step :
In [39]: mask1
Out[39]:
array([[False, False, False],
[False, False, False],
[ True, True, False],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
In [40]: np.abs(quantiles-0.5)+mask1
Out[40]:
array([[ 0.5 , 0.5 , 0.5 ],
[ 0.13763978, 0.11848863, 0.25348137],
[ 1.06560355, 1.07514593, 0.0341457 ],
[ 1.27317657, 1.31121634, 1.24746085],
[ 1.33770592, 1.37458524, 1.34736258],
[ 1.37693954, 1.39627029, 1.40167217],
[ 1.40728155, 1.41790156, 1.44017416],
[ 1.43636364, 1.44528734, 1.46144273]])
In [41]: (np.abs(quantiles-0.5)+mask1).argmin(0)
Out[41]: array([1, 1, 2])
In [42]: min_idx == np.arange(quantiles.shape[0])[:,None]
Out[42]:
array([[False, False, False],
[ True, True, False],
[False, False, True],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False],
[False, False, False]], dtype=bool)
Performance boost : Following the comments, it seems to get min_idx, we can just do :
min_idx = (quantiles+mask1).argmin(0)
Approach #2
This is focused on memory efficiency.
# Mask of quantiles greater than 0.5 to select the valid ones
mask = quantiles>0.5
# Select valid elems
vals = quantiles.T[mask.T]
# Get vald count per col
count = mask.sum(0)
# Get the min val per col given the mask
minval = np.minimum.reduceat(vals,np.append(0,count[:-1].cumsum()))
# Get final boolean array by just comparing the min vals across each col
out = np.isclose(quantiles,minval)