Numpy.where: very slow with conditions from two different arrays - python

I have three arrays of type numpy.ndarray with dimensions (n by 1), named amplitude, distance and weight. I would like to use selected entries of the amplitude array, based on their respective distance and weight values. For example, to find the indices of the entries within a certain distance range, I write:
index = np.where( (distance<10) & (distance>=5) )
and I would then proceed by using the values from amplitude[index].
This works perfectly well as long as I only use one array for specifying the conditions. When I try for example
index = np.where( (distance<10) & (distance>=5) & (weight>0.8) )
the operation becomes super-slow. Why is that, and is there a better way for this task? In fact, I eventually want to use many conditions from something like 6 different arrays.

This is just a guess, but perhaps numpy is broadcasting your arrays? If the arrays are the exact same shape, then numpy won't broadcast them:
>>> distance = numpy.arange(5) > 2
>>> weight = numpy.arange(5) < 4
>>> distance.shape, weight.shape
((5,), (5,))
>>> distance & weight
array([False, False, False, True, False], dtype=bool)
But if they have different shapes, and the shapes are broadcastable, then it will. Although (n,), (n, 1), and (1, n) are all arguably "n by 1" arrays, they aren't all the same:
>>> distance[None,:].shape, weight[:,None].shape
((1, 5), (5, 1))
>>> distance[None,:]
array([[False, False, False, True, True]], dtype=bool)
>>> weight[:,None]
array([[ True],
[ True],
[ True],
[ True],
[False]], dtype=bool)
>>> distance[None,:] & weight[:,None]
array([[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, True, True],
[False, False, False, False, False]], dtype=bool)
In addition to returning undesired results, this could be causing a big slowdown if the arrays are even moderately large:
>>> distance = numpy.arange(5000) > 500
>>> weight = numpy.arange(5000) < 4500
>>> %timeit distance & weight
100000 loops, best of 3: 8.17 us per loop
>>> %timeit distance[:,None] & weight[None,:]
10 loops, best of 3: 48.6 ms per loop
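If that is indeed what is happening, a minimal sketch of the fix (assuming, as the question says, that the three arrays are column vectors of shape (n, 1)) is to flatten them before combining the conditions, so that everything stays one-dimensional:
import numpy as np

n = 100000
amplitude = np.random.rand(n, 1)
distance = np.random.rand(n, 1) * 20
weight = np.random.rand(n, 1)

# Flatten to shape (n,) so the boolean conditions combine elementwise
# instead of broadcasting to an (n, n) array.
d, w = distance.ravel(), weight.ravel()
index = np.where((d < 10) & (d >= 5) & (w > 0.8))
selected = amplitude.ravel()[index]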

Related

FAST: 1D overlaps with rows in 2D?

Let's say I have a 2D array, e.g.:
In [136]: ary
array([[6, 7, 9],
[0, 2, 5],
[3, 3, 4],
[2, 2, 8],
[3, 4, 9],
[0, 5, 7],
[2, 4, 9],
[3, 5, 7],
[7, 8, 8],
[0, 2, 3]])
I want to calculate overlap with 1D vector, FAST.
I can almost do it with (8ms on big array):
(ary == match) # .sum(axis=1).argsort()[::-1]
The problem with it is that it only matches if both Position and Value match.
With match = [6, 5, 4], (ary == match) gives:
array([[ True, False, False],
[False, False, False],
[False, False, True],
[False, False, False],
[False, False, False],
[False, True, False],
[False, False, False],
[False, True, False],
[False, False, False],
[False, False, False]])
E.g. the 5 in the 2nd position of the 1D vector did not match the 5 in the 3rd column of the 2nd row.
It works with .isin()
np.isin(ary,match,assume_unique=True).sum(axis=1).argsort()[::-1][:5]
but it is slow on big arrays (200000,10) ~20ms
Help me extend the first case so that it matches a value at any position of the 1D vector against the row.
The expected result is row indexes ordered by OVERLAP COUNT; let's use [2,4,5] because it has more matches:
In [147]: np.isin(ary,[2,5,4],assume_unique=True)
Out[147]:
array([[False, False, False],
[False, True, True],
[False, False, True],
[ True, True, False],
[False, True, False],
[False, True, False],
[ True, True, False],
[False, True, False],
[False, False, False],
[False, True, False]])
Overlap :
In [149]: np.isin(ary,[2,5,4],assume_unique=True).sum(axis=1)
Out[149]: array([0, 2, 1, 2, 1, 1, 2, 1, 0, 1])
Order by overlap :
In [148]: np.isin(ary,[2,5,4],assume_unique=True).sum(axis=1).argsort()[::-1]
Out[148]: array([6, 3, 1, 9, 7, 5, 4, 2, 8, 0])
See: rows 6, 3, 1 have overlap 2, which is why they come first.
Variants:
# could be from 1000,10,10 to 2000,100,20 .. ++
def data(cells=2000, seg=100, items=10):
    ary = np.random.randint(0, cells, (cells*seg, items))
    rnd = np.random.randint(0, cells*seg)
    return ary, ary[rnd]

def best2(match, ary):  # ~20ms, (200000,10)
    return np.isin(ary, match, assume_unique=True).sum(axis=1).argsort()[::-1][:5]

def best3(match, ary):  # Corralien ~20ms
    return np.logical_or.reduce(np.ravel(ary) == match[:, None], axis=0).reshape(ary.shape).sum(axis=1).argsort()[::-1][:5]
Can this be sped up using numba+cuda OR cupy on a GPU?
The main problem of all the approaches so far is that they create huge temporary arrays while only 5 items are finally important. Numba can be used to compute the arrays on the fly (with efficient JIT-compiled loops), avoiding some temporary arrays. Moreover, a full sort is not required, as only the top 5 items need to be retrieved; a partition can be used instead. It is even possible to use a faster approach, since only the 5 selected items matter and not the others. Here is the resulting code:
import numpy as np
import numba as nb

@nb.njit('int32[::1](int32[::1], int32[:,::1])')
def computeScore(match, ary):
    n, m = ary.shape
    assert m == match.shape[0]
    tmp = np.empty(n, dtype=np.int32)
    for i in range(n):
        s = 0
        # Count the number of matching items (with repetition)
        for j in range(m):
            # Find a match
            item = ary[i, j]
            found = False
            for k in range(m):
                found |= item == match[k]
            s += found
        tmp[i] = s
    return tmp

def best4(match, ary):
    n, m = ary.shape
    score = computeScore(match, ary)
    bestItems = np.argpartition(score, n-5)[n-5:]  # sadly not supported by Numba yet
    order = np.argsort(-score[bestItems])  # bestItems is not sorted and likely needs to be
    return bestItems[order]
Note that best4 can produce results different from best2 when the matching score (stored in tmp) is equal between multiple items. This is due to the sorting algorithm, which is not stable by default in NumPy (the kind parameter can be used to adapt this behaviour). This is also true for the partition algorithm, although NumPy does not seem to provide a stable partition algorithm yet.
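For instance, here is a tiny illustration (a sketch with made-up scores, not from the original answer) of how kind='stable' keeps tied entries in their original relative order; note that np.argpartition offers no such option:
import numpy as np

score = np.array([3, 1, 3, 2, 3], dtype=np.int32)
# Stable sort: the tied scores at indices 0, 2 and 4 keep their order.
order_stable = np.argsort(-score, kind='stable')
print(order_stable)  # [0 2 4 3 1]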
This code should be faster than the other implementations, but not by a large margin. One issue is that Numba (and most C/C++ compilers, like the one used to compile NumPy) does not manage to generate fast code because it does not know the value of m at compile time. As a result, the most aggressive optimizations (e.g. loop unrolling and use of SIMD instructions) can hardly be applied. You can help Numba using assertions or escaping conditionals.
Moreover, the code can be parallelized using multiple threads to make it much faster on mainstream platforms. Note that the parallelized version may not be faster on small data or on all platforms, since creating threads introduces an overhead that can be bigger than the actual computation.
Here is the resulting implementation:
@nb.njit('int32[::1](int32[::1], int32[:,::1])', parallel=True)
def computeScoreOpt(match, ary):
    n, m = ary.shape
    assert m == match.shape[0]
    assert m == 10
    tmp = np.empty(n, dtype=np.int32)
    for i in nb.prange(n):
        # This enables Numba to assume m=10 in the following code
        # and generate very efficient code for this specific case.
        # The assert should be enough, but the internals of Numba
        # prevent the information from being propagated to this portion
        # of the code when it is parallelized.
        if m != 10: continue
        s = 0
        for j in range(m):
            item = ary[i, j]
            found = False
            for k in range(m):
                found |= item == match[k]
            s += found
        tmp[i] = s
    return tmp
def best5(match, ary):
    n, m = ary.shape
    score = computeScoreOpt(match, ary)
    bestItems = np.argpartition(score, n-5)[n-5:]
    order = np.argsort(-score[bestItems])
    return bestItems[order]
Here are the timings on my machine with the example dataset:
best2: 18.2 ms
best3: 17.8 ms
best4 (sequential -- default): 12.0 ms
best4 (parallel): 3.1 ms
best5 (sequential): 3.2 ms
best5 (parallel -- default): 1.2 ms
The fastest implementation is 15 times faster than the original reference implementation.
Note that if m is greater than about 30, it should be better to use a more advanced set-based algorithm. In that case, an alternative solution is to sort match first and then use np.isin inside the i-based loop.
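Here is one possible sketch of such a set-based approach (my interpretation: a vectorized binary search on the sorted match rather than np.isin inside the loop), under the same top-5 convention as best4:
import numpy as np

def best_set_based(match, ary):
    # Hypothetical variant: membership test via binary search on sorted `match`.
    m_sorted = np.sort(match)
    idx = np.searchsorted(m_sorted, ary)          # insertion positions, same shape as ary
    idx = np.clip(idx, 0, len(m_sorted) - 1)      # guard the "past the end" case
    score = (m_sorted[idx] == ary).sum(axis=1)    # overlap count per row
    n = ary.shape[0]
    best = np.argpartition(score, n - 5)[n - 5:]  # top-5 rows, unordered
    return best[np.argsort(-score[best])]         # order them by score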
Use broadcasting and np.logical_or.reduce:
# match = np.array(match)
>>> np.logical_or.reduce(np.ravel(ary) == match[:, None], axis=0) \
.reshape(ary.shape)
array([[ True, False, False],
[False, False, True],
[False, False, True],
[False, False, False],
[False, True, False],
[False, True, False],
[False, True, False],
[False, True, False],
[False, False, False],
[False, False, False]])
Performance
match = np.array([6, 5, 4])
ary = np.random.randint(0, 10, (200000, 10))
%timeit np.logical_or.reduce(np.ravel(ary) == match[:, None], axis=0).reshape(ary.shape)
7.49 ms ± 174 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Can someone please explain np.less_equal.outer(range(1,18),range(1,13))

I was debugging code written by someone who has left the organization and came across a line that uses the np.less_equal.outer and np.greater_equal.outer functions. I know that np.outer forms the Cartesian product of two 1-dimensional arrays to produce a 2D array, and that np.less_equal compares the elements of two arrays and returns True or False. Can someone please explain how this combined form works?
Thanks!
less_equal and greater_equal are special types of numpy functions called ufuncs, in that they have extendible functionalities, including accumulate, at, and outer.
In this case ufunc.outer extends the function to work similarly to the outer product - but while the actual outer product would be multiply.outer, this instead does the greater or less than comparison.
So you get a 2D array of booleans indicating, for each element of the first array, whether it is less than or equal to (or greater than or equal to) each of the elements in the second array.
np.less_equal.outer(range(1,18),range(1,13))
Out[]:
array([[ True, True, True, ..., True, True, True],
[False, True, True, ..., True, True, True],
[False, False, True, ..., True, True, True],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], dtype=bool)
EDIT: a much more pythonic way of producing this would be:
np.triu(np.ones((17, 12), dtype=bool), 0)
That is, the upper triangle (including the diagonal) of a boolean array of shape (17, 12), matching the lengths of range(1, 18) and range(1, 13).
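As a quick sanity check (my addition, using the corrected shape (17, 12)), the two expressions produce the same array:
import numpy as np

a = np.less_equal.outer(range(1, 18), range(1, 13))  # shape (17, 12)
b = np.triu(np.ones((17, 12), dtype=bool), 0)
print(np.array_equal(a, b))  # True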
From the documentation, we have that for one-dimensional arrays A and B, the operation np.less_equal.outer(A, B) is equivalent to:
m = len(A)
n = len(B)
r = np.empty((m, n), dtype=bool)
for i in range(m):
    for j in range(n):
        r[i, j] = (A[i] <= B[j])
here is an example:
np.less_equal([4, 2, 1], [2, 2, 2])
array([False, True, True])
np.greater_equal([4, 2, 1], [2, 2, 2])
array([ True, True, False], dtype=bool)
and here is the plain outer function, for comparison:
np.outer(range(1, 3), range(1, 4))
array([[1, 2, 3],
       [2, 4, 6]])
hope that helps.

What is the fastest way to get the result of matrix < matrix in numpy?

Suppose I have a matrix M_1 of dimension (M, A) and a matrix M_2 of dimension (M, B). The result of M_1 < M_2 should be a matrix of dimension (M, B, A), whereby each row of M_1 is compared with each element of the corresponding row of M_2, giving a boolean vector (or 1/0 vector) for each comparison.
For example, if I have a matrix of
M1 = [[1,2,3],
      [3,4,5]]
M2 = [[1,2],
      [3,4]]
the result should be
[[[False, False, False],
  [True, False, False]],
 [[False, False, False],
  [True, False, False]]]
Currently, I am using for loops which is tremendously slow when I have to repeat this operations many times (taking months). Hopefully, there is a vectorized way to do this. If not, what else can I do?
I am looking at M_1 being (500, 3000000) and M_2 being (500, 500) and repeated about 10000 times.
For NumPy arrays, extend the dims with None/np.newaxis such that the first axes are aligned while the second ones are spread out, which lets them be compared in an elementwise fashion. Finally, do the comparison, leveraging broadcasting for a vectorized solution -
M1[:,None,:] < M2[:,:,None]
Sample run -
In [19]: M1
Out[19]:
array([[1, 2, 3],
[3, 4, 5]])
In [20]: M2
Out[20]:
array([[1, 2],
[3, 4]])
In [21]: M1[:,None,:] < M2[:,:,None]
Out[21]:
array([[[False, False, False],
[ True, False, False]],
[[False, False, False],
[ True, False, False]]])
For lists as inputs, use numpy.expand_dims and then compare -
In [42]: M1 = [[1,2,3],
...: [3,4,5]]
...:
...: M2 = [[1,2],
...: [3,4]]
In [43]: np.expand_dims(M1, axis=1) < np.expand_dims(M2, axis=2)
Out[43]:
array([[[False, False, False],
[ True, False, False]],
[[False, False, False],
[ True, False, False]]])
Further boost
A further boost comes from leveraging multiple cores with the numexpr module for large data -
In [44]: import numexpr as ne
In [52]: M1 = np.random.randint(0,9,(500, 30000))
In [53]: M2 = np.random.randint(0,9,(500, 500))
In [55]: %timeit M1[:,None,:] < M2[:,:,None]
1 loop, best of 3: 3.32 s per loop
In [56]: %timeit ne.evaluate('M1e<M2e',{'M1e':M1[:,None,:],'M2e':M2[:,:,None]})
1 loop, best of 3: 1.53 s per loop
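For the poster's real sizes ((500, 3000000) against (500, 500)), the full (M, B, A) boolean result would likely not fit in memory on most machines, so one option (a sketch of my own, not part of the original answer) is to produce it in chunks along the B axis:
import numpy as np

def compare_in_chunks(M1, M2, chunk=50):
    # Hypothetical helper: yields M1[:, None, :] < M2[:, start:stop, None]
    # one slice of the B axis at a time, avoiding the full (M, B, A) array.
    B = M2.shape[1]
    for start in range(0, B, chunk):
        stop = min(start + chunk, B)
        yield start, M1[:, None, :] < M2[:, start:stop, None]

M1 = np.random.randint(0, 9, (500, 3000))
M2 = np.random.randint(0, 9, (500, 500))
for start, block in compare_in_chunks(M1, M2):
    pass  # consume each (500, chunk, 3000) block here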

numpy where operation on 2D array

I have a numpy array 'A' of size 571x24 and I am trying to find the indices of the zeros in it, so I do:
>>>A.shape
(571L, 24L)
import numpy as np
z1 = np.where(A==0)
z1 is a tuple with the following sizes:
>>> len(z1)
2
>>> len(z1[0])
29
>>> len(z1[1])
29
I was hoping to create a z1 of same size as A. How do I achieve that?
Edit: I want to create array z1 of booleans for presence of zero in A such that:
>>>z1.shape
(571L, 24L)
You can just check this with the equality operator in python with numpy. Example:
>>> A = np.array([[0,2,2,1],[2,0,0,3]])
>>> A == 0
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
np.where() does something else (see the documentation), although it is possible to achieve this with np.where() as well, using broadcasting:
>>> np.where(A == 0, True, False)
array([[ True, False, False, False],
[False, True, True, False]], dtype=bool)
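As a side note (not part of the original answer): if you do want the 29 (row, column) index pairs rather than a boolean mask, np.argwhere stacks the np.where output into a single 2D array. Continuing the session above:
>>> np.argwhere(A == 0)
array([[0, 0],
       [1, 1],
       [1, 2]])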
Try this:
import numpy as np
myarray = np.array([[0,3,4,5],[9,4,0,4],[1,2,3,4]])
ix = np.in1d(myarray.ravel(), 0).reshape(myarray.shape)
Output of ix:
array([[ True, False, False, False],
[False, False, True, False],
[False, False, False, False]], dtype=bool)

Find consecutive unmasked values

I have a large 3 dimensional (time, longitude, latitude) input array. Most of the entries are masked. I need to find those entries where the mask is False for longer than a specific number of consecutive time steps (which I call threshold here). The result should be a mask with the same shape as the input mask.
Here is some pseudo-code to hopefully make clearer what I mean:
new_mask = find_consecutive(mask, threshold=3)
mask[:, i_lon, i_lat]
# [1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]
new_mask[:, i_lon, i_lat]
# [1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
EDIT:
I'm not sure if my approach so far makes sense. It does well performance-wise and gives me a labeled array plus knowledge of which labels I want. I just couldn't figure out an efficient way to turn those labels back into a mask.
from scipy.ndimage import measurements

structure = np.zeros((3, 3, 3))
structure[:, 1, 1] = 1
labels, nr_labels = measurements.label(1 - mask, structure=structure)
_, counts = np.unique(labels, return_counts=True)
labels_selected = [i_count for i_count, count in enumerate(counts)
                   if count >= threshold]
That's a classical case of a binary closing operation in image processing. To solve it, you can use the scipy module, specifically scipy.ndimage.morphology.binary_closing, feeding it an appropriate 1D kernel of all ones and of length threshold. Also, SciPy's binary closing function gives us the closed mask only, so to get the desired output we need to OR it with the input mask. Thus, the implementation would look something like this -
from scipy.ndimage import binary_closing
out = mask | binary_closing(mask, structure=np.ones(threshold))
How about a NumPy version of binary closing!
Now, a closing operation is basically image dilation followed by image erosion, so we can simulate that behaviour using the trusty convolution operation, which we have in NumPy as np.convolve. Similar to SciPy's binary closing operation, we need the same kernel here as well, and we use it for both the dilation and the erosion. The implementation would be -
def numpy_binary_closing(mask, threshold):
    # Define kernel
    K = np.ones(threshold)
    # Perform dilation and threshold at 1
    dil = np.convolve(mask, K, mode='same') >= 1
    # Perform erosion on the dilated mask array and threshold at the given threshold
    dil_erd = np.convolve(dil, K, mode='same') >= threshold
    return dil_erd
Sample run -
In [133]: mask
Out[133]:
array([ True, False, False, False, False, True, True, False, False,
True, False], dtype=bool)
In [134]: threshold = 3
In [135]: binary_closing(mask, structure=np.ones(threshold))
Out[135]:
array([False, False, False, False, False, True, True, True, True,
True, False], dtype=bool)
In [136]: numpy_binary_closing(mask,threshold)
Out[136]:
array([False, False, False, False, False, True, True, True, True,
True, False], dtype=bool)
In [137]: mask | binary_closing(mask, structure=np.ones(threshold))
Out[137]:
array([ True, False, False, False, False, True, True, True, True,
True, False], dtype=bool)
In [138]: mask| numpy_binary_closing(mask,threshold)
Out[138]:
array([ True, False, False, False, False, True, True, True, True,
True, False], dtype=bool)
Runtime tests (Scipy vs Numpy!)
Case #1 : Uniformly sparse
In [163]: mask = np.random.rand(10000) > 0.5
In [164]: threshold = 3
In [165]: %timeit binary_closing(mask, structure=np.ones(threshold))
1000 loops, best of 3: 582 µs per loop
In [166]: %timeit numpy_binary_closing(mask,threshold)
10000 loops, best of 3: 178 µs per loop
In [167]: out1 = binary_closing(mask, structure=np.ones(threshold))
In [168]: out2 = numpy_binary_closing(mask,threshold)
In [169]: np.allclose(out1,out2) # Verify outputs
Out[169]: True
Case #2 : More sparse and bigger threshold
In [176]: mask = np.random.rand(10000) > 0.8
In [177]: threshold = 11
In [178]: %timeit binary_closing(mask, structure=np.ones(threshold))
1000 loops, best of 3: 823 µs per loop
In [179]: %timeit numpy_binary_closing(mask,threshold)
1000 loops, best of 3: 331 µs per loop
In [180]: out1 = binary_closing(mask, structure=np.ones(threshold))
In [181]: out2 = numpy_binary_closing(mask,threshold)
In [182]: np.allclose(out1,out2) # Verify outputs
Out[182]: True
Winner is Numpy and by a big margin!
Boundary conditions
It seems the boundaries need the closing too, if the 1s are close enough to the boundaries. To handle those cases, you can pad one 1 each at the start and end of the input boolean array, use the posted code, and then de-select the first and last element at the end. Thus, the complete implementation using SciPy's binary_closing approach would be -
mask_ext = np.pad(mask,1,'constant',constant_values=(1))
out = mask_ext | binary_closing(mask_ext, structure=np.ones(threshold))
out = out[1:-1]
Sample run -
In [369]: mask
Out[369]:
array([False, False, True, False, False, False, False, True, True,
False, False, True, False], dtype=bool)
In [370]: threshold = 3
In [371]: mask_ext = np.pad(mask,1,'constant',constant_values=(1))
...: out = mask_ext | binary_closing(mask_ext, structure=np.ones(threshold))
...: out = out[1:-1]
...:
In [372]: out
Out[372]:
array([ True, True, True, False, False, False, False, True, True,
True, True, True, True], dtype=bool)
Just for the sake of completeness, here is also the solution for the approach I outlined in my EDIT. It does way worse than both of Divakar's solutions performance-wise (about a factor of 10 compared to numpy_binary_closing), but it allows handling of 3D arrays. In addition, it offers the possibility to write out the positions of the clusters (that wasn't part of the question, but it can be interesting information).
import numpy as np
from scipy.ndimage import measurements

def select_consecutive(mask, threshold):
    structure = np.zeros((3, 3, 3))
    structure[:, 1, 1] = 1
    labels, _ = measurements.label(1 - mask, structure=structure)
    # find positions of all unmasked values
    # object_slices = measurements.find_objects(labels)
    _, counts = np.unique(labels, return_counts=True)
    labels_selected = [i_count for i_count, count in enumerate(counts)
                       if count >= threshold and i_count != 0]
    ind = np.in1d(labels.flatten(), labels_selected).reshape(mask.shape)
    mask_new = np.ones_like(mask)
    mask_new[ind] = 0
    return mask_new
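A quick usage sketch (my addition, embedding the 1D example from the question into a (time, lon, lat) array of shape (11, 1, 1)):
mask = np.array([1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0]).reshape(11, 1, 1)
new_mask = select_consecutive(mask, threshold=3)
print(new_mask[:, 0, 0])
# [1 0 0 0 0 1 1 1 1 1 1]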
