Count unique elements along an axis of a NumPy array - python

I have a three-dimensional array like
A = np.array([[[1, 1],
               [1, 0]],
              [[1, 2],
               [1, 0]],
              [[1, 0],
               [0, 0]]])
Now I would like to obtain an array that, in each position, holds the nonzero value if exactly one distinct nonzero value (possibly together with zeros) occurs in that position along the first axis. It should hold zero if that position contains only zeros or more than one distinct nonzero value. For the example above, I would like
[[1, 0],
 [1, 0]]
since
in A[:,0,0] there are only 1s
in A[:,0,1] there are 0, 1 and 2, so more than one nonzero value
in A[:,1,0] there are 0 and 1, so 1 is retained
in A[:,1,1] there are only 0s
I can find how many nonzero elements there are with np.count_nonzero(A, axis=0), but I would like to keep 1s or 2s even if there are several of them. I looked at np.unique but it doesn't seem to support what I'd like to do.
Ideally, I'd like a function like np.count_unique(A, axis=0) which would return an array in the original shape, e.g. [[1, 3],[2, 1]], so I could check whether 3 or more occur and then ignore that position.
All I could come up with was a list comprehension iterating over the two axes I want to keep:
[[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] for i in range(A.shape[1])]
Any other ideas?

You can use np.diff to stay at the NumPy level for the counting task.
def diffcount(A):
    B = A.copy()
    B.sort(axis=0)
    C = np.diff(B, axis=0) > 0
    D = C.sum(axis=0) + 1
    return D

# [[1 3]
#  [2 1]]
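For completeness (not part of the original answer), a minimal sketch of turning these per-position counts, together with A itself, into the output the question actually asks for:
def unique_nonzero_or_zero(A):
    counts = diffcount(A)                    # number of distinct values per position
    has_zero = (A == 0).any(axis=0)          # positions that contain at least one zero
    keep = (counts - has_zero) == 1          # exactly one distinct nonzero value there
    return np.where(keep, A.max(axis=0), 0)  # that value is then the per-position max

# unique_nonzero_or_zero(A)
# [[1 0]
#  [1 0]]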
diffcount seems to be a little faster on big arrays:
In [62]: A=np.random.randint(0,100,(100,100,100))
In [63]: %timeit diffcount(A)
46.8 ms ± 769 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [64]: %timeit [[len(np.unique(A[:, i, j])) for j in range(A.shape[2])] \
    ...:          for i in range(A.shape[1])]
149 ms ± 700 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Finally, counting unique values is simpler than sorting, so a factor of ln(A.shape[0]) can be won.
A way to win this factor is to use the Python set mechanism:
In [81]: %timeit np.apply_along_axis(lambda a: len(set(a)), 0, A)
183 ms ± 1.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unfortunately, this is not faster.
Another way is to do it by hand:
def countunique(A, Amax):
    res = np.empty(A.shape[1:], A.dtype)
    c = np.empty(Amax + 1, A.dtype)
    for i in range(A.shape[1]):
        for j in range(A.shape[2]):
            T = A[:, i, j]
            for k in range(c.size): c[k] = 0
            for x in T:
                c[x] = 1
            res[i, j] = c.sum()
    return res
At Python level:
In [70]: %timeit countunique(A,100)
429 ms ± 18.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Which is not so bad for a pure Python approach. Then just shift this code to low level with Numba:
import numba
countunique2 = numba.jit(countunique)
In [71]: %timeit countunique2(A,100)
3.63 ms ± 70.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This will be difficult to improve on much.

One approach would be to use A as the first-axis indices to set a boolean array of the same lengths along the other two axes, and then simply count the non-zeros along its first axis. Two variants are possible: one keeps it as 3D, the other reshapes into 2D for some performance benefit, since indexing into 2D is faster. Thus, the two implementations would be -
def nunique_axis0_maskcount_app1(A):
    m, n = A.shape[1:]
    mask = np.zeros((A.max()+1, m, n), dtype=bool)
    mask[A, np.arange(m)[:,None], np.arange(n)] = 1
    return mask.sum(0)

def nunique_axis0_maskcount_app2(A):
    m, n = A.shape[1:]
    A.shape = (-1, m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.zeros((maxn, N), dtype=bool)
    mask[A, np.arange(N)] = 1
    A.shape = (-1, m, n)
    return mask.sum(0).reshape(m, n)
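As a quick sanity check (not part of the original answer), both variants reproduce the expected counts on the question's A:
A = np.array([[[1, 1], [1, 0]],
              [[1, 2], [1, 0]],
              [[1, 0], [0, 0]]])

nunique_axis0_maskcount_app1(A)
# array([[1, 3],
#        [2, 1]])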
Runtime test -
In [154]: A = np.random.randint(0,100,(100,100,100))
# @B. M.'s soln
In [155]: %timeit f(A)
10 loops, best of 3: 28.3 ms per loop
# @B. M.'s soln using slicing: (B[1:] != B[:-1]).sum(0) + 1
In [156]: %timeit f2(A)
10 loops, best of 3: 26.2 ms per loop
In [157]: %timeit nunique_axis0_maskcount_app1(A)
100 loops, best of 3: 12 ms per loop
In [158]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 9.14 ms per loop
Numba method
Using the same strategy as used for nunique_axis0_maskcount_app2 with directly getting the counts at C-level with numba, we would have -
from numba import njit
@njit
def nunique_loopy_func(mask, N, A, p, count):
    for j in range(N):
        mask[:] = True
        mask[A[0,j]] = False
        c = 1
        for i in range(1, p):
            if mask[A[i,j]]:
                c += 1
                mask[A[i,j]] = False
        count[j] = c
    return count

def nunique_axis0_numba(A):
    p, m, n = A.shape
    A.shape = (-1, m*n)
    maxn = A.max()+1
    N = A.shape[1]
    mask = np.empty(maxn, dtype=bool)
    count = np.empty(N, dtype=int)
    out = nunique_loopy_func(mask, N, A, p, count).reshape(m, n)
    A.shape = (-1, m, n)
    return out
Runtime test -
In [328]: np.random.seed(0)
In [329]: A = np.random.randint(0,100,(100,100,100))
In [330]: %timeit nunique_axis0_maskcount_app2(A)
100 loops, best of 3: 11.1 ms per loop
# @B.M.'s numba soln
In [331]: %timeit countunique2(A,A.max()+1)
100 loops, best of 3: 3.43 ms per loop
# Numba soln posted in this post
In [332]: %timeit nunique_axis0_numba(A)
100 loops, best of 3: 2.76 ms per loop

Related

Efficient numpy row-wise matrix multiplication using 3d arrays

I have two 3d arrays of shape (N, M, D) and I want to perform an efficient row wise (over N) matrix multiplication such that the resulting array is of shape (N, D, D).
An inefficient code sample showing what I try to achieve is given by:
N = 100
M = 10
D = 50

arr1 = np.random.normal(size=(N, M, D))
arr2 = np.random.normal(size=(N, M, D))

result = []
for i in range(N):
    result.append(arr1[i].T @ arr2[i])
result = np.array(result)
However, this application is quite slow for large N due to the loop. Is there a more efficient way to achieve this computation without using loops? I already tried to find a solution via tensordot and einsum to no avail.
The vectorized solution is to swap the last two axes of arr1:
>>> N, M, D = 2, 3, 4
>>> np.random.seed(0)
>>> arr1 = np.random.normal(size=(N, M, D))
>>> arr2 = np.random.normal(size=(N, M, D))
>>> arr1.transpose(0, 2, 1) @ arr2
array([[[ 6.95815626,  0.38299107,  0.40600482,  0.35990016],
        [-0.95421604, -2.83125879, -0.2759683 , -0.38027618],
        [ 3.54989101, -0.31274318,  0.14188485,  0.19860495],
        [ 3.56319723, -6.36209602, -0.42687188, -0.24932248]],

       [[ 0.67081341, -0.08816343,  0.35430089,  0.69962394],
        [ 0.0316968 ,  0.15129449, -0.51592291,  0.07118177],
        [-0.22274906, -0.28955683, -1.78905988,  1.1486345 ],
        [ 1.68432706,  1.93915798,  2.25785798, -2.34404577]]])
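For reference (not part of the original answer), an equivalent np.einsum formulation that sums over the M axis and should give the same (N, D, D) result:
result = np.einsum('nmd,nme->nde', arr1, arr2)
# np.allclose(result, arr1.transpose(0, 2, 1) @ arr2)  ->  True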
A simple benchmark for a super large N:
In [225]: arr1.shape
Out[225]: (100000, 10, 50)
In [226]: %%timeit
     ...: result = []
     ...: for i in range(N):
     ...:     result.append(arr1[i].T @ arr2[i])
     ...: result = np.array(result)
     ...:
     ...:
12.4 s ± 224 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [227]: %timeit arr1.transpose(0, 2, 1) @ arr2
843 ms ± 7.79 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Use pre-allocated lists and do not perform the data conversion after the loop ends. The performance here is not much worse than vectorization, which means that most of the overhead comes from the final data conversion:
In [375]: %%timeit
     ...: result = [None] * N
     ...: for i in range(N):
     ...:     result[i] = arr1[i].T @ arr2[i]
     ...: # result = np.array(result)
     ...:
     ...:
1.22 s ± 16.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Performance of loop solution with data conversion:
In [376]: %%timeit
     ...: result = [None] * N
     ...: for i in range(N):
     ...:     result[i] = arr1[i].T @ arr2[i]
     ...: result = np.array(result)
     ...:
     ...:
11.3 s ± 179 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Another option refers to the answer of @9769953 and adds an optimization test. To my surprise, its performance is almost the same as the vectorized solution:
In [378]: %%timeit
     ...: result = np.empty_like(arr1, shape=(N, D, D))
     ...: for res, ar1, ar2 in zip(result, arr1.transpose(0, 2, 1), arr2):
     ...:     np.matmul(ar1, ar2, out=res)
     ...:
843 ms ± 4.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
For interest, I wondered about the loop overhead, which I guess is minimal compared to the matrix multiplication; in particular, the loop overhead is minimal compared to the potential reallocation of the list memory, which with N = 10000 could be significant.
Using a pre-allocated array instead of a list, I compared the loop result and the solution provided by Mechanic Pig, and achieved the following results on my machine:
In [10]: %timeit result1 = arr1.transpose(0, 2, 1) @ arr2
33.7 ms ± 117 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
versus
In [14]: %%timeit
    ...: result = np.empty_like(arr1, shape=(N, D, D))
    ...: for i in range(N):
    ...:     result[i, ...] = arr1[i].T @ arr2[i]
    ...:
    ...:
48.5 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
The pure NumPy solution is still faster, so that's good, but only by a factor of about 1.5. Not too bad. Depending on the needs, the loop may be clearer as to what it intends (and easier to modify, in case there's a need for an if-statement or other shenanigans).
And naturally, a simple comment above the faster solution can easily point out what it actually replaces.
Following the comments to this answer by Mechanic Pig, I've added below the timing results of a loop without preallocating an array (but with a preallocated list) and without conversion to a NumPy array. Mainly so the results are compared for the same machine:
In [11]: %%timeit
    ...: result = [None] * N
    ...: for i in range(N):
    ...:     result[i] = arr1[i].T @ arr2[i]
    ...:
49.5 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
So interestingly, this result, without conversion, is (a tiny bit) slower than the one with a pre-allocated array and direct assignment into the array.

Pandas capping cumulative sum [duplicate]

Say I have an array of distances x=[1,2,1,3,3,2,1,5,1,1].
I want to get the indices from x where cumsum reaches 10, in this case, idx=[4,9].
So the cumsum restarts after the condition is met.
I can do it with a loop, but loops are slow for large arrays and I was wondering if I could do it in a vectorized way.
A fun method
sumlm = np.frompyfunc(lambda a, b: a + b if a < 10 else b, 2, 1)
newx = sumlm.accumulate(x, dtype=object)
newx
array([1, 3, 4, 7, 10, 2, 3, 8, 9, 10], dtype=object)
np.nonzero(newx==10)
(array([4, 9]),)
Here's one with numba and array-initialization -
from numba import njit

@njit
def cumsum_breach_numba2(x, target, result):
    total = 0
    iterID = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            result[iterID] = i
            iterID += 1
            total = 0
    return iterID

def cumsum_breach_array_init(x, target):
    x = np.asarray(x)
    result = np.empty(len(x), dtype=np.uint64)
    idx = cumsum_breach_numba2(x, target, result)
    return result[:idx]
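For instance, on the question's x (a usage example, not shown in the original answer):
cumsum_breach_array_init([1, 2, 1, 3, 3, 2, 1, 5, 1, 1], 10)
# array([4, 9], dtype=uint64)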
Timings
Including @piRSquared's solutions and using the benchmarking setup from the same post -
In [58]: np.random.seed([3, 1415])
...: x = np.random.randint(100, size=1000000).tolist()
# @piRSquared soln1
In [59]: %timeit list(cumsum_breach(x, 10))
10 loops, best of 3: 73.2 ms per loop
# @piRSquared soln2
In [60]: %timeit cumsum_breach_numba(np.asarray(x), 10)
10 loops, best of 3: 69.2 ms per loop
# From this post
In [61]: %timeit cumsum_breach_array_init(x, 10)
10 loops, best of 3: 39.1 ms per loop
Numba : Appending vs. array-initialization
For a closer look at how the array-initialization helps, which seems to be the big difference between the two numba implementations, let's time these on the array data, as the array data creation was in itself heavy on runtime and they both depend on it -
In [62]: x = np.array(x)
In [63]: %timeit cumsum_breach_numba(x, 10)# with appending
10 loops, best of 3: 31.5 ms per loop
In [64]: %timeit cumsum_breach_array_init(x, 10)
1000 loops, best of 3: 1.8 ms per loop
To force the output to have its own memory space, we can make a copy. It won't change things in a big way though -
In [65]: %timeit cumsum_breach_array_init(x, 10).copy()
100 loops, best of 3: 2.67 ms per loop
Loops are not always bad (especially when you need one). Also, there is no tool or algorithm that will make this quicker than O(n). So let's just make a good loop.
Generator Function
def cumsum_breach(x, target):
    total = 0
    for i, y in enumerate(x):
        total += y
        if total >= target:
            yield i
            total = 0

list(cumsum_breach(x, 10))

[4, 9]
Just In Time compiling with Numba
Numba is a third party library that needs to be installed.
Numba can be persnickety about what features are supported. But this works.
Also, as pointed out by Divakar, Numba performs better with arrays.
from numba import njit

@njit
def cumsum_breach_numba(x, target):
    total = 0
    result = []
    for i, y in enumerate(x):
        total += y
        if total >= target:
            result.append(i)
            total = 0
    return result
cumsum_breach_numba(x, 10)
Testing the Two
Because I felt like it ¯\_(ツ)_/¯
Setup
np.random.seed([3, 1415])
x0 = np.random.randint(100, size=1_000_000)
x1 = x0.tolist()
Accuracy
i0 = cumsum_breach_numba(x0, 200_000)
i1 = list(cumsum_breach(x1, 200_000))
assert i0 == i1
Time
%timeit cumsum_breach_numba(x0, 200_000)
%timeit list(cumsum_breach(x1, 200_000))
582 µs ± 40.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
64.3 ms ± 5.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Numba was on the order of 100 times faster.
For a truer apples-to-apples test, I convert the list to a NumPy array:
%timeit cumsum_breach_numba(np.array(x1), 200_000)
%timeit list(cumsum_breach(x1, 200_000))
43.1 ms ± 202 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
62.8 ms ± 327 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Which brings them to about even.

Python Numpy get difference between 2 two-dimensional array

Well, I have a simple problem that is giving me a headache. Basically I have two two-dimensional arrays, full of [x, y] coordinates, and I want to compare the first with the second and generate a third array that contains all the elements of the first array that don't appear in the second. It's simple, but I couldn't make it work at all. The sizes vary a lot: the first array can have between a thousand and 2 million coordinates, while the second array has between 1 and a thousand.
This operation will occur many times, and the larger the first array, the more often it will occur.
sample:
arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7], ])
arr2 = np.array([[0, 3], [1, 7]])
result = np.array([[0, 4], [1, 3]])
In depth: Basically I have a binary image with variable resolution, composed of 0s and 1s (255), and I analyze each pixel individually (with an algorithm that is already optimized), but (on purpose) every time this function is executed it analyzes only a fraction of the pixels, and when it is finished it gives me back the coordinates of those pixels. The problem is that every execution runs the following code:
ones = np.argwhere(img == 255) # ones = pixels array
It takes about 0.02 seconds and is by far the slowest part of the code. My idea is to create this variable once and, each time the function ends, remove the parsed pixels and pass the new array as a parameter, continuing until the array is empty.
Not sure what you intend to do with the extra dimensions, as the set difference, like any filtering, inherently loses the shape information.
Anyway, NumPy does provide np.setdiff1d() to solve this problem elegantly.
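For plain 1D inputs, a small illustration (values chosen arbitrarily, not from the original answer):
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([2, 4])
np.setdiff1d(a, b)   # unique values of a that are not in b
# array([1, 3])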
EDIT With the clarifications provided, you seem to be looking for a way to compute the set difference on a given axis, i.e. the elements of the sets are actually arrays.
There is no built-in specifically for this in NumPy, but it is not too difficult to craft one.
For simplicity, we assume that the operating axis is the first one (so that the elements of the set are arr[i]), that only unique elements appear in the first array, and that the arrays are 2D.
They are all based on the idea that the asymptotically best approach is to build a set() of the second array and then using that to filter out the entries from the first array.
The idiomatic way to build such set in Python / NumPy is to use:
set(map(tuple, arr))
where the mapping to tuple freezes arr[i], allowing them to be hashable and hence making them available to use with set().
Unfortunately, since the filtering would produce results of unpredictable size, NumPy arrays are not the ideal container for the result.
To solve this issue, one can use:
an intermediate list
import numpy as np

def setdiff2d_list(arr1, arr2):
    delta = set(map(tuple, arr2))
    return np.array([x for x in arr1 if tuple(x) not in delta])
np.fromiter() followed by np.reshape()
import numpy as np

def setdiff2d_iter(arr1, arr2):
    delta = set(map(tuple, arr2))
    return np.fromiter((x for xs in arr1 if tuple(xs) not in delta for x in xs),
                       dtype=arr1.dtype).reshape(-1, arr1.shape[-1])
NumPy's advanced indexing
def setdiff2d_idx(arr1, arr2):
    delta = set(map(tuple, arr2))
    idx = [tuple(x) not in delta for x in arr1]
    return arr1[idx]
Convert both inputs to set() (will force uniqueness of the output elements and will lose ordering):
import numpy as np

def setdiff2d_set(arr1, arr2):
    set1 = set(map(tuple, arr1))
    set2 = set(map(tuple, arr2))
    return np.array(list(set1 - set2))
Alternatively, the advanced indexing can be built using broadcasting, np.any() and np.all():
def setdiff2d_bc(arr1, arr2):
    idx = (arr1[:, None] != arr2).any(-1).all(1)
    return arr1[idx]
Some form of the above methods was originally suggested in @QuangHoang's answer.
A similar approach could also be implemented in Numba, following the same idea as above but using a hash instead of the actual array view arr[i] (because of the limitations in what is supported inside a set() by Numba) and pre-computing the output size (for speed):
import numpy as np
import numba as nb

@nb.njit
def mul_xor_hash(arr, init=65537, k=37):
    result = init
    for x in arr.view(np.uint64):
        result = (result * k) ^ x
    return result

@nb.njit
def setdiff2d_nb(arr1, arr2):
    # : build `delta` set using hashes
    delta = {mul_xor_hash(arr2[0])}
    for i in range(1, arr2.shape[0]):
        delta.add(mul_xor_hash(arr2[i]))
    # : compute the size of the result
    n = 0
    for i in range(arr1.shape[0]):
        if mul_xor_hash(arr1[i]) not in delta:
            n += 1
    # : build the result
    result = np.empty((n, arr1.shape[-1]), dtype=arr1.dtype)
    j = 0
    for i in range(arr1.shape[0]):
        if mul_xor_hash(arr1[i]) not in delta:
            result[j] = arr1[i]
            j += 1
    return result
While they all give the same result:
funcs = setdiff2d_iter, setdiff2d_list, setdiff2d_idx, setdiff2d_set, setdiff2d_bc, setdiff2d_nb
arr1 = np.array([[0, 3], [0, 4], [1, 3], [1, 7]])
print(arr1)
# [[0 3]
# [0 4]
# [1 3]
# [1 7]]
arr2 = np.array([[0, 3], [1, 7], [4, 0]])
print(arr2)
# [[0 3]
# [1 7]
# [4 0]]
result = funcs[0](arr1, arr2)
print(result)
# [[0 4]
# [1 3]]
for func in funcs:
    print(f'{func.__name__:>24s}', np.all(result == func(arr1, arr2)))
# setdiff2d_iter True
# setdiff2d_list True
# setdiff2d_idx True
# setdiff2d_set False # because of ordering
# setdiff2d_bc True
# setdiff2d_nb True
Their performance seems to vary:
for func in funcs:
    print(f'{func.__name__:>24s}', end=' ')
    %timeit func(arr1, arr2)
# setdiff2d_iter 16.3 µs ± 719 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_list 14.9 µs ± 528 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_idx 17.8 µs ± 1.75 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_set 17.5 µs ± 1.31 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_bc 9.45 µs ± 405 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
# setdiff2d_nb 1.58 µs ± 51.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The Numba-based approach proposed seems to outperform the other approaches by a fair margin (some 10x using the given input).
Similar timings are observed with larger inputs:
np.random.seed(42)
arr1 = np.random.randint(0, 100, (1000, 2))
arr2 = np.random.randint(0, 100, (1000, 2))
print(setdiff2d_nb(arr1, arr2).shape)
# (736, 2)
for func in funcs:
    print(f'{func.__name__:>24s}', end=' ')
    %timeit func(arr1, arr2)
# setdiff2d_iter 3.51 ms ± 75.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_list 2.92 ms ± 32.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_idx 2.61 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_set 3.52 ms ± 67.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# setdiff2d_bc 25.6 ms ± 198 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# setdiff2d_nb 192 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
(As a side note, setdiff2d_bc() is the most negatively affected by the size of the second input).
Depending on how large your arrays are: if they are not too large (a few thousand), you can
use broadcasting to compare each point in x to each point in y
use any to check for inequality at the last dimension
use all to check for matching
Code:
idx = (arr1[:,None]!=arr2).any(-1).all(1)
arr1[idx]
Output:
array([[0, 4],
       [1, 3]])
update: for longer data, you can try set and a for loop:
set_y = set(map(tuple, y))
idx = [tuple(point) not in set_y for point in x]
x[idx]

Searching large array by two columns

I have a large array, that looks like something below:
np.random.seed(42)
arr = np.random.permutation(np.array([
    (1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4),
    (8, 9, 3, 4, 7, 9, 1, 9, 3, 4, 50000)
]).T)
It isn't sorted and the rows of this array are unique. I also know the bounds for the values in both columns; they are [0, n] and [0, k]. So the maximum possible size of the array is (n+1)*(k+1), but the actual size is closer to the log of that.
I need to search the array by both columns to find such row that arr[row,:] = (i,j), and return -1 when (i,j) is absent in the array. The naive implementation for such function is:
def get(arr, i, j):
    cond = (arr[:, 0] == i) & (arr[:, 1] == j)
    if np.any(cond):
        return np.where(cond)[0][0]
    else:
        return -1
Unfortunately, since in my case arr is very large (>90M rows), this is very inefficient, especially since I would need to call get() multiple times.
Alternatively I tried translating this to a dict with (i,j) keys, such that
index[(i,j)] = row
that can be accessed by:
def get(index, i, j):
    try:
        return index[(i, j)]
    except KeyError:
        return -1
This works (and is much faster when tested on smaller data than I have), but again, creating the dict on-the-fly by
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    index[(i, j)] = row
takes a huge amount of time and eats lots of RAM in my case. I was also thinking of first sorting arr and then using something like np.searchsorted, but this didn't lead me anywhere.
So what I need is a fast function get(arr, i, j) that returns
>>> get(arr, 2, 3)
4
>>> get(arr, 4, 100)
-1
A partial solution would be:
In [36]: arr
Out[36]:
array([[    2,     9],
       [    1,     8],
       [    4,     4],
       [    4, 50000],
       [    2,     3],
       [    1,     9],
       [    4,     3],
       [    2,     7],
       [    3,     9],
       [    2,     4],
       [    3,     1]])
In [37]: (i,j) = (2, 3)
# we can use `assume_unique=True` which can speed up the calculation
In [38]: np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
Out[38]:
array([[False],
       [False],
       [False],
       [False],
       [ True],
       [False],
       [False],
       [False],
       [False],
       [False],
       [False]])
# we can use `assume_unique=True` which can speed up the calculation
In [39]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1, keepdims=True)
In [40]: np.argwhere(mask)
Out[40]: array([[4, 0]])
If you need the final result as a scalar, then don't use the keepdims argument and cast the array to a scalar like:
# we can use `assume_unique=True` which can speed up the calculation
In [41]: mask = np.all(np.isin(arr, [i,j], assume_unique=True), axis=1)
In [42]: np.argwhere(mask)
Out[42]: array([[4]])
In [43]: np.asscalar(np.argwhere(mask))
Out[43]: 4
Solution
Python offers a set type to store unique values, but sadly no ordered version of a set. But you can use the ordered-set package.
Create an OrderedSet from the data. Fortunately, this only needs to be done once:
import ordered_set

o = ordered_set.OrderedSet(map(tuple, arr))

def ordered_get(o, i, j):
    try:
        return o.index((i, j))
    except KeyError:
        return -1
Runtime
Finding the index of a value should be O(1), according to the documentation:
In [46]: %timeit get(arr, 2, 3)
10.6 µs ± 39 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [47]: %timeit ordered_get(o, 2, 3)
1.16 µs ± 14.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [48]: %timeit ordered_get(o, 2, 300)
1.05 µs ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Testing this for a much larger array:
a2 = np.random.randint(10000, size=1000000).reshape(-1, 2)
o2 = ordered_set.OrderedSet()
for t in map(tuple, a2):
    o2.add(t)
In [65]: %timeit get(a2, 2, 3)
1.05 ms ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [66]: %timeit ordered_get(o2, 2, 3)
1.03 µs ± 2.12 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [67]: %timeit ordered_get(o2, 2, 30000)
1.06 µs ± 28.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Looks like it indeed is O(1) runtime.
def get_agn(arr, i, j):
    idx = np.flatnonzero((arr[:, 0] == i) & (arr[:, 1] == j))
    return -1 if idx.size == 0 else idx[0]
Also, just in case you are thinking about the ordered_set solution, here is a better one (however, in both cases see timing tests below):
d = {(i, j): k for k, (i, j) in enumerate(arr)}

def unordered_get(d, i, j):
    return d.get((i, j), -1)
and its "full" equivalent (that builds the dictionary inside the function):
def unordered_get_full(arr, i, j):
    d = {(i, j): k for k, (i, j) in enumerate(arr)}
    return d.get((i, j), -1)
Timing tests:
First, define @kmario23's function:
def get_kmario23(arr, i, j):
    # fundamentally, kmario23's code re-arranged to return scalars
    # and -1 when (i, j) is not found:
    mask = np.all(np.isin(arr, [i, j], assume_unique=True), axis=1)
    idx = np.argwhere(mask)
    return -1 if idx.size == 0 else np.asscalar(idx[0])
Second, define @ChristophTerasa's functions (the original and the full version):
import ordered_set

o = ordered_set.OrderedSet(map(tuple, arr))

def ordered_get(o, i, j):
    try:
        return o.index((i, j))
    except KeyError:
        return -1

def ordered_get_full(arr, i, j):
    # "Full" version that builds the ordered set inside the function
    o = ordered_set.OrderedSet(map(tuple, arr))
    try:
        return o.index((i, j))
    except KeyError:
        return -1
Generate some large data:
arr = np.random.randint(1, 2000, 200000).reshape((-1, 2))
Timing results:
In [55]: %timeit get_agn(arr, *arr[-1])
149 µs ± 3.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [56]: %timeit get_kmario23(arr, *arr[-1])
1.42 ms ± 17.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [57]: %timeit get_kmario23(arr, *arr[0])
1.2 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Ordered set tests:
In [80]: o = ordered_set.OrderedSet(map(tuple, arr))
In [81]: %timeit ordered_get(o, *arr[-1])
1.74 µs ± 32.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [82]: %timeit ordered_get_full(arr, *arr[-1]) # include ordered set creation time
166 ms ± 2.16 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Unordered dictionary tests:
In [83]: d = { (i, j): k for k, (i, j) in enumerate(arr)}
In [84]: %timeit unordered_get(d, *arr[-1])
1.18 µs ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [85]: %timeit unordered_get_full(arr, *arr[-1])
102 ms ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
So, when taking into account the time needed to create either the ordered set or the unordered dictionary, these methods are quite slow. You must plan on running several hundred searches on the same data for these methods to make sense. Even then, there is no need to use the ordered_set package - regular dictionaries are faster.
It seems I was over-thinking this problem; there is an easy solution. I was considering either filtering and subsetting the array or using a dict index[(i,j)] = row. Filtering and subsetting was slow (O(n) when searching), while using a dict was fast (O(1) access time), but creating the dict was slow and memory intensive.
The simple solution for this problem is using nested dicts.
index = {}
for row in range(arr.shape[0]):
    i, j = arr[row, :]
    try:
        index[i][j] = row
    except KeyError:
        index[i] = {}
        index[i][j] = row

def get(index, i, j):
    try:
        return index[i][j]
    except KeyError:
        return -1
Alternatively, instead of a dict at the higher level, I could use index = defaultdict(dict), which would allow assigning index[i][j] = row directly, without the try ... except conditions, but then the defaultdict(dict) object would create an empty {} whenever get(index, i, j) is queried for a nonexistent i, so it would expand the index unnecessarily.
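As a small aside (not in the original answer), a plain dict of dicts queried with .get sidesteps both the try/except and the defaultdict expansion issue:
def get(index, i, j):
    # .get never inserts missing keys, unlike a defaultdict lookup
    return index.get(i, {}).get(j, -1)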
The access time is O(1) for the first dict and O(1) for the nested dicts, so basically it's O(1). The upper-level dict has a manageable size (bounded by n < n*k), while the nested dicts are small (the nesting order is chosen based on the fact that in my case k << n). Building the nested dict is also very fast, even for the >90M rows in the array. Moreover, it can be easily extended to more complicated cases.
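For comparison, here is a minimal sketch of the np.searchsorted idea mentioned in the question (assuming k is the known upper bound of the second column; the names are illustrative). Encode each row as a single integer key, sort once, then binary-search; lookups are O(log n) after the one-off sort, with far less memory than a dict-based index:
keys = arr[:, 0].astype(np.int64) * (k + 1) + arr[:, 1]   # encode (i, j) as one integer
order = np.argsort(keys)
sorted_keys = keys[order]

def get_sorted(i, j):
    key = i * (k + 1) + j
    pos = np.searchsorted(sorted_keys, key)
    if pos < sorted_keys.size and sorted_keys[pos] == key:
        return order[pos]    # row index in the original, unsorted arr
    return -1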

subtract element from number if element satisfies if condition numpy

Say I have array a = np.random.randn(4, 2) and I want to subtract each negative element from 100.
If I wanted to subtract 100 from each negative element I would use a[a < 0] -= 100. But what if I want to subtract each element from 100? How can I do it without looping through each element?
You can avoid the temporary array in @Akiiino's solution by using the out and where arguments of np.ufuncs:
np.subtract(100, a, out=a, where=a<0)
In general, the meaning of ufunc(a, b, out, where) is roughly:
out[where] = ufunc(a[where], b[where])
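A tiny illustration (not from the original answer; values chosen arbitrarily):
import numpy as np

a = np.array([-1.0, 2.0, -3.0])
np.subtract(100, a, out=a, where=a < 0)
# a is now array([101.,   2., 103.]); entries where a >= 0 are left untouched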
Comparing the speed:
In [1]: import numpy as np
In [2]: a = np.random.randn(10000, 2)
In [3]: b = a.copy() # so the that boolean mask doesn't change
In [4]: %timeit a[b < 0] = 100 - a[b < 0]
307 µs ± 1.53 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [5]: %timeit np.subtract(100, a, out=a, where=b < 0)
260 µs ± 39.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
We see that there's a ~15% speed boost here.
You can use the same idea:
a[a<0] = 100 - a[a<0]
