Is there an efficient way to calculate the sum of bits in each column of an array in Python?
Example (Python 3.7 and NumPy 1.20.1):
Create a NumPy array with values 0 or 1:
import numpy as np
array = np.array(
    [
        [1, 0, 1],
        [1, 1, 1],
        [0, 0, 1],
    ]
)
Compress its size with np.packbits:
pack_array = np.packbits(array, axis=1)
Expected result: the sum of bits in each position (column), computed without np.unpackbits, giving the same as array.sum(axis=0):
array([2, 1, 3])
I have found only a very slow solution:
dim = array.shape[1]
candidates = np.zeros((dim, dim)).astype(int)
np.fill_diagonal(candidates, 1)
pack_candidates = np.packbits(candidates, axis=1)
np.apply_along_axis(lambda c:np.sum((np.bitwise_and(pack_array, c) == c).all(axis=1)), 1, pack_candidates)
Using np.unpackbits can be problematic if the input array is big, since the resulting array can be too big to fit in RAM, and even if it does fit in RAM, this is far from efficient since the huge array has to be written to and read from (slow) main memory. The same applies to CPU caches: smaller arrays can generally be computed faster. Moreover, np.unpackbits has quite a big overhead for small arrays.
AFAIK, it is not possible to do this operation very efficiently in NumPy while also using a small amount of RAM (the efficient NumPy approach is np.unpackbits, as pointed out by @mathfux, but it is memory-hungry). However, Numba can be used to speed up this computation, especially for small arrays. Here is the code:
import numpy as np
import numba as nb

@nb.njit('int32[::1](uint8[:,::1], int_)')
def bitSum(packed, m):
    n = packed.shape[0]
    assert packed.shape[1]*8-7 <= m <= packed.shape[1]*8
    res = np.zeros(m, dtype=np.int32)
    for i in range(n):
        for j in range(m):
            res[j] += bool(packed[i, j//8] & (128>>(j%8)))
    return res
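As a quick sanity check on the question's example (reusing array and pack_array from above), the result should match array.sum(axis=0):
pack_array = np.packbits(array, axis=1)
print(bitSum(pack_array, array.shape[1]))
# [2 1 3]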
If you want a faster implementation, you can optimize the code by working on fixed-size tiles, though this also makes the code more complex. Here is the resulting code:
@nb.njit('int32[::1](uint8[:,::1], int_)')
def bitSumOpt(packed, m):
    n = packed.shape[0]
    assert packed.shape[1]*8-7 <= m <= packed.shape[1]*8
    res = np.zeros(m, dtype=np.int32)
    for i in range(0, n, 4):
        for j in range(0, m, 8):
            if i+3 < n and j+7 < m:
                # Highly-optimized 4x8 tile computation
                k = j//8
                b0, b1, b2, b3 = packed[i,k], packed[i+1,k], packed[i+2,k], packed[i+3,k]
                for j2 in range(8):
                    shift = 7 - j2
                    mask = 1 << shift
                    res[j+j2] += ((b0 & mask) + (b1 & mask) + (b2 & mask) + (b3 & mask)) >> shift
            else:
                # Slow fallback computation
                for i2 in range(i, min(i+4, n)):
                    for j2 in range(j, min(j+8, m)):
                        res[j2] += bool(packed[i2, j2//8] & (128>>(j2%8)))
    return res
Here are performance results on my machine:
On the example array:
Initial code: 62.90 us (x1)
numpy_sumbits: 4.37 us (x14)
bitSumOpt: 0.84 us (x75)
bitSum: 0.77 us (x82)
On a random 2000x2000 array:
Initial code: 1203.8 ms (x1)
numpy_sumbits: 3.9 ms (x308)
bitSum: 2.7 ms (x446)
bitSumOpt: 1.5 ms (x802)
The memory footprint of the Numba implementations is much better too (at least 8 times smaller).
It seems there is no better option in numpy than numpy.unpackbits.
To make this clearer, let's take another example:
array = np.array([[1, 0, 1, 0, 1, 1, 1, 0, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 0, 0, 0, 0, 0, 0]])
pack_array = np.packbits(array, axis=1)
dim = array.shape[1]
Now, pack_array is calculated in this way:
[[1,0,1,0,1,1,1,0], [1,0,0,0,0,0,0,0]] -> [174, 128]
[[1,1,1,1,1,1,1,1], [1,0,0,0,0,0,0,0]] -> [255, 128]
[[0,0,1,0,0,0,0,0], [0,0,0,0,0,0,0,0]] -> [32, 0]
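This is easy to verify:
print(pack_array)
# [[174 128]
#  [255 128]
#  [ 32   0]]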
I've tested various algorithms and unpacking bits seems to be the fastest:
def numpy_sumbits(pack_array, dim):
    out = np.unpackbits(pack_array, axis=1, count=dim)
    arr = np.sum(out, axis=0)
    return arr

def manual_sumbits(pack_array, dim):
    arr = pack_array.copy()
    out = np.empty((dim//8+1) * 8, dtype=int)
    for i in range(8):
        out[7 - i%8::8] = np.sum(arr % 2, axis=0)
        arr = arr // 2
    return out[:dim]

def numpy_sumshifts(pack_array, dim):
    res = (pack_array.reshape(pack_array.size, -1) >> np.arange(8)) % 2
    res = res.reshape(*pack_array.shape, 8)
    return np.sum(res, axis=0)[:, ::-1].ravel()[:dim]
print(numpy_sumbits(pack_array, dim))
print(manual_sumbits(pack_array, dim))
print(numpy_sumshifts(pack_array, dim))
>>>
[2 1 3 1 2 2 2 1 2]
[2 1 3 1 2 2 2 1 2]
[2 1 3 1 2 2 2 1 2]
%%timeit
numpy_sumbits(pack_array, dim)
>>> 3.49 ms ± 57.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
manual_sumbits(pack_array, dim)
>>> 10 ms ± 22.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
numpy_sumshifts(pack_array, dim)
>>> 20.1 ms ± 97.9 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Related
How can I compute variance without zero elements?
For example
np.var([[1, 1], [1, 2]], axis=1) -> [0, 0.25]
I need:
var([[1, 1, 0], [1, 2, 0]], axis=1) -> [0, 0.25]
Is this what you are looking for? You can filter out columns where all values are 0 (i.e. keep columns where at least one value is non-zero).
m = np.array([[1, 1, 0], [1, 2, 0]])
np.var(m[:, np.any(m != 0, axis=0)], axis=1)
# Output
array([0. , 0.25])
V1
You can use a masked array:
data = np.array([[1, 1, 0], [1, 2, 0]])
np.ma.array(data, mask=(data == 0)).var(axis=1)
The result is
masked_array(data=[0. , 0.25],
mask=False,
fill_value=1e+20)
The raw numpy array is the data attribute of the resulting masked array:
>>> np.ma.array(data, mask=(data == 0)).var(axis=1).data
array([0. , 0.25])
V2
Without masked arrays, the operation of removing a variable number of elements in each row is a bit tricky. It would be simpler to implement the variance in terms of the formula sum(x**2) / N - (sum(x) / N)**2 and partial reduction of ufuncs.
First we need to find the split indices and segment lengths. In the general case, that looks like
lens = np.count_nonzero(data, axis=1)
inds = np.r_[0, lens[:-1].cumsum()]
Now you can operate on the raveled masked data:
mdata = data[data != 0]
mdata2 = mdata**2
var = np.add.reduceat(mdata2, inds) / lens - (np.add.reduceat(mdata, inds) / lens)**2
This gives you the same result for var (probably more efficiently than the masked version by the way):
array([0. , 0.25])
V3
The var function appears to use the more traditional formula ((x - x.mean())**2).mean(). You can implement that using the quantities above with just a bit more work:
means = (np.add.reduceat(mdata, inds) / lens).repeat(lens)
var = np.add.reduceat((mdata - means)**2, inds) / lens
Comparison
Here is a quick benchmark for the two approaches:
def nzvar_v1(data):
    return np.ma.array(data, mask=(data == 0)).var(axis=1).data

def nzvar_v2(data):
    lens = np.count_nonzero(data, axis=1)
    inds = np.r_[0, lens[:-1].cumsum()]
    mdata = data[data != 0]
    return np.add.reduceat(mdata**2, inds) / lens - (np.add.reduceat(mdata, inds) / lens)**2

def nzvar_v3(data):
    lens = np.count_nonzero(data, axis=1)
    inds = np.r_[0, lens[:-1].cumsum()]
    mdata = data[data != 0]
    return np.add.reduceat((mdata - (np.add.reduceat(mdata, inds) / lens).repeat(lens))**2, inds) / lens
np.random.seed(100)
data = np.random.randint(10, size=(1000, 1000))
%timeit nzvar_v1(data)
18.3 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit nzvar_v2(data)
5.89 ms ± 69.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit nzvar_v3(data)
11.8 ms ± 62.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So for a large dataset, the second approach, while requiring a bit more code, appears to be ~3x faster than masked arrays and ~2x faster than using the traditional formulation.
As stated in the title, I want to perform a calculation where, instead of multiplying corresponding elements, I binary XOR them and then add the results.
Example for illustration:
EDIT: The picture above shows the whole calculation, but here it is in words: take the first row from the left matrix, [1 0 1], and the first column from the top matrix, [1 0 0]: 1 XOR 1 = 0, 0 XOR 0 = 0, 1 XOR 0 = 1. Add them all: 0 + 0 + 1 = 1. First row from the left [1 0 1], second column [0 0 0]: 1 XOR 0 = 1, 0 XOR 0 = 0, 1 XOR 0 = 1. Add them all: 1 + 0 + 1 = 2. And so on.
Is it possible to do that in numpy?
Try this:
M1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
M2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
(M1 ^ M2[:,None]).sum(-1)
Output:
array([[1, 2, 2],
[2, 1, 1],
[2, 3, 3]])
EDIT
If you want to preallocate memory:
intermediary = np.empty((3,3,3), dtype=np.int32)
np.bitwise_xor(M1, M2[:,None], out=intermediary).sum(-1)
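For inputs that are not 3x3, the buffer's shape follows from broadcasting M1 against M2[:, None]; a small sketch of the general case (shapes inferred here, not taken from the original answer):
intermediary = np.empty((M2.shape[0], M1.shape[0], M1.shape[1]), dtype=np.int32)
out = np.bitwise_xor(M1, M2[:, None], out=intermediary).sum(-1)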
This is just a longer comment on @Arty's answer. There are a few things one can do to speed up the Numba function.
Steps to improve performance
import numpy as np, numba

m1 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)  # random 0/1 matrices
m2 = np.random.randint(low=0, high=2, size=1_000_000).reshape(1_000, 1_000)

# @Arty's original version
@numba.njit(cache=True)
def matxor_1(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype=np.int64)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
    return mr
%timeit matxor_1(m1, m2)
#1.06 s ± 9.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Aligned memory access (a real transpose via ascontiguousarray is important)
@numba.njit(cache=True)
def matxor_2(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype=np.int64)
    m1_T = np.ascontiguousarray(m1.T)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1_T[j, :] ^ m2[i, :])
    return mr
%timeit matxor_2(m1, m2)
#312 ms ± 7.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Writing out the inner loop
@numba.njit(fastmath=True, cache=True)
def matxor_3(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype=np.int64)
    m1_T = np.ascontiguousarray(m1.T)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc = 0
            for k in range(m2.shape[1]):
                acc += m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr
%timeit matxor_3(m1, m2)
#125 ms ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# Parallelization
@numba.njit(fastmath=True, cache=True, parallel=True)
def matxor_4(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype=np.int64)
    m1_T = np.ascontiguousarray(m1.T)
    for i in numba.prange(mr.shape[0]):
        for j in range(mr.shape[1]):
            acc = 0
            for k in range(m2.shape[1]):
                acc += m1_T[j, k] ^ m2[i, k]
            mr[i, j] = acc
    return mr
%timeit matxor_4(m1, m2)
#23.8 ms ± 711 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
print(np.allclose(matxor_1(m1, m2),matxor_2(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_3(m1, m2)))
#True
print(np.allclose(matxor_1(m1, m2),matxor_4(m1, m2)))
#True
You can just combine two loops with NumPy's 1D XOR-sum, like below:
import numpy as np
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
mr = np.empty((m2.shape[0], m1.shape[1]), dtype = np.int64)
for i in range(mr.shape[0]):
    for j in range(mr.shape[1]):
        mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
print(mr)
Output:
[[1 2 2]
[2 1 1]
[2 3 3]]
As @MadPhysicist suggested, you can use the Numba JIT optimizer (pip install numba) to boost the code above; you'll get very fast code for your operations with low memory consumption:
import numpy as np, numba
@numba.njit(cache=True)
def matxor(m1, m2):
    mr = np.empty((m2.shape[0], m1.shape[1]), dtype=np.int64)
    for i in range(mr.shape[0]):
        for j in range(mr.shape[1]):
            mr[i, j] = np.sum(m1[:, j] ^ m2[i, :])
    return mr
m1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
m2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
print(matxor(m1, m2))
Also, the Numba code above can be sped up by roughly another 44x thanks to the great improvements suggested and coded by @max9111; see their answer above for the step-by-step optimized versions (matxor_2, matxor_3, matxor_4).
I was trying to find something that is simpler for me to understand and runs faster. So, I came up with this.
To avoid loops, we can convert this problem into a matrix multiplication problem; matrix multiplications are highly optimized in libraries like NumPy.
An XOR operation followed by a sum counts the number of dissimilar bits between two bit vectors. We can compute this value indirectly by using bipolar vectors, i.e. converting the 1/0 bit vectors into 1/-1 vectors.
Steps:
The matrix can be seen as a stack of bit vectors of length m. You can assume these bit vectors to be the rows of M1 and the columns of M2.
Obtain the product matrix M1 x M2.
Recover the number of dissimilar bits from the values in the product matrix using the following formula.
Each bit vector has m bits. m = length of bit vector
When comparing two bit vectors,
let k = number of similar bits
=> m-k = number of dissimilar bits
let r = intermediate result = the result obtained by taking the dot product of two 1/-1 vectors
clearly,
r = 1*k + (-1)*(m-k)
r = k - m + k
r = 2k - m --(1)
OR
k = (m+r)/2 --(2) = sum(XNOR)
Given the bit vectors, we know m, and we can compute r as described above.
We can compute k using (2).
But we need the number of dissimilar bits (m-k):
m-k = m - ((m+r)/2) = (2m-m-r)/2 = (m-r)/2
number of dissimilar bits = (m-k) = (m-r)/2 = sum(XOR)
Doing the same at the matrix level, we get:
Actual code:
import numpy as np

def matXOR_matXNOR(M1, M2):
    N1 = 2*M1 - 1; N2 = 2*M2 - 1   # convert to bipolar (1/-1) vectors
    m1, n = N1.shape               # N2 is an n x m2 matrix; n = bit-vector length (m in the derivation)
    intr = np.matmul(N1, N2)       # m1 x m2 matrix -> intermediate result
    xorMat = (n - intr) // 2
    xnorMat = (n + intr) // 2
    return xorMat, xnorMat
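A quick usage sketch, reproducing the result of the broadcasting answer above (note the argument order: rows of the first argument are compared against columns of the second):
M1 = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]])
M2 = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])
xorMat, xnorMat = matXOR_matXNOR(M2, M1)
print(xorMat)
# [[1 2 2]
#  [2 1 1]
#  [2 3 3]]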
I need to find the index of the minimum per row in a 2-dim array which at the same time satisfies an additional constraint on the column values. Given two arrays a and b
a = np.array([[1,0,1],[0,0,1],[0,0,0],[1,1,1]])
b = np.array([[1,-1,2],[4,-1,1],[1,-1,2],[1,2,-1]])
the objective is to find the indices for which a == 1, b is positive, and b is the minimum value of its row. Fulfilling the first two conditions is easy:
idx = np.where(np.logical_and(a == 1, b > 0))
which yields the indices:
(array([0, 0, 1, 3, 3]), array([0, 2, 2, 0, 1]))
Now I need to filter out the duplicate row entries (keeping only the minimum per row), but I cannot think of an elegant way to achieve that. In the above example the result should be
(array([0,1,3]), array([0,2,0]))
edit:
It should also work for a containing values other than just 0 and 1.
Updated after trying to understand the problem better; try:
c = b*(b*a > 0)
np.where(c==np.min(c[np.nonzero(c)]))
Output:
(array([0, 1, 3], dtype=int64), array([0, 2, 0], dtype=int64))
Timings:
Method 1
a = np.array([[1,0,1],[0,0,1],[0,0,0],[1,1,1]])
b = np.array([[1,-1,2],[4,-1,1],[1,-1,2],[1,2,-1]])
b[b<0] = 100000
cond = [[True if i == b.argmin(axis=1)[k] else False for i in range(b.shape[1])] for k in range(b.shape[0])]
idx = np.where(np.logical_and(np.logical_and(a == 1, b > 0),cond))
idx
Method 2
c = b*(b*a > 0)
idx1 = np.where(c==np.min(c[np.nonzero(c)]))
idx1
Method 1 Timing:
28.3 µs ± 418 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Method 2 Timing:
12.2 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
I found a solution based on list comprehension. It is necessary to change the negative values of b to some high value though.
a = np.array([[1,0,1],[0,0,1],[0,0,0],[1,1,1]])
b = np.array([[1,-1,2],[4,-1,1],[1,-1,2],[1,2,-1]])
b[b<0] = 100000
cond = [[True if i == b.argmin(axis=1)[k] else False for i in range(b.shape[1])] for k in range(b.shape[0])]
idx = np.where(np.logical_and(np.logical_and(a == 1, b > 0),cond))
print(idx)
(array([0, 1, 3]), array([0, 2, 0]))
Please let me hear what you think.
edit: I just noticed that this solution is horribly slow.
I have a numpy array in which indices are stored with shape (n, 2). E.g.:
[[0, 1],
[2, 3],
[1, 2],
[4, 2]]
Then I do some processing and create an array in the shape of (m, 2), where n > m. E.g.:
[[2, 3]
[4, 2]]
Now I want to delete every row in the first array that can be found in the second array as well. So my wanted result is:
[[0, 1],
[1, 2]]
My current solution is as follows:
result = first_array
for row in second_array:
    result = np.delete(result, np.where(np.all(result == row, axis=1)), axis=0)
However, this is quite time-consuming if the second array is large. Does someone know a numpy-only solution which does not require a loop?
Here's one leveraging the fact that they are positive numbers using matrix-multiplication for dimensionality-reduction -
def setdiff_nd_positivenums(a, b):
    s = np.maximum(a.max(0)+1, b.max(0)+1)
    return a[~np.isin(a.dot(s), b.dot(s))]
Sample run -
In [82]: a
Out[82]:
array([[0, 1],
[2, 3],
[1, 2],
[4, 2]])
In [83]: b
Out[83]:
array([[2, 3],
[4, 2]])
In [85]: setdiff_nd_positivenums(a,b)
Out[85]:
array([[0, 1],
[1, 2]])
Also, it seems the second array b is a subset of a. So, we can leverage that scenario to boost the performance even further using np.searchsorted, like so -
def setdiff_nd_positivenums_searchsorted(a, b):
    s = np.maximum(a.max(0)+1, b.max(0)+1)
    a1D, b1D = a.dot(s), b.dot(s)
    b1Ds = np.sort(b1D)
    return a[b1Ds[np.searchsorted(b1Ds, a1D)] != a1D]
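As a quick check on the question's sample arrays (where b is indeed a subset of a), both variants give the same result:
a = np.array([[0, 1], [2, 3], [1, 2], [4, 2]])
b = np.array([[2, 3], [4, 2]])
print(setdiff_nd_positivenums(a, b))
print(setdiff_nd_positivenums_searchsorted(a, b))
# both print:
# [[0 1]
#  [1 2]]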
Timings -
In [146]: np.random.seed(0)
...: a = np.random.randint(0,9,(1000000,2))
...: b = a[np.random.choice(len(a), 10000, replace=0)]
In [147]: %timeit setdiff_nd_positivenums(a,b)
...: %timeit setdiff_nd_positivenums_searchsorted(a,b)
10 loops, best of 3: 101 ms per loop
10 loops, best of 3: 70.9 ms per loop
For generic numbers, here's another using views -
# https://stackoverflow.com/a/45313353/ @Divakar
def view1D(a, b):  # a, b are arrays
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b)
    void_dt = np.dtype((np.void, a.dtype.itemsize * a.shape[1]))
    return a.view(void_dt).ravel(), b.view(void_dt).ravel()

def setdiff_nd(a, b):
    # a, b are the nD input arrays
    A, B = view1D(a, b)
    return a[~np.isin(A, B)]
Sample run -
In [94]: a
Out[94]:
array([[ 0, 1],
[-2, -3],
[ 1, 2],
[-4, -2]])
In [95]: b
Out[95]:
array([[-2, -3],
[ 4, 2]])
In [96]: setdiff_nd(a,b)
Out[96]:
array([[ 0, 1],
[ 1, 2],
[-4, -2]])
Timings -
In [158]: np.random.seed(0)
...: a = np.random.randint(0,9,(1000000,2))
...: b = a[np.random.choice(len(a), 10000, replace=0)]
In [159]: %timeit setdiff_nd(a,b)
1 loop, best of 3: 352 ms per loop
The numpy-indexed package (disclaimer: I am its author) was designed to perform operations of this type efficiently on nd-arrays.
import numpy_indexed as npi
# if the output should consist of unique values and there is no need to preserve ordering
result = npi.difference(first_array, second_array)
# otherwise:
result = first_array[~npi.in_(first_array, second_array)]
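For instance, on the arrays from the question this should give (a small usage sketch; assumes numpy_indexed is installed):
import numpy as np
import numpy_indexed as npi

first_array = np.array([[0, 1], [2, 3], [1, 2], [4, 2]])
second_array = np.array([[2, 3], [4, 2]])
print(first_array[~npi.in_(first_array, second_array)])
# [[0 1]
#  [1 2]]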
Here is a function that works with 2D arrays of integers of any shape and accepts both positive and negative numbers:
import numpy as np
# Gets a boolean array of rows of a that are in b
def isin_rows(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # Subtract minimum value per column
    min = np.minimum(a.min(0), b.min(0))
    a = a - min
    b = b - min
    # Get maximum value per column
    max = np.maximum(a.max(0), b.max(0))
    # Compute multiplicative base for each column (mixed-radix weights,
    # so every possible row maps to a distinct integer)
    base = np.roll(max + 1, 1)
    base[0] = 1
    base = np.cumprod(base)
    # Make flattened version of arrays
    a_flat = (a * base).sum(1)
    b_flat = (b * base).sum(1)
    # Check elements of a in b
    return np.isin(a_flat, b_flat)
# Test
a = np.array([[0, 1],
[2, 3],
[1, 2],
[4, 2]])
b = np.array([[2, 3],
[4, 2]])
a_in_b_mask = isin_rows(a, b)
a_not_in_b = a[~a_in_b_mask]
print(a_not_in_b)
# [[0 1]
# [1 2]]
EDIT: One possible optimization arises from considering the number of possible distinct rows in b. If b has more rows than that number of combinations, then you may find its unique elements first so that np.isin is faster:
import numpy as np
def isin_rows_opt(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    min = np.minimum(a.min(0), b.min(0))
    a = a - min
    b = b - min
    max = np.maximum(a.max(0), b.max(0))
    # Mixed-radix weights, as in isin_rows
    base = np.roll(max + 1, 1)
    base[0] = 1
    base = np.cumprod(base)
    a_flat = (a * base).sum(1)
    b_flat = (b * base).sum(1)
    # Count number of possible different rows for b
    num_possible_b = np.prod(b.max(0) - b.min(0) + 1)
    if len(b_flat) > num_possible_b:  # May tune this condition
        b_flat = np.unique(b_flat)
    return np.isin(a_flat, b_flat)
The condition len(b_flat) > num_possible_b should probably be tuned so that you only take the unique elements when it is really going to be worth it (maybe len(b_flat) > 2 * num_possible_b or len(b_flat) > num_possible_b + CONSTANT). It seems to give some improvement for big arrays with few distinct values:
import numpy as np
# Test setup from @Divakar
np.random.seed(0)
a = np.random.randint(0, 9, (1000000, 2))
b = a[np.random.choice(len(a), 10000, replace=0)]
print(np.all(isin_rows(a, b) == isin_rows_opt(a, b)))
# True
%timeit isin_rows(a, b)
# 100 ms ± 425 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit isin_rows_opt(a, b)
# 81.2 ms ± 324 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
There are several elegant examples of using numpy in Python to generate arrays of all combinations. For example the answer here: Using numpy to build an array of all combinations of two arrays .
Now suppose there is an extra constraint, namely, that the sum of all numbers cannot add up to more than a given constant K. Using a generator and itertools.product, for an example with K=3 where we want the combinations of three variables with ranges 0-1, 0-3, and 0-2, we can do it as follows:
from itertools import product
import numpy as np

K = 3
maxRange = np.array([1, 3, 2])
states = np.array([i for i in product(*(range(i+1) for i in maxRange)) if sum(i) <= K])
which returns
array([[0, 0, 0],
[0, 0, 1],
[0, 0, 2],
[0, 1, 0],
[0, 1, 1],
[0, 1, 2],
[0, 2, 0],
[0, 2, 1],
[0, 3, 0],
[1, 0, 0],
[1, 0, 1],
[1, 0, 2],
[1, 1, 0],
[1, 1, 1],
[1, 2, 0]])
In principle, the approach from https://stackoverflow.com/a/25655090/1479342 can be used to generate all possible combinations without the constraint and then select the subset of combinations that sum up to less than K. However, that approach generates many more combinations than necessary, especially if K is relatively small compared to sum(maxRange).
There must be a way to do this faster and with lower memory usage. How can this be achieved using a vectorized approach (for example using np.indices)?
Edited
For completeness, I'm adding here the OP's code:
import itertools
import numpy as np

def partition0(max_range, S):
    K = len(max_range)
    return np.array([i for i in itertools.product(*(range(i+1) for i in max_range)) if sum(i) <= S])
The first approach is pure np.indices. It's fast for small input but consumes a lot of memory (OP already pointed out it's not what he meant).
def partition1(max_range, S):
    max_range = np.asarray(max_range, dtype=int)
    a = np.indices(max_range + 1)
    b = a.sum(axis=0) <= S
    return (a[:, b].T)
The recursive approach seems to be much better than those above:
def partition2(max_range, max_sum):
    max_range = np.asarray(max_range, dtype=int).ravel()
    if max_range.size == 1:
        return np.arange(min(max_range[0], max_sum) + 1, dtype=int).reshape(-1, 1)
    P = partition2(max_range[1:], max_sum)
    # S[i] is the largest summand we can place in front of P[i]
    S = np.minimum(max_sum - P.sum(axis=1), max_range[0])
    offset, sz = 0, S.size
    out = np.empty(shape=(sz + S.sum(), P.shape[1]+1), dtype=int)
    out[:sz, 0] = 0
    out[:sz, 1:] = P
    for i in range(1, max_range[0]+1):
        ind, = np.nonzero(S)
        offset, sz = offset + sz, ind.size
        out[offset:offset+sz, 0] = i
        out[offset:offset+sz, 1:] = P[ind]
        S[ind] -= 1
    return out
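A quick check against the question's example (maxRange = [1, 3, 2], K = 3), which should give the 15 rows listed there:
print(partition2([1, 3, 2], 3).shape)
# (15, 3)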
After a short thought, I was able to take it a bit further. If we know in advance the number of possible partitions, we can allocate enough memory at once. (It's somewhat similar to cartesian in an already linked thread.)
First, we need a function which counts partitions.
def number_of_partitions(max_range, max_sum):
    '''
    Returns an array arr of the same shape as max_range, where
    arr[j] = number of admissible partitions for
             j summands bounded by max_range[j:] and with sum <= max_sum
    '''
    M = max_sum + 1
    N = len(max_range)
    arr = np.zeros(shape=(M, N), dtype=int)
    arr[:, -1] = np.where(np.arange(M) <= min(max_range[-1], max_sum), 1, 0)
    for i in range(N-2, -1, -1):
        for j in range(max_range[i]+1):
            arr[j:, i] += arr[:M-j, i+1]
    return arr.sum(axis=0)
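As a quick sanity check against the question's example (ranges 0-1, 0-3, 0-2 and max_sum = 3):
print(number_of_partitions([1, 3, 2], 3))
# [15  9  3]   -> 15 admissible combinations in total, matching the list in the question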
The main function:
def partition3(max_range, max_sum, out=None, n_part=None):
    if out is None:
        max_range = np.asarray(max_range, dtype=int).ravel()
        n_part = number_of_partitions(max_range, max_sum)
        out = np.zeros(shape=(n_part[0], max_range.size), dtype=int)
    if max_range.size == 1:
        out[:] = np.arange(min(max_range[0], max_sum) + 1, dtype=int).reshape(-1, 1)
        return out
    P = partition3(max_range[1:], max_sum, out=out[:n_part[1], 1:], n_part=n_part[1:])
    # P is now a useful reference
    S = np.minimum(max_sum - P.sum(axis=1), max_range[0])
    offset, sz = 0, S.size
    out[:sz, 0] = 0
    for i in range(1, max_range[0]+1):
        ind, = np.nonzero(S)
        offset, sz = offset + sz, ind.size
        out[offset:offset+sz, 0] = i
        out[offset:offset+sz, 1:] = P[ind]
        S[ind] -= 1
    return out
Some tests:
max_range = [3, 4, 6, 3, 4, 6, 3, 4, 6]
for f in [partition0, partition1, partition2, partition3]:
    print(f.__name__ + ':')
    for max_sum in [5, 15, 25]:
        print('Sum %2d: ' % max_sum, end='')
        %timeit f(max_range, max_sum)
    print()
partition0:
Sum 5: 1 loops, best of 3: 859 ms per loop
Sum 15: 1 loops, best of 3: 1.39 s per loop
Sum 25: 1 loops, best of 3: 3.18 s per loop
partition1:
Sum 5: 10 loops, best of 3: 176 ms per loop
Sum 15: 1 loops, best of 3: 224 ms per loop
Sum 25: 1 loops, best of 3: 403 ms per loop
partition2:
Sum 5: 1000 loops, best of 3: 809 µs per loop
Sum 15: 10 loops, best of 3: 62.5 ms per loop
Sum 25: 1 loops, best of 3: 262 ms per loop
partition3:
Sum 5: 1000 loops, best of 3: 853 µs per loop
Sum 15: 10 loops, best of 3: 59.1 ms per loop
Sum 25: 1 loops, best of 3: 249 ms per loop
And something larger:
%timeit partition0([3,6] * 5, 20)
1 loops, best of 3: 11.9 s per loop
%timeit partition1([3,6] * 5, 20)
The slowest run took 12.68 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 2.33 s per loop
# MemoryError in another test
%timeit partition2([3,6] * 5, 20)
1 loops, best of 3: 877 ms per loop
%timeit partition3([3,6] * 5, 20)
1 loops, best of 3: 739 ms per loop
I don't know of a numpy approach, but here's a reasonably clean solution. Let A be an array of integers and let k be the number you are given as input.
Start with an empty array B and keep the sum of B in a variable s (initially set to zero). Apply the following procedure:
If the sum s of the array B is less than k, then (i) add B to the collection, and for each element of the original array A: (ii) add that element to B and update s, (iii) delete it from A, (iv) recursively apply the procedure, and (v) when the call returns, put the element back into A, remove it from B, and update s. Otherwise do nothing.
This approach prunes invalid branches early on and visits essentially only the subsets that sum to less than k; a minimal sketch follows.
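A minimal Python sketch of this procedure (illustrative names, not from the answer; it assumes the elements of A are non-negative and uses a start index so that each subset is generated only once):
def subsets_below(A, k):
    """Collect all subsets of A whose sum is strictly less than k,
    pruning a branch as soon as adding an element would reach k."""
    results, B = [], []

    def recurse(start, s):
        # s < k holds on entry, so the current subset B is admissible
        results.append(list(B))
        for i in range(start, len(A)):
            if s + A[i] < k:       # prune: only descend while the sum stays below k
                B.append(A[i])
                recurse(i + 1, s + A[i])
                B.pop()            # backtrack: restore B (s is restored implicitly)

    recurse(0, 0)
    return results

print(subsets_below([1, 2, 3], 3))
# [[], [1], [2]]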