I have a 2d array of integers and I want to sum up 2d sub arrays of it. Both arrays can have arbitrary dimensions, although we can assume that the subarray will be orders of magnitude smaller than the total array.
The reference implementation in python is trivial:
import numpy as np

def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    rows = len(arr) // l * l
    cols = len(arr[0]) // m * m
    for i in range(rows):
        for j in range(cols):
            result[i // l, j // m] += arr[i, j]
    return result
The question is how I do this best using numpy, hopefully without any looping in python at all. For 1d arrays cumsum and r_ would work and I could use that with a bit of looping to implement a solution for 2d, but I'm still learning numpy and I'm almost certain there's some cleverer way.
Example output:
arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])
result = sub_sums(arr, 2, 2)
gives:
[[ 0 1 2 3 4]
[ 4 5 6 7 8]
[ 8 9 10 11 12]
[12 13 14 15 16]]
[[ 10. 18.]
[ 42. 50.]]
There is a blockshaped function which does something rather close to what you want:
In [81]: arr
Out[81]:
array([[ 0, 1, 2, 3, 4],
[ 4, 5, 6, 7, 8],
[ 8, 9, 10, 11, 12],
[12, 13, 14, 15, 16]])
In [82]: blockshaped(arr[:,:4], 2,2)
Out[82]:
array([[[ 0, 1],
[ 4, 5]],
[[ 2, 3],
[ 6, 7]],
[[ 8, 9],
[12, 13]],
[[10, 11],
[14, 15]]])
In [83]: blockshaped(arr[:,:4], 2,2).shape
Out[83]: (4, 2, 2)
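For reference, blockshaped itself can be defined along these lines (a commonly used formulation, not quoted from the original answer; it assumes the block size divides the array shape evenly, which is why arr[:,:4] is used above):

def blockshaped(arr, nrows, ncols):
    # Split a 2d array into an array of (nrows, ncols) blocks, read off
    # block-row by block-row. Assumes nrows and ncols divide arr.shape evenly.
    h, w = arr.shape
    return (arr.reshape(h // nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(-1, nrows, ncols))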
Once you have the "blockshaped" array, you can obtain the desired result by reshaping (so the numbers in one block are strung out along a single axis) and then calling the sum method on that axis.
So, with a slight modification of the blockshaped function, you can define sub_sums like this:
import numpy as np

def sub_sums(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows) * nrows
    w = (w // ncols) * ncols
    arr = arr[:h, :w]
    return (arr.reshape(h // nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(h // nrows, w // ncols, -1)
               .sum(axis=-1))
arr = np.asarray([range(0, 5),
                  range(4, 9),
                  range(8, 13),
                  range(12, 17)])
print(sub_sums(arr, 2, 2))
yields
[[10 18]
[42 50]]
Edit: Ophion provides a nice improvement -- use np.einsum instead of reshaping before summing:
def sub_sums_ophion(arr, nrows, ncols):
    h, w = arr.shape
    h = (h // nrows) * nrows
    w = (w // ncols) * ncols
    arr = arr[:h, :w]
    return np.einsum('ijkl->ik', arr.reshape(h // nrows, nrows, -1, ncols))
In [105]: %timeit sub_sums(arr, 2, 2)
10000 loops, best of 3: 112 µs per loop
In [106]: %timeit sub_sums_ophion(arr, 2, 2)
10000 loops, best of 3: 76.2 µs per loop
Here is a simpler way, using np.add.reduceat:
In [160]: import numpy as np
In [161]: arr = np.asarray([range(0, 5),
                            range(4, 9),
                            range(8, 13),
                            range(12, 17)])
In [162]: np.add.reduceat(arr, [0], axis=1)
Out[162]:
array([[10],
[30],
[50],
[70]])
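reduceat with [0] only sums whole rows. To get the per-block sums the question asks for, the same idea can be applied along both axes. A sketch (sub_sums_reduceat is a name introduced here; like the other answers, it trims the array so the block size divides it evenly):

import numpy as np

def sub_sums_reduceat(arr, nrows, ncols):
    # Trim so that the block size divides the array shape evenly.
    h = arr.shape[0] // nrows * nrows
    w = arr.shape[1] // ncols * ncols
    trimmed = arr[:h, :w]
    # Sum groups of rows, then groups of columns of the intermediate result.
    row_sums = np.add.reduceat(trimmed, np.arange(0, h, nrows), axis=0)
    return np.add.reduceat(row_sums, np.arange(0, w, ncols), axis=1)

arr = np.asarray([list(range(0, 5)),     # same example array as above
                  list(range(4, 9)),
                  list(range(8, 13)),
                  list(range(12, 17))])
print(sub_sums_reduceat(arr, 2, 2))
# [[10 18]
#  [42 50]]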
A very small change in your code is to use slicing and perform the sums of the sub-arrays using the sum() method:
def sub_sums(arr, l, m):
    result = np.zeros((len(arr) // l, len(arr[0]) // m))
    rows = len(arr) // l
    cols = len(arr[0]) // m
    for i in range(rows):
        for j in range(cols):
            result[i, j] = arr[i*l:(i+1)*l, j*m:(j+1)*m].sum()
    return result
Doing some very simple benchmarks shows that this is slower in the 2x2 case, about equal to your approach in the 3x3 case and faster for bigger sub-arrays (sub_sums2 is your version of the code):
In [19]: arr = np.asarray([range(100)] * 100)
In [20]: %timeit sub_sums(arr, 2, 2)
10 loops, best of 3: 21.8 ms per loop
In [21]: %timeit sub_sums2(arr, 2, 2)
100 loops, best of 3: 9.56 ms per loop
In [22]: %timeit sub_sums(arr, 3, 3)
100 loops, best of 3: 9.58 ms per loop
In [23]: %timeit sub_sums2(arr, 3, 3)
100 loops, best of 3: 9.36 ms per loop
In [24]: %timeit sub_sums(arr, 4, 4)
100 loops, best of 3: 5.58 ms per loop
In [25]: %timeit sub_sums2(arr, 4, 4)
100 loops, best of 3: 9.56 ms per loop
In [26]: %timeit sub_sums(arr, 10, 10)
1000 loops, best of 3: 939 us per loop
In [27]: %timeit sub_sums2(arr, 10, 10)
100 loops, best of 3: 9.48 ms per loop
Notice that with 10x10 sub-arrays it's about 10 times faster. In the 2x2 case it's about twice as slow. Your method takes essentially the same time regardless of the sub-array size, while my implementation gets faster with bigger sub-arrays.
I'm pretty sure the explicit for loops can be avoided entirely (maybe by reshaping the array so that the sub-arrays become rows?), but I'm not an expert in numpy and it may take some time before I find that solution. However, I believe an order of magnitude is already a nice improvement.
I am having some problems implementing the following equation in a performant way using Python:
beta and gamma are cartesian coordinates {x, y} and b, m are index values which can be quite big, n = 10000. (The equation itself was an image in the original post; judging from the code below, each 3x3 block of the result is the sum over l of g_{l,beta} * g_{l,gamma} * A_{l,m} * B_{l,b}.) I have a working version of the code, shown below for the simple case of l = 2 and m, b = 4 (l and m always have the same length). I checked the code using timeit and the bottleneck is the element-wise multiplication with an array of size (3, 3) and the reshaping of the resulting array into shape (3m, 3m).
Does anybody have an idea how to increase the performance? (I also noticed that my current version suffers quite a big overhead for large values of l...)
import numpy as np
g_l3 = np.array([[1, 4, 5],[2, 6, 7]])
g_l33 = g_l3.reshape(-1, 3, 1) * g_l3.reshape(-1, 1, 3)
A_lm = np.arange(1, 9, 1).reshape(2, 4)
B_lb = np.arange(7, 15, 1).reshape(2, 4)
AB_lmb = A_lm.reshape(-1, 4, 1) * B_lb.reshape(-1, 1, 4)
D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1), axis=0)
D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)
In [387]: %%timeit
...: g_l3 = np.array([[1, 4, 5],[2, 6, 7]])
...
...: D_lmb33 = np.sum(g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4,
...: 4, 1, 1), axis=0)
...: D = np.concatenate(np.concatenate(D_lmb33, axis=2), axis=0)
...:
...:
70.7 µs ± 226 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Examining the pieces, and rewriting the reshape with newaxis, which is visually clearer to me - though basically the same speed:
In [388]: g_l3.shape
Out[388]: (2, 3)
In [389]: g_l33.shape
Out[389]: (2, 3, 3)
In [390]: np.allclose(g_l33, g_l3[:,:,None]*g_l3[:,None,:])
Out[390]: True
In [391]: AB_lmb.shape
Out[391]: (2, 4, 4)
In [392]: np.allclose(AB_lmb, A_lm[:,:,None]*B_lb[:,None,:])
Out[392]: True
So these are ordinary outer products over the last dimension of 2d arrays.
And another outer product:
In [393]: temp=g_l33.reshape(-1, 1, 1, 3, 3) * AB_lmb.reshape(-1, 4, 4, 1, 1)
In [394]: temp.shape
Out[394]: (2, 4, 4, 3, 3)
In [396]: np.allclose(temp, g_l33[:,None,None,:,:] * AB_lmb[:, :,:, None,None])
Out[396]: True
These probably could be combined into one expression, but that's not necessary.
D_lmb33 sums on the leading dimension:
In [405]: D_lmb33.shape
Out[405]: (4, 4, 3, 3)
The double concatenate can also be done with a transpose and reshape:
In [406]: np.allclose(D_lmb33.transpose(1,2,0,3).reshape(12,12),D)
Out[406]: True
Overall your code appears to make efficient use of numpy. For a large leading dimension that (N, 4, 4, 3, 3) intermediate array could be large and take time, but within numpy itself there isn't an alternative; I don't think the algebra allows us to do the sum earlier. Whether numba or numexpr would help is another question.
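For what it's worth, the outer products and the sum over l can be collapsed into a single einsum call, combined with the transpose/reshape equivalence shown above. A sketch (D4 and D2 are names introduced here; with optimize=True einsum chooses its own contraction order, so whether this beats the original is not guaranteed):

import numpy as np

g_l3 = np.array([[1, 4, 5], [2, 6, 7]])
A_lm = np.arange(1, 9, 1).reshape(2, 4)
B_lb = np.arange(7, 15, 1).reshape(2, 4)

# D4[m, b, i, j] = sum over l of g_l3[l, i] * g_l3[l, j] * A_lm[l, m] * B_lb[l, b]
D4 = np.einsum('li,lj,lm,lb->mbij', g_l3, g_l3, A_lm, B_lb, optimize=True)

# Same block layout as the double concatenate / transpose-reshape above,
# so np.allclose(D2, D) should hold for the D computed in the question.
D2 = D4.transpose(1, 2, 0, 3).reshape(3 * B_lb.shape[1], 3 * A_lm.shape[1])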
I am using the following code to create interaction terms in my data:
def Interaction(x):
    for k in range(0, x.shape[1]-1):
        for j in range(k+1, x.shape[1]-1):
            new = x[:,k] * x[:,j]
            x = np.hstack((x, new[:,None]))
    return x
My problem is that it is extremely slow compared to SKLearn's PolynomialFeatures. How can I speed it up? I can't use SKLearn because there are a few customizations that I would like to make. For example, I would like to make an interaction variable of X1 * X2 but also X1 * (1-X2), etc.
We need to multiply each element of each row pairwise, which we can do with np.einsum('ij,ik->ijk', x, x). This computes every product twice, but it is still about 2 times faster than PolynomialFeatures.
import numpy as np

def interaction(x):
    """
    >>> a = np.arange(9).reshape(3, 3)
    >>> b = np.arange(6).reshape(3, 2)
    >>> a
    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    >>> interaction(a)
    array([[ 0,  1,  2,  0,  0,  2],
           [ 3,  4,  5, 12, 15, 20],
           [ 6,  7,  8, 42, 48, 56]])
    >>> b
    array([[0, 1],
           [2, 3],
           [4, 5]])
    >>> interaction(b)
    array([[ 0,  1,  0],
           [ 2,  3,  6],
           [ 4,  5, 20]])
    """
    b = np.einsum('ij,ik->ijk', x, x)
    m, n = x.shape
    axis1, axis2 = np.triu_indices(n, 1)
    axis1 = np.tile(axis1, m)
    axis2 = np.tile(axis2, m)
    axis0 = np.arange(m).repeat(n * (n - 1) // 2)
    return np.c_[x, b[axis0, axis1, axis2].reshape(m, -1)]
Performance comparison:
c = np.arange(30).reshape(6, 5)
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(2, interaction_only=True)
skl = poly.fit_transform
print(np.allclose(interaction(c), skl(c)[:, 1:]))
# True
In [1]: %timeit interaction(c)
118 µs ± 172 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [2]: %timeit skl(c)
243 µs ± 4.69 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
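If the factor-of-two redundancy matters, or you want the custom terms mentioned in the question, one variant (a sketch; interaction_triu is a name introduced here, not part of the original answer) is to index the column pairs directly instead of building the full n x n product cube:

import numpy as np

def interaction_triu(x):
    # Indices of the upper-triangular column pairs (k < j), each pair computed once.
    i, j = np.triu_indices(x.shape[1], 1)
    pairwise = x[:, i] * x[:, j]
    # Custom terms such as X1 * (1 - X2) could be appended the same way,
    # e.g. x[:, i] * (1 - x[:, j]).
    return np.c_[x, pairwise]

a = np.arange(9).reshape(3, 3)
print(interaction_triu(a))
# [[ 0  1  2  0  0  2]
#  [ 3  4  5 12 15 20]
#  [ 6  7  8 42 48 56]]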
Say I have a np.array like this:
a = [1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78]
Is there a quick method to get the indices of all locations where 3 consecutive numbers are all above some threshold? That is, for some threshold th, get all x where this holds:
a[x]>th and a[x+1]>th and a[x+2]>th
Example: for threshold 40 and the list given above, x should be [4,8,9].
Many thanks.
Approach #1
Use convolution on the mask of boolean array obtained after comparison -
In [40]: a # input array
Out[40]: array([ 1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
In [42]: N = 3 # compare N consecutive numbers
In [44]: T = 40 # threshold for comparison
In [45]: np.flatnonzero(np.convolve(a>T, np.ones(N, dtype=int),'valid')>=N)
Out[45]: array([4, 8, 9])
Approach #2
Use binary_erosion -
In [77]: from scipy.ndimage.morphology import binary_erosion
In [31]: np.flatnonzero(binary_erosion(a>T,np.ones(N, dtype=int), origin=-(N//2)))
Out[31]: array([4, 8, 9])
Approach #3 (Specific case) : Small numbers of consecutive numbers check
For checking such a small number of consecutive numbers (three in this case), we can also use slicing on the compared mask for better performance -
m = a>T
out = np.flatnonzero(m[:-2] & m[1:-1] & m[2:])
Benchmarking
Timings on 100000 repeated/tiled array from given sample -
In [78]: a
Out[78]: array([ 1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
In [79]: a = np.tile(a,100000)
In [80]: N = 3
In [81]: T = 40
# Approach #3
In [82]: %%timeit
...: m = a>T
...: out = np.flatnonzero(m[:-2] & m[1:-1] & m[2:])
1000 loops, best of 3: 1.83 ms per loop
# Approach #1
In [83]: %timeit np.flatnonzero(np.convolve(a>T, np.ones(N, dtype=int),'valid')>=N)
100 loops, best of 3: 10.9 ms per loop
# Approach #2
In [84]: %timeit np.flatnonzero(binary_erosion(a>T,np.ones(N, dtype=int), origin=-(N//2)))
100 loops, best of 3: 11.7 ms per loop
Try:
th = 40
results = [x for x in range(len(array) - 2) if array[x:x+3].min() > th]
which is a list comprehension for
th = 40
results = []
for x in range(len(array) - 2):
    if array[x:x+3].min() > th:
        results.append(x)
Another approach, using numpy.lib.stride_tricks.as_strided:
In [59]: import numpy as np
In [60]: from numpy.lib.stride_tricks import as_strided
Define the input data:
In [61]: a = np.array([ 1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
In [62]: N = 3
In [63]: threshold = 40
Compute the result; q is the boolean mask for the "big" values.
In [64]: q = a > threshold
In [65]: result = np.all(as_strided(q, shape=(len(q)-N+1, N), strides=(q.strides[0], q.strides[0])), axis=1).nonzero()[0]
In [66]: result
Out[66]: array([4, 8, 9])
Do it again with N = 4:
In [67]: N = 4
In [68]: result = np.all(as_strided(q, shape=(len(q)-N+1, N), strides=(q.strides[0], q.strides[0])), axis=1).nonzero()[0]
In [69]: result
Out[69]: array([8])
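On newer numpy (1.20+), the same windowing can be written without manual strides via sliding_window_view; a sketch of the equivalent computation:

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.array([1, 3, 4, 5, 60, 43, 53, 4, 46, 54, 56, 78])
N = 3
threshold = 40

q = a > threshold
# Each row of the view is a length-N window of q; require all True.
result = np.all(sliding_window_view(q, N), axis=1).nonzero()[0]
print(result)  # expected: [4 8 9]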
I want to get the rank of each element, so I use argsort in numpy:
np.argsort(np.array((1,1,1,2,2,3,3,3,3)))
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
it gives equal elements different ranks. Can I get the same rank for equal elements, like:
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't mind a dependency on scipy, you can use scipy.stats.rankdata, with method='min':
In [14]: a
Out[14]: array([1, 1, 1, 2, 2, 3, 3, 3, 3])
In [15]: from scipy.stats import rankdata
In [16]: rankdata(a, method='min')
Out[16]: array([1, 1, 1, 4, 4, 6, 6, 6, 6])
Note that rankdata starts the ranks at 1. To start at 0, subtract 1 from the result:
In [17]: rankdata(a, method='min') - 1
Out[17]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
If you don't want the scipy dependency, you can use numpy.unique to compute the ranking. Here's a function that computes the same result as rankdata(x, method='min') - 1:
import numpy as np
def rankmin(x):
u, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
csum = np.zeros_like(counts)
csum[1:] = counts[:-1].cumsum()
return csum[inv]
For example,
In [137]: x = np.array([60, 10, 0, 30, 20, 40, 50])
In [138]: rankdata(x, method='min') - 1
Out[138]: array([6, 1, 0, 3, 2, 4, 5])
In [139]: rankmin(x)
Out[139]: array([6, 1, 0, 3, 2, 4, 5])
In [140]: a = np.array([1,1,1,2,2,3,3,3,3])
In [141]: rankdata(a, method='min') - 1
Out[141]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
In [142]: rankmin(a)
Out[142]: array([0, 0, 0, 3, 3, 5, 5, 5, 5])
By the way, a single call to argsort() does not give ranks. You can find an assortment of approaches to ranking in the question Rank items in an array using Python/NumPy, including how to do it using argsort().
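To illustrate that distinction, here is a small sketch of how argsort() is typically turned into ordinal ranks; note that it still breaks ties instead of assigning them equal rank, which is why the approaches above are needed:

import numpy as np

x = np.array([60, 10, 0, 30, 20, 40, 50])
order = np.argsort(x)               # indices that would sort x
ranks = np.empty_like(order)
ranks[order] = np.arange(len(x))    # ordinal ranks 0..n-1, ties broken arbitrarily
print(ranks)  # [6 1 0 3 2 4 5] for this x (which has no ties)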
Alternatively, pandas series has a rank method which does what you need with the min method:
import pandas as pd
pd.Series((1,1,1,2,2,3,3,3,3)).rank(method="min")
# 0 1
# 1 1
# 2 1
# 3 4
# 4 4
# 5 6
# 6 6
# 7 6
# 8 6
# dtype: float64
With focus on performance, here's an approach -
def rank_repeat_based(arr):
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr))+1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))
For a generic case with the elements in input array not already sorted, we would need to use argsort() to keep track of the positions. So, we would have a modified version, like so -
def rank_repeat_based_generic(arr):
    sidx = np.argsort(arr, kind='mergesort')
    idx = np.concatenate(([0], np.flatnonzero(np.diff(arr[sidx]))+1, [arr.size]))
    return np.repeat(idx[:-1], np.diff(idx))[sidx.argsort()]
Runtime test
Testing out all the approaches listed thus far to solve the problem on a large dataset.
Sorted array case :
In [96]: arr = np.sort(np.random.randint(1,100,(10000)))
In [97]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 635 µs per loop
In [98]: %timeit rankmin(arr)
1000 loops, best of 3: 495 µs per loop
In [99]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 826 µs per loop
In [100]: %timeit rank_repeat_based(arr)
10000 loops, best of 3: 200 µs per loop
Unsorted case :
In [106]: arr = np.random.randint(1,100,(10000))
In [107]: %timeit rankdata(arr, method='min') - 1
1000 loops, best of 3: 963 µs per loop
In [108]: %timeit rankmin(arr)
1000 loops, best of 3: 869 µs per loop
In [109]: %timeit (pd.Series(arr).rank(method="min")-1).values
1000 loops, best of 3: 1.17 ms per loop
In [110]: %timeit rank_repeat_based_generic(arr)
1000 loops, best of 3: 1.76 ms per loop
I've written a function for the same purpose. It uses pure python and numpy only. Please have a look. I put comments as well.
def my_argsort(array):
    # this type conversion lets us work with python lists and pandas series
    array = np.array(array)
    # create mapping for unique values
    # it's a dictionary where keys are values from the array and
    # values are desired indices
    unique_values = list(set(array))
    mapping = dict(zip(unique_values, np.argsort(unique_values)))
    # apply mapping to our array
    # np.vectorize works similarly to map(), and can work with dictionaries
    array = np.vectorize(mapping.get)(array)
    return array
Hope that helps.
Complex solutions are unnecessary for this problem.
> ary = np.sort([1, 1, 1, 2, 2, 3, 3, 3, 3]) # or anything; must be sorted.
> a = np.diff(ary).cumsum(); a
array([0, 0, 1, 1, 2, 2, 2, 2])
> b = np.r_[0, a]; b # dense ranks
array([0, 0, 0, 1, 1, 2, 2, 2, 2])
> c = np.flatnonzero(ary[1:] != ary[:-1])
> np.r_[0, 1 + c][b] # ties get first open rank
array([0, 0, 0, 3, 3, 5, 5, 5, 5])
In order to find the index of the smallest value, I can use argmin:
import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
print A.argmin() # 4 because A[4] = 0.1
But how can I find the indices of the k-smallest values?
I'm looking for something like:
print A.argmin(numberofvalues=3)
# [4, 0, 7] because A[4] <= A[0] <= A[7] <= all other A[i]
Note: in my use case A has between ~ 10 000 and 100 000 values, and I'm interested for only the indices of the k=10 smallest values. k will never be > 10.
Use np.argpartition. It does not sort the entire array. It only guarantees that the kth element is in sorted position and all smaller elements will be moved before it. Thus the first k elements will be the k-smallest elements.
import numpy as np
A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3
idx = np.argpartition(A, k)
print(idx)
# [4 0 7 3 1 2 6 5]
Indexing A with the first k entries of idx gives the k-smallest values. Note that these may not be in sorted order.
print(A[idx[:k]])
# [ 0.1 1. 1.5]
To obtain the k-largest values use
idx = np.argpartition(A, -k)
# [4 0 7 3 1 2 6 5]
A[idx[-k:]]
# [ 9. 17. 17.]
WARNING: Do not (re)use idx = np.argpartition(A, k); A[idx[-k:]] to obtain the k-largest.
That won't always work. For example, these are NOT the 3 largest values in x:
x = np.array([100, 90, 80, 70, 60, 50, 40, 30, 20, 10, 0])
idx = np.argpartition(x, 3)
x[idx[-3:]]
array([ 70, 80, 100])
Here is a comparison against np.argsort, which also works but just sorts the entire array to get the result.
In [2]: x = np.random.randn(100000)
In [3]: %timeit idx0 = np.argsort(x)[:100]
100 loops, best of 3: 8.26 ms per loop
In [4]: %timeit idx1 = np.argpartition(x, 100)[:100]
1000 loops, best of 3: 721 µs per loop
In [5]: np.alltrue(np.sort(np.argsort(x)[:100]) == np.sort(np.argpartition(x, 100)[:100]))
Out[5]: True
You can use numpy.argsort with slicing
>>> import numpy as np
>>> A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
>>> np.argsort(A)[:3]
array([4, 0, 7], dtype=int32)
For n-dimensional arrays, this function works well. The indices are returned in a form that can be used directly to index the array. If you want a plain list of index tuples instead, transpose the array before making the list.
To retrieve the k largest, simply pass in -k.
def get_indices_of_k_smallest(arr, k):
    idx = np.argpartition(arr.ravel(), k)
    return tuple(np.array(np.unravel_index(idx, arr.shape))[:, range(min(k, 0), max(k, 0))])
    # if you want it in a list of indices . . .
    # return np.array(np.unravel_index(idx, arr.shape))[:, range(k)].transpose().tolist()
Example:
r = np.random.RandomState(1234)
arr = r.randint(1, 1000, 2 * 4 * 6).reshape(2, 4, 6)
indices = get_indices_of_k_smallest(arr, 4)
indices
# (array([1, 0, 0, 1], dtype=int64),
# array([3, 2, 0, 1], dtype=int64),
# array([3, 0, 3, 3], dtype=int64))
arr[indices]
# array([ 4, 31, 54, 77])
%%timeit
get_indices_of_k_smallest(arr, 4)
# 17.1 µs ± 651 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
numpy.partition(your_array, k) is an alternative when you only need the k smallest values rather than their indices: after partitioning, the first k elements are the k smallest (though not necessarily sorted among themselves).
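A quick sketch of that, reusing the array from the question:

import numpy as np

A = np.array([1, 7, 9, 2, 0.1, 17, 17, 1.5])
k = 3
smallest = np.partition(A, k)[:k]   # the k smallest values, in no particular order
print(smallest)                     # some ordering of 0.1, 1.0 and 1.5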