Numpy Vectorization of sliding-window operation

Numpy Vectorization of sliding-window operation - python

I have the following numpy arrays:
arr_1 = [[1,2],[3,4],[5,6]] # 3 X 2
arr_2 = [[0.5,0.6],[0.7,0.8],[0.9,1.0],[1.1,1.2],[1.3,1.4]] # 5 X 2
arr_1 is clearly a 3 X 2 array, whereas arr_2 is a 5 X 2 array.
Now without looping, I want to element-wise multiply arr_1 and arr_2 so that I apply a sliding window technique (window size 3) to arr_2.
Example:
Multiplication 1: np.multiply(arr_1,arr_2[:3,:])
Multiplication 2: np.multiply(arr_1,arr_2[1:4,:])
Multiplication 3: np.multiply(arr_1,arr_2[2:5,:])
I want to do this in some sort of a matrix multiplication form to make it faster than my current solution which is of the form:
for i in (2):
np.multiply(arr_1,arr_2[i:i+3,:])
So if the number of rows in arr_2 are large (of the order of tens of thousands), this solution doesn't really scale very well.
Any help would be much appreciated.

We can use NumPy broadcasting to create those sliding windowed indices in a vectorized manner. Then, we can simply index into arr_2 with those to create a 3D array and perform element-wise multiplication with 2D array arr_1, which in turn will bring on broadcasting again.
So, we would have a vectorized implementation like so -
W = arr_1.shape[0] # Window size
idx = np.arange(arr_2.shape[0]-W+1)[:,None] + np.arange(W)
out = arr_1*arr_2[idx]
Runtime test and verify results -
In [143]: # Input arrays
...: arr_1 = np.random.rand(3,2)
...: arr_2 = np.random.rand(10000,2)
...:
...: def org_app(arr_1,arr_2):
...: W = arr_1.shape[0] # Window size
...: L = arr_2.shape[0]-W+1
...: out = np.empty((L,W,arr_1.shape[1]))
...: for i in range(L):
...: out[i] = np.multiply(arr_1,arr_2[i:i+W,:])
...: return out
...:
...: def vectorized_app(arr_1,arr_2):
...: W = arr_1.shape[0] # Window size
...: idx = np.arange(arr_2.shape[0]-W+1)[:,None] + np.arange(W)
...: return arr_1*arr_2[idx]
...:
In [144]: np.allclose(org_app(arr_1,arr_2),vectorized_app(arr_1,arr_2))
Out[144]: True
In [145]: %timeit org_app(arr_1,arr_2)
10 loops, best of 3: 47.3 ms per loop
In [146]: %timeit vectorized_app(arr_1,arr_2)
1000 loops, best of 3: 1.21 ms per loop

This is a nice case to test the speed of as_strided and Divakar's broadcasting.
In [281]: %%timeit
...: out=np.empty((L,W,arr1.shape[1]))
...: for i in range(L):
...: out[i]=np.multiply(arr1,arr2[i:i+W,:])
...:
10 loops, best of 3: 48.9 ms per loop
In [282]: %%timeit
...: idx=np.arange(L)[:,None]+np.arange(W)
...: out=arr1*arr2[idx]
...:
100 loops, best of 3: 2.18 ms per loop
In [283]: %%timeit
...: arr3=as_strided(arr2, shape=(L,W,2), strides=(16,16,8))
...: out=arr1*arr3
...:
1000 loops, best of 3: 805 µs per loop
Create Numpy array without enumerating array for more of a comparison of these methods.

Related

Multiple cumulative sum within a numpy array

I'm sort of newbie in numpy so I'm sorry if this question was already asked. I'm looking for a vectorization solution which enable to run multiple cumsum of different size within a one dimension numpy array.
my_vector=np.array([1,2,3,4,5])
size_of_groups=np.array([3,2])
I would like something like
np.cumsum.group(my_vector,size_of_groups)
[1,3,6,4,9]
I do not want a solution with loops. Either numpy functions or numpy operations.

Not sure about numpy, but pandas can do this pretty easily with a groupby + cumsum:
import pandas as pd
s = pd.Series(my_vector)
s.groupby(s.index.isin(size_of_groups.cumsum()).cumsum()).cumsum()
0 1
1 3
2 6
3 4
4 9
dtype: int64

Here's a vectorized solution -
def intervaled_cumsum(ar, sizes):
# Make a copy to be used as output array
out = ar.copy()
# Get cumumlative values of array
arc = ar.cumsum()
# Get cumsumed indices to be used to place differentiated values into
# input array's copy
idx = sizes.cumsum()
# Place differentiated values that when cumumlatively summed later on would
# give us the desired intervaled cumsum
out[idx[0]] = ar[idx[0]] - arc[idx[0]-1]
out[idx[1:-1]] = ar[idx[1:-1]] - np.diff(arc[idx[:-1]-1])
return out.cumsum()
Sample run -
In [114]: ar = np.array([1,2,3,4,5,6,7,8,9,10,11,12])
...: sizes = np.array([3,2,2,3,2])
In [115]: intervaled_cumsum(ar, sizes)
Out[115]: array([ 1, 3, 6, 4, 9, 6, 13, 8, 17, 27, 11, 23])
Benchmarking
Other approach(es) -
# #cᴏʟᴅsᴘᴇᴇᴅ's solution
import pandas as pd
def pandas_soln(my_vector, sizes):
s = pd.Series(my_vector)
return s.groupby(s.index.isin(sizes.cumsum()).cumsum()).cumsum().values
The given sample used two intervals of lengths 2 and 3 Keeping that and simply giving it more number of groups for timing purpose.
Timings -
In [146]: N = 10000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [147]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10000 loops, best of 3: 178 µs per loop
1000 loops, best of 3: 1.82 ms per loop
In [148]: N = 100000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [149]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
100 loops, best of 3: 3.91 ms per loop
100 loops, best of 3: 17.3 ms per loop
In [150]: N = 1000000 # number of groups
...: np.random.seed(0)
...: sizes = np.random.randint(2,4,(N))
...: ar = np.random.randint(0,N,sizes.sum())
In [151]: %timeit intervaled_cumsum(ar, sizes)
...: %timeit pandas_soln(ar, sizes)
10 loops, best of 3: 31.6 ms per loop
1 loop, best of 3: 357 ms per loop

Here is an unconventional solution. Not very fast, though. (Even a bit slower than pandas).
>>> from scipy import linalg
>>>
>>> N = len(my_vector)
>>> D = np.repeat((*zip((1,-1)),), N, axis=1)
>>> D[1, np.cumsum(size_of_groups) - 1] = 0
>>>
>>> linalg.solve_banded((1, 0), D, my_vector)
array([1., 3., 6., 4., 9.])

Vectorizing nearest neighbor computation

I have the following function which is returning an array calculating the nearest neighbor:
def p_batch(U,X,Y):
return [nearest(u,X,Y) for u in U]
I would like to replace the for loop using numpy. I've been looking into numpy.vectorize() as this seems to be the right approach, but I can't get it to work. This is what I've tried so far:
def n_batch(U,X,Y):
vbatch = np.vectorize(nearest)
return vbatch(U,X,Y)
Can anyone give me a hint where I went wrong?
Edit:
Implementation of nearest:
def nearest(u,X,Y):
return Y[np.argmin(np.sqrt(np.sum(np.square(np.subtract(u,X)),axis=1)))]
Function for U,X,Y (with M=20,N=100,d=50):
U = numpy.random.mtrand.RandomState(123).uniform(0,1,[M,d])
X = numpy.random.mtrand.RandomState(456).uniform(0,1,[N,d])
Y = numpy.random.mtrand.RandomState(789).randint(0,2,[N])

Approach #1
You could use Scipy's cdist to generate all those euclidean distances and then simply use argmin and index into Y -
from scipy.spatial.distance import cdist
out = Y[cdist(U,X).argmin(1)]
Sample run -
In [76]: M,N,d = 5,6,3
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
# Using a loop comprehension to verify values
In [77]: [nearest(U[i], X,Y) for i in range(len(U))]
Out[77]: [1, 0, 0, 1, 1]
In [78]: Y[cdist(U,X).argmin(1)]
Out[78]: array([1, 0, 0, 1, 1])
Approach #2
Another way with sklearn.metrics.pairwise_distances_argmin_min to give us those argmin indices directly -
from sklearn.metrics import pairwise
Y[pairwise.pairwise_distances_argmin(U,X)]
Runtime test with M=20,N=100,d=50 -
In [90]: M,N,d = 20,100,50
...: U = np.random.mtrand.RandomState(123).uniform(0,1,[M,d])
...: X = np.random.mtrand.RandomState(456).uniform(0,1,[N,d])
...: Y = np.random.mtrand.RandomState(789).randint(0,2,[N])
...:
Testing between cdist and pairwise_distances_argmin -
In [91]: %timeit cdist(U,X).argmin(1)
10000 loops, best of 3: 55.2 µs per loop
In [92]: %timeit pairwise.pairwise_distances_argmin(U,X)
10000 loops, best of 3: 90.6 µs per loop
Timings against loopy version -
In [93]: %timeit [nearest(U[i], X,Y) for i in range(len(U))]
1000 loops, best of 3: 298 µs per loop
In [94]: %timeit Y[cdist(U,X).argmin(1)]
10000 loops, best of 3: 55.6 µs per loop
In [95]: %timeit Y[pairwise.pairwise_distances_argmin(U,X)]
10000 loops, best of 3: 91.1 µs per loop
In [96]: 298.0/55.6 # Speedup with cdist over loopy one
Out[96]: 5.359712230215827

How to generate a cyclic sequence of numbers without using looping?

I want to generate a cyclic sequence of numbers like: [A B C A B C] with arbitrary length N I tried:
import numpy as np
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.tile(x,N//3) # Repeats the sequence N//3 times
return y
but the problem with my code is if i enter any integer which ain't dividable by three then the results would have smaller length (N) than I excpected. I know this is very newbish question but i really got stuck

You can just use numpy.resize
x = np.array([1.0, 2.0, 3.0])
y = np.resize(x, 13)
y
Out[332]: array([ 1., 2., 3., 1., 2., 3., 1., 2., 3., 1., 2., 3., 1.])
WARNING: This is answer does not extend to 2D, as resize flattens the array before repeating it.

Approach #1 : Here'e one approach to handle generic sequences using modulus to generate those cyclic indices -
def cyclic_seq(x, N):
return np.take(x, np.mod(np.arange(N),len(x)))
Approach #2 : For performance, here's another method that tiles to the multiple of the max number of intervals and then making use of slicing to select the first N elements -
def cyclic_seq_v2(x, N):
return np.tile(x,(N+N-1)//len(x))[:N]
Sample runs -
In [81]: cyclic_seq([6,9,2,1,7],14)
Out[81]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
In [82]: cyclic_seq_v2([6,9,2,1,7],14)
Out[82]: array([6, 9, 2, 1, 7, 6, 9, 2, 1, 7, 6, 9, 2, 1])
Runtime test
In [327]: x = np.random.randint(0,9,(3))
In [328]: %timeit np.resize(x, 10000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # #Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
1000 loops, best of 3: 296 µs per loop
10000 loops, best of 3: 185 µs per loop
10000 loops, best of 3: 120 µs per loop
10000 loops, best of 3: 28.7 µs per loop
In [329]: x = np.random.randint(0,9,(30))
In [330]: %timeit np.resize(x, 10000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),10000)) # #Chris soln
...: %timeit cyclic_seq(x,10000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,10000) # Approach #2 from this post
...:
10000 loops, best of 3: 38.8 µs per loop
10000 loops, best of 3: 101 µs per loop
10000 loops, best of 3: 115 µs per loop
100000 loops, best of 3: 13.2 µs per loop
In [331]: %timeit np.resize(x, 100000) # #Daniel Forsman's solution
...: %timeit list(itertools.islice(itertools.cycle(x),100000)) # #Chris soln
...: %timeit cyclic_seq(x,100000) # Approach #1 from this post
...: %timeit cyclic_seq_v2(x,100000) # Approach #2 from this post
...:
1000 loops, best of 3: 297 µs per loop
1000 loops, best of 3: 942 µs per loop
1000 loops, best of 3: 1.13 ms per loop
10000 loops, best of 3: 88.3 µs per loop
On performance, approach #2 seems to be working quite well.

First over-length it (using math.ceil) then resize it after tile
import numpy as np
import math
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.tile(x, math.ceil(N / 3.0))
y = np.resize(y, N)
return y
After taking Daniel Forsman's suggestion, it can be simplified as
import numpy as np
def cyclic(N):
x = np.array([1.0,2.0,3.0]) # The main sequence
y = np.resize(x, N)
return y
because np.resize automatically tiles the response in 1D

You can use itertools.cycle, an infinite iterator, for this:
>>> import itertools
>>> it = itertools.cycle([1,2,3])
>>> next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
1
You get a specific length of sequence (N), combine it with itertools.islice:
>>> list(itertools.islice(itertools.cycle([1,2,3]),11))
[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2]
EDIT: as you can see in Divakar's benchmark, this approach is generally intermediate in terms of speed compared to other answers. I recommend when this solution when you want an iterator returned rather than a list or numpy array.

You can use itertools cycle for that.
In [3]: from itertools import cycle
In [4]: for x in cycle(['A','B','C']):
...: print(x)
...:
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
C
A
B
Edit:
If you want to implement it with out loops, you are going to need recursive functions. Solutions based on itertools cycle and the like are just hiding the loops behind the imported function.
In [5]: def repeater(arr, n):
...: yield arr[0]
...: yield arr[1]
...: yield arr[2]
...: if n == 0:
...: yield StopIteration
...: else:
...: yield from repeater(arr, n-1)
...:

efficiently fill a tensor in numpy

I have a numpy array X of size (n, m) and of type np.uint8 (so it only contains values in [0, 255]). I also have a mapping f from [0, 255] to [0, 3].
I want to create an array Y of shape (4, n, m) such that y_{k, i, j} = 1 if k == f(x_{i, j}) and 0 otherwise. For now, I do it like this:
Y = np.zeros((4, n, m))
for i in range(256):
Y[f(i), X == i] = 1
But this is super slow, and I'm not able to find a more efficient way to do it. Any ideas?

Assuming f could operate on all iterating values in one-go, you can use broadcasting -
Yout = (f(X) == np.arange(4)[:,None,None]).astype(int)
Runtime test and verification -
In [35]: def original_app(X,n,m):
...: Y = np.zeros((4, n, m))
...: for i in range(256):
...: Y[f(i), X == i] = 1
...: return Y
...:
In [36]: # Setup Inputs
...: n,m = 2000,2000
...: X = np.random.randint(0,255,(n,m)).astype('uint8')
...: v = np.random.randint(4, size=(256,))
...: def f(x):
...: return v[x]
...:
In [37]: Y = original_app(X,n,m)
...: Yout = (f(X) == np.arange(4)[:,None,None]).astype(int)
...:
In [38]: np.allclose(Yout,Y) # Verify
Out[38]: True
In [39]: %timeit original_app(X,n,m)
1 loops, best of 3: 3.77 s per loop
In [40]: %timeit (f(X) == np.arange(4)[:,None,None]).astype(int)
10 loops, best of 3: 74.5 ms per loop

The mix of scalar indexing and boolean appears to be hurting your speed:
In [706]: %%timeit
...: Y=np.zeros((4,3,4))
...: for i in range(256):
...: Y[f(i), X==i]+=1
...:
100 loops, best of 3: 12.5 ms per loop
In [722]: %%timeit
...: Y=np.zeros((4,3,4))
...: for i in range(256):
...: I,J=np.where(X==i)
...: Y[f(i),I,J] = 1
...:
100 loops, best of 3: 8.55 ms per loop
This is for
X=np.arange(12,dtype=np.uint8).reshape(3,4)
def f(i):
return i%4
In this case, the f(i) is not a major time consumer:
In [718]: timeit K=[f(i) for i in range(256)]
10000 loops, best of 3: 120 µs per loop
but getting the X==i indexes is slow
In [720]: timeit K=[X==i for i in range(256)]
1000 loops, best of 3: 1.29 ms per loop
In [721]: timeit K=[np.where(X==i) for i in range(256)]
100 loops, best of 3: 2.73 ms per loop
We need to rethink the X==i part of the mapping, rather than the f(i) part.
=====================
Flattening the last 2 dimensions helps;
In [780]: %%timeit
...: X1=X.ravel()
...: Y=np.zeros((4,12))
...: for i in range(256):
...: Y[f(i),X1==i]=1
...: Y.shape=(4,3,4)
...:
100 loops, best of 3: 3.16 ms per loop

Cython numpy array indexer speed improvement

I wrote the following code in pure python, the description of what it does is in the docstrings:
import numpy as np
from scipy.ndimage.measurements import find_objects
import itertools
def alt_indexer(arr):
"""
Returns a dictionary with the elements of arr as key
and the corresponding slice as value.
Note:
This function assumes arr is sorted.
Example:
>>> arr = [0,0,3,2,1,2,3]
>>> loc = _indexer(arr)
>>> loc
{0: (slice(0L, 2L, None),),
1: (slice(2L, 3L, None),),
2: (slice(3L, 5L, None),),
3: (slice(5L, 7L, None),)}
>>> arr = sorted(arr)
>>> arr[loc[3][0]]
[3, 3]
>>> arr[loc[2][0]]
[2, 2]
"""
unique, counts = np.unique(arr, return_counts=True)
labels = np.arange(1,len(unique)+1)
labels = np.repeat(labels,counts)
slicearr = find_objects(labels)
index_dict = dict(itertools.izip(unique,slicearr))
return index_dict
Since i will be indexing very large arrays, i wanted to speed up the operations by using cython, here is the equivalent implementation:
import numpy as np
cimport numpy as np
def _indexer(arr):
cdef tuple unique_counts = np.unique(arr, return_counts=True)
cdef np.ndarray[np.int32_t,ndim=1] unique = unique_counts[0]
cdef np.ndarray[np.int32_t,ndim=1] counts = unique_counts[1].astype(int)
cdef int start=0
cdef int end
cdef int i
cdef dict d ={}
for i in xrange(len(counts)):
if i>0:
start = counts[i-1]+start
end=counts[i]+start
d[unique[i]]=slice(start,end)
return d
Benchmarks
I compared the time it took to complete both operations:
In [26]: import numpy as np
In [27]: rr=np.random.randint(0,1000,1000000)
In [28]: %timeit _indexer(rr)
10 loops, best of 3: 40.5 ms per loop
In [29]: %timeit alt_indexer(rr) #pure python
10 loops, best of 3: 51.4 ms per loop
As you can see the speed improvements are minimal. I do realize that my code was already partly optimized since i used numpy.
Is there a bottleneck that i am not aware of?
Should i not use np.unique and write my own implementation instead?
Thanks.

With arr having non-negative, not very large and many repeated int numbers, here's an alternative approach using np.bincount to simulate the same behavior as np.unique(arr, return_counts=True) -
def unique_counts(arr):
counts = np.bincount(arr)
mask = counts!=0
unique = np.nonzero(mask)[0]
return unique, counts[mask]
Runtime test
Case #1 :
In [83]: arr = np.random.randint(0,100,(1000)) # Input array
In [84]: unique, counts = np.unique(arr, return_counts=True)
...: unique1, counts1 = unique_counts(arr)
...:
In [85]: np.allclose(unique,unique1)
Out[85]: True
In [86]: np.allclose(counts,counts1)
Out[86]: True
In [87]: %timeit np.unique(arr, return_counts=True)
10000 loops, best of 3: 53.2 µs per loop
In [88]: %timeit unique_counts(arr)
100000 loops, best of 3: 10.2 µs per loop
Case #2:
In [89]: arr = np.random.randint(0,1000,(10000)) # Input array
In [90]: %timeit np.unique(arr, return_counts=True)
1000 loops, best of 3: 713 µs per loop
In [91]: %timeit unique_counts(arr)
10000 loops, best of 3: 39.1 µs per loop
Case #3: Let's run a case with unique having some missing numbers in the min to max range and verify the results against np.unique version as a sanity check. We won't have a lot of repeated numbers in this case and as such isn't expected to be better on performance.
In [98]: arr = np.random.randint(0,10000,(1000)) # Input array
In [99]: unique, counts = np.unique(arr, return_counts=True)
...: unique1, counts1 = unique_counts(arr)
...:
In [100]: np.allclose(unique,unique1)
Out[100]: True
In [101]: np.allclose(counts,counts1)
Out[101]: True
In [102]: %timeit np.unique(arr, return_counts=True)
10000 loops, best of 3: 61.9 µs per loop
In [103]: %timeit unique_counts(arr)
10000 loops, best of 3: 71.8 µs per loop

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numpy Vectorization of sliding-window operation - python

Related

Multiple cumulative sum within a numpy array

Vectorizing nearest neighbor computation

How to generate a cyclic sequence of numbers without using looping?

efficiently fill a tensor in numpy

Cython numpy array indexer speed improvement

Categories

Resources