How can I improve my custom function vectorization using numpy - python

I am new to python, and even more new to vectorization. I have attempted to vectorize a custom similarity function that should return a matrix of pairwise similarities between each row in an input array.
IMPORTS:
import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view
INPUT:
np.random.seed(11)
a = np.array([0, 0, 0, 0, 0, 10, 0, 0, 0, 50, 0, 0, 5, 0, 0, 10])
b = np.array([0, 0, 5, 0, 0, 10, 0, 0, 0, 50, 0, 0, 10, 0, 0, 5])
c = np.array([0, 0, 5, 1, 0, 20, 0, 0, 0, 30, 0, 1, 10, 0, 0, 5])
m = np.array((a,b,c))
OUTPUT:
custom_func(m)
array([[   0,  440, 1903],
       [ 440,    0, 1603],
       [1903, 1603,    0]])
FUNCTION:
def custom_func(arr):
    diffs = 0
    max_k = 6
    for n in range(1, max_k):
        arr1 = np.array([np.sum(i, axis = 1) for i in sliding_window_view(arr, window_shape = n, axis = 1)])
        # np.maximum and np.minimum subtract the max and min elements (element-wise)
        # between two rows, and then the whole of that difference is summed up
        diffs += np.sum((np.array([np.maximum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))])
                         - np.array([np.minimum(arr1[i[0]], arr1[i[1]]) for i in product(np.arange(len(arr1)), np.arange(len(arr1)))])), axis = 1) * n
    diffs = diffs.reshape(len(arr), -1)
    return diffs
The function is quite simple: it sums up the element-wise differences between the max and min of rows over N sliding windows. This function is much faster than what I was using before finding out about vectorization today (for loops and pandas dataframes yay).
My first thought is to figure out a way to find both the minimum and maximum of my arrays in a single pass since I currently THINK it has to do two passes, but I was unable to figure out how. Also there is a for loop in my current function because I need to do this for multiple N sliding windows, and I am not sure how to do this without the loop.
Any help is appreciated!

Here are several optimizations you can apply to the code:
use Numba's JIT to speed up the computation and replace the product call with nested loops
use a more efficient sliding window algorithm (better complexity)
avoid computing product and arange multiple times in the loop
reduce the number of implicit temporary arrays allocated (and of NumPy calls)
do not compute the lower triangular part of diffs since it will always be symmetric (just copy the upper triangular part)
use integer-based indexing rather than slow floating-point indexing
Here is the resulting code:
import numpy as np
from itertools import product
from numpy.lib.stride_tricks import sliding_window_view
import numba as nb
@nb.njit
def custom_func_fast(arr):
    h, w = arr.shape[0], arr.shape[1]
    diffs = np.zeros((h, h), dtype=arr.dtype)
    max_k = 6
    for n in range(1, max_k):
        arr1 = np.empty(shape=(h, w-n+1), dtype=arr.dtype)
        for i in range(h):
            # Efficient sliding window algorithm
            assert w >= n
            s = np.sum(arr[i, 0:n])
            arr1[i, 0] = s
            for j in range(n, w):
                s -= arr[i, j-n]
                s += arr[i, j]
                arr1[i, j-n+1] = s
        # Efficient distance matrix computation
        for i in range(h):
            for j in range(i+1, h):
                s = 0
                for k in range(w-n+1):
                    s += np.abs(arr1[i,k] - arr1[j,k])
                diffs[i, j] += s * n
    # Fill the lower triangular part
    for i in range(h):
        for j in range(i):
            diffs[i, j] = diffs[j, i]
    return diffs
The resulting code is 290 times faster on the example input array on my machine.
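If you want to check a speedup like that yourself, here is a minimal timing sketch (assuming both custom_func and custom_func_fast are defined as above; m_big is just an arbitrary larger test array, and the warm-up call keeps Numba's compilation time out of the measurement):
import timeit
import numpy as np

m_big = np.random.randint(0, 50, size=(20, 200))  # arbitrary larger test input

custom_func_fast(m_big)  # warm-up call so the JIT compilation is not timed

t_ref = timeit.timeit(lambda: custom_func(m_big), number=3)
t_fast = timeit.timeit(lambda: custom_func_fast(m_big), number=3)
print(f"speedup: {t_ref / t_fast:.0f}x")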

You can start by removing the first list comprehension:
arr1 = sliding_window_view(arr, window_shape = n, axis = 1).sum(axis=2)
I'm not going to touch that long diffs line :(
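If you do want to go all the way, the pairwise part can be broadcast too: taking the maximum minus the minimum of two rows is the same as their element-wise absolute difference, so the whole function collapses to something like this sketch (the name custom_func_broadcast is just for illustration, and I haven't benchmarked it against the Numba version):
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def custom_func_broadcast(arr, max_k=6):
    h = arr.shape[0]
    diffs = np.zeros((h, h), dtype=arr.dtype)
    for n in range(1, max_k):
        # window sums of length n for every row, shape (h, w-n+1)
        arr1 = sliding_window_view(arr, window_shape=n, axis=1).sum(axis=2)
        # max - min over a pair of rows is just the absolute difference
        diffs += np.abs(arr1[:, None, :] - arr1[None, :, :]).sum(axis=2) * n
    return diffs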

'Lossy' cumsum in numpy

I have an array a of length N and need to implement the following operation:
b[n] = sum_{i=0}^{n} p^{n-i} * a[i]
With p in [0..1]. This equation is a lossy sum, where the first indexes in the sum are weighted by a greater loss (p^{n-i}) than the last ones. The last index (i=n) is always weighted by 1. If p = 1, then the operation is a simple cumsum:
b = np.cumsum(a)
If p != 1, I can implement this operation in a cpu-inefficient way:
b = np.empty(np.shape(a))
# I'm using the (-1,-1,-1) idiom for reversed ranges
p_vec = np.power(p, np.arange(N-1, 0-1, -1))
# p_vec[0] = p^{N-1}, p_vec[-1] = 1
for n in range(N):
    b[n] = np.sum(a[:n+1]*p_vec[-(n+1):])
Or in a memory-inefficient but vectorized way (IMO it is CPU-inefficient too, since a lot of work is wasted):
a_idx = np.reshape(np.arange(N+1), (1, N+1)) - np.reshape(np.arange(N-1, 0-1, -1), (N, 1))
a_idx = np.maximum(0, a_idx)
# For N=4, a_idx looks like this:
# [[0, 0, 0, 0, 1],
#  [0, 0, 0, 1, 2],
#  [0, 0, 1, 2, 3],
#  [0, 1, 2, 3, 4]]
a_ext = np.concatenate(([0], a,), axis=0) # len(a_ext) = N + 1
p_vec = np.power(p, np.arange(N, 0-1, -1)) # len(p_vec) = N + 1
b = np.dot(a_ext[a_idx], p_vec)
Is there a better way to achieve this 'lossy' cumsum?
What you want is an IIR filter; you can use scipy.signal.lfilter(). Here is the code:
Your code:
import numpy as np
N = 10
p = 0.8
np.random.seed(0)
x = np.random.randn(N)
y = np.empty_like(x)
p_vec = np.power(p, np.arange(N-1, 0-1, -1))
for n in range(N):
    y[n] = np.sum(x[:n+1]*p_vec[-(n+1):])
y
the output:
array([1.76405235, 1.81139909, 2.42785725, 4.183179  , 5.21410119,
       3.19400307, 3.50529088, 2.65287549, 2.01908154, 2.02586374])
By using lfilter():
from scipy import signal
y = signal.lfilter([1], [1, -p], x)
print(y)
the output:
array([1.76405235, 1.81139909, 2.42785725, 4.183179  , 5.21410119,
       3.19400307, 3.50529088, 2.65287549, 2.01908154, 2.02586374])
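The reason this works: lfilter([1], [1, -p], x) implements the first-order recurrence y[n] = x[n] + p*y[n-1], which unrolls to exactly the lossy sum above (older terms are damped by increasing powers of p). A sketch of that recurrence written out explicitly, O(N) with no wasted work (lossy_cumsum is just an illustrative name):
import numpy as np

def lossy_cumsum(x, p):
    # y[n] = x[n] + p * y[n-1]  ->  y[n] = sum_{i=0}^{n} p^{n-i} * x[i]
    y = np.empty(len(x))
    acc = 0.0
    for n, v in enumerate(x):
        acc = v + p * acc
        y[n] = acc
    return y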

Randomize part of an array

I'm working on a project involving binary patterns (here np.arrays of 0 and 1).
I'd like to modify a random subset of these and return several altered versions of the pattern where a given fraction of the values have been changed (like map a function to a random subset of an array of fixed size)
ex : take the pattern [0 0 1 0 1] and rate 0.2, return [[0 1 1 0 1] [1 0 1 0 1]]
It seems possible by using auxiliary arrays and iterating with a condition, but is there a "clean" way to do that ?
Thanks in advance !
The map function works on boolean arrays too. You could add the subsample logic to your function, like so:
import numpy as np
rate = 0.2
f = lambda x: np.random.choice((True, x),1,p=[rate,1-rate])[0]
a = np.array([0,0,1,0,1], dtype='bool')
map(f, a)
# This will output array a with on average 20% of the elements changed to "1"
# it can be slightly more or less than 20%, by chance.
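Note that on Python 3 map() is lazy, so you would need to materialize it, e.g. np.array(list(map(f, a))). A vectorized sketch of the same idea draws one uniform random number per element instead of calling the lambda in a Python loop (the variable names here are just for illustration):
import numpy as np

rate = 0.2
a = np.array([0, 0, 1, 0, 1], dtype=bool)

# set each element to True with probability `rate`, keep it otherwise
set_to_one = np.random.random(a.shape) < rate
result = np.where(set_to_one, True, a)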
Or you could rewrite a map function, like so:
import numpy as np
def map_bitarray(f, b, rate):
    '''
    maps function f on a random subset of b
    :param f: the function, should take a binary array of size <= len(b)
    :param b: the binary array
    :param rate: the fraction of elements that will be replaced
    :return: the modified binary array
    '''
    c = np.copy(b)
    num_elem = len(c)
    idx = np.random.choice(range(num_elem), int(num_elem*rate), replace=False)
    c[idx] = f(c[idx])
    return c
f = lambda x: True
b = np.array([0,0,1,0,1], dtype='bool')
map_bitarray(f, b, 0.2)
# This will output array b with exactly 20% of the elements changed to "1"
rate=0.2
repeats=5
seed=[0,0,1,0,1]
realizations=np.tile(seed,[repeats,1]) ^ np.random.binomial(1,rate,[repeats,len(seed)])
Use np.tile() to generate a matrix from the seed row.
np.random.binomial() to generate a binomial mask matrix with your requested rate.
Apply the mask with the xor binary operator ^
EDIT:
Based on @Jared Goguen's comments, if you want to change 20% of the bits, you can elaborate a mask by choosing elements to change randomly:
seed = [1, 0, 1, 0, 1]
rate = 0.2
repeats = 10
mask_list = []
for _ in range(repeats):
    y = np.zeros(len(seed), np.int32)
    y[np.random.choice(len(seed), int(rate*len(seed)), replace=False)] = 1
    mask_list.append(y)
mask = np.vstack(mask_list)
realizations = np.tile(seed, [repeats, 1]) ^ mask
So, there's already an answer that provides sequences where each element has a random transition probability. However, it seems like you might want an exact fraction of the elements to change instead. For example, [1, 0, 0, 1, 0] can change to [1, 1, 0, 1, 0] or [0, 0, 0, 1, 0], but not [1, 1, 1, 1, 0].
The premise, based off of xvan's answer, uses the bit-wise xor operator ^. When a bit is xor'd with 0, its value will not change. When a bit is xor'd with 1, it will flip. From your question, it seems like you want to change len(seq)*rate bits in the sequence. First create a mask which contains len(seq)*rate ones. To get an altered sequence, xor the original sequence with a shuffled version of mask.
Here's a simple, inefficient implementation:
import numpy as np
def edit_sequence(seq, rate, count):
    length = len(seq)
    change = int(length * rate)
    mask = [0]*(length - change) + [1]*change
    return [seq ^ np.random.permutation(mask) for _ in range(count)]

rate = 0.2
seq = np.array([0, 0, 1, 0, 1])
print(edit_sequence(seq, rate, 5))
# [0, 0, 1, 0, 0]
# [0, 1, 1, 0, 1]
# [1, 0, 1, 0, 1]
# [0, 1, 1, 0, 1]
# [0, 0, 0, 0, 1]
I don't really know much about NumPy, so maybe someone with more experience can make this efficient, but the approach seems solid.
Edit: Here's a version that times about 30% faster:
def edit_sequence(seq, rate, count):
    mask = np.zeros(len(seq), dtype=int)
    mask[:int(len(seq)*rate)] = 1
    output = []
    for _ in range(count):
        np.random.shuffle(mask)
        output.append(seq ^ mask)
    return output
It appears that this updated version scales very well with the size of seq and the value of count. Using dtype=bool in seq and mask yields another 50% improvement in the timing.
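A sketch of that boolean variant (the int(...) cast keeps the slice bound an integer, and ^ acts as an element-wise XOR on bool arrays):
import numpy as np

def edit_sequence_bool(seq, rate, count):
    seq = np.asarray(seq, dtype=bool)
    mask = np.zeros(len(seq), dtype=bool)
    mask[:int(len(seq) * rate)] = True
    output = []
    for _ in range(count):
        np.random.shuffle(mask)
        output.append(seq ^ mask)
    return output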

How can I smooth elements of a two-dimensional array with differing gaussian functions in python?

How could I smooth the x[1,3] and x[3,2] elements of the array,
x = np.array([[0,0,0,0,0],[0,0,0,1,0],[0,0,0,0,0],[0,0,1,0,0],[0,0,0,0,0]])
with two two-dimensional gaussian functions of width 1 and 2, respectively? In essence I need a function that allows me to smooth single "point like" array elements with gaussians of differing widths, such that I get an array with smoothly varying values.
I am a little confused with the question you asked and the comments you have posted. It seems to me that you want to use scipy.ndimage.filters.gaussian_filter but I don't understand what you mean by:
[...] gaussian functions with different sigma values to each pixel. [...]
In fact, since you use a 2-dimensional array x the gaussian filter will have 2 parameters. The rule is: one sigma value per dimension rather than one sigma value per pixel.
Here is a short example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage
n = 200 # width/height of the array
m = 1000 # number of points
sigma_y = 3.0
sigma_x = 2.0
# Create input array
x = np.zeros((n, n))
i = np.random.choice(range(0, n * n), size=m)
x[i // n, i % n] = 1.0
# Plot input array
pl.imshow(x, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("array.png")
# Apply gaussian filter
sigma = [sigma_y, sigma_x]
y = sp.ndimage.filters.gaussian_filter(x, sigma, mode='constant')
# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.title("$\sigma_x = " + str(sigma_x) + "\quad \sigma_y = " + str(sigma_y) + "$")
pl.savefig("smooth_array_" + str(sigma_x) + "_" + str(sigma_y) + ".png")
Here is the initial array:
Here are some results for different values of sigma_x and sigma_y:
This makes it possible to properly account for the influence of the second parameter of scipy.ndimage.filters.gaussian_filter.
However, according to the previous quote, you might be more interested in the assignment of different weights to each pixel. In this case, scipy.ndimage.filters.convolve is the function you are looking for. Here is the corresponding example:
import matplotlib.pyplot as pl
import numpy as np
import scipy as sp
import scipy.ndimage
# Arbitrary weights
weights = np.array([[0, 0, 1, 0, 0],
                    [0, 2, 4, 2, 0],
                    [1, 4, 8, 4, 1],
                    [0, 2, 4, 2, 0],
                    [0, 0, 1, 0, 0]],
                   dtype=float)
weights = weights / np.sum(weights[:])
y = sp.ndimage.filters.convolve(x, weights, mode='constant')
# Display filtered array
pl.imshow(y, cmap='Blues', interpolation='nearest')
pl.xlabel("$x$")
pl.ylabel("$y$")
pl.savefig("smooth_array.png")
And the corresponding result:
I hope this will help you.
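Coming back to the exact question (x[1,3] smoothed with a Gaussian of width 1 and x[3,2] with width 2): since the Gaussian filter is linear, one option is to put each point source into its own array, filter each with its own sigma, and sum the results. A minimal sketch:
import numpy as np
from scipy.ndimage import gaussian_filter

x = np.array([[0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]], dtype=float)

# one array per point source, each smoothed with its own width
x1 = np.zeros_like(x)
x1[1, 3] = x[1, 3]
x2 = np.zeros_like(x)
x2[3, 2] = x[3, 2]

y = gaussian_filter(x1, sigma=1) + gaussian_filter(x2, sigma=2)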

Roll rows of a matrix independently

I have a matrix (2d numpy ndarray, to be precise):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
And I want to roll each row of A independently, according to roll values in another array:
r = np.array([2, 0, -1])
That is, I want to do this:
print(np.array([np.roll(row, x) for row, x in zip(A, r)]))
[[0 0 4]
 [1 2 3]
 [0 5 0]]
Is there a way to do this efficiently? Perhaps using fancy indexing tricks?
Sure you can do it using advanced indexing, whether it is the fastest way probably depends on your array size (if your rows are large it may not be):
rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
# Use always a negative shift, so that column_indices are valid.
# (could also use a modulo operation)
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]
result = A[rows, column_indices]
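A quick self-contained check on the example arrays from the question gives the expected result:
import numpy as np

A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
r[r < 0] += A.shape[1]
column_indices = column_indices - r[:, np.newaxis]

print(A[rows, column_indices])
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]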
numpy.lib.stride_tricks.as_strided stricks (abbrev pun intended) again!
Speaking of fancy indexing tricks, there's the infamous - np.lib.stride_tricks.as_strided. The idea/trick would be to get a sliced portion starting from the first column until the second last one and concatenate at the end. This ensures that we can stride in the forward direction as needed to leverage np.lib.stride_tricks.as_strided and thus avoid the need of actually rolling back. That's the whole idea!
Now, in terms of actual implementation we would use scikit-image's view_as_windows to elegantly use np.lib.stride_tricks.as_strided under the hoods. Thus, the final implementation would be -
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r):
    # Concatenate with sliced to cover all rolls
    a_ext = np.concatenate((a, a[:, :-1]), axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext, (1, n))[np.arange(len(r)), (n-r) % n, 0]
Here's a sample run -
In [327]: A = np.array([[4, 0, 0],
     ...:               [1, 2, 3],
     ...:               [0, 0, 5]])
In [328]: r = np.array([2, 0, -1])
In [329]: strided_indexing_roll(A, r)
Out[329]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
Benchmarking
# @seberg's solution
def advindexing_roll(A, r):
    rows, column_indices = np.ogrid[:A.shape[0], :A.shape[1]]
    r[r < 0] += A.shape[1]
    column_indices = column_indices - r[:, np.newaxis]
    return A[rows, column_indices]
Let's do some benchmarking on an array with large number of rows and columns -
In [324]: np.random.seed(0)
...: a = np.random.rand(10000,1000)
...: r = np.random.randint(-1000,1000,(10000))
# @seberg's solution
In [325]: %timeit advindexing_roll(a, r)
10 loops, best of 3: 71.3 ms per loop
# Solution from this post
In [326]: %timeit strided_indexing_roll(a, r)
10 loops, best of 3: 44 ms per loop
In case you want a more general solution (dealing with any shape and any axis), I modified @seberg's solution:
def indep_roll(arr, shifts, axis=1):
    """Apply an independent roll for each dimension of a single axis.

    Parameters
    ----------
    arr : np.ndarray
        Array of any shape.
    shifts : np.ndarray
        How many shifts to apply for each dimension. Shape: `(arr.shape[axis],)`.
    axis : int
        Axis along which elements are shifted.
    """
    arr = np.swapaxes(arr, axis, -1)
    all_idcs = np.ogrid[[slice(0, n) for n in arr.shape]]
    # Convert to a positive shift
    shifts[shifts < 0] += arr.shape[-1]
    all_idcs[-1] = all_idcs[-1] - shifts[:, np.newaxis]
    result = arr[tuple(all_idcs)]
    arr = np.swapaxes(result, -1, axis)
    return arr
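For example, on the array from the question, using the indep_roll defined above (passing a copy of r, since the function modifies shifts in place):
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])

print(indep_roll(A, r.copy(), axis=1))
# [[0 0 4]
#  [1 2 3]
#  [0 5 0]]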
I implemented a pure numpy.lib.stride_tricks.as_strided solution as follows:
from numpy.lib.stride_tricks import as_strided
def custom_roll(arr, r_tup):
    m = np.asarray(r_tup)
    arr_roll = arr[:, [*range(arr.shape[1]), *range(arr.shape[1]-1)]].copy()  # need `copy`
    strd_0, strd_1 = arr_roll.strides
    n = arr.shape[1]
    result = as_strided(arr_roll, (*arr.shape, n), (strd_0, strd_1, strd_1))
    return result[np.arange(arr.shape[0]), (n-m) % n]
A = np.array([[4, 0, 0],
              [1, 2, 3],
              [0, 0, 5]])
r = np.array([2, 0, -1])
out = custom_roll(A, r)
Out[789]:
array([[0, 0, 4],
       [1, 2, 3],
       [0, 5, 0]])
By using a fast Fourier transform we can apply a transformation in the frequency domain and then use the inverse fast Fourier transform to obtain the row shift.
So this is a pure numpy solution that takes only one line:
import numpy as np
from numpy.fft import fft, ifft
# The row shift function using the fast Fourier transform
# rshift(A, r) where A is a 2D array, r the row shift vector
def rshift(A, r):
    return np.real(ifft(fft(A, axis=1)*np.exp(2*1j*np.pi/A.shape[1]*r[:, None]*np.r_[0:A.shape[1]][None, :]), axis=1).round())
This will apply a left shift, but we can simply negate the exponent of the exponential to turn the function into a right shift function:
ifft(fft(...)*np.exp(-2*1j...)
It can be used like that:
# Example:
A = np.array([[1, 2, 3, 4],
              [1, 2, 3, 4],
              [1, 2, 3, 4]])
r = np.array([1,-1,3])
print(rshift(A,r))
Building on divakar's excellent answer, you can apply this logic to a 3D array easily (which was the problem that brought me here in the first place). Here's an example: basically flatten your data, roll it and reshape it after:
import numpy as np
from skimage.util.shape import view_as_windows as viewW

def applyroll_30(cube, threshold=25, offset=500):
    flattened_cube = cube.copy().reshape(cube.shape[0]*cube.shape[1], cube.shape[2])
    roll_matrix = calc_roll_matrix_flattened(flattened_cube, threshold, offset)
    rolled_cube = strided_indexing_roll(flattened_cube, roll_matrix, cube_shape=cube.shape)
    rolled_cube = rolled_cube.reshape(cube.shape[0], cube.shape[1], cube.shape[2])
    return rolled_cube

def calc_roll_matrix_flattened(cube_flattened, threshold, offset):
    """ Calculates the number of positions along the time axis we need to shift
        elements in order to trigger the data.
        We return a 1D numpy array of X*Y elements
    """
    # argmax(...) finds the position in the cube (3d) where we are above threshold
    roll_matrix = np.argmax(cube_flattened > threshold, axis=1) + offset
    # ensure we don't have index out of bound
    roll_matrix[roll_matrix > cube_flattened.shape[1]] = cube_flattened.shape[1]
    return roll_matrix

def strided_indexing_roll(cube_flattened, roll_matrix_flattened, cube_shape):
    # Concatenate with sliced to cover all rolls
    # otherwise we shift in the wrong direction for my application
    roll_matrix_flattened = -1 * roll_matrix_flattened
    a_ext = np.concatenate((cube_flattened, cube_flattened[:, :-1]), axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = cube_flattened.shape[1]
    result = viewW(a_ext, (1, n))[np.arange(len(roll_matrix_flattened)), (n - roll_matrix_flattened) % n, 0]
    result = result.reshape(cube_shape)
    return result
Divakar's answer doesn't do justice to how much more efficient this is on a large cube of data. I've timed it on 400x400x2000 data formatted as int8. An equivalent for-loop takes ~5.5 seconds, Seberg's answer ~3.0 seconds, and strided_indexing_roll ~0.5 seconds.

Iterate over all pairwise combinations of numpy array columns

I have a numpy array of shape
arr.shape = (200, 600, 20).
I want to compute scipy.stats.kendalltau on every pairwise combination of the last two dimensions. For example:
kendalltau(arr[:, 0, 0], arr[:, 1, 0])
kendalltau(arr[:, 0, 0], arr[:, 1, 1])
kendalltau(arr[:, 0, 0], arr[:, 1, 2])
...
kendalltau(arr[:, 0, 0], arr[:, 2, 0])
kendalltau(arr[:, 0, 0], arr[:, 2, 1])
kendalltau(arr[:, 0, 0], arr[:, 2, 2])
...
...
kendalltau(arr[:, 598, 19], arr[:, 599, 19])
such that I cover all combinations of arr[:, i, xi] with arr[:, j, xj] with i < j and xi in [0,20), xj in [0, 20). This is (600 choose 2) * 400 individual calculations, but since each takes about 0.002 s on my machine, it shouldn't take much longer than a day with the multiprocessing module.
What's the best way to go about iterating over these columns (with i<j)? I figure I should avoid something like
for i in range(600):
    for j in range(i+1, 600):
        for xi in range(20):
            for xj in range(20):
What is the most numpythonic way of doing this?
Edit: I changed the title since Kendall Tau isn't really important to the question. I realize I could also do something like
import itertools as it
for i, j in it.combinations(xrange(600), 2):
    for xi, xj in product(xrange(20), xrange(20)):
but there's got to be a better, more vectorized way with numpy.
The general way of vectorizing something like this is to use broadcasting to create the cartesian product of the set with itself. In your case you have an array arr of shape (200, 600, 20), so you would take two views of it:
arr_x = arr[:, :, np.newaxis, np.newaxis, :] # shape (200, 600, 1, 1, 20)
arr_y = arr[np.newaxis, np.newaxis, :, :, :] # shape (1, 1, 200, 600, 20)
The above two lines have been expanded for clarity, but I would normally write the equivalent:
arr_x = arr[:, :, None, None]
arr_y = arr
If you have a vectorized function, f, that does broadcasting on all but the last dimension, you could then do:
out = f(arr[:, :, None, None], arr)
And then out would be an array of shape (200, 600, 200, 600), with out[i, j, k, l] holding the value of f(arr[i, j], arr[k, l]). For instance, if you wanted to compute all the pairwise inner products, you could do:
from numpy.core.umath_tests import inner1d
out = inner1d(arr[:, :, None, None], arr)
Unfortunately scipy.stats.kendalltau is not vectorized like this. According to the docs
"If arrays are not 1-D, they will be flattened to 1-D."
So you cannot go about it like this, and you are going to wind up doing Python nested loops, be it explicitly writing them out, using itertools or disguising it under np.vectorize. That's going to be slow, because of the iteration on Python variables, and because you have a Python function per iteration step, which are both expensive actions.
Do note that, when you can go the vectorized way, there is an obvious drawback: if your function is commutative, i.e. if f(a, b) == f(b, a), then you are doing twice the computations needed. Depending on how expensive your actual computation is, this is very often offset by the increase in speed from not having any Python loops or function calls.
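For instance, with a cheap commutative f such as the squared Euclidean distance between columns, the broadcast result is symmetric and you can keep just one triangle afterwards; a sketch with small stand-in dimensions:
import numpy as np

arr = np.random.rand(200, 6, 4)                 # small stand-in for the real array
cols = arr.reshape(arr.shape[0], -1).T          # (24, 200): one row per (i, xi) column

# f(a, b) = sum((a - b)**2) for every pair of columns, via broadcasting
out = ((cols[:, None, :] - cols[None, :, :]) ** 2).sum(axis=-1)   # (24, 24), symmetric

# keep a single copy of each pair (k < l), since f(a, b) == f(b, a)
k, l = np.triu_indices(out.shape[0], k=1)
unique_pairs = out[k, l]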
If you don't want to use recursion you should generally be using itertools.combinations. There is no specific reason (afaik) why this should cause your code to run slower. The computationally-intensive parts are still being handled by numpy. Itertools also has the advantage of readability.
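For reference, a minimal sketch of that combinations-based loop, using small stand-in dimensions so it runs quickly (your real array is (200, 600, 20), which is where multiprocessing would come in):
import itertools as it
import numpy as np
from scipy.stats import kendalltau

arr = np.random.rand(200, 5, 3)                  # stand-in data

n_i, n_x = arr.shape[1], arr.shape[2]
taus = {}
for i, j in it.combinations(range(n_i), 2):      # i < j
    for xi, xj in it.product(range(n_x), repeat=2):
        tau, _ = kendalltau(arr[:, i, xi], arr[:, j, xj])
        taus[(i, xi, j, xj)] = tau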
