For-loop Alternative to 2D & 3D Matrix Multiplication in NumPy

For discussion's sake, I have a 2D matrix (A) of shape 2x2 and a 3D matrix (B) of shape 2x2x10. I am currently looping over the last axis of matrix B and constructing the full matrix one sub-matrix at a time.
import numpy as np
A = np.random.random((2,2))
B = np.random.random((2,2,10))
C = np.zeros_like(B)
for i in range(B.shape[-1]):
    C[:,:,i] = A @ B[:, :, i]
In reality, my matrices are much larger than this and I know there must be something more efficient than a for loop. I have looked at a couple of prior questions where the solution involves using np.tensordot or np.einsum, but frankly, I don't think I am using it right.
# Basic
C_basic = A @ B
print(f'Basic {np.allclose(C, C_basic)}') # False
# Einsum
C_einsum = np.einsum('ij, jik-> ijk', A, B)
print(f'np.einsum {np.allclose(C, C_einsum)}') # False
# Newaxis
C_newaxis = A[np.newaxis, ...] @ B
print(f'np.newaxis {np.allclose(C, C_newaxis)}') # False
# Swapaxes
C_swapaxes = A @ np.swapaxes(B, 0, 2)
C_swapaxes = np.swapaxes(C_swapaxes, 0, 2)
print(f'np.swapaxes {np.allclose(C, C_swapaxes)}') # False

Here are a few possibilities:
import numpy as np

A = np.random.random((2,2))
B = np.random.random((2,2,10))
C = np.zeros_like(B)
for i in range(B.shape[-1]):
    C[:,:,i] = A @ B[:, :, i]

Cs = [np.einsum('ij,jkl', A, B),
      np.tensordot(A, B, ((-1,), (0,))),
      (A @ B.reshape(len(B), -1)).reshape(-1, *B.shape[1:]),
      np.moveaxis(A @ np.moveaxis(B, -1, 0), 0, -1),
      (A @ B.transpose(2,0,1)).transpose(1,2,0),
      np.inner(B.T, A).T,
      (B.T @ A.T).T]
print([np.allclose(C, Ci) for Ci in Cs])
prints:
[True, True, True, True, True, True, True]
These are, however, not 100% equivalent: for example, the first three are C-contiguous, the last two Fortran-contiguous, and the middle two neither.
You can inspect using:
for Ci in Cs:
    print(Ci.flags)
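If downstream code cares about memory layout, one option (a minimal sketch, not from the original answer) is to normalize the result explicitly; np.ascontiguousarray and np.asfortranarray copy only when the layout actually differs:
C_any = np.moveaxis(A @ np.moveaxis(B, -1, 0), 0, -1)  # neither C- nor F-contiguous
C_c = np.ascontiguousarray(C_any)  # force a C-contiguous copy
C_f = np.asfortranarray(C_any)     # force a Fortran-contiguous copy
print(C_c.flags['C_CONTIGUOUS'], C_f.flags['F_CONTIGUOUS'])  # True True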

Related

How to index a multidimensional numpy array with a number of 1d boolean arrays?

Assume that I have a numpy array A with n dimensions, which might be very large, and assume that I have k 1-dimensional boolean masks M1, ..., Mk.
I would like to extract from A an n-dimensional array B which contains all the elements of A located at indices where the "outer-AND" of all the masks is True.
...but I would like to do this without first forming the (possibly very large) "outer-AND" of all the masks, and without extracting the specified elements one axis at a time, creating (possibly many) intermediate copies in the process.
The example below demonstrates the two ways of extracting the elements from A just described above:
from functools import reduce
import numpy as np

m = 100
for _ in range(m):
    n = np.random.randint(0, 10)
    k = np.random.randint(0, n + 1)
    A_shape = tuple(np.random.randint(0, 10, n))
    A = np.random.uniform(-1, 1, A_shape)
    M_lst = [np.random.randint(0, 2, dim).astype(bool) for dim in A_shape]

    # creating shape of B:
    B_shape = tuple(map(np.count_nonzero, M_lst)) + A_shape[len(M_lst):]
    # size of B:
    B_size = np.prod(B_shape)

    # --- USING "OUTER-AND" OF ALL MASKS --- #
    # creating "outer-AND" of all masks:
    M = reduce(np.bitwise_and, (np.expand_dims(M, tuple(np.r_[:i, i+1:n])) for i, M in enumerate(M_lst)), True)
    # extracting elements from A and reshaping to the correct shape:
    B1 = A[M].reshape(B_shape)
    # checking that the correct number of elements was extracted
    assert B1.size == B_size
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY VERY LARGE OUTER-AND OF ALL THE MASKS!

    # --- USING ONE MASK AT A TIME --- #
    B2 = A
    for i, M in enumerate(M_lst):
        B2 = B2[tuple(slice(None) for _ in range(i)) + (M,)]
    assert B2.size == np.prod(B_shape)
    assert B2.shape == B_shape
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY LARGE NUMBER OF POSSIBLY LARGE INTERMEDIATE COPIES!
    assert np.all(B1 == B2)

    # EDIT 1:
    # USING np.ix_ AS SUGGESTED BY Chrysophylaxs
    i = np.ix_(*M_lst)
    B3 = A[i]
    assert B3.shape == B_shape
    assert B3.size == B_size
    assert np.prod(list(map(np.size, i))) == B_size

print(f'All three methods worked all {m} times')
Is there a smarter (more efficient) way to do this, possibly using an existing numpy function?
IIUC, you're looking for np.ix_; an example:
import numpy as np
arr = np.arange(60).reshape(3, 4, 5)
x = [True, False, True]
y = [False, True, True, False]
z = [False, True, False, True, False]
out = arr[np.ix_(x, y, z)]
out:
array([[[ 6,  8],
        [11, 13]],

       [[46, 48],
        [51, 53]]])

Numba-compatible implementation of np.tile?

I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).
I'm pretty convinced my last significant performance bottleneck is in performing the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:
import numpy as np
from numba import jit, njit, prange
# cp (CuPy) and hasGPU are defined elsewhere in the original code

@njit  # Row dependencies mean this can't be parallel
def yCumSum(a):
    """
    Numba-based computation of y-direction
    cumulative sum. Can't be parallel!
    """
    out = np.empty_like(a)
    out[0, :] = a[0, :]
    for i in prange(1, a.shape[0]):
        out[i, :] = a[i, :] + out[i - 1, :]
    return out

@njit(parallel=True)
def xCumSum(a):
    """
    Numba-based parallel computation
    of x-direction cumulative sum
    """
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i, :] = np.cumsum(a[i, :])
    return out

@jit
def _boxFilter(m, r, gpu=hasGPU):
    if gpu:
        m = cp.asnumpy(m)
    out = __boxfilter__(m, r)
    if gpu:
        return cp.asarray(out)
    return out

@jit(fastmath=True)
def __boxfilter__(m, r):
    """
    Fast box filtering implementation, O(1) time.

    Parameters
    ----------
    m: a 2-D matrix data normalized to [0.0, 1.0]
    r: radius of the window considered

    Return
    -----------
    The filtered matrix m'.
    """
    # H: height, W: width
    H, W = m.shape
    # the output matrix m'
    mp = np.empty(m.shape)

    # cumulative sum over y axis
    ySum = yCumSum(m)  # np.cumsum(m, axis=0)
    # copy the accumulated values of the windows in y
    mp[0:r+1, :] = ySum[r:(2*r)+1, :]
    # differences in y axis
    mp[r+1:H-r, :] = ySum[(2*r)+1:, :] - ySum[:H-(2*r)-1, :]
    mp[-r:, :] = np.tile(ySum[-1, :], (r, 1)) - ySum[H-(2*r)-1:H-r-1, :]

    # cumulative sum over x axis
    xSum = xCumSum(mp)  # np.cumsum(mp, axis=1)
    # copy the accumulated values of the windows in x
    mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
    # difference over x axis
    mp[:, r+1:W-r] = xSum[:, (2*r)+1:] - xSum[:, :W-(2*r)-1]
    mp[:, -r:] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
    return mp
There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.
np.tile is a bit too complicated to reimplement in full, but unless I'm misreading it looks like you only need to take a vector and then repeat it along a different axis r times.
A Numba-compatible way to do this is to write
y = x.repeat(r).reshape((-1, r))
Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].
Example:
In [2]: x = np.arange(5)

In [3]: x.repeat(3).reshape((-1, 3))
Out[3]:
array([[0, 0, 0],
       [1, 1, 1],
       [2, 2, 2],
       [3, 3, 3],
       [4, 4, 4]])
If you want x to be repeated along the first dimension instead, just take the transpose y.T.
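If you would rather keep a drop-in replacement for this one use of np.tile, an explicit-loop helper also compiles fine in nopython mode. A sketch (tile_rows is a hypothetical name, equivalent to np.tile(x, (r, 1)) for 1-D x):
import numpy as np
from numba import njit

@njit
def tile_rows(x, r):
    # hypothetical helper: stack r copies of the 1-D vector x as rows,
    # equivalent to np.tile(x, (r, 1))
    out = np.empty((r, x.shape[0]), dtype=x.dtype)
    for i in range(r):
        out[i, :] = x
    return out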

Vectorizing Numpy 3D and 2D array operation

I'm trying to create K MxN matrices in Python, stored in a (M,N,K) numpy array, C, from two matrices, A and B, with shapes (K,M) and (K,N) respectively. The first matrix is computed as C0 = a0.T x b0, where a0 is the first row of A and b0 is the first row of B, the second matrix as C1 = a1.T x b1, and so on.
Right now I'm using a for loop to compute the matrices.
import numpy as np

A = np.random.random((10,800))
B = np.random.random((10,500))
C = np.zeros((800,500,10))
for k in range(10):
    C[:,:,k] = A[k,:][:,None] @ B[k,:][None,:]
Since the operations are independent, I was wondering if there was some pythonic way to avoid the for loop. Perhaps I can vectorize the code, but I fail to see how it could be done.
In [235]: A = np.random.random((10,800))
     ...: B = np.random.random((10,500))
     ...: C = np.zeros((800,500,10))
     ...: for k in range(10):
     ...:     C[:,:,k] = A[k,:][:,None] @ B[k,:][None,:]
     ...:
In [236]: C.shape
Out[236]: (800, 500, 10)
Batched matrix product, followed by transpose
In [237]: np.allclose((A[:,:,None]@B[:,None,:]).transpose(1,2,0), C)
Out[237]: True
But since the matrix product axis is size 1, and there's no other summation, broadcasted multiply is just as good:
In [238]: np.allclose((A[:,:,None]*B[:,None,:]).transpose(1,2,0), C)
Out[238]: True
Execution time is about the same for both.
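For completeness, the same computation can also be spelled as a single einsum call; since k appears in the output subscripts, nothing is summed over, so it is equivalent to the broadcasted multiply above:
C_einsum = np.einsum('ki,kj->ijk', A, B)
print(np.allclose(C_einsum, C))  # True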

Calculate mean, variance, covariance of different length matrices in a split list

I have an array of rows with 5 values each: 4 data values and one index. I sort and split the array along the index, which gives me splits of matrices with different lengths. From here on I want to calculate, for every split, the mean and variance of the fourth value and the covariance of the first 3 values. My current approach works with a for loop, which I would like to replace with matrix operations, but I am struggling with the different sizes of my matrices.
import numpy as np
A = np.random.rand(10,5)
A[:,-1] = np.random.randint(4, size=10)
sorted_A = A[np.argsort(A[:,4])]
splits = np.split(sorted_A, np.where(np.diff(sorted_A[:,4]))[0]+1)
My current for loop looks like this:
result = np.zeros((len(splits), 5))
for idx, values in enumerate(splits):
    if len(values) > 0:
        result[idx, 0] = np.mean(values[:,3])
        result[idx, 1] = np.var(values[:,3])
        result[idx, 2:5] = np.cov(values[:,0:3].transpose(), ddof=0).diagonal()
    else:
        result[idx, 0] = values[:,3]
I tried to work with masked arrays without success, since I couldn't load the matrices into the masked arrays in a proper form. Maybe someone knows how to do this or has a different suggestion.
You can use np.add.reduceat as follows:
>>> idx = np.concatenate([[0], np.where(np.diff(sorted_A[:,4]))[0]+1, [A.shape[0]]])
>>> result2 = np.empty((idx.size-1, 5))
>>> result2[:, 0] = np.add.reduceat(sorted_A[:, 3], idx[:-1]) / np.diff(idx)
>>> result2[:, 1] = np.add.reduceat(sorted_A[:, 3]**2, idx[:-1]) / np.diff(idx) - result2[:, 0]**2
>>> result2[:, 2:5] = np.add.reduceat(sorted_A[:, :3]**2, idx[:-1], axis=0) / np.diff(idx)[:, None]
>>> result2[:, 2:5] -= (np.add.reduceat(sorted_A[:, :3], idx[:-1], axis=0) / np.diff(idx)[:, None])**2
>>>
>>> np.allclose(result, result2)
True
Note that the diagonal of the covariance matrix contains just the variances, which simplifies this vectorization quite a bit.
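The answer relies on the identity Var(x) = E[x²] − E[x]², applied per group via reduceat, and on the fact that the covariance diagonal is exactly the per-column variance; the latter can be checked directly:
import numpy as np

X = np.random.rand(6, 3)
# the diagonal of the (population) covariance matrix equals the per-column variance
print(np.allclose(np.cov(X.T, ddof=0).diagonal(), X.var(axis=0)))  # True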

Vectorizing an operation between all pairs of elements in two numpy arrays

Given two arrays where each row represents a circle (x, y, r):
import numpy as np

data = {}
data[1] = np.array([[455.108, 97.0478, 0.0122453333],
                    [403.775, 170.558, 0.0138770952],
                    [255.383, 363.815, 0.0179857619]])
data[2] = np.array([[455.103, 97.0473, 0.012041],
                    [210.19, 326.958, 0.0156912857],
                    [455.106, 97.049, 0.0150472381]])
I would like to pull out all of the pairs of circles that are not disjoint. This can be done by:
close_data = {}
for row1 in data[1]:  # loop over first array
    for row2 in data[2]:  # loop over second array
        condition = ((abs(row1[0]-row2[0]) + abs(row1[1]-row2[1])) < (row1[2]+row2[2]))
        if condition:  # circles overlap if true
            if tuple(row1) not in close_data.keys():
                close_data[tuple(row1)] = [row1, row2]  # pull out close data points
            else:
                close_data[tuple(row1)].append(row2)

for k, v in close_data.items():
    print(k, v)
#desired outcome
#(455.108, 97.047799999999995, 0.012245333299999999)
#[array([ 4.55108000e+02, 9.70478000e+01, 1.22453333e-02]),
# array([ 4.55103000e+02, 9.70473000e+01, 1.20410000e-02]),
# array([ 4.55106000e+02, 9.70490000e+01, 1.50472381e-02])]
However, the multiple loops over the arrays are very inefficient for large datasets. Is it possible to vectorize the calculations so that I get the advantage of using numpy?
The most difficult bit is actually getting to your representation of the info. Oh, and I inserted a few squares, switching to Euclidean distance; if you really don't want Euclidean distances, you have to change that back.
import numpy as np

data = {}
data[1] = np.array([[455.108, 97.0478, 0.0122453333],
                    [403.775, 170.558, 0.0138770952],
                    [255.383, 363.815, 0.0179857619]])
data[2] = np.array([[455.103, 97.0473, 0.012041],
                    [210.19, 326.958, 0.0156912857],
                    [455.106, 97.049, 0.0150472381]])

d1 = data[1][:, None, :]
d2 = data[2][None, :, :]

# squared Euclidean distances between all pairs of centres
dists2 = ((d1[..., :2] - d2[..., :2])**2).sum(axis=-1)
# squared sums of radii for all pairs
radss2 = (d1[..., 2] + d2[..., 2])**2
inds1, inds2 = np.where(dists2 <= radss2)

# translate to your representation:
bnds = np.r_[np.searchsorted(inds1, np.arange(3)), len(inds1)]
rows = [data[2][inds2[bnds[i]:bnds[i+1]]] for i in range(3)]
out = dict([(tuple(data[1][i]), rows[i]) for i in range(3) if rows[i].size > 0])
Here is a pure numpythonic way (a is data[1] and b is data[2]):
In [80]: p = np.arange(3) # for creating the indices of combinations using np.tile and np.repeat
In [81]: a = a[np.repeat(p, 3)] # creates the first column of combination array
In [82]: b = b[np.tile(p, 3)] # creates the second column of combination array
In [83]: abs(a[:, :2] - b[:, :2]).sum(1) < a[:, 2] + b[:, 2]
Out[83]: array([ True, False, True, True, False, True, True, False, True], dtype=bool)
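The two answers can also be combined: the broadcasting idea from the first avoids materializing the repeat/tile index copies, while keeping the question's original Manhattan-style test. A sketch (note a and b were overwritten above, so they are reassigned first):
a, b = data[1], data[2]  # fresh copies of the original arrays

# broadcasted version of the question's overlap test
close = np.abs(a[:, None, :2] - b[None, :, :2]).sum(-1) < (a[:, 2, None] + b[None, :, 2])
i, j = np.nonzero(close)  # row i of data[1] overlaps row j of data[2]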
