NumPy - vectorizing matrix-matrix column correlation coefficient [duplicate] - python

I have two arrays that have the shapes N X T and M X T. I'd like to compute the correlation coefficient across T between every possible pair of rows n and m (from N and M, respectively).
What's the fastest, most pythonic way to do this? (Looping over N and M would seem to me to be neither fast nor pythonic.) I'm expecting the answer to involve numpy and/or scipy. Right now my arrays are numpy arrays, but I'm open to converting them to a different type.
I'm expecting my output to be an array with the shape N X M.
N.B. When I say "correlation coefficient," I mean the Pearson product-moment correlation coefficient.
Here are some things to note:
The numpy function correlate requires input arrays to be one-dimensional.
The numpy function corrcoef accepts two-dimensional arrays, but when given two inputs it stacks them and returns the full (N+M) X (N+M) matrix rather than just the N X M block (and the inputs must have the same number of columns).
The scipy.stats function pearsonr requires input arrays to be one-dimensional.

Correlation (default 'valid' case) between two 2D arrays:
You can simply use matrix-multiplication np.dot like so -
out = np.dot(arr_one,arr_two.T)
With the default "valid" mode, the correlation between each pair of rows (row1, row2) from the two input arrays is exactly the value of this matrix product at position (row1, row2).
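A quick hedged demo on made-up shapes, showing that each entry of the dot product matches np.correlate in its default 'valid' mode for that row pair:
import numpy as np

arr_one = np.random.rand(4, 6)
arr_two = np.random.rand(5, 6)
out = np.dot(arr_one, arr_two.T)          # shape (4, 5)

r1, r2 = 2, 3
assert np.isclose(out[r1, r2], np.correlate(arr_one[r1], arr_two[r2], 'valid')[0])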
Row-wise Correlation Coefficient calculation for two 2D arrays:
import numpy as np

def corr2_coeff(A, B):
    # Row-wise mean of input arrays & subtract from input arrays themselves
    A_mA = A - A.mean(1)[:, None]
    B_mB = B - B.mean(1)[:, None]

    # Sum of squares across rows
    ssA = (A_mA**2).sum(1)
    ssB = (B_mB**2).sum(1)

    # Finally get corr coeff
    return np.dot(A_mA, B_mB.T) / np.sqrt(np.dot(ssA[:, None], ssB[None]))
This is based upon this solution to "How to apply corr2 functions in Multidimentional arrays in MATLAB".
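A quick hedged sanity check on small made-up inputs: np.corrcoef(A, B) returns the full (N+M) x (N+M) matrix, and its cross block should match corr2_coeff -
A = np.random.rand(4, 10)
B = np.random.rand(3, 10)
expected = np.corrcoef(A, B)[:4, 4:]      # N x M cross-correlation block, shape (4, 3)
assert np.allclose(corr2_coeff(A, B), expected)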
Benchmarking
This section compares the runtime performance of the proposed approach against the generate_correlation_map and loopy pearsonr based approaches listed in the other answer (taken from the function test_generate_correlation_map(), without the value-correctness verification code at the end of it). Note that the timings for the proposed approach also include a check at the start for an equal number of columns in the two input arrays, as is done in that other answer. The runtimes are listed next.
Case #1:
In [106]: A = np.random.rand(1000, 100)
In [107]: B = np.random.rand(1000, 100)
In [108]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15 ms per loop
In [109]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.6 ms per loop
Case #2:
In [110]: A = np.random.rand(5000, 100)
In [111]: B = np.random.rand(5000, 100)
In [112]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 368 ms per loop
In [113]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 493 ms per loop
Case #3:
In [114]: A = np.random.rand(10000, 10)
In [115]: B = np.random.rand(10000, 10)
In [116]: %timeit corr2_coeff(A, B)
1 loops, best of 3: 1.29 s per loop
In [117]: %timeit generate_correlation_map(A, B)
1 loops, best of 3: 1.83 s per loop
The other loopy pearsonr based approach seemed too slow, but here are the runtimes for one small datasize -
In [118]: A = np.random.rand(1000, 100)
In [119]: B = np.random.rand(1000, 100)
In [120]: %timeit corr2_coeff(A, B)
100 loops, best of 3: 15.3 ms per loop
In [121]: %timeit generate_correlation_map(A, B)
100 loops, best of 3: 19.7 ms per loop
In [122]: %timeit pearsonr_based(A, B)
1 loops, best of 3: 33 s per loop

@Divakar provides a great option for computing the unscaled correlation, which is what I originally asked for.
In order to calculate the correlation coefficient, a bit more is required:
import numpy as np


def generate_correlation_map(x, y):
    """Correlate each n with each m.

    Parameters
    ----------
    x : np.array
      Shape N X T.
    y : np.array
      Shape M X T.

    Returns
    -------
    np.array
      N X M array in which each element is a correlation coefficient.
    """
    mu_x = x.mean(1)
    mu_y = y.mean(1)
    n = x.shape[1]
    if n != y.shape[1]:
        raise ValueError('x and y must ' +
                         'have the same number of timepoints.')
    # With ddof=n-1, std divides by (n - ddof) = 1, i.e. it returns the
    # square root of the raw sum of squared deviations.
    s_x = x.std(1, ddof=n - 1)
    s_y = y.std(1, ddof=n - 1)
    cov = np.dot(x,
                 y.T) - n * np.dot(mu_x[:, np.newaxis],
                                   mu_y[np.newaxis, :])
    return cov / np.dot(s_x[:, np.newaxis], s_y[np.newaxis, :])
Here's a test of this function, which passes:
from scipy.stats import pearsonr


def test_generate_correlation_map():
    x = np.random.rand(10, 10)
    y = np.random.rand(20, 10)
    desired = np.empty((10, 20))
    for n in range(x.shape[0]):
        for m in range(y.shape[0]):
            desired[n, m] = pearsonr(x[n, :], y[m, :])[0]
    actual = generate_correlation_map(x, y)
    np.testing.assert_array_almost_equal(actual, desired)

For those interested in computing the Pearson correlation coefficient between a 1D and 2D array, I wrote the following function, where x is a 1D array and y a 2D array.
def pearsonr_2D(x, y):
    """Computes the Pearson correlation coefficient
    where x is a 1D and y a 2D array."""
    upper = np.sum((x - np.mean(x)) * (y - np.mean(y, axis=1)[:, None]), axis=1)
    lower = np.sqrt(np.sum(np.power(x - np.mean(x), 2)) *
                    np.sum(np.power(y - np.mean(y, axis=1)[:, None], 2), axis=1))
    rho = upper / lower
    return rho
Example run:
>>> x
Out[1]: array([1, 2, 3])
>>> y
Out[2]: array([[ 1,  2,  3],
        [ 6,  7, 12],
        [ 9,  3,  1]])
>>> pearsonr_2D(x, y)
Out[3]: array([ 1.        ,  0.93325653, -0.96076892])
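As a quick hedged cross-check on these example inputs, the row-wise results should match scipy's pearsonr applied in a loop:
from scipy.stats import pearsonr

x = np.array([1, 2, 3])
y = np.array([[1, 2, 3], [6, 7, 12], [9, 3, 1]])
expected = np.array([pearsonr(x, row)[0] for row in y])
assert np.allclose(pearsonr_2D(x, y), expected)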

Related

Fast inner product of two 2-d masked arrays in numpy

My problem is the following. I have two arrays X and Y of shape n, p where p >> n (e.g. n = 50, p = 10000).
I also have a mask mask (1-d array of booleans of size p) with respect to p, of small density (e.g. np.mean(mask) is 0.05).
I try to compute, as fast as possible, the inner product of X and Y with respect to mask: the output inner is an array of shape n, n, and is such that inner[i, j] = np.sum(X[i, np.logical_not(mask)] * Y[j, np.logical_not(mask)]).
I have tried using the numpy.ma library, but it is quite slow for my use:
import numpy as np
import numpy.ma as ma
n, p = 50, 10000
density = 0.05
mask = np.array(np.random.binomial(1, density, size=p), dtype=np.bool_)
mask_big = np.ones(n)[:, None] * mask[None, :]
X = np.random.randn(n, p)
Y = np.random.randn(n, p)
X_ma = ma.array(X, mask=mask_big)
Y_ma = ma.array(Y, mask=mask_big)
But then, on my machine, X_ma.dot(Y_ma.T) is about 5 times slower than X.dot(Y.T)...
To begin with, I think it is a problem that .dot does not know that the mask is only with respect to p, but I don't know if it's possible to use this information.
I'm looking for a way to perform the computation without being much slower than the naive dot.
Thanks a lot !
We can use matrix multiplication on the full arrays and on the masked columns alone; subtracting the masked-columns product from the full product yields the desired output -
inner = X.dot(Y.T)-X[:,mask].dot(Y[:,mask].T)
Or simply use the inverted mask; this would be slower, though, for a sparse mask -
inner = X[:,~mask].dot(Y[:,~mask].T)
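A quick hedged check on small made-up sizes that both expressions match the loopy definition from the question:
import numpy as np

np.random.seed(0)
n, p = 5, 40
X = np.random.randn(n, p)
Y = np.random.randn(n, p)
mask = np.random.rand(p) > 0.9          # sparse boolean mask over p

# Loopy reference straight from the problem statement
ref = np.array([[np.sum(X[i, ~mask] * Y[j, ~mask]) for j in range(n)]
                for i in range(n)])

assert np.allclose(X.dot(Y.T) - X[:, mask].dot(Y[:, mask].T), ref)
assert np.allclose(X[:, ~mask].dot(Y[:, ~mask].T), ref)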
Timings -
In [34]: np.random.seed(0)
...: p,n = 10000,50
...: X = np.random.rand(n,p)
...: Y = np.random.rand(n,p)
...: mask = np.random.rand(p)>0.95
In [35]: mask.mean()
Out[35]: 0.0507
In [36]: %timeit X.dot(Y.T)-X[:,mask].dot(Y[:,mask].T)
100 loops, best of 3: 2.54 ms per loop
In [37]: %timeit X[:,~mask].dot(Y[:,~mask].T)
100 loops, best of 3: 4.1 ms per loop
In [39]: %%timeit
    ...: inner = np.empty((n,n))
    ...: for i in range(X.shape[0]):
    ...:     for j in range(X.shape[0]):
    ...:         inner[i, j] = np.sum(X[i, ~mask] * Y[j, ~mask])
1 loop, best of 3: 302 ms per loop

Optimizing an operation with a numpy sparse array

I am struggling with a slow numpy operation, using python 3.
I have the following operation:
np.sum(np.log(X.T * b + a).T, 1)
where
(30000,1000) = X.shape
(1000,1) = b.shape
(1000,1) = a.shape
My problem is that this operation is pretty slow (around 1.5 seconds), and it is inside a loop that is repeated around 100 times, which makes the running time of my code very long.
I am wondering if there is a faster implementation of this function.
Maybe useful fact: X is extremely sparse (only 0.08% of the entries are nonzero), but is a NumPy array.
We can optimize the logarithm operation, which seems to be the bottleneck: being a transcendental function, it can be sped up with the numexpr module, while the sum-reduction is left to NumPy, which does that much better. This gives us a hybrid approach, like so -
import numexpr as ne
def numexpr_app(X, a, b):
    XT = X.T
    return ne.evaluate('log(XT * b + a)').sum(0)
Looking closely at the broadcasted expression XT * b + a, we see two stages of broadcasting, which we can optimize further. The idea is to reduce them to one stage with a bit of division: since XT * b + a = b * (XT + a/b), we have log(XT * b + a) = log(b) + log(a/b + XT). This gives us a slightly modified version, shown below -
def numexpr_app2(X, a, b):
    ab = (a / b)
    XT = X.T
    return np.log(b).sum() + ne.evaluate('log(ab + XT)').sum(0)
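A tiny hedged check of this rewrite on small positive random inputs (the names here are made up); the summed logs on both sides should agree:
Xs = np.random.rand(5, 3)
a_s = np.random.rand(3, 1)
b_s = np.random.rand(3, 1)
lhs = np.log(Xs.T * b_s + a_s).sum(0)
rhs = np.log(b_s).sum() + np.log(a_s / b_s + Xs.T).sum(0)
assert np.allclose(lhs, rhs)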
Runtime test and verification
Original approach -
def numpy_app(X, a, b):
    return np.sum(np.log(X.T * b + a).T, 1)
Timings -
In [111]: # Setup inputs
...: density = 0.08/100 # 0.08 % sparse
...: m,n = 30000, 1000
...: X = scipy.sparse.rand(m,n,density=density,format="csr").toarray()
...: a = np.random.rand(n,1)
...: b = np.random.rand(n,1)
...:
In [112]: out0 = numpy_app(X, a, b)
...: out1 = numexpr_app(X, a, b)
...: out2 = numexpr_app2(X, a, b)
...: print np.allclose(out0, out1)
...: print np.allclose(out0, out2)
...:
True
True
In [114]: %timeit numpy_app(X, a, b)
1 loop, best of 3: 691 ms per loop
In [115]: %timeit numexpr_app(X, a, b)
10 loops, best of 3: 153 ms per loop
In [116]: %timeit numexpr_app2(X, a, b)
10 loops, best of 3: 149 ms per loop
Just to prove the observation stated at the start that the log part is the bottleneck in the original NumPy approach, here's the timing on it -
In [44]: %timeit np.log(X.T * b + a)
1 loop, best of 3: 682 ms per loop
On which the improvement was significant -
In [120]: XT = X.T
In [121]: %timeit ne.evaluate('log(XT * b + a)')
10 loops, best of 3: 142 ms per loop
It's a bit unclear why you would do np.sum(your_array.T, axis=1) instead of np.sum(your_array, axis=0).
You can use a scipy sparse matrix (use compressed sparse column format for X, so that X.T is compressed sparse row, since you multiply by b, which has the shape of one row of X.T):
X_sparse = scipy.sparse.csc_matrix(X)
and replace X.T * b by:
X_sparse.T.multiply(b)
However if a is not sparse it will not help you as much as it could.
These are the speed ups I obtain for this operation:
In [16]: %timeit X_sparse.T.multiply(b)
The slowest run took 10.80 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 374 µs per loop
In [17]: %timeit X.T * b
10 loops, best of 3: 44.5 ms per loop
with:
import numpy as np
from scipy import sparse
X = np.random.randn(30000, 1000)
a = np.random.randn(1000, 1)
b = np.random.randn(1000, 1)
X[X < 3] = 0
print(np.sum(X != 0))
X_sparse = sparse.csc_matrix(X)
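Here is a hedged sketch of plugging the sparse product back into the original expression (using positive a and b so the log is defined). Note that .multiply() returns a sparse matrix but np.log(... + a) is dense anyway, so we densify before the log; only the multiplication step is saved:
a_pos = np.abs(a)
b_pos = np.abs(b)
out_sparse = np.sum(np.log(X_sparse.T.multiply(b_pos).toarray() + a_pos).T, 1)
out_dense = np.sum(np.log(X.T * b_pos + a_pos).T, 1)
assert np.allclose(out_sparse, out_dense)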

Speed up angle calculation for each x,y point in a matrix

I have a 3-d Numpy array flow as follows:
flow = np.random.uniform(low=-1.0, high=1.0, size=(720,1280,2))
# Suppose flow[..., 0] are x-coordinates and flow[..., 1] are y-coordinates.
Need to calculate the angle for each x,y point. Here is how I have implemented it:
def calcAngle(a):
    assert(len(a) == 2)
    (x, y) = a
    angle_deg = np.angle(x + y * 1j, deg=True)
    return angle_deg

fangle = np.apply_along_axis(calcAngle, axis=2, arr=flow)
# The above statement takes 14.0389318466 seconds to execute
The calculation of angle at each point takes 14.0389318466 seconds to execute on my Macbook Pro.
Is there a way I could speed this up, probably by using some matrix operation, rather than processing each pixel one at a time.
You can use numpy.arctan2() to get the angle in radians, and then convert to degrees with numpy.rad2deg():
fangle = np.rad2deg(np.arctan2(flow[:,:,1], flow[:,:,0]))
On my computer, this is a little faster than Divakar's version:
In [17]: %timeit np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
10 loops, best of 3: 44.5 ms per loop
In [18]: %timeit np.rad2deg(np.arctan2(flow[:,:,1], flow[:,:,0]))
10 loops, best of 3: 35.4 ms per loop
A more efficient way to use np.angle() is to create a complex view of flow. If flow is an array of type np.float64 with shape (m, n, 2), then flow.view(np.complex128)[:,:,0] will be an array of type np.complex128 with shape (m, n):
fangle = np.angle(flow.view(np.complex128)[:,:,0], deg=True)
This appears to be a smidge faster than using arctan2 followed by rad2deg (but the difference is not far above the measurement noise of timeit):
In [47]: %timeit np.angle(flow.view(np.complex128)[:,:,0], deg=True)
10 loops, best of 3: 35 ms per loop
Note that this might not work if flow was created as the transpose of some other array, or as a slice of another array using steps bigger than 1.
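A hedged workaround in that case (assuming flow is float64) is to make a C-contiguous copy first, after which the complex view is valid again:
flow_c = np.ascontiguousarray(flow)
fangle = np.angle(flow_c.view(np.complex128)[:, :, 0], deg=True)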
numpy.angle supports vectorized operation. So, just feed it the two slices along the last axis for the final output, like so -
fangle = np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
Verification -
In [9]: flow = np.random.uniform(low=-1.0, high=1.0, size=(720,1280,2))
In [17]: out1 = np.apply_along_axis(calcAngle, axis=2, arr=flow)
In [18]: out2 = np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
In [19]: np.allclose(out1, out2)
Out[19]: True
Runtime test -
In [10]: %timeit np.apply_along_axis(calcAngle, axis=2, arr=flow)
1 loop, best of 3: 8.27 s per loop
In [11]: %timeit np.angle(flow[...,0] + flow[...,1] * 1j, deg=True)
10 loops, best of 3: 47.6 ms per loop
In [12]: 8270/47.6
Out[12]: 173.73949579831933
173x+ speedup!

Exterior product in NumPy : Vectorizing six nested loops

In a research paper, the author introduces an exterior product between two (3*3) matrices A and B, resulting in C:
C(i, j) = sum(k=1..3, l=1..3, m=1..3, n=1..3) eps(i,k,l)*eps(j,m,n)*A(k,m)*B(l,n)
where eps(a, b, c) is the Levi-Civita symbol.
I am wondering how to vectorize such a mathematical operator in Numpy instead of implementing 6 nested loops (for i, j, k, l, m, n) naively.
It looks like a purely sum-reduction based problem, without the requirement of keeping any axis aligned between the inputs. So, I would suggest a matrix-multiplication based solution for tensors using np.tensordot.
Thus, one solution could be implemented in three steps -
# Matrix-multiplication between first eps and A.
# Thus losing second axis from eps and first from A : k
parte1 = np.tensordot(eps,A,axes=((1),(0)))
# Matrix-multiplication between second eps and B.
# Thus losing third axis from eps and second from B : n
parte2 = np.tensordot(eps,B,axes=((2),(1)))
# Finally, we are left with two products : ilm & jml.
# We need to lose lm and ml from these inputs respectively to get ij.
# So, we need to lose the last two dims from the products, but flipped.
out = np.tensordot(parte1,parte2,axes=((1,2),(2,1)))
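As a hedged sanity check, here is the actual 3x3x3 Levi-Civita tensor built explicitly, with the tensordot result compared against a direct six-loop evaluation of the definition from the question:
import numpy as np

eps = np.zeros((3, 3, 3))
eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1
eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1

A = np.random.rand(3, 3)
B = np.random.rand(3, 3)

ref = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        for k in range(3):
            for l in range(3):
                for m in range(3):
                    for n in range(3):
                        ref[i, j] += eps[i, k, l] * eps[j, m, n] * A[k, m] * B[l, n]

parte1 = np.tensordot(eps, A, axes=((1), (0)))
parte2 = np.tensordot(eps, B, axes=((2), (1)))
out = np.tensordot(parte1, parte2, axes=((1, 2), (2, 1)))
assert np.allclose(out, ref)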
Runtime test
Approaches -
def einsum_based1(eps, A, B):  # @unutbu's soln1
    return np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)

def einsum_based2(eps, A, B):  # @unutbu's soln2
    return np.einsum('ilm,jml->ij',
                     np.einsum('ikl,km->ilm', eps, A),
                     np.einsum('jmn,ln->jml', eps, B))

def tensordot_based(eps, A, B):
    parte1 = np.tensordot(eps, A, axes=((1), (0)))
    parte2 = np.tensordot(eps, B, axes=((2), (1)))
    return np.tensordot(parte1, parte2, axes=((1, 2), (2, 1)))
Timings -
In [5]: # Setup inputs
...: N = 20
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [6]: %timeit einsum_based1(eps, A, B)
1 loops, best of 3: 773 ms per loop
In [7]: %timeit einsum_based2(eps, A, B)
1000 loops, best of 3: 972 µs per loop
In [8]: %timeit tensordot_based(eps, A, B)
1000 loops, best of 3: 214 µs per loop
Bigger dataset -
In [12]: # Setup inputs
...: N = 100
...: eps = np.random.rand(N,N,N)
...: A = np.random.rand(N,N)
...: B = np.random.rand(N,N)
...:
In [13]: %timeit einsum_based2(eps, A, B)
1 loops, best of 3: 856 ms per loop
In [14]: %timeit tensordot_based(eps, A, B)
10 loops, best of 3: 49.2 ms per loop
You could use einsum which implements Einstein summation notation:
C = np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)
or for better performance, apply einsum to two arrays at a time:
C = np.einsum('ilm,jml->ij',
              np.einsum('ikl,km->ilm', eps, A),
              np.einsum('jmn,ln->jml', eps, B))
np.einsum computes a sum of products.
The subscript specifier 'ikl,jmn,km,ln->ij' tells np.einsum that
the first eps has subscripts i,k,l,
the second eps has subscripts j,m,n,
A has subscripts k,m,
B has subscripts l,n,
the output array has subscripts i,j
Thus, the summation is over products of the form
eps(i,k,l) * eps(j,m,n) * A(k,m) * B(l,n)
All subscripts not in the output array are summed over.
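As a quick hedged check that the one-shot and two-stage einsum forms agree (on random stand-in inputs of a made-up size N):
import numpy as np

N = 4
eps = np.random.rand(N, N, N)
A = np.random.rand(N, N)
B = np.random.rand(N, N)
C1 = np.einsum('ikl,jmn,km,ln->ij', eps, eps, A, B)
C2 = np.einsum('ilm,jml->ij',
               np.einsum('ikl,km->ilm', eps, A),
               np.einsum('jmn,ln->jml', eps, B))
assert np.allclose(C1, C2)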

Efficient way of computing Kullback–Leibler divergence in Python

I have to compute the Kullback-Leibler Divergence (KLD) between thousands of discrete probability vectors. Currently I am using the following code but it's way too slow for my purposes. I was wondering if there is any faster way to compute KL Divergence?
import numpy as np
import scipy.stats as sc

# n is the number of data points
kld = np.zeros((n, n))
for i in range(0, n):
    for j in range(0, n):
        if(i != j):
            kld[i, j] = sc.entropy(distributions[i, :], distributions[j, :])
Scipy's stats.entropy is normally fed 1D arrays and returns a scalar, which is what the code in the question does. Internally, this function also allows broadcasting, which we can exploit here for a vectorized solution.
From the docs -
scipy.stats.entropy(pk, qk=None, base=None)

If only probabilities pk are given, the entropy is calculated as
S = -sum(pk * log(pk), axis=0).

If qk is not None, then compute the Kullback-Leibler divergence
S = sum(pk * log(pk / qk), axis=0).
In our case, we are doing these entropy calculations for each row against all other rows, performing a sum-reduction to get a scalar at each iteration of those two nested loops. Thus, the output array has shape (M,M), where M is the number of rows in the input array.
Now, the catch here is that stats.entropy() sums along axis=0, so we feed it two views of distributions, both with the row dimension brought to axis=0 (for reduction along it) and the other two axes interleaved as (M,1) & (1,M), giving us an (M,M) shaped output array through broadcasting.
Thus, a vectorized and much more efficient way to solve our case would be -
from scipy import stats
kld = stats.entropy(distributions.T[:,:,None], distributions.T[:,None,:])
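To make the broadcasting concrete, here is a tiny shape walk-through with made-up sizes (M=4 rows, T=6 bins):
import numpy as np
from scipy import stats

distributions = np.random.rand(4, 6)
p = distributions.T[:, :, None]    # (6, 4, 1): bins on axis 0
q = distributions.T[:, None, :]    # (6, 1, 4)
kld = stats.entropy(p, q)          # broadcasts to (6, 4, 4), reduces axis 0
print(kld.shape)                   # (4, 4)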
Runtime test and verification -
In [15]: def entropy_loopy(distrib):
    ...:     n = distrib.shape[0]  # n is the number of data points
    ...:     kld = np.zeros((n, n))
    ...:     for i in range(0, n):
    ...:         for j in range(0, n):
    ...:             if(i != j):
    ...:                 kld[i, j] = stats.entropy(distrib[i, :], distrib[j, :])
    ...:     return kld
    ...:
In [16]: distrib = np.random.randint(0,9,(100,100)) # Setup input
In [17]: out = stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])
In [18]: np.allclose(entropy_loopy(distrib),out) # Verify
Out[18]: True
In [19]: %timeit entropy_loopy(distrib)
1 loops, best of 3: 800 ms per loop
In [20]: %timeit stats.entropy(distrib.T[:,:,None], distrib.T[:,None,:])
10 loops, best of 3: 104 ms per loop
