Numpy - Covariance between row of two matrix - python

I need to compute the covariance between each row of two different matrices, i.e. the covariance between the first row of the first matrix with the first row of the second matrix, and so on till the last row of both matrices. I can do it without NumPy with the code attached below, my question is: is it possible to avoid the use of the "for loop" and get the same result with NumPy?
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
output = []
for a,b in zip(m1,m2):
cov = np.cov(a, b)
output.append(cov[0][1])
print(output)
Thanks in advance!

If you are handling big arrays, I would consider this:
from numba import jit
import numpy as np
m1 = np.random.rand(10000, 3)
m2 = np.random.rand(10000, 3)
#jit(nopython=True)
def nb_cov(a, b):
return [np.cov(x)[0,1] for x in np.stack((a, b), axis=1)]
To get a runtime of
>>> %timeit nb_cov(m1, m2)
The slowest run took 94.24 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 10.5 ms per loop
Compared with
>>> %timeit [np.cov(x)[0,1] for x in np.stack((m1, m2), axis=1)]
1 loop, best of 5: 410 ms per loop

You could use a list comprehension instead of a for loop, and you could eliminate zip (if you wanted to) by concatenating the two arrays along a third dimension.
import numpy as np
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
# List comprehension on zipped arrays.
out2 = [np.cov(a, b)[0][1] for a, b in zip(m1, m2)]
print(out2)
# [0.5999999999999999, 0.0]
# List comprehension on concatenated arrays.
big_array = np.concatenate((m1[:, np.newaxis, :],
m2[:, np.newaxis, :]), axis=1)
out3 = [np.cov(X)[0][1] for X in big_array]
print(out3)
# [0.5999999999999999, 0.0]

Related

How to vectorize multiple matrix multiplications in numpy?

For a conceptual idea of what I mean, I have 2 data points:
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
And a 2x2 matrix:
y = np.array([[2, 2], [2, 2]])
If I perform x_0.T # y # x_0, I get array([[ 8.]]). Similarly, x_1.T # y # x_1 returns array([[ 72.]]).
But is there a way to perform both of these calculations in one go, without a for loop? Obviously the speed-up here is negligible, but I am working with much more data points than presented here.
With x as the column stacked version of x_0, x_1 and so on, we can use np.einsum -
np.einsum('ji,jk,ki->i',x,y,x)
With a mix of np.einsum and matrix-multiplcation -
np.einsum('ij,ji->i',x.T.dot(y),x)
As stated earlier, x was assumed to be column-stacked, like so :
x = np.column_stack((x_0, x_1))
Runtime test -
In [236]: x = np.random.randint(0,255,(3,100000))
In [237]: y = np.random.randint(0,255,(3,3))
# Proposed in #titipata's post/comments under this post
In [238]: %timeit (x.T.dot(y)*x.T).sum(1)
100 loops, best of 3: 3.45 ms per loop
# Proposed earlier in this post
In [239]: %timeit np.einsum('ji,jk,ki->i',x,y,x)
1000 loops, best of 3: 832 µs per loop
# Proposed earlier in this post
In [240]: %timeit np.einsum('ij,ji->i',x.T.dot(y),x)
100 loops, best of 3: 2.6 ms per loop
Basically, you want to do the operation (x.T).dot(A).dot(x) for all x that you have.
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
x = np.hstack((x_0, x_1)) # [[ 0.6 2.6], [ 1.4 3.4]]
The easy way to think about it is to do multiplication for all x_i that you have with y as
[x_i.dot(y).dot(x_i) for x_i in x.T]
>> [8.0, 72.0]
But of course this is not too efficient. However, you can do the trick where you can do dot product of x with y first and multiply back with itself and sum over column i.e. you manually do dot product. This will make the calculation much faster:
x = x.T
(x.dot(y) * x).sum(axis=1)
>> array([ 8., 72.])
Note that I transpose the matrix first because we want to multiply column of y to each row of x

Replace looping-over-axes with broadcasting, pt 2

Earlier I asked a similar question where the answer used np.dot, taking advantage of the fact that a dot product involves a sum of products. (To my understanding.)
Now I have a similar issue where I don't think dot will apply, because in place of a sum I want to take an element-wise diagonal. If it does, I haven't been able to apply it correctly.
Given a matrix x and array err:
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
My current implementation with loop is:
res = np.array([np.sqrt(np.diagonal(x * err[i])) for i in range(err.shape[0])])
print(res)
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
which takes the diagonal of x.dot(i) for each i in err. Could this be vectorized? In other words, can the output of x * err be 3-dimensional, with np.diagonal then yielding a 2d array, with one element for each diagonal?
Program:
import numpy as np
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
diag = np.diagonal(x)
ans = np.sqrt(diag*err[:,np.newaxis]) # sqrt of outer product
print(ans)
# use out keyword to avoid making new numpy array for many times.
ans = np.empty(x.shape, dtype=x.dtype)
for i in range(100):
ans = np.multiply(diag, err, out=ans)
ans = np.sqrt(ans, out=ans)
Result:
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
Here's an approach making use of diagonal-view with ndarray.flat into x and then use broadcasting for element-wise multiplication, like so -
np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Sample run -
In [108]: x = np.matrix([[ 0.02984406, -0.00257266],
...: [-0.00257266, 0.00320312]])
...:
...: err = np.array([ 7.6363226 , 13.16548267])
...:
In [109]: np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Out[109]:
array([[ 0.47738755, 0.15639712],
[ 0.62682649, 0.20535487]])
Runtime test to see how a view helps over np.diagonal that creates a copy -
In [104]: x = np.matrix(np.random.rand(5000,5000))
In [105]: err = np.random.rand(5000)
In [106]: %timeit np.diagonal(x)*err[:,np.newaxis]
10 loops, best of 3: 66.8 ms per loop
In [107]: %timeit x.flat[::x.shape[1]+1].A1 * err[:,None]
10 loops, best of 3: 37.7 ms per loop

Is there a more efficient way to slice a multi dimensional array

I noticed that indexing a multi dimensional array takes more time than indexing a single dimensional array
a1 = np.arange(1000000)
a2 = np.arange(1000000).reshape(1000, 1000)
a3 = np.arange(1000000).reshape(100, 100, 100)
When I index a1
%%timeit
a1[500000]
The slowest run took 39.17 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 84.6 ns per loop
%%timeit
a2[500, 0]
The slowest run took 31.85 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 102 ns per loop
%%timeit
a3[50, 0, 0]
The slowest run took 46.72 times longer than the fastest. This could mean that an intermediate result is being cached.
10000000 loops, best of 3: 119 ns per loop
At what point should I consider an alternative way to index or slice a multi-dimensional array? What are the circumstances that make it worth the effort and loss of transparency?
One alternative to slicing an (n, m) array is to flatten the array and derive what it's one dimensional position must be.
consider a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
we can get the 2nd row, 3rd column with a[1, 2] and get 5
or we can calculate that 1 * a.shape[1] + 2 is the one dimensional position if we flatten a with order='C'
thus we can perform the equivalent slice with a.ravel()[1 * a.shape[1] + 2]
Is this efficient? No, for indexing a single number from an array, it isn't worth the trouble.
What about if we want to slice many numbers from the array? I devised the following test for a 2-D array
2-D test
from timeit import timeit
n, m = 10000, 10000
a = np.random.rand(n, m)
r = pd.DataFrame(index=np.power(10, np.arange(7)), columns=['Multi', 'Flat'])
for k in r.index:
b = np.random.randint(n, size=k)
c = np.random.randint(m, size=k)
kw = dict(setup='from __main__ import a, b, c', number=100)
r.loc[k, 'Multi'] = timeit('a[b, c]', **kw)
r.loc[k, 'Flat'] = timeit('a.ravel()[b * a.shape[1] + c]', **kw)
r.div(r.sum(1), 0).plot.bar()
It appears that when slicing more than 100,000 numbers, it's better to flatten the array.
What about 3-D
3-D test
from timeit import timeit
l, n, m = 1000, 1000, 1000
a = np.random.rand(l, n, m)
r = pd.DataFrame(index=np.power(10, np.arange(7)), columns=['Multi', 'Flat'])
for k in r.index:
b = np.random.randint(l, size=k)
c = np.random.randint(m, size=k)
d = np.random.randint(n, size=k)
kw = dict(setup='from __main__ import a, b, c, d', number=100)
r.loc[k, 'Multi'] = timeit('a[b, c, d]', **kw)
r.loc[k, 'Flat'] = timeit('a.ravel()[b * a.shape[1] * a.shape[2] + c * a.shape[1] + d]', **kw)
r.div(r.sum(1), 0).plot.bar()
Similar results, maybe more dramatic.
Conclusion
For 2 dimensional arrays, consider flattening and deriving flatten positions if you need to pull more than 100,000 elements from the array.
For 3 or more dimensions, it seems clear that flattening the array is almost always better.
Criticism is welcome
Did I do something wrong? Did I not think of something obvious?

numpy 2d boolean array indexing with reduce along one axis

This question is similar to this one.
I have a 2d boolean array "belong" and a 2d float array "angles".
What I want is to sum along the rows the angles for which the corresponding index in belong is True, and do that with numpy (ie. avoid python loops). I don't need to store the resulting rows, which would have different lengths and as explained in the linked question would require a list.
So what I attempted is np.sum(angles[belong] ,axis=1), but angles[belong] returns a 1d result, and I can't reduce it as I want. I have also tried np.sum(angles*belong ,axis=1) and that works. But I wonder if I could improve the timing by accessing only the indexes where belong is True. belong is True about 30% of the time and angles is a simplification of a longer formula which involves angles.
UPDATE
I like the solution with einsum, however in my actual computation the speed up is tiny. I used angles in the question to simplify, in practice it is a formula that uses angles. I suspect that this formula is calculated for all the angles (regardless of belong) and then passed to einsum, which would perform the computation.
This is what I've done:
THRES_THETA and max_line_length are floats.
belong, angle and lines_lengths_vstacked have shape (1653, 58)
and np.count_nonzero(belong)/belong.size -> 0.376473287856979
l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1)) #base method
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))
l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))
l3 = (lambda angle=angle, belong=belong:
np.sum(angle*belong ,axis=1)) #base method
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))
l4 = (lambda angle=angle, belong=belong:
np.einsum('ij,ij->i', belong, angle))
t4 = timeit.Timer(l4)
print(t4.repeat(3, 100))
and the results were:
[0.2505458095931187, 0.22666162878242901, 0.23591678551324263]
[0.23295411847036418, 0.21908727226505043, 0.22407296178704272]
[0.03711204915708555, 0.03149960399994978, 0.033403337575027114]
[0.025264803208228992, 0.022590580646423053, 0.024585736455331464]
If we look at the last two rows, the one corresponding to einsum is about 30% faster than using the base method. But if we look at the first two rows, the speed up for the einsum method is smaller, just about 0.1% faster.
I'm not sure if this timing can be improved.
You can use np.einsum -
np.einsum('ij,ij->i',belong,angles)
You can also use np.bincount, like so -
idx,_ = np.where(belong)
out = np.bincount(idx,angles[belong])
Sample run -
In [32]: belong
Out[32]:
array([[ True, True, True, False, True],
[False, False, False, True, True],
[False, False, True, True, True],
[False, False, True, False, True]], dtype=bool)
In [33]: angles
Out[33]:
array([[ 0.65429151, 0.36235607, 0.98316406, 0.08236384, 0.5576149 ],
[ 0.37890797, 0.60705112, 0.79411002, 0.6450942 , 0.57750073],
[ 0.6731019 , 0.18608778, 0.83387574, 0.80120389, 0.54971573],
[ 0.18971255, 0.86765132, 0.82994543, 0.62344429, 0.05207639]])
In [34]: np.sum(angles*belong ,axis=1) # This worked for you, so using as baseline
Out[34]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
In [35]: np.einsum('ij,ij->i',belong,angles)
Out[35]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
In [36]: idx,_ = np.where(belong)
...: out = np.bincount(idx,angles[belong])
...:
In [37]: out
Out[37]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
Runtime test -
In [52]: def sum_based(belong,angles):
...: return np.sum(angles*belong ,axis=1)
...:
...: def einsum_based(belong,angles):
...: return np.einsum('ij,ij->i',belong,angles)
...:
...: def bincount_based(belong,angles):
...: idx,_ = np.where(belong)
...: return np.bincount(idx,angles[belong])
...:
In [53]: # Inputs
...: belong = np.random.rand(4000,5000)>0.7
...: angles = np.random.rand(4000,5000)
...:
In [54]: %timeit sum_based(belong,angles)
...: %timeit einsum_based(belong,angles)
...: %timeit bincount_based(belong,angles)
...:
1 loops, best of 3: 308 ms per loop
10 loops, best of 3: 134 ms per loop
1 loops, best of 3: 554 ms per loop
I would go with the np.einsum one!
You could use masked arrays for this, but in the tests I ran it is not faster than (angles * belong).sum(1).
A masked array approach would look like this:
sum_ang = np.ma.masked_where(~belong, angles, copy=False).sum(1).data
Here, we are creating a masked array of angles where the values ~belong ("not belong"), are masked (excluded). We take the not because we want to exclude the values in belong that are False. Then take the sum along rows .sum(1). The sum will return another masked array, so you grab the values with the .data attribute of that masked array.
I added the copy=False kwarg so that this code doesn't get slowed down by array creation, but it's still slower than your (angles * belong).sum(1) approach so you should probably just stick with that.
I have found a way that is about 3 times faster than the einsum solution, and I don't think it can get any faster, so I'm answering my own question with this other method.
What I was hoping is to calculate the formula involving angles just for the positions where belong is True. This should speed up about 3 times as belong is True about 30% of the time.
My first attempt using angles[belong] would calculate the formula just for the positions where belong is True, but had the problem that the resulting array was 1d and I couldn't do the row reductions with np.sum. The solution is to use np.add.reduceat.
reduceat can reduce an ufunc (in this case add) at a list of specific slices. So I just need to create that list of slices so that I can reduce the 1d array resulting from angles[belong].
I'll show my code and timings and that should speak by itself.
first I define a function with the reduceat solution:
def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
b_ind = np.hstack([0, np.cumsum(np.sum(belong, axis=1))])
votes = np.add.reduceat(intermediate, b_ind[:-1])
return votes
then I compare with the base method and the einsum method:
l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1))
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))
l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))
l3 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length))
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))
and the timings:
[2.866840408487671, 2.6822349628234874, 2.665520338478774]
[2.3444239421490725, 2.352450520946098, 2.4150879511222794]
[0.6846337313820605, 0.660780839464234, 0.6091473217964847]
So the reduceat solution is about 3 times faster and gives the same results as the other two.
Note that these results are for a slightly larger example than before where:
belong, angle and lines_lengths_vstacked have shape: (3400, 170)
and np.count_nonzero(belong)/belong.size->0.16765051903114186
Update
Due to a corner case in np.reduceat (as in numpy version '1.11.0rc1') where it can't handle repeated indices correctly, see, I had to add a hack to vote_op() function for the case where there are whole rows in belong that are False. This results in repeated indices in b_ind and wrong results in votes. My solution for the moment is to patch the wrong values, that works but is another step. see new vote_op():
def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
b_rows = np.sum(belong, axis=1)
b_ind = np.hstack([0, np.cumsum(b_rows)])[:-1]
intermediate = np.hstack([intermediate, 0])
votes = np.add.reduceat(intermediate, b_ind)
votes[b_rows == 0] = 0
return votes

Multiply array of vectors with array of matrices; return array of vectors?

I've got a numpy array of row vectors of shape (n,3) and another numpy array of matrices of shape (n,3,3). I would like to multiply each of the n vectors with the corresponding matrix and return an array of shape (n,3) of the resulting vectors.
By now I've been using a for loop to iterate through the n vectors/matrices and do the multiplication item by item.
I would like to know if there's a more numpy-ish way of doing this. A way without the for loop that might even be faster.
//edit 1:
As requested, here's my loopy code (with n = 10):
arr_in = np.random.randn(10, 3)
matrices = np.random.randn(10, 3, 3)
for i in range(arr_in.shape[0]): # 10 iterations
arr_out[i] = np.asarray(np.dot(arr_in[i], matrices[i]))
That dot-product is essentially performing reduction along axis=1 of the two input arrays. The dimensions could be represented like so -
arr_in : n 3
matrices : n 3 3
So, one way to solve it would be to "push" the dimensions of arr_in to front by one axis/dimension, thus creating a singleton dimension at axis=2 in a 3D array version of it. Then, sum-reducing the elements along axis = 1 would give us the desired output. Let's show it -
arr_in : n [3] 1
matrices : n [3] 3
Now, this could be achieved through two ways.
1) With np.einsum -
np.einsum('ij,ijk->ik',arr_in,matrices)
2) With NumPy broadcasting -
(arr_in[...,None]*matrices).sum(1)
Runtime test and verify output (for einsum version) -
In [329]: def loop_based(arr_in,matrices):
...: arr_out = np.zeros((arr_in.shape[0], 3))
...: for i in range(arr_in.shape[0]):
...: arr_out[i] = np.dot(arr_in[i], matrices[i])
...: return arr_out
...:
...: def einsum_based(arr_in,matrices):
...: return np.einsum('ij,ijk->ik',arr_in,matrices)
...:
In [330]: # Inputs
...: N = 16935
...: arr_in = np.random.randn(N, 3)
...: matrices = np.random.randn(N, 3, 3)
...:
In [331]: np.allclose(einsum_based(arr_in,matrices),loop_based(arr_in,matrices))
Out[331]: True
In [332]: %timeit loop_based(arr_in,matrices)
10 loops, best of 3: 49.1 ms per loop
In [333]: %timeit einsum_based(arr_in,matrices)
1000 loops, best of 3: 714 µs per loop
You could use np.einsum. To get v.dot(M) for each vector-matrix pair, use np.einsum("...i,...ij", arr_in, matrices). To get M.dot(v) use np.einsum("...ij,...i", matrices, arr_in)

Categories