For a conceptual idea of what I mean, I have 2 data points:
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
And a 2x2 matrix:
y = np.array([[2, 2], [2, 2]])
If I perform x_0.T @ y @ x_0, I get array([[ 8.]]). Similarly, x_1.T @ y @ x_1 returns array([[ 72.]]).
But is there a way to perform both of these calculations in one go, without a for loop? Obviously the speed-up here is negligible, but I am working with many more data points than presented here.
With x as the column stacked version of x_0, x_1 and so on, we can use np.einsum -
np.einsum('ji,jk,ki->i',x,y,x)
With a mix of np.einsum and matrix-multiplication -
np.einsum('ij,ji->i',x.T.dot(y),x)
As stated earlier, x was assumed to be column-stacked, like so :
x = np.column_stack((x_0, x_1))
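To make the subscript string concrete, here is a minimal sketch that checks 'ji,jk,ki->i' against an explicit loop on the two sample points from the question. The names quad_forms and looped are introduced here only for illustration.

```python
import numpy as np

x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
y = np.array([[2, 2], [2, 2]])

# Column-stack the points: x[:, i] is the i-th data point.
x = np.column_stack((x_0, x_1))

# 'ji,jk,ki->i': for each column i, sum over j and k of x[j, i] * y[j, k] * x[k, i].
quad_forms = np.einsum('ji,jk,ki->i', x, y, x)

# Reference: one quadratic form per column, computed in a loop.
looped = np.array([x[:, i] @ y @ x[:, i] for i in range(x.shape[1])])

print(quad_forms)  # approximately [ 8. 72.]
```

The index i appears in both operands and in the output, so it is kept rather than summed; that is what produces one value per data point.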
Runtime test -
In [236]: x = np.random.randint(0,255,(3,100000))
In [237]: y = np.random.randint(0,255,(3,3))
# Proposed in @titipata's post/comments under this post
In [238]: %timeit (x.T.dot(y)*x.T).sum(1)
100 loops, best of 3: 3.45 ms per loop
# Proposed earlier in this post
In [239]: %timeit np.einsum('ji,jk,ki->i',x,y,x)
1000 loops, best of 3: 832 µs per loop
# Proposed earlier in this post
In [240]: %timeit np.einsum('ij,ji->i',x.T.dot(y),x)
100 loops, best of 3: 2.6 ms per loop
Basically, you want to compute (x_i.T).dot(y).dot(x_i) for every x_i that you have.
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
x = np.hstack((x_0, x_1)) # [[ 0.6 2.6], [ 1.4 3.4]]
The easy way to think about it is to compute the product with y for every x_i that you have:
[x_i.dot(y).dot(x_i) for x_i in x.T]
>> [8.0, 72.0]
But of course this is not very efficient. Instead, you can take the dot product of x with y first, multiply the result element-wise by x, and sum along each row, i.e. carry out the second dot product manually. This makes the calculation much faster:
x = x.T
(x.dot(y) * x).sum(axis=1)
>> array([ 8., 72.])
Note that I transpose x first so that each row of x is one data point, which is then multiplied with the columns of y.
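One way to see why the multiply-and-sum trick works: the values we want are the diagonal of the full product rows @ mat @ rows.T, and the trick computes just that diagonal without forming the off-diagonal entries. A small sketch on random data (the names pts, mat and rows, the sizes and the seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.random((3, 5))    # 5 data points stored as columns
mat = rng.random((3, 3))

rows = pts.T                # one data point per row, shape (5, 3)

# Full product: a 5x5 matrix whose diagonal holds the quadratic forms.
full_diag = np.diag(rows @ mat @ rows.T)

# Trick: row-wise dot product of (rows @ mat) with rows, skipping the off-diagonals.
trick = ((rows @ mat) * rows).sum(axis=1)
```

For n points the full product needs an n x n intermediate, while the trick only needs arrays of the same shape as rows, which is why it scales to many points.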
I need to compute the covariance between each row of two different matrices, i.e. the covariance between the first row of the first matrix with the first row of the second matrix, and so on till the last row of both matrices. I can do it without NumPy with the code attached below, my question is: is it possible to avoid the use of the "for loop" and get the same result with NumPy?
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
output = []
for a,b in zip(m1,m2):
    cov = np.cov(a, b)
    output.append(cov[0][1])
print(output)
Thanks in advance!
If you are handling big arrays, I would consider this:
from numba import jit
import numpy as np
m1 = np.random.rand(10000, 3)
m2 = np.random.rand(10000, 3)
@jit(nopython=True)
def nb_cov(a, b):
    return [np.cov(x)[0,1] for x in np.stack((a, b), axis=1)]
To get a runtime of
>>> %timeit nb_cov(m1, m2)
The slowest run took 94.24 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 10.5 ms per loop
Compared with
>>> %timeit [np.cov(x)[0,1] for x in np.stack((m1, m2), axis=1)]
1 loop, best of 5: 410 ms per loop
You could use a list comprehension instead of a for loop, and you could eliminate zip (if you wanted to) by concatenating the two arrays along a third dimension.
import numpy as np
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
# List comprehension on zipped arrays.
out2 = [np.cov(a, b)[0][1] for a, b in zip(m1, m2)]
print(out2)
# [0.5999999999999999, 0.0]
# List comprehension on concatenated arrays.
big_array = np.concatenate((m1[:, np.newaxis, :],
m2[:, np.newaxis, :]), axis=1)
out3 = [np.cov(X)[0][1] for X in big_array]
print(out3)
# [0.5999999999999999, 0.0]
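If the goal is to drop the Python-level loop entirely, the row-wise covariance can also be written directly from its definition. This is a sketch rather than something proposed in the answers above; the helper name rowwise_cov is hypothetical, and it assumes the default np.cov normalization (ddof=1).

```python
import numpy as np

m1 = np.array([[1, 2, 3], [2, 2, 2]])
m2 = np.array([[2.56, 2.89, 3.76], [1, 2, 3.95]])

def rowwise_cov(a, b):
    """Sample covariance (ddof=1, matching np.cov's default) between paired rows."""
    a_centered = a - a.mean(axis=1, keepdims=True)
    b_centered = b - b.mean(axis=1, keepdims=True)
    return (a_centered * b_centered).sum(axis=1) / (a.shape[1] - 1)

vectorized = rowwise_cov(m1, m2)

# Reference: the list-comprehension version from above.
looped = np.array([np.cov(a, b)[0, 1] for a, b in zip(m1, m2)])
```

Both arrays hold approximately 0.6 and 0.0 for the sample input; the second value is zero because the second row of m1 is constant.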
Earlier I asked a similar question where the answer used np.dot, taking advantage of the fact that a dot product involves a sum of products. (To my understanding.)
Now I have a similar issue where I don't think dot will apply, because in place of a sum I want to take an element-wise diagonal. If it does, I haven't been able to apply it correctly.
Given a matrix x and array err:
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
My current implementation with loop is:
res = np.array([np.sqrt(np.diagonal(x * err[i])) for i in range(err.shape[0])])
print(res)
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
which takes the diagonal of x.dot(i) for each i in err. Could this be vectorized? In other words, can the output of x * err be 3-dimensional, with np.diagonal then yielding a 2d array, with one element for each diagonal?
Program:
import numpy as np
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
diag = np.diagonal(x)
ans = np.sqrt(diag*err[:,np.newaxis]) # sqrt of outer product
print(ans)
# Use the out keyword to avoid allocating a new array on every iteration.
ans = np.empty((err.shape[0], diag.shape[0]), dtype=x.dtype)
for i in range(100):
    np.multiply(diag, err[:, np.newaxis], out=ans)
    np.sqrt(ans, out=ans)
Result:
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
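The reason only the diagonal of x is needed: multiplying x by a scalar scales its diagonal by the same scalar, so the whole result is the outer product of err with the diagonal. A minimal sketch that checks this against the loop from the question, assuming a plain ndarray instead of np.matrix (the names vectorized and looped are illustrative):

```python
import numpy as np

x = np.array([[ 0.02984406, -0.00257266],
              [-0.00257266,  0.00320312]])
err = np.array([7.6363226, 13.16548267])

# Row i of the result is sqrt(err[i] * diag(x)), i.e. an outer product.
vectorized = np.sqrt(np.outer(err, np.diagonal(x)))

# Reference: scale the full matrix for each error value, then take its diagonal.
looped = np.array([np.sqrt(np.diagonal(x * e)) for e in err])
```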
Here's an approach making use of diagonal-view with ndarray.flat into x and then use broadcasting for element-wise multiplication, like so -
np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Sample run -
In [108]: x = np.matrix([[ 0.02984406, -0.00257266],
...: [-0.00257266, 0.00320312]])
...:
...: err = np.array([ 7.6363226 , 13.16548267])
...:
In [109]: np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Out[109]:
array([[ 0.47738755, 0.15639712],
[ 0.62682649, 0.20535487]])
Runtime test to see how a view helps over np.diagonal that creates a copy -
In [104]: x = np.matrix(np.random.rand(5000,5000))
In [105]: err = np.random.rand(5000)
In [106]: %timeit np.diagonal(x)*err[:,np.newaxis]
10 loops, best of 3: 66.8 ms per loop
In [107]: %timeit x.flat[::x.shape[1]+1].A1 * err[:,None]
10 loops, best of 3: 37.7 ms per loop
I have an 8x8x25000 array W and an 8 x 25000 array r. I want to multiply each 8x8 slice of W by the corresponding column (8x1) of r and save the result in Wres, which will end up being an 8x25000 matrix.
I am accomplishing this using a for loop as such:
for i in range(0,25000):
    Wres[:,i] = np.matmul(W[:,:,i],r[:,i])
But this is slow and I am hoping there is a quicker way to accomplish this.
Any ideas?
np.matmul broadcasts over the leading axes, treating the last two axes of each array as the matrices to multiply. From the docs:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
Thus, you have to perform 2 operations prior to matmul:
1) transpose a and b so that the first axis indexes the 100 slices
2) add an extra dimension to b so that b.shape = (100, 8, 1)
Given:
import numpy as np
a = np.random.rand(8,8,100)
b = np.random.rand(8, 100)
Then:
at = a.transpose(2, 0, 1) # swap to shape 100, 8, 8
bt = b.T[..., None] # swap to shape 100, 8, 1
c = np.matmul(at, bt)
c is now 100, 8, 1, reshape back to 8, 100:
c = np.squeeze(c).swapaxes(0, 1)
or
c = np.squeeze(c).T
And last, a one-liner just for convenience:
c = np.squeeze(np.matmul(a.transpose(2, 0, 1), b.T[..., None])).T
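The shape bookkeeping can be checked step by step. The sketch below follows the same transposes, but drops the trailing axis with explicit indexing instead of np.squeeze; that is a small variation of my own, chosen because np.squeeze would also remove the stack axis if there happened to be only one slice. The name looped is just a reference result for comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 8, 100))
b = rng.random((8, 100))

at = a.transpose(2, 0, 1)      # (100, 8, 8): stack axis first
bt = b.T[..., None]            # (100, 8, 1): each column as an 8x1 matrix
c = np.matmul(at, bt)          # (100, 8, 1): one matrix-vector product per slice
result = c[..., 0].T           # drop the trailing axis, then back to (8, 100)

# Reference: the original slice-by-slice loop.
looped = np.empty((8, 100))
for i in range(100):
    looped[:, i] = a[:, :, i] @ b[:, i]
```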
An alternative to using np.matmul is np.einsum, which can be accomplished in 1 shorter and arguably more palatable line of code with no method chaining.
Example arrays:
np.random.seed(123)
w = np.random.rand(8,8,25000)
r = np.random.rand(8,25000)
wres = np.einsum('ijk,jk->ik',w,r)
# a quick check on result equivalency to your loop
print(np.allclose(np.matmul(w[:, :, 1], r[:, 1]), wres[:, 1]))
True
Timing is equivalent to @Imanol's solution so take your pick of the two. Both are 30x faster than looping. Here, einsum will be competitive because of the size of the arrays. With arrays larger than these, it would likely win out, and lose for smaller arrays. See this discussion for more.
def solution1():
    return np.einsum('ijk,jk->ik',w,r)
def solution2():
    return np.squeeze(np.matmul(w.transpose(2, 0, 1), r.T[..., None])).T
def solution3():
    Wres = np.empty((8, 25000))
    for i in range(0,25000):
        Wres[:,i] = np.matmul(w[:,:,i],r[:,i])
    return Wres
%timeit solution1()
100 loops, best of 3: 2.51 ms per loop
%timeit solution2()
100 loops, best of 3: 2.52 ms per loop
%timeit solution3()
10 loops, best of 3: 64.2 ms per loop
Credit to: @Divakar
I've got a numpy array of row vectors of shape (n,3) and another numpy array of matrices of shape (n,3,3). I would like to multiply each of the n vectors with the corresponding matrix and return an array of shape (n,3) of the resulting vectors.
By now I've been using a for loop to iterate through the n vectors/matrices and do the multiplication item by item.
I would like to know if there's a more numpy-ish way of doing this. A way without the for loop that might even be faster.
//edit 1:
As requested, here's my loopy code (with n = 10):
arr_in = np.random.randn(10, 3)
matrices = np.random.randn(10, 3, 3)
arr_out = np.empty_like(arr_in)
for i in range(arr_in.shape[0]): # 10 iterations
    arr_out[i] = np.asarray(np.dot(arr_in[i], matrices[i]))
That dot-product is essentially performing reduction along axis=1 of the two input arrays. The dimensions could be represented like so -
arr_in : n 3
matrices : n 3 3
So, one way to solve it would be to "push" the dimensions of arr_in to front by one axis/dimension, thus creating a singleton dimension at axis=2 in a 3D array version of it. Then, sum-reducing the elements along axis = 1 would give us the desired output. Let's show it -
arr_in : n [3] 1
matrices : n [3] 3
Now, this could be achieved through two ways.
1) With np.einsum -
np.einsum('ij,ijk->ik',arr_in,matrices)
2) With NumPy broadcasting -
(arr_in[...,None]*matrices).sum(1)
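To see the intermediate shapes of the broadcasting version, here it is unrolled into separate steps and compared with the einsum version and a plain loop. The variable names expanded, products and via_* are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
arr_in = rng.standard_normal((10, 3))
matrices = rng.standard_normal((10, 3, 3))

expanded = arr_in[..., None]             # (10, 3, 1): singleton axis added at the end
products = expanded * matrices           # broadcasts to (10, 3, 3)
via_broadcast = products.sum(axis=1)     # sum-reduce the shared axis -> (10, 3)

via_einsum = np.einsum('ij,ijk->ik', arr_in, matrices)
via_loop = np.array([v @ m for v, m in zip(arr_in, matrices)])
```

The broadcasting version materializes the full (n, 3, 3) array of products before reducing, whereas einsum can accumulate the sum directly, which may matter for memory when n is large.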
Runtime test and verify output (for einsum version) -
In [329]: def loop_based(arr_in,matrices):
...: arr_out = np.zeros((arr_in.shape[0], 3))
...: for i in range(arr_in.shape[0]):
...: arr_out[i] = np.dot(arr_in[i], matrices[i])
...: return arr_out
...:
...: def einsum_based(arr_in,matrices):
...: return np.einsum('ij,ijk->ik',arr_in,matrices)
...:
In [330]: # Inputs
...: N = 16935
...: arr_in = np.random.randn(N, 3)
...: matrices = np.random.randn(N, 3, 3)
...:
In [331]: np.allclose(einsum_based(arr_in,matrices),loop_based(arr_in,matrices))
Out[331]: True
In [332]: %timeit loop_based(arr_in,matrices)
10 loops, best of 3: 49.1 ms per loop
In [333]: %timeit einsum_based(arr_in,matrices)
1000 loops, best of 3: 714 µs per loop
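You could use np.einsum. To get v.dot(M) for each vector-matrix pair, use np.einsum("...i,...ij", arr_in, matrices). To get M.dot(v), use np.einsum("...ij,...j", matrices, arr_in); the repeated index is the one that gets summed, so it must be the matrix axis you want to contract with the vector.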
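The two orientations are easy to mix up, so here is a sketch with the output subscripts written out explicitly, which makes the summed index unambiguous. Each result is checked against a per-pair loop; the names v_dot_m and m_dot_v are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
arr_in = rng.standard_normal((10, 3))
matrices = rng.standard_normal((10, 3, 3))

# v.dot(M): sum over i, the vector index, which is M's row index.
v_dot_m = np.einsum('...i,...ij->...j', arr_in, matrices)

# M.dot(v): sum over j, the vector index, which is M's column index.
m_dot_v = np.einsum('...ij,...j->...i', matrices, arr_in)

# References computed one pair at a time.
loop_v_dot_m = np.array([v @ m for v, m in zip(arr_in, matrices)])
loop_m_dot_v = np.array([m @ v for v, m in zip(arr_in, matrices)])
```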
I have an array of size MxN and I would like to compute the entropy of each row. What would be the fastest way to do so?
scipy.special.entr computes -x*log(x) for each element in an array. After calling that, you can sum the rows.
Here's an example. First, create an array p of positive values whose rows sum to 1:
In [23]: np.random.seed(123)
In [24]: x = np.random.rand(3, 10)
In [25]: p = x/x.sum(axis=1, keepdims=True)
In [26]: p
Out[26]:
array([[ 0.12798052, 0.05257987, 0.04168536, 0.1013075 , 0.13220688,
0.07774843, 0.18022149, 0.1258417 , 0.08837421, 0.07205402],
[ 0.08313743, 0.17661773, 0.1062474 , 0.01445742, 0.09642919,
0.17878489, 0.04420998, 0.0425045 , 0.12877228, 0.1288392 ],
[ 0.11793032, 0.15790292, 0.13467074, 0.11358463, 0.13429674,
0.06003561, 0.06725376, 0.0424324 , 0.05459921, 0.11729367]])
In [27]: p.shape
Out[27]: (3, 10)
In [28]: p.sum(axis=1)
Out[28]: array([ 1., 1., 1.])
Now compute the entropy of each row. entr uses the natural logarithm, so to get the base-2 log, divide the result by log(2).
In [29]: from scipy.special import entr
In [30]: entr(p).sum(axis=1)
Out[30]: array([ 2.22208731, 2.14586635, 2.22486581])
In [31]: entr(p).sum(axis=1)/np.log(2)
Out[31]: array([ 3.20579434, 3.09583074, 3.20980287])
If you don't want the dependency on scipy, you can use the explicit formula:
In [32]: (-p*np.log2(p)).sum(axis=1)
Out[32]: array([ 3.20579434, 3.09583074, 3.20980287])
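One caveat with the explicit formula: if any probability is exactly 0, 0 * log2(0) evaluates to nan, whereas entr returns 0 for a zero input. A NumPy-only sketch that treats zero-probability entries as contributing nothing (the helper name row_entropy_bits is hypothetical):

```python
import numpy as np

def row_entropy_bits(p):
    """Shannon entropy in bits of each row of p, with 0 * log2(0) taken as 0."""
    p = np.asarray(p, dtype=float)
    log_p = np.zeros_like(p)
    # Only evaluate the logarithm where p > 0; other entries stay at 0.
    np.log2(p, out=log_p, where=p > 0)
    return -(p * log_p).sum(axis=1)

p = np.array([[0.25, 0.25, 0.25, 0.25],   # uniform over 4 outcomes
              [0.5, 0.5, 0.0, 0.0],       # uniform over 2 outcomes
              [1.0, 0.0, 0.0, 0.0]])      # deterministic
h = row_entropy_bits(p)
```

For these three rows the entropies come out as 2, 1 and 0 bits respectively.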
As @Warren pointed out, it's unclear from your question whether you are starting out from an array of probabilities, or from the raw samples themselves. In my answer I've assumed the latter, in which case the main bottleneck will be computing the bin counts over each row.
Assuming that each vector of samples is relatively long, the fastest way to do this will probably be to use np.bincount:
import numpy as np
def entropy(x):
    """
    x is assumed to be an (nsignals, nsamples) array containing integers between
    0 and n_unique_vals
    """
    x = np.atleast_2d(x)
    nrows, ncols = x.shape
    nbins = x.max() + 1
    # count the number of occurrences for each unique integer between 0 and x.max()
    # in each row of x (np.vstack needs a sequence, not a generator)
    counts = np.vstack([np.bincount(row, minlength=nbins) for row in x])
    # divide by number of columns to get the probability of each unique value
    p = counts / float(ncols)
    # compute Shannon entropy in bits; bins with p == 0 contribute 0 instead of nan
    log_p = np.log2(p, out=np.zeros_like(p), where=p > 0)
    return -np.sum(p * log_p, axis=1)
Although Warren's method of computing the entropies from the probability values using entr is slightly faster than using the explicit formula, in practice this is likely to represent a tiny fraction of the total runtime compared to the time taken to compute the bin counts.
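Since the bin counts dominate the runtime, one possible variation is to replace the per-row np.bincount calls with a single call on offset values. This is a different technique from the one in the answer, sketched here under the assumption that x holds non-negative integers smaller than nbins; the helper name row_counts is hypothetical, and I have not benchmarked it against the per-row version.

```python
import numpy as np

def row_counts(x, nbins):
    """Per-row counts of the integers 0..nbins-1 using a single np.bincount call."""
    nrows = x.shape[0]
    # Shift row r's values into the range [r * nbins, (r + 1) * nbins).
    shifted = x + nbins * np.arange(nrows)[:, None]
    flat_counts = np.bincount(shifted.ravel(), minlength=nrows * nbins)
    return flat_counts.reshape(nrows, nbins)

rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=(5, 1000))
counts = row_counts(x, nbins=4)

# Reference: one np.bincount call per row.
reference = np.vstack([np.bincount(row, minlength=4) for row in x])
```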
Test correctness for a single row:
vals = np.arange(3)
prob = np.array([0.1, 0.7, 0.2])
row = np.random.choice(vals, p=prob, size=1000000)
print("theoretical H(x): %.6f, empirical H(x): %.6f" %
(-np.sum(prob * np.log2(prob)), entropy(row)[0]))
# theoretical H(x): 1.156780, empirical H(x): 1.157532
Test speed:
In [1]: %%timeit x = np.random.choice(vals, p=prob, size=(1000, 10000))
....: entropy(x)
....:
10 loops, best of 3: 34.6 ms per loop
If your data don't consist of integer indices between 0 and the number of unique values, you can convert them into this format using np.unique:
y = np.random.choice([2.5, 3.14, 42], p=prob, size=(1000, 10000))
unq, x = np.unique(y, return_inverse=True)
x.shape = y.shape
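A small sketch of what the np.unique conversion produces, on a hand-written array so the mapping is easy to follow. It uses reshape rather than assigning to .shape, which is only a stylistic choice; the name codes is illustrative.

```python
import numpy as np

y = np.array([[2.5, 42.0, 2.5, 3.14],
              [3.14, 3.14, 42.0, 42.0]])

unq, inverse = np.unique(y, return_inverse=True)
# Reshaping works whether the inverse comes back flat or already in y's shape.
codes = inverse.reshape(y.shape)

# codes holds integers in 0..len(unq)-1, and indexing unq with them recovers y.
recovered = unq[codes]
```

The integer array codes can then be passed to a bincount-based entropy function in place of the original values.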