This question is similar to this one.
I have a 2d boolean array "belong" and a 2d float array "angles".
What I want is to sum, along the rows, the angles for which the corresponding entry in belong is True, and to do that with numpy (i.e. avoid Python loops). I don't need to store the resulting rows, which would have different lengths and, as explained in the linked question, would require a list.
So what I attempted is np.sum(angles[belong], axis=1), but angles[belong] returns a 1d result, and I can't reduce it the way I want. I have also tried np.sum(angles*belong, axis=1) and that works. But I wonder if I could improve the timing by accessing only the indices where belong is True. belong is True about 30% of the time, and angles here stands in for a longer formula that involves the angles.
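For concreteness, here is a small self-contained version of the two attempts (shapes and data made up just for illustration):
import numpy as np

rng = np.random.default_rng(0)           # toy data, not the real arrays
angles = rng.random((4, 5))
belong = rng.random((4, 5)) > 0.7

print(angles[belong].shape)              # 1-D: the row structure is lost
print(np.sum(angles * belong, axis=1))   # works: one sum per row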
UPDATE
I like the solution with einsum; however, in my actual computation the speed-up is tiny. I used angles in the question to simplify; in practice it is a formula that uses angles. I suspect that this formula is evaluated for all the angles (regardless of belong) and only then passed to einsum, which performs the reduction.
This is what I've done:
THRES_THETA and max_line_length are floats.
belong, angle and lines_lengths_vstacked have shape (1653, 58)
and np.count_nonzero(belong)/belong.size -> 0.376473287856979
l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1))  # base method
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))

l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))

l3 = (lambda angle=angle, belong=belong:
      np.sum(angle*belong, axis=1))  # base method
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))

l4 = (lambda angle=angle, belong=belong:
      np.einsum('ij,ij->i', belong, angle))
t4 = timeit.Timer(l4)
print(t4.repeat(3, 100))
and the results were:
[0.2505458095931187, 0.22666162878242901, 0.23591678551324263]
[0.23295411847036418, 0.21908727226505043, 0.22407296178704272]
[0.03711204915708555, 0.03149960399994978, 0.033403337575027114]
[0.025264803208228992, 0.022590580646423053, 0.024585736455331464]
If we look at the last two rows, the einsum version is about 30% faster than the base method. But if we look at the first two rows, the speed-up for the einsum method is much smaller, only a few percent.
I'm not sure if this timing can be improved.
You can use np.einsum -
np.einsum('ij,ij->i',belong,angles)
You can also use np.bincount, like so -
idx,_ = np.where(belong)
out = np.bincount(idx,angles[belong])
Sample run -
In [32]: belong
Out[32]:
array([[ True, True, True, False, True],
[False, False, False, True, True],
[False, False, True, True, True],
[False, False, True, False, True]], dtype=bool)
In [33]: angles
Out[33]:
array([[ 0.65429151, 0.36235607, 0.98316406, 0.08236384, 0.5576149 ],
[ 0.37890797, 0.60705112, 0.79411002, 0.6450942 , 0.57750073],
[ 0.6731019 , 0.18608778, 0.83387574, 0.80120389, 0.54971573],
[ 0.18971255, 0.86765132, 0.82994543, 0.62344429, 0.05207639]])
In [34]: np.sum(angles*belong ,axis=1) # This worked for you, so using as baseline
Out[34]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
In [35]: np.einsum('ij,ij->i',belong,angles)
Out[35]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
In [36]: idx,_ = np.where(belong)
...: out = np.bincount(idx,angles[belong])
...:
In [37]: out
Out[37]: array([ 2.55742654, 1.22259493, 2.18479536, 0.88202183])
Runtime test -
In [52]: def sum_based(belong,angles):
    ...:     return np.sum(angles*belong, axis=1)
    ...:
    ...: def einsum_based(belong,angles):
    ...:     return np.einsum('ij,ij->i',belong,angles)
    ...:
    ...: def bincount_based(belong,angles):
    ...:     idx,_ = np.where(belong)
    ...:     return np.bincount(idx,angles[belong])
    ...:
In [53]: # Inputs
...: belong = np.random.rand(4000,5000)>0.7
...: angles = np.random.rand(4000,5000)
...:
In [54]: %timeit sum_based(belong,angles)
...: %timeit einsum_based(belong,angles)
...: %timeit bincount_based(belong,angles)
...:
1 loops, best of 3: 308 ms per loop
10 loops, best of 3: 134 ms per loop
1 loops, best of 3: 554 ms per loop
I would go with the np.einsum one!
You could use masked arrays for this, but in the tests I ran it is not faster than (angles * belong).sum(1).
A masked array approach would look like this:
sum_ang = np.ma.masked_where(~belong, angles, copy=False).sum(1).data
Here, we create a masked array of angles in which the positions given by ~belong ("not belong") are masked (excluded). We negate because we want to exclude the positions where belong is False. Then we take the sum along rows with .sum(1). The sum returns another masked array, so you grab the values with its .data attribute.
I added the copy=False kwarg so that this code doesn't get slowed down by array creation, but it's still slower than your (angles * belong).sum(1) approach so you should probably just stick with that.
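For reference, a minimal equivalence check of the masked-array approach against the baseline (toy random data, not the arrays from the question):
import numpy as np

rng = np.random.default_rng(1)
angles = rng.random((6, 8))
belong = rng.random((6, 8)) > 0.7

masked_sum = np.ma.masked_where(~belong, angles, copy=False).sum(1).data
baseline = (angles * belong).sum(1)
print(np.allclose(masked_sum, baseline))  # True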
I have found a way that is about 3 times faster than the einsum solution, and I don't think it can get any faster, so I'm answering my own question with this other method.
What I was hoping to do was calculate the formula involving angles only for the positions where belong is True. That should give roughly a 3x speed-up, since belong is True only about 30% of the time.
My first attempt using angles[belong] would calculate the formula just for the positions where belong is True, but had the problem that the resulting array was 1d and I couldn't do the row reductions with np.sum. The solution is to use np.add.reduceat.
reduceat can apply a ufunc reduction (in this case add) over a list of specific slices. So I just need to build that list of slice boundaries so that I can reduce the 1d array resulting from angles[belong].
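As a standalone illustration of how np.add.reduceat sums contiguous slices (toy numbers, unrelated to the real data):
import numpy as np

a = np.array([1., 2., 3., 4., 5., 6.])
# one sum per slice: a[0:2], a[2:5], a[5:]
print(np.add.reduceat(a, [0, 2, 5]))  # [ 3. 12.  6.]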
I'll show my code and timings, which should speak for themselves.
first I define a function with the reduceat solution:
def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
    intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
    b_ind = np.hstack([0, np.cumsum(np.sum(belong, axis=1))])
    votes = np.add.reduceat(intermediate, b_ind[:-1])
    return votes
then I compare with the base method and the einsum method:
l1 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.sum(belong*(0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)), axis=1))
t1 = timeit.Timer(l1)
print(t1.repeat(3, 100))

l2 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      np.einsum('ij,ij->i', belong, 0.3 * (1-(angle/THRES_THETA)) + 0.7 * (lines_lengths_vstacked/max_line_length)))
t2 = timeit.Timer(l2)
print(t2.repeat(3, 100))

l3 = (lambda angle=angle, belong=belong, THRES_THETA=THRES_THETA, lines_lengths_vstacked=lines_lengths_vstacked, max_line_length=max_line_length:
      vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length))
t3 = timeit.Timer(l3)
print(t3.repeat(3, 100))
and the timings:
[2.866840408487671, 2.6822349628234874, 2.665520338478774]
[2.3444239421490725, 2.352450520946098, 2.4150879511222794]
[0.6846337313820605, 0.660780839464234, 0.6091473217964847]
So the reduceat solution is about 3 times faster and gives the same results as the other two.
Note that these results are for a slightly larger example than before where:
belong, angle and lines_lengths_vstacked have shape: (3400, 170)
and np.count_nonzero(belong)/belong.size->0.16765051903114186
Update
Due to a corner case in np.add.reduceat (as of numpy version '1.11.0rc1'), which can't handle repeated indices correctly (see the linked issue), I had to add a hack to the vote_op() function for the case where whole rows of belong are False. Such rows produce repeated indices in b_ind and therefore wrong values in votes. My solution for the moment is to patch the wrong values; that works, but it is an extra step. See the new vote_op():
def vote_op(angle, belong, THRES_THETA, lines_lengths_vstacked, max_line_length):
    intermediate = (0.3 * (1-(angle[belong]/THRES_THETA)) + 0.7 * (lines_lengths_vstacked[belong]/max_line_length))
    b_rows = np.sum(belong, axis=1)
    b_ind = np.hstack([0, np.cumsum(b_rows)])[:-1]
    intermediate = np.hstack([intermediate, 0])
    votes = np.add.reduceat(intermediate, b_ind)
    votes[b_rows == 0] = 0
    return votes
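A minimal reproduction of the corner case (toy numbers): when an index repeats, reduceat returns the element at that index for the empty slice instead of 0, which is exactly what the patch above corrects.
import numpy as np

a = np.array([1., 2., 3.])
# indices [0, 2, 2]: the middle slice is empty, yet reduceat
# reports a[2] = 3 for it rather than 0
print(np.add.reduceat(a, [0, 2, 2]))  # [3. 3. 3.]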
Related
I need to compute the covariance between each row of two different matrices, i.e. the covariance between the first row of the first matrix and the first row of the second matrix, and so on up to the last row of both matrices. I can do it with the loop-based code attached below; my question is: is it possible to avoid the for loop and get the same result with NumPy?
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
output = []
for a, b in zip(m1, m2):
    cov = np.cov(a, b)
    output.append(cov[0][1])
print(output)
Thanks in advance!
If you are handling big arrays, I would consider this:
from numba import jit
import numpy as np

m1 = np.random.rand(10000, 3)
m2 = np.random.rand(10000, 3)

@jit(nopython=True)
def nb_cov(a, b):
    return [np.cov(x)[0, 1] for x in np.stack((a, b), axis=1)]
To get a runtime of
>>> %timeit nb_cov(m1, m2)
The slowest run took 94.24 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 5: 10.5 ms per loop
Compared with
>>> %timeit [np.cov(x)[0,1] for x in np.stack((m1, m2), axis=1)]
1 loop, best of 5: 410 ms per loop
You could use a list comprehension instead of a for loop, and you could eliminate zip (if you wanted to) by concatenating the two arrays along a third dimension.
import numpy as np
m1 = np.array([[1,2,3],[2,2,2]])
m2 = np.array([[2.56, 2.89, 3.76],[1,2,3.95]])
# List comprehension on zipped arrays.
out2 = [np.cov(a, b)[0][1] for a, b in zip(m1, m2)]
print(out2)
# [0.5999999999999999, 0.0]
# List comprehension on concatenated arrays.
big_array = np.concatenate((m1[:, np.newaxis, :],
m2[:, np.newaxis, :]), axis=1)
out3 = [np.cov(X)[0][1] for X in big_array]
print(out3)
# [0.5999999999999999, 0.0]
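For completeness, a fully vectorized sketch with no Python-level loop at all, continuing from the m1 and m2 defined above; it just writes out the sample covariance directly (same ddof=1 convention that np.cov uses by default):
# Center each row, multiply the paired rows element-wise, and
# divide the row sums by (n - 1), mirroring np.cov's default ddof=1.
n = m1.shape[1]
m1c = m1 - m1.mean(axis=1, keepdims=True)
m2c = m2 - m2.mean(axis=1, keepdims=True)
out4 = (m1c * m2c).sum(axis=1) / (n - 1)
print(out4)
# approximately [0.6, 0.0], matching out2 and out3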
For a conceptual idea of what I mean, I have 2 data points:
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
And a 2x2 matrix:
y = np.array([[2, 2], [2, 2]])
If I perform x_0.T @ y @ x_0, I get array([[ 8.]]). Similarly, x_1.T @ y @ x_1 returns array([[ 72.]]).
But is there a way to perform both of these calculations in one go, without a for loop? Obviously the speed-up here is negligible, but I am working with much more data points than presented here.
With x as the column stacked version of x_0, x_1 and so on, we can use np.einsum -
np.einsum('ji,jk,ki->i',x,y,x)
With a mix of np.einsum and matrix-multiplication -
np.einsum('ij,ji->i',x.T.dot(y),x)
As stated earlier, x was assumed to be column-stacked, like so :
x = np.column_stack((x_0, x_1))
Runtime test -
In [236]: x = np.random.randint(0,255,(3,100000))
In [237]: y = np.random.randint(0,255,(3,3))
# Proposed in @titipata's post/comments under this post
In [238]: %timeit (x.T.dot(y)*x.T).sum(1)
100 loops, best of 3: 3.45 ms per loop
# Proposed earlier in this post
In [239]: %timeit np.einsum('ji,jk,ki->i',x,y,x)
1000 loops, best of 3: 832 µs per loop
# Proposed earlier in this post
In [240]: %timeit np.einsum('ij,ji->i',x.T.dot(y),x)
100 loops, best of 3: 2.6 ms per loop
Basically, you want to do the operation (x.T).dot(y).dot(x) for every x that you have.
x_0 = np.array([0.6, 1.4])[:, None]
x_1 = np.array([2.6, 3.4])[:, None]
x = np.hstack((x_0, x_1)) # [[ 0.6 2.6], [ 1.4 3.4]]
The easy way to think about it is to do the multiplication with y for every x_i that you have:
[x_i.dot(y).dot(x_i) for x_i in x.T]
>> [8.0, 72.0]
But of course this is not very efficient. The trick is to take the dot product of x with y first, multiply the result element-wise by x itself, and then sum over the columns, i.e. do the final dot product manually. This makes the calculation much faster:
x = x.T
(x.dot(y) * x).sum(axis=1)
>> array([ 8., 72.])
Note that I transpose x first so that each data point becomes a row; then each row of x.dot(y) lines up with the corresponding row of x, and the row-wise multiply-and-sum reproduces x_i.T.dot(y).dot(x_i) for every i.
Earlier I asked a similar question where the answer used np.dot, taking advantage of the fact that a dot product involves a sum of products. (To my understanding.)
Now I have a similar issue where I don't think dot will apply, because in place of a sum I want to take an element-wise diagonal. If it does, I haven't been able to apply it correctly.
Given a matrix x and array err:
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
My current implementation with loop is:
res = np.array([np.sqrt(np.diagonal(x * err[i])) for i in range(err.shape[0])])
print(res)
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
which takes the square root of the diagonal of x * err[i] for each i in err. Could this be vectorized? In other words, can the output of x * err be 3-dimensional, with np.diagonal then yielding a 2d array containing one diagonal per row?
Program:
import numpy as np
x = np.matrix([[ 0.02984406, -0.00257266],
[-0.00257266, 0.00320312]])
err = np.array([ 7.6363226 , 13.16548267])
diag = np.diagonal(x)
ans = np.sqrt(diag*err[:,np.newaxis]) # sqrt of outer product
print(ans)
# use the out keyword to avoid allocating a new array on every iteration
ans = np.empty(x.shape, dtype=x.dtype)
for i in range(100):
    ans = np.multiply(diag, err[:, np.newaxis], out=ans)  # same outer product as above
    ans = np.sqrt(ans, out=ans)
Result:
[[ 0.47738755 0.15639712]
[ 0.62682649 0.20535487]]
Here's an approach that makes use of a diagonal-view into x with ndarray.flat and then uses broadcasting for the element-wise multiplication, like so -
np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Sample run -
In [108]: x = np.matrix([[ 0.02984406, -0.00257266],
...: [-0.00257266, 0.00320312]])
...:
...: err = np.array([ 7.6363226 , 13.16548267])
...:
In [109]: np.sqrt(x.flat[::x.shape[1]+1].A1 * err[:,None])
Out[109]:
array([[ 0.47738755, 0.15639712],
[ 0.62682649, 0.20535487]])
Runtime test to see how a view helps over np.diagonal that creates a copy -
In [104]: x = np.matrix(np.random.rand(5000,5000))
In [105]: err = np.random.rand(5000)
In [106]: %timeit np.diagonal(x)*err[:,np.newaxis]
10 loops, best of 3: 66.8 ms per loop
In [107]: %timeit x.flat[::x.shape[1]+1].A1 * err[:,None]
10 loops, best of 3: 37.7 ms per loop
I have an 8x8x25000 array W and an 8x25000 array r. I want to multiply each 8x8 slice of W by the corresponding column (8x1) of r and save the result in Wres, which will end up being an 8x25000 matrix.
I am accomplishing this using a for loop as such:
for i in range(0, 25000):
    Wres[:, i] = np.matmul(W[:, :, i], r[:, i])
But this is slow and I am hoping there is a quicker way to accomplish this.
Any ideas?
np.matmul can broadcast over a stack of matrices as long as the two arrays agree on the stacking axis. From the docs:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
Thus, you have to perform 2 operations prior to matmul:
import numpy as np
a = np.random.rand(8,8,100)
b = np.random.rand(8, 100)
1. transpose a and b so that the first axis holds the 100 slices
2. add an extra dimension to b so that b.shape = (100, 8, 1)
Then:
at = a.transpose(2, 0, 1) # swap to shape 100, 8, 8
bt = b.T[..., None] # swap to shape 100, 8, 1
c = np.matmul(at, bt)
c now has shape (100, 8, 1); reshape it back to (8, 100):
c = np.squeeze(c).swapaxes(0, 1)
or
c = np.squeeze(c).T
And last, a one-liner just for convenience:
c = np.squeeze(np.matmul(a.transpose(2, 0, 1), b.T[..., None])).T
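If helpful, a quick sanity check of the one-liner against the original loop, reusing the a and b defined above:
# reference result from the explicit loop
ref = np.empty((8, 100))
for i in range(100):
    ref[:, i] = np.matmul(a[:, :, i], b[:, i])

print(c.shape)              # (8, 100)
print(np.allclose(c, ref))  # True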
An alternative to np.matmul is np.einsum, which does this in one shorter and arguably more readable line of code with no method chaining.
Example arrays:
np.random.seed(123)
w = np.random.rand(8,8,25000)
r = np.random.rand(8,25000)
wres = np.einsum('ijk,jk->ik',w,r)
# a quick check on result equivalency to your loop
print(np.allclose(np.matmul(w[:, :, 1], r[:, 1]), wres[:, 1]))
True
Timing is equivalent to @Imanol's solution so take your pick of the two. Both are 30x faster than looping. Here, einsum will be competitive because of the size of the arrays. With arrays larger than these, it would likely win out, and lose for smaller arrays. See this discussion for more.
def solution1():
    return np.einsum('ijk,jk->ik', w, r)

def solution2():
    return np.squeeze(np.matmul(w.transpose(2, 0, 1), r.T[..., None])).T

def solution3():
    Wres = np.empty((8, 25000))
    for i in range(0, 25000):
        Wres[:, i] = np.matmul(w[:, :, i], r[:, i])
    return Wres
%timeit solution1()
100 loops, best of 3: 2.51 ms per loop
%timeit solution2()
100 loops, best of 3: 2.52 ms per loop
%timeit solution3()
10 loops, best of 3: 64.2 ms per loop
Credit to: @Divakar
I have an array of size MxN and I would like to compute the entropy of each row. What would be the fastest way to do so?
scipy.special.entr computes -x*log(x) for each element in an array. After calling that, you can sum the rows.
Here's an example. First, create an array p of positive values whose rows sum to 1:
In [23]: np.random.seed(123)
In [24]: x = np.random.rand(3, 10)
In [25]: p = x/x.sum(axis=1, keepdims=True)
In [26]: p
Out[26]:
array([[ 0.12798052, 0.05257987, 0.04168536, 0.1013075 , 0.13220688,
0.07774843, 0.18022149, 0.1258417 , 0.08837421, 0.07205402],
[ 0.08313743, 0.17661773, 0.1062474 , 0.01445742, 0.09642919,
0.17878489, 0.04420998, 0.0425045 , 0.12877228, 0.1288392 ],
[ 0.11793032, 0.15790292, 0.13467074, 0.11358463, 0.13429674,
0.06003561, 0.06725376, 0.0424324 , 0.05459921, 0.11729367]])
In [27]: p.shape
Out[27]: (3, 10)
In [28]: p.sum(axis=1)
Out[28]: array([ 1., 1., 1.])
Now compute the entropy of each row. entr uses the natural logarithm, so to get the base-2 log, divide the result by log(2).
In [29]: from scipy.special import entr
In [30]: entr(p).sum(axis=1)
Out[30]: array([ 2.22208731, 2.14586635, 2.22486581])
In [31]: entr(p).sum(axis=1)/np.log(2)
Out[31]: array([ 3.20579434, 3.09583074, 3.20980287])
If you don't want the dependency on scipy, you can use the explicit formula:
In [32]: (-p*np.log2(p)).sum(axis=1)
Out[32]: array([ 3.20579434, 3.09583074, 3.20980287])
As @Warren pointed out, it's unclear from your question whether you are starting out from an array of probabilities, or from the raw samples themselves. In my answer I've assumed the latter, in which case the main bottleneck will be computing the bin counts over each row.
Assuming that each vector of samples is relatively long, the fastest way to do this will probably be to use np.bincount:
import numpy as np
def entropy(x):
    """
    x is assumed to be an (nsignals, nsamples) array containing integers between
    0 and n_unique_vals
    """
    x = np.atleast_2d(x)
    nrows, ncols = x.shape
    nbins = x.max() + 1

    # count the number of occurrences for each unique integer between 0 and x.max()
    # in each row of x
    counts = np.vstack([np.bincount(row, minlength=nbins) for row in x])

    # divide by number of columns to get the probability of each unique value
    p = counts / float(ncols)

    # compute Shannon entropy in bits
    return -np.sum(p * np.log2(p), axis=1)
Although Warren's method of computing the entropies from the probability values using entr is slightly faster than using the explicit formula, in practice this is likely to represent a tiny fraction of the total runtime compared to the time taken to compute the bin counts.
Test correctness for a single row:
vals = np.arange(3)
prob = np.array([0.1, 0.7, 0.2])
row = np.random.choice(vals, p=prob, size=1000000)
print("theoretical H(x): %.6f, empirical H(x): %.6f" %
(-np.sum(prob * np.log2(prob)), entropy(row)[0]))
# theoretical H(x): 1.156780, empirical H(x): 1.157532
Test speed:
In [1]: %%timeit x = np.random.choice(vals, p=prob, size=(1000, 10000))
....: entropy(x)
....:
10 loops, best of 3: 34.6 ms per loop
If your data don't consist of integer indices between 0 and the number of unique values, you can convert them into this format using np.unique:
y = np.random.choice([2.5, 3.14, 42], p=prob, size=(1000, 10000))
unq, x = np.unique(y, return_inverse=True)
x.shape = y.shape
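A quick usage check (assuming the entropy() function defined above): the relabelled x can be fed straight in.
print(entropy(x).shape)  # (1000,) -- one entropy value in bits per row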