Given an index and a size, is there a more efficient way to produce the standard basis vector:
import numpy as np
np.array([1.0 if i == index else 0.0 for i in range(size)])
In [2]: import numpy as np
In [9]: size = 5
In [10]: index = 2
In [11]: np.eye(1,size,index)
Out[11]: array([[ 0., 0., 1., 0., 0.]])
Hm, unfortunately, using np.eye for this is rather slow:
In [12]: %timeit np.eye(1,size,index)
100000 loops, best of 3: 7.68 us per loop
In [13]: %timeit a = np.zeros(size); a[index] = 1.0
1000000 loops, best of 3: 1.53 us per loop
Wrapping np.zeros(size); a[index] = 1.0 in a function makes only a modest difference, and is still much faster than np.eye:
In [24]: def f(size, index):
   ....:     arr = np.zeros(size)
   ....:     arr[index] = 1.0
   ....:     return arr
   ....:
In [27]: %timeit f(size, index)
1000000 loops, best of 3: 1.79 us per loop
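To confirm that the approaches timed above really produce the same vector, here is a minimal sketch assuming only NumPy. The helper name basis_vector is made up for illustration:

```python
import numpy as np

def basis_vector(size, index, dtype=float):
    """Return the index-th standard basis vector of length size."""
    vec = np.zeros(size, dtype=dtype)  # one allocation, already zero-filled
    vec[index] = 1                     # single scalar write
    return vec

size, index = 5, 2
e = basis_vector(size, index)

# np.eye returns a 2-D array of shape (1, size), so take row 0 to compare.
same_as_eye = np.array_equal(e, np.eye(1, size, index)[0])
same_as_list = np.array_equal(
    e, np.array([1.0 if i == index else 0.0 for i in range(size)])
)
print(e, same_as_eye, same_as_list)
```

Note that np.eye(1, size, index) is 2-D, so callers expecting a 1-D vector need the extra [0].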
x = np.zeros(size)
x[index] = 1.0
At least, I think that's it...
>>> t = timeit.Timer('np.array([1.0 if i == index else 0.0 for i in range(size)])','import numpy as np;size=10000;index=5123')
>>> t.timeit(10)
0.039461429317952934 #original method
>>> t = timeit.Timer('x=np.zeros(size);x[index]=1.0','import numpy as np;size=10000;index=5123')
>>> t.timeit(10)
9.4077963240124518e-05 #zeros method
>>> t = timeit.Timer('x=np.eye(1,size,index)','import numpy as np;size=10000;index=5123')
>>> t.timeit(10)
0.0001398340635319073 #eye method
Looks like np.zeros is fastest...
I'm not sure if this is faster, but it's definitely more clear to me.
a = np.zeros(size)
a[index] = 1.0
Often, you need not one but all basis vectors. If this is the case, consider np.eye:
basis = np.eye(3)
for vector in basis:
    ...
Not exactly the same, but closely related: with a little trickery, this even works to get a set of basis matrices:
>>> d, e = 2, 3 # want 2x3 matrices
>>> basis = np.eye(d*e,d*e).reshape((d*e,d,e))
>>> print(basis)
[[[ 1. 0. 0.]
[ 0. 0. 0.]]
[[ 0. 1. 0.]
[ 0. 0. 0.]]
[[ 0. 0. 1.]
[ 0. 0. 0.]]
[[ 0. 0. 0.]
[ 1. 0. 0.]]
[[ 0. 0. 0.]
[ 0. 1. 0.]]
[[ 0. 0. 0.]
[ 0. 0. 1.]]]
and so on.
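To see why the reshape works: row k of the identity has its single 1 at flat position k, and reshaping to (d, e) moves that 1 to the row-major position of k. A small sketch checking this, and checking that the basis rebuilds an arbitrary matrix (the matrix M below is just an example):

```python
import numpy as np

d, e = 2, 3
basis = np.eye(d * e).reshape(d * e, d, e)

for k, mat in enumerate(basis):
    row, col = np.unravel_index(k, (d, e))  # row-major position of the k-th one
    assert mat[row, col] == 1.0 and mat.sum() == 1.0

# Any d x e matrix is the coefficient-weighted sum of the basis matrices.
M = np.arange(d * e, dtype=float).reshape(d, e)
rebuilt = np.tensordot(M.ravel(), basis, axes=1)
print(np.array_equal(rebuilt, M))
```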
Another way to implement this is:
>>> def f(size, index):
...     return (np.arange(size) == index).astype(float)
...
Which gives a slightly slower execution time:
>>> timeit.timeit('f(size, index)', 'from __main__ import f, size, index', number=1000000)
2.2554846050043125
It may not be the fastest, but the method scipy.signal.unit_impulse generalizes the above concept to numpy arrays of any shape.
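The same idea, a single 1 in an otherwise zero array of arbitrary shape, can also be sketched in plain NumPy. This is not scipy's implementation, only an illustration of the concept; the helper name unit_impulse_nd is hypothetical:

```python
import numpy as np

def unit_impulse_nd(shape, idx, dtype=float):
    """Array of zeros with a single 1 at position idx (hypothetical helper)."""
    out = np.zeros(shape, dtype=dtype)
    # idx should be an int for 1-D shapes and a full tuple of ints for N-D
    # shapes; a partial index would set a whole slice instead of one cell.
    out[idx] = 1
    return out

print(unit_impulse_nd(5, 2))
print(unit_impulse_nd((2, 3), (1, 0)))
```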
Related
I have a 2d array, and I have some numbers to add to some cells. I want to vectorize the operation in order to save time. The problem is when I need to add several numbers to the same cell. In this case, the vectorized code only adds the last.
'a' is my array, 'x' and 'y' are the coordinates of the cells I want to increment, and 'z' contains the numbers I want to add.
import numpy as np
a=np.zeros((4,4))
x=[1,2,1]
y=[0,1,0]
z=[2,3,1]
a[x,y]+=z
print(a)
As you can see, a[1,0] should be incremented twice: once by 2 and once by 1. So the expected array should be:
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
but instead I get:
[[0. 0. 0. 0.]
[1. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
The problem would be easy to solve with a for loop, but I wonder if I can correctly vectorize this operation.
Use np.add.at for that:
import numpy as np
a = np.zeros((4,4))
x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]
np.add.at(a, (x, y), z)
print(a)
# [[0. 0. 0. 0.]
# [3. 0. 0. 0.]
# [0. 3. 0. 0.]
# [0. 0. 0. 0.]]
When you do a[x,y] += z, the operation decomposes as:
a[1, 0], a[2, 1], a[1, 0] = [a[1, 0] + 2, a[2, 1] + 3, a[1, 0] + 1]
# Every right-hand side is computed from the original values (all 0 here),
# so this is equivalent to:
a[1, 0] = 2
a[2, 1] = 3
a[1, 0] = 1
That's why it doesn't work: the last assignment to a[1, 0] overwrites the first.
If you instead increment the array in a loop, one (x, y, z) triple at a time, it works as expected.
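The three behaviours can be compared side by side. A minimal sketch using the question's data (the array names buffered, looped and unbuffered are just labels for this example):

```python
import numpy as np

x = [1, 2, 1]
y = [0, 1, 0]
z = [2, 3, 1]

# Fancy-indexed += reads a[x, y] once, adds z, then writes the results back;
# the two writes to (1, 0) overwrite each other, and the last one wins.
buffered = np.zeros((4, 4))
buffered[x, y] += z

# An explicit loop applies every increment in turn.
looped = np.zeros((4, 4))
for xi, yi, zi in zip(x, y, z):
    looped[xi, yi] += zi

# np.add.at performs the same unbuffered accumulation without a Python loop.
unbuffered = np.zeros((4, 4))
np.add.at(unbuffered, (x, y), z)

print(buffered[1, 0], looped[1, 0], unbuffered[1, 0])
```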
You could create a multi-dimensional array of size 3x4x4, add each element of z to its own 4x4 layer, and then sum over the layers:
import numpy as np
x = [1,2,1]
y = [0,1,0]
z = [2,3,1]
a = np.zeros((3,4,4))
n = range(a.shape[0])
a[n,x,y] += z
print(a.sum(axis=0))
which will result in
[[0. 0. 0. 0.]
[3. 0. 0. 0.]
[0. 3. 0. 0.]
[0. 0. 0. 0.]]
Approach #1: Bincount-based method for performance
We can use np.bincount for efficient bin-based summation -
def accumulate_arr(x, y, z, out):
    # Get output array shape
    shp = out.shape
    # Get linear indices to be used as IDs with bincount
    lidx = np.ravel_multi_index((x,y),shp)
    # Or lidx = np.asarray(x)*shp[1] + np.asarray(y)
    # Accumulate arr with IDs from lidx
    out += np.bincount(lidx,z,minlength=out.size).reshape(out.shape)
    return out
If you are working with a zeros-initialized output array, feed in the output shape directly into the function and get the bincount output as the final one.
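For the zeros-initialized case just described, the function could take the shape instead of an existing array. A sketch under that assumption; the name accumulate_from_zeros is made up here and is not part of the answer above:

```python
import numpy as np

def accumulate_from_zeros(x, y, z, shape):
    """Sum z into a fresh zero array of the given 2-D shape (illustrative helper)."""
    lidx = np.ravel_multi_index((x, y), shape)  # flat index of each (x, y) pair
    size = shape[0] * shape[1]
    # bincount sums the weights that share a flat index; minlength pads the
    # result so it can be reshaped to the full output shape.
    return np.bincount(lidx, weights=z, minlength=size).reshape(shape)

out = accumulate_from_zeros([1, 2, 1], [0, 1, 0], [2, 3, 1], (4, 4))
print(out)
```

This skips the in-place += entirely, since the bincount result already is the final array.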
Output on given sample -
In [48]: accumulate_arr(x,y,z,a)
Out[48]:
array([[0., 0., 0., 0.],
[3., 0., 0., 0.],
[0., 3., 0., 0.],
[0., 0., 0., 0.]])
Approach #2: Using sparse-matrix for memory-efficiency
In [54]: from scipy.sparse import coo_matrix
In [56]: coo_matrix((z,(x,y)), shape=(4,4)).toarray()
Out[56]:
array([[0, 0, 0, 0],
[3, 0, 0, 0],
[0, 3, 0, 0],
[0, 0, 0, 0]])
If you are okay with a sparse-matrix, skip the .toarray() part for a memory-efficient solution.
I want to eliminate the inefficient for loop from this code
import numpy as np
x = np.zeros((5,5))
for i in range(5):
    x[i] = np.random.choice(i+1, 5)
While maintaining the output given
[[0. 0. 0. 0. 0.]
[0. 0. 1. 0. 0.]
[0. 2. 2. 1. 0.]
[1. 2. 3. 1. 0.]
[1. 0. 3. 3. 1.]]
I have tried this
i = np.arange(5)
x[i] = np.random.choice(i+1, 5)
But it outputs
[[0. 1. 1. 3. 3.]
[0. 1. 1. 3. 3.]
[0. 1. 1. 3. 3.]
[0. 1. 1. 3. 3.]
[0. 1. 1. 3. 3.]]
Is it possible to remove the loop? If not, which is the most efficient way to proceed for a big array and a lot of repetitions?
Create a random integer array whose upper bound is the number of columns, using np.random.randint with its high argument set to n. Then apply a modulus operation so that each row gets a different limit defined by its row number. That gives a vectorized implementation like so -
def create_rand_limited_per_row(m,n):
    s = np.arange(1,m+1)
    return np.random.randint(low=0,high=n,size=(m,n))%s[:,None]
Sample run -
In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]:
array([[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[1, 2, 0, 2, 1],
[0, 0, 1, 3, 0],
[1, 2, 3, 3, 2]])
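One caveat with the modulus trick: values drawn from 0..n-1 and reduced modulo a row limit are uniform only when that limit divides n evenly, and rows with a limit above n never reach their top values. If that matters, a possible alternative is to let the generator broadcast a per-row upper bound. This is a sketch assuming NumPy 1.17 or later (for np.random.default_rng); the function name and the seed parameter are made up for illustration:

```python
import numpy as np

def rand_limited_per_row(m, n, seed=None):
    """Row i holds n integers drawn uniformly from 0..i (inclusive)."""
    rng = np.random.default_rng(seed)
    high = np.arange(1, m + 1)[:, None]        # exclusive upper bound per row, shape (m, 1)
    return rng.integers(0, high, size=(m, n))  # bounds broadcast against size

x = rand_limited_per_row(5, 5, seed=0)
print(x)
```

I have not benchmarked this against the modulus version, so treat it as a correctness-oriented variant rather than a faster one.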
To leverage multi-core with numexpr module for large data -
import numexpr as ne
def create_rand_limited_per_row_numepxr(m,n):
    s = np.arange(1,m+1)[:,None]
    a = np.random.randint(0,n,(m,n))
    return ne.evaluate('a%s')
Benchmarking
# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x
Timings on 1k x 1k data -
In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop
In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop
In [73]: %timeit create_rand_limited_per_row_numepxr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
Let me start off by explaining what I want to do. I am trying to build a recommendation system based on m packages, each with n features, stored in an m x n sparse matrix X. To do this, I'm attempting to run kNN to get the k closest matches for a package. I want to build an m x m sparse matrix K where K[i, j] is the dot product of rows X[i] and X[j] if X[j] was a package returned by kNN for X[i], otherwise 0.
Here is the code I've written:
X = ...
knn = NearestNeighbors(n_neighbors=self.n_neighbors, metric='l2')
knn.fit(X)
knn_indices = knn.kneighbors(X, return_distance=False)
m, k = X.shape[0], self.n_neighbors
K = lil_matrix((m, m))
for i, indices in enumerate(knn_indices):
    xi = X.getrow(i)
    for j in indices:
        xj = X.getrow(j)
        K[i, j] = xi.dot(xj.T)[0, 0]
I'm trying to figure out how to make this more efficient. In my scenario, m is ~1.2 million, n is ~50000, and k is 500, so perf is very important.
The last part where I populate K is the bottleneck of my program. getrow seems to perform very poorly; according to the scipy docs, it makes a copy of the row, so getrow call could be copying up to 50k elements each time it's called. Also, in the innermost loop I can't figure out how to get back a scalar for dot instead of creating a whole new 1x1 sparse matrix.
How can I avoid these problems and speed up/vectorize the last part of this code? Thanks.
In [21]: from scipy import sparse
In [22]: M = sparse.random(10,10,.2,'csr')
In [23]: M
Out[23]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
Looking at M.A, I selected this small knn_indices array for testing:
In [45]: knn = np.array([[4],[2],[],[1,3]], dtype=object)
Your double loop:
In [46]: for i, indices in enumerate(knn):
    ...:     xi = M[i,:]
    ...:     for j in indices:
    ...:         xj = M[j,:]
    ...:         print((xi*xj.T).A)
    ...:
[[0.35494592]]
[[0.]]
[[0.08112133]]
[[0.56905781]]
The inner loop can be condensed:
In [47]: for i, indices in enumerate(knn):
    ...:     xi = M[i,:]
    ...:     xj = M[indices,:]
    ...:     print((xi*xj.T).A)
    ...:
[[0.35494592]]
[[0.]]
[]
[[0.08112133 0.56905781]]
and with the assignment:
In [49]: k = sparse.lil_matrix((4,5))
In [50]: for i, indices in enumerate(knn):
    ...:     xi = M[i,:]
    ...:     for j in indices:
    ...:         xj = M[j,:]
    ...:         k[i,j] = (xi*xj.T)[0,0]
    ...:
    ...:
In [51]: k.A
Out[51]:
array([[0. , 0. , 0. , 0. , 0.35494592],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.08112133, 0. , 0.56905781, 0. ]])
The second loop with
k[i,indices] = (xi*xj.T)
does the same thing.
It may be possible to do something with the i loop as well, but this is at least a start.
That knn doesn't need to be an array. With differing inner list lengths it's an object dtype anyway, so it's better to leave it as a list.
An alternative to filling this lil matrix would be to accumulate i, indices and the dot products in coo-style arrays.
In [64]: r,c,d = [],[],[]
In [65]: for i, indices in enumerate(knn):
    ...:     xi = M[i,:]
    ...:     xj = M[indices,:]
    ...:     t = (xi*xj.T).data
    ...:     if len(t)>0:
    ...:         r.extend([i]*len(indices))
    ...:         c.extend(indices)
    ...:         d.extend(t)
    ...:
In [66]: r,c,d
Out[66]:
([0, 3, 3],
[4, 1, 3],
[0.3549459176547072, 0.08112132851228658, 0.5690578146292733])
In [67]: sparse.coo_matrix((d,(r,c))).A
Out[67]:
array([[0. , 0. , 0. , 0. , 0.35494592],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0. , 0. , 0. , 0. ],
[0. , 0.08112133, 0. , 0.56905781, 0. ]])
In my test case the 2nd row doesn't have any nonzero values, requiring an extra test in the loop. I don't know if this is any faster than the lil approach.
Is it possible to use np.bincount but get the max instead of the sum of weights? Here, bbb at index 3 has two values, 11.1 and 55.5. I want to have 55.5, not 66.6. I suspect I should use another function, but I'm not sure which one suits this purpose.
bbb = np.array([ 3, 7, 11, 13, 3])
weight = np.array([ 11.1, 22.2, 33.3, 44.4, 55.5])
print(np.bincount(bbb, weight, minlength=15))
OUT >> [ 0. 0. 0. 66.6 0. 0. 0. 22.2 0. 0. 0. 33.3 0. 44.4 0. ]
Note that, in fact, bbb and weight are very large (about 5e6 elements).
The solution to your 2D question is also valid for the 1D case, so you can use np.maximum.at
out = np.zeros(15)
np.maximum.at(out, bbb, weight)
# array([ 0. , 0. , 0. , 55.5, 0. , 0. , 0. , 22.2, 0. ,
# 0. , 0. , 33.3, 0. , 44.4, 0. ])
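One thing to watch with the zeros-initialized output: if any weight is negative, the initial 0 wins the comparison. A sketch of one way around that, assuming it is acceptable to report 0 for empty bins (the negative weights below are altered example data, not the question's):

```python
import numpy as np

bbb = np.array([3, 7, 11, 13, 3])
weight = np.array([-11.1, 22.2, 33.3, 44.4, -55.5])  # negative values in bin 3

# Starting from zeros, the initial 0 beats every negative weight.
zeros_init = np.zeros(15)
np.maximum.at(zeros_init, bbb, weight)

# Starting from -inf keeps the true maximum; empty bins are reset to 0 afterwards.
out = np.full(15, -np.inf)
np.maximum.at(out, bbb, weight)
out[np.isneginf(out)] = 0.0

print(zeros_init[3], out[3])
```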
Approach #1 : Here's one way with np.maximum.reduceat to get the binned maximum values -
def binned_max(bbb, weight, minlength):
    sidx = bbb.argsort()
    weight_s = weight[sidx]
    bbb_s = bbb[sidx]
    cut_idx = np.flatnonzero(np.concatenate(([True], bbb_s[1:] != bbb_s[:-1])))
    bbb_unq = bbb_s[cut_idx]
    #Or bbb_unq, cut_idx = np.unique(bbb_s, return_index=1)
    max_val = np.maximum.reduceat(weight_s, cut_idx)
    out = np.zeros(minlength, dtype=weight.dtype)
    out[bbb_unq] = max_val
    return out
Sample run -
In [36]: bbb = np.array([ 3, 7, 11, 13, 3])
...: weight = np.array([ 11.1, 22.2, 33.3, 44.4, 55.5])
In [37]: binned_max(bbb, weight, minlength=15)
Out[37]:
array([ 0. , 0. , 0. , 55.5, 0. , 0. , 0. , 22.2, 0. ,
0. , 0. , 33.3, 0. , 44.4, 0. ])
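To make the sort-then-reduceat step easier to follow, here are the intermediate arrays for the sample data. This is only a walkthrough of the same technique with more verbose (made-up) variable names, and it uses a stable sort so the intermediate order is predictable:

```python
import numpy as np

bbb = np.array([3, 7, 11, 13, 3])
weight = np.array([11.1, 22.2, 33.3, 44.4, 55.5])

order = np.argsort(bbb, kind="stable")
bins_sorted = bbb[order]          # [ 3  3  7 11 13]
weights_sorted = weight[order]    # [11.1 55.5 22.2 33.3 44.4]

# Index where each run of equal bin values begins.
starts = np.flatnonzero(np.r_[True, bins_sorted[1:] != bins_sorted[:-1]])

# reduceat takes the maximum over weights_sorted[starts[i]:starts[i+1]],
# with the last group running to the end of the array.
group_max = np.maximum.reduceat(weights_sorted, starts)
print(starts, group_max)
```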
Approach #2 : I was also having fun trying numba on this problem, and it seems quite efficient. Here's one numba way -
from numba import njit

@njit
def numba_func(out, bins, weight, minlength):
    l = len(bins)
    for i in range(l):
        if out[bins[i]] < weight[i]:
            out[bins[i]] = weight[i]
    return out

def maxat_numba(bins, weight, minlength):
    out = np.zeros(minlength, dtype=weight.dtype)
    out[bins] = weight.min()
    numba_func(out, bins, weight, minlength)
    return out
Runtime test -
The built-in with np.maximum.at looks quite neat and would be the preferred one in most scenarios, so testing the proposed one against it -
# @Nils Werner's soln with np.maximum.at
def maxat_numpy(bins, weight, minlength):
    out = np.zeros(minlength)
    np.maximum.at(out, bins, weight)
    return out
Timings -
Case #1 :
In [155]: bbb = np.random.randint(1,1000, (10000))
In [156]: weight = np.random.rand(*bbb.shape)
In [157]: %timeit maxat_numpy(bbb, weight, minlength=bbb.max()+1)
1000 loops, best of 3: 686 µs per loop
In [158]: %timeit maxat_numba(bbb, weight, minlength=bbb.max()+1)
10000 loops, best of 3: 60.6 µs per loop
Case #2 :
In [159]: bbb = np.random.randint(1,10000, (1000000))
In [160]: weight = np.random.rand(*bbb.shape)
In [161]: %timeit maxat_numpy(bbb, weight, minlength=bbb.max()+1)
10 loops, best of 3: 66 ms per loop
In [162]: %timeit maxat_numba(bbb, weight, minlength=bbb.max()+1)
100 loops, best of 3: 5.42 ms per loop
Probably not quite as fast as the answer by Nils, but the numpy_indexed package (disclaimer: I am its author) has a more flexible syntax for performing these types of operations:
import numpy_indexed as npi
unique_keys, maxima_per_key = npi.group_by(bbb).max(weight)
I wanted to interleave the rows of two numpy arrays of the same size.
I came up with this solution.
import numpy
# A and B are same-shaped arrays
A = numpy.ones((4,3))
B = numpy.zeros_like(A)
# list() is needed on Python 3, where zip returns an iterator
C = numpy.array(list(zip(A[::1], B[::1]))).reshape(A.shape[0]*2, A.shape[1])
print(C)
Outputs
[[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]]
Is there a cleaner, faster, better, numpy-only way?
It is maybe a bit clearer to do:
A = np.ones((4,3))
B = np.zeros_like(A)
C = np.empty((A.shape[0]+B.shape[0],A.shape[1]))
C[::2,:] = A
C[1::2,:] = B
and it's probably a bit faster as well, I'm guessing.
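The slice-assignment idea extends naturally to more than two arrays. A sketch, assuming all inputs are 2-D with identical shapes; the helper name interleave_rows is made up for this example:

```python
import numpy as np

def interleave_rows(*arrays):
    """Interleave the rows of equally shaped 2-D arrays (hypothetical helper)."""
    k = len(arrays)
    rows, cols = arrays[0].shape
    out = np.empty((k * rows, cols), dtype=np.result_type(*arrays))
    for i, arr in enumerate(arrays):
        out[i::k] = arr   # rows i, i+k, i+2k, ... come from the i-th array
    return out

A = np.ones((4, 3))
B = np.zeros_like(A)
C = interleave_rows(A, B)
print(C[:4])
```

The Python loop runs once per input array, not once per row, so it should stay cheap even for tall arrays.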
I find the following approach using numpy.hstack() quite readable:
import numpy as np
a = np.ones((2,3))
b = np.zeros_like(a)
c = np.hstack([a, b]).reshape(4, 3)
print(c)
Output:
[[ 1. 1. 1.]
[ 0. 0. 0.]
[ 1. 1. 1.]
[ 0. 0. 0.]]
It is easy to generalize this to a list of arrays of the same shape:
arrays = [a, b, c,...]
shape = (len(arrays)*a.shape[0], a.shape[1])
interleaved_array = np.hstack(arrays).reshape(shape)
It seems to be a bit slower than the accepted answer of @JoshAdel on small arrays but equally fast or faster on large arrays:
a = np.random.random((3,100))
b = np.random.random((3,100))
%%timeit
...: C = np.empty((a.shape[0]+b.shape[0],a.shape[1]))
...: C[::2,:] = a
...: C[1::2,:] = b
...:
The slowest run took 9.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 3.3 µs per loop
%timeit c = np.hstack([a,b]).reshape(2*a.shape[0], a.shape[1])
The slowest run took 5.06 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 10.1 µs per loop
a = np.random.random((4,1000000))
b = np.random.random((4,1000000))
%%timeit
...: C = np.empty((a.shape[0]+b.shape[0],a.shape[1]))
...: C[::2,:] = a
...: C[1::2,:] = b
...:
10 loops, best of 3: 23.2 ms per loop
%timeit c = np.hstack([a,b]).reshape(2*a.shape[0], a.shape[1])
10 loops, best of 3: 21.3 ms per loop
You can stack, transpose, and reshape:
numpy.dstack((A, B)).transpose(0, 2, 1).reshape(A.shape[0]*2, A.shape[1])
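If the transpose feels hard to read, np.stack along a new axis 1 should give the same layout directly. A small sketch comparing the two, using distinguishable example arrays rather than ones and zeros:

```python
import numpy as np

A = np.arange(12).reshape(4, 3)
B = -A

# Stacking along a new axis 1 gives shape (4, 2, 3): each A row sits next to its B row.
stacked = np.stack((A, B), axis=1)
C = stacked.reshape(-1, A.shape[1])

# Same result as the dstack/transpose/reshape one-liner.
D = np.dstack((A, B)).transpose(0, 2, 1).reshape(A.shape[0] * 2, A.shape[1])
print(np.array_equal(C, D))
```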