I am quite new to Python and NumPy.
If I have a list of NumPy vectors, what is the best way to make the computation fast?
I am currently doing this, which I find to be too slow:
vec = sum(list of numpy vectors) # 4 vectors of 500 dimensions each
Using sum here takes a noticeable amount of time.
Is this what you are trying to do (but with much larger arrays)?
In [193]: sum([np.ones((2,3)),np.arange(6).reshape(2,3)])
Out[193]:
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])
"500 dimensions each" is an unclear description. Do you mean an array with shape (500,) or with ndim==500? If the latter, just how many elements are there in total?
The fact that it is a list of 4 of these arrays shouldn't be a big deal. What's the time for array1 + array2?
If the arrays just have 500 elements each, the sum time is trivial:
In [195]: timeit sum([np.arange(500),np.arange(500),np.arange(500),np.arange(500)])
10000 loops, best of 3: 20.9 µs per loop
On the other hand, a sum of arrays with many small dimensions is slower, simply because each such array is much larger:
In [204]: x=np.ones((3,)*10)
In [205]: timeit z=sum([x,x,x,x])
1000 loops, best of 3: 1.6 ms per loop
In my opinion this is already the fastest variant. It is pure NumPy and as such is computed in C code.
Alternatives would be to compute the sum of each vector individually and then sum the values of that list, or to stack all the vectors and then sum up. But both are slower:
import numpy as np
import time

n = 10000

start = time.time()
for i in range(n):
    lst = np.hstack([np.random.random(500) for i in range(4)])
    x = np.sum(lst)
print("stack then np.sum: ", time.time() - start)

start = time.time()
for i in range(n):
    lst = [np.sum(np.random.random(500)) for i in range(4)]
    x = np.sum(lst)
print("sum up individually: ", time.time() - start)

start = time.time()
for i in range(n):
    lst = [np.random.random(500) for i in range(4)]
    x = np.sum(lst)
print("np.sum on list of vectors:", time.time() - start)
output:
stack then np.sum: 0.35804247856140137
sum up individually: 0.400468111038208
np.sum on list of vectors: 0.3427283763885498
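Note that np.sum over a list of vectors (as in the benchmark above) collapses everything into a single scalar, whereas the question's builtin sum(...) returns the element-wise vector sum. If the vector result is what is wanted, passing axis=0 keeps it while still doing the reduction in one NumPy call (a small sketch; timings will vary):

import numpy as np

vectors = [np.random.random(500) for _ in range(4)]

vec_builtin = sum(vectors)          # element-wise sum, shape (500,)
vec_np = np.sum(vectors, axis=0)    # same result, single C-level reduction
scalar = np.sum(vectors)            # no axis: collapses to one float

assert np.allclose(vec_builtin, vec_np)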
This sounds simple, and I think I'm overcomplicating this in my mind.
I want to make an array whose elements are generated from two source arrays of the same shape, depending on which element in the source arrays is greater.
To illustrate:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = (insert magic)
>> array([2, 5, 0])
I can't work out how to produce an array3 that combines the elements of array1 and array2 to produce an array where only the greater of the two array1/array2 element values is taken.
Any help would be much appreciated. Thanks.
We could use NumPy's built-in np.maximum, made exactly for this purpose -
np.maximum(array1, array2)
Another way would be to stack the two arrays into a 2D array and take np.max along the first axis (axis=0) -
np.max([array1,array2],axis=0)
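A quick sanity check on the example arrays from the question (both forms return the same thing):

array1 = np.array((2, 3, 0))
array2 = np.array((1, 5, 0))

np.maximum(array1, array2)          # array([2, 5, 0])
np.max([array1, array2], axis=0)    # array([2, 5, 0])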
Timings on arrays with 1 million elements -
In [271]: array1 = np.random.randint(0,9,(1000000))
In [272]: array2 = np.random.randint(0,9,(1000000))
In [274]: %timeit np.maximum(array1, array2)
1000 loops, best of 3: 1.25 ms per loop
In [275]: %timeit np.max([array1, array2],axis=0)
100 loops, best of 3: 3.31 ms per loop
# @Eric Duminil's soln1
In [276]: %timeit np.where( array1 > array2, array1, array2)
100 loops, best of 3: 5.15 ms per loop
# @Eric Duminil's soln2
In [277]: magic = lambda x,y : np.where(x > y , x, y)
In [278]: %timeit magic(array1, array2)
100 loops, best of 3: 5.13 ms per loop
Extending to other supporting ufuncs
Similarly, there's np.minimum for finding element-wise minimum values between two arrays of same or broadcastable shapes. So, to find the element-wise minimum between array1 and array2, we would have:
np.minimum(array1, array2)
For a complete list of ufuncs that support this feature, please refer to the docs and look for the keyword: element-wise. Grepping for those, I got the following ufuncs:
add, subtract, multiply, divide, logaddexp, logaddexp2, true_divide,
floor_divide, power, remainder, mod, fmod, divmod, heaviside, gcd,
lcm, arctan2, hypot, bitwise_and, bitwise_or, bitwise_xor, left_shift,
right_shift, greater, greater_equal, less, less_equal, not_equal,
equal, logical_and, logical_or, logical_xor, maximum, minimum, fmax,
fmin, copysign, nextafter, ldexp
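As a small aside (not from the original answer), two entries in that list look alike but behave differently around NaNs: np.maximum propagates NaNs, while np.fmax ignores them where it can -

a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

np.maximum(a, b)   # array([ 2., nan, nan])
np.fmax(a, b)      # array([ 2.,  2.,  3.])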
If your condition ever becomes more complex, you could use np.where:
import numpy as np
array1 = np.array((2,3,0))
array2 = np.array((1,5,0))
array3 = np.where( array1 > array2, array1, array2)
# array([2, 5, 0])
You could replace array1 > array2 with any condition. If all you want is the maximum, go with @Divakar's answer.
And just for fun:
magic = lambda x,y : np.where(x > y , x, y)
magic(array1, array2)
# array([2, 5, 0])
I am working on a Python project and making use of NumPy. I frequently have to compute Kronecker products of matrices with the identity matrix. These are a pretty big bottleneck in my code, so I would like to optimize them. There are two kinds of products I have to take. The first one is:
np.kron(np.eye(N), A)
This one is pretty easy to optimize by simply using scipy.linalg.block_diag. The product is equivalent to:
import scipy.linalg as la
la.block_diag(*[A]*N)
which is about 10 times faster. However, I am unsure how to optimize the second kind of product:
np.kron(A, np.eye(N))
Is there a similar trick I can use?
One approach would be to initialize a 4D output array and then assign values into it from A. Such an assignment broadcasts values, and this is where we get efficiency in NumPy.
Thus, a solution would be like so -
# Get shape of A
m,n = A.shape
# Initialize output array as 4D
out = np.zeros((m,N,n,N))
# Get range array for indexing into the second and fourth axes
r = np.arange(N)
# Index into the second and fourth axes and selecting all elements along
# the rest to assign values from A. The values are broadcasted.
out[:,r,:,r] = A
# Finally reshape back to 2D
out.shape = (m*N,n*N)
Put as a function -
def kron_A_N(A, N): # Simulates np.kron(A, np.eye(N))
m,n = A.shape
out = np.zeros((m,N,n,N),dtype=A.dtype)
r = np.arange(N)
out[:,r,:,r] = A
out.shape = (m*N,n*N)
return out
To simulate np.kron(np.eye(N), A), simply swap the roles of the first and second axes, and similarly the third and fourth -
def kron_N_A(A, N): # Simulates np.kron(np.eye(N), A)
m,n = A.shape
out = np.zeros((N,m,N,n),dtype=A.dtype)
r = np.arange(N)
out[r,:,r,:] = A
out.shape = (m*N,n*N)
return out
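For completeness, the same kind of allclose check as in the timings below can be run for kron_N_A (a quick sketch with a small, non-square A):

A = np.random.rand(4, 5)
N = 3
np.allclose(np.kron(np.eye(N), A), kron_N_A(A, N))   # expected: True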
Timings -
In [174]: N = 100
...: A = np.random.rand(100,100)
...:
In [175]: np.allclose(np.kron(A, np.eye(N)), kron_A_N(A,N))
Out[175]: True
In [176]: %timeit np.kron(A, np.eye(N))
1 loops, best of 3: 458 ms per loop
In [177]: %timeit kron_A_N(A, N)
10 loops, best of 3: 58.4 ms per loop
In [178]: 458/58.4
Out[178]: 7.842465753424658
I want to compute eigenvectors for an array of data (in my actual case, a cloud of polygons).
To do so I wrote this function:
import numpy as np

def eigen(data):
    eigenvectors = []
    eigenvalues = []
    for d in data:
        # compute covariance for each triangle
        cov = np.cov(d, ddof=0, rowvar=False)
        # compute eigen vectors
        vals, vecs = np.linalg.eig(cov)
        eigenvalues.append(vals)
        eigenvectors.append(vecs)
    return np.array(eigenvalues), np.array(eigenvectors)
Running this on some test data:
import cProfile
triangles = np.random.random((10**4,3,3,)) # 10k 3D triangles
cProfile.run('eigen(triangles)') # 550005 function calls in 0.933 seconds
It works fine, but it gets very slow because of the iteration loop. Is there a faster way to compute the data I need without iterating over the array? And if not, can anyone suggest ways to speed it up?
Hack It!
Well, I hacked into the covariance function's definition and plugged in the stated input parameters ddof=0, rowvar=False, and as it turns out, everything reduces to just three lines -
nC = m.shape[1] # m is the 2D input array
X = m - m.mean(0)
out = np.dot(X.T, X)/nC
To extend it to our 3D array case, I wrote down the loopy version, with these three lines iterated over the 2D sections of the 3D input array, like so -
for i,d in enumerate(m):
    # Using np.cov :
    org_cov = np.cov(d, ddof=0, rowvar=False)

    # Using earlier 2D array hacked version :
    nC = m[i].shape[0]
    X = m[i] - m[i].mean(0,keepdims=True)
    hacked_cov = np.dot(X.T, X)/nC
Boost-it-up
We need to speed up those last three lines. The computation of X across all iterations can be done with broadcasting -
diffs = data - data.mean(1,keepdims=True)
Next up, the dot-product calculation for all iterations could be done with transpose and np.dot, but that transpose could be a costly affair for such a multi-dimensional array. A better alternative exists in np.einsum, like so -
cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
Use it!
To sum up:
for d in data:
    # compute covariance for each triangle
    cov = np.cov(d, ddof=0, rowvar=False)
Could be pre-computed like so:
diffs = data - data.mean(1,keepdims=True)
cov3D = np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
These pre-computed values could then be used across iterations to compute the eigenvectors, like so -
for i,d in enumerate(data):
    # Directly use pre-computed covariances for each triangle
    vals, vecs = np.linalg.eig(cov3D[i])
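On NumPy versions that accept stacked matrices in the linalg routines (1.8 and later), even this remaining loop can go away, since np.linalg.eig operates over the last two axes of a 3D array (a small sketch, assuming cov3D from above):

# vals: (n_triangles, 3), vecs: (n_triangles, 3, 3)
vals, vecs = np.linalg.eig(cov3D)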
Test It!
Here are some runtime tests to assess the effect of pre-computing covariance results -
In [148]: def original_app(data):
     ...:     cov = np.empty(data.shape)
     ...:     for i,d in enumerate(data):
     ...:         # compute covariance for each triangle
     ...:         cov[i] = np.cov(d, ddof=0, rowvar=False)
     ...:     return cov
     ...:
     ...: def vectorized_app(data):
     ...:     diffs = data - data.mean(1,keepdims=True)
     ...:     return np.einsum('ijk,ijl->ikl',diffs,diffs)/data.shape[1]
     ...:
In [149]: data = np.random.randint(0,10,(1000,3,3))
In [150]: np.allclose(original_app(data),vectorized_app(data))
Out[150]: True
In [151]: %timeit original_app(data)
10 loops, best of 3: 64.4 ms per loop
In [152]: %timeit vectorized_app(data)
1000 loops, best of 3: 1.14 ms per loop
In [153]: data = np.random.randint(0,10,(5000,3,3))
In [154]: np.allclose(original_app(data),vectorized_app(data))
Out[154]: True
In [155]: %timeit original_app(data)
1 loops, best of 3: 324 ms per loop
In [156]: %timeit vectorized_app(data)
100 loops, best of 3: 5.67 ms per loop
I don't know how much of a speed up you can actually achieve.
Here is a slight modification that can help a little:
%timeit -n 10 values, vectors = \
eigen(triangles)
10 loops, best of 3: 745 ms per loop
%timeit values, vectors = \
zip(*(np.linalg.eig(np.cov(d, ddof=0, rowvar=False)) for d in triangles))
10 loops, best of 3: 705 ms per loop
I am interested in a calculation on large NumPy arrays. I have a large array A which contains a bunch of numbers. I want to calculate the sum of different combinations of these numbers. The structure of the data is as follows:
A = np.random.uniform(0,1, (3743, 1388, 3))
Combinations = np.random.randint(0,3, (306,3))
Final_Product = np.array([ np.sum( A*cb, axis=2) for cb in Combinations])
My question is whether there is a more elegant and memory-efficient way to calculate this. I find it frustrating to work with np.dot() when a 3-D array is involved.
If it helps, the shape of Final_Product ideally should be (3743, 306, 1388). Currently Final_Product is of the shape (306, 3743, 1388), so I can just reshape to get there.
np.dot() won't give you the desired output unless you involve extra step(s) that would probably include reshaping. Here's one vectorized approach using np.einsum to do it in one shot without any extra memory overhead -
Final_Product = np.einsum('ijk,lk->lij',A,Combinations)
For completeness, here's with np.dot and reshaping as discussed earlier -
M,N,R = A.shape
Final_Product = A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
Runtime tests and verify output -
In [138]: # Inputs ( smaller version of those listed in question )
...: A = np.random.uniform(0,1, (374, 138, 3))
...: Combinations = np.random.randint(0,3, (30,3))
...:
In [139]: %timeit np.array([ np.sum( A*cb, axis=2) for cb in Combinations])
1 loops, best of 3: 324 ms per loop
In [140]: %timeit np.einsum('ijk,lk->lij',A,Combinations)
10 loops, best of 3: 32 ms per loop
In [141]: M,N,R = A.shape
In [142]: %timeit A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
100 loops, best of 3: 15.6 ms per loop
In [143]: Final_Product =np.array([np.sum( A*cb, axis=2) for cb in Combinations])
...: Final_Product2 = np.einsum('ijk,lk->lij',A,Combinations)
...: M,N,R = A.shape
...: Final_Product3 = A.reshape(-1,R).dot(Combinations.T).T.reshape(-1,M,N)
...:
In [144]: print np.allclose(Final_Product,Final_Product2)
True
In [145]: print np.allclose(Final_Product,Final_Product3)
True
Instead of dot you could use tensordot. Your current method is equivalent to:
np.tensordot(A, Combinations, [2, 1]).transpose(2, 0, 1)
Note the transpose at the end to put the axes in the correct order.
Like dot, the tensordot function can call down to the fast BLAS/LAPACK libraries (if you have them installed) and so should perform well for large arrays.
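On NumPy versions with the @ operator (1.10+), matmul broadcasting gives yet another one-liner, and it lands directly in the (3743, 306, 1388) order the question says it ideally wants (a sketch, not benchmarked here):

Final_Product = (A @ Combinations.T).transpose(0, 2, 1)   # shape (3743, 306, 1388)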
I have a large coordinate grid (vectors a and b), for which I generate and solve a matrix (10x10) equation. Is there a way for scipy.linalg.solve to accept vector input? So far my solution has been to run for loops over the coordinate arrays.
import time
import numpy as np
import scipy.linalg

N = 10
a = np.linspace(0, 1, 10**3)
b = np.linspace(0, 1, 2*10**3)
A = np.random.random((N, N))  # input matrix, not static

def f(a, b, n):  # array-filling function
    return a*b*n

def sol(x, y):  # matrix solver
    D = np.arange(0, N)
    B = f(x, y, D)**2 + f(x-1, y+1, D)  # source vector
    X = scipy.linalg.solve(A, B)
    return X  # output an N-size vector

start = time.time()
answer = np.zeros(shape=(a.size, b.size))  # predefine output array
for egg in range(a.size):  # an ugly double-for cycle
    for ham in range(b.size):
        aa = a[egg]
        bb = b[ham]
        answer[egg, ham] = sol(aa, bb)[0]
print time.time() - start
To illustrate my point about generalized ufuncs, and the ability to eliminate the loop over egg and ham, consider the following piece of code:
import numpy as np
A = np.random.randn(4,4,10,10)
AI = np.linalg.inv(A)
#check that generalized ufuncs work as expected
I = np.einsum('xyij,xyjk->xyik', A, AI)
print np.allclose(I, np.eye(10)[np.newaxis, np.newaxis, :, :])
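np.linalg.solve is backed by the same generalized-ufunc machinery, so a whole stack of independent systems can be solved in one call, without the egg/ham double loop (a sketch with made-up stacked inputs; in the question A is actually the same matrix at every grid point, so this mainly matters if it ever varies):

As = np.random.randn(1000, 10, 10)        # 1000 different 10x10 systems
bs = np.random.randn(1000, 10)            # one right-hand side per system

# trailing singleton axis keeps the vector/matrix broadcasting unambiguous
xs = np.linalg.solve(As, bs[..., None])[..., 0]   # shape (1000, 10)

assert np.allclose(xs[0], np.linalg.solve(As[0], bs[0]))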
@yevgeniy You are right, efficiently solving multiple independent linear systems A x = b with scipy is a bit tricky (assuming an A array that changes at every iteration).
For instance, here is a benchmark for solving 1000 systems of the form A x = b, where A is a 10x10 matrix and b a 10-element vector. Surprisingly, the approach of putting all of this into one block-diagonal matrix and calling scipy.linalg.solve once is indeed slower overall, with both dense and sparse matrices.
import numpy as np
from scipy.linalg import block_diag, solve
from scipy.sparse import block_diag as sp_block_diag
from scipy.sparse.linalg import spsolve
N = 10
M = 1000 # number of coordinates
Ai = np.random.randn(N, N) # we can compute the inverse here,
# but let's assume that Ai are different matrices inside the for loop
bi = np.random.randn(N)
%timeit [solve(Ai, bi) for el in range(M)]
# 10 loops, best of 3: 32.1 ms per loop
Afull = sp_block_diag([Ai]*M, format='csr')
bfull = np.tile(bi, M)
%timeit Afull = sp_block_diag([Ai]*M, format='csr')
%timeit spsolve(Afull, bfull)
# 1 loops, best of 3: 303 ms per loop
# 100 loops, best of 3: 5.55 ms per loop
Afull = block_diag(*[Ai]*M)
%timeit Afull = block_diag(*[Ai]*M)
%timeit solve(Afull, bfull)
# 100 loops, best of 3: 14.1 ms per loop
# 1 loops, best of 3: 23.6 s per loop
The solution of the linear system with sparse arrays is faster, but the time to create this block-diagonal array is actually very slow. As for dense arrays, they are simply slower in this case (and take lots of RAM).
Maybe I'm missing something about how to make this work efficiently with sparse arrays, but if you are keeping the for loops, there are two things that you could do for optimization.
From pure Python, look at the source code of scipy.linalg.solve: remove unnecessary tests and factor all repeated operations out of your loops. For instance, assuming your arrays are not symmetric positive definite, we could do
from scipy.linalg import get_lapack_funcs
from numpy.linalg import LinAlgError

gesv, = get_lapack_funcs(('gesv',), (Ai, bi))

def solve_opt(A, b, gesv=gesv):
    # not sure if copying A and b is necessary, but just in case (faster if arrays are not copied)
    lu, piv, x, info = gesv(A.copy(), b.copy(), overwrite_a=False, overwrite_b=False)
    if info == 0:
        return x
    if info > 0:
        raise LinAlgError("singular matrix")
    raise ValueError('illegal value in %d-th argument of internal gesv|posv' % -info)
%timeit [solve(Ai, bi) for el in range(M)]
%timeit [solve_opt(Ai, bi) for el in range(M)]
# 10 loops, best of 3: 30.1 ms per loop
# 100 loops, best of 3: 3.77 ms per loop
which results in roughly an 8x speed up with these timings.
If you need even better performance, you would have to port this for loop to Cython and interface the gesv LAPACK function directly in C, as discussed here, or better, use the Cython API for BLAS/LAPACK added in SciPy 0.16.
Edit: As @Eelco Hoogendoorn mentioned, if your A matrix is fixed, there is a much simpler and more efficient approach.
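Presumably that simpler approach amounts to noticing that A never changes, so every source vector can be built with broadcasting and handed to a single scipy.linalg.solve call with many right-hand sides (a rough sketch of that idea; variable names follow the question, and memory grows with the grid size):

import numpy as np
import scipy.linalg

N = 10
A = np.random.random((N, N))
a = np.linspace(0, 1, 10**3)
b = np.linspace(0, 1, 2*10**3)
D = np.arange(N)

# build all source vectors at once with broadcasting: B has shape (a.size, b.size, N)
aa = a[:, None, None]
bb = b[None, :, None]
B = (aa*bb*D)**2 + (aa - 1)*(bb + 1)*D

# one solve for all right-hand sides, then take the first component as in the double loop
X = scipy.linalg.solve(A, B.reshape(-1, N).T)   # shape (N, a.size*b.size)
answer = X[0].reshape(a.size, b.size)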