summing outer product of multiple vectors in einsum - python

I have read through the einsum manual and ajcr's basic introduction
I have zero experience with einstein summation in a non-coding context, although I have tried to remedy that with some internet research (would provide links but don't have the reputation for more than two yet). I've also tried experimenting in python with einsum to see if I could get a better handle on things.
And yet I'm still unclear on whether it is both possible and efficient to do as follows:
on two arrays of arrays (a and b) of equal length (3) and height (n) , row by row produce the outer product of ( row i: a on b) plus the outer product of (row i: b on a), and then sum all the outer product matrices to output one, final matrix.
I know that 'i,j->ij' produces the outer product of one vector on another-- it's the next steps that have lost me. ('ijk,jik->ij' is definitely not it)
my other available option is to loop through the array and call the basic functions (a double outer product, and a matrix addition) from functions I've written in cython (using the numpy built in outer and sum function is not an option, it is far too slow). It is likely I'd end up moving the loop itself to cython as well.
so:
how can I express einsum-ically the procedure I described above?
would it offer real gains over doing everything in cython? or are there other alternatives I'm not aware of? (including the possibility that I've been using numpy less efficiently than I could be...)
Thanks.
edit with example:
A=np.zeros((3,3))
arrays_1=np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2=np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])
for i in range(len(arrays_1)):
A=A+(np.outer(arrays_1[i], arrays_2[i])+np.outer(arrays_2[i],arrays_1[i]))
(note, however, that in practice we're dealing with arrays of much greater length (ie still length 3 for each internal member but up to a few thousand such members), and this section of code gets (unavoidably) called many times)
in case it's at all helpful, here's the cython for the summing two outer products:
def outer_product_sum(np.ndarray[DTYPE_t, ndim=1] a_in, np.ndarray[DTYPE_t, ndim=1] b_in):
cdef double *a = <double *>a_in.data
cdef double *b = <double *>b_in.data
return np.array([
[a[0]*b[0]+a[0]*b[0], a[0]*b[1]+a[1]*b[0], a[0] * b[2]+a[2] * b[0]],
[a[1]*b[0]+a[0]*b[1], a[1]*b[1]+a[1]*b[1], a[1] * b[2]+a[2] * b[1]],
[a[2]*b[0]+a[0]*b[2], a[2]*b[1]+a[1]*b[2], a[2] * b[2]+a[2] * b[2]]])
which, right now, I call from within a 'i in range(len(array))' loop as shown above.

Einstein summation can only be used for the multiplicative part of question (i.e. the outer product). Luckily the summation does not have to be performed element-wise, but you can do that on the reduce matrices. Using the arrays from your example:
arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])
A = np.einsum('ki,kj->ij', arrays_1, arrays_2) + np.einsum('ki,kj->ij', arrays_2, arrays_1)
The input arrays are of shape (4,3), summation takes place over the first index (named 'k'). If summation should take place over the second index, change the subscripts string to 'ik,jk->ij'.

Whatever you can do with np.einsum, you can usually do faster using np.dot. In this case, A is the sum of two dot products:
arrays_1 = np.array([[1,0,0],[1,2,3],[0,1,0],[3,2,1]])
arrays_2 = np.array([[1,2,3],[0,1,0],[1,0,0],[3,2,1]])
A1 = (np.einsum('ki,kj->ij', arrays_1, arrays_2) +
np.einsum('ki,kj->ij', arrays_2, arrays_1))
A2 = arrays_1.T.dot(arrays_2) + arrays_2.T.dot(arrays_1)
print(np.allclose(A1, A2))
# True
%timeit (np.einsum('ki,kj->ij', arrays_1, arrays_2) +
np.einsum('ki,kj->ij', arrays_2, arrays_1))
# 100000 loops, best of 3: 7.51 µs per loop
%timeit arrays_1.T.dot(arrays_2) + arrays_2.T.dot(arrays_1)
# 100000 loops, best of 3: 4.51 µs per loop

Related

Vectorized inner product with numpy

I need to compute many inner products for vectors stored in numpy arrays.
Namely, in mathematical terms,
Vi = Σk Ai,k Bi,k
and
Ci,j = Σk Ai,k Di,j,k
I can of course use a loop, but I am wondering whether I could make use of a higher level operator like dot or matmul in a clever way that I haven't thought of. What is bugging me is that np.cross does accept arrays of vector to perform on, but dot and inner don't do what I want.
Your mathematical formulas already gives you the outline to define the subscripts when using np.einsum. It is as simple as:
V = np.einsum('ij,ik->i', A, B)
which translates to V[i] = sum(A[i][k]*B[i][k]).
and
C = np.einsum('ik,ijk->ij', A, D)
i.e. C[i][j] = sum(A[i][k]*D[i][j][k]
You can read more about the einsum operator on this other question.
These can be done with element-wise multiplication and sum:
Vi = Σk Ai,k Bi,k
V = (A*B).sum(axis=-1)
and
Ci,j = Σk Ai,k Di,j,k
C = (A[:,None,:] * D).sum(axis=-1)
einsum may be faster, but it's a good idea of understand this approach.
To use matmul we have to add dimensions to fit the
(batch,i,j)#(batch,j,k) => (batch,k)
pattern. The sum-of-products dimension is j. Calculation-wise this is fast, but application here may be a bit tedious.

Speed Up Program Below

I have written this for loop program below where I go through element by element of an array and do some math to those elements. Once the math is calculated it gets stored into another array.
for i in range(0, 1024):
x[i] = a * data[i]+ b * x[(i-1)] + c * x[(i-2)]
So in my program a, b, and c are just scalar numbers. Data and x are arrays. Data has an array size 1024 filled with numbers in each element. X is also an array size 1024 but it's filled with all zeros initially. In order to calculate the new elements of x I use the previous two elements of x. Initially the previous two are 0 and 0 since it takes the last two element from the x array of zeros. I multiply the current element of data by a, the last element of x by b, and the second to last element of x by c. Then I add everything up and save it to the current element of x. Then I do the same thing for every element in data and x.
This loop program works but I was wondering if there is a faster way to do it? Maybe using a combination of numpy functions like cumsum or dot product? Can some one help me maybe make the program faster? Thank you!
Best you could do using recursive method:
x = a * data
coef = np.array([c,b])
for i in range(2, 1024):
x[i] += np.dot(coef, x[i-2:i])
But even better, you can solve this recurrence equation to a closed form solution and apply directly without loop. (This is a basic 2nd order linear equation)
In general, if you want a programm that is fast, Python is not the best option. Python is great for prototyping since it is easy and has a lot of tools, however it is not verry computationally efficient in it's raw form if you compare it to for example C. What I usually do is to use Cython, is is a module for python that let's you convert your script to machiene code (as you do with C) which would greatly increase the speed of the appliation.
It let's you type cast the variables for example:
cdef double a, b, c
When you use a variable in Python the variables has to be checked every single time to make sure what type of variable it is (int, double, string etc). In C, that is not an issue since you have to decide from the start what the variable should be, decreasing the time consumption of the operation.
I would try to transform the for loop in a list comprehension which has much faster processing time in python.

Fast way to construct a matrix in Python

I have been browsing through the questions, and could find some help, but I prefer having confirmation by asking it directly. So here is my problem.
I have an (numpy) array u of dimension N, from which I want to build a square matrix k of dimension N^2. Basically, each matrix element k(i,j) is defined as k(i,j)=exp(-|u_i-u_j|^2).
My first naive way to do it was like this, which is, I believe, Fortran-like:
for i in range(N):
for j in range(N):
k[i][j]=np.exp(np.sum(-(u[i]-u[j])**2))
However, this is extremely slow. For N=1000, for example, it is taking around 15 seconds.
My other way to proceed is the following (inspired by other questions/answers):
i, j = np.ogrid[:N,:N]
k = np.exp(np.sum(-(u[i]-u[j])**2,axis=2))
This is way faster, as for N=1000, the result is almost instantaneous.
So I have two questions.
1) Why is the first method so slow, and why is the second one so fast ?
2) Is there a faster way to do it ? For N=10000, it is starting to take quite some time already, so I really don't know if this was the "right" way to do it.
Thank you in advance !
P.S: the matrix is symmetric, so there must also be a way to make the process faster by calculating only the upper half of the matrix, but my question was more related to the way to manipulate arrays, etc.
First, a small remark, there is no need to use np.sum if u can be re-written as u = np.arange(N). Which seems to be the case since you wrote that it is of dimension N.
1) First question:
Accessing indices in Python is slow, so best is to not use [] if there is a way to not use it. Plus you call multiple times np.exp and np.sum, whereas they can be called for vectors and matrices. So, your second proposal is better since you compute your k all in once, instead of elements by elements.
2) Second question:
Yes there is. You should consider using only numpy functions and not using indices (around 3 times faster):
k = np.exp(-np.power(np.subtract.outer(u,u),2))
(NB: You can keep **2 instead of np.power, which is a bit faster but has smaller precision)
edit (Take into account that u is an array of tuples)
With tuple data, it's a bit more complicated:
ma = np.subtract.outer(u[:,0],u[:,0])**2
mb = np.subtract.outer(u[:,1],u[:,1])**2
k = np.exp(-np.add(ma, mb))
You'll have to use twice np.substract.outer since it will return a 4 dimensions array if you do it in one time (and compute lots of useless data), whereas u[i]-u[j] returns a 3 dimensions array.
I used np.add instead of np.sum since it keep the array dimensions.
NB: I checked with
N = 10000
u = np.random.random_sample((N,2))
I returns the same as your proposals. (But 1.7 times faster)

an efficient way to speed up some numpy operations

I am trying to find an efficient code instead of the following piece of code (that is only one part of my code), to increase the speed:
for pr in some_list:
Tp = T[partition[pr]].sum(0)
Tpx = np.dot(Tp, xhat)
hp = h[partition[[pr]].sum(0)
up = (uk[partition[pr][:]].sum(0))/len(partition[pr])
hpu = hpu + np.dot(hp.T, up)
Tpu = Tpu + np.dot(Tp.T, up)
I have at least two more similar blocks of code. As you can see, I used fancy indexing three times (really couldn't find another way). In my algorithm, I need this part to be done very quickly, but it's not happening now. I will really appreciate any suggestion.
Thank you all.
Best,
If your partitions are few and have many elements each, you should consider swapping around the indices of your objects. Summing an array of shape (30,1000) along its second dimension should be faster than summing an array of shape (1000,30) along its first dimension, since in the former case you are always summing contiguous blocks of memory (i.e. arr[k,:] for each k) for each remaining index. So if you put the summation index last (and get rid of some trailing singleton dimension while you're at it), you might get speed-up.
As hpaulj noted in a comment, it's not clear how your loop could be vectorized. However, since it's performance-critical, you could still try vectorizing some of the work.
I suggest that you store hp, up and Tp for each partition (following pre-allocation), then perform the scalar/matrix products in a single vectorized step. Also note that Tpx is unused in your example, so I omitted it here (whatever you're doing with it, you can do it similarly to the other examples):
part_len = len(some_list) # number of partitions, N
Tpshape = (part_len,) + T.shape[1:] # (N,30,100) if T was (1000,30,100)
hpshape = (part_len,) + h.shape[1:] # (N,30,1) if h was (1000,30,1)
upshape = (part_len,) + uk.shape[1:] # (N,30,1) if uk was (1000,30,1)
Tp = np.zeros(Tpshape)
hp = np.zeros(hpshape)
up = np.zeros(upshape)
for ipr,pr in enumerate(some_list):
Tp[ipr,:,:] = T[partition[pr]].sum(0)
hp[ipr,:,:] = h[partition[[pr]].sum(0)
up[ipr,:,:] = uk[partition[pr]].sum(0)/len(partition[pr])
# compute vectorized dot products:
#Tpx unclear in original, omitted
# sum over second index (dot), sum over first index (sum in loop)
hpu = np.einsum('abc,abd->cd',hp,up) # shape (1,1)
Tpu = np.einsum('abc,abd->cd',Tp,up) # shape (100,1)
Clearly the key player is numpy.einsum. And of course if hpu and Tpu had some prior values before the loop, you have to increment those values with the results from einsum above.
As for einsum, it performs summations and contractions of arrays of arbitrary dimensions. The pattern apearing above, 'abc,abd->cd', when applied to 3d arrays A and B, will return a 2d array C, with the following definition (math pseudocode):
C(c,d) = sum_a sum_b A(a,b,c)*B(a,b,d)
For a given fix a summation index, what's inside is
sum_b A(a,b,c)*B(a,b,d)
which, if the c and d indices are kept, will be euqivalent to np.dot(A(a,:,:).T,B(a,:,:)). Since we're summing these matrices with respect to a too, we're supposed to do exactly what your loopy version does, adding up each np.dot() contribution of the total sums.

Optimizing a nested for loop

I'm trying avoid to use for loops to run my calculations. But I don't know how to do it. I have a matrix w with shape (40,100). Each line holds the position to a wave in a t time. For example first line w[0] is the initial condition (also w[1] for reasons that I will show).
To calculate the next line elements I use, for every t and x on shape range:
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
Where a and b are some constants based on equation solution (it really doesn't matter), a = 2(1-r), b=r, r=(c*(dt/dx))**2. Where c is the wave speed and dt, dx are related to the increment on x and t direction.
Is there any way to avoid a for loop like:
for t in range(1,nt-1):
for x in range(1,nx-1):
w[t+1,x] = a * w[t,x] + b * ( w[t,x-1] + w[t,x+1] ) - w[t-1,x]
nt and nx are the shape of w matrix.
I assume you're setting w[:,0] and w[:-1] beforehand (to some constants?) because I don't see it in the loop.
If so, you can eliminate for x loop vectorizing this part of code:
for t in range(1,nt-1):
w[t+1,1:-1] = a*w[t,1:-1] + b*(w[t,:-2] + w[t,2:]) - w[t-1,1:-1]
Not really. If you want to do something for every element in your matrix (which you do), you're going to have to operate on each element in some way or another (most obvious way is with a for loop. Less obvious methods will either perform the same or worse).
If you're trying to avoid loops because loops are slow, know that sometimes loops are necessary to solve a certain kind of problem. However, there are lots of ways to make loops more efficient.
Generally with matrix problems like this where you're looking at the neighboring elements, a good solution is using some kind of dynamic programming or memoization (saving your work so you don't have to repeat calculations frequently). Like, suppose for each element you wanted to take the average of it and all the things around it (this is how blurring images works). Each pixel has 8 neighbors, so the average will be the sum / 9. Well, let's say you save the sums of the columns (save NW + W + SW, N + me + S, NE + E + SE). Well when you go to the next one to the right, just sum the values of your previous middle column, your previous last column, and the values of a new column (the new ones to the right). You just replaced adding 9 numbers with adding 5. In operations that are more complicated than addition, reducing 9 to 5 can mean a huge performance increase.
I looked at what you have to do and I couldn't think of a good way to do something like I just described. But see if you can think of something similar.
Also, remember multiplication is much more expensive than addition. So if you had a loop where, for instance, you had to multiply some number by the loop variable, instead of doing 1x, 2x, 3x, ..., you could do (value last time + x).

Categories