Vectorized inner product with numpy - python

I need to compute many inner products for vectors stored in numpy arrays.
Namely, in mathematical terms,
$V_i = \sum_k A_{i,k} B_{i,k}$
and
$C_{i,j} = \sum_k A_{i,k} D_{i,j,k}$
I can of course use a loop, but I am wondering whether I could make use of a higher-level operator like dot or matmul in a clever way that I haven't thought of. What is bugging me is that np.cross does accept arrays of vectors to operate on, but dot and inner don't do what I want.

Your mathematical formulas already give you the subscripts to use with np.einsum. It is as simple as:
V = np.einsum('ik,ik->i', A, B)
which translates to V[i] = sum(A[i][k]*B[i][k]),
and
C = np.einsum('ik,ijk->ij', A, D)
i.e. C[i][j] = sum(A[i][k]*D[i][j][k]).
You can read more about the einsum operator on this other question.
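The subscripts can be checked against explicit loops. This is a minimal sketch: the array sizes and random data are assumptions chosen for illustration. Note that V uses 'ik,ik->i', so that k is shared by both operands and summed over.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 5, 3                      # illustrative sizes (assumed)
A = rng.standard_normal((n, p))
B = rng.standard_normal((n, p))
D = rng.standard_normal((n, m, p))

# Row-wise inner products: k is repeated in both inputs and absent from the output
V = np.einsum('ik,ik->i', A, B)
# Each row A[i] against each row D[i, j]
C = np.einsum('ik,ijk->ij', A, D)

# Reference results with plain Python loops
V_loop = np.array([sum(A[i, k] * B[i, k] for k in range(p)) for i in range(n)])
C_loop = np.array([[sum(A[i, k] * D[i, j, k] for k in range(p))
                    for j in range(m)] for i in range(n)])

print(np.allclose(V, V_loop), np.allclose(C, C_loop))
```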

These can be done with element-wise multiplication and sum:
$V_i = \sum_k A_{i,k} B_{i,k}$
V = (A*B).sum(axis=-1)
and
$C_{i,j} = \sum_k A_{i,k} D_{i,j,k}$
C = (A[:,None,:] * D).sum(axis=-1)
einsum may be faster, but it's a good idea to understand this approach.
To use matmul we have to add dimensions to fit the
(batch,i,j)@(batch,j,k) => (batch,i,k)
pattern. The sum-of-products dimension is j. Calculation-wise this is fast, but applying it here may be a bit tedious.
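A sketch of the matmul version, assuming A and B have shape (n, p) and D has shape (n, m, p) as in the question (the concrete sizes below are made up). Singleton axes are added so each row becomes a 1-row or 1-column matrix, and removed again afterwards.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((4, 3))
D = rng.standard_normal((4, 5, 3))

# V: (batch,1,k) @ (batch,k,1) -> (batch,1,1), then drop the singleton axes
V = (A[:, None, :] @ B[:, :, None])[:, 0, 0]

# C: (batch,j,k) @ (batch,k,1) -> (batch,j,1), then drop the last axis
C = (D @ A[:, :, None])[:, :, 0]

# Same results as the multiply-and-sum formulation
print(np.allclose(V, (A * B).sum(axis=-1)))
print(np.allclose(C, (A[:, None, :] * D).sum(axis=-1)))
```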

Related

Matrix of "for x in vectors: for y in vectors:", without two for loops

Can one get the Cartesian outer product of two sets of vectors without using two for loops? It is slow because the data is large.
[[f(x,y) for x in vectors] for y in vectors]
I am trying to do an agglomerative clustering project in python and for this, I need to create a distance matrix.
This is the code that I have to define the function for the distance matrix:
def distance_matrix(vectors):
    s = np.zeros((len(vectors), len(vectors)))
    for i in range(len(vectors)):
        for v in range(len(vectors)):
            s[i, v] = dissimilarity(vectors[i], vectors[v])
    return s
What it should do is take a list of NumPy arrays and return a 2D NumPy array d where the entry d[i,j] contains the dissimilarity between vectors[i] and vectors[j].
In this case, vectors is the list of NumPy arrays and the dissimilarity is calculated by:
def dissimilarity(v1, v2):
    return (1-(v1.dot(v2)/(np.linalg.norm(v1)*np.linalg.norm(v2))))
or in other words, the dissimilarity is the cosine dissimilarity between two 1D NumPy arrays.
My goal is to find a way to get the distance matrix without the double for loops but still have the computational time be very small.
In general one cannot do this. In general (but not in this case), the computation time will not change, because the work still has to be done: no matter what accounting tricks you use to rewrite it (a single for loop that acts like two for loops, or a recursive implementation), the same work must be done regardless of the order in which you do it.
Is there a way to get rid of the for loops? Even if it slows down the run time a bit. – Cat_Smithyyyy03
Your case is special. While one normally has no reason to think a speedup is possible, you are asking about tensor products, or in this case a matrix product. When you think about general matrix multiplication AB=C, the matrix product C is basically the table of dot products {a dot b} for all row vectors a of A and all column vectors b of B.
So, for your case, first normalize all your vectors (so the normalization is not required later), then stack them to form A, then let B=transp(A), then do a matrix multiply.
[[a0 a1 a2]       [[a0 b0 ... z0]       [[aa ab ac .. az]
 [b0 b1 b2]   x    [a1 b1 ... z1]   =    [ba bb bc .. bz]
     .             [a2 b2 ... z2]]           .         .
     .                                       .         .
 [z0 z1 z2]]                             [za zb zc .. zz]]
Interestingly, now you can then plug this into a fast matrix-multiply algorithm, which is actually faster than O(dim * #vecs^2). Hopefully it's also optimized for self-transpose matrices that would generate symmetric matrices (which might save a factor of 2 work... maybe it has some flag like matrixmult(a,b,outputWillBeSymmetric)).
This is faster than "O(N^2)", unintuitively: This rewrite exposed a substructure in the problem, which can be leveraged to get faster than O(dim * #vecs^2). The leveragable substructure is namely the fact that you are computing the outer product of THE SAME vectors. The fast matrix-multiply algorithm will leverage this.
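The normalize, stack, and multiply recipe can be sketched as follows. The function name distance_matrix_matmul is made up for this example, and the code assumes that all vectors have the same length and that none of them is the zero vector.

```python
import numpy as np

def dissimilarity(v1, v2):
    # Cosine dissimilarity, as defined in the question
    return 1 - v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def distance_matrix_matmul(vectors):
    # Stack the vectors as rows: shape (n_vectors, dim)
    X = np.stack(vectors).astype(float)
    # Normalize each row once, so no norms are needed later (assumes no zero rows)
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # One matrix product yields every pairwise cosine similarity
    return 1.0 - X @ X.T

rng = np.random.default_rng(0)
vectors = [rng.standard_normal(3) for _ in range(5)]

d_fast = distance_matrix_matmul(vectors)
d_loop = np.array([[dissimilarity(a, b) for b in vectors] for a in vectors])
print(np.allclose(d_fast, d_loop))
```

The whole double loop collapses into one call into NumPy's matrix multiplication routine, which is where most of the practical speedup comes from.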
edit: my original answer was wrong
You have a set of size N and you wish to compute f(a,b) for all a and b in the set.
Unless you know some values are trivial, there is no way to be asymptotically faster than this, because you have to imagine the worst case: Every pair f(a,b) may be unique... so there's no way to do less than roughly N^2 work.
However since your function f is symmetric, you could do half the work, then duplicate it:
N = len(vectors)
for i in range(N):
    for v in range(i, N):  # upper triangle only, including the diagonal
        dissim = #...
        s[i,v] = dissim
        s[v,i] = dissim
(You can avoid calculating your metric in the reflexive case f(a,a) because it's trivial, but that doesn't make things asymptotically faster since the fraction of such work N/N^2 tends to zero as N increases, so it's not that great an optimization... it is reasonable only if you are working with a very small number of vectors.)
Whether you should optimize further depends on whether you need to. Such code should be able to easily handle millions of small vectors. Next steps:
Is there something fishy that is making my code very slow? What could it be? We don't have enough information to comment.
If there is nothing fishy of the sort, you can try rephrasing as matrix operations, in order to stay as much as possible in numpy's optimized C routines instead of bouncing back and forth into python. This is ugly and I would avoid doing it, because your code readability will decrease.
If you are dealing with hundreds of millions of vectors, perhaps consider a more cache-friendly approach where you do for blockI in range(N//10**6): for blockV in range(N//10**6): for i in range(blockI*10**6, (blockI+1)*10**6): for v in range(blockV*...):
If you're dealing with billions of vectors, then look into leveraging gpgpu. This is quite ideal for the gpu and might be a factor of thousands speedup.
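The block-wise idea from the list above could look roughly like this when combined with matrix operations. This is only a sketch: blocked_distance_matrix and the block size are hypothetical, and for a truly huge N the full output would not fit in memory, so each tile would be consumed or written out instead of stored in one array.

```python
import numpy as np

def blocked_distance_matrix(X, block=1024):
    """Cosine dissimilarity computed one (block x block) tile at a time."""
    # Normalize the rows once (assumes no zero rows)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    n = Xn.shape[0]
    out = np.empty((n, n))
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            # Slices past the end of the array are clipped automatically
            tile = Xn[i0:i0 + block] @ Xn[j0:j0 + block].T
            out[i0:i0 + block, j0:j0 + block] = 1.0 - tile
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
Xn_full = X / np.linalg.norm(X, axis=1, keepdims=True)

d_blocked = blocked_distance_matrix(X, block=4)
d_full = 1.0 - Xn_full @ Xn_full.T
print(np.allclose(d_blocked, d_full))
```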

Sum of many outer products in NumPy

I have two matrices A and B both with shape (N,M). I would like to perform the following operation: C = np.sum(A[:,None,:]*B[:,:,None],axis=(1,2)), which corresponds to performing the sum of the outer product of each row of A with each row of B. C would then have a shape of (N,).
The problem is that I get a MemoryError when using this form because N=12000 and M=4000.
Is there a way to perform this operation without having to first build the (huge) intermediary array to be summed ?
I suspect a solution with np.einsum would do the trick, but I'm not familiar with it !
I'm not sure whether np.einsum solves the memory issue, but below is the equivalent of your calculation using it:
C = np.einsum('ij,ik->i',A,B)
For future readers: this operation is mathematically equivalent to multiplying the sum of each row of A with the sum of each row of B. In other words, the fastest solution is C = np.sum(A,axis=-1) * np.sum(B,axis=-1).
This is much faster than np.einsum and any other way of computing it.
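The equivalence follows from factoring the double sum, $\sum_j \sum_k A_{i,j} B_{i,k} = (\sum_j A_{i,j})(\sum_k B_{i,k})$, because each factor depends on only one summation index. A small check, with sizes chosen arbitrarily so that the full intermediate array still fits in memory:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = rng.standard_normal((6, 4))

# Original formulation: builds an (N, M, M) intermediate array
C_full = np.sum(A[:, None, :] * B[:, :, None], axis=(1, 2))
# einsum formulation
C_einsum = np.einsum('ij,ik->i', A, B)
# Product of row sums: only (N,)-shaped intermediates
C_fast = np.sum(A, axis=-1) * np.sum(B, axis=-1)

print(np.allclose(C_full, C_einsum), np.allclose(C_full, C_fast))
```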

Numpy - writing a function in vector form?

I'm quite new to NumPy (or SciPy) and coming from Octave/Matlab, this seems a bit challenging to me.
I'm reading through the docs and writing some basic functions. I came across this section: Vectorizing functions (vectorize)
It defines this function:
def addsubtract(a, b):
    if a > b:
        return a - b
    else:
        return a + b
Then vectorizes it:
vec_addsubtract = np.vectorize(addsubtract)
But at the end, it says:
This particular function could have been written in vector form without the use of vectorize.
I wouldn't know any other way to write such function. So what is the vector form?
np.vectorize is a glorified python for loop, which means that it effectively strips away any optimizations that numpy offers.
To actually vectorize addsubtract, we can use the fact that numpy offers three things: a vectorized add function, a vectorized subtract function, and all sorts of boolean mask operations.
The simplest, but least efficient, way to write this is using np.where:
np.where(a > b, a - b, a + b)
This is inefficient because it pre-computes a - b and a + b in all cases, and then selects from one or the other for each element.
A more efficient solution would only compute the values where the condition required it:
result = np.empty_like(a)
mask = a > b
np.subtract(a, b, where=mask, out=result)
np.add(a, b, where=~mask, out=result)
For very small arrays, the overhead of the complicated method makes it less worthwhile. But for large arrays, it's the fastest solution.
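Wrapped into a function, the masked version might look like the sketch below. The name addsubtract_masked is made up here, and the code assumes a and b have the same shape. The output dtype is taken from np.result_type so that mixed integer and float inputs are not truncated.

```python
import numpy as np

def addsubtract_masked(a, b):
    a = np.asarray(a)
    b = np.asarray(b)
    # Uninitialized output; every element is written by exactly one of the calls below
    result = np.empty(a.shape, dtype=np.result_type(a, b))
    mask = a > b
    np.subtract(a, b, where=mask, out=result)   # a - b where a > b
    np.add(a, b, where=~mask, out=result)       # a + b everywhere else
    return result

a = np.array([1, 5, 3, 8])
b = np.array([4, 2, 3, 1])
res = addsubtract_masked(a, b)
print(res)                                       # [5 3 6 7]
print(np.array_equal(res, np.where(a > b, a - b, a + b)))
```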
Fun fact: the page in the tutorial you are referencing will not be available in future versions of the SciPy tutorial exactly because it is an intro to NumPy, as explained in PR #12432.
You can do this with np.where, which computes both results (a-b and a+b) and selects the values depending on a boolean array (a>b):
def addsubtract(a, b):
    return np.where(a>b, a-b, a+b)
It can be seen as a vectorized ternary operator: "Where a>b, take the value from a-b, else take the value from a+b".
Despite computing both possible results, it was significantly faster than the vectorized if/else function you wrote (at least on my machine).

Scipy LinearOperator With Multiple Inputs

I need to invert a large, dense matrix which I hoped to use Scipy's gmres to do. Fortunately, the dense matrix A follows a pattern and I do not need to store the matrix in memory. The LinearOperator class allows us to construct an object which acts as the matrix for GMRES and can compute directly the matrix vector product A*v. That is, we write a function mv(v) which takes as input a vector v and returns mv(v) = A*v. Then, we can use the LinearOperator class to create A_LinOp = LinearOperator(shape = shape, matvec = mv). We can put the linear operator into the Scipy gmres command to evaluate the matrix vector products without ever having to fully load A into memory.
The documentation for the LinearOperator is found here: LinearOperator Documentation.
Here is my problem: to write the routine to compute the matrix vector product mv(v) = A*v, I need another input vector C. The entries in A are of the form A[i,j] = f(C[i] - C[j]). So, what I really want is for mv to be of two inputs, one fixed vector input C, and one variable input v which we want to compute A*v.
MATLAB has a similar setup, where one would write x = gmres(@(v) mv(v,C),b), where b is the right-hand side of the problem Ax = b, mv is the function that takes the variable input v for which we want to compute A*v, and C is the fixed, known vector which we need for the assembly of A.
My problem is that I can't figure out how to allow the LinearOperator class to accept two inputs, one variable and one "fixed" like I can in MATLAB.
Is there a way to do the analogous operation in SciPy? Alternatively, if anyone knows of a better way of inverting a large, dense matrix (50000, 50000) where the entries follow a pattern, I would greatly appreciate any suggestions.
Thanks!
EDIT: I should have stated this information actually. The matrix is actually (in block form) [A C; C^T 0], where A is N x N (N large) and C is N x 3, and the 0 is 3 x 3 and C^T is the transpose of C. This array C is the same array as the one mentioned above. The entries of A follow a pattern A[i,j] = f(C[i] - C[j]).
I wrote mv(v,C) to go row by row and construct (A*v)[i] for i=0,N by computing sum f(C[i]-C[j])*v[j] (actually, I do numpy.dot(FC,v) where FC[j] = f(C[i]-C[j]), which works well). Then, at the end, I do the computations for the C^T rows. I was hoping to eventually replace the large for loop with a multiprocessing call to parallelize it, but that's a future thing to consider. I will also look into using Cython to speed up the computations.
This is very late, but if you're still interested...
Your A matrix must be very low rank since it's a nonlinearly transformed version of a rank-2 matrix. Plus it's symmetric. That means it's trivial to invert: get the truncated eigenvalue decomposition with, say, 5 eigenvalues: A = U*S*U', then invert that: A^-1 = U*S^-1*U'. S is diagonal so this is inexpensive. You can get the truncated eigenvalue decomposition with eigh.
That takes care of A. Then for the rest: use the block matrix inversion formula. Looks nasty, but I will bet you 100,000,000 prussian francs that it's 50x faster than the direct method you were using.
I faced the same situation (some years later than you) of trying to use more than one argument to LinearOperator, but for another problem. The solution I found was the use of global variables, to avoid passing the variables as arguments to the function.
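An alternative to global variables is to bind the fixed argument with a closure (a lambda or functools.partial), which is the direct analogue of MATLAB's anonymous function. The sketch below is only an illustration: the kernel f, the values in C, and the problem size are assumptions, and the block structure from the question's edit is left out.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def f(d):
    # Hypothetical kernel, assumed only for this example
    return np.exp(-d**2)

def mv(v, C):
    """Compute A @ v row by row, where A[i, j] = f(C[i] - C[j])."""
    v = np.ravel(v)
    # Row i of A is f(C[i] - C); the full matrix is never stored
    return np.array([f(C[i] - C) @ v for i in range(len(C))])

n = 20
C = 2.0 * np.arange(n)       # fixed vector (assumed values)
b = np.ones(n)               # right-hand side

# The lambda captures C, so LinearOperator only sees a one-argument function
A_op = LinearOperator((n, n), matvec=lambda v: mv(v, C), dtype=np.float64)
x, info = gmres(A_op, b)
print(info)                  # 0 means gmres reported convergence
```

functools.partial(mv, C=C) would work the same way as the lambda.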

Multiply several matrices in numpy

Suppose you have n square matrices A1,...,An. Is there any way to multiply these matrices in a neat way? As far as I know, dot in numpy accepts only two arguments. One obvious way is to define a function that calls itself to get the result. Is there any better way to get it done?
This might be a relatively recent feature, but I like:
A.dot(B).dot(C)
or if you had a long chain you could do:
reduce(numpy.dot, [A1, A2, ..., An])
Update:
There is more info about reduce here. Here is an example that might help.
>>> from functools import reduce
>>> import numpy as np
>>> A = [np.random.random((5, 5)) for i in range(4)]
>>> product1 = A[0].dot(A[1]).dot(A[2]).dot(A[3])
>>> product2 = reduce(np.dot, A)
>>> np.all(product1 == product2)
True
Update 2016:
As of Python 3.5, there is a new matrix multiplication operator, @:
R = A @ B @ C
Resurrecting an old question with an update:
As of November 13, 2014 there is now a np.linalg.multi_dot function which does exactly what you want. It also has the benefit of optimizing call order, though that isn't necessary in your case.
Note that this is available starting with numpy version 1.10.
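A short usage sketch of np.linalg.multi_dot. The shapes are chosen arbitrarily, and non-square on purpose, since that is where the choice of multiplication order matters.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 100))
B = rng.standard_normal((100, 5))
C = rng.standard_normal((5, 50))

# multi_dot takes a sequence of arrays and picks the multiplication order itself
P = np.linalg.multi_dot([A, B, C])

# Same result as chaining the @ operator
print(P.shape, np.allclose(P, A @ B @ C))
```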
If you compute all the matrices a priori then you should use an optimization scheme for matrix chain multiplication. See this Wikipedia article.
Another way to achieve this would be using einsum, which implements the Einstein summation convention for NumPy.
To very briefly explain this convention with respect to this problem: When you write down your multiple matrix product as one big sum of products, you get something like:
P_im = sum_j sum_k sum_l A1_ij A2_jk A3_kl A4_lm
where P is the result of your product and A1, A2, A3, and A4 are the input matrices. Note that you sum over exactly those indices that appear twice in the summand, namely j, k, and l. As a sum with this property often appears in physics, vector calculus, and probably some other fields, there is a NumPy tool for it, namely einsum.
In the above example, you can use it to calculate your matrix product as follows:
P = np.einsum( "ij,jk,kl,lm", A1, A2, A3, A4 )
Here, the first argument tells the function which indices to apply to the argument matrices and then all doubly appearing indices are summed over, yielding the desired result.
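A small check of this call against a chained matrix product, with random 4x4 matrices as assumed inputs. Writing the output subscripts explicitly ("->im") gives the same result as the implicit form and makes the intent easier to read.

```python
import numpy as np

rng = np.random.default_rng(0)
A1, A2, A3, A4 = (rng.standard_normal((4, 4)) for _ in range(4))

# Implicit form: indices that appear twice (j, k, l) are summed over
P_implicit = np.einsum("ij,jk,kl,lm", A1, A2, A3, A4)
# Explicit form: the output indices are spelled out
P_explicit = np.einsum("ij,jk,kl,lm->im", A1, A2, A3, A4)

P_ref = A1 @ A2 @ A3 @ A4
print(np.allclose(P_implicit, P_ref), np.allclose(P_explicit, P_ref))
```

For longer chains, np.einsum also accepts optimize=True, which lets it search for a cheaper contraction order.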
Note that the computational efficiency depends on several factors (so you are probably best off with just testing it):
Why is numpy's einsum slower than numpy's built-in functions?
Why is numpy's einsum faster than numpy's built in functions?
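import numpy as np
from functools import reduce

A_list = [np.random.randn(100, 100) for i in range(10)]
B = np.eye(A_list[0].shape[0])
for A in A_list:
    B = np.dot(B, A)
C = reduce(np.dot, A_list)
assert np.allclose(B, C)  # compare arrays element-wise with a tolerance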
