Multiply several matrices in numpy - python

Suppose you have n square matrices A1,...,An. Is there any way to multiply these matrices in a neat way? As far as I know, dot in numpy accepts only two arguments. One obvious way is to define a function that calls itself to get the result. Is there a better way to get it done?

This might be a relatively recent feature, but I like:
A.dot(B).dot(C)
or if you had a long chain you could do:
reduce(numpy.dot, [A1, A2, ..., An])
Update:
There is more info about reduce here. Here is an example that might help.
>>> from functools import reduce  # reduce is not a builtin on Python 3
>>> import numpy as np
>>> A = [np.random.random((5, 5)) for i in range(4)]
>>> product1 = A[0].dot(A[1]).dot(A[2]).dot(A[3])
>>> product2 = reduce(np.dot, A)
>>> np.all(product1 == product2)
True
Update 2016:
As of Python 3.5, there is a new matrix multiplication operator, @:
R = A @ B @ C

Resurrecting an old question with an update:
As of November 13, 2014 there is now a np.linalg.multi_dot function which does exactly what you want. It also has the benefit of optimizing call order, though that isn't necessary in your case.
Note that this is available starting with numpy version 1.10.
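As a minimal sketch of the point above, the following compares np.linalg.multi_dot with the reduce approach from the earlier answer. The matrix shapes and the seed are made up for illustration; rectangular shapes are used only because that is where the choice of multiplication order affects the cost.

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical rectangular shapes, chosen so that the order matters for cost.
shapes = [(10, 200), (200, 5), (5, 300), (300, 8)]
matrices = [rng.standard_normal(shape) for shape in shapes]

# Strict left-to-right product, as in the reduce answer.
left_to_right = reduce(np.dot, matrices)

# multi_dot chooses a parenthesization itself, then multiplies.
optimized = np.linalg.multi_dot(matrices)

# Both give the same matrix up to floating-point rounding.
results_match = np.allclose(left_to_right, optimized)
print(optimized.shape, results_match)
```

For n square matrices of equal size, every order costs the same, so the gain there is mostly readability.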

If you compute all the matrices a priori then you should use an optimization scheme for matrix chain multiplication. See this Wikipedia article.
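To make the idea of matrix chain optimization concrete, here is a rough sketch of the textbook dynamic program that finds the cheapest parenthesization. The function name matrix_chain_cost and the example dimensions are illustrative assumptions, not part of NumPy.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications for a chain of matrices.

    dims has length n + 1; matrix i has shape dims[i] x dims[i + 1].
    """
    n = len(dims) - 1
    if n <= 0:
        return 0
    # cost[i][j] = cheapest way to multiply matrices i..j (inclusive).
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):              # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)            # position of the last split
            )
    return cost[0][n - 1]


# Shapes 10x30, 30x5, 5x60: multiplying the first two matrices first is cheapest.
best = matrix_chain_cost([10, 30, 5, 60])
print(best)
```

With square matrices of one size, as in the question, all parenthesizations cost the same, so this only pays off for chains with mixed shapes.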

Another way to achieve this would be using einsum, which implements the Einstein summation convention for NumPy.
To very briefly explain this convention with respect to this problem: When you write down your multiple matrix product as one big sum of products, you get something like:
P_im = sum_j sum_k sum_l A1_ij A2_jk A3_kl A4_lm
where P is the result of your product and A1, A2, A3, and A4 are the input matrices. Note that you sum over exactly those indices that appear twice in the summand, namely j, k, and l. As a sum with this property often appears in physics, vector calculus, and probably some other fields, there is a NumPy tool for it, namely einsum.
In the above example, you can use it to calculate your matrix product as follows:
P = np.einsum( "ij,jk,kl,lm", A1, A2, A3, A4 )
Here, the first argument tells the function which indices to apply to the argument matrices and then all doubly appearing indices are summed over, yielding the desired result.
Note that the computational efficiency depends on several factors (so you are probably best off with just testing it):
Why is numpy's einsum slower than numpy's built-in functions?
Why is numpy's einsum faster than numpy's built in functions?
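A small check of the einsum expression above against a plain chained product may help. The 5x5 random inputs are placeholders, and the optimize=True variant is an optional extra that lets einsum pick a contraction order; whether it is faster in practice is something to measure.

```python
import numpy as np

rng = np.random.default_rng(1)
A1, A2, A3, A4 = (rng.standard_normal((5, 5)) for _ in range(4))

# Implicit output: indices that appear only once (i and m) are kept.
P_implicit = np.einsum("ij,jk,kl,lm", A1, A2, A3, A4)

# Explicit output, and optimize=True so einsum may contract pairwise
# instead of evaluating one large nested sum.
P_explicit = np.einsum("ij,jk,kl,lm->im", A1, A2, A3, A4, optimize=True)

# Reference result from ordinary matrix multiplication.
P_reference = A1 @ A2 @ A3 @ A4

print(np.allclose(P_implicit, P_reference), np.allclose(P_explicit, P_reference))
```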

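from functools import reduce  # reduce is not a builtin on Python 3
import numpy as np

A_list = [np.random.randn(100, 100) for i in range(10)]
# Accumulate the product in a loop, starting from the identity.
B = np.eye(A_list[0].shape[0])
for A in A_list:
    B = np.dot(B, A)
# Same product in a single call.
C = reduce(np.dot, A_list)
assert np.allclose(B, C)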

Related

Vectorized inner product with numpy

I need to compute many inner products for vectors stored in numpy arrays.
Namely, in mathematical terms,
Vi = Σk Ai,k Bi,k
and
Ci,j = Σk Ai,k Di,j,k
I can of course use a loop, but I am wondering whether I could make use of a higher level operator like dot or matmul in a clever way that I haven't thought of. What is bugging me is that np.cross does accept arrays of vector to perform on, but dot and inner don't do what I want.
Your mathematical formulas already give you the outline for defining the subscripts when using np.einsum. It is as simple as:
V = np.einsum('ik,ik->i', A, B)
which translates to V[i] = sum(A[i][k]*B[i][k]),
and
C = np.einsum('ik,ijk->ij', A, D)
i.e. C[i][j] = sum(A[i][k]*D[i][j][k])
You can read more about the einsum operator on this other question.
These can be done with element-wise multiplication and sum:
Vi = Σk Ai,k Bi,k
V = (A*B).sum(axis=-1)
and
Ci,j = Σk Ai,k Di,j,k
C = (A[:,None,:] * D).sum(axis=-1)
einsum may be faster, but it's a good idea to understand this approach.
To use matmul we have to add dimensions to fit the
(batch,i,j)@(batch,j,k) => (batch,i,k)
pattern. The sum-of-products dimension is j. Calculation-wise this is fast, but applying it here may be a bit tedious.
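The three approaches discussed in these answers (einsum, multiply-and-sum, and matmul with added dimensions) can be sketched side by side. The sizes n, m, d and the random inputs are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, d = 4, 3, 5                      # hypothetical sizes
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))
D = rng.standard_normal((n, m, d))

# V[i] = sum_k A[i, k] * B[i, k]
V_einsum = np.einsum("ik,ik->i", A, B)
V_sum = (A * B).sum(axis=-1)
# matmul: (n, 1, d) @ (n, d, 1) -> (n, 1, 1), then drop the size-1 axes.
V_matmul = (A[:, None, :] @ B[:, :, None])[:, 0, 0]

# C[i, j] = sum_k A[i, k] * D[i, j, k]
C_einsum = np.einsum("ik,ijk->ij", A, D)
C_sum = (A[:, None, :] * D).sum(axis=-1)
# matmul: (n, m, d) @ (n, d, 1) -> (n, m, 1), then drop the last axis.
C_matmul = (D @ A[:, :, None])[:, :, 0]

print(V_einsum.shape, C_einsum.shape)
```

In the matmul versions, the first axis plays the role of the batch dimension and the summed axis k sits where the pattern has j.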

Matrix of "for x in vectors: for y in vectors:", without two for loops

Can one get the Cartesian outer product of two sets of vectors without using two for loops? It is slow because the data is large.
[[f(x,y) for x in vectors] for y in vectors]
I am trying to do an agglomerative clustering project in python and for this, I need to create a distance matrix.
This is the code that I have to define the function for the distance matrix:
def distance_matrix(vectors):
    s = np.zeros((len(vectors), len(vectors)))
    for i in range(len(vectors)):
        for v in range(len(vectors)):
            s[i, v] = dissimilarity(vectors[i], vectors[v])
    return s
What it should do is take a list of NumPy arrays and return a 2D NumPy array d where the entry d[i,j] contains the dissimilarity between vectors[i] and vectors[j].
In this case, vectors is the list of NumPy arrays and the dissimilarity is calculated by:
def dissimilarity(v1, v2):
    return (1-(v1.dot(v2)/(np.linalg.norm(v1)*np.linalg.norm(v2))))
or in other words, the dissimilarity is the cosine dissimilarity between two 1D NumPy arrays.
My goal is to find a way to get the distance matrix without the double for loops but still have the computational time be very small.
In general one cannot do this. In general (but not in this case), the computation time will not change, because the work is still there: no matter what accounting tricks one might use to rewrite it (a single for loop that acts like two for loops, or a recursive implementation), at the end of the day the work must still be done, irrespective of the order in which you do it.
Is there a way to get rid of the for loops? Even if it slows down the run time a bit. – Cat_Smithyyyy03
Your case is special. While one normally has no reason to expect that a speedup is possible, you are asking about tensor products, or in this case a matrix product. When you think about general matrix multiplication AB=C, the matrix product C is basically the collection of all dot products {a dot b} of every row a of A with every column b of B.
So, for your case, first normalize all your vectors (so the normalization is not required later), then stack them to form A, then let B=transp(A), then do a matrix multiply.
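$$
\begin{pmatrix} a_0 & a_1 & a_2 \\ b_0 & b_1 & b_2 \\ \vdots & \vdots & \vdots \\ z_0 & z_1 & z_2 \end{pmatrix}
\begin{pmatrix} a_0 & b_0 & \cdots & z_0 \\ a_1 & b_1 & \cdots & z_1 \\ a_2 & b_2 & \cdots & z_2 \end{pmatrix}
=
\begin{pmatrix} a \cdot a & a \cdot b & \cdots & a \cdot z \\ b \cdot a & b \cdot b & \cdots & b \cdot z \\ \vdots & \vdots & \ddots & \vdots \\ z \cdot a & z \cdot b & \cdots & z \cdot z \end{pmatrix}
$$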
Interestingly, you can now plug this into a fast matrix-multiply algorithm, some of which are asymptotically faster than O(dim * #vecs^2). Hopefully it's also optimized for self-transpose matrices that would generate symmetric matrices (which might save a factor of 2 work... maybe it has some flag like matrixmult(a,b,outputWillBeSymmetric)).
This is faster than "O(N^2)", unintuitively: this rewrite exposed a substructure in the problem, which can be leveraged to get faster than O(dim * #vecs^2). The exploitable substructure is the fact that you are computing the pairwise dot products of THE SAME vectors, which is exactly a matrix product. A fast matrix-multiply routine can take advantage of this.
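The normalize, stack, and multiply recipe described above could look roughly like this in NumPy. It is a sketch that assumes all vectors have the same length and none of them is the zero vector; the random test data is made up, and the loop-based version is included only as a cross-check.

```python
import numpy as np


def distance_matrix(vectors):
    """Cosine dissimilarity between every pair of vectors, without Python loops."""
    X = np.asarray(vectors, dtype=float)               # shape (n_vectors, dim)
    norms = np.linalg.norm(X, axis=1, keepdims=True)   # one norm per row
    unit = X / norms                                   # assumes no zero vector
    return 1.0 - unit @ unit.T                         # all pairwise dot products


def dissimilarity(v1, v2):
    """Pairwise definition from the question, used here as a reference."""
    return 1 - v1.dot(v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))


rng = np.random.default_rng(3)
vectors = [rng.standard_normal(8) for _ in range(6)]

fast = distance_matrix(vectors)
slow = np.array([[dissimilarity(x, y) for y in vectors] for x in vectors])
print(np.allclose(fast, slow))
```

The matrix product hands the double loop to the underlying BLAS routine, which is where most of the practical speedup comes from.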
edit: my original answer was wrong
You have a set of size N and you wish to compute f(a,b) for all a and b in the set.
Unless you know some values are trivial, there is no way to be asymptotically faster than this, because you have to imagine the worst case: Every pair f(a,b) may be unique... so there's no way to do less than roughly N^2 work.
However since your function f is symmetric, you could do half the work, then duplicate it:
N = len(vectors)
for i in range(N):
    for v in range(i, N):  # upper triangle only, including the diagonal
        dissim = #...
        s[i,v] = dissim
        s[v,i] = dissim
(You can avoid calculating your metric in the reflexive case f(a,a) because it's trivial, but that doesn't make things asymptotically faster since the fraction of such work N/N^2 tends to zero as N increases, so it's not that great an optimization... it is reasonable only if you are working with a very small number of vectors.)
Whether you should optimize further depends on whether you need to. Such code should be able to easily handle millions of small vectors. Next steps:
Is there something fishy that is making my code very slow? What could it be? We don't have enough information to comment.
If there is nothing fishy of the sort, you can try rephrasing as matrix operations, in order to stay as much as possible in numpy's optimized C routines instead of bouncing back and forth into python. This is ugly and I would avoid doing it, because your code readability will decrease.
If you are dealing with hundreds of millions of vectors, perhaps consider a more cache-friendly approach where you do for blockI in range(N//10**6): for blockV in range(N//10**6): for i in range(blockI*10**6, (blockI+1)*10**6): for v in range(blockV*...):
If you're dealing with billions of vectors, then look into leveraging gpgpu. This is quite ideal for the gpu and might be a factor of thousands speedup.

Fast way to construct a matrix in Python

I have been browsing through the questions, and could find some help, but I prefer having confirmation by asking it directly. So here is my problem.
I have an (numpy) array u of dimension N, from which I want to build a square matrix k of dimension N^2. Basically, each matrix element k(i,j) is defined as k(i,j)=exp(-|u_i-u_j|^2).
My first naive way to do it was like this, which is, I believe, Fortran-like:
for i in range(N):
    for j in range(N):
        k[i][j]=np.exp(np.sum(-(u[i]-u[j])**2))
However, this is extremely slow. For N=1000, for example, it is taking around 15 seconds.
My other way to proceed is the following (inspired by other questions/answers):
i, j = np.ogrid[:N,:N]
k = np.exp(np.sum(-(u[i]-u[j])**2,axis=2))
This is way faster, as for N=1000, the result is almost instantaneous.
So I have two questions.
1) Why is the first method so slow, and why is the second one so fast ?
2) Is there a faster way to do it ? For N=10000, it is starting to take quite some time already, so I really don't know if this was the "right" way to do it.
Thank you in advance !
P.S: the matrix is symmetric, so there must also be a way to make the process faster by calculating only the upper half of the matrix, but my question was more related to the way to manipulate arrays, etc.
First, a small remark, there is no need to use np.sum if u can be re-written as u = np.arange(N). Which seems to be the case since you wrote that it is of dimension N.
1) First question:
Indexing element by element in Python is slow, so it is best to avoid [] when there is a way around it. In addition, you call np.exp and np.sum many times, whereas they can be applied to whole vectors and matrices. So your second proposal is better, since it computes k all at once instead of element by element.
2) Second question:
Yes there is. You should consider using only numpy functions and not using indices (around 3 times faster):
k = np.exp(-np.power(np.subtract.outer(u,u),2))
(NB: You can keep **2 instead of np.power, which is a bit faster but has smaller precision)
edit (Take into account that u is an array of tuples)
With tuple data, it's a bit more complicated:
ma = np.subtract.outer(u[:,0],u[:,0])**2
mb = np.subtract.outer(u[:,1],u[:,1])**2
k = np.exp(-np.add(ma, mb))
You have to call np.subtract.outer twice, since it would return a 4-dimensional array if you did it in one go (and compute lots of useless data), whereas u[i]-u[j] returns a 3-dimensional array.
I used np.add instead of np.sum since it keeps the array dimensions.
NB: I checked with
N = 10000
u = np.random.random_sample((N,2))
It returns the same as your proposals (but 1.7 times faster).
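For reference, here is a sketch that builds k three ways and checks that they agree: the ogrid method from the question, the per-coordinate np.subtract.outer method from this answer, and an equivalent broadcasting form. N is kept small and the data is random; no timing claims are made here.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 200
u = rng.random((N, 2))                       # N points with 2 coordinates each

# Question's second method: index arrays from ogrid, then broadcasting.
i, j = np.ogrid[:N, :N]
k_ogrid = np.exp(np.sum(-(u[i] - u[j]) ** 2, axis=2))

# Answer's method: one outer difference per coordinate.
ma = np.subtract.outer(u[:, 0], u[:, 0]) ** 2
mb = np.subtract.outer(u[:, 1], u[:, 1]) ** 2
k_outer = np.exp(-(ma + mb))

# Equivalent broadcasting form that works for any number of coordinates.
diff = u[:, None, :] - u[None, :, :]         # shape (N, N, 2)
k_broadcast = np.exp(-(diff ** 2).sum(axis=-1))

print(np.allclose(k_ogrid, k_outer), np.allclose(k_ogrid, k_broadcast))
```

All three allocate at least one (N, N) array, so for N=10000 memory use becomes a concern regardless of which form is chosen.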

Allowing for deviations in exact values during matrix multiplication, python

I need to solve this:
Check if AT * n * A = n, where A is the test matrix, AT is the transposed test matrix and n = [[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]].
I don't know how to check for equality due to the numerical errors in the float multiplication. How do I go about doing this?
Current code:
def trans(A):
    n = numpy.matrix([[1,0,0,0],[0,-1,0,0],[0,0,-1,0],[0,0,0,-1]])
    c = numpy.matrix.transpose(A) * n * numpy.matrix(A)
I have then tried:
    if c == n:
        return True
I have also tried assigning variables to every element of matrix and then checking that each variable is within certain limits.
Typically, the way that numerical-precision limitations are overcome is by allowing for some epsilon (or error-value) between the actual value and expected value that is still considered 'equal'. For example, I might say that some value a is equal to some value b if they are within plus/minus 0.01. This would be implemented in python as:
def float_equals(a, b, epsilon):
    return abs(a-b)<epsilon
Of course, for matrixes entered as lists, this isn't quite so simple. We have to check if all values are within the epsilon to their partner. One example solution would be as follows, assuming your matrices are standard python lists:
from itertools import product  # generates the (i, j) index pairs
def matrix_float_equals(A, B, epsilon):
    return all(abs(A[i][j]-B[i][j])<epsilon for i,j in product(range(len(A)), repeat = 2))
all returns True iff every value in an iterable is true (an element-wise and). product computes the Cartesian product of its input iterables, and the repeat keyword reuses the same iterable, so a range repeated twice yields every (i, j) index pair as a tuple. Of course, this method of index generation assumes square, equally-sized matrices. For non-square matrices you have to get more creative, but the idea is the same.
However, as is typically the way in python, there are libraries that do this kind of thing for you. Numpy's allclose does exactly this; compares two numpy arrays for equality element-wise within some tolerance. If you're working with matrices in python for numeric analysis, numpy is really the way to go, I would get familiar with its basic API.
If a and b are numpy arrays or matrices of the same shape, then you can use allclose:
if numpy.allclose(a, b):  # a is approximately equal to b
    # do something ...
This checks that for all i and all j, |aij - bij| <= εa + εr |bij|, where εa is an absolute tolerance (atol, by default 10^-8) and εr is a relative tolerance (rtol, by default 10^-5). Thus it is safe to use even if your calculations introduce small numerical errors.
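Applied to the original question, a check of AT * n * A = n with allclose might be sketched as follows. The helper name preserves_metric and the boost matrix used as test input are illustrative assumptions; plain arrays and the @ operator are used in place of numpy.matrix.

```python
import numpy as np


def preserves_metric(A, rtol=1e-5, atol=1e-8):
    """Return True if A.T @ n @ A equals n within the given tolerances."""
    n = np.diag([1.0, -1.0, -1.0, -1.0])
    A = np.asarray(A, dtype=float)
    return np.allclose(A.T @ n @ A, n, rtol=rtol, atol=atol)


# Hypothetical test matrix: a boost along x with rapidity 0.3.
phi = 0.3
boost = np.array([
    [np.cosh(phi), -np.sinh(phi), 0.0, 0.0],
    [-np.sinh(phi), np.cosh(phi), 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])

print(preserves_metric(boost))          # exact identity, up to rounding error
print(preserves_metric(2.0 * boost))    # scaling the matrix breaks the identity
```

Because n contains zeros, the comparison for those entries relies on atol alone; tighten or loosen it to match the accuracy of your inputs.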

Scipy LinearOperator With Multiple Inputs

I need to invert a large, dense matrix which I hoped to use Scipy's gmres to do. Fortunately, the dense matrix A follows a pattern and I do not need to store the matrix in memory. The LinearOperator class allows us to construct an object which acts as the matrix for GMRES and can compute directly the matrix vector product A*v. That is, we write a function mv(v) which takes as input a vector v and returns mv(v) = A*v. Then, we can use the LinearOperator class to create A_LinOp = LinearOperator(shape = shape, matvec = mv). We can put the linear operator into the Scipy gmres command to evaluate the matrix vector products without ever having to fully load A into memory.
The documentation for the LinearOperator is found here: LinearOperator Documentation.
Here is my problem: to write the routine to compute the matrix vector product mv(v) = A*v, I need another input vector C. The entries in A are of the form A[i,j] = f(C[i] - C[j]). So, what I really want is for mv to be of two inputs, one fixed vector input C, and one variable input v which we want to compute A*v.
MATLAB has a similar setup, where one would write x = gmres(@(v) mv(v,C),b), where b is the right hand side of the problem Ax = b, mv is the function that takes as variable input the vector v for which we want to compute A*v, and C is the fixed, known vector which we need for the assembly of A.
My problem is that I can't figure out how to allow the LinearOperator class to accept two inputs, one variable and one "fixed" like I can in MATLAB.
Is there a way to do the analogous operation in SciPy? Alternatively, if anyone knows of a better way of inverting a large, dense matrix (50000, 50000) where the entries follow a pattern, I would greatly appreciate any suggestions.
Thanks!
EDIT: I should have stated this information actually. The matrix is actually (in block form) [A C; C^T 0], where A is N x N (N large) and C is N x 3, and the 0 is 3 x 3 and C^T is the transpose of C. This array C is the same array as the one mentioned above. The entries of A follow a pattern A[i,j] = f(C[i] - C[j]).
I wrote mv(v,C) to go row by row and construct A*v[i] for i=0,N, by computing sum f(C[i]-C[j])*v[j] (actually, I do numpy.dot(FC,v) where FC[j] = f(C[i]-C[j]), which works well). Then, at the end, I do the computations for the C^T rows. I was hoping to eventually replace the large for loop with a multiprocessing call to parallelize it, but that's a future thing to consider. I will also look into using Cython to speed up the computations.
This is very late, but if you're still interested...
Your A matrix must be very low rank since it's a nonlinearly transformed version of a rank-2 matrix. Plus it's symmetric. That means it's trivial to invert: get the truncated eigenvalue decomposition with, say, 5 eigenvalues: A = U*S*U', then invert that: A^-1 = U*S^-1*U'. S is diagonal so this is inexpensive. You can get the truncated eigenvalue decomposition with eigh.
That takes care of A. Then for the rest: use the block matrix inversion formula. Looks nasty, but I will bet you 100,000,000 prussian francs that it's 50x faster than the direct method you were using.
I faced the same situation (some years later than you) of trying to use more than one argument to LinearOperator, but for another problem. The solution I found was the use of global variables, to avoid passing the variables as arguments to the function.
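As an alternative to global variables, a closure can hold the fixed vector C, which is close in spirit to the MATLAB anonymous function in the question. The sketch below assumes a one-dimensional C, a Gaussian f, and a small problem size, all made up for illustration; the dense matrix is built only to check the result.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres


def make_operator(C, f):
    """Build a LinearOperator for A[i, j] = f(C[i] - C[j]) with C held fixed."""
    n = C.shape[0]

    def mv(v):
        v = np.ravel(v)
        out = np.empty(n)
        for i in range(n):
            # Row i of A is built on the fly and discarded afterwards.
            out[i] = f(C[i] - C).dot(v)
        return out

    return LinearOperator((n, n), matvec=mv, dtype=float)


def f(d):
    """Hypothetical kernel; the question does not say what f is."""
    return np.exp(-d ** 2)


n = 50
C = np.arange(n, dtype=float)           # hypothetical fixed vector
b = np.random.default_rng(5).standard_normal(n)

A_op = make_operator(C, f)
x, info = gmres(A_op, b)                # info == 0 signals convergence

# Dense check, feasible only because this example is small.
A_dense = f(C[:, None] - C[None, :])
residual = np.linalg.norm(A_dense @ x - b)
print(info, residual)
```

functools.partial(mv, C=C) applied to a two-argument mv(v, C) would work the same way, since LinearOperator only needs a callable of one argument.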
