What is the fastest way for matrix multiplication in Python?

I am writing an application in Python with speed as the main driver. While optimizing my code, I found that the main bottleneck is the code used to compute a matrix multiplication.
In my code, this matrix multiplication is computed as
import numpy as np

POW = np.arange(4)
y = C @ (x ** POW)
I tried different methods (e.g., a for loop and others), but as of now this is the fastest way I have found. Do you have any suggestions to improve the computational time?

It's absolutely right to use Numpy. Numpy does the actual mathematical operations in highly optimized C code; using Numpy is faster than writing your own non-optimized C code.

Firstly, use float instead of int. Secondly, if you don't need double precision, use np.float32.
POW = np.arange(4, dtype='f')
# C = C.astype('f', copy=False)  # ensure that C.dtype == np.float32
y = C @ (x ** POW)
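As a rough sanity check, here is a minimal timing sketch along these lines (the shapes are hypothetical: C is assumed to be 1000x4 and x a scalar; adjust to your own data) to see whether the float32 path helps on your machine:

import timeit
import numpy as np

C64 = np.random.rand(1000, 4)        # hypothetical double-precision C
C32 = C64.astype(np.float32)         # single-precision copy
x = 1.5

POW64 = np.arange(4)
POW32 = np.arange(4, dtype='f')

print(timeit.timeit(lambda: C64 @ (x ** POW64), number=10000))
print(timeit.timeit(lambda: C32 @ (x ** POW32), number=10000))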

Related

Optimized way of element-wise vector inverse with perturbations

I have a big vector and I want to compute its element-wise inverse, each time with a small perturbation. For example, for large N, I have y and its element-wise inverse y_inv
y = np.random.rand(1, N)
y_inv = 1. / y
y_inv[y_inv == np.inf] = 0 # y may contain zeros
and I want to compute
p = 0.01
z = y + p
z_inv = 1. / z  # = 1. / (y + p)
multiple times for different p and fixed y. Because N is very large, is there an efficient way or an approximation, to use the already computed y_inv in the computation of z_inv? Any suggestions to increase the speed of inverting z are highly appreciated.
Floating-point divisions are slow, especially for double-precision floating-point numbers. Single precision is faster (with a relative error likely less than 2.5e-07), and it can be made faster still if you do not need high precision, by computing the approximate reciprocal (whose maximum relative error is less than 3.66e-4 on x86-64 processors).
Assuming it is OK for you to decrease the precision of the results, here are the throughputs of the different approaches on a Skylake processor (assuming Numpy is compiled with support for the AVX SIMD instruction set):
Double-precision division: 8 cycles/operation/core
Single-precision division: 5 cycles/operation/core
Single-precision approximate reciprocal: 1 cycle/operation/core
You can easily switch to single precision with Numpy by specifying the dtype of the arrays. For the approximate reciprocal, you need Numba (or Cython) configured to use fastmath (a flag of njit).
Another solution is simply to execute this operation using multiple threads. This can easily be done with Numba using the njit flag parallel=True and a loop iterating over nb.prange. This solution can be combined with the previous one (about precision), resulting in a much faster computation compared to the initial Numpy double-precision-based code.
Moreover, computing arrays in-place should be faster (especially when using multiple threads). Alternatively, you can preallocate the output array and reuse it (slower than the in-place method but faster than the naive approach). The out parameter of Numpy functions (like np.divide) can be used to do that.
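For the pure-Numpy path, here is a minimal sketch of that preallocation idea (N is hypothetical; the two ufunc calls reuse out, so no fresh array is allocated per value of p):

import numpy as np

N = 10_000_000
y = np.random.rand(N)
out = np.empty_like(y)  # preallocated once, reused for every p

for p in (0.01, 0.02, 0.05):
    np.add(y, p, out=out)          # out = y + p, no temporary array
    np.divide(1.0, out, out=out)   # out = 1 / (y + p), computed in-place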
Here is an (untested) example of parallel Numba code:
import numba as nb

# Use fastmath=True and float32 types for faster, approximate results.
# Assume `out` is already preallocated.
@nb.njit('void(float64[::1], float64[::1], float64)', parallel=True)
def compute(out, y, p):
    assert out.size == y.size
    for i in nb.prange(y.size):
        out[i] = 1.0 / (y[i] + p)
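Calling it would look like this (a small usage sketch; note that the first call also pays the JIT compilation cost):

import numpy as np

y = np.random.rand(10_000_000)
out = np.empty_like(y)
compute(out, y, 0.01)  # out[i] == 1 / (y[i] + 0.01)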

Find L3 norm of two arrays efficiently in Python

Suppose I have two arrays. A has size n by d, and B has size t by d. Suppose I want to output an array C, where C[i, j] gives the cubed L3 norm between A[i] and B[j] (both of these have size d). i.e.
C[i, j] = |A[i, 0]-B[j, 0]|**3 + |A[i, 1]-B[j, 1]|**3 + ... + |A[i, d-1]-B[j, d-1]|**3
Can anyone redirect me to a really efficient way to do this? I have tried using a double for loop and something with array operations but the runtime is incredibly slow.
Edit: I know TensorFlow.norm works, but how could I implement this myself?
In accordance with @gph's answer, here is an explicit example of the application of Numpy's np.linalg.norm, using broadcasting to avoid any loops:
import numpy as np

n, m, d = 200, 300, 3
A = np.random.random((n, d))
B = np.random.random((m, d))
C = np.linalg.norm(A[:, None, :] - B[None, ...], ord=3, axis=-1)
# The 'raw' option would be:
C2 = (np.sum(np.abs(A[:, None, :] - B[None, ...]) ** 3, axis=-1)) ** (1 / 3)
# (cube the result if you want the cubed norm exactly as stated in the question)
Both ways seem to take around the same time to execute. I've tried to use np.einsum, but let's just say my algebra skills have seen better days. Perhaps you'll have better luck (a nice resource is this wiki by Divakar).
There may be a few optimizations to speed this up, but the performance isn't going to be anywhere near that of specialized math packages. Those packages use BLAS and LAPACK to vectorize the operations. They also avoid a lot of type-checking overhead by enforcing types at the time you set a value. If performance is important, there really is no way you are going to do better than a numerical computing package like numpy.
Use a 3rd-party library written in C, or create your own.
numpy has a linalg library which should be able to compute your L3 norm for each A[i] - B[j].
If numpy works for you, take a look at Numba's JIT, which can compile and cache some (numpy) code to run orders of magnitude faster (successive runs will take advantage of it). I believe you will need to express the parts to accelerate as functions that will be called many times, to take advantage of it.
Roughly:
import numpy as np
from numpy import linalg as LA
from numba import jit as numbajit

@numbajit
def L3norm(A, B, i, j):
    return LA.norm(A[i] - B[j], ord=3)

# may be able to apply JIT here too
def operate(A, B, n, t):
    # generate the output array
    C = np.zeros(shape=(n, t))
    for i in range(n):
        for j in range(t):
            C[i, j] = L3norm(A, B, i, j)
    return C
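For example, a small usage sketch matching the shapes from the question (n = 200, t = 300, d = 3 are arbitrary here):

A = np.random.random((200, 3))
B = np.random.random((300, 3))
C = operate(A, B, A.shape[0], B.shape[0])  # C.shape == (200, 300)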

Assembling block matrix with Numba support

In my code (written in Python 2.7), I create two numpy arrays, A and B. I then use them to assemble a larger matrix, H, with the following code
H = np.block([[A, B], [-B, -A]])
Various computations follow, involving substantial amounts of numpy manipulations and for loops. As a result, I would like to use Numba to optimize the code. However, it appears that the numpy block function is unsupported in Numba. The matrices A and B are not terribly large, so I'm fine using a function that may not be as optimized as np.block, but I would still like to assemble H in a block matrix fashion for the purpose of readability. Are there any functions that would accomplish this?
Just to make @hpaulj's comment concrete, making some basic assumptions about your input data and not doing any sort of error checking:
import numba as nb
import numpy as np

@nb.njit
def nb_block(X):
    xtmp1 = np.concatenate(X[0], axis=1)
    xtmp2 = np.concatenate(X[1], axis=1)
    return np.concatenate((xtmp1, xtmp2), axis=0)
The following also works:
@nb.njit
def nb_block2(X):
    xtmp1 = np.hstack(X[0])
    xtmp2 = np.hstack(X[1])
    return np.vstack((xtmp1, xtmp2))
and the performance of the two differs with different sized arrays. You should benchmark your own application.
Then calling:
A = np.zeros((50, 30))
B = np.ones((50, 30))

X = np.block([[A, B], [-B, -A]])
Y = nb_block(((A, B), (-B, -A)))  # Note the tuple-of-tuples vs list-of-lists

np.allclose(X, Y)  # True
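To compare the two variants against np.block itself, a quick benchmark sketch (in an IPython session; the numbers will vary with your array sizes):

%timeit np.block([[A, B], [-B, -A]])
%timeit nb_block(((A, B), (-B, -A)))
%timeit nb_block2(((A, B), (-B, -A)))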

numpy Matrix Multiplication Simplification - is it possible?

Is there a way to simplify
a = np.dot(a, b)
just like the way you write a = a + b as a += b? (a and b are both np.array)
In Python 3.5+ you can use the @ operator for matrix multiplication, e.g.:
import numpy as np
a = np.random.randn(4, 10)
b = np.random.randn(10, 5)
c = a @ b
This is equivalent to calling c = np.matmul(a, b). In-place matrix multiplication (@=) is not yet supported (and doesn't make sense in most cases anyway, since the output usually has different dimensions from the first input).
Also note that np.matmul (and @) will behave differently from np.dot when one or more of the input arrays has more than 2 dimensions (see here).
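To illustrate that difference, a small sketch (the shapes are chosen arbitrarily):

import numpy as np

a = np.random.randn(3, 4, 5)
b = np.random.randn(3, 5, 6)

np.matmul(a, b).shape  # (3, 4, 6): treats a and b as stacks of matrices
np.dot(a, b).shape     # (3, 4, 3, 6): sum-product over the last axis of a
                       # and the second-to-last axis of b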

Python Inverse of a Matrix

How do I get the inverse of a matrix in python? I've implemented it myself, but it's pure python, and I suspect there are faster modules out there to do it.
You should have a look at numpy if you do matrix manipulation. This is a module mainly written in C, which will be much faster than programming in pure python. Here is an example of how to invert a matrix, and do other matrix manipulation.
from numpy import matrix
from numpy import linalg

A = matrix([[1, 2, 3], [11, 12, 13], [21, 22, 23]])  # Creates a matrix.
x = matrix([[1], [2], [3]])    # Creates a matrix (like a column vector).
y = matrix([[1, 2, 3]])        # Creates a matrix (like a row vector).
print(A.T)                     # Transpose of A.
print(A * x)                   # Matrix multiplication of A and x.
print(A.I)                     # Inverse of A.
print(linalg.solve(A, x))      # Solve the linear equation system.
You can also have a look at the array module, which is a much more efficient implementation of lists when you have to deal with only one data type.
Make sure you really need to invert the matrix. This is often unnecessary and can be numerically unstable. When most people ask how to invert a matrix, they really want to know how to solve Ax = b, where A is a matrix and x and b are vectors. It's more efficient and more accurate to use code that solves Ax = b for x directly than to calculate the inverse of A and then multiply it by b. Even if you need to solve Ax = b for many values of b, it's not a good idea to invert A. If you have to solve the system for multiple b values, save the Cholesky factorization of A, but don't invert it.
See Don't invert that matrix.
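A minimal sketch of that advice (the Cholesky route assumes A is symmetric positive definite, as it is in this made-up example; for a general A, np.linalg.solve or an LU factorization via scipy.linalg.lu_factor plays the same role):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # symmetric positive definite
b1 = np.array([1.0, 2.0])
b2 = np.array([3.0, 5.0])

x1 = np.linalg.solve(A, b1)   # solve once, no explicit inverse

factor = cho_factor(A)        # factor once...
x1 = cho_solve(factor, b1)    # ...then reuse for many right-hand sides
x2 = cho_solve(factor, b2)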
It is a pity that the chosen matrix, repeated here again, is either singular or badly conditioned:
A = matrix([[1, 2, 3], [11, 12, 13], [21, 22, 23]])
By definition, the inverse of A, when multiplied by the matrix A itself, must give the identity matrix. The A chosen in the much-praised explanation does not do that. In fact, just looking at the inverse gives a clue that the inversion did not work correctly: the magnitudes of the individual terms are very, very big compared with the terms of the original A matrix...
It is remarkable how often humans, when picking an example matrix, manage to pick a singular one!
I did have a problem with the solution, so I looked into it further. On the Ubuntu/Kubuntu platform, the Debian numpy package does not have the matrix and linalg sub-packages, so in addition to importing numpy, scipy needs to be imported as well.
If the diagonal terms of A are multiplied by a large enough factor, say 2, the matrix will most likely cease to be singular or near-singular. So
A = matrix([[2, 2, 3], [11, 24, 13], [21, 22, 46]])
becomes neither singular nor nearly singular, and the example gives meaningful results... When dealing with floating-point numbers one must be watchful for the effects of unavoidable round-off errors.
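A quick way to check this numerically is the condition number (a small sketch; the first matrix is exactly singular, so its condition number blows up):

import numpy as np

A_bad = np.array([[1, 2, 3], [11, 12, 13], [21, 22, 23]], dtype=float)
A_ok = np.array([[2, 2, 3], [11, 24, 13], [21, 22, 46]], dtype=float)

print(np.linalg.cond(A_bad))  # astronomically large: numerically singular
print(np.linalg.cond(A_ok))   # modest: safe to invert

# the defining property holds only for the well-conditioned matrix
print(np.allclose(A_ok @ np.linalg.inv(A_ok), np.eye(3)))  # True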
For those like me, who were looking for a pure Python solution without pandas or numpy involved, check out the following GitHub project: https://github.com/ThomIves/MatrixInverse.
It generously provides a very good explanation of what the process looks like "behind the scenes". The author has nicely described the step-by-step approach and presented some practical examples, all easy to follow.
This is just a little code snippet from there to illustrate the approach very briefly (AM is the source matrix, IM is the identity matrix of the same size):
def invert_matrix(AM, IM):
    for fd in range(len(AM)):
        fdScaler = 1.0 / AM[fd][fd]
        for j in range(len(AM)):
            AM[fd][j] *= fdScaler
            IM[fd][j] *= fdScaler
        for i in list(range(len(AM)))[0:fd] + list(range(len(AM)))[fd+1:]:
            crScaler = AM[i][fd]
            for j in range(len(AM)):
                AM[i][j] = AM[i][j] - crScaler * AM[fd][j]
                IM[i][j] = IM[i][j] - crScaler * IM[fd][j]
    return IM
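A tiny usage sketch (my own example, not from the project; note that the function mutates both arguments, so pass copies if you need to keep the originals):

AM = [[2.0, 1.0], [7.0, 4.0]]   # det = 1
IM = [[1.0, 0.0], [0.0, 1.0]]   # identity of the same size
inv = invert_matrix([row[:] for row in AM], [row[:] for row in IM])
# inv == [[4.0, -1.0], [-7.0, 2.0]]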
But please do follow the entire thing, you'll learn a lot more than just copy-pasting this code! There's a Jupyter notebook as well, btw.
Hope that helps someone, I personally found it extremely useful for my very particular task (Absorbing Markov Chain) where I wasn't able to use any non-standard packages.
You could calculate the determinant of the matrix (which is recursive) and then form the adjugate matrix. Here is a short tutorial. I think this only works for square matrices. A rough sketch of the idea is shown below.
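For illustration only, here is a minimal pure-Python sketch of that determinant/adjugate approach (the helper names are mine, not from the tutorial; cofactor expansion is O(n!), so this is only practical for very small matrices):

def det(m):
    # Laplace (cofactor) expansion along the first row.
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** c * m[0][c]
               * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))

def inverse_via_adjugate(m):
    d = det(m)
    n = len(m)
    # inverse[r][c] is the (c, r) cofactor divided by the determinant.
    return [[(-1) ** (r + c)
             * det([row[:r] + row[r + 1:] for i, row in enumerate(m) if i != c])
             / d
             for c in range(n)]
            for r in range(n)]

inverse_via_adjugate([[2.0, 1.0], [7.0, 4.0]])  # [[4.0, -1.0], [-7.0, 2.0]]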
Another way of computing these involves Gram-Schmidt orthogonalization and then transposing the matrix; the transpose of an orthogonal matrix is its inverse!
Numpy will be suitable for most people, but you can also do matrices in Sympy
Try running these commands at http://live.sympy.org/
from sympy import Matrix  # not needed on live.sympy.org, where it is pre-imported

M = Matrix([[1, 3], [-2, 3]])
M
M**-1
For fun, try M**(1/2)
If you hate numpy, get out RPy and your local copy of R, and use it instead.
(I would also echo the advice to make sure you really need to invert the matrix. NumPy's linalg.solve and R's solve() function, for example, don't actually do a full inversion, since it is unnecessary.)
