Matrix inversion without NumPy - Python

I want to invert a matrix without using numpy.linalg.inv.
The reason is that I am using Numba to speed up the code, but numpy.linalg.inv is not supported, so I am wondering whether I can invert a matrix with 'classic' Python code.
With numpy.linalg.inv, an example would look like this:
import numpy as np
M = np.array([[1,0,0],[0,1,0],[0,0,1]])
Minv = np.linalg.inv(M)

Here is a more elegant and scalable solution, in my opinion. It will work for any n x n matrix, and you may find use for the other methods too. Note that getMatrixInverse(m) takes a list of lists as input. Please feel free to ask any questions.
def transposeMatrix(m):
    return list(map(list, zip(*m)))

def getMatrixMinor(m, i, j):
    return [row[:j] + row[j+1:] for row in (m[:i] + m[i+1:])]

def getMatrixDeterminant(m):
    # base case for 2x2 matrix
    if len(m) == 2:
        return m[0][0]*m[1][1] - m[0][1]*m[1][0]
    determinant = 0
    for c in range(len(m)):
        determinant += ((-1)**c) * m[0][c] * getMatrixDeterminant(getMatrixMinor(m, 0, c))
    return determinant

def getMatrixInverse(m):
    determinant = getMatrixDeterminant(m)
    # special case for 2x2 matrix:
    if len(m) == 2:
        return [[m[1][1]/determinant, -1*m[0][1]/determinant],
                [-1*m[1][0]/determinant, m[0][0]/determinant]]
    # find matrix of cofactors
    cofactors = []
    for r in range(len(m)):
        cofactorRow = []
        for c in range(len(m)):
            minor = getMatrixMinor(m, r, c)
            cofactorRow.append(((-1)**(r+c)) * getMatrixDeterminant(minor))
        cofactors.append(cofactorRow)
    cofactors = transposeMatrix(cofactors)
    for r in range(len(cofactors)):
        for c in range(len(cofactors)):
            cofactors[r][c] = cofactors[r][c]/determinant
    return cofactors
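Whichever implementation you choose, it is worth sanity-checking the result by multiplying it back against the original matrix; the product should be (numerically) the identity. A minimal pure-Python check (the matmul helper and the example matrix are mine, not part of the answer above):

```python
def matmul(a, b):
    # Plain-Python matrix product of two lists of lists.
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

M = [[4, 7], [2, 6]]               # det = 10
Minv = [[0.6, -0.7], [-0.2, 0.4]]  # worked out by hand: adj(M) / det
product = matmul(M, Minv)
# product should be (numerically) the 2x2 identity
ok = all(abs(product[i][j] - (1 if i == j else 0)) < 1e-9
         for i in range(2) for j in range(2))
```
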

Here is another way, using Gaussian elimination instead:
def eliminate(r1, r2, col, target=0):
    fac = (r2[col] - target) / r1[col]
    for i in range(len(r2)):
        r2[i] -= fac * r1[i]

def gauss(a):
    for i in range(len(a)):
        if a[i][i] == 0:
            for j in range(i+1, len(a)):
                if a[j][i] != 0:  # look down column i for a row with a usable pivot
                    a[i], a[j] = a[j], a[i]
                    break
            else:
                raise ValueError("Matrix is not invertible")
        for j in range(i+1, len(a)):
            eliminate(a[i], a[j], i)
    for i in range(len(a)-1, -1, -1):
        for j in range(i-1, -1, -1):
            eliminate(a[i], a[j], i)
    for i in range(len(a)):
        eliminate(a[i], a[i], i, target=1)
    return a

def inverse(a):
    tmp = [[] for _ in a]
    for i, row in enumerate(a):
        assert len(row) == len(a)
        tmp[i].extend(row + [0]*i + [1] + [0]*(len(a)-i-1))
    gauss(tmp)
    ret = []
    for i in range(len(tmp)):
        ret.append(tmp[i][len(tmp[i])//2:])
    return ret
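One caveat with the gauss version above: it only swaps rows when a pivot is exactly zero, which can lose precision when a pivot is merely tiny. A variant with partial pivoting (always choosing the largest available pivot) is sketched below; the function name and the 1e-12 singularity threshold are my own choices:

```python
def gauss_jordan_inverse(a):
    # Invert a square matrix (list of lists) via Gauss-Jordan elimination
    # with partial pivoting.
    n = len(a)
    # Build the augmented matrix [A | I].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest absolute pivot.
        pivot_row = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot_row][col]) < 1e-12:
            raise ValueError("Matrix is singular to working precision")
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        # Normalize the pivot row, then eliminate the column elsewhere.
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                fac = aug[r][col]
                aug[r] = [rv - fac * pv for rv, pv in zip(aug[r], aug[col])]
    # The right half of the augmented matrix is now the inverse.
    return [row[n:] for row in aug]
```

For example, gauss_jordan_inverse([[4, 7], [2, 6]]) returns [[0.6, -0.7], [-0.2, 0.4]] up to rounding.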

As of at least July 16, 2018, Numba has a fast matrix inverse. (You can see how they overload the standard NumPy inverse and other operations here.)
Here are the results of my benchmarking:
import numpy as np
from scipy import linalg as sla
from numpy import linalg as nla
import numba

def gen_ex(d0):
    x = np.random.randn(d0, d0)
    return x.T + x

@numba.jit
def inv_nla_jit(A):
    return np.linalg.inv(A)

@numba.jit
def inv_sla_jit(A):
    return sla.inv(A)
For small matrices it is particularly fast:
ex1 = gen_ex(4)
%timeit inv_nla_jit(ex1) # NumPy + Numba
%timeit inv_sla_jit(ex1) # SciPy + Numba
%timeit nla.inv(ex1) # NumPy
%timeit sla.inv(ex1) # SciPy
[Out]
2.54 µs ± 467 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
67.3 µs ± 9.18 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
63.5 µs ± 7.65 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
56.6 µs ± 5.03 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Notice that the speedup only works for NumPy inverse, not SciPy (as expected).
Slightly larger matrix:
ex2 = gen_ex(40)
%timeit inv_nla_jit(ex2) # NumPy + Numba
%timeit inv_sla_jit(ex2) # SciPy + Numba
%timeit nla.inv(ex2) # NumPy
%timeit sla.inv(ex2) # SciPy
[Out]
131 µs ± 12.9 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
278 µs ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
231 µs ± 24.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
189 µs ± 11.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
So there's still a speedup here but SciPy is catching up.

For a 4 x 4 matrix, it's probably just about OK to use the mathematical formula, which you can find by Googling "formula for 4 by 4 matrix inverse". For example, here (I can't vouch for its accuracy):
http://www.cg.info.hiroshima-cu.ac.jp/~miyazaki/knowledge/teche23.html
In general, inverting an arbitrary matrix is not for the faint-hearted. You have to be aware of all the mathematically difficult cases, know why they won't apply to your usage, and catch them when you are supplied with mathematically pathological inputs. Alternatively, you can return results of low accuracy or numerical garbage in the knowledge that it won't matter in your use case, provided you don't actually end up dividing by zero or overflowing MAXFLOAT ... which you might catch with an exception handler and present as "Error: matrix is singular or very close thereto".
It's generally better as a programmer to use library code written by numerical-mathematics experts, unless you are willing to spend the time to understand the physical and mathematical nature of the particular problem you are addressing and to become your own mathematics expert in your own specialist field.
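To make the "pathological inputs" point concrete: NumPy's condition number is a cheap way to detect a matrix that is technically invertible but numerically hopeless (the example matrices here are made up for illustration):

```python
import numpy as np

A_good = np.array([[2.0, 1.0], [1.0, 3.0]])
A_bad = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-12]])  # nearly singular

# A condition number around 1/machine-epsilon means essentially all
# significant digits are lost when inverting.
print(np.linalg.cond(A_good))  # small: safe to invert
print(np.linalg.cond(A_bad))   # enormous: the "inverse" is mostly noise
```
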

I found that the Gauss-Jordan elimination algorithm helped a lot when attempting this. If you're going to use a matrix of any given size (e.g. 5x5), where the hard-coded formula for it is 49 pages long, it's best to use this approach. To invert a matrix, pass it in as a 2D array (a list of lists) and then run the Inverse function.
# Python test: Gauss-Jordan elimination
# Inputs are a 2D array (list of lists), not a matrix
Test_Array = [[3, 3, 2, 1, 1],
              [2, 1, 3, 2, 3],
              [1, 3, 3, 2, 2],
              [2, 3, 3, 1, 1],
              [3, 1, 2, 1, 2]]

# Creating storage & initializing the augmented matrix
# (the same as the np.zeros((n, 2*n)) function)
def nx2n(n_Rows, n_Columns):
    Zeros = []
    for i in range(n_Rows):
        Zeros.append([])
        for j in range(n_Columns * 2):
            Zeros[i].append(0)
    return Zeros

# Copying in the matrix coefficients
def update(inputs, n_Rows, n_Columns, Zero):
    for i in range(n_Rows):
        for j in range(n_Columns):
            Zero[i][j] = inputs[i][j]
    return Zero

# Augmenting with the identity matrix of order n
def identity(n_Rows, n_Columns, Matrix):
    for i in range(n_Rows):
        for j in range(n_Columns):
            if i == j:
                Matrix[i][j + n_Columns] = 1
    return Matrix

# Applying the Gauss-Jordan elimination algorithm
def Gauss_Jordan_Elimination(n_Rows, n_Columns, Matrix):
    for i in range(n_Rows):
        if Matrix[i][i] == 0:
            raise ValueError('zero pivot: cannot invert without row swaps')
        for j in range(n_Columns):
            if i != j:
                ratio = Matrix[j][i] / Matrix[i][i]
                for k in range(2 * n_Columns):
                    Matrix[j][k] = Matrix[j][k] - ratio * Matrix[i][k]
    return Matrix

# Row operation to make each principal diagonal element '1'
def row_op(n_Rows, n_Columns, Matrix):
    for i in range(n_Rows):
        divide = Matrix[i][i]
        for j in range(2 * n_Columns):
            Matrix[i][j] = Matrix[i][j] / divide
    return Matrix

# Extract the inverse matrix from the right half of the augmented matrix
def Inverse(Matrix):
    returnable = []
    number_Rows = len(Matrix)
    number_Columns = len(Matrix[0])
    Inversed_Matrix = row_op(number_Rows, number_Columns,
                             Gauss_Jordan_Elimination(number_Rows, number_Columns,
                             identity(number_Rows, number_Columns,
                             update(Matrix, number_Rows, number_Columns,
                             nx2n(number_Rows, number_Columns)))))
    for i in range(number_Rows):
        returnable.append([])
        for j in range(number_Columns, 2 * number_Columns):
            returnable[i].append(Inversed_Matrix[i][j])
    return returnable

print(Inverse(Test_Array))

Simply put all the methods together:
import math

def getMinorIndex(matrixLocal, x, y):
    minor = []
    for i in range(3):
        minorRow = []
        if i == x:
            continue
        for j in range(3):
            if j == y:
                continue
            minorRow.append(matrixLocal[i][j])
        minor.append(minorRow)
    return minor

def getDeterminant2By2(matrixLocal):
    return matrixLocal[0][0] * matrixLocal[1][1] - matrixLocal[0][1] * matrixLocal[1][0]

def getDeterminant(matrixLocal):
    determinant = 0
    for x in range(3):
        t = getDeterminant2By2(getMinorIndex(matrixLocal, 0, x))
        e = matrixLocal[0][x]
        determinant += t * e * math.pow(-1, x)
    return determinant

def getCofactorMatrix(matrixLocal):
    cofactorMatrix = []
    for i in range(3):
        row = []
        for j in range(3):
            t = getDeterminant2By2(getMinorIndex(matrixLocal, i, j))
            row.append(t * math.pow(-1, i + j))
        cofactorMatrix.append(row)
    return cofactorMatrix

def transpose(matrixLocal):
    transposeMatrix = []
    for i in range(3):
        row = []
        for j in range(3):
            row.append(matrixLocal[j][i])
        transposeMatrix.append(row)
    return transposeMatrix

def divideMatrix(matrixLocal, divisor):
    ansMatrix = []
    for i in range(3):
        row = []
        for j in range(3):
            row.append(matrixLocal[i][j] / divisor)
        ansMatrix.append(row)
    return ansMatrix

# matrix is the 3x3 input (a list of lists); for example:
matrix = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
cofactor = getCofactorMatrix(matrix)
adjoint = transpose(cofactor)
det = getDeterminant(matrix)
inverse = divideMatrix(adjoint, det)
print(inverse)

Inverse matrix of 3x3 without numpy [python3]
import pprint

def inverse_3X3_matrix():
    I_Q_list = [[0, 1, 1],
                [2, 3, -1],
                [-1, 2, 1]]
    det_ = I_Q_list[0][0] * ((I_Q_list[1][1] * I_Q_list[2][2]) - (I_Q_list[1][2] * I_Q_list[2][1])) \
         - I_Q_list[0][1] * ((I_Q_list[1][0] * I_Q_list[2][2]) - (I_Q_list[1][2] * I_Q_list[2][0])) \
         + I_Q_list[0][2] * ((I_Q_list[1][0] * I_Q_list[2][1]) - (I_Q_list[1][1] * I_Q_list[2][0]))
    co_fctr_1 = [(I_Q_list[1][1] * I_Q_list[2][2]) - (I_Q_list[1][2] * I_Q_list[2][1]),
                 -((I_Q_list[1][0] * I_Q_list[2][2]) - (I_Q_list[1][2] * I_Q_list[2][0])),
                 (I_Q_list[1][0] * I_Q_list[2][1]) - (I_Q_list[1][1] * I_Q_list[2][0])]
    co_fctr_2 = [-((I_Q_list[0][1] * I_Q_list[2][2]) - (I_Q_list[0][2] * I_Q_list[2][1])),
                 (I_Q_list[0][0] * I_Q_list[2][2]) - (I_Q_list[0][2] * I_Q_list[2][0]),
                 -((I_Q_list[0][0] * I_Q_list[2][1]) - (I_Q_list[0][1] * I_Q_list[2][0]))]
    co_fctr_3 = [(I_Q_list[0][1] * I_Q_list[1][2]) - (I_Q_list[0][2] * I_Q_list[1][1]),
                 -((I_Q_list[0][0] * I_Q_list[1][2]) - (I_Q_list[0][2] * I_Q_list[1][0])),
                 (I_Q_list[0][0] * I_Q_list[1][1]) - (I_Q_list[0][1] * I_Q_list[1][0])]
    inv_list = [[1 / det_ * co_fctr_1[0], 1 / det_ * co_fctr_2[0], 1 / det_ * co_fctr_3[0]],
                [1 / det_ * co_fctr_1[1], 1 / det_ * co_fctr_2[1], 1 / det_ * co_fctr_3[1]],
                [1 / det_ * co_fctr_1[2], 1 / det_ * co_fctr_2[2], 1 / det_ * co_fctr_3[2]]]
    pprint.pprint(inv_list)

inverse_3X3_matrix()

I used the formula from http://cg.info.hiroshima-cu.ac.jp/~miyazaki/knowledge/teche23.html to write the function that does the inversion of a 4x4 matrix:
import numpy as np

def myInverse(A):
    detA = np.linalg.det(A)

    b00 = A[1,1]*A[2,2]*A[3,3] + A[1,2]*A[2,3]*A[3,1] + A[1,3]*A[2,1]*A[3,2] - A[1,1]*A[2,3]*A[3,2] - A[1,2]*A[2,1]*A[3,3] - A[1,3]*A[2,2]*A[3,1]
    b01 = A[0,1]*A[2,3]*A[3,2] + A[0,2]*A[2,1]*A[3,3] + A[0,3]*A[2,2]*A[3,1] - A[0,1]*A[2,2]*A[3,3] - A[0,2]*A[2,3]*A[3,1] - A[0,3]*A[2,1]*A[3,2]
    b02 = A[0,1]*A[1,2]*A[3,3] + A[0,2]*A[1,3]*A[3,1] + A[0,3]*A[1,1]*A[3,2] - A[0,1]*A[1,3]*A[3,2] - A[0,2]*A[1,1]*A[3,3] - A[0,3]*A[1,2]*A[3,1]
    b03 = A[0,1]*A[1,3]*A[2,2] + A[0,2]*A[1,1]*A[2,3] + A[0,3]*A[1,2]*A[2,1] - A[0,1]*A[1,2]*A[2,3] - A[0,2]*A[1,3]*A[2,1] - A[0,3]*A[1,1]*A[2,2]
    b10 = A[1,0]*A[2,3]*A[3,2] + A[1,2]*A[2,0]*A[3,3] + A[1,3]*A[2,2]*A[3,0] - A[1,0]*A[2,2]*A[3,3] - A[1,2]*A[2,3]*A[3,0] - A[1,3]*A[2,0]*A[3,2]
    b11 = A[0,0]*A[2,2]*A[3,3] + A[0,2]*A[2,3]*A[3,0] + A[0,3]*A[2,0]*A[3,2] - A[0,0]*A[2,3]*A[3,2] - A[0,2]*A[2,0]*A[3,3] - A[0,3]*A[2,2]*A[3,0]
    b12 = A[0,0]*A[1,3]*A[3,2] + A[0,2]*A[1,0]*A[3,3] + A[0,3]*A[1,2]*A[3,0] - A[0,0]*A[1,2]*A[3,3] - A[0,2]*A[1,3]*A[3,0] - A[0,3]*A[1,0]*A[3,2]
    b13 = A[0,0]*A[1,2]*A[2,3] + A[0,2]*A[1,3]*A[2,0] + A[0,3]*A[1,0]*A[2,2] - A[0,0]*A[1,3]*A[2,2] - A[0,2]*A[1,0]*A[2,3] - A[0,3]*A[1,2]*A[2,0]
    b20 = A[1,0]*A[2,1]*A[3,3] + A[1,1]*A[2,3]*A[3,0] + A[1,3]*A[2,0]*A[3,1] - A[1,0]*A[2,3]*A[3,1] - A[1,1]*A[2,0]*A[3,3] - A[1,3]*A[2,1]*A[3,0]
    b21 = A[0,0]*A[2,3]*A[3,1] + A[0,1]*A[2,0]*A[3,3] + A[0,3]*A[2,1]*A[3,0] - A[0,0]*A[2,1]*A[3,3] - A[0,1]*A[2,3]*A[3,0] - A[0,3]*A[2,0]*A[3,1]
    b22 = A[0,0]*A[1,1]*A[3,3] + A[0,1]*A[1,3]*A[3,0] + A[0,3]*A[1,0]*A[3,1] - A[0,0]*A[1,3]*A[3,1] - A[0,1]*A[1,0]*A[3,3] - A[0,3]*A[1,1]*A[3,0]
    b23 = A[0,0]*A[1,3]*A[2,1] + A[0,1]*A[1,0]*A[2,3] + A[0,3]*A[1,1]*A[2,0] - A[0,0]*A[1,1]*A[2,3] - A[0,1]*A[1,3]*A[2,0] - A[0,3]*A[1,0]*A[2,1]
    b30 = A[1,0]*A[2,2]*A[3,1] + A[1,1]*A[2,0]*A[3,2] + A[1,2]*A[2,1]*A[3,0] - A[1,0]*A[2,1]*A[3,2] - A[1,1]*A[2,2]*A[3,0] - A[1,2]*A[2,0]*A[3,1]
    b31 = A[0,0]*A[2,1]*A[3,2] + A[0,1]*A[2,2]*A[3,0] + A[0,2]*A[2,0]*A[3,1] - A[0,0]*A[2,2]*A[3,1] - A[0,1]*A[2,0]*A[3,2] - A[0,2]*A[2,1]*A[3,0]
    b32 = A[0,0]*A[1,2]*A[3,1] + A[0,1]*A[1,0]*A[3,2] + A[0,2]*A[1,1]*A[3,0] - A[0,0]*A[1,1]*A[3,2] - A[0,1]*A[1,2]*A[3,0] - A[0,2]*A[1,0]*A[3,1]
    b33 = A[0,0]*A[1,1]*A[2,2] + A[0,1]*A[1,2]*A[2,0] + A[0,2]*A[1,0]*A[2,1] - A[0,0]*A[1,2]*A[2,1] - A[0,1]*A[1,0]*A[2,2] - A[0,2]*A[1,1]*A[2,0]

    Ainv = np.array([[b00, b01, b02, b03],
                     [b10, b11, b12, b13],
                     [b20, b21, b22, b23],
                     [b30, b31, b32, b33]]) / detA
    return Ainv

Related

vectorizing a "leaky integrator" in numpy

I need a leaky integrator -- an IIR filter -- that implements:
y[i] = x[i] + y[i-1] * leakiness
The following code works. However, my x vectors are long and this is in an inner loop. So my questions:
For efficiency, is there a way to vectorize this in numpy?
If not numpy, would it be advantageous to use one of the scipy.signal filter algorithms?
The iterative code follows. state is simply the value of the previous y[i-1] that gets carried forward over successive calls:
import numpy as np

def leaky_integrator(x, state, leakiness):
    y = np.zeros(len(x), dtype=np.float32)
    for i in range(len(x)):
        if i == 0:
            y[i] = x[i] + state * leakiness
        else:
            y[i] = x[i] + y[i-1] * leakiness
    return y, y[-1]
>>> leakiness = 0.5
>>> a1 = [1, 0, 0, 0]
>>> state = 0
>>> print("a1=", a1, "state=", state)
a1= [1, 0, 0, 0] state= 0
>>> a2, state = leaky_integrator(a1, state, leakiness)
>>> print("a2=", a2, "state=", state)
a2= [1. 0.5 0.25 0.125] state= 0.125
>>> a3, state = leaky_integrator(a2, state, leakiness)
>>> print("a3=", a3, "state=", state)
a3= [1.0625 1.03125 0.765625 0.5078125] state= 0.5078125
I can see two options:
The simplest (and suggested) solution is to extend the dependency list and use numba.
Use matrix multiplication, by rethinking the problem in terms of matrix operations.
In fact, if x = [a, b, c], s = state, l = leakiness, then
y = [a + s*l, b + (a + s*l)*l, c + (b + (a + s*l)*l)*l]
  = [a + s*l, b + a*l + s*l**2, c + b*l + a*l**2 + s*l**3]
  = [[1, 0, 0], [l, 1, 0], [l**2, l, 1]] @ x + s * [l, l**2, l**3]
However, you may need to generate a matrix of x.size**2 elements and may get an out-of-memory error even for moderate sizes (e.g. for a 1M-element array the matrix comes to about 7 TiB, which I don't think is doable).
Going back to the numba implementation, it can be enough to add a @jit(nopython=True) decorator to the function you already implemented.
Doing so in my machine with a random array of size 1M:
%timeit leaky_integrator(a1, s, l)
2.07 s ± 99.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit leaky_integrator_jitted(a1, s, l)
7.66 ms ± 22.2 µs per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
(Remark on numba performance here)
from numba import jit

@jit(nopython=True)
def leaky_integrator_jitted(x, state, leakiness):
    y = np.zeros(len(x), dtype=np.float32)
    for i in range(len(x)):
        if i == 0:
            y[i] = x[i] + state * leakiness
        else:
            y[i] = x[i] + y[i-1] * leakiness
    return y, y[-1]
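As for the scipy.signal part of the question: the leaky integrator is a first-order IIR filter with coefficients b = [1] and a = [1, -leakiness], so scipy.signal.lfilter computes it in one vectorized call. A sketch (the function name is mine, and the way the initial condition zi is seeded from state is my own derivation for lfilter's transposed direct-form implementation):

```python
import numpy as np
from scipy.signal import lfilter

def leaky_integrator_sp(x, state, leakiness):
    # y[i] = x[i] + leakiness * y[i-1]  <=>  IIR filter with b=[1], a=[1, -leakiness]
    b = [1.0]
    a = [1.0, -leakiness]
    # Seed the filter's internal state so that y[0] = x[0] + state * leakiness.
    zi = np.array([leakiness * state])
    y, zf = lfilter(b, a, x, zi=zi)
    return y, y[-1]

y, new_state = leaky_integrator_sp([1.0, 0.0, 0.0, 0.0], 0.0, 0.5)
# matches the loop version: y == [1, 0.5, 0.25, 0.125], new_state == 0.125
```
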

Using numba to speed up a Metropolis algorithm

I am testing how much numba can help in speeding up a classical Metropolis algorithm, e.g. hard disks in a box:
import random as rm
from numba import njit

def Metropolis2D(L, d_ex, n_steps, delta):
    for steps in range(n_steps):
        a = rm.choice(L)
        b = [a[0] + rm.uniform(-delta, delta), a[1] + rm.uniform(-delta, delta)]
        min_dist = min((b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2 for c in L if c != a)
        box_cond = min(b[0], b[1]) <= d_ex or max(b[0], b[1]) >= 1.0 - d_ex
        if not (box_cond or min_dist < 4.0 * d_ex ** 2):
            a[:] = b
    return L

Metropolis2D_jit = njit()(Metropolis2D)
For example, let us consider the following settings:
import math

ell = 4
N = ell**2                             # number of disks
eta = 0.65                             # disk density
d_ex = math.sqrt(eta / (math.pi * N))  # radius of the N disks at density eta
del_xy = 1 / (2 * ell)                 # half lattice spacing for the initial configuration
L = [[del_xy + i * 2 * del_xy, del_xy + j * 2 * del_xy] for i in range(ell) for j in range(ell)]
n_steps = 1
delta = 2 * (del_xy - d_ex)            # maximal displacement along x or y
We are starting from all disks aligned on a square grid, at a relatively high density (0.65) on the unit square. A single MC move (i.e. n_steps = 1) in the non-jitted Metropolis2D function then takes:
%timeit Metropolis2D(L, d_ex, n_steps, delta)
7.01 µs ± 11.4 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
However, when running Metropolis2D_jit, I get the following error:
UnsupportedError: Failed in nopython mode pipeline (step: inline calls to locally defined closures)
The use of yield in a closure is unsupported.
File "../../../var/folders/q9/jzmgl38d1h5gnmkllt_ly8rr0000gn/T/ipykernel_28212/2383793983.py", line 6:
<source missing, REPL/exec in use?>
Do you have ideas on how to circumvent this problem and exploit the numba speed-up?
Indeed, one possibility would be to write a partially jitted function, where only the disk-disk distance is jitted in numba. However, I was curious whether there are solutions for a full jit of the function Metropolis2D.
Thanks @Jérôme Richard. You are right: it has nothing to do with bounds checking (I am amending the question). I fixed min by means of a loop and corrected some issues due to the use of rm.choice(). This is an attempted solution that now compiles in numba without errors:
import numpy as np
import numba as nb
import random as rm

@nb.njit()
def dist_sq_jit(b, c):
    return (b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2

@nb.njit()
def Metropolis2D_jit(L, d_ex, n_steps, delta):
    for steps in range(n_steps):
        ran_ind = np.random.randint(0, len(L))
        a = L[ran_ind]
        b = [a[0] + rm.uniform(-delta, delta), a[1] + rm.uniform(-delta, delta)]
        L_sliced = np.concatenate((L[:ran_ind], L[ran_ind + 1:]), axis=0)
        min_dist = np.zeros(len(L) - 1)  # one entry per *other* disk
        i = -1
        for c in L_sliced:
            i += 1
            min_dist[i] = dist_sq_jit(b, c)
        if min(b) >= d_ex and max(b) <= 1.0 - d_ex and min(min_dist) >= 4.0 * d_ex ** 2:
            L[ran_ind] = b
    return L
The computation time (same parameter settings as above; the only change is that L is now a NumPy array) is reduced by roughly a factor of 5:
L_vec = np.array(L)
%timeit Metropolis2D_jit(L_vec, d_ex, n_steps, delta)
1.39 µs ± 30.3 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Perhaps there is room for further improvements?

how to avoid loops when using numpy

I have a numpy array with a size of 10x250000 and I need to change its values during the program, but using a "for loop" makes my program very slow. Here is the relevant section of the program (L2, L1 and T are other arrays):
EEf1 = numpy.zeros((10, 250000))
for m in range(10):
    for n in range(250000):
        EEf1[m, n] += 2 * (L2[m, 0] - T[m, 0]) * L2[m, 0] * (1 - L2[m, 0]) * L1[n, 0]
My question: is there any solution using numpy features that avoids this loop?
I have tried the fromfunction method, like this (for example):
np.fromfunction(lambda i, j: A[i, j] + B[i, j], (2, 2))
(A and B are 2x2 arrays)
but it doesn't work.
I really need to avoid this loop.
Can anyone help?
I think it's just broadcasting, but it's difficult to test without a reproducible example:
EEf1 = (2 * (L2[:, 0] - T[:, 0]) * L2[:, 0] * (1 - L2[:, 0]))[:, None] * L1[:, 0]
Most of the time there is little to change; just drop the for loops.
Here you have a shape mismatch between L1 and L2 -- np.outer does the job here:
def my_func(L1, L2, T):
    return np.outer(2 * (L2 - T) * L2 * (1 - L2), L1)
Also, it is a bit odd that L1, L2 and T are not 1D arrays to begin with, because you only ever use their first column.
Equality check:
M, N = 10, 25_000
L1 = np.random.rand(N, 1)
L2 = np.random.rand(M, 1)
T = np.random.rand(M, 1)
expected = op_func(L1, L2, T)
result = my_func(L1, L2, T)
np.array_equal(result, expected) # True
Performance tests:
>>> %timeit op_func(L1, L2, T)
... %timeit my_func(L1, L2, T)
250 ms ± 2.24 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
85.2 µs ± 191 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Your lines of code as a function, for reference:
def op_func(L1, L2, T):
    N, M = L1.shape[0], L2.shape[0]
    assert T.shape[0] == M
    EEf1 = np.zeros((M, N))
    for m in range(M):
        for n in range(N):
            EEf1[m, n] += (
                2 * (L2[m, 0] - T[m, 0]) * L2[m, 0] * (1 - L2[m, 0]) * L1[n, 0]
            )
    return EEf1
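To see the outer-product rewrite concretely, here is a toy-sized comparison (the array values are made up; only the shapes mirror the question):

```python
import numpy as np

M, N = 3, 4
L1 = np.array([[0.1], [0.2], [0.3], [0.4]])  # shape (N, 1)
L2 = np.array([[0.5], [0.6], [0.7]])         # shape (M, 1)
T = np.array([[0.1], [0.2], [0.3]])          # shape (M, 1)

# Loop version, as in the question
EEf1 = np.zeros((M, N))
for m in range(M):
    for n in range(N):
        EEf1[m, n] += 2 * (L2[m, 0] - T[m, 0]) * L2[m, 0] * (1 - L2[m, 0]) * L1[n, 0]

# Vectorized: the row factor depends only on m, the column factor only on n,
# so the whole grid is an outer product of the two 1D factors.
vec = np.outer(2 * (L2 - T) * L2 * (1 - L2), L1)
assert np.allclose(EEf1, vec)
```
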

Using a nested for loop to append to multiple lists

I'm trying to loop the Shockley diode equation over three different temperatures (with three different saturation currents) for a range of voltage values, and then insert these three sets of data into lists.
kb = 1.38E-23
q = 1.602E-19
voltage = np.arange(0.5, 1.02, 0.02)
T = np.array([269, 289.1, 294.5])
Is = np.array([1.707E-14, 6.877E-14, 1.4510E-13])

i269 = []
i284 = []
i294 = []
for i in range(1, len(voltage)):
    for j in range(1, 3):
        I = Is[j] * np.exp((voltage[i] * q) / (kb * T[j]))
        i269.append(I[j[0]])
        i284.append(I[j[1]])
        i294.append(I[j[2]])
I know the method I've used here is not syntactically correct but I've written it this way to try to aid my efforts in conveying what it is I'm trying to achieve.
I want to loop through voltage first for j = 0, and append I into i269, then again through voltage for j = 1 and append into i284 etc.
Any help would be much appreciated.
Thanks
Note that with your current approach, I cannot be indexed, as you're simply assigning a scalar value to it. You'd first need to define it as a list, and then index it to assign its values to i269, i284 or i294:
kb = 1.38E-23
q = 1.602E-19
voltage = np.arange(0.5, 1.02, 0.02)
T = np.array([269, 289.1, 294.5])
Is = np.array([1.707E-14, 6.877E-14, 1.4510E-13])

I = [0 for _ in range(len(T))]
i269 = []
i284 = []
i294 = []
for i in range(len(voltage)):
    for j in range(3):
        I[j] = Is[j] * np.exp((voltage[i] * q) / (kb * T[j]))
    i269.append(I[0])
    i284.append(I[1])
    i294.append(I[2])
However, it is possible to gain performance by leveraging broadcasting and vectorising the above as:
i269, i284, i294 = Is[:,None] * np.exp((voltage * q) / (kb * T)[:,None])
Let's check the results and look at the timings for both approaches:
kb = 1.38E-23
q = 1.602E-19
voltage = np.arange(0.5, 1.02, 0.02)
T = np.array([269, 289.1, 294.5])
Is = np.array([1.707E-14, 6.877E-14, 1.4510E-13])

def current_approach(kb, q, voltage, T, Is):
    I = [0 for _ in range(len(T))]
    i269 = []
    i284 = []
    i294 = []
    for i in range(len(voltage)):
        for j in range(3):
            I[j] = Is[j] * np.exp((voltage[i] * q) / (kb * T[j]))
        i269.append(I[0])
        i284.append(I[1])
        i294.append(I[2])
    return i269, i284, i294

def vect_approach(kb, q, voltage, T, Is):
    i269, i284, i294 = Is[:, None] * np.exp((voltage * q) / (kb * T)[:, None])
    return i269, i284, i294

np.allclose(current_approach(kb, q, voltage, T, Is),
            vect_approach(kb, q, voltage, T, Is))
# True
%timeit current_approach(kb, q, voltage, T, Is)
# 201 µs ± 3.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit vect_approach(kb, q, voltage, T, Is)
# 30.9 ns ± 0.647 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Up to a huge 6,500x speedup with the vectorised approach!

Arnaud Legoux Moving Average and numpy

I'd like to write a vectorized version of the code that calculates the Arnaud Legoux Moving Average using NumPy (or Pandas). Could you help me with this, please? Thanks.
The non-vectorized version looks like the following (see below).
import numpy as np

def NPALMA(pnp_array, **kwargs):
    '''
    ALMA - Arnaud Legoux Moving Average,
    http://www.financial-hacker.com/trend-delusion-or-reality/
    https://github.com/darwinsys/Trading_Strategies/blob/master/ML/Features.py
    '''
    length = kwargs['length']
    # just some number (6.0 is useful)
    sigma = kwargs['sigma']
    # sensitivity (close to 1) or smoothness (close to 0)
    offset = kwargs['offset']

    asize = length - 1
    m = offset * asize
    s = length / sigma
    dss = 2 * s * s

    alma = np.zeros(pnp_array.shape)
    wtd_sum = np.zeros(pnp_array.shape)

    for l in range(len(pnp_array)):
        if l >= asize:
            for i in range(length):
                im = i - m
                wtd = np.exp(-(im * im) / dss)
                alma[l] += pnp_array[l - length + i] * wtd
                wtd_sum[l] += wtd
            alma[l] = alma[l] / wtd_sum[l]
    return alma
Starting Approach
We can create sliding windows along the first axis and then use tensor multiplication with the range of wtd values for the sum-reductions.
The implementation would look something like this -
# Get all wtd values in an array
wtds = np.exp(-(np.arange(length) - m)**2/dss)
# Get the sliding windows for input array along first axis
pnp_array3D = strided_axis0(pnp_array,len(wtds))
# Initialize o/p array
out = np.zeros(pnp_array.shape)
# Get sum-reductions for the windows which don't need wrapping over
out[length:] = np.tensordot(pnp_array3D,wtds,axes=((1),(0)))[:-1]
# Last element of the output needed wrapping. So, do it separately.
out[length-1] = wtds.dot(pnp_array[np.r_[-1,range(length-1)]])
# Finally perform the divisions
out /= wtds.sum()
Function to get the sliding windows : strided_axis0 is from here.
Boost with 1D convolution
Those multiplications with wtds values and then their sum-reductions are basically convolution along the first axis. As such, we can use scipy.ndimage.convolve1d along axis=0. This would be much faster given the memory efficiency, as we won't be creating huge sliding windows.
The implementation would be -
from scipy.ndimage import convolve1d as conv
avgs = conv(pnp_array, weights=wtds/wtds.sum(),axis=0, mode='wrap')
Thus, out[length-1:] (the non-zero rows) would be the same as avgs[:-length+1].
There could be some precision difference if we are working with really small kernel numbers from wtds. So, keep that in mind if using this convolution method.
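The convolution view in miniature, numpy-only (toy data and weights, just to show that a weighted moving average is a sliding dot product):

```python
import numpy as np

x = np.arange(10.0)              # toy input: a linear ramp
w = np.array([0.25, 0.5, 0.25])  # toy normalized weights (sum to 1)

# 'valid' mode gives one output per full window: the same sum-reduction
# as the sliding-window/tensordot approach above. np.convolve flips its
# kernel, so we reverse w to get a plain sliding dot product.
avg = np.convolve(x, w[::-1], mode='valid')
# On a linear ramp, a symmetric normalized average reproduces the
# window centers: avg == [1, 2, ..., 8]
```
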
Runtime test
Approaches -
def original_app(pnp_array, length, m, dss):
    alma = np.zeros(pnp_array.shape)
    wtd_sum = np.zeros(pnp_array.shape)
    for l in range(len(pnp_array)):
        if l >= asize:
            for i in range(length):
                im = i - m
                wtd = np.exp(-(im * im) / dss)
                alma[l] += pnp_array[l - length + i] * wtd
                wtd_sum[l] += wtd
            alma[l] = alma[l] / wtd_sum[l]
    return alma

def vectorized_app1(pnp_array, length, m, dss):
    wtds = np.exp(-(np.arange(length) - m)**2/dss)
    pnp_array3D = strided_axis0(pnp_array, len(wtds))
    out = np.zeros(pnp_array.shape)
    out[length:] = np.tensordot(pnp_array3D, wtds, axes=((1), (0)))[:-1]
    out[length-1] = wtds.dot(pnp_array[np.r_[-1, range(length-1)]])
    out /= wtds.sum()
    return out

def vectorized_app2(pnp_array, length, m, dss):
    wtds = np.exp(-(np.arange(length) - m)**2/dss)
    return conv(pnp_array, weights=wtds/wtds.sum(), axis=0, mode='wrap')
Timings -
In [470]: np.random.seed(0)
...: m,n = 1000,100
...: pnp_array = np.random.rand(m,n)
...:
...: length = 6
...: sigma = 0.3
...: offset = 0.5
...:
...: asize = length - 1
...: m = np.floor(offset * asize)
...: s = length / sigma
...: dss = 2 * s * s
...:
In [471]: %timeit original_app(pnp_array, length, m, dss)
...: %timeit vectorized_app1(pnp_array, length, m, dss)
...: %timeit vectorized_app2(pnp_array, length, m, dss)
...:
10 loops, best of 3: 36.1 ms per loop
1000 loops, best of 3: 1.84 ms per loop
1000 loops, best of 3: 684 µs per loop
In [472]: np.random.seed(0)
...: m,n = 10000,1000 # rest same as previous one
In [473]: %timeit original_app(pnp_array, length, m, dss)
...: %timeit vectorized_app1(pnp_array, length, m, dss)
...: %timeit vectorized_app2(pnp_array, length, m, dss)
...:
1 loop, best of 3: 503 ms per loop
1 loop, best of 3: 222 ms per loop
10 loops, best of 3: 106 ms per loop
