Related
I'm trying to get the L and U--matrices from the following Gauss-elimination code I wrote
matrix = np.array ([[2,1,4,1], [3,4,-1,-1] , [1,-4,1,5] , [2,-2,1,3]], dtype = float)
vector = np.array([-4, 3, 9, 7], float)
length = len(vector)
L_matrix = np.zeros((4,4), float)
U_matrix = np.zeros((4,4), float)
for m in range(length):
L_matrix[:,m] = matrix[:,m]
div = matrix[m,m]
matrix[m,:] /= div
U_matrix[m, :] = matrix[m,:]
vector[m] /= div
I'm getting the right U-matrix, but I'm getting this L-matrix
[[ 2. 0.5 2. 0.5]
[ 3. 2.5 -2.8 -1. ]
[ 1. -4.5 -13.6 -0. ]
[ 2. -3. -11.4 -1. ]]
i.e I'm getting the whole matrix instead of a lower triangular matrix with zeros at the top! What am I doing wrong here?
The issue here is that the provided code does not perform the elimination. Try this:
for m in range(length):
div = matrix[m, m]
L_matrix[:, m] = matrix[:, m] / div
U_matrix[m, :] = matrix[m, :]
matrix -= np.outer(L_matrix[:, m], U_matrix[m, :])
See this article for more details. For actually solving your linear system, the issue is that LU is not exactly the same as standard Gaussian elimination. You can use back substitution to efficiently compute what vector should be.
I have implemented the standard equations/algorithm of LU Decomposition of a Matrix by following this link: (1) and (2)
This returns the LU decomposition of a square matrix like below perfectly.
My problem is, however- it also gives a Divide by Zero warning.
Code here:
import numpy as np
def LUDecomposition (A):
L = np.zeros(np.shape(A),np.float64)
U = np.zeros(np.shape(A),np.float64)
acc = 0
L[0,0]=1
for i in np.arange(len(A)):
for k in range(i,len(A)):
for j in range(0,i):
acc += L[i,j]*U[j,k]
U[i,k] = A[i,k]-acc
for m in range(k+1,len(A)):
if m==k:
L[m,k]=1
else:
L[m,k] = (A[m,k]-acc)/U[k,k]
acc=0
return (L,U)
A = np.array([[-4, -1, -2],
[-4, 12, 3],
[-4, -2, 18]])
L, U = LUDecomposition (A)
Where am I going wrong?
It seems that you may have made some indentation errors regarding the first inner level for loops: U must be evaluated before L ; you also didn't correctly compute the summation term acc and didn't properly set the diagonal terms of L to 1. Following some other syntax modifications, you may rewrite your function as follows:
def LUDecomposition(A):
n = A.shape[0]
L = np.zeros((n,n), np.float64)
U = np.zeros((n,n), np.float64)
for i in range(n):
# U
for k in range(i,n):
s1 = 0 # summation of L(i, j)*U(j, k)
for j in range(i):
s1 += L[i,j]*U[j,k]
U[i,k] = A[i,k] - s1
# L
for k in range(i,n):
if i==k:
# diagonal terms of L
L[i,i] = 1
else:
s2 = 0 # summation of L(k, j)*U(j, i)
for j in range(i):
s2 += L[k,j]*U[j,i]
L[k,i] = (A[k,i] - s2)/U[i,i]
return L, U
which gives this time the correct output for matrix A when compared to scipy.linalg.lu as a reliable reference:
import numpy as np
from scipy.linalg import lu
A = np.array([[-4, -1, -2],
[-4, 12, 3],
[-4, -2, 18]])
L, U = LUDecomposition(A)
P, L_sp, U_sp = lu(A, permute_l=False)
P
>>> [[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
L
>>> [[ 1. 0. 0. ]
[ 1. 1. 0. ]
[ 1. -0.07692308 1. ]]
np.allclose(L_sp, L))
>>> True
U
>>> [[-4. -1. -2. ]
[ 0. 13. 5. ]
[ 0. 0. 20.38461538]]
np.allclose(U_sp, U))
>>> True
Note: unlike scipy lapack getrf algorithm, this Doolittle implementation does not include pivoting, these two comparisons are then only true if permutation matrix P returned by scipy.linalg.lu is an identity matrix, i.e. scipy didn't performed any permutations, which is indeed the case for your matrix A. The permutation matrix determined in scipy algorithm is meant to optimize conditions numbers of resulting matrix in order to reduce roundoff errors. At last, you may just simply verify that A = LU which will always be the case if the factorization is done right:
A = np.random.rand(10,10)
L, U = LUDecomposition(A)
np.allclose(A, np.dot(L, U))
>>> True
Nevertheless, in terms of numerical efficiency and accuracy, I wouldn't recommend you to use your own function to compute LU decomposition. Hope this helps.
I have a list of indices
a = [
[1,2,4],
[0,2,3],
[1,3,4],
[0,2]]
What's the fastest way to convert this to a numpy array of ones, where each index shows the position where 1 would occur?
I.e. what I want is:
output = array([
[0,1,1,0,1],
[1,0,1,1,0],
[0,1,0,1,1],
[1,0,1,0,0]])
I know the max size of the array beforehand. I know I could loop through each list and insert a 1 into at each index position, but is there a faster/vectorized way to do this?
My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.
How about this:
ncol = 5
nrow = len(a)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
Here are timings for a 1000x1000 binary array, note that I use an optimized version of the above, see function pp below:
pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms
Code to produce timings:
import itertools as it
import numpy as np
def make_data(n,m):
I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]
def pp():
sz = np.fromiter(map(len,a),int,nrow)
out = np.zeros((nrow,ncol),int)
out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
return out
def ts():
out = np.zeros((nrow,ncol),int)
for i, ix in enumerate(a):
out[i][ix] = 1
return out
def u9():
out = np.zeros((nrow,ncol),int)
for i, (x, y) in enumerate(zip(a, out)):
y[x] = 1
out[i] = y
return out
nrow,ncol = 1000,1000
a = make_data(nrow,ncol)
from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()
print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
This might not be the fastest way. You will need to compare execution times of these answers using large arrays in order to find out the fastest way. Here's my solution
output = np.zeros((4,5))
for i, ix in enumerate(a):
output[i][ix] = 1
# output ->
# array([[0, 1, 1, 0, 1],
# [1, 0, 1, 1, 0],
# [0, 1, 0, 1, 1],
# [1, 0, 1, 0, 0]])
In case you can and want to use Cython you can create a readable (at least if you don't mind the typing) and fast solution.
Here I'm using the IPython bindings of Cython to compile it in a Jupyter notebook:
%load_ext cython
%%cython
cimport cython
cimport numpy as cnp
import numpy as np
#cython.boundscheck(False) # remove this if you cannot guarantee that nrow/ncol are correct
#cython.wraparound(False)
cpdef cnp.int_t[:, :] mseifert(list a, int nrow, int ncol):
cdef cnp.int_t[:, :] out = np.zeros([nrow, ncol], dtype=int)
cdef list subl
cdef int row_idx
cdef int col_idx
for row_idx, subl in enumerate(a):
for col_idx in subl:
out[row_idx, col_idx] = 1
return out
To compare the performance of the solutions presented here I use my library simple_benchmark:
Note that this uses logarithmic axis to simultaneously show the differences for small and large arrays. According to my benchmark my function is actually the fastest of the solutions, however it's also worth pointing out that all of the solutions aren't too far off.
Here is the complete code I used for the benchmark:
import numpy as np
from simple_benchmark import BenchmarkBuilder, MultiArgument
import itertools
b = BenchmarkBuilder()
#b.add_function()
def pp(a, nrow, ncol):
sz = np.fromiter(map(len, a), int, nrow)
out = np.zeros((nrow, ncol), int)
out[np.arange(nrow).repeat(sz), np.fromiter(itertools.chain.from_iterable(a), int, sz.sum())] = 1
return out
#b.add_function()
def ts(a, nrow, ncol):
out = np.zeros((nrow, ncol), int)
for i, ix in enumerate(a):
out[i][ix] = 1
return out
#b.add_function()
def u9(a, nrow, ncol):
out = np.zeros((nrow, ncol), int)
for i, (x, y) in enumerate(zip(a, out)):
y[x] = 1
out[i] = y
return out
b.add_functions([mseifert])
#b.add_arguments("number of rows/columns")
def argument_provider():
for n in range(2, 13):
ncols = 2**n
a = [
sorted(set(np.random.randint(0, ncols, size=np.random.randint(0, ncols))))
for _ in range(ncols)
]
yield ncols, MultiArgument([a, ncols, ncols])
r = b.run()
r.plot()
May not be the best way but the only way I can think of:
output = np.zeros((4,5))
for i, (x, y) in enumerate(zip(a, output)):
y[x] = 1
output[i] = y
print(output)
Which outputs:
[[ 0. 1. 1. 0. 1.]
[ 1. 0. 1. 1. 0.]
[ 0. 1. 0. 1. 1.]
[ 1. 0. 1. 0. 0.]]
How about using array indexing? If you knew more about your input, you could get rid of the penalty for having to convert to a linear array first.
import numpy as np
def main():
row_count = 4
col_count = 5
a = [[1,2,4],[0,2,3],[1,3,4],[0,2]]
# iterate through each row, concatenate all indices and convert them to linear
# numpy append performs copy even if you don't want it, list append is faster
b = []
for row_idx, row in enumerate(a):
b.append(np.array(row, dtype=np.int64) + (row_idx * col_count))
linear_idxs = np.hstack(b)
#could skip previous steps if given index inputs well before hand, or in linear index order.
c = np.zeros(row_count * col_count)
c[linear_idxs] = 1
c = c.reshape(row_count, col_count)
print(c)
if __name__ == "__main__":
main()
#output
# [[0. 1. 1. 0. 1.]
# [1. 0. 1. 1. 0.]
# [0. 1. 0. 1. 1.]
# [1. 0. 1. 0. 0.]]
Depending on your use case, you might look into using sparse matrices. The input matrix looks suspiciously like a Compressed Sparse Row (CSR) matrix. Perhaps something like
import numpy as np
from scipy.sparse import csr_matrix
from itertools import accumulate
def ragged2csr(inds):
offset = len(inds[0])
lens = [len(x) for x in inds]
indptr = list(accumulate(lens))
indptr = np.array([x - offset for x in indptr])
indices = np.array([val for sublist in inds for val in sublist])
n = indices.size
data = np.ones(n)
return csr_matrix((data, indices, indptr))
Again, if it fits in your use case, a sparse matrix would allow elementwise/masking operations to scale with the number of nonzeros, rather than the number of elements (rows*columns), which could bring significant speedup (for a sparse enough matrix).
Another good introduction to CSR matrices is section 3.4 of Iterative Methods. In this case, data is aa, indices is ja and indptr is ia. This format also has the benefit of being very popular among different packages/libraries.
I am now working on a calculation shown below. I want to update the values of each element based on their adjacent elements. I am now using two for loops, but it shows the calculation is very slow since there are several outer iterations. I want to know whether there is any way can speed up this calculation>
for i in range(1,nx+1):
for j in range(1,ny+1):
p[i,j]=(a*p[i-1,j]+b*p[i+1,j]+c*p[i,j-1]+d*p[i,j+1])
a, b, c, d are some constant, p is numpy.array type
Sample input:
import numpy as np
p = np.ones((5,5))
for i in range(1,4):
for j in range(1,4):
p[i,j]=p[i-1,j] + p[i+1,j] +2*p[i,j+1]+2*p[i,j-1]
print(p)
The final output should be:
[[ 1. 1. 1. 1. 1.]
[ 1. 6. 16. 36. 1.]
[ 1. 11. 41. 121. 1.]
[ 1. 16. 76. 276. 1.]
[ 1. 1. 1. 1. 1.]]
Don't have enough rep to comment and this doesn't fully answer the question, but if you are using NumPy, you should definitely look at array broadcasting. Hard to tell exactly what your code is doing, but using broadcasting should make it a lot easier to update the full matrix instead of value by value
We can at least get rid of one nested loop using np.cumsum. In favorable conditions (large number of columns) this can give a 30fold speedup. Sample run:
results equal True
original 31.644793 ms
optimized 0.861980 ms
Code:
import numpy as np
n, m = 50, 600
a, b, c, d = np.random.random((4,))
P = np.random.random((n, m))
def f_OP(P):
p = P.copy()
for i in range(1, n-1):
for j in range(1, m-1):
p[i,j]=a*p[i-1,j] + b*p[i+1,j] +c*p[i,j-1]+d*p[i,j+1]
return p
def f_pp(P):
p = P.copy()
pp = d*p[1:-1, 2:] + b*p[2:, 1:-1]
pp[0] += a*p[0, 1:-1]
pp[:, 0] += c*p[1:-1, 0]
x = np.full((m-2,), c)
x[0] = 1
x = np.cumprod(x)[::-1]
pp = np.cumsum(pp * x, axis=1)
for i in range(1, n-2):
pp[i] += a * np.cumsum(pp[i-1])
p[1:-1, 1:-1] = pp / x
return(p)
print('results equal', np.allclose(f_OP(P), f_pp(P)))
from timeit import timeit
kwds = dict(globals=globals(), number=10)
print('original {:10.6f} ms'.format(timeit('f_OP(P)', **kwds)*100))
print('optimized {:10.6f} ms'.format(timeit('f_pp(P)', **kwds)*100))
The question:
For this problem, you are given a list of matrices called As, and your job is to find the QR factorization for each of them.
Implement qr_by_gram_schmidt: This function takes as input a matrix A and computes a QR decomposition, returning two variables, Q and R where A=QR, with Q orthogonal and R zero below the diagonal.
A is an n×m matrix with n≥m (i.e. more rows than columns).
You should implement this function using the modified Gram-Schmidt procedure.
INPUT:
As: List of arrays
OUTPUT:
Qs: List of the Q matrices output by qr_by_gram_schmidt, in the same order as As. For a matrix A of shape n×m, Q should have shape n×m.
Rs: List of the R matrices output by qr_by_gram_schmidt, in the same order as As. For a matrix A of shape n×m, R should have shape m×m
I have written the code for the QR factorization which I believe is correct:
import numpy as np
def qr_by_gram_schmidt(A):
m = np.shape(A)[0]
n = np.shape(A)[1]
Q = np.zeros((m, m))
R = np.zeros((n, n))
for j in xrange(n):
v = A[:,j]
for i in xrange(j):
R[i,j] = Q[:,i].T * A[:,j]
v = v.squeeze() - (R[i,j] * Q[:,i])
R[j,j] = np.linalg.norm(v)
Q[:,j] = (v / R[j,j]).squeeze()
return Q, R
How do I write the loop to calculate the the QR factorization of each of the matrices in As and storing them in that order?
edit: The code has some error too. I will appreciate it if you can help me in debugging it.
Thanks
I didn't check your GS code, but had to make a change (may not be correct!) to make it compile. You just have to set up a list of your matrices, I made 2 of them and then loop through that list and apply your function.
import numpy as np
def gs(A):
m = np.shape(A)[0]
n = np.shape(A)[1]
Q = np.zeros((m, m))
R = np.zeros((n, n))
print m,n,Q,R
for j in xrange(n):
v = A[:,j]
for i in xrange(j):
R[i,j] = np.dot(Q[:,i].T , A[:,j]) # I made an arbitrary change here!!!
v = v.squeeze() - (R[i,j] * Q[:,i])
R[j,j] = np.linalg.norm(v)
Q[:,j] = (v / R[j,j]).squeeze()
return Q, R
As= np.random.rand(2,3,3) # list of 2 (3x3) matrices
print As
for A in As:
print gs(A)
Output:
[[[ 0.9599614 0.02213113 0.43343881]
[ 0.44202415 0.6816688 0.88321052]
[ 0.93098107 0.80528361 0.88473308]]
[[ 0.41794678 0.10762796 0.42110659]
[ 0.89598082 0.81225543 0.52947205]
[ 0.0621515 0.59826789 0.14021332]]]
(array([[ 0.68158915, -0.67980134, 0.27075149],
[ 0.31384477, 0.60583989, 0.73106736],
[ 0.66101262, 0.41331364, -0.626286 ]]), array([[ 1.40841649, 0.76132516, 1.15743793],
[ 0. , 0.73077208, 0.60610414],
[ 0. , 0. , 0.20894464]]))
(array([[ 0.42190511, -0.39510208, 0.81602109],
[ 0.90446656, 0.121136 , -0.40898205],
[ 0.06274013, 0.91061541, 0.40846452]]), array([[ 0.99061796, 0.81760207, 0.66535379],
[ 0. , 0.6006613 , 0.02543844],
[ 0. , 0. , 0.18435946]]))