In the following code I have been able to:
1. Implement Gaussian elimination with no pivoting for a general square linear system.
2. Test it by solving Ax=b, where A is a random 100x100 matrix and b is a random 100x1 vector.
3. Compare my solution against the solution obtained using numpy.linalg.solve.
However, for the final task I need to compute the infinity norm of the difference between the two solutions. I know the infinity norm of a matrix is the greatest absolute row sum, but how do I apply that here to compute the infinity norm of the difference between my solution and the numpy.linalg.solve one? Looking for some help with this!
import numpy as np
def GENP(A, b):
    '''
    Gaussian elimination with no pivoting.
    % input: A is an n x n nonsingular matrix
    %        b is an n x 1 vector
    % output: x is the solution of Ax=b.
    % post-condition: A and b have been modified.
    '''
    n = len(A)
    if b.size != n:
        raise ValueError("Invalid argument: incompatible sizes between A & b.", b.size, n)
    for pivot_row in range(n-1):
        for row in range(pivot_row+1, n):
            multiplier = A[row][pivot_row]/A[pivot_row][pivot_row]
            # the only one in this column since the rest are zero
            A[row][pivot_row] = multiplier
            for col in range(pivot_row + 1, n):
                A[row][col] = A[row][col] - multiplier*A[pivot_row][col]
            # Equation solution column
            b[row] = b[row] - multiplier*b[pivot_row]
    x = np.zeros(n)
    k = n-1
    x[k] = b[k]/A[k,k]
    while k >= 0:
        x[k] = (b[k] - np.dot(A[k,k+1:], x[k+1:]))/A[k,k]
        k = k-1
    return x
if __name__ == "__main__":
    A = np.round(np.random.rand(100, 100)*10)
    b = np.round(np.random.rand(100)*10)
    print(GENP(np.copy(A), np.copy(b)))
For example, this code gives the following output for task 1 listed above:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
Then for task 2 my code is the following:
my_solution = GENP(np.copy(A), np.copy(b))
numpy_solution = np.linalg.solve(A, b)
print(numpy_solution)
resulting in:
[-6.61537666 0.95704368 1.30101768 -3.69577873 -2.51427519 -4.56927017
-1.61201589 2.88242622 1.67836096 2.18145556 2.60831672 0.08055869
-2.39347903 2.19672137 -0.91609732 -1.17994959 -3.87309152 -2.53330865
5.97476318 3.74687301 5.38585146 -2.71597978 2.0034079 -0.35045844
0.43988439 -2.2623829 -1.82137544 3.20545721 -4.98871738 -6.94378666
-6.5076601 3.28448129 3.42318453 -1.63900434 4.70352047 -4.12289961
-0.79514656 3.09744616 2.96397264 2.60408589 2.38707091 8.72909353
-1.33584905 1.30879264 -0.28008339 0.93560728 -1.40591226 1.31004142
-1.43422946 0.41875924 3.28412668 3.82169545 1.96675247 2.76094378
-0.90069455 1.3641636 -0.60520103 3.4814196 -1.43076816 5.01222382
0.19160657 2.23163261 2.42183726 -0.52941262 -7.35597457 -3.41685057
-0.24359225 -5.33856181 -1.41741354 -0.35654736 -1.71158503 -2.24469314
-3.26453092 1.0932765 1.58333208 0.15567584 0.02793548 1.59561909
0.31732915 -1.00695954 3.41663177 -4.06869021 3.74388762 -0.82868155
1.49789582 -1.63559124 0.2741194 -1.11709237 1.97177449 0.66410154
0.48397714 -1.96241854 0.34975886 1.3317751 2.25763568 -6.80055066
-0.65903682 -1.07105965 -0.40211347 -0.30507635]
Finally, for task 3:
if np.allclose(my_solution, numpy_solution):
    print("These solutions agree")
else:
    print("These solutions do not agree")
resulting in:
These solutions agree
If what you want is only the infinity norm of a matrix,
it generally should look something like this:

def inf_norm(matrix):
    return max(abs(row).sum() for row in matrix)

But since your my_solution and numpy_solution are just 1-D vectors, you
can either reshape them (to 100x1, which is what you have in your
example) for use with the above function:
Alternative 1:

def inf_norm(matrix):
    return max(abs(row).sum() for row in matrix)

diff = my_solution - numpy_solution
inf_norm_result = inf_norm(diff.reshape((100, 1)))
Alternative 2: or, if you know they will always be 1-D vectors, you can omit the sum
(because each row will have length 1) and compute it directly:
abs(my_solution - numpy_solution).max()
Alternative 3: or, following the formula given in the numpy.linalg.norm documentation (see below)
for the matrix infinity norm, applied to the reshaped 2-D difference:

max(np.abs(diff.reshape((100, 1))).sum(axis=1))
Alternative 4: or use numpy.linalg.norm() directly (see: https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.linalg.norm.html):
np.linalg.norm(my_solution - numpy_solution, np.inf)
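Putting that together for your case, a minimal sketch (reusing your existing my_solution and numpy_solution) would be:

diff = my_solution - numpy_solution
# for a 1-D vector this is the same as np.abs(diff).max()
print(np.linalg.norm(diff, np.inf))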
Hi, let's say there are two NumPy 2D arrays A and B with the same shape.
I am trying to take the elements of A at the positions where the corresponding element of B equals x, and compute the horizontal (row-wise) sum of those elements of A.
I tried the two following ways, but they were both slow (it is a big array).
Can somebody advise me on a faster way? Thank you.
First
farr = np.stack([np.where(rarr == d, varr, 0).sum(axis=1) for d in digits])
Second
from numba import njit

@njit
def final_array(varr, rarr, digits):
    farr = np.zeros((len(varr), len(digits)))
    for i, d in enumerate(digits):
        farr[:, i] = np.where(rarr == d, varr, 0).sum(axis=1)
    return farr

farr = final_array(varr, rarr, digits)
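To make the intended result concrete, here is a tiny made-up example of what the first snippet computes, plus one possible loop-free variant (only a sketch, not benchmarked):

varr = np.array([[1., 2., 3.],
                 [4., 5., 6.]])
rarr = np.array([[0, 1, 0],
                 [1, 1, 2]])
digits = [0, 1, 2]

# For each digit d, sum varr row-wise over the positions where rarr equals d.
farr = np.stack([np.where(rarr == d, varr, 0).sum(axis=1) for d in digits])
print(farr)  # shape (len(digits), n_rows): [[4. 0.] [2. 9.] [0. 6.]]

# Possible loop-free variant (may use a lot of memory for large arrays):
mask = rarr == np.asarray(digits)[:, None, None]  # shape (len(digits),) + rarr.shape
farr2 = (mask * varr).sum(axis=-1)
print(np.allclose(farr, farr2))  # True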
I have the following polynomial equation that I would like to find the local minima and maxima for.
I defined the function as follows. It uses a flatten function to flatten the nested list; I'll include it for testing purposes (found here: http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html).
flatten list
from itertools import combinations
import math

def flatten(l, ltypes=(list, tuple)):
    ltype = type(l)
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        i += 1
    return ltype(l)
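For reference, a quick check of what this flatten helper does (my own example):

print(flatten([1, [2, 3], (4, [5])]))  # [1, 2, 3, 4, 5]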
my polynomial
def poly(coefficients, factors):
    # quadratic terms
    constant = 1
    singles = factors
    products = [math.prod(c) for c in combinations(factors, 2)]
    squares = [f**2 for f in factors]
    sequence = flatten([constant, singles, products, squares])
    z = sum([math.prod(i) for i in zip(coefficients, sequence)])
    return z
The arguments it takes are a list of coefficients, for example:
coefs = [12.19764959, -1.8233151, 2.50952816,-1.56344375, 1.00003828, -1.72128301, -2.54254877, -1.20377309, 5.53510616, 2.94755653, 4.83759279, -0.85507208, -0.48007208, -3.70507208, -0.27007208]
And a list of factor or variable values:
factors = [0.4714, 0.4714, -0.4714, 0.4714]
Plug these in and it calculates the result of the polynomial. The reason I wrote it like this is that the number of variables (factors) changes from fit to fit, so I wanted to keep it flexible. I now want to find the combination of "factors" values within a certain range (let's say between -1 and 1) where the function reaches its maximum and minimum values. If the function were hard-coded I could use scipy.optimize, but I can't figure out how to make it work as is.
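For reference, this is how the function is called with the example values above (a minimal check of my own):

z = poly(coefs, factors)  # evaluates the polynomial at the given factor values
print(z)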
Another option is a brute force grid search (which I use at the moment), but it's very slow as soon as you have more than 2 variables, especially with small step sizes. There may be no true minima/maxima where slope == 0 within the bounds, but as long as I can get the maximum and minimum values that is OK.
Ok, I figured it out. It was two really silly things:
1. The order of the arguments in the function had to be reversed, so that the first argument (the one I wanted to optimize over) was the "factors", i.e. the X values, followed by the coefficients. That way an array of the same size could be used as x0 and the coefficients could be passed in as args.
2. That wasn't enough, as the function would return an array if an array was the input. I just added factors = list(factors) to the function itself to put it into the correct shape.
The new function:
def poly(factors, coefficients):
    factors = list(factors)
    # quadratic terms
    constant = 1
    singles = factors
    products = [math.prod(c) for c in combinations(factors, 2)]
    squares = [f**2 for f in factors]
    sequence = flatten([constant, singles, products, squares])
    z = sum([math.prod(i) for i in zip(coefficients, sequence)])
    return z
And the optimization:
coefs = [4.08050532, -0.47042713, -0.08200181, -0.54184481, -0.18515675,
-0.96751856, -1.10814625, -1.7831592, 5.2763512, 2.83505438, 4.7082153,
0.22988773, 1.06488773, -0.70011227, 1.42988773]
x0 = [0.1, 0.1, 0.1, 0.1]
from scipy.optimize import minimize  # needed import, not shown above

minimize(poly, x0=x0, args=coefs, bounds=((-1, 1), (-1, 1), (-1, 1), (-1, 1)))
Which returns:
fun: -1.6736636102536673
hess_inv: <4x4 LbfgsInvHessProduct with dtype=float64>
jac: array([-2.10611305e-01, 2.19138777e+00, -8.16990766e+00, -1.11022302e-07])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
nfev: 85
nit: 12
njev: 17
status: 0
success: True
x: array([1., -1.,1., 0.03327357])
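The bounds above are hard-coded for four factors; since the number of factors changes from fit to fit, something like this (my own sketch, using the reordered poly from above) keeps it flexible and also gives the maximum by minimizing the negated polynomial:

n_factors = 4                      # however many factors the current fit has
x0 = [0.1] * n_factors
bounds = [(-1, 1)] * n_factors
res_min = minimize(poly, x0=x0, args=(coefs,), bounds=bounds)
# For the maximum, minimize the negated polynomial:
res_max = minimize(lambda f, c: -poly(f, c), x0=x0, args=(coefs,), bounds=bounds)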
Consider the following problem: Given a set of n intervals and a set of m floating-point numbers, determine, for each floating-point number, the subset of intervals that contain the floating-point number.
This problem has been addressed by constructing an interval tree (also called a range tree or segment tree). Implementations exist for the one-dimensional case, e.g. Python's intervaltree package. Usually, these implementations consider only one or a few floating-point numbers, i.e. a small "m" above.
In my problem setting, both n and m are extremely large (they come from an image processing problem). Further, I need to consider N-dimensional intervals (called cuboids when N=3; I was modeling human brains with the Finite Element Method). I have implemented a simple N-dimensional interval tree in Python, but it runs in a loop and can only take one floating-point number at a time. Can anyone help improve the implementation in terms of efficiency? You can change the data structure freely.
import sys
import time
import numpy as np

# find the indices i satisfying a[i] < x in one dimension
def find_index_smaller(a, x):
    idx = np.argsort(a)
    ss = np.searchsorted(a, x, sorter=idx)
    res = idx[0:ss]
    return res

# find the indices i satisfying a[i] > x in one dimension
def find_index_larger(a, x):
    return find_index_smaller(-a, -x)

# find the indices satisfying amin < x < amax in one dimension
def find_intv_at(amin, amax, x):
    idx = find_index_smaller(amin, x)
    idx2 = find_index_larger(amax[idx], x)
    res = idx[idx2]
    return res

# find the indices satisfying amin < x < amax in N dimensions
def find_intv_at_nd(amin, amax, x):
    dim = amin.shape[0]
    res = np.arange(amin.shape[-1])
    for i in range(dim):
        idx = find_intv_at(amin[i, res], amax[i, res], x[i])
        res = res[idx]
    return res
I also have two test examples for sanity check and performance testing:
def demo1():
    print("By default, we do a correctness test")
    n_intv = 2
    n_point = 2
    # generate the test data
    point = np.random.rand(3, n_point)
    intv_min = np.random.rand(3, n_intv)
    intv_max = intv_min + np.random.rand(3, n_intv)*8
    print("point")
    print(point)
    print("intv_min")
    print(intv_min)
    print("intv_max")
    print(intv_max)
    print("===Indexes of intervals that contain the point===")
    for i in range(n_point):
        print(find_intv_at_nd(intv_min, intv_max, point[:, i]))
def demo2():
    print("Performance:")
    n_points = 100
    n_intv = 1000000
    # generate the test data
    points = np.random.rand(n_points, 3)*512
    intv_min = np.random.rand(3, n_intv)*512
    intv_max = intv_min + np.random.rand(3, n_intv)*8
    print("point.shape = " + str(points.shape))
    print("intv_min.shape = " + str(intv_min.shape))
    print("intv_max.shape = " + str(intv_max.shape))
    starttime = time.time()
    for point in points:
        tmp = find_intv_at_nd(intv_min, intv_max, point)
    print("it took this long to run {} points, with {} intervals: {}".format(n_points, n_intv, time.time() - starttime))
My ideas would be:
1. Remove np.argsort() from the algorithm: the intervals do not change, so the sorting could be done once in pre-processing (a rough sketch of this is below).
2. Vectorize over x. The algorithm currently runs a loop for each x; it would be nice to get rid of that loop.
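A rough, untested sketch of the first idea, where the argsort is paid only once per dimension in pre-processing:

# Pre-processing: sort each array once and keep its ordering.
def presort(a):
    order = np.argsort(a)
    return a[order], order

# Query: indices i with a[i] < x, reusing the precomputed ordering.
def find_index_smaller_presorted(a_sorted, order, x):
    ss = np.searchsorted(a_sorted, x)
    return order[:ss]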
Any contribution would be appreciated.
TL;DR
I want to replicate the functionality of numpy.matmul in theano. What's the best way to do this?
Too Short; Didn't Understand
Looking at theano.tensor.dot and theano.tensor.tensordot, I'm not seeing an easy way to do a straightforward batch matrix multiplication, i.e. treat the last two dimensions of N-dimensional tensors as matrices and multiply them. Do I need to resort to some goofy usage of theano.tensor.batched_dot? Or *shudder* loop them myself without broadcasting!?
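For reference, this is the NumPy behaviour I mean (the shapes are just an example):

import numpy as np
a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)
# The leading dimension acts as a batch; the last two dimensions are multiplied as matrices.
print(np.matmul(a, b).shape)  # (2, 3, 5)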
The current pull requests don't support broadcasting, so I came up with this for now. I may clean it up, add a little more functionality, and submit my own PR as a temporary solution. Until then, I hope this helps someone!
I included the test to show it replicates numpy.matmul, given that the input complies with my more strict (temporary) assertions.
Also, theano.scan stops iterating the sequences after min(sequence lengths) iterations, so I believe that mismatched array shapes won't raise any exceptions.
import theano as th
import theano.tensor as tt
import numpy as np

def matmul(a: tt.TensorType, b: tt.TensorType, _left=False):
    """Replicates the functionality of numpy.matmul, except that
    the two tensors must have the same number of dimensions, and their ndim must exceed 1."""
    # TODO ensure that broadcastability is maintained if both a and b are broadcastable on a dim.
    assert a.ndim == b.ndim  # TODO support broadcasting for differing ndims.
    ndim = a.ndim
    assert ndim >= 2
    # If we should left multiply, just swap references.
    if _left:
        tmp = a
        a = b
        b = tmp
    # If a and b are 2 dimensional, compute their matrix product.
    if ndim == 2:
        return tt.dot(a, b)
    # If they are larger...
    else:
        # If a is broadcastable but b is not.
        if a.broadcastable[0] and not b.broadcastable[0]:
            # Scan b, but hold a steady.
            # Because b will be passed in as a, we need to left multiply to maintain
            # matrix orientation.
            output, _ = th.scan(matmul, sequences=[b], non_sequences=[a[0], 1])
        # If b is broadcastable but a is not.
        elif b.broadcastable[0] and not a.broadcastable[0]:
            # Scan a, but hold b steady.
            output, _ = th.scan(matmul, sequences=[a], non_sequences=[b[0]])
        # If neither dimension is broadcastable or they both are.
        else:
            # Scan through the sequences, assuming the shape for this dimension is equal.
            output, _ = th.scan(matmul, sequences=[a, b])
        return output

def matmul_test() -> bool:
    vlist = []
    flist = []
    ndlist = []
    for i in range(2, 30):
        dims = int(np.random.random() * 4 + 2)
        # Create a tuple of tensors with potentially different broadcastability.
        vs = tuple(
            tt.TensorVariable(
                tt.TensorType('float64',
                              tuple((p < .3) for p in np.random.ranf(dims-2))
                              # Make full matrices
                              + (False, False)
                              )
            )
            for _ in range(2)
        )
        vs = tuple(tt.swapaxes(v, -2, -1) if j % 2 == 0 else v for j, v in enumerate(vs))
        f = th.function([*vs], [matmul(*vs)])
        # Create the default shape for the test ndarrays
        defshape = tuple(int(np.random.random() * 5 + 1) for _ in range(dims))
        # Create a test array matching the broadcastability of each v, for each v.
        nds = tuple(
            np.random.ranf(
                tuple(s if not v.broadcastable[j] else 1 for j, s in enumerate(defshape))
            )
            for v in vs
        )
        nds = tuple(np.swapaxes(nd, -2, -1) if j % 2 == 0 else nd for j, nd in enumerate(nds))
        ndlist.append(nds)
        vlist.append(vs)
        flist.append(f)
    for i in range(len(ndlist)):
        assert np.allclose(flist[i](*ndlist[i]), np.matmul(*ndlist[i]))
    return True

if __name__ == "__main__":
    print("matmul_test -> " + str(matmul_test()))
I'm using NumPy to store data into matrices.
I'm struggling to make the below Python code perform better.
RESULT is the data store I want to put the data into.
import numpy as np

TMP = np.array([[1,1,0],[0,0,1],[1,0,0],[0,1,1]])
n_row, n_col = TMP.shape[0], TMP.shape[0]
RESULT = np.zeros((n_row, n_col))
def do_something(array1, array2):
    intersect_num = np.bitwise_and(array1, array2).sum()
    union_num = np.bitwise_or(array1, array2).sum()
    try:
        return intersect_num / float(union_num)
    except ZeroDivisionError:
        return 0

for i in range(n_row):
    for j in range(n_col):
        if i >= j:
            continue
        RESULT[i, j] = do_something(TMP[i], TMP[j])
I guess it would be much faster if I could use some NumPy built-in function instead of for-loops.
I was looking at various questions around here, but I couldn't find one that fits my problem.
Any suggestions? Thanks in advance!
Approach #1
You could do something like this as a vectorized solution -
# Store number of rows in TMP as a parameter
N = TMP.shape[0]
# Get the indices that would be used as row indices to select rows off TMP and
# also as row,column indices for setting output array. These basically correspond
# to the iterators involved in the loopy implementation
R,C = np.triu_indices(N,1)
# Calculate intersect_num, union_num and division results across all iterations
I = np.bitwise_and(TMP[R],TMP[C]).sum(-1)
U = np.bitwise_or(TMP[R],TMP[C]).sum(-1)
vals = np.true_divide(I,U)
# Setup output array and assign vals into it
out = np.zeros((N, N))
out[R,C] = vals
Approach #2
For cases where TMP holds only 1s and 0s, np.bitwise_and and np.bitwise_or can be replaced with dot products, which could be faster. With those, we would have an implementation like so -
M = TMP.shape[1]
I = TMP.dot(TMP.T)
TMP_inv = 1-TMP
U = M - TMP_inv.dot(TMP_inv.T)
out = np.triu(np.true_divide(I,U),1)
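As a quick sanity check (my own addition, assuming the loopy RESULT from the question has already been computed), the vectorized out from either approach should match it:

print(np.allclose(out, RESULT))  # expected: True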