I've been facing an interesting Python problem. I tried to invert the 3x3 matrix A
[[1 2 3]
[4 5 6]
[7 8 9]]
and then multiply it by the original one: A⁻¹A. Instead of the identity matrix (with all diagonal elements equal to one) I got this one:
[[ 12. 8. 8.]
[-16. -8. 0.]
[ 4. 0. 0.]]
The problem occurs only in this specific case. Matrices with other values give the right results. Here is the code:
import numpy as np
np.set_printoptions(precision=2,suppress=True)
A = np.array([1,2,3,4,5,6,7,8,9])
A = A.reshape(3,3)
print(A)
print(np.linalg.det(A))
print(np.matmul(np.linalg.inv(A),A))
Output:
[[1 2 3]
[4 5 6]
[7 8 9]]
6.66133814775094e-16
[[ 12. 8. 8.]
[-16. -8. 0.]
[ 4. 0. 0.]]
As others have pointed out, a singular matrix is non-invertible, so you get a nonsense answer from A^-1 A.
NumPy includes a handy function to check the condition number:
np.linalg.cond(A)
# 5.0522794445385096e+16
As Wikipedia states, this is a measure of how sensitive the output b in Ax = b is to small changes in the values of A (kind of like a generalized derivative). The large value indicates that A is "ill-conditioned" and can produce unstable results. This is intrinsic to the real-valued matrix but can be worsened by floating-point arithmetic.
cond is more useful than looking at np.linalg.det(A) to know whether your matrix will be well-behaved, because it is not sensitive to the scale of the values in A (whereas the norm and determinant are). As an example, here is a matrix with small values that really has no issue with invertibility:
A = 1e-10*np.random.random(size=(3,3))
np.linalg.det(A)
# 2.128774239739163e-31
# ^^ this looks really bad...
np.linalg.cond(A)
# 8.798791503909136
# nevermind, it's probably ok
A_ident = np.matmul(np.linalg.inv(A), A)
np.linalg.norm(A_ident - np.identity(3))
# 5.392490230798587e-16
# A^(-1)*A is very close to the identity matrix, so A is not ill-conditioned.
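Running the same identity check on the matrix from the question makes the contrast clear (just a quick sketch; the exact numbers depend on your platform's LAPACK):
import numpy as np

A = np.arange(1, 10).reshape(3, 3)   # the matrix from the question
A_ident = np.matmul(np.linalg.inv(A), A)
# The residual is huge instead of ~1e-16: inv(A) @ A is nowhere near
# the identity, because A is numerically singular / ill-conditioned.
print(np.linalg.norm(A_ident - np.identity(3)))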
The determinant of this matrix is mathematically 0, because the third row is a linear combination of the first two (row3 = 2*row2 - row1). Since
import numpy as np
A = np.array([1,2,3,4,5,6,7,8,9])
A = A.reshape(3,3)
# print the determinant
print(np.linalg.det(A))
returns
6.66133814775094e-16
which is zero up to floating-point error, you have a matrix that has no computable inverse.
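You can make the linear dependence explicit with a quick check (a minimal sketch using the same matrix):
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
# The third row equals 2*row2 - row1, so the rows are linearly dependent
# and the matrix is singular.
print(np.allclose(2 * A[1] - A[0], A[2]))   # True
print(np.linalg.matrix_rank(A))             # 2, not 3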
Your matrix is not invertible; see e.g. Wolfram Alpha, which says that the matrix is singular.
You may be misled by the fact that Python printed a nonzero value for the determinant (6.66133814775094e-16); however, this value is so close to 0 that you should treat it as zero. The operations that computers perform on floating-point numbers are usually not exact (see e.g. the question Why are floating point numbers inaccurate?), which is why the computed determinant ends up close to zero but not exactly zero.
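In practice, a tolerance-based comparison expresses this "treat it as zero" advice directly; a minimal sketch:
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
det = np.linalg.det(A)
# Compare against zero with a tolerance instead of ==, since the computed
# determinant carries floating-point error.
print(np.isclose(det, 0.0))   # True: treat the matrix as singular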
Related
I want to create a random array with a "norm" of 1 and a "mean" of 0. I can use numpy.random.normal() to get the "mean" I want, but how can I create an array such that numpy.linalg.norm(array) returns the number that I want?
Using numpy.random.normal with the size argument will give you an array with values that are drawn from a distribution with a mean of 0. The mean value of the array itself will not be 0, however (though it will tend to be closer to 0 the larger the array is).
But you can easily fix that by subtracting the mean of the array.
Once you have this, you can change the norm of the array to 1 by dividing by its norm. This will not change the mean, because it is 0.
import numpy

def create(n):
    x = numpy.random.normal(size=n)
    x -= x.mean()
    return x / numpy.linalg.norm(x)
Example
>>> a = create(10)
>>> a
array([-0.48299539, 0.06017975, 0.23788747, -0.31949065, 0.56126426,
-0.33117035, 0.40908645, 0.01169836, -0.1008337 , -0.0456262 ])
>>> a.mean()
-1.3183898417423733e-17 # not exactly 0 due to floating-point math
>>> numpy.linalg.norm(a)
1.0
Notice that for n=2 there are exactly 2 arrays satisfying these conditions: those that contain the positive and the negative square root of 1/2, in either order:
>>> for _ in range(5):
... print(create(2))
...
[-0.70710678 0.70710678]
[-0.70710678 0.70710678]
[-0.70710678 0.70710678]
[-0.70710678 0.70710678]
[ 0.70710678 -0.70710678]
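(For n=2 the constraints pin the values down exactly: x1 + x2 = 0 forces x2 = -x1, and x1² + x2² = 1 then gives 2·x1² = 1, so x1 = ±1/√2 ≈ ±0.70710678, which is exactly what the output above shows.)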
Here are the two vectors:
X = [1,3,4,5]
Y = [2,6,2,2]
np.cov(X,Y)
The output is
array([[ 2.91666667, -0.33333333],
       [-0.33333333,  4.        ]])
Here is the calculation by hand:
mu(X) = (1+3+4+5)/4 = 3.25
mu(Y) = (2+6+2+2)/4 = 3
var[X]   = E[(X-mu(X))(X-mu(X))] = 8.75
var[Y]   = E[(Y-mu(Y))(Y-mu(Y))] = 12
cov[X,Y] = E[(X-mu(X))(Y-mu(Y))] = -1
cov[Y,X] = E[(Y-mu(Y))(X-mu(X))] = -1
so the result is
array([[8.75, -1],
       [-1,   12]])
Notice that the result calculated by hand is just 3 times the array obtained from np.cov(X,Y). My question is: why are the two matrices different, and does the 3 mean anything here, or is it just a coincidence?
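For what it's worth, a minimal sketch of where the factor comes from: by default np.cov normalizes by N-1 (here 4-1 = 3, the unbiased estimate), while the hand calculation above sums the products of deviations without dividing:
import numpy as np

X = np.array([1, 3, 4, 5])
Y = np.array([2, 6, 2, 2])
dx = X - X.mean()
dy = Y - Y.mean()

# Raw sums of products of deviations (the hand-calculated values).
print(np.sum(dx * dx), np.sum(dy * dy), np.sum(dx * dy))   # 8.75 12.0 -1.0

# np.cov divides these sums by N-1 = 3 by default.
print(np.cov(X, Y))   # matches the array shown above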
I'm trying to get into Singular Value Decomposition (SVD). I've found this YouTube Lecture that contains an example. However, when I try this example in numpy I'm getting "kind of" different results. In this example the input matrix is
A = [ [1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0], [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2] ]
A = np.asarray(A)
print(A)
[[1 1 1 0 0]
[3 3 3 0 0]
[4 4 4 0 0]
[5 5 5 0 0]
[0 2 0 4 4]
[0 0 0 5 5]
[0 1 0 2 2]]
The rank of this matrix is 3 (np.linalg.matrix_rank(A)). The lecture states that the number of singular values is the rank of the matrix, and in the example the Sigma matrix S is indeed of size 3x3. However, when I perform
U, S, V = np.linalg.svd(A)
matrix S contains 5 values. On the other hand, the first 3 values match the ones in the example, and the other 2 are basically 0. Can I assume that I get more singular values than the rank because of the numerical algorithm behind SVD and the finite representation of real numbers on computers, or something along those lines?
As mentioned on this page, numpy internally uses the LAPACK routine _gesdd to compute the SVD. Now, if you look at the _gesdd documentation, it mentions:
To find the SVD of a general matrix A, call the LAPACK routine ?gebrd
or ?gbbrd for reducing A to a bidiagonal matrix B by a unitary
(orthogonal) transformation: A = Q B P^H. Then call ?bdsqr, which forms
the SVD of a bidiagonal matrix: B = U1 Σ V1^H.
So, there are 2 steps involved here:
Bidiagonalization by an orthogonal transformation (Householder transformations)
Getting the SVD of the bidiagonal matrix, using the implicit zero-shift QR algorithm.
The QR algorithm is iterative, meaning you don't get an "exact" answer, but rather better and better approximations with each iteration, stopping once the change in values falls below a threshold, so it is "approximate" in that sense.
Thus, along with the numerical inaccuracies due to the finite machine representation of reals, even if we had infinite representational capacity we would still get "approximate" results (if we ran the algorithm for finite time) because of the iterative nature of the algorithm.
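If you just want the rank-sized decomposition, you can drop the singular values below a tolerance; here is a minimal sketch (the tolerance formula mirrors what numpy's matrix_rank documents as its default, used here as an assumption):
import numpy as np

A = np.array([[1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0],
              [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2]], dtype=float)
U, S, Vt = np.linalg.svd(A)

tol = S.max() * max(A.shape) * np.finfo(S.dtype).eps   # matrix_rank-style tolerance
r = int((S > tol).sum())                               # numerical rank: 3

# Keep only the r significant singular values/vectors.
A_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r, :]
print(r, np.allclose(A, A_r))                          # 3 True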
Is there a numerically stable way to compute the softmax function below?
I am getting values that become NaNs in my neural network code.
np.exp(x)/np.sum(np.exp(x))
The softmax exp(x)/sum(exp(x)) is actually numerically well-behaved. It has only positive terms, so we needn't worry about loss of significance, and the denominator is at least as large as the numerator, so the result is guaranteed to fall between 0 and 1.
The only accident that might happen is over- or under-flow in the exponentials. Overflow of a single element or underflow of all elements of x will render the output more or less useless.
But it is easy to guard against that by using the identity softmax(x) = softmax(x + c), which holds for any scalar c: subtracting max(x) from x leaves a vector that has only non-positive entries, ruling out overflow, and at least one element that is zero, ruling out a vanishing denominator (underflow in some but not all entries is harmless).
Footnote: theoretically, catastrophic accidents in the sum are possible, but you'd need a ridiculous number of terms. For example, even using 16-bit floats, which can only resolve 3 decimals (compared to the 15 decimals of a "normal" 64-bit float), we'd need between 2^1431 (~6 x 10^430) and 2^1432 terms to get a sum that is off by a factor of two.
The softmax function is prone to two issues: overflow and underflow.
Overflow: occurs when very large numbers are approximated as infinity.
Underflow: occurs when very small numbers (near zero on the number line) are approximated (i.e. rounded) to zero.
To combat these issues when doing softmax computation, a common trick is to shift the input vector by subtracting the maximum element in it from all elements. For the input vector x, define z such that:
z = x-max(x)
And then take the softmax of the new (stable) vector z
Example:
import numpy as np

def stable_softmax(x):
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    z = x - max(x)
    numerator = np.exp(z)
    denominator = np.sum(numerator)
    softmax = numerator / denominator
    return softmax
# input vector
In [267]: vec = np.array([1, 2, 3, 4, 5])
In [268]: stable_softmax(vec)
Out[268]: array([ 0.01165623, 0.03168492, 0.08612854, 0.23412166, 0.63640865])
# input vector with really large number, prone to overflow issue
In [269]: vec = np.array([12345, 67890, 99999999])
In [270]: stable_softmax(vec)
Out[270]: array([ 0., 0., 1.])
In the above case, we safely avoided the overflow problem by using stable_softmax().
For more details, see the Numerical Computation chapter in the Deep Learning book.
Extending @kmario23's answer to support 1- or 2-dimensional numpy arrays or lists. 2D tensors (assuming the first dimension is the batch dimension) are common if you're passing a batch of results through softmax:
import numpy as np
def stable_softmax(x):
    z = x - np.max(x, axis=-1, keepdims=True)
    numerator = np.exp(z)
    denominator = np.sum(numerator, axis=-1, keepdims=True)
    softmax = numerator / denominator
    return softmax
test1 = np.array([12345, 67890, 99999999]) # 1D numpy
test2 = np.array([[12345, 67890, 99999999],   # 2D numpy
                  [123, 678, 88888888]])
test3 = [12345, 67890, 999999999] # 1D list
test4 = [[12345, 67890, 999999999]] # 2D list
print(stable_softmax(test1))
print(stable_softmax(test2))
print(stable_softmax(test3))
print(stable_softmax(test4))
[0. 0. 1.]
[[0. 0. 1.]
[0. 0. 1.]]
[0. 0. 1.]
[[0. 0. 1.]]
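For reference, SciPy also ships a numerically stable implementation based on the same max-shift trick, so you can call that directly instead (shown only as an alternative; assumes scipy >= 1.2, where scipy.special.softmax was added):
import numpy as np
from scipy.special import softmax

print(softmax(np.array([12345, 67890, 99999999])))           # [0. 0. 1.]
print(softmax(np.array([[12345, 67890, 99999999],
                        [123, 678, 88888888]]), axis=-1))    # row-wise softmax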
There is nothing wrong with calculating the softmax function as you do in your case. The problem seems to come from exploding gradients or similar issues with your training method. Focus on those matters, either by clipping gradient values or by choosing the right initial distribution of weights.
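A minimal sketch of the "clipping values" idea, assuming plain numpy gradients (the function name and threshold here are just illustrative choices):
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    # Rescale the gradient if its L2 norm exceeds max_norm, keeping its
    # direction but limiting its magnitude so updates cannot explode.
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([300.0, -400.0])
print(clip_by_norm(g))   # [ 3. -4.]  (norm reduced from 500 to 5)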
I am trying to find an appropriate function for the permeability of cells under varying conditions. If I assume constant permeability, I can fit it to the experimental data and use sklearn's PolynomialFeatures together with a linear model (as explained in this post) to determine a correlation between the conditions and the permeability. However, the permeability is not constant, and now I am trying to fit my model with the permeability as a function of the process conditions. The PolynomialFeatures module of sklearn is quite nice to use.
Is there an equivalent function within scipy or numpy which allows me to create a polynomial model (including interaction terms, e.g. a*x[0]*x[1], etc.) of varying order without writing the whole function by hand?
The standard polynomial class in numpy seems not to support interaction terms.
I'm not aware of such a function that does exactly what you need, but you can achieve it using a combination of itertools and numpy.
If you have n_features predictor variables, you essentially must generate all vectors of length n_features whose entries are non-negative integers and sum to the specified order. Each new feature column is then the product of the old features raised component-wise to the powers in one of these vectors.
For example, if order = 3 and n_features = 2, one of the new features will be the old features raised to the respective powers [2,1]. I've written some code below for arbitrary order and number of features. I've adapted the generation of the vectors that sum to order from this post.
import itertools
import numpy as np
from scipy.special import binom

def polynomial_features_with_cross_terms(X, order):
    """
    X: numpy ndarray
        Matrix of shape `(n_samples, n_features)` to be transformed.
    order: integer, default 2
        Order of polynomial features to be computed.

    returns: T, powers.
        `T` is a matrix of shape `(n_samples, n_poly_features)`.
        Note that `n_poly_features` is equal to:
            `n_features+order-1` Choose `n_features-1`
        See: https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29#Theorem_two
        `powers` is a matrix of shape `(n_features, n_poly_features)`.
        Each column specifies the power by row of the respective feature,
        in the respective column of `T`.
    """
    n_samples, n_features = X.shape
    n_poly_features = int(binom(n_features+order-1, n_features-1))
    powers = np.zeros((n_features, n_poly_features))
    T = np.zeros((n_samples, n_poly_features), dtype=X.dtype)

    combos = itertools.combinations(range(n_features+order-1), n_features-1)
    for i, c in enumerate(combos):
        powers[:, i] = np.array([
            b-a-1 for a, b in zip((-1,)+c, c+(n_features+order-1,))
        ])
        T[:, i] = np.prod(np.power(X, powers[:, i]), axis=1)
    return T, powers
Here's some example usage:
>>> X = np.arange(-5,5).reshape(5,2)
>>> T,p = polynomial_features_with_cross_terms(X, order=3)
>>> print(X)
[[-5 -4]
[-3 -2]
[-1 0]
[ 1 2]
[ 3 4]]
>>> print(p)
[[ 0. 1. 2. 3.]
[ 3. 2. 1. 0.]]
>>> print(T)
[[ -64 -80 -100 -125]
[ -8 -12 -18 -27]
[ 0 0 0 -1]
[ 8 4 2 1]
[ 64 48 36 27]]
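As a quick sanity check on the example above (just reusing the T and p already computed there): every column of powers sums to the requested order, and the number of generated features matches the stars-and-bars count binom(n_features+order-1, n_features-1):
print(p.sum(axis=0))   # [3. 3. 3. 3.] -- each column's powers sum to order=3
print(T.shape)         # (5, 4): binom(2+3-1, 2-1) = 4 features for 5 samples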
Finally, I should mention that the SVM polynomial kernel achieves exactly this effect without explicitly computing the polynomial map. There are of course pros and cons to this, but I figured I should mention it for you to consider if you haven't yet.