Numpy has the following methods:
allclose() - Assuming identical shape of the arrays and a tolerance for the comparison of values
array_equal() - Checking both the shape and the element values, no tolerance (values have to be exactly equal)
I can't seem to find any difference between them. Any examples?
np.allclose is designed to be used with arrays of floating point numbers. Floating point calculations have inherent precision loss, so you can often find yourself with numbers which should be equal but differ by a very tiny amount.
On the other hand, np.array_equal is intended for exact data such as integer arrays: it checks both the shape and exact element-wise equality, with no tolerance.
Consider the following example, which generates an array of 1000 floating point numbers, divides it by 1.5, and then multiplies it by 1.5. Due to precision loss the arrays are no longer exactly equal, but they are still close within a very small tolerance.
arr = np.random.rand(1000)
arr2 = arr / 1.5
arr2 = arr2 * 1.5
print(np.array_equal(arr, arr2))
# False
print(np.allclose(arr, arr2, atol=1e-16, rtol=1e-16))
# True
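For completeness, here is a small sketch (with made-up arrays) showing the exact check of np.array_equal next to the tolerance-based check of np.allclose, whose documented elementwise test is absolute(a - b) <= atol + rtol * absolute(b):

import numpy as np

a = np.array([1, 2, 3])
print(np.array_equal(a, a.copy()))   # True: same shape, exactly equal integers

x = np.array([1.0, 2.0])
y = np.array([1.0, 2.0 + 1e-9])
print(np.array_equal(x, y))          # False: the tiny difference breaks exact equality
print(np.allclose(x, y))             # True: 1e-9 <= atol (1e-08) + rtol (1e-05) * 2.0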
I have a large vector and I want to compute its element-wise inverse, each time with a small perturbation. For example, for large N, I have y and its element-wise inverse y_inv:
y = np.random.rand(1, N)
y_inv = 1. / y
y_inv[y_inv == np.inf] = 0 # y may contain zeros
and I want to compute
p = 0.01
z = y + p
z_inv = 1. / z  # = 1. / (y + p)
multiple times for different p and fixed y. Because N is very large, is there an efficient way, or an approximation, to reuse the already computed y_inv in the computation of z_inv? Any suggestions for speeding up the inversion of z are highly appreciated.
Floating-point division is slow, especially for double-precision numbers. Single precision is faster (with a relative error likely below 2.5e-07), and it can be made even faster if you do not need high precision by computing an approximate reciprocal (whose maximum relative error is less than 3.66e-4 on x86-64 processors).
Assuming it is acceptable for you to reduce the precision of the results, here is the throughput of the different approaches on a Skylake processor (assuming Numpy is compiled with support for the AVX SIMD instruction set):
Double-precision division: ~8 cycles/instruction per core
Single-precision division: ~5 cycles/instruction per core
Single-precision approximate reciprocal: ~1 cycle/instruction per core
(These are reciprocal throughputs of the AVX instructions, so lower is faster.)
You can easily switch to single precision with Numpy by specifying the dtype of the arrays. For the approximate reciprocal, you need to use Numba (or Cython) configured to use fast-math (the fastmath flag of njit).
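For illustration, here is a rough sketch (my own example, not a benchmarked implementation) of a single-precision Numba kernel with fastmath enabled, which allows the compiler to use approximate reciprocal math:

import numpy as np
import numba as nb

@nb.njit(fastmath=True)  # fastmath permits approximate/relaxed floating-point rules
def reciprocal_f32(y, p):
    out = np.empty(y.size, dtype=np.float32)
    for i in range(y.size):
        out[i] = np.float32(1.0) / (y[i] + p)
    return out

y32 = np.random.rand(10_000_000).astype(np.float32)
z_inv = reciprocal_f32(y32, np.float32(0.01))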
Another solution is simply to execute this operation using multiple threads. This can easily be done with Numba by passing parallel=True to njit and iterating with nb.prange. This solution can be combined with the previous one (about precision), resulting in a much faster computation than the initial double-precision Numpy code.
Moreover, computing arrays in place should be faster (especially when using multiple threads). Alternatively, you can preallocate the output array and reuse it (slower than the in-place method but faster than the naive approach). The out parameter of Numpy functions (like np.divide) can be used for that.
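As a rough sketch of that idea with plain NumPy (the sizes below are just for illustration), the output buffer is allocated once and each update for a new p is written into it in place:

import numpy as np

N = 10_000_000
y = np.random.rand(N)
z_inv = np.empty_like(y)             # preallocated once, reused for every p

for p in (0.01, 0.02, 0.05):
    np.add(y, p, out=z_inv)          # z_inv = y + p, written in place
    np.divide(1.0, z_inv, out=z_inv) # z_inv = 1 / (y + p), still in place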
Here is an (untested) example of parallel Numba code:
import numba as nb

# Use fastmath=True and float32 types for faster (approximate) results
# Assume `out` is already preallocated
@nb.njit('void(float64[::1], float64[::1], float64)', parallel=True)
def compute(out, y, p):
    assert out.size == y.size
    for i in nb.prange(y.size):
        out[i] = 1.0 / (y[i] + p)
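A possible way to call it (the array size and the value of p are arbitrary), matching the float64, C-contiguous types declared in the signature above:

import numpy as np

y = np.random.rand(10_000_000)
out = np.empty_like(y)   # preallocated once and reused for each new p
compute(out, y, 0.01)    # fills `out` in place with 1 / (y + 0.01)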
I'm trying to compute a dot product of an expression that is supposed to be symmetric.
It turns out that it just isn't.
B is a 4D array whose last two dimensions I transpose to obtain B^t.
D is a 2D array. (It's an expression of the stiffness matrix familiar to Finite Element Method programmers.)
numpy.dot combined with numpy.transpose, and as a second alternative numpy.einsum (the idea came from this topic: Numpy Matrix Multiplication U*B*U.T Results in Non-symmetric Matrix), have already been tried and the problem persists.
By the end of the calculations the product B^tDB is obtained, and when I check whether it is really symmetric by subtracting its transpose (B^tDB)^t, there is still a residue.
The dot product or the Einstein summation is used only over the dimensions of interest (the last ones).
The question is: How can these residues be eliminated?
You need to use arbitrary precision floating point math. Here's how you can combine numpy and the mpmath package to define an arbitrary precision version of matrix multiplication (i.e. the np.dot method):
from mpmath import mp, mpf
import numpy as np
# mp.dps stands for "decimal places". Larger values
# mean higher precision, but slower computation
mp.dps = 75
def tompf(arr):
"""Convert any numpy array to one of arbitrary precision mpmath.mpf floats
"""
if arr.size and not isinstance(arr.flat[0], mpf):
return np.array([mpf(x) for x in arr.flat]).reshape(*arr.shape)
else:
return arr
def dotmpf(arr0, arr1):
"""An arbitrary precision version of np.dot
"""
return tompf(arr0).dot(tompf(arr1))
As an example, if you then set up B, B^t, and D matrices as follows:
bshape = (8,8,8,8)
dshape = (8,8)
B = np.random.rand(*bshape)
BT = np.swapaxes(B, -2, -1)
d = np.random.rand(*dshape)
D = d.dot(d.T)
then B^tDB - (B^tDB)^t will always have a non-zero value if you calculate it using the standard matrix multiplication method from numpy:
M = np.dot(np.dot(B, D), BT)
np.sum(M - M.T)
but if you use the arbitrary precision version given above it won't have a residue:
M = dotmpf(dotmpf(B, D), BT)
np.sum(M - M.T)
Watch out though. Calculations using arbitrary precision math run much slower than those done using standard floating point numbers.
I am working on a vision algorithm with OpenCV in Python. One of its components requires comparing points in color-space, where the x and y components are not integers. Our list of points is stored as an ndarray with dtype = float64, and our numbers range from -10 to 10, give or take.
Part of our algorithm involves running a convex hull on some of the points in this space, but cv2.convexHull() requires an ndarray with dtype = int.
Given the narrow range of the values we are comparing, simple truncation causes us to lose ~60 bits of information. Is there any way to have numpy directly interpret the float array as an int array? Since the scale has no significance, I would like all 64 bits to be considered.
Is there any defined way to separate the exponent from the mantissa in a numpy float, without doing bitwise extraction for every element?
"Part of our algorithm involves running a convex hull on some of the points in this space, but cv2.convexHull() requires an ndarray with dtype = int."
cv2.convexHull() also accepts a numpy array of float32 numbers.
Try using cv2.convexHull(numpy.array(a, dtype='float32')), where a is a list of dimension n*2 (n = number of points).
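A minimal sketch of that call (the points here are randomly generated just to illustrate; your real color-space points would go in instead):

import numpy as np
import cv2

# Keep the points as float32 instead of truncating to int
pts = np.random.uniform(-10, 10, size=(30, 2)).astype(np.float32)

hull = cv2.convexHull(pts)   # accepts the float32 array directly
print(hull.shape)            # (k, 1, 2): the k hull vertices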
I am getting a very strange value for the (1,1) entry of my BinvA matrix.
I am just trying to invert the B matrix and do the (B^-1)A multiplication.
I understand that when I do the calculation by hand the (1,1) entry is supposed to be 0, but instead I get 1.11022302e-16. How can I fix it? I know floating point numbers can't be represented with full accuracy, but why is this giving me such an inaccurate result instead of rounding to 0, and is there any way I can make it more accurate?
Here is my code:
import numpy as np
A = np.array([[2,2],[4,-1]], dtype=int)
A = A.transpose()
B = np.array([[1,3],[-1,-1]], dtype=int)
B = B.transpose()
Binv = np.linalg.inv(B) #calculate the inverse
BinvA = np.dot(Binv,A)
print(BinvA)
My print statement:
[[ 1.11022302e-16 -2.50000000e+00]
[ -2.00000000e+00 -6.50000000e+00]]
When you compute the inverse, your arrays are converted to float64, whose machine epsilon is about 2.2e-16 (the resolution reported by finfo is 1e-15). The epsilon is the relative quantization step of a floating-point number.
When in doubt, we can ask numpy for information about a floating-point data type using the finfo function. In this case
np.finfo('float64')
finfo(resolution=1e-15,
min=-1.7976931348623157e+308, max=1.7976931348623157e+308,
dtype=float64)
So, technically, your value being smaller than eps is a very accurate representation of 0 for a float64 type.
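You can check this directly against the BinvA computed above:

eps = np.finfo(np.float64).eps   # about 2.22e-16 for double precision
print(abs(BinvA[0, 0]) < eps)    # True: 1.11e-16 is below one machine epsilon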
If it is only the printed representation that bothers you, you can tell numpy to suppress scientific notation, so that values which are zero at the printing precision are shown as 0:
np.set_printoptions(suppress=True)
After that your print statement returns:
[[ 0. -2.5]
[-2. -6.5]]
Note that this is a general numerical problem, common to all floating-point implementations. You can find more info about floating-point rounding errors on SO:
Why Are Floating Point Numbers Inaccurate?
or on the net:
Floating Point Accuracy Problems
What Every Computer Scientist Should Know About Floating-Point Arithmetic
This isn't a complete answer, but it may point you in the right direction. What you really want are numpy arrays that use Decimals for math. You might reasonably think to try:
import numpy as np
from decimal import Decimal
A = np.array([[2,2],[4,-1]], dtype=int)
for i, a in np.ndenumerate(A):
    A[i] = Decimal(int(a))   # convert to a plain Python int first so Decimal() accepts it
    print(type(A[i]))
But alas, Decimals are not among the datatypes supported out of the box in numpy, so each time you try to jam a Decimal into the array, it gets re-cast to the array's native scalar type.
One possibility would be to set the datatype, thus:
def decimal_array(arr):
    X = np.array(arr, dtype=Decimal)
    for i, x in np.ndenumerate(X):
        X[i] = Decimal(x)
    return X

A = decimal_array([[2,2],[4,-1]])
B = decimal_array([[1,3],[-1,-1]])
A = A.transpose()
B = B.transpose()
Binv = np.linalg.inv(B)  # calculate the inverse
But now, if you
print(Binv.dtype)
you'll see that the inversion has recast it back to float. The reason is that linalg.inv (like many other functions) looks for B's "common_type", which is the scalar type to which it believes it can coerce your array elements.
It may not be hopeless, though. I looked into whether you could solve this by creating a custom dtype, but it turns out that scalars (ints, floats, etc.) are not dtypes at all. Instead, what you probably want to do is register a new scalar (the Decimal), as described in the article on scalars. You'll see a link out to the Numpy C-API (don't be afraid). Search the page for "register" and "scalar" to get started.
The eigenvalues of a covariance matrix should be real and non-negative because covariance matrices are symmetric and positive semi-definite.
However, take a look at the following experiment with scipy:
>>> a=np.random.random(5)
>>> b=np.random.random(5)
>>> ab = np.vstack((a,b)).T
>>> C=np.cov(ab)
>>> eig(C)
7.90174997e-01 +0.00000000e+00j,
2.38344473e-17 +6.15983679e-17j,
2.38344473e-17 -6.15983679e-17j,
-1.76100435e-17 +0.00000000e+00j,
5.42658040e-33 +0.00000000e+00j
However, reproducing the above example in Matlab works correctly:
a = [0.6271, 0.4314, 0.3453, 0.8073, 0.9739]
b = [0.1924, 0.3680, 0.0568, 0.1831, 0.0176]
C=cov([a;b])
eig(C)
-0.0000
-0.0000
0.0000
0.0000
0.7902
You have raised two issues:
The eigenvalues returned by scipy.linalg.eig are not real.
Some of the eigenvalues are negative.
Both of these issues are the result of truncation and rounding errors, which always occur in iterative algorithms using floating-point arithmetic. Note that the Matlab results also contain negative eigenvalues.
Now, for a more interesting aspect of the issue: why is Matlab's result real, whereas SciPy's result has some complex components?
Matlab's eig detects if the input matrix is real symmetric or Hermitian and uses Cholesky factorization when it is. See the description of the chol argument in the eig documentation. This is not done automatically in SciPy.
If you want to use an algorithm that exploits the structure of a real symmetric or Hermitian matrix, use scipy.linalg.eigh. For the example in the question:
>>> eigh(C, eigvals_only=True)
array([ -3.73825923e-17, -1.60154836e-17, 8.11704449e-19,
3.65055777e-17, 7.90175615e-01])
This result is the same as Matlab's, if you round to the same number of digits of precision that Matlab printed.
What you are experiencing is numerical instability due to limitations on floating point precision.
Note that:
(1) MATLAB also returned negative values, but the printing format is set to short, so you don't see the full precision of the double stored in memory. Use format long g to print more decimals.
(2) All the imaginary parts returned by numpy's linalg.eig are close to machine precision, so you should treat them as zero.
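If you want to clean up those values explicitly, here is a small sketch (using C and eig from the question; the 1e-12 threshold is an arbitrary illustrative choice):

w = eig(C)[0]                            # eigenvalues, possibly with tiny imaginary parts
w = np.real_if_close(w)                  # drops imaginary parts within ~100*eps of 0
w = np.where(np.abs(w) < 1e-12, 0.0, w)  # zero out the numerically-zero eigenvalues
print(w)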