Cancellation in NumPy array operation including a scalar - Python

I'm using NumPy version 1.7.1.
Now I came across a strange cancellation I don't understand:
>>> import numpy as np
>>> a = np.array([ 883, 931, 874], dtype=np.float32)
Mathematically a+0.1-a should be 0.1.
Now let's calculate the value of this expression along with its absolute and relative error:
>>> a+0.1-a
array([ 0.09997559, 0.09997559, 0.09997559], dtype=float32)
>>> (a+0.1-a)-0.1
array([ -2.44155526e-05, -2.44155526e-05, -2.44155526e-05], dtype=float32)
>>> ((a+0.1-a)-0.1) / 0.1
array([-0.00024416, -0.00024416, -0.00024416], dtype=float32)
First question: These are quite high absolute and relative errors; this is just catastrophic cancellation, isn't it?
Second question: When I use an array instead of the scalar, NumPy is able to calculate with much more precision; look at the error:
>>> a+np.array((0.1,)*3)-a
array([ 0.1, 0.1, 0.1])
>>> (a+np.array((0.1,)*3)-a)-0.1
array([ 2.27318164e-14, 2.27318164e-14, 2.27318164e-14])
This is just the numerical representation of 0.1 I guess.
But why is NumPy not able to handle this the same way if a scalar is used instead of an array as in a+0.1-a?

If you use double precision, the scenario changes. What you are getting is expected for single precision (np.float32):
a = np.array([ 883, 931, 874], dtype=np.float64)
a+0.1-a
# array([ 0.1, 0.1, 0.1])
((a+0.1-a)-0.1) / 0.1
# array([ 2.27318164e-13, 2.27318164e-13, 2.27318164e-13])
Using np.array((0.1,)*3) in the middle of the expression promoted everything to float64, which explains the higher precision of the second result.
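You can check the type promotion directly (here a32 is just the question's float32 array under a new name):
>>> a32 = np.array([883, 931, 874], dtype=np.float32)
>>> (a32 + 0.1).dtype                  # a Python float scalar does not upcast the array
dtype('float32')
>>> (a32 + np.array((0.1,)*3)).dtype   # a float64 array does
dtype('float64')
With the scalar, the whole computation stays in float32, and 883 + 0.1 can only be resolved to within the float32 spacing near 883 (about 6e-5), which is where the ~2.4e-5 error comes from.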

Related

True inverse function for cosine in numpy? (NOT arccos)

Here is a weird one:
I have found myself needing a numpy function that is what I would call the true inverse of np.cos (or another trigonometric function; cosine is used here for definiteness). What I mean by "true inverse" is a function invcos such that
np.cos(invcos(x)) = x
for any real float x. Two observations: invcos(x) exists (it is a complex float), and np.arccos(x) does not do the job, because it only works for -1 <= x <= 1.
My question is whether there is an efficient numpy function for this operation, or whether it can be built from existing ones easily.
My attempt was to use a combination of np.arccos and np.arccosh to build the function by hand. This is based on the observation that np.arccos can deal with x in [-1,1] and np.arccosh can deal with x outside [-1,1] if one multiplies by the complex unit. To see that this works:
cos_x = np.array([0.5, 1., 1.5])
x = np.arccos(cos_x)
cos_x_reconstructed = np.cos(x)
# [0.5 1. nan]
x2 = 1j*np.arccosh(cos_x)
cos_x_reconstructed2 = np.cos(x2)
# [nan+nanj 1.-0.j 1.5-0.j]
So we could combine this into:
def invcos(array):
    x1 = np.arccos(array)
    x2 = 1j*np.arccosh(array)
    print(x1)
    print(x2)
    x = np.empty_like(x1, dtype=np.complex128)
    x[~np.isnan(x1)] = x1[~np.isnan(x1)]
    x[~np.isnan(x2)] = x2[~np.isnan(x2)]
    return x
cos_x = np.array([0.5, 1., 1.5])
x = invcos(cos_x)
cos_x_reconstructed = np.cos(x)
# [0.5-0.j 1.-0.j 1.5-0.j]
This gives the correct results, but naturally raises RuntimeWarnings:
RuntimeWarning: invalid value encountered in arccos.
I guess since numpy even tells me that my algorithm is not efficient, it is probably not efficient. Is there a better way to do this?
For readers who are interested in why this strange function may be useful: The motivation comes from a physics background. In certain theories, one can have vector components that are 'off-shell', which means that the components might even be longer than the vector. The above function can be useful to nevertheless parametrize things in terms of angles.
My question is whether there is an efficient numpy function for this operation, or whether it can be built from existing ones easily.
Yes; it is... np.arccos.
From the documentation:
For real-valued input data types, arccos always returns real output. For each value that cannot be expressed as a real number or infinity, it yields nan and sets the invalid floating point error flag.
For complex-valued input, arccos is a complex analytic function that has branch cuts [-inf, -1] and [1, inf] and is continuous from above on the former and from below on the latter.
So all we need to do is ensure that the input is a complex number (even if its imaginary part is zero):
>>> import numpy as np
>>> np.arccos(2.0)
__main__:1: RuntimeWarning: invalid value encountered in arccos
nan
>>> np.arccos(2 + 0j)
-1.3169578969248166j
For an array, we need the appropriate dtype:
>>> np.arccos(np.ones((3,3)) * 2)
array([[nan, nan, nan],
[nan, nan, nan],
[nan, nan, nan]])
>>> np.arccos(np.ones((3,3), dtype=complex) * 2)
array([[0.-1.3169579j, 0.-1.3169579j, 0.-1.3169579j],
[0.-1.3169579j, 0.-1.3169579j, 0.-1.3169579j],
[0.-1.3169579j, 0.-1.3169579j, 0.-1.3169579j]])
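Putting that together, a complete replacement for the invcos helper from the question could be as simple as this sketch (invcos is the question's name; the implementation below is just one way to do it):
import numpy as np

def invcos(x):
    # promoting the input to complex avoids the nan / RuntimeWarning path of real arccos
    return np.arccos(np.asarray(x, dtype=complex))

x = invcos([0.5, 1.0, 1.5])
np.cos(x)   # recovers [0.5, 1.0, 1.5] up to rounding, with zero imaginary part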

Truncating decimal digits numpy array of floats

I want to truncate the float values within the numpy array, e.g.
2.34341232 --> 2.34
I read the post truncate floating point but it's for a single float. I don't want to run a loop over the numpy array; that would be quite expensive. Is there any built-in method within numpy which can do this easily? I need the output as floats, not strings.
Try this wrapper around numpy.trunc():
import numpy as np
def trunc(values, decs=0):
    return np.trunc(values * 10**decs) / (10**decs)
Sadly, the numpy.trunc function doesn't allow truncation to a given number of decimals. Luckily, multiplying the argument by a power of ten and dividing the result by the same power gives the expected result.
vec = np.array([-4.79, -0.38, -0.001, 0.011, 0.4444, 2.34341232, 6.999])
trunc(vec, decs=2)
which returns:
array([-4.79, -0.38, -0.  ,  0.01,  0.44,  2.34,  6.99])
Use numpy.round:
import numpy as np
a = np.arange(4) ** np.pi
a
=> array([ 0. , 1. , 8.82497783, 31.5442807 ])
a.round(decimals=2)
=> array([ 0. , 1. , 8.82, 31.54])
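Note that the two answers do slightly different things: the trunc helper chops digits toward zero, while numpy.round rounds to the nearest value. A small illustration with arbitrary values, reusing the trunc helper defined above:
v = np.array([2.349, -2.349])
trunc(v, decs=2)   # array([ 2.34, -2.34])  -- truncated toward zero
np.round(v, 2)     # array([ 2.35, -2.35])  -- rounded to nearest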

Difference between sum and np.sum for complex numbers numpy

I am trying to split up the multiplication of a DFT matrix into real and imaginary parts:
from scipy.linalg import dft
import numpy as np
# x is always real
x = np.ones(4)
W = dft(4)
Wx = W.dot(x)
Wxme = np.real(W).dot(x) + np.imag(W).dot(x)*1.0j
I would expect Wx and Wxme to have the same values, but they don't at all. I have narrowed the problem down a bit more:
In [62]: W[1]
Out[62]:
array([ 1.00000000e+00 +0.00000000e+00j,
6.12323400e-17 -1.00000000e+00j,
-1.00000000e+00 -1.22464680e-16j, -1.83697020e-16 +1.00000000e+00j])
In [63]: np.sum(W[1])
Out[63]: (-2.2204460492503131e-16-1.1102230246251565e-16j)
In [64]: sum(W[1])
Out[64]: (-1.8369701987210297e-16-2.2204460492503131e-16j)
Why do sum and np.sum give different values? Addition of complex numbers should be nothing but adding the real parts and the imaginary parts separately, right?
Adding them by hand gives me the result I would expect, as opposed to what numpy gives me:
In [65]: 1.00000000e+00 + 6.12323400e-17 + -1.00000000e+00 + 1.83697020e-16
Out[65]: 1.8369702e-16
What am I missing?
Up to rounding error, these results are equal. The results have slightly different rounding error due to factors such as different summation order or different levels of precision used to represent intermediate results.
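The underlying reason is that floating-point addition is not associative, so a different grouping of the same terms can change the last few bits of the result. A familiar stand-alone example:
>>> (0.1 + 0.2) + 0.3
0.6000000000000001
>>> 0.1 + (0.2 + 0.3)
0.6
In your case the exact sum of that row of the DFT matrix is 0, and both computed sums are within a couple of machine epsilons (~2.2e-16) of it, so neither is wrong.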

Why is the mean larger than the max in this array?

I have found myself with a very confusing array in Python. The following is the output from IPython when I work with it (with the pylab flag):
In [1]: x = np.load('x.npy')
In [2]: x.shape
Out[2]: (504000,)
In [3]: x
Out[3]:
array([ 98.20354462, 98.26583099, 98.26529694, ..., 98.20297241,
98.19876862, 98.29492188], dtype=float32)
In [4]: min(x), mean(x), max(x)
Out[4]: (97.950058, 98.689438, 98.329773)
I have no idea what is going on. Why is the mean() function providing what is obviously the wrong answer?
I don't even know where to begin to debug this problem.
I am using Python 2.7.6.
I would be willing to share the .npy file if necessary.
This is probably due to accumulated rounding error in computing mean(). float32 relative precision is ~1e-7, and you have 500000 elements, so a direct (left-to-right) computation of sum() can accumulate a rounding error on the order of 5%.
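A toy illustration of the mechanism (not your data): once a float32 running sum grows large enough, small addends are partially or even completely absorbed by rounding:
>>> import numpy as np
>>> s = np.float32(2**24)      # 16777216.0; the float32 spacing here is 2.0
>>> s + np.float32(1.0)        # the 1.0 is lost entirely
16777216.0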
The algorithm for computing sum() and mean() is more sophisticated (pairwise summation) in the latest Numpy version 1.9.0:
>>> import numpy
>>> numpy.__version__
'1.9.0'
>>> x = numpy.random.random(500000).astype("float32") + 300
>>> min(x), numpy.mean(x), max(x)
(300.0, 300.50024, 301.0)
In the meantime, you may want to use a higher-precision accumulator type: numpy.mean(x, dtype=numpy.float64)
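For reference, a rough sketch of the pairwise-summation idea mentioned above (NumPy's real implementation is blocked and vectorised; this toy version only shows why the error grows like O(log n) instead of O(n)):
import numpy as np

def pairwise_sum(a):
    # For a 1-D array: splitting in half keeps the two partial sums at similar
    # magnitude, so each addition loses far fewer low-order bits than a running total.
    if a.size <= 8:
        s = a.dtype.type(0)
        for v in a:            # plain left-to-right loop for the small base case
            s += v
        return s
    mid = a.size // 2
    return pairwise_sum(a[:mid]) + pairwise_sum(a[mid:])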
I have included a snippet from the np.mean.__doc__ below. You should try using np.mean(x, dtype=np.float64).
-----
The arithmetic mean is the sum of the elements along the axis divided
by the number of elements.
Note that for floating-point input, the mean is computed using the
same precision the input has. Depending on the input data, this can
cause the results to be inaccurate, especially for `float32` (see
example below). Specifying a higher-precision accumulator using the
`dtype` keyword can alleviate this issue.
In single precision, `mean` can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.mean(a)
0.546875
Computing the mean in float64 is more accurate:
>>> np.mean(a, dtype=np.float64)
0.55000000074505806

Rounding numpy float array read from img with python, values returned not rounded

Simple rounding of a floating-point numpy array doesn't seem to be working, for some reason.
I get a numpy array from reading a huge image (shape (7352, 7472)). Example values:
>>> imarray[3500:3503, 5000:5003]
array([[ 73.33999634, 73.40000153, 73.45999908],
[ 73.30999756, 73.37999725, 73.43000031],
[ 73.30000305, 73.36000061, 73.41000366]], dtype=float32)
For rounding I've just been trying to use numpy.around() on the raw values, also writing the values into a new array (a copy of the raw array), but for some reason nothing gets rounded:
arr = imarray
numpy.around(imarray, decimals=3, out=arr)
arr[3500,5000] #results in 73.3399963379, as well as accessing imarray
So, even higher precision!!!
Is that because the array is so big?
I need to round it to get the most frequent value (the mode), and I'm looking for a way to avoid pulling in more and more libraries.
Your array has dtype float32. That is a 4-byte float.
The closest float to 73.340 representable using float32 is roughly 73.33999634:
In [62]: x = np.array([73.33999634, 73.340], dtype = np.float32)
In [63]: x
Out[63]: array([ 73.33999634, 73.33999634], dtype=float32)
So I think np.around is rounding correctly; it is just that your dtype has too coarse a granularity to represent the number you might be expecting.
In [60]: y = np.around(x, decimals = 3)
In [61]: y
Out[61]: array([ 73.33999634, 73.33999634], dtype=float32)
Whereas, if the dtype were np.float64:
In [64]: x = np.array([73.33999634, 73.340], dtype = np.float64)
In [65]: y = np.around(x, decimals = 3)
In [66]: y
Out[66]: array([ 73.34, 73.34])
Note that even though the printed representation of y shows 73.34, the real number 73.34 is not exactly representable as a float64 either; the float64 value is just so close to 73.34 that NumPy chooses to print it as 73.34.
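A quick way to see that, by printing more digits than the default repr shows:
>>> '%.17g' % 73.34
'73.340000000000003'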
The answer by @unutbu is absolutely correct. Numpy is rounding it as close to the number as it can given the precision that you requested. The only thing that I have to add is that you can use numpy.set_printoptions to change how the array is displayed:
>>> import numpy as np
>>> x = np.array([73.33999634, 73.340], dtype = np.float32)
>>> y = np.round(x, decimals = 3)
>>> y
array([ 73.33999634, 73.33999634], dtype=float32)
>>> np.set_printoptions(precision=3)
>>> y
array([ 73.34, 73.34], dtype=float32)
