Python language reasons for NaN (other than dividing by zero) - python

I am rather at a loss at the moment. I have a program that works well for most parameters, but for some parameters it produces NaN. The problem seems to lie in calculating values slightly larger than 1. I have a function f(x, t); for large values of x, f increases rapidly with t and there is no issue. However, for small values of x, f increases very slowly with t, and it doesn't work at all, going something like this (the array represents discrete time steps):
1.0
1.0
1.0
NaN
I don't understand how NaN is returned for values that clearly aren't tending away to infinity, and are slightly greater than 1. Is there some data type problem that might throw this exception?
For larger values of x the behavior might be something like this:
1.0
1.000000000000074
1.000000000000486
1.000000000000888
and so on with no issues at all.

PEP 754 specifically indicates PositiveInfinity/PositiveInfinity will return NaN. (It also talks about using PosInf = 1e300000 as a value for PositiveInfinity, but points out that this is neither portable nor pretty.)
The wiki page for IEEE 754 NaN's indicates the following additional examples:
Operations with a NaN as at least one operand.
Indeterminate forms
The divisions 0/0 and ±∞/±∞ (as mentioned in your question and above respectively)
The multiplications 0×±∞ and ±∞×0
The additions ∞ + (−∞), (−∞) + ∞ and equivalent subtractions
The standard has alternative functions for powers:
The standard pow function and the integer exponent pown function define 0^0, 1^∞, and ∞^0 as 1. (Note that 0**0 in Python 2.7, which I would expect to use pow or at least behave like it, returns 1.)
The powr function defines all three indeterminate forms as invalid operations and so returns NaN.
Real operations with complex results (for all of these that I tried, using math.sqrt, math.log or math.asin in Python throws a Math Domain Error rather than returning NaN):
The square root of a negative number.
The logarithm of a negative number
The inverse sine or cosine of a number that is less than −1 or greater than +1.
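A quick sketch of how these cases behave in CPython: plain float arithmetic quietly produces nan for the indeterminate forms, while the math module raises ValueError for the real-operations-with-complex-results cases.

```python
import math

inf = float('inf')

# Indeterminate forms quietly produce NaN in plain float arithmetic.
print(inf - inf)   # nan
print(inf * 0.0)   # nan
print(inf / inf)   # nan  (0.0 / 0.0 raises ZeroDivisionError instead)

# The math module raises ValueError ("math domain error") rather than
# returning NaN for real operations with complex results.
try:
    math.sqrt(-1.0)
except ValueError as e:
    print(e)       # math domain error
```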

Related

How do you find the largest float below some value?

Given a float x, I would like to find the largest floating point number that is less than x. How can I do this in Python?
I've tried subtracting machine epsilon from x (x - numpy.finfo(float).eps), but this evaluates to x for sufficiently large floats, and I need the value I get back to be strictly less than x.
There's some information about how to do this in C# here, but I have no idea how to do the same bitwise conversion in Python. Anybody know how to do this, or have another method for getting the same value?
(Bigger-picture problem -- I'm trying to numerically find the root of an equation with a singularity at x, within the bounds 0 < root < x. The solver (Scipy's toms748 implementation) evaluates on the boundaries, and it can't handle nan or inf values, so I can't give it exactly x as a bound. I don't know how close the root might be to the bound, so I want to give a bound as close to x as possible without actually producing an infinite value and crashing the solver.)
You are describing the basic usage of numpy.nextafter.
>>> import numpy as np
>>> np.nextafter(1.5, 0.0) # biggest float smaller than 1.5
1.4999999999999998
>>> np.nextafter(1.5, 2.0) # smallest float bigger than 1.5
1.5000000000000002
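Since Python 3.9 the standard library offers the same primitive as math.nextafter, which can be used directly to shrink the solver bound in the bigger-picture problem above (a sketch; x here stands in for the hypothetical singularity location):

```python
import math

x = 2.0  # hypothetical singularity location
upper = math.nextafter(x, 0.0)  # largest float strictly below x
print(upper)       # 1.9999999999999998
print(upper < x)   # True: safe to hand to the solver as a bound
```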

Python Numerical Differentiation and the minimum value for h

I calculate the first derivative using the following code:
def f(x):
    f = np.exp(x)
    return f

def dfdx(x):
    Df = (f(x+h) - f(x-h)) / (2*h)
    return Df
For example, for x == 10 this works fine. But when I set h to around 1e-14 or below, Df starts
to take values that are really far from the expected value f(10), and the relative error between the expected value and Df becomes huge.
Why is that? What is happening here?
The evaluation of f(x) has, at best, a rounding error of |f(x)|*mu where mu is the machine constant of the floating point type. The total error of the central difference formula is thus approximately
2*|f(x)|*mu/(2*h) + |f'''(x)|/6 * h^2
In the present case, the exponential function is equal to all of its derivatives, so that the error is proportional to
mu/h + h^2/6
which has a minimum at h = (3*mu)^(1/3), which for the double format with mu=1e-16 is around h=1e-5.
The precision is increased if instead of 2*h the actual difference (x+h)-(x-h) between the evaluation points is used in the denominator. This can be seen in the following loglog plot of the distance to the exact derivative.
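A small experiment makes the error tradeoff visible; this is a sketch that reuses the question's f and adds h as a parameter to dfdx:

```python
import numpy as np

def f(x):
    return np.exp(x)

def dfdx(x, h):
    # central difference quotient
    return (f(x + h) - f(x - h)) / (2 * h)

x = 10.0
exact = np.exp(x)  # d/dx exp(x) = exp(x)
for h in (1e-2, 1e-5, 1e-13):
    rel_err = abs(dfdx(x, h) - exact) / exact
    print("h=%g  relative error=%.2e" % (h, rel_err))
# The error is smallest near h ~ 1e-5, as predicted by h = (3*mu)^(1/3),
# and blows up for h ~ 1e-13, where cancellation in f(x+h) - f(x-h) dominates.
```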
You are probably encountering numerical instability: for x = 10 and h ≈ 1e-13, the argument to np.exp is very close to 10 whether h is added or subtracted, so the small rounding errors in the two values of np.exp are amplified enormously by the division by the very small 2 * h.
In addition to the answer by @LutzL, I will add some info from the great book Numerical Recipes, 3rd Edition: The Art of Scientific Computing, chapter 5.7 on Numerical Derivatives, especially about the choice of the optimal h value for a given x:
Always choose h so that h and x differ by an exactly representable number. Funny stuff like 1/3 should be avoided, except when x is equal to something along the lines of 14.3333333.
Round-off error is approximately epsilon * |f(x) / h|, where epsilon is the floating-point accuracy; Python represents floating-point numbers in double precision, so it's about 1e-16. It may differ for more complicated functions (where precision errors accumulate further), though that's not your case.
Choice of optimal h: Not getting into details, it would be sqrt(epsilon) * x for the simple forward case, except when x is near zero (you will find more information in the book). In the case of f(x+h) - f(x-h), as in your example, it would amount to epsilon ** (1/3) * x, so approximately 5e-6 times x, which might be a difficult choice for small values like yours. This is quite close (if one can say so, bearing floating-point arithmetic in mind...) to the practical results posted by @LutzL.
You may use other derivative formulas besides the symmetric one you are using. You may want to use forward or backward evaluation if the function is costly to evaluate and you have calculated f(x) beforehand. If your function is cheap to evaluate, you may want to evaluate it multiple times using higher-order methods to make the truncation error smaller (see the five-point stencil on Wikipedia, as suggested in the comments to your question).
This Python tutorial explains the reason behind the limited precision. In summary, decimals are ultimately represented in binary, and the precision is about 17 significant digits. So you are right that it gets fuzzy beyond 1e-14.

Extremely low values from NumPy

I am attempting to do a few different operations in Numpy (mean and interp), and with both operations I am getting the result 2.77555756156e-17 at various times, usually when I'm expecting a zero. Even attempting to filter these out with array[array < 0.0] = 0.0 fails to remove the values.
I assume there's some sort of underlying data type or environment error that's causing this. The data should all be float.
Edit: It's been helpfully pointed out that I was only filtering out the values of -2.77555756156e-17 but still seeing positive 2.77555756156e-17. The crux of the question is what might be causing these wacky values to appear when doing simple functions like interpolating values between 0-10 and taking a mean of floats in the same range, and how can I avoid it without having to explicitly filter the arrays after every statement.
You're running into numerical precision, which is a huge topic in numerical computing; when you do any computation with floating point numbers, you run the risk of running into tiny values like the one you've posted here. What's happening is that your calculations are resulting in values that can't quite be expressed with floating-point numbers.
Floating-point numbers are expressed with a fixed amount of information (in Python, this amount defaults to 64 bits). You can read more about how that information is encoded on the very good Floating point Wikipedia page. In short, some calculation that you're performing in the process of computing your mean produces an intermediate value that cannot be precisely expressed.
This isn't a property of numpy (and it's not even really a property of Python); it's really a property of the computer itself. You can see this in normal Python by playing around in the repl:
>>> repr(3.0)
'3.0'
>>> repr(3.0 + 1e-10)
'3.0000000001'
>>> repr(3.0 + 1e-18)
'3.0'
For the last result, you would expect 3.000000000000000001, but that number can't be expressed in a 64-bit floating point number, so the computer uses the closest approximation, which in this case is just 3.0. If you were trying to average the following list of numbers:
[3., -3., 1e-18]
Depending on the order in which you summed them, you could get 1e-18 / 3., which is the "correct" answer, or zero. You're in a slightly stranger situation; two numbers that you expected to cancel didn't quite cancel out.
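This order dependence is easy to reproduce with the three numbers above:

```python
# 3.0 + (-3.0) cancels exactly, so the tiny term survives:
print(sum([3.0, -3.0, 1e-18]))   # 1e-18
# 3.0 + 1e-18 rounds straight back to 3.0, so the tiny term is lost:
print(sum([3.0, 1e-18, -3.0]))   # 0.0
```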
This is just a fact of life when you're dealing with floating point mathematics. The common way of working around it is to eschew the equals sign entirely and to only perform "numerically tolerant comparison", which means equality-with-a-bound. So this check:
a == b
Would become this check:
abs(a - b) < TOLERANCE
For some tolerance amount. The tolerance depends on what you know about your inputs and the precision of your computer; if you're using a 64-bit machine, you want this to be at least 1e-10 times the largest amount you'll be working with. For example, if the biggest input you'll be working with is around 100, it's reasonable to use a tolerance of 1e-8.
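The standard library packages exactly this pattern (Python 3.5+): math.isclose uses a relative tolerance by default, with an optional absolute tolerance for comparisons near zero.

```python
import math

a = 0.1 + 0.2
b = 0.3
print(a == b)                    # False: 0.30000000000000004 != 0.3
print(math.isclose(a, b))        # True (rel_tol defaults to 1e-09)

# A relative tolerance alone fails near zero; add abs_tol for that case:
print(math.isclose(1e-18, 0.0))                  # False
print(math.isclose(1e-18, 0.0, abs_tol=1e-12))   # True
```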
You can round your values to 15 digits:
a = a.round(15)
Now the array a should show you 0.0 values.
Example:
>>> a = np.array([2.77555756156e-17])
>>> a.round(15)
array([ 0.])
This is most likely the result of floating point arithmetic errors. For instance:
In [3]: 0.1 + 0.2 - 0.3
Out[3]: 5.551115123125783e-17
Not what you would expect? Numpy has a built in isclose() method that can deal with these things. Also, you can see the machine precision with
eps = np.finfo(float).eps
So, perhaps something like this could work too:
a = np.array([[-1e-17, 1.0], [1e-16, 1.0]])
a[np.abs(a) <= eps] = 0.0

Why is the value of 1**Inf equal to 1, not NaN?

Why is 1**Inf == 1 ?
I believe it should be NaN, just like Inf-Inf or Inf/Inf.
How is exponentiation implemented on floats in python?
exp(y*log(x)) would get correct result :/
You are right: mathematically, the value of 1^∞ is indeterminate.
However, Python doesn't follow the maths exactly in this case. The document of math.pow says:
math.pow(x, y)
Return x raised to the power y. Exceptional cases follow Annex ‘F’ of the C99 standard as far as possible. In particular, pow(1.0, x) and pow(x, 0.0) always return 1.0, even when x is a zero or a NaN.
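A quick check in the interpreter confirms the documented special cases, and that the ** operator agrees for these inputs:

```python
import math

inf = float('inf')
nan = float('nan')

print(math.pow(1.0, inf))   # 1.0
print(math.pow(1.0, nan))   # 1.0: pow(1.0, x) is 1.0 even for NaN x
print(math.pow(nan, 0.0))   # 1.0: pow(x, 0.0) is 1.0 even for NaN x
print(1.0 ** inf)           # 1.0: the ** operator matches here
```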
Floating-point arithmetic is not real-number arithmetic. Notions of "correct" informed by real analysis do not necessarily apply to floating-point.
In this case, however, the trouble is just that pow fundamentally represents two similar but distinct functions:
Exponentiation with an integer power, which is naturally a function RxZ --> R (or RxN --> R).
The two-variable complex function given by pow(x,y) = exp(y * log(x)) restricted to the real line.
These functions agree for normal values, but differ in their edge cases at zero, infinity, and along the negative real axis (which is traditionally the branch cut for the second function).
These two functions are sometimes divided up to make the edge cases more reasonable; when that's done the first function is called pown and the second is called powr; as you have noticed pow is a conflation of the two functions, and uses the edge cases for these values that come from pown.
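The powr behaviour can be emulated straight from its definition exp(y * log(x)); this is a sketch, not a real standard-library function, and it only illustrates the 1^∞ case (math.log(0.0) raises in Python, so the 0^0 form would need extra handling):

```python
import math

def powr(x, y):
    # IEEE-754 powr is defined only via exp(y * log(x)), so the
    # indeterminate form 1**inf becomes inf * log(1) = inf * 0.0 = nan.
    return math.exp(y * math.log(x))

inf = float('inf')
print(powr(1.0, inf))      # nan
print(math.pow(1.0, inf))  # 1.0: pow uses the pown-style edge cases instead
```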
Technically 1^inf is defined as limit(1^x, x->inf). 1^x = 1 for any finite x, so it should be limit(1, x->inf) = 1, not NaN.

Determine whether two complex numbers are equal

The following code causes the print statements to be executed:
import numpy as np
import math

foo = np.array([1/math.sqrt(2), 1/math.sqrt(2)], dtype=np.complex_)
total = complex(0, 0)
one = complex(1, 0)
for f in foo:
    total = total + pow(np.abs(f), 2)
if total != one:
    print str(total) + " vs " + str(one)
    print "NOT EQUAL"
However, my input of [1/math.sqrt(2), 1/math.sqrt(2)] results in the total being one:
(1+0j) vs (1+0j) NOT EQUAL
Is it something to do with mixing NumPy with Python's complex type?
When using floating-point numbers, it is important to keep in mind that computations with them are never exact and are therefore always subject to rounding errors. This is a consequence of the design of floating-point arithmetic, which is currently the most practical way to do fast numerical mathematics on computers with limited resources. You can't compute exactly using floats (and you have practically no alternative): your numbers have to be cut off somewhere to fit in a reasonable amount of memory (in most cases at most 64 bits), and this cut-off is done by rounding (see below for an example).
To deal correctly with these shortcomings, you should never compare floats for equality, but for closeness. Numpy provides two functions for that: np.isclose for comparison of single values (or an item-wise comparison for arrays) and np.allclose for whole arrays. The latter is equivalent to np.all(np.isclose(a, b)), so you get a single value for an array.
>>> np.isclose(np.float32('1.000001'), np.float32('0.999999'))
True
But sometimes the rounding is very practicable and matches with our analytical expectation, see for example:
>>> np.float(1) == np.square(np.sqrt(1))
True
After squaring the value will be reduced in size to fit in the given memory, so in this case it's rounded to what we would expect.
These two functions have built-in absolute and relative tolerances (you can also pass them as parameters) that are used to compare two values. By default they are rtol=1e-05 and atol=1e-08.
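Under the hood, np.isclose checks |a - b| <= atol + rtol * |b|, so the defaults behave like this:

```python
import numpy as np

# |a - b| <= atol + rtol * |b|, with rtol=1e-05 and atol=1e-08 by default
print(np.isclose(1.0, 1.0 + 5e-6))   # True:  5e-6 <= 1e-8 + 1e-5
print(np.isclose(1.0, 1.0 + 5e-5))   # False: 5e-5 >  1e-8 + 1e-5
print(np.isclose(0.0, 1e-9))         # True:  atol covers values near zero
```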
Also, don't mix different packages with their types. If you use Numpy, use Numpy-Types and Numpy-Functions. This will also reduce your rounding errors.
Btw: Rounding errors have even more impact when working with numbers which differ in their exponent widely.
I guess, the same considerations as for real numbers are applicable: never assume they can be equal, but rather close enough:
eps = 0.000001
if abs(a - b) < eps:
    print "Equal"
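For complex numbers specifically, abs gives the complex magnitude, so the same tolerance check works unchanged, and both cmath.isclose (Python 3.5+) and np.isclose accept complex arguments. A sketch, assuming the loop in the question leaves total at something like 0.9999999999999998:

```python
import cmath
import numpy as np

total = complex(0.9999999999999998, 0.0)  # hypothetical accumulated sum
one = complex(1.0, 0.0)

print(total == one)                  # False: exact comparison fails
print(abs(total - one) < 1e-9)       # True: abs() is the complex magnitude
print(cmath.isclose(total, one))     # True
print(bool(np.isclose(total, one)))  # True
```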
