My code is quite simple, and only 1 line is causing an issue:
np.tan(np.radians(rotation))
Instead of my expected output of 1 for rotation = 45, I get 0.9999999999999999. I understand that a 0 followed by a ton of 9's is, for practical purposes, 1. In my use case, however, it seems like the type of thing that will definitely build up over iterations.
What is causing the floating point error: np.tan or np.radians, and how do I get the problem function to come out correctly regardless of floating point inaccuracies?
Edit:
I should clarify that I am familiar with floating point inaccuracies. My concern is that as that number gets multiplied, added, and compared, the 1e-6 error suddenly becomes a tangible issue. I've normally been able to safely ignore floating point issues, but now I am far more concerned about the build up of error. I would like to reduce the possibility of such an error.
Edit 2:
My current solution is to just round to 8 decimal places because that's most likely enough. It's sort of a temporary solution because I'd much prefer a way to get around the IEEE decimal representations.
What is causing the floating point error: np.tan or np.radians, and how do I get the problem function to come out correctly regardless of floating point inaccuracies?
Both functions incur rounding error, since in neither case is the exact result representable in floating point.
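To see the error concretely: np.radians(45) returns a double that cannot equal π/4 exactly (π is irrational), so even a perfectly rounded tangent of that double is slightly below 1. A minimal sketch, assuming mpmath is installed, that evaluates the tangent of the exact double np.tan actually receives:

import numpy as np
import mpmath

mpmath.mp.dps = 30                     # 30 decimal digits of working precision
x = mpmath.mpf(float(np.radians(45)))  # the exact double passed to np.tan
print(mpmath.tan(x))                   # a hair below 1; rounded back to double this is 0.9999999999999999

So np.tan merely returns the double nearest to the true tangent of its (already slightly off) argument; neither function misbehaves.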
My current solution is to just round to 8 decimal places because that's most likely enough. It's sort of a temporary solution because I'd much prefer a way to get around the IEEE decimal representations.
The problem has nothing to do with decimal representation, and this will give worse results outside of the exact case you mention above, e.g.
>>> np.tan(np.radians(60))
1.7320508075688767
>>> round(np.tan(np.radians(60)), 8)
1.73205081
>>> np.sqrt(3) # sqrt is correctly rounded, so this is the closest float to the true result
1.7320508075688772
If you absolutely need higher accuracy than the 15 decimal digits you would get from the code above, then you can use an arbitrary-precision library like gmpy2.
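For instance, a minimal sketch with gmpy2 (precision is specified in bits; the names below are gmpy2's documented API):

import gmpy2

gmpy2.get_context().precision = 100   # ~30 decimal digits
angle = 60 * gmpy2.const_pi() / 180   # 60 degrees, in radians
print(gmpy2.tan(angle))               # tan(60°) = sqrt(3) to ~30 digits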
Take a look here: https://docs.scipy.org/doc/numpy/user/basics.types.html
Standard dtypes in NumPy do not go beyond 64-bit precision. From the docs:
Be warned that even if np.longdouble offers more precision than python
float, it is easy to lose that extra precision, since python often
forces values to pass through float. For example, the % formatting
operator requires its arguments to be converted to standard python
types, and it is therefore impossible to preserve extended precision
even if many decimal places are requested. It can be useful to test
your code with the value 1 + np.finfo(np.longdouble).eps.
You can increase precision with np.longdouble, but this is platform-dependent.
In Spyder (Windows):
np.finfo(np.longdouble).eps  # same precision as a Python float
>> 2.220446049250313e-16
np.finfo(np.longdouble).precision
>> 15
In Google Colab:
np.finfo(np.longdouble).eps  # larger precision
>> 1.084202172485504434e-19
np.finfo(np.longdouble).precision
>> 18
print(np.tan(np.radians(45, dtype=np.float64), dtype=np.float64) - 1)
print(np.tan(np.radians(45, dtype=np.longdouble), dtype=np.longdouble) - 1)
>> -1.1102230246251565e-16
>> 0.0
Related
I work daily with Python 2.4 at my company. I used the versatile logarithm function 'log' from the standard math library, and when I entered log(2**31, 2) it returned 31.000000000000004, which struck me as a bit odd.
I did the same thing with other powers of 2, and it worked perfectly. I ran 'log10(2**31) / log10(2)' and I got a round 31.0
I tried running the same original function in Python 3.0.1, assuming it had been fixed in a more advanced version, but the result was the same.
Why does this happen? Is it possible that there are some inaccuracies in mathematical functions in Python?
This is to be expected with computer arithmetic. It is following particular rules, such as IEEE 754, that probably don't match the math you learned in school.
If this actually matters, use Python's decimal type.
Example:
from decimal import Decimal, Context

ctx = Context(prec=20)  # 20 significant digits
two = Decimal(2)
# log2(2**31) computed as ln(2**31) / ln(2) in 20-digit decimal arithmetic
ctx.divide(ctx.power(two, Decimal(31)).ln(ctx), two.ln(ctx))
You should read "What Every Computer Scientist Should Know About Floating-Point Arithmetic".
http://docs.sun.com/source/806-3568/ncg_goldberg.html
Always assume that floating-point operations will have some error in them, and check for equality taking that error into account (either a relative tolerance like 0.00001% or a fixed value like 0.00000000001). This inaccuracy is a given, since not all decimal numbers can be represented in binary with a fixed number of bits of precision.
Your particular case is not one of them, if Python uses IEEE 754, since 31 should be easily representable even with single precision. It's possible, however, that it loses precision in one of the many steps it takes to calculate log2(2**31), simply because it doesn't have code to detect special cases like a direct power of two.
Floating-point operations are rarely exact. They return a result with an acceptable relative error for the language/hardware infrastructure.
In general, it's quite wrong to assume that floating-point operations are precise, especially with single precision. See the "Accuracy problems" section of the Wikipedia Floating point article :)
IEEE double-precision floating point numbers have 53 bits of precision (52 explicitly stored). Since 10^15 < 2^53 < 10^16, a double has between 15 and 16 significant figures. The result 31.000000000000004 is correct to 16 figures, so it is as good as you can expect.
This is normal. I would expect log10 to be more accurate than log(x, y), since it knows exactly what the base of the logarithm is; also, there may be hardware support for calculating base-10 logarithms.
"floats are imprecise"
I don't buy that argument, because exact powers of two are represented exactly on most platforms (with underlying IEEE 754 floating point).
So if we really want the log2 of an exact power of 2 to be exact, we can.
I'll demonstrate it in Squeak Smalltalk, because it is easy to change the base system in that language, but the language does not really matter: floating-point computations are universal, and Python's object model is not that far from Smalltalk's.
For taking the log in base n, there is the log: method defined in Number, which naively uses the natural logarithm ln:
log: aNumber
"Answer the log base aNumber of the receiver."
^self ln / aNumber ln
self ln (take the natural logarithm of the receiver), aNumber ln, and / are three operations that each round their result to the nearest Float, and these rounding errors can accumulate... So the naive implementation is subject to the rounding error you observe, and I guess Python's implementation of the log function is not much different.
((2 raisedTo: 31) log: 2) = 31.000000000000004
But if I change the definition like this:
log: aNumber
"Answer the log base aNumber of the receiver."
aNumber = 2 ifTrue: [^self log2].
^self ln / aNumber ln
provide a generic log2 in Number class:
log2
"Answer the base-2 log of the receiver."
^self asFloat log2
and this refinement in the Float class:
log2
"Answer the base 2 logarithm of the receiver.
Care to answer exact result for exact power of two."
^self significand ln / Ln2 + self exponent asFloat
where Ln2 is a constant (2 ln). Then I effectively get an exact log2 for exact powers of two, because the significand of such a number is 1.0 (including subnormals, for Squeak's exponent/significand definition), and 1.0 ln = 0.0.
The implementation is quite trivial and should translate without difficulty to Python (probably in the VM); the runtime cost is very cheap, so it's just a matter of how important we think this feature is, or is not.
As I always say, the fact that floating-point operation results are rounded to the nearest (or whatever rounding direction) representable value is not a license to waste ulps. Exactness has a cost, both in terms of runtime penalty and implementation complexity, so it's trade-off driven.
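For what it's worth, the same trick translates to a few lines of Python with math.frexp. This is a sketch: it assumes the platform libm rounds ln(0.5) and ln(2) symmetrically, which holds on typical IEEE-754 systems. Note also that Python 3.3+ ships math.log2, which handles exact powers of two exactly:

import math

def log2_pow2(x):
    # frexp splits x into m * 2**e with 0.5 <= m < 1.0; for an exact
    # power of two m == 0.5, so log(m)/log(2) is exactly -1.0
    m, e = math.frexp(x)
    return math.log(m) / math.log(2.0) + e

print(log2_pow2(2.0 ** 31))  # 31.0 exactly
print(math.log2(2 ** 31))    # 31.0 as well (Python 3.3+)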
The representation (float.__repr__) of a number in Python tries to return a string of digits that, when converted back, is as close to the actual stored value as possible, given that IEEE-754 arithmetic is precise only up to a limit. In any case, if you printed the result, you wouldn't notice:
>>> from math import log
>>> log(2**31,2)
31.000000000000004
>>> print log(2**31,2)
31.0
print converts its arguments to strings (in this case, through the float.__str__ method), which caters for the inaccuracy by displaying fewer digits:
>>> log(1000000,2)
19.931568569324174
>>> print log(1000000,2)
19.9315685693
>>> 1.0/10
0.10000000000000001
>>> print 1.0/10
0.1
usuallyuseless' answer is very useful, actually :)
If you wish to calculate the highest power of k that fits in a number n, the code below might be helpful:
import math

answer = math.ceil(math.log(n, k))
while k ** answer > n:
    answer -= 1
NOTE: You shouldn't use if instead of while, because that will give wrong results in some cases, like n=2**51-1 and k=2. In this example, with if the answer is 51, whereas with while the answer is 50, which is correct.
I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using Python 2.7.3 on Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9's at which Python will evaluate the expression above as being equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64-bit floating point value to 1.9999999999999999 is 2.0, because 64-bit floating point values (so-called "double precision") use a 52-bit significand (53 bits of precision counting the implicit leading bit), which is equivalent to about 15-16 decimal places. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (it is 1.9999999999999988897769753748434595763683319091796875 exactly).
I don't actually know whether the use of 64-bit floats specifically is required by the Python language or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places; it is where the closest floating-point value of that size lies relative to your decimal literal. It will be closer for some literals than for others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
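You can see this directly with float.hex(), which shows the exact bits behind each literal:

>>> (1.9999999999999999).hex()   # rounds to the same float as 2.0
'0x1.0000000000000p+1'
>>> (2.0).hex()
'0x1.0000000000000p+1'
>>> (1.999999999999999).hex()    # one fewer 9: lands a few ulps below 2.0
'0x1.ffffffffffffbp+0'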
Type coercion
>>> 2 == 2.0
True
And a consequence of the maximum number of significant digits that can be represented in Python:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
Note the ...68 at the end instead of the expected ...67.
This is due to the way floats are implemented in Python. To keep it short and simple: Since floats almost always are an approximation and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
In more detail, floats are stored in binary, meaning they're stored as fractions to the base 2, unlike decimal notation, where you can display a number as fractions to the base 10. However, most decimal fractions don't have an exact representation in binary. Because of that, they are typically stored with a precision of 53 bits. This can make them troublesome for more complex arithmetic operations, since you'll run into some strange problems, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See the docs on floats as well.
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important to never compare two floats or decimals for equality - instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently low for your purposes.
E.g.:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough
else:
    pass  # they are not
You should probably set EPSILON = 1e-10 and use the symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function.
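On Python 3.5+ the standard library already provides such a comparison function, math.isclose:

>>> import math
>>> math.isclose(2.0, 1.999999999999999)   # default rel_tol is 1e-09
True
>>> math.isclose(2.0, 1.99)
False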
In Python, do either
n**0.5 # or
math.sqrt(n)
recognize when a number is a perfect square? Specifically, should I worry that when I use
int(n**0.5) # instead of
int(n**0.5 + 0.000000001)
I might accidentally end up with the number one less than the actual square root due to precision error?
As several answers have suggested integer arithmetic, I'll recommend the gmpy2 library. It provides functions for checking if a number is a perfect power, calculating integer square roots, and integer square root with remainder.
>>> import gmpy2
>>> gmpy2.is_power(9)
True
>>> gmpy2.is_power(10)
False
>>> gmpy2.isqrt(10)
mpz(3)
>>> gmpy2.isqrt_rem(10)
(mpz(3), mpz(1))
Disclaimer: I maintain gmpy2.
Yes, you should worry:
In [11]: int((100000000000000000000000000000000000**2) ** 0.5)
Out[11]: 99999999999999996863366107917975552L
In [12]: int(math.sqrt(100000000000000000000000000000000000**2))
Out[12]: 99999999999999996863366107917975552L
obviously adding the 0.000000001 doesn't help here either...
As @DSM points out, you can use the decimal library:
In [21]: from decimal import Decimal
In [22]: x = Decimal('100000000000000000000000000000000000')
In [23]: (x ** 2).sqrt() == x
Out[23]: True
This works even for numbers over 10**999999999; provided you keep a check on the (configurable) precision, it'll throw an error rather than give an incorrect answer...
Both **0.5 and math.sqrt() perform the calculation using floating point arithmetic. The input is converted to float before the square root is calculated.
Do these calculations recognize when the input value is a perfect square?
No, they do not. Floating-point arithmetic has no concept of perfect squares.
Large integers may not be representable, for values where the number has more significant digits than available in the floating-point mantissa. It's easy to see, therefore, that for non-representable input values, n**0.5 may be inaccurate. And your proposed fix of adding a small value will not, in general, fix the problem.
If your input is an integer then you should consider performing your calculation using integer arithmetic. That ultimately is the right way to deal with this.
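On Python 3.8+ the standard library offers math.isqrt, an exact integer square root, which makes the perfect-square test trivial:

>>> import math
>>> n = 100000000000000000000000000000000000 ** 2
>>> r = math.isqrt(n)   # exact; no float conversion involved
>>> r * r == n
True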
You can use round(number, significant_figures) before converting to an int. (When converting a float to an integer, int() truncates toward zero rather than rounding.)
In any case, since Python uses floating-point arithmetic, all the usual pitfalls apply. See:
http://docs.python.org/2/tutorial/floatingpoint.html
Perfect-square values will have no fractional components, so your main worry would be very large values, and for such values a difference of 1 or 2 being significant means you're going to want a specific numerical library that supports such high precision (as DSM mentions, the Decimal library, standard since Python 2.4, should be able to do what you want, as it supports arbitrary precision).
http://docs.python.org/library/decimal.html
sqrt is one of the easier math library functions to implement, and any math library of reasonable quality will implement it with faithful rounding (sub-ULP accuracy). If the input is a perfect square, its square root is representable (in a reasonable floating-point format). In this case, faithful rounding guarantees the result is exact.
This addresses only the value actually passed to sqrt. Whether a number can be converted without error from another format to the floating-point input for sqrt is a separate issue.
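A small illustration of that separation, assuming IEEE-754 doubles: sqrt itself is exact for representable perfect squares, but the int-to-float conversion can already destroy the square before sqrt ever runs:

>>> import math
>>> math.sqrt(2.0 ** 52) == 2.0 ** 26   # representable perfect square: exact
True
>>> n = (2 ** 27 + 1) ** 2              # needs 55 bits, more than a double holds
>>> float(n) == n                       # the conversion itself is inexact
False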
So I've decided to try to solve my physics homework by writing some Python scripts to solve problems for me. One problem that I'm running into is that significant figures don't always seem to come out properly. For example, this handles significant figures properly:
>>> from decimal import Decimal
>>> Decimal('1.0') + Decimal('2.0')
Decimal("3.0")
But this doesn't:
>>> Decimal('1.00') / Decimal('3.00')
Decimal("0.3333333333333333333333333333")
So two questions:
Am I right that this isn't the expected amount of significant digits, or do I need to brush up on significant digit math?
Is there any way to do this without having to set the decimal precision manually? Granted, I'm sure I can use numpy to do this, but I just want to know if there's a way to do this with the decimal module out of curiosity.
Changing the decimal working precision to 2 digits is not a good idea, unless you are only ever going to perform a single operation.
You should always perform calculations at higher precision than the level of significance, and only round the final result. If you perform a long sequence of calculations and round to the number of significant digits at each step, errors will accumulate. The decimal module doesn't know whether any particular operation is one in a long sequence, or the final result, so it assumes that it shouldn't round more than necessary. Ideally it would use infinite precision, but that is too expensive so the Python developers settled for 28 digits.
Once you've arrived at the final result, what you probably want is quantize:
>>> (Decimal('1.00') / Decimal('3.00')).quantize(Decimal("0.001"))
Decimal("0.333")
You have to keep track of significance manually. If you want automatic significance tracking, you should use interval arithmetic. There are some libraries available for Python, including pyinterval and mpmath (which supports arbitrary precision). It is also straightforward to implement interval arithmetic with the decimal library, since it supports directed rounding.
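As a sketch of that last point, using only decimal's documented directed-rounding modes (two contexts bracketing the true result; illustrative, not a full interval library):

>>> from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING
>>> lo = Context(prec=28, rounding=ROUND_FLOOR)
>>> hi = Context(prec=28, rounding=ROUND_CEILING)
>>> lo.divide(Decimal('1.00'), Decimal('3.00'))
Decimal("0.3333333333333333333333333333")
>>> hi.divide(Decimal('1.00'), Decimal('3.00'))
Decimal("0.3333333333333333333333333334")

The true quotient is guaranteed to lie between the two results.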
You may also want to read the Decimal Arithmetic FAQ: Is the decimal arithmetic ‘significance’ arithmetic?
Decimals won't throw away decimal places like that. If you really want to limit precision to 2 d.p. then try
decimal.getcontext().prec=2
EDIT: You can alternatively call quantize() every time you multiply or divide (addition and subtraction will preserve the 2 dps).
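For example, mirroring the recipe in the decimal module's documentation:

>>> from decimal import Decimal
>>> TWOPLACES = Decimal('0.01')
>>> (Decimal('1.30') * Decimal('1.20')).quantize(TWOPLACES)
Decimal("1.56")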
Just out of curiosity... is it necessary to use the decimal module? Why not use floating point, with significant-figures rounding of numbers when you are ready to see them? Or are you trying to keep track of the significant figures of the computation (like when you have to do an error analysis of a result, calculating the computed error as a function of the uncertainties that went into the calculation)? If you want a rounding function that rounds from the left of the number instead of the right, try:
def lround(x, leadingDigits=0):
    """Return x either as 'print' would show it (the default)
    or rounded to the specified digit as counted from the leftmost
    non-zero digit of the number, e.g. lround(0.00326, 2) --> 0.0033
    """
    assert leadingDigits >= 0
    if leadingDigits == 0:
        return float(str(x))  # just give it back like 'print' would give it
    # %.*e keeps one digit before the point, so ask for leadingDigits-1 after it
    return float('%.*e' % (int(leadingDigits) - 1, x))
The numbers will look right when you print them or convert them to strings, but if you are working at the prompt and don't explicitly print them they may look a bit strange:
>>> lround(1./3.,2),str(lround(1./3.,2)),str(lround(1./3.,4))
(0.33000000000000002, '0.33', '0.3333')
Decimal defaults to 28 significant digits of precision.
The only way to limit the number of digits it returns is by altering the precision.
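If you only need the reduced precision for one block of code, decimal.localcontext keeps the change from leaking into the rest of the program:

>>> from decimal import Decimal, localcontext
>>> with localcontext() as ctx:
...     ctx.prec = 2
...     print(Decimal('1.00') / Decimal('3.00'))
...
0.33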
What's wrong with floating point?
>>> "%8.2e"% ( 1.0/3.0 )
'3.33e-01'
It was designed for scientific-style calculations with a limited number of significant digits.
If I understand Decimal correctly, the "precision" is the number of digits after the decimal point in decimal notation.
You seem to want something else: the number of significant digits. That is one more than the number of digits after the decimal point in scientific notation.
I would be interested in learning about a Python module that does significant-digits-aware floating point computations.