Python Shell - "Extras" in float subtraction [duplicate]

Possible Duplicate:
Floating Point Limitations
Using Python 2.7 here.
Can someone explain why this happens in the shell?
>>> 5.2-5.0
0.20000000000000018
Searching yielded things about different scales of numbers not producing the right results (a very small number and a very large number), but that seemed pretty general, and considering the numbers I'm using are of the same scale, I don't think that's why this happens.
EDIT: I suppose I didn't make clear that by "this happening" I meant that it returns 0.2 ... 018 instead of simply 0.2. I get that print rounds, and I removed the print part of the code snippet, as that was misleading.

You need to understand that 5.2-5.0 really is 0.20000000000000018, not 0.2. The standard explanation for this is found in What Every Computer Scientist Should Know About Floating-Point Arithmetic.
If you don't want to read all of that, just accept that 5.2, 5.0, and 0.20000000000000018 are all just approximations, as close as the computer can get to the numbers you really want.
Python has some tricks to allow you to not know what every computer scientist should know and still get away with it. The main trick is that str(f)—that is, the human-readable rendition of a floating-point number—is truncated to 12 significant digits, so str(5.2-5.0) is "0.2", not "0.20000000000000018". But sometimes you need all the precision you can get, so repr(f)—that is, the machine-readable rendition—is not truncated, so repr(5.2-5.0) is "0.20000000000000018".
Now the only thing left to understand is what the interpreter shell does. As Ashwini Chaudhary explains, just evaluating something in the shell prints out its repr, while the print statement prints out its str.

The shell uses repr():
In [1]: print repr(5.2-5.0)
0.20000000000000018
In [2]: print str(5.2-5.0)
0.2
In [3]: print 5.2-5.0
0.2

The default implementation of float.__str__ limits the output to 12 digits only.
Thus, the least significant digits are dropped and what is left is the value 0.2.
To print more digits (if available), use string formatting:
result = 5.2 - 5.0
print '%f' % result # prints 0.200000
That defaults to 6 digits, but you can specify more precision:
print '%.16f' % result # prints 0.2000000000000002
Alternatively, python offers a newer string formatting method too:
print '{0:.16f}'.format(result) # prints 0.2000000000000002
Why python produces the 'imprecise' result in the first place has everything to do with the imprecise nature of floating point arithmetic. Use the decimal module instead if you need more predictable precision:
>>> from decimal import *
>>> getcontext().prec = 1
>>> Decimal(5.2) - Decimal(5.0)
Decimal('0.2')
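For what it's worth, if you construct the Decimal values from strings rather than from floats, the binary approximation never enters the picture and no precision tweak is needed (a quick illustrative check):
>>> from decimal import Decimal
>>> Decimal('5.2') - Decimal('5.0')
Decimal('0.2')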

Python has two different ways of converting an object to a string, the __str__ and __repr__ methods. __str__ is meant to be a normal string output and is used by print; __repr__ is meant to be a more exact representation and is what is displayed when you don't use print, or when you print the contents of a list or dictionary. __str__ rounds floating-point values.
As for why the actual result of the subtraction is 0.20000000000000018 rather than 0.2 exactly, it has to do with the internal representation of floating point. It's impossible to represent 5.2 exactly because it's an infinitely repeating binary number. The closest that you can come is approximately 5.20000000000000018.
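You can inspect the exact value that actually gets stored for the literal 5.2; here is a quick illustrative check using the standard fractions module and printf-style formatting:
>>> from fractions import Fraction
>>> Fraction(5.2)              # the exact rational value of the stored float
Fraction(5854679515581645, 1125899906842624)
>>> '%.20f' % 5.2              # the same value written out in decimal
'5.20000000000000017764'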

Related

Loss of precision float in python

I have a list called scores of varying -log probabilities.
when I run this line:
maxState = scores.pop(scores.index(max(scores)))
and print maxState, I realize that the maxState loses its precision as a float. Is there a way I can get the maxState without losing precision?
ex: I print out the list scores: [-35.7971525669589, -34.67875545008369]
and print maxState, I get this: -34.6787554501
(You can see it's rounded)
You are confusing string presentation with actual contents. Nowhere is precision lost; only the string produced for your console uses a rounded value rather than showing you all digits. And always remember that float numbers are digital approximations, not precise values.
Python floats are formatted differently when using the str() and repr() functions; in a list or other container, repr() is used, but print it directly and str() is used.
If you don't like either option, format it explicitly with the format() function, specifying a precision:
print format(maxState, '.12f')
to print it with 12 decimals, for example.
Demo:
>>> maxState = -34.67875545008369
>>> repr(maxState)
'-34.67875545008369'
>>> str(maxState)
'-34.6787554501'
>>> format(maxState, '.8f')
'-34.67875545'
>>> format(maxState, '.12f')
'-34.678755450084'
The repr() output is roughly equivalent to using '.17g' as the format, while str() is equivalent to '.12g'; here the precision denotes when to use scientific notation (e) and when to display in floating point notation (f).
I say roughly because the repr() output aims to give you round-trippable output; see the change notes for Python 3.1 on float() representation, which were backported to Python 2.7:
What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.
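In short, the float in your list and maxState are exactly the same value; only the display differs. A quick check with the numbers from your question:
>>> scores = [-35.7971525669589, -34.67875545008369]
>>> maxState = max(scores)
>>> maxState == -34.67875545008369
True
>>> print maxState    # str() rounds the display, not the stored value
-34.6787554501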

1 == 2 for large numbers of 1

I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question: is there a documented maximum number of 9's at which Python will evaluate the expression above as equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64 bit floating point value to 1.9999999999999999 is 2.0, because 64 bit floating point values (so-called "double precision") uses 52 bits of significand, which is equivalent to about 15 decimal places. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (I think it's 1.9999999999999988897769753748434595763683319091796875 exactly, but I'm too lazy to check that's correct, I'm just relying on Python's formatting code to be exact).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
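You can check which float each literal actually lands on (this also confirms the exact value quoted above); for example, with the decimal module:
>>> from decimal import Decimal
>>> Decimal.from_float(1.9999999999999999)
Decimal('2')
>>> Decimal.from_float(1.999999999999999)
Decimal('1.9999999999999988897769753748434595763683319091796875')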
Type coercion
>>> 2 == 2.0
True
And a consequence of the maximum number of decimal digits a float can reliably represent in Python:
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
More from the docs:
>>> float('9876543211234567')
9876543211234568.0
note the ..68 at the end instead of the expected ..67
This is due to the way floats are implemented in Python. To keep it short and simple: since floats are almost always an approximation, and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
In more detail, floats are stored in binary. This means that they're stored as fractions to the base 2, unlike decimal notation, where you can write a value as fractions to the base 10. Most decimal fractions don't have an exact representation in binary, so floats are stored with a fixed precision of 53 bits. This can produce surprising results when you do arithmetic with them, e.g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See The docs on floats as well.
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software it is rarely a good idea to compare two floats for equality. Instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently small for your purposes.
E.g.:
if abs(value1 - value2) < 1e-10:
    pass  # they are close enough
else:
    pass  # they are not
You probably should set EPSILON = 1e-10 and use the symbolic constant instead of scattering 1e-10 throughout your code, or better still use a comparison function.
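A minimal sketch of such a comparison function (the name and tolerance here are just illustrative; on Python 3.5+ the standard math.isclose does a similar comparison with relative and absolute tolerances):
EPSILON = 1e-10  # pick a tolerance that is meaningful for your data

def nearly_equal(a, b, eps=EPSILON):
    """Return True if a and b differ by less than eps."""
    return abs(a - b) < eps

print nearly_equal(2.0, 1.9999999999999999)   # True (both literals map to the same float)
print nearly_equal(2.0, 1.999)                # False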

Numpy too weak to calculate a precise mean value

This question is very similar to this post - but not exactly
I have some data in a .csv file. The data has precision to the 4th digit (#.####).
Calculating the mean in Excel or SAS gives a result with precision to 5th digit (#.#####) but using numpy gives:
import numpy as np
data = np.recfromcsv(path2file, delimiter=';', names=['measurements'], dtype=np.float64)
rawD = data['measurements']
print np.average(rawD)
gives a number like this
#.#####999999999994
Clearly something is wrong..
using
from math import fsum
print fsum(rawD.ravel())/rawD.size
gives
#.#####
Is there anything in the np.average that I set wrong _______?
BONUS info:
I'm only working with 200 data points in the array
UPDATE
I thought I should make my case more clear.
I have numbers like 4.2730 in my csv (giving a 4 decimal precision - even though the 4th always is zero [not part of the subject so don't mind that])
Calculating an average/mean by numpy gives me this
4.2516499999999994
Which gives a print by
>>>print "%.4f" % np.average(rawD)
4.2516
Doing the same thing in Excel or SAS gives me this:
4.2517
Which I actually believe as being the true average value because it finds it to be 4.25165.
This code also illustrate it:
answer = 0
for number in rawD:
    answer += int(number*1000)
print answer/2
425165
So how do I tell np.average() to calculate this value ___?
I'm a bit surprised that numpy did this to me... I thought I only needed to worry when dealing with 16-digit numbers. I didn't expect a round-off at the 4th decimal place to be influenced by this.
I know I could use
fsum(rawD.ravel())/rawD.size
But I also have other things (like std) I want to calculate with the same precision
UPDATE 2
I thought I could make a temp solution by
>>>print "%.4f" % np.float64("%.5f" % np.mean(rawD))
4.2416
Which did not solve the case. Then I tried
>>>print "%.4f" % float("4.24165")
4.2416
AHA! There is a bug in the formatter: Issue 5118
To be honest I don't care if python stores 4.24165 as 4.241649999... It's still a round off error - NO MATTER WHAT.
If the interpreter can figure out how to display the number
>>> print float("4.24165")
4.24165
then the formatter should as well, and deal with that number when rounding.
It still doesn't change the fact that I have a round off problem (now both with the formatter and numpy)
In case you need some numbers to help me out then I have made this modified .csv file:
Download it from here
(I'm aware that this file does not have the number of digits I explained earlier and that the average gives ..9988 at the end instead of ..9994 - it's modified)
Guess my question boils down to how do I get a string output like the one Excel gives me if I use =average()
and have it round off correctly if I choose to show only 4 digits
I know that this might seem strange for some.. But I have my reasons for wanting to reproduce the behavior of Excel.
Any help would be appreciated, thank you.
To get exact decimal numbers, you need to use decimal arithmetic instead of binary. Python provides the decimal module for this.
If you want to continue to use numpy for the calculations and simply round the result, you can still do this with decimal. You do it in two steps, rounding to a large number of digits to eliminate the accumulated error, then rounding to the desired precision. The quantize method is used for rounding.
from decimal import Decimal,ROUND_HALF_UP
ten_places = Decimal('0.0000000001')
four_places = Decimal('0.0001')
mean = 4.2516499999999994
print Decimal(mean).quantize(ten_places).quantize(four_places, rounding=ROUND_HALF_UP)
4.2517
The result of np.average is a double (64-bit float). When you print it, by default enough digits are shown to identify the exact stored value, which is why you see the trailing ...9994. What you see here is the result of limited binary precision, which is not a problem of numpy but a general computing problem. When you care about the presentation of your float value, use "%.4f" % avg_val. There is also a package for rational numbers, to avoid representing fractions as real numbers, but I guess that's not what you're looking for.
For your second snippet, summing all the values by hand and then dividing: I suppose you're using Python 2.7 and all your summed values are integers. In that case you get integer division, which truncates everything after the dot, resulting in another integer value.
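For example, in Python 2 (the numbers here are just illustrative):
>>> 850331 / 2        # int / int truncates in Python 2
425165
>>> 850331 / 2.0      # make one operand a float to get the true quotient
425165.5
>>> from __future__ import division
>>> 850331 / 2
425165.5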

Why is Ruby's Float#round behavior different than Python's?

"Behavior of “round” function in Python" observes that Python rounds floats like this:
>>> round(0.45, 1)
0.5
>>> round(1.45, 1)
1.4
>>> round(2.45, 1)
2.5
>>> round(3.45, 1)
3.5
>>> round(4.45, 1)
4.5
>>> round(5.45, 1)
5.5
>>> round(6.45, 1)
6.5
>>> round(7.45, 1)
7.5
>>> round(8.45, 1)
8.4
>>> round(9.45, 1)
9.4
The accepted answer confirms this is caused by the binary representation of floats being inaccurate, which is all logical.
Assuming that Ruby floats are just as inaccurate as Python's, how come Ruby floats round like a human would? Does Ruby cheat?
1.9.3p194 :009 > 0.upto(9) do |n|
1.9.3p194 :010 > puts (n+0.45).round(1)
1.9.3p194 :011?> end
0.5
1.5
2.5
3.5
4.5
5.5
6.5
7.5
8.5
9.5
Summary
Both implementations confront the same issues surrounding binary floating-point numbers.
Ruby operates directly on the floating point number with simple operations (multiply by a power of ten, adjust, and truncate).
Python converts the binary floating point number to a string using David Gay's sophisticated algorithm that yields the shortest decimal representation that is exactly equal to the binary floating point number. This does not do any additional rounding, it is an exact conversion to a string.
With the shortest string representation in hand, Python rounds to the appropriate number of decimal places using exact string operations. The goal of the float-to-string conversion is to attempt to "undo" some of the binary floating point representation error (i.e. if you enter 6.6, Python rounds on the 6.6 rather than on 6.5999999999999996).
In addition, Ruby differs from some versions of Python in rounding modes: round-away-from-zero versus round-half-even.
Detail
Ruby doesn't cheat. It starts with plain old binary floating point numbers the same as Python does. Accordingly, it is subject to some of the same challenges (such as 3.35 being represented as slightly more than 3.35 and 4.35 being represented as slightly less than 4.35):
>>> Decimal.from_float(3.35)
Decimal('3.350000000000000088817841970012523233890533447265625')
>>> Decimal.from_float(4.35)
Decimal('4.3499999999999996447286321199499070644378662109375')
The best way to see the implementation differences is to look at the underlying source code:
Here's a link to the Ruby source code: https://github.com/ruby/ruby/blob/trunk/numeric.c#L1587
The Python source starts here: http://hg.python.org/cpython/file/37352a3ccd54/Python/bltinmodule.c
and finishes here: http://hg.python.org/cpython/file/37352a3ccd54/Objects/floatobject.c#l1080
The latter has an extensive comment that reveals the differences between the two implementations:
The basic idea is very simple: convert and round the double to a
decimal string using _Py_dg_dtoa, then convert that decimal string
back to a double with _Py_dg_strtod. There's one minor difficulty:
Python 2.x expects round to do round-half-away-from-zero, while
_Py_dg_dtoa does round-half-to-even. So we need some way to detect and correct the halfway cases.
Detection: a halfway value has the form k * 0.5 * 10**-ndigits for
some odd integer k. Or in other words, a rational number x is exactly
halfway between two multiples of 10**-ndigits if its 2-valuation is
exactly -ndigits-1 and its 5-valuation is at least
-ndigits. For ndigits >= 0 the latter condition is automatically satisfied for a binary float x, since any such float has nonnegative
5-valuation. For 0 > ndigits >= -22, x needs to be an integral
multiple of 5**-ndigits; we can check this using fmod. For -22 >
ndigits, there are no halfway cases: 5**23 takes 54 bits to represent
exactly, so any odd multiple of 0.5 * 10**n for n >= 23 takes at least
54 bits of precision to represent exactly.
Correction: a simple strategy for dealing with halfway cases is to
(for the halfway cases only) call _Py_dg_dtoa with an argument of
ndigits+1 instead of ndigits (thus doing an exact conversion to
decimal), round the resulting string manually, and then convert back
using _Py_dg_strtod.
In short, Python 2.7 goes to great lengths to accurately follow a round-half-away-from-zero rule.
In Python 3.3, it goes to equally great lengths to accurately follow a round-half-to-even rule.
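You can see the two rules on exactly representable halfway cases (0.5, 1.5 and 2.5 have exact binary representations, so no representation error muddies the picture). On Python 2.7:
>>> [round(x) for x in (0.5, 1.5, 2.5)]
[1.0, 2.0, 3.0]
On Python 3.3+:
>>> [round(x) for x in (0.5, 1.5, 2.5)]
[0, 2, 2]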
Here's a little additional detail on the _Py_dg_dtoa function. Python calls the float to string function because it implements an algorithm that gives the shortest possible string representation among equal alternatives. In Python 2.6, for example, the number 1.1 shows up as 1.1000000000000001, but in Python 2.7 and later, it is simply 1.1. David Gay's sophisticated dtoa.c algorithm gives "the-result-that-people-expect" without forgoing accuracy.
That string conversion algorithm tends to make up for some of the issues that plague any implementation of round() on binary floating point numbers (i.e. it lets the rounding of 4.35 start from 4.35 instead of from 4.3499999999999996447286321199499070644378662109375).
That and the rounding mode (round-half-even vs round-away-from-zero) are the essential differences between the Python and Ruby round() functions.
The fundamental difference is:
Python: Convert to decimal and then round
Ruby: Round and then convert to decimal
Ruby is rounding it from the original floating point bit string, but after operating on it with a power of ten (10**n). You can't see the original binary value without looking very closely. The values are inexact because they are binary, and we are used to writing in decimal, and as it happens almost all of the decimal fraction strings we are likely to write do not have an exact equivalent as a base 2 fraction string.
In particular, 0.45 looks like this:
01111111101 1100110011001100110011001100110011001100110011001101
In hex, that is 3fdccccccccccccd.
It repeats in binary, the first unrepresented digit is 0xc, and the clever decimal input conversion has accurately rounded this very last fractional digit to 0xd.
This means that inside the machine, the value is greater than 0.45 by a tiny amount, roughly 2**-56 (about 1.1e-17). This is obviously a very, very small number but it's enough to cause the default round-nearest algorithm to round up instead of to the tie-breaker of even.
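You can confirm that excess directly, using the same kind of check as above:
>>> from decimal import Decimal
>>> Decimal.from_float(0.45)
Decimal('0.450000000000000011102230246251565404236316680908203125')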
Both Python and Ruby are potentially rounding more than once as every operation effectively rounds into the least significant bit.
I'm not sure I agree that Ruby does what a human would do. I think Python is approximating what decimal arithmetic would do. Python (depending on version) is applying round-nearest to the decimal string and Ruby is applying the round nearest algorithm to a computed binary value.
Note that we can see here quite clearly the reason people say that FP is inexact. It's a reasonably true statement, but it's more true to say that we simply can't convert accurately between binary and most decimal fractions. (Some do: 0.25, 0.5, 0.75, ...) Most simple decimal numbers are repeating numbers in binary, so we can never store the exact equivalent value. But, every value we can store is known exactly and all arithmetic performed on it is performed exactly. If we wrote our fractions in binary in the first place our FP arithmetic would be considered exact.
Ruby doesn't cheat. It just chose another way to implement round.
In Ruby, 9.45.round(1) is almost equivalent to (9.45*10.0).round / 10.0.
irb(main):001:0> printf "%.20f", 9.45
9.44999999999999928946=> nil
irb(main):002:0> printf "%.20f", 9.45*10.0
94.50000000000000000000=> nil
So
irb(main):003:0> puts 9.45.round(1)
9.5
If we use the same approach in Python, we get 9.5 as well.
>>> round(9.45, 1)
9.4
>>> round(9.45*10)/10
9.5

Significant figures in the decimal module

So I've decided to try to solve my physics homework by writing some python scripts to solve problems for me. One problem that I'm running into is that significant figures don't always seem to come out properly. For example this handles significant figures properly:
>>> from decimal import Decimal
>>> Decimal('1.0') + Decimal('2.0')
Decimal("3.0")
But this doesn't:
>>> Decimal('1.00') / Decimal('3.00')
Decimal("0.3333333333333333333333333333")
So two questions:
Am I right that this isn't the expected amount of significant digits, or do I need to brush up on significant digit math?
Is there any way to do this without having to set the decimal precision manually? Granted, I'm sure I can use numpy to do this, but I just want to know if there's a way to do this with the decimal module out of curiosity.
Changing the decimal working precision to 2 digits is not a good idea, unless you are absolutely sure you will only ever perform a single operation.
You should always perform calculations at higher precision than the level of significance, and only round the final result. If you perform a long sequence of calculations and round to the number of significant digits at each step, errors will accumulate. The decimal module doesn't know whether any particular operation is one in a long sequence, or the final result, so it assumes that it shouldn't round more than necessary. Ideally it would use infinite precision, but that is too expensive so the Python developers settled for 28 digits.
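A small sketch of the effect (illustrative only; the exact trailing digits depend on the context precision):
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 2
>>> (Decimal('1.00') / Decimal('3.00')) * 3     # intermediate result already rounded to 0.33
Decimal('0.99')
>>> getcontext().prec = 28
>>> (Decimal('1.00') / Decimal('3.00')) * 3     # full working precision, round only at the end
Decimal('0.9999999999999999999999999999')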
Once you've arrived at the final result, what you probably want is quantize:
>>> (Decimal('1.00') / Decimal('3.00')).quantize(Decimal("0.001"))
Decimal("0.333")
You have to keep track of significance manually. If you want automatic significance tracking, you should use interval arithmetic. There are some libraries available for Python, including pyinterval and mpmath (which supports arbitrary precision). It is also straightforward to implement interval arithmetic with the decimal library, since it supports directed rounding.
You may also want to read the Decimal Arithmetic FAQ: Is the decimal arithmetic ‘significance’ arithmetic?
Decimals won't throw away decimal places like that. If you really want to limit the precision to 2 significant digits (note that prec counts significant digits, not decimal places), then try
decimal.getcontext().prec = 2
EDIT: You can alternatively call quantize() every time you multiply or divide (addition and subtraction will preserve the 2 decimal places).
Just out of curiosity...is it necessary to use the decimal module? Why not floating point with a significant-figures rounding of numbers when you are ready to see them? Or are you trying to keep track of the significant figures of the computation (like when you have to do an error analysis of a result, calculating the computed error as a function of the uncertainties that went into the calculation)? If you want a rounding function that rounds from the left of the number instead of the right, try:
def lround(x, leadingDigits=0):
    """Return x either as 'print' would show it (the default)
    or rounded to the specified digit as counted from the leftmost
    non-zero digit of the number, e.g. lround(0.00326, 2) --> 0.0033
    """
    assert leadingDigits >= 0
    if leadingDigits == 0:
        return float(str(x))  # just give it back like 'print' would give it
    # %e shows one digit before the point plus 'precision' digits after it,
    # so leadingDigits significant digits means a precision of leadingDigits-1
    return float('%.*e' % (int(leadingDigits) - 1, x))  # give it back as rounded by the %e format
The numbers will look right when you print them or convert them to strings, but if you are working at the prompt and don't explicitly print them they may look a bit strange:
>>> lround(1./3.,2),str(lround(1./3.,2)),str(lround(1./3.,4))
(0.33000000000000002, '0.33', '0.3333')
Decimal defaults to 28 places of precision.
The only way to limit the number of digits it returns is by altering the precision.
What's wrong with floating point?
>>> "%8.2e"% ( 1.0/3.0 )
'3.33e-01'
It was designed for scientific-style calculations with a limited number of significant digits.
If I understand Decimal correctly, its "precision" is the total number of significant digits kept in results, rather than the number of digits after the decimal point.
You seem to want something slightly different: tracking how many of those digits are actually significant for your measurement, which Decimal does not do automatically.
I would be interested in learning about a Python module that does significant-digits-aware floating-point computations.
