Can someone explain why the following three examples are not all equal?
ipdb> Decimal(71.60) == Decimal(71.60)
True
ipdb> Decimal('71.60') == Decimal('71.60')
True
ipdb> Decimal(71.60) == Decimal('71.60')
False
Is there a general 'correct' way to create Decimal objects in Python? (ie, as strings or as floats)
Floating point numbers, what are used by default, are in base 2. 71.6 can't be accurately represented in base 2. (Think of numbers like 1/3 in base 10).
Because of this, they will be converted to be as many decimal places as the floating point can represent. Because the number 71.6 in base 2 would go on forever and you almost certainly don't have infinate memory to play with, the computer decides to represent it (well, is told to) in a fewer number of bits.
If you were to use a string instead, the program can use an algorithm to convert it exactly instead of starting from the dodgy rounded floating point number.
>>> decimal.Decimal(71.6)
Decimal('71.599999999999994315658113919198513031005859375')
Compared to
>>> decimal.Decimal("71.6")
Decimal('71.6')
However, if your number is representable exactly as a float, it is just as accurate as a string
>>> decimal.Decimal(71.5)
Decimal('71.5')
Normally Decimal is used to avoid the floating point precision problem. For example, the float literal 71.60 isn't mathematically 71.60, but a number very close to it.
As a result, using float to initialize Decimal won't avoid the problem. In general, you should use strings to initialize Decimal.
Related
I'm starting with Python and I recently came across a dataset with big values.
One of my fields has a list of values that looks like this: 1.3212724310201994e+18 (note the e+18 by the end of the number).
How can I convert it to a floating point number and remove the the exponent without affecting the value?
First of all, the number is already a floating point number, and you do not need to change this. The only issue is that you want to have more control over how it is converted to a string for output purposes.
By default, floating point numbers above a certain size are converted to strings using exponential notation (with "e" representing "*10^"). However, if you want to convert it to a string without exponential notation, you can use the f format specifier, for example:
a = 1.3212724310201994e+18
print("{:f}".format(a))
gives:
1321272431020199424.000000
or using "f-strings" in Python 3:
print(f"{a:f}")
here the first f tells it to use an f-string and the :f is the floating point format specifier.
You can also specify the number of decimal places that should be displayed, for example:
>>> print(f"{a:.2f}") # 2 decimal places
1321272431020199424.00
>>> print(f"{a:.0f}") # no decimal places
1321272431020199424
Note that the internal representation of a floating-point number in Python uses 53 binary digits of accuracy (approximately one part in 10^16), so in this case, the value of your number of magnitude approximately 10^18 is not stored with accuracy down to the nearest integer, let alone any decimal places. However, the above gives the general principle of how you control the formatting used for string conversion.
You can use Decimal from the decimal module for each element of your data:
from decimal import Decimal
s = 1.3212724310201994e+18
print(Decimal(s))
Output:
1321272431020199424
I need to write a simple program that calculates a mathematical formula.
The only problem here is that one of the variables can take the value 10^100.
Because of this I can not write this program in C++/C (I can't use external libraries like gmp).
Few hours ago I read that Python is capable of calculating such values.
My question is:
Why
print("%.10f"%(10.25**100))
is returning the number "118137163510621843218803309161687290343217035128100169109374848108012122824436799009169146127891562496.0000000000"
instead of
"118137163510621850716311252946961817841741635398513936935237985161753371506358048089333490072379307296.453937046171461"?
By default, Python uses a fixed precision floating-point data type to represent fractional numbers (just like double in C). You can work with precise rational numbers, though:
>>> from fractions import Fraction
>>> Fraction("10.25")
Fraction(41, 4)
>>> x = Fraction("10.25")
>>> x**100
Fraction(189839102486063226543090986563273122284619337618944664609359292215966165735102377674211649585188827411673346619890309129617784863285653302296666895356073140724001, 1606938044258990275541962092341162602522202993782792835301376)
You can also use the decimal module if you want arbitrary precision decimals (only numbers that are representable as finite decimals are supported, though):
>>> from decimal import *
>>> getcontext().prec = 150
>>> Decimal("10.25")**100
Decimal('118137163510621850716311252946961817841741635398513936935237985161753371506358048089333490072379307296.453937046171460995169093650913476028229144848989')
Python is capable of handling arbitrarily large integers, but not floating point values. They can get pretty large, but as you noticed, you lose precision in the low digits.
I'm wondering what causes this behaviour. I haven't been able to find an answer that covers this. It is probably something simple and obvious, but it is not to me. I am using python 2.7.3 in Ubuntu.
In [1]: 2 == 1.9999999999999999
Out[1]: True
In [2]: 2 == 1.999999999999999
Out[2]: False
EDIT:
To clarify my question. Is there a written(in documentation) max number of 9's where python will evaluate the expression above as being equal to 2?
Python uses floating point representation
What a floating point actually is, is a fixed-width binary number (called the "significand") plus a small integer to tell you how many powers of two to shift that value by (the "exponent"). Plus a sign bit. Just like scientific notation, but in base 2 instead of 10.
The closest 64 bit floating point value to 1.9999999999999999 is 2.0, because 64 bit floating point values (so-called "double precision") uses 52 bits of significand, which is equivalent to about 15 decimal places. So the literal 1.9999999999999999 is just another way of writing 2.0. However, the closest value to 1.999999999999999 is less than 2.0 (I think it's 1.9999999999999988897769753748434595763683319091796875 exactly, but I'm too lazy to check that's correct, I'm just relying on Python's formatting code to be exact).
I don't actually know whether the use specifically of 64 bit floats is required by the Python language, or is an implementation detail of CPython. But whatever size is used, the important thing is not specifically the number of decimal places, it is where the closest floating-point value of that size lies to your decimal literal. It will be closer for some literals than others.
Hence, 1.9999999999999999 == 2 for the same reason that 2.0 == 2 (Python allows mixed-type numeric operations including comparison, and the integer 2 is equal to the float 2.0). Whereas 1.999999999999999 != 2.
Types coercion
>>> 2 == 2.0
True
And consequences of maximum number of digits that can be represented in python :
>>> import sys
>>> sys.float_info.dig
15
>>> 1.9999999999999999
2.0
more from docs
>>> float('9876543211234567')
9876543211234568.0
note ..68 on the end instead of expected ..67
This is due to the way floats are implemented in Python. To keep it short and simple: Since floats almost always are an approximation and thus have more digits than most people find useful, the Python interpreter displays a rounded value.
More detailed, floats are stored in binary. This means that they're stored as fractions to the base 2, unlike decimal, were you can display a float as fractions to the base 10. However, most decimal fractions don't have an exact representation in binary. Because of that, they are typically stored with a precision of 53 bits. This renders them pretty much useless if you want to do more complex arithmetic operations, since you'll run into some strange problems, e. g.:
>>> 0.1 + 0.2
0.30000000000000004
>>> round(2.675, 2)
2.67
See The docs on floats as well.
Mathematically speaking, 2.0 does equal 1.9999... forever. They are two different ways of writing the same number.
However, in software, it's important to never compare two floats or decimals for equality - instead, subtract them, take the absolute value, and verify that the (always positive) difference is sufficiently low for your purposes.
EG:
if abs(value1 - value2) < 1e10:
# they are close enough
else:
# they are not
You probably should set EPSILON = 1e10, and use the symbolic constant instead of scattering 1e10 throughout your code, or better still use a comparison function.
x = 4.2 - 0.1
vb.net gives 4.1000000000000005
python gives 4.1000000000000005
Excel gives 4.1
Google calc gives 4.1
What is the reason this happens?
Float/double precision.
You must remember that in binary, 4.1 = 4 + 1/10. 1/10 is an infinitely repeating sum in binary, much like 1/9 is an infinite sum in decimal.
>>> x = 4.2 - 0.1
>>> x
4.1000000000000005
>>>>print(x)
4.1
This happens because of how numbers are stored internally.
Computers represent numbers in binary, instead of decimal, as us humans are used to. With floating point numbers, computers have to make an approximation to the closest binary floating point value.
Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form J/2***N* where J is an integer containing exactly 53 bits.
If you print the number, it will show the approximation, truncated to a normal value. For example, the real value of 0.1 is 0.1000000000000000055511151231257827021181583404541015625.
If you really need a base 10 based number (if you don't know the answer to this question, you don't), you could use (in Python) decimal.Decimal:
>>> from decimal import Decimal
>>> Decimal("4.2") - Decimal("0.1")
Decimal("4.1")
Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail below, in the “Representation Error” section. See The Perils of Floating Point for a more complete account of other common surprises.
As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic, and that every float operation can suffer a new rounding error.
While pathological cases do exist, for most casual use of floating-point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. str() usually suffices, and for finer control see the str.format() method’s format specifiers in Format String Syntax.
There is no problem, really. It is just the way floats work (their internal binary representation). Anyway:
>>> from decimal import Decimal
>>> Decimal('4.2')-Decimal('0.1')
Decimal('4.1')
In vb.net, you can avoid this problem by using Decimal type instead:
Dim x As Decimal = 4.2D - 0.1D
The result is 4.1 .
So I've decided to try to solve my physics homework by writing some python scripts to solve problems for me. One problem that I'm running into is that significant figures don't always seem to come out properly. For example this handles significant figures properly:
from decimal import Decimal
>>> Decimal('1.0') + Decimal('2.0')
Decimal("3.0")
But this doesn't:
>>> Decimal('1.00') / Decimal('3.00')
Decimal("0.3333333333333333333333333333")
So two questions:
Am I right that this isn't the expected amount of significant digits, or do I need to brush up on significant digit math?
Is there any way to do this without having to set the decimal precision manually? Granted, I'm sure I can use numpy to do this, but I just want to know if there's a way to do this with the decimal module out of curiosity.
Changing the decimal working precision to 2 digits is not a good idea, unless you absolutely only are going to perform a single operation.
You should always perform calculations at higher precision than the level of significance, and only round the final result. If you perform a long sequence of calculations and round to the number of significant digits at each step, errors will accumulate. The decimal module doesn't know whether any particular operation is one in a long sequence, or the final result, so it assumes that it shouldn't round more than necessary. Ideally it would use infinite precision, but that is too expensive so the Python developers settled for 28 digits.
Once you've arrived at the final result, what you probably want is quantize:
>>> (Decimal('1.00') / Decimal('3.00')).quantize(Decimal("0.001"))
Decimal("0.333")
You have to keep track of significance manually. If you want automatic significance tracking, you should use interval arithmetic. There are some libraries available for Python, including pyinterval and mpmath (which supports arbitrary precision). It is also straightforward to implement interval arithmetic with the decimal library, since it supports directed rounding.
You may also want to read the Decimal Arithmetic FAQ: Is the decimal arithmetic ‘significance’ arithmetic?
Decimals won't throw away decimal places like that. If you really want to limit precision to 2 d.p. then try
decimal.getcontext().prec=2
EDIT: You can alternatively call quantize() every time you multiply or divide (addition and subtraction will preserve the 2 dps).
Just out of curiosity...is it necessary to use the decimal module? Why not floating point with a significant-figures rounding of numbers when you are ready to see them? Or are you trying to keep track of the significant figures of the computation (like when you have to do an error analysis of a result, calculating the computed error as a function of the uncertainties that went into the calculation)? If you want a rounding function that rounds from the left of the number instead of the right, try:
def lround(x,leadingDigits=0):
"""Return x either as 'print' would show it (the default)
or rounded to the specified digit as counted from the leftmost
non-zero digit of the number, e.g. lround(0.00326,2) --> 0.0033
"""
assert leadingDigits>=0
if leadingDigits==0:
return float(str(x)) #just give it back like 'print' would give it
return float('%.*e' % (int(leadingDigits),x)) #give it back as rounded by the %e format
The numbers will look right when you print them or convert them to strings, but if you are working at the prompt and don't explicitly print them they may look a bit strange:
>>> lround(1./3.,2),str(lround(1./3.,2)),str(lround(1./3.,4))
(0.33000000000000002, '0.33', '0.3333')
Decimal defaults to 28 places of precision.
The only way to limit the number of digits it returns is by altering the precision.
What's wrong with floating point?
>>> "%8.2e"% ( 1.0/3.0 )
'3.33e-01'
It was designed for scientific-style calculations with a limited number of significant digits.
If I undertand Decimal correctly, the "precision" is the number of digits after the decimal point in decimal notation.
You seem to want something else: the number of significant digits. That is one more than the number of digits after the decimal point in scientific notation.
I would be interested in learning about a Python module that does significant-digits-aware floating point point computations.