My python program manipulates bitcoin amounts precise to 8 decimal places. My intention is to use decimal.Decimal types everywhere, to avoid any floating point precision issues -- but I'm not sure I got every usage.
For quality assurance, I'd like to raise an error if any floats are constructed anywhere in the program. Is this possible in Python 3.5?
(I cannot use integers because I'm interfacing via JSON with other programs that expect decimal values.)
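A partial safeguard, sketched under the assumption that most floats enter the program through JSON: the standard library's json.loads accepts a parse_float hook, so fractional JSON numbers can be turned into Decimal directly. The names loads_decimal, reject_floats and EIGHT_PLACES are hypothetical helpers for illustration. This does not catch float literals or true division elsewhere in the code.

```python
import json
from decimal import Decimal

EIGHT_PLACES = Decimal("0.00000001")  # hypothetical constant: 8 decimal places


def loads_decimal(text):
    """Parse JSON so that numbers with a fractional part become Decimal, not float."""
    return json.loads(text, parse_float=Decimal)


def reject_floats(obj):
    """Recursively raise TypeError if a float is found in parsed data."""
    if isinstance(obj, float):
        raise TypeError("float found: %r" % obj)
    if isinstance(obj, dict):
        for value in obj.values():
            reject_floats(value)
    elif isinstance(obj, (list, tuple)):
        for value in obj:
            reject_floats(value)


# Example payload (made up for illustration).
data = loads_decimal('{"amount": 0.10000001, "fee": 0.0001}')
reject_floats(data)  # passes silently: both numbers are Decimal
total = (data["amount"] + data["fee"]).quantize(EIGHT_PLACES)
print(total)
```

Calling reject_floats at the boundaries of the program (after parsing, before serializing) gives a runtime check without having to patch the float type itself.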
We are trying to convert some qbasic scripts into python scripts.
The scripts are used to generate some reports. Generally the reports generated by qbasic and python scripts should be exactly same.
While generating a report we need to format a floating point number in a particular format.
We use the following commands for formatting the number.
For QBASIC, we use
PRINT USING "########.###"; VAL(MYNUM$)
For Python, we use
print('{:12.3f}'.format(mynum))
where MYNUM$ and mynum hold the floating-point value.
But in certain cases, the formatted value differs between python and qbasic.
The results are as follows:
Can anyone help me to sort out this problem and make the python formatting work like qbasic?
This seems to be related to the data type used (maybe a 32-bit float in QBASIC and a 64-bit float in Python) and to how rounding is implemented. For example, when you use:
from ctypes import c_float
from math import floor
print(floor(c_float(mynum).value*1000+.5)/1000)
c_float converts the Python float to a C float (single precision).
This gives me exactly the same numbers in Python as in QBASIC.
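The same idea can be wrapped in a small helper. This is only a sketch: the function name round_like_single is made up, and it rests on the guess above that QBASIC works in 32-bit floats and rounds half up.

```python
from ctypes import c_float
from math import floor


def round_like_single(value, places=3):
    """Round half up after narrowing the value to a 32-bit C float."""
    scale = 10 ** places
    # Nearest single-precision value, widened back to a Python float.
    single = c_float(value).value
    return floor(single * scale + 0.5) / scale


# Same field width and precision as the format string in the question.
print('{:12.3f}'.format(round_like_single(1.2344)))
```

Values that are exactly representable in single precision, such as 0.5, pass through c_float unchanged; values such as 0.1 do not, which is where the two languages can diverge.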
For relatively simple floats, the numerical precision is sufficient to represent them exactly. For example, 17.5 is equal to 17.5
For more complicated floats, such as
17.4999999999999982236431605997495353221893310546874 = 17.499999999999996447286321199499070644378662109375
17.4999999999999982236431605997495353221893310546875 = 17.5
Using as_integer_ratio() on the first number above, one obtains (4925812092436479, 281474976710656) and since (4925812092436479*2+1)/(2*281474976710656) equals the second number above, it becomes evident that the partition between >=17.5 and <17.5 is 1/(2*281474976710656).
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent? If there is a guarantee, how is it decided?
For the above I used, python 3.5.6, but I am interested in the general answer for python 3.x if it exists.
For relatively simple floats, the numerical precision is sufficient to represent them exactly
Not really. Yes, 17.5 can be represented exactly because it is a multiple of a power of two (a multiple of 2^-1, to be exact). But even very simple decimals like 0.1 cannot be represented exactly. In those cases it is up to the text-to-float conversion routine to produce a representation that is as close as possible.
The conversion is done by the language runtime (or, for literals, by the C or Java runtime used by the compiler), which relies on library functions such as C's strtod(). Java implements the code of David Gay's strtod(), but in Java.
Not every implementation of strtod(), i.e. not every C/Java compiler uses the same methodology to convert, so there may be slight, usually insignificant differences in some of the results.
FWIW, the website Exploring Binary (no affiliation, I'm just a big fan) has many articles on this subject. It is obviously not as simple as expected.
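The two boundary literals from the question can be inspected exactly by converting the resulting floats to Decimal, which shows the stored binary value digit for digit. This is a quick check, assuming a Python build with correctly rounded string-to-float conversion (the variable names are only for illustration):

```python
from decimal import Decimal

below = float("17.4999999999999982236431605997495353221893310546874")
tie = float("17.4999999999999982236431605997495353221893310546875")

print(Decimal(below))             # exact value stored for the first literal
print(Decimal(tie))               # exact value stored for the second literal
print(below.as_integer_ratio())   # numerator and power-of-two denominator
```

The second literal lies exactly halfway between two adjacent doubles, so the conversion has to apply a tie-breaking rule to choose between them.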
For relatively simple floats, the numerical precision is sufficient to represent them exactly.
No, even simple decimals don't necessarily have an exact IEEE-754 representation:
>>> format(0.1, '.20f')
'0.10000000000000000555'
>>> format(0.2, '.20f')
'0.20000000000000001110'
>>> format(0.3, '.20f')
'0.29999999999999998890'
>>> format(0.1 + 0.2, '.20f')
'0.30000000000000004441'
Multiples of powers of 2 (x.0, x.5, x.25, x.125, …) are exactly representable, as long as they fit in the available precision.
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent?
Pretty sure Python simply delegates to the underlying system, so it's mostly hardware-dependent. If you want guarantees, use decimal. IIRC the native (C) implementation was merged in 3.3, and the performance impact of using decimals has thus become much, much lower than it was in Python 2.
Python floats are IEEE-754 doubles.
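A quick way to look at the double format from inside Python, assuming a typical platform where the C double is an IEEE-754 binary64:

```python
import sys
from decimal import Decimal

print(sys.float_info.mant_dig)   # number of significand bits in a Python float
print((17.5).hex())              # hexadecimal form of the stored value
print(Decimal(0.1))              # exact decimal expansion of the double nearest 0.1
```

float.hex() and Decimal(float) both show the stored value without any decimal rounding, which makes them handy for checking which "bin" a literal landed in.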
I'm porting a MATLAB code to Python 3.5.1 and I found a float round-off issue.
In MATLAB, the following number is rounded up to the 6th decimal place:
fprintf(1,'%f', -67.6640625);
-67.664063
In Python, on the other hand, the following number is rounded off to the 6th decimal place:
print('%f' % -67.6640625)
-67.664062
Interestingly enough, if the number is '-67.6000625', then it is rounded up even in Python:
print('%f' % -67.6000625)
-67.600063
... Why does this happen?
What are the criteria to round-off/up in Python?
(I believe this has something to do with handling hexadecimal values.)
More importantly, how can I prevent this difference?
I'm supposed to create a python code which can reproduce exactly the same output as MATLAB produces.
The reason for the python behavior has to do with how floating point numbers are stored in a computer and the standardized rounding rules defined by IEEE, which defined the standard number formats and mathematical operations used on pretty much all modern computers.
The need to store numbers efficiently in binary on a computer has led computers to use floating-point numbers. These numbers are easy for processors to work with, but have the disadvantage that many decimal numbers cannot be exactly represented. This results in numbers sometimes being a little off from what we think they should be.
The situation becomes a bit clearer if we expand the values in Python, rather than truncating them:
>>> print('%.20f' % -67.6640625)
-67.66406250000000000000
>>> print('%.20f' % -67.6000625)
-67.60006250000000704858
So as you can see, -67.6640625 can be represented exactly, but -67.6000625 cannot; the stored value is slightly larger in magnitude. The default rounding mode defined by the IEEE standard for floating-point numbers says that anything above 5 should be rounded up and anything below should be rounded down. In the case of -67.6000625, the trailing part is actually 5 plus a small amount, so it is rounded up. In the case of -67.6640625, however, it is exactly 5, so a tiebreak rule comes into play. The default tiebreak rule is to round to the nearest even digit. Since 2 is the nearest even digit, it rounds down to 2.
So Python is following the approach recommended by the floating-point standard. The question, then, is why your version of MATLAB doesn't do this. I tried it on my computer with 64bit MATLAB R2016a, and I got the same result as in Python:
>> fprintf(1,'%f', -67.6640625)
-67.664062>>
So it seems like MATLAB was, at some point, using a different rounding approach (perhaps a non-standard approach, perhaps one of the alternatives specified in the standard), and has since switched to follow the same rules as everyone else.
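If the older MATLAB output has to be reproduced anyway, one option is to do the final rounding with the decimal module and pick the tie-breaking rule explicitly. This is a sketch, not MATLAB's actual algorithm; format_fixed is a hypothetical helper name, and ROUND_HALF_UP (ties away from zero) is an assumption that happens to match the output shown in the question.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def format_fixed(value, places=6, rounding=ROUND_HALF_UP):
    """Format a float with a chosen tie-breaking rule, working from its exact binary value."""
    quantum = Decimal(1).scaleb(-places)   # e.g. Decimal('0.000001') for places=6
    return str(Decimal(value).quantize(quantum, rounding=rounding))


print(format_fixed(-67.6640625))                            # ties rounded away from zero
print(format_fixed(-67.6640625, rounding=ROUND_HALF_EVEN))  # ties rounded to even, like '%f'
print(format_fixed(-67.6000625))                            # not a tie, same under both rules
```

Because Decimal(value) takes the exact stored value of the float, only genuine ties such as -67.6640625 are affected by the choice of rounding mode.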
This might seem really silly, but I am new to Python and like to use equality conditions in my programs, and I have hit a very surprising roadblock. The practical issue here is that the last condition r==rmax is not satisfied, so I miss out on an iteration of the loop, but that is not what is worrying me.
Rather than (trivial) work arounds, can someone explain to me what is going on in simple terms? (Also why the numbers turn out the same no matter how many times I run this loop, therefore it is something systematic and not something probabilistic).
And a proper way to make sure this does not happen (What I mean by proper is a programming practice I should adopt in all my coding, so that such unintentional discrepancy does not occur ever again)? I mean such loops are omnipresent in my codes and it makes me worried.
It seems I cannot trust numbers in python, which would make it useless as a computational tool.
PS : I am working on a scientific computing project with Numpy.
>>> while r<=r_max:
...     print(repr(r))
...     r = r + r_step
...
2.4
2.5
2.6
2.7
2.8000000000000003
2.9000000000000004
3.0000000000000004
3.1000000000000005
3.2000000000000006
3.3000000000000007
3.400000000000001
3.500000000000001
3.600000000000001
3.700000000000001
3.800000000000001
3.9000000000000012
The simple answer is that floating-point numbers, as usually represented in computing, aren't exact, and you can't treat them as if they were exact. Treat them as if they're fuzzy; see if they're within a certain range, not whether they "equal" something.
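As a rough illustration of both points (avoid accumulating the step, and compare with a tolerance), here is a sketch. The start, step and end values are assumptions, since the question does not show them; they were chosen to match the printed output.

```python
import math

r_start, r_step, r_max = 2.4, 0.1, 4.0   # assumed values; the question does not show them

# Accumulating the step lets the rounding error grow with every addition.
naive = []
r = r_start
while r <= r_max:
    naive.append(r)
    r = r + r_step

# Deriving each value from an integer counter keeps the error from accumulating.
n_steps = int(round((r_max - r_start) / r_step))
stable = [r_start + i * r_step for i in range(n_steps + 1)]

# Compare with a tolerance instead of ==.
reached_end = math.isclose(stable[-1], r_max, rel_tol=1e-9)

print(len(naive), len(stable), reached_end)
```

Driving the loop with an integer counter is the habit that generalizes: the number of iterations is decided by exact integer arithmetic, and floats are only used for the values themselves.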
The numbers turn out the same because it's all deterministic. The calculations are being done in exactly the same way each time. This isn't a case of random errors, it's a case of the machine representing floating-point numbers in an inexact way.
Python has some exact datatypes you can use instead of inexact floats; see the decimal and fractions modules.
There's a classic article called "What Every Computer Scientist Should Know About Floating-Point Arithmetic"; google it and pick any link you like.
Pierre G. is correct. Because computers calculate in binary, they cannot represent many decimal fractions exactly. The result should still be accurate to a certain number of digits, which depends on the data type you use.
For your case, I think maybe you could use round(number, digits) function to get a round number and then compare it.
It's just a matter of floating-point arithmetic. If you know how floating-point numbers are represented in computer memory, you will see that certain results, even ones we expect to be whole numbers, are not stored as exact whole numbers. They are stored only to a certain precision (a limited number of significant digits). This problem will always remain whenever you do numerical programming. I suggest you use a comparison that accepts a tolerance value. Numpy has methods that facilitate such comparisons.
First of all, I was not studying math in English language, so I may use wrong words in my text.
Floating-point numbers can have a finite decimal expansion (42.36) or an infinite one (42.363636...).
In C/C++, numbers are stored in base 2. Our minds work with floats in base 10.
The problem is that many (a lot, actually) floating-point numbers that are finite in base 10 have no exact finite representation in base 2, and vice versa.
This doesn't mean anything most of the time. The last digit of double may be off by 1 bit - not a problem.
A problem arises when we compute with two floats that are actually integers. 99.0/3.0 in C++ can result in 33.0 as well as 32.9999...99. And if you convert it to an integer then you are in for a surprise. I always add a special value (2*smallest value for the given type and architecture) before rounding up in C for this reason. Should I do it in Python or not?
I have run some tests in Python and it seems float division always gives the expected result. But some tests are not enough because the problem is architecture-dependent. Does anybody know for sure whether it is taken care of, and at what level: in the float type itself or only in the rounding and truncating functions?
P.S. And if somebody can clarify the same thing for Haskell, which I am only starting with - it would be great.
UPDATE
Folks pointed to an official document stating that there is uncertainty in floating-point arithmetic. The remaining question is: do math functions like ceil take care of it, or should I do it on my own? This must be pointed out to beginner users every time we speak of these functions, because otherwise they will all stumble on that problem.
The format C and C++ use for representing float and double is standardized (IEEE 754), and the problems you describe are inherent in that representation. Since Python is implemented in C, its floating point types are prone to the same rounding problems.
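To make the point about ceil concrete: math.ceil works on the stored value and applies no tolerance, so any error already present in its argument is carried through. A small sketch (the variable names are illustrative only):

```python
import math
from fractions import Fraction

exact_quotient = 99.0 / 3.0          # both operands and the true quotient are representable
drifted = (0.1 + 0.2) * 10           # the inputs were already inexact
as_fraction = (Fraction(1, 10) + Fraction(2, 10)) * 10  # exact rational arithmetic

print(exact_quotient == 33.0, math.ceil(drifted), math.ceil(as_fraction))
```

When the inputs and the true result are exactly representable, as with 99.0/3.0, the division is exact. The surprises come from inputs like 0.1 that were never exact to begin with.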
Haskell's Float and Double are a somewhat higher level abstraction, but since most (all?) modern CPUs use IEEE754 for floating point calculations, you most probably will have that kind of rounding errors there as well.
In other words: Only languages/libraries which choose to not base their floating point types on the underlying architecture might be able to circumvent the IEEE754 rounding problems to a certain degree, but since the underlying hardware does not support other representations directly, there has to be a performance penalty. Therefore, probably most languages will stick to the standard, not least because its limitations are well known.
Real numbers themselves, including floats, are never "infinite" in any mathematical sense. They may have infinite decimal representations, but that's only a technical problem of the way we write them (or store them in computers). In fact though, IEEE754 also specifies +∞ and -∞ values, those are actual infinities... but they don't represent real numbers and are mathematically quite horrible in many a way.
Also, regarding "And if you convert it to integer then": you should never "convert" floats to integers anyway, it's not really possible. You can only round them to integers, and if you do that with e.g. Haskell's round, it's pretty safe indeed, certainly
Prelude> round $ 99/3
33
Though ghci calculates the division with floating-point.
The only things that are always unsafe:
Of course, implicit conversion from float to int is completely crazy, and positively a mistake in the C-languages. Haskell and Python are both properly strongly typed, so such stuff won't happen by accident.
Floating-point numbers should generally not be expected to be exactly equal to anything in particular. It's not really useful to expect so anyway, because for actual real numbers any single one is a null set, which roughly means the only way two real numbers can be equal is if there's some deep mathematical reason for it. But for any distribution, e.g. from a physical process, the probability of equality is exactly zero, so why would you check?
Only comparing numbers with <, OTOH, is perfectly safe (unless you're dealing with very small differences between huge numbers, or you use it to "simulate" equality by also checking >).
Yes, this is a problem in Python.
See https://docs.python.org/2/tutorial/floatingpoint.html
Python internally represents floats as C doubles, so you will have all the problems inherent to floating-point arithmetic. But it also includes some algorithms to "fix" how the obvious cases are displayed; the stored value itself is unchanged. The example you give, 32.99999..., is recognised as being 33.0. From Python 2.7 and 3.1 onwards this is done using Gay's algorithm; that is, repr prints the shortest string that rounds back to the original value. You can see a description in the Python 3.1 release notes. Earlier versions simply displayed 17 significant digits.
As they themselves warn, it doesn't mean that it is going to work as decimal numbers.
>>> 1.1 + 2.2
3.3000000000000003
>>> 1.1 + 2.2 == 3.3
False
(But that should already be ringing your bells, as comparing floating point numbers for equality is never a good thing)
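The shortest-round-trip display mentioned above can be checked directly. This small sketch compares repr with the older 17-significant-digit style (written here with '%.17g'):

```python
x = 1.1 + 2.2

print(repr(x))              # shortest string that converts back to exactly x
print('%.17g' % x)          # 17 significant digits, the older display style
print(float(repr(x)) == x)  # the printed form round-trips to the same float
print(repr(0.1), '%.17g' % 0.1)
```

Only the text changes between the two styles; the float stored in x is the same in both cases.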
If you want to assure precision to a number of decimal places (for example, if you are working with finances), you can use the module decimal from the standard library. If you want to represent fractional numbers, you could use fractions, but they are both slower than plain numbers.
>>> import decimal
>>> decimal.Decimal(1.1) + decimal.Decimal(2.2)
Decimal('3.300000000000000266453525910')
# Decimal is getting the full floating point representation, not what I type!
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2')
Decimal('3.3')
# Now it is fine.
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == 3.3
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal(3.3)
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal('3.3')
True
In addition to the other fantastic answers here, saying roughly that IEEE754 has exactly the same issues no matter which language you interface to them with, I'd like to point out that many languages have libraries for other kinds of numbers. Some standard approaches are to use fixed-point arithmetic (many, but not all, of IEEE754's nuances come from being floating-point) or rationals. Haskell also has libraries for the computable reals and cyclotomic numbers.
In addition, using these alternative kinds of numbers is especially convenient in Haskell due to its typeclass mechanism, which means that doing arithmetic with these other types of numbers looks and feels exactly the same as doing arithmetic with your usual IEEE754 Floats and Doubles; but you get the better (and worse!) properties of the alternate type. For example, with appropriate imports, you can see:
> 99/3 :: Double
33.0
> 99/3 :: Fixed E12
33.000000000000
> 99/3 :: Rational
33 % 1
> 99/3 :: CReal
33.0
> 99/3 :: Cyclotomic
33
> 98/3 :: Rational
98 % 3
> sqrt 2 :: CReal
1.4142135623730950488016887242096980785697
> sqrtInteger (-5) :: Cyclotomic
e(20) + e(20)^9 - e(20)^13 - e(20)^17
Haskell doesn't require Float and Double to be IEEE single- and double-precision floating-point numbers, but it strongly recommends it. GHC follows the recommendation. IEEE floating-point numbers have the same issues across all languages. Some of this is handled by the LIA standard, but Haskell only implements that in "a library". (No, I'm not sure what library or if it even exists.)
This great answer shows the various other numeric representations that are either part of Haskell (like Rational) or available from Hackage (like Fixed, CReal, and Cyclotomic).
Rational, Fixed, and Cyclotomic might have similar Python libraries; Fixed is somewhat similar to the .Net Decimal type. CReal also might, but I think it might take advantage of Haskell's call-by-need and could be difficult to directly port to Python; it's also pretty slow.
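For the first two, Python's standard library already has rough counterparts: fractions.Fraction for Rational, and decimal.Decimal with quantize as a stand-in for Fixed. The mapping is an approximation rather than an exact equivalence, and the variable names below are just for illustration.

```python
from decimal import Decimal
from fractions import Fraction

whole = Fraction(99, 3)   # exact rational, reduced automatically
third = Fraction(98, 3)   # stays an exact ratio instead of a rounded float
fixed = (Decimal(99) / Decimal(3)).quantize(Decimal("1.000000000000"))  # 12 fixed decimal places

print(whole, third, fixed)
```

Both types overload the usual arithmetic operators, so code written against them looks much like ordinary float code, at the cost of speed.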