On nearly every system, Python can give you human-readable, short representation of a floating point, not the 17 digit machine-precision:
Python 3.3.0 (default, Dec 20 2014, 13:28:01)
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1
0.1
>>> import sys; sys.float_repr_style
'short'
On an ARM926EJ-S, you don't get the short representation:
Python 3.3.0 (default, Jun 3 2014, 12:11:19)
[GCC 4.7.3] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 0.1
0.10000000000000001
>>> import sys; sys.float_repr_style
'legacy'
Python 2.7 apparently added this short representation to repr(), for most systems:
Conversions between floating-point numbers and strings are now correctly rounded on most platforms. These conversions occur in many different places: str() on floats and complex numbers; the float and complexconstructors; numeric formatting; serializing and deserializing floats and complex numbers using the marshal, pickle and json modules; parsing of float and imaginary literals in Python code; and Decimal-to-float conversion.
Related to this, the repr() of a floating-point number x now returns a result based on the shortest decimal string that’s guaranteed to round back to x under correct rounding (with round-half-to-even rounding mode). Previously it gave a string based on rounding x to 17 decimal digits.
The rounding library responsible for this improvement works on Windows and on Unix platforms using the gcc, icc, or suncc compilers. There may be a small number of platforms where correct operation of this code cannot be guaranteed, so the code is not used on such systems. You can find out which code is being used by checking sys.float_repr_style, which will be short if the new code is in use and legacy if it isn’t.
Implemented by Eric Smith and Mark Dickinson, using David Gay’s dtoa.c library; issue 7117.
They say some platforms can't guarantee correct operation (of dtoa.c I assume), but don't say which platform limitation are the ones that cause this.
What is it about the ARM926EJ-S that means the short float repr() can't be used?
Short answer: it's likely to be not a limitation of the platform, but a limitation of Python's build machinery: it doesn't have a universal way to set 53-bit precision for floating-point computations.
For more detail, take a look at the Include/pyport.h file in the Python source distribution. Here's an excerpt:
/* If we can't guarantee 53-bit precision, don't use the code
in Python/dtoa.c, but fall back to standard code. This
means that repr of a float will be long (17 sig digits).
Realistically, there are two things that could go wrong:
(1) doubles aren't IEEE 754 doubles, or
(2) we're on x86 with the rounding precision set to 64-bits
(extended precision), and we don't know how to change
the rounding precision.
*/
#if !defined(DOUBLE_IS_LITTLE_ENDIAN_IEEE754) && \
!defined(DOUBLE_IS_BIG_ENDIAN_IEEE754) && \
!defined(DOUBLE_IS_ARM_MIXED_ENDIAN_IEEE754)
#define PY_NO_SHORT_FLOAT_REPR
#endif
/* double rounding is symptomatic of use of extended precision on x86. If
we're seeing double rounding, and we don't have any mechanism available for
changing the FPU rounding precision, then don't use Python/dtoa.c. */
#if defined(X87_DOUBLE_ROUNDING) && !defined(HAVE_PY_SET_53BIT_PRECISION)
#define PY_NO_SHORT_FLOAT_REPR
#endif
Essentially, there are two things that can go wrong. One is that the Python configuration fails to identify the floating-point format of a C double. That format is almost always IEEE 754 binary64, but sometimes the config script fails to figure that out. That's the first #if preprocessor check in the snippet above. Look at the pyconfig.h file generated at compile time, and see if at least one of the DOUBLE_IS_... macros is #defined. Alternatively, try this at a Python prompt:
>>> float.__getformat__('double')
'IEEE, little-endian'
If you see something like the above, this part should be okay. If you see something like 'unknown', then Python hasn't managed to identify the floating-point format.
The second thing that can go wrong is that we do have IEEE 754 binary64 format doubles, but Python's build machinery can't figure out how to ensure 53-bit precision for floating-point computations for this platform. The dtoa.c source requires that we're able to do all floating-point operations (whether implemented in hardware or software) at a precision of 53 bits. That's particularly a problem on Intel processors that are using the x87 floating-point unit for double-precision computations (as opposed to the newer SSE2 instructions): the default precision of the x87 is 64-bits, and using it for double-precision computations with that default precision setting leads to double rounding, which breaks the dtoa.c assumptions. So at config time, the build machinery runs a check to see (1) whether double rounding is a potential problem, and (2) if so, whether there's a way to put the FPU into 53-bit precision. So now you want to look at pyconfig.h for the X87_DOUBLE_ROUNDING and HAVE_PY_SET_53BIT_PRECISION macros.
So it could be either of the above. If I had to guess, I'd guess that on that platform, double rounding is being detected as a problem, and it's not known how to fix it. The solution in that case is to adapt pyport.h to define the _Py_SET_53BIT_PRECISION_* macros in whatever platform-specific way works to get that 53-bit precision mode, and then to define HAVE_PY_SET_53BIT_PRECISION.
Related
Python makes various references to IEEE 754 floating point operations, but doesn't guarantee 1 2 that it'll be used at runtime. I'm therefore wondering where this isn't the case.
CPython source code defers to whatever the C compiler is using for a double, which in practice is an IEEE 754-2008 binary64 on all common systems I'm aware of, e.g.:
Linux and BSD distros (e.g. FreeBSD, OpenBSD, NetBSD)
Intel i386/x86 and x86-64
ARM: AArch64
Power: PPC64
MacOS all architectures supported are 754 compatible
Windows x86 and x86-64 systems
I'm aware there are other platforms it's known to build on but don't know how these work out in practice.
In theory, as you say, CPython is designed to be buildable and usable on any platform without caring about what floating-point format their C double is using.
In practice, two things are true:
To the best of my knowledge, CPython has not met a system that's not using IEEE 754 binary64 format for its C double within the last 15 years (though I'd love to hear stories to the contrary; I've been asking about this at conferences and the like for a while). My knowledge is a long way from perfect, but I've been involved with mathematical and floating-point-related aspects of CPython core development for at least 13 of those 15 years, and paying close attention to floating-point related issues in that time. I haven't seen any indications on the bug tracker or elsewhere that anyone has been trying to run CPython on systems using a floating-point format other than IEEE 754 binary64.
I strongly suspect that the first time modern CPython does meet such a system, there will be a significant number of test failures, and so the core developers are likely to find out about it fairly quickly. While we've made an effort to make things format-agnostic, it's currently close to impossible to do any testing of CPython on other formats, and it's highly likely that there are some places that implicitly assume IEEE 754 format or semantics, and that will break for something more exotic. We have yet to see any reports of such breakage.
There's one exception to the "no bug reports" report above. It's this issue: https://bugs.python.org/issue27444. There, Greg Stark reported that there were indeed failures using VAX floating-point. It's not clear to me whether the original bug report came from a system that emulated VAX floating-point.
I joined the CPython core development team in 2008. Back then, while I was working on floating-point-related issues I tried to keep in mind 5 different floating-point formats: IEEE 754 binary64, IBM's hex floating-point format as used in their zSeries mainframes, the Cray floating-point format used in the SV1 and earlier machines, and the VAX D-float and G-float formats; anything else was too ancient to be worth worrying about. Since then, the VAX formats are no longer worth caring about. Cray machines now use IEEE 754 floating-point. The IBM hex floating-point format is very much still in existence, but in practice the relevant IBM hardware also has support for IEEE 754, and the IBM machines that Python meets all seem to be using IEEE 754 floating-point.
Rather than exotic floating-point formats, the modern challenges seem to be more to do with variations in adherence to the rest of the IEEE 754 standard: systems that don't support NaNs, or treat subnormals differently, or allow use of higher precision for intermediate operations, or where compilers make behaviour-changing optimizations.
The above is all about CPython-the-implementation, not Python-the-language. But the story for the Python language is largely similar. In theory, it makes no assumptions about the floating-point format. In practice, I don't know of any alternative Python implementations that don't end up using an IEEE 754 binary format (if not semantics) for the float type. IronPython and Jython both target runtimes that are explicit that floating-point will be IEEE 754 binary64. JavaScript-based versions of Python will similarly presumably be using JavaScript's Number type, which is required to be IEEE 754 binary64 by the ECMAScript standard. PyPy runs on more-or-less the same platforms that CPython does, with the same floating-point formats. MicroPython uses single-precision for its float type, but as far as I know that's still IEEE 754 binary32 in practice.
What precision does numpy.float128 map to internally? Is it __float128 or long double? Or something else entirely?
A potential follow on question if anybody knows: is it safe in C to cast a __float128 to a (16 byte) long double, with just a loss in precision? (this is for interfacing with a C lib that operates on long doubles).
Edit: In response to the comment, the platform is 'Linux-3.0.0-14-generic-x86_64-with-Ubuntu-11.10-oneiric'. Now, if numpy.float128 has varying precision dependent on the platform, that is also useful knowledge for me!
Just to be clear, it is the precision I am interested in, not the size of an element.
numpy.longdouble refers to whatever type your C compiler calls long double. Currently, this is the only extended precision floating point type that numpy supports.
On x86-32 and x86-64, this is an 80-bit floating point type. On more exotic systems it may be something else (IIRC on Sparc it's an actual 128-bit IEEE float, and on PPC it's double-double). (It also may depend on what OS and compiler you're using -- e.g. MSVC on Windows doesn't support any kind of extended precision at all.)
Numpy will also export some name like numpy.float96 or numpy.float128. Which of these names is exported depends on your platform/compiler, but whatever you get always refers to the same underlying type as longdouble. Also, these names are highly misleading. They do not indicate a 96- or 128-bit IEEE floating point format. Instead, they indicate the number of bits of alignment used by the underlying long double type. So e.g. on x86-32, long double is 80 bits, but gets padded up to 96 bits to maintain 32-bit alignment, and numpy calls this float96. On x86-64, long double is again the identical 80 bit type, but now it gets padded up to 128 bits to maintain 64-bit alignment, and numpy calls this float128. There's no extra precision, just extra padding.
Recommendation: ignore the float96/float128 names, just use numpy.longdouble. Or better yet stick to doubles unless you have a truly compelling reason. They'll be faster, more portable, etc.
It's quite recommended to use longdouble instead of float128, since it's quite a mess, ATM. Python will cast it to float64 during initialization.
Inside numpy, it can be a double or a long double. It's defined in npy_common.h and depends of your platform. I don't know if you can include it out-of-the-box into your source code.
If you don't need performance in this part of your algorithm, a safer way could be to export it to a string and use strold afterwards.
TLDR from the numpy docs:
np.longdouble is padded to the system default; np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds.
For relatively simple floats, the numerical precision is sufficient to represent them exactly. For example, 17.5 is equal to 17.5
For more complicated floats, such as
17.4999999999999982236431605997495353221893310546874 = 17.499999999999996447286321199499070644378662109375
17.4999999999999982236431605997495353221893310546875 = 17.5
Using as_integer_ratio() on the first number above, one obtains (4925812092436479, 281474976710656) and since (4925812092436479*2+1)/(2*281474976710656) equals the second number above, it becomes evident that the partition between >=17.5 and <17.5 is 1/(2*281474976710656).
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent? If there is a guarantee, how is it decided?
For the above I used, python 3.5.6, but I am interested in the general answer for python 3.x if it exists.
For relatively simple floats, the numerical precision is sufficient to represent them exactly
Not really. Yes, 17.5 can be represented exactly because it is a multiple of a power of two (a multiple of 2-1, to be exact). But even very simple floats like 0.1 cannot be represented exactly. There it depends on the text to float conversion routine to get a representation that is as close as possible.
The conversion is done by the runtime (or the C or Java runtime of the compiler, for literals), which uses the C or Java functions (like C's strtod()) to do this (Java implements the code of David Gay's strtod(), but in Java language).
Not every implementation of strtod(), i.e. not every C/Java compiler uses the same methodology to convert, so there may be slight, usually insignificant differences in some of the results.
FWIW, the website Exploring Binary (no affiliation, I'm just a big fan) has many articles on this subject. It is obviously not as simple as expected.
For relatively simple floats, the numerical precision is sufficient to represent them exactly.
No, even simple decimals don't necessarily have an exact IEEE-754 representation:
>>> format(0.1, '.20f')
'0.10000000000000000555'
>>> format(0.2, '.20f')
'0.20000000000000001110'
>>> format(0.3, '.20f')
'0.29999999999999998890'
>>> format(0.1 + 0.2, '.20f')
'0.30000000000000004441'
Powers of 2 (x.0, x.5, x.25, x.125, …) are exactly representable, modulo precision issues.
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent?
Pretty sure Python simply delegates to the underlying system, so it's mostly hardware-dependent. If you want guarantees, use decimal. IIRC the native (C) implementation was merged in 3.3, and the performance impact of using decimals has thus become much, much lower than it was in Python 2.
Python floats are IEEE-754 doubles.
My python program manipulates bitcoin amounts precise to 8 decimal places. My intention is to use decimal.Decimal types everywhere, to avoid any floating point precision issues -- but I'm not sure I got every usage.
For quality assurance, I'd like to raise an error if there's any floats constructed anywhere in the program. Is this possible in python 3.5?
(I cannot use integers because I'm interfacing via JSON with other programs that expect decimal values.)
First of all, I was not studying math in English language, so I may use wrong words in my text.
Float numbers can be finite(42.36) and infinite (42.363636...)
In C/C++ numbers are stored at base 2. Our minds operate floats at base 10.
The problem is -
many (a lot, actually) of float numbers with base 10, that are finite, have no exact finite representation in base 2, and vice-versa.
This doesn't mean anything most of the time. The last digit of double may be off by 1 bit - not a problem.
A problem arises when we compute two floats that are actually integers. 99.0/3.0 on C++ can result in 33.0 as well as 32.9999...99. And if you convert it to integer then - you are in for a surprise. I always add a special value (2*smallest value for given type and architecture) before rounding up in C for this reason. Should I do it in Python or not?
I have run some tests in Python and it seems float division always results as expected. But some tests are not enough because the problem is architecture-dependent. Do somebody know for sure if it is taken care of, and on what level - in float type itself or only in rounding up and shortening functions?
P.S. And if somebody can clarify the same thing for Haskell, which I am only starting with - it would be great.
UPDATE
Folks pointed out to an official document stating there is uncertainty in floating point arithmetic. The remaining question is - do math functions like ceil take care of them or should I do it on my own? This must be pointed out to beginner users every time we speak of these functions, because otherwise they will all stumble on that problem.
The format C and C++ use for representing float and double is standardized (IEEE 754), and the problems you describe are inherent in that representation. Since Python is implemented in C, its floating point types are prone to the same rounding problems.
Haskell's Float and Double are a somewhat higher level abstraction, but since most (all?) modern CPUs use IEEE754 for floating point calculations, you most probably will have that kind of rounding errors there as well.
In other words: Only languages/libraries which choose to not base their floating point types on the underlying architecture might be able to circumvent the IEEE754 rounding problems to a certain degree, but since the underlying hardware does not support other representations directly, there has to be a performance penalty. Therefore, probably most languages will stick to the standard, not least because its limitations are well known.
Real numbers themselves, including floats, are never "infinite" in any mathematical sense. They may have infinite decimal representations, but that's only a technical problem of the way we write them (or store them in computers). In fact though, IEEE754 also specifies +∞ and -∞ values, those are actual infinities... but they don't represent real numbers and are mathematically quite horrible in many a way.
Also... "And if you convert it to integer then" you should never "convert" floats to integers anyway, it's not really possible: you can only round them to integers. and if you do that with e.g. Haskell's round, it's pretty safe indeed, certainly
Prelude> round $ 99/3
33
Though ghci calculates the division with floating-point.
The only things that are always unsafe:
Of course, implicit conversion from float to int is completely crazy, and positively a mistake in the C-languages. Haskell and Python are both properly strongly typed, so such stuff won't happen by accident.
Floating-points should generally not be expected to be exactly equal to anything particular. It's not really useful to expect so anyway, because for actual real numbers any single one is a null set, which roughly means the only way two real number can be equal is if there's so deep mathematical reason for it. But for any distribution e.g. from a physical process, the probability for equalness is exactly zero, so why would you check?Only comparing numbers OTOH, with <, is perfectly safe (unless you're dealing with very small differences between huge numbers, or you use it to "simulate" equality by also checking >).
Yes, this is a problem in Python.
See https://docs.python.org/2/tutorial/floatingpoint.html
Python internally represents numbers as C doubles, so you will have all the problems inherent to floating point arithmetics. But it also includes some algorithms to "fix" the obvious cases. The example you give, 32.99999... is recognised as being 33.0. From Python 2.7 and 3.1 onwards they do this using Gay's algorithm; that is, the shortest string that rounds back to the original value. You can see a description in Python 3.1 release notes. In earlier versions, it just rounds to the first 17 decimal places.
As they themselves warn, it doesn't mean that it is going to work as decimal numbers.
>>> 1.1 + 2.2
3.3000000000000003
>>> 1.1 + 2.2 == 3.3
False
(But that should already be ringing your bells, as comparing floating point numbers for equality is never a good thing)
If you want to assure precision to a number of decimal places (for example, if you are working with finances), you can use the module decimal from the standard library. If you want to represent fractional numbers, you could use fractions, but they are both slower than plain numbers.
>>> import decimal
>>> decimal.Decimal(1.1) + decimal.Decimal(2.2)
Decimal('3.300000000000000266453525910')
# Decimal is getting the full floating point representation, no what I type!
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2')
Decimal('3.3')
# Now it is fine.
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == 3.3
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal(3.3)
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal('3.3')
True
In addition to the other fantastic answers here, saying roughly that IEEE754 has exactly the same issues no matter which language you interface to them with, I'd like to point out that many languages have libraries for other kinds of numbers. Some standard approaches are to use fixed-point arithmetic (many, but not all, of IEEE754's nuances come from being floating-point) or rationals. Haskell also libraries for the computable reals and cyclotomic numbers.
In addition, using these alternative kinds of numbers is especially convenient in Haskell due to its typeclass mechanism, which means that doing arithmetic with these other types of numbers looks and feels exactly the same and doing arithmetic with your usual IEEE754 Floats and Doubles; but you get the better (and worse!) properties of the alternate type. For example, with appropriate imports, you can see:
> 99/3 :: Double
33.0
> 99/3 :: Fixed E12
33.000000000000
> 99/3 :: Rational
33 % 1
> 99/3 :: CReal
33.0
> 99/3 :: Cyclotomic
33
> 98/3 :: Rational
98 % 3
> sqrt 2 :: CReal
1.4142135623730950488016887242096980785697
> sqrtInteger (-5) :: Cyclotomic
e(20) + e(20)^9 - e(20)^13 - e(20)^17
Haskell doesn't require Float and Double to be IEEE single- and double-precision floating-point numbers, but it strongly recommends it. GHC follows the recommendation. IEEE floating-point numbers have the same issues across all languages. Some of this is handled by the LIA standard, but Haskell only implements that in "a library". (No, I'm not sure what libraryor if it even exists.)
This great answer shows the various other numeric representations that are either part of Haskell (like Rational) or available from hackage like (Fixed, CReal, and Cyclotomic).
Rational, Fixed, and Cyclotomic might have similar Python libraries; Fixed is somewhat similar to the .Net Decimal type. CReal also might, but I think it might take advantage of Haskell's call-by-need and could be difficult to directly port to Python; it's also pretty slow.