Is sys.float_info machine specific? - python

I would like to know if sys.float_info.max/ sys.float_info.min are fixed values which are identical on all machines or does it depend on the machine's memory or other factors (and make the value different from machine to machine). Thank you!

It's relatively safe to assume those values will stay the same. They are specified by the IEEE 754 standard for "64bit binary floats", which all conventional Python installations use for their float values.
See on what systems does Python not use IEEE-754 double precision floats for more details from a core developer.
As noted in the above, the only known deviation is micropython which uses 32bit binary floats instead. However, given other constraints of micropython it's pretty safe to assume you're not going to be running there unless you know about it. For example, lots of the standard library doesn't exist which would break most conventional packages.

Related

on what systems does Python not use IEEE-754 double precision floats

Python makes various references to IEEE 754 floating point operations, but doesn't guarantee 1 2 that it'll be used at runtime. I'm therefore wondering where this isn't the case.
CPython source code defers to whatever the C compiler is using for a double, which in practice is an IEEE 754-2008 binary64 on all common systems I'm aware of, e.g.:
Linux and BSD distros (e.g. FreeBSD, OpenBSD, NetBSD)
Intel i386/x86 and x86-64
ARM: AArch64
Power: PPC64
MacOS all architectures supported are 754 compatible
Windows x86 and x86-64 systems
I'm aware there are other platforms it's known to build on but don't know how these work out in practice.
In theory, as you say, CPython is designed to be buildable and usable on any platform without caring about what floating-point format their C double is using.
In practice, two things are true:
To the best of my knowledge, CPython has not met a system that's not using IEEE 754 binary64 format for its C double within the last 15 years (though I'd love to hear stories to the contrary; I've been asking about this at conferences and the like for a while). My knowledge is a long way from perfect, but I've been involved with mathematical and floating-point-related aspects of CPython core development for at least 13 of those 15 years, and paying close attention to floating-point related issues in that time. I haven't seen any indications on the bug tracker or elsewhere that anyone has been trying to run CPython on systems using a floating-point format other than IEEE 754 binary64.
I strongly suspect that the first time modern CPython does meet such a system, there will be a significant number of test failures, and so the core developers are likely to find out about it fairly quickly. While we've made an effort to make things format-agnostic, it's currently close to impossible to do any testing of CPython on other formats, and it's highly likely that there are some places that implicitly assume IEEE 754 format or semantics, and that will break for something more exotic. We have yet to see any reports of such breakage.
There's one exception to the "no bug reports" report above. It's this issue: https://bugs.python.org/issue27444. There, Greg Stark reported that there were indeed failures using VAX floating-point. It's not clear to me whether the original bug report came from a system that emulated VAX floating-point.
I joined the CPython core development team in 2008. Back then, while I was working on floating-point-related issues I tried to keep in mind 5 different floating-point formats: IEEE 754 binary64, IBM's hex floating-point format as used in their zSeries mainframes, the Cray floating-point format used in the SV1 and earlier machines, and the VAX D-float and G-float formats; anything else was too ancient to be worth worrying about. Since then, the VAX formats are no longer worth caring about. Cray machines now use IEEE 754 floating-point. The IBM hex floating-point format is very much still in existence, but in practice the relevant IBM hardware also has support for IEEE 754, and the IBM machines that Python meets all seem to be using IEEE 754 floating-point.
Rather than exotic floating-point formats, the modern challenges seem to be more to do with variations in adherence to the rest of the IEEE 754 standard: systems that don't support NaNs, or treat subnormals differently, or allow use of higher precision for intermediate operations, or where compilers make behaviour-changing optimizations.
The above is all about CPython-the-implementation, not Python-the-language. But the story for the Python language is largely similar. In theory, it makes no assumptions about the floating-point format. In practice, I don't know of any alternative Python implementations that don't end up using an IEEE 754 binary format (if not semantics) for the float type. IronPython and Jython both target runtimes that are explicit that floating-point will be IEEE 754 binary64. JavaScript-based versions of Python will similarly presumably be using JavaScript's Number type, which is required to be IEEE 754 binary64 by the ECMAScript standard. PyPy runs on more-or-less the same platforms that CPython does, with the same floating-point formats. MicroPython uses single-precision for its float type, but as far as I know that's still IEEE 754 binary32 in practice.

int(np.float128) cast precision loss [duplicate]

What precision does numpy.float128 map to internally? Is it __float128 or long double? Or something else entirely?
A potential follow on question if anybody knows: is it safe in C to cast a __float128 to a (16 byte) long double, with just a loss in precision? (this is for interfacing with a C lib that operates on long doubles).
Edit: In response to the comment, the platform is 'Linux-3.0.0-14-generic-x86_64-with-Ubuntu-11.10-oneiric'. Now, if numpy.float128 has varying precision dependent on the platform, that is also useful knowledge for me!
Just to be clear, it is the precision I am interested in, not the size of an element.
numpy.longdouble refers to whatever type your C compiler calls long double. Currently, this is the only extended precision floating point type that numpy supports.
On x86-32 and x86-64, this is an 80-bit floating point type. On more exotic systems it may be something else (IIRC on Sparc it's an actual 128-bit IEEE float, and on PPC it's double-double). (It also may depend on what OS and compiler you're using -- e.g. MSVC on Windows doesn't support any kind of extended precision at all.)
Numpy will also export some name like numpy.float96 or numpy.float128. Which of these names is exported depends on your platform/compiler, but whatever you get always refers to the same underlying type as longdouble. Also, these names are highly misleading. They do not indicate a 96- or 128-bit IEEE floating point format. Instead, they indicate the number of bits of alignment used by the underlying long double type. So e.g. on x86-32, long double is 80 bits, but gets padded up to 96 bits to maintain 32-bit alignment, and numpy calls this float96. On x86-64, long double is again the identical 80 bit type, but now it gets padded up to 128 bits to maintain 64-bit alignment, and numpy calls this float128. There's no extra precision, just extra padding.
Recommendation: ignore the float96/float128 names, just use numpy.longdouble. Or better yet stick to doubles unless you have a truly compelling reason. They'll be faster, more portable, etc.
It's quite recommended to use longdouble instead of float128, since it's quite a mess, ATM. Python will cast it to float64 during initialization.
Inside numpy, it can be a double or a long double. It's defined in npy_common.h and depends of your platform. I don't know if you can include it out-of-the-box into your source code.
If you don't need performance in this part of your algorithm, a safer way could be to export it to a string and use strold afterwards.
TLDR from the numpy docs:
np.longdouble is padded to the system default; np.float96 and np.float128 are provided for users who want specific padding. In spite of the names, np.float96 and np.float128 provide only as much precision as np.longdouble, that is, 80 bits on most x86 machines and 64 bits in standard Windows builds.

What is the python standards by which a string is parsed to float?

For relatively simple floats, the numerical precision is sufficient to represent them exactly. For example, 17.5 is equal to 17.5
For more complicated floats, such as
17.4999999999999982236431605997495353221893310546874 = 17.499999999999996447286321199499070644378662109375
17.4999999999999982236431605997495353221893310546875 = 17.5
Using as_integer_ratio() on the first number above, one obtains (4925812092436479, 281474976710656) and since (4925812092436479*2+1)/(2*281474976710656) equals the second number above, it becomes evident that the partition between >=17.5 and <17.5 is 1/(2*281474976710656).
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent? If there is a guarantee, how is it decided?
For the above I used, python 3.5.6, but I am interested in the general answer for python 3.x if it exists.
For relatively simple floats, the numerical precision is sufficient to represent them exactly
Not really. Yes, 17.5 can be represented exactly because it is a multiple of a power of two (a multiple of 2-1, to be exact). But even very simple floats like 0.1 cannot be represented exactly. There it depends on the text to float conversion routine to get a representation that is as close as possible.
The conversion is done by the runtime (or the C or Java runtime of the compiler, for literals), which uses the C or Java functions (like C's strtod()) to do this (Java implements the code of David Gay's strtod(), but in Java language).
Not every implementation of strtod(), i.e. not every C/Java compiler uses the same methodology to convert, so there may be slight, usually insignificant differences in some of the results.
FWIW, the website Exploring Binary (no affiliation, I'm just a big fan) has many articles on this subject. It is obviously not as simple as expected.
For relatively simple floats, the numerical precision is sufficient to represent them exactly.
No, even simple decimals don't necessarily have an exact IEEE-754 representation:
>>> format(0.1, '.20f')
'0.10000000000000000555'
>>> format(0.2, '.20f')
'0.20000000000000001110'
>>> format(0.3, '.20f')
'0.29999999999999998890'
>>> format(0.1 + 0.2, '.20f')
'0.30000000000000004441'
Powers of 2 (x.0, x.5, x.25, x.125, …) are exactly representable, modulo precision issues.
Do the python standards guarantee a particular float will be "binned" into a particular bin above, or is it implementation dependent?
Pretty sure Python simply delegates to the underlying system, so it's mostly hardware-dependent. If you want guarantees, use decimal. IIRC the native (C) implementation was merged in 3.3, and the performance impact of using decimals has thus become much, much lower than it was in Python 2.
Python floats are IEEE-754 doubles.

Numpy octuple precision floats and 128 bit ints. Why and how?

This is mostly a question out of curiosity. I noticed that the numpy test suite contains tests for 128 bit integers, and the numerictypes module refers to int128, float256 (octuple precision?), and other types that don't seem to map to numpy dtypes on my machine.
My machine is 64bit, yet I can use quadruple 128bit floats (but not really). I suppose that if it's possible to emulate quadruple floats in software, one can theoretically also emulate octuple floats and 128bit ints. On the other hand, until just now I had never heard of either 128bit ints or octuple precision floating point before. Why is there a reference to 128bit ints and 256bit floats in numpy's numerictypes module if there are no corresponding dtypes, and how can I use those?
This is a very interesting question and probably there are reasons related to python, to computing and/or to hardware. While not trying to give a full answer, here is what I would go towards...
First note that the types are defined by the language and can be different from your hardware architecture. For example you could even have doubles with an 8-bits processor. Of course any arithmetic involves multiple CPU instructions, making the computation much slower. Still, if your application requires it, it might be worth it or even required (better being late than wrong, especially if say you are running a simulation for a say bridge stability...) So where is 128bit precision required? Here's the wikipedia article on it...
One more interesting detail is that when we say a computer is say 64-bit, this is not fully describing the hardware. There are a lot of pieces that can each be (and at least have been at times) different bits: The computational registers in the CPU, the memory addressing scheme / memory registers and the different buses with most important the buss from CPU to memory.
-The ALU (arithmetic and logic unit) has registers that do calculations. Your machines are 64bit (not sure if that also mean they could do 2 32bit calculations at a similar time) This is clearly the most relevant quantity for this discussion. Long time ago, it used to be the case you could go out and buy a co-processor to speed that for calculations of higher precision...
-The Registers that hold memory addresses limit the memory the computer can see (directly) that is why computers that had 32bit memory registers could only see 2^32 bytes (or approx 4 GB) Notice that for 16bits, this becomes 65K which is very low. The OS can find ways around this limit, but not for a single program, so no program in a 32bit computer can normally have more than 4GB memmory.
-Notice that those limits are about bytes, not bits. That is because when referring and loading from memory we load bytes. In fact, the way this is done, loading a byte (8 bits) or 8 (64 bits == buss length for your computer) takes the same time. I ask for an address, and then get at once all bits through the bus.
It can be that in an architecture all these quantities are not the same number of bits.
NumPy is amazingly powerful and can handle numbers much bigger than the internal CPU representation (e.g. 64 bit).
In case of dynamic type it stores the number in an array. It can extend the memory block too, that is why you can have an integer with 500 digits. This dynamic type is called bignum. In older Python versions it was the type long. In newer Python (3.0+) there is only long, which is called int, which supports almost arbitrarily number of digits (-> bignum).
If you specify a data type (int32 for example), then you specify bit length and bit format, i.e. which bits in memory stands for what. Example:
dt = np.dtype(np.int32) # 32-bit integer
dt = np.dtype(np.complex128) # 128-bit complex floating-point number
Look in: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

Do Python and Haskell have the float uncertanity issue of C/C++?

First of all, I was not studying math in English language, so I may use wrong words in my text.
Float numbers can be finite(42.36) and infinite (42.363636...)
In C/C++ numbers are stored at base 2. Our minds operate floats at base 10.
The problem is -
many (a lot, actually) of float numbers with base 10, that are finite, have no exact finite representation in base 2, and vice-versa.
This doesn't mean anything most of the time. The last digit of double may be off by 1 bit - not a problem.
A problem arises when we compute two floats that are actually integers. 99.0/3.0 on C++ can result in 33.0 as well as 32.9999...99. And if you convert it to integer then - you are in for a surprise. I always add a special value (2*smallest value for given type and architecture) before rounding up in C for this reason. Should I do it in Python or not?
I have run some tests in Python and it seems float division always results as expected. But some tests are not enough because the problem is architecture-dependent. Do somebody know for sure if it is taken care of, and on what level - in float type itself or only in rounding up and shortening functions?
P.S. And if somebody can clarify the same thing for Haskell, which I am only starting with - it would be great.
UPDATE
Folks pointed out to an official document stating there is uncertainty in floating point arithmetic. The remaining question is - do math functions like ceil take care of them or should I do it on my own? This must be pointed out to beginner users every time we speak of these functions, because otherwise they will all stumble on that problem.
The format C and C++ use for representing float and double is standardized (IEEE 754), and the problems you describe are inherent in that representation. Since Python is implemented in C, its floating point types are prone to the same rounding problems.
Haskell's Float and Double are a somewhat higher level abstraction, but since most (all?) modern CPUs use IEEE754 for floating point calculations, you most probably will have that kind of rounding errors there as well.
In other words: Only languages/libraries which choose to not base their floating point types on the underlying architecture might be able to circumvent the IEEE754 rounding problems to a certain degree, but since the underlying hardware does not support other representations directly, there has to be a performance penalty. Therefore, probably most languages will stick to the standard, not least because its limitations are well known.
Real numbers themselves, including floats, are never "infinite" in any mathematical sense. They may have infinite decimal representations, but that's only a technical problem of the way we write them (or store them in computers). In fact though, IEEE754 also specifies +∞ and -∞ values, those are actual infinities... but they don't represent real numbers and are mathematically quite horrible in many a way.
Also... "And if you convert it to integer then" you should never "convert" floats to integers anyway, it's not really possible: you can only round them to integers. and if you do that with e.g. Haskell's round, it's pretty safe indeed, certainly
Prelude> round $ 99/3
33
Though ghci calculates the division with floating-point.
The only things that are always unsafe:
Of course, implicit conversion from float to int is completely crazy, and positively a mistake in the C-languages. Haskell and Python are both properly strongly typed, so such stuff won't happen by accident.
Floating-points should generally not be expected to be exactly equal to anything particular. It's not really useful to expect so anyway, because for actual real numbers any single one is a null set, which roughly means the only way two real number can be equal is if there's so deep mathematical reason for it. But for any distribution e.g. from a physical process, the probability for equalness is exactly zero, so why would you check?Only comparing numbers OTOH, with <, is perfectly safe (unless you're dealing with very small differences between huge numbers, or you use it to "simulate" equality by also checking >).
Yes, this is a problem in Python.
See https://docs.python.org/2/tutorial/floatingpoint.html
Python internally represents numbers as C doubles, so you will have all the problems inherent to floating point arithmetics. But it also includes some algorithms to "fix" the obvious cases. The example you give, 32.99999... is recognised as being 33.0. From Python 2.7 and 3.1 onwards they do this using Gay's algorithm; that is, the shortest string that rounds back to the original value. You can see a description in Python 3.1 release notes. In earlier versions, it just rounds to the first 17 decimal places.
As they themselves warn, it doesn't mean that it is going to work as decimal numbers.
>>> 1.1 + 2.2
3.3000000000000003
>>> 1.1 + 2.2 == 3.3
False
(But that should already be ringing your bells, as comparing floating point numbers for equality is never a good thing)
If you want to assure precision to a number of decimal places (for example, if you are working with finances), you can use the module decimal from the standard library. If you want to represent fractional numbers, you could use fractions, but they are both slower than plain numbers.
>>> import decimal
>>> decimal.Decimal(1.1) + decimal.Decimal(2.2)
Decimal('3.300000000000000266453525910')
# Decimal is getting the full floating point representation, no what I type!
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2')
Decimal('3.3')
# Now it is fine.
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == 3.3
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal(3.3)
False
>>> decimal.Decimal('1.1') + decimal.Decimal('2.2') == decimal.Decimal('3.3')
True
In addition to the other fantastic answers here, saying roughly that IEEE754 has exactly the same issues no matter which language you interface to them with, I'd like to point out that many languages have libraries for other kinds of numbers. Some standard approaches are to use fixed-point arithmetic (many, but not all, of IEEE754's nuances come from being floating-point) or rationals. Haskell also libraries for the computable reals and cyclotomic numbers.
In addition, using these alternative kinds of numbers is especially convenient in Haskell due to its typeclass mechanism, which means that doing arithmetic with these other types of numbers looks and feels exactly the same and doing arithmetic with your usual IEEE754 Floats and Doubles; but you get the better (and worse!) properties of the alternate type. For example, with appropriate imports, you can see:
> 99/3 :: Double
33.0
> 99/3 :: Fixed E12
33.000000000000
> 99/3 :: Rational
33 % 1
> 99/3 :: CReal
33.0
> 99/3 :: Cyclotomic
33
> 98/3 :: Rational
98 % 3
> sqrt 2 :: CReal
1.4142135623730950488016887242096980785697
> sqrtInteger (-5) :: Cyclotomic
e(20) + e(20)^9 - e(20)^13 - e(20)^17
Haskell doesn't require Float and Double to be IEEE single- and double-precision floating-point numbers, but it strongly recommends it. GHC follows the recommendation. IEEE floating-point numbers have the same issues across all languages. Some of this is handled by the LIA standard, but Haskell only implements that in "a library". (No, I'm not sure what libraryor if it even exists.)
This great answer shows the various other numeric representations that are either part of Haskell (like Rational) or available from hackage like (Fixed, CReal, and Cyclotomic).
Rational, Fixed, and Cyclotomic might have similar Python libraries; Fixed is somewhat similar to the .Net Decimal type. CReal also might, but I think it might take advantage of Haskell's call-by-need and could be difficult to directly port to Python; it's also pretty slow.

Categories