Norms in Python for floating point vs. Decimal (fixed-point) - python

Is it recommended to use Python's native floating point implementation, or its decimal implementation for use-cases where precision is important?
I thought this question would be easy to answer: if accumulated error has significant implications, e.g. perhaps in calculating orbital trajectories or the like, then an exact representation might make more sense.
I'm unsure what the norms are for run-of-the-mill deep learning use-cases, for scientific computing in general (e.g. many people use numpy or scikit-learn, which I think use floating-point implementations), and for financial computing (e.g. trading strategies).
Does anyone know the norms for floating point vs. Decimal use in Python for these three areas?
Finance (Trading Strategies)
Deep Learning
Scientific Computing
Thanks
N.B.: This is /not/ a question about the difference between floating point and fixed-point representations, or why floating point arithmetic produces surprising results. This is a question about what norms are.

I know more about Deep Learning and Scientific Computing, but since my family runs a financing business, I think I can answer the question.
First and foremost, float numbers are not evil; all you need to do is understand how much precision your project needs.
Finance
In finance, depending on the usage, you can use either decimal or float numbers. Also, different banks have different requirements. Generally, if you are dealing with cash or cash equivalents, you may use decimal, since the fractional monetary unit is known. For example, for dollars the fractional monetary unit is 0.01, so you can use decimal to store it, and in the database you can use NUMBER(20,2) (Oracle) or something similar to store your decimal number. That precision is enough: banks have had systematic ways to minimize errors since long before computers appeared, and the programmers only need to correctly implement what the bank's guidelines say.
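For illustration, here is a minimal Python sketch of the "store cash as decimal" idea; the prices, the quantity, and the two-digit quantization are just assumptions for the example:

from decimal import Decimal, ROUND_HALF_EVEN

price = Decimal("19.99")                  # exact: constructed from a string, not a float
quantity = 3
total = price * quantity                  # Decimal arithmetic stays exact: 59.97

# Keep results on the 0.01 grid, mirroring a NUMBER(20,2) column.
total = total.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(total)                              # 59.97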
For other things in finance, like analysis and interest rates, using double is enough. Here precision is not important, but simplicity matters. CPUs are optimized to work with floating-point numbers, so no special methods are needed for float arithmetic. Since computer arithmetic is a huge topic, using an optimized and well-tested way to perform a calculation is much safer than inventing your own arithmetic. Also, one or two float calculations will not have a huge impact on precision. For example, banks usually store the value as a decimal, multiply it by a float interest rate, and then convert the result back to decimal; this way errors do not accumulate. Considering we only need two digits to the right of the decimal point, a float's precision is quite enough for such a computation.
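As a rough sketch of that "decimal in, float rate, decimal out" pattern, where the rate, the rounding mode, and the string round-trip are my own choices for illustration:

from decimal import Decimal, ROUND_HALF_EVEN

balance = Decimal("1000.00")             # stored exactly, e.g. from a NUMBER(20,2) column
rate = 0.0375                            # interest rate as an ordinary float

interest = float(balance) * rate         # one float multiplication: plenty of precision for 2 decimals
interest = Decimal(str(interest)).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
print(interest)                          # 37.50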
I have heard that in investment banks, they use double in all of their systems since they deal with very large amounts of cash. Thus in these banks, simplicity and performance are more important than precision.
Deep Learning
Deep Learning is one of the fields that does not need high precision but does need high performance. A neural network can have millions of parameters, so the precision of a single weight or bias will not affect the network's predictions. Instead, the network needs to compute very fast in order to train on a given dataset and produce predictions in a reasonable amount of time. In addition, many accelerators are optimized for a specific float type: half precision, i.e. fp16. To reduce the network's memory footprint and to speed up training and inference, many neural networks run in mixed-precision mode: the framework and the accelerator driver decide which parameters can be computed in fp16 with minimal risk of overflow and underflow, since fp16 has a fairly small range (roughly 6×10^-8 up to 65504), while the remaining parameters are kept in fp32. On some edge devices the usable memory is very small (for example, the Kendryte K210 and the Edge TPU have only 8 MB of on-board SRAM), so neural networks are quantized to 8-bit fixed-point numbers to fit on these devices. Fixed-point numbers are, like decimals, the opposite of floating-point numbers: they have a fixed number of digits after the point, and they are usually stored as int8 or uint8.
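A quick NumPy illustration of fp16's limited range; the particular values are chosen only to show overflow and underflow:

import numpy as np

print(np.finfo(np.float16).max)    # 65504.0, the largest finite fp16 value
print(np.float16(70000.0))         # inf: overflow when casting down from a wider type
print(np.float16(1e-9))            # 0.0: underflow, below fp16's smallest subnormal (~6e-8)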
Scientific Computation
The double type (i.e. a 64-bit floating-point number) usually meets scientists' needs in scientific computation. In addition, IEEE 754 also defines a quad-precision (128-bit) format to facilitate scientific computation, and Intel's x86 processors have an 80-bit extended-precision format.
However, some scientific computations need arbitrary-precision arithmetic. For example, computing pi or running certain astronomical simulations requires very high precision, so they need something different: arbitrary-precision floating-point numbers. One of the most famous libraries supporting arbitrary-precision arithmetic is the GNU Multiple Precision Arithmetic Library (GMP). It stores the digits of each number across multiple machine words in memory and carries out the arithmetic word by word, much like schoolbook long arithmetic.
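In Python specifically, a library such as mpmath (also mentioned in one of the answers below) gives arbitrary precision with very little code; a minimal sketch, with the 50-digit setting chosen arbitrarily:

from mpmath import mp

mp.dps = 50          # ask for 50 significant decimal digits
print(mp.pi)         # 3.1415926535897932384626433832795028841971693993751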
In general, standard floating-point numbers are designed quite well and elegantly. As long as you understand your needs, floating-point numbers are adequate for most uses.

Related

Numerical stability of argument of complex number / branch cuts

I am implementing a numerical evaluation of some analytical expressions which involve factors like exp(1i*arg(z) / 2), where z is in principle a complex number, which sometimes happens to be almost real (i.e. real to within floating point precision, e.g. with an imaginary part of 4.440892098500626e-16j).
I have implemented my computations in Python and C++ and find that the results sometimes disagree because the small imaginary parts of the "almost real" numbers differ slightly in sign, and the branch-cut behaviour of arg(z) (i.e. arg(-1+0j) = pi, but arg(-1-0j) = -pi) then significantly changes the result … I was wondering if there is any commonly used protocol to mitigate these issues?
Many thanks in advance.
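A small demonstration of the branch-cut sensitivity being described, with signed zeros standing in for the tiny positive and negative imaginary parts (this only reproduces the effect, it is not a mitigation):

import numpy as np

z_plus  = complex(-1.0,  0.0)    # imaginary part collapsed to +0.0
z_minus = complex(-1.0, -0.0)    # imaginary part collapsed to -0.0
print(np.angle(z_plus))          #  3.141592653589793
print(np.angle(z_minus))         # -3.141592653589793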

Leveraging high precision GPUs in TensorFlow

Hi, I was reading the Using GPUs page for TensorFlow and I was wondering whether GPU precision performance is ever a factor in TensorFlow. For example, given a machine with two cards,
gaming gpu
+
workstation gpu
is there any implementation in which the workstation card's higher precision performance could overcome its slower clock speed?
I'm not sure whether such situations would arise in the context of gradient descent, or in network performance after training, or elsewhere entirely, but I would love to get some more information on the topic!
Thanks in advance.
TL;DR
The opposite is actually the case. Higher-precision calculations are less desirable for frameworks like TensorFlow, because they mean slower training and larger models (more RAM and disk space).
The long version
Neural networks actually benefit from using lower precision representations. This paper is a good introduction to the topic.
The key finding of our exploration is that deep neural networks can be trained using low-precision fixed-point arithmetic, provided that the stochastic rounding scheme is applied while operating on fixed-point numbers.
They use 16-bit fixed-point numbers rather than the much higher-precision 32-bit floating-point numbers (more information on their difference here).
The paper includes a figure showing the test error for different rounding schemes and for different numbers of bits dedicated to the integer part of the fixed-point representation. The solid red and blue lines (16-bit fixed point) have an error very similar to the black line (32-bit float).
The main benefit of (and driver for) going to lower precision is the computational cost and the storage of the weights, so higher-precision hardware would not give enough of an accuracy increase to outweigh the cost of slower computation.
Studies like this, I believe, are a large driver behind the specifications of neural-network-specific processing hardware, such as Google's new TPU. Even though most GPUs don't support 16-bit floats yet, Google is working to support it.
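For intuition, here is a minimal NumPy sketch of stochastic rounding to a fixed-point grid. This is not the paper's implementation or anything in TensorFlow, just the basic idea, with the number of fractional bits chosen arbitrarily:

import numpy as np

def stochastic_round_fixed(x, frac_bits=12, rng=None):
    """Round x onto a grid with 2**-frac_bits spacing, rounding up or down
    with probability proportional to how close x is to each neighbour."""
    rng = np.random.default_rng() if rng is None else rng
    scale = 2.0 ** frac_bits
    scaled = np.asarray(x, dtype=np.float64) * scale
    lower = np.floor(scaled)
    frac = scaled - lower                       # distance to the lower grid point, in [0, 1)
    round_up = rng.random(frac.shape) < frac    # round up with probability equal to that distance
    return (lower + round_up) / scale

# The rounding is unbiased: on average the rounded values track the originals.
x = np.random.default_rng(0).normal(size=100_000) * 0.01
print(x.mean(), stochastic_round_fixed(x).mean())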

Prevent underflow in floating point division in Python

Suppose both x and y are very small numbers, but I know that the true value of x / y is reasonable.
What is the best way to compute x/y?
In particular, I have been computing np.exp(np.log(x) - np.log(y)) instead, but I'm not sure whether that makes a difference at all?
Python uses the floating-point features of the hardware it runs on, according to Python documentation. On most common machines today, that is IEEE-754 arithmetic or something near it. That Python documentation is not explicit about rounding mode but mentions in passing that the result of a sample division is the nearest representable value, so presumably Python uses round-to-nearest-ties-to-even mode. (“Round-to-nearest” for short. If two representable values are equally close in binary floating-point, the one with a zero in the low bit of its significand is produced.)
In IEEE-754 arithmetic in round-to-nearest mode, the result of a division is the representable value nearest to the exact mathematical value. Since you say the mathematical value of x/y is reasonable, it is in the normal range of representable values (not below it, in the subnormal range, where precision suffers, and not above it, where results are rounded to infinity). In the normal range, results of elementary operations will be accurate within the normal precision of the format.
However, since x and y are “very small numbers,” we may be concerned that they are subnormal and have a loss of precision already in them, before division is performed. In the IEEE-754 basic 64-bit binary format, numbers below 2^-1022 (about 2.22507·10^-308) are subnormal. If x and y are smaller than that, then they have already suffered a loss of precision, and no method can produce a correct quotient from them except by happenstance. Taking the logarithms to calculate the quotient will not help.
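A small illustration of that normal/subnormal boundary in Python (the particular values below are arbitrary):

import sys

print(sys.float_info.min)    # 2.2250738585072014e-308: the smallest *normal* float64
x = 1e-310                   # below that threshold, so it is stored as a subnormal
print(x)                     # round-trips as 1e-310, but far fewer than the usual 53 significand bits survive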
If the machine you are running on happens not to be using IEEE-754, it is still likely that computing x/y directly will produce a better result than np.exp(np.log(x)-np.log(y)). The former is a single operation computing a basic function in hardware that was likely reasonably designed. The latter is several operations computing complicated functions in software that is difficult to make accurate using common hardware operations.
There is a fair amount of unease and distrust of floating-point operations. Lack of knowledge seems to lead to people being afraid of them. But what should be understood here is that elementary floating-point operations are very well defined and are accurate in normal ranges. The actual problems with floating-point computing arise from accumulating rounding errors over sequences of operations, from the inherent mathematics that compounds errors, and from incorrect expectations about results. What this means is that there is no need to worry about the accuracy of a single division. Rather, it is the overall use of floating-point that should be kept in mind. (Your question could be better answered if it presented more context, illuminating why this division is important, how x and y have been produced from prior data, and what the overall goal is.)
Note
A not uncommon deviation from IEEE-754 is to flush subnormal values to zero. If you have some x and some y that are subnormal, some implementations might flush them to zero before performing operations on them. However, this is more common in SIMD code than in normal scalar programming. And, if it were occurring, it would prevent you from evaluating np.log(x) and np.log(y) anyway, as subnormal values would be flushed to zero in those as well. So we can likely dismiss this possibility.
Division, like other IEEE-754-specified operations, is computed as if at infinite precision and then rounded (with the ordinary rounding rules) to the closest representable float. The result of calculating x/y will almost certainly be a lot more accurate than the result of calculating np.exp(np.log(x) - np.log(y)) (and is guaranteed not to be less accurate).
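A short sketch contrasting the two routes on small but still normal-range float64 values (the specific numbers are arbitrary):

import math

x, y = 4e-300, 8e-300                         # tiny, but still normal doubles
print(x / y)                                  # 0.5: a single division, rounded once
print(math.exp(math.log(x) - math.log(y)))    # also ~0.5, but every log/exp step rounds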

Encoding float constants as extremely long binary strings

Recently, I've been trying to implement the 15 tests for randomness described in NIST SP800-22. As a check of my implementations, I have been running the examples that the NIST document provides for each of its tests. Some of these tests require bit strings that are very long (up to a million bits). For example, in one of the examples, the input is "the first 100,000 bits of e." That brings up the question: how do I generate a bit representation of a value that exceeds the precision available for floating point numbers in Python?
I have found articles on converting integers to binary strings (the bin() function), and on converting floating-point fractions to binary (repeated division by 2, which is slow, and limited by floating point precision). I've considered constructing it iteratively in some way using $e=\sum_{n=0}^{\infty}\frac{2n+2}{(2n+1)!}$: calculating the next term, converting it to a binary representation, and somehow adding it to the cumulative representation (still thinking through how to do this). However, I've hit the same wall going down this path: the precision of the floating point values as I go farther out in this sum.
Does anyone have some suggestions on creating arbitrarily long bit strings from arbitrarily precise floating point values?
PS - Also, is there any way to get my Markdown math equation above to render properly here? :-)
I maintain the gmpy2 library, and it supports arbitrary-precision binary arithmetic. Here is an example of generating the first 100 bits of e.
>>> import gmpy2
>>> gmpy2.get_context().precision=100
>>> gmpy2.exp(1).digits(2)[0]
'1010110111111000010101000101100010100010101110110100101010011010101011111101110001010110001000000010'
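The same idea scales directly to the NIST example. A hedged sketch: the extra guard bits, the slice, and the choice to start at the most significant bit are my own, not part of the NIST procedure or of gmpy2's API beyond what is shown above.

import gmpy2

gmpy2.get_context().precision = 100_064        # a few guard bits beyond what we keep
bits = gmpy2.exp(1).digits(2)[0][:100_000]     # first 100,000 binary digits of e
print(len(bits), bits[:20])                    # 100000 10101101111110000101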

floating point subtraction in python [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
I'm trying to subtract two floating point numbers in Python. I have the values
a = 1460356156116843.000000, b = 2301.93138123
When I print a - b, the result is 1460356156114541.000000 instead of the actual value 1460356156114541.06861877.
What are the limitations of floating point arithmetic in Python? Is there any way I can get the actual result of this subtraction?
Python has the same limitations for floating point arithmetic as all the other languages. You can use Decimal to get the accurate result:
from decimal import Decimal
a = Decimal('1460356156116843.000000')
b = Decimal('2301.93138123')
print(a - b)  # 1460356156114541.06861877
Python uses IEEE 754 doubles for its floats. So you should treat anything after 15 significant figures or so as science fiction. And that's just for a freshly-initialised number. When you start doing operations with floats you can lose more precision, especially doing addition or subtraction between numbers that differ significantly in absolute magnitude.
OTOH, doing subtraction between numbers very close to each other in magnitude can lead to catastrophic cancellation.
If you are careful, you can reduce the impact of these problems, but you do need a good understanding of how floating-point arithmetic works, and well-behaved data.
Alternatively, you can work with a library that provides higher precision, e.g. Python's Decimal module. You still need to take care to avoid catastrophic cancellation and the other problems that lead to loss of significance, but at least you've got more significant digits to play with.
The Decimal module just provides basic arithmetic operations. If you need advanced mathematical functions like trig and exponential functions, take a look at the excellent 3rd-party arbitrary precision mathematics module mpmath. It can handle complex numbers, solve equations, and provides some calculus operations.
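A tiny illustration of the cancellation issue mentioned above (no exact output is claimed; the point is that the leading digits cancel and only the representation error remains):

a = 1.0 + 1e-7     # 1e-7 cannot be represented exactly, so a carries a tiny error
b = 1.0
print(a - b)       # close to 1e-07 but not exactly: the subtraction itself is exact,
                   # yet it exposes the rounding error already baked into a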
Using decimal is convenient. But for the sake of demonstrating the importance of keeping significant digits, let me throw in this example.
import sys
print(sys.maxsize)  # 9223372036854775807 on a 64-bit machine; Python integers themselves can grow as needed
So for the above case you can do the computation in two steps:
1460356156116842 - 2301 = 1460356156114541  # all integer digits preserved
1 - .93138123 = 0.06861877                  # all significant fractional digits preserved
The answer is then the sum of the two. But if you add them back together as a single float, you lose the fractional digits again: 64 bits are not enough to keep all of those digits at once.
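A minimal sketch of that two-step idea (the variable names are just for illustration), showing that recombining into a single float throws the fraction away again:

int_part = 1460356156116842 - 2301        # Python ints are exact: 1460356156114541
frac_part = 1 - 0.93138123                # very close to 0.06861877, up to the float representation error
print(int_part, frac_part)
print(int_part + frac_part)               # 1460356156114541.0: the fraction is lost again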
