FloatingPointError in calculating standard deviation? - python

Using pandas/numpy, I sometimes get the floating point error when trying to calculate a standard deviation:
FloatingPointError: invalid value encountered in less
My code looks something like this:
def historical_volatility(p):
    return p.pct_change().ewm(span=35, min_periods=35).std()
It is only DataFrames of floats going in.
My understanding is that, because of how the standard deviation is computed, inputs with particularly low deviation can trigger a floating point error.
How can I make this more robust?
P.S. It is acceptable to set a 'minimum value' for low volatility; a result of 0 would be bad as I am subsequently dividing by these numbers.

Computing the standard deviation involves taking a square root of the variance. The variance can only come out negative if there has been some precision loss. If that is the case, you likely don't care whether your variance is on the order of 1e-16 or exactly zero. If you do care, you likely need to use extended precision (not easy) or switch to decimal arithmetic.
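A minimal sketch of one way to make the function more robust, assuming the error comes from the variance underflowing to a tiny negative value; the floor of 1e-8 is an arbitrary placeholder, not a recommendation:

import numpy as np

def historical_volatility(p, floor=1e-8):
    # Temporarily relax the FP error handling that turns the
    # "invalid value encountered" warning into an exception.
    with np.errstate(invalid="ignore"):
        vol = p.pct_change().ewm(span=35, min_periods=35).std()
    # Enforce a minimum value, since the result is later used as a divisor.
    # NaNs from the min_periods warm-up are left untouched by clip().
    return vol.clip(lower=floor)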

Norms in Python for floating point vs. Decimal (fixed-point)

Is it recommended to use Python's native floating point implementation, or its decimal implementation for use-cases where precision is important?
I thought this question would be easy to answer: if accumulated error has significant implications, e.g. perhaps in calculating orbital trajectories or the like, then an exact representation might make more sense.
I'm unsure what the norms are for run-of-the-mill deep learning use-cases, for scientific computing generally (e.g. many people use numpy or scikit-learn, which I think use floating point implementations), and for financial computing (e.g. trading strategies).
Does anyone know the norms for floating point vs. Decimal use in python for these three areas?
Finance (Trading Strategies)
Deep Learning
Scientific Computing
Thanks
N.B.: This is /not/ a question about the difference between floating point and fixed-point representations, or why floating point arithmetic produces surprising results. This is a question about what norms are.
I know more about Deep Learning and Scientific Computing, but since my family runs a financing business, I think I can answer the question.
First and foremost, float numbers are not evil; all you need to do is understand how much precision your project needs.
Finance
In finance, depending on usage, you can use either decimal or float numbers. Plus, different banks have different requirements. Generally, if you are dealing with cash or cash equivalents, you may use decimal, since the fractional monetary unit is known. For example, for dollars the fractional monetary unit is 0.01, so you can store the amount as a decimal, and in the database you can use NUMBER(20,2) (Oracle) or something similar. The precision is enough because banks have had systematic ways to minimize errors since long before computers appeared; programmers only need to correctly implement what the bank's guidelines say.
For other things in finance, like analysis and interest rates, using double is enough. Here precision is not important, but simplicity matters. CPUs are optimized for floating-point arithmetic, so no special methods are needed. Since computer arithmetic is a huge topic, using an optimized and well-tested way to perform a calculation is much safer than inventing your own arithmetic routines. Plus, one or two float calculations will not have a huge impact on precision. For example, banks usually store a value as a decimal, multiply it by a float interest rate, and then convert back to decimal; this way errors do not accumulate. Considering we only need two digits to the right of the decimal point, a float's precision is quite enough for such a computation.
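As a rough sketch of that decimal-in, float-multiply, decimal-out pattern (the balance, rate, and rounding rule below are made up for illustration):

from decimal import Decimal, ROUND_HALF_UP

balance = Decimal("1234.56")   # stored exactly as a decimal
rate = 0.0275                  # interest rate as a plain float

# Do the single multiplication in float, then round back to cents.
interest = Decimal(str(float(balance) * rate)).quantize(
    Decimal("0.01"), rounding=ROUND_HALF_UP
)
print(interest)   # Decimal('33.95')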
I have heard that in investment banks, they use double in all of their systems since they deal with very large amounts of cash. Thus in these banks, simplicity and performance are more important than precision.
Deep Learning
Deep Learning is one of the fields that does not need high precision but does need high performance. A neural network can have millions of parameters, so the precision of a single weight or bias will not affect the network's predictions. Instead, the network needs to compute very fast to train on a given dataset and produce predictions in a reasonable time. Plus, many accelerators can accelerate a specific type of float: half precision, i.e. fp16. Thus, to reduce the size of the network in memory and to speed up training and prediction, many neural networks run in mixed precision. The framework and accelerator driver decide which parameters can be computed in fp16 with minimal overflow and underflow risk, since fp16 has a pretty small range: roughly 6e-8 to 65504. The other parameters are still computed in fp32. In some edge deployments the usable memory is very small (for example, the K210 and the edge TPU have only 8 MB of onboard SRAM), so neural networks need to use 8-bit fixed-point numbers to fit on these devices. Fixed-point numbers are, in a sense, the opposite of floating-point numbers: they have a fixed number of digits after the (binary) point. They are usually represented in the system as int8 or uint8.
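A small numpy illustration of fp16's limited range and resolution (the specific values are only for demonstration):

import numpy as np

x = np.float16(60000.0)
print(x + np.float16(10000.0))   # inf: the sum exceeds the fp16 maximum of 65504

y = np.float16(1.0)
print(y + np.float16(0.0001))    # 1.0: the increment is below fp16 resolution near 1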
Scientific Computation
The double type (i.e. a 64-bit floating-point number) usually meets scientists' needs in scientific computation. In addition, IEEE 754 also defines quad precision (128 bits) to facilitate scientific computation, and Intel's x86 processors have an 80-bit extended precision format.
However, some scientific computations need arbitrary-precision arithmetic. For example, computing pi or running long astronomical simulations requires very high precision. For that there is something different, called arbitrary-precision floating-point numbers. One of the most famous libraries that supports them is the GNU Multiple Precision Arithmetic Library (GMP). Such libraries spread a number's digits across multiple machine words and implement the arithmetic in software, essentially doing long-hand (column) arithmetic.
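In Python, one way to get a feel for higher-than-double precision without external libraries is the standard decimal module (mpmath or gmpy2 would be the usual choices for serious work); a small sketch:

from decimal import Decimal, getcontext

getcontext().prec = 50            # 50 significant decimal digits
print(Decimal(2).sqrt())          # sqrt(2) to 50 digits
print(Decimal(1) / Decimal(7))    # 0.142857... to 50 digits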
In general, standard floating-point numbers are designed quite well and elegantly. As long as you understand your needs, floating-point numbers are adequate for most uses.

Numerical stability of argument of complex number / branch cuts

I am implementing a numerical evaluation of some analytical expressions which involve factors like exp(1i*arg(z) / 2), where z is in principle a complex number that sometimes happens to be almost real (i.e. its imaginary part is at floating point precision, e.g. 4.440892098500626e-16j).
I have implemented my computations in Python and C++ and find that the results sometimes disagree, because the small imaginary parts of the "almost real" numbers differ slightly in sign, and the branch cut behaviour of arg(z) (i.e. arg(-1+0j) = pi, but arg(-1-0j) = -pi) then significantly changes the result … I was wondering if there is any commonly used protocol to mitigate these issues?
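For concreteness, a tiny numpy demonstration of the sensitivity (the values are arbitrary):

import numpy as np

z_plus = -1.0 + 1e-16j
z_minus = -1.0 - 1e-16j

print(np.angle(z_plus))                    # ~ +pi
print(np.angle(z_minus))                   # ~ -pi
print(np.exp(1j * np.angle(z_plus) / 2))   # ~ +1j
print(np.exp(1j * np.angle(z_minus) / 2))  # ~ -1j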
Many thanks in advance.

Prevent underflow in floating point division in Python

Suppose both x and y are very small numbers, but I know that the true value of x / y is reasonable.
What is the best way to compute x/y?
In particular, I have been doing np.exp(np.log(x) - np.log(y)) instead, but I'm not sure whether that makes any difference at all.
Python uses the floating-point features of the hardware it runs on, according to Python documentation. On most common machines today, that is IEEE-754 arithmetic or something near it. That Python documentation is not explicit about rounding mode but mentions in passing that the result of a sample division is the nearest representable value, so presumably Python uses round-to-nearest-ties-to-even mode. (“Round-to-nearest” for short. If two representable values are equally close in binary floating-point, the one with a zero in the low bit of its significand is produced.)
In IEEE-754 arithmetic in round-to-nearest mode, the result of a division is the representable value nearest to the exact mathematical value. Since you say the mathematical value of x/y is reasonable, it is in the normal range of representable values (not below it, in the subnormal range, where precision suffers, and not above it, where results are rounded to infinity). In the normal range, results of elementary operations will be accurate within the normal precision of the format.
However, since x and y are “very small numbers,” we may be concerned that they are subnormal and have a loss of precision already in them, before division is performed. In the IEEE-754 basic 64-bit binary format, numbers below 2^-1022 (about 2.22507e-308) are subnormal. If x and y are smaller than that, then they have already suffered a loss of precision, and no method can produce a correct quotient from them except by happenstance. Taking the logarithms to calculate the quotient will not help.
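A small illustration of that subnormal precision loss (the particular constants are arbitrary):

import sys

print(sys.float_info.min)   # 2.2250738585072014e-308, smallest normal double

x = 7e-324    # subnormal: rounds to 1 unit of 2**-1074 when stored
y = 2e-323    # subnormal: rounds to 4 units of 2**-1074
print(x / y)  # 0.25, even though 7e-324 / 2e-323 "should" be 0.35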
If the machine you are running on happens not to be using IEEE-754, it is still likely that computing x/y directly will produce a better result than np.exp(np.log(x)-np.log(y)). The former is a single operation computing a basic function in hardware that was likely reasonably designed. The latter is several operations computing complicated functions in software that is difficult to make accurate using common hardware operations.
There is a fair amount of unease and distrust of floating-point operations. Lack of knowledge seems to lead to people being afraid of them. But what should be understood here is that elementary floating-point operations are very well defined and are accurate in normal ranges. The actual problems with floating-point computing arise from accumulating rounding errors over sequences of operations, from the inherent mathematics that compounds errors, and from incorrect expectations about results. What this means is that there is no need to worry about the accuracy of a single division. Rather, it is the overall use of floating-point that should be kept in mind. (Your question could be better answered if it presented more context, illuminating why this division is important, how x and y have been produced from prior data, and what the overall goal is.)
Note
A not uncommon deviation from IEEE-754 is to flush subnormal values to zero. If you have some x and some y that are subnormal, some implementations might flush them to zero before performing operations on them. However, this is more common in SIMD code than in normal scalar programming. And, if it were occurring, it would prevent you from evaluating np.log(x) and np.log(y) anyway, as subnormal values would be flushed to zero in those as well. So we can likely dismiss this possibility.
Division, like the other IEEE-754-specified operations, is computed as if at infinite precision and then (with the ordinary rounding rules) rounded to the closest representable float. The result of calculating x/y will almost certainly be a lot more accurate than the result of calculating np.exp(np.log(x) - np.log(y)) (and is guaranteed not to be less accurate).
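A quick check of the two approaches with numpy (the example values are made up; both operands are tiny but still in the normal double range):

import numpy as np

x = 1e-300
y = 3e-300
exact = 1.0 / 3.0   # the mathematically expected quotient

direct = x / y
via_logs = np.exp(np.log(x) - np.log(y))

print(direct, abs(direct - exact))      # correctly rounded quotient of the stored values
print(via_logs, abs(via_logs - exact))  # may carry extra rounding error from log/exp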

What should I worry about if I compress float64 array to float32 in numpy?

This is a particular kind of lossy compression that's quite easy to implement in numpy.
I could in principle directly compare the original (float64) to the reconstructed (float64(float32(original))) and know things like the maximum error.
Other than looking at the maximum error for my actual data, does anybody have a good idea what type of distortions this creates, e.g. as a function of the magnitude of the original value?
Would I be better off mapping all values (in 64-bits) onto say [-1,1] first (as a fraction of extreme values, which could be preserved in 64-bits) to take advantage of greater density of floats near zero?
I'm adding a specific case I have in mind. Let's say I have 500k to 1e6 values ranging from -20 to 20, that are approximately IID ~ Normal(mu=0,sigma=4) so they're already pretty concentrated near zero and the "20" is ~5-sigma rare. Let's say they are scientific measurements where the true precision is a whole lot less than the 64-bit floats, but hard to really know exactly. I have tons of separate instances (potentially TB's worth) so compressing has a lot of practical value, and float32 is a quick way to get 50% (and if anything, works better with an additional round of lossless compression like gzip). So the "-20 to 20" eliminates a lot of concerns about really large values.
The following assumes you are using standard IEEE-754 floating-point operations, which are common (with some exceptions), in the usual round-to-nearest mode.
If a double value is within the normal range of float values, then the only change that occurs when the double is rounded to a float is that the significand (the fraction portion of the value) is rounded from 53 bits to 24 bits. This will cause an error of at most 1/2 ULP (unit of least precision). The ULP of a float is 2^-23 times the greatest power of two not greater than the float. E.g., if a float is 7.25, the greatest power of two not greater than it is 4, so its ULP is 4*2^-23 = 2^-21, about 4.77e-7. So the error when a double in the interval [4, 8) is converted to float is at most 2^-22, about 2.38e-7. For another example, if a float is about .03, the greatest power of two not greater than it is 2^-6, so the ULP is 2^-29, and the maximum error when converting to float is 2^-30.
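As a quick check of those ULP figures with numpy:

import numpy as np

# np.spacing gives the ULP of its argument in the argument's own precision.
print(np.spacing(np.float32(7.25)))   # ~4.77e-07, i.e. 2^-21
print(np.spacing(np.float32(0.03)))   # ~1.86e-09, i.e. 2^-29
print(np.spacing(np.float64(7.25)))   # a much finer ULP for a double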
Those are absolute errors. The relative error is less than 2^-24, which is 1/2 ULP divided by the smallest the value could be (the smallest value in the interval for a particular ULP, i.e. the power of two that bounds it). E.g., for each number x in [4, 8), we know the number is at least 4 and the error is at most 2^-22, so the relative error is at most 2^-22/4 = 2^-24. (The error cannot be exactly 2^-24 because there is no error when converting an exact power of two from double to float, so there is an error only if x is greater than four, so the relative error is less than, not equal to, 2^-24.) When you know more about the value being converted, e.g., that it is nearer 8 than 4, you can bound the error more tightly.
If the number is outside the normal range of a float, errors can be larger. The maximum finite float is 2^128 - 2^104, about 3.40e38. When you convert a double that exceeds that by 1/2 ULP (of a float; doubles have a finer ULP) or more to float, infinity is returned, which is, of course, an infinite absolute error and an infinite relative error. (A double that is greater than the maximum finite float but is greater by less than 1/2 ULP is converted to the maximum finite float and has the same errors discussed in the previous paragraph.)
The minimum positive normal float is 2^-126, about 1.18e-38. Numbers within 1/2 ULP of this (inclusive) are converted to it, but numbers less than that are converted to a special denormalized format, where the ULP is fixed at 2^-149. The absolute error there will be at most 1/2 ULP, 2^-150. The relative error will depend significantly on the value being converted.
The above discusses positive numbers. The errors for negative numbers are symmetric.
If the value of a double can be represented exactly as a float, there is no error in conversion.
Mapping the input numbers to a new interval can reduce errors in specific situations. As a contrived example, suppose all your numbers are integers in the interval [2^48, 2^48+2^24). Then converting them to float would lose all information that distinguishes the values; they would all be converted to 2^48. But mapping them to [0, 2^24) would preserve all information; each different input would be converted to a different result.
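A small numpy sketch of that contrived example:

import numpy as np

# Integers in [2^48, 2^48 + 2^24): converting directly to float32 collapses
# them all onto 2^48, because float32 has only a 24-bit significand.
vals = np.arange(2**48, 2**48 + 2**24, 2**20, dtype=np.float64)
print(np.unique(vals.astype(np.float32)).size)             # 1

# Subtracting 2^48 first maps them to [0, 2^24), which float32 represents exactly.
print(np.unique((vals - 2**48).astype(np.float32)).size)   # 16, i.e. vals.size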
Which map would best suit your purposes depends on your specific situation.
It is unlikely that a simple transformation will reduce error significantly, since your distribution is centered around zero.
Scaling can have an effect in only two ways: One, it moves values away from the denormal interval of single-precision values, (-2^-126, 2^-126). (E.g., if you multiply by, say, 2^123, values that were in [2^-249, 2^-126) are mapped to [2^-126, 2^-3), which is outside the denormal interval.) Two, it changes where values lie in each “binade” (interval from one power of two to the next). E.g., your maximum value is 20, where the relative error may be 1/2 ULP / 20, where the ULP for that binade is 16*2^-23 = 2^-19, so the relative error may be 1/2 * 2^-19 / 20, about 4.77e-8. Suppose you scale by 32/20, so values just under 20 become values just under 32. Then, when you convert to float, the relative error is at most 1/2 * 2^-19 / 32 (or just under 32), about 2.98e-8. So you may reduce the error slightly.
With regard to the former, if your values are nearly normally distributed, very few are in (-2^-126, 2^-126), simply because that interval is so small. (A trillion samples of your normal distribution almost certainly have no values in that interval.) You say these are scientific measurements, so perhaps they are produced with some instrument. It may be that the machine does not measure or calculate finely enough to return values that range from 2^-126 to 20, so it would not surprise me if you have no values in the denormal interval at all. If you have no values in the single-precision denormal range, then scaling to avoid that range is of no use.
With regard to the latter, we see a small improvement is available at the end of your range. However, elsewhere in your range, some values are also moved to the high end of a binade, but some are moved across a binade boundary to the small end of a new binade, resulting in increased relative error for them. It is unlikely there is a significant net improvement.
On the other hand, we do not know what is significant for your application. How much error can your application tolerate? Will the change in the ultimate result be unnoticeable if random noise of 1% is added to each number? Or will the result be completely unacceptable if a few numbers change by as little as 2^-200?
What do you know about the machinery producing these numbers? Is it truly producing numbers more precise than single-precision floats? Perhaps, although it produces 64-bit floating-point values, the actual values are limited to a population that is representable in 32-bit floating-point. Have you performed a conversion from double to float and measured the error?
There is still insufficient information to rule out these or other possibilities, but my best guess is that there is little to gain by any transformation. Converting to float will either introduce too much error or it will not, and transforming the numbers first is unlikely to alter that.
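If you have not measured it yet, a quick sketch for quantifying the round-trip error on your own data (random normal draws stand in for your measurements here):

import numpy as np

rng = np.random.default_rng(0)
original = rng.normal(loc=0.0, scale=4.0, size=1_000_000)  # stand-in for your data

reconstructed = original.astype(np.float32).astype(np.float64)
abs_err = np.abs(reconstructed - original)
rel_err = abs_err / np.abs(original)

print("max absolute error:", abs_err.max())
print("max relative error:", rel_err.max())  # should stay below 2**-24, about 6e-8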
The exponent range for float32 is quite a lot smaller (the largest exponent is smaller and the smallest is larger), but assuming all your numbers are well within that range, you only need to worry about the loss of precision: float32 is only good to about 7 significant decimal digits.

Fixed-point arithmetic

Does anyone know of a library to do fixed point arithmetic in Python?
Or, does anyone have sample code?
If you are interested in doing fixed-point arithmetic, the Python Standard Library has a decimal module that can do it (a short sketch follows the list below).
Actually, it has a more flexible floating-point ability than the built-in float too. By flexible I mean that it:
Has "signals" for various exceptional conditions (these can be set to do a variety of things on signaling)
Has positive and negative infinities, as well as NaN (not a number)
Can differentiate between positive and negative 0
Allows you to set different rounding schemes
Allows you to set your own min and max values
All in all, it is handy for a million household uses.
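For example, a minimal sketch of using decimal for two-decimal-place fixed-point arithmetic (the quantize step is what keeps results at a fixed number of places; the helper name is made up):

from decimal import Decimal, ROUND_HALF_EVEN

TWO_PLACES = Decimal("0.01")

def fixed(value):
    # Round to two places, emulating fixed-point storage.
    return Decimal(value).quantize(TWO_PLACES, rounding=ROUND_HALF_EVEN)

a = fixed("19.99")
b = fixed("0.07")
print(fixed(a * b))   # 1.40  (19.99 * 0.07 = 1.3993, re-quantized to two places)
print(fixed(a + b))   # 20.06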
The deModel package sounds like what you're looking for.
Another option worth considering, if you want to simulate the behaviour of binary fixed-point numbers beyond simple arithmetic operations, is the spfpm module. It will allow you to calculate square roots, powers, logarithms and trigonometric functions using fixed numbers of bits. It's a pure-Python module, so it doesn't offer the ultimate performance, but it can do hundreds of thousands of arithmetic operations per second on 256-bit numbers.
Recently I have been working on a similar project: https://numfi.readthedocs.io/en/latest/
>>> from numfi import numfi
>>> x = numfi(0.68751,1,6,3)
>>> x + 1/3
numfi([1.125]) s7/3-r/s
>>> np.sin(x)
numfi([0.625 ]) s6/3-r/s
