I have a random set of numbers in a SQL database:
1.2
0.4
5.1
0.0000000000232
1
7.54
0.000000000000006534
The decimals way below one are displayed in scientific notation:
num = 0.0000000000232
print(num)
> 2.32e-11
But that causes the rest of my code to bug out, as the API behind it expects a decimal number. I checked by increasing the precision with :.20f, and that works fine.
Since the very small numbers are not consistent in their precision, it would be unwise to simply set a static .20f.
What is a more elegant way to convert this to the correct decimal string, always dynamic with the precision?
If Python provides a way to do this, they've hidden it very well. But a simple function can do it.
import sys
from math import floor, log

def float_to_str(x):
    # digits to the left of the decimal point (negative for numbers < 1)
    to_the_left = 1 + floor(log(x, 10))
    # how many significant digits remain for the fractional part
    to_the_right = sys.float_info.dig - to_the_left
    if to_the_right <= 0:
        # nothing meaningful after the point; int() never uses scientific notation
        s = str(int(x))
    else:
        s = format(x, f'0.{to_the_right}f').rstrip('0')
    return s
>>> for num in [1.2, 0.4, 5.1, 0.0000000000232, 1, 7.54, 0.000000000000006534]:
...     print(float_to_str(num))
...
1.2
0.4
5.1
0.0000000000232
1.
7.54
0.000000000000006534
The first part uses the base-10 logarithm to figure out how many digits will be to the left of the decimal point, or, as a negative count, how many zeros sit to the right of it when the number is less than 1. To find out how many digits can go to the right, we take the total number of significant digits that a float can hold, as given by sys.float_info.dig (15 on most Python implementations), and subtract the digits on the left. If this number is not positive there won't be anything but garbage after the decimal point, so we can rely on integer conversion instead; it never uses scientific notation. Otherwise we simply build the proper format string for format. As the final step we strip off the redundant trailing zeros.
Using integers for large numbers isn't perfect because we lose the rounding that naturally occurs with floating point string conversion. float_to_str(1e25) for example will return '10000000000000000905969664'. Since your examples didn't contain any such large numbers I didn't worry about it, but it could be fixed with a little more work. For the reasons behind this see Is floating point math broken?
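As an aside, a similar result can be had by round-tripping through the decimal module, whose 'f' format never falls back to scientific notation. This is not part of the answer above, just a minimal sketch (the function name is illustrative):
from decimal import Decimal

def float_to_str_via_decimal(x):
    # repr(x) is the shortest string that round-trips (it may use scientific
    # notation); Decimal parses it exactly, and the 'f' format then spells
    # it out in plain positional notation.
    return format(Decimal(repr(x)), 'f')

print(float_to_str_via_decimal(0.000000000000006534))  # 0.000000000000006534
print(float_to_str_via_decimal(7.54))                  # 7.54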
The value 0.1 is not representable as a 64-bit float.
The exact value stored is roughly equal to 0.10000000000000000555
https://www.exploringbinary.com/why-0-point-1-does-not-exist-in-floating-point/
You can highlight this behavior with this simple code:
timestep = 0.1
iterations = 1_000_000
total = 0
for _ in range(iterations):
    total += timestep
print(total - timestep * iterations)  # output is not zero but 1.3328826753422618e-06
I totally understand why 0.1 is not representable exactly as a 64-bit float, but what I don't get is why, when I do print(0.1), it outputs 0.1 and not the underlying 64-bit value.
Of course, the underlying value has many more digits in base 10, so there must be some rounding involved, but I am looking for the specification that covers all values, and for how to control that.
I hit the issue with an application storing data in a database:
the Python app (using str(0.1)) would show 0.1
another database client UI would show 0.10000000000000000555, which would throw off the end user
P.S.: I had other issues with other values
Regards,
First, you are right: floats (single, double, whatever) have an exact value.
For a 64-bit IEEE-754 double, the nearest representable value to 0.1 is exactly 0.1000000000000000055511151231257827021181583404541015625, quite long as you can see. All representable floating-point values have a finite number of decimal digits, because the base (2) is a divisor of a power of 10.
For a REPL language like Python, it is essential to have this property:
the printed representation of a float shall be reinterpreted as the same value
A consequence is that
any two different floats shall have different printed representations
To obtain those properties, there are several possibilities:
print the exact value. That can be many digits, and for the vast majority of humans, just noise.
print enough digits so that any two different floats have different representations. For double precision, that's 17 significant digits in the worst case. So a naive implementation would be to always print 17 significant digits.
print the shortest representation that would be reinterpreted unchanged.
Python, like many other languages, has chosen the third solution, because it is considered annoying to print 0.10000000000000001 when the user has entered 0.1. The printed representation is for human consumption, and humans generally prefer the shorter form. The shorter, the better.
The bad property is that it can give the false impression that these floating-point values store exact decimal values like 1/10. That misconception is something this site and many other places now work to dispel.
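Here is a quick sketch of the three options in Python; the exact digits shown assume a standard IEEE-754 double:
from decimal import Decimal

x = 0.1
print(Decimal(x))           # the exact stored value, all 55 digits of it
print(format(x, '.17g'))    # 17 significant digits always round-trip: 0.10000000000000001
print(repr(x))              # the shortest string that round-trips: 0.1
print(float(repr(x)) == x)  # True: the property the REPL needs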
I'm pretty new to Python, and I've made a table which calculates T = 1 + 2^(-n) - 1 and C = 2^(-n), which both give the same values from n=40 to n=52, but from n=53 to n=60 I get 0.0 for T, whereas C gives me progressively smaller decimals each time - why is this?
I think I understand why T becomes 0.0, because Python uses binary floating point and because of the machine epsilon value - but I'm slightly confused as to why C doesn't also become 0.0.
import numpy as np

t = np.zeros(21)
c = np.zeros(21)
for n in range(40, 61):
    m = n - 40
    t[m] = 1 + 2**(-n) - 1
    c[m] = 2**(-n)
    print(n, t[m], c[m])
The "floating" in floating point means that values are represented by storing a fixed number of leading digits and a scale factor, rather than assuming a fixed scale (which would be fixed point).
2**-53 only takes one (binary) digit to represent (not including the scale), but 1+2**-53 would take 54 to represent exactly. Python floats only have 53 binary digits of precision; 2**-53 can be represented exactly, but 1+2**-53 gets rounded to exactly 1, and subtracting 1 from that gives exactly 0. Thus, we have
>>> 2**-53
1.1102230246251565e-16
>>> 1+(2**-53)-1
0.0
Postscript: you might wonder why 2**-53 displays as a value not equal to the exact mathematical value when I said it was exact. That's due to the float->string conversion logic, which only keeps enough decimal digits to reconstruct the original float (instead of printing a bunch of digits at the end that are usually just noise).
The difference between the two is indeed due to the floating-point representation. If you compute 1 + x where x is a very small number, the result's exponent is that of 1 and the precision is carried entirely by the mantissa, which is 52 bits in a 64-bit double. Therefore 1 + 2^(-n) is equal to 1 whenever n > 52. However, even 2^(-100) on its own can be represented in double precision, because the exponent field takes care of the scale; that is why you see C keep decreasing while T collapses to zero.
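Both effects are easy to demonstrate; a short sketch (math.ulp needs Python 3.9+):
import sys
import math

print(sys.float_info.epsilon)  # 2.220446049250313e-16, i.e. 2**-52, the gap just above 1.0
print(math.ulp(1.0))           # the same value (Python 3.9+)
print(1 + 2**-52 - 1)          # 2.220446049250313e-16: still survives the addition
print(1 + 2**-53 - 1)          # 0.0: rounded away, as in the T column
print(2**-100)                 # 7.888609052210118e-31: fine on its own, as in the C column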
I am having a bit of trouble understanding some results I am getting while operating with Python 2.7.
>>> x=1
>>> e=1e-20
>>> x+e
1.0
>>> x+e-x
0.0
>>> e+x-x
0.0
>>> x-x+e
1e-20
This is copied directly from Python. I am taking a class on programming in Python and I do not understand this disparity of results (x+e == 1 and x-x+e == 1e-20, but x+e-x == 0 and e+x-x == 0).
I have already read the Python tutorial section on representation error, but I believe none of this is covered there.
Thanks in advance
Floating-point addition is not associative.
x+e-x is grouped as (x+e)-x. It adds x and e, rounds the result to the nearest representable number (which is 1), then subtracts x from the result and rounds again, producing 0.
x-x+e is grouped as (x-x)+e. It subtracts x from x, producing 0, and rounds it to the nearest representable number, which is 0. It then adds e to 0, producing e, and rounds it to the nearest representable number, which is e.
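To see the grouping spelled out, and one way around it, here is a small sketch; math.fsum returns the correctly rounded sum of the whole sequence:
import math

x, e = 1.0, 1e-20
print((x + e) - x)            # 0.0: x + e rounds to 1.0 before the subtraction
print((x - x) + e)            # 1e-20: x - x is exactly 0.0, so e survives
print(math.fsum([x, e, -x]))  # 1e-20: fsum compensates for the lost low-order bits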
This is because of the way that computers represent floating point numbers.
This is all really in binary format but let's pretend that it works with base 10 numbers because that's a lot easier for us to relate to.
A floating point number is expressed in the form 0.x*10^y, where x is a 10-digit number (I'm omitting trailing zeroes here) and y is the exponent. This means that the number 1.0 is expressed as 0.1*10^1 and the number 0.1 as 0.1*10^0.
To add these two numbers together we need to make sure that they have the same exponent. We can do this easily by shifting the numbers back and forth, i.e. we change 0.1*10^0 to 0.01*10^1 and then add them together to get 0.11*10^1.
When we have 0.1*10^1 and 0.1*10^-19 (1e-20), we have to shift 0.1*10^-19 by 20 steps, meaning that the 1 falls outside the range of our 10-digit number, so we end up with 0.1*10^1 + 0.0*10^1 = 0.1*10^1.
The reason you end up with 1e-20 in your last example is that addition is done from left to right: we subtract 0.1*10^1 from 0.1*10^1, ending up with 0.0*10^0, and then add 0.1*10^-19 to that, which is a special case where we don't need to shift either of them because one of them is exactly zero.
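The analogy can be made concrete with the decimal module, whose precision is adjustable; a sketch with 10 significant digits, mirroring the description above:
from decimal import Decimal, getcontext

getcontext().prec = 10  # simulate a machine that keeps 10 decimal digits
one = Decimal(1)
e = Decimal("1e-20")
print(one + e - one)    # 0E-9: the 1e-20 was shifted out of the 10 digits
print(one - one + e)    # 1E-20: adding e to an exact zero needs no shifting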
The reason I'm asking this is because there is a validation in OpenERP that it's driving me crazy:
>>> round(1.2 / 0.01) * 0.01
1.2
>>> round(12.2 / 0.01) * 0.01
12.200000000000001
>>> round(122.2 / 0.01) * 0.01
122.2
>>> round(1222.2 / 0.01) * 0.01
1222.2
As you can see, the second round is returning an odd value.
Can someone explain to me why is this happening?
This has in fact nothing to do with round; you can witness the exact same problem if you just do 1220 * 0.01:
>>> 1220*0.01
12.200000000000001
What you see here is a standard floating point issue.
You might want to read what Wikipedia has to say about floating point accuracy problems:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Also see:
Numerical analysis
Numerical stability
A simple example of numerical instability with floating point:
the numbers are finite. Let's say we keep 4 digits after the decimal point in a given computer or language.
Then 0.0001 multiplied by 0.0001 yields something smaller than 0.0001, and it is therefore impossible to store this result!
In this case, if you calculate (0.0001 x 0.0001) / 0.0001 = 0.0001, this simple computer fails to be accurate because it tries to multiply first and only divides afterwards. In JavaScript, dividing by fractions leads to similar inaccuracies.
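The same effect can be reproduced with real doubles near the underflow threshold; a sketch, where 5e-324 is the smallest positive (subnormal) double:
tiny = 5e-324
print(tiny * 0.5 / 0.5)  # 0.0: the multiplication underflows to zero first
print(tiny / 0.5 * 0.5)  # 5e-324: dividing first keeps the value in range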
The float type that you are using stores binary floating point numbers. Not every decimal number is exactly representable as a float. In particular there is no exact representation of 1.2 or 0.01, so the actual number stored in the computer will differ very slightly from the value written in the source code. This representation error can cause calculations to give slightly different results from the exact mathematical result.
It is important to be aware of the possibility of small errors whenever you use floating point arithmetic, and write your code to work well even when the values calculated are not exactly correct. For example, you should consider rounding values to a certain number of decimal places when displaying them to the user.
You could also consider using the decimal type, which stores decimal floating-point numbers. If you use decimal then 1.2 can be stored exactly. However, working with decimal will reduce the performance of your code. You should only use it if exact representation of decimal numbers is important. You should also be aware that decimal does not mean that you'll never have any problems: for example, 1/3 = 0.33333... has no exact representation as a decimal either.
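A minimal sketch of the trade-off, using the questioner's numbers:
from decimal import Decimal

print(round(12.2 / 0.01) * 0.01)                            # 12.200000000000001
print(Decimal("12.2") / Decimal("0.01") * Decimal("0.01"))  # 12.2, exactly
print(Decimal(1) / Decimal(3))  # 0.3333333333333333333333333333: 1/3 still can't be exact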
There is a loss of accuracy from the division due to the way floating point numbers are stored, so you see that this identity doesn't hold
>>> 12.2 / 0.01 * 0.01 == 12.2
False
bArmageddon has provided a bunch of links which you should read, but I believe the takeaway message is: don't expect floats to give exact results unless you fully understand the limits of the representation.
In particular, don't use floats to represent amounts of money, which is a pretty common mistake!
Python also has the decimal module, which may be useful to you
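For money-style arithmetic the difference looks like this (a quick sketch):
from decimal import Decimal

print(0.1 + 0.2)                          # 0.30000000000000004
print(Decimal("0.10") + Decimal("0.20"))  # 0.30: exact, and the cents stay aligned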
Others have answered your question and mentioned that many numbers don't have an exact binary fractional representation. If you are accustomed to working only with decimal numbers, it can seem deeply weird that a nice, "round" number like 0.01 could be a non-terminating number in some other base. In the spirit of "seeing is believing," here's a little Python program that will print out a binary representation of any number to any desired number of digits.
from decimal import Decimal, getcontext

n = Decimal("0.01")  # the number to print the binary equivalent of
m = 1000             # maximum number of digits to print

getcontext().prec = m  # raise the working precision so the powers of two stay exact

p = -1
r = []
w = int(n)
n = abs(n) - abs(w)
while n and -p < m:
    s = Decimal(2) ** p
    if n >= s:
        r.append("1")
        n -= s
    else:
        r.append("0")
    p -= 1

print("%s%s.%s%s" % ("-" if w < 0 else "", bin(abs(w))[2:],
                     "".join(r), "..." if n else ""))
Python's math module contains handy functions like floor and ceil. These functions take a floating-point number and return the nearest integer below or above it. However, these functions return the answer as a floating-point number. For example:
import math
f = math.floor(2.3)
Now f holds:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999)? Or should I perhaps use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format.
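To illustrate, a small sketch (note that Python 3's math.floor already returns an int; in Python 2 it returns a float):
import math

f = math.floor(2.3)  # 2.0 in Python 2, 2 in Python 3
print(int(f))        # 2: exact, because the float 2.0 represents the integer 2 exactly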
Using int(your_float) will nail it.
print(int(2.3))           # 2
print(int(math.sqrt(5)))  # 2
You could use the round function. If you pass no second parameter (the number of decimal places) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32-bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including -2^53 and 2^53.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring fewer digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000 (53 zeroes). As we know, we can store 53 digits, so that makes the rightmost zero padding.
This is where 2^53 comes from.
More detail: We need to consider how IEEE-754 floating point works.
[ sign | exponent | mantissa ]
  1 bit   11 / 8     52 / 23    # field widths in bits, double / single precision
The number is then calculated as follows (excluding special cases that are irrelevant here):
(-1)^sign × 1.mantissa × 2^(exponent - bias)
where bias = 2^(exponent bits - 1) - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2^X simply shifts all bits X places to the left, it's easy to see that for any integer, all the bits of the mantissa that end up to the right of the decimal point must be zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be an MSB that is one, which is why it's not stored. To store the integer, we must bring it into the aforementioned form: (-1)^sign × 1.mantissa × 2^(exponent - bias).
That is the same as shifting the bits over the decimal point until only the MSB remains to its left. All the bits to the right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. That's 53 ones (52 stored + the implicit 1) in the case of doubles.
For this, we need to set the exponent such that the decimal point is shifted 52 places. If we were to increase the exponent by one, we could no longer know the digit x just left of the decimal point:
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 2^53, which marks the boundary (both negative and positive) up to which we can accurately represent all integers. If we wanted to add one to 2^53, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
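The boundary is easy to observe; a sketch:
print(2.0**53)      # 9007199254740992.0
print(2.0**53 + 1)  # 9007199254740992.0: 2**53 + 1 is not representable and rounds back
print(2.0**53 + 2)  # 9007199254740994.0: the next representable integer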
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it first to a float and then to an int. This will work for both float strings and integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any digits after the decimal point.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
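For example, a sketch of that second point:
import math

big = 10**17 + 1       # exact as a Python int
as_float = float(big)  # rounds to the nearest double; the +1 is lost here
print(int(math.floor(as_float)))  # 100000000000000000, not 100000000000000001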
Another code sample, converting a real/float to an integer using variables.
Here "vel" is a real/float number that gets converted to the next highest integer, "newvel".
import math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp, [floseg, vel, Length]) as cursor:
    for row in cursor:
        curvel = float(row[1])
        newvel = int(math.ceil(curvel))  # round up, then convert to int
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
    some_value = int(some_value)
If the float is 1.0, for example, 1.0 is equal to 1, so the conversion to int will execute. And if the float is 1.1, int(1.1) evaluates to 1, and 1.1 != 1, so the value remains a float and you won't lose any precision.
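As an aside, floats have a built-in method that expresses the same check:
print((1.0).is_integer())  # True: safe to convert
print((1.1).is_integer())  # False: conversion would lose the fraction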