When I convert a float to decimal.Decimal in Python and afterwards call Decimal.shift it may give me completely wrong and unexpected results depending on the float. Why is this the case?
Converting 123.5 and shifting it:
from decimal import Decimal
a = Decimal(123.5)
print(a.shift(1)) # Gives expected result
The code above prints the expected result of 1235.0.
If I instead convert and shift 123.4:
from decimal import Decimal
a = Decimal(123.4)
print(a.shift(1)) # Gives UNexpected result
it gives me 3.418860808014869689941406250E-18 (approx. 0) which is completely unexpected and wrong.
Why is this the case?
Note:
I understand the floating-point imprecision because of the representation of floats in memory. However, I can't explain why this should give me such a completely wrong result.
Edit:
Yes, in general it would be best not to convert floats to decimals but to convert strings to decimals instead. However, that is not the point of my question. I want to understand why shifting after a float conversion gives such a completely wrong result. If I print(Decimal(123.4)) it gives 123.400000000000005684341886080801486968994140625, so after shifting I would expect 1234.00000000000005684341886080801486968994140625, not nearly zero.
You need to change the Decimal constructor input to use strings instead of floats.
a = Decimal('123.5')
print(a.shift(1))
a = Decimal('123.4')
print(a.shift(1))
or
a = Decimal(str(123.5))
print(a.shift(1))
a = Decimal(str(123.4))
print(a.shift(1))
The output will be as expected.
1235.0
1234.0
Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float.
For floats, Decimal calls Decimal.from_float().
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1'). Since 0.1 is not exactly representable in binary floating point, the value is stored as the nearest representable value which is 0x1.999999999999ap-4. The exact equivalent of the value in decimal is 0.1000000000000000055511151231257827021181583404541015625.
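A quick check of those statements, using 0.1 as in the documentation quote:

```python
from decimal import Decimal

# Decimal(float) performs the same exact conversion as Decimal.from_float()
print(Decimal(0.1) == Decimal.from_float(0.1))  # True
# ...which is not the same value as the decimal string '0.1'
print(Decimal(0.1) == Decimal('0.1'))           # False
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625
print((0.1).hex())    # 0x1.999999999999ap-4, the stored binary value
```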
Internally, the Python decimal library converts a float into two integers representing the numerator and denominator of a fraction that yields the float.
n, d = abs(123.4).as_integer_ratio()
Because the denominator of a float's ratio is always a power of two, it then computes k, the exponent such that d == 2**k (the bit length of d minus one).
k = d.bit_length() - 1
Since n/d = n/2**k = n*5**k / 10**k, the coefficient of the decimal number is the numerator multiplied by 5 to the power k, and the exponent is -k.
coeff = str(n*5**k)
The resulting values are used to create a new Decimal object from a sign, a coefficient, and an exponent of -k.
For the float 123.5 these values are (a sign of 0 means positive)
0 1235 -1
and for the float 123.4 these values are
0 123400000000000005684341886080801486968994140625 -45
So far, nothing is amiss.
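The conversion steps above can be collected into a short sketch. It follows the pure-Python decimal implementation only loosely; the function name float_to_decimal_parts is hypothetical, and it assumes a finite float (the real code also handles infinities and NaNs):

```python
import math
from decimal import Decimal

def float_to_decimal_parts(f):
    """Return (sign, coefficient string, exponent) for a finite float.

    Hypothetical helper mirroring the steps described above; sign 0 means positive.
    """
    sign = 0 if math.copysign(1.0, f) == 1.0 else 1
    n, d = abs(f).as_integer_ratio()  # d is always a power of two
    k = d.bit_length() - 1            # so d == 2**k
    coeff = str(n * 5**k)             # n / 2**k == n * 5**k / 10**k
    return sign, coeff, -k

for f in (123.5, 123.4):
    sign, coeff, exp = float_to_decimal_parts(f)
    # Rebuild a Decimal from the (sign, digits, exponent) tuple and compare
    rebuilt = Decimal((sign, tuple(int(c) for c in coeff), exp))
    print(f, sign, coeff, exp, rebuilt == Decimal(f))  # last column: True
```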
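However, when you call shift, the decimal module first brings the coefficient to exactly context.prec digits before shifting it. To work out how many zeroes to pad it with, it subtracts the length of the coefficient from the precision:
amount_to_pad = context.prec - len(coeff)
The default precision is only 28, while the coefficient of Decimal(123.4) has 48 digits, as noted above. The amount to pad is therefore negative (28 - 48 = -20), and in that case the coefficient is truncated to its last 28 digits instead: the leading digits, including the 1234 you care about, are discarded while the exponent stays at -45. The shift then pushes one more leading digit out, leaving 3418860808014869689941406250 × 10^-45, which is the tiny 3.418860808014869689941406250E-18 you noted.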
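A minimal sketch of that pad/truncate step, modelled on the pure-Python _pydecimal implementation of shift, reproduces the surprising result. The helper name emulate_shift is hypothetical, and it skips the argument and special-value checks the real method performs:

```python
from decimal import Decimal

def emulate_shift(value, places, prec):
    """Hypothetical sketch of Decimal.shift's coefficient handling (finite values only)."""
    sign, digits, exp = value.as_tuple()
    coeff = ''.join(map(str, digits))
    topad = prec - len(coeff)
    if topad > 0:
        coeff = '0' * topad + coeff          # pad on the left up to prec digits
    elif topad < 0:
        coeff = coeff[-topad:]               # too long: keep only the LAST prec digits
    if places < 0:
        shifted = coeff[:places]             # shift right: drop digits on the right
    else:
        shifted = (coeff + '0' * places)[-prec:]  # shift left, keep the last prec digits
    shifted = shifted.lstrip('0') or '0'
    # The exponent is left unchanged, only the coefficient moves
    return Decimal((sign, tuple(int(c) for c in shifted), exp))

a = Decimal(123.4)
print(emulate_shift(a, 1, 28))  # 3.418860808014869689941406250E-18
print(a.shift(1))               # same value with the default precision of 28
```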
A way around this is to increase the precision to at least the number of digits in the coefficient plus the shift amount: 48 + 1 = 49 for 123.4 shifted by one place.
from decimal import Decimal, getcontext
getcontext().prec = 49
a = Decimal(123.4)
print(a)
print(a.shift(1))
123.400000000000005684341886080801486968994140625
1234.000000000000056843418860808014869689941406250
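If you would rather not change the global context, the same fix can be applied for a single call with decimal.localcontext(). This is a sketch: shift_exact is a hypothetical helper name, and it assumes a finite Decimal:

```python
from decimal import Decimal, localcontext

def shift_exact(value, places):
    """Hypothetical helper: shift with enough precision for the whole coefficient."""
    needed = len(value.as_tuple().digits) + abs(places)
    with localcontext() as ctx:           # temporary copy of the current context
        ctx.prec = max(ctx.prec, needed)  # never lower an already larger precision
        return value.shift(places)

print(shift_exact(Decimal(123.4), 1))
# 1234.000000000000056843418860808014869689941406250
```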
The documentation for shift hints that the precision is important for this calculation:
The second operand must be an integer in the range -precision through precision.
However, it does not point out that a float converted to Decimal can easily have a coefficient longer than the precision, in which case shift silently discards the leading digits.
I would expect this to raise some kind of error and prompt you to change your input or increase the precision, but at least you know!
Mark Dickinson noted in a comment above that you can view this Python bug tracker issue for more information: https://bugs.python.org/issue7233
Why is the result of the code below 0 in Python 3?
a = "4.15129406851375e+17"
a = float(a)
b = "415129406851375001"
b = float(b)
a-b
This happens because both 415129406851375001 and 4.15129406851375e+17 are greater than the integer representational limits of a C double (which is what a Python float is implemented in terms of).
Typically, C doubles are IEEE 754 64 bit binary floating point values, which means they have 53 bits of integer precision (the last consecutive integer values float can represent are 2 ** 53 - 1 followed by 2 ** 53; it can't represent 2 ** 53 + 1). Problem is, 415129406851375001 requires 59 bits of integer precision to store ((415129406851375001).bit_length() will provide this information). When a value is too large for the significand (the integer component) alone, the exponent component of the floating point value is used to scale a smaller integer value by powers of 2 to be roughly in the ballpark of the original value, but this means that the representable integers start to skip, first by 2 (as you require >53 bits), then by 4 (for >54 bits), then 8 (>55 bits), then 16 (>56 bits), etc., skipping twice as far between representable values for each bit of magnitude you have that can't be represented in 53 bits.
In your case, both numbers, converted to float, have an integer value of 415129406851374976 (print(int(a), int(b)) will show you the true integer value; they're too large to have any fractional component), having lost precision in the low digits.
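These claims are easy to check directly. The math.ulp call, which returns the gap to the next representable float, assumes Python 3.9 or later:

```python
import math

a = float("4.15129406851375e+17")
b = float("415129406851375001")

print((415129406851375001).bit_length())  # 59 bits needed, only 53 available
print(int(a), int(b))                     # 415129406851374976 415129406851374976
print(a - b)                              # 0.0
print(math.ulp(a))                        # 64.0: neighbouring floats here are 64 apart
```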
If you need arbitrarily precise base-10 floating point math, replace your use of float with decimal.Decimal (conveniently, your values are already strings, so you don't risk loss of precision between how you type a float and the actual value stored); the default precision will handle these values, and you can increase it if you need larger values. If you do that, you get the behavior you expected:
from decimal import Decimal as Dec # Import class with shorter name
a = "4.15129406851375e+17"
a = Dec(a) # Convert to Decimal instead of float
b = "415129406851375001"
b = Dec(b) # Ditto
print(a-b)
which outputs -1. If you echoed it in an interactive interpreter instead of using print, you'd see Decimal('-1'), which is the repr form of Decimals, but it's numerically -1, and if converted to int, or stringified via any method that doesn't use the repr, e.g. print, it displays as just -1.
Why is the answer not the same for the operations below? And since // is essentially floor division, why is the output different when the floor function is used?
I ran the following code:
import math
x = 2**64 -1
print("Original value:", x)
print("Floor division:", x//1)
print("Floor function:", math.floor(x/1))
print("Trunc function:", math.trunc(x/1))
print("Type conversion:", int((x/1)))
Output:
Original value: 18446744073709551615
Floor division: 18446744073709551615
Floor function: 18446744073709551616
Trunc function: 18446744073709551616
Type conversion: 18446744073709551616
Now, why is the answer not equal to the original value, since all I did was divide by 1?
float is a 64 bit IEEE-754 binary floating point value; it only has 53 bits of integer precision (beyond which it's stuck with imprecise approximations based on multiplying an integer value by a power of 2), and you put a 64 bit value in there when you divided by 1 (which coerced it to a float before the eventual call to floor/trunc/int). Basically, it made the float value as close to the int you used as possible, which was unfortunately not equal to the int (because that's impossible), then floored/truncated it (which, given the value had no fractional component, just meant converting back to the equivalent int value).
Floor division with // never has the problem, because it's a purely int-based division (nothing is ever represented as a float), and ints are (to the limits of computer memory) effectively infinite precision.
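A small check of the two code paths for the same x:

```python
import math

x = 2**64 - 1
print(x // 1 == x)             # True: int // int never leaves the integers
print(x / 1 == float(2**64))   # True: true division produced the float 2**64
print(math.floor(x / 1) - x)   # 1: the error was introduced by /, not by floor
```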
/ always produces a float, and float arithmetic has inherent rounding errors. As soon as you compute x/1 you potentially introduce error into the original value of x, and once the error is there you basically can't recover from it.
This is broader than just an integer flooring issue:
x = 2**70 + 123
print(x-int(x/1))
123
Conversions to integer can unmask inaccuracies in a floating-point number. For example, the closest single-precision floating-point number to 21.33 is slightly less than 21.33, so when it is multiplied by 100, the result Y is slightly less than 2133.0.
If you print Y in a typical floating-point format, rounding causes it to be displayed as 2133.00. However, if you assign Y to an integer I, no rounding is done, and the number is truncated to 2132.
Also, as ShadowRanger said, // on ints is integer floor division and is therefore exact, limited only by available memory.
I am a new user of Python 3.6.0. I am trying to divide two numbers that produce a big result. However, using
return ans1 // ans2
produces 55347740058143507128 while using
return int(ans1 / ans2)
produces 55347740058143506432.
Which is more accurate and why is that so?
The first one is more accurate since it gives the exact integer result.
The second represents the intermediate result as a float. Floats have limited resolution (53 bits of mantissa) whereas the result needs 66 bits to be represented exactly. This results in a loss of accuracy.
If we look at the hex representation of both results:
>>> hex(55347740058143507128)
'0x3001aac56864d42b8'
>>> hex(55347740058143506432)
'0x3001aac56864d4000'
we can see that the least-significant bits of the result that didn't fit in a 53-bit mantissa all got set to zero.
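The same bit-level view can be checked in code. In this particular case the float rounded down, so the difference is exactly the discarded low bits; that would not hold if rounding went up:

```python
n = 55347740058143507128

rounded = int(float(n))          # round-trip through a 53-bit significand
dropped = n.bit_length() - 53    # low-order bits that cannot be stored
print(n.bit_length(), dropped)   # 66 13
print(hex(rounded))              # 0x3001aac56864d4000
print(n - rounded)               # 696
print(n & ((1 << dropped) - 1))  # 696: the 13 discarded low bits
```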
One way to see the rounding directly, without any complications brought about by division is:
>>> int(float(55347740058143507128))
55347740058143506432
The flooring integer division is more accurate, in that sense.
The problem with this construction int(ans1 / ans2), is that the result is temporarily a float (before, obviously, converting it to an integer), introducing rounding to the nearest float (the amount of rounding depends on the magnitude of the number). This can even be seen by just trying to round-trip that value through a float:
print(int(float(55347740058143507128)))
Which prints 55347740058143506432. So, because plain / results in a float, that limits its accuracy.
I'm having a hard time understanding this:
>>> 52920*(15303855351918+15303855298999)/2.0 == 809880023823263820
False
>>> 52920*(15303855351918+15303855298999)/2.0 == 809880023823263820.0
True
>>> (52920*(15303855351918+15303855298999)/2.0) - 809880023823263820
0.0
>>> int(52920*(15303855351918+15303855298999)/2.0)
809880023823263872
>>> int(52920/2.0)*(15303855351918+15303855298999)
809880023823263820
Running Python 3.5.3
The number 809880023823263820 is obviously representable as an integer since it's the sum of an arithmetic progression with all integer parameters. What is the explanation for the 1st and 4th computations?
Python int objects have arbitrary precision (bounded only by memory). float objects do not; numbers are represented as an exponent and a significand, with the latter built up from 53 binary digits (a sum of powers of two). Binary fractions can't represent every possible decimal number, and that is what is happening here.
809880023823263820 is not cleanly representable using binary fractions:
>>> float(809880023823263820)
8.098800238232639e+17
>>> format(float(809880023823263820), 'f')
'809880023823263872.000000'
The number is stored as 809880023823263872. Note the last two digits; they are 72, not 20.
Note that a float value is always stored as a binary fraction, with the exponent determining the position of the binary point; there is no separate 'integer portion'. Both very large and very small numbers are simply represented as a fraction with the binary point shifted by the exponent.
For the third expression, Python converts numbers to a common type. Subtracting an integer from a float causes the integer to be converted to a float first. Since float(809880023823263820) is the same floating point value as the result on the left of the - operator, the result is 0.0.
From the Binary arithmetic operations section of the expressions reference documentation:
The - (subtraction) operator yields the difference of its arguments. The numeric arguments are first converted to a common type.
and a separate Arithmetic conversions section documents how this conversion is done:
When a description of an arithmetic operator below uses the phrase “the numeric arguments are converted to a common type,” this means that the operator implementation for built-in types works as follows:
If either argument is a complex number, the other is converted to complex;
otherwise, if either argument is a floating point number, the other is converted to floating point;
otherwise, both must be integers and no conversion is necessary.
You probably want to explore a different representation of rational numbers, via either the fractions or the decimal module. The former lets you handle rational numbers as exact fractions; the latter lets you configure the numeric precision.
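As a sketch of what that looks like for the expression in the question (the variable names are arbitrary):

```python
from decimal import Decimal
from fractions import Fraction

total = 52920 * (15303855351918 + 15303855298999)  # exact int arithmetic

as_float = total / 2.0             # total is converted to float, then rounded
as_fraction = Fraction(total, 2)   # exact rational arithmetic
as_decimal = Decimal(total) / 2    # exact: well within the default 28-digit precision

print(int(as_float))   # 809880023823263872
print(as_fraction)     # 809880023823263820
print(as_decimal)      # 809880023823263820
```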
Floats are only of limited precision, while integers have arbitrary precision. That means you cannot represent every integer as float in Python! You can easily see this if you check:
>>> 809880023823263820 == 809880023823263820.0
False
>>> 809880023823263820 == int(809880023823263820.0)
False
From Wikipedia:
The 53-bit significand precision gives from 15 to 17 significant decimal digits precision (2^-53 ≈ 1.11 × 10^-16). If a decimal string with at most 15 significant digits is converted to IEEE 754 double-precision representation, and then converted back to a decimal string with the same number of digits, the final result should match the original string.
But your integer has 18 digits, so you shouldn't expect that a float can exactly represent that integer. There are values that can be represented exactly that are longer than 18 digits, but the conditions have to be "right" (for example powers of two):
>>> 2.0 ** 80 + 1 == 2 ** 80 + 1
False
>>> 2.0 ** 80 == 2 ** 80
True
Your last example is actually a bit misleading: 52920/2.0 can be represented exactly as a float, and because you convert it to an integer before multiplying, the rest of the calculation is done entirely with integers and gives an exact integer result.
In the general case you could use Fraction to represent the value exactly:
>>> from fractions import Fraction
>>> Fraction(52920*(15303855351918+15303855298999), 2) == 809880023823263820
True
Note the , 2 instead of the / 2.0 here.
Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f is:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format.
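To see the single-precision limit from Python, whose own float is double precision, you can round-trip a value through the struct module's 'f' format. A sketch, where to_single is a hypothetical helper:

```python
import struct

def to_single(x):
    """Round-trip x through IEEE 754 single precision (struct format 'f')."""
    return struct.unpack('f', struct.pack('f', x))[0]

print(to_single(2.0**24) == 2**24)   # True
print(to_single(2.0**24 - 1))        # 16777215.0: still exact
print(to_single(2.0**24 + 1))        # 16777216.0: 2**24 + 1 is not representable
```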
Using int(your_non_integer_number) will nail it; it truncates toward zero.
import math
print(int(2.3))  # "2"
print(int(math.sqrt(5)))  # "2"
You could use the round function. If you omit the second parameter (the number of decimal places), then I think you will get the behavior you want.
IDLE output (Python 3, where round() rounds exact halves to the nearest even integer):
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
2
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
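One caveat worth checking in Python 3: round() with no second argument already returns an int and rounds exact halves to the nearest even number, so int(round(x)) is not "round half up":

```python
# Python 3: round(x) with no ndigits returns an int, rounding exact halves to even
print(round(2.5), round(3.5), round(-2.5))  # 2 4 -2
print(round(2.6), round(2.4))               # 3 2
print(type(round(2.6)).__name__)            # int
```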
This post explains why it works in that range.
In a double, you can represent 32-bit integers without any problems; there cannot be any rounding issues. More precisely, doubles can represent all integers between -2^53 and 2^53, inclusive.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring fewer digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000 (a one followed by 53 zeroes). Since we can only store 53 digits, the rightmost zero has to be padding.
This is where 2^53 comes from.
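The boundary described above can be probed directly with Python ints and floats:

```python
big = 2**53

print(bin(big - 1).count('1'))    # 53: 2**53 - 1 is fifty-three ones
print(float(big - 1) == big - 1)  # True
print(float(big) == big)          # True
print(float(big + 1) == big + 1)  # False: it rounds to 2**53
print(float(big + 2) == big + 2)  # True: between 2**53 and 2**54 only even integers fit
```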
More detail: We need to consider how IEEE-754 floating point works.
  1 bit  | 11 / 8 bits | 52 / 23 bits     (double / single precision)
[ sign   | exponent    | mantissa     ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
$(-1)^{\text{sign}} \times 1.\text{mantissa} \times 2^{\text{exponent} - \text{bias}}$
where $\text{bias} = 2^{k-1} - 1$ for k exponent bits, i.e. 1023 and 127 for double/single precision respectively.
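To see these fields for a real double, you can reinterpret its 8 bytes as an integer with struct. The helpers decompose and rebuild below are hypothetical names; rebuild only handles normal (finite, non-subnormal) numbers:

```python
import struct
from fractions import Fraction

def decompose(x):
    """Split a double into its sign bit, 11-bit biased exponent and 52-bit mantissa."""
    bits = struct.unpack('>Q', struct.pack('>d', x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return sign, exponent, mantissa

def rebuild(sign, exponent, mantissa, bias=1023):
    """Exact value of (-1)**sign * 1.mantissa * 2**(exponent - bias)."""
    significand = 1 + Fraction(mantissa, 1 << 52)  # implicit leading 1
    return (-1) ** sign * significand * Fraction(2) ** (exponent - bias)

for value in (2.5, float(2**53 - 1), -float(2**53)):
    fields = decompose(value)
    print(value, fields, rebuild(*fields) == value)  # last column: True
```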
Knowing that multiplying by 2^X simply shifts all bits X places to the left, it's easy to see that for an integer, every mantissa bit that ends up to the right of the binary point must be zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be an MSB that is one, which is why it's not stored. To store the integer, we must bring it into the aforementioned form: $(-1)^{\text{sign}} \times 1.\text{mantissa} \times 2^{\text{exponent} - \text{bias}}$.
That's the same as shifting the bits past the binary point until only the MSB is to the left of it. All the bits to the right of the binary point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent such that the binary point is shifted 52 places. If we were to increase the exponent by one, there would be no stored bit left for the digit immediately to the left of the binary point:
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 2^53, which marks the boundary (both negative and positive) within which we can accurately represent all integers. If we wanted to add one to 2^53, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will drop any digits after the decimal point (it truncates rather than rounds).
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
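"vel" is a real/float number, and it is rounded up to the next integer (its ceiling), "newvel".
import math, os, sys
import arcpy.da

# ... (densifybkp, floseg, vel and Length are defined in the omitted code)

with arcpy.da.SearchCursor(densifybkp, [floseg, vel, Length]) as cursor:
    for row in cursor:
        curvel = float(row[1])           # the vel field as a float
        newvel = int(math.ceil(curvel))  # round up to the next integer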
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.
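A variant of the same idea uses float.is_integer(), which also copes with inf and nan (int() raises OverflowError and ValueError for those). The helper name to_int_if_exact is hypothetical:

```python
def to_int_if_exact(value):
    """Return an int when a float holds an integral value, else return it unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value

print(to_int_if_exact(1.0), to_int_if_exact(1.1))  # 1 1.1
print(to_int_if_exact(float('inf')))               # inf, no OverflowError
```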