Someone reverse-sorted by 1/i instead of the usual -i and it made me wonder: What is the smallest positive integer case where that fails*? I think it must be where two consecutive integers i and i+1 have the same reciprocal float. The smallest I found is i = 6369051721119404:
i = 6369051721119404
print(1/i == 1/(i+1))
import math
print(math.log2(i))
Output (Try it online!):
True
52.50000001100726
Note that it's only a bit larger than 252.5 (which I only noticed after finding the number with other ways) and 52 is the mantissa bits stored by float, so maybe that is meaningful.
So: What is the smallest where it fails? Is it the one I found? And is there a meaningful explanation for why?
* Meaning it fails to sort correctly, for example sorted([6369051721119404, 6369051721119405], key=lambda i: 1/i) doesn't reverse that list, as both key values are the same.
Suppose i is between 2^n and 2^(n+1) for some n.
Then 1/i is between 2^(-n-1) and 2^-n. Its representation as double-precision floating point is 1.xxx...xxx * 2^(-n-1), where there are 52 x's. The smallest difference that can be expressed at that magnitude is 2^-52 * 2^(-n-1) = 2^(-n-53).
1/i and 1/(i+1) may get rounded to the same number if the difference between them is at most 2^(-n-53). Solving for i: 1/i - 1/(i+1) = 2^(-n-53) ==> i(i+1) = 2^(n+53). The solution is approximately i = 2^((n+53) / 2). This matches our range for i if n = 52. Then i = 2^52.5.
Starting at this value of i there is a possibility to get the same value for 1/i and 1/(i+1). But it won't happen for every i as it depends on how the numbers are rounded. We can start searching at that point, and as you discovered shortly after that we'll find the first occurrence.
Note: We also need to rule out n=51 as it seems to have solutions that fall in range. However, the only integer i that falls in the range 2^51 to 2^52 and is at least the minimal value calculated above is 2^52 itself, which can be ruled out. For larger values we need to switch to n=52 as above.
Related
I am trying to match an expected output of "13031.157014219536" exactly, and I have attempted 3 times to get the value with different methods, detailed below, and come extremely close to the value, but not close enough. What is happening in these code snippets that it causing the deviation? Is it rounding error in the calculation? Did I do something wrong?
Attempts:
item_worth = 8000
years = range(10)
for year in years:
item_worth += (item_worth*0.05)
print(item_worth)
value = item_worth * (1 + ((0.05)**10))
print(value)
cost = 8000
for x in range(10):
cost *= 1.05
print(cost)
Expected Output:
13031.157014219536
Actual Outputs:
13031.157014219529
13031.157014220802
13031.157014219538
On most machine, floats are represented by fp64, or double floats.
You can check the precision of those for your number that way (not a method to be used for real computation. Just out of curiosity):
import struct
struct.pack('d', 13031.157014219536)
# bytes representing that number in fp64 b'\xf5\xbc\n\x19\x94s\xc9#'
# or, in a more humanely understandable way
struct.unpack('q', struct.pack('d', 13031.157014219536))[0]
# 4668389568658717941
# That number has no meaning, except that this integer is represented by the same bytes as your float.
# Now, let's see what float is "next in line"
struct.unpack('d', struct.pack('q', 4668389568658717941+1))[0]
# 13031.157014219538
Note that this code works on most machine, but is not reliable. First of all, it relies on the fact that significant bits are not just all 1. Otherwise, it would give a totally unrelated number. Secondly, it makes assumption that ints are LE. But well, it gave me what I wanted.
That is the information that the smallest number bigger than 13031.157014219536 is 13031.157014219538.
(Or, said more accurately for this kind of conversation: the smallest float bigger than 13031.157014219536 whose representation is not the same as the representation of 13031.157014219536 has the same representation as 13031.157014219538)
So, my point is you are flirting with the representation limit. You can't expect the sum of 10 numbers to be more accurate.
I could also have said that saying that the biggest power of 2 smaller than your number is 8192=2¹³.
So, that 13 is the exponent of your float in its representation. And this you have 53 significant bits, the precision of such a number is 2**(13-53+1) = 1.8×10⁻¹² (which is indeed also the result of 13031.157014219538-13031.157014219536). Hence the reason why in decimal, 12 decimal places are printed. But not all combination of them can exist, and the last one is not insignificant, but not fully significant neither.
If your computation is the result of the sum of 10 such numbers, you could even have an error 10 times bigger than your last result the right to complain :D
I have written a Python program that needs to run until the initial value gets to 0 or until another case is met. I wanted to say anything useful about the amount of loops the program would go through until it would reach 0 as this is necessary to find the solution to another program. But for some reason I get the following results, depending on the input variables and searching the Internet thus far has not helped solve my problem. I wrote the following code to simulate my problem.
temp = 1
cool = 0.4999 # This is the important variable
count = 0
while temp != 0.0:
temp *= cool
count += 1
print(count, temp)
print(count)
So, at some point I'd expect the script to stop and print the amount of loops necessary to get to 0.0. And with the code above that is indeed the case. After 1075 loops the program stops and returns 1075. However, if I change the value of cool to something above 0.5 (for example 0.5001) the program seems to run indefinitely. Why is this the case?
The default behavior in many implementations of floating-point arithmetic is to round the real-number arithmetic result to the nearest number representable in floating-point arithmetic.
In any floating-point number format, there is some smallest positive representable value, say s. Consider how s * .25 would be evaluated. The real number result is .25•s. This is not a representable value, so it must be rounded to the nearest representable number. It is between 0 and s, so those are the two closest representable values. Clearly, it is closer to 0 than it is to s, so it is rounded to 0, and that is the floating-point result of the operation.
Similarly, when s is multiplied by any number between 0 and ½, the result is rounded to zero.
In contrast, consider s * .75. The real number result is .75•s. This is closer to s than it is to 0, so it is rounded to s, and that is the floating-point result of the operation.
Similarly, when s is multiplied by any number between ½ and 1, the result is rounded to s.
Thus, if you start with a positive number and continually multiply by some fraction between 0 and 1, the number will get smaller and smaller until it reaches s, and then it will either jump to 0 or remain at s depending on whether the fraction is less than or greater than ½.
If the fraction is exactly ½, .5•s is the same distance from 0 and s. The usual rule for breaking ties favors the number with the even low bit in its fraction portion, so the result is rounded to 0.
Note that Python does not fully specific floating-point semantics, so the behaviors may vary from implementation to implementation. Most commonly, IEEE-754 binary64 is used, in which the smallest representable positive number is 2−1074.
I want to print floating point numbers with a set number of significant (i.e. non-zero) digits, but avoid the scientific notation (e).
So, for example, the number 0.000000002343245345 should be printed as 0.000000002343 (and not 2.343e-09)
I know how to define number of significant digits with an e notation:
>>>print('{:.3e}'.format(0.000000002343245345))
2.343e-09
And how to print a set number of decimal places without e-notation:
>>>print('{:.12f}'.format(0.000000002343245345))
0.000000002343
but not how to combine the two.
is there any simple way of doing so?
Here is some code that usually does what you want.
x = 0.000000002343245345
n = 4
from math import log10, floor
print('{:.{}f}'.format(x, n - floor(log10(x)) - 1))
Note that, due to the lack of exactness in floating-point arithmetic, that this may occasionally give a wrong answer. For example, the floor(log10()) may be one off from what is expected at or very near negative powers of 10 such as 0.1, 0.01, 0.001, and so on. My code seems to work well with those values but that is not guaranteed.
Also, there is no reasonable answer for some combinations of x and n. For example, what do you expect to result from x = 200000 and n = 4? There is no good answer, and my code gives the error
ValueError: Format specifier missing precision
You have to calculate the number of digits yourself. For four significant digits, this would be
number = 0.000000002343245345
digits = 4 - int(math.ceil(math.log10(number)))
print("{:.{}f}".format(number, digits))
# 0.000000002343
I have read that minimal float value that Python support is something in power -308.
Exact number doesn't matter because:
>>> -1.42108547152e-14 + 360.0 == 360.0
True
How is that? I have CPython 2.7.3 on Windows.
It cause me errors. My problem will be fixed if I will compare my value -1.42108547152e-14 (computed somehow) to some "delta" and do this:
if v < delta:
v = 0
What delta should I choose? In other words, with values smaller than what this effect will occur?
Please note that NumPy is not available.
An (over)simplified explanation is: A (normal) double-precision floating point number holds (what is equivalent to) approximately 16 decimal digits. Let's try to do your addition by hand:
360.0000000000000000000000000
- 0.0000000000000142108547152
______________________________
359.9999999999999857891452848
If you round this to 16 figures (3 before the point and 13 after), you get 360.
Now, in reality this is done in binary. The "16 decimal digits" is therefore not a precise rule. In reality, the precision here (between 256.0 and 512.0) is 44 binary digits for the fractional part of the number. So the number closest to 360 which can be represented is 360 minus {2 to the -44th power}, which gives:
359.9999999999999431565811391 (truncated)
But since our result before was closer to 360.0 than to this number, 360.0 is what you get.
Most processors use IEEE 754 binary floating-point arithmetic. In this format, numbers are represented as a sign s, a fraction f, and an exponent e. The fraction is also called a significand.
The sign s is a bit 0 or 1 representing + or –, respectively.
In double precision, the significand f is a 53-bit binary numeral with a radix point after the first bit, such as 1.10100001001001100111000110110011001010100000001000112.
In double precision, the exponent e is an integer from –1022 to +1023.
The combined value represented by the sign, significand, and exponent is (-1)s•2e•f.
When you add two numbers, the processor figures out what exponent to use for the result. Then, given that exponent, it figures out what fraction to use for the result. When you add a large number and a small number, the entire result will not fit into the significand. So, the processor must round the mathematical result to something that will fit into the significand.
In the cases you ask about, the second added number is so small that the rounding produces the same value as the first number. In order to change the first number, you must add a value that is at least half the value of the lowest bit in the significand of the first number. (When rounding, if part that does not fit in the significand is more than half the lowest bit of the significand, it is rounded up. If it is exactly half, it is rounded up if that makes the lowest bit zero or down if that makes the lowest bit zero.)
There are additional issues in floating-point, such as subnormal numbers, infinities, how the exponent is stored, and so on, but the above explains the behavior you asked about.
In the specific case you ask about, adding to 360, adding any value greater than 2-45 will produce a sum greater than 360. Adding any positive value less than or equal to 2-45 will produce exactly 360. This is because the highest bit in 360 is 28, so the lowest bit in its significand is 2-44.
Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.
Use int(your non integer number) will nail it.
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"
You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.
This is where 253 comes from.
More detail: We need to consider how IEEE-754 floating point works.
1 bit 11 / 8 52 / 23 # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
-1sign × 1.mantissa ×2exponent - bias
where bias = 2exponent - 1 - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2X simply shifts all bits X places to the left, it's easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be a MSB that is one—which is why it's not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent - bias.
That's saying the same as shifting the bits over the decimal point until there's only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any numbers after the decimal.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and converted to the next highest INTEGER, "newvel".
import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
for row in cursor:
curvel = float(row[1])
newvel = int(math.ceil(curvel))
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.