Calculating large fractions in Python? - python

I am trying to calculate fractions in Python 2.7. The limit_denominator method works great for the first 15 iterations of this code. After that, however, the output gets stuck, repeating a fraction whose denominator is less than 1,000,000:
Fraction = 1217471/860882
When I don't use limit_denominator, I get repeated outputs like this:
Fraction = 141421356237/100000000000
Eventually I want to iterate i to 1000, so my fractions will be very large. Any help?
from fractions import *
i = 0
x = 1/2.0
x1 = 0
count = 0
while i < 20:
    (y) = (1.0 + (x))
    (x) = (1 / (2.0 + (x)))
    y1 = Fraction(str(y)).limit_denominator()
    print("\nFraction = " + str(y1))
    i += 1

The values converge to sqrt(2.0), which gives you a narrow range of fractions that will accurately represent the 64-bit float value. Your rational fraction cannot be more accurate than the float you give it.
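To make the float limit concrete, here is a minimal sketch (Python 3 syntax; the variable names are illustrative, not from the question). It relies on two documented behaviours of the fractions module: Fraction(some_float) converts the float's exact binary value, and limit_denominator() uses max_denominator=1000000 unless told otherwise.

```python
import math
from fractions import Fraction

as_float = math.sqrt(2.0)
exact = Fraction(as_float)               # exact value of the float, not of sqrt(2)
default = exact.limit_denominator()      # max_denominator defaults to 1000000
wide = exact.limit_denominator(10**12)   # explicit, larger limit

print(exact.denominator == 2**52)        # the float is a 53-bit binary fraction
print(default, wide)
print(abs(wide - exact) <= abs(default - exact))
```

Raising the limit gets closer to the float, but never closer to sqrt(2) than the float itself is.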
If you want larger denominators, then you have to specify a larger denominator limit. You're still limited by the float accuracy: once you converge within the accuracy of your computational type (likely float64), you will not get more accuracy in your rational representation thereof. If you want greater accuracy, convert the entirety to fraction computations:
from fractions import *
x = Fraction(1,2)
for i in range(40):
    y = Fraction(1) + x
    x = Fraction(1) / (Fraction(2) + x)
    print("Fraction = " + str(y))
Output:
Fraction = 3/2
Fraction = 7/5
Fraction = 17/12
Fraction = 41/29
Fraction = 99/70
Fraction = 239/169
Fraction = 577/408
Fraction = 1393/985
Fraction = 3363/2378
Fraction = 8119/5741
Fraction = 19601/13860
Fraction = 47321/33461
Fraction = 114243/80782
Fraction = 275807/195025
Fraction = 665857/470832
Fraction = 1607521/1136689
Fraction = 3880899/2744210
Fraction = 9369319/6625109
Fraction = 22619537/15994428
Fraction = 54608393/38613965
Fraction = 131836323/93222358
Fraction = 318281039/225058681
Fraction = 768398401/543339720
Fraction = 1855077841/1311738121
Fraction = 4478554083/3166815962
Fraction = 10812186007/7645370045
Fraction = 26102926097/18457556052
Fraction = 63018038201/44560482149
Fraction = 152139002499/107578520350
Fraction = 367296043199/259717522849
Fraction = 886731088897/627013566048
Fraction = 2140758220993/1513744654945
Fraction = 5168247530883/3654502875938
Fraction = 12477253282759/8822750406821
Fraction = 30122754096401/21300003689580
Fraction = 72722761475561/51422757785981
Fraction = 175568277047523/124145519261542
Fraction = 423859315570607/299713796309065
Fraction = 1023286908188737/723573111879672
Fraction = 2470433131948081/1746860020068409
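As a side note, the same sequence can be sketched with plain integers. Substituting x = y - 1 into the loop gives y_next = 1 + 1/(1 + y), so a fraction p/q is followed by (p + 2q)/(p + q). The function name below is hypothetical, and this is only an illustration of that recurrence, not a replacement for the Fraction version.

```python
def sqrt2_convergents(count):
    """Yield (numerator, denominator) pairs 3/2, 7/5, 17/12, ..."""
    p, q = 3, 2
    for _ in range(count):
        yield p, q
        # y = p/q  ->  1 + 1/(1 + y) = (p + 2q)/(p + q), already in lowest terms
        p, q = p + 2 * q, p + q

for p, q in sqrt2_convergents(5):
    print("Fraction = %d/%d" % (p, q))
```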

I rewrote your code to try to solve your problem, because I did not understand the need for limit_denominator. This is the result:
from fractions import *
x = Fraction(1, 2)
for i in range(1000):
    y = 1 + Fraction(x)
    print 'Y', y
    x = 1 / (2 + x)
    print 'X', x
The problem is that computers don't work with real numbers directly; they store non-integer values in a finite binary representation called floating point (the origin of the name float, I assume). This representation has a limited precision, which depends on the amount of memory reserved for the data type. That is the same reason int32 can hold fewer values than int64, for example.
However, Python integers have arbitrary precision, so large whole numbers are handled exactly.
Besides, the fractions library provides you with a way of representing numbers (fractions) that escape (not really, after all it is a computer) the floating point numbers constraint.
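A small sketch of the difference, in Python 3 syntax. The values 0.1, 0.2 and 0.3 are just illustrative picks, since none of them is exactly representable in binary floating point.

```python
from fractions import Fraction

# Floats carry a rounding error from the decimal-to-binary conversion.
print(0.1 + 0.2 == 0.3)

# Fractions built from integers are exact, so the same sum compares equal.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))

# Building a Fraction from a float preserves the float's error.
print(Fraction(0.1) == Fraction(1, 10))
```

The first and third comparisons are False and the second is True, which is why the rewritten loop starts from Fraction(1, 2) rather than from 1/2.0.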
If you want to dive more into floating point arithmetic I recommend the all-mighty Numerical Analysis by Burden & Faires and Numerical Methods by Dr David Ham.

As Prune says, it's best to avoid floats when working with Fraction. And if you want to convert your fraction to a decimal without losing any accuracy you need to use a numeric type like Decimal which has enough precision. Another option is to just work with Python integers, and scale up your numerator with a sufficiently large multiplier.
Your series finds the convergents to the continued fraction of the square root of two. If you want to loop over all the convergents you can use the algorithm shown in Prune's answer. But if you want to calculate sqrt(2) quickly to a large number of digits, there's a better way, known as Hero's method (or Heron's method). This is a special case of Newton's method for calculating roots of algebraic equations. Instead of calculating the terms for each i in Prune's algorithm 1 by 1 we're essentially doubling i on each iteration, so the numerator & denominator grow large very quickly, doubling the accuracy of the answer on each loop iteration.
Here's a short demo that calculates sqrt(2) accurate to 100 digits. I'd normally do this using plain Python integers (or long integers in Python 2), but it's also easy to do it with a Fraction.
from __future__ import print_function
from fractions import Fraction as F
digits = 100
m = 10 ** digits
x = F(1, 1)
while x.denominator < m:
    print(x)
    x = x / 2 + 1 / x
print()
print(m * x.numerator // x.denominator)
output
1
3/2
17/12
577/408
665857/470832
886731088897/627013566048
1572584048032918633353217/1111984844349868137938112
4946041176255201878775086487573351061418968498177/3497379255757941172020851852070562919437964212608
48926646634423881954586808839856694558492182258668537145547700898547222910968507268117381704646657/34596363615919099765318545389014861517389860071988342648187104766246565694525469768325292176831232
14142135623730950488016887242096980785696718753769480731766797379907324784621070388503875343276415727
Tested on Python 2.6 and 3.6
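For comparison, here is a rough sketch of the plain-integer variant mentioned above. It is an assumption about how one might write it, not the code the answer's author had in mind: it applies the same Newton step to the scaled integer 2 * 10**(2*digits), so the result is the floor of sqrt(2) * 10**digits. The function name isqrt_newton is hypothetical.

```python
def isqrt_newton(n):
    """Floor of the square root of a non-negative integer, by Newton's method."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    x = 1 << ((n.bit_length() + 1) // 2)   # power of two that is >= sqrt(n)
    while True:
        y = (x + n // x) // 2              # integer form of x/2 + n/(2x)
        if y >= x:                         # no further decrease: x is the floor root
            return x
        x = y

digits = 100
print(isqrt_newton(2 * 10 ** (2 * digits)))
```

On Python 3.8 and later, math.isqrt does the same job and can be used to cross-check the result.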

Related

Fast way to find bit length of large positive integer from decimal

Given the decimal string representation of a large positive integer, what's a fast way to find the integer's bit length? Using int() and then bit_length() is slow. This example with a million digits takes over five seconds to tell me it has 3321926 bits:
s = '1234567890' * 10**5
print(int(s).bit_length())
The result should be exact, at least for all strings one can actually have in memory (so let's say up to 100 billion decimal digits).
If storage space is not an issue and you don't mind spending time up-front, (and you'd rather have a solution that doesn't depend on floating point accuracy, even if it's otherwise impractical) you can solve just about any speed issue with more memory. Build a lookup table of the string representations of 2**n. Set up a dictionary, keying the string length to a list of (string of that length, corresponding n value) pairs. To test an input, look up the appropriate list, and then use ordinary string comparison to figure out which bit-length category it's in.
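A minimal sketch of that lookup-table idea, for illustration only. The function names and the max_bits parameter are hypothetical, the input is assumed to be a positive decimal string without leading zeros, and the table only covers inputs with fewer digits than 2**max_bits.

```python
def build_power_table(max_bits):
    """Map decimal length -> ascending list of (str(2**n), n) for n <= max_bits."""
    table = {}
    power = 1
    for n in range(max_bits + 1):
        text = str(power)
        table.setdefault(len(text), []).append((text, n))
        power *= 2
    return table

def bit_length_from_table(s, table):
    """Bit length of the decimal string s, using string comparisons only."""
    candidates = table[len(s)]
    # If s is below every power of two with this many digits, the answer
    # is the exponent of the smallest such power.
    result = candidates[0][1]
    for text, n in candidates:
        if text <= s:          # equal lengths, so string order is numeric order
            result = n + 1
    return result

table = build_power_table(400)
print(bit_length_from_table('1234567890' * 10, table))
```

Each decimal length has only three or four powers of two, so the comparison step is cheap; the cost is all in building and storing the table.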
This should be accurate for billions of digits, I think. Calculate the exact result for 100000...00000 by simply bits-per-digit, then add the log of the first 10 digits.
import math
s = '1234567890' * 10**5
dper = math.log(10)/math.log(2)
base= (len(s)-10)*dper
extra = math.log(int(s[:10]))/math.log(2)
print(int(base+extra+0.99))
This does the example in about 0.15 seconds, '1234567890' * 10**6 in about 2 seconds, and '1234567890' * 10**7 in about 20 seconds. First I approximate the bit length with logarithms (similar to Tim's way), then I use decimal.Decimal to adjust until exact. That class uses base 10, so it doesn't need a costly base conversion.
Bit length b covers the interval [2**(b-1), 2**b). So we want (the exponent of) the smallest power of 2 larger than the number.
from time import time
from math import log2
from decimal import *
setcontext(Context(prec=MAX_PREC, Emax=MAX_EMAX, Emin=MIN_EMIN))
def bit_length(s):
    if len(s) <= 20:
        return int(s).bit_length()
    head_bits = log2(int(s[:20]))
    tail_bits = (len(s) - 20) * log2(10)
    b = int(head_bits + tail_bits)
    n = Decimal(s)
    power = Decimal(2) ** b
    while power > n:
        b -= 1
        power //= 2
    while power <= n:
        b += 1
        power *= 2
    return b
s = '1234567890' * 10**5
start = time()
print(bit_length(s))
print(time() - start, 'seconds')
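The interval rule used above (bit length b covers [2**(b-1), 2**b)) can be checked on small inputs with a deliberately naive sketch. The helper name is hypothetical, and it works on ints rather than strings, so it only illustrates the definition that the adjustment loops rely on.

```python
def bit_length_by_doubling(n):
    """Exponent of the smallest power of two larger than the positive int n."""
    b, power = 0, 1
    while power <= n:
        b += 1
        power *= 2
    return b

for n in (1, 2, 3, 4, 255, 256):
    b = bit_length_by_doubling(n)
    # Each line should end in True: n lies in [2**(b-1), 2**b).
    print(n, b, 2 ** (b - 1) <= n < 2 ** b)
```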

Explain a surprising parity in the rounding direction of apparent ties in the interval [0, 1]

Consider the collection of floating-point numbers of the form 0.xx5 between 0.0 and 1.0: [0.005, 0.015, 0.025, 0.035, ..., 0.985, 0.995]
I can make a list of all 100 such numbers easily in Python:
>>> values = [n/1000 for n in range(5, 1000, 10)]
Let's look at the first few and last few values to check we didn't make any mistakes:
>>> values[:8]
[0.005, 0.015, 0.025, 0.035, 0.045, 0.055, 0.065, 0.075]
>>> values[-8:]
[0.925, 0.935, 0.945, 0.955, 0.965, 0.975, 0.985, 0.995]
Now I want to round each of these numbers to two decimal places after the point. Some of the numbers will be rounded up; some will be rounded down. I'm interested in counting exactly how many round up. I can compute this easily in Python, too:
>>> sum(round(value, 2) > value for value in values)
50
So it turns out that exactly half of the 100 numbers were rounded up.
If you didn't know that Python was using binary floating-point under the hood, this result wouldn't be surprising. After all, Python's documentation states clearly that the round function uses round-ties-to-even (a.k.a. Banker's rounding) as its rounding mode, so you'd expect the values to round up and round down alternately.
But Python does use binary floating-point under the hood, and that means that with a handful of exceptions (namely 0.125, 0.375, 0.625 and 0.875), these values are not exact ties, but merely very good binary approximations to those ties. And not surprisingly, closer inspection of the rounding results shows that the values do not round up and down alternately. Instead, each value rounds up or down depending on which side of the decimal value the binary approximation happens to land. So there's no a priori reason to expect exactly half of the values to round up, and exactly half to round down. That makes it a little surprising that we got a result of exactly 50.
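One way to see which side each approximation lands on is to compare the stored float with the intended decimal value exactly, using fractions.Fraction. This is a sketch under the typical-machine assumptions stated in the notes of this question (IEEE 754 binary64, correctly rounded division and round); the variable names are illustrative.

```python
from fractions import Fraction

above = below = exact = mismatches = 0
for n in range(5, 1000, 10):
    value = n / 1000                  # nearest float to n/1000
    stored = Fraction(value)          # exact value actually stored
    intended = Fraction(n, 1000)      # the decimal tie we meant
    if stored == intended:
        exact += 1                    # a genuine tie; ties-to-even decides
        continue
    if stored > intended:
        above += 1
    else:
        below += 1
    # The rounding direction should follow the side the float landed on.
    if (round(value, 2) > value) != (stored > intended):
        mismatches += 1

print(above, below, exact, mismatches)
```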
But maybe we just got lucky? After all, if you toss a fair coin 100 times, getting exactly 50 heads isn't that unusual an outcome: it'll happen with around an 8% probability. But it turns out that the pattern persists with a higher number of decimal places. Here's the analogous example when rounding to 6 decimal places:
>>> values = [n/10**7 for n in range(5, 10**7, 10)]
>>> sum(round(value, 6) > value for value in values)
500000
And here it is again rounding apparent ties to 8 decimal places after the point:
>>> values = [n/10**9 for n in range(5, 10**9, 10)]
>>> sum(round(value, 8) > value for value in values)
50000000
So the question is: why do exactly half of the cases round up? Or put another way, why is it that out of all the binary approximations to these decimal ties, the number of approximations that are larger than the true value exactly matches the number of approximations that are smaller? (One can easily show that for the cases that are exact, we will have the same number of rounds up as down, so we can disregard those cases.)
Notes
I'm assuming Python 3.
On a typical desktop or laptop machine, Python's floats will be using the IEEE 754 binary64 ("double precision") floating-point format, and true division of integers and the round function will both be correctly rounded operations, using the round-ties-to-even rounding mode. While none of this is guaranteed by the language itself, the behaviour is overwhelmingly common, and we're assuming that such a typical machine is being used in this question.
This question was inspired by a Python bug report: https://bugs.python.org/issue41198
Not an answer, but just want to flesh out what's puzzling about it. It's certainly not "random", but noting that isn't enough ;-) Just look at the 2-digit case for concreteness:
>>> from decimal import Decimal as D
>>> for i in range(5, 100, 10):
...     print('%2d' % i, D(i / 100))
5 0.05000000000000000277555756156289135105907917022705078125
15 0.1499999999999999944488848768742172978818416595458984375
25 0.25
35 0.34999999999999997779553950749686919152736663818359375
45 0.450000000000000011102230246251565404236316680908203125
55 0.5500000000000000444089209850062616169452667236328125
65 0.65000000000000002220446049250313080847263336181640625
75 0.75
85 0.84999999999999997779553950749686919152736663818359375
95 0.9499999999999999555910790149937383830547332763671875
Now you can pair i/100 with (100-i)/100 and their mathematical sum is exactly 1. So this pairs, in the above, 5 with 95, 15 with 85, and so on. The exact machine value for 5 rounds up, while that for 95 rounds down, which "is expected": if the true sum is 1, and one addend "rounds up", then surely the other "rounds down".
But that's not always so. 15 and 85 both round down, 25 and 75 is a mix, 35 and 65 is a mix, but 45 and 55 both round up.
What's at work that makes the total "up" and "down" cases exactly balance? Mark showed that they do for 10**3, 10**7, and 10**9, and I verified exact balance holds for exponents 2, 4, 5, 6, 8, 10, and 11 too.
A puzzling clue
This is very delicate. Instead of dividing by 10**n, what if we multiplied by its reciprocal instead. Contrast this with the above:
>>> for i in range(5, 100, 10):
...     print('%2d' % i, D(i * (1 / 100)))
5 0.05000000000000000277555756156289135105907917022705078125
15 0.1499999999999999944488848768742172978818416595458984375
25 0.25
35 0.350000000000000033306690738754696212708950042724609375
45 0.450000000000000011102230246251565404236316680908203125
55 0.5500000000000000444089209850062616169452667236328125
65 0.65000000000000002220446049250313080847263336181640625
75 0.75
85 0.84999999999999997779553950749686919152736663818359375
95 0.95000000000000006661338147750939242541790008544921875
Now 7 (instead of 5) cases round up.
For 10**3, 64 (instead of 50) round up; for 10**4, 828 (instead of 500), for 10**5, 9763 (instead of 5000); and so on. So there's something vital about suffering no more than one rounding error in computing i/10**n.
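The comparison between dividing and multiplying by the reciprocal can be reproduced with a short sketch. The function name and its use_reciprocal flag are hypothetical; the counting rule (round(value, ndigits) > value) is the one used in the question.

```python
def count_round_up(exponent, use_reciprocal=False):
    """Count values i/10**exponent (i = 5, 15, 25, ...) that round() moves upward."""
    scale = 10 ** exponent
    ups = 0
    for i in range(5, scale, 10):
        # One rounding error for true division, two for multiply-by-reciprocal.
        value = i * (1 / scale) if use_reciprocal else i / scale
        if round(value, exponent - 1) > value:
            ups += 1
    return ups

for e in (2, 3, 4):
    print(e, count_round_up(e), count_round_up(e, use_reciprocal=True))
```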
It turns out that one can prove something stronger, that has nothing particularly to do with decimal representations or decimal rounding. Here's that stronger statement:
Theorem. Choose a positive integer n <= 2^1021, and consider the sequence of length n consisting of the fractions 1/2n, 3/2n, 5/2n, ..., (2n-1)/2n. Convert each fraction to the nearest IEEE 754 binary64 floating-point value, using the IEEE 754 roundTiesToEven rounding direction. Then the number of fractions for which the converted value is larger than the original fraction will exactly equal the number of fractions for which the converted value is smaller than the original fraction.
The original observation involving the sequence [0.005, 0.015, ..., 0.995] of floats then follows from the case n = 100 of the above statement: in 96 of the 100 cases, the result of round(value, 2) depends on the sign of the error introduced when rounding to binary64 format, and by the above statement, 48 of those cases will have positive error, and 48 will have negative error, so 48 will round up and 48 will round down. The remaining 4 cases (0.125, 0.375, 0.625, 0.875) convert to binary64 format with no change in value, and then the Banker's Rounding rule for round kicks in to round 0.125 and 0.625 down, and 0.375 and 0.875 up.
Notation. Here and below, I'm using pseudo-mathematical notation, not Python notation: ^ means exponentiation rather than bitwise exclusive or, and / means exact division, not floating-point division.
Example
Suppose n = 11. Then we're considering the sequence 1/22, 3/22, ..., 21/22. The exact values, expressed in decimal, have a nice simple recurring form:
1/22 = 0.04545454545454545...
3/22 = 0.13636363636363636...
5/22 = 0.22727272727272727...
7/22 = 0.31818181818181818...
9/22 = 0.40909090909090909...
11/22 = 0.50000000000000000...
13/22 = 0.59090909090909090...
15/22 = 0.68181818181818181...
17/22 = 0.77272727272727272...
19/22 = 0.86363636363636363...
21/22 = 0.95454545454545454...
The nearest exactly representable IEEE 754 binary64 floating-point values are:
1/22 -> 0.04545454545454545580707161889222334139049053192138671875
3/22 -> 0.13636363636363635354342704886221326887607574462890625
5/22 -> 0.2272727272727272651575702866466599516570568084716796875
7/22 -> 0.318181818181818176771713524431106634438037872314453125
9/22 -> 0.409090909090909116141432377844466827809810638427734375
11/22 -> 0.5
13/22 -> 0.59090909090909093936971885341336019337177276611328125
15/22 -> 0.68181818181818176771713524431106634438037872314453125
17/22 -> 0.7727272727272727070868540977244265377521514892578125
19/22 -> 0.86363636363636364645657295113778673112392425537109375
21/22 -> 0.954545454545454585826291804551146924495697021484375
And we see by direct inspection that when converting to float, 1/22, 9/22, 13/22, 19/22 and 21/22 rounded upward, while 3/22, 5/22, 7/22, 15/22 and 17/22 rounded downward. (11/22 was already exactly representable, so no rounding occurred.) So 5 of the 11 values were rounded up, and 5 were rounded down. The claim is that this perfect balance occurs regardless of the value of n.
Computational experiments
For those who might be more convinced by numerical experiments than a formal proof, here's some code (in Python).
First, let's write a function to create the sequences we're interested in, using Python's fractions module:
from fractions import Fraction
def sequence(n):
    """ [1/2n, 3/2n, ..., (2n-1)/2n] """
    return [Fraction(2*i+1, 2*n) for i in range(n)]
Next, here's a function to compute the "rounding direction" of a given fraction f, which we'll define as 1 if the closest float to f is larger than f, -1 if it's smaller, and 0 if it's equal (i.e., if f turns out to be exactly representable in IEEE 754 binary64 format). Note that the conversion from Fraction to float is correctly rounded under roundTiesToEven on a typical IEEE 754-using machine, and that the order comparisons between a Fraction and a float are computed using the exact values of the numbers involved.
def rounding_direction(f):
    """ 1 if float(f) > f, -1 if float(f) < f, 0 otherwise """
    x = float(f)
    if x > f:
        return 1
    elif x < f:
        return -1
    else:
        return 0
Now to count the various rounding directions for a given sequence, the simplest approach is to use collections.Counter:
from collections import Counter
def round_direction_counts(n):
    """ Count of rounding directions for sequence(n). """
    return Counter(rounding_direction(value)
                   for value in sequence(n))
Now we can put in any integer we like to observe that the count for 1 always matches the count for -1. Here's a handful of examples, starting with the n = 100 example that started this whole thing:
>>> round_direction_counts(100)
Counter({1: 48, -1: 48, 0: 4})
>>> round_direction_counts(237)
Counter({-1: 118, 1: 118, 0: 1})
>>> round_direction_counts(24)
Counter({-1: 8, 0: 8, 1: 8})
>>> round_direction_counts(11523)
Counter({1: 5761, -1: 5761, 0: 1})
The code above is unoptimised and fairly slow, but I used it to run tests up to n = 50000 and checked that the counts were balanced in each case.
As an extra, here's an easy way to visualise the roundings for small n: it produces a string containing + for cases that round up, - for cases that round down, and . for cases that are exactly representable. So our theorem says that each signature has the same number of + characters as - characters.
def signature(n):
    """ String visualising rounding directions for given n. """
    return "".join(".+-"[rounding_direction(value)]
                   for value in sequence(n))
And some examples, demonstrating that there's no immediately obvious pattern:
>>> signature(10)
'+-.-+++.--'
>>> signature(11)
'+---+.+--++'
>>> signature(23)
'---+++-+-+-.-++--++--++'
>>> signature(59)
'-+-+++--+--+-+++---++---+++--.-+-+--+-+--+-+-++-+-++-+-++-+'
>>> signature(50)
'+-++-++-++-+.+--+--+--+--+++---+++---.+++---+++---'
Proof of the statement
The original proof I gave was unnecessarily complicated. Following a suggestion from Tim Peters, I realised that there's a much simpler one. You can find the old one in the edit history, if you're really interested.
The proof rests on three simple observations. Two of those are floating-point facts; the third is a number-theoretic observation.
Observation 1. For any (non-tiny, non-huge) positive fraction x, x rounds "the same way" as 2x.
If y is the closest binary64 float to x, then 2y is the closest binary64 float to 2x. So if x rounds up, so does 2x, and if x rounds down, so does 2x. If x is exactly representable, so is 2x.
Small print: "non-tiny, non-huge" should be interpreted to mean that we avoid the extremes of the IEEE 754 binary64 exponent range. Strictly, the above statement applies for all x in the interval [2^-1022, 2^1023). There's a corner-case involving infinity to be careful of right at the top end of that range: if x rounds to 2^1023, then 2x rounds to inf, so the statement still holds in that corner case.
Observation 1 implies that (again provided that underflow and overflow are avoided), we can scale any fraction x by an arbitrary power of two without affecting the direction it rounds when converting to binary64.
Observation 2. If x is a fraction in the closed interval [1, 2], then 3 - x rounds the opposite way to x.
This follows because if y is the closest float to x (which implies that y must also be in the interval [1.0, 2.0]), then thanks to the even spacing of floats within [1, 2], 3 - y is also exactly representable and is the closest float to 3 - x. This works even for ties under the roundTiesToEven definition of "closest", since the last bit of y is even if and only if the last bit of 3 - y is.
So if x rounds up (i.e., y is greater than x), then 3 - y is smaller than 3 - x and so 3 - x rounds down. Similarly, if x is exactly representable, so is 3 - x.
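Observations 1 and 2 are easy to spot-check numerically. The sketch below assumes, as elsewhere in this answer, that float(Fraction) is correctly rounded and that Fraction-to-float comparisons are exact; the sample denominator 37 is an arbitrary choice.

```python
from fractions import Fraction

def rounding_direction(f):
    """1 if float(f) > f, -1 if float(f) < f, 0 if f is exactly representable."""
    x = float(f)
    return (x > f) - (x < f)

# 37/37, 38/37, ..., 74/37: all in the closed interval [1, 2].
samples = [Fraction(k, 37) for k in range(37, 75)]

# Observation 1: scaling by a power of two keeps the rounding direction.
obs1 = all(rounding_direction(f) == rounding_direction(2 * f) == rounding_direction(f / 2)
           for f in samples)

# Observation 2: x and 3 - x round in opposite directions.
obs2 = all(rounding_direction(3 - f) == -rounding_direction(f) for f in samples)

print(obs1, obs2)
```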
Observation 3. The sequence 1/2n, 3/2n, 5/2n, ..., (2n-1)/2n of fractions is equal to the sequence n/n, (n+1)/n, (n+2)/n, ..., (2n-1)/n, up to scaling by powers of two and reordering.
This is just a scaled version of a simpler statement, that the sequence 1, 3, 5, ..., 2n-1 of integers is equal to the sequence n, n+1, ..., 2n-1, up to scaling by powers of two and reordering. That statement is perhaps easiest to see in the reverse direction: start out with the sequence n, n+1, n+2, ...,2n-1, and then divide each integer by its largest power-of-two divisor. What you're left with must be, in each case, an odd integer smaller than 2n, and it's easy to see that no such odd integer can occur twice, so by counting we must get every odd integer in 1, 3, 5, ..., 2n - 1, in some order.
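The integer version of Observation 3 can be checked directly. In this sketch, odd_part is a hypothetical helper that strips the largest power-of-two divisor using the m & -m bit trick.

```python
def odd_part(m):
    """Divide the positive integer m by its largest power-of-two divisor."""
    return m // (m & -m)

def odd_parts_match(n):
    """True if the odd parts of n, n+1, ..., 2n-1 are exactly 1, 3, ..., 2n-1."""
    return sorted(odd_part(m) for m in range(n, 2 * n)) == list(range(1, 2 * n, 2))

print([odd_part(m) for m in range(10, 20)])
print(all(odd_parts_match(n) for n in range(1, 500)))
```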
With these three observations in place, we can complete the proof. Combining Observation 1 and Observation 3, we get that the cumulative rounding directions (i.e., the total counts of rounds-up, rounds-down, stays-the-same) of 1/2n, 3/2n, ..., (2n-1)/2n exactly match the cumulative rounding directions of n/n, (n+1)/n, ..., (2n-1)/n.
Now n/n is exactly one, so is exactly representable. In the case that n is even, 3/2 also occurs in this sequence, and is exactly representable. The rest of the values can be paired with each other in pairs that add up to 3: (n+1)/n pairs with (2n-1)/n, (n+2)/n pairs with (2n-2)/n, and so-on. And now by Observation 2, within each pair either one value rounds up and one value rounds down, or both values are exactly representable.
So the sequence n/n, (n+1)/n, ..., (2n-1)/n has exactly as many rounds-down cases as rounds-up cases, and hence the original sequence 1/2n, 3/2n, ..., (2n-1)/2n has exactly as many rounds-down cases as rounds-up cases. That completes the proof.
Note: the restriction on the size of n in the original statement is there to ensure that none of our sequence elements lie in the subnormal range, so that Observation 1 can be used. The smallest positive binary64 normal value is 2^-1022, so our proof works for all n <= 2^1021.
Not an answer, but a further comment.
I am working on the assumption that:
the results of the original n/1000 will have been rounded to either less than or more than the exact fractional value, by calculating an extra bit of precision and then using the 0 or 1 in that extra bit to determine whether to round up or down (binary equivalent of Banker's rounding)
round is somehow comparing the value with the exact fractional value, or at least acting as if it is doing so (for example, doing the multiply-round-divide while using more bits of precision internally, at least for the multiply)
taking it on trust from the question that half of the exact fractions can be shown to round up and the other half down
If this is the case, then the question is equivalent to saying:
if you write the fractions as binimals, how many of them have a 1 in the i'th place (where the i'th place corresponds to the place after the final bit stored, which according to my assumptions will have been used to decide which way to round the number)
With this in mind, here is some code that will calculate arbitrary precision binimals, then sum the i'th bit of these binimals (for the non-exact cases) and add on half the number of non-exact cases.
def get_binimal(x, y, places=100,
                normalise=True):
    """
    returns a 2-tuple containing:
      - x/y as a binimal, e.g. for
        x=3, y=4 it would be 110000000...
      - whether it is an exact fraction (in that example, True)
    if normalise=True then give fractional part of binimal that starts
    with 1. (i.e. IEEE mantissa)
    """
    if x > y:
        raise ValueError("x > y not supported")
    frac = ""
    val = x
    exact = False
    seen_one = False
    if normalise:
        places += 1  # allow for value which is always 1 (remove later)
    while len(frac) < places:
        val *= 2
        if val >= y:
            frac += "1"
            val -= y
            seen_one = True
            if val == 0:
                exact = True
        else:
            if seen_one or not normalise:
                frac += "0"
    if normalise:
        frac = frac[1:]  # discard the initial 1
    return (frac, exact)
places = 100
n_exact = 0
n = 100
divisor = n * 10
binimals = []
for x in range(5, divisor, 10):
    binimal, exact = get_binimal(x, divisor, places, True)
    print(binimal, exact, x, n)
    if exact:
        n_exact += 1
    else:
        binimals.append(binimal)
for i in range(places):
    print(i, n_exact // 2 + sum((b[i] == "1") for b in binimals))
Running this program gives for example:
0 50
1 50
2 50
3 50
4 50
5 50
6 50
7 50
8 50
... etc ...
Some observations from the results:
It is confirmed (from results shown plus experimenting with other values of n) that this gives the same counts as observed in the question (i.e. n/2), so the above hypothesis seems to be working.
The value of i does not matter, i.e. there is nothing special about the 53 mantissa bits in IEEE 64-bit floats -- any other length would give the same.
It does not matter whether the numbers are normalised or not. See the normalise argument to my get_binimal function: if this is set to True, then the returned value is analogous to a normalised IEEE mantissa, but the counts are unaffected.
Clearly the binimal expansions will consist of repeating sequences, and the fact that i does not matter is showing that the sequences must be aligned in such a way that the sum of i'th digits is always the same because there are equal numbers with each alignment of the repeating sequence.
Taking the case where n=100, and showing counts of the last 20 bits of each of the expansions (i.e. bits 80-99 because we asked for 100 places) using:
import collections
import pprint
counts = collections.Counter([b[-20:] for b in binimals])
pprint.pprint(counts.items())
gives something like the following, although here I have hand-edited the ordering so as to show the repeating sequences more clearly:
[('00001010001111010111', 4),
('00010100011110101110', 4),
('00101000111101011100', 4),
('01010001111010111000', 4),
('10100011110101110000', 4),
('01000111101011100001', 4),
('10001111010111000010', 4),
('00011110101110000101', 4),
('00111101011100001010', 4),
('01111010111000010100', 4),
('11110101110000101000', 4),
('11101011100001010001', 4),
('11010111000010100011', 4),
('10101110000101000111', 4),
('01011100001010001111', 4),
('10111000010100011110', 4),
('01110000101000111101', 4),
('11100001010001111010', 4),
('11000010100011110101', 4),
('10000101000111101011', 4),
('00110011001100110011', 4),
('01100110011001100110', 4),
('11001100110011001100', 4),
('10011001100110011001', 4)]
There are:
80 (=4 * 20) views of a 20-bit repeating sequence
16 (=4 * 4) views of a 4-bit repeating sequence corresponding to division by 5 (for example 0.025 decimal = (1/5) * 2^-3)
4 exact fractions (not shown), for example 0.375 decimal (= 3 * 2^-3)
As I say, this is not claiming to be a full answer.
The really intriguing thing is that this result does not seem to be disrupted by normalising the numbers. Discarding the leading zeros will certainly change the alignment of the repeating sequence for individual fractions (shifting the sequence by varying number of bits depending how many leading zeros were ignored), but it is doing so in such a way that the total count for each alignment is preserved. I find this possibly the most curious part of the result.
And another curious thing - the 20-bit repeating sequence consists of a 10-bit sequence followed by its ones complement, so just e.g. the following two alignments in equal numbers would give the same total in every bit position:
10111000010100011110
01000111101011100001
and similarly for the 4-bit repeating sequence. BUT the result does not seem to depend on this - instead all 20 (and all 4) alignments are present in equal numbers.
For concreteness, I'll walk through Mark's explanation (as I modified in a comment) to explain everything seen in the 2-digit case I posted exhaustive results for.
There we're looking at i / 100 for i in range(5, 100, 10), which is looking at (10*i + 5) / 100 for i in range(10), which is the same (divide numerator and denominator by 5) as looking at (2*i + 1) / 20 for i in range(10).
The "rescaling trick" consists of shifting each numerator left until it's >= 10. This doesn't matter to rounding when converting to binary float! Factors of powers of 2 only affect the exponent, not the significand bits (assuming we stay within the normal range). By shifting, we adjust all the numerators to be in range(10, 20), and so when dividing by 20 we get significand fractions in the semi-open range [0.5, 1.0), which all have the same power-of-2 exponent.
The unique k such that 2**52 <= 10/20 * 2**k = 1/2 * 2**k < 2**53 is k=53 (so that the integer portion of the quotient has the 53 bits of precision IEEE-754 doubles hold), so we're looking at converting ratios of the form i * 2**53 / 20 for i in range(10, 20).
Now for any n, and expressing n as 2**t * o where o is odd:
i * 2**k = j * 2**k (mod 2*n) iff
i * 2**k = j * 2**k (mod 2**(t+1) * o) iff (assuming k >= t+1)
i * 2**(k-t-1) = j * 2**(k-t-1) (mod o) iff (o is odd, so coprime to 2**(k-t-1))
i = j (mod o)
range(n, 2*n) is n consecutive integers, so every subslice of o elements, mod o, contains each residue class mod o exactly once, and each residue class modulo o shows up exactly 2**t times in range(n, 2*n). The last point is most important here, since the rescaling trick leaves us with a permutation of range(n, 2*n).
We're using n = 10 = 2**1 * 5, and i * 2**53 / 20 = i * 2**51 / 5. In
q, r = divmod(i * 2**51, 5)
q is the 53-bit significand, and r is the remainder. If the remainder is 0, q is exact; if the remainder is 1 or 2, q is slightly too small ("rounding down"), and if the remainder is 3 or 4 the hardware will "round up" by adding 1 to q. But we don't care about q here, we only want to know which rounding action will occur, so r is what we do care about.
Now pow(2, 51, 5) = 3, so, modulo 5, multiplying by 2**51 is the same as multiplying by 3. Taking the odd integers in range(1, 20, 2) and doing the rescaling trick, to squash everything into range(10, 20), then multiplying by 2**51 (same as 3), and finding the remainder mod 5:
1 -> 16, * 3 % 5 = 3 up
3 -> 12, * 3 % 5 = 1 down
5 -> 10, * 3 % 5 = 0 exact
7 -> 14, * 3 % 5 = 2 down
9 -> 18, * 3 % 5 = 4 up
11 -> 11, * 3 % 5 = 3 up
13 -> 13, * 3 % 5 = 4 up
15 -> 15, * 3 % 5 = 0 exact
17 -> 17, * 3 % 5 = 1 down
19 -> 19, * 3 % 5 = 2 down
Which all match what the exhaustive results posted before showed.
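The table can be reproduced and cross-checked against what the hardware actually does. This is a sketch assuming Python 3 on a platform with IEEE-754 doubles; the function names `predicted` and `observed` are illustrative only.

```python
from fractions import Fraction

def predicted(i):
    """Rounding direction predicted from the remainder mod 5."""
    while i < 10:               # rescaling trick: shift into range(10, 20)
        i <<= 1
    r = i * pow(2, 51, 5) % 5   # same as (i * 2**51) % 5
    if r == 0:
        return "exact"
    return "down" if r < 3 else "up"

def observed(i):
    """Compare the double nearest to i/20 with the exact rational i/20."""
    stored, exact = Fraction(i / 20), Fraction(i, 20)
    if stored == exact:
        return "exact"
    return "down" if stored < exact else "up"

for i in range(1, 20, 2):
    print(i, predicted(i), observed(i))
    assert predicted(i) == observed(i)
```

Fraction(i / 20) converts the stored double to an exact rational, so the comparison involves no further rounding.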

How are data types interpreted, calculated, and/or stored?

In Python, suppose the code is:
import math
a = math.sqrt(2.0)
if a * a == 2.0:
    x = 2
else:
    x = 1
This is a variant of "Floating Point Numbers are Approximations -- Not Exact".
Mathematically speaking, you are correct that sqrt(2) * sqrt(2) == 2. But sqrt(2) cannot be exactly represented as a native datatype (read: floating point number). (Heck, sqrt(2) is irrational, so its decimal expansion is guaranteed never to terminate or repeat!) It can get really close, but not exact:
>>> import math
>>> math.sqrt(2)
1.4142135623730951
>>> math.sqrt(2) * math.sqrt(2)
2.0000000000000004
Note the result is, in fact, not exactly 2.
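To see why the product cannot come out as exactly 2, it may help to look at the exact rational value of the stored double. A small sketch, assuming Python 3 and IEEE-754 doubles:

```python
import math
from fractions import Fraction

a = math.sqrt(2.0)
exact_a = Fraction(a)        # exact rational value of the stored double
den = exact_a.denominator
print(den & (den - 1) == 0)  # True: the denominator is a power of two
print(exact_a ** 2 == 2)     # False: no rational number squares to 2
print(a * a == 2.0)          # False, matching the session above
```

The stored value is a rational number, so its exact square can never be 2. The product a * a is then rounded again, and here that rounding lands on the double just above 2.0.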
If you want the x = 2 branch to execute, you will need to compare against an epsilon value, asking "is the result close enough?":
epsilon = 1e-6 # 0.000001
if abs(2.0 - a*a) < epsilon:
    x = 2
else:
    x = 1
Numbers with decimals are stored as floating point numbers and they can only be an approximation to the real number in some cases.
So your comparison needs to be not "are these two numbers exactly equal (==)" but "are they sufficiently close as to be considered equal".
Fortunately, the math library has a function that does this conveniently. Using math.isclose() (available since Python 3.5), you can compare with a defined tolerance. The function isn't too complicated; you could write it yourself.
math.isclose(a*a, 2, abs_tol=0.0001)
>> True
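As a rough illustration of "you could write it yourself", here is a sketch of the comparison formula documented for math.isclose. The name `is_close` is made up, and this version skips the special handling of infinities and NaN that the real function has.

```python
import math

def is_close(a, b, rel_tol=1e-09, abs_tol=0.0):
    """Hand-rolled version of the formula documented for math.isclose."""
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)

a = math.sqrt(2.0)
print(a * a == 2.0)                            # False
print(math.isclose(a * a, 2.0))                # True with the default rel_tol
print(is_close(a * a, 2.0))                    # True
print(math.isclose(1e-12, 0.0))                # False: rel_tol alone never matches 0
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```

The last two lines show why abs_tol matters when comparing against zero: a relative tolerance scales with the operands, so it shrinks to nothing near 0.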

Convert float to rounded decimal equivalent

When you convert a float to Decimal, the Decimal will contain as accurate a representation of the binary number that it can. It's nice to be accurate, but it isn't always what you want. Since many decimal numbers can't be represented exactly in binary, the resulting Decimal will be a little off - sometimes a little high, sometimes a little low.
>>> from decimal import Decimal
>>> for f in (0.1, 0.3, 1e25, 1e28, 1.0000000000001):
...     print Decimal(f)
0.1000000000000000055511151231257827021181583404541015625
0.299999999999999988897769753748434595763683319091796875
10000000000000000905969664
9999999999999999583119736832
1.000000000000099920072216264088638126850128173828125
Ideally we'd like the Decimal to be rounded to the most likely decimal equivalent.
I tried converting to str since a Decimal created from a string will be exact. Unfortunately str rounds a little too much.
>>> for f in (0.1, 0.3, 1e25, 1e28, 1.0000000000001):
...     print Decimal(str(f))
0.1
0.3
1E+25
1E+28
1.0
Is there a way of getting a nicely rounded Decimal from a float?
It turns out that repr does a better job of converting a float to a string than str does. It's the quick-and-easy way to do the conversion.
>>> for f in (0.1, 0.3, 1e25, 1e28, 1.0000000000001):
...     print Decimal(repr(f))
0.1
0.3
1E+25
1E+28
1.0000000000001
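The sessions above are Python 2. On Python 3.2 and later, str() and repr() produce the same shortest round-tripping string for floats, so either works there. A sketch assuming Python 3:

```python
from decimal import Decimal

for f in (0.1, 0.3, 1e25, 1e28, 1.0000000000001, 2.0 / 3.0):
    s = repr(f)
    assert float(s) == f   # the shortest repr round-trips exactly
    assert str(f) == s     # Python 3.2+: str() and repr() agree for floats
    print(Decimal(s))
```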
Before I discovered that, I came up with a brute-force way of doing the rounding. It has the advantage of recognizing that large numbers are accurate to 15 digits - the repr method above only recognizes one significant digit for the 1e25 and 1e28 examples.
from decimal import Decimal, DecimalTuple

def _increment(digits, exponent):
    # Add 1 to the last digit and propagate carries leftwards.
    new_digits = [0] + list(digits)
    new_digits[-1] += 1
    for i in range(len(new_digits)-1, 0, -1):
        if new_digits[i] > 9:
            new_digits[i] -= 10
            new_digits[i-1] += 1
    # A carry out of the top digit adds a leading 1; drop the last digit to keep the length.
    if new_digits[0]:
        return tuple(new_digits[:-1]), exponent + 1
    return tuple(new_digits[1:]), exponent

def nearest_decimal(f):
    sign, digits, exponent = Decimal(f).as_tuple()
    # Keep at most 15 significant digits, rounding half up on the 16th.
    if len(digits) > 15:
        round_up = digits[15] >= 5
        exponent += len(digits) - 15
        digits = digits[:15]
        if round_up:
            digits, exponent = _increment(digits, exponent)
    # Strip trailing zeros after the decimal point.
    while digits and digits[-1] == 0 and exponent < 0:
        digits = digits[:-1]
        exponent += 1
    return Decimal(DecimalTuple(sign, digits, exponent))
>>> for f in (0.1, 0.3, 1e25, 1e28, 1.0000000000001):
...     print nearest_decimal(f)
0.1
0.3
1.00000000000000E+25
1.00000000000000E+28
1.0000000000001
Edit: I discovered one more reason to use the brute-force rounding. repr tries to return a string that uniquely identifies the underlying float bit representation, but it doesn't necessarily ensure the accuracy of the last digit. By using one less digit, my rounding function will more often be the number you would expect.
>>> print Decimal(repr(2.0/3.0))
0.6666666666666666
>>> print nearest_decimal(2.0/3.0)
0.666666666666667
The decimal created with repr is actually more accurate, but it implies a level of precision that doesn't exist. The nearest_decimal function delivers a better match between precision and accuracy.
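A shorter way to get a similar 15-significant-digit result is to let the format mini-language do the rounding. This is a swapped-in technique, not the answer's brute-force method, and `nearest_decimal_15` is a hypothetical name.

```python
from decimal import Decimal

def nearest_decimal_15(f):
    """Round a float to 15 significant digits via the '.15g' format spec."""
    return Decimal(format(f, ".15g"))

for f in (0.1, 0.3, 1.0000000000001, 2.0 / 3.0):
    print(nearest_decimal_15(f))
```

One difference from nearest_decimal: '.15g' strips trailing zeros, so 1e25 comes back as 1E+25 rather than 1.00000000000000E+25.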
I have implemented this in Pharo Smalltalk, in a Float method named asMinimalDecimalFraction.
It is exactly the same problem as printing the shortest decimal fraction that would be re-interpreted as the same float/double, assuming correct rounding (to nearest).
See my answer at Count number of digits after `.` in floating point numbers? for more references

round() doesn't seem to be rounding properly

The documentation for the round() function states that you pass it a number, and the positions past the decimal to round. Thus it should do this:
n = 5.59
round(n, 1) # 5.6
But, in actuality, good old floating point weirdness creeps in and you get:
5.5999999999999996
For the purposes of UI, I need to display 5.6. I poked around the Internet and found some documentation that this is dependent on my implementation of Python. Unfortunately, this occurs on both my Windows dev machine and each Linux server I've tried.
Short of creating my own round library, is there any way around this?
I can't help the way it's stored, but at least formatting works correctly:
'%.1f' % round(n, 1) # Gives you '5.6'
Formatting works correctly even without having to round:
"%.1f" % n
If you use the decimal module, you can round without using the round() function. Here is what I've been using for rounding, especially when writing monetary applications:
from decimal import Decimal, ROUND_UP
Decimal(str(16.2)).quantize(Decimal('.01'), rounding=ROUND_UP)
This will return a Decimal Number which is 16.20.
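Note that ROUND_UP always rounds away from zero, which may not be what a monetary application wants. The sketch below compares three of the decimal module's rounding modes; the variable names are illustrative.

```python
from decimal import Decimal, ROUND_UP, ROUND_HALF_UP, ROUND_HALF_EVEN

cent = Decimal("0.01")
price = Decimal("16.205")
print(price.quantize(cent, rounding=ROUND_UP))         # 16.21
print(price.quantize(cent, rounding=ROUND_HALF_UP))    # 16.21
print(price.quantize(cent, rounding=ROUND_HALF_EVEN))  # 16.20

almost = Decimal("16.201")
print(almost.quantize(cent, rounding=ROUND_UP))        # 16.21
print(almost.quantize(cent, rounding=ROUND_HALF_UP))   # 16.20
```

Building the Decimal from a string (or from str() of a float, as above) matters here: Decimal(16.205) would carry the binary approximation into the calculation.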
round(5.59, 1) is working fine. The problem is that 5.6 cannot be represented exactly in binary floating point.
>>> 5.6
5.5999999999999996
>>>
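The session above is from an older Python 2, which echoed floats with 17 significant digits. A sketch for Python 2.7 or 3.x that shows the same thing explicitly:

```python
from decimal import Decimal

stored = Decimal(5.6)           # exact value of the double nearest to 5.6
print(stored)
print(stored < Decimal("5.6"))  # True: the stored value is slightly below 5.6
print(round(5.59, 1) == 5.6)    # True on Python 3: round() returns that same double
print(repr(5.6))                # 5.6, because newer versions use the shortest repr
```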
As Vinko says, you can use string formatting to do rounding for display.
Python has a module for decimal arithmetic if you need that.
You get '5.6' if you do str(round(n, 1)) instead of just round(n, 1).
You can switch the data type to an integer:
>>> n = 5.59
>>> int(n * 10) / 10.0
5.5
>>> int(n * 10 + 0.5)
56
And then display the number by inserting the locale's decimal separator.
However, Jimmy's answer is better.
Take a look at the Decimal module
Decimal "is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle - computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school." - excerpt from the decimal arithmetic specification.
and
Decimal numbers can be represented exactly. In contrast, numbers like 1.1 and 2.2 do not have exact representations in binary floating point. End users typically would not expect 1.1 + 2.2 to display as 3.3000000000000003 as it does with binary floating point.
Decimal provides the kind of operations that make it easy to write apps that require floating point operations and also need to present those results in a human readable format, e.g., accounting.
Floating point math is vulnerable to slight, but annoying, precision inaccuracies. If you can work with integer or fixed point, you will be guaranteed precision.
It's a big problem indeed. Try out this code:
print "%.2f" % (round((2*4.4+3*5.6+3*4.4)/8,2),)
It displays 4.85. Then you do:
print "Media = %.1f" % (round((2*4.4+3*5.6+3*4.4)/8,1),)
and it shows 4.8. If you do the calculation by hand, the exact answer is 4.85, but if you try:
print "Media = %.20f" % (round((2*4.4+3*5.6+3*4.4)/8,20),)
you can see the truth: a float is stored as the nearest finite sum of fractions whose denominators are powers of two.
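The "sum of fractions with power-of-two denominators" can be inspected directly with float.as_integer_ratio(). A sketch, assuming Python 3:

```python
from fractions import Fraction

x = (2 * 4.4 + 3 * 5.6 + 3 * 4.4) / 8
num, den = x.as_integer_ratio()
print(num, den)
print(den & (den - 1) == 0)             # True: the denominator is a power of two
print(Fraction(x) == Fraction("4.85"))  # False: 97/20 has no finite binary expansion
```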
printf the sucker.
print '%.1f' % 5.59 # returns 5.6
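The line above uses the Python 2 print statement. For reference, these are equivalent ways to format for display on Python 3 (a sketch; f-strings need Python 3.6 or later):

```python
n = 5.59
print("%.1f" % n)          # 5.6
print(format(n, ".1f"))    # 5.6
print("{:.1f}".format(n))  # 5.6
print(f"{n:.1f}")          # 5.6
```

All four go through the same float formatting machinery, so they produce identical strings.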
I would avoid relying on round() at all in this case. Consider
print(round(61.295, 2))
print(round(1.295, 2))
will output
61.3
1.29
which is not the desired output if you need consistent rounding at the second decimal place. To bypass this behavior you can use math.ceil() (or math.floor() if you want to round down); note that these always round in one direction rather than to the nearest value:
from math import ceil
decimal_count = 2
print(ceil(61.295 * 10 ** decimal_count) / 10 ** decimal_count)
print(ceil(1.295 * 10 ** decimal_count) / 10 ** decimal_count)
outputs
61.3
1.3
Hope that helps.
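One caveat with the ceil approach: it rounds up even when the next digit is small. If round-half-up is what you want, a sketch based on the decimal module may be safer. The helper `round_half_up` is hypothetical, and it rounds the digits of repr(x), not the stored binary value.

```python
from math import ceil
from decimal import Decimal, ROUND_HALF_UP

decimal_count = 2
# ceil() pushes 1.291 up to 1.3, which is not round-to-nearest.
print(ceil(1.291 * 10 ** decimal_count) / 10 ** decimal_count)  # 1.3

def round_half_up(x, places):
    """Round the decimal digits of repr(x) half up (illustrative helper)."""
    quantum = Decimal(10) ** -places  # e.g. Decimal('0.01') for places=2
    return float(Decimal(repr(x)).quantize(quantum, rounding=ROUND_HALF_UP))

print(round_half_up(1.291, 2))   # 1.29
print(round_half_up(1.295, 2))   # 1.3
print(round_half_up(61.295, 2))  # 61.3
```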
You can use the string format operator %, similar to sprintf.
mystring = "%.2f" % 5.5999
I am doing:
int(round( x , 0))
In this case, we first round properly at the unit level, then we convert to integer to avoid printing a float.
so
>>> int(round(5.59,0))
6
I think this answer works better than formatting the string, and it also makes more sense to me to use the round function.
Works perfectly:
format(5.59, '.1f') # to display
float(format(5.59, '.1f')) #to round
Another potential option is:
def hard_round(number, decimal_places=0):
    """
    Function:
    - Rounds a float value to a specified number of decimal places
    - Fixes issues with floating point binary approximation rounding in python
    Requires:
    - `number`:
        - Type: int|float
        - What: The number to round
    Optional:
    - `decimal_places`:
        - Type: int
        - What: The number of decimal places to round to
        - Default: 0
    Example:
    ```
    hard_round(5.6,1)
    ```
    """
    return int(number*(10**decimal_places)+0.5)/(10**decimal_places)
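A caveat: int() truncates toward zero, so the "+0.5" trick misrounds negative inputs. The sketch below assumes Python 3 division; `hard_round_signed` is a hypothetical variant, and hard_round is repeated in compact form so the block runs on its own.

```python
def hard_round(number, decimal_places=0):
    return int(number * (10 ** decimal_places) + 0.5) / (10 ** decimal_places)

def hard_round_signed(number, decimal_places=0):
    """Illustrative variant: round the magnitude, then restore the sign."""
    scale = 10 ** decimal_places
    sign = -1 if number < 0 else 1
    return sign * int(abs(number) * scale + 0.5) / scale

print(hard_round(5.59, 1))          # 5.6
print(hard_round(-5.59, 1))         # -5.5, because int() truncates toward zero
print(hard_round_signed(-5.59, 1))  # -5.6
```

Both versions still multiply a binary float by a power of ten, so values that sit just below a tie in binary (such as 1.295) can still round down.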
Code:
x1 = 5.63
x2 = 5.65
print(float('%.2f' % round(x1,1))) # prints 5.6
print(float('%.2f' % round(x2,1))) # prints 5.7
Output:
5.6
5.7
The problem arises only when the last digit is 5. E.g., 0.045 is internally stored as 0.044999999999999... You could simply change the last digit to 6 and then round. This will give you the desired results.
import re
from decimal import Decimal

def custom_round(num, precision=0):
    # Get the type of given number
    type_num = type(num)
    # If the given type is not a valid number type, raise TypeError
    if type_num not in [int, float, Decimal]:
        raise TypeError("type {} doesn't define __round__ method".format(type_num.__name__))
    # If passed number is int, there is no rounding off.
    if type_num == int:
        return num
    # Convert number to string.
    str_num = str(num).lower()
    # We will remove negative context from the number and add it back in the end
    negative_number = False
    if num < 0:
        negative_number = True
        str_num = str_num[1:]
    # If number is in format 1e-12 or 2e+13, we have to convert it
    # to a string in standard decimal notation.
    if 'e-' in str_num:
        # For 1.23e-7, e_power = 7
        e_power = int(re.findall('e-[0-9]+', str_num)[0][2:])
        # For 1.23e-7, number = 123
        number = ''.join(str_num.split('e-')[0].split('.'))
        zeros = ''
        # Number of zeros = e_power - 1 = 6
        for i in range(e_power - 1):
            zeros = zeros + '0'
        # Scientific notation 1.23e-7 in regular decimal = 0.000000123
        str_num = '0.' + zeros + number
    if 'e+' in str_num:
        # For 1.23e+7, e_power = 7
        e_power = int(re.findall(r'e\+[0-9]+', str_num)[0][2:])
        # For 1.23e+7, number_characteristic = 1
        # characteristic is number left of decimal point.
        number_characteristic = str_num.split('e+')[0].split('.')[0]
        # For 1.23e+7, number_mantissa = 23
        # mantissa is number right of decimal point.
        number_mantissa = str_num.split('e+')[0].split('.')[1]
        # For 1.23e+7, number = 123
        number = number_characteristic + number_mantissa
        zeros = ''
        # Eg: for this condition = 1.23e+7
        if e_power >= len(number_mantissa):
            # Number of zeros = e_power - mantissa length = 5
            for i in range(e_power - len(number_mantissa)):
                zeros = zeros + '0'
            # Scientific notation 1.23e+7 in regular decimal = 12300000.0
            str_num = number + zeros + '.0'
        # Eg: for this condition = 1.23e+1
        if e_power < len(number_mantissa):
            # In this case, we only need to shift the decimal e_power digits to the right
            # So we just copy the digits from mantissa to characteristic and then remove
            # them from mantissa.
            for i in range(e_power):
                number_characteristic = number_characteristic + number_mantissa[i]
            number_mantissa = number_mantissa[e_power:]
            # Scientific notation 1.23e+1 in regular decimal = 12.3
            str_num = number_characteristic + '.' + number_mantissa
    # characteristic is number left of decimal point.
    characteristic_part = str_num.split('.')[0]
    # mantissa is number right of decimal point.
    mantissa_part = str_num.split('.')[1]
    # If number is supposed to be rounded to whole number,
    # check first decimal digit. If it is 5 or more, return
    # characteristic + 1 else return characteristic
    if precision == 0:
        if mantissa_part and int(mantissa_part[0]) >= 5:
            return type_num(int(characteristic_part) + 1)
        return type_num(characteristic_part)
    # Get the precision of the given number.
    num_precision = len(mantissa_part)
    # Rounding off is done only if number precision is
    # greater than requested precision
    if num_precision <= precision:
        return num
    # Replace the last '5' with 6 so that rounding off returns desired results
    if str_num[-1] == '5':
        str_num = re.sub('5$', '6', str_num)
    result = round(type_num(str_num), precision)
    # If the number was negative, add negative context back
    if negative_number:
        result = result * -1
    return result
Here's where I see round failing. What if you wanted to round these 2 numbers to one decimal place?
23.45
23.55
My education was that from rounding these you should get:
23.4
23.6
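the "rule" being that you should round up if the preceding digit is odd, and not round up if the preceding digit is even.
The round function in Python does not apply this rule to the decimal digits you typed: it rounds the binary value that is actually stored, and that value is usually slightly above or below the exact tie.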
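If the round-half-to-even rule on decimal digits is what you were taught, the decimal module can apply it exactly. A sketch assuming Python 3, with illustrative variable names:

```python
from decimal import Decimal, ROUND_HALF_EVEN

tenth = Decimal("0.1")
for text in ("23.45", "23.55"):
    # Prints 23.4, then 23.6
    print(Decimal(text).quantize(tenth, rounding=ROUND_HALF_EVEN))

# Python 3's round() also rounds half to even, but only on true ties,
# i.e. values such as 0.5, 1.5, 2.5 that are exactly representable in binary.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

# The double stored for 23.45 is not exactly 23.45, so it is not a tie.
print(Decimal(23.45))
```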
Here is an easy way to round a float number to any number of decimal places, and it still works in 2021!
float_number = 12.234325335563
rounded = round(float_number, 3)  # 3 is the number of decimal places to keep; pass any other number as needed
print(rounded)
And this will print:
12.234
What about:
round(n,1)+epsilon
