Float to String Round-Trip Test - python

I believe that 17 decimal places should be enough to correctly represent an 8-byte float, such that it is round-trip safe (converted to a string and back without any loss).
But in this test, the number can go as high as 23, and probably higher if you increase the number of iterations.
Is this a flawed test and why?
And how do you ensure round-trip integrity of a float in Python?
def TestLoop():
    sFormat = ''
    success = True
    ff = [1.0 / i for i in range(1, 10000000)]
    for n in range(17, 31):
        sFormat = '{{:.{:d}f}}'.format(n)
        success = True
        for f in ff:
            if f != float(sFormat.format(f)):
                success = False
                break
        if success:
            return n
    return -1

n = TestLoop()
print('Lossless with ', n, ' decimal places.')
If an IEEE 754 double-precision value is converted to a decimal string with at least 17 significant digits and then converted back to double, then the final number must match the original.

In my original test, I was operating on small numbers, so there were a lot of leading zeros, which are not significant digits. Floats require 17 significant digits to be represented correctly. By changing one line like so, I made the numbers larger and was able to succeed with only 16 digits after the decimal point.
ff = [10000000.0/i for i in range(1,10000000)]
The best approach seems to be not to use format() at all, but to use repr() or str() instead.
This code here succeeds:
def TestLoop():
    for i in range(1, 10000000):
        f = 1.0 / i
        if f != float(repr(f)):
            print('Failed.')
            return
    print('Succeeded.')

TestLoop()
Another way that worked was to use 17 digits after the decimal point, but use the g formatter instead of f. This uses an exponent, so the leading zeros are eliminated.
if f != float('{:.17g}'.format(f)):
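One more approach that is round-trip safe by construction is float.hex() paired with float.fromhex(): the hexadecimal form records the binary significand and exponent exactly, so no decimal rounding is involved at all. A minimal sketch (the helper name is mine):
def roundtrips_via_hex(f):
    # float.hex() emits the exact binary significand and exponent,
    # so float.fromhex() always reconstructs the identical float
    return float.fromhex(f.hex()) == f

assert all(roundtrips_via_hex(1.0 / i) for i in range(1, 100000))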

Related

Why does my program only print the first few characters of e rather than the whole number?

e = str(2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274)
print(e)
Output:
2.718281828459045
Why does the code only print out the first few characters of e instead of the whole string?
A string str has characters, but a number (be it an int or a float) just has a value.
If you do this:
e_first_100 = '2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274'
print(e_first_100)
You'll see all the digits printed, because they are just characters in a string; it could just as well have been the first 100 characters of 'War and Peace', and you would not expect any of that to get lost either.
Since e is not an integer value, you can't use int here, so you'll have to use float. But Python uses a finite number of bits to represent such a number, while there are infinitely many real numbers; in fact, there are infinitely many values between any two real numbers. So a clever scheme has to be used to represent at least the ones you use most often, with a limited amount of precision.
You often don't notice the lack of precision, but try something like .1 + .1 + .1 == .3 in Python and you'll see that it can pop up in common situations.
Your computer already has a built-in way to represent these floating point numbers, using either 32 or 64 bits. Many languages (Python included) also offer representations with more precision that aren't backed directly by the hardware, but by default Python uses the standard 64-bit representation.
So, if you then do this:
e1 = float(e_first_100)
print(e1)
e2 = 2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274
print(e2)
Both result in a value that, when you print it, looks like:
2.718281828459045
Because that's the precision up to which the number is (more or less) accurately represented.
If you need to use e in a more precise manner, you can use Python's own representation:
from decimal import Decimal
e3 = Decimal(e_first_100)
print(e3)
That looks promising, but even Decimal only has limited precision, although it's better than standard floats:
print(e2 * 3)
print(e3 * Decimal(3))
The difference:
8.154845485377136
8.154845485377135706080862414
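If you need more digits than that, the context precision can be raised; a minimal sketch, reusing e_first_100 from above (getcontext().prec is the standard decimal knob, 28 by default):
from decimal import Decimal, getcontext

getcontext().prec = 50   # widen the default 28-digit context
e4 = Decimal(e_first_100)
print(e4 * 3)            # the product now carries 50 significant digits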
To expand on Grismar's answer: you don't see the extra digits because the default string representation of floats cuts off at that point, since going further wouldn't be very useful; but as long as the object is a float, the data is still there.
To get a string with the data, you could provide a fixed precision to some larger amount of digits, for example
In [2]: e = format(
...: 2.7182818284590452353602874713526624977572470936999595749669676277240766303535475945713821785251664274,
...: ".50f",
...: )
In [3]: e
Out[3]: '2.71828182845904509079559829842764884233474731445312'
which gives us the first 50 digits. This is of course not particularly useful with floats, as the loss of precision picks up the further you go.

Is it possible to have a float number without a decimal point in Python?

I asked this because it is possible in R. Note that both 1.5 and 1 are in numeric type (double-precision), and only 1L is an integer. When coercing a string into numeric type, it doesn't show a decimal point if there's not one in the string.
class(1.5)
# "numeric"
class(1)
# "numeric"
class(1L)
# "integer"
x <- as.numeric("3")
x
# 3
class(x)
# "numeric"
Am I allowed to have similar operations in Python? Let's say I have a function called key_in_a_number:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return float(num)
    except ValueError:
        return "Please key in only numbers."
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits. Thus, 40 should be returned if "40" is keyed in, while 40.0 should be returned only when "40.0" is keyed in.
My work around is:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return int(num)
    except ValueError:
        try:
            return float(num)
        except ValueError:
            return "Please key in only numbers."
However, in this way, I cannot be sure that the results are always in the same type, which could be problematic in following data storage or processing. Is there any way to have a number in float type without a decimal point?
I think your core problem here is that you're misunderstanding what float is.
A float represents a C double, which almost always means an IEEE 754-1985 double (or an IEEE 754-2008 binary64, which is basically the same thing but slightly better defined). It always has 53 binary digits of precision. It doesn't matter whether you specify it as 40., 40.00000, float(40), float('40'), or float('40.00'); those are all identical in every way.
So, the main problem you're asking about doesn't make any sense:
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits.
No, they aren't. float("40") and float("40.0") are both the exact same value, with no differences in any digits, and no difference in their precision, or anything else.
There's a different type in Python, in the decimal library, that represents an IEEE 754-2008 arbitrary-sized decimal. It has as many decimal digits of precision as you tell it to have.
So, Decimal('40') and Decimal('40.') have two digits; Decimal('40.000') has five digits—they may be equal, but they're not identical, because the last one is more precise.
Decimal, on the other hand, prints out however many digits of precision it actually has:
>>> print(Decimal('40'))
40
>>> print(Decimal('40.'))
40
>>> print(Decimal('40.0'))
40.0
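You can confirm that the trailing zero is really stored by inspecting the tuple representation (as_tuple() is part of the public Decimal API):
>>> Decimal('40').as_tuple()
DecimalTuple(sign=0, digits=(4, 0), exponent=0)
>>> Decimal('40.0').as_tuple()
DecimalTuple(sign=0, digits=(4, 0, 0), exponent=-1)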
While we're at it, if you do want float and int values, here's how to translate each line of R into Python:
class(1.5) # numeric
type(1.5) # float
class(1) # numeric
type(1) # int
type(1.) # float
class(1L) # integer
type(1) # int
x <- as.numeric("3") # numeric
x = float(3) # float
x = float("3") # float
Notice that, just like as.numeric("3") gives you a numeric rather than an integer, float("3") gives you a float rather than an int. I'm not sure why that Python behavior puzzles you, given that it's identical to the equivalent R behavior.
Yes: 10 would be an integer in Python, whereas 10., which represents the same number, would be a float.
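If you want to keep a single numeric type internally but only show the decimal point when the value actually has a fractional part, float.is_integer() gives a clean test. A small sketch (the helper name is mine):
def display_number(f):
    # render whole-valued floats without the trailing '.0'
    return str(int(f)) if f.is_integer() else str(f)

print(display_number(float('40')))    # 40
print(display_number(float('40.5')))  # 40.5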

Understanding the Python %g in string formatting, achieving Java String.format behavior

In Java:
String test1 = String.format("%7.2g", 3e9);
System.out.println(test1);
This prints 3.0e+09
In Python 2.7, if I run this code
for num in [3e9, 3.1e9, 3.01e9, 3e2, 3.1e2, 3.01e2]:
    print '%7.2g %7.2f %7.2e' % (num, num, num)
I get
  3e+09 3000000000.00 3.00e+09
3.1e+09 3100000000.00 3.10e+09
  3e+09 3010000000.00 3.01e+09
  3e+02  300.00 3.00e+02
3.1e+02  310.00 3.10e+02
  3e+02  301.00 3.01e+02
Huh? It looks like the precision (.2) argument is treated totally differently in Python's %g than in Python's %f, Python's %e, or Java's %g. Here's the doc (my emphasis):
General format. For a given precision p >= 1, this rounds the number to p significant digits and then formats the result in either fixed-point format or in scientific notation, depending on its magnitude.
The precise rules are as follows: suppose that the result formatted with presentation type 'e' and precision p-1 would have exponent exp. Then if -4 <= exp < p, the number is formatted with presentation type 'f' and precision p-1-exp. Otherwise, the number is formatted with presentation type 'e' and precision p-1. In both cases insignificant trailing zeros are removed from the significand, and the decimal point is also removed if there are no remaining digits following it.
Positive and negative infinity, positive and negative zero, and nans, are formatted as inf, -inf, 0, -0 and nan respectively, regardless of the precision.
A precision of 0 is treated as equivalent to a precision of 1. The default precision is 6.
WTF? Is there any way to prevent those trailing zeros from being removed? The whole point of string formatting is to achieve some consistency, e.g. for text alignment.
Is there any way to get the Java behavior (essentially the number of significant digits to the right of the decimal point) without having to rewrite the whole thing from scratch?
With the format method, you can do something like this:
for num in [3e9, 3.1e9, 3.01e9, 3e2, 3.1e2, 3.01e2]:
    print('{n:7.2{c}} {n:7.2f} {n:7.2e}'.format(n=num, c='e' if num > 1e4 else 'f'))
Output:
3.00e+09 3000000000.00 3.00e+09
3.10e+09 3100000000.00 3.10e+09
3.01e+09 3010000000.00 3.01e+09
 300.00  300.00 3.00e+02
 310.00  310.00 3.10e+02
 301.00  301.00 3.01e+02
There are two parts to it that might not be so well known.
1. Parametrizing the string formatting
In addition to simple formatting:
>>> '{}'.format(3.5)
'3.5'
and formatting with more detailed specification of the result:
>>> '{:5.2f}'.format(3.5)
' 3.50'
you can use keyword arguments in format that you can access in the string:
>>> '{num:5.2f}'.format(num=3.5)
' 3.50'
You can use these also for the format specification itself:
>>> '{:5.{deci}f}'.format(3.5, deci=3)
'3.500'
2. The if expression
In addition to the if statement there is an if expression, a.k.a. ternary operator.
So, this expression:
a = 1
b = 2
res = 10 if a < b else 20
is equivalent to this statement:
if a < b:
    res = 10
else:
    res = 20
Putting both together yields something like this:
'{num:7.2{c}}'.format(num=num, c='e' if num > 1e4 else 'f')
The formatting that python does is more consistent with C's printf style formatting, which also drops trailing zeros for the g conversion. Since python's reference implementation is in C, why should it be consistent with Java in this case?
When using the % operator for string formatting, the relevant documentation is String Formatting Operations, which has some differences to the one you linked to, notably that it allows the # alternate form for g:
The alternate form causes the result to always contain a decimal point, and trailing zeroes are not removed as they would otherwise be.
The precision determines the number of significant digits before and after the decimal point and defaults to 6.
So in your case:
>>> "%#7.2g" % 3e9
'3.0e+09'
This is different from what is allowed by str.format(), where # is used to enable prefixes for binary, octal or hexadecimal output (at least in python2, this was changed in python3).
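For what it's worth, recent Python 3 releases do accept the alternate form for float presentation types in str.format() as well, so the Java-like output can be had without the % operator (verify on your interpreter version; older 3.x releases rejected '#' for floats):
>>> format(3e9, '#7.2g')
'3.0e+09'
>>> '{:#7.2g}'.format(3.01e9)
'3.0e+09'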

Convert float to string in positional format (without scientific notation and false precision)

I want to print some floating point numbers so that they're always written in decimal form (e.g. 12345000000000000000000.0 or 0.000000000000012345), not in scientific notation, yet I'd want the result to have up to ~15.7 significant figures of an IEEE 754 double, and no more.
What I want is ideally so that the result is the shortest string in positional decimal format that still results in the same value when converted to a float.
It is well-known that the repr of a float is written in scientific notation if the exponent is greater than 15, or less than -4:
>>> n = 0.000000054321654321
>>> n
5.4321654321e-08 # scientific notation
If str is used, the resulting string again is in scientific notation:
>>> str(n)
'5.4321654321e-08'
It has been suggested that I can use format with f flag and sufficient precision to get rid of the scientific notation:
>>> format(0.00000005, '.20f')
'0.00000005000000000000'
It works for that number, though it has some extra trailing zeroes. But then the same format fails for .1, which gives decimal digits beyond the actual machine precision of float:
>>> format(0.1, '.20f')
'0.10000000000000000555'
And if my number is 4.5678e-20, using .20f would still lose relative precision:
>>> format(4.5678e-20, '.20f')
'0.00000000000000000005'
Thus these approaches do not match my requirements.
This leads to the question: what is the easiest and also well-performing way to print an arbitrary floating point number in decimal format, having the same digits as in repr(n) (or str(n) on Python 3), but always using the decimal format, not scientific notation?
That is, a function or operation that for example converts the float value 0.00000005 to the string '0.00000005'; 0.1 to '0.1'; 420000000000000000.0 to '420000000000000000.0' or '420000000000000000'; and formats the float value -4.5678e-5 as '-0.000045678'.
After the bounty period: It seems that there are at least 2 viable approaches, as Karin demonstrated that using string manipulation one can achieve a significant speed boost compared to my initial algorithm on Python 2.
Thus,
If performance is important and Python 2 compatibility is required; or if the decimal module cannot be used for some reason, then Karin's approach using string manipulation is the way to do it.
On Python 3, my somewhat shorter code will also be faster.
Since I am primarily developing on Python 3, I will accept my own answer, and shall award Karin the bounty.
Unfortunately it seems that not even the new-style formatting with float.__format__ supports this. The default formatting of floats is the same as with repr; and with f flag there are 6 fractional digits by default:
>>> format(0.0000000005, 'f')
'0.000000'
However there is a hack to get the desired result - not the fastest one, but relatively simple:
first the float is converted to a string using str() or repr()
then a new Decimal instance is created from that string.
Decimal.__format__ supports f flag which gives the desired result, and, unlike floats it prints the actual precision instead of default precision.
Thus we can make a simple utility function float_to_str:
import decimal

# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20

def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
Care must be taken to not use the global decimal context, so a new context is constructed for this function. This is the fastest way; another way would be to use decimal.localcontext, but it would be slower, creating a new thread-local context and a context manager for each conversion.
This function now returns the string with all possible digits from mantissa, rounded to the shortest equivalent representation:
>>> float_to_str(0.1)
'0.1'
>>> float_to_str(0.00000005)
'0.00000005'
>>> float_to_str(420000000000000000.0)
'420000000000000000'
>>> float_to_str(0.000000000123123123123123123123)
'0.00000000012312312312312313'
The last result is rounded at the last digit.
As Karin noted, float_to_str(420000000000000000.0) does not strictly match the format expected; it returns 420000000000000000 without the trailing .0.
If you are satisfied with the precision in scientific notation, then could we just take a simple string manipulation approach? Maybe it's not terribly clever, but it seems to work (passes all of the use cases you've presented), and I think it's fairly understandable:
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        zero_padding = '0' * (abs(int(exp)) - 1)  # minus 1 for decimal point in the sci notation
        sign = '-' if f < 0 else ''
        if exp > 0:
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
n = 0.000000054321654321
assert(float_to_str(n) == '0.000000054321654321')
n = 0.00000005
assert(float_to_str(n) == '0.00000005')
n = 420000000000000000.0
assert(float_to_str(n) == '420000000000000000.0')
n = 4.5678e-5
assert(float_to_str(n) == '0.000045678')
n = 1.1
assert(float_to_str(n) == '1.1')
n = -4.5678e-5
assert(float_to_str(n) == '-0.000045678')
Performance:
I was worried this approach may be too slow, so I ran timeit and compared with the OP's solution of decimal contexts. It appears the string manipulation is actually quite a bit faster. Edit: It appears to only be much faster in Python 2. In Python 3, the results were similar, but with the decimal approach slightly faster.
Result:
Python 2: using ctx.create_decimal(): 2.43655490875
Python 2: using string manipulation: 0.305557966232
Python 3: using ctx.create_decimal(): 0.19519368198234588
Python 3: using string manipulation: 0.2661344590014778
Here is the timing code:
from timeit import timeit

CODE_TO_TIME = '''
float_to_str(0.000000054321654321)
float_to_str(0.00000005)
float_to_str(420000000000000000.0)
float_to_str(4.5678e-5)
float_to_str(1.1)
float_to_str(-0.000045678)
'''

SETUP_1 = '''
import decimal

# create a new context for this task
ctx = decimal.Context()
# 20 digits should be enough for everyone :D
ctx.prec = 20

def float_to_str(f):
    """
    Convert the given float to a string,
    without resorting to scientific notation
    """
    d1 = ctx.create_decimal(repr(f))
    return format(d1, 'f')
'''

SETUP_2 = '''
def float_to_str(f):
    float_string = repr(f)
    if 'e' in float_string:  # detect scientific notation
        digits, exp = float_string.split('e')
        digits = digits.replace('.', '').replace('-', '')
        exp = int(exp)
        zero_padding = '0' * (abs(int(exp)) - 1)  # minus 1 for decimal point in the sci notation
        sign = '-' if f < 0 else ''
        if exp > 0:
            float_string = '{}{}{}.0'.format(sign, digits, zero_padding)
        else:
            float_string = '{}0.{}{}'.format(sign, zero_padding, digits)
    return float_string
'''

print(timeit(CODE_TO_TIME, setup=SETUP_1, number=10000))
print(timeit(CODE_TO_TIME, setup=SETUP_2, number=10000))
As of NumPy 1.14.0, you can just use numpy.format_float_positional. For example, running against the inputs from your question:
>>> numpy.format_float_positional(0.000000054321654321)
'0.000000054321654321'
>>> numpy.format_float_positional(0.00000005)
'0.00000005'
>>> numpy.format_float_positional(0.1)
'0.1'
>>> numpy.format_float_positional(4.5678e-20)
'0.000000000000000000045678'
numpy.format_float_positional uses the Dragon4 algorithm to produce the shortest decimal representation in positional format that round-trips back to the original float input. There's also numpy.format_float_scientific for scientific notation, and both functions offer optional arguments to customize things like rounding and trimming of zeros.
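A quick sketch of those options (argument names as documented by NumPy; behavior worth verifying against your installed version):
>>> numpy.format_float_scientific(0.000000054321654321)
'5.4321654321e-08'
>>> numpy.format_float_positional(0.1, precision=4, unique=False)
'0.1000'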
If you are prepared to lose precision arbitrarily by calling str() on the float number, then this is the way to go:
import decimal

def float_to_string(number, precision=20):
    return '{0:.{prec}f}'.format(
        decimal.Context(prec=100).create_decimal(str(number)),
        prec=precision,
    ).rstrip('0').rstrip('.') or '0'
It doesn't rely on global variables and allows you to choose the precision yourself. Decimal precision 100 is chosen as an upper bound for str(float) length; the actual supremum is much lower. The or '0' part handles the situation with small numbers and zero precision.
Note that it still has its consequences:
>>> float_to_string(0.10101010101010101010101010101)
'0.10101010101'
Otherwise, if the precision is important, format is just fine:
import decimal
def float_to_string(number, precision=20):
return '{0:.{prec}f}'.format(
number, prec=precision,
).rstrip('0').rstrip('.') or '0'
It avoids the precision loss that comes from calling str(f).
>>> float_to_string(0.1, precision=10)
'0.1'
>>> float_to_string(0.1)
'0.10000000000000000555'
>>> float_to_string(0.1, precision=40)
'0.1000000000000000055511151231257827021182'
>>> float_to_string(4.5678e-5)
'0.000045678'
>>> float_to_string(4.5678e-5, precision=1)
'0'
Anyway, maximum decimal places are limited, since the float type itself has its limits and cannot express really long floats:
>>> float_to_string(0.1, precision=10000)
'0.1000000000000000055511151231257827021181583404541015625'
Also, whole numbers are formatted as-is.
>>> float_to_string(100)
'100'
I think rstrip can get the job done.
a = 5.4321654321e-08
'{0:.40f}'.format(a).rstrip('0')  # format the float and delete the zeros on the right
# '0.0000000543216543210000004442039220863003' (there's roundoff error, though)
Let me know if that works for you.
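One caveat worth knowing about the plain rstrip('0') approach: for whole-valued floats it leaves a dangling decimal point, which is why other answers here chain a second rstrip('.'):
>>> '{0:.40f}'.format(5.0).rstrip('0')
'5.'
>>> '{0:.40f}'.format(5.0).rstrip('0').rstrip('.')
'5'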
Interesting question. To add a little bit more content, here's a little test comparing Antti Haapala's and Harold's solutions' outputs:
import decimal
import math

ctx = decimal.Context()

def f1(number, prec=20):
    ctx.prec = prec
    return format(ctx.create_decimal(str(number)), 'f')

def f2(number, prec=20):
    return '{0:.{prec}f}'.format(
        number, prec=prec,
    ).rstrip('0').rstrip('.')

k = 2 * 8
for i in range(-2**8, 2**8):
    if i < 0:
        value = -k * math.sqrt(math.sqrt(-i))
    else:
        value = k * math.sqrt(math.sqrt(i))
    value_s = '{0:.{prec}E}'.format(value, prec=10)
    n = 10
    print ' | '.join([str(value), value_s])
    for f in [f1, f2]:
        test = [f(value, prec=p) for p in range(n)]
        print '\t{0}'.format(test)
Neither of them gives "consistent" results for all cases.
With Antti's you'll see strings like '-000' or '000'.
With Harold's you'll see strings like ''.
I'd prefer consistency even if I'm sacrificing a little bit of speed; it depends which tradeoffs you want to assume for your use case.
Using format(float, '.{n}f'), taking n from the number's exponent:
old = 0.00000000000000000000123
if 'e-' in str(old):
    float_length = str(old)[-2:]
    new = format(old, '.' + str(float_length) + 'f')
    print(old)
    print(new)

How to print floating point numbers as-is, without any truncation, in Python?

I have some number, 0.0000002345E-60. I want to print the floating point value as-is.
What is the way to do it?
print %f truncates it to 6 digits. Also, %n.nf gives a fixed number of digits. What is the way to print it without truncation?
Like this?
>>> print('{:.100f}'.format(0.0000002345E-60))
0.0000000000000000000000000000000000000000000000000000000000000000002344999999999999860343602938602754
As you might notice from the output, it’s not really that clear how you want to do it. Due to the float representation you lose precision and can’t really represent the number precisely. As such it’s not really clear where you want the number to stop displaying.
Also note that the exponential representation is often used to more explicitly show the number of significant digits the number has.
You could also use decimal to not lose the precision due to binary float truncation:
>>> from decimal import Decimal
>>> d = Decimal('0.0000002345E-60')
>>> p = abs(d.as_tuple().exponent)
>>> print(('{:.%df}' % p).format(d))
0.0000000000000000000000000000000000000000000000000000000000000000002345
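Equivalently, you can skip computing the exponent yourself: Decimal's own fixed-point formatting already prints every stored digit in positional form:
>>> format(Decimal('0.0000002345E-60'), 'f')
'0.0000000000000000000000000000000000000000000000000000000000000000002345'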
You can use decimal.Decimal:
>>> from decimal import Decimal
>>> str(Decimal(0.0000002345e-60))
'2.344999999999999860343602938602754401109865640550232148836753621775217856801120686600683401464097113374472942165409862789978024748827516129306833728589548440037314681709534891496105046826414763927459716796875E-67'
This is the actual value of the float created by the literal 0.0000002345e-60. Its value is the number representable as a Python float which is closest to the actual 0.0000002345 * 10**-60.
float should be generally used for approximate calculations. If you want accurate results you should use something else, like mentioned Decimal.
If I understand, you want to print a float?
The problem is, you cannot print a float.
You can only print a string representation of a float. So, in short, you cannot print a float; that is your answer.
If you accept that you need to print a string representation of a float, and your question is how specify your preferred format for the string representations of your floats, then judging by the comments you have been very unclear in your question.
If you would like to print the string representations of your floats in exponent notation, then the format specification language allows this:
{:g} or {:G}, depending on whether or not you want the E in the output to be capitalized. This gets around the default precision for e and E types, which leads to unwanted trailing 0s in the part before the exponent symbol.
Assuming your value is my_float, "{:G}".format(my_float) would print the output the way that the Python interpreter prints it. You could probably just print the number without any formatting and get the same exact result.
If your goal is to print the string representation of the float with its current precision, in non-exponentiated form, User poke describes a good way to do this by casting the float to a Decimal object.
If, for some reason, you do not want to do this, you can do something like what is mentioned in this answer. However, you should set max_digits to sys.float_info.max_10_exp, instead of the 14 used in that answer. This requires you to import sys at some point prior in the code.
A full example of this would be:
import math
import sys

def precision_and_scale(x):
    max_digits = sys.float_info.max_10_exp
    int_part = int(abs(x))
    magnitude = 1 if int_part == 0 else int(math.log10(int_part)) + 1
    if magnitude >= max_digits:
        return (magnitude, 0)
    frac_part = abs(x) - int_part
    multiplier = 10 ** (max_digits - magnitude)
    frac_digits = multiplier + int(multiplier * frac_part + 0.5)
    while frac_digits % 10 == 0:
        frac_digits /= 10
    scale = int(math.log10(frac_digits))
    return (magnitude + scale, scale)

f = 0.0000002345E-60
p, s = precision_and_scale(f)
print "{:.{p}f}".format(f, p=p)
But I think the method involving casting to Decimal is probably better, overall.
