Loss of precision float in python


I have a list called scores containing various negative log probabilities.
When I run this line:
maxState = scores.pop(scores.index(max(scores)))
and print maxState, I see that maxState has lost precision as a float. Is there a way to get maxState without losing precision?
For example, when I print the list scores I get: [-35.7971525669589, -34.67875545008369]
but when I print maxState I get: -34.6787554501
(You can see it's rounded.)

You are confusing string presentation with actual contents. No precision is lost anywhere; only the string written to your console uses a rounded value instead of showing all digits. And always remember that float numbers are binary approximations, not exact values.
Python floats are formatted differently when using the str() and repr() functions; in a list or other container, repr() is used, but print it directly and str() is used.
If you don't like either option, format it explicitly with the format() function and specifying a precision:
print format(maxState, '.12f')
to print it with 12 decimals, for example.
Demo:
>>> maxState = -34.67875545008369
>>> repr(maxState)
'-34.67875545008369'
>>> str(maxState)
'-34.6787554501'
>>> format(maxState, '.8f')
'-34.67875545'
>>> format(maxState, '.12f')
'-34.678755450084'
The repr() output is roughly equivalent to using '.17g' as the format, while str() is equivalent to '.12g'; here the precision denotes when to use scientific notation (e) and when to display in floating point notation (f).
I say roughly because the repr() output aims to give you round-trippable output; see the change notes for Python 3.1 on float() representation, which were backported to Python 2.7:
What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.
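As a quick check under Python 3 (an assumption, since the question uses Python 2's print statement), the sketch below pops the maximum from the list in the question and confirms the value is unchanged; only the chosen format string decides how many digits appear. The name max_state is just a PEP 8 spelling of the asker's maxState.

```python
# Values taken from the question.
scores = [-35.7971525669589, -34.67875545008369]

# Same operation as in the question: remove and return the largest score.
max_state = scores.pop(scores.index(max(scores)))

# The float itself is untouched by pop(); it still equals the original literal.
assert max_state == -34.67875545008369

# Different string renderings of the same float.
short_text = format(max_state, '.12g')   # 12 significant digits, like Python 2's str()
fixed_text = format(max_state, '.12f')   # 12 digits after the decimal point
full_text = repr(max_state)              # shortest string that round-trips

print(short_text)
print(fixed_text)
print(full_text)

# Parsing the repr() text gives back exactly the same float.
assert float(full_text) == max_state
```

If the goal is to log or store the value without loss, writing repr(max_state) is usually enough, because reading that text back yields the identical float.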

Related

Preserving or adding decimal places in Python 3.x

I am trying to return a number with 6 decimal places, regardless of what the number is.
For example:
>>> a = 3/6
>>> a
0.5
How can I take a and make it 0.500000 while preserving its type as a float?
I've tried
'{0:.6f}'.format(a)
but that returns a string. I'd like something that accomplishes this same task, but returns a float.

In the computer's memory, the float is stored as an IEEE 754 value: just a bunch of binary data in a given format that is nothing like the string of digits you write.
So when you manipulate it, it is still a float and has no particular number of decimals after the point. It only acquires one when you display it, and whatever you do, displaying it converts it to a string.
It is during that conversion to a string that you can specify the number of decimals to show, and you do that with the string format you wrote.

This question shows a slight misunderstanding of the nature of data types such as float and string.
A float in a computer has a binary representation, not a decimal one. The decimal rendering that Python shows you in the console is a string produced when the value was printed, even though the print function does that conversion implicitly. There is no difference between how 0.5 and 0.5000000 are stored as a float in binary.
When you are writing application code, it is best not to worry about the presentation until it gets to the end user where it must, somehow, be converted to a string if only implicitly. At that point you can worry about decimal places, or even whether you want it shown in decimal at all.
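To make that concrete, here is a minimal Python 3 sketch: the float 0.5 carries no decimal places, the padded form exists only as a string, and decimal.Decimal is one standard-library type that does remember trailing zeros, if a numeric type with a fixed number of places is really needed. The variable names are illustrative.

```python
from decimal import Decimal

a = 3 / 6                      # a float; stored in binary, equal to 0.5

# Formatting produces a string with six decimal places...
text = f"{a:.6f}"

# ...but converting back to float discards the padding again.
back = float(text)

# Decimal keeps its exponent, so trailing zeros survive.
six_places = Decimal("0.000001")
d = Decimal(text).quantize(six_places)

print(text, back, d)
```

Note that Decimal is a different type from float, so this does not satisfy the "must stay a float" requirement; it only shows that the number of displayed places is a property of the text or of a decimal type, not of a binary float.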

Identify actual precision of a float

I am interacting with an API that returns floats. I am trying to calculate the number of decimal places with which the API created these floats.
For example:
# API returns the following floats.
>> 0.0194360600000000015297185740.....
>> 0.0193793800000000016048318230.....
>> 0.0193793699999999999294963970.....
# Quite clearly these are supposed to represent:
>> 0.01943606
>> 0.01937938
>> 0.01937937
# And are therefore ACTUALLY accurate to only 8 decimal places.
How can I identify that the floats are actually accurate to 8 decimal places? Once I do that, I can initialize a decimal.Decimal instance with the "true" values rather than the inaccurate floats.
Edit: The number of accurate decimal places returned by the API varies and is not always 8!

If you are using Python 2.7 or Python 3.1+, consider using the repr() builtin.
Here's how it works with your examples in a Python 3.6 interpreter.
>>> repr(0.0194360600000000015297185740)
'0.01943606'
>>> repr(0.0193793800000000016048318230)
'0.01937938'
>>> repr(0.0193793699999999999294963970)
'0.01937937'
This works because repr() shows the minimum precision of the number, n, that still satisfies float(repr(n)) == n.
Given the string representation returned by repr(), you can count the number of digits to the right of the decimal point.
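One way to do that counting, sketched for Python 3: pass the repr() text to decimal.Decimal and read the exponent. The helper name decimal_places is hypothetical, and the approach assumes finite inputs and that the API's values really originated as short decimal strings; a float that merely happens to have a short repr() looks the same.

```python
from decimal import Decimal

def decimal_places(x: float) -> int:
    """Count digits after the decimal point in the shortest round-trip form of x.

    Assumes x is finite (not inf or nan).
    """
    # Decimal parses both plain ('0.01943606') and scientific ('1e-07') repr text.
    exponent = Decimal(repr(x)).as_tuple().exponent
    # A negative exponent means that many digits sit right of the decimal point.
    return max(0, -exponent)

api_values = [0.01943606, 0.01937938, 0.01937937, 1e-07]
places = [decimal_places(v) for v in api_values]
print(places)

# Decimal instances built from the shortest round-trip text of each float.
exact = [Decimal(repr(v)) for v in api_values]
```

Whole-number floats such as 2.0 report one place, because their repr() is '2.0'; treat that case separately if it matters.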

Losing float precision within the dictionary

jsons = json.loads(request.data)
jsons -->
dict: {u'json_event': {u'timestamp': 1408878136.318921}}
and
json_event = jsons['json_event']
json_event -->
dict: {u'timestamp': 1408878136.318921}
However when I do json_event['timestamp']
I only get two decimal places precision:
float: 1408878136.32
Is there a way to keep the precision?
Update:
I don't think this is a representation problem.
event, is_created = Event.create_or_update(json_event['event_id'],
timestamp=json_event['timestamp'])
class Event(ndb.Model):
...
timestamp = ndb.FloatProperty(required=True)
event.timestamp --> 1408878136.32

When you, or whatever tool you use to print the numbers, use the standard conversion to string, only 12 significant digits get printed:
>>> str(1408878136.318921)
'1408878136.32'
But when you use the repr builtin, enough significant digits get printed to ensure that the Python parser reads back an identical value:
>>> repr(1408878136.318921)
'1408878136.318921'
So just wrap whatever you are printing in a manual repr() call.
This is just a representational issue. Evidently the JSON printer uses some logic (maybe via repr, maybe not) to print enough digits to read back the same value, but the tool you are using to print them does not.
Note that the logic is fairly complex, because binary fractions don't correspond exactly to decimal fractions. 0.3 has a repeating representation in binary, so if you read 0.3, the number actually stored has a slightly different value, and the closest decimal representation of that stored value is different again. So the logic has to work out how much rounding it can apply and still read back the correct value.
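A small Python 3 sketch of the round trip, assuming the payload shape shown in the question (the payload string is made up to match it): the parsed timestamp compares equal to the original literal, and only a 12-significant-digit rendering, which is what Python 2's str() produced, drops the microseconds.

```python
import json

# Hypothetical request body shaped like the one in the question.
payload = '{"json_event": {"timestamp": 1408878136.318921}}'

timestamp = json.loads(payload)["json_event"]["timestamp"]

# The parsed float is the same value the literal denotes.
assert timestamp == 1408878136.318921

rounded_text = format(timestamp, '.12g')  # mimics Python 2's str()
full_text = repr(timestamp)               # enough digits to round-trip
json_text = json.dumps(timestamp)         # the json module also round-trips

print(rounded_text, full_text, json_text)
```

The same reasoning suggests the value held in the ndb.FloatProperty is intact and only its display is rounded, though the sketch does not exercise ndb itself.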

Python Shell - "Extras" in float subtraction [duplicate]

Possible Duplicate:
Floating Point Limitations
Using Python 2.7 here.
Can someone explain why this happens in the shell?
>>> 5.2-5.0
0.20000000000000018
Searching yielded things about different scales of numbers not producing the right results (a very small number and a very large number), but that seemed pretty general, and considering the numbers I'm using are of the same scale, I don't think that's why this happens.
EDIT: I suppose I didn't define that the "this thing happening" I meant was that it returns 0.2 ... 018 instead of simply resulting in 0.2. I get that print rounds, and removed the print part in the code snippet, as that was misleading.

You need to understand that 5.2-5.0 really is 0.20000000000000018, not 0.2. The standard explanation for this is found in What Every Computer Scientist Should Know About Floating-Point Arithmetic.
If you don't want to read all of that, just accept that 5.2, 5.0, and 0.20000000000000018 are all just approximations, as close as the computer can get to the numbers you really want.
Python has some tricks to allow you to not know what every computer scientist should know and still get away with it. The main trick is that str(f), the human-readable rendition of a floating-point number, is rounded to 12 significant digits, so str(5.2-5.0) is "0.2", not "0.20000000000000018". But sometimes you need all the precision you can get, so repr(f), the machine-readable rendition, is not rounded that way, and repr(5.2-5.0) is "0.20000000000000018".
Now the only thing left to understand is what the interpreter shell does. As Ashwini Chaudhary explains, just evaluating something in the shell prints out its repr, while the print statement prints out its str.

The shell uses repr():
In [1]: print repr(5.2-5.0)
0.20000000000000018
In [2]: print str(5.2-5.0)
0.2
In [3]: print 5.2-5.0
0.2
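For readers on Python 3, a hedged note: since Python 3.2, str() of a float uses the same shortest round-trip algorithm as repr(), so print shows all the digits too. The sketch below checks that and uses '.12g' to reproduce the older rounded output.

```python
diff = 5.2 - 5.0

as_str = str(diff)
as_repr = repr(diff)
old_style = format(diff, '.12g')   # 12 significant digits, like Python 2's str()

print(as_str)
print(as_repr)
print(old_style)
```

So on Python 3 the shell and print agree, and an explicit format specification is the way to get the short form.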

The default implementation of float.__str__ limits the output to 12 significant digits.
Thus, the least significant digits are rounded away and what is left is the value 0.2.
To print more digits (if available), use string formatting:
print '%f' % result # prints 0.200000
That defaults to 6 digits, but you can specify more precision:
print '%.16f' % result # prints 0.2000000000000002
Alternatively, python offers a newer string formatting method too:
print '{0:.16f}'.format(result) # prints 0.2000000000000002
Why python produces the 'imprecise' result in the first place has everything to do with the imprecise nature of floating point arithmetic. Use the decimal module instead if you need more predictable precision, constructing the values from strings so that they are exact:
>>> from decimal import Decimal
>>> Decimal('5.2') - Decimal('5.0')
Decimal('0.2')

Python has two different ways of converting an object to a string, the __str__ and __repr__ methods. __str__ is meant to be a normal string output and is used by print; __repr__ is meant to be a more exact representation and is what is displayed when you don't use print, or when you print the contents of a list or dictionary. __str__ rounds floating-point values.
As for why the actual result of the subtraction is 0.20000000000000018 rather than 0.2 exactly, it has to do with the internal representation of floating point. It's impossible to represent 5.2 exactly because it's an infinitely repeating binary number. The closest that you can come is approximately 5.20000000000000018.
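To see the stored values directly, decimal.Decimal can be constructed from a float, which converts the binary value exactly with no rounding. The sketch below (Python 3) prints the exact value behind 5.2 and also shows math.isclose as one common way to compare floats with a tolerance; whether the default tolerance suits a given application is an assumption to check.

```python
import math
from decimal import Decimal

diff = 5.2 - 5.0

# Exact decimal expansions of the binary values actually stored.
exact_5_2 = Decimal(5.2)
exact_diff = Decimal(diff)
print(exact_5_2)
print(exact_diff)

# The difference is not the same float as the literal 0.2...
same_float = (diff == 0.2)

# ...but it is equal within a relative tolerance (default rel_tol=1e-09).
close_enough = math.isclose(diff, 0.2)
print(same_float, close_enough)
```

The printed expansions are long because every binary fraction has a finite but lengthy decimal form; repr() simply stops at the first digit string that identifies the float uniquely.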

floats inside tuples changing values when accessed

So I have a list of tuples of two floats each. Each tuple represents a range. I am going through another list of floats which represent values to be fit into the ranges. All of these floats are positive but < 1, so precision matters. One of my tests to determine if a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I print explicitly what the value of range[0] is I get:
range[0] = 0.00145000000671
Which would explain why the test is failing. So my question is: why is the float changing when it is accessed? It has decimal values available up to a certain precision when part of a tuple, and a different precision when accessed. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?

The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this isn't true anymore in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit, the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
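A sketch of what is probably going on, in Python 3 and under the assumption that curValue is in fact the same float as range[0] (the question does not show where it comes from). Both render as 0.00145000000671 at 12 significant digits, and a strict > between equal values is False. The tuple is named value_range here to avoid shadowing the built-in range.

```python
value_range = (0.0014500000067055225, 0.0020968749796738849)
cur_value = value_range[0]          # assumption: the two floats are equal

# At 12 significant digits (Python 2's str()), both look identical.
shown_value = format(cur_value, '.12g')
shown_bound = format(value_range[0], '.12g')
print(shown_value, shown_bound)

# repr() reveals the digits that the rounded display hides.
print(repr(cur_value), repr(value_range[0]))

strictly_greater = cur_value > value_range[0]   # False for equal values
inclusive = value_range[0] <= cur_value < value_range[1]
print(strictly_greater, inclusive)
```

If the lower bound is meant to be inclusive, >= is the likely fix. Typing the rounded number 0.00145000000671 back in gives a different, slightly larger float, which is why the test looks as though it should pass when judged from the printed output.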

Floats on modern PCs have limited precision. So even if you enter pi as a constant to 100 decimals, only about the first 15 to 17 significant digits are kept. The same is happening to you. This is because a Python float is a 64-bit IEEE 754 double with a 53-bit mantissa, which limits your precision (and in unexpected ways, because it's in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays a few decimals of the complete stored float if you use print. If you want to see enough digits to identify the stored float uniquely, use repr.
If you want better precision use the decimal module.

It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for a float, so Python modifies it before it is even accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision or fixed-point module such as Simple Python Fixed Point or the decimal module.

Not sure it would work in this case, because I don't know if Python's limiting in the output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and...
