Losing float precision within the dictionary - python

jsons = json.loads(request.data)
jsons -->
dict: {u'json_event': {u'timestamp': 1408878136.318921}}
and
json_event = jsons['json_event']
json_event -->
dict: {u'timestamp': 1408878136.318921}
However, when I do json_event['timestamp'],
I only get two decimal places of precision:
float: 1408878136.32
Is there a way to keep the precision?
Update:
I don't think this is a representation problem.
event, is_created = Event.create_or_update(json_event['event_id'],
                                           timestamp=json_event['timestamp'])
class Event(ndb.Model):
...
timestamp = ndb.FloatProperty(required=True)
event.timestamp --> 1408878136.32

When you (or whatever tool you use to print the numbers) use the standard conversion to string, only 12 significant digits get printed (in Python 2):
>>> str(1408878136.318921)
'1408878136.32'
But when you use the repr builtin, enough significant digits get printed to ensure that an identical value would be read back by the Python parser:
>>> repr(1408878136.318921)
'1408878136.318921'
So just wrap whatever you are printing in a manual repr() call.
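For example, replaying the question's data (a minimal sketch in Python 2; the JSON string is reconstructed from the dicts shown above):
>>> import json
>>> jsons = json.loads('{"json_event": {"timestamp": 1408878136.318921}}')
>>> ts = jsons['json_event']['timestamp']
>>> print ts        # str() conversion, 12 significant digits
1408878136.32
>>> print repr(ts)  # round-trippable representation
1408878136.318921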
This is just a representational issue. The JSON printer evidently uses some logic (maybe via repr, maybe not) to print enough digits to read back the same value, but the tool you are using to print them does not.
Note that the logic is fairly complex, because binary fractional numbers don't correspond exactly to decimal fractional numbers. 0.3 has a periodic representation in binary, so if you write 0.3, the actual number stored has a slightly different value, and its closest decimal representation is different. The logic therefore has to consider how much rounding it can apply while still reading back the correct value.
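A quick illustration of the binary/decimal mismatch (Python 2.7 or 3.1+, where repr returns the shortest round-trippable form):
>>> repr(0.3)
'0.3'
>>> format(0.3, '.20f')   # the actual stored value is slightly off 0.3
'0.29999999999999998890'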

How to convert exponent in Python and get rid of the 'e+'?

I'm starting with Python and I recently came across a dataset with big values.
One of my fields has a list of values that look like this: 1.3212724310201994e+18 (note the e+18 at the end of the number).
How can I convert it to a floating point number and remove the exponent without affecting the value?
First of all, the number is already a floating point number, and you do not need to change this. The only issue is that you want to have more control over how it is converted to a string for output purposes.
By default, floating point numbers above a certain size are converted to strings using exponential notation (with "e" representing "*10^"). However, if you want to convert it to a string without exponential notation, you can use the f format specifier, for example:
a = 1.3212724310201994e+18
print("{:f}".format(a))
gives:
1321272431020199424.000000
or using "f-strings" in Python 3:
print(f"{a:f}")
here the first f tells it to use an f-string and the :f is the floating point format specifier.
You can also specify the number of decimal places that should be displayed, for example:
>>> print(f"{a:.2f}") # 2 decimal places
1321272431020199424.00
>>> print(f"{a:.0f}") # no decimal places
1321272431020199424
Note that the internal representation of a floating-point number in Python uses 53 binary digits of accuracy (approximately one part in 10^16). In this case, a number of magnitude approximately 10^18 is not stored with accuracy down to the nearest integer, let alone to any decimal places. The above nevertheless gives the general principle of how you control the formatting used for string conversion.
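A quick way to see this limit (a minimal sketch): at magnitude 10^18 the spacing between adjacent floats is 256, so adding 1 changes nothing:
>>> a = 1.3212724310201994e+18
>>> a == a + 1
True
>>> a == a + 256
False
>>> int(a)
1321272431020199424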
You can use Decimal from the decimal module for each element of your data:
from decimal import Decimal
s = 1.3212724310201994e+18
print(Decimal(s))
Output:
1321272431020199424
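Note that Decimal(s) here converts from the float, so it shows the exact binary value that is already stored (ending in ...424). If you still have the original text, constructing the Decimal from the string instead preserves the digits exactly as written:
>>> from decimal import Decimal
>>> Decimal(1.3212724310201994e+18)    # from the float: exact stored value
Decimal('1321272431020199424')
>>> Decimal('1.3212724310201994e+18')  # from the string: digits as written
Decimal('1.3212724310201994E+18')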

Preserving or adding decimal places in Python 3.x

I am trying to return a number with 6 decimal places, regardless of what the number is.
For example:
>>> a = 3/6
>>> a
0.5
How can I take a and make it 0.500000 while preserving its type as a float?
I've tried
'{0:.6f}'.format(a)
but that returns a string. I'd like something that accomplishes this same task, but returns a float.
In the computer's memory, the float is stored as an IEEE 754 object; that means it's just a bunch of binary data exposed with a given format, which is nothing like the string of the number as you write it.
So when you manipulate it, it's still a float and has no particular number of decimals after the dot. It only does when you display it, and whatever you do, displaying it means converting it to a string.
It's during that conversion to string that you can specify the number of decimals to show, and you do it using the string format as you wrote.
This question shows a slight misunderstanding on the nature of data types such as float and string.
A float in a computer has a binary representation, not a decimal one. The decimal rendering that Python gives you in the console was converted to a string when it was printed, even though the conversion is implicit in the print function. There is no difference between how 0.5 and 0.500000 are stored as floats in their binary representation.
When you are writing application code, it is best not to worry about the presentation until it gets to the end user where it must, somehow, be converted to a string if only implicitly. At that point you can worry about decimal places, or even whether you want it shown in decimal at all.
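To see that the six decimals exist only in the string, not in the float (a quick check in Python 3):
>>> a = 3/6
>>> b = float('{0:.6f}'.format(a))  # format, then parse back
>>> a == b
True
>>> '{0:.6f}'.format(a)             # the decimals live only in the string
'0.500000'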

Loss of precision float in python

I have a list called scores of varying -log probabilities.
When I call:
maxState = scores.pop(scores.index(max(scores)))
and print maxState, I see that maxState loses its precision as a float. Is there a way I can get maxState without losing precision?
ex: I print out the list scores: [-35.7971525669589, -34.67875545008369]
and print maxState, I get this: -34.6787554501
(You can see it's rounded)
You are confusing string presentation with actual contents. Nowhere is precision lost; only the string produced to write to your console uses a rounded value rather than showing you all the digits. And always remember that float numbers are digital approximations, not precise values.
Python floats are formatted differently by the str() and repr() functions: in a list or other container, repr() is used, but print a float directly and str() is used.
If you don't like either option, format it explicitly with the format() function, specifying a precision:
print format(maxState, '.12f')
to print it with 12 decimals, for example.
Demo:
>>> maxState = -34.67875545008369
>>> repr(maxState)
'-34.67875545008369'
>>> str(maxState)
'-34.6787554501'
>>> format(maxState, '.8f')
'-34.67875545'
>>> format(maxState, '.12f')
'-34.678755450084'
The repr() output is roughly equivalent to using '.17g' as the format, while str() is equivalent to '.12g'; with the g format, the precision is the number of significant digits, and the formatter switches between scientific notation (e) and fixed-point notation (f) depending on the magnitude.
I say roughly because the repr() output aims to give you round-trippable output; see the change notes for Python 3.1 on float() representation, which were backported to Python 2.7:
What is new is how the number gets displayed. Formerly, Python used a simple approach. The value of repr(1.1) was computed as format(1.1, '.17g') which evaluated to '1.1000000000000001'. The advantage of using 17 digits was that it relied on IEEE-754 guarantees to assure that eval(repr(1.1)) would round-trip exactly to its original value. The disadvantage is that many people found the output to be confusing (mistaking intrinsic limitations of binary floating point representation as being a problem with Python itself).
The new algorithm for repr(1.1) is smarter and returns '1.1'. Effectively, it searches all equivalent string representations (ones that get stored with the same underlying float value) and returns the shortest representation.
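You can check both equivalences against the demo above (a quick sanity check):
>>> maxState = -34.67875545008369
>>> format(maxState, '.12g')            # same rounding as str()
'-34.6787554501'
>>> float(repr(maxState)) == maxState   # repr round-trips exactly
True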

Python: read mixed float and string csv file

I have a csv file with mixed floats, a string and an integer, the formatted output from a FORTRAN file.
A typical line looks like:
507.930 , 24.4097 , 1.0253E-04, O III , 4
I want to read it while keeping the float decimal places unmodified, and check whether the first entry in each line is present in another list.
Using loadtxt and genfromtxt results in the decimal places changing from 3 (or 4) to 12.
How should I tackle this?
If you need to keep precision exactly, you need to use the decimal module. Otherwise, issues with floating point arithmetic limitations might trip you up.
Chances are, though, that you don't really need that precision - just make sure you don't compare floats for equality exactly but always allow a fudge factor, and format the output to a limited number of significant digits:
import sys

# instead of "if float1 == float2:", use this:
if abs(float1 - float2) <= sys.float_info.epsilon:
    print "equal"
loadtxt appears to take a converters argument, so something like:
from decimal import Decimal
numpy.loadtxt(..., converters={0: Decimal,
                               1: Decimal,
                               2: Decimal})
should work.
Decimals should work with whatever precision you require, although if you're doing significant number crunching with them it will be considerably slower than working with floats. However, I assume you're just looking to transform the data without losing any precision, so this should be fine.
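If loadtxt proves awkward, a plain csv reader with Decimal does the same job (a minimal sketch; 'data.csv' stands in for your file, which holds lines like the sample above):
import csv
from decimal import Decimal

with open('data.csv') as f:
    for row in csv.reader(f):
        # Decimal built from a string keeps the digits exactly as
        # written, trailing zeros included: Decimal('507.930')
        wave, flux, coeff = (Decimal(v.strip()) for v in row[:3])
        label = row[3].strip()  # e.g. 'O III'
        count = int(row[4])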
I ended up writing some string-processing code. Not elegant, but it works:
stuff = loadtxt(fname1, skiprows=35, dtype="f10,f10,e10,S10,i1", delimiter=',')
stuff2 = loadtxt('keylines.txt')  # a list of the reference values
...  # open file for writing etc.
for i in range(0, len(stuff)):
    bb = round(float(stuff[i][0]), 3)  # gets number back to correct decimal format
    cc = round(float(stuff[i][1]), 5)  # ditto
    dd = float(stuff[i][2])
    ee = stuff[i][3].replace(" ", "")  # gets rid of extra FORTRAN spaces
    ff = int(stuff[i][4])
    for item in stuff2:
        if bb == item:
            fn.write(str(bb) + ',' + str("%1.5f" % cc) + ',' + str("%1.4e" % dd) + ',' + ee + ',' + str(ff) + '\n')

floats inside tuples changing values when accessed

So I have a list of tuples of two floats each; each tuple represents a range. I am going through another list of floats, which represent values to be fitted into the ranges. All of these floats are positive and < 1, so precision matters. One of my tests to determine whether a value fits into a range is failing when it should pass. If I print the value and the range that is causing problems, I can tell this much:
curValue = 0.00145000000671
range = (0.0014500000067055225, 0.0020968749796738849)
The conditional that is failing is:
if curValue > range[0] and ... blah :
# do some stuff
From the values given by curValue and range, the test should clearly pass (don't worry about what is in the conditional). Now, if I explicitly print the value of range[0], I get:
range[0] = 0.00145000000671
which would explain why the test is failing. So my question is: why does the float change when it is accessed? It shows decimal values up to one precision when part of a tuple, and a different precision when accessed directly. Why would this be? What can I do to ensure my data maintains a consistent amount of precision across my calculations?
The float doesn't change. The built-in numeric types are all immutable. The cause of what you're observing is that:
print range[0] uses str on the float, which (up until very recent versions of Python) printed fewer digits of a float.
Printing a tuple (be it with repr or str) uses repr on the individual items, which gives a much more accurate representation (again, this is no longer true in recent releases, which use a better algorithm for both).
As for why the condition doesn't work out the way you expect, it's probably the usual culprit: the limited precision of floats. Try print repr(curValue), repr(range[0]) to see what Python decided was the closest possible representation of your float literal.
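Here is that effect reproduced with the value from the question (Python 2, where str() rounds to 12 significant digits):
>>> x = 0.0014500000067055225
>>> print x        # str(): looks just like curValue
0.00145000000671
>>> repr(x)
'0.0014500000067055225'
So curValue and range[0] may well be the very same float; they merely get printed through different conversions.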
Floats on modern PCs aren't arbitrarily precise. Even if you enter pi as a constant to 100 decimals, only roughly the first 16 significant digits are stored accurately. The same is happening to you. A Python float is a 64-bit IEEE 754 double, which gives you 53 bits of mantissa; that limits your precision (and in unexpected ways, because it works in base 2).
Please note, 0.00145000000671 isn't the exact value as stored by Python. Python only displays some of the decimals of the complete stored float when you use print. If you want to see exactly how Python stores the float, use repr.
If you want better precision use the decimal module.
It isn't changing per se. Python is doing its best to store the data as a float, but that number is too precise for a float, so Python rounds it before it is ever accessed (in the very process of storing it). Funny how something so small is such a big pain.
You need to use an arbitrary-precision fixed-point module like Simple Python Fixed Point or the decimal module.
Not sure whether it would work in this case, because I don't know if the limitation is in Python's output or in the storage itself, but you could try doing:
if curValue - range[0] > 0 and ...
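Note that curValue - range[0] > 0 is effectively the same comparison as curValue > range[0], so it fails in the same way when the two floats are bit-identical. If the boundary is meant to be inclusive, >= is the simpler fix (an illustration assuming the two values really are equal):
>>> x = 0.0014500000067055225
>>> y = 0.0014500000067055225
>>> x > y
False
>>> x - y > 0
False
>>> x >= y
True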
