I want to extract significant digit of 7 from XX=0.0007
The code is as follows
XX=0.0007
enX1=XX//10**np.floor(np.log10(XX));
But XX becomes 6not 7. Can anyone help me?
In some sense, you were lucky to start out with the value 0.0007. As it turns out, that value is one of the (many!) decimal values that cannot be represented exactly in a floating point format.
A floating point number gets usually stored in the common IEEE-754 format as powers of 2. Just like a whole number such as 175 is stored as the sum of bits with increasing powers-of-two values (165 = 128 + 32 + 4 + 1), fractions are stored as a sum of 1/power-of-two numbers. That means that a value of 1/2, 1/4, and 1/65536 can be stored exactly (and sums thereof, such as 3/4), but your 0.0007 can not. Its closest value is actually 0.0000699999999999999992887633748495. ("Closest" in the sense that adding just one more one-bit at the end will make it slightly larger than 0.0007, and the difference is ever so slightly larger than this lower one.)
In your calculation, you use the double divide slash //, which instructs Python to do an integer division and discard the fractional part. So while the intermediate calculation is correct and you get something like 6.99999..., this gets truncated and you end up with 6.
If you use a single slash, the result will keep its (exact!) decimals but Python will represent it as 7.0000, give or take a few zeroes. By default, Python displays only a small number of decimals.
Note that this still "is" not the exact value 7. The calculation starts out with an imprecise number, and although there may be some intermediate rounding here and there, there is only a small chance you end up with a precise integer. Again, not for all decimals, but for a large number of them. Other fractional values may be stored fractionally larger than the value you enter – 0.0004, for examplea – but the underlying 'problem' of accuracy is also present there. It's just not as visible as with yours.
If you want a nearest integer result, use a single divide slash for the exact calculation, followed by round to force the number to the nearest integer anyway.
a To be precise, as somewhere about 0.000400000000000000019168694409544. After your routine, Python will display it as 4 but internally it's still just a bit larger than that.
Related
I am trying to match an expected output of "13031.157014219536" exactly, and I have attempted 3 times to get the value with different methods, detailed below, and come extremely close to the value, but not close enough. What is happening in these code snippets that it causing the deviation? Is it rounding error in the calculation? Did I do something wrong?
Attempts:
item_worth = 8000
years = range(10)
for year in years:
item_worth += (item_worth*0.05)
print(item_worth)
value = item_worth * (1 + ((0.05)**10))
print(value)
cost = 8000
for x in range(10):
cost *= 1.05
print(cost)
Expected Output:
13031.157014219536
Actual Outputs:
13031.157014219529
13031.157014220802
13031.157014219538
On most machine, floats are represented by fp64, or double floats.
You can check the precision of those for your number that way (not a method to be used for real computation. Just out of curiosity):
import struct
struct.pack('d', 13031.157014219536)
# bytes representing that number in fp64 b'\xf5\xbc\n\x19\x94s\xc9#'
# or, in a more humanely understandable way
struct.unpack('q', struct.pack('d', 13031.157014219536))[0]
# 4668389568658717941
# That number has no meaning, except that this integer is represented by the same bytes as your float.
# Now, let's see what float is "next in line"
struct.unpack('d', struct.pack('q', 4668389568658717941+1))[0]
# 13031.157014219538
Note that this code works on most machine, but is not reliable. First of all, it relies on the fact that significant bits are not just all 1. Otherwise, it would give a totally unrelated number. Secondly, it makes assumption that ints are LE. But well, it gave me what I wanted.
That is the information that the smallest number bigger than 13031.157014219536 is 13031.157014219538.
(Or, said more accurately for this kind of conversation: the smallest float bigger than 13031.157014219536 whose representation is not the same as the representation of 13031.157014219536 has the same representation as 13031.157014219538)
So, my point is you are flirting with the representation limit. You can't expect the sum of 10 numbers to be more accurate.
I could also have said that saying that the biggest power of 2 smaller than your number is 8192=2¹³.
So, that 13 is the exponent of your float in its representation. And this you have 53 significant bits, the precision of such a number is 2**(13-53+1) = 1.8×10⁻¹² (which is indeed also the result of 13031.157014219538-13031.157014219536). Hence the reason why in decimal, 12 decimal places are printed. But not all combination of them can exist, and the last one is not insignificant, but not fully significant neither.
If your computation is the result of the sum of 10 such numbers, you could even have an error 10 times bigger than your last result the right to complain :D
The value 0.1 is not representable as a 64 bits floats.
The exact value is roughly equals to 0.10000000000000000555
https://www.exploringbinary.com/why-0-point-1-does-not-exist-in-floating-point/
You can highlight this behavior with this simple code:
timestep = 0.1
iterations = 1_000_000
total = 0
for _ in range(iterations):
total += timestep
print(total - timestep * iterations) # output is not zero but 1.3328826753422618e-06
I totally understand why 0.1 is not representable as an exact value as a float 64, but what I don't get is why when I do print(0.1), it outputs 0.1 and not the underlying value as a float 64.
Of course, the underlying value has many more digits on a base 10 system so there should be some rounding involved, but I am looking for the specification for all values and how to control that.
I had the issue with some application storing data in database:
the python app (using str(0.1)) would show 0.1
another database client UI would show 0.10000000000000000555, which would throw off the end user
P-S: I had other issues with other values
Regards,
First, you are right, floats (single, double, whatever) have an exact value.
For 64 bits IEEE-754 double, the nearest representable value to 0.1 would be exactly 0.1000000000000000055511151231257827021181583404541015625, quite long as you can see. But representable floating point values all have a finite number of decimal digits, because the base (2) is a divisor of some power of 10.
For a REPL language like python, it is essential to have this property:
the printed representation of the float shall be reinterpreted as the same value
A consequence is that
every two different float shall have different printed representation
For obtaining those properties, there are several possbilities:
print the exact value. That can be many digits, and for the vast majority of humans, just noise.
print enough digits so that every two different float have a different representation. For double precision, that's 17 digits in the worse case. So a naive implementation for representing floating point values would be to always print 17 significant digits.
print the shortest representation that would be reinterpreted unchanged.
Python, and many other languages have chosen the 3rd solution, because it is considered annoying to print 0.10000000000000001 when user have entered 0.1. Human users generally choose the shorter representation and printed representation is for human consumption. The shorter, the better.
The bad property is that it could give the false impression that those floating point values are storing exact decimal values like 1/10. That's a knowledge that is evangelized here and in many places now.
I'm pretty new to python, and I've made a table which calculates T=1+2^-n-1 and C=2^n, which both give the same values from n=40 to n=52, but for n=52 to n=61 I get 0.0 for T, whereas C gives me progressively smaller decimals each time - why is this?
I think I understand why T becomes 0.0, because of python using binary floating point and because of the machine epsilon value - but I'm slightly confused as to why C doesn't also become 0.0.
import numpy as np
import math
t=np.zeros(21)
c=np.zeros(21)
for n in range(40,61):
m=n-40
t[m]=1+2**(-n)-1
c[m]=2**(-n)
print (n,t[m],c[m])
The "floating" in floating point means that values are represented by storing a fixed number of leading digits and a scale factor, rather than assuming a fixed scale (which would be fixed point).
2**-53 only takes one (binary) digit to represent (not including the scale), but 1+2**-53 would take 54 to represent exactly. Python floats only have 53 binary digits of precision; 2**-53 can be represented exactly, but 1+2**-53 gets rounded to exactly 1, and subtracting 1 from that gives exactly 0. Thus, we have
>>> 2**-53
1.1102230246251565e-16
>>> 1+(2**-53)-1
0.0
Postscript: you might wonder why 2**-53 displays as a value not equal to the exact mathematical value when I said it was exact. That's due to the float->string conversion logic, which only keeps enough decimal digits to reconstruct the original float (instead of printing a bunch of digits at the end that are usually just noise).
The difference between both is indeed due to floating-point representation. Indeed, if you perform 1 + X where X is a very very small number, then the floating-point representation sets its exponent value to 0 and the precision is ensured by the mantissa, which is 52-bit on a 64-bit computer. Therefore, 1 + 2^(-X) if X > 52 is equal to 1. However, even 2^-100 can be represented in double-precision floating-point, so you can see C decrease for a larger number of samples.
I am having a bit of trouble understanding some results I am getting while operating with Python 2.7.
>>> x=1
>>> e=1e-20
>>> x+e
1.0
>>> x+e-x
0.0
>>> e+x-x
0.0
>>> x-x+e
1e-20
This is copied directly from Python. I am having a class on how to program on Python and I do not understand the disparity of results (x+e==1, x-x+e==1e-20 but x+e-x==0 and e+x-x==0).
I have already read the Python tutorial on Representation Errors, but I believe none of that was mentioned there
Thanks in advance
Floating-point addition is not associative.
x+e-x is grouped as (x+e)-x. It adds x and e, rounds the result to the nearest representable number (which is 1), then subtracts x from the result and rounds again, producing 0.
x-x+e is grouped as (x-x)+e. It subtracts x from x, producing 0, and rounds it to the nearest representable number, which is 0. It then adds e to 0, producing e, and rounds it to the nearest representable number, which is e.
This is because of the way that computers represent floating point numbers.
This is all really in binary format but let's pretend that it works with base 10 numbers because that's a lot easier for us to relate to.
A floating point number is expressed on the form 0.x*10^y where x is a 10-digit number (I'm omitting trailing zeroes here) and y is the exponent. This means that the number 1.0 is expressed as 0.1*10^1 and the number 0.1 as 0.1*10^0.
To add these two numbers together we need to make sure that they have the same exponent. We can do this easily by shifting the numbers back and forth, i.e. we change 0.1*10^0 to 0.01*10^1 and then we add the together to get 0.11*10^1.
When we have 0.1*10^1 and 0.1*10^-19 (1e-20) we will shift 0.1*10^-19 20 steps, meaning that the 1 will fall outside the range of our 10 digit number so we will end up with 0.1*10^1 + 0.0*10^1 = 0.1*10^1.
The reason you end up with 1e-20 in your last example is because addition is done from left to right, so we subtract 0.1*10^1 from 0.1*10^1 ending up with 0.0*10^0 and add 0.1*10^-19 to that, which is a special case where we don't need to shift any of them because one of them is exactly zero.
Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.
Use int(your non integer number) will nail it.
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"
You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.
This is where 253 comes from.
More detail: We need to consider how IEEE-754 floating point works.
1 bit 11 / 8 52 / 23 # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
-1sign × 1.mantissa ×2exponent - bias
where bias = 2exponent - 1 - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2X simply shifts all bits X places to the left, it's easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be a MSB that is one—which is why it's not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent - bias.
That's saying the same as shifting the bits over the decimal point until there's only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any numbers after the decimal.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and converted to the next highest INTEGER, "newvel".
import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
for row in cursor:
curvel = float(row[1])
newvel = int(math.ceil(curvel))
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.