I was trying to process some rather large numbers in python and came across an overflow error. I decided to investigate a little bit more and came across an inequality I cannot explain.
When I evaluate 10^26 I get:
>>> 10**26
100000000000000000000000000
Which is perfectly logical. However when I evaluate 10e26 and convert it to an int I get:
>>>int(10e26)
1000000000000000013287555072
Why is this?
Do I not understand the e notation properly? (From what I know 10e26 is 10*10^26 as seen in this answer: 10e notation used with variables?)
10^26 is way past the max integer size so I was also wondering if there was any mechanism in python which could allow to work with numbers in scientific format (not considering all those zeros) in order to be able to compute operations with numbers past the max size.
The short answer is that 10e26 and 10**26 do not represent identical values.
10**26, with both operands being int values, evaluates to an int. As int represents integers with arbitrary precision, its value is exactly 1026 as intended.
10e26, on the other hand, is a float literal, and as such the resulting value is subject to the limited precision of the float type on your machine. The result of int(10e26) is the integer value of the float closest to the real number 1027.
10e26 represents ten times ten to the power of 26, which is 1027.
10**26 represents represents ten to the power of 26, 1026.
Obviously, these are different, so 10e26 == 10**26 is false.
However, if we correct the mistake so we compare 1e26 and 10**26 by evaluating 1e26 == 10**26, we get false for a different reason:
1e26 is evaluated in a limited-precision floating-point format, producing 100000000000000004764729344 in most implementations. (Python is not strict about the floating-point format.) 100000000000000004764729344 is the closest one can get to 1026 using 53 significant bits.
10**26 is evaluated with integer arithmetic, producing 100000000000000000000000000.
Comparing them reports they are different.
(I am uncertain of Python semantics, but I presume it converts the floating-point value to an extended-precision integer for the comparison. If we instead convert the integer to floating-point, with float(10**26) == 1e26, the conversion of 100000000000000000000000000 to float produces the same value, 100000000000000004764729344, and the comparison returns true.)
Related
I am trying to match an expected output of "13031.157014219536" exactly, and I have attempted 3 times to get the value with different methods, detailed below, and come extremely close to the value, but not close enough. What is happening in these code snippets that it causing the deviation? Is it rounding error in the calculation? Did I do something wrong?
Attempts:
item_worth = 8000
years = range(10)
for year in years:
item_worth += (item_worth*0.05)
print(item_worth)
value = item_worth * (1 + ((0.05)**10))
print(value)
cost = 8000
for x in range(10):
cost *= 1.05
print(cost)
Expected Output:
13031.157014219536
Actual Outputs:
13031.157014219529
13031.157014220802
13031.157014219538
On most machine, floats are represented by fp64, or double floats.
You can check the precision of those for your number that way (not a method to be used for real computation. Just out of curiosity):
import struct
struct.pack('d', 13031.157014219536)
# bytes representing that number in fp64 b'\xf5\xbc\n\x19\x94s\xc9#'
# or, in a more humanely understandable way
struct.unpack('q', struct.pack('d', 13031.157014219536))[0]
# 4668389568658717941
# That number has no meaning, except that this integer is represented by the same bytes as your float.
# Now, let's see what float is "next in line"
struct.unpack('d', struct.pack('q', 4668389568658717941+1))[0]
# 13031.157014219538
Note that this code works on most machine, but is not reliable. First of all, it relies on the fact that significant bits are not just all 1. Otherwise, it would give a totally unrelated number. Secondly, it makes assumption that ints are LE. But well, it gave me what I wanted.
That is the information that the smallest number bigger than 13031.157014219536 is 13031.157014219538.
(Or, said more accurately for this kind of conversation: the smallest float bigger than 13031.157014219536 whose representation is not the same as the representation of 13031.157014219536 has the same representation as 13031.157014219538)
So, my point is you are flirting with the representation limit. You can't expect the sum of 10 numbers to be more accurate.
I could also have said that saying that the biggest power of 2 smaller than your number is 8192=2¹³.
So, that 13 is the exponent of your float in its representation. And this you have 53 significant bits, the precision of such a number is 2**(13-53+1) = 1.8×10⁻¹² (which is indeed also the result of 13031.157014219538-13031.157014219536). Hence the reason why in decimal, 12 decimal places are printed. But not all combination of them can exist, and the last one is not insignificant, but not fully significant neither.
If your computation is the result of the sum of 10 such numbers, you could even have an error 10 times bigger than your last result the right to complain :D
I am a new user to Python 3.6.0. I am trying to divide 2 numbers that produces a big output. However, using
return ans1 // ans2
produces 55347740058143507128 while using
return int(ans1 / ans2)
produces 55347740058143506432.
Which is more accurate and why is that so?
The first one is more accurate since it gives the exact integer result.
The second represents the intermediate result as a float. Floats have limited resolution (53 bits of mantissa) whereas the result needs 66 bits to be represented exactly. This results in a loss of accuracy.
If we looks at the hex representation of both results:
>>> hex(55347740058143507128)
'0x3001aac56864d42b8L'
>>> hex(55347740058143506432)
'0x3001aac56864d4000L'
we can see that the least-significant bits of the result that didn't fit in a 53-bit mantissa all got set to zero.
One way to see the rounding directly, without any complications brought about by division is:
>>> int(float(55347740058143507128))
55347740058143506432L
The flooring integer division is more accurate, in that sense.
The problem with this construction int(ans1 / ans2), is that the result is temporarily a float (before, obviously, converting it to an integer), introducing rounding to the nearest float (the amount of rounding depends on the magnitude of the number). This can even be seen by just trying to round-trip that value through a float:
print(int(float(55347740058143507128)))
Which prints 55347740058143506432. So, because plain / results in a float, that limits its accuracy.
I am trying to write a program in python 2.7 that will first see if a number divides the other evenly, and if it does get the result of the division.
However, I am getting some interesting results when I use large numbers.
Currently I am using:
from __future__ import division
import math
a=82348972389472433334783
b=2
if a/b==math.trunc(a/b):
answer=a/b
print 'True' #to quickly see if the if loop was invoked
When I run this I get:
True
But 82348972389472433334783 is clearly not even.
Any help would be appreciated.
That's a crazy way to do it. Just use the remainder operator.
if a % b == 0:
# then b divides a evenly
quotient = a // b
The true division implicitly converts the input to floats which don't provide the precision to store the value of a accurately. E.g. on my machine
>>> int(1E15+1)
1000000000000001
>>> int(1E16+1)
10000000000000000
hence you loose precision. A similar thing happens with your big number (compare int(float(a))-a).
Now, if you check your division, you see the result "is" actually found to be an integer
>>> (a/b).is_integer()
True
which is again not really expected beforehand.
The math.trunc function does something similar (from the docs):
Return the Real value x truncated to an Integral (usually a long integer).
The duck typing nature of python allows a comparison of the long integer and float, see
Checking if float is equivalent to an integer value in python and
Comparing a float and an int in Python.
Why don't you use the modulus operator instead to check if a number can be divided evenly?
n % x == 0
I need to use a module that does some math on integers, however my input is in floats.
What I want to achieve is to convert a generic float value into a corresponding integer value and loose as little data as possible.
For example:
val : 1.28827339907e-08
result : 128827339906934
Which is achieved after multiplying by 1e22.
Unfortunately the range of values can change, so I cannot always multiply them by the same constant. Any ideas?
ADDED
To put it in other words, I have a matrix of values < 1, let's say from 1.323224e-8 to 3.457782e-6.
I want to convert them all into integers and loose as little data as possible.
The answers that suggest multiplying by a power of ten cause unnecessary rounding.
Multiplication by a power of the base used in the floating-point representation has no error in IEEE 754 arithmetic (the most common floating-point implementation) as long as there is no overflow or underflow.
Thus, for binary floating-point, you may be able to achieve your goal by multiplying the floating-point number by a power of two and rounding the result to the nearest integer. The multiplication will have no error. The rounding to integer may have an error up to .5, obviously.
You might select a power of two that is as large as possible without causing any of your numbers to exceed the bounds of the integer type you are using.
The most common conversion of floating-point to integer truncates, so that 3.75 becomes 3. I am not sure about Python semantics. To round instead of truncating, you might use a function such as round before converting to integer.
If you want to preserve the values for operations on matrices I would choose some value to multiply them all by.
For Example:
1.23423
2.32423
4.2324534
Multiply them all by 10000000 and you get
12342300
23242300
42324534
You can perform you multiplications, additions etc with your matrices. Once you have performed all your calculations you can convert them back to floats by dividing them all by the appropriate value depending on the operation you performed.
Mathematically it makes sense because
(Scalar multiplication)
M1` = M1 * 10000000
M2` = M2 * 10000000
Result = M1`.M2`
Result = (M1 x 10000000).(M2 x 10000000)
Result = (10000000 x 10000000) x (M1.M2)
So in the case of multiplication you would divide your result by 10000000 x 10000000.
If its addition / subtraction then you simply divide by 10000000.
You can either choose the value to multiply by through your knowledge of what decimals you expect to find or by scanning the floats and generating the value yourself at runtime.
Hope that helps.
EDIT: If you are worried about going over the maximum capacity of integers - then you would be happy to know that python automatically (and silently) converts integers to longs when it notices overflow is going to occur. You can see for yourself in a python console:
>>> i = 3423
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'int'>
>>> i *= 100000
>>> type(i)
<type 'long'>
If you are still worried about overflow, you can always choose a lower constant with a compromise for slightly less accuracy (since you will be losing some digits towards then end of the decimal point).
Also, the method proposed by Eric Postpischil seems to make sense - but I have not tried it out myself. I gave you a solution from a more mathematical perspective which also seems to be more "pythonic"
Perhaps consider counting the number of places after the decimal for each value to determine the value (x) of your exponent (1ex). Roughly something like what's addressed here. Cheers!
Here's one solution:
def to_int(val):
return int(repr(val).replace('.', '').split('e')[0])
Usage:
>>> to_int(1.28827339907e-08)
128827339907
Python's math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:
import math
f=math.floor(2.3)
Now f returns:
2.0
What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?
All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
To quote from Wikipedia,
Any integer with absolute value less than or equal to 224 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 253 can be exactly represented in the double precision format.
Use int(your non integer number) will nail it.
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"
You could use the round function. If you use no second parameter (# of significant digits) then I think you will get the behavior you want.
IDLE output.
>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
Combining two of the previous results, we have:
int(round(some_float))
This converts a float to an integer fairly dependably.
That this works is not trivial at all! It's a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.
This post explains why it works in that range.
In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 253 and -253.
Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.
It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.
Adding one to 111(omitted)111 (53 ones) yields 100...000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.
This is where 253 comes from.
More detail: We need to consider how IEEE-754 floating point works.
1 bit 11 / 8 52 / 23 # bits double/single precision
[ sign | exponent | mantissa ]
The number is then calculated as follows (excluding special cases that are irrelevant here):
-1sign × 1.mantissa ×2exponent - bias
where bias = 2exponent - 1 - 1, i.e. 1023 and 127 for double/single precision respectively.
Knowing that multiplying by 2X simply shifts all bits X places to the left, it's easy to see that any integer must have all bits in the mantissa that end up right of the decimal point to zero.
Any integer except zero has the following form in binary:
1x...x where the x-es represent the bits to the right of the MSB (most significant bit).
Because we excluded zero, there will always be a MSB that is one—which is why it's not stored. To store the integer, we must bring it into the aforementioned form: -1sign × 1.mantissa ×2exponent - bias.
That's saying the same as shifting the bits over the decimal point until there's only the MSB towards the left of the MSB. All the bits right of the decimal point are then stored in the mantissa.
From this, we can see that we can store at most 52 binary digits apart from the MSB.
It follows that the highest number where all bits are explicitly stored is
111(omitted)111. that's 53 ones (52 + implicit 1) in the case of doubles.
For this, we need to set the exponent, such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we cannot know the digit right to the left after the decimal point.
111(omitted)111x.
By convention, it's 0. Setting the entire mantissa to zero, we receive the following number:
100(omitted)00x. = 100(omitted)000.
That's a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.
It represents 253, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 253, we would have to set the implicit zero (denoted by the x) to one, but that's impossible.
If you need to convert a string float to an int you can use this method.
Example: '38.0' to 38
In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.
>>> int(float('38.0'))
38
>>> int(float('38'))
38
Note: This will strip any numbers after the decimal.
>>> int(float('38.2'))
38
math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.
The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)
Another code sample to convert a real/float to an integer using variables.
"vel" is a real/float number and converted to the next highest INTEGER, "newvel".
import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
for row in cursor:
curvel = float(row[1])
newvel = int(math.ceil(curvel))
Since you're asking for the 'safest' way, I'll provide another answer other than the top answer.
An easy way to make sure you don't lose any precision is to check if the values would be equal after you convert them.
if int(some_value) == some_value:
some_value = int(some_value)
If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won't lose any precision.