I am using the function math.modf, which separates the integer and decimal parts of a number as follows:
decimal_part, integer_part = math.modf(x)
where x is a decimal number.
An example for a small number is as follows:
x = 1993.0787353515625
decimal_part = 0.0787353515625, integer_part = 1993.0
But when I work with very large numbers the following happens:
x = 6.797731511223558e+44
decimal_part = 0.0, integer_part = 6.797731511223558e+44
In this case the decimal part is not kept and 0.0 appears instead, and the same happens for numbers of up to 300 digits. But when the number x has at least 360 digits, the following error appears:
OverflowError: int too large to convert to float.
I would like to save the decimal part of large numbers (at least 300 digits) without overflowing the variable where the decimal part is stored, and I would like to avoid the error OverflowError: int too large to convert to float for numbers with more than 360 digits.
How can I solve it?
A Python int is arbitrary-precision, but a float is a fixed-size 64-bit value whose maximum is about 1.8e308, so a large enough int simply cannot be converted to float. But let's break this down:
The number 6.797731511223558e+44 is already an exact integer: at that magnitude a float has no bits left for a fractional part, so modf will always return 0.0 as the decimal part.
If you are providing an int with 300+ digits, it is still an integer, so the decimal part is still 0.0, and there's no need to use the function at all. You get the error because the very large int has to be converted to float to compute the result, and that conversion overflows; but the conversion is unnecessary, since you already know the result.
On the other hand, if you pass the function a float, no int-to-float conversion is needed, so the error does not appear.
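To see why a float that large is necessarily an integer, look at the spacing between adjacent representable floats at that magnitude (math.ulp is available from Python 3.9):

import math

x = 6.797731511223558e+44
print(math.ulp(x))    # ~7.92e+28: the gap between x and the next representable float
print(x == int(x))    # True: with gaps that large, no fractional part can be stored
print(math.modf(x))   # (0.0, 6.797731511223558e+44)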
The number 6.797731511223558e+44 should have a decimal part, because it is the result of dividing one number by another. But Python doesn't keep the decimal result, and 0.0 appears; when we put small numbers into the function, it does keep the decimal part.
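If the value is the result of dividing one large integer by another, a way to keep the decimal part is to do the division exactly, without ever converting the large operands to float, for example with fractions.Fraction. A minimal sketch (the operands are made-up stand-ins for the real ones):

from fractions import Fraction

numerator = 10**360 + 7                       # hypothetical 361-digit integer
denominator = 3
q = Fraction(numerator, denominator)          # exact rational value, no float anywhere
integer_part = q.numerator // q.denominator   # exact 360-digit int
decimal_part = q - integer_part               # exact Fraction, here 1/3
print(float(decimal_part))                    # 0.3333333333333333; only this last step rounds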
Related
When I convert a float to decimal.Decimal in Python and afterwards call Decimal.shift it may give me completely wrong and unexpected results depending on the float. Why is this the case?
Converting 123.5 and shifting it:
from decimal import Decimal
a = Decimal(123.5)
print(a.shift(1)) # Gives expected result
The code above prints the expected result of 1235.0.
If I instead convert and shift 123.4:
from decimal import Decimal
a = Decimal(123.4)
print(a.shift(1)) # Gives UNexpected result
it gives me 3.418860808014869689941406250E-18 (approx. 0) which is completely unexpected and wrong.
Why is this the case?
Note:
I understand that there is floating-point imprecision due to how floats are represented in memory. However, I can't explain why that should give such a completely wrong result.
Edit:
Yes, in general it would be best not to convert floats to decimals but to convert strings to decimals instead. However, that is not the point of my question: I want to understand why shifting after float conversion gives such a completely wrong result. print(Decimal(123.4)) gives 123.400000000000005684341886080801486968994140625, so after shifting I would expect 1234.00000000000005684341886080801486968994140625, not nearly zero.
You need to change the Decimal constructor input to use strings instead of floats.
a = Decimal('123.5')
print(a.shift(1))
a = Decimal('123.4')
print(a.shift(1))
or
a = Decimal(str(123.5))
print(a.shift(1))
a = Decimal(str(123.4))
print(a.shift(1))
The output will be as expected.
>>> 1235.0
>>> 1234.0
Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float.
For floats, Decimal calls Decimal.from_float()
Note that Decimal.from_float(0.1) is not the same as Decimal('0.1'). Since 0.1 is not exactly representable in binary floating point, the value is stored as the nearest representable value which is 0x1.999999999999ap-4. The exact equivalent of the value in decimal is 0.1000000000000000055511151231257827021181583404541015625.
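You can see that stored value directly:

>>> from decimal import Decimal
>>> Decimal(0.1)
Decimal('0.1000000000000000055511151231257827021181583404541015625')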
Internally, the Python decimal library converts a float into two integers representing the numerator and denominator of a fraction that yields the float.
n, d = abs(123.4).as_integer_ratio()
It then computes k from the bit length of the denominator; since d is a power of two, k is the exponent for which d == 2**k.
k = d.bit_length() - 1
From there, k is used to compute the coefficient of the decimal number: since d == 2**k, the value n/d equals n*5**k / 10**k, so the coefficient digits are n*5**k and the decimal exponent is -k.
coeff = str(n*5**k)
The resulting sign, coefficient, and exponent are then used to construct a new Decimal object.
For the float 123.5 these values are
0 1235 -1
and for the float 123.4 these values are
0 123400000000000005684341886080801486968994140625 -45
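You can confirm these triples with Decimal.as_tuple():

>>> from decimal import Decimal
>>> Decimal(123.5).as_tuple()
DecimalTuple(sign=0, digits=(1, 2, 3, 5), exponent=-1)
>>> len(Decimal(123.4).as_tuple().digits), Decimal(123.4).as_tuple().exponent
(48, -45)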
So far, nothing is amiss.
However, when you call shift, the Decimal library has to calculate how much to pad the number with zeroes, based on the shift you've specified. To do this, it internally subtracts the length of the coefficient from the context precision.
amount_to_pad = context.prec - len(coeff)
The default precision is only 28, and with a float like 123.4 the coefficient is far longer than that, as noted above. The padding amount therefore comes out negative, which makes the result very tiny, as you noted.
A way around this is to increase the precision to at least the length of the coefficient plus the shift amount, which here is the 45 fractional digits plus the 4 digits in front of them after the shift (45 + 4 = 49).
from decimal import Decimal, getcontext
getcontext().prec = 49
a = Decimal(123.4)
print(a)
print(a.shift(1))
>>> 123.400000000000005684341886080801486968994140625
>>> 1234.000000000000056843418860808014869689941406250
The documentation for shift hints that the precision is important for this calculation:
The second operand must be an integer in the range -precision through precision.
However, it does not spell out this caveat for floats whose exact coefficients are longer than the context precision.
I would expect this to raise some kind of error and prompt you to change your input or increase the precision, but at least you know!
@MarkDickinson noted in a comment that you can look at this Python bug tracker issue for more information: https://bugs.python.org/issue7233
import pandas as pd

pd.set_option('display.max_colwidth', None)
pd.set_option('display.float_format', lambda x: '%.200f' % x)

exData = pd.read_csv('AP11.csv', delimiter=';', float_precision=None)
x = exData.loc[:, ['A', 'B']]
y = exData.loc[:, ['C']]
x
My original float in Excel is 0.1211101931541032183754113717355410323332315436353654273243543132542237415430173719, but what is being displayed is
0.12111019315410319341363987177828676067292690277099609375000000000000000000000000000000000000000000000000000000000...
This is not a display issue: something in pandas rounds my float. I don't want any number rounded, because it will affect the result of my string; the value is originally a string that was converted to a float. I tried to use int64, but it can't handle big numbers, so instead I decided to use floats of the form "0.mystring" to avoid getting "inf" displayed in pandas, and the value gets rounded anyway. Is machine learning limited by these messy variables, or is there another way to deal with big numbers without rounding them or displaying inf?
Use decimal instead of float. Just put
from decimal import Decimal
at the top of your code, and construct the values from strings rather than from float literals (a float literal is rounded to 64-bit float precision before Decimal ever sees it, which is exactly the rounding you are trying to avoid):
x = Decimal('0.121110193154103218375411371735541032333231543635365427324354313254223741543017371')
decimal is a library for arbitrary-precision decimal arithmetic, so the digits you put in are the digits you get back.
In general, avoid float for values like this: binary floats cannot represent most decimal fractions exactly, so as soon as a value is stored or operated on it picks up a long tail of unexpected digits instead of the few decimal places you wrote.
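Two caveats, offered as general guidance rather than a tested recipe: constructing a Decimal from a string is exact no matter what, but arithmetic on Decimals rounds to the context precision (28 significant digits by default), so raise it before computing with long values. And to get such values into pandas without ever passing through float, read_csv's converters argument can parse the raw CSV text straight into Decimal; the file and column names below are the ones from the question.

from decimal import Decimal, getcontext
import pandas as pd

getcontext().prec = 100   # arithmetic now keeps up to 100 significant digits

# Parse columns A, B, and C directly from text to Decimal; float is never involved.
exData = pd.read_csv('AP11.csv', delimiter=';',
                     converters={'A': Decimal, 'B': Decimal, 'C': Decimal})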
I'm trying to convert strings of numbers that come from the output of another program into floating point numbers with two forced decimal places (including trailing zeros).
Right now I'm converting the strings to floats, then formatting them to two decimal places, then converting back to floats to do numeric comparisons on later.
# convert to float
float1 = float(output_string[6])
# this doesn't guarantee two decimal places in my output
# eg: -36.55, -36.55, -40.34, -36.55, -35.7 (no trailing zero on the last number)
nice_float = float('{0:.2f}'.format(float1))
# this works but then I later need to convert back into a float
# string->float->string->float is not super clean
nice_string = '{0:.2f}'.format(float1)
Edit for clarity:
I have a problem with the display, in that I need it to show exactly two decimal places.
Is there a way to convert a string to a floating point number rounded to two decimal places that's cleaner than my implementation which involves converting a string to a float, then the float back into a formatted string?
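For what it's worth (a suggestion, not part of the original thread): a float carries no display precision at all, so the two-decimal form can only ever exist as a string. A common pattern is to keep a round()ed float for the numeric comparisons and format it only at the point of display:

value = float("-35.7")
rounded = round(value, 2)                # float, used for numeric comparisons
display = '{0:.2f}'.format(rounded)      # '-35.70': the trailing zero exists only in the string
print(rounded, display)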
I asked this because it is possible in R. Note that both 1.5 and 1 are of numeric type (double-precision), and only 1L is an integer. When coercing a string into numeric type, it doesn't show a decimal point if there isn't one in the string.
class(1.5)
# "numeric"
class(1)
# "numeric"
class(1L)
# "integer"
x <- as.numeric("3")
x
# 3
class(x)
# "numeric"
Am I allowed to have similar operations in Python? Let's say I have a function called key_in_a_number:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return float(num)
    except ValueError:
        return "Please key in only numbers."
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits. Thus, 40 should be returned if "40" is keyed in, while 40.0 should be returned only when "40.0" is keyed in.
My workaround is:
def key_in_a_number():
    num = input("Key in a number here: ")
    try:
        return int(num)
    except ValueError:
        try:
            return float(num)
        except ValueError:
            return "Please key in only numbers."
However, in this way, I cannot be sure that the results are always of the same type, which could be problematic in subsequent data storage or processing. Is there any way to have a number in float type without a decimal point?
I think your core problem here is that you're misunderstanding what float is.
A float represents a C double, which almost always means an IEEE 754-1985 double (or an IEEE 754-2008 binary64, which is basically the same thing but slightly better defined). It always has 53 binary digits of precision. It doesn't matter whether you specify it as 40., 40.00000, float(40), float('40'), or float('40.00'); those are all identical in every way.
So, the main problem you're asking about doesn't make any sense:
Now if one keys in "40", it will return 40.0, but 40.0 and 40 are different in certain digits.
No, they aren't. float("40") and float("40.0") are both the exact same value, with no differences in any digits, and no difference in their precision, or anything else.
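You can verify that in the interpreter:

>>> float("40") == float("40.0")
True
>>> float("40").hex() == float("40.0").hex()
True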
There's a different type in Python, in the decimal library, that represents an IEEE 754-2008 arbitrary-sized decimal. It has as many decimal digits of precision as you tell it to have.
So, Decimal('40') and Decimal('40.') have two digits; Decimal('40.000') has five digits—they may be equal, but they're not identical, because the last one is more precise.
Decimal, on the other hand, prints out however many digits of precision it actually has:
>>> print(Decimal('40'))
40
>>> print(Decimal('40.'))
40
>>> print(Decimal('40.0'))
40.0
While we're at it, if you do want float and int values, here's how to translate each line of R into Python:
class(1.5) # numeric
type(1.5)  # float

class(1) # numeric
type(1)  # int
type(1.) # float

class(1L) # integer
type(1)   # int

x <- as.numeric("3") # numeric
x = float(3)   # float
x = float("3") # float
Notice that, just like as.numeric("3") gives you a numeric rather than an integer, float("3") gives you a float rather than an int. I'm not sure why that Python behavior puzzles you, given that it's identical to the equivalent R behavior.
Yes,
10 would be an integer in Python, whereas 10., which represents the same number, would be a float.
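For example:

>>> type(10)
<class 'int'>
>>> type(10.)
<class 'float'>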
I am trying to write a function to round a floating point number up to n decimal places. The function can take one or two arguments. If there is only one argument the number should be rounded to two decimal places.
This is where I have gotten so far:
def roundno(num, point=2):
    import math
    x = 1 * (math.pow(10, -point))
    round = 0
    while num > x:
        while num > 0:
            round += num / 10
            num = num / 10
        round *= 10
        round += num / 10
        num = num / 10
    round *= 0.1
    return round
I am getting infinity as the output, every time... Where did I go wrong?
I can't see how your algorithm is supposed to round numbers. I guess a similar strategy could work, but you'd need a subtraction in there somewhere...
One way to do this would be to convert the argument to a string, adjust the number of digits after the decimal point, and then convert the string back to a float, but I suspect that your teacher would not like that solution. :)
Here's a simple way to do rounding arithmetically:
def roundno(num, point=2):
    scale = 10.0 ** point
    return int(num * scale) / scale

data = [123, 12.34, 1.234, 9.8765, 98.76543]
for n in data:
    print(n, roundno(n), roundno(n, 3))
output
123 123.0 123.0
12.34 12.34 12.34
1.234 1.23 1.234
9.8765 9.87 9.876
98.76543 98.76 98.765
This simply drops unwanted digits, but it's not hard to modify it to round up or off (your question isn't clear on exactly what type of rounding you want).
Note that this function doesn't check the point argument. It really should check that it's a non-negative integer and raise ValueError with an appropriate error message otherwise.
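A sketch that combines both of those suggestions, i.e. the argument check plus a round-to-nearest (half away from zero) variant; this is illustrative, not part of the original answer:

import math

def roundno(num, point=2):
    # Validate the argument as suggested above.
    if not isinstance(point, int) or point < 0:
        raise ValueError('point must be a non-negative integer')
    scale = 10.0 ** point
    # Round half away from zero instead of truncating toward zero.
    return math.copysign(math.floor(abs(num) * scale + 0.5), num) / scale

print(roundno(9.8765))    # 9.88 (the truncating version above gives 9.87)
print(roundno(-36.556))   # -36.56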