How to reliably separate decimal and floating parts from a number? - python

This is not a duplicate of this, I'll explain here.
Consider x = 1.2. I'd like to separate it out into 1 and 0.2. I've tried all these methods as outlined in the linked question:
In [370]: x = 1.2
In [371]: divmod(x, 1)
Out[371]: (1.0, 0.19999999999999996)
In [372]: math.modf(x)
Out[372]: (0.19999999999999996, 1.0)
In [373]: x - int(x)
Out[373]: 0.19999999999999996
In [374]: x - int(str(x).split('.')[0])
Out[374]: 0.19999999999999996
Nothing I try gives me exactly 1 and 0.2.
Is there any way to reliably convert a floating number to its decimal and floating point equivalents that is not hindered by the limitation of floating point representation?
I understand this might be due to the limitation of how the number is itself stored, so I'm open to any suggestion (like a package or otherwise) that overcomes this.
Edit: Would prefer a way that didn't involve string manipulation, if possible.

Solution
It may seem like a hack, but you could separate the string form (actually repr) and convert it back to ints and floats:
In [1]: x = 1.2
In [2]: s = repr(x)
In [3]: p, q = s.split('.')
In [4]: int(p)
Out[4]: 1
In [5]: float('.' + q)
Out[5]: 0.2
How it works
The reason for approaching it this way is that the internal algorithm for displaying 1.2 is very sophisticated (a fast variant of David Gay's algorithm). It works hard to show the shortest of the possible representations of numbers that cannot be represented exactly. By splitting the repr form, you're taking advantage of that algorithm.
Internally, the value entered as 1.2 is stored as the binary fraction, 5404319552844595 / 4503599627370496 which is actually equal to 1.1999999999999999555910790149937383830547332763671875. The Gay algorithm is used to display this as the string 1.2. The split then reliably extracts the integer portion.
In [6]: from decimal import Decimal
In [7]: Decimal(1.2)
Out[7]: Decimal('1.1999999999999999555910790149937383830547332763671875')
In [8]: (1.2).as_integer_ratio()
Out[8]: (5404319552844595, 4503599627370496)
Rationale and problem analysis
As stated, your problem roughly translates to "I want to split the integral and fractional parts of the number as it appears visually rather than according to how it is actually stored".
Framed that way, it is clear that the solution involves parsing how it is displayed visually. While it may feel like a hack, this is the most direct way to take advantage of the very sophisticated display algorithms and actually match what you see.
This may be the only reliable way to match what you see unless you manually reproduce the internal display algorithms.
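A minimal sketch of the same idea, letting Decimal parse the repr instead of splitting the string by hand. The name split_number is illustrative and not part of the answer above; the sketch assumes x is a finite float.

```python
from decimal import Decimal

def split_number(x):
    """Split a finite float into (integer part, fractional part) as displayed."""
    d = Decimal(repr(x))   # exact decimal value of the shortest repr, e.g. Decimal('1.2')
    whole = int(d)         # truncates toward zero
    frac = d - whole       # decimal subtraction, so no binary rounding error here
    return whole, float(frac)

print(split_number(1.2))    # (1, 0.2)
print(split_number(-3.75))  # (-3, -0.75)
```

The fractional part is converted back to a float only at the very end, so it is the float closest to the displayed digits rather than the leftover of a binary subtraction.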
Failure of alternatives
If you want to stay in the realm of integers, you could try rounding and subtraction, but that would give you an unexpected value for the floating point portion:
In [9]: round(x)
Out[9]: 1.0
In [10]: x - round(x)
Out[10]: 0.19999999999999996

Here is a solution without string manipulation (frac_digits is the count of decimal digits that you can guarantee the fractional part of your numbers will fit into):
>>> def integer_and_fraction(x, frac_digits=3):
...     i = int(x)
...     c = 10**frac_digits
...     f = round(x*c-i*c)/c
...     return (i, f)
...
>>> integer_and_fraction(1.2)
(1, 0.2)
>>> integer_and_fraction(1.2, 1)
(1, 0.2)
>>> integer_and_fraction(1.2, 2)
(1, 0.2)
>>> integer_and_fraction(1.2, 5)
(1, 0.2)
>>>

You could try converting 1.2 to string, splitting on the '.' and then converting the two strings ("1" and "2") back to the format you want.
Additionally padding the second portion with a '0.' will give you a nice format.

So I just did the following in a Python terminal and it seemed to work properly:
x = 1.2
s = str(x).split('.')
i = int(s[0])
d = float('0.' + s[1])  # fractional part; int(s[1])/10 is only right for a single fractional digit

Related

Working with big numbers in Python and writing them to file

I'm trying to find an efficient way to do the following in Python:
a = 12345678901234567890123456**12345678
f = open('file', 'w')
f.write(str(a))
f.close()
The calculation of the power takes about 40 minutes while one thread is utilized. Is there a quick and easy way to spread this operation over multiple threads?
As the number is quite huge, I think the string function isn't quite up to the task - it's been going for almost three hours now. I need the number to end up in a text file.
Any ideas on how to better accomplish this?
I would like to give a lavish ;-) answer, but don't have the time now. Elaborating on my comment, the decimal module is what you really want here. It's much faster at computing the power, and very very much faster to convert the result to a decimal string:
>>> import decimal
You need to change its internals so that it avoids floating point, giving it more than enough internal digits to store the final result. We want exact integer arithmetic here, not rounded floating-point. So we fiddle things so decimal uses as much precision as it's capable of using, and tell it to raise the "Inexact" exception if it ever loses information to rounding. Note that you need a 64-bit version of Python for decimal to be capable of using enough precision to hold the exact result in your example:
>>> import decimal
>>> c = decimal.getcontext()
>>> c.prec = decimal.MAX_PREC
>>> c.Emax = decimal.MAX_EMAX
>>> c.Emin = decimal.MIN_EMIN
>>> c.traps[decimal.Inexact] = 1
Now create a Decimal for the base:
>>> base = decimal.Decimal(12345678901234567890123456)
>>> base
Decimal('12345678901234567890123456')
And raise to the power - the exponent will automatically be converted to Decimal, because the base is already Decimal:
>>> x = base ** 12345678
That takes less than a minute on my box! The reasons for that are involved. It's not really because it's working in base 10, but because the person who wrote the decimal module implemented "advanced" algorithms for doing very large multiplications.
Now convert to a string. Because it's already stored in a variant of base 10, converting to a decimal string goes very fast (a few seconds on my box, just because the string has hundreds of millions of digits):
>>> y = str(x)
>>> len(y)
309771765
And, for sanity, let's just look at the last 10, and first 10, digits:
>>> y[-10:]
'6044706816'
>>> y[:10]
'2759594879'
As @StefanPochmann noted in a comment, the last 10 digits can be obtained very quickly with native ints by using modular (3-argument) pow():
>>> pow(int(base), 12345678, 10**10)
6044706816
Which matches the last 10 digits of the string above. For the first 10 digits, we can use decimal again but with much less precision, which will cause it (you'll just have to trust me on this) to use a different approach under the covers:
>>> c.prec = 12
>>> c.traps[decimal.Inexact] = 0 # don't trap on rounding!
>>> base ** 12345678
Decimal('2.75959487945E+309771764')
Rounding that back to 10 digits matches the earlier result, and the exponent is consistent with the length of y too.
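A smaller-scale sketch of the same approach that can be checked against native ints. Two things here are assumptions rather than part of the answer above: the helper name exact_power_string, and the use of a computed digit bound for the precision instead of decimal.MAX_PREC. It also uses a local context so the global one is left untouched.

```python
import decimal

def exact_power_string(base, exponent):
    """Return the decimal digits of base**exponent for positive ints, via decimal."""
    # A k-digit base is below 10**k, so base**exponent has at most k*exponent digits.
    digits_needed = max(1, len(str(base)) * exponent)
    with decimal.localcontext() as ctx:
        ctx.prec = digits_needed
        ctx.Emax = decimal.MAX_EMAX
        ctx.Emin = decimal.MIN_EMIN
        ctx.traps[decimal.Inexact] = True   # raise instead of silently rounding
        return str(decimal.Decimal(base) ** exponent)

digits = exact_power_string(12345678901234567890123456, 50)
print(len(digits))
print(digits == str(12345678901234567890123456 ** 50))  # True
```

The resulting string can then be written to a file in one call, as in the question.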

Round function does return different value depending on the type [duplicate]

Let's look at the ever-shocking round statement:
>>> round(2.675, 2)
2.67
I know why round "fails"; it's because of the binary representation of 2.675:
>>> import decimal
>>> decimal.Decimal(2.675)
Decimal('2.67499999999999982236431605997495353221893310546875')
What I do not understand is: why does NumPy not fail?
>>> import numpy
>>> numpy.round(2.675, 2)
2.6800000000000002
Thinking
Do not mind the extra zeros; it's an artefact from Python's printing internal rounding. If we look at the "exact" values, they're still different:
>>> decimal.Decimal(round(2.675, 2))
Decimal('2.6699999999999999289457264239899814128875732421875')
>>> decimal.Decimal(numpy.round(2.675, 2))
Decimal('2.680000000000000159872115546022541821002960205078125')
Why oh why does NumPy behave?
I thought at first that NumPy had to handle floats using extra bits, but:
>>> decimal.Decimal(numpy.float(2.675))
Decimal('2.67499999999999982236431605997495353221893310546875')
>>> decimal.Decimal(2.675)
Decimal('2.67499999999999982236431605997495353221893310546875')
# Twins!
What is happening behind the curtains? I looked a bit at NumPy's round implementation, but I'm a Python newbie and I don't see anything overly fishy.
One top-of-the-hood difference is documented:
In cases where you're halfway between numbers, np.round rounds to the nearest "even" number (after multiplying by 10**n where n is the second argument to the respective round function) whereas the builtin round in Python 2 rounds away from 0 (in Python 3, the builtin round also rounds halfway cases to even).
>>> np.round(2.685, 2)
2.6800000000000002
>>> round(2.685, 2)
2.69
Under the hood, you can get differences when using the scaling parameter. Consider the differences between round(2.675 * 10.**2) and round(2.675, 2). This is certainly a result of the floating point math which as always has some rounding error associated with it. To go any further we would really need to look at the real implementation.
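The scaling effect mentioned above can be seen with plain Python floats. This is only an illustration of how scaling changes the outcome; it is not NumPy's actual implementation, which is not shown here.

```python
x = 2.675

# Two-argument round works from the exact stored value, which is just below 2.675.
direct = round(x, 2)

# Scale, round to an integer, scale back: the multiplication itself rounds to 267.5.
scaled = x * 10.0**2
via_scaling = round(scaled) / 10.0**2

print(direct)       # 2.67
print(scaled)       # 267.5
print(via_scaling)  # 2.68
```

The product lands exactly on the halfway value 267.5 even though 2.675 itself is stored slightly low, so the information that would have pushed the result down is lost before the rounding step happens.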

How to prevent float imprecision from affecting numpy.arange?

Because numpy.arange() uses ceil((stop - start)/step) to determine the number of items, a small float imprecision (stop = .400000001) can add an unintended value to the list.
Example
The first case does not include the stop point (intended)
>>> print(np.arange(.1,.3,.1))
[0.1 0.2]
The second case includes the stop point (not intended)
>>> print(np.arange(.1,.4,.1))
[0.1 0.2 0.3 0.4]
numpy.linspace() fixes this problem, e.g. np.linspace(.1,.4-.1,3), but requires that you know the number of steps. np.linspace(start,stop-step,np.ceil((stop-step)/step)) leads to the same inconsistencies.
Question
How can I generate a reliable float range without knowing the # of elements in the range?
Extreme Case
Consider the case in which I want to generate a float index of unknown precision
np.arange(2.00(...)001,2.00(...)021,.00(...)001)
Your goal is to calculate what ceil((stop - start)/step) would be if the values had been calculated with exact mathematics.
This is impossible to do given only floating-point values of start, stop, and step that are the results of operations in which some rounding errors may have occurred. Rounding removes information, and there is simply no way to create information from lack of information.
Therefore, this problem is only solvable if you have additional information about start, stop, and step.
Suppose step is exact, but start and stop have some accumulated errors bounded by e0 and e1. That is, you know start is at most e0 away from its ideal mathematical value (in either direction), and stop is at most e1 away from its ideal value (in either direction). Then the ideal value of (stop-start)/step could lie anywhere from (stop-start-e0-e1)/step to (stop-start+e0+e1)/step.
Suppose there is an integer between (stop-start-e0-e1)/step and (stop-start+e0+e1)/step. Then it is impossible to know whether the ideal ceil result should be the lesser integer or the greater just from the floating-point values of start, stop, and step and the bounds e0 and e1.
However, from the examples you have given, the ideal (stop-start)/step could be exactly an integer, as in (.4-.1)/.1. If so, any non-zero error bounds could result in the error interval straddling an integer, making the problem impossible to solve from the information we have so far.
Therefore, in order to solve the problem, you must have more information than just simple bounds on the errors. You must know, for example, that (stop-start)/step is exactly an integer or is otherwise quantized. For example, if you knew that the ideal calculation of the number of steps would produce a multiple of .1, such as 3.8, 3.9, 4.0, 4.1, or 4.2, but never 4.05, and the errors were sufficiently small that the floating-point calculation (stop-start)/step had a final error less than .05, then it would be possible to round (stop-start)/step to the nearest qualifying multiple and then to apply ceil to that.
If you have such information, you can update the question with what you know about the errors in start, stop, and step (e.g., perhaps each of them is the result of a single conversion from decimal to floating-point) and the possible values of the ideal (stop-start)/step. If you do not have such information, there is no solution.
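A minimal sketch of the quantized case described above, assuming you know the ideal step count is a multiple of 1/denominator and that the floating-point error is smaller than half of that. The function name and the denominator parameter are hypothetical, chosen only for illustration.

```python
import math

def quantized_step_count(start, stop, step, denominator=10):
    """ceil((stop-start)/step), assuming the ideal ratio is a multiple of 1/denominator."""
    raw = (stop - start) / step
    units = round(raw * denominator)   # snap to the nearest multiple, kept as an integer
    return -(-units // denominator)    # exact integer ceiling division

print(math.ceil((0.4 - 0.1) / 0.1))          # 4, the naive result
print(quantized_step_count(0.1, 0.4, 0.1))   # 3
print(quantized_step_count(0.0, 0.38, 0.1))  # 4
```

Keeping the snapped value as an integer count of 1/denominator units avoids reintroducing a rounding error before the ceiling is taken.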
If you are guaranteed that (stop-start) is a multiple of step, then you can use the decimal module to compute the number of steps, i.e.
from decimal import Decimal
import numpy as np
def arange(start, stop, step):
    # Pass start, stop and step as strings so Decimal sees the exact decimal values.
    steps = (Decimal(stop) - Decimal(start))/Decimal(step)
    if steps % 1 != 0:
        raise ValueError("stop-start is not a multiple of step")
    return np.linspace(float(start),float(stop),int(steps),endpoint=False)
print(arange('0.1','0.4','0.1'))
If you have an exact representation of your ends and step and if they are rational you can use the fractions module:
>>> from fractions import Fraction
>>>
>>> a = Fraction('1.0000000100000000042')
>>> b = Fraction('1.0000002100000000002')
>>> c = Fraction('0.0000000099999999998') * 5 / 3
>>>
>>> float(a) + float(c) * np.arange(int((b-a)/c))
array([1.00000001, 1.00000003, 1.00000004, 1.00000006, 1.00000008,
1.00000009, 1.00000011, 1.00000013, 1.00000014, 1.00000016,
1.00000018, 1.00000019])
>>>
>>> eps = Fraction(1, 10**100)
>>> b2 = b - eps
>>> float(a) + float(c) * np.arange(int((b2-a)/c))
array([1.00000001, 1.00000003, 1.00000004, 1.00000006, 1.00000008,
1.00000009, 1.00000011, 1.00000013, 1.00000014, 1.00000016,
1.00000018])
If not, you'll have to settle for some form of cutoff:
>>> a = 1.0
>>> b = 1.003999999
>>> c = 0.001
>>>
# cut off at 4 decimals
>>> round(float((b-a)/c), 4)
4.0
# cut off at 6 decimals
>>> round(float((b-a)/c), 6)
3.999999
You can round numbers to arbitrary degrees of precision in Python using the format function.
For example, if you want the first three digits of e after the decimal place, you can run
float(format(np.e, '.3f'))
Use this to eliminate float imprecisions and you should be good to go.

How to get the correct decimal result

I'm trying to write a program to search for duplicate representations of integers in fractional number bases. Consequently, I have to do things like this:
1.1**7
which equals 1.9487171. However, python automatically represents that result as a float, whereas the given value is exact. This is what I need, which is not the same as rounding a float. I also must allow the program to specify how many decimal places there are. I've tried using the decimal module but can't quite get it to work. What would be the best way to do this?
decimal.Decimal arguments should be strings. If you use a float, it carries along its imprecision:
>>> decimal.Decimal('1.1')**7
Decimal('1.9487171')
>>>
VS
>>> decimal.Decimal(1.1)**7
Decimal('1.948717100000001101423574568')
>>>
The decimal module will give you exact results:
>>> Decimal('1.1') ** 7
Decimal('1.9487171')
For non-decimal bases, the fractions module will do the exact arithmetic. The only issue though is that the output is in fractional form rather than indicating the decimal notation (likely with repeating, non-terminating sequences) that you seem to be looking for:
>>> Fraction(3, 7) ** 5
Fraction(243, 16807)
>>> Context(prec=200).divide(243, 16807)
Decimal('0.014458261438686261676682334741476765633367049443684179211043017790206461593383709168798714821205450110073183792467424287499256262271672517403462842863092758969477003629440114238115071101326828107336229')
fractional number bases
Sounds like fractions, no?
>>> import fractions
>>> fractions.Fraction(11, 10) ** 7
Fraction(19487171, 10000000)
>>> fractions.Fraction(13, 11) ** 7
Fraction(62748517, 19487171)
Have you tried checking for equality to within a tolerance? E.g.
def approx(left, right, tolerance=1e-6):
    # Compare the absolute difference so the order of the arguments does not matter.
    return abs(left - right) < tolerance
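For comparison, the standard library offers math.isclose for tolerance checks, and fractions.Fraction for exact equality. A small sketch contrasting the two; the tolerance value is an arbitrary choice for illustration.

```python
import math
from fractions import Fraction

float_result = 1.1 ** 7                # binary floating point, possibly off in the last bits
exact_result = Fraction(11, 10) ** 7   # exactly 19487171/10000000

# Tolerance-based check on floats.
print(math.isclose(float_result, float(exact_result), rel_tol=1e-9))  # True

# Exact check on rationals, no tolerance needed.
print(exact_result == Fraction("1.9487171"))                          # True
```

A tolerance check can report two genuinely different values as equal, so when searching for exact duplicate representations the exact types are the safer choice.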

Round float to x decimals?

Is there a way to round a python float to x decimals? For example:
>>> x = roundfloat(66.66666666666, 4)
66.6667
>>> x = roundfloat(1.29578293, 6)
1.295783
I've found ways to trim/truncate them (66.666666666 --> 66.6666), but not round (66.666666666 --> 66.6667).
I feel compelled to provide a counterpoint to Ashwini Chaudhary's answer. Despite appearances, the two-argument form of the round function does not round a Python float to a given number of decimal places, and it's often not the solution you want, even when you think it is. Let me explain...
The ability to round a (Python) float to some number of decimal places is something that's frequently requested, but turns out to be rarely what's actually needed. The beguilingly simple answer round(x, number_of_places) is something of an attractive nuisance: it looks as though it does what you want, but thanks to the fact that Python floats are stored internally in binary, it's doing something rather subtler. Consider the following example:
>>> round(52.15, 1)
52.1
With a naive understanding of what round does, this looks wrong: surely it should be rounding up to 52.2 rather than down to 52.1? To understand why such behaviours can't be relied upon, you need to appreciate that while this looks like a simple decimal-to-decimal operation, it's far from simple.
So here's what's really happening in the example above. (deep breath) We're displaying a decimal representation of the nearest binary floating-point number to the nearest n-digits-after-the-point decimal number to a binary floating-point approximation of a numeric literal written in decimal. So to get from the original numeric literal to the displayed output, the underlying machinery has made four separate conversions between binary and decimal formats, two in each direction. Breaking it down (and with the usual disclaimers about assuming IEEE 754 binary64 format, round-ties-to-even rounding, and IEEE 754 rules):
First the numeric literal 52.15 gets parsed and converted to a Python float. The actual number stored is 7339460017730355 * 2**-47, or 52.14999999999999857891452847979962825775146484375.
Internally as the first step of the round operation, Python computes the closest 1-digit-after-the-point decimal string to the stored number. Since that stored number is a touch under the original value of 52.15, we end up rounding down and getting a string 52.1. This explains why we're getting 52.1 as the final output instead of 52.2.
Then in the second step of the round operation, Python turns that string back into a float, getting the closest binary floating-point number to 52.1, which is now 7332423143312589 * 2**-47, or 52.10000000000000142108547152020037174224853515625.
Finally, as part of Python's read-eval-print loop (REPL), the floating-point value is displayed (in decimal). That involves converting the binary value back to a decimal string, getting 52.1 as the final output.
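The steps above can be observed directly by asking Decimal for the exact value a float stores. A small sketch, assuming Python 3 (or 2.7) and IEEE 754 binary64 floats:

```python
from decimal import Decimal

literal = 52.15
print(Decimal(literal))   # exact stored value, slightly below 52.15

rounded = round(literal, 1)
print(Decimal(rounded))   # exact stored value, slightly above 52.1
print(repr(rounded))      # 52.1
```

Decimal(float) performs an exact conversion, so the first two lines show what is stored rather than what the REPL chooses to display.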
In Python 2.7 and later, we have the pleasant situation that the two conversions in step 3 and 4 cancel each other out. That's due to Python's choice of repr implementation, which produces the shortest decimal value guaranteed to round correctly to the actual float. One consequence of that choice is that if you start with any (not too large, not too small) decimal literal with 15 or fewer significant digits then the corresponding float will be displayed showing those exact same digits:
>>> x = 15.34509809234
>>> x
15.34509809234
Unfortunately, this furthers the illusion that Python is storing values in decimal. Not so in Python 2.6, though! Here's the original example executed in Python 2.6:
>>> round(52.15, 1)
52.200000000000003
Not only do we round in the opposite direction, getting 52.2 instead of 52.1, but the displayed value doesn't even print as 52.2! This behaviour has caused numerous reports to the Python bug tracker along the lines of "round is broken!". But it's not round that's broken, it's user expectations. (Okay, okay, round is a little bit broken in Python 2.6, in that it doesn't use correct rounding.)
Short version: if you're using two-argument round, and you're expecting predictable behaviour from a binary approximation to a decimal round of a binary approximation to a decimal halfway case, you're asking for trouble.
So enough with the "two-argument round is bad" argument. What should you be using instead? There are a few possibilities, depending on what you're trying to do.
If you're rounding for display purposes, then you don't want a float result at all; you want a string. In that case the answer is to use string formatting:
>>> format(66.66666666666, '.4f')
'66.6667'
>>> format(1.29578293, '.6f')
'1.295783'
Even then, one has to be aware of the internal binary representation in order not to be surprised by the behaviour of apparent decimal halfway cases.
>>> format(52.15, '.1f')
'52.1'
If you're operating in a context where it matters which direction decimal halfway cases are rounded (for example, in some financial contexts), you might want to represent your numbers using the Decimal type. Doing a decimal round on the Decimal type makes a lot more sense than on a binary type (equally, rounding to a fixed number of binary places makes perfect sense on a binary type). Moreover, the decimal module gives you better control of the rounding mode. In Python 3, round does the job directly. In Python 2, you need the quantize method.
>>> Decimal('66.66666666666').quantize(Decimal('1e-4'))
Decimal('66.6667')
>>> Decimal('1.29578293').quantize(Decimal('1e-6'))
Decimal('1.295783')
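To illustrate the control over the rounding mode mentioned above, here is a short sketch that passes the mode explicitly to quantize instead of relying on the context default. The sample values are chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

tenth = Decimal("0.1")

# Exact decimal halfway cases, so the rounding mode alone decides the outcome.
print(Decimal("52.15").quantize(tenth, rounding=ROUND_HALF_UP))    # 52.2
print(Decimal("52.25").quantize(tenth, rounding=ROUND_HALF_UP))    # 52.3
print(Decimal("52.25").quantize(tenth, rounding=ROUND_HALF_EVEN))  # 52.2
```

Because the inputs are created from strings, 52.15 and 52.25 really are halfway cases here, which is not true of the corresponding float literals.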
In rare cases, the two-argument version of round really is what you want: perhaps you're binning floats into bins of size 0.01, and you don't particularly care which way border cases go. However, these cases are rare, and it's difficult to justify the existence of the two-argument version of the round builtin based on those cases alone.
Use the built-in function round():
In [23]: round(66.66666666666,4)
Out[23]: 66.6667
In [24]: round(1.29578293,6)
Out[24]: 1.295783
help on round():
round(number[, ndigits]) -> floating point number
Round a number to a given precision in decimal digits (default 0
digits). This always returns a floating point number. Precision may
be negative.
Default rounding in python and numpy:
In: [round(i) for i in np.arange(10) + .5]
Out: [0, 2, 2, 4, 4, 6, 6, 8, 8, 10]
I used this to get integer rounding to be applied to a pandas series:
import decimal
and use this line to set the rounding to "half up" a.k.a rounding as taught in school:
decimal.getcontext().rounding = decimal.ROUND_HALF_UP
Finally I made this function to apply it to a pandas series object
def roundint(value):
    return value.apply(lambda x: int(decimal.Decimal(x).to_integral_value()))
So now you can do roundint(df.columnname)
And for numbers:
In: [int(decimal.Decimal(i).to_integral_value()) for i in np.arange(10) + .5]
Out: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Credit: kares
The Mark Dickinson answer, although complete, didn't work with the float(52.15) case. After some tests, there is the solution that I'm using:
import decimal
def value_to_decimal(value, decimal_places):
    decimal.getcontext().rounding = decimal.ROUND_HALF_UP # define rounding method
    return decimal.Decimal(str(float(value))).quantize(decimal.Decimal('1e-{}'.format(decimal_places)))
(The conversion of the 'value' to float and then string is very important, that way, 'value' can be of the type float, decimal, integer or string!)
Hope this helps anyone.
I coded a function (used in Django project for DecimalField) but it can be used in Python project :
This code :
Manage integers digits to avoid too high number
Manage decimals digits to avoid too low number
Manage signed and unsigned numbers
Code with tests :
def convert_decimal_to_right(value, max_digits, decimal_places, signed=True):
    integer_digits = max_digits - decimal_places
    max_value = float((10**integer_digits)-float(float(1)/float((10**decimal_places))))
    if signed:
        min_value = max_value*-1
    else:
        min_value = 0
    if value > max_value:
        value = max_value
    if value < min_value:
        value = min_value
    return round(value, decimal_places)
value = 12.12345
nb = convert_decimal_to_right(value, 4, 2)
# nb : 12.12
value = 12.126
nb = convert_decimal_to_right(value, 4, 2)
# nb : 12.13
value = 1234.123
nb = convert_decimal_to_right(value, 4, 2)
# nb : 99.99
value = -1234.123
nb = convert_decimal_to_right(value, 4, 2)
# nb : -99.99
value = -1234.123
nb = convert_decimal_to_right(value, 4, 2, signed = False)
# nb : 0
value = 12.123
nb = convert_decimal_to_right(value, 8, 4)
# nb : 12.123
def trim_to_a_point(num, dec_point):
    factor = 10**dec_point # number of points to trim
    num = num*factor # multiply
    num = int(num) # use the trimming of int
    num = num/factor # divide by the same factor of 10s you multiplied
    return num
#test
a = 14.1234567
trim_to_a_point(a, 5)
output
========
14.12345
multiply by 10^(number of decimal places you want)
truncate with the int() function
divide by the same number you multiplied by before
done!
Just posted this for educational reasons; I think it is correct though :)
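One caveat with the multiply-and-truncate approach is that the multiplication can itself round, so a value such as 0.29 may come out one unit too low. A hedged alternative sketch that truncates the displayed decimal digits instead; the name truncate is illustrative and the function assumes a finite float.

```python
from decimal import Decimal, ROUND_DOWN

def truncate(value, places):
    """Drop digits beyond `places` decimals, based on the float's displayed repr."""
    quantum = Decimal(10) ** -places   # e.g. Decimal('0.01') for places=2
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_DOWN))

print(truncate(14.1234567, 5))  # 14.12345
print(truncate(0.29, 2))        # 0.29
print(truncate(-1.239, 2))      # -1.23
```

ROUND_DOWN truncates toward zero, which matches what int() does in the original function.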
