Float divisions returning weird results - Python

I'm working on a project, and for some reason the same division gives me different results. I want to check whether two divisions are equal, but when I try 5.99/1 and 0.599/0.1 the script says they are different, even though they should return the same result. I figured out that the problem is that 5.99/1 = 5.99 while 0.599/0.1 = 5.989999999999999, but I can't find a fix for this.

You can find the reason in this answer: https://stackoverflow.com/a/588014/11502612
I have written a possible solution for you:
Code:
a = 5.99 / 1
b = 0.599 / 0.1
a_str = "{:.4f}".format(a)
b_str = "{:.4f}".format(b)
print(a, b)
print(a_str, b_str)
print(a == b)
print(a_str == b_str)
Output:
$ python3 test.py
5.99 5.989999999999999
5.9900 5.9900
False
True
As you can see, I convert the result of each division to a formatted string and compare those instead of the raw floats.
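If you only need to test equality rather than display the values, a tolerance-based comparison is cleaner. A minimal sketch using math.isclose from the standard library (Python 3.5+):
import math

a = 5.99 / 1
b = 0.599 / 0.1
print(math.isclose(a, b))  # True, within the default relative tolerance of 1e-09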

Related

How does Ruby's to_s(2) translate into Python?

So I've been trying to rewrite a Ruby snippet into Python, but I haven't been able to make it work. I reread everything to make sure I did it right, and it still doesn't work. I guess the problem lies in this "translation":
def multiply(k, point = $G)
  current = point
  binary = k.to_s(2)
  binary.split("").drop(1).each do |char|
    current = double(current)
    current = add(current, point) if char == "1"
  end
  current
end
This is my translated python version:
def multiply(k, point = G):
    current = point
    binary = bin(k)
    for i in binary[3:]:
        current = double(current)
        if i == "1":
            current = add(current, point)
    return current
I believe I didn't quite understand Ruby's concepts of to_s(2) and/or .drop(1).
Could someone tell me what is the best way of translating this Ruby code into Python?
EDIT
So, I'll elaborate just as @Michael Butscher suggested:
I have this Ruby code, which I tried to translate into this Python code. And while the output should be
044aeaf55040fa16de37303d13ca1dde85f4ca9baa36e2963a27a1c0c1165fe2b11511a626b232de4ed05b204bd9eccaf1b79f5752e14dd1e847aa2f4db6a5
it throws an error. Why?
The problem is not in the function you have shown, but in your inverse function. / between integers in Ruby translates to // in Python 3:
Ruby:
3 / 2
# => 1
3.0 / 2
# => 1.5
Python 3:
3 / 2
# => 1.5
3 // 2
# => 1
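To illustrate where this typically bites in elliptic-curve code, here is a hypothetical sketch of a modular inverse via the extended Euclidean algorithm (your actual inverse function isn't shown): the quotient must use //, because a plain / would turn every value into a float and break the point arithmetic.
def inverse(a, m):
    # Extended Euclidean algorithm: returns x such that (a * x) % m == 1
    old_r, r = a % m, m
    old_s, s = 1, 0
    while r != 0:
        q = old_r // r  # must be //, the Python 3 equivalent of Ruby's integer /
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    return old_s % m

print(inverse(3, 7))  # 5, since 3 * 5 = 15 ≡ 1 (mod 7)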

Python: different rounding from a variable vs. a hardcoded value

I have a simple math formula that produces a decimal number (0.97745) that I want to round to 4 decimal places.
When I round my computed variable I get 0.9774, but when I hardcode the same number into round(), I get 0.9775.
Here is the code:
zero = 0.9700
effective_beta = 0.00745
loan = {}
loan['beta2'] = 0.0
loan['beta3'] = 0.0
mrktdiff_2 = 0.08880400
mrktdiff_3 = 0.026463592000
forecasted_pt = (float(zero) + float(effective_beta) + float(loan['beta2'] or 0.) * float(mrktdiff_2) +
float(loan['beta3'] or 0.) * float(mrktdiff_3))
print("before rounding forecastedpt is ")
print(forecasted_pt)
print("after rounding")
print(round(forecasted_pt,4))
print("Dont get this part")
print(round(0.97745,4))
The reason I wrap everything in float() is that these variables are dynamic and can sometimes contain string or null values.
Also, when I run the same code in PHP, I get 0.9775.
Edit:
I ran the code in katacoda.com editor, and got the following:
before rounding forecastedpt is
0.97745
after rounding
0.9774
Dont get this part
0.9775
But running it on repl.com I get the first value as 0.97744999999999, so I guess the difference lies in the precision of the expression itself.
Try it:
zero = 0.9700
effective_beta = 0.00745
loan = {}
loan['beta2'] = 0.0
loan['beta3'] = 0.0
mrktdiff_2 = 0.08880400
mrktdiff_3 = 0.026463592000
forecasted_pt = (float(zero) + float(effective_beta) + float(loan['beta2'] or 0.) * float(mrktdiff_2) + float(loan['beta3'] or 0.) * float(mrktdiff_3))
print("before rounding forecastedpt is ")
print(forecasted_pt)
print("after rounding")
print(round(forecasted_pt,5))
numb = round(forecasted_pt,5)
print(round(numb,4))
print("Dont get this part")
print(round(0.97745,4))
The output:
before rounding forecastedpt is
0.9774499999999999
after rounding
0.97745
0.9775
Dont get this part
0.9775
The round() function (in any language) does not re-round from the full decimal expansion; it only sees the value it is given. So rounding to 5 places first and then to 4 gives a different result than rounding the original value straight to 4 places.
From the python documentation:
Note: The behavior of round() for floats can be surprising: for example, round(2.675, 2) gives 2.67 instead of the expected 2.68. This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information.
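You can see the value round() actually receives by converting the float to decimal.Decimal, which displays the exact stored binary value:
from decimal import Decimal

print(Decimal(2.675))    # 2.67499999999999982236431605997495353221893310546875
print(Decimal(0.97745))  # a value slightly above 0.97745, which is why it rounds up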
Try this:
I remember my statistics class from 10 years ago; my professor always advised us to round calculations to 6 decimal places, since statistics is all about estimation and it counts a lot.
zero = 0.9700
effective_beta = 0.00745
loan = {}
loan['beta2'] = 0.0
loan['beta3'] = 0.0
mrktdiff_2 = 0.08880400
mrktdiff_3 = 0.026463592000
forecasted_pt = round((float(zero) + float(effective_beta) + float(loan['beta2'] or 0.) * float(mrktdiff_2) + float(loan['beta3'] or 0.) * float(mrktdiff_3)),6)
print(round(forecasted_pt,4))
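If you need decimal rather than binary arithmetic, the standard library's decimal module sidesteps the problem entirely. A minimal sketch with the values above (the beta terms are zero, so only two terms matter):
from decimal import Decimal, ROUND_HALF_UP

forecasted_pt = Decimal("0.9700") + Decimal("0.00745")
print(forecasted_pt)  # 0.97745, exactly
print(forecasted_pt.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP))  # 0.9775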

KeyError with a Poisson process using pandas

I am trying to create a function which will simulate a compound Poisson process for a changeable dt and total time T, and have the following:
def compound_poisson(lamda, mu, sigma, dt, T):
    points = pd.Series(0)
    out = pd.Series(0)
    inds = simple_poisson(lamda, dt, T)
    for ind in inds.index:
        if inds[ind + dt] > inds[ind]:
            points[ind + dt] = np.random.normal(mu, sigma)
        else:
            points[ind + dt] = 0
    out = out.append(np.cumsum(points), ignore_index=True)
    out.index = np.linspace(0, T, int(T / dt + 1))
    return out
However, I receive a "KeyError: 0.010000000000000002", which should not be in the index at all. Is this a result of being lax with float objects?
In short, yes, it's a floating point error. It's quite hard to know how you got there, but probably something like this:
>>> 0.1 * 0.1
0.010000000000000002
Maybe use round?
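One way to sidestep it is to never build the index by repeatedly adding dt; generate the time grid once and reuse those exact values as keys. A minimal sketch, assuming your index is a regular grid from 0 to T:
import numpy as np

dt, T = 0.01, 1.0
# linspace computes each point directly instead of summing dt in a loop,
# and rounding makes the keys predictable: 0.01, 0.02, ... instead of
# 0.010000000000000002
grid = np.round(np.linspace(0.0, T, int(round(T / dt)) + 1), 10)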

format() function is always returning "0.00"

I have some calculation that I am running that returns a float:
a = float(4)
b = float(56100)
c = a / b
Now when I run the script, I get this:
7.1301e-05
I just need to format this response so that I get 7.13. But when I try to do this I get 0.00:
percentage_connections_used = float(a) / float(b)
percentage_float = float(percentage_connections_used)
print(format(percentage_float, '.2f'))
I can't seem to figure out why it would return 0 when trying to format it. Can someone possibly tell me what is going on? This is Python 2.7.
I think your format is correct, but when you round to 2 decimal places it really does round to 0.00:
7.8125e-05 = 0.000078125
When rendered with 2 decimals, you get 0.00.
You could do a little string manipulation to parse out the 7.8125 figure by using:
d = float(str(c).split('e')[0])
It's a little verbose, though, and maybe someone in the community can do better.
By the way, I get 7.1301...e-05 when I run a/b.
7.8125e-05 is the same as 0.000078125, so formatting it with only two decimal places gives you 0.00. You could use '.7f', which would get you 0.0000713. If you want the output in scientific notation, you should ask for that explicitly. Try this:
a = float(4)
b = float(56100)
c = a / b
print("{:.2e}".format(c))
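With the values above, this prints 7.13e-05.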

Understanding pandas.read_csv() float parsing

I am having problems reading probabilities from a CSV using pandas.read_csv; some of the values are read as floats strictly greater than 1.0.
Specifically, I am confused about the following behavior:
>>> pandas.read_csv(io.StringIO("column\n0.99999999999999998"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n0.99999999999999999"))["column"][0]
1.0000000000000002
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000000"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000001"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000008"))["column"][0]
1.0
>>> pandas.read_csv(io.StringIO("column\n1.00000000000000009"))["column"][0]
1.0000000000000002
The default float-parsing behavior seems to be non-monotonic; in particular, some values starting with 0.9... are converted to floats strictly greater than 1.0, causing problems e.g. when feeding them into sklearn.metrics.
The documentation states that read_csv has a parameter float_precision that can be used to select “which converter the C engine should use for floating-point values”, and setting this to 'high' indeed solves my problem.
However, I would like to understand the default behavior:
Where can I find the source code of the default float converter?
Where can I find documentation on the intended behavior of the default float converter and the other possible choices?
Why does a single-digit change in the least significant position skip a value?
Why does this behave non-monotonically at all?
Edit regarding “duplicate question”: This is not a duplicate. I am aware of the limitations of floating-point math. I was specifically asking about the default parsing mechanism in Pandas, since the builtin float does not show this behavior:
>>> float("0.99999999999999999")
1.0
...and I could not find documentation.
@MaxU already showed the source code for the parser and the relevant tokenizer xstrtod, so I'll focus on the "why" part:
The code for xstrtod is roughly like this (translated to pure Python):
def xstrtod(p):
    number = 0.
    idx = 0
    ndecimals = 0
    # accumulate the integer part, digit by digit, in a float
    while p[idx].isdigit():
        number = number * 10. + int(p[idx])
        idx += 1
    idx += 1  # skip the decimal point
    # accumulate the fractional part into the same float
    while idx < len(p) and p[idx].isdigit():
        number = number * 10. + int(p[idx])
        idx += 1
        ndecimals += 1
    return number / 10**ndecimals
Which reproduces the "problem" you saw:
print(xstrtod('0.99999999999999997')) # 1.0
print(xstrtod('0.99999999999999998')) # 1.0
print(xstrtod('0.99999999999999999')) # 1.0000000000000002
print(xstrtod('1.00000000000000000')) # 1.0
print(xstrtod('1.00000000000000001')) # 1.0
print(xstrtod('1.00000000000000002')) # 1.0
print(xstrtod('1.00000000000000003')) # 1.0
print(xstrtod('1.00000000000000004')) # 1.0
print(xstrtod('1.00000000000000005')) # 1.0
print(xstrtod('1.00000000000000006')) # 1.0
print(xstrtod('1.00000000000000007')) # 1.0
print(xstrtod('1.00000000000000008')) # 1.0
print(xstrtod('1.00000000000000009')) # 1.0000000000000002
print(xstrtod('1.00000000000000019')) # 1.0000000000000002
The problem is the 9 in the last place: xstrtod accumulates all the digits into one large float before dividing, so the final digit changes which float that huge integer rounds to. It comes down to floating-point accuracy:
>>> float('100000000000000008')
1e+17
>>> float('100000000000000009')
1.0000000000000002e+17
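Dividing those two parsed integers by 10**17 reproduces exactly the two results seen above:
>>> float('100000000000000008') / 10**17
1.0
>>> float('100000000000000009') / 10**17
1.0000000000000002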
So it really is the 9 in the last place of the intermediate integer that is responsible for the skewed results.
If you want higher precision you can define your own converters or use Python-provided ones, e.g. decimal.Decimal if you want arbitrary precision:
>>> import pandas as pd
>>> import decimal
>>> import io
>>> converter = {0: decimal.Decimal}  # parse column 0 as decimals
>>> def parse(string):
...     return '{:.30f}'.format(pd.read_csv(io.StringIO(string), converters=converter)["column"][0])
>>> print(parse("column\n0.99999999999999998"))
>>> print(parse("column\n0.99999999999999999"))
>>> print(parse("column\n1.00000000000000000"))
>>> print(parse("column\n1.00000000000000001"))
>>> print(parse("column\n1.00000000000000008"))
>>> print(parse("column\n1.00000000000000009"))
which prints:
0.999999999999999980000000000000
0.999999999999999990000000000000
1.000000000000000000000000000000
1.000000000000000010000000000000
1.000000000000000080000000000000
1.000000000000000090000000000000
Exactly representing the input!
If you want to understand how the converter is chosen, look at the source code: file "_libs/parsers.pyx", lines 492-499 for Pandas 0.20.1:
self.parser.double_converter_nogil = xstrtod  # <------- default converter
self.parser.double_converter_withgil = NULL
if float_precision == 'high':
    self.parser.double_converter_nogil = precise_xstrtod  # <------- 'high' converter
    self.parser.double_converter_withgil = NULL
elif float_precision == 'round_trip':  # avoid gh-15140
    self.parser.double_converter_nogil = NULL
    self.parser.double_converter_withgil = round_trip
Source code for xstrtod
Source code for precise_xstrtod
