numpy arange: how to make "precise" array of floats? - python

In short, the problem I encounter is this:
aa = np.arange(-1., 0.001, 0.01)
aa[-1]
Out[16]: 8.8817841970012523e-16
In reality, this causes a serious problem, since my simulations don't allow positive-valued inputs.
I can sort of get around it by doing:
aa = np.arange(-100, 1, 1)/100.
aa[-1]
Out[21]: 0.0
But this is a pain. Practically you can't do this every time.
This seems like such a basic problem. There's gotta be something I am missing here.
By the way, I am using Python 2.7.13.

This happens because Python (like most modern programming languages) uses floating point arithmetic, which cannot exactly represent some numbers (see Is floating point math broken?).
This means that, regardless of whether you're using Python 2, Python 3, R, C, Java, etc. you have to think about the effects of adding two floating point numbers together.
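For instance, the step value 0.01 from the question is itself already inexact as a binary float:
>>> print('{0:.20f}'.format(0.01))
0.01000000000000000021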
np.arange works by repeatedly adding the step value to the start value, and this leads to imprecision in the end:
>>> start = -1
>>> for i in range(1000):
...     start += 0.001
>>> start
8.81239525796218e-16
Similarly:
>>> x = np.arange(-1., 0.001, 0.01)
>>> x[-1]
8.8817841970012523e-16
The typical pattern used to circumvent this is to work with integers whenever possible if repeated operations are needed. So, for example, I would do something like this:
>>> x = 0.01 * np.arange(-100, 0.1)
>>> x[-1]
0.0
Alternatively, you could create a one-line convenience function that will do this for you:
>>> def safe_arange(start, stop, step):
...     return step * np.arange(start / step, stop / step)
>>> x = safe_arange(-1, 0.001, 0.01)
>>> x[-1]
0.0
But note that even this can't get around the limits of floating point precision; for example, the number -0.99 cannot be represented exactly in floating point:
>>> val = -0.99
>>> print('{0:.20f}'.format(val))
-0.98999999999999999112
So you must always keep that in mind when working with floating point numbers, in any language.

Using np.linspace solved it for me:
For example, np.linspace(0.5, 0.9, 5) produces [0.5 0.6 0.7 0.8 0.9].
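Applied to the array from the question (a sketch, assuming 101 evenly spaced points from -1.0 to 0.0 is what's wanted), np.linspace computes the endpoints directly rather than accumulating a step:
>>> import numpy as np
>>> aa = np.linspace(-1.0, 0.0, 101)
>>> aa[-1]
0.0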

We don't get to forget about the limitations of floating-point arithmetic. Repeatedly adding 0.01, or rather the double-precision float that is close to 0.01, will result in the kind of effects you observe.
To ensure that an array does not contain positive numbers, use numpy.clip:
aa = np.clip(np.arange(-1., 0.001, 0.01), None, 0)
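A quick check that the stray positive endpoint is gone:
>>> aa = np.clip(np.arange(-1., 0.001, 0.01), None, 0)
>>> aa[-1]
0.0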

I have the same problem as you do.
Here is a simple solution:
for b in np.arange(maxb, minb + stepb, stepb):
    for a in np.arange(mina, maxa + stepa, stepa):
        a = round(a, 2); b = round(b, 2)  # 2 is the number of decimal places to keep
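If you're already using NumPy, a vectorized variant of the same idea is a one-liner (a sketch; np.round rounds the whole array at once instead of each scalar inside the loops):
>>> aa = np.round(np.arange(-1., 0.001, 0.01), 2)  # keep 2 decimal places
>>> aa[-1]
0.0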

Related

Is it possible to generate a random number in python on a completely open interval or one that is closed on the high end?

I would like to generate a random number n such that n is in the range (a,b) or (a,b], where a < b. Is this possible in Python? It seems the only choices are a + random.random()*(b-a), which includes [a,b), or random.uniform(a,b), which includes [a,b], so neither meets my needs.
Computer generation of "random" numbers is tricky, and especially of "random" floats. You need to think long & hard about what you really want. In the end, you'll need to build something on top of integers, not directly out of floats.
Under the covers, in Python (and every other language using the Mersenne Twister's source code), generating a "random" IEEE-754 double (Python's basic random.random()) really works by generating a random 53-bit integer, then dividing by 2**53:
randrange(2**53) / 9007199254740992.0
That's why the output range is [0.0, 1.0), but not all representable floats in that range are equally likely. Only the ones that can be expressed in the form I/2**53 for an integer 0 <= I < 2**53. For example, the float 1.0 / 2**60 can never be returned.
There are no "real numbers" here, just representable binary-floating-point numbers, so to answer your question first requires that you specify the exact set of those from which you're trying to pick.
If the answer is that you don't want to get that picky, then the distinction between open and closed is also too picky to bother with. If you can specify the precise set, then the solution is to generate more-or-less obvious random integers that map to your output set.
For example, if you want to pick "random" floats from [3.0, 6.0] with just 2 bits after the radix point, there are 13 possible outputs. So the first step is
i = random.randrange(13)
Then map to the range of interest:
return 3.0 + i / 4.0
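Putting the two steps together (just a sketch; the helper name and defaults are mine, not from the question):
>>> import random
>>> def random_quarters(lo=3.0, hi=6.0):
...     # pick uniformly from lo, lo + 0.25, ..., hi (13 outputs for [3.0, 6.0])
...     n = int((hi - lo) * 4) + 1
...     return lo + random.randrange(n) / 4.0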
EDIT: USELESS BUT EDUCATIONAL ;-)
As noted in the comments, picking uniformly from all representable floats x with 0.0 < x < 1.0 can be done, but is very far from being uniformly distributed across that range. There are, for example, 2**52 representable floats in [0.5, 1.0), but also 2**52 representable floats in [0.25, 0.5), and ... in [2.0**-i, 2.0**(1-i)) for increasing i until the number of representable floats starts shrinking when we hit the subnormal range, eventually falling to none when we underflow to 0 completely.
As bit patterns they're very simple, though: the set of representable IEEE-754 doubles (Python floats on almost all platforms) in (0, 1) consists of, when viewing the bit patterns as integers, simply
range(1, 0x3ff0000000000000)
So a function to generate each of those with equal likelihood is straightforward to write using bit-fiddling tricks:
from struct import unpack
from random import randrange
def gen01():
    i = randrange(1, 0x3ff0000000000000)
    as_bytes = i.to_bytes(8, "big")
    return unpack(">d", as_bytes)[0]
Just run that a few times to see why it's useless - it's very heavily skewed toward the 0.0 end of the range:
>>> for i in range(10):
...     print(gen01())
9.796357610869274e-104
4.125848254595866e-197
1.8114434720880952e-253
1.4937625148849258e-285
1.0537573744489343e-304
2.79008159472542e-58
4.718459887295062e-217
2.7996009087703915e-295
3.4129442284798105e-170
2.299402306630583e-115
random.randint(a, b) seems to do that, though note that it returns integers, not floats: https://docs.python.org/2/library/random.html
Though a bit tricky, you may use np.random.rand to generate random numbers in (a, b]:
import numpy as np
size = 10 # No. of random numbers to be generated
a, b = 0, 10 # Can be any values
rand_num = np.random.rand(size) # [0, 1)
rand_num *= -1 # (-1, 0]
rand_num += 1 # (0, 1]
rand_num = a + rand_num * (b - a) # (a, b]

How are data types interpreted, calculated, and/or stored?

In python, suppose the code is:
import math
a = math.sqrt(2.0)
if a * a == 2.0:
    x = 2
else:
    x = 1
This is a variant of "Floating Point Numbers are Approximations -- Not Exact".
Mathematically speaking, you are correct that sqrt(2) * sqrt(2) == 2. But sqrt(2) cannot be exactly represented as a native datatype (read: floating point number). (In fact, sqrt(2) is irrational, so its decimal expansion is infinite and non-repeating!) It can get really close, but not exact:
>>> import math
>>> math.sqrt(2)
1.4142135623730951
>>> math.sqrt(2) * math.sqrt(2)
2.0000000000000004
Note the result is, in fact, not exactly 2.
If you want the x = 2 branch to execute, you will need to use an epsilon test: "is the result close enough?":
epsilon = 1e-6  # 0.000001
if abs(2.0 - a*a) < epsilon:
    x = 2
else:
    x = 1
Numbers with decimals are stored as floating point numbers and they can only be an approximation to the real number in some cases.
So your comparison needs to be not "are these two numbers exactly equal (==)" but "are they sufficiently close as to be considered equal".
Fortunately, the math library has a function to do that conveniently: using math.isclose(), you can compare with a defined tolerance. The function isn't too complicated; you could write it yourself.
>>> math.isclose(a*a, 2, abs_tol=0.0001)
True

Why does this While loop terminate?

x = 1.0
i = 1
while 1.0 + x > 1.0:
    x = x / 2
    i = i + 1
print i
Follow-up question: why is the value of i 54?
My thinking was that the loop would not end, as the value of (1.0 + x) would always stay greater than 1.0. But when running the code, that's not the case.
Due to the limited precision of floating point, there always comes a time when the value of x is so small that adding it to 1.0 makes no difference: 1.0 + x rounds back to exactly 1.0. It takes 53 halvings to get to that stage, and since i starts at 1, it ends up as 54.
Note that x itself never actually becomes 0 in this loop; a float only underflows to zero at far smaller magnitudes, for example:
>>> 1e-1000
0.0
Why 54? Actually it is 53 halvings; i reads 54 because it starts at 1 and is incremented once per halving.
>>> 2.**-54
5.551115123125783e-17
>>> 2.**-53
1.1102230246251565e-16
>>> 2.**-52
2.220446049250313e-16
>>> sys.float_info.epsilon
2.220446049250313e-16
If you add something that small to 1, the result is still 1.
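You can verify this at the prompt; 2**-53 is exactly the point where the sum rounds back down to 1.0:
>>> 1.0 + 2.0**-53 == 1.0
True
>>> 1.0 + 2.0**-52 == 1.0
False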
When dealing with floating point numbers, you will encounter the notorious floating point epsilon.
In your case, it takes 53 halvings of x to get below that threshold (i ends at 54 because it starts at 1). Python floats are double precision, and the machine epsilon for double precision can be computed like this:
def machineEpsilon(func=float):
    machine_epsilon = func(1)
    while func(1) + func(machine_epsilon) != func(1):
        machine_epsilon_last = machine_epsilon
        machine_epsilon = func(machine_epsilon) / func(2)
    return machine_epsilon_last
Hence:
In [2]: machineEpsilon(float)
Out[2]: 2.2204460492503131e-16
Where do the 53 iterations come from? From this line in your code:
x=x/2
which assigns x/2 back to x, halving it on every pass. On the 53rd iteration, x becomes:
1.11022302463e-16
which is less than the floating point epsilon.
As has been pointed out, it's because of the finite precision of floats. If you want to overcome this "limitation", you can use Python's fractions module, e.g.:
from fractions import Fraction as F

x = F(1, 1)
i = 1
while F(1, 1) + x > 1.0:
    print i, x
    x = F(1, x.denominator * 2)
    i = i + 1
print i
(NB: This will continue until interrupted)

number approximation in python

I have a list of floating points numbers which represent x and y coordinates of points.
(-379.99418604651157, 47.517234218543351, 0.0) #representing point x
an edge contains two such numbers.
I'd like to use a graph traversal algorithm, such as Dijkstra's, but using floating point numbers such as the ones above doesn't help.
What I'm actually looking for is a way of approximating those numbers:
(-37*.*, 4*.*, 0.0)
is there a python function that does that?
"...using floating point numbers such as the ones above don't help..." - why not? I don't recall integers as a requirement for Dijkstra. Aren't you concerned with the length of the edge? That's more likely to be a floating point number, even if the endpoints are expressed in integer values.
I'm quoting from Steve Skiena's "Algorithm Design Manual":
Dijkstra's algorithm proceeds in a series of rounds, where each round establishes the shortest path from s to some new vertex. Specifically, x is the vertex that minimizes dist(s, vi) + w(vi, x) over all unfinished 1 <= i <= n...
Distance - no mention of integer.
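To back that up, here is a minimal Dijkstra sketch with float edge weights (the graph and function are made up for illustration; nothing requires integers):
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}; weights may be any floats
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

graph = {'a': [('b', 47.517234218543351)],
         'b': [('c', 379.99418604651157)],
         'c': []}
print(dijkstra(graph, 'a'))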
Like so?
>>> x, y, z = (-379.99418604651157, 47.517234218543351, 0.0)
>>> abs(x - -370) < 10
True
>>> abs(y - 40) < 10
True
Given your vector
(-379.99418604651157, 47.517234218543351, 0.0) #representing point x
The easiest way to perform rounding that works like you would expect would probably be to use the decimal module: http://docs.python.org/library/decimal.html .
from decimal import Decimal
point = (-379.99418604651157, 47.517234218543351, 0.0) #representing point x
converted = [Decimal(str(x)) for x in point]
Then, to get an approximation, you can use the quantize method:
>>> converted[0].quantize(Decimal('.0001'), rounding="ROUND_DOWN")
Decimal("-379.9941")
This approach has the advantage of the built in ability to avoid rounding errors. Hopefully this is helpful.
Edit:
After seeing your comment, it looks like you're trying to see if two points are close to each other. These functions might do what you want:
def roundable(a, b):
    """Returns True if a can be rounded to b at any precision."""
    a = Decimal(str(a))
    b = Decimal(str(b))
    return a.quantize(b) == b

def close(point_1, point_2):
    for a, b in zip(point_1, point_2):
        if not (roundable(a, b) or roundable(b, a)):
            return False
    return True
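For example (points picked just to illustrate; the second is the vector from the question):
>>> point = (-379.99418604651157, 47.517234218543351, 0.0)
>>> close((-379.99, 47.52, 0.0), point)
True
>>> close((-380.01, 47.52, 0.0), point)
False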
I don't know if this is better than an epsilon approach, but it's fairly simple to implement.
I'm not sure what the problem is with the floating point numbers, but there are several ways you can approximate your values. If you just want to round them you can use math.ceil(), math.floor() and math.trunc().
If you actually want to keep track of the precision, there are a bunch of multi-precision math libraries listed on the wiki which might be useful.
I suppose that you want to approximate the numbers so that you can easily follow your algorithm visually while stepping through it (Dijkstra poses no limitation on the coordinates of the nodes; in fact, it only cares about the cost of the edges).
A simple function to approximate numbers:
>>> import math
>>> def approximate(value, places=0):
...     factor = 10. ** places
...     return factor * math.trunc(value / factor)
>>> p = (-379.99418604651157, 47.517234218543351, 0.0)
>>> print [ approximate(x, 1) for x in p ]
[-370.0, 40.0, 0.0]

testing for numeric equality when variable is modified inside loop

I am new to python and I was writing something like:
t = 0.
while t < 4.9:
    t = t + 0.1
    if t == 1.:
        ... do something ...
I noticed that the if statement was never being executed. So I modified the code to look like this:
''' Case a'''
t = 0.
while t < 4.9:
    t = t + 0.1
print(t)
print(t == 5.)
When I run this I get:
>>> ================================ RESTART ================================
>>>
5.0
False
This was a surprise because I expected the comparison to test as True. Then, I tried the following two cases:
''' Case b'''
t = 0
while t < 5:
    t = t + 1
print(t)
print(t == 5)

''' Case c'''
t = 0.
while t < 5:
    t = t + 0.5
print(t)
print(t == 5)
When I run the last 2 cases (b and c) the comparison in the final statement tests as True. I do not understand why it is so or why it seems that the behavior is not consistent. What am I doing wrong?
The problem is that binary floating point arithmetic is not precise, so you will get small errors in the calculations. In particular, the number 0.1 has no exact binary representation. When you calculate with floating point numbers, these very small errors make the result slightly different from what you might expect, and that makes the equality test fail.
This small error might not be visible when printing the float with the default string representation. Try using repr instead, as this gives a slightly more accurate representation of the number (but still not 100% accurate):
>>> print(repr(t))
4.999999999999998
>>> print(t == 5.)
False
To get an accurate string representation of a float you can use the format method:
>>> print '{0:.60f}'.format(t)
4.999999999999998223643160599749535322189331054687500000000000
>>> print '{0:.60f}'.format(0.1)
0.100000000000000005551115123125782702118158340454101562500000
A general rule with floating point arithmetic is to never make equality comparisons.
The reason why it works when you used 0.5 is because 0.5 does have an exact representation as a binary floating point number so you don't see any problem in that case. Similarly it would work for 0.25 or 0.125.
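You can see the difference directly; 0.1 accumulates error, while 0.25 does not:
>>> 0.1 + 0.1 + 0.1 == 0.3
False
>>> 0.25 + 0.25 + 0.25 == 0.75
True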
If you need precise calculations you can use a decimal type instead.
from decimal import Decimal

step = Decimal('0.1')
t = Decimal(0)
while t < Decimal(5):
    t += step
print(t)
print(t == Decimal(5))
Result:
5.0
True
NEVER try to test floats for equality.
Floats are often not exactly what you inputted them to be.
In [43]: .1
Out[43]: 0.10000000000000001
So it's much safer to only test floats with inequalities.
If you need to test if two floats are nearly equal, use a utility function like near:
def near(a, b, rtol=1e-5, atol=1e-8):
    try:
        return abs(a - b) < (atol + rtol * abs(b))
    except TypeError:
        return False
The rtol parameter allows you to specify relative tolerance. (abs(a-b)/abs(b)) < rtol
The atol parameter allows you to specify absolute tolerance. abs(a-b) < atol
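A quick sanity check of both tolerances (values chosen for illustration): the relative term dominates for large magnitudes, the absolute term near zero:
>>> near(1000000.0, 1000001.0)  # |a-b| = 1, rtol*|b| is about 10
True
>>> near(0.0, 1e-9)             # |a-b| = 1e-9 < atol = 1e-8
True
>>> near(0.0, 1e-7)
False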
So for example, you could write
t = 0.
while t < 4.9:
    t = t + 0.1
    if near(t, 1.):
        print('Hiya')
