Because numpy.arange() uses ceil((stop - start)/step) to determine the number of items, a small float imprecision (e.g. stop actually being stored as .400000001) can add an unintended value to the result.
Example
The first case does not include the stop point (intended)
>>> print(np.arange(.1,.3,.1))
[0.1 0.2]
The second case includes the stop point (not intended)
>>> print(np.arange(.1,.4,.1))
[0.1 0.2 0.3 0.4]
numpy.linspace() avoids this problem, e.g. np.linspace(.1, .4-.1, 3), but requires that you know the number of steps. np.linspace(start, stop-step, np.ceil((stop-start)/step)) leads to the same inconsistencies.
Question
How can I generate a reliable float range without knowing the # of elements in the range?
Extreme Case
Consider the case in which I want to generate a float index of unknown precision
np.arange(2.00(...)001,2.00(...)021,.00(...)001)
Your goal is to calculate what ceil((stop - start)/step) would be if the values had been calculated with exact mathematics.
This is impossible to do given only floating-point values of start, stop, and step that are the results of operations in which some rounding errors may have occurred. Rounding removes information, and there is simply no way to create information from lack of information.
Therefore, this problem is only solvable if you have additional information about start, stop, and step.
Suppose step is exact, but start and stop have some accumulated errors bounded by e0 and e1. That is, you know start is at most e0 away from its ideal mathematical value (in either direction), and stop is at most e1 away from its ideal value (in either direction). Then the ideal value of (stop-start)/step could lie anywhere between (stop-start-e0-e1)/step and (stop-start+e0+e1)/step.
Suppose there is an integer between (stop-start-e0-e1)/step and (stop-start+e0+e1)/step. Then it is impossible to know, from the floating-point values of start, stop, and step and the bounds e0 and e1 alone, whether the ideal ceil result should be the lesser integer or the greater.
However, from the examples you have given, the ideal (stop-start)/step could be exactly an integer, as in (.4-.1)/.1. If so, any non-zero error bounds could result in the error interval straddling an integer, making the problem impossible to solve from the information we have so far.
Therefore, in order to solve the problem, you must have more information than just simple bounds on the errors. You must know, for example, that (stop-start)/step is exactly an integer or is otherwise quantized. For example, if you knew that the ideal calculation of the number of steps would produce a multiple of .1, such as 3.8, 3.9, 4.0, 4.1, or 4.2, but never 4.05, and the errors were sufficiently small that the floating-point calculation (stop-start)/step had a final error less than .05, then it would be possible to round (stop-start)/step to the nearest qualifying multiple and then to apply ceil to that.
If you have such information, you can update the question with what you know about the errors in start, stop, and step (e.g., perhaps each of them is the result of a single conversion from decimal to floating-point) and the possible values of the ideal (stop-start)/step. If you do not have such information, there is no solution.
If you are guaranteed that (stop-start) is a multiple of step, then you can use the decimal module to compute the number of steps, i.e.
import numpy as np
from decimal import Decimal

def arange(start, stop, step):
    # Decimal arithmetic on the string arguments is exact,
    # so the step count comes out right.
    steps = (Decimal(stop) - Decimal(start)) / Decimal(step)
    if steps % 1 != 0:
        raise ValueError("stop-start is not a multiple of step")
    return np.linspace(float(start), float(stop), int(steps), endpoint=False)

print(arange('0.1', '0.4', '0.1'))
If you have an exact representation of your endpoints and step, and if they are rational, you can use the fractions module:
>>> from fractions import Fraction
>>>
>>> a = Fraction('1.0000000100000000042')
>>> b = Fraction('1.0000002100000000002')
>>> c = Fraction('0.0000000099999999998') * 5 / 3
>>>
>>> float(a) + float(c) * np.arange(int((b-a)/c))
array([1.00000001, 1.00000003, 1.00000004, 1.00000006, 1.00000008,
1.00000009, 1.00000011, 1.00000013, 1.00000014, 1.00000016,
1.00000018, 1.00000019])
>>>
>>> eps = Fraction(1, 10**100)
>>> b2 = b - eps
>>> float(a) + float(c) * np.arange(int((b2-a)/c))
array([1.00000001, 1.00000003, 1.00000004, 1.00000006, 1.00000008,
1.00000009, 1.00000011, 1.00000013, 1.00000014, 1.00000016,
1.00000018])
If not, you'll have to settle for some form of cutoff:
>>> a = 1.0
>>> b = 1.003999999
>>> c = 0.001
>>>
# cut off at 4 decimals
>>> round(float((b-a)/c), 4)
4.0
# cut off at 6 decimals
>>> round(float((b-a)/c), 6)
3.999999
You can round numbers to arbitrary degrees of precision in Python using the format function.
For example, if you want the first three digits of e after the decimal place, you can run
float(format(np.e, '.3f'))
Use this to eliminate float imprecision and you should be good to go.
Related
I have a random set of numbers in a SQL database:
1.2
0.4
5.1
0.0000000000232
1
7.54
0.000000000000006534
The very small decimals are displayed in scientific notation
num = 0.0000000000232
print(num)
> 2.32e-11
But that causes the rest of my code to bug out, as the API behind it expects a decimal number. I checked by increasing the precision with :.20f - that works fine.
Since the very small numbers are not consistent in their precision, it would be unwise to simply set a static .20f.
What is a more elegant way to translate this to the correct decimal, always dynamic with the precision?
If Python provides a way to do this, they've hidden it very well. But a simple function can do it.
import sys
from math import floor, log

def float_to_str(x):
    # Digits to the left of the decimal point (negative when x < 1).
    to_the_left = 1 + floor(log(x, 10))
    # Significant digits still available to the right of the point.
    to_the_right = sys.float_info.dig - to_the_left
    if to_the_right <= 0:
        s = str(int(x))
    else:
        s = format(x, f'0.{to_the_right}f').rstrip('0')
    return s
>>> for num in [1.2, 0.4, 5.1, 0.0000000000232, 1, 7.54, 0.000000000000006534]:
...     print(float_to_str(num))
...
1.2
0.4
5.1
0.0000000000232
1.
7.54
0.000000000000006534
The first part uses the logarithm base 10 to figure out how many digits will be on the left of the decimal point, or the number of zeros to the right of it if the number is less than 1. To find out how many digits can be to the right, we take the total number of significant digits that a float can hold, as given by sys.float_info.dig (15 on most Python implementations), and subtract the digits on the left. If this number is negative there won't be anything but garbage after the decimal point, so we can rely on integer conversion instead - it never uses scientific notation. Otherwise we simply conjure up the proper string to use with format. For the final step we strip off the redundant trailing zeros.
Using integers for large numbers isn't perfect because we lose the rounding that naturally occurs with floating point string conversion. float_to_str(1e25) for example will return '10000000000000000905969664'. Since your examples didn't contain any such large numbers I didn't worry about it, but it could be fixed with a little more work. For the reasons behind this see Is floating point math broken?
Importing Fraction from fractions gives a fractional representation of a real number, but the result is quite complicated for something that seems very simple by the paper-and-pen method.
Fraction(.2) gives the answer 3602879701896397/18014398509481984,
which is 0.20000000000000001110223024625157, almost .2, but I want it to be simply 1/5.
I know there's limit_denominator() for this use, but what I simply require is the smallest numerator and denominator which give the exact real number, because I am dealing with a lot of numbers across a big range, so I can't use the same limit_denominator() argument for all of them.
You can use the Fraction class to represent 0.2, and you can access the numerator and denominator as follows:
>>> from fractions import Fraction
>>> f = Fraction(1, 5)
>>> f.numerator
1
>>> f.denominator
5
Hope it helps.
Your strange output results from floating-point problems. You can in certain cases overcome these by limiting the denominator with Fraction.limit_denominator(). This procedure can of course also cause rounding errors, if the real value of the denominator is larger than the threshold you use. The default value for this threshold is 1,000,000, but you can also use smaller values.
>>> import fractions
>>> print(fractions.Fraction(0.1))
3602879701896397/36028797018963968
>>> # lower the threshold to 1000
>>> print(fractions.Fraction(0.1).limit_denominator(1000))
1/10
>>> # alternatively, use a str representation as per documentation/examples
>>> print(fractions.Fraction('0.1'))
1/10
>>> # won't work for smaller fractions, use default of 1,000,000 instead
>>> print(fractions.Fraction(0.00001).limit_denominator(1000))
0
>>> print(fractions.Fraction(0.00001).limit_denominator())
1/100000
Of course, as explained in the first sentence, there are precision limitations due to the way float numbers are stored. If you have numbers of magnitude around 10^9, you won't get an accurate representation with 10 digits in the fractional part, as
a = 1234567890.0987654321
print(a)
demonstrates. But you might ask yourself whether you really need an accuracy of 10^-15 if your input doesn't reflect that accuracy. If you want higher precision, you have to use the decimal module right from the start, with an increased precision level throughout all mathematical operations. Even better is to treat numerators and denominators as integer values from the beginning - in Python, integer values are theoretically unrestricted in size.
I am attempting to do a few different operations in Numpy (mean and interp), and with both operations I am getting the result 2.77555756156e-17 at various times, usually when I'm expecting a zero. Even attempting to filter these out with array[array < 0.0] = 0.0 fails to remove the values.
I assume there's some sort of underlying data type or environment error that's causing this. The data should all be float.
Edit: It's been helpfully pointed out that I was only filtering out the values of -2.77555756156e-17 but still seeing positive 2.77555756156e-17. The crux of the question is what might be causing these wacky values to appear when doing simple functions like interpolating values between 0-10 and taking a mean of floats in the same range, and how can I avoid it without having to explicitly filter the arrays after every statement.
You're running into numerical precision, which is a huge topic in numerical computing; when you do any computation with floating point numbers, you run the risk of running into tiny values like the one you've posted here. What's happening is that your calculations are resulting in values that can't quite be expressed with floating-point numbers.
Floating-point numbers are expressed with a fixed amount of information (in Python, this amount defaults to 64 bits). You can read more about how that information is encoded on the very good Floating point Wikipedia page. In short, some calculation that you're performing in the process of computing your mean produces an intermediate value that cannot be precisely expressed.
This isn't a property of numpy (and it's not even really a property of Python); it's really a property of the computer itself. You can see this in normal Python by playing around in the REPL:
>>> repr(3.0)
'3.0'
>>> repr(3.0 + 1e-10)
'3.0000000001'
>>> repr(3.0 + 1e-18)
'3.0'
For the last result, you would expect 3.000000000000000001, but that number can't be expressed in a 64-bit floating point number, so the computer uses the closest approximation, which in this case is just 3.0. If you were trying to average the following list of numbers:
[3., -3., 1e-18]
Depending on the order in which you summed them, you could get 1e-18 / 3., which is the "correct" answer, or zero. You're in a slightly stranger situation; two numbers that you expected to cancel didn't quite cancel out.
This is just a fact of life when you're dealing with floating point mathematics. The common way of working around it is to eschew the equals sign entirely and to only perform "numerically tolerant comparison", which means equality-with-a-bound. So this check:
a == b
Would become this check:
abs(a - b) < TOLERANCE
For some tolerance amount. The tolerance depends on what you know about your inputs and the precision of your computer; if you're using a 64-bit machine, you want this to be at least 1e-10 times the largest amount you'll be working with. For example, if the biggest input you'll be working with is around 100, it's reasonable to use a tolerance of 1e-8.
You can round your values to 15 digits:
a = a.round(15)
Now the array a should show you 0.0 values.
Example:
>>> a = np.array([2.77555756156e-17])
>>> a.round(15)
array([ 0.])
This is most likely the result of floating point arithmetic errors. For instance:
In [3]: 0.1 + 0.2 - 0.3
Out[3]: 5.551115123125783e-17
Not what you would expect? Numpy has a built in isclose() method that can deal with these things. Also, you can see the machine precision with
eps = np.finfo(float).eps
So, perhaps something like this could work too:
a = np.array([[-1e-17, 1.0], [1e-16, 1.0]])
a[np.abs(a) <= eps] = 0.0
I am having a bit of trouble understanding some results I am getting while operating with Python 2.7.
>>> x=1
>>> e=1e-20
>>> x+e
1.0
>>> x+e-x
0.0
>>> e+x-x
0.0
>>> x-x+e
1e-20
This is copied directly from Python. I am taking a class on how to program in Python and I do not understand the disparity of results (x+e==1 and x-x+e==1e-20, but x+e-x==0 and e+x-x==0).
I have already read the Python tutorial on representation errors, but I believe none of this is mentioned there.
Thanks in advance
Floating-point addition is not associative.
x+e-x is grouped as (x+e)-x. It adds x and e, rounds the result to the nearest representable number (which is 1), then subtracts x from the result and rounds again, producing 0.
x-x+e is grouped as (x-x)+e. It subtracts x from x, producing 0, and rounds it to the nearest representable number, which is 0. It then adds e to 0, producing e, and rounds it to the nearest representable number, which is e.
This is because of the way that computers represent floating point numbers.
This is all really in binary format but let's pretend that it works with base 10 numbers because that's a lot easier for us to relate to.
A floating point number is expressed on the form 0.x*10^y where x is a 10-digit number (I'm omitting trailing zeroes here) and y is the exponent. This means that the number 1.0 is expressed as 0.1*10^1 and the number 0.1 as 0.1*10^0.
To add these two numbers together we need to make sure that they have the same exponent. We can do this easily by shifting the numbers back and forth, i.e. we change 0.1*10^0 to 0.01*10^1 and then we add them together to get 0.11*10^1.
When we have 0.1*10^1 and 0.1*10^-19 (1e-20) we will shift 0.1*10^-19 20 steps, meaning that the 1 will fall outside the range of our 10 digit number so we will end up with 0.1*10^1 + 0.0*10^1 = 0.1*10^1.
The reason you end up with 1e-20 in your last example is that addition is done from left to right, so we subtract 0.1*10^1 from 0.1*10^1, ending up with 0.0*10^0, and add 0.1*10^-19 to that - a special case where we don't need to shift either number because one of them is exactly zero.
The reason I'm asking this is because there is a validation in OpenERP that it's driving me crazy:
>>> round(1.2 / 0.01) * 0.01
1.2
>>> round(12.2 / 0.01) * 0.01
12.200000000000001
>>> round(122.2 / 0.01) * 0.01
122.2
>>> round(1222.2 / 0.01) * 0.01
1222.2
As you can see, the second round is returning an odd value.
Can someone explain to me why is this happening?
This has in fact nothing to do with round; you can witness the exact same problem if you just do 1220 * 0.01:
>>> 1220*0.01
12.200000000000001
What you see here is a standard floating point issue.
You might want to read what Wikipedia has to say about floating point accuracy problems:
The fact that floating-point numbers cannot precisely represent all real numbers, and that floating-point operations cannot precisely represent true arithmetic operations, leads to many surprising situations. This is related to the finite precision with which computers generally represent numbers.
Also see:
Numerical analysis
Numerical stability
A simple example of numerical instability with floating point:
The numbers are finite. Let's say we save 4 digits after the decimal point in a given computer or language.
0.0001 multiplied by 0.0001 would result in something smaller than 0.0001, and therefore it is impossible to save this result!
In this case, if you calculate (0.0001 x 0.0001) / 0.0001 = 0.0001, this simple computer will fail to be accurate, because it tries to multiply first and only afterwards to divide. In JavaScript, dividing with fractions leads to similar inaccuracies.
The float type that you are using stores binary floating point numbers. Not every decimal number is exactly representable as a float. In particular there is no exact representation of 1.2 or 0.01, so the actual number stored in the computer will differ very slightly from the value written in the source code. This representation error can cause calculations to give slightly different results from the exact mathematical result.
It is important to be aware of the possibility of small errors whenever you use floating point arithmetic, and write your code to work well even when the values calculated are not exactly correct. For example, you should consider rounding values to a certain number of decimal places when displaying them to the user.
You could also consider using the decimal type, which stores decimal floating point numbers. If you use decimal then 1.2 can be stored exactly. However, working with decimal will reduce the performance of your code. You should only use it if exact representation of decimal numbers is important. You should also be aware that decimal does not mean that you'll never have any problems. For example, 1/3 (0.33333...) has no exact representation as a decimal.
There is a loss of accuracy from the division due to the way floating point numbers are stored, so you see that this identity doesn't hold
>>> 12.2 / 0.01 * 0.01 == 12.2
False
bArmageddon has provided a bunch of links which you should read, but I believe the takeaway message is: don't expect floats to give exact results unless you fully understand the limits of the representation.
In particular, don't use floats to represent amounts of money - that is a pretty common mistake!
Python also has the decimal module, which may be useful to you
Others have answered your question and mentioned that many numbers don't have an exact binary fractional representation. If you are accustomed to working only with decimal numbers, it can seem deeply weird that a nice, "round" number like 0.01 could be a non-terminating number in some other base. In the spirit of "seeing is believing," here's a little Python program that will print out a binary representation of any number to any desired number of digits.
from decimal import Decimal
n = Decimal("0.01") # the number to print the binary equivalent of
m = 1000 # maximum number of digits to print
p = -1
r = []
w = int(n)
n = abs(n) - abs(w)
while n and -p < m:
s = Decimal(2) ** p
if n >= s:
r.append("1")
n -= s
else:
r.append("0")
p -= 1
print "%s.%s%s" % ("-" if w < 0 else "", bin(abs(w))[2:],
"".join(r), "..." if n else "")