Different behavior of float32/float64 numpy variables - python

After googling for a while, I'm posting here for help.
I have two float64 variables returned from a function.
Both of them are apparently 1:
>>> x, y = somefunc()
>>> print x, y
1.0 1.0
>>> if x < 1 : print "x < 1"
>>> if y < 1 : print "y < 1"
y < 1
The behavior changes when the variables are defined as float32, in which case the 'y < 1' line doesn't appear.
I tried setting
np.set_printoptions(precision=10)
expecting to see the difference between the variables, but even so, both of them appear as 1.0 when printed.
I am a bit confused at this point.
Is there a way to visualize the difference of these float64 numbers?
Can "if/then" be used reliably to check float64 numbers?
Thanks
Trevarez

The printed values do not show the full stored precision: in your case y is smaller than 1 when using float64 and greater than or equal to 1 when using float32. This is expected, since rounding errors depend on the size of the float.
To avoid this kind of problem when dealing with floating-point numbers, you should always decide on a "minimum error", usually called epsilon, and instead of comparing for equality, check whether the result is at most epsilon away from the target value:
In [13]: epsilon = 1e-11
In [14]: number = np.float64(1) - 1e-16
In [15]: target = 1
In [16]: abs(number - target) < epsilon # instead of number == target
Out[16]: True
In particular, numpy already provides np.allclose, which can be useful to compare arrays for equality given a certain tolerance. It works even when the arguments aren't arrays (e.g. np.allclose(1 - 1e-16, 1) -> True).
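For scalar comparisons there is also np.isclose, which applies the same tolerance test element-wise (shown here on plain Python floats; the tolerances in the second call are arbitrary illustrations):
In [17]: np.isclose(1 - 1e-16, 1)
Out[17]: True
In [18]: np.isclose(1 - 1e-16, 1, rtol=0, atol=1e-20)
Out[18]: False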
Note however that numpy.set_printoptions doesn't affect how np.float32/64 scalars are printed. It affects only how arrays are printed:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: np.array([1 - 1e-16])
Out[3]: array([ 1.])
In [4]: np.set_printoptions(precision=16)
In [5]: np.array([1 - 1e-16])
Out[5]: array([ 0.9999999999999999])
In [6]: np.float(1) - 1e-16
Out[6]: 0.9999999999999999
Also note that doing print y or evaluating y in the interactive interpreter gives different results:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: print(np.float64(1) - 1e-16)
1.0
The difference is that print calls str while evaluating calls repr:
In [9]: str(np.float64(1) - 1e-16)
Out[9]: '1.0'
In [10]: repr(np.float64(1) - 1e-16)
Out[10]: '0.99999999999999989'

In [26]: x = numpy.float64("1.000000000000001")
In [27]: print x, repr(x)
1.0 1.0000000000000011
In other words, you are plagued by loss of precision in the print statement. The value is very slightly different from 1.

Following the advice provided here, I summarize the answers this way:
To compare floats, the programmer has to define a minimum distance (eps) below which two values are considered equal (eps = 1e-12, for example). Doing so, the conditions should be written like this:
Instead of (x > a), use (x - a) > eps
Instead of (x < a), use (a - x) > eps
Instead of (x == a), use abs(x - a) < eps
This doesn't apply to comparisons between integers, since the minimum difference between two distinct integers is 1. A sketch of these rules as helper functions is shown below.
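As an illustration, here is a minimal sketch of those three rules as helper functions (the names gt, lt and eq are made up for this example):
import numpy as np

EPS = 1e-12  # minimum distance; tune this to your problem

def gt(x, a, eps=EPS):
    # x > a, ignoring floating-point noise
    return (x - a) > eps

def lt(x, a, eps=EPS):
    # x < a, ignoring floating-point noise
    return (a - x) > eps

def eq(x, a, eps=EPS):
    # x == a within eps
    return abs(x - a) < eps

y = np.float64(1) - 1e-16
print(lt(y, 1))  # False: y is not meaningfully below 1
print(eq(y, 1))  # True: y equals 1 within eps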
Hope it helps others as it helped me.


Numpy where and division by zero

I need to compute x in the following way (legacy code):
x = numpy.where(b == 0, a, 1/b)
I suppose it worked in python-2.x (it comes from python-2.7 code), but it does not work in python-3.x (if b = 0 it raises an error).
How do I make it work in python-3.x?
EDIT: error message (Python 3.6.3):
ZeroDivisionError: division by zero
numpy.where is not conditional execution; it is conditional selection. Python function parameters are always completely evaluated before a function call, so there is no way for a function to conditionally or partially evaluate its parameters.
Your code:
x = numpy.where(b == 0, a, 1/b)
tells Python to compute 1/b for every element of b and then select elements from a or 1/b based on the elements of b == 0. Python never even reaches the point of selecting elements, because computing 1/b fails.
You can avoid this problem by only inverting the nonzero parts of b. Assuming a and b have the same shape, it could look like this:
x = numpy.empty_like(b)
mask = (b == 0)
x[mask] = a[mask]
x[~mask] = 1/b[~mask]
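For instance, a minimal runnable version with made-up sample arrays for a and b:
import numpy as np

a = np.array([10.0, 20.0, 30.0, 40.0])
b = np.array([0.0, 1.0, 2.0, 4.0])

x = np.empty_like(b)
mask = (b == 0)
x[mask] = a[mask]      # where b is zero, take the fallback value from a
x[~mask] = 1/b[~mask]  # elsewhere, invert only the nonzero entries
print(x)               # [10.    1.    0.5   0.25]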
An old trick for handling 0 elements in an array division is to add a conditional value:
In [63]: 1/(b+(b==0))
Out[63]: array([1. , 1. , 0.5 , 0.33333333])
(I used this years ago in apl).
x = numpy.where(b == 0, a, 1/b) is evaluated in the same way as any other Python function call. Each argument is evaluated, and the resulting value is passed to the where function. There's no 'short-circuiting' or other method of bypassing bad values of 1/b.
So if 1/b raises an error, you need to either change b so it doesn't do that, calculate it in a context that traps the ZeroDivisionError, or skip the 1/b computation entirely.
In [53]: 1/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-53-9e1622b385b6> in <module>()
----> 1 1/0
ZeroDivisionError: division by zero
In [54]: 1.0/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-54-99b9b9983fe8> in <module>()
----> 1 1.0/0
ZeroDivisionError: float division by zero
In [55]: 1/np.array(0)
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[55]: inf
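For the scalar case, trapping the error as suggested above could look like this (a minimal sketch; the fallback value a is made up):
a = 99.0  # fallback value, for illustration
b = 0
try:
    x = 1 / b
except ZeroDivisionError:  # trap the scalar division error
    x = a
print(x)  # 99.0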
What are a and b? Scalars, arrays of some size?
where makes most sense if b (and maybe a) is an array:
In [59]: b = np.array([0,1,2,3])
The bare division gives me a warning, and an inf element:
In [60]: 1/b
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[60]: array([ inf, 1. , 0.5 , 0.33333333])
I could use where to replace that inf with something else, for example a nan:
In [61]: np.where(b==0, np.nan, 1/b)
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[61]: array([ nan, 1. , 0.5 , 0.33333333])
The warning can be silenced as #donkopotamus shows.
An alternative to seterr is errstate in a with context:
In [64]: with np.errstate(divide='ignore'):
...: x = np.where(b==0, np.nan, 1/b)
...:
In [65]: x
Out[65]: array([ nan, 1. , 0.5 , 0.33333333])
How to suppress the error message when dividing 0 by 0 using np.divide (alongside other floats)?
If you wish to disable warnings in numpy while you divide by zero, then do something like:
>>> existing = numpy.seterr(divide="ignore")
>>> # now divide by zero in numpy emits no warning
>>> 1 / numpy.zeros( (2, 2) )
array([[ inf, inf],
[ inf, inf]])
>>> numpy.seterr(**existing)  # seterr returns the old settings as a dict, so restore with **
Of course this only governs division by zero in an array. It will not prevent an error when doing a simple 1 / 0.
In your particular case, if we wish to ensure that we work whether b is a scalar or a numpy type, do as follows:
# ignore division by zero in numpy
existing = numpy.seterr(divide="ignore")
# upcast `1.0` to be a numpy type so that numpy division will always occur
x = numpy.where(b == 0, a, numpy.float64(1.0) / b)
# restore old error settings for numpy (seterr returns the old settings as a dict)
numpy.seterr(**existing)
I solved it using this (note: .fillna() is a pandas method, not a NumPy one, so this line assumes the division result is wrapped in a pandas Series; a pure-NumPy variant is sketched below):
x = (1/(np.where(b == 0, np.nan, b))).fillna(a)
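A pure-NumPy version of the same idea marks the zero positions with NaN and swaps the markers out afterwards (a sketch, assuming a is a scalar or broadcastable to b, and that b contains no genuine NaNs of its own):
import numpy as np

a = 0.0                            # fallback value (illustration)
b = np.array([0.0, 1.0, 2.0, 4.0])

x = 1/np.where(b == 0, np.nan, b)  # zeros become NaN, so no divide warning is raised
x = np.where(np.isnan(x), a, x)    # then replace the NaN markers with a
print(x)                           # [0.   1.   0.5  0.25]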
The numpy.where documentation states:
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c, xv, yv) in zip(condition, x, y)]
So why do you see the error? Take this trivial example:
c = 0
result = (1 if c==0 else 1/c)
# 1
So far so good. if c==0 is checked first and the result is 1. The code never attempts to evaluate 1/c, because Python's ternary operator is lazy: only the selected branch of the expression is evaluated.
Now let's translate this into numpy.where approach:
c = 0
result = (xv if c else yv for (c, xv, yv) in zip([c==0], [1], [1/c]))
# ZeroDivisionError
The error occurs when evaluating the argument 1/c to build zip([c==0], [1], [1/c]), before any of the selection logic is applied. As a function, numpy.where does not, and indeed cannot, replicate the lazy evaluation of Python's ternary expression.
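This eagerness is easy to demonstrate with a wrapper that announces when it runs (noisy_reciprocal is a name made up for this illustration):
import numpy as np

def noisy_reciprocal(arr):
    print("computing 1/b")  # runs before where ever sees the condition
    return 1/arr

b = np.array([1.0, 2.0, 4.0])
x = np.where(b == 0, 0.0, noisy_reciprocal(b))
# "computing 1/b" is printed before where even runs:
# the argument is fully evaluated up front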

Can't figure out why numpy.log10 outputs nan?

So I have a 500k array of floating-point values. When I try:
np.log10(my_long_array)
270k of the numbers come back as nan, and they are not that small. For example:
In [1]: import numpy as np
In [2]: t = -0.055488893531690543
In [3]: np.log10(t)
/home/aydar/anaconda3/bin/ipython:1: RuntimeWarning: invalid value encountered in log10
#!/home/aydar/anaconda3/bin/python3
Out[3]: nan
In [4]: type(t)
Out[4]: float
What am I missing?
The logarithm of a negative number is undefined, hence the nan.
From the docs to numpy.log10:
Returns: y : ndarray
The logarithm to the base 10 of x, element-wise. NaNs are returned where x is negative.
Negative numbers always give an undefined log.
The logarithmic function
y = log_b(x)
is the inverse of the exponential function
x = b^y
Since the base b is positive (b > 0), the base b raised to any real power y must be positive (b^y > 0). So the number x must be positive (x > 0).
The real base-b logarithm of a negative number is undefined:
log_b(x) is undefined for x ≤ 0
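If negative entries are expected in your data, two common ways to handle them are masking them out or using numpy's complex-aware variant np.emath.log10 (a sketch):
import numpy as np

arr = np.array([10.0, -0.055488893531690543, 100.0])

# Option 1: take the log only where it is defined, leave nan elsewhere
mask = arr > 0
logs = np.full_like(arr, np.nan)
logs[mask] = np.log10(arr[mask])  # [1., nan, 2.]

# Option 2: np.emath.log10 returns complex values for negative input
clogs = np.emath.log10(arr)       # complex array, no nan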

Get the similarity of two numbers with python

I'm studying Case-Based Reasoning algorithms, and I need to get the similarity of two numbers (integer or float).
For strings I'm using the Levenshtein lib and it handles them well, but I don't know of any Python lib to calculate the similarity of two numbers. Is there one out there?
The result should be between 0 (different) and 1 (perfect match), like Levenshtein.ratio().
#update1:
Using Levenshtein.ratio we get the ratio of similarity of two strings: 0 means totally different, 1 an exact match, and anything between 0 and 1 is the coefficient of similarity.
Example:
>>> import Levenshtein
>>> Levenshtein.ratio("This is a test","This is a test with similarity")
0.6363636363636364
>>> Levenshtein.ratio("This is a test","This is another test")
0.8235294117647058
>>> Levenshtein.ratio("This is a test","This is a test")
1.0
>>>
I need something like that, but with numbers.
For example, 5 has n% similarity with 6. The number 5.4 has n% similarity with 5.8.
I don't know if my example is clear.
#update 2:
Let me give a real-world example. Say I'm looking for similar versions of CentOS Linux distributions across a range of 100 servers. CentOS version numbers look like 5.6, 5.7, 6.5. So, how close is 5.7 to 6.5? Not very close: there are many versions (numbers) between them. But there is a coefficient of similarity, say 40% (or 0.4), using some similarity algorithm like Levenshtein.
#update 3:
I got the answer to this question. I'm posting it here to help more people:
>>> import math
>>> sum = 2.4 * 2.4
>>> sum2 = 7.5 * 7.5
>>> sum /math.sqrt(sum*sum2)
0.32
>>> sum = 7.4 * 7.4
>>> sum /math.sqrt(sum*sum2)
0.9866666666666666
>>> sum = 7.5 * 7.5
>>> sum /math.sqrt(sum*sum2)
1.0
To calculate the similarity of 2 numbers (float or integer) I wrote a simple function:
def num_sim(n1, n2):
    """Calculate a similarity score between 2 numbers."""
    return 1 - abs(n1 - n2) / (n1 + n2)
It returns exactly 1 when the numbers are equal and approaches 0 as they diverge.
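For instance (note that this assumes n1 + n2 is positive; the score is not well defined for zero or mixed-sign inputs):
print(num_sim(5, 6))      # ~0.909
print(num_sim(5.4, 5.8))  # ~0.964
print(num_sim(7, 7))      # 1.0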
From the link, I see that Ian Watson's slides show three options for assessing "similarity" of numbers. Of these, the "step function" option is readily available from numpy:
In [1]: from numpy import allclose
In [2]: a = 0.3 + 1e-9
In [3]: a == 0.3
Out[3]: False
In [4]: allclose(a, 0.3)
Out[4]: True
To get numeric output, as required for similarity, we make one change:
In [5]: int(a == 0.3)
Out[5]: 0
In [6]: int(allclose(a, 0.3))
Out[6]: 1
If preferred, float can be used in place of int:
In [8]: float(a == 0.3)
Out[8]: 0.0
In [9]: float(allclose(a, 0.3))
Out[9]: 1.0
allclose takes optional arguments rtol and atol so that you can specify, respectively, the relative or absolute tolerance to be used. Full documentation on allclose is in the numpy reference.
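For example, widening the relative tolerance lets the CentOS-style versions from the question count as "similar" (the tolerances here are arbitrary illustrations):
In [10]: float(allclose(5.7, 6.5, rtol=0.05))
Out[10]: 0.0
In [11]: float(allclose(5.7, 6.5, rtol=0.2))
Out[11]: 1.0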

Why does math.modf return floats?

from http://docs.python.org/2/library/math.html:
math.frexp(x)
Return the mantissa and exponent of x as the pair (m, e). m is a float and e is an integer such that x == m * 2**e exactly. If x is zero, returns (0.0, 0), otherwise 0.5 <= abs(m) < 1. This is used to “pick apart” the internal representation of a float in a portable way.
math.modf(x)
Return the fractional and integer parts of x. Both results carry the sign of x and are floats.
In this related question, it's pointed out that returning floats doesn't really make sense for ceil and floor, so in Python 3, they were changed to return integers. Is there some reason why the integer result of modf isn't returned as an integer, too? In Python 2.7:
In [2]: math.floor(5.5)
Out[2]: 5.0
In [3]: math.modf(5.5)
Out[3]: (0.5, 5.0)
In [4]: math.frexp(5.5)
Out[4]: (0.6875, 3)
In Python 3:
In [2]: math.floor(5.5)
Out[2]: 5
In [3]: math.modf(5.5)
Out[3]: (0.5, 5.0)
In [4]: math.frexp(5.5)
Out[4]: (0.6875, 3)
Most of the math module functions are thin wrappers around functions of the same name defined by the C language standard. frexp() and modf() are two such, and return the same things the C functions of the same names return.
So part of this is ease of inter-language operation.
But another part is practicality:
>>> from math import modf
>>> modf(1e100)
(0.0, 1e+100)
Would you really want to get 10000000000000000159028911097599180468360808563945281389781327557747838772170381060813469985856815104 back as "the integer part"?
C can't possibly do so, because it doesn't have unbounded integers. Python simply doesn't want to do so ;-) Note that all floating-point values of sufficient magnitude are exact integers - although they may require hundreds of decimal digits to express.
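If you do want the integer part as an int, the conversion is explicit and cheap, and Python's unbounded integers can hold even the huge case exactly (a small sketch):
>>> from math import modf
>>> frac, whole = modf(5.5)
>>> int(whole)
5
>>> int(modf(1e100)[1]) == int(1e100)
True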

Why is sin(180) not zero when using python and numpy?

Does anyone know why the below doesn't equal 0?
import numpy as np
np.sin(np.radians(180))
or:
np.sin(np.pi)
When I enter it into python it gives me 1.22e-16.
The number π cannot be represented exactly as a floating-point number. So, np.radians(180) doesn't give you π, it gives you 3.1415926535897931.
And sin(3.1415926535897931) is in fact something like 1.22e-16.
So, how do you deal with this?
You have to work out, or at least guess at, appropriate absolute and/or relative error bounds, and then instead of x == y, you write:
abs(y - x) < abs_bounds and abs(y - x) < rel_bounds * y
(This also means that you have to organize your computation so that the relative error is larger relative to y than to x. In your case, because y is the constant 0, that's trivial—just do it backward.)
Numpy provides a function that does this for you across a whole array, allclose:
np.allclose(x, y, rel_bounds, abs_bounds)
(This actually checks abs(y - x) < abs_bounds + rel_bounds * y, but that's almost always sufficient, and you can easily reorganize your code when it's not.)
In your case:
np.allclose(0, np.sin(np.radians(180)), rel_bounds, abs_bounds)
So, how do you know what the right bounds are? There's no way to teach you enough error analysis in an SO answer. Propagation of uncertainty at Wikipedia gives a high-level overview. If you really have no clue, you can use the defaults, which are 1e-5 relative and 1e-8 absolute.
One solution is to switch to sympy when calculating sin's and cos's, then to switch back to numpy using sp.N(...) function:
>>> # Numpy: not exactly zero
>>> import numpy as np
>>> np.cos(np.pi/2)
6.123233995736766e-17
>>> # Sympy workaround
>>> import sympy as sp
>>> def scos(x): return sp.N(sp.cos(x))
>>> def ssin(x): return sp.N(sp.sin(x))
>>> scos(sp.pi/2)
0
Just remember to use sp.pi instead of np.pi when using the scos and ssin functions.
Faced the same problem:
import math
import numpy as np
print(np.cos(math.radians(90)))
>> 6.123233995736766e-17
and tried this,
print(np.around(np.cos(math.radians(90)), decimals=5))
>> 0.0
Worked in my case. I kept 5 decimal places so as not to lose too much information; the rounding simply discards everything past the fifth decimal digit.
Try this... it zeroes anything below a given tininess threshold:
import numpy as np

def zero_tiny(x, threshold):
    if x.dtype == complex:
        x_real = x.real
        x_imag = x.imag
        if np.abs(x_real) < threshold: x_real = 0
        if np.abs(x_imag) < threshold: x_imag = 0
        return x_real + 1j*x_imag
    else:
        return x if np.abs(x) > threshold else 0

value = np.cos(np.pi/2)
print(value)
value = zero_tiny(value, 10e-10)
print(value)

value = np.exp(-1j*np.pi/2)
print(value)
value = zero_tiny(value, 10e-10)
print(value)
Python evaluates its trig functions using series approximations (such as the Taylor expansion), and since such an expansion has infinitely many terms, the result can only approximate the exact value.
For example:
sin(x) = x - x³/3! + x⁵/5! - ...
so sin(π) approaches 0 but is never exactly 0. That is my own reasoning.
Simple.
np.sin(np.pi).astype(int)
np.sin(np.pi/2).astype(int)
np.sin(3 * np.pi / 2).astype(int)
np.sin(2 * np.pi).astype(int)
returns
0
1
-1
0
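One caveat: astype(int) truncates toward zero, so a result that lands a hair below 1 (say 0.9999999999999999) would become 0 rather than 1. Rounding first is safer (a small sketch):
import numpy as np

# round to the nearest integer before converting, so near-integer
# results snap to the intended value instead of being truncated
print(np.round(np.sin(np.pi)).astype(int))          # 0
print(np.round(np.sin(3 * np.pi / 2)).astype(int))  # -1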
