I have read this question and understand that Numpy arrays cannot be used in a boolean context. Let's say I want to perform an element-wise boolean check on the validity of inputs to a function. Can I realize this behavior while still using Numpy vectorization, and if so, how? (and if not, why?)
In the following example, I compute a value from two inputs while checking that both inputs are valid (both must be non-negative):
import math
import numpy as np

def calculate(input_1, input_2):
    if input_1 < 0 or input_2 < 0:
        return 0
    return math.sqrt(input_1) + math.sqrt(input_2)

calculate_many = (lambda x: calculate(x, 20 - x))(np.arange(-20, 40))
By itself, this would not work with Numpy arrays: the or on array comparisons raises a ValueError because the truth value of an array is ambiguous. But it is imperative that math.sqrt is never run on negative inputs, because that would raise another error.
One solution using list comprehension is as follows:
calculate_many = [calculate(x, 20 - x) for x in np.arange(-20, 40)]
However, this no longer uses vectorization and would be painfully slow if the size of the arange was increased drastically.
Is there a way to implement this if check while still using vectorization?
I believe the expression below performs vectorized operations and avoids the use of loops/lambda functions:
np.sqrt(((input1>0) & 1)*input1) + np.sqrt(((input2>0) & 1)*input2)
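For example, a minimal sketch (the function name is mine) that wraps this expression into a vectorized stand-in for calculate, adding an np.where so that rows with any negative input return 0 as in the original function:

import numpy as np

def calculate_vec(input_1, input_2):
    # ((v > 0) & 1) * v clamps negative entries to 0, so np.sqrt never sees an invalid input
    result = np.sqrt(((input_1 > 0) & 1) * input_1) + np.sqrt(((input_2 > 0) & 1) * input_2)
    # the original calculate returns 0 when either input is negative
    return np.where((input_1 < 0) | (input_2 < 0), 0, result)

x = np.arange(-20, 40)
calculate_many = calculate_vec(x, 20 - x)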
In [121]: x = np.array([1, 10, 21, -1.])
In [122]: y = 20-x
In [123]: np.sqrt(x)
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in sqrt
#!/usr/bin/python3
Out[123]: array([1. , 3.16227766, 4.58257569, nan])
There are several ways of dealing with 'out-of-range' values.
@Sam's approach is to tweak the inputs so they are valid
In [129]: ((x>0) & 1)*x
Out[129]: array([ 1., 10., 21., -0.])
Another is to use masking to limit the values calculated.
Your function skips the sqrt if either input is negative; conversely, it does the calculation where both are valid. That's different from testing each input separately.
In [124]: mask = (x>=0) & (y>=0)
In [125]: mask
Out[125]: array([ True, True, False, False])
We can use the mask thus:
In [126]: res = np.zeros_like(x)
In [127]: res[mask] = np.sqrt(x[mask]) + np.sqrt(y[mask])
In [128]: res
Out[128]: array([5.35889894, 6.32455532, 0. , 0. ])
In my comments I suggested using the where parameter of np.sqrt. It does, though, need an out parameter as well; without out, the elements skipped by where are left as whatever happened to be in the uninitialized output array.
In [130]: (np.sqrt(x, where=mask, out=np.zeros_like(x)) +
     ...:  np.sqrt(y, where=mask, out=np.zeros_like(x)))
Out[130]: array([5.35889894, 6.32455532, 0. , 0. ])
Alternatively, if we are happy with the nan in Out[123], we can just suppress the RuntimeWarning.
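For instance, a small sketch of that route: np.errstate silences the warning locally, and np.nan_to_num maps the nans back to 0 if we want them gone:

with np.errstate(invalid='ignore'):   # silence "invalid value encountered in sqrt"
    res = np.sqrt(x) + np.sqrt(y)
res = np.nan_to_num(res)              # optional: nan -> 0.0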
I want to replace a row of a NumPy array by the same row after modification by a function.
Here is my code:
def _softmax(z):
    array = np.exp(z)
    array = np.divide(array, np.sum(array))
    return array

a = np.array([[1,2,3,4],[5,15,4,7]])
n = _softmax(a[0])
print(n)
a[0] = n
print(a[0])
I get the following result:
[0.0320586 0.08714432 0.23688282 0.64391426]
[0 0 0 0]
As you can see, n is okay, but a[0] doesn't get the new values; it becomes [0, 0, 0, 0] instead.
However, if I try:
a[0] = np.array([4,3,2,1])
...it works perfectly fine.
The reason is that a is originally of type np.int64, whereas the output of your softmax is float64. You must change the dtype of your NumPy array a to floating-point, or the assignment into the first row of a gets down-converted to integer precision:
a = np.array([[1,2,3,4],[5,15,4,7]], dtype=float)
The reason it is created with type np.int64 originally is that all of your values are integers. As soon as you change one of them to a float, the array gets promoted to floating-point:
In [9]: a = np.array([[1,2,3,4],[5,15,4,7]])
In [10]: a.dtype
Out[10]: dtype('int64')
In [11]: a = np.array([[1.0,2,3,4],[5,15,4,7]])
In [12]: a.dtype
Out[12]: dtype('float64')
Take note that I changed the value 1 to 1.0. You can do it this way if you like, without explicitly specifying the type; NumPy looks at what your array construction contains and infers the best type that matches all of the information provided.
Finally once we run through everything we get:
In [14]: a
Out[14]:
array([[ 0.0320586 , 0.08714432, 0.23688282, 0.64391426],
[ 5. , 15. , 4. , 7. ]])
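If you'd rather not edit the literal, a small sketch of the same fix via astype, which returns a converted copy:

a = np.array([[1,2,3,4],[5,15,4,7]]).astype(float)  # promote to float64 up front
a[0] = _softmax(a[0])                               # the row assignment now keeps full precision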
I need to compute x in the following way (legacy code):
x = numpy.where(b == 0, a, 1/b)
I suppose it worked in Python 2.x (it came from Python 2.7 code), but it does not work in Python 3.x: when b is 0 it raises an error.
How do I make it work in python-3.x?
EDIT: error message (Python 3.6.3):
ZeroDivisionError: division by zero
numpy.where is not conditional execution; it is conditional selection. Python function parameters are always completely evaluated before a function call, so there is no way for a function to conditionally or partially evaluate its parameters.
Your code:
x = numpy.where(b == 0, a, 1/b)
tells Python to compute the reciprocal of every element of b and then select elements from a or 1/b based on the elements of b == 0. Python never even reaches the point of selecting elements, because computing 1/b fails.
You can avoid this problem by only inverting the nonzero parts of b. Assuming a and b have the same shape, it could look like this:
x = numpy.empty_like(b, dtype=float)  # float dtype, so the 1/b results aren't truncated if b is integer
mask = (b == 0)
x[mask] = a[mask]
x[~mask] = 1 / b[~mask]
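For example, with a = numpy.zeros(4) and b = numpy.array([0., 1., 2., 3.]), this produces array([0., 1., 0.5, 0.33333333]), with no warning raised.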
An old trick for handling 0 elements in an array division is to add a conditional value to the denominator:
In [63]: 1/(b+(b==0))
Out[63]: array([1. , 1. , 0.5 , 0.33333333])
(I used this years ago in APL.)
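Combined with where, a minimal sketch of the trick (assuming a and b broadcast together):

# b + (b == 0) is never zero, so the division raises no warning;
# where() then substitutes a at the positions where b was zero
x = np.where(b == 0, a, 1 / (b + (b == 0)))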
x = numpy.where(b == 0, a, 1/b) is evaluated in the same way as any other Python function. Each function argument is evaluated, and the value passed to the where function. There's no 'short-circuiting' or other method of bypassing bad values of 1/b.
So if 1/b raises an error, you need to either change b so it doesn't, calculate it in a context that traps the ZeroDivisionError, or skip the 1/b computation for the offending elements.
In [53]: 1/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-53-9e1622b385b6> in <module>()
----> 1 1/0
ZeroDivisionError: division by zero
In [54]: 1.0/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-54-99b9b9983fe8> in <module>()
----> 1 1.0/0
ZeroDivisionError: float division by zero
In [55]: 1/np.array(0)
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[55]: inf
What are a and b? Scalars, arrays of some size?
where makes most sense if b (and maybe a) is an array:
In [59]: b = np.array([0,1,2,3])
The bare division gives me a warning, and an inf element:
In [60]: 1/b
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[60]: array([ inf, 1. , 0.5 , 0.33333333])
I could use where to replace that inf with something else, for example a nan:
In [61]: np.where(b==0, np.nan, 1/b)
/usr/local/bin/ipython3:1: RuntimeWarning: divide by zero encountered in true_divide
#!/usr/bin/python3
Out[61]: array([ nan, 1. , 0.5 , 0.33333333])
The warning can be silenced as @donkopotamus shows.
An alternative to seterr is errstate in a with context:
In [64]: with np.errstate(divide='ignore'):
...: x = np.where(b==0, np.nan, 1/b)
...:
In [65]: x
Out[65]: array([ nan, 1. , 0.5 , 0.33333333])
How to suppress the error message when dividing 0 by 0 using np.divide (alongside other floats)?
If you wish to disable warnings in numpy while you divide by zero, then do something like:
>>> existing = numpy.seterr(divide="ignore")
>>> # now divide by zero in numpy raises no sort of exception
>>> 1 / numpy.zeros( (2, 2) )
array([[ inf, inf],
[ inf, inf]])
>>> numpy.seterr(**existing)
Of course this only governs division by zero in an array. It will not prevent an error when doing a simple 1 / 0.
In your particular case, if we wish to ensure that we work whether b is a scalar or a numpy type, do as follows:
# ignore division by zero in numpy
existing = numpy.seterr(divide="ignore")
# upcast `1.0` to be a numpy type so that numpy division will always occur
x = numpy.where(b == 0, a, numpy.float64(1.0) / b)
# restore old error settings for numpy
numpy.seterr(**existing)
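The same logic reads more cleanly with the errstate context manager, which restores the previous settings automatically on exit:

with numpy.errstate(divide="ignore"):
    x = numpy.where(b == 0, a, numpy.float64(1.0) / b)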
I solved it using this:
x = (1/(np.where(b == 0, np.nan, b))).fillna(a)
(Note that fillna is a pandas method, so this works when b is a pandas Series; with a plain NumPy array, replace the fillna step with np.where(np.isnan(x), a, x).)
The numpy.where documentation states:
If x and y are given and input arrays are 1-D, where is
equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
So why do you see the error? Take this trivial example:
c = 0
result = (1 if c==0 else 1/c)
# 1
So far so good: c == 0 is checked first, and the result is 1. The code does not attempt to evaluate 1/c, because Python's ternary operator is lazy and only evaluates the selected branch.
Now let's translate this into the numpy.where approach:
c = 0
result = (xv if c else yv for (c, xv, yv) in zip([c==0], [1], [1/c]))
# ZeroDivisionError
The error occurs while evaluating the arguments zip([c==0], [1], [1/c]), before any selection logic is applied: the [1/c] literal itself can't be evaluated. As a function, numpy.where does not, and indeed cannot, replicate the lazy evaluation of Python's ternary expression.
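If you want the division itself skipped rather than merely masked afterwards, one sketch uses np.divide's where and out parameters (standard ufunc semantics: entries where the condition is False keep whatever out holds):

import numpy as np

b = np.array([0., 1., 2., 3.])
a = np.zeros_like(b)

# out starts as a copy of a; np.divide writes 1/b only where b != 0,
# so the zero denominators are never touched and no warning is raised
x = np.divide(1, b, out=a.astype(float), where=(b != 0))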
So, I created a vertical (column) NumPy array, used the /= operator, and the output seems to be incorrect.
Basically, if x is a vector and s a scalar, I would expect x /= s to divide every entry of x by s. However, I couldn't make much sense of the output: the operator is only applied to part of the entries in x, and I am not sure how they are chosen.
In [8]: np.__version__
Out[8]: '1.10.4'
In [9]: x = np.random.rand(5,1)
In [10]: x
Out[10]:
array([[ 0.47577008],
[ 0.66127875],
[ 0.49337183],
[ 0.47195985],
[ 0.82384023]]) ####
In [11]: x /= x[2]
In [12]: x
Out[12]:
array([[ 0.96432356],
[ 1.3403253 ],
[ 1. ],
[ 0.95660073],
[ 0.82384023]]) #### this entry is not changed.
Your value of x[2] changes to 1 midway through the evaluation. You need to take a copy of the value and then divide each element by it: either assign it to another variable first, or use copy, i.e.
from copy import copy
x /= copy(x[2])
To understand why we need to do this, let's look under the hood at what is happening.
In [9]: x = np.random.rand(5,1)
Here we define x as an array, but what isn't immediately clear is that each element of this array is technically an array too. This is the important distinction: we are not dealing with fixed values but with numpy array objects (views), so in the next line:
In [11]: x /= x[2]
we end up essentially 'looking up' the value in x[2], which returns a one-element array; and because it is a view that is looked up while the division runs, its value can change partway through.
A cleaner solution would be to flatten the array into 1d, so that x[2] now equals the scalar 0.49337183 instead of array([0.49337183]).
So before we do x /= x[2] we can call x = x.flatten()
Or better yet keep it 1d from the start x = np.random.rand(5)
And as for the reason x[3] changes and x[4] does not, the only really helpful answer I can give is that the division does not happen in element order: complex buffering, timey-wimey stuff.
In theory it only happens for odd vector sizes, but if you do:
x = np.random.rand(5,1)
a = x[2]*1
x/=a
it will work, because x[2]*1 creates a copy rather than a view.
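One way to see the aliasing behind all of this is np.shares_memory (added in NumPy 1.11, so slightly newer than the 1.10.4 session above), which reports whether two arrays overlap:

import numpy as np

x = np.random.rand(5, 1)
print(np.shares_memory(x, x[2]))         # True:  x[2] is a view into x
print(np.shares_memory(x, x[2] * 1))     # False: the multiplication made a copy
print(np.shares_memory(x, x[2].copy()))  # False: so does an explicit copy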
I am trying to devise an efficient method to perform array division on NumPy where the divisor is largely made up of 1's.
import numpy as np
A = np.random.rand(3,3)
B = np.array([[1,1,3],[1,1,1],[1,4,1]])
Result = A/B
Here, only 2 instances of the division operation are really required. I am not sure if Numpy is already optimized for division by 1 but my gut feeling is that it isn't.
Your ideas, please?
You can apply the division to selected items of A and B:
In [249]: A=np.arange(9.).reshape(3,3)
In [250]: B = np.array([[1,1,3],[1,1,1],[1,4,1]])
In [251]: I=np.nonzero(B>1)
In [252]: I
Out[252]: (array([0, 2], dtype=int32), array([2, 1], dtype=int32))
In [253]: A[I] /= B[I]
In [254]: A
Out[254]:
array([[ 0. , 1. , 0.66666667],
[ 3. , 4. , 5. ],
[ 6. , 1.75 , 8. ]])
Also a boolean index: A[B>1] /= B[B>1]
I doubt if it's faster. But for other cases, such as a B that contains 0 it is a way of avoiding errors/warnings. There must be a number of SO questions about numpy division by zero.
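In the same spirit, a sketch using np.divide's where parameter with the question's A and B: the division only executes where B > 1, and the out array (initialized to a copy of A) passes the remaining entries through unchanged, which is exactly A/1:

Result = np.divide(A, B, out=A.astype(float), where=(B > 1))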
Interesting question. I didn't do a very thorough test, but filtering the division by searching for 1's in the denominator seems to slow things down slightly, even when the fraction of 1's is very high (99%); see the code below. This suggests that the search for 1's, denom[np.where(denom != 1.0)]..., slows things down. Perhaps Numpy already optimizes array divisions in this way?
import numpy as np

def div(filter=False):
    np.random.seed(1234)
    num = np.random.rand(1024)
    denom = np.random.rand(1024)
    denom[np.where(denom > .01)] = 1.0   # make ~99% of the denominators exactly 1
    if not filter:
        return num / denom
    else:
        idx = np.where(denom != 1.0)[0]
        num[idx] /= denom[idx]
        return num
In [17]: timeit div(True)
10000 loops, best of 3: 89.7 µs per loop
In [18]: timeit div(False)
10000 loops, best of 3: 69.2 µs per loop
After googling for a while, I'm posting here for help.
I have two float64 variables returned from a function.
Both of them are apparently 1:
>>> x, y = somefunc()
>>> print x, y
1.0 1.0
>>> if x < 1: print "x < 1"
>>> if y < 1: print "y < 1"
y < 1
The behavior changes when the variables are defined as float32, in which case the 'y < 1' line doesn't appear.
I tried setting
np.set_printoptions(precision=10)
expecting to see the differences between variables but even so, both of them appear as 1.0 when printed.
I am a bit confused at this point.
Is there a way to visualize the difference of these float64 numbers?
Can "if/then" be used reliably to check float64 numbers?
Thanks
Trevarez
The printed values are rounded, so they mislead you: in your case, y is smaller than 1 when using float64 and greater than or equal to 1 when using float32. This is expected, since rounding errors depend on the size of the float.
To avoid this kind of problem when dealing with floating point numbers, you should always decide on a "minimum error", usually called epsilon, and, instead of comparing for equality, check whether the result is within epsilon of the target value:
In [13]: epsilon = 1e-11
In [14]: number = np.float64(1) - 1e-16
In [15]: target = 1
In [16]: abs(number - target) < epsilon # instead of number == target
Out[16]: True
In particular, numpy already provides np.allclose, which can be useful for comparing arrays for equality within a certain tolerance. It works even when the arguments aren't arrays (e.g. np.allclose(1 - 1e-16, 1) returns True).
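Its elementwise sibling np.isclose exposes the tolerances explicitly (rtol and atol are the actual keyword names), for example:

In [17]: np.isclose(np.float64(1) - 1e-16, 1, rtol=0, atol=1e-11)
Out[17]: True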
Note however that numpy.set_printoptions doesn't affect how np.float32/64 scalars are printed. It affects only how arrays are printed:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: np.array([1 - 1e-16])
Out[3]: array([ 1.])
In [4]: np.set_printoptions(precision=16)
In [5]: np.array([1 - 1e-16])
Out[5]: array([ 0.9999999999999999])
In [6]: np.float(1) - 1e-16
Out[6]: 0.9999999999999999
Also note that print y and evaluating y in the interactive interpreter give different results:
In [1]: import numpy as np
In [2]: np.float(1) - 1e-16
Out[2]: 0.9999999999999999
In [3]: print(np.float64(1) - 1e-16)
1.0
The difference is that print calls str while evaluating calls repr:
In [9]: str(np.float64(1) - 1e-16)
Out[9]: '1.0'
In [10]: repr(np.float64(1) - 1e-16)
Out[10]: '0.99999999999999989'
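To see the float32/float64 difference from the question directly, a quick check (float32 cannot represent such a tiny deficit below 1, so the value rounds back to exactly 1.0):

In [11]: x = np.float64(1) - 1e-16
In [12]: x == 1
Out[12]: False
In [13]: np.float32(x) == 1
Out[13]: True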
In [26]: x = numpy.float64("1.000000000000001")
In [27]: print x, repr(x)
1.0 1.0000000000000011
In other words, you are plagued by loss of precision in the print statement; the value is very slightly different from 1.
Following the advice provided here, I summarize the answers this way:
To compare floats, the programmer has to define a minimum distance (eps) for two values to be considered different (eps = 1e-12, for example). Doing so, the conditions should be written like this:
Instead of (x>a), use (x-a)>eps
Instead of (x<a), use (a-x)>eps
Instead of (x==a), use abs(x-a)<eps
This doesn't apply to comparisons between integers, since the difference between two distinct integers is at least 1.
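A tiny sketch of those three rules as helper functions (the names and the eps value here are mine):

EPS = 1e-12   # example value; choose eps to suit the scale of your data

def float_gt(x, a):   # x > a
    return (x - a) > EPS

def float_lt(x, a):   # x < a
    return (a - x) > EPS

def float_eq(x, a):   # x == a
    return abs(x - a) < EPS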
Hope it helps others as it helped me.