Using python 2.7, scipy 1.0.0-3
Apparently I have a misunderstanding of how the numpy where function is supposed to operate or there is a known bug in its operation. I'm hoping someone can tell me which and explain a work-around to suppress the annoying warning that I am trying to avoid. I'm getting the same behavior when I use the pandas Series where().
To keep it simple, I'll use a numpy array as my example. Say I want to apply np.log() to the array, but only where the value is a valid input, i.e., myArray>0.0. Where the function should not be applied, I want to set the output to a flag value of -999.9:
myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
np.where(myArray>0.0, np.log(myArray), -999.9)
I expected numpy.where() to not complain about the 0.0 value in the array since the condition is False there, yet it does and it appears to actually execute for that False condition:
-c:2: RuntimeWarning: divide by zero encountered in log
array([ 0.00000000e+00, -2.87682072e-01, -6.93147181e-01,
-1.38629436e+00, -9.99900000e+02])
The numpy documentation states:
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
I beg to differ with this statement since
[np.log(val) if val>0.0 else -999.9 for val in myArray]
provides no warning at all:
[0.0, -0.2876820724517809, -0.69314718055994529, -1.3862943611198906, -999.9]
So, is this a known bug? I don't want to suppress the warning for my entire code.
You can have the log evaluated at the relevant places only using its optional where parameter
np.where(myArray>0.0, np.log(myArray, where=myArray>0.0), -999.9)
or more efficiently
mask = myArray > 0.0
np.where(mask, np.log(myArray, where=mask), -999.9)
or if you find the "double where" ugly
np.log(myArray, where=myArray>0.0, out=np.full(myArray.shape, -999.9))
Any one of those three should suppress the warning.
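A minimal sketch to check that claim, assuming a NumPy version whose ufuncs accept where= and out=. The helper name safe_log is made up for illustration. Warnings are promoted to errors inside the catch_warnings block, so a stray "divide by zero" would raise instead of passing silently.

```python
import warnings
import numpy as np

def safe_log(values, fill=-999.9):
    """Natural log where values > 0, `fill` elsewhere (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    mask = values > 0.0
    # `out` is pre-filled with the flag value; `where=mask` makes np.log
    # skip the masked-out slots, so they keep the flag and log(0) never runs.
    out = np.full(values.shape, fill)
    np.log(values, where=mask, out=out)
    return out

my_array = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
with warnings.catch_warnings():
    warnings.simplefilter("error")  # any RuntimeWarning would raise here
    result = safe_log(my_array)
print(result)
```

Pre-filling out matters: without it, the slots skipped by where= are left uninitialised.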
This behavior of where should be understandable given a basic understanding of Python. This is a Python expression that uses a couple of numpy functions.
What happens in this expression?
np.where(myArray>0.0, np.log(myArray), -999.9)
The interpreter first evaluates all the arguments of the function, and then passes the results to the where. Effectively then:
cond = myArray>0.0
A = np.log(myArray)
B = -999.9
np.where(cond, A, B)
The warning is produced in the 2nd line, not in the 4th.
The 4th line is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(cond, A, B)]
or
[A[i] if c else B for i,c in enumerate(cond)]
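The same evaluation order can be seen without numpy at all. This is only a sketch: pick and noisy_stub are hypothetical stand-ins for np.where and np.log, and the calls list just records how often the stub runs.

```python
calls = []

def noisy_stub(x):
    # Stand-in for np.log: records that it was called (hypothetical helper).
    calls.append(x)
    return -x

def pick(cond, a, b):
    # An ordinary function, like np.where: it only ever sees finished values.
    return a if cond else b

# Function call: the argument expression runs even though cond is False.
picked = pick(False, noisy_stub(1), 0)
calls_after_function = len(calls)

# Conditional expression: the unselected branch is never evaluated.
value = noisy_stub(2) if False else 0
calls_after_ternary = len(calls)

print(calls_after_function, calls_after_ternary)
```

The stub runs once for the function call and not at all for the conditional expression, which is the difference between np.where and the list comprehension from the docs.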
np.where is most often used with one argument, where it is a synonym for np.nonzero. We don't see this three-argument form that often on SO. It isn't that useful, in part because it doesn't save on calculations.
Masked assignment is used more often, especially if there are more than 2 alternatives.
In [123]: mask = myArray>0
In [124]: out = np.full(myArray.shape, np.nan)
In [125]: out[mask] = np.log(myArray[mask])
In [126]: out
Out[126]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
Paul Panzer showed how to do the same with the where parameter of log. That feature isn't being used as much as it could be.
In [127]: np.log(myArray, where=mask, out=out)
Out[127]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
This is not a bug. See this related answer to a similar question. The example in the docs is misleading, but that answer looks at it in detail.
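The issue is that a ternary (conditional) expression is a language construct: the interpreter evaluates the condition first and then only the selected branch. numpy.where, on the other hand, is a regular function, so all of its arguments are evaluated before the call. Therefore, ternary expressions allow short-circuiting, whereas this is not possible when the arguments are computed beforehand.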
In other words, the arguments of numpy.where are calculated before the Boolean array is processed.
You may think this is inefficient: why build 2 separate arrays and then use a 3rd Boolean array to decide which item to choose? Surely that's double the work / double the memory?
However, this inefficiency is more than offset by the vectorisation provided by numpy functions acting on an entire array, e.g. np.log(arr).
Consider the example provided in the docs:
If x and y are given and input arrays are 1-D, where is
equivalent to::
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
Notice the inputs are arrays. Try running:
c = np.array([0])
result = [xv if c else yv for (c, xv, yv) in zip(c==0, np.array([1]), np.log(c))]
You will notice that this triggers the same warning: np.log(c) is evaluated in full before zip ever sees it.
I'm writing unit tests for my simulation and want to check that for specific parameters the result, a numpy array, is zero. Due to calculation inaccuracies, small values are also accepted (1e-7). What is the best way to assert this array is close to 0 in all places?
np.testing.assert_array_almost_equal(a, np.zeros(a.shape)) and assert_allclose fail, as the relative tolerance is inf (or 1 if you switch the arguments).
I feel like np.testing.assert_array_almost_equal_nulp(a, np.zeros(a.shape)) is not precise enough, as it compares the difference to the spacing; therefore it's always true for nulps >= 1 and false otherwise, but it does not say anything about the amplitude of a.
Using np.testing.assert_(np.all(np.absolute(a) < 1e-7)), based on this question, does not give any of the detailed output I am used to from other np.testing methods.
Is there another way to test this? Maybe another testing package?
If you compare a numpy array with all zeros, you can use the absolute tolerance, as the relative tolerance does not make sense here:
import numpy as np
from numpy.testing import assert_allclose

def test_zero_array():
    a = np.array([0, 1e-07, 1e-08])
    assert_allclose(a, 0, atol=1e-07)
The rtol value does not matter in this case, as it is multiplied by 0 when the tolerance is calculated:
atol + rtol * abs(desired)
Update: Replaced np.zeros_like(a) with the simpler scalar 0. As pointed out by @hintze, np array comparisons also work against scalars.
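To see both outcomes side by side, here is a small sketch (the array values are made up). It shows a passing check and a failing one, and keeps the failure message so you can inspect the kind of detail the other np.testing helpers give.

```python
import numpy as np
from numpy.testing import assert_allclose

near_zero = np.array([0.0, 1e-8, -5e-8])
assert_allclose(near_zero, 0, atol=1e-7)  # passes: every |a| is within atol

too_big = np.array([0.0, 1e-3])
try:
    assert_allclose(too_big, 0, atol=1e-7)
    failed = False
    message = ""
except AssertionError as err:
    failed = True
    message = str(err)  # detailed mismatch report from numpy.testing

print(failed)
print(message)
```

The exact wording of the report depends on the NumPy version, so it is safer not to match on it in tests.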
For writing “piecewise functions” in Python, I'd normally use if (in either the control-flow or ternary-operator form).
def spam(x):
    return x+1 if x>=0 else 1/(1-x)
Now, with NumPy, the mantra is to avoid working on single values in favour of vectorisation, for performance. So I reckon something like this would be preferred (as Leon remarks, the following is wrong):
def eggs(x):
    y = np.zeros_like(x)
    positive = x>=0
    y[positive] = x+1
    y[np.logical_not(positive)] = 1/(1-x)
    return y
(Correct me if I've missed something here, because frankly I find this very ugly.)
Now, of course eggs will only work if x is actually a NumPy array, because otherwise x>=0 simply yields a single boolean, which can't be used for indexing (at least doesn't do the right thing).
Is there a good way to write code that looks more like spam but works idiomatically on NumPy arrays, or should I just use vectorize(spam)?
Use np.where. You'll get an array as the output even for plain number input, though.
def eggs(x):
    y = np.asarray(x)
    return np.where(y>=0, y+1, 1/(1-y))
This works for both arrays and plain numbers:
>>> eggs(5)
array(6.0)
>>> eggs(-3)
array(0.25)
>>> eggs(np.arange(-3, 3))
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:2: RuntimeWarning: divide by zero encountered in true_divide
array([ 0.25 , 0.33333333, 0.5 , 1. , 2. , 3. ])
>>> eggs(1)
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:3: RuntimeWarning: divide by zero encountered in long_scalars
# -*- coding: utf-8 -*-
array(2.0)
As ayhan remarks, this raises a warning, since 1/(1-x) gets evaluated for the whole range. But a warning is just that: a warning. If you know what you're doing, you can ignore the warning. In this case, you're only choosing 1/(1-x) from indices where it can never be inf, so you're safe.
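If you would rather not see the warning at all, one option is to silence it for just this expression. This is only a sketch: eggs_quiet is a hypothetical variant of eggs, and it assumes a float result is acceptable. np.errstate limits the suppression to the with block, so the rest of your code still warns as usual.

```python
import warnings
import numpy as np

def eggs_quiet(x):
    """x + 1 for x >= 0, 1 / (1 - x) otherwise (hypothetical variant of eggs)."""
    y = np.asarray(x, dtype=float)
    # Suppress only floating-point warnings, and only for this expression;
    # the inf produced at y == 1 is discarded by np.where.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(y >= 0, y + 1, 1 / (1 - y))

with warnings.catch_warnings():
    warnings.simplefilter("error")  # a leaked RuntimeWarning would raise here
    out = eggs_quiet(np.arange(-3, 3))
print(out)
```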
I would use numpy.asarray (which is a no-op if the argument is already a numpy array) if I want to handle both numbers and numpy arrays:
def eggs(x):
    x = np.asfarray(x)
    m = x>=0
    x[m] = x[m] + 1
    x[~m] = 1 / (1 - x[~m])
    return x
(here I used asfarray to enforce a floating-point type, since your function requires floating-point computations; note that asfarray was removed in NumPy 2.0, where np.asarray(x, dtype=float) does the same job).
This is less efficient than your spam function for single inputs, and arguably uglier. However it seems to be the easiest choice.
EDIT: If you want to ensure that x is not modified (as pointed out by Leon) you can replace np.asfarray(x) by np.array(x, dtype=np.float64), the array constructor copies by default.
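Another option that stays closer to the shape of spam is np.piecewise, which calls each branch only on the elements its condition selects. A minimal sketch, assuming float output is fine; eggs_piecewise is a made-up name and the atleast_1d call is just there so plain numbers are accepted too.

```python
import warnings
import numpy as np

def eggs_piecewise(x):
    """Hypothetical piecewise version of eggs; always returns a float array."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Each lambda only receives the elements selected by its condition,
    # so 1 / (1 - v) is never evaluated at v == 1.
    return np.piecewise(
        x,
        [x >= 0, x < 0],
        [lambda v: v + 1, lambda v: 1 / (1 - v)],
    )

x_in = np.arange(-3.0, 3.0)
with warnings.catch_warnings():
    warnings.simplefilter("error")  # would raise on a divide-by-zero warning
    out = eggs_piecewise(x_in)
print(out)
```

np.piecewise writes into a fresh array, so the input is left untouched.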
I have two numpy arrays and I am trying to divide one by the other. At the same time, I want to make sure that the entries where the divisor is 0 are simply replaced with 0.
So, I do something like:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
This gives me a run time warning of:
RuntimeWarning: invalid value encountered in true_divide
Now, I wanted to see what was going on and I did the following:
xx = np.isfinite(diff_images)
print (xx[xx == False])
xx = np.isfinite(b_0)
print (xx[xx == False])
However, both of these return empty arrays meaning that all the values in the arrays are finite. So, I am not sure where the invalid value is coming from. I am assuming checking b_0 > 0 in the np.where function takes care of the divide by 0.
The shapes of the two arrays are (96, 96, 55, 64) and (96, 96, 55, 1).
You may have a NAN, INF, or NINF floating around somewhere. Try this:
np.isfinite(diff_images).all()
np.isfinite(b_0).all()
If one or both of those returns False, that's likely the cause of the runtime warning.
The reason you get the runtime warning when running this:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
is that the inner expression
np.divide(diff_images, b_0)
gets evaluated first, and is run on all elements of diff_images and b_0 (even though you end up ignoring the elements that involve division-by-zero). In other words, the warning happens before the code that ignores those elements. That is why it's a warning and not an error: there are legitimate cases like this one where the division-by-zero is not a problem because it's being handled in a later operation.
Another useful NumPy function is nan_to_num(diff_images).
By default, it replaces NaN with zero, -INF with a very large negative number and +INF with a very large positive number.
You can change the defaults, see https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html
As @drammock pointed out, the cause of the warning is that some of the values in b_0 are 0, and the runtime warning is generated before np.where is evaluated. While @Luca's suggestion of wrapping the np.where call in "with np.errstate(invalid='ignore', divide='ignore'):" will prevent the warning in this case, it would also hide other legitimate cases where this warning could be generated. For instance, if corresponding elements of b_0 and diff_images are both np.inf, the division returns np.nan.
So to prevent warnings for known cases (i.e. b_0 = 0) and allow for warnings of unknown cases, evaluate the np.where first, then evaluate the arithmetic:
# First, create log_norm_images
log_norm_images = np.zeros(b_0.shape)
# Now get the valid indexes
valid = np.where(b_0 > 0)
# Lastly, evaluate the division at the valid indexes only
# (this assumes diff_images and b_0 have the same shape)
log_norm_images[valid] = np.divide(diff_images[valid], b_0[valid])
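For the shapes in the question, where b_0 has a trailing axis of length 1, a variant that relies on broadcasting may be easier. This is a sketch with small made-up stand-in arrays, assuming a NumPy version whose ufuncs accept where= and out=.

```python
import numpy as np

# Small stand-ins for the (96, 96, 55, 64) and (96, 96, 55, 1) arrays.
diff_images = np.arange(1.0, 49.0).reshape(2, 2, 3, 4)
b_0 = (np.arange(12) % 3).astype(float).reshape(2, 2, 3, 1)  # contains zeros

log_norm_images = np.divide(
    diff_images,
    b_0,
    out=np.zeros_like(diff_images),  # slots with b_0 == 0 keep this 0
    where=b_0 > 0,                   # broadcast against the operands
)
print(log_norm_images.shape)
```

The division is only carried out where b_0 > 0, so the known zero divisors stay quiet while any other invalid value would still warn.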
num = np.array([1,2,3,4,5])
den = np.array([1,1,0,1,1])
res = np.array([None]*5)
ix = (den!=0)
res[ix] = np.divide( num[ix], den[ix] )
print(res)
[1.0 2.0 None 4.0 5.0]
I have three arrays that are processed with a mathematical function to get a final result array. Some of the arrays contain NaNs and some contain 0. However, a division by zero raises a warning, and a calculation with NaN gives NaN. So I'd like to do certain operations on the parts of the arrays where zeros are involved:
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
1.0*n*numpy.exp(r*(1-(n/k)))
E.g. in cases where k == 0, I'd like to get 0 as the result. In all other cases I'd like to calculate the function above. So what is the way to do such calculations on parts of the array (via indexing) to get a single final result array?
import numpy
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
indxZeros=numpy.where(k==0)
indxNonZeros=numpy.where(k!=0)
d=numpy.empty(k.shape)
d[indxZeros]=0
d[indxNonZeros]=n[indxNonZeros]/k[indxNonZeros]
print(d)
Is the following what you need?
>>> rv = 1.0*n*numpy.exp(r*(1-(n/k)))
>>> rv[k==0] = 0
>>> rv
array([ nan, 0., nan])
So, you may think that the solution to this problem is to use numpy.where, but the following:
numpy.where(k==0, 0, 1.0*n*numpy.exp(r*(1-(n/k))))
still gives a warning, as the expression is actually evaluated for the cases where k is zero, even if those results aren't used.
If this really bothers you, you can use numexpr for this expression, which will actually branch on the where statement and not evaluate the k==0 case:
import numexpr
numexpr.evaluate('where(k==0, 0, 1.0*n*exp(r*(1-(n/k))))')
Another way, based on indexing as you asked for, involves a little loss in legibility
result = numpy.zeros_like(k)
good = k != 0
result[good] = 1.0*n[good]*numpy.exp(r[good]*(1-(n[good]/k[good])))
This can be bypassed somewhat by defining a gaussian function:
def gaussian(r, k, n):
    return 1.0*n*numpy.exp(r*(1-(n/k)))
result = numpy.zeros_like(k)
good = k != 0
result[good] = gaussian(r[good], k[good], n[good])
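If you prefer to keep the formula in one piece without indexing every operand, the only risky step is n/k, and that can be guarded on its own. A sketch, assuming r, k and n all have the same shape; the function name model is made up.

```python
import numpy as np

def model(r, k, n):
    """Hypothetical helper: n * exp(r * (1 - n/k)), with 0 where k == 0."""
    r, k, n = (np.asarray(a, dtype=float) for a in (r, k, n))
    good = k != 0
    # Divide only where k is non-zero; the other slots keep a harmless 0.
    ratio = np.divide(n, k, out=np.zeros_like(k), where=good)
    return np.where(good, n * np.exp(r * (1 - ratio)), 0.0)

r = np.array([3, 3, 3])
k = np.array([2.0, 0.0, 4.0])
n = np.array([1.0, 5.0, 2.0])
result = model(r, k, n)
print(result)
```

With the arrays from the question, the NaN entries stay NaN and the k == 0 entry becomes 0.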
Could anyone suggest which library supports creation of a Gaussian filter of a required length and sigma? I basically need an equivalent of the MATLAB function below:
fltr = fspecial('gaussian',[1 n],sd)
You don't need a library for a simple 1D gaussian.
from math import pi, sqrt, exp
def gauss(n=11,sigma=1):
    r = range(-int(n/2),int(n/2)+1)
    return [1 / (sigma * sqrt(2*pi)) * exp(-float(x)**2/(2*sigma**2)) for x in r]
Note: This will always return an odd-length list centered around 0. I suppose there may be situations where you would want an even-length Gaussian with values for x = [..., -1.5, -0.5, 0.5, 1.5, ...], but in that case, you would need a slightly different formula and I'll leave that to you ;)
Output example with default values n = 11, sigma = 1:
>>> g = gauss()
1.48671951473e-06
0.000133830225765
0.00443184841194
0.0539909665132
0.241970724519
0.398942280401
0.241970724519
0.0539909665132
0.00443184841194
0.000133830225765
1.48671951473e-06
>>> sum(g)
0.99999999318053079
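For the even-length case mentioned in the note, here is a NumPy sketch. It centres the sample points on the middle of the window, which gives half-integer offsets for even n, and then divides by the sum so the weights add up to exactly 1. The name gauss_kernel is made up. I believe MATLAB's fspecial('gaussian', ...) normalises the same way, but check that against its documentation.

```python
import numpy as np

def gauss_kernel(n=11, sigma=1.0):
    """Length-n Gaussian window, odd or even, normalised to sum to 1."""
    # Offsets from the window centre: integers for odd n,
    # half-integers (..., -0.5, 0.5, ...) for even n.
    x = np.arange(n) - (n - 1) / 2.0
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

odd = gauss_kernel(11, 1.0)
even = gauss_kernel(10, 1.0)
print(len(odd), len(even))
```

Because of the final division, the 1 / (sigma * sqrt(2*pi)) prefactor is not needed here.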
Perhaps scipy.ndimage.filters.gaussian_filter? I've never used it, but the documentation is at: https://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.ndimage.filters.gaussian_filter.html
Try scipy.ndimage.gaussian_filter, but do you really want the kernel or do you also want to apply it? (In which case you can just use this function.) In the former case, apply the filter on an array which is 0 everywhere but with a 1 in the center. For the easier-to-write 1d case, this would be for example:
>>> ndimage.gaussian_filter1d(np.float_([0,0,0,0,1,0,0,0,0]), 1)
array([ 1.33830625e-04, 4.43186162e-03, 5.39911274e-02,
2.41971446e-01, 3.98943469e-01, 2.41971446e-01,
5.39911274e-02, 4.43186162e-03, 1.33830625e-04])
If run-time speed is important, I highly recommend creating the filter once and then reusing it on every iteration. Optimizations are constantly being made, but a couple of years ago this significantly sped up some code I wrote. (The answers above show how to create the filter.)
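A sketch of that pattern with plain NumPy: the kernel is built once and then reused for every signal. The gauss_kernel helper and the two test signals are made up for illustration, and np.convolve is just one way to apply the kernel.

```python
import numpy as np

def gauss_kernel(n, sigma):
    """Hypothetical helper: normalised Gaussian window of length n."""
    x = np.arange(n) - (n - 1) / 2.0
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

kernel = gauss_kernel(5, 1.0)  # built once, outside the loop

impulse = np.zeros(9)
impulse[4] = 1.0
signals = [impulse, np.ones(9)]

# Reuse the same kernel for every signal instead of rebuilding it each time.
smoothed = [np.convolve(s, kernel, mode="same") for s in signals]
print(smoothed[0])
```

Filtering the impulse returns the kernel itself, centred on the 1, which is a quick way to check that the filter is what you expect.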