I have two numpy arrays and I am trying to divide one by the other while making sure that entries where the divisor is 0 are simply replaced with 0.
So, I do something like:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
This gives me a runtime warning:
RuntimeWarning: invalid value encountered in true_divide
Now, I wanted to see what was going on and I did the following:
xx = np.isfinite(diff_images)
print(xx[xx == False])
xx = np.isfinite(b_0)
print(xx[xx == False])
However, both of these return empty arrays, meaning that all the values in both arrays are finite. So I am not sure where the invalid value is coming from; I assumed the b_0 > 0 check in np.where would take care of the division by zero.
The shapes of the two arrays are (96, 96, 55, 64) and (96, 96, 55, 1)
You may have a NaN, inf, or -inf floating around somewhere. Try this:
np.isfinite(diff_images).all()
np.isfinite(b_0).all()
If one or both of those returns False, that's likely the cause of the runtime warning.
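For instance, a quick sketch on a toy array:
import numpy as np

b_0 = np.array([1.0, 0.0, np.inf])
print(np.isfinite(b_0).all())   # False -- an inf (or NaN) is present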
The reason you get the runtime warning when running this:
log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
is that the inner expression
np.divide(diff_images, b_0)
gets evaluated first, and is run on all elements of diff_images and b_0 (even though you end up ignoring the elements that involve division-by-zero). In other words, the warning happens before the code that ignores those elements. That is why it's a warning and not an error: there are legitimate cases like this one where the division-by-zero is not a problem because it's being handled in a later operation.
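If the division in the discarded branch is the only offender, one common workaround is to silence the warnings just for that expression with np.errstate. A minimal sketch, with toy shapes standing in for the question's arrays:
import numpy as np

diff_images = np.ones((2, 3))
b_0 = np.array([[2.0], [0.0]])   # one zero divisor, broadcast over the last axis

with np.errstate(divide='ignore', invalid='ignore'):
    log_norm_images = np.where(b_0 > 0, np.divide(diff_images, b_0), 0)
print(log_norm_images)           # [[0.5 0.5 0.5] [0.  0.  0. ]] -- no warning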
Another useful Numpy command is nan_to_num(diff_images)
By default it replaces NaN with zero, -inf with a very large negative number, and +inf with a very large positive number.
You can change the defaults, see https://numpy.org/doc/stable/reference/generated/numpy.nan_to_num.html
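For example, a quick sketch (the nan/posinf/neginf keywords assume NumPy 1.17 or later):
import numpy as np

arr = np.array([np.nan, -np.inf, np.inf, 1.5])
print(np.nan_to_num(arr))                                    # defaults: 0, huge negative, huge positive, 1.5
print(np.nan_to_num(arr, nan=0.0, posinf=1e6, neginf=-1e6))  # custom replacements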
As @drammock pointed out, the cause of the warning is that some of the values in b_0 are 0, and the runtime warning is generated before the np.where is evaluated. While @Luca's suggestion of wrapping the expression in with np.errstate(invalid='ignore', divide='ignore'): will prevent the warning in this case, there may be other legitimate cases where this warning could be generated. For instance, if corresponding elements of b_0 and diff_images are both np.inf, the division would return np.nan.
So to prevent warnings for the known case (b_0 == 0) while still allowing warnings for unknown cases, evaluate the np.where first and only then evaluate the arithmetic:
# First, create log_norm_images filled with the fallback value
log_norm_images = np.zeros(b_0.shape)
# Now get the valid indexes
valid = np.where(b_0 > 0)
# Lastly, evaluate the division only at the valid indexes
log_norm_images[valid] = np.divide(diff_images[valid], b_0[valid])
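Note that with the broadcast shapes from the question ((96, 96, 55, 64) divided by (96, 96, 55, 1)), the mask has to be broadcast to the dividend's shape before it can index both arrays. A sketch with toy sizes:
import numpy as np

diff_images = np.random.rand(4, 4, 3, 5)   # stand-in for (96, 96, 55, 64)
b_0 = np.random.rand(4, 4, 3, 1)           # stand-in for (96, 96, 55, 1)
b_0[0, 0, 0, 0] = 0.0                      # plant a zero divisor

b_0_full = np.broadcast_to(b_0, diff_images.shape)
valid = b_0_full > 0
log_norm_images = np.zeros(diff_images.shape)
log_norm_images[valid] = diff_images[valid] / b_0_full[valid]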
A minimal standalone example of the same mask-then-divide idea:
num = np.array([1,2,3,4,5])
den = np.array([1,1,0,1,1])
res = np.array([None]*5)
ix = (den!=0)
res[ix] = np.divide( num[ix], den[ix] )
print(res)
[1.0 2.0 None 4.0 5.0]
Consider the following Python + NumPy code that executes without error:
a = np.array((1, 2, 3))
a[13:17] = 23
Using a slice beyond the limits of the array truncates the slice and even returns an empty view if start and stop are beyond the limits. Assigning to such a slice just drops the input.
In my use case the indices are calculated in a non-trivial way and are used to manipulate selected parts of an array. The behavior above means that I might silently skip parts of that manipulation if the indices are miscalculated. That can be hard to detect and can lead to "almost correct" results, i.e. the worst kind of programming errors.
For that reason I'd like to have strict checking for slices so that a start or stop outside the array bounds triggers an error. Is there a way to enable that in NumPy?
As additional information, the arrays are large and the operation is performed very often, i.e. there should be no performance penalty. Furthermore, the arrays are often multidimensional, including multidimensional slicing.
You could be using np.put_along_axis instead, which seems to fit your needs:
>>> a = np.array((1, 2, 3))
>>> np.put_along_axis(a, indices=np.arange(13, 17), axis=0, values=23)
The above will raise the following error:
IndexError: index 13 is out of bounds for axis 0 with size 3
The values parameter can be either a scalar or another NumPy array.
Or in a shorter form:
>>> np.put_along_axis(a, np.r_[13:17], 23, 0)
Edit: Alternatively np.put has a mode='raise' option (which is set by default):
np.put(a, ind, v, mode='raise')
a: ndarray - Target array.
ind: array_like - Target indices, interpreted as integers.
v: array_like - Values to place in a at target indices. [...]
mode: {'raise', 'wrap', 'clip'} optional - Specifies how out-of-bounds
indices will behave.
'raise' – raise an error (default)
'wrap' – wrap around
'clip' – clip to the range
The default behavior will be:
>>> np.put(a, np.r_[13:17], 23)
IndexError: index 13 is out of bounds for axis 0 with size 3
while with mode='clip', it remains silent:
>>> np.put(a, np.r_[13:17], 23, mode='clip')
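To make the silent clipping concrete, a quick sketch:
import numpy as np

a = np.array((1, 2, 3))
np.put(a, np.r_[13:17], 23, mode='clip')   # indices 13..16 all clip to 2
print(a)                                   # [ 1  2 23]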
Depending on how complicated your indices are (read: how much pain in the backside it is to predict shapes after slicing), you may want to compute the expected shape directly and then reshape to it. If the size of your actual sliced array doesn't match, this will raise an error. Overhead is minor:
import numpy as np
from timeit import timeit
def use_reshape(a, idx, val):
    # shape the slice *should* produce; reshape raises if the actual slice is smaller
    expected_shape = ((s.stop-s.start-1)//(s.step or 1) + 1 if isinstance(s, slice) else 1 for s in idx)
    a[idx].reshape(*expected_shape)[...] = val

def no_check(a, idx, val):
    a[idx] = val

val = 23
idx = np.s_[13:1000:2, 14:20]
for f in (no_check, use_reshape):
    a = np.zeros((1000, 1000))
    print(f.__name__)
    print(timeit(lambda: f(a, idx, val), number=1000), 'ms')   # total seconds for 1000 runs == ms per run
    assert (a[idx] == val).all()

# check it works
print("\nThis should raise an exception:\n")
use_reshape(a, np.s_[1000:1001, 10], 0)
Please note that this is proof-of-concept code. To make it safe, you'd have to check for unexpected index kinds, matching numbers of dimensions and, importantly, for indices that select a single element.
Running it anyway:
no_check
0.004587646995787509 ms
use_reshape
0.006306983006652445 ms
This should raise an exception:
Traceback (most recent call last):
File "check.py", line 22, in <module>
use_reshape(a,np.s_[1000:1001,10],0)
File "check.py", line 7, in use_reshape
a[idx].reshape(*expected_shape)[...] = val
ValueError: cannot reshape array of size 0 into shape (1,1)
One way to achieve the behavior you want is to use ranges instead of slices, since fancy indexing checks every index against the array bounds:
a = np.array((1, 2, 3))
a[np.arange(13, 17)] = 23
IndexError: index 13 is out of bounds for axis 0 with size 3
I think NumPy's behavior here is consistent with the behavior of pure Python's lists and should be expected. Instead of workarounds, it might be better for code readability to explicitly add asserts:
index_1, index_2 = ... # a complex computation
assert 0 <= index_1 < index_2 <= a.shape[0]
a[index_1:index_2] = 23
Using python 2.7, scipy 1.0.0-3
Apparently I have a misunderstanding of how the numpy where function is supposed to operate, or there is a known bug in its operation. I'm hoping someone can tell me which, and explain a work-around to suppress the annoying warning I am trying to avoid. I get the same behavior when I use the pandas Series where().
To keep it simple, I'll use a numpy array as my example. Say I want to apply np.log() to the array, and only do so where the value is a valid input, i.e., myArray > 0.0. For values where this function should not be applied, I want the output to hold a flag value of -999.9:
myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
np.where(myArray>0.0, np.log(myArray), -999.9)
I expected numpy.where() to not complain about the 0.0 value in the array since the condition is False there, yet it does and it appears to actually execute for that False condition:
-c:2: RuntimeWarning: divide by zero encountered in log
array([ 0.00000000e+00, -2.87682072e-01, -6.93147181e-01,
-1.38629436e+00, -9.99900000e+02])
The numpy documentation states:
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
I beg to differ with this statement since
[np.log(val) if val>0.0 else -999.9 for val in myArray]
provides no warning at all:
[0.0, -0.2876820724517809, -0.69314718055994529, -1.3862943611198906, -999.9]
So, is this a known bug? I don't want to suppress the warning for my entire code.
You can have the log evaluated only at the relevant places by using its optional where parameter:
np.where(myArray>0.0, np.log(myArray, where=myArray>0.0), -999.9)
or more efficiently
mask = myArray > 0.0
np.where(mask, np.log(myArray, where=mask), -999)
or if you find the "double where" ugly
np.log(myArray, where=myArray>0.0, out=np.full(myArray.shape, -999.9))
Any one of those three should suppress the warning.
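For instance, a quick check of the third form (a sketch):
import numpy as np

myArray = np.array([1.0, 0.75, 0.5, 0.25, 0.0])
mask = myArray > 0.0
res = np.log(myArray, where=mask, out=np.full(myArray.shape, -999.9))
print(res)   # [ 0. -0.28768207 -0.69314718 -1.38629436 -999.9 ], no warning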
This behavior of where should be understandable given a basic understanding of Python. This is a Python expression that uses a couple of numpy functions.
What happens in this expression?
np.where(myArray>0.0, np.log(myArray), -999.9)
The interpreter first evaluates all the arguments of the function, and then passes the results to the where. Effectively then:
cond = myArray>0.0
A = np.log(myArray)
B = -999.9
np.where(cond, A, B)
The warning is produced in the 2nd line, not in the 4th.
The 4th line is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(cond, A, B)]
or
[A[i] if c else B for i,c in enumerate(cond)]
np.where is most often used with one argument, where it is a synonym for np.nonzero. We don't see this three-argument form that often on SO. It isn't that useful, in part because it doesn't save on calculations.
Masked assignment is used more often, especially if there are more than 2 alternatives.
In [123]: mask = myArray>0
In [124]: out = np.full(myArray.shape, np.nan)
In [125]: out[mask] = np.log(myArray[mask])
In [126]: out
Out[126]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
Paul Panzer showed how to do the same with the where parameter of log. That feature isn't being used as much as it could be.
In [127]: np.log(myArray, where=mask, out=out)
Out[127]: array([ 0. , -0.28768207, -0.69314718, -1.38629436, nan])
This is not a bug. See this related answer to a similar question. The example in the docs is misleading, but that answer looks at it in detail.
The issue is that ternary expressions are evaluated lazily by the interpreter (only the selected branch runs), while numpy.where is a regular function. Therefore, ternary expressions allow short-circuiting, whereas this is not possible when the arguments are evaluated beforehand.
In other words, the arguments of numpy.where are calculated before the Boolean array is processed.
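A minimal sketch of the difference:
import numpy as np

x = 0.0
safe = np.log(x) if x > 0.0 else -999.9              # branch not taken, np.log never runs: no warning

arr = np.array([1.0, 0.0])
flagged = np.where(arr > 0.0, np.log(arr), -999.9)   # np.log(arr) runs on the full array first: warning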
You may think this is inefficient: why build 2 separate arrays and then use a 3rd Boolean array to decide which item to choose? Surely that's double the work / double the memory?
However, this inefficiency is more than offset by the vectorisation provided by numpy functions acting on an entire array, e.g. np.log(arr).
Consider the example provided in the docs:
If x and y are given and input arrays are 1-D, where is equivalent to:
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
Notice the inputs are arrays. Try running:
c = np.array([0])
result = [xv if c else yv for (c, xv, yv) in zip(c==0, np.array([1]), np.log(c))]
You will notice that this emits the divide-by-zero warning: np.log(c) is evaluated on the whole array before zip is even called.
I accidentally stumbled upon some strange behavior with max, min and numpy.nan and I'm curious about what's going on under the hood.
Consider the following code run in python3:
import numpy as np
max(np.nan, 0) # outputs nan
max(np.nan, 10000) # outputs nan
max(0, np.nan) # outputs 0
max(10000, np.nan) # outputs 10000
I've played around with a number of values, and it seems that the first value given is always what's returned. The same behavior can be observed with min. I would have expected the output to consistently be nan, or even an error, but this is quite unexpected. math.nan does the same thing.
I'm very curious about this behavior -- does anyone have any ideas?
Write your own version of max and the mechanics become clear. Remember that NaN causes any greater, equal, or less comparison to return False. For instance:
def my_max(seq):
    result = seq[0]
    for val in seq[1:]:
        if result < val:   # always False when either side is NaN
            result = val
    return result
When you begin with a number, every comparison against nan fails, so the result only changes when a larger number comes along. When you start with nan, every comparison fails too, and the result is stuck at that initial nan value.
It's not always the first value, just what you get with the above mechanics. For instance:
>>> nan = numpy.nan
>>> max([7, nan, 15, nan, 5])
15
>>> max([nan, 7, nan, 15, nan, 5])
nan
max doesn't know anything about floats or NaN. It assumes that there actually is an ordering relationship between the arguments, and it may produce nonsensical results when there is no such relationship, as is the case with NaN.
numpy.maximum behaves more reasonably:
>>> numpy.maximum(numpy.nan, 1)
nan
>>> numpy.maximum(1, numpy.nan)
nan
For writing “piecewise functions” in Python, I'd normally use if (in either the control-flow or ternary-operator form).
def spam(x):
    return x+1 if x>=0 else 1/(1-x)
Now, with NumPy, the mantra is to avoid working on single values in favour of vectorisation, for performance. So I reckon something like this would be preferred (as Leon remarks, the following is actually wrong):
def eggs(x):
    y = np.zeros_like(x)
    positive = x>=0
    y[positive] = x+1
    y[np.logical_not(positive)] = 1/(1-x)
    return y
(Correct me if I've missed something here, because frankly I find this very ugly.)
Now, of course eggs will only work if x is actually a NumPy array, because otherwise x>=0 simply yields a single boolean, which can't be used for indexing (at least doesn't do the right thing).
Is there a good way to write code that looks more like spam but works idiomatically on NumPy arrays, or should I just use np.vectorize(spam)?
Use np.where. You'll get an array as the output even for plain number input, though.
def eggs(x):
    y = np.asarray(x)
    return np.where(y>=0, y+1, 1/(1-y))
This works for both arrays and plain numbers:
>>> eggs(5)
array(6.0)
>>> eggs(-3)
array(0.25)
>>> eggs(np.arange(-3, 3))
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:2: RuntimeWarning: divide by zero encountered in true_divide
array([ 0.25 , 0.33333333, 0.5 , 1. , 2. , 3. ])
>>> eggs(1)
/home/praveen/.virtualenvs/numpy3-mkl/bin/ipython3:3: RuntimeWarning: divide by zero encountered in long_scalars
array(2.0)
As ayhan remarks, this raises a warning, since 1/(1-x) gets evaluated for the whole range. But a warning is just that: a warning. If you know what you're doing, you can ignore the warning. In this case, you're only choosing 1/(1-x) from indices where it can never be inf, so you're safe.
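If you'd rather not see the warning at all, the evaluation can be confined to an np.errstate block; a sketch building on the answer above:
import numpy as np

def eggs(x):
    y = np.asarray(x, dtype=float)
    with np.errstate(divide='ignore'):   # 1/(1-y) at y == 1 is discarded by where anyway
        return np.where(y >= 0, y + 1, 1/(1 - y))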
I would use numpy.asarray (which is a no-op if the argument is already a NumPy array) if I want to handle both numbers and NumPy arrays:
def eggs(x):
    x = np.asfarray(x)
    m = x>=0
    x[m] = x[m] + 1
    x[~m] = 1 / (1 - x[~m])
    return x
(here I used asfarray to enforce a floating-point type, since your function requires floating-point computations).
This is less efficient than your spam function for single inputs, and arguably uglier. However it seems to be the easiest choice.
EDIT: If you want to ensure that x is not modified (as pointed out by Leon), you can replace np.asfarray(x) with np.array(x, dtype=np.float64); the array constructor copies by default.
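To see why the copy matters, a small demonstration (eggs_inplace is a hypothetical stand-in using np.asarray, which, like asfarray, returns its argument unchanged when it is already a float array):
import numpy as np

def eggs_inplace(x):
    x = np.asarray(x, dtype=float)   # no copy when x is already float64
    m = x >= 0
    x[m] = x[m] + 1
    x[~m] = 1 / (1 - x[~m])
    return x

a = np.array([-1.0, 2.0])
eggs_inplace(a)
print(a)   # [0.5 3. ] -- the caller's array was modified in place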
I have three arrays that are processed with a mathematical function to get a final result array. Some of the arrays contain NaNs and some contain zeros. While a division by zero raises a warning, a calculation with NaN simply gives NaN. So I'd like to do certain operations on certain parts of the arrays where zeros are involved:
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
1.0*n*numpy.exp(r*(1-(n/k)))
e.g. in cases where k == 0, I'd like to get 0 as the result. In all other cases I'd like to calculate the function above. So what is the way to do such calculations on parts of the arrays (via indexing) to get a single final result array?
import numpy
r=numpy.array([3,3,3])
k=numpy.array([numpy.nan,0,numpy.nan])
n=numpy.array([numpy.nan,0,0])
indxZeros=numpy.where(k==0)
indxNonZeros=numpy.where(k!=0)
d=numpy.empty(k.shape)
d[indxZeros]=0
d[indxNonZeros]=n[indxNonZeros]/k[indxNonZeros]
print(d)
Is the following what you need?
>>> rv = 1.0*n*numpy.exp(r*(1-(n/k)))
>>> rv[k==0] = 0
>>> rv
array([ nan, 0., nan])
So, you may think that the solution to this problem is to use numpy.where, but the following:
numpy.where(k==0, 0, 1.0*n*numpy.exp(r*(1-(n/k))))
still gives a warning, as the expression is actually evaluated for the cases where k is zero, even if those results aren't used.
If this really bothers you, you can use numexpr for this expression, which will actually branch on the where statement and not evaluate the k==0 case:
import numexpr
numexpr.evaluate('where(k==0, 0, 1.0*n*exp(r*(1-(n/k))))')
Another way, based on indexing as you asked for, involves a small loss in legibility:
result = numpy.zeros_like(k)
good = k != 0
result[good] = 1.0*n[good]*numpy.exp(r[good]*(1-(n[good]/k[good])))
The clutter can be reduced somewhat by wrapping the expression in a function:
def gaussian(r, k, n):
    return 1.0*n*numpy.exp(r*(1-(n/k)))
result = numpy.zeros_like(k)
good = k != 0
result[good] = gaussian(r[good], k[good], n[good])
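With the arrays from the question, a quick check (note that nan != 0 is True, so the NaN entries of k still pass the mask):
print(result)   # [nan  0. nan] -- no divide-by-zero warning from the k == 0 entry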