I would like to check a condition inside an array and perform an operation on the position where the condition is met. For example, this piece of code does the job:
res = somefunction(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if not 6 < res[i,j] < 18:
            x[i,j] = float('nan')
But I thought a faster (and shorter) way would maybe be something like this:
x[not 6 < somefunction(x) < 18] = float('nan')
But Python gives an error saying that condition checking doesn't work on an array with more than one element. Is there a way to make my code go faster?
You can't use not or chained comparisons with arrays, since neither not nor chained comparisons can be implemented to broadcast.
Split the chained comparison into two comparisons, and use ~ and & instead of not and and, since NumPy uses the bitwise operators for boolean operations on boolean arrays:
x[~((6 < res) & (res < 18))] = numpy.nan
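A minimal sketch of the masked assignment on a small array. The values are made up, and res is simply a copy of x standing in for somefunction(x), which the question does not define. Note that x needs a float dtype to hold NaN.

```python
import numpy as np

# Hypothetical data; res stands in for somefunction(x) from the question.
x = np.array([[1.0, 7.0], [12.0, 20.0]])
res = x.copy()

# True wherever res lies outside the open interval (6, 18).
outside = ~((6 < res) & (res < 18))

# Assign NaN at every position selected by the boolean mask.
x[outside] = np.nan
print(x)
```

Comparisons against NaN evaluate to False, so any NaN already present in res also ends up selected by the mask.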
I am looking for an elegant way to check if a given index is inside a numpy array (for example for BFS algorithms on a grid).
The following code does what I want:
import numpy as np
def isValid(np_shape: tuple, index: tuple):
    if min(index) < 0:
        return False
    for ind,sh in zip(index,np_shape):
        if ind >= sh:
            return False
    return True
arr = np.zeros((3,5))
print(isValid(arr.shape,(0,0))) # True
print(isValid(arr.shape,(2,4))) # True
print(isValid(arr.shape,(4,4))) # False
But I'd prefer something built-in or more elegant than writing my own function with Python for-loops (yikes).
You can try:
def isValid(np_shape: tuple, index: tuple):
    index = np.array(index)
    # compare against the np_shape argument, not the global arr
    return (index >= 0).all() and (index < np_shape).all()
arr = np.zeros((3,5))
print(isValid(arr.shape,(0,0))) # True
print(isValid(arr.shape,(2,4))) # True
print(isValid(arr.shape,(4,4))) # False
I have benchmarked the answers quite a bit, and come to the conclusion that actually the explicit for loop as provided in my code performs best.
Dmitri's solution is wrong for several reasons (tuple1 < tuple2 compares the tuples lexicographically, so the first differing value decides the result; ideas like np.all(ind < sh for ind,sh in zip(index,np_shape)) fail because np.all receives a generator, not a list; etc.).
@mozway's solution is correct, but all the casts make it a lot slower. It also always has to convert every number, whereas an explicit loop can stop early, I suppose.
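For what it's worth, the early-exit loop can also be written with the built-in all() and a generator expression, which stops at the first failing coordinate. This is only a sketch and was not part of the benchmark; the name is_valid and the length check are my own additions.

```python
def is_valid(shape: tuple, index: tuple) -> bool:
    # Assumption: an index with the wrong number of dimensions is invalid.
    if len(index) != len(shape):
        return False
    # Built-in all() stops at the first coordinate that is out of range.
    return all(0 <= i < s for i, s in zip(index, shape))

print(is_valid((3, 5), (0, 0)))   # True
print(is_valid((3, 5), (2, 4)))   # True
print(is_valid((3, 5), (4, 4)))   # False
print(is_valid((3, 5), (-1, 0)))  # False
```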
Here is my benchmark (Method 0 is @mozway's solution, Method 1 is my solution):
I have code that generates a boolean array that acts as a mask on numpy arrays, along the lines of:
def func():
    a = numpy.arange(10)
    mask = a % 2 == 0
    return a[mask]
Now, I need to separate this into a case where the mask is created, and one where it is not created and all values are used instead. This could be achieved as follows:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = numpy.ones(10, dtype=bool)
    return a[mask]
However, this becomes extremely wasteful for large arrays, since an equally large boolean array must first be created.
My question is thus: Is there something I can pass as an "index" to recreate the behavior of such an everywhere-true array?
Systematically changing occurrences of a[mask] to something else involving some indexing magic etc. is a valid solution, but just avoiding the masking entirely via an expanded case distinction or something else that changes the structure of the code is not desired, as it would impair readability and maintainability (see next paragraph).
For the sake of completeness, here's what I'm currently considering doing, though this makes the code messier and less streamlined since it expands the if/else beyond where it technically needs to be (in reality, the mask is used more than once, hence every occurrence would need to be contained within the case distinction; I used f1 and f2 as examples here):
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
        r = f1(a[mask])
        q = f2(a[mask], r)
        return q
    else:
        r = f1(a)
        q = f2(a, r)
        return q
Recall that a[:] returns the contents of a (even if a is multidimensional). We cannot store the : in the mask variable, but we can use a slice object equivalently:
def func(use_mask):
    a = numpy.arange(10)
    if use_mask:
        mask = a % 2 == 0
    else:
        mask = slice(None)
    return a[mask]
This does not use any memory to create the index array. I'm not sure what the CPU usage of the a[slice(None)] operation is, though.
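On the cost question, one thing that can be checked directly: indexing with a slice is basic indexing and returns a view of the same data, while indexing with a boolean mask returns a copy. A small sketch (the variable names are mine):

```python
import numpy as np

a = np.arange(10)

everything = a[slice(None)]  # equivalent to a[:]
evens = a[a % 2 == 0]        # boolean-mask indexing

# Basic slicing shares the underlying buffer with a.
print(np.shares_memory(a, everything))
# Boolean-mask indexing allocates a new array.
print(np.shares_memory(a, evens))
```

One consequence of the view is that writing into the result of a[slice(None)] also changes a, which is not the case for a[mask].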
I have an array of numbers:
my_arr = np.array([n, n+1, n+2 ... , m-1, m])
I want to create an array of Booleans which indicate which numbers are in some (closed) interval, [A,B], to operate on some other related array that has the same shape. There are two cases:
Case 1: B >= m or A <= n
This case is trivial; the interval can be fully described with only one Boolean expression, and np.where() provides the solution to testing my array; e.g.:
my_boolean_arr = np.where(my_arr >= A)
or its equivalent for B. This works.
Case 2: n <= A and m >= B
Here, I run into problems. I can no longer reduce my interval expression into a "single" Boolean expression. Python allows me to come close: the expression A < x < B will return a single (correct) Boolean. However,
my_boolean_arr = np.where(A <= my_arr <= B)
now fails:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
So, two questions. First, how do I make this work? Second, why does this fail?
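Python evaluates A <= my_arr <= B as (A <= my_arr) and (my_arr <= B), and the operators "and" and "or" are not defined elementwise for numpy arrays: they need a single truth value for the whole array, which is ambiguous.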
In your case, you can use np.logical_and instead:
my_boolean_arr = np.logical_and(my_arr>=A, my_arr<=B)
https://numpy.org/doc/stable/reference/generated/numpy.logical_and.html
An alternative is to use the & operator:
my_boolean_arr = (my_arr>=A) & (my_arr<=B)
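A short sketch with made-up bounds and data, showing that both spellings give the same Boolean array. The mask is built directly rather than through np.where, since np.where with a single argument returns index arrays, not Booleans.

```python
import numpy as np

A, B = 3, 6                # hypothetical interval bounds
my_arr = np.arange(1, 9)   # hypothetical data: 1, 2, ..., 8

mask_func = np.logical_and(my_arr >= A, my_arr <= B)
mask_op = (my_arr >= A) & (my_arr <= B)

print(mask_op)                             # Boolean array, same shape as my_arr
print(my_arr[mask_op])                     # [3 4 5 6]
print(np.array_equal(mask_func, mask_op))  # True
```

The parentheses around each comparison are required with &, because & binds more tightly than >= and <=.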
Python short-circuits the logical operators.
For example:
if False and Condition2:
    # Condition2 won't even be checked because the first condition is already false.
Is there a way to stop this behavior? I want it to check both conditions and then perform the and operation (as done in C, C++, etc.). It's useful when we are performing some operation along with the condition, e.g.:
if a < p.pop() and b < p.pop():
One way would be to evaluate the conditions beforehand and then compare the Boolean values. But that would be a waste of memory.
if all([a < p.pop(), b < p.pop()]):
This creates a list, which will be evaluated in its entirety, and then uses all to confirm that both values are truthy. But this is somewhat obscure and I'd rather suggest you write plain, easy to understand code:
a_within_limit = a < p.pop()
b_within_limit = b < p.pop()
if a_within_limit and b_within_limit:
If the conditions are booleans, as they are in your example, you could use & instead:
>>> a, b, p = 1, 1, [0, 0]
>>> (a < p.pop()) & (b < p.pop())
False
>>> p
[]
You can use the all() and any() built-in functions to emulate the and and or operators without short-circuiting. Both take an iterable of boolean-like values as their parameter. If you give them a literal tuple or list, all members will be fully evaluated:
# all emulates the and operator
if all((False, Condition2)):
    do_stuff()
# any emulates the or operator
if any((False, Condition2)):
    do_stuff()
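One caveat worth illustrating: this only works with a literal list or tuple. If you pass a generator expression instead, all() and any() short-circuit again. A small sketch, where check is a hypothetical helper that records each call:

```python
calls = []

def check(name, result):
    # Hypothetical helper: records that it ran, then returns its result.
    calls.append(name)
    return result

# List literal: both calls run before all() sees anything.
all([check("first", False), check("second", True)])
list_calls = list(calls)
print(list_calls)  # ['first', 'second']

calls.clear()

# Generator expression: all() stops at the first falsy value.
all(check(name, result) for name, result in [("first", False), ("second", True)])
generator_calls = list(calls)
print(generator_calls)  # ['first']
```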
Short answer: No, you cannot stop it from doing this. What you can do is evaluate the operands beforehand and then combine the results.
For example:
av = p.pop()
bv = p.pop()
if a < av and b < bv:
    pass
Or:
av, bv = p.pop(), p.pop()
if a < av and b < bv:
    pass
Also, there is no waste of memory in these examples. In Python, almost everything is done by reference. The value object being popped already exists somewhere. Even scalars like strings, ints, etc. are objects (some of them are slightly optimized). The only memory changes here are (1) the creation of a new variable that refers to the same existing object, and (2) the removal of the entry in the container that referred to that object before popping, which happens at the same time. The two are of similar scale.
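The by-reference point can be checked with an identity test. In this sketch, big is a hypothetical large object placed in the list:

```python
big = list(range(1_000_000))  # hypothetical large object
p = [big]

av = p.pop()

# pop() hands back a reference to the existing object, not a copy.
print(av is big)  # True
print(p)          # []
```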
I want to write this if statement as compactly as possible (to avoid duplicate code):
if length == 10 if boolean is False or length == 13 if boolean is True:
The part that PyCharm does not like is the
if boolean is True
It asks for a colon.
PyCharm does not allow me to run it. Does anyone have a nice compact solution to this if?
I think you meant
if (not boolean and length == 10) or (boolean and length == 13):
The parentheses aren't necessary, but I think they help readability. @jonsharpe's solution is even shorter and only has to evaluate boolean once, but it may be harder to read, especially if you're not familiar with Python's ternary expressions.
Never use is for equality comparison (that's what == is for), but boolean types should never be explicitly compared to True or False anyway.
You can use a conditional expression (also known as a "ternary") to write it out much more concisely:
if length == 13 if boolean else length == 10:
or, equivalently:
if length == (13 if boolean else 10):
Per the documentation:
The expression x if C else y first evaluates the condition, C (not x); if C is true, x is evaluated and its value is returned; otherwise, y is evaluated and its value is returned.
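To convince yourself that the three spellings above agree, a quick check over a few values can help. The function names here are only for illustration:

```python
def with_and_or(length, boolean):
    return (not boolean and length == 10) or (boolean and length == 13)

def ternary_outer(length, boolean):
    # Parsed as (length == 13) if boolean else (length == 10).
    return length == 13 if boolean else length == 10

def ternary_inner(length, boolean):
    return length == (13 if boolean else 10)

for boolean in (True, False):
    for length in (9, 10, 13, 14):
        expected = with_and_or(length, boolean)
        assert ternary_outer(length, boolean) == expected
        assert ternary_inner(length, boolean) == expected

print("all three forms agree")
```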