element wise "contains" in Python - python

Say I have an array:
import numpy as np
arr = np.random.randint(0, 5, 20)
then arr>3 results in an array of type bool with shape (20,). How can I most efficiently do the same thing with the "contains" operator? The simple
arr in [2, 4]
will result in "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Is there another way than
np.array([ x in [2, 4] for x in arr])
?

You can use np.isin:
import numpy as np
arr = np.random.randint(0, 5, 20)
np.isin(arr, [2, 4])
Output:
array([False, True, False, False, False, True, False, False, True,
False, False, False, True, False, False, True, False, True,
True, True])
The function returns a boolean array with the same shape as the input array arr. It is True where an element of arr appears in the second argument, here the list [2, 4], and False otherwise.
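As a minimal sketch of the above, using a fixed array instead of random data so the result is reproducible (the names values, mask and not_mask are just for illustration):

```python
import numpy as np

arr = np.array([0, 2, 4, 1, 3, 4])
values = [2, 4]  # illustrative set of values to test against

# Element-wise "arr in values": one boolean per element of arr.
mask = np.isin(arr, values)

# Element-wise "arr not in values", via the invert flag.
not_mask = np.isin(arr, values, invert=True)

print(mask.tolist())       # [False, True, True, False, False, True]
print(not_mask.tolist())   # [True, False, False, True, True, False]
print(arr[mask].tolist())  # [2, 4, 4]
```

The mask can be used directly for boolean indexing, as in the last line.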

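pandas offers this as the isin method on pd.Series. A plain np.ndarray has no isin method, so use the np.isin function for it instead. So far I don't know of any other array module that provides this.
import numpy as np
import pandas as pd
a = pd.Series([0,1,2,3,4,5,6,7,8,9])
print(a.isin([0,3]).any()) # prints True
print(np.isin(a.values, [0,3]).any()) # prints True (a.values is an np.ndarray)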

Related

Python: comparing numpy array with sub-numpy array without loop

My problem is quite simple but I cannot figure how to solve it without a loop.
I have a first numpy array:
FullArray = np.array([0,1,2,3,4,5,6,7,8,9])
and a sub array (not necessarily ordered in the same way):
SubArray = np.array([8, 3, 5])
I would like to create a bool array that has the same size as the full array and that is True if a given value of FullArray is present in SubArray and False otherwise.
For example here I expect to get:
BoolArray = np.array([False, False, False, True, False, True, False, False, True, False])
Is there a way to do this without using a loop?
You can use np.isin:
np.isin(FullArray, SubArray)
# array([False, False, False, True, False, True, False, False, True, False])

Numpy slice of a slice assignment fails unexpectedly

This attempt at a slice+assignment operation fails unexpectedly:
>>> x = np.array([True, True, True, True])
>>> x[x][0:2] = False
>>> x
array([ True, True, True, True])
I'd like to understand why the above simplified code snippet fails to assign the underlying array values.
Seemingly equivalent slicing+assignment operations do work, for example:
>>> x = np.array([True, True, True, True])
>>> x[0:4][0:2] = False
>>> x
array([False, False, True, True])
np.version.version == 1.17.0
This does not work because x[x] is a copy, not a view, so the assignment modifies a slice of that temporary copy, which is never saved. Indeed, if we evaluate x[x], we see that it has no base:
>>> x[x].base is None
True
We can however assign to the first two, or last five, etc. items, by first calculating the indices:
>>> x = np.array([True, True, True, True])
>>> x[np.where(x)[0][:2]] = False
>>> x
array([False, False, True, True])
Here np.where(x) returns a 1-tuple containing the indices at which x is True. For the original, all-True x:
>>> np.where(x)
(array([0, 1, 2, 3]),)
We then slice that index array and assign to x through the sliced indices.
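The copy-versus-view distinction can be checked directly. This is only a sketch: it uses np.shares_memory to test whether two arrays overlap in memory, and np.flatnonzero as a shorthand for np.where(x)[0] on a 1-D array.

```python
import numpy as np

x = np.array([True, True, True, True])

# Basic slicing gives a view onto the same memory as x.
print(np.shares_memory(x, x[0:4]))   # True

# Boolean-mask indexing gives an independent copy.
print(np.shares_memory(x, x[x]))     # False

# Writing into the copy leaves x untouched.
tmp = x[x]
tmp[0:2] = False
print(x.tolist())                    # [True, True, True, True]

# Compute the target positions first, then index x once.
idx = np.flatnonzero(x)[:2]          # positions of the first two True entries
x[idx] = False
print(x.tolist())                    # [False, False, True, True]
```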

Python: numpy array larger and smaller than a value

How do I find the numbers that fall within a range?
>>> c = np.array([2,3,4,5,6])
>>> c>3
array([False, False, True, True, True])
However, when I test c against two bounds at once, it raises an error:
>>> 2<c<5
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The desired output is
array([False, True, True, False, False])
Try this,
(c > 2) & (c < 5)
Result
array([False, True, True, False, False], dtype=bool)
Python evaluates 2<c<5 as (2<c) and (c<5) which would be valid, except the and keyword doesn't work as we would want with numpy arrays. (It attempts to cast each array to a single boolean, and that behavior can't be overridden, as discussed here.) So for a vectorized and operation with numpy arrays you need to do this:
(2<c) & (c<5)
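For reference, a small runnable sketch of the vectorized range test, along with the equivalent np.logical_and call (the variable names are only illustrative):

```python
import numpy as np

c = np.array([2, 3, 4, 5, 6])

# The parentheses matter: & binds more tightly than the comparisons.
mask = (2 < c) & (c < 5)

# Function form of the same element-wise "and".
same = np.logical_and(2 < c, c < 5)

print(mask.tolist())              # [False, True, True, False, False]
print(np.array_equal(mask, same)) # True
print(c[mask].tolist())           # [3, 4]
```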
You can do something like this :
import numpy as np
c = np.array([2,3,4,5,6])
output = [(i and j) for i, j in zip(c>2, c<5)]
Output :
[False, True, True, False, False]

Find indices in an array that contain one of values from another array

How do I get the indices of the values in an array (a) that match any of several "markers" given in another array (label)? For example, given
label = array([1, 2])
a = array([1, 1, 2, 2, 3, 3])
the goal is to find the indices of a with the value of 1 or 2; that is, 0, 1, 2, 3.
I tried several combinations. None of the following seems to work.
label = array([1, 2])
a = array([1, 1, 2, 2, 3, 3])
idx = where(a==label) # gives me only the index of the last value in label
idx = where(a==label[0] or label[1]) # Is confused by all or any?
idx = where(a==label[0] | label[1]) # gives me results as if nor. idx = [4,5]
idx = where(a==label[0] || label[1]) # syntax error
idx = where(a==bolean.or(label,0,1) # I know, this is not the correct form but I don`t remember it correctly but remember the error: also asks for a.all or a.any
idx = where(label[0] or label[1] in a) # gives me only the first appearance. index = 0. Also without where().
idx = where(a==label[0] or a==label[1]).all()) # syntax error
idx = where(a.any(0,label[0] or label[1])) # gives me only the first appearance. index=0. Also without where().
idx = where(a.any(0,label[0] | label[1])) # gives me only the first appearance. index=0. Also without where().
idx=where(a.any(0,label)) # Datatype not understood
Ok, I think you get my problem. Does anyone know how to do it correctly? Best would be a solution with a general label instead of label[x] so that the use of label is more variable for later changes.
You can use numpy.in1d:
>>> a = numpy.array([1, 1, 2, 2, 3, 3])
>>> label = numpy.array([1, 2])
>>> numpy.in1d(a, label)
array([ True, True, True, True, False, False], dtype=bool)
The above returns a mask. If you want indices, you can call numpy.nonzero on the mask array.
Also, if the values in label array are unique, you can pass assume_unique=True to in1d to possibly speed it up.
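Putting the mask and the index lookup together might look like the sketch below. It uses np.isin, which recent NumPy releases recommend over np.in1d for this purpose. The results should be identical for 1-D input like this.

```python
import numpy as np

a = np.array([1, 1, 2, 2, 3, 3])
label = np.array([1, 2])

# Boolean mask: True where a holds any value from label.
mask = np.isin(a, label)

# np.nonzero returns a tuple with one index array per dimension;
# for a 1-D mask, take the first (and only) entry.
idx = np.nonzero(mask)[0]

print(mask.tolist())  # [True, True, True, True, False, False]
print(idx.tolist())   # [0, 1, 2, 3]
```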
np.where(a==label) is the same as np.nonzero(a==label). It gives the coordinates (indices) of all nonzero (or True) elements in the array a==label.
So instead of trying all these different where expressions, focus on the conditional array.
Without the where, here's what some of your expressions produce:
In [40]: a==label # 2 arrays don't match in size, scalar False
Out[40]: False
In [41]: a==label[0] # result is the size of a
Out[41]: array([ True, True, False, False, False, False], dtype=bool)
In [42]: a==label[0] or label[1] # or is a Python scalar operation
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [43]: a==label[0] | label[1]
Out[43]: array([False, False, False, False, True, True], dtype=bool)
This last is the same as a==(label[0] | label[1]), the | is evaluated before the ==.
You need to understand how each of those results (array, scalar, or error) is produced before you can understand what where gives you.
Correct combination of 2 equality tests (the extra () are important):
In [44]: (a==label[1]) | (a==label[0])
Out[44]: array([ True, True, True, True, False, False], dtype=bool)
Using broadcasting to separately test the 2 elements of label. Result is 2d array:
In [45]: a==label[:,None]
Out[45]:
array([[ True, True, False, False, False, False],
[False, False, True, True, False, False]], dtype=bool)
In [47]: (a==label[:,None]).any(axis=0)
Out[47]: array([ True, True, True, True, False, False], dtype=bool)
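The broadcasting approach can be checked against np.isin in a short sketch. The name table is just illustrative. Note that the intermediate array has len(label) * len(a) entries, so this is best suited to short label lists.

```python
import numpy as np

a = np.array([1, 1, 2, 2, 3, 3])
label = np.array([1, 2])

# label[:, None] has shape (2, 1); comparing it with a (shape (6,))
# broadcasts to a (2, 6) table with one row per label.
table = a == label[:, None]
print(table.shape)     # (2, 6)

# Collapse over the label axis: True where a matches at least one label.
mask = table.any(axis=0)
print(mask.tolist())   # [True, True, True, True, False, False]

# Same answer as the dedicated membership function.
print(np.array_equal(mask, np.isin(a, label)))  # True
```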
As I understand it, you want the indices of 1 and 2 in array "a".
In that case, try
label = [1, 2]
a = [1, 1, 2, 2, 3, 3]
idx_list = []
for x in label:
    for i in range(len(a)):  # cover every index, including the last one
        if a[i] == x:
            idx_list.append(i)
I think what I'm reading as your intent is to get the indices in the second list, 'a', of the values in the first list, 'labels'. I think that a dictionary is a good way to store this information where the labels will be keys and indices will be the values.
Try this:
labels = [1, 2]
a = [1, 1, 2, 2, 3, 3]
results = {}
for label in labels:
    results[label] = [i for i, x in enumerate(a) if x == label]
If you want the indices of 1, just look up results[1]. The list comprehension and the enumerate function are the real MVPs here.
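If a is already a NumPy array, the same label-to-indices dictionary can be built without an explicit inner loop. This is only a sketch of one possible variant, not a replacement for the answer above:

```python
import numpy as np

a = np.array([1, 1, 2, 2, 3, 3])
labels = [1, 2]

# For each label, a == label is a boolean mask and np.flatnonzero
# returns the positions where that mask is True.
results = {label: np.flatnonzero(a == label).tolist() for label in labels}

print(results)      # {1: [0, 1], 2: [2, 3]}
print(results[1])   # [0, 1]
```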

Use of python's logical operators when slicing a numpy array

I would like to perform a slicing on a two dimensional numpy array:
type1_c = type1_c[
(type1_c[:,10]==2) or
(type1_c[:,10]==3) or
(type1_c[:,10]==4) or
(type1_c[:,10]==5) or
(type1_c[:,10]==6)
]
The syntax looks right; however I got the following error message:
'The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()'
I really don't understand what's going wrong. Any idea?
The or operator is unambiguous between two scalars, but what is the right vector generalization? If x == array([0, 0]) and y == array([0, 1]), should x or y be (1) False, because not all pairwise terms or-ed together are True, (2) True, because at least one pairwise or result is True, (3) array([0, 1]), because that is the pairwise result of an or, or (4) array([0, 0]), because [0, 0] or [0, 1] would return [0, 0] for lists (nonempty lists are truthy), and arrays should arguably behave the same way?
You could use | here, and treat it as a bitwise issue:
>>> import numpy as np
>>> vec = np.arange(10)
>>> vec[(vec == 2) | (vec == 7)]
array([2, 7])
Or explicitly use NumPy's vectorized logical or:
>>> np.logical_or(vec==3, vec==5)
array([False, False, False, True, False, True, False, False, False, False], dtype=bool)
>>> vec[np.logical_or(vec==3, vec==5)]
array([3, 5])
or use in1d, which is far more efficient here:
>>> np.in1d(vec, [2, 7])
array([False, False, True, False, False, False, False, True, False, False], dtype=bool)
>>> vec[np.in1d(vec, [2, 7])]
array([2, 7])
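Applied to the original two-dimensional case, the row selection might look like the sketch below. The array data is a made-up stand-in for type1_c (only column 10 matters), and wanted is an illustrative name. np.isin is used here and behaves like in1d for a 1-D column.

```python
import numpy as np

# Hypothetical stand-in for type1_c: 6 rows, 11 columns,
# with column 10 holding the code to filter on.
data = np.zeros((6, 11), dtype=int)
data[:, 0] = np.arange(6)            # row id, to see which rows survive
data[:, 10] = [1, 2, 5, 7, 3, 9]

wanted = [2, 3, 4, 5, 6]

# One membership test replaces the chain of == comparisons.
rows = data[np.isin(data[:, 10], wanted)]

print(rows[:, 0].tolist())    # [1, 2, 4]
print(rows[:, 10].tolist())   # [2, 5, 3]
```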
