I would like to perform a slicing on a two dimensional numpy array:
type1_c = type1_c[
    (type1_c[:,10]==2) or
    (type1_c[:,10]==3) or
    (type1_c[:,10]==4) or
    (type1_c[:,10]==5) or
    (type1_c[:,10]==6)
]
The syntax looks right to me; however, I get the following error message:
'The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()'
I really don't understand what's going wrong. Any ideas?
or is unambiguous between two scalars, but what is the right vector generalization? If x == array([0, 0]) and y == array([0, 1]), should x or y be:
(1) False, because not all of the pairwise or results are True,
(2) True, because at least one pairwise or result is True,
(3) array([0, 1]), the pairwise (elementwise) result of an or, or
(4) array([0, 0]), because [0, 0] or [0, 1] returns [0, 0] (nonempty lists are truthy, so perhaps arrays should be too)?
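NumPy refuses to pick among (1)-(4) for plain or, and instead provides elementwise operators that give the pairwise result (option 3); a quick sketch:

```python
import numpy as np

x = np.array([0, 0])
y = np.array([0, 1])

print(x | y)                # [0 1] -- elementwise, option (3)
print(np.logical_or(x, y))  # [False  True]

try:
    x or y  # plain `or` needs a single truth value for the whole array
except ValueError as e:
    print(e)  # the "truth value ... is ambiguous" error from the question
```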
You could use | here, treating it as an elementwise (bitwise) operation:
>>> import numpy as np
>>> vec = np.arange(10)
>>> vec[(vec == 2) | (vec == 7)]
array([2, 7])
Or explicitly use NumPy's vectorized logical or:
>>> np.logical_or(vec==3, vec==5)
array([False, False, False, True, False, True, False, False, False, False], dtype=bool)
>>> vec[np.logical_or(vec==3, vec==5)]
array([3, 5])
or use in1d, which is far more efficient here:
>>> np.in1d(vec, [2, 7])
array([False, False, True, False, False, False, False, True, False, False], dtype=bool)
>>> vec[np.in1d(vec, [2, 7])]
array([2, 7])
Related
Say I have an array:
import numpy as np
arr = np.random.randint(0, 5, 20)
then arr>3 results in an array of type bool with shape (20,). How can I most efficiently do the same thing with the "contains" operator? The simple
arr in [2, 4]
will result in "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Is there another way than
np.array([ x in [2, 4] for x in arr])
?
You can use np.isin:
import numpy as np
arr = np.random.randint(0, 5, 20)
np.isin(arr, [2, 4])
Output:
array([False, True, False, False, False, True, False, False, True,
False, False, False, True, False, False, True, False, True,
True, True])
The function returns a boolean array with the same shape as your input array arr: True where an element of arr is in the second argument [2, 4], and False otherwise.
pandas offers this via pd.Series.isin; for a plain np.ndarray use np.isin, since np.ndarray itself has no isin method. So far I don't know of any other array module that provides this.
a = pd.Series([0,1,2,3,4,5,6,7,8,9])
print(a.isin([0,3]).any())             # returns True
print(np.isin(a.values, [0,3]).any())  # returns True (a.values is an np.ndarray)
This attempt at a slice+assignment operation fails unexpectedly:
>>> x = np.array([True, True, True, True])
>>> x[x][0:2] = False
>>> x
array([ True, True, True, True])
I'd like to understand why the above simplified code snippet fails to assign the underlying array values.
Seemingly equivalent slicing+assignment operations do work, for example:
>>> x = np.array([True, True, True, True])
>>> x[0:4][0:2] = False
>>> x
array([False, False, True, True])
np.version.version == 1.17.0
This does not work because x[x] is boolean (fancy) indexing, which returns a copy, not a view; the assignment then modifies a slice of that copy, and the copy is discarded. Indeed, if we evaluate x[x], we see it has no base (a view would point back at x through .base):
>>> x[x].base is None
True
We can however assign to the first two, or last five, etc. items, by first calculating the indices:
>>> x = np.array([True, True, True, True])
>>> x[np.where(x)[0][:2]] = False
>>> x
array([False, False, True, True])
Here np.where(x) will return a 1-tuple that contains the indices for which x is True:
>>> np.where(x)
(array([0, 1, 2, 3]),)
we then slice that array, and assign the indices of the sliced array.
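Note that assigning through a boolean index in a single step does work, because that form calls x.__setitem__ directly instead of slicing a temporary copy; a quick sketch:

```python
import numpy as np

x = np.array([True, True, True, True])
# One-step fancy-index assignment: no intermediate copy is created,
# so the original array is modified in place.
x[x] = np.array([False, False, True, True])
print(x)  # [False False  True  True]
```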
How to get the index of values in an array (a) by a another array (label) with more than one "markers"? For example, given
label = array([1, 2])
a = array([1, 1, 2, 2, 3, 3])
the goal is to find the indices of a with the value of 1 or 2; that is, 0, 1, 2, 3.
I tried several combinations. None of the following seems to work.
label = array([1, 2])
a = array([1, 1, 2, 2, 3, 3])
idx = where(a==label) # gives me only the index of the last value in label
idx = where(a==label[0] or label[1]) # Is confused by all or any?
idx = where(a==label[0] | label[1]) # gives me results as if nor. idx = [4,5]
idx = where(a==label[0] || label[1]) # syntax error
idx = where(a==boolean.or(label,0,1)) # I know this is not the correct form and I don't remember it exactly, but the error again asked for a.all() or a.any()
idx = where(label[0] or label[1] in a) # gives me only the first appearance. index = 0. Also without where().
idx = where(a==label[0] or a==label[1]).all()) # syntax error
idx = where(a.any(0,label[0] or label[1])) # gives me only the first appearance. index=0. Also without where().
idx = where(a.any(0,label[0] | label[1])) # gives me only the first appearance. index=0. Also without where().
idx=where(a.any(0,label)) # Datatype not understood
Ok, I think you get my problem. Does anyone know how to do it correctly? Ideally the solution would use label as a whole rather than label[x], so it keeps working when label changes later.
You can use numpy.in1d:
>>> a = numpy.array([1, 1, 2, 2, 3, 3])
>>> label = numpy.array([1, 2])
>>> numpy.in1d(a, label)
array([ True, True, True, True, False, False], dtype=bool)
The above returns a mask. If you want indices, you can call numpy.nonzero on the mask array.
Also, if the values in both a and label are unique, you can pass assume_unique=True to in1d to possibly speed it up.
np.where(a==label) is the same as np.nonzero(a==label). It tells us the coordinates (indices) of all non-zero (or True) elements in the array a==label.
So instead of trying all these different where expressions, focus on the conditional array
Without the where here's what some of your expressions produce:
In [40]: a==label # the 2 arrays don't match in size and can't broadcast; older NumPy returns the scalar False
Out[40]: False
In [41]: a==label[0] # result is the size of a
Out[41]: array([ True, True, False, False, False, False], dtype=bool)
In [42]: a==label[0] or label[1] # or is a Python scalar operation
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
In [43]: a==label[0] | label[1]
Out[43]: array([False, False, False, False, True, True], dtype=bool)
This last is the same as a==(label[0] | label[1]): | binds more tightly than ==, so it is evaluated first, and 1 | 2 is 3.
You need to understand how each of those arrays (or scalar or error) are produced before you understand what where gives you.
Correct combination of 2 equality tests (the extra () are important):
In [44]: (a==label[1]) | (a==label[0])
Out[44]: array([ True, True, True, True, False, False], dtype=bool)
Using broadcasting to separately test the 2 elements of label. Result is 2d array:
In [45]: a==label[:,None]
Out[45]:
array([[ True, True, False, False, False, False],
[False, False, True, True, False, False]], dtype=bool)
In [47]: (a==label[:,None]).any(axis=0)
Out[47]: array([ True, True, True, True, False, False], dtype=bool)
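To get from the combined mask back to the indices the question asks for (0, 1, 2, 3), wrap it in np.nonzero (or np.where):

```python
import numpy as np

label = np.array([1, 2])
a = np.array([1, 1, 2, 2, 3, 3])

mask = (a == label[:, None]).any(axis=0)  # broadcast, then collapse over label
idx = np.nonzero(mask)[0]                 # positions where the mask is True
print(idx)  # [0 1 2 3]
```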
As I understand it, you want the indices of 1 and 2 in array "a".
In that case, try
label = [1, 2]
a = [1, 1, 2, 2, 3, 3]
idx_list = []
for x in label:
    for i in range(len(a)):  # range(0, len(a)-1) would skip the last element
        if a[i] == x:
            idx_list.append(i)
I think what you're after is the indices in the second list, a, of the values in the first list, labels. A dictionary is a good way to store this information: the labels become keys and the lists of indices become the values.
Try this:
labels = [1, 2]
a = [1, 1, 2, 2, 3, 3]
results = {}
for label in labels:
    results[label] = [i for i, x in enumerate(a) if x == label]
If you want the indices of 1, just call results[1]. The list comprehension and the enumerate function are the real MVPs here.
I am fairly new to numpy and scientific computing and I struggle with a problem for several days, so I decided to post it here.
I am trying to get a count for a specific occurrence of a condition in a numpy array.
In [233]: import numpy as np
In [234]: a= np.random.random([5,5])
In [235]: a >.7
Out[235]: array([[False, True, True, False, False],
[ True, False, False, False, True],
[ True, False, True, True, False],
[False, False, False, False, False],
[False, False, True, False, False]], dtype=bool)
What I would like is to count the number of occurrences of True in each row and keep the rows where this count reaches a certain threshold, e.g.:
results=[]
threshold = 2
for i, row in enumerate(a > .7):
    if len([value for value in row if value == True]) > threshold:
        results.append(i)  # keep ids for each row that has more than 'threshold' Trues
This is the non-optimized version of the code but I would love to achieve the same thing with numpy (I have a very large matrix to process).
I have been trying all sorts of things with np.where, but I can only get flattened results; I need the row numbers.
Thanks in advance !
To make results reproducible, use some seed:
>>> np.random.seed(100)
Then for a sample matrix
>>> a = np.random.random([5,5])
Count the number of occurrences along the axis with sum:
>>> (a >.7).sum(axis=1)
array([1, 0, 3, 1, 2])
You can get row numbers with np.where:
>>> np.where((a > .7).sum(axis=1) >= 2)
(array([2, 4]),)
To filter result, just use boolean indexing:
>>> a[(a > .7).sum(axis=1) >= 2]
array([[ 0.89041156, 0.98092086, 0.05994199, 0.89054594, 0.5769015 ],
[ 0.54468488, 0.76911517, 0.25069523, 0.28589569, 0.85239509]])
You can sum over an axis with sum, then use np.where on the resulting vector. Note that you need to sum the boolean mask along axis=1 (across each row), not the raw values:
results = np.where((a > .7).sum(axis=1) > threshold)
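A sketch of the same row-filtering idea using np.count_nonzero, on a small made-up array (the values here are invented for illustration):

```python
import numpy as np

a = np.array([[0.9, 0.8, 0.1],
              [0.2, 0.9, 0.3],
              [0.8, 0.9, 0.95]])
threshold = 2

counts = np.count_nonzero(a > .7, axis=1)  # per-row count of True values
rows = np.where(counts >= threshold)[0]    # rows reaching the threshold
print(counts)  # [2 1 3]
print(rows)    # [0 2]
```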
I have an array of integers and want to find where that array is equal to any value in a list of multiple values.
This can easily be done by treating each value individually, or by using multiple "or" statements in a loop, but I feel like there must be a better/faster way to do it. I'm actually dealing with arrays of size 4000 x 2000, but here is a simplified version of the problem:
fake = arange(9).reshape((3,3))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
want = (fake==0) + (fake==2) + (fake==6) + (fake==8)
print want
array([[ True, False, True],
[False, False, False],
[ True, False, True]], dtype=bool)
What I would like is a way to get want from a single command involving fake and the list of values [0, 2, 6, 8].
I'm assuming there is a package that has this included already that would be significantly faster than if I just wrote a function with a loop in Python.
The function numpy.in1d seems to do what you want. The only problem is that it only works on 1d arrays, so you should use it like this:
In [9]: np.in1d(fake, [0,2,6,8]).reshape(fake.shape)
Out[9]:
array([[ True, False, True],
[False, False, False],
[ True, False, True]], dtype=bool)
I have no clue why this is limited to 1d arrays only. Looking at its source code, it first seems to flatten the two arrays, after which it does some clever sorting tricks. But nothing would stop it from unflattening the result at the end again, like I had to do by hand here.
NumPy 1.13+
As of NumPy v1.13, you can use np.isin, which works on multi-dimensional arrays:
>>> element = 2*np.arange(4).reshape((2, 2))
>>> element
array([[0, 2],
[4, 6]])
>>> test_elements = [1, 2, 4, 8]
>>> mask = np.isin(element, test_elements)
>>> mask
array([[False,  True],
       [ True, False]])
NumPy pre-1.13
The accepted answer with np.in1d works only with 1d arrays and requires reshaping for the desired result. This is the way to go for versions of NumPy before v1.13.
@Bas's answer is the one you're probably looking for. But here's another way to do it, using numpy's vectorize trick:
import numpy as np

S = {0, 2, 6, 8}

@np.vectorize
def contained(x):
    return x in S

contained(fake)
=> array([[ True, False,  True],
          [False, False, False],
          [ True, False,  True]], dtype=bool)
The con of this solution is that contained() is called for each element (i.e. in python-space), which makes this much slower than a pure-numpy solution.
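As a quick sanity check of that claim, the sketch below verifies the vectorize trick against np.isin, which performs the same membership test entirely in compiled code (assuming NumPy 1.13+ for np.isin):

```python
import numpy as np

fake = np.arange(9).reshape((3, 3))
S = {0, 2, 6, 8}

@np.vectorize
def contained(x):
    return x in S

# Both produce the same boolean mask; np.isin avoids calling
# contained() once per element in Python-space.
assert (contained(fake) == np.isin(fake, list(S))).all()
print(np.isin(fake, list(S)))
```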