Retrieving all numpy array indices where condition - python

Say I have a numpy array, a, of elements (which do not repeat), e.g. np.array([1,3,5,2,4]). I would like to retrieve the indices at which a contains the elements [4,2]. Desired output: np.array([3,4]), as these are the indices of the requested elements.
So far, I've tried
np.all(np.array([[1,2,3,4]]).transpose(), axis=1, where=lambda x: x in [1,2])
>>>
array([ True, True, True, True])
But this result does not make sense to me: the elements at indices 2 and 3 should be False.
Perhaps I need to search for one element at a time, but I'd prefer if this operation could be vectorized/fast.

I'd say the function you're looking for is numpy.isin()
arr = np.array([[1,2,3,4]])
print(np.where(np.isin(arr, [1,2])))
This should give the output you're looking for.
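To connect this back to the asker's original 1-D array, a minimal runnable sketch (note that np.where returns the indices in array order, not in the order of the query values):

```python
import numpy as np

# The asker's original 1-D array and query values
a = np.array([1, 3, 5, 2, 4])
targets = [4, 2]

# np.isin builds a boolean membership mask; np.where turns it into indices
mask = np.isin(a, targets)      # array([False, False, False,  True,  True])
indices = np.where(mask)[0]
print(indices)                  # [3 4]
```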


Pytorch tensor get the index of the element with specific values?

I have two tensors, tensor a and tensor b.
I want to get the indices in tensor a of all values that appear in tensor b.
For example.
a = torch.Tensor([1,2,2,3,4,4,4,5])
b = torch.Tensor([1,2,4])
I want the index of 1, 2, 4 in tensor a. I can do this by the following code.
a = torch.Tensor([1,2,2,3,4,4,4,5])
b = torch.Tensor([1,2,4])
mask = torch.zeros(a.shape).type(torch.bool)
print(mask)
for e in b:
    mask = mask + (a == e)
print(mask)
How can I do it without for?
Update:
As @zaydh kindly pointed out in the comments, since PyTorch 1.10, isin() and isinf() (and many other NumPy equivalents) are available as well, thus you can simply do:
torch.isin(a, b)
which would give you :
Out[4]: tensor([ True, True, True, False, True, True, True, False])
Old answer:
Is this what you want? :
np.in1d(a.numpy(), b.numpy())
will result in :
array([ True, True, True, False, True, True, True, False])
If you just do not want to use a for loop, you can use a list comprehension:
mask = [a[index] for index in b]
If you do not even want to use the "for" word, you can always convert the tensors to numpy and use numpy indexing.
mask = torch.tensor(a.numpy()[b.numpy()])
UPDATE
Might have misunderstood your question. In that case, I would say the best way to achieve this is through list comprehension. (Slicing will probably not achieve this.)
mask = [index for index,value in enumerate(a) if value in b.tolist()]
This iterates over every element in a, gets their index and values, and if the value is inside b, then gets the index.
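If the goal is indices rather than a mask and a vectorized route is preferred, the membership mask can be converted to positions with flatnonzero. A NumPy sketch (assuming the tensors convert cleanly via .numpy(); np.isin is the modern spelling of in1d):

```python
import numpy as np

a = np.array([1, 2, 2, 3, 4, 4, 4, 5])
b = np.array([1, 2, 4])

mask = np.isin(a, b)            # boolean membership mask
indices = np.flatnonzero(mask)  # positions where the mask is True
print(indices)                  # [0 1 2 4 5 6]
```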

Return True/False for entire array if any value meets mask requirement(s)

I have already tried looking at other similar posts however, their solutions do not solve this specific issue. Using the answer from this post I found that I get the error: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()" because I define my array differently from theirs. Their array is a size (n,) while my array is a size (n,m). Moreover, the solution from this post does not work either because it applies to lists. The only method I could think of was this:
When there is at least 1 True in array, then entire array is considered True:
filt = 4
tracktruth = list()
arraytruth = list()
arr1 = np.array([[1,2,4]])
for track in range(0, arr1.size):
    if filt == arr1[0, track]:
        tracktruth.append(True)
    else:
        tracktruth.append(False)
if any(tracktruth):
    arraytruth.append(True)
else:
    arraytruth.append(False)
When there is not a single True in array, then entire array is considered False:
filt = 5
tracktruth = list()
arraytruth = list()
arr1 = np.array([[1,2,4]])
for track in range(0, arr1.size):
    if filt == arr1[0, track]:
        tracktruth.append(True)
    else:
        tracktruth.append(False)
if any(tracktruth):
    arraytruth.append(True)
else:
    arraytruth.append(False)
The reason the second if-else statement is there is because I wish to apply this mask to multiple arrays and ultimately create a master list that describes which arrays are true and which are false in their entirety. However, with a for loop and two if-else statements, I think this would be very slow with larger arrays. What would be a faster way to do this?
This seems overly complicated; you can use boolean indexing to achieve this without loops:
arr1=np.array([[1,2,4]])
filt=4
arr1==filt
array([[False, False, True]])
np.sum(arr1==filt).astype(bool)
True
With more than one row, you can use a row or column index in the np.sum, or you can use the axis parameter to sum over rows or columns.
As pointed out in the comments, you can use np.any() instead of np.sum(...).astype(bool), and it runs in roughly 2/3 the time on the test dataset:
np.any(a==filt, axis=1)
array([ True])
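To illustrate the axis parameter with more than one row, a quick sketch with a hypothetical two-row array, giving one True/False per row:

```python
import numpy as np

filt = 4
arr = np.array([[1, 2, 4],
                [5, 6, 7]])

# axis=1 reduces over columns, producing one result per row
row_truth = np.any(arr == filt, axis=1)
print(row_truth)                # [ True False]
```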
You can do this with a list comprehension. I've done it here for one array, but it's easily extended to multiple arrays with a for loop.
filt = 4
arr1 = np.array([[1,2,4]])
print(any([part == filt for part in arr1[0]]))
You can get arraytruth more generally, with a list comprehension, for arrays of size (n, m):
import numpy as np
filt = 4
a = np.array([[1, 2, 4]])
b = np.array([[1, 2, 3],
              [5, 6, 7]])
array_lists = [a, b]
arraytruth = [True if a[a==filt].size>0 else False for a in array_lists]
print(arraytruth)
This will give you:
[True, False]
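The size>0 test can also be written with np.any, which reads more directly. A sketch with the same arrays:

```python
import numpy as np

filt = 4
a = np.array([[1, 2, 4]])
b = np.array([[1, 2, 3],
              [5, 6, 7]])

# np.any over the whole array: True if filt appears anywhere in it
arraytruth = [bool(np.any(arr == filt)) for arr in [a, b]]
print(arraytruth)               # [True, False]
```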
[edit] Use the numpy hstack method to flatten ragged input first:
filt = 4
arr = np.array([[1,2,3,4], [1,2,3]], dtype=object)
print(any(x == filt for x in np.hstack(arr)))

Query regarding a Numpy exercise in Datacamp

Just started learning python with Datacamp and I ran into a question on Numpy. When doing this problem (it's a standalone question, so it should be easy to understand without any context), I am confused by the instruction: "You can use a little trick here: use np_positions == 'GK' as an index for np_heights".
Nowhere in the code did it link np_heights and np_positions together, how could this index work? At first I thought I had to concatenate the two vertically but it turns out it's not necessary.
Is it because there are only two Numpy arrays and it just so happened that they have the same number of elements, Python decides to pair them up automatically? What if I have multiple Numpy arrays with the same number of elements and I use that index, will it be a problem?
The only thing they have in common is their length. Other than that, they are not linked together. The length comes into play when you use boolean indexing.
Consider the following array:
arr = np.array([1, 2, 3])
With boolean values, we can index into this array:
arr[[True, False, True]]
Out: array([1, 3])
This returned values at positions 0 and 2 (where they have True values).
This boolean array may come from anywhere. It may come from the same array with a comparison, or from a different array of the same length.
arr1 = np.array(['a', 'b', 'a', 'c'])
If I do arr1 == 'a' it will do an element-wise comparison and return
arr1 == 'a'
Out: array([ True, False, True, False], dtype=bool)
I can use this in the same array:
arr1[arr1=='a']
Out: array(['a', 'a'], dtype='<U1')
Or in a different array:
arr2 = np.array([2, 5, 1, 7])
arr2[arr1=='a']
Out: array([2, 1])
Note that this is no different than arr2[[True, False, True, False]]. So we are not actually using arr1 here. In your example, np_positions == 'GK' will return a boolean array too. Since it will have the same size as the np_height array, you will only deal with positions where the boolean array has True values.
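Applied to the exercise itself, a sketch with made-up values (only the array names np_positions and np_heights come from the question):

```python
import numpy as np

# Illustrative values; the boolean mask from one array indexes the other
# array of the same length
np_positions = np.array(['GK', 'M', 'A', 'GK'])
np_heights = np.array([191, 184, 185, 192])

gk_heights = np_heights[np_positions == 'GK']
print(gk_heights)               # [191 192]
```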

Difference in list size

Could someone explain the difference in the list sizes? One is (x,1) and the other (x,). I think I get an IndexError due to that.
Thanks
print(Annotation_Matrix)
[array([[1],
...,
[7],
[7],
[7]], dtype=uint8)]
print(idx)
[array([ True, True, True, ..., False, False, False], dtype=bool)]
p.s. the left one is created with
matlabfile.get(...)
the right one with
in1d(...)
An array A of size (x,1) is a matrix of x rows and 1 column (2 dimensions), which differs from A.T of size (1,x). They have the same elements but in a different 'orientation'.
An array B of size (x,) is a vector of x coordinates (1 dimension), without any orientation (it's neither a row nor a column). It's just a list of elements.
In the first case, one can access an element with A[i,:], which is the same as A[i,0] (because it has only one column).
In the latter, the call B[i,:] causes an error because the array B has only one dimension. The correct call is B[i].
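A minimal sketch of the two shapes and their indexing:

```python
import numpy as np

A = np.arange(3).reshape(3, 1)  # shape (3, 1): 2-D, one column
B = A.ravel()                   # shape (3,):   1-D vector

print(A.shape, B.shape)         # (3, 1) (3,)
print(A[1, 0])                  # 1 -- two indices for the 2-D array
print(B[1])                     # 1 -- one index for the 1-D array
```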
I hope this helps you to solve the problem.

Equality of copy.copy and copy.deepcopy in python copy module

I am creating a list of numpy arrays, then copying it to another array to keep an original copy. Copying was done using the deepcopy() function. When I compare the two arrays now, the equivalence check shows False. But it's all good when I use the copy() function. I understand the difference between the copy and deepcopy functions, but shouldn't the equivalence be the same?
That is:
grid1 = np.empty([3,3], dtype=object)
for i in xrange(3):
    for j in xrange(3):
        grid1[i][j] = [i, np.random.uniform(-3.5, 3.5, (3,3))]

grid_init = []
grid_init = copy.deepcopy(grid1)
grid1 == grid_init             # returns False

grid_init = []
grid_init = copy.copy(grid1)
grid1 == grid_init             # returns True

grid_init = []
grid_init = copy.deepcopy(grid1)
np.array_equal(grid1, grid_init)  # returns False
Shall all be not true?
This is what I'm getting when running the first example:
WARNING:py.warnings:/usr/local/bin/ipython:1: DeprecationWarning: elementwise comparison failed; this will raise the error in the future.
To see why the elementwise comparison fails, simply try to compare a single element:
grid_init=copy.deepcopy(grid1)
grid_init[0][0] == grid1[0][0]
>>> ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This fails because the second element in the list is itself a numpy array, and comparison of two numpy arrays does not return a bool (but an array).
Now, why does the example case behave differently?
Seems to be some interpreter optimization which avoids the actual comparison logic if the two objects are the same one. The two are the same object, because the copying was shallow.
grid_init=copy.copy(grid1)
grid_init[0][0] is grid1[0][0]
> True
grid_init[0][0] == grid1[0][0]
> True
The root cause is that you're using a numpy array of dtype=object, with lists in it. This is not a good idea, and can lead to all sorts of weirdnesses.
Instead, you should simply create 2 aligned arrays, one for the first element in your lists, and one for the second.
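A sketch of the two-aligned-arrays layout suggested here (shapes chosen to match the 3x3 grid of the question; with plain numeric dtypes, deepcopy then compares as expected):

```python
import copy
import numpy as np

# One array for the integer part, one for the 3x3 float blocks
ints = np.repeat(np.arange(3), 3).reshape(3, 3)
vals = np.random.uniform(-3.5, 3.5, (3, 3, 3, 3))

ints_copy = copy.deepcopy(ints)
vals_copy = copy.deepcopy(vals)
print(np.array_equal(ints, ints_copy))  # True
print(np.array_equal(vals, vals_copy))  # True
```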
I must be running a different version of numpy/python, but I get slightly different errors and/or results. Still the same issue applies - mixing arrays and lists can produce complicated results.
Make the 2 copies
In [217]: x=copy.copy(grid1)
In [218]: y=copy.deepcopy(grid1)
Equality with the shallow copy gives an element-by-element comparison, a 3x3 boolean:
In [219]: x==grid1
Out[219]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
The elements are 2-item lists:
In [220]: grid1[0,0]
Out[220]:
[0, array([[ 2.08833787, -0.24595155, -3.15694342],
[-3.05157909, 1.83814619, -0.78387624],
[ 1.70892355, -0.87361521, -0.83255383]])]
And in the shallow copy, the list ids are the same. The two arrays have different data buffers (x is not a view), but they both point to the same list objects (located elsewhere in memory).
In [221]: id(grid1[0,0])
Out[221]: 2958477004
In [222]: id(x[0,0])
Out[222]: 2958477004
With the same id the lists are equal (they also satisfy the is test).
In [234]: grid1[0,0]==x[0,0]
Out[234]: True
But == with the deepcopy produces a simple False. There is no element-by-element comparison here. I'm not sure why; maybe this is an area in which numpy is undergoing development.
In [223]: y==grid1
Out[223]: False
Note that the deepcopy element ids are different:
In [229]: id(y[0,0])
Out[229]: 2957009900
When I try to apply == to an element of these arrays I get an error:
In [235]: grid1[0,0]==y[0,0]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This is the error that comes up repeatedly in SO questions, usually because people try to use a boolean array (from a comparison) in a scalar Python context.
I can compare the arrays with in the lists:
In [236]: grid1[0,0][1]==y[0,0][1]
Out[236]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
I can reproduce the ValueError with a simpler comparison - 2 lists, which contain an array. On the surface they look the same, but because the arrays have different ids, it fails.
In [239]: [0,np.arange(3)]==[0,np.arange(3)]
...
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This pair of comparisons shows what is going on:
In [242]: [0,np.arange(3)][0]==[0,np.arange(3)][0]
Out[242]: True
In [243]: [0,np.arange(3)][1]==[0,np.arange(3)][1]
Out[243]: array([ True, True, True], dtype=bool)
Python compares the respective elements of the lists, and then tries to perform a logical operation to combine them, all(). But it can't perform all on [True, array([True,True,True])].
So in my version, y==grid1 returns False because the element by element comparisons return ValueErrors. It's either that or raise an error or warning. They clearly aren't equal.
In sum, with this array of lists of number and array, equality tests end up mixing array operations and list operations. The outcomes are logical, but complicated. You have to be keenly aware of how arrays are compared, and how lists are compared. They are not interchangeable.
A structured array
You could put this data in a structured array, with a dtype like:
In [263]: dt = np.dtype([('f0',int),('f1',float,(3,3))])
In [264]: grid2=np.empty([3,3],dtype=dt)
In [265]: for i in range(3):
   .....:     for j in range(3):
   .....:         grid2[i][j] = (i, np.random.uniform(-3.5, 3.5, (3,3)))
   .....:
In [266]: grid2
Out[266]:
array([[ (0,
[[2.719807845330254, -0.6379512247418969, -0.02567206509563602],
[0.9585030371031278, -1.0042751112999135, -2.7805349057485946],
[-2.244526250770717, 0.5740647379258945, 0.29076071288760574]]),
....]])]],
dtype=[('f0', '<i4'), ('f1', '<f8', (3, 3))])
The first field, the integers, can be fetched by field name, giving a 3x3 array:
In [267]: grid2['f0']
Out[267]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2]])
The second field contains 3x3 arrays, which when accessed by field name are a 4d array:
In [269]: grid2['f1'].shape
Out[269]: (3, 3, 3, 3)
A single element is a record (or tuple),
In [270]: grid2[2,1]
Out[270]: (2, [[1.6236266210555836, -2.7383730706629636, -0.46604477485902374], [-2.781740733659544, 0.7822732671353201, 3.0054266762730473], [3.3135671425199824, -2.7466097112667103, -0.15205961855874406]])
Now both kinds of copy produce the same thing:
In [271]: x=copy.copy(grid2)
In [272]: y=copy.deepcopy(grid2)
In [273]: x==grid2
Out[273]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
In [274]: y==grid2
Out[274]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True]], dtype=bool)
Since grid2 is pure ndarray (no intermediate lists) I suspect copy.copy and copy.deepcopy end up using grid2.copy(). In numpy we normally use the array copy method, and don't bother with the copy module.
p.s. it appears that with dtype=object, grid1.copy() is the same as copy.copy(grid1) - a new array, but the same object pointers (i.e. same data).
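The ndarray.copy behaviour mentioned above can be sketched as follows (for plain numeric dtypes the copy has an independent data buffer):

```python
import numpy as np

arr = np.array([1, 2, 3])
c = arr.copy()                  # independent data buffer, not a view
c[0] = 99
print(arr)                      # [1 2 3] -- original unchanged
print(c)                        # [99  2  3]
```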
