Using MATLAB's find-like operation in Python [duplicate]

This question already has answers here:
MATLAB-style find() function in Python
(9 answers)
Closed 8 years ago.
I have a matrix of shape 10,000 x 30 in Python. I want to find the indices of the rows, i.e., from the 10,000 rows, determine the indices whose 5th column value equals 0.
How can I get those indices? Once I have them, I want to select the corresponding rows from another matrix B. How do I accomplish this?

>>> a = np.random.randint(0, 10, (10, 5))
>>> a
array([[4, 9, 7, 2, 9],
       [1, 9, 5, 0, 8],
       [1, 7, 7, 8, 4],
       [6, 2, 1, 9, 6],
       [6, 2, 0, 0, 8],
       [5, 5, 8, 4, 5],
       [6, 8, 8, 8, 8],
       [2, 2, 3, 4, 3],
       [3, 6, 2, 1, 2],
       [6, 3, 2, 4, 0]])
>>> a[:, 4] == 0
array([False, False, False, False, False, False, False, False, False, True], dtype=bool)
>>> b = np.random.rand(10, 5)
>>> b
array([[ 0.37363295,  0.96763033,  0.72892652,  0.77217485,  0.86549555],
       [ 0.83041897,  0.35277681,  0.13011611,  0.82887195,  0.87522863],
       [ 0.88325189,  0.67976957,  0.56058782,  0.58438597,  0.10571746],
       [ 0.27305838,  0.72306733,  0.01630463,  0.86069002,  0.9458257 ],
       [ 0.23113894,  0.30396521,  0.92840314,  0.39544522,  0.59708927],
       [ 0.71878406,  0.91327744,  0.71407427,  0.65388644,  0.416599  ],
       [ 0.83550209,  0.85024774,  0.96788451,  0.72253464,  0.41661953],
       [ 0.61458993,  0.34527785,  0.20301719,  0.10626226,  0.00773484],
       [ 0.87275531,  0.54878131,  0.24933454,  0.29894835,  0.66966912],
       [ 0.59533278,  0.15037691,  0.37865046,  0.99402371,  0.17325722]])
>>> b[a[:,4] == 0]
array([[ 0.59533278, 0.15037691, 0.37865046, 0.99402371, 0.17325722]])
>>>
To get a find-like result instead of using logical indexing, use np.where, which returns a tuple of arrays that serve as indices into each dimension:
>>> indices = np.where(a[:, 4] == 0)
>>> b[indices[0]]
array([[ 0.59533278, 0.15037691, 0.37865046, 0.99402371, 0.17325722]])
>>>
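For a 1-D array of row indices even closer to MATLAB's find, np.flatnonzero is a convenient shorthand for np.where(cond)[0]. A small sketch (the arrays here are made up for illustration):

```python
import numpy as np

a = np.array([[4, 9, 7, 2, 0],
              [1, 9, 5, 0, 8],
              [6, 2, 0, 0, 0]])
b = np.arange(15).reshape(3, 5)

# np.flatnonzero(cond) returns a 1-D array of the indices where the
# condition holds, equivalent to np.where(cond)[0] for a 1-D condition.
rows = np.flatnonzero(a[:, 4] == 0)
print(rows)     # [0 2]
print(b[rows])  # rows 0 and 2 of b
```
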

Related

How to remove specific elements in a numpy array (passing a list of values not indexes)

I have a 1-D numpy array and a list of values to remove (not indexes). How can I modify this code so that the actual values, not the indexes, are removed?
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]
new_a = np.delete(a, values_to_remove)
So what I want to delete is the values 2, 3, 6, NOT the elements at those indexes. The list is quite long, so ideally I should be able to pass it as the second parameter.
The final array should be: 1, 4, 5, 7, 8, 9.
Use this:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]
for i in range(len(values_to_remove)):
    index = np.where(a == values_to_remove[i])
    a = np.delete(a, index[0][0])
print(a)
Output:
[1 4 5 7 8 9]
You can use numpy.isin
If you don't mind a copy:
out = a[~np.isin(a, values_to_remove)]
Output: array([1, 4, 5, 7, 8, 9])
Note that np.delete does not modify a in place; reassign the result (recent NumPy accepts a boolean mask as the second argument):
a = np.delete(a, np.isin(a, values_to_remove))
a is now: array([1, 4, 5, 7, 8, 9])
Intermediate:
np.isin(a, values_to_remove)
# array([False, True, True, False, False, True, False, False, False])
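If sorted, deduplicated output is acceptable, np.setdiff1d is another option; a sketch (note it sorts and removes duplicates, which is harmless here because a is already sorted and unique):

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
values_to_remove = [2, 3, 6]

# setdiff1d returns the sorted unique values of a that are not in
# values_to_remove.
out = np.setdiff1d(a, values_to_remove)
print(out)  # [1 4 5 7 8 9]
```
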

Pytorch/NumPy batched submatrix indexing

There's a single source (square) matrix L of shape (N, N)
import torch as pt
import numpy as np
N = 4
L = pt.arange(N*N).reshape(N, N) # or np.arange(N*N).reshape(N, N)
L = tensor([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11],
            [12, 13, 14, 15]])
and a matrix (vector of vectors) of boolean masks m of shape (K, N) according to which I'd like to extract submatrices from L.
K = 3
m = tensor([[ True,  True, False, False],
            [False,  True,  True, False],
            [False,  True, False,  True]])
I know how to extract a single submatrix using a single mask vector by calling L[m[i]][:, m[i]] for any i. So, for example, for i=0, we'd get
tensor([[0, 1],
        [4, 5]])
but I need to perform the operation along the entire "batch" dimension. The end result I'm looking for then could be achieved by
res = []
for i in range(K):
    res.append(L[m[i]][:, m[i]])
output = pt.stack(res)
however, I hope there is a better solution excluding the for loop. I realize that the for loop solution itself would crash if the sum of m along the last dimension (dim/axis=1) wasn't constant, but if I can guarantee that it is, is there a better solution? If there isn't, would changing the selector representation help? I chose boolean masks for convenience, but I prefer better performance.
Notice that you can get the first square by indexing together with broadcasting:
r = torch.tensor([0,1])
L[r[:,None], r]
output:
tensor([[0, 1],
        [4, 5]])
The same principle can be applied to the second square:
r = torch.tensor([1,2])
L[r[:,None], r]
output:
tensor([[ 5,  6],
        [ 9, 10]])
In combination you get:
i = torch.tensor([[0, 1], [1, 2]])
L[i[:,:,None], i[:,None]]
output:
tensor([[[ 0,  1],
         [ 4,  5]],

        [[ 5,  6],
         [ 9, 10]]])
All 3 squares:
i = torch.tensor([
    [0, 1],
    [1, 2],
    [1, 3],
])
L[i[:,:,None], i[:,None]]
output:
tensor([[[ 0,  1],
         [ 4,  5]],

        [[ 5,  6],
         [ 9, 10]],

        [[ 5,  7],
         [13, 15]]])
To summarize, I would suggest using indices instead of a mask.
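Since the question allows NumPy as well, here is a sketch of converting the boolean masks to the index form used above (assuming, as the question guarantees, that every mask row selects the same number of columns):

```python
import numpy as np

N, K = 4, 3
L = np.arange(N * N).reshape(N, N)
m = np.array([[True,  True,  False, False],
              [False, True,  True,  False],
              [False, True,  False, True]])

# np.argwhere(m) lists the (row, col) pairs of True entries in row-major
# order; column 1 holds the selected column indices. The reshape relies
# on every mask row selecting the same number of columns.
i = np.argwhere(m)[:, 1].reshape(K, -1)
out = L[i[:, :, None], i[:, None]]
print(out[0])  # [[0 1] [4 5]]
```
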

Remove entire sub array from multi-dimensional array if any element in array is duplicate

I have a multi-dimensional array in Python in which an integer may appear in more than one vector. For example:
array = [[1, 2, 3, 4],
         [2, 9, 12, 4],
         [5, 6, 7, 8],
         [6, 8, 12, 13]]
I would like to completely remove the vectors that contain any element that has appeared previously. In this case, the vectors [2,9,12,4] and [6,8,12,13] should be removed because they contain elements (2 and 6, respectively) that appeared in a previous vector. Note that [6,8,12,13] contains several elements that appeared previously, so the code should handle that scenario as well.
The resulting array should end up being:
array = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
I thought I could achieve this with np.unique(array, axis=0), but I couldn't find a function that handles this particular kind of uniqueness.
Any thoughts are appreciated.
You can work with an array of sorted numbers and the corresponding row indices, which looks like this:
number_info = array([[ 0,  1],
                     [ 0,  2],
                     [ 1,  2],
                     [ 0,  3],
                     [ 0,  4],
                     [ 1,  4],
                     [ 2,  5],
                     [ 2,  6],
                     [ 3,  6],
                     [ 2,  7],
                     [ 2,  8],
                     [ 3,  8],
                     [ 1,  9],
                     [ 1, 12],
                     [ 3, 12],
                     [ 3, 13]])
It indicates that rows remove_idx = [2, 5, 8, 11, 14] of this array need to be removed, and that they point to rows remove_rows = [1, 1, 3, 3, 3] of the original array. Now, the code:
flat_idx = np.repeat(np.arange(array.shape[0]), array.shape[1])
number_info = np.transpose([flat_idx, array.ravel()])
number_info = number_info[np.argsort(number_info[:,1])]
remove_idx = np.where((np.diff(number_info[:,1]) == 0) &
                      (np.diff(number_info[:,0]) > 0))[0] + 1
remove_rows = number_info[remove_idx, 0]
output = np.delete(array, remove_rows, axis=0)
Output:
array([[1, 2, 3, 4],
       [5, 6, 7, 8]])
Here's a quick way to do it with a list comprehension and set intersections:
>>> array = [[1,2,3,4],
... [2,9,12,4],
... [5,6,7,8],
... [6,8,12,13]]
>>> [v for i, v in enumerate(array) if not any(set(a) & set(v) for a in array[:i])]
[[1, 2, 3, 4], [5, 6, 7, 8]]
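The set-intersection check above rescans all earlier rows for every row. A linear-time variant keeps one running set of seen values; note it tracks only elements of *kept* rows (which matches the example's expected output), so adjust if duplicates against removed rows should also count:

```python
array = [[1, 2, 3, 4],
         [2, 9, 12, 4],
         [5, 6, 7, 8],
         [6, 8, 12, 13]]

seen = set()
result = []
for row in array:
    # Keep the row only if it shares no element with any kept row so far.
    if seen.isdisjoint(row):
        result.append(row)
        seen.update(row)
print(result)  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```
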

pick TxK numpy array from TxN numpy array using TxK column index array

This is an indirect indexing problem.
It can be solved with a list comprehension; the question is whether, or how, it can be solved within numpy.
When
data.shape is (T,N)
and
c.shape is (T,K)
and each element of c is an int between 0 and N-1 inclusive, that is,
each element of c is intended to refer to a column number from data.
The goal is to obtain out where
out.shape = (T,K)
And for each i in 0..(T-1)
the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]
Concrete example:
data = np.array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])
out should be: [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
The first row of out is [0, 2] because the columns chosen are given by c's row 0: they are 0 and 2, and data[0] at columns 0 and 2 is [0, 2].
The second row of out is [4, 5] because the columns chosen are given by c's row 1: they are 1 and 2, and data[1] at columns 1 and 2 is [4, 5].
Numpy fancy indexing doesn't seem to solve this in an obvious way because indexing data with c (e.g. data[c], np.take(data,c,axis=1) ) always produces a 3 dimensional array.
A list comprehension can solve it:
out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]
If K is 2, I suppose this is marginally OK. If K is variable, it is not so good: the list comprehension has to be rewritten for each value of K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
Is there a solution based entirely in numpy?
You can avoid loops with np.choose:
In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
data = np.array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])
--
In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]:
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
Here's one possible route to a general solution...
Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:
>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)
>>> data[_]
array([ 0,  4,  6, 10, 14])
The mask to get the values for the second column of out: np.arange(3) == np.vstack(c[:,1]).
So, to get the out array...
>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:
np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T
Edit 2: A slightly neater solution (though still relying on a loop) is:
np.vstack([data[i][c[i]] for i in range(T)])
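A fully vectorized approach not covered by the answers above is plain advanced indexing with a broadcast row index, which recent NumPy also packages as np.take_along_axis. A sketch using the question's data:

```python
import numpy as np

data = np.arange(15).reshape(5, 3)
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])

# Broadcast a (T, 1) row index against the (T, K) column index array,
# so out[t, k] = data[t, c[t, k]] for every t and k.
out = data[np.arange(data.shape[0])[:, None], c]
print(out.tolist())  # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]

# Equivalent one-liner (NumPy >= 1.15):
out2 = np.take_along_axis(data, c, axis=1)
```
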

Adding a dimension to every element of a numpy.array

I'm trying to transform each element of a numpy array into an array itself (say, to interpret a greyscale image as a color image). In other words:
>>> my_ar = numpy.array((0, 5, 10))
>>> my_ar
array([ 0,  5, 10])
>>> transformed = my_fun(my_ar) # In reality, my_fun() would do something more useful
array([[ 0,  0,  0],
       [ 5, 10, 15],
       [10, 20, 30]])
>>> transformed.shape
(3, 3)
I've tried:
def my_fun_e(val):
return numpy.array((val, val*2, val*3))
my_fun = numpy.frompyfunc(my_fun_e, 1, 3)
but get:
my_fun(my_ar)
(array([[0 0 0], [ 5 10 15], [10 20 30]], dtype=object), array([None, None, None], dtype=object), array([None, None, None], dtype=object))
and I've tried:
my_fun = numpy.frompyfunc(my_fun_e, 1, 1)
but get:
>>> my_fun(my_ar)
array([[0 0 0], [ 5 10 15], [10 20 30]], dtype=object)
This is close, but not quite right -- I get an array of objects, not an array of ints.
Update 3! OK, I've realized that my example was too simple: I don't just want to replicate my data in a third dimension, I'd like to transform it at the same time. Maybe this is clearer?
Does numpy.dstack do what you want? The first two indexes are the same as the original array, and the new third index is "depth".
>>> import numpy as N
>>> a = N.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
>>> b = N.dstack((a,a,a))
>>> b
array([[[1, 1, 1],
        [2, 2, 2],
        [3, 3, 3]],

       [[4, 4, 4],
        [5, 5, 5],
        [6, 6, 6]],

       [[7, 7, 7],
        [8, 8, 8],
        [9, 9, 9]]])
>>> b[1,1]
array([5, 5, 5])
Use map to apply your transformation function to each element in my_ar (wrap it in list() on Python 3, where map returns an iterator):
import numpy
my_ar = numpy.array((0, 5, 10))
print(my_ar)
transformed = numpy.array(list(map(lambda x: numpy.array((x, x*2, x*3)), my_ar)))
print(transformed)
print(transformed.shape)
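A loop-free variant of the same idea is np.vectorize with a generalized-ufunc signature (available since NumPy 1.12); a sketch using the question's my_fun_e:

```python
import numpy as np

def my_fun_e(val):
    return np.array((val, val * 2, val * 3))

# signature='()->(n)' declares that each scalar input maps to a
# length-n vector; n is inferred from the first call.
my_fun = np.vectorize(my_fun_e, signature='()->(n)')
transformed = my_fun(np.array((0, 5, 10)))
print(transformed)
```
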
I propose:
numpy.resize(my_ar, (3, 3)).transpose()
You can of course adapt the shape, e.g. (my_ar.shape[0],) * 2, or whatever you need.
Does this do what you want:
tile(my_ar[:, None], (1, 3))
(Note that tile(my_ar, (1, 1, 3)) would produce shape (1, 1, 9) rather than replicating each element into its own row.)
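Since the update asks for a transform rather than pure replication, broadcasting against a per-element multiplier is another vectorized sketch that reproduces the desired output:

```python
import numpy as np

my_ar = np.array((0, 5, 10))
# (3, 1) * (3,) broadcasts to (3, 3): row i is my_ar[i] * [1, 2, 3].
transformed = my_ar[:, None] * np.array([1, 2, 3])
print(transformed)
```
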
