Numpy: Find column index for element on each row - python

Suppose I have a vector with elements to find:
a = np.array([1, 5, 9, 7])
Now I have a matrix where those elements should be searched:
M = np.array([
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7]
])
Now I'd like to get an index array telling, for each row j of M, in which column the element j of a occurs.
The result would be:
[1, 0, 1, 2]
Does Numpy offer such a function?
(Thanks for the answers with list comprehensions, but that's not an option performance-wise. I also apologize for mentioning Numpy just in the final question.)

Note the result of:
M == a[:, None]
>>> array([[False,  True, False],
           [ True, False, False],
           [False,  True, False],
           [False, False,  True]], dtype=bool)
The indices can be retrieved with:
yind, xind = numpy.where(M == a[:, None])
>>> (array([0, 1, 2, 3], dtype=int64), array([1, 0, 1, 2], dtype=int64))
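The two steps above, assembled into a runnable sketch (this assumes, as in the question, that each element of a occurs in its row):

```python
import numpy as np

a = np.array([1, 5, 9, 7])
M = np.array([
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7],
])

# Broadcasting a[:, None] against M compares element j of a
# with every entry of row j of M.
yind, xind = np.where(M == a[:, None])
print(xind)  # [1 0 1 2]
```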

For the first match in each row, an efficient way is to use argmax after extending a to 2D, as done in @Benjamin's post -
(M == a[:,None]).argmax(1)
Sample run -
In [16]: M
Out[16]:
array([[0, 1, 9],
       [5, 3, 8],
       [3, 9, 0],
       [0, 1, 7]])
In [17]: a
Out[17]: array([1, 5, 9, 7])
In [18]: a[:,None]
Out[18]:
array([[1],
       [5],
       [9],
       [7]])
In [19]: (M == a[:,None]).argmax(1)
Out[19]: array([1, 0, 1, 2])
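A caveat worth noting (my addition, not in the original answer): argmax returns 0 for a row that contains no match at all. A defensive sketch, using a modified a whose last element is absent from its row:

```python
import numpy as np

a = np.array([1, 5, 9, 4])          # 4 does not occur in its row
M = np.array([
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7],
])

hits = M == a[:, None]
col = hits.argmax(axis=1)           # first matching column per row
col[~hits.any(axis=1)] = -1         # flag rows with no match as -1
print(col)  # [ 1  0  1 -1]
```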

Lazy solution without any import:
a = [1, 5, 9, 7]
M = [
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7],
]
for n, i in enumerate(M):
    for j in a:
        if j in i:
            print("{} found at row {} column: {}".format(j, n, i.index(j)))
Returns:
1 found at row 0 column: 1
9 found at row 0 column: 2
5 found at row 1 column: 0
9 found at row 2 column: 1
1 found at row 3 column: 1
7 found at row 3 column: 2
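Note (my observation) that the nested loops above search every element of a in every row, which is why 1 and 9 are each reported twice. If element j should only be looked up in row j, zip keeps them aligned:

```python
a = [1, 5, 9, 7]
M = [
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7],
]

# Pair row j with element j so each value is searched in its own row only.
results = []
for row_no, (row, val) in enumerate(zip(M, a)):
    if val in row:
        results.append((val, row_no, row.index(val)))
        print("{} found at row {} column: {}".format(val, row_no, row.index(val)))
# 1 found at row 0 column: 1
# 5 found at row 1 column: 0
# 9 found at row 2 column: 1
# 7 found at row 3 column: 2
```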

Maybe something like this?
>>> [list(M[i,:]).index(a[i]) for i in range(len(a))]
[1, 0, 1, 2]

[sub.index(val) if val in sub else -1 for sub, val in zip(M, a)]
# [1, 0, 1, 2]

Related

Compare two 3d Numpy array and return unmatched values with index and later recreate them without loop

I am currently working on a problem where one requirement is to compare two 3D NumPy arrays, return the unmatched values with their index positions, and later recreate the same array. Currently the only approach I can think of is to loop across the arrays to get the values while comparing and again while recreating. The problem is scale: there will be hundreds of arrays, and looping affects the latency of the overall application. I would be thankful if anyone can help me make better use of NumPy comparison with minimal or no loops. Dummy code is below:
def compare_array(final_array_list):
    base_array = None
    i = 0
    for array in final_array_list:
        if i == 0:
            base_array = array[0]
        else:
            index = np.where(base_array != array)
            # getting index like (array([0, 1]), array([1, 1]), array([2, 2]))
            # to access all unmatched values I need to loop. Need to avoid the loop here
        i = i + 1
    return [base_array, [unmatched values (8, 10) and their index (array([0, 1]), array([1, 1]), array([2, 2]))], ...]

# similarly recreate array_1 back
def recreate_array(array_list):
    # need to avoid looping while recreating the arrays
    return list of arrays  # i.e. [base_array, array_1]

# creating dummy arrays
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])
final_array_list = [base_array, array_1, ...]

# compare base_array with the other arrays and get unmatched values (like 8, 10 in array_1) and their index
diff_array = compare_array(final_array_list)
# recreate array_1 from the base array using the unmatched values and their index
recreate_array(diff_array)
I think this may be what you're looking for:
base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = b = np.array([[[1, 2,3], [3, 4,8]], [[5, 6,7], [7, 8,10]]])
match_mask = (base_array == array_1)
idx_unmatched = np.argwhere(~match_mask)
# idx_unmatched:
# array([[0, 1, 2],
#        [1, 1, 2]])
# values associated with idx_unmatched:
values_unmatched = base_array[tuple(idx_unmatched.T)]
# values_unmatched:
# array([5, 9])
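For the "recreate the same array" part of the question, the same indices can carry a compact diff; a sketch under the assumption that all arrays share the base array's shape:

```python
import numpy as np

base_array = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
array_1 = np.array([[[1, 2, 3], [3, 4, 8]], [[5, 6, 7], [7, 8, 10]]])

# Store only the positions and values that differ from the base.
idx = np.argwhere(base_array != array_1)    # positions of mismatches
vals = array_1[tuple(idx.T)]                # the differing values, e.g. 8 and 10

# Recreate array_1 from the base plus the stored diff, loop-free.
recreated = base_array.copy()
recreated[tuple(idx.T)] = vals
print(np.array_equal(recreated, array_1))  # True
```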
I'm not sure I understand what you mean by "recreate them" (completely recreate them? why not use the arrays themselves?).
I can help you, though, by noting that there are plenty of functions which vectorize with numpy, and as a general rule of thumb, do not use for loops unless G-d himself tells you to :)
For example:
If a, b are any np.arrays (regardless of dimensions), the simple a == b will return a numpy array of the same size with boolean values: True where they are equal at that coordinate, and False otherwise.
The function np.where(c) will convert c to a boolean np.array and return the indexes at which c is True.
To clarify:
Here I instantiate two arrays, with b differing from a with -1 values:
Note what a==b is, at the end.
>>> a = np.random.randint(low=0, high=10, size=(4, 4))
>>> b = np.copy(a)
>>> b[2, 3] = -1
>>> b[0, 1] = -1
>>> b[1, 1] = -1
>>> a
array([[9, 9, 3, 4],
       [8, 4, 6, 7],
       [8, 4, 5, 5],
       [1, 7, 2, 5]])
>>> b
array([[ 9, -1,  3,  4],
       [ 8, -1,  6,  7],
       [ 8,  4,  5, -1],
       [ 1,  7,  2,  5]])
>>> a == b
array([[ True, False,  True,  True],
       [ True, False,  True,  True],
       [ True,  True,  True, False],
       [ True,  True,  True,  True]])
Now the function np.where, whose output is a bit tricky but easy to use. It returns two arrays of the same size: the first holds the rows and the second the columns of the places at which the given array is True.
>>> np.where(a == b)
(array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3], dtype=int64), array([0, 2, 3, 0, 2, 3, 0, 1, 2, 0, 1, 2, 3], dtype=int64))
Now you can "fix" the b array to match a, by replacing the values of b at the indexes where it differs from a with the corresponding values from a:
>>> b[np.where(a != b)]
array([-1, -1, -1])
>>> b[np.where(a != b)] = a[np.where(a != b)]
>>> np.all(a == b)
True
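As a side note (my addition), the np.where calls are optional here; a boolean mask indexes directly:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(4, 4))
b = a.copy()
b[2, 3] = -1
b[0, 1] = -1

# Boolean-mask assignment: copy a's values wherever b differs from a.
mask = a != b
b[mask] = a[mask]
print(np.all(a == b))  # True
```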

How to reduce multiple arrays to one?

Ignoring my data structure and just given a list of multiple numpy arrays (all arrays have the exact same size/shape):
list[0] = [[0, 2, 0],
           [1, 3, 0]]
list[1] = [[0, 0, 0],
           [0, 0, 3]]
list[2] = [[5, 0, 0],
           [0, 0, 0]]
I want to reduce this list of arrays to a single array. Zero means there is no value at that position. For every position, the one value that is set should be taken; the values never overlap, i.e. exactly one of the arrays has a value at any given position.
result: [[5, 2, 0],
         [1, 3, 3]]
In my case I have a dictionary with tuples as keys and arrays as values. The arrays are Boolean arrays. Every dict entry represents a special channel and at one specific position only one dict entry has the value True.
I now want to replace all True values by the dictionary keys and reduce this dictionary down to only one array.
For example (near my real data):
dict { (9, 2, 6): [[False, True],
                   [True, False]],
       (1, 5, 8): [[True, False],
                   [False, True]] }
result: [[(1, 5, 8), (9, 2, 6)],
         [(9, 2, 6), (1, 5, 8)]]
How could this be done with list comprehension, a numpy function and/or map & reduce?
First try:
At first I thought I could just turn my numpy arrays into 0 & 1 (.astype(np.float32)) and then just multiply these arrays by my key:
values_filled = [key_tuple * value_array for key_tuple, value_array in dict.items()]
And then to just sum over all arrays:
final = reduce(lambda right, left: right + left, values_filled)
But this obviously doesn't work since my keys are tuples of values and not just numbers.
What I am trying to achieve is the opposite of the following operation:
{color: np.all(mask, axis=-1) for (color, mask) in
((color, segmentation == color) for color in colors) if mask.max()}
With this operation I take a segmented image and create a dictionary with predefined colors. The numpy arrays have True at every position where the color is equal to the color in the image at that position / equal to the key of the dictionary.
Now I want to reduce this dictionary back to an image (there were changes to the dictionary).
The first part of your question just requires an array sum:
In [167]: alist = [[[0, 2, 0],
     ...:            [1, 3, 0]],
     ...:           [[0, 0, 0],
     ...:            [0, 0, 3]],
     ...:           [[5, 0, 0],
     ...:            [0, 0, 0]]]
     ...:
In [168]: alist
Out[168]: [[[0, 2, 0], [1, 3, 0]], [[0, 0, 0], [0, 0, 3]], [[5, 0, 0], [0, 0, 0]]]
In [169]: np.array(alist).shape
Out[169]: (3, 2, 3)
In [170]: np.array(alist).sum(axis=0)
Out[170]:
array([[5, 2, 0],
       [1, 3, 3]])
That takes advantage of the fact that 0 doesn't affect the sum, and there aren't any overlapping values.
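Condensed into a standalone script (assuming, as the question states, that nonzero entries never overlap):

```python
import numpy as np

alist = [[[0, 2, 0], [1, 3, 0]],
         [[0, 0, 0], [0, 0, 3]],
         [[5, 0, 0], [0, 0, 0]]]

# Stack into shape (3, 2, 3) and collapse the list axis with a sum;
# zeros contribute nothing, so each position keeps its single value.
result = np.array(alist).sum(axis=0)
print(result)
# [[5 2 0]
#  [1 3 3]]
```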
You apparently have a second question involving a dictionary of boolean arrays or masks. Assuming that's related to the first question, then you just need a way of translating those masks into the list of arrays (or lists) as given in the first.
Starting with the dictionary, we'll need to iterate over the keys (or items). We can use the same summing. After a little experimentation I decided I,J=np.where(v) was the easiest way of mapping the boolean mask on to the target array:
In [200]: dd = {(9, 2, 6): [[False, True],
     ...:                   [True, False]],
     ...:       (1, 5, 8): [[True, False],
     ...:                   [False, True]]}
     ...:
In [201]: arr = np.zeros((2, 2, 3), int)
In [202]: for k, v in dd.items():
     ...:     I, J = np.where(v)
     ...:     arr[I, J, :] += k
     ...:
In [203]: arr
Out[203]:
array([[[1, 5, 8],
        [9, 2, 6]],

       [[9, 2, 6],
        [1, 5, 8]]])
For the last iteration:
In [204]: k
Out[204]: (1, 5, 8)
In [205]: v
Out[205]: [[True, False], [False, True]]
In [206]: I, J = np.where(v)
In [207]: I, J
Out[207]: (array([0, 1]), array([0, 1]))
In [208]: arr[I, J, :]
Out[208]:
array([[1, 5, 8],
       [1, 5, 8]])
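The session above as a standalone script:

```python
import numpy as np

dd = {(9, 2, 6): [[False, True], [True, False]],
      (1, 5, 8): [[True, False], [False, True]]}

# Accumulate each key tuple into the positions its mask marks True.
# Since exactly one mask is True per position, += never mixes keys.
arr = np.zeros((2, 2, 3), int)
for k, v in dd.items():
    I, J = np.where(v)
    arr[I, J, :] += k

print(arr.tolist())  # [[[1, 5, 8], [9, 2, 6]], [[9, 2, 6], [1, 5, 8]]]
```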
This can be done without anything fancy: just good old for loops, list comprehensions, and enumerate. I'm sure there's a better one-liner out there, or a library that covers it, but here's a vanilla Python solution:
d = {(9, 2, 6): [[False, True], [True, False]],
     (1, 5, 8): [[True, False], [False, True]]}
new_list = []
for k, v in d.items():
    if new_list:
        for i, each in enumerate(v):
            x = [k if z else new_list[i][j] for j, z in enumerate(each)]
            new_list[i] = x
    else:
        for each in v:
            new_list.append([k if x else x for x in each])
print(new_list)  # [[(1, 5, 8), (9, 2, 6)], [(9, 2, 6), (1, 5, 8)]]
P.S. also thank you for showing your effort.
Another numpy approach, in case you want it to be a normal numpy array afterwards:
import numpy as np
d = {(9, 2, 6): [[False, True],
                 [True, False]],
     (1, 5, 8): [[True, False],
                 [False, True]]}
# Broadcast each key tuple along a new third axis and sum the per-key results.
# (The builtin sum is used: calling np.sum on a generator is deprecated.)
x = sum(np.reshape(k, (1, 1, -1)) * np.array(v)[..., None] for k, v in d.items())
# x = sum(np.array(k)[None, None, :] * np.array(v)[..., None] for k, v in d.items())  # alternative
print(x)
# array([[[1, 5, 8],
#         [9, 2, 6]],
#        [[9, 2, 6],
#         [1, 5, 8]]])
np.all(x == np.array([[(1, 5, 8), (9, 2, 6)], [(9, 2, 6), (1, 5, 8)]]))
# True
This basically uses the approach you outlined in your question, of multiplying the truth mask by the value. I just added the fact that the content of that value is another (third) dimension and used numpy's broadcasting features to achieve this.
Here is a Numpythonic approach:
arr = np.dstack((lst1, lst2, lst3)) # Create columns along the first axis
mask = arr.astype(bool) # create a mask from places that are nonzero
mask2 = (~mask).all(axis=-1) # another mask that gives rows that are all zero
mask[mask2] = [True, False, False] # Truthify one optional item in all-zero rows
result = arr[mask].reshape(lst1.shape) # get the desired items and reshape
Demo:
In [135]: arr = np.dstack((lst1, lst2, lst3))
In [136]: arr
Out[136]:
array([[[0, 0, 5],
        [2, 0, 0],
        [0, 0, 0]],

       [[1, 0, 0],
        [3, 0, 0],
        [0, 3, 0]]])
In [137]: mask = arr.astype(bool)
In [138]: mask2 = (~mask).all(axis=-1)
In [139]: mask[mask2] = [True, False, False]
In [140]: arr[mask].reshape(lst1.shape)
Out[140]:
array([[5, 2, 0],
       [1, 3, 3]])
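For reference, the demo above with its inputs written out (lst1, lst2, lst3 are my names for the three arrays from the question, as numpy arrays):

```python
import numpy as np

lst1 = np.array([[0, 2, 0], [1, 3, 0]])
lst2 = np.array([[0, 0, 0], [0, 0, 3]])
lst3 = np.array([[5, 0, 0], [0, 0, 0]])

arr = np.dstack((lst1, lst2, lst3))   # stack the arrays along a depth axis
mask = arr.astype(bool)               # True at nonzero positions
mask2 = (~mask).all(axis=-1)          # positions that are zero in every array
mask[mask2] = [True, False, False]    # keep exactly one zero at those positions
result = arr[mask].reshape(lst1.shape)
print(result)
# [[5 2 0]
#  [1 3 3]]
```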

Row wise element search in an array

I have a vector ( say v = (1, 5, 7) ) and an array.
a = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
What would be the most efficient way to find the indices of the elements of v in the corresponding rows of a? For example, the output here would be
b = (0, 1, 0), since 1 is at index 0 in the 1st row, and so on.
You can convert v to a column vector with [:,None] and then compare with a to bring in broadcasting and finally use np.where to get the final output as indices -
np.where(a == v[:,None])[1]
Sample run -
In [34]: a
Out[34]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
In [35]: v
Out[35]: array([1, 5, 7])
In [36]: np.where(a == v[:,None])[1]
Out[36]: array([0, 1, 0])
In case there are multiple elements in a row of a that match the corresponding element from v, you can use np.argmax to get the index of the first match in each row, like so -
np.argmax(a == v[:,None],axis=1)
Sample run -
In [57]: a
Out[57]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 7]])
In [58]: v
Out[58]: array([1, 5, 7])
In [59]: np.argmax(a == v[:,None],axis=1)
Out[59]: array([0, 1, 0])
>>> a = [ [1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> v = (1, 5, 7)
>>> b = tuple([a[id].index(val) for id, val in enumerate(v)])
>>> b
(0, 1, 0)
You can use list comprehension:
[a[idx].index(val) for idx, val in enumerate(v)]
Here enumerate returns an iterable of (index, value) pairs, and index returns the index of the first appearance of val in the corresponding row.
If you must get a tuple as the return value convert it in the end:
b = tuple([a[idx].index(val) for idx, val in enumerate(v)])
Just note that index may raise ValueError if val wasn't found in the corresponding row of a.

Slicing a 3-D array using a 2-D array

Assume we have two matrices:
x = np.random.randint(10, size=(2, 3, 3))
idx = np.random.randint(3, size=(2, 3))
The question is to access the element of x using idx, in the way as:
dim1 = x[0, range(0,3), idx[0]] # slicing x[0] using idx[0]
dim2 = x[1, range(0,3), idx[1]]
res = np.vstack((dim1, dim2))
Is there a neat way to do this?
You can just index it the basic way; the only requirement is that the shapes of the indexer arrays match up, which is what those .reshape calls are for:
x[np.array([0,1]).reshape(idx.shape[0], -1),
np.array([0,1,2]).reshape(-1,idx.shape[1]),
idx]
Out[29]:
array([[ 0.10786251,  0.2527514 ,  0.11305823],
       [ 0.67264076,  0.80958292,  0.07703623]])
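The same indexing can also be written with broadcast arange indexers instead of explicit reshape calls; a sketch using fixed data rather than random values so the result is reproducible:

```python
import numpy as np

x = np.array([[[5, 0, 9], [3, 0, 7], [7, 1, 2]],
              [[5, 3, 5], [8, 6, 1], [7, 0, 9]]])
idx = np.array([[2, 1, 2],
                [1, 2, 0]])

# Broadcast a (2, 1) row indexer against a (3,) column indexer,
# so res[i, j] = x[i, j, idx[i, j]].
res = x[np.arange(2)[:, None], np.arange(3), idx]
print(res)
# [[9 0 2]
#  [3 1 7]]
```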
Here's another way to do it with reshaping -
x.reshape(-1,x.shape[2])[np.arange(idx.size),idx.ravel()].reshape(idx.shape)
Sample run -
In [2]: x
Out[2]:
array([[[5, 0, 9],
        [3, 0, 7],
        [7, 1, 2]],

       [[5, 3, 5],
        [8, 6, 1],
        [7, 0, 9]]])
In [3]: idx
Out[3]:
array([[2, 1, 2],
       [1, 2, 0]])
In [4]: x.reshape(-1,x.shape[2])[np.arange(idx.size),idx.ravel()].reshape(idx.shape)
Out[4]:
array([[9, 0, 2],
       [3, 1, 7]])
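On NumPy 1.15+ (my addition, not part of the original answers), np.take_along_axis expresses this pattern directly; the index array just needs a trailing axis, which is dropped again afterwards:

```python
import numpy as np

x = np.array([[[5, 0, 9], [3, 0, 7], [7, 1, 2]],
              [[5, 3, 5], [8, 6, 1], [7, 0, 9]]])
idx = np.array([[2, 1, 2],
                [1, 2, 0]])

# idx[:, :, None] has shape (2, 3, 1); picking along axis 2 yields
# shape (2, 3, 1), and the trailing axis is squeezed off.
res = np.take_along_axis(x, idx[:, :, None], axis=2)[:, :, 0]
print(res)
# [[9 0 2]
#  [3 1 7]]
```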

pick TxK numpy array from TxN numpy array using TxK column index array

This is an indirect indexing problem.
It can be solved with a list comprehension.
The question is whether, or how, it can be solved within numpy.
When
data.shape is (T,N)
and
c.shape is (T,K)
and each element of c is an int between 0 and N-1 inclusive, that is,
each element of c is intended to refer to a column number from data.
The goal is to obtain out where
out.shape = (T,K)
And for each i in 0..(T-1)
the row out[i] = [ data[i, c[i,0]] , ... , data[i, c[i,K-1]] ]
Concrete example:
data = np.array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])
out should be out = [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]
The first row of out is [0, 2] because the columns chosen are given by c's row 0: they are 0 and 2, and data[0] at columns 0 and 2 holds 0 and 2.
The second row of out is [4, 5] because the columns chosen are given by c's row 1: they are 1 and 2, and data[1] at columns 1 and 2 holds 4 and 5.
Numpy fancy indexing doesn't seem to solve this in an obvious way because indexing data with c (e.g. data[c], np.take(data,c,axis=1) ) always produces a 3 dimensional array.
A list comprehension can solve it:
out = [ [data[rowidx,i1],data[rowidx,i2]] for (rowidx, (i1,i2)) in enumerate(c) ]
If K is 2, I suppose this is marginally OK. If K is variable, it is not so good: the list comprehension has to be rewritten for each value of K, because it unrolls the columns picked out of data by each row of c. It also violates DRY.
Is there a solution based entirely in numpy?
You can avoid loops with np.choose:
In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
data = np.array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])
--
In [2]: np.choose(c, data.T[:,:,np.newaxis])
Out[2]:
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
Here's one possible route to a general solution...
Create masks for data to select the values for each column of out. For example, the first mask could be achieved by writing:
>>> np.arange(3) == np.vstack(c[:,0])
array([[ True, False, False],
       [False,  True, False],
       [ True, False, False],
       [False,  True, False],
       [False, False,  True]], dtype=bool)
>>> data[_]
array([ 0,  4,  6, 10, 14])
The mask to get the values for the second column of out: np.arange(3) == np.vstack(c[:,1]).
So, to get the out array...
>>> mask0 = np.arange(3) == np.vstack(c[:,0])
>>> mask1 = np.arange(3) == np.vstack(c[:,1])
>>> np.vstack((data[mask0], data[mask1])).T
array([[ 0,  2],
       [ 4,  5],
       [ 6,  6],
       [10, 10],
       [14, 14]])
Edit: Given arbitrary array widths K and N you could use a loop to create the masks, so the general construction of the out array might simply look like this:
np.vstack([data[np.arange(N) == np.vstack(c[:,i])] for i in range(K)]).T
Edit 2: A slightly neater solution (though still relying on a loop) is:
np.vstack([data[i][c[i]] for i in range(T)])
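For modern NumPy (1.15+), a loop-free option not mentioned in the answers above is np.take_along_axis, which performs exactly this row-wise column pick:

```python
import numpy as np

data = np.array([[ 0,  1,  2],
                 [ 3,  4,  5],
                 [ 6,  7,  8],
                 [ 9, 10, 11],
                 [12, 13, 14]])
c = np.array([[0, 2],
              [1, 2],
              [0, 0],
              [1, 1],
              [2, 2]])

# For each row i, pick data[i, c[i, k]] for every k; works for any K.
out = np.take_along_axis(data, c, axis=1)
print(out.tolist())  # [[0, 2], [4, 5], [6, 6], [10, 10], [14, 14]]

# Equivalent plain fancy indexing with a broadcast row index:
out2 = data[np.arange(len(data))[:, None], c]
```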
