Search elements of one array in another, row-wise - Python / NumPy

For example, I have a matrix of unique elements,
a=[
[1,2,3,4],
[7,5,8,6]
]
and another matrix of unique elements, each of which has appeared in the first matrix.
b=[
[4,1],
[5,6]
]
And I expect the result of
[
[3,0],
[1,3]
].
That is to say, for each row of b, I want to find the elements that equal some element in the same row of a, and return the indices of those elements in a.
How can I do that? Thanks.

Here's a vectorized approach -
# https://stackoverflow.com/a/40588862/ @Divakar
def searchsorted2d(a,b):
    m,n = a.shape
    max_num = np.maximum(a.max() - a.min(), b.max() - b.min()) + 1
    r = max_num*np.arange(a.shape[0])[:,None]
    p = np.searchsorted( (a+r).ravel(), (b+r).ravel() ).reshape(m,-1)
    return p - n*(np.arange(m)[:,None])

def search_indices(a, b):
    sidx = a.argsort(1)
    a_s = np.take_along_axis(a,sidx,axis=1)
    return np.take_along_axis(sidx,searchsorted2d(a_s,b),axis=1)
Sample run -
In [54]: a
Out[54]:
array([[1, 2, 3, 4],
[7, 5, 8, 6]])
In [55]: b
Out[55]:
array([[4, 1],
[5, 6]])
In [56]: search_indices(a, b)
Out[56]:
array([[3, 0],
[1, 3]])
Another vectorized one leveraging broadcasting -
In [65]: (a[:,None,:]==b[:,:,None]).argmax(2)
Out[65]:
array([[3, 0],
[1, 3]])
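The broadcasting one-liner compares every element of each row of b against every element of the matching row of a. A minimal sketch of the intermediate shapes follows. The `found` check is an addition that is not part of the answer above; it is included because `argmax` returns 0 for an all-False row, which would otherwise look like a genuine match at index 0.

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [7, 5, 8, 6]])
b = np.array([[4, 1],
              [5, 6]])

# a[:, None, :] has shape (2, 1, 4); b[:, :, None] has shape (2, 2, 1).
# Broadcasting gives a (2, 2, 4) boolean array where
# eq[i, j, k] is True when b[i, j] == a[i, k].
eq = a[:, None, :] == b[:, :, None]

# Position of the first True along the last axis, i.e. the column in a.
idx = eq.argmax(axis=2)

# Assumption: guard against values of b that are missing from the same row of a.
found = eq.any(axis=2)

print(idx)
print(found.all())
```

Note that the intermediate array has one entry per (row, element of b, element of a) triple, so memory use grows with the product of the two row lengths.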

If you don't mind using loops, here's a quick solution using np.where:
import numpy as np
a=[[1,2,3,4],
   [7,5,8,6]]
b=[[4,1],
   [5,6]]
a = np.array(a)
b = np.array(b)
c = np.zeros_like(b)
for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        # search only row i of a; np.where returns a 1-tuple of index arrays
        pos = np.where(a[i] == b[i, j])[0]
        c[i, j] = pos[0]
print(c.tolist())

You can do it this way:
np.split(pd.DataFrame(a).where(pd.DataFrame(np.isin(a,b))).T.sort_values(by=[0,1])[::-1].unstack().dropna().reset_index().iloc[:,1].to_numpy(),len(a))
# [array([3, 0]), array([1, 3])]

Related

Numpy: multiply the first n elements along an axis where n is given by an array

I have a 3D numpy array, e.g. A = np.random.rand(a, b, c), and I want to compute the product of the first n elements across the last axis (i.e. axis=2), where each n is given by an array N with N.shape = (a, b). The end result is a 2d array with shape (a, b).
For instance, let's say one slice (e.g. for a=0, b=0) is [3, 2, 5, 7], and N[0, 0] = 3, then I want the product 3*2*5, that is, multiply the first n=3 elements.
Is there any efficient way of doing this without resorting to very slow quasi-for-loop solutions like np.fromiter or np.vectorize?
Edit: as per request, a minimal example
A = np.array([
[[1, 2, 3], [4, 5, 6]],
[[1, 2, 1], [3, 2, 4]]
])
N = np.array([
[2, 1],
[3, 2]
])
# desired result using a for loop:
desired_result = np.full(N.shape, np.nan)
for a in range(A.shape[0]):
    for b in range(A.shape[1]):
        # multiply the first n=N[a, b] elements
        # (np.prod replaces np.product, which was removed in NumPy 2.0)
        desired_result[a, b] = np.prod(A[a, b][:N[a, b]])
print(desired_result)
# output = array([[2., 4.], [2., 6.]])
Approach #1
Here's one vectorized way leveraging broadcasting -
# Mask of the same shape as the 3D input array that is True from index 0 up to
# (but not including) N[a, b] along the last axis, for each element in N.
In [22]: m = N[...,None] > np.arange(A.shape[2])
# Use it to create an array that holds the values of A where the mask is True
# and 1s otherwise. When prod-reduced along the last axis, the 1s at False
# positions do not affect the result, while the valid positions contribute
# their actual values.
In [23]: np.where(m,A,1).prod(-1)
Out[23]:
array([[2, 4],
[2, 6]])
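As a self-contained sketch of the masking idea, here it is wrapped in a function and run on the arrays from the question's minimal example. The helper name `prod_first_n` is hypothetical and only used for illustration.

```python
import numpy as np

def prod_first_n(A, N):
    """Product of the first N[i, j] elements of A[i, j, :] (hypothetical helper)."""
    # mask[i, j, k] is True for k < N[i, j]
    mask = np.arange(A.shape[-1]) < N[..., None]
    # Replace masked-out entries with 1 so they do not affect the product
    return np.where(mask, A, 1).prod(axis=-1)

A = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 1], [3, 2, 4]]])
N = np.array([[2, 1],
              [3, 2]])

result = prod_first_n(A, N)
print(result)
```

With this formulation an entry of 0 in N yields 1, the empty product.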
Alternatively, use numexpr to leverage multiple cores, translating the masking steps from earlier into arithmetic ones -
In [14]: import numexpr as ne
In [15]: ne.evaluate('prod(m*A + ~m,2)')
Out[15]:
array([[2, 4],
[2, 6]], dtype=int64)
Approach #2
Based on this idea, here's one with np.multiply.reduceat -
s0 = np.arange(0, A.size, A.shape[2])
p = np.stack((s0, N.ravel()+s0),axis=1).ravel()
out = np.multiply.reduceat(A.ravel(), p)[::2].reshape(N.shape)

Index n dimensional array with (n-1) d array

What is the most elegant way to index an n-dimensional array with an (n-1)-dimensional array along a given dimension, as in this dummy example?
a = np.random.random_sample((3,4,4))
b = np.random.random_sample((3,4,4))
idx = np.argmax(a, axis=0)
How can I now index a with idx to get the maxima in a, as if I had used a.max(axis=0)? And how can I retrieve the values specified by idx from b?
I thought about using np.meshgrid, but I think it is overkill. Note that the axis can be any valid axis (0, 1, 2) and is not known in advance. Is there an elegant way to do this?
Make use of advanced-indexing -
m,n = a.shape[1:]
I,J = np.ogrid[:m,:n]
a_max_values = a[idx, I, J]
b_max_values = b[idx, I, J]
For the general case:
def argmax_to_max(arr, argmax, axis):
    """argmax_to_max(arr, arr.argmax(axis), axis) == arr.max(axis)"""
    new_shape = list(arr.shape)
    del new_shape[axis]
    # list(): np.ogrid may return a tuple, which has no insert()
    grid = list(np.ogrid[tuple(map(slice, new_shape))])
    grid.insert(axis, argmax)
    return arr[tuple(grid)]
Quite a bit more awkward than such a natural operation should be, unfortunately.
To index an n-dim array with an (n-1)-dim array, we could simplify it a bit into a helper that gives us the grid of indices for all axes, like so -
def all_idx(idx, axis):
    # list(): np.ogrid may return a tuple, which has no insert()
    grid = list(np.ogrid[tuple(map(slice, idx.shape))])
    grid.insert(axis, idx)
    return tuple(grid)
Hence, use it to index into input arrays -
axis = 0
a_max_values = a[all_idx(idx, axis=axis)]
b_max_values = b[all_idx(idx, axis=axis)]
Use advanced indexing in NumPy: https://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.indexing.html#advanced-indexing
a = np.array([[1, 2], [3, 4], [5, 6]])
a
> a: array([[1, 2],
[3, 4],
[5, 6]])
idx = a.argmax(axis=1)
idx
> array([1, 1, 1], dtype=int64)
Since you want all rows but only the columns given by idx, you can use [0, 1, 2] or np.arange(a.shape[0]) for the row indexes:
rows = np.arange(a.shape[0])
a[rows, idx]
>array([2, 4, 6])
which is the same as a.max(axis=1):
a.max(axis=1)
>array([2, 4, 6])
If you have 3 dimensions, idx is 2D, so make the row indexes a column and add the indexes of the 3rd dimension as well:
rows = np.arange(a.shape[0])[:, None]
index2 = np.arange(a.shape[2])
a[rows, idx, index2]
I suggest the following:
a = np.array([[1, 3], [2, -2], [1, -1]])
a
>array([[ 1, 3],
[ 2, -2],
[ 1, -1]])
idx = a.argmax(axis=1)
idx
> array([1, 0, 0], dtype=int64)
np.take_along_axis(a, idx[:, None], axis=1).squeeze()
>array([3, 2, 1])
a.max(axis=1)
>array([3, 2, 1])
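np.take_along_axis also seems to cover the n-dimensional case from the question with an arbitrary axis: re-insert the reduced axis into the index array with np.expand_dims, gather, then drop that axis again. A minimal sketch, where the helper name `gather_along` is hypothetical:

```python
import numpy as np

def gather_along(arr, idx, axis):
    """Pick values of arr at positions idx along axis (hypothetical helper).

    idx must have the shape of arr with `axis` removed,
    e.g. the result of arr.argmax(axis).
    """
    picked = np.take_along_axis(arr, np.expand_dims(idx, axis), axis=axis)
    return np.squeeze(picked, axis=axis)

rng = np.random.default_rng(0)
a = rng.random((3, 4, 4))
b = rng.random((3, 4, 4))

for axis in range(a.ndim):
    idx = np.argmax(a, axis=axis)
    # Same values as a.max(axis=axis)
    assert np.array_equal(gather_along(a, idx, axis), a.max(axis=axis))
    # Values of b at the positions where a is maximal
    b_values = gather_along(b, idx, axis)
```

The same index array works for any other array with the shape of a, which is what the question asks for with b.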

Delete one element from each row of a NumPy array

import numpy as np
a=np.array([[1,2,3], [4,5,6], [7,8,9]])
k = [0, 1, 2]
print(np.delete(a, k, 1))
This returns
[]
But, the result I really want is
[[2,3],
[4,6],
[7,8]]
I want to delete the first element (indexed as 0) from a[0], the second (indexed as 1) from a[1], and the third (indexed as 2) from a[2].
Any thoughts?
Here's an approach using boolean indexing -
m,n = a.shape
out = a[np.arange(n) != np.array(k)[:,None]].reshape(m,-1)
If you would like to persist with np.delete, you could calculate the linear indices and then delete those after flattening the input array. Element (i, k[i]) sits at flat position i*n + k[i], so -
m,n = a.shape
del_idx = np.arange(m)*n + k
out = np.delete(a.ravel(),del_idx,axis=0).reshape(m,-1)
Sample run -
In [94]: a
Out[94]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [95]: k = [0, 2, 1]
In [96]: m,n = a.shape
In [97]: a[np.arange(n) != np.array(k)[:,None]].reshape(m,-1)
Out[97]:
array([[2, 3],
[4, 5],
[7, 9]])
In [98]: del_idx = np.arange(m)*n + k
In [99]: np.delete(a.ravel(),del_idx,axis=0).reshape(m,-1)
Out[99]:
array([[2, 3],
[4, 5],
[7, 9]])
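The sample above uses a square array. As a hedged check that both approaches also hold for a non-square input, here is a minimal sketch on a 3x4 array (the array and the values of k are made up for illustration), where the flat position of element (i, k[i]) is i*n + k[i]:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns
k = np.array([0, 2, 1])           # column to drop from each row

m, n = a.shape

# Boolean mask: keep[i, j] is False only where j == k[i]
keep = np.arange(n) != k[:, None]
out_mask = a[keep].reshape(m, n - 1)

# Same result with np.delete on the flattened array
out_delete = np.delete(a.ravel(), np.arange(m) * n + k).reshape(m, n - 1)

print(out_mask)
```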

How to expand a 2D NumPy array by copying the bottom row and right column?

I have a 2D NumPy array and I hope to expand its size on both dimensions by copying the bottom row and right column.
For example, from 2x2:
[[0,1],
[2,3]]
to 4x4:
[[0,1,1,1],
[2,3,3,3],
[2,3,3,3],
[2,3,3,3]]
What's the best way to do it?
Thanks.
Here, the hstack and vstack functions can come in handy. For example,
In [16]: p = array(([0,1], [2,3]))
In [20]: vstack((p, p[-1], p[-1]))
Out[20]:
array([[0, 1],
[2, 3],
[2, 3],
[2, 3]])
And remembering that p.T is the transpose, you can now do something like the following:
In [16]: p = array(([0,1], [2,3]))
In [22]: p = vstack((p, p[-1], p[-1]))
In [25]: p = vstack((p.T, p.T[-1], p.T[-1])).T
In [26]: p
Out[26]:
array([[0, 1, 1, 1],
[2, 3, 3, 3],
[2, 3, 3, 3],
[2, 3, 3, 3]])
So the 2 lines of code should do it...
Make an empty array and copy whatever rows, columns you want into it.
def expand(a, new_shape):
    x, y = a.shape
    r = np.empty(new_shape, a.dtype)
    r[:x, :y] = a
    r[x:, :y] = a[-1:, :]
    r[:x, y:] = a[:, -1:]
    r[x:, y:] = a[-1, -1]
    return r
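As an alternative that is not mentioned in the answers above, np.pad with mode="edge" repeats the border values and so appears to produce the same expansion in one call. A minimal sketch, assuming the 2x2 input from the question:

```python
import numpy as np

a = np.array([[0, 1],
              [2, 3]])

# ((rows_before, rows_after), (cols_before, cols_after)):
# add 2 rows at the bottom and 2 columns on the right,
# filling with the nearest edge value.
expanded = np.pad(a, ((0, 2), (0, 2)), mode="edge")
print(expanded)
```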

Select a submatrix based on diagonal value

I want to select a submatrix of a numpy matrix based on whether the diagonal is less than some cutoff value. For example, given the matrix:
Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
I want to select the rows and columns where the diagonal value is less than, say, 6. In this example, the diagonal values are sorted, so that I could just take Test[:3,:3], but in the general problem I want to solve this isn't the case.
The following snippet works:
def MatrixCut(M,Ecut):
    D = diag(M)
    indices = D<Ecut
    n = sum(indices)
    NewM = zeros((n,n),'d')
    ii = -1
    for i,ibool in enumerate(indices):
        if ibool:
            ii += 1
            jj = -1
            for j,jbool in enumerate(indices):
                if jbool:
                    jj += 1
                    NewM[ii,jj] = M[i,j]
    return NewM
print(MatrixCut(Test,6))
[[ 1. 2. 3.]
[ 2. 3. 4.]
[ 3. 4. 5.]]
However, this is fugly code, with all kinds of dangerous things like initializing the ii/jj indices to -1, which won't cause an error if somehow I get into the loop and take M[-1,-1].
Plus, there must be a numpythonic way of doing this. For a one-dimensional array, you could do:
D = diag(A)
A[D<Ecut]
But the analogous thing for a 2d array doesn't work:
D = diag(Test)
Test[D<6,D<6]
array([1, 3, 5])
Is there a good way to do this? Thanks in advance.
This also works when the diagonals are not sorted:
In [7]: Test = array([[1,2,3,4,5],
[2,3,4,5,6],
[3,4,5,6,7],
[4,5,6,7,8],
[5,6,7,8,9]])
In [8]: d = np.argwhere(np.diag(Test) < 6).squeeze()
In [9]: Test[d][:,d]
Out[9]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
Alternately, to use a single subscript call, you could do:
In [10]: d = np.argwhere(np.diag(Test) < 6)
In [11]: Test[d, d.flat]
Out[11]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
[UPDATE]: Explanation of the second form.
At first, it may be tempting to just try Test[d, d] but that will only extract elements from the diagonal of the array:
In [75]: Test[d, d]
Out[75]:
array([[1],
[3],
[5]])
The problem is that d has shape (3, 1) so if we use d in both subscripts, the output array will have the same shape as d. The d.flat is equivalent to using d.flatten() or d.ravel() (except flat just returns an iterator instead of an array). The effect is that the result has shape (3,):
In [76]: d
Out[76]:
array([[0],
[1],
[2]])
In [77]: d.flatten()
Out[77]: array([0, 1, 2])
In [79]: print(d.shape, d.flatten().shape)
(3, 1) (3,)
The reason Test[d, d.flat] works is because numpy's general broadcasting rules cause the last dimension of d (which is 1) to be broadcast to the last (and only) dimension of d.flat (which is 3). Similarly, d.flat is broadcast to match the first dimension of d. The result is two (3,3) index arrays, which are equivalent to the following arrays i and j:
In [80]: dd = d.flatten()
In [81]: i = np.hstack((d, d, d))
In [82]: j = np.vstack((dd, dd, dd))
In [83]: print(i)
[[0 0 0]
[1 1 1]
[2 2 2]]
In [84]: print(j)
[[0 1 2]
[0 1 2]
[0 1 2]]
And just to make sure they work:
In [85]: Test[i, j]
Out[85]:
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
The only way I found to solve your task is somewhat tricky
>>> Test[[[i] for i,x in enumerate(D<6) if x], D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
possibly not the best one. Based on this answer.
Or (thanks to @bogatron for reminding me of argwhere):
>>> Test[np.argwhere(D<6), D<6]
array([[1, 2, 3],
[2, 3, 4],
[3, 4, 5]])
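Another option, not used in the answers above, is np.ix_, which builds the broadcastable row and column index arrays from the mask directly. A minimal sketch on a made-up matrix whose diagonal is not sorted:

```python
import numpy as np

Test = np.array([[1, 2, 3, 4, 5],
                 [2, 9, 4, 5, 6],
                 [3, 4, 5, 6, 7],
                 [4, 5, 6, 2, 8],
                 [5, 6, 7, 8, 9]])

# Diagonal is [1, 9, 5, 2, 9], so rows/columns 0, 2 and 3 pass the cutoff
mask = np.diag(Test) < 6

# np.ix_ turns the boolean mask into index arrays of shape (3, 1) and (1, 3)
sub = Test[np.ix_(mask, mask)]
print(sub)
```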
