Numpy: Efficient search for an array into the other one

I have got two arrays like those below:
first_array = np.array([2,2,2,10,10,15,20,20,20,20])
second_array = np.array([15,5,10,78,2,44,20,2,66,1,10,15,40,85,71,23,20,45,29,20,1])
I want to look up each element of the first array in the second one and end up with a 2D array that pairs each matching index in the second array with the value searched for.
The code below works and gives me what I want, but it seems inefficient. There must be an indexing operation or some other approach that avoids searching element by element in a loop.
out = []
for i in first_array:
    index = np.argwhere(second_array == i)
    out.append(np.array([*index.T, np.ones(len(index)) * i]))
np.hstack(out).T
Here is the desired output for those who don't want to run the code.
desired_output = np.array([[4,7,4,7,4,7,2,10,2,10,0,11,6,16,19,6,16,19,6,16,19,6,16,19],
[2,2,2,2,2,2,10,10,10,10,15,15,20,20,20,20,20,20,20,20,20,20,20,20]]).T
Thanks in advance!

Two solutions come to mind:
Broadcast comparison. This is still an O(mn) algorithm in both time and memory (m = len(first_array) and n = len(second_array)):
>>> i, j = (first_array[:, None] == second_array).nonzero()
>>> np.array([j, first_array[i]]).T
array([[ 4, 2],
[ 7, 2],
[ 4, 2],
[ 7, 2],
[ 4, 2],
[ 7, 2],
[ 2, 10],
[10, 10],
[ 2, 10],
[10, 10],
[ 0, 15],
[11, 15],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20]], dtype=int64)
Use a dictionary to map each value of second_array to the indices where it occurs. Ignoring the final concatenation step, this is an O(m + n) algorithm. However, on the current example it performs worse than the first solution:
>>> mp = {}
>>> for i, val in enumerate(second_array):
...     mp.setdefault(val, []).append(i)
...
>>> np.concatenate([(indices := mp[val], [val] * len(indices))
...                 for val in first_array], -1).T
array([[ 4, 2],
[ 7, 2],
[ 4, 2],
[ 7, 2],
[ 4, 2],
[ 7, 2],
[ 2, 10],
[10, 10],
[ 2, 10],
[10, 10],
[ 0, 15],
[11, 15],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20],
[ 6, 20],
[16, 20],
[19, 20]])
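A third option, which is not part of the answer above, is to sort second_array once and locate every searched value with np.searchsorted. The sketch below assumes that approach; the variable names (order, left, right, offsets) are illustrative only. It should cost roughly O((m + n) log n) plus the size of the output, and it avoids building the m-by-n boolean matrix.

```python
import numpy as np

first_array = np.array([2, 2, 2, 10, 10, 15, 20, 20, 20, 20])
second_array = np.array([15, 5, 10, 78, 2, 44, 20, 2, 66, 1, 10,
                         15, 40, 85, 71, 23, 20, 45, 29, 20, 1])

# Sort once; a stable sort keeps equal values in their original index order.
order = np.argsort(second_array, kind="stable")
sorted_second = second_array[order]

# For each searched value, find the run [left, right) of equal values.
left = np.searchsorted(sorted_second, first_array, side="left")
right = np.searchsorted(sorted_second, first_array, side="right")
counts = right - left  # number of matches per searched value (0 if absent)

# Expand every run into explicit positions inside the sorted array.
values = np.repeat(first_array, counts)
group_starts = np.repeat(np.cumsum(counts) - counts, counts)
offsets = np.arange(counts.sum()) - group_starts
indices = order[np.repeat(left, counts) + offsets]

result = np.column_stack([indices, values])
print(result[:6])
```

Values of first_array that do not occur in second_array simply contribute zero rows, so no special handling is needed for them.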

Related

Using numpy.delete() or any other function to delete from a list of lists/arrays

I have the following list, let's call it R:
[(array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]]),
array([100, 101, 102])),
(array([[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]),
array([103, 104, 105]))]
I want to be able to delete columns of R in a for loop, based on an index i. For example, if i = 3, the 3rd column should be deleted, which should result in the following new, say R1:
[(array([[1, 2],
[4, 5],
[7, 8]]),
array([100, 101])),
(array([[10, 11],
[13, 14],
[16, 17]]),
array([103, 104]))]
I have zero experience with handling such multi dimensional arrays, so I am unsure how to use numpy.delete(). My actual list R is pretty big, so I would appreciate if someone can suggest how to go about the loop.

You can use np.delete with index 2 (the third column, since indices are zero-based) and axis=-1.
# If your list looks like this, as described in the question:
print(lst)
# [
# array([[1, 2, 3],
# [4, 5, 6],
# [7, 8, 9]]),
# array([100, 101, 102]),
# array([[10, 11, 12],
# [13, 14, 15],
# [16, 17, 18]]),
# array([103, 104, 105])
# ]
for idx, l in enumerate(lst):
    lst[idx] = np.delete(l, 2, axis=-1)
print(lst)
Output:
[
array([[1, 2],
[4, 5],
[7, 8]]),
array([100, 101]),
array([[10, 11],
[13, 14],
[16, 17]]),
array([103, 104])
]
Creating input array like in the question:
import numpy as np
lst = [[[1, 2, 3],
[4, 5, 6],
[7, 8, 9]],
[100, 101, 102],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[103, 104, 105]
]
lst = [np.array(l) for l in lst]
Update based on the comments: if your list contains tuples of np.array objects, you can try the following:
lst = [
(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), np.array([100, 101, 102])),
(np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]]), np.array([103, 104, 105]))
]
for idx, tpl in enumerate(lst):
    lst[idx] = tuple(np.delete(l, 2, axis=-1) for l in tpl)
print(lst)
Output:
[
(array([[1, 2],
[4, 5],
[7, 8]]),
array([100, 101])
),
(array([[10, 11],
[13, 14],
[16, 17]]),
array([103, 104]))
]
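Since the question counts columns from one (i = 3 means the third column), it may help to wrap the call in a small helper that takes a zero-based index and leaves the original list untouched. The function name drop_column below is hypothetical, not something from NumPy or from the answer above.

```python
import numpy as np


def drop_column(pairs, col):
    """Return a new list of tuples with zero-based column `col` removed
    from the last axis of every array. The input list is not modified."""
    return [tuple(np.delete(arr, col, axis=-1) for arr in tpl) for tpl in pairs]


R = [
    (np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), np.array([100, 101, 102])),
    (np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]]), np.array([103, 104, 105])),
]

i = 3                       # one-based column number, as in the question
R1 = drop_column(R, i - 1)  # convert to a zero-based index for np.delete
print(R1)
```

Because np.delete returns a new array each time, R keeps all three columns and can be reused for other values of i inside a loop.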

Use array to define indices for multidimensional numpy array

I have a multidimensional Numpy array; let's say it's
myArray = array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
I know that running myArray[1,1,1], for instance, will return 13. However, I want to define indx = [1,1,1] and then call something to the effect of myArray[indx].
However, this does some other multidimensional indexing stuff.
I have also tried myArray[*indx] but that understandably throws a syntax error.
Currently my very ugly workaround is to define
def array_as_indices(array, matrix):
    st = ''
    for i in array:
        st += '%s,' % i
    st = st[:-1]
    return matrix[eval(st)]
which works but is quite inelegant and presumably slow.
Is there a more pythonic way to do what I'm looking for?

This is a duplicate of Unpacking tuples/arrays/lists as indices for Numpy Arrays, but in short you can just create a tuple:
import numpy as np
def main():
    my_array = np.array(
        [
            [[0, 1, 2], [3, 4, 5], [6, 7, 8]],
            [[9, 10, 11], [12, 13, 14], [15, 16, 17]],
            [[18, 19, 20], [21, 22, 23], [24, 25, 26]],
        ]
    )
    print(f"my_array[1,1,1]: {my_array[1,1,1]}")
    indx = (1, 1, 1)
    print(f"my_array[indx]: {my_array[indx]}")
if __name__ == "__main__":
    main()
will give
my_array[1,1,1]: 13
my_array[indx]: 13

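NumPy treats a tuple as one index per axis, whereas a list triggers advanced (fancy) indexing. Use indx = (1, 1, 1).
As an extension, if you want to select the elements at (1, 1, 1) and (2, 2, 2) in one call, you can pass one list per axis (x here is myArray from the question):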
>>> indx = ([1, 2], [1, 2], [1, 2])
>>> x[indx]
array([13, 26])
With a plain list, NumPy instead treats every entry as an index along the first axis, so
>>> indx = [1, 1, 1]
>>> x[indx]
array([[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]]])
The result is an array of shape (3, 3, 3) whose three entries along the first axis each equal x[1].
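If the index already exists as a list or as an array, converting it with tuple() is usually enough. The short sketch below illustrates this; x is rebuilt with np.arange so the block is self-contained, and idx_rows is a made-up name for an array holding one index triple per row.

```python
import numpy as np

x = np.arange(27).reshape(3, 3, 3)  # same values as myArray in the question

indx = [1, 1, 1]
single = x[tuple(indx)]             # one element: same as x[1, 1, 1]

# Several index triples stored row-wise: transpose so that each axis
# gets its own sequence, then convert to a tuple.
idx_rows = np.array([[1, 1, 1], [2, 2, 2]])
several = x[tuple(idx_rows.T)]      # elements x[1, 1, 1] and x[2, 2, 2]

print(single, several)
```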

How to subset a numpy array of different lengths

I have a NumPy array whose elements are themselves 2D arrays of different lengths, and I would like to keep only the first two rows of each element.
Here is an example array:
import numpy as np
a1 = np.array([[ 1, 2, 3],
[ 4, 5, 6]])
a2 = np.array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
a3 = np.array([[19, 20, 21],
[22, 23, 24],
[25, 26, 27]])
A = np.array([a1, a2, a3], dtype=object)  # recent NumPy versions require dtype=object for ragged input
print("A =\n", A)
Which prints:
A =
[array([[ 1, 2, 3],
[ 4, 5, 6]])
array([[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18]])
array([[19, 20, 21],
[22, 23, 24],
[25, 26, 27]])]
The desired result is as follows:
A =
[array([[ 1, 2, 3],
[ 4, 5, 6]])
array([[ 7, 8, 9],
[10, 11, 12]])
array([[19, 20, 21],
[22, 23, 24]])]
To print the equivalent object, you could do
print(np.array([a1[0:2], a2[0:2], a3[0:2]]))
But I want to directly get what is desired using A.
What is the correct way of doing this in numpy?
Edit: I would like to subset the array without looping. Alternative ways of structuring the arrays so that they can be directly indexed are okay too. Any numpy function to avoid looping is fair game.

a = [i[0:2] for i in A]
This works, although it still iterates over A in Python and returns a list rather than an array.
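As far as I know, an object array of differently sized sub-arrays cannot be sliced "through" its elements in one vectorized step, so some Python-level iteration seems unavoidable. What can be improved is the result: every slice has the same shape, so the slices can be stacked into one regular 3D array that supports ordinary indexing afterwards. The sketch below assumes that goal; filling A element by element is just one cautious way to build the ragged container.

```python
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6]])
a2 = np.array([[7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17, 18]])
a3 = np.array([[19, 20, 21], [22, 23, 24], [25, 26, 27]])

# Build the ragged container explicitly as a 1D object array.
A = np.empty(3, dtype=object)
for k, arr in enumerate((a1, a2, a3)):
    A[k] = arr

# Each slice has shape (2, 3), so stacking yields a regular (3, 2, 3) array.
first_two = np.stack([arr[:2] for arr in A])
print(first_two.shape)
```

From here on, first_two[:, 0] or first_two[1, :, 2] work without any further loops.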

numpy 3d array and 1d array addition on first axis

I have a 1D NumPy array "array1d" and a 3D NumPy array "array3d". I want to add them so that the n-th value of "array1d" is added to every element of the n-th plane of "array3d".
This can be done with the following loop:
for i, value in enumerate(array1d):
    array3d[i] += value
The question is: how can this be done in a single NumPy expression?
Example arrays:
arr1d = np.array(range(3))
>>>array([0, 1, 2])
arr3d = np.array(range(27)).reshape(3, 3, 3)
>>>array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]]])
wanted result:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[20, 21, 22],
[23, 24, 25],
[26, 27, 28]]])

Use Numpy's broadcasting features:
In [23]: arr1d[:, None, None] + arr3d
Out[23]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[20, 21, 22],
[23, 24, 25],
[26, 27, 28]]])
This conceptually repeats the contents of arr1d across the other two dimensions. No data is actually copied; broadcasting behaves like a view of the same memory. Instead of None, you can also write numpy.newaxis.
Alternatively, you can also use reshape:
In [32]: arr1d.reshape(3, 1, 1) + arr3d
Out[32]:
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]],
[[20, 21, 22],
[23, 24, 25],
[26, 27, 28]]])
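To check that the broadcast expression really matches the loop from the question, and to show the in-place variant, here is a small self-contained sketch. The names expected, result, and in_place are illustrative.

```python
import numpy as np

arr1d = np.arange(3)
arr3d = np.arange(27).reshape(3, 3, 3)

# Reference result computed with the loop from the question.
expected = arr3d.copy()
for i, value in enumerate(arr1d):
    expected[i] += value

# Broadcasting: arr1d is viewed with shape (3, 1, 1) and stretched to (3, 3, 3).
result = arr3d + arr1d[:, None, None]

# In-place variant, closest to what the loop does.
in_place = arr3d.copy()
in_place += arr1d[:, np.newaxis, np.newaxis]

print(np.array_equal(result, expected), np.array_equal(in_place, expected))
```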

How to select values in a n-dimensional array

I have been trying to perform a simple operation, but I can't seem to find a simple way to do it using Numpy functions without creating unnecessary copies of the array.
Suppose we have the following 3-dimensional array :
In [171]: x = np.arange(24).reshape((4, 3, 2))
In [172]: x
Out[172]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
And the following array :
In [173]: y = np.array([0, 1, 1, 0])
For each i, I want to select from x[i] the entries along the last dimension at index y[i]. In other words, I want :
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
The only solution that I have for now is using a for loop over the first dimension of x and y, as follows :
z = np.zeros((4, 3), dtype=int)
for i, row in enumerate(x):
    z[i, :] = row[:, y[i]]
Is there a way of avoiding a for loop here, using numpy functions or fancy indexing?
Thanks!

The tricky part is that you don't want every entry along axis 0 combined with every entry of y; you want each entry along axis 0 paired with the matching element of y. So you could do something like:
>>> x[np.arange(x.shape[0]), :, y]
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])

Fancy indexing:
x[np.arange(y.size),:,y]
gives:
array([[ 0, 2, 4],
[ 7, 9, 11],
[13, 15, 17],
[18, 20, 22]])
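An alternative that some may find easier to read is np.take_along_axis, which selects along one axis using an index array broadcast against the others. The sketch below is only one possible formulation and compares it with the fancy-indexing answer; the names picked and z are illustrative.

```python
import numpy as np

x = np.arange(24).reshape((4, 3, 2))
y = np.array([0, 1, 1, 0])

# Give y the shape (4, 1, 1) so it broadcasts over axis 1 and picks along axis 2.
picked = np.take_along_axis(x, y[:, None, None], axis=2)  # shape (4, 3, 1)
z = picked[..., 0]                                        # drop the length-1 axis

# Same selection with fancy indexing, as in the answers above.
z_fancy = x[np.arange(y.size), :, y]

print(z)
```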
