Get list of indices matching condition with NumPy [duplicate]


Is there any way to get the indices of several elements in a NumPy array at once?
E.g.
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
I would like to find the index of each element of a in b, namely: [0,1,4].
I find the solution I am using a bit verbose:
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
c = np.zeros_like(a)
for i, aa in np.ndenumerate(a):
    c[i] = np.where(b == aa)[0][0]  # first index in b where the value occurs
print('c: {0}'.format(c))
Output:
c: [0 1 4]

You could use in1d and nonzero (or where for that matter):
>>> np.in1d(b, a).nonzero()[0]
array([0, 1, 4])
This works fine for your example arrays, but in general the array of returned indices does not honour the order of the values in a. This may be a problem depending on what you want to do next.
In that case, a much better answer is the one @Jaime gives here, using searchsorted:
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([0, 1, 4])
This returns the indices for values as they appear in a. For instance:
a = np.array([1, 2, 4])
b = np.array([4, 2, 3, 1])
>>> sorter = np.argsort(b)
>>> sorter[np.searchsorted(b, a, sorter=sorter)]
array([3, 1, 0]) # the other method would return [0, 1, 3]
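One caveat with the searchsorted approach is that it silently returns some index even when a value of a is missing from b. Below is a minimal sketch of how that could be checked; the helper name indices_of and the -1 sentinel are assumptions made for illustration, not part of the answers above, and it assumes b is non-empty (with duplicates in b, any one matching index may be returned).

```python
import numpy as np

def indices_of(values, haystack):
    """Return the index in haystack of each entry of values, or -1 if absent."""
    values = np.asarray(values)
    haystack = np.asarray(haystack)
    sorter = np.argsort(haystack)
    # Position of each value within the sorted view of haystack.
    pos = np.searchsorted(haystack, values, sorter=sorter)
    # Values above the maximum yield len(haystack); clip before indexing.
    pos = np.clip(pos, 0, len(haystack) - 1)
    candidates = sorter[pos]
    # Keep a candidate only if it really holds the requested value.
    found = haystack[candidates] == values
    return np.where(found, candidates, -1)

a = np.array([1, 2, 4, 7])
b = np.array([4, 2, 3, 1])
print(indices_of(a, b))  # [ 3  1  0 -1]
```

The result keeps the order of a, and the -1 entries can be filtered out before the indices are used.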

This is a simple one-liner using the numpy-indexed package (disclaimer: I am its author):
import numpy_indexed as npi
idx = npi.indices(b, a)
The implementation is fully vectorized, and it gives you control over the handling of missing values. Moreover, it works for nd-arrays as well (for instance, finding the indices of rows of a in b).

All of the solutions here recommend using a linear search. You can use np.argsort and np.searchsorted to speed things up dramatically for large arrays:
sorter = b.argsort()
i = sorter[np.searchsorted(b, a, sorter=sorter)]

For an order-agnostic solution, you can use np.flatnonzero with np.isin (v 1.13+).
import numpy as np
a = np.array([1, 2, 4])
b = np.array([1, 2, 3, 10, 4])
res = np.flatnonzero(np.isin(b, a)) # NumPy v1.13+
res = np.flatnonzero(np.in1d(b, a)) # earlier versions
# array([0, 1, 4], dtype=int64)

There are a bunch of approaches for getting the index of multiple items at once mentioned in passing in answers to this related question: Is there a NumPy function to return the first index of something in an array?. The wide variety and creativity of the answers suggests there is no single best practice, so if your code above works and is easy to understand, I'd say keep it.
I personally found this approach to be both performant and easy to read: https://stackoverflow.com/a/23994923/3823857
Adapting it for your example:
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
indices = [b_list.index(x) for x in a]
vals_at_indices = b_array[indices]
I personally like adding a little bit of error handling in case a value in a does not exist in b.
import numpy as np
a = np.array([1, 2, 4])
b_list = [1, 2, 3, 10, 4]
b_array = np.array(b_list)
b_set = set(b_list)
indices = [b_list.index(x) if x in b_set else np.nan for x in a]
# Indexing only works if every value was found: a np.nan entry raises IndexError.
vals_at_indices = b_array[indices]
For my use case, it's pretty fast, since it relies on parts of Python that are fast (list comprehensions, .index(), sets, numpy indexing). Would still love to see something that's a NumPy equivalent to VLOOKUP, or even a Pandas merge. But this seems to work for now.
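As a rough VLOOKUP-style alternative, a dictionary from value to position avoids the repeated linear scans of .index(). This is only a sketch: the names first_index and found and the -1 sentinel for missing values are assumptions introduced here for illustration.

```python
import numpy as np

a = np.array([1, 2, 4, 7])
b = np.array([1, 2, 3, 10, 4])

# Map each value of b to the index of its first occurrence.
first_index = {}
for i, value in enumerate(b.tolist()):
    first_index.setdefault(value, i)

# -1 marks values of a that do not occur in b (7 in this example).
indices = np.array([first_index.get(value, -1) for value in a.tolist()])
found = indices >= 0
vals_at_indices = b[indices[found]]

print(indices)          # [ 0  1  4 -1]
print(vals_at_indices)  # [1 2 4]
```

Using an integer sentinel keeps indices an integer array, so it can still be used for indexing once the missing entries are masked out.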

Related

Indexing array from second element for all elements

I think this must be easy, but I cannot find it by searching. Suppose I have an array of the numbers 1, 2, 3, 4.
import numpy as np
a = np.array([1,2,3,4])
How do I index the array if I want the sequence 2, 3, 4, 1?
I know that for sequence 2, 3, 4 I can choose e.g.:
print(a[1::1])

If you want to rotate the list, you can use a deque instead of a numpy array. This data structure is designed for this kind of operation and directly provides a rotate function.
>>> from collections import deque
>>> a = deque([1, 2, 3, 4])
>>> a.rotate(-1)
>>> a
deque([2, 3, 4, 1])
If you want to use Numpy, you can check out the roll function.
>>> import numpy as np
>>> a = np.array([1,2,3,4])
>>> np.roll(a, -1)
array([2, 3, 4, 1])

One possible way is to define an index set (a list).
index_set = [1, 2, 3, 0]
print(a[index_set])
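The index set does not have to be typed by hand. A small sketch, assuming a rotation to the left by a fixed number of positions (the name shift is hypothetical), builds it with modular arithmetic:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
shift = 1  # hypothetical: number of positions to rotate to the left

# Indices 0..n-1 moved by `shift`, wrapping around at the end of the array.
index_set = (np.arange(a.size) + shift) % a.size

print(index_set)     # [1 2 3 0]
print(a[index_set])  # [2 3 4 1]
```

For this particular pattern the result should match np.roll(a, -shift).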

How to convert [2,3,4] to [0,0,1,1,1,2,2,2,2] to utilize tf.math.segment_sum?

Assume I have an array like [2,3,4]. I am looking for a way in NumPy (or TensorFlow) to convert it to [0,0,1,1,1,2,2,2,2] so that I can apply tf.math.segment_sum() to a tensor of size 2+3+4.
No elegant idea comes to my mind, only loops and list comprehension.

Would something like this work for you?
import numpy
arr = numpy.array([2, 3, 4])
numpy.repeat(numpy.arange(arr.size), arr)
# array([0, 0, 1, 1, 1, 2, 2, 2, 2])
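To show what the segment ids are used for, here is a NumPy-only sketch of a segment sum built on the same np.repeat call. The data array is made up for illustration, and np.bincount with weights stands in for tf.math.segment_sum here; it is not the TensorFlow API itself.

```python
import numpy as np

lengths = np.array([2, 3, 4])
# One id per element: 0 repeated twice, 1 three times, 2 four times.
segment_ids = np.repeat(np.arange(lengths.size), lengths)

data = np.arange(1, 10)  # hypothetical data: 1, 2, ..., 9

# Sum the entries of data that share a segment id (result is float64).
sums = np.bincount(segment_ids, weights=data)

print(segment_ids)  # [0 0 1 1 1 2 2 2 2]
print(sums)         # [ 3. 12. 30.]
```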

You don't need to use numpy. You can use nothing but list comprehensions:
>>> foo = [2,3,4]
>>> sum([[i]*foo[i] for i in range(len(foo))], [])
[0, 0, 1, 1, 1, 2, 2, 2, 2]
It works like this:
You can create expanded arrays by multiplying a simple one with a constant, so [0] * 2 == [0,0]. So for each index in the array, we expand with [i]*foo[i]. In other words:
>>> [[i]*foo[i] for i in range(len(foo))]
[[0, 0], [1, 1, 1], [2, 2, 2, 2]]
Then we use sum to reduce the lists into a single list:
>>> sum([[i]*foo[i] for i in range(len(foo))], [])
[0, 0, 1, 1, 1, 2, 2, 2, 2]
Because we are "summing" lists, not integers, we pass [] to sum to make an empty list the starting value of the sum.
(Note that this will likely be slower than numpy, though I have not personally compared it to something like @Patol75's answer.)

I really like the answer from @Patol75 since it's neat. However, there is no pure TensorFlow solution yet, so I provide one, which may be somewhat complex. Just for reference and fun!
BTW, I didn't see a tf.repeat API in tf master. Please check this PR, which adds tf.repeat support equivalent to numpy.repeat.
import tensorflow as tf
repeats = tf.constant([2,3,4])
values = tf.range(tf.size(repeats)) # [0,1,2]
max_repeats = tf.reduce_max(repeats) # max repeat is 4
tiled = tf.tile(tf.reshape(values, [-1,1]), [1,max_repeats]) # [[0,0,0,0],[1,1,1,1],[2,2,2,2]]
mask = tf.sequence_mask(repeats, max_repeats) # [[1,1,0,0],[1,1,1,0],[1,1,1,1]]
res = tf.boolean_mask(tiled, mask) # [0,0,1,1,1,2,2,2,2]

Patol75's answer uses Numpy but Gort the Robot's answer is actually faster (on your example list at least).
I'll keep this answer up as another solution, but it's slower than both.
Given that a = [2,3,4] this could be done using a loop like so:
b = []
for i in range(len(a)):
    for j in range(a[i]):
        b.append(range(len(a))[i])
Which, as a list comprehension one-liner, is this diabolical thing:
b = [range(len(a))[i] for i in range(len(a)) for j in range(a[i])]
Both end up with b = [0,0,1,1,1,2,2,2,2].

Make member vectors out of tuples in python

I have a list of tuples/lists.
Example:
a = [[1,2], [2,4], [3,6]]
Given all sub-lists are the same length I want to split them and receive lists/vectors for each member.
Or, in one list: [[1,2,3],[2,4,6]]
Every solution using numpy or default lists would be appreciated.
I have not found a way to do this pythonically, or efficiently, using any feature other than loops:
def vectorise_pairs(pairs):
    return [[p[0] for p in pairs],
            [p[1] for p in pairs]
            ]
Is there a better way to do this?

first, second = zip(*a)
print(first, second)
outputs
(1, 2, 3) (2, 4, 6)
If you need lists or numpy arrays you can convert them:
first, second = list(first), list(second)
first, second = np.array(first), np.array(second)
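The same zip(*a) idiom extends to sub-lists of any common length. A short sketch, using a made-up three-member input and the hypothetical name columns:

```python
a = [[1, 2, 10], [2, 4, 20], [3, 6, 30]]

# zip(*a) yields one tuple per member position; convert each to a list.
columns = [list(column) for column in zip(*a)]

print(columns)  # [[1, 2, 3], [2, 4, 6], [10, 20, 30]]
```

Note that zip stops at the shortest sub-list, so this relies on all sub-lists having the same length, as stated in the question.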

Since you tagged numpy, my_array.T transposes my_array.
>>> import numpy as np
>>> a = [[1,2], [2,4], [3,6]]
>>> np.array(a).T
array([[1, 2, 3],
       [2, 4, 6]])
Alternatively, you can use np.transpose (which even accepts lists).
>>> np.transpose(a)
array([[1, 2, 3],
       [2, 4, 6]])

Alex's solution works well as a general transposition of any Python iterable. If you have some reason to specifically want to use Numpy, you could also use the following:
import numpy as np
a = np.array([[1,2], [2,4], [3,6]])
first, second = a.T
# OR,
first = a[:, 0]
second = a[:, 1] # etc.

Directly from the official documentation (https://docs.python.org/2/tutorial/datastructures.html#nested-list-comprehensions):
a = [[1,2], [2,4], [3,6]]
[[row[i] for row in a] for i in range(len(a[0]))]
#=> [[1, 2, 3], [2, 4, 6]]

Replace values in a large list of arrays (performance)

I have a performance problem with replacing values of a list of arrays using a dictionary.
Let's say this is my dictionary:
# Create a sample dictionary
keys = [1, 2, 3, 4]
values = [5, 6, 7, 8]
dictionary = dict(zip(keys, values))
And this is my list of arrays:
# import numpy as np
# List of arrays
listvalues = []
arr1 = np.array([1, 3, 2])
arr2 = np.array([1, 1, 2, 4])
arr3 = np.array([4, 3, 2])
listvalues.append(arr1)
listvalues.append(arr2)
listvalues.append(arr3)
listvalues
>[array([1, 3, 2]), array([1, 1, 2, 4]), array([4, 3, 2])]
I then use the following function to replace all values in a nD numpy array using a dictionary:
# Replace function
def replace(arr, rep_dict):
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]
This function is really fast, however I need to iterate over my list of arrays to apply this function to each array:
replaced = []
for i in range(len(listvalues)):  # xrange on Python 2
    replaced.append(replace(listvalues[i], dictionary))
This is the bottleneck of the process, as it needs to iterate over thousands of arrays.
How could I achieve the same result without using the for-loop? It is important that the result is in the same format as the input (a list of arrays with replaced values).
Many thanks guys!!

This will do the trick efficiently, using the numpy_indexed package. It can be further simplified if all values in 'listvalues' are guaranteed to be present in 'keys', but I'll leave that as an exercise to the reader.
import numpy_indexed as npi
arr = np.concatenate(listvalues)
idx = npi.indices(keys, arr, missing='mask')
remap = np.logical_not(idx.mask)
arr[remap] = np.array(values)[idx[remap]]
replaced = np.array_split(arr, np.cumsum([len(a) for a in listvalues][:-1]))
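The same concatenate, replace once, then split idea can also be sketched with plain NumPy, reusing the replace function from the question. This sketch assumes every value in the arrays is present among the dictionary keys; the names flat and split_points are introduced here for illustration.

```python
import numpy as np

def replace(arr, rep_dict):
    # As in the question: find each value's position among the sorted keys,
    # then take the replacement stored at that position.
    rep_keys, rep_vals = np.array(list(zip(*sorted(rep_dict.items()))))
    idces = np.digitize(arr, rep_keys, right=True)
    return rep_vals[idces]

dictionary = {1: 5, 2: 6, 3: 7, 4: 8}
listvalues = [np.array([1, 3, 2]), np.array([1, 1, 2, 4]), np.array([4, 3, 2])]

# Replace all values in a single call on the concatenated data ...
flat = np.concatenate(listvalues)
# ... then cut the result back into pieces of the original lengths.
split_points = np.cumsum([len(arr) for arr in listvalues])[:-1]
replaced = np.split(replace(flat, dictionary), split_points)

print(replaced)
```

With the sample data this should give the arrays [5, 7, 6], [5, 5, 6, 8] and [8, 7, 6], in the same list-of-arrays format as the input.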

Reverse part of an array using NumPy

I am trying to use array slicing to reverse part of a NumPy array. If my array is, for example,
a = np.array([1,2,3,4,5,6])
then I can get a slice b
b = a[::-1]
Which is a view on the original array. What I would like is a view that is partially reversed, for example
1,4,3,2,5,6
I have encountered performance problems with NumPy if you don't play along exactly with how it is designed, so I would like to avoid "fancy" indexing if it is possible.

If you don't like the off by one indices
>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[1:4][::-1]
>>> a
array([1, 4, 3, 2, 5, 6])

>>> a = np.array([1,2,3,4,5,6])
>>> a[1:4] = a[3:0:-1]
>>> a
array([1, 4, 3, 2, 5, 6])
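Both slice assignments above modify the array in place rather than producing a view; a partially reversed order cannot be described by a single constant stride, so a plain slice view of it is not available. A minimal sketch of wrapping the assignment in a helper (the name reverse_segment is hypothetical):

```python
import numpy as np

def reverse_segment(arr, start, stop):
    """Reverse arr[start:stop] in place and return arr."""
    # The explicit copy avoids relying on how overlapping source and
    # destination memory is handled during the assignment.
    arr[start:stop] = arr[start:stop][::-1].copy()
    return arr

a = np.array([1, 2, 3, 4, 5, 6])
print(reverse_segment(a, 1, 4))  # [1 4 3 2 5 6]
```

The copy costs one temporary of the segment's size, which is usually negligible next to the clarity of the half-open start/stop convention.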

You can use the permutation matrices (that's the numpiest way to partially reverse an array).
a = np.array([1,2,3,4,5,6])
new_order_for_index = [1,4,3,2,5,6] # Careful: index from 1 to n !
# Permutation matrix
m = np.zeros( (len(a),len(a)) )
for index, new_index in enumerate(new_order_for_index):
    m[index, new_index - 1] = 1
print(np.dot(m, a))
# [1. 4. 3. 2. 5. 6.]
