Numpy: get index of smallest value based on conditions

Numpy: get index of smallest value based on conditions - python

I have an array as such:
array([[ 10, -1],
[ 3, 1],
[ 5, -1],
[ 7, 1]])
What I want is to get the index of row with the smallest value in the first column and -1 in the second.
So basically, np.argmin() with a condition for the second column to be equal to -1 (or any other value for that matter).
In my example, I would like to get 2 which is the index of [ 5, -1].
I'm pretty sure there's a simple way, but I can't find it.

import numpy as np
a = np.array([
[10, -1],
[ 3, 1],
[ 5, -1],
[ 7, 1]])
mask = (a[:, 1] == -1)
arg = np.argmin(a[mask][:, 0])
result = np.arange(a.shape[0])[mask][arg]
print result

np.argwhere(a[:,1] == -1)[np.argmin(a[a[:, 1] == -1, 0])]

This is not efficient but if you have a relatively small array and want a one-line solution:
>>> a = np.array([[ 10, -1],
... [ 3, 1],
... [ 5, -1],
... [ 7, 1]])
>>> [i for i in np.argsort(a[:, 0]) if a[i, 1] == -1][0]
2

Related

Replace consecutive duplicates in 2D numpy array

I have a two dimensional numpy array x:
import numpy as np
x = np.array([
[1, 2, 8, 4, 5, 5, 5, 3],
[0, 2, 2, 2, 2, 1, 1, 4]
])
My goal is to replace all consecutive duplicate numbers with a specific value (lets take -1), but by leaving one occurrence unchanged.
I could do this as follows:
def replace_consecutive_duplicates(x):
consec_dup = np.zeros(x.shape, dtype=bool)
consec_dup[:, 1:] = np.diff(x, axis=1) == 0
x[consec_dup] = -1
return x
# current output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, 5, -1, -1, 3],
# [ 0, 2, -1, -1, -1, 1, -1, 4]])
However, in this case the one occurrence left unchanged is always the first.
My goal is to leave the middle occurrence unchanged.
So given the same x as input, the desired output of function replace_consecutive_duplicates is:
# desired output
replace_consecutive_duplicates(x)
# array([[ 1, 2, 8, 4, -1, 5, -1, 3],
# [ 0, -1, 2, -1, -1, 1, -1, 4]])
Note that in case consecutive duplicate sequences with an even number of occurrences the middle left value should be unchanged. So the consecutive duplicate sequence [2, 2, 2, 2] in x[1] becomes [-1, 2, -1, -1]
Also note that I'm looking for a vectorized solution for 2D numpy arrays since performance is of absolute importance in my particular use case.
I've already tried looking at things like run length encoding and using np.diff(), but I didn't manage to solve this. Hope you guys can help!

The main problem is that you require the length of the number of consecutives values. This is not easy to get with numpy, but using itertools.groupby we can solve it using the following code.
import numpy as np
x = np.array([
[1, 2, 8, 4, 5, 5, 5, 3],
[0, 2, 2, 2, 2, 1, 1, 4]
])
def replace_row(arr: np.ndarray, new_val=-1):
results = []
for val, count in itertools.groupby(arr):
k = len(list(count))
results.extend([new_val] * ((k - 1) // 2))
results.append(val)
results.extend([new_val] * (k // 2))
return np.fromiter(results, arr.dtype)
if __name__ == '__main__':
for idx, row in enumerate(x):
x[idx, :] = replace_row(row)
print(x)
Output:
[[ 1 2 8 4 -1 5 -1 3]
[ 0 -1 2 -1 -1 1 -1 4]]
This isn't vectorized, but can be combined with multi threading since every row is handled one by one.

How to get count of elements in a nested list?

I am trying to count labels in a nested list and would appreciate any help on how to go about it. An example of the array I am trying to count is given below. This is not the real array, but a good enough approximation:
x = [[[0,0,1,0,2,3],-1],[[-1,1,1,0,2,5],-1],[[0,0,1,0,2,1],-1],[[0,0,-1,0,2,3],0]]
What I would like is to count all the occurrences of a given integer in the second element of the middle list (to visualize better, I would like to count all occurrences of X in the list like this [[[],X]]).
For instance, counting -1 in the above array would get 3 as a result and not 5. I do not want to get into loops and counters and such naive computations because the arrays I am working with are fairly large. Are there any standard python libraries that deal with such cases?

One approach:
data = [[[0, 0, 1, 0, 2, 3], -1], [[-1, 1, 1, 0, 2, 5], -1], [[0, 0, 1, 0, 2, 1], -1], [[0, 0, -1, 0, 2, 3], 0]]
res = sum(1 for _, y in data if y == -1)
print(res)
Output
3
Alternative, use collections.Counter, if you need to count more than a single element.
res = Counter(y for _, y in data)
print(res)
Output
Counter({-1: 3, 0: 1})
A third alternative is use operator.countOf:
from operator import countOf
res = countOf((y for _, y in data), -1)
print(res)
Output
3

You can use collections.Counter:
from collections import Counter
c = Counter((i[1] for i in x))
c[-1]
output:
>>> c[-1]
3
>>> c
Counter({-1: 3, 0: 1})

x = [
[ [0,0,1,0,2,3], -1],
[ [-1, 1, 1, 0, 2, 5], -1],
[ [0, 0, 1, 0, 2, 1], -1],
[ [0, 0, -1, 0, 2, 3], 0]
]
These are the items of the list [ [....], x] with another list and an integer.
You want to count how many times the x is -1 and not 0.
x = [
[ [0,0,1,0,2,3], -1],
[ [-1, 1, 1, 0, 2, 5], -1],
[ [0, 0, 1, 0, 2, 1], -1],
[ [0, 0, -1, 0, 2, 3], 0]
]
def counter(_list, element):
count = 0
for item in _list:
if item[1] == element:
count += 1
return count
print(counter(x, -1))
>>> 3

How can I extract a column and create a vector out of them using NumPy?

mat = [[ 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10.]
[11. 12. 13. 14. 15.]]
Suppose, I have this NumPy array.
Say, I need to extract the 2nd column of each row, convert them into binary, and then create a vector out of them.
How can I do it using NumPy?
For instance, if I select 2nd column of this NumPy array, my output should look as follows:
[[0 0 1 0],
[0 1 1 1],
[1 1 0 0]]
I tried as follows:
my_data = np.genfromtxt('data_input')
print(my_data)
my_data_2nd_column = my_data[:, 1]
my_data_2nd_column_binary = Utils.encode(my_data_2nd_column)
my_2nd_column_binary = np.apply_along_axis(Utils.encode, 1, my_data)
print(my_2nd_column_binary)

Numpy has a built-in function for this. First, you can get a particular column using indexing:
>>> arr
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]])
>>> arr[:, [1]]
array([[ 2],
[ 7],
[12]])
Then, you could use the built-in function, but make sure you convert to unsigned, 8-bit integers:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)
array([[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0]], dtype=uint8)
Of course, if you need the second dimension to be rank 4, just use slicing again, although, it is probably worth copying if you are going to do lots of operations on the resulting array:
>>> np.unpackbits(arr[:, [1]].astype(np.uint8), axis=1)[:, -4:]
array([[0, 0, 1, 0],
[0, 1, 1, 1],
[1, 1, 0, 0]], dtype=uint8)

I had done this example, but without using the numpy library.
I commented on all functions.
mat = [[ 1, 2, 3, 4, 1,],
[ 6, 7, 8, 9, 40,],
[11, 12, 13, 14, 15,]]
# convert the binary into a vector of elements
def split(word):
return [int(char) for char in word]
# returns the vector size of the largest binary
def binaryBig(lista):
maior = max(lista, key=int)
temp = "{0:b}".format(maior)
return len(split(temp))
# convert the element to binary
def binary(x,big):
temp = split(format(x, "b"))
for n in range(len(temp),big):
temp.insert(0,0)
return temp
# create the matrix with the binaries
def createBinaryMat(lista):
big = binaryBig(lista)
mat = []
for i in lista:
mat.append(binary(i,big))
return mat
# select the column and return the created matrix
def binaryElementsOfColum(colum,mat):
lista = []
for i in mat:
lista.append(i[colum])
return createBinaryMat(lista)
for i in binaryElementsOfColum(4,mat):
print(i)
Output:
[0, 0, 0, 0, 0, 1]
[1, 0, 1, 0, 0, 0]
[0, 0, 1, 1, 1, 1]

Batch dot product with numpy?

I need to get the dot product of many vectors with one vector. Example code:
a = np.array([0, 1, 2])
b = np.array([
[0, 1, 2],
[4, 5, 6],
[-1, 0, 1],
[-3, -2, 1]
])
I would like to get the dot product of each row of b against a. I can iterate:
result = []
for row in b:
result.append(np.dot(row, a))
print(result)
which gives:
[5, 17, 2, 0]
How can I get this without iterating? Thanks!

Use numpy.dot or numpy.matmul without for loop:
import numpy as np
np.matmul(b, a)
# or
np.dot(b, a)
Output:
array([ 5, 17, 2, 0])

I will just do #
b#a
Out[108]: array([ 5, 17, 2, 0])

How to find the rows having values between -1 and 1 in a given numpy 2D-array?

I have a np.array of shape (15,3).
final_vals = array([[ 37, -84, -143],
[ 29, 2, -2],
[ -18, -2, 0],
[ -3, 6, 0],
[ 361, -5, 2],
[ -23, 4, 8],
[ 0, -1, 0],
[ -1, 1, 0],
[ 62, 181, 83],
[-193, -14, -2],
[ 42, -154, -92],
[ 16, -13, 1],
[ -10, -3, 0],
[-299, 244, 110],
[ 223, -237, -110]])
am trying to find the rows whose element values are between -1 and 1.In the array printed above ROW-6 and ROW-7 are target/result rows.
I tried,
result_idx = np.where(np.logical_and(final_vals>=-1, final_vals<=1))
which returns,
result_idx = (array([ 2, 3, 6, 6, 6, 7, 7, 7, 11, 12], dtype=int64),
array([2, 2, 0, 1, 2, 0, 1, 2, 2, 2], dtype=int64))
I want my program to return only row numbers

You could take the absolute value of all elements, and check which rows's elements are smaller or equal to 1. Then use np.flatnonzero to find the indices where all columns fullfil the condition:
np.flatnonzero((np.abs(final_vals) <= 1).all(axis=1))
Output
array([6, 7], dtype=int64)

Another way to do this based on your approach is to find the truth value of each element and then use numpy.all for each row. Then numpy.where gets you what you want.
mask = (final_vals <= 1) * (final_vals >= -1)
np.where(np.all(mask, axis=1))

How about
np.where(np.all((-1<=final_vals) & (final_vals<=1),axis=1))

You could use np.argwhere:
r = np.logical_and(final_vals <= 1, final_vals >=-1)
result = np.argwhere(r.all(1)).flatten()
print(result)
Output
[6 7]

Another way is using pandas,
you can achieve the row with following code:
df = pd.DataFrame(final_vals)
temp= ((df>=-1) & (df<=1 )).product(axis=1)
rows = temp[temp!=0].keys()
rows
At first it check numbers that are between -1 and +1 and then check rows(with axis=1) that all values accept the condition.
and the result is:
Int64Index([ 6, 7], dtype='int64')

Just a simple list comprehension:
[ i for i, row in enumerate(final_vals) if all([ e >= -1 and e <= 1 for e in row ]) ]
#=> [6, 7]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Numpy: get index of smallest value based on conditions - python

import numpy as np a = np.array([ [10, -1], [ 3, 1], [ 5, -1], [ 7, 1]]) mask = (a[:, 1] == -1) arg = np.argmin(a[mask][:, 0]) result = np.arange(a.shape[0])[mask][arg] print result

np.argwhere(a[:,1] == -1)[np.argmin(a[a[:, 1] == -1, 0])]

This is not efficient but if you have a relatively small array and want a one-line solution: >>> a = np.array([[ 10, -1], ... [ 3, 1], ... [ 5, -1], ... [ 7, 1]]) >>> [i for i in np.argsort(a[:, 0]) if a[i, 1] == -1][0] 2

Related

Replace consecutive duplicates in 2D numpy array

How to get count of elements in a nested list?

How can I extract a column and create a vector out of them using NumPy?

Batch dot product with numpy?

How to find the rows having values between -1 and 1 in a given numpy 2D-array?

Categories

Resources